Manuel Pégourié-Gonnard fb0e4f0d1a aria: optimise byte perms on Intel
(A similar commit for Arm follows.)

Use specific instructions for moving bytes around in a word. This speeds
things up and, as a side effect, slightly lowers code size.

ARIA_P3 (aka reverse byte order) is now 1 instruction on x86, which speeds up
the key schedule. (Clang 3.8 finds this but GCC 5.4 doesn't.)

I couldn't find an Intel equivalent of ARM's rev16 (aka ARIA_P1), so I made it
two instructions, which is still much better than the code generated with
the previous mask-shift-or definition, and speeds up en/decryption. (Neither
Clang 3.8 nor GCC 5.4 finds this.)
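
For illustration, here is a minimal portable C sketch of the two
permutations discussed above. It is not the code from this commit: the
function names (p1_mask_shift, bswap32_portable, rot16, p1_via_bswap) are
hypothetical, and the decomposition of ARIA_P1 into "reverse all bytes, then
rotate by 16 bits" is an assumption about one way to get down to two
instructions on x86, where a byte swap and a rotate are each a single
instruction.

```c
#include <stdint.h>

/* Mask-shift-or form of ARIA_P1: swap the two bytes inside each
 * 16-bit half of the word (what ARM's rev16 does). */
static inline uint32_t p1_mask_shift(uint32_t x)
{
    return ((x >> 8) & 0x00FF00FFu) ^ ((x & 0x00FF00FFu) << 8);
}

/* ARIA_P3: reverse the order of all four bytes, written portably.
 * Compilers may or may not recognise this as a single byte-swap
 * instruction. */
static inline uint32_t bswap32_portable(uint32_t x)
{
    return (x >> 24)
         | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u)
         | (x << 24);
}

/* Rotate by 16 bits, i.e. swap the two 16-bit halves. */
static inline uint32_t rot16(uint32_t x)
{
    return (x >> 16) | (x << 16);
}

/* Assumed two-step form of ARIA_P1: a full byte reversal followed by
 * a 16-bit rotation gives the same result as p1_mask_shift(). */
static inline uint32_t p1_via_bswap(uint32_t x)
{
    return rot16(bswap32_portable(x));
}
```

With bytes labelled ABCD, the byte reversal gives DCBA and the rotation
then gives BADC, which is the ARIA_P1 result.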

Before:
O	aria.o	ins
s	7976	43,865
2	10520	37,631
3	13040	28,146

After:
O	aria.o	ins
s	7768	33,497
2	9816	28,268
3	11432	20,829

For measurement method, see previous commit:
"aria: turn macro into static inline function"
2018-02-27 12:39:12 +01:00