Manuel Pégourié-Gonnard 377b2b624d aria: optimize byte perms on Arm
Use dedicated instructions for moving bytes around within a word. This speeds
things up and, as a side effect, slightly reduces code size.

ARIA_P3 and ARIA_P1 now compile to one single-cycle instruction each (these
instructions are available in all architecture versions from v6-M onward).
Note: ARIA_P3 was already translated to a single instruction by Clang 3.8 and
armclang 6.5, but not by arm-gcc 5.4 or armcc 5.06.
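The two permutations can be sketched in C roughly as follows. This is a minimal sketch, not the code in aria.c: it assumes the single instructions in question are REV16 (for P1) and REV (for P3), assumes GCC/Clang-style extended asm on an ARMv6 or later core, and falls back to portable C everywhere else. The names `aria_p1`, `aria_p3` and `ARIA_SKETCH_USE_ASM` are hypothetical.

```c
#include <stdint.h>

/* Hypothetical switch: use inline asm only with GCC/Clang-style extended asm
 * on ARMv6 or later (armcc 5 with --gnu defines __GNUC__ but is excluded,
 * on the assumption that it does not accept this asm syntax). */
#if defined(__arm__) && defined(__GNUC__) && __ARM_ARCH >= 6 && \
    (!defined(__ARMCC_VERSION) || __ARMCC_VERSION >= 6000000)
#define ARIA_SKETCH_USE_ASM 1
#endif

/* P1: (A B C D) -> (B A D C), swap the two bytes inside each 16-bit half. */
static inline uint32_t aria_p1(uint32_t x)
{
#if defined(ARIA_SKETCH_USE_ASM)
    uint32_t r;
    __asm__("rev16 %0, %1" : "=l"(r) : "l"(x)); /* assumed: one REV16 */
    return r;
#else
    /* Portable fallback; the masked halves do not overlap, so ^ acts as |. */
    return ((x >> 8) & 0x00FF00FFu) ^ ((x & 0x00FF00FFu) << 8);
#endif
}

/* P3: (A B C D) -> (D C B A), i.e. a full byte reversal (endianness swap). */
static inline uint32_t aria_p3(uint32_t x)
{
#if defined(ARIA_SKETCH_USE_ASM)
    uint32_t r;
    __asm__("rev %0, %1" : "=l"(r) : "l"(x)); /* assumed: one REV */
    return r;
#else
    /* Portable fallback: swap bytes within each half, then swap the halves. */
    uint32_t y = aria_p1(x);
    return (y >> 16) ^ (y << 16);
#endif
}
```

Both permutations are involutions, so applying either one twice returns the original word. That gives a cheap way to check an assembly version against the C fallback.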

ARIA_P2 is already translated efficiently, to the minimal number of
instructions (1 in ARM mode, 2 in Thumb mode), by all tested compilers.
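P2 needs no assembly because it is just a rotation by 16 bits, which compilers generally recognise from the plain two-shift pattern. A minimal sketch, with `aria_p2` as a hypothetical name:

```c
#include <stdint.h>

/* P2: (A B C D) -> (C D A B), i.e. rotate the word by 16 bits.
 * The two shifted halves do not overlap, so XOR here is equivalent to OR.
 * Per this commit: 1 instruction in ARM mode, 2 in Thumb mode. */
static inline uint32_t aria_p2(uint32_t x)
{
    return (x >> 16) ^ (x << 16);
}
```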

Manually compiled and inspected generated code with the following compilers:
arm-gcc 5.4, clang 3.8, armcc 5.06 (with and without --gnu), armclang 6.5.

Size reduction (arm-none-eabi-gcc -march=armv6-m -mthumb -Os): 5288 -> 5044 B
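Expressed as a relative change, in the same form as the timing figures, the size reduction works out as follows (derived arithmetic, not a separately measured figure):

$$\frac{5288 - 5044}{5288} = \frac{244}{5288} \approx 4.6\%$$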

Effect on execution time of the self-tests on a few boards:
FRDM-K64F   (Cortex-M4):    444 ->  385 us (-13%)
LPC1768     (Cortex-M3):    488 ->  432 us (-11%)
FRDM-KL64Z  (Cortex-M0):   1429 -> 1134 us (-20%)

Measured using a config.h with no cipher mode enabled and the following
program, with aria.c and aria.h copy-pasted into the online compiler:

    #include "mbed.h"
    #include "aria.h"

    int main() {
        Timer t;
        t.start();
        // Run the ARIA self-test once (verbose = 0) and time it
        int ret = mbedtls_aria_self_test(0);
        t.stop();
        // ret is 0 if all test vectors passed
        printf("ret = %d; time = %d us\n", ret, t.read_us());
    }
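The config.h itself is not part of this message. A hypothetical minimal version consistent with "no cipher mode" might look like the fragment below; the choice of macros is an assumption, in particular the inclusion of MBEDTLS_HAVE_ASM on the assumption that the inline-assembly paths are gated on it.

```c
/* Hypothetical minimal config.h for the timing run (assumed, not the
 * author's actual file). */
#ifndef MBEDTLS_CONFIG_H
#define MBEDTLS_CONFIG_H

#define MBEDTLS_HAVE_ASM    /* assumed: allow inline assembly paths */
#define MBEDTLS_ARIA_C      /* build the ARIA block cipher */
#define MBEDTLS_SELF_TEST   /* compile mbedtls_aria_self_test() */

/* "No cipher mode": MBEDTLS_CIPHER_MODE_CBC, MBEDTLS_CIPHER_MODE_CFB and
 * MBEDTLS_CIPHER_MODE_CTR are deliberately left undefined, so the
 * self-test is assumed to exercise only single-block (ECB) vectors. */

#endif /* MBEDTLS_CONFIG_H */
```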
2018-02-27 12:39:12 +01:00