Motivation is similar to NO_UDBL_DIVISION.
The alternative implementation of 64-bit mult is straightforward and aims at
obvious correctness. Also, visual examination of the generate assembly show
that it's quite efficient with clang, armcc5 and arm-clang. However current
GCC generates fairly inefficient code for it.
I tried to rework the code in order to make GCC generate more efficient code.
Unfortunately the only way to do that is to get rid of 64-bit add and handle
the carry manually, but this causes other compilers to generate less efficient
code with branches, which is not acceptable from a side-channel point of view.
So let's keep the obvious code that works for most compilers and hope future
versions of GCC learn to manage registers in a sensible way in that context.
See https://bugs.launchpad.net/gcc-arm-embedded/+bug/1775263