ec/curve25519.c: "double" ecdhx25519 performance on 64-bit platforms.
"Double" is in quotes because improvement coefficient varies
significantly depending on platform and compiler. You're likely
to measure ~2x improvement on popular desktop and server processors,
but not so much on mobile ones, even minor regression on ARM
Cortex series. Latter is because they have rather "weak" umulh
instruction. On low-end x86_64 problem is that contemporary gcc
and clang tend to opt for double-precision shift for >>51, which
can be devastatingly slow on some processors.
Just in case for reference, trick is to use 2^51 radix [currently
only for DH].
Reviewed-by: Rich Salz <rsalz@openssl.org>