Improve the overflow handling in rsaz_512_sqr
We have always a carry in %rcx or %rbx in range 0..2
from the previous stage, that is added to the result
of the 64-bit square, but the low nibble of any square
can only be 0, 1, 4, 9.
Therefore one "adcq $0, %rdx" can be removed.
Likewise in the ADX code we can remove one
"adcx %rbp, $out" since %rbp is always 0, and carry is
also zero, therefore that is a no-op.
Reviewed-by: Paul Dale <paul.dale@oracle.com>
(Merged from https://github.com/openssl/openssl/pull/10576)