Optimize AES-GCM implementation on aarch64
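Compared to the current implementation, this change improves
performance by tuning the loop-unrolling factor in the interleaved
implementation and by enabling a higher level of parallelism.
In the figures below, throughput for inputs of 1024 bytes and larger
improves by roughly 23% to 43%, while smaller inputs are slightly
slower (about 1% to 6%).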
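
As a rough illustration of why unrolling helps a GHASH-style loop, here
is a minimal C sketch. It is a toy analogue, not the assembly touched by
this change: it uses wrapping 64-bit integer arithmetic in place of
GF(2^128) multiplication, the function names are hypothetical, and the
factor of 4 is an arbitrary choice (this message does not state the
factor actually used). The serial loop has one long dependency chain;
the unrolled loop uses precomputed powers of h so that the four
multiplications in each iteration are independent of one another.

```c
#include <stddef.h>
#include <stdint.h>

/* Serial Horner-style hash: every step waits for the previous result. */
uint64_t poly_hash_serial(const uint64_t *x, size_t n, uint64_t h)
{
    uint64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc = (acc + x[i]) * h;          /* wraps modulo 2^64 */
    return acc;
}

/*
 * Same result, unrolled by 4 (assumed factor, for illustration only).
 * Expanding four serial steps gives
 *   acc' = (acc + x0)*h^4 + x1*h^3 + x2*h^2 + x3*h
 * so the four products can be computed in parallel.
 */
uint64_t poly_hash_unrolled4(const uint64_t *x, size_t n, uint64_t h)
{
    const uint64_t h2 = h * h, h3 = h2 * h, h4 = h3 * h;
    uint64_t acc = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        uint64_t t0 = (acc + x[i]) * h4; /* only t0 depends on acc */
        uint64_t t1 = x[i + 1] * h3;
        uint64_t t2 = x[i + 2] * h2;
        uint64_t t3 = x[i + 3] * h;
        acc = t0 + t1 + t2 + t3;
    }
    for (; i < n; i++)                   /* leftover blocks, serial */
        acc = (acc + x[i]) * h;
    return acc;
}
```

Both functions return the same value for any input; only the shape of
the dependency chain differs. A larger factor exposes more independent
work per iteration but needs more registers and more precomputed
powers, which is why the factor is worth tuning per core.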

Performance (A72), in 1000s of bytes per second:

new
type            16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-gcm   113065.51k   375743.00k   848359.51k  1517865.98k  1964040.19k  1986663.77k
aes-192-gcm   110679.32k   364470.63k   799322.88k  1428084.05k  1826917.03k  1848967.17k
aes-256-gcm   104919.86k   352939.29k   759477.76k  1330683.56k  1663175.34k  1670430.72k

old
type            16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-gcm   115595.32k   382348.65k   855891.29k  1236452.35k  1425670.14k  1429793.45k
aes-192-gcm   112227.02k   369543.47k   810046.55k  1147948.37k  1286288.73k  1296941.06k
aes-256-gcm   111543.90k   361902.36k   769543.59k  1070693.03k  1208576.68k  1207511.72k

Change-Id: I28a2dca85c001a63a2a942e80c7c64f7a4fdfcf7
Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
Reviewed-by: Paul Dale <paul.dale@oracle.com>
(Merged from https://github.com/openssl/openssl/pull/9818)