# CBC en-/decrypt CTR XTS
# POWER8[le] 3.96/0.72 0.74 1.1
# POWER8[be] 3.75/0.65 0.66 1.0
-# POWER9[le] 3.05/0.65 0.65 0.80
+# POWER9[le] 4.02/0.86 0.84 1.05
+# POWER9[be] 3.99/0.78 0.79 0.97
$flavour = shift;
# PPC970/G5 9.29/+160% ?
# POWER7 8.62/+61% 3.38
# POWER8 8.70/+51% 3.36
-# POWER9 6.61/+29% 3.30(*)
+# POWER9 8.80/+29% 4.50(*)
#
# (*) this is trade-off result, it's possible to improve it, but
# then it would negatively affect all others;
# 2x aggregated reduction improves performance by 50% (resulting
# performance on POWER8 is 1 cycle per processed byte), and 4x
# aggregated reduction - by 170% or 2.7x (resulting in 0.55 cpb).
-# POWER9 delivers 0.40 cpb.
+# POWER9 delivers 0.51 cpb.
$flavour=shift;
$output =shift;
# PPC970 7.00/+114% 3.51/+205%
# POWER7 3.75/+260% 1.93/+100%
# POWER8 - 2.03/+200%
-# POWER9 - 1.56/+150%
+# POWER9 - 2.00/+150%
#
# Do we need floating-point implementation for PPC? Results presented
# in poly1305_ieee754.c are tricky to compare to, because they are for
# PPC970 6.03/+80%
# POWER7 3.50/+30%
# POWER8 3.75/+10%
-# POWER9 2.80/+12%
$flavour = shift;
* POWER6 4.92
* POWER7 4.50
* POWER8 4.10
- * POWER9 3.14
*
* z10 11.2
* z196+ 7.30
# PPC970/G5 14.6/+120%
# POWER7 10.3/+100%
# POWER8 11.5/+85%
-# POWER9 7.2/+45%
+# POWER9 9.4/+45%
#
# (*) Corresponds to SHA3-256. Percentage after slash is improvement
# over gcc-4.x-generated KECCAK_1X_ALT code. Newer compilers do
# buffer for r=1088, which matches SHA3-256. This is 17% better than
# scalar PPC64 code. It probably should be noted that if POWER8's
# successor can achieve higher scalar instruction issue rate, then
-# this module will loose... And it does on POWER9 with 8.8 vs. 7.2.
+# this module will lose... And it does on POWER9 with 12.0 vs. 9.4.
$flavour = shift;
# build of sha512-ppc.pl, presented for reference.
#
# POWER8 POWER9
-# SHA256 9.9 [15.8] 9.2 [9.3]
-# SHA512 6.3 [10.3] 5.8 [5.9]
+# SHA256 9.9 [15.8] 12.2 [12.5]
+# SHA512 6.3 [10.3] 7.7 [7.9]
$flavour=shift;
$output =shift;