-# (*) Note that this result is ~15% lower than result for 32-bit
-# code, meaning that it's possible to improve it, but it's
-# more than likely at the cost of the others...
+# (*) But corresponding loop has less instructions, which should have
+# positive effect on upcoming Bulldozer, which has one less ALU.
+# For reference, Intel code runs at 6.8 cpb rate on Opteron.
+# (**) Note that Core2 result is ~15% lower than corresponding result
+# for 32-bit code, meaning that it's possible to improve it,
+# but more than likely at the cost of the others (see rc4-586.pl
+# to get the idea)...