// on Itanium2! What to do? Reschedule loops for Itanium2? But then
// Itanium would exhibit anti-scalability. So I've chosen to reschedule
// for worst latency for every instruction aiming for best *all-round*
-// performance.
+// performance.
// Q. How much faster does it get?
// A. Here is the output from 'openssl speed rsa dsa' for vanilla
.global bn_sqr_words#
.proc bn_sqr_words#
.align 64
-.skip 32 // makes the loop body aligned at 64-byte boundary
+.skip 32 // makes the loop body aligned at 64-byte boundary
bn_sqr_words:
.prologue
.save ar.pfs,r2