modes/gcm128.c: coalesce calls to GHASH.
On contemporary platforms assembly GHASH processes multiple blocks
faster than one by one. For TLS payloads shorter than 16 bytes, e.g.
alerts, it's possible to reduce hashing operation to single call.
And for block lengths not divisible by 16 - fold two final calls to
one. Improvement is most noticeable with "reptoline", because call to
assembly GHASH is indirect.
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/6312)