MIPS32R3 provides the EXT instruction to extract bits from
registers. As the AES table is already 1K aligned, we can
use it everywhere and speed up table address calculation by
10%. Performance numbers:

decryption         16B        64B       256B      1024B      8192B
------------------------------------------------------------------------
aes-256-cbc   5636.84k   6443.26k   6689.02k   6752.94k   6766.59k  bef.
aes-256-cbc   6200.31k   7195.71k   7504.30k   7585.11k   7599.45k  aft.
------------------------------------------------------------------------
aes-128-cbc   7313.85k   8653.67k   9079.55k   9188.35k   9205.08k  bef.
aes-128-cbc   7925.38k   9557.99k  10092.37k  10232.15k  10272.77k  aft.

encryption         16B        64B       256B      1024B      8192B
------------------------------------------------------------------------
aes-256-cbc   6009.65k   6592.70k   6766.59k   6806.87k   6815.74k  bef.
aes-256-cbc   6643.93k   7388.69k   7605.33k   7657.81k   7675.90k  aft.
------------------------------------------------------------------------
aes-128-cbc   7862.09k   8892.48k   9214.04k   9291.78k   9311.57k  bef.
aes-128-cbc   8639.29k   9881.17k  10265.86k  10363.56k  10392.92k  aft.
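
For illustration only, the table address calculation described above
can be modelled in C roughly as follows. This is a sketch, not the
actual assembly: the helper names (ext_bits, offset_shift_mask,
offset_ext, entry_addr) are hypothetical, and 4-byte table entries
with a 256-entry (1K) table are assumed. It shows that an EXT-style
bit-field extract selects the same entry as the shift-and-mask form,
and that with a 1K-aligned base the offset can be merged into the
address without an add.

```c
#include <assert.h>
#include <stdint.h>

/* Model of MIPS EXT: the `size`-bit field of `rs` starting at bit
 * `pos`, zero-extended. Requires size >= 1 and pos + size <= 32. */
static uint32_t ext_bits(uint32_t rs, unsigned pos, unsigned size)
{
    uint32_t mask = (size < 32) ? ((UINT32_C(1) << size) - 1) : UINT32_MAX;
    return (rs >> pos) & mask;
}

/* Shift-and-mask form: byte offset of the 4-byte table entry
 * selected by byte n (0..3) of the state word s. */
static uint32_t offset_shift_mask(uint32_t s, unsigned n)
{
    uint32_t scaled = (n == 0) ? (s << 2) : (s >> (8 * n - 2));
    return scaled & 0x3fc;
}

/* EXT form: extract the byte directly, then scale by the entry size. */
static uint32_t offset_ext(uint32_t s, unsigned n)
{
    return ext_bits(s, 8 * n, 8) << 2;
}

/* With a 1K-aligned base the low 10 address bits are zero, so the
 * offset can be merged with OR (or a bit-field insert) instead of
 * an addition. */
static uintptr_t entry_addr(uintptr_t base, uint32_t s, unsigned n)
{
    assert((base & 0x3ff) == 0);   /* assumption: table is 1K aligned */
    return base | offset_ext(s, n);
}
```

For any state word s and byte index n in 0..3, offset_ext(s, n)
equals offset_shift_mask(s, n), and entry_addr(base, s, n) equals
base + offset_ext(s, n) whenever base is 1K aligned.
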
Reviewed-by: Paul Dale <paul.dale@oracle.com>
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/8206)