math: new pow
from https://github.com/ARM-software/optimized-routines,
commit
04884bd04eac4b251da4026900010ea7d8850edc
The underflow exception is signaled if the result is in the subnormal
range even if the result is exact.
code size change: +3421 bytes.
benchmark on x86_64 before, after, speedup:
-Os:
pow rthruput: 102.96 ns/call 33.38 ns/call 3.08x
pow latency: 144.37 ns/call 54.75 ns/call 2.64x
-O3:
pow rthruput: 98.91 ns/call 32.79 ns/call 3.02x
pow latency: 138.74 ns/call 53.78 ns/call 2.58x