nearbyint optimization (only clear inexact when necessary)
the old code saved and restored the whole fenv around the rounding.
the new code only clears inexact when it was not already set, so it
is only as slow as the old code when inexact is not set before the
call, some other flag is set, and the rounding is inexact (rare)
before:
bench_nearbyint_exact
5000000 N 261 ns/op
bench_nearbyint_inexact_set
5000000 N 262 ns/op
bench_nearbyint_inexact_unset
5000000 N 261 ns/op
after:
bench_nearbyint_exact
10000000 N 94.99 ns/op
bench_nearbyint_inexact_set
25000000 N 65.81 ns/op
bench_nearbyint_inexact_unset
10000000 N 94.97 ns/op