inline cp15 thread pointer load in arm dynamic TLSDESC asm when possible

the indirect function call used to obtain the thread pointer is a
significant portion of the code path for the dynamic case, and most
users are probably building for ISA levels where the thread pointer
can be loaded directly from cp15 and the call omitted.
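
as a rough illustration only, the compile-time choice can be sketched
in C rather than in the asm this change touches. the names get_tp,
get_tp_ptr and fallback_get_tp are hypothetical, and the
__ARM_ARCH >= 7 condition is an assumption, not necessarily the exact
test used in the source:

```c
#include <stdint.h>

/* hypothetical stand-in for a helper chosen at runtime; it just returns
 * the address of a thread-local object so the sketch runs anywhere */
static uintptr_t fallback_get_tp(void)
{
	static _Thread_local char anchor;
	return (uintptr_t)&anchor;
}

/* hypothetical function pointer reached via an indirect call */
static uintptr_t (*get_tp_ptr)(void) = fallback_get_tp;

static inline uintptr_t get_tp(void)
{
#if defined(__arm__) && defined(__ARM_ARCH) && __ARM_ARCH >= 7
	/* assumed condition: ISA level where the cp15 read is always valid,
	 * so the user read-only thread ID register can be read inline */
	uintptr_t tp;
	__asm__ ("mrc p15,0,%0,c13,c0,3" : "=r"(tp));
	return tp;
#else
	/* otherwise keep the indirect call through the helper pointer */
	return get_tp_ptr();
#endif
}
```

on targets matching the assumed condition the helper pointer is never
consulted, which is the call the inline load avoids.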

with this change we could drop at least one register save/restore
(lr), and possibly another (ip) with some clever shuffling. however,
it's not clear whether there's a way to do that which isn't more
expensive than what it saves, or whether avoiding the save/restore
would have any practical effect, so in the interest of avoiding
complexity it's left out for now.