fix thread structure/dtv-pointer corruption on powerpc
per the powerpc psabi, offset 4 of the stack at call time belongs to
the callee and is used for spilling lr (return address). in addition,
offset 0 on the stack must contain a pointer to the previous stack
frame, or a null pointer for the initial stack frame of a thread.
__clone failed to setup any stack frame on the new thread's stack,
thereby allowing the start function it called to clobber offset 4 of
the new thread's struct __pthread, which contains the dtv pointer.
add code to setup a proper stack frame and align the stack pointer to
a multiple of 16 (also an abi requirement) if it was not already
aligned.