the instruction used to align the stack, "and $sp, $sp, -8", does not
actually exist; it's expanded to 2 instructions using the 'at'
(assembler temporary) register, and thus cannot be used in a branch
delay slot. since alignment mod 8 commutes with subtracting 16 (a
multiple of 8), simply swapping these two operations fixes the problem.
crt1.o was not affected because it's still being generated from a
dedicated asm source file. dlstart.lo was not affected because the
stack pointer it receives is already aligned by the kernel. but
Scrt1.o was affected in cases where the dynamic linker gave it a
misaligned stack pointer.
" addu $5, $5, $gp \n"
" lw $25, 4($ra) \n"
" addu $25, $25, $gp \n"
-" subu $sp, $sp, 16 \n"
+" and $sp, $sp, -8 \n"
" jalr $25 \n"
-" and $sp, $sp, -8 \n"
+" subu $sp, $sp, 16 \n"
".set pop \n"
);