the braf instruction's register operand is an offset from the address
of the braf instruction plus 4 (or equivalently, the address of the
instruction after the delay slot). the code for dlsym was incorrectly
computing the offset relative to the address of the delay slot
itself, so the branch landed 2 bytes past __dlsym@PLT. in other
places, a label is placed after the delay slot, but I find this
confusing. putting the label on the branch instruction itself and
manually adding 4 makes it clearer which branch the offset in the
constant pool goes with.
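to make the off-by-2 concrete, here is a small C sketch of the offset arithmetic only. it assumes the constant-pool expression `sym@PLT-(label-.)` ends up holding the distance from `label` to `sym@PLT`, and it does not model relocation processing. the helper names (braf_target, old_offset, new_offset) and the addresses used below are hypothetical, chosen for illustration.

```c
#include <stdint.h>

/* braf Rm on SuperH branches to (address of the braf) + 4 + Rm.
 * unsigned wraparound matches 32-bit PC arithmetic, so backward
 * branches (negative offsets) work too. */
static uint32_t braf_target(uint32_t braf_addr, uint32_t rm)
{
	return braf_addr + 4 + rm;
}

/* offset the old code stored: the label sat on the delay slot,
 * which is at braf_addr + 2, so the offset is 2 bytes too large. */
static uint32_t old_offset(uint32_t braf_addr, uint32_t target)
{
	return target - (braf_addr + 2);
}

/* offset the fixed code stores: label on the braf itself, plus 4. */
static uint32_t new_offset(uint32_t braf_addr, uint32_t target)
{
	return target - (braf_addr + 4);
}
```

with a hypothetical braf at 0x1000 and __dlsym@PLT at 0x2000, braf_target(0x1000, old_offset(0x1000, 0x2000)) gives 0x2002, two bytes into the PLT entry, while braf_target(0x1000, new_offset(0x1000, 0x2000)) gives 0x2000.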
.type dlsym, @function
dlsym:
mov.l L1, r0
- braf r0
-1: mov.l @r15, r6
+1: braf r0
+ mov.l @r15, r6
.align 2
-L1: .long __dlsym@PLT-(1b-.)
+L1: .long __dlsym@PLT-(1b+4-.)