Created attachment 188440 [details] Save and Restore U-Boot registers. Using `ubldr.bin` I was getting intermittent behaviour depending on which options I compiled into U-Boot. Eventually I discovered that the issue is that the entry point in `sys/boot/arm/uboot/start.S` is not correctly saving and restoring registers. Most importantly the r9 register is the base of the global data and this is not typically a callee-saved register, but must be for U-Boot. There is a change to this file on April 2 by "ian" which saves the argument and return address but it doesn't go far enough. My patch is to applied to the state before that patch. Essentially the important caller-saved registers are pushed (r0, r1, r9, lr) before the relocation call, and popped after. Then r8/r9 are saved as usual for the syscall trampoline, and lr is stored in r8 (now free) as a callee-saved value before calling into `main`. The call to `main` can no longer be a tail call because we must restore r9 especially after main returns (although since we have used r8 to hold lr we must also restore this). I don't know how this affects the ELF version, but this patch is critical for the correct functioning of the binary version of UBLDR.
Oh, also note that in the previous commit from April 2, three registers are pushed, but I'm not sure that that is legal on ARMv7 due to the following advisory note: http://infocenter.arm.com/help/topic/com.arm.doc.ihi0046b/IHI0046B_ABI_Advisory_1.pdf I don't know exactly whether that applies in this situation, but the new code pushes an even number of registers and so is aligned according to the note.
A commit references this bug: Author: ian Date: Sun Dec 10 21:51:27 UTC 2017 New revision: 326752 URL: https://svnweb.freebsd.org/changeset/base/326752 Log: Save and restore r9 register in arm ubldr. In old gcc 4.2, r9 was a callee- saved register, but in arm EABI it may be either callee-saved or dedicated to some special purpose (such as a TLS pointer). It appears clang does not treat it as a callee-saved register (instead using it as another work register, similar to r12). Another important side effect of these changes is that saving an extra register in the push/pop statements keeps the stack aligned to an 8-byte boundary during the self_reloc() call, as it always should have been. As stated in the PR... Essentially the important caller-saved registers are pushed (r0, r1, r9, lr) before the relocation call, and popped after. Then r8/r9 are saved as usual for the syscall trampoline, and lr is stored in r8 (now free) as a callee-saved value before calling into `main`. The call to `main` can no longer be a tail call because we must restore r9 especially after main returns (although since we have used r8 to hold lr we must also restore this). PR: 224008 Changes: head/stand/arm/uboot/start.S
Patch committed, thanks. Sorry it took me so long to get to this, I'm way behind. I'll close the PR after MFCing the change to 11-stable in a few days.
(In reply to Ian Lepore from comment #3) Under two weeks is a pretty good turn around. Thanks.
A commit references this bug: Author: ian Date: Mon Dec 18 17:17:07 UTC 2017 New revision: 326934 URL: https://svnweb.freebsd.org/changeset/base/326934 Log: MFC (conceptually) r326752, r326754: This is a direct commit to 11-stable, because the code has moved and the directories have been restructured in 12-current, but it just hand- applies the same patches to the corresponding files in their old locations. r326752: Save and restore r9 register in arm ubldr. In old gcc 4.2, r9 was a callee- saved register, but in arm EABI it may be either callee-saved or dedicated to some special purpose (such as a TLS pointer). It appears clang does not treat it as a callee-saved register (instead using it as another work register, similar to r12). Another important side effect of these changes is that saving an extra register in the push/pop statements keeps the stack aligned to an 8-byte boundary during the self_reloc() call, as it always should have been. As stated in the PR... Essentially the important caller-saved registers are pushed (r0, r1, r9, lr) before the relocation call, and popped after. Then r8/r9 are saved as usual for the syscall trampoline, and lr is stored in r8 (now free) as a callee-saved value before calling into `main`. The call to `main` can no longer be a tail call because we must restore r9 especially after main returns (although since we have used r8 to hold lr we must also restore this). r326754: When building for arm arches, set PKTALIGN to the max cache line size supported by the arch, to meet u-boot's requirement that I/O be done in cache-aligned chunks. PR: 223977 224008 Changes: stable/11/sys/boot/arm/uboot/start.S stable/11/sys/boot/uboot/lib/libuboot.h