Hi, I built the current source code and boot fails with "failed to allocate staging area: 9"; 9 is EFI_OUT_OF_RESOURCES. I have a NanoPC-T4.

Loading kernel...
/boot/kernel/kernel text=0x2a8 text=0x82c250 text=0x24bb6c data=0x1b9ea8 data=0x0+0x34e000 0x8+0x132d08+0x8+0x15aaa2-
Loading configured modules...
can't find '/boot/entropy'
can't find '/etc/hostid'
Using DTB provided by EFI at 0x80e8000.
EFI framebuffer information:
addr, size     0x0, 0x0
dimensions     0 x 0
stride         0
masks          0x00000000, 0x00000000, 0x00000000, 0x00000000
failed to allocate staging area: 9
failed to allocate staging area: 9
failed to allocate staging area: 9
failed to allocate staging area: 9
failed to allocate staging area: 9
failed to allocate staging area: 9
Is there an ESP and if so, how recent is the loader in that partition?
(In reply to Graham Perrin from comment #1) I built the image with this script: https://github.com/Martinfx/BuildBoard/blob/main/create_image_nanopc_t4.sh
Same problem here. It looks like it was introduced by the llvm stack update. I narrowed it down to commit 3a9a9c0ca44ec535dcf73fe8462bee458e54814b. The problem doesn't exist at cb2ae6163174b90e999326ecec3699ee093a5d43, the last commit before the llvm merge from vendor. In my case I'm using a RockPro64 and booting via network:

Consoles: EFI console
Reading loader env vars from /efi/freebsd/loader.env
FreeBSD/arm64 EFI loader, Revision 1.1
(Thu May 19 20:51:18 IST 2022 root@capeta)
Command line arguments: nfsroot=172.16.0.1:/ztank/rockpro64
Image base: 0xf0dd2000
EFI version: 2.80
EFI Firmware: Das U-Boot (rev 8225.1792)
Console: efi (0x1000)
Load Path: /boot\loader_lua.efi
Load Device: /VenHw(e61d73b9-a384-4acc-aeab-82e828f3628b)/MAC(867402fbbe41,1)
The culprit is bootaa64.efi AKA loader_lua.efi. Using the older version from March 24 2022 solved the issue in my case. It was FreeBSD 14.0-CURRENT #3 main-n255718-f2ab9160844: Fri May 20 08:59:20 CEST 2022 on PINE64LTS that suffered from this issue.
(In reply to Danilo Egea Gondolfo from comment #3) This commit works: cb2ae6163174b90e999326ecec3699ee093a5d43
The file bootaa64.efi got stripped down a lot. The version from March 24 is 1272140 bytes long; the new version is only 1182412 bytes long. I will try to investigate further if conditions allow.
It looks like the issue is that the loop in bi_load_efi_data in bootinfo.c is too smart for clang, so it gets confused and thinks efihdr and mm don't get initialised. This causes it to remove all the code after the getenv call, meaning we return from that getenv call into efi_copy_init. efi_copy_init then enters an infinite loop, allocating memory until it runs out and complains; it is still in the loop, however, so it keeps trying and failing to allocate more memory. I have a local fix I'll push for review soon, but a workaround for now seems to be making sure efihdr and mm are initialised to NULL before the comment starting "Matthew Garrett has observed ...".
I doubt efihdr is the problem. It's likely that the call to BS->GetMemoryMap the first time round the loop, which is guaranteed to be executed, reads an uninitialised mm, and thus we have trivially provably guaranteed UB ("The value of an object with automatic storage duration is used while it is indeterminate"). What value it takes doesn't matter as the first time round the loop we use sz = 0 so, unless the memory map has 0 entries, it's guaranteed to fit, but it must be initialised to something determinate. Minimal-ish reproducer: https://godbolt.org/z/KTvd73osd
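For readers who cannot open the linked reproducer, here is a minimal sketch of the size-probing pattern described above. It is not the loader code and not the godbolt reproducer: `fake_get_memory_map`, `load_map` and `MAP_BYTES` are hypothetical names, with `fake_get_memory_map` standing in for BS->GetMemoryMap. The sketch shows the well-defined form, where the buffer pointer holds a determinate value (NULL) before the first call even though that call, made with sz = 0, never writes through it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAP_BYTES 96	/* hypothetical size of the map the "firmware" reports */

/*
 * Hypothetical stand-in for BS->GetMemoryMap. If the caller's buffer is too
 * small it only stores the required size and fails without touching buf.
 */
static int
fake_get_memory_map(size_t *sz, void *buf)
{
	if (*sz < MAP_BYTES) {
		*sz = MAP_BYTES;
		return (-1);	/* plays the role of EFI_BUFFER_TOO_SMALL */
	}
	memset(buf, 0xAB, MAP_BYTES);
	*sz = MAP_BYTES;
	return (0);
}

/* Size-probing loop: call once with sz = 0, allocate, then call again. */
static void *
load_map(size_t *out_sz)
{
	void *mm = NULL;	/* determinate before the first call */
	size_t sz = 0;

	assert(out_sz != NULL);
	for (;;) {
		/* mm is read here on every pass, including the first. */
		if (fake_get_memory_map(&sz, mm) == 0)
			break;
		free(mm);	/* free(NULL) is a no-op on the first pass */
		mm = malloc(sz);
		if (mm == NULL)
			return (NULL);
	}
	*out_sz = sz;
	return (mm);
}
```

Declaring `void *mm;` without the initialiser would make the first call read an indeterminate value, which is the undefined behaviour quoted above; the compiler is then free to treat everything reachable only through that call as dead.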
Just FYI: My report at https://lists.freebsd.org/archives/freebsd-arm/2022-May/001354.html is possibly the same sort of issue. Both aarch64 and armv7 systems had the system-llvm14 based build of loader.efi crash very early in the boot. I reverted just the loader.efi files to copies that predated my move to the system llvm14 basis (so, back to late April) and that made things work. But amd64's update from the same source tree contents did not have problems using its llvm14 based loader.efi. (No other FreeBSD platforms around to try.) So I only have evidence for armv7 and aarch64 problems.

I'll note that the aarch64 context uses an EDK2 based UEFI/ACPI and the armv7 context uses a U-Boot 2022.04 based UEFI (not ACPI). Both have been in use for some time before this upgrade activity and were not changed.

My installed contexts are from non-debug buildworld buildkernel based builds, despite booting main [so: 14]. I could install debug kernels and see what they report if I also put back the loader.efi copies built via llvm14. Right now that does not look likely to be useful.

For reference for the failure contexts:

aarch64: MACCHIATObin Double Shot
armv7: Orange Pi+ 2ed

I have access to more aarch64 contexts and a RPi2 v1.1 (armv7). But I avoided upgrading the loader.efi copies on any boot media for either type after the first one of each type got the boot failures, and I reverted on the media that showed the failure for each type.
(In reply to Mark Millard from comment #9) Looking at the information that the armv7 context reported:

. . .
Hit [Enter] to boot immediately, or any other key for command prompt.
Booting [/boot/kernel/kernel]...
Using DTB provided by EFI at 0x47edf000.
Kernel entry at 0xb2e00200...
Kernel args: (null)
undefined instruction
pc : [<b8dd34a4>]          lr : [<b8e3128c>]
reloc pc : [<44e3f4a4>]    lr : [<44e9d28c>]
sp : b9f6a328  ip : b69e1c00     fp : b9f6a368
r10: b9f6a374  r9 : 00000000     r8 : b8f1f11c
r7 : c0e03000  r6 : 00008000     r5 : b6981500  r4 : 00000000
r3 : 00000065  r2 : 00000076     r1 : b8f1b847  r0 : 00000000
Flags: nZCv  IRQs off  FIQs off  Mode SVC_32
Code: e08f0000 e1a0e00f ea01776e 00144492 (00146ddf)
UEFI image [0xb8dd3000:0xb8f2632b] pc=0x4a4 '/efi\boot\bootarm.efi'
Resetting CPU ...

That Code sequence appears at:

Disassembly of section .text:

00000018 <efi_start>:
. . .
00000178 <bi_load>:
. . .
     4ac: e08f0000  add   r0, pc, r0
     4b0: e1a0e00f  mov   lr, pc
     4b4: ea01776e  b     5e274 <getenv>
     4b8: 00144492  .word 0x00144492
. . .

It is also the only place in stand/efi/loader_lua/loader_lua.sym.full with that code sequence. Showing some more context, including 0x4a4:

. . .
     484: eb003ae5  bl    f020 <file_addmetadata>
     488: e59f3058  ldr   r3, [pc, #88]   ; 4e8 <bi_load+0x370>
     48c: e1a00005  mov   r0, r5
     490: e3a0100c  mov   r1, #12
     494: e3a02004  mov   r2, #4
     498: e79f3003  ldr   r3, [pc, r3]
     49c: eb003adf  bl    f020 <file_addmetadata>
     4a0: e1a00005  mov   r0, r5
     4a4: eb01fec5  bl    7ffc0 <geli_export_key_metadata>
     4a8: e59f003c  ldr   r0, [pc, #60]   ; 4ec <bi_load+0x374>
     4ac: e08f0000  add   r0, pc, r0
     4b0: e1a0e00f  mov   lr, pc
     4b4: ea01776e  b     5e274 <getenv>
     4b8: 00144492  .word 0x00144492
     4bc: 00146ddf  .word 0x00146ddf
     4c0: 00146dd2  .word 0x00146dd2
     4c4: 0014402f  .word 0x0014402f
     4c8: 0014fb74  .word 0x0014fb74
     4cc: 0014fad8  .word 0x0014fad8
     4d0: 00147610  .word 0x00147610
     4d4: 0014a23b  .word 0x0014a23b
     4d8: 0014a367  .word 0x0014a367
     4dc: 0014b6c4  .word 0x0014b6c4
     4e0: 00144892  .word 0x00144892
     4e4: 001428f8  .word 0x001428f8
     4e8: 0014f8e4  .word 0x0014f8e4
     4ec: 001483ab  .word 0x001483ab
     4f0: 00144c81  .word 0x00144c81
     4f4: 001484fc  .word 0x001484fc

000004f8 <efi_copy_init>:
. . .

The bl to bi_load is in:

0000a2b4 <elf32_arm_exec>:
    a2b4: e92d4830  push  {r4, r5, fp, lr}
    a2b8: e28db008  add   fp, sp, #8
    a2bc: e24dd008  sub   sp, sp, #8
    a2c0: e3a01002  mov   r1, #2
    a2c4: e1a05000  mov   r5, r0
    a2c8: eb00173d  bl    ffc4 <file_findmetadata>
    a2cc: e3500000  cmp   r0, #0
    a2d0: 0a000016  beq   a330 <elf32_arm_exec+0x7c>
    a2d4: e1a04000  mov   r4, r0
    a2d8: eb013172  bl    568a8 <efi_time_fini>
    a2dc: e5940024  ldr   r0, [r4, #36]   ; 0x24
    a2e0: ebffd8b0  bl    5a8 <efi_translate>
    a2e4: e1a04000  mov   r4, r0
    a2e8: e59f006c  ldr   r0, [pc, #108]  ; a35c <elf32_arm_exec+0xa8>
    a2ec: e1a01004  mov   r1, r4
    a2f0: e08f0000  add   r0, pc, r0
    a2f4: eb01527f  bl    5ecf8 <printf>
    a2f8: e5951008  ldr   r1, [r5, #8]
    a2fc: e59f005c  ldr   r0, [pc, #92]   ; a360 <elf32_arm_exec+0xac>
    a300: e08f0000  add   r0, pc, r0
    a304: eb01527b  bl    5ecf8 <printf>
    a308: e5950008  ldr   r0, [r5, #8]
    a30c: e28d1004  add   r1, sp, #4
    a310: e1a0200d  mov   r2, sp
    a314: e3a03001  mov   r3, #1
    a318: ebffd796  bl    178 <bi_load>
    a31c: e3500000  cmp   r0, #0
    a320: 0a000006  beq   a340 <elf32_arm_exec+0x8c>
    a324: e1a05000  mov   r5, r0
    a328: eb013132  bl    567f8 <efi_time_init>
    a32c: ea000000  b     a334 <elf32_arm_exec+0x80>
    a330: e3a0504f  mov   r5, #79         ; 0x4f
    a334: e1a00005  mov   r0, r5
    a338: e24bd008  sub   sp, fp, #8
    a33c: e8bd8830  pop   {r4, r5, fp, pc}
    a340: eb0011a5  bl    e9dc <dev_cleanup>
    a344: e59d0004  ldr   r0, [sp, #4]
    a348: e12fff34  blx   r4
    a34c: e59f0010  ldr   r0, [pc, #16]   ; a364 <elf32_arm_exec+0xb0>
    a350: e08f0000  add   r0, pc, r0
    a354: e1a0e00f  mov   lr, pc
    a358: ea015251  b     5eca4 <panic>
    a35c: 0013f3bc  .word 0x0013f3bc
    a360: 0013b5f5  .word 0x0013b5f5
    a364: 00139826  .word 0x00139826

Maybe the above will prompt some insight into the problem.
(In reply to Mark Millard from comment #10) Hmm:

void
geli_export_key_metadata(struct preloaded_file *kfp)
{
    struct keybuf *keybuf;

    keybuf = malloc(GELI_KEYBUF_SIZE);
    geli_export_key_buffer(keybuf);
    file_addmetadata(kfp, MODINFOMD_KEYBUF, GELI_KEYBUF_SIZE, keybuf);
    explicit_bzero(keybuf, GELI_KEYBUF_SIZE);
    free(keybuf);
}

Is there no possibility of malloc failing and leaving a bad-to-use keybuf value? (But I'm not familiar with the libsa expectations for how things operate.)
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=0d6600b579be769b85f049ef421023316f21b5c3

commit 0d6600b579be769b85f049ef421023316f21b5c3
Author:     Andrew Turner <andrew@FreeBSD.org>
AuthorDate: 2022-05-21 10:45:41 +0000
Commit:     Andrew Turner <andrew@FreeBSD.org>
CommitDate: 2022-05-21 10:45:41 +0000

    Set mm before passing it to the UEFI firmware

    When reading the UEFI memory map we pass in a pointer to the memory to
    hold the map. Unfortunately it wasn't initialised before the first use
    so clang decided it was undefined behaviour so the entire loop was
    removed. This leads to everything in bi_load after this to also be
    removed as dead code.

    The next function after bi_load in the binary is efi_copy_init. The
    above caused us to enter efi_copy_init with a return address of the
    start of the function. Because of this it would enter an infinite loop
    of calling the function, allocating memory, then returning to the start
    of the function.

    PR:     264021

 stand/efi/loader/bootinfo.c | 1 +
 1 file changed, 1 insertion(+)
(In reply to commit-hook from comment #12) It fixed the problem for me. Thanks, Andrew.
(In reply to commit-hook from comment #12) Updating to a vintage that includes the commit led to the updated loader.efi working fine for aarch64 and for armv7 in my context, taking care of what I'd reported on the arm list and here. Thanks, Andrew.

Side note: geli_export_key_metadata still assumes that:

keybuf = malloc(GELI_KEYBUF_SIZE);

will return keybuf != NULL, so pointing to usable memory. But I looked, and it appears that in the libsa context malloc is structured so that it can return NULL, even though in my contexts it does not for this call.
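To illustrate the side note, here is a minimal sketch of a checked allocation in the same shape as the function quoted earlier. It is not a proposed patch and uses no libsa or GELI code: `export_key_checked`, `failing_alloc`, `alloc_fn` and `KEYBUF_SIZE` are hypothetical names, the two `memset`/`memcpy` calls merely stand in for the key export and metadata steps, and the allocator is passed in only so that the failure path can be exercised.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define KEYBUF_SIZE 64	/* hypothetical; stands in for GELI_KEYBUF_SIZE */

typedef void *(*alloc_fn)(size_t);

/* Hypothetical allocator that always fails, to exercise the error path. */
static void *
failing_alloc(size_t n)
{
	(void)n;
	return (NULL);
}

/* Returns 0 on success, -1 if the buffer could not be allocated. */
static int
export_key_checked(alloc_fn alloc, unsigned char *dst)
{
	unsigned char *keybuf;

	assert(alloc != NULL && dst != NULL);
	keybuf = alloc(KEYBUF_SIZE);
	if (keybuf == NULL)
		return (-1);	/* bail out instead of using a NULL buffer */

	memset(keybuf, 0x5A, KEYBUF_SIZE);	/* stand-in for the key export step */
	memcpy(dst, keybuf, KEYBUF_SIZE);	/* stand-in for the metadata step */
	/* The quoted function uses explicit_bzero; a plain memset may be elided. */
	memset(keybuf, 0, KEYBUF_SIZE);
	free(keybuf);
	return (0);
}
```

Calling `export_key_checked(failing_alloc, out)` takes the early return, while `export_key_checked(malloc, out)` runs the full path. Whether the real function should report the failure or skip the metadata silently is a design decision for the maintainers.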
(In reply to Jessica Clarke from comment #8) Could you include the reproducer from comment 8 as an attachment here please, and might it be usable as a base for a regression test?
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=e7219c3818d1554830555278a32934c9fb4e7ac3

commit e7219c3818d1554830555278a32934c9fb4e7ac3
Author:     Andrew Turner <andrew@FreeBSD.org>
AuthorDate: 2022-05-21 10:45:41 +0000
Commit:     Andrew Turner <andrew@FreeBSD.org>
CommitDate: 2022-06-07 14:20:18 +0000

    Set mm before passing it to the UEFI firmware

    When reading the UEFI memory map we pass in a pointer to the memory to
    hold the map. Unfortunately it wasn't initialised before the first use
    so clang decided it was undefined behaviour so the entire loop was
    removed. This leads to everything in bi_load after this to also be
    removed as dead code.

    The next function after bi_load in the binary is efi_copy_init. The
    above caused us to enter efi_copy_init with a return address of the
    start of the function. Because of this it would enter an infinite loop
    of calling the function, allocating memory, then returning to the start
    of the function.

    PR:     264021

    (cherry picked from commit 0d6600b579be769b85f049ef421023316f21b5c3)

 stand/efi/loader/bootinfo.c | 1 +
 1 file changed, 1 insertion(+)