With the current upstream stable release of Petitboot (v1.15), FreeBSD fails to boot. The last output seen on the console comes from the bootloader itself:
SIGTERM received, booting...
In debugging mode, the last kernel messages from the bootloader are:
SIGTERM received, booting...
[ 166.762499] xhci_hcd 0003:01:00.0: remove, state 1
[ 166.762554] usb usb4: USB disconnect, device number 1
[ 166.764014] xhci_hcd 0003:01:00.0: USB bus 4 deregistered
[ 166.764126] xhci_hcd 0003:01:00.0: remove, state 1
[ 166.764180] usb usb3: USB disconnect, device number 1
[ 166.764261] usb 3-3: USB disconnect, device number 2
[ 166.764309] usb 3-3.1: USB disconnect, device number 4
[ 167.017917] usb 3-3.4: USB disconnect, device number 6
[ 167.194222] usb 3-4: USB disconnect, device number 3
[ 167.194271] usb 3-4.4: USB disconnect, device number 5
[ 167.340586] xhci_hcd 0003:01:00.0: USB bus 3 deregistered
[ 167.340793] xhci_hcd 0001:01:00.0: remove, state 1
[ 167.340841] usb usb2: USB disconnect, device number 1
[ 167.340895] usb 2-3: USB disconnect, device number 2
[ 167.352973] xhci_hcd 0001:01:00.0: USB bus 2 deregistered
[ 167.353095] xhci_hcd 0001:01:00.0: remove, state 1
[ 167.353141] usb usb1: USB disconnect, device number 1
[ 167.353200] usb 1-1: USB disconnect, device number 2
[ 167.516817] xhci_hcd 0001:01:00.0: USB bus 1 deregistered
[ 168.212042] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[ 168.217010] kexec_core: Starting new kernel
[ 168.229031] kexec: waiting for cpu 4 (physical 12) to enter 1 state
[ 168.229955] kexec: waiting for cpu 6 (physical 14) to enter 1 state
[ 168.230028] kexec: wait
This failure is observable with the latest Raptor PNOR version 2.10. A temporary workaround is to downgrade the platform PNOR firmware to version 2.00 until this bug is resolved.
A few notes for anyone looking at this bug:
* The issue may be somewhere in kexec-lite *or* in how the FreeBSD kernel is "packaged", i.e. the exact section layout.
* kexec-lite was upgraded in the new firmware, and issues were noted with kexec-lite back in the 2018 timeframe; however, all previously known required patches for kexec-lite were included in the new firmware release.
* kexec-lite may not receive much testing with BSD kernels.
* The Linux kernel was also updated to 6.6.y, and it's possible that new security features are interfering with the kexec() call when used with FreeBSD kernels.
Looks like FreeBSD is causing a checkstop:
20.47361|================================================
20.49547|Error reported by prdf (0xE500) EID 0x900000A8
20.49548| PRD Signature : 0x70001 0xDD3F000E
20.53654| Signature Description : pu.core:k0:n0:s0:p00:c1 (COREFIR[14]) Machine check and ME = 0 Err
20.53772| UserData1 : 0x0007000100000101
20.53773| UserData2 : 0xdd3f000e00000000
20.53774|------------------------------------------------
20.53774| Callout type : Procedure Callout
20.53775| Procedure : UNKNOWN: 0x11
20.53776| Priority : SRCI_PRIORITY_HIGH
20.53777|------------------------------------------------
20.53778| Callout type : Hardware Callout
20.53781| Target : Physical:/Sys0/Node0/Proc0/EQ0/EX0/Core1
20.53782| Deconfig State : NO_DECONFIG
20.53782| GARD Error Type : GARD_NULL
20.53783| Priority : SRCI_PRIORITY_MED
20.53784|------------------------------------------------
20.53785| System checkstop occurred during runtime on previous boot
20.53786|------------------------------------------------
20.53787| Hostboot Build ID:
20.53788|================================================
From what I understand, this checkstop signature is caused by a double fault, i.e. a machine check exception triggered inside of another machine check exception handler.
After further investigation, the boot process changed somewhat with the newer kernels setting CONFIG_LOCK_DOWN_KERNEL_FORCE_INTEGRITY, meaning kexec_file_load() is used instead of kexec_load(). This is obviously something we want to support for security reasons, but it appears this also broke the FreeBSD boot process. Adding Brandon on CC...thoughts welcome!
What is the difference between kexec_load() and kexec_file_load()? Do you have any other insights into how the boot has changed between then and now?
(In reply to Justin Hibbits from comment #4)
kexec_file_load() appears to hand all the parsing off to the Linux kernel:
https://github.com/antonblanchard/kexec-lite/blob/6b0130b3c1ea489e061cda2805e6f8b68dc96a76/kexec.c#L798
The original kexec_load() function:
https://github.com/antonblanchard/kexec-lite/blob/6b0130b3c1ea489e061cda2805e6f8b68dc96a76/kexec.c#L157
Either the Linux kernel is also parsing the ELF wrong (which is entirely possible!) or FreeBSD isn't 100% in compliance with the specification. I don't know which one it is but would suspect that the same bug that was present in kexec-lite [1] could easily be present in the Linux kernel somewhere...
[1] https://github.com/antonblanchard/kexec-lite/commit/666c8464fd8e0ab2bc6f80aed393c97445b9a479
Does https://reviews.freebsd.org/D44015 solve this problem? Or is it for another problem? I want to make sure it's tagged appropriately.
(In reply to Justin Hibbits from comment #6) Yes, it appears it does. I have successfully booted FreeBSD with this patch on the 2.10 PNOR.
A commit in branch main references this bug:
URL: https://cgit.FreeBSD.org/src/commit/?id=b52dceb838116391996909ff50b49f950ee01f48
commit b52dceb838116391996909ff50b49f950ee01f48
Author: Shawn Anastasio <sanastasio@raptorengineering.com>
AuthorDate: 2024-02-27 19:40:50 +0000
Commit: Justin Hibbits <jhibbits@FreeBSD.org>
CommitDate: 2024-02-29 03:01:15 +0000
powerpc: Bump maximum number of FDT reserved mem entries
Newer firmware on POWER systems, including v2.10 of the Talos II and Blackbird firmware can end up reserving more than 32 memory regions in the device tree, which exceeded an assumption made by ofw_machdep.c's excise_fdt_reserved(). Bump the maximum number of FDT reservations to the next power of 2 in order to fix booting on newer firmware.
PR: 277097
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D44015
sys/powerpc/ofw/ofw_machdep.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Can you MFC it to stable/14?
A commit in branch stable/14 references this bug:
URL: https://cgit.FreeBSD.org/src/commit/?id=44aac9115f75fc98c955758c202e0025ae4a0876
commit 44aac9115f75fc98c955758c202e0025ae4a0876
Author: Shawn Anastasio <sanastasio@raptorengineering.com>
AuthorDate: 2024-02-27 19:40:50 +0000
Commit: Justin Hibbits <jhibbits@FreeBSD.org>
CommitDate: 2024-04-02 02:09:57 +0000
powerpc: Bump maximum number of FDT reserved mem entries
Newer firmware on POWER systems, including v2.10 of the Talos II and Blackbird firmware can end up reserving more than 32 memory regions in the device tree, which exceeded an assumption made by ofw_machdep.c's excise_fdt_reserved(). Bump the maximum number of FDT reservations to the next power of 2 in order to fix booting on newer firmware.
PR: 277097
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D44015
(cherry picked from commit b52dceb838116391996909ff50b49f950ee01f48)
sys/powerpc/ofw/ofw_machdep.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
^Triage: committed and MFCed back in 202.