See the picture at https://people.freebsd.org/~rene/stuff/IMG_20250504_112114_565.jpg: the kernel fails to load (iiuc) after this point. Toggling any of ACPI, safe mode, single user, or verbose does not help. The keyboard is unresponsive except for the power button, and the fan(s) are spinning at full throttle. Reverting to an earlier boot environment (kernel) fixes things for now. This is a pkgbase FreeBSD:15:amd64/base_latest installation on a TUXEDO InfinityBook Pro Gen7 (MK1) laptop; see https://wiki.freebsd.org/Laptops/Tuxedo_InfinityBook_Pro16_Gen7_MK1 for further specifications. I'll try to bisect if that is desired.
Forgot to mention that updating the bootloader in /boot/efi/efi/freebsd to the latest version from /boot/loader.efi did not help either.
Yes, some bisection is needed to have any hope of tracking this down.
I git-bisect'ed the offending commit down to 9cdd40759617b15fdd6939d33f67aa2c9d2a6b1e:

Author: Chandrakanth patil <chandrakanth.patil@broadcom.com>
Date:   Sun Apr 27 17:40:45 2025 -0600

    mpi3mr: Allow driver to be in-kenrel and add to GENERIC
Running a kernel from commit ed1b3f13e72a and loading mpi3mr from that commit using kldload works fine.
Running a kernel from f4d51d3e1a90dabbed26aacf1b58e20e23a19342 (the commit right before the problematic commit) and manually loading mpi3mr from that same revision works fine too. In my case, no devices are attached (as expected):

27    1 0xffffffff84e1a000    1e3d0 mpi3mr.ko (/boot/kernel/mpi3mr.ko)
        Contains modules:
                 Id Name
                532 pci/mpi3mr
This problem seems specific to my TUXEDO laptop: a pkgbase FreeBSD:15:amd64/base_latest kernel from 75d173a84836 (so with mpi3mr built-in) boots fine on my Asus ROG GL553VW laptop (also without an mpi3mr device).
Some more data points:
- building a GENERIC kernel from late yesterday src/main with nodevice mpi3mr hangs at boot
- building the same GENERIC but with commit 9cdd40759617 (and 2f721943bf20) reverted boots fine.
^Triage: Cc: committer of 9cdd40759617b15fdd6939d33f67aa2c9d2a6b1e .
What happens if you run "copy_staging enable" at the loader prompt before booting the kernel?
(In reply to Mark Johnston from comment #9) That seems to fix it:

rene@tuxedo:~ $ uname -a ; sysctl kern.bootfile
FreeBSD tuxedo 15.0-CURRENT FreeBSD 15.0-CURRENT main-n276998-8ba4d145d351 GENERIC amd64
kern.bootfile: /boot/kernel.old/kernel
rene@tuxedo:~ $ ls -l /boot/kernel.old/kernel
-r--r--r-- 1 root wheel 31706024 May 5 00:55 /boot/kernel.old/kernel
rene@tuxedo:~ $ kldstat -v|grep mpi3mr
192 pci/mpi3mr
(In reply to Rene Ladan from comment #10) This is mpi3mr built into the kernel itself, so no line mentioning mpi3mr.ko in the grep output.
What is the size of the kernel that works? What about the one that fails? It may be a size thing, since mpi3mr is kinda boring when there are no devices present in the system, and certainly before cninit(). And what happens if you aggressively prune TUXEDO but leave mpi3mr in place? And can you get a boot verbose that includes the memory layout for a working kernel?
(In reply to Warner Losh from comment #12) Rene is running a GENERIC kernel. Since changing the EFI staging mode works around the problem, I guess the loader or the early amd64 boot code has some kind of bug. It'd be useful to see output from "sysctl machdep.efi_map" and "sysctl vm.phys_segs" from a working kernel.
Working kernel (i.e. with mpi3mr inclusion from commit 9cdd40759617 and 2f721943bf20 backed out):

-r--r--r-- 1 root wheel 31686840 May 5 12:54 /boot/kernel/kernel
rene@tuxedo:~/oss/freebsd/ports/main $ strings /boot/kernel/kernel |grep -c mpi3mr
0

Size of failing kernel (without setting copy_staging to enable in the loader prompt), so GENERIC from pkgbase:

-r--r--r-- 1 root wheel 31706024 May 5 00:55 /boot/kernel.old/kernel
rene@tuxedo:~/oss/freebsd/ports/main $ strings /boot/kernel.old/kernel |grep -c mpi3mr
288

So not a significant increase, but perhaps crossing a magic boundary? I'll get the bootverbose output later.
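For reference, the size increase can be worked out from the two kernel sizes quoted above. This is only a small arithmetic sketch in POSIX shell; the variable names are made up for illustration, and it says nothing about where a possible boundary lies.

```shell
# Kernel sizes in bytes, copied from the ls -l output in this comment.
working=31686840   # mpi3mr commits backed out, boots fine
failing=31706024   # GENERIC from pkgbase, mpi3mr built in

# Difference between the failing and the working kernel.
delta=$((failing - working))
echo "delta: ${delta} bytes"
```

This prints "delta: 19184 bytes", i.e. a bit under 19 KiB on a kernel of roughly 30 MiB.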
Created attachment 260215: output of machdep.efi_map
Created attachment 260216: output of vm.phys_segs
The attachments in #15 and #16 are from *this* kernel (i.e. GENERIC from pkgbase with mpi3mr built-in and copy_staging enabled in the loader).
(In reply to Rene Ladan from comment #14) Hmm, got confused, the first kernel listed is a working one but with mpi3mr inclusion backed out.
Maybe try increasing the staging slop.
Rene, can you try booting a kernel after running "staging_slop 16777216" at the loader prompt? Don't change the copy_staging mode this time.
(In reply to Mark Johnston from comment #20) Just increasing staging_slop to 16M works too.
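As a quick sanity check, the value passed to staging_slop earlier in this thread is exactly 16 MiB written out in bytes. A minimal sketch in POSIX shell, assuming the loader command takes a plain byte count as in the quoted command (the variable name is hypothetical):

```shell
# 16 MiB expressed in bytes.
slop_bytes=$((16 * 1024 * 1024))

# Print the loader command as it was typed at the loader prompt.
echo "staging_slop ${slop_bytes}"
```

This prints "staging_slop 16777216", matching the command that was suggested.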
Hmmmm... any chance we can calculate the slop better or move whatever it's stepping on?
This morning (i.e. 30 minutes ago) the laptop would not boot the modified kernel either unless I enabled copy_staging at the loader (increasing the slop did not work). The kernel and modules have not changed since 2025-05-05.
(In reply to Rene Ladan from comment #23) Computers do not behave this way. The best I can suggest is to investigate whether your machine has some 'enterprise' management features like Intel vPro or an AMD equivalent. Typically they are disabled by default. With management enabled, you get a remote serial port, which is usable as an early console and might help to understand why the kernel or loader fails during hand-off or early boot.
(In reply to Konstantin Belousov from comment #24) A less satisfying and much more labor-intensive option is to systematically insert an infinite loop at consecutive locations on the bootstrap path (mostly hammer_time()) to identify the line where the reboot turns into a hang.
The other alternative is to ignore more of the EFI memory types, since it's well known that EFI implementations keep using things too long. If that's the root cause, that will be a big problem... I can put together a patch.
(In reply to Warner Losh from comment #26) Could you please explain how ignoring EFI memory types should help? It might indeed disturb the memory map enough to give temporary relief, but otherwise I do not see it.
(In reply to Konstantin Belousov from comment #24) I looked through the Setup menu options but did not see anything that would enable a management mode providing serial console access.