Bug 286579 - [boot] [mpi3mr?] kernel fails to boot after commit 9cdd40759617
Summary: [boot] [mpi3mr?] kernel fails to boot after commit 9cdd40759617
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 15.0-CURRENT
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: freebsd-bugs (Nobody)
URL:
Keywords: regression
Depends on:
Blocks:
 
Reported: 2025-05-04 15:19 UTC by Rene Ladan
Modified: 2025-05-14 14:58 UTC
CC List: 3 users

See Also:


Attachments
output of machdep.efi_map (5.59 KB, text/plain)
2025-05-06 20:29 UTC, Rene Ladan
output of vm.phys_segs (902 bytes, text/plain)
2025-05-06 20:30 UTC, Rene Ladan

Description Rene Ladan freebsd_committer freebsd_triage 2025-05-04 15:19:28 UTC
See the picture at https://people.freebsd.org/~rene/stuff/IMG_20250504_112114_565.jpg: if I understand correctly, the kernel fails to load after this point. Toggling any of ACPI, safe mode, single user, or verbose does not help. The keyboard is unresponsive except for the power button, and the fan(s) are spinning at full throttle.

Reverting to an earlier boot environment (kernel) fixes things for now.

This is a pkgbase FreeBSD:15:amd64/base_latest installation on a TUXEDO InfinityBook Pro Gen7 (MK1) laptop, see https://wiki.freebsd.org/Laptops/Tuxedo_InfinityBook_Pro16_Gen7_MK1 for further specifications. I'll try to bisect if that is desired.
Comment 1 Rene Ladan freebsd_committer freebsd_triage 2025-05-04 15:25:42 UTC
Forgot to mention that updating the bootloader in /boot/efi/efi/freebsd to the latest version from /boot/loader.efi did not help either.
Comment 2 Mark Johnston freebsd_committer freebsd_triage 2025-05-04 15:26:57 UTC
Yes, some bisection is needed to have any hope of tracking this down.
Comment 3 Rene Ladan freebsd_committer freebsd_triage 2025-05-04 19:15:27 UTC
I used git bisect to narrow the offending commit down to 9cdd40759617b15fdd6939d33f67aa2c9d2a6b1e

Author: Chandrakanth patil <chandrakanth.patil@broadcom.com>
Date:   Sun Apr 27 17:40:45 2025 -0600

    mpi3mr: Allow driver to be in-kenrel and add to GENERIC
Comment 4 Rene Ladan freebsd_committer freebsd_triage 2025-05-04 19:25:15 UTC
Running a kernel from commit ed1b3f13e72a and loading mpi3mr from that commit using kldload works fine.
Comment 5 Rene Ladan freebsd_committer freebsd_triage 2025-05-04 19:43:37 UTC
Running a kernel from f4d51d3e1a90dabbed26aacf1b58e20e23a19342 (the commit right before the problematic commit) and manually loading mpi3mr from that same revision works fine too. In my case, no devices are attached (as expected):

27    1 0xffffffff84e1a000    1e3d0 mpi3mr.ko (/boot/kernel/mpi3mr.ko)
	Contains modules:
		 Id Name
		532 pci/mpi3mr
Comment 6 Rene Ladan freebsd_committer freebsd_triage 2025-05-04 19:56:48 UTC
This problem seems specific to my TUXEDO laptop: a pkgbase FreeBSD:15:amd64/base_latest kernel from 75d173a84836 (so with mpi3mr built in) boots fine on my Asus ROG GL553VW laptop (which also has no mpi3mr device).
Comment 7 Rene Ladan freebsd_committer freebsd_triage 2025-05-05 11:44:33 UTC
Some more data points:
- a GENERIC kernel built from src/main as of late yesterday, with nodevice mpi3mr, hangs at boot
- the same GENERIC kernel, but with commits 9cdd40759617 (and 2f721943bf20) reverted, boots fine.
Comment 8 Mark Linimon freebsd_committer freebsd_triage 2025-05-06 02:55:00 UTC
^Triage: Cc: committer of 9cdd40759617b15fdd6939d33f67aa2c9d2a6b1e .
Comment 9 Mark Johnston freebsd_committer freebsd_triage 2025-05-06 17:09:38 UTC
What happens if you run "copy_staging enable" at the loader prompt before booting the kernel?
Comment 10 Rene Ladan freebsd_committer freebsd_triage 2025-05-06 19:29:55 UTC
(In reply to Mark Johnston from comment #9)

That seems to fix it:

rene@tuxedo:~ $ uname -a ; sysctl kern.bootfile
FreeBSD tuxedo 15.0-CURRENT FreeBSD 15.0-CURRENT main-n276998-8ba4d145d351 GENERIC amd64
kern.bootfile: /boot/kernel.old/kernel
rene@tuxedo:~ $ ls -l /boot/kernel.old/kernel
-r--r--r--  1 root wheel 31706024 May  5 00:55 /boot/kernel.old/kernel
rene@tuxedo:~ $ kldstat -v|grep mpi3mr
		192 pci/mpi3mr
Comment 11 Rene Ladan freebsd_committer freebsd_triage 2025-05-06 19:32:12 UTC
(In reply to Rene Ladan from comment #10)

This is mpi3mr built into the kernel itself, so no line mentioning mpi3mr.ko in the grep output.
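A built-in driver shows up in kldstat -v output only as a module listed under the kernel file, with no separate .ko line, which is why the grep above finds no mpi3mr.ko. A minimal Python sketch of that check follows. It assumes the column layout shown in comment 5 (Id, Refs, Address, Size, Name, path); the helper name module_container is hypothetical, and the address and size in the built-in sample are made up for illustration (only module Id 192 comes from comment 10).

```python
import re
from typing import Optional

# File line in `kldstat -v` output: Id, Refs, Address, Size, Name (path).
# Assumes the layout shown in comment 5; other versions may format differently.
FILE_RE = re.compile(r"^\s*\d+\s+\d+\s+0x[0-9a-fA-F]+\s+[0-9a-fA-F]+\s+(\S+)")
# Module line under "Contains modules:": numeric Id followed by the module name.
MODULE_RE = re.compile(r"^\s*\d+\s+(\S+)\s*$")


def module_container(kldstat_output: str, module: str) -> Optional[str]:
    """Return the file ("kernel", "mpi3mr.ko", ...) whose module list
    contains `module`, or None if the module does not appear at all."""
    current_file = None
    for line in kldstat_output.splitlines():
        file_match = FILE_RE.match(line)
        if file_match:
            current_file = file_match.group(1)
            continue
        mod_match = MODULE_RE.match(line)
        if mod_match and mod_match.group(1) == module:
            return current_file
    return None


# Output from comment 5: mpi3mr loaded with kldload as a separate .ko file.
LOADED_AS_KO = """\
27    1 0xffffffff84e1a000    1e3d0 mpi3mr.ko (/boot/kernel/mpi3mr.ko)
\tContains modules:
\t\t Id Name
\t\t532 pci/mpi3mr
"""

# Hypothetical, abbreviated output for a kernel with mpi3mr built in
# (address and size are invented; module Id 192 is from comment 10).
BUILT_IN = """\
 1  100 0xffffffff80200000  2000000 kernel (/boot/kernel.old/kernel)
\tContains modules:
\t\t Id Name
\t\t192 pci/mpi3mr
"""

for label, output in (("kldload", LOADED_AS_KO), ("built-in", BUILT_IN)):
    container = module_container(output, "pci/mpi3mr")
    print(f"{label}: pci/mpi3mr provided by {container}")
```

If the function returns "kernel", the driver is compiled in; if it returns a .ko name, it was loaded separately.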
Comment 12 Warner Losh freebsd_committer freebsd_triage 2025-05-06 20:11:28 UTC
What is the size of the kernel that works? What about the one that fails? It may be a size thing, since mpi3mr is kinda boring when there are no devices present in the system, and certainly before cninit().

And what happens if you aggressively prune TUXEDO but leave mpi3mr in place?

And can you get a boot verbose that includes the memory layout for a working kernel?
Comment 13 Mark Johnston freebsd_committer freebsd_triage 2025-05-06 20:23:23 UTC
(In reply to Warner Losh from comment #12)
Rene is running a GENERIC kernel. Since changing the EFI staging mode works around the problem, I guess the loader or the early amd64 boot code has some kind of bug.

It'd be useful to see output from "sysctl machdep.efi_map" and "sysctl vm.phys_segs" from a working kernel.
Comment 14 Rene Ladan freebsd_committer freebsd_triage 2025-05-06 20:24:52 UTC
Working kernel (i.e. with mpi3mr inclusion from commit 9cdd40759617 and 2f721943bf20 backed out):
-r--r--r--  1 root wheel 31686840 May  5 12:54 /boot/kernel/kernel
rene@tuxedo:~/oss/freebsd/ports/main $ strings /boot/kernel/kernel |grep -c mpi3mr
0

Size of the failing kernel (GENERIC from pkgbase, booted without enabling copy_staging at the loader prompt):
-r--r--r--  1 root wheel 31706024 May  5 00:55 /boot/kernel.old/kernel
rene@tuxedo:~/oss/freebsd/ports/main $ strings /boot/kernel.old/kernel |grep -c mpi3mr
288

So not a significant increase, but perhaps crossing a magic boundary?

I'll get the bootverbose output later.
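The "magic boundary" question can be checked with simple arithmetic: for each power-of-two alignment, see whether the two kernel file sizes above need a different number of aligned blocks. The Python sketch below does that. It assumes the on-disk file size is a fair stand-in for the loaded footprint, which it is not exactly (the loader also places BSS, modules and boot metadata), so treat it as a rough check only; the function names are hypothetical.

```python
# Kernel file sizes from comment 14 (bytes, as reported by ls -l).
WORKING = 31686840  # GENERIC with the mpi3mr commits backed out
FAILING = 31706024  # pkgbase GENERIC with mpi3mr built in


def units_needed(size: int, align: int) -> int:
    """Number of `align`-sized blocks required to hold `size` bytes."""
    return -(-size // align)  # ceiling division


def crossed_alignments(a: int, b: int, lo_shift: int = 12, hi_shift: int = 24) -> list:
    """Power-of-two alignments (4 KiB .. 16 MiB by default) at which the two
    sizes need a different number of blocks, i.e. a boundary lies between them."""
    return [
        1 << shift
        for shift in range(lo_shift, hi_shift + 1)
        if units_needed(a, 1 << shift) != units_needed(b, 1 << shift)
    ]


print("growth in bytes:", FAILING - WORKING)
print("alignments crossed:", crossed_alignments(WORKING, FAILING))
```

For these two sizes the growth is 19184 bytes and only the 4, 8 and 16 KiB boundaries are crossed; no 1 MiB or 2 MiB boundary lies between them, which hints that any boundary effect would have to come from the loaded layout or the staging area rather than from the raw file size.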
Comment 15 Rene Ladan freebsd_committer freebsd_triage 2025-05-06 20:29:46 UTC
Created attachment 260215 [details]
output of machdep.efi_map
Comment 16 Rene Ladan freebsd_committer freebsd_triage 2025-05-06 20:30:54 UTC
Created attachment 260216 [details]
output of vm.phys_segs
Comment 17 Rene Ladan freebsd_committer freebsd_triage 2025-05-06 20:31:48 UTC
The attachments in #15 and #16 are from *this* kernel (i.e. GENERIC from pkgbase with mpi3mr built in and copy_staging enabled in the loader).
Comment 18 Rene Ladan freebsd_committer freebsd_triage 2025-05-06 20:43:01 UTC
(In reply to Rene Ladan from comment #14)
Hmm, I got confused: the first kernel listed is a working one, but with the mpi3mr inclusion backed out.
Comment 19 Konstantin Belousov freebsd_committer freebsd_triage 2025-05-06 23:14:31 UTC
Maybe try increasing the slop.
Comment 20 Mark Johnston freebsd_committer freebsd_triage 2025-05-07 12:16:51 UTC
Rene, can you try booting a kernel after running "staging_slop 16777216" at the loader prompt?  Don't change the copy_staging mode this time.
Comment 21 Rene Ladan freebsd_committer freebsd_triage 2025-05-07 19:42:32 UTC
(In reply to Mark Johnston from comment #20)

Just increasing staging_slop to 16M works too.
Comment 22 Warner Losh freebsd_committer freebsd_triage 2025-05-07 19:57:03 UTC
Hmmmm... any chance we can calculate the slop better, or move whatever it's stepping on?
Comment 23 Rene Ladan freebsd_committer freebsd_triage 2025-05-10 10:26:57 UTC
This morning (i.e. 30 minutes ago), the laptop would not boot the modified kernel either unless I enabled copy_staging at the loader (increasing the slop did not work). The kernel and modules have not changed since 2025-05-05.
Comment 24 Konstantin Belousov freebsd_committer freebsd_triage 2025-05-10 14:51:31 UTC
(In reply to Rene Ladan from comment #23)
Computers do not behave this way.

The best I can suggest is to check whether your machine has some 'enterprise'
management features like Intel vPro or an AMD equivalent. Typically they are
disabled by default. With the management features enabled, you get a remote
serial port, useful as an early console, which might help to understand why the
kernel or loader fails during hand-off or early boot.
Comment 25 Konstantin Belousov freebsd_committer freebsd_triage 2025-05-10 15:14:01 UTC
(In reply to Konstantin Belousov from comment #24)
A less satisfying and much more labor-intensive option is to systematically insert
an infinite loop at consecutive locations on the bootstrap path (mostly
hammer_time()) to identify the line where a reboot is converted into a hang.
Comment 26 Warner Losh freebsd_committer freebsd_triage 2025-05-10 15:39:12 UTC
The other alternative is to ignore more of the EFI memory types, since it's well known that EFI implementations keep using things for too long. If that's the root cause, that will be a big problem... I can put together a patch.
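Ignoring more EFI memory types means being stricter about which memory descriptors are handed to the kernel as free RAM. The Python sketch below illustrates the effect of such a policy on a memory map. The type names come from the UEFI specification, but the descriptor list is hypothetical (it is not taken from the attached machdep.efi_map), and both policies are assumptions for illustration: the "default" set is assumed, not copied from the FreeBSD sources, and the "strict" set that drops boot-services memory is only one possible reading of the proposal.

```python
from typing import Iterable, NamedTuple

EFI_PAGE_SIZE = 4096  # UEFI memory map entries count 4 KiB pages


class Descriptor(NamedTuple):
    kind: str        # UEFI memory type name, e.g. "EfiConventionalMemory"
    phys_start: int  # physical start address
    pages: int       # number of 4 KiB pages


# Assumed default policy: memory the firmware should no longer need after
# ExitBootServices() is treated as usable RAM.
DEFAULT_USABLE = {
    "EfiLoaderCode",
    "EfiLoaderData",
    "EfiBootServicesCode",
    "EfiBootServicesData",
    "EfiConventionalMemory",
}

# Hypothetical stricter policy: leave boot-services memory alone, in case
# the firmware keeps using it after ExitBootServices().
STRICT_USABLE = DEFAULT_USABLE - {"EfiBootServicesCode", "EfiBootServicesData"}


def usable_ranges(md_list: Iterable[Descriptor], usable: set) -> list:
    """Return (start, end) physical ranges the policy would hand to the VM."""
    return [
        (d.phys_start, d.phys_start + d.pages * EFI_PAGE_SIZE)
        for d in md_list
        if d.kind in usable
    ]


# Hypothetical memory map, for illustration only.
EXAMPLE_MAP = [
    Descriptor("EfiBootServicesCode", 0x00000000, 1),
    Descriptor("EfiConventionalMemory", 0x00001000, 0x9F),
    Descriptor("EfiBootServicesData", 0x00100000, 0x100),
    Descriptor("EfiRuntimeServicesData", 0x00200000, 0x10),
    Descriptor("EfiConventionalMemory", 0x00210000, 0x1000),
]

for name, policy in (("default", DEFAULT_USABLE), ("strict", STRICT_USABLE)):
    ranges = usable_ranges(EXAMPLE_MAP, policy)
    total = sum(end - start for start, end in ranges)
    print(f"{name}: {len(ranges)} ranges, {total // 1024} KiB usable")
```

Dropping ranges also shifts where the kernel and its data end up, so a successful boot under a stricter policy would not by itself prove that the firmware is still using boot-services memory.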
Comment 27 Konstantin Belousov freebsd_committer freebsd_triage 2025-05-10 17:29:00 UTC
(In reply to Warner Losh from comment #26)
Could you please explain how ignoring EFI memory types should help?
It might indeed disturb the memory map enough to give temporary relief,
but otherwise I do not see it.
Comment 28 Rene Ladan freebsd_committer freebsd_triage 2025-05-14 14:58:18 UTC
(In reply to Konstantin Belousov from comment #24)

I looked through the Setup menu options but did not see anything that would enable a management mode with serial console access.