Bug 262856 - kernel panic at boot time
Summary: kernel panic at boot time
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 13.0-STABLE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-03-27 09:58 UTC by Friedrich Volkmann
Modified: 2024-04-23 21:15 UTC
CC List: 2 users

See Also:


Description Friedrich Volkmann 2022-03-27 09:58:24 UTC
A few days ago, I tried to upgrade from 12-STABLE to 13-STABLE, but the new kernel panicked instantly. Here's a transcript of the photo I took of the screen:

avail memory = 1552841136 (14889 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <ALASKA A M I>
FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
FreeBSD/SMP: 1 package(s) x 4 core(s) x 2 hardware threads
kernel trap 12 with interrupts disabled


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address  = 0x28
fault code             = supervisor read data, page not present
instruction pointer    = 0x20:0xffffffff80396715
stack pointer          = 0x28:0xffffffff80e17f70
frame pointer          = 0x28:0xffffffff80e17f80
code segment           = base rx0, limit 0xfffff, type 0x1b
                       = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags       = resume, IOPL = 0
current process        = 0 ()
trap number            = 12
panic: page fault
cpuid = 0
time = 1
Uptime: 1s

original photo: http://www.volki.at/austausch/panic_2022-03/screenshot_panic.jpg
kernel: http://www.volki.at/austausch/panic_2022-03/kernel
kernel config: http://www.volki.at/austausch/panic_2022-03/CU08

Of course, I could try a GENERIC kernel and, if it works, change line by line until the panic happens again, but this is a production system so I want to avoid experiments as long as I don't know if they are necessary.
Comment 1 Krautmaster 2022-03-28 11:04:06 UTC
You could try entering the boot loader prompt and then using the commands

set hw.vga.acpi_ignore_no_vga="1"
boot

to see if it boots.
Comment 2 Friedrich Volkmann 2022-03-30 19:33:27 UTC
No. Panic again: http://www.volki.at/austausch/panic_2022-03/screenshot_panic1.jpg

Then I updated the kernel sources (apparently to post-3.1) and replaced syscons with vt. The panic screenshot now contains a few more lines: http://www.volki.at/austausch/panic_2022-03/screenshot_panic1.jpg (this time I didn't set hw.vga.acpi_ignore_no_vga="1")

It may be worth mentioning that "make installkernel" finishes with an error message:
...
kldxref /boot/kernel
kldxref: error while reading /boot/kernel/iwlwifi-9000-pu-b0-jf-b0-46.ucode.ko: Bad address
kldxref: error while reading /boot/kernel/iwlwifi-9260-th-b0-jf-b0-46.ucode.ko: Bad address
--------------------------------------------------------------
>>> Installing kernel CU08 completed on Wed Mar 30 20:47:41 CEST 2022
--------------------------------------------------------------

Also suspicious: The generic kernel fails to build:

# make buildkernel KERNCONF=GENERIC
...
ctfconvert -L VERSION -g vers.o
linking kernel.full
ld: error: undefined symbol: __ashlti3
>>> referenced by sli4.c:0 (/usr/src13/sys/dev/ocs_fc/sli4.c:0)
>>>               sli4.o:(sli_cmd_reg_fcfi)
>>> referenced by sli4.c:0 (/usr/src13/sys/dev/ocs_fc/sli4.c:0)
>>>               sli4.o:(sli_cmd_reg_fcfi_mrq)
>>> referenced by sati_util.c:0 (/usr/src13/sys/dev/isci/scil/sati_util.c:0)
>>>               sati_util.o:(sati_ata_download_microcode_construct)
*** Error code 1

Stop.
make[2]: stopped in /usr/obj/usr/src13/amd64.amd64/sys/GENERIC
*** Error code 1
*** Error code 1
Comment 3 Friedrich Volkmann 2022-03-30 19:36:46 UTC
Sorry, wrong links. The new panic screenshots are:
http://www.volki.at/austausch/panic_2022-03/screenshot_panic2.jpg
http://www.volki.at/austausch/panic_2022-03/screenshot_panic3.jpg
Comment 4 Ed Maste freebsd_committer freebsd_triage 2024-04-23 12:05:23 UTC
There is insufficient information to make progress here. At a minimum you'd need to build a kernel with KDB and KDB_TRACE options to print a backtrace upon panic. If you're no longer in a position to reproduce this please add a comment to that effect.
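
For reference, a minimal sketch of the corresponding lines in a custom kernel configuration file. Only the two option names come from this comment; adding them to the reporter's CU08 config, and the wording of the explanatory comments, are assumptions:

```
# Assumed placement: anywhere among the other "options" lines of the custom config (e.g. CU08)
options        KDB          # kernel debugger framework
options        KDB_TRACE    # print a stack backtrace when the kernel panics
```

After rebuilding and installing the kernel, the panic screen should then include a backtrace that can be photographed or transcribed.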
Comment 5 Friedrich Volkmann 2024-04-23 12:48:04 UTC
I managed to resolve this for myself by adding the options from the MINIMAL kernel that weren't in my custom kernel. I think the culprit was a missing option that I had to add:

options        NUMA

I don't know what "Non-Uniform Memory Architecture" means and I'm not aware of having such hardware. It would be great to explain that in NOTES and mention the kernel panic at an early boot stage without that option.
Comment 6 Ed Maste freebsd_committer freebsd_triage 2024-04-23 18:31:33 UTC
(In reply to Friedrich Volkmann from comment #5)
NUMA refers to SMP (multiprocessor) systems that have different access characteristics for different memory regions. Lack of that option should (of course) not result in a panic, but if the option is always included in GENERIC and MINIMAL, kernel configurations without it may not be tested regularly.
Comment 7 Friedrich Volkmann 2024-04-23 21:15:45 UTC
My mainboard is an MSI Z97S SLI Plus with Intel Z97 Express (Haswell) chipset. My guess is that the onboard GPU makes the memory architecture non-uniform by occupying some of the system RAM.