Bug 283285 - Kernel panic at boot on Intel Atom C3758 w/ QAT module
Summary: Kernel panic at boot on Intel Atom C3758 w/ QAT module
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 14.2-RELEASE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-bugs (Nobody)
URL:
Keywords: crash, regression
Depends on:
Blocks:
 
Reported: 2024-12-12 17:29 UTC by Ben Schumacher
Modified: 2024-12-30 14:25 UTC

Description Ben Schumacher 2024-12-12 17:29:00 UTC
After attempting to upgrade my system to 14.2-RELEASE, I've encountered a crash that appears to be related to the qat.ko driver. Strangely, it seems I am able to load the module after boot, but when I have it enabled in my loader.conf, the kernel crashes.

I have not been able to produce a dump, despite attempting to manually assign a dumpdev in the loader. I also cannot interact with the panic prompt from the console, though I don't entirely understand why, since I do have a USB keyboard attached.

This text is copied from a picture I took of my console:

Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 08
fault virtual address   = 0x4
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff8087e352
stack pointer           = 0x28:0xfffffe00e1f679b0
frame pointer           = 0x28:0xfffffe00e1f67a70
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (firmware taskq)
rdi: fffffe00e1f67cf0 rsi: fffff80001bf8c01 rdx: fffff80001bf8c00
rcx: fffffe00e1f67d70  r8: 00000000000003e3  r9: 0000000000000000
rax: 0000000000000000 rbx: fffffe00e1f67cf0 rbp: fffffe00e1f67a70
r10: fffff80001c7de90 r11: 0000000000000003 r12: fffffe00e1f67a94
r13: 0000000000000000 r14: fffffe00e1f67d60 r15: fffff80001956740
trap number             = 12
panic: page fault
cpuid = 2
time = 3
KDB: stack backtrace:
#0 0xffffffff8080313d at kdb_backtrace+0x5d
#1 0xffffffff807b6be9 at vpanic+0x169
#2 0xffffffff807b6a73 at panic+0x43
#3 0xffffffff80bcf00d at trap_fatal+0x3fd
#4 0xffffffff80bcf056 at trap_pfault+0x46
#5 0xffffffff80ba9788 at calltrap+0x8
#6 0xffffffff80889a14 at namei+0x104
#7 0xffffffff808aefae at vn_open_cred+0x55e
#8 0xffffffff807fef95 at loadimage+0x235
#9 0xffffffff808175c1 at taskqueue_run_locked+0x191
#10 0xffffffff80818852 at taskqueue_thread_loop+0xc2
#11 0xffffffff80771f2f at fork_exit+0x7f
#12 0xffffffff80baa7ee at fork_trampoline+0xe
Uptime: 3s
Automatic reboot in 15 seconds - press a key on the console to abort

I diagnosed this by commenting out all of the _load statements in my /boot/loader.conf, and then enabling them one-by-one. Leaving qat_load and qat_c3xxx_fw_load commented out allowed me to boot.

# use Intel QAT
#qat_c3xxx_fw_load="YES"    # BFS 2024-12-12 
#qat_load="YES"             # BFS 2024-12-12 

But I am able to load these modules from the command-line after boot:

$ kldload qat_c3xxx_fw
$ kldload qat
$ kldstat -v

... cut for space ...

30    1 0xffffffff83545000   122c20 qat_c3xxx_fw.ko (/boot/kernel/qat_c3xxx_fw.ko)
	Contains modules:
		 Id Name
		404 qat_c3xxx_fw_fw
31    1 0xffffffff830e3000     4390 qat.ko (/boot/kernel/qat.ko)
	Contains modules:
		 Id Name
		414 nexus/qat
32    6 0xffffffff830e8000    15dd0 qat_hw.ko (/boot/kernel/qat_hw.ko)
	Contains modules:
		 Id Name
		413 pci/qat_c4xxx
		408 pci/qat_200xx
		412 pci/qat_dh895xcc
		409 pci/qat_4xxx
		411 pci/qat_c3xxx
		407 pci/qat_c62x
		410 pci/qat_4xxxvf
33    9 0xffffffff830fe000    30010 qat_common.ko (/boot/kernel/qat_common.ko)
	Contains modules:
		 Id Name
		405 qat_common
34    8 0xffffffff8312f000    68cd8 qat_api.ko (/boot/kernel/qat_api.ko)
	Contains modules:
		 Id Name
		406 qat_api

I do have a custom kernel, though this is mostly to remove a bunch of devices that I do not use. This system acts as a NAS/VM host within my homelab. It is a Supermicro A2SDi-8C+-HLN4F with 32 GB of ECC RAM.

The QAT functionality isn't strictly required for me, so I've left the module disabled at boot, not that this machine is frequently restarted.
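A possible interim workaround, offered only as an untested sketch and not something confirmed in this report, would be to defer the load from the loader to rc(8) time using the kld_list variable in /etc/rc.conf. The assumption is that loading at that stage behaves like the manual kldload shown above, since the root filesystem is already mounted by then.

```shell
# /etc/rc.conf fragment.
# Assumption (not verified on this machine): modules listed in kld_list
# are loaded in order after the root filesystem is mounted, which mirrors
# the manual "kldload qat_c3xxx_fw; kldload qat" sequence above.
kld_list="qat_c3xxx_fw qat"
```

With this in place, the qat_load and qat_c3xxx_fw_load lines would stay commented out in /boot/loader.conf.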

I'm happy to try to help further diagnose this if I can.

Thanks.
Comment 1 ss3bsd 2024-12-30 13:18:18 UTC
I encountered the same panic when I added qat_load="YES" but forgot qat_c3xxx_fw_load="YES" in /boot/loader.conf.


---
panic: page fault
cpuid = 6
time = 4
KDB: stack backtrace:
#0 0xffffffff809de9ed at kdb_backtrace+0x5d
#1 0xffffffff80990d51 at vpanic+0x131
#2 0xffffffff80990c13 at panic+0x43
#3 0xffffffff80e87a0b at trap_fatal+0x40b
#4 0xffffffff80e87a56 at trap_pfault+0x46
#5 0xffffffff80e5db98 at calltrap+0x8
#6 0xffffffff80a69604 at namei+0x104
#7 0xffffffff80a8f18a at vn_open_cred+0x53a
#8 0xffffffff809da619 at loadimage+0x239
#9 0xffffffff809f3ed2 at taskqueue_run_locked+0x182
#10 0xffffffff809f5152 at taskqueue_thread_loop+0xc2
#11 0xffffffff8094a75f at fork_exit+0x7f
#12 0xffffffff80e5ebfe at fork_trampoline+0xe
Uptime: 4s
Automatic reboot in 15 seconds - press a key on the console to abort
--> Press a key on the console to reboot,
--> or switch off the system now.
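
The misconfiguration described in this comment (qat_load enabled without the firmware line) could be caught before rebooting with a small check of the loader configuration. This is only a sketch: the function name check_qat_loader_conf is made up for illustration, and it assumes the C3xxx firmware module named in this report; other QAT devices would need a different firmware module name.

```shell
#!/bin/sh
# Hypothetical helper (not part of FreeBSD): warn when qat_load is enabled
# in a loader.conf-style file without the matching C3xxx firmware line.
check_qat_loader_conf() {
    conf="${1:-/boot/loader.conf}"
    # Match only uncommented lines that set the variable to "YES".
    if grep -Eq '^[[:space:]]*qat_load="YES"' "$conf"; then
        if ! grep -Eq '^[[:space:]]*qat_c3xxx_fw_load="YES"' "$conf"; then
            echo "qat_load is set but qat_c3xxx_fw_load is missing in $conf"
            return 1
        fi
    fi
    echo "QAT loader settings in $conf look consistent"
    return 0
}
```

Calling check_qat_loader_conf with no argument checks /boot/loader.conf; a non-zero return status indicates the combination reported here.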
Comment 2 Mark Johnston freebsd_committer freebsd_triage 2024-12-30 14:25:27 UTC
Which FreeBSD version did you upgrade from?  14.x has a different QAT driver than 13, so I'm wondering if there was a regression there.

If anyone has a dmesg from an old, successful boot with QAT loaded from loader.conf, I'd like to see it.
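
For anyone collecting that output, the QAT-related lines can be pulled out of a saved boot log with something like the sketch below. The function name qat_dmesg_lines is hypothetical, and the default path assumes the boot-time message buffer is saved in /var/run/dmesg.boot; an older saved copy can be passed as an argument instead.

```shell
#!/bin/sh
# Hypothetical helper: print QAT-related lines from a saved boot log.
# Defaults to /var/run/dmesg.boot (assumed location of the boot-time
# message buffer); pass another file to inspect an older saved copy.
qat_dmesg_lines() {
    log="${1:-/var/run/dmesg.boot}"
    # Case-insensitive match so both "qat0:" and "QAT" lines are kept.
    grep -i 'qat' "$log"
}
```

Attaching the full dmesg is still preferable where possible; the filtered lines are only a quick way to confirm that QAT attached during that boot.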