Summary: | Kernel panic on CURRENT-amd64 with ixl - possibly caused by r344062 | ||||||
---|---|---|---|---|---|---|---|
Product: | Base System | Reporter: | Jeff Pieper <jeffrey.e.pieper> | ||||
Component: | kern | Assignee: | Eric Joyner <erj> | ||||
Status: | Closed FIXED | ||||||
Severity: | Affects Some People | CC: | erj, jeffrey.e.pieper, marius, shurd | ||||
Priority: | --- | Keywords: | IntelNetworking | ||||
Version: | CURRENT | Flags: | erj: mfc-stable12+ | ||||
Hardware: | amd64 | ||||||
OS: | Any | ||||||
Attachments: |
|
ixl needs to be loaded as a module after boot to get a dump. When it is loaded at boot, the console locks up after the panic. Also, when ixl is loaded at boot, the panic occurs before swap is mounted.

Yes, r344062 uncovered this bug, and the patch in attachment 202007 is the correct fix. Nevertheless, I'll add sanity checks to iflib_irq_set_affinity() so that this kind of bug causes a graceful failure in the future instead of a panic from a firing assertion.
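The kind of sanity check described above can be sketched as a small user-space C model. Everything here is hypothetical: `model_intr_type`, `model_core_offset()` and `model_set_affinity()` are illustrative stand-ins, not the real iflib types or functions, and the modulo mapping of queues to CPUs is an assumption made only for the example. The point is the shape of the check: a negative core offset is rejected with an error code rather than tripping a `tid >= 0` assertion.

```c
#include <errno.h>

/* Hypothetical stand-ins for the iflib interrupt types; not the real enum. */
enum model_intr_type { MODEL_INTR_RX, MODEL_INTR_TX, MODEL_INTR_IOV };

/*
 * Hypothetical core-offset lookup that only understands queue interrupts.
 * Any other type yields -1, which in this model plays the role of the
 * negative tid that failed the "tid >= 0" check in the panic.
 */
static int
model_core_offset(enum model_intr_type type, int qid, int ncpus)
{
	switch (type) {
	case MODEL_INTR_RX:
	case MODEL_INTR_TX:
		return (qid % ncpus);	/* assumed mapping, for illustration */
	default:
		return (-1);		/* type not handled, e.g. IOV */
	}
}

/*
 * Sanity-checked affinity setup: instead of asserting tid >= 0, reject the
 * request with EINVAL and leave *cpup untouched so the caller can fail
 * gracefully.
 */
static int
model_set_affinity(enum model_intr_type type, int qid, int ncpus, int *cpup)
{
	int tid;

	if (ncpus <= 0)
		return (EINVAL);
	tid = model_core_offset(type, qid, ncpus);
	if (tid < 0)
		return (EINVAL);
	*cpup = tid;
	return (0);
}
```

In this model, an RX queue with `qid` 30 on 28 CPUs maps to CPU 2, while an IOV-type request returns `EINVAL` and leaves the output untouched.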
A commit references this bug:

Author: erj
Date: Thu Feb 14 18:02:37 UTC 2019
New revision: 344132
URL: https://svnweb.freebsd.org/changeset/base/344132

Log:
  ixl: Fix panic caused by bug exposed by r344062

  Don't use a struct if_irq for IFLIB_INTR_IOV type interrupts since that results in get_core_offset() being called on them, and get_core_offset() doesn't handle IFLIB_INTR_IOV type interrupts, which results in an assert() being triggered in iflib_irq_set_affinity().

  PR: 235730
  Reported by: Jeffrey Pieper <jeffrey.e.pieper@intel.com>
  MFC after: 1 day
  Sponsored by: Intel Corporation

Changes:
  head/sys/dev/ixl/if_ixl.c
  head/sys/dev/ixl/ixl_pf.h

A commit references this bug:

Author: erj
Date: Fri Feb 15 19:13:11 UTC 2019
New revision: 344163
URL: https://svnweb.freebsd.org/changeset/base/344163

Log:
  MFC r344132: ixl: Fix panic caused by bug exposed by r344062

  Don't use a struct if_irq for IFLIB_INTR_IOV type interrupts since that results in get_core_offset() being called on them, and get_core_offset() doesn't handle IFLIB_INTR_IOV type interrupts, which results in an assert() being triggered in iflib_irq_set_affinity().

  PR: 235730
  Reported by: Jeffrey Pieper <jeffrey.e.pieper@intel.com>
  Sponsored by: Intel Corporation

Changes:
  _U  stable/12/
  stable/12/sys/dev/ixl/if_ixl.c
  stable/12/sys/dev/ixl/ixl_pf.h
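The commit log's reasoning (no `struct if_irq` for the IOV interrupt means the core-offset lookup is never reached) can be sketched as a user-space model. This assumes, as the log implies, that the allocation routine only goes down the affinity path when it is handed an irq; `model_softirq_alloc()`, `model_set_affinity()` and the counters are hypothetical names for illustration, not iflib code.

```c
#include <stddef.h>

/* Hypothetical stand-ins; not the real iflib types. */
enum model_intr_type { MODEL_INTR_RX, MODEL_INTR_TX, MODEL_INTR_IOV };

struct model_irq {
	int vector;	/* hypothetical: MSI-X vector backing this interrupt */
};

/* Count which branch each allocation took. */
static int affinity_calls;
static int plain_attach_calls;

/* Hypothetical affinity path: in the real driver this is where the
 * core offset would be computed for the interrupt type. */
static void
model_set_affinity(struct model_irq *irq, enum model_intr_type type)
{
	(void)irq;
	(void)type;
	affinity_calls++;
}

/*
 * Hypothetical allocation path: with an irq the task is bound to a CPU via
 * the affinity path; with NULL it is attached without binding, so no core
 * offset is ever computed for that interrupt type.
 */
static void
model_softirq_alloc(struct model_irq *irq, enum model_intr_type type)
{
	if (irq != NULL)
		model_set_affinity(irq, type);
	else
		plain_attach_calls++;
}
```

Calling `model_softirq_alloc(NULL, MODEL_INTR_IOV)` takes the unbound branch, which corresponds to the fix's approach of not giving the IOV interrupt an `if_irq`.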
Created attachment 202007: ixl panic fix

The ixl driver panics at attach with GENERIC:

    Unread portion of the kernel message buffer:
    ixl0: <Intel(R) Ethernet Connection X722 for 10GBASE-T - 2.1.0-k> mem 0xa1000000-0xa1ffffff,0xa3018000-0xa301ffff at device 0.0 numa-domain 0 on pci6
    ixl0: fw 3.1.52953 api 1.5 nvm 3.04 etid 800008c9 oem 1.262.0
    ixl0: PF-ID[0]: VFs 32, MSI-X 129, VF MSI-X 5, QPs 384, MDIO shared
    ixl0: Using 1024 tx descriptors and 1024 rx descriptors
    ixl0: queue equality override not set, capping rx_queues at 28 and tx_queues at 28
    ixl0: Using 28 rx queues 28 tx queues
    ixl0: Using MSI-X interrupts with 29 vectors
    panic: Assertion tid >= 0 failed at /diskless/os/FreeBSD/13.0-CURRENT_latest/usr/src/sys/net/iflib.c:5652
    cpuid = 43
    time = 1550084681
    KDB: stack backtrace:
    db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00e8552750
    vpanic() at vpanic+0x1b4/frame 0xfffffe00e85527b0
    panic() at panic+0x43/frame 0xfffffe00e8552810
    iflib_irq_set_affinity() at iflib_irq_set_affinity+0x437/frame 0xfffffe00e85528c0
    iflib_softirq_alloc_generic() at iflib_softirq_alloc_generic+0xd1/frame 0xfffffe00e8552910
    ixl_if_msix_intr_assign() at ixl_if_msix_intr_assign+0xf2/frame 0xfffffe00e8552980
    iflib_device_register() at iflib_device_register+0x972/frame 0xfffffe00e8552ce0
    iflib_device_attach() at iflib_device_attach+0xb7/frame 0xfffffe00e8552d10
    device_attach() at device_attach+0x3ea/frame 0xfffffe00e8552d50
    device_probe_and_attach() at device_probe_and_attach+0x71/frame 0xfffffe00e8552d80
    pci_driver_added() at pci_driver_added+0xe6/frame 0xfffffe00e8552dc0
    devclass_driver_added() at devclass_driver_added+0x7a/frame 0xfffffe00e8552e00
    devclass_add_driver() at devclass_add_driver+0x189/frame 0xfffffe00e8552e40
    module_register_init() at module_register_init+0xc0/frame 0xfffffe00e8552e70
    linker_load_module() at linker_load_module+0xb78/frame 0xfffffe00e8553190
    kern_kldload() at kern_kldload+0xef/frame 0xfffffe00e85531e0
    sys_kldload() at sys_kldload+0x5b/frame 0xfffffe00e8553210
    amd64_syscall() at amd64_syscall+0x276/frame 0xfffffe00e8553330
    fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe00e8553330
    --- syscall (304, FreeBSD ELF64, sys_kldload), rip = 0x8002dc96a, rsp = 0x7fffffffe3d8, rbp = 0x7fffffffe950 ---
    KDB: enter: panic
    cpustop_handler () at /usr/src/sys/x86/x86/mp_x86.c:1406
    1406            CPU_SET_ATOMIC(cpu, &stopped_cpus);
    (kgdb)

This is possibly due to r344062, which touched the iflib taskqueue code. We do not see this with GENERIC-NODEBUG. Attached is a possible fix, thanks to erj.
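The difference between GENERIC and GENERIC-NODEBUG is consistent with the panic string: "Assertion tid >= 0 failed at ..." is the format of an INVARIANTS-only assertion, and GENERIC-NODEBUG is built without INVARIANTS, so the check is compiled out there. The sketch below models that with a hypothetical `MODEL_MPASS` macro and a `MODEL_INVARIANTS` switch (not the real kernel macros). It only shows that the check disappears; what a non-debug kernel then does with the negative value is not established by this report.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical model of an INVARIANTS-style assertion. With
 * MODEL_INVARIANTS defined it prints a panic-like message and aborts;
 * without it the check compiles to nothing, as in a NODEBUG kernel.
 */
#ifdef MODEL_INVARIANTS
#define MODEL_MPASS(ex) do { \
	if (!(ex)) { \
		fprintf(stderr, "panic: Assertion %s failed at %s:%d\n", \
		    #ex, __FILE__, __LINE__); \
		abort(); \
	} \
} while (0)
#else
#define MODEL_MPASS(ex) do { } while (0)
#endif

/* Hypothetical helper: checks tid, then passes it through unchanged. */
static int
model_use_tid(int tid)
{
	MODEL_MPASS(tid >= 0);
	return (tid);
}
```

Built without `MODEL_INVARIANTS`, `model_use_tid(-1)` simply returns -1; built with `-DMODEL_INVARIANTS`, the same call aborts with a message shaped like the one in the panic above.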