Bug 235730 - Kernel panic on CURRENT-amd64 with ixl - possibly caused by r344062
Summary: Kernel panic on CURRENT-amd64 with ixl - possibly caused by r344062
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: Eric Joyner
URL:
Keywords: IntelNetworking
Depends on:
Blocks:
 
Reported: 2019-02-14 04:58 UTC by Jeff Pieper
Modified: 2019-02-15 21:13 UTC
CC List: 4 users

See Also:
Flags: erj: mfc-stable12+


Attachments
ixl panic fix (838 bytes, patch)
2019-02-14 04:58 UTC, Jeff Pieper

Description Jeff Pieper 2019-02-14 04:58:15 UTC
Created attachment 202007
ixl panic fix

The ixl driver panics at attach with GENERIC:

Unread portion of the kernel message buffer:
ixl0: <Intel(R) Ethernet Connection X722 for 10GBASE-T - 2.1.0-k> mem 0xa1000000-0xa1ffffff,0xa3018000-0xa301ffff at device 0.0 numa-domain 0 on pci6
ixl0: fw 3.1.52953 api 1.5 nvm 3.04 etid 800008c9 oem 1.262.0
ixl0: PF-ID[0]: VFs 32, MSI-X 129, VF MSI-X 5, QPs 384, MDIO shared
ixl0: Using 1024 tx descriptors and 1024 rx descriptors
ixl0: queue equality override not set, capping rx_queues at 28 and tx_queues at 28
ixl0: Using 28 rx queues 28 tx queues
ixl0: Using MSI-X interrupts with 29 vectors
panic: Assertion tid >= 0 failed at /diskless/os/FreeBSD/13.0-CURRENT_latest/usr/src/sys/net/iflib.c:5652
cpuid = 43
time = 1550084681
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00e8552750
vpanic() at vpanic+0x1b4/frame 0xfffffe00e85527b0
panic() at panic+0x43/frame 0xfffffe00e8552810
iflib_irq_set_affinity() at iflib_irq_set_affinity+0x437/frame 0xfffffe00e85528c0
iflib_softirq_alloc_generic() at iflib_softirq_alloc_generic+0xd1/frame 0xfffffe00e8552910
ixl_if_msix_intr_assign() at ixl_if_msix_intr_assign+0xf2/frame 0xfffffe00e8552980
iflib_device_register() at iflib_device_register+0x972/frame 0xfffffe00e8552ce0
iflib_device_attach() at iflib_device_attach+0xb7/frame 0xfffffe00e8552d10
device_attach() at device_attach+0x3ea/frame 0xfffffe00e8552d50
device_probe_and_attach() at device_probe_and_attach+0x71/frame 0xfffffe00e8552d80
pci_driver_added() at pci_driver_added+0xe6/frame 0xfffffe00e8552dc0
devclass_driver_added() at devclass_driver_added+0x7a/frame 0xfffffe00e8552e00
devclass_add_driver() at devclass_add_driver+0x189/frame 0xfffffe00e8552e40
module_register_init() at module_register_init+0xc0/frame 0xfffffe00e8552e70
linker_load_module() at linker_load_module+0xb78/frame 0xfffffe00e8553190
kern_kldload() at kern_kldload+0xef/frame 0xfffffe00e85531e0
sys_kldload() at sys_kldload+0x5b/frame 0xfffffe00e8553210
amd64_syscall() at amd64_syscall+0x276/frame 0xfffffe00e8553330
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe00e8553330
--- syscall (304, FreeBSD ELF64, sys_kldload), rip = 0x8002dc96a, rsp = 0x7fffffffe3d8, rbp = 0x7fffffffe950 ---
KDB: enter: panic

cpustop_handler () at /usr/src/sys/x86/x86/mp_x86.c:1406
1406		CPU_SET_ATOMIC(cpu, &stopped_cpus);
(kgdb) 

This is possibly due to r344062, which touched the iflib taskqueue code. We do not see this with GENERIC-NODEBUG. Attached is a possible fix thanks to erj.
Comment 1 Jeff Pieper 2019-02-14 12:49:34 UTC
To get a dump, ixl needs to be loaded as a module after boot. When it is loaded at boot, the panic occurs before swap is mounted and the console locks up afterwards.
Comment 2 Marius Strobl freebsd_committer 2019-02-14 13:43:27 UTC
Yes, r344062 uncovered this bug and the patch in attachment 202007 is the
correct fix. Nevertheless, I'll add sanity checks to iflib_irq_set_affinity()
so this kind of bug will cause a graceful failure in future instead of a panic
due to an assertion firing.
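
For illustration only, the kind of sanity check described above might look roughly like the sketch below. It is a user-space stand-in, not the iflib code: the function name sketch_set_affinity(), its parameters, the modulo CPU selection and the EOPNOTSUPP return value are all assumptions made for the example.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for an affinity setup step; not the iflib code.
 * "tid" models the core offset computed for an interrupt, where a
 * negative value means "no offset available for this interrupt type".
 */
static int
sketch_set_affinity(int tid, int ncpus, int *cpu_out)
{
	/*
	 * Instead of asserting tid >= 0 (which panics a debug kernel),
	 * report the problem and return an error to the caller.
	 */
	if (tid < 0 || ncpus <= 0) {
		fprintf(stderr, "sketch: no core offset for this interrupt\n");
		return (EOPNOTSUPP);	/* assumed error code */
	}
	/* Assumed placement rule, for the example only. */
	*cpu_out = tid % ncpus;
	return (0);
}
```

With a check of this shape the caller can fail the attach cleanly, and *cpu_out is left untouched on the error path.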
Comment 3 commit-hook freebsd_committer 2019-02-14 18:03:09 UTC
A commit references this bug:

Author: erj
Date: Thu Feb 14 18:02:37 UTC 2019
New revision: 344132
URL: https://svnweb.freebsd.org/changeset/base/344132

Log:
  ixl: Fix panic caused by bug exposed by r344062

  Don't use a struct if_irq for IFLIB_INTR_IOV type interrupts since that results
  in get_core_offset() being called on them, and get_core_offset() doesn't
  handle IFLIB_INTR_IOV type interrupts, which results in an assert() being triggered
  in iflib_irq_set_affinity().

  PR:		235730
  Reported by:	Jeffrey Pieper <jeffrey.e.pieper@intel.com>
  MFC after:	1 day
  Sponsored by:	Intel Corporation

Changes:
  head/sys/dev/ixl/if_ixl.c
  head/sys/dev/ixl/ixl_pf.h
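
The failure mode in the log above can be modelled in a few lines of user-space C. This is a simplified sketch, not the kernel source: the model_* names, the enum values and the offset arithmetic are assumptions chosen for the example. The only behaviour taken from the commit log is that the offset lookup does not handle IOV-type interrupts, so the subsequent "tid >= 0" assertion fails.

```c
/* Hypothetical interrupt types, loosely named after those in the log. */
enum model_intr_type {
	MODEL_INTR_RX,
	MODEL_INTR_TX,
	MODEL_INTR_ADMIN,
	MODEL_INTR_IOV
};

/*
 * Returns a core offset for queue interrupts and -1 for every type
 * it does not handle. The arithmetic is an assumption for the example.
 */
static int
model_core_offset(enum model_intr_type type, int qid, int ncpus)
{
	switch (type) {
	case MODEL_INTR_RX:
		return (qid / ncpus);
	case MODEL_INTR_TX:
		return (qid / ncpus + 1);
	default:
		/* IOV (and anything else) falls through to "unhandled". */
		return (-1);
	}
}

/* Nonzero when a debug-kernel style check "tid >= 0" would hold. */
static int
model_assertion_holds(enum model_intr_type type, int qid, int ncpus)
{
	return (model_core_offset(type, qid, ncpus) >= 0);
}
```

In this model, model_assertion_holds(MODEL_INTR_IOV, 0, 28) is false, which corresponds to the panic in the report. A kernel built without debug assertions would skip such a check, which is consistent with the report that GENERIC-NODEBUG does not panic.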
Comment 4 commit-hook freebsd_committer 2019-02-15 19:13:40 UTC
A commit references this bug:

Author: erj
Date: Fri Feb 15 19:13:11 UTC 2019
New revision: 344163
URL: https://svnweb.freebsd.org/changeset/base/344163

Log:
  MFC r344132:

  ixl: Fix panic caused by bug exposed by r344062

  Don't use a struct if_irq for IFLIB_INTR_IOV type interrupts since that results
  in get_core_offset() being called on them, and get_core_offset() doesn't
  handle IFLIB_INTR_IOV type interrupts, which results in an assert() being triggered
  in iflib_irq_set_affinity().

  PR:		235730
  Reported by:	Jeffrey Pieper <jeffrey.e.pieper@intel.com>
  Sponsored by:	Intel Corporation

Changes:
_U  stable/12/
  stable/12/sys/dev/ixl/if_ixl.c
  stable/12/sys/dev/ixl/ixl_pf.h