Bug 216539 - page fault in t4iov_attach during boot, on Dell CS23-SH motherboards
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: John Baldwin
Reported: 2017-01-27 23:58 UTC by Alan Somers
Modified: 2017-02-03 21:38 UTC

Attachments
iov_attach.patch (1.23 KB, patch)
2017-01-30 17:28 UTC, John Baldwin

Description Alan Somers freebsd_committer freebsd_triage 2017-01-27 23:58:25 UTC
Dell CS23-SH motherboards running recent versions of stable/11 and head panic with a page fault in t4iov_attach if the T4-related modules are loaded at boot time.  They do not panic if the T4-related modules are loaded later.  Notably, these machines don't even have a T4 card installed.  Also of note, hw.pci.enable_pcie_hp=0 is set in loader.conf to work around PR 211699.  A SuperMicro Sandy Bridge-based system with the same loader.conf settings, running the same version of head, did not panic.

The machine is running FreeBSD 12.0-CURRENT r311461.  Its loader.conf contains:
kern.geom.label.disk_ident.enable="0"
kern.geom.label.gptid.enable="0"
kern.geom.label.disk_ident.enable="0"
kern.geom.label.gptid.enable="0"
zfs_load="YES"
if_cxgbe_load="YES"
t4_fw_load="YES"
t4_tom_load="YES"
t5_fw_load="YES"
hw.pci.enable_pcie_hp=0

And the panic string is:
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x0
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff817b269a
stack pointer           = 0x28:0xffffffff81bb57d0
frame pointer           = 0x28:0xffffffff81bb5820
code segment            = base rx0, limit 0xfffff, type 0x1b
                         = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (swapper)
trap number             = 12
panic: page fault
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at 0xffffffff803301fb = db_trace_self_wrapper+0x2b/frame 0xffffffff81bb5360
vpanic() at 0xffffffff804f3512 = vpanic+0x182/frame 0xffffffff81bb53e0
panic() at 0xffffffff804f3383 = panic+0x43/frame 0xffffffff81bb5440
trap_fatal() at 0xffffffff808181d1 = trap_fatal+0x351/frame 0xffffffff81bb54a0
trap_pfault() at 0xffffffff808183c3 = trap_pfault+0x1e3/frame 0xffffffff81bb5500
trap() at 0xffffffff8081794c = trap+0x26c/frame 0xffffffff81bb5710
calltrap() at 0xffffffff807fbc01 = calltrap+0x8/frame 0xffffffff81bb5710
--- trap 0xc, rip = 0xffffffff817b269a, rsp = 0xffffffff81bb57e0, rbp = 0xffffffff81bb5820 ---
t4iov_attach() at 0xffffffff817b269a = t4iov_attach+0x14a/frame 0xffffffff81bb5820
device_attach() at 0xffffffff80530760 = device_attach+0x420/frame 0xffffffff81bb5880
bus_generic_attach() at 0xffffffff8053194d = bus_generic_attach+0x2d/frame 0xffffffff81bb58a0
pci_attach() at 0xffffffff80390125 = pci_attach+0xd5/frame 0xffffffff81bb58e0
device_attach() at 0xffffffff80530760 = device_attach+0x420/frame 0xffffffff81bb5940
bus_generic_attach() at 0xffffffff8053194d = bus_generic_attach+0x2d/frame 0xffffffff81bb5960
acpi_pcib_acpi_attach() at 0xffffffff8034806a = acpi_pcib_acpi_attach+0x3ba/frame 0xffffffff81bb59d0
device_attach() at 0xffffffff80530760 = device_attach+0x420/frame 0xffffffff81bb5a30
bus_generic_attach() at 0xffffffff8053194d = bus_generic_attach+0x2d/frame 0xffffffff81bb5a50
acpi_attach() at 0xffffffff8033af1f = acpi_attach+0xdbf/frame 0xffffffff81bb5b10
device_attach() at 0xffffffff80530760 = device_attach+0x420/frame 0xffffffff81bb5b70
bus_generic_attach() at 0xffffffff8053194d = bus_generic_attach+0x2d/frame 0xffffffff81bb5b90
nexus_acpi_attach() at 0xffffffff807f7363 = nexus_acpi_attach+0x73/frame 0xffffffff81bb5bc0
device_attach() at 0xffffffff80530760 = device_attach+0x420/frame 0xffffffff81bb5c20
bus_generic_new_pass() at 0xffffffff80531f89 = bus_generic_new_pass+0xe9/frame 0xffffffff81bb5c50
bus_set_pass() at 0xffffffff8052e3cc = bus_set_pass+0x8c/frame 0xffffffff81bb5c80
configure() at 0xffffffff8087eaa9 = configure+0x9/frame 0xffffffff81bb5c90
mi_startup() at 0xffffffff8047b3a8 = mi_startup+0x118/frame 0xffffffff81bb5cb0
btext() at 0xffffffff8028802c = btext+0x2c
Uptime: 1s
Comment 1 John Baldwin freebsd_committer freebsd_triage 2017-01-30 17:26:37 UTC
Hmm, I think I see the panic in that we don't check for pci_find_dbsf() returning NULL.  However, I don't see how you have a matching PCI device that gets past the probe routine.  Do you have 'pciconf -l' output?

Oh, never mind.  Somehow I missed checking the vendor ID.  Please try the attached fix.
Comment 2 John Baldwin freebsd_committer freebsd_triage 2017-01-30 17:28:27 UTC
Created attachment 179432
iov_attach.patch
Comment 3 Dave Baukus 2017-01-30 23:31:20 UTC
The proposed patch fixes the problem.  The only addition I required was to include one of the files that #defines PCI_VENDOR_ID_CHELSIO.
Comment 4 commit-hook freebsd_committer freebsd_triage 2017-01-31 18:54:23 UTC
A commit references this bug:

Author: jhb
Date: Tue Jan 31 18:54:14 UTC 2017
New revision: 313020
URL: https://svnweb.freebsd.org/changeset/base/313020

Log:
  Fix a couple of issues with t4iov probe and attach.

  - Check for Chelsio vendor ID in probe routines.
  - Fail attach instead of faulting if pci_find_dbsf() doesn't find a
    device.

  PR:		216539
  Reported by:	asomers
  Tested by:	Dave Baukus <daveb@spectralogic.com>
  MFC after:	3 days
  Sponsored by:	Chelsio Communications

Changes:
  head/sys/dev/cxgbe/t4_iov.c
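
The two fixes named in the log above can be illustrated with a small user-space sketch. This is not the committed patch: the struct, the sketch_* function names, and the table-based lookup are hypothetical stand-ins for the kernel's device_t and pci_find_dbsf(), and the choice of function 4 as the sibling to look up is an assumption made for the example. Only the vendor ID value (0x1425, Chelsio) is a real constant.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Chelsio's PCI vendor ID. */
#define SKETCH_VENDOR_CHELSIO 0x1425

/* Hypothetical, simplified view of one PCI function. */
struct sketch_dev {
	uint16_t vendor;
	uint16_t device;
	int slot;
	int func;
};

/*
 * Stand-in for pci_find_dbsf(): return the entry with the given slot and
 * function, or NULL when no such device exists.
 */
const struct sketch_dev *
sketch_find(const struct sketch_dev *bus, size_t n, int slot, int func)
{
	for (size_t i = 0; i < n; i++)
		if (bus[i].slot == slot && bus[i].func == func)
			return (&bus[i]);
	return (NULL);
}

/* Probe: reject other vendors before comparing device IDs. */
int
sketch_probe(const struct sketch_dev *dev, uint16_t wanted_device)
{
	if (dev->vendor != SKETCH_VENDOR_CHELSIO)
		return (ENXIO);
	if (dev->device != wanted_device)
		return (ENXIO);
	return (0);
}

/* Attach: fail cleanly if the sibling function is missing. */
int
sketch_attach(const struct sketch_dev *bus, size_t n,
    const struct sketch_dev *dev)
{
	const struct sketch_dev *main_dev;

	/* Function 4 is an assumption of this sketch. */
	main_dev = sketch_find(bus, n, dev->slot, 4);
	if (main_dev == NULL)
		return (ENXIO);	/* without this check, NULL would be dereferenced */
	return (0);
}
```

With the vendor check in place, a non-Chelsio device whose device ID happens to match is rejected at probe time, and even if attach is reached, a missing sibling function yields an error return rather than a read through a NULL pointer.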
Comment 5 commit-hook freebsd_committer freebsd_triage 2017-02-03 21:38:13 UTC
A commit references this bug:

Author: jhb
Date: Fri Feb  3 21:37:28 UTC 2017
New revision: 313175
URL: https://svnweb.freebsd.org/changeset/base/313175

Log:
  MFC 313020: Fix a couple of issues with t4iov probe and attach.

  - Check for Chelsio vendor ID in probe routines.
  - Fail attach instead of faulting if pci_find_dbsf() doesn't find a
    device.

  PR:		216539
  Sponsored by:	Chelsio Communications

Changes:
_U  stable/11/
  stable/11/sys/dev/cxgbe/t4_iov.c