Summary: | kernel panic when loading ix interface (was kernel panic on boot PowerEdge R720xd) | |
---|---|---|---
Product: | Base System | Reporter: | eric
Component: | kern | Assignee: | freebsd-bugs (Nobody) <bugs>
Status: | Closed Overcome By Events | |
Severity: | Affects Many People | CC: | Justin, eric, erj, fredrik, gabe, jeffrey.e.pieper, jeremyshinall, kevpatt, krzysztof.galazka, lincolnb, pkubaj, sfilter, shurd, tnelson
Priority: | --- | Keywords: | IntelNetworking
Version: | 11.0-RC1 | |
Hardware: | amd64 | |
OS: | Any | |
Description
eric
2016-08-12 12:06:57 UTC
Created attachment 173582
screen dump idrac console
Same issue on RC1, but the machine boots if I remove the entry for ix in rc.conf. I still get a kernel panic when trying to use the ix interface:

    ifconfig ix1 inet 192.168.0.1 netmask 255.255.255.0 -> panic

Same issue on 11.0-RC2, see attached file rc2.png.

Created attachment 174058
screen dump idrac console - trying 11.0-RC2
Same panic when trying RC3.

I wonder what "phy_type 10" is in this context? What type of cable/SFP are you using here?

Created attachment 175232
Picture of kernel panic
I am also seeing this with two different revisions of the X520-DA2 in 11.0-RC3.
I can confirm this card works completely OK in 10.3-RELEASE
We cannot repro this on 11-RC3 (ix-3.1.13-k) using an X520 adapter, so I have a few questions:
1. Is this a mezz card or an adapter?
2. Are you using iDRAC/shared ports? Shared with which devices?

This could also be a PCI Hotplug issue, as it was added during this timeframe: https://svnweb.freebsd.org/base?view=revision&revision=304246

(In reply to Jeff Pieper from comment #8)
In my case, it's an adapter in a PCIe slot. The server is a Dell PowerEdge 2950. I've tried two different revisions of this card and can provide details if needed. 10Gb is connected via SFP+ copper. iDRAC is configured on a 1Gbps onboard card but currently unused.

We are only seeing this with the 82598EB controller. Tests with the 82599ES work.

This is still a thing on 11-STABLE (r321894) / ix 3.2.12-k. Instant panic just when running "ifconfig ix0 up" (same deal with ix1 as well). Snippet from pciconf -lv:

    ix0@pci0:4:0:0: class=0x020000 card=0xa21f8086 chip=0x10f18086 rev=0x01 hdr=0x00
        vendor     = 'Intel Corporation'
        device     = '82598EB 10-Gigabit AF Dual Port Network Connection'
        class      = network
        subclass   = ethernet
    ix1@pci0:4:0:1: class=0x020000 card=0xa21f8086 chip=0x10f18086 rev=0x01 hdr=0x00
        vendor     = 'Intel Corporation'
        device     = '82598EB 10-Gigabit AF Dual Port Network Connection'
        class      = network
        subclass   = ethernet

Not sure what you would want to see to debug. I have a core if that helps. Some output from kgdb below:

    Fatal trap 12: page fault while in kernel mode
    cpuid = 4; apic id = 04
    fault virtual address   = 0x0
    fault code              = supervisor read instruction, page not present
    instruction pointer     = 0x20:0x0
    stack pointer           = 0x28:0xfffffe000037f9b8
    frame pointer           = 0x28:0xfffffe000037f9e0
    code segment            = base rx0, limit 0xfffff, type 0x1b
                            = DPL 0, pres 1, long 1, def32 0, gran 1
    processor eflags        = interrupt enabled, resume, IOPL = 0
    current process         = 0 (ix1 linkq)
    trap number             = 12
    panic: page fault
    cpuid = 4
    KDB: stack backtrace:
    #0 0xffffffff805f43d7 at kdb_backtrace+0x67
    #1 0xffffffff805b2186 at vpanic+0x186
    #2 0xffffffff805b1ff3 at panic+0x43
    #3 0xffffffff8095f4a2 at trap_fatal+0x322
    #4 0xffffffff8095f4f9 at trap_pfault+0x49
    #5 0xffffffff8095ed36 at trap+0x286
    #6 0xffffffff80944db1 at calltrap+0x8
    #7 0xffffffff80605237 at taskqueue_run_locked+0x127
    #8 0xffffffff806063d8 at taskqueue_thread_loop+0xc8
    #9 0xffffffff80575dc5 at fork_exit+0x85
    #10 0xffffffff809452ee at fork_trampoline+0xe
    Uptime: 1m28s
    Dumping 2385 out of 65475 MB: (CTRL-C to abort) ..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

    [snip]

    #0  doadump (textdump=<value optimized out>) at pcpu.h:222
    222     __asm("movq %%gs:%1,%0" : "=r" (td)
    (kgdb) backtrace
    #0  doadump (textdump=<value optimized out>) at pcpu.h:222
    #1  0xffffffff805b1d01 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:366
    #2  0xffffffff805b21c0 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:759
    #3  0xffffffff805b1ff3 in panic (fmt=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:690
    #4  0xffffffff8095f4a2 in trap_fatal (frame=0xfffffe000037f8f0, eva=0) at /usr/src/sys/amd64/amd64/trap.c:801
    #5  0xffffffff8095f4f9 in trap_pfault (frame=0xfffffe000037f8f0, usermode=0) at pcpu.h:222
    #6  0xffffffff8095ed36 in trap (frame=0xfffffe000037f8f0) at /usr/src/sys/amd64/amd64/trap.c:421
    #7  0xffffffff80944db1 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:236
    #8  0x0000000000000000 in ?? ()
    Current language: auto; currently minimal
    (kgdb)

I think that helps indicate where the problem might be, but one issue on our end is that we don't really have any 82598 cards here, so we won't have a way to verify whether any fix works for you.

I've since removed the card from service, so I don't particularly have any stake in it working. But I still have it, and don't mind helping out. So I see two ways forward at the moment:
1) I can try to set up a test machine, and we can do a bit of back and forth while I test patches for you.
2) I can donate the card.
Let me know if you're interested.

Confirming in FreeBSD 11.1 that all my Intel 82598EB NICs cause a kernel panic. The only solution I could find was replacing them with Intel 82599ES NICs.
Broken NIC: https://ark.intel.com/products/36918/Intel-82598EB-10-Gigabit-Ethernet-Controller
Working NIC: https://ark.intel.com/products/41282/Intel-82599ES-10-Gigabit-Ethernet-Controller

I have a pair of 82598-based Intel NICs I can set up on a test bench if that would be helpful.

Can confirm that I am also experiencing this issue on a FreeNAS 11.1 installation, and it occurs as soon as the interface is brought 'up'. Using DAC cables, happens when not connected at remote end.

I can confirm that I am seeing the same behavior reported here on 10.4 amd64 and later. Specific hardware is an HP DL380e Gen8 with an Intel E10G42AFDAGP5 interface using copper DAC cables. When installed with 10.3 amd64, I see proper operation.

(In reply to Tim Nelson from comment #16)
Could you check whether you can reproduce the issue with the out-of-tree driver: https://downloadcenter.intel.com/download/14688/Ethernet-Intel-Network-Adapters-Driver-for-PCIe-10-Gigabit-Network-Connections-Under-FreeBSD- or with the iflib version of the ix driver from 12.0-CURRENT (4.0.0)?

I can confirm this bug on FreeNAS-RELEASE-11.2, on a Dell PowerEdge R710 with an Intel E10G42AFDA NIC. The kernel panics as soon as the boot process tries to bring up ix devices.
It does not seem to matter if DA cables are attached, detached, or linked on the other end.

One more thing ... I would like to try an out-of-tree / FreeBSD 12 driver, however: I wish I knew of a way to disable the ixgbe driver, but since I can't complete a boot I'm not sure how. I don't have easy access to simply remove the PCIe card.

(In reply to Kevin H. Patterson from comment #19)
You could try to disable the driver by setting hints as described here: https://www.freebsd.org/doc/handbook/device-hints.html
The driver will still load but it won't attach to devices, so the panic should not happen. Every port has to be disabled separately, so in the case of a dual-port card you should try:

    set hint.ix.0.disabled=1
    set hint.ix.1.disabled=1

(In reply to Krzysztof Galazka from comment #17)
FWIW, I can confirm that my card operates normally under FreeBSD 12.0-RELEASE. I was able to boot from the install CD and bring up the interface from the shell, ping, etc. with no kernel panic. Can this driver be backported to FreeBSD 11?

(In reply to Kevin H. Patterson from comment #21)
+1 for this. I'm having the exact same problem with an E10G42AFDA card in a Dell r515 on FreeNAS 11.2-U2. I bought this card explicitly because it's on the FreeBSD HCL. Would love to get some traction on this really old issue.

As noted in comment 21, the driver works fine on FreeBSD 12 and 11 is EOL. Closing.
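
For anyone still stuck on 11.x, the per-port hint workaround from the reply to comment #19 can be sketched as a small shell helper. This is only an illustration: the function name `ix_disable_hints` is made up, the port count is passed in by hand rather than detected, and it assumes the ix units are numbered from 0 as in the `pciconf` output earlier in the thread. It prints one line per port in the quoted form used by /boot/device.hints.

```shell
#!/bin/sh
# Hypothetical helper (not part of FreeBSD): print one "disabled" hint per
# ix port. The number of ports is an argument; nothing is probed.
ix_disable_hints() {
    ports=$1
    unit=0
    while [ "$unit" -lt "$ports" ]; do
        # Same variable as "set hint.ix.N.disabled=1" at the loader prompt,
        # written in the quoted form used by /boot/device.hints.
        printf 'hint.ix.%d.disabled="1"\n' "$unit"
        unit=$((unit + 1))
    done
}

# Dual-port card, as in the reply to comment #19.
ix_disable_hints 2
```

The loader-prompt `set` commands only last for that one boot. Once the machine is up, the printed lines could be added to /boot/device.hints so the driver stays detached across reboots, and removed again after moving to a release where the card works.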