Bug 211771 - kernel panic when loading ix interface ( was kernel panic on boot PowerEdge R720xd )
Status: Closed Overcome By Events
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 11.0-RC1
Hardware: amd64 Any
Importance: --- Affects Many People
Assignee: freebsd-bugs (Nobody)
URL:
Keywords: IntelNetworking
Depends on:
Blocks:
 
Reported: 2016-08-12 12:06 UTC by eric
Modified: 2023-02-03 17:30 UTC
CC List: 14 users

See Also:


Attachments
screen dump idrac console (145.48 KB, image/png)
2016-08-12 12:08 UTC, eric
screen dump idrac console - trying 11.0-RC2 (165.68 KB, image/png)
2016-08-25 13:21 UTC, eric
Picture of kernel panic (550.99 KB, image/png)
2016-09-28 17:36 UTC, Lincoln Bryant

Description eric 2016-08-12 12:06:57 UTC
Upgraded from 10.3-p5 with:

freebsd-update upgrade -r 11.0-BETA4

The upgrade installs, but the machine hangs on the first reboot.
The last entry on the console before the panic is: ix1: Detected phy_type 10

I could not attach the screen dump I have from the console, so I'm typing out the first lines:



Fatal trap 12: page fault while in kernel mode
cpuid = 15; apic id = 23
fault virtual address = 0x0
fault code = supervisor read instruction, page not present
instruction pointer = 0x20:0x0
stack pointer =
Comment 1 eric 2016-08-12 12:08:20 UTC
Created attachment 173582 [details]
screen dump idrac console
Comment 2 eric 2016-08-13 19:19:38 UTC
Same issue on RC1, but the machine boots if I remove the ix entry from rc.conf.

However, I still get a kernel panic when trying to use the ix interface:

ifconfig ix1 inet 192.168.0.1 netmask 255.255.255.0 -> panic
Comment 3 eric 2016-08-25 13:20:18 UTC
Same issue on 11.0-RC2; see the attached file rc2.png.
Comment 4 eric 2016-08-25 13:21:16 UTC
Created attachment 174058 [details]
screen dump idrac console - trying 11.0-RC2
Comment 5 eric 2016-09-19 11:32:14 UTC
Same panic when trying RC3
Comment 6 Sean Bruno freebsd_committer freebsd_triage 2016-09-28 17:32:36 UTC
I wonder what "phy_type 10" means in this context.

What type of cable/SFP are you using here?
Comment 7 Lincoln Bryant 2016-09-28 17:36:25 UTC
Created attachment 175232 [details]
Picture of kernel panic

I am also seeing this with two different revisions of the X520-DA2 in 11.0-RC3.

I can confirm this card works completely OK in 10.3-RELEASE
Comment 8 Jeff Pieper 2016-09-29 17:01:32 UTC
We cannot reproduce this on 11-RC3 (ix-3.1.13-k) using an X520 adapter, so I have a few questions:

1. Is this a mezz card or an adapter?
2. Are you using iDRAC/shared ports? Shared with which devices?

This could also be a PCI hotplug issue, as hotplug support was added during this timeframe:
https://svnweb.freebsd.org/base?view=revision&revision=304246
Comment 9 Lincoln Bryant 2016-09-29 18:18:31 UTC
(In reply to Jeff Pieper from comment #8)

In my case, it's an adapter in a PCIe slot. The server is a Dell PowerEdge 2950. I've tried two different revisions of this card and can provide details if needed.

10Gb is connected via SFP+ copper. 

iDRAC is configured on a 1Gbps onboard card but currently unused.
Comment 10 Fredrik Lennmark 2016-10-04 13:29:58 UTC
We are only seeing this with the 82598EB controller. Tests with the 82599ES work.
Comment 11 Michael Galati 2017-08-11 06:41:08 UTC
This still happens on 11-STABLE (r321894) with ix 3.2.12-k. The panic is instant when simply running "ifconfig ix0 up" (same with ix1).

Snippet from pciconf -lv:

ix0@pci0:4:0:0: class=0x020000 card=0xa21f8086 chip=0x10f18086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82598EB 10-Gigabit AF Dual Port Network Connection'
    class      = network
    subclass   = ethernet
ix1@pci0:4:0:1: class=0x020000 card=0xa21f8086 chip=0x10f18086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82598EB 10-Gigabit AF Dual Port Network Connection'
    class      = network
    subclass   = ethernet

I'm not sure what you would want to see for debugging. I have a core dump if that helps. Some output from kgdb is below:

Fatal trap 12: page fault while in kernel mode
cpuid = 4; apic id = 04
fault virtual address	= 0x0
fault code		= supervisor read instruction, page not present
instruction pointer	= 0x20:0x0
stack pointer	        = 0x28:0xfffffe000037f9b8
frame pointer	        = 0x28:0xfffffe000037f9e0
code segment		= base rx0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 0 (ix1 linkq)
trap number		= 12
panic: page fault
cpuid = 4
KDB: stack backtrace:
#0 0xffffffff805f43d7 at kdb_backtrace+0x67
#1 0xffffffff805b2186 at vpanic+0x186
#2 0xffffffff805b1ff3 at panic+0x43
#3 0xffffffff8095f4a2 at trap_fatal+0x322
#4 0xffffffff8095f4f9 at trap_pfault+0x49
#5 0xffffffff8095ed36 at trap+0x286
#6 0xffffffff80944db1 at calltrap+0x8
#7 0xffffffff80605237 at taskqueue_run_locked+0x127
#8 0xffffffff806063d8 at taskqueue_thread_loop+0xc8
#9 0xffffffff80575dc5 at fork_exit+0x85
#10 0xffffffff809452ee at fork_trampoline+0xe
Uptime: 1m28s
Dumping 2385 out of 65475 MB: (CTRL-C to abort) ..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
[snip]
#0  doadump (textdump=<value optimized out>) at pcpu.h:222
222		__asm("movq %%gs:%1,%0" : "=r" (td)
(kgdb) backtrace 
#0  doadump (textdump=<value optimized out>) at pcpu.h:222
#1  0xffffffff805b1d01 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:366
#2  0xffffffff805b21c0 in vpanic (fmt=<value optimized out>, 
    ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:759
#3  0xffffffff805b1ff3 in panic (fmt=<value optimized out>)
    at /usr/src/sys/kern/kern_shutdown.c:690
#4  0xffffffff8095f4a2 in trap_fatal (frame=0xfffffe000037f8f0, eva=0)
    at /usr/src/sys/amd64/amd64/trap.c:801
#5  0xffffffff8095f4f9 in trap_pfault (frame=0xfffffe000037f8f0, usermode=0)
    at pcpu.h:222
#6  0xffffffff8095ed36 in trap (frame=0xfffffe000037f8f0)
    at /usr/src/sys/amd64/amd64/trap.c:421
#7  0xffffffff80944db1 in calltrap ()
    at /usr/src/sys/amd64/amd64/exception.S:236
#8  0x0000000000000000 in ?? ()
Current language:  auto; currently minimal
(kgdb)
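The `fault code` line in the trap message above is a decoding of the x86 page-fault error code. Below is a minimal Python sketch of that decoding. The bit values are the architectural ones from the Intel SDM (FreeBSD names them `PGEX_*`), the function names are hypothetical, and the output wording only mimics the message format. It shows that "supervisor read instruction, page not present" corresponds to an error code of 0x10, a kernel-mode instruction fetch from an unmapped page. Together with `instruction pointer = 0x20:0x0` and `fault virtual address = 0x0`, this is consistent with the `ix1 linkq` taskqueue thread jumping to address 0, for example through a NULL function pointer. The report itself does not establish which pointer that was.

```python
# x86 page-fault error-code bits (Intel SDM Vol. 3A); names follow FreeBSD's PGEX_* macros.
PGEX_P = 0x01    # set: protection violation; clear: page not present
PGEX_W = 0x02    # set: write access; clear: read access
PGEX_U = 0x04    # set: fault in user mode; clear: supervisor (kernel) mode
PGEX_RSV = 0x08  # set: reserved bit found set in a paging-structure entry
PGEX_I = 0x10    # set: fault caused by an instruction fetch


def describe_fault_code(code: int) -> str:
    """Render an error code in the style of the panic's 'fault code' line."""
    mode = "user" if code & PGEX_U else "supervisor"
    access = "write" if code & PGEX_W else "read"
    kind = "instruction" if code & PGEX_I else "data"
    if code & PGEX_RSV:
        cause = "reserved bits in PTE"
    elif code & PGEX_P:
        cause = "protection violation"
    else:
        cause = "page not present"
    return f"{mode} {access} {kind}, {cause}"


def looks_like_jump_to_null(fault_va: int, rip: int, code: int) -> bool:
    """True for a kernel instruction fetch from unmapped address 0 with RIP also 0."""
    return (
        bool(code & PGEX_I)
        and not (code & PGEX_P)
        and not (code & PGEX_U)
        and fault_va == 0
        and rip == 0
    )
```

For example, `describe_fault_code(0x10)` returns "supervisor read instruction, page not present", and `looks_like_jump_to_null(0x0, 0x0, 0x10)` returns `True` for the values in this trap.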
Comment 12 Eric Joyner freebsd_committer freebsd_triage 2017-08-24 21:51:26 UTC
I think that helps indicate where the problem might be, but one issue on our end is that we don't have any 82598 cards here, so we have no way to verify whether a fix works for you.
Comment 13 Michael Galati 2017-08-25 05:26:17 UTC
I've since removed the card from service, so I don't particularly have any stake in it working.  But I still have it, and don't mind helping out.

So I see two ways forward at the moment:

1) I can try to setup a test machine, and we can do a bit of back and forth while I test patches for you.

2) I can donate the card.

Let me know if you're interested.
Comment 14 Gabriel Zellmer 2017-12-06 05:16:14 UTC
Confirming in FreeBSD 11.1 that all my Intel 82598EB NICs cause a kernel panic. Only solution I could find was replacing them with Intel 82599ES NICs.

Broken NIC:
https://ark.intel.com/products/36918/Intel-82598EB-10-Gigabit-Ethernet-Controller

Working NIC:
https://ark.intel.com/products/41282/Intel-82599ES-10-Gigabit-Ethernet-Controller
Comment 15 Justin Opotzner 2018-01-19 15:51:24 UTC
I have a pair of 82598-based Intel NICs I can set up on a test bench if that would be helpful.

I can confirm that I am also experiencing this issue on a FreeNAS 11.1 installation; it occurs as soon as the interface is brought 'up'.

I'm using DAC cables; it happens when the remote end is not connected.
Comment 16 Tim Nelson 2018-01-31 03:40:24 UTC
I can confirm the same behavior reported here on 10.4 amd64 and later. The specific hardware is an HP DL380e Gen8 with an Intel E10G42AFDAGP5 interface using copper DAC cables. When installed with 10.3 amd64, it operates properly.
Comment 17 Krzysztof Galazka 2018-02-02 16:08:10 UTC
(In reply to Tim Nelson from comment #16)

Could you check whether you can reproduce the issue with the out-of-tree driver: https://downloadcenter.intel.com/download/14688/Ethernet-Intel-Network-Adapters-Driver-for-PCIe-10-Gigabit-Network-Connections-Under-FreeBSD-
or with the iflib version of the ix driver from 12.0-CURRENT (4.0.0)?
Comment 18 Kevin H. Patterson 2019-01-27 21:42:44 UTC
I can confirm this bug on FreeNAS-RELEASE-11.2, on a Dell PowerEdge R710 with Intel E10G42AFDA NIC.

Kernel panics as soon as boot process tries to bring up ix devices. It does not seem to matter if DA cables are attached, detached, or linked on the other end.
Comment 19 Kevin H. Patterson 2019-01-27 21:46:23 UTC
One more thing: I would like to try an out-of-tree or FreeBSD 12 driver, but I don't know how to disable the ixgbe driver when I can't complete a boot, and I don't have easy access to physically remove the PCIe card.
Comment 20 Krzysztof Galazka 2019-01-28 10:16:28 UTC
(In reply to Kevin H. Patterson from comment #19)

You could try to disable the driver by setting hints as described here: https://www.freebsd.org/doc/handbook/device-hints.html
The driver will still load, but it won't attach to the devices, so the panic should not happen. Every port has to be disabled separately, so for a dual-port card you should try:

set hint.ix.0.disabled=1
set hint.ix.1.disabled=1
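Hints entered with `set` at the loader prompt last for a single boot only. To keep the ports disabled across reboots while testing another driver, the same hints can be placed in /boot/loader.conf (or /boot/device.hints). Below is a minimal fragment, assuming the ports are ix0 and ix1 as in the dual-port example above:

```
# /boot/loader.conf: keep the in-tree ix(4) driver from attaching to these ports.
hint.ix.0.disabled="1"
hint.ix.1.disabled="1"
```

Removing these lines lets the driver attach to the ports again on the next boot.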
Comment 21 Kevin H. Patterson 2019-01-31 00:54:05 UTC
(In reply to Krzysztof Galazka from comment #17)

FWIW, I can confirm that my card operates normally under FreeBSD 12.0-RELEASE. I was able to boot from the install CD and bring up the interface from the shell, ping, etc. with no kernel panic.

Can this driver be backported to FreeBSD 11?
Comment 22 Jeremy Shinall 2019-03-28 21:59:21 UTC
(In reply to Kevin H. Patterson from comment #21)
+1 for this.

I'm having the exact same problem with an E10G42AFDA card in a Dell r515 on FreeNAS 11.2-U2. I bought this card explicitly because it's on the FreeBSD HCL. Would love to get some traction on this really old issue.
Comment 23 Piotr Kubaj freebsd_committer freebsd_triage 2023-02-03 17:30:19 UTC
As noted in comment 21, the driver works fine on FreeBSD 12 and 11 is EOL. Closing.