Bug 252626 - ixl: panic on attach of X722 on-motherboard interfaces
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 12.2-RELEASE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-net (Nobody)
URL:
Keywords: IntelNetworking, panic
Depends on:
Blocks:
 
Reported: 2021-01-12 23:49 UTC by Garrett Wollman
Modified: 2022-06-23 12:39 UTC

Attachments
fatal-trap (284.71 KB, image/png)
2022-06-20 15:32 UTC, Santiago Martinez

Description Garrett Wollman freebsd_committer 2021-01-12 23:49:32 UTC
I upgraded our fleet of ~20 NFS servers today from 12.1 to 12.2-RELEASE-p2. Three of the newest servers, which have X722 interfaces on the motherboard, crash at attach time with the following trace:

--- trap 0xc, rip = 0xffffffff80757efd, rsp = 0xfffffe0075dff2d0, rbp = 0xfffffe0075dff2e0 ---
grouptaskqueue_enqueue() at grouptaskqueue_enqueue+0xd/frame 0xfffffe0075dff2e0
ixl_admin_timer() at ixl_admin_timer+0x11/frame 0xfffffe0075dff300
softclock_call_cc() at softclock_call_cc+0x141/frame 0xfffffe0075dff3b0
softclock() at softclock+0x79/frame 0xfffffe0075dff3d0
ithread_loop() at ithread_loop+0x23c/frame 0xfffffe0075dff430
fork_exit() at fork_exit+0x7e/frame 0xfffffe0075dff470
fork_trampoline() at f

(There's an omitted frame where ixl_admin_timer() calls iflib_admin_intr_deferred() -- apparently pf->vsi.ctx is not initialized yet, or there's a missing memory barrier?)

The problem is not observed on older servers with X710 interfaces.  (We don't use the 1G interfaces for anything, and if we didn't have servers that need it, I would just yank ixl(4) from our standard kernel configuration.  I might still do that and just load the module on the three servers that need it.)

The systems that panic have:

ixl0@pci0:26:0:0:       class=0x020000 card=0x37d115d9 chip=0x37d18086 rev=0x09 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Ethernet Connection X722 for 1GbE'
    class      = network
    subclass   = ethernet

The systems that boot (and operate properly) have:

ixl0@pci0:3:0:0:        class=0x020000 card=0x00088086 chip=0x15728086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Ethernet Controller X710 for 10GbE SFP+'
    class      = network
    subclass   = ethernet
Comment 1 Krzysztof Galazka 2021-01-14 14:20:08 UTC
(In reply to Garrett Wollman from comment #0)
vsi->ctx is set at the beginning of ixl_if_attach_pre and the timer is started at the end of ixl_if_attach_post, so this looks a bit strange. I haven't seen anything like that before. I'm trying to get a reproduction. Is there any chance of getting a core dump? Also, could you please check the FW version (sysctl dev.ixl.0.fw_version) and provide the exact model of the motherboard?
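
The ordering described above can be modelled in a few lines of userland C. This is only a sketch under stated assumptions: every name in it (model_softc, model_attach_pre, model_attach_post, model_admin_timer) is a hypothetical stand-in, not a real ixl(4) or iflib(4) symbol, and it does not claim to show the actual cause of the panic. It only illustrates why a timer callback that runs before the context pointer is published would fault, and what a defensive check would look like.

```c
/*
 * Userland model of the attach ordering discussed in this comment.
 * All names are hypothetical stand-ins, not real driver symbols.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_ctx {
	int admin_tasks_queued;		/* counts deferred admin work */
};

struct model_softc {
	struct model_ctx *ctx;		/* stands in for vsi->ctx */
	bool timer_armed;		/* stands in for the admin timer */
};

/* Early attach step: publish the context pointer. */
void
model_attach_pre(struct model_softc *sc, struct model_ctx *ctx)
{
	sc->ctx = ctx;
}

/* Late attach step: arm the timer only once the context exists. */
void
model_attach_post(struct model_softc *sc)
{
	assert(sc->ctx != NULL);
	sc->timer_armed = true;
}

/*
 * Timer callback. It refuses to run instead of dereferencing a NULL
 * context, which is the failure mode hypothesised in comment #0.
 */
bool
model_admin_timer(struct model_softc *sc)
{
	if (!sc->timer_armed || sc->ctx == NULL)
		return (false);
	sc->ctx->admin_tasks_queued++;
	return (true);
}
```

With a zeroed model_softc, model_admin_timer() returns false until both attach steps have run in order; after that it queues one unit of work per call. If the real timer can fire earlier than expected, an equivalent check in the callback would turn the page fault into a skipped tick, though that would hide rather than explain the ordering problem.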
Comment 2 Garrett Wollman freebsd_committer 2021-01-15 18:23:40 UTC
(In reply to Krzysztof Galazka from comment #1)
Since these are production file servers, I removed the ixl driver so that I could complete the upgrade within the scheduled window, and won't be able to tell you the firmware version any time soon.

The machine is an iXsystems IXC-4224P-IXN, and the motherboard is a Supermicro X11DPH-i rev 1.10.  According to our inventory, these servers were delivered in October, 2018.
Comment 3 Santiago Martinez 2022-06-20 15:32:47 UTC
Created attachment 234817 [details]
fatal-trap
Comment 4 Santiago Martinez 2022-06-20 15:37:01 UTC
Sorry, I just submitted the image without text by mistake.
This is happening on a Supermicro server running 13.1.

We have servers with the same spec. Three of them run FreeBSD 13.0 with the Intel driver from ports and they don't show any issues at all (SR-IOV and PT are working).

One of the servers has been wiped and reinstalled with 13.1. Without the Intel driver from ports, it works well. As soon as the server is booted with the Intel driver from ports, it starts having issues. Sometimes it just freezes, and sometimes it triggers a trap 12 and the system reboots.

The trap always occurs when the interface gets activated (ifconfig ixl2 up).
Comment 5 Santiago Martinez 2022-06-20 15:54:32 UTC
Something interesting.

When compiling the driver from ports, the port provides three options related to netmap: auto, on and off.

1 - When compiling the driver with netmap off the problem disappears.

2 - When compiling the driver with netmap auto the driver compiles but triggers the trap when booted and the interface is activated.

3 - When compiling the driver with netmap on, the driver fails to compile, which probably explains the aforementioned behaviour.
Comment 6 Santiago Martinez 2022-06-20 16:24:16 UTC
I have done some tests and added some IOV, and things seem to work. Still, somebody should fix the netmap support (I'm not sure what the correct way to fix it is).

I have sent an email to freebsd@intel.com
Comment 7 Vincenzo Maffione freebsd_committer 2022-06-22 16:58:56 UTC
Just to make sure I understand correctly.
These issues are not affecting the iflib(4) based ixl driver that comes with the stock kernel, are they?
Is the issue only related to the intel drivers from ports?
Comment 8 benoitc 2022-06-22 17:10:56 UTC
I have the same Intel chipset on an HPE server (both media cards, with 1Gb and 10Gb). When IOV is enabled using the stock FreeBSD kernel ixl driver, I get a panic at boot.

When using the intel-ixl-kmod driver it works, but once SR-IOV is enabled I hit bug #234073. When I upgrade to the latest intel-ixl-kmod, the kernel panics and reboots.

```
 dmesg |grep ixl
module_register: cannot register pci/ixl from kernel; already loaded from if_ixl_updated.ko
Module pci/ixl failed to register: 17
ixl0: <Intel(R) Ethernet Connection 700 Series PF Driver, Version - 1.12.2> mem 0xeb000000-0xebffffff,0xef000000-0xef007fff at device 0.0 numa-domain 0 on pci9
ixl0: using 1024 tx descriptors and 1024 rx descriptors
ixl0: fw 5.5.67510 api 1.12 nvm 5.50 etid 80003373 oem 1.268.0
ixl0: The driver for the device detected a newer version of the NVM image than expected.
ixl0: Please install the most recent version of the network driver.
ixl0: PF-ID[0]: VFs 32, MSI-X 129, VF MSI-X 5, QPs 384, MDIO shared
ixl0: Using MSI-X interrupts with 9 vectors
ixl0: Allocating 8 queues for PF LAN VSI; 8 queues active
ixl0: Ethernet address: b4:7a:f1:dd:c6:00
ixl0: SR-IOV ready
ixl0: The device is not iWARP enabled
ixl1: <Intel(R) Ethernet Connection 700 Series PF Driver, Version - 1.12.2> mem 0xec000000-0xecffffff,0xef008000-0xef00ffff at device 0.1 numa-domain 0 on pci9
ixl1: using 1024 tx descriptors and 1024 rx descriptors
ixl1: fw 5.5.67510 api 1.12 nvm 5.50 etid 80003373 oem 1.268.0
ixl1: The driver for the device detected a newer version of the NVM image than expected.
ixl1: Please install the most recent version of the network driver.
ixl1: PF-ID[1]: VFs 32, MSI-X 129, VF MSI-X 5, QPs 384, MDIO shared
ixl1: Using MSI-X interrupts with 9 vectors
ixl1: Allocating 8 queues for PF LAN VSI; 8 queues active
ixl1: Ethernet address: b4:7a:f1:dd:c6:01
ixl1: SR-IOV ready
ixl1: The device is not iWARP enabled
ixl2: <Intel(R) Ethernet Connection 700 Series PF Driver, Version - 1.12.2> mem 0xed000000-0xedffffff,0xef010000-0xef017fff at device 0.2 numa-domain 0 on pci9
ixl2: using 1024 tx descriptors and 1024 rx descriptors
ixl2: fw 5.5.67510 api 1.12 nvm 5.50 etid 80003373 oem 1.268.0
ixl2: The driver for the device detected a newer version of the NVM image than expected.
ixl2: Please install the most recent version of the network driver.
ixl2: PF-ID[2]: VFs 32, MSI-X 129, VF MSI-X 5, QPs 384, I2C
ixl2: Using MSI-X interrupts with 9 vectors
ixl2: Allocating 8 queues for PF LAN VSI; 8 queues active
ixl2: Ethernet address: b4:7a:f1:dd:c6:02
ixl2: SR-IOV ready
ixl2: The device is not iWARP enabled
ixl3: <Intel(R) Ethernet Connection 700 Series PF Driver, Version - 1.12.2> mem 0xee000000-0xeeffffff,0xef018000-0xef01ffff at device 0.3 numa-domain 0 on pci9
ixl3: using 1024 tx descriptors and 1024 rx descriptors
ixl3: fw 5.5.67510 api 1.12 nvm 5.50 etid 80003373 oem 1.268.0
ixl3: The driver for the device detected a newer version of the NVM image than expected.
ixl3: Please install the most recent version of the network driver.
ixl3: PF-ID[3]: VFs 32, MSI-X 129, VF MSI-X 5, QPs 384, I2C
ixl3: Using MSI-X interrupts with 9 vectors
ixl3: Allocating 8 queues for PF LAN VSI; 8 queues active
ixl3: Ethernet address: b4:7a:f1:dd:c6:03
ixl3: SR-IOV ready
ixl3: The device is not iWARP enabled
ixl2: Link is up, 10 Gbps Full Duplex, Requested FEC: None, Negotiated FEC: None, Autoneg: False, Flow Control: None
ixl2: link state changed to UP
ixl3: Link is up, 10 Gbps Full Duplex, Requested FEC: None, Negotiated FEC: None, Autoneg: False, Flow Control: None
ixl3: link state changed to UP
```

```
pciconf -vl ixl2
ixl2@pci0:100:0:2:      class=0x020000 rev=0x09 hdr=0x00 vendor=0x8086 device=0x37d3 subvendor=0x1590 subdevice=0x0219
    vendor     = 'Intel Corporation'
    device     = 'Ethernet Connection X722 for 10GbE SFP+'
    class      = network
    subclass   = ethernet
```

Tried it (112.35) with netmap support disabled and got a kernel panic:

```
Fatal trap 12: page fault while in kernel mode
cpuid = 10; apic id = 0a
fault virtual address= 0x54
fault code= supervisor read data, page not present
instruction pointer= 0x20:0xffffffff82f04b8a
stack pointer        = 0x28:0xfffffe020d8108a0
frame pointer        = 0x28:0xfffffe020d8108e0
code segment= base rx0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process= 21330 (iovctl)
trap number= 12
panic: page fault
cpuid = 10
time = 1655897957
KDB: stack backtrace:
#0 0xffffffff80c69465 at kdb_backtrace+0x65
#1 0xffffffff80c1bb1f at vpanic+0x17f
#2 0xffffffff80c1b993 at panic+0x43
#3 0xffffffff810afdf5 at trap_fatal+0x385
#4 0xffffffff810afe4f at trap_pfault+0x4f
#5 0xffffffff81087528 at calltrap+0x8
#6 0xffffffff82efe716 at ixl_reconfigure_filters+0x66
#7 0xffffffff82f24b90 at ixl_vf_setup_vsi+0x3b0
#8 0xffffffff82f2469e at ixl_add_vf+0x1ee
#9 0xffffffff8086e407 at pci_iov_ioctl+0x1497
#10 0xffffffff80ab4e46 at devfs_ioctl+0xc6
#11 0xffffffff80d0cde4 at vn_ioctl+0x1a4
#12 0xffffffff80ab54fe at devfs_ioctl_f+0x1e
#13 0xffffffff80c897cb at kern_ioctl+0x25b
#14 0xffffffff80c894d1 at sys_ioctl+0xf1
#15 0xffffffff810b06ec at amd64_syscall+0x10c
#16 0xffffffff81087e3b at fast_syscall_common+0xf8
Uptime: 13s
```
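
A fault virtual address as small as 0x54 is often what a read through a NULL structure pointer looks like: the base is zero and the faulting address is just the offset of the field being read. That reading is an assumption here, not something the trace proves, and the structure below (model_obj) is a made-up layout, not the real type accessed in ixl_reconfigure_filters. The sketch only shows the arithmetic.

```c
/*
 * Illustration of how a NULL base pointer plus a field offset yields a
 * small fault address. The layout is hypothetical; the real structure
 * involved in the panic is not known from the backtrace.
 */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct model_obj {
	uint8_t  pad[0x54];	/* unrelated leading members */
	uint32_t field;		/* member located at offset 0x54 */
};

/* Compile-time check that the made-up layout matches the claim. */
_Static_assert(offsetof(struct model_obj, field) == 0x54,
    "field must sit at offset 0x54");

/* Address the CPU would touch when reading obj->field from this base. */
uintptr_t
model_fault_address(uintptr_t base)
{
	return (base + offsetof(struct model_obj, field));
}
```

Calling model_fault_address(0) gives 0x54, matching the "fault virtual address" line above. With a kernel core dump, the actual field could be identified by disassembling around ixl_reconfigure_filters+0x66 and comparing the offset against the driver's structure definitions.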
Comment 9 Santiago Martinez 2022-06-23 12:39:55 UTC
Hi Vincenzo, 

I think there are multiple issues that may need to be handled separately.

Current setup:
OS: 13.1-RELEASE
Card: Ethernet Connection X722 for 10GBASE-T

With stock kernel ixl driver: 
1- NO-SRIOV - the Ethernet card works, the port comes up and connectivity looks OK.
2- SRIOV - when iovctl -C -f /etc/iovctl.conf is executed, the kernel panics.


With ixl driver from ports:
Version: ixl 1.12.2
boot loader: ixl_updated_load="YES":
 
1- NO-SRIOV - when the Ethernet ports come up (ifconfig up) there is a panic.
This seems to be related to a problem with netmap, please see:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264809. Recompiling the driver from ports with netmap support disabled does the trick for both NO-SRIOV and SRIOV (at least for me).