Bug 253708 - Problem booting FreeBSD 12.2 on Dell R730xd unless verbose boot (possibly Intel ixl and/or lagg failover related)
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 12.2-RELEASE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-02-20 00:14 UTC by Peter Eriksson
Modified: 2021-02-20 20:53 UTC
Description Peter Eriksson 2021-02-20 00:14:36 UTC
There seems to be a timing-related problem when booting a Dell R730xd server with FreeBSD 12.2. It is possibly related to the Intel X710 10G Ethernet cards, since the freeze seems to happen when the second ixl port is activated.

It boots just fine with FreeBSD 11.3.

If I hit 7 during the boot sequence to enable verbose boot, however, the system seems to boot just fine.
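
For hosts where verbose boot is an acceptable workaround, it could be made persistent instead of being selected by hand at every boot. A minimal sketch, assuming the standard loader.conf(5) `boot_verbose` setting; this has not been tested as part of this report:

```sh
# /boot/loader.conf
# Assumption: this has the same effect as enabling verbose boot
# from the loader menu (i.e. booting with -v).
boot_verbose="YES"
```

This would only mask the symptom. If the hang really is timing-related, the extra console output probably just changes the timing enough to avoid it.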

Non-verbose serial port (via IPMI) boot output is below. One interesting detail is that the hang also seems to kill the IPMI session, even though IPMI runs over a dedicated separate port:

ixl3: Using MSI-X interrupts with 9 vectors
ixl3: Ethernet address: 3c:fd:fe:0f:3c:c6
ixl3: Allocating 8 queues for PF LAN VSI; 8 queues active
ixl3: PCI Express Bus: Speed 8.0GT/s Width x8
ixl3: Failed to initialize SR-IOV (error=2)
ixl3: netmap queues/slots: TX 8/1024, RX 8/1024
pci2: <unknown> at device 17.0 (no driver attached)
ahci0: <Intel Wellsburg AHCI SATA controller> port 0x3078-0x307f,0x308c-0x308f,0x3070-0x3077,0x3088-0x308b,0x3040-0x305f mem 0x97111000-0x971117ff at device 17.4 numa-domain 0 on pci2
ahci0: AHCI v1.30 with 4 6Gbps ports, Port Multiplier not supported
ahcich0: <AHCI channel> at channel 0 on ahci0
ahcich1: <AHCI channel> at channel 1 on ahci0
ahciem0: <AHCI enclosure management bridge> on ahci0
xhci0: <Intel Wellsburg USB 3.0 controller> mem 0x97100000-0x9710ffff at device 20.0 numa-domain 0 on pci2
xhci0: 32 bytes context size, 64-bit DMA
ixl0: Link is up, 10 Gbps Full Duplex, Requested FEC: None, Negotiated FEC: None, Autoneg: False, Flow Control: None
ixl0: link state changed to UP
usbus0 numa-domain 0 on xhci0
usbus0: 5.0Gbps Super Speed USB v3.0
pci2: <simple comms> at device 22.0 (no driver attached)
pci2: <simple comms> at device 22.1 (no driver attached)
ehci0: <Intel Wellsburg USB 2.0 controller> mem 0x97113000-0x971133ff at device 26.0 numa-domain 0 on pci2
usbus1: EHCI version 1.0
usbus1 numa-domain 0 on ehci0
usbus1: 480Mbps High Speed USB v2.0
pcib7: <PCI-PCI bridge> at device 28.0 numa-domain 0 on pci2
pci7: <PCI bus> numa-domain 0 on pcib7
pcib8: <ACPI PCI-PCI bridge> at device 28.7 numa-domain 0 on pci2
pci8: <ACPI PCI bus> numa-domain 0 on pcib8
pcib9: <PCI-PCI bridge> at device 0.0 numa-domain 0 on pci8
pci9: <PCI bus> numa-domain 0 on pcib9
pcib10: <PCI-PCI bridge> at device 0.0 numa-domain 0 on pci9
pci10: <PCI bus> numa-domain 0 on pcib10
ixl2: Link is upError: No response to keepalive - Terminating session

ixl0 and ixl2 are set up as lagg0 ports in failover mode.
Comment 1 Peter Eriksson 2021-02-20 20:53:26 UTC
I tried booting various kernels on our test R730xd server. It currently uses an LACP lagg and runs FreeBSD 11.4, but is otherwise identical to the (production) servers that are experiencing this.


Result, in short: the problem seems to be with lagg in failover mode.


With lagg0 as LACP:

ifconfig_lagg0="laggproto lacp laggport ixl0 laggport ixl2 130.236.8.40 netmask 255.255.255.224 lacp_fast_timeout"
ifconfig_lagg0_ipv6="inet6 2001:6b0:17:2400::8:40/64 lacp_fast_timeout"

Works fine:
- FreeBSD 13.0-STABLE #1 stable/13-d69677407: Sat Feb 20 19:43:06 CET 2021
- FreeBSD 12.2-STABLE stable/12-n1-d666638e2 GENERIC amd64
- FreeBSD 12.2-RELEASE-p3 #41 r369323M: Sat Feb 20 21:06:13 CET 2021
- FreeBSD 11.4-RELEASE-p7 #0: Wed Jan 27 16:09:57 UTC 2021


With lagg0 as Failover:

ifconfig_lagg0="laggproto failover laggport ixl0 laggport ixl2 130.236.8.40 netmask 255.255.255.224 -use_flowid"
ifconfig_lagg0_ipv6="inet6 2001:6b0:17:2400::8:40/64 -use_flowid"

Hangs on boot:
- FreeBSD 12.2-RELEASE-p3 #41 r369323M: Sat Feb 20 21:06:13 CET 2021

Works:
- FreeBSD 13.0-STABLE #1 stable/13-d69677407: Sat Feb 20 19:43:06 CET 2021
- FreeBSD 12.2-STABLE stable/12-n1-d666638e2 GENERIC amd64
- FreeBSD 11.4-RELEASE-p7 #0: Wed Jan 27 16:09:57 UTC 2021
- FreeBSD 11.3-RELEASE-p10 #0: Tue Jun  9 08:49:05 UTC 2020



ixl0: <Intel(R) Ethernet Controller X710 for 10GbE SFP+ - 2.3.0-k> mem 0xc9000000-0xc9ffffff,0xca008000-0xca00ffff at device 0.0 numa-domain 1 on pci15
ixl0: fw 7.83.59945 api 1.9 nvm 7.10 etid 800075df oem 19.4864.12
ixl0: PF-ID[0]: VFs 64, MSI-X 129, VF MSI-X 5, QPs 768, I2C
ixl0: Using 1024 TX descriptors and 1024 RX descriptors
ixl0: Using 8 RX queues 8 TX queues
ixl0: Using MSI-X interrupts with 9 vectors
ixl0: Ethernet address: 3c:fd:fe:24:e7:e0
ixl0: Allocating 8 queues for PF LAN VSI; 8 queues active
ixl0: PCI Express Bus: Speed 8.0GT/s Width x8
ixl0: Failed to initialize SR-IOV (error=2)
ixl0: netmap queues/slots: TX 8/1024, RX 8/1024