Bug 273372 - SR-IOV Networking in Bhyve Causes Chelsio T520-SO-CR to Fail on Host, Kernel Panic if Reset
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 14.0-STABLE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: freebsd-virtualization (Nobody)
URL:
Keywords: bhyve, crash
Depends on:
Blocks:
 
Reported: 2023-08-26 22:31 UTC by Mark McBride
Modified: 2024-04-01 00:14 UTC

See Also:


Description Mark McBride 2023-08-26 22:31:33 UTC
If I set up SR-IOV for bhyve using a Chelsio T520-SO-CR network adapter and start the install process for a FreeBSD guest, then shortly (maybe a minute) after I load the driver for the PCI device in the guest, the network card stops responding in both the guest and the host.

# Relevant /boot/loader.conf

vmm_load="YES"
nmdm_load="YES"

t5fw_cfg_load="YES"
if_cxgbe_load="YES"
if_cxgbev_load="YES"

# Relevant /etc/rc.conf

iovctl_files="/etc/iov/cxl1.conf"

vm_enable="YES"
vm_dir="zfs:zapps/bhyve"
vm_list=""
vm_delay="5"


# Relevant /etc/iov/cxl1.conf

PF {
    device : "cxl1";
    num_vfs : 14;
}
DEFAULT {
    passthrough : false;
}
# ...
# VFs for bhyve
VF-10 {
    mac-addr : "aa:88:44:00:02:20";
    passthrough : true;
}
# ...


### pciconf -lvbc in host
ppt0@pci0:2:0:49:       class=0x020000 rev=0x00 hdr=0x00 vendor=0x1425 device=0x5807 subvendor=0x1425 subdevice=0x0000
    vendor     = 'Chelsio Communications Inc'
    device     = 'T520-SO Unified Wire Ethernet Controller [VF]'
    class      = network
    subclass   = ethernet
    bar   [10] = type Memory, range 32, base rxdd00a000, size 4096, enabled
    bar   [18] = type Memory, range 32, base rxdd060000, size 32768, enabled
    bar   [20] = type Memory, range 32, base rxdd194000, size 8192, enabled
    cap 10[70] = PCI-Express 2 endpoint max data 256(2048) FLR NS
                 max read 512
                 link x0(x8) speed 0.0(8.0)
    cap 11[b0] = MSI-X supports 8 messages
                 Table in map 0x20[0x0], PBA in map 0x20[0x8000]
    cap 05[50] = MSI supports 32 messages, 64 bit, vector masks
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 0 corrected
    ecap 000e[140] = ARI 1
    ecap 0017[150] = TPH Requester 1

Use vm install freebsd-test FreeBSD-13.2-RELEASE-amd64-bootonly.iso
Config line for passthru:
passthru0="2/0/49"
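
The passthru0 value above uses the same bus, slot, and function numbers as the pciconf selector of the VF (ppt0@pci0:2:0:49 becomes 2/0/49). A minimal sketch of that conversion in POSIX sh, assuming the selector has the driver@pciDOMAIN:BUS:SLOT:FUNC form shown in the pciconf output; the function name pci_selector_to_passthru is hypothetical and is not part of vm-bhyve or pciconf:

```shell
# Hypothetical helper: turn a pciconf selector into a bus/slot/function string.
# Accepts "ppt0@pci0:2:0:49:" as well as a bare "pci0:2:0:49".
pci_selector_to_passthru() {
    # Drop the driver name and "@" if present.
    sel=${1#*@}
    # Drop the trailing ":" that pciconf prints after the selector.
    sel=${sel%:}
    # Split on ":" into domain, bus, slot, function.
    IFS=: read -r _dom bus slot func <<EOF
$sel
EOF
    printf '%s/%s/%s\n' "$bus" "$slot" "$func"
}
```

For example, pci_selector_to_passthru 'ppt0@pci0:2:0:49:' yields the value used in the config line above.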

Connect with vm console freebsd-test
Installer starts, choose shell.

### pciconf -lvbc in guest, prior to driver load
none0@pci0:0:5:0:       class=0x020000 rev=0x00 hdr=0x00 vendor=0x1425 device=0x5807 subvendor=0x1425 subdevice=0x0000
    vendor     = 'Chelsio Communications Inc'
    device     = 'T520-SO Unified Wire Ethernet Controller [VF]'
    class      = network
    subclass   = ethernet
    bar   [10] = type Memory, range 32, base rxc000e000, size 4096, enabled
    bar   [18] = type Memory, range 32, base rxc0000000, size 32768, enabled
    bar   [20] = type Memory, range 32, base rxc000c000, size 8192, enabled
    cap 10[70] = PCI-Express 2 endpoint max data 256(2048) FLR NS
                 max read 512
                 link x0(x8) speed 0.0(8.0)
    cap 11[b0] = MSI-X supports 8 messages
                 Table in map 0x20[0x0], PBA in map 0x20[0x8000]
    cap 05[50] = MSI supports 32 messages, 64 bit, vector masks

# kldload cxlv
t5vf0: <Chelsio T520-SO VF> mem 0xc000e000-0xc000efff,0xc0000000-0xc0007fff,0xc000c000-0xc000dfff at device 5.0 on pci0
t5vf0: 1 ports, 2 MSI-X interrupts, 3 eq, 2 iq
cxlv0: <port 0> on t5vf0
cxlv0: 1 txq, 1 rxq (NIC)

### pciconf -lvbc in guest, after driver load
t5vf0@pci0:0:5:0:       class=0x020000 rev=0x00 hdr=0x00 vendor=0x1425 device=0x5807 subvendor=0x1425 subdevice=0x0000
    vendor     = 'Chelsio Communications Inc'
    device     = 'T520-SO Unified Wire Ethernet Controller [VF]'
    class      = network
    subclass   = ethernet
    bar   [10] = type Memory, range 32, base rxc000e000, size 4096, enabled
    bar   [18] = type Memory, range 32, base rxc0000000, size 32768, enabled
    bar   [20] = type Memory, range 32, base rxc000c000, size 8192, enabled
    cap 10[70] = PCI-Express 2 endpoint max data 256(2048) FLR NS
                 max read 512
                 link x0(x8) speed 0.0(8.0)
    cap 11[b0] = MSI-X supports 8 messages, enabled
                 Table in map 0x20[0x0], PBA in map 0x20[0x8000]
    cap 05[50] = MSI supports 32 messages, 64 bit, vector masks

Shortly after loading the driver, I lose networking on the host.
dmesg shows nothing after the event.

# ifconfig looks normal
cxl0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=6ec07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6,HWRXTSTMP,NOMAP>
        ether 00:07:43:36:bc:80
        inet 10.0.1.201 netmask 0xffffff00 broadcast 10.0.1.255
        media: Ethernet 10Gbase-Twinax <full-duplex,rxpause,txpause>
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>

# ifconfig cxl0 down
# ifconfig cxl0 up
# Aug 26 15:21:11 core18 kernel: t5nex0: command 0x16 in mbox 4 timed out (0x4014c010).
Aug 26 15:21:11 core18 kernel: t5nex0: mbox 4 cmdsent 16a0094400000001 05dc050000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Aug 26 15:21:11 core18 kernel: t5nex0: mbox 4 current 16a0094400000001 05dc050000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Aug 26 15:21:11 core18 kernel: t5nex0: encountered fatal error, adapter stopped (1).
Comment 1 Mark McBride 2023-08-26 23:50:08 UTC
Tried with 14.0-ALPHA3 from https://download.freebsd.org/snapshots/VM-IMAGES/14.0-ALPHA3/amd64/Latest/

Same results. The driver loads in the guest. All seems well for a few seconds, and then all networking becomes unavailable.
Comment 2 Navdeep Parhar freebsd_committer freebsd_triage 2023-09-05 16:28:57 UTC
I tried 13.2 as well as 14.0-alpha3 on bare metal, running 13.2 and 14.0-alpha3
inside bhyve VMs.  The host creates VF devices from a T520 and passes them
through -- one to each VM.  The cxgbev VF driver inside the VM worked properly
in all cases.

The only bug I ran into was the kernel creating unusable VFs for a card in a
particular slot.  It was fixed in
https://cgit.freebsd.org/src/commit/?id=7063f94283af60818429a0c2d70e80ae4ad5c146
That is not the root cause of this problem.

Can you please try these to debug your problem further:
1. Reduce num_vfs to 2 or 4 (from 14) temporarily, for testing.
2. When the problem occurs, grab the device log _before_ attempting any ifconfig
   down/up.  sysctl -n dev.t5nex.0.misc.devlog
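
One way to capture the log as requested in step 2, sketched as a small POSIX sh function. The function name save_devlog, the default directory /var/tmp, and the file naming are assumptions for illustration; only the sysctl node comes from the request above:

```shell
# Hypothetical helper: save the adapter's firmware device log to a timestamped file.
# $1 = t5nex unit number (default 0), $2 = output directory (default /var/tmp).
save_devlog() {
    unit=${1:-0}
    dir=${2:-/var/tmp}
    out="$dir/t5nex$unit-devlog-$(date +%Y%m%d-%H%M%S).txt"
    # Read the log once and print the file path only if the read succeeded.
    sysctl -n "dev.t5nex.$unit.misc.devlog" > "$out" && echo "$out"
}
```

Running save_devlog 0 on the host as soon as the card stops responding, before any ifconfig down/up, keeps a copy of the log from the moment of failure.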
Comment 3 Mark McBride 2023-09-06 01:24:28 UTC
After reducing num_vfs to 2 (from 14), and setting both to passthrough, I can boot the bhyve vm without issue. It's been online now for ~15 minutes. Previous fail time was less than a minute. Whatever the issue, this seems to avoid it.

Not sure if it's important, but with 14 VFs, 10 of them were not passthrough (for jails), and 4 of them were. Mentioning it in case the mix has anything to do with it.
Comment 4 Mark McBride 2023-09-09 19:51:38 UTC
I just attempted 14 VFs with 14.0-BETA1 as the host. It still fails. As requested ...

# sysctl -n dev.t5nex.0.misc.devlog                                                                                                                                                                             
      Seq#           Tstamp     Level  Facility  Message                                                                                                                                                                      
         0           427889      INFO      CORE  log initialized @ 0x200a0000 size 32768 (128 entries) fwrev 0x011b0400 pcie_fw 0x0014cc10                                                                                    
         1           521576      INFO      CORE  bootstrap firmware took 27 msecs to run                                                                                                                                      
         2           525170      INFO      CORE  Serial Configuration version: 0x1006000 VPD version: 0x1                                                                                                                     
         3           525173      INFO      CORE  pcie: npf 7 (pfbitmap 0x7f) nvf 64 (pf 0..7 0x1010101000000000) vfstride 4                                                                                                   
         4           535983      INFO      CORE  flr_timer_start: flowc_id 125 0x205fff80 buf 0x205fc0c0                                                                                                                      
         5           575529      INFO        CF  cf_parse: file memtype 0x1 memaddr 0x5e0000 mapped @ 0x205e0000:                                                                                                             
         6           575559      INFO        CF  configuration file parser: pl timeout value is too large, changing from 10000 to 4194usecs                                                                                   
         7           576541      INFO      CORE  configured with caps nbm|link 0x00000000 switch|nic 0x00030001 toe|rdma 0x00010003 iscsi|crypto 0x00410000 fcoe:0x0                                                          
         8           576579    NOTICE      CORE  pfvf_init: 150 SMT entires are still unused                                                                                                                                  
         9           576592      INFO        HW  hw_tp_tcp_tunings: tuning for LAN environment                                                                                                                                
        10           576622      INFO       RES  le configuration: nentries 2048 route 32 clip 32 filter 1440 server 416 active 128 hash 0 nserversram 0                                                                      
        11           577565      INFO      DMAQ  Firmware using new VIID format ([11:9] PFN, [8] VI Valid, [7:0] VIN)                                                                                                         
        12           578623      INFO       RES  flowc_table 0x200c8000, size 211456                                                                                                                                          
        13           580825    NOTICE        RI  Not enough memory for 21 ird/ord per conn, ird/ord changed to 21                                                                                                             
        14           582460      INFO      CORE  flr_timer_start: flowc_id 1651 0x200fb980 buf 0x2039c800                                                                                                                     
        15           612719      INFO       RES  le initialization: nentries 2048 route 32 clip 32 filter 1440 server 416 active 128 hash 0 nserversram 0                                                                     
        16           836644      INFO      PORT  module[0]: gpio 11 vendor id 004020, identifier 0x03, SFP28(byte 36/192) 0xff, SFP(byte 3/131) 0x00, 1G (byte 6) 0x00                                                        
        17           836646      INFO      PORT  optical length(byte 15/142) 0, copper cable(byte 8/147) 0x04, length(byte 18/146) 3, module_type 0x04                                                                        
        18           836649      INFO      PORT  hw_mac_init_port[0], ptype 0x9, speed 0x4, lanes 0x1, fec 0x0                                                                                                                
        19           836685      INFO        TM  pktsched channel 0 sets speed (from 0) to 10000000 kbps                                                                                                                      
        20           836693      INFO      PORT  port_init: TRY_OTHER_SPEED flag is default set for T5                                                                                                                        
        21           836696      INFO      PORT  port_refill_other_speed: module speed 0x4, try speed 0x6                                                                                                                     
        22           842475      INFO      PORT  module[1]: gpio 15 vendor id 004020, identifier 0x03, SFP28(byte 36/192) 0xff, SFP(byte 3/131) 0x00, 1G (byte 6) 0x00                                                        
        23           842476      INFO      PORT  optical length(byte 15/142) 0, copper cable(byte 8/147) 0x04, length(byte 18/146) 3, module_type 0x04                                                                        
        24           842478      INFO      PORT  hw_mac_init_port[1], ptype 0x9, speed 0x4, lanes 0x2, fec 0x0                                                                                                                
        25           842503      INFO        TM  pktsched channel 1 sets speed (from 0) to 10000000 kbps                                                                                                                      
        26           842509      INFO      PORT  port_init: TRY_OTHER_SPEED flag is default set for T5                                                                                                                        
        27           842510      INFO      PORT  port_refill_other_speed: module speed 0x4, try speed 0x6                                                                                                                     
        28           842521      INFO        TM  pktsched channel 2 sets speed (from 0) to 1000000 kbps                                                                                                                       
        29           842534      INFO        TM  pktsched channel 3 sets speed (from 0) to 1000000 kbps                                                                                                                       
        30         11621579    NOTICE       FLR  pfn 1 vfn 1 via command                                                                                                                                                      
        31         11675139    NOTICE       FLR  pfn 1 vfn 2 via command                                                                                                                                                      
        32         11732187    NOTICE       FLR  pfn 1 vfn 3 via command                                                                                                                                                      
        33         11784054    NOTICE       FLR  pfn 1 vfn 4 via command                              
        34         11836035    NOTICE       FLR  pfn 1 vfn 5 via command
        35         11888064    NOTICE       FLR  pfn 1 vfn 6 via command
        36         11947325    NOTICE       FLR  pfn 1 vfn 7 via command
        37         12002982    NOTICE       FLR  pfn 1 vfn 8 via command
        38         12061187    NOTICE       FLR  pfn 1 vfn 9 via command
        39         12114554    NOTICE       FLR  pfn 1 vfn 10 via command
        40         12442526      INFO      PORT  port_link_state_handler[0] powering up
        41         12442541      INFO      PORT  port[0] update (flowcid 1456 rc 0)
        42         12742529      INFO      PORT  port_hss_sigdet[0]: hss_sigdet changed to 0x1
        43         12742533      INFO      PORT  hw_mac_link_status[0] int_cause 0x13019f0, link_status 0x12
        44         12942526      INFO      PORT  port[0] link up (1) (speed 0x4 acaps 0x70004 lpcaps 0x70000)
        45         12942531      INFO      PORT  port[0] set PAUSE PARAMS: pppen 0 txpe 0x1 rxpe 0x1
        46         12942539      INFO      PORT  port[0] update (flowcid 1456 rc 0)
        47         26080128       ERR      CORE  insufficient caps to process mailbox cmd: pfn 0x1 vfn 0x1; r_caps 0x86 wx_caps 0x82 required r_caps 0x81 w_caps 0x5
        48         26142587      INFO      PORT  port_link_state_handler[1] powering up
        49         26142593      INFO      PORT  port[1] update (flowcid 1475 rc 0)
        50         26442579      INFO      PORT  port_hss_sigdet[1]: hss_sigdet changed to 0x2
        51         28492130       ERR      CORE  insufficient caps to process mailbox cmd: pfn 0x1 vfn 0x2; r_caps 0x86 wx_caps 0x82 required r_caps 0x81 w_caps 0x5
        52         28542596      INFO      PORT  port[1] update (flowcid 1475 rc 0)
        53         28542600      INFO      PORT  port[1] update (flowcid 1476 rc 0)
        54         30042592      INFO      PORT  port_hss_sigdet[1]: hss_sigdet changed to 0x0
        55         30642594      INFO      PORT  port_hss_sigdet[1]: hss_sigdet changed to 0x2
        56         30642596      INFO      PORT  hw_mac_link_status[1] int_cause 0x10018f0, link_status 0x12
        57         30742595      INFO      PORT  hw_mac_link_status[1] int_cause 0x10018f0, link_status 0x12
        58         30942601      INFO      PORT  port[1] link up (1) (speed 0x4 acaps 0x70004 lpcaps 0x70000)
        59         30942605      INFO      PORT  port[1] set PAUSE PARAMS: pppen 0 txpe 0x1 rxpe 0x1
        60         30942614      INFO      PORT  port[1] update (flowcid 1475 rc 0)
        61         30942617      INFO      PORT  port[1] update (flowcid 1476 rc 0)
        62         30948649       ERR      CORE  insufficient caps to process mailbox cmd: pfn 0x1 vfn 0x3; r_caps 0x86 wx_caps 0x82 required r_caps 0x81 w_caps 0x5
        63         31042603      INFO      PORT  port[1] update (flowcid 1475 rc 0)
        64         31042607      INFO      PORT  port[1] update (flowcid 1476 rc 0)
        65         31042610      INFO      PORT  port[1] update (flowcid 1477 rc 0)
        66         33283779       ERR      CORE  insufficient caps to process mailbox cmd: pfn 0x1 vfn 0x4; r_caps 0x86 wx_caps 0x82 required r_caps 0x81 w_caps 0x5
        67         33342616      INFO      PORT  port[1] update (flowcid 1475 rc 0)
        68         33342621      INFO      PORT  port[1] update (flowcid 1476 rc 0)
        69         33342625      INFO      PORT  port[1] update (flowcid 1477 rc 0)
        70         33342628      INFO      PORT  port[1] update (flowcid 1478 rc 0)
        71         36739130       ERR      CORE  insufficient caps to process mailbox cmd: pfn 0x1 vfn 0x6; r_caps 0x86 wx_caps 0x82 required r_caps 0x81 w_caps 0x5
        72         36742629      INFO      PORT  port[1] update (flowcid 1475 rc 0)
        73         36742634      INFO      PORT  port[1] update (flowcid 1476 rc 0)
        74         36742637      INFO      PORT  port[1] update (flowcid 1477 rc 0)
        75         36742641      INFO      PORT  port[1] update (flowcid 1478 rc 0)
        76         36742644      INFO      PORT  port[1] update (flowcid 1480 rc 0)
        77         39149129       ERR      CORE  insufficient caps to process mailbox cmd: pfn 0x1 vfn 0x7; r_caps 0x86 wx_caps 0x82 required r_caps 0x81 w_caps 0x5
        78         39242638      INFO      PORT  port[1] update (flowcid 1475 rc 0)
        79         39242643      INFO      PORT  port[1] update (flowcid 1476 rc 0)
        80         39242646      INFO      PORT  port[1] update (flowcid 1477 rc 0)
        81         39242650      INFO      PORT  port[1] update (flowcid 1478 rc 0)
        82         39242653      INFO      PORT  port[1] update (flowcid 1480 rc 0)
        83         39242656      INFO      PORT  port[1] update (flowcid 1481 rc 0)
        84         43102156       ERR      CORE  insufficient caps to process mailbox cmd: pfn 0x1 vfn 0x8; r_caps 0x86 wx_caps 0x82 required r_caps 0x81 w_caps 0x5
        85         43142655      INFO      PORT  port[1] update (flowcid 1475 rc 0)
        86         43142659      INFO      PORT  port[1] update (flowcid 1476 rc 0)
        87         43142663      INFO      PORT  port[1] update (flowcid 1477 rc 0)
        88         43142666      INFO      PORT  port[1] update (flowcid 1478 rc 0)
        89         43142670      INFO      PORT  port[1] update (flowcid 1480 rc 0)
        90         43142673      INFO      PORT  port[1] update (flowcid 1481 rc 0)
        91         43142676      INFO      PORT  port[1] update (flowcid 1482 rc 0)
        92       1548458671      INFO       FLR  pfn 1 vfn 11 FSM start
        93       1548528664      INFO       FLR  pfn 1 vfn 11 FSM complete
        94       1600183614    NOTICE       FLR  pfn 1 vfn 11 via command
        95       1643893645       ERR      CORE  insufficient caps to process mailbox cmd: pfn 0x1 vfn 0xb; r_caps 0x86 wx_caps 0x82 required r_caps 0x81 w_caps 0x5
        96       1643949059      INFO      PORT  port[1] update (flowcid 1475 rc 0)
        97       1643949064      INFO      PORT  port[1] update (flowcid 1476 rc 0)
        98       1643949068      INFO      PORT  port[1] update (flowcid 1477 rc 0)
        99       1643949072      INFO      PORT  port[1] update (flowcid 1478 rc 0)
       100       1643949075      INFO      PORT  port[1] update (flowcid 1480 rc 0)
       101       1643949078      INFO      PORT  port[1] update (flowcid 1481 rc 0)
       102       1643949081      INFO      PORT  port[1] update (flowcid 1482 rc 0)
       103       1643949084      INFO      PORT  port[1] update (flowcid 1485 rc 0)
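
The ERR entries are easy to lose among the port updates. A minimal sketch of a filter in POSIX sh and awk that prints the sequence number, pfn, and vfn of each "insufficient caps" entry, assuming the column layout shown above (Seq#, Tstamp, Level, Facility, Message); the function name devlog_caps_errors is hypothetical:

```shell
# Hypothetical helper: read a devlog on stdin and print "seq pfn vfn"
# for every ERR entry that reports insufficient mailbox capabilities.
devlog_caps_errors() {
    awk '$3 == "ERR" && /insufficient caps/ {
        pfn = ""; vfn = ""
        for (i = 1; i < NF; i++) {
            if ($i == "pfn") pfn = $(i + 1)
            if ($i == "vfn") vfn = $(i + 1)
        }
        sub(/;$/, "", vfn)   # the vfn value is followed by ";" in the log
        print $1, pfn, vfn
    }'
}
```

Used as sysctl -n dev.t5nex.0.misc.devlog | devlog_caps_errors. On the log above it would list eight entries, all for pfn 0x1, with vfn 0x1, 0x2, 0x3, 0x4, 0x6, 0x7, 0x8, and 0xb.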
Comment 5 Mark McBride 2023-09-23 18:22:08 UTC
I saw there were driver fixes in the 14 beta releases, so I am just commenting to state that the problem is still present in 14.0-BETA3.
Comment 6 Graham Perrin 2023-09-23 18:55:38 UTC
^Triage: 

* component kern for a kernel panic
* version
* severity
* keyword
* status.
Comment 7 Mark McBride 2023-10-28 21:36:39 UTC
Still present in 14.0-RC3. For clarity, the kernel issue only happens if I try to reset the PCI device with devctl AFTER the card/driver fails.

To rule out the motherboard/processor being the culprit on the PCI bus, I have tested with a Supermicro X11SSM-F and Intel Xeon E3-1275 v6, and with a Supermicro X12STH-F and Intel Xeon E-2388G. Both fail with the same symptoms. The driver loads and the device looks active in the VM, but I can't ping anything, dhclient does nothing, and then after a minute or so the whole card fails (both physical ports, not just the one I'm using for SR-IOV).
Comment 8 Mark McBride 2023-11-06 22:44:30 UTC
... the plot thickens.

Out of curiosity I put a 2nd network card in the system and set it up with SR-IOV (an Intel X710-DA2). I enabled 4 VFs, all passthrough. I booted up a virtual machine and got an IP just fine using one of the X710 VFs.

And then the Chelsio died! 

To be sure, I connected to the system using serial-over-LAN. Sure enough, I could connect to the virtual machine's console just fine and ping things over the X710 VF, but the Chelsio card, which was not being used at all by the virtual machine, was unresponsive.

In short, enabling SR-IOV on an Intel X710 kills my Chelsio card that has 6 vfs for jails.
Comment 9 Mark McBride 2023-12-08 14:10:09 UTC
More data (as of 14.0-RELEASE-p2)

Bug still present. I reversed my testing to see how things would go with a VM-first approach.

* I disabled all jails
* I set SR-IOV on each of 2 NIC ports to 2 passthrough, 4 non-passthrough. Altogether, 12 VFs on 2 PFs.
* I set the processor's iGPU device and 4 NIC passthrough VFs for passthrough in /boot/loader.conf via pptdevs=
* I started a Debian VM (gpu + VF) and FreeBSD VM (VF) using vm-bhyve

Everything works fine. Both VMs start, and I can see the GPU and set up networking on the VFs in Debian. Networking is fine in FreeBSD as well. No issues after several hours.

However, starting any jail using an SR-IOV VF will bring it all down.

* I configured jails.conf with 1 jail: vnet; vnet.interface = "cxlv0";
* I start the jail

Almost immediately the Chelsio card fails. As usual, I can connect to the system via serial console and all seems well except networking is dead.

In summary, I can create VMs, or I can create jails. The first will succeed, and the second will trigger failure.
Comment 10 Mark McBride 2024-02-02 20:20:46 UTC
Problem still exists. Adding this link as it is a very detailed walk-through of exactly what I'm doing and the hardware I'm using:

https://markmcb.com/freebsd/vs_linux/sriov_is_first_class/