Bug 290793 - iovctl on mlx5en won't work
Summary: iovctl on mlx5en won't work
Status: In Progress
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: 15.0-CURRENT
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: Bjoern A. Zeeb
URL:
Keywords: regression
Depends on:
Blocks:
 
Reported: 2025-11-04 16:11 UTC by David BOYER
Modified: 2025-12-08 15:45 UTC
CC List: 8 users

See Also:
Flags:
bz: mfc-stable15?
bz: mfc-stable14-


Attachments
patch for sr-iov VF in linux_pci.c (746 bytes, patch)
2025-11-21 11:45 UTC, David BOYER
no flags

Description David BOYER 2025-11-04 16:11:19 UTC
Hello,
I am not able to create functional VFs on FreeBSD 15, while it works on 14.3.

I tried different scenarios, for example with and without a mac-addr, and with one or multiple VFs, but it made no difference. I also tried firmware versions 14.32.1010 and 14.32.1908.

The behavior is the same on different machines.

---------------------------------
The iovctl config looks like this (MAC address redacted):

PF {
        device : "mlx5_core0";
        num_vfs : 1;
}

DEFAULT {
        passthrough : false;
}

VF-0 {
        mac-addr : "aa:bb:cc:dd:ee:ff";
}
--------------------------------------------

pciconf for the 1st port:

root@pulseczar:~ # pciconf -lcv  mlx5_core0@pci0:1:0:0
mlx5_core0@pci0:1:0:0:  class=0x020000 rev=0x00 hdr=0x00 vendor=0x15b3 device=0x1015 subvendor=0x15b3 subdevice=0x0003
    vendor     = 'Mellanox Technologies'
    device     = 'MT27710 Family [ConnectX-4 Lx]'
    class      = network
    subclass   = ethernet
    cap 10[60] = PCI-Express 2 endpoint max data 512(512) FLR RO NS
                 max read 512
                 link x8(x8) speed 8.0(8.0) ASPM L1(L1)
    cap 03[48] = VPD
    cap 11[9c] = MSI-X supports 64 messages, enabled
                 Table in map 0x10[0x2000], PBA in map 0x10[0x3000]
    cap 09[c0] = vendor (length 24)
    cap 01[40] = powerspec 3  supports D0 D3  current D0
    ecap 0001[100] = AER 1 0 fatal 0 non-fatal 1 corrected
    ecap 000e[150] = ARI 1
    ecap 0010[180] = SR-IOV 1 IOV enabled, Memory Space enabled, ARI enabled
                     2 VFs configured out of 8 supported
                     First VF RID Offset 0x0002, VF RID Stride 0x0001
                     VF Device ID 0x1016
                     Page Sizes: 4096 (enabled), 8192, 16384, 32768, 65536, 131072, 262144, 524288, 1048576, 2097152, 4194304
    iov bar  [1a4] = type Prefetchable Memory, range 64, base rxfcf6000000, size 1048576, enabled
    ecap 0019[1c0] = PCIe Sec 1 lane errors 0
    ecap 000d[230] = ACS 1 Source Validation unavailable, Translation Blocking unavailable
                     P2P Req Redirect unavailable, P2P Cmpl Redirect unavailable
                     P2P Upstream Forwarding unavailable, P2P Egress Control unavailable
                     P2P Direct Translated unavailable, Enhanced Capability unavailable

----------------------------------------------------------

pciconf for the 1st VF:

none0@pci0:1:0:2:       class=0x020000 rev=0x00 hdr=0x00 vendor=0x15b3 device=0x1016 subvendor=0x15b3 subdevice=0x0003
    vendor     = 'Mellanox Technologies'
    device     = 'MT27710 Family [ConnectX-4 Lx Virtual Function]'
    class      = network
    subclass   = ethernet
    cap 10[60] = PCI-Express 2 endpoint max data 128(512) FLR
                 max read 128
                 link x0(x8) speed 0.0(8.0) ASPM disabled(L1)
    cap 11[9c] = MSI-X supports 12 messages
                 Table in map 0x10[0x2000], PBA in map 0x10[0x3000]

-----------------------------------------------------------

extract from dmesg:

mlx5_core: INFO: (mlx5_core0): E-Switch: SRIOV enabled: active vports(3)
mlx5_core2: <mlx5_core> at device 0.2 on pci1
mlx5_core2: lkpi_pci_request_region: failed to alloc bar 0 type 3 rid 16
mlx5_core2: WARN: wait_fw_init:778:(pid 4309): Waiting for FW initialization, timeout abort in 3 s
mlx5_core2: WARN: wait_fw_init:778:(pid 4309): Waiting for FW initialization, timeout abort in 0 s
mlx5_core2: Firmware over 5000 MS in pre-initializing state, aborting
mlx5_core2: ERR: init_one:1709:(pid 4309): mlx5_load_one failed -16
device_attach: mlx5_core2 attach returned 16

----------------------------------
Comment 1 David BOYER 2025-11-12 22:46:39 UTC
As a side note, I was finally able to test the same configuration with an Intel X710 card, and it works correctly.
Comment 2 David BOYER 2025-11-21 00:58:56 UTC
The regression was introduced by commit ff31767e530ab.
The BAR appears not to be valid (yet?).
I tried to find a function or a boolean such as "is_vf" to use in the if condition, but I failed.
The only working patch I could come up with is the following:


diff --git a/sys/compat/linuxkpi/common/src/linux_pci.c b/sys/compat/linuxkpi/common/src/linux_pci.c
index 8507a59a8df3..04ff817666a3 100644
--- a/sys/compat/linuxkpi/common/src/linux_pci.c
+++ b/sys/compat/linuxkpi/common/src/linux_pci.c
@@ -1223,12 +1223,6 @@ lkpi_pci_request_region(struct pci_dev *pdev, int bar, const char *res_name,
        if (!lkpi_pci_bar_id_valid(bar))
                return (-EINVAL);
 
-       /*
-        * If the bar is not valid, return success without adding the BAR;
-        * otherwise linuxkpi_pcim_request_all_regions() will error.
-        */
-       if (pci_resource_len(pdev, bar) == 0)
-               return (0);
        /* Likewise if it is neither IO nor MEM, nothing to do for us. */
        type = pci_resource_type(pdev, bar);
        if (type < 0)
Comment 3 David BOYER 2025-11-21 11:45:02 UTC
Created attachment 265551
patch for sr-iov VF in linux_pci.c

On FreeBSD, the BAR size of VFs can temporarily be 0. This makes lkpi_pci_request_region() return early, preventing any further handling of the BAR.
This patch introduces an exception for VFs so they can work properly.

A better approach might be to check the size of the BAR when initializing the PF.
Comment 4 Bjoern A. Zeeb freebsd_committer freebsd_triage 2025-11-21 18:51:58 UTC
(In reply to David BOYER from comment #3)

What's the call graph into this?


I can only find mlx5_core/mlx5_main.c::request_bar -> pci_request_regions()
which ends up at the end of a loop in lkpi_pci_request_region().

You made me actually look at Linux code and there you end up in __pci_request_region()
which has the same checks at the beginning it seems:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/pci.c#n3935

So the problem is likely elsewhere in the LinuxKPI emulation or in mlx5.


Also the error logged:
mlx5_core2: lkpi_pci_request_region: failed to alloc bar 0 type 3 rid 16
comes from after that check?  So if it would fail and return we would not see this?  Is that a secondary problem?


If you say "On FreeBSD, the BAR size of VFs can temporarily be 0 and."
Wouldn't we expect mlx5_core to only call pci_request_regions() once they are properly initialized?


I am trying to get access to an mlx5 in order to see if I can dig into this, but someone who knows the driver should likely be able to do this a lot more easily than I can.
Comment 5 Bjoern A. Zeeb freebsd_committer freebsd_triage 2025-11-21 20:41:07 UTC
Hi,

I found myself access to a test machine with an mlx5.
I am building a main with a GENERIC kernel currently to netboot.

Is there a manual on how to set this up / reproduce this or can you give me one?
Comment 6 David BOYER 2025-11-22 10:27:59 UTC
What I did:

Create the file /etc/iovctl/mce0.conf
---------------------------------

PF {
        device : "mlx5_core0";
        num_vfs : 1;
}

DEFAULT {
        passthrough : false;
}

VF-0 {
        mac-addr : "aa:bb:cc:dd:ee:ff";
}
--------------------------------------------
Then run: 
iovctl -C -f /etc/iovctl/mce0.conf


If you want this to be applied at boot, add the following to /etc/rc.conf.d/iovctl:
iovctl_enable="YES"
iovctl_files="/etc/iovctl/*.conf"
Comment 7 David BOYER 2025-11-22 11:45:11 UTC
(In reply to David BOYER from comment #3)

What's the call graph into this?

I had answered yesterday but just got a message saying that a collision occurred.

I can only find mlx5_core/mlx5_main.c::request_bar -> pci_request_regions()
which ends up at the end of a loop in lkpi_pci_request_region().
==> The same for me

You made me actually look at Linux code and there you end up in __pci_request_region()
which has the same checks at the beginning it seems:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/pci.c#n3935

So the problem is likely elsewhere in the LinuxKPI emulation or in mlx5.
My understanding is that pci_resource_len() may return 0 on FreeBSD while Linux reports the right size.
Then, on FreeBSD, pci_request_region() exits because pci_resource_len() == 0, so bus_alloc_resource_any() is not called and the memory is not allocated.
Why does Linux report the right size?
https://wiki.osdev.org/PCI reads "To determine the amount of address space needed by a PCI device, you must save the original value of the BAR, write a value of all 1's to the register, then read it back. The amount of memory can then be determined by masking the information bits, performing a bitwise NOT ('~' in C), and incrementing the value by 1. The original value of the BAR should then be restored".
If I read correctly, this is done by pci_read_bases(), found in "drivers/pci/probe.c". Its purpose is to find the BAR memory size.
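The sizing arithmetic quoted from the OSDev wiki can be sketched in C as follows. This is a minimal illustration that works only on the value(s) read back after writing all 1s to a memory BAR; it does no real config-space access, the function and macro names are hypothetical, and it is not Linux's pci_read_bases() or FreeBSD's pci(4) code. For a 64-bit BAR it assumes the caller also passes the read-back of the following (upper) register.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 0 of a BAR: 0 = memory space, 1 = I/O space. */
#define BAR_IO_SPACE		0x1u
/* Bits 3:0 of a memory BAR hold type/prefetch info, not address bits. */
#define BAR_MEM_INFO_MASK	0xFu
/* Bits 2:1 == 0b10 mark a 64-bit memory BAR spanning two registers. */
#define BAR_MEM_TYPE_64		0x2u

/*
 * Hypothetical helper: compute the size decoded by a memory BAR from the
 * value(s) read back after writing all 1s to it.  'lo' is the read-back of
 * the BAR register itself; 'hi' is the read-back of the next register and
 * is only used for 64-bit BARs.  Returns 0 for an unimplemented BAR.
 */
static uint64_t
mem_bar_size(uint32_t lo, uint32_t hi)
{
	uint64_t addr;

	assert((lo & BAR_IO_SPACE) == 0);	/* memory BARs only */

	/* Mask off the information bits, keeping only address bits. */
	addr = lo & ~BAR_MEM_INFO_MASK;
	if (((lo >> 1) & 0x3u) == BAR_MEM_TYPE_64)
		addr |= (uint64_t)hi << 32;
	else if (addr != 0)
		addr |= 0xFFFFFFFF00000000ull;	/* 32-bit BAR: extend to 64 bits */

	if (addr == 0)
		return (0);		/* nothing writable: BAR not implemented */

	/* Bitwise NOT, then add 1, as described in the quote. */
	return (~addr + 1);
}
```

For example, a 32-bit memory BAR that reads back as 0xFFF00000 decodes 0x100000 bytes (1048576, the VF BAR size pciconf shows in comment 13), and a 64-bit prefetchable BAR that reads back as 0xFE00000C / 0xFFFFFFFF decodes 0x2000000 bytes (33554432, the PF BAR size in comment 9). These read-back values are illustrative, not taken from the affected machines.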

Also the error logged:
mlx5_core2: lkpi_pci_request_region: failed to alloc bar 0 type 3 rid 16
comes from after that check?  So if it would fail and return we would not see this?  Is that a secondary problem?
Yes, it comes after. Because the memory allocation step did not run (pci_resource_len == 0).

If you say "On FreeBSD, the BAR size of VFs can temporarily be 0 and."
Would we assume mlx5core to only call pci_request_regions() once they are properly initialized?
I guess we can say they are not properly initialized, because on Linux this is done by the kernel, not by the driver itself.

Please note that I am not an expert at all and had not read the source code of Linux or FreeBSD before, so I may be wrong somewhere, especially in my wording.
Comment 8 David BOYER 2025-11-22 11:49:46 UTC
(In reply to Bjoern A. Zeeb from comment #4)
I made a mistake: my last message was in response to Bjoern's comment #4, not my comment #3.
I cannot modify it, hence this correction.
Comment 9 Bjoern A. Zeeb freebsd_committer freebsd_triage 2025-11-23 17:57:56 UTC
(In reply to David BOYER from comment #6)

Hi,

thanks for the apparently simple sample.  I am still running into a prereq problem:

I enabled SR-IOV in BIOS.
I enabled it in the Mellanox Firmware flexboot thingy.
GENERIC has SR-IOV support in it as much as I could see.

# ls -l /dev/iov
ls: /dev/iov: No such file or directory

# iovctl -C -f /etc/iovctl/mce0.conf
iovctl: Could not open device '/dev/iov/mlx5_core0': No such file or directory

What else needs doing to get this?

# pciconf -lBbcevV mlx5_core0
mlx5_core0@pci0:2:0:0:  class=0x020000 rev=0x00 hdr=0x00 vendor=0x15b3 device=0x1013 subvendor=0x15b3 subdevice=0x0008
    vendor     = 'Mellanox Technologies'
    device     = 'MT27700 Family [ConnectX-4]'
    class      = network
    subclass   = ethernet
    bar   [10] = type Prefetchable Memory, range 64, base rx383ffc000000, size 33554432, enabled
    cap 10[60] = PCI-Express 2 endpoint max data 256(512) FLR NS
                 max read 512
                 link x16(x16) speed 8.0(8.0)
    cap 03[48] = VPD
    cap 11[9c] = MSI-X supports 128 messages, enabled
                 Table in map 0x10[0x2000], PBA in map 0x10[0x3000]
    cap 09[c0] = vendor (length 24)
    cap 01[40] = powerspec 3  supports D0 D3  current D0
    ecap 0003[100] = Serial 1 xxx
    ecap 0001[110] = AER 1 0 fatal 0 non-fatal 1 corrected
    ecap 000e[170] = ARI 1
    ecap 0019[1c0] = PCIe Sec 1 lane errors 0
  PCI-e errors = Correctable Error Detected
                 Unsupported Request Detected
     Corrected = Advisory Non-Fatal Error
    VPD ident  = 'CX416A - ConnectX-4 QSFP'
    VPD ro PN  = 'MCX416A-CCAT         '
    VPD ro EC  = 'A5'
    VPD ro SN  = 'MT1544X0xxxx            '
    VPD ro V0  = 'PCIeGen3 x16    '
Comment 10 David BOYER 2025-11-23 21:31:06 UTC
(In reply to Bjoern A. Zeeb from comment #9)

It's hard to tell. 

You can install mstflint:
pkg install -y mstflint

Then execute: mstconfig -d pci0:2:0:0 q | grep  -E 'NUM_OF_VFS|SRIOV_EN'
NUM_OF_VFS -> number of VFs allowed
SRIOV_EN -> sr-iov enabled

On my computer:
mstconfig -d pci0:1:0:0 q | grep  -E 'NUM_OF_VFS|SRIOV_EN'
        NUM_OF_VFS                                  8                   
        SRIOV_EN                                    True(1)

This can be changed (enabling SR-IOV and setting 4 VFs) with: mstconfig -d pci0:2:0:0 set SRIOV_EN=1 NUM_OF_VFS=4
Maybe your BIOS is not fully compatible or is missing something; maybe it is something else.
From https://docs.nvidia.com/networking/display/mlnxofedv53100143/single+root+io+virtualization+(sr-iov)
"SR-IOV" must be enabled (obviously), but so must virtualization.
For AMD: > dmesg | grep --color -i svm
For Intel: the FreeBSD forum says it should be: dmesg | grep VT-x, but I cannot confirm.
I know for sure that the IOMMU must be enabled, but you already know that.

If that does not help, I might need more information. The models of your motherboard and cpu could help.
Comment 11 Konstantin Belousov freebsd_committer freebsd_triage 2025-11-23 21:53:25 UTC
(In reply to David BOYER from comment #10)
IOMMUs (VT-d/AMD IOMMU) as well as VT-x/SVM are not needed at all to enable or use
SR-IOV on the device.

Bjoern, please boot verbose, then look at the dmesg lines reported by the
mlx5en driver.  They should say something about configuring the E-Switch, together
with the maximum number of VF ports configured.
Comment 12 Bjoern A. Zeeb freebsd_committer freebsd_triage 2025-11-23 22:03:13 UTC
(In reply to David BOYER from comment #10)

Thanks. Good to know about the tool and docs.
It came up with what I had configured according to the tool, and /dev/iov was there...  Maybe a complete power cycle (power off/on) helped.


Let me start again; this is a verbose boot (I have no idea what is normally logged and what is not):

# uname -mrv
16.0-CURRENT FreeBSD 16.0-CURRENT #2 mlx5-n282064-a8740ba860bf: Fri Nov 21 21:21:29 UTC 2025     bz@dev.example.net:/tank/users/bz/obj/tank/users/bz/git/FreeBSD/freebsd-src/amd64.amd64/sys/GENERIC amd64

# iovctl -D -f /etc/iovctl/mce0.conf
mlx5_core: INFO: (mlx5_core0): E-Switch: disable SRIOV: active vports(4)

# pciconf -lv | grep -A4 vendor=0x15b3
mlx5_core0@pci0:2:0:0:  class=0x020000 rev=0x00 hdr=0x00 vendor=0x15b3 device=0x1013 subvendor=0x15b3 subdevice=0x0008
    vendor     = 'Mellanox Technologies'
    device     = 'MT27700 Family [ConnectX-4]'
    class      = network
    subclass   = ethernet
mlx5_core1@pci0:2:0:1:  class=0x020000 rev=0x00 hdr=0x00 vendor=0x15b3 device=0x1013 subvendor=0x15b3 subdevice=0x0008
    vendor     = 'Mellanox Technologies'
    device     = 'MT27700 Family [ConnectX-4]'
    class      = network
    subclass   = ethernet

# cat /etc/iovctl/mce0.conf 
PF {
        device : mlx5_core0;
        num_vfs : 3;
}

DEFAULT {
        passthrough : true;
}

# VF for use by host
VF-0 {
        mac-addr : "02:01:02:03:04:00";
        passthrough : false;
}
VF-1 {
        mac-addr : "02:01:02:03:04:01";
}

# iovctl -C -f /etc/iovctl/mce0.conf
mlx5_core: INFO: (mlx5_core0): E-Switch: E-Switch enable SRIOV: nvfs(3)
mlx5_core: WARN: FDB: Failed to add flow rule: dmac_v(0xfffff8000182ba0cM) dmac_c(0xfffff8000182b80cM) -> vport(0), err(-45)
mlx5_core: INFO: (mlx5_core0): E-Switch: SRIOV enabled: active vports(4)
pcib3: allocated prefetch range (0x383fd2000000-0x383fd7ffffff) for rid 1a4 of mlx5_core0
mlx5_core0: Lazy allocation of 0x6000000 bytes rid 0x1a4 type 3 at 0x383fd2000000
found-> vendor=0x15b3, dev=0x1014, revid=0x00
        domain=0, bus=2, slot=0, func=2
        class=02-00-00, hdrtype=0x00, mfdev=0
        cmdreg=0x0000, statreg=0x0010, cachelnsz=0 (dwords)
        lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns)
        MSI-X supports 6 messages in map 0x10
mlx5_core0: WARN: mlx5_eswitch_set_vport_mac:1199:(pid 6379): Failed to mlx5_modify_nic_vport_node_guid vport(1) err=(-45)
mlx5_core0: ERR: mlx5_iov_add_vf:1969:(pid 6379): setting MAC for VF 1 failed, error 45
found-> vendor=0x15b3, dev=0x1014, revid=0x00
        domain=0, bus=2, slot=0, func=3
        class=02-00-00, hdrtype=0x00, mfdev=0
        cmdreg=0x0000, statreg=0x0010, cachelnsz=0 (dwords)
        lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns)
        MSI-X supports 6 messages in map 0x10
mlx5_core0: WARN: mlx5_eswitch_set_vport_mac:1199:(pid 6379): Failed to mlx5_modify_nic_vport_node_guid vport(2) err=(-45)
mlx5_core0: ERR: mlx5_iov_add_vf:1969:(pid 6379): setting MAC for VF 2 failed, error 45
found-> vendor=0x15b3, dev=0x1014, revid=0x00
        domain=0, bus=2, slot=0, func=4
        class=02-00-00, hdrtype=0x00, mfdev=0
        cmdreg=0x0000, statreg=0x0010, cachelnsz=0 (dwords)
        lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns)
        MSI-X supports 6 messages in map 0x10
mlx5_core2: <mlx5_core> at device 0.2 on pci3
pcib3: attempting to grow memory window for (0-0xffffffff,0x2000000)
        front candidate range: 0xf8000000-0xf9ffffff
        back candidate range: 0xfc000000-0xfdffffff
mlx5_core2: 0x2000000 bytes of rid 0x10 res 3 failed (0, 0xffffffffffffffff).
pcib3: attempting to grow memory window for (0-0xffffffff,0x2000000)
        front candidate range: 0xf8000000-0xf9ffffff
        back candidate range: 0xfc000000-0xfdffffff
mlx5_core2: 0x2000000 bytes of rid 0x10 res 3 failed (0, 0xffffffffffffffff).
mlx5_core2: ERR: mlx5_cmd_init:1557:(pid 6379): Driver cmdif rev(5) differs from firmware's(62446)
mlx5_core2: ERR: mlx5_load_one:1110:(pid 6379): Failed initializing command interface, aborting
mlx5_core2: ERR: init_one:1709:(pid 6379): mlx5_load_one failed -22
device_attach: mlx5_core2 attach returned 22
pci3: <network, ethernet> at device 0.3 (no driver attached)
pci3: <network, ethernet> at device 0.4 (no driver attached)

# pciconf -lv | grep -A4 vendor=0x15b3
mlx5_core0@pci0:2:0:0:  class=0x020000 rev=0x00 hdr=0x00 vendor=0x15b3 device=0x1013 subvendor=0x15b3 subdevice=0x0008
    vendor     = 'Mellanox Technologies'
    device     = 'MT27700 Family [ConnectX-4]'
    class      = network
    subclass   = ethernet
mlx5_core1@pci0:2:0:1:  class=0x020000 rev=0x00 hdr=0x00 vendor=0x15b3 device=0x1013 subvendor=0x15b3 subdevice=0x0008
    vendor     = 'Mellanox Technologies'
    device     = 'MT27700 Family [ConnectX-4]'
    class      = network
    subclass   = ethernet
--
none62@pci0:2:0:2:      class=0x020000 rev=0x00 hdr=0x00 vendor=0x15b3 device=0x1014 subvendor=0x15b3 subdevice=0x0008
    vendor     = 'Mellanox Technologies'
    device     = 'MT27700 Family [ConnectX-4 Virtual Function]'
    class      = network
    subclass   = ethernet
ppt0@pci0:2:0:3:        class=0x020000 rev=0x00 hdr=0x00 vendor=0x15b3 device=0x1014 subvendor=0x15b3 subdevice=0x0008
    vendor     = 'Mellanox Technologies'
    device     = 'MT27700 Family [ConnectX-4 Virtual Function]'
    class      = network
    subclass   = ethernet
ppt1@pci0:2:0:4:        class=0x020000 rev=0x00 hdr=0x00 vendor=0x15b3 device=0x1014 subvendor=0x15b3 subdevice=0x0008
    vendor     = 'Mellanox Technologies'
    device     = 'MT27700 Family [ConnectX-4 Virtual Function]'
    class      = network
    subclass   = ethernet


I assume the VF-0 "none62@pci0:2:0:2" node should have become mlx5_core2 (mce2)?
Comment 13 David BOYER 2025-11-23 22:11:11 UTC
(In reply to Bjoern A. Zeeb from comment #12)
I assume the CF-0 "none6@pci0:2:0:2" node should have become mlx5_core2 (mce2)?

Exactly.
For what it is worth, I get this with my modification:
# pciconf -lBbcevV mlx5_core2@pci0:1:0:2
mlx5_core2@pci0:1:0:2:  class=0x020000 rev=0x00 hdr=0x00 vendor=0x15b3 device=0x1016 subvendor=0x15b3 subdevice=0x0003
    vendor     = 'Mellanox Technologies'
    device     = 'MT27710 Family [ConnectX-4 Lx Virtual Function]'
    class      = network
    subclass   = ethernet
    bar   [10] = type Memory, range 32, base rxfcf6000000, size 1048576, enabled
    cap 10[60] = PCI-Express 2 endpoint max data 128(512) FLR
                 max read 128
                 link x0(x8) speed 0.0(8.0) ASPM disabled(L1)
    cap 11[9c] = MSI-X supports 12 messages, enabled
                 Table in map 0x10[0x2000], PBA in map 0x10[0x3000]
Comment 14 Bjoern A. Zeeb freebsd_committer freebsd_triage 2025-11-23 22:17:36 UTC
(In reply to David BOYER from comment #13)

I am a bit puzzled about all the WARN/ERR messages around it.  Are they normal, or is there a driver/firmware mismatch?  I think these cards haven't been used in a few years.
Comment 15 Konstantin Belousov freebsd_committer freebsd_triage 2025-11-23 22:26:29 UTC
(In reply to Bjoern A. Zeeb from comment #12)
I was asking about mlx5 driver messages from the boot, without even trying to
configure SRIOV.

That said, the firmware indeed must be updated.
Comment 16 Bjoern A. Zeeb freebsd_committer freebsd_triage 2025-11-23 22:37:15 UTC
(In reply to Konstantin Belousov from comment #15)

Yes, your comment overlapped with me typing the previous one.

[3.686949] mlx5_core0: <mlx5_core> mem 0x383ffc000000-0x383ffdffffff irq 32 at device 0.0 on pci3
[3.694355] mlx5: Mellanox Core driver 3.7.1 (November 2021)  << missing \n?
[4.061886] mlx5_core0: attempting to allocate 15 MSI-X vectors (128 supported)
..
[4.143047] mlx5_core0: using IRQs 72-86 for MSI-X
[4.166061] mlx5_core0: WARN: mlx5_vsc_set_space:125:(pid 0): Space 0x7 is not supported.
[4.172669] mlx5_core0: WARN: mlx5_fwdump_prep:99:(pid 0): VSC scan space is not supported
[4.179641] mlx5_core: INFO: (mlx5_core0): E-Switch: Total vports 11, l2 table size(65536), per vport: max uc(1024) max mc(16384)
[4.225477] pci0:2:0:1: reprobing on driver added
[4.231668] mlx5_core1: <mlx5_core> mem 0x383ffa000000-0x383ffbffffff irq 32 at device 0.1 on pci3
[4.575887] mlx5_core1: attempting to allocate 15 MSI-X vectors (128 supported)

[4.657320] mlx5_core1: using IRQs 87-101 for MSI-X
[4.685698] mlx5_core1: WARN: mlx5_vsc_set_space:125:(pid 0): Space 0x7 is not supported.
[4.692308] mlx5_core1: WARN: mlx5_fwdump_prep:99:(pid 0): VSC scan space is not supported
[4.699282] mlx5_core: INFO: (mlx5_core1): E-Switch: Total vports 11, l2 table size(65536), per vport: max uc(1024) max mc(16384)

[4.773045] mce0: Ethernet address: 7c:fe:90:30:xx:xx
[4.776802] mce0: link state changed to DOWN
[4.778092] mlx5_core0: ERR: mlx5_cmd_check:714:(pid 0): ACCESS_REG(0x805) op_mod(0x1) failed, status bad parameter(0x3), syndrome (0x6c4d48)

[4.795174] mlx5_core0: ERR: mlx5_cmd_check:714:(pid 0): ACCESS_REG(0x805) op_mod(0x1) failed, status bad parameter(0x3), syndrome (0x6c4d48)

[4.882680] mce1: bpf attached
[4.884171] mce1: Ethernet address: 7c:fe:90:30:xx:xx
[4.887923] mce1: link state changed to DOWN
[4.889062] mlx5_core1: ERR: mlx5_cmd_check:714:(pid 0): ACCESS_REG(0x805) op_mod(0x1) failed, status bad parameter(0x3), syndrome (0x6c4d48)
[4.906441] mlx5_core1: ERR: mlx5_cmd_check:714:(pid 0): ACCESS_REG(0x805) op_mod(0x1) failed, status bad parameter(0x3), syndrome (0x6c4d48)


How do I update the firmware to the right version?
Sorry I am asking for extra hand holding but I figure it'll help to make progress faster.
Comment 17 Konstantin Belousov freebsd_committer freebsd_triage 2025-11-23 22:50:03 UTC
Syndrome 0x6c4d48 means 'register is not yet implemented'.  In other words, this
means that indeed firmware is too old for the driver.
Comment 18 David BOYER 2025-11-23 23:06:00 UTC
(In reply to Bjoern A. Zeeb from comment #16)

The download list of firmwares is available here: https://network.nvidia.com/support/firmware/connectx4en/
There are two firmware images:

1) MCX416A-CCAT: the name looks exactly like your model but the firmware is old
to find it manually: archive versions -> 12.28.2302 -> MCX416A-CCAT -> MT_2150110033
the direct link is https://www.mellanox.com/downloads/firmware/fw-ConnectX4-rel-12_28_2302-MCX416A-CCA_Ax-UEFI-14.21.17-FlexBoot-3.6.102.bin.zip

2) MCX416A-CCA: without ending T but the firmware is recent
to find it manually: current versions -> MCX416A-CCA -> MT_2150110033
The direct link is https://www.mellanox.com/downloads/firmware/fw-ConnectX4-rel-12_28_4704-MCX416A-CCA_Ax-UEFI-14.21.21-FlexBoot-3.6.103.bin.zip

That being said, both file names contain 'MCX416A-CCA_Ax', so I think the recent one is the better bet to test.

There is some information here on flashing the firmware:
https://docs.nvidia.com/networking/display/mft/mstflint+burning+a+firmware+image
In summary
# mstflint -d <device> -i <fw-file> burn
where <device> is pci0:2:0:0 for you and <fw-file> is the firmware file found in the archive

Then reboot.
Comment 19 Bjoern A. Zeeb freebsd_committer freebsd_triage 2025-11-23 23:22:25 UTC
(In reply to David BOYER from comment #18)

Thanks so much for all the help!

The older firmware (Sept 2024) indeed seems to be the exact match (the PSID matches too, according to mstflint -d pci0:2:0:0 q).  I'll start with that.

There are various other pages, such as [1], which have a slightly different procedure and cover cases like modules compiled into the kernel that you cannot unload, etc.  Depending on where I ended up, the information differed, but I also found some Linux info [2] on how to get the PSID, and that helped.

Seems I can get myself some tea while those 16MB get written ...

# mstflint -d pci0:2:0:0 -i /root/fw-ConnectX4-rel-12_28_2302-MCX416A-CCA_Ax-UEFI-14.21.17-FlexBoot-3.6.102.bin burn
Done.
    Current FW version on flash:  12.12.1260
    New FW version:               12.28.2302


Burning FW image without signatures -   6%

<twiddle>

I keep this here as I hope someone else might also find it useful one day.  I know there's a second card in another machine next to this one which is currently busy in use by someone else and I'll likely have to do this all again one day ;-)

[1] https://docs.nvidia.com/networking/display/freebsdv371/firmware+programming
[2] https://network.nvidia.com/support/firmware/identification/
Comment 20 Bjoern A. Zeeb freebsd_committer freebsd_triage 2025-11-24 03:17:22 UTC
With the new firmware, the LinuxKPI error shows up as well (in addition to my instrumentation).
Also the "undo" from mlx5_core in LinuxKPI pci_release_resource() results in a panic (assertion) inside pci(4).

I now understand the problem, and the workaround from the patch seems fine for now.
The check is the cause, but the reasoning is wrong.
pci_resource_len() calls lkpi_pci_get_bar() with true, which creates the resource, and then the follow-up bus_alloc_resource_any() fails (hence the error message).
If we instead make pci_resource_len() call lkpi_pci_get_bar() with false, the BAR won't be there yet, its length reads as 0, and we return early without ever allocating it.
So in the end, in this order, the check is pointless.
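The ordering problem described above can be illustrated with a small toy model. Everything prefixed toy_ is hypothetical: toy_get_bar(), toy_resource_len() and toy_bus_alloc() only stand in for lkpi_pci_get_bar(), pci_resource_len() and bus_alloc_resource_any(), and the model tracks nothing but an "already allocated" flag per BAR, not the real LinuxKPI resource lists. The -ENODEV/-EBUSY return values follow the ones named in the fix's commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NBARS 6

/* Hypothetical per-device state: BAR sizes and whether each is allocated. */
struct toy_dev {
	size_t	bar_len[TOY_NBARS];	/* decoded size, 0 = no BAR */
	bool	allocated[TOY_NBARS];	/* already handed out by the "bus" */
};

/* Stand-in for bus_alloc_resource_any(): fails if absent or already taken. */
static bool
toy_bus_alloc(struct toy_dev *d, int bar)
{
	assert(bar >= 0 && bar < TOY_NBARS);
	if (d->bar_len[bar] == 0 || d->allocated[bar])
		return (false);
	d->allocated[bar] = true;
	return (true);
}

/*
 * Stand-in for lkpi_pci_get_bar(): with create == true it allocates the
 * BAR as a side effect so that it can report a length.
 */
static size_t
toy_get_bar(struct toy_dev *d, int bar, bool create)
{
	if (d->allocated[bar] || (create && toy_bus_alloc(d, bar)))
		return (d->bar_len[bar]);
	return (0);
}

/* Stand-in for pci_resource_len(), which asks for create == true. */
static size_t
toy_resource_len(struct toy_dev *d, int bar)
{
	return (toy_get_bar(d, bar, true));
}

/* Order with the length check: the check itself consumes the BAR. */
static int
toy_request_region_with_len_check(struct toy_dev *d, int bar)
{
	if (toy_resource_len(d, bar) == 0)
		return (0);		/* treated as "no BAR, nothing to do" */
	if (!toy_bus_alloc(d, bar))	/* fails: already allocated above */
		return (-ENODEV);
	return (0);
}

/* Order after the fix: let the bus allocation do the work. */
static int
toy_request_region_fixed(struct toy_dev *d, int bar)
{
	if (!toy_bus_alloc(d, bar))
		return (-EBUSY);	/* callers may treat this as "skip" */
	return (0);
}
```

In this model, toy_request_region_with_len_check() fails with -ENODEV for a present 1 MiB BAR, because the length query has already allocated it, while toy_request_region_fixed() succeeds on the first request and returns -EBUSY only for a BAR that is absent or already taken.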

However, I need to fix the callers to deal with the problem and work out how to do the error handling there.

I likely won't have time to look again before mid-week, and I have to stop now. It's Mon 4:15 AM.

Thanks for all the help for getting me setup for this so I could debug it!!!


Just so I do not lose it -- the panic.  That's likely for someone else to look at.

mlx5_core2: WARN: wait_fw_init:779:(pid 5986): Waiting for FW initialization, timeout abort in 3 s
mlx5_core2: WARN: wait_fw_init:779:(pid 5986): Waiting for FW initialization, timeout abort in 0 s
mlx5_core2: Firmware over 5000 MS in pre-initializing state, aborting
mlx5_core2: ERR: init_one:1710:(pid 5986): mlx5_load_one failed -16
panic: pci_vf_release_mem_resource: rman 0xfffff80049fb9e30 doesn't match for resource 0xfffff80001a71d80
cpuid = 7
time = 1763946040
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00ff27f760
vpanic() at vpanic+0x136/frame 0xfffffe00ff27f890
panic() at panic+0x43/frame 0xfffffe00ff27f8f0
pci_vf_release_mem_resource() at pci_vf_release_mem_resource+0xf6/frame 0xfffffe00ff27f920
linuxkpi_pci_release_regions() at linuxkpi_pci_release_regions+0x10/frame 0xfffffe00ff27f940
mlx5_pci_close() at mlx5_pci_close+0x73/frame 0xfffffe00ff27f970
init_one() at init_one+0x138f/frame 0xfffffe00ff27f9e0
linux_pci_attach_device() at linux_pci_attach_device+0x56b/frame 0xfffffe00ff27fa40
device_attach() at device_attach+0x45b/frame 0xfffffe00ff27fa90
bus_attach_children() at bus_attach_children+0x4a/frame 0xfffffe00ff27fab0
pci_iov_enumerate_vfs() at pci_iov_enumerate_vfs+0x3b6/frame 0xfffffe00ff27fb30
pci_iov_ioctl() at pci_iov_ioctl+0x844/frame 0xfffffe00ff27fbc0
devfs_ioctl() at devfs_ioctl+0xd1/frame 0xfffffe00ff27fc10
VOP_IOCTL_APV() at VOP_IOCTL_APV+0x51/frame 0xfffffe00ff27fc40
vn_ioctl() at vn_ioctl+0x160/frame 0xfffffe00ff27fcb0
devfs_ioctl_f() at devfs_ioctl_f+0x1e/frame 0xfffffe00ff27fcd0
kern_ioctl() at kern_ioctl+0x2a1/frame 0xfffffe00ff27fd40
sys_ioctl() at sys_ioctl+0x12f/frame 0xfffffe00ff27fe00
amd64_syscall() at amd64_syscall+0x169/frame 0xfffffe00ff27ff30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00ff27ff30
--- syscall (54, FreeBSD ELF64, ioctl), rip = 0x378cf1afa03a, rsp = 0x378ced110678, rbp = 0x378ced1106d0 ---
KDB: enter: panic
[ thread pid 5986 tid 100225 ]
Stopped at      kdb_enter+0x33: movq    $0,0x1217452(%rip)
Comment 21 Bjoern A. Zeeb freebsd_committer freebsd_triage 2025-11-24 22:32:12 UTC
https://reviews.freebsd.org/D53902
Comment 22 commit-hook freebsd_committer freebsd_triage 2025-12-02 18:47:17 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=ed29ffd396e522a45ab1980c12a75b3409b51712

commit ed29ffd396e522a45ab1980c12a75b3409b51712
Author:     Bjoern A. Zeeb <bz@FreeBSD.org>
AuthorDate: 2025-12-02 16:04:22 +0000
Commit:     Bjoern A. Zeeb <bz@FreeBSD.org>
CommitDate: 2025-12-02 18:46:20 +0000

    LinuxKPI: pci: undo the pci_resource_len() check in lkpi_pci_request_region()

    Creating non-passthru SR-IOV interfaces on a mlx5en(4) failed.
    The problem lies in the pci_resource_len() call but not that the BAR length
    is temporarily 0 but in that we call lkpi_pci_get_bar() with a true argument
    which will create the BAR resource for us and report the appropriate length
    back.  However, the later call to bus_alloc_resource_any() will then fail
    given the resource already exists.

    Restore the previous behaviour and let bus_alloc_resource_any() do the
    work.  Adjust the return values from -ENODEV to -EBUSY to match callers'
    expectations.

    In linuxkpi_pcim_request_all_regions(), like in linuxkpi_pci_request_regions(),
    filter out the -EBUSY errors as "not an error" and try the next bar.
    This also seems to be consistent with the expectations of the callers.

    PR:             290793
    Reported by:    David BOYER (jcduss13 gmail.com)
    Tested on:      mlx5en, iwlwifi, mt7921
    Reviewed by:    kib
    Fixes:          7e21158d44cd "implement [linuxkpi_]pcim_request_all_regions()"
    Sponsored by:   The FreeBSD Foundation
    MFC after:      3 days
    Differential Revision: https://reviews.freebsd.org/D53902

 sys/compat/linuxkpi/common/src/linux_pci.c | 13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)
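The -EBUSY filtering described in the commit message ("filter out the -EBUSY errors as 'not an error' and try the next bar") can be sketched as a simple loop. This is a hypothetical illustration, not the actual linuxkpi_pcim_request_all_regions() code: the names are made up, it assumes six BARs (a type 0 header), and it does not release regions already requested when a real error occurs, which the real implementation may handle differently.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_NBARS 6	/* BARs 0..5 in a type 0 PCI header */

/* Hypothetical per-BAR request function, returning 0 or a negative errno. */
typedef int (*toy_request_fn)(void *ctx, int bar);

/*
 * Request every BAR; -EBUSY means "nothing to do for this BAR" and is
 * skipped, any other error stops the loop and is returned to the caller.
 */
static int
toy_request_all_regions(void *ctx, toy_request_fn req)
{
	int bar, error;

	assert(req != NULL);
	for (bar = 0; bar < TOY_NBARS; bar++) {
		error = req(ctx, bar);
		if (error == -EBUSY)
			continue;
		if (error != 0)
			return (error);
	}
	return (0);
}

/* Example: a device (e.g. a VF) whose only BAR is BAR 0. */
static int
only_bar0(void *ctx, int bar)
{
	int *requested = ctx;

	if (bar != 0)
		return (-EBUSY);	/* nothing to allocate for this BAR */
	(*requested)++;
	return (0);
}

/* Example: a request that genuinely fails on BAR 2. */
static int
bar2_broken(void *ctx, int bar)
{
	(void)ctx;
	return (bar == 2 ? -EINVAL : 0);
}
```

With only_bar0() the loop succeeds after requesting exactly one BAR, whereas bar2_broken() makes it stop and return -EINVAL, so only errors other than -EBUSY reach the caller.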
Comment 23 commit-hook freebsd_committer freebsd_triage 2025-12-08 15:45:21 UTC
A commit in branch stable/15 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=2032abb31cbe067d41067a81e529d91f1bace4c9

commit 2032abb31cbe067d41067a81e529d91f1bace4c9
Author:     Bjoern A. Zeeb <bz@FreeBSD.org>
AuthorDate: 2025-12-02 16:04:22 +0000
Commit:     Bjoern A. Zeeb <bz@FreeBSD.org>
CommitDate: 2025-12-08 15:43:51 +0000

    LinuxKPI: pci: undo the pci_resource_len() check in lkpi_pci_request_region()

    Creating non-passthru SR-IOV interfaces on a mlx5en(4) failed.
    The problem lies in the pci_resource_len() call but not that the BAR length
    is temporarily 0 but in that we call lkpi_pci_get_bar() with a true argument
    which will create the BAR resource for us and report the appropriate length
    back.  However, the later call to bus_alloc_resource_any() will then fail
    given the resource already exists.

    Restore the previous behaviour and let bus_alloc_resource_any() do the
    work.  Adjust the return values from -ENODEV to -EBUSY to match callers'
    expectations.

    In linuxkpi_pcim_request_all_regions(), like in linuxkpi_pci_request_regions(),
    filter out the -EBUSY errors as "not an error" and try the next bar.
    This also seems to be consistent with the expectations of the callers.

    PR:             290793
    Reported by:    David BOYER (jcduss13 gmail.com)
    Tested on:      mlx5en, iwlwifi, mt7921
    Reviewed by:    kib
    Fixes:          7e21158d44cd "implement [linuxkpi_]pcim_request_all_regions()"
    Sponsored by:   The FreeBSD Foundation
    Differential Revision: https://reviews.freebsd.org/D53902

    (cherry picked from commit ed29ffd396e522a45ab1980c12a75b3409b51712)

 sys/compat/linuxkpi/common/src/linux_pci.c | 13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)