Summary: | hn(4): Communication stops when enabling SR-IOV secondary mlx5en(4) interface (640FLR-SFP28) on Windows Server 2022 | |
---|---|---|---
Product: | Base System | Reporter: | Michael <michael.adm>
Component: | kern | Assignee: | Hans Petter Selasky <hselasky>
Status: | Closed FIXED | |
Severity: | Affects Some People | CC: | hselasky, net, weh
Priority: | --- | Flags: | koobs: maintainer-feedback+, koobs: mfc-stable13?, koobs: mfc-stable12-
Version: | 13.0-STABLE | |
Hardware: | amd64 | |
OS: | Any | |
Description
Michael
2022-02-06 14:04:45 UTC
Made a mistake here:
> "We make sure that there is a network connection through the first network adapter. Communication has stopped through this network adapter, both from the VM and from outside to the VM."
Network communication problems occur only when SR-IOV is enabled on the second network adapter, the mlx5en VF behind hn1 (ConnectX-4 Lx).
Hypervisor: Windows Server 2019 (1809, 17763.2510)
Network adapters:
1. Mellanox ConnectX-3 EN NIC for OCP; 10GbE; dual-port SFP+; PCIe3.0 x8; IPMI disabled; R6 (Firmware version: 2.42.5000, Driver version: 5.50.14740.1)
2. Mellanox ConnectX-4 Lx - HPE Ethernet 10/25Gb 2-port 640FLR-SFP28 Adapter (Firmware version: 14.26.1040, Driver version: 2.80.25134.0)
Guest: FreeBSD-14.0-CURRENT-amd64-20220203-e2fe58d61b7-252875-disc1.iso
Generation: 2 (Configuration Version: 9.0)
No changes were made (the system was installed out of the box).
Same behavior: with SR-IOV enabled on the ConnectX-3 VF (mlx4en), communication works; with SR-IOV enabled on the ConnectX-4 VF (mlx5en), communication does not work.

Thank you for your report, Michael. Are you able to test reproducibility with FreeBSD 12 and 13 images?

The logs look normal to me. Several questions:
1. If only the second SR-IOV NIC (Mellanox CX-4) causes the problem, what happens if you enable just this NIC and not the first one (Mellanox CX-3)? Does the second interface work in this case?
2. How do you enable and disable the SR-IOV interfaces?
3. How do you verify that the second interface stopped working? 'sysctl -a | grep mce' gives a lot of stats on the mce interface. Do you see any numbers change after loading some traffic on this interface?

Firstly, I did a little investigation to find out which commit breaks the connection when using SR-IOV with the mlx5en VF. It is commit e059c120b4223fd5ec3af9def21c0519f439fe57.
With the GENERIC kernel and the previous commit a8e715d21b963251e449187c98292fff77dc7576, everything works as it should: the SR-IOV VF works for both ConnectX-3 mlx4en and ConnectX-4 mlx5en.
root@frw05v5:/usr/src # git checkout e059c120b4223fd5ec3af9def21c0519f439fe57
Previous HEAD position was a8e715d21b9 mlx5en: Add race protection for SQ remap
HEAD is now at e059c120b42 mlx5en: Create and destroy all flow tables and rules when the network interface attaches and detaches.
After the e059c120b42 checkout, the mlx5en VF network connection breaks.
> How do you verify that the second interface stopped working?
The working state is checked in an elementary way: ping from the VM's IP address to a host outside the VM (in the same subnet, of course), and ping from outside the VM to the IP address of the VM network interface in question.
> If only the second SR-IOV NIC (Mellanox CX-4) causes the problem, what happens if you enable just this NIC and not the first one (Mellanox CX-3)? Does the second interface work in this case?
I cited the first network interface as an example to illustrate that SR-IOV VF is operational on the hypervisor. And no, even with only the one ConnectX-4 mlx5en network interface, the behavior is as described at the beginning.
> How do you enable and disable the SR-IOV interfaces?
Hyper-V Manager -> right click on the VM -> Options -> Network adapter -> Hardware acceleration -> "Enable SR-IOV" checkbox

Secondly:
FreeBSD-12.3-STABLE-amd64-20220203-r371543-disc1.iso - no changes were made (the system was installed out of the box) - everything is OK - the SR-IOV VF works for both ConnectX-3 mlx4en and ConnectX-4 mlx5en.
FreeBSD-13.0-STABLE-amd64-20220203-40b816bd4f0-249223-disc1.iso - no changes were made (the system was installed out of the box) - everything is OK - the SR-IOV VF works for both ConnectX-3 mlx4en and ConnectX-4 mlx5en.

I was able to reproduce this on a VM in Azure as well. The following commit broke the CX-4 VF driver on Hyper-V:
commit e059c120b4223fd5ec3af9def21c0519f439fe57
Author: Hans Petter Selasky <hselasky@FreeBSD.org>
Date: Tue Feb 1 16:20:12 2022 +0100
    mlx5en: Create and destroy all flow tables and rules when the network interface attaches and detaches.
Add HPS for comment and further investigation.

I'm sorry for the breakage. I'll look into it ASAP. Possibly I should have waited before pushing this patch to 13-stable. Let's hope there is a quick fix. I will try to reproduce later today.
Meanwhile: Does manually re-adding the IP address for the hn/mce interface or setting the link up/down change anything?
--HPS

> Does manually re-adding the IP address for the hn/mce interface or setting
> the link up/down change anything?
No. Up/Down - doesn't change anything.
root@frw05v5:~ # ifconfig
. . .
hn1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=8051b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,TSO4,LRO,LINKSTATE>
	ether 00:15:5d:d0:8b:43
	inet 172.27.172.23 netmask 0xffffff00 broadcast 172.27.172.255
	media: Ethernet 10GBase-CR1 <full-duplex,rxpause,txpause>
	status: active
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
mce0: flags=8a43<UP,BROADCAST,RUNNING,ALLMULTI,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=8805bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,LRO,LINKSTATE>
	ether 00:15:5d:d0:8b:43
	media: Ethernet 10GBase-CR1 <full-duplex,rxpause,txpause>
	status: active
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
root@frw05v5:~ # ifconfig hn1 down
root@frw05v5:~ # ifconfig hn1 up
root@frw05v5:~ # ifconfig mce0 down
root@frw05v5:~ # ifconfig mce0 up
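The earlier suggestion in this thread, to check whether the `sysctl -a | grep mce` numbers change while traffic is loaded, can be sketched as a before/after comparison. This is only a minimal sketch and was not posted by any participant: the function name `changed_counters` and the snapshot file names are hypothetical, and it assumes the counters are printed as `name: value` lines.

```shell
#!/bin/sh
# Sketch: report which counters changed between two sysctl snapshots.
# Assumption: each snapshot line has the form "name: value".

# changed_counters BEFORE_FILE AFTER_FILE
# Prints "name before -> after" for every name whose value differs.
changed_counters() {
    awk -F': ' '
        FILENAME == ARGV[1] { before[$1] = $2; next }   # first file: remember values
        ($1 in before) && before[$1] != $2 {             # second file: compare
            printf "%s %s -> %s\n", $1, before[$1], $2
        }
    ' "$1" "$2"
}

# Hypothetical usage on the guest (file names are placeholders):
#   sysctl -a | grep mce > /tmp/mce.before
#   ... generate some traffic over the interface, e.g. with ping ...
#   sysctl -a | grep mce > /tmp/mce.after
#   changed_counters /tmp/mce.before /tmp/mce.after
```

If the receive or transmit counters stay flat while pings are running, that would suggest the traffic never reaches the VF, which may help narrow down where packets are being dropped.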
^Triage: Update Version to reflect earliest affected branch/version. Original report is for CURRENT.

Hi,
Can you verify that executing the following two commands gets communication back?
ifconfig mce0 promisc
ifconfig mce0 -promisc
--HPS

Created attachment 231684
Permanent patch to try
Yes, after this command:
root@frw05v04:~ # ifconfig mce0 promisc
the connection appears, and after this command:
root@frw05v04:~ # ifconfig mce0 -promisc
the connection also works.

> Created attachment 231684
> Permanent patch to try
The patch also fixes network communication with the SR-IOV VF of the mlx5en network adapter.
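For systems that cannot take the patch right away, the temporary workaround confirmed above (toggling promiscuous mode on the VF interface) could be wrapped in a small shell function. This is a sketch under assumptions, not part of the fix: the function name `toggle_promisc` and the `IFCONFIG` override variable are hypothetical and exist only for illustration; the two `ifconfig` invocations are the ones quoted in this thread.

```shell
#!/bin/sh
# Sketch: toggle promiscuous mode on and then off for one interface,
# i.e. the two commands reported in this thread to restore traffic.

# IFCONFIG may be overridden (for example IFCONFIG=echo) to print the
# commands instead of running them. This override is illustrative only.
: "${IFCONFIG:=ifconfig}"

# toggle_promisc IFNAME
toggle_promisc() {
    [ -n "$1" ] || { echo "usage: toggle_promisc IFNAME" >&2; return 2; }
    "$IFCONFIG" "$1" promisc && "$IFCONFIG" "$1" -promisc
}

# Example (requires root on the guest):
#   toggle_promisc mce0
```

Running it with `IFCONFIG=echo` first shows the commands without touching the interface.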
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=04f407a3e5e7bf452768201ace260b575f1a7924

commit 04f407a3e5e7bf452768201ace260b575f1a7924
Author:     Hans Petter Selasky <hselasky@FreeBSD.org>
AuthorDate: 2022-02-10 10:12:21 +0000
Commit:     Hans Petter Selasky <hselasky@FreeBSD.org>
CommitDate: 2022-02-10 10:17:42 +0000

    mlx5en: Make sure the NIC IP addresses are written to firmware on link up.

    Fixes e059c120b4223fd5ec3af9def21c0519f439fe57 .

    PR: 261746
    MFC after: 1 day
    Sponsored by: NVIDIA Networking

 sys/dev/mlx5/mlx5_en/mlx5_en_flow_table.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Will be MFC'ed tomorrow.

A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=16635c7b213a8da75bd54cf81abb984f69b0bbc5

commit 16635c7b213a8da75bd54cf81abb984f69b0bbc5
Author:     Hans Petter Selasky <hselasky@FreeBSD.org>
AuthorDate: 2022-02-10 10:12:21 +0000
Commit:     Hans Petter Selasky <hselasky@FreeBSD.org>
CommitDate: 2022-02-11 10:15:00 +0000

    mlx5en: Make sure the NIC IP addresses are written to firmware on link up.

    Fixes e059c120b4223fd5ec3af9def21c0519f439fe57 .

    PR: 261746
    Sponsored by: NVIDIA Networking

    (cherry picked from commit 04f407a3e5e7bf452768201ace260b575f1a7924)

 sys/dev/mlx5/mlx5_en/mlx5_en_flow_table.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)