Windows Server 2016 (Hyper-V 2016) can support PCIe pass-through and NIC SR-IOV for non-Windows virtual machines (VMs), such as Linux and FreeBSD VMs. A few months ago, we enabled PCIe pass-through for FreeBSD VMs running on Hyper-V, successfully assigned a Mellanox ConnectX-3 PF device to a VM, and the device worked fine in the VM.

Now we have added code to support NIC SR-IOV (which is based on PCIe pass-through) to the Hyper-V hv_netvsc driver, but it turned out that the VF driver failed to load, so I ported two patches from Linux:
https://reviews.freebsd.org/D8867
https://reviews.freebsd.org/D8868

(Note: I have only tested the PF/VF drivers in a FreeBSD VM running on Hyper-V. I haven't tested them with the patches on a bare-metal FreeBSD machine, because it's not easy to set up such a machine in our lab for now. So it would be really helpful if people could review the patches and help test them on bare metal.)

With the 2 patches, the VF driver worked in my limited testing.

BTW, this link (https://community.mellanox.com/docs/DOC-2242) shows how to enable a Mellanox ConnectX-3 VF for a Windows VM running on Hyper-V 2012 R2. What I did for the FreeBSD VM on Hyper-V 2016 is pretty similar.

Next, I did more testing and identified 4 issues we need to address:

1. When the VF is hot-removed, I see the error below. It looks nonfatal, because when the VF is hot-added again later, it still works.

mlx4_core0: Failed to free mtt range at:20769 order:0
mlx4_core0: detached

2. The VF works fine when the VM has <=12 virtual CPUs, but if the VM has >=13 vCPUs, the VF driver fails to load:

mlx4_core0: <mlx4_core> at device 2.0 on pci1
mlx4_core: Initializing mlx4_core: Mellanox ConnectX VPI driver v2.1.6
vmbus0: allocated type 3 (0xfe0800000-0xfe0ffffff) for rid 18 of mlx4_core0
mlx4_core0: Lazy allocation of 0x800000 bytes rid 0x18 type 3 at 0xfe0800000
mlx4_core0: Detected virtual function - running in slave mode
mlx4_core0: Sending reset
mlx4_core0: Sending vhcr0
mlx4_core0: HCA minimum page size:512
mlx4_core0: Timestamping is not supported in slave mode.
mlx4_core0: attempting to allocate 20 MSI-X vectors (52 supported)
mlx4_core0: using IRQs 256-275 for MSI-X
mlx4_core0: Failed to allocate mtts for 1024 pages(order 10)
mlx4_core0: Failed to initialize event queue table (err=-12), aborting.

3. The VF can't ping another VM's VF on the same host, nor the PF on the same host. On the same host, Windows VM <-> Windows VM and Windows VM <-> Linux VM both work; only FreeBSD VM <-> Windows/Linux VM fails. I suspect something is wrong or missing in the mlx4 VF driver in FreeBSD.

4. I got the messages below when Live Migration failed. It seems the VF's detach method couldn't complete.

Jan 11 19:16:43 decui-b11 kernel: mlx4_core0: mlx4_comm_cmd_wait: Comm channel is not idle. My toggle is 0 (op: 0x5)
Jan 11 19:16:43 decui-b11 kernel: mlx4_core0: failed execution of VHCR_POST commandopcode FREE_RES (0xf01)
Jan 11 19:16:43 decui-b11 kernel: mlx4_core0: Failed to free mtt range at:5937 order:0
Jan 11 19:16:54 decui-b11 kernel: mlx4_core0: mlx4_comm_cmd_wait: Comm channel is not idle. My toggle is 0 (op: 0x5)
Jan 11 19:16:54 decui-b11 kernel: mlx4_core0: failed execution of VHCR_POST commandopcode CLOSE_PORT (0xa)
Jan 11 19:18:04 decui-b11 kernel: mlx4_core0: mlx4_comm_cmd_wait: Comm channel is not idle. My toggle is 0 (op: 0x5)
Jan 11 19:18:04 decui-b11 kernel: mlx4_core0: failed execution of VHCR_POST commandopcode FREE_RES (0xf01)
Jan 11 19:19:14 decui-b11 kernel: mlx4_core0: mlx4_comm_cmd_wait: Comm channel is not idle. My toggle is 0 (op: 0x5)
Jan 11 19:19:14 decui-b11 kernel: mlx4_core0: failed execution of VHCR_POST commandopcode QP_FLOW_STEERING_DETACH (0x66)
Jan 11 19:19:14 decui-b11 kernel: mlx4_core0: Fail to detach network rule. registration id = 0x9000000000002
Jan 11 19:20:24 decui-b11 kernel: mlx4_core0: mlx4_comm_cmd_wait: Comm channel is not idle. My toggle is 0 (op: 0x5)
Jan 11 19:20:24 decui-b11 kernel: mlx4_core0: failed execution of VHCR_POST commandopcode QP_FLOW_STEERING_DETACH (0x66)
Jan 11 19:20:24 decui-b11 kernel: mlx4_core0: Fail to detach network rule. registration id = 0x9000000000003
Jan 11 19:21:34 decui-b11 kernel: mlx4_core0: mlx4_comm_cmd_wait: Comm channel is not idle. My toggle is 0 (op: 0x5)
Jan 11 19:21:34 decui-b11 kernel: mlx4_core0: failed execution of VHCR_POST commandopcode QP_FLOW_STEERING_DETACH (0x66)
Jan 11 19:21:34 decui-b11 kernel: mlx4_core0: Fail to detach network rule. registration id = 0x9000000000004
Jan 11 19:22:46 decui-b11 kernel: mlx4_core0: mlx4_comm_cmd_wait: Comm channel is not idle. My toggle is 0 (op: 0x5)
Jan 11 19:22:46 decui-b11 kernel: mlx4_core0: failed execution of VHCR_POST commandopcode QP_FLOW_STEERING_DETACH (0x66)
Jan 11 19:22:46 decui-b11 kernel: mlx4_core0: Fail to detach network rule. registration id = 0x9000000000005
Jan 11 19:23:56 decui-b11 kernel: mlx4_core0: mlx4_comm_cmd_wait: Comm channel is not idle. My toggle is 0 (op: 0x5)
Jan 11 19:23:56 decui-b11 kernel: mlx4_core0: failed execution of VHCR_POST commandopcode QP_FLOW_STEERING_DETACH (0x66)
Jan 11 19:23:56 decui-b11 kernel: mlx4_core0: Fail to detach network rule. registration id = 0x9000000000006
Jan 11 19:25:06 decui-b11 kernel: mlx4_core0: mlx4_comm_cmd_wait: Comm channel is not idle. My toggle is 0 (op: 0x5)
Jan 11 19:25:06 decui-b11 kernel: mlx4_core0: failed execution of VHCR_POST commandopcode QP_FLOW_STEERING_DETACH (0x66)
Jan 11 19:25:06 decui-b11 kernel: mlx4_core0: Fail to detach network rule. registration id = 0x9000000000007
Jan 11 19:26:16 decui-b11 kernel: mlx4_core0: mlx4_comm_cmd_wait: Comm channel is not idle. My toggle is 0 (op: 0x5)
Jan 11 19:26:16 decui-b11 kernel: mlx4_core0: failed execution of VHCR_POST commandopcode SET_MCAST_FLTR (0x48)
Jan 11 19:27:26 decui-b11 kernel: mlx4_core0: mlx4_comm_cmd_wait: Comm channel is not idle. My toggle is 0 (op: 0x5)
Jan 11 19:27:26 decui-b11 kernel: mlx4_core0: failed execution of VHCR_POST commandopcode FREE_RES (0xf01)
Jan 11 19:27:26 decui-b11 kernel: mlx4_core0: Failed to free icm of qp:2279
Jan 11 19:28:36 decui-b11 kernel: mlx4_core0: mlx4_comm_cmd_wait: Comm channel is not idle. My toggle is 0 (op: 0x5)
Jan 11 19:28:36 decui-b11 kernel: mlx4_core0: failed execution of VHCR_POST commandopcode FREE_RES (0xf01)
Jan 11 19:28:36 decui-b11 kernel: mlx4_core0: Failed to release qp range base:2279 cnt:1
Jan 11 19:29:46 decui-b11 kernel: mlx4_core0: mlx4_comm_cmd_wait: Comm channel is not idle. My toggle is 0 (op: 0x5)
Jan 11 19:29:46 decui-b11 kernel: mlx4_core0: failed execution of VHCR_POST commandopcode 2RST_QP (0x21)
Jan 11 19:30:56 decui-b11 kernel: mlx4_core0: mlx4_comm_cmd_wait: Comm channel is not idle. My toggle is 0 (op: 0x5)
Jan 11 19:30:56 decui-b11 kernel: mlx4_core0: failed execution of VHCR_POST commandopcode HW2SW_CQ (0x17)
Jan 11 19:30:56 decui-b11 kernel: mlx4_core0: HW2SW_CQ failed (-35) for CQN 0000b5
Jan 11 19:32:06 decui-b11 kernel: mlx4_core0: mlx4_comm_cmd_wait: Comm channel is not idle. My toggle is 0 (op: 0x5)
Jan 11 19:32:06 decui-b11 kernel: mlx4_core0: failed execution of VHCR_POST commandopcode FREE_RES (0xf01)
Jan 11 19:32:06 decui-b11 kernel: mlx4_core0: Failed freeing cq:181

More info about issue 4: in the case of Live Migration, it looks like the host forcibly rescinds the VF without sending the PCI_EJECT message to the VM. The current Mellanox VF driver in FreeBSD apparently can't handle this case (i.e. the VF device disappearing suddenly) and always hangs due to command timeouts, because by then the host has already denied the VM access to the VF.

BTW, the VF driver in a Linux VM doesn't hang, and Live Migration appears to work, but the driver also prints these scary messages:

Jan 26 02:40:06 decui-lin-vm kernel: mlx4_core 99bb:00:02.0: Internal error detected on the communication channel
Jan 26 02:40:06 decui-lin-vm kernel: mlx4_core 99bb:00:02.0: device is going to be reset
Jan 26 02:40:06 decui-lin-vm kernel: mlx4_core 99bb:00:02.0: VF reset is not needed
Jan 26 02:40:06 decui-lin-vm kernel: mlx4_core 99bb:00:02.0: device was reset successfully
Jan 26 02:40:06 decui-lin-vm kernel: mlx4_en 99bb:00:02.0: Internal error detected, restarting device
Jan 26 02:40:06 decui-lin-vm kernel: mlx4_core 99bb:00:02.0: command 0x5 failed: fw status = 0x1
Jan 26 02:40:06 decui-lin-vm kernel: hv_netvsc vmbus_16 eth1: VF down: enP39355p0s2
Jan 26 02:40:06 decui-lin-vm kernel: hv_netvsc vmbus_16 eth1: Data path switched from VF: enP39355p0s2
Jan 26 02:40:06 decui-lin-vm kernel: hv_netvsc vmbus_16 eth1: VF unregistering: enP39355p0s2
Jan 26 02:40:07 decui-lin-vm kernel: mlx4_core 99bb:00:02.0: Failed to close slave function
Jan 26 02:40:07 decui-lin-vm kernel: mlx4_core 99bb:00:02.0: Detected virtual function - running in slave mode
Jan 26 02:40:37 decui-lin-vm kernel: mlx4_core 99bb:00:02.0: recovering from previously mis-behaved VM
Jan 26 02:41:07 decui-lin-vm kernel: mlx4_core 99bb:00:02.0: Communication channel is offline.
Jan 26 02:41:07 decui-lin-vm kernel: mlx4_core 99bb:00:02.0: PF is not responsive, skipping initialization
Jan 26 02:41:07 decui-lin-vm kernel: mlx4_core 99bb:00:02.0: Failed to initialize slave
Jan 26 02:41:07 decui-lin-vm kernel: mlx4_core 99bb:00:02.0: mlx4_restart_one: ERROR: mlx4_load_one failed, pci_name=99bb:00:02.0, err=-5
Jan 26 02:41:07 decui-lin-vm kernel: mlx4_core 99bb:00:02.0: mlx4_restart_one was ended, ret=-5
Jan 26 02:41:07 decui-lin-vm kernel: mlx4_core 99bb:00:02.0: mlx4_remove_one: interface is down

I think we need to port at least this patch from Linux to FreeBSD: “net/mlx4_core: Enable device recovery flow with SRIOV” (https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=55ad359225b2232b9b8f04a0dfa169bd3a7d86d2).
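In the FreeBSD log for issue 4, every teardown command waits out its own timeout (most consecutive failures are roughly 70 seconds apart), even though the host has already revoked the VF. The Linux log instead shows the driver noticing that the channel is gone ("Communication channel is offline", then "PF is not responsive, skipping initialization"). Below is a minimal, hypothetical sketch of that fail-fast idea: after the first timeout the VF is marked offline and later commands fail immediately. All names (`toy_vf_dev`, `toy_comm_cmd_wait`, `TOY_COMM_TIMEOUT_POLLS`, `toy_never_answers`) are invented for illustration; this is not the mlx4 driver's API, and the polling is simulated rather than reading hardware.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical number of polls before a command is declared timed out. */
#define TOY_COMM_TIMEOUT_POLLS 1000

/* Hypothetical per-VF state; not the real mlx4 device structure. */
struct toy_vf_dev {
	bool offline;        /* set once the comm channel is known to be dead */
	unsigned long polls; /* total polls performed, to show the time saved */
};

/* Returns true once the PF has answered; stands in for reading the channel. */
typedef bool (*toy_poll_fn)(void);

/*
 * Wait for the PF to answer a command.  The first timeout marks the VF
 * offline; every later call fails immediately instead of waiting again.
 */
static int
toy_comm_cmd_wait(struct toy_vf_dev *dev, toy_poll_fn answered)
{
	int i;

	assert(dev != NULL && answered != NULL);
	if (dev->offline)
		return (-EIO);

	for (i = 0; i < TOY_COMM_TIMEOUT_POLLS; i++) {
		dev->polls++;
		if (answered())
			return (0);
	}

	/* Assume the VF was revoked (e.g. by Live Migration) and stop waiting. */
	dev->offline = true;
	return (-ETIMEDOUT);
}

/* Simulates a host that has revoked the VF: the PF never answers. */
static bool
toy_never_answers(void)
{
	return (false);
}
```

With several teardown commands in a row (FREE_RES, CLOSE_PORT, QP_FLOW_STEERING_DETACH, ...), only the first one pays the full timeout and the rest return -EIO at once. A real driver would also have to deal with resources it can no longer ask the PF to free, which this sketch ignores.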
To reproduce the 4 issues, use HEAD at the following revision or newer:

r312690 | dexuan | 2017-01-24 17:27:13 +0800 (Tue, 24 Jan 2017) | 8 lines
hyperv/hn: add devctl_notify for VF_UP/DOWN events

and manually apply the 2 patches mentioned in the bug report:
https://reviews.freebsd.org/D8867
https://reviews.freebsd.org/D8868
Updates:
hselasky committed the 2 patches (D8867, D8868) to HEAD last Friday.

Issue 1: No update.

Issue 2: The Linux version of the driver actually has the same (or a similar) issue, and I reported it here: https://www.spinics.net/lists/netdev/msg420136.html
Some people suspected that the driver failed to allocate a UAR, but after we increased LOG_BAR_SIZE with mlxconfig from 3 (8MB) to 5 (32MB), the issue was still there. We'll continue to work on this.

Issue 3: No update.

Issue 4: This may be a host-side issue: on Live Migration, the host suddenly removes the VF from the guest by force. We're working on this. Meanwhile, the VF driver should be made more robust so that it can cope with this scenario.
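Both quantities in the issue 2 discussion are powers of two. The values quoted above (3 -> 8MB, 5 -> 32MB) imply that a LOG_BAR_SIZE value n means 2^n MB, and the "order" in the driver messages ("1024 pages(order 10)", and "order:0" for a single entry in issue 1) is consistent with the base-2 logarithm of the count, rounded up. The helpers below are a minimal sketch of that arithmetic only; `log_bar_size_to_bytes` and `pages_to_order` are hypothetical names, not part of mlx4 or mlxconfig.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: BAR size in bytes for a LOG_BAR_SIZE value,
 * assuming value n means 2^n MB (3 -> 8MB, 5 -> 32MB as quoted above).
 */
static uint64_t
log_bar_size_to_bytes(unsigned int log_bar_size)
{
	assert(log_bar_size <= 43); /* keep the shift below 64 bits */
	return ((uint64_t)1 << (20 + log_bar_size));
}

/*
 * Hypothetical helper: smallest order such that 2^order >= npages,
 * matching "1024 pages(order 10)" and "order:0" for a single entry.
 */
static unsigned int
pages_to_order(uint64_t npages)
{
	unsigned int order = 0;
	uint64_t i;

	for (i = 1; i < npages; i <<= 1)
		order++;
	return (order);
}
```

For reference, log_bar_size_to_bytes(3) is 0x800000 bytes, the same size as the "Lazy allocation of 0x800000 bytes" line in the issue 2 log, and pages_to_order(1025) would already jump to order 11.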
(In reply to Dexuan Cui from comment #2)
Updates:

Issue 2: A Linux patch has been made (not posted yet), and we'll need to port it: https://www.spinics.net/lists/netdev/msg421306.html

Issue 4: We may need to port more patches, which were mentioned here: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1650058/comments/7
"Mellanox has told me that the following three commits are needed for SR-IOV in Azure:
1. d585df1c5ccf net/mlx4_core: Avoid command timeouts during VF driver device shutdown
2. 7c3945bc2073 net/mlx4_core: Fix when to save some qp context flags for dynamic VST to VGT transitions
3. 291c566a2891 net/mlx4_core: Fix racy CQ (Completion Queue) free"
and a 4th patch:
commit 0cd9302734111abc0b5912b695336f2ee63cb22b net/mlx4_core: Reset flow activation upon SRIOV fatal command cases
(In reply to Dexuan Cui from comment #3) One more patch is needed: 6496bbf0ec48 net/mlx4_en: Fix bad WQE issue (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1650058/comments/8)
One more patch to port from Linux: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4cbe4dac82e423ecc9a0ba46af24a860853259f4
Ping - any news on this issue? Have you ported more patches from Linux to FreeBSD?
(In reply to Hans Petter Selasky from comment #6)
A number of Linux patches still need to be ported to FreeBSD, and full testing is required.
Do you know which release of Linux has all the needed MLX patches?
(In reply to Hans Petter Selasky from comment #8) I would check https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
A commit references this bug:

Author: hselasky
Date: Thu Jun 1 10:39:00 UTC 2017
New revision: 319413
URL: https://svnweb.freebsd.org/changeset/base/319413

Log:
  Free hardware queue resource after port is stopped in the mlx4en(4) driver.

  Else if the port is up the resource might still be busy and the MTT free will fail.

  PR: 216493
  MFC after: 3 days
  Sponsored by: Mellanox Technologies

Changes:
  head/sys/dev/mlx4/mlx4_en/mlx4_en_netdev.c
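r319413 changes the teardown order in mlx4en(4): the hardware queue resources are freed only after the port has been stopped, because while the port is up the free can fail. That failure likely corresponds to the "Failed to free mtt range" message in issue 1. The toy model below is a hypothetical sketch of the ordering rule only, not the actual mlx4_en_netdev.c change; `toy_port`, `toy_free_queue_mtt`, `toy_stop_port` and `toy_teardown` are invented names.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical port state; not the real mlx4en private structure. */
struct toy_port {
	bool port_up;       /* hardware may still be using the queues */
	bool mtt_allocated; /* queue translation entries are still held */
};

/* Freeing the queue's MTTs fails while the port is up (resource busy). */
static int
toy_free_queue_mtt(struct toy_port *p)
{
	assert(p->mtt_allocated);
	if (p->port_up)
		return (-EBUSY);
	p->mtt_allocated = false;
	return (0);
}

static void
toy_stop_port(struct toy_port *p)
{
	p->port_up = false;
}

/* Teardown in the order the commit describes: stop the port, then free. */
static int
toy_teardown(struct toy_port *p)
{
	toy_stop_port(p);
	return (toy_free_queue_mtt(p));
}
```

Calling toy_free_queue_mtt() on a port that is still up returns -EBUSY and leaves the entries allocated, while toy_teardown() succeeds because the port is stopped first.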
A commit references this bug:

Author: hselasky
Date: Thu Jun 1 10:44:48 UTC 2017
New revision: 319414
URL: https://svnweb.freebsd.org/changeset/base/319414

Log:
  Allow communication between functions on the same host when using the mlx4en(4) driver in SRIOV mode.

  Place a copy of the destination MAC address in the send WQE only under SRIOV/eSwitch configuration or when the device is in selftest. This allows communication between functions on the same host.

  PR: 216493
  MFC after: 3 days
  Sponsored by: Mellanox Technologies

Changes:
  head/sys/dev/mlx4/mlx4_en/mlx4_en_tx.c
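r319414 makes the destination MAC copy in the send WQE conditional: it happens only under SRIOV/eSwitch or in selftest, which, per the commit log, is what lets functions on the same host reach each other (issue 3). A minimal sketch of that condition follows. The types and flag names (`toy_tx_ctrl`, `toy_tx_cfg`, `sriov_or_eswitch`, `selftest`, `toy_fill_ctrl`) are hypothetical, and the real WQE control-segment layout in mlx4_en_tx.c is different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_ETH_ALEN 6

/* Hypothetical control segment of a send WQE; the real layout differs. */
struct toy_tx_ctrl {
	uint8_t dst_mac[TOY_ETH_ALEN]; /* stays zero unless the copy is enabled */
};

/* Hypothetical device configuration flags. */
struct toy_tx_cfg {
	bool sriov_or_eswitch; /* running under SRIOV / eSwitch */
	bool selftest;         /* device is in selftest */
};

/*
 * Copy the frame's destination MAC (the first 6 bytes of the Ethernet
 * header) into the WQE only under SRIOV/eSwitch or selftest; otherwise
 * leave the field cleared.
 */
static void
toy_fill_ctrl(struct toy_tx_ctrl *ctrl, const struct toy_tx_cfg *cfg,
    const uint8_t *frame)
{
	assert(ctrl != NULL && cfg != NULL && frame != NULL);
	memset(ctrl, 0, sizeof(*ctrl));
	if (cfg->sriov_or_eswitch || cfg->selftest)
		memcpy(ctrl->dst_mac, frame, TOY_ETH_ALEN);
}
```

Keeping the copy conditional means the non-SRIOV transmit path does no extra work per packet, which matches the "only under" wording of the commit log.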
Hi,

There has been a series of fixes to mlx4en(4) in 12-CURRENT regarding SRIOV support. Can you re-test and report which issues still remain?

--HPS
(In reply to Hans Petter Selasky from comment #12) Thanks, HPS! Will do.
A commit references this bug:

Author: hselasky
Date: Sun Jun 4 08:25:29 UTC 2017
New revision: 319563
URL: https://svnweb.freebsd.org/changeset/base/319563

Log:
  MFC r319414:
  Allow communication between functions on the same host when using the mlx4en(4) driver in SRIOV mode.

  Place a copy of the destination MAC address in the send WQE only under SRIOV/eSwitch configuration or when the device is in selftest. This allows communication between functions on the same host.

  PR: 216493
  Approved by: re (kib)
  Sponsored by: Mellanox Technologies

Changes:
_U  stable/11/
  stable/11/sys/ofed/drivers/net/mlx4/en_tx.c

A commit references this bug:

Author: hselasky
Date: Sun Jun 4 08:29:17 UTC 2017
New revision: 319564
URL: https://svnweb.freebsd.org/changeset/base/319564

Log:
  MFC r319414:
  Allow communication between functions on the same host when using the mlx4en(4) driver in SRIOV mode.

  Place a copy of the destination MAC address in the send WQE only under SRIOV/eSwitch configuration or when the device is in selftest. This allows communication between functions on the same host.

  PR: 216493
  Sponsored by: Mellanox Technologies

Changes:
_U  stable/10/
  stable/10/sys/ofed/drivers/net/mlx4/en_tx.c

A commit references this bug:

Author: hselasky
Date: Sun Jun 4 08:30:55 UTC 2017
New revision: 319565
URL: https://svnweb.freebsd.org/changeset/base/319565

Log:
  MFC r319414:
  Allow communication between functions on the same host when using the mlx4en(4) driver in SRIOV mode.

  Place a copy of the destination MAC address in the send WQE only under SRIOV/eSwitch configuration or when the device is in selftest. This allows communication between functions on the same host.

  PR: 216493
  Sponsored by: Mellanox Technologies

Changes:
_U  stable/9/sys/
  stable/9/sys/ofed/drivers/net/mlx4/en_tx.c

A commit references this bug:

Author: hselasky
Date: Sun Jun 4 08:45:14 UTC 2017
New revision: 319566
URL: https://svnweb.freebsd.org/changeset/base/319566

Log:
  MFC r319413:
  Free hardware queue resource after port is stopped in the mlx4en(4) driver.

  Else if the port is up the resource might still be busy and the MTT free will fail.

  PR: 216493
  Approved by: re (kib)
  Sponsored by: Mellanox Technologies

Changes:
_U  stable/11/
  stable/11/sys/ofed/drivers/net/mlx4/en_netdev.c

A commit references this bug:

Author: hselasky
Date: Sun Jun 4 08:47:09 UTC 2017
New revision: 319567
URL: https://svnweb.freebsd.org/changeset/base/319567

Log:
  MFC r319413:
  Free hardware queue resource after port is stopped in the mlx4en(4) driver.

  Else if the port is up the resource might still be busy and the MTT free will fail.

  PR: 216493
  Sponsored by: Mellanox Technologies

Changes:
_U  stable/10/
  stable/10/sys/ofed/drivers/net/mlx4/en_netdev.c

A commit references this bug:

Author: hselasky
Date: Sun Jun 4 08:48:27 UTC 2017
New revision: 319568
URL: https://svnweb.freebsd.org/changeset/base/319568

Log:
  MFC r319413:
  Free hardware queue resource after port is stopped in the mlx4en(4) driver.

  Else if the port is up the resource might still be busy and the MTT free will fail.

  PR: 216493
  Sponsored by: Mellanox Technologies

Changes:
_U  stable/9/sys/
  stable/9/sys/ofed/drivers/net/mlx4/en_netdev.c
(In reply to commit-hook from comment #19)
The first 3 issues have all been fixed, and the patches are in HEAD, stable/11 and stable/10. Let's close this bug and open a new bug for the live migration issue only.