Summary: | [Hyper-V] Mellanox ConnectX-3 VF driver can't work when FreeBSD runs on Hyper-V 2016 | | |
---|---|---|---|
Product: | Base System | Reporter: | Dexuan Cui <decui> |
Component: | kern | Assignee: | freebsd-virtualization (Nobody) <virtualization> |
Status: | Closed FIXED | | |
Severity: | Affects Only Me | CC: | com.my.network, decui, hselasky, kyliel, sephe, tablosazi.farahan |
Priority: | --- | | |
Version: | CURRENT | | |
Hardware: | Any | | |
OS: | Any | | |
Description
Dexuan Cui
2017-01-26 13:08:06 UTC
To reproduce the 4 issues, we need to use today's HEAD code or newer:

r312690 | dexuan | 2017-01-24 17:27:13 +0800 (Tue, 24 Jan 2017) | 8 lines
hyperv/hn: add devctl_notify for VF_UP/DOWN events

and manually apply the 2 patches mentioned in the bug report:
https://reviews.freebsd.org/D8867
https://reviews.freebsd.org/D8868

Updates:
hselasky committed the 2 patches (D8867, D8868) into HEAD last Friday.

Issue 1: No update.

Issue 2: Actually, the Linux version of the driver has the same (or a similar) issue, and I reported it here:
https://www.spinics.net/lists/netdev/msg420136.html
Some people suspected that it failed to allocate a UAR, but after we increased LOG_BAR_SIZE with mlxconfig from 3 (8MB) to 5 (32MB), the issue was still there. We'll continue to work on this.

Issue 3: No update.

Issue 4: It looks like this may be a host-side issue: on Live Migration, the host suddenly removes the VF from the guest by force. Working on this. Meanwhile, the VF driver should be made more robust to cope with this scenario.

(In reply to Dexuan Cui from comment #2)
Updates:

Issue 2: Linux has made a patch (not posted yet) and we'll need to port it:
https://www.spinics.net/lists/netdev/msg421306.html

Issue 4: We may need to port more patches that were mentioned here:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1650058/comments/7
"
Mellanox has told me that the following three commits are needed for SR-IOV in Azure:
1. d585df1c5ccf net/mlx4_core: Avoid command timeouts during VF driver device shutdown
2. 7c3945bc2073 net/mlx4_core: Fix when to save some qp context flags for dynamic VST to VGT transitions
3. 291c566a2891 net/mlx4_core: Fix racy CQ (Completion Queue) free
"
and the 4th patch:
commit 0cd9302734111abc0b5912b695336f2ee63cb22b
net/mlx4_core: Reset flow activation upon SRIOV fatal command cases

(In reply to Dexuan Cui from comment #3)
One more patch is needed:
6496bbf0ec48 net/mlx4_en: Fix bad WQE issue
(https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1650058/comments/8)

One more patch to port from Linux:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4cbe4dac82e423ecc9a0ba46af24a860853259f4

Ping - any news on this issue? Have you ported more patches from Linux to FreeBSD?

(In reply to Hans Petter Selasky from comment #6)
There are a bunch of Linux patches that need to be ported to FreeBSD, and full testing is required.

Do you know which release of Linux has all the needed MLX patches?

(In reply to Hans Petter Selasky from comment #8)
I would check https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git

A commit references this bug:

Author: hselasky
Date: Thu Jun 1 10:39:00 UTC 2017
New revision: 319413
URL: https://svnweb.freebsd.org/changeset/base/319413

Log:
  Free hardware queue resource after port is stopped in the mlx4en(4) driver.
  Else if the port is up the resource might still be busy and the MTT free
  will fail.

  PR: 216493
  MFC after: 3 days
  Sponsored by: Mellanox Technologies

Changes:
  head/sys/dev/mlx4/mlx4_en/mlx4_en_netdev.c

A commit references this bug:

Author: hselasky
Date: Thu Jun 1 10:44:48 UTC 2017
New revision: 319414
URL: https://svnweb.freebsd.org/changeset/base/319414

Log:
  Allow communication between functions on the same host when using the
  mlx4en(4) driver in SRIOV mode.

  Place a copy of the destination MAC address in the send WQE only under
  SRIOV/eSwitch configuration or when the device is in selftest. This
  allows communication between functions on the same host.

  PR: 216493
  MFC after: 3 days
  Sponsored by: Mellanox Technologies

Changes:
  head/sys/dev/mlx4/mlx4_en/mlx4_en_tx.c

Hi,

There has been a series of fixes to mlx4en(4) in 12-current regarding SRIOV support. Can you re-test and report which issues still remain?

--HPS

(In reply to Hans Petter Selasky from comment #12)
Thanks, HPS! Will do.

A commit references this bug:

Author: hselasky
Date: Sun Jun 4 08:25:29 UTC 2017
New revision: 319563
URL: https://svnweb.freebsd.org/changeset/base/319563

Log:
  MFC r319414:
  Allow communication between functions on the same host when using the
  mlx4en(4) driver in SRIOV mode.

  Place a copy of the destination MAC address in the send WQE only under
  SRIOV/eSwitch configuration or when the device is in selftest. This
  allows communication between functions on the same host.

  PR: 216493
  Approved by: re (kib)
  Sponsored by: Mellanox Technologies

Changes:
_U  stable/11/
  stable/11/sys/ofed/drivers/net/mlx4/en_tx.c

A commit references this bug:

Author: hselasky
Date: Sun Jun 4 08:29:17 UTC 2017
New revision: 319564
URL: https://svnweb.freebsd.org/changeset/base/319564

Log:
  MFC r319414:
  Allow communication between functions on the same host when using the
  mlx4en(4) driver in SRIOV mode.

  Place a copy of the destination MAC address in the send WQE only under
  SRIOV/eSwitch configuration or when the device is in selftest. This
  allows communication between functions on the same host.

  PR: 216493
  Sponsored by: Mellanox Technologies

Changes:
_U  stable/10/
  stable/10/sys/ofed/drivers/net/mlx4/en_tx.c

A commit references this bug:

Author: hselasky
Date: Sun Jun 4 08:30:55 UTC 2017
New revision: 319565
URL: https://svnweb.freebsd.org/changeset/base/319565

Log:
  MFC r319414:
  Allow communication between functions on the same host when using the
  mlx4en(4) driver in SRIOV mode.

  Place a copy of the destination MAC address in the send WQE only under
  SRIOV/eSwitch configuration or when the device is in selftest. This
  allows communication between functions on the same host.

  PR: 216493
  Sponsored by: Mellanox Technologies

Changes:
_U  stable/9/sys/
  stable/9/sys/ofed/drivers/net/mlx4/en_tx.c

A commit references this bug:

Author: hselasky
Date: Sun Jun 4 08:45:14 UTC 2017
New revision: 319566
URL: https://svnweb.freebsd.org/changeset/base/319566

Log:
  MFC r319413:
  Free hardware queue resource after port is stopped in the mlx4en(4) driver.
  Else if the port is up the resource might still be busy and the MTT free
  will fail.

  PR: 216493
  Approved by: re (kib)
  Sponsored by: Mellanox Technologies

Changes:
_U  stable/11/
  stable/11/sys/ofed/drivers/net/mlx4/en_netdev.c

A commit references this bug:

Author: hselasky
Date: Sun Jun 4 08:47:09 UTC 2017
New revision: 319567
URL: https://svnweb.freebsd.org/changeset/base/319567

Log:
  MFC r319413:
  Free hardware queue resource after port is stopped in the mlx4en(4) driver.
  Else if the port is up the resource might still be busy and the MTT free
  will fail.

  PR: 216493
  Sponsored by: Mellanox Technologies

Changes:
_U  stable/10/
  stable/10/sys/ofed/drivers/net/mlx4/en_netdev.c

A commit references this bug:

Author: hselasky
Date: Sun Jun 4 08:48:27 UTC 2017
New revision: 319568
URL: https://svnweb.freebsd.org/changeset/base/319568

Log:
  MFC r319413:
  Free hardware queue resource after port is stopped in the mlx4en(4) driver.
  Else if the port is up the resource might still be busy and the MTT free
  will fail.

  PR: 216493
  Sponsored by: Mellanox Technologies

Changes:
_U  stable/9/sys/
  stable/9/sys/ofed/drivers/net/mlx4/en_netdev.c

(In reply to commit-hook from comment #19)
The first 3 issues have all been fixed, and the patches are in HEAD, stable/11 and stable/10.

Let's close this bug and open a new bug for the live migration issue only.
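
A side note on the LOG_BAR_SIZE values quoted under Issue 2 (3 for 8MB, 5 for 32MB): both pairs are consistent with the setting being the base-2 logarithm of the BAR size in MB. The sketch below only reproduces that arithmetic; the helper name `bar_size_mb` is hypothetical, and the 2**n mapping is an assumption inferred from the two data points in this thread, not taken from the mlxconfig documentation.

```python
def bar_size_mb(log_bar_size: int) -> int:
    """Return the BAR size in MB, ASSUMING size = 2**LOG_BAR_SIZE MB.

    The mapping is inferred from the values in this thread
    (3 -> 8MB, 5 -> 32MB); treat it as an assumption.
    """
    if log_bar_size < 0:
        raise ValueError("LOG_BAR_SIZE must be non-negative")
    return 1 << log_bar_size


if __name__ == "__main__":
    # The two settings tried while investigating the suspected UAR
    # allocation failure.
    for value in (3, 5):
        print(f"LOG_BAR_SIZE={value} -> {bar_size_mb(value)} MB")
```

Under that assumption, raising the setting from 3 to 5 quadruples the BAR. Since the failure persisted with the larger BAR, the thread treats BAR exhaustion as an unlikely sole cause of Issue 2.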