220453 – [Hyper-V] need to update Mellanox ConnectX-3 VF driver to support Live Migration

Status:	New

Alias:	None

Product:	Base System
Classification:	Unclassified
Component:	kern
Version:	CURRENT
Hardware:	Any Any

Importance:	--- Affects Only Me
Assignee:	freebsd-virtualization (Nobody)

URL:
Keywords:

Depends on:
Blocks:

Reported:	2017-07-03 18:08 UTC by Dexuan Cui
Modified:	2018-07-07 00:38 UTC
CC List:	6 users

See Also:

Description Dexuan Cui 2017-07-03 18:08:42 UTC

Now that bug 216493 is closed, let's focus on this live migration bug.

With the latest HEAD or any of the stable/ branches, Live Migration still doesn't work, and we get the messages below (please refer to bug 216493 for the patches we may need to port from Linux):

Jan 11 19:16:43 decui-b11 kernel: mlx4_core0: mlx4_comm_cmd_wait: Comm channel is not idle. My toggle is 0 (op: 0x5)
Jan 11 19:16:43 decui-b11 kernel: mlx4_core0: failed execution of VHCR_POST commandopcode FREE_RES (0xf01)
Jan 11 19:16:43 decui-b11 kernel: mlx4_core0: Failed to free mtt range at:5937 order:0
Jan 11 19:16:54 decui-b11 kernel: mlx4_core0: mlx4_comm_cmd_wait: Comm channel is not idle. My toggle is 0 (op: 0x5)
Jan 11 19:16:54 decui-b11 kernel: mlx4_core0: failed execution of VHCR_POST commandopcode CLOSE_PORT (0xa)
Jan 11 19:18:04 decui-b11 kernel: mlx4_core0: mlx4_comm_cmd_wait: Comm channel is not idle. My toggle is 0 (op: 0x5)
Jan 11 19:18:04 decui-b11 kernel: mlx4_core0: failed execution of VHCR_POST commandopcode FREE_RES (0xf01)
Jan 11 19:19:14 decui-b11 kernel: mlx4_core0: mlx4_comm_cmd_wait: Comm channel is not idle. My toggle is 0 (op: 0x5)
Jan 11 19:19:14 decui-b11 kernel: mlx4_core0: failed execution of VHCR_POST commandopcode QP_FLOW_STEERING_DETACH (0x66)
Jan 11 19:19:14 decui-b11 kernel: mlx4_core0: Fail to detach network rule. registration id = 0x9000000000002
Jan 11 19:20:24 decui-b11 kernel: mlx4_core0: mlx4_comm_cmd_wait: Comm channel is not idle. My toggle is 0 (op: 0x5)
Jan 11 19:20:24 decui-b11 kernel: mlx4_core0: failed execution of VHCR_POST commandopcode QP_FLOW_STEERING_DETACH (0x66)
Jan 11 19:20:24 decui-b11 kernel: mlx4_core0: Fail to detach network rule. registration id = 0x9000000000003
Jan 11 19:21:34 decui-b11 kernel: mlx4_core0: mlx4_comm_cmd_wait: Comm channel is not idle. My toggle is 0 (op: 0x5)
Jan 11 19:21:34 decui-b11 kernel: mlx4_core0: failed execution of VHCR_POST commandopcode QP_FLOW_STEERING_DETACH (0x66)
Jan 11 19:21:34 decui-b11 kernel: mlx4_core0: Fail to detach network rule. registration id = 0x9000000000004
Jan 11 19:22:46 decui-b11 kernel: mlx4_core0: mlx4_comm_cmd_wait: Comm channel is not idle. My toggle is 0 (op: 0x5)
Jan 11 19:22:46 decui-b11 kernel: mlx4_core0: failed execution of VHCR_POST commandopcode QP_FLOW_STEERING_DETACH (0x66)
Jan 11 19:22:46 decui-b11 kernel: mlx4_core0: Fail to detach network rule. registration id = 0x9000000000005
Jan 11 19:23:56 decui-b11 kernel: mlx4_core0: mlx4_comm_cmd_wait: Comm channel is not idle. My toggle is 0 (op: 0x5)
Jan 11 19:23:56 decui-b11 kernel: mlx4_core0: failed execution of VHCR_POST commandopcode QP_FLOW_STEERING_DETACH (0x66)
Jan 11 19:23:56 decui-b11 kernel: mlx4_core0: Fail to detach network rule. registration id = 0x9000000000006
Jan 11 19:25:06 decui-b11 kernel: mlx4_core0: mlx4_comm_cmd_wait: Comm channel is not idle. My toggle is 0 (op: 0x5)
Jan 11 19:25:06 decui-b11 kernel: mlx4_core0: failed execution of VHCR_POST commandopcode QP_FLOW_STEERING_DETACH (0x66)
Jan 11 19:25:06 decui-b11 kernel: mlx4_core0: Fail to detach network rule. registration id = 0x9000000000007
Jan 11 19:26:16 decui-b11 kernel: mlx4_core0: mlx4_comm_cmd_wait: Comm channel is not idle. My toggle is 0 (op: 0x5)
Jan 11 19:26:16 decui-b11 kernel: mlx4_core0: failed execution of VHCR_POST commandopcode SET_MCAST_FLTR (0x48)
Jan 11 19:27:26 decui-b11 kernel: mlx4_core0: mlx4_comm_cmd_wait: Comm channel is not idle. My toggle is 0 (op: 0x5)
Jan 11 19:27:26 decui-b11 kernel: mlx4_core0: failed execution of VHCR_POST commandopcode FREE_RES (0xf01)
Jan 11 19:27:26 decui-b11 kernel: mlx4_core0: Failed to free icm of qp:2279
Jan 11 19:28:36 decui-b11 kernel: mlx4_core0: mlx4_comm_cmd_wait: Comm channel is not idle. My toggle is 0 (op: 0x5)
Jan 11 19:28:36 decui-b11 kernel: mlx4_core0: failed execution of VHCR_POST commandopcode FREE_RES (0xf01)
Jan 11 19:28:36 decui-b11 kernel: mlx4_core0: Failed to release qp range base:2279 cnt:1
Jan 11 19:29:46 decui-b11 kernel: mlx4_core0: mlx4_comm_cmd_wait: Comm channel is not idle. My toggle is 0 (op: 0x5)
Jan 11 19:29:46 decui-b11 kernel: mlx4_core0: failed execution of VHCR_POST commandopcode 2RST_QP (0x21)
Jan 11 19:30:56 decui-b11 kernel: mlx4_core0: mlx4_comm_cmd_wait: Comm channel is not idle. My toggle is 0 (op: 0x5)
Jan 11 19:30:56 decui-b11 kernel: mlx4_core0: failed execution of VHCR_POST commandopcode HW2SW_CQ (0x17)
Jan 11 19:30:56 decui-b11 kernel: mlx4_core0: HW2SW_CQ failed (-35) for CQN 0000b5
Jan 11 19:32:06 decui-b11 kernel: mlx4_core0: mlx4_comm_cmd_wait: Comm channel is not idle. My toggle is 0 (op: 0x5)
Jan 11 19:32:06 decui-b11 kernel: mlx4_core0: failed execution of VHCR_POST commandopcode FREE_RES (0xf01)
Jan 11 19:32:06 decui-b11 kernel: mlx4_core0: Failed freeing cq:181
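
As a rough aid for reading a log like the one above, the failed commands can be tallied by opcode. This is a minimal Python sketch and not part of the mlx4 driver or any FreeBSD tool; the function name count_failed_commands and the regular expression are assumptions made for illustration, and the sample lines are copied from the log above.

```python
import re
from collections import Counter

# Matches the "failed execution" lines printed by mlx4_core in the log above.
# "command" and "opcode" appear fused in that log, so the whitespace is optional.
FAILED_CMD = re.compile(
    r"failed execution of VHCR_POST command\s*opcode (\w+) \((0x[0-9a-fA-F]+)\)"
)


def count_failed_commands(log_text):
    """Return a Counter mapping (opcode name, opcode hex) to its number of failures."""
    counts = Counter()
    for line in log_text.splitlines():
        match = FAILED_CMD.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# Three lines taken verbatim from the log in the description.
sample = """\
Jan 11 19:16:43 decui-b11 kernel: mlx4_core0: failed execution of VHCR_POST commandopcode FREE_RES (0xf01)
Jan 11 19:16:54 decui-b11 kernel: mlx4_core0: failed execution of VHCR_POST commandopcode CLOSE_PORT (0xa)
Jan 11 19:18:04 decui-b11 kernel: mlx4_core0: failed execution of VHCR_POST commandopcode FREE_RES (0xf01)
"""

for (name, code), n in count_failed_commands(sample).most_common():
    print(f"{name} {code}: {n}")
```

Run against the full log, a tally like this makes it easier to see that the timeouts hit the teardown commands (FREE_RES, QP_FLOW_STEERING_DETACH and so on) rather than one specific operation.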

Comment 1 Pete French 2018-06-16 10:53:30 UTC

Hi, has there been any progress on this? I am interested in enabling high performance networking on my Azure machines, which I believe uses this driver, but I do not want to stop live migration from working, for obvious reasons!

Comment 2 Dexuan Cui 2018-06-16 16:53:48 UTC

(In reply to pete from comment #1)
When I reported the bug, I was testing the case on my local Hyper-V hosts. I have not tested this case recently, but I suspect it may have been fixed already. hselasky (hps) knows this better than me.

So far, it looks like Azure doesn't live migrate a VM from one host to another. I suppose this bug should not block you from using SR-IOV (i.e. Accelerated Networking) on Azure. If you use FreeBSD 10.4, the upcoming 11.2, or the latest CURRENT code, I think SR-IOV should work out of the box.

Comment 3 Pete French 2018-06-16 17:37:14 UTC

OK, thanks, I shall give it a try. I am using 11-STABLE (so 11.2 basically). You say "out of the box" - does that mean I don't need to recompile with OFED enabled? I usually do this when running Mellanox adapters.

Comment 4 Dexuan Cui 2018-06-16 18:25:53 UTC

(In reply to pete from comment #3)
I didn't check it myself, but I remember it being mentioned that the Mellanox drivers are built by default now: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211528#c10

Comment 5 Pete French 2018-06-18 10:43:35 UTC

Using what is about to become 11.2, I unfortunately get a panic when I have the Mellanox drivers loaded and accelerated networking enabled:

Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address   = 0x1d4
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80bb34b4
stack pointer           = 0x28:0xfffffe03e27be9d0
frame pointer           = 0x28:0xfffffe03e27be9d0
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (vmbusdev)
trap number             = 12
timeout stopping cpus
panic: page fault
cpuid = 2
KDB: stack backtrace:
#0 0xffffffff80b36697 at kdb_backtrace+0x67
#1 0xffffffff80af00f7 at vpanic+0x177
#2 0xffffffff80aeff73 at panic+0x43
#3 0xffffffff80f6ee0f at trap_fatal+0x35f
#4 0xffffffff80f6ee69 at trap_pfault+0x49
#5 0xffffffff80f6e636 at trap+0x2c6
#6 0xffffffff80f4e5ac at calltrap+0x8
#7 0xffffffff80ba6494 at namei+0x1b4
#8 0xffffffff80bc1d53 at vn_open_cred+0x233
#9 0xffffffff80ac31de at linker_load_module+0x47e
#10 0xffffffff80ac5061 at kern_kldload+0xc1
#11 0xffffffff8228b6f2 at mlx4_request_modules+0x92
#12 0xffffffff8228f986 at mlx4_load_one+0x3056
#13 0xffffffff82292c70 at mlx4_init_one+0x3c0
#14 0xffffffff82259ed5 at linux_pci_attach+0x405
#15 0xffffffff80b28e68 at device_attach+0x3b8
#16 0xffffffff80b2a0fd at bus_generic_attach+0x3d
#17 0xffffffff8076fac5 at pci_attach+0xd5

Comment 6 Dexuan Cui 2018-06-18 14:58:42 UTC

(In reply to pete from comment #5)
Thanks for reporting this! Let us have a look.

Comment 7 Dexuan Cui 2018-06-18 15:30:39 UTC

Using today's HEAD (3309c975db94bf91f18da6a0285649a8903e56c1), I got this:

vmbus0: vmbus IDT vector 251
vmbus0: smp_started = 1
panic: vm_fault_hold: fault on nofault entry, addr: 0xfffffe0000dff000
cpuid = 15
time = 1
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xffffffff826d5520
vpanic() at vpanic+0x1a3/frame 0xffffffff826d5580
panic() at panic+0x43/frame 0xffffffff826d55e0
vm_fault_hold() at vm_fault_hold+0x2373/frame 0xffffffff826d5720
vm_fault() at vm_fault+0x60/frame 0xffffffff826d5760
trap_pfault() at trap_pfault+0x188/frame 0xffffffff826d57b0
trap() at trap+0x2ba/frame 0xffffffff826d58c0
calltrap() at calltrap+0x8/frame 0xffffffff826d58c0
--- trap 0xc, rip = 0xfffffe0000dff000, rsp = 0xffffffff826d5990, rbp = 0xffffffff826d59a0 ---
??() at 0xfffffe0000dff000/frame 0xffffffff826d59a0
vmbus_msghc_exec() at vmbus_msghc_exec+0x58/frame 0xffffffff826d59e0
vmbus_intrhook() at vmbus_intrhook+0x633/frame 0xffffffff826d5aa0
run_interrupt_driven_config_hooks() at run_interrupt_driven_config_hooks+0x7c/frame 0xffffffff826d5ac0
boot_run_interrupt_driven_config_hooks() at boot_run_interrupt_driven_config_hooks+0x20/frame 0xffffffff826d5b50
mi_startup() at mi_startup+0x118/frame 0xffffffff826d5b70
btext() at btext+0x2c
KDB: enter: panic
[ thread pid 0 tid 100000 ]
Stopped at      kdb_enter+0x3b: movq    $0,kdb_why
db>

Comment 8 Dexuan Cui 2018-06-18 17:44:02 UTC

(In reply to Dexuan Cui from comment #6)
With today's releng/11.2 (f55f63f9f9c29dae38ac323adfd253cec627873c), the mlx4 driver works fine for me. What's the exact version you're using?

Comment 9 Dexuan Cui 2018-06-18 19:35:34 UTC

(In reply to Dexuan Cui from comment #8)
Hmmm, I can't reproduce the panic in comment #7 any more, after I built a kernel from scratch (i.e. git clone the repo into a new directory, and make&install the kernel, and reboot) with the same version (3309c975db94bf91f18da6a0285649a8903e56c1).  


@pete, can you consistently reproduce your panic every time you reboot the VM?
What if you also build a kernel from scratch?

Comment 10 Pete French 2018-06-18 20:11:38 UTC

Just a quick one, will try and add more detail tomorrow.

I am using r334458 on STABLE-11, which was more or less the point where 11_2 branched, I believe. I can't see anything in 11_2 after that point which is mlx4en or linuxkpi related. I can try doing an update tomorrow to the latest STABLE and see, though. Will try and prepare a box just for testing this.

The compile has the following in src.conf

# All our Intel machines are post-Core2
CPUTYPE?=core2
# We are using exim and cups
WITHOUT_SENDMAIL=true
WITHOUT_LPR=true

and the following in make.conf

# Build ports in local
WRKDIRPREFIX=/usr/local/port-build
# Use new format packages
WITH_PKGNG=yes
DISABLE_VULNERABILITIES=yes
# If we install cups it overwrites the base
CUPS_OVERWRITE_BASE=yes

I switched the interface over to accelerated networking and rebooted, but it still came up using hn0. So I added a 'mlx4en_load="YES"' to loader.conf and rebooted - which gave me the panic. I tried many times, it panicked every time.

The only oddity in the setup is I am booting off ZFS instead of UFS. Drive is GPT partitioned, and I have the following in loader.conf

boot_serial="YES"
comconsole_speed="115200"
console="comconsole"
vfs.mountroot.timeout="300"
kern.geom.label.disk_ident.enable="0"
kern.geom.label.gptid.enable="0"
ahci_load="YES"
aesni_load="YES"
cryptodev_load="YES"
zfs_load="YES"
vfs.zfs.arc_max="1G"
vfs.zfs.prefetch_disable="1"
vfs.zfs.txg.timeout="5"
vfs.zfs.vdev.cache.size="10M"
vfs.zfs.vdev.cache.max="10M"

Don't know if any of that is significant at all - I assume adding the line to loader.conf is the right thing to do, yes?

Comment 11 Dexuan Cui 2018-06-18 20:16:58 UTC

(In reply to pete from comment #10)
Yes, I also have mlx4en_load="YES" in my /boot/loader.conf.

Comment 12 Pete French 2018-06-19 14:28:58 UTC

So, I cloned the machine, to give me something to experiment with, and the clone panics in the same way. Watching it boot, it only does this when it gets to the networking part of the boot - and the message before it is that hn0 has a status of DOWN. I am surprised that it is also finding the hn0 interface - is that expected?

Am trying to recover the box now, and will remove all networking references from rc.conf to see what it does. Will then try a rebuild from today's STABLE and see if that works.

Comment 13 Dexuan Cui 2018-06-19 14:57:04 UTC

(In reply to pete from comment #12)
Yes, it's expected that you saw a hn interface.

Hyper-V is a little different from other hypervisors with respect to SR-IOV support for NICs. Hyper-V provides a pair of NICs to the VM: one is the para-virtualized (PV) NIC (hn), and the other is the hardware VF NIC. Both NICs share the same MAC address. Usually almost all of the network traffic goes through the VF NIC, so that we can take advantage of the benefits of the hardware VF NIC (i.e. lower latency and CPU utilization, and higher throughput). If necessary, however, the network traffic can dynamically switch to the PV NIC, which facilitates things like live migration of the VM.

To enable SR-IOV in FreeBSD on Hyper-V, in 2016 we updated the PV NIC driver (i.e. the netvsc driver) a little, and we added a PCI front-end driver (i.e. the pcib driver) to discover the VF device. In Aug 2017 sephe implemented an automatic “bond mode”, with which we don't need to manually use the lagg driver any more, and the configuration work is done automatically for the PV NIC (we don't and shouldn't directly touch the VF interface).
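
Because the PV NIC and the VF NIC share one MAC address, the pairing described above can be illustrated by grouping interfaces by MAC. The following is only a minimal Python sketch of that idea, not how the netvsc driver implements its bond mode; the function name pair_by_mac, the interface inventory, and all MAC values in it are hypothetical.

```python
from collections import defaultdict


def pair_by_mac(interfaces):
    """Group interface names by MAC address; keep only MACs seen on more than one interface."""
    groups = defaultdict(list)
    for name, mac in interfaces.items():
        groups[mac.lower()].append(name)
    return {mac: sorted(names) for mac, names in groups.items() if len(names) > 1}


# Hypothetical inventory (all values are made up for illustration).
interfaces = {
    "hn0": "00:0D:3A:20:9C:A0",     # PV NIC
    "mlxen0": "00:0d:3a:20:9c:a0",  # VF NIC, same MAC as its PV partner
    "hn1": "00:0d:3a:00:00:01",     # PV NIC with no VF attached
}

for mac, names in pair_by_mac(interfaces).items():
    print(f"{mac}: {' + '.join(names)}")
```

In this sketch hn0 and mlxen0 end up in one group, while hn1 has no partner and is left out. On a real system, all IP configuration would still go on the hn interface, as noted above.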

Comment 14 Pete French 2018-06-19 15:09:54 UTC

I got a capture of what it does right before the panic... so, this takes place directly after root is mounted.


Trying to mount root from zfs:zroot/ROOT/default []...
pci1: <PCI bus> on pcib1
mlx4_core0: <mlx4_core> at device 2.0 on pci1
<6>mlx4_core: Mellanox ConnectX core driver v3.4.1 (October 2017)
mlx4_core: Initializing mlx4_core
mlx4_core0: Detected virtual function - running in slave mode
mlx4_core0: Sending reset
mlx4_core0: Sending vhcr0
mlx4_core0: HCA minimum page size:512
mlx4_core0: Timestamping is not supported in slave mode
mlx4_en mlx4_core0: Activating port:1
mlxen0: Ethernet address: 00:0d:3a:20:9c:a0
<4>mlx4_en: mlx4_core0: Port 1: Using 4 TX rings
mlxen0: link state changed to DOWN
<4>mlx4_en: mlx4_core0: Port 1: Using 4 RX rings
hn0: link state changed to DOWN
<4>mlx4_en: mlxen0: Using 4 TX rings
<4>mlx4_en: mlxen0: Using 4 RX rings
<4>mlx4_en: mlxen0: Initializing port

Comment 15 Dexuan Cui 2018-06-19 15:17:47 UTC

(In reply to pete from comment #14)
Is it easy for you to test UFS? I haven't really used ZFS before. :-)

Comment 16 Pete French 2018-06-19 19:43:25 UTC

Thanks for the explanation - interesting! I don't have any UFS machines I could clone to test with, but will do so if we don't get anywhere with this.

Meanwhile, however, I did the following test: I commented out the line in loader.conf and instead loaded the module after the machine had booted. That works fine and the device appears with no panic (though of course it's not in use).

So it's something to do with it being loaded during boot.

What version would you like me to compile and test again? Latest STABLE, or latest 11_2?

Comment 17 Dexuan Cui 2018-06-19 21:52:16 UTC

(In reply to pete from comment #16)
I opened a new bug for the panic issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229167

Let's use it to discuss the panic issue, as this one (Bug 220453) is supposed to focus on live migration with mlx VF. :-)

Comment 18 scorpionmage 2018-07-06 01:42:54 UTC

(In reply to Dexuan Cui from comment #7)

Hi - this also happens when booting off the latest CURRENT snapshot ISOs, and it also gave the same panic after the latest compile of CURRENT. I am running on Hyper-V.

Comment 19 Dexuan Cui 2018-07-07 00:38:16 UTC

(In reply to scorpionmage from comment #18)
Let's track the vmbus_msghc_exec() panic in bug 229167. I'm going to commit a fix soon.