About 2 years ago, jhb fixed the PCI hot-remove issue in FreeBSD 11 with this patch: https://github.com/freebsd/freebsd/commit/01f4e87387, but the patch didn't make it into FreeBSD stable/10 and 10.4. Now when I test Mellanox SR-IOV VF with FreeBSD 10.4 VM on Hyper-V, the VM can easily panic when the VF is removed from the VM. I think it's because the above patch is missing here. Discussed with jhb and it looks we need to MFC more related patches from 11 to 10 to have a complete support of PCI/CardBus hotplug in FreeBSD stable/10. BTW, https://www.freebsd.org/security/security.html#sup shows the expected EoL date for stable/10 is October 31, 2018, but I believe many users can stick to 10.4 and stable/10 for a long time...
How to reproduce the issue: 1. build & install a stable/10 kernel in a VM running on Windows Server 2016 with Mellanox ConnectX-3 device that supports SR-IOV: 2. Enable SR-IOV for the VM by assigning a VF to the VM: [root@decui-103 ~]# hn1: vmbus0: chan34 subidx0 offer got notify, nvs type 128 vmbus0: chan34 assigned to cpu0 [vcpu0] pcib1: <Hyper-V PCI Express Pass Through> on vmbus0 pcib0: allocated type 3 (0xfe0000000-0xfe0001fff) for rid 0 of pcib1 vmbus0: allocated type 3 (0xfe0000000-0xfe0001fff) for rid 0 of pcib1 pcib1: gpadl_conn(chan34) succeeded pcib1: chan34 opened pci1: <PCI bus> on pcib1 pci1: domain=2, physical bus=0 found-> vendor=0x15b3, dev=0x1004, revid=0x00 domain=2, bus=0, slot=2, func=0 class=02-00-00, hdrtype=0x00, mfdev=0 cmdreg=0x0000, statreg=0x0010, cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) MSI-X supports 52 messages in map 0x18 map[18]: type Prefetchable Memory, range 64, base r, size 23, memory disabled pci1: <network, ethernet> at device 2.0 (no driver attached) [root@decui-103 ~]# [root@decui-103 ~]# pciconf -l ... none1@pci2:0:2:0: class=0x020000 card=0x61b015b3 chip=0x100415b3 rev=0x00 hdr=0x00 3. disable the VF for the VM: [root@decui-103 ~]# pcib1: chan34 revoked hn1: pcib1: got notify, nvs type 128 chan34 detached pci1: detached pcib1: chan34 closed pcib1: detached vmbus0: chan34 freed [root@decui-103 ~]# [root@decui-103 ~]# pciconf -l ... none1@pci2:0:2:0: class=0x020000 card=0x61b015b3 chip=0x100415b3 rev=0x00 hdr=0x00 Here 'pciconf -l' should not show the VF any more while it does. 4. Repeat steps 2 and 3 a few times (usually I only need to repeat them 2~5 times), the VM will panic: Fatal trap 9: general protection fault while in kernel mode cpuid = 14; apic id = 0e instruction pointer = 0x20:0xffffffff809bac4a stack pointer = 0x28:0xfffffe00002f48d0 frame pointer = 0x28:0xfffffe00002f48f0 code segment = base rx0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 0 (vmbusdev) trap number = 9 panic: general protection fault cpuid = 14 KDB: stack backtrace: #0 0xffffffff809c64c0 at kdb_backtrace+0x60 #1 0xffffffff80986c86 at vpanic+0x126 #2 0xffffffff80986b53 at panic+0x43 #3 0xffffffff80da647d at trap_fatal+0x35d #4 0xffffffff80da6104 at trap+0x784 #5 0xffffffff80d8b5dc at calltrap+0x8 #6 0xffffffff809bab05 at device_delete_child+0x15 #7 0xffffffff809bab18 at device_delete_child+0x28 #8 0xffffffff80e3ad2c at hv_pci_delete_device+0x9c #9 0xffffffff80e3b113 at hv_eject_device_work+0x23 #10 0xffffffff809d7a05 at taskqueue_run_locked+0xf5 #11 0xffffffff809d8858 at taskqueue_thread_loop+0xb8 #12 0xffffffff8094d61a at fork_exit+0x9a #13 0xffffffff80d8bb1e at fork_trampoline+0xe I suspect the VM is accessing some free()'d memory when it hits the panic.
Created attachment 191458 [details] This patch only backports the pci part of the original patch.
(In reply to Dexuan Cui from comment #2) With the pci-part-only patch + today's stable/10(582e947f376bee9d92f3439a0516cf7b78106eb8), I'm able to hot add/remove Mellanox VF 10 times without any panic. I made the patch by ignoring the cardbus part, the sparc64/powerpc parts, and ignoring the changes to sys/dev/pci/pci_iov.c, which doesn't exist in FreeBSD 10. jhb is helping to MFC all the related patches to stable/10. Thank you, jhb!
To be clear, I had only said that when I looked at merging all of the PCI_HP stuff to stable/10 that looked a bit daunting. I just wanted to make sure this particular revision can be merged and that there weren't followup fixes to that revision or other changes required. Having looked today I think it's fine to just merge this change as-is on its own. I will do a build on 10.x in my normal testing and will commit it assuming it builds ok. Glad it fixes the issue you are running into though.
(In reply to John Baldwin from comment #4) Got it. I only expected this specific revision to be MFC'ed to stable/10. It looks to me this revision is self-contained and it can solve the issue I'm running into. As far as I'm concerned, the Mellanox VF usage here should work fine with this revision.
A commit references this bug: Author: jhb Date: Wed Mar 14 19:04:41 UTC 2018 New revision: 330938 URL: https://svnweb.freebsd.org/changeset/base/330938 Log: Convert pci_delete_child() to a bus_child_deleted() method. Instead of providing a wrapper around device_delete_child() that the PCI bus and child bus drivers must call explicitly, move the bulk of the logic from pci_delete_child() into a bus_child_deleted() method (pci_child_deleted()). This allows PCI devices to be safely deleted via device_delete_child(). - Add a bus_child_deleted method to the ACPI PCI bus which clears the device_t associated with the corresponding ACPI handle in addition to the normal PCI bus cleanup. - Change cardbus_detach_card to call device_delete_children() and move CardBus-specific delete logic into a new cardbus_child_deleted() method. - Use device_delete_child() instead of pci_delete_child() in the SRIOV code. - Add a bus_child_deleted method to the OpenFirmware PCI bus drivers which frees the OpenFirmware device info for each PCI device. To preserve KBI, a pci_delete_child() function is left in place that just calls device_delete_child(). PR: 226562 Requested by: dexuan Changes: _U stable/10/ stable/10/sys/dev/acpica/acpi_pci.c stable/10/sys/dev/cardbus/cardbus.c stable/10/sys/dev/pci/pci.c stable/10/sys/dev/pci/pci_private.h stable/10/sys/powerpc/ofw/ofw_pcibus.c stable/10/sys/sparc64/pci/ofw_pcibus.c
Yeah, I didn't find any follow up fixes that it needed to be grouped with. I did include a KBI combat shim in the MFC though.
(In reply to John Baldwin from comment #7) Hi John, As I tested today's stable/10, I can confirm the bug was fixed. Thank you very much!
(In reply to John Baldwin from comment #7) (In reply to Dexuan Cui from comment #8) Hi John, Today I ran # freebsd-update fetch # freebsd-update install and got this version by "uname -a": FreeBSD bsd104 10.4-RELEASE-p8 FreeBSD 10.4-RELEASE-p8 #0: Tue Apr 3 18:40:50 UTC 2018 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64 and the issue can still reproduce. I know 10.4 will be EOL on October 31 (https://www.freebsd.org/security/), so I'm wondering if we should merge your fix from stable/10 branch to releng/10.4? If yes, I suppose we still need the errata procedure to do this -- if yes, can you help to file the errata for this patch? Or, if you'd like me to file it, please feel free to let me know. :-) I suppose 10.4 will continue to be used by a lot of people even after it becomes EOF this Oct...