Bug 226562 - [stable/10] backport pci/cardbus hot-remove support from FreeBSD 11 to 10
Summary: [stable/10] backport pci/cardbus hot-remove support from FreeBSD 11 to 10
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 10.4-STABLE
Hardware: Any Any
: --- Affects Only Me
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-03-12 21:20 UTC by Dexuan Cui
Modified: 2018-04-11 19:34 UTC (History)
2 users (show)

See Also:


Attachments
This patch only backports the pci part of the original patch. (4.12 KB, patch)
2018-03-12 22:35 UTC, Dexuan Cui
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Dexuan Cui 2018-03-12 21:20:54 UTC
About 2 years ago, jhb fixed the PCI hot-remove issue in FreeBSD 11 with this patch: https://github.com/freebsd/freebsd/commit/01f4e87387, but the patch didn't make it into FreeBSD stable/10 and 10.4.

Now when I test Mellanox SR-IOV VF with FreeBSD 10.4 VM on Hyper-V, the VM can easily panic when the VF is removed from the VM. I think it's because the above patch is missing here.

Discussed with jhb and it looks we need to MFC more related patches from 11 to 10 to have a complete support of PCI/CardBus hotplug in FreeBSD stable/10.

BTW, https://www.freebsd.org/security/security.html#sup shows the expected EoL date for stable/10 is October 31, 2018, but I believe many users can stick to 10.4 and stable/10 for a long time...
Comment 1 Dexuan Cui 2018-03-12 21:53:17 UTC
How to reproduce the issue:

1. build & install a stable/10 kernel in a VM running on Windows Server 2016 with Mellanox ConnectX-3 device that supports SR-IOV: 

2. Enable SR-IOV for the VM by assigning a VF to the VM:

[root@decui-103 ~]# hn1: vmbus0: chan34 subidx0 offer
got notify, nvs type 128
vmbus0: chan34 assigned to cpu0 [vcpu0]
pcib1: <Hyper-V PCI Express Pass Through> on vmbus0
pcib0: allocated type 3 (0xfe0000000-0xfe0001fff) for rid 0 of pcib1
vmbus0: allocated type 3 (0xfe0000000-0xfe0001fff) for rid 0 of pcib1
pcib1: gpadl_conn(chan34) succeeded
pcib1: chan34 opened
pci1: <PCI bus> on pcib1
pci1: domain=2, physical bus=0
found-> vendor=0x15b3, dev=0x1004, revid=0x00
        domain=2, bus=0, slot=2, func=0
        class=02-00-00, hdrtype=0x00, mfdev=0
        cmdreg=0x0000, statreg=0x0010, cachelnsz=0 (dwords)
        lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns)
        MSI-X supports 52 messages in map 0x18
        map[18]: type Prefetchable Memory, range 64, base r, size 23, memory disabled
pci1: <network, ethernet> at device 2.0 (no driver attached)

[root@decui-103 ~]#
[root@decui-103 ~]# pciconf -l
...
none1@pci2:0:2:0:       class=0x020000 card=0x61b015b3 chip=0x100415b3 rev=0x00 hdr=0x00

3. disable the VF for the VM:

[root@decui-103 ~]# pcib1: chan34 revoked
hn1: pcib1: got notify, nvs type 128
chan34 detached
pci1: detached
pcib1: chan34 closed
pcib1: detached
vmbus0: chan34 freed

[root@decui-103 ~]#
[root@decui-103 ~]# pciconf -l
...
none1@pci2:0:2:0:       class=0x020000 card=0x61b015b3 chip=0x100415b3 rev=0x00 hdr=0x00

Here 'pciconf -l' should not show the VF any more while it does.

4. Repeat steps 2 and 3 a few times (usually I only need to repeat them 2~5 times), the VM will panic:


Fatal trap 9: general protection fault while in kernel mode
cpuid = 14; apic id = 0e
instruction pointer     = 0x20:0xffffffff809bac4a
stack pointer           = 0x28:0xfffffe00002f48d0
frame pointer           = 0x28:0xfffffe00002f48f0
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (vmbusdev)
trap number             = 9
panic: general protection fault
cpuid = 14
KDB: stack backtrace:
#0 0xffffffff809c64c0 at kdb_backtrace+0x60
#1 0xffffffff80986c86 at vpanic+0x126
#2 0xffffffff80986b53 at panic+0x43
#3 0xffffffff80da647d at trap_fatal+0x35d
#4 0xffffffff80da6104 at trap+0x784
#5 0xffffffff80d8b5dc at calltrap+0x8
#6 0xffffffff809bab05 at device_delete_child+0x15
#7 0xffffffff809bab18 at device_delete_child+0x28
#8 0xffffffff80e3ad2c at hv_pci_delete_device+0x9c
#9 0xffffffff80e3b113 at hv_eject_device_work+0x23
#10 0xffffffff809d7a05 at taskqueue_run_locked+0xf5
#11 0xffffffff809d8858 at taskqueue_thread_loop+0xb8
#12 0xffffffff8094d61a at fork_exit+0x9a
#13 0xffffffff80d8bb1e at fork_trampoline+0xe

I suspect the VM is accessing some free()'d memory when it hits the panic.
Comment 2 Dexuan Cui 2018-03-12 22:35:33 UTC
Created attachment 191458 [details]
This patch only backports the pci part of the original patch.
Comment 3 Dexuan Cui 2018-03-12 22:41:09 UTC
(In reply to Dexuan Cui from comment #2)

With the pci-part-only patch + today's stable/10(582e947f376bee9d92f3439a0516cf7b78106eb8), I'm able to hot add/remove Mellanox VF 10 times without any panic.

I made the patch by ignoring the cardbus part, the sparc64/powerpc parts, and ignoring the changes to sys/dev/pci/pci_iov.c, which doesn't exist in FreeBSD 10.


jhb is helping to MFC all the related patches to stable/10. Thank you, jhb!
Comment 4 John Baldwin freebsd_committer freebsd_triage 2018-03-13 00:05:45 UTC
To be clear, I had only said that when I looked at merging all of the PCI_HP stuff to stable/10 that looked a bit daunting.  I just wanted to make sure this particular revision can be merged and that there weren't followup fixes to that revision or other changes required.  Having looked today I think it's fine to just merge this change as-is on its own.  I will do a build on 10.x in my normal testing and will commit it assuming it builds ok.  Glad it fixes the issue you are running into though.
Comment 5 Dexuan Cui 2018-03-13 00:21:17 UTC
(In reply to John Baldwin from comment #4)
Got it. I only expected this specific revision to be MFC'ed to stable/10.  It looks to me this revision is self-contained and it can solve the issue I'm running into. As far as I'm concerned, the Mellanox VF usage here should work fine with this revision.
Comment 6 commit-hook freebsd_committer freebsd_triage 2018-03-14 19:05:33 UTC
A commit references this bug:

Author: jhb
Date: Wed Mar 14 19:04:41 UTC 2018
New revision: 330938
URL: https://svnweb.freebsd.org/changeset/base/330938

Log:
  Convert pci_delete_child() to a bus_child_deleted() method.

  Instead of providing a wrapper around device_delete_child() that the PCI
  bus and child bus drivers must call explicitly, move the bulk of the logic
  from pci_delete_child() into a bus_child_deleted() method
  (pci_child_deleted()).  This allows PCI devices to be safely deleted via
  device_delete_child().
  - Add a bus_child_deleted method to the ACPI PCI bus which clears the
    device_t associated with the corresponding ACPI handle in addition to
    the normal PCI bus cleanup.
  - Change cardbus_detach_card to call device_delete_children() and move
    CardBus-specific delete logic into a new cardbus_child_deleted() method.
  - Use device_delete_child() instead of pci_delete_child() in the SRIOV code.
  - Add a bus_child_deleted method to the OpenFirmware PCI bus drivers which
    frees the OpenFirmware device info for each PCI device.

  To preserve KBI, a pci_delete_child() function is left in place that
  just calls device_delete_child().

  PR:		226562
  Requested by:	dexuan

Changes:
_U  stable/10/
  stable/10/sys/dev/acpica/acpi_pci.c
  stable/10/sys/dev/cardbus/cardbus.c
  stable/10/sys/dev/pci/pci.c
  stable/10/sys/dev/pci/pci_private.h
  stable/10/sys/powerpc/ofw/ofw_pcibus.c
  stable/10/sys/sparc64/pci/ofw_pcibus.c
Comment 7 John Baldwin freebsd_committer freebsd_triage 2018-03-14 19:05:43 UTC
Yeah, I didn't find any follow up fixes that it needed to be grouped with.  I did include a KBI combat shim in the MFC though.
Comment 8 Dexuan Cui 2018-03-14 19:53:17 UTC
(In reply to John Baldwin from comment #7)
Hi John,
As I tested today's stable/10, I can confirm the bug was fixed.
Thank you very much!
Comment 9 Dexuan Cui 2018-04-11 19:34:30 UTC
(In reply to John Baldwin from comment #7)
(In reply to Dexuan Cui from comment #8)

Hi John,

Today I ran
# freebsd-update fetch
# freebsd-update install

and got this version by "uname -a":

FreeBSD bsd104 10.4-RELEASE-p8 FreeBSD 10.4-RELEASE-p8 #0: Tue Apr  3 18:40:50 UTC 2018     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64

and the issue can still reproduce.

I know 10.4 will be EOL on October 31 (https://www.freebsd.org/security/), so I'm wondering if we should merge your fix from stable/10 branch to releng/10.4?

If yes, I suppose we still need the errata procedure to do this -- if yes, can you help to file the errata for this patch? Or, if you'd like me to file it, please feel free to let me know. :-)

I suppose 10.4 will continue to be used by a lot of people even after it becomes EOF this Oct...