Hot-attaching an LSI controller to a VMware instance causes a panic on 2 out of 2 machines I tested on:

Jun 12 09:47:52 nfs-server-00 pcib11: Attention Button Pressed: Detaching in 5 seconds
Jun 12 09:47:52 nfs-server-00 pci4: <ACPI PCI bus> on pcib11
Jun 12 09:47:52 nfs-server-00 mpt1: <LSILogic SAS/SATA Adapter> at device 0.0 on pci4
Jun 12 09:47:52 nfs-server-00 mpt1: MPI Version=1.5.0.0
Jun 12 09:47:52 nfs-server-00 mpt1: cannot allocate 262144 bytes of request memory
Jun 12 09:47:52 nfs-server-00 mpt1: mpt_dma_buf_alloc() failed!
Jun 12 09:47:52 nfs-server-00 mpt1: failed to enable port 0
Jun 12 09:49:26 nfs-server-00 syslog-ng[1045]: syslog-ng starting up; version='3.23.1'
Jun 12 09:49:26 nfs-server-00 Fatal trap 12: page fault while in kernel mode
Jun 12 09:49:26 nfs-server-00 cpuid = 0; apic id = 00
Jun 12 09:49:26 nfs-server-00 fault virtual address = 0x10
Jun 12 09:49:26 nfs-server-00 fault code = supervisor read data, page not present
Jun 12 09:49:26 nfs-server-00 instruction pointer = 0x20:0xffffffff80e67174
Jun 12 09:49:26 nfs-server-00 stack pointer = 0x28:0xfffffe083e1a6940
Jun 12 09:49:26 nfs-server-00 frame pointer = 0x28:0xfffffe083e1a6940
Jun 12 09:49:26 nfs-server-00 code segment = base rx0, limit 0xfffff, type 0x1b
Jun 12 09:49:26 nfs-server-00 = DPL 0, pres 1, long 1, def32 0, gran 1
Jun 12 09:49:26 nfs-server-00 processor eflags = interrupt enabled, resume, IOPL = 0
Jun 12 09:49:26 nfs-server-00 current process = 0 (thread taskq)
Jun 12 09:49:26 nfs-server-00 trap number = 12
Jun 12 09:49:26 nfs-server-00 panic: page fault
Jun 12 09:49:26 nfs-server-00 cpuid = 0
Jun 12 09:49:26 nfs-server-00 KDB: stack backtrace:
Jun 12 09:49:26 nfs-server-00 db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe083e1a65f0
Jun 12 09:49:26 nfs-server-00 vpanic() at vpanic+0x17e/frame 0xfffffe083e1a6650
Jun 12 09:49:26 nfs-server-00 panic() at panic+0x43/frame 0xfffffe083e1a66b0
Jun 12 09:49:26 nfs-server-00 trap_fatal() at trap_fatal+0x369/frame 0xfffffe083e1a6700
Jun 12 09:49:26 nfs-server-00 trap_pfault() at trap_pfault+0x49/frame 0xfffffe083e1a6760
Jun 12 09:49:26 nfs-server-00 trap() at trap+0x29d/frame 0xfffffe083e1a6870
Jun 12 09:49:26 nfs-server-00 calltrap() at calltrap+0x8/frame 0xfffffe083e1a6870
Jun 12 09:49:26 nfs-server-00 --- trap 0xc, rip = 0xffffffff80e67174, rsp = 0xfffffe083e1a6940, rbp = 0xfffffe083e1a6940 ---
Jun 12 09:49:26 nfs-server-00 vm_page_next() at vm_page_next+0x4/frame 0xfffffe083e1a6940
Jun 12 09:49:26 nfs-server-00 kmem_unback() at kmem_unback+0x88/frame 0xfffffe083e1a6980
Jun 12 09:49:26 nfs-server-00 kmem_free() at kmem_free+0x43/frame 0xfffffe083e1a69b0
Jun 12 09:49:26 nfs-server-00 mpt_core_detach() at mpt_core_detach+0xf9/frame 0xfffffe083e1a69d0
Jun 12 09:49:26 nfs-server-00 mpt_detach() at mpt_detach+0xd5/frame 0xfffffe083e1a69f0
Jun 12 09:49:26 nfs-server-00 mpt_pci_detach() at mpt_pci_detach+0x23/frame 0xfffffe083e1a6a10
Jun 12 09:49:26 nfs-server-00 device_detach() at device_detach+0x183/frame 0xfffffe083e1a6a50
Jun 12 09:49:26 nfs-server-00 bus_generic_detach() at bus_generic_detach+0x48/frame 0xfffffe083e1a6a70
Jun 12 09:49:26 nfs-server-00 pci_detach() at pci_detach+0xe/frame 0xfffffe083e1a6a90
Jun 12 09:49:26 nfs-server-00 device_detach() at device_detach+0x183/frame 0xfffffe083e1a6ad0
Jun 12 09:49:26 nfs-server-00 device_delete_child() at device_delete_child+0x15/frame 0xfffffe083e1a6af0
Jun 12 09:49:26 nfs-server-00 pcib_pcie_hotplug_task() at pcib_pcie_hotplug_task+0x87/frame 0xfffffe083e1a6b20
Jun 12 09:49:26 nfs-server-00 taskqueue_run_locked() at taskqueue_run_locked+0x185/frame 0xfffffe083e1a6b80
Jun 12 09:49:26 nfs-server-00 taskqueue_thread_loop() at taskqueue_thread_loop+0xb8/frame 0xfffffe083e1a6bb0
Jun 12 09:49:26 nfs-server-00 fork_exit() at fork_exit+0x83/frame 0xfffffe083e1a6bf0
Looks like mpt_dma_buf_free() doesn't handle this particular failure mode properly.
Created attachment 215490: proposed patch. Could you try this patch? It'll still be necessary to figure out why the memory allocation is failing, but hopefully it won't panic anymore. mpt_dma_mem_free() has the same problem for now.
@Allan have you had a chance to test this?
For a baseline I tried:

root@freebsd13:/home/jpaetzel # uname -a
FreeBSD freebsd13.demosp.com 13.0-CURRENT FreeBSD 13.0-CURRENT #0 r364438: Fri Aug 21 09:29:22 PDT 2020 jpaetzel@freebsd13.demosp.com:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64

I hot-added an MPT controller to the system and didn't get a panic. What version of FreeBSD were you using that got the panic?
OK, so I see the reason I couldn't reproduce the panic: this patch has already been committed to HEAD. I think that verifies that the patch works.
(In reply to Josh Paetzel from comment #5) It was not committed as far as I know. The patch fixes bugs in the controller detach path. That is, the driver fails to attach for some reason (which you don't seem to hit), and panics when cleaning up.
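For anyone following along, here is a rough userland sketch of the general pattern at issue: an attach step that can fail partway, and a teardown that has to cope with resources that were never allocated. This is NOT the attached patch and not the mpt(4) code; `struct fake_softc`, `fake_dma_buf_alloc()` and `fake_dma_buf_free()` are hypothetical names made up for illustration, and plain `malloc()`/`free()` stand in for the kernel allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a driver softc; not the real struct mpt_softc. */
struct fake_softc {
    void   *request_buf;   /* stays NULL until the allocation succeeds */
    size_t  request_len;
};

/*
 * Simulated attach-time allocation. When fail_alloc is true the call
 * behaves like the "cannot allocate ... bytes of request memory" case
 * and leaves the softc without a buffer.
 */
static int
fake_dma_buf_alloc(struct fake_softc *sc, size_t len, bool fail_alloc)
{
    assert(sc != NULL);
    sc->request_buf = fail_alloc ? NULL : malloc(len);
    if (sc->request_buf == NULL) {
        sc->request_len = 0;
        return (-1);
    }
    sc->request_len = len;
    return (0);
}

/*
 * Teardown that tolerates a partial attach: it only frees what was
 * actually allocated and clears the pointer so a second call is a no-op.
 * Returns the number of buffers released (0 or 1).
 */
static int
fake_dma_buf_free(struct fake_softc *sc)
{
    assert(sc != NULL);
    if (sc->request_buf == NULL)
        return (0);
    free(sc->request_buf);
    sc->request_buf = NULL;
    sc->request_len = 0;
    return (1);
}
```

The point of the sketch is only that the free routine checks what it is about to release instead of assuming attach completed; whether the real fix takes this shape is something to confirm against the attachment.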
"I see," said the blind man as he pissed into the wind; it's all coming back to me now. I had built the kernel with the patch, but tried to reproduce on the original kernel. I couldn't reproduce the issue, then looked at your patch and my sources (which already had the patch applied) and came to the conclusion that I couldn't reproduce the issue because I was testing the fix. To be clear, I was testing stock FreeBSD HEAD WITHOUT the patch and could not reproduce the problem.
^Triage: clear stale flags. To submitter: is this aging PR still relevant?