Bug 247208 - [mpt] VMWare virtualized LSI controller panics during hot-attach
Status: In Progress
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 11.3-RELEASE
Hardware: amd64 Any
Importance: --- Affects Many People
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-06-12 14:34 UTC by Allan Jude
Modified: 2020-06-30 18:59 UTC

See Also:


Attachments
proposed patch (1.53 KB, patch)
2020-06-12 14:58 UTC, Mark Johnston

Description Allan Jude freebsd_committer 2020-06-12 14:34:36 UTC
Hot-attaching an LSI controller to a VMware instance causes a panic on both machines I tested (2 out of 2):


Jun 12 09:47:52 nfs-server-00 pcib11: Attention Button Pressed: Detaching in 5 seconds
Jun 12 09:47:52 nfs-server-00 pci4: <ACPI PCI bus> on pcib11
Jun 12 09:47:52 nfs-server-00 mpt1: <LSILogic SAS/SATA Adapter> at device 0.0 on pci4
Jun 12 09:47:52 nfs-server-00 mpt1: MPI Version=1.5.0.0
Jun 12 09:47:52 nfs-server-00 mpt1: cannot allocate 262144 bytes of request memory
Jun 12 09:47:52 nfs-server-00 mpt1: mpt_dma_buf_alloc() failed!
Jun 12 09:47:52 nfs-server-00 mpt1: failed to enable port 0
Jun 12 09:49:26 nfs-server-00 syslog-ng[1045]: syslog-ng starting up; version='3.23.1'
Jun 12 09:49:26 nfs-server-00 Fatal trap 12: page fault while in kernel mode
Jun 12 09:49:26 nfs-server-00 cpuid = 0; apic id = 00
Jun 12 09:49:26 nfs-server-00 fault virtual address = 0x10
Jun 12 09:49:26 nfs-server-00 fault code            = supervisor read data, page not present
Jun 12 09:49:26 nfs-server-00 instruction pointer   = 0x20:0xffffffff80e67174
Jun 12 09:49:26 nfs-server-00 stack pointer         = 0x28:0xfffffe083e1a6940
Jun 12 09:49:26 nfs-server-00 frame pointer         = 0x28:0xfffffe083e1a6940
Jun 12 09:49:26 nfs-server-00 code segment          = base rx0, limit 0xfffff, type 0x1b
Jun 12 09:49:26 nfs-server-00                       = DPL 0, pres 1, long 1, def32 0, gran 1
Jun 12 09:49:26 nfs-server-00 processor eflags      = interrupt enabled, resume, IOPL = 0
Jun 12 09:49:26 nfs-server-00 current process               = 0 (thread taskq)
Jun 12 09:49:26 nfs-server-00 trap number           = 12
Jun 12 09:49:26 nfs-server-00 panic: page fault
Jun 12 09:49:26 nfs-server-00 cpuid = 0
Jun 12 09:49:26 nfs-server-00 KDB: stack backtrace:
Jun 12 09:49:26 nfs-server-00 db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe083e1a65f0
Jun 12 09:49:26 nfs-server-00 vpanic() at vpanic+0x17e/frame 0xfffffe083e1a6650
Jun 12 09:49:26 nfs-server-00 panic() at panic+0x43/frame 0xfffffe083e1a66b0
Jun 12 09:49:26 nfs-server-00 trap_fatal() at trap_fatal+0x369/frame 0xfffffe083e1a6700
Jun 12 09:49:26 nfs-server-00 trap_pfault() at trap_pfault+0x49/frame 0xfffffe083e1a6760
Jun 12 09:49:26 nfs-server-00 trap() at trap+0x29d/frame 0xfffffe083e1a6870
Jun 12 09:49:26 nfs-server-00 calltrap() at calltrap+0x8/frame 0xfffffe083e1a6870
Jun 12 09:49:26 nfs-server-00 --- trap 0xc, rip = 0xffffffff80e67174, rsp = 0xfffffe083e1a6940, rbp = 0xfffffe083e1a6940 ---
Jun 12 09:49:26 nfs-server-00 vm_page_next() at vm_page_next+0x4/frame 0xfffffe083e1a6940
Jun 12 09:49:26 nfs-server-00 kmem_unback() at kmem_unback+0x88/frame 0xfffffe083e1a6980
Jun 12 09:49:26 nfs-server-00 kmem_free() at kmem_free+0x43/frame 0xfffffe083e1a69b0
Jun 12 09:49:26 nfs-server-00 mpt_core_detach() at mpt_core_detach+0xf9/frame 0xfffffe083e1a69d0
Jun 12 09:49:26 nfs-server-00 mpt_detach() at mpt_detach+0xd5/frame 0xfffffe083e1a69f0
Jun 12 09:49:26 nfs-server-00 mpt_pci_detach() at mpt_pci_detach+0x23/frame 0xfffffe083e1a6a10
Jun 12 09:49:26 nfs-server-00 device_detach() at device_detach+0x183/frame 0xfffffe083e1a6a50
Jun 12 09:49:26 nfs-server-00 bus_generic_detach() at bus_generic_detach+0x48/frame 0xfffffe083e1a6a70
Jun 12 09:49:26 nfs-server-00 pci_detach() at pci_detach+0xe/frame 0xfffffe083e1a6a90
Jun 12 09:49:26 nfs-server-00 device_detach() at device_detach+0x183/frame 0xfffffe083e1a6ad0
Jun 12 09:49:26 nfs-server-00 device_delete_child() at device_delete_child+0x15/frame 0xfffffe083e1a6af0
Jun 12 09:49:26 nfs-server-00 pcib_pcie_hotplug_task() at pcib_pcie_hotplug_task+0x87/frame 0xfffffe083e1a6b20
Jun 12 09:49:26 nfs-server-00 taskqueue_run_locked() at taskqueue_run_locked+0x185/frame 0xfffffe083e1a6b80
Jun 12 09:49:26 nfs-server-00 taskqueue_thread_loop() at taskqueue_thread_loop+0xb8/frame 0xfffffe083e1a6bb0
Jun 12 09:49:26 nfs-server-00 fork_exit() at fork_exit+0x83/frame 0xfffffe083e1a6bf0
Comment 1 Mark Johnston freebsd_committer 2020-06-12 14:50:12 UTC
Looks like mpt_dma_buf_free() doesn't handle this particular failure mode properly.
Comment 2 Mark Johnston freebsd_committer 2020-06-12 14:58:52 UTC
Created attachment 215490
proposed patch

Could you try this patch?  It'll still be necessary to figure out why the memory allocation is failing, but hopefully it won't panic anymore.

mpt_dma_mem_free() has the same problem for now.
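
For anyone following along, here is a minimal userland sketch of the general pattern at issue: a free routine that tolerates a partially failed allocation. This is not the attached patch and not the real mpt(4) code. The struct demo_softc, demo_dma_buf_alloc() and demo_dma_buf_free() names are hypothetical, and plain malloc()/free() stand in for the kernel's DMA memory routines. The assumption is that the detach path can run after the allocation path bailed out early, so the free routine has to check each resource before releasing it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a driver softc; NOT the real struct mpt_softc. */
struct demo_softc {
	void	*reply;		/* reply buffer, NULL until allocated */
	size_t	 reply_size;
	void	*request;	/* request buffer, NULL until allocated */
	size_t	 request_size;
};

/*
 * Allocate both buffers. The fail_request flag simulates the
 * "cannot allocate ... bytes of request memory" case, which leaves
 * the softc only partially set up. Returns 0 on success, -1 on failure.
 */
int
demo_dma_buf_alloc(struct demo_softc *sc, size_t reply_size,
    size_t request_size, int fail_request)
{
	assert(sc != NULL);

	sc->reply = malloc(reply_size);
	if (sc->reply == NULL)
		return (-1);
	sc->reply_size = reply_size;

	sc->request = fail_request ? NULL : malloc(request_size);
	if (sc->request == NULL)
		return (-1);	/* reply stays allocated; caller must free */
	sc->request_size = request_size;
	return (0);
}

/*
 * Release whatever was actually allocated. Each resource is checked
 * and then cleared, so the routine is safe after a partial failure
 * and safe to call more than once. Returns the number of buffers freed.
 */
int
demo_dma_buf_free(struct demo_softc *sc)
{
	int released = 0;

	assert(sc != NULL);

	if (sc->request != NULL) {
		free(sc->request);
		sc->request = NULL;
		sc->request_size = 0;
		released++;
	}
	if (sc->reply != NULL) {
		free(sc->reply);
		sc->reply = NULL;
		sc->reply_size = 0;
		released++;
	}
	return (released);
}
```

With a zero-initialized struct demo_softc, a failed demo_dma_buf_alloc() followed by demo_dma_buf_free() releases only the reply buffer, and a second call to demo_dma_buf_free() does nothing. Whether the real driver tracks this state through NULL pointers or through some other marker is not something this sketch claims to know.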