I've created a 10.3-BETA2 image for Google Compute Engine using swills' script and am getting a panic on boot when the VM is configured with Local SSD as NVMe (--local-ssd interface="NVME"). This is a regression from 10.2-RELEASE, which boots successfully with an identical configuration.

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x60
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80e16019
stack pointer           = 0x28:0xfffffe01bfff59c0
frame pointer           = 0x28:0xfffffe01bfff59e0
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (irq11: virtio_pci0+)
[ thread pid 12 tid 100039 ]
Stopped at      nvme_ctrlr_intx_handler+0x39:   cmpq    $0,0x60(%rdi)
db> bt
Tracing pid 12 tid 100039 td 0xfffff8000422e000
nvme_ctrlr_intx_handler() at nvme_ctrlr_intx_handler+0x39/frame 0xfffffe01bfff59e0
intr_event_execute_handlers() at intr_event_execute_handlers+0xab/frame 0xfffffe01bfff5a20
ithread_loop() at ithread_loop+0x96/frame 0xfffffe01bfff5a70
fork_exit() at fork_exit+0x9a/frame 0xfffffe01bfff5ab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe01bfff5ab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---

Looking at it later with kgdb, it appears that ctrlr->ioq is NULL. (Line numbers won't line up exactly since I was trying suggestions from the freebsd-stable@ thread, but the panic was the same in all cases.)

0xffffffff80e16029 in nvme_ctrlr_intx_handler (arg=0xfffffe0000953000) at /usr/src/sys/dev/nvme/nvme_ctrlr.c:819
819             if (ctrlr->ioq[0].cpl)
(kgdb) print ((struct nvme_controller *)arg)->ioq
$1 = (struct nvme_qpair *) 0x0
Thank you for the report, Andy.

If possible, could you:

* Confirm whether or not the issue is reproducible on a recent 11.0-CURRENT
* Include (as an attachment) another backtrace after the panic, if reproducible
This is a regression due to r293328, and it will happen on 11-CURRENT as well. r293328 changed when the controller's ioq array is allocated, so when we start getting INTx interrupts for the admin queue, ioq has not been allocated yet and the handler dereferences a NULL pointer, which causes this panic. See attached patch.
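To make the failure mode concrete, here is a minimal userland sketch of a shared INTx handler that checks whether the I/O queue array exists before touching it. This is not the attached patch and not the real driver code: the sketch_* types and functions are hypothetical stand-ins, and the only details taken from this report are that the handler reads ctrlr->ioq[0].cpl and that ioq can still be NULL when the first admin queue interrupt arrives.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the driver's structures. */
struct sketch_qpair {
    void *cpl;       /* completion ring; NULL until the queue is set up */
    int   processed; /* counts completion passes, for illustration only */
};

struct sketch_controller {
    struct sketch_qpair  adminq; /* usable as soon as the controller attaches */
    struct sketch_qpair *ioq;    /* allocated later in init; may still be NULL */
};

/* Stand-in for completion processing: just record that it ran. */
static void
sketch_process_completions(struct sketch_qpair *qpair)
{
    qpair->processed++;
}

/*
 * Shared INTx handler: one interrupt covers both the admin queue and the
 * first I/O queue. Returns true if the I/O queue was serviced.
 */
static bool
sketch_intx_handler(void *arg)
{
    struct sketch_controller *ctrlr = arg;

    /* The admin queue always exists by the time interrupts are enabled. */
    sketch_process_completions(&ctrlr->adminq);

    /*
     * Guard (assumed shape of a fix): skip the I/O queue while the array
     * is unallocated instead of dereferencing a NULL pointer.
     */
    if (ctrlr->ioq != NULL && ctrlr->ioq[0].cpl != NULL) {
        sketch_process_completions(&ctrlr->ioq[0]);
        return (true);
    }
    return (false);
}
```

Calling sketch_intx_handler() with ioq set to NULL services only the admin queue; once ioq points at a queue whose cpl is set, the same call also services ioq[0].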
Created attachment 167329
Patch for bug 207432
It's probably worth noting that avoiding INTx with 'hw.pci.honor_msi_blacklist=0' in /boot/loader.conf allows things to boot and function normally.
Hi Andy,

Are you able to test the attached patch? I'm pretty sure this fixes your issue but wanted to wait to commit in case you can verify it.

Thanks,

-Jim
I can confirm that the attached patch does boot successfully in INTx mode. Thanks!
A commit references this bug:

Author: jimharris
Date: Wed Feb 24 00:01:10 UTC 2016
New revision: 295944
URL: https://svnweb.freebsd.org/changeset/base/295944

Log:
  nvme: fix intx handler to not dereference ioq during initialization

  This was a regression from r293328, which deferred allocation of the
  controller's ioq array until after interrupts are enabled during boot.

  PR: 207432
  Reported and tested by: Andy Carrel <wac@google.com>
  MFC after: 3 days
  Sponsored by: Intel

Changes:
  head/sys/dev/nvme/nvme_ctrlr.c