While working on improving virtio-scsi, I accidentally screwed things up in the bhyve userspace code, and that led to a panic in CTL:

```
Kernel page fault with the following non-sleepable locks held:
exclusive sleep mutex CTL LUN (CTL LUN) r = 0 (0xfffff816e1081800) locked @ /usr/src/sys/cam/ctl/ctl.c:12274
stack backtrace:
#0 0xffffffff80c0624c at witness_debugger+0x6c
#1 0xffffffff80c07460 at witness_warn+0x430
#2 0xffffffff810e1bec at trap_pfault+0x8c
#3 0xffffffff810b37c8 at calltrap+0x8
#4 0xffffffff8273cc47 at ctl_run+0x87
#5 0xffffffff82751f23 at ctl_ioctl_io+0x173
#6 0xffffffff80a09861 at devfs_ioctl+0xd1
#7 0xffffffff811ad061 at VOP_IOCTL_APV+0x51
#8 0xffffffff80cadcd0 at vn_ioctl+0x160
#9 0xffffffff80a09f2e at devfs_ioctl_f+0x1e
#10 0xffffffff80c0c201 at kern_ioctl+0x2a1
#11 0xffffffff80c0beff at sys_ioctl+0x12f
#12 0xffffffff810e2989 at amd64_syscall+0x169
#13 0xffffffff810b40bb at fast_syscall_common+0xf8

Fatal trap 12: page fault while in kernel mode
cpuid = 9; apic id = 09
fault virtual address = 0xfffffe023f28ea68
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff8273cfba
stack pointer = 0x28:0xfffffe019753e940
frame pointer = 0x28:0xfffffe019753ead0
code segment = base rx0, limit 0xfffff, type 0x1b
 = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 10033 (vtscsi:0-14)
rdi: 00000000000ba34d rsi: fffffe023ecbd000 rdx: 000000005d1a6400
rcx: 0000000000007f71 r8: 0000000000000001 r9: ffffffff81e54920
rax: 000000005d1a6c00 rbx: fffffe0200e5f000 rbp: fffffe019753ead0
r10: 0000000000000000 r11: 0000000000000001 r12: 0000000000000000
r13: fffffe0200e5f000 r14: fffff816e1081800 r15: ffffffff8276a0d0
trap number = 12
panic: page fault
cpuid = 9
time = 1763225206
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe019753e670
vpanic() at vpanic+0x136/frame 0xfffffe019753e7a0
panic() at panic+0x43/frame 0xfffffe019753e800
trap_pfault() at trap_pfault+0x47c/frame 0xfffffe019753e870
calltrap() at calltrap+0x8/frame 0xfffffe019753e870
--- trap 0xc, rip = 0xffffffff8273cfba, rsp = 0xfffffe019753e940, rbp = 0xfffffe019753ead0 ---
ctl_scsiio_precheck() at ctl_scsiio_precheck+0x31a/frame 0xfffffe019753ead0
ctl_run() at ctl_run+0x87/frame 0xfffffe019753eaf0
ctl_ioctl_io() at ctl_ioctl_io+0x173/frame 0xfffffe019753ebc0
devfs_ioctl() at devfs_ioctl+0xd1/frame 0xfffffe019753ec10
VOP_IOCTL_APV() at VOP_IOCTL_APV+0x51/frame 0xfffffe019753ec40
vn_ioctl() at vn_ioctl+0x160/frame 0xfffffe019753ecb0
devfs_ioctl_f() at devfs_ioctl_f+0x1e/frame 0xfffffe019753ecd0
kern_ioctl() at kern_ioctl+0x2a1/frame 0xfffffe019753ed40
sys_ioctl() at sys_ioctl+0x12f/frame 0xfffffe019753ee00
amd64_syscall() at amd64_syscall+0x169/frame 0xfffffe019753ef30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe019753ef30
--- syscall (54, FreeBSD ELF64, ioctl), rip = 0x1fd2863b2fba, rsp = 0x1fd6dabfaf28, rbp = 0x1fd6dabfaf70 ---
KDB: enter: panic
[ thread pid 10033 tid 101113 ]
Stopped at kdb_enter+0x33: movq $0,0x121d032(%rip)
```

So the system panicked in `ctl_scsiio_precheck()`, which I would have assumed was there to make sure that the SCSI I/O request sent to CTL is valid. Why would it panic on an invalid I/O request?

The code leading up to the panic is this:

```
ctl_scsiio_precheck+0x2fe: movl 0xc(%rbx),%edx
ctl_scsiio_precheck+0x301: movl 0x10(%rbx),%eax
ctl_scsiio_precheck+0x304: shll $0xb,%eax
ctl_scsiio_precheck+0x307: addl %edx,%eax
ctl_scsiio_precheck+0x309: cmpl $0x3,%esi
ctl_scsiio_precheck+0x30c: jz ctl_scsiio_precheck+0x338
ctl_scsiio_precheck+0x30e: movq 0x90(%r14),%rsi
ctl_scsiio_precheck+0x315: movl %eax,%edi
ctl_scsiio_precheck+0x317: shrl $0xb,%edi
ctl_scsiio_precheck+0x31a: movq (%rsi,%rdi,8),%rsi
```

Going back to the register dump from the panic, we see that `%rsi` contains the pointer to an array, and `%rdi` is used for indexing.
Its value at the time was 0xba34d, which is the 0x5d1a6c00 still held in `%rax`, shifted right by 0xb.

This is the source fragment where the panic happened:

```
	initidx = ctl_get_initindex(&ctsio->io_hdr.nexus);
	/*
	 * If we've got a request sense, it'll clear the contingent
	 * allegiance condition. Otherwise, if we have a CA condition for
	 * this initiator, clear it, because it sent down a command other
	 * than request sense.
	 */
	if (ctsio->cdb[0] != REQUEST_SENSE) {
		struct scsi_sense_data *ps;

		ps = lun->pending_sense[initidx / CTL_MAX_INIT_PER_PORT];
		if (ps != NULL)
			ps[initidx % CTL_MAX_INIT_PER_PORT].error_code = 0;
	}
```

So we're getting `initidx` from the I/O header and using it to index into `lun->pending_sense`. With an index value of 0xba34d, we reach far beyond the end of the array, and we can actually consider ourselves lucky that this causes a panic right away.

This is what `ctl_get_initindex()` looks like:

```
uint32_t
ctl_get_initindex(struct ctl_nexus *nexus)
{
	return (nexus->initid + (nexus->targ_port * CTL_MAX_INIT_PER_PORT));
}
```

So it's really just calculating an index from the `initid` and `targ_port` given in the `ctl_nexus` structure.

So let's look at the `ctsio` containing that `ctl_nexus`, which we got from the ioctl call. From the disassembly we know that its address is in `%rbx` (`%r13` holds the same value), which is 0xfffffe0200e5f000.

```
db> ex/x 0xfffffe0200e5f000,10
0xfffffe0200e5f000: 0 1 0 5d1a6400
```

Now, this looks familiar, doesn't it? The word at offset 0xc is the same 0x5d1a6400 that the disassembly loaded into `%edx`. The userspace code passed 0x5d1a6400 as `initid` in `ctl_io->io_hdr.nexus`, apparently because I screwed up in the bhyve code. But the kernel really should have validated this input from userspace, making sure the `initid` is actually within reasonable limits, before using it to form an index into an in-kernel array.
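
To make the arithmetic and the missing check concrete, here is a minimal standalone sketch in userland C. It is not the CTL code and not a proposed patch. `struct nexus_sketch`, `get_initindex()` and `nexus_is_valid()` are hypothetical names for illustration. The value 2048 for `CTL_MAX_INIT_PER_PORT` is inferred from the shift by 0xb in the disassembly, the value used for `CTL_MAX_PORTS` is only a placeholder, and `targ_port = 1` is inferred from the difference between `%rax` and `%rdx` in the register dump.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 2048 == 1 << 0xb, matching the shll/shrl in the disassembly. */
#define CTL_MAX_INIT_PER_PORT	2048
/* Assumption: placeholder upper bound on ports, for illustration only. */
#define CTL_MAX_PORTS		1024

static_assert(CTL_MAX_INIT_PER_PORT == (1u << 0xb),
    "per-port initiator count must match the shift seen in the disassembly");

/* Hypothetical reduced stand-in for struct ctl_nexus: only the two fields used. */
struct nexus_sketch {
	uint32_t initid;	/* initiator ID, supplied by userspace */
	uint32_t targ_port;	/* target port, supplied by userspace */
};

/* Same arithmetic as ctl_get_initindex() in the fragment quoted above. */
static uint32_t
get_initindex(const struct nexus_sketch *nexus)
{
	return (nexus->initid + (nexus->targ_port * CTL_MAX_INIT_PER_PORT));
}

/*
 * Hypothetical validation: reject a nexus whose fields are out of range
 * before any index is derived from it.
 */
static bool
nexus_is_valid(const struct nexus_sketch *nexus)
{
	return (nexus->initid < CTL_MAX_INIT_PER_PORT &&
	    nexus->targ_port < CTL_MAX_PORTS);
}
```

Fed with the nexus from this panic (`initid = 0x5d1a6400`, `targ_port = 1`), `get_initindex()` returns 0x5d1a6c00, dividing that by `CTL_MAX_INIT_PER_PORT` gives the 0xba34d seen in `%rdi`, and `nexus_is_valid()` returns false. Where exactly such a check belongs in CTL, and which limits it should enforce, is a question for the actual fix.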