r320065: Sun Jun 18 04:22:09 UTC 2017 - works
r320900: Wed Jul 12 03:00:15 UTC 2017 - panics

Sample of boot failure:

<118>Setting hostname: tiny.nyi.freebsd.org.
<118>Setting up harvesting: [UMA], [FS_ATIME],SWI,INTERRUPT,NET_NG,NET_ETHER,NET_TUN,MOUSE,KEYBOARD,D
<118>Feeding entropy: .
Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address = 0x28
Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 07
fault virtual address = 0x28
Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 06
apic id = 00
fault virtual address = 0x28
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume,
fault virtual address = 0x28
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff803aab56
stack pointer = 0x28:0xfffffe0239fa3a90
fault code = supervisor read data, page not present
IOPL = 0
current process = 0 (zio_write_intr_0)
frame pointer = 0x28:0xfffffe0239fa3aa0
db> where
Tracing pid 0 tid 100471 td 0xfffff80005452000
vdev_geom_io_done() at vdev_geom_io_done+0x36/frame 0xfffffe0239f9eaa0
zio_vdev_io_done() at zio_vdev_io_done+0x176/frame 0xfffffe0239f9ead0
zio_execute() at zio_execute+0xac/frame 0xfffffe0239f9eb20
taskqueue_run_locked() at taskqueue_run_locked+0x127/frame 0xfffffe0239f9eb80
taskqueue_thread_loop() at taskqueue_thread_loop+0xc8/frame 0xfffffe0239f9ebb0
fork_exit() at fork_exit+0x85/frame 0xfffffe0239f9ebf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0239f9ebf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---

Sample of panic when a volume degrades:

root@nope.ysv.freebsd.org:/home/peter # zpool offline zroot mfid5p3
Fatal trap 12: page fault while in kernel mode
cpuid = 4; apic id = 04
Fatal trap 12: page fault while in kernel mode
fault virtual address = 0x28
Fatal trap 12: page fault while in kernel mode
Fatal trap 12: page fault while in kernel mode
Fatal trap 12: page fault while in kernel mode
cpuid = 7; apic id = 07
cpuid = 1; apic id = 01
fault virtual address = 0x28
fault code = supervisor read data, page not present
cpuid = 3; cpuid = 5; apic id = 03
Fatal trap 12: page fault while in kernel mode
apic id = 05
fault virtual address = 0x28
fault virtual address = 0x28
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff803aab56
stack pointer = 0x28:0xfffffe085fb3aa90
instruction pointer = 0x20:0xffffffff803aab56
fault code = supervisor read data, page not present
cpuid = 6; fault virtual address = 0x28
Fatal trap 12: page fault while in kernel mode
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff803aab56
stack pointer = 0x28:0xfffffe085fb3fa90
frame pointer = 0x28:0xfffffe085fb3aaa0
fault code = supervisor read data, page not present
cpuid = 2; apic id = 02
apic id = 06
instruction pointer = 0x20:0xffffffff803aab56
fault virtual address = 0x28
fault code = supervisor read data, page not present
stack pointer = 0x28:0xfffffe085fb30a90
instruction pointer = 0x20:0xffffffff803aab56
stack pointer = 0x28:0xfffffe085fb35a90
frame pointer = 0x28:0xfffffe085fb3faa0
code segment = base rx0, limit 0xfffff, type 0x1b
stack pointer = 0x28:0xfffffe085fb44a90
frame pointer = 0x28:0xfffffe085fb44aa0
code segment = base rx0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
fault virtual address = 0x28
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, instruction pointer = 0x20:6
frame pointer = 0x28:0xfffffe085fb30aa0
code segment = base rx0, limit 0xfffff, type 0x1b
code segment = base rx0, limit 0xfffff, type 0x1b
frame pointer = 0x28:0xfffffe085fb35aa0
code segment = base rx0, limit 0xfffff, type 0x1b
resume, IOPL = 0
stack pointer = 0x28:0xfffffe085fb26a90
= DPL 0, pres 1, long 1, def32 0, gran 1
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = fault code = supervisor read data, page not
frame pointer = 0x28:0xfffffe085fb26aa0
instruction pointer = 0x20:0xffffffff803aab56
processor eflags = interrupt enabled, code segment = base b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 0 (zio_write_intr_2)
[ thread pid 0 tid 100500 ]
Stopped at vdev_geom_io_done+0x36: movq 0x28(%rbx),%rsi
db> where
Tracing pid 0 tid 100500 td 0xfffff8000aae6000
vdev_geom_io_done() at vdev_geom_io_done+0x36/frame 0xfffffe085fb30aa0
zio_vdev_io_done() at zio_vdev_io_done+0x176/frame 0xfffffe085fb30ad0
zio_execute() at zio_execute+0xac/frame 0xfffffe085fb30b20
taskqueue_run_locked() at taskqueue_run_locked+0x127/frame 0xfffffe085fb30b80
taskqueue_thread_loop() at taskqueue_thread_loop+0xc8/frame 0xfffffe085fb30bb0
fork_exit() at fork_exit+0x85/frame 0xfffffe085fb30bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe085fb30bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
db>

All cores trapped concurrently. zio in the vdev_geom_io_done() function is null.
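A fault virtual address of 0x28 together with the faulting instruction `movq 0x28(%rbx),%rsi` is the usual signature of loading a field at offset 0x28 through a NULL struct pointer: the CPU computes 0 + 0x28 and finds no page mapped there. Under the amd64 System V calling convention %rsi carries the second integer argument, which is consistent with the load fetching bp->bio_data for the abd_return_buf_copy() call quoted in a later comment; the real offset of bio_data in struct bio is not verified here. The sketch below only illustrates the address arithmetic. `struct fake_bio` and `field_load_address()` are hypothetical, and the layout assumes an LP64 target such as amd64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a kernel I/O request structure. The field
 * names and layout are illustrative only, NOT the real FreeBSD struct bio;
 * they are chosen so that `data` lands at offset 0x28 on an LP64 target.
 */
struct fake_bio {
	uint16_t cmd;    /* 0x00 */
	uint16_t flags;  /* 0x02 */
	uint16_t cflags; /* 0x04 */
	uint16_t pflags; /* 0x06 */
	void    *dev;    /* 0x08 */
	void    *disk;   /* 0x10 */
	int64_t  offset; /* 0x18 */
	int64_t  bcount; /* 0x20 */
	void    *data;   /* 0x28 */
};

/*
 * Address the CPU touches when loading a field through a struct pointer:
 * base + field offset. With base == 0 (a NULL pointer) the result is just
 * the offset, which is what the trap reports as "fault virtual address".
 */
static uintptr_t
field_load_address(uintptr_t base, size_t field_offset)
{
	return base + field_offset;
}
```

With a valid base such as 0x1000 the same load would target 0x1028; only the NULL base yields the small, unmapped address seen in every trap above.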
Oops, make that: "zio->io_bio is NULL".
Indeed, it also affects mirrors. It's panicking at line 1094 of src/svn-current/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c:

abd_return_buf_copy(zio->io_abd, bp->bio_data, zio->io_size);

bp is a null pointer.
Created attachment 184417: proposed patch

Could everyone affected and anyone interested please test this patch? Thank you!
My own patch is similar to yours, and it resolves the issue. I'll test yours as well.
(In reply to Cy Schubert from comment #4) Thank you! Could you please test the kernel with INVARIANTS if possible?
(In reply to Andriy Gapon from comment #5)
No messages on the console. The kernel was built with INVARIANTS:

cwsys# strings /boot/kernel/kernel | grep INVARI
Kernel compiled with INVARIANTS, may affect performance
Support for modules compiled with INVARIANTS option
options INVARIANT_SUPPORT
options INVARIANTS
cwsys#
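Testing with INVARIANTS matters because, in the FreeBSD kernel, consistency checks such as KASSERT(9) are compiled in only when that option is set; without it, any new assertions are no-ops and cannot report anything. The userland sketch below mimics that compile-time gating. `DEMO_KASSERT`, `demo_assert_failures` and `demo_io_done()` are hypothetical, and unlike the real KASSERT (which panics and takes a parenthesized printf-style message) this version merely counts failures so it can be observed.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical failure counter; a real kernel would panic instead. */
static int demo_assert_failures;

#ifdef INVARIANTS
/* Checks compiled in: evaluate the condition and record any failure. */
#define DEMO_KASSERT(exp, msg) do {				\
	if (!(exp)) {						\
		demo_assert_failures++;				\
		fprintf(stderr, "assertion failed: %s\n", (msg)); \
	}							\
} while (0)
#else
/* Checks compiled out: the condition is not even evaluated. */
#define DEMO_KASSERT(exp, msg) do { } while (0)
#endif

/* Hypothetical completion routine: a finished request must carry a buffer. */
static void
demo_io_done(const void *buf)
{
	(void)buf; /* silence unused-parameter warnings when checks are off */
	DEMO_KASSERT(buf != NULL, "io_done called without a buffer");
}
```

Built with `-DINVARIANTS`, calling `demo_io_done(NULL)` bumps the counter and prints a message; built without it, the same call is silent, which is why "no messages" is only meaningful on an INVARIANTS kernel.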
A commit references this bug:

Author: avg
Date: Tue Jul 18 07:41:39 UTC 2017
New revision: 321111
URL: https://svnweb.freebsd.org/changeset/base/321111

Log:
  fix a regression in r320452, ZFS ABD import

  I overlooked the fact that vdev_op_io_done hook is called even if the
  actual I/O is skipped, for example, in the case of a missing vdev.
  Arguably, this could be considered an issue in the zio pipeline engine,
  but for now I am adding defensive code to check for io_bp being NULL
  along with assertions that that happens only when it can be really
  expected.

  PR: 220691
  Reported by: peter, cy
  Tested by: cy
  MFC after: 1 week
  X-MFC with: r320156, r320452

Changes:
  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
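The commit log explains the root cause: the done hook (reached via zio_vdev_io_done() in the backtraces above) runs even when the start stage was skipped, so the per-request bio pointer was never set. The sketch below models that shape with a defensive NULL check plus an assertion that the skip was expected. All types and functions are hypothetical simplifications, not the real zio_t, struct bio or vdev_geom.c code, and treating ENXIO as the marker for "device missing" is an assumption made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins; not the real zio_t / struct bio. */
struct demo_bio {
	bool completed;
};

struct demo_zio {
	struct demo_bio *io_bio;   /* set only when the I/O is actually issued */
	int              io_error;
	int              done_calls;
	struct demo_bio  storage;  /* backing storage for io_bio */
};

/* "Start" stage: runs only when the device exists and issues the request. */
static void
demo_io_start(struct demo_zio *zio)
{
	zio->io_bio = &zio->storage;
}

/* "Done" hook with a defensive check in the spirit of the commit log. */
static void
demo_io_done(struct demo_zio *zio)
{
	zio->done_calls++;
	if (zio->io_bio == NULL) {
		/* Expected only when the I/O was skipped (assumed: missing device). */
		assert(zio->io_error == ENXIO);
		return;
	}
	/* Safe now; the unguarded version dereferenced io_bio unconditionally. */
	zio->io_bio->completed = true;
}

/* Pipeline: the done hook runs whether or not the start stage ran. */
static void
demo_pipeline(struct demo_zio *zio, bool device_present)
{
	if (device_present)
		demo_io_start(zio);
	else
		zio->io_error = ENXIO;
	demo_io_done(zio);
}
```

The commit log notes the alternative of fixing the pipeline so the hook is not called for skipped I/O; the sketch follows the defensive approach the commit describes, keeping the assertion so that an unexpected NULL still surfaces on builds with checks enabled.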
This fixes both of the cases that we encountered in the cluster. Thank you!!
The issue is fixed in the only branch where it was present.