I recently upgraded to FreeBSD 12.2-RELEASE r366954 GENERIC amd64 and run plexmediaserver-plexpass-1.20.3.3483 Now I consistently receive a kernel panic soon after boot: Fatal trap 18: integer divide fault while in kernel mode cpuid = 11; apic id = 0b instruction pointer = 0x20:0xffffffff82963725 stack pointer = 0x28:0xfffffe009f78d480 frame pointer = 0x28:0xfffffe009f78d510 code segment = base rx0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 2247 (Plex Media Server) trap number = 18 panic: integer divide fault cpuid = 11 time = 1604252527 KDB: stack backtrace: #0 0xffffffff80c0a8f5 at kdb_backtrace+0x65 #1 0xffffffff80bbeb1b at vpanic+0x17b #2 0xffffffff80bbe993 at panic+0x43 #3 0xffffffff8108f911 at trap_fatal+0x391 #4 0xffffffff8108ed97 at trap+0x67 #5 0xffffffff81066f38 at calltrap+0x8 #6 0xffffffff82a148ce at zfs_rmnode+0x5e #7 0xffffffff82a35612 at zfs_freebsd_reclaim+0x42 #8 0xffffffff812482db at VOP_RECLAIM_APV+0x7b #9 0xffffffff80c8e376 at vgonel+0x216 #10 0xffffffff80c8e9c5 at vrecycle+0x45 #11 0xffffffff82a355bd at zfs_freebsd_inactive+0xd #12 0xffffffff812481ab at VOP_INACTIVE_APV+0x7b #13 0xffffffff80c8d0b0 at vinactive+0xf0 #14 0xffffffff80c8d5c8 at vputx+0x2b8 #15 0xffffffff80c9722c at kern_unlinkat+0x20c #16 0xffffffff810904c7 at amd64_syscall+0x387 #17 0xffffffff8106785e at fast_syscall_common+0xf8 Uptime: 1m27s I am not clear exactly which file is unlinked. Thank you for any assistance in resolving this situation, and let me know if I can provide additional information.
Do you have a crash dump?
Hi Andriy, yes, I did capture a crash dump - the panic trace above is from `dmesg -M` I tried to follow the FAQ entries about kgdb debugging, but despite running a GENERIC kernel which should have symbols, kgdb does not seem to be symbolicating my bt: % /usr/libexec/kgdb /boot/kernel/kernel vmcore.0 GNU gdb 6.1.1 [FreeBSD] This GDB was configured as "amd64-marcel-freebsd"...(no debugging symbols found)... Attempt to extract a component of a value that is not a structure pointer. Attempt to extract a component of a value that is not a structure pointer. #0 0xffffffff80bbe90a in doadump () (kgdb) bt #0 0xffffffff80bbe90a in doadump () #1 0xffffffff80bbe735 in kern_reboot () #2 0xffffffff80bbeb73 in vpanic () #3 0xffffffff80bbe993 in panic () #4 0xffffffff8108f911 in trap_fatal () #5 0xffffffff8108ed97 in trap () #6 0xffffffff81066f38 in calltrap () #7 0xffffffff82963725 in ?? () #8 0xfffff80432466800 in ?? () #9 0x0000000002000000 in ?? () #10 0xfffff804325045c0 in ?? () #11 0x000000000a379546 in ?? () #12 0xffffffffffffffff in ?? () #13 0xfffff8000521bc00 in ?? () #14 0xfffff80003a9f988 in ?? () #15 0x0000000000000000 in ?? ()
Is there any additional information I can provide here? Thanks
Try using kgdb from ports (comes with the gdb package), the one in the base system is ancient and slated to be removed. Though, the lack of symbols I think implies that your /usr/lib/debug/boot/kernel is not populated. Kernel debug symbols are shipped as a separate set, for 12.2 they'll be in https://ftp.freebsd.org/pub/FreeBSD/releases/amd64/12.2-RELEASE/kernel-dbg.txz
That works much better. I tried to briefly dig in and tried to identify the division causing the panic, which does look like an integer division of zero (minimum) by zero (iblkrange): Reading symbols from /boot/kernel/kernel... Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug... Unread portion of the kernel message buffer: Fatal trap 18: integer divide fault while in kernel mode cpuid = 11; apic id = 0b instruction pointer = 0x20:0xffffffff82963725 stack pointer = 0x28:0xfffffe009f78d480 frame pointer = 0x28:0xfffffe009f78d510 code segment = base rx0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 2247 (Plex Media Server) trap number = 18 panic: integer divide fault cpuid = 11 time = 1604252527 KDB: stack backtrace: #0 0xffffffff80c0a8f5 at kdb_backtrace+0x65 #1 0xffffffff80bbeb1b at vpanic+0x17b #2 0xffffffff80bbe993 at panic+0x43 #3 0xffffffff8108f911 at trap_fatal+0x391 #4 0xffffffff8108ed97 at trap+0x67 #5 0xffffffff81066f38 at calltrap+0x8 #6 0xffffffff82a148ce at zfs_rmnode+0x5e #7 0xffffffff82a35612 at zfs_freebsd_reclaim+0x42 #8 0xffffffff812482db at VOP_RECLAIM_APV+0x7b #9 0xffffffff80c8e376 at vgonel+0x216 #10 0xffffffff80c8e9c5 at vrecycle+0x45 #11 0xffffffff82a355bd at zfs_freebsd_inactive+0xd #12 0xffffffff812481ab at VOP_INACTIVE_APV+0x7b #13 0xffffffff80c8d0b0 at vinactive+0xf0 #14 0xffffffff80c8d5c8 at vputx+0x2b8 #15 0xffffffff80c9722c at kern_unlinkat+0x20c #16 0xffffffff810904c7 at amd64_syscall+0x387 #17 0xffffffff8106785e at fast_syscall_common+0xf8 Uptime: 1m27s (da0:isci0:0:0:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 (da0:isci0:0:0:0): CAM status: Command timeout (da0:isci0:0:0:0): Error 5, Retries exhausted (da0:isci0:0:0:0): Synchronize cache failed Dumping 1431 out of 32694 MB:..2%..11%..21%..31%..41%..51%..61%..71%..81%..91% __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55 warning: Source file is more recent than executable. 55 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu, (kgdb) (kgdb) bt #0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55 #1 doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:371 #2 0xffffffff80bbe735 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:451 #3 0xffffffff80bbeb73 in vpanic (fmt=<optimized out>, ap=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:880 #4 0xffffffff80bbe993 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:807 #5 0xffffffff8108f911 in trap_fatal (frame=0xfffffe009f78d3c0, eva=0) at /usr/src/sys/amd64/amd64/trap.c:921 #6 0xffffffff8108ed97 in trap (frame=0xfffffe009f78d3c0) at /usr/src/sys/amd64/amd64/trap.c:214 #7 <signal handler called> #8 0xffffffff82963725 in get_next_chunk (dn=0xfffff804325045c0, start=<optimized out>, minimum=0, l1blks=<optimized out>) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c:721 #9 dmu_free_long_range_impl (os=<optimized out>, dn=0xfffff804325045c0, offset=0, length=<optimized out>) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c:811 #10 dmu_free_long_range (os=<optimized out>, object=<optimized out>, offset=<optimized out>, length=<optimized out>) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c:887 #11 0xffffffff82a148ce in zfs_rmnode (zp=0xfffff803fc50a900) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_dir.c:452 #12 0xffffffff82a35612 in zfs_freebsd_reclaim (ap=<optimized out>) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:5416 #13 0xffffffff812482db in VOP_RECLAIM_APV (vop=0xffffffff82adf1a0 <zfs_vnodeops>, a=0xfffffe009f78d720) at vnode_if.c:2021 #14 0xffffffff80c8e376 in VOP_RECLAIM (vp=0xfffff80414bde780, td=0xfffff80003b5d000) at ./vnode_if.h:830 #15 vgonel (vp=0xfffff80414bde780) at /usr/src/sys/kern/vfs_subr.c:3501 #16 0xffffffff80c8e9c5 in vrecyclel (vp=0xfffff80414bde780) at /usr/src/sys/kern/vfs_subr.c:3342 #17 vrecycle (vp=0xfffff80414bde780) at /usr/src/sys/kern/vfs_subr.c:3323 #18 0xffffffff82a355bd in zfs_freebsd_inactive (ap=<optimized out>) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:5387 #19 0xffffffff812481ab in VOP_INACTIVE_APV (vop=0xffffffff82adf1a0 <zfs_vnodeops>, a=0xfffffe009f78d7f8) at vnode_if.c:1955 #20 0xffffffff80c8d0b0 in VOP_INACTIVE (vp=0xfffff80414bde780, td=0xfffff80003b5d000) at ./vnode_if.h:807 #21 vinactive (vp=0xfffff80414bde780, td=0xfffff80003b5d000) at /usr/src/sys/kern/vfs_subr.c:3166 #22 0xffffffff80c8d5c8 in vputx (vp=0xfffff80414bde780, func=2) at /usr/src/sys/kern/vfs_subr.c:2894 #23 0xffffffff80c9722c in kern_unlinkat (td=0xfffff80003b5d000, fd=-100, path=0x810904000 <error: Cannot access memory at address 0x810904000>, pathseg=UIO_USERSPACE, oldinum=<optimized out>) at /usr/src/sys/kern/vfs_syscalls.c:1840 #24 0xffffffff810904c7 in syscallenter (td=0xfffff80003b5d000) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:144 #25 amd64_syscall (td=0xfffff80003b5d000, traced=0) at /usr/src/sys/amd64/amd64/trap.c:1163 #26 <signal handler called> #27 0x0000000809f805aa in ?? () Backtrace stopped: Cannot access memory at address 0x7fffffffe238 /sys/amd64(kgdb) frame 7 #7 <signal handler called> (kgdb) up #8 0xffffffff82963725 in get_next_chunk (dn=0xfffff804325045c0, start=<optimized out>, minimum=0, l1blks=<optimized out>) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c:721 warning: Source file is more recent than executable. 721 (roundup(*start, iblkrange) - (minimum / iblkrange * iblkrange)) / (kgdb) list 716 * L1 blocks in this range have data. If we can, we use this 717 * worst case value as an estimate so we can avoid having to look 718 * at the object's actual data. 719 */ 720 uint64_t total_l1blks = 721 (roundup(*start, iblkrange) - (minimum / iblkrange * iblkrange)) / 722 iblkrange; 723 if (total_l1blks <= maxblks) { 724 *l1blks = total_l1blks; 725 *start = minimum; (kgdb) print iblkrange $1 = 0 (kgdb) print minimum $2 = 0 (kgdb) print start $3 = <optimized out>
Still affects 12.2-RELEASE-p1 We've identified the offending files and quarantined them, preventing unintentional crashes for the moment...
Unfortunately, the quarantine is not working. We seem to still suffer crashes when deleting files. I have tried to make new ZFS Filesystems, and still observe the crash there, so it must be at some lower level in the zpool. ZFS has no fsck utility. How can we recover our pool to a working state? :(
Sorry, I have no idea what the issue is here. I'd suggest asking on the freebsd-fs mailing list for suggestions.
Looks like https://github.com/openzfs/zfs/issues/8778 ?
Definitely looks like the same bug considering that the simple fix from OpenZFS is not applied to the FreeBSD zfs code - so that bug is definitely there.
I can confirm, the patch that Xin supplied on freebsd-fs clears this issue up entirely for us. Thank you!