Hello,

today my PC locked up while using local Xorg. After a hard reset, I got the following panic:

solaris assert: 0 == dmu_tx_assign(tx, TXG_WAIT), file: /usr/local/share/deploy-tools/HEAD/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_dir.c, line: 330
assfail() at assfail+0x1a/frame 0xfffffe008cd842e0
zfs_unlinked_drain() at zfs_unlinked_drain+0x175/frame 0xfffffe008cd844b0
zfsvfs_setup() at zfsvfs_setup+0x5b/frame 0xfffffe008cd844e0
zfs_mount() at zfs_mount+0x731/frame 0xfffffe008cd84670
vfs_domount() at vfs_domount+0x730/frame 0xfffffe008cd84890
vfs_donmount() at vfs_donmount+0x807/frame 0xfffffe008cd84940
sys_nmount() at sys_nmount+0x72/frame 0xfffffe008cd84980
amd64_syscall() at amd64_syscall+0x281/frame 0xfffffe008cd84ab0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe008cd84ab0
--- syscall (378, FreeBSD ELF64, sys_nmount), rip = 0x80037684a, rsp = 0x7fffffffcc68, rbp = 0x7fffffffcce0 ---

#11 0xffffffff81eb523a in assfail () from /boot/kernel/opensolaris.ko
#12 0xffffffff81b806a5 in zfs_unlinked_drain () from /boot/kernel/zfs.ko
#13 0xffffffff81b997fb in zfsvfs_setup () from /boot/kernel/zfs.ko
#14 0xffffffff81b972a1 in zfs_mount () from /boot/kernel/zfs.ko
#15 0xffffffff808b44f0 in vfs_domount (td=0xfffffe008cd84368, fstype=<value optimized out>, fspath=<value optimized out>, fsflags=<value optimized out>, optlist=0xfffffe008cd848d8)
    at /usr/local/share/deploy-tools/HEAD/src/sys/kern/vfs_mount.c:892
#16 0xffffffff808b37b7 in vfs_donmount (td=0xfffff800110d2000, fsflags=0, fsoptions=0xfffff8000e16e100)
    at /usr/local/share/deploy-tools/HEAD/src/sys/kern/vfs_mount.c:726
#17 0xffffffff808b2f82 in sys_nmount (td=0xfffff800110d2000, uap=0xfffff800110d23c0)
    at /usr/local/share/deploy-tools/HEAD/src/sys/kern/vfs_mount.c:431
#18 0xffffffff80b39771 in amd64_syscall (td=0xfffff800110d2000, traced=0) at subr_syscall.c:135
#19 0xffffffff80b1586d in fast_syscall_common ()
    at /usr/local/share/deploy-tools/HEAD/src/sys/amd64/amd64/exception.S:500
#20 0x000000080037684a in ?? ()

After I found the suspicious dataset which caused that panic due to quota exhaustion, there was another (single user) panic, which I haven't dumped to swap because this one wasn't saved yet...
So here are just the kdb lines, transcribed manually from a picture I took:

panic: solaris assert: delta > 0 ? dsl_dir_phys(dd)->dd_used_breakdown[oldtype] >= delt :-( missing rest of the line :-( /dsl_dir.c, line 1564
…
assfail()
dsl_dir_transfer_space() at dsl_dir_transfer_space
dsl_dataset_block_born() at dsl_dataset_block_born
dbuf_write_done() at dbuf_write_done
arc_write_done() at arc_write_done
zio_done() at zio_done
zio_execute() at zio_execute
taskqueue_run_locked() at taskqueue_run_locked
taskqueue_thread_loop() at taskqueue_thread_loop
fork_exit() at fork_exit
fork_trampoline() at fork_trampoline

Since it's a custom kernel, the latter is useless I guess, but in case somebody considers this kind of crash worth fixing, the second, partial trace might point to other places in the same context.
I will try to get to the sources and add a more useful backtrace; I don't have them around and ran out of time now...

Thanks,
-harry

The first panic indeed looks like a combination of r334810 making dmu_tx_assign() errors fatal and a quota overflow triggering such an error. We need to handle those errors somehow.

About the second panic I am not sure, but I found this interesting comment above zfs_unlinked_add():

 * When dealing with the unlinked set, we dmu_tx_hold_zap(), but we
 * don't specify the name of the entry that we will be manipulating.  We
 * also fib and say that we won't be adding any new entries to the
 * unlinked set, even though we might (this is to lower the minimum file
 * size that can be deleted in a full filesystem).  So on the small
 * chance that the nlink list is using a fat zap (ie. has more than
 * 2000 entries), we *may* not pre-read a block that's needed.
 * Therefore it is remotely possible for some of the assertions
 * regarding the unlinked set below to fail due to i/o error.  On a
 * nondebug system, this will result in the space being leaked.

I am curious whether that could be the case here.

A commit references this bug:

Author: mav
Date: Wed Aug 22 16:32:53 UTC 2018
New revision: 338206
URL: https://svnweb.freebsd.org/changeset/base/338206

Log:
  Add dmu_tx_assign() error handling in zfs_unlinked_drain().

  The error handling got lost during r334810, while according to the report
  error there may happen in case of dataset being over quota.  In such case
  just leave the node in the unlinked list to be freed sometimes later.

  PR:		229887
  Sponsored by:	iXsystems, Inc.

Changes:
  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_dir.c

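For readers who want to see the behaviour the log describes, here is a minimal user-space sketch. It is not the committed diff and does not use the real ZFS API: struct unlinked_node, MOCK_ERR_OVER_QUOTA, mock_tx_assign() and mock_unlinked_drain() are all hypothetical stand-ins invented for illustration. The only point it models is that a failed transaction assignment makes the drain loop skip the node, leaving it on the list, instead of tripping an assertion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical error code; the real code would see an errno value. */
#define MOCK_ERR_OVER_QUOTA 1

/* Hypothetical stand-in for one entry on the unlinked list. */
struct unlinked_node {
	int over_quota;	/* simulates the dataset being over quota */
	int freed;	/* set once the node has been reclaimed */
};

/* Hypothetical stand-in for dmu_tx_assign(): 0 on success, nonzero on error. */
static int
mock_tx_assign(const struct unlinked_node *n)
{
	return (n->over_quota ? MOCK_ERR_OVER_QUOTA : 0);
}

/*
 * Walk the list and free what can be freed.  When the assignment fails,
 * the node is skipped and stays on the list for a later attempt, rather
 * than the failure being treated as fatal.  Returns the number of nodes
 * freed by this pass.
 */
static size_t
mock_unlinked_drain(struct unlinked_node *nodes, size_t count)
{
	size_t nfreed = 0;

	assert(nodes != NULL || count == 0);
	for (size_t i = 0; i < count; i++) {
		if (nodes[i].freed)
			continue;	/* reclaimed by an earlier pass */
		if (mock_tx_assign(&nodes[i]) != 0)
			continue;	/* leave it; retry on a later drain */
		nodes[i].freed = 1;
		nfreed++;
	}
	return (nfreed);
}
```

Calling mock_unlinked_drain() on a list with one over-quota node frees the others and leaves that node alone; once the simulated quota condition is cleared, a second call picks it up.
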
I was unable to reproduce the problem, but I believe the committed patch should fix it.