We've gotten this panic on one of our systems, which runs as a KVM virtual machine. This is on 10.1-RELEASE with a few patches cherry-picked from 10.1-STABLE.

(kgdb) bt
#0  doadump (textdump=<value optimized out>) at pcpu.h:219
#1  0xffffffff8091e882 in kern_reboot (howto=260) at /release/usr/src/sys/kern/kern_shutdown.c:452
#2  0xffffffff8091ed75 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /release/usr/src/sys/kern/kern_shutdown.c:759
#3  0xffffffff8091edc3 in panic (fmt=0x0) at /release/usr/src/sys/kern/kern_shutdown.c:688
#4  0xffffffff81c6821d in assfail (a=<value optimized out>, f=<value optimized out>, l=<value optimized out>) at /release/usr/src/sys/modules/opensolaris/../../cddl/compat/opensolaris/kern/opensolaris_cmn_err.c:81
#5  0xffffffff81a27381 in zil_commit (zilog=0xfffff800082ec400, foid=<value optimized out>) at /release/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c:1059
#6  0xffffffff81a56cf8 in zfs_sync (vfsp=<value optimized out>, waitfor=<value optimized out>) at /release/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:161
#7  0xffffffff809c53ca in sync_fsync (ap=<value optimized out>) at /release/usr/src/sys/kern/vfs_subr.c:3715
#8  0xffffffff80e3efe7 in VOP_FSYNC_APV (vop=<value optimized out>, a=<value optimized out>) at vnode_if.c:1330
#9  0xffffffff809c5c9b in sched_sync () at vnode_if.h:549
#10 0xffffffff808ee384 in fork_exit (callout=0xffffffff809c5950 <sched_sync>, arg=0x0, frame=0xfffffe0000272c00) at /release/usr/src/sys/kern/kern_fork.c:996
#11 0xffffffff80d03eae in fork_trampoline () at /release/usr/src/sys/amd64/amd64/exception.S:606
#12 0x0000000000000000 in ?? ()

Kernel dump is available if needed.
On FreeBSD head r303834 I can reliably reproduce the panic string (with a different stack trace) by running zil_002_pos from the ZFS test suite (available in the zfsd project branch). I think the problem is caused by exporting a pool with a nonempty ZIL.

panic: solaris assert: !zilog_is_dirty(zilog), file: /usr/home/alans/freebsd/head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c, line: 1816
cpuid = 6
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0b5538f790
vpanic() at vpanic+0x182/frame 0xfffffe0b5538f810
panic() at panic+0x43/frame 0xfffffe0b5538f870
assfail() at assfail+0x1a/frame 0xfffffe0b5538f880
zil_close() at zil_close+0xf0/frame 0xfffffe0b5538f8b0
zfsvfs_teardown() at zfsvfs_teardown+0x51/frame 0xfffffe0b5538f900
zfs_umount() at zfs_umount+0x1c7/frame 0xfffffe0b5538f940
dounmount() at dounmount+0x6f4/frame 0xfffffe0b5538f9c0
sys_unmount() at sys_unmount+0x35d/frame 0xfffffe0b5538fae0
amd64_syscall() at amd64_syscall+0x2db/frame 0xfffffe0b5538fbf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe0b5538fbf0
--- syscall (22, FreeBSD ELF64, sys_unmount), rip = 0x8010f0efa, rsp = 0x7fffffffbac8, rbp = 0x7fffffffbf60 ---
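For context, the assertion that fires is VERIFY(!zilog_is_dirty(zilog)) in zil_close(). zilog_is_dirty() simply checks whether the zilog is still linked on its pool's dp_dirty_zilogs list for any in-flight txg; paraphrased from zil.c of roughly this vintage (not an exact quote, and line numbers differ between branches):

boolean_t
zilog_is_dirty(zilog_t *zilog)
{
	dsl_pool_t *dp = zilog->zl_dmu_pool;

	/* Dirty == still on the pool's dirty-zilog list in some txg. */
	for (int t = 0; t < TXG_SIZE; t++) {
		if (txg_list_member(&dp->dp_dirty_zilogs, zilog, t))
			return (B_TRUE);
	}
	return (B_FALSE);
}

So the panic means teardown reached zil_close() while the ZIL still had dirty state pending in some open or syncing txg, which fits the export-with-nonempty-ZIL theory.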
(In reply to Alan Somers from comment #1)
Alan, you seem to be running into a different bug: the stack trace and the assertion are completely different, despite having the ZIL in common.

I've also run into the original problem reported here and tried to discuss it on the OpenZFS mailing list:
https://www.listbox.com/member/archive/274414/2016/08/sort/time_rev/page/3/entry/2:73/20160803073419:3317CA0E-596E-11E6-B5AA-B45042299F87/
This illumos issue appears to be related to this bug: https://www.illumos.org/issues/3821
There is a pull request with a proposed fix here: https://github.com/openzfs/openzfs/pull/208
So, if anyone is able to reproduce the problem, or runs into it from time to time, it would be nice to test that change.
Can anyone reproduce this problem using recent FreeBSD head (r309250 or newer) or stable/11?
I can still reproduce my version of the failure on head at r313567.
(In reply to Alan Somers from comment #5)
I think that it's worthwhile opening a new bug for your issue.
Closing this bug. Please re-open if the problem from comment #0 happens again.
Just got this on 11.0-STABLE r310734.

panic: solaris assert: !zilog_is_dirty(zilog), file: /usr/src-11-r310734/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c, line: 1861

Crash dump is available if needed.
(In reply to emz from comment #8)
Please provide as much info from the crash dump as you are able to extract with kgdb. At minimum: the stack trace, plus *zilog, *lwb, and *zilog->zl_dmu_pool in the zil_close frame.
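For example, something along these lines (paths and the frame number are illustrative; use your own kernel debug files and the vmcore that savecore wrote):

# kgdb /usr/lib/debug/boot/kernel/kernel.debug /var/crash/vmcore.0
(kgdb) bt
(kgdb) frame N          <- whichever frame zil_close is in your backtrace
(kgdb) p *zilog
(kgdb) p *lwb
(kgdb) p *zilog->zl_dmu_pool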
Here they are (I can also attach the kernel.full along with the compressed crash dump and put it on a web server, in case you need it):

<7>sonewconn: pcb 0xfffff8014c2cb740: Listen queue overflow: 193 already in queue awaiting acceptance (1034 occurrences)
panic: solaris assert: !zilog_is_dirty(zilog), file: /usr/src-11-r310734/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c, line: 1861
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff80b023a7 at kdb_backtrace+0x67
#1 0xffffffff80ab88e6 at vpanic+0x186
#2 0xffffffff80ab8753 at panic+0x43
#3 0xffffffff824fa20a at assfail+0x1a
#4 0xffffffff8228f9da at zil_close+0x16a
#5 0xffffffff822c4ee5 at zvol_last_close+0x15
#6 0xffffffff822c5e2a at zvol_d_close+0x6a
#7 0xffffffff8096af83 at devfs_close+0x5d3
#8 0xffffffff810db544 at VOP_CLOSE_APV+0x84
#9 0xffffffff80b8b9d2 at vgonel+0xe2
#10 0xffffffff80b8b258 at vflush+0x4e8
#11 0xffffffff80966878 at devfs_unmount+0x38
#12 0xffffffff80b819b5 at dounmount+0x865
#13 0xffffffff80b8ce04 at vfs_unmountall+0xc4
#14 0xffffffff80b69e40 at bufshutdown+0x440
#15 0xffffffff80ab8064 at kern_reboot+0x244
#16 0xffffffff80ab7e13 at sys_reboot+0x583
#17 0xffffffff80f7366e at amd64_syscall+0x50e
Uptime: 7d1h51m39s
Dumping 26476 out of 65386 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

Reading symbols from /boot/kernel/zfs.ko...Reading symbols from /usr/lib/debug//boot/kernel/zfs.ko.debug...done.
done.
Loaded symbols for /boot/kernel/zfs.ko
Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from /usr/lib/debug//boot/kernel/opensolaris.ko.debug...done.
done.
Loaded symbols for /boot/kernel/opensolaris.ko
Reading symbols from /boot/kernel/geom_mirror.ko...Reading symbols from /usr/lib/debug//boot/kernel/geom_mirror.ko.debug...done.
done.
Loaded symbols for /boot/kernel/geom_mirror.ko
Reading symbols from /boot/kernel/pf.ko...Reading symbols from /usr/lib/debug//boot/kernel/pf.ko.debug...done.
done.
Loaded symbols for /boot/kernel/pf.ko
Reading symbols from /boot/kernel/ichwd.ko...Reading symbols from /usr/lib/debug//boot/kernel/ichwd.ko.debug...done.
done.
Loaded symbols for /boot/kernel/ichwd.ko
Reading symbols from /boot/kernel/ctl.ko...Reading symbols from /usr/lib/debug//boot/kernel/ctl.ko.debug...done.
done.
Loaded symbols for /boot/kernel/ctl.ko
Reading symbols from /boot/kernel/iscsi.ko...Reading symbols from /usr/lib/debug//boot/kernel/iscsi.ko.debug...done.
done.
Loaded symbols for /boot/kernel/iscsi.ko
#0 doadump (textdump=<value optimized out>) at pcpu.h:222
222     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) bt
#0  doadump (textdump=<value optimized out>) at pcpu.h:222
#1  0xffffffff80ab8369 in kern_reboot (howto=260) at /usr/src-11-r310734/sys/kern/kern_shutdown.c:366
#2  0xffffffff80ab8920 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /usr/src-11-r310734/sys/kern/kern_shutdown.c:759
#3  0xffffffff80ab8753 in panic (fmt=<value optimized out>) at /usr/src-11-r310734/sys/kern/kern_shutdown.c:690
#4  0xffffffff824fa20a in assfail (a=<value optimized out>, f=<value optimized out>, l=<value optimized out>) at /usr/src-11-r310734/sys/cddl/compat/opensolaris/kern/opensolaris_cmn_err.c:81
#5  0xffffffff8228f9da in zil_close (zilog=0xfffff805810a3800) at /usr/src-11-r310734/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c:1861
#6  0xffffffff822c4ee5 in zvol_last_close (zv=0xfffff80f84aff000) at /usr/src-11-r310734/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c:863
#7  0xffffffff822c5e2a in zvol_d_close (dev=<value optimized out>, flags=2228228, fmt=<value optimized out>, td=<value optimized out>) at /usr/src-11-r310734/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c:3116
#8  0xffffffff8096af83 in devfs_close (ap=<value optimized out>) at /usr/src-11-r310734/sys/fs/devfs/devfs_vnops.c:648
#9  0xffffffff810db544 in VOP_CLOSE_APV (vop=<value optimized out>, a=<value optimized out>) at vnode_if.c:534
#10 0xffffffff80b8b9d2 in vgonel (vp=0xfffff8056efcc938) at vnode_if.h:225
#11 0xffffffff80b8b258 in vflush (mp=<value optimized out>, rootrefs=<value optimized out>, flags=2, td=<value optimized out>) at /usr/src-11-r310734/sys/kern/vfs_subr.c:3034
#12 0xffffffff80966878 in devfs_unmount (mp=0xfffff800138b2660, mntflags=<value optimized out>) at /usr/src-11-r310734/sys/fs/devfs/devfs_vfsops.c:188
#13 0xffffffff80b819b5 in dounmount (mp=<value optimized out>, flags=<value optimized out>, td=<value optimized out>) at /usr/src-11-r310734/sys/kern/vfs_mount.c:1381
#14 0xffffffff80b8ce04 in vfs_unmountall () at /usr/src-11-r310734/sys/kern/vfs_subr.c:3878
#15 0xffffffff80b69e40 in bufshutdown (show_busybufs=0) at /usr/src-11-r310734/sys/kern/vfs_bio.c:1285
#16 0xffffffff80ab8064 in kern_reboot (howto=0) at /usr/src-11-r310734/sys/kern/kern_shutdown.c:352
#17 0xffffffff80ab7e13 in sys_reboot (td=<value optimized out>, uap=<value optimized out>) at /usr/src-11-r310734/sys/kern/kern_shutdown.c:228
#18 0xffffffff80f7366e in amd64_syscall (td=0xfffff80076170a00, traced=0) at subr_syscall.c:135
#19 0xffffffff80f55cdb in Xfast_syscall () at /usr/src-11-r310734/sys/amd64/amd64/exception.S:396
#20 0x000000080086ebda in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb) frame 5
#5  0xffffffff8228f9da in zil_close (zilog=0xfffff805810a3800)
    at /usr/src-11-r310734/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c:1861
1861            VERIFY(!zilog_is_dirty(zilog));
(kgdb) p *zilog->zl_dmu_pool
$3 = {dp_spa = 0xfffff80024aff000, dp_meta_objset = 0xfffff80024905000, dp_root_dir = 0xfffff800138f9400,
  dp_mos_dir = 0xfffff800243ab400, dp_free_dir = 0xfffff8002443d800, dp_leak_dir = 0x0,
  dp_origin_snap = 0xfffff80013f39800, dp_root_dir_obj = 2, dp_vnrele_taskq = 0xfffff800139d2d10,
  dp_meta_rootbp = {blk_dva = 0xfffff80013c3d048, blk_prop = 9226475966770118659, blk_pad = 0xfffff80013c3d080,
    blk_phys_birth = 0, blk_birth = 2459487, blk_fill = 7305, blk_cksum = {zc_word = 0xfffff80013c3d0a8}},
  dp_tmp_userrefs_obj = 33211, dp_free_bpobj = {bpo_lock = {lock_object = {lo_name = 0xffffffff8231b35d "bpo->bpo_lock",
        lo_flags = 40960000, lo_data = 0, lo_witness = 0x0}, sx_lock = 1}, bpo_os = 0xfffff80024905000,
    bpo_object = 11, bpo_epb = 1024, bpo_havecomp = 1 '\001', bpo_havesubobj = 1 '\001',
    bpo_phys = 0xfffff800243f3800, bpo_dbuf = 0xfffff800243e3e80, bpo_cached_dbuf = 0x0},
  dp_bptree_obj = 18561, dp_empty_bpobj = 65, dp_scan = 0xfffff800243c4200,
  dp_lock = {lock_object = {lo_name = 0xffffffff82321731 "dp->dp_lock", lo_flags = 40960000, lo_data = 0,
      lo_witness = 0x0}, sx_lock = 1},
  dp_spaceavail_cv = {cv_description = 0xffffffff8232173e "dp->dp_spaceavail_cv", cv_waiters = 0},
  dp_dirty_pertxg = 0xfffff80013c3d168, dp_dirty_total = 4227072, dp_mos_used_delta = 0,
  dp_mos_compressed_delta = 0, dp_mos_uncompressed_delta = 0, dp_last_wakeup = 654040753364800,
  dp_tx = {tx_cpu = 0xfffffe0010acb000, tx_sync_lock = {lock_object = {lo_name = 0xffffffff82328df4 "tx->tx_sync_lock",
        lo_flags = 40960000, lo_data = 0, lo_witness = 0x0}, sx_lock = 1}, tx_open_txg = 2459489,
    tx_quiesced_txg = 0, tx_syncing_txg = 2459488, tx_synced_txg = 2459487, tx_open_time = 611499869469547,
    tx_sync_txg_waiting = 2459487, tx_quiesce_txg_waiting = 2459489,
    tx_sync_more_cv = {cv_description = 0xffffffff82328e06 "tx->tx_sync_more_cv", cv_waiters = 0},
    tx_sync_done_cv = {cv_description = 0xffffffff82328e1b "tx->tx_sync_done_cv", cv_waiters = 0},
    tx_quiesce_more_cv = {cv_description = 0xffffffff82328e30 "tx->tx_quiesce_more_cv", cv_waiters = 1},
    tx_quiesce_done_cv = {cv_description = 0xffffffff82328e48 "tx->tx_quiesce_done_cv", cv_waiters = 0},
    tx_timeout_cv = {cv_description = 0x0, cv_waiters = 0},
    tx_exit_cv = {cv_description = 0xffffffff82328e60 "tx->tx_exit_cv", cv_waiters = 0},
    tx_threads = 2 '\002', tx_exiting = 0 '\0', tx_sync_thread = 0xfffff8002c49d000,
    tx_quiesce_thread = 0xfffff8002c48a500, tx_commit_cb_taskq = 0x0},
  dp_dirty_datasets = {tl_lock = {lock_object = {lo_name = 0xffffffff82328eb9 "tl->tl_lock", lo_flags = 40960000,
        lo_data = 0, lo_witness = 0x0}, sx_lock = 1}, tl_offset = 456, tl_head = 0xfffff80013c3d2b8},
  dp_dirty_zilogs = {tl_lock = {lock_object = {lo_name = 0xffffffff82328eb9 "tl->tl_lock", lo_flags = 40960000,
        lo_data = 0, lo_witness = 0x0}, sx_lock = 1}, tl_offset = 984, tl_head = 0xfffff80013c3d300},
  dp_dirty_dirs = {tl_lock = {lock_object = {lo_name = 0xffffffff82328eb9 "tl->tl_lock", lo_flags = 40960000,
        lo_data = 0, lo_witness = 0x0}, sx_lock = 1}, tl_offset = 80, tl_head = 0xfffff80013c3d348},
  dp_sync_tasks = {tl_lock = {lock_object = {lo_name = 0xffffffff82328eb9 "tl->tl_lock", lo_flags = 40960000,
        lo_data = 0, lo_witness = 0x0}, sx_lock = 1}, tl_offset = 0, tl_head = 0xfffff80013c3d390},
  dp_config_rwlock = {rr_lock = {lock_object = {lo_name = 0xffffffff82324886 "rrl->rr_lock", lo_flags = 40960000,
        lo_data = 0, lo_witness = 0x0}, sx_lock = 1}, rr_cv = {cv_description = 0xffffffff82324894 "rrl->rr_cv",
      cv_waiters = 0}, rr_writer = 0x0, rr_anon_rcount = {rc_count = 0}, rr_linked_rcount = {rc_count = 0},
    rr_writer_wanted = 0, rr_track_all = 1}, dp_blkstats = 0x0}
(kgdb) p *zilog
$4 = {zl_lock = {lock_object = {lo_name = 0xffffffff8232bef5 "zilog->zl_lock", lo_flags = 40960000, lo_data = 0,
      lo_witness = 0x0}, sx_lock = 1}, zl_dmu_pool = 0xfffff80013c3d000, zl_spa = 0xfffff80024aff000,
  zl_header = 0xfffff80031571150, zl_os = 0xfffff80031571000, zl_get_data = 0xffffffff822c4d60 <zvol_get_data>,
  zl_root_zio = 0x0, zl_lr_seq = 0, zl_commit_lr_seq = 0, zl_destroy_txg = 3, zl_replayed_seq = 0xfffff805810a3868,
  zl_replaying_seq = 0, zl_suspend = 0,
  zl_cv_writer = {cv_description = 0xffffffff8232bf37 "zilog->zl_cv_writer", cv_waiters = 0},
  zl_cv_suspend = {cv_description = 0xffffffff8232bf4c "zilog->zl_cv_suspend", cv_waiters = 0},
  zl_suspending = 0 '\0', zl_keep_first = 0 '\0', zl_replay = 0 '\0', zl_stop_sync = 0 '\0', zl_writer = 0 '\0',
  zl_logbias = 0 '\0', zl_sync = 0 '\0', zl_parse_error = 0, zl_parse_blk_seq = 0, zl_parse_lr_seq = 0,
  zl_parse_blk_count = 0, zl_parse_lr_count = 0, zl_next_batch = 2, zl_com_batch = 1,
  zl_cv_batch = 0xfffff805810a38f8, zl_itxg = 0xfffff805810a3918,
  zl_itx_commit_list = {list_size = 80, list_offset = 0, list_head = {list_next = 0xfffff805810a3a08,
      list_prev = 0xfffff805810a3a08}}, zl_itx_list_sz = 0, zl_cur_used = 0,
  zl_lwb_list = {list_size = 192, list_offset = 176, list_head = {list_next = 0xfffff80857361bf0,
      list_prev = 0xfffff80857361bf0}},
  zl_vdev_lock = {lock_object = {lo_name = 0xffffffff8232bf22 "zilog->zl_vdev_lock", lo_flags = 40960000,
      lo_data = 0, lo_witness = 0x0}, sx_lock = 1},
  zl_vdev_tree = {avl_root = 0x0, avl_compar = 0xffffffff8228f700 <zil_vdev_compare>, avl_offset = 8,
    avl_numnodes = 0, avl_size = 32}, zl_clean_taskq = 0xfffff8015cc60110,
  zl_bp_tree = {avl_root = 0x0, avl_compar = 0, avl_offset = 0, avl_numnodes = 0, avl_size = 0},
  zl_replay_time = 0, zl_replay_blks = 0,
  zl_old_header = {zh_claim_txg = 0, zh_replay_seq = 0, zh_log = {blk_dva = 0xfffff805810a3ae0, blk_prop = 0,
      blk_pad = 0xfffff805810a3b18, blk_phys_birth = 0, blk_birth = 0, blk_fill = 0,
      blk_cksum = {zc_word = 0xfffff805810a3b40}}, zh_claim_blk_seq = 0, zh_flags = 0, zh_claim_lr_seq = 0,
    zh_pad = 0xfffff805810a3b78}, zl_prev_blks = 0xfffff805810a3b90, zl_prev_rotor = 0,
  zl_dirty_link = {tn_next = 0xfffff805810a3bd8, tn_member = 0xfffff805810a3bf8 "\001"}}
(kgdb) p *lwb
No symbol "lwb" in current context.
(kgdb)
(In reply to emz from comment #10)
Interesting. So, unlike the original report, your crash is related to a ZFS volume, not a filesystem. It seems that the volume is configured in "dev" mode... Can you check which specific volume caused the panic and describe how you use it? E.g. whether it's used for swap, backs a particular filesystem, etc. A kgdb sketch for pulling that out of the dump follows below. Thanks!

P.S. Perhaps it would be better to open a new bug report, given the differences from the original report.
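Two pointers for digging further, as a hypothetical kgdb session against your dump (frame numbers taken from the backtrace in comment #10): the volume's dataset name is kept in zv_name in the zvol_state_t that zvol_last_close() receives, and the tn_member flags of zl_dirty_link (one byte per txg slot, printed above as "\001") should confirm that the zilog was still on dp_dirty_zilogs, which is exactly what the VERIFY trips on.

(kgdb) frame 6                              <- zvol_last_close
(kgdb) p zv->zv_name                        <- name of the volume being closed
(kgdb) frame 5                              <- zil_close
(kgdb) p zilog->zl_dirty_link.tn_member     <- nonzero entries => still dirty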