Bug 200592 - panic: solaris assert: zilog_is_dirty(zilog) || spa_freeze_txg(zilog->zl_spa) != UINT64_MAX, file: /release/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c, line: 1060
Summary: panic: solaris assert: zilog_is_dirty(zilog) || spa_freeze_txg(zilog->zl_spa)...
Status: Closed Feedback Timeout
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 10.1-STABLE
Hardware: amd64 Any
: --- Affects Only Me
Assignee: Andriy Gapon
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-06-02 11:40 UTC by johans
Modified: 2017-09-15 14:16 UTC (History)
4 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description johans 2015-06-02 11:40:06 UTC
We've gotten this panic on one of our virtual machines, which is running as a KVM virtual machine. This is on 10.1-RELEASE with a few patched cherry picked from 10.1-STABLE.

(kgdb) bt
#0  doadump (textdump=<value optimized out>) at pcpu.h:219
#1  0xffffffff8091e882 in kern_reboot (howto=260) at /release/usr/src/sys/kern/kern_shutdown.c:452
#2  0xffffffff8091ed75 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /release/usr/src/sys/kern/kern_shutdown.c:759
#3  0xffffffff8091edc3 in panic (fmt=0x0) at /release/usr/src/sys/kern/kern_shutdown.c:688
#4  0xffffffff81c6821d in assfail (a=<value optimized out>, f=<value optimized out>, l=<value optimized out>)
    at /release/usr/src/sys/modules/opensolaris/../../cddl/compat/opensolaris/kern/opensolaris_cmn_err.c:81
#5  0xffffffff81a27381 in zil_commit (zilog=0xfffff800082ec400, foid=<value optimized out>)
    at /release/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c:1059
#6  0xffffffff81a56cf8 in zfs_sync (vfsp=<value optimized out>, waitfor=<value optimized out>)
    at /release/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:161
#7  0xffffffff809c53ca in sync_fsync (ap=<value optimized out>) at /release/usr/src/sys/kern/vfs_subr.c:3715
#8  0xffffffff80e3efe7 in VOP_FSYNC_APV (vop=<value optimized out>, a=<value optimized out>) at vnode_if.c:1330
#9  0xffffffff809c5c9b in sched_sync () at vnode_if.h:549
#10 0xffffffff808ee384 in fork_exit (callout=0xffffffff809c5950 <sched_sync>, arg=0x0, frame=0xfffffe0000272c00)
    at /release/usr/src/sys/kern/kern_fork.c:996
#11 0xffffffff80d03eae in fork_trampoline () at /release/usr/src/sys/amd64/amd64/exception.S:606
#12 0x0000000000000000 in ?? ()


Kernel dump is available if needed.
Comment 1 Alan Somers freebsd_committer 2016-08-08 22:47:21 UTC
On FreeBSD head r303834 I can reliabily reproduce the panic string (with a different stack trace) by running zil_002_pos from the ZFS test suite (available in the zfsd project branch).  I think the problem is caused by exporting a pool with a nonempty ZIL.

panic: solaris assert: !zilog_is_dirty(zilog), file: /usr/home/alans/freebsd/head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c, line: 1816
cpuid = 6
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0b5538f790
vpanic() at vpanic+0x182/frame 0xfffffe0b5538f810
panic() at panic+0x43/frame 0xfffffe0b5538f870
assfail() at assfail+0x1a/frame 0xfffffe0b5538f880
zil_close() at zil_close+0xf0/frame 0xfffffe0b5538f8b0
zfsvfs_teardown() at zfsvfs_teardown+0x51/frame 0xfffffe0b5538f900
zfs_umount() at zfs_umount+0x1c7/frame 0xfffffe0b5538f940
dounmount() at dounmount+0x6f4/frame 0xfffffe0b5538f9c0
sys_unmount() at sys_unmount+0x35d/frame 0xfffffe0b5538fae0
amd64_syscall() at amd64_syscall+0x2db/frame 0xfffffe0b5538fbf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe0b5538fbf0
--- syscall (22, FreeBSD ELF64, sys_unmount), rip = 0x8010f0efa, rsp = 0x7fffffffbac8, rbp = 0x7fffffffbf60 ---
Comment 2 Andriy Gapon freebsd_committer 2016-08-09 05:34:42 UTC
(In reply to Alan Somers from comment #1)
Alan, you seem to be running into a different bug, because the stack trace and the assertion are completely different despite having ZIL in common.

I've also run into the original problem reported here and tried to discuss it on the OpenZFS mailing list:
https://www.listbox.com/member/archive/274414/2016/08/sort/time_rev/page/3/entry/2:73/20160803073419:3317CA0E-596E-11E6-B5AA-B45042299F87/
Comment 3 Andriy Gapon freebsd_committer 2016-10-29 13:48:23 UTC
It seems that this illumos issue is related to this bug: https://www.illumos.org/issues/3821
There is a pull request here: https://github.com/openzfs/openzfs/pull/208
So, if anyone is able to reproduce the problem or runs into it from time to time, then it would be nice to test the change.
Comment 4 Andriy Gapon freebsd_committer 2017-02-10 15:22:54 UTC
Can anyone reproduce this problem using recent FreeBSD head (r309250 or newer) or stable/11?
Comment 5 Alan Somers freebsd_committer 2017-02-10 22:34:25 UTC
I can still reproduce my version of the failure on head at r313567.
Comment 6 Andriy Gapon freebsd_committer 2017-02-10 23:05:17 UTC
(In reply to Alan Somers from comment #5)
I think that it's worthwhile opening a new bug for your issue.
Comment 7 Andriy Gapon freebsd_committer 2017-02-21 10:50:02 UTC
Closing this bug.
Please re-open if the problem from comment #0 happens again.
Comment 8 emz 2017-09-14 06:14:15 UTC
Just got this on a 11.0-STABLE r310734.

panic: solaris assert: !zilog_is_dirty(zilog), file:
/usr/src-11-r310734/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c,
line: 1861

Crashdump available if needed.
Comment 9 Andriy Gapon freebsd_committer 2017-09-14 08:12:25 UTC
(In reply to emz from comment #8)
Please provide as much info from the crash dump as you are to extract with kgdb.
At minimum the stack trace and *zilog, *lwb, *zilog->zl_dmu_pool in the zil_close frame.
Comment 10 emz 2017-09-15 08:00:52 UTC
Here they are (I can also attach the kernel.full along with the compressed crashdump and put it on a web-server, in case you need it):

<7>sonewconn: pcb 0xfffff8014c2cb740: Listen queue overflow: 193 already in queue awaiting acceptance (1034 occurrences)
panic: solaris assert: !zilog_is_dirty(zilog), file: /usr/src-11-r310734/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c, line: 1861
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff80b023a7 at kdb_backtrace+0x67
#1 0xffffffff80ab88e6 at vpanic+0x186
#2 0xffffffff80ab8753 at panic+0x43
#3 0xffffffff824fa20a at assfail+0x1a
#4 0xffffffff8228f9da at zil_close+0x16a
#5 0xffffffff822c4ee5 at zvol_last_close+0x15
#6 0xffffffff822c5e2a at zvol_d_close+0x6a
#7 0xffffffff8096af83 at devfs_close+0x5d3
#8 0xffffffff810db544 at VOP_CLOSE_APV+0x84
#9 0xffffffff80b8b9d2 at vgonel+0xe2
#10 0xffffffff80b8b258 at vflush+0x4e8
#11 0xffffffff80966878 at devfs_unmount+0x38
#12 0xffffffff80b819b5 at dounmount+0x865
#13 0xffffffff80b8ce04 at vfs_unmountall+0xc4
#14 0xffffffff80b69e40 at bufshutdown+0x440
#15 0xffffffff80ab8064 at kern_reboot+0x244
#16 0xffffffff80ab7e13 at sys_reboot+0x583
#17 0xffffffff80f7366e at amd64_syscall+0x50e
Uptime: 7d1h51m39s
Dumping 26476 out of 65386 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

Reading symbols from /boot/kernel/zfs.ko...Reading symbols from /usr/lib/debug//boot/kernel/zfs.ko.debug...done.
done.
Loaded symbols for /boot/kernel/zfs.ko
Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from /usr/lib/debug//boot/kernel/opensolaris.ko.debug...done.
done.
Loaded symbols for /boot/kernel/opensolaris.ko
Reading symbols from /boot/kernel/geom_mirror.ko...Reading symbols from /usr/lib/debug//boot/kernel/geom_mirror.ko.debug...done.
done.
Loaded symbols for /boot/kernel/geom_mirror.ko
Reading symbols from /boot/kernel/pf.ko...Reading symbols from /usr/lib/debug//boot/kernel/pf.ko.debug...done.
done.
Loaded symbols for /boot/kernel/pf.ko
Reading symbols from /boot/kernel/ichwd.ko...Reading symbols from /usr/lib/debug//boot/kernel/ichwd.ko.debug...done.
done.
Loaded symbols for /boot/kernel/ichwd.ko
Reading symbols from /boot/kernel/ctl.ko...Reading symbols from /usr/lib/debug//boot/kernel/ctl.ko.debug...done.
done.
Loaded symbols for /boot/kernel/ctl.ko
Reading symbols from /boot/kernel/iscsi.ko...Reading symbols from /usr/lib/debug//boot/kernel/iscsi.ko.debug...done.
done.
Loaded symbols for /boot/kernel/iscsi.ko
#0  doadump (textdump=<value optimized out>) at pcpu.h:222
222     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) bt
#0  doadump (textdump=<value optimized out>) at pcpu.h:222
#1  0xffffffff80ab8369 in kern_reboot (howto=260) at /usr/src-11-r310734/sys/kern/kern_shutdown.c:366
#2  0xffffffff80ab8920 in vpanic (fmt=<value optimized out>, ap=<value optimized out>)
    at /usr/src-11-r310734/sys/kern/kern_shutdown.c:759
#3  0xffffffff80ab8753 in panic (fmt=<value optimized out>) at /usr/src-11-r310734/sys/kern/kern_shutdown.c:690
#4  0xffffffff824fa20a in assfail (a=<value optimized out>, f=<value optimized out>, l=<value optimized out>)
    at /usr/src-11-r310734/sys/cddl/compat/opensolaris/kern/opensolaris_cmn_err.c:81
#5  0xffffffff8228f9da in zil_close (zilog=0xfffff805810a3800)
    at /usr/src-11-r310734/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c:1861
#6  0xffffffff822c4ee5 in zvol_last_close (zv=0xfffff80f84aff000)
    at /usr/src-11-r310734/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c:863
#7  0xffffffff822c5e2a in zvol_d_close (dev=<value optimized out>, flags=2228228, fmt=<value optimized out>, 
    td=<value optimized out>) at /usr/src-11-r310734/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c:3116
#8  0xffffffff8096af83 in devfs_close (ap=<value optimized out>) at /usr/src-11-r310734/sys/fs/devfs/devfs_vnops.c:648
#9  0xffffffff810db544 in VOP_CLOSE_APV (vop=<value optimized out>, a=<value optimized out>) at vnode_if.c:534
#10 0xffffffff80b8b9d2 in vgonel (vp=0xfffff8056efcc938) at vnode_if.h:225
#11 0xffffffff80b8b258 in vflush (mp=<value optimized out>, rootrefs=<value optimized out>, flags=2, 
    td=<value optimized out>) at /usr/src-11-r310734/sys/kern/vfs_subr.c:3034
#12 0xffffffff80966878 in devfs_unmount (mp=0xfffff800138b2660, mntflags=<value optimized out>)
    at /usr/src-11-r310734/sys/fs/devfs/devfs_vfsops.c:188
#13 0xffffffff80b819b5 in dounmount (mp=<value optimized out>, flags=<value optimized out>, td=<value optimized out>)
    at /usr/src-11-r310734/sys/kern/vfs_mount.c:1381
#14 0xffffffff80b8ce04 in vfs_unmountall () at /usr/src-11-r310734/sys/kern/vfs_subr.c:3878
#15 0xffffffff80b69e40 in bufshutdown (show_busybufs=0) at /usr/src-11-r310734/sys/kern/vfs_bio.c:1285
#16 0xffffffff80ab8064 in kern_reboot (howto=0) at /usr/src-11-r310734/sys/kern/kern_shutdown.c:352
#17 0xffffffff80ab7e13 in sys_reboot (td=<value optimized out>, uap=<value optimized out>)
    at /usr/src-11-r310734/sys/kern/kern_shutdown.c:228
#18 0xffffffff80f7366e in amd64_syscall (td=0xfffff80076170a00, traced=0) at subr_syscall.c:135
#19 0xffffffff80f55cdb in Xfast_syscall () at /usr/src-11-r310734/sys/amd64/amd64/exception.S:396
#20 0x000000080086ebda in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb) frame 5
#5  0xffffffff8228f9da in zil_close (zilog=0xfffff805810a3800)
    at /usr/src-11-r310734/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c:1861
1861            VERIFY(!zilog_is_dirty(zilog));
(kgdb) p *zilog->zl_dmu_pool
$3 = {dp_spa = 0xfffff80024aff000, dp_meta_objset = 0xfffff80024905000, dp_root_dir = 0xfffff800138f9400, 
  dp_mos_dir = 0xfffff800243ab400, dp_free_dir = 0xfffff8002443d800, dp_leak_dir = 0x0, 
  dp_origin_snap = 0xfffff80013f39800, dp_root_dir_obj = 2, dp_vnrele_taskq = 0xfffff800139d2d10, dp_meta_rootbp = {
    blk_dva = 0xfffff80013c3d048, blk_prop = 9226475966770118659, blk_pad = 0xfffff80013c3d080, blk_phys_birth = 0, 
    blk_birth = 2459487, blk_fill = 7305, blk_cksum = {zc_word = 0xfffff80013c3d0a8}}, dp_tmp_userrefs_obj = 33211, 
  dp_free_bpobj = {bpo_lock = {lock_object = {lo_name = 0xffffffff8231b35d "bpo->bpo_lock", lo_flags = 40960000, 
        lo_data = 0, lo_witness = 0x0}, sx_lock = 1}, bpo_os = 0xfffff80024905000, bpo_object = 11, bpo_epb = 1024, 
    bpo_havecomp = 1 '\001', bpo_havesubobj = 1 '\001', bpo_phys = 0xfffff800243f3800, bpo_dbuf = 0xfffff800243e3e80, 
    bpo_cached_dbuf = 0x0}, dp_bptree_obj = 18561, dp_empty_bpobj = 65, dp_scan = 0xfffff800243c4200, dp_lock = {
    lock_object = {lo_name = 0xffffffff82321731 "dp->dp_lock", lo_flags = 40960000, lo_data = 0, lo_witness = 0x0}, 
    sx_lock = 1}, dp_spaceavail_cv = {cv_description = 0xffffffff8232173e "dp->dp_spaceavail_cv", cv_waiters = 0}, 
  dp_dirty_pertxg = 0xfffff80013c3d168, dp_dirty_total = 4227072, dp_mos_used_delta = 0, dp_mos_compressed_delta = 0, 
  dp_mos_uncompressed_delta = 0, dp_last_wakeup = 654040753364800, dp_tx = {tx_cpu = 0xfffffe0010acb000, 
    tx_sync_lock = {lock_object = {lo_name = 0xffffffff82328df4 "tx->tx_sync_lock", lo_flags = 40960000, lo_data = 0, 
        lo_witness = 0x0}, sx_lock = 1}, tx_open_txg = 2459489, tx_quiesced_txg = 0, tx_syncing_txg = 2459488, 
    tx_synced_txg = 2459487, tx_open_time = 611499869469547, tx_sync_txg_waiting = 2459487, 
    tx_quiesce_txg_waiting = 2459489, tx_sync_more_cv = {cv_description = 0xffffffff82328e06 "tx->tx_sync_more_cv", 
      cv_waiters = 0}, tx_sync_done_cv = {cv_description = 0xffffffff82328e1b "tx->tx_sync_done_cv", cv_waiters = 0}, 
    tx_quiesce_more_cv = {cv_description = 0xffffffff82328e30 "tx->tx_quiesce_more_cv", cv_waiters = 1}, 
    tx_quiesce_done_cv = {cv_description = 0xffffffff82328e48 "tx->tx_quiesce_done_cv", cv_waiters = 0}, 
    tx_timeout_cv = {cv_description = 0x0, cv_waiters = 0}, tx_exit_cv = {
      cv_description = 0xffffffff82328e60 "tx->tx_exit_cv", cv_waiters = 0}, tx_threads = 2 '\002', 
    tx_exiting = 0 '\0', tx_sync_thread = 0xfffff8002c49d000, tx_quiesce_thread = 0xfffff8002c48a500, 
    tx_commit_cb_taskq = 0x0}, dp_dirty_datasets = {tl_lock = {lock_object = {
        lo_name = 0xffffffff82328eb9 "tl->tl_lock", lo_flags = 40960000, lo_data = 0, lo_witness = 0x0}, sx_lock = 1}, 
    tl_offset = 456, tl_head = 0xfffff80013c3d2b8}, dp_dirty_zilogs = {tl_lock = {lock_object = {
        lo_name = 0xffffffff82328eb9 "tl->tl_lock", lo_flags = 40960000, lo_data = 0, lo_witness = 0x0}, sx_lock = 1}, 
    tl_offset = 984, tl_head = 0xfffff80013c3d300}, dp_dirty_dirs = {tl_lock = {lock_object = {
        lo_name = 0xffffffff82328eb9 "tl->tl_lock", lo_flags = 40960000, lo_data = 0, lo_witness = 0x0}, sx_lock = 1}, 
    tl_offset = 80, tl_head = 0xfffff80013c3d348}, dp_sync_tasks = {tl_lock = {lock_object = {
        lo_name = 0xffffffff82328eb9 "tl->tl_lock", lo_flags = 40960000, lo_data = 0, lo_witness = 0x0}, sx_lock = 1}, 
    tl_offset = 0, tl_head = 0xfffff80013c3d390}, dp_config_rwlock = {rr_lock = {lock_object = {
        lo_name = 0xffffffff82324886 "rrl->rr_lock", lo_flags = 40960000, lo_data = 0, lo_witness = 0x0}, 
      sx_lock = 1}, rr_cv = {cv_description = 0xffffffff82324894 "rrl->rr_cv", cv_waiters = 0}, rr_writer = 0x0, 
    rr_anon_rcount = {rc_count = 0}, rr_linked_rcount = {rc_count = 0}, rr_writer_wanted = 0, rr_track_all = 1}, 
  dp_blkstats = 0x0}
(kgdb) p *zilog
$4 = {zl_lock = {lock_object = {lo_name = 0xffffffff8232bef5 "zilog->zl_lock", lo_flags = 40960000, lo_data = 0, 
      lo_witness = 0x0}, sx_lock = 1}, zl_dmu_pool = 0xfffff80013c3d000, zl_spa = 0xfffff80024aff000, 
  zl_header = 0xfffff80031571150, zl_os = 0xfffff80031571000, zl_get_data = 0xffffffff822c4d60 <zvol_get_data>, 
  zl_root_zio = 0x0, zl_lr_seq = 0, zl_commit_lr_seq = 0, zl_destroy_txg = 3, zl_replayed_seq = 0xfffff805810a3868, 
  zl_replaying_seq = 0, zl_suspend = 0, zl_cv_writer = {cv_description = 0xffffffff8232bf37 "zilog->zl_cv_writer", 
    cv_waiters = 0}, zl_cv_suspend = {cv_description = 0xffffffff8232bf4c "zilog->zl_cv_suspend", cv_waiters = 0}, 
  zl_suspending = 0 '\0', zl_keep_first = 0 '\0', zl_replay = 0 '\0', zl_stop_sync = 0 '\0', zl_writer = 0 '\0', 
  zl_logbias = 0 '\0', zl_sync = 0 '\0', zl_parse_error = 0, zl_parse_blk_seq = 0, zl_parse_lr_seq = 0, 
  zl_parse_blk_count = 0, zl_parse_lr_count = 0, zl_next_batch = 2, zl_com_batch = 1, 
  zl_cv_batch = 0xfffff805810a38f8, zl_itxg = 0xfffff805810a3918, zl_itx_commit_list = {list_size = 80, 
    list_offset = 0, list_head = {list_next = 0xfffff805810a3a08, list_prev = 0xfffff805810a3a08}}, 
  zl_itx_list_sz = 0, zl_cur_used = 0, zl_lwb_list = {list_size = 192, list_offset = 176, list_head = {
      list_next = 0xfffff80857361bf0, list_prev = 0xfffff80857361bf0}}, zl_vdev_lock = {lock_object = {
      lo_name = 0xffffffff8232bf22 "zilog->zl_vdev_lock", lo_flags = 40960000, lo_data = 0, lo_witness = 0x0}, 
    sx_lock = 1}, zl_vdev_tree = {avl_root = 0x0, avl_compar = 0xffffffff8228f700 <zil_vdev_compare>, avl_offset = 8, 
    avl_numnodes = 0, avl_size = 32}, zl_clean_taskq = 0xfffff8015cc60110, zl_bp_tree = {avl_root = 0x0, 
    avl_compar = 0, avl_offset = 0, avl_numnodes = 0, avl_size = 0}, zl_replay_time = 0, zl_replay_blks = 0, 
  zl_old_header = {zh_claim_txg = 0, zh_replay_seq = 0, zh_log = {blk_dva = 0xfffff805810a3ae0, blk_prop = 0, 
      blk_pad = 0xfffff805810a3b18, blk_phys_birth = 0, blk_birth = 0, blk_fill = 0, blk_cksum = {
        zc_word = 0xfffff805810a3b40}}, zh_claim_blk_seq = 0, zh_flags = 0, zh_claim_lr_seq = 0, 
    zh_pad = 0xfffff805810a3b78}, zl_prev_blks = 0xfffff805810a3b90, zl_prev_rotor = 0, zl_dirty_link = {
    tn_next = 0xfffff805810a3bd8, tn_member = 0xfffff805810a3bf8 "\001"}}
(kgdb) p *lwb
No symbol "lwb" in current context.
(kgdb)
Comment 11 Andriy Gapon freebsd_committer 2017-09-15 14:16:14 UTC
(In reply to emz from comment #10)
Interesting.  So, unlike the original report your crash is related to a ZFS volume, not a filesystem.  It seems that the volume is configure in "dev" mode...

Can you check what specific volume caused the panic and describe how you use?
E.g. if it's used for swap or any particular filesystem, etc.
Thanks!

P.S.
Perhaps it would be better to open a new bug report given the differences from the original report.