I'm observing hanging processes on a ZFS system, apparently after umount tried to sync data before unmounting a jail's devfs.

tools-1-b# service jail restart git-1-b
Stopping jails: git-1-b
load: 1.77  cmd: umount 79620 [zilog->zl_cv_batch[0]] 568.08r 0.00u 0.01s 0% 2000k
load: 1.65  cmd: umount 79620 [zilog->zl_cv_batch[0]] 575.16r 0.00u 0.01s 0% 2000k
load: 2.32  cmd: umount 79620 [zilog->zl_cv_batch[0]] 580.44r 0.00u 0.01s 0% 2000k

tools-1-b# procstat -kk 79620
  PID    TID COMM             TDNAME           KSTACK
79620 100216 umount           -                mi_switch+0xd2 sleepq_wait+0x3a _cv_wait+0x194 zil_commit+0x85 zfs_sync+0xad sys_sync+0x146 amd64_syscall+0x4ce Xfast_syscall+0xfb

tools-1-b# kgdb
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...(no debugging symbols found)...
Attempt to extract a component of a value that is not a structure pointer.
Attempt to extract a component of a value that is not a structure pointer.
#0  0xffffffff80b0cafb in sched_switch ()
(kgdb) tid 100216
[Switching to thread 2772 (Thread 100216)]#0  0xffffffff80b0cafb in sched_switch ()
(kgdb) bt
#0  0xffffffff80b0cafb in sched_switch ()
#1  0xffffffff80ae4ae2 in mi_switch ()
#2  0xffffffff80b3226a in sleepq_wait ()
#3  0xffffffff80a66464 in _cv_wait ()
#4  0xffffffff82252945 in ?? ()
#5  0xfffffe201d5308e0 in ?? ()
#6  0xfffff805d1af7500 in ?? ()
#7  0xfffffe201d530870 in ?? ()
#8  0xffffffff80bae521 in __mnt_vnode_markerfree_all ()
#9  0xffffffff8227aded in ?? ()
#10 0x00000000fffff7ff in ?? ()
#11 0xfffffe201d530968 in ?? ()
#12 0xfffffe201d530990 in ?? ()
#13 0xffffffff80bb48a6 in sys_sync ()
Previous frame identical to this frame (corrupt stack?)
(kgdb)

Two other processes are hanging too:

tools-1-b# ps ax | awk '$3~"D"&&$5~"^/"'
36031  -  DsJ     0:00.49 /var/cfengine/bin/cf-agent -Dfrom_cfexecd,scheduled_
40395  -  DJ      0:00.01 /var/cfengine/bin/cf-promises -c /var/cfengine/input
79620  0  D+      0:00.01 /sbin/umount /jails/git-1-b/dev

tools-1-b# procstat -kk 36031
  PID    TID COMM             TDNAME           KSTACK
36031 103064 cf-agent         -                mi_switch+0xd2 sleepq_wait+0x3a _cv_wait+0x194 zil_commit+0x85 zfs_freebsd_putpages+0x71e VOP_PUTPAGES_APV+0x8d vnode_pager_putpages+0xe3 vm_pageout_flush+0xed vm_object_page_collect_flush+0x21a vm_object_page_clean+0x197 vm_object_terminate+0x99 zfs_freebsd_reclaim+0x1e VOP_RECLAIM_APV+0x8f vgonel+0x2ef vrecycle+0x90 zfs_freebsd_inactive+0xd VOP_INACTIVE_APV+0x8d vinactive+0x17d

tools-1-b# procstat -kk 40395
  PID    TID COMM             TDNAME           KSTACK
40395 103070 cf-promises      -                mi_switch+0xd2 sleepq_wait+0x3a _cv_wait+0x194 zil_commit+0x85 zfs_freebsd_putpages+0x71e VOP_PUTPAGES_APV+0x8d vnode_pager_putpages+0xe3 vm_pageout_flush+0xed vm_object_page_collect_flush+0x21a vm_object_page_clean+0x197 vm_object_terminate+0x99 zfs_freebsd_reclaim+0x1e VOP_RECLAIM_APV+0x8f vgonel+0x2ef vrecycle+0x90 zfs_freebsd_inactive+0xd VOP_INACTIVE_APV+0x8d vinactive+0x17d

Everything is on ZFS, with both the mothership and the jail in zroot/jails/cfengine-1-b sharing the pool "zroot":

tools-1-b# zpool list
NAME    SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
data   2.91T  1.43M  2.91T         -     0%     0%  2.14x  ONLINE  -
zroot   744G  19.6G   724G         -    19%     2%  1.60x  ONLINE  -

I also observe another process being reported at 100% CPU usage, which is highly unusual, likewise inside the cfengine jail:

tools-1-b# ps w 2988
 PID TT  STAT     TIME COMMAND
2988  -  TsJ  118:40.91 /usr/local/sbin/cf-serverd

tools-1-b# procstat -kk 2988
  PID    TID COMM             TDNAME           KSTACK
 2988 100835 cf-serverd       -                mi_switch+0xd2 thread_suspend_check+0x34a ast+0x5be doreti_ast+0x1f
 2988 100922 cf-serverd       -                mi_switch+0xd2 thread_suspend_check+0x34a ast+0x5be doreti_ast+0x1f
 2988 100963 cf-serverd       -                dmu_object_info_from_db+0x2d zfs_zget+0xde zfs_get_data+0x40 zil_commit+0x6a1 zfs_freebsd_putpages+0x71e VOP_PUTPAGES_APV+0x8d vnode_pager_putpages+0xe3 vm_pageout_flush+0xed vm_object_page_collect_flush+0x21a vm_object_page_clean+0x197 vm_object_terminate+0x99 zfs_freebsd_reclaim+0x1e VOP_RECLAIM_APV+0x8f vgonel+0x2ef vrecycle+0x90 zfs_freebsd_inactive+0xd VOP_INACTIVE_APV+0x8d vinactive+0x17d
 2988 100966 cf-serverd       -                mi_switch+0xd2 thread_suspend_check+0x34a ast+0x5be doreti_ast+0x1f
 2988 100970 cf-serverd       -                mi_switch+0xd2 thread_suspend_check+0x34a ast+0x5be doreti_ast+0x1f
 2988 100973 cf-serverd       -                mi_switch+0xd2 thread_suspend_check+0x34a ast+0x5be doreti_ast+0x1f
[... 1000 more threads with exactly "mi_switch+0xd2 thread_suspend_check+0x34a ast+0x5be doreti_ast+0x1f"]

Thread 100963 seems to be looping around:

tools-1-b# {repeat 20 procstat -k 2988} | awk '$2=="100963"' | cut -d- -f3- | sort | uniq
 dbuf_hold_impl dbuf_hold dnode_hold_impl dmu_bonus_hold zfs_zget zfs_get_data zil_commit zfs_freebsd_putpages VOP_PUTPAGES_APV vnode_pager_putpages vm_pageout_flush vm_object_page_collect_flush vm_object_page_clean vm_object_terminate zfs_freebsd_reclaim VOP_RECLAIM_APV vgonel vrecycle
 dbuf_read dmu_bonus_hold zfs_zget zfs_get_data zil_commit zfs_freebsd_putpages VOP_PUTPAGES_APV vnode_pager_putpages vm_pageout_flush vm_object_page_collect_flush vm_object_page_clean vm_object_terminate zfs_freebsd_reclaim VOP_RECLAIM_APV vgonel vrecycle zfs_freebsd_inactive VOP_INACTIVE_APV
 dbuf_read dnode_hold_impl dmu_bonus_hold zfs_zget zfs_get_data zil_commit zfs_freebsd_putpages VOP_PUTPAGES_APV vnode_pager_putpages vm_pageout_flush vm_object_page_collect_flush vm_object_page_clean vm_object_terminate zfs_freebsd_reclaim VOP_RECLAIM_APV vgonel vrecycle zfs_freebsd_inactive
 dmu_bonus_hold zfs_zget zfs_get_data zil_commit zfs_freebsd_putpages VOP_PUTPAGES_APV vnode_pager_putpages vm_pageout_flush vm_object_page_collect_flush vm_object_page_clean vm_object_terminate zfs_freebsd_reclaim VOP_RECLAIM_APV vgonel vrecycle zfs_freebsd_inactive VOP_INACTIVE_APV vinactive
 dmu_object_info_from_db zfs_zget zfs_get_data zil_commit zfs_freebsd_putpages VOP_PUTPAGES_APV vnode_pager_putpages vm_pageout_flush vm_object_page_collect_flush vm_object_page_clean vm_object_terminate zfs_freebsd_reclaim VOP_RECLAIM_APV vgonel vrecycle zfs_freebsd_inactive VOP_INACTIVE_APV vinactive
 dmu_object_info_from_dnode dmu_object_info_from_db zfs_zget zfs_get_data zil_commit zfs_freebsd_putpages VOP_PUTPAGES_APV vnode_pager_putpages vm_pageout_flush vm_object_page_collect_flush vm_object_page_clean vm_object_terminate zfs_freebsd_reclaim VOP_RECLAIM_APV vgonel vrecycle zfs_freebsd_inactive VOP_INACTIVE_APV
 dmu_zfetch dbuf_read dnode_hold_impl dmu_bonus_hold zfs_zget zfs_get_data zil_commit zfs_freebsd_putpages VOP_PUTPAGES_APV vnode_pager_putpages vm_pageout_flush vm_object_page_collect_flush vm_object_page_clean vm_object_terminate zfs_freebsd_reclaim VOP_RECLAIM_APV vgonel vrecycle
 dnode_hold_impl dmu_bonus_hold zfs_zget zfs_get_data zil_commit zfs_freebsd_putpages VOP_PUTPAGES_APV vnode_pager_putpages vm_pageout_flush vm_object_page_collect_flush vm_object_page_clean vm_object_terminate zfs_freebsd_reclaim VOP_RECLAIM_APV vgonel vrecycle zfs_freebsd_inactive VOP_INACTIVE_APV
 dnode_rele dmu_bonus_hold zfs_zget zfs_get_data zil_commit zfs_freebsd_putpages VOP_PUTPAGES_APV vnode_pager_putpages vm_pageout_flush vm_object_page_collect_flush vm_object_page_clean vm_object_terminate zfs_freebsd_reclaim VOP_RECLAIM_APV vgonel vrecycle zfs_freebsd_inactive VOP_INACTIVE_APV
 zfs_zget zfs_get_data zil_commit zfs_freebsd_putpages VOP_PUTPAGES_APV vnode_pager_putpages vm_pageout_flush vm_object_page_collect_flush vm_object_page_clean vm_object_terminate zfs_freebsd_reclaim VOP_RECLAIM_APV vgonel vrecycle zfs_freebsd_inactive VOP_INACTIVE_APV vinactive vputx

Only the processes listed above are affected; about 200 others are running (and writing to the zroot pool).
I seem to be able to reproduce this on a similar system: have cfengine running and run a shell loop like "while :; do sync; done":

tools-2-b# ps ax | grep sync
   27  -  DL      1:27.96 [syncer]
 3667  1  R+   1017:57.50 sync

tools-2-b# procstat -kk 3667
  PID    TID COMM             TDNAME           KSTACK
 3667 102488 sync             -                dbuf_read+0x155 dnode_hold_impl+0x188 dmu_bonus_hold+0x1d zfs_zget+0xc5 zfs_get_data+0x40 zil_commit+0x6a1 zfs_sync+0xad sys_sync+0x146 amd64_syscall+0x4ce Xfast_syscall+0xfb
tools-2-b# procstat -kk 3667
  PID    TID COMM             TDNAME           KSTACK
 3667 102488 sync             -                dnode_hold_impl+0x409 dmu_bonus_hold+0x1d zfs_zget+0xc5 zfs_get_data+0x40 zil_commit+0x6a1 zfs_sync+0xad sys_sync+0x146 amd64_syscall+0x4ce Xfast_syscall+0xfb
tools-2-b# procstat -kk 3667
  PID    TID COMM             TDNAME           KSTACK
 3667 102488 sync             -                dmu_zfetch+0x7b dbuf_read+0x155 dnode_hold_impl+0x188 dmu_bonus_hold+0x1d zfs_zget+0xc5 zfs_get_data+0x40 zil_commit+0x6a1 zfs_sync+0xad sys_sync+0x146 amd64_syscall+0x4ce Xfast_syscall+0xfb

tools-2-b# ps ax | awk '$3~"D"' | tail
   22  -  DL      0:00.00 [vmdaemon]
   23  -  DL      0:00.01 [pagezero]
   24  -  DL      0:02.13 [bufspacedaemon]
   25  -  DL      0:02.06 [bufdaemon]
   26  -  DL      0:02.31 [vnlru]
   27  -  DL      1:26.84 [syncer]
  643  -  DL      1:03.87 [pf purge]
 3650  -  DsJ     0:00.05 /var/cfengine/bin/cf-agent -f failsafe.cf
11495  -  DJ      0:00.01 /var/cfengine/bin/cf-agent -f failsafe.cf
96409  -  DL      0:15.85 [ftcleanup]

tools-2-b# procstat -k 3650
  PID    TID COMM             TDNAME           KSTACK
 3650 102929 cf-agent         -                mi_switch sleepq_wait _cv_wait zil_commit zfs_freebsd_putpages VOP_PUTPAGES_APV vnode_pager_putpages vm_pageout_flush vm_object_page_collect_flush vm_object_page_clean vm_object_terminate zfs_freebsd_reclaim VOP_RECLAIM_APV vgonel vrecycle zfs_freebsd_inactive VOP_INACTIVE_APV vinactive

The shell loop is running in a screen session, and the start times indicate it took around five minutes before a sync(8) got stuck:

tools-2-b# ps ax | grep sync
   27  -  DL      1:27.96 [syncer]
 3667  1  R+   1017:57.50 sync
tools-2-b# ps olstart 3667
STARTED
Thu May 11 19:50:28 2017

This is the hanging cf-agent process:

tools-2-b# ps olstart 3650
STARTED
Thu May 11 19:50:28 2017

From this point on, other calls to sync(2) hang as well:

tools-2-b# printf '#include<unistd.h>\n int main(){sync();}' | cc -x c -o sync -
tools-2-b# ./sync &
[1] 67191
tools-2-b# procstat -kk 67191
  PID    TID COMM             TDNAME           KSTACK
67191 103195 sync             -                mi_switch+0xd2 sleepq_wait+0x3a _cv_wait+0x194 zil_commit+0x85 zfs_sync+0xad sys_sync+0x146 amd64_syscall+0x4ce Xfast_syscall+0xfb
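For reference, the ./syncloop binary that shows up further below is, as far as its stacks show, just sync(2) called in a tight loop, like the shell loop above. A minimal sketch (the function name and the count parameter are mine, added only so the loop can be exercised with a bound; the actual repro runs unbounded):

```c
#include <unistd.h>

/*
 * Sketch of a syncloop-style repro: issue sync(2) back to back.
 * A negative count loops forever, which is how the repro is run;
 * on this ZFS system each call ends up in zil_commit() via
 * zfs_sync(), as seen in the kernel stacks.
 */
long syncloop(long count)
{
	long done = 0;

	while (count < 0 || done < count) {
		sync();		/* schedule write-out of all dirty data */
		done++;
	}
	return done;
}
```

The actual repro simply calls this unbounded, i.e. syncloop(-1), from main().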
Created attachment 182550 [details]
"procstat -kk -a" output

I'm now on:

tools-2-b# uname -a
FreeBSD tools-2-b 11.0-RELEASE-p10 FreeBSD 11.0-RELEASE-p10 #0 r318236: Fri May 12 13:52:37 CEST 2017     root@tools-2-b:/usr/obj/usr/src/sys/DEBUG  amd64

This is GENERIC plus:

options INVARIANTS
options INVARIANT_SUPPORT
options WITNESS
options WITNESS_SKIPSPIN
options DEBUG_LOCKS
options DEBUG_VFS_LOCKS
options DIAGNOSTIC

While running sync(2) in a loop, cf-agent hangs quickly:

tools-2-b# ktrace cf-agent
load: 20.62  cmd: cf-agent 5360 [zilog->zl_cv_batch[0]] 162.96r 0.01u 0.06s 0% 8740k

tools-2-b# kdump -f ktrace.out | tail
  5360 cf-agent CALL  close(0x5)
  5360 cf-agent RET   close 0
  5360 cf-agent CALL  munmap(0x8006b3000,0x10)
  5360 cf-agent RET   munmap 0
  5360 cf-agent CALL  munmap(0x8006b4000,0x10)
  5360 cf-agent RET   munmap 0
  5360 cf-agent CALL  fcntl(0x4,F_SETLK,0xffffd450)
  5360 cf-agent RET   fcntl 0
  5360 cf-agent CALL  unlink(0x7fffffffd080)
  5360 cf-agent NAMI  "/tmp/SEMDMDBrvXw+F@)aP1"

tools-2-b# procstat -kk 5360
  PID    TID COMM             TDNAME           KSTACK
 5360 102508 cf-agent         -                mi_switch+0x167 sleepq_switch+0xe7 sleepq_wait+0x43 _cv_wait+0x1e4 zil_commit+0x85 zfs_freebsd_putpages+0x7d6 VOP_PUTPAGES_APV+0x104 vnode_pager_putpages+0x89 vm_pageout_flush+0x12d vm_object_page_collect_flush+0x23a vm_object_page_clean+0x1be vm_object_terminate+0xa9 vnode_destroy_vobject+0x79 zfs_freebsd_reclaim+0x63 VOP_RECLAIM_APV+0x104 vgonel+0x2bd vrecycle+0x4a zfs_freebsd_inactive+0xd

tools-2-b# ps ax | grep sync
   27  -  DL      0:00.09 [syncer]
15560  0  S+      0:00.00 grep sync
 5177  1  RN      7:16.42 ./syncloop

tools-2-b# kill 5177
tools-2-b# kill -9 5177
tools-2-b# procstat -kk 5177
  PID    TID COMM             TDNAME           KSTACK
 5177 102515 syncloop         -                zfs_zget+0x20f zfs_get_data+0x7c zil_commit+0x926 zfs_sync+0xad sys_sync+0xe6 amd64_syscall+0x2f6 Xfast_syscall+0xfb
tools-2-b# procstat -kk 5177
  PID    TID COMM             TDNAME           KSTACK
 5177 102515 syncloop         -                zfs_zget+0x1b4 zfs_get_data+0x7c zil_commit+0x926 zfs_sync+0xad sys_sync+0xe6 amd64_syscall+0x2f6 Xfast_syscall+0xfb
tools-2-b# procstat -kk 5177
  PID    TID COMM             TDNAME           KSTACK
 5177 102515 syncloop         -                _sx_slock+0x73 dnode_verify+0xe2 dnode_hold_impl+0x53e dmu_bonus_hold+0x25 zfs_zget+0xcf zfs_get_data+0x7c zil_commit+0x926 zfs_sync+0xad sys_sync+0xe6 amd64_syscall+0x2f6 Xfast_syscall+0xfb
tools-2-b# procstat -kk 5177
  PID    TID COMM             TDNAME           KSTACK
 5177 102515 syncloop         -                dnode_hold_impl+0x53e dmu_bonus_hold+0x25 zfs_zget+0xcf zfs_get_data+0x7c zil_commit+0x926 zfs_sync+0xad sys_sync+0xe6 amd64_syscall+0x2f6 Xfast_syscall+0xfb

tools-2-b# procstat -kk -a > procstat-kk-a.txt
tools-2-b# kgdb -q /usr/lib/debug/boot/kernel/kernel.debug /dev/mem
Reading symbols from /usr/lib/debug/boot/kernel/zfs.ko.debug...done.
Loaded symbols for /usr/lib/debug/boot/kernel/zfs.ko.debug
Reading symbols from /usr/lib/debug/boot/kernel/opensolaris.ko.debug...done.
Loaded symbols for /usr/lib/debug/boot/kernel/opensolaris.ko.debug
Reading symbols from /usr/lib/debug/boot/kernel/carp.ko.debug...done.
Loaded symbols for /usr/lib/debug/boot/kernel/carp.ko.debug
Reading symbols from /usr/lib/debug/boot/kernel/if_lagg.ko.debug...done.
Loaded symbols for /usr/lib/debug/boot/kernel/if_lagg.ko.debug
Reading symbols from /usr/lib/debug/boot/kernel/ums.ko.debug...done.
Loaded symbols for /usr/lib/debug/boot/kernel/ums.ko.debug
Reading symbols from /usr/lib/debug/boot/kernel/pf.ko.debug...done.
Loaded symbols for /usr/lib/debug/boot/kernel/pf.ko.debug
#0  sched_switch (td=0xfffff801b1245500, newtd=0xfffff80115531000,
    flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
1973            cpuid = PCPU_GET(cpuid);
(kgdb) tid 102508
[Switching to thread 1672 (Thread 102508)]#0  sched_switch (
    td=0xfffff8055a385000, newtd=0xfffff80115533000,
    flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
1973            cpuid = PCPU_GET(cpuid);
Current language:  auto; currently minimal
#0  sched_switch (td=0xfffff8055a385000, newtd=0xfffff80115533000,
    flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
#1  0xffffffff80a497d7 in mi_switch (flags=260, newtd=0x0)
    at /usr/src/sys/kern/kern_synch.c:455
#2  0xffffffff80a8dac7 in sleepq_switch (wchan=<value optimized out>, pri=0)
    at /usr/src/sys/kern/subr_sleepqueue.c:566
#3  0xffffffff80a8d993 in sleepq_wait (wchan=0xfffff801b112f8f8, pri=0)
    at /usr/src/sys/kern/subr_sleepqueue.c:646
#4  0xffffffff809e11b4 in _cv_wait (cvp=<value optimized out>,
    lock=<value optimized out>) at /usr/src/sys/kern/kern_condvar.c:144
#5  0xffffffff82238545 in zil_commit (zilog=<value optimized out>,
    foid=<value optimized out>)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c:1574
#6  0xffffffff8226c746 in zfs_freebsd_putpages (ap=<value optimized out>)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:4767
#7  0xffffffff81005e74 in VOP_PUTPAGES_APV (vop=<value optimized out>,
    a=<value optimized out>) at vnode_if.c:2930
#8  0xffffffff80d3a349 in vnode_pager_putpages (object=<value optimized out>,
    m=0xfffffe201e33a3f0, count=<value optimized out>, flags=0,
    rtvals=0xfffffe201e33a360) at vnode_if.h:1224
#9  0xffffffff80d2f67d in vm_pageout_flush (mc=<value optimized out>,
    count=<value optimized out>, flags=1, mreq=0, prunlen=0xfffffe201e33a4ac,
    eio=0xfffffe201e33a530) at vm_pager.h:126
#10 0xffffffff80d26fca in vm_object_page_collect_flush (
    object=<value optimized out>, p=<value optimized out>, pagerflags=1,
    flags=0, clearobjflags=<value optimized out>, eio=0xfffffe201e33a530)
    at /usr/src/sys/vm/vm_object.c:979
#11 0xffffffff80d26cbe in vm_object_page_clean (object=<value optimized out>,
    start=<value optimized out>, end=<value optimized out>,
    flags=<value optimized out>) at /usr/src/sys/vm/vm_object.c:905
#12 0xffffffff80d26829 in vm_object_terminate (object=<value optimized out>)
    at /usr/src/sys/vm/vm_object.c:739
#13 0xffffffff80d3a869 in vnode_destroy_vobject (vp=0xfffff804e7292000)
    at /usr/src/sys/vm/vnode_pager.c:173
#14 0xffffffff822761e3 in zfs_freebsd_reclaim (ap=<value optimized out>)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:5367
#15 0xffffffff810047a4 in VOP_RECLAIM_APV (vop=<value optimized out>,
    a=<value optimized out>) at vnode_if.c:2021
#16 0xffffffff80afdf0d in vgonel (vp=<value optimized out>) at vnode_if.h:830
#17 0xffffffff80afe45a in vrecycle (vp=0xfffff804e7292000)
    at /usr/src/sys/kern/vfs_subr.c:3087
#18 0xffffffff8227616d in zfs_freebsd_inactive (ap=<value optimized out>)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:5349
#19 0xffffffff810045f4 in VOP_INACTIVE_APV (vop=<value optimized out>,
    a=<value optimized out>) at vnode_if.c:1955
#20 0xffffffff80afca9b in vinactive (vp=0xfffff804e7292000,
    td=0xfffff8055a385000) at vnode_if.h:807
#21 0xffffffff80afd2d8 in vputx (vp=<value optimized out>,
    func=<value optimized out>) at /usr/src/sys/kern/vfs_subr.c:2682
#22 0xffffffff80b05ae2 in kern_unlinkat (td=<value optimized out>, fd=-100,
    path=0x7fffffffd080 <Error reading address 0x7fffffffd080: Bad address>,
    pathseg=UIO_USERSPACE, oldinum=<value optimized out>)
    at /usr/src/sys/kern/vfs_syscalls.c:1774
#23 0xffffffff80ea4426 in amd64_syscall (td=0xfffff8055a385000,
    traced=<value optimized out>) at subr_syscall.c:135
#24 0xffffffff80e8497b in Xfast_syscall ()
    at /usr/src/sys/amd64/amd64/exception.S:396
#25 0x0000000802a72f7a in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb) frame 4
#4  0xffffffff809e11b4 in _cv_wait (cvp=<value optimized out>,
    lock=<value optimized out>) at /usr/src/sys/kern/kern_condvar.c:144
144             sleepq_wait(cvp, 0);
(kgdb) p *cvp
$1 = {cv_description = 0xffffffff81c81848 "I.=\201????",
  cv_waiters = -411880152}
(kgdb) frame 5
#5  0xffffffff82238545 in zil_commit (zilog=<value optimized out>,
    foid=<value optimized out>)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c:1574
1574            cv_wait(&zilog->zl_cv_batch[mybatch & 1], &zilog->zl_lock);
(kgdb) p *zilog
$2 = {zl_lock = {lock_object = {lo_name = 0xffffffff81c81848 "I.=\201????",
      lo_flags = 3883087144, lo_data = 4294965252, lo_witness = 0x0},
    sx_lock = 18446735298679485752}, zl_dmu_pool = 0x0,
  zl_spa = 0xffffffff81bb7cc8, zl_header = 0x0, zl_os = 0xfffff804e7665e80,
  zl_get_data = 0, zl_root_zio = 0x0, zl_lr_seq = 0,
  zl_commit_lr_seq = 18446741874700784480,
  zl_destroy_txg = 18446735282269118464,
  zl_replayed_seq = 0xfffff8055a385068,
  zl_replaying_seq = 18446735284892366080, zl_suspend = 0, zl_cv_writer = {
    cv_description = 0x1906cffffffff <Error reading address 0x1906cffffffff: Bad address>,
    cv_waiters = 0}, zl_cv_suspend = {cv_description = 0x0, cv_waiters = 0},
  zl_suspending = 0 '\0', zl_keep_first = 0 '\0', zl_replay = 0 '\0',
  zl_stop_sync = 0 '\0', zl_writer = 0 '\0', zl_logbias = 0 '\0',
  zl_sync = 0 '\0', zl_parse_error = 0,
  zl_parse_blk_seq = 18446735300605006016,
  zl_parse_lr_seq = 18446735298679485736, zl_parse_blk_count = 1,
  zl_parse_lr_count = 17179869439, zl_next_batch = 2, zl_com_batch = 0,
  zl_cv_batch = 0xfffff8055a3850f8, zl_itxg = 0xfffff8055a385118,
  zl_itx_commit_list = {list_size = 206226633, list_offset = 2, list_head = {
      list_next = 0x8, list_prev = 0x0}}, zl_itx_list_sz = 15863,
  zl_cur_used = 63452, zl_lwb_list = {list_size = 79316, list_offset = 0,
    list_head = {list_next = 0xc4ac4c9, list_prev = 0x0}}, zl_vdev_lock = {
    lock_object = {lo_name = 0x0, lo_flags = 0, lo_data = 0,
      lo_witness = 0x0}, sx_lock = 270582939648}, zl_vdev_tree = {
    avl_root = 0x0, avl_compar = 0, avl_offset = 4, avl_numnodes = 0,
    avl_size = 0}, zl_clean_taskq = 0x612d666300000000, zl_bp_tree = {
    avl_root = 0x746e6567, avl_compar = 0, avl_offset = 0, avl_numnodes = 0,
    avl_size = 0}, zl_replay_time = 0, zl_replay_blks = 0, zl_old_header = {
    zh_claim_txg = 0, zh_replay_seq = 0, zh_log = {
      blk_dva = 0xfffff8055a3852e0, blk_prop = 0,
      blk_pad = 0xfffff8055a385318, blk_phys_birth = 0, blk_birth = 2,
      blk_fill = 18446735284884707568, blk_cksum = {
        zc_word = 0xfffff8055a385340}}, zh_claim_blk_seq = 0, zh_flags = 0,
    zh_claim_lr_seq = 0, zh_pad = 0xfffff8055a385378},
  zl_prev_blks = 0xfffff8055a385390, zl_prev_rotor = 0, zl_dirty_link = {
    tn_next = 0xfffff8055a3853d8,
    tn_member = 0xfffff8055a3853f8 "@֨\200????"}}
(kgdb)
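For context on the [zilog->zl_cv_batch[0]] wait channel: zil_commit() at zil.c:1574 parks committers on zl_cv_batch[mybatch & 1], alternating between two batches, and a waiter may only proceed once the writer has committed its batch. A stripped-down model of just that batching arithmetic (not the kernel code; the struct and function names here are invented, the field names follow the dump above):

```c
/*
 * Minimal model of the two-batch hand-off behind
 * cv_wait(&zilog->zl_cv_batch[mybatch & 1], ...); names are
 * illustrative, only the field names mirror the real zilog_t.
 */
struct zil_batches {
	unsigned long zl_next_batch;	/* batch currently accepting committers */
	unsigned long zl_com_batch;	/* highest batch fully committed */
};

/* Condvar slot (0 or 1) that a committer in batch b sleeps on. */
int zil_batch_slot(unsigned long b)
{
	return (int)(b & 1);
}

/* A waiter in batch b may proceed once the writer has committed batch b. */
int zil_batch_done(const struct zil_batches *z, unsigned long b)
{
	return z->zl_com_batch >= b;
}
```

Read against the dump above: zl_next_batch = 2 and zl_com_batch = 0, so committers in batch 2 sleep on slot 0 (matching the [zilog->zl_cv_batch[0]] wait channel) and can only wake once zl_com_batch reaches their batch number, which never happens here.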
I'm pretty sure I haven't seen this issue on 10.3-RELEASE, and I now fail to reproduce it on 11.1-PRERELEASE and later, all on the same hardware as before. Working versions:

n-3-b% uname -a
FreeBSD n-3-b 11.1-PRERELEASE FreeBSD 11.1-PRERELEASE #0 r319653: Wed Jun  7 15:17:47 CEST 2017     root@n-3-b:/usr/obj/usr/src/sys/DEBUG  amd64

tools-1-b% uname -a
FreeBSD tools-1-b 11.1-BETA1 FreeBSD 11.1-BETA1 #0 r319852: Mon Jun 12 15:12:38 CEST 2017     root@tools-1-b:/usr/obj/usr/src/sys/DEBUG  amd64

nagios-2-b% uname -a
FreeBSD nagios-2-b 11.1-BETA2 FreeBSD 11.1-BETA2 #0 r320148: Tue Jun 20 18:07:08 CEST 2017     root@tools-2-b:/usr/obj/usr/src/sys/GENERIC  amd64

The DEBUG kernel config is GENERIC plus:

options INVARIANTS
options INVARIANT_SUPPORT
options WITNESS
options WITNESS_SKIPSPIN
options DEBUG_LOCKS
options DEBUG_VFS_LOCKS

I had to remove DIAGNOSTIC for performance reasons. With 11.1-RELEASE upcoming I consider this problem solved, although I have failed to spot the relevant changes in the svn log so far. Thank you.