I tried to abort a zfs send:

07:06:26   2.28G   zroot/j/jails/admin-svn@daily.31
07:06:27   2.29G   zroot/j/jails/admin-svn@daily.31
^C
load: 1.27  cmd: zfs 72318 [tx->tx_sync_done_cv] 101.46r 0.00u 1.89s 0% 4212k
load: 0.84  cmd: zfs 72318 [tx->tx_sync_done_cv] 127.51r 0.00u 1.89s 0% 4212k
load: 0.84  cmd: zfs 72318 [tx->tx_sync_done_cv] 127.98r 0.00u 1.89s 0% 4212k

It turns out the freebsd-snapshot cron job was running at that very instant and it led to a deadlock:

# ps -axl | grep tx
0 74866 72290 0 20 0 40336 3528 tx->tx_s D   -  0:00.00 zfs rename zroot/j/jails/monitor@hourly.15 zroot/j/jails/monitor@hourly.16
0 72318 69290 0 20 0 42520 4228 tx->tx_s DE+ 0  0:01.89 zfs send -Rv zroot/j/jails/admin-svn@sync.00

I've seen this before, but never at quite such a bad time. Neither process can be kill -9'ed.
FWIW, the machine couldn't reboot either; it required breaking into ddb and running 'call cpu_reset'.
uname -a? FWIW, I ran into ZFS deadlocks with r288943 when running tests/sys/acl. When I upgraded to r289441, they went away. I was running into apparent deadlocks on the spa namespace lock though, not tx->tx_sync_done_cv. mav's modifications to zfs send/recv might need to be fixed.
It's later than the revision you indicated:

FreeBSD halo.ysv.freebsd.org 11.0-CURRENT FreeBSD 11.0-CURRENT #0 r289488: Sun Oct 18 06:47:35 UTC 2015 peter@build-11.freebsd.org:/usr/obj/usr/src/sys/CLUSTER11 amd64
https://wiki.freebsd.org/AvgZfsDeadlockDebug
Created attachment 167247 [details]
procstat -kk -a output

Output from procstat -kk -a at the moment of the deadlock.
We've run into this exact same issue using the zrep script. It's easily reproducible.

The deadlock occurs when a "zfs rename" is executed while a "zfs send" process is still running. In zrep, this happens when a zfs receive on the remote host fails and zrep immediately attempts to rename a snapshot. However, the zfs send process is still running (cleaning up?) and this causes an immediate deadlock. Any zfs process run after that immediately blocks in state "spa_namespace_lock". IO continues to work, but fails at some point in the future; I've not diagnosed the exact trigger that causes IO to seize up. The only resolution is to power cycle the system.

I've just attached the output of a procstat -kk -a from yesterday when I ran into this. Running on 10.3-PRERELEASE FreeBSD 10.3-PRERELEASE #1 r294572.
(In reply to roel from comment #6) It would be interesting to run kgdb and examine spa_namespace_lock. Forcing a crash dump for post-mortem examination is probably more useful because it would be easier to answer any possible follow-up questions.
(In reply to roel from comment #6) Hi guys, any news about this issue? I'm hitting the same issue when running the zrep script, as roel described. Roel, did you find any workaround for this? Thanks.
Yes, I just introduced a short sleep in the script as a workaround.
(In reply to roel from comment #9) Hi Roel, can you send the patch to my email address? I'd like to apply it. I think it would be a good idea to submit a patch to the zrep port. The worst thing about this issue is that when it happens I have to force a server reboot, and when the machine comes back I've lost all the data written since the issue began. It's unbelievable, I know, but it's happening.
That's not that hard:

*** /home/roel/zrep     Wed Jan 13 22:12:27 2016
--- zrep        Sun Feb 21 13:24:35 2016
***************
*** 1160,1164 ****
  if [[ $? -ne 0 ]] ; then
        zfs rename ${newsnap} ${newsnap}_unsent
        zrep_errquit Problem doing sync for $newsnap. renamed to ${newsnap}_unsent
  fi
---
  if [[ $? -ne 0 ]] ; then
+       echo "Error on send.. waiting 60 seconds to prevent deadlock"
+       sleep 60
        zfs rename ${newsnap} ${newsnap}_unsent
        zrep_errquit Problem doing sync for $newsnap. renamed to ${newsnap}_unsent
  fi
(In reply to roel from comment #11) Hi Roel, thanks for sending it to me. Do you know of any way to reproduce this issue? I tried just taking my second server down and running zrep, but that didn't trigger it. Thanks.
I think that I've been able to reproduce this problem or, at least, something that looks very much like it. I did the standard procstat debugging and I noticed something that did not appear in any of the previous reports:

    6 100572 zfskern          txg_thread_enter  mi_switch+0x167 sleepq_switch+0xe7 sleepq_wait+0x43 _sx_xlock_hard+0x49d _sx_xlock+0xc5 zvol_rename_minors+0x104 dsl_dataset_rename_snapshot_sync_impl+0x308 dsl_dataset_rename_snapshot_sync+0xc1 dsl_sync_task_sync+0xef dsl_pool_sync+0x45b spa_sync+0x7c7 txg_sync_thread+0x383 fork_exit+0x84 fork_trampoline+0xe
 1226 100746 zfs              -                 mi_switch+0x167 sleepq_switch+0xe7 sleepq_wait+0x43 _cv_wait+0x1e4 txg_wait_synced+0x13b dsl_sync_task+0x205 dsl_dataset_user_release_impl+0x1cf dsl_dataset_user_release_onexit+0x86 zfs_onexit_destroy+0x56 zfsdev_close+0x88 devfs_destroy_cdevpriv+0x8b devfs_close_f+0x65 _fdrop+0x1a closef+0x200 closefp+0xa3 amd64_syscall+0x2db Xfast_syscall+0xfb
 1228 100579 zfs              -                 mi_switch+0x167 sleepq_switch+0xe7 sleepq_wait+0x43 _cv_wait+0x1e4 txg_wait_synced+0x13b dsl_sync_task+0x205 dsl_dataset_rename_snapshot+0x3a zfs_ioc_rename+0x157 zfsdev_ioctl+0x635 devfs_ioctl_f+0x156 kern_ioctl+0x246 sys_ioctl+0x171 amd64_syscall+0x2db Xfast_syscall+0xfb

Thread 100746 is it. zfsdev_close() holds spa_namespace_lock and then calls dsl_sync_task() -> txg_wait_synced(). On the other hand, the sync thread (100572) gets stuck on spa_namespace_lock in a call to zvol_rename_minors().

My opinion is that the sync thread must never try to take spa_namespace_lock.

The problem seems to have been introduced quite a while ago in base r219317. Some later commits like base r272474 followed the same pattern. The problem is certainly FreeBSD-specific, as illumos handles ZVOL names in a very different manner. Also, the problem is rather deep-rooted and at the moment I do not see any easy way to fix it without breaking ZVOL name tracking.

P.S. A bit of information from ddb:

db> p spa_namespace_lock
ffffffff822b1ee0
db> show lock 0xffffffff822b1ee0
 class: sx
 name: spa_namespace_lock
 state: XLOCK: 0xfffff8001da60500 (tid 100746, pid 1226, "zfs")
 waiters: exclusive
db> thread 100746
[ thread pid 1226 tid 100746 ]
sched_switch+0x48a: movl %gs:0x34,%eax
db> bt
Tracing pid 1226 tid 100746 td 0xfffff8001da60500
sched_switch() at sched_switch+0x48a/frame 0xfffffe004def4590
mi_switch() at mi_switch+0x167/frame 0xfffffe004def45c0
sleepq_switch() at sleepq_switch+0xe7/frame 0xfffffe004def4600
sleepq_wait() at sleepq_wait+0x43/frame 0xfffffe004def4630
_cv_wait() at _cv_wait+0x1e4/frame 0xfffffe004def4690
txg_wait_synced() at txg_wait_synced+0x13b/frame 0xfffffe004def46d0
dsl_sync_task() at dsl_sync_task+0x205/frame 0xfffffe004def4790
dsl_dataset_user_release_impl() at dsl_dataset_user_release_impl+0x1cf/frame 0xfffffe004def4910
dsl_dataset_user_release_onexit() at dsl_dataset_user_release_onexit+0x86/frame 0xfffffe004def4940
zfs_onexit_destroy() at zfs_onexit_destroy+0x56/frame 0xfffffe004def4970
zfsdev_close() at zfsdev_close+0x88/frame 0xfffffe004def4990
devfs_destroy_cdevpriv() at devfs_destroy_cdevpriv+0x8b/frame 0xfffffe004def49b0
devfs_close_f() at devfs_close_f+0x65/frame 0xfffffe004def49e0
_fdrop() at _fdrop+0x1a/frame 0xfffffe004def4a00
closef() at closef+0x200/frame 0xfffffe004def4a90
closefp() at closefp+0xa3/frame 0xfffffe004def4ae0
amd64_syscall() at amd64_syscall+0x2db/frame 0xfffffe004def4bf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe004def4bf0
--- syscall (6, FreeBSD ELF64, sys_close), rip = 0x8013f996a, rsp = 0x7fffffffd438, rbp = 0x7fffffffd450 ---
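For anyone who finds the traces hard to follow, the gist boils down to the following minimal userspace sketch. This is only my illustration, not the kernel code: ns_lock stands in for spa_namespace_lock, and txg_cv/txg_synced for the txg sync condition; all names are invented. Built with cc -lpthread, it hangs the same way the traces above do.

/*
 * Userspace sketch of the lock/condvar inversion described above.
 * Deadlocks by design.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdbool.h>
#include <unistd.h>

static pthread_mutex_t ns_lock = PTHREAD_MUTEX_INITIALIZER;  /* "spa_namespace_lock" */
static pthread_mutex_t txg_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  txg_cv  = PTHREAD_COND_INITIALIZER;
static bool txg_synced = false;

/* Mimics zfsdev_close(): holds ns_lock, then waits for the sync thread. */
static void *close_thread(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&ns_lock);
	pthread_mutex_lock(&txg_mtx);
	while (!txg_synced)                      /* "txg_wait_synced()" */
		pthread_cond_wait(&txg_cv, &txg_mtx);
	pthread_mutex_unlock(&txg_mtx);
	pthread_mutex_unlock(&ns_lock);
	return NULL;
}

/* Mimics the txg sync thread running a rename sync task. */
static void *sync_thread(void *arg)
{
	(void)arg;
	sleep(1);                                /* let close_thread take ns_lock first */
	pthread_mutex_lock(&ns_lock);            /* "zvol_rename_minors()": blocks forever */
	pthread_mutex_unlock(&ns_lock);

	pthread_mutex_lock(&txg_mtx);
	txg_synced = true;                       /* never reached */
	pthread_cond_broadcast(&txg_cv);
	pthread_mutex_unlock(&txg_mtx);
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, close_thread, NULL);
	pthread_create(&t2, NULL, sync_thread, NULL);
	pthread_join(t1, NULL);                  /* hangs here */
	pthread_join(t2, NULL);
	printf("never printed\n");
	return 0;
}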
Your conclusion seems valid there, Andriy, but as you say there doesn't seem to be an easy way to fix it.
(In reply to Steven Hartland from comment #14) One possible approach is to move all the "minors" manipulations out of the sync thread and back to the ZFS ioctl handlers, the only place where they were called from originally. This sounds rather straightforward, but could turn out to be a bit laborious.
(In reply to Steven Hartland from comment #14) Another possibility is to apply all zvol changes asynchronously, either by queuing them or by having a dedicated thread scan for them. The former should be almost trivial to implement. A rough sketch of the queuing idea follows.
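To illustrate what I mean, here is plain userspace C with made-up names, not a patch: sync context only enqueues a description of the change, and a dedicated thread applies the changes in the original order, taking the (stand-in) namespace lock outside of sync context.

/*
 * Sketch of queued, asynchronous zvol updates.  Illustrative only.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

struct zvol_task {
	char name[256];                 /* e.g. dataset to create/rename/remove */
	struct zvol_task *next;
};

static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  q_cv   = PTHREAD_COND_INITIALIZER;
static struct zvol_task *q_head, *q_tail;

/* Called from "sync context": only queues work, never takes other locks. */
static void zvol_task_enqueue(const char *name)
{
	struct zvol_task *t = calloc(1, sizeof(*t));

	snprintf(t->name, sizeof(t->name), "%s", name);
	pthread_mutex_lock(&q_lock);
	if (q_tail != NULL)
		q_tail->next = t;
	else
		q_head = t;
	q_tail = t;
	pthread_cond_signal(&q_cv);
	pthread_mutex_unlock(&q_lock);
}

/* Dedicated worker: applies the changes in the original order. */
static void *zvol_task_thread(void *arg)
{
	(void)arg;
	for (;;) {
		pthread_mutex_lock(&q_lock);
		while (q_head == NULL)
			pthread_cond_wait(&q_cv, &q_lock);
		struct zvol_task *t = q_head;
		q_head = t->next;
		if (q_head == NULL)
			q_tail = NULL;
		pthread_mutex_unlock(&q_lock);

		/* Here the real code would take spa_namespace_lock safely. */
		printf("updating minors for %s\n", t->name);
		free(t);
	}
	return NULL;
}

int main(void)
{
	pthread_t worker;

	pthread_create(&worker, NULL, zvol_task_thread, NULL);
	zvol_task_enqueue("tank/vol0");
	zvol_task_enqueue("tank/vol1");
	sleep(1);                       /* let the worker drain the queue */
	return 0;
}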
Another observation is that the current code seems to be broken in a couple of places as well.

1. dsl_dataset_rename_snapshot_sync_impl() uses ddrsa_fsname to construct the old and new snapshot names, but this would be wrong in the case of a recursive snapshot rename (zfs rename -r x@snap1 x@snap2), as ddrsa_fsname always points to the name of the top-level filesystem.

2. dsl_dataset_promote_sync(): as far as I can see, oldname is never populated, so the code can do arbitrary renames.
And another deadlock of the same nature, but with an even more obvious LOR:

 5087 100846 zfs              -                 mi_switch+0x167 sleepq_switch+0xe7 sleepq_wait+0x43 _cv_wait+0x1e4 rrw_enter_read_impl+0x9b dsl_pool_hold+0x77 dmu_objset_own+0x31 zvol_create_minor+0x16f zvol_create_minors+0x9d zfs_ioc_create+0x3df zfsdev_ioctl+0x65c devfs_ioctl_f+0x156 kern_ioctl+0x246 sys_ioctl+0x171 amd64_syscall+0x2db Xfast_syscall+0xfb
    6 100572 zfskern          txg_thread_enter  mi_switch+0x167 sleepq_switch+0xe7 sleepq_wait+0x43 _sx_xlock_hard+0x49d _sx_xlock+0xc5 zvol_rename_minors+0x104 dsl_dir_rename_sync+0x406 dsl_sync_task_sync+0xef dsl_pool_sync+0x45b spa_sync+0x7c7 txg_sync_thread+0x383 fork_exit+0x84 fork_trampoline+0xe

Thread 100846 took spa_namespace_lock in zvol_create_minor() and wants to get dp_config_rwlock (as a reader) in dsl_pool_hold(). Thread 100572 took dp_config_rwlock (as a writer) in dsl_sync_task_sync() and wants to take spa_namespace_lock in zvol_rename_minors().
The deadlock from comment #13 can happen even if there are no zvols in a pool at all; the deadlock from comment #18 happens when a zvol is created.
I've coded the suggestion from comment #15 here: https://reviews.freebsd.org/D7179

The change should also fix the bugs described in comment #17.
Hi all, we are seeing the same problem using NPM 6.2.2. This has been a problem for some time, but it seems to pop up randomly. We have run for months without this error happening; today, however, it happens repeatedly and is relatively reproducible. Procstat trace below.

# procstat -kk 12590
  PID    TID COMM             TDNAME           KSTACK
12590 101356 node             -                mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_wait_sig+0xf _sleep+0x27d kern_kevent_fp+0x399 kern_kevent+0x9f sys_kevent+0x12a amd64_syscall+0x40f Xfast_syscall+0xfb
12590 101358 node             -                mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_wait_sig+0xf _sleep+0x27d umtxq_sleep+0x125 do_sem_wait+0x4db __umtx_op_sem_wait+0x73 amd64_syscall+0x40f Xfast_syscall+0xfb
12590 101359 node             V8 WorkerThread  mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_wait_sig+0xf _sleep+0x27d umtxq_sleep+0x125 do_sem_wait+0x4db __umtx_op_sem_wait+0x73 amd64_syscall+0x40f Xfast_syscall+0xfb
12590 101360 node             V8 WorkerThread  mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_wait_sig+0xf _sleep+0x27d umtxq_sleep+0x125 do_sem_wait+0x4db __umtx_op_sem_wait+0x73 amd64_syscall+0x40f Xfast_syscall+0xfb
12590 101361 node             V8 WorkerThread  mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_wait_sig+0xf _sleep+0x27d umtxq_sleep+0x125 do_sem_wait+0x4db __umtx_op_sem_wait+0x73 amd64_syscall+0x40f Xfast_syscall+0xfb
12590 101362 node             V8 WorkerThread  mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_wait_sig+0xf _sleep+0x27d umtxq_sleep+0x125 do_sem_wait+0x4db __umtx_op_sem_wait+0x73 amd64_syscall+0x40f Xfast_syscall+0xfb
12590 101363 node             -                mi_switch+0xe1 sleepq_wait+0x3a sleeplk+0x15d __lockmgr_args+0x91a vop_stdlock+0x3c VOP_LOCK1_APV+0xab _vn_lock+0x43 vget+0x73 cache_lookup+0x5d5 vfs_cache_lookup+0xac VOP_LOOKUP_APV+0xa1 lookup+0x5a1 namei+0x4d4 kern_renameat+0x1b7 amd64_syscall+0x40f Xfast_syscall+0xfb
12590 101364 node             -                mi_switch+0xe1 sleepq_wait+0x3a sleeplk+0x15d __lockmgr_args+0xca0 vop_stdlock+0x3c VOP_LOCK1_APV+0xab _vn_lock+0x43 vputx+0x21f
(In reply to Laurie Odgers from comment #21) I do not think that you are seeing the same problem. The data that you provided is far too incomplete to tell, but it is more likely that you are running into bug #209158. Or something else altogether.
(In reply to Andriy Gapon from comment #22) Yes, apologies - I had both threads open and accidentally posted in this one.
I updated Andriy's patch with zvol update ordering. The patch is here: https://reviews.freebsd.org/D14669
New version of a prospective solution: https://reviews.freebsd.org/D23478
A commit references this bug:

Author: avg
Date: Thu Jun 11 10:41:34 UTC 2020
New revision: 362047
URL: https://svnweb.freebsd.org/changeset/base/362047

Log:
  rework how ZVOLs are updated in response to DSL operations

  With this change all ZVOL updates are initiated from the SPA sync context
  instead of a mix of the sync and open contexts. The updates are queued to be
  applied by a dedicated thread in the original order. This should ensure that
  ZVOLs always accurately reflect the corresponding datasets. ZFS ioctl
  operations wait on the mentioned thread to complete its work. Thus, the
  illusion of the synchronous ZVOL update is preserved. At the same time, the
  SPA sync thread never blocks on ZVOL related operations avoiding problems
  like reported in bug 203864.

  This change is based on earlier work in the same direction: D7179 and D14669
  by Anthoine Bourgeois. D7179 tried to perform ZVOL operations in the open
  context and that opened races between them. D14669 uses a design very
  similar to this change but with different implementation details.

  This change also heavily borrows from similar code in ZoL, but there are
  many differences too. See:
  - https://github.com/zfsonlinux/zfs/commit/a0bd735adb1b1eb81fef10b4db102ee051c4d4ff
  - https://github.com/zfsonlinux/zfs/issues/3681
  - https://github.com/zfsonlinux/zfs/issues/2217

  PR:                   203864
  MFC after:            5 weeks
  Sponsored by:         CyberSecure
  Differential Revision: https://reviews.freebsd.org/D23478

Changes:
  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_objset.c
  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_send.c
  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dataset.c
  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_destroy.c
  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dir.c
  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c
  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/spa_impl.h
  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zvol.h
  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c
  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c
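For readers following along, the "ioctl operations wait on the thread" part of that design has roughly the following shape. This is a userspace sketch with invented names; the real implementation lives in zvol.c and zfs_ioctl.c and differs in detail.

/*
 * Sketch: sync context records that an update is pending, a dedicated
 * worker applies it, and the ioctl (open) context waits until the worker
 * has caught up, preserving the illusion of a synchronous update.
 */
#include <pthread.h>
#include <stdint.h>

static pthread_mutex_t z_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  z_cv   = PTHREAD_COND_INITIALIZER;
static uint64_t z_enqueued;     /* bumped when sync context queues a task */
static uint64_t z_applied;      /* bumped by the worker after applying one */

/* Sync context: note that one more zvol update is pending. */
static uint64_t zvol_update_enqueue(void)
{
	pthread_mutex_lock(&z_lock);
	uint64_t seq = ++z_enqueued;
	pthread_mutex_unlock(&z_lock);
	/* ...put the actual task on the worker's queue here... */
	return seq;
}

/* Worker thread: after applying a queued update, wake any waiters. */
static void zvol_update_applied(void)
{
	pthread_mutex_lock(&z_lock);
	z_applied++;
	pthread_cond_broadcast(&z_cv);
	pthread_mutex_unlock(&z_lock);
}

/* Ioctl context: wait until the worker has caught up to "our" update. */
static void zvol_update_wait(uint64_t seq)
{
	pthread_mutex_lock(&z_lock);
	while (z_applied < seq)
		pthread_cond_wait(&z_cv, &z_lock);
	pthread_mutex_unlock(&z_lock);
}

static void *worker(void *arg)
{
	(void)arg;
	zvol_update_applied();          /* pretend we applied one queued task */
	return NULL;
}

int main(void)
{
	pthread_t t;
	uint64_t seq = zvol_update_enqueue();

	pthread_create(&t, NULL, worker, NULL);
	zvol_update_wait(seq);          /* returns once the worker caught up */
	pthread_join(t, NULL);
	return 0;
}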
Merged to stable/12 in base r364917.