After a recent upgrade from 12.1 to 12.2-STABLE (stable/12-c1-ge82353f84), several zfs processes hang in different wait channels. This is a system with just under 1000 ZFS filesystems that makes frequent snapshot-based send/receive backups from a primary data pool to a backup pool using znapzend.

    ps axl | awk '/zfs / { print $9}' | sort | uniq -c
      20 rrl->rr_
      28 tq_qdrai
       4 tx->tx_s

A sample of each:

    0 95220 94050 23 28 0 13000 3372 rrl->rr_ D - 0:00.74 zfs recv -F backup1/servi
    0 87914 85482 22 25 0 13000 3308 tq_qdrai D - 0:00.55 zfs recv -F backup1/servi
    0 77834 77117  3 27 0 13000 2716 tx->tx_s D - 0:00.74 zfs recv -F backup1/servi

    PID    TID COMM TDNAME KSTACK
    95220 104268 zfs - mi_switch+0xd4 sleepq_wait+0x2c _cv_wait+0xf2 rrw_enter_read_impl+0x8b zfs_register_callbacks+0x1c6 zfsvfs_setup+0x18 zfs_resume_fs+0xc0 zfs_ioc_recv+0xb53 zfsdev_ioctl+0x62d devfs_ioctl+0xb0 VOP_IOCTL_APV+0x7b vn_ioctl+0x16a devfs_ioctl_f+0x1e kern_ioctl+0x2b7 sys_ioctl+0xfa amd64_syscall+0x387 fast_syscall_common+0xf8
    87914 103712 zfs - mi_switch+0xd4 sleepq_wait+0x2c _sleep+0x253 taskqueue_drain_all+0xe1 zfsdev_ioctl+0x7e3 devfs_ioctl+0xb0 VOP_IOCTL_APV+0x7b vn_ioctl+0x16a devfs_ioctl_f+0x1e kern_ioctl+0x2b7 sys_ioctl+0xfa amd64_syscall+0x387 fast_syscall_common+0xf8
    77834 104829 zfs - mi_switch+0xd4 sleepq_wait+0x2c _cv_wait+0xf2 txg_wait_synced_impl+0xa9 txg_wait_synced+0xb dsl_sync_task_common+0x230 dsl_sync_task+0x1a dmu_recv_end+0x67 zfs_ioc_recv+0xb3d zfsdev_ioctl+0x62d devfs_ioctl+0xb0 VOP_IOCTL_APV+0x7b vn_ioctl+0x16a devfs_ioctl_f+0x1e kern_ioctl+0x2b7 sys_ioctl+0xfa amd64_syscall+0x387 fast_syscall_common+0xf8

This starts to happen after a couple of hours of uptime, not immediately. I wanted to check my previous 12.1 version, but bectl hangs as well. These processes are unkillable, and I'll be forced to hard-reboot the system, because it won't shut down properly (at least not within a reasonable amount of time).
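For anyone trying to reproduce the wait-channel summary, here is a minimal sketch of a slightly stricter variant of the pipeline in the report. The function name `count_zfs_wchans` is hypothetical, and the column positions (9 = MWCHAN, 10 = STAT, 13 = first word of the command) are an assumption taken from the `ps axl` sample lines in this report; they may differ with other `ps` options or on other systems. It only counts processes whose state starts with `D` and whose command word is `zfs`, so the `awk` process itself is not counted.

```shell
#!/bin/sh
# Summarise wait channels of zfs processes stuck in uninterruptible sleep.
# Reads `ps axl` output on stdin.
# Assumed columns (from the sample lines in this report):
#   $9 = MWCHAN, $10 = STAT, $13 = first word of COMMAND
count_zfs_wchans() {
    awk '$10 ~ /^D/ && $13 ~ /(^|\/)zfs$/ { n[$9]++ }
         END { for (w in n) printf "%d %s\n", n[w], w }' |
        sort -k1,1nr -k2
}
```

Usage would be `ps axl | count_zfs_wchans`. Kernel stacks in the format shown in the report can then be collected for a given PID with `procstat -kk <pid>`.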
(In reply to Markus Wild from comment #0) How much free space is there in the filesystems - any filesystems near quota limits?
(In reply to Peter Eriksson from comment #1) Both data pools are at single-digit capacity percentages, and no filesystem with a quota is anywhere near its limit. I've now rebooted the system, so I can query more ZFS-related info if that helps.
Seeing the recent FreeBSD-EN-21:04.zfs advisory about changes to zfs receive: could this have changed the locking behavior and somehow caused deadlocks on a busy system? My 12.2-STABLE system already includes this patch.
^Triage: I'm sorry that this PR did not get addressed in a timely fashion. But by now, the versions that it was created against are out of support. Please re-open if it is still a problem on a supported version.