Hi, when I'm trying to zfs-send a filesystem from one machine to another, the receiving end gets stuck with zfskern spinning one CPU core. No observable problems sending incremental streams. Here's what top shows (truncated) on the receiver: > last pid: 2848; load averages: 1.08, 1.08, 1.04 > 243 processes: 5 running, 220 sleeping, 18 waiting > CPU: 0.0% user, 0.0% nice, 50.6% system, 0.0% interrupt, 49.4% idle > Mem: 8904K Active, 77M Inact, 551M Wired, 5310M Free > ARC: 284M Total, 104M MFU, 46M MRU, 2448K Anon, 1979K Header, 131M Other > Swap: 8192M Total, 8192M Free > PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND > 5 root -8 - 0K 128K CPU0 0 252:00 100.00% zfskern{solthread 0xffff} > 11 root 155 ki31 0K 32K RUN 1 225:09 91.89% idle{idle: cpu1} > 11 root 155 ki31 0K 32K RUN 0 51:02 7.86% idle{idle: cpu0} > 0 root -16 - 0K 2496K swapin 1 0:18 0.00% kernel{swapper} > 16 root 16 - 0K 16K syncer 1 0:16 0.00% syncer > 12 root -92 - 0K 288K WAIT 1 0:09 0.00% intr{irq257: virtio_p} > 12 root -60 - 0K 288K WAIT 1 0:07 0.00% intr{swi4: clock} > 15 root -16 - 0K 16K vlruwt 1 0:02 0.00% vnlru > 6 root -16 - 0K 32K psleep 1 0:02 0.00% pagedaemon{pagedaemon} > 14 root -16 - 0K 16K RUN 1 0:02 0.00% rand_harvestq > 5 root -8 - 0K 128K tx->tx 1 0:02 0.00% zfskern{txg_thread_enter} > 1806 root 40 0 44420K 3692K rwa.cv 1 0:01 0.00% zfs And here's procstat for zfskern on the receiver: > #procstat -kk 5 > PID TID COMM TDNAME KSTACK > 5 100044 zfskern arc_reclaim_thre mi_switch+0xe1 sleepq_timedwait+0x3a _cv_timedwait_sbt+0x19e arc_reclaim_thread+0x2be fork_exit+0x9a fork_trampoline+0xe > 5 100045 zfskern arc_user_evicts_ mi_switch+0xe1 sleepq_timedwait+0x3a _cv_timedwait_sbt+0x19e arc_user_evicts_thread+0x17d fork_exit+0x9a fork_trampoline+0xe > 5 100046 zfskern l2arc_feed_threa mi_switch+0xe1 sleepq_timedwait+0x3a _cv_timedwait_sbt+0x19e l2arc_feed_thread+0xc73 fork_exit+0x9a fork_trampoline+0xe > 5 100322 zfskern trim seppel mi_switch+0xe1 sleepq_timedwait+0x3a _cv_timedwait_sbt+0x19e trim_thread+0x126 fork_exit+0x9a fork_trampoline+0xe > 5 100334 zfskern txg_thread_enter mi_switch+0xe1 sleepq_wait+0x3a _cv_wait+0x17d txg_quiesce_thread+0x16b fork_exit+0x9a fork_trampoline+0xe > 5 100335 zfskern txg_thread_enter mi_switch+0xe1 sleepq_timedwait+0x3a _cv_timedwait_sbt+0x19e txg_sync_thread+0x160 fork_exit+0x9a fork_trampoline+0xe > 5 100425 zfskern solthread 0xffff <running> The sender is running (custom trimmed-down GENERIC kernel): > FreeBSD XXX 10.3-STABLE FreeBSD 10.3-STABLE #1 r308740: Sat Nov 19 21:15:27 GMT 2016 root@XXX:/usr/obj/usr/src/sys/XXX amd64 And the receiver is running (a differently trimmed GENERIC kernel): > FreeBSD YYY 10.3-RELEASE-p15 FreeBSD 10.3-RELEASE-p15 #9 r310507: Sat Dec 24 21:22:15 UTC 2016 root@XXX:/path/usr/src/sys/YYY amd64 The file systems that zfs-send just fine are clones of snapshots. These are send as incremental streams. The problematic one is a fresh zfs-create'd file system, with only a few small files in it. The command used to send/receive, initiated from the sender-side, is: > /sbin/zfs send -v -R senderpool/myrootfs | /usr/bin/gzip | /usr/bin/ssh root@${HOST} "/usr/bin/gunzip | /sbin/zfs recv -v -ud receiverpool" And what I thus get in the console (from zfs recv -v) is (trimmed): > found clone origin receiverpool/base@20.46-r310507 > receiving incremental stream of senderpool/myrootfs@clean-install into receiverpool/myrootfs@clean-install > received 886KB stream in 3 seconds (295KB/sec) > receiving full stream of senderpool/myrootfs/freshfs@clean-install into receiverpool/myrootfs/freshfs@clean-install > [...stuck here...] The receiving end seems to be running fine with zfskern spinning. But it will never finish the the filesystem in question. Any ideas what might be going on, or what to do about it? Thanks, Johannes
I think I hit this same thing doing a non-incremental send from FreeBSD 11.1 to FreeBSD 10.3: PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND 11 root 8 155 ki31 0K 128K CPU7 7 ??? 600.00% [idle] 3 root 8 -8 - 0K 144K CPU4 4 362:30 199.76% [zfskern] The 'zfs recv' command is stuck on the waitchan: rwa.cv load: 2.97 cmd: zfs 779 [rwa.cv] 85.90r 0.00u 0.41s 0% 2876k load: 2.40 cmd: zfs 779 [rwa.cv] 1049.93r 0.00u 0.41s 0% 2876k #procstat -kk 78197 779 PID TID COMM TDNAME KSTACK 78197 103920 zfs - mi_switch+0xe1 sleepq_wait+0x3a _cv_wait+0x17d dmu_recv_stream+0x9db zfs_ioc_recv+0x96d zfsdev_ioctl+0x63d devfs_ioctl_f+0x139 kern_ioctl+0x255 sys_ioctl+0x140 amd64_syscall+0x40f Xfast_syscall+0xfb 779 102626 zfs - mi_switch+0xe1 sleepq_wait+0x3a _cv_wait+0x17d dmu_recv_stream+0x9db zfs_ioc_recv+0x96d zfsdev_ioctl+0x63d devfs_ioctl_f+0x139 kern_ioctl+0x255 sys_ioctl+0x140 amd64_syscall+0x40f Xfast_syscall+0xfb
procstat for zfskern: #procstat -kk 3 PID TID COMM TDNAME KSTACK 3 100069 zfskern arc_reclaim_thre mi_switch+0xe1 sleepq_timedwait+0x3a _cv_timedwait_sbt+0x19e arc_reclaim_thread+0x28d fork_exit+0x9a fork_trampoline+0xe 3 100070 zfskern arc_user_evicts_ mi_switch+0xe1 sleepq_timedwait+0x3a _cv_timedwait_sbt+0x19e arc_user_evicts_thread+0x13c fork_exit+0x9a fork_trampoline+0xe 3 100071 zfskern l2arc_feed_threa mi_switch+0xe1 sleepq_timedwait+0x3a _cv_timedwait_sbt+0x19e l2arc_feed_thread+0x16f fork_exit+0x9a fork_trampoline+0xe 3 100375 zfskern trim zstore mi_switch+0xe1 sleepq_timedwait+0x3a _cv_timedwait_sbt+0x19e trim_thread+0x9e fork_exit+0x9a fork_trampoline+0xe 3 100417 zfskern txg_thread_enter mi_switch+0xe1 sleepq_wait+0x3a _cv_wait+0x17d txg_quiesce_thread+0x1ab fork_exit+0x9a fork_trampoline+0xe 3 100418 zfskern txg_thread_enter mi_switch+0xe1 sleepq_timedwait+0x3a _cv_timedwait_sbt+0x19e txg_sync_thread+0x1dc fork_exit+0x9a fork_trampoline+0xe 3 101617 zfskern solthread 0xffff <running> 3 101772 zfskern solthread 0xffff <running>
(In reply to Allan Jude from comment #2) It would be interesting to see where the spinning threads actually spin. Either DTrace profiling or hwpmc could be used for that.
Hi, Any update on this issue if it's fixed or not? It could be possible that this issue is related to https://github.com/zfsonlinux/zfs/pull/6616. Any indication if the pull is already backported/merged into a FreeBSD version? I seem to have the same problem as described by Allan Jude. zfs send/receive on a FreeBSD 11.1 sender to FreeBSD 10.3 receiver Thanks
FreeBSD 10.3 is end-of-life now, so I think this is overcome by events.
Per the latest comments and a lack of any new reports.
(In reply to Andriy Gapon from comment #6) Hi, I've resolved the issue by compiling the a new kernel with ZFS patches and I've also made a PR (and accepted) for OpenZFS so this issue shouldn't resurface anymore.
(In reply to staticwizard from comment #7) Could you please share the PR link for posterity? Thank you!
(In reply to Andriy Gapon from comment #8) https://github.com/openzfs/openzfs/pull/685