I have two servers running FreeBSD 10.1-RELEASE and 10.1-STABLE. Both have ZFS pools using raidz2 and raidz3, and both show the same problem. When I removed a 1.3 TB file from ZFS, the system hung after 20-30 minutes. There were no errors on the console, but I was forced to reset the server. After the reboot I had to wait 30-90 minutes before the system was able to mount the ZFS datasets. During that time the activity LEDs of the pool's HDDs were blinking. After that the file had disappeared and the system worked well. The same hang occurs when I destroy a dataset containing a large file. This behavior repeats for files larger than 1 TB on both servers. What is the problem? Thanks for any help! There is a discussion on the FreeBSD forums: https://forums.freebsd.org/threads/zfs-hangs-while-removing-large-file.51054
ZFS has a quirk that all indirect blocks of a file are read when the file is destroyed. That can be a lot of bytes and take a lot of time for such large files. Perhaps this is what you are seeing.
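To give a rough sense of scale for the comment above, here is a minimal sketch that estimates how many indirect blocks map a file of a given size. It is a simplified model and not ZFS code: the function name is hypothetical, and it assumes a 128 KiB recordsize, 16 KiB indirect blocks, 128-byte block pointers, a single top-level pointer in the dnode, and that "1.3 TB" means 1.3 * 10^12 bytes. The real numbers on the affected pools may differ.

```python
def indirect_block_count(file_bytes, record_bytes=128 * 1024,
                         indirect_bytes=16 * 1024, blkptr_bytes=128):
    """Return the estimated number of indirect blocks per level, lowest level first.

    All defaults are assumptions for illustration, not values read from a pool.
    """
    ptrs_per_indirect = indirect_bytes // blkptr_bytes   # 128 with these defaults
    blocks = -(-file_bytes // record_bytes)              # ceiling division: data blocks
    levels = []
    while blocks > 1:
        # Each indirect block holds ptrs_per_indirect pointers to the level below.
        blocks = -(-blocks // ptrs_per_indirect)
        levels.append(blocks)
    return levels


levels = indirect_block_count(1_300_000_000_000)
print(levels, sum(levels))
```

Under these assumptions a 1.3 TB file has on the order of 78,000 indirect blocks. If they are read one after another during the delete, the latency of each read adds up.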
When the system hangs, ssh sessions freeze, the istgt and nfsd daemons stop answering requests, and the server console does not react to the keyboard.
What does gstat -d -p look like at this time?
gstat hangs along with the whole system. The last gstat -d -p output before the hang:

dT: 1.002s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    d/s   kBps   ms/d   %busy Name
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| da0
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| da1
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| da2
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| da3
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| da4
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| da5
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| da6
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| da7
    0    553    553    510    0.3      0      0    0.0      0      0    0.0   17.0| da8
    0    563    563    518    0.3      0      0    0.0      0      0    0.0   16.5| da9
    0    564    564    520    0.4      0      0    0.0      0      0    0.0   19.9| da10
    0    562    562    516    0.3      0      0    0.0      0      0    0.0   17.3| da11
    0    548    548    500    0.3      0      0    0.0      0      0    0.0   16.4| da12
    0    544    544    497    0.3      0      0    0.0      0      0    0.0   18.0| da13
    0    549    549    503    0.3      0      0    0.0      0      0    0.0   19.0| da14
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| da15
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| da16
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| da17
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| da18
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| da19
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| da20
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| da21
    0    561    561    511    0.6      0      0    0.0      0      0    0.0   31.7| da22
    0      1      0      0    0.0      1      4    0.2      0      0    0.0    0.0| ada0
    0      1      0      0    0.0      1      4    0.2      0      0    0.0    0.0| ada1
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| cd0

da8-da14 and da22 are the disks in the pool where I am deleting the big file.
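As a quick check on what this sample suggests, the sketch below divides the read throughput by the read rate for the eight pool disks, using only the r/s and kBps figures from the gstat output in this comment. The function name is hypothetical, and the reading of the result is an interpretation, not something gstat reports.

```python
# (r/s, read kBps) per pool disk, copied from the gstat sample in this comment
pool_reads = {
    "da8": (553, 510), "da9": (563, 518), "da10": (564, 520), "da11": (562, 516),
    "da12": (548, 500), "da13": (544, 497), "da14": (549, 503), "da22": (561, 511),
}


def average_read_kb(samples):
    """Average size of one read in kB across all listed disks."""
    total_ops = sum(ops for ops, _ in samples.values())
    total_kb = sum(kb for _, kb in samples.values())
    return total_kb / total_ops


print(f"{average_read_kb(pool_reads):.2f} kB per read")
```

The average comes out below 1 kB per read, with no writes or deletes in flight. That pattern of many tiny reads would be consistent with the pool reading metadata rather than file data, though the sample alone cannot confirm it.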
(In reply to Yuriy Tabolin from comment #4) Yuriy, if you want to try your development skills you might want to try to adapt a patch from here https://reviews.csiden.org/r/218/ to your source code tree and test if the patch helps in your situation.
The change in base r284593 should help with this problem.
A commit references this bug:

Author: avg
Date: Mon Jul 6 10:40:54 UTC 2015
New revision: 285202
URL: https://svnweb.freebsd.org/changeset/base/285202

Log:
  MFC r284593: MFV r284412: 5911 ZFS "hangs" while deleting file

  illumos/illumos-gate@46e1baa6cf6d5432f5fd231bb588df8f9570c858
  https://www.illumos.org/issues/5911

  Sometimes ZFS appears to hang while deleting a file. It is actually
  making slow progress at the file deletion, but other operations
  (administrative and writes via the data path) "hang" until the file
  removal completes, which can take a long time if the file has many
  blocks. The deletion (or most of it) happens in a single txg, and the
  sync thread spends most of its time reading indirect blocks...

  Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com>
  Reviewed by: Alek Pinchuk <alek@nexenta.com>
  Reviewed by: Simon Klinkert <simon.klinkert@gmail.com>
  Reviewed by: Dan McDonald <danmcd@omniti.com>
  Approved by: Richard Lowe <richlowe@richlowe.net>
  Author: Matthew Ahrens <mahrens@delphix.com>

  PR: 199775
  Approved by: re(kib)

Changes:
_U  stable/10/
  stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c
  stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c
  stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dnode.c
  stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dnode_sync.c
  stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dbuf.h
A commit references this bug:

Author: avg
Date: Mon Jul 6 10:41:32 UTC 2015
New revision: 285203
URL: https://svnweb.freebsd.org/changeset/base/285203

Log:
  MFC r284593: MFV r284412: 5911 ZFS "hangs" while deleting file

  illumos/illumos-gate@46e1baa6cf6d5432f5fd231bb588df8f9570c858
  https://www.illumos.org/issues/5911

  Sometimes ZFS appears to hang while deleting a file. It is actually
  making slow progress at the file deletion, but other operations
  (administrative and writes via the data path) "hang" until the file
  removal completes, which can take a long time if the file has many
  blocks. The deletion (or most of it) happens in a single txg, and the
  sync thread spends most of its time reading indirect blocks...

  Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com>
  Reviewed by: Alek Pinchuk <alek@nexenta.com>
  Reviewed by: Simon Klinkert <simon.klinkert@gmail.com>
  Reviewed by: Dan McDonald <danmcd@omniti.com>
  Approved by: Richard Lowe <richlowe@richlowe.net>
  Author: Matthew Ahrens <mahrens@delphix.com>

  PR: 199775

Changes:
_U  stable/9/sys/
_U  stable/9/sys/cddl/contrib/opensolaris/
  stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c
  stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c
  stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dnode.c
  stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dnode_sync.c
  stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dbuf.h
The problem should be cured now. Please test.