Summary: | ZFS hangs while removing large file | ||
---|---|---|---|
Product: | Base System | Reporter: | Yuriy Tabolin <danmer> |
Component: | kern | Assignee: | Andriy Gapon <avg> |
Status: | Closed FIXED | ||
Severity: | Affects Some People | CC: | avg, smh |
Priority: | --- | ||
Version: | 10.1-STABLE | ||
Hardware: | amd64 | ||
OS: | Any |
Description
Yuriy Tabolin
2015-04-29 10:21:57 UTC
ZFS has a quirk: all indirect blocks of a file are read when the file is destroyed. For a very large file that can be a lot of data and take a long time. Perhaps this is what you are seeing.

When the system hangs, ssh sessions freeze, the istgt and nfsd daemons stop answering requests, and the server console does not react to the keyboard.

What does gstat -d -p look like at this time?

gstat hangs along with the whole system. The last gstat -d -p output before the hang:

dT: 1.002s w: 1.000s

L(q) | ops/s | r/s | kBps (r) | ms/r | w/s | kBps (w) | ms/w | d/s | kBps (d) | ms/d | %busy | Name
---|---|---|---|---|---|---|---|---|---|---|---|---
0 | 0 | 0 | 0 | 0.0 | 0 | 0 | 0.0 | 0 | 0 | 0.0 | 0.0 | da0
0 | 0 | 0 | 0 | 0.0 | 0 | 0 | 0.0 | 0 | 0 | 0.0 | 0.0 | da1
0 | 0 | 0 | 0 | 0.0 | 0 | 0 | 0.0 | 0 | 0 | 0.0 | 0.0 | da2
0 | 0 | 0 | 0 | 0.0 | 0 | 0 | 0.0 | 0 | 0 | 0.0 | 0.0 | da3
0 | 0 | 0 | 0 | 0.0 | 0 | 0 | 0.0 | 0 | 0 | 0.0 | 0.0 | da4
0 | 0 | 0 | 0 | 0.0 | 0 | 0 | 0.0 | 0 | 0 | 0.0 | 0.0 | da5
0 | 0 | 0 | 0 | 0.0 | 0 | 0 | 0.0 | 0 | 0 | 0.0 | 0.0 | da6
0 | 0 | 0 | 0 | 0.0 | 0 | 0 | 0.0 | 0 | 0 | 0.0 | 0.0 | da7
0 | 553 | 553 | 510 | 0.3 | 0 | 0 | 0.0 | 0 | 0 | 0.0 | 17.0 | da8
0 | 563 | 563 | 518 | 0.3 | 0 | 0 | 0.0 | 0 | 0 | 0.0 | 16.5 | da9
0 | 564 | 564 | 520 | 0.4 | 0 | 0 | 0.0 | 0 | 0 | 0.0 | 19.9 | da10
0 | 562 | 562 | 516 | 0.3 | 0 | 0 | 0.0 | 0 | 0 | 0.0 | 17.3 | da11
0 | 548 | 548 | 500 | 0.3 | 0 | 0 | 0.0 | 0 | 0 | 0.0 | 16.4 | da12
0 | 544 | 544 | 497 | 0.3 | 0 | 0 | 0.0 | 0 | 0 | 0.0 | 18.0 | da13
0 | 549 | 549 | 503 | 0.3 | 0 | 0 | 0.0 | 0 | 0 | 0.0 | 19.0 | da14
0 | 0 | 0 | 0 | 0.0 | 0 | 0 | 0.0 | 0 | 0 | 0.0 | 0.0 | da15
0 | 0 | 0 | 0 | 0.0 | 0 | 0 | 0.0 | 0 | 0 | 0.0 | 0.0 | da16
0 | 0 | 0 | 0 | 0.0 | 0 | 0 | 0.0 | 0 | 0 | 0.0 | 0.0 | da17
0 | 0 | 0 | 0 | 0.0 | 0 | 0 | 0.0 | 0 | 0 | 0.0 | 0.0 | da18
0 | 0 | 0 | 0 | 0.0 | 0 | 0 | 0.0 | 0 | 0 | 0.0 | 0.0 | da19
0 | 0 | 0 | 0 | 0.0 | 0 | 0 | 0.0 | 0 | 0 | 0.0 | 0.0 | da20
0 | 0 | 0 | 0 | 0.0 | 0 | 0 | 0.0 | 0 | 0 | 0.0 | 0.0 | da21
0 | 561 | 561 | 511 | 0.6 | 0 | 0 | 0.0 | 0 | 0 | 0.0 | 31.7 | da22
0 | 1 | 0 | 0 | 0.0 | 1 | 4 | 0.2 | 0 | 0 | 0.0 | 0.0 | ada0
0 | 1 | 0 | 0 | 0.0 | 1 | 4 | 0.2 | 0 | 0 | 0.0 | 0.0 | ada1
0 | 0 | 0 | 0 | 0.0 | 0 | 0 | 0.0 | 0 | 0 | 0.0 | 0.0 | cd0

da8-da14 and da22 are the disks in the pool where I am deleting the big file.

(In reply to Yuriy Tabolin from comment #4)
Yuriy, if you want to try your development skills, you might want to adapt the patch from https://reviews.csiden.org/r/218/ to your source tree and test whether it helps in your situation.

Change in base r284593 should help with this problem.
A commit references this bug:

Author: avg
Date: Mon Jul 6 10:40:54 UTC 2015
New revision: 285202
URL: https://svnweb.freebsd.org/changeset/base/285202

Log:
MFC r284593: MFV r284412: 5911 ZFS "hangs" while deleting file

illumos/illumos-gate@46e1baa6cf6d5432f5fd231bb588df8f9570c858
https://www.illumos.org/issues/5911

Sometimes ZFS appears to hang while deleting a file. It is actually making slow progress at the file deletion, but other operations (administrative and writes via the data path) "hang" until the file removal completes, which can take a long time if the file has many blocks. The deletion (or most of it) happens in a single txg, and the sync thread spends most of its time reading indirect blocks...

Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com>
Reviewed by: Alek Pinchuk <alek@nexenta.com>
Reviewed by: Simon Klinkert <simon.klinkert@gmail.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Matthew Ahrens <mahrens@delphix.com>

PR: 199775
Approved by: re(kib)

Changes:
_U stable/10/
stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c
stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c
stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dnode.c
stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dnode_sync.c
stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dbuf.h

A commit references this bug:

Author: avg
Date: Mon Jul 6 10:41:32 UTC 2015
New revision: 285203
URL: https://svnweb.freebsd.org/changeset/base/285203

Log:
MFC r284593: MFV r284412: 5911 ZFS "hangs" while deleting file

illumos/illumos-gate@46e1baa6cf6d5432f5fd231bb588df8f9570c858
https://www.illumos.org/issues/5911

Sometimes ZFS appears to hang while deleting a file. It is actually making slow progress at the file deletion, but other operations (administrative and writes via the data path) "hang" until the file removal completes, which can take a long time if the file has many blocks. The deletion (or most of it) happens in a single txg, and the sync thread spends most of its time reading indirect blocks...

Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com>
Reviewed by: Alek Pinchuk <alek@nexenta.com>
Reviewed by: Simon Klinkert <simon.klinkert@gmail.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Matthew Ahrens <mahrens@delphix.com>

PR: 199775

Changes:
_U stable/9/sys/
_U stable/9/sys/cddl/contrib/opensolaris/
stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c
stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c
stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dnode.c
stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dnode_sync.c
stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dbuf.h

The problem should be cured now. Please test.
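
For anyone wondering how much indirect-block reading a large delete can involve, here is a rough back-of-the-envelope sketch. The function name is hypothetical, and the defaults (128 KiB records, 16 KiB indirect blocks, 128-byte block pointers) as well as the 1 TiB file size are assumptions chosen for illustration; the actual file size and pool settings were not stated in this report. It also assumes the top of the block tree hangs off a single pointer in the dnode.

```python
def indirect_block_counts(file_size, record_size=128 * 1024,
                          indirect_block_size=16 * 1024, blkptr_size=128):
    """Estimate the number of indirect blocks per level, lowest level first.

    All default sizes are assumed values, not taken from the bug report.
    """
    # How many block pointers fit into one indirect block.
    pointers_per_block = indirect_block_size // blkptr_size
    # Number of data blocks, rounded up.
    blocks = -(-file_size // record_size)
    levels = []
    # Add levels until a single block is left for the dnode to point at.
    while blocks > 1:
        blocks = -(-blocks // pointers_per_block)
        levels.append(blocks)
    return levels


if __name__ == "__main__":
    one_tib = 2 ** 40  # illustrative file size only
    levels = indirect_block_counts(one_tib)
    total = sum(levels)
    print(levels)                       # [65536, 512, 4, 1]
    print(total, "indirect blocks")     # 66053 indirect blocks
    print(total * 16 * 1024, "bytes")   # 1082212352 bytes
```

Under these assumptions a 1 TiB file has about 66 thousand indirect blocks, roughly 1 GiB in total. If those blocks are not cached and are read as small random I/O, that is at least consistent with the sustained read-only load that gstat showed on the pool disks before the hang.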