Bug 199775

Summary: ZFS hangs while removing large file
Product: Base System
Reporter: Yuriy Tabolin <danmer>
Component: kern
Assignee: Andriy Gapon <avg>
Status: Closed FIXED
Severity: Affects Some People
CC: avg, smh
Priority: ---
Version: 10.1-STABLE
Hardware: amd64
OS: Any

Description Yuriy Tabolin 2015-04-29 10:21:57 UTC
I have two servers, running FreeBSD 10.1-RELEASE and 10.1-STABLE. Both have several ZFS pools using raidz2 and raidz3, and both show the same problem. When I remove a 1.3 TB file from ZFS, the system hangs after 20-30 minutes. No error appears on the console, but I am forced to reset the server. After rebooting I had to wait 30-90 minutes before the system was able to mount the ZFS datasets; during that time the pool's disk LEDs were blinking. Afterwards the file was gone and the system worked normally. The same hang occurs when I destroy a dataset containing a large file.

This behavior occurs with files larger than 1 TB on both servers. What is the problem? Thanks for any help!

There is a discussion on the FreeBSD forums:
https://forums.freebsd.org/threads/zfs-hangs-while-removing-large-file.51054
Comment 1 Andriy Gapon freebsd_committer freebsd_triage 2015-04-29 18:56:35 UTC
ZFS has a quirk: all indirect blocks of a file are read when the file is destroyed. For files this large that can amount to a lot of data and take a long time. Perhaps this is what you are seeing.
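To get a feel for the scale of the quirk described above, the sketch below estimates how many indirect blocks a 1.3 TB file has. The record size (128 KiB), indirect block size (16 KiB) and 128-byte block pointer are assumptions for illustration, not values taken from the reporter's pool, and the model ignores compression and the dnode's own block pointers.

```python
# Rough estimate of how much indirect-block metadata ZFS must read to free a
# large file. Every parameter is an assumption for illustration: 128 KiB
# records, 16 KiB indirect blocks, 128-byte block pointers, no compression,
# and the dnode's own block pointers are ignored.

BLKPTR_SIZE = 128  # bytes per block pointer (assumed)


def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def indirect_block_estimate(file_size: int,
                            recordsize: int = 128 * 1024,
                            indblk_size: int = 16 * 1024):
    """Return (data blocks, indirect blocks per level from L1 up, total indirect bytes)."""
    ptrs_per_indirect = indblk_size // BLKPTR_SIZE  # 128 with the defaults above
    data_blocks = ceil_div(file_size, recordsize)
    levels = []
    n = data_blocks
    while n > 1:  # add levels until a single top-level block remains
        n = ceil_div(n, ptrs_per_indirect)
        levels.append(n)
    return data_blocks, levels, sum(levels) * indblk_size


data, levels, total = indirect_block_estimate(1_300_000_000_000)  # 1.3 TB
print(data, levels, f"{total / 2**30:.2f} GiB")
# prints: 9918213 [77487, 606, 5, 1] 1.19 GiB
```

Under these assumptions, freeing the file means reading roughly 78,000 indirect blocks (about 1.2 GiB of metadata before compression), any of which may have to come from disk if it is not already cached.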
Comment 2 Yuriy Tabolin 2015-04-30 14:02:57 UTC
When the system hangs, SSH sessions freeze, the istgt and nfsd daemons stop answering requests, and the server console does not react to the keyboard at all.
Comment 3 Steven Hartland freebsd_committer freebsd_triage 2015-05-01 09:32:20 UTC
What does gstat -d -p look like at this time?
Comment 4 Yuriy Tabolin 2015-05-12 12:51:14 UTC
gstat hangs along with the whole system. This is the last output of gstat -d -p before the hang:

dT: 1.002s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    d/s   kBps   ms/d   %busy Name
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| da0
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| da1
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| da2
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| da3
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| da4
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| da5
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| da6
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| da7
    0    553    553    510    0.3      0      0    0.0      0      0    0.0   17.0| da8
    0    563    563    518    0.3      0      0    0.0      0      0    0.0   16.5| da9
    0    564    564    520    0.4      0      0    0.0      0      0    0.0   19.9| da10
    0    562    562    516    0.3      0      0    0.0      0      0    0.0   17.3| da11
    0    548    548    500    0.3      0      0    0.0      0      0    0.0   16.4| da12
    0    544    544    497    0.3      0      0    0.0      0      0    0.0   18.0| da13
    0    549    549    503    0.3      0      0    0.0      0      0    0.0   19.0| da14
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| da15
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| da16
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| da17
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| da18
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| da19
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| da20
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| da21
    0    561    561    511    0.6      0      0    0.0      0      0    0.0   31.7| da22
    0      1      0      0    0.0      1      4    0.2      0      0    0.0    0.0| ada0
    0      1      0      0    0.0      1      4    0.2      0      0    0.0    0.0| ada1
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| cd0

da8-da14 and da22 are the disks in the pool from which I was deleting the large file.
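One quick way to read the gstat output is to total the pool members' columns. The sketch below copies the r/s and kBps values for da8-da14 and da22 from the table; the interpretation that follows is an assumption, not a diagnosis.

```python
# Per-disk read statistics copied from the gstat -d -p output above for the
# pool members: disk -> (reads per second, kBps read). All w/s values were 0.
POOL_DISKS = {
    "da8": (553, 510),
    "da9": (563, 518),
    "da10": (564, 520),
    "da11": (562, 516),
    "da12": (548, 500),
    "da13": (544, 497),
    "da14": (549, 503),
    "da22": (561, 511),
}


def summarize(stats):
    """Return (total reads/s, total kBps, average kB per read) across the disks."""
    total_ops = sum(r for r, _ in stats.values())
    total_kbps = sum(k for _, k in stats.values())
    return total_ops, total_kbps, total_kbps / total_ops


ops, kbps, avg = summarize(POOL_DISKS)
print(f"{ops} reads/s, {kbps} kB/s, {avg:.2f} kB per read")
# prints: 4444 reads/s, 4075 kB/s, 0.92 kB per read
```

Roughly 4,400 reads per second averaging under 1 kB each, with no writes, suggests the pool was busy with many small reads at that moment, which would be consistent with the indirect-block reads described in comment 1.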
Comment 5 Andriy Gapon freebsd_committer freebsd_triage 2015-05-12 14:31:43 UTC
(In reply to Yuriy Tabolin from comment #4)
Yuriy, if you want to exercise your development skills, you could try adapting the patch at https://reviews.csiden.org/r/218/ to your source tree and testing whether it helps in your situation.
Comment 6 Andriy Gapon freebsd_committer freebsd_triage 2015-07-03 10:19:46 UTC
The change in base r284593 should help with this problem.
Comment 7 commit-hook freebsd_committer freebsd_triage 2015-07-06 10:41:25 UTC
A commit references this bug:

Author: avg
Date: Mon Jul  6 10:40:54 UTC 2015
New revision: 285202
URL: https://svnweb.freebsd.org/changeset/base/285202

Log:
  MFC r284593: MFV r284412: 5911 ZFS "hangs" while deleting file

  illumos/illumos-gate@46e1baa6cf6d5432f5fd231bb588df8f9570c858
  https://www.illumos.org/issues/5911
  Sometimes ZFS appears to hang while deleting a file. It is actually
  making slow progress at the file deletion, but other operations
  (administrative and writes via the data path) "hang" until the file
  removal completes, which can take a long time if the file has many
  blocks. The deletion (or most of it) happens in a single txg, and the
  sync thread spends most of its time reading indirect blocks...

  Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com>
  Reviewed by: Alek Pinchuk <alek@nexenta.com>
  Reviewed by: Simon Klinkert <simon.klinkert@gmail.com>
  Reviewed by: Dan McDonald <danmcd@omniti.com>
  Approved by: Richard Lowe <richlowe@richlowe.net>
  Author: Matthew Ahrens <mahrens@delphix.com>

  PR:	199775
  Approved by:	re(kib)

Changes:
_U  stable/10/
  stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c
  stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c
  stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dnode.c
  stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dnode_sync.c
  stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dbuf.h
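The commit log attributes the apparent hang to the deletion happening in a single txg. The toy model below illustrates why that matters under one simplifying assumption: an unrelated write that arrives while a txg is syncing must wait for that txg to finish. It is not ZFS code and does not describe how the 5911 change works; the function name, read counts and rates are hypothetical.

```python
# Toy model, NOT ZFS code and NOT the 5911 patch. Hypothetical assumptions:
# freeing the file requires `total_reads` metadata reads at a fixed rate, and
# any other write must wait for the currently syncing txg to complete.


def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def worst_case_wait(total_reads: int, reads_per_second: float, txgs: int) -> float:
    """Longest wait (seconds) another writer could see if the reads are
    spread evenly over `txgs` transaction groups."""
    reads_in_largest_txg = ceil_div(total_reads, txgs)
    return reads_in_largest_txg / reads_per_second


# Hypothetical workload: 100,000 reads at 50 reads/s.
print(worst_case_wait(100_000, 50, 1))    # all work in one txg: 2000.0
print(worst_case_wait(100_000, 50, 100))  # spread over 100 txgs: 20.0
```

The total work is the same in both cases; what changes is how long any single txg, and therefore any writer waiting on it, is held up.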
Comment 8 commit-hook freebsd_committer freebsd_triage 2015-07-06 10:42:28 UTC
A commit references this bug:

Author: avg
Date: Mon Jul  6 10:41:32 UTC 2015
New revision: 285203
URL: https://svnweb.freebsd.org/changeset/base/285203

Log:
  MFC r284593: MFV r284412: 5911 ZFS "hangs" while deleting file

  illumos/illumos-gate@46e1baa6cf6d5432f5fd231bb588df8f9570c858
  https://www.illumos.org/issues/5911
  Sometimes ZFS appears to hang while deleting a file. It is actually
  making slow progress at the file deletion, but other operations
  (administrative and writes via the data path) "hang" until the file
  removal completes, which can take a long time if the file has many
  blocks. The deletion (or most of it) happens in a single txg, and the
  sync thread spends most of its time reading indirect blocks...

  Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com>
  Reviewed by: Alek Pinchuk <alek@nexenta.com>
  Reviewed by: Simon Klinkert <simon.klinkert@gmail.com>
  Reviewed by: Dan McDonald <danmcd@omniti.com>
  Approved by: Richard Lowe <richlowe@richlowe.net>
  Author: Matthew Ahrens <mahrens@delphix.com>

  PR:	199775

Changes:
_U  stable/9/sys/
_U  stable/9/sys/cddl/contrib/opensolaris/
  stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c
  stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c
  stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dnode.c
  stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dnode_sync.c
  stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dbuf.h
Comment 9 Andriy Gapon freebsd_committer freebsd_triage 2015-07-08 07:27:16 UTC
The problem should be cured now.
Please test.