Bug 203877 - NFS threads get blocked when writing to ZFS dataset that has reached quota
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 10.3-RELEASE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-zfs-devel mailing list
Depends on:
Reported: 2015-10-19 18:43 UTC by Garrett Wollman
Modified: 2017-08-17 17:25 UTC

See Also:

procstat -kk -a output (229.59 KB, text/plain)
2017-08-17 17:25 UTC, Garrett Wollman

Description Garrett Wollman freebsd_committer 2015-10-19 18:43:40 UTC
Figured it was about time to actually get this into the bug database, since I've made no progress figuring it out.

We've seen this bug for a long time, since at least 9.2.  The setup: a set of clients (in a cluster) write logs to a file, or a small set of files, over NFS (v3) and ignore write errors; the FreeBSD NFS server uses ZFS as its backing store.  The ZFS dataset in question is below quota when the files are first opened for writing, but reaches its quota while the clients are actively writing.  Eventually something inside ZFS slows to a crawl (synchronized to txg openings, perhaps?).  When this happens, all of the NFS service threads eventually get tied up deep in ZFS, and no new requests can be processed from any client, even the well-behaved ones.  FHA makes this happen faster, but it still happens with FHA disabled.
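For anyone trying to reproduce this, the client behaviour described above might be approximated with something like the following. This is only a sketch: the function name, its arguments, and the log line format are made up for illustration, and there is no guarantee that this write pattern is enough to trigger the hang.

```sh
# Hypothetical client-side writer (names are illustrative, not from the report).
# Opens the log file once, then keeps appending and ignores every write error,
# which is how the misbehaving clients are described.
#   $1 = path of the log file on the NFS mount
#   $2 = number of lines to write
write_ignoring_errors() {
    log_path=$1
    lines=$2
    # Open once up front, ideally while the dataset is still under quota.
    exec 3>>"$log_path"
    i=0
    while [ "$i" -lt "$lines" ]; do
        # Throw away both the diagnostic and the exit status of a failed
        # write (for example when the quota has been reached).
        { printf 'log line %d\n' "$i" >&3; } 2>/dev/null || true
        i=$((i + 1))
    done
    exec 3>&-
}
```

Running several of these in parallel from different clients, against a test dataset whose quota has been set just above its current usage (for example with `zfs set quota=<size> <dataset>`), should come close to the conditions described, though the real clients were long-running loggers rather than a bounded loop.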
Comment 1 Garrett Wollman freebsd_committer 2015-10-19 18:45:38 UTC
(I finally figured out that it was somewhere deep in ZFS by running "procstat -a -kk | fgrep nfsd" the last time this happened -- but of course it always happens when we need to get the server back up and serving files, so we just increased the quota on the problem dataset.)
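The procstat check above can be turned into a quick summary for the next time this happens. The following is a rough sketch, not a vetted tool: the function name is made up, it assumes the usual `procstat -kk` column layout (PID, TID, COMM, TDNAME, KSTACK), and the list of frame prefixes treated as "in ZFS" is a heuristic guess rather than a complete list.

```sh
# Hypothetical helper: reads `procstat -kk -a` output on stdin and reports
# how many nfsd threads exist and how many have a ZFS-looking frame in
# their kernel stack.
nfsd_zfs_summary() {
    awk '
        # Field 3 is the COMM column; this also skips the header line.
        $3 == "nfsd" {
            total++
            # Heuristic: common prefixes of ZFS kernel functions.
            if ($0 ~ /(zfs_|dmu_|txg_|zio_|dbuf_)/) in_zfs++
        }
        END { printf "nfsd threads: %d, in ZFS: %d\n", total, in_zfs }
    '
}
```

It would be run as `procstat -kk -a | nfsd_zfs_summary`, or fed a saved capture such as the attachment on this bug. If nearly every nfsd thread shows up as in ZFS, that matches the symptom described here.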
Comment 2 Garrett Wollman freebsd_committer 2017-08-17 17:24:20 UTC
Updating, since 10.3 still has the problem.
Comment 3 Garrett Wollman freebsd_committer 2017-08-17 17:25:40 UTC
Created attachment 185534
procstat -kk -a output