Summary: | UFS deadlock with install -S during freebsd-update | |
---|---|---|---
Product: | Base System | Reporter: | vfs-locking <Grau_Smue>
Component: | kern | Assignee: | Konstantin Belousov <kib>
Status: | Closed FIXED | |
Severity: | Affects Some People | CC: | jfc, kib, markj, marklmi26-fbsd
Priority: | --- | |
Version: | Unspecified | |
Hardware: | amd64 | |
OS: | Any | |
Attachments: | sysctl vfs from the deadlocked VM (attachment 254850) | |

Description
vfs-locking
2024-10-31 18:14:56 UTC
Sorry, forgot to add this was UFS+SJ, no trim.

You have thread(s) sleeping in waitrunningbufspace. This means that there are accumulated never finished writes.

correction: install -S (which does sync per file, and likely a rename()) versus -C which may have less stringent behavior.

re: waitrunningbufspace, I see no indications of disk activity once it hangs. No errors seem to show up anywhere either.

Is this a situation where a dirty buffer can't be written without first reading from disk but the disk can't be read until some dirty buffers are cleaned up? I've heard of a similar hang with NFS, never with UFS.

QUOTE
the last change was going sync=standard(prior disabled) on the underlying host ZFS
END QUOTE

QUOTE (of comment #1)
Sorry, forgot to add this was UFS+SJ, no trim
END QUOTE

So, overall, lack of I/O at the "host ZFS" level could lead to lack of I/O at the UFS+SJ level? Did you do any inspection of the status at the host ZFS level while the UFS+SJ level was hung? Might the ZFS level have been the source of the problem?

(In reply to Mark Millard from comment #5)
Notwithstanding that it looks like an I/O pileup, my impression is that there is something more. My previous 2 or 3 encounters on VBox were all on hosting ZFS with sync=disabled, and there were no errors on the host side at those times since it continued other VMs and I/O. I only tried sync=standard this time because I was out of good ideas in attempts to reproduce.

On the VPS encounter a few weeks ago, there could very well have been limited I/O. But not zero. It was the VPS encounter that convinced me I was looking at something other than a VBox bug, which is what I had previously thought this was. I had changed from SCSI to AHCI in the guest, and even read about a fix in VBox that could have been my issue, until it happened again.

From there, it felt like a locking or race bug to me, that was happening because of the high volume of fsync() and rename() calls via install -S.
I know those are hard to reproduce, but that's why I was willing to try repeatedly.

It'd be useful to see the value of the vfs.runningbufspace sysctl (probably just capture all output from "sysctl vfs") after the deadlock occurs. That'd let us see whether there's a missing runningbufspace wakeup, or whether I/O requests are truly getting stuck at some lower layer.

(In reply to Mark Johnston from comment #7)
Is that just in the context that sees UFS? Also the host context that has ZFS in use instead?

(In reply to Mark Millard from comment #8)
Just in the system which uses UFS. runningbufspace is a buffer cache write-throttling mechanism that isn't used by ZFS.

Created attachment 254850
sysctl vfs from the deadlocked VM

So we have runningbufspace == hirunningbufspace == 1MB. That is a very low threshold. Meanwhile, we have maxphys == 1MB by default. What happens if bufwrite() tries to write a 1MB buffer? It'll bump runningbufspace, and if that was previously larger than hirunningbufspace, bufwait() will block waiting for runningbufspace to drop below lobufspace, but that'll never happen.

Could you please try setting kern.maxphys=131072 in /boot/loader.conf, then reboot and try to reproduce the problem?

How much RAM do these systems have? I'm not sure if the correct solution is to reduce maxphys on small RAM systems or to increase the minimum lo/hirunningbufspace watermarks to ensure that this deadlock can't happen.

RAM has been 256 to 512 MB in 'real' encounters. This test VM was 384 MB. (That was one setting I was moving around during repro attempts.)

> Could you please try setting kern.maxphys=131072 in /boot/loader.conf, then reboot and try to reproduce the problem?

Have you had a chance to try this?

I've set it on the relevant systems and they still work. :) I don't have a reliable reproducer, but my test VM completed freebsd-update. I think your theory of cause would have to stand as a basis for calling kern.maxphys=131072 a fix.

The question of which parameter to tweak as a fix should get some discussion; 13 and 14 should probably go with reduced performance if it doesn't break. 15 might go the opposite direction since a higher memory floor is not a surprising change.

Looking at this again, I think the problem isn't with the general runningbufspace mechanism. There's a specific code path in the SU+J implementation where we can deadlock when hirunningspace is small.

What happens is, we prepare to write a data buf, claim space for it in the runningbuf total, then call bufstrategy(), which in turn might dispatch journal I/O asynchronously, which can block on runningbufspace. Normally this happens in the context of the softdep flusher, which is exempt from the runningbufspace limit, but it can happen in a user thread as well:

    _sleep+0x1f0
    waitrunningbufspace+0x76
    bufwrite+0x24a
    softdep_process_journal+0x728
    softdep_disk_io_initiation+0x79b
    ffs_geom_strategy+0x1f0
    ufs_strategy+0x83
    bufstrategy+0x36
    bufwrite+0x1da
    cluster_wbuild+0x722
    cluster_write+0x12f
    ffs_write+0x41d

When hirunningspace == maxphys == runningbufspace, this recursive bufwrite() call will block forever.

I suspect this bawrite() call in softdep_process_journal() should be gated on (curthread->td_pflags & TDP_NORUNNINGSPACE) != 0. The softdep flusher will continue to issue async writes, but if a user thread is forced to flush journal records in order to initiate I/O, it'll do so synchronously.

(In reply to Mark Johnston from comment #15)
This is worth a try. But I suggest to try a more invasive attempt then: convert all async writes into sync if there is not enough running space and the thread is not exempt from the running space accounting.

I may be reading this too broadly, but I ask for the record: Would this risk reducing throughput at (for example) the hard disk level at exactly the time it needs it most? Or is this unavoidable because we've exhausted capacity for handling it the most efficient way?

(In reply to vfs-locking from comment #17)
The issue is relevant for low-resource configurations, where the default tuning appears to not solve the deadlock. There, the problem is to keep the system operational, instead of making it perform in the fastest possible way.

https://reviews.freebsd.org/D47523

I decided to do it differently, mostly pretending that the thread that helps flushing the journal is temporary like a system thread.

A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=46f02c4282ff76b66579c83be53ef441ea522536

commit 46f02c4282ff76b66579c83be53ef441ea522536
Author:     Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2024-11-12 06:29:23 +0000
Commit:     Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2024-11-13 19:35:03 +0000

    SU+J: all writes to SU journal must be exempt from runningbufspace throttling

    regardless whether they come from the system thread or initiated from a
    normal thread helping the system. If we block waiting for other writes,
    that writes might not finish because our journal updates block that.

    Set TDP_NORUNNINGBUF around softdep_process_journal().

    Note: Another solution might be to use bwrite() instead of bawrite() if the
    current thread is subject to the runningbufspace limit. The exempt
    approach is used to be same as the bufdaemon.

    PR: 282449
    Noted and reviewed by: markj
    Tested by: pho
    Sponsored by: The FreeBSD Foundation
    MFC after: 1 week

 sys/ufs/ffs/ffs_softdep.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=d0b41249bfbe4481baec8f1659468ffbb30388ab

commit d0b41249bfbe4481baec8f1659468ffbb30388ab
Author:     Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2024-11-12 06:24:03 +0000
Commit:     Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2024-11-13 19:35:02 +0000

    bufwrite(): adjust the comment

    The statement about 'do not deadlock there' is false, since this write
    might need other writes to finish, which cannot be started due to
    runningbufspace.

    PR: 282449
    Reviewed by: markj
    Sponsored by: The FreeBSD Foundation
    MFC after: 3 days

 sys/kern/vfs_bio.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=ff7de7aa49fa095ec006c81299845e1bac5d5335

commit ff7de7aa49fa095ec006c81299845e1bac5d5335
Author:     Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2024-11-12 06:24:03 +0000
Commit:     Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2024-11-16 01:07:33 +0000

    bufwrite(): adjust the comment

    PR: 282449

    (cherry picked from commit d0b41249bfbe4481baec8f1659468ffbb30388ab)

 sys/kern/vfs_bio.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=9b2226eef29aa6ada92203ead672f124b911df5f

commit 9b2226eef29aa6ada92203ead672f124b911df5f
Author:     Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2024-11-12 06:29:23 +0000
Commit:     Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2024-11-16 01:07:33 +0000

    SU+J: all writes to SU journal must be exempt from runningbufspace throttling

    PR: 282449

    (cherry picked from commit 46f02c4282ff76b66579c83be53ef441ea522536)

 sys/ufs/ffs/ffs_softdep.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

Is it possible to merge this into stable/13 ahead of the 13.5 branch?

A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=02a71cfa504ebab7b94722068f57c8f4bdd509e2

commit 02a71cfa504ebab7b94722068f57c8f4bdd509e2
Author:     Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2024-11-12 06:29:23 +0000
Commit:     Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2025-01-07 18:31:56 +0000

    SU+J: all writes to SU journal must be exempt from runningbufspace throttling

    PR: 282449

    (cherry picked from commit 46f02c4282ff76b66579c83be53ef441ea522536)

 sys/ufs/ffs/ffs_softdep.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

Thanks!