Bug 282449

Summary:

UFS deadlock with install -S during freebsd-update

Product:

Base System

Reporter:

vfs-locking <Grau_Smue>

Component:

kern

Assignee:

Konstantin Belousov <kib>

Status:

Closed FIXED

Severity:

Affects Some People

CC:

jfc, kib, markj, marklmi26-fbsd

Priority:

---

Version:

Unspecified

Hardware:

amd64

OS:

Any

Attachments:

Description	Flags
procstat -kka on the system with hung sync and install process	none
sysctl vfs from the deadlocked VM	none

Description vfs-locking 2024-10-31 18:14:56 UTC

Created attachment 254825
procstat -kka on the system with hung sync and install process

I have experienced a deadlock requiring a reset a couple of times during freebsd-update.  The install -C command gets stuck in wdrain.  A sync executed at this time also never returns.

I found a thread on forums.freebsd.org which seems to be the same problem, however it also seems to be implicated in extremely slow freebsd-update when the filesystem is ZFS.  https://forums.freebsd.org/threads/freebsd-13-2-release-14-0-release-upgrade-stuck.91152/

I eventually was able to reproduce the problem and get a procstat -kka, which is attached.  I can not say that it is reproducible on demand, but I can certainly try if additional debug is needed.  This was about my 30th try, and the last change was going sync=standard(prior disabled) on the underlying host ZFS.

This was on 13.3-p7, going to 13.4-RELEASE.  However, it has been experienced under other kernels, earlier and later, over the past year.  It seems to be hardware independent: I have run into it on a couple of VMs and a VPS, with varying CPU counts and disk controllers.  This particular case is VirtualBox 6.1.50, 1 CPU, AHCI, on top of ZFS with sync=standard and no hostiocache in vbox.  The host system runs the same kernel, on an AHCI Supermicro with a spinning disk.

Comment 1 vfs-locking 2024-10-31 18:21:30 UTC

Sorry, forgot to add this was UFS+SJ, no trim.

Comment 2 Konstantin Belousov freebsd_committer

2024-10-31 19:16:07 UTC

You have thread(s) sleeping in waitrunningbufspace.
This means that there are accumulated never finished writes.

Comment 3 vfs-locking 2024-10-31 19:22:36 UTC

correction: install -S (which does sync per file, and likely a rename()) versus -C which may have less stringent behavior.

re: waitrunningbufspace, I see no indications of disk activity once it hangs.  No errors seem to show up anywhere either.

Comment 4 John F. Carr 2024-10-31 19:26:38 UTC

Is this a situation where a dirty buffer can't be written without first reading from disk but the disk can't be read until some dirty buffers are cleaned up?  I've heard of a similar hang with NFS, never with UFS.

Comment 5 Mark Millard 2024-10-31 21:57:29 UTC

QUOTE
the last change was going sync=standard(prior disabled) on the
underlying host ZFS
END QUOTE

QUOTE (of comment #1)
Sorry, forgot to add this was UFS+SJ, no trim
END QUOTE

So, overall, lack of I/O at the "host ZFS" level could lead
to lack of I/O at the UFS+SJ level?

Did you do any inspection of the status at the host ZFS level
while the UFS+SJ level was hung? Might the ZFS level have
been the source of the problem?

Comment 6 vfs-locking 2024-11-01 01:38:35 UTC

(In reply to Mark Millard from comment #5)
Notwithstanding that it looks like an I/O pileup, my impression is that there is something more.  My previous 2 or 3 encounters on VBox were all on a hosting ZFS with sync=disabled, and there were no errors on the host side at those times, since the host kept running other VMs and their I/O.  I only tried sync=standard this time because I was out of good ideas in attempts to reproduce.

On the VPS encounter a few weeks ago, there could very well have been limited I/O.  But not zero.  It was the VPS encounter that convinced me I was looking at something other than a VBox bug, which is what I had previously thought this was.  I had changed from SCSI to AHCI in the guest, and even read about a fix in VBox that could have been my issue, until it happened again.

From there, it felt like a locking or race bug to me, one that was happening because of the high volume of fsync() and rename() calls via install -S.  I know those are hard to reproduce, but that's why I was willing to try repeatedly.

Comment 7 Mark Johnston freebsd_committer

2024-11-01 13:56:12 UTC

It'd be useful to see the value of the vfs.runningbufspace sysctl (probably just capture all output from "sysctl vfs") after the deadlock occurs.  That'd let us see whether there's a missing runningbufspace wakeup, or whether I/O requests are truly getting stuck at some lower layer.
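
As a rough aid for reading such a capture, the sketch below parses "name: value" lines as printed by sysctl and compares the running-space counter with its watermarks.  The helper names (parse_sysctl, throttle_state) and every number in SAMPLE are made up for illustration; they are not the contents of the attachment.

```python
def parse_sysctl(text):
    """Parse 'name: value' lines, as printed by sysctl, into a dict of ints."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if not sep:
            continue  # not a 'name: value' line
        try:
            values[name.strip()] = int(value.strip())
        except ValueError:
            continue  # skip sysctls whose value is not a plain integer
    return values


def throttle_state(values):
    """Compare the running write total against the two watermarks."""
    running = values["vfs.runningbufspace"]
    hi = values["vfs.hirunningspace"]
    lo = values["vfs.lorunningspace"]
    return {
        "running": running,
        "at_or_above_hi": running >= hi,
        "above_lo": running > lo,
    }


# Hypothetical sample values, NOT taken from the attachment.
SAMPLE = """\
vfs.runningbufspace: 1048576
vfs.hirunningspace: 1048576
vfs.lorunningspace: 720896
vfs.numdirtybuffers: 12
"""

state = throttle_state(parse_sysctl(SAMPLE))
print(state)
```

If the counter stays at or above the high watermark while the disk is idle, that would point at a missed wakeup or a claim that is never released, rather than at I/O stuck in a lower layer.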

Comment 8 Mark Millard 2024-11-01 15:08:36 UTC

(In reply to Mark Johnston from comment #7)

Is that just in the context that sees UFS? Also the host context that
has ZFS in use instead?

Comment 9 Mark Johnston freebsd_committer

2024-11-01 15:11:42 UTC

(In reply to Mark Millard from comment #8)
Just in the system which uses UFS.  runningbufspace is a buffer cache write-throttling mechanism that isn't used by ZFS.

Comment 10 vfs-locking 2024-11-01 17:22:18 UTC

Created attachment 254850
sysctl vfs from the deadlocked VM

Comment 11 Mark Johnston freebsd_committer

2024-11-01 20:34:15 UTC

So we have runningbufspace == hirunningspace == 1MB.  That is a very low threshold.  Meanwhile, we have maxphys == 1MB by default.  What happens if bufwrite() tries to write a 1MB buffer?  It'll bump runningbufspace, and if that was previously larger than hirunningspace, bufwrite() will block in waitrunningbufspace() waiting for runningbufspace to drop below lorunningspace, but that'll never happen.

Could you please try setting kern.maxphys=131072 in /boot/loader.conf, then reboot and try to reproduce the problem?

How much RAM do these systems have?  I'm not sure if the correct solution is to reduce maxphys on small RAM systems or to increase the minimum lo/hirunningbufspace watermarks to ensure that this deadlock can't happen.
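
The arithmetic behind the suggested test can be sketched as follows.  This is only a back-of-the-envelope model, assuming a 1MB high watermark (vfs.hirunningspace in the sysctl tree) as reported above; the helper names are hypothetical.

```python
MB = 1024 * 1024


def max_sized_writes_under(hirunningspace, maxphys):
    """How many maxphys-sized buffers fit within the high watermark."""
    return hirunningspace // maxphys


def single_write_reaches_limit(hirunningspace, maxphys):
    """True if one maximum-sized write alone consumes the whole budget."""
    return maxphys >= hirunningspace


hi = 1 * MB  # high watermark reported in this comment
for maxphys in (1 * MB, 131072):
    print(maxphys,
          max_sized_writes_under(hi, maxphys),
          single_write_reaches_limit(hi, maxphys))
# prints:
# 1048576 1 True
# 131072 8 False
```

With the default maxphys, one maximum-sized write uses the entire budget; with kern.maxphys=131072, eight such writes fit, which is why lowering maxphys is a plausible way to test the theory.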

Comment 12 vfs-locking 2024-11-01 21:37:32 UTC

RAM has been 256 to 512 MB in 'real' encounters.  This test VM had 384 MB.  (That was one setting I was moving around during repro attempts.)

Comment 13 Mark Johnston freebsd_committer

2024-11-07 14:44:56 UTC

> Could you please try setting kern.maxphys=131072 in /boot/loader.conf, then reboot and try to reproduce the problem?

Have you had a chance to try this?

Comment 14 vfs-locking 2024-11-07 19:53:38 UTC

I've set it on the relevant systems and they still work.  :)

I don't have a reliable reproducer, but my test VM completed freebsd-update.  I think your theory of the cause would have to stand as the basis for calling kern.maxphys=131072 a fix.

The question of which parameter to tweak as a fix should get some discussion; 13 and 14 should probably go with reduced performance if it doesn't break.  15 might go the opposite direction since a higher memory floor is not a surprising change.

Comment 15 Mark Johnston freebsd_committer

2024-11-10 22:35:27 UTC

Looking at this again, I think the problem isn't with the general runningbufspace mechanism.  There's a specific code path in the SU+J implementation where we can deadlock when hirunningspace is small.

What happens is, we prepare to write a data buf, claim space for it in the runningbuf total, then call bufstrategy(), which in turn might dispatch journal I/O asynchronously, which can block on runningbufspace.  Normally this happens in the context of the softdep flusher, which is exempt from the runningbufspace limit, but it can happen in a user thread as well:

_sleep+0x1f0 waitrunningbufspace+0x76 bufwrite+0x24a softdep_process_journal+0x728 softdep_disk_io_initiation+0x79b ffs_geom_strategy+0x1f0 ufs_strategy+0x83 bufstrategy+0x36 bufwrite+0x1da cluster_wbuild+0x722 cluster_write+0x12f ffs_write+0x41d

When hirunningspace == maxphys == runningbufspace, this recursive bufwrite() call will block forever.

I suspect this bawrite() call in softdep_process_journal() should be gated on (curthread->td_pflags & TDP_NORUNNINGBUF) != 0.  The softdep flusher will continue to issue async writes, but if a user thread is forced to flush journal records in order to initiate I/O, it'll do so synchronously.
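
The nested-write scenario can be illustrated with a toy model.  This is not kernel code: the wait and wake-up conditions are simplifications chosen for illustration, and the low watermark and journal buffer size are hypothetical values.  The key point it tries to capture is that the outer data buffer has claimed space but has not been dispatched yet, so only the thread that is now sleeping could ever release that claim.

```python
MB = 1024 * 1024


def nested_journal_write_hangs(hi, lo, data_size, journal_size, exempt=False):
    """Toy model: does the nested journal write sleep forever?

    hi, lo       -- high and low running-space watermarks (bytes)
    data_size    -- outer data buffer, claimed but not yet dispatched
    journal_size -- journal buffer written from inside the strategy call
    exempt       -- True if the thread is exempt from the running-space limit
    """
    running = 0
    running += data_size     # outer write claims its space first
    running += journal_size  # nested journal write claims and dispatches
    if exempt or running <= hi:
        return False         # no throttling wait, the outer write proceeds
    # The thread sleeps here.  Only I/O that was actually dispatched can
    # complete and give space back, which is the journal buffer alone.
    running -= journal_size
    # Assumed wake-up rule: sleepers are woken once the total has dropped
    # to the low watermark.  The outer claim keeps it above that level.
    return running > lo


HI = 1 * MB          # from the report: hirunningspace == 1MB
LO = 2 * HI // 3     # hypothetical low watermark
JOURNAL = 64 * 1024  # hypothetical journal buffer size

print(nested_journal_write_hangs(HI, LO, 1 * MB, JOURNAL))
print(nested_journal_write_hangs(HI, LO, 1 * MB, JOURNAL, exempt=True))
print(nested_journal_write_hangs(HI, LO, 131072, JOURNAL))
```

Under these assumptions, the model hangs only when the thread is subject to the limit and the outer claim by itself is larger than the low watermark; exempting the thread or using much smaller buffers both avoid it.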

Comment 16 Konstantin Belousov freebsd_committer

2024-11-10 23:21:09 UTC

(In reply to Mark Johnston from comment #15)
This is worth a try.  But I suggest trying a more invasive approach then: convert
all async writes into sync ones if there is not enough running space and the thread
is not exempt from the running space accounting.
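
The proposed policy can be written down as a small decision function.  This is a sketch only: the function name is hypothetical, and treating "not enough running space" as "the total has reached the high watermark" is an assumption, not something stated in this comment.

```python
def choose_write_mode(runningbufspace, hirunningspace, exempt):
    """Return "async" or "sync" for a write that was requested as async."""
    if exempt:
        # Threads exempt from the accounting keep issuing async writes.
        return "async"
    if runningbufspace >= hirunningspace:
        # Assumed meaning of "not enough running space": fall back to a
        # synchronous write instead of sleeping on the throttle.
        return "sync"
    return "async"


MB = 1024 * 1024
print(choose_write_mode(1 * MB, 1 * MB, exempt=False))
print(choose_write_mode(1 * MB, 1 * MB, exempt=True))
print(choose_write_mode(256 * 1024, 1 * MB, exempt=False))
```

The trade-off is the one raised later in the thread: a synchronous write costs throughput, but the thread waits for its own I/O rather than for space that may never be released.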

Comment 17 vfs-locking 2024-11-11 20:43:38 UTC

I may be reading this too broadly, but I ask for the record:

Would this risk reducing throughput at (for example) the hard disk level at exactly the time it needs it most?  Or is this unavoidable because we've exhausted capacity for handling it the most efficient way?

Comment 18 Konstantin Belousov freebsd_committer

2024-11-11 22:34:08 UTC

(In reply to vfs-locking from comment #17)
The issue is relevant for low-resource configurations, where the default tuning
appears not to prevent the deadlock.  There, the goal is to keep the system
operational rather than to make it perform as fast as possible.

Comment 19 Konstantin Belousov freebsd_committer

2024-11-12 06:37:50 UTC

https://reviews.freebsd.org/D47523

I decided to do it differently, mostly pretending that the thread that helps
flush the journal is temporarily like a system thread.

Comment 20 commit-hook freebsd_committer

2024-11-13 19:36:43 UTC

A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=46f02c4282ff76b66579c83be53ef441ea522536

commit 46f02c4282ff76b66579c83be53ef441ea522536
Author:     Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2024-11-12 06:29:23 +0000
Commit:     Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2024-11-13 19:35:03 +0000

    SU+J: all writes to SU journal must be exempt from runningbufspace throttling

    regardless whether they come from the system thread or initiated from a
    normal thread helping the system.  If we block waiting for other writes,
    that writes might not finish because our journal updates block that.

    Set TDP_NORUNNINGBUF around softdep_process_journal().

    Note: Another solution might be to use bwrite() instead of bawrite() if the
    current thread is subject to the runningbufspace limit.  The exempt
    approach is used to be same as the bufdaemon.

    PR:     282449
    Noted and reviewed by:  markj
    Tested by:      pho
    Sponsored by:   The FreeBSD Foundation
    MFC after:      1 week

 sys/ufs/ffs/ffs_softdep.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

Comment 21 commit-hook freebsd_committer

2024-11-13 19:36:45 UTC

A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=d0b41249bfbe4481baec8f1659468ffbb30388ab

commit d0b41249bfbe4481baec8f1659468ffbb30388ab
Author:     Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2024-11-12 06:24:03 +0000
Commit:     Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2024-11-13 19:35:02 +0000

    bufwrite(): adjust the comment

    The statement about 'do not deadlock there' is false, since this write
    might need other writes to finish, which cannot be started due to
    runningbufspace.

    PR:     282449
    Reviewed by:    markj
    Sponsored by:   The FreeBSD Foundation
    MFC after:      3 days

 sys/kern/vfs_bio.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

Comment 22 commit-hook freebsd_committer

2024-11-16 01:42:00 UTC

A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=ff7de7aa49fa095ec006c81299845e1bac5d5335

commit ff7de7aa49fa095ec006c81299845e1bac5d5335
Author:     Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2024-11-12 06:24:03 +0000
Commit:     Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2024-11-16 01:07:33 +0000

    bufwrite(): adjust the comment

    PR:     282449

    (cherry picked from commit d0b41249bfbe4481baec8f1659468ffbb30388ab)

 sys/kern/vfs_bio.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

Comment 23 commit-hook freebsd_committer

2024-11-16 01:42:01 UTC

A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=9b2226eef29aa6ada92203ead672f124b911df5f

commit 9b2226eef29aa6ada92203ead672f124b911df5f
Author:     Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2024-11-12 06:29:23 +0000
Commit:     Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2024-11-16 01:07:33 +0000

    SU+J: all writes to SU journal must be exempt from runningbufspace throttling

    PR:     282449

    (cherry picked from commit 46f02c4282ff76b66579c83be53ef441ea522536)

 sys/ufs/ffs/ffs_softdep.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

Comment 24 vfs-locking 2025-01-07 16:55:07 UTC

Is it possible to merge this into stable/13 ahead of the 13.5 branch?

Comment 25 commit-hook freebsd_committer

2025-01-07 18:34:00 UTC

A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=02a71cfa504ebab7b94722068f57c8f4bdd509e2

commit 02a71cfa504ebab7b94722068f57c8f4bdd509e2
Author:     Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2024-11-12 06:29:23 +0000
Commit:     Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2025-01-07 18:31:56 +0000

    SU+J: all writes to SU journal must be exempt from runningbufspace throttling

    PR:     282449

    (cherry picked from commit 46f02c4282ff76b66579c83be53ef441ea522536)

 sys/ufs/ffs/ffs_softdep.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

Comment 26 vfs-locking 2025-01-08 16:07:47 UTC

Thanks!