Bug 279138 - NFS and NFSUPG and BUFWAIT
Summary: NFS and NFSUPG and BUFWAIT
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 15.0-CURRENT
Hardware: Any Any
: --- Affects Some People
Assignee: Rick Macklem
URL:
Keywords: regression
Depends on:
Blocks:
 
Reported: 2024-05-18 19:17 UTC by David Gilbert
Modified: 2024-05-27 04:18 UTC (History)
3 users (show)

See Also:


Attachments
core.txt from a panic that may be related. (211.73 KB, text/plain)
2024-05-27 04:18 UTC, David Gilbert
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description David Gilbert 2024-05-18 19:17:49 UTC
Setup:

host: 14.0p2, 1900 threadripper (16 threads), 128G ram, 120T zfs

vm: 15.0-CURRENT #5 main-n270190-75529910f77a: Thu May 16 19:08:46 EDT 2024
    64G ram, 15 processors, 200G ZFS (zvol on host).

Host runs poudreiere for <=14.  VM runs 15 poudriere (not at same time).  VM mounts /usr/local/poudriere via NFS.

Multiple times, with at least one make-world on a new pull inbetween, the whole things stopps (it didn't used to, 2 to 3 months ago code).  Seems to happen when this triple reversal happens:

May 17 12:16:24 curpoud kernel: lock order reversal:
May 17 12:16:24 curpoud kernel:  1st 0xfffff802dcae0230 nfs (nfs, lockmgr) @ /usr/src/sys/kern/vfs_subr.c:3298
May 17 12:16:24 curpoud kernel:  2nd 0xfffffe006a1633c8 bufwait (bufwait, lockmgr) @ /usr/src/sys/kern/vfs_subr.c:2442
May 17 12:16:24 curpoud kernel: lock order bufwait -> nfs established at:
May 17 12:16:24 curpoud kernel: #0 0xffffffff80bb5ca5 at witness_checkorder+0x315
May 17 12:16:24 curpoud kernel: #1 0xffffffff80b0e962 at lockmgr_lock_flags+0x172
May 17 12:16:24 curpoud kernel: #2 0xffffffff80a1990c at nfs_lock+0x2c
May 17 12:16:24 curpoud kernel: #3 0xffffffff80c25f50 at vop_sigdefer+0x30
May 17 12:16:24 curpoud kernel: #4 0xffffffff80c52783 at _vn_lock+0x53
May 17 12:16:24 curpoud kernel: #5 0xffffffff80c3a22d at vget_finish+0x4d
May 17 12:16:24 curpoud kernel: #6 0xffffffff80c28b91 at vfs_hash_get+0xd1
May 17 12:16:24 curpoud kernel: #7 0xffffffff80a2293b at nfscl_nget+0x13b
May 17 12:16:24 curpoud kernel: #8 0xffffffff80a0a2d8 at nfsrpc_readdirplus+0xa98
May 17 12:16:24 curpoud kernel: #9 0xffffffff80a150a0 at ncl_readdirplusrpc+0xf0
May 17 12:16:24 curpoud kernel: #10 0xffffffff80a26cbc at ncl_doio+0x47c
May 17 12:16:24 curpoud kernel: #11 0xffffffff80a25e5f at ncl_bioread+0x5ef
May 17 12:16:24 curpoud kernel: #12 0xffffffff80a19858 at nfs_readdir+0x1d8
May 17 12:16:24 curpoud kernel: #13 0xffffffff80c25f50 at vop_sigdefer+0x30
May 17 12:16:24 curpoud kernel: #14 0xffffffff811274a2 at VOP_READDIR_APV+0x32
May 17 12:16:24 curpoud kernel: #15 0xffffffff80c4f05e at kern_getdirentries+0x1ce
May 17 12:16:24 curpoud kernel: #16 0xffffffff80c4f459 at sys_getdirentries+0x29
May 17 12:16:24 curpoud kernel: #17 0xffffffff8105f638 at amd64_syscall+0x158
May 17 12:16:24 curpoud kernel: lock order nfs -> bufwait attempted at:
May 17 12:16:24 curpoud kernel: #0 0xffffffff80bb650b at witness_checkorder+0xb7b
May 17 12:16:24 curpoud kernel: #1 0xffffffff80b0f06e at lockmgr_xlock_hard+0x6e
May 17 12:16:24 curpoud kernel: #2 0xffffffff80b0f910 at __lockmgr_args+0x1e0
May 17 12:16:24 curpoud kernel: #3 0xffffffff80c388e0 at flushbuflist+0x110
May 17 12:16:24 curpoud kernel: #4 0xffffffff80c3859a at bufobj_invalbuf+0x8a
May 17 12:16:24 curpoud kernel: #5 0xffffffff80a26f20 at ncl_vinvalbuf+0xf0
May 17 12:16:24 curpoud kernel: #6 0xffffffff80a17125 at nfs_open+0x1d5
May 17 12:16:24 curpoud kernel: #7 0xffffffff80c25f50 at vop_sigdefer+0x30
May 17 12:16:24 curpoud kernel: #8 0xffffffff811253cf at VOP_OPEN_APV+0x2f
May 17 12:16:24 curpoud kernel: #9 0xffffffff80c52579 at vn_open_vnode+0x1b9
May 17 12:16:24 curpoud kernel: #10 0xffffffff80c51f78 at vn_open_cred+0x598
May 17 12:16:24 curpoud kernel: #11 0xffffffff80c48267 at openatfp+0x287
May 17 12:16:24 curpoud kernel: #12 0xffffffff80c47fbd at sys_openat+0x3d
May 17 12:16:24 curpoud kernel: #13 0xffffffff8105f638 at amd64_syscall+0x158
May 17 12:16:24 curpoud kernel: #14 0xffffffff81030cdb at fast_syscall_common+0xf8
May 17 12:16:24 curpoud kernel: lock order reversal:
May 17 12:16:24 curpoud kernel:  1st 0xfffffe01c4ec9e60 nfsupg (nfsupg, lockmgr) @ /usr/src/sys/fs/nfsclient/nfs_clsubs.c:151
May 17 12:16:24 curpoud kernel:  2nd 0xfffffe006a174e88 bufwait (bufwait, lockmgr) @ /usr/src/sys/kern/vfs_subr.c:2442
May 17 12:16:24 curpoud kernel: lock order nfsupg -> bufwait attempted at:
May 17 12:16:24 curpoud kernel: #0 0xffffffff80bb650b at witness_checkorder+0xb7b
May 17 12:16:24 curpoud kernel: #1 0xffffffff80b0f06e at lockmgr_xlock_hard+0x6e
May 17 12:16:24 curpoud kernel: #2 0xffffffff80b0f910 at __lockmgr_args+0x1e0
May 17 12:16:24 curpoud kernel: #3 0xffffffff80c388e0 at flushbuflist+0x110
May 17 12:16:24 curpoud kernel: #4 0xffffffff80c3859a at bufobj_invalbuf+0x8a
May 17 12:16:24 curpoud kernel: #5 0xffffffff80a26f20 at ncl_vinvalbuf+0xf0
May 17 12:16:24 curpoud kernel: #6 0xffffffff80a25bc8 at ncl_bioread+0x358
May 17 12:16:24 curpoud kernel: #7 0xffffffff80a19858 at nfs_readdir+0x1d8
May 17 12:16:24 curpoud kernel: #8 0xffffffff80c25f50 at vop_sigdefer+0x30
May 17 12:16:24 curpoud kernel: #9 0xffffffff811274a2 at VOP_READDIR_APV+0x32
May 17 12:16:24 curpoud kernel: #10 0xffffffff80c4f05e at kern_getdirentries+0x1ce
May 17 12:16:24 curpoud kernel: #11 0xffffffff80c4f459 at sys_getdirentries+0x29
May 17 12:16:24 curpoud kernel: #12 0xffffffff8105f638 at amd64_syscall+0x158
May 17 12:16:24 curpoud kernel: #13 0xffffffff81030cdb at fast_syscall_common+0xf8
Comment 1 David Gilbert 2024-05-18 19:20:37 UTC
To add configuration details, the NFS mounts are as follows:

vr:/usr/local/poudriere                                              5.2T    188K    5.2T     0%    /d/vr/poudriere
vr:/usr/local/poudriere/data                                         5.2T    205K    5.2T     0%    /d/vr/poudriere/data
vr:/usr/local/poudriere/data/.m                                      5.2T    427K    5.2T     0%    /d/vr/poudriere/data/.m
vr:/usr/local/poudriere/data/cache                                   5.2T     14G    5.2T     0%    /d/vr/poudriere/data/cache
vr:/usr/local/poudriere/data/images                                  5.2T    188K    5.2T     0%    /d/vr/poudriere/data/images
vr:/usr/local/poudriere/data/logs                                    5.2T     49G    5.2T     1%    /d/vr/poudriere/data/logs
vr:/usr/local/poudriere/data/packages                                7.7T    2.5T    5.2T    33%    /d/vr/poudriere/data/packages
vr:/usr/local/poudriere/data/wrkdirs                                 5.2T    188K    5.2T     0%    /d/vr/poudriere/data/wrkdirs
vr:/var/cache/ccache                                                 5.7T    568G    5.2T    10%    /d/vr/ccache

and are put to work thusly:

[2:3:303]root@curpoud:~> ll /usr/local/poudriere/data/
total 3
lrwxr-xr-x  1 root wheel 23 Mar 16  2022 .m -> /d/vr/poudriere/data/.m
lrwxr-xr-x  1 root wheel 26 Mar 16  2022 cache -> /d/vr/poudriere/data/cache
drwxr-xr-x  2 root wheel  2 Mar 16  2022 images
lrwxr-xr-x  1 root wheel 25 Mar 16  2022 logs -> /d/vr/poudriere/data/logs
lrwxr-xr-x  1 root wheel 29 Mar 16  2022 packages -> /d/vr/poudriere/data/packages
drwxr-xr-x  2 root wheel  2 Mar 16  2022 wrkdirs
Comment 2 David Gilbert 2024-05-18 19:23:15 UTC
I suppose I _might_ be burying the lead: after the LORs, I get a bunch of "fileid changed" nfs errors on the VM and then the NFS mounts are basically invalidated.
Comment 3 Konstantin Belousov freebsd_committer freebsd_triage 2024-05-18 22:59:13 UTC
The first LoR looks as a false positive, mostly.  It is readdirplus doing
scan of the directory, and then querying the attributes of each found entry'
node, except dotdot.  So the lock order is parent vnode -> (it's buffers) ->
child vnode, as expected by VFS.

I said "mostly" because server can do the directory move like from A->B to
B->A while client is not aware, and I am not completely sure that our client-
side invalidations would avoid tricking in this situation.  But this is very
unlikely.

For the second LoR, nfsupg->bufwait, the reported order is right.  I wonder
where the reversed order was recorded.

That said, to diagnose the problem, you need to gather the information listed
in the developer's handbook, debugging kernels, debugging deadlocks:
https://docs.freebsd.org/en/books/developers-handbook/kerneldebug/#kerneldebug-deadlocks
Comment 4 Rick Macklem freebsd_committer freebsd_triage 2024-05-18 23:39:48 UTC
(In reply to dgilbert from comment #2)
"fileid changed" normally occurs when multiple NFS
clients have the same /etc/hostid (due to cloning of
system images or ???). This will confuse the server,
since it will think the clients are the same client.

If you have multiple clients doing NFSv4 mounts, make
sure each one of them has a unique /etc/hostid.
(If your NFS mounts are NFSv3, I do not have an explanation
for "fileid changed". Long ago it was caused by a weird
broken NFS cache device, but I doubt any of those still exists.)

If your clients all have unique /etc/hostid's (and had them at
mount time), and you still see problems,
you could try reverting this commit:
fbe965591f8a
I don't see how it could cause a hang, but it is the only
recent change related to directory reading.
Comment 5 David Gilbert 2024-05-19 04:22:17 UTC
It's only causing processes to hang, not the system (so no dump, unless I cause one).

Mounts are nfsv3 ... and only w.r.t. that poudriere.  The host is not running it's poudriere duing this.
Comment 6 Rick Macklem freebsd_committer freebsd_triage 2024-05-19 14:06:17 UTC
(In reply to dgilbert from comment #5)
Since the mounts are NFSv3, try reverting the commit:
fbe965591f8a

If you cannot do it easily via git, I can create a patch
that does the reversion and put it here as an attachment.
Comment 7 David Gilbert 2024-05-19 20:34:54 UTC
I was successfully able to get git to jump through the hoop.  Re-running poudriere now.  Worth noting, suppose, that the LOR's are possibly not part of the issue --- the real issue being the disconnect of nfs.

... but we should know one way or another within 24h.
Comment 8 David Gilbert 2024-05-21 19:05:14 UTC
So the system, with patch, has definitely run the load for about 4x longer than it did before.
Comment 9 Rick Macklem freebsd_committer freebsd_triage 2024-05-21 23:57:18 UTC
Ok, thanks for letting us know.
I'll revert the commit (it is only in main
and was basically just cleanup).

I will leave this open for a while, since I
suspect readahead just hides the lockups.
(I'll think about the LORs.)

Basically, readahead means that the read of the
next directory block begins before the buffer lock
is released for the previous block and before the
vnode lock on the directory vnode is unlocked, if
I recall the code correctly.
Comment 10 commit-hook freebsd_committer freebsd_triage 2024-05-26 15:06:02 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=c68db4608ef63534a001b55de80995e1c3442d2a

commit c68db4608ef63534a001b55de80995e1c3442d2a
Author:     Rick Macklem <rmacklem@FreeBSD.org>
AuthorDate: 2024-05-26 15:02:30 +0000
Commit:     Rick Macklem <rmacklem@FreeBSD.org>
CommitDate: 2024-05-26 15:02:30 +0000

    Revert "nfscl: Do not do readahead for directories"

    The PR reported hangs that were avoided when this commit was
    reverted.  Since it was only a cleanup, revert it.
    The LORs in the PR need further investigation, since I think
    readahead only hides the problem.

    PR:     279138
    This reverts commit fbe965591f8a0a32c805a279a2505d4c20d22d26.

 sys/fs/nfsclient/nfs_clbio.c | 31 ++++++++++++++++++++++++++++++-
 1 file changed, 30 insertions(+), 1 deletion(-)
Comment 11 David Gilbert 2024-05-27 04:18:18 UTC
Created attachment 251011 [details]
core.txt from a panic that may be related.

The headline, I suppose, is:

panic: deadlres_td_sleep_q: possible deadlock detected for 0xfffff8000f59a740 (uma), blocked for 180134 ticks

... which seems to be coming from here:

#10 0xffffffff80b41c20 in vpanic (fmt=0xffffffff811e2be0 "%s: possible deadlock detected for %p (%s), blocked for %d ticks\n",
    ap=ap@entry=0xfffffe015abc0e90) at /usr/src/sys/kern/kern_shutdown.c:967
        buf = "deadlres_td_sleep_q: possible deadlock detected for 0xfffff8000f59a740 (uma), blocked for 180134 ticks\n", '\000'
 <repeats 152 times>
        __pc = 0x0
        __pc = 0x0
        __pc = 0x0
        other_cpus = {__bits = {32703, 0 <repeats 15 times>}}
        td = 0xfffff8000591d740
        bootopt = <optimized out>
        newpanic = <optimized out>
#11 0xffffffff80b41aa3 in panic (fmt=0xffffffff81b88480 <cnputs_mtx> "G\366\023\201\377\377\377\377")
    at /usr/src/sys/kern/kern_shutdown.c:892
        ap = {{gp_offset = 40, fp_offset = 48, overflow_arg_area = 0xfffffe015abc0ec0, reg_save_area = 0xfffffe015abc0e60}}
#12 0xffffffff80ac9f0a in deadlkres () at /usr/src/sys/kern/kern_clock.c:197
        tryl = <optimized out>
        slpticks = 180000
        p = 0xfffffe0161230b00
        td = 0xfffff8000f59a740
        blkticks = <optimized out>
#13 0xffffffff80af7152 in fork_exit (callout=0xffffffff80ac9be0 <deadlkres>, arg=0x0, frame=0xfffffe015abc0f40)
    at /usr/src/sys/kern/kern_fork.c:1151
        __pc = 0x0
        __pc = 0x0
        td = 0xfffff8000591d740
        p = 0xffffffff81b87120 <proc0>
        dtd = <optimized out>
#14 <signal handler called>
No locals.