Setup:

host: 14.0-p2, Threadripper 1900 (16 threads), 128 GB RAM, 120 TB ZFS
vm:   15.0-CURRENT #5 main-n270190-75529910f77a: Thu May 16 19:08:46 EDT 2024,
      64 GB RAM, 15 processors, 200 GB ZFS (zvol on host)

The host runs poudriere for releases <= 14; the VM runs the poudriere for 15 (they do not run at the same time). The VM mounts /usr/local/poudriere via NFS. Multiple times, with at least one make-world on a fresh pull in between, the whole thing stops (it didn't used to, on code from 2 to 3 months ago). It seems to happen when this triple reversal occurs:

May 17 12:16:24 curpoud kernel: lock order reversal:
May 17 12:16:24 curpoud kernel:  1st 0xfffff802dcae0230 nfs (nfs, lockmgr) @ /usr/src/sys/kern/vfs_subr.c:3298
May 17 12:16:24 curpoud kernel:  2nd 0xfffffe006a1633c8 bufwait (bufwait, lockmgr) @ /usr/src/sys/kern/vfs_subr.c:2442
May 17 12:16:24 curpoud kernel: lock order bufwait -> nfs established at:
May 17 12:16:24 curpoud kernel: #0 0xffffffff80bb5ca5 at witness_checkorder+0x315
May 17 12:16:24 curpoud kernel: #1 0xffffffff80b0e962 at lockmgr_lock_flags+0x172
May 17 12:16:24 curpoud kernel: #2 0xffffffff80a1990c at nfs_lock+0x2c
May 17 12:16:24 curpoud kernel: #3 0xffffffff80c25f50 at vop_sigdefer+0x30
May 17 12:16:24 curpoud kernel: #4 0xffffffff80c52783 at _vn_lock+0x53
May 17 12:16:24 curpoud kernel: #5 0xffffffff80c3a22d at vget_finish+0x4d
May 17 12:16:24 curpoud kernel: #6 0xffffffff80c28b91 at vfs_hash_get+0xd1
May 17 12:16:24 curpoud kernel: #7 0xffffffff80a2293b at nfscl_nget+0x13b
May 17 12:16:24 curpoud kernel: #8 0xffffffff80a0a2d8 at nfsrpc_readdirplus+0xa98
May 17 12:16:24 curpoud kernel: #9 0xffffffff80a150a0 at ncl_readdirplusrpc+0xf0
May 17 12:16:24 curpoud kernel: #10 0xffffffff80a26cbc at ncl_doio+0x47c
May 17 12:16:24 curpoud kernel: #11 0xffffffff80a25e5f at ncl_bioread+0x5ef
May 17 12:16:24 curpoud kernel: #12 0xffffffff80a19858 at nfs_readdir+0x1d8
May 17 12:16:24 curpoud kernel: #13 0xffffffff80c25f50 at vop_sigdefer+0x30
May 17 12:16:24 curpoud kernel: #14 0xffffffff811274a2 at VOP_READDIR_APV+0x32
May 17 12:16:24 curpoud kernel: #15 0xffffffff80c4f05e at kern_getdirentries+0x1ce
May 17 12:16:24 curpoud kernel: #16 0xffffffff80c4f459 at sys_getdirentries+0x29
May 17 12:16:24 curpoud kernel: #17 0xffffffff8105f638 at amd64_syscall+0x158
May 17 12:16:24 curpoud kernel: lock order nfs -> bufwait attempted at:
May 17 12:16:24 curpoud kernel: #0 0xffffffff80bb650b at witness_checkorder+0xb7b
May 17 12:16:24 curpoud kernel: #1 0xffffffff80b0f06e at lockmgr_xlock_hard+0x6e
May 17 12:16:24 curpoud kernel: #2 0xffffffff80b0f910 at __lockmgr_args+0x1e0
May 17 12:16:24 curpoud kernel: #3 0xffffffff80c388e0 at flushbuflist+0x110
May 17 12:16:24 curpoud kernel: #4 0xffffffff80c3859a at bufobj_invalbuf+0x8a
May 17 12:16:24 curpoud kernel: #5 0xffffffff80a26f20 at ncl_vinvalbuf+0xf0
May 17 12:16:24 curpoud kernel: #6 0xffffffff80a17125 at nfs_open+0x1d5
May 17 12:16:24 curpoud kernel: #7 0xffffffff80c25f50 at vop_sigdefer+0x30
May 17 12:16:24 curpoud kernel: #8 0xffffffff811253cf at VOP_OPEN_APV+0x2f
May 17 12:16:24 curpoud kernel: #9 0xffffffff80c52579 at vn_open_vnode+0x1b9
May 17 12:16:24 curpoud kernel: #10 0xffffffff80c51f78 at vn_open_cred+0x598
May 17 12:16:24 curpoud kernel: #11 0xffffffff80c48267 at openatfp+0x287
May 17 12:16:24 curpoud kernel: #12 0xffffffff80c47fbd at sys_openat+0x3d
May 17 12:16:24 curpoud kernel: #13 0xffffffff8105f638 at amd64_syscall+0x158
May 17 12:16:24 curpoud kernel: #14 0xffffffff81030cdb at fast_syscall_common+0xf8
May 17 12:16:24 curpoud kernel: lock order reversal:
May 17 12:16:24 curpoud kernel:  1st 0xfffffe01c4ec9e60 nfsupg (nfsupg, lockmgr) @ /usr/src/sys/fs/nfsclient/nfs_clsubs.c:151
May 17 12:16:24 curpoud kernel:  2nd 0xfffffe006a174e88 bufwait (bufwait, lockmgr) @ /usr/src/sys/kern/vfs_subr.c:2442
May 17 12:16:24 curpoud kernel: lock order nfsupg -> bufwait attempted at:
May 17 12:16:24 curpoud kernel: #0 0xffffffff80bb650b at witness_checkorder+0xb7b
May 17 12:16:24 curpoud kernel: #1 0xffffffff80b0f06e at lockmgr_xlock_hard+0x6e
May 17 12:16:24 curpoud kernel: #2 0xffffffff80b0f910 at __lockmgr_args+0x1e0
May 17 12:16:24 curpoud kernel: #3 0xffffffff80c388e0 at flushbuflist+0x110
May 17 12:16:24 curpoud kernel: #4 0xffffffff80c3859a at bufobj_invalbuf+0x8a
May 17 12:16:24 curpoud kernel: #5 0xffffffff80a26f20 at ncl_vinvalbuf+0xf0
May 17 12:16:24 curpoud kernel: #6 0xffffffff80a25bc8 at ncl_bioread+0x358
May 17 12:16:24 curpoud kernel: #7 0xffffffff80a19858 at nfs_readdir+0x1d8
May 17 12:16:24 curpoud kernel: #8 0xffffffff80c25f50 at vop_sigdefer+0x30
May 17 12:16:24 curpoud kernel: #9 0xffffffff811274a2 at VOP_READDIR_APV+0x32
May 17 12:16:24 curpoud kernel: #10 0xffffffff80c4f05e at kern_getdirentries+0x1ce
May 17 12:16:24 curpoud kernel: #11 0xffffffff80c4f459 at sys_getdirentries+0x29
May 17 12:16:24 curpoud kernel: #12 0xffffffff8105f638 at amd64_syscall+0x158
May 17 12:16:24 curpoud kernel: #13 0xffffffff81030cdb at fast_syscall_common+0xf8
To add configuration details, the NFS mounts are as follows:

vr:/usr/local/poudriere                 5.2T  188K  5.2T   0%  /d/vr/poudriere
vr:/usr/local/poudriere/data            5.2T  205K  5.2T   0%  /d/vr/poudriere/data
vr:/usr/local/poudriere/data/.m         5.2T  427K  5.2T   0%  /d/vr/poudriere/data/.m
vr:/usr/local/poudriere/data/cache      5.2T   14G  5.2T   0%  /d/vr/poudriere/data/cache
vr:/usr/local/poudriere/data/images     5.2T  188K  5.2T   0%  /d/vr/poudriere/data/images
vr:/usr/local/poudriere/data/logs       5.2T   49G  5.2T   1%  /d/vr/poudriere/data/logs
vr:/usr/local/poudriere/data/packages   7.7T  2.5T  5.2T  33%  /d/vr/poudriere/data/packages
vr:/usr/local/poudriere/data/wrkdirs    5.2T  188K  5.2T   0%  /d/vr/poudriere/data/wrkdirs
vr:/var/cache/ccache                    5.7T  568G  5.2T  10%  /d/vr/ccache

and are put to work thusly:

[2:3:303]root@curpoud:~> ll /usr/local/poudriere/data/
total 3
lrwxr-xr-x  1 root  wheel  23 Mar 16  2022 .m -> /d/vr/poudriere/data/.m
lrwxr-xr-x  1 root  wheel  26 Mar 16  2022 cache -> /d/vr/poudriere/data/cache
drwxr-xr-x  2 root  wheel   2 Mar 16  2022 images
lrwxr-xr-x  1 root  wheel  25 Mar 16  2022 logs -> /d/vr/poudriere/data/logs
lrwxr-xr-x  1 root  wheel  29 Mar 16  2022 packages -> /d/vr/poudriere/data/packages
drwxr-xr-x  2 root  wheel   2 Mar 16  2022 wrkdirs
I suppose I _might_ be burying the lede: after the LORs, I get a bunch of "fileid changed" NFS errors on the VM, and then the NFS mounts are effectively invalidated.
The first LOR looks like a false positive, mostly. It is readdirplus doing a scan of the directory and then querying the attributes of each found entry's node, except dotdot. So the lock order is parent vnode -> (its buffers) -> child vnode, as expected by VFS. I said "mostly" because the server can do a directory move, like from A->B to B->A, while the client is unaware, and I am not completely sure that our client-side invalidation would avoid tripping in this situation. But this is very unlikely.

For the second LOR, nfsupg -> bufwait, the reported order is right; I wonder where the reversed order was recorded.

That said, to diagnose the problem you need to gather the information listed in the developer's handbook (Kernel Debugging, Debugging Deadlocks): https://docs.freebsd.org/en/books/developers-handbook/kerneldebug/#kerneldebug-deadlocks
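To illustrate the pattern WITNESS is checking here, below is a minimal userspace sketch in plain C with POSIX threads, not the kernel code itself; the two mutexes are stand-ins for the nfs vnode lock and the bufwait buffer lock from the traces above. Two code paths acquire the same pair of locks in opposite orders, which is exactly the situation that can deadlock:

/*
 * Illustrative stand-in for the LOR above: thread A mimics the
 * readdirplus path (buffer lock, then child vnode lock) and thread B
 * mimics the open/invalbuf path (vnode lock, then buffer lock).
 * With unlucky timing the two threads deadlock; WITNESS flags the
 * inconsistent ordering even on runs where no deadlock occurs.
 * Build with: cc -o lor lor.c -lpthread
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t vnode_lock = PTHREAD_MUTEX_INITIALIZER; /* "nfs" */
static pthread_mutex_t buf_lock = PTHREAD_MUTEX_INITIALIZER;   /* "bufwait" */

static void *
readdir_path(void *arg)         /* established order: bufwait -> nfs */
{
        (void)arg;
        pthread_mutex_lock(&buf_lock);
        usleep(1000);           /* widen the race window */
        pthread_mutex_lock(&vnode_lock);
        pthread_mutex_unlock(&vnode_lock);
        pthread_mutex_unlock(&buf_lock);
        return (NULL);
}

static void *
open_path(void *arg)            /* attempted order: nfs -> bufwait */
{
        (void)arg;
        pthread_mutex_lock(&vnode_lock);
        usleep(1000);
        pthread_mutex_lock(&buf_lock);
        pthread_mutex_unlock(&buf_lock);
        pthread_mutex_unlock(&vnode_lock);
        return (NULL);
}

int
main(void)
{
        pthread_t a, b;

        pthread_create(&a, NULL, readdir_path, NULL);
        pthread_create(&b, NULL, open_path, NULL);
        pthread_join(a, NULL);  /* may never return: deadlock */
        pthread_join(b, NULL);
        printf("no deadlock this run\n");
        return (0);
}

As the analysis above notes, the kernel case is subtler: in the readdirplus path the vnode being locked is a child of the directory whose buffer is held, not the same vnode, which is why the first report is mostly a false positive.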
(In reply to dgilbert from comment #2)

"fileid changed" normally occurs when multiple NFS clients have the same /etc/hostid (due to cloning of system images or the like). This confuses the server, since it will think the clients are the same client. If you have multiple clients doing NFSv4 mounts, make sure each one of them has a unique /etc/hostid. (If your NFS mounts are NFSv3, I do not have an explanation for "fileid changed". Long ago it was caused by a weird broken NFS cache device, but I doubt any of those still exist.)

If your clients all have unique /etc/hostid values (and had them at mount time) and you still see problems, you could try reverting this commit: fbe965591f8a. I don't see how it could cause a hang, but it is the only recent change related to directory reading.
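If it helps anyone checking a fleet of clients, a tiny program along these lines prints the values to compare across machines. (This is a hypothetical helper written for this report; it assumes the stock kern.hostuuid and kern.hostid sysctls, which /etc/hostid seeds at boot.)

/*
 * Print the host UUID and numeric hostid so they can be compared
 * across NFS clients; every client should report a unique value.
 */
#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdio.h>

int
main(void)
{
        char uuid[64];
        unsigned long hostid;
        size_t len;

        len = sizeof(uuid);
        if (sysctlbyname("kern.hostuuid", uuid, &len, NULL, 0) == 0)
                printf("kern.hostuuid: %s\n", uuid);

        len = sizeof(hostid);
        if (sysctlbyname("kern.hostid", &hostid, &len, NULL, 0) == 0)
                printf("kern.hostid:   %lu\n", hostid);

        return (0);
}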
It's only causing processes to hang, not the whole system (so no dump, unless I force one). The mounts are NFSv3 ... and the hangs are only w.r.t. that poudriere. The host is not running its own poudriere during this.
(In reply to dgilbert from comment #5)

Since the mounts are NFSv3, try reverting the commit fbe965591f8a. If you cannot do it easily via git, I can create a patch that does the reversion and put it here as an attachment.
I was able to get git to jump through the hoops. Re-running poudriere now. Worth noting, I suppose, that the LORs are possibly not part of the issue --- the real issue being the NFS disconnect. ... but we should know one way or the other within 24h.
So the system, with the patch, has definitely run the load for about 4x longer than it did before.
Ok, thanks for letting us know. I'll revert the commit (it is only in main and was basically just cleanup). I will leave this open for a while, since I suspect readahead just hides the lockups. (I'll think about the LORs.)

Basically, readahead means that the read of the next directory block begins before the buffer lock is released for the previous block and before the vnode lock on the directory vnode is released, if I recall the code correctly.
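To make that timing concrete, here is a rough userspace model; read_block and the block loop are hypothetical stand-ins for the READDIR RPC and the client's buffer loop, not the actual ncl_bioread code. The point is that the request for block n+1 is already in flight while block n is still held and being consumed, so the moment the I/O completes, relative to the locks being dropped, is different from the no-readahead case.

/*
 * Schematic model of one-block directory readahead.  read_block()
 * stands in for the READDIR RPC; the usleep() in main() stands in
 * for parsing the entries of the current block.
 * Build with: cc -o ra ra.c -lpthread
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define NBLOCKS 4

static void *
read_block(void *arg)           /* stand-in for the READDIR RPC */
{
        int n = *(int *)arg;

        usleep(10000);          /* pretend the RPC takes a while */
        printf("I/O complete for block %d\n", n);
        return (NULL);
}

int
main(void)
{
        pthread_t ra;
        int cur, next;

        for (cur = 0; cur < NBLOCKS; cur++) {
                /* Issue the read of block cur + 1 before block cur
                 * is released, mirroring the readahead behaviour. */
                if (cur + 1 < NBLOCKS) {
                        next = cur + 1;
                        pthread_create(&ra, NULL, read_block, &next);
                }
                printf("consuming block %d\n", cur);
                usleep(5000);   /* stand-in for parsing entries */
                if (cur + 1 < NBLOCKS)
                        pthread_join(ra, NULL);
        }
        return (0);
}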
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=c68db4608ef63534a001b55de80995e1c3442d2a

commit c68db4608ef63534a001b55de80995e1c3442d2a
Author:     Rick Macklem <rmacklem@FreeBSD.org>
AuthorDate: 2024-05-26 15:02:30 +0000
Commit:     Rick Macklem <rmacklem@FreeBSD.org>
CommitDate: 2024-05-26 15:02:30 +0000

    Revert "nfscl: Do not do readahead for directories"

    The PR reported hangs that were avoided when this commit was
    reverted.  Since it was only a cleanup, revert it.

    The LORs in the PR need further investigation, since I think
    readahead only hides the problem.

    PR: 279138

    This reverts commit fbe965591f8a0a32c805a279a2505d4c20d22d26.

 sys/fs/nfsclient/nfs_clbio.c | 31 ++++++++++++++++++++++++++++++-
 1 file changed, 30 insertions(+), 1 deletion(-)
Created attachment 251011 [details]
core.txt from a panic that may be related.

The headline, I suppose, is:

panic: deadlres_td_sleep_q: possible deadlock detected for 0xfffff8000f59a740 (uma), blocked for 180134 ticks

... which seems to be coming from here:

#10 0xffffffff80b41c20 in vpanic (fmt=0xffffffff811e2be0 "%s: possible deadlock detected for %p (%s), blocked for %d ticks\n", ap=ap@entry=0xfffffe015abc0e90) at /usr/src/sys/kern/kern_shutdown.c:967
        buf = "deadlres_td_sleep_q: possible deadlock detected for 0xfffff8000f59a740 (uma), blocked for 180134 ticks\n", '\000' <repeats 152 times>
        __pc = 0x0
        __pc = 0x0
        __pc = 0x0
        other_cpus = {__bits = {32703, 0 <repeats 15 times>}}
        td = 0xfffff8000591d740
        bootopt = <optimized out>
        newpanic = <optimized out>
#11 0xffffffff80b41aa3 in panic (fmt=0xffffffff81b88480 <cnputs_mtx> "G\366\023\201\377\377\377\377") at /usr/src/sys/kern/kern_shutdown.c:892
        ap = {{gp_offset = 40, fp_offset = 48, overflow_arg_area = 0xfffffe015abc0ec0, reg_save_area = 0xfffffe015abc0e60}}
#12 0xffffffff80ac9f0a in deadlkres () at /usr/src/sys/kern/kern_clock.c:197
        tryl = <optimized out>
        slpticks = 180000
        p = 0xfffffe0161230b00
        td = 0xfffff8000f59a740
        blkticks = <optimized out>
#13 0xffffffff80af7152 in fork_exit (callout=0xffffffff80ac9be0 <deadlkres>, arg=0x0, frame=0xfffffe015abc0f40) at /usr/src/sys/kern/kern_fork.c:1151
        __pc = 0x0
        __pc = 0x0
        td = 0xfffff8000591d740
        p = 0xffffffff81b87120 <proc0>
        dtd = <optimized out>
#14 <signal handler called>
No locals.
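For reference, that panic comes from the kernel's deadlock-resolver thread (deadlkres in kern_clock.c, only present on kernels built with options DEADLKRES), which panics when a thread has been blocked or asleep on a lock beyond a threshold; the slpticks = 180000 in frame #12 is that threshold in ticks, and the "blocked for 180134 ticks" message shows the uma thread just crossing it. The thresholds are readable as sysctls; a small sketch, assuming the stock debug.deadlkres OIDs:

/*
 * Read the deadlock-resolver thresholds that trigger the
 * "possible deadlock detected" panic above.  Values are in
 * seconds of block/sleep time.
 */
#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdio.h>

static void
show(const char *oid)
{
        int val;
        size_t len = sizeof(val);

        if (sysctlbyname(oid, &val, &len, NULL, 0) == 0)
                printf("%s = %d\n", oid, val);
        else
                printf("%s: not available (kernel without DEADLKRES?)\n", oid);
}

int
main(void)
{
        show("debug.deadlkres.blktime_threshold");
        show("debug.deadlkres.slptime_threshold");
        show("debug.deadlkres.sleepfreq");
        return (0);
}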