Created attachment 239342 [details] textdump.tar.1.xz Fatal trap 12: page fault while in kernel mode cpuid = 1; apic id = 02 fault virtual address = 0x1ac fault code = supervisor write data, page not present instruction pointer = 0x20:0xffffffff80ce3c7c stack pointer = 0x28:0xfffffe00edb6fce0 frame pointer = 0x28:0xfffffe00edb6fce0 code segment = base rx0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 1782 (nfsd: master) rdi: 0 rsi: 0 rdx: 0 rcx: fffffe0093c7c3c0 r8: 80 r9: 10000 rax: ffffffff rbx: fffffe00edb701a8 rbp: fffffe00edb6fce0 r10: 1 r11: 10000 r12: fffffe00edb6ff78 r13: 0 r14: 1 r15: fffffe00edb6ffd0 trap number = 12 panic: page fault cpuid = 1 time = 1673187203 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00edb6faa0 vpanic() at vpanic+0x151/frame 0xfffffe00edb6faf0 panic() at panic+0x43/frame 0xfffffe00edb6fb50 trap_fatal() at trap_fatal+0x409/frame 0xfffffe00edb6fbb0 trap_pfault() at trap_pfault+0xab/frame 0xfffffe00edb6fc10 calltrap() at calltrap+0x8/frame 0xfffffe00edb6fc10 --- trap 0xc, rip = 0xffffffff80ce3c7c, rsp = 0xfffffe00edb6fce0, rbp = 0xfffffe00edb6fce0 --- vrele() at vrele+0xc/frame 0xfffffe00edb6fce0 nfsvno_open() at nfsvno_open+0x1c7/frame 0xfffffe00edb6fd70 nfsrvd_open() at nfsrvd_open+0xe51/frame 0xfffffe00edb701f0 nfsrvd_dorpc() at nfsrvd_dorpc+0x1386/frame 0xfffffe00edb70400 nfssvc_program() at nfssvc_program+0x5e2/frame 0xfffffe00edb705e0 svc_run_internal() at svc_run_internal+0xa42/frame 0xfffffe00edb70710 svc_run() at svc_run+0x250/frame 0xfffffe00edb70770 nfsrvd_nfsd() at nfsrvd_nfsd+0x338/frame 0xfffffe00edb708d0 nfssvc_nfsd() at nfssvc_nfsd+0x524/frame 0xfffffe00edb70de0 sys_nfssvc() at sys_nfssvc+0xb0/frame 0xfffffe00edb70e00 amd64_syscall() at amd64_syscall+0x12e/frame 0xfffffe00edb70f30 fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00edb70f30 --- syscall (155, FreeBSD ELF64, nfssvc), rip = 0x28e1eabcef2a, rsp = 0x28e1e8317598, rbp = 0x28e1e8317830 --- KDB: enter: panic
Created attachment 239354 [details] revert removal of a check for ni_startdir being non-NULL This patch reverts a small part of 65127e982b94, which I think caused this crash.
I believe this crash was caused by 65127e982b94 which removed a check for ni_startdir being non-NULL before doing vrele()s for the create case with nd_repstat set non-zero. The patch I attached reverts this change and, I think, will fix the crash. 651273982b94 is dated Nov. 10. I have no idea if your kernel version has this patch, but I suspect it does.
textdump.tar.1.xz at comment #0 comprises five files: % ls -hln total 1 -rw------- 1 1002 0 4.3K 8 Jan 14:13 config.txt -rw------- 1 1002 0 48K 8 Jan 14:13 ddb.txt -rw------- 1 1002 0 20K 8 Jan 14:13 msgbuf.txt -rw------- 1 1002 0 10B 8 Jan 14:13 panic.txt -rw------- 1 1002 0 154B 8 Jan 14:13 version.txt % (In reply to Rick Macklem from comment #2) Thank you, > … 651273982b94 is dated Nov. 10. … version.txt (above) gives us: FreeBSD 14.0-CURRENT #16 main-n259921-17f2b12a3877-dirty: Tue Jan 3 18:13:32 JST 2023 ishizuka@okra.ish.org:/usr/obj/usr/src/amd64.amd64/sys/SG_UDF2
(In reply to Rick Macklem from comment #2) Thank you to give me a patch. After applying this patch, it works fine.
Part of the point of the aformentioned change was to stop exporting ni_startdir. I probably should have added some poisononing of that value so that this would have been found sooner. Anyhow, for vrele to be called in this case both dvp and vp have to be null, which means the lookup failed to find anything? Should this code even be reachable in such a case? As a quick hack the following will probably do it: diff --git a/sys/fs/nfsserver/nfs_nfsdport.c b/sys/fs/nfsserver/nfs_nfsdport.c index 665e2c00ce08..012f84f8befa 100644 --- a/sys/fs/nfsserver/nfs_nfsdport.c +++ b/sys/fs/nfsserver/nfs_nfsdport.c @@ -1918,7 +1918,7 @@ nfsvno_open(struct nfsrv_descript *nd, struct nameidata *ndp, } } else { nfsvno_relpathbuf(ndp); - if (create == NFSV4OPEN_CREATE) { + if (create == NFSV4OPEN_CREATE && ndp->ni_dvp != NULL) { if (ndp->ni_dvp == ndp->ni_vp) vrele(ndp->ni_dvp); else
I'm not sure if your patch will work, since it depends on ni_dvp being set to NULL long before any call to nfsvno_namei()/vfs_lookup(). That code could be added, but I will need to look to see it gets set non-NULL in the correct place. What about ni_rootdir. Can I use that one? Basically, it is a test "has nfsvno_namei() been called", done by setting the field NULL at the beginning of the function and then setting in non-NULL in nfsvno_namei() at the correct place (ni_rootdir gets set in the same place as ni_startdir).
Created attachment 239370 [details] Check ni_dvp is not NULL before attempting vrele() This version of the patch should fix the problem as well. It does what mjg@ suggested plus initializes ni_dvp NULL. This is necessary, since nfsrv_parsename() can set nd_repstat so that nfsvno_namei() does not get called (it calls vfs_lookup(), which would set ni_dvp). This avoids use of ni_startdir as requested by mjg@. If anyone can test this patch variant, that would be appreciated.
as there is only one consumer of nfsrvd_open, perhaps you can denote that namei was called in the struct you pass around? admittedly the ni_dvp thing is also a giant hack, the content of nameidata should be considered indeterminate if namei was not called
Created attachment 239398 [details] Add an argument to nfsvno_open() to indicate if nfsvno_namei() was called This version of the patch uses a new argument to nfsvno_open() to indicate if nfsvno_namei() has been called before nfsvno_open(), then only do the vrele() if it has been called (which indicates that ni_dvp will be properly set). Maybe this version will make mjg@ happy?
as long as it does not depend on internal namei state (which ni_stardir now is) it is not my business :) I'll see about hiding ni_stardir behind a setter and otherwise making it not available.
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=dcfa3ee44da2b139f51a8aedb0f55735c6dfe3f3 commit dcfa3ee44da2b139f51a8aedb0f55735c6dfe3f3 Author: Rick Macklem <rmacklem@FreeBSD.org> AuthorDate: 2023-01-13 00:45:26 +0000 Commit: Rick Macklem <rmacklem@FreeBSD.org> CommitDate: 2023-01-13 00:48:53 +0000 nfsserver: Fix vrele() panic in nfsvno_open() Commit 65127e982b94 removed a check for ni_startdir != NULL. This allowed the vrele(ndp->ni_dvp) to be called with a NULL argument. This patch adds a new boolean argument to nfsvno_open() that can be checked instead of ni_startdir, since mjg@ requested that ni_startdir not be used. (Discussed in PR#268828.) PR: 268828 Reviewed by: mjg Differential Revision: https://reviews.freebsd.org/D38032 sys/fs/nfs/nfs_var.h | 2 +- sys/fs/nfsserver/nfs_nfsdport.c | 4 ++-- sys/fs/nfsserver/nfs_nfsdserv.c | 13 ++++++++----- 3 files changed, 11 insertions(+), 8 deletions(-)
Patch has been committed.