Created attachment 239485 [details] textdump.tar.last Fatal trap 12: page fault while in kernel mode cpuid = 3; apic id = 06 fault virtual address = 0x1ac fault code = supervisor write data, page not present instruction pointer = 0x20:0xffffffff80ce4f1c stack pointer = 0x28:0xfffffe00edb17cd0 frame pointer = 0x28:0xfffffe00edb17cd0 code segment = base rx0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 1782 (nfsd: master) rdi: 0 rsi: 0 rdx: 0 rcx: fffffe0093c7c380 r8: 80 r9: 10000 rax: ffffffff rbx: fffffe00edb181a8 rbp: fffffe00edb17cd0 r10: 1 r11: 10000 r12: fffffe00edb17f78 r13: 0 r14: 1 r15: fffffe00edb17fd0 trap number = 12 panic: page fault cpuid = 3 time = 1673789502 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00edb17a90 vpanic() at vpanic+0x151/frame 0xfffffe00edb17ae0 panic() at panic+0x43/frame 0xfffffe00edb17b40 trap_fatal() at trap_fatal+0x409/frame 0xfffffe00edb17ba0 trap_pfault() at trap_pfault+0xab/frame 0xfffffe00edb17c00 calltrap() at calltrap+0x8/frame 0xfffffe00edb17c00 --- trap 0xc, rip = 0xffffffff80ce4f1c, rsp = 0xfffffe00edb17cd0, rbp = 0xfffffe00edb17cd0 --- vrele() at vrele+0xc/frame 0xfffffe00edb17cd0 nfsvno_open() at nfsvno_open+0x1d1/frame 0xfffffe00edb17d60 nfsrvd_open() at nfsrvd_open+0xe84/frame 0xfffffe00edb181f0 nfsrvd_dorpc() at nfsrvd_dorpc+0x1386/frame 0xfffffe00edb18400 nfssvc_program() at nfssvc_program+0x5e2/frame 0xfffffe00edb185e0 svc_run_internal() at svc_run_internal+0xa42/frame 0xfffffe00edb18710 svc_run() at svc_run+0x250/frame 0xfffffe00edb18770 nfsrvd_nfsd() at nfsrvd_nfsd+0x338/frame 0xfffffe00edb188d0 nfssvc_nfsd() at nfssvc_nfsd+0x524/frame 0xfffffe00edb18de0 sys_nfssvc() at sys_nfssvc+0xb0/frame 0xfffffe00edb18e00 amd64_syscall() at amd64_syscall+0x12e/frame 0xfffffe00edb18f30 fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00edb18f30 --- syscall (155, FreeBSD ELF64, nfssvc), rip = 0x20bd03ff250a, rsp = 0x20bd0230b868, rbp = 0x20bd0230bb00 --- KDB: enter: panic
(In reply to Masachika ISHIZUKA from comment #0) I can avoid this panic by applying the patch of https://bugs.freebsd.org/bugzilla/attachment.cgi?id=239354 .
over to maintainer
this should already be fixed as of https://cgit.FreeBSD.org/src/commit/?id=dcfa3ee44da2b139f51a8aedb0f55735c6dfe3f3 are you still seeing the problem?
(In reply to Mateusz Guzik from comment #3) Yes. It panicked on dcfa3ee44da2b139f51a8aedb0f55735c6dfe3f3 and after. I can avoid this panic by applying the patch below. % diff -ruN /sys/fs/nfsserver/nfs_nfsdport.c.orig /sys/fs/nfsserver/nfs_nfsdport.c @@ -1918,7 +1918,7 @@ } } else { nfsvno_relpathbuf(ndp); - if (ndp->ni_startdir && create == NFSV4OPEN_CREATE) { + if (ndp->ni_startdir && done_namei && create == NFSV4OPEN_CREATE) { if (ndp->ni_dvp == ndp->ni_vp) vrele(ndp->ni_dvp); else
(In reply to Mateusz Guzik from comment #3) Sorry, the patch of comment #4 is incorrect. I wrote the patch on this reply by hand and I miss wrote. Yes. It panicked on dcfa3ee44da2b139f51a8aedb0f55735c6dfe3f3 and after. I can avoid this panic by applying the patch below. % diff -ruN /sys/fs/nfsserver/nfs_nfsdport.c.orig /sys/fs/nfsserver/nfs_nfsdport.c @@ -1918,7 +1918,7 @@ } } else { nfsvno_relpathbuf(ndp); - if (done_namei && create == NFSV4OPEN_CREATE) { + if (ndp->ni_startdir && done_namei && create == NFSV4OPEN_CREATE) { if (ndp->ni_dvp == ndp->ni_vp) vrele(ndp->ni_dvp); else
Created attachment 239922 [details] fix done_namei so it is set true exactly when ni_statdir is set non-NULL Please try this patch on a system with the done_namei patch instead of checking ni_startdir. With this patch, done_namei should be set true exactly when ni_startdir is set non-NULL. I, personally, do not understand the problem with using ni_startdir. Maybe mjg@ can explain that. Please let us know if this fixes the creashes?
Created attachment 239938 [details] textdump.tar.last Thank you for the patch. Unfortunately, this patch doesn't fix the crash.
Created attachment 239948 [details] Fix done_namei so it is only set when nd_repstat is set after nfsvno_namei Sorry about the previous failures. I looked at the code more carefully and came to the conclusion that the only time the code in that "if" near the end of nfsvno_open() needs to be executed is when nfsvno_namei() succeeds, but nd_repstat is set non-0 after the nfsvno_namei() call. I also noticed that nfsvno_relpathbuf() would be called twice for the unusual case of nfsrv_parsename() setting nd_repstat non-0. Hopefully this patch works correctly. It should be applied to what is in main.
Oops, don't test the patch I put here a few minutes ago. I got the fix for nfsvno_relpathbuf() wrong. I put an updated version here later to-day.
Created attachment 239962 [details] Fix done_namei so it is only set when nd_repstat is set after nfsvno_namei Ok, I think this version handles nfsvno_relpathbuf() correctly. It only executes the code near the end of nfsvno_open() when done_namei is true and it is only set true when nd_repstat is set after nfsvo_namei() has returned 0 (which implies the pathbuf is still allocated and ni_dvp, ni_vp are valid). Hopefully this one will fix the crashes.
(In reply to Rick Macklem from comment #10) Thank you for patch. Until yesterday, the reproducibility of the panic was 100%, but for some reason, from this morning, regardless of the presence or absence of the patch, the panic has stopped. Therefore, I cannot report whether this patch is effective or not at this time. In the future, I will report again when I make progress.
(In reply to Masachika ISHIZUKA from comment #11) Thank you for good patch. After that, my PC sometimes panicked on non patched GENERIC-main-n260675-de59f46a825b, but it never panicked on patched one. Also, the unpatched one cannot be shutdown (shutdown process hangs), but the patched one can shutdown normally.
(In reply to Masachika ISHIZUKA from comment #12) >γthe patched one can shutdown normally. This was a mistake. This is a separate issue from this bug report, so I will issue another bug report if I find out anything.
Just to be crystal clear, you are not getting crashes with the most recent patch? Sorry for the grief this has caused you. I should have looked at the code more carefully the first time instead of just trying to replace the use of ni_startdir (which stopped the crashes, but was broken for the case where the guarded create fails after nfsvno_namei() succeeds). And thanks for doing all the testing, rick.
(In reply to Rick Macklem from comment #14) Yes. It doesn't crash after applying the latest patch. And it sometimes crashes withouy the latest patch.
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=ded5f2954e1a1bb7748646888938af767ee6257a commit ded5f2954e1a1bb7748646888938af767ee6257a Author: Rick Macklem <rmacklem@FreeBSD.org> AuthorDate: 2023-02-08 21:06:07 +0000 Commit: Rick Macklem <rmacklem@FreeBSD.org> CommitDate: 2023-02-08 21:08:51 +0000 nfsd: Fix handling of the error case for nfsvno_open Using done_namei instead of ni_startdir did not fix the crashes reported in the PR. Upon looking more closely at the code, the only case where the code near the end of nfsvno_open() needs to be executed is when nfsvno_namei() has succeeded, but a subsequent error was detected. This patch uses done_namei to indicate this case. Also, nfsvno_relpathbuf() should only be called for this case and not whenever nfsvno_open() is called with nd_repstat != 0. A bug was introduced here when the HASBUF flag was deleted. Reviewed by: mjg PR: 268971 Tested by: ish@amail.plala.or.jp MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D38430 sys/fs/nfsserver/nfs_nfsdport.c | 22 +++++++++++++--------- sys/fs/nfsserver/nfs_nfsdserv.c | 10 +++++----- 2 files changed, 18 insertions(+), 14 deletions(-)
The patch has been committed and will be MFC'd to stable/13 since it fixes things properly, whereas the use of ni_startdir did not.
(In reply to Rick Macklem from comment #17) Thank you very much. main-n260764-905ae5881bdc is working fine.
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=3e230e0cc4ad822554afaaa07369ca5ccb62054d commit 3e230e0cc4ad822554afaaa07369ca5ccb62054d Author: Rick Macklem <rmacklem@FreeBSD.org> AuthorDate: 2023-02-11 03:34:57 +0000 Commit: Rick Macklem <rmacklem@FreeBSD.org> CommitDate: 2023-02-11 03:34:57 +0000 nfsd: Fix handling of the error case for nfsvno_open some more Commit ded5f2954e1a defined done_namei to indicate that nd_repstat was set after a successful nfsvno_namei(), so that a cleanup needs to be done in nfsvno_open(). However, it missed the case where a call to nfsrv_opencheck() in nfsvno_open() sets nd_repstat non-zero. This would cause panics due to a dangling locked vnode when nfsrv_opencheck() set nd_repstat, such as during grace just after a server boot. This patch fixes the problem. PR: 268971 sys/fs/nfsserver/nfs_nfsdport.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-)
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=5fd0916cdbfdbaad620f3d5e7ff80e3436c640ea commit 5fd0916cdbfdbaad620f3d5e7ff80e3436c640ea Author: Rick Macklem <rmacklem@FreeBSD.org> AuthorDate: 2023-02-11 15:14:08 +0000 Commit: Rick Macklem <rmacklem@FreeBSD.org> CommitDate: 2023-02-11 15:14:08 +0000 nfsd: Add a KASSERT in nfsvno_open Commit ded5f2954e1a defined done_namei to indicate that nd_repstat was set after a successful nfsvno_namei(), so that a cleanup needs to be done in nfsvno_open(). This only happens when nfsvno_namei() is done with CREATE. This patch adds a KASSERT() to check for that. PR: 268971 sys/fs/nfsserver/nfs_nfsdport.c | 2 ++ 1 file changed, 2 insertions(+)
This should now be fixed in main and the stable branches.