Bug 268971 - panic: main-n260091-06b93ef8cda2 panicked (Fatal trap 12: page fault while in kernel mode)
Summary: panic: main-n260091-06b93ef8cda2 panicked (Fatal trap 12: page fault while in...
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: CURRENT
Hardware: Any Any
: --- Affects Only Me
Assignee: Rick Macklem
URL:
Keywords: crash
Depends on:
Blocks:
 
Reported: 2023-01-15 13:45 UTC by Masachika ISHIZUKA
Modified: 2023-12-28 23:11 UTC (History)
4 users (show)

See Also:
rmacklem: mfc-stable13?


Attachments
textdump.tar.last (16.38 KB, application/x-xz)
2023-01-15 13:45 UTC, Masachika ISHIZUKA
no flags Details
fix done_namei so it is set true exactly when ni_statdir is set non-NULL (5.49 KB, patch)
2023-02-05 16:30 UTC, Rick Macklem
no flags Details | Diff
textdump.tar.last (15.01 KB, application/x-xz)
2023-02-06 05:50 UTC, Masachika ISHIZUKA
no flags Details
Fix done_namei so it is only set when nd_repstat is set after nfsvno_namei (1.24 KB, patch)
2023-02-06 15:14 UTC, Rick Macklem
no flags Details | Diff
Fix done_namei so it is only set when nd_repstat is set after nfsvno_namei (2.11 KB, patch)
2023-02-07 01:15 UTC, Rick Macklem
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Masachika ISHIZUKA 2023-01-15 13:45:18 UTC
Created attachment 239485 [details]
textdump.tar.last

Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 06
fault virtual address   = 0x1ac
fault code              = supervisor write data, page not present
instruction pointer     = 0x20:0xffffffff80ce4f1c
stack pointer           = 0x28:0xfffffe00edb17cd0
frame pointer           = 0x28:0xfffffe00edb17cd0
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 1782 (nfsd: master)
rdi:                0 rsi:                0 rdx:                0
rcx: fffffe0093c7c380  r8:               80  r9:            10000
rax:         ffffffff rbx: fffffe00edb181a8 rbp: fffffe00edb17cd0
r10:                1 r11:            10000 r12: fffffe00edb17f78
r13:                0 r14:                1 r15: fffffe00edb17fd0
trap number             = 12
panic: page fault
cpuid = 3
time = 1673789502
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00edb17a90
vpanic() at vpanic+0x151/frame 0xfffffe00edb17ae0
panic() at panic+0x43/frame 0xfffffe00edb17b40
trap_fatal() at trap_fatal+0x409/frame 0xfffffe00edb17ba0
trap_pfault() at trap_pfault+0xab/frame 0xfffffe00edb17c00
calltrap() at calltrap+0x8/frame 0xfffffe00edb17c00
--- trap 0xc, rip = 0xffffffff80ce4f1c, rsp = 0xfffffe00edb17cd0, rbp = 0xfffffe00edb17cd0 ---
vrele() at vrele+0xc/frame 0xfffffe00edb17cd0
nfsvno_open() at nfsvno_open+0x1d1/frame 0xfffffe00edb17d60
nfsrvd_open() at nfsrvd_open+0xe84/frame 0xfffffe00edb181f0
nfsrvd_dorpc() at nfsrvd_dorpc+0x1386/frame 0xfffffe00edb18400
nfssvc_program() at nfssvc_program+0x5e2/frame 0xfffffe00edb185e0
svc_run_internal() at svc_run_internal+0xa42/frame 0xfffffe00edb18710
svc_run() at svc_run+0x250/frame 0xfffffe00edb18770
nfsrvd_nfsd() at nfsrvd_nfsd+0x338/frame 0xfffffe00edb188d0
nfssvc_nfsd() at nfssvc_nfsd+0x524/frame 0xfffffe00edb18de0
sys_nfssvc() at sys_nfssvc+0xb0/frame 0xfffffe00edb18e00
amd64_syscall() at amd64_syscall+0x12e/frame 0xfffffe00edb18f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00edb18f30
--- syscall (155, FreeBSD ELF64, nfssvc), rip = 0x20bd03ff250a, rsp = 0x20bd0230b868, rbp = 0x20bd0230bb00 ---
KDB: enter: panic
Comment 1 Masachika ISHIZUKA 2023-02-03 05:21:34 UTC
(In reply to Masachika ISHIZUKA from comment #0)

I can avoid this panic by applying the patch of https://bugs.freebsd.org/bugzilla/attachment.cgi?id=239354 .
Comment 2 Dmitry Chagin freebsd_committer freebsd_triage 2023-02-03 08:34:40 UTC
over to maintainer
Comment 3 Mateusz Guzik freebsd_committer freebsd_triage 2023-02-04 18:32:57 UTC
this should already be fixed as of  https://cgit.FreeBSD.org/src/commit/?id=dcfa3ee44da2b139f51a8aedb0f55735c6dfe3f3

are you still seeing the problem?
Comment 4 Masachika ISHIZUKA 2023-02-05 02:36:36 UTC
(In reply to Mateusz Guzik from comment #3)
Yes.
It panicked on	dcfa3ee44da2b139f51a8aedb0f55735c6dfe3f3 and after.
I can avoid this panic by applying the patch below.

% diff -ruN /sys/fs/nfsserver/nfs_nfsdport.c.orig /sys/fs/nfsserver/nfs_nfsdport.c
@@ -1918,7 +1918,7 @@
                }
        } else {
                nfsvno_relpathbuf(ndp);
-               if (ndp->ni_startdir && create == NFSV4OPEN_CREATE) {
+               if (ndp->ni_startdir && done_namei && create == NFSV4OPEN_CREATE) {
                        if (ndp->ni_dvp == ndp->ni_vp)
                                vrele(ndp->ni_dvp);
                        else
Comment 5 Masachika ISHIZUKA 2023-02-05 02:41:35 UTC
(In reply to Mateusz Guzik from comment #3)
Sorry, the patch of comment #4 is incorrect. I wrote the patch on this reply by hand and I miss wrote.

Yes.
It panicked on	dcfa3ee44da2b139f51a8aedb0f55735c6dfe3f3 and after.
I can avoid this panic by applying the patch below.

% diff -ruN /sys/fs/nfsserver/nfs_nfsdport.c.orig /sys/fs/nfsserver/nfs_nfsdport.c
@@ -1918,7 +1918,7 @@
                }
        } else {
                nfsvno_relpathbuf(ndp);
-               if (done_namei && create == NFSV4OPEN_CREATE) {
+               if (ndp->ni_startdir && done_namei && create == NFSV4OPEN_CREATE) {
                        if (ndp->ni_dvp == ndp->ni_vp)
                                vrele(ndp->ni_dvp);
                        else
Comment 6 Rick Macklem freebsd_committer freebsd_triage 2023-02-05 16:30:22 UTC
Created attachment 239922 [details]
fix done_namei so it is set true exactly when ni_statdir is set non-NULL

Please try this patch on a system with
the done_namei patch instead of checking
ni_startdir.  With this patch, done_namei
should be set true exactly when ni_startdir
is set non-NULL.

I, personally, do not understand the problem
with using ni_startdir. Maybe mjg@ can
explain that.

Please let us know if this fixes the creashes?
Comment 7 Masachika ISHIZUKA 2023-02-06 05:50:15 UTC
Created attachment 239938 [details]
textdump.tar.last

Thank you for the patch.
Unfortunately, this patch doesn't fix the crash.
Comment 8 Rick Macklem freebsd_committer freebsd_triage 2023-02-06 15:14:29 UTC
Created attachment 239948 [details]
Fix done_namei so it is only set when nd_repstat is set after nfsvno_namei

Sorry about the previous failures.  I looked
at the code more carefully and came to the
conclusion that the only time the code in
that "if" near the end of nfsvno_open()
needs to be executed is when nfsvno_namei()
succeeds, but nd_repstat is set non-0 after
the nfsvno_namei() call.

I also noticed that nfsvno_relpathbuf() would
be called twice for the unusual case of
nfsrv_parsename() setting nd_repstat non-0.

Hopefully this patch works correctly.
It should be applied to what is in main.
Comment 9 Rick Macklem freebsd_committer freebsd_triage 2023-02-06 15:48:35 UTC
Oops, don't test the patch I put here a
few minutes ago.  I got the fix for
nfsvno_relpathbuf() wrong.

I put an updated version here later to-day.
Comment 10 Rick Macklem freebsd_committer freebsd_triage 2023-02-07 01:15:16 UTC
Created attachment 239962 [details]
Fix done_namei so it is only set when nd_repstat is set after nfsvno_namei

Ok, I think this version handles nfsvno_relpathbuf()
correctly. It only executes the code near the end
of nfsvno_open() when done_namei is true and it is
only set true when nd_repstat is set after nfsvo_namei()
has returned 0 (which implies the pathbuf is still
allocated and ni_dvp, ni_vp are valid).

Hopefully this one will fix the crashes.
Comment 11 Masachika ISHIZUKA 2023-02-07 04:01:20 UTC
(In reply to Rick Macklem from comment #10)
Thank you for patch.
Until yesterday, the reproducibility of the panic was 100%, but for some reason, from this morning, regardless of the presence or absence of the patch, the panic has stopped. Therefore, I cannot report whether this patch is effective or not at this time.
In the future, I will report again when I make progress.
Comment 12 Masachika ISHIZUKA 2023-02-07 12:29:28 UTC
(In reply to Masachika ISHIZUKA from comment #11)
Thank you for good patch.
After that, my PC sometimes panicked on non patched GENERIC-main-n260675-de59f46a825b, but it never panicked on patched one. Also, the unpatched one cannot be shutdown (shutdown process hangs), but the patched one can shutdown normally.
Comment 13 Masachika ISHIZUKA 2023-02-07 13:19:11 UTC
(In reply to Masachika ISHIZUKA from comment #12)
>γ€€the patched one can shutdown normally.

This was a mistake.
This is a separate issue from this bug report, so I will issue another bug report if I find out anything.
Comment 14 Rick Macklem freebsd_committer freebsd_triage 2023-02-07 22:49:01 UTC
Just to be crystal clear, you are not getting
crashes with the most recent patch?

Sorry for the grief this has caused you. I
should have looked at the code more carefully
the first time instead of just trying to replace
the use of ni_startdir (which stopped the crashes,
but was broken for the case where the guarded create
fails after nfsvno_namei() succeeds).

And thanks for doing all the testing, rick.
Comment 15 Masachika ISHIZUKA 2023-02-08 01:11:23 UTC
(In reply to Rick Macklem from comment #14)

Yes.
It doesn't crash after applying the latest patch.
And it sometimes crashes withouy the latest patch.
Comment 16 commit-hook freebsd_committer freebsd_triage 2023-02-08 21:10:16 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=ded5f2954e1a1bb7748646888938af767ee6257a

commit ded5f2954e1a1bb7748646888938af767ee6257a
Author:     Rick Macklem <rmacklem@FreeBSD.org>
AuthorDate: 2023-02-08 21:06:07 +0000
Commit:     Rick Macklem <rmacklem@FreeBSD.org>
CommitDate: 2023-02-08 21:08:51 +0000

    nfsd: Fix handling of the error case for nfsvno_open

    Using done_namei instead of ni_startdir did not
    fix the crashes reported in the PR. Upon looking
    more closely at the code, the only case where the
    code near the end of nfsvno_open() needs to be
    executed is when nfsvno_namei() has succeeded,
    but a subsequent error was detected.

    This patch uses done_namei to indicate this case.

    Also, nfsvno_relpathbuf() should only be called for
    this case and not whenever nfsvno_open() is called
    with nd_repstat != 0. A bug was introduced here when
    the HASBUF flag was deleted.

    Reviewed by:    mjg
    PR:     268971
    Tested by:      ish@amail.plala.or.jp
    MFC after:      1 week
    Differential Revision:  https://reviews.freebsd.org/D38430

 sys/fs/nfsserver/nfs_nfsdport.c | 22 +++++++++++++---------
 sys/fs/nfsserver/nfs_nfsdserv.c | 10 +++++-----
 2 files changed, 18 insertions(+), 14 deletions(-)
Comment 17 Rick Macklem freebsd_committer freebsd_triage 2023-02-08 23:56:24 UTC
The patch has been committed and will be MFC'd
to stable/13 since it fixes things properly, whereas
the use of ni_startdir did not.
Comment 18 Masachika ISHIZUKA 2023-02-09 05:53:18 UTC
(In reply to Rick Macklem from comment #17)
Thank you very much.
main-n260764-905ae5881bdc is working fine.
Comment 19 commit-hook freebsd_committer freebsd_triage 2023-02-11 03:36:18 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=3e230e0cc4ad822554afaaa07369ca5ccb62054d

commit 3e230e0cc4ad822554afaaa07369ca5ccb62054d
Author:     Rick Macklem <rmacklem@FreeBSD.org>
AuthorDate: 2023-02-11 03:34:57 +0000
Commit:     Rick Macklem <rmacklem@FreeBSD.org>
CommitDate: 2023-02-11 03:34:57 +0000

    nfsd: Fix handling of the error case for nfsvno_open some more

    Commit ded5f2954e1a defined done_namei to indicate that
    nd_repstat was set after a successful nfsvno_namei(),
    so that a cleanup needs to be done in nfsvno_open().
    However, it missed the case where a call to
    nfsrv_opencheck() in nfsvno_open() sets nd_repstat non-zero.

    This would cause panics due to a dangling locked vnode
    when nfsrv_opencheck() set nd_repstat, such as during grace
    just after a server boot.

    This patch fixes the problem.

    PR:     268971

 sys/fs/nfsserver/nfs_nfsdport.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)
Comment 20 commit-hook freebsd_committer freebsd_triage 2023-02-11 15:15:16 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=5fd0916cdbfdbaad620f3d5e7ff80e3436c640ea

commit 5fd0916cdbfdbaad620f3d5e7ff80e3436c640ea
Author:     Rick Macklem <rmacklem@FreeBSD.org>
AuthorDate: 2023-02-11 15:14:08 +0000
Commit:     Rick Macklem <rmacklem@FreeBSD.org>
CommitDate: 2023-02-11 15:14:08 +0000

    nfsd: Add a KASSERT in nfsvno_open

    Commit ded5f2954e1a defined done_namei to indicate that
    nd_repstat was set after a successful nfsvno_namei(),
    so that a cleanup needs to be done in nfsvno_open().
    This only happens when nfsvno_namei() is done with CREATE.

    This patch adds a KASSERT() to check for that.

    PR:     268971

 sys/fs/nfsserver/nfs_nfsdport.c | 2 ++
 1 file changed, 2 insertions(+)
Comment 21 Rick Macklem freebsd_committer freebsd_triage 2023-12-28 23:11:49 UTC
This should now be fixed in main and the stable branches.