Bug 268828 - panic: main-n259921-17f2b12a3877 panicked (Fatal trap 12: page fault while in kernel mode)
Summary: panic: main-n259921-17f2b12a3877 panicked (Fatal trap 12: page fault while in...
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: CURRENT
Hardware: Any Any
: --- Affects Some People
Assignee: Rick Macklem
URL: https://cgit.freebsd.org/src/commit/?...
Keywords: crash
Depends on:
Blocks:
 
Reported: 2023-01-08 14:27 UTC by Masachika ISHIZUKA
Modified: 2023-01-15 18:56 UTC (History)
5 users (show)

See Also:


Attachments
textdump.tar.1.xz (13.20 KB, application/x-xz)
2023-01-08 14:27 UTC, Masachika ISHIZUKA
no flags Details
revert removal of a check for ni_startdir being non-NULL (431 bytes, patch)
2023-01-09 01:16 UTC, Rick Macklem
no flags Details | Diff
Check ni_dvp is not NULL before attempting vrele() (903 bytes, patch)
2023-01-10 02:24 UTC, Rick Macklem
no flags Details | Diff
Add an argument to nfsvno_open() to indicate if nfsvno_namei() was called (3.57 KB, patch)
2023-01-11 04:19 UTC, Rick Macklem
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Masachika ISHIZUKA 2023-01-08 14:27:48 UTC
Created attachment 239342 [details]
textdump.tar.1.xz

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 02
fault virtual address   = 0x1ac
fault code              = supervisor write data, page not present
instruction pointer     = 0x20:0xffffffff80ce3c7c
stack pointer           = 0x28:0xfffffe00edb6fce0
frame pointer           = 0x28:0xfffffe00edb6fce0
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 1782 (nfsd: master)
rdi:                0 rsi:                0 rdx:                0
rcx: fffffe0093c7c3c0  r8:               80  r9:            10000
rax:         ffffffff rbx: fffffe00edb701a8 rbp: fffffe00edb6fce0
r10:                1 r11:            10000 r12: fffffe00edb6ff78
r13:                0 r14:                1 r15: fffffe00edb6ffd0
trap number             = 12
panic: page fault
cpuid = 1
time = 1673187203
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00edb6faa0
vpanic() at vpanic+0x151/frame 0xfffffe00edb6faf0
panic() at panic+0x43/frame 0xfffffe00edb6fb50
trap_fatal() at trap_fatal+0x409/frame 0xfffffe00edb6fbb0
trap_pfault() at trap_pfault+0xab/frame 0xfffffe00edb6fc10
calltrap() at calltrap+0x8/frame 0xfffffe00edb6fc10
--- trap 0xc, rip = 0xffffffff80ce3c7c, rsp = 0xfffffe00edb6fce0, rbp = 0xfffffe00edb6fce0 ---
vrele() at vrele+0xc/frame 0xfffffe00edb6fce0
nfsvno_open() at nfsvno_open+0x1c7/frame 0xfffffe00edb6fd70
nfsrvd_open() at nfsrvd_open+0xe51/frame 0xfffffe00edb701f0
nfsrvd_dorpc() at nfsrvd_dorpc+0x1386/frame 0xfffffe00edb70400
nfssvc_program() at nfssvc_program+0x5e2/frame 0xfffffe00edb705e0
svc_run_internal() at svc_run_internal+0xa42/frame 0xfffffe00edb70710
svc_run() at svc_run+0x250/frame 0xfffffe00edb70770
nfsrvd_nfsd() at nfsrvd_nfsd+0x338/frame 0xfffffe00edb708d0
nfssvc_nfsd() at nfssvc_nfsd+0x524/frame 0xfffffe00edb70de0
sys_nfssvc() at sys_nfssvc+0xb0/frame 0xfffffe00edb70e00
amd64_syscall() at amd64_syscall+0x12e/frame 0xfffffe00edb70f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00edb70f30
--- syscall (155, FreeBSD ELF64, nfssvc), rip = 0x28e1eabcef2a, rsp = 0x28e1e8317598, rbp = 0x28e1e8317830 ---
KDB: enter: panic
Comment 1 Rick Macklem freebsd_committer freebsd_triage 2023-01-09 01:16:33 UTC
Created attachment 239354 [details]
revert removal of a check for ni_startdir being non-NULL

This patch reverts a small part of 65127e982b94,
which I think caused this crash.
Comment 2 Rick Macklem freebsd_committer freebsd_triage 2023-01-09 01:20:19 UTC
I believe this crash was caused by 65127e982b94 which
removed a check for ni_startdir being non-NULL before
doing vrele()s for the create case with nd_repstat set
non-zero.

The patch I attached reverts this change and, I think,
will fix the crash.

651273982b94 is dated Nov. 10. I have no idea if your
kernel version has this patch, but I suspect it does.
Comment 3 Graham Perrin freebsd_committer freebsd_triage 2023-01-09 03:16:00 UTC
textdump.tar.1.xz at comment #0 comprises five files: 

% ls -hln
total 1
-rw-------  1 1002  0   4.3K  8 Jan 14:13 config.txt
-rw-------  1 1002  0    48K  8 Jan 14:13 ddb.txt
-rw-------  1 1002  0    20K  8 Jan 14:13 msgbuf.txt
-rw-------  1 1002  0    10B  8 Jan 14:13 panic.txt
-rw-------  1 1002  0   154B  8 Jan 14:13 version.txt
% 


(In reply to Rick Macklem from comment #2)

Thank you, 

> … 651273982b94 is dated Nov. 10. …

version.txt (above) gives us: 

FreeBSD 14.0-CURRENT #16 main-n259921-17f2b12a3877-dirty: Tue Jan  3 18:13:32 JST 2023
    ishizuka@okra.ish.org:/usr/obj/usr/src/amd64.amd64/sys/SG_UDF2
Comment 4 Masachika ISHIZUKA 2023-01-09 04:47:37 UTC
(In reply to Rick Macklem from comment #2)
Thank you to give me a patch.
After applying this patch, it works fine.
Comment 5 Mateusz Guzik freebsd_committer freebsd_triage 2023-01-09 07:30:52 UTC
Part of the point of the aformentioned change was to stop exporting ni_startdir. I probably should have added some poisononing of that value so that this would have been found sooner.

Anyhow, for vrele to be called in this case both dvp and vp have to be null, which means the lookup failed to find anything? Should this code even be reachable in such a case?

As a quick hack the following will probably do it:
diff --git a/sys/fs/nfsserver/nfs_nfsdport.c b/sys/fs/nfsserver/nfs_nfsdport.c
index 665e2c00ce08..012f84f8befa 100644
--- a/sys/fs/nfsserver/nfs_nfsdport.c
+++ b/sys/fs/nfsserver/nfs_nfsdport.c
@@ -1918,7 +1918,7 @@ nfsvno_open(struct nfsrv_descript *nd, struct nameidata *ndp,
                }
        } else {
                nfsvno_relpathbuf(ndp);
-               if (create == NFSV4OPEN_CREATE) {
+               if (create == NFSV4OPEN_CREATE && ndp->ni_dvp != NULL) {
                        if (ndp->ni_dvp == ndp->ni_vp)
                                vrele(ndp->ni_dvp);
                        else
Comment 6 Rick Macklem freebsd_committer freebsd_triage 2023-01-09 15:57:26 UTC
I'm not sure if your patch will work, since it
depends on ni_dvp being set to NULL long before
any call to nfsvno_namei()/vfs_lookup().
That code could be added, but I will need to
look to see it gets set non-NULL in the correct
place.

What about ni_rootdir. Can I use that one?

Basically, it is a test "has nfsvno_namei()
been called", done by setting the field NULL
at the beginning of the function and then
setting in non-NULL in nfsvno_namei() at
the correct place (ni_rootdir gets set in the
same place as ni_startdir).
Comment 7 Rick Macklem freebsd_committer freebsd_triage 2023-01-10 02:24:22 UTC
Created attachment 239370 [details]
Check ni_dvp is not NULL before attempting vrele()

This version of the patch should fix the problem
as well.  It does what mjg@ suggested plus initializes
ni_dvp NULL. This is necessary, since nfsrv_parsename()
can set nd_repstat so that nfsvno_namei() does not get
called (it calls vfs_lookup(), which would set ni_dvp).

This avoids use of ni_startdir as requested by mjg@.

If anyone can test this patch variant, that would be
appreciated.
Comment 8 Mateusz Guzik freebsd_committer freebsd_triage 2023-01-10 05:34:21 UTC
as there is only one consumer of nfsrvd_open, perhaps you can denote that namei was called in the struct you pass around?

admittedly the ni_dvp thing is also a giant hack, the content of nameidata should be considered indeterminate if namei was not called
Comment 9 Rick Macklem freebsd_committer freebsd_triage 2023-01-11 04:19:09 UTC
Created attachment 239398 [details]
Add an argument to nfsvno_open() to indicate if nfsvno_namei() was called

This version of the patch uses a new argument to
nfsvno_open() to indicate if nfsvno_namei() has
been called before nfsvno_open(), then only do
the vrele() if it has been called (which indicates
that ni_dvp will be properly set).

Maybe this version will make mjg@ happy?
Comment 10 Mateusz Guzik freebsd_committer freebsd_triage 2023-01-12 18:28:48 UTC
as long as it does not depend on internal namei state (which ni_stardir now is) it is not my business :)

I'll see about hiding ni_stardir behind a setter and otherwise making it not available.
Comment 11 commit-hook freebsd_committer freebsd_triage 2023-01-13 00:50:52 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=dcfa3ee44da2b139f51a8aedb0f55735c6dfe3f3

commit dcfa3ee44da2b139f51a8aedb0f55735c6dfe3f3
Author:     Rick Macklem <rmacklem@FreeBSD.org>
AuthorDate: 2023-01-13 00:45:26 +0000
Commit:     Rick Macklem <rmacklem@FreeBSD.org>
CommitDate: 2023-01-13 00:48:53 +0000

    nfsserver: Fix vrele() panic in nfsvno_open()

    Commit 65127e982b94 removed a check for ni_startdir != NULL.
    This allowed the vrele(ndp->ni_dvp) to be called with
    a NULL argument.

    This patch adds a new boolean argument to nfsvno_open()
    that can be checked instead of ni_startdir, since mjg@ requested
    that ni_startdir not be used. (Discussed in PR#268828.)

    PR:     268828
    Reviewed by:    mjg
    Differential Revision:  https://reviews.freebsd.org/D38032

 sys/fs/nfs/nfs_var.h            |  2 +-
 sys/fs/nfsserver/nfs_nfsdport.c |  4 ++--
 sys/fs/nfsserver/nfs_nfsdserv.c | 13 ++++++++-----
 3 files changed, 11 insertions(+), 8 deletions(-)
Comment 12 Rick Macklem freebsd_committer freebsd_triage 2023-01-13 01:19:47 UTC
Patch has been committed.