Bug 280978 - Kernel panics with vfs.nfsd.enable_locallocks=1 and nfs clients doing hdf5 file operations
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 15.0-CURRENT
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: Rick Macklem
URL:
Keywords: crash
Depends on:
Blocks:
 
Reported: 2024-08-21 15:00 UTC by Matthew L. Dailey
Modified: 2024-10-14 19:24 UTC
CC List: 3 users

See Also:
rmacklem: mfc-stable14+
rmacklem: mfc-stable13+


Attachments
Steps to reproduce panics (5.06 KB, text/plain)
2024-08-21 15:09 UTC, Matthew L. Dailey
Fix close handling when vfs.nfsd.enable_locallocks=1 (4.99 KB, patch)
2024-09-12 21:44 UTC, Rick Macklem

Description Matthew L. Dailey 2024-08-21 15:00:46 UTC
With vfs.nfsd.enable_locallocks=1, kernel panics (or, more rarely, a hung NFS server) can be induced by Linux NFS clients doing HDF5 file operations. In testing, this has also sometimes resulted in irrecoverable zpool corruption, due to (I assume) memory corruption.

In testing on both hardware and VMs, we have usually induced these failures within a few hours, though sometimes only after several days to a week. We have replicated this on 13.0 through 15.0-CURRENT (20240725-82283cad12a4-271360). With this sysctl set to 0 (the default), we are unable to replicate the issue, even after several weeks of 24/7 HDF5 file operations.

Below is a backtrace from a recent panic on a test VM. Based on a suggestion from a colleague, we are currently running a test with a VM on 15.0-CURRENT (20240725-82283cad12a4-271360) with a single CPU, just to see if this has any effect.

Thanks, and please let me know what other information I can provide.

#0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
#1  doadump (textdump=textdump@entry=0)
    at /usr/src/sys/kern/kern_shutdown.c:404
#2  0xffffffff8049e0fa in db_dump (dummy=<optimized out>,
    dummy2=<optimized out>, dummy3=<optimized out>, dummy4=<optimized out>)
    at /usr/src/sys/ddb/db_command.c:596
#3  0xffffffff8049deed in db_command (last_cmdp=<optimized out>,
    cmd_table=<optimized out>, dopager=true)
    at /usr/src/sys/ddb/db_command.c:508
#4  0xffffffff8049dbad in db_command_loop ()
    at /usr/src/sys/ddb/db_command.c:555
#5  0xffffffff804a15f6 in db_trap (type=<optimized out>, code=<optimized out>)
    at /usr/src/sys/ddb/db_main.c:267
#6  0xffffffff80b9c49f in kdb_trap (type=type@entry=3, code=code@entry=0,
    tf=tf@entry=0xfffffe00da2dfce0) at /usr/src/sys/kern/subr_kdb.c:790
#7  0xffffffff81068479 in trap (frame=0xfffffe00da2dfce0)
    at /usr/src/sys/amd64/amd64/trap.c:606
#8  <signal handler called>
#9  kdb_enter (why=<optimized out>, msg=<optimized out>)
    at /usr/src/sys/kern/subr_kdb.c:556
#10 0xffffffff80b4cd40 in vpanic (fmt=0xffffffff811db2b0 "%s",
    ap=ap@entry=0xfffffe00da2dff10) at /usr/src/sys/kern/kern_shutdown.c:967
#11 0xffffffff80b4cbc3 in panic (
    fmt=0xffffffff81b98380 <cnputs_mtx> "N\235\024\201\377\377\377\377")
    at /usr/src/sys/kern/kern_shutdown.c:892
#12 0xffffffff81068edb in trap_fatal (frame=0xfffffe00da2e0010, eva=7)
    at /usr/src/sys/amd64/amd64/trap.c:950
#13 0xffffffff81068f80 in trap_pfault (frame=<optimized out>, usermode=false,
    signo=0x0, ucode=<optimized out>) at /usr/src/sys/amd64/amd64/trap.c:758
#14 <signal handler called>
#15 0xffffffff80a42cc7 in nfsrv_freelockowner (
    stp=stp@entry=0xfffff8000459a600, vp=vp@entry=0x0,
    cansleep=cansleep@entry=0, p=p@entry=0xfffff8002716d740)
    at /usr/src/sys/fs/nfsserver/nfs_nfsdstate.c:1637
#16 0xffffffff80a45865 in nfsrv_freestateid (nd=nd@entry=0xfffffe00da2e0428,
    stateidp=stateidp@entry=0xfffffe00da2e0180, p=p@entry=0xfffff8002716d740)
    at /usr/src/sys/fs/nfsserver/nfs_nfsdstate.c:6651
#17 0xffffffff80a558b1 in nfsrvd_freestateid (nd=0xfffffe00da2e0428,
    isdgram=<optimized out>, vp=<optimized out>, exp=<optimized out>)
    at /usr/src/sys/fs/nfsserver/nfs_nfsdserv.c:4775
#18 0xffffffff80a35ecf in nfsrvd_compound (nd=0xfffffe00da2e0428, isdgram=0,
    tag=<optimized out>, taglen=0, minorvers=<optimized out>)
    at /usr/src/sys/fs/nfsserver/nfs_nfsdsocket.c:1338
#19 nfsrvd_dorpc (nd=nd@entry=0xfffffe00da2e0428, isdgram=isdgram@entry=0,
    tag=<optimized out>, taglen=0, minorvers=<optimized out>)
    at /usr/src/sys/fs/nfsserver/nfs_nfsdsocket.c:633
#20 0xffffffff80a4b608 in nfs_proc (nd=0xfffffe00da2e0428,
    xid=<optimized out>, xprt=0xfffff8000a8a6400, rpp=<optimized out>)
    at /usr/src/sys/fs/nfsserver/nfs_nfsdkrpc.c:474
#21 nfssvc_program (rqst=0xfffff80027e51800, xprt=0xfffff8000a8a6400)
    at /usr/src/sys/fs/nfsserver/nfs_nfsdkrpc.c:358
#22 0xffffffff80e62dd0 in svc_executereq (rqstp=0xfffff80027e51800)
    at /usr/src/sys/rpc/svc.c:1053
#23 svc_run_internal (grp=grp@entry=0xfffff80003201100,
    ismaster=ismaster@entry=1) at /usr/src/sys/rpc/svc.c:1329
#24 0xffffffff80e6220f in svc_run (pool=0xfffff80003201000)
    at /usr/src/sys/rpc/svc.c:1408
#25 0xffffffff80a4bd73 in nfsrvd_nfsd (td=td@entry=0xfffff8002716d740,
    args=args@entry=0xfffffe00da2e09a0)
    at /usr/src/sys/fs/nfsserver/nfs_nfsdkrpc.c:641
#26 0xffffffff80a66ada in nfssvc_nfsd (td=0xfffff8002716d740,
    uap=<optimized out>) at /usr/src/sys/fs/nfsserver/nfs_nfsdport.c:3877
#27 0xffffffff80dcc11c in sys_nfssvc (td=<optimized out>, uap=<optimized out>)
    at /usr/src/sys/nfs/nfs_nfssvc.c:107
#28 0xffffffff81069888 in syscallenter (td=0xfffff8002716d740)
    at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:189
#29 amd64_syscall (td=0xfffff8002716d740, traced=0)
    at /usr/src/sys/amd64/amd64/trap.c:1192
#30 <signal handler called>
#31 0x00002687a11d1eda in ?? ()
Backtrace stopped: Cannot access memory at address 0x26879f30f8f8
Comment 1 Matthew L. Dailey 2024-08-21 15:09:03 UTC
Created attachment 252988 [details]
Steps to reproduce panics

Attached are instructions on how to set up a server and client and steps to reproduce the panics.
Comment 2 Rick Macklem freebsd_committer freebsd_triage 2024-09-10 23:43:18 UTC
Just fyi, I have found a place where the code
does an msleep() when vfs.nfsd.enable_locallocks is 1.
This is bad news, since the underlying linked list
data structures could change during the msleep().

I haven't yet gotten to the point of coming up with
a fix.

Until then, I would recommend avoiding setting
vfs.nfsd.enable_locallocks to 1.
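
For illustration, the hazard described above boils down to
the following pattern. This is a minimal, hypothetical sketch
using the standard queue(3) and mutex(9) primitives; "entry",
"list_mtx", and "hazard" are illustrative names, not the
actual nfs_nfsdstate.c code:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/queue.h>

struct entry {
	LIST_ENTRY(entry)	link;
	int			busy;
};
static LIST_HEAD(, entry) head = LIST_HEAD_INITIALIZER(head);
static struct mtx list_mtx;	/* initialized elsewhere with mtx_init() */

/* Racy teardown: sleeping drops list_mtx, so "ep" can go stale. */
static void
hazard(struct entry *ep)
{
	mtx_lock(&list_mtx);
	while (ep->busy) {
		/*
		 * msleep() atomically releases list_mtx while this
		 * thread sleeps.  Another thread can take the mutex,
		 * LIST_REMOVE() the entry and free it, so "ep" may
		 * be a dangling pointer by the time we wake up.
		 */
		msleep(ep, &list_mtx, 0, "entrybusy", 0);
	}
	LIST_REMOVE(ep, link);	/* possible use-after-free */
	mtx_unlock(&list_mtx);
}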
Comment 3 Rick Macklem freebsd_committer freebsd_triage 2024-09-12 21:44:59 UTC
Created attachment 253528 [details]
Fix close handling when vfs.nfsd.enable_locallocks=1

I think this patch might fix the problem.

nfsrv_freeopen() was being called after the mutex
lock was released, making it possible for other
kernel threads to change the lists while nfsrv_freeopen()
took the nfsstateid out of the lists.

This patch moves the code around
"if (nfsrv_freeopen(stp, vp, 1 p) == 0) {"
into nfsrv_freeopen(), so that it can remove the nfsstateid
structure from all lists before unlocking the mutex.
This should avoid any race between CLOSE and other nfsd threads
updating the NFSv4 state.

The patch does not affect the semantics when vfs.nfsd.enable_locallocks=0.

If the reporter can test this patch, that would be great.
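
The shape of the fix, continuing the hypothetical sketch from
comment #2 (illustrative names only; the real change is the
attached patch to sys/fs/nfsserver/nfs_nfsdstate.c, and the
M_TEMP allocation type below is an assumption for the sketch):

#include <sys/malloc.h>

/* Fixed teardown: unlink while the mutex is still held. */
static void
fixed_teardown(struct entry *ep)
{
	mtx_lock(&list_mtx);
	LIST_REMOVE(ep, link);	/* no other thread can find ep now */
	mtx_unlock(&list_mtx);
	free(ep, M_TEMP);	/* safe: ep is no longer reachable */
}

The patch applies the same idea: nfsrv_freeopen() now takes
the nfsstateid off all the lists before the state mutex is
released, so a concurrent nfsd thread can never see a
half-torn-down stateid.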
Comment 4 Matthew L. Dailey 2024-09-18 12:15:26 UTC
(In reply to Rick Macklem from comment #3)
Thanks, Rick - this looks promising!

I should be able to get a test started today with this patch and see how it goes. Since the panics can take hours or days to happen (sometimes weeks), it may be a while before I can say anything definitive.

I'll update this report once I have any results.
Comment 5 Rick Macklem freebsd_committer freebsd_triage 2024-09-18 22:18:11 UTC
Sounds good. I won't be able to do commits for about
the next 10 days anyhow, and it will be a while before
the 14.2 release cycle gets going, so this can wait
a few weeks for results.
(If it doesn't fix your problem, I will probably still
commit it unless you report that it has made things worse.)

Thanks, rick
Comment 6 Matthew L. Dailey 2024-09-30 20:38:42 UTC
Quick update on this. I have had a test system running with this patch for just over 10 days now, with no panics or issues. The HDF5 workload from a Linux client has been running 24/7 for this entire period.

This is certainly not definitive in terms of the original bug, but it's looking good so far. And, it does not appear that this has introduced any new problems, at least in this limited testing.

I'll update again in another week or two.
Comment 7 Rick Macklem freebsd_committer freebsd_triage 2024-09-30 22:35:18 UTC
(In reply to Matthew L. Dailey from comment #6)
Thanks for the update. Given the 14.2 release
schedule, I think I will commit the patch to
main now and MFC it, since I do believe
it fixes a locking problem.

I will leave this PR Open until you report again.
Comment 8 commit-hook freebsd_committer freebsd_triage 2024-09-30 22:52:10 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=eb345e05ac6602eeef0c33fce9025bbc8ec44d0f

commit eb345e05ac6602eeef0c33fce9025bbc8ec44d0f
Author:     Rick Macklem <rmacklem@FreeBSD.org>
AuthorDate: 2024-09-30 22:49:57 +0000
Commit:     Rick Macklem <rmacklem@FreeBSD.org>
CommitDate: 2024-09-30 22:49:57 +0000

    nfsd: Fix close handling when vfs.nfsd.enable_locallocks=1

    nfsrv_freeopen() was being called after the mutex
    lock was released, making it possible for other
    kernel threads to change the lists while nfsrv_freeopen()
    took the nfsstateid out of the lists.

    This patch moves the code around
    "if (nfsrv_freeopen(stp, vp, 1 p) == 0) {"
    into nfsrv_freeopen(), so that it can remove the nfsstateid
    structure from all lists before unlocking the mutex.
    This should avoid any race between CLOSE and other nfsd threads
    updating the NFSv4 state.

    The patch does not affect semantics when vfs.nfsd.enable_locallocks=0.

    PR:     280978
    Tested by:      Matthew L. Dailey <matthew.l.dailey@dartmouth.edu>
    MFC after:      1 week

 sys/fs/nfsserver/nfs_nfsdstate.c | 62 +++++++++++++++++++---------------------
 1 file changed, 30 insertions(+), 32 deletions(-)
Comment 9 commit-hook freebsd_committer freebsd_triage 2024-10-12 00:45:03 UTC
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=e7474c619270c5c325ce844b77eecc2dcf98147c

commit e7474c619270c5c325ce844b77eecc2dcf98147c
Author:     Rick Macklem <rmacklem@FreeBSD.org>
AuthorDate: 2024-09-30 22:49:57 +0000
Commit:     Rick Macklem <rmacklem@FreeBSD.org>
CommitDate: 2024-10-12 00:43:27 +0000

    nfsd: Fix close handling when vfs.nfsd.enable_locallocks=1

    nfsrv_freeopen() was being called after the mutex
    lock was released, making it possible for other
    kernel threads to change the lists while nfsrv_freeopen()
    took the nfsstateid out of the lists.

    This patch moves the code around
    "if (nfsrv_freeopen(stp, vp, 1 p) == 0) {"
    into nfsrv_freeopen(), so that it can remove the nfsstateid
    structure from all lists before unlocking the mutex.
    This should avoid any race between CLOSE and other nfsd threads
    updating the NFSv4 state.

    The patch does not affect semantics when vfs.nfsd.enable_locallocks=0.

    PR:     280978
    Tested by:      Matthew L. Dailey <matthew.l.dailey@dartmouth.edu>

    (cherry picked from commit eb345e05ac6602eeef0c33fce9025bbc8ec44d0f)

 sys/fs/nfsserver/nfs_nfsdstate.c | 62 +++++++++++++++++++---------------------
 1 file changed, 30 insertions(+), 32 deletions(-)
Comment 10 commit-hook freebsd_committer freebsd_triage 2024-10-12 00:52:05 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=cc07d914bc80f0c644584de6eab2efd30e911d8d

commit cc07d914bc80f0c644584de6eab2efd30e911d8d
Author:     Rick Macklem <rmacklem@FreeBSD.org>
AuthorDate: 2024-09-30 22:49:57 +0000
Commit:     Rick Macklem <rmacklem@FreeBSD.org>
CommitDate: 2024-10-12 00:49:55 +0000

    nfsd: Fix close handling when vfs.nfsd.enable_locallocks=1

    nfsrv_freeopen() was being called after the mutex
    lock was released, making it possible for other
    kernel threads to change the lists while nfsrv_freeopen()
    took the nfsstateid out of the lists.

    This patch moves the code around
    "if (nfsrv_freeopen(stp, vp, 1 p) == 0) {"
    into nfsrv_freeopen(), so that it can remove the nfsstateid
    structure from all lists before unlocking the mutex.
    This should avoid any race between CLOSE and other nfsd threads
    updating the NFSv4 state.

    The patch does not affect semantics when vfs.nfsd.enable_locallocks=0.

    PR:     280978
    Tested by:      Matthew L. Dailey <matthew.l.dailey@dartmouth.edu>

    (cherry picked from commit eb345e05ac6602eeef0c33fce9025bbc8ec44d0f)

 sys/fs/nfsserver/nfs_nfsdstate.c | 62 +++++++++++++++++++---------------------
 1 file changed, 30 insertions(+), 32 deletions(-)
Comment 11 Rick Macklem freebsd_committer freebsd_triage 2024-10-12 00:57:56 UTC
The patch has now been MFC'd.
I will leave this PR Open until the
reporter reports back on whether the
patch fixed the problem.
Comment 12 Matthew L. Dailey 2024-10-14 15:14:14 UTC
Our test system with this patch has now been running non-stop without any panics or errors for almost 24 days. This is well outside of any failure window we've seen in our prior testing, so I'm cautiously optimistic that this patch has fixed this issue.

At this point, I think this can be closed. I'll reach out again if we see anything related to this in further testing.

Huge thanks to Rick for fixing this!
Comment 13 Rick Macklem freebsd_committer freebsd_triage 2024-10-14 19:24:40 UTC
The reporter has had good test results when using
the patch (which is now MFC'd to stable/13 and stable/14).