I just saw this panic on a 12-stable machine. Unfortunately, I don't have a core dump, just a stack trace. It was serving NFS v4.0, with delegations enabled. The clients were all Debian, with Linux 3.16.0. The proximal cause of the panic seems to be that the file had a write delegation issued to an unconfirmed client. Root cause is harder to determine. Did the kernel previously issue a delegation to an unconfirmed client? Or did the client somehow change to an unconfirmed state after the delegation was issued, perhaps due to a race? Should there be a check for LCL_NEEDSCONFIRM somewhere around line 3166 in nfs_nfsdstate.c? kdb_backtrace vpanic panic nfsrv_docallback nfsrv_checkgetattr nfsrvd_getattr nfsrvd_dorpc nfssvc_program svc_run_internal svc_thread_start fork_exit fork_trampoline
Created attachment 217776 [details] Add a check for LCL_NEEDSCONFIRM to nfsrv_checkgetattr() I think this patch will fix the panic. It makes nfsrv_checkgetattr() return as if there is no delegation when the ClientID for the delegation is unconfirmed.
I believe that nfsrv_opencheck() will have checked for an unconfirmed ClientID. As such, I think the ClientID would be confirmed when the Open and Delegation was issued. However, a reboot of the client (or a broken client that for some reason did a SetClientID after acquiring a ClientID) would set the ClientID back to unconfirmed. --> This implies any delegation on the ClientID is no longer valid (it is still there, since old state remains until the SetClientIDConfirm is done to complete the handshake that (re)creates the ClientID). I think the bug was nfsrv_checkgetattr() assuming that the delegation it found was valid, because it did not check for an unconfirmed ClientID. The patch in the attachment adds that check and should avoid the panic.
A commit references this bug: Author: rmacklem Date: Mon Sep 14 00:44:50 UTC 2020 New revision: 365703 URL: https://svnweb.freebsd.org/changeset/base/365703 Log: Fix a case where the NFSv4.0 server might crash if delegations are enabled. asomers@ reported a crash on an NFSv4.0 server with a backtrace of: kdb_backtrace vpanic panic nfsrv_docallback nfsrv_checkgetattr nfsrvd_getattr nfsrvd_dorpc nfssvc_program svc_run_internal svc_thread_start fork_exit fork_trampoline where the panic message was "docallb", which indicates that a callback was attempted when the ClientID is unconfirmed. This would not normally occur, but it is possible to have an unconfirmed ClientID structure with delegation structure(s) chained off it if the client were to issue a SetClientID with the same "id" but different "verifier" after acquiring delegations on the previously confirmed ClientID. The bug appears to be that nfsrv_checkgetattr() failed to check for this uncommon case of an unconfirmed ClientID with a delegation structure that no longer refers to a delegation the client knows about. This patch adds a check for this case, handling it as if no delegation exists, which is the case when the above occurs. Although difficult to reproduce, this change should avoid the panic(). PR: 249127 Reported by: asomers Reviewed by: asomers MFC after: 1 week Differential Revision: https://reviews.freebbsd.org/D26342 Changes: head/sys/fs/nfsserver/nfs_nfsdstate.c
Patch has been committed to head and will be MFC'd.
A commit references this bug: Author: rmacklem Date: Mon Sep 21 00:50:32 UTC 2020 New revision: 365934 URL: https://svnweb.freebsd.org/changeset/base/365934 Log: MFC: r365703 Fix a case where the NFSv4.0 server might crash if delegations are enabled. asomers@ reported a crash on an NFSv4.0 server with a backtrace of: kdb_backtrace vpanic panic nfsrv_docallback nfsrv_checkgetattr nfsrvd_getattr nfsrvd_dorpc nfssvc_program svc_run_internal svc_thread_start fork_exit fork_trampoline where the panic message was "docallb", which indicates that a callback was attempted when the ClientID is unconfirmed. This would not normally occur, but it is possible to have an unconfirmed ClientID structure with delegation structure(s) chained off it if the client were to issue a SetClientID with the same "id" but different "verifier" after acquiring delegations on the previously confirmed ClientID. The bug appears to be that nfsrv_checkgetattr() failed to check for this uncommon case of an unconfirmed ClientID with a delegation structure that no longer refers to a delegation the client knows about. This patch adds a check for this case, handling it as if no delegation exists, which is the case when the above occurs. Although difficult to reproduce, this change should avoid the panic(). PR: 249127 Changes: _U stable/12/ stable/12/sys/fs/nfsserver/nfs_nfsdstate.c
A commit references this bug: Author: rmacklem Date: Mon Sep 21 01:39:01 UTC 2020 New revision: 365935 URL: https://svnweb.freebsd.org/changeset/base/365935 Log: MFC: r365703 Fix a case where the NFSv4.0 server might crash if delegations are enabled. asomers@ reported a crash on an NFSv4.0 server with a backtrace of: kdb_backtrace vpanic panic nfsrv_docallback nfsrv_checkgetattr nfsrvd_getattr nfsrvd_dorpc nfssvc_program svc_run_internal svc_thread_start fork_exit fork_trampoline where the panic message was "docallb", which indicates that a callback was attempted when the ClientID is unconfirmed. This would not normally occur, but it is possible to have an unconfirmed ClientID structure with delegation structure(s) chained off it if the client were to issue a SetClientID with the same "id" but different "verifier" after acquiring delegations on the previously confirmed ClientID. The bug appears to be that nfsrv_checkgetattr() failed to check for this uncommon case of an unconfirmed ClientID with a delegation structure that no longer refers to a delegation the client knows about. This patch adds a check for this case, handling it as if no delegation exists, which is the case when the above occurs. Although difficult to reproduce, this change should avoid the panic(). PR: 249127 Changes: _U stable/11/ stable/11/sys/fs/nfsserver/nfs_nfsdstate.c
A commit references this bug: Author: rmacklem Date: Thu Sep 24 14:59:10 UTC 2020 New revision: 366116 URL: https://svnweb.freebsd.org/changeset/base/366116 Log: MFS: r365703 Fix a case where the NFSv4.0 server might crash if delegations are enabled. asomers@ reported a crash on an NFSv4.0 server with a backtrace of: kdb_backtrace vpanic panic nfsrv_docallback nfsrv_checkgetattr nfsrvd_getattr nfsrvd_dorpc nfssvc_program svc_run_internal svc_thread_start fork_exit fork_trampoline where the panic message was "docallb", which indicates that a callback was attempted when the ClientID is unconfirmed. This would not normally occur, but it is possible to have an unconfirmed ClientID structure with delegation structure(s) chained off it if the client were to issue a SetClientID with the same "id" but different "verifier" after acquiring delegations on the previously confirmed ClientID. The bug appears to be that nfsrv_checkgetattr() failed to check for this uncommon case of an unconfirmed ClientID with a delegation structure that no longer refers to a delegation the client knows about. This patch adds a check for this case, handling it as if no delegation exists, which is the case when the above occurs. Although difficult to reproduce, this change should avoid the panic(). PR: 249127 Approved by: re (gjb) Changes: _U releng/12.2/ releng/12.2/sys/fs/nfsserver/nfs_nfsdstate.c
Patch has been MFC'd and MFS's to releng/12.2.