Bug 255695 - crash in NFSv4.1 server when processing a callback reply
Summary: crash in NFSv4.1 server when processing a callback reply
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 12.2-RELEASE
Hardware: Any Any
Importance: --- Affects Some People
Assignee: Rick Macklem
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-05-08 00:45 UTC by Rick Macklem
Modified: 2021-05-08 02:09 UTC

See Also:
rmacklem: mfc-stable13?
rmacklem: mfc-stable12?


Attachments
ref cnt the CLIENT structure so that it is not prematurely free'd (2.54 KB, patch)
2021-05-08 00:57 UTC, Rick Macklem
ref cnt the CLIENT structure so that it is not prematurely free'd for freebsd12 (3.03 KB, patch)
2021-05-08 02:09 UTC, Rick Macklem

Description Rick Macklem freebsd_committer 2021-05-08 00:45:08 UTC
The following crash was reported in a FreeNAS12 server:
> Fatal trap 12: page fault while in kernel mode
> cpuid = 1; apic id = 02
> fault virtual address   = 0x410
> fault code              = supervisor read data, page not present
> instruction pointer     = 0x20:0xffffffff80aa4a57
> stack pointer           = 0x28:0xfffffe021f94f150
> frame pointer           = 0x28:0xfffffe021f94f1d0
> code segment            = base rx0, limit 0xfffff, type 0x1b
>                           = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 4908 (nfsd: service)
> trap number             = 12
> panic: page fault
> cpuid = 1
> time = 1619545070
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe021f94ee10
> vpanic() at vpanic+0x17b/frame 0xfffffe021f94ee60
> panic() at panic+0x43/frame 0xfffffe021f94eec0
> trap_fatal() at trap_fatal+0x391/frame 0xfffffe021f94ef20
> trap_pfault() at trap_pfault+0x4f/frame 0xfffffe021f94ef70
> trap() at trap+0x286/frame 0xfffffe021f94f080
> calltrap() at calltrap+0x8/frame 0xfffffe021f94f080
> --- trap 0xc, rip = 0xffffffff80aa4a57, rsp = 0xfffffe021f94f150, rbp = 0xfffffe021f94f1d0 ---
> __mtx_lock_sleep() at __mtx_lock_sleep+0xd7/frame 0xfffffe021f94f1d0
> clnt_bck_svccall() at clnt_bck_svccall+0x10a/frame 0xfffffe021f94f210
> svc_vc_recv() at svc_vc_recv+0x1b2/frame 0xfffffe021f94f2e0
> svc_run_internal() at svc_run_internal+0x377/frame 0xfffffe021f94f420
> svc_thread_start() at svc_thread_start+0xb/frame 0xfffffe021f94f430
> fork_exit() at fork_exit+0x7e/frame 0xfffffe021f94f470
> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe021f94f470
> --- trap 0xc, rip = 0x8002e1b2a, rsp = 0x7fffffffe578, rbp = 0x7fffffffe810 ---
> KDB: enter: panic

This crash in clnt_bck_svccall() appears to have occurred
because the CLIENT structure for handling the callback RPCs
has already been free'd.
Freeing this CLIENT structure only occurs when the ClientID
(not the same thing, despite the name similarity) has been
destroyed.
Comment 1 Rick Macklem freebsd_committer 2021-05-08 00:57:59 UTC
Created attachment 224761
ref cnt the CLIENT structure so that it is not prematurely free'd

This patch acquires a reference count on the CLIENT
structure that is not released until the associated
socket structure is destroyed.
This should avoid the structure being free'd before
a callback reply has been processed.

It also adds a check for closed or closing, so that
it does not try to process a callback reply after the
CLIENT structure has been CLNT_CLOSE()'d.

I think this patch should fix the crash.

I am waiting for a review and, hopefully, a positive
test report from Michael Dexter, who reported the crash.
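
For readers who have not opened the attachment, the idea described above can be sketched in userspace C. This is only an illustration, not the patch itself: struct cb_client, struct cb_xprt and every cb_* function below are hypothetical stand-ins for the kernel's CLIENT and socket-side structures, and the locking that the real code needs is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the RPC CLIENT handle (not the kernel struct). */
struct cb_client {
	int	refs;	/* number of holders of this handle */
	bool	closed;	/* set once the handle has been closed */
};

/* Hypothetical stand-in for the socket-side transport structure. */
struct cb_xprt {
	struct cb_client *cl;	/* referenced callback client, or NULL */
};

static int cb_client_frees;	/* counts frees, for demonstration only */

/* Allocate a client handle; the creator holds the first reference. */
static struct cb_client *
cb_client_create(void)
{
	struct cb_client *cl = calloc(1, sizeof(*cl));

	if (cl != NULL)
		cl->refs = 1;
	return (cl);
}

static void
cb_client_acquire(struct cb_client *cl)
{
	cl->refs++;
}

/* Drop one reference; the memory is freed only when the last one goes. */
static void
cb_client_release(struct cb_client *cl)
{
	assert(cl->refs > 0);
	if (--cl->refs == 0) {
		free(cl);
		cb_client_frees++;
	}
}

/* Mark the handle closed; it stays allocated while references remain. */
static void
cb_client_close(struct cb_client *cl)
{
	cl->closed = true;
}

/* The transport takes its own reference when the client is attached. */
static void
cb_xprt_attach(struct cb_xprt *xprt, struct cb_client *cl)
{
	cb_client_acquire(cl);
	xprt->cl = cl;
}

/* The transport's reference is released only when it is destroyed. */
static void
cb_xprt_destroy(struct cb_xprt *xprt)
{
	if (xprt->cl != NULL) {
		cb_client_release(xprt->cl);
		xprt->cl = NULL;
	}
}

/* Returns true if a callback reply would be processed, false if dropped. */
static bool
cb_xprt_handle_reply(struct cb_xprt *xprt)
{
	struct cb_client *cl = xprt->cl;

	if (cl == NULL || cl->closed)
		return (false);	/* closed or closing: do not touch it */
	/* Real code would lock cl and match the reply to a pending call. */
	return (true);
}
```

In this sketch, releasing the creator's reference after cb_client_close() does not free the structure while a transport is still attached, so a late reply finds valid memory and is dropped by the closed check instead of dereferencing a free'd pointer.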
Comment 2 Rick Macklem freebsd_committer 2021-05-08 02:09:10 UTC
Created attachment 224763
ref cnt the CLIENT structure so that it is not prematurely free'd for freebsd12

Same patch as 224761, but for older kernels.
Search in sys/fs/nfsserver/nfs_nfsdstate.c
for xp_p2. If you find 3 of them, use this patch.
If you find 2 of them, use 224761.