250866 – riscv/pmap: Lock inversion panic non-sleepable locks held: exclusive sleep mutex pmap (pmap) @ /usr/src/sys/riscv/riscv/pmap.c:3887 on 1300123

Bug 250866 - riscv/pmap: Lock inversion panic non-sleepable locks held: exclusive sleep mutex pmap (pmap) @ /usr/src/sys/riscv/riscv/pmap.c:3887 on 1300123

Summary: riscv/pmap: Lock inversion panic non-sleepable locks held: exclusive sleep mu...

Status:	Closed FIXED

Alias:	None

Product:	Base System
Classification:	Unclassified
Component:	riscv (show other bugs)
Version:	CURRENT
Hardware:	riscv Any

Importance:	--- Affects Only Me
Assignee:	Jessica Clarke

URL:	https://reviews.freebsd.org/D31253
Keywords:	crash, needs-qa

Depends on:
Blocks:

Reported:	2020-11-04 16:05 UTC by Dennis Clarke
Modified:	2021-09-07 16:35 UTC (History)
CC List:	2 users (show)

See Also:	https://reviews.freebsd.org/D27045 https://reviews.freebsd.org/D31087

Flags:	koobs: mfc-stable13? koobs: mfc-stable12? jrtc27: mfc-stable11-

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Dennis Clarke 2020-11-04 16:05:56 UTC

While running a test image from Mitchell Horne provided at : 

    https://reviews.freebsd.org/D27045

I was able to use it just fine under qemu and with a ZFS zpool. 

However after a number of hours while building the various dependencies
I ran into the following panic : 

Kernel page fault with the following non-sleepable locks held:
exclusive sleep mutex pmap (pmap) r = 0 (0xffffffd1a3420760) locked @ /usr/src/sys/riscv/riscv/pmap.c:3887
exclusive rw pmap pv list (pmap pv list) r = 0 (0xffffffc00083ff40) locked @ /usr/src/sys/riscv/riscv/pmap.c:3862
shared rw pmap pv global (pmap pv global) r = 0 (0xffffffc00083fe00) locked @ /usr/src/sys/riscv/riscv/pmap.c:3860
exclusive rw vm object (vm object) r = 0 (0xffffffd1d0046b58) locked @ /usr/src/sys/vm/vnode_pager.c:1239
stack backtrace:  339 -16    -     0B  5424K swapin   1  23:30   4.58% kernel
#0 0xffffffc000302a18 at witness_debugger+0x6celect   0   0:01   2.85% miniruby
#1 0xffffffc000303ba6 at witness_warn+0x420
#2 0xffffffc000536ff6 at page_fault_handler+0x60
#3 0xffffffc000536cb0 at do_trap_supervisor+0x64
#4 0xffffffc000527288 at cpu_exception_handler_supervisor+0x68
#5 0xffffffc0004f9b70 at vm_object_page_collect_flush+0x10a
#6 0xffffffc0004f990e at vm_object_page_clean+0x15e
#7 0xffffffc00036a9c0 at vinactivef+0xae
#8 0xffffffc00036a1bc at vput_final+0x256
#9 0xffffffc00036a248 at vput+0x32
#10 0xffffffc00037da08 at vn_close1+0x13c
#11 0xffffffc00037c466 at vn_closefile+0x44
#12 0xffffffc00024a324 at _fdrop+0x16
#13 0xffffffc00024d258 at closef+0x1e4
#14 0xffffffc00024a7dc at closefp+0x82
#15 0xffffffc00024afb6 at kern_close+0x11e
#16 0xffffffc00024ae8c at sys_close+0xe
#17 0xffffffc00053734e at do_trap_user+0x23a
t[0] == 0xffffffd1ffd8d800
t[1] == 0x0000000000000030
t[2] == 0x0000000000000000
t[3] == 0xffffffc062409708
t[4] == 0x000000000000000f
t[5] == 0x0000000000000180
t[6] == 0xffffffd00b47ed00
s[0] == 0xffffffc062409660
s[1] == 0xffffffd0a2cff2e0
s[2] == 0xffffffd1a3420778
s[3] == 0xffffffd1f3710830
s[4] == 0xffffffd1a3420760
s[5] == 0xffffffc0624097e0
s[6] == 0x0000000000000000
s[7] == 0x0000000000001000
s[8] == 0xffffffc0624095e0
s[9] == 0xfffffffffffff000
s[10] == 0x0000000000000001
s[11] == 0xffffffc000586443
a[0] == 0x0000000045f9e000
a[1] == 0x0000000000000000
a[2] == 0xffffffd1b6600178
a[3] == 0x0000000080000000
a[4] == 0x0000000280000000
a[5] == 0x0000000000000000
a[6] == 0xffffffcf80000000
a[7] == 0xffffffc000586443
ra == 0xffffffc000533736
sp == 0xffffffc062409560
gp == 0x0000000000000008
tp == 0x0000000000000020
sepc == 0xffffffc0005337ba
sstatus == 0x8000000000006120
panic: Fatal page fault at 0xffffffc0005337ba: 0000000000000000
cpuid = 1
time = 1604493628
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x38
kdb_backtrace() at kdb_backtrace+0x2c
vpanic() at vpanic+0x146
panic() at panic+0x26
page_fault_handler() at page_fault_handler+0x17a
do_trap_supervisor() at do_trap_supervisor+0x64
cpu_exception_handler_supervisor() at cpu_exception_handler_supervisor+0x68
--- exception 13, tval = 0
pmap_remove_write() at pmap_remove_write+0x352
vm_object_page_collect_flush() at vm_object_page_collect_flush+0x10a
vm_object_page_clean() at vm_object_page_clean+0x15e
vinactivef() at vinactivef+0xae
vput_final() at vput_final+0x256
vput() at vput+0x32
vn_close1() at vn_close1+0x13c
vn_closefile() at vn_closefile+0x44
_fdrop() at _fdrop+0x16
closef() at closef+0x1e4
closefp() at closefp+0x82
kern_close() at kern_close+0x11e
sys_close() at sys_close+0xe
do_trap_user() at do_trap_user+0x23a
cpu_exception_handler_user() at cpu_exception_handler_user+0x72
--- exception 8, tval = 0
KDB: enter: panic
[ thread pid 11478 tid 101371 ]
Stopped at      kdb_enter+0x4c: sd      zero,0(a0)
db> where
Tracing pid 11478 tid 101371 td 0xffffffc0620dfb80
kdb_enter() at kdb_enter+0x4a
vpanic() at vpanic+0x164
panic() at panic+0x26
page_fault_handler() at page_fault_handler+0x17a
do_trap_supervisor() at do_trap_supervisor+0x64
cpu_exception_handler_supervisor() at cpu_exception_handler_supervisor+0x68
--- exception 13, tval = 0
pmap_remove_write() at pmap_remove_write+0x352
vm_object_page_collect_flush() at vm_object_page_collect_flush+0x10a
vm_object_page_clean() at vm_object_page_clean+0x15e
vinactivef() at vinactivef+0xae
vput_final() at vput_final+0x256
vput() at vput+0x32
vn_close1() at vn_close1+0x13c
vn_closefile() at vn_closefile+0x44
_fdrop() at _fdrop+0x16
closef() at closef+0x1e4
closefp() at closefp+0x82
kern_close() at kern_close+0x11e
sys_close() at sys_close+0xe
do_trap_user() at do_trap_user+0x23a
cpu_exception_handler_user() at cpu_exception_handler_user+0x72
--- exception 8, tval = 0
db> 

Could be a null pointer deref issue? 

Seeing tval is 0 could be null pointer not a locking issue. There is also
exception 8, tval = 0 and maybe that is userspace making a syscall. Not my
area of knowledge ... sorry.

this probably only happens on FreeBSD CURRENT with the witness options and
debugging options enabled. 

If I can reproduce it that I may be able to gather a dump next time. 


-- 
Dennis Clarke
RISC-V/SPARC/PPC/ARM/CISC
UNIX and Linux spoken
GreyBeard and suspenders optional

Comment 1 Jessica Clarke freebsd_committer

2020-11-05 15:24:20 UTC

The disk image doesn't have a kernel.full in it, but here's the disassembly:

(gdb) x/20i 0xffffffc0005337ba
   0xffffffc0005337ba <pmap_remove_write+850>:  ld      a0,0(a1)
   0xffffffc0005337bc <pmap_remove_write+852>:  sd      a0,-128(s0)
   0xffffffc0005337c0 <pmap_remove_write+856>:  andi    a2,a0,4
   0xffffffc0005337c4 <pmap_remove_write+860>:  beqz    a2,0xffffffc000533700 <pmap_remove_write+664>
   0xffffffc0005337c6 <pmap_remove_write+862>:  andi    a2,a0,-133
   0xffffffc0005337ca <pmap_remove_write+866>:  li      a4,1
   0xffffffc0005337cc <pmap_remove_write+868>:  lr.d    a3,(a1)
   0xffffffc0005337d0 <pmap_remove_write+872>:  bne     a3,a0,0xffffffc0005337da <pmap_remove_write+882>
   0xffffffc0005337d4 <pmap_remove_write+876>:  sc.d    a4,a2,(a1)
   0xffffffc0005337d8 <pmap_remove_write+880>:  j       0xffffffc0005337de <pmap_remove_write+886>
   0xffffffc0005337da <pmap_remove_write+882>:  sd      a3,0(s8)
   0xffffffc0005337de <pmap_remove_write+886>:  ld      a0,-128(s0)
   0xffffffc0005337e2 <pmap_remove_write+890>:  slli    a2,a4,0x20
   0xffffffc0005337e6 <pmap_remove_write+894>:  srli    a2,a2,0x20
   0xffffffc0005337e8 <pmap_remove_write+896>:  bnez    a2,0xffffffc0005337c0 <pmap_remove_write+856>
   0xffffffc0005337ea <pmap_remove_write+898>:  andi    a0,a0,128
   0xffffffc0005337ee <pmap_remove_write+902>:  beqz    a0,0xffffffc0005337fa <pmap_remove_write+914>
   0xffffffc0005337f0 <pmap_remove_write+904>:  mv      a0,s3
   0xffffffc0005337f2 <pmap_remove_write+906>:  auipc   ra,0xfffcc
   0xffffffc0005337f6 <pmap_remove_write+910>:  jalr    -1888(ra)

The `andi    a2,a0,4` looks like masking with PTE_W, the `andi    a2,a0,-133` looks like masking with `~(PTE_D | PTE_W)`, and the li/lr.d/bne/sc.d/j/sd sequence is precisely atomic_fcmpset_64 (aka atomic_fcmpset_long), so this would indeed appear to be that the `oldl3 = pmap_load(l3);` faulted due to l3 (a1) coming from pmap_l3 being NULL. That likely means that either the mapping is no longer valid or we somehow ended up with a superpage in that list.

Comment 2 Jessica Clarke freebsd_committer

2021-07-07 03:04:03 UTC

Likely fix: https://reviews.freebsd.org/D31087

Comment 3 Jessica Clarke freebsd_committer

2021-07-08 02:31:42 UTC

Unfortunately further investigation reveals additional subtleties that suggest the above patch does not fix the bug, only a theoretical panic that's not yet been seen (have to have a superpage being allocated for userspace that forces the creation of a new L2 table, but I've yet to find a way to cause that to happen due to the nature of transparent superpages and when we choose to promote).

Comment 4 Jessica Clarke freebsd_committer

2021-07-21 02:19:34 UTC

More likely fix: https://reviews.freebsd.org/D31253

Comment 5 Jessica Clarke freebsd_committer

2021-07-21 02:39:20 UTC

The underlying bug is also believed to result in the following similar panics (especially the first one):

panic: pmap_l2_to_l3: PA out of range, PA: 0x0
cpuid = 1
time = 1625512247
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x38
kdb_backtrace() at kdb_backtrace+0x2c
vpanic() at vpanic+0x148
panic() at panic+0x2a
pmap_remove_write() at pmap_remove_write+0x56a
vm_object_page_collect_flush() at vm_object_page_collect_flush+0xf8
vm_object_page_clean() at vm_object_page_clean+0x144
vinactivef() at vinactivef+0x90
vput_final() at vput_final+0x2ea
vput() at vput+0x32
vn_close1() at vn_close1+0x13c
vn_closefile() at vn_closefile+0x44
_fdrop() at _fdrop+0x18
closef() at closef+0x1b8
closefp_impl() at closefp_impl+0x78
closefp() at closefp+0x52
kern_close() at kern_close+0x134
sys_close() at sys_close+0xe
do_trap_user() at do_trap_user+0x208
cpu_exception_handler_user() at cpu_exception_handler_user+0x72
--- exception 8, tval = 0x6
KDB: enter: panic

panic: Fatal page fault at 0xffffffc0005cf24e: 0x00000000000050
cpuid = 0
time = 1625611370
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x38
kdb_backtrace() at kdb_backtrace+0x2c
vpanic() at vpanic+0x148
panic() at panic+0x2a
page_fault_handler() at page_fault_handler+0x1ba
do_trap_supervisor() at do_trap_supervisor+0x7a
cpu_exception_handler_supervisor() at cpu_exception_handler_supervisor+0x70
--- exception 13, tval = 0x50
pmap_enter_l2() at pmap_enter_l2+0xb2
pmap_enter_object() at pmap_enter_object+0x15e
vm_map_pmap_enter() at vm_map_pmap_enter+0x228
vm_map_insert() at vm_map_insert+0x4ec
vm_map_find() at vm_map_find+0x474
vm_map_find_min() at vm_map_find_min+0x52
vm_mmap_object() at vm_mmap_object+0x1ba
vn_mmap() at vn_mmap+0xf8
kern_mmap() at kern_mmap+0x4c4
sys_mmap() at sys_mmap+0x38
do_trap_user() at do_trap_user+0x208
cpu_exception_handler_user() at cpu_exception_handler_user+0x72
--- exception 8, tval = 0x1dd
KDB: enter: panic

The race fixed by the linked patch is exposed by transparent superpage promotion (without that the pmap_kextract code, whilst inefficient, has nothing to race with), which is not implemented in stable/11 for RISC-V but was MFC'ed to stable/12 in r348644 (da49aaa2557ae2b49c538fc68e97b8c67348f9a6).

Comment 6 commit-hook freebsd_committer

2021-07-22 19:04:29 UTC

A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=4a235049082ee1cb044873ad9aff12cf73d0fd3b

commit 4a235049082ee1cb044873ad9aff12cf73d0fd3b
Author:     Jessica Clarke <jrtc27@FreeBSD.org>
AuthorDate: 2021-07-22 19:02:14 +0000
Commit:     Jessica Clarke <jrtc27@FreeBSD.org>
CommitDate: 2021-07-22 19:02:14 +0000

    riscv: Fix pmap_kextract racing with concurrent superpage promotion/demotion

    This repeats amd64's cfcbf8c6fd3b (r180498) and i386's cf3508519c5e
    (r202894) but for riscv; pmap_kextract must be lock-free and so it can
    race with superpage promotion and demotion, thus the L2 entry must only
    be loaded once to avoid using inconsistent state.

    PR:     250866
    Reviewed by:    markj, mhorne
    Tested by:      David Gilbert <dgilbert@daveg.ca>
    MFC after:      1 week
    Differential Revision:  https://reviews.freebsd.org/D31253

 sys/riscv/riscv/pmap.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

Comment 7 commit-hook freebsd_committer

2021-09-07 12:10:38 UTC

A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=2e3c6024a476f622e43e68243445168aa40a8d8e

commit 2e3c6024a476f622e43e68243445168aa40a8d8e
Author:     Jessica Clarke <jrtc27@FreeBSD.org>
AuthorDate: 2021-07-22 19:02:14 +0000
Commit:     Jessica Clarke <jrtc27@FreeBSD.org>
CommitDate: 2021-09-07 12:06:49 +0000

    riscv: Fix pmap_kextract racing with concurrent superpage promotion/demotion

    This repeats amd64's cfcbf8c6fd3b (r180498) and i386's cf3508519c5e
    (r202894) but for riscv; pmap_kextract must be lock-free and so it can
    race with superpage promotion and demotion, thus the L2 entry must only
    be loaded once to avoid using inconsistent state.

    PR:     250866
    Reviewed by:    markj, mhorne
    Tested by:      David Gilbert <dgilbert@daveg.ca>
    MFC after:      1 week
    Differential Revision:  https://reviews.freebsd.org/D31253

    (cherry picked from commit 4a235049082ee1cb044873ad9aff12cf73d0fd3b)

 sys/riscv/riscv/pmap.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)