Bug 244250

Summary: sys.file.flock_test.main panics kernel after [r358153, r358170]
Product: Base System Reporter: Li-Wen Hsu <lwhsu>
Component: testsAssignee: freebsd-testing (Nobody) <testing>
Status: Closed FIXED    
Severity: Affects Only Me CC: kib
Priority: ---    
Version: CURRENT   
Hardware: Any   
OS: Any   
Attachments:
Description Flags
Do not call sigfastblock_setpend() while owning mutexes. none

Description Li-Wen Hsu freebsd_committer freebsd_triage 2020-02-20 18:53:50 UTC
https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14399/console

sys/file/flock_test:main  ->  Kernel page fault with the following non-sleepable locks held:
exclusive sleep mutex sigacts (sigacts) r = 0 (0xfffff80004e99ca8) locked @ /usr/src/sys/kern/subr_trap.c:330
exclusive sleep mutex process lock (process lock) r = 0 (0xfffff8000491eb80) locked @ /usr/src/sys/kern/subr_trap.c:329
stack backtrace:
#0 0xffffffff80c345a1 at witness_debugger+0x71
#1 0xffffffff80c355c0 at witness_warn+0x430
#2 0xffffffff8106cf70 at trap_pfault+0x80
#3 0xffffffff8106c587 at trap+0x2a7
#4 0xffffffff8104288c at calltrap+0x8
#5 0xffffffff80bd0bcb at sigfastblock_setpend+0x4b
#6 0xffffffff80c29ca9 at ast+0x339
#7 0xffffffff81045da9 at doreti_ast+0x1f


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x80069b038
fault code              = supervisor write data, protection violation
instruction pointer     = 0x20:0xffffffff8106a06d
stack pointer           = 0x28:0xfffffe003adfbb80
frame pointer           = 0x28:0xfffffe003adfbb80
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 16434 (flock_helper)
trap number             = 12
panic: page fault
cpuid = 0
time = 1582222637
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe003adfb7e0
vpanic() at vpanic+0x185/frame 0xfffffe003adfb840
panic() at panic+0x43/frame 0xfffffe003adfb8a0
trap_fatal() at trap_fatal+0x386/frame 0xfffffe003adfb900
trap_pfault() at trap_pfault+0x99/frame 0xfffffe003adfb980
trap() at trap+0x2a7/frame 0xfffffe003adfbab0
calltrap() at calltrap+0x8/frame 0xfffffe003adfbab0
--- trap 0xc, rip = 0xffffffff8106a06d, rsp = 0xfffffe003adfbb80, rbp = 0xfffffe003adfbb80 ---
casueword32_nosmap() at casueword32_nosmap+0x2d/frame 0xfffffe003adfbb80
sigfastblock_setpend() at sigfastblock_setpend+0x4b/frame 0xfffffe003adfbbb0
ast() at ast+0x339/frame 0xfffffe003adfbbf0
doreti_ast() at doreti_ast+0x1f/frame 0x7fffffffe900
KDB: enter: panic
[ thread pid 16434 tid 100129 ]
Stopped at      kdb_enter+0x37: movq    $0,0x10872f6(%rip)
db:0:kdb.enter.panic> show pcpu
cpuid        = 0
dynamic pcpu = 0x787d80
curthread    = 0xfffffe0032df7c00: pid 16434 tid 100129 critnest 1 "flock_helper"
curpcb       = 0xfffffe0032df8100
fpcurthread  = 0xfffffe0032df7c00: pid 16434 "flock_helper"
idlethread   = 0xfffffe000a1f6300: tid 100003 "idle: cpu0"
self         = 0xffffffff82210000
curpmap      = 0xfffffe0032dfab08
tssp         = 0xffffffff82210384
rsp0         = 0xfffffe003adfbcc0
kcr3         = 0x8000000073a9e6f9
ucr3         = 0x8000000073ac8ef9
scr3         = 0x73ac8ef9
gs32p        = 0xffffffff82210404
ldt          = 0xffffffff82210444
tss          = 0xffffffff82210434
tlb gen      = 104535
curvnet      = 0
spin locks held:
db:0:kdb.enter.panic>
Comment 1 Li-Wen Hsu freebsd_committer freebsd_triage 2020-02-20 18:56:30 UTC
test VM image: https://artifact.ci.freebsd.org/snapshot/head/r358170/amd64/amd64/disk-test.img.xz

to reproduce:
cd /usr/tests/sys/file/
kyua debug flock_test:main
Comment 2 Li-Wen Hsu freebsd_committer freebsd_triage 2020-02-20 18:57:28 UTC
kib: can you help check if this has anything to do with r358168? Thanks.
Comment 3 Konstantin Belousov freebsd_committer freebsd_triage 2020-02-20 19:42:34 UTC
show me verbose dmesg of the boot of this instance
Comment 4 Li-Wen Hsu freebsd_committer freebsd_triage 2020-02-20 19:54:21 UTC
(In reply to Konstantin Belousov from comment #3)
https://gist.github.com/61587eef583bd45d2ccb98d260813111
Is this enough?
Comment 5 Konstantin Belousov freebsd_committer freebsd_triage 2020-02-20 19:59:10 UTC
Created attachment 211781 [details]
Do not call sigfastblock_setpend() while owning mutexes.

Try this please.
Comment 6 Li-Wen Hsu freebsd_committer freebsd_triage 2020-02-20 21:06:12 UTC
(In reply to Konstantin Belousov from comment #5)
After applying this patch, sys.file.flock_test.main and other sys.file.* tests pass.
Comment 7 commit-hook freebsd_committer freebsd_triage 2020-02-20 21:25:18 UTC
A commit references this bug:

Author: kib
Date: Thu Feb 20 21:25:12 UTC 2020
New revision: 358182
URL: https://svnweb.freebsd.org/changeset/base/358182

Log:
  Fix a bug in r358168, do not call sigfastblock_setpend() under a mutex.

  PR:	244250
  Reported and tested by:	lwhsu
  Sponsored by:	The FreeBSD Foundation

Changes:
  head/sys/kern/subr_trap.c
Comment 8 Li-Wen Hsu freebsd_committer freebsd_triage 2020-03-02 04:18:53 UTC
No panics observed so far after r358182.