https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14399/console sys/file/flock_test:main -> Kernel page fault with the following non-sleepable locks held: exclusive sleep mutex sigacts (sigacts) r = 0 (0xfffff80004e99ca8) locked @ /usr/src/sys/kern/subr_trap.c:330 exclusive sleep mutex process lock (process lock) r = 0 (0xfffff8000491eb80) locked @ /usr/src/sys/kern/subr_trap.c:329 stack backtrace: #0 0xffffffff80c345a1 at witness_debugger+0x71 #1 0xffffffff80c355c0 at witness_warn+0x430 #2 0xffffffff8106cf70 at trap_pfault+0x80 #3 0xffffffff8106c587 at trap+0x2a7 #4 0xffffffff8104288c at calltrap+0x8 #5 0xffffffff80bd0bcb at sigfastblock_setpend+0x4b #6 0xffffffff80c29ca9 at ast+0x339 #7 0xffffffff81045da9 at doreti_ast+0x1f Fatal trap 12: page fault while in kernel mode cpuid = 0; apic id = 00 fault virtual address = 0x80069b038 fault code = supervisor write data, protection violation instruction pointer = 0x20:0xffffffff8106a06d stack pointer = 0x28:0xfffffe003adfbb80 frame pointer = 0x28:0xfffffe003adfbb80 code segment = base rx0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 16434 (flock_helper) trap number = 12 panic: page fault cpuid = 0 time = 1582222637 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe003adfb7e0 vpanic() at vpanic+0x185/frame 0xfffffe003adfb840 panic() at panic+0x43/frame 0xfffffe003adfb8a0 trap_fatal() at trap_fatal+0x386/frame 0xfffffe003adfb900 trap_pfault() at trap_pfault+0x99/frame 0xfffffe003adfb980 trap() at trap+0x2a7/frame 0xfffffe003adfbab0 calltrap() at calltrap+0x8/frame 0xfffffe003adfbab0 --- trap 0xc, rip = 0xffffffff8106a06d, rsp = 0xfffffe003adfbb80, rbp = 0xfffffe003adfbb80 --- casueword32_nosmap() at casueword32_nosmap+0x2d/frame 0xfffffe003adfbb80 sigfastblock_setpend() at sigfastblock_setpend+0x4b/frame 0xfffffe003adfbbb0 ast() at ast+0x339/frame 0xfffffe003adfbbf0 doreti_ast() at doreti_ast+0x1f/frame 0x7fffffffe900 KDB: enter: panic [ thread pid 16434 tid 100129 ] Stopped at kdb_enter+0x37: movq $0,0x10872f6(%rip) db:0:kdb.enter.panic> show pcpu cpuid = 0 dynamic pcpu = 0x787d80 curthread = 0xfffffe0032df7c00: pid 16434 tid 100129 critnest 1 "flock_helper" curpcb = 0xfffffe0032df8100 fpcurthread = 0xfffffe0032df7c00: pid 16434 "flock_helper" idlethread = 0xfffffe000a1f6300: tid 100003 "idle: cpu0" self = 0xffffffff82210000 curpmap = 0xfffffe0032dfab08 tssp = 0xffffffff82210384 rsp0 = 0xfffffe003adfbcc0 kcr3 = 0x8000000073a9e6f9 ucr3 = 0x8000000073ac8ef9 scr3 = 0x73ac8ef9 gs32p = 0xffffffff82210404 ldt = 0xffffffff82210444 tss = 0xffffffff82210434 tlb gen = 104535 curvnet = 0 spin locks held: db:0:kdb.enter.panic>
test VM image: https://artifact.ci.freebsd.org/snapshot/head/r358170/amd64/amd64/disk-test.img.xz to reproduce: cd /usr/tests/sys/file/ kyua debug flock_test:main
kib: can you help check if this has anything to do with r358168? Thanks.
show me verbose dmesg of the boot of this instance
(In reply to Konstantin Belousov from comment #3) https://gist.github.com/61587eef583bd45d2ccb98d260813111 Is this enough?
Created attachment 211781 [details] Do not call sigfastblock_setpend() while owning mutexes. Try this please.
(In reply to Konstantin Belousov from comment #5) After applying this patch, sys.file.flock_test.main and other sys.file.* tests pass.
A commit references this bug: Author: kib Date: Thu Feb 20 21:25:12 UTC 2020 New revision: 358182 URL: https://svnweb.freebsd.org/changeset/base/358182 Log: Fix a bug in r358168, do not call sigfastblock_setpend() under a mutex. PR: 244250 Reported and tested by: lwhsu Sponsored by: The FreeBSD Foundation Changes: head/sys/kern/subr_trap.c
No panics observed so far after r358182.