panic: mutex pipe mutex not owned at ../../../kern/sys_pipe.c:1748 cpuid = 2 time = 1549792086 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x47/frame 0xfffffe00b8acc350 vpanic() at vpanic+0x20c/frame 0xfffffe00b8acc3b0 panic() at panic+0x43/frame 0xfffffe00b8acc410 __mtx_assert() at __mtx_assert+0x174/frame 0xfffffe00b8acc450 filt_pipewrite() at filt_pipewrite+0x38/frame 0xfffffe00b8acc480 kqueue_register() at kqueue_register+0xec2/frame 0xfffffe00b8acc520 kqueue_kevent() at kqueue_kevent+0x10b/frame 0xfffffe00b8acc800 kern_kevent_fp() at kern_kevent_fp+0xbc/frame 0xfffffe00b8acc850 kern_kevent() at kern_kevent+0xe2/frame 0xfffffe00b8acc8b0 kern_kevent_generic() at kern_kevent_generic+0xb4/frame 0xfffffe00b8acc910 sys_kevent() at sys_kevent+0x72/frame 0xfffffe00b8acc980 amd64_syscall() at amd64_syscall+0x45e/frame 0xfffffe00b8accab0 fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe00b8accab0 --- syscall (560, FreeBSD ELF64, sys_kevent), rip = 0x80043055a, rsp = 0x7fffdfffde88, rbp = 0x7fffdfffded0 --- KDB: enter: panic [ thread pid 9322 tid 101144 ] Stopped at kdb_enter+0x6a: movq $0,kdb_why db> x/s version version: FreeBSD 13.0-CURRENT #0 r343931: Sat Feb 9 12:49:52 CET 2019\012 pho@mercat1.netperf.freebsd.org:/usr/src/sys/amd64/compile/PHO\012 db> A slightly older stack trace is found here: https://people.freebsd.org/~pho/stress/log/kevent13.txt
Could you provide a copy of k6.sh? I don't see it in stress2.
(In reply to Mark Johnston from comment #1) Test scenario: misc/kevent13.sh I just updated the test scenario to include VM pressure. root@t2:~pho/stress2/misc # uname -a FreeBSD t2.osted.lan 13.0-CURRENT FreeBSD 13.0-CURRENT #2 r343705M: Sun Feb 10 19:11:22 CET 2019 pho@t2.osted.lan:/usr/src/sys/amd64/compile/PHO amd64 root@t2:~pho/stress2/misc # ./kevent13.sh panic: mutex pipe mutex not owned at ../../../kern/sys_pipe.c:1748 cpuid = 14 time = 1549896871 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00d70b33d0 vpanic() at vpanic+0x1b4/frame 0xfffffe00d70b3430 panic() at panic+0x43/frame 0xfffffe00d70b3490 __mtx_assert() at __mtx_assert+0xb4/frame 0xfffffe00d70b34a0 filt_pipewrite() at filt_pipewrite+0x30/frame 0xfffffe00d70b34c0 kqueue_register() at kqueue_register+0xa03/frame 0xfffffe00d70b3550 kqueue_kevent() at kqueue_kevent+0xa2/frame 0xfffffe00d70b3820 kern_kevent_fp() at kern_kevent_fp+0x97/frame 0xfffffe00d70b3870 kern_kevent() at kern_kevent+0x9f/frame 0xfffffe00d70b38d0 kern_kevent_generic() at kern_kevent_generic+0x70/frame 0xfffffe00d70b3930 sys_kevent() at sys_kevent+0x61/frame 0xfffffe00d70b3990 amd64_syscall() at amd64_syscall+0x291/frame 0xfffffe00d70b3ab0 fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe00d70b3ab0 --- syscall (560, FreeBSD ELF64, sys_kevent), rip = 0x80043055a, rsp = 0x7fffdfffde88, rbp = 0x7fffdfffded0 --- KDB: enter: panic [ thread pid 1369 tid 101232 ] Stopped at kdb_enter+0x3b: movq $0,kdb_why db>
This is caused by a race between one end of a pipe being closed and the registration of an EVFILT_WRITE knote for the other end of the pipe. Suppose f_attach (pipe_kqfilter()) succeeds, but before kn_list_lock() is called, the other end of the pipe is closed. pipeclose() removes the EVFILT_WRITE knote from that end's knlist, so kn_list_lock() returns NULL, and we call filt_pipewrite() without the pipe's mutex held. I think this simply means that the assertion in filt_pipewrite() is wrong. We can only assert that the pipe lock is held if wpipe->pipe_present == PIPE_ACTIVE.
Created attachment 202077 [details] proposed patch Peter, could you test the attached patch? BTW, I think kevent13.sh is slightly buggy: it uses EV_SET twice to initialize the same event, and declares that the change list has length 2 when it is in fact 1. For the purpose of reproducing the bug, only the EVFILT_WRITE event is needed. We'd get better coverage by registering both events and randomizing the order in which do_thread2() closes the pipe ends.
(In reply to Mark Johnston from comment #4) Sure, I'll test the patch. I'll work on adding a new test scenario.
(In reply to Mark Johnston from comment #4) I ran tests for 8 hours including a buildworld. LGTM.
A commit references this bug: Author: markj Date: Tue Feb 19 15:46:43 UTC 2019 New revision: 344278 URL: https://svnweb.freebsd.org/changeset/base/344278 Log: Move a racy assertion in filt_pipewrite(). EVFILT_WRITE knotes for pipes live on the knlist for the other end of the pipe. Since they do not hold a reference on the corresponding file structure, they may be removed from the knlist by pipeclose() while still remaining active. In this case, there is no knlist lock acquired before filt_pipewrite() is called, so the assertion fails. Fix the problem by first checking whether that end of the pipe has been closed. These checks are memory safe since the knote holds a reference on one end of the pipe, and the pipe structure is not freed until both ends are closed. The checks are not racy since PIPE_EOF is never cleared after being set, and pipe_present is never set back to PIPE_ACTIVE after pipeclose() has been called. PR: 235640 Reported and tested by: pho Reviewed by: kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D19224 Changes: head/sys/kern/sys_pipe.c
A commit references this bug: Author: markj Date: Tue Mar 5 19:37:07 UTC 2019 New revision: 344819 URL: https://svnweb.freebsd.org/changeset/base/344819 Log: MFC r344278: Move a racy assertion in filt_pipewrite(). PR: 235640 Changes: _U stable/12/ stable/12/sys/kern/sys_pipe.c
A commit references this bug: Author: markj Date: Tue Mar 5 19:39:14 UTC 2019 New revision: 344820 URL: https://svnweb.freebsd.org/changeset/base/344820 Log: MFC r344278: Move a racy assertion in filt_pipewrite(). PR: 235640 Changes: _U stable/11/ stable/11/sys/kern/sys_pipe.c