Bug 235640

Summary: panic: mutex pipe mutex not owned at ../../../kern/sys_pipe.c:1748
Product: Base System Reporter: Peter Holm <pho>
Component: kernAssignee: Mark Johnston <markj>
Status: Closed FIXED    
Severity: Affects Only Me CC: cem, markj
Priority: --- Keywords: crash, stress2
Version: CURRENT   
Hardware: Any   
OS: Any   
Attachments:
Description Flags
proposed patch none

Description Peter Holm freebsd_committer freebsd_triage 2019-02-10 12:48:11 UTC
panic: mutex pipe mutex not owned at ../../../kern/sys_pipe.c:1748
cpuid = 2
time = 1549792086
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x47/frame 0xfffffe00b8acc350
vpanic() at vpanic+0x20c/frame 0xfffffe00b8acc3b0
panic() at panic+0x43/frame 0xfffffe00b8acc410
__mtx_assert() at __mtx_assert+0x174/frame 0xfffffe00b8acc450
filt_pipewrite() at filt_pipewrite+0x38/frame 0xfffffe00b8acc480
kqueue_register() at kqueue_register+0xec2/frame 0xfffffe00b8acc520
kqueue_kevent() at kqueue_kevent+0x10b/frame 0xfffffe00b8acc800
kern_kevent_fp() at kern_kevent_fp+0xbc/frame 0xfffffe00b8acc850
kern_kevent() at kern_kevent+0xe2/frame 0xfffffe00b8acc8b0
kern_kevent_generic() at kern_kevent_generic+0xb4/frame 0xfffffe00b8acc910
sys_kevent() at sys_kevent+0x72/frame 0xfffffe00b8acc980
amd64_syscall() at amd64_syscall+0x45e/frame 0xfffffe00b8accab0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe00b8accab0
--- syscall (560, FreeBSD ELF64, sys_kevent), rip = 0x80043055a, rsp = 0x7fffdfffde88, rbp = 0x7fffdfffded0 ---
KDB: enter: panic
[ thread pid 9322 tid 101144 ]
Stopped at      kdb_enter+0x6a: movq    $0,kdb_why
db> x/s version
version:        FreeBSD 13.0-CURRENT #0 r343931: Sat Feb  9 12:49:52 CET 2019\012    pho@mercat1.netperf.freebsd.org:/usr/src/sys/amd64/compile/PHO\012
db>

A slightly older stack trace is found here:
https://people.freebsd.org/~pho/stress/log/kevent13.txt
Comment 1 Mark Johnston freebsd_committer freebsd_triage 2019-02-11 14:04:52 UTC
Could you provide a copy of k6.sh?  I don't see it in stress2.
Comment 2 Peter Holm freebsd_committer freebsd_triage 2019-02-11 14:58:11 UTC
(In reply to Mark Johnston from comment #1)

Test scenario: misc/kevent13.sh
I just updated the test scenario to include VM pressure.

root@t2:~pho/stress2/misc # uname -a
FreeBSD t2.osted.lan 13.0-CURRENT FreeBSD 13.0-CURRENT #2 r343705M: Sun Feb 10 19:11:22 CET 2019     pho@t2.osted.lan:/usr/src/sys/amd64/compile/PHO  amd64
root@t2:~pho/stress2/misc # ./kevent13.sh 
panic: mutex pipe mutex not owned at ../../../kern/sys_pipe.c:1748
cpuid = 14
time = 1549896871
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00d70b33d0
vpanic() at vpanic+0x1b4/frame 0xfffffe00d70b3430
panic() at panic+0x43/frame 0xfffffe00d70b3490
__mtx_assert() at __mtx_assert+0xb4/frame 0xfffffe00d70b34a0
filt_pipewrite() at filt_pipewrite+0x30/frame 0xfffffe00d70b34c0
kqueue_register() at kqueue_register+0xa03/frame 0xfffffe00d70b3550
kqueue_kevent() at kqueue_kevent+0xa2/frame 0xfffffe00d70b3820
kern_kevent_fp() at kern_kevent_fp+0x97/frame 0xfffffe00d70b3870
kern_kevent() at kern_kevent+0x9f/frame 0xfffffe00d70b38d0
kern_kevent_generic() at kern_kevent_generic+0x70/frame 0xfffffe00d70b3930
sys_kevent() at sys_kevent+0x61/frame 0xfffffe00d70b3990
amd64_syscall() at amd64_syscall+0x291/frame 0xfffffe00d70b3ab0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe00d70b3ab0
--- syscall (560, FreeBSD ELF64, sys_kevent), rip = 0x80043055a, rsp = 0x7fffdfffde88, rbp = 0x7fffdfffded0 ---
KDB: enter: panic
[ thread pid 1369 tid 101232 ]
Stopped at      kdb_enter+0x3b: movq    $0,kdb_why
db>
Comment 3 Mark Johnston freebsd_committer freebsd_triage 2019-02-16 23:20:57 UTC
This is caused by a race between one end of a pipe being closed and the registration of an EVFILT_WRITE knote for the other end of the pipe.  Suppose f_attach (pipe_kqfilter()) succeeds, but before kn_list_lock() is called, the other end of the pipe is closed.  pipeclose() removes the EVFILT_WRITE knote from that end's knlist, so kn_list_lock() returns NULL, and we call filt_pipewrite() without the pipe's mutex held.

I think this simply means that the assertion in filt_pipewrite() is wrong.  We can only assert that the pipe lock is held if wpipe->pipe_present == PIPE_ACTIVE.
Comment 4 Mark Johnston freebsd_committer freebsd_triage 2019-02-16 23:24:10 UTC
Created attachment 202077 [details]
proposed patch

Peter, could you test the attached patch?

BTW, I think kevent13.sh is slightly buggy: it uses EV_SET twice to initialize the same event, and declares that the change list has length 2 when it is in fact 1.  For the purpose of reproducing the bug, only the EVFILT_WRITE event is needed.  We'd get better coverage by registering both events and randomizing the order in which do_thread2() closes the pipe ends.
Comment 5 Peter Holm freebsd_committer freebsd_triage 2019-02-17 07:24:08 UTC
(In reply to Mark Johnston from comment #4)
Sure, I'll test the patch.

I'll work on adding a new test scenario.
Comment 6 Peter Holm freebsd_committer freebsd_triage 2019-02-17 15:28:21 UTC
(In reply to Mark Johnston from comment #4)
I ran tests for 8 hours including a buildworld.
LGTM.
Comment 7 commit-hook freebsd_committer freebsd_triage 2019-02-19 15:46:54 UTC
A commit references this bug:

Author: markj
Date: Tue Feb 19 15:46:43 UTC 2019
New revision: 344278
URL: https://svnweb.freebsd.org/changeset/base/344278

Log:
  Move a racy assertion in filt_pipewrite().

  EVFILT_WRITE knotes for pipes live on the knlist for the other end of the
  pipe.  Since they do not hold a reference on the corresponding file
  structure, they may be removed from the knlist by pipeclose() while still
  remaining active.  In this case, there is no knlist lock acquired before
  filt_pipewrite() is called, so the assertion fails.

  Fix the problem by first checking whether that end of the pipe has been
  closed.  These checks are memory safe since the knote holds a reference
  on one end of the pipe, and the pipe structure is not freed until both
  ends are closed.  The checks are not racy since PIPE_EOF is never cleared
  after being set, and pipe_present is never set back to PIPE_ACTIVE after
  pipeclose() has been called.

  PR:		235640
  Reported and tested by:	pho
  Reviewed by:	kib
  MFC after:	2 weeks
  Sponsored by:	The FreeBSD Foundation
  Differential Revision:	https://reviews.freebsd.org/D19224

Changes:
  head/sys/kern/sys_pipe.c
Comment 8 commit-hook freebsd_committer freebsd_triage 2019-03-05 19:37:14 UTC
A commit references this bug:

Author: markj
Date: Tue Mar  5 19:37:07 UTC 2019
New revision: 344819
URL: https://svnweb.freebsd.org/changeset/base/344819

Log:
  MFC r344278:
  Move a racy assertion in filt_pipewrite().

  PR:	235640

Changes:
_U  stable/12/
  stable/12/sys/kern/sys_pipe.c
Comment 9 commit-hook freebsd_committer freebsd_triage 2019-03-05 19:39:17 UTC
A commit references this bug:

Author: markj
Date: Tue Mar  5 19:39:14 UTC 2019
New revision: 344820
URL: https://svnweb.freebsd.org/changeset/base/344820

Log:
  MFC r344278:
  Move a racy assertion in filt_pipewrite().

  PR:	235640

Changes:
_U  stable/11/
  stable/11/sys/kern/sys_pipe.c