Bug 228858

Summary: panic when delivering knote to a process who has opened a kqueue() is dying
Product: Base System Reporter: Siddharth <siddharthtuli>
Component: kernAssignee: Mark Johnston <markj>
Status: Closed FIXED    
Severity: Affects Some People CC: markj, pho, siddharthtuli
Priority: --- Keywords: crash
Version: 11.0-STABLE   
Hardware: Any   
OS: Any   

Description Siddharth 2018-06-10 05:45:52 UTC
The following race can occur on multi-processor systems resulting in this panic.

-- Race --

	cpu x 											cpu y
process X dies						       		 Process Y is sending knote() to kqueue() opened by X
kqueue_close								    knote (due to exec, exit, fork etc of any process)
    kqueue_drain
        KQ_LOCK(kq) << Acquired the lock			              KQ_LOCK(kq) << sleep and loop in __mtx_lock_sleep 
        ….
        KQ_UNLOCK(kq)
	kqueue_destory
	    mtx_destroy(&kq->kq_lock);
           		  set MTX_UNOWNED|MTX_CONTESTED
										__mtx_lock_sleep()
										Panic because no owner and (MTX_UNOWNED|MTX_CONTESTED) are set
free(kq, M_KQUEUE)


Process X is listening to NOTE_EXIT|NOTE_EXEC|NOTE_TRACK|NOTE_TRACKERR on all the running processes. When process X dies, kernel will close kqueue descriptor.  The kq_lock is therefore destroyed - MTX_UNOWNED|MTX_CONTESTED. Due to the above race, the other thread that is trying to deliver a knote() to process X could panic in __mtx_lock_sleep() because it finds that lock is not exclusively MTX_UNOWNED and tries to deref the owner of the lock (which is NULL). Other api’s like knote_fork() could also run into this problem.


<2>fault virtual address        = 0x3b0
<2>fault code           = supervisor read data, page not present
<2>instruction pointer  = 0x20:0xffffffff804169f6
<2>stack pointer                = 0x28:0xfffffe011f1db9a0
<2>frame pointer                = 0x28:0xfffffe011f1dba10
<2>code segment         = base rx0, limit 0xfffff, type 0x1b
<2>                     = DPL 0, pres 1, long 1, def32 0, gran 1
<2>processor eflags     = interrupt enabled, resume, IOPL = 0
<2>current process              = 9199 (rcp)
<2>trap number          = 12
<2>panic: page fault

(kgdb) bt
#0  __curthread () at ./machine/pcpu.h:221
#1  doadump (textdump=1) at /.amd/svl-engdata1vs1/occamdev/build/freebsd/stable_11/20180413.165755_fbsd-builder_stable_11.0.dc8ec62/src/sys/kern/kern_shutdown.c:313
#2  0xffffffff8042b93f in kern_reboot (howto=260)
    at /.amd/svl-engdata1vs1/occamdev/build/freebsd/stable_11/20180413.165755_fbsd-builder_stable_11.0.dc8ec62/src/sys/kern/kern_shutdown.c:381
#3  0xffffffff8042be9f in vpanic (fmt=<optimized out>, ap=<optimized out>)
    at /.amd/svl-engdata1vs1/occamdev/build/freebsd/stable_11/20180413.165755_fbsd-builder_stable_11.0.dc8ec62/src/sys/kern/kern_shutdown.c:792
#4  0xffffffff8042bee3 in panic (fmt=<unavailable>)
    at /.amd/svl-engdata1vs1/occamdev/build/freebsd/stable_11/20180413.165755_fbsd-builder_stable_11.0.dc8ec62/src/sys/kern/kern_shutdown.c:705
#5  0xffffffff80572b51 in trap_fatal (frame=<optimized out>, eva=<optimized out>)
    at /.amd/svl-engdata1vs1/occamdev/build/freebsd/stable_11/20180413.165755_fbsd-builder_stable_11.0.dc8ec62/src/sys/amd64/amd64/trap.c:841
#6  0xffffffff80572d44 in trap_pfault (frame=0xfffffe011f1db8f0, usermode=<optimized out>)
    at /.amd/svl-engdata1vs1/occamdev/build/freebsd/stable_11/20180413.165755_fbsd-builder_stable_11.0.dc8ec62/src/sys/amd64/amd64/trap.c:691
#7  0xffffffff805724dc in trap (frame=0xfffffe011f1db8f0)
    at /.amd/svl-engdata1vs1/occamdev/build/freebsd/stable_11/20180413.165755_fbsd-builder_stable_11.0.dc8ec62/src/sys/amd64/amd64/trap.c:442
#8  0xffffffff8055a661 in calltrap ()
    at /.amd/svl-engdata1vs1/occamdev/build/freebsd/stable_11/20180413.165755_fbsd-builder_stable_11.0.dc8ec62/src/sys/amd64/amd64/exception.S:236
#9  0xffffffff804169f6 in __mtx_lock_sleep (c=0xfffff8006ec00d18, tid=18446735282413282656, opts=0, file=0x0, line=96)
    at /.amd/svl-engdata1vs1/occamdev/build/freebsd/stable_11/20180413.165755_fbsd-builder_stable_11.0.dc8ec62/src/sys/kern/kern_mutex.c:435
#10 0xffffffff803f5175 in knote (list=0xfffff80089f6c5c0, hint=2147483648, lockflags=<optimized out>)
    at /.amd/svl-engdata1vs1/occamdev/build/freebsd/stable_11/20180413.165755_fbsd-builder_stable_11.0.dc8ec62/src/sys/kern/kern_event.c:2047
#11 0xffffffff803fa46a in exit1 (td=0xfffff8011de8a560, rval=<optimized out>, signo=0)
    at /.amd/svl-engdata1vs1/occamdev/build/freebsd/stable_11/20180413.165755_fbsd-builder_stable_11.0.dc8ec62/src/sys/kern/kern_exit.c:515
#12 0xffffffff803f9b7d in sys_sys_exit (td=0xfffff8006ec00d00, uap=<optimized out>)
    at /.amd/svl-engdata1vs1/occamdev/build/freebsd/stable_11/20180413.165755_fbsd-builder_stable_11.0.dc8ec62/src/sys/kern/kern_exit.c:178
#13 0xffffffff8058297e in syscallenter (td=0xfffff8011de8a560, sa=<optimized out>)
    at /.amd/svl-engdata1vs1/occamdev/build/freebsd/stable_11/20180413.165755_fbsd-builder_stable_11.0.dc8ec62/src/sys/amd64/ia32/../../kern/subr_syscall.c:146
#14 ia32_syscall (frame=0xfffffe011f1dbc00)
    at /.amd/svl-engdata1vs1/occamdev/build/freebsd/stable_11/20180413.165755_fbsd-builder_stable_11.0.dc8ec62/src/sys/amd64/ia32/ia32_syscall.c:187
#15 0xffffffff8055ac45 in Xint0x80_syscall ()
    at /.amd/svl-engdata1vs1/occamdev/build/freebsd/stable_11/20180413.165755_fbsd-builder_stable_11.0.dc8ec62/src/sys/amd64/ia32/ia32_exception.S:73
fr #16 0x00000000c83791bb in ?? ()

(kgdb) fr 9
#9  0xffffffff804169f6 in __mtx_lock_sleep (c=0xfffff8006ec00d18, tid=18446735282413282656, opts=0, file=0x0, line=96)
    at /.amd/svl-engdata1vs1/occamdev/build/freebsd/stable_11/20180413.165755_fbsd-builder_stable_11.0.dc8ec62/src/sys/kern/kern_mutex.c:435
 433                 v = m->mtx_lock;
 434                 if (v != MTX_UNOWNED) {
 435                         owner = (struct thread *)(v & ~MTX_FLAGMASK); 
 436                         if (TD_IS_RUNNING(owner)) { <==== Owner is NULL. Trying to access offset 0x3b0 resulting in page fault

#define TD_IS_RUNNING(td)       ((td)->td_state == TDS_RUNNING)

(kgdb)  p &(*(struct thread*)0)->td_state
$7 = (enum {...} *) 0x3b0

(kgdb) p c
$2 = (volatile uintptr_t *) 0xfffff8006ec00d18
(kgdb) pt struct mtx
type = struct mtx {
    struct lock_object lock_object;
    volatile uintptr_t mtx_lock;
}
(kgdb) p &(*(struct mtx*)0)->mtx_lock
$4 = (volatile uintptr_t *) 0x18
(kgdb) p *(struct mtx*)(0xfffff8006ec00d18-0x18)
$6 = {
  lock_object = {
    lo_name = 0xffffffff805fdf06 "kqueue", 
    lo_flags = 21102592, 
    lo_data = 0, 
    lo_witness = 0x0
  }, 
  mtx_lock = 6 <=== MTX_UNOWNED|MTX_CONTESTED
}

===> from “info threads”
  40   Thread 100237 (PID=5863: pmond) sched_switch (td=0xfffff8006e2a5560, newtd=<optimized out>, flags=<optimized out>)
    at /.amd/svl-engdata1vs1/occamdev/build/freebsd/stable_11/20180413.165755_fbsd-builder_stable_11.0.dc8ec62/src/sys/kern/sched_ule.c:1979

 (kgdb)   thread 40
[Switching to thread 40 (Thread 100237)]
#0  sched_switch (td=0xfffff8006e2a5560, newtd=<optimized out>, flags=<optimized out>)
    at /.amd/svl-engdata1vs1/occamdev/build/freebsd/stable_11/20180413.165755_fbsd-builder_stable_11.0.dc8ec62/src/sys/kern/sched_ule.c:1979
1979    /.amd/svl-engdata1vs1/occamdev/build/freebsd/stable_11/20180413.165755_fbsd-builder_stable_11.0.dc8ec62/src/sys/kern/sched_ule.c: No such file or directory.
(kgdb) bt
#0  sched_switch (td=0xfffff8006e2a5560, newtd=<optimized out>, flags=<optimized out>)
    at /.amd/svl-engdata1vs1/occamdev/build/freebsd/stable_11/20180413.165755_fbsd-builder_stable_11.0.dc8ec62/src/sys/kern/sched_ule.c:1979
#1  0x0000000000000000 in ?? ()
(kgdb) p td->td_proc->p_satete
There is no member named p_satete.
(kgdb) p td->td_proc->p_state
$13 = PRS_ZOMBIE
(kgdb)
Comment 1 commit-hook freebsd_committer freebsd_triage 2018-11-20 09:31:21 UTC
A commit references this bug:

Author: pho
Date: Tue Nov 20 09:30:26 UTC 2018
New revision: 340665
URL: https://svnweb.freebsd.org/changeset/base/340665

Log:
  Added a kevent(2) test.

  PR:		228858
  Sponsored by:	Dell EMC Isilon

Changes:
  user/pho/stress2/misc/kevent12.sh
Comment 2 Mark Johnston freebsd_committer freebsd_triage 2018-11-23 21:22:59 UTC
The process closing the kqueue should drain all of the kqueue's notes; this happens in kqueue_drain().  Unfortunately, we have a few races:
- kqueue_register() doesn't check for KQ_CLOSING, so it may add knotes to the fdtable or hash table after we've started draining.  This can be triggered by knote_fork().
- The locking in knote_fork() is wrong: at the end of the loop we should be acquiring the list lock before the kqueue lock.  Otherwise there's a window where the knote is unlocked and not in flux, and thus may be freed.  To fix this I believe it's sufficient to just reorder the locking; the list lock comes before the kqueue lock in the lock order.  The in-flux state of the knote is sufficient to prevent it from being removed, I believe, so we don't need a marker knote to hold our place in the list.
Comment 3 commit-hook freebsd_committer freebsd_triage 2018-11-24 16:42:27 UTC
A commit references this bug:

Author: markj
Date: Sat Nov 24 16:41:29 UTC 2018
New revision: 340897
URL: https://svnweb.freebsd.org/changeset/base/340897

Log:
  Lock the knlist before releasing the in-flux state in knote_fork().

  Otherwise there is a window, before iteration is resumed, during which
  the knote may be freed.  The in-flux state ensures that the knote will
  not be removed from the knlist while locks are dropped.

  PR:		228858
  Reviewed by:	kib
  Tested by:	pho
  MFC after:	3 days
  Sponsored by:	The FreeBSD Foundation
  Differential Revision:	https://reviews.freebsd.org/D18316

Changes:
  head/sys/kern/kern_event.c
Comment 4 commit-hook freebsd_committer freebsd_triage 2018-11-24 16:58:41 UTC
A commit references this bug:

Author: markj
Date: Sat Nov 24 16:58:34 UTC 2018
New revision: 340898
URL: https://svnweb.freebsd.org/changeset/base/340898

Log:
  Ensure that knotes do not get registered when KQ_CLOSING is set.

  KQ_CLOSING is set before draining the knotes associated with a kqueue,
  so we must ensure that new knotes are not added after that point.  In
  particular, some kernel facilities may register for events on behalf
  of a userspace process and race with a close of the kqueue.

  PR:		228858
  Reviewed by:	kib
  Tested by:	pho
  MFC after:	3 days
  Sponsored by:	The FreeBSD Foundation
  Differential Revision:	https://reviews.freebsd.org/D18316

Changes:
  head/sys/kern/kern_event.c
Comment 5 commit-hook freebsd_committer freebsd_triage 2018-11-27 16:58:10 UTC
A commit references this bug:

Author: markj
Date: Tue Nov 27 16:57:59 UTC 2018
New revision: 341077
URL: https://svnweb.freebsd.org/changeset/base/341077

Log:
  MFC r340897:
  Lock the knlist before releasing the in-flux state in knote_fork().

  PR:	228858

Changes:
_U  stable/12/
  stable/12/sys/kern/kern_event.c
Comment 6 commit-hook freebsd_committer freebsd_triage 2018-11-27 17:00:14 UTC
A commit references this bug:

Author: markj
Date: Tue Nov 27 16:59:13 UTC 2018
New revision: 341078
URL: https://svnweb.freebsd.org/changeset/base/341078

Log:
  MFC r340897:
  Lock the knlist before releasing the in-flux state in knote_fork().

  PR:	228858

Changes:
_U  stable/11/
  stable/11/sys/kern/kern_event.c
Comment 7 commit-hook freebsd_committer freebsd_triage 2018-11-27 17:08:22 UTC
A commit references this bug:

Author: markj
Date: Tue Nov 27 17:08:08 UTC 2018
New revision: 341082
URL: https://svnweb.freebsd.org/changeset/base/341082

Log:
  MFC r340898:
  Ensure that knotes do not get registered when KQ_CLOSING is set.

  PR:	228858

Changes:
_U  stable/12/
  stable/12/sys/kern/kern_event.c
Comment 8 commit-hook freebsd_committer freebsd_triage 2018-11-27 17:10:25 UTC
A commit references this bug:

Author: markj
Date: Tue Nov 27 17:10:01 UTC 2018
New revision: 341083
URL: https://svnweb.freebsd.org/changeset/base/341083

Log:
  MFC r340898:
  Ensure that knotes do not get registered when KQ_CLOSING is set.

  PR:	228858

Changes:
_U  stable/11/
  stable/11/sys/kern/kern_event.c
Comment 9 commit-hook freebsd_committer freebsd_triage 2018-11-28 17:41:10 UTC
A commit references this bug:

Author: markj
Date: Wed Nov 28 17:40:09 UTC 2018
New revision: 341157
URL: https://svnweb.freebsd.org/changeset/base/341157

Log:
  MFstable/12 r341077:
  Lock the knlist before releasing the in-flux state in knote_fork().

  PR:		228858
  Approved by:	re (gjb)

Changes:
_U  releng/12.0/
  releng/12.0/sys/kern/kern_event.c
Comment 10 commit-hook freebsd_committer freebsd_triage 2018-11-28 18:06:31 UTC
A commit references this bug:

Author: markj
Date: Wed Nov 28 18:06:17 UTC 2018
New revision: 341159
URL: https://svnweb.freebsd.org/changeset/base/341159

Log:
  MFstable/12 r341082:
  Ensure that knotes do not get registered when KQ_CLOSING is set.

  PR:		228858
  Approved by:	re (gjb)

Changes:
_U  releng/12.0/
  releng/12.0/sys/kern/kern_event.c