Bug 180385 - [kqueue] Conflict between EVFILT_PROC NOTE_CHILD and NOTE_EXIT use of data field
Summary: [kqueue] Conflict between EVFILT_PROC NOTE_CHILD and NOTE_EXIT use of data field
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 9.1-RELEASE
Hardware: Any Any
: Normal Affects Only Me
Assignee: Eric van Gyzen
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-07-08 15:50 UTC by David Bright
Modified: 2016-04-14 17:15 UTC (History)
1 user (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description David Bright freebsd_committer freebsd_triage 2013-07-08 15:50:00 UTC
There is a bug in the kevent EVFILT_PROC handling, possibly introduced by a mod in 2006:

http://svnweb.freebsd.org/base/head/sys/kern/kern_event.c?r1=164450&r2=164451

The scenario is a process that spawns a bunch of other processes and uses the kevent EVFILT_PROC, NOTE_TRACK facility to keep tabs on them, maintaining a process history including process parent/child relationships and whether the processes are running or have exited. Consider a process that does a fork() and then the child attempts to exec() a non-existent program and does an exit(127) on the exec() failure. This can result in a single kevent that returns both the NOTE_CHILD and NOTE_EXIT fflags.

Unfortunately, both of these fflags are defined to return something in kevent.data (the parent pid (ppid) for NOTE_CHILD and the exit status for NOTE_EXIT). Obviously, kevent.data can't contain both pieces of information. In fact, what gets returned is the exit status, so when the receiving code tries to interpret the NOTE_CHILD it appears that the ppid is 32512 (127 << 8), which cam really throws off the process tracking.

Before the mod mentioned above the exit status was not returned, so the returned ppid for the NOTE_CHILD would have been correct (sys/kern/kern_event.c, file_procattach(), about line 368 in head), although the returned ppid would not make much sense as an exit status for the NOTE_EXIT.

What I think should happen is that when filt_proc() (about kern_event.c line 433) is going to set the exit status, it should first check if NOTE_CHILD is already set in the knote and, if so, allocate a new knote for the NOTE_EXIT and queue it after the existing NOTE_CHILD knote (some adjustment would need to be made around line 424, too, since that is where the NOTE_EXIT was set). This would guarantee that the NOTE_CHILD was received before the NOTE_EXIT and that the appropriate piece of data could accompany each NOTE_.

I don't know if there is a significant concern about allocating a new knote at that point. If that were the case and the ideal behavior I described were not possible, I would think that it would be more appropriate to return the ppid in the knote and not set the kn->kn_data field to the exit status iff the NOTE_CHILD fflag was set, thereby giving it precedence. That would at least work much better for my particular situation!  If you receive a kevent with both NOTE_CHILD and NOTE_EXIT set, you might be able to presume that the child probably failed; in any case you would know for sure that it had exited and what
process was its parent.

I've exchanged email with jhb on the problem and he indicated that it might be a while before he could get to it. I'll take a shot at it myself, but it will probably be at least a couple weeks before I can do so. I wanted to file this PR so that it was out there in case someone else might be able to get to it sooner and also so that it isn't forgotten.

Fix: 

jhb suggested:

"Hmmm, this might be fixable by adding a f_touch method to the EVFILT_PROC
handling and having it notice the two states and break them up."
How-To-Repeat: This is a timing related thing, but doing a fork() and then exiting immediately in the child is likely to show the problem fairly often.
Comment 1 Jilles Tjoelker freebsd_committer freebsd_triage 2013-07-10 23:24:21 UTC
Hi,

In PR 180385, you wrote about problems with EVFILT_PROC/NOTE_TRACK. I
tried to use this too (for system service management, pid 1 or close to
it) but ran across a problem already while reading the man page:
NOTE_TRACKERR. If NOTE_TRACKERR happens, all the tracking breaks down.
How do you deal with this?

NOTE_CHILD|NOTE_EXIT knotes can be seen easily by suspending the process
so it does not invoke kevent() for a long time.

If EVFILT_PROC is not supposed to keep zombies alive (pretty much
mandatory, otherwise any user can prevent any other user from freeing up
proc slots by reaping zombies), then limiting the number of knotes that
do not correspond to a live kernel object implies that NOTE_TRACKERR
must be possible. Removing NOTE_TRACKERR would require discarding
NOTE_CHILD|NOTE_EXIT knotes when the zombie is reaped. Even then,
guaranteeing no NOTE_TRACKERR may cause fork() to fail sooner than
without active kqueues.

The extra restrictions can be compared to how waitpid() accepts
(discards) the instance of the SIGCHLD signal for the waited process,
ensuring that every terminating process generates a SIGCHLD signal while
bounding memory usage for undelivered signals.

Splitting NOTE_CHILD|NOTE_EXIT into two knotes either requires
allocating two knotes at fork() time (making NOTE_TRACKERR slightly more
likely and possibly wasting some memory) or allowing the split to fail.

A partial workaround for your issue may be the udata field. I think
(untested) that the udata field is copied upon NOTE_CHILD. This does not
allow tracking the full parent-child relationships, but does allow
tracking all descendants of a particular process (except if
NOTE_TRACKERR happens).

-- 
Jilles Tjoelker
Comment 2 Eric van Gyzen freebsd_committer freebsd_triage 2016-01-12 19:42:46 UTC
https://reviews.freebsd.org/D4900
Comment 3 commit-hook freebsd_committer freebsd_triage 2016-01-28 20:24:57 UTC
A commit references this bug:

Author: vangyzen
Date: Thu Jan 28 20:24:15 UTC 2016
New revision: 295012
URL: https://svnweb.freebsd.org/changeset/base/295012

Log:
  kqueue EVFILT_PROC: avoid collision between NOTE_CHILD and NOTE_EXIT

  NOTE_CHILD and NOTE_EXIT return something in kevent.data: the parent
  pid (ppid) for NOTE_CHILD and the exit status for NOTE_EXIT.
  Do not let the two events be combined, since one would overwrite
  the other's data.

  PR:		180385
  Submitted by:	David A. Bright <david_a_bright@dell.com>
  Reviewed by:	jhb
  MFC after:	1 month
  Sponsored by:	Dell Inc.
  Differential Revision:	https://reviews.freebsd.org/D4900

Changes:
  head/sys/kern/kern_event.c
  head/sys/sys/event.h
  head/tests/sys/kqueue/common.h
  head/tests/sys/kqueue/main.c
  head/tests/sys/kqueue/proc.c
Comment 4 commit-hook freebsd_committer freebsd_triage 2016-04-14 17:14:45 UTC
A commit references this bug:

Author: vangyzen
Date: Thu Apr 14 17:14:12 UTC 2016
New revision: 297977
URL: https://svnweb.freebsd.org/changeset/base/297977

Log:
  MFC r295012

  kqueue EVFILT_PROC: avoid collision between NOTE_CHILD and NOTE_EXIT

  NOTE_CHILD and NOTE_EXIT return something in kevent.data: the parent
  pid (ppid) for NOTE_CHILD and the exit status for NOTE_EXIT.
  Do not let the two events be combined, since one would overwrite
  the other's data.

  PR:		180385
  Submitted by:	David A. Bright <david_a_bright@dell.com>
  Sponsored by:	Dell Inc.

Changes:
_U  stable/10/
  stable/10/sys/kern/kern_event.c
  stable/10/sys/sys/event.h
  stable/10/tests/sys/kqueue/common.h
  stable/10/tests/sys/kqueue/main.c
  stable/10/tests/sys/kqueue/proc.c