There is a bug in the kevent EVFILT_PROC handling, possibly introduced by a mod in 2006: http://svnweb.freebsd.org/base/head/sys/kern/kern_event.c?r1=164450&r2=164451 The scenario is a process that spawns a bunch of other processes and uses the kevent EVFILT_PROC, NOTE_TRACK facility to keep tabs on them, maintaining a process history including process parent/child relationships and whether the processes are running or have exited. Consider a process that does a fork() and then the child attempts to exec() a non-existent program and does an exit(127) on the exec() failure. This can result in a single kevent that returns both the NOTE_CHILD and NOTE_EXIT fflags. Unfortunately, both of these fflags are defined to return something in kevent.data (the parent pid (ppid) for NOTE_CHILD and the exit status for NOTE_EXIT). Obviously, kevent.data can't contain both pieces of information. In fact, what gets returned is the exit status, so when the receiving code tries to interpret the NOTE_CHILD it appears that the ppid is 32512 (127 << 8), which cam really throws off the process tracking. Before the mod mentioned above the exit status was not returned, so the returned ppid for the NOTE_CHILD would have been correct (sys/kern/kern_event.c, file_procattach(), about line 368 in head), although the returned ppid would not make much sense as an exit status for the NOTE_EXIT. What I think should happen is that when filt_proc() (about kern_event.c line 433) is going to set the exit status, it should first check if NOTE_CHILD is already set in the knote and, if so, allocate a new knote for the NOTE_EXIT and queue it after the existing NOTE_CHILD knote (some adjustment would need to be made around line 424, too, since that is where the NOTE_EXIT was set). This would guarantee that the NOTE_CHILD was received before the NOTE_EXIT and that the appropriate piece of data could accompany each NOTE_. I don't know if there is a significant concern about allocating a new knote at that point. If that were the case and the ideal behavior I described were not possible, I would think that it would be more appropriate to return the ppid in the knote and not set the kn->kn_data field to the exit status iff the NOTE_CHILD fflag was set, thereby giving it precedence. That would at least work much better for my particular situation! If you receive a kevent with both NOTE_CHILD and NOTE_EXIT set, you might be able to presume that the child probably failed; in any case you would know for sure that it had exited and what process was its parent. I've exchanged email with jhb on the problem and he indicated that it might be a while before he could get to it. I'll take a shot at it myself, but it will probably be at least a couple weeks before I can do so. I wanted to file this PR so that it was out there in case someone else might be able to get to it sooner and also so that it isn't forgotten. Fix: jhb suggested: "Hmmm, this might be fixable by adding a f_touch method to the EVFILT_PROC handling and having it notice the two states and break them up." How-To-Repeat: This is a timing related thing, but doing a fork() and then exiting immediately in the child is likely to show the problem fairly often.
Hi, In PR 180385, you wrote about problems with EVFILT_PROC/NOTE_TRACK. I tried to use this too (for system service management, pid 1 or close to it) but ran across a problem already while reading the man page: NOTE_TRACKERR. If NOTE_TRACKERR happens, all the tracking breaks down. How do you deal with this? NOTE_CHILD|NOTE_EXIT knotes can be seen easily by suspending the process so it does not invoke kevent() for a long time. If EVFILT_PROC is not supposed to keep zombies alive (pretty much mandatory, otherwise any user can prevent any other user from freeing up proc slots by reaping zombies), then limiting the number of knotes that do not correspond to a live kernel object implies that NOTE_TRACKERR must be possible. Removing NOTE_TRACKERR would require discarding NOTE_CHILD|NOTE_EXIT knotes when the zombie is reaped. Even then, guaranteeing no NOTE_TRACKERR may cause fork() to fail sooner than without active kqueues. The extra restrictions can be compared to how waitpid() accepts (discards) the instance of the SIGCHLD signal for the waited process, ensuring that every terminating process generates a SIGCHLD signal while bounding memory usage for undelivered signals. Splitting NOTE_CHILD|NOTE_EXIT into two knotes either requires allocating two knotes at fork() time (making NOTE_TRACKERR slightly more likely and possibly wasting some memory) or allowing the split to fail. A partial workaround for your issue may be the udata field. I think (untested) that the udata field is copied upon NOTE_CHILD. This does not allow tracking the full parent-child relationships, but does allow tracking all descendants of a particular process (except if NOTE_TRACKERR happens). -- Jilles Tjoelker
https://reviews.freebsd.org/D4900
A commit references this bug: Author: vangyzen Date: Thu Jan 28 20:24:15 UTC 2016 New revision: 295012 URL: https://svnweb.freebsd.org/changeset/base/295012 Log: kqueue EVFILT_PROC: avoid collision between NOTE_CHILD and NOTE_EXIT NOTE_CHILD and NOTE_EXIT return something in kevent.data: the parent pid (ppid) for NOTE_CHILD and the exit status for NOTE_EXIT. Do not let the two events be combined, since one would overwrite the other's data. PR: 180385 Submitted by: David A. Bright <david_a_bright@dell.com> Reviewed by: jhb MFC after: 1 month Sponsored by: Dell Inc. Differential Revision: https://reviews.freebsd.org/D4900 Changes: head/sys/kern/kern_event.c head/sys/sys/event.h head/tests/sys/kqueue/common.h head/tests/sys/kqueue/main.c head/tests/sys/kqueue/proc.c
A commit references this bug: Author: vangyzen Date: Thu Apr 14 17:14:12 UTC 2016 New revision: 297977 URL: https://svnweb.freebsd.org/changeset/base/297977 Log: MFC r295012 kqueue EVFILT_PROC: avoid collision between NOTE_CHILD and NOTE_EXIT NOTE_CHILD and NOTE_EXIT return something in kevent.data: the parent pid (ppid) for NOTE_CHILD and the exit status for NOTE_EXIT. Do not let the two events be combined, since one would overwrite the other's data. PR: 180385 Submitted by: David A. Bright <david_a_bright@dell.com> Sponsored by: Dell Inc. Changes: _U stable/10/ stable/10/sys/kern/kern_event.c stable/10/sys/sys/event.h stable/10/tests/sys/kqueue/common.h stable/10/tests/sys/kqueue/main.c stable/10/tests/sys/kqueue/proc.c