Bug 170346

Summary: Changes to support waitid() and related stuff
Product: Base System
Reporter: jau
Component: standards
Assignee: freebsd-standards (Nobody) <standards>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: 9.1-PRERELEASE   
Hardware: Any   
OS: Any   
Attachments (description, flags):
file.diff                 none
waitid-wait6.patch        none
wait6-waitid-9.0.patch    none
wait6-waitid-9.1.patch    none
wait6-waitid-9.1.patch    none

Description jau 2012-08-03 09:30:12 UTC
The attached patch adds waitid() to the C library.

It also brings in a new system call, wait6(), to support the functionality
needed by waitid(). The new wait6() is essentially an extended version of
wait4() with the first pid argument split into two separate arguments, idtype
and id, and a new siginfo_t pointer added to the end of the argument list.
The new setup also understands two new option flags, WEXITED and WTRAPPED.
The older wait*() functions always behaved as if these two flags were implicitly
set. That is still the case for the older wait*() entry points.
For the new waitid() and wait6(), at least one of WEXITED, WTRAPPED, WCONTINUED
or WSTOPPED (a.k.a. WUNTRACED) must be set for the call to make sense.
As a result, more detailed filtering of the processes to wait for is now
available, both through the options and through idtype flavours other than
the old PID or PGID.
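
As a rough illustration of the explicit-flag rule described above, here is a
minimal sketch that uses only the standard waitid() call. The helper name
wait_for_exit() is made up for this example and is not part of the patch.

```c
#define _XOPEN_SOURCE 700	/* ask for the POSIX.1-2008 interfaces */
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child that exits with `code`, then collect
 * it with waitid().  Returns the child's exit code, or -1 on any failure.
 */
static int
wait_for_exit(int code)
{
	siginfo_t info;
	pid_t pid;

	pid = fork();
	if (pid == -1)
		return (-1);
	if (pid == 0)
		_exit(code);		/* child: terminate immediately */

	/* Unlike wait4(), nothing is implied: WEXITED must be spelled out. */
	if (waitid(P_PID, (id_t)pid, &info, WEXITED) == -1)
		return (-1);
	if (info.si_pid != pid || info.si_code != CLD_EXITED)
		return (-1);
	return (info.si_status);	/* the exit code when si_code is CLD_EXITED */
}
```

Calling wait_for_exit(7) should hand back 7. Dropping WEXITED from the options
would leave the call with nothing to report for a child that only exits.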

Previously the treatment of WNOWAIT was faulty: it correctly avoided removing
the zombies, but it still cleared any signal state. Now the same signal
state can also be waited for again.
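
The WNOWAIT behaviour can be sketched from userland as follows. This example
only exercises the zombie case through standard waitid(). The same two-step
pattern would apply to stop states with WSTOPPED, which is the case the patch
is meant to repair. The helper name peek_then_reap() is an assumption made for
illustration.

```c
#define _XOPEN_SOURCE 700	/* ask for the POSIX.1-2008 interfaces */
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: look at a child's exit status twice.
 * Returns 1 if both calls reported the same child and exit code,
 * 0 if they disagreed, -1 on failure.
 */
static int
peek_then_reap(int code)
{
	siginfo_t peek, reap;
	pid_t pid;

	pid = fork();
	if (pid == -1)
		return (-1);
	if (pid == 0)
		_exit(code);

	/* First call: report the status but keep the child waitable. */
	if (waitid(P_PID, (id_t)pid, &peek, WEXITED | WNOWAIT) == -1)
		return (-1);
	/* Second call: the same status again; this time the zombie is reaped. */
	if (waitid(P_PID, (id_t)pid, &reap, WEXITED) == -1)
		return (-1);

	return (peek.si_pid == pid && reap.si_pid == pid &&
	    peek.si_status == code && reap.si_status == code);
}
```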

This patch also quite intentionally removes the restriction on getting the
rusage data only for zombies. Sometimes intermediate rusage snapshots for
stopped processes might be exactly what is needed.
Linux does not set any explicit limitations on getting rusage snapshots.
Solaris makes the reservation that only the time fields are useful.
In any case, the traditional interpretation seemed unreasonably limiting.
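
A sketch of what asking for such a snapshot might look like with the existing
wait4() interface. The helper name is hypothetical, and whether the rusage
structure contains anything but zeroes for a stopped child depends on the
kernel in use, so the sketch does not rely on its contents.

```c
#define _DEFAULT_SOURCE 1	/* glibc: expose wait4(); harmless elsewhere */
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: stop a child, then ask for its rusage together with
 * the stop notification.  Returns 1 if a stop was reported, 0 if some other
 * status came back, -1 on failure.
 */
static int
rusage_of_stopped_child(struct rusage *ru)
{
	pid_t pid;
	int status, stopped;

	memset(ru, 0, sizeof(*ru));	/* defined contents even if untouched */
	pid = fork();
	if (pid == -1)
		return (-1);
	if (pid == 0) {
		raise(SIGSTOP);		/* child: stop until killed */
		_exit(0);
	}

	/* WUNTRACED asks for stop notifications; ru is filled if supported. */
	if (wait4(pid, &status, WUNTRACED, ru) != pid)
		stopped = -1;
	else
		stopped = WIFSTOPPED(status) ? 1 : 0;

	/* Clean up: kill the stopped child and reap it. */
	kill(pid, SIGKILL);
	(void)waitpid(pid, &status, 0);
	return (stopped);
}
```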

Inside the kernel the old kern_wait() is still used everywhere it was used
before my changes. It is now a wrapper for a new kern_wait6(): it implicitly
sets the option flags WEXITED and WTRAPPED, converts the old wpid to either
idtype = P_PID or idtype = P_PGID, and passes NULL as the siginfo_t pointer
to kern_wait6().
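
The conversion described above can be approximated in userland as below. This
is a sketch and not the kernel code from the patch: struct wait_target and
both function names are invented for the example, and mapping wpid == -1 to
P_ALL is an assumption based on the wait4() rules.

```c
#define _XOPEN_SOURCE 700	/* ask for the POSIX.1-2008 interfaces */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical pair standing in for the (idtype, id) arguments. */
struct wait_target {
	idtype_t	idtype;
	id_t		id;
};

/*
 * Userland sketch of the wpid conversion, following the wait4() rules:
 *   wpid == -1  any child
 *   wpid ==  0  any child in the caller's process group
 *   wpid >   0  the child with that PID
 *   wpid <  -1  any child in process group -wpid
 */
static struct wait_target
wpid_to_target(pid_t wpid)
{
	struct wait_target t;

	if (wpid == -1) {
		t.idtype = P_ALL;
		t.id = 0;
	} else if (wpid == 0) {
		t.idtype = P_PGID;
		t.id = (id_t)getpgrp();
	} else if (wpid > 0) {
		t.idtype = P_PID;
		t.id = (id_t)wpid;
	} else {
		t.idtype = P_PGID;
		t.id = (id_t)-wpid;
	}
	return (t);
}

/* The old entry points behave as if these flags were always set. */
static int
legacy_wait_options(int options)
{
	options |= WEXITED;
#ifdef WTRAPPED			/* not available on every system */
	options |= WTRAPPED;
#endif
	return (options);
}
```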

If you decide to try this patch, please remember to run the following command

( cd /usr/src/sys/kern ; make sysent )

before attempting the build.
This patch changes the system call vector, and without the command above your
build is guaranteed to fail.

I also wish to remind you that this implementation exceeds standard requirements
and supports multiple idtype alternatives which are not really required by
any standard. OTOH, for PID and PGID alone there would be no actual need for
a separate idtype argument either.
Consider support for the alternate idtypes as enabling technology for
special purposes only, keeping in mind that using non-standard features may
cause portability problems.
You can use P_UID and P_GID for selecting target processes based on their
effective UID or effective GID. This might be handy when a parent process
starts SUID or SGID binaries.
When a child might start a session of its own, waiting using P_SID could be
useful. (The SID can only be equal to the child's own PID or equal to the
parent's SID.)
P_ZONEID (Solaris terminology) tries to facilitate waiting for child processes
started in a certain jail.

Waiting on a certain CPU set ID (P_PSETID) might also sometimes be handy,
but for the moment that info lives only in the thread structures, which seem to
be dropped before a terminated process becomes waitable.
Similarly the scheduling priority class (P_CID) might sometimes be a useful
tool for filtering the processes to wait for. It seems quite plausible that a
parent process might only wish to know e.g. about real-time processes which have
stopped. As with the CPU set ID, the scheduling class info lives for the time
being only inside the threads, which causes it to be lost before it is possible
to wait for the zombie.

Fix: Apply the attached patch.


Patch attached with submission follows:
How-To-Repeat: No problem!
Just extended functionality and improved support for standards.
Comment 1 pluknet 2012-08-03 09:52:39 UTC
Hi.
Shouldn't the idtype_t definition be cut from sys/types.h and instead moved
into sys/wait.h itself? POSIX specifies for waitid() that you only need to
include sys/wait.h and doesn't mention sys/types.h.
It probably also makes sense to wrap the new syscall with __POSIX_VISIBLE.
Something like this:

Index: sys/sys/wait.h
===================================================================
--- sys/sys/wait.h      (revision 237379)
+++ sys/sys/wait.h      (working copy)
@@ -93,8 +93,18 @@
 #define        WAIT_MYPGRP     0       /* any process in my process group */
 #endif /* __BSD_VISIBLE */

+#if __POSIX_VISIBLE >= 200809
+/* needs for waitid() */
+typedef enum {
+       P_ALL,
+       P_PGID,
+       P_PID
+/* other stuff goes here */
+} idtype_t;
+#endif
+
 #ifndef _KERNEL
 #include <sys/types.h>
+#include <sys/signal.h>

 __BEGIN_DECLS
 pid_t  wait(int *);
@@ -104,6 +114,9 @@
 pid_t  wait3(int *, int, struct rusage *);
 pid_t  wait4(pid_t, int *, int, struct rusage *);
 #endif
+#if __POSIX_VISIBLE >= 200809
+int    waitid(idtype_t, id_t, siginfo_t *, int);
+#endif
 __END_DECLS
 #endif /* !_KERNEL */

The remaining part of the sys/wait.h patch is intentionally omitted for clarity.

-- 
wbr,
pluknet
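
The header-placement question raised in Comment 1 can be checked with a small
compile probe along these lines. It is only a sketch: the function name
poll_any_child() is hypothetical, and the probe assumes a system whose
<sys/wait.h> already follows the POSIX rule that this header alone is enough
for waitid().

```c
#define _XOPEN_SOURCE 700	/* ask for the POSIX.1-2008 interfaces */
#include <sys/wait.h>	/* deliberately the only header included */

/*
 * Hypothetical probe: non-blocking check for any exited child.  It only
 * compiles if <sys/wait.h> alone supplies idtype_t, id_t, siginfo_t,
 * the P_* names and the waitid() prototype.
 */
static int
poll_any_child(siginfo_t *info)
{
	idtype_t how = P_ALL;
	id_t who = 0;

	return (waitid(how, who, info, WEXITED | WNOHANG));
}
```

If idtype_t were declared only in <sys/types.h> and <sys/wait.h> did not pull
it in, the probe would fail to compile.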
Comment 2 jau 2012-08-03 11:11:00 UTC
Quoting Sergey Kandaurov:
> 
> Hi.
> Shouldn't the idtype_t definition be cut from sys/types.h and instead moved
> into sys/wait.h itself? POSIX specifies for waitid() that you only need to
> include sys/wait.h and doesn't mention sys/types.h.
> It's also probably makes sense to wrap new syscall with __POSIX_VISIBLE
> Something like this:
> 
> Index: sys/sys/wait.h
> ===================================================================
> --- sys/sys/wait.h      (revision 237379)
> +++ sys/sys/wait.h      (working copy)
> @@ -93,8 +93,18 @@
>  #define        WAIT_MYPGRP     0       /* any process in my process group */
>  #endif /* __BSD_VISIBLE */
> 
> +#if __POSIX_VISIBLE >= 200809
> +/* needs for waitid() */
> +typedef enum {
> +       P_ALL,
> +       P_PGID,
> +       P_PID
> +/* other stuff goes here */
> +} idtype_t;
> +#endif
> +
>  #ifndef _KERNEL
>  #include <sys/types.h>
> +#include <sys/signal.h>
> 
>  __BEGIN_DECLS
>  pid_t  wait(int *);
> @@ -104,6 +114,9 @@
>  pid_t  wait3(int *, int, struct rusage *);
>  pid_t  wait4(pid_t, int *, int, struct rusage *);
>  #endif
> +#if __POSIX_VISIBLE >= 200809
> +int    waitid(idtype_t, id_t, siginfo_t *, int);
> +#endif
>  __END_DECLS
>  #endif /* !_KERNEL */
> 
> The remain part of sys/wait.h patch intentionally omitted for clarity.
> 
> -- 
> wbr,
> pluknet
> 

Since the new wait6() also requires idtype_t and the P_xxx stuff, I am
not quite sure how the macro framing will fare with the kernel.
I just ran out of steam before I got that far. ;-)
If you are convinced that framing the idtype_t declaration within
__POSIX_VISIBLE does not harm compilation of kern_exit.c, go ahead.

My decision to put the idtype_t type declaration where it is in my
patch was prompted by the idea that the type and its values might
be later on needed also elsewhere, not only with the wait*()
stuff.
As long as idtype_t is not used anywhere else but with the wait*()
implementation, it should be technically quite possible to put the
stuff in <sys/wait.h>. That would carry the risk, though, that later
the definition would have to be moved again to a more generic place
to be feasible to use with some other code.
Since idtype_t is a SUS/POSIX feature, I would not be surprised to
see it used with other standard APIs as well.
I can only say that it is hard to predict anything, especially the
future. ;-)
And, if my memory serves me right, I guess Solaris has put the
idtype_t declaration in <sys/types.h> and <wait.h> just includes
all sorts of necessary stuff.
Because idtype_t carries with it a whole lot of P_xxx names, it might
be beneficial to have those somewhere such that any potential conflict
with the P_zzz macros in <sys/proc.h> becomes quickly visible.
I have no objection to moving things around as such.
I just estimated the benefits and the drawbacks embedded in the
selection of the location one way. Others may think differently.

	Cheers,
		// jau
.---  ..-  -.-  -.-  .-    .-  .-.-.-    ..-  -.-  -.-  ---  -.  .  -.
  /    Jukka A. Ukkonen,                             Oxit Ltd, Finland
 /__   M.Sc. (sw-eng & cs)                    (Phone) +358-500-606-671
   /   Internet: Jukka.Ukkonen(a)Oxit.Fi
  /    Internet: jau(a)iki.fi
 v
        .---  .-  ..-  ...-.-  ..  -.-  ..  .-.-.-  ..-.  ..
+ + + + My opinions are mine and mine alone, not my employers. + + + +
Comment 3 jau 2012-08-04 05:51:29 UTC
Oops! Sorry!
It seems I forgot the actual waitid() function from the patch.
Here is the missing part.

--jau

Comment 4 jau 2012-08-04 07:14:21 UTC
These links might shed some more light on the optimal
placement of the idtype_t definition.

http://www.unix.com/man-page/OpenSolaris/2/getacct/

http://www.unix.com/man-page/OpenSolaris/2/sigsend/

Both of these manual pages apply to Solaris-11. The same
features are apparently also in HP-UX, but they are not
SUS/POSIX - at least not yet.
Especially the sigsend() style functionality, as a generalized
kill(), seems to me like a potential candidate to make it
into SUS/POSIX as well. Such functionality would be a natural
companion to wait*() with generalized idtype_t and id_t.

And in any case it seems likely that id_t and idtype_t
will be used together almost anywhere one is
needed. Neither of them makes a whole lot of sense
without the other. So, keeping them together in the
header files may be beneficial.

Just my 0,02 EUR worth.
--jau
Comment 5 Jilles Tjoelker freebsd_committer freebsd_triage 2012-08-04 23:42:33 UTC
On Fri, Aug 03, 2012 at 08:25:20AM +0000, Jukka A. Ukkonen wrote:
> >Number:         170346
> >Category:       standards
> >Synopsis:       Changes to support waitid() and related stuff
> >Confidential:   no
> >Severity:       non-critical
> >Priority:       low
> >Responsible:    freebsd-standards
> >State:          open
> >Quarter:        
> >Keywords:       
> >Date-Required:
> >Class:          change-request
> >Submitter-Id:   current-users
> >Arrival-Date:   Fri Aug 03 08:30:12 UTC 2012
> >Closed-Date:
> >Last-Modified:
> >Originator:     Jukka A. Ukkonen
> >Release:        FreeBSD 9.1-PRERELEASE
> >Organization:
> -----
> >Environment:
> FreeBSD sleipnir 9.1-PRERELEASE FreeBSD 9.1-PRERELEASE #1: Tue Jul 31 15:39:12 EEST 2012     root@sleipnir:/usr/obj/usr/src/sys/Sleipnir  amd64
> >Description:
> The attached patch adds waitid() to the C library.

Why do you need this (other than this being in POSIX.1-2008)? Years ago,
I searched for applications of waitid() and did not really find
anything. I am curious what the use of this function is.

> It also brings in a new system call wait6() to support the
> functionality needed by waitid(). The new wait6() is actually an
> extended version of wait4() with the first pid argument split into two
> separate arguments idtype and id, and a new siginfo_t pointer added to
> the end of the argument list. The new setup understands also two new
> options flags WEXITED and WTRAPPED. The older wait*() functions always
> behaved as if these two flags were implicitly set. That is still the
> case for the older wait*() entry points. For the new waitid() and
> wait6() at least one of WEXITED, WTRAPPED, WCONTINUED or WSTOPPED
> (a.k.a. WUNTRACED) must be set for the call to make sense. So, as a
> result now more detailed filtering of the processes to wait for is
> available in both options and using also other idtype flavours than
> the old PID or PGID.

> Previously the treatment of WNOWAIT was faulty because it avoided removing
> the zombies just fine, but it removed any signal state. Now the same signal
> state can also be waited again.

I suppose that makes sense.

> This patch also quite intentionally removes the restriction on getting
> the rusage data only for zombies. Sometimes intermediate rusage
> snapshots for stopped processes might exactly what is needed. Linux
> does not set any explicit limitations for getting rusage snapshots.
> Solaris makes the reservation that only the time fields are useful.
> Anyhow the traditional interpretation seemed unreasonably limiting.

This seems useful. The restriction is likely for historical code reasons
and can be (partly) circumvented by using kern.proc.* sysctls already.

> Inside the kernel the old kern_wait() is still used anywhere where it
> was before my changes. Now it is a wrapper for a new kern_wait6() and
> implicitly sets option flags WEXITED and WTRAPPED, converts the old
> wpid to either idtype = P_PID or idtype = P_PGID, and passes NULL as
> the siginfo_t pointer to kern_wait6().

> If you decide to try this patch, please, remember to run the following
> command

> ( cd /usr/src/sys/kern ; make sysent )

> before attempting the build.
> This patch changes the system call vector, and without the command
> above your build is guaranteed to fail.

> I also wish to remind that this implementation exceeds standard
> requirements and supports multiple idtype alternatives which are not
> really required by any standard. OTOH for PID and PGID only there
> would be no actual need for a separate idtype argument either.
> Consider support for the alternate idtypes as enabling technology only
> for special purposes keeping in mind that using non-standard features
> may cause portability problems.
> You can use P_UID and P_GID for selecting target processes based on
> their effective UID or effective GID. This might be handy when a
> parent process starts SUID or SGID binaries.
> When a child might start a session of its own, waiting using P_SID
> could be useful. (The SID can only be equal to child's own PID or
> equal to the parent's SID.)
> P_ZONEID (Solaris terminology) tries to facilitate waiting for child
> processes started in a certain jail.
> Waiting on a certain CPU set ID (P_PSETID) might also be sometimes
> handy, but for the moment that info lives only in the thread
> structures which seem to be dropped before a terminated process
> becomes waitable.
> Similarly the scheduling priority class (P_CID) might sometimes be a
> useful tool for filtering the processes to wait. It seems quite
> plausible that a parent process might only wish to know e.g. about
> real-time processes which have stopped. As the CPU set ID also the
> scheduling class info lives for the time being only inside the threads
> which causes it to be lost before it is possible to wait for the
> zombie.

What is the use of these extensions? I would like more specific
applications, not just theoretical use cases.

An application may instead store status information about child
processes in its own data structures and/or use the siginfo_t
information with SIGCHLD for more flexibility.
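
One way the bookkeeping suggested above might look, as a sketch only: the
application keeps its own table and drains pending status changes without
blocking, for instance from its main loop after a SIGCHLD has arrived. The
struct child_record type and the drain_children() name are made up for
illustration. si_pid is cleared before each call because that is the portable
way to tell "nothing to report" apart from a real status under WNOHANG.

```c
#define _XOPEN_SOURCE 700	/* ask for the POSIX.1-2008 interfaces */
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical per-child record kept by the application itself. */
struct child_record {
	pid_t	pid;
	int	code;		/* CLD_EXITED, CLD_KILLED, ... */
	int	status;		/* exit code or signal number */
};

/*
 * Record every exit status that is already pending, without blocking.
 * Returns the number of records written, or -1 on an unexpected error.
 */
static int
drain_children(struct child_record *tab, int max)
{
	siginfo_t info;
	int n = 0;

	while (n < max) {
		info.si_pid = 0;	/* lets us detect "nothing ready" */
		if (waitid(P_ALL, 0, &info, WEXITED | WNOHANG) == -1) {
			if (errno == ECHILD)
				break;		/* no children left at all */
			if (errno == EINTR)
				continue;
			return (-1);
		}
		if (info.si_pid == 0)
			break;			/* children exist, none ready */
		tab[n].pid = info.si_pid;
		tab[n].code = info.si_code;
		tab[n].status = info.si_status;
		n++;
	}
	return (n);
}
```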

> >How-To-Repeat:
> No problem!
> Just extended functionality and improved support for standards.
> 
> >Fix:
> Apply the attached patch.
> 
> If you decide to try this patch, please, remember to run the following command
> 
> ( cd /usr/src/sys/kern ; make sysent )
> 
> before attempting the build.
> This patch changes the system call vector, and without the command above your
> build is guaranteed to fail.

Some comments inline.

> --- sys/sys/wait.h.orig	2011-09-23 03:51:37.000000000 +0300
> +++ sys/sys/wait.h	2012-07-31 10:29:42.000000000 +0300
> @@ -80,6 +80,8 @@
>  #define	WSTOPPED	WUNTRACED   /* SUS compatibility */
>  #define	WCONTINUED	4	/* Report a job control continued process. */
>  #define	WNOWAIT		8	/* Poll only. Don't delete the proc entry. */
> +#define WEXITED		16	/* Wait for exited processes. (SUS) */
> +#define WTRAPPED	32	/* Wait for a process to hit a trap or a breakpoint. (Solaris) */
>  
>  #if __BSD_VISIBLE
>  #define	WLINUXCLONE 0x80000000	/* Wait for kthread spawned from linux_clone. */
> @@ -95,14 +97,17 @@
>  
>  #ifndef _KERNEL
>  #include <sys/types.h>
> +#include <sys/signal.h>
>  
>  __BEGIN_DECLS
>  pid_t	wait(int *);
>  pid_t	waitpid(pid_t, int *, int);
> +int	waitid(idtype_t, id_t, siginfo_t *, int);
>  #if __BSD_VISIBLE
>  struct rusage;
>  pid_t	wait3(int *, int, struct rusage *);
>  pid_t	wait4(pid_t, int *, int, struct rusage *);
> +pid_t	wait6(idtype_t, id_t, int *, int, struct rusage *, siginfo_t *);
>  #endif
>  __END_DECLS
>  #endif /* !_KERNEL */
> --- sys/sys/syscallsubr.h.orig	2012-01-06 21:29:16.000000000 +0200
> +++ sys/sys/syscallsubr.h	2012-07-31 10:29:42.000000000 +0300
> @@ -233,6 +233,8 @@
>  	    enum uio_seg pathseg, struct timeval *tptr, enum uio_seg tptrseg);
>  int	kern_wait(struct thread *td, pid_t pid, int *status, int options,
>  	    struct rusage *rup);
> +int	kern_wait6(struct thread *td, idtype_t idtype, id_t id, int *status,
> +		   int options, struct rusage *rup, siginfo_t *sip);
>  int	kern_writev(struct thread *td, int fd, struct uio *auio);
>  int	kern_socketpair(struct thread *td, int domain, int type, int protocol,
>  	    int *rsv);
> --- sys/bsm/audit_kevents.h.orig	2011-09-23 03:51:37.000000000 +0300
> +++ sys/bsm/audit_kevents.h	2012-07-31 10:29:42.000000000 +0300
> @@ -602,6 +602,7 @@
>  #define	AUE_PDKILL		43198	/* FreeBSD. */
>  #define	AUE_PDGETPID		43199	/* FreeBSD. */
>  #define	AUE_PDWAIT		43200	/* FreeBSD. */
> +#define	AUE_WAIT6		43201	/* FreeBSD. */
>  
>  /*
>   * Darwin BSM uses a number of AUE_O_* definitions, which are aliased to the
> --- sys/kern/syscalls.master.orig	2012-01-06 21:29:16.000000000 +0200
> +++ sys/kern/syscalls.master	2012-07-31 10:29:42.000000000 +0300
> @@ -72,7 +72,7 @@
>  6	AUE_CLOSE	STD	{ int close(int fd); }
>  7	AUE_WAIT4	STD	{ int wait4(int pid, int *status, \
>  				    int options, struct rusage *rusage); } \
> -				    wait4 wait_args int
> +				    wait4 wait4_args int

I think this can be simplified now the argument struct has its standard
name.

>  8	AUE_CREAT	COMPAT	{ int creat(char *path, int mode); }
>  9	AUE_LINK	STD	{ int link(char *path, char *link); }
>  10	AUE_UNLINK	STD	{ int unlink(char *path); }
> @@ -368,7 +368,11 @@
>  190	AUE_LSTAT	STD	{ int lstat(char *path, struct stat *ub); }
>  191	AUE_PATHCONF	STD	{ int pathconf(char *path, int name); }
>  192	AUE_FPATHCONF	STD	{ int fpathconf(int fd, int name); }
> -193	AUE_NULL	UNIMPL	nosys
> +193	AUE_WAIT6	STD	{ int wait6(int idtype, int pid, \
> +					    int *status, int options, \
> +					    struct rusage *rusage, \
> +					    siginfo_t *info); } \
> +					wait6 wait6_args int
>  194	AUE_GETRLIMIT	STD	{ int getrlimit(u_int which, \
>  				    struct rlimit *rlp); } getrlimit \
>  				    __getrlimit_args int
> --- lib/libc/include/namespace.h.orig	2011-09-23 03:51:37.000000000 +0300
> +++ lib/libc/include/namespace.h	2012-07-31 10:29:42.000000000 +0300
> @@ -229,6 +229,7 @@
>  #define		socketpair			_socketpair
>  #define		usleep				_usleep
>  #define		wait4				_wait4
> +#define		wait6				_wait6
>  #define		waitpid				_waitpid
>  #define		write				_write
>  #define		writev				_writev
> --- lib/libc/include/un-namespace.h.orig	2011-09-23 03:51:37.000000000 +0300
> +++ lib/libc/include/un-namespace.h	2012-07-31 10:29:42.000000000 +0300
> @@ -210,6 +210,7 @@
>  #undef		socketpair
>  #undef		usleep
>  #undef		wait4
> +#undef		wait6
>  #undef		waitpid
>  #undef		write
>  #undef		writev
> --- lib/libc/gen/Makefile.inc.orig	2012-03-05 13:43:27.000000000 +0200
> +++ lib/libc/gen/Makefile.inc	2012-07-31 10:29:42.000000000 +0300
> @@ -34,7 +34,7 @@
>  	syslog.c telldir.c termios.c time.c times.c timezone.c tls.c \
>  	ttyname.c ttyslot.c ualarm.c ulimit.c uname.c unvis.c \
>  	usleep.c utime.c utxdb.c valloc.c vis.c wait.c wait3.c waitpid.c \
> -	wordexp.c
> +	waitid.c wordexp.c
>  
>  CANCELPOINTS_SRCS=sem.c sem_new.c
>  .for src in ${CANCELPOINTS_SRCS}
> --- sys/cddl/contrib/opensolaris/uts/common/sys/procset.h.orig	2008-03-29 00:16:13.000000000 +0200
> +++ sys/cddl/contrib/opensolaris/uts/common/sys/procset.h	2012-07-31 10:29:42.000000000 +0300
> @@ -51,6 +51,7 @@
>  #define	P_INITUID	0
>  #define	P_INITPGID	0
>  
> +#ifndef _IDTYPE_T_DECLARED
>  
>  /*
>   *	The following defines the values for an identifier type.  It
> @@ -79,8 +80,12 @@
>  	P_CTID,		/* A (process) contract identifier.	*/
>  	P_CPUID,	/* CPU identifier.			*/
>  	P_PSETID	/* Processor set identifier		*/
> +

Please remove this unnecessary new newline.

>  } idtype_t;
>  
> +#define	_IDTYPE_T_DECLARED
> +
> +#endif
>  
>  /*
>   *	The following defines the operations which can be performed to
> --- sys/sys/proc.h.orig	2012-07-03 11:40:20.000000000 +0300
> +++ sys/sys/proc.h	2012-07-31 10:29:42.000000000 +0300
> @@ -879,8 +879,7 @@
>  void	procinit(void);
>  void	proc_linkup0(struct proc *p, struct thread *td);
>  void	proc_linkup(struct proc *p, struct thread *td);
> -void	proc_reap(struct thread *td, struct proc *p, int *status, int options,
> -	    struct rusage *rusage);
> +void	proc_reap(struct thread *td, struct proc *p, int *status, int options);
>  void	proc_reparent(struct proc *child, struct proc *newparent);
>  struct	pstats *pstats_alloc(void);
>  void	pstats_fork(struct pstats *src, struct pstats *dst);
> --- lib/libc/gen/Symbol.map.orig	2012-02-21 23:18:59.000000000 +0200
> +++ lib/libc/gen/Symbol.map	2012-07-31 10:29:42.000000000 +0300
> @@ -384,6 +384,7 @@
>  	 fdlopen;
>  	__FreeBSD_libc_enter_restricted_mode;
>  	getcontextx;
> +	waitid;
>  };
>  
>  FBSDprivate_1.0 {
> --- sys/sys/types.h.orig	2012-01-02 18:14:52.000000000 +0200
> +++ sys/sys/types.h	2012-07-31 10:29:42.000000000 +0300
> @@ -142,6 +142,45 @@
>  #define	_ID_T_DECLARED
>  #endif
>  
> +#ifndef _IDTYPE_T_DECLARED
> +
> +typedef enum
> +#if !defined(_XPG4_2) || defined(__EXTENSIONS__)

These defines look Solaris-specific.

> +	idtype		/* pollutes XPG4.2 namespace */
> +#endif
> +		{
> +	/*
> +	 * These names were mostly lifted from Solaris source code
> +	 * and still use Solaris style naming to avoid breaking any
> +	 * OpenSolaris code which has been ported to FreeBSD.
> +	 * There is no clear FreeBSD counterpart for all of the names.
> +	 * OTOH some have a clear correspondence to FreeBSD entities.
> +	 */
> +	
> +	P_PID,		/* A process identifier.		*/
> +	P_PPID,		/* A parent process identifier.		*/
> +	P_PGID,		/* A process group (job control group)	*/
> +			/* identifier.				*/
> +	P_SID,		/* A session identifier.		*/
> +	P_CID,		/* A scheduling class identifier.	*/
> +	P_UID,		/* A user identifier.			*/
> +	P_GID,		/* A group identifier.			*/
> +	P_ALL,		/* All processes.			*/
> +	P_LWPID,	/* An LWP identifier.			*/
> +	P_TASKID,	/* A task identifier.			*/
> +	P_PROJID,	/* A project identifier.		*/
> +	P_POOLID,	/* A pool identifier.			*/
> +	P_ZONEID,	/* A zone identifier.			*/
> +	P_CTID,		/* A (process) contract identifier.	*/
> +	P_CPUID,	/* CPU identifier.			*/
> +	P_PSETID	/* Processor set identifier		*/
> +
> +} idtype_t;		/* The type of id_t we are using.	*/
> +
> +#define	_IDTYPE_T_DECLARED
> +#endif
> +
> +
>  #ifndef _INO_T_DECLARED
>  typedef	__ino_t		ino_t;		/* inode number */
>  #define	_INO_T_DECLARED
> --- lib/libc/sys/wait.2.orig	2011-09-23 03:51:37.000000000 +0300
> +++ lib/libc/sys/wait.2	2012-07-31 16:45:18.000000000 +0300
> @@ -34,9 +34,11 @@
>  .Sh NAME
>  .Nm wait ,
>  .Nm waitpid ,
> +.Nm waitid ,
> +.Nm wait3 ,
>  .Nm wait4 ,
> -.Nm wait3
> -.Nd wait for process termination
> +.Nm wait6
> +.Nd wait for processes to change status
>  .Sh LIBRARY
>  .Lb libc
>  .Sh SYNOPSIS
> @@ -46,12 +48,17 @@
>  .Fn wait "int *status"
>  .Ft pid_t
>  .Fn waitpid "pid_t wpid" "int *status" "int options"
> +.In sys/signal.h
> +.Ft int
> +.Fn waitid "idtype_t idtype" "id_t id" "siginfo_t *info" "int options"
>  .In sys/time.h
>  .In sys/resource.h
>  .Ft pid_t
>  .Fn wait3 "int *status" "int options" "struct rusage *rusage"
>  .Ft pid_t
>  .Fn wait4 "pid_t wpid" "int *status" "int options" "struct rusage *rusage"
> +.Ft pid_t
> +.Fn wait6 "idtype_t idtype" "id_t id" "int *status" "int options" "struct rusage *rusage" "siginfo_t *infop"
>  .Sh DESCRIPTION
>  The
>  .Fn wait
> @@ -89,25 +96,190 @@
>  The other wait functions are implemented using
>  .Fn wait4 .
>  .Pp
> +The broadest interface of all functions in this family is
> +.Fn wait6
> +which is otherwise very much like
> +.Fn wait4
> +but with a few very important distinctions.
> +.br
> +It will not wait for existed processes unless the option flag 
> +.Dv WEXITED
> +is explicitly specified.
> +This allows for waiting for processes which have experienced other
> +status changes without having to handle also the exit status from
> +the terminated processes.
> +Another important difference is the additional fifth argument
> +which must be either 
> +.Dv NULL
> +or a pointer to a
> +.Fa siginfo_t
> +structure.
> +Additionally the old
> +.Fq pid_t
> +argument has been split into two separate
> +.Fa idtype_t
> +and
> +.Fa id_t .
> +.br
> +Allowing for the distinction in how the
> +PID or PGID
> +is passed to the routine, calling
> +.Fn wait6
> +with the bits
> +.Dv WEXITED
> +and
> +.Dv WTRAPPED
> +set in the
> +.Fa options
> +and with
> +.Fa infop
> +set to
> +.Dv NULL ,
> +is still functionally equivalent to calling
> +.Fn wait4 .
> +The separation of
> +.Fa idtype
> +and
> +.Fa id
> +arguments has the benefit, though, that many other types of
> +IDs can be supported as well in addition to PID and PGID.
> +.sp
> +Notice that
> +.Fn wait6
> +is not required by any standard nor is it common in other
> +operating system.

"systems"

> +It is simply a generalized API to support in one function call
> +interface any and all of the functionality available through 
> +any of the other
> +.Fn wait*
> +functions.
> +Do not use it unless you fully accept the implied
> +limitations to the portability of your code.
> +.Pp
>  The
> +.Fa idtype
> +and
> +.Fa id
> +arguments specify which processes
> +.Fn waitid
> +and
> +.Fn wait6
> +shall wait for.
> +.Bl -tag -width Ds
> +.It Dv +

Is this supposed to be some sort of list bullet? Those are done
differently in mdoc.

> +If
> +.Fa idtype
> +is 
> +.Dv P_PID ,
> +.Fn waitid
> +and
> +.Fn wait6
> +wait for the child process with a process ID equal to
> +.Dv (pid_t)id .
> +.It Dv +
> +If
> +.Fa idtype
> +is 
> +.Dv P_PGID ,
> +.Fn waitid
> +and
> +.Fn wait6
> +wait for the child process with a process group ID equal to
> +.Dv (pid_t)id .
> +.It Dv +
> +If
> +.Fa idtype
> +is 
> +.Dv P_ALL ,
> +.Fn waitid
> +and
> +.Fn wait6
> +wait for any child process and the
> +.Dv id
> +is ignored.
> +.It Dv +
> +If
> +.Fa idtype
> +is 
> +.Dv P_PID
> +or
> +.Dv P_PGID
> +and the
> +.Dv id
> +is zero,
> +.Fn waitid
> +and
> +.Fn wait6
> +wait for any child process in the same process group as the caller.
> +.It Dv +
> +While no standard actually requires such functionality,
> +this implementation supports also other types of IDs to wait.
> +.br
> +Notice anyhow that using any of these non-standard features will
> +most likely seriously degrade the portability of your code.
> +Consider such use only as enabling technology for new creative
> +experimentation locked into its original environment.
> +.br
> +Use
> +.Fa idtype
> +value
> +.Dv P_UID
> +to filter processes based on their effective UID,
> +.Dv P_GID
> +to filter processes based on their effective GID.
> +.br
> +.Dv P_SID
> +could be used to filter based on the session ID.
> +In case the child process started its own new session,
> +SID will be the same as its own PID.
> +Otherwise the SID of a child process will match the caller's SID.

setsid() is usually only used after fork() so this seems rarely useful.

> +.br
> +.Dv P_ZONEID
> +facilitates waiting for processes within a certain jail.
> +.br
> +There could be still more meaningful ID types to wait for
> +like
> +.Dv P_PSETID
> +for processes restricted to a certain set of CPUs,
> +.Dv P_CID
> +to wait for processes in a certain scheduling class or
> +.Dv P_CPUID
> +to wait for processes nailed to a certain CPU.
> +These three
> +have not been implemented at the time of this writing,
> +because the data stored in the thread structures seems
> +to be zeroed when a process terminates before the parent
> +gets to wait for the zombie.
> +They are mentioned here as potentially useful extensions.

I don't think potential new non-standard features belong in a man page.

> +.El
> +.Pp
> +For all the other
> +.Fn wait*
> +variants the
>  .Fa wpid
>  argument specifies the set of child processes for which to wait.
> +.Bl -tag -width Ds
> +.It Dv +
>  If
>  .Fa wpid
>  is -1, the call waits for any child process.
> +.It Dv +
>  If
>  .Fa wpid
>  is 0,
>  the call waits for any child process in the process group of the caller.
> +.It Dv +
>  If
>  .Fa wpid
>  is greater than zero, the call waits for the process with process id
>  .Fa wpid .
> +.It Dv +
>  If
>  .Fa wpid
>  is less than -1, the call waits for any process whose process group id
>  equals the absolute value of
>  .Fa wpid .
> +.El
>  .Pp
>  The
>  .Fa status
> @@ -116,41 +288,106 @@
>  The
>  .Fa options
>  argument contains the bitwise OR of any of the following options.
> -The
> -.Dv WCONTINUED
> -option indicates that children of the current process that
> +.Bl -tag -width Ds
> +.It Dv WCONTINUED
> +indicates that children of the current process that
>  have continued from a job control stop, by receiving a
>  .Dv SIGCONT
>  signal, should also have their status reported.
> -The
> -.Dv WNOHANG
> -option
> -is used to indicate that the call should not block if
> -there are no processes that wish to report status.
> -If the
> -.Dv WUNTRACED
> -option is set,
> -children of the current process that are stopped
> +.It Dv WNOHANG
> +is used to indicate that the call should not block when
> +there are no processes wishing to report status.
> +.It Dv WUNTRACED
> +indicates that children of the current process which are stopped
>  due to a
>  .Dv SIGTTIN , SIGTTOU , SIGTSTP ,
>  or
>  .Dv SIGSTOP
> -signal also have their status reported.
> -The
> -.Dv WSTOPPED
> -option is an alias for
> +signal shall have their status reported.
> +.It Dv WSTOPPED
> +is an alias for
>  .Dv WUNTRACED .
> -The
> -.Dv WNOWAIT
> -option keeps the process whose status is returned in a waitable state.
> +.It Dv WTRAPPED
> +allows waiting for processes which have trapped or reached a breakpoint.
> +.It Dv WEXITED
> +indicates that the caller is wants to receive status reports from
> +terminated processes.
> +.br
> +This bit is implicitly set for the older functions
> +.Fn wait ,
> +.Fn waitpid ,
> +.Fn wait3 ,
> +and
> +.Fn wait4 
> +to avoid changing their traditional functionality.
> +.br
> +For the more recent new APIs 

"new" is redundant.

> +.Fn waitid
> +and
> +.Fn wait6
> +this bit has to be explicitly included in the 
> +.Fa options ,
> +if status reports from terminated processes are expected.
> +.br
> +This has the benefit that while using the latter two APIs
> +it is possible to request status reports only for processes
> +which have expereinced some other status change, but which

"experienced"

> +have not terminated.
> +So, it is possible to avoid receiving reports for terminated
> +processes, in those parts of a program which are not able
> +to properly handle zombies and delay zombie processing to
> +other parts which can handle them properly.

POSIX requires this functionality but IMHO taking advantage of it
signifies that your design is wrong. The various state changes of a
child process belong together. Also, most applications only care about
exited child processes, not about stopped and continued ones.

> +.It Dv WNOWAIT
> +keeps the process whose status is returned in a waitable state.
>  The process may be waited for again after this call completes.
> +.El
> +.sp
> +For the more recent APIs 
> +.Fn waitid
> +and
> +.Fn wait6
> +at least one of the options
> +.Dv WEXITED ,
> +.Dv WUNTRACED ,
> +.Dv WSTOPPED ,
> +.Dv WTRAPPED ,
> +or
> +.Dv WCONTINUED
> +must be specified. Otherwise there will be no data for the call to
> +return.
> +To avoid hanging indefinitely in such a case these functions currently
> +behave as if WNOHANG had been specified.

Not specifying necessary options is [EINVAL], not success.

I think there is also another case added by making WEXITED an option: if
WEXITED is not specified, there is at least one zombie and there are no
other child processes. The zombies are never going to stop or continue
so the call would block indefinitely. It seems to make sense to fail
with [ECHILD] right away, just like happens when calling any wait
function while there are no child processes at all.

>  .Pp
>  If
>  .Fa rusage
>  is non-zero, a summary of the resources used by the terminated
>  process and all its
> -children is returned (this information is currently not available
> -for stopped or continued processes).
> +children is returned.
> +.Pp
> +If
> +.Fa infop
> +is non-null, it must point to a 
> +.Dv siginfo_t
> +structure which will be filled such that the
> +.Dv si_signo
> +field will always be
> +.Dv SIGCHLD
> +and the field
> +.Dv si_pid
> +will be non-zero, if there is a status change to report.
> +If there are no status changes to report and WNOHANG is applied,
> +both of these fields will be zero.
> +.br
> +When using the
> +.Fn waitid
> +API with the
> +.Dv WNOHANG
> +option set checking these fields is the only way to know whether
> +there were any status changes to report, because the return value
> +from
> +.Fn waitid
> +will be zero as it is for any successful return from
> +.Fn waitid .
>  .Pp
>  When the
>  .Dv WNOHANG
> @@ -306,6 +543,18 @@
>  is returned and
>  .Va errno
>  is set to indicate the error.
> +.Pp
> +If
> +.Fn waitid
> +returns because one or more processes have a state change to report,
> +0 is returned.
> +To indicate an error, -1 will be returned and
> +.Dv errno
> +set to an appropriate value.
> +If
> +.Dv WNOHANG
> +was used, 0 can be returned indicating no error, but no processes
> +may have changed state either, if si_signo and/or si_pid are zero.
>  .Sh ERRORS
>  The
>  .Fn wait
> @@ -335,6 +584,14 @@
>  or the signal did not have the
>  .Dv SA_RESTART
>  flag set.
> +.It Bq Er EINVAL
> +An invalid value as specified for
> +.Fa options ,
> +or
> +.Fa idtype
> +and
> +.Fa id
> +do not specify a valid set of processes.
>  .El
>  .Sh SEE ALSO
>  .Xr _exit 2 ,
> --- sys/kern/kern_exit.c.orig	2012-04-05 13:33:39.000000000 +0300
> +++ sys/kern/kern_exit.c	2012-07-31 16:39:30.000000000 +0300
> @@ -674,6 +674,7 @@
>  	int error, status;
>  
>  	error = kern_wait(td, WAIT_ANY, &status, 0, NULL);
> +

Please remove this unnecessary new blank line.

>  	if (error == 0)
>  		td->td_retval[1] = status;
>  	return (error);
> @@ -684,7 +685,7 @@
>   * The dirty work is handled by kern_wait().
>   */
>  int
> -sys_wait4(struct thread *td, struct wait_args *uap)
> +sys_wait4(struct thread *td, struct wait4_args *uap)
>  {
>  	struct rusage ru, *rup;
>  	int error, status;
> @@ -693,11 +694,63 @@
>  		rup = &ru;
>  	else
>  		rup = NULL;
> +
>  	error = kern_wait(td, uap->pid, &status, uap->options, rup);
> +
> +	if (uap->status != NULL && error == 0)
> +		error = copyout(&status, uap->status, sizeof(status));
> +	if (uap->rusage != NULL && error == 0)
> +		error = copyout(&ru, uap->rusage, sizeof(struct rusage));
> +	return (error);
> +}
> +
> +int
> +sys_wait6(struct thread *td, struct wait6_args *uap)
> +{
> +	struct rusage ru, *rup;
> +	siginfo_t  si, *sip;
> +	int error, status;
> +	pid_t	pid;
> +	idtype_t idtype;
> +	id_t	id;
> +
> +	pid = uap->pid;
> +
> +	if (pid == WAIT_ANY) {
> +		idtype = P_ALL;
> +		id = 0;
> +	}
> +	else if (pid <= 0) {
> +		idtype = P_PGID;
> +		id = (id_t) -pid;
> +	}
> +	else {
> +		idtype = P_PID;
> +		id = (id_t) pid;
> +	}

Why is this code here? I expected uap->idtype to be used.

> +
> +	if (uap->rusage != NULL)
> +		rup = &ru;
> +	else
> +		rup = NULL;
> +
> +	if (uap->info != NULL)
> +		sip = &si;
> +	else
> +		sip = NULL;
> +
> +	/*
> +	 *  We expect all callers of wait6()
> +	 *  to know about WEXITED & WTRAPPED!

Consider "and" here so it is clear the C operator is not meant.

> +	 */
> +	error = kern_wait6(td, idtype, id, &status, uap->options, rup, sip);
> +
>  	if (uap->status != NULL && error == 0)
>  		error = copyout(&status, uap->status, sizeof(status));
>  	if (uap->rusage != NULL && error == 0)
>  		error = copyout(&ru, uap->rusage, sizeof(struct rusage));
> +	if (uap->info != NULL && error == 0)
> +		error = copyout(&si, uap->info, sizeof(siginfo_t));
>  	return (error);
>  }
>  
> @@ -707,8 +760,7 @@
>   * lock as part of its work.
>   */
>  void
> -proc_reap(struct thread *td, struct proc *p, int *status, int options,
> -    struct rusage *rusage)
> +proc_reap(struct thread *td, struct proc *p, int *status, int options)
>  {
>  	struct proc *q, *t;
>  
> @@ -718,10 +770,7 @@
>  	KASSERT(p->p_state == PRS_ZOMBIE, ("proc_reap: !PRS_ZOMBIE"));
>  
>  	q = td->td_proc;
> -	if (rusage) {
> -		*rusage = p->p_ru;
> -		calcru(p, &rusage->ru_utime, &rusage->ru_stime);
> -	}
> +
>  	PROC_SUNLOCK(p);
>  	td->td_retval[0] = p->p_pid;
>  	if (status)
> @@ -834,8 +883,10 @@
>  }
>  
>  static int
> -proc_to_reap(struct thread *td, struct proc *p, pid_t pid, int *status,
> -    int options, struct rusage *rusage)
> +proc_to_reap(struct thread *td, struct proc *p,
> +	     idtype_t idtype, id_t id, 
> +	     int *status, int options,
> +	     struct rusage *rusage, siginfo_t *siginfo)
>  {
>  	struct proc *q;
>  
> @@ -843,15 +894,121 @@
>  
>  	q = td->td_proc;
>  	PROC_LOCK(p);
> -	if (pid != WAIT_ANY && p->p_pid != pid && p->p_pgid != -pid) {
> +
> +	switch (idtype) {
> +	case	P_ALL:
> +		break;
> +
> +	case	P_PID:
> +		if (p->p_pid != (pid_t) id) {

Is the cast necessary here? If id_t and pid_t signedness matches, there
is no need.

In case the cast is required, style(9) says no space after casts.

> +			PROC_UNLOCK(p);
> +			return (0);
> +		}
> +		break;
> +
> +	case	P_PGID:
> +		if (p->p_pgid != (pid_t) id) {
> +			PROC_UNLOCK(p);
> +			return (0);
> +		}
> +		break;
> +
> +	case	P_SID:
> +		if (p->p_session->s_sid != (pid_t) id) {
> +			PROC_UNLOCK(p);
> +			return (0);
> +		}
> +		break;
> +
> +	case	P_UID:
> +		if (p->p_ucred->cr_uid != (uid_t) id) {
> +			PROC_UNLOCK(p);
> +			return (0);
> +		}
> +		break;
> +
> +	case	P_GID:
> +		if (p->p_ucred->cr_gid != (gid_t) id) {
> +			PROC_UNLOCK(p);
> +			return (0);
> +		}
> +		break;
> +
> +	case	P_ZONEID:	/* jail */
> +		if (! p->p_ucred->cr_prison ||
> +		    (p->p_ucred->cr_prison->pr_id != (int) id)) {
> +			PROC_UNLOCK(p);
> +			return (0);
> +		}
> +		break;
> +
> +#if 0
> +		/*
> +		 * It seems that the first thread structure gets zeroed out
> +		 * at process exit.
> +		 * This makes toast of all useful info related to CPU set and
> +		 * scheduling priority class.
> +		 */
> +
> +	case	P_PSETID:
> +		{
> +			struct thread	*td1;
> +
> +			td1 = FIRST_THREAD_IN_PROC(p);
> +			if (td1->td_cpuset->cs_id != (cpusetid_t) id) {
> +				PROC_UNLOCK(p);
> +				return (0);
> +			}
> +		}
> +		break;
> +
> +	case	P_CID:
> +		{
> +			struct thread	*td1;
> +
> +			td1 = FIRST_THREAD_IN_PROC(p);
> +			if (td1->td_pri_class != (unsigned) id) {
> +				PROC_UNLOCK(p);
> +				return (0);
> +			}
> +		}
> +		break;
> +
> +
> +		/*
> +		 * Is there a good place for this?
> +		 * Supposedly also zeroed before it can be used, right?
> +		 */
> +
> +	case	P_CPUID:
> +		{
> +			struct thread	*td1;
> +
> +			td1 = FIRST_THREAD_IN_PROC(p);
> +			if (td1->td_lastcpu != (unsigned) id) {
> +				PROC_UNLOCK(p);
> +				return (0);
> +			}
> +		}
> +		break;
> +#endif
> +
> +	default:
>  		PROC_UNLOCK(p);
>  		return (0);
> +		break;
>  	}
> +
>  	if (p_canwait(td, p)) {
>  		PROC_UNLOCK(p);
>  		return (0);
>  	}
>  
> +	if (((options & WEXITED) == 0) && (p->p_state == PRS_ZOMBIE)) {
> +		PROC_UNLOCK(p);
> +		return (0);
> +	}
> +		
>  	/*
>  	 * This special case handles a kthread spawned by linux_clone
>  	 * (see linux_misc.c).  The linux_wait4 and linux_waitpid
> @@ -867,8 +1024,57 @@
>  	}
>  
>  	PROC_SLOCK(p);
> +
> +	/* New siginfo stuff... */
> +
> +	if (siginfo) {
> +		bzero (siginfo, sizeof (*siginfo));
> +		siginfo->si_signo = SIGCHLD;
> +		siginfo->si_errno = 0;
> +
> +		/*
> +		 *  Right, this is still a rough estimate.
> +		 *  We will fix the cases TRAPPED, STOPPED,
> +		 *  and CONTINUED later.
> +		 */
> +
> +		if (WCOREDUMP(p->p_xstat))
> +			siginfo->si_code = CLD_DUMPED;
> +		else if (WIFSIGNALED(p->p_xstat))
> +			siginfo->si_code = CLD_KILLED;
> +		else
> +			siginfo->si_code = CLD_EXITED;
> +
> +		siginfo->si_pid = p->p_pid;
> +		siginfo->si_uid = p->p_ucred->cr_uid;
> +		siginfo->si_status = p->p_xstat;

Hmm, is it possible to use p_ksi instead of duplicating code here?

> +
> +		/*
> +		 *  The si_addr field would be useful
> +		 *  additional detail, but apparently
> +		 *  the PC value may be lost when we
> +		 *  reach this point.
> +		 */
> +		siginfo->si_addr = NULL;	/* XXX */

This would be useless anyway.

> +	}
> +
> +	/*
> +	 * There should be no reason to limit resources usage info
> +	 * to exited processes only.
> +	 * A snapshot about any resources used by a stopped process
> +	 * may be exactly what is needed.
> +	 * (1) Solaris limits available info to times only.
> +	 * (2) Linux does not declare any limitations.
> +	 * (3) Now we within the same PROC_SLOCK anyway.
> +	 */
> +
> +	if (rusage) {
> +		*rusage = p->p_ru;
> +		calcru(p, &rusage->ru_utime, &rusage->ru_stime);
> +	}
> +


>  	if (p->p_state == PRS_ZOMBIE) {
> -		proc_reap(td, p, status, options, rusage);
> +		proc_reap(td, p, status, options);
>  		return (-1);
>  	}

It would be desirable to have the rusage from the child's waited child
processes added in for the stopped/continued case as well (like ps S).
kern_proc.c appears to do this using calccru().

>  	PROC_SUNLOCK(p);
> @@ -877,24 +1083,75 @@
>  }
>  
>  int
> -kern_wait(struct thread *td, pid_t pid, int *status, int options,
> -    struct rusage *rusage)
> +kern_wait(struct thread *td, pid_t pid,
> +	  int *status, int options, struct rusage *rusage)
> +{
> +	idtype_t idtype;
> +	id_t id;
> +
> +	if (pid == WAIT_ANY) {
> +		idtype = P_ALL;
> +		id = 0;
> +	}
> +	else if (pid <= 0) {
> +		idtype = P_PGID;
> +		id = (id_t) -pid;
> +	}
> +	else {
> +		idtype = P_PID;
> +		id = (id_t) pid;
> +	}
> +
> +	/*
> +	 *  For backward compatibility we implicitly add
> +	 *  flags WEXITED & WTRAPPED here.
> +	 */
> +
> +	options |= (WEXITED | WTRAPPED);
> +
> +	return (kern_wait6 (td, idtype, id, status, options, rusage, NULL));
> +}
> +
> +int
> +kern_wait6(struct thread *td, idtype_t idtype, id_t id,
> +	   int *status, int options,
> +	   struct rusage *rusage, siginfo_t *siginfo)
>  {
>  	struct proc *p, *q;
>  	int error, nfound, ret;
>  
> -	AUDIT_ARG_PID(pid);
> +#if 0
> +	AUDIT_ARG_VALUE((int) idtype);	/* XXX - This is likely wrong! */
> +#endif
> +	AUDIT_ARG_PID((pid_t) id);	/* XXX - This may be wrong! */
>  	AUDIT_ARG_VALUE(options);
>  
>  	q = td->td_proc;
> -	if (pid == 0) {
> +
> +	if (((pid_t) id == WAIT_MYPGRP) &&
> +	    ((idtype == P_PID) || (idtype == P_PGID))) {
>  		PROC_LOCK(q);
> -		pid = -q->p_pgid;
> +		id = (id_t) q->p_pgid;
>  		PROC_UNLOCK(q);
> +		idtype = P_PGID;
>  	}
> +
>  	/* If we don't know the option, just return. */
> -	if (options & ~(WUNTRACED|WNOHANG|WCONTINUED|WNOWAIT|WLINUXCLONE))
> +	if (options & ~(WUNTRACED|WNOHANG|WCONTINUED|WNOWAIT|WEXITED|WTRAPPED|WLINUXCLONE))
>  		return (EINVAL);
> +
> +	if ((options & (WEXITED|WUNTRACED|WCONTINUED|WTRAPPED)) == 0) {
> +		/*
> +		 * We will be unable to find any matching processes.
> +		 * Simply behave as WHOHANG were specified, because
> +		 * waiting for real will not help.
> +		 */
> +		if (siginfo)
> +			bzero (siginfo, sizeof (*siginfo));
> +		td->td_retval[0] = 0;
> +		return (0);
> +	}
> +
>  loop:
>  	if (q->p_flag & P_STATCHILD) {
>  		PROC_LOCK(q);
> @@ -904,7 +1161,8 @@
>  	nfound = 0;
>  	sx_xlock(&proctree_lock);
>  	LIST_FOREACH(p, &q->p_children, p_sibling) {
> -		ret = proc_to_reap(td, p, pid, status, options, rusage);
> +		ret = proc_to_reap(td, p, idtype, id,
> +				   status, options, rusage, siginfo);
>  		if (ret == 0)
>  			continue;
>  		else if (ret == 1)
> @@ -914,20 +1172,65 @@
>  
>  		PROC_LOCK(p);
>  		PROC_SLOCK(p);
> -		if ((p->p_flag & P_STOPPED_SIG) &&
> +
> +		if ((options & WTRAPPED) &&
> +		    (p->p_flag & P_TRACED) &&
> +		    (p->p_flag & (P_STOPPED_TRACE | P_STOPPED_SIG)) &&
>  		    (p->p_suspcount == p->p_numthreads) &&
> -		    (p->p_flag & P_WAITED) == 0 &&
> -		    (p->p_flag & P_TRACED || options & WUNTRACED)) {
> +		    ((p->p_flag & P_WAITED) == 0)) {
>  			PROC_SUNLOCK(p);
> -			p->p_flag |= P_WAITED;
> +
> +			if ((options & WNOWAIT) == 0)
> +				p->p_flag |= P_WAITED;
> +
>  			sx_xunlock(&proctree_lock);
>  			td->td_retval[0] = p->p_pid;
> +
>  			if (status)
>  				*status = W_STOPCODE(p->p_xstat);
>  
> -			PROC_LOCK(q);
> -			sigqueue_take(p->p_ksi);
> -			PROC_UNLOCK(q);
> +			if (siginfo) {
> +				siginfo->si_status = W_STOPCODE(p->p_xstat);
> +				siginfo->si_code = CLD_TRAPPED;
> +			}
> +
> +			if ((options & WNOWAIT) == 0) {
> +				PROC_LOCK(q);
> +				sigqueue_take(p->p_ksi);
> +				PROC_UNLOCK(q);
> +			}
> +
> +			PROC_UNLOCK(p);
> +
> +			return (0);
> +		}
> +
> +		if ((options & WUNTRACED) &&
> +		    (p->p_flag & P_STOPPED_SIG) &&
> +		    (p->p_suspcount == p->p_numthreads) &&
> +		    ((p->p_flag & P_WAITED) == 0)) {
> +			PROC_SUNLOCK(p);
> +
> +			if ((options & WNOWAIT) == 0)
> +				p->p_flag |= P_WAITED;
> +
> +			sx_xunlock(&proctree_lock);
> +			td->td_retval[0] = p->p_pid;
> +
> +			if (status)
> +				*status = W_STOPCODE(p->p_xstat);
> +
> +			if (siginfo) {
> +				siginfo->si_status = W_STOPCODE(p->p_xstat);
> +				siginfo->si_code = CLD_STOPPED;
> +			}
> +
> +			if ((options & WNOWAIT) == 0) {
> +				PROC_LOCK(q);
> +				sigqueue_take(p->p_ksi);
> +				PROC_UNLOCK(q);
> +			}
> +
>  			PROC_UNLOCK(p);
>  
>  			return (0);
> @@ -936,15 +1239,25 @@
>  		if (options & WCONTINUED && (p->p_flag & P_CONTINUED)) {
>  			sx_xunlock(&proctree_lock);
>  			td->td_retval[0] = p->p_pid;
> -			p->p_flag &= ~P_CONTINUED;
>  
> -			PROC_LOCK(q);
> -			sigqueue_take(p->p_ksi);
> -			PROC_UNLOCK(q);
> +			if ((options & WNOWAIT) == 0) {
> +				p->p_flag &= ~P_CONTINUED;
> +
> +				PROC_LOCK(q);
> +				sigqueue_take(p->p_ksi);
> +				PROC_UNLOCK(q);
> +			}
> +
>  			PROC_UNLOCK(p);
>  
>  			if (status)
>  				*status = SIGCONT;
> +
> +			if (siginfo) {
> +				siginfo->si_status = SIGCONT;
> +				siginfo->si_code = CLD_CONTINUED;
> +			}
> +
>  			return (0);
>  		}
>  		PROC_UNLOCK(p);
> @@ -963,7 +1276,8 @@
>  	 * to successfully wait until the child becomes a zombie.
>  	 */
>  	LIST_FOREACH(p, &q->p_orphans, p_orphan) {
> -		ret = proc_to_reap(td, p, pid, status, options, rusage);
> +		ret = proc_to_reap(td, p, idtype, id,
> +				   status, options, rusage, siginfo);
>  		if (ret == 0)
>  			continue;
>  		else if (ret == 1)
> @@ -977,6 +1291,8 @@
>  	}
>  	if (options & WNOHANG) {
>  		sx_xunlock(&proctree_lock);
> +		if (siginfo)
> +			bzero (siginfo, sizeof (*siginfo));
>  		td->td_retval[0] = 0;
>  		return (0);
>  	}
> --- lib/libc/sys/Makefile.inc.orig	2012-01-06 21:29:16.000000000 +0200
> +++ lib/libc/sys/Makefile.inc	2012-07-31 10:29:42.000000000 +0300
> @@ -210,5 +210,5 @@
>  MLINKS+=truncate.2 ftruncate.2
>  MLINKS+=unlink.2 unlinkat.2
>  MLINKS+=utimes.2 futimes.2 utimes.2 futimesat.2 utimes.2 lutimes.2
> -MLINKS+=wait.2 wait3.2 wait.2 wait4.2 wait.2 waitpid.2
> +MLINKS+=wait.2 wait3.2 wait.2 wait4.2 wait.2 waitpid.2 wait.2 waitid.2 wait.2 wait6.2
>  MLINKS+=write.2 pwrite.2 write.2 pwritev.2 write.2 writev.2
> 
> 
> >Release-Note:
> >Audit-Trail:
> >Unformatted:

-- 
Jilles Tjoelker
Comment 6 David Xu freebsd_committer freebsd_triage 2012-08-09 04:50:22 UTC
Excellent work. I think idtype should be in types.h.
Also, can you improve the patch according to Jilles Tjoelker's
comments?
I found that Google's Chromium browser uses waitid:
http://code.google.com/p/chromium/source/search?q=waitid&origq=waitid&btnG=Search+Trunk


--
David Xu
Comment 7 jau 2012-08-18 10:56:25 UTC
Right, Jilles raised quite a few questions and notes.
I am not even going to try to answer them one by one.
On the whole I would like to say that, in my mind, when
working on operating systems we are not aiming at
a well-defined end-user application but providing enabling
technology to be used by the whole community when
the need arises.
So, I do not think there has to be an immediate user for
all features. The whole issue is much like implementation
of code libraries, and the aim should be reuse and
generality, not necessarily immediate use in this or that
program. And when standards or tradition are concerned
I think any OS should aim to support anything which is
either standard or simply frequently supported by other
systems, because both are fair reasons for programmers
to expect this or that feature to be available.
Besides improving code portability, this also improves
application code quality, because programmers need
fewer alternate workaround snippets to make their
application work in a particular environment.

As Jilles proposed, I changed the code to return EINVAL
when none of WEXITED|WSTOPPED|WTRAPPED|WCONTINUED
is set. The softer approach I initially used would
probably not be a strong enough signal for a programmer
to realize (s)he is doing something silly.

Jilles also pointed out that there was some odd extra code
left behind where it did not belong. I still have no idea how
it slipped through, but that should now be fixed as well.

A whole lot of typos in the comments and in the man page
have been fixed, including ones which Jilles did not point out.

And as before, start with a fresh system, apply the patch,
and then run ...

( cd /usr/src/sys/kern ; make sysent )

Otherwise the build will fail.

Have a nice weekend.
--jau
Comment 8 jau 2012-09-27 09:19:00 UTC
Right,

Unless someone finds some really major trouble in this version of the patch, I
will try to avoid any further changes.
On the whole I consider this now mature enough that it also deserves a 32-bit
shim for 64-bit systems to be included.

Previously Jilles proposed allowing resource usage statistics to be collected
not only from the child process but also from its children. I had had
the same idea myself earlier, but initially I discarded it, thinking
"Nah... it would not be frequently used anyhow." Being reminded of the idea
made me think it over from the point of view that even a potentially rarely
used feature can be enabling technology for purposes I cannot see yet.
So, I decided to add the feature. For a while I thought I would simply
replace the old pointer to struct rusage with a pointer to an array of two
of these rusage structures, but then it dawned on me that it would be error
prone. For a compiler a pointer to one structure or a pointer to a two slot
array of the same type of structures would be pretty much the same thing.
So, I decided to introduce a completely new struct wrusage (for wider
rusage) and changed wait6() to take a pointer to one of those instead of
the old rusage. The new struct wrusage contains two fields:

struct rusage    wru_self;
struct rusage    wru_children;


This allows compilers to make a distinction between pointers to two
distinct structures potentially avoiding some confusion and errors.
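
A minimal sketch of that idea follows. The struct is defined locally for illustration using the field names above, and the two functions are hypothetical; getrusage(2) stands in for wait6() here so that the example runs without the patch.

```c
#include <string.h>
#include <sys/resource.h>
#include <sys/time.h>

/* Local definition for illustration, following the field names above. */
struct wrusage {
	struct rusage wru_self;		/* the process itself */
	struct rusage wru_children;	/* its waited-for children */
};

/* Hypothetical consumer: user CPU seconds, self plus children. */
long
wrusage_user_seconds(const struct wrusage *wru)
{
	return ((long)wru->wru_self.ru_utime.tv_sec +
	    (long)wru->wru_children.ru_utime.tv_sec);
}

/* Fill both halves for the calling process via getrusage(2). */
int
wrusage_for_caller(struct wrusage *wru)
{
	memset(wru, 0, sizeof(*wru));
	if (getrusage(RUSAGE_SELF, &wru->wru_self) == -1)
		return (-1);
	return (getrusage(RUSAGE_CHILDREN, &wru->wru_children));
}
```

Passing a plain struct rusage pointer to either function draws an incompatible-pointer diagnostic from the compiler, which a pointer to a two-slot rusage array would not.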

I had received some notes offline via e-mail from kib. I have tried to pay
attention to those comments as well.

When trying this, start with a clean 9.1-prerelease source, apply the patch,
and run these...

( cd /usr/src/sys/kern ; make sysent )
( cd /usr/src/sys/compat/freebsd32 ; make sysent )


Without those "make sysent" commands the build will fail.

Cheers,
// jau
Comment 9 jau 2012-10-13 11:38:27 UTC
Right, having said that I would stop modifying this, I have to
break that promise.
I had forgotten one change which I had intended to make.
Previously the siginfo structure was zeroed inside the kernel
when wait6() returned 0. This has now been moved out of
the kernel and put inside waitid().
Now this should be stable. I have no more planned changes
waiting to be included.

As before, start with 9.1-RC2 (9-stable), apply the patch, run

( cd /usr/src/sys/kern ; make sysent )
( cd /usr/src/sys/compat/freebsd32 ; make sysent )

and build the whole system.

--jau
Comment 10 Konstantin Belousov freebsd_committer freebsd_triage 2013-02-04 11:39:33 UTC
State Changed
From-To: open->closed

Waitid committed to HEAD and merged to stable/9.