Created attachment 152953 [details]
zombie demonstration

Attached is a zombie demonstration: it forks and the child immediately exits. The parent sleeps for one second and does not wait() for the child's status. As expected, it terminates after 1 second:

joule% /usr/bin/time ./a.out
        1.02 real         0.00 user         0.00 sys

However, running it under timeout(1) results in waiting for the timeout period to expire:

joule% /usr/bin/time /timeout 10s ./a.out
       10.02 real         0.00 user         0.00 sys

It looks like the issue is that we collect only one child status (cpid = wait(&status)), which happens to be the zombie from a.out (cpid != pid). We then loop to sigsuspend() and get stuck until the timeout expires.
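To make the failure mode above concrete, here is a minimal sketch. It is not the code from timeout.c: collect_one() is a hypothetical helper that collects exactly one status with wait(2) and reports whether it belongs to the monitored pid. When a zombie is collected first, the answer is "no", and a caller that only tries once per SIGCHLD would go back to waiting.

```c
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>

/*
 * Hypothetical helper illustrating the failure mode, not timeout.c itself:
 * collect exactly one child status and report whether it belongs to `pid`.
 */
static bool
collect_one(pid_t pid, int *pstatus)
{
	pid_t cpid;

	/* Blocks until any one child has exited; it may not be `pid`. */
	cpid = wait(pstatus);
	return (cpid == pid);
}
```

If a short-lived child exits before the monitored one, the first call returns false and the monitored pid's status is only seen on a second call.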
Just to note, GNU timeout handles this example correctly:

/usr/bin/time gtimeout 10 ./a.out
        1,02 real         0,00 user         0,00 sys
Created attachment 152960 [details]
Check the monitoring pid on sigchld

This patch checks for the monitored pid when a SIGCHLD arrives that is not from the monitored pid and only one process is left under the control of timeout(1).

It seems to fix the issue for me; can you confirm?
Created attachment 152967 [details] demo with 10 zombie children
It's not sufficient, because it needs to loop and collect all outstanding zombies. E.g. with this version of zombie.c (also attached) it still waits:

    #include <stdlib.h>
    #include <unistd.h>

    int
    main()
    {
        int i;

        for (i = 0; i < 10; i++)
            if (fork() == 0)
                exit(0);
        sleep(1);
        return (0);
    }
Could we simply look at what GNU timeout does instead of reinventing the wheel? The sources are in sysutils/coreutils. It handles the 10-zombie version well too.
Created attachment 152974 [details] loop to collect status from all children
(In reply to Ed Maste from comment #6) Am I right that the problem appears when the direct child forks and then exits before the grandchild? The patch seems to be the right thing to do anyway.
Ed, your patch looks good to me. Please commit.
A commit references this bug:

Author: emaste
Date: Sun Feb 15 20:10:54 UTC 2015
New revision: 278810
URL: https://svnweb.freebsd.org/changeset/base/278810

Log:
  timeout: handle zombie grandchildren

  timeout previously collected only one child status with wait(2). If this
  was one of the grandchildren, timeout would return to sigsuspend and wait
  until the timeout expired. Instead, loop for all children.

  PR: kern/197608
  Reviewed by: bapt, kib
  MFC after: 1 week
  Sponsored by: The FreeBSD Foundation

Changes:
  head/usr.bin/timeout/timeout.c
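For readers without the changeset at hand, the "loop for all children" idea can be sketched roughly as below. This is an illustration under assumptions, not the committed diff: reap_exited() is a hypothetical name, and using waitpid(2) with WNOHANG is one way to drain every already-exited child without blocking. The actual change in timeout.c may be structured differently.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>

/*
 * Hypothetical helper, not the committed code: collect every child that
 * has already exited and report whether `pid` was among them. On a true
 * return, *pstatus holds the wait status of `pid`.
 */
static bool
reap_exited(pid_t pid, int *pstatus)
{
	bool found = false;
	pid_t cpid;
	int status;

	assert(pstatus != NULL);

	/* WNOHANG: waitpid returns 0 instead of blocking when no child has exited. */
	while ((cpid = waitpid(-1, &status, WNOHANG)) != 0) {
		if (cpid == -1) {
			if (errno == EINTR)
				continue;	/* interrupted by a signal, retry */
			break;			/* ECHILD: no children left at all */
		}
		if (cpid == pid) {
			*pstatus = status;
			found = true;
		}
		/* Any other pid is a zombie that only needed collecting. */
	}
	return (found);
}
```

A caller would invoke this after each SIGCHLD and go back to sigsuspend() only when it returns false, so a zombie collected first no longer hides the exit of the monitored command.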