Created attachment 152953
Attached is a zombie demonstration - it forks and the child immediately exits. The parent sleeps for one second and does not wait() for the child's status. As expected it terminates after 1 second:
joule% /usr/bin/time ./a.out
1.02 real 0.00 user 0.00 sys
However, running under timeout(1) results in waiting for the timeout period to expire:
joule% /usr/bin/time /timeout 10s ./a.out
10.02 real 0.00 user 0.00 sys
It looks like the issue is that we collect only one child status (cpid = wait(&status)), which happens to be the zombie from a.out (cpid != pid). We then loop to sigsuspend() and get stuck until the timeout expires.
Just to note, GNU timeout handles this example correctly:
/usr/bin/time gtimeout 10 ./a.out
1,02 real 0,00 user 0,00 sys
Created attachment 152960
Check the monitoring pid on sigchld
This patch checks for the monitored pid when receiving a SIGCHLD that is not from the monitored pid and only one process is left under the control of timeout(1).
Seems to fix the issue for me, can you confirm?
Created attachment 152967
demo with 10 zombie children
It's not sufficient because it needs to loop and collect all outstanding zombies. E.g. with this version of zombie.c (also attached) it still waits:
int main ()
for (i = 0; i < 10; i++)
if (fork() == 0)
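For reference, a self-contained sketch of a 10-zombie reproducer along these lines. This is an illustration only, not the attached zombie.c; the helper name spawn_zombies and the use of _exit() in the child are assumptions.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Fork n children that exit immediately and are never waited for, so each
 * one remains a zombie until the caller exits or reaps it. Returns the
 * number of children actually created. spawn_zombies is a hypothetical
 * helper name, not something taken from the attachment.
 */
int
spawn_zombies(int n)
{
	int created = 0;

	for (int i = 0; i < n; i++) {
		pid_t pid = fork();

		if (pid == 0)
			_exit(0);	/* child: exit at once */
		if (pid > 0)
			created++;	/* parent: deliberately no wait() */
	}
	return (created);
}
```

Calling spawn_zombies(10) from main() and then sleeping for one second before returning should give the behaviour described above: the program takes about one second on its own, and the unreaped children are handed to whichever process is acting as reaper once the parent exits.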
Could we simply look at what GNU timeout does instead of reinventing the wheel?
Sources are in sysutils/coreutils.
It handles the 10-zombie version well too.
Created attachment 152974
loop to collect status from all children
(In reply to Ed Maste from comment #6)
Am I right that the problem appears when the direct child forks and then exits before the grandchild?
The patch seems to be the right thing to do anyway.
Ed, your patch looks good to me; please commit.
A commit references this bug:
Date: Sun Feb 15 20:10:54 UTC 2015
New revision: 278810
timeout: handle zombie grandchildren
timeout previously collected only one child status with wait(2). If this
was one of the grandchildren timeout would return to sigsuspend and wait
until the timeout expired. Instead, loop for all children.
Reviewed by: bapt, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation