Bug 197608 - timeout(1) does not handle zombie grandchildren
Summary: timeout(1) does not handle zombie grandchildren
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: bin (show other bugs)
Version: CURRENT
Hardware: Any Any
: --- Affects Many People
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-02-14 02:59 UTC by Ed Maste
Modified: 2015-05-29 09:59 UTC (History)
3 users (show)

See Also:


Attachments
zombie demonstration (80 bytes, text/x-csrc)
2015-02-14 02:59 UTC, Ed Maste
no flags Details
Check the monitoring pid on sigchld (480 bytes, patch)
2015-02-14 09:41 UTC, Baptiste Daroussin
no flags Details | Diff
demo with 10 zombie children (150 bytes, text/x-csrc)
2015-02-14 12:10 UTC, Ed Maste
no flags Details
loop to collect status from all children (1.65 KB, patch)
2015-02-14 16:02 UTC, Ed Maste
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Ed Maste freebsd_committer freebsd_triage 2015-02-14 02:59:59 UTC
Created attachment 152953 [details]
zombie demonstration

Attached is a zombie demonstration - it forks and the child immediately exits. The parent sleeps for one second and does not wait() for the child's status. As expected it terminates after 1 second:

joule% /usr/bin/time ./a.out
        1.02 real         0.00 user         0.00 sys

However, running under timeout(1) results in waiting for the timeout period to expire:

joule% /usr/bin/time /timeout 10s ./a.out
       10.02 real         0.00 user         0.00 sys

It looks like the issue is that we collect only one child status (cpid = wait(&status)), which happens to be the zombie from a.out (cpid != pid). We then loop to sigsuspend() and get stuck until the timeout expires.
Comment 1 Andrey A. Chernov freebsd_committer freebsd_triage 2015-02-14 03:19:52 UTC
Just to notice, GNU timeout handle this example correctly:
/usr/bin/time gtimeout 10 ./a.out
        1,02 real         0,00 user         0,00 sys
Comment 2 Baptiste Daroussin freebsd_committer freebsd_triage 2015-02-14 09:41:13 UTC
Created attachment 152960 [details]
Check the monitoring pid on sigchld

This patch checks for the monitored pid when receiving a SIGCHILD not from the moniror pid and only 1 process is left under control of timeout(1)

Seems to fix the issue for me, can you confirm?
Comment 3 Ed Maste freebsd_committer freebsd_triage 2015-02-14 12:10:19 UTC
Created attachment 152967 [details]
demo with 10 zombie children
Comment 4 Ed Maste freebsd_committer freebsd_triage 2015-02-14 12:12:03 UTC
It's not sufficient because it needs to loop and collect all outstanding zombies. E.g. with this version of zombie.c (also attached) it still waits:

#include <stdlib.h>
#include <unistd.h>

int main ()
{
        int i;

        for (i = 0; i < 10; i++)
                if (fork() == 0)
                        exit (0);
        sleep (1);
        return (0);
}
Comment 5 Andrey A. Chernov freebsd_committer freebsd_triage 2015-02-14 13:20:02 UTC
Could we just simple look what GNU timeout does instead of reinventing the wheel?
Sources are in sysutils/coreutils.
It handles 10-zombies version well too.
Comment 6 Ed Maste freebsd_committer freebsd_triage 2015-02-14 16:02:56 UTC
Created attachment 152974 [details]
loop to collect status from all children
Comment 7 Konstantin Belousov freebsd_committer freebsd_triage 2015-02-14 18:28:44 UTC
(In reply to Ed Maste from comment #6)

Am I right that the problem appears when the direct child forked, and then exited before the grandchild ?

The patch seems to be a right thing to do anyway.
Comment 8 Baptiste Daroussin freebsd_committer freebsd_triage 2015-02-14 20:27:18 UTC
Ed your patch looks good to me please commit
Comment 9 commit-hook freebsd_committer freebsd_triage 2015-02-15 20:11:32 UTC
A commit references this bug:

Author: emaste
Date: Sun Feb 15 20:10:54 UTC 2015
New revision: 278810
URL: https://svnweb.freebsd.org/changeset/base/278810

Log:
  timeout: handle zombie grandchildren

  timeout previously collected only one child status with wait(2). If this
  was one of the grandchildren timeout would return to sigsuspend and wait
  until the timeout expired. Instead, loop for all children.

  PR:		kern/197608
  Reviewed by:	bapt, kib
  MFC after:	1 week
  Sponsored by:	The FreeBSD Foundation

Changes:
  head/usr.bin/timeout/timeout.c