Bug 251227 - setpgid sometimes returns ESRCH instead of EACCES
Summary: setpgid sometimes returns ESRCH instead of EACCES
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: CURRENT
Hardware: Any Any
: --- Affects Many People
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-11-18 01:18 UTC by Mahmoud Al-Qudsi
Modified: 2020-11-19 18:33 UTC (History)
1 user (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Mahmoud Al-Qudsi 2020-11-18 01:18:09 UTC
setpgid(2) says

>      [ESRCH]            The requested process does not exist.
>      [ESRCH]            The target process is not the calling process or a
>                         child of the calling process.
>      [EACCES]           The requested process is a child of the calling
>                         process, but it has performed an exec(3) operation.

However, as demonstrated by the following test, FreeBSD is incorrectly returning ESRCH when setpgid(2) is called on a child that has called both fork(2) and execv(3) by the time the parent calls setpgid(2).

```
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

int main() {
    pid_t pid = vfork();
    if (pid == 0) {
        // Child process
        char * const argv[] = { "env", "true", NULL };
        execv("/usr/bin/env", argv);
        assert(0 && "child returned after execv!");
    }

    // Parent process
    sleep(1);
    int result = setpgid(pid, pid);
    assert(result != 0);
    printf("error code: %d\n", errno);
    assert(errno == EACCES && "incorrect error code returned!");
    return 0;
}
```

The test above fails on FreeBSD 11.2 and FreeBSD 12.2; it passes on Linux.

The underlying issue appears to be some sort of race condition: the implementation of sys_setpgid in sys/kern/kern_prot.c correctly checks for the P_EXEC flag and sets errno to EACCES if present.

You'll notice the use of vfork(2) in the test above, which, though not guaranteed, in practice blocks execution of the parent process until exec(3) has been called in the child. If the sleep(1) call is omitted from the test, the test passes (EACCES is returned because execv(3) has been called by the child process). With or without the call to sleep(1), the unreaped child process remains present until the parent terminates (i.e. the referenced target pid does exist). Fundamentally, the behavior of the parent (and in particular, the return code from its call to setpgid) should be the same both with and without the call to sleep(1), given the semantics of vfork(3).

(The behavior does not change if /usr/bin/true is spawned directly instead of via /usr/bin/env, i.e. this is not an issue with child vs grandchild.)
Comment 1 Conrad Meyer freebsd_committer freebsd_triage 2020-11-18 01:57:57 UTC
> the unreaped child process remains present until the parent terminates (i.e. the referenced target pid does exist)

I don't know if it's true that we consider zombie processes "to exist" in the sense of this manual page.  In implementation, we don't — setpgid(2) uses pfind(9), which ignores zombie processes, instead of pfind_any(9), which can locate zombie processes.  This is relatively common: e.g., there are 30 invocations of pfind() in the core kernel, but only 5 of pfind_any().

I'm not sure this behavior difference with Linux matters.  Can you explain more about why you think this is a bug?
Comment 2 Conrad Meyer freebsd_committer freebsd_triage 2020-11-18 02:00:45 UTC
(If you change the exec in the child to run 'sleep 10' or something like that, rather than 'env', I'd expect you would observe EACCES instead of ESRCH.)
Comment 3 Mahmoud Al-Qudsi 2020-11-18 02:17:57 UTC
The main issue is that a failed zombie race breaks job control in shells; that's actually what led me to file this issue. We were getting reports of setpgid(2) failure when setting up a job in fish [0].

Typical job control setup involves setting up a new pgrp that has control of the terminal; by convention the pgrp is assigned the pid of the first process executed in the job pipeline. When executing `foo | bar`, there's obviously no hard guarantee that by the time the shell forks to init `bar`, `foo` has not yet finished execution (except if you add cross-process synchronization post-fork but pre-exec, which is extremely heavy handed and performs noticeably poorly). Shells count on the fact that as long as they have not reaped `foo`, then job pgrp with the same pid as `foo` will still be around by the time the shell calls setpgid for `bar`.

Apart from the bigger issue that using pfind() instead of pfind_any() here prevents a subsequent process in the same job from getting access to a shell that was assigned over to the newly minted pgrp that now contains only zombies, EACCES is used to distinguish between actual errors calling setpgid (e.g. EPERM, EINVAL, and in other cases, ESRCH) that qualify as exceptions stemming from incorrect call semantics from the unavoidable race condition where a shell needs to call setpgid but is scheduled after the child's fork+exec has already occurred. So shells abort or error out when ESRCH is returned, but silently ignore EACCES because it's an expected race condition. This exact behavior is actually spelled out in the POSIX.1-2004's setpgid page under the section entitled "RATIONALE" [1] (I don't have a copy of POSIX.1-2001 in front of me right now).

[0]: https://github.com/fish-shell/fish-shell/issues/7474
[1]: https://pubs.opengroup.org/onlinepubs/009695399/functions/setpgid.html
Comment 4 Conrad Meyer freebsd_committer freebsd_triage 2020-11-18 17:53:06 UTC
Thanks for the additional context, Mahmoud!

I guess I'm confused on why this is a problem.  Can fish just ignore ESRCH in this context on FreeBSD?  Certainly half a dozen other shells run on FreeBSD and implement some form of job control.  E.g., /bin/sh ignores the return value of setpgid() entirely.

I don't see any requirement in POSIX around FreeBSD's ESRCH behavior (I looked at it briefly yesterday, too).
Comment 5 Mahmoud Al-Qudsi 2020-11-19 03:41:45 UTC
Thanks for bringing my attention to the approach taken by sh - it made me
realize what I'd missed. So you're right, there's no broken job control here.

This is a real issue:

> [..] prevents a subsequent process in the same job from getting access to a
> shell that was assigned over to the newly minted pgrp that now contains only
> zombies

but it doesn't apply to FreeBSD, which is why sh can get away with ignoring any
setpgid failures. (Interestingly, tcsh handles this situation by swapping pgrps
in the middle of a single job invocation.)

It turns out that while FreeBSD does not consider zombie processes as valid
targets of a setpgid(2) call, it *does* keep their pgrp around and permits
other processes to join the pgrp even if it only contains zombies.

While the original issue persists and setpgid(zombie_pid, zombie_pid) calls fail
with ESRCH instead of the expected EACCES, as you say, this is trivially handled
if you have any degree of confidence that you are specifying a target process
that is or was a valid child pid from the same session... and (more tenuously)
assuming you know this is the behavior on at least some of the platforms you are
targeting.

It's not the end of the world by any means, but I would argue that the
consideration of zombie child processes should be an everywhere or nowhere at
all type of thing (and this is where my assumption of broken job control came
from). e.g. Linux considers zombie processes in setpgid calls (only to determine
the error code) and lets new processes join pgrps that contain only zombies,
whereas some kernels (e.g. WSLv1 up until recently) don't consider zombies at
all and reclaim pgrps as soon as the last process in it exits (before it's
reaped).

That aside, I'm assuming we can all agree it would be great for the man pages to
clarify the behavior when it comes to zombie procs. Are you open to a patch?

Thanks!
Comment 6 Conrad Meyer freebsd_committer freebsd_triage 2020-11-19 18:33:13 UTC
Admittedly, I am not especially familiar with job control, shells, or setpgid.  It might be desirable for zombies to be included in more places they currently are not.  I really don't know.

I would love a patch that improves the documentation clarity in this area.