Bug 19402 - Signals 127 and 128 cannot be detected in wait4() interface
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: Unspecified
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: freebsd-bugs (Nobody)
Reported: 2000-06-20 17:00 UTC by Valentin Nechayev
Modified: 2017-08-09 18:37 UTC

Description Valentin Nechayev 2000-06-20 17:00:01 UTC
The wait4() syscall and the libc routines wait3() and waitpid() return the
status of a terminated or stopped process in an encoded format, through the
"status" parameter passed by pointer. The standard header <sys/wait.h>
provides macros for decoding it. Some of them are:

#define _WSTATUS(x)     (_W_INT(x) & 0177)
#define _WSTOPPED       0177            /* _WSTATUS if process is stopped */
#define WIFSTOPPED(x)   (_WSTATUS(x) == _WSTOPPED)
#define WIFSIGNALED(x)  (_WSTATUS(x) != _WSTOPPED && _WSTATUS(x) != 0)

But FreeBSD 4 and 5 have a signal numbered 127, and termination by this signal
is mistaken for a stop. Termination by signal 128 is mistaken for an exit
with a core dump and no signal.

Fix: 

As a quick-and-dirty fix, disable signals 127-128 altogether (decrease the
value of _SIG_MAXSIG by 2; the new value should be 126).

As a proper fix, change the kernel interface (the wait4() syscall).

--
NVA
How-To-Repeat: 
Compile the following two test programs:

=== cut si.c ===
#include <sys/types.h>
#include <unistd.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
int main( int argc, char *argv[] )
{
        int sig;
        sig = strtol( argv[1], NULL, 0 );
        signal( sig, SIG_DFL );
        kill( getpid(), sig );
        printf( "si: trace: after kill\n" );
        return 0;
}
=== end cut ===

=== cut sic.c ===
#include <sys/types.h>
#include <sys/wait.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
int main( int argc, char *argv[] )
{
        int rr;
        pid_t cpid;
#if 0
        char bb[1000];
        snprintf( bb, sizeof bb, "./si %s", argv[1] );
        rr = system( bb );
#endif
        cpid = fork();
        if( cpid == -1 ) { fprintf( stderr, "fork(): failed\n" ); return 2; }
        if( cpid == 0 ) {
                execl( "./si", "./si", argv[1], (char *)NULL );
                _exit( 1 );     /* reached only if execl() failed */
        }
        else
                waitpid( cpid, &rr, 0 );
        printf( "rr==%d==0x%X\n", rr, rr );
        printf( "Exited: %s\n", WIFEXITED(rr) ? "yes" : "no" );
        printf( "Stopped: %s\n", WIFSTOPPED(rr) ? "yes" : "no" );
        printf( "Signaled: %s\n", WIFSIGNALED(rr) ? "yes" : "no" );
        printf( "Exit status: %d\n", WEXITSTATUS(rr) );
        printf( "Stop sig: %d\n", WSTOPSIG(rr) );
        printf( "Term sig: %d\n", WTERMSIG(rr) );
        printf( "Coredumped: %s\n", WCOREDUMP(rr) ? "yes" : "no" );
        return 0;
}
=== end cut ===

Compile them:
cc -o si si.c
cc -o sic sic.c

and run "sic 126", "sic 127" and "sic 128". The resulting output is attached:

=== cut result log ===
netch@ox:~/tmp>./sic 126
rr==126==0x7E
Exited: no
Stopped: no
Signaled: yes
Exit status: 0
Stop sig: 0
Term sig: 126
Coredumped: no
netch@ox:~/tmp>./sic 127
rr==127==0x7F
Exited: no
Stopped: yes
Signaled: no
Exit status: 0
Stop sig: 0
Term sig: 127
Coredumped: no
netch@ox:~/tmp>./sic 128
rr==128==0x80
Exited: yes
Stopped: no
Signaled: no
Exit status: 0
Stop sig: 0
Term sig: 0
Coredumped: yes
=== end cut ===

With signal 127, WIFSTOPPED() is true.
With signal 128, WCOREDUMP() is true and WIFEXITED() is true. ;(

Also another test:

netch@ox:~/tmp>./si 127
[1]+  Stopped                 ./si 127
netch@ox:~/tmp>fg
./si 127

and in this case bash falls into an infinite loop on waitpid(), consuming
all available CPU. Of course, this is an ugly bash bug, but it is caused by
the kernel interface inconsistency.

Version of system on testing host:

netch@ox:~>uname -mrs
FreeBSD 4.0-STABLE i386
netch@ox:~>fgrep __FreeBSD_version /usr/include/sys/param.h
#undef __FreeBSD_version
#define __FreeBSD_version 400019        /* Master, propagated to newvers */
netch@ox:~>
Comment 1 Mike Barcroft freebsd_committer freebsd_triage 2001-07-22 05:09:54 UTC
State Changed
From-To: open->feedback


Does this problem still occur in newer versions of FreeBSD, 
such as 4.3-RELEASE?
Comment 2 Mike Barcroft freebsd_committer freebsd_triage 2001-07-22 18:51:19 UTC
Adding to Audit-Trail.

On Sun, Jul 22, 2001 at 09:54:00AM +0300, Valentin Nechayev wrote:
>  Sat, Jul 21, 2001 at 21:10:23, mike wrote about "Re: kern/19402: Signals 127 and 128 cannot be detected in wait4() interface": 
> 
> > Synopsis: Signals 127 and 128 cannot be detected in wait4() interface
> > 
> > State-Changed-From-To: open->feedback
> > State-Changed-By: mike
> > State-Changed-When: Sat Jul 21 21:09:54 PDT 2001
> > State-Changed-Why: 
> > 
> > Does this problem still occur in newer versions of FreeBSD,
> > such as 4.3-RELEASE?
> 
> Yes, it still occurs. Nobody has changed the macros in <sys/wait.h> to
> resolve this conflict, neither in RELENG_4 nor in HEAD.
> 
> I can draft a proposal (in the form of a patch) for how they should be
> changed (it would use the fact that the wait4() status is 32 bits, but only
> the low 16 bits are used), but this would be an ABI change, incompatible for
> signals 64...128 when bit shifts are used and only for 128 in the more
> expensive multiply/divide variant.
> Yet another variant is to exclude signals 127 and 128; this variant,
> AFAIU, conflicts with POSIX.
> 
> Another point of view is that this problem is mostly architectural and
> should first be discussed in -arch or -hackers, not in -bugs, and that it
> is too complicated to fit within the GNATS database. But IMO that does _not_
> mean the PR should be closed, because the problem remains.
> 
> 
> /netch
Comment 3 Mike Barcroft freebsd_committer freebsd_triage 2001-07-22 19:03:04 UTC
State Changed
From-To: feedback->suspended


This is still a problem.  See the originator's comments in the 
Audit-Trail.  Awaiting fix and committer.
Comment 4 Jilles Tjoelker freebsd_committer freebsd_triage 2012-04-29 23:46:19 UTC
> [problems with signals 127 and 128]

First, note that "clean" programs cannot use signals 127 and 128 because
they neither have a SIG* constant nor are in the range SIGRTMIN to
SIGRTMAX. Therefore, I think it is inappropriate to make large changes
to make them work. It suffices if wait4() and other interfaces cannot
cause confusion.

Because sh returns exit status 128+sig for signal sig, signal 128 cannot
be represented in an 8-bit exit status and would have to be aliased to
another signal if it is kept.

The suggestion to modify wait4() ABI seems inappropriate for that
reason.

Another option is to modify the highest signal number accepted by
interfaces (while leaving the size of sigset_t and the like unchanged).
This effectively removes signals 127 and 128 from the system. One
problem results when having posix_spawn() from an old libc reset all
signals to default (by passing posix_spawnattr_setsigdefault() a
sigfillset()'ed sigset_t and enabling the POSIX_SPAWN_SETSIGDEF flag in
the posix_spawnattr_t). It will then attempt to set all signals from 1
to 128 to the default action and fail the entire spawn if sigaction()
fails. This could be allowed by having certain calls (such as
sigaction() with SIG_DFL) return success without doing anything for
signals 127 and 128. This is likely to get messy.

Alternatively, the default action for signals 127 and 128 could be
changed to ignore (like SIGCHLD, SIGURG and SIGINFO), so that no process
may terminate because of them. Processes can still send the signals and
set handlers for them. Apart from the obvious effect that the process
will not terminate when it receives such a signal without handling or
masking it, FreeBSD also discards ignored signals even when they are
masked (POSIX permits this). This could lead to unexpected results if a
process is using sigwait() or a similar function.

Yet another approach would modify the wait4() system call, changing
signals 127 and 128 to something that does not cause confusion. This
seems ugly.

-- 
Jilles Tjoelker
Comment 5 Bruce Evans freebsd_committer freebsd_triage 2012-04-30 09:24:51 UTC
On Mon, 30 Apr 2012, Jilles Tjoelker wrote:

>> [problems with signals 127 and 128]
>
> First, note that "clean" programs cannot use signals 127 and 128 because
> they neither have a SIG* constant nor are in the range SIGRTMIN to
> SIGRTMAX. Therefore, I think it is inappropriate to make large changes
> to make them work. It suffices if wait4() and other interfaces cannot
> cause confusion.

I agree with not making large changes, of course.

I wonder if there is a technical reason why 127 and 128 were left out
of SIGRTMAX.  In 4.4BSD, NSIG was only 32, with a comment saying that 33
is possible (since NSIG counts signal 0).  Signal 32 would have caused
fewer problems than signal 128 does now, but was left out.  In Linux
(2.6.10 for x86-64, i386 and many others), NSIG is 32 and _NSIG is 64;
apparently NSIG counts signal 0 but _NSIG doesn't, similarly to
FreeBSD except for the spelling and value of _NSIG and all signals up
to and including _NSIG being supported (SIGRTMIN is NSIG = 32 and
SIGRTMAX is _NSIG = 64; FreeBSD uses the better spelling _SIG_MAXSIG
for _NSIG); a max of 64 causes fewer technical problems and less bloat.

> Because sh returns exit status 128+sig for signal sig, signal 128 cannot
> be represented in an 8-bit exit status and would have to be aliased to
> another signal if it is kept.
>
> The suggestion to modify wait4() ABI seems inappropriate for that
> reason.
>
> Another option is to modify the highest signal number accepted by
> interfaces (while leaving the size of sigset_t and the like unchanged).
> This effectively removes signals 127 and 128 from the system. One
> problem results when having posix_spawn() from an old libc reset all
> signals to default (by passing posix_spawnattr_setsigdefault() a
> sigfillset()'ed sigset_t and enabling the POSIX_SPAWN_SETSIGDEF flag in
> the posix_spawnattr_t). It will then attempt to set all signals from 1
> to 128 to the default action and fail the entire spawn if sigaction()
> fails. This could be allowed by having certain calls (such as
> sigaction() with SIG_DFL) return success without doing anything for
> signals 127 and 128. This is likely to get messy.
>
> Alternatively, the default action for signals 127 and 128 could be
> changed to ignore (like SIGCHLD, SIGURG and SIGINFO), so that no process
> may terminate because of them. Processes can still send the signals and
> set handlers for them. Apart from the obvious effect that the process
> will not terminate when it receives such a signal without handling or
> masking it, FreeBSD also discards ignored signals even when they are
> masked (POSIX permits this). This could lead to unexpected results if a
> process is using sigwait() or a similar function.
> 
> Yet another approach would modify the wait4() system call, changing
> signals 127 and 128 to something that does not cause confusion. This
> seems ugly.

I think I prefer disallowing signal 128 and not worry about unportable
programs using it, and not changing anything for signal 127 and not worry
about the ambiguous wait status from this.

Emulators give interesting problems with signal ranges.  FreeBSD seems
to handle these problems mostly correctly in the Linux emulator.  First,
it needs a host signal range larger than the target signal range.
[0..126], [0..127] and [0..128] all exceed the Linux range of [0..64],
so there is no problem yet.  However, for mips under Linux, _NSIG is 128,
so the full FreeBSD range might be needed, depending on how Linux handles
the problem with wait statuses.   FreeBSD mostly uses the Linux _NSIG
correctly, so it gets target limits.  It also translates signal numbers
below NSIG, so it knows a little about NSIG counting signal 0.  However,
in linux_ioctl.c, it still uses the old FreeBSD signal number NSIG in a
private ISSIGVALID() macro instead of using its standard macro
LINUX_SIG_VALID() which uses _NSIG correctly.  ISSIGVALID() is only used
for the VT_SETMODE ioctl, and FreeBSD's signal handling for this differs
from Linux's in other ways (FreeBSD fixes up mode.frsig (but only if it
and mode.acqsig are invalid according to the private macro), while Linux
ignores mode.frsig).  The private macro might even be correct, with making
it look like a standard macro just obfuscating any magic for NSIG here.

Bruce
Comment 6 Valentin Nechayev 2012-04-30 10:04:54 UTC
Hi,

 Mon, Apr 30, 2012 at 00:46:19, jilles wrote about "Re: kern/19402: Signals 127 and 128 cannot be detected in wait4() interface": 

> > [problems with signals 127 and 128]
> 
> First, note that "clean" programs cannot use signals 127 and 128 because
> they neither have a SIG* constant nor are in the range SIGRTMIN to
> SIGRTMAX.

You are correct now, but that was not so when I filed the original
request. Values for SIGRTMIN and SIGRTMAX first appeared only in
version 1.47 (Oct 2005) and were incorrect. Revision 1.53 reduced
SIGRTMAX from 128 to 126 precisely because of this PR. So, if we stick
to treating 126 as the maximum signal number that doesn't break the
existing ABI, everything seems satisfied and I suggest simply closing it as
fixed. No further changes are needed.


-netch-
Comment 7 Valentin Nechayev 2012-04-30 10:22:54 UTC
 Mon, Apr 30, 2012 at 18:24:51, brde wrote about "Re: kern/19402: Signals 127 and 128 cannot be detected in wait4() interface": 

> I think I prefer disallowing signal 128 and not worry about unportable
> programs using it, and not changing anything for signal 127 and not worry
> about the ambiguous wait status from this.

Since realtime signals are already a feature of very limited use, and a
correct program doesn't allocate them in a manner linearly dependent on
the number of descriptors it watches, I guess it's highly improbable to
see a program that uses more than 10-16 realtime signals. Our current
limit of 62 is much more.

> However, for mips under Linux, _NSIG is 128,

If they didn't change the wait*() exit status ABI on MIPS (and as far
as I can see from the code, this ABI is platform-independent), Linux has the
same problems with signals 127 and 128, and their usage is incorrect.
I guess it's better to raise the issue on LKML and wait for the Linux reaction.


-netch-
Comment 8 Valentin Nechayev 2012-05-06 08:09:29 UTC
> You are correct now, but that was not so when I filed the original
> request. Values for SIGRTMIN and SIGRTMAX first appeared only in
> version 1.47 (Oct 2005) and were incorrect. Revision 1.53 reduced
> SIGRTMAX from 128 to 126 precisely because of this PR. So, if we stick
> to treating 126 as the maximum signal number that doesn't break the
> existing ABI, everything seems satisfied and I suggest simply closing it as
> fixed. No further changes are needed.

I forgot to mention _SIG_MAXSIG, which should also be reduced if it is used.


-netch-
Comment 9 Eugene Grosbein freebsd_committer freebsd_triage 2017-08-09 18:37:56 UTC
Closed at submitter's request (SIGRTMAX is 126).