The wait4() syscall and the libc routines wait3() and waitpid() return the status of a terminated or stopped process in an encoded form, through the "status" parameter, which is passed by pointer. The standard header <sys/wait.h> provides macros for decoding it. Some of them are:

#define _WSTATUS(x) (_W_INT(x) & 0177)
#define _WSTOPPED 0177 /* _WSTATUS if process is stopped */
#define WIFSTOPPED(x) (_WSTATUS(x) == _WSTOPPED)
#define WIFSIGNALED(x) (_WSTATUS(x) != _WSTOPPED && _WSTATUS(x) != 0)

But FreeBSD 4 and 5 have a signal numbered 127, and termination by this signal is mistaken for a stop. Termination by signal 128 is mistaken for a normal exit with a core dump and no signal.

Fix:

As a quick-and-dirty fix, disable signals 127 and 128 entirely (decrease the value of _SIG_MAXSIG by 2; the new value should be 126). As a proper fix, change the kernel interface (the wait4() syscall).

-- NVA

How-To-Repeat:

Compile the following two test programs:

=== cut si.c ===
#include <sys/types.h>
#include <unistd.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
int main( int argc, char *argv[] )
{
  int sig;
  if( argc < 2 ) {
    fprintf( stderr, "usage: si signal-number\n" );
    return 2;
  }
  sig = strtol( argv[1], NULL, 0 );
  /* Restore the default action, then send the signal to ourselves. */
  signal( sig, SIG_DFL );
  kill( getpid(), sig );
  printf( "si: trace: after kill\n" );
  return 0;
}
=== end cut ===

=== cut sic.c ===
#include <sys/types.h>
#include <sys/wait.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
int main( int argc, char *argv[] )
{
  int rr;
  pid_t cpid;
  if( argc < 2 ) {
    fprintf( stderr, "usage: sic signal-number\n" );
    return 2;
  }
#if 0
  char bb[1000];
  snprintf( bb, sizeof bb, "./si %s", argv[1] );
  rr = system( bb );
#endif
  cpid = fork();
  if( cpid == -1 ) {
    fprintf( stderr, "fork(): failed\n" );
    return 2;
  }
  if( cpid == 0 ) {
    /* Child: run the helper that kills itself with the given signal. */
    execl( "./si", "./si", argv[1], (char *)NULL );
    _exit( 127 ); /* reached only if execl() failed */
  }
  else
    waitpid( cpid, &rr, 0 );
  /* Parent: decode the raw wait status with every <sys/wait.h> macro. */
  printf( "rr==%d==0x%X\n", rr, rr );
  printf( "Exited: %s\n", WIFEXITED(rr) ? "yes" : "no" );
  printf( "Stopped: %s\n", WIFSTOPPED(rr) ? "yes" : "no" );
  printf( "Signaled: %s\n", WIFSIGNALED(rr) ? "yes" : "no" );
  printf( "Exit status: %d\n", WEXITSTATUS(rr) );
  printf( "Stop sig: %d\n", WSTOPSIG(rr) );
  printf( "Term sig: %d\n", WTERMSIG(rr) );
  printf( "Coredumped: %s\n", WCOREDUMP(rr) ? "yes" : "no" );
  return 0;
}
=== end cut ===

Compile them:

cc -o si si.c
cc -o sic sic.c

and run "sic 126", "sic 127" and "sic 128". The resulting output is attached:

=== cut result log ===
netch@ox:~/tmp>./sic 126
rr==126==0x7E
Exited: no
Stopped: no
Signaled: yes
Exit status: 0
Stop sig: 0
Term sig: 126
Coredumped: no
netch@ox:~/tmp>./sic 127
rr==127==0x7F
Exited: no
Stopped: yes
Signaled: no
Exit status: 0
Stop sig: 0
Term sig: 127
Coredumped: no
netch@ox:~/tmp>./sic 128
rr==128==0x80
Exited: yes
Stopped: no
Signaled: no
Exit status: 0
Stop sig: 0
Term sig: 0
Coredumped: yes
=== end cut ===

With signal 127, WIFSTOPPED() is true. With signal 128, WCOREDUMP() is true and WIFEXITED() is true. ;(

Another test:

netch@ox:~/tmp>./si 127
[1]+ Stopped ./si 127
netch@ox:~/tmp>fg
./si 127

In this case bash falls into an infinite loop on waitpid(), consuming all available CPU. Of course, this is an ugly bash bug, but it is caused by the inconsistency in the kernel interface.

Version of the system on the testing host:

netch@ox:~>uname -mrs
FreeBSD 4.0-STABLE i386
netch@ox:~>fgrep __FreeBSD_version /usr/include/sys/param.h
#undef __FreeBSD_version
#define __FreeBSD_version 400019 /* Master, propagated to newvers */
netch@ox:~>
State Changed
From-To: open->feedback

Does this problem still occur in newer versions of FreeBSD, such as 4.3-RELEASE?
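The ambiguity in the original report can be reproduced without sending any signal, by applying the quoted macros to the status values from the result log. The sketch below is only an illustration under stated assumptions: the PR_-prefixed names are hypothetical local copies (renamed so they do not clash with <sys/wait.h>), PR_WIFEXITED and PR_WCOREDUMP are not quoted in the report and are assumed to follow the traditional BSD definitions, and a process killed by signal N without a core dump is assumed to yield status N, as the log shows.

```c
#include <stdio.h>

/* Hypothetical local copies of the macros quoted in the report. */
#define PR_WSTATUS(x)     ((x) & 0177)
#define PR_WSTOPPED       0177   /* PR_WSTATUS value if process is stopped */
#define PR_WIFSTOPPED(x)  (PR_WSTATUS(x) == PR_WSTOPPED)
#define PR_WIFSIGNALED(x) (PR_WSTATUS(x) != PR_WSTOPPED && PR_WSTATUS(x) != 0)

/* Assumed (not quoted in the report): traditional BSD definitions. */
#define PR_WIFEXITED(x)   (PR_WSTATUS(x) == 0)
#define PR_WCOREDUMP(x)   (((x) & 0200) != 0)

/* Assumption: termination by `sig` without a core dump gives status == sig. */
int status_for_termination(int sig)
{
    return sig;
}

/* Print how the macros above classify termination by `sig`. */
void describe(int sig)
{
    int st = status_for_termination(sig);

    printf("sig %3d: exited=%d stopped=%d signaled=%d coredump=%d\n",
        sig, PR_WIFEXITED(st), PR_WIFSTOPPED(st),
        PR_WIFSIGNALED(st), PR_WCOREDUMP(st));
}
```

Calling describe() for 126, 127 and 128 from a small main() classifies only 126 as signaled; 127 is classified as stopped and 128 as exited with the core-dump flag set, matching the result log.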
Adding to Audit-Trail.

On Sun, Jul 22, 2001 at 09:54:00AM +0300, Valentin Nechayev wrote:
> Sat, Jul 21, 2001 at 21:10:23, mike wrote about "Re: kern/19402: Signals 127 and 128 cannot be detected in wait4() interface":
>
> > Synopsis: Signals 127 and 128 cannot be detected in wait4() interface
> >
> > State-Changed-From-To: open->feedback
> > State-Changed-By: mike
> > State-Changed-When: Sat Jul 21 21:09:54 PDT 2001
> > State-Changed-Why:
> >
> > Does this problem still occur in newer versions of FreeBSD,
> > such as 4.3-RELEASE?
>
> Yes, it still occurs. Nobody changed macros in <sys/wait.h> to resolve this
> conflict, neither in RELENG_4 nor in HEAD.
>
> I can create proposition (in form of patch) how they should be changed
> (this will use fact that wait4() status is 32 bits, but only low 16 bits
> are used) but this will be ABI change with incompatibility for
> signals 64...128 when bit shifts are used and only 128 in expensive
> variant of multiply/delete.
> Yet another variant is to exclude signals 127 and 128, this variant
> AFAIU conflicts with POSIX.
>
> Another point view is this problem is most architectural and should be
> first discussed in -arch or -hackers, not in -bugs, and it (problem)
> is too complicated to fit in frames of gnats db. But IMO it does _not_
> mean the PR should be closed, because problem keeps.
>
>
> /netch
State Changed
From-To: feedback->suspended

This is still a problem. See the originator's comments in the Audit-Trail. Awaiting fix and committer.
> [problems with signals 127 and 128]

First, note that "clean" programs cannot use signals 127 and 128 because they do neither have a SIG* constant nor are in the range SIGRTMIN to SIGRTMAX. Therefore, I think it is inappropriate to make large changes to make them work. It suffices if wait4() and other interfaces cannot cause confusion.

Because sh returns exit status 128+sig for signal sig, signal 128 cannot be represented in an 8-bit exit status and would have to be aliased to another signal if it is kept.

The suggestion to modify wait4() ABI seems inappropriate for that reason.

Another option is to modify the highest signal number accepted by interfaces (while leaving the size of sigset_t and the like unchanged). This effectively removes signals 127 and 128 from the system. One problem results when having posix_spawn() from an old libc reset all signals to default (by passing posix_spawnattr_setsigdefault() a sigfillset()'ed sigset_t and enabling the POSIX_SPAWN_SETSIGDEF flag in the posix_spawnattr_t). It will then attempt to set all signals from 1 to 128 to the default action and fail the entire spawn if sigaction() fails. This could be allowed by having certain calls (such as sigaction() with SIG_DFL) return success without doing anything for signals 127 and 128. This is likely to get messy.

Alternatively, the default action for signals 127 and 128 could be changed to ignore (like SIGCHLD, SIGURG and SIGINFO), so that no process may terminate because of them. Processes can still send the signals and set handlers for them. Apart from the obvious effect that the process will not terminate when it receives such a signal without handling or masking it, FreeBSD also discards ignored signals even when they are masked (POSIX permits this). This could lead to unexpected results if a process is using sigwait() or a similar function.

Yet another approach would modify the wait4() system call, changing signals 127 and 128 to something that does not cause confusion. This seems ugly.

--
Jilles Tjoelker
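The 8-bit limit mentioned in this message can be made concrete with a little arithmetic. The sketch below assumes the 128+sig convention described above and that only the low 8 bits of an exit status survive; shell_status_for_signal() and show_shell_statuses() are hypothetical helpers and not part of any shell.

```c
#include <stdio.h>

/*
 * Hypothetical helper: the value a shell using the 128+sig convention
 * would try to report for a child killed by `sig`, truncated to the
 * 8 bits that an exit status can carry.
 */
unsigned int shell_status_for_signal(int sig)
{
    return (128u + (unsigned int)sig) & 0xffu;
}

/* Print the truncated status for a few signal numbers. */
void show_shell_statuses(void)
{
    int sigs[] = { 15, 126, 127, 128 };
    size_t i;

    for (i = 0; i < sizeof sigs / sizeof sigs[0]; i++)
        printf("signal %3d -> exit status %3u\n",
            sigs[i], shell_status_for_signal(sigs[i]));
}
```

Under these assumptions signal 127 still fits (status 255), while signal 128 wraps around to 0 and becomes indistinguishable from success.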
On Mon, 30 Apr 2012, Jilles Tjoelker wrote:

>> [problems with signals 127 and 128]
>
> First, note that "clean" programs cannot use signals 127 and 128 because
> they do neither have a SIG* constant nor are in the range SIGRTMIN to
> SIGRTMAX. Therefore, I think it is inappropriate to make large changes
> to make them work. It suffices if wait4() and other interfaces cannot
> cause confusion.

I agree with not making large changes, of course. I wonder if there is a technical reason why 127 and 128 were left out of SIGRTMAX.

In 4.4BSD, NSIG was only 32, with a comment saying that 33 is possible (since NSIG counts signal 0). Signal 32 would have caused fewer problems than signal 128 does now, but was left out.

In Linux (2.6.10 for x86-64, i386 and many others), NSIG is 32 and _NSIG is 64; apparently NSIG counts signal 0 but _NSIG doesn't, similarly to FreeBSD except for the spelling and value of _NSIG and all signals up to and including _NSIG being supported (SIGRTMIN is NSIG = 32 and SIGRTMAX is _NSIG = 64; FreeBSD uses the better spelling _SIG_MAXSIG for _NSIG); a max of 64 causes fewer technical problems and less bloat.

> Because sh returns exit status 128+sig for signal sig, signal 128 cannot
> be represented in an 8-bit exit status and would have to be aliased to
> another signal if it is kept.
>
> The suggestion to modify wait4() ABI seems inappropriate for that
> reason.
>
> Another option is to modify the highest signal number accepted by
> interfaces (while leaving the size of sigset_t and the like unchanged).
> This effectively removes signals 127 and 128 from the system. One
> problem results when having posix_spawn() from an old libc reset all
> signals to default (by passing posix_spawnattr_setsigdefault() a
> sigfillset()'ed sigset_t and enabling the POSIX_SPAWN_SETSIGDEF flag in
> the posix_spawnattr_t). It will then attempt to set all signals from 1
> to 128 to the default action and fail the entire spawn if sigaction()
> fails. This could be allowed by having certain calls (such as
> sigaction() with SIG_DFL) return success without doing anything for
> signals 127 and 128. This is likely to get messy.
>
> Alternatively, the default action for signals 127 and 128 could be
> changed to ignore (like SIGCHLD, SIGURG and SIGINFO), so that no process
> may terminate because of them. Processes can still send the signals and
> set handlers for them. Apart from the obvious effect that the process
> will not terminate when it receives such a signal without handling or
> masking it, FreeBSD also discards ignored signals even when they are
> masked (POSIX permits this). This could lead to unexpected results if a
> process is using sigwait() or a similar function.
>
> Yet another approach would modify the wait4() system call, changing
> signals 127 and 128 to something that does not cause confusion. This
> seems ugly.

I think I prefer disallowing signal 128 and not worry about unportable programs using it, and not changing anything for signal 127 and not worry about the ambiguous wait status from this.

Emulators give interesting problems with signal ranges. FreeBSD seems to handle these problems mostly correctly in the Linux emulator. First, it needs a host signal range larger than the target signal range. [0..126], [0..127] and [0..128] all exceed the Linux range of [0..64], so there is no problem yet. However, for mips under Linux, _NSIG is 128, so the full FreeBSD range might be needed, depending on how Linux handles the problem with wait statuses.

FreeBSD mostly uses the Linux _NSIG correctly, so it gets target limits. It also translates signal numbers below NSIG, so it knows a little about NSIG counting signal 0. However, in linux_ioctl.c, it still uses the old FreeBSD signal number NSIG in a private ISSIGVALID() macro instead of using its standard macro LINUX_SIG_VALID() which uses _NSIG correctly. ISSIGVALID() is only used for the VT_SETMODE ioctl, and FreeBSD's signal handling for this differs in other ways than Linux's (FreeBSD fixes up mode.frsig (but only if it and mode.acqsig are invalid according to the private macro), while Linux ignores mode.frsig). The private macro might even be correct, with making it look like a standard macro just obfuscating any magic for NSIG here.

Bruce
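The off-by-one between the two counting conventions described in this message is easy to get wrong. As a minimal sketch, assuming the Linux values quoted above (NSIG = 32 counting signal 0, _NSIG = 64 not counting it), two validity checks could look like this. The EX_ constants and both function names are hypothetical and are not the macros used in the FreeBSD Linux emulator.

```c
/* Illustrative constants taken from the values quoted in the message. */
enum {
    EX_NSIG = 32,       /* counts signal 0: classic signals are 1..31 */
    EX_SIG_MAXSIG = 64  /* highest signal number: signals are 1..64 */
};

/* Valid under the "NSIG counts signal 0" convention. */
int ex_classic_sig_valid(int sig)
{
    return sig > 0 && sig < EX_NSIG;
}

/* Valid under the "_NSIG is the highest signal number" convention. */
int ex_sig_valid(int sig)
{
    return sig > 0 && sig <= EX_SIG_MAXSIG;
}
```

Using the first check where the second is meant would reject signals 32 to 64, which is the kind of mismatch the message points at.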
Hi,

Mon, Apr 30, 2012 at 00:46:19, jilles wrote about "Re: kern/19402: Signals 127 and 128 cannot be detected in wait4() interface":

> > [problems with signals 127 and 128]
>
> First, note that "clean" programs cannot use signals 127 and 128 because
> they do neither have a SIG* constant nor are in the range SIGRTMIN to
> SIGRTMAX.

You are correct here now, but not at the time I have issued the original request. Values for SIGRTMIN, SIGRTMAX initially appeared only in version 1.47 (Oct 2005) and was incorrect. Revision 1.53 reduced SIGRTMAX from 128 to 126 exactly concerning this my PR. So, if we stick on treating 126 as maximal possible signal number which doesn't break existing ABI, all seems satisfied and I suggest simply to close it as fixed. No need to change any more.

-netch-
Mon, Apr 30, 2012 at 18:24:51, brde wrote about "Re: kern/19402: Signals 127 and 128 cannot be detected in wait4() interface":

> I think I prefer disallowing signal 128 and not worry about unportable
> programs using it, and not changing anything for signal 127 and not worry
> about the ambiguous wait status from this.

Since realtime signals are already a feature of very limited use, and a correct program doesn't allocate them in a manner linearly dependent on the number of descriptors it checks, I guess it's very improbable to see a program which uses more than 10-16 realtime signals. Our current limit of 62 is much more than that.

> However, for mips under Linux, _NSIG is 128,

If they didn't change the wait*() exit status ABI for MIPS (and as far as I can see from the code, this ABI is platform independent), Linux has the same problems with signals 127 and 128, and their usage is incorrect. I guess it's better to discuss the issue on LKML and wait for the Linux reaction.

-netch-
> You are correct here now, but not at the time I have issued the original
> request. Values for SIGRTMIN, SIGRTMAX initially appeared only in
> version 1.47 (Oct 2005) and was incorrect. Revision 1.53 reduced
> SIGRTMAX from 128 to 126 exactly concerning this my PR. So, if we stick
> on treating 126 as maximal possible signal number which doesn't break
> existing ABI, all seems satisfied and I suggest simply to close it as
> fixed. No need to change any more.

Forgot to mention _SIG_MAXSIG which also should be reduced if used.

-netch-
Closed at submitter's request (SIGRTMAX is 126).
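Whether a given system's limits avoid the ambiguity can be checked against its own <sys/wait.h>. The following is a sketch, assuming (as in the result log of the original report) that termination by signal N without a core dump yields a wait status equal to N; first_ambiguous_signal() is a hypothetical helper.

```c
#include <signal.h>
#include <sys/wait.h>

/*
 * Hypothetical helper: return the first signal in 1..SIGRTMAX whose plain
 * termination status is not decoded purely as "terminated by that signal",
 * or 0 if there is none.
 * Assumption: termination by `sig` without a core dump gives status == sig.
 */
int first_ambiguous_signal(void)
{
    int sig;

    for (sig = 1; sig <= SIGRTMAX; sig++) {
        int st = sig;

        if (!WIFSIGNALED(st) || WIFSTOPPED(st) || WIFEXITED(st) ||
            WTERMSIG(st) != sig)
            return sig;
    }
    return 0;
}
```

A return value of 0 means every signal from 1 to SIGRTMAX decodes as a plain termination. With SIGRTMAX at 126, signals 127 and 128 are never examined by this loop.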