CURRENT 55c28005f544282b984ae0e15dacd0c108d8ab12

I believe this is a problem with the reaper and not timeout(1).

Shell read with a timeout and a SIGINFO trap. Expect it to get status 157 (SIGINFO) only. With timeout(1) it also gets a 142 (SIGALRM), which causes an early timeout.

Expected:
```
sh -c 'pread() { while :; do read -t 50 n; ret=$?; echo $ret; case $ret in 142) exit 1; esac; done; }; trap "echo info" INFO; pread; echo done'
# ^T
load: 0.17 cmd: sh 4824 [select] 0.35r 0.00u 0.01s 0% 3296k mi_switch+0x172 sleepq_switch+0x109 sleepq_catch_signals+0x276 sleepq_timedwait_sig+0x12 _cv_timedwait_sig_sbt+0x187 kern_select+0xa93 sys_select+0x57 amd64_syscall+0x451 fast_syscall_common+0xf8
info
157 # SIGINFO
```

Actual:
```
timeout -v 100 sh -c 'pread() { while :; do read -t 50 n; ret=$?; echo $ret; case $ret in 142) exit 1; esac; done; }; trap "echo info" INFO; pread; echo done'
# ^T
load: 0.35 cmd: sh 5283 [select] 0.45r 0.00u 0.00s 0% 3288k mi_switch+0x172 sleepq_switch+0x109 sleepq_catch_signals+0x276 sleepq_timedwait_sig+0x12 _cv_timedwait_sig_sbt+0x187 kern_select+0xa93 sys_select+0x57 amd64_syscall+0x451 fast_syscall_common+0xf8
info
157 # SIGINFO
timeout: received signal INFO(29)
timeout: sending signal INFO(29) to command 'sh'
timeout: signaled 1 processes
timeout: sending signal CONT(19) to command 'sh'
info
157 # SIGINFO
142 # SIGALRM ???
```

Note timeout(1) forwards the SIGINFO so it is received twice. That's "fine". The SIGALRM isn't fine.
Could Bryan or Peter reproduce this spurious SIGALRM issue under ktrace?
https://people.freebsd.org/~bdrewery/pr-290844.txt Took 2 SIGINFO to make it happen with ktrace for some reason but it did return SIGALRM.
22638 timeout GIO fd 2 wrote 9 bytes "timeout: "
22638 timeout RET write 9
22638 timeout CALL write(0x2,0xdbabfbe4520,0x27)
22638 timeout GIO fd 2 wrote 39 bytes "sending signal CONT(19) to command 'sh'"
22638 timeout RET write 39/0x27
22638 timeout CALL write(0x2,0xdbac192f587,0x1)
22638 timeout GIO fd 2 wrote 1 byte " "
22638 timeout RET write 1
22638 timeout CALL getpid
22638 timeout RET getpid 22638/0x586e
22638 timeout CALL procctl(P_PID,0x586e,PROC_REAP_KILL,0xdbabfbe4c40)
22643 sh RET select -1 errno 4 Interrupted system call
22643 sh CALL write(0x1,0x437a0ee54000,0x4)
22638 timeout RET procctl 0
22638 timeout CALL sigsuspend(0xdbabfbe4f18)
22638 timeout PSIG SIGCHLD caught handler=0xdb29fa6d1c0 mask=0xfffefeff code=CLD_CONTINUED
22638 timeout RET sigsuspend -1 errno 4 Interrupted system call
22638 timeout CALL sigreturn(0xdbabfbe4000)
22638 timeout RET sigreturn JUSTRETURN
22638 timeout CALL wait6(P_ALL,-1,0xdbabfbe4c8c,0x11<WNOHANG|WEXITED>,0,0xdbabfbe4e30)
22638 timeout RET wait6 0
22638 timeout CALL sigsuspend(0xdbabfbe4f18)
22643 sh GIO fd 1 wrote 4 bytes "142 "
Sorry but I do not see any SIGALRM delivered to any process in the kdump. Can you please point me to the line number in the pr-290844.txt where SIGALRM delivery is reported?
(In reply to Konstantin Belousov from comment #4) My analysis isn't correct. I don't see an explicit SIGALRM either. timeout does not have any default SIGALRM/142 values so I don't see how it got it either. Something about the sigsuspend [EINTR] maybe. Although I don't see why the child would exit.
I think this is relevant. Timeout sends SIGCONT and the child gets interrupted and exits thinking it got a timeout on select. Sh's read code assumes that [EINTR] is due to timeout. So the real problem is spurious [EINTR] on SIGCONT.

    if (tv.tv_sec >= 0) {
        /*
         * Wait for something to become available.
         */
        FD_ZERO(&ifds);
        FD_SET(0, &ifds);
        status = select(1, &ifds, NULL, NULL, &tv);
        /*
         * If there's nothing ready, return an error.
         */
        if (status <= 0) {
            while (*ap != NULL)
                setvar(*ap++, "", 0);
            sig = pendingsig;
            return (128 + (sig != 0 ? sig : SIGALRM));
        }
    }

The ktrace part:

22638 timeout GIO fd 2 wrote 39 bytes "sending signal CONT(19) to command 'sh'"
22638 timeout RET write 39/0x27
22638 timeout CALL write(0x2,0xdbac192f587,0x1)
22638 timeout GIO fd 2 wrote 1 byte " "
22638 timeout RET write 1
22638 timeout CALL getpid
22638 timeout RET getpid 22638/0x586e
22638 timeout CALL procctl(P_PID,0x586e,PROC_REAP_KILL,0xdbabfbe4c40)
22643 sh RET select -1 errno 4 Interrupted system call
22643 sh CALL write(0x1,0x437a0ee54000,0x4)
22638 timeout RET procctl 0
22638 timeout CALL sigsuspend(0xdbabfbe4f18)
22638 timeout PSIG SIGCHLD caught handler=0xdb29fa6d1c0 mask=0xfffefeff code=CLD_CONTINUED
22638 timeout RET sigsuspend -1 errno 4 Interrupted system call
22638 timeout CALL sigreturn(0xdbabfbe4000)
22638 timeout RET sigreturn JUSTRETURN
22638 timeout CALL wait6(P_ALL,-1,0xdbabfbe4c8c,0x11<WNOHANG|WEXITED>,0,0xdbabfbe4e30)
22638 timeout RET wait6 0
22638 timeout CALL sigsuspend(0xdbabfbe4f18)
22643 sh GIO fd 1 wrote 4 bytes "142 "
22643 sh RET write 4
22643 sh CALL _exit(0x1)
(In reply to Bryan Drewery from comment #6)
> Sh's read code assumes that [EINTR] is due to timeout

When no signal has been processed. So yes, it's not a *SIGALRM*, just a random [EINTR] which sh assumes is SIGALRM.
(In reply to Bryan Drewery from comment #7)
Ok, this is much less mysterious then.

I would say that the behavior where ANYSTOP/SIGCONT causes a spurious EINTR is too well established to be changed. I recently changed ptrace(PT_ATTACH) to not interrupt sleeps in this way, and that is still not settled. I think that changing a much more common scenario would cause too much breakage.

In other words, I believe that the fix belongs in sh(1) and not in the kernel. Perhaps sh should check whether the timeout was actually reached before pretending to generate a fake SIGALRM, if there is any timeout at all.
(In reply to Konstantin Belousov from comment #8)
Fair enough. Appreciate your response. I'll look into a fix for sh.

If anyone else runs into this, a workaround is to pass "--foreground" to timeout(1), which will avoid it sending the extra signals.
I'm unable to reproduce the issue, but I believe this should fix it: https://reviews.freebsd.org/D53761
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=3c2643a7dbac370b7232f4e5ac15fd77b9ff396d

commit 3c2643a7dbac370b7232f4e5ac15fd77b9ff396d
Author:     Dag-Erling Smørgrav <des@FreeBSD.org>
AuthorDate: 2025-11-19 10:43:13 +0000
Commit:     Dag-Erling Smørgrav <des@FreeBSD.org>
CommitDate: 2025-11-19 10:43:59 +0000

    sh: Don't assume EINTR means SIGALRM

    While waiting for input in the read builtin, if select() is interrupted
    but there is no pending signal, we act like we timed out, and return the
    same status as if we had been interrupted by SIGALRM, instead of looping
    until we actually do time out.

    * Replace the single select() call with a ppoll() loop.

    * Improve validation of the timeout value. We now accept things like
      "1h30m15s", which we used to silently truncate to "1h". The flip side
      is that we no longer accept things like "1hour" or "5sec".

    * Modify the existing `read -t 0` test case to verify that read returns
      immediately when there is input and fails immediately when there isn't.

    * Add a second test case which performs the same tests with a non-zero
      timeout value.

    PR:             290844
    MFC after:      1 week
    Fixes:          c4539460e3a4 ("sh: Improve error handling in read builtin:")
    Reviewed by:    jilles, bdrewery
    Differential Revision:  https://reviews.freebsd.org/D53761

 bin/sh/miscbltin.c                   | 83 +++++++++++++++++++++++++-----------
 bin/sh/sh.1                          |  6 ++-
 bin/sh/tests/builtins/Makefile       |  1 +
 bin/sh/tests/builtins/read11.0       | 19 ++++++++-
 bin/sh/tests/builtins/read12.0 (new) | 32 ++++++++++++++
 5 files changed, 112 insertions(+), 29 deletions(-)
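For readers wondering what the stricter timeout validation described in the commit message might look like, here is a rough sketch. It is not the code from the commit: `parse_timeout` is a hypothetical helper, it handles whole numbers only, and it does not enforce any ordering of the units, all of which are assumptions made for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a timeout such as "90", "5s" or "1h30m15s"
 * into whole seconds.  Returns 0 on success, or -1 if the string is
 * malformed or the total would overflow.
 */
int
parse_timeout(const char *s, long *secs)
{
	long total = 0, mult, val;
	char *end;

	assert(s != NULL && secs != NULL);
	if (*s == '\0')
		return (-1);
	while (*s != '\0') {
		/* Each component must start with a digit; this is what
		 * rejects the leftover "our" in "1hour". */
		if (!isdigit((unsigned char)*s))
			return (-1);
		errno = 0;
		val = strtol(s, &end, 10);
		if (errno != 0)
			return (-1);
		switch (*end) {
		case 'h': mult = 3600; end++; break;
		case 'm': mult = 60; end++; break;
		case 's': mult = 1; end++; break;
		case '\0': mult = 1; break;	/* bare number: seconds */
		default: return (-1);		/* unknown unit letter */
		}
		if (val > (LONG_MAX - total) / mult)
			return (-1);		/* would overflow */
		total += val * mult;
		s = end;
	}
	*secs = total;
	return (0);
}
```

Under these assumptions "1h30m15s" parses to 5415 seconds, while "1hour" and "5sec" are rejected rather than truncated.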
A commit in branch stable/15 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=fb57eac42c1598119aa8614f3886dca0379ae816

commit fb57eac42c1598119aa8614f3886dca0379ae816
Author:     Dag-Erling Smørgrav <des@FreeBSD.org>
AuthorDate: 2025-11-19 10:43:13 +0000
Commit:     Dag-Erling Smørgrav <des@FreeBSD.org>
CommitDate: 2025-11-25 03:18:24 +0000

    sh: Don't assume EINTR means SIGALRM

    While waiting for input in the read builtin, if select() is interrupted
    but there is no pending signal, we act like we timed out, and return the
    same status as if we had been interrupted by SIGALRM, instead of looping
    until we actually do time out.

    * Replace the single select() call with a ppoll() loop.

    * Improve validation of the timeout value. We now accept things like
      "1h30m15s", which we used to silently truncate to "1h". The flip side
      is that we no longer accept things like "1hour" or "5sec".

    * Modify the existing `read -t 0` test case to verify that read returns
      immediately when there is input and fails immediately when there isn't.

    * Add a second test case which performs the same tests with a non-zero
      timeout value.

    PR:             290844
    MFC after:      1 week
    Fixes:          c4539460e3a4 ("sh: Improve error handling in read builtin:")
    Reviewed by:    jilles, bdrewery
    Differential Revision:  https://reviews.freebsd.org/D53761

    (cherry picked from commit 3c2643a7dbac370b7232f4e5ac15fd77b9ff396d)

 bin/sh/miscbltin.c                   | 83 +++++++++++++++++++++++++-----------
 bin/sh/sh.1                          |  6 ++-
 bin/sh/tests/builtins/Makefile       |  1 +
 bin/sh/tests/builtins/read11.0       | 19 ++++++++-
 bin/sh/tests/builtins/read12.0 (new) | 32 ++++++++++++++
 5 files changed, 112 insertions(+), 29 deletions(-)
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=7f691e07efe63ea01273833e44fd03ee00106b2b

commit 7f691e07efe63ea01273833e44fd03ee00106b2b
Author:     Dag-Erling Smørgrav <des@FreeBSD.org>
AuthorDate: 2025-11-19 10:43:13 +0000
Commit:     Dag-Erling Smørgrav <des@FreeBSD.org>
CommitDate: 2025-11-25 03:19:16 +0000

    sh: Don't assume EINTR means SIGALRM

    While waiting for input in the read builtin, if select() is interrupted
    but there is no pending signal, we act like we timed out, and return the
    same status as if we had been interrupted by SIGALRM, instead of looping
    until we actually do time out.

    * Replace the single select() call with a ppoll() loop.

    * Improve validation of the timeout value. We now accept things like
      "1h30m15s", which we used to silently truncate to "1h". The flip side
      is that we no longer accept things like "1hour" or "5sec".

    * Modify the existing `read -t 0` test case to verify that read returns
      immediately when there is input and fails immediately when there isn't.

    * Add a second test case which performs the same tests with a non-zero
      timeout value.

    PR:             290844
    MFC after:      1 week
    Fixes:          c4539460e3a4 ("sh: Improve error handling in read builtin:")
    Reviewed by:    jilles, bdrewery
    Differential Revision:  https://reviews.freebsd.org/D53761

    (cherry picked from commit 3c2643a7dbac370b7232f4e5ac15fd77b9ff396d)

 bin/sh/miscbltin.c                   | 83 +++++++++++++++++++++++++-----------
 bin/sh/sh.1                          |  6 ++-
 bin/sh/tests/builtins/Makefile       |  1 +
 bin/sh/tests/builtins/read11.0       | 19 ++++++++-
 bin/sh/tests/builtins/read12.0 (new) | 32 ++++++++++++++
 5 files changed, 112 insertions(+), 29 deletions(-)