Bug 290844 - timeout(1)+sh(1): read -t spurious timeout on signal
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 16.0-CURRENT
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: Dag-Erling Smørgrav
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2025-11-06 19:20 UTC by Bryan Drewery
Modified: 2025-11-25 03:23 UTC

See Also:
des: mfc-stable15+
des: mfc-stable14+


Description Bryan Drewery freebsd_committer freebsd_triage 2025-11-06 19:20:35 UTC
CURRENT 55c28005f544282b984ae0e15dacd0c108d8ab12

I believe this is a problem with the reaper and not timeout(1).

A shell read with a timeout and a SIGINFO trap. I expect it to return status 157 (SIGINFO) only.
With timeout(1) it also returns 142 (SIGALRM), which causes an early timeout.

Expected:
```
sh -c 'pread() { while :; do read -t 50 n; ret=$?; echo $ret;  case $ret in 142) exit 1; esac; done; }; trap "echo info" INFO; pread; echo done'
# ^T
load: 0.17  cmd: sh 4824 [select] 0.35r 0.00u 0.01s 0% 3296k
mi_switch+0x172 sleepq_switch+0x109 sleepq_catch_signals+0x276 sleepq_timedwait_sig+0x12 _cv_timedwait_sig_sbt+0x187 kern_select+0xa93 sys_select+0x57 amd64_syscall+0x451 fast_syscall_common+0xf8
info
157 # SIGINFO
```

Actual:
```
timeout -v 100 sh -c 'pread() { while :; do read -t 50 n; ret=$?; echo $ret;  case $ret in 142) exit 1; esac; done; }; trap "echo info" INFO; pread; echo done'
# ^T
load: 0.35  cmd: sh 5283 [select] 0.45r 0.00u 0.00s 0% 3288k
mi_switch+0x172 sleepq_switch+0x109 sleepq_catch_signals+0x276 sleepq_timedwait_sig+0x12 _cv_timedwait_sig_sbt+0x187 kern_select+0xa93 sys_select+0x57 amd64_syscall+0x451 fast_syscall_common+0xf8
info
157 # SIGINFO
timeout: received signal INFO(29)
timeout: sending signal INFO(29) to command 'sh'
timeout: signaled 1 processes
timeout: sending signal CONT(19) to command 'sh'
info
157 # SIGINFO
142 # SIGALRM ???
```

Note that timeout(1) forwards the SIGINFO, so it is received twice. That's "fine". The SIGALRM isn't fine.
Comment 1 Konstantin Belousov freebsd_committer freebsd_triage 2025-11-06 19:38:00 UTC
Could Bryan or Peter reproduce this spurious SIGALRM issue under ktrace?
Comment 2 Bryan Drewery freebsd_committer freebsd_triage 2025-11-06 19:44:38 UTC
https://people.freebsd.org/~bdrewery/pr-290844.txt

It took 2 SIGINFOs to make it happen under ktrace for some reason, but it did return SIGALRM.
Comment 3 Bryan Drewery freebsd_committer freebsd_triage 2025-11-06 19:45:12 UTC
22638 timeout  GIO   fd 2 wrote 9 bytes
       "timeout: "
 22638 timeout  RET   write 9
 22638 timeout  CALL  write(0x2,0xdbabfbe4520,0x27)
 22638 timeout  GIO   fd 2 wrote 39 bytes
       "sending signal CONT(19) to command 'sh'"
 22638 timeout  RET   write 39/0x27
 22638 timeout  CALL  write(0x2,0xdbac192f587,0x1)
 22638 timeout  GIO   fd 2 wrote 1 byte
       "
       "
 22638 timeout  RET   write 1
 22638 timeout  CALL  getpid
 22638 timeout  RET   getpid 22638/0x586e
 22638 timeout  CALL  procctl(P_PID,0x586e,PROC_REAP_KILL,0xdbabfbe4c40)
 22643 sh       RET   select -1 errno 4 Interrupted system call
 22643 sh       CALL  write(0x1,0x437a0ee54000,0x4)
 22638 timeout  RET   procctl 0
 22638 timeout  CALL  sigsuspend(0xdbabfbe4f18)
 22638 timeout  PSIG  SIGCHLD caught handler=0xdb29fa6d1c0 mask=0xfffefeff code=CLD_CONTINUED
 22638 timeout  RET   sigsuspend -1 errno 4 Interrupted system call
 22638 timeout  CALL  sigreturn(0xdbabfbe4000)
 22638 timeout  RET   sigreturn JUSTRETURN
 22638 timeout  CALL  wait6(P_ALL,-1,0xdbabfbe4c8c,0x11<WNOHANG|WEXITED>,0,0xdbabfbe4e30)
 22638 timeout  RET   wait6 0
 22638 timeout  CALL  sigsuspend(0xdbabfbe4f18)
 22643 sh       GIO   fd 1 wrote 4 bytes
       "142
       "
Comment 4 Konstantin Belousov freebsd_committer freebsd_triage 2025-11-06 19:48:54 UTC
Sorry but I do not see any SIGALRM delivered to any process in the kdump.
Can you please point me to the line number in the pr-290844.txt where
SIGALRM delivery is reported?
Comment 5 Bryan Drewery freebsd_committer freebsd_triage 2025-11-06 19:54:58 UTC
(In reply to Konstantin Belousov from comment #4)

My analysis isn't correct.

I don't see an explicit SIGALRM either. timeout does not have any default SIGALRM/142 values so I don't see how it got it either. Something about the sigsuspend [EINTR] maybe. Although I don't see why the child would exit.
Comment 6 Bryan Drewery freebsd_committer freebsd_triage 2025-11-06 19:59:45 UTC
I think this is relevant.

Timeout sends SIGCONT and the child gets interrupted and exits thinking it got a timeout on select.

Sh's read code assumes that [EINTR] is due to timeout. So the real problem is spurious [EINTR] on SIGCONT.

if (tv.tv_sec >= 0) {
        /*
         * Wait for something to become available.
         */
        FD_ZERO(&ifds);
        FD_SET(0, &ifds);
        status = select(1, &ifds, NULL, NULL, &tv);
        /*
         * If there's nothing ready, return an error.
         */
        if (status <= 0) {
                while (*ap != NULL)
                        setvar(*ap++, "", 0);
                sig = pendingsig;
                return (128 + (sig != 0 ? sig : SIGALRM));
        }
}

The ktrace part:

 22638 timeout  GIO   fd 2 wrote 39 bytes
       "sending signal CONT(19) to command 'sh'"
 22638 timeout  RET   write 39/0x27
 22638 timeout  CALL  write(0x2,0xdbac192f587,0x1)
 22638 timeout  GIO   fd 2 wrote 1 byte
       "
       "
 22638 timeout  RET   write 1
 22638 timeout  CALL  getpid
 22638 timeout  RET   getpid 22638/0x586e
 22638 timeout  CALL  procctl(P_PID,0x586e,PROC_REAP_KILL,0xdbabfbe4c40)
 22643 sh       RET   select -1 errno 4 Interrupted system call
 22643 sh       CALL  write(0x1,0x437a0ee54000,0x4)
 22638 timeout  RET   procctl 0
 22638 timeout  CALL  sigsuspend(0xdbabfbe4f18)
 22638 timeout  PSIG  SIGCHLD caught handler=0xdb29fa6d1c0 mask=0xfffefeff code=CLD_CONTINUED
 22638 timeout  RET   sigsuspend -1 errno 4 Interrupted system call
 22638 timeout  CALL  sigreturn(0xdbabfbe4000)
 22638 timeout  RET   sigreturn JUSTRETURN
 22638 timeout  CALL  wait6(P_ALL,-1,0xdbabfbe4c8c,0x11<WNOHANG|WEXITED>,0,0xdbabfbe4e30)
 22638 timeout  RET   wait6 0
 22638 timeout  CALL  sigsuspend(0xdbabfbe4f18)
 22643 sh       GIO   fd 1 wrote 4 bytes
       "142
       "
 22643 sh       RET   write 4
 22643 sh       CALL  _exit(0x1)
Comment 7 Bryan Drewery freebsd_committer freebsd_triage 2025-11-06 20:01:29 UTC
(In reply to Bryan Drewery from comment #6)

> Sh's read code assumes that [EINTR] is due to timeout

That is, when no signal has been processed. So yes, it's not a *SIGALRM*, just a random [EINTR] which sh assumes is SIGALRM.
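
For reference, the status arithmetic from the snippet in comment 6 can be isolated as below. This is a minimal sketch, not the sh source: `read_failure_status` is a hypothetical name, and only the return expression comes from the quoted code. With SIGALRM being 14 and SIGINFO being 29 on FreeBSD, it yields the 142 and 157 seen in the description.

```c
#include <assert.h>
#include <signal.h>

/*
 * Hypothetical helper mirroring the return expression quoted from
 * sh's read builtin.  pendingsig is the number of a caught signal
 * that is waiting for its trap to run, or 0 if there is none.
 */
int
read_failure_status(int pendingsig)
{
	assert(pendingsig >= 0);
	/* No pending signal: report the failure as if SIGALRM had fired. */
	return (128 + (pendingsig != 0 ? pendingsig : SIGALRM));
}
```

An [EINTR] caused by SIGCONT leaves no pending signal behind, so it takes the SIGALRM branch even though no timeout occurred.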
Comment 8 Konstantin Belousov freebsd_committer freebsd_triage 2025-11-06 21:06:40 UTC
(In reply to Bryan Drewery from comment #7)
Ok, this is much less mysterious then.

I would say that the behavior where ANYSTOP/SIGCONT causes a spurious EINTR
is too well-established to be changed.  I recently changed ptrace(PT_ATTACH)
to not interrupt sleeps in this way, and it is still not settled.  I think
that changing a much more common scenario would cause too much breakage.

In other words, I believe that the fix belongs in sh(1) and not in the kernel.
Perhaps sh should check whether the timeout was actually reached before
pretending that a SIGALRM occurred, if there is any timeout at all.
Comment 9 Bryan Drewery freebsd_committer freebsd_triage 2025-11-07 01:46:43 UTC
(In reply to Konstantin Belousov from comment #8)

Fair enough. Appreciate your response. I'll look into a fix for sh.

If anyone else runs into this, a workaround is to pass "--foreground" to timeout(1), which stops it from sending the extra signals.
Comment 10 Dag-Erling Smørgrav freebsd_committer freebsd_triage 2025-11-14 17:24:53 UTC
I'm unable to reproduce the issue, but I believe this should fix it:

https://reviews.freebsd.org/D53761
Comment 11 commit-hook freebsd_committer freebsd_triage 2025-11-19 10:50:49 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=3c2643a7dbac370b7232f4e5ac15fd77b9ff396d

commit 3c2643a7dbac370b7232f4e5ac15fd77b9ff396d
Author:     Dag-Erling Smørgrav <des@FreeBSD.org>
AuthorDate: 2025-11-19 10:43:13 +0000
Commit:     Dag-Erling Smørgrav <des@FreeBSD.org>
CommitDate: 2025-11-19 10:43:59 +0000

    sh: Don't assume EINTR means SIGALRM

    While waiting for input in the read builtin, if select() is interrupted
    but there is no pending signal, we act like we timed out, and return the
    same status as if we had been interrupted by SIGALRM, instead of looping
    until we actually do time out.

    * Replace the single select() call with a ppoll() loop.

    * Improve validation of the timeout value.  We now accept things like
      "1h30m15s", which we used to silently truncate to "1h".  The flip side
      is that we no longer accept things like "1hour" or "5sec".

    * Modify the existing `read -t 0` test case to verify that read returns
      immediately when there is input and fails immediately when there isn't.

    * Add a second test case which performs the same tests with a non-zero
      timeout value.

    PR:             290844
    MFC after:      1 week
    Fixes:          c4539460e3a4 ("sh: Improve error handling in read builtin:")
    Reviewed by:    jilles, bdrewery
    Differential Revision:  https://reviews.freebsd.org/D53761

 bin/sh/miscbltin.c                   | 83 +++++++++++++++++++++++++-----------
 bin/sh/sh.1                          |  6 ++-
 bin/sh/tests/builtins/Makefile       |  1 +
 bin/sh/tests/builtins/read11.0       | 19 ++++++++-
 bin/sh/tests/builtins/read12.0 (new) | 32 ++++++++++++++
 5 files changed, 112 insertions(+), 29 deletions(-)
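
The stricter timeout validation described in the commit message can be illustrated with a small parser. This is a hypothetical sketch and not the code from the commit: `parse_timeout` is an invented name, a number without a unit is assumed to mean seconds, the order of units is not checked, and overflow is ignored.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical parser: converts "90", "5s", "2m" or "1h30m15s" to
 * seconds.  Returns -1 for an empty string or for anything left
 * over after a unit letter, such as "1hour" or "5sec".
 */
long
parse_timeout(const char *s)
{
	long total = 0;

	assert(s != NULL);
	if (*s == '\0')
		return (-1);
	while (*s != '\0') {
		long n = 0;

		/* Every component must start with at least one digit. */
		if (!isdigit((unsigned char)*s))
			return (-1);
		while (isdigit((unsigned char)*s))
			n = n * 10 + (*s++ - '0');
		switch (*s) {
		case 'h': n *= 3600; s++; break;
		case 'm': n *= 60; s++; break;
		case 's': s++; break;
		case '\0': break;	/* bare number: assumed seconds */
		default: return (-1);
		}
		total += n;
	}
	return (total);
}
```

Under these assumptions "1h30m15s" gives 5415 seconds, while "1hour" fails because "our" is neither a digit nor the end of the string.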
Comment 12 commit-hook freebsd_committer freebsd_triage 2025-11-25 03:19:34 UTC
A commit in branch stable/15 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=fb57eac42c1598119aa8614f3886dca0379ae816

commit fb57eac42c1598119aa8614f3886dca0379ae816
Author:     Dag-Erling Smørgrav <des@FreeBSD.org>
AuthorDate: 2025-11-19 10:43:13 +0000
Commit:     Dag-Erling Smørgrav <des@FreeBSD.org>
CommitDate: 2025-11-25 03:18:24 +0000

    sh: Don't assume EINTR means SIGALRM

    While waiting for input in the read builtin, if select() is interrupted
    but there is no pending signal, we act like we timed out, and return the
    same status as if we had been interrupted by SIGALRM, instead of looping
    until we actually do time out.

    * Replace the single select() call with a ppoll() loop.

    * Improve validation of the timeout value.  We now accept things like
      "1h30m15s", which we used to silently truncate to "1h".  The flip side
      is that we no longer accept things like "1hour" or "5sec".

    * Modify the existing `read -t 0` test case to verify that read returns
      immediately when there is input and fails immediately when there isn't.

    * Add a second test case which performs the same tests with a non-zero
      timeout value.

    PR:             290844
    MFC after:      1 week
    Fixes:          c4539460e3a4 ("sh: Improve error handling in read builtin:")
    Reviewed by:    jilles, bdrewery
    Differential Revision:  https://reviews.freebsd.org/D53761

    (cherry picked from commit 3c2643a7dbac370b7232f4e5ac15fd77b9ff396d)

 bin/sh/miscbltin.c                   | 83 +++++++++++++++++++++++++-----------
 bin/sh/sh.1                          |  6 ++-
 bin/sh/tests/builtins/Makefile       |  1 +
 bin/sh/tests/builtins/read11.0       | 19 ++++++++-
 bin/sh/tests/builtins/read12.0 (new) | 32 ++++++++++++++
 5 files changed, 112 insertions(+), 29 deletions(-)
Comment 13 commit-hook freebsd_committer freebsd_triage 2025-11-25 03:20:39 UTC
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=7f691e07efe63ea01273833e44fd03ee00106b2b

commit 7f691e07efe63ea01273833e44fd03ee00106b2b
Author:     Dag-Erling Smørgrav <des@FreeBSD.org>
AuthorDate: 2025-11-19 10:43:13 +0000
Commit:     Dag-Erling Smørgrav <des@FreeBSD.org>
CommitDate: 2025-11-25 03:19:16 +0000

    sh: Don't assume EINTR means SIGALRM

    While waiting for input in the read builtin, if select() is interrupted
    but there is no pending signal, we act like we timed out, and return the
    same status as if we had been interrupted by SIGALRM, instead of looping
    until we actually do time out.

    * Replace the single select() call with a ppoll() loop.

    * Improve validation of the timeout value.  We now accept things like
      "1h30m15s", which we used to silently truncate to "1h".  The flip side
      is that we no longer accept things like "1hour" or "5sec".

    * Modify the existing `read -t 0` test case to verify that read returns
      immediately when there is input and fails immediately when there isn't.

    * Add a second test case which performs the same tests with a non-zero
      timeout value.

    PR:             290844
    MFC after:      1 week
    Fixes:          c4539460e3a4 ("sh: Improve error handling in read builtin:")
    Reviewed by:    jilles, bdrewery
    Differential Revision:  https://reviews.freebsd.org/D53761

    (cherry picked from commit 3c2643a7dbac370b7232f4e5ac15fd77b9ff396d)

 bin/sh/miscbltin.c                   | 83 +++++++++++++++++++++++++-----------
 bin/sh/sh.1                          |  6 ++-
 bin/sh/tests/builtins/Makefile       |  1 +
 bin/sh/tests/builtins/read11.0       | 19 ++++++++-
 bin/sh/tests/builtins/read12.0 (new) | 32 ++++++++++++++
 5 files changed, 112 insertions(+), 29 deletions(-)