Bug 290843 - killpg deadlock against a stopped interrupted fork
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 16.0-CURRENT
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: Konstantin Belousov
Reported: 2025-11-06 19:09 UTC by Bryan Drewery
Modified: 2025-11-21 16:20 UTC
Flags: markj: mfc-stable15+

Description Bryan Drewery freebsd_committer freebsd_triage 2025-11-06 19:09:03 UTC
This is on CURRENT 55c28005f544282b984ae0e15dacd0c108d8ab12 but I've seen this for a few years. I had to disable the DEADLKRES option because I hit it so often in Poudriere tests. Finally found a simple repro today.

Basic summary: `killpg(pgid, SIGSTOP)` against a forking child blocks any further `killpg(pgid, ...)` call.

Repro given later. Here's the simplest result:

```
# procstat -t 51155 31783
  PID    TID COMM                TDNAME              CPU  PRI STATE   WCHAN
51155 302071 sh                  -                    -1  115 sleep   killpg r
31783 128716 sh                  -                    -1  115 stop    -

# procstat -kk 51155 31783
  PID    TID COMM                TDNAME              KSTACK
51155 302071 sh                  -                   mi_switch+0x172 sleepq_switch+0x109 _sx_xlock_hard+0x513 _sx_xlock+0xac killpg1+0x138 kern_kill+0x222 amd64_syscall+0x451 fast_syscall_common+0xf8
31783 128716 sh                  -                   mi_switch+0x172 thread_suspend_check+0xbd sig_intr+0x7a fork1+0x448 sys_fork+0x54 amd64_syscall+0x451 fast_syscall_common+0xf8
```
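
When triaging a hang like this, threads stuck behind killpg can be picked out of `procstat -t` output by their wait channel. The following is a minimal sketch, not part of the original report: `killpg_waiters` is a hypothetical helper name, and the filter assumes COMM and TDNAME contain no spaces, so that STATE is field 7 and WCHAN starts at field 8.

```shell
#!/bin/sh
# Hypothetical helper (not from the report): read `procstat -t` output on
# stdin and print "PID TID" for each sleeping thread whose wait channel
# starts with "killpg".
# Assumption: COMM and TDNAME contain no whitespace, so STATE is field 7
# and the first word of WCHAN is field 8.
killpg_waiters() {
    awk '$1 ~ /^[0-9]+$/ && $7 == "sleep" && $8 == "killpg" { print $1, $2 }'
}

# Feed it the two threads shown in the report.
sample='  PID    TID COMM                TDNAME              CPU  PRI STATE   WCHAN
51155 302071 sh                  -                    -1  115 sleep   killpg r
31783 128716 sh                  -                    -1  115 stop    -'

waiters=$(printf '%s\n' "$sample" | killpg_waiters)
echo "$waiters"
```

On a live system the same filter could be fed from something like `procstat -t -a | killpg_waiters`; the header line and the stopped thread are skipped because they do not match the numeric-PID and `sleep` conditions.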

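`kill -CONT -31783`, which targets the process group and so goes through killpg, blocks on the killpg racer lock (the `killpg r` wait channel above), while `kill -CONT 31783`, which signals the single process and avoids killpg, does not block.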
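
The difference between the two forms is only the sign of the target: a negative argument asks `kill` to signal the whole process group, a positive one signals a single process. The sketch below is illustrative only and does not reproduce the deadlock; it uses a plain `sleep` job rather than a forking one, and it assumes the shell can enable job control with `set -m` so that the background job gets its own process group (if it cannot, the demonstration is skipped).

```shell
#!/bin/sh
# Illustrative sketch, not the reporter's repro: shows which kill form
# signals a process group and which signals a single process.
set -m                              # job control: background job gets its own process group
case $- in
*m*)
    sleep 30 &
    pid=$!                          # with job control, the job's PGID equals this PID
    set +m
    kill -s STOP -- "-$pid"         # negative target: whole group (killpg path)
    kill -s CONT "$pid"             # positive target: single process, no killpg
    kill -s TERM -- "-$pid"         # group again: the call that hung in the report
    wait "$pid"
    status=$?
    ;;
*)
    # Assumption failed: job control could not be enabled (for example, no tty).
    status=skipped
    ;;
esac
echo "job exit status: $status"
```

The `--` before the negative target keeps `kill -s` from parsing `-PGID` as an option.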

Repro:

```
# `kill -STOP; kill -TERM; kill -CONT` against a forking job (job control enabled).
# foo() is trying to repro a blank $() value which is not required for the repro but brings in enough forking to trigger the problem quickly so I left it in.

sh -c 'trap "kill -9 %1; exit" INT; foo() { unset cmd; cmd=$(/sbin/sysctl -n vm.loadavg|/usr/bin/awk "{print \$2,\$3,\$4}"); case "${cmd:+set}" in set) ;; *) exit 99 ;; esac }; runner() { while foo; do :; done }; launch() { local -; set -m; PS4="child+ " runner & }; set -x; while :; do launch; sleep 0.1; kill -STOP %1; kill -TERM %1; kill -CONT %1; ret=0; wait; if [ $ret -eq 99 ]; then exit 99; fi; done;'
```

It appears that https://reviews.freebsd.org/D40493 and https://reviews.freebsd.org/D41128 may contain relevant discussion and earlier attempts at a fix.
Comment 1 Konstantin Belousov freebsd_committer freebsd_triage 2025-11-06 19:17:23 UTC
The bug is clearly that thread_suspend_check(return_instead == 1) might not
return, but suspend the thread.  FWIW, sig_intr() is the only caller of
thread_suspend_check(1).
Comment 2 Konstantin Belousov freebsd_committer freebsd_triage 2025-11-06 19:28:47 UTC
https://reviews.freebsd.org/D53624
Comment 3 commit-hook freebsd_committer freebsd_triage 2025-11-11 08:54:41 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=e1c6f4cb9bd29358c2b2fe249af9a2f9626b0670

commit e1c6f4cb9bd29358c2b2fe249af9a2f9626b0670
Author:     Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2025-11-06 19:25:23 +0000
Commit:     Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2025-11-11 08:54:05 +0000

    kern_thread: thread_suspend_check(1) must never suspend

    Reported by:    bdrewery
    Reviewed by:    bdrewery, markj
    Tested by:      bdrewery, pho
    PR:     290843
    Sponsored by:   The FreeBSD Foundation
    MFC after:      1 week
    Differential revision:  https://reviews.freebsd.org/D53624

 sys/kern/kern_thread.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)
Comment 4 commit-hook freebsd_committer freebsd_triage 2025-11-18 03:38:33 UTC
A commit in branch stable/15 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=48c28844382229a7af24941541e89e663b38f75c

commit 48c28844382229a7af24941541e89e663b38f75c
Author:     Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2025-11-06 19:25:23 +0000
Commit:     Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2025-11-18 03:37:05 +0000

    kern_thread: thread_suspend_check(1) must never suspend

    PR:     290843

    (cherry picked from commit e1c6f4cb9bd29358c2b2fe249af9a2f9626b0670)

 sys/kern/kern_thread.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)
Comment 5 commit-hook freebsd_committer freebsd_triage 2025-11-18 03:39:34 UTC
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=f363f4a8fa8b7d7beb79d2f6f70479aad30e7d6f

commit f363f4a8fa8b7d7beb79d2f6f70479aad30e7d6f
Author:     Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2025-11-06 19:25:23 +0000
Commit:     Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2025-11-18 03:38:29 +0000

    kern_thread: thread_suspend_check(1) must never suspend

    PR:     290843

    (cherry picked from commit e1c6f4cb9bd29358c2b2fe249af9a2f9626b0670)

 sys/kern/kern_thread.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)