Bug 283163 - kill -SIG -1 does not work any more for host - which leads to inconsistent reboot
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 14.2-RELEASE
Hardware: Any Any
Importance: --- Affects Some People
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-12-06 12:19 UTC by vova
Modified: 2024-12-10 02:41 UTC
CC List: 5 users

See Also:



Description vova 2024-12-06 12:19:29 UTC
Problem: I can't send a signal to all processes in the system:

# kill -15 -1
-1: No such process

# truss kill -15 -1
...
kill(-1,SIGTERM)     ERR#3 'No such process'

It looks like this was broken by:
https://reviews.freebsd.org/D34522
https://cgit.freebsd.org/src/commit/sys/kern/kern_sig.c?h=stable/14&id=69413598d2660054e29cac9454fe18c08e3bf36d


This is easy to spot during a reboot with a serial console: a number of processes get killed with SIGSEGV because they were not killed by reboot, which sends its signals to pid -1.

---
# shutdown -r
...

Waiting (max 60 seconds) for system process `vnlru' to stop... done
Waiting (max 60 seconds) for system process `syncer' to stop...
Syncing disks, vnodes remaining... 0 0 0 0 0 0 0 0 0 done
All buffers synced.
pid 25340 (sshd), jid 0, uid 502: exited on signal 11 (no core dump - bad address)
pid 25338 (sshd), jid 0, uid 0: exited on signal 11 (no core dump - bad address)
pid 25567 (tcsh), jid 0, uid 502: exited on signal 11 (no core dump - other error)
pid 25355 (tcsh), jid 0, uid 0: exited on signal 11 (no core dump - too large)
pid 25566 (sshd), jid 0, uid 502: exited on signal 11 (no core dump - bad address)
pid 25564 (sshd), jid 0, uid 0: exited on signal 11 (no core dump - bad address)
pid 25353 (sudo), jid 0, uid 0: exited on signal 11 (no core dump - bad address)
pid 25341 (tcsh), jid 0, uid 502: exited on signal 11 (no core dump - other error)
pid 25354 (sudo), jid 0, uid 0: exited on signal 11 (no core dump - bad address)
Uptime: 1h21m57s
uftdi0: detached
uhub0: detached
---

This issue was discussed at https://lists.freebsd.org/archives/freebsd-current/2024-July/006124.html
but the conclusion reached there was wrong.
Comment 1 vova 2024-12-06 15:22:19 UTC
Also, it looks like (still to be proven) the whole patch changed the behaviour of sending signals from the host (jid=0) to pid=-1: it now always calls prison_proc_iterate(), not only when jid != 0,
which breaks the previous contract of kill(-1, ...).
Comment 2 vova 2024-12-06 15:47:08 UTC
The issue is triggered if any jail has been created since boot; without a jail, process iteration appears to fall back to the old algorithm and works as expected.
Comment 3 Eugene Grosbein freebsd_committer freebsd_triage 2024-12-06 15:53:10 UTC
CC'ing mjg, who committed the suspected change.

I managed to reproduce the problem using https://download.freebsd.org/snapshots/ISO-IMAGES/15.0/FreeBSD-15.0-CURRENT-amd64-20241128-edfccce309a6-273911-disc1.iso.xz installed with default settings into a new bhyve guest.

Log in and run: kill -15 -1
It works as expected, terminating all user processes other than /sbin/init, which restarts getty, which runs "login" again. Log in again and reproduce the problem in question:

root@r150:~ # jail -c name=test0 persist
root@r150:~ # jls
   JID  IP Address      Hostname                      Path
     1                                                /
root@r150:~ # kill -15 -1
kill: -1: No such process
Comment 4 Konstantin Belousov freebsd_committer freebsd_triage 2024-12-06 17:04:02 UTC
Try https://reviews.freebsd.org/D47943
Comment 5 vova 2024-12-06 17:37:46 UTC
The fix helped:

---
#	kill -TERM -1
#
FreeBSD/amd64 (ha) (ttyu0)

login:

---
# shutdown -r now
Shutdown NOW!
shutdown: [pid 22088]

*** FINAL System shutdown message from vova@ha.sunrise ***

System going down IMMEDIATELY



*** FINAL System shutdown message from vova@ha.sunrise ***

System going down IMMEDIATELY



System shutdown time has arrived


Waiting (max 60 seconds) for system process `vnlru' to stop... done
Waiting (max 60 seconds) for system process `syncer' to stop...
Syncing disks, vnodes remaining... 0 0 0 0 0 0 0 0 0 0 done
All buffers synced.
Uptime: 1m29s
uftdi0: detached
uhub0: detached
---
Comment 6 commit-hook freebsd_committer freebsd_triage 2024-12-06 21:43:32 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=831531a82e0f1d1d7b97e50c0587639322ed8d2e

commit 831531a82e0f1d1d7b97e50c0587639322ed8d2e
Author:     Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2024-12-06 17:01:00 +0000
Commit:     Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2024-12-06 21:42:26 +0000

    prison_proc_iterate(): make it work for prison0

    Do not exclude processes owned by host/prison0 if there are jails
    configured.

    PR:     283163
    Reviewed by:    jamie, markj
    Sponsored by:   The FreeBSD Foundation
    MFC after:      1 week
    Differential revision:  https://reviews.freebsd.org/D47943

 sys/kern/kern_jail.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
Comment 7 vova 2024-12-08 12:09:39 UTC
I have a question: is it correct to use prison_proc_iterate() for kill(-1, sig) from the host?
(Will it iterate through all jails' and the host's processes?)

Logically, the signal should be delivered to all processes in all jails (just as `ps ax` on the host shows all processes in all jails).

Also, with the patch I can no longer reproduce the problem of processes surviving the reboot call.
Comment 8 Konstantin Belousov freebsd_committer freebsd_triage 2024-12-09 03:38:23 UTC
(In reply to vova from comment #7)
The host is itself a jail; it is called prison0.
Comment 9 vova 2024-12-09 09:23:29 UTC
(In reply to Konstantin Belousov from comment #8)

Yes, I understand that, but prison0 is different from other jails:

`ps ax` in prison0 shows processes from all jails, and normally kill from prison0 successfully sends a signal to a process in another jail.

# sysctl security.jail.param.jid
security.jail.param.jid: 0

# ps axJ4 -o jid,pid,tty,state,command
JID   PID TTY STAT COMMAND
  4 10800 -   SNsJ /usr/sbin/syslogd -ss -c
  4 10833 -   INsJ /usr/sbin/cron -s
  4 10869 -   INsJ nginx: master process /usr/local/sbin/nginx
  4 55913 -   INJ  nginx: worker process (nginx)
  4 55914 -   INJ  nginx: worker process (nginx)
  4 55915 -   INJ  nginx: worker process (nginx)
  4 55916 -   INJ  nginx: worker process (nginx)
  4 55917 -   INJ  nginx: worker process (nginx)
  4 55918 -   INJ  nginx: worker process (nginx)
  4 55919 -   INJ  nginx: worker process (nginx)
  4 55920 -   INJ  nginx: worker process (nginx)

# kill -15 10869

# ps axJ4 -o jid,pid,tty,state,command
JID   PID TTY STAT COMMAND
  4 10800 -   INsJ /usr/sbin/syslogd -ss -c
  4 10833 -   SNsJ /usr/sbin/cron -s
#

So, for prison0, will prison_proc_iterate() go only through the processes of prison0, or through all processes in the system?

If the first, then kill(-1, sig) will not send the signal to processes in other jails;
if the second, then I was confused by the name prison_proc_iterate().

(That is why I am asking.)
Comment 10 Konstantin Belousov freebsd_committer freebsd_triage 2024-12-09 12:32:58 UTC
(In reply to vova from comment #9)
prison_proc_iterate() iterates over all processes belonging to the argument
prison.  This implicitly includes all processes belonging to its child
prisons.
Comment 11 commit-hook freebsd_committer freebsd_triage 2024-12-10 02:41:35 UTC
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=b50851e8ebfa8acc77607a4ff1095ed6e4a56881

commit b50851e8ebfa8acc77607a4ff1095ed6e4a56881
Author:     Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2024-12-06 17:01:00 +0000
Commit:     Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2024-12-10 02:40:24 +0000

    prison_proc_iterate(): make it work for prison0

    PR:     283163

    (cherry picked from commit 831531a82e0f1d1d7b97e50c0587639322ed8d2e)

 sys/kern/kern_jail.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)