Bug 290217 - "jail -r" sometimes results in jails stuck in dying
Summary: "jail -r" sometimes results in jails stuck in dying
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 15.0-CURRENT
Hardware: Any Any
Importance: --- Affects Some People
Assignee: Jamie Gritton
URL:
Keywords:
Depends on:
Blocks: 15.0-metabug
Reported: 2025-10-13 20:44 UTC by David 'equinox' Lamparter
Modified: 2025-10-30 14:12 UTC (History)
CC List: 4 users

See Also:


Attachments
procstat -kka (26.29 KB, text/plain)
2025-10-18 18:01 UTC, David 'equinox' Lamparter
jls --libxo json,pretty -vdh (1.76 KB, application/json)
2025-10-18 22:58 UTC, David 'equinox' Lamparter

Description David 'equinox' Lamparter 2025-10-13 20:44:34 UTC
The script at the end of this report will, roughly 20-40% of the time, leave a jail stuck in "dying" and, if the jail's root is a mount, a mount that can no longer be unmounted. It doesn't take more than a few tries to trigger the behavior for me. The system version is 15.0-ALPHA4; the script doesn't seem to cause problems on 14.3.

The problem does not seem to occur if the jail exits normally after the sleep, i.e. "jail -r" is a necessary factor.  There is no config for any jail.

This is likely the same problem as in https://forums.freebsd.org/threads/remove-dying-jail.96919/ though I am not the creator of that forum post.

FreeBSD test 15.0-ALPHA4 FreeBSD 15.0-ALPHA4 stable/15-n280334-d2b670b27f37 GENERIC amd64


#!/bin/sh

set -x
root="/root/base_txz"  # contents of base.txz

jail -i -c "path=$root" host.hostname=test command=/bin/sh -c "sleep 3" | {
	read jid
	sleep 1
	jail -r $jid
}

for I in $(seq 0 10); do
	jls -d | grep "$root" || break
	sleep 1
done
Comment 1 David 'equinox' Lamparter 2025-10-13 21:48:57 UTC
Tried on a blank (almost untouched) downloaded VM image of 16 and it only happened once out of several hundred times.  Also double checked on a similarly blank download of 15-ALPHA5 and it happens within 3-4 tries.

versions involved:

FreeBSD freebsd 16.0-CURRENT FreeBSD 16.0-CURRENT #0 main-n281019-0dc634d48fcc: Thu Oct  9 20:12:32 UTC 2025     root@releng3.nyi.freebsd.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64

FreeBSD freebsd 15.0-ALPHA5 FreeBSD 15.0-ALPHA5 stable/15-n280541-1c0898edf28f GENERIC amd64
Comment 2 David 'equinox' Lamparter 2025-10-13 22:24:11 UTC
Huh. The "jls -d" call seems to also be relevant, and it seems to be very timing sensitive.

via ssh (with stderr): 10.6% chance
on serial (with stderr): 0.3% chance
on serial (stderr to /dev/null): 19.8% chance

With this knowledge I tried 16 again - surprise, it's actually broken in pretty much exactly the same way.

However, I should say, my original problem didn't involve "jls" at all (I was coming at this from a "can't umount jail's root" angle, wasn't even running jls). It's possible there are multiple issues here, or it's a single issue that can be triggered in various ways.
Comment 3 Mark Johnston freebsd_committer freebsd_triage 2025-10-17 16:43:44 UTC
I've been running the reproducer like this for a while and have not seen the problem yet on FreeBSD main:

while true; do sh repro.sh; if jls -vd | grep DYING; then break; fi ; done

Up to about 250 tries now.  This is in a 2-vcpu VM.

It'd be useful to see output from "procstat -kka" taken after you've got a stuck jail.
Comment 4 David 'equinox' Lamparter 2025-10-18 18:01:23 UTC
Created attachment 264702
procstat -kka

(In reply to Mark Johnston from comment #3)

Output of "uname -a; jls; jls -d; procstat -kka" attached.  3rd attempt got me the hang this time.
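For anyone else who hits this, the same diagnostics can be gathered in one step. This is only a minimal sketch: the helper name collect_diag and the output path are made up for illustration, while the four commands are the ones listed above.

```sh
#!/bin/sh
# collect_diag is a hypothetical helper, not part of FreeBSD.
# It writes each command line followed by that command's output,
# so the resulting file can be attached to a report as-is.
collect_diag() {
	out="$1"
	: > "$out"
	for cmd in "uname -a" "jls" "jls -d" "procstat -kka"; do
		printf '### %s\n' "$cmd" >> "$out"
		# $cmd is deliberately unquoted so it splits into command + args
		$cmd >> "$out" 2>&1
	done
}
```

Once a jail is stuck, something like "collect_diag /tmp/stuck-jail.txt" (path chosen arbitrarily) would produce the file to attach.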

In case it's somehow relevant, my qemu command line is:

qemu-system-x86_64 -nodefaults -enable-kvm -snapshot -nographic -display none \
  -cpu host -m 4096M -smp 4 -rtc clock=vm -bios /usr/share/ovmf/OVMF.fd \
  -machine q35,i8042=off -device virtio-rng-pci -smbios type=1,serial=ds=nocloud \
  -chardev stdio,id=sio,signal=off,mux=on -device isa-serial,chardev=sio,index=0 \
  -mon chardev=sio,mode=readline \
  -netdev user,id=net0,hostfwd=tcp:127.0.0.1:23022-:22 \
  -device virtio-net-pci,netdev=net0 \
  -drive if=virtio,file=FreeBSD-15.0-ALPHA5-amd64-ufs-20251004-1c0898edf28f-280541.qcow2,if=virtio

and this is on an AMD Ryzen 9 PRO 6950HS, Linux 6.16 host
Comment 5 David 'equinox' Lamparter 2025-10-18 18:09:39 UTC
Ah, wrong qemu command line, but the only difference is the SMP number. Happens for both 4 and 12.  Let me try on a single core...

<insert jeopardy music>

...nope, can't seem to make it happen on a single core.  Let me try 2 as well...

<insert jeopardy music>

...interesting, 2 cores apparently also won't make it show up (within 250 attempts).  Now 3 cores...

<insert jeopardy music>

got it on attempt 28 with 3 cores.  If this is a 3-way race/deadlock condition, I'll be rather impressed.
Comment 6 David 'equinox' Lamparter 2025-10-18 18:38:27 UTC
Changed the 2nd sleep (the "sleep 1" before "jail -r") from 1 to 0.1, and gave it 2000 tries each:

2 cores:  0 out of 2000

3 cores:  42 out of 2000

4 cores:  433 out of 2000

5 cores:  651 out of 2000

The mathematician in me wants to try 2.5 cores now, but alas, qemu does not seem to support half cores %-)
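
Expressed as failure rates, the counts above can be converted with a few lines of shell. This is just a sketch using awk; the helper name "rate" is invented for illustration.

```sh
#!/bin/sh
# rate FAILURES TRIES -> failure percentage with two decimals
rate() {
	awk -v f="$1" -v n="$2" 'BEGIN { printf "%.2f\n", 100 * f / n }'
}

# "cores failures" pairs taken from the counts above, 2000 tries each
for pair in "2 0" "3 42" "4 433" "5 651"; do
	set -- $pair
	printf '%s cores: %s%%\n' "$1" "$(rate "$2" 2000)"
done
```

This prints 0.00%, 2.10%, 21.65% and 32.55% for 2, 3, 4 and 5 cores respectively.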
Comment 7 Mark Johnston freebsd_committer freebsd_triage 2025-10-18 20:55:40 UTC
Hum, still no luck for me when varying the number of cores.  I made 300 attempts with 3 and 4 vCPUs and don't see the problem.

The procstat output doesn't show anything, all of the kernel threads seem to be quiescent.

Can you show output from `jls --libxo json,pretty -vdh`?
Comment 8 David 'equinox' Lamparter 2025-10-18 22:58:40 UTC
Created attachment 264704
jls --libxo json,pretty -vdh

(In reply to Mark Johnston from comment #7)

Output of "jls --libxo json,pretty -vdh" attached

I can get you access to the VM and the hypervisor (in case you want to attach gdb), but maybe not this week (I'm at RIPE91).  I'm not sure what's different here, it's quite easy to reproduce with >= 4 CPUs.

I did "improve" the reproducer (removing the "set -x" makes it more likely because there's less "writing things to terminal" going on.  The sleep duration doesn't seem to matter):

#!/bin/sh

root="/root/base_txz"  # contents of base.txz

jail -i -c "path=$root" host.hostname=test command=/bin/sh -c "sleep 3" | {
	read jid
	echo $jid > /tmp/.jid     # not necessary, only for automation, see below
	sleep 0.1
	jail -r $jid
}
# time between "jail -r" and "jls -d" must be minimal
jls -d

# for automated reproduction with
#   while ./jailtest.sh; do true; done
for I in $(seq 0 10); do
	jls -d -j $(cat /tmp/.jid) || exit 0
	sleep .2
done
exit 1
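
One way to gather counts like those in comment 6 is a small counting loop around the script. The sketch below assumes the script above is saved as ./jailtest.sh (the name used in its own comment) and relies on it exiting nonzero when the jail stays stuck; count_failures is a hypothetical helper, not an existing tool.

```sh
#!/bin/sh
# count_failures TRIES COMMAND [ARGS...]
# Runs COMMAND TRIES times and prints "FAILED out of TRIES",
# where FAILED is the number of runs that exited nonzero.
count_failures() {
	tries="$1"; shift
	failed=0
	i=0
	while [ "$i" -lt "$tries" ]; do
		"$@" >/dev/null 2>&1 || failed=$((failed + 1))
		i=$((i + 1))
	done
	echo "$failed out of $tries"
}
```

For example, "count_failures 2000 ./jailtest.sh" would run the reproducer 2000 times and report how many runs left a jail behind.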
Comment 9 Mark Johnston freebsd_committer freebsd_triage 2025-10-18 23:40:03 UTC
I can reproduce it now.  It helps to put some CPU load on the host.

If I remove the jls invocations I can still reproduce the problem.

In the past, I found such bugs to be caused by credential reference leaks.  I tried adding a global list of credentials and inspected it with a debugger after the problem occurs, but none of them refer to the dying jail, so presumably the problem is elsewhere.
Comment 10 Mark Johnston freebsd_committer freebsd_triage 2025-10-19 00:01:10 UTC
Jamie, I think there is a regression from commit 851dc7f859c23: sys_prison_remove() bumps the jail refcount and calls prison_remove(), which bails without releasing the reference if the jail is already dying.  Could you please take a look?
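
To make the suspected leak concrete, here is a toy model in shell. It is not kernel code: the variables refs and dying and the functions remove_buggy and remove_fixed are invented for illustration, and only the shape of the control flow (take a reference, then return early when the jail is already dying) follows the description above. The "fixed" variant shows the simplest possible repair; the fix that was eventually committed is structured differently (see comment 12).

```sh
#!/bin/sh
# Toy model of a reference leak; none of this is FreeBSD source.
refs=0     # references currently held on the jail
dying=1    # the jail is already dying when removal is requested

remove_buggy() {
	refs=$((refs + 1))               # caller takes a reference
	[ "$dying" -eq 1 ] && return 0   # early return: reference never dropped
	refs=$((refs - 1))               # normal path drops it again
}

remove_fixed() {
	refs=$((refs + 1))
	if [ "$dying" -eq 1 ]; then
		refs=$((refs - 1))       # drop the reference on the early path too
		return 0
	fi
	refs=$((refs - 1))
}

remove_buggy; echo "buggy: refs=$refs"
refs=0
remove_fixed; echo "fixed: refs=$refs"
```

In the model, the buggy path leaves refs at 1 while the fixed path returns it to 0. A count that never reaches zero is what would keep a jail in the dying state indefinitely.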
Comment 11 Jamie Gritton freebsd_committer freebsd_triage 2025-10-19 00:19:47 UTC
(In reply to Mark Johnston from comment #10)
Yes, I see the problem.  I'll work up a fix.
Comment 12 Jamie Gritton freebsd_committer freebsd_triage 2025-10-19 17:59:05 UTC
I've created https://reviews.freebsd.org/D53200 to fix this.

The fix is somewhat roundabout.  I could have added the code to just drop the jail in the already-dying part of prison_remove, but I noticed that it would take a call to prison_deref, almost identical to the call already made.  So I collapsed that function into just a prison_deref call with a couple of asserts attached, and then modified the PD_KILL part of prison_deref to make that test instead.

I haven't been able to replicate the problem, but I can at least say I haven't seen the fix break anything.  I'd appreciate it being tested by someone who has seen it fail (Mark, in particular I've made you a reviewer on the diff).
Comment 13 David 'equinox' Lamparter 2025-10-20 15:32:49 UTC
(In reply to Jamie Gritton from comment #12)

I've applied the patch on top of 15.0-ALPHA5 and can confirm I no longer see hung dying jails, after a reboot where I previously saw one on the 4th attempt.  Thanks for the fix!
Comment 14 commit-hook freebsd_committer freebsd_triage 2025-10-20 16:55:29 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=78f70d4ff9dd4af2318b25023a7f55be7402ec60

commit 78f70d4ff9dd4af2318b25023a7f55be7402ec60
Author:     Jamie Gritton <jamie@FreeBSD.org>
AuthorDate: 2025-10-20 16:49:14 +0000
Commit:     Jamie Gritton <jamie@FreeBSD.org>
CommitDate: 2025-10-20 16:49:14 +0000

    jail: fix a regression that creates zombies when removing dying jails

    When adding jail descriptors, I split sys_jail remove in two, and
    didn't properly track jail held between them when a jail was dying.
    This fixes that as well as cleaning up the logic behind it.

    PR:             290217
    Reported by:    David 'equinox' Lamparter <equinox at diac24.net>
    Reviewed by:    markj
    MFC after:      3 days
    Differential Revision:  https://reviews.freebsd.org/D53200

 sys/kern/kern_jail.c | 21 ++++++++++-----------
 1 file changed, 10 insertions(+), 11 deletions(-)
Comment 15 commit-hook freebsd_committer freebsd_triage 2025-10-23 04:37:32 UTC
A commit in branch stable/15 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=2d3c6a06edc3919455d1152f4ffaa60697e2c4f2

commit 2d3c6a06edc3919455d1152f4ffaa60697e2c4f2
Author:     Jamie Gritton <jamie@FreeBSD.org>
AuthorDate: 2025-10-20 16:49:14 +0000
Commit:     Jamie Gritton <jamie@FreeBSD.org>
CommitDate: 2025-10-23 04:37:01 +0000

    jail: fix a regression that creates zombies when removing dying jails

    When adding jail descriptors, I split sys_jail remove in two, and
    didn't properly track jail held between them when a jail was dying.
    This fixes that as well as cleaning up the logic behind it.

    PR:             290217
    Reported by:    David 'equinox' Lamparter <equinox at diac24.net>
    Reviewed by:    markj
    MFC after:      3 days
    Differential Revision:  https://reviews.freebsd.org/D53200

    (cherry picked from commit 78f70d4ff9dd4af2318b25023a7f55be7402ec60)

 sys/kern/kern_jail.c | 21 ++++++++++-----------
 1 file changed, 10 insertions(+), 11 deletions(-)
Comment 16 commit-hook freebsd_committer freebsd_triage 2025-10-30 04:24:04 UTC
A commit in branch releng/15.0 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=c37d95826ab5a9becb491396a6522f442680d25f

commit c37d95826ab5a9becb491396a6522f442680d25f
Author:     Jamie Gritton <jamie@FreeBSD.org>
AuthorDate: 2025-10-20 16:49:14 +0000
Commit:     Colin Percival <cperciva@FreeBSD.org>
CommitDate: 2025-10-30 04:23:18 +0000

    jail: fix a regression that creates zombies when removing dying jails

    When adding jail descriptors, I split sys_jail remove in two, and
    didn't properly track jail held between them when a jail was dying.
    This fixes that as well as cleaning up the logic behind it.

    Approved by:    re (cperciva)
    PR:             290217
    Reported by:    David 'equinox' Lamparter <equinox at diac24.net>
    Reviewed by:    markj
    MFC after:      3 days
    Differential Revision:  https://reviews.freebsd.org/D53200

    (cherry picked from commit 78f70d4ff9dd4af2318b25023a7f55be7402ec60)
    (cherry picked from commit 2d3c6a06edc3919455d1152f4ffaa60697e2c4f2)

 sys/kern/kern_jail.c | 21 ++++++++++-----------
 1 file changed, 10 insertions(+), 11 deletions(-)
Comment 17 Mark Johnston freebsd_committer freebsd_triage 2025-10-30 14:12:07 UTC
Thank you for the report and the reproducer.