The script at the end of this report will, with a random chance somewhere in the 20-40% range, result in a jail stuck in "dying" and a mount that can no longer be umounted (if the root is a mount). It doesn't take more than a few tries to trigger the behavior for me.

System version is 15-ALPHA4, the script doesn't seem to cause problems on 14.3.

The problem does not seem to occur if the jail exits normally after the sleep, i.e. "jail -r" is a necessary factor. There is no config for any jail.

This is likely the same problem as in https://forums.freebsd.org/threads/remove-dying-jail.96919/ though I am not the creator of that forum post.

FreeBSD test 15.0-ALPHA4 FreeBSD 15.0-ALPHA4 stable/15-n280334-d2b670b27f37 GENERIC amd64

#!/bin/sh
set -x
root="/root/base_txz" # contents of base.txz
jail -i -c "path=$root" host.hostname=test command=/bin/sh -c "sleep 3" | {
    read jid
    sleep 1
    jail -r $jid
}
for I in $(seq 0 10); do
    jls -d | grep "$root" || break
    sleep 1
done
Tried on a blank (almost untouched) downloaded VM image of 16 and it only happened once out of several hundred times. Also double checked on a similarly blank download of 15-ALPHA5 and it happens within 3-4 tries.

versions involved:

FreeBSD freebsd 16.0-CURRENT FreeBSD 16.0-CURRENT #0 main-n281019-0dc634d48fcc: Thu Oct 9 20:12:32 UTC 2025 root@releng3.nyi.freebsd.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64

FreeBSD freebsd 15.0-ALPHA5 FreeBSD 15.0-ALPHA5 stable/15-n280541-1c0898edf28f GENERIC amd64
Huh. The "jls -d" call seems to also be relevant, and it seems to be very timing sensitive.

via ssh (with stderr): 10.6% chance
on serial (with stderr): 0.3% chance
on serial (stderr to /dev/null): 19.8% chance

With this knowledge I tried 16 again. Surprise: it's actually broken in pretty much exactly the same way.

However, I should say, my original problem didn't involve "jls" at all (I was coming at this from a "can't umount jail's root" angle, wasn't even running jls). It's possible there are multiple issues here, or it's a single issue that can be triggered in various ways.
I've been running the reproducer like this for a while and have not seen the problem yet on FreeBSD main:

while true; do sh repro.sh; if jls -vd | grep DYING; then break; fi ; done

Up to about 250 tries now. This is in a 2-vcpu VM.

It'd be useful to see output from "procstat -kka" taken after you've got a stuck jail.
Created attachment 264702
procstat -kka

(In reply to Mark Johnston from comment #3)

Output of "uname -a; jls; jls -d; procstat -kka" attached. 3rd attempt got me the hang this time.

In case it's somehow relevant, my qemu command line is:

qemu-system-x86_64 -nodefaults -enable-kvm -snapshot -nographic -display none \
    -cpu host -m 4096M -smp 4 -rtc clock=vm -bios /usr/share/ovmf/OVMF.fd \
    -machine q35,i8042=off -device virtio-rng-pci -smbios type=1,serial=ds=nocloud \
    -chardev stdio,id=sio,signal=off,mux=on -device isa-serial,chardev=sio,index=0 \
    -mon chardev=sio,mode=readline \
    -netdev user,id=net0,hostfwd=tcp:127.0.0.1:23022-:22 \
    -device virtio-net-pci,netdev=net0 \
    -drive if=virtio,file=FreeBSD-15.0-ALPHA5-amd64-ufs-20251004-1c0898edf28f-280541.qcow2,if=virtio

and this is on an AMD Ryzen 9 PRO 6950HS, Linux 6.16 host
Ah, wrong qemu command line, but the only difference is the SMP number. Happens for both 4 and 12. Let me try on a single core... <insert jeopardy music> ...nope, can't seem to make it happen on a single core. Let me try 2 as well... <insert jeopardy music> ...interesting, 2 cores apparently also won't make it show up (within 250 attempts). Now 3 cores... <insert jeopardy music> got it on attempt 28 with 3 cores. If this is a 3-way race/deadlock condition, I'll be rather impressed.
changed the 2nd sleep from 1 to 0.1, and gave it 2000 tries each:

2 cores: 0 out of 2000
3 cores: 42 out of 2000
4 cores: 433 out of 2000
5 cores: 651 out of 2000

The mathematician in me wants to try 2.5 cores now, but alas, qemu does not seem to support half cores %-)
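For comparison with the percentages quoted earlier in the thread, the counts above can be turned into rates. This is only a convenience sketch; `rate_pct` is a hypothetical helper name, not something from the report, and it uses integer shell arithmetic with round-half-up to one decimal place.

```shell
#!/bin/sh
# rate_pct HITS TRIES
# Prints HITS/TRIES as a percentage rounded to one decimal place,
# using only integer shell arithmetic (hypothetical helper).
rate_pct() {
    # tenths of a percent, rounded half up
    tenths=$(( ($1 * 1000 + $2 / 2) / $2 ))
    echo "$((tenths / 10)).$((tenths % 10))"
}
```

Called as `rate_pct 433 2000`, for example; for the four rows above this works out to 0.0, 2.1, 21.7 and 32.6 percent for 2, 3, 4 and 5 cores.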
Hum, still no luck for me when varying the number of cores. I made 300 attempts with 3 and 4 vCPUs and don't see the problem. The procstat output doesn't show anything, all of the kernel threads seem to be quiescent. Can you show output from `jls --libxo json,pretty -vdh`?
Created attachment 264704
jls --libxo json,pretty -vdh

(In reply to Mark Johnston from comment #7)

Output of "jls --libxo json,pretty -vdh" attached

I can get you access to the VM and the hypervisor (in case you want to attach gdb), but maybe not this week (I'm at RIPE91). I'm not sure what's different here, it's quite easy to reproduce with >= 4 CPUs.

I did "improve" the reproducer (removing the "set -x" makes it more likely because there's less "writing things to terminal" going on; the sleep duration doesn't seem to matter):

#!/bin/sh
root="/root/base_txz" # contents of base.txz
jail -i -c "path=$root" host.hostname=test command=/bin/sh -c "sleep 3" | {
    read jid
    echo $jid > /tmp/.jid # not necessary, only for automation, see below
    sleep 0.1
    jail -r $jid
}
# time between "jail -r" and "jls -d" must be minimal
jls -d

# for automated reproduction with
#   while ./jailtest.sh; do true; done
for I in $(seq 0 10); do
    jls -d -j $(cat /tmp/.jid) || exit 0
    sleep .2
done
exit 1
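Because the script above exits 1 when the jail is still listed and 0 otherwise, a small counting harness can collect "N out of M" figures. The following is a sketch, not the harness actually used in this thread; `count_failures` is a hypothetical name, and it assumes the reproducer is saved as ./jailtest.sh as in the comment inside the script.

```shell
#!/bin/sh
# count_failures TRIES CMD [ARGS...]
# Runs CMD TRIES times and prints "<failures> out of <tries>",
# counting every non-zero exit status as a failure (hypothetical helper).
count_failures() {
    tries=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$tries" ]; do
        # Output is discarded; only the exit status is counted.
        "$@" >/dev/null 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures out of $tries"
}
```

A run would look like `count_failures 2000 ./jailtest.sh`. Note that discarding the output is itself a timing change, and earlier measurements in this thread suggest that where stderr goes affects how often the problem shows up.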
I can reproduce it now. It helps to put some CPU load on the host. If I remove the jls invocations I can still reproduce the problem. In the past, I found such bugs to be caused by credential reference leaks. I tried adding a global list of credentials and inspected it with a debugger after the problem occurs, but none of them refer to the dying jail, so presumably the problem is elsewhere.
Jamie, I think there is a regression from commit 851dc7f859c23: sys_prison_remove() bumps the jail refcount and calls prison_remove(), which bails without releasing the reference if the jail is already dying. Could you please take a look?
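The kind of leak described above can be illustrated with a toy model. This is not kernel code and does not reproduce kern_jail.c; every name in it (`refs`, `dying`, `hold`, `release`, `remove_leaky`, `remove_balanced`) is hypothetical, and the balanced variant only shows the most direct way to pair the reference, which is not necessarily how the eventual fix is structured.

```shell
#!/bin/sh
# Toy model only: shell variables and functions, not the kernel's prison
# structures. All names below are hypothetical.
refs=1     # references currently held on the jail
dying=0    # 1 once removal has started

hold()    { refs=$((refs + 1)); }
release() { refs=$((refs - 1)); }

# Leaky pattern: the early return for an already-dying jail skips release().
remove_leaky() {
    hold
    if [ "$dying" -eq 1 ]; then
        return 0
    fi
    dying=1
    release    # drop the reference taken by hold() above
    release    # drop the reference that kept the jail alive
}

# Balanced pattern: every path drops the reference taken by hold().
remove_balanced() {
    hold
    if [ "$dying" -eq 1 ]; then
        release
        return 0
    fi
    dying=1
    release
    release
}
```

Starting from `refs=1; dying=1` (a dying jail kept around by one remaining holder), `remove_leaky` leaves the count at 2, so when that holder later calls `release` the count stops at 1 and never reaches 0. With `remove_balanced` the same sequence ends at 0.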
(In reply to Mark Johnston from comment #10) Yes, I see the problem. I'll work up a fix.
I've created https://reviews.freebsd.org/D53200 to fix this. The fix is somewhat roundabout. I could have added the code to just drop the jail in the already-dying part of prison_remove, but I noticed that it would take a call to prison_deref, almost identical to the call already made. So I collapsed that function into just a prison_deref call with a couple of asserts attached, and then modified the PD_KILL part of prison_deref to make that test instead. I haven't been able to replicate the problem, but I can at least say I haven't seen the fix break anything. I'd appreciate it being tested by someone who has seen it fail (Mark, in particular I've made you a reviewer on the diff).
(In reply to Jamie Gritton from comment #12) I've applied the patch on top of 15.0-ALPHA5 and can confirm I no longer see hung dying jails, after a reboot where I previously saw one on the 4th attempt. Thanks for the fix!
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=78f70d4ff9dd4af2318b25023a7f55be7402ec60

commit 78f70d4ff9dd4af2318b25023a7f55be7402ec60
Author:     Jamie Gritton <jamie@FreeBSD.org>
AuthorDate: 2025-10-20 16:49:14 +0000
Commit:     Jamie Gritton <jamie@FreeBSD.org>
CommitDate: 2025-10-20 16:49:14 +0000

    jail: fix a regression that creates zombies when removing dying jails

    When adding jail descriptors, I split sys_jail remove in two, and didn't properly track jail held between them when a jail was dying. This fixes that as well as cleaning up the logic behind it.

    PR: 290217
    Reported by: David 'equinox' Lamparter <equinox at diac24.net>
    Reviewed by: markj
    MFC after: 3 days
    Differential Revision: https://reviews.freebsd.org/D53200

 sys/kern/kern_jail.c | 21 ++++++++++-----------
 1 file changed, 10 insertions(+), 11 deletions(-)
A commit in branch stable/15 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=2d3c6a06edc3919455d1152f4ffaa60697e2c4f2

commit 2d3c6a06edc3919455d1152f4ffaa60697e2c4f2
Author:     Jamie Gritton <jamie@FreeBSD.org>
AuthorDate: 2025-10-20 16:49:14 +0000
Commit:     Jamie Gritton <jamie@FreeBSD.org>
CommitDate: 2025-10-23 04:37:01 +0000

    jail: fix a regression that creates zombies when removing dying jails

    When adding jail descriptors, I split sys_jail remove in two, and didn't properly track jail held between them when a jail was dying. This fixes that as well as cleaning up the logic behind it.

    PR: 290217
    Reported by: David 'equinox' Lamparter <equinox at diac24.net>
    Reviewed by: markj
    MFC after: 3 days
    Differential Revision: https://reviews.freebsd.org/D53200

    (cherry picked from commit 78f70d4ff9dd4af2318b25023a7f55be7402ec60)

 sys/kern/kern_jail.c | 21 ++++++++++-----------
 1 file changed, 10 insertions(+), 11 deletions(-)
A commit in branch releng/15.0 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=c37d95826ab5a9becb491396a6522f442680d25f

commit c37d95826ab5a9becb491396a6522f442680d25f
Author:     Jamie Gritton <jamie@FreeBSD.org>
AuthorDate: 2025-10-20 16:49:14 +0000
Commit:     Colin Percival <cperciva@FreeBSD.org>
CommitDate: 2025-10-30 04:23:18 +0000

    jail: fix a regression that creates zombies when removing dying jails

    When adding jail descriptors, I split sys_jail remove in two, and didn't properly track jail held between them when a jail was dying. This fixes that as well as cleaning up the logic behind it.

    Approved by: re (cperciva)
    PR: 290217
    Reported by: David 'equinox' Lamparter <equinox at diac24.net>
    Reviewed by: markj
    MFC after: 3 days
    Differential Revision: https://reviews.freebsd.org/D53200

    (cherry picked from commit 78f70d4ff9dd4af2318b25023a7f55be7402ec60)
    (cherry picked from commit 2d3c6a06edc3919455d1152f4ffaa60697e2c4f2)

 sys/kern/kern_jail.c | 21 ++++++++++-----------
 1 file changed, 10 insertions(+), 11 deletions(-)
Thank you for the report and the reproducer.