Just saving this to a PR as I cannot constantly work on it and don't want to lose it. Also if others have seen this (not this is not waiting on a TCP close or similar) then they might chime in here on when this started... Please do NOT add a mailing list to Cc: The simplest test case I could find is: jail -c name=foo persist jexec foo ^d jail -r foo The jail should be gone but isn't... breaking into the debugger there is one pr_ref left. prison 0xfffff80009ff1000: jid = 1 name = foo parent = 0xffffffff81b01960 ref = 1 uref = 0 path = / cpuset = 2 vnet = 0xfffff80003104500 root = 0xfffff80009685b40 securelevel = -1 devfs_rsnum = 0 children.max = 0 children.cur = 0 child = 0 sibling = 0 flags = 0x600018e ip4.saddrsel ip6.saddrsel host = new vnet = inherit ip4 = disable ip6 = disable allow = 0x8201 allow.set_hostname allow.reserved_ports allow.unprivileged_proc_debug enforce_statfs = 2 host.hostname = host.domainname = host.hostuuid = 00000000-0000-0000-0000-000000000000 host.hostid = 0 ip4s = 0 ip6s = 0 Doing some instrumentation I find that this is what happened to pr_ref (reference logged after the ++ or -- operation): db> show pr_ref 0xfffff80009ff1000 66 0xfffff80009ff1000 1 kern_jail_set:1901 67 0xfffff80009ff1000 2 kern_jail_get:2073 68 0xfffff80009ff1000 1 prison_deref:2644 69 0xfffff80009ff1000 2 do_jail_attach:2419 72 0xfffff80009ff1000 3 kern_jail_get:2073 73 0xfffff80009ff1000 2 prison_deref:2644 74 0xfffff80009ff1000 3 sys_jail_remove:2285 75 0xfffff80009ff1000 2 prison_remove_one:2326 76 0xfffff80009ff1000 3 prison_deref:2636 77 0xfffff80009ff1000 2 prison_deref:2644
Ignore the trace for now; the instrumentation moves things around and also missed a place.
Ok, func/line are still offset but the references come out right now :-) 1st try to make sense of things: db> show pr_ref 0xfffff8001a234000 106 0xfffff8001a234000 1 kern_jail_set:1905 A ++ PERSIST 107 0xfffff8001a234000 2 kern_jail_get:2077 B ++ found_prison: 108 0xfffff8001a234000 1 prison_deref:2648 B -- PD_DEREF 109 0xfffff8001a234000 2 do_jail_attach:2423 C ++ ? 112 0xfffff8001a234000 3 kern_jail_get:2077 D ++ found_prison: 113 0xfffff8001a234000 2 prison_deref:2648 D -- PD_DEREF 114 0xfffff8001a234000 3 sys_jail_remove:2289 E ++ JR 115 0xfffff8001a234000 2 prison_remove_one:2330 A -- PERSIST <not the last one here>? 116 0xfffff8001a234000 3 prison_deref:2640 F ++ last_uref 117 0xfffff8001a234000 2 prison_deref:2648 G -- PD_DEREF 118 0xfffff8001a234000 1 prison_deref:2664 F -- last_uref 2nd try to pair them up: db> show pr_ref 0xfffff80017051000 105 0xfffff80017051000 1 kern_jail_set:1912 | ++ PERSIST 106 0xfffff80017051000 2 kern_jail_get:2084 | ++ found_prison: 107 0xfffff80017051000 1 _prison_deref:2655 kern_jail_get:2213 | -- found_prison: 108 0xfffff80017051000 2 do_jail_attach:2430 | ++ ------- what is this one added for? or why do we not lose it in time before jail_remove? (it's right at the top of the function, after the XXX comment) 111 0xfffff80017051000 3 kern_jail_get:2084 | ++ found_prison: 112 0xfffff80017051000 2 _prison_deref:2655 kern_jail_get:2213 | -- found_prison: <<<< SOMETHING MISSING HERE, I assume when jexec leaves the jail >>>> 113 0xfffff80017051000 3 sys_jail_remove:2296 | ++ JR 114 0xfffff80017051000 2 prison_remove_one:2337 | -- PERSIST <not at 1 yet so this is not the only reference which jail_remove had added> <so running into: Remove the temporary reference added by jail_remove> 115 0xfffff80017051000 3 _prison_deref:2647 prison_remove_one:2369 | ++ lasturef 116 0xfffff80017051000 2 _prison_deref:2655 prison_remove_one:2369 | -- JR 117 0xfffff80017051000 1 _prison_deref:2671 prison_remove_one:2369 | -- lasturef
Ok, verified that if doing a jail -c name=foo persist jexec foo ^d jexec foo ^d jexec foo ^d jexec foo ^d jexec foo ^d jail -r foo I get a pr_ref = 5 left: prison 0xfffff80009f59000: jid = 1 name = foo parent = 0xffffffff81b01960 ref = 5 uref = 0 path = / ... db> show pr_ref 0xfffff80009f59000 127 0xfffff80009f59000 1 kern_jail_set:1912 kern_jail_set:1912 128 0xfffff80009f59000 2 kern_jail_get:2084 kern_jail_get:2084 129 0xfffff80009f59000 1 _prison_deref:2655 kern_jail_get:2213 130 0xfffff80009f59000 2 do_jail_attach:2430 do_jail_attach:2430 133 0xfffff80009f59000 3 kern_jail_get:2084 kern_jail_get:2084 134 0xfffff80009f59000 2 _prison_deref:2655 kern_jail_get:2213 135 0xfffff80009f59000 3 do_jail_attach:2430 do_jail_attach:2430 138 0xfffff80009f59000 4 kern_jail_get:2084 kern_jail_get:2084 139 0xfffff80009f59000 3 _prison_deref:2655 kern_jail_get:2213 140 0xfffff80009f59000 4 do_jail_attach:2430 do_jail_attach:2430 143 0xfffff80009f59000 5 kern_jail_get:2084 kern_jail_get:2084 144 0xfffff80009f59000 4 _prison_deref:2655 kern_jail_get:2213 145 0xfffff80009f59000 5 do_jail_attach:2430 do_jail_attach:2430 148 0xfffff80009f59000 6 kern_jail_get:2084 kern_jail_get:2084 149 0xfffff80009f59000 5 _prison_deref:2655 kern_jail_get:2213 150 0xfffff80009f59000 6 do_jail_attach:2430 do_jail_attach:2430 153 0xfffff80009f59000 7 kern_jail_get:2084 kern_jail_get:2084 154 0xfffff80009f59000 6 _prison_deref:2655 kern_jail_get:2213 155 0xfffff80009f59000 7 sys_jail_remove:2296 sys_jail_remove:2296 156 0xfffff80009f59000 6 prison_remove_one:2337 prison_remove_one:2337 157 0xfffff80009f59000 7 _prison_deref:2647 prison_remove_one:2369 158 0xfffff80009f59000 6 _prison_deref:2655 prison_remove_one:2369 159 0xfffff80009f59000 5 _prison_deref:2671 prison_remove_one:2369
Hi. I can replicate this by running (after a reboot): # jail -c name=foo persist # jexec foo sh # jail -r foo After that, 'jls -dv' shows jail 1 as 'dying'. I accidentally found out that it will disappear when copying/creating a big file on my VM, for example: # dd if=/dev/zero of=/tmp/bigfile bs=1M count=500 If (after a reboot) I first run: # sh and then # jail -c name=foo persist # jexec foo sh # jail -r foo 'jls -dv' doesn't show jail 1. Revision r308442 is the first commit with this behavior (https://svnweb.freebsd.org/base?view=revision&revision=308442).
r308442 as a culprit makes sense - the whole point of that rev is to pass td_ucred to buffer allocation. Before then the buffer cache didn't reference the reader's prison. I don't know what this is supposed to accomplish, as I don't see off hand why it would be useful to know to credentials of the original reader of a cached page. Especially when the reader can be long gone, as in the case of a removed prison.
Receiveda nother report of this today on IRC... it seems sensible CC kib@ as the committer of r308442, which this has been tracked down to, for discussion- so I'm doing so.
Buffer' credentials are needed for correct operation of non-local filesystems. E.g. NFS needs to know credentials of the caller to issue read rpc. Otherwise, rpc is sent as if it is originating by the root user, that often is mapped to 'nobody' on the server. Please try https://reviews.freebsd.org/D23775.
(In reply to Konstantin Belousov from comment #7) Tried the diff on a netbooted NFSRoot machine with the command sequence from the first posting here; the jail stayed with 1 pr_ref. I assume it has to do with the nfsroot; the nfs server is a freebsd (zoo) in case it matters. And for my reference; the original instrumentation was in the git_bz_vimage tree.
A commit references this bug: Author: kib Date: Thu Mar 5 15:52:34 UTC 2020 New revision: 358676 URL: https://svnweb.freebsd.org/changeset/base/358676 Log: buffer pager: deref ucred immediately after read. Ucred is passed to bread(9) so that non-local filesystems use proper credentials. But, since clean buffer might be cached unless buf_pager_relbuf is not enabled, it makes credentials to have extra reference until buffer is reclaimed. Ucred reference would prevent jail from destroying if creds are jailed. Dereferencing the read credentials on the valid buffer avoid that, and should be fine because the buffer is valid and does not need re-read. PR: 238032 Reported by: bz Reproduced and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D23775 Changes: head/sys/kern/vfs_bio.c
I'm assuming kib's commit has cleaned this one up, as I'm no longer able to replicate on CURRENT or STABLE-12.