Bug 238032

Summary: jexec pr_ref leak
Product: Base System Reporter: Bjoern A. Zeeb <bz>
Component: kernAssignee: Bjoern A. Zeeb <bz>
Status: Closed FIXED    
Severity: Affects Some People CC: bz, jamie, kevans, kib, wiml
Priority: ---    
Version: CURRENT   
Hardware: Any   
OS: Any   

Description Bjoern A. Zeeb freebsd_committer freebsd_triage 2019-05-21 21:13:53 UTC
Just saving this to a PR as I cannot constantly work on it and don't want to lose it.  Also if others have seen this (not this is not waiting on a TCP close or similar) then they might chime in here on when this started...  Please do NOT add a mailing list to Cc:



The simplest test case I could find is:

jail -c name=foo persist
jexec foo
^d
jail -r foo

The jail should be gone but isn't...  breaking into the debugger there is one pr_ref left.

prison 0xfffff80009ff1000:
 jid             = 1
 name            = foo
 parent          = 0xffffffff81b01960
 ref             = 1
 uref            = 0
 path            = /
 cpuset          = 2
 vnet            = 0xfffff80003104500
 root            = 0xfffff80009685b40
 securelevel     = -1
 devfs_rsnum     = 0
 children.max    = 0
 children.cur    = 0
 child           = 0
 sibling         = 0
 flags           = 0x600018e ip4.saddrsel ip6.saddrsel host            = new
 vnet            = inherit
 ip4             = disable
 ip6             = disable
 allow           = 0x8201 allow.set_hostname allow.reserved_ports allow.unprivileged_proc_debug
 enforce_statfs  = 2
 host.hostname   =
 host.domainname =
 host.hostuuid   = 00000000-0000-0000-0000-000000000000
 host.hostid     = 0
 ip4s            = 0
 ip6s            = 0

Doing some instrumentation I find that this is what happened to pr_ref (reference logged after the ++ or -- operation):

db> show pr_ref 0xfffff80009ff1000
66    0xfffff80009ff1000 1     kern_jail_set:1901
67    0xfffff80009ff1000 2     kern_jail_get:2073
68    0xfffff80009ff1000 1     prison_deref:2644
69    0xfffff80009ff1000 2     do_jail_attach:2419
72    0xfffff80009ff1000 3     kern_jail_get:2073
73    0xfffff80009ff1000 2     prison_deref:2644
74    0xfffff80009ff1000 3     sys_jail_remove:2285
75    0xfffff80009ff1000 2     prison_remove_one:2326
76    0xfffff80009ff1000 3     prison_deref:2636
77    0xfffff80009ff1000 2     prison_deref:2644
Comment 1 Bjoern A. Zeeb freebsd_committer freebsd_triage 2019-05-21 21:18:57 UTC
Ignore the trace for now; the instrumentation moves things around and also missed a place.
Comment 2 Bjoern A. Zeeb freebsd_committer freebsd_triage 2019-05-21 22:02:06 UTC
Ok, func/line are still offset but the references come out right now :-)

1st try to make sense of things:

db> show pr_ref 0xfffff8001a234000
106   0xfffff8001a234000 1     kern_jail_set:1905       A ++ PERSIST

107   0xfffff8001a234000 2     kern_jail_get:2077       B ++ found_prison:
108   0xfffff8001a234000 1     prison_deref:2648        B -- PD_DEREF

109   0xfffff8001a234000 2     do_jail_attach:2423      C ++ ?

112   0xfffff8001a234000 3     kern_jail_get:2077       D ++ found_prison:
113   0xfffff8001a234000 2     prison_deref:2648        D -- PD_DEREF

114   0xfffff8001a234000 3     sys_jail_remove:2289     E ++ JR
115   0xfffff8001a234000 2     prison_remove_one:2330   A -- PERSIST
<not the last one here>?
116   0xfffff8001a234000 3     prison_deref:2640        F ++ last_uref

117   0xfffff8001a234000 2     prison_deref:2648        G -- PD_DEREF
118   0xfffff8001a234000 1     prison_deref:2664        F -- last_uref

2nd try to pair them up:

db> show pr_ref 0xfffff80017051000
105   0xfffff80017051000 1     kern_jail_set:1912                          | ++ PERSIST

106   0xfffff80017051000 2     kern_jail_get:2084                          | ++ found_prison:
107   0xfffff80017051000 1     _prison_deref:2655 kern_jail_get:2213       | -- found_prison:

108   0xfffff80017051000 2     do_jail_attach:2430                         | ++ ------- what is this one added for? or why do we not lose it in time before jail_remove?
      (it's right at the top of the function, after the XXX comment)
 
111   0xfffff80017051000 3     kern_jail_get:2084                          | ++ found_prison:
112   0xfffff80017051000 2     _prison_deref:2655 kern_jail_get:2213       | -- found_prison:

   <<<< SOMETHING MISSING HERE, I assume when jexec leaves the jail >>>>

113   0xfffff80017051000 3     sys_jail_remove:2296                        | ++ JR

114   0xfffff80017051000 2     prison_remove_one:2337                      | -- PERSIST
<not at 1 yet so this is not the only reference which jail_remove had added>
<so running into:  Remove the temporary reference added by jail_remove>
115   0xfffff80017051000 3     _prison_deref:2647 prison_remove_one:2369   | ++ lasturef 
116   0xfffff80017051000 2     _prison_deref:2655 prison_remove_one:2369   | -- JR
117   0xfffff80017051000 1     _prison_deref:2671 prison_remove_one:2369   | -- lasturef
Comment 3 Bjoern A. Zeeb freebsd_committer freebsd_triage 2019-05-21 22:12:36 UTC
Ok, verified that if doing a
jail -c name=foo persist
jexec foo
^d
jexec foo
^d
jexec foo
^d
jexec foo
^d
jexec foo
^d
jail -r foo

I get a pr_ref = 5 left:

prison 0xfffff80009f59000:
 jid             = 1
 name            = foo
 parent          = 0xffffffff81b01960
 ref             = 5
 uref            = 0
 path            = /
...


db> show pr_ref 0xfffff80009f59000
127   0xfffff80009f59000 1     kern_jail_set:1912 kern_jail_set:1912
128   0xfffff80009f59000 2     kern_jail_get:2084 kern_jail_get:2084
129   0xfffff80009f59000 1     _prison_deref:2655 kern_jail_get:2213
130   0xfffff80009f59000 2     do_jail_attach:2430 do_jail_attach:2430
133   0xfffff80009f59000 3     kern_jail_get:2084 kern_jail_get:2084
134   0xfffff80009f59000 2     _prison_deref:2655 kern_jail_get:2213
135   0xfffff80009f59000 3     do_jail_attach:2430 do_jail_attach:2430
138   0xfffff80009f59000 4     kern_jail_get:2084 kern_jail_get:2084
139   0xfffff80009f59000 3     _prison_deref:2655 kern_jail_get:2213
140   0xfffff80009f59000 4     do_jail_attach:2430 do_jail_attach:2430
143   0xfffff80009f59000 5     kern_jail_get:2084 kern_jail_get:2084
144   0xfffff80009f59000 4     _prison_deref:2655 kern_jail_get:2213
145   0xfffff80009f59000 5     do_jail_attach:2430 do_jail_attach:2430
148   0xfffff80009f59000 6     kern_jail_get:2084 kern_jail_get:2084
149   0xfffff80009f59000 5     _prison_deref:2655 kern_jail_get:2213
150   0xfffff80009f59000 6     do_jail_attach:2430 do_jail_attach:2430
153   0xfffff80009f59000 7     kern_jail_get:2084 kern_jail_get:2084
154   0xfffff80009f59000 6     _prison_deref:2655 kern_jail_get:2213
155   0xfffff80009f59000 7     sys_jail_remove:2296 sys_jail_remove:2296
156   0xfffff80009f59000 6     prison_remove_one:2337 prison_remove_one:2337
157   0xfffff80009f59000 7     _prison_deref:2647 prison_remove_one:2369
158   0xfffff80009f59000 6     _prison_deref:2655 prison_remove_one:2369
159   0xfffff80009f59000 5     _prison_deref:2671 prison_remove_one:2369
Comment 4 Denis Salopek 2019-08-27 09:55:06 UTC
Hi.

I can replicate this by running (after a reboot):
# jail -c name=foo persist
# jexec foo sh
# jail -r foo

After that, 'jls -dv' shows jail 1 as 'dying'.

I accidentally found out that it will disappear when copying/creating a big file on my VM, for example:

# dd if=/dev/zero of=/tmp/bigfile bs=1M count=500


If (after a reboot) I first run:
# sh
and then
# jail -c name=foo persist
# jexec foo sh
# jail -r foo

'jls -dv' doesn't show jail 1.

Revision r308442 is the first commit with this behavior (https://svnweb.freebsd.org/base?view=revision&revision=308442).
Comment 5 Jamie Gritton freebsd_committer freebsd_triage 2019-08-27 16:19:14 UTC
r308442 as a culprit makes sense - the whole point of that rev is to pass td_ucred to buffer allocation.  Before then the buffer cache didn't reference the reader's prison.

I don't know what this is supposed to accomplish, as I don't see off hand why it would be useful to know to credentials of the original reader of a cached page.  Especially when the reader can be long gone, as in the case of a removed prison.
Comment 6 Kyle Evans freebsd_committer freebsd_triage 2020-02-18 22:45:31 UTC
Receiveda nother report of this today on IRC... it seems sensible CC kib@ as the committer of r308442, which this has been tracked down to, for discussion- so I'm doing so.
Comment 7 Konstantin Belousov freebsd_committer freebsd_triage 2020-02-20 16:28:53 UTC
Buffer' credentials are needed for correct operation of non-local filesystems.  E.g. NFS needs to know credentials of the caller to issue read rpc.  Otherwise, rpc is sent as if it is originating by the root user, that often is mapped to 'nobody' on the server.

Please try https://reviews.freebsd.org/D23775.
Comment 8 Bjoern A. Zeeb freebsd_committer freebsd_triage 2020-02-26 22:13:16 UTC
(In reply to Konstantin Belousov from comment #7)

Tried the diff on a netbooted NFSRoot machine with the command sequence from the first posting here;  the jail stayed with 1 pr_ref.
I assume it has to do with the nfsroot;  the nfs server is a freebsd (zoo) in case it matters.

And for my reference; the original instrumentation was in the git_bz_vimage tree.
Comment 9 commit-hook freebsd_committer freebsd_triage 2020-03-05 15:53:32 UTC
A commit references this bug:

Author: kib
Date: Thu Mar  5 15:52:34 UTC 2020
New revision: 358676
URL: https://svnweb.freebsd.org/changeset/base/358676

Log:
  buffer pager: deref ucred immediately after read.

  Ucred is passed to bread(9) so that non-local filesystems use proper
  credentials.  But, since clean buffer might be cached unless
  buf_pager_relbuf is not enabled, it makes credentials to have extra
  reference until buffer is reclaimed.  Ucred reference would prevent
  jail from destroying if creds are jailed.

  Dereferencing the read credentials on the valid buffer avoid that, and
  should be fine because the buffer is valid and does not need re-read.

  PR:	238032
  Reported by:	bz
  Reproduced and tested by:	pho
  Sponsored by:	The FreeBSD Foundation
  MFC after:	1 week
  Differential revision:	https://reviews.freebsd.org/D23775

Changes:
  head/sys/kern/vfs_bio.c
Comment 10 Jamie Gritton freebsd_committer freebsd_triage 2020-08-29 22:37:29 UTC
I'm assuming kib's commit has cleaned this one up, as I'm no longer able to replicate on CURRENT or STABLE-12.