Bug 237656 - "Freed UMA keg (rtentry) was not empty (18 items). Lost 1 pages of memory." seen when running sys/netipsec tests
Summary: "Freed UMA keg (rtentry) was not empty (18 items). Lost 1 pages of memory." ...
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: CURRENT
Hardware: Any Any
: --- Affects Some People
Assignee: Kristof Provost
URL:
Keywords: vimage
Depends on:
Blocks:
 
Reported: 2019-04-29 19:48 UTC by Enji Cooper
Modified: 2021-04-23 15:52 UTC (History)
8 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Enji Cooper freebsd_committer freebsd_triage 2019-04-29 19:48:01 UTC
I see multiple warnings from the UMA keg code when running the netipsec tunnel tests about leaked UMA kegs. From https://ci.freebsd.org/job/FreeBSD-head-amd64-test/11014/consoleText :

sys/netipsec/tunnel/aes_cbc_128_hmac_sha1:v4  ->  lock order reversal:
 1st 0xffffffff82096810 allprison (allprison) @ /usr/src/sys/kern/kern_jail.c:966
 2nd 0xffffffff820c4830 vnet_sysinit_sxlock (vnet_sysinit_sxlock) @ /usr/src/sys/net/vnet.c:575
stack backtrace:
#0 0xffffffff80c47773 at witness_debugger+0x73
#1 0xffffffff80c474bd at witness_checkorder+0xa7d
#2 0xffffffff80be9008 at _sx_slock_int+0x68
#3 0xffffffff80d0f687 at vnet_alloc+0x117
#4 0xffffffff80ba40a1 at kern_jail_set+0x1bb1
#5 0xffffffff80ba5b00 at sys_jail_set+0x40
#6 0xffffffff810b2e16 at amd64_syscall+0x276
#7 0xffffffff8108b5fd at fast_syscall_common+0x101
Freed UMA keg (rtentry) was not empty (18 items).  Lost 1 pages of memory.
passed  [0.301s]
sys/netipsec/tunnel/aes_cbc_128_hmac_sha1:v6  ->  passed  [0.278s]
sys/netipsec/tunnel/aes_cbc_256_hmac_sha2_256:v4  ->  aesni0: <AES-CBC,AES-XTS,AES-GCM,AES-ICM> on motherboard
Freed UMA keg (rtentry) was not empty (18 items).  Lost 1 pages of memory.
Freed UMA keg (rtentry) was not empty (18 items).  Lost 1 pages of memory.
passed  [0.275s]
sys/netipsec/tunnel/aes_cbc_256_hmac_sha2_256:v6  ->  passed  [0.278s]
sys/netipsec/tunnel/aes_gcm_128:v4  ->  aesni0: detached
Apr 29 18:27:50  kernel: nd6_dad_timer: called with non-tentative address fe80:3::20:9bff:fe26:1c0a(epair2a)

Freed UMA keg (rtentry) was not empty (18 items).  Lost 1 pages of memory.
Freed UMA keg (rtentry) was not empty (18 items).  Lost 1 pages of memory.
passed  [0.285s]
sys/netipsec/tunnel/aes_gcm_128:v6  ->  passed  [0.278s]
sys/netipsec/tunnel/aes_gcm_256:v4  ->  Freed UMA keg (rtentry) was not empty (18 items).  Lost 1 pages of memory.
Freed UMA keg (rtentry) was not empty (18 items).  Lost 1 pages of memory.
passed  [0.287s]
sys/netipsec/tunnel/aes_gcm_256:v6  ->  passed  [0.280s]
sys/netipsec/tunnel/aesni_aes_cbc_128_hmac_sha1:v4  ->  aesni0: <AES-CBC,AES-XTS,AES-GCM,AES-ICM> on motherboard
Freed UMA keg (rtentry) was not empty (18 items).  Lost 1 pages of memory.
Freed UMA keg (rtentry) was not empty (18 items).  Lost 1 pages of memory.
passed  [0.291s]
sys/netipsec/tunnel/aesni_aes_cbc_128_hmac_sha1:v6  ->  passed  [0.292s]
sys/netipsec/tunnel/aesni_aes_cbc_256_hmac_sha2_256:v4  ->  aesni0: detached
Freed UMA keg (rtentry) was not empty (18 items).  Lost 1 pages of memory.
Freed UMA keg (rtentry) was not empty (18 items).  Lost 1 pages of memory.
passed  [0.296s]
sys/netipsec/tunnel/aesni_aes_cbc_256_hmac_sha2_256:v6  ->  passed  [0.257s]
sys/netipsec/tunnel/aesni_aes_gcm_128:v4  ->  aesni0: <AES-CBC,AES-XTS,AES-GCM,AES-ICM> on motherboard
Freed UMA keg (rtentry) was not empty (18 items).  Lost 1 pages of memory.
Freed UMA keg (rtentry) was not empty (18 items).  Lost 1 pages of memory.
passed  [0.256s]
sys/netipsec/tunnel/aesni_aes_gcm_128:v6  ->  passed  [0.254s]
sys/netipsec/tunnel/aesni_aes_gcm_256:v4  ->  Freed UMA keg (rtentry) was not empty (18 items).  Lost 1 pages of memory.
Freed UMA keg (rtentry) was not empty (18 items).  Lost 1 pages of memory.
passed  [0.262s]
sys/netipsec/tunnel/aesni_aes_gcm_256:v6  ->  passed  [0.291s]
sys/netipsec/tunnel/empty:v4  ->  Freed UMA keg (rtentry) was not empty (18 items).  Lost 1 pages of memory.
Freed UMA keg (rtentry) was not empty (18 items).  Lost 1 pages of memory.
passed  [0.285s]
sys/netipsec/tunnel/empty:v6  ->  passed  [0.282s]

Suggested repro:

* Boot a WITNESS enabled kernel with IPSEC support.
* Run `sudo kyua test -k /usr/tests/sys/netipsec/tunnel/Kyuafile`.
Comment 1 Bjoern A. Zeeb freebsd_committer freebsd_triage 2019-04-29 19:57:15 UTC
Missing cleanup with some component during VNET shutdown.
Comment 2 Bjoern A. Zeeb freebsd_committer freebsd_triage 2019-05-18 18:20:17 UTC
r346890 before I go and look it up again
Comment 3 Ed Maste freebsd_committer freebsd_triage 2019-07-31 13:33:49 UTC
(In reply to Bjoern A. Zeeb from comment #2)
Seems r346890 https://reviews.freebsd.org/rS346890 is the wrong revision?
Comment 4 Ryan Libby freebsd_committer freebsd_triage 2020-01-19 02:03:43 UTC
The leak report goes all the way back to the introduction of the test in
r326497, two years ago.

https://ci.freebsd.org/job/FreeBSD-head-amd64-test/5285/
https://svnweb.freebsd.org/base?view=revision&revision=326497

Here is a narrower repro that worked for me on a GENERIC-NODEBUG kernel:
kldload ipsec
kyua test -k /usr/tests/sys/netipsec/tunnel/Kyuafile empty:v4

The ping in ist_test() appears to be required for the leak, no leak is
reported when it is commented out.  Adding a route -n flush after the
ping did not rescue the leak.
Comment 5 Mason Loring Bliss freebsd_triage 2020-07-07 13:22:53 UTC
I saw this (or a very similar) bug just now while shutting down a couple 
jails. 

Jul  7 08:57:16 nile kernel: epair2a: promiscuous mode disabled
Jul  7 08:57:16 nile kernel: epair2a: link state changed to DOWN
Jul  7 08:57:16 nile kernel: epair2b: link state changed to DOWN
Jul  7 08:57:16 nile kernel: in6_purgeaddr: err=65, destination address delete failed
Jul  7 08:57:16 nile kernel: Freed UMA keg (rtentry) was not empty (54 items).
Lost 3 pages of memory.

I had shut down one additional jail a moment before, and then two more, and 
it was prior to this returning that I experienced a spontaneous reboot of 
the host.

The jail config in action for this:

    exec.prestart += "ifconfig epair${ep} create up";
    exec.prestart += "ifconfig $bridge addm epair${ep}a";
    exec.prestart += "ifconfig epair${ep}b link $mac";

    exec.start += "dhclient epair${ep}b";

    exec.poststop += "ifconfig $bridge deletem epair${ep}a";
    exec.poststop += "ifconfig epair${ep}a destroy";

I believe this may have been triggered by a user error, where I forgot to 
increment one of my jails' ep values before starting it yesterday, thus 
trying to run the prestart sequence against an already extant epair. It 
failed, I noticed and corrected the error, and had no subsequent problems, 
but it's notable that my first attempt to run deletem on the epair that had 
the bad, duplicate create attempt yesterday resulted in the system becoming 
unresponsive and spontaneously rebooting.

Note that this is was the last message in the log before the reboot, so it 
sticks out but isn't definitively the issue. That said, the server wasn't 
doing anything else at the time.
Comment 6 Mason Loring Bliss freebsd_triage 2020-07-07 13:23:54 UTC
Sorry, forgot to mention that this happened on 12.1-RELEASE-p6.
Comment 7 Mason Loring Bliss freebsd_triage 2020-07-25 21:28:24 UTC
A workaround, thanks to kevans and antranigv:

    exec.prestop = "/usr/sbin/jexec ${name} /bin/sh /etc/rc.shutdown";
    exec.prestop += "/sbin/ifconfig epair${ep}b -vnet ${name}";

    # no exec.stop needed in this case - prestop is doing it, since
    # that's the last chance we'll have to do things with host context
    # before the jail is torn down

    exec.poststop = "ifconfig $bridge deletem epair${ep}a";
    exec.poststop += "ifconfig epair${ep}a destroy";

...which says, do a clean shutdown of everything, which will mean that 
nothing has the network yanked out from under it while running, and then 
disassociate the NIC from the jail, after which the jail can be disposed of 
normally.
Comment 8 Kristof Provost freebsd_committer freebsd_triage 2021-04-23 15:52:26 UTC
Fixed by melifaro@ in b1d63265ac399112b3bca36c3d75df1a3c2c8102