I see multiple warnings from the UMA keg code when running the netipsec tunnel tests about leaked UMA kegs. From https://ci.freebsd.org/job/FreeBSD-head-amd64-test/11014/consoleText : sys/netipsec/tunnel/aes_cbc_128_hmac_sha1:v4 -> lock order reversal: 1st 0xffffffff82096810 allprison (allprison) @ /usr/src/sys/kern/kern_jail.c:966 2nd 0xffffffff820c4830 vnet_sysinit_sxlock (vnet_sysinit_sxlock) @ /usr/src/sys/net/vnet.c:575 stack backtrace: #0 0xffffffff80c47773 at witness_debugger+0x73 #1 0xffffffff80c474bd at witness_checkorder+0xa7d #2 0xffffffff80be9008 at _sx_slock_int+0x68 #3 0xffffffff80d0f687 at vnet_alloc+0x117 #4 0xffffffff80ba40a1 at kern_jail_set+0x1bb1 #5 0xffffffff80ba5b00 at sys_jail_set+0x40 #6 0xffffffff810b2e16 at amd64_syscall+0x276 #7 0xffffffff8108b5fd at fast_syscall_common+0x101 Freed UMA keg (rtentry) was not empty (18 items). Lost 1 pages of memory. passed [0.301s] sys/netipsec/tunnel/aes_cbc_128_hmac_sha1:v6 -> passed [0.278s] sys/netipsec/tunnel/aes_cbc_256_hmac_sha2_256:v4 -> aesni0: <AES-CBC,AES-XTS,AES-GCM,AES-ICM> on motherboard Freed UMA keg (rtentry) was not empty (18 items). Lost 1 pages of memory. Freed UMA keg (rtentry) was not empty (18 items). Lost 1 pages of memory. passed [0.275s] sys/netipsec/tunnel/aes_cbc_256_hmac_sha2_256:v6 -> passed [0.278s] sys/netipsec/tunnel/aes_gcm_128:v4 -> aesni0: detached Apr 29 18:27:50 kernel: nd6_dad_timer: called with non-tentative address fe80:3::20:9bff:fe26:1c0a(epair2a) Freed UMA keg (rtentry) was not empty (18 items). Lost 1 pages of memory. Freed UMA keg (rtentry) was not empty (18 items). Lost 1 pages of memory. passed [0.285s] sys/netipsec/tunnel/aes_gcm_128:v6 -> passed [0.278s] sys/netipsec/tunnel/aes_gcm_256:v4 -> Freed UMA keg (rtentry) was not empty (18 items). Lost 1 pages of memory. Freed UMA keg (rtentry) was not empty (18 items). Lost 1 pages of memory. passed [0.287s] sys/netipsec/tunnel/aes_gcm_256:v6 -> passed [0.280s] sys/netipsec/tunnel/aesni_aes_cbc_128_hmac_sha1:v4 -> aesni0: <AES-CBC,AES-XTS,AES-GCM,AES-ICM> on motherboard Freed UMA keg (rtentry) was not empty (18 items). Lost 1 pages of memory. Freed UMA keg (rtentry) was not empty (18 items). Lost 1 pages of memory. passed [0.291s] sys/netipsec/tunnel/aesni_aes_cbc_128_hmac_sha1:v6 -> passed [0.292s] sys/netipsec/tunnel/aesni_aes_cbc_256_hmac_sha2_256:v4 -> aesni0: detached Freed UMA keg (rtentry) was not empty (18 items). Lost 1 pages of memory. Freed UMA keg (rtentry) was not empty (18 items). Lost 1 pages of memory. passed [0.296s] sys/netipsec/tunnel/aesni_aes_cbc_256_hmac_sha2_256:v6 -> passed [0.257s] sys/netipsec/tunnel/aesni_aes_gcm_128:v4 -> aesni0: <AES-CBC,AES-XTS,AES-GCM,AES-ICM> on motherboard Freed UMA keg (rtentry) was not empty (18 items). Lost 1 pages of memory. Freed UMA keg (rtentry) was not empty (18 items). Lost 1 pages of memory. passed [0.256s] sys/netipsec/tunnel/aesni_aes_gcm_128:v6 -> passed [0.254s] sys/netipsec/tunnel/aesni_aes_gcm_256:v4 -> Freed UMA keg (rtentry) was not empty (18 items). Lost 1 pages of memory. Freed UMA keg (rtentry) was not empty (18 items). Lost 1 pages of memory. passed [0.262s] sys/netipsec/tunnel/aesni_aes_gcm_256:v6 -> passed [0.291s] sys/netipsec/tunnel/empty:v4 -> Freed UMA keg (rtentry) was not empty (18 items). Lost 1 pages of memory. Freed UMA keg (rtentry) was not empty (18 items). Lost 1 pages of memory. passed [0.285s] sys/netipsec/tunnel/empty:v6 -> passed [0.282s] Suggested repro: * Boot a WITNESS enabled kernel with IPSEC support. * Run `sudo kyua test -k /usr/tests/sys/netipsec/tunnel/Kyuafile`.
Missing cleanup with some component during VNET shutdown.
r346890 before I go and look it up again
(In reply to Bjoern A. Zeeb from comment #2) Seems r346890 https://reviews.freebsd.org/rS346890 is the wrong revision?
The leak report goes all the way back to the introduction of the test in r326497, two years ago. https://ci.freebsd.org/job/FreeBSD-head-amd64-test/5285/ https://svnweb.freebsd.org/base?view=revision&revision=326497 Here is a narrower repro that worked for me on a GENERIC-NODEBUG kernel: kldload ipsec kyua test -k /usr/tests/sys/netipsec/tunnel/Kyuafile empty:v4 The ping in ist_test() appears to be required for the leak, no leak is reported when it is commented out. Adding a route -n flush after the ping did not rescue the leak.
I saw this (or a very similar) bug just now while shutting down a couple jails. Jul 7 08:57:16 nile kernel: epair2a: promiscuous mode disabled Jul 7 08:57:16 nile kernel: epair2a: link state changed to DOWN Jul 7 08:57:16 nile kernel: epair2b: link state changed to DOWN Jul 7 08:57:16 nile kernel: in6_purgeaddr: err=65, destination address delete failed Jul 7 08:57:16 nile kernel: Freed UMA keg (rtentry) was not empty (54 items). Lost 3 pages of memory. I had shut down one additional jail a moment before, and then two more, and it was prior to this returning that I experienced a spontaneous reboot of the host. The jail config in action for this: exec.prestart += "ifconfig epair${ep} create up"; exec.prestart += "ifconfig $bridge addm epair${ep}a"; exec.prestart += "ifconfig epair${ep}b link $mac"; exec.start += "dhclient epair${ep}b"; exec.poststop += "ifconfig $bridge deletem epair${ep}a"; exec.poststop += "ifconfig epair${ep}a destroy"; I believe this may have been triggered by a user error, where I forgot to increment one of my jails' ep values before starting it yesterday, thus trying to run the prestart sequence against an already extant epair. It failed, I noticed and corrected the error, and had no subsequent problems, but it's notable that my first attempt to run deletem on the epair that had the bad, duplicate create attempt yesterday resulted in the system becoming unresponsive and spontaneously rebooting. Note that this is was the last message in the log before the reboot, so it sticks out but isn't definitively the issue. That said, the server wasn't doing anything else at the time.
Sorry, forgot to mention that this happened on 12.1-RELEASE-p6.
A workaround, thanks to kevans and antranigv: exec.prestop = "/usr/sbin/jexec ${name} /bin/sh /etc/rc.shutdown"; exec.prestop += "/sbin/ifconfig epair${ep}b -vnet ${name}"; # no exec.stop needed in this case - prestop is doing it, since # that's the last chance we'll have to do things with host context # before the jail is torn down exec.poststop = "ifconfig $bridge deletem epair${ep}a"; exec.poststop += "ifconfig epair${ep}a destroy"; ...which says, do a clean shutdown of everything, which will mean that nothing has the network yanked out from under it while running, and then disassociate the NIC from the jail, after which the jail can be disposed of normally.
Fixed by melifaro@ in b1d63265ac399112b3bca36c3d75df1a3c2c8102