Summary: | "Freed UMA keg (rtentry) was not empty (18 items). Lost 1 pages of memory." seen when running sys/netipsec tests | ||
---|---|---|---|
Product: | Base System | Reporter: | Enji Cooper <ngie> |
Component: | kern | Assignee: | Kristof Provost <kp> |
Status: | Closed FIXED | ||
Severity: | Affects Some People | CC: | ae, bz, emaste, lwhsu, mason, mjg, ota, rlibby |
Priority: | --- | Keywords: | vimage |
Version: | CURRENT | ||
Hardware: | Any | ||
OS: | Any |
Description
Enji Cooper
2019-04-29 19:48:01 UTC
Missing cleanup with some component during VNET shutdown.

r346890 before I go and look it up again

(In reply to Bjoern A. Zeeb from comment #2)
Seems r346890 https://reviews.freebsd.org/rS346890 is the wrong revision? The leak report goes all the way back to the introduction of the test in r326497, two years ago.
https://ci.freebsd.org/job/FreeBSD-head-amd64-test/5285/
https://svnweb.freebsd.org/base?view=revision&revision=326497

Here is a narrower repro that worked for me on a GENERIC-NODEBUG kernel:

    kldload ipsec
    kyua test -k /usr/tests/sys/netipsec/tunnel/Kyuafile empty:v4

The ping in ist_test() appears to be required for the leak; no leak is reported when it is commented out. Adding a route -n flush after the ping did not eliminate the leak.

I saw this (or a very similar) bug just now while shutting down a couple of jails.

    Jul 7 08:57:16 nile kernel: epair2a: promiscuous mode disabled
    Jul 7 08:57:16 nile kernel: epair2a: link state changed to DOWN
    Jul 7 08:57:16 nile kernel: epair2b: link state changed to DOWN
    Jul 7 08:57:16 nile kernel: in6_purgeaddr: err=65, destination address delete failed
    Jul 7 08:57:16 nile kernel: Freed UMA keg (rtentry) was not empty (54 items). Lost 3 pages of memory.

I had shut down one additional jail a moment before, and then two more, and it was before this returned that I experienced a spontaneous reboot of the host. The jail config in action for this:

    exec.prestart += "ifconfig epair${ep} create up";
    exec.prestart += "ifconfig $bridge addm epair${ep}a";
    exec.prestart += "ifconfig epair${ep}b link $mac";
    exec.start += "dhclient epair${ep}b";
    exec.poststop += "ifconfig $bridge deletem epair${ep}a";
    exec.poststop += "ifconfig epair${ep}a destroy";

I believe this may have been triggered by a user error: I forgot to increment one of my jails' ep values before starting it yesterday, and so ran the prestart sequence against an already extant epair. It failed, I noticed and corrected the error, and had no subsequent problems. It is notable, though, that my first attempt to run deletem on the epair that had the bad, duplicate create attempt yesterday resulted in the system becoming unresponsive and spontaneously rebooting.

Note that this was the last message in the log before the reboot, so it sticks out but isn't definitively the issue. That said, the server wasn't doing anything else at the time.

Sorry, forgot to mention that this happened on 12.1-RELEASE-p6.

A workaround, thanks to kevans and antranigv:

    exec.prestop = "/usr/sbin/jexec ${name} /bin/sh /etc/rc.shutdown";
    exec.prestop += "/sbin/ifconfig epair${ep}b -vnet ${name}";
    # no exec.stop needed in this case - prestop is doing it, since
    # that's the last chance we'll have to do things with host context
    # before the jail is torn down
    exec.poststop = "ifconfig $bridge deletem epair${ep}a";
    exec.poststop += "ifconfig epair${ep}a destroy";

...which says: do a clean shutdown of everything, so that nothing has the network yanked out from under it while running, and then disassociate the NIC from the jail, after which the jail can be disposed of normally.

Fixed by melifaro@ in b1d63265ac399112b3bca36c3d75df1a3c2c8102
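
For anyone checking whether a machine still hits this, the kernel message quoted in this report can be picked out of log text with a small filter. The following is only a sketch: `report_keg_leaks` is a hypothetical helper name, not something shipped with FreeBSD, and the pattern assumes the message wording is exactly as quoted above.

```sh
#!/bin/sh
# Hypothetical helper (not part of FreeBSD): reads kernel log text on stdin
# and prints "<keg> <items> <pages>" for every line that contains the
# "Freed UMA keg (...) was not empty" message quoted in this report.
# Lines that do not contain the message produce no output.
report_keg_leaks() {
    sed -n 's/.*Freed UMA keg (\([^)]*\)) was not empty (\([0-9][0-9]*\) items)\. Lost \([0-9][0-9]*\) pages of memory\..*/\1 \2 \3/p'
}
```

It could be fed from `dmesg` after the repro or a jail shutdown (`dmesg | report_keg_leaks`), or from a saved log file; no output means the message was not found in that input.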