Summary: | [vnet] Memory leak in VNET | ||
---|---|---|---|
Product: | Base System | Reporter: | Poul-Henning Kamp <phk> |
Component: | kern | Assignee: | Bjoern A. Zeeb <bz> |
Status: | Closed FIXED | ||
Severity: | Affects Many People | CC: | bob.cauthen, bz, corvin.wimmer, emaste, freebsd, jack.r.maco, koobs, rodrigc, seanc, spolyack, sudarshan.nallanchakravarthy |
Priority: | Normal | Keywords: | needs-patch, needs-qa, vimage |
Version: | 9.0-STABLE | Flags: | koobs: mfc-stable10?, koobs: mfc-stable9? |
Hardware: | Any | ||
OS: | Any |
Description
Poul-Henning Kamp
2012-02-04 15:10:09 UTC
Our current rule of thumb is "if you need to shut down one vimage jail then you need to reboot the host." So we just shut down the services in each jail, leave the jails themselves up, and just reboot the host. Of course this is far from optimal.

Is this PR still on the radar of anyone? This is still occurring in RELEASE 8.x, RELEASE 9.x, and RELEASE 10.0.

I tracked this issue down a little. I put in some printfs and found that this occurs in the function vnet_destroy() in sys/net/vnet.c. The memory leak seems to happen when vnet_sysuninit() is called. So something needs to be done before or inside vnet_sysuninit(), but I do not know what to do.

The messages I saw on head and stable/10 around 2014-04-30 are:

Freed UMA keg (udp_inpcb) was not empty (135 items). Lost 9 pages of memory.
Freed UMA keg (udpcb) was not empty (250 items). Lost 1 pages of memory.
Freed UMA keg (tcp_inpcb) was not empty (75 items). Lost 5 pages of memory.
Freed UMA keg (tcpcb) was not empty (15 items). Lost 3 pages of memory.

----- Forwarded message from "Bjoern A. Zeeb" <bz@FreeBSD.org> -----

Date: Thu, 22 May 2014 15:16:43 +0000
From: "Bjoern A. Zeeb" <bz@FreeBSD.org>
To: "Hiroo Ono (小野寛生)" <hiroo.ono+freebsd@gmail.com>
Cc: freebsd-bugs@FreeBSD.org
Subject: Re: kern/164763: [vimage] Memory leak in VNET
X-Mailer: Apple Mail (2.1874)

The fixes for UDP have been in a perforce branch for two years and need updating and merging into HEAD. TCP was and is the only thing that could not be completely freed (back two years ago) synchronously and thus would continue to leak. It's not not fixable and probably with some tw changes lately got better (or maybe they didn't happen).

--
Bjoern A. Zeeb "Come on. Learn, goddamn it.", WarGames, 1983

----- End forwarded message -----

Bjoern gave a link to his Perforce repo here: https://lists.freebsd.org/pipermail/freebsd-net/2014-October/040075.html

It would be good to merge these memory leak fixes into FreeBSD HEAD.

The following appeared on freebsd-bugs@, so adding it here:

Hello FreeBSD folks,

I'm Sudarshan, a Software Engineer from a Networking team at NetApp. While working on a bug, I observed a memory leak in keg_dtor(), and looking at the call stack it looks like FreeBSD bug 164763, i.e. a memory leak during vnet_sysuninit() triggered from the UDP path.

Reference: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=164763

I read that the fix has been in the perforce branch for a while now, and I am not sure if it got merged into HEAD. If the fix is already available as a patch, can you point me to the location? If not, can you provide an estimate of when it would be available?

Looking forward to hearing from you.

Thanks,
Sudarshan

(In reply to Hiren Panchasara from comment #6)
Thanks Hiren.

All, is this memory leak fixed in FreeBSD 10, or is a fix planned? It would be helpful to know the current status.

Sudarshan

Issue can't be In Progress without an Assignee. Resetting.

Just here to report that the memory leaks still happen on 10.3-STABLE (r297561). I have not tested on 11-CURRENT as of yet.

I upgraded my jail server to the latest 10-stable today and built a custom kernel which includes GENERIC and adds 'options VIMAGE' and 'device epair'.

(FreeBSD jupiter 10.3-STABLE FreeBSD 10.3-STABLE #0 r297561: Mon Apr 4 21:39:12 CEST 2016 root@jupiter:/usr/obj/usr/src/sys/JupiterVIMAGE amd64)

After creating a jail and assigning two epairXb devices to it, and then stopping the jail, I still observe the following dmesg output on the host:

Freed UMA keg (udp_inpcb) was not empty (240 items). Lost 24 pages of memory.
Freed UMA keg (udpcb) was not empty (2171 items). Lost 13 pages of memory.
Freed UMA keg (tcp_inpcb) was not empty (90 items). Lost 9 pages of memory.
Freed UMA keg (tcpcb) was not empty (27 items). Lost 9 pages of memory.
Freed UMA keg (ripcb) was not empty (90 items). Lost 9 pages of memory.

The interfaces are back on the host, which is as expected, since I never executed a command to actually destroy them.

If someone wants to guide me through debugging these memory leaks, I'd be happy to provide any information required to get to the bottom of this. I do have an 11-CURRENT system I can use for this purpose as well.

I have Reviews in PB for most of these and should commit them to HEAD in the next few days. If you want to give them a try sooner, I can probably do that this afternoon. There might still be a possible leak for one of the TCP zones after that, but I was not able to reproduce it yet.

I'm glad to hear fixes are inbound! :) Waiting a few days is fine, I'm not in a hurry on this.

A commit references this bug:

Author: bz
Date: Sat Apr 9 10:44:58 UTC 2016
New revision: 297735
URL: https://svnweb.freebsd.org/changeset/base/297735

Log:
Mfp: r296345

No need to keep type stability on raw sockets zone. We've also been running with a KASSERT since r222488 to make sure the ipi_count is 0 on destroy.
PR: 164763
Reviewed by: gnn
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D5735

Changes:
head/sys/netinet/raw_ip.c

A commit references this bug:

Author: bz
Date: Sat Apr 9 10:58:08 UTC 2016
New revision: 297738
URL: https://svnweb.freebsd.org/changeset/base/297738

Log:
Mfp: r296259

We attach the "counter" to the tcpcbs. Thus don't free the TCP Fastopen zone before the tcpcbs are gone, as otherwise the zone won't be empty. With that it should be safe to destroy the "tfo" zone without leaking the memory.
PR: 164763
Reviewed by: gnn
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D5731

Changes:
head/sys/netinet/tcp_fastopen.c
head/sys/netinet/tcp_subr.c

A commit references this bug:

Author: bz
Date: Sat Apr 9 11:27:48 UTC 2016
New revision: 297740
URL: https://svnweb.freebsd.org/changeset/base/297740

Log:
Mfp: r296260

The tcp_inpcb (pcbinfo) zone should be safe to destroy.
PR: 164763
Reviewed by: gnn
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D5732

Changes:
head/sys/netinet/tcp_subr.c

A commit references this bug:

Author: bz
Date: Sat Apr 9 12:05:24 UTC 2016
New revision: 297742
URL: https://svnweb.freebsd.org/changeset/base/297742

Log:
Mfp: r296310,r296343

It looks like as with the safety belt of DELAY() fastened (*) we can completely tear down and free all memory for TCP (after r281599).

(*) in theory a few ticks should be good enough to make sure the timers are all really gone. Could we use a better metric here and check a tcbcb count as an optimization?
PR: 164763
Reviewed by: gnn, emaste
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D5734

Changes:
head/sys/netinet/tcp_subr.c

On r297742, -CURRENT, I still seem to get a memory leak when restarting a jail, but I'm not able to reliably reproduce it.

Apr 14 14:48:15 venus kernel: ifa_maintain_loopback_route: deletion failed for interface lo0: 48
Apr 14 14:48:15 venus kernel: Freed UMA keg (tcptw) was not empty (765 items). Lost 17 pages of memory.

uname: FreeBSD venus 11.0-CURRENT FreeBSD 11.0-CURRENT #1 r297742M: Sat Apr 9 14:21:50 CEST 2016 root@venus:/usr/obj/usr/src/sys/Venus amd64

I'll try to find a reliable way of reproducing it this weekend.

I am not entirely sure, but it seems like the memory leak:

Freed UMA keg (tcptw) was not empty (630 items). Lost 14 pages of memory.

correlates with large transfers over TCP.
I was using nginx (without accf_http, not that it should matter), and had someone download some garbage from me.

Potentially relevant nginx settings: sendfile, tcp_nopush, tcp_nodelay are enabled. Output buffers: 1 512k.

Potentially relevant network stack settings:

net.inet.tcp.cc.algorithm=htcp
net.inet.tcp.cc.htcp.adaptive_backoff=1
net.inet.tcp.cc.htcp.rtt_scaling=1

Just to confirm: I was able to reproduce the TCP leaks; looking into it.

In case you have cycles, can you test the projects/vnet branch (I am trying to keep it up to date with HEAD weekly)? Do not try anything related to pf (or ipfilter) yet, however. Any feedback welcome.

Can you please try FreeBSD 11.0-ALPHA6 or later?

(In reply to Bjoern A. Zeeb from comment #20)
I've tested 11.0-BETA2 using iperf3 in both directions between two jails with UDP and TCP. I couldn't find any of the usual keg leakage on VNET teardown.

(In reply to Corvin Wimmer from comment #21)
And I assume no panics either? Thanks a lot for the feedback!

(In reply to Bjoern A. Zeeb from comment #22)
No panics either. I was able to crash the system using iocage by recreating stopped jails, but was not able to reproduce that specific issue using jail(8) manually.

Not sure if this is helpful or not, but I thought I would post it. FreeBSD 11-RC2, test command:

perl -e 'my $count=100000; while($count--) { print "Remaining: $count\n"; `jail -l -u root -c path=/jails/jail1 name=foo persist vnet && jexec foo ifconfig lo0 127.0.0.1/8 && jail -r foo` }'

I was only watching top, with no debug tools enabled, as this is a raw base install with no other daemons or such. It completed the entire 100K restarts with zero memory problems. As an aside, top looked the same at the end as it did at the start, but /var/log/messages had 100K of:

ifa_maintain_loopback_route: deletion failed for interface lo0: 48

So ... can we assume VIMAGE is finally safe to restart jails in? :) .. I use /etc/jail.conf instead of iocage.

It appears these issues have been resolved in 11.0.
Please submit a new PR if additional memory leaks are found. |
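
For anyone checking whether a host still shows the symptom discussed in this PR, the "Freed UMA keg" lines quoted above can be tallied per keg. The following is only a minimal sketch: the names `KEG_RE`, `SAMPLE`, and `summarize_keg_leaks` are hypothetical and not part of any FreeBSD tool, and the pattern assumes the message wording is exactly as quoted in this thread.

```python
import re
from collections import defaultdict

# Pattern for the kernel message as quoted in this PR (assumed wording), e.g.
# "Freed UMA keg (udp_inpcb) was not empty (135 items). Lost 9 pages of memory."
KEG_RE = re.compile(
    r"Freed UMA keg \((?P<keg>[^)]+)\) was not empty \((?P<items>\d+) items\)\.\s+"
    r"Lost (?P<pages>\d+) pages of memory\."
)

# Sample input: the four messages reported on head and stable/10 around 2014-04-30.
SAMPLE = """\
Freed UMA keg (udp_inpcb) was not empty (135 items). Lost 9 pages of memory.
Freed UMA keg (udpcb) was not empty (250 items). Lost 1 pages of memory.
Freed UMA keg (tcp_inpcb) was not empty (75 items). Lost 5 pages of memory.
Freed UMA keg (tcpcb) was not empty (15 items). Lost 3 pages of memory.
"""


def summarize_keg_leaks(log_text):
    """Return {keg_name: (total_items, total_pages)} for each leak message found."""
    totals = defaultdict(lambda: [0, 0])
    for match in KEG_RE.finditer(log_text):
        entry = totals[match.group("keg")]
        entry[0] += int(match.group("items"))
        entry[1] += int(match.group("pages"))
    return {keg: (items, pages) for keg, (items, pages) in totals.items()}


if __name__ == "__main__":
    # Print one summary line per keg, sorted by keg name.
    for keg, (items, pages) in sorted(summarize_keg_leaks(SAMPLE).items()):
        print(f"{keg}: {items} items, {pages} pages")
```

To use it on a real host, the saved output of dmesg or the contents of /var/log/messages could be passed to `summarize_keg_leaks()` in place of `SAMPLE`; an empty result means no such messages were found in that text.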