Consider the following simple loop:

  while(1)
    netstat >/dev/null
  end

Running this loop will increase wired memory as measured by vm.stats.vm.v_wire_count. This memory is never reclaimed by the OS, which can eventually lead to swapping, thrashing, and then a panic.
Could you look at the output of 'vmstat -m' and 'vmstat -z' and see if any malloc type and/or zone is the cause?
So I am not a modern kernel dev. That said, I currently collect data over time for:

- anything in sysctl
- vmstat -z ... malloc buckets and UMA

I can fairly easily monitor almost any other value I can get with a command; I just need to know what exactly to monitor (*all* of vmstat -m or vmstat -z is probably not indicated and a waste of prometheus space). Now having said that, I did this:

  # vmstat -m | sort -k 3
  ...
  bus-sc 139 5134K 12172 16,32,64,128,256,384,512,1024,2048,4096,8192,16384,32768,65536
  iflib 813 6241K 833 16,64,128,512,1024,2048,8192,16384,32768,65536
  solaris 113 18692K 233 16,32,64,128,256,512,1024,2048,4096,8192,65536
  md_sectors 7260 29040K 7260 4096
  devbuf 28312 46705K 35453 16,32,64,128,256,384,1024,2048,4096,8192,65536
  pf_hash 5 51204K 5 2048
  Type InUse MemUse Requests Size(s)

(The header line sorts to the bottom because of the sort.) Does this help any?
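To keep only the top consumers out of output like the above, the text columns can be parsed and sorted numerically rather than lexically. A minimal sketch (not from the reporter's script; it assumes the `Type InUse MemUse Requests Size(s)` column layout shown above, with MemUse given as a number suffixed with "K"):

```python
def parse_vmstat_m(lines):
    """Parse `vmstat -m` data rows into (type, inuse, memuse_kib, requests).

    Skips the header and any row whose MemUse column is not 'NNNK'.
    """
    rows = []
    for line in lines:
        parts = line.split()
        if len(parts) < 4 or not parts[2].endswith("K"):
            continue  # header or malformed line
        rows.append((parts[0], int(parts[1]), int(parts[2].rstrip("K")), int(parts[3])))
    return rows

# A few rows quoted from the report above.
sample = [
    "Type InUse MemUse Requests Size(s)",
    "devbuf 28312 46705K 35453 16,32,64,128",
    "solaris 113 18692K 233 16,32,64",
    "md_sectors 7260 29040K 7260 4096",
]
top = sorted(parse_vmstat_m(sample), key=lambda r: r[2], reverse=True)
print(top[0])  # ('devbuf', 28312, 46705, 35453)
```

Sorting on the parsed integer avoids the lexicographic surprises of `sort -k 3` on values like "5134K" vs "51204K".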
Created attachment 242004 [details] Graph over past 24 hours of vm.stats.vm.v_wire_count I've added this graph in case that helps.
I should finally mention that my monitoring system runs netstat twice every 5 seconds over the period shown in the graph above. That might provide the correct context for the above data, and shed some light on how I discovered this issue.
If I knew which zone or malloc type leaks, I would not ask you to look for which one leaks. You need to find, on your machine, whether any malloc type increases steadily.
When you say "malloc_type", do you mean one of these?

  # vmstat -z | grep malloc
  malloc-16: 16, 0, 22655, 6073,105306080, 0, 0, 0
  malloc-32: 32, 0, 21773, 4939,11701599, 0, 0, 0
  malloc-64: 64, 0, 18498, 5379,212097839, 0, 0, 0
  malloc-128: 128, 0, 36208, 2480,82557120, 0, 0, 0
  malloc-256: 256, 0, 4415, 1300,13300795, 0, 0, 0
  malloc-384: 384, 0, 565, 575, 145683, 0, 0, 0
  malloc-512: 512, 0, 413, 459, 209713, 0, 0, 0
  malloc-1024: 1024, 0, 342, 218, 5870330, 0, 0, 0
  malloc-2048: 2048, 0, 308, 156, 237219, 0, 0, 0
  malloc-4096: 4096, 0, 9866, 78, 6516830, 0, 0, 0
  malloc-8192: 8192, 0, 618, 18, 1794, 0, 0, 0
  malloc-16384: 16384, 0, 24, 23, 1480, 0, 0, 0
  malloc-32768: 32768, 0, 145, 46, 17038, 0, 0, 0
  malloc-65536: 65536, 0, 18, 18, 166444, 0, 0, 0

I ask this because...

  # vmstat -z | wc -l
  243

:)
(In reply to Dave Hayes from comment #6) vmstat -z output lists zones. Zones you quoted are used to back the malloc(9) allocator. Malloc types are listed with vmstat -m. Again, I am asking you to look for the outstanding *growing* malloc types and zones.
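The relationship being described: kernel malloc(9) requests are served from the fixed-size `malloc-N` UMA zones listed above, while per-type accounting (what `vmstat -m` shows) is layered on top. A rough illustration of how a request size maps to a backing zone, using the bucket sizes visible in the `vmstat -z` output quoted earlier (this is a simplification for intuition, not the kernel's actual lookup code):

```python
# Bucket sizes seen in the `vmstat -z | grep malloc` output above.
BUCKETS = [16, 32, 64, 128, 256, 384, 512, 1024,
           2048, 4096, 8192, 16384, 32768, 65536]

def backing_zone(nbytes):
    """Return the name of the malloc-N zone a request of nbytes would
    plausibly be served from, or None when it exceeds the largest bucket
    (such requests go to the page allocator directly)."""
    for b in BUCKETS:
        if nbytes <= b:
            return f"malloc-{b}"
    return None

print(backing_zone(100))    # malloc-128
print(backing_zone(70000))  # None
```

This is why a leak in one malloc type shows up as growth in a shared `malloc-N` zone: many types share the same bucket, so `vmstat -m` is needed to attribute the growth.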
First, I do understand that you want me to check for "the cause" and "the outstanding growing malloc zones". Next, I also understand that vmstat -z produces output related to zones, and vmstat -m produces output related to types. You do not need to re-inform me of these ideas, but if you choose to again, that choice is out of my hands. I am not intending to frustrate you. :)

My unanswered questions are based on the idea that the two switches to vmstat produce hundreds of different data points:

  # vmstat -z | wc -l
  274
  # vmstat -m | wc -l
  218

This is a lot of data (492 points) to check for changes. Naturally, I'd like to reduce this check (which obviously has to be scripted) to data which is relevant. With that, would you be so kind as to answer the following question?

Are you asking me to check each and every entry for each switch (-m and -z) to vmstat to see what is growing, OR is it possible to reduce the number of data points to a specific useful subset?

Thank you for your patience, as I am not a (modern) kernel dev.
Yes, I am asking to check each entry. This is the way to trim down the search area for kernel memory leak.
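Checking every entry can be scripted as a diff of two snapshots keyed by zone or type name. A sketch of the core comparison (the snapshot values here are made up for illustration; a real script would fill the dicts from two timed runs of vmstat):

```python
def growing_entries(before, after, threshold=0):
    """Compare two {name: used} snapshots and return entries that grew
    by more than `threshold`, largest delta first."""
    deltas = {name: after.get(name, 0) - used for name, used in before.items()}
    grown = [(n, d) for n, d in deltas.items() if d > threshold]
    return sorted(grown, key=lambda nd: nd[1], reverse=True)

# Hypothetical snapshots taken some minutes apart.
before = {"vm pgcache": 632833, "PROC": 69, "tcpcb": 120}
after  = {"vm pgcache": 633100, "PROC": 69, "tcpcb": 124}
print(growing_entries(before, after))  # [('vm pgcache', 267), ('tcpcb', 4)]
```

A `threshold` above zero helps suppress the normal jitter so that only steady growers stand out across repeated runs.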
Ok. Thank you for clarifying. It turns out this was easier than I thought it would be. Here's output from my script (source code on request). I stopped at > 128 pages of lost wired memory but I can run this longer if you like.

  Stopped at > 128 pages of lost wired memory.
  Total wired change: 130 pages
  (vmstat -z) PROC grew 1 bytes in that time period
  (vmstat -z) mbuf_jumbo_page grew -1 bytes in that time period
  (vmstat -z) pf states grew 248 bytes in that time period
  (vmstat -z) BUF TRIE grew -12 bytes in that time period
  (vmstat -z) pf state scrubs grew -3 bytes in that time period
  (vmstat -z) buf free cache grew 4 bytes in that time period
  (vmstat -z) malloc-32 grew 8 bytes in that time period
  (vmstat -z) filedesc0 grew 1 bytes in that time period
  (vmstat -z) tcpcb grew 4 bytes in that time period
  (vmstat -z) PGRP grew 1 bytes in that time period
  (vmstat -z) 4 Bucket grew -1 bytes in that time period
  (vmstat -z) 128 Bucket grew -7 bytes in that time period
  (vmstat -z) MAP ENTRY grew 50 bytes in that time period
  (vmstat -z) Files grew 8 bytes in that time period
  (vmstat -z) tcp_inpcb grew 4 bytes in that time period
  (vmstat -z) VM OBJECT grew 27 bytes in that time period
  (vmstat -z) 32 Bucket grew -2 bytes in that time period
  (vmstat -z) mbuf grew 3 bytes in that time period
  (vmstat -z) buffer arena-32 grew -1 bytes in that time period
  (vmstat -z) RADIX NODE grew 56 bytes in that time period
  (vmstat -z) 2 Bucket grew 10 bytes in that time period
  (vmstat -z) udpcb grew 3 bytes in that time period
  (vmstat -z) vm pgcache grew 101 bytes in that time period
  (vmstat -z) 64 Bucket grew 1 bytes in that time period
  (vmstat -z) socket grew 7 bytes in that time period
  (vmstat -z) udp_inpcb grew 3 bytes in that time period
  (vmstat -z) malloc-64 grew 13 bytes in that time period
  (vmstat -z) malloc-16 grew 1 bytes in that time period
  (vmstat -z) malloc-4096 grew 1 bytes in that time period
  (vmstat -z) mbuf_cluster grew -1 bytes in that time period
  (vmstat -z) VMSPACE grew 1 bytes in that time period
  (vmstat -z) KNOTE grew 6 bytes in that time period
  (vmstat -z) pf state keys grew 246 bytes in that time period
  (vmstat -z) malloc-256 grew -34 bytes in that time period
  (vmstat -m) freework grew by -3 Kbytes in that time period
  (vmstat -m) subproc grew by 4 Kbytes in that time period
  (vmstat -m) freeblks grew by -1 Kbytes in that time period
  (vmstat -m) select grew by 1 Kbytes in that time period
  (vmstat -m) newblk grew by -5 Kbytes in that time period
  (vmstat -m) freefile grew by -1 Kbytes in that time period
  (vmstat -m) bmsafemap grew by -1 Kbytes in that time period
  (vmstat -m) pcb grew by 1 Kbytes in that time period

These numbers come from slurping in --libxo json output. For vmstat -z I use the "name" and "used" keys. For vmstat -m I use the "type" and "memory-use" keys. From this output, my likely naive analysis is that your leak is in -none- of these areas. :)
There is something wrong in your scripts. Zones grow by the size of the items allocated, so e.g. the PROC zone cannot grow by 1 byte.
What is wrong is likely the label "bytes". Here's a sample from the vmstat -z --libxo json output (formatted for clarity; two adjacent entries, with elided fields marked "..."):

  { "fail": 0, "free": 0, "limit": 0, "name": "buffer arena-20", "requests": 0, ... },
  { ..., "name": "PROC", "requests": 3371179, "size": 1360, "sleep": 0, "used": 69, "xdomain": 0 }

So I use the 'used' field. If the units are not in bytes, then I was ignorant of that. What unit is the 'used' field using?
(In reply to Dave Hayes from comment #12) The unit is the size of the item. Keep the thing running until you see that the machine has reached an almost unusable state, then look at the vmstat -z / vmstat -m status.
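In other words, `used` counts items, so a zone's memory footprint in bytes is `used` multiplied by the zone's `size` field. A one-line sketch using the PROC sample quoted above:

```python
def zone_bytes(used_items, item_size):
    """`vmstat -z` reports 'used' in items; multiply by the zone's
    per-item 'size' field to get bytes."""
    return used_items * item_size

# The PROC entry above: 69 items of 1360 bytes each.
print(zone_bytes(69, 1360))  # 93840 bytes, i.e. roughly 92 KiB
```

This also explains why the PROC zone cannot grow by "1 byte": its minimum growth step is one 1360-byte item.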
Are you aware of this?

  # vmstat -z | grep pgca
  vm pgcache: 4096, 0, 632833, 2480,2107607698, 0, 0, 0
  vm pgcache: 4096, 0, 589518, 1688,865006857, 74, 0, 0

My quickly written script does not handle monitoring both of those (this duplication also appears in the json output).
Ok now I'm handling duplicated names (there was another one too "buffer arena-40"). The machine I have to test this on is "live" but I can let this run until it loses a fair bit of memory and do a controlled reboot. This will take a while as it appears to be a very slow memory leak.
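One way to handle the duplicated zone names is to append a per-name occurrence index when building the snapshot keys, so each physical zone stays a distinct data point. A sketch of that disambiguation (the naming scheme here is an illustration, not necessarily what the reporter's script does):

```python
from collections import Counter

def disambiguate(names):
    """Append an occurrence index to repeated names so duplicated zones
    (e.g. the two per-CPU 'vm pgcache' entries) remain distinct keys."""
    seen = Counter()
    out = []
    for name in names:
        out.append(f"{name} ({seen[name]})" if seen[name] else name)
        seen[name] += 1
    return out

print(disambiguate(["vm pgcache", "buffer arena-40", "vm pgcache"]))
# ['vm pgcache', 'buffer arena-40', 'vm pgcache (1)']
```

A caveat with any positional scheme: it assumes vmstat emits duplicates in a stable order between runs, which seems plausible but is not guaranteed.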
(In reply to Dave Hayes from comment #14) These are per-cpu page cache queues.
So far, one of the vm pgcache entries in vmstat -z is leaking the most. Do I have to do extra gymnastics to find out which entry it is somehow?
Do you have the commit 6094749a1a5dafb8daf98deab23fc968070bc695 in your source tree? You should since you specified 13.2-STABLE, but the symptoms require re-checking.
I have manually inspected sys/vm/vm_page.c for the listed commit. It is there.
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=256507 ... in case you are unaware that I also initiated that one. In 12.4, this memory leak is only visible if you do specific math on the sysctls. In 13.2, it at least shows that wired is leaking slowly, but the "root bug" has yet to be found.
  [Mon May 8 10:23:42 2023] (842226) (wired) Changed by 7 pages (now at 4.205 Gb, total change since start 13210 pages)
  [Mon May 8 10:23:42 2023] (842226) (vmstat -z) vm pgcache (18) size changed by 73 units of whatever the 'used' field is in (14280 since start)

One of the vm pgcache entries is still showing the most leakage. Should I continue this test?
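For deciding how long a test like this must run, the observed growth can be extrapolated to a daily rate. A small arithmetic sketch (the 60-second sampling window is a hypothetical value for illustration; the log above does not state the interval):

```python
def leak_mib_per_day(pages_grown, interval_seconds, page_size=4096):
    """Extrapolate page-count growth over one sampling interval
    to MiB per day."""
    pages_per_second = pages_grown / interval_seconds
    return pages_per_second * 86400 * page_size / 2**20

# E.g. the 73-unit growth above, assuming a 60-second window:
print(round(leak_mib_per_day(73, 60), 1))  # about 410.6 MiB/day at that rate
```

A rate estimate like this makes it easier to judge whether the leak will reach a demonstrable size in hours or in weeks.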
Created attachment 242068 [details] Graphical evidence of vm pgcache being the source of the leak
After 48 hours, here is the top of the file containing output of the script:

  Total wired change: 37450 pages
  (vmstat -z) vm pgcache (18) grew 43426 units of whatever the 'used' field is in inside that time period
  (vmstat -z) RADIX NODE (41) grew 601 units of whatever the 'used' field is in inside that time period
  (vmstat -z) vm pgcache (19) grew 565 units of whatever the 'used' field is in inside that time period
  (vmstat -z) UMA Slabs 0 (25) grew 277 units of whatever the 'used' field is in inside that time period
  (vmstat -z) mbuf_cluster (94) grew 252 units of whatever the 'used' field is in inside that time period
  (vmstat -z) pf state keys (158) grew 228 units of whatever the 'used' field is in inside that time period
  (vmstat -z) pf states (157) grew 197 units of whatever the 'used' field is in inside that time period
43426 cached pages is equal to around 170MB of cached free memory. First, this is not wired memory; second, it is allocatable on demand. There should be something else that leaks wirings. It would be useful to see, e.g., the simple output of the top(1) Mem: line when the situation occurs.
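The ~170MB figure follows directly from the page counts; the conversion is worth writing down, since it is used repeatedly in this thread:

```python
PAGE_SIZE = 4096  # bytes, per vm.stats.vm.v_page_size

def pages_to_mib(pages, page_size=PAGE_SIZE):
    """Convert a page count to MiB."""
    return pages * page_size / 2**20

# The 43426 vm pgcache pages from the 48-hour run:
print(round(pages_to_mib(43426)))  # 170
# The 37450-page total wired change, for comparison:
print(round(pages_to_mib(37450)))  # 146
```

Note that the vm pgcache growth (~170 MiB) alone exceeds the wired change (~146 MiB), which is consistent with the point that the pgcache pages are not themselves the leaked wirings.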
So, as part of monitoring, the machine in question does repeated netstats and hence is always in that state. It's just a matter of figuring out what statistics are useful to you in finding the problem, as I likely collect way too much. As you might have noticed, I can produce graphs of any sysctl MIB that is exposed to root users. Here is the information you requested: Mem: 66M Active, 1015M Inact, 4439M Wired, 1419M Buf, 10G Free It took some weeks for the machine to start thrashing last time, this is because the leak is very slow. My while loop above is just to make the leak more apparent. That vm pgcache value is still growing, even though I turned the loop off (as is wired memory).
(In reply to Dave Hayes from comment #25) So there is nothing unusual in the data you have shown, and moreover, there has been no de facto demonstration of the problem. I do not see a leak in any of the numbers you have provided so far. Gather the data I requested from the machine when it does have the problem.
That will be impossible, since once it does have the problem, I cannot log in and command prompts do not work, as evidenced by previous interactions with a machine exhibiting this bug. If the attached graph doesn't show a problem to you, and nor does the fact that the sign of the slope of that graph hasn't changed in all this time, then I am at a loss as to what will demonstrate this issue to your opaque standards of "demonstrated". Perhaps if you refrained from treating me like a mushroom and actually explained your reasoning, I might find another way to demonstrate this issue. If you don't care at all, I can respect that, but at least tell me that you do not care.
Hi, I encountered the same issue. Based on the sysctl outputs below, I think the lost memory consists of unswappable pages, because the unswappable page count is increasing and RAM is decreasing in the output of the top command. However, I don't know why.

  vm.stats.vm.v_laundry_count: 56910
  vm.stats.vm.v_inactive_count: 329390
  vm.stats.vm.v_active_count: 313255
  vm.stats.vm.v_wire_count: 132393
  vm.stats.vm.v_free_count: 44036
  vm.stats.vm.v_page_count: 1000174
  vm.stats.vm.v_page_size: 4096
  vm.domain.0.stats.unswappable: 98333
  vm.domain.0.stats.laundry: 56910
  vm.domain.0.stats.inactive: 329390
  vm.domain.0.stats.active: 313255
  vm.domain.0.stats.free_count: 44036
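A quick sanity check on counters like these is to sum the listed global queue counts and compare against v_page_count; the shortfall is pages not accounted for by any of those queues. This is just arithmetic on the quoted sysctls (how the per-domain unswappable count overlaps with the global counters is not established here):

```python
# Global queue counts quoted above (vm.stats.vm.*), in pages.
stats = {
    "active": 313255,
    "inactive": 329390,
    "laundry": 56910,
    "wired": 132393,
    "free": 44036,
}
page_count = 1000174  # vm.stats.vm.v_page_count

accounted = sum(stats.values())
unaccounted = page_count - accounted
print(accounted)    # 875984
print(unaccounted)  # 124190 pages outside the listed queues
```

The 124190-page gap is on the same order as the reported unswappable count (98333), which is what makes the unswappable-pages theory plausible, though the exact relationship would need to come from the VM accounting itself.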