Bug 271246 - Kernel wired memory leak with repeated netstat
Summary: Kernel wired memory leak with repeated netstat
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 13.2-STABLE
Hardware: Any Any
Importance: --- Affects Some People
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-05-04 18:43 UTC by Dave Hayes
Modified: 2023-12-14 09:39 UTC
CC List: 7 users

See Also:


Attachments
Graph over past 24 hours of vm.stats.vm.v_wire_count (33.59 KB, image/png)
2023-05-05 19:36 UTC, Dave Hayes
Graphical evidence of vm pgcache being the source of the leak (28.00 KB, image/png)
2023-05-08 23:05 UTC, Dave Hayes

Description Dave Hayes 2023-05-04 18:43:27 UTC
Consider the following simple loop:

while(1) 
  netstat >/dev/null
end

Running this loop will increase wired memory as measured by vm.stats.vm.v_wire_count. This memory is never reclaimed by the OS, which can eventually lead to swapping, thrashing, and then a panic.
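
A minimal sketch for watching the counter while the loop runs (plain sh; the one-minute sampling interval is arbitrary):

#!/bin/sh
# Reproducer: hammer netstat in the background while logging the wired counter.
( while true; do netstat > /dev/null; done ) &

while true; do
    echo "$(date '+%Y-%m-%d %H:%M:%S') v_wire_count=$(sysctl -n vm.stats.vm.v_wire_count)"
    sleep 60
done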
Comment 1 Konstantin Belousov freebsd_committer freebsd_triage 2023-05-05 17:49:02 UTC
Could you look at the output of 'vmstat -m' and 'vmstat -z' and see if any
malloc type and/or zone is the cause?
Comment 2 Dave Hayes 2023-05-05 19:32:37 UTC
So I am not a modern kernel dev. That being said, I currently collect data over time of:

- anything in sysctl 
- vmstat -z ... malloc buckets and UMA 

I can fairly easily monitor almost any other value I can get with a command; I just need to know exactly what to monitor (*all* of vmstat -m or vmstat -z is probably not warranted and would waste Prometheus space).

Now having said that, I did this:

# vmstat -m | sort -k 3
...
      bus-sc   139  5134K    12172  16,32,64,128,256,384,512,1024,2048,4096,8192,16384,32768,65536
        iflib   813  6241K      833  16,64,128,512,1024,2048,8192,16384,32768,65536
      solaris   113 18692K      233  16,32,64,128,256,512,1024,2048,4096,8192,65536
   md_sectors  7260 29040K     7260  4096
       devbuf 28312 46705K    35453  16,32,64,128,256,384,1024,2048,4096,8192,65536
      pf_hash     5 51204K        5  2048
         Type InUse MemUse Requests  Size(s)

Does this help any?
Comment 3 Dave Hayes 2023-05-05 19:36:16 UTC
Created attachment 242004 [details]
Graph over past 24 hours of vm.stats.vm.v_wire_count

I've added this graph in case that helps.
Comment 4 Dave Hayes 2023-05-05 19:38:39 UTC
I should finally mention that, in my monitoring system, netstat runs twice every 5 seconds on the graph above. That might provide the correct context for the above data, and shed some light on how I discovered this issue.
Comment 5 Konstantin Belousov freebsd_committer freebsd_triage 2023-05-05 20:00:54 UTC
If I knew which zone or malloc type leaks, I would not ask you to look for the
one that leaks.

You need to find, on your machine, whether any malloc type increases steadily.
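
One rough way to do that, sketched here, is to snapshot the sorted vmstat -m output twice and diff the snapshots (the interval is arbitrary):

#!/bin/sh
# Take two snapshots of the per-type memory usage, some time apart,
# and show only the lines that changed between them.
vmstat -m | sort > /tmp/vmm.before
sleep 3600
vmstat -m | sort > /tmp/vmm.after
diff /tmp/vmm.before /tmp/vmm.after | grep '^[<>]'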
Comment 6 Dave Hayes 2023-05-06 05:39:34 UTC
When you say "malloc_type", do you mean one of these?

# vmstat -z | grep malloc
malloc-16:               16,      0,   22655,    6073,105306080,   0,   0,   0
malloc-32:               32,      0,   21773,    4939,11701599,   0,   0,   0
malloc-64:               64,      0,   18498,    5379,212097839,   0,   0,   0
malloc-128:             128,      0,   36208,    2480,82557120,   0,   0,   0
malloc-256:             256,      0,    4415,    1300,13300795,   0,   0,   0
malloc-384:             384,      0,     565,     575,  145683,   0,   0,   0
malloc-512:             512,      0,     413,     459,  209713,   0,   0,   0
malloc-1024:           1024,      0,     342,     218, 5870330,   0,   0,   0
malloc-2048:           2048,      0,     308,     156,  237219,   0,   0,   0
malloc-4096:           4096,      0,    9866,      78, 6516830,   0,   0,   0
malloc-8192:           8192,      0,     618,      18,    1794,   0,   0,   0
malloc-16384:         16384,      0,      24,      23,    1480,   0,   0,   0
malloc-32768:         32768,      0,     145,      46,   17038,   0,   0,   0
malloc-65536:         65536,      0,      18,      18,  166444,   0,   0,   0

I ask this because...

# vmstat -z | wc -l
     243

:)
Comment 7 Konstantin Belousov freebsd_committer freebsd_triage 2023-05-06 19:17:51 UTC
(In reply to Dave Hayes from comment #6)
vmstat -z output lists zones.  The zones you quoted are used to back the malloc(9)
allocator. Malloc types are listed with vmstat -m.

Again, I am asking you to look for the outstanding *growing* malloc types
and zones.
Comment 8 Dave Hayes 2023-05-06 20:13:25 UTC
First, I do understand that you want me to check for "the cause" and "the outstanding growing malloc zones". 

Next I also do understand vmstat -z produces output related to zones, and vmstat -m produces output related to types. 

You do not need to re-inform me of these ideas, but if you choose to do so again, that choice is out of my hands. I am not intending to frustrate you. :)

My unanswered questions are based on the fact that the two switches to vmstat produce hundreds of different data points:

vmstat -z | wc -l
  274
vmstat -m | wc -l
  218

This is a lot of data (492 points) to check for changes. Naturally, I'd like to reduce this check (which obviously has to be scripted) to data which is relevant. 
With that, would you be so kind as to answer the following question?

Are you asking me to check each and every entry from both vmstat -m and vmstat -z to see what is growing, OR is it possible to reduce the number of data points to a specific useful subset?

Thank you for your patience, as I am not a (modern) kernel dev.
Comment 9 Konstantin Belousov freebsd_committer freebsd_triage 2023-05-06 21:19:09 UTC
Yes, I am asking you to check each entry.  This is the way to trim down the search
area for the kernel memory leak.
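
One possible shape for such a check, sketched in sh/awk (it diffs the USED column of every zone between two snapshots; zones that share a name are not distinguished here):

#!/bin/sh
# Snapshot the USED column of every zone, wait, snapshot again, print growth.
snap() {
    vmstat -z | awk -F: 'NF > 1 {
        n = split($2, f, ",")          # SIZE, LIMIT, USED, FREE, ...
        if (n >= 3) print $1 "|" f[3] + 0
    }'
}

snap > /tmp/vmz.before
sleep 3600                             # arbitrary; longer is better for a slow leak
snap > /tmp/vmz.after

awk -F'|' 'NR == FNR { before[$1] = $2; next }
           ($1 in before) && ($2 + 0 > before[$1] + 0) {
               printf "%-30s grew by %d items\n", $1, $2 - before[$1]
           }' /tmp/vmz.before /tmp/vmz.after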
Comment 10 Dave Hayes 2023-05-06 21:31:29 UTC
Ok. Thank you for clarifying. It turns out this was easier than I thought it would be. Here's output from my script (source code on request). I stopped at > 128 pages of lost wired memory but I can run this longer if you like. 

Stopped at > 128 pages of lost wired memory. Total wired change:   130 pages
(vmstat -z)                       PROC grew 1 bytes in that time period
(vmstat -z)            mbuf_jumbo_page grew -1 bytes in that time period
(vmstat -z)                  pf states grew 248 bytes in that time period
(vmstat -z)                   BUF TRIE grew -12 bytes in that time period
(vmstat -z)            pf state scrubs grew -3 bytes in that time period
(vmstat -z)             buf free cache grew 4 bytes in that time period
(vmstat -z)                  malloc-32 grew 8 bytes in that time period
(vmstat -z)                  filedesc0 grew 1 bytes in that time period
(vmstat -z)                      tcpcb grew 4 bytes in that time period
(vmstat -z)                       PGRP grew 1 bytes in that time period
(vmstat -z)                   4 Bucket grew -1 bytes in that time period
(vmstat -z)                 128 Bucket grew -7 bytes in that time period
(vmstat -z)                  MAP ENTRY grew 50 bytes in that time period
(vmstat -z)                      Files grew 8 bytes in that time period
(vmstat -z)                  tcp_inpcb grew 4 bytes in that time period
(vmstat -z)                  VM OBJECT grew 27 bytes in that time period
(vmstat -z)                  32 Bucket grew -2 bytes in that time period
(vmstat -z)                       mbuf grew 3 bytes in that time period
(vmstat -z)            buffer arena-32 grew -1 bytes in that time period
(vmstat -z)                 RADIX NODE grew 56 bytes in that time period
(vmstat -z)                   2 Bucket grew 10 bytes in that time period
(vmstat -z)                      udpcb grew 3 bytes in that time period
(vmstat -z)                 vm pgcache grew 101 bytes in that time period
(vmstat -z)                  64 Bucket grew 1 bytes in that time period
(vmstat -z)                     socket grew 7 bytes in that time period
(vmstat -z)                  udp_inpcb grew 3 bytes in that time period
(vmstat -z)                  malloc-64 grew 13 bytes in that time period
(vmstat -z)                  malloc-16 grew 1 bytes in that time period
(vmstat -z)                malloc-4096 grew 1 bytes in that time period
(vmstat -z)               mbuf_cluster grew -1 bytes in that time period
(vmstat -z)                    VMSPACE grew 1 bytes in that time period
(vmstat -z)                      KNOTE grew 6 bytes in that time period
(vmstat -z)              pf state keys grew 246 bytes in that time period
(vmstat -z)                 malloc-256 grew -34 bytes in that time period
(vmstat -m)                   freework grew by -3 Kbytes in that time period)
(vmstat -m)                    subproc grew by 4 Kbytes in that time period)
(vmstat -m)                   freeblks grew by -1 Kbytes in that time period)
(vmstat -m)                     select grew by 1 Kbytes in that time period)
(vmstat -m)                     newblk grew by -5 Kbytes in that time period)
(vmstat -m)                   freefile grew by -1 Kbytes in that time period)
(vmstat -m)                  bmsafemap grew by -1 Kbytes in that time period)
(vmstat -m)                        pcb grew by 1 Kbytes in that time period)

These numbers come from slurping in --libxo json output. For vmstat -z I use the "name" and "used" keys. For vmstat -m I use the "type" and "memory-use" keys. 
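
A rough equivalent of that extraction, assuming jq (from ports; it is not in the base system) and using a recursive descent since the exact libxo container keys are not quoted here:

# Per-zone item counts: any object carrying both "name" and "used".
vmstat -z --libxo json | \
    jq -r '.. | objects | select(has("name") and has("used")) | "\(.name)\t\(.used)"'

# Per-type memory use: any object carrying both "type" and "memory-use".
vmstat -m --libxo json | \
    jq -r '.. | objects | select(has("type") and has("memory-use")) | "\(.type)\t\(.["memory-use"])"'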

From this output, my likely naive analysis is that your leak is in -none- of these areas. :)
Comment 11 Konstantin Belousov freebsd_committer freebsd_triage 2023-05-06 21:48:49 UTC
There is something wrong in your scripts.  Zones grow by the size of the items
allocated, so e.g. the PROC zone cannot grow by 1 byte.
Comment 12 Dave Hayes 2023-05-06 22:43:42 UTC
What is wrong is likely the label "bytes".

Here's a sample output from vmstat -z --libxo json (formatted for clarity):
         {
            "fail" : 0,
            "free" : 0,
            "limit" : 0,
            "name" : "buffer arena-20",
            "requests" : 0,
            "name" : "PROC",
            "requests" : 3371179,
            "size" : 1360,
            "sleep" : 0,
            "used" : 69,
            "xdomain" : 0
         },

So I use the 'used' field. If the units are not in bytes, then I was ignorant of that. What unit is the 'used' field using?
Comment 13 Konstantin Belousov freebsd_committer freebsd_triage 2023-05-06 23:03:30 UTC
(In reply to Dave Hayes from comment #12)
Unit is the size of the item.

Keep the thing running until you see that the machine has reached an almost
unusable state, then look at the vmstat -z/vmstat -m status.
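
So a zone's footprint in bytes is USED multiplied by SIZE; for the PROC sample above that is 69 * 1360 = 93,840 bytes. A sketch that applies this to every zone by parsing the plain vmstat -z output with awk:

# Footprint per zone in bytes: USED (item count) * SIZE (bytes per item).
vmstat -z | awk -F: 'NF > 1 {
    n = split($2, f, ",")
    if (n >= 3) printf "%-30s %12d bytes\n", $1, f[1] * f[3]
}'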
Comment 14 Dave Hayes 2023-05-07 18:02:35 UTC
Are you aware of this idea?

# vmstat -z | grep pgca
vm pgcache:            4096,      0,  632833,    2480,2107607698,   0,   0,   0
vm pgcache:            4096,      0,  589518,    1688,865006857,  74,   0,   0

My quickly written script does not handle monitoring both of those (this duplication also appears in the JSON output).
Comment 15 Dave Hayes 2023-05-07 18:39:02 UTC
Ok, now I'm handling duplicated names (there was another one too: "buffer arena-40").

The machine I have to test this on is "live" but I can let this run until it loses a fair bit of memory and do a controlled reboot. This will take a while as it appears to be a very slow memory leak.
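
For reference, one minimal way to keep duplicate names apart is to tag each zone with its line position in the vmstat -z listing, which is presumably what the parenthesised numbers in the output further below mean (an assumption, since the script itself is not attached):

# Tag every zone with its line position so zones sharing a name
# (e.g. the two "vm pgcache" entries) remain distinguishable.
vmstat -z | awk -F: 'NF > 1 { printf "%s (%d)\n", $1, NR - 1 }'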
Comment 16 Konstantin Belousov freebsd_committer freebsd_triage 2023-05-07 18:42:41 UTC
(In reply to Dave Hayes from comment #14)
These are per-cpu page cache queues.
Comment 17 Dave Hayes 2023-05-07 18:59:33 UTC
So far, one of the vm pgcache entries in vmstat -z is leaking the most. Do I have to do extra gymnastics to find out which entry it is?
Comment 18 Konstantin Belousov freebsd_committer freebsd_triage 2023-05-07 19:40:57 UTC
Do you have the commit 6094749a1a5dafb8daf98deab23fc968070bc695 in your source
tree?  You should since you specified 13.2-STABLE, but the symptoms require
re-checking.
Comment 19 Dave Hayes 2023-05-07 21:19:30 UTC
I have manually inspected sys/vm/vm_page.c for the listed commit. It is there.
Comment 20 Dave Hayes 2023-05-07 21:22:45 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=256507 ... in case you are unaware that I also initiated that one. 

In 12.4, this memory leak is only visible if you do specific math on the sysctls. In 13.2, it at least shows that wired is leaking slowly, but the "root bug" has yet to be found.
Comment 21 Dave Hayes 2023-05-08 17:25:15 UTC
[Mon May  8 10:23:42 2023] (842226)  (wired) Changed by     7 pages (now at 4.205 Gb, total change since start 13210 pages)
[Mon May  8 10:23:42 2023] (842226)  (vmstat -z)                  vm pgcache (18) size changed by 73 units of whatever the 'used' field is in (14280 since start)

One of the vm pgcache entries is still showing the most leakage. Should I continue this test?
Comment 22 Dave Hayes 2023-05-08 23:05:35 UTC
Created attachment 242068 [details]
Graphical evidence of vm pgcache being the source of the leak
Comment 23 Dave Hayes 2023-05-11 02:43:08 UTC
After 48 hours here is the top of the file containing output of the script:

Total wired change: 37450 pages
(vmstat -z)                  vm pgcache (18) grew 43426 units of whatever the 'used' field is in inside that time period
(vmstat -z)                  RADIX NODE (41) grew 601 units of whatever the 'used' field is in inside that time period
(vmstat -z)                  vm pgcache (19) grew 565 units of whatever the 'used' field is in inside that time period
(vmstat -z)                 UMA Slabs 0 (25) grew 277 units of whatever the 'used' field is in inside that time period
(vmstat -z)                mbuf_cluster (94) grew 252 units of whatever the 'used' field is in inside that time period
(vmstat -z)              pf state keys (158) grew 228 units of whatever the 'used' field is in inside that time period
(vmstat -z)                  pf states (157) grew 197 units of whatever the 'used' field is in inside that time period
Comment 24 Konstantin Belousov freebsd_committer freebsd_triage 2023-05-11 23:47:57 UTC
43426 cached pages is equal to around 170MB of cached free memory.
First, this is not wired memory, second, it is allocatable on demand.

There should be something else that leaks wirings.  It would be useful
to see e.g. simple output from the top(1) Mem: line when the situation occurs.
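
For reference, the arithmetic behind that figure at the 4096-byte page size:

# 43426 pages at 4096 bytes each, expressed in MB:
echo $((43426 * 4096 / 1024 / 1024))    # prints 169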
Comment 25 Dave Hayes 2023-05-12 01:39:29 UTC
So, as part of monitoring, the machine in question does repeated netstats and hence is always in that state. It's just a matter of figuring out what statistics are useful to you in finding the problem, as I likely collect way too much. As you might have noticed, I can produce graphs of any sysctl MIB that is exposed to root users. 

Here is the information you requested:

  Mem: 66M Active, 1015M Inact, 4439M Wired, 1419M Buf, 10G Free

It took some weeks for the machine to start thrashing last time, this is because the leak is very slow. My while loop above is just to make the leak more apparent. That vm pgcache value is still growing, even though I turned the loop off (as is wired memory).
Comment 26 Konstantin Belousov freebsd_committer freebsd_triage 2023-05-12 17:47:56 UTC
(In reply to Dave Hayes from comment #25)
So there is nothing unusual in the data you have shown, and moreover, there was
no de facto demonstration of the problem.  I do not see a leak in any of the
numbers you have provided so far.

Gather the data I requested from the machine when it does have the problem.
Comment 27 Dave Hayes 2023-05-13 01:20:48 UTC
That will be impossible, since once it does have the problem I cannot log in and command prompts do not work, as evidenced by previous interactions with a machine exhibiting this bug.

If the attached graph doesn't show a problem to you, and neither does the fact that the sign of the slope of that graph hasn't changed in all this time, then I am at a loss as to what will demonstrate this issue to your opaque standards of "demonstrated".

Perhaps if you refrained from treating me like a mushroom and actually explained your reasoning, I might find another way to demonstrate this issue. 

If you don't care at all, I can respect that, but at least tell me that you do not care.
Comment 28 kadir kose 2023-12-14 09:39:58 UTC
Hi, I encountered the same issue. Based on the sysctl outputs below, I think the lost memory is due to unswappable pages, because the unswappable page count keeps increasing while free RAM decreases in the output of top. However, I don't know why.


vm.stats.vm.v_laundry_count: 56910
vm.stats.vm.v_inactive_count: 329390
vm.stats.vm.v_active_count: 313255
vm.stats.vm.v_wire_count: 132393
vm.stats.vm.v_free_count: 44036
vm.stats.vm.v_page_count: 1000174
vm.stats.vm.v_page_size: 4096

vm.domain.0.stats.unswappable: 98333
vm.domain.0.stats.laundry: 56910
vm.domain.0.stats.inactive: 329390
vm.domain.0.stats.active: 313255
vm.domain.0.stats.free_count: 44036
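
A minimal sketch for confirming that trend over time (plain sh; the five-minute interval is arbitrary):

#!/bin/sh
# Log the counters above every five minutes so their trend is visible.
while true; do
    printf '%s unswappable=%s wired=%s free=%s\n' \
        "$(date '+%Y-%m-%dT%H:%M:%S')" \
        "$(sysctl -n vm.domain.0.stats.unswappable)" \
        "$(sysctl -n vm.stats.vm.v_wire_count)" \
        "$(sysctl -n vm.stats.vm.v_free_count)"
    sleep 300
done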