Bug 256507

Summary: Apparent kernel memory leak in 12-STABLE r368820
Product: Base System
Reporter: dave
Component: kern
Assignee: freebsd-bugs (Nobody) <bugs>
Status: New
Severity: Affects Some People
CC: avg, dpetrov67, freebsd, ota
Priority: ---
Version: 12.2-STABLE
Hardware: amd64
OS: Any
Attachment: Memory graph (flags: none)

Description dave 2021-06-09 18:31:28 UTC
Created attachment 225666
Memory graph

Consider the following output:

 # sysctl vm.stats.vm | grep count
 vm.stats.vm.v_cache_count: 0
 vm.stats.vm.v_user_wire_count: 0
 vm.stats.vm.v_laundry_count: 0
 vm.stats.vm.v_inactive_count: 121191
 vm.stats.vm.v_active_count: 20836
 vm.stats.vm.v_wire_count: 754310
 vm.stats.vm.v_free_count: 254711
 vm.stats.vm.v_page_count: 3993253

These numbers do not add up. The individual counters sum to 1,151,048 pages, while v_page_count reports 3,993,253, so 2,842,205 pages are unaccounted for.

I have detailed statistics for this machine in Prometheus, and a graph of the issue is attached. I calculate "lost memory" by adding up all the _count variables except v_page_count, and then subtracting that sum from v_page_count.
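For anyone who wants to reproduce the "lost memory" figure, here is a minimal Python sketch of that calculation applied to the sysctl output quoted above. The function names (parse_counts, unaccounted_pages) are made up for illustration, and the 4 KiB page size is an assumption for amd64 that should be checked against hw.pagesize on the machine in question.

```python
# Sketch only: reproduces the "lost memory" arithmetic from the report.
# Input is the sysctl output pasted in the description, not a live query.
SYSCTL_OUTPUT = """\
vm.stats.vm.v_cache_count: 0
vm.stats.vm.v_user_wire_count: 0
vm.stats.vm.v_laundry_count: 0
vm.stats.vm.v_inactive_count: 121191
vm.stats.vm.v_active_count: 20836
vm.stats.vm.v_wire_count: 754310
vm.stats.vm.v_free_count: 254711
vm.stats.vm.v_page_count: 3993253
"""

PAGE_SIZE = 4096  # assumption: 4 KiB pages on amd64; verify with `sysctl hw.pagesize`


def parse_counts(text):
    """Map short counter names (e.g. 'v_wire_count') to integer values."""
    counts = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(": ")
        if sep and name.endswith("_count"):
            counts[name.rsplit(".", 1)[-1]] = int(value)
    return counts


def unaccounted_pages(counts):
    """v_page_count minus the sum of every other *_count counter."""
    accounted = sum(v for k, v in counts.items() if k != "v_page_count")
    return counts["v_page_count"] - accounted


counts = parse_counts(SYSCTL_OUTPUT)
lost = unaccounted_pages(counts)
print(lost, round(lost * PAGE_SIZE / 2**30, 2))  # prints: 2842205 10.84
```

Under the 4 KiB assumption, that is roughly 10.84 GiB not covered by any of the listed counters. On a live system the same text could be obtained by piping `sysctl vm.stats.vm` into the script instead of using the hard-coded string.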

You will note that over time, the system gradually loses free memory. Eventually this machine will start swapping and then exhaust swap space and hang. This is one example from a machine that is running relatively few services. It is not running ZFS. However, I observe the same behavior on a few other machines with disparate services and some of those are running ZFS.

I have spent some time asking on lists and looking at various sysctl values to try to determine whether I am missing something or not. I was unable to find anything relevant, and having come to the freebsd-stable list to find two others experiencing this issue, I'm filing this bug. 

Any data anyone needs, just ask me. I actually use prometheus_sysctl_exporter (thanks for that btw!). Thanks in advance. :)
Comment 1 Andriy Gapon freebsd_committer 2021-06-10 06:15:07 UTC
Just want to note that I noticed a very similar problem with stable/13.
So far I haven't been able to find any clues.
In the original report the number of unaccounted pages seems to grow smoothly and linearly.  In my case I see it growing in steps.  That is, the number would stay pretty constant (with some jitter) and then would jump over a short period of time.

I see some correlation between the jumps and certain activity, but I cannot pinpoint what exactly causes it.

Some possibilities:
- the activity involves some db-style updates via mmap
- the activity involves "spawning" of processes
- the activity involves a daemon built on Mono / .NET
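
To help line the jumps up against activity logs, one option is to flag the samples where the unaccounted page count rises by more than the usual jitter. The Python sketch below is only an illustration: the find_jumps helper, the sample data, and the threshold are all hypothetical, and a jump spread over several samples would need a wider comparison than consecutive points.

```python
# Sketch only: flag step increases in a sampled "unaccounted pages" series.
# SAMPLES is made-up data in (seconds, pages) form, not taken from any machine.
SAMPLES = [
    (0, 1000),
    (60, 1010),
    (120, 995),
    (180, 6000),   # first step
    (240, 6012),
    (300, 5990),
    (360, 11020),  # second step
]


def find_jumps(samples, threshold):
    """Return (timestamp, delta) for each sample that rose by more than
    `threshold` pages relative to the previous sample."""
    jumps = []
    for (_, v_prev), (t_cur, v_cur) in zip(samples, samples[1:]):
        delta = v_cur - v_prev
        if delta > threshold:
            jumps.append((t_cur, delta))
    return jumps


# The threshold is a guess; it should sit well above the observed jitter.
print(find_jumps(SAMPLES, threshold=1000))  # prints: [(180, 5005), (360, 5030)]
```

The resulting timestamps can then be compared with process start times or application logs to see which of the listed activities coincides with each step.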