Created attachment 225666
Consider the following output:
# sysctl vm.stats.vm | grep count
It should be pretty clear that these numbers do not add up. There are missing memory pages.
I have some detailed statistics of this machine in prometheus. A graph of the issue is attached. I calculate "lost memory" by simply adding up all the _count variables except v_page_count, and then subtracting that sum from v_page_count.
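A minimal sketch of that calculation in Python, for anyone who wants to reproduce it without prometheus. It assumes the usual `name: value` text format printed by `sysctl vm.stats.vm`. The function name `lost_pages` and the sample numbers are invented for illustration and are not from the affected machine.

```python
def lost_pages(sysctl_output: str) -> int:
    """Return v_page_count minus the sum of every other *_count value."""
    counts = {}
    for line in sysctl_output.splitlines():
        name, sep, value = line.partition(":")
        name = name.strip()
        # Skip blank lines and anything that is not a *_count variable.
        if not sep or not name.endswith("_count"):
            continue
        # Keep only the last component, e.g. "v_free_count".
        counts[name.rsplit(".", 1)[-1]] = int(value.strip())
    total = counts.pop("v_page_count")
    return total - sum(counts.values())


# Hypothetical sample input (made-up values, not real output).
SAMPLE = """\
vm.stats.vm.v_page_count: 1000
vm.stats.vm.v_free_count: 100
vm.stats.vm.v_wire_count: 300
vm.stats.vm.v_active_count: 200
vm.stats.vm.v_inactive_count: 250
vm.stats.vm.v_laundry_count: 50
vm.stats.vm.v_free_target: 42
"""

if __name__ == "__main__":
    print(lost_pages(SAMPLE))  # 1000 - (100 + 300 + 200 + 250 + 50) = 100
```

On a live system the same function could be fed the real output, for example via `subprocess.run(["sysctl", "vm.stats.vm"], capture_output=True, text=True).stdout`.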
You will note that over time, the system gradually loses free memory. Eventually this machine will start swapping and then exhaust swap space and hang. This is one example from a machine that is running relatively few services. It is not running ZFS. However, I observe the same behavior on a few other machines with disparate services and some of those are running ZFS.
I have spent some time asking on lists and looking at various sysctl values to determine whether I am missing something, but I was unable to find anything relevant. Having come to the freebsd-stable list and found two others experiencing this issue, I'm filing this bug.
Any data anyone needs, just ask me. I actually use prometheus_sysctl_exporter (thanks for that btw!). Thanks in advance. :)
Just want to note that I noticed a very similar problem with stable/13.
So far I haven't been able to find any clues.
In the original report the number of unaccounted pages seems to grow smoothly and linearly. In my case I see it growing in steps. That is, the number would stay pretty constant (with some jitter) and then would jump over a short period of time.
I see some correlation between the jumps and certain activity, but I cannot pinpoint what exactly causes it.
- the activity involves some db style updates via mmap
- the activity involves "spawning" of processes
- the activity involves a daemon built on Mono / .NET