Created attachment 225666 [details]
Memory graph

Consider the following output:

# sysctl vm.stats.vm | grep count
vm.stats.vm.v_cache_count: 0
vm.stats.vm.v_user_wire_count: 0
vm.stats.vm.v_laundry_count: 0
vm.stats.vm.v_inactive_count: 121191
vm.stats.vm.v_active_count: 20836
vm.stats.vm.v_wire_count: 754310
vm.stats.vm.v_free_count: 254711
vm.stats.vm.v_page_count: 3993253

It should be pretty clear that these numbers do not add up: there are missing memory pages. I have detailed statistics for this machine in prometheus; a graph of the issue is attached. I calculate "lost memory" by adding up all the _count variables except v_page_count and then subtracting that sum from v_page_count.

You will note that over time the system gradually loses free memory. Eventually this machine starts swapping, then exhausts swap space and hangs.

This is one example from a machine that is running relatively few services. It is not running ZFS. However, I observe the same behavior on a few other machines with disparate services, and some of those are running ZFS.

I have spent some time asking on lists and looking at various sysctl values to determine whether I am missing something. I was unable to find anything relevant, and having come to the freebsd-stable list and found two others experiencing this issue, I'm filing this bug.

Any data anyone needs, just ask me. I actually use prometheus_sysctl_exporter (thanks for that, btw!). Thanks in advance. :)
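P.S. For concreteness, the calculation is roughly this (a quick sh sketch of what my prometheus queries compute, not authoritative; it assumes the v_*_count sysctls shown above):

#!/bin/sh
# "Lost" pages = v_page_count minus the sum of every other v_*_count counter.
page=$(sysctl -n vm.stats.vm.v_page_count)
sum=0
for c in cache user_wire laundry inactive active wire free; do
    sum=$((sum + $(sysctl -n vm.stats.vm.v_${c}_count)))
done
echo "lost pages: $((page - sum))"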
Just want to note that I noticed a very similar problem with stable/13. So far I haven't been able to find any clues.

In the original report the number of unaccounted pages seems to grow smoothly and linearly. In my case I see it growing in steps. That is, the number stays pretty constant (with some jitter) and then jumps over a short period of time. I see some correlation between the jumps and certain activity, but I cannot pinpoint what exactly causes it. Some possibilities:
- the activity involves some db-style updates via mmap
- the activity involves "spawning" of processes
- the activity involves a daemon built on Mono / .NET
We have exactly the same problem as Dave and Andriy describe, immediately after we upgraded to 13.1-RELEASE. The server is a Supermicro SYS-6019P-MTR with 128 GB RAM, ZFS, etc. Everything had been working normally for years with the 12.x branch.
Created attachment 234961 [details]
Past 3 months of activity

I've provided some more graphical data, graphing the actual sysctl vm.stats.vm.*_count stats against the lost-memory graph. I'm hoping this will shed enough light that someone who knows more can help fix it. I'm willing to provide almost any data needed.
Created attachment 234965 [details]
monthly graph
Created attachment 234966 [details]
daily graph
Comment on attachment 234965 [details]
monthly graph

In mid-October we added RAM, then there is a period with 12.3-RELEASE, and you can see exactly when we upgraded to 13.1.
Just want to update that I have not been able to root-cause this problem or get rid of it. I thought that I saw some correlation between certain activity on the system and increases in missing pages, but I was never able to reproduce the leak at will, so I'm not sure whether my observation was actually valid. Just in case: it appeared to me that the leak was correlated with an application written in Go. I suspected that it used some compatibility system calls (especially related to mmap) and that there was a bug somewhere in the compat code.
I can confirm the Go idea. The prometheus node_exporter is the only common application between two of my machines that have this bug. Good catch there, I think?

BTW, the bug is still present in stable/12-n1-1115623ac.
(In reply to dave from comment #8)
Just for the record, we don't have anything in/with Go on our server, and after going back to 12.3 everything works normally.
I have to correct my comment #7: the suspect application is not a Go program, it's actually a Mono program. Somehow I confused those two things. Anyway, the problem is still present in 13.1-STABLE 689d65d736bbed. It still correlates with activity of that application.

file identifies the "executable" as:
ELF 64-bit LSB shared object, x86-64, version 1 (FreeBSD), dynamically linked, interpreter /libexec/ld-elf.so.1, for FreeBSD 11.3, FreeBSD-style
The symptoms of this issue appear to be the same as bug 266013. Do the 'missing' pages return to the count if you stop all running services?
So... stopping -all- running services is, to me, an effective reboot. :)

Nevertheless, on my machine with the most minimal service deployment that has the memory issue, I stopped the biggest memory consumers:
- unbound
- node_exporter
- blackbox_exporter

Stopping them did not return the memory, as measured by this script:

--- Begin script
#!/usr/local/bin/perl
use strict;
use warnings;

# Page size in bytes, used to convert pages to MB at the end.
my $pagesize = `sysctl -n vm.stats.vm.v_page_size`;
chomp($pagesize);

# Collect every vm.stats.vm.v_*_count value into a hash keyed by name.
my %db = ();
open(STATS, "sysctl vm.stats.vm |") || die "Can't open sysctl: $!\n";
while (<STATS>) {
    if (/v_(\S+)_count:\s(\d+)/) {
        $db{$1} = $2;
    }
}
close(STATS);

# Start from the total page count and subtract every other counter;
# whatever remains is the "lost" memory.
my $total = $db{'page'};
foreach my $k (keys %db) {
    next if ($k eq 'page');
    $total -= $db{$k};
}

my $totalmemMB = ($pagesize * $total) / (1024 * 1024);
printf("Lost memory: %d %d-byte pages (%.f MB)\n", $total, $pagesize, $totalmemMB);
--- End script

This printed out roughly the same numbers as reported by prometheus after the services were stopped. Of course, I have superpages enabled and yet vm.stats.vm.v_page_size still reports 4096. I've no idea if this is the correct way to calculate the actual memory lost, but it looks correct.
This script might yield negative results because of lazy dequeuing of wired pages, which may occasionally result in pages being double-counted. See bug #234559. Perhaps you are just observing similar artifacts (in the other direction)?
So I only wrote the script for the purpose of addressing comment 11. My main source of data is those sysctls, exported at intervals to prometheus, which is where the graph in the attachments here comes from.

That being said, if stopping and restarting services released the lost memory due to lazy reporting, it should have shown up in prometheus eventually. It has not for the past few hours.

Additionally, bug #234559 seems to be a reporting issue. If that were all that is happening here, I would not have opened this bug. :) If you read the original comment, a machine with this bug left to itself for long enough will start swapping, then thrashing, and finally panic when the swap space is exhausted.
I should also mention this wonderful tool prometheus_sysctl_exporter(8). I have this data ingested into prometheus at a 5 second interval for both machines here that suffer from this bug. If anyone is after specific data in the sysctl space, I probably have it available and can likely render a grafana graph of whatever query you want. I am highly interested in getting this bug fixed.
Can you tell me all the major services that are running on the host? This would help in trying to set up a simple reproduction environment.

The page count from your first comment suggests this is a machine with 16GB of memory, is that correct? The only reports I have for this issue (or a very similar one) are occurring on machines with multiple TB of RAM, which is harder to reproduce.
Can you provide the output of: sysctl vm hw
Created attachment 236137 [details]
Output of sysctl vm hw

> Can you tell me all the major services that are running on the host?

Sure:
- openvpn
- unbound
- openntpd
- openssh_portable
- dhcpd
- node_exporter
- blackbox_exporter
- a couple of minor perl daemons

The -only- commonality with the other machine that has this issue (which happens to be my package builder) is:
- openntpd
- openssh_portable
- node_exporter

> The page count from your first comment suggests this is a machine with
> 16GB of memory, is that correct?

Yes, however the other machine with this issue has 128GB of RAM.

> Can you provide the output of: sysctl vm hw

See attachment.
Hello. I just noticed something similar after upgrading web servers to 13.0-RELEASE, and the same is still in place for 13.1-RELEASE. After the upgrade the servers worked for no more than a few days until memory was exhausted and a reboot was required. Switching 'sendfile on' to 'sendfile off' in the nginx config helped and the servers have now been stable for months; however, monitoring (poor munin, I can provide graphs if required) still shows strange vm behaviour which I didn't observe in 11.1-RELEASE. No ZFS used, only UFS. Probably this will give a clue and help the investigation. Thanks.
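For anyone who wants to try the same workaround, it was essentially just this (a sketch; the config path is the default from the nginx port, adjust to your setup):

# Turn off sendfile in nginx and reload the configuration.
sed -i '' 's/sendfile[[:space:]]*on;/sendfile off;/' /usr/local/etc/nginx/nginx.conf
service nginx reload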
Created attachment 236885 [details]
possible bug fix

If anyone who is able to reproduce this can test a patch, please try this one. It applies only to 13 and later - if you are seeing these problems on 12, there is something unrelated happening.
Created attachment 236921 [details]
possible bug fix

The last patch missed a case; this one addresses it. If you were testing with the previous patch, please try this one instead. Sorry for the inconvenience.
Thanks Mark, I just applied your patch, so we'll just have to wait and see. On our web server the problem manifests within about a day.
(In reply to Mark Johnston from comment #21)
Hi Mark, the patch seems to do the job in our case. We tested it on 13-stable. Still, it is a strange problem; none of our other servers running 13.1 experienced such behavior.
(In reply to Bane Ivosev from comment #23) Thanks for testing. I'm about to commit the patch and will merge to 13 soon.
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=2c9dc2384f85a4ccc44a79b349f4fb0253a2f254

commit 2c9dc2384f85a4ccc44a79b349f4fb0253a2f254
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2022-10-05 19:12:46 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2022-10-05 19:12:46 +0000

    vm_page: Fix a logic error in the handling of PQ_ACTIVE operations

    As an optimization, vm_page_activate() avoids requeuing a page that's
    already in the active queue. A page's location in the active queue is
    mostly unimportant.

    When a page is unwired and placed back in the page queues,
    vm_page_unwire() avoids moving pages out of PQ_ACTIVE to honour the
    request, the idea being that they're likely mapped and so will simply
    get bounced back in to PQ_ACTIVE during a queue scan.

    In both cases, if the page was logically in PQ_ACTIVE but had not yet
    been physically enqueued (i.e., the page is in a per-CPU batch), we
    would end up clearing PGA_REQUEUE from the page. Then, batch processing
    would ignore the page, so it would end up unwired and not in any queues.
    This can arise, for example, when a page is allocated and then
    vm_page_activate() is called multiple times in quick succession. The
    result is that the page is hidden from the page daemon, so while it
    will be freed when its VM object is destroyed, it cannot be reclaimed
    under memory pressure.

    Fix the bug: when checking if a page is in PQ_ACTIVE, only perform the
    optimization if the page is physically enqueued.

    PR:             256507
    Fixes:          f3f38e2580f1 ("Start implementing queue state updates using fcmpset loops.")
    Reviewed by:    alc, kib
    MFC after:      1 week
    Sponsored by:   E-CARD Ltd.
    Sponsored by:   Klara, Inc.
    Differential Revision:  https://reviews.freebsd.org/D36839

 sys/vm/vm_page.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=6094749a1a5dafb8daf98deab23fc968070bc695

commit 6094749a1a5dafb8daf98deab23fc968070bc695
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2022-10-05 19:12:46 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2022-10-12 13:49:25 +0000

    vm_page: Fix a logic error in the handling of PQ_ACTIVE operations

    As an optimization, vm_page_activate() avoids requeuing a page that's
    already in the active queue. A page's location in the active queue is
    mostly unimportant.

    When a page is unwired and placed back in the page queues,
    vm_page_unwire() avoids moving pages out of PQ_ACTIVE to honour the
    request, the idea being that they're likely mapped and so will simply
    get bounced back in to PQ_ACTIVE during a queue scan.

    In both cases, if the page was logically in PQ_ACTIVE but had not yet
    been physically enqueued (i.e., the page is in a per-CPU batch), we
    would end up clearing PGA_REQUEUE from the page. Then, batch processing
    would ignore the page, so it would end up unwired and not in any queues.
    This can arise, for example, when a page is allocated and then
    vm_page_activate() is called multiple times in quick succession. The
    result is that the page is hidden from the page daemon, so while it
    will be freed when its VM object is destroyed, it cannot be reclaimed
    under memory pressure.

    Fix the bug: when checking if a page is in PQ_ACTIVE, only perform the
    optimization if the page is physically enqueued.

    PR:             256507
    Fixes:          f3f38e2580f1 ("Start implementing queue state updates using fcmpset loops.")
    Reviewed by:    alc, kib
    Sponsored by:   E-CARD Ltd.
    Sponsored by:   Klara, Inc.

    (cherry picked from commit 2c9dc2384f85a4ccc44a79b349f4fb0253a2f254)

 sys/vm/vm_page.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)
*** Bug 266013 has been marked as a duplicate of this bug. ***
(In reply to Mark Johnston from comment #20) Is there any chance something like this might be happening on 12? Is there any data you need to address the issue on 12?
(In reply to dave from comment #28)
I'm sorry, I missed this followup. I'd like to know any details about the workloads you're using on stable/12 which exhibit the problem. So far it looks like:
- it happens with or without ZFS in use
- the system is leaking pages at a constant rate
- it only happens on systems running certain Go applications(?)
- stopping services does not cause the lost memory to reappear
- it happens on the latest stable/12

Given that the rate of the leak appears to be nearly constant, is it possible to figure out which service is triggering it?
(In reply to Mark Johnston from comment #29)
Thank you for replying. So to confirm:
- Yes, it happens with or without ZFS in use
- The system is leaking pages at a constant rate (and this rate is different for each machine)
- Both systems are running prometheus exporters (the Go applications you refer to)
- Stopping services does not cause the lost memory to return

However, "latest stable" is not what I am running. stable/12-n1-1115623ac is what I am running, which is effectively 12.3-STABLE from some months ago. I had considered upgrading to the latest stable/12, but the report of the bug in 13 stopped me from doing this.

I personally do not believe a service is triggering it. From my extensive stats, I have almost exactly graphically linked vm.stats.vm.v_free_count to the lost-memory measurement. All the other vm.stats.vm counters have no real graphical correlation to the lost-memory measurement. You can see some of this in one of my attachments. Based on this observation alone, what you described as the cause for this kind of bug in 13 appears to me to be the most likely cause in 12 as well. Do note that I am not a kernel dev. :)

Let me know if you need any more data.
(In reply to dave from comment #30)
Is it possible to see whether stopping the prometheus exporters also stops the page leak?

Can you please share the output of "vmstat -z" from a system that's leaked a "substantial" number of pages? Say, more than 10% of memory.

I am sure that a kernel bug is responsible for what you're seeing, but I'm quite sure it's not the same bug as the one I already fixed. The affected code was heavily rewritten between 12 and 13, which is where the problem was introduced; many of the folks who saw a problem on 13 reported seeing it after an upgrade from 12. The bug in 12 might be similar, but I haven't been able to spot it by code inspection (yet), so right now I'm just trying to narrow down the areas where this bug could plausibly be lurking.
Created attachment 237639 [details]
vmstat -z from 19.2% lost memory

Here you go. Hope this helps.
I have picked the machine with the faster leak (the one I sent the vmstat -z for) and have stopped node_exporter and prometheus_sysctl_exporter. I will leave it in this state for 12-16 hours, after which time I should be able to see the leak stop iff the exporters are the stimulus of this bug.
Created attachment 237658 [details]
possible bug fix for stable/12

Here's a patch for a bug that could cause the symptoms you're seeing. If you're able to test it soon, it would be greatly appreciated; if it fixes the problem I can get it included with a batch of errata patches next week.

If you're not able to patch the kernel, another test to try is to set the vm.pageout_update_period sysctl to 0 on a running system. If the leak still occurs with that setting, then my patch won't help.
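For concreteness, the sysctl test is just something along these lines (note the current value first so you can restore it later; the default is 600, if I recall correctly):

# Record the current value, then disable the periodic active queue scan.
sysctl vm.pageout_update_period
sysctl vm.pageout_update_period=0
# ...then keep watching the "lost memory" graph for a while afterwards.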
While I can patch the kernel, it seems sensible to try your sysctl setting first. Are there any potential side effects other than fixing the issue? :)
(In reply to dave from comment #35)
I don't think there will be any downsides. The scanning is mostly useful when you have a large active queue, which doesn't appear to be the case in your workload.

Though, on second thought, it is possible for the patch to help even if setting pageout_update_period=0 doesn't fix the problem. I had said it wouldn't because I see v_dfree = 0 in the vm sysctl dump you attached, but that might not reflect reality on all of your systems. So, please try the sysctl test, but it's worth also testing the patch no matter what the result.
So the sysctl test, after a couple of hours, shows zero sign of fixing the memory leak. Which stable/12 revision is your patch relative to? A build process shouldn't take too long here, but I want to make sure we are both referencing the same source code; my source code is probably ancient to you.
I wish I could edit comments. I'm looking for which 12/stable revision I need to grab. I can just grab the latest one if that will work.
(In reply to dave from comment #37) What's the value of the vm.stats.vm.v_pfree sysctl on that system?
(In reply to dave from comment #38) Any recent revision will do. That code has not changed much in stable/12 for the past year or so. You mentioned stable/12-n1-1115623ac earlier, which should be fine.
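In case it's useful, here's a rough outline of one way to apply and build it (the patch filename below is just a placeholder for the attachment; adjust paths to taste):

# Fetch (or update) the stable/12 sources and apply the attached patch.
git clone -b stable/12 https://git.FreeBSD.org/src.git /usr/src
cd /usr/src
git apply /path/to/stable12-fix.patch   # placeholder name for the attachment
# Build and install just the kernel, then reboot.
make -j"$(sysctl -n hw.ncpu)" buildkernel
make installkernel
shutdown -r now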
# sysctl -a vm.stats.vm.v_pfree
vm.stats.vm.v_pfree: 8102884297

Want a graph of that over time?
(In reply to dave from comment #41) Thanks. No, a large value just suggests that the patch has a chance of fixing the bug you're hitting.
The patch is not resolving the memory issue. :/
A commit in branch releng/13.1 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=4867d7d34dfd54986d5798eddc3ce92a70cc9841

commit 4867d7d34dfd54986d5798eddc3ce92a70cc9841
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2022-10-05 19:12:46 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2022-11-01 13:28:11 +0000

    vm_page: Fix a logic error in the handling of PQ_ACTIVE operations

    As an optimization, vm_page_activate() avoids requeuing a page that's
    already in the active queue. A page's location in the active queue is
    mostly unimportant.

    When a page is unwired and placed back in the page queues,
    vm_page_unwire() avoids moving pages out of PQ_ACTIVE to honour the
    request, the idea being that they're likely mapped and so will simply
    get bounced back in to PQ_ACTIVE during a queue scan.

    In both cases, if the page was logically in PQ_ACTIVE but had not yet
    been physically enqueued (i.e., the page is in a per-CPU batch), we
    would end up clearing PGA_REQUEUE from the page. Then, batch processing
    would ignore the page, so it would end up unwired and not in any queues.
    This can arise, for example, when a page is allocated and then
    vm_page_activate() is called multiple times in quick succession. The
    result is that the page is hidden from the page daemon, so while it
    will be freed when its VM object is destroyed, it cannot be reclaimed
    under memory pressure.

    Fix the bug: when checking if a page is in PQ_ACTIVE, only perform the
    optimization if the page is physically enqueued.

    Approved by:    so
    Security:       FreeBSD-EN-22:23.vm
    PR:             256507
    Fixes:          f3f38e2580f1 ("Start implementing queue state updates using fcmpset loops.")
    Reviewed by:    alc, kib
    Sponsored by:   E-CARD Ltd.
    Sponsored by:   Klara, Inc.

    (cherry picked from commit 2c9dc2384f85a4ccc44a79b349f4fb0253a2f254)
    (cherry picked from commit 6094749a1a5dafb8daf98deab23fc968070bc695)

 sys/vm/vm_page.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)
(In reply to commit-hook from comment #44)
> FreeBSD-EN-22:23.vm
<https://lists.freebsd.org/archives/freebsd-announce/2022-November/000050.html>

> Affects: FreeBSD 13.1
Created attachment 237828 [details]
Finer grained picture of memory leak

I'm wondering if this picture helps at all toward getting a working patch for FreeBSD 12? It's the finest-grained detail I can display about the memory bug. Are there downsides to attempting to use the patch from 13 as a test?
So, a brief update. I've upgraded one of the affected machines to 13.2-STABLE 8c09bde96. Apparently, the memory leak has moved from "lost" memory to wired memory. The machine has already crashed, going from running out of memory, to swapping, to thrashing. I would say the actual bug hasn't been found yet, but this entire effort has moved the visibility of the bug from "having to be clever about calculations" to "look, the wired memory is leaking".

How would I go about finding this leak? What can I monitor from things like vmstat -z?
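In the meantime I'm watching zone growth with something like this (just a sketch; it assumes the usual "name: size, limit, used, ..." column layout of vmstat -z, and it may well not be the right way to find the leak):

# Rank UMA zones by approximate bytes in use (SIZE * USED); run periodically
# and diff the output to spot zones that only ever grow.
vmstat -z | awk -F'[:,]' 'NR > 1 && NF > 4 {
    printf "%12.1f MB  %s\n", $2 * $4 / 1048576, $1
}' | sort -rn | head -n 20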
I started to notice very similar behavior after upgrading to 13.2-STABLE stable/13-n256460-03b6464b29fe. I use ZFS and my major workload is netatalk. The machine's wired memory goes up over the course of a day or so until many processes get killed. Even getty cannot get pages, which prevents me from logging in.

The strange thing is that I only see this behavior now (after a git pull and update at the beginning of Oct.). My previous update of the system was at the end of July, and after that the system ran continuously without a problem for two months.

How can I help provide more data to help debug this?

$ sysctl vm.stats.vm | grep -i count
vm.stats.vm.v_cache_count: 0
vm.stats.vm.v_user_wire_count: 0
vm.stats.vm.v_laundry_count: 0
vm.stats.vm.v_inactive_count: 946022
vm.stats.vm.v_active_count: 75985
vm.stats.vm.v_wire_count: 1401974
vm.stats.vm.v_free_count: 1633981
vm.stats.vm.v_page_count: 4057118
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=271246

From my perception, it seems that people do not want to fix this issue. :)
Hi, I encountered the same issue. Based on the sysctl output below, I think the lost memory is due to unswappable pages, because the unswappable page count is increasing while free RAM is decreasing in the output of top. However, I don't know why.

vm.stats.vm.v_laundry_count: 56910
vm.stats.vm.v_inactive_count: 329390
vm.stats.vm.v_active_count: 313255
vm.stats.vm.v_wire_count: 132393
vm.stats.vm.v_free_count: 44036
vm.stats.vm.v_page_count: 1000174
vm.stats.vm.v_page_size: 4096
vm.domain.0.stats.unswappable: 98333
vm.domain.0.stats.laundry: 56910
vm.domain.0.stats.inactive: 329390
vm.domain.0.stats.active: 313255
vm.domain.0.stats.free_count: 44036
*** Bug 266272 has been marked as a duplicate of this bug. ***
I have a data point that *might* be related to this. A bit fuzzy, sorry, but maybe it can ring a bell for someone.

Today I compiled a heavyweight port (Mongodb) inside a jail on a server running TrueNAS 13. Inactive memory went through the roof (8 GB on a 16 GB machine) and of course it squeezed the ZFS ARC. I stopped everything I could (there are several jails running stuff) but inactive memory didn't decrease. I even stopped the jails and restarted them, hoping that something was holding that memory (although it doesn't make much sense!).

The big surprise: I didn't reboot the system, but inside the jail where I compiled Mongodb I did a make clean on the port directory. And suddenly Inactive memory went from 8 GB to 1.5 GB!

I am wondering (sorry about this extremely fuzzy data point): is there any sort of directory/cache leak related to jails? Destroying all of those files really resolved the situation.

So, key points:
- Using jails
- Using ZFS (of course)
- Compiling a heavy port and its dependencies inside the jail

Theory: Doing that inside a jail somehow made pages in Inactive memory "stick".
(In reply to Borja Marcos from comment #52) First of all, the symptoms you describe seem to have nothing to do with this bug. This bug is about total memory shrinking (as if physical memory were removed in bits). Second, I suspect that what you observed is related to tmpfs. Anyway, please don't hijack bug reports.
Sorry, I wasn't intending to hijack it at all. I arrived at this particular bug by following some of the duplicates (probably fat-fingering my searches, sorry!) and I thought it might be related. I'll go silent on this one and will search for tmpfs-related bugs. Thank you and please accept my apologies!