After the MFC of base r349840 and base r349841 to 12-STABLE as base r350374, attempts to start a VirtualBox Windows 7 guest would fail when there was memory pressure forcing swapping. Another user running HEAD on an RPi3 with 6 GB of RAM is reporting similar OOM issues, though his case is triggered by building www/chromium. It does involve OOM with a lot of swap available, and it appears not to occur if r349840 and r349841 are rolled back. Just reverting base r349841 does not resolve the issue. Note that r349840 should do absolutely nothing on systems with less than 16 GB of RAM, per the author, markj@. So far he has no explanation for the problem.

My system is a Lenovo T520 with a Sandy Bridge processor, 8 GB of RAM and * GB of swap space evenly split over two drives. The amount of swap space in use seems irrelevant, as the failure has occurred with over 7 GB of free swap space. The failure never occurs when the free memory is adequate to start the VM without swapping.

I have not looked at the code in VirtualBox to see how it allocates memory, but I would guess that it uses slab allocation for loading the memory image (4 GB) of the virtual machine. I suspect that it uses malloc for loading the VB (VirtualBox) code. It looks like the load of the virtual machine's memory works OK, as a minimum of 98% of the load always completes without problem, regardless of the amount of swap required. At 98%, the load percentage holds for several seconds and then errors out. If there is no memory pressure, there is no pause at 98%. If the VM is restarted, it succeeds with no pause. An occasional non-fatal variant is similar, except that the pause may happen at 99% and a non-fatal error reports that the VM could not be started due to lack of memory and that applications should be terminated to free memory. The VM is left in the paused (greyed out) state, but it may be immediately "un-paused", and it then runs instantly with no other action.
The other user, Bob Prohaske, provided information on his case and agreed to my including a link to it in this PR. The files are at http://www.zefox.net/~fbsd/rpi3/swaptests/oberman/ I am not attaching he logs due to their size. They include logs of attempts to build chromium which succeeds with r349839 and fails with r349841. Note that his system is running HEAD.
(In reply to rkoberman from comment #0) An RPi3 has only 1 GiByte of RAM. Did you mean 6 GiByte of swap? Was the RPi3 an incorrect identification?

Example messages that were misnomers (taken from another list exchange on the issue):

Sep 11 23:21:43 zipper kernel: pid 16581 (rustc), jid 43, uid 65534, was killed: out of swap space
Sep 12 02:48:23 zipper kernel: pid 1209 (rustc), jid 62, uid 65534, was killed: out of swap space
. . .

Unfortunately, the wording of this type of message is a misnomer for what drove the associated kills: they were actually driven by being unable to gain enough free memory to maintain a minimum amount. FreeBSD will not swap out processes that stay runnable (or are running), only ones that are waiting. Even a single process that stays runnable and keeps lots of RAM in the active category can lead to kills when swap is unused or little used, so the kill behavior is very workload dependent. (Recently, additional kills on other criteria were added on current.)

Real "out of swap" conditions tend to also produce (or always produce?) messages similar to:

Aug 5 17:54:01 sentinel kernel: swap_pager_getswapspace(32): failed

If you are not seeing such messages, then it is likely (certain?) that the amount of swap space still free is not the actual thing driving the kills. Are you seeing "swap_pager_getswapspace(32): failed" messages? Whatever the cause, it would be nice if the "was killed: out of swap space" messages were changed to not be so overly specific/incorrect about it; they frequently mislead people who get them, leading to user activity that cannot help.

From here on I may well be mixing in current (13) material, not just 12.x material.
There are tunables for controlling (some of) the kill behavior that have helped on small armv7 or aarch64 boards:

vm.pageout_oom_seq
vm.pfault_oom_wait
vm.pfault_oom_attempts

Prior to the last two being added in current, increasing vm.pageout_oom_seq (to increase how many times FreeBSD will try to gain free RAM before starting kills) used to be sufficient to get buildworld/buildkernel and other activities to complete on various small armv7 or aarch64 boards. Setting vm.pfault_oom_attempts to -1 disables that pfault-oom mechanism, but if there is a worry about actually running out of swap, that might not be appropriate (FreeBSD stopping overall from lack of resources). When vm.pfault_oom_attempts is not -1, the overall criterion before starting kills is based on vm.pfault_oom_wait * vm.pfault_oom_attempts, though there may be a tradeoff structure between different ways of reaching (nearly) the same product. https://lists.freebsd.org/pipermail/freebsd-current/2019-September/074310.html was the start of an exchange that involved material about the vm.pfault_oom_<?> additions. (It mentions vm.pageout_oom_seq as well.)
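As a concrete illustration (the particular values below are examples drawn from elsewhere in this thread, not recommendations), the tunables above can be set in /boot/loader.conf:

```
# Retry reclaiming RAM this many times before the pageout OOM killer
# starts kills (4096 is the value used on the Pi3 later in this PR).
vm.pageout_oom_seq="4096"

# -1 disables the page-fault OOM mechanism entirely; only appropriate
# if the system cannot actually run out of swap.
vm.pfault_oom_attempts="-1"
```

vm.pageout_oom_seq can also be changed at runtime via sysctl(8); the vm.pfault_oom_* tunables only exist on branches that include the corresponding change.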
Created attachment 208104 [details] proposed patch for head
Created attachment 208105 [details] proposed patch for stable/12
I uploaded a couple of patches which I believe will fix the regression. Could you please confirm that the issue is resolved when they are applied?
Created attachment 208118 [details] proposed patch for stable/12
After applying the latest proposed patches, the problem persists. No change in the symptoms.
There's a logfile for a kill on a pi3 running r353116M at http://www.zefox.net/~fbsd/rpi3/swaptests/oberman/r353116M/ The patch seemed to apply cleanly and there were no problems compiling. The kill is near timestamp Oct 7 05:28:38 and swap storage activity beforehand looks relatively light. There is a surge immediately after the kill, which surprised me. The patch for stable/12 was applied to a pi2 running r353115:353124M (the versioning is something I've not seen before). It applied cleanly and there were no problems compiling. So far, nothing undesirable has happened, but it's worth noting that 32-bit arm has never exhibited the OOMA difficulties associated with 64-bit arm.
(In reply to Mark Millard from comment #1) I will note that I don't get any "out of swap space" messages. You are looking at the test from Bob Prohaska, who also started seeing problems with the same commit to head and had success after reverting that commit. His memory pressure was from attempting to build chromium on a 6G RPi3. I agree that the message is really misleading, but that is not an issue here. I am running on an AMD64 Sandy Bridge system and have not seen the "out of swap space" message, nor are processes being killed.

It looks like VirtualBox loads the VM's RAM successfully, then tries to load the VM support code on the memory-starved system, and that fails; this is suggested by the multi-second delay at 98% loaded. I could be totally wrong about this, but the hypothesis fits the symptoms quite well. I suspect that the RAM loading code is very different from the load of the support code. I would expect the RAM to use slab allocation and the support code to use regular malloc, but, again, this is speculation. The problem is only triggered after the commit of a patch which should never run on a system with 8G of RAM. If I revert that patch, the error never occurs. It seems pretty clear that the commit is the trigger, though the last patch markj posted really should have made the code unreachable for a system with under 16G of RAM.

BTW, at any moment I have a long list of idle processes sitting on quite a bit of memory. I'm running MATE, which is not quite as big a memory pig as Gnome, but it's close, with a lot of assorted stuff that mostly sits around waiting for something to happen that probably won't; those are good candidates for swap. I do have one HUGE memory pig in Firefox, which starts at about 2.5G and climbs to north of 3G. Right now "ps -o rss,command -Uoberman | grep firefox" is showing 3.31G over 10 processes. (Why is resident memory 'rss' in ps and 'res' in top? Minor annoyance.)
One very noticeable artifact of the problem is that VirtualBox always loads 98% of its 4 GB of RAM and then pauses for several seconds before dying, yet always succeeds when the VM is immediately started again, with no processes killed and no memory freed.
It turns out that OOMA kills can be provoked using Peter Holm's stress2 test. An example on a Pi3 running

FreeBSD www.zefox.org 13.0-CURRENT FreeBSD 13.0-CURRENT #2 r353297: Tue Oct 8 03:11:47 PDT 2019 bob@www.zefox.org:/usr/obj/usr/src/arm64.aarch64/sys/GENERIC arm64

produces a log excerpt that looks like this:

procs     memory        page                       disks      faults        cpu
r   b  w    avm    fre  flt  re  pi  po   fr   sr mm0 da0    in    sy    cs us sy id
912 0  4 2371076 19368  135   6   0   0  338  618   0   0 19815  2807 10286  1  9 90

dT: 17.380s  w: 10.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    d/s   kBps   ms/d  %busy Name
    0    105      7    215   62.9     98   1703  134.4      0      0    0.0   74.7 mmcsd0
    0     49      0      4    3.6     49    466    5.3      0      0    0.0   15.1 da0
    0    105      7    215   77.9     98   1703  140.2      0      0    0.0   85.9 mmcsd0s2
    0     57      7    212   73.5     51   1230  247.1      0      0    0.0   80.1 mmcsd0s2a
    0     48      0      4  149.5     48    473   26.1      0      0    0.0   39.7 mmcsd0s2b
    0     57      7    212   73.6     51   1230  247.2      0      0    0.0   80.2 ufs/rootfs
    0     48      0      4    3.7     47    466    5.2      0      0    0.0   15.2 da0p5

Tue Oct 8 13:37:16 PDT 2019
Device          1K-blocks     Used    Avail Capacity
/dev/mmcsd0s2b    4404252    23880  4380372     1%
/dev/da0p5        2097152    23344  2073808     1%
Total             6501404    47224  6454180     1%

Oct 8 13:36:34 www bob[14995]: Starting test all.cfg
Oct 8 13:37:13 www kernel: pid 15110 (swap), jid 0, uid 1001, was killed: out of swap space

I certainly don't know how to fully interpret this, but it seems clear the _amount_ of swap in use isn't excessive. /boot/loader.conf contains vm.pageout_oom_seq="4096". Thanks for reading, I hope it's useful.
(In reply to Bob Prohaska from comment #9) stress2 is designed to trigger low memory conditions, so OOM kills are to be expected.
(In reply to Mark Johnston from comment #10) Does it appear that a low memory condition was actually created, and that the kill was justified in the example provided? The swap devices were not very busy, the amount of swap in use quite modest. Thanks for reading!
(In reply to Bob Prohaska from comment #11) Does it last longer with vm.pfault_oom_attempts=-1? I'm not sure how to uniquely tell which constraint led to a specific out-of-memory kill, but the above setting disables one of the constraints, so a "yes" for "lasts longer" would indicate something about the constraint being hit. If it does last longer, then different combinations of values for:

vm.pfault_oom_wait
vm.pfault_oom_attempts

even for about the same total (the product of the two) may vary the behavior. If it was not this category of constraint, I'm not sure what else to explore, given your already large:

vm.pageout_oom_seq="4096"

To my knowledge there is nothing to enable reporting information that would help identify the constraint(s) being hit (or approached). The old patch that we were supplied and used no longer applies. As far as I know it has not been updated. (And it predated at least the pfault oom settings being available, if I understand right.)
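For instance (these particular numbers are hypothetical, chosen only to illustrate the point about the product), both of the following loader.conf settings allow roughly the same total wait before kills begin, but with different retry granularity:

```
# Many short waits: 10 attempts x 10 s wait = ~100 s total
vm.pfault_oom_attempts="10"
vm.pfault_oom_wait="10"

# Few long waits: 4 attempts x 25 s wait = ~100 s total
vm.pfault_oom_attempts="4"
vm.pfault_oom_wait="25"
```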
(In reply to Mark Millard from comment #12) I haven't tried using vm.pfault_oom_attempts=-1 but can do so after the present experiment completes. That experiment consists of overprovisioning swap to 9 GB, of which 8 GB are usable (4 GB on the microSD card plus 5 GB on a USB flash drive). This is on the Pi3 at r353297. The compilation is not quite half done by file count, but so far swap use is less than half the value reached using the former swap configuration of 4 GB on microSD plus 2 GB on USB flash, resulting in 4 GB usable swap. It's contrary to my expectation. Might it be significant if it holds true?
Created attachment 208237 [details] proposed patch Here is another patch to try. It should apply to both head and stable/12 (with some offset). I noticed that the heuristic we use to decide whether or not to perform per-CPU caching was wrong in that it doesn't take into account the fact that we cache two buckets of up to 256 free pages per CPU, not one. I also adjusted the fraction of total RAM that the per-CPU caches are permitted to cache so that we are more conservative.
(In reply to Mark Johnston from comment #14) For 12-STABLE r353118M: do I just need this patch, and revert the prior patches?
(In reply to rkoberman from comment #15) Yes, please.
(In reply to Mark Johnston from comment #14) Can the new patch be tested on 13.0-CURRENT FreeBSD 13.0-CURRENT #2 r353297? The compilation of www/chromium on that system is still running and probably won't finish for another couple of days. Max swap use so far is ~3.5 GB, about the same as ~6 months ago, compared to 4.2 GB two weeks ago. Presently the machine looks badly swap-bound, with swap devices more than 50% busy and % idle over 50%. Oddly enough, the "swap_pager: indefinite wait buffer: bufobj: ..." messages are very sparse, none for the last half hour. Please indicate if it's more useful to let the compile finish or to interrupt it to apply the patch. At the moment, vm.pageout_oom_seq="4096" and there are 8 GB of usable swap. Might it help to make any "on the fly" changes to vm.pageout_oom_seq, vm.pfault_oom_wait, or vm.pfault_oom_attempts?
(In reply to Mark Johnston from comment #16) Again, it is not fixed. I create memory pressure, start the VirtualBox VM, and an OOM kills it at 98% loaded. As usual, I start the VM again and it starts normally. Again, this is on 12-STABLE r353118M on an AMD64. For some reason, perhaps a fix in Firefox, I could not generate memory pressure at first. For the first time I saw resident memory use by firefox start at 3.2G, but I could not get it to the 3.6G needed to create pressure. I added a real memory hog, a build of lang/rust, and that got available memory down to about 3G, after which the failure to start the VM re-appeared.
(In reply to rkoberman from comment #18) Am I right that hw.ncpu is 4 on your system? With the patch applied, can you run # dtrace -n 'BEGIN{print(`vm_dom[0].vmd_pgcache[0]);}' and paste the output? Could you also attach the output of "sysctl vm.stats" with the patch applied?
Created attachment 208276 [details] vm.stats on amd64 12-STABLE showing error

Yes, hw.ncpu is 4.

CPU     ID                    FUNCTION:NAME
  0      1                           :BEGIN
struct vm_pgcache {
    int domain = 0
    int pool = 0
    uma_zone_t zone = 0xfffff80003308780
}

Attached vm.stats.
(In reply to rkoberman from comment #20) Forgot to mention that the vm.stats were collected on a freshly re-booted system, before any memory pressure or errors.
(In reply to rkoberman from comment #21) That's fine, thanks. Something doesn't add up. In my patch we have:

if (vmd_page_count / 600 < 2 * 256 * mp_ncpus * VM_NFREEPOOL)
        continue;

From your sysctls I get vm.stats.vm.v_page_count == 2001978, you have mp_ncpus == 4, and VM_NFREEPOOL == 2 on amd64. 2001978 / 600 == 3336 and 2 * 256 * 4 * 2 == 4096, so we should not be creating the cache zones on your system, but the dtrace output shows that we did. I tried booting a bhyve VM with 8GB of RAM and 4 CPUs, and I have:

# sysctl vm.stats.vm.v_page_count
vm.stats.vm.v_page_count: 2031299
# sysctl hw.ncpu
hw.ncpu: 4

and

# dtrace -n 'BEGIN{print(`vm_dom[0].vmd_pgcache[0]);}'
dtrace: description 'BEGIN' matched 1 probe
CPU     ID                    FUNCTION:NAME
  3      1                           :BEGIN
struct vm_pgcache {
    int domain = 0
    int pool = 0
    uma_zone_t zone = 0
}

Are you certain you tested the correct patch and kernel?
(In reply to Mark Johnston from comment #22) Here is confirmation of the patch. I did the kernel build with -DNO_CLEAN. # svn st /usr/src/sys ? /usr/src/sys/amd64/conf/GENERIC.4BSD M /usr/src/sys/dev/acpi_support/acpi_ibm.c ? /usr/src/sys/dev/acpi_support/acpi_ibm.c.orig M /usr/src/sys/vm/vm_page.c # svn diff /usr/src/sys/vm/vm_page.c Index: /usr/src/sys/vm/vm_page.c =================================================================== --- /usr/src/sys/vm/vm_page.c (revision 353118) +++ /usr/src/sys/vm/vm_page.c (working copy) @@ -199,10 +199,13 @@ vmd = VM_DOMAIN(domain); /* - * Don't allow the page caches to take up more than .25% of - * memory. + * Don't allow the page caches to take up more than .1875% of + * memory. A UMA bucket contains at most 256 free pages, and w + * have two buckets per CPU per free pool. */ if (vmd->vmd_page_count / 400 < 256 * mp_ncpus * VM_NFREEPOOL) + if (vmd->vmd_page_count / 600 < 2 * 256 * mp_ncpus * + VM_NFREEPOOL) continue; for (pool = 0; pool < VM_NFREEPOOL; pool++) { pgcache = &vmd->vmd_pgcache[pool];
Created attachment 208284 [details] debug patch (In reply to rkoberman from comment #23) Are you using a non-default kernel configuration? Could you share it? Would you please also try applying the attached patch on top of the existing patch, and post the dmesg from a fresh boot? Thanks for your patience.
I just noticed an error in my application of the patch. The diff I posted is missing a rather critical '-'. I just rebuilt the kernel and will test shortly. My sincere apologies for the waste of your time. Should the problem NOT be fixed, I will add the debug code, but I don't think it will be needed.
Looks like the patch did the trick. I just started my VM with only 2.5G of available RAM with no problems. Thanks so much for tracking this down, and I apologize again for the botched patch. With the >30-line offset, I applied the patch manually, and how I missed the deletion of the old line I can't imagine. Rushing, I guess. The date on my kernel was only 19 minutes after you posted the patch; haste made waste, at least of your time.
(In reply to Mark Johnston from comment #14) I'm doing something wrong; when I copy the unified diff text from the patch into a file and feed it to patch then patch asks what file to patch. Is there a downloadable version that's idiot proof? Apologies for the dumb questions...
(In reply to Bob Prohaska from comment #27) # cd /usr/src # patch < patch-file # There are other ways, but this should work. Should work for HEAD.
(In reply to Bob Prohaska from comment #27) It looks as if removing the absolute pathnames from the patch fixed my mistake. The kernel compiled and booted, and a new build of www/chromium is running now. Apologies for the noise.
(In reply to rkoberman from comment #26) No worries, I'm glad that the results finally make sense. :) I submitted a patch for review: https://reviews.freebsd.org/D22040 I should be able to get it into 12.1.
(In reply to Bob Prohaska from comment #27) For patches generated from git ("diff --git") you need to strip off the dummy path element: cd /usr/src && patch -p1 < /path/to/diff.
Compilation of www/chromium started out well using attachment 208237 [details], seemingly running much faster (11,000 files in 36 hours) than previously. Make stopped around Tue Oct 15 08:36:22 PDT 2019. The first sign of trouble is Oct 15 08:36:12 www kernel: smsc0: warning: failed to create new mbuf followed less than a minute later by Oct 15 08:37:43 www kernel: pid 27729 (c++), jid 0, uid 0, was killed: out of swap space Total swap use looks fairly modest, less than 1 GB, and swap devices were busy but not saturated, ~80% for microSD and 23% for USB. Those are all well below recent maxima. The logs of make and the swapscript (vmstat, gstat, swapinfo and tail of the console) are at http://www.zefox.net/~fbsd/rpi3/swaptests/oberman/r353297M/ For clarity, the host is an RPI3B running GENERIC with 4 GB of swap on microSD and 5 GB of swap on usb flash.
(In reply to Bob Prohaska from comment #32) I believe you are on -CURRENT? Could you post the output of "vmstat -z | grep pgcache" from your system with that patch applied?
(In reply to Mark Johnston from comment #33) The output of "sysctl vm.stats" would also be helpful.
(In reply to Mark Johnston from comment #33) Run after the compile had stopped, the output is null.
(In reply to Mark Johnston from comment #34) root@www:/usr/ports/www/chromium # sysctl vm.stats vm.stats.page.queue_nops: 436827 vm.stats.page.queue_ops: 1198763477 vm.stats.object.bypasses: 86464 vm.stats.object.collapses: 270427 vm.stats.vm.v_pdpages: 1877368333 vm.stats.vm.v_tcached: 0 vm.stats.vm.v_cache_count: 0 vm.stats.vm.v_user_wire_count: 0 vm.stats.vm.v_free_severe: 939 vm.stats.vm.v_interrupt_free_min: 2 vm.stats.vm.v_pageout_free_min: 34 vm.stats.vm.v_laundry_count: 20 vm.stats.vm.v_inactive_count: 152121 vm.stats.vm.v_inactive_target: 7441 vm.stats.vm.v_active_count: 10919 vm.stats.vm.v_wire_count: 44702 vm.stats.vm.v_free_count: 27930 vm.stats.vm.v_free_min: 1514 vm.stats.vm.v_free_target: 4961 vm.stats.vm.v_free_reserved: 365 vm.stats.vm.v_page_count: 230168 vm.stats.vm.v_page_size: 4096 vm.stats.vm.v_kthreadpages: 0 vm.stats.vm.v_rforkpages: 1544014 vm.stats.vm.v_vforkpages: 13113165 vm.stats.vm.v_forkpages: 1211415 vm.stats.vm.v_kthreads: 25 vm.stats.vm.v_rforks: 23754 vm.stats.vm.v_vforks: 87630 vm.stats.vm.v_forks: 17234 vm.stats.vm.v_tfree: 500721967 vm.stats.vm.v_pfree: 83354209 vm.stats.vm.v_dfree: 63883592 vm.stats.vm.v_pdshortfalls: 28080 vm.stats.vm.v_pdwakeups: 11081 vm.stats.vm.v_reactivated: 8170291 vm.stats.vm.v_intrans: 4744832 vm.stats.vm.v_vnodepgsout: 13636 vm.stats.vm.v_vnodepgsin: 1268156 vm.stats.vm.v_vnodeout: 445 vm.stats.vm.v_vnodein: 165935 vm.stats.vm.v_swappgsout: 35426346 vm.stats.vm.v_swappgsin: 19728367 vm.stats.vm.v_swapout: 6560288 vm.stats.vm.v_swapin: 4734304 vm.stats.vm.v_ozfod: 68449 vm.stats.vm.v_zfod: 455274388 vm.stats.vm.v_cow_optim: 341 vm.stats.vm.v_cow_faults: 5144267 vm.stats.vm.v_io_faults: 5023292 vm.stats.vm.v_vm_faults: 512016786 vm.stats.sys.v_soft: 13951607 vm.stats.sys.v_intr: 2398852975 vm.stats.sys.v_syscall: 258985139 vm.stats.sys.v_trap: 7473660 vm.stats.sys.v_swtch: 979740516 root@www:/usr/ports/www/chromium
I'm still seeing "...killed: out of swap space..." on a Pi3 running FreeBSD 13.0-CURRENT #3 r353297M with 8 GB of usable swap while compiling www/chromium. The latest kill occurred with swap use around 300 MB (out of 8 GB) and storage traffic about 50% of maximum. Log files have been added to http://www.zefox.net/~fbsd/rpi3/swaptests/oberman/r353297M/ Might this be a separate issue from the original, since others report success with the patch?
(In reply to Bob Prohaska from comment #38) To my non-expert eye, it looks like there may be two issues. Take a look at Mark Millard's comment #1. You get a misleading error claiming that the system is out of swap space; in fact, you get this error when memory is required but there are no swappable processes. I think the error should really be "Unable to obtain memory by swapping out a process". Also, did you ever post the output of "vmstat -z | grep pgcache"? On a CURRENT system, that should indicate whether this is the same issue.
(In reply to rkoberman from comment #39) Indeed, I suspect that the issue is unrelated. Comment 35 indicates that the per-CPU caches are not in use here. I recall that we have seen spurious OOM kills on systems swapping to SD cards and other slow media, and that was caused by extremely high I/O latency. This is still an issue.
A commit references this bug: Author: markj Date: Fri Oct 18 17:36:43 UTC 2019 New revision: 353734 URL: https://svnweb.freebsd.org/changeset/base/353734 Log: Further constrain the use of per-CPU caches for free pages. In low memory conditions a significant number of pages may end up stuck in the caches, and currently these caches cannot be reaped, leading to spurious memory allocation failures and OOM kills. So: - Take into account the fact that we may cache up to two full buckets of pages per CPU, not just one. - Increase the amount of RAM required per CPU to enable the caches. This is a temporary measure until the page cache management policy is improved. PR: 241048 Reported and tested by: Kevin Oberman <rkoberman@gmail.com> Reviewed by: alc, kib Discussed with: jeff MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D22040 Changes: head/sys/vm/vm_page.c
(In reply to rkoberman from comment #39) root@www:/usr/ports/www/chromium # uname -a FreeBSD www.zefox.org 13.0-CURRENT FreeBSD 13.0-CURRENT #3 r353297M: Sun Oct 13 16:15:00 PDT 2019 bob@www.zefox.org:/usr/obj/usr/src/arm64.aarch64/sys/GENERIC arm64 root@www:/usr/ports/www/chromium # vmstat -z | grep pgcache root@www:/usr/ports/www/chromium #
(In reply to Mark Johnston from comment #40) Does vm.pfault_oom_attempts=-1 disable some of the "extremely high I/O latency" ways of getting OOM kills? All of them? (It might make for a good experiment.) There used to be a patch that helped us know when "extremely high I/O latency" was leading to OOM kills, which we used for investigation. But it no longer applies, and no analogous reporting was added to head (possibly under some means of enabling it that does not require rebuilding). Be that as it may: how long ago was it that Bob was able to do these builds (using at least vm.pageout_oom_seq)? It does appear that something that contributes has changed, if I understand the sequence/history right. (This is not the same as saying that the something is wrong/inappropriate for FreeBSD overall.)
The puzzle for me is that OOMA kills don't seem to correspond in any obvious way to times of worst I/O latency: in the latest example, the kill occurred when writes to mmcsd took about a quarter of a second. Looking through the swapscript log, there are cases of latency many times greater with no OOMA activity. Here's a fairly (though not the most) extreme example:

procs     memory        page                       disks      faults        cpu
r  b  w    avm    fre   flt  re  pi  po   fr    sr mm0 da0    in    sy    cs us sy id
0  0 22 3145364 31732  2665  98 312 183 2741 19705   0   0 22994  1295 16479 47 15 39

dT: 10.011s  w: 10.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    d/s   kBps   ms/d  %busy Name
   12      4      0      2  16087      4    120  12876      0      0    0.0  119.0 mmcsd0
   12      4      0      2  16087      4    120  12876      0      0    0.0  119.0 mmcsd0s2
    6      4      0      0    0.0      4    118  12671      0      0    0.0  105.9 mmcsd0s2a
    6      0      0      2  16087      0      2  16681      0      0    0.0  174.7 mmcsd0s2b
    6      4      0      0    0.0      4    118  12671      0      0    0.0  105.9 ufs/rootfs

Thu Oct 17 06:03:21 PDT 2019
Device          1K-blocks     Used    Avail Capacity
/dev/mmcsd0s2b    4404252   856616  3547636    19%
/dev/da0p6        5242880   853680  4389200    16%
Total             9647132  1710296  7936836    18%

Oct 17 06:00:21 www kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 201025, size: 16384
Oct 17 06:00:21 www kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 268296, size: 4096

Writes were taking sixteen seconds, OOMA didn't trigger, and the build played on. I agree that setting vm.pfault_oom_attempts=-1 is worth a try and will do that next.
(In reply to Mark Johnston from comment #40) You write: ---quote--- I recall that we have seen spurious OOM kills on systems swapping to SD cards and other slow media, and that was caused by extremely high I/O latency. This is still an issue. ---end quote--- This rings a loud bell for me. Right now I'm seeing the "swap_pager: indefinite wait buffer:..." warning when svnlite is updating /usr/src, an activity that doesn't normally involve swapping, but does entail considerable storage activity. L(q) values exceeding 100 appear for mmcsd0, with %busy of zero. A few seconds later %busy exceeds 300% and L(q) shrinks. Is there a bug report that can be followed? I think this is relatively new behavior, appearing sometime in the last couple of months.
A commit references this bug: Author: markj Date: Tue Oct 22 14:10:31 UTC 2019 New revision: 353884 URL: https://svnweb.freebsd.org/changeset/base/353884 Log: MFC r353734: Further constrain the use of per-CPU caches for free pages. PR: 241048 Changes: _U stable/12/ stable/12/sys/vm/vm_page.c
(In reply to Mark Millard from comment #43) ---quote--- Does: vm.pfault_oom_attempts=-1 disable some of the "extremely high I/O latency" ways of getting OOM kills? All? (It might make for a good experiment?) ---end quote--- It certainly makes a difference. Setting vm.pfault_oom_attempts=-1 seemed to increase throughput severalfold, allowing make to build around two-thirds of the files required for www/chromium. Indefinite wait warnings were present, but seemingly somewhat more sparse. It really looked as if system overhead was greatly reduced. However, the machine stalled silently. The last log entries suggest both swap and storage activity were well below maximum observed values at the time of the stoppage. The log files, final top window and backtrace can be found at http://www.zefox.net/~fbsd/rpi3/swaptests/oberman/r353297M/ in the files containing "nopfault" in the name.
I'm using ZFS on a VPS with very little RAM (512 MB). It worked well on 10.3R, 11.0R, 11.1R, 11.2R and 12.0R. Recently I upgraded to 12.1R and many processes were killed because of 'out of swap space', but with no 'swap_pager_getswapspace failed'. So I set vm.pageout_oom_seq=1024 in /boot/loader.conf, and this reduced the number of killed processes, but sometimes some processes were still killed. Increasing vm.pageout_oom_seq from 1024 to 10240 did not change anything. I applied the vm_page.c patch to 12.1R-p1, but it didn't stop processes being killed with 'out of swap space'.
(In reply to Masachika ISHIZUKA from comment #48) I looked and stable/12/sys/vm/vm_fault.c -r351776 got the changes from head for the likes of vm.pfault_oom_attempts so there are more ways/contexts that can OOM and more things to control the overall behavior. You may want to try: vm.pfault_oom_attempts=-1 in addition to the: vm.pageout_oom_seq=1024 and see what the consequences are for what it disables. Use of the -1 presumes that you know the configuration will not run out of swap, as I understand other notes that were in the lists. If the -1 seems to help but running out of swap is a possibility you can not avoid, there may be other appropriate settings for the combination of: vm.pfault_oom_wait vm.pfault_oom_attempts that would be appropriate. (I've not explored such. As I remember the description different factors that multiply out to [nearly] the same figure can have differing consequences/tradeoffs.)
(In reply to Mark Millard from comment #49) Thank you for the reply. As I know that the configuration will not run out of swap, I'll try vm.pfault_oom_attempts="-1". Most of the 'out of swap space' process kills occurred within three to seven days, during the periodic daily cron job.
(In reply to Masachika ISHIZUKA from comment #50) Today's periodic daily run produced 'swap_pager: I/O error - pageout failed' warnings as follows.

% grep swap_pager /var/log/messages
Nov 30 03:05:11 peach kernel: swap_pager: I/O error - pageout failed; blkno 6071,size 4096, error 12
Nov 30 03:05:12 peach kernel: swap_pager: I/O error - pageout failed; blkno 6072,size 4096, error 12
Nov 30 03:05:12 peach kernel: swap_pager: I/O error - pageout failed; blkno 6073,size 4096, error 12
Nov 30 03:05:12 peach kernel: swap_pager: I/O error - pageout failed; blkno 6077,size 4096, error 12
Nov 30 03:05:12 peach kernel: swap_pager: I/O error - pageout failed; blkno 6078,size 4096, error 12
Nov 30 03:05:12 peach kernel: swap_pager: I/O error - pageout failed; blkno 6079,size 8192, error 12
Nov 30 03:05:12 peach kernel: swap_pager: I/O error - pageout failed; blkno 6081,size 4096, error 12
Nov 30 03:05:12 peach kernel: swap_pager: I/O error - pageout failed; blkno 6084,size 8192, error 12
Nov 30 03:05:12 peach kernel: swap_pager: I/O error - pageout failed; blkno 6086,size 4096, error 12
Nov 30 03:05:12 peach kernel: swap_pager: I/O error - pageout failed; blkno 6087,size 4096, error 12
Nov 30 03:05:12 peach kernel: swap_pager: I/O error - pageout failed; blkno 6088,size 4096, error 12
Nov 30 03:05:12 peach kernel: swap_pager: I/O error - pageout failed; blkno 6092,size 4096, error 12
Nov 30 03:05:12 peach kernel: swap_pager: I/O error - pageout failed; blkno 6096,size 4096, error 12
Nov 30 03:05:12 peach kernel: swap_pager: I/O error - pageout failed; blkno 6097,size 8192, error 12
% sysctl kern.maxswzone
kern.maxswzone: 0
% swapinfo
Device          1K-blocks     Used    Avail Capacity
/dev/vtbd0p3      1048540   277948   770592    27%
% gpart show -l
=>       34  41942973  vtbd0  GPT  (20G)
         34       512      1  freebsd-boot  (256K)
        546  39845376      2  peach  (19G)
   39845922   2097085      3  freebsd-swap  (1.0G)

vtbd0 is an SSD on the VPS.
Although I got these warning messages with today's periodic daily cron job, it seems to have finished normally.
(In reply to Masachika ISHIZUKA from comment #51) In today's periodic daily cron job, running the patched vm_page.c on 12.1R-p1, 'out of swap space' kills with no 'swap_pager_getswapspace failed' occurred again, as follows.

% grep swap /var/log/messages
Dec  3 03:52:41 peach kernel: pid 2888 (mysqld), jid 0, uid 88, was killed: out of swap space
Dec  3 03:52:41 peach kernel: pid 49246 (httpd), jid 0, uid 80, was killed: out of swap space
Dec  3 03:52:41 peach kernel: pid 1604 (nsd), jid 0, uid 216, was killed: out of swap space
% sysctl vm.pageout_oom_seq
vm.pageout_oom_seq: 1024
% sysctl vm.pfault_oom_attempts
vm.pfault_oom_attempts: -1
(In reply to Masachika ISHIZUKA from comment #52) I'm afraid that, short of trying to add code to force a back-trace near the "was killed: out of swap space" notice, I do not have a clue how to find out what context and condition is leading to the process kills for your context. Nor am I aware of anything else for trying to avoid the behavior. (I'm not claiming forcing a back-trace to display is reasonable in the context. I just do not have any other ideas.)
I have been following this bug as I have been having either the same or a similar problem. I am running unpatched 12.1-STABLE r354672 with 8G of RAM and 8G of swap. I am frequently seeing "swap_pager_getswapspace(32): failed" on many days when periodic runs. This seems to be tied to heavy memory pressure with little available RAM and one program rapidly eating memory. I have confirmed that the pressure is the result of Firefox consuming over 6G of RAM as well as over 6G of swap. I have not determined the exact amount of swap consumed, but it is possibly close to 7G of the 8G available.

I have seen this happen at other times when I am logged in and active on the system. One firefox process suddenly starts growing rapidly even though firefox has been sitting fairly passive for a long time. This all started when a change was committed to STABLE that broke the VM system on machines with <16G of RAM due to a miscalculation. Mark J. fixed this and, shortly after, committed the fix to 12-STABLE. At about the same time I updated Firefox to version 70, so I can't confirm which change triggered this. I can say that killing firefox when this happens can take as long as 30 minutes, with my system effectively dead until then. I will note that the system is an eight year old Lenovo T520 and not the fastest.

The problem is happening right now: one process has grown from 500M to 4.8G as per "top -o res". The system is down to <800M available (free+laundry+inactive) with a very active disk, although, at this time, I still have 5.2G of free swap. Both disks with swap space are very active. Saving this now while I look for other clues.
While I am not sure of it, I now suspect that the issue I am seeing is legitimate memory exhaustion caused by a single web site, newsmemory.com, on which I read a couple of newspapers' digital editions. I have no clue why it is blowing up, but I seriously doubt that it is a system issue. Sorry for the noise!
(In reply to Masachika ISHIZUKA from comment #52) It seems that setting vm.pfault_oom_attempts=-1 in /boot/loader.conf is a good solution for the 'out of swap space' kills that occur without 'swap_pager_getswapspace failed'. They have not happened in the 10 days since the reboot. Previously, I had set vm.pfault_oom_attempts=-1 with sysctl, without rebooting, and the 'out of swap space' kills with no 'swap_pager_getswapspace failed' still occurred. It seems that a reboot was necessary.
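For anyone landing on this bug later, the two knobs discussed in this thread can be made persistent roughly as follows. This is only a sketch: both are runtime sysctls, so /etc/sysctl.conf is the usual place, although setting them as tunables in /boot/loader.conf (as done above) also works.

```shell
# /etc/sysctl.conf -- sketch of the settings discussed in this thread.
# Disable the page-fault OOM kill path entirely; per the thread, this is
# only safe if the system can never genuinely run out of swap.
vm.pfault_oom_attempts=-1
# Require many consecutive page-daemon passes without progress before the
# OOM killer is invoked (the default is much smaller, 12 if I recall).
vm.pageout_oom_seq=1024
```

Note that, as the comments above and below show, raising these values reduced but did not fully eliminate the kills in this reporter's environment.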
(In reply to Masachika ISHIZUKA from comment #56) Unfortunately, 'out of swap space' kills without 'swap_pager_getswapspace failed' occurred during this morning's daily cron job, as follows.

% grep swap /var/log/messages
Dec 14 04:16:05 peach kernel: pid 59975 (mysqld), jid 0, uid 88, was killed: out of swap space
Dec 14 04:16:05 peach kernel: pid 48249 (httpd), jid 0, uid 80, was killed: out of swap space
Dec 14 04:16:05 peach kernel: pid 26677 (httpd), jid 0, uid 80, was killed: out of swap space
Dec 14 04:16:05 peach kernel: pid 60170 (nsd), jid 0, uid 216, was killed: out of swap space
I don't know if this is related, but on r358483 I am seeing swap space being consumed by workloads that do not consume it on r357362. Eventually the system becomes unresponsive. One difference I noticed while watching top is that on r358483 I see almost all my RAM in 'Wired', but on r357362 only a few gigs. I'm also not sure if this is related to chromium 79; I'm on 78 on the earlier kernel and 79 on the later one. If this is definitely not related I can make a new ticket.
(In reply to mmatalka from comment #58) Assuming this is on head, you are probably hitting the wired memory leak fixed in r358505.
Thank you, I'll verify.
I'm going to resolve the bug as fixed, since I committed several improvements to our low memory handling based on Kevin's test case of resuming a Windows VM in VirtualBox. I now use Windows+VirtualBox a fair bit myself, so it's been possible to continue verifying the kernel's behaviour in that scenario. VirtualBox is rather special since it abuses some kernel interfaces to allocate memory at a very high priority, and immediately returns failure when an allocation fails. We can't easily fix this since it's third-party code. The FreeBSD changes in question are r355002, r355003 and r355004. The last couple of comments deal with a kernel memory leak; that is unrelated to the original issue and has been fixed.
FYI: I will note that https://lists.freebsd.org/archives/freebsd-hackers/2022-April/001022.html is the start of a sequence of messages from someone trying to buildkernel their way from 12.1-RELEASE to 12.3 and finding that kills lead to: "Can not build kernel on 1GB VM". 2019-Nov-04 was the initial 12.1-RELEASE date, about a month after this bugzilla submittal. (12.1 is no longer supported as well.) So it might be evidence that there still was a problem. The context is old, so the standard messaging reports "out of swap space" as the reason for the kill, no matter which of the 4(?) conditions actually initiates the kill. (Likely the interesting time is earlier than that.) So far my attempt to help with the classic sysctl settings for avoiding the kills has not been much help. I also do not know if there is a type of information from the failure that might still be of use for supported releases and stable/12, given that there is a repeatable failure context to look at.