After the MFC of base r349840 and base r349841 to 12-STABLE as base r350374, attempts to start a VirtualBox Windows 7 guest would fail when there was memory pressure forcing swapping. Another user running HEAD on an RPi3 with 6 GB of RAM is reporting similar OOM issues, though his case is triggered by building www/chromium. It does involve OOM with a lot of swap available, and it appears not to occur if r349840 and r349841 are rolled back. Just reverting base r349841 does not resolve the issue. Note that r349840 should do absolutely nothing on systems with less than 16 GB of RAM, per the author, markj@. So far he has no explanation for the problem.

My system is a Lenovo T520 with a Sandy Bridge processor, 8 GB of RAM and * GB of swap space evenly split over two drives. The amount of swap space in use seems irrelevant, as the failure has occurred with over 7 GB of free swap space. The failure never occurs when the free memory is adequate to start the VM without swapping.

I have not looked at the code in VirtualBox to see how it allocates memory, but I would guess that it uses slab allocation for loading the memory image (4 GB) of the virtual machine. I suspect that it uses malloc for loading the VB (VirtualBox) code. It looks like the load of the virtual machine's memory works OK, as a minimum of 98% of the load always completes without problem, regardless of the amount of swap required. At 98%, the load percentage holds for several seconds and then errors out. If there is no memory pressure, there is no pause at 98%. If the VM is restarted, it succeeds with no pause. An occasional non-fatal variant is similar, except that the pause may happen at 99% and a non-fatal error reports that the VM could not be started due to lack of memory and that applications should be terminated to free memory. The VM is left in the paused (greyed out) state, but it may be immediately "un-paused", and it then runs instantly with no other action.
The other user, Bob Prohaske, provided information on his case and agreed to my including a link to it in this PR. The files are at http://www.zefox.net/~fbsd/rpi3/swaptests/oberman/ I am not attaching he logs due to their size. They include logs of attempts to build chromium which succeeds with r349839 and fails with r349841. Note that his system is running HEAD.
(In reply to rkoberman from comment #0) An RPi3 has only 1 GiByte of RAM. Did you mean 6 GiByte of swap? Was the RPi3 an incorrect identification?

Example messages that were misnomers (taken from another list exchange on the issue):

Sep 11 23:21:43 zipper kernel: pid 16581 (rustc), jid 43, uid 65534, was killed: out of swap space
Sep 12 02:48:23 zipper kernel: pid 1209 (rustc), jid 62, uid 65534, was killed: out of swap space
. . .

Unfortunately, the wording of this type of message is a misnomer for what drove the associated kills: they were actually driven by being unable to gain enough free memory to maintain a minimum amount. FreeBSD will not swap out processes that stay runnable (or are running), only ones that are waiting. Even a single process that stays runnable and keeps lots of RAM in the active category can lead to kills when swap is unused or little used, so the kill behavior is very workload dependent. (Recently, additional kills on other criteria were added on current.)

Real "out of swap" conditions tend to also produce (or always produce?) messages similar to:

Aug 5 17:54:01 sentinel kernel: swap_pager_getswapspace(32): failed

If you are not seeing such messages, then it is likely (certain?) that the amount of swap space still free is not the actual thing driving the kills. Are you seeing "swap_pager_getswapspace(32): failed" messages? Whatever the cause, it would be nice if the "was killed: out of swap space" messages were changed to not be so overly specific/incorrect about it; they frequently mislead people who get them, leading to user activity that cannot help.

From here on I may well be mixing in current (13) material, not just 12.x material.
There are tunables for controlling (some of) the kill behavior that have helped on small armv7 or aarch64 boards:

vm.pageout_oom_seq
vm.pfault_oom_wait
vm.pfault_oom_attempts

Prior to the last two being added in current, increasing vm.pageout_oom_seq (to increase how many times FreeBSD will try to gain free RAM before starting kills) used to be sufficient to get buildworld/buildkernel and other activities to complete on various small armv7 or aarch64 boards. Setting vm.pfault_oom_attempts to -1 disables that pfault-oom mechanism, but if there is a worry about actually running out of swap, that might not be appropriate (FreeBSD stopping overall from lack of resources). When vm.pfault_oom_attempts is not -1, the overall criterion before starting kills is based on vm.pfault_oom_wait * vm.pfault_oom_attempts, though there may be a tradeoff structure between different ways of reaching (nearly) the same product. https://lists.freebsd.org/pipermail/freebsd-current/2019-September/074310.html was the start of an exchange that involved material about the vm.pfault_oom_<?> additions. (It mentions vm.pageout_oom_seq as well.)
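As a concrete illustration (the particular values below are examples drawn from elsewhere in this thread, not recommendations), the tunables above can be set in /boot/loader.conf:

```
# Retry reclaiming RAM this many times before the pageout OOM killer
# starts kills (4096 is the value used on the Pi3 later in this PR).
vm.pageout_oom_seq="4096"

# -1 disables the page-fault OOM mechanism entirely; only appropriate
# if the system cannot actually run out of swap.
vm.pfault_oom_attempts="-1"
```

vm.pageout_oom_seq can also be changed at runtime via sysctl(8); the vm.pfault_oom_* tunables only exist on branches that include the corresponding change.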
Created attachment 208104 [details] proposed patch for head
Created attachment 208105 [details] proposed patch for stable/12
I uploaded a couple of patches which I believe will fix the regression. Could you please confirm that the issue is resolved when they are applied?
Created attachment 208118 [details] proposed patch for stable/12
After applying the latest proposed patches, the problem persists. No change in the symptoms.
There's a logfile for a kill on a pi3 running r353116M at http://www.zefox.net/~fbsd/rpi3/swaptests/oberman/r353116M/ The patch seemed to apply cleanly and there were no problems compiling. The kill is near timestamp Oct 7 05:28:38 and swap storage activity beforehand looks relatively light. There is a surge immediately after the kill, which surprised me. The patch for stable/12 was applied to a pi2 running r353115:353124M (the versioning is something I've not seen before). It applied cleanly and there were no problems compiling. So far, nothing undesirable has happened, but it's worth noting that 32-bit arm has never exhibited the OOMA difficulties associated with 64-bit arm.
(In reply to Mark Millard from comment #1) I will note that I don't get any "out of swap space" messages. You are looking at the test from Bob Prohaska, who also started seeing problems with the same commit to head and had success after reverting that commit. His memory pressure was from attempting to build chromium on a 6G RPi3. I agree that the message is really misleading, but that is not an issue here. I am running on an AMD64 Sandy Bridge system and have not seen the "out of swap space" message, nor are processes being killed.

It looks like VirtualBox loads the VM's RAM successfully, then tries to load the VM support code on the memory-starved system, and that fails; this is suggested by the multi-second delay at 98% loaded. I could be totally wrong about this, but the hypothesis fits the symptoms quite well. I suspect that the RAM loading code is very different from the load of the support code. I would expect the RAM to use slab allocation and the support code to use regular malloc, but, again, this is speculation. The problem is only triggered after the commit of a patch which should never run on a system with 8G of RAM. If I revert that patch, the error never occurs. It seems pretty clear that the commit is the trigger, though the last patch markj posted really should have made the code unreachable for a system with under 16G of RAM.

BTW, at any moment I have a long list of idle processes sitting on quite a bit of memory. I'm running MATE, which is not quite as big a memory pig as Gnome, but it's close, with a lot of assorted stuff that mostly sits around waiting for something to happen that probably won't; those are good candidates for swap. I do have one HUGE memory pig in Firefox, which starts at about 2.5G and climbs to north of 3G. Right now "ps -o rss,command -Uoberman | grep firefox" is showing 3.31G over 10 processes. (Why is resident memory 'rss' in ps and 'res' in top? Minor annoyance.)
One very noticeable artifact of the problem is that VirtualBox always loads 98% of its 4 GB of RAM and then pauses for several seconds before dying, yet always succeeds when the VM is immediately started again, with no processes killed and no memory freed.
It turns out that OOMA kills can be provoked using Peter Holm's stress2 test. An example on a Pi3 running

FreeBSD www.zefox.org 13.0-CURRENT FreeBSD 13.0-CURRENT #2 r353297: Tue Oct 8 03:11:47 PDT 2019 bob@www.zefox.org:/usr/obj/usr/src/arm64.aarch64/sys/GENERIC arm64

produces a log excerpt that looks like this:

procs     memory        page                       disks      faults        cpu
r   b  w    avm    fre  flt  re  pi  po   fr   sr mm0 da0    in    sy    cs us sy id
912 0  4 2371076 19368  135   6   0   0  338  618   0   0 19815  2807 10286  1  9 90

dT: 17.380s  w: 10.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    d/s   kBps   ms/d  %busy Name
    0    105      7    215   62.9     98   1703  134.4      0      0    0.0   74.7 mmcsd0
    0     49      0      4    3.6     49    466    5.3      0      0    0.0   15.1 da0
    0    105      7    215   77.9     98   1703  140.2      0      0    0.0   85.9 mmcsd0s2
    0     57      7    212   73.5     51   1230  247.1      0      0    0.0   80.1 mmcsd0s2a
    0     48      0      4  149.5     48    473   26.1      0      0    0.0   39.7 mmcsd0s2b
    0     57      7    212   73.6     51   1230  247.2      0      0    0.0   80.2 ufs/rootfs
    0     48      0      4    3.7     47    466    5.2      0      0    0.0   15.2 da0p5

Tue Oct 8 13:37:16 PDT 2019
Device          1K-blocks     Used    Avail Capacity
/dev/mmcsd0s2b    4404252    23880  4380372     1%
/dev/da0p5        2097152    23344  2073808     1%
Total             6501404    47224  6454180     1%

Oct 8 13:36:34 www bob[14995]: Starting test all.cfg
Oct 8 13:37:13 www kernel: pid 15110 (swap), jid 0, uid 1001, was killed: out of swap space

I certainly don't know how to fully interpret this, but it seems clear the _amount_ of swap in use isn't excessive. /boot/loader.conf contains vm.pageout_oom_seq="4096". Thanks for reading, I hope it's useful.
(In reply to Bob Prohaska from comment #9) stress2 is designed to trigger low memory conditions, so OOM kills are to be expected.
(In reply to Mark Johnston from comment #10) Does it appear that a low memory condition was actually created, and that the kill was justified in the example provided? The swap devices were not very busy, the amount of swap in use quite modest. Thanks for reading!
(In reply to Bob Prohaska from comment #11) Does it last longer with vm.pfault_oom_attempts=-1? I'm not sure how to uniquely tell which constraint led to a specific out-of-memory kill, but the above setting disables one of the constraints, so a "yes" for "lasts longer" would indicate something about the constraint being hit. If it does last longer, then different combinations of values for:

vm.pfault_oom_wait
vm.pfault_oom_attempts

even for about the same total (the product of the two) may vary the behavior. If it was not this category of constraint, I'm not sure what else to explore, given your already large:

vm.pageout_oom_seq="4096"

To my knowledge there is nothing to enable reporting information that would help identify the constraint(s) being hit (or approached). The old patch that we were supplied and used no longer applies. As far as I know it has not been updated. (And it predated at least the pfault oom settings being available, if I understand right.)
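For instance (these particular numbers are hypothetical, chosen only to illustrate the point about the product), both of the following loader.conf settings allow roughly the same total wait before kills begin, but with different retry granularity:

```
# Many short waits: 10 attempts x 10 s wait = ~100 s total
vm.pfault_oom_attempts="10"
vm.pfault_oom_wait="10"

# Few long waits: 4 attempts x 25 s wait = ~100 s total
vm.pfault_oom_attempts="4"
vm.pfault_oom_wait="25"
```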
(In reply to Mark Millard from comment #12) I haven't tried using vm.pfault_oom_attempts=-1 but can do so after the present experiment completes. That experiment consists of overprovisioning swap to 9 GB, of which 8 GB are usable (4 GB on the microSD card plus 5 GB on a USB flash drive). This is on the Pi3 at r353297. The compilation is not quite half done by file count, but so far swap use is less than half the value reached using the former swap configuration of 4 GB on microSD plus 2 GB on USB flash, resulting in 4 GB usable swap. It's contrary to my expectation. Might it be significant if it holds true?
Created attachment 208237 [details] proposed patch Here is another patch to try. It should apply to both head and stable/12 (with some offset). I noticed that the heuristic we use to decide whether or not to perform per-CPU caching was wrong in that it doesn't take into account the fact that we cache two buckets of up to 256 free pages per CPU, not one. I also adjusted the fraction of total RAM that the per-CPU caches are permitted to cache so that we are more conservative.
(In reply to Mark Johnston from comment #14) For 12-STABLE r353118M: do I just need this patch, and revert the prior patches?
(In reply to rkoberman from comment #15) Yes, please.
(In reply to Mark Johnston from comment #14) Can the new patch be tested on 13.0-CURRENT FreeBSD 13.0-CURRENT #2 r353297? The compilation of www/chromium on that system is still running and probably won't finish for another couple of days. Max swap use so far is ~3.5 GB, about the same as ~6 months ago, compared to 4.2 GB two weeks ago. Presently the machine looks badly swap-bound, with swap devices more than 50% busy and % idle over 50%. Oddly enough, the "swap_pager: indefinite wait buffer: bufobj: ..." messages are very sparse, none for the last half hour. Please indicate if it's more useful to let the compile finish or to interrupt it to apply the patch. At the moment, vm.pageout_oom_seq="4096" and there are 8 GB of usable swap. Might it help to make any "on the fly" changes to vm.pageout_oom_seq, vm.pfault_oom_wait, or vm.pfault_oom_attempts?
(In reply to Mark Johnston from comment #16) Again, it is not fixed. I create memory pressure, start the VirtualBox VM, and an OOM kills it at 98% loaded. As usual, I start the VM again and it starts normally. Again, this is on 12-STABLE r353118M on an AMD64. For some reason, perhaps a fix in Firefox, I could not generate memory pressure at first. For the first time I saw resident memory use by firefox start at 3.2G, but I could not get it to the 3.6G needed to create pressure. I added a real memory hog, a build of lang/rust, and that got available memory down to about 3G, after which the failure to start the VM re-appeared.
(In reply to rkoberman from comment #18) Am I right that hw.ncpu is 4 on your system? With the patch applied, can you run # dtrace -n 'BEGIN{print(`vm_dom[0].vmd_pgcache[0]);}' and paste the output? Could you also attach the output of "sysctl vm.stats" with the patch applied?
Created attachment 208276 [details] vm.stats on amd64 12-STABLE showing error

Yes, hw.ncpu is 4.

CPU     ID                    FUNCTION:NAME
  0      1                           :BEGIN
struct vm_pgcache {
    int domain = 0
    int pool = 0
    uma_zone_t zone = 0xfffff80003308780
}

Attached vm.stats.
(In reply to rkoberman from comment #20) Forgot to mention that the vm.stats were collected on a freshly re-booted system, before any memory pressure or errors.
(In reply to rkoberman from comment #21) That's fine, thanks. Something doesn't add up. In my patch we have:

if (vmd_page_count / 600 < 2 * 256 * mp_ncpus * VM_NFREEPOOL)
        continue;

From your sysctls I get vm.stats.vm.v_page_count == 2001978, you have mp_ncpus == 4, and VM_NFREEPOOL == 2 on amd64. 2001978 / 600 == 3336 and 2 * 256 * 4 * 2 == 4096, so we should not be creating the cache zones on your system, but the dtrace output shows that we did. I tried booting a bhyve VM with 8GB of RAM and 4 CPUs, and I have:

# sysctl vm.stats.vm.v_page_count
vm.stats.vm.v_page_count: 2031299
# sysctl hw.ncpu
hw.ncpu: 4

and

# dtrace -n 'BEGIN{print(`vm_dom[0].vmd_pgcache[0]);}'
dtrace: description 'BEGIN' matched 1 probe
CPU     ID                    FUNCTION:NAME
  3      1                           :BEGIN
struct vm_pgcache {
    int domain = 0
    int pool = 0
    uma_zone_t zone = 0
}

Are you certain you tested the correct patch and kernel?
(In reply to Mark Johnston from comment #22) Here is confirmation of the patch. I did the kernel build with -DNO_CLEAN. # svn st /usr/src/sys ? /usr/src/sys/amd64/conf/GENERIC.4BSD M /usr/src/sys/dev/acpi_support/acpi_ibm.c ? /usr/src/sys/dev/acpi_support/acpi_ibm.c.orig M /usr/src/sys/vm/vm_page.c # svn diff /usr/src/sys/vm/vm_page.c Index: /usr/src/sys/vm/vm_page.c =================================================================== --- /usr/src/sys/vm/vm_page.c (revision 353118) +++ /usr/src/sys/vm/vm_page.c (working copy) @@ -199,10 +199,13 @@ vmd = VM_DOMAIN(domain); /* - * Don't allow the page caches to take up more than .25% of - * memory. + * Don't allow the page caches to take up more than .1875% of + * memory. A UMA bucket contains at most 256 free pages, and w + * have two buckets per CPU per free pool. */ if (vmd->vmd_page_count / 400 < 256 * mp_ncpus * VM_NFREEPOOL) + if (vmd->vmd_page_count / 600 < 2 * 256 * mp_ncpus * + VM_NFREEPOOL) continue; for (pool = 0; pool < VM_NFREEPOOL; pool++) { pgcache = &vmd->vmd_pgcache[pool];
Created attachment 208284 [details] debug patch (In reply to rkoberman from comment #23) Are you using a non-default kernel configuration? Could you share it? Would you please also try applying the attached patch on top of the existing patch, and post the dmesg from a fresh boot? Thanks for your patience.
I just noticed an error in my application of the patch. The diff I posted is missing a rather critical '-'. I just rebuilt the kernel and will test shortly. My sincere apologies for the waste of your time. Should the problem NOT be fixed, I will add the debug code, but I don't think it will be needed.
Looks like the patch did the trick. I just started my VM with only 2.5G of available RAM with no problems. Thanks so much for tracking this down, and I apologize again for the botched patch. With the >30-line offset, I applied the patch manually, and how I missed the deletion of the old line I can't imagine. Rushing, I guess. The date on my kernel was only 19 minutes after you posted the patch; haste made waste, at least of your time.
(In reply to Mark Johnston from comment #14) I'm doing something wrong; when I copy the unified diff text from the patch into a file and feed it to patch then patch asks what file to patch. Is there a downloadable version that's idiot proof? Apologies for the dumb questions...
(In reply to Bob Prohaska from comment #27) # cd /usr/src # patch < patch-file # There are other ways, but this should work. Should work for HEAD.
(In reply to Bob Prohaska from comment #27) It looks as if removing the absolute pathnames from the patch fixed my mistake. The kernel compiled and booted, and a new build of www/chromium is running now. Apologies for the noise.
(In reply to rkoberman from comment #26) No worries, I'm glad that the results finally make sense. :) I submitted a patch for review: https://reviews.freebsd.org/D22040 I should be able to get it into 12.1.
(In reply to Bob Prohaska from comment #27) For patches generated from git ("diff --git") you need to strip off the dummy path element: cd /usr/src && patch -p1 < /path/to/diff.
Compilation of www/chromium started out well using attachment 208237 [details], seemingly running much faster (11,000 files in 36 hours) than previously. Make stopped around Tue Oct 15 08:36:22 PDT 2019. The first sign of trouble is Oct 15 08:36:12 www kernel: smsc0: warning: failed to create new mbuf followed less than a minute later by Oct 15 08:37:43 www kernel: pid 27729 (c++), jid 0, uid 0, was killed: out of swap space Total swap use looks fairly modest, less than 1 GB, and swap devices were busy but not saturated, ~80% for microSD and 23% for USB. Those are all well below recent maxima. The logs of make and the swapscript (vmstat, gstat, swapinfo and tail of the console) are at http://www.zefox.net/~fbsd/rpi3/swaptests/oberman/r353297M/ For clarity, the host is an RPI3B running GENERIC with 4 GB of swap on microSD and 5 GB of swap on usb flash.
(In reply to Bob Prohaska from comment #32) I believe you are on -CURRENT? Could you post the output of "vmstat -z | grep pgcache" from your system with that patch applied?
(In reply to Mark Johnston from comment #33) The output of "sysctl vm.stats" would also be helpful.
(In reply to Mark Johnston from comment #33) Run after the compile had stopped, the output is null.
(In reply to Mark Johnston from comment #34) root@www:/usr/ports/www/chromium # sysctl vm.stats vm.stats.page.queue_nops: 436827 vm.stats.page.queue_ops: 1198763477 vm.stats.object.bypasses: 86464 vm.stats.object.collapses: 270427 vm.stats.vm.v_pdpages: 1877368333 vm.stats.vm.v_tcached: 0 vm.stats.vm.v_cache_count: 0 vm.stats.vm.v_user_wire_count: 0 vm.stats.vm.v_free_severe: 939 vm.stats.vm.v_interrupt_free_min: 2 vm.stats.vm.v_pageout_free_min: 34 vm.stats.vm.v_laundry_count: 20 vm.stats.vm.v_inactive_count: 152121 vm.stats.vm.v_inactive_target: 7441 vm.stats.vm.v_active_count: 10919 vm.stats.vm.v_wire_count: 44702 vm.stats.vm.v_free_count: 27930 vm.stats.vm.v_free_min: 1514 vm.stats.vm.v_free_target: 4961 vm.stats.vm.v_free_reserved: 365 vm.stats.vm.v_page_count: 230168 vm.stats.vm.v_page_size: 4096 vm.stats.vm.v_kthreadpages: 0 vm.stats.vm.v_rforkpages: 1544014 vm.stats.vm.v_vforkpages: 13113165 vm.stats.vm.v_forkpages: 1211415 vm.stats.vm.v_kthreads: 25 vm.stats.vm.v_rforks: 23754 vm.stats.vm.v_vforks: 87630 vm.stats.vm.v_forks: 17234 vm.stats.vm.v_tfree: 500721967 vm.stats.vm.v_pfree: 83354209 vm.stats.vm.v_dfree: 63883592 vm.stats.vm.v_pdshortfalls: 28080 vm.stats.vm.v_pdwakeups: 11081 vm.stats.vm.v_reactivated: 8170291 vm.stats.vm.v_intrans: 4744832 vm.stats.vm.v_vnodepgsout: 13636 vm.stats.vm.v_vnodepgsin: 1268156 vm.stats.vm.v_vnodeout: 445 vm.stats.vm.v_vnodein: 165935 vm.stats.vm.v_swappgsout: 35426346 vm.stats.vm.v_swappgsin: 19728367 vm.stats.vm.v_swapout: 6560288 vm.stats.vm.v_swapin: 4734304 vm.stats.vm.v_ozfod: 68449 vm.stats.vm.v_zfod: 455274388 vm.stats.vm.v_cow_optim: 341 vm.stats.vm.v_cow_faults: 5144267 vm.stats.vm.v_io_faults: 5023292 vm.stats.vm.v_vm_faults: 512016786 vm.stats.sys.v_soft: 13951607 vm.stats.sys.v_intr: 2398852975 vm.stats.sys.v_syscall: 258985139 vm.stats.sys.v_trap: 7473660 vm.stats.sys.v_swtch: 979740516 root@www:/usr/ports/www/chromium
I'm still seeing "...killed: out of swap space..." on a Pi3 running FreeBSD 13.0-CURRENT #3 r353297M with 8 GB of usable swap while compiling www/chromium. The latest kill occurred with swap use around 300 MB (out of 8 GB) and storage traffic about 50% of maximum. Log files have been added to http://www.zefox.net/~fbsd/rpi3/swaptests/oberman/r353297M/ Might this be a separate issue from the original, since others report success with the patch?
(In reply to Bob Prohaska from comment #38) To my non-expert eye, it looks like there may be two issues. Take a look at Mark Millard's comment #1. You get a misleading error claiming that the system is out of swap space; in fact, you get this error when memory is required but there are no swappable processes. I think the error should really be "Unable to obtain memory by swapping out a process". Also, did you ever post the output of "vmstat -z | grep pgcache"? On a CURRENT system, that should indicate whether this is the same issue.
(In reply to rkoberman from comment #39) Indeed, I suspect that the issue is unrelated. Comment 35 indicates that the per-CPU caches are not in use here. I recall that we have seen spurious OOM kills on systems swapping to SD cards and other slow media, and that was caused by extremely high I/O latency. This is still an issue.
A commit references this bug: Author: markj Date: Fri Oct 18 17:36:43 UTC 2019 New revision: 353734 URL: https://svnweb.freebsd.org/changeset/base/353734 Log: Further constrain the use of per-CPU caches for free pages. In low memory conditions a significant number of pages may end up stuck in the caches, and currently these caches cannot be reaped, leading to spurious memory allocation failures and OOM kills. So: - Take into account the fact that we may cache up to two full buckets of pages per CPU, not just one. - Increase the amount of RAM required per CPU to enable the caches. This is a temporary measure until the page cache management policy is improved. PR: 241048 Reported and tested by: Kevin Oberman <rkoberman@gmail.com> Reviewed by: alc, kib Discussed with: jeff MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D22040 Changes: head/sys/vm/vm_page.c
(In reply to rkoberman from comment #39) root@www:/usr/ports/www/chromium # uname -a FreeBSD www.zefox.org 13.0-CURRENT FreeBSD 13.0-CURRENT #3 r353297M: Sun Oct 13 16:15:00 PDT 2019 bob@www.zefox.org:/usr/obj/usr/src/arm64.aarch64/sys/GENERIC arm64 root@www:/usr/ports/www/chromium # vmstat -z | grep pgcache root@www:/usr/ports/www/chromium #
(In reply to Mark Johnston from comment #40) Does vm.pfault_oom_attempts=-1 disable some of the "extremely high I/O latency" ways of getting OOM kills? All of them? (It might make for a good experiment.) There used to be a patch that helped us know when "extremely high I/O latency" was leading to OOM kills, which we used for investigation. But it no longer applies, and no analogous reporting was added to head (possibly under some means of enabling it that does not require rebuilding). Be that as it may: how long ago was it that Bob was able to do these builds (using at least vm.pageout_oom_seq)? It does appear that something that contributes has changed, if I understand the sequence/history right. (This is not the same as saying that the something is wrong/inappropriate for FreeBSD overall.)
The puzzle for me is that OOMA kills don't seem to correspond in any obvious way to times of worst I/O latency: in the latest example, the kill occurred when writes to mmcsd took about a quarter of a second. Looking through the swapscript log, there are cases of latency many times greater with no OOMA activity. Here's a fairly (though not the most) extreme example:

procs     memory        page                       disks      faults        cpu
r  b  w    avm    fre   flt  re  pi  po   fr    sr mm0 da0    in    sy    cs us sy id
0  0 22 3145364 31732  2665  98 312 183 2741 19705   0   0 22994  1295 16479 47 15 39

dT: 10.011s  w: 10.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    d/s   kBps   ms/d  %busy Name
   12      4      0      2  16087      4    120  12876      0      0    0.0  119.0 mmcsd0
   12      4      0      2  16087      4    120  12876      0      0    0.0  119.0 mmcsd0s2
    6      4      0      0    0.0      4    118  12671      0      0    0.0  105.9 mmcsd0s2a
    6      0      0      2  16087      0      2  16681      0      0    0.0  174.7 mmcsd0s2b
    6      4      0      0    0.0      4    118  12671      0      0    0.0  105.9 ufs/rootfs

Thu Oct 17 06:03:21 PDT 2019
Device          1K-blocks     Used    Avail Capacity
/dev/mmcsd0s2b    4404252   856616  3547636    19%
/dev/da0p6        5242880   853680  4389200    16%
Total             9647132  1710296  7936836    18%

Oct 17 06:00:21 www kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 201025, size: 16384
Oct 17 06:00:21 www kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 268296, size: 4096

Writes were taking sixteen seconds, OOMA didn't trigger, and the build played on. I agree that setting vm.pfault_oom_attempts=-1 is worth a try and will do that next.
(In reply to Mark Johnston from comment #40) You write: ---quote--- I recall that we have seen spurious OOM kills on systems swapping to SD cards and other slow media, and that was caused by extremely high I/O latency. This is still an issue. ---end quote--- This rings a loud bell for me. Right now I'm seeing the "swap_pager: indefinite wait buffer:..." warning when svnlite is updating /usr/src, an activity that doesn't normally involve swapping, but does entail considerable storage activity. L(q) values exceeding 100 appear for mmcsd0, with %busy of zero. A few seconds later %busy exceeds 300% and L(q) shrinks. Is there a bug report that can be followed? I think this is relatively new behavior, appearing sometime in the last couple of months.
A commit references this bug: Author: markj Date: Tue Oct 22 14:10:31 UTC 2019 New revision: 353884 URL: https://svnweb.freebsd.org/changeset/base/353884 Log: MFC r353734: Further constrain the use of per-CPU caches for free pages. PR: 241048 Changes: _U stable/12/ stable/12/sys/vm/vm_page.c
(In reply to Mark Millard from comment #43) ---quote--- Does: vm.pfault_oom_attempts=-1 disable some of the "extremely high I/O latency" ways of getting OOM kills? All? (It might make for a good experiment?) ---end quote--- It certainly makes a difference. Setting vm.pfault_oom_attempts=-1 seemed to increase throughput severalfold, allowing make to build around two-thirds of the files required for www/chromium. Indefinite wait warnings were present, but seemingly somewhat more sparse. It really looked as if system overhead was greatly reduced. However, the machine stalled silently. The last log entries suggest both swap and storage activity were well below maximum observed values at the time of the stoppage. The log files, final top window and backtrace can be found at http://www.zefox.net/~fbsd/rpi3/swaptests/oberman/r353297M/ in the files containing "nopfault" in the name.
I'm using ZFS on a VPS with very little RAM (512 MB). It worked well on 10.3R, 11.0R, 11.1R, 11.2R and 12.0R. Recently I upgraded to 12.1R and many processes were killed because of 'out of swap space', but with no 'swap_pager_getswapspace failed'. So I set vm.pageout_oom_seq=1024 in /boot/loader.conf, and this reduced the number of killed processes, but sometimes some processes were still killed. Increasing vm.pageout_oom_seq from 1024 to 10240 did not change anything. I applied the vm_page.c patch to 12.1R-p1, but it didn't stop processes being killed with 'out of swap space'.
(In reply to Masachika ISHIZUKA from comment #48) I looked and stable/12/sys/vm/vm_fault.c -r351776 got the changes from head for the likes of vm.pfault_oom_attempts so there are more ways/contexts that can OOM and more things to control the overall behavior. You may want to try: vm.pfault_oom_attempts=-1 in addition to the: vm.pageout_oom_seq=1024 and see what the consequences are for what it disables. Use of the -1 presumes that you know the configuration will not run out of swap, as I understand other notes that were in the lists. If the -1 seems to help but running out of swap is a possibility you can not avoid, there may be other appropriate settings for the combination of: vm.pfault_oom_wait vm.pfault_oom_attempts that would be appropriate. (I've not explored such. As I remember the description different factors that multiply out to [nearly] the same figure can have differing consequences/tradeoffs.)
(In reply to Mark Millard from comment #49) Thank you for the reply. As I know that the configuration will not run out of swap, I'll try vm.pfault_oom_attempts="-1". Most of the 'out of swap space' process kills occurred within three to seven days, during the periodic daily cron job.
(In reply to Masachika ISHIZUKA from comment #50) Today's periodic daily run produced 'swap_pager: I/O error - pageout failed' warnings as follows.

% grep swap_pager /var/log/messages
Nov 30 03:05:11 peach kernel: swap_pager: I/O error - pageout failed; blkno 6071,size 4096, error 12
Nov 30 03:05:12 peach kernel: swap_pager: I/O error - pageout failed; blkno 6072,size 4096, error 12
Nov 30 03:05:12 peach kernel: swap_pager: I/O error - pageout failed; blkno 6073,size 4096, error 12
Nov 30 03:05:12 peach kernel: swap_pager: I/O error - pageout failed; blkno 6077,size 4096, error 12
Nov 30 03:05:12 peach kernel: swap_pager: I/O error - pageout failed; blkno 6078,size 4096, error 12
Nov 30 03:05:12 peach kernel: swap_pager: I/O error - pageout failed; blkno 6079,size 8192, error 12
Nov 30 03:05:12 peach kernel: swap_pager: I/O error - pageout failed; blkno 6081,size 4096, error 12
Nov 30 03:05:12 peach kernel: swap_pager: I/O error - pageout failed; blkno 6084,size 8192, error 12
Nov 30 03:05:12 peach kernel: swap_pager: I/O error - pageout failed; blkno 6086,size 4096, error 12
Nov 30 03:05:12 peach kernel: swap_pager: I/O error - pageout failed; blkno 6087,size 4096, error 12
Nov 30 03:05:12 peach kernel: swap_pager: I/O error - pageout failed; blkno 6088,size 4096, error 12
Nov 30 03:05:12 peach kernel: swap_pager: I/O error - pageout failed; blkno 6092,size 4096, error 12
Nov 30 03:05:12 peach kernel: swap_pager: I/O error - pageout failed; blkno 6096,size 4096, error 12
Nov 30 03:05:12 peach kernel: swap_pager: I/O error - pageout failed; blkno 6097,size 8192, error 12
% sysctl kern.maxswzone
kern.maxswzone: 0
% swapinfo
Device          1K-blocks     Used    Avail Capacity
/dev/vtbd0p3      1048540   277948   770592    27%
% gpart show -l
=>       34  41942973  vtbd0  GPT  (20G)
         34       512      1  freebsd-boot  (256K)
        546  39845376      2  peach  (19G)
   39845922   2097085      3  freebsd-swap  (1.0G)

vtbd0 is an SSD on the VPS.
Although I got these warning messages with today's periodic daily cron job, it seems to have finished normally.
(In reply to Masachika ISHIZUKA from comment #51) In today's periodic daily cron job, running the patched vm_page.c on 12.1R-p1, 'out of swap space' kills with no 'swap_pager_getswapspace failed' occurred again, as follows.

% grep swap /var/log/messages
Dec  3 03:52:41 peach kernel: pid 2888 (mysqld), jid 0, uid 88, was killed: out of swap space
Dec  3 03:52:41 peach kernel: pid 49246 (httpd), jid 0, uid 80, was killed: out of swap space
Dec  3 03:52:41 peach kernel: pid 1604 (nsd), jid 0, uid 216, was killed: out of swap space
% sysctl vm.pageout_oom_seq
vm.pageout_oom_seq: 1024
% sysctl vm.pfault_oom_attempts
vm.pfault_oom_attempts: -1
(In reply to Masachika ISHIZUKA from comment #52) I'm afraid that, short of trying to add code to force a back-trace near the "was killed: out of swap space" notice, I do not have a clue how to find out what context and condition is leading to the process kills for your context. Nor am I aware of anything else for trying to avoid the behavior. (I'm not claiming forcing a back-trace to display is reasonable in the context. I just do not have any other ideas.)
I have been following this bug as I have been having either the same or a similar problem. I am running unpatched 12.1-STABLE r354672 with 8G of RAM and 8G of swap. I am frequently seeing "swap_pager_getswapspace(32): failed" on many days when periodic runs. This seems to be tied to heavy memory pressure with little available RAM and one program rapidly eating memory. I have confirmed that the pressure is the result of Firefox consuming over 6G of RAM as well as over 6G of swap. I have not determined the exact amount of swap consumed, but it is possibly close to 7G of the 8G available.

I have seen this happen at other times when I am logged in and active on the system. One firefox process suddenly starts growing rapidly even though firefox has been sitting fairly passive for a long time. This all started when a change was committed to STABLE that broke the VM system on machines with <16G of RAM due to a miscalculation. Mark J. fixed this and, shortly after, committed the fix to 12-STABLE. At about the same time I updated Firefox to version 70, so I can't confirm which change triggered this. I can say that killing firefox when this happens can take as long as 30 minutes, with my system effectively dead until then. I will note that the system is an eight year old Lenovo T520 and not the fastest.

The problem is happening right now: one process has grown from 500M to 4.8G as per "top -o res". The system is down to <800M available (free+laundry+inactive) with a very active disk, although, at this time, I still have 5.2G of free swap. Both disks with swap space are very active. Saving this now while I look for other clues.
While I am not sure of it, I now suspect that the issue I am seeing is legitimate memory exhaustion caused by a single web site, newsmemory.com, on which I read a couple of newspapers' digital editions. I have no clue why it is blowing up, but I seriously doubt that it is a system issue. Sorry for the noise!
(In reply to Masachika ISHIZUKA from comment #52) It seems that setting vm.pfault_oom_attempts=-1 in /boot/loader.conf is a good solution for the 'out of swap space' kills that occur without 'swap_pager_getswapspace failed'. They have not happened in the 10 days since the reboot. Previously, I had set vm.pfault_oom_attempts=-1 with sysctl, without rebooting, and the 'out of swap space' kills with no 'swap_pager_getswapspace failed' still occurred. It seems that a reboot was necessary.
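For anyone landing on this bug later, the two knobs discussed in this thread can be made persistent roughly as follows. This is only a sketch: both are runtime sysctls, so /etc/sysctl.conf is the usual place, although setting them as tunables in /boot/loader.conf (as done above) also works.

```shell
# /etc/sysctl.conf -- sketch of the settings discussed in this thread.
# Disable the page-fault OOM kill path entirely; per the thread, this is
# only safe if the system can never genuinely run out of swap.
vm.pfault_oom_attempts=-1
# Require many consecutive page-daemon passes without progress before the
# OOM killer is invoked (the default is much smaller, 12 if I recall).
vm.pageout_oom_seq=1024
```

Note that, as the comments above and below show, raising these values reduced but did not fully eliminate the kills in this reporter's environment.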
(In reply to Masachika ISHIZUKA from comment #56) Unfortunately, 'out of swap space' kills without 'swap_pager_getswapspace failed' occurred during this morning's daily cron job, as follows.

% grep swap /var/log/messages
Dec 14 04:16:05 peach kernel: pid 59975 (mysqld), jid 0, uid 88, was killed: out of swap space
Dec 14 04:16:05 peach kernel: pid 48249 (httpd), jid 0, uid 80, was killed: out of swap space
Dec 14 04:16:05 peach kernel: pid 26677 (httpd), jid 0, uid 80, was killed: out of swap space
Dec 14 04:16:05 peach kernel: pid 60170 (nsd), jid 0, uid 216, was killed: out of swap space
I don't know if this is related, but on r358483 I am seeing swap space being consumed by workloads that do not consume it on r357362. Eventually the system becomes unresponsive. One difference I noticed while watching top is that on r358483 I see almost all my RAM in 'Wired', but on r357362 only a few gigs. I'm also not sure if this is related to chromium 79; I'm on 78 on the earlier kernel and 79 on the later one. If this is definitely not related I can make a new ticket.
(In reply to mmatalka from comment #58) Assuming this is on head, you are probably hitting the wired memory leak fixed in r358505.
Thank you, I'll verify.
I'm going to resolve the bug as fixed, since I committed several improvements to our low memory handling based on Kevin's test case of resuming a Windows VM in VirtualBox. I now use Windows+VirtualBox a fair bit myself, so it's been possible to continue verifying the kernel's behaviour in that scenario. VirtualBox is rather special since it abuses some kernel interfaces to allocate memory at a very high priority, and immediately returns failure when an allocation fails. We can't easily fix this since it's third-party code. The FreeBSD changes in question are r355002, r355003 and r355004. The last couple of comments deal with a kernel memory leak; that is unrelated to the original issue and has been fixed.
FYI: I will note that https://lists.freebsd.org/archives/freebsd-hackers/2022-April/001022.html is the start of a sequence of messages from someone trying to buildkernel their way from 12.1-RELEASE to 12.3 and finding that kills lead to: "Can not build kernel on 1GB VM". 2019-Nov-04 was the initial 12.1-RELEASE date, about a month after this bugzilla submittal. (12.1 is no longer supported as well.) So it might be evidence that there still was a problem. The context is old, so the standard messaging reports "out of swap space" as the reason for the kill, no matter which of the 4(?) conditions actually initiates the kill. (Likely the interesting time is earlier than that.) So far my attempt to help with the classic sysctl settings for avoiding the kills has not been much help. I also do not know if there is a type of information from the failure that might still be of use for supported releases and stable/12, given that there is a repeatable failure context to look at.