Bug 241848 - lib/googletest/gtest/tests: gmock-matchers_test.cc requires a pathological amount of memory to compile
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: 12.1-RELEASE
Hardware: Any Any
Importance: --- Affects Some People
Assignee: freebsd-bugs mailing list
Reported: 2019-11-09 23:46 UTC by Robert Clausecker
Modified: 2020-02-08 20:01 UTC

Description Robert Clausecker 2019-11-09 23:46:48 UTC
During a compilation of FreeBSD 12.1 for the Raspberry Pi 2B, the build failed at the file

    /usr/src/contrib/googletest/googlemock/test/gmock-matchers_test.cc

with an out-of-memory condition.  Further analysis revealed that it takes more than 1.5 GB of memory to build this file, far too much for my puny computer.  This makes it impossible to finish a FreeBSD build and, given the lack of a binary upgrade path, makes it very difficult for me to upgrade to FreeBSD 12.1.

Please find out what causes this pathological memory usage and make it possible to build FreeBSD on a machine with no more than 1 GB of RAM as it was before.  Not that 1 GB of RAM (mostly due to having to build clang) isn't already an annoyingly high memory requirement for upgrading your system from source.
Comment 1 Dave Evans 2020-02-07 13:01:20 UTC
I'm doing a buildworld for amd64 13-CURRENT in a FreeBSD 12.1 guest on a macOS VirtualBox host, with 2.4 GB of RAM assigned to the FreeBSD guest.

I have 7 GB of swap.  

The build gets to the Build Everything stage.
When it gets to gmock-matchers_test.cc the process is killed with an out of swap signal.

swapinfo:
Device               1K-blocks     Used    Avail Capacity
/dev/ada0p2            2097152    11524  2085628       1%
/dev/zvol/zroot/swap   5242880    12012  5230868       0%
Total                  7340032    23536  7316496       0%

How much swap does this file need to compile? swapon complains if I try to add more.
Comment 2 Mark Millard 2020-02-07 17:56:25 UTC
(In reply to Dave Evans from comment #1)

If you get console messages something like (extracted example):

. . . kernel: pid 7963 (strip), jid 0, uid 0, was killed: out of swap space

The detailed wording of this message is frequently a misnomer.
Do you also have any messages of the form:

. . . sentinel kernel: swap_pager_getswapspace(32): failed

If yes: you really were out of swap space.
If no:  you were not out of swap space,
       or at least it is highly unlikely that you were.
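
As an illustration of the check described above, here is a minimal Python sketch that scans console or log lines for the two message forms quoted. The function name, the regular expressions, and the verdict strings are made up for this example; they are not part of any FreeBSD tool, and the patterns assume the messages look like the extracted examples.

```python
import re

# Patterns modelled on the two example messages quoted above (assumed shapes).
KILL_RE = re.compile(r"pid (\d+) \(([^)]+)\).*was killed: out of swap space")
SWAP_FAIL_RE = re.compile(r"swap_pager_getswapspace\(\d+\): failed")


def diagnose(log_lines):
    """Return (kills, verdict) for an iterable of log lines.

    kills is a list of (pid, command) tuples for processes reported as
    killed; verdict is a short string following the yes/no rule above.
    """
    kills = []
    swap_failures = 0
    for line in log_lines:
        match = KILL_RE.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
        if SWAP_FAIL_RE.search(line):
            swap_failures += 1

    if not kills:
        verdict = "no OOM kills seen"
    elif swap_failures:
        verdict = "swap space really was exhausted"
    else:
        verdict = "swap exhaustion unlikely; look at free RAM and paging I/O"
    return kills, verdict


if __name__ == "__main__":
    sample = [
        ". . . kernel: pid 7963 (strip), jid 0, uid 0, was killed: out of swap space",
    ]
    print(diagnose(sample))
```

Feeding it the lines of the system log instead of the sample list would apply the same rule to a real build failure.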

FreeBSD kills processes for multiple potential reasons.
For example:

a) Still low on free RAM after a number of tries to increase it above a threshold.
b) Slow paging I/O.
c) . . . (I do not know the full list) . . .

Unfortunately, FreeBSD is not explicit about the category
of problem that leads to the kill activity that happens.

You might learn more by watching how things are going
via top or some such program or other way of monitoring.
You likely will find the swap space not low.


Below are some notes about specific tunables that might
or might not be of help. (There may be more tunables
that can help that I do not know about.)

For (a) there is a way to test if it is the issue by
adding to the number of tries before it gives up and
starts killing things. That will either:

1) let it get more done before kills start
2) let it complete before the count is reached
3) make no significant difference

(3) would imply that (b) or (c) are involved instead.

(1) might be handled by having it do even more tries.

To lengthen how long low free RAM is
tolerated, one can increase vm.pageout_oom_seq from
12 to a larger value. I have less experience with
managing slow paging I/O, but I do have some notes
about it below.

Examples follow that I use in contexts with
sufficient RAM that I do not have to worry about
out of swap/page space. These I've set in
/etc/sysctl.conf . (Of course, I'm not trying to
deliberately run out of RAM.)

#
# Delay when persistent low free RAM leads to
# Out Of Memory killing of processes:
vm.pageout_oom_seq=120

I'll note that figures like 1024 or 1200 or
even more are possible. This controls how many
tries are made at regaining sufficient free RAM
before the low level is no longer tolerated.
After that, Out Of Memory kills start in order
to get some free RAM back.

No figure is designed to make the delay
unbounded, but a large enough figure may push
the bound beyond any reasonable time to wait.
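
To make the three possible outcomes of raising vm.pageout_oom_seq more concrete, here is a toy Python model. It is only a sketch under stated assumptions, not the kernel's logic: it assumes each "try" either regains enough free RAM or does not, that the count of failed tries resets after a successful one, and that kills start once the count reaches the limit. The function name and the input sequence are hypothetical.

```python
def first_kill_pass(reclaim_ok, oom_seq):
    """Return the 1-based try at which kills would start, or None.

    reclaim_ok: sequence of booleans, True when a try regained enough
                free RAM (a made-up input for this toy model).
    oom_seq:    number of consecutive failed tries tolerated, playing
                the role of vm.pageout_oom_seq in this toy model.
    """
    failures = 0
    for index, ok in enumerate(reclaim_ok, start=1):
        if ok:
            failures = 0  # assumption: a successful try resets the count
        else:
            failures += 1
            if failures >= oom_seq:
                return index
    return None


if __name__ == "__main__":
    # Hypothetical workload: 50 failed tries, then memory pressure eases.
    tries = [False] * 50 + [True]
    print(first_kill_pass(tries, 12))   # kills start during the shortage
    print(first_kill_pass(tries, 120))  # the shortage ends before the limit
```

In this model a limit of 12 corresponds to kills starting early, while 120 lets the hypothetical workload finish; a workload whose shortage never ends would be killed at any finite limit, only later.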


As for paging I/O (WARNING: all the below tunables
are specific to head (13), or were when I last checked):

#
# For plenty of swap/paging space (will not
# run out), avoid pageout delays leading to
# Out Of Memory killing of processes:
vm.pfault_oom_attempts=-1

(Note: In my context "plenty" really means
sufficient RAM that paging is rare. But
others have reported on using the -1 in
contexts where paging was heavy at times and
OOM kills had been happening that were
eliminated by the assignment.)

I've no experience with the below alternative
to that -1 use:

#
# For possibly insufficient swap/paging space
# (might run out), increase the pageout delay
# that leads to Out Of Memory killing of
# processes:
#vm.pfault_oom_attempts= ???
#vm.pfault_oom_wait= ???
# (The product of the two is the total, but there
# are other potential tradeoffs between the factors
# multiplied, even for nearly the same total.)
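
The remark about the product can be illustrated with a small Python sketch. It assumes vm.pfault_oom_wait is a wait in seconds and treats -1 for vm.pfault_oom_attempts as "no limit", following the -1 usage above. The function names are made up, and the numbers are arbitrary illustrations, not recommended settings.

```python
def pfault_total_wait(attempts, wait_seconds):
    """Total tolerated delay in seconds, or None when attempts is -1."""
    if attempts == -1:
        return None  # treated here as "never give up"
    if attempts < 0 or wait_seconds < 0:
        raise ValueError("attempts must be -1 or >= 0; wait must be >= 0")
    return attempts * wait_seconds


def same_total_candidates(total_seconds):
    """List every (attempts, wait_seconds) pair whose product is the total."""
    return [
        (attempts, total_seconds // attempts)
        for attempts in range(1, total_seconds + 1)
        if total_seconds % attempts == 0
    ]


if __name__ == "__main__":
    print(pfault_total_wait(3, 10))
    print(pfault_total_wait(-1, 10))
    print(same_total_candidates(30))
```

Several pairs give the same total, which is the point of the comment: many short waits and a few long waits are not equivalent in other respects, even though the product matches.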


I'm not claiming that these 3 vm.???_oom_???
figures are always sufficient. Nor am I
claiming that tunables are always available
that would be sufficient. Nor that it is easy
to find the ones that do exist that might
help for specific OOM kill issues.

I have seen reports of OOM kills for other
reasons when both vm.pageout_oom_seq and
vm.pfault_oom_attempts=-1 were in use.
As I understand, FreeBSD did not report
what kind of condition led to the
decision to do an OOM kill.

So the above notes may or may not help you.
Comment 3 Robert Clausecker 2020-02-07 18:16:55 UTC
These notes aside: this memory usage is far from the norm for compiling a C++ source file.  I believe there must be a bug in clang or llvm or some unfortunate design in the source file itself that causes this memory usage.  This is more than twice the highest memory usage I observed before (roughly 800 MB for one of the X86 instruction selection files in the LLVM source) and as opposed to that case, there is no apparent explanation for the memory usage.
Comment 4 Mark Millard 2020-02-07 18:21:08 UTC
(In reply to Mark Millard from comment #2)

I should have noted: if a process stays runnable,
FreeBSD does not stop it and swap it out but
instead just pages it. (For FreeBSD, swapping
basically seems to mean that the kernel stacks
were also moved out to swap space and would have
to be brought back in for the process to run.)

Thus, 1 or more processes that use large amounts
of memory relative to the RAM size but also stay
runnable are not stopped and swapped out to
make room. In such a context, if free RAM stays
low despite other efforts to gain some back,
processes are then killed instead.

vm.pageout_oom_seq controls how many attempts are
made to gain more free RAM before the kills start.
Comment 5 Mark Millard 2020-02-07 19:41:53 UTC
(In reply to Dave Evans from comment #1)

Was this a -j1 build? Something larger?
(There could be other things worth
reporting about the build context.)

It can be interesting to watch the system with
top sorted by resident RAM use (decreasing).
It can give a clue as to the major things
contributing to low free RAM while watching.

I'm not sure one can infer much from which
process FreeBSD decides to kill. Other
evidence is likely better about what is
contributing to the sustained low free RAM
that likely is what led to the process
kill(s).

To Robert:

I've been replying mostly to Dave because it has
been a significant time since I've experimented
with a 1 GiByte machine for buildworld buildkernel
and the like. Dave indicated over 2 GiByte for
his context.

You could try vm.pageout_oom_seq=1200 and a -j1
build and see if it helps you. Reporting the result
here might be useful.

Actually, you indicated "upgrade to FreeBSD 12.1",
so your context is apparently older, such as 12.0.
I'm not sure if vm.pageout_oom_seq goes back
that far. That might leave you only with -j1 (which,
for all I know, you might have already been using).
Comment 6 Mark Millard 2020-02-07 21:14:13 UTC
(In reply to Mark Millard from comment #5)

Looks like vm.pageout_oom_seq goes back to
10.3.0-RELEASE so experiments with it on
a 12.0-RELEASE based system should be possible.
Comment 7 Dave Evans 2020-02-08 09:31:17 UTC
Thanks for all the useful comments.

I've now set

kern.maxswzone=42949664

which as far as I can tell from loader(8) is the value to be used for a theoretical 8GB of swap.

I've configured 4GB of swap and rebooted.

I then ran a stress test of running 3 compilations of the offending file simultaneously and
monitored the system with top.

Each job peaked at size: 1500M, resident: 600M

swap usage peaked at 75% or 3054M

The 3 jobs took 30 minutes to complete, as I would expect.
There were no out of swap messages, which is good.

The initial problem was that the default kern.maxswzone was set way
too low. It is not something I've ever tweaked before. It was probably
limiting usable swap to 1 GB or less.

This experience has taught me to read the output of dmesg more frequently
and studiously. It also helps to read the man pages.
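
For reference, the stress-test figures above can be cross-checked with a few lines of Python. This is only arithmetic on the reported numbers; it assumes "M" means MiB and that the 4GB of swap is 4096 MiB, and the constant and function names are made up for the example.

```python
# Figures as reported in the comment above (units assumed to be MiB).
JOBS = 3
PEAK_SIZE_MIB = 1500      # per-job "size" seen in top
PEAK_RES_MIB = 600        # per-job "resident" seen in top
SWAP_TOTAL_MIB = 4 * 1024
SWAP_PEAK_MIB = 3054


def summarize():
    """Return (total size, total resident, swap used in percent)."""
    total_size = JOBS * PEAK_SIZE_MIB
    total_res = JOBS * PEAK_RES_MIB
    swap_pct = round(100 * SWAP_PEAK_MIB / SWAP_TOTAL_MIB)
    return total_size, total_res, swap_pct


if __name__ == "__main__":
    size, res, pct = summarize()
    print(f"combined size {size} MiB, combined resident {res} MiB, swap {pct}%")
```

Under those assumptions the reported 3054M peak matches the reported 75% of a 4 GiB swap area, and the three jobs together ask for roughly 4500 MiB of address space on a guest with 2.4 GB of RAM.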
Comment 8 Dave Evans 2020-02-08 09:50:25 UTC
(In reply to Mark Millard from comment #5)

I was not specifying any value for make -j

The virtual machine is set up to use 1 cpu core.
Comment 9 Robert Clausecker 2020-02-08 16:33:21 UTC
In reply to comment #5 of Mark:

That was a -j1 world build with swap enabled.  On a separate SSH session, I watched the clang process spike to 1.5 GB (with some 700 MB resident, not sure) before it got killed.  I was eventually able to get the compilation to run through by temporarily configuring extra swap space but it was a real pain to do.

Please though: I am extremely sure this is a compiler bug or poorly designed program, not a configuration issue.  Tweaking VM settings will not solve the underlying issue which is probably a memory leak or something.  And if people have to perform arcane tweaks to be able to upgrade their system at all (as no other upgrade path than upgrading from source is supported on ARM32), that's really bad news for people who actually want to run FreeBSD on their ARM boxes.

Please solve the underlying issue.  I want a solution, not a bandaid.
Comment 10 Mark Millard 2020-02-08 20:01:45 UTC
(In reply to Robert Clausecker from comment #9)

> I am extremely sure this is a compiler bug or poorly designed program

You may or may not be right. I do not have the knowledge to
know what is appropriate to expect for the test case. I do not
know what reasonable memory use figures would be. I do not
know which side of that "or" should have the blame (or if
either should). I've no clue just what would need to change
in either (if anything).

> Please solve the underlying issue.  I want a solution, not a bandaid.

I am not an llvm developer, nor a Google Test developer.
(I've only done a few small patches for FreeBSD, generally
for personal use. So, effectively, I'm not a FreeBSD
developer either, just a user.)

llvm is an upstream project used by FreeBSD but not developed
by FreeBSD. There is a:

https://bugs.llvm.org/

That would be a more appropriate place for requesting
a fix or redesign from the compiler side of things. If they
improved things, FreeBSD would pick up the change at some
point.

Google Test is an upstream project used by FreeBSD but not
developed by FreeBSD. It looks like it uses:

https://github.com/google/googletest/issues

for submitting and tracking issues. That would be a more
appropriate place for requesting a fix or redesign from
the Google Test side of things.

I've no evidence for which place would be the right place
to submit something. I do know that FreeBSD's bugzilla is
not the right place for upstream project changes. (Although
having a FreeBSD bugzilla item for pointing to an upstream
item for reference can be appropriate at times.)