Bug 241848 - lib/googletest/gtest/tests: gmock-matchers_test.cc requires a pathological amount of memory to compile
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: 12.1-RELEASE
Hardware: Any Any
Importance: --- Affects Some People
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-11-09 23:46 UTC by Robert Clausecker
Modified: 2020-10-19 22:30 UTC
CC: 7 users

See Also:


Description Robert Clausecker freebsd_committer freebsd_triage 2019-11-09 23:46:48 UTC
During a compilation of FreeBSD 12.1 for the Raspberry Pi 2B, the build failed at the file

    /usr/src/contrib/googletest/googlemock/test/gmock-matchers_test.cc

with an out of memory condition.  Further analysis revealed that it takes more than 1.5 GB of memory to build this file, far too much for my puny computer.  This makes it impossible to finish a FreeBSD build and, given the lack of binary upgrade options, makes it very difficult for me to upgrade to FreeBSD 12.1.

Please find out what causes this pathological memory usage and make it possible to build FreeBSD on a machine with no more than 1 GB of RAM as it was before.  Not that 1 GB of RAM (mostly due to having to build clang) isn't already an annoyingly high memory requirement for upgrading your system from source.
Comment 1 Dave Evans 2020-02-07 13:01:20 UTC
I'm doing a buildworld for amd64 13-CURRENT with a FreeBSD 12.1 guest on a macOS VirtualBox host, with 2.4 GB of RAM assigned to the FreeBSD guest.

I have 7 GB of swap.  

The build gets to the Build Everything stage.
When it gets to gmock-matchers_test.cc the process is killed with an out of swap signal.

swapinfo:
Device               1K-blocks     Used    Avail Capacity
/dev/ada0p2           2097152     11524  2085628     1%
/dev/zvol/zroot/swap  5242880     12012  5230868      0%
Total                 7340032     23536  7316496     0%

How much swap does this file need to compile? swapon complains if I try to add more.
Comment 2 Mark Millard 2020-02-07 17:56:25 UTC
(In reply to Dave Evans from comment #1)

If you get console messages something like this (extracted example):

. . . kernel: pid 7963 (strip), jid 0, uid 0, was killed: out of swap space

The detailed wording of this message is frequently a misnomer.
Do you also have any messages of the form:

. . . sentinel kernel: swap_pager_getswapspace(32): failed

If yes: you really were out of swap space.
If no:  you were not out of swap space,
       or at least it is highly unlikely that you were.

FreeBSD kills processes for multiple potential reasons.
For example:

a) Still low on free RAM after a number of tries to increase it above a threshold.
b) Slow paging I/O.
c) . . . (I do not know the full list) . . .

Unfortunately, FreeBSD is not explicit about which
category of problem led to the kill activity that happens.

You might learn more by watching how things are going
via top or some similar monitoring program.
You will likely find that swap space is not low.


Below are some notes about specific tunables that might
or might not be of help. (There may be more tunables
that can help that I do not know about.)

For (a) there is a way to test whether it is the issue:
increase the number of tries before the kernel gives up
and starts killing things. That will either:

1) let it get more done before kills start
2) let it complete before the count is reached
3) make no significant difference

(3) would imply that (b) or (c) are involved instead.

(1) might be handled by having it do even more tries.

To extend how long persistently low free RAM is
tolerated, one can increase vm.pageout_oom_seq from
its default of 12 to something larger. I have less
experience with managing slow paging I/O, but there
are some notes about it below.

The examples that follow are what I use in contexts with
sufficient RAM that I do not have to worry about
running out of swap/page space. I've set these in
/etc/sysctl.conf . (Of course, I'm not deliberately
trying to run out of RAM.)

#
# Delay when persistent low free RAM leads to
# Out Of Memory killing of processes:
vm.pageout_oom_seq=120

I'll note that figures like 1024 or 1200 or
even more are possible. This controls how many
tries at regaining sufficient free RAM are made
while the low level is tolerated. After that it
starts Out Of Memory kills to get some free RAM
back.

No figure is designed to make the delay
unbounded, but figures can be made large enough
that the bound is effectively beyond any
reasonable time to wait.
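
A value like this can also be applied immediately
with sysctl(8), without a reboot (a minimal
example; only an /etc/sysctl.conf entry persists
across reboots):

    sysctl vm.pageout_oom_seq        # check the current value
    sysctl vm.pageout_oom_seq=120    # apply until the next reboot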


As for paging I/O (WARNING: all the tunables below
are specific to head (13), or were last I checked):

#
# For plenty of swap/paging space (will not
# run out), avoid pageout delays leading to
# Out Of Memory killing of processes:
vm.pfault_oom_attempts=-1

(Note: In my context "plenty" really means
sufficient RAM that paging is rare. But
others have reported using the -1 in
contexts where paging was heavy at times, and
OOM kills that had been happening were
eliminated by the assignment.)

I've no experience with the below alternative
to that -1 use:

#
# For possibly insufficient swap/paging space
# (might run out), increase the pageout delay
# that leads to Out Of Memory killing of
# processes:
#vm.pfault_oom_attempts= ???
#vm.pfault_oom_wait= ???
# (The product of the two is the total wait, but
# there are other potential tradeoffs in the factors
# multiplied, even for nearly the same total.)


I'm not claiming that these 3 vm.???_oom_???
figures are always sufficient. Nor am I
claiming that tunables are always available
that would be sufficient. Nor that it is easy
to find the ones that do exist that might
help for specific OOM kill issues.

I have seen reports of OOM kills for other
reasons when both vm.pageout_oom_seq and
vm.pfault_oom_attempts=-1 were in use.
As I understand it, FreeBSD did not report
what kind of condition led to the
decision to do an OOM kill.

So the above notes may or may not help you.
Comment 3 Robert Clausecker freebsd_committer freebsd_triage 2020-02-07 18:16:55 UTC
These notes aside: this memory usage is far from the norm for compiling a C++ source file.  I believe there must be a bug in clang or llvm, or some unfortunate design in the source file itself, that causes this memory usage.  This is more than twice the highest memory usage I had observed before (roughly 800 MB for one of the X86 instruction selection files in the LLVM source) and, as opposed to that case, there is no apparent explanation for the memory usage.
Comment 4 Mark Millard 2020-02-07 18:21:08 UTC
(In reply to Mark Millard from comment #2)

I should have noted: if a process stays runnable,
FreeBSD does not stop it and swap it out but
instead just pages it. (For FreeBSD, swapping
basically seems to mean that the kernel stacks
were also moved out to swap space and would have
to be brought back in for the process to run.)

Thus, one or more processes that use large amounts
of memory relative to the RAM size but also stay
runnable are not stopped and swapped out to
make room. In such a context, if free RAM stays
low despite other efforts to gain some back,
processes are then killed instead.

vm.pageout_oom_seq controls how many attempts are
made to gain more free RAM before the kills start.
Comment 5 Mark Millard 2020-02-07 19:41:53 UTC
(In reply to Dave Evans from comment #1)

Was this a -j1 build? Something larger?
(There could be other things worth
reporting about the build context.)

It can be interesting to watch the system with
top sorted by resident RAM use (decreasing).
It can give a clue about which things are the major
contributors to low free RAM while you watch.
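
A minimal example of that (FreeBSD's top can sort
by a named field):

    top -o res    # sort by resident memory, largest first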

I'm not sure one can infer much from which
process FreeBSD decides to kill. Other
evidence is likely a better indicator of what is
contributing to the sustained low free RAM
that likely led to the process
kill(s).

To Robert:

I've been replying mostly to Dave because it has
been a significant time since I've experimented
with a 1 GiByte machine for buildworld buildkernel
and the like. Dave indicated over 2 GiByte for
his context.

You could try vm.pageout_oom_seq=1200 and a -j1
build and see if it helps you. Reporting the result
here might be useful.

Actually, you indicated "upgrade to FreeBSD 12.1",
so your context is apparently older, such as 12.0.
I'm not sure if vm.pageout_oom_seq goes back
that far. That might leave you only with -j1 (which,
for all I know, you might have already been using).
Comment 6 Mark Millard 2020-02-07 21:14:13 UTC
(In reply to Mark Millard from comment #5)

Looks like vm.pageout_oom_seq goes back to
10.3.0-RELEASE so experiments with it on
a 12.0-RELEASE based system should be possible.
Comment 7 Dave Evans 2020-02-08 09:31:17 UTC
Thanks for all the useful comments.

I've now set

kern.maxswzone=42949664

which as far as I can tell from loader(8) is the value to be used for a theoretical 8GB of swap.
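
For reference, kern.maxswzone is a boot-time tunable, so it belongs in
/boot/loader.conf. A minimal sketch of such an entry (using the value above):

    # /boot/loader.conf
    kern.maxswzone="42949664"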

I've configured 4GB of swap and rebooted.

I then ran a stress test, running 3 compilations of the offending file simultaneously, and
monitored the system with top.

Each job peaked at size: 1500M, resident: 600M

swap usage peaked at 75% or 3054M

The 3 jobs took 30 minutes to complete, as I would expect.
There were no out of swap messages, which is good.

The initial problem was that the default kern.maxswzone was set way
too low. It is not something I've ever tweaked before. It was probably
not allowing more than about 1GB of swap.

This experience has taught me to read the output of dmesg more frequently
and studiously. It also helps to read the man pages.
Comment 8 Dave Evans 2020-02-08 09:50:25 UTC
(In reply to Mark Millard from comment #5)

I was not specifying any value for make -j

The virtual machine is set up to use 1 CPU core.
Comment 9 Robert Clausecker freebsd_committer freebsd_triage 2020-02-08 16:33:21 UTC
In reply to comment #5 of Mark:

That was a -j1 world build with swap enabled.  On a separate SSH session, I watched the clang process spike to 1.5 GB (with some 700 MB resident, not sure) before it got killed.  I was eventually able to get the compilation to run through by temporarily configuring extra swap space but it was a real pain to do.
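
For reference, one way to temporarily add swap on FreeBSD is a file-backed md(4) device.  A rough sketch with illustrative paths and sizes, not the exact steps I used:

    dd if=/dev/zero of=/usr/swap0 bs=1m count=2048   # create a 2 GiB swap file
    chmod 0600 /usr/swap0
    mdconfig -a -t vnode -f /usr/swap0 -u 0          # attach it as /dev/md0
    swapon /dev/md0                                  # enable it
    swapinfo                                         # verify the new total
    # swapoff /dev/md0 and mdconfig -d -u 0 undo this afterwards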

Please though: I am extremely sure this is a compiler bug or a poorly designed program, not a configuration issue.  Tweaking VM settings will not solve the underlying issue, which is probably a memory leak or something similar.  And if people have to perform arcane tweaks to be able to upgrade their system at all (as no upgrade path other than upgrading from source is supported on ARM32), that's really bad news for people who actually want to run FreeBSD on their ARM boxes.

Please solve the underlying issue.  I want a solution, not a bandaid.
Comment 10 Mark Millard 2020-02-08 20:01:45 UTC
(In reply to Robert Clausecker from comment #9)

> I am extremely sure this is a compiler bug or poorly designed program

You may or may not be right. I do not have the knowledge to
know what is appropriate to expect for the test case. I do not
know what reasonable memory use figures would be. I do not
know which side of that "or" should have the blame (or if
either should). I've no clue just what would need to change
in either (if anything).

> Please solve the underlying issue.  I want a solution, not a bandaid.

I am not a llvm developer, nor a Google Test developer.
(I've only done a few small patches for FreeBSD, generally
for personal use. So, effectively, I'm not a FreeBSD
developer either: a user.)

llvm is an upstream project used by FreeBSD but not developed
by FreeBSD. There is a:

https://bugs.llvm.org/

That would be a more appropriate place for requesting
a fix or redesign from the compiler side of things. If they
improved things, FreeBSD would pick up the change at some
point.

Google Test is an upstream project used by FreeBSD but not
developed by FreeBSD. It looks like it uses:

https://github.com/google/googletest/issues

for submitting and tracking issues. That would be a more
appropriate place for requesting a fix or redesign from
the Google Test side of things.

I've no evidence for which place would be the right place
to submit something. I do know that FreeBSD's Bugzilla is
not the right place for upstream project changes. (Although
having a FreeBSD bugzilla item for pointing to an upstream
item for reference can be appropriate at times.)
Comment 11 Alan Somers freebsd_committer freebsd_triage 2020-02-25 23:50:28 UTC
It's not just arm.  I ran into the same bug on amd64 trying to build a release of stable/12 at r358079.  I'm going to set WITHOUT_GOOGLETEST=1 in src.conf as a workaround.
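
For anyone wanting the same workaround, a sketch of the src.conf entry (the WITHOUT_* knobs only need to be defined, so =1 or =yes both work):

    # /etc/src.conf
    WITHOUT_GOOGLETEST=yes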
Comment 12 Mark Millard 2020-02-26 03:55:00 UTC
Just to add to the examples of what it takes
to build and link gmock-matchers_test . . .

In /usr/src/lib/googletest/gmock_main/tests/ I tried
building gmock-matchers_test on an Orange Pi+ 2ed
(armv7 Cortex-A7 with 2 GiBytes of RAM and 1740Mi
swap/paging space). The context is head -r358132 .

I use a modified version of top that keeps track of
its sampled "Max. Observed Active" (MaxObsActive),
MaxObsWired, MaxObs(Act+Wir), and, for swap use,
MaxObsUsed (if any). It is also biased to present
more digits (smaller unit size) and be explicit
about powers of 2 factors being in use for memory
size display.

After finishing in somewhat over 20 minutes (under 25?),
the odd variant of top was showing:

1019Mi MaxObsActive, 193444Ki MaxObsWired, 1146Mi MaxObs(Act+Wir)
Swap: 1740Mi Total, 1740Mi Free

That spans the link as well. So swap/paging space was not observed
to be used, but it clearly would have been on a 1 GiByte machine.
Similarly, free RAM was never observed to be low, but it would have
been on a 1 GiByte machine.

An example aarch64 is a Rock64 (not Pro) with 4 GiBytes of RAM:

1753Mi MaxObsActive, 633084Ki MaxObsWired, 2368Mi MaxObs(Act+Wir)
Swap: 4608Mi Total, 4608Mi Free

(It shows a lot more Wired even without the build, just because of
the larger amount of RAM.) So, even just looking at the MaxObsActive,
it indicates that a 1 GiByte RAM machine would be paging/swapping and
a 2 GiByte machine would likely do some as well (far less).

There is a significant MaxObsActive difference between the armv7 and
aarch64 contexts. But it would be interesting to see what a 2 GiByte
aarch64 would be like.
Comment 13 Mark Millard 2020-02-27 03:25:53 UTC
(In reply to Mark Millard from comment #12)

Adding a Pine64+2G example (so 2 GiBytes of RAM on
aarch64, again head -r358132 based):

1682Mi MaxObsActive, 278228Ki MaxObsWired, 1845Mi MaxObs(Act+Wir)
Swap: 3584Mi Total, 3584Mi Free

It did not use swap, but it looks like it was fairly close to doing so.

Note: It is expected that MaxObs(Act+Wir) <= MaxObsActive + MaxObsWired.
The right hand side's figures need not be from similar time frames,
but the left hand side is from figures from comparatively similar
time frames. Plus there is just the math: the maximum of a sum is at
most the sum of the maximums.
Comment 14 Mark Millard 2020-02-27 17:24:15 UTC
(In reply to Mark Millard from comment #13)

I should have noted that I used my normal
settings for controlling the criteria for
Out Of Memory kills and related issues:

#
# Delay when persistent low free RAM leads to
# Out Of Memory killing of processes:
vm.pageout_oom_seq=120
#
# For plenty of swap/paging space (will not
# run out), avoid pageout delays leading to
# Out Of Memory killing of processes:
vm.pfault_oom_attempts=-1

(That last one may only be for head but the
first has been around for longer.)
Comment 15 Dimitry Andric freebsd_committer freebsd_triage 2020-03-11 07:06:31 UTC
This particular source file is indeed a rather pathological case.

On my 13.0-CURRENT test system, using clang 10.0.0-rc3 (with assertions enabled), it takes a maxrss of 1982620, so ~1936 MiB to compile with -O2.

Gcc 9.2.0 from ports fares even worse: it takes about 20% more time to compile, and a maxrss of 2684812, so ~2622 MiB.

I also tried the clang90 port, but this has assertions disabled, and it takes a maxrss of 1755320, so ~1714 MiB.
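
For reference, maxrss figures like these can be collected with time(1). A sketch, with the include paths used by the real build omitted:

    /usr/bin/time -l c++ -O2 -c gmock-matchers_test.cc ...
    # "maximum resident set size" in the -l output is the maxrss, in kilobytes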

For now, my advice would be to compile this file with -O1, or even -O0, as it seems to be an internal test for googletest itself, and not something that we actively need to have heavily optimized.
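
A minimal sketch of what that could look like, assuming the per-file CXXFLAGS.<filename> hook from share/mk/bsd.sys.mk is honored by the makefile that builds this test (a sketch only, not a committed fix):

    # in the Makefile that compiles gmock-matchers_test.cc
    CXXFLAGS.gmock-matchers_test.cc+= -O1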
Comment 16 Mark Millard 2020-03-15 08:06:55 UTC
(In reply to Dimitry Andric from comment #15)

I'll note that one possibility may be jemalloc behavior
contributing. (Not that I have specific evidence one
way or the other.)

QUOTING (although I changed the top-post order to
bottom posting) . . .


On Thu, Jan 9, 2020 at 1:45 PM Bryan Drewery <bdrewery at freebsd.org> wrote:
>
> Do you plan to get this back in soon? I hope to see it before 12.2 if
> possible. Is there some way I can help?
>
> I'm interested in these changes in 5.2.1 (I think)
>   - Properly trigger decay on tcache destroy.  (@interwq, @amosbird)
>   - Fix tcache.flush.  (@interwq)
>   - Fix a side effect caused by extent_max_active_fit combined with
> decay-based purging, where freed extents can accumulate and not be
> reused for an extended period of time.  (@interwq, @mpghf)
>
> I have a test case where virtual memory was peaking at 275M on 4.x, 1GB
> on 5.0.0, around 750M on 5.1.0, and finally 275M again on 5.2.0. The
> 5.0/5.1 versions appeared to be a widespread leak to us.
. . .

I think it's fine to get jemalloc 5.2.1 in again now.  The previous
fails were due to ancient gcc421.  Now the in-tree gcc has been
removed and the default compiler of non-llvm platforms are all using
gcc6 from ports.  The CI environment are also updated to follow the
current standard.  I've tested a patch combines r354605 + r355975 and
it builds fine on amd64 (clang10) and mips (gcc6).

Best,
Li-Wen
Comment 17 Mark Millard 2020-03-31 18:40:26 UTC
(In reply to Mark Millard from comment #12)

I had an RPi3 that was based on head -r358966 do a
buildworld buildkernel of the same version, from
scratch, -j4 style. The RPi3 is a 1 GiByte RAM
context. I had 3072 MiBytes for the swap partition.
That, and the ufs file system, were on a USB SSD,
not the microsd card.

The build completed without any /var/log/messages or
console output during the build. My modified version
of top reported (details copied from an ssh window) . . .

For Mem: 738512Ki MaxObsActive, 190608Ki MaxObsWired, 906372Ki MaxObs(Act+Wir)
For Swap:  1927Mi MaxObsUsed

(top was started before the build. "MaxObs" is short
for "Maximum Observed".)

The build took a few minutes under 31 hrs.

The build used (the PINE64 media are also set up
to boot the RPi3, explaining some naming):

vm.pageout_oom_seq=120
vm.pfault_oom_attempts=-1
vfs.root.mountfrom="ufs:/dev/gpt/PINE642Groot"
dumpdev="/dev/gpt/PINE642Gswp2"
/dev/gpt/PINE642Groot           /               ufs rw,noatime          1 1
/dev/gpt/PINE642Gswap           none            swap sw                 0 0

(So this avoided the microsd card for ufs and
swap/page space.)

Overall, it looks like having more than 2 GiBytes
of swap partition is appropriate for -j4 : 1927
MiByte is not much less than 2048 MiByte.

But, with appropriate configuration anyway, the
RPi3 can do buildworld buildkernel for head 13,
even -j4 style.

This was aarch64. armv7 style with 1 GiByte RAM
does not allow as much swap/page space without
complaining at boot. It does not appear that
such a -j4 build would be appropriate for armv7.
But I've not investigated what would fit.
Comment 18 Mark Millard 2020-03-31 18:53:53 UTC
(In reply to Mark Millard from comment #17)

Poor wording:

"the PINE64 media are also set up
to boot the RPi3, explaining some naming"

Better:

my PINE64 media are also set up
to boot the RPi3, explaining some naming

Note:

This works because the dd based PINE64+ 2GB
material and the msdosfs based RPi3 materials
do not interfere with each other and can
both be in place. After that, FreeBSD need
not care which it is.
Comment 19 Mark Millard 2020-04-04 02:49:28 UTC
(In reply to Mark Millard from comment #17)

A 1 GiByte RAM armv7 test . . .

I tested a RPi2 V1.2 based on armv7 head -r359427
as the context (self-hosted, from-scratch build),
using -j2 with an 1800 MiByte swap partition for
the 1 GiByte RPi2. vm.pageout_oom_seq=120 and
vm.pfault_oom_attempts=-1 and USB SSD and the
like again, avoiding the microsd card after
the kernel loads. The 1800 MiByte swap avoided
boot notices of the form:

warning: total configured swap (... pages) exceeds maximum recommended amount (... pages).

I stayed somewhat under the recommended maximum.
(stable/12 reportedly lists a smaller recommended
maximum when its figure is exceeded, somewhat more
than 1200 MiByte.)

The build completed fine, with my odd top variant showing
"maximum Observed" figures:

Mem:  758544Ki MaxObsActive, 189972Ki MaxObsWired, 928060Ki MaxObs(Act+Wir)
Swap: 527388Ki MaxObsUsed

But it turned out that the high memory use time frame for
gmock-matchers_test.cc was paired with a very low memory
use activity. So the 527388Ki MaxObsUsed is on the low
side for judging the margin needed to cover -j2 variability
in what the paired activity might be. Other pairings could
easily have used over 700 MiByte more (say, linking clang),
and so have reached the realm of 1400 to 1500 MiByte
of swap, leaving, say, 300 to 400 MiBytes unused.

(I happened to be there to watch the top display over the
period of time at issue, seeing the growth to 527388Ki
MaxObsUsed.)

I'd not push it to -j3 for armv7 FreeBSD with 1 GiByte RAM.
Having swap fairly near (but under) the recommended maximum
seems appropriate for -j2 .

Appropriately configured, -j1 seems unlikely to be a problem
for 1 GiByte RAM stable/12 (swap space contributing,
vm.pfault_oom_attempts=-1 contributing, vm.pageout_oom_seq=120
contributing). I've not experimented with the more
problematical microsd cards instead of the particular
USB SSDs that I have around. In the microsd card context,
vm.pfault_oom_attempts=-1 is likely appropriate to keep
paging activity latency from leading to OOM kills.

For reference: the build took somewhat less than 38 hrs.


I will note that stable/12 does support vm.pfault_oom_attempts=-1
as of -r351776 (2019-Sep-3), and has supported vm.pageout_oom_seq
for much longer.
Comment 20 Mark Millard 2020-10-19 22:10:47 UTC
(In reply to Mark Millard from comment #19)

Looks like head -r366850 will make parallel build activity while
gmock-matchers_test.cc is building more likely, causing increased
peak-attempted memory use for -j2 overall. It might be that for
1 GiByte armv7 contexts, -j1 will effectively be required
if gmock-matchers_test.cc is to be part of the build.

-j2 for 1 GiByte aarch64 contexts with sufficient page/swap space
may page/swap heavily over this period. Use of the tuning controls
to avoid OOM kills would seem to be required for such contexts.
Comment 21 Dimitry Andric freebsd_committer freebsd_triage 2020-10-19 22:20:13 UTC
See also Alex's https://reviews.freebsd.org/D26751, which is supposed to lower the CPU and RAM requirements.
Comment 22 Alex Richardson freebsd_committer freebsd_triage 2020-10-19 22:30:50 UTC
https://reviews.freebsd.org/D26067 may also be of interest as it stops building this test by default.