Bug 230402

Summary: With buildworld, the system can not use swap
Product: Base System Reporter: Vladyslav V. Prodan <admin>
Component: misc Assignee: freebsd-bugs (Nobody) <bugs>
Status: Closed FIXED    
Severity: Affects Some People CC: chris, dim, marklmi26-fbsd, rgrimes, virtualization
Priority: ---    
Version: 11.2-STABLE   
Hardware: amd64   
OS: Any   
See Also: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=227609
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=230454
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=241726
Attachments:
Description Flags
Preprocessed source none

Description Vladyslav V. Prodan 2018-08-06 03:05:37 UTC
Created attachment 195925 [details]
Preprocessed source

I run FreeBSD 11.2-STABLE r337132 inside the virtual machine in the Virtualbox.
Machine configuration: CPU 3 core, 1GB RAM, 1x SATA HDD 14GB with FreeBSD 11.2-STABLE r337132.

According to the system logs, with buildworld there is not enough RAM, but at the same time, swap remains unused.

# uname -a
FreeBSD core.domain.com 11.2-STABLE FreeBSD 11.2-STABLE #0 r337132: Thu Aug  2 17:54:09 UTC 2018     root@releng2.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC  amd64

dmesg:
pid 9870 (c++), uid 0, was killed: out of swap space

top -PS:
last pid:  9877;  load averages:  0.77,  1.79,  2.11                                       up 0+01:36:36  05:42:23
55 processes:  2 running, 52 sleeping, 1 waiting
CPU 0:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 1:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 2:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Mem: 37M Active, 7120K Inact, 450M Wired, 466M Free
ARC: 168M Total, 22M MFU, 70M MRU, 3616K Anon, 1537K Header, 71M Other
     21M Compressed, 74M Uncompressed, 3.49:1 Ratio
Swap: 2560M Total, 178M Used, 2381M Free, 6% Inuse, 4K In

# df -m
Filesystem                1M-blocks Used Avail Capacity  Mounted on
zroot                          7660   92  7567     1%    /
devfs                             0    0     0   100%    /dev
zroot/tmp                      7572    5  7567     0%    /tmp
zroot/usr                      8193  625  7567     8%    /usr
zroot/usr/home                 7567    0  7567     0%    /usr/home
zroot/usr/ports                7567    0  7567     0%    /usr/ports
zroot/usr/ports/distfiles      7567    0  7567     0%    /usr/ports/distfiles
zroot/usr/ports/packages       7567    0  7567     0%    /usr/ports/packages
zroot/usr/src                  8864 1297  7567    15%    /usr/src
zroot/var                      7568    0  7567     0%    /var
zroot/var/crash                7567    0  7567     0%    /var/crash
zroot/var/db                   7569    2  7567     0%    /var/db
zroot/var/db/pkg               7600   33  7567     0%    /var/db/pkg
zroot/var/empty                7567    0  7567     0%    /var/empty
zroot/var/log                  7567    0  7567     0%    /var/log
zroot/var/mail                 7567    0  7567     0%    /var/mail
zroot/var/ports                7567    0  7567     0%    /var/ports
zroot/var/run                  7567    0  7567     0%    /var/run
zroot/var/tmp                  7567    0  7567     0%    /var/tmp

# swapinfo -m
Device          1M-blocks     Used    Avail Capacity
/dev/gpt/swap-ada0      1024       93      930     9%
/dev/zvol/zroot/swap      1536       84     1451     6%
Total                2560      177     2382     7%


# make -j3 buildworld                       || exit
...

--- Target/ARM/ARMInstrInfo.o ---
c++  -target x86_64-unknown-freebsd11.2 --sysroot=/usr/obj/usr/src/tmp -B/usr/obj/usr/src/tmp/usr/bin  -O2 -pipe -I/usr/obj/usr/src/lib/clang/libllvm -I/usr/src/contrib/llvm/lib/Target/AArch64 -I/usr/src/contrib/llvm/lib/Target/ARM -I/usr/src/contrib/llvm/lib/Target/Mips -I/usr/src/contrib/llvm/lib/Target/PowerPC -I/usr/src/contrib/llvm/lib/Target/Sparc -I/usr/src/contrib/llvm/lib/Target/X86 -I/usr/src/lib/clang/include -I/usr/src/contrib/llvm/include -DLLVM_BUILD_GLOBAL_ISEL -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS -DNDEBUG -DLLVM_DEFAULT_TARGET_TRIPLE=\"x86_64-unknown-freebsd11.2\" -DLLVM_HOST_TRIPLE=\"x86_64-unknown-freebsd11.2\" -DDEFAULT_SYSROOT=\"\" -DLLVM_TARGET_ENABLE_AARCH64 -DLLVM_TARGET_ENABLE_ARM -DLLVM_TARGET_ENABLE_MIPS -DLLVM_TARGET_ENABLE_POWERPC -DLLVM_TARGET_ENABLE_SPARC -DLLVM_TARGET_ENABLE_X86 -DLLVM_NATIVE_ASMPARSER=LLVMInitializeX86AsmParser -DLLVM_NATIVE_ASMPRINTER=LLVMInitializeX86AsmPrinter -DLLVM_NATIVE_DISASSEMBLER=LLVMInitializeX86Disassembler -DLLVM_NATIVE_TARGET=LLVMInitializeX86Target -DLLVM_NATIVE_TARGETINFO=LLVMInitializeX86TargetInfo -DLLVM_NATIVE_TARGETMC=LLVMInitializeX86TargetMC -ffunction-sections -fdata-sections -MD -MF.depend.Target_ARM_ARMInstrInfo.o -MTTarget/ARM/ARMInstrInfo.o -fstack-protector-strong -Qunused-arguments  -std=c++11 -fno-exceptions -fno-rtti -stdlib=libc++ -Wno-c++11-extensions  -c /usr/src/contrib/llvm/lib/Target/ARM/ARMInstrInfo.cpp -o Target/ARM/ARMInstrInfo.o
--- Target/ARM/ARMISelLowering.o ---
c++: error: unable to execute command: Killed
c++: error: clang frontend command failed due to signal (use -v to see invocation)
FreeBSD clang version 6.0.1 (tags/RELEASE_601/final 335540) (based on LLVM 6.0.1)
Target: x86_64-unknown-freebsd11.2
Thread model: posix
InstalledDir: /usr/bin
c++: note: diagnostic msg: PLEASE submit a bug report to https://bugs.freebsd.org/submit/ and include the crash backtrace, preprocessed source, and associated run script.
c++: note: diagnostic msg:
********************

PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
c++: note: diagnostic msg: /tmp/ARMISelLowering-c1b581.cpp
c++: note: diagnostic msg: /tmp/ARMISelLowering-c1b581.sh
c++: note: diagnostic msg:

********************
--- Target/ARM/ARMInstructionSelector.o ---
--- Target/ARM/ARMISelLowering.o ---
*** [Target/ARM/ARMISelLowering.o] Error code 254

make[6]: stopped in /usr/src/lib/clang/libllvm
--- Target/ARM/ARMInstructionSelector.o ---
c++  -target x86_64-unknown-freebsd11.2 --sysroot=/usr/obj/usr/src/tmp -B/usr/obj/usr/src/tmp/usr/bin  -O2 -pipe -I/usr/obj/usr/src/lib/clang/libllvm -I/usr/src/contrib/llvm/lib/Target/AArch64 -I/usr/src/contrib/llvm/lib/Target/ARM -I/usr/src/contrib/llvm/lib/Target/Mips -I/usr/src/contrib/llvm/lib/Target/PowerPC -I/usr/src/contrib/llvm/lib/Target/Sparc -I/usr/src/contrib/llvm/lib/Target/X86 -I/usr/src/lib/clang/include -I/usr/src/contrib/llvm/include -DLLVM_BUILD_GLOBAL_ISEL -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS -DNDEBUG -DLLVM_DEFAULT_TARGET_TRIPLE=\"x86_64-unknown-freebsd11.2\" -DLLVM_HOST_TRIPLE=\"x86_64-unknown-freebsd11.2\" -DDEFAULT_SYSROOT=\"\" -DLLVM_TARGET_ENABLE_AARCH64 -DLLVM_TARGET_ENABLE_ARM -DLLVM_TARGET_ENABLE_MIPS -DLLVM_TARGET_ENABLE_POWERPC -DLLVM_TARGET_ENABLE_SPARC -DLLVM_TARGET_ENABLE_X86 -DLLVM_NATIVE_ASMPARSER=LLVMInitializeX86AsmParser -DLLVM_NATIVE_ASMPRINTER=LLVMInitializeX86AsmPrinter -DLLVM_NATIVE_DISASSEMBLER=LLVMInitializeX86Disassembler -DLLVM_NATIVE_TARGET=LLVMInitializeX86Target -DLLVM_NATIVE_TARGETINFO=LLVMInitializeX86TargetInfo -DLLVM_NATIVE_TARGETMC=LLVMInitializeX86TargetMC -ffunction-sections -fdata-sections -MD -MF.depend.Target_ARM_ARMInstructionSelector.o -MTTarget/ARM/ARMInstructionSelector.o  -DLLVM_TARGET_ENABLE_X86 -DLLVM_NATIVE_ASMPARSER=LLVMInitializeX86AsmParser -DLLVM_NATIVE_ASMPRINTER=LLVMInitializeX86AsmPrinter -DLLVM_NATIVE_DISASSEMBLER=LLVMInitializeX86Disassembler -DLLVM_NATIVE_TARGET=LLVMInitializeX86Target -DLLVM_NATIVE_TARGETINFO=LLVMInitializeX86TargetInfo -DLLVM_NATIVE_TARGETMC=LLVMInitializeX86TargetMC -ffunction-sections -fdata-sections -MD -MF.depend.Target_ARM_ARMInstructionSelector.o -MTTarget/ARM/ARMInstructionSelector.o
-fstack-protector-strong -Qunused-arguments  -std=c++11 -fno-exceptions -fno-rtti -stdlib=libc++ -Wno-c++11-extensions  -c /usr/src/contrib/llvm/lib/Target/ARM/ARMInstructionSelector.cpp -o Target/ARM/ARMInstructionSelector.o
1 error

make[6]: stopped in /usr/src/lib/clang/libllvm
*** [all] Error code 2

make[5]: stopped in /usr/src/lib/clang
1 error

make[5]: stopped in /usr/src/lib/clang
*** [all_subdir_lib/clang] Error code 2

make[4]: stopped in /usr/src/lib
1 error

make[4]: stopped in /usr/src/lib
*** [lib__L] Error code 2

make[3]: stopped in /usr/src
1 error

make[3]: stopped in /usr/src
*** [libraries] Error code 2

make[2]: stopped in /usr/src
1 error

make[2]: stopped in /usr/src
*** [_libraries] Error code 2

make[1]: stopped in /usr/src
1 error

make[1]: stopped in /usr/src
*** [buildworld] Error code 2

make: stopped in /usr/src
1 error

make: stopped in /usr/src
Comment 1 Vladyslav V. Prodan 2018-08-06 03:09:42 UTC
The second file is larger than the 1MB limit, so I posted a link to mega.nz

ARMISelLowering-c1b581.cpp.zip  2.2 MB
https://mega.nz/#!5tgwUYII!LFruPlHHBwz_aMSjQjLdItH5q-7G6Kd8dvLgTSMJGKQ
Comment 2 Mark Millard 2018-08-06 03:26:08 UTC
(In reply to Vladislav V. Prodan from comment #0)

Unfortunately, the message:

pid 9870 (c++), uid 0, was killed: out of swap space

can be misleading: that is not necessarily the actual
context.

The book "The Design and Implementation of the FreeBSD Operating System"
(2nd edition, 2014) states (page labeled 296):

QUOTE:
The FreeBSD swap-out daemon will not select a runnable processes to swap
out. So, if the set of runnable processes do not fit in memory, the
machine will effectively deadlock. Current machines have enough memory
that this condition usually does not arise. If it does, FreeBSD avoids
deadlock by killing the largest process. If the condition begins to arise
in normal operation, the 4.4BSD algorithm will need to be restored.
END QUOTE.

If there were no prior messages like:

sentinel kernel: swap_pager_getswapspace(32): failed

and tools such as swapinfo or top do not show low
swap space available, then it is unlikely that "out of
swap space" is the correct wording for the message.
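
The check described above can be sketched as a small shell filter. This is only an illustration: the function name classify_kill is hypothetical, and the two patterns are simply the messages quoted in this report, so other kernel versions may word them differently.

```sh
#!/bin/sh
# classify_kill (hypothetical helper): reads kernel log text on stdin and
# reports whether a log containing an "out of swap space" kill message
# also contains swap allocation failures.
classify_kill() {
    log=$(cat)
    if ! printf '%s\n' "$log" | grep -q 'was killed: out of swap space'; then
        echo "no kill message found"
    elif printf '%s\n' "$log" | grep -q 'swap_pager_getswapspace.*failed'; then
        echo "swap allocation failures present: swap may really be exhausted"
    else
        echo "no swap allocation failures: the kill is probably not about full swap"
    fi
}

# Usage on the affected machine:
#   dmesg | classify_kill
```

The filter does not look at the order of the messages, so it is a first hint rather than a diagnosis.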

The arm list has many reports of this for RPI2's and
RPI3's (which are also 1 GiByte machines). In many cases
this is for head (12.0-CURRENT), but 11.x also shows
such issues as I understand. The reports also frequently
involve UFS (so no ARC memory use, for example).

Note: 4.4BSD is from long ago. The potential for the issue
is not new. What is new is building modern versions of
clang and other llvm materials.
Comment 3 Mark Millard 2018-08-06 03:35:27 UTC
(In reply to Vladislav V. Prodan from comment #0)

It looks like the top and swapinfo output was not captured
during or just before the problem, but after the buildworld
had already stopped and its memory use had gone away.

That makes the information of limited use.
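
One way to capture memory and swap state while the build is still running is to sample it in the background. The following is a minimal sketch, not a tested procedure for this bug: run_with_sampler is a hypothetical helper, and the log path, the 10 second interval, and the choice of the vm.stats.vm.v_free_count sysctl in the usage comment are assumptions.

```sh
#!/bin/sh
# run_with_sampler LOG INTERVAL SAMPLER_CMD BUILD_CMD  (hypothetical helper)
# Runs SAMPLER_CMD every INTERVAL seconds in the background, appending a
# timestamp and its output to LOG, while BUILD_CMD runs in the foreground.
# Returns the exit status of BUILD_CMD.
run_with_sampler() {
    log=$1 interval=$2 sampler=$3 build=$4
    ( while :; do date; sh -c "$sampler"; sleep "$interval"; done ) >>"$log" 2>&1 &
    sampler_pid=$!
    sh -c "$build"
    status=$?
    # Stop the background sampler once the build has finished or failed.
    kill "$sampler_pid" 2>/dev/null || true
    wait "$sampler_pid" 2>/dev/null || true
    return "$status"
}

# Usage on FreeBSD, as root in /usr/src:
#   run_with_sampler /var/tmp/buildworld-mem.log 10 \
#       'swapinfo -m; sysctl vm.stats.vm.v_free_count' \
#       'make -j3 buildworld'
```

The last samples in the log before the kill message then show the state that matters, rather than the state after the memory has been released.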

But this may be related to reports on the arm list
for 1 GiByte RPI3's and RPI2's trying to buildworld
as well.

(Having ZFS being involved does complicate things
and the ARC does use more memory.)
Comment 4 Vladyslav V. Prodan 2018-08-06 04:20:11 UTC
I used to run buildworld for FreeBSD 10.x with 1GB RAM and 1GB swap.

Now the situation is different: another CPU, an odd number of CPU cores, the HDD controller changed from IDE to SATA (the IDE controller in VirtualBox has another bug), and no caching in the SATA controller.

I have now turned on caching in the SATA controller.

If this does not help, I'll try to allocate 1.5-2 GB of RAM and repeat make buildworld in the FreeBSD 11.2.
Comment 5 Vladyslav V. Prodan 2018-08-06 17:54:49 UTC
After switching on caching in the SATA controller in VirtualBox, buildworld and installworld for FreeBSD completed correctly.

https://a.radikal.ru/a06/1808/66/659faed8db8d.jpg

I consider this to be a bug in VirtualBox versions 5.2.14 and 5.2.16.
At the same time, it is an excellent testing ground for the behavior of faulty SATA controllers and of systems with damage to individual parts of RAM.
Comment 6 Rodney W. Grimes freebsd_committer freebsd_triage 2018-08-11 00:57:01 UTC
I would suggest that make with -j3 on a 1024MB machine is not a reasonable expectation. It is very easy for a compile or linker process to get into the 500MB size region, so I suggest you either increase the memory available to the VM or decrease the job count.

Because "runnable" processes are not swapped out in FreeBSD, this leads to an OOM condition, and the kill you see.
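
As rough arithmetic for the point above: three jobs at about 500MB each need about 1500MB, which is more than the 1024MB available. A minimal sketch of that estimate follows. max_jobs is a hypothetical helper, the 500MB per-job figure is the rough size mentioned above, and the 450MB reserve is only an assumption borrowed from the Wired figure in the top output in the description.

```sh
#!/bin/sh
# max_jobs RAM_MB RESERVED_MB PER_JOB_MB  (hypothetical helper)
# Prints how many jobs of PER_JOB_MB fit into RAM_MB after setting
# RESERVED_MB aside, but never less than 1.
max_jobs() {
    n=$(( ($1 - $2) / $3 ))
    if [ "$n" -lt 1 ]; then
        n=1
    fi
    echo "$n"
}

max_jobs 1024 0 500     # prints 2
max_jobs 1024 450 500   # prints 1
```

This is a heuristic for choosing a starting -j value, not a guarantee that the build fits: peak sizes vary between source files and compiler versions.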
Comment 7 Mark Millard 2018-08-11 01:51:19 UTC
Mark Johnston has indicated that after investigations
in small armv7 and aarch64 examples, such as rpi3's
and rpi2's (V1.1):

I do think the default [vm.pageout_oom_seq] value
is too low and will get that addressed in 12.0.

Mark J. had someone with rpi3 and rpi2 (V1.1)
usage experiment with:

sysctl vm.pageout_oom_seq=120

and things got much farther but it was not
a cure.

While this was not a cure, it was discovered that
when some other changes were made ("lowering the
pagedaemon sleep period") a fair time ago,
vm.pageout_oom_seq was not rescaled to roughly
match, making OOM kills happen more easily.

There are some patches for reporting information.
Mark J. has indicated that some variant of them
will likely become standard FreeBSD code that could
be enabled without needing patches, with 12.0 as
the target.

The lists have a long history tied to the
investigations on arm. I'll reference the first
Mark Johnston message here:

https://lists.freebsd.org/pipermail/freebsd-arm/2018-August/018506.html

His messages have most of the technical content
tied to internal evidence his patches produced
and what might be done. (Other folks produced data
from their environments, mostly one person.)

The prior activity does not get much into internal
activity tied to the issue.
Comment 8 Mark Millard 2018-08-12 17:27:19 UTC
(In reply to Mark Millard from comment #7)

[This is extracted from another context that
involved the Pine64+ 2GB.]

As of updating to -r337400 the Pine64+ 2GB no
longer will boot from the e.MMC on the microsd
adapter card. (I switched to tracking fully
modern dts use, u-boot, etc.)

So I tried a build via a USB SSD as the root
file system and swap partition. As reported in:

https://lists.freebsd.org/pipermail/freebsd-arm/2018-August/018605.html

it failed with an OOM kill.

This should have kept I/O latency problems from being
involved. (That message is part of a long ongoing
thread tied to OOM kills, with most of the reports
involving large I/O latencies.)

I can not change the "Affects Only Me" status.
Comment 9 Mark Millard 2018-08-12 17:51:23 UTC
(In reply to Mark Millard from comment #8)

Other related bugzilla reports are: 227609 and 230454.
Comment 10 chris 2018-09-01 14:10:18 UTC
I too am having this problem. I am trying to buildworld on a RPi 3 B+ (using FreeBSD Current with 1G of swap space), and the build fails during make of the clang source. I also notice in my dmesg log that I am getting:

warning: total configured swap (1048576 pages) exceeds maximum recommended amount (924056 pages).

I have been unable to find any useful documentation on kern.maxswzone.

From my own observations, the build does not seem to be using much swap space when it fails.

The buildworld on my Rpi 1 B+ with 512M of swap space works fine, and there is no reference in dmesg about maximum recommended swap space being exceeded.

Any support appreciated.
Comment 11 Mark Millard 2018-09-01 15:22:34 UTC
(In reply to chris from comment #10)

You are not explicit about what revision you are building.
My experience is with head (12), not 11.x .

If the following is supported:

sysctl vm.pageout_oom_seq=120

then do that before starting the first build after booting. The default
value of 12 is unlikely to work. Depending on what all is going on in
your I/O environment, this may prove insufficient but it likely would
get more of the build done. If the build does not complete, then
investigating your I/O latencies becomes relevant.

The figure is tied to how long FreeBSD tolerates low free
RAM conditions. (This wording is a simplification.) FreeBSD
does not swap running processes to gain more free RAM, only
processes that are idle for a while.
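
For reference, the setting can be applied along the following lines. This is a sketch under the assumption that the running kernel supports vm.pageout_oom_seq; the value 120 is the one suggested above, and appending to /etc/sysctl.conf is just one way to keep it across reboots.

```sh
# Show the current value (12 is the default mentioned above).
sysctl vm.pageout_oom_seq

# Raise it for the running system before starting the build (as root).
sysctl vm.pageout_oom_seq=120

# Optionally keep the setting across reboots.
echo 'vm.pageout_oom_seq=120' >> /etc/sysctl.conf
```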

Another point is the use of -j4 or other such vs. -j1 .
-j1 or other smaller figures are more likely to complete
(use less memory and have fewer long-running processes at
once). You were not explicit about your usage for this.

As for the swap space sizing (1 page = 4*1024 Bytes):

1048576 pages is 1048576 * (4*1024) Bytes, so 4 GiBytes, not the
1 GiByte referenced.

924056 pages is 924056 * (4*1024) Bytes, so a little over 3.5 GiBytes.

(Note the figures in the messages are system specific and can even
change some from build revision to revision for the same system.)
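
The conversions above can be reproduced with shell arithmetic. A minimal sketch, assuming the 4 KiByte page size stated above; pages_to_mib is a hypothetical helper name, and the result is rounded down to whole MiBytes.

```sh
#!/bin/sh
# pages_to_mib PAGES  (hypothetical helper)
# Converts a page count to whole MiBytes, assuming 4096-byte pages.
pages_to_mib() {
    echo $(( $1 * 4096 / 1048576 ))
}

pages_to_mib 1048576   # prints 4096 (4 GiBytes)
pages_to_mib 924056    # prints 3609 (a little over 3.5 GiBytes)
```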

I'd recommend staying at or under the 3.5 GiByte figure. But going
anywhere near 1 GiByte of swap is insufficient with 1 GiByte of RAM.
2 GiByte of swap should work with some room to spare.

Is the reference to 512M of swap in another context similarly off
by a factor of 4? If yes: 2 GiBytes of swap were in use. Otherwise?
Again -j4 or other such vs. -j1 matters to the RAM+SWAP use and the
number of long-running processes at once.

I recommend using swap partitions and avoiding the use of swap
files. (I've no clue which you are using.)
Comment 12 Mark Millard 2018-09-01 15:50:51 UTC
(In reply to Mark Millard from comment #11)

I forgot to mention limiting the linker (lld)
to single threaded operation as a potential
help relative to RAM usage during builds:

LDFLAGS.lld+= -Wl,--no-threads

in a make.conf or src.conf like file used for
the likes of buildworld buildkernel activity.
Comment 13 Mark Millard 2018-09-02 04:04:28 UTC
(In reply to chris from comment #10)

See:

https://lists.freebsd.org/pipermail/freebsd-arm/2018-September/018797.html

for a report that vm.pageout_oom_seq=1024 was helpful for someone who
has had great difficulty getting rpi3 buildworlds to complete reliably.
Comment 14 chris 2018-09-03 13:52:17 UTC
Thank you all for your help and comments.

I set vm.pageout_oom_seq=120 and buildworld using -j 1 completed OK.

I presume the problem was I/O latency related. I have a swap file on a USB stick, which might be slow. During the build only 3% of my 1GB of swap was used.

Cheers

Chris
Comment 15 Dimitry Andric freebsd_committer freebsd_triage 2019-11-06 06:46:20 UTC
Apparently the conclusion in comment 14 was that it now worked.  Please reopen if you are sure that it is not an OOM issue.
Comment 16 Dimitry Andric freebsd_committer freebsd_triage 2019-11-06 06:46:43 UTC
Apparently the conclusion in comment 14 was that it now worked.  Please reopen if you are sure that it is not an OOM issue.