Summary: | With buildworld, the system can not use swap
---|---
Product: | Base System
Reporter: | Vladyslav V. Prodan <admin>
Component: | misc
Assignee: | freebsd-bugs (Nobody) <bugs>
Status: | Closed FIXED
Severity: | Affects Some People
CC: | chris, dim, marklmi26-fbsd, rgrimes, virtualization
Priority: | ---
Version: | 11.2-STABLE
Hardware: | amd64
OS: | Any
See Also: | https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=227609, https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=230454, https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=241726

Description
Vladyslav V. Prodan
2018-08-06 03:05:37 UTC
The second file is larger than the 1MB limit, so I posted a link to mega.nz:
ARMISelLowering-c1b581.cpp.zip (2.2 MB)
https://mega.nz/#!5tgwUYII!LFruPlHHBwz_aMSjQjLdItH5q-7G6Kd8dvLgTSMJGKQ

(In reply to Vladislav V. Prodan from comment #0)

Unfortunately, the message:

pid 9870 (c++), uid 0, was killed: out of swap space

can be misleading: it does not necessarily describe the actual context. The book "The Design and Implementation of the FreeBSD Operating System" (2nd edition, 2014) states (page labeled 296):

QUOTE: The FreeBSD swap-out daemon will not select a runnable processes to swap out. So, if the set of runnable processes do not fit in memory, the machine will effectively deadlock. Current machines have enough memory that this condition usually does not arise. If it does, FreeBSD avoids deadlock by killing the largest process. If the condition begins to arise in normal operation, the 4.4BSD algorithm will need to be restored. END QUOTE.

If there were no prior messages like:

sentinel kernel: swap_pager_getswapspace(32): failed

and tools such as swapinfo or top do not show low available swap space, then "out of swap space" is unlikely to be the correct wording for the message.

The arm list has many reports of this for RPI2's and RPI3's (which are also 1 GiByte machines). In many cases this is for head (12.0-CURRENT), but as I understand it 11.x also shows such issues. The reports also frequently involve UFS (so no ARC memory use, for example).

Note: 4.4BSD is from long ago. The potential for the issue is not new. What is new is building modern versions of clang and other llvm materials.

(In reply to Vladislav V. Prodan from comment #0)

It looks like the top and swapinfo information is not from during or just before the problem but from after the memory use had gone away, because the buildworld had already stopped. This makes that information not obviously useful. But this may be related to reports on the arm list for 1 GiByte RPI3's and RPI2's trying to buildworld as well.
(Having ZFS involved does complicate things, and the ARC does use more memory.)

I used to run make buildworld for FreeBSD 10.x with 1GB RAM and 1GB swap. Now the situation is different: another CPU, an odd number of CPU cores, an HDD controller with IDE on SATA (the IDE controller with Virtualbox is another bug), and no caching in the SATA controller. I have now turned on the SATA controller caching. If this does not help, I'll try to allocate 1.5-2 GB of RAM and repeat make buildworld in FreeBSD 11.2.

After switching on the caching in the SATA controller in Virtualbox, buildworld and installworld for FreeBSD completed correctly.
https://a.radikal.ru/a06/1808/66/659faed8db8d.jpg
I consider this to be a bug of Virtualbox versions 5.2.14 and 5.2.16. But at the same time, it is an excellent testing ground for the behavior of faulty SATA controllers and for cases of damage to individual parts of RAM.

I would suggest that make with -j3 on a 1024MB machine is not a reasonable expectation. It is very easy for a compile or linker process to get into the 500MB size region, so I suggest you either increase available memory to the VM or decrease the job count. Because "runnable" processes are not swapped in FreeBSD, this leads to an OOM condition, and the kill you see.

Mark Johnston has indicated, after investigations on small armv7 and aarch64 examples such as rpi3's and rpi2's (V1.1):

I do think the default [vm.pageout_oom_seq] value is too low and will get that addressed in 12.0.

Mark J. had someone with rpi3 and rpi2 (V1.1) usage experiment with:

sysctl vm.pageout_oom_seq=120

and things got much farther, but it was not a cure. It was also discovered that when some other changes were made ("lowering the pagedaemon sleep period") a fair time ago, vm.pageout_oom_seq was not rescaled to roughly match, which made OOM kills happen more easily. There are some patches for reporting information that Mark J.
has indicated will likely have some variant become standard FreeBSD code that could be enabled without needing patches, with 12.0 as the target.

The lists have a long history tied to the investigations on arm. I'll reference the first Mark Johnston message here:

https://lists.freebsd.org/pipermail/freebsd-arm/2018-August/018506.html

His messages have most of the technical content tied to internal evidence his patches produced and what might be done. (Other folks produced data from their environments, mostly one person.) The prior activity does not get much into internal activity tied to the issue.

(In reply to Mark Millard from comment #7)

[This is extracted from another context that involved the Pine64+ 2GB.]

As of updating to -r337400 the Pine64+ 2GB will no longer boot from the e.MMC on the microsd adapter card. (I switched to tracking fully modern dts use, u-boot, etc.) So I tried a build via a USB SSD as the root file system and swap partition. As reported in:

https://lists.freebsd.org/pipermail/freebsd-arm/2018-August/018605.html

it failed with an OOM kill. This setup should have kept I/O latency problems from being involved. (That message is part of a long on-going thread tied to OOM kills, with most of the reports involving large I/O latencies.)

I can not change the "Affects Only Me" status.

(In reply to Mark Millard from comment #8)

Other bugzilla's are: 227609, 230454.

I too am having this problem. I am trying to buildworld on a RPi 3 B+ (using FreeBSD Current with 1G of swap space) and the build fails during make of the clang source. I also notice in my dmesg log that I am getting:

warning: total configured swap (1048576 pages) exceeds maximum recommended amount (924056 pages).

I have been unable to find any useful documentation on kern.maxswzone. From my own observations the build does not seem to use up much swap space as the build fails.
The buildworld on my Rpi 1 B+ with 512M of swap space works fine, and there is no reference in dmesg about maximum recommended swap space being exceeded. Any support appreciated.

(In reply to chris from comment #10)

You are not explicit about what revision you are building. My experience is with head (12), not 11.x. If the following is supported:

sysctl vm.pageout_oom_seq=120

then do that before starting the first build after booting. The default value of 12 is unlikely to work. Depending on everything that is going on in your I/O environment, this may prove insufficient, but it would likely get more of the build done. If the build does not complete, then investigating your I/O latencies becomes relevant.

The figure is tied to how long FreeBSD tolerates low free RAM conditions. (This wording is a simplification.) FreeBSD does not swap running processes to gain more free RAM, only processes that are idle for a while.

Another point is the use of -j4 or similar vs. -j1. -j1 or other smaller figures are more likely to complete (they use less memory and have fewer long-running processes at once). You were not explicit about your usage for this.

As for the swap space sizing (1 page = 4*1024 Bytes):

1048576 pages is 1048576 * (4*1024) Bytes, so 4 GiBytes, not the 1 GiByte referenced.
924056 pages is 924056 * (4*1024) Bytes, so a little over 3.5 GiBytes.

(Note the figures in the messages are system specific and can even change somewhat from build revision to revision for the same system.) I'd recommend staying at or under the 3.5 GiByte figure. But going anywhere near 1 GiByte of swap is insufficient with 1 GiByte of RAM. 2 GiByte of swap should work with some room to spare.

Is the reference to 512M of swap in the other context similarly off by a factor of 4? If yes: 2 GiBytes of swap were in use. Otherwise? Again, -j4 or similar vs. -j1 matters to the RAM+SWAP use and the number of long-running processes at once.

I recommend using swap partitions and avoiding the use of swap files.
(I've no clue which you are using.)

(In reply to Mark Millard from comment #11)

I forgot to mention limiting the linker (lld) to single threaded operation as a potential help relative to RAM usage during builds:

LDFLAGS.lld+= -Wl,--no-threads

in a make.conf or src.conf like file used for the likes of buildworld buildkernel activity.

(In reply to chris from comment #10)

See:

https://lists.freebsd.org/pipermail/freebsd-arm/2018-September/018797.html

for a report that vm.pageout_oom_seq=1024 was helpful for someone who has had great difficulties getting rpi3 buildworld's to repeatedly complete.

Thank you all for your help and comments. I set vm.pageout_oom_seq=120 and buildworld using -j 1 completed OK. I presume the problem was I/O latency related. I have a swap file on a USB stick which might be slow. During the build only 3% of my 1GB of swap was used.

Cheers
Chris

Apparently the conclusion in comment 14 was that it now worked. Please reopen if you are sure that it is not an OOM issue.
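
The page-count arithmetic in the reply to comment #10 can be checked with a short script. This is only a minimal sketch, assuming the 4 KiB page size stated in that reply; the helper name pages_to_gib is illustrative and not part of FreeBSD.

```python
# Convert the page counts from the dmesg warning into GiB.
# Assumption: 1 page = 4*1024 bytes, as stated in the reply to comment #10.
PAGE_SIZE = 4 * 1024   # bytes per page
GIB = 2 ** 30          # bytes per GiB


def pages_to_gib(pages: int, page_size: int = PAGE_SIZE) -> float:
    """Return the size in GiB of `pages` pages of `page_size` bytes each."""
    return pages * page_size / GIB


if __name__ == "__main__":
    configured = pages_to_gib(1048576)   # "total configured swap"
    recommended = pages_to_gib(924056)   # "maximum recommended amount"
    print(f"configured swap:     {configured:.3f} GiB")
    print(f"recommended maximum: {recommended:.3f} GiB")
    print(f"over the maximum:    {configured > recommended}")
```

Running it on the two figures from the warning shows whether the configured swap really is 1 GiByte or, as the reply argues, 4 GiBytes against a recommended maximum of a little over 3.5 GiBytes.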