During some build stages rust/cargo eats a lot of memory with its default settings, making it more or less not viable for low/mid-range systems. Can we consider setting codegen-units to 1, or adding a toggle for it, and perhaps also for parallel compiling? Ref: https://reviews.freebsd.org/D30099#677659
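For context, outside of the ports framework the same knob is normally passed to rustc through a Cargo profile or via RUSTFLAGS; a minimal sketch of the generic mechanism (not necessarily how lang/rust itself would wire it in):

# Generic Cargo/rustc usage, shown only to illustrate the flag being discussed;
# the lang/rust port builds the compiler itself and may need a different hook.
RUSTFLAGS="-C codegen-units=1" cargo build --release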
Daniel, I'm sorry, not trying to ignore you but unsure of expectations. Seems worthwhile but someone needs to run the builds, get some before and after numbers, research downsides if any, and then I guess we can set codegen-units=1 if it looks ok, sure. As for some of the comments from the review: > [...] uses LLVM from their package tree instead of bundled, perhaps that's worth looking into? lang/rust had an LLVM_PORT option once, but that only works if somebody feels responsible for supporting it and fixing any regressions that might happen. But nobody really did so we removed it. > Unbundle libssh2? It's bundled now I guess because we bundle libgit2 now too (because devel/libgit2 was not updated fast enough again). Since we update the toolchain every 6 weeks it is all probably not worth the hassle.
Hi, I mainly started to look into this because building Rust on my low-end server (specs below) failed despite having quite a bit of RAM and swap at its disposal. While this isn't an ideal way of logging it, here's a graph of memory usage at the end of a Rust compile (2 jobs) that succeeds: https://projects.pyret.net/files/public/freebsd/mem-usage-rustc.png This box is an old Dell T20 with a dual-core Intel Pentium G3220 CPU and 12 GB of RAM, running ZFS for Poudriere but not on rootfs. It runs 12.2-RELEASE-p6 and builds Rust in a 12.2 jail. It's lightly loaded and uses about 5-6 GB of RAM (incl. ZFS) without any Poudriere job running. I have no specific ZFS tuning set; however, from what I can tell the ZFS cache seems to grow quite a bit while compiling Rust. Setting codegen-units reduces memory usage by about 1-1.5 GB from what I can tell, but memory usage is still quite high. I also gave this a go on my RockPro64 (arm64, 4 GB of RAM) running 13-STABLE (stable/13-n245283-70a2e9a3d44), UFS only, and while it took 14+ hours (-j1) it did finish. During compiling it used about 2 GB at most (the job, not the complete system), which is a lot less than what I'm seeing on my server. I'll give this a go on another box running 13-STABLE (amd64) and see if that also consumes a lot of memory. Thanks for replying about LLVM and libssh2; if it's too much of a hassle I understand the decision :-)
Hmm... compiling & optimizing seems to use a bit more memory; I did see a few processes use more than 2.5 GB of memory. Wired memory is a lot higher though, ~4.8G, and it peaked at 6.7G, so I guess that's due to ZFS?
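If that wired growth is ZFS ARC, one common mitigation on memory-constrained build hosts is to cap the ARC size; a sketch, where the 2G cap is only an example value chosen to leave headroom for the builders:

# /boot/loader.conf -- example value only; size the cap for the machine
vfs.zfs.arc_max="2G"

Whether capping the ARC is enough by itself for a rust build is a separate question from the codegen-units setting.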
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/ports/commit/?id=294f0c5c206d70e24b6bbc28766d962dc82f8b61 commit 294f0c5c206d70e24b6bbc28766d962dc82f8b61 Author: Tobias Kortkamp <tobik@FreeBSD.org> AuthorDate: 2021-06-14 18:50:33 +0000 Commit: Tobias Kortkamp <tobik@FreeBSD.org> CommitDate: 2021-06-14 20:51:11 +0000 lang/rust-nightly: Try to reduce memory usage/pressure Try to reduce memory usage/pressure by only using one code generation unit. "This flag [codegen-units] controls how many code generation units the crate is split into. It takes an integer greater than 0. When a crate is split into multiple codegen units, LLVM is able to process them in parallel. Increasing parallelism may speed up compile times, but may also produce slower code. Setting this to 1 may improve the performance of generated code, but may be slower to compile." https://doc.rust-lang.org/rustc/codegen-options/index.html#codegen-units PR: 256099 Suggested by: Daniel Engberg lang/rust/Makefile | 3 +++ 1 file changed, 3 insertions(+)
Just wanted to report here that building Rust always gets OOM'ed after ~6 hours on my low-end build box, 10 times in a row. Spec:
- Intel i5-6500T (4) @ 2.496GHz
- 16GB RAM, 2GB swap
- FreeBSD 13.0-RELEASE amd64
- ZFS on root
It does finish on my VPS, which has a similar spec but more swap; however, the memory/swap usage is very high. Spec:
- Intel Xeon Platinum 8171M (4) @ 2.095GHz
- 16GB RAM, 32GB swap
- FreeBSD 13.0-RELEASE amd64
- UFS on root, with ZFS enabled on data disks
I haven't looked at it closely, so I will report back if I notice anything. Thanks!
Cannot get lang/rust-nightly to build on my build box with 16G of memory even with the code generation unit change. The CPU usage is down to one core, but the RAM pressure is still very high, and the whole process ended up getting OOM'ed.
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/ports/commit/?id=6f1fefb50e755d727f471aeb75ebe4e28f876b4b commit 6f1fefb50e755d727f471aeb75ebe4e28f876b4b Author: Tobias Kortkamp <tobik@FreeBSD.org> AuthorDate: 2021-09-07 08:14:14 +0000 Commit: Tobias Kortkamp <tobik@FreeBSD.org> CommitDate: 2021-09-19 09:03:21 +0000 lang/rust: Update to 1.55.0 - Set codegen-units=1 [1] - Add hack to skip cargo update on git sources as a step towards solving [2] - Fix 'capacity overflow' panics on armv* [3] Changes: https://blog.rust-lang.org/2021-09-09/Rust-1.55.0.html PR: 258337 PR: 256099 [1] PR: 256581 [2] PR: 257419 [3] Reviewed by: mikael, pkubaj Exp-run by: antoine Differential Revision: https://reviews.freebsd.org/D31872 With hat: rust Mk/Uses/cargo.mk | 2 +- Mk/bsd.gecko.mk | 2 +- lang/rust-bootstrap/Makefile | 8 +- lang/rust-bootstrap/distinfo | 6 +- lang/rust/Makefile | 12 +-- lang/rust/distinfo | 114 ++++++++++----------- ...m-project_compiler-rt_lib_builtins_cpu__model.c | 21 ++-- ...ols_cargo_src_cargo_sources_git_source.rs (new) | 45 ++++++++ ...rc_tools_cargo_src_cargo_util_toml_mod.rs (new) | 22 ++++ .../patch-vendor_openssl-sys_build_main.rs (gone) | 19 ---- ..._src_unix_bsd_freebsdlike_freebsd_mod.rs (gone) | 12 --- ..._unix_bsd_freebsdlike_freebsd_powerpc.rs (gone) | 50 --------- .../powerpc64-elfv1/patch-src_bootstrap_native.rs | 10 +- ...h-compiler_rustc__target_src_spec_mod.rs (gone) | 10 -- ...rc_spec_powerpc64le__unknown__freebsd.rs (gone) | 19 ---- 15 files changed, 154 insertions(+), 198 deletions(-)
(In reply to commit-hook from comment #7)
An FYI for systems with more resources . . . Prior to this change, during a from-scratch bulk -a using ALLOW_PARALLEL_JOBS= :

[05:52:06] [16] [04:29:11] Finished lang/rust | rust-1.54.0_2: Success

On the same machine after the change (again from scratch, using ALLOW_PARALLEL_JOBS= ):

[12:39:47] [14] [11:20:24] Finished lang/rust | rust-1.55.0: Success

So about 2.5 times longer (about 4.5 hrs -> 11.3 hrs). For reference: HoneyComb (16 Cortex-A72's) with 64 GiBytes of RAM, root on ZFS, Optane 480 media. Large swap on USB3 SSD media, but top indicated it was unused during both bulk -a builds. This test does not control what the other 15 builders were doing in the overlapping time frames in each bulk -a, but all the other builders were busy with a sequence of builds over that time. The load averages were well over 16, but I do not have a record of them over time for either bulk -a. I've another bulk -a going on that machine and it may be about a week before it finishes. (The 11:20:24 figure is from that ongoing bulk -a.)
(In reply to Mark Millard from comment #8)
I forgot to list that USE_TMPFS="data" was in use. I've built rust by itself with USE_TMPFS=yes (so "wrkdir data") in the past and the tmpfs use grew to around 17 GiBytes. Luckily I had swap configured that was sufficient on the machine that was used at the time. Having USE_TMPFS allow significant tmpfs sizes for port builds that use huge amounts of disk space basically requires an environment with sufficient resources arranged up front. The use of PCIe Optane media helps avoid I/O being as much of an issue as it could be with, say, spinning media.
(In reply to Mark Millard from comment #9) I found my old note about the tmpfs use for USE_TMPFS=yes for lang/rust : # df -m | grep tmpfs Filesystem 1M-blocks Used Avail Capacity Mounted on . . . tmpfs 301422 17859 283563 6% /usr/local/poudriere/data/.m/FBSDFSSDjail-default/01/wrkdirs . . . So the 17 GiBytes was only the "wrkdirs" contribution.
(In reply to Mark Millard from comment #8) I have similar result on my amd64 box: rust 1.55.0 with codegen-units=1 build time: 00:39:59 rust 1.55.0 without codegen-units=1 build time: 00:23:15
(In reply to Daniel Engberg from comment #0) What USE_TMPFS (or analogous) was in use?
(In reply to Guangyuan Yang from comment #5) What USE_TMPFS (or analogous) setting was in use?
(In reply to Guangyuan Yang from comment #5)
Unfortunately messages such as:

pid . . . (. . .), jid . . ., uid . . ., was killed: out of swap space

can be a misnomer for the "out of swap space" part: it can be reported even when none of the swap space had been in use. There are other possible reasons for why kills happen. One point is that FreeBSD will not swap out a process that stays runnable; even if its active memory use keeps the free RAM minimal, it just continues to page in and out. If it really was out of swap space there would also be messages like:

swap_pager_getswapspace(. . .): failed

or:

swap_pager: out of swap space

Other causes for the kills include: sustained low free RAM (via stays-runnable processes); a sufficiently delayed pageout; the swap blk uma zone was exhausted; the swap pctrie uma zone was exhausted. The first two of those have some tunables that you might want to try:

# Delay when persistent low free RAM leads to
# Out Of Memory killing of processes:
vm.pageout_oom_seq=120

# For plenty of swap/paging space (will not
# run out), avoid pageout delays leading to
# Out Of Memory killing of processes:
vm.pfault_oom_attempts=-1

# For possibly insufficient swap/paging space
# (might run out), increase the pageout delay
# that leads to Out Of Memory killing of
# processes (showing defaults at the time):
#vm.pfault_oom_attempts= 3
#vm.pfault_oom_wait= 10
# (The multiplication is the total but there
# are other potential tradeoffs in the factors
# multiplied, even for nearly the same total.)

I'll note that vm.pageout_oom_seq has a default of 12 but can be much larger than 120, such as 1024 or 10240 or even more. Larger figures increase the time before kills start happening because of sustained low free RAM. But no setting is designed to disable the kills from eventually happening on some scale.
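For what it's worth, both tunables can normally also be tried on a running system via sysctl(8) (and persisted in /etc/sysctl.conf or /boot/loader.conf) rather than requiring a reboot; a sketch:

# Try the values live first (as root), then persist whichever ones help:
sysctl vm.pageout_oom_seq=120
sysctl vm.pfault_oom_attempts=-1
# Inspect the current values:
sysctl vm.pageout_oom_seq vm.pfault_oom_attempts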
(In reply to Guangyuan Yang from comment #5)
The following is based on (in part):

USE_TMPFS="data"
ALLOW_PARALLEL_JOBS=

for building rust-1.54.0_2 (so: before the codegen-units change). It is a root-on-ZFS context. Also in use was /boot/loader.conf having:

vm.pageout_oom_seq=120
vm.pfault_oom_attempts=-1

I'll report figures based on my local top patches that record and report various "Maximum Observed" figures (MaxObs???? naming).

poudriere output:
. . .
[00:00:23] Building 1 packages using 1 builders
[00:00:23] Starting/Cloning builders
[00:00:27] Hit CTRL+t at any time to see build progress and stats
[00:00:27] [01] [00:00:00] Building lang/rust | rust-1.54.0_2
[05:10:56] [01] [05:10:29] Finished lang/rust | rust-1.54.0_2: Success
[05:11:35] Stopping 1 builders
. . .

Where the top output reported:
. . .; load averages: . . . MaxObs: 5.83, 5.09, 4.93 . . .
. . . threads: . . . 21 MaxObsRunning
. . . Mem: . . . 2285Mi MaxObsActive . . .
. . . Swap: 14336Mi Total, 14336Mi Free

(The "Swap:" line did not report any positive amount used.) No console messages at all. In other words: it never got near starting to use the swap partition that was active.

For reference . . . System: MACCHIATObin Double Shot (4 Cortex-A72's) with 16 GiBytes RAM. (So an aarch64 context.) Root-on-ZFS with no special tuning. main [So: 14]. 14336 MiByte swap partition active. The boot media is a portable USB3 SSD.

# uname -apKU
FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #12 main-n249019-0637070b5bca-dirty: Tue Aug 31 02:24:20 PDT 2021 root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72 arm64 aarch64 1400032 1400032

But:

# poudriere jail -j13_0R-CA72 -i
Jail name: 13_0R-CA72
Jail version: 13.0-RELEASE-p4
Jail arch: arm64.aarch64
Jail method: null
Jail mount: /usr/obj/DESTDIRs/13_0R-CA72-poud
Jail fs:
Jail updated: 2021-09-06 19:07:54
Jail pkgbase: disabled

And:

# cd /usr/ports
# ~/fbsd-based-on-what-commit.sh
branch: main
merge-base: b0c4eaac2a3aa9bc422c21b9d398e4dbfea18736
merge-base: CommitDate: 2021-09-07 21:55:24 +0000
b0c4eaac2a3a (HEAD -> main, freebsd/main, freebsd/HEAD) security/suricata: Add patch for upstream locking fix
n557269 (--first-parent --count for merge-base)
(In reply to Tobias Kortkamp from comment #1)
Based on comment #15 I expect that codegen-units was misidentified as the cause of the memory usage/pressure. I expect that USE_TMPFS including wrkdir, which for lang/rust can be 17 GiByte+, was instead the driving issue for memory use/pressure. USE_TMPFS="data" (avoiding wrkdir) is the primary thing that deals with the memory use/pressure from what I can tell. (USE_TMPFS=yes is equivalent to "wrkdir data".) Based on comment #8 and comment #11 I believe the change has negative consequences for various contexts, in part because of the lack of control from the OPTIONS. (The default should track what FreeBSD wants for the official package builders for the more-time vs. better-code-generation tradeoff. It is possible that would be the new setting. Such is not for me to say. But . . .) Given that USE_TMPFS="data" is what makes the big difference for memory use/pressure, I'd suggest reverting the change made for this bugzilla submittal until an OPTION controls the codegen-units setting (the rust default vs. 1) and the default for that OPTION is set to what the long-term official package builds should be based on.
(In reply to Mark Millard from comment #15)
I forgot to mention that I have set larger timeout values in /usr/local/etc/poudriere.conf than the defaults. So my experiment would not show reaching a default timeout, not that I expect such would have occurred in that experiment.
When I did some testing it did help because files were better optimized; however, it uses a single thread, just like when you use LTO vs. ThinLTO. This behaviour is also documented in Rust's documentation for the option.
(In reply to Daniel Engberg from comment #18)
If that help was with memory use/memory pressure, I'd not expect it to be as big of a difference as "wrkdir data" vs. just "data" for USE_TMPFS: "data" uses vastly less memory than the 17 GiByte+ figure. How much of a difference did codegen-units=1 make in your context? See comment 6 for someone reporting codegen-units=1 being insufficient in their context. (Many of my notes are tied to trying to help that person, since they gave enough detail for me to have specific suggestions and experiments to try, and my own experiments to report on.) My hope is that the build-time/code-optimization tradeoff ends up under explicit control at some point. I do not expect general agreement about lang/rust build time frames being shorter (default codegen-units) vs. the consequences of taking the larger build times, such as more optimized code (codegen-units=1). I'd expect the default to be the choice made for the official package builders.
(In reply to Mark Millard from comment #15)
I've started a bulk lang/rust on a Rock64 (4 Cortex-A53's) with 4 GiBytes of RAM and 14 GiBytes of swap and root on UFS (no ZFS use). (I normally avoid ZFS on systems with less than 8 GiBytes of RAM.) Again it is based on (in part):

USE_TMPFS="data"
ALLOW_PARALLEL_JOBS=

for building rust-1.54.0_2 (so: before the codegen-units change). Also in use was /boot/loader.conf having:

vm.pageout_oom_seq=120
vm.pfault_oom_attempts=-1

Again I have larger than default poudriere timeout settings. I'll report figures based on my local top patches that record and report various "Maximum Observed" figures (MaxObs???? naming). I expect that it will complete without using any swap space. (But the Cortex-A53's will take a long time compared to the prior MACCHIATObin Double Shot experiment.) It is possible that I'll have to adjust some timeout(s) and retry: lang/rust will be the largest thing that I've built in such a context. I will note that, with 4 GiBytes of RAM, the system would complain about being mistuned for swap with even near 16 GiBytes of swap.
(In reply to Mark Millard from comment #20)
I've also started a lang/rust build on an Orange Pi+ 2E (4 Cortex-A7's, so armv7) with 2 GiBytes of RAM and 3 GiBytes of swap. USB2 port, so slower I/O.

USE_TMPFS="data"
ALLOW_PARALLEL_JOBS=

and:

vm.pageout_oom_seq=120
vm.pfault_oom_attempts=-1

in use again, with larger than default poudriere timeouts. This will likely use a non-trivial amount of swap, unlike the Rock64. (The Rock64 has used somewhat under 6 MiBytes of swap early on. I've seen FreeBSD do such small usage when the need is non-obvious various times before.) This will also likely take a very long time to complete and may well need bigger timeouts. (Bigger vm.pageout_oom_seq too?) But I expect that with appropriate values for such set, the rust build will complete in this context. (I'm planning on adjusting timeouts to allow rust builds on these systems. So I've other reasons for the experiments but might as well report the results.) Again rust-1.54.0_2 (before the codegen-units=1 change). 1.54 had some problems on armv7 but, as I remember, not in building: later use. My prior armv7 build was on a Cortex-A72 (aarch64) targeting Cortex-A7 (armv7) via a jail that used -a arm.armv7 . (The Cortex-A72 can execute Cortex-A7 code.) But there was lots of RAM and cores for that, unlike this experiment.
(In reply to Mark Millard from comment #21) The armv7 (Cortex-A7) test is stopped for now because poudriere's time reporting is messed up, such as: [00:00:00] Creating the reference jail... done . . . [00:00:00] Balancing pool [main-CA7-default] [2021-09-25_23h11m13s] [balancing_pool:] Queued: 70 Built: 0 Failed: 0 Skipped: 0 Ignored: 0 Fetched: 0 Tobuild: 70 Time: -258342:-3:-36 [00:00:00] Recording filesystem state for prepkg... done . . .
(In reply to Mark Millard from comment #20)
For the Rock64 rust-1.54.0_2 build test with 4GiBytes of RAM using USE_TMPFS="data" and ALLOW_PARALLEL_JOBS= and vm.pageout_oom_seq=120 and vm.pfault_oom_attempts=-1 but not using codegen-units=1 :

. . .
[00:01:22] Building 1 packages using 1 builders
[00:01:22] Starting/Cloning builders
[00:01:34] Hit CTRL+t at any time to see build progress and stats
[00:01:34] [01] [00:00:00] Building lang/rust | rust-1.54.0_2
[16:11:35] [01] [16:10:01] Finished lang/rust | rust-1.54.0_2: Success
[16:12:12] Stopping 1 builders

where:

last pid: . . . load averages: . . . MaxObs: 5.60, 5.01, 4.85 . . .
. . . threads: . . . 11 MaxObsRunning
. . . Mem: . . . 2407Mi MaxObsActive, 995248Ki MaxObsWired, 3161Mi MaxObs(Act+Wir+Lndry)
Swap: 14336Mi Total, . . . 10712Ki MaxObsUsed, 2457Mi MaxObs(Act+Lndry+SwapUsed), 3171Mi MaxObs(Act+Wir+Lndry+SwapUsed)

So, somewhat under 10.5 MiBytes of swap used at some point (maximum observed by top). If no swap had been made active, it likely still would have finished just fine: no swap space (partition) required. Reminder: This was a UFS context with a USB3 SSD media, no ZFS use.
(In reply to Mark Millard from comment #22) I've started the 2 GiByte RAM armv7 test again, after patching poudriere-devel for the time reporting issue.
(In reply to Mark Millard from comment #21) For this armv7 test I should have listed that I was going to use: USE_TMPFS=no (instead of "data"). The test is still running.
(In reply to Mark Millard from comment #25)
For the Orange Pi+ 2E (armv7) rust-1.54.0_2 build test with 2GiBytes of RAM using USE_TMPFS=no and ALLOW_PARALLEL_JOBS= and vm.pageout_oom_seq=120 and vm.pfault_oom_attempts=-1 but not using codegen-units=1 :

. . .
[00:02:32] Building 1 packages using 1 builders
[00:02:32] Starting/Cloning builders
[00:03:21] Hit CTRL+t at any time to see build progress and stats
[00:03:21] [01] [00:00:00] Building lang/rust | rust-1.54.0_2
[25:09:49] [01] [25:06:28] Finished lang/rust | rust-1.54.0_2: Success
[25:10:27] Stopping 1 builders
. . .

. . . load averages: . . . MaxObs: 5.50, 5.13, 4.88 . . .
. . . threads: . . . 11 MaxObsRunning
. . . Mem: . . . 1559Mi MaxObsActive, 257660Ki MaxObsWired, 1837Mi MaxObs(Act+Wir+Lndry)
Swap: 3072Mi Total, . . . 320604Ki MaxObsUsed, 1898Mi MaxObs(Act+Lndry+SwapUsed), 2113Mi MaxObs(Act+Wir+Lndry+SwapUsed)

So: Well under 350 MiBytes of swap used for USE_TMPFS=no with 2 GiBytes of RAM. Swap space likely required, given its size vs. the 2 GiBytes. (USE_TMPFS="data" would have used more swap space.) Reminder: This was a UFS context with a USB3 SSD media, no ZFS use.
(In reply to Mark Millard from comment #23)
Looks like the Rock64 test was with USE_TMPFS=no instead of USE_TMPFS="data" .
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/ports/commit/?id=124261fa7deb972b647c686d2531bbba0a9a4223 commit 124261fa7deb972b647c686d2531bbba0a9a4223 Author: Tobias Kortkamp <tobik@FreeBSD.org> AuthorDate: 2021-10-18 16:38:55 +0000 Commit: Tobias Kortkamp <tobik@FreeBSD.org> CommitDate: 2021-10-25 08:46:49 +0000 lang/rust: Update to 1.56.0 - Drop codegen-units=1 again as it seems to negatively impact build time for some people [1] Changes: https://blog.rust-lang.org/2021/10/21/Rust-1.56.0.html PR: 256099 [1] PR: 259251 Reviewed by: jbeich, mikael, pkubaj Exp-run by: antoine Differential Revision: https://reviews.freebsd.org/D32553 Mk/Uses/cargo.mk | 2 +- Mk/bsd.gecko.mk | 4 +- lang/rust-bootstrap/Makefile | 2 +- lang/rust-bootstrap/distinfo | 6 +- lang/rust/Makefile | 10 +- lang/rust/distinfo | 102 ++++++++++----------- lang/rust/files/patch-src_bootstrap_install.rs | 8 +- ...c_tools_cargo_src_cargo_util_toml_mod.rs (gone) | 22 ----- ...h-compiler_rustc__target_src_spec_mod.rs (gone) | 10 -- ...et_src_spec_powerpc__unknown__freebsd.rs (gone) | 27 ------ .../powerpc/patch-src_bootstrap_native.rs (gone) | 25 ----- ..._src_unix_bsd_freebsdlike_freebsd_mod.rs (gone) | 12 --- ..._unix_bsd_freebsdlike_freebsd_powerpc.rs (gone) | 50 ---------- .../patch-vendor_openssl__src_src_lib.rs (gone) | 10 -- 14 files changed, 65 insertions(+), 225 deletions(-)
(In reply to Tobias Kortkamp from comment #1) Regarding bundled libgit2, I see the following complaint by stage-qa: ====> Running Q/A tests (stage-qa) Error: /usr/local/bin/cargo is linked to /usr/local/lib/libgit2.so.1.1 from devel/libgit2 but it is not declared as a dependency From that it would seem that if libgit2 is installed, lang/rust will use it (at least link with it if not compile with it, which might be worse) rather than the bundled libgit2. If the bundled version is still desired, we should take steps to ensure it uses only the bundled version and not have the system's installed libgit2 (if any) leak into the build. I don't have a patch at this time for accomplishing that (not well versed in the whole rust build process). Or if we want to use devel/libgit2, that would be fine as well (add it as a library dependency). Either way, make sure the build is explicit about which libgit2 it is using. I doubt it's worth making it an option - better (probably) for a package like rust to just make an executive decision and pick one way.
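A quick way to check which libgit2 a given cargo binary actually picked up is to look at its runtime linkage; a sketch:

# Inspect the installed binary's shared-library linkage:
ldd /usr/local/bin/cargo | grep -i git2
# A hit on /usr/local/lib/libgit2.so.* means the devel/libgit2 library leaked
# into the build; no hit suggests the bundled copy was linked in statically.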
(In reply to John Hein from comment #29) Please submit a patch to make sure it doesn't use system libgit2.
(In reply to John Hein from comment #29) The libgit2 issue seems to have nothing to do with the original problem description about "rust/cargo eats a lot of memory". A separate bugzilla submittal would seem to be appropriate instead.
*** Bug 265799 has been marked as a duplicate of this bug. ***
I guess we can close this by now
(In reply to Daniel Engberg from comment #33)
I'd say it's debatable. Still can't build rust at all on a low-end system with 3 G RAM. Only setting codegen units to 1 helps there. This also breaks the logic of MAKE_JOBS_NUMBER: one expects a single CPU/core to be used when the variable equals 1, which is not true with multiple codegen units. IMO defaults should be safe for anyone, and if one wants shorter build times, it's their responsibility to enable parallel jobs, tune compiler options, etc.
(In reply to Anton Saietskii from comment #34)
I recently built rust on an armv7 with 2 GiBytes of RAM, with llvm18 building at the same time, using all 4 cores (2 for rust and 2 for llvm18). It was part of a from-scratch build of 265 packages. https://lists.freebsd.org/archives/freebsd-ports/2024-March/005792.html has my notes for multiple 4-core little arm boards, mostly aarch64 but the one armv7 example as well.

The 2 GiByte context details are:

Context: 1GHz, 4 core, cortex-a7 (armv7), 2 GiBytes RAM, USB2. RAM+SWAP: 5.6 GiBytes.

Also, this is doing my normal armv7 (and aarch64) style of devel/llvm* build: OPTION'd to BE_NATIVE instead of BE_STANDARD and OPTION'd to not build MLIR. (No adjustment of options for rust.)

/usr/local/etc/poudriere.conf has . . .
NO_ZFS=yes
USE_TMPFS=no
PARALLEL_JOBS=2
ALLOW_MAKE_JOBS=yes
MAX_EXECUTION_TIME=432000
NOHANG_TIME=432000
MAX_EXECUTION_TIME_EXTRACT=14400
MAX_EXECUTION_TIME_INSTALL=14400
MAX_EXECUTION_TIME_PACKAGE=57600
MAX_EXECUTION_TIME_DEINSTALL=14400

Not essential:
PRIORITY_BOOST="cmake-core llvm18 rust"

/usr/local/etc/poudriere.d/make.conf has . . .
MAKE_JOBS_NUMBER_LIMIT=2

(With PARALLEL_JOBS=2 that keeps the load averages under 5 most of the time.)

/etc/fstab does not specify any tmpfs use or the like: avoids competing for RAM+SWAP.

RAM == 2 GiBytes
RAM+SWAP == 5.6 GiBytes (So: SWAP == 3.6 GiBytes)

I also historically use USB SSD/NVMe media, no spinning rust, no microsd cards or such.

/boot/loader.conf has . . .
#
# Delay when persistent low free RAM leads to
# Out Of Memory killing of processes:
vm.pageout_oom_seq=120

This is important to allowing various things to complete. (The default is 12. 120 is not the maximum but has been appropriate in my context. The figure is not in time units, but larger increases the observed delay, so more work gets done before OOM activity starts.) Using vm.pageout_oom_seq is not specific to poudriere use.

Other notes:
2794Mi MaxObs(Act+Wir+Lndry+SwapUsed)
Swap: 995524Ki MaxObsUsed

It finished overall, in somewhat under 5.5 days.
The "what builds took over an hour" summary is:

[01:51:31] [01] [01:00:07] Finished lang/perl5.36 | perl5-5.36.3_1: Success
[08:55:35] [02] [03:08:09] Finished devel/icu | icu-74.2,1: Success
[13:17:38] [02] [01:28:32] Finished lang/ruby31 | ruby-3.1.4_1,1: Success
[14:17:44] [01] [09:20:55] Finished devel/cmake-core | cmake-core-3.28.3: Success
[4D:01:03:43] [02] [3D:08:48:53] Finished lang/rust | rust-1.76.0: Success
[4D:06:26:24] [02] [03:09:35] Finished devel/binutils@native | binutils-2.40_5,1: Success
[4D:14:54:31] [02] [03:38:55] Finished devel/aarch64-none-elf-gcc | aarch64-none-elf-gcc-11.3.0_3: Success
[4D:16:13:00] [01] [4D:01:55:03] Finished devel/llvm18@default | llvm18-18.1.0.r3: Success
[4D:18:05:58] [02] [03:11:00] Finished devel/arm-none-eabi-gcc | arm-none-eabi-gcc-11.3.0_3: Success
[4D:23:00:13] [01] [06:46:06] Finished devel/boost-libs | boost-libs-1.84.0: Success
[5D:00:16:39] [01] [01:15:53] Finished textproc/source-highlight | source-highlight-3.1.9_9: Success
[5D:01:17:24] [02] [07:10:52] Finished lang/gcc13 | gcc13-13.2.0_4: Success
[5D:09:38:14] [01] [05:56:48] Finished devel/freebsd-gcc13@armv7 | armv7-gcc13-13.2.0_1: Success
[5D:10:18:58] [02] [05:44:02] Finished devel/gdb@py39 | gdb-14.1_2: Success
[5D:10:31:56] Stopping 2 builders
[main-CA7-default] [2024-03-06_03h15m10s] [committing] Queued: 265 Built: 265 Failed: 0 Skipped: 0 Ignored: 0 Fetched: 0 Tobuild: 0 Time: 5D:10:31:55

In /etc/rc.conf I have:

if [ "`sysctl -i -n hw.fdt.model`" == "Xunlong Orange Pi Plus 2E" ]; then
    sysctl dev.cpu.0.freq=1008 > /dev/null
fi

In other words: a fixed 1GHz or so clock rate is used. It has heatsinks and a fan. I happen to build for armv7 with use of -mcpu=cortex-a7 generally (kernel, world, and packages).
(In reply to Mark Millard from comment #35)
I'm testing a similar aarch64 build (271 packages for aarch64, it turns out) based on using total_mem=2048 in a RPi4B config.txt. (I've no access to a native 2 GiByte aarch64.) While it looks like it may well complete, I did get:

# tail -3 /var/log/messages
Mar 23 16:39:38 aarch64-main-pkgs kernel: pid 37137 (conftest), jid 11, uid 0: exited on signal 11 (core dumped)
Mar 24 04:51:50 aarch64-main-pkgs kernel: swap_pager: cannot allocate bio
Mar 24 04:51:50 aarch64-main-pkgs syslogd: last message repeated 3 times

(Nothing looks to have failed and the build got past the peak RAM+SWAP use and is continuing. I'd never seen this type of message before.) rust and llvm18 were building at the time. The from-scratch bulk build is of 271 packages. 143 had already built. llvm18 was using more RAM+SWAP than rust and was working on some llvm-tblgen runs for AMDGPU at the time. It might be that building just llvm18 with all 4 hardware threads active would not have been able to complete.

I've a modified top that tracks some "MaxObs" (Maximum Observed) figures. They happened to show:

Mem: . . . 1473Mi MaxObsActive, 477304Ki MaxObsWired, 1908Mi MaxObs(Act+Wir+Lndry)
Swap: . . . 3101Mi MaxObsUsed, 4456Mi MaxObs(Act+Lndry+SwapUsed), 4887Mi MaxObs(A+Wir+L+SU), 4933Mi (A+W+L+SU+InAct)

(The 4933Mi (A+W+L+SU+InAct) is from when 4887Mi MaxObs(A+Wir+L+SU) was live but is not a MaxObs figure itself.) So a little under 4.9 GiBytes of RAM+SWAP in use at the time. It was paging significantly at the time, of course.

For reference:

/usr/local/etc/poudriere.conf has . . .
NO_ZFS=yes
USE_TMPFS=data
PARALLEL_JOBS=2
ALLOW_MAKE_JOBS=yes
MAX_EXECUTION_TIME=432000
NOHANG_TIME=432000
MAX_EXECUTION_TIME_EXTRACT=14400
MAX_EXECUTION_TIME_INSTALL=14400
MAX_EXECUTION_TIME_PACKAGE=57600
MAX_EXECUTION_TIME_DEINSTALL=14400

/usr/local/etc/poudriere.d/make.conf has . . .
MAKE_JOBS_NUMBER_LIMIT=2

/boot/loader.conf has . . .
vm.pageout_oom_seq=120

FYI: Using USE_TMPFS=no or USE_TMPFS=data (and avoiding other tmpfs use) avoids rust using huge amounts of RAM+SWAP for tmpfs, and rust ends up using less peak RAM+SWAP than llvm18 does.
(In reply to Mark Millard from comment #36) Note: The "(conftest)" line was only included for its timestamp, indicating no other just-prior message for the "swap_pager: cannot allocate bio" messages. The "(conftest)" line is normal output for the overall bulk build.
(In reply to Mark Millard from comment #36)
Despite the 4 "swap_pager: cannot allocate bio" notices, the aarch64 bulk build of 271 packages completed for the 2 GiByte RAM (88.5 GiByte RAM+SWAP) context:

[2D:01:44:36] Stopping 2 builders
[main-aarch64-pkgbase-default] [2024-03-23_11h33m25s] [committing] Queued: 271 Built: 271 Failed: 0 Skipped: 0 Ignored: 0 Fetched: 0 Tobuild: 0 Time: 2D:01:44:39

MaxObsWired increased after my prior note, so updating:

Mem: . . . 1473Mi MaxObsActive, 680564Ki MaxObsWired, 1908Mi MaxObs(Act+Wir+Lndry)
Swap: . . . 3101Mi MaxObsUsed, 4456Mi MaxObs(Act+Lndry+SwapUsed), 4887Mi MaxObs(A+Wir+L+SU), 4933Mi (A+W+L+SU+InAct)

(The 4933Mi (A+W+L+SU+InAct) is from when 4887Mi MaxObs(A+Wir+L+SU) was live but is not a MaxObs [MAXimum OBServed] figure itself.)

I conclude from the examples that, for aarch64 and armv7, "can't build rust at all on a low-end system with 3 G RAM. Only setting codegen units to 1 helps there" is false: One can build rust and llvm18 at the same time with only 2 GiBytes of RAM --but doing so requires use of appropriate SWAP space and avoiding nearly all tmpfs use, as well as use of the likes of PARALLEL_JOBS=2 and MAKE_JOBS_NUMBER_LIMIT=2 to limit the parallel activity. Also likely: avoiding ZFS being active (automatic/implicit for what I tested).

Repeating (with adjustments) the note about the RAM+SWAP usage of rust vs. llvm18: Using USE_TMPFS=no or USE_TMPFS=data (and avoiding most other tmpfs use) avoids rust using huge amounts of RAM+SWAP for tmpfs and rust ends up using less peak RAM+SWAP than llvm18 does.
(In reply to Mark Millard from comment #38)
> I conclude from the examples that, for aarch64 and armv7, "can't build rust at all on a low-end system with 3 G RAM. Only setting codegen units to 1 helps there" is false

I didn't say anything about ARM. amd64, 3G RAM + 3G swap, ONE thread, ZFS, USE_TMPFS=no -- the build fails. Yes, and also no swap-related tuning, and there shouldn't really have to be any -- it would affect everything on the machine, while it's rust itself that we need to and can fix. I mean it should simply build on a default install.
(In reply to Anton Saietskii from comment #39) > I didn't say anything about ARM. amd64 . . . I do not see you referencing amd64 before the above. I used what fit the text "on a low-end system with 3 G RAM" (or less than 3 GiBytes of RAM) that I happened to have access to. I do not have access to a amd64 system with a small RAM size. The experiment would be interesting to me if I had access to such a context. The closest I could do is to monitor Act+Wir+Lndry+SwapUsed despite the lack of significant memory pressure. (So, for example, SwapUsed would likely stay zero and Lndry might as well.) If I do such, I'll report on the results.
(In reply to Anton Saietskii from comment #39)
I managed to set up an amd64, 4-core (one hardware thread each), 2 GiByte RAM, 9.5 GiByte RAM+SWAP, UFS, Hyper-V virtual machine:

. . .
Hypervisor: Origin = "Microsoft Hv"
real memory = 2147483648 (2048 MB)
avail memory = 2033299456 (1939 MB)
Event timer "LAPIC" quality 100
ACPI APIC Table: <VRTUAL MICROSFT>
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 1 package(s) x 4 core(s)
. . .

(from gpart show:)
. . .
2921332736 7340032 da0p3 freebsd-swap (3.5G)
. . .
2919770112 8388608 da1p4 freebsd-swap (4.0G)
. . .

So, overall: RAM == 2.0 GiBytes, RAM+SWAP == 9.5 GiBytes (using both swap partitions). It actually is directly using my normal FreeBSD UFS boot media for the 7950X3D system. (To make the small swap space I had to use 2 smaller, previously-free spaces, split across 2 FreeBSD media.) It is running my experimental bulk build that builds 271 packages, including rust and llvm18 building at the same time as part of the overall sequence. It is to the point that rust and llvm18 are what are building. (My attempt to get useful MAXimum OBServed information absent the memory pressure from a 2 GiByte restriction failed to give useful memory figures. So this has the 2 GiByte restriction. I could try 3 GiBytes or other small figures if appropriate, but I might not be able to scale RAM+SWAP as I usually do.) More later . . . I'll report sometime after the build. (I may be sleeping when it finishes.)
(In reply to Anton Saietskii from comment #39)
rust finished building but llvm18 (and more) is still building:

[01:56:31] [02] [01:30:29] Finished lang/rust | rust-1.76.0: Success

During rust+llvm18 both building . . .

RAM: 1405Mi MaxObsActive 607948Ki MaxObsWired 1944Mi MaxObs(Act+Wir+Lndry)
SWAP: 2609Mi MaxObsUsed
RAM+SWAP: 3932Mi MaxObs(Act+Lndry+SwapUsed) 4474Mi MaxObs(A+Wir+L+SU) 4528Mi (A+W+L+SU+InAct)

I now conclude from the examples that, for amd64, aarch64, and armv7, "can't build rust at all on a low-end system with 3 G RAM. Only setting codegen units to 1 helps there" is false, given that I have a counter-example from each of the 3 contexts. Adding "with ZFS/ARC in use" to your wording may well prevent generating counter-examples to the extended statement. I'll note that for MaxObsWired (under RAM), the ZFS/ARC goes in the Wired category. Having a larger Wired would mean having less space for Active+Inact+Lndry, no matter how much swap space has been set up, given a fixed 3 GiByte RAM space.

Note: I assume that your "ONE thread" reference means that, effectively, you had some equivalent of using the combination:

PARALLEL_JOBS=1
MAKE_JOBS_NUMBER_LIMIT=1

More after it is all done building . . .
(In reply to Mark Millard from comment #42)
The overall bulk build finished:

[03:42:24] Stopping 2 builders
. . .
[main-amd64-bulk_a-default] [2024-03-25_22h26m11s] [committing] Queued: 271 Built: 271 Failed: 0 Skipped: 0 Ignored: 0 Fetched: 0 Tobuild: 0 Time: 03:42:25

MaxObsWired is the only MaxObs figure that increased, as tends to happen some when there is less memory pressure:

RAM: 1405Mi MaxObsActive 827040Ki MaxObsWired 1944Mi MaxObs(Act+Wir+Lndry)
SWAP: 2609Mi MaxObsUsed
RAM+SWAP: 3932Mi MaxObs(Act+Lndry+SwapUsed) 4474Mi MaxObs(A+Wir+L+SU) 4528Mi (A+W+L+SU+InAct)

(The 4528Mi (A+W+L+SU+InAct) is from when 4474Mi MaxObs(A+Wir+L+SU) was live but is not a MaxObs [MAXimum OBServed] figure itself.)

I'm glad that I now have a context for such smaller RAM tests on amd64. They do not take nearly as long to complete compared to the RPi4B and Orange Pi+ 2E contexts.
As a simpler test in the amd64 Hyper-V context that I'd described, I tried a MAKE_JOBS_NUMBER_LIMIT=4 based run of just building rust:

# poudriere bulk -jmain-amd64-bulk_a -C lang/rust
. . .
[00:47:26] [01] [00:47:19] Finished lang/rust | rust-1.76.0: Success
[00:47:34] Stopping 1 builders
. . .
[main-amd64-bulk_a-default] [2024-03-26_04h00m50s] [committing] Queued: 1 Built: 1 Failed: 0 Skipped: 0 Ignored: 0 Fetched: 0 Tobuild: 0 Time: 00:47:35

It got:

Mem: 1395Mi MaxObsActive 823960Ki MaxObsWired 1943Mi MaxObs(Act+Wir+Lndry)
Swap: 2152Mi MaxObsUsed
RAM+SWAP: 3361Mi MaxObs(Act+Lndry+SwapUsed) 3912Mi MaxObs(A+Wir+L+SU) 4089Mi (A+W+L+SU+InAct)

(The 4089Mi (A+W+L+SU+InAct) is from when 3912Mi MaxObs(A+Wir+L+SU) was live but is not a MaxObs [MAXimum OBServed] figure itself.)
(In reply to Anton Saietskii from comment #39)
"The Design and Implementation of the FreeBSD Operating System", 2nd ed., page 542 says of ZFS:

QUOTE
ZFS was designed to manage and operate enormous filesystems easily, which it does well. Its design assumed that it would have many fast 64-bit CPUs with large amounts of memory to support these enormous filesystems. When these resources are available, it works extremely well. However, it is not designed for or well suited to run on resource-constrained systems using 32-bit CPUs with less than 8 Gbyte of memory and one small, nearly-full disk, which is typical of many embedded systems.
END QUOTE

Another quote from the prior page:

QUOTE
Like all non-overwriting filesystems, ZFS operates best when at least a quarter of its disk pool is free. Write throughput becomes poor when the pool gets too full. By contrast, UFS can run well to 95 percent full and acceptably to 99 percent full.
END QUOTE

It does not appear that you have "many fast 64-bit CPUs", and you have "less than 8 Gbyte of memory" by a fair amount. Other aspects of the relationships to the quotes are less clear. Still, as I understand it, your context is not well suited to ZFS use for resource-intensive activity like building packages (or ports), at least absent rather special-case tuning.
(In reply to Anton Saietskii from comment #39)
I happened to run another build test on the Orange Pi+ 2E (2 GiBytes of RAM), so armv7. lang/rust 1.79.0 was involved this time. Its build stopped with:

[ 86% 2867/3315] Linking CXX executable bin/llvm-ar
FAILED: bin/llvm-ar
. . .
LLVM ERROR: out of memory
Allocation failed
. . .
[ 86% 2867/3315] Linking CXX shared library lib/libLTO.so.18.1-rust-1.79.0-stable
FAILED: lib/libLTO.so.18.1-rust-1.79.0-stable
. . .
LLVM ERROR: out of memory
Allocation failed

(No system OOM kills or notices.) These look to be process size/process-fragmented-space issues, not system RAM+SWAP size issues. My odd, patched-up top reported for the overall system:

Mem: . . ., 1728Mi MaxObsActive, 275192Ki MaxObsWired, 1952Mi MaxObs(Act+Wir+Lndry)
Swap: . . ., 1535Mi MaxObsUsed, 3177Mi MaxObs(Act+Lndry+SwapUsed), 3398Mi MaxObs(A+Wir+L+SU), 3449Mi (A+W+L+SU+InAct)

These "Max Observed" figures had been reached earlier in the overall building of ports. This was for a context with:

AVAIL_RAM_during_boot+SWAP == 1958Mi+3685Mi == 5643Mi

So there still was notable RAM+SWAP space available.
In trying to answer someone's question(s) I did another configuration test via Hyper-V, this time a larger configuration . . . I set up an 8192 MiByte RAM, 30720 MiByte SWAP (so 38 GiByte RAM+SWAP), 16 FreeBSD-cpu context, UFS only, TMPFS_BLACKLIST in use listing rust, vm.pageout_oom_seq=120 in use, and built lang/rust while using my adjusted version of top that monitors and reports maximum-observed figures:

[00:00:10] [01] [00:00:00] Building lang/rust | rust-1.82.0_1
[00:25:16] [01] [00:25:06] Finished lang/rust | rust-1.82.0_1: Success

RAM: 6367Mi MaxObsActive 1631Mi MaxObsWired 7845Mi MaxObs(Act+Wir+Lndry)
Swap: 13570Mi MaxObsUsed
RAM+SWAP: 19925Mi MaxObs(Act+Lndry+SwapUsed) 21359Mi MaxObs(A+Wir+L+SU) with 21468Mi (A+W+L+SU+InAct) at the time

So 8192 MiBytes RAM + 13570 MiBytes SWAP used is 21762 MiBytes RAM+SWAP, which is somewhat under 22 GiBytes total (overhead included). (One would want more margin for variations/growth.) Thus, if one sufficiently avoided competing for RAM for other tradeoffs, 32 GiBytes of RAM should be plenty for the 16 FreeBSD-cpus: something like 10 GiBytes of RAM to spare. But tmpfs alone will likely be more than twice that 10 GiByte figure for the configuration that was reported. This wording ignores ZFS ARC competition, as I've no clue what sort of figures one would end up seeing for that. FYI: the TMPFS_BLACKLIST use listing rust meant that lang/rust used only about 3.59 GiBytes of tmpfs, instead of the roughly 24 GiBytes of tmpfs that USE_TMPFS=all would normally end up using.
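For completeness, the TMPFS_BLACKLIST arrangement referred to above is configured in poudriere.conf roughly as follows (the values and the tmpdir path are examples, not a recommendation):

# /usr/local/etc/poudriere.conf
USE_TMPFS=all
# Build the listed ports' work directories on real storage instead of tmpfs:
TMPFS_BLACKLIST="rust"
# Where blacklisted wrkdirs are placed (must be on disk, not tmpfs):
TMPFS_BLACKLIST_TMPDIR=${BASEFS}/data/cache/tmp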