Bug 256099

Summary: lang/rust: Reduce memory usage/pressure
Product: Ports & Packages Reporter: Daniel Engberg <diizzy>
Component: Individual Port(s)    Assignee: FreeBSD Rust Team <rust>
Status: Closed Overcome By Events    
Severity: Affects Only Me CC: diizzy, gasol.wu, hostmaster+freebsd, jcfyecrayz, marklmi26-fbsd, mikael, rudolphfroger, vas, vsasjason, ygy
Priority: --- Flags: bugzilla: maintainer-feedback? (rust)
Version: Latest   
Hardware: Any   
OS: Any   

Description Daniel Engberg freebsd_committer freebsd_triage 2021-05-23 11:54:28 UTC
During some build stages rust/cargo eats a lot of memory with its default settings, making it more or less not viable for low/mid-range systems.

Can we consider setting codegen-units to 1, or adding a toggle for it, and perhaps also for parallel compiling?
Ref: https://reviews.freebsd.org/D30099#677659
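
For illustration only (this is not necessarily how the port itself would wire it up): codegen-units is normally set either per rustc invocation, e.g. via RUSTFLAGS for cargo builds, or, when building the Rust toolchain itself, via the bootstrap config.toml:

# Illustrative only -- passed to rustc for ordinary cargo builds:
RUSTFLAGS="-C codegen-units=1" cargo build --release
#
# When building the rust toolchain itself, the bootstrap config.toml
# offers a similar knob:
# [rust]
# codegen-units = 1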
Comment 1 Tobias Kortkamp freebsd_committer freebsd_triage 2021-06-04 12:04:08 UTC
Daniel, I'm sorry, I'm not trying to ignore you, but I'm unsure of the expectations.
Seems worthwhile but someone needs to run the builds, get some
before and after numbers, research downsides if any, and then I
guess we can set codegen-units=1 if it looks ok, sure.

As for some of the comments from the review:

> [...] uses LLVM from their package tree instead of bundled, perhaps that's worth looking into?

lang/rust had an LLVM_PORT option once, but that only works if
somebody feels responsible for supporting it and fixing any regressions
that might happen.  But nobody really did so we removed it.

> Unbundle libssh2?

It's bundled now I guess because we bundle libgit2 now too (because
devel/libgit2 was not updated fast enough again).  Since we update
the toolchain every 6 weeks it is all probably not worth the hassle.
Comment 2 Daniel Engberg freebsd_committer freebsd_triage 2021-06-06 21:52:18 UTC
Hi,

I mainly started to look into this as building Rust on my low-end server (specs below) failed despite having quite a bit of RAM and swap at its disposal.

While this isn't an ideal way of logging, here's a graph of memory usage at the end of a rust compile (2 jobs) that succeeded.
https://projects.pyret.net/files/public/freebsd/mem-usage-rustc.png

This box is an old Dell T20 with a dual-core Intel Pentium G3220 CPU and 12 GB of RAM, running ZFS for Poudriere but not on the rootfs. It runs 12.2-RELEASE-p6 and builds Rust in a 12.2 jail. It's lightly loaded and uses about 5-6 GB of RAM (incl. ZFS) without any Poudriere job running. I have no specific ZFS tuning set; however, from what I can tell the ZFS cache seems to grow quite a bit while compiling Rust.

Setting codegen-units reduces memory usage by about 1-1.5 GB from what I can tell, but memory usage is still quite high.

I also gave this a go on my RockPro64 (arm64, 4 GB of RAM) running 13-STABLE (stable/13-n245283-70a2e9a3d44), UFS only, and while it took 14+ hours (-j1) it did finish. During compiling it used about 2 GB tops (the job, not the complete system), which is a lot less than what I'm seeing on my server.

I'll give this a go on another box running 13-STABLE (amd64) and see if that also consumes a lot of memory.

Thanks for replying about LLVM and libssh2, if it's too much of a hassle I understand the decision :-)
Comment 3 Daniel Engberg freebsd_committer freebsd_triage 2021-06-07 01:08:15 UTC
Hmm... compiling & optimizing seems to use a bit more memory; I did see a few processes use more than 2.5 GB of memory. Wired memory is a lot higher though, ~4.8 GB, and peaked at 6.7 GB, so I guess that's due to ZFS?
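
If the wired growth is the ZFS ARC, it can be inspected and, if needed, capped. A minimal sketch, assuming the stock sysctl names:

# Inspect the current ARC size (bytes):
sysctl kstat.zfs.misc.arcstats.size
# Cap the ARC, e.g. at 4 GiB, via /boot/loader.conf:
# vfs.zfs.arc_max=4294967296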
Comment 4 commit-hook freebsd_committer freebsd_triage 2021-06-14 20:52:02 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=294f0c5c206d70e24b6bbc28766d962dc82f8b61

commit 294f0c5c206d70e24b6bbc28766d962dc82f8b61
Author:     Tobias Kortkamp <tobik@FreeBSD.org>
AuthorDate: 2021-06-14 18:50:33 +0000
Commit:     Tobias Kortkamp <tobik@FreeBSD.org>
CommitDate: 2021-06-14 20:51:11 +0000

    lang/rust-nightly: Try to reduce memory usage/pressure

    Try to reduce memory usage/pressure by only using one code generation
    unit.

    "This flag [codegen-units] controls how many code generation units
    the crate is split into.  It takes an integer greater than 0.

    When a crate is split into multiple codegen units, LLVM is able to
    process them in parallel.  Increasing parallelism may speed up
    compile times, but may also produce slower code.  Setting this to
    1 may improve the performance of generated code, but may be slower
    to compile."

    https://doc.rust-lang.org/rustc/codegen-options/index.html#codegen-units

    PR:             256099
    Suggested by:   Daniel Engberg

 lang/rust/Makefile | 3 +++
 1 file changed, 3 insertions(+)
Comment 5 Guangyuan Yang freebsd_committer freebsd_triage 2021-06-27 09:27:01 UTC
Just wanted to report here that building Rust always gets OOM'ed after ~6 hours on my low-end build box, 10 times in a row. Spec:

- Intel i5-6500T (4) @ 2.496GHz
- 16GB RAM, 2GB swap
- FreeBSD 13.0-RELEASE amd64
- ZFS on root

It does finish on my VPS, which has a similar spec but more swap; however, the memory/swap usage is very high. Spec:

- Intel Xeon Platinum 8171M (4) @ 2.095GHz
- 16GB RAM, 32GB swap
- FreeBSD 13.0-RELEASE amd64
- UFS on root, with ZFS enabled on datadisks

I haven't looked at it closely, so I will report back if I notice anything. Thanks!
Comment 6 Guangyuan Yang freebsd_committer freebsd_triage 2021-06-30 01:33:57 UTC
Cannot get lang/rust-nightly to build on my build box with 16 GB of memory, with the code generation unit change applied. The CPU usage is down to one core, but the RAM pressure is still very high, and the whole process ended up getting OOM'ed.
Comment 7 commit-hook freebsd_committer freebsd_triage 2021-09-19 09:16:18 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=6f1fefb50e755d727f471aeb75ebe4e28f876b4b

commit 6f1fefb50e755d727f471aeb75ebe4e28f876b4b
Author:     Tobias Kortkamp <tobik@FreeBSD.org>
AuthorDate: 2021-09-07 08:14:14 +0000
Commit:     Tobias Kortkamp <tobik@FreeBSD.org>
CommitDate: 2021-09-19 09:03:21 +0000

    lang/rust: Update to 1.55.0

    - Set codegen-units=1 [1]
    - Add hack to skip cargo update on git sources as a step towards solving [2]
    - Fix 'capacity overflow' panics on armv* [3]

    Changes:        https://blog.rust-lang.org/2021-09-09/Rust-1.55.0.html
    PR:             258337
    PR:             256099 [1]
    PR:             256581 [2]
    PR:             257419 [3]
    Reviewed by:    mikael, pkubaj
    Exp-run by:     antoine
    Differential Revision:  https://reviews.freebsd.org/D31872
    With hat:       rust

 Mk/Uses/cargo.mk                                   |   2 +-
 Mk/bsd.gecko.mk                                    |   2 +-
 lang/rust-bootstrap/Makefile                       |   8 +-
 lang/rust-bootstrap/distinfo                       |   6 +-
 lang/rust/Makefile                                 |  12 +--
 lang/rust/distinfo                                 | 114 ++++++++++-----------
 ...m-project_compiler-rt_lib_builtins_cpu__model.c |  21 ++--
 ...ols_cargo_src_cargo_sources_git_source.rs (new) |  45 ++++++++
 ...rc_tools_cargo_src_cargo_util_toml_mod.rs (new) |  22 ++++
 .../patch-vendor_openssl-sys_build_main.rs (gone)  |  19 ----
 ..._src_unix_bsd_freebsdlike_freebsd_mod.rs (gone) |  12 ---
 ..._unix_bsd_freebsdlike_freebsd_powerpc.rs (gone) |  50 ---------
 .../powerpc64-elfv1/patch-src_bootstrap_native.rs  |  10 +-
 ...h-compiler_rustc__target_src_spec_mod.rs (gone) |  10 --
 ...rc_spec_powerpc64le__unknown__freebsd.rs (gone) |  19 ----
 15 files changed, 154 insertions(+), 198 deletions(-)
Comment 8 Mark Millard 2021-09-23 19:24:07 UTC
(In reply to commit-hook from comment #7)

An FYI for systems with more resources . . .

Prior to this change during a from-scratch bulk -a using ALLOW_PARALLEL_JOBS= :

[05:52:06] [16] [04:29:11] Finished lang/rust | rust-1.54.0_2: Success

on the same machine after the change (again from-scratch
using ALLOW_PARALLEL_JOBS= ):

[12:39:47] [14] [11:20:24] Finished lang/rust | rust-1.55.0: Success

So about 2.5 times longer (about 4.5 hrs -> 11.3 hrs).

For reference:

HoneyComb (16 Cortex-A72's) with 64 GiBytes of RAM, root on ZFS,
Optane 480 media. Large swap on USB3 SSD media but top indicated
it was unused during both the bulk -a builds.

This test does not control exactly what the other 15 builders
were doing in the overlapping time frames of each bulk -a,
but all the other builders were busy with a sequence of
builds over that time. The load averages were well over 16,
but I do not have a record of them over time for either bulk -a.

I've another bulk -a going on that machine and it may be about
a week before it finishes. (The 11:20:24 figure is from that
ongoing bulk -a.)
Comment 9 Mark Millard 2021-09-23 19:36:00 UTC
(In reply to Mark Millard from comment #8)

I forgot to list that:

USE_TMPFS="data"

was in use.

I've built rust by itself with USE_TMPFS=yes (so "wrkdir data") in the
past and the tmpfs use grew to around 17 GiBytes. Luckily I had
sufficient swap configured on the machine it was done on at the
time.

Having USE_TMPFS allow significant tmpfs sizes for port builds
that use huge amounts of disk space basically requires an environment
with sufficient resources arranged up front.

The use of PCIe Optane media helps avoid I/O being as much of an issue
as it could be with, say, spinning media.
Comment 10 Mark Millard 2021-09-23 19:55:59 UTC
(In reply to Mark Millard from comment #9)

I found my old note about the tmpfs use for USE_TMPFS=yes
for lang/rust :

# df -m | grep tmpfs
Filesystem 1M-blocks   Used  Avail Capacity  Mounted on
. . .
tmpfs         301422  17859 283563     6%    /usr/local/poudriere/data/.m/FBSDFSSDjail-default/01/wrkdirs
. . .

So the 17 GiBytes was only the "wrkdirs" contribution.
Comment 11 Mikael Urankar freebsd_committer freebsd_triage 2021-09-24 09:22:20 UTC
(In reply to Mark Millard from comment #8)
I have a similar result on my amd64 box:

rust 1.55.0 with codegen-units=1
build time: 00:39:59


rust 1.55.0 without codegen-units=1
build time: 00:23:15
Comment 12 Mark Millard 2021-09-24 18:17:40 UTC
(In reply to Daniel Engberg from comment #0)

What USE_TMPFS (or analogous) was in use?
Comment 13 Mark Millard 2021-09-24 18:18:38 UTC
(In reply to Guangyuan Yang from comment #5)

What USE_TMPFS (or analogous) setting was in use?
Comment 14 Mark Millard 2021-09-24 19:08:43 UTC
(In reply to Guangyuan Yang from comment #5)

Unfortunately messages such as:

pid . . . (. . .), jid . . ., uid . . ., was killed: out of swap space

can be a misnomer for the "out of swap space" part: it can
be reported even when none of the swap space had been in use.
There are other possible reasons why kills happen. One
point is that FreeBSD will not swap out a process that stays
runnable; even if its active memory use keeps the free RAM
minimal, it just continues to page in and out.

If it really was out of swap space there would also be messages
like:

swap_pager_getswapspace(. . .): failed

or:

swap_pager: out of swap space

Other causes for the kills include:

Sustained low free RAM (via stays-runnable processes).
A sufficiently delayed pageout.
The swap blk uma zone was exhausted.
The swap pctrie uma zone was exhausted.

The first two of those have some tunables
that you might want to try:

# Delay when persistent low free RAM leads to
# Out Of Memory killing of processes:
vm.pageout_oom_seq=120

# For plenty of swap/paging space (will not
# run out), avoid pageout delays leading to
# Out Of Memory killing of processes:
vm.pfault_oom_attempts=-1

# For possibly insufficient swap/paging space
# (might run out), increase the pageout delay
# that leads to Out Of Memory killing of
# processes (showing defaults at the time):
#vm.pfault_oom_attempts= 3
#vm.pfault_oom_wait= 10
# (The multiplication is the total but there
# are other potential tradeoffs in the factors
# multiplied, even for nearly the same total.)


I'll note that vm.pageout_oom_seq has a default of 12
but can be much larger than 120, such as 1024 or 10240
or even more. Larger figures increase the time before
kills start happening because of sustained low free RAM.
But no setting is designed to disable the kills from
eventually happening on some scale.
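
A minimal sketch of applying the two tunables described above; both can be set at runtime and also persisted (comment #15 below sets them via /boot/loader.conf):

# At runtime:
sysctl vm.pageout_oom_seq=120
sysctl vm.pfault_oom_attempts=-1
# Or persisted across reboots in /etc/sysctl.conf or /boot/loader.conf:
# vm.pageout_oom_seq=120
# vm.pfault_oom_attempts=-1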
Comment 15 Mark Millard 2021-09-25 01:27:53 UTC
(In reply to Guangyuan Yang from comment #5)

The following is based on (in part):

USE_TMPFS="data"
ALLOW_PARALLEL_JOBS=

for building rust-1.54.0_2 (so: before the codegen-units change).
It is a root-on-ZFS context. Also in use was /boot/loader.conf
having:

vm.pageout_oom_seq=120
vm.pfault_oom_attempts=-1

I'll report figures based on my local top patches that record
and report various "Maximum Observed" figures (MaxObs???? naming).

poudriere output:

. . .
[00:00:23] Building 1 packages using 1 builders
[00:00:23] Starting/Cloning builders
[00:00:27] Hit CTRL+t at any time to see build progress and stats
[00:00:27] [01] [00:00:00] Building lang/rust | rust-1.54.0_2
[05:10:56] [01] [05:10:29] Finished lang/rust | rust-1.54.0_2: Success
[05:11:35] Stopping 1 builders
. . .

Where the top output reported:

. . .;  load averages:  . . . MaxObs:  5.83,  5.09,  4.93                                                                                            . . .
. . . threads: . . . 21 MaxObsRunning
. . .
Mem: . . . 2285Mi MaxObsActive . . .
. . .
Swap: 14336Mi Total, 14336Mi Free

(The "Swap:" line did not report any positive amount used.)

No console messages at all.

In other words: it never got near starting to use the
swap partition that was active.


For reference . . .

System: MACCHIATObin Double Shot (4 Cortex-A72's) with 16 GiBytes
        RAM. (So an aarch64 context.) Root-on-ZFS with no special
        tuning. main [So: 14]. 14336 MiByte swap partition active.
        The boot media is a portable USB3 SSD.

# uname -apKU
FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #12 main-n249019-0637070b5bca-dirty: Tue Aug 31 02:24:20 PDT 2021     root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72  arm64 aarch64 1400032 1400032

But:

# poudriere jail -j13_0R-CA72 -i
Jail name:         13_0R-CA72
Jail version:      13.0-RELEASE-p4
Jail arch:         arm64.aarch64
Jail method:       null
Jail mount:        /usr/obj/DESTDIRs/13_0R-CA72-poud
Jail fs:           
Jail updated:      2021-09-06 19:07:54
Jail pkgbase:      disabled

And:

# cd /usr/ports
# ~/fbsd-based-on-what-commit.sh 
branch: main
merge-base: b0c4eaac2a3aa9bc422c21b9d398e4dbfea18736
merge-base: CommitDate: 2021-09-07 21:55:24 +0000
b0c4eaac2a3a (HEAD -> main, freebsd/main, freebsd/HEAD) security/suricata: Add patch for upstream locking fix
n557269 (--first-parent --count for merge-base)
Comment 16 Mark Millard 2021-09-25 01:47:58 UTC
(In reply to Tobias Kortkamp from comment #1)

Based on comment #15 I expect that codegen-units was
misidentified as the cause of the memory usage/pressure.
I expect that USE_TMPFS including wrkdir, which for
lang/rust can mean 17 GiBytes+ of tmpfs, was instead the
driving issue for memory use/pressure. USE_TMPFS="data"
(avoiding wrkdir) is the primary thing that deals with
the memory use/pressure from what I can tell.
(USE_TMPFS=yes is equivalent to "wrkdir data".)

Based on comment #8 and comment #11 I believe the change
has negative consequences for various contexts, in part
based on lack of control from the OPTIONS.

(The default should track what FreeBSD wants for the official
package builders for the tradeoff for more-time vs. better code
generation. It is possible that would be the new setting. Such
is not for me to say. But . . .)

Given that USE_TMPFS="data" is what makes the big difference
for memory use/pressure, I'd suggest reverting the change made
for this bugzilla submittal until OPTIONS has control of the
codegen-units setting (the rust default vs. 1) and the default
for the OPTION is set to what the long-term official package
builds should be based on.
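
For reference, a minimal poudriere.conf sketch of the USE_TMPFS distinction being discussed (which value is appropriate depends on the RAM+SWAP available):

# /usr/local/etc/poudriere.conf
#USE_TMPFS=yes    # same as "wrkdir data"; lang/rust's wrkdir alone can exceed 17 GiBytes of tmpfs
USE_TMPFS=data    # tmpfs only for poudriere's data directory; far smaller memory footprint
#USE_TMPFS=no     # no tmpfs use at all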
Comment 17 Mark Millard 2021-09-25 02:19:32 UTC
(In reply to Mark Millard from comment #15)

I forgot to mention that I have set larger timeout
values in /usr/local/etc/poudriere.conf than the
defaults. So my experiment would not show reaching
a default timeout, not that I expect such would have
occurred in that experiment.
Comment 18 Daniel Engberg freebsd_committer freebsd_triage 2021-09-25 05:08:33 UTC
When I did some testing it did help because files were better optimized; however, it uses a single thread, just like when you use LTO vs. ThinLTO. This behaviour is also documented in Rust's documentation for this option.
Comment 19 Mark Millard 2021-09-25 06:47:18 UTC
(In reply to Daniel Engberg from comment #18)

If that help was with memory use/memory pressure, I'd
not expect it to be as big of a difference as "wrkdir data"
vs. just "data" for USE_TMPFS: "data" uses vastly less
memory than the 17 GiByte+ figure. How much of a difference
did codegen-units=1 make in your context?

See comment 6 for someone reporting codegen-units=1 being
insufficient in their context. (Many of my notes are tied
to trying to help that person since they gave enough detail
for me to have specific suggestions and experiments to try
and my own experiments to report on.)

My hope is that the build-time/code-optimization tradeoff ends
up under explicit control at some point. I do not expect
general agreement about lang/rust build time frames being
shorter (default codegen-units) vs. the consequences of taking
the larger build times such as more optimized code
(codegen-units=1). I'd expect the default to be for the choice
made for the official package builders.
Comment 20 Mark Millard 2021-09-25 19:34:01 UTC
(In reply to Mark Millard from comment #15)

I've started a bulk lang/rust on a Rock64 (4 Cortex-A53's) with 4 GiByte
of RAM and 14 GiByte of swap and root on UFS (no ZFS use). (I normally
avoid ZFS on systems with less than 8 GiBytes of RAM.)

Again: It is based on (in part):

USE_TMPFS="data"
ALLOW_PARALLEL_JOBS=

for building rust-1.54.0_2 (so: before the codegen-units change).
It is a root-on-UFS context. Also in use was /boot/loader.conf
having:

vm.pageout_oom_seq=120
vm.pfault_oom_attempts=-1

Again I have larger-than-default poudriere timeout settings.

I'll report figures based on my local top patches that record
and report various "Maximum Observed" figures (MaxObs???? naming).

I expect that it will complete without using any swap space. (But
the Cortex-A53's will take a long time compared to the prior
MACCHIATObin Double Shot experiment.) It is possible that I'll
have to adjust some timeout(s) and retry: lang/rust will be the
largest thing that I've built in such a context.


I will note that, with 4 GiByte of RAM, the system would complain about
being mistuned for swap with even near 16 GiBytes of swap.
Comment 21 Mark Millard 2021-09-26 02:01:07 UTC
(In reply to Mark Millard from comment #20)

I've also started a lang/rust build on an Orange Pi+ 2E
(4 Cortex-A7's, so armv7) with 2 GiBytes of RAM and
3 GiByte of swap. USB2 port, so slower I/O.

USE_TMPFS="data"
ALLOW_PARALLEL_JOBS=

and:

vm.pageout_oom_seq=120
vm.pfault_oom_attempts=-1

in use again, with larger than default poudriere timeouts.

This will likely use a non-trivial amount of swap, unlike
the Rock64. (The Rock64 has used somewhat under 6 MiByte
of swap early on. I've seen FreeBSD do such small usage
when the need is non-obvious various times before.)

This will also likely take a very long time to complete
and may well need bigger timeouts. (Bigger vm.pageout_oom_seq
too?) But I expect with appropriate values for such set the
rust build will complete in this context.

(I'm planning on adjusting timeouts to allow rust builds
on these systems. So I've other reasons for the experiments
but might as well report the results.)

Again rust-1.54.0_2 (before the codegen-units=1 change).
1.54 had some problems on armv7 but, as I remember, not in
building, only in later use. My prior armv7 build was on a Cortex-A72
(aarch64) targeting Cortex-A7 (armv7) via a jail that used -a
arm.armv7 . (The Cortex-A72 can execute Cortex-A7 code.)
But there was lots of RAM and cores for that, unlike this
experiment.
Comment 22 Mark Millard 2021-09-26 06:29:49 UTC
(In reply to Mark Millard from comment #21)

The armv7 (Cortex-A7) test is stopped for now because poudriere's
time reporting is messed up, such as:

[00:00:00] Creating the reference jail... done
. . .
[00:00:00] Balancing pool
[main-CA7-default] [2021-09-25_23h11m13s] [balancing_pool:] Queued: 70 Built: 0  Failed: 0  Skipped: 0  Ignored: 0  Fetched: 0  Tobuild: 70  Time: -258342:-3:-36
[00:00:00] Recording filesystem state for prepkg... done
. . .
Comment 23 Mark Millard 2021-09-26 20:06:04 UTC
(In reply to Mark Millard from comment #20)

For the Rock64 rust-1.54.0_2 build test with 4GiBytes of RAM using
USE_TMPFS="data" and ALLOW_PARALLEL_JOBS= and vm.pageout_oom_seq=120
and vm.pfault_oom_attempts=-1 but not using codegen-units=1 :

. . .
[00:01:22] Building 1 packages using 1 builders
[00:01:22] Starting/Cloning builders
[00:01:34] Hit CTRL+t at any time to see build progress and stats
[00:01:34] [01] [00:00:00] Building lang/rust | rust-1.54.0_2
[16:11:35] [01] [16:10:01] Finished lang/rust | rust-1.54.0_2: Success
[16:12:12] Stopping 1 builders

where:

last pid: . . .  load averages:  . . . MaxObs:  5.60,  5.01,  4.85                                                                                                . . .
. . . threads:    . . . 11 MaxObsRunning
. . .
Mem: . . . 2407Mi MaxObsActive, 995248Ki MaxObsWired, 3161Mi MaxObs(Act+Wir+Lndry)
Swap: 14336Mi Total, . . . 10712Ki MaxObsUsed, 2457Mi MaxObs(Act+Lndry+SwapUsed), 3171Mi MaxObs(Act+Wir+Lndry+SwapUsed)

So, somewhat under 10.5 MiBytes of swap used at some point (maximum
observed by top). If no swap had been made active, it likely still
would have finished just fine: no swap space (partition) required.

Reminder: This was a UFS context with a USB3 SSD media, no ZFS use.
Comment 24 Mark Millard 2021-09-27 06:15:50 UTC
(In reply to Mark Millard from comment #22)

I've started the 2 GiByte RAM armv7 test again,
after patching poudriere-devel for the time
reporting issue.
Comment 25 Mark Millard 2021-09-27 23:50:12 UTC
(In reply to Mark Millard from comment #21)

For this armv7 test I should have listed that I was going to use:

USE_TMPFS=no

(instead of "data").

The test is still running.
Comment 26 Mark Millard 2021-09-28 07:49:43 UTC
(In reply to Mark Millard from comment #25)

For the Orange Pi+ 2E (armv7) rust-1.54.0_2 build test with 2GiBytes
of RAM using USE_TMPFS=no and ALLOW_PARALLEL_JOBS= and
vm.pageout_oom_seq=120 and vm.pfault_oom_attempts=-1 but not using
codegen-units=1 :

. . .
[00:02:32] Building 1 packages using 1 builders
[00:02:32] Starting/Cloning builders
[00:03:21] Hit CTRL+t at any time to see build progress and stats
[00:03:21] [01] [00:00:00] Building lang/rust | rust-1.54.0_2
[25:09:49] [01] [25:06:28] Finished lang/rust | rust-1.54.0_2: Success
[25:10:27] Stopping 1 builders
. . .

. . .  load averages:  . . . MaxObs:  5.50,  5.13,  4.88                                                                                               . . .
. . . threads:    . . . 11 MaxObsRunning
. . .
Mem: . . . 1559Mi MaxObsActive, 257660Ki MaxObsWired, 1837Mi MaxObs(Act+Wir+Lndry)
Swap: 3072Mi Total, . . . 320604Ki MaxObsUsed, 1898Mi MaxObs(Act+Lndry+SwapUsed), 2113Mi MaxObs(Act+Wir+Lndry+SwapUsed)

So: Well under 350 MiBytes of swap used for USE_TMPFS=no with 2 GiBytes of RAM.
Swap space likely required, given its size vs. the 2 GiBytes. (USE_TMPFS="data"
would have used more swap space.)

Reminder: This was a UFS context with a USB3 SSD media, no ZFS use.
Comment 27 Mark Millard 2021-10-12 21:35:47 UTC
(In reply to Mark Millard from comment #23)

Looks like the Rock64 test was with USE_TMPFS=no instead of
USE_TMPFS="data" .
Comment 28 commit-hook freebsd_committer freebsd_triage 2021-10-25 08:59:06 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=124261fa7deb972b647c686d2531bbba0a9a4223

commit 124261fa7deb972b647c686d2531bbba0a9a4223
Author:     Tobias Kortkamp <tobik@FreeBSD.org>
AuthorDate: 2021-10-18 16:38:55 +0000
Commit:     Tobias Kortkamp <tobik@FreeBSD.org>
CommitDate: 2021-10-25 08:46:49 +0000

    lang/rust: Update to 1.56.0

    - Drop codegen-units=1 again as it seems to negatively impact build
      time for some people [1]

    Changes:        https://blog.rust-lang.org/2021/10/21/Rust-1.56.0.html
    PR:             256099 [1]
    PR:             259251
    Reviewed by:    jbeich, mikael, pkubaj
    Exp-run by:     antoine
    Differential Revision:  https://reviews.freebsd.org/D32553

 Mk/Uses/cargo.mk                                   |   2 +-
 Mk/bsd.gecko.mk                                    |   4 +-
 lang/rust-bootstrap/Makefile                       |   2 +-
 lang/rust-bootstrap/distinfo                       |   6 +-
 lang/rust/Makefile                                 |  10 +-
 lang/rust/distinfo                                 | 102 ++++++++++-----------
 lang/rust/files/patch-src_bootstrap_install.rs     |   8 +-
 ...c_tools_cargo_src_cargo_util_toml_mod.rs (gone) |  22 -----
 ...h-compiler_rustc__target_src_spec_mod.rs (gone) |  10 --
 ...et_src_spec_powerpc__unknown__freebsd.rs (gone) |  27 ------
 .../powerpc/patch-src_bootstrap_native.rs (gone)   |  25 -----
 ..._src_unix_bsd_freebsdlike_freebsd_mod.rs (gone) |  12 ---
 ..._unix_bsd_freebsdlike_freebsd_powerpc.rs (gone) |  50 ----------
 .../patch-vendor_openssl__src_src_lib.rs (gone)    |  10 --
 14 files changed, 65 insertions(+), 225 deletions(-)
Comment 29 John Hein 2021-10-28 11:49:32 UTC
(In reply to Tobias Kortkamp from comment #1)
Regarding bundled libgit2, I see the following complaint by stage-qa:

====> Running Q/A tests (stage-qa)
Error: /usr/local/bin/cargo is linked to /usr/local/lib/libgit2.so.1.1 from devel/libgit2 but it is not declared as a dependency


From that it would seem that if libgit2 is installed, lang/rust will use it (at least link with it if not compile with it, which might be worse) rather than the bundled libgit2.

If the bundled version is still desired, we should take steps to ensure it uses only the bundled version and not have the system's installed libgit2 (if any) leak into the build.  I don't have a patch at this time for accomplishing that (not well versed in the whole rust build process).

Or if we want to use devel/libgit2, that would be fine as well (add it as a library dependency).  Either way, make sure the build is explicit about which libgit2 it is using.  I doubt it's worth making it an option - better (probably) for a package like rust to just make an executive decision and pick one way.
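
For anyone reproducing this, a quick way to confirm which libgit2 the staged cargo binary links against (the path is taken from the stage-qa message above):

ldd /usr/local/bin/cargo | grep libgit2
# A line referencing /usr/local/lib/libgit2.so.* means the library from
# devel/libgit2 is being picked up instead of the bundled copy.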
Comment 30 Tobias Kortkamp freebsd_committer freebsd_triage 2021-11-10 13:11:32 UTC
(In reply to John Hein from comment #29)
Please submit a patch to make sure it doesn't use system libgit2.
Comment 31 Mark Millard 2021-11-11 05:38:42 UTC
(In reply to John Hein from comment #29)

The libgit2 issue seems to have nothing to do with the original problem
description about "rust/cargo eats a lot of memory". A separate bugzilla
submittal would seem to be appropriate instead.
Comment 32 Victor Sudakov 2023-01-19 11:12:32 UTC
*** Bug 265799 has been marked as a duplicate of this bug. ***
Comment 33 Daniel Engberg freebsd_committer freebsd_triage 2024-03-23 10:56:13 UTC
I guess we can close this by now
Comment 34 Anton Saietskii 2024-03-23 11:23:57 UTC
(In reply to Daniel Engberg from comment #33)

I'd say it's debatable. I still can't build rust at all on a low-end system with 3 GB of RAM; only setting codegen-units to 1 helps there.
This also breaks the logic of MAKE_JOBS_NUMBER: one expects a single CPU/core to be used when the variable equals 1, which is not true with multiple codegen units (see the sketch below).

IMO defaults should be safe for anyone, and if one wants shorter build times, it's their responsibility to enable parallel jobs, tune compiler options, etc.
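
To make the expectation concrete, a minimal make.conf sketch; MAKE_JOBS_NUMBER only caps the ports framework's parallel make jobs, while rustc's codegen units still parallelize within a single rustc invocation:

# /usr/local/etc/poudriere.d/make.conf (or /etc/make.conf)
MAKE_JOBS_NUMBER=1
# Even with this set, a build using the default codegen-units can keep
# several cores busy; codegen-units=1 is what serializes the LLVM work.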
Comment 35 Mark Millard 2024-03-23 14:18:08 UTC
(In reply to Anton Saietskii from comment #34)

I recently built rust on an armv7 system with 2 GiBytes of RAM, with
llvm18 building at the same time, using all 4 cores (2 for
rust and 2 for llvm18). It was part of a from-scratch build
of 265 packages.

https://lists.freebsd.org/archives/freebsd-ports/2024-March/005792.html

has my notes for multiple 4 core little arm boards, mostly
aarch64 but the one armv7 example as well.

The 2 GiByte context details are:

Context: 1GHz, 4 core, cortex-a7 (armv7), 2 GiBytes RAM, USB2.
RAM+SWAP: 5.6 GiBytes. Also, this is doing my normal armv7 (and
aarch64) style of devel/llvm* build: OPTION'd to BE_NATIVE
instead of BE_STANDARD and OPTION'd to not build MLIR.
(No adjustment of options for rust.)

/usr/local/etc/poudriere.conf has . . .

NO_ZFS=yes
USE_TMPFS=no
PARALLEL_JOBS=2
ALLOW_MAKE_JOBS=yes
MAX_EXECUTION_TIME=432000
NOHANG_TIME=432000
MAX_EXECUTION_TIME_EXTRACT=14400
MAX_EXECUTION_TIME_INSTALL=14400
MAX_EXECUTION_TIME_PACKAGE=57600
MAX_EXECUTION_TIME_DEINSTALL=14400

Not essential:
PRIORITY_BOOST="cmake-core llvm18 rust"


/usr/local/etc/poudriere.d/make.conf has . . .

MAKE_JOBS_NUMBER_LIMIT=2

(With PARALLEL_JOBS=2 that keeps the load averages
under 5 most of the time.)

/etc/fstab does not specify any tmpfs use or the
like: avoids competing for RAM+SWAP.

RAM       ==   2 GiBytes
RAM+SWAP  == 5.6 Gibytes
(So: SWAP == 3.6 GiBytes)

I also historically use USB SSD/NVMe media, no
spinning rust, no microsd cards or such.

/boot/loader.conf has . . .

#
# Delay when persistent low free RAM leads to
# Out Of Memory killing of processes:
vm.pageout_oom_seq=120

This is important to allowing various things
to complete. (The default is 12. 120 is not
the maximum but has been appropriate in my
context. The figure is not in time units but
larger increases the observed delay so more
work gets done before OOM activity starts.)

Using vm.pageout_oom_seq is not specific to
poudriere use.

Other notes:

2794Mi MaxObs(Act+Wir+Lndry+SwapUsed)
Swap: 995524Ki MaxObsUsed

It finished overall, in somewhat under 5.5 days. The "what
builds took over an hour" summary is:

[01:51:31] [01] [01:00:07] Finished lang/perl5.36 | perl5-5.36.3_1: Success
[08:55:35] [02] [03:08:09] Finished devel/icu | icu-74.2,1: Success
[13:17:38] [02] [01:28:32] Finished lang/ruby31 | ruby-3.1.4_1,1: Success
[14:17:44] [01] [09:20:55] Finished devel/cmake-core | cmake-core-3.28.3: Success
[4D:01:03:43] [02] [3D:08:48:53] Finished lang/rust | rust-1.76.0: Success
[4D:06:26:24] [02] [03:09:35] Finished devel/binutils@native | binutils-2.40_5,1: Success
[4D:14:54:31] [02] [03:38:55] Finished devel/aarch64-none-elf-gcc | aarch64-none-elf-gcc-11.3.0_3: Success
[4D:16:13:00] [01] [4D:01:55:03] Finished devel/llvm18@default | llvm18-18.1.0.r3: Success
[4D:18:05:58] [02] [03:11:00] Finished devel/arm-none-eabi-gcc | arm-none-eabi-gcc-11.3.0_3: Success
[4D:23:00:13] [01] [06:46:06] Finished devel/boost-libs | boost-libs-1.84.0: Success
[5D:00:16:39] [01] [01:15:53] Finished textproc/source-highlight | source-highlight-3.1.9_9: Success
[5D:01:17:24] [02] [07:10:52] Finished lang/gcc13 | gcc13-13.2.0_4: Success
[5D:09:38:14] [01] [05:56:48] Finished devel/freebsd-gcc13@armv7 | armv7-gcc13-13.2.0_1: Success
[5D:10:18:58] [02] [05:44:02] Finished devel/gdb@py39 | gdb-14.1_2: Success
[5D:10:31:56] Stopping 2 builders
[main-CA7-default] [2024-03-06_03h15m10s] [committing] Queued: 265 Built: 265 Failed: 0   Skipped: 0   Ignored: 0   Fetched: 0   Tobuild: 0    Time: 5D:10:31:55

In /etc/rc.conf I have:

if [ "`sysctl -i -n hw.fdt.model`" == "Xunlong Orange Pi Plus 2E" ]; then
 sysctl dev.cpu.0.freq=1008 > /dev/null
fi

In other words: a fixed 1GHz or so clock rate is used. It has heatsinks
and a fan.

I happen to build for armv7 with use of -mcpu=cortex-a7 
generally (kernel, world, and packages).
Comment 36 Mark Millard 2024-03-24 13:21:51 UTC
(In reply to Mark Millard from comment #35)

I'm testing a similar aarch64 build (271 packages
for aarch64, it turns out) based on using
total_mem=2048 in an RPi4B config.txt. (I've no
access to a native 2 GiByte aarch64 system.)

While it looks like it may well complete, I did
get:

# tail -3 /var/log/messages
Mar 23 16:39:38 aarch64-main-pkgs kernel: pid 37137 (conftest), jid 11, uid 0: exited on signal 11 (core dumped)
Mar 24 04:51:50 aarch64-main-pkgs kernel: swap_pager: cannot allocate bio
Mar 24 04:51:50 aarch64-main-pkgs syslogd: last message repeated 3 times

(Nothing looks to have failed and the build got past the
peak RAM+SWAP use and is continuing. I'd never seen this
type of message before.)

rust and llvm18 were building at the time. The from-scratch
bulk build is of 271 packages. 143 had already built.

llvm18 was using more RAM+SWAP than rust and was working
on some llvm-tblgen runs for AMDGPU at the time.

It might be that building just llvm18, allowing all 4
hardware threads to be active, would not have been able
to complete.

I've a modified top that tracks some "MaxObs" (Maximum Observed)
figures. They happened to show:

Mem:  . . . 1473Mi MaxObsActive, 477304Ki MaxObsWired, 1908Mi MaxObs(Act+Wir+Lndry)
Swap: . . . 3101Mi MaxObsUsed, 4456Mi MaxObs(Act+Lndry+SwapUsed), 4887Mi MaxObs(A+Wir+L+SU), 4933Mi (A+W+L+SU+InAct)

(The 4933Mi (A+W+L+SU+InAct) is from when 4887Mi MaxObs(A+Wir+L+SU) was
live but is not a MaxObs figure itself.)

So a little under 4.9 GiBytes of RAM+SWAP in use at the time.

It was paging significantly at the time, of course.

For reference:

/usr/local/etc/poudriere.conf has . . .

NO_ZFS=yes
USE_TMPFS=data
PARALLEL_JOBS=2
ALLOW_MAKE_JOBS=yes
MAX_EXECUTION_TIME=432000
NOHANG_TIME=432000
MAX_EXECUTION_TIME_EXTRACT=14400
MAX_EXECUTION_TIME_INSTALL=14400
MAX_EXECUTION_TIME_PACKAGE=57600
MAX_EXECUTION_TIME_DEINSTALL=14400

/usr/local/etc/poudriere.d/make.conf has . . .

MAKE_JOBS_NUMBER_LIMIT=2

/boot/loader.conf has . . .

vm.pageout_oom_seq=120

FYI:

Using USE_TMPFS=no or USE_TMPFS=data (and avoiding
other tmpfs use) avoids rust using huge amounts of
RAM+SWAP for tmpfs and ends up using less peak
RAM+SWAP than llvm18 does.
Comment 37 Mark Millard 2024-03-24 19:31:13 UTC
(In reply to Mark Millard from comment #36)

Note: The "(conftest)" line was only included for its timestamp,
indicating no other just-prior message for the "swap_pager:
cannot allocate bio" messages. The "(conftest)" line is normal
output for the overall bulk build.
Comment 38 Mark Millard 2024-03-25 20:49:43 UTC
(In reply to Mark Millard from comment #36)

Despite the 4 "swap_pager: cannot allocate bio" notices, the
aarch64 bulk build of 271 packages completed for the 2 GiByte
RAM (88.5 GiByte RAM+SWAP) context:

[2D:01:44:36] Stopping 2 builders
[main-aarch64-pkgbase-default] [2024-03-23_11h33m25s] [committing] Queued: 271 Built: 271 Failed: 0   Skipped: 0   Ignored: 0   Fetched: 0   Tobuild: 0    Time: 2D:01:44:39

MaxObsWired increased after my prior note, so updating:

Mem:  . . . 1473Mi MaxObsActive, 680564Ki MaxObsWired, 1908Mi MaxObs(Act+Wir+Lndry)
Swap: . . . 3101Mi MaxObsUsed, 4456Mi MaxObs(Act+Lndry+SwapUsed), 4887Mi MaxObs(A+Wir+L+SU), 4933Mi (A+W+L+SU+InAct)

(The 4933Mi (A+W+L+SU+InAct) is from when 4887Mi MaxObs(A+Wir+L+SU) was
live but is not a MaxObs [MAXimum OBServed] figure itself.)


I conclude from the examples that, for aarch64 and armv7, "can't
build rust at all on a low-end system with 3 G RAM. Only setting
codegen units to 1 helps there" is false: One can build rust and
llvm18 at the same time with only 2 GiBytes of RAM --but doing
so requires use of appropriate SWAP space and avoiding nearly all
tmpfs use, as well as use of the likes of PARALLEL_JOBS=2 and
MAKE_JOBS_NUMBER_LIMIT=2 to limit the parallel activity. Also
likely: avoiding ZFS being active (automatic/implicit for what
I tested).


Repeating (with adjustments) the note about the RAM+SWAP usage
of rust vs. llvm18:

Using USE_TMPFS=no or USE_TMPFS=data (and avoiding most other
tmpfs use) avoids rust using huge amounts of RAM+SWAP for
tmpfs and rust ends up using less peak RAM+SWAP than llvm18
does.
Comment 39 Anton Saietskii 2024-03-25 21:03:20 UTC
(In reply to Mark Millard from comment #38)

> I conclude from the examples that, for aarch64 and armv7, "can't
> build rust at all on a low-end system with 3 G RAM. Only setting
> codegen units to 1 helps there" is false
I didn't say anything about ARM. amd64, 3 GB RAM + 3 GB swap, ONE thread, ZFS, USE_TMPFS=no -- the build fails. Yes, and there is also no swap-related tuning, and there shouldn't really be any -- it would affect everything on the machine, while what we need to (and can) fix is rust itself. I mean it should simply build on a default install.
Comment 40 Mark Millard 2024-03-25 22:00:57 UTC
(In reply to Anton Saietskii from comment #39)

> I didn't say anything about ARM. amd64 . . .

I do not see you referencing amd64 before the above. I used what fit
the text "on a low-end system with 3 G RAM" (or less than 3 GiBytes
of RAM) that I happened to have access to.

I do not have access to a amd64 system with a small RAM size. The
experiment would be interesting to me if I had access to such a
context.

The closest I could do is to monitor Act+Wir+Lndry+SwapUsed despite
the lack of significant memory pressure. (So, for example, SwapUsed
would likely stay zero and Lndry might as well.) If I do such, I'll
report on the results.
Comment 41 Mark Millard 2024-03-26 06:12:05 UTC
(In reply to Anton Saietskii from comment #39)

I managed to set up an amd64, 4-core (one hardware
thread each), 2 GiByte RAM, 9.5 GiByte RAM+SWAP, UFS,
Hyper-V virtual machine:

. . .
Hypervisor: Origin = "Microsoft Hv"
real memory  = 2147483648 (2048 MB)
avail memory = 2033299456 (1939 MB)
Event timer "LAPIC" quality 100
ACPI APIC Table: <VRTUAL MICROSFT>
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 1 package(s) x 4 core(s)
. . . (from gpart show:) . . .
  2921332736     7340032  da0p3  freebsd-swap  (3.5G)
. . .
  2919770112     8388608  da1p4  freebsd-swap  (4.0G)
. . .

So, overall:
RAM      == 2.0 GiBytes
RAM+SWAP == 9.5 GiBytes (using both swap partitions).

It actually is directly using my normal FreeBSD UFS boot
media for the 7950X3D system. (To make the small swap
space I had to use 2 smaller, previously-free spaces,
split across 2 FreeBSD media.)

It is running my experimental bulk build that builds
271 packages, including rust and llvm18 building at
the same time as part of the overall sequence. It
is to the point that rust and llvm18 are what are
building.

(My attempt to get useful MAXimum OBServed information
absent the memory pressure from a 2 GiByte
restriction failed to give useful memory figures. So
this has the 2 GiByte restriction. I could try 3
GiBytes or other small figures if appropriate but
I might not be able to scale RAM+SWAP as I usually
do.)

More later . . .

I'll report sometime after the build. (I may be
sleeping when it finishes.)
Comment 42 Mark Millard 2024-03-26 07:32:57 UTC
(In reply to Anton Saietskii from comment #39)

rust finished building but llvm18 (and more) is
still building:

[01:56:31] [02] [01:30:29] Finished lang/rust | rust-1.76.0: Success

During rust+llvm18 both building . . .

RAM:
1405Mi MaxObsActive
607948Ki MaxObsWired
1944Mi MaxObs(Act+Wir+Lndry)

SWAP:
2609Mi MaxObsUsed

RAM+SWAP:
3932Mi MaxObs(Act+Lndry+SwapUsed)
4474Mi MaxObs(A+Wir+L+SU)
4528Mi (A+W+L+SU+InAct)

I now conclude from the examples that, for amd64, aarch64, and
armv7, "can't build rust at all on a low-end system with 3 G
RAM. Only setting codegen units to 1 helps there" is false,
given that I have a counter-example from each of the 3 contexts.
Adding "with ZFS/ARC in use" to your wording may well prevent
generating counter-examples to the extended statement.

I'll note that for MaxObsWired (under RAM), the ZFS/ARC goes
in the Wired category. Having a larger Wired would mean having
less space for Active+Inact+Lndry, no matter how much swap
space has been set up, given a fixed 3 GiByte RAM space.


Note: I assume that your "ONE thread" reference means that,
effectively, you had some equivalent of using the combination:
PARALLEL_JOBS=1 MAKE_JOBS_NUMBER_LIMIT=1

More after it is all done building . . .
Comment 43 Mark Millard 2024-03-26 10:53:12 UTC
(In reply to Mark Millard from comment #42)

The overall bulk build finished:

[03:42:24] Stopping 2 builders
. . .
[main-amd64-bulk_a-default] [2024-03-25_22h26m11s] [committing] Queued: 271 Built: 271 Failed: 0   Skipped: 0   Ignored: 0   Fetched: 0   Tobuild: 0    Time: 03:42:25

MaxObsWired is the only MaxObs figure that increased, as
tends to happen some when there is less memory pressure:

RAM:
1405Mi MaxObsActive
827040Ki MaxObsWired
1944Mi MaxObs(Act+Wir+Lndry)

SWAP:
2609Mi MaxObsUsed

RAM+SWAP:
3932Mi MaxObs(Act+Lndry+SwapUsed)
4474Mi MaxObs(A+Wir+L+SU)
4528Mi (A+W+L+SU+InAct)

(The 4528Mi (A+W+L+SU+InAct) is from when 4474Mi MaxObs(A+Wir+L+SU) was
live but is not a MaxObs [MAXimum OBServed] figure itself.)


I'm glad that I now have a context for such smaller RAM tests on amd64.
They do not take nearly as long to complete compared to the RPi4B
and Orange Pi+ 2E contexts.
Comment 44 Mark Millard 2024-03-26 13:27:12 UTC
As a simpler test in the amd64 Hyper-V context that
I'd described, I tried a MAKE_JOBS_NUMBER_LIMIT=4
based run of just building rust:

# poudriere bulk -jmain-amd64-bulk_a -C lang/rust
. . .
[00:47:26] [01] [00:47:19] Finished lang/rust | rust-1.76.0: Success
[00:47:34] Stopping 1 builders
. . .
[main-amd64-bulk_a-default] [2024-03-26_04h00m50s] [committing] Queued: 1  Built: 1  Failed: 0  Skipped: 0  Ignored: 0  Fetched: 0  Tobuild: 0   Time: 00:47:35

It got:

Mem:
1395Mi MaxObsActive
823960Ki MaxObsWired
1943Mi MaxObs(Act+Wir+Lndry)

Swap:
2152Mi MaxObsUsed

RAM+SWAP:
3361Mi MaxObs(Act+Lndry+SwapUsed)
3912Mi MaxObs(A+Wir+L+SU)
4089Mi (A+W+L+SU+InAct)

(The 4089Mi (A+W+L+SU+InAct) is from when 3912Mi MaxObs(A+Wir+L+SU) was
live but is not a MaxObs [MAXimum OBServed] figure itself.)
Comment 45 Mark Millard 2024-03-26 17:23:37 UTC
(In reply to Anton Saietskii from comment #39)

"The Design and Implementation of the FreeBSD operating system", 2ed, page 542
says of ZFS:

QUOTE
ZFS was designed to manage and operate enormous filesystems easily, which
it does well. Its design assumed that it would have many fast 64-bit CPUs with
large amounts of memory to support these enormous file systems. When these
resources are available, it works extremely well. However, it is not designed for
or well suited to run on resource-constrained systems using 32-bit CPUs with less
than 8 Gbyte of memory and one small, nearly-full disk, which is typical of many
embedded systems.
END QUOTE

Another quote from the prior page:

QUOTE
Like all non-overwriting filesystems, ZFS operates best when at least a quarter of
its disk pool is free. Write throughput becomes poor when the pool gets too
full. By contrast, UFS can run well to 95 percent full and acceptably to 99 percent
full.
END QUOTE

It does not appear that you have "many fast 64-bit CPUs" and you have "less
than 8 Gbyte of memory" by a fair amount. Other aspects of the relationships to
the quotes are less clear. Still, as I understand it, your context is not well
suited to ZFS use for resource-intensive activity like building packages (or
ports), at least absent rather special-case tuning.