Bug 256099

Summary: lang/rust: Reduce memory usage/pressure
Product: Ports & Packages Reporter: Daniel Engberg <diizzy>
Component: Individual Port(s) Assignee: FreeBSD Rust Team <rust>
Status: New ---    
Severity: Affects Only Me CC: diizzy, marklmi26-fbsd, mikael, ygy
Priority: --- Flags: bugzilla: maintainer-feedback? (rust)
Version: Latest   
Hardware: Any   
OS: Any   

Description Daniel Engberg freebsd_committer 2021-05-23 11:54:28 UTC
During some build stages rust/cargo eats a lot of memory with its default settings, making it more or less not viable for low/mid-range systems.

Can we consider setting codegen-units to 1 or add a toggle for it and also perhaps parallel compiling?
Ref: https://reviews.freebsd.org/D30099#677659
Comment 1 Tobias Kortkamp freebsd_committer 2021-06-04 12:04:08 UTC
Daniel, I'm sorry, not trying to ignore you but unsure of expectations.
Seems worthwhile but someone needs to run the builds, get some
before and after numbers, research downsides if any, and then I
guess we can set codegen-units=1 if it looks ok, sure.

As for some of the comments from the review:

> [...] uses LLVM from their package tree instead of bundled, perhaps that's worth looking into?

lang/rust had an LLVM_PORT option once, but that only works if
somebody feels responsible for supporting it and fixing any regressions
that might happen.  But nobody really did so we removed it.

> Unbundle libssh2?

It's bundled now I guess because we bundle libgit2 now too (because
devel/libgit2 was not updated fast enough again).  Since we update
the toolchain every 6 weeks it is all probably not worth the hassle.
Comment 2 Daniel Engberg freebsd_committer 2021-06-06 21:52:18 UTC
Hi,

I mainly started to look into this as building Rust on my low-end server (specs below) failed despite having quite a bit of RAM and swap at disposal.

While this isn't an ideal way of logging, here's a graph of memory usage at the end of compiling rust (2 jobs) that succeeds.
https://projects.pyret.net/files/public/freebsd/mem-usage-rustc.png

This box is an old Dell T20 with a dual core Intel Pentium G3220 CPU, 12 GB of RAM and running ZFS for Poudriere but not on rootfs. It runs 12.2-RELEASE-p6 and builds Rust in a 12.2 jail. It's lightly loaded and uses about 5-6 GB of RAM (incl. ZFS) without any Poudriere job running. I have no specific ZFS tuning set, however from what I can tell the ZFS cache seems to grow quite a bit while compiling Rust.

Setting codegen-units=1 reduces memory usage by about 1-1.5 GB from what I can tell, but memory usage is still quite high.

I also gave this a go on my RockPro64 (arm64) (4 GB of RAM) running 13-STABLE (stable/13-n245283-70a2e9a3d44), UFS only, and while it took 14+ hours (-j1) it did finish. During compiling it used about 2 GB (the job, not the complete system) tops, which is a lot less than what I'm seeing on my server.

I'll give this a go on another box running 13-STABLE (amd64) and see if that also consumes a lot of memory.

Thanks for replying about LLVM and libssh2, if it's too much of a hassle I understand the decision :-)
Comment 3 Daniel Engberg freebsd_committer 2021-06-07 01:08:15 UTC
Hmm... compiling & optimizing seems to use a bit more memory, I did see a few processes use more than 2.5 GB of memory. Wired memory is a lot more though, ~4.8G and peaked at 6.7G so I guess that's due to ZFS?
Comment 4 commit-hook freebsd_committer 2021-06-14 20:52:02 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=294f0c5c206d70e24b6bbc28766d962dc82f8b61

commit 294f0c5c206d70e24b6bbc28766d962dc82f8b61
Author:     Tobias Kortkamp <tobik@FreeBSD.org>
AuthorDate: 2021-06-14 18:50:33 +0000
Commit:     Tobias Kortkamp <tobik@FreeBSD.org>
CommitDate: 2021-06-14 20:51:11 +0000

    lang/rust-nightly: Try to reduce memory usage/pressure

    Try to reduce memory usage/pressure by only using one code generation
    unit.

    "This flag [codegen-units] controls how many code generation units
    the crate is split into.  It takes an integer greater than 0.

    When a crate is split into multiple codegen units, LLVM is able to
    process them in parallel.  Increasing parallelism may speed up
    compile times, but may also produce slower code.  Setting this to
    1 may improve the performance of generated code, but may be slower
    to compile."

    https://doc.rust-lang.org/rustc/codegen-options/index.html#codegen-units

    PR:             256099
    Suggested by:   Daniel Engberg

 lang/rust/Makefile | 3 +++
 1 file changed, 3 insertions(+)
Comment 5 Guangyuan Yang freebsd_committer 2021-06-27 09:27:01 UTC
Just wanted to report here that building Rust always gets OOM'ed after ~6 hours on my low-end build box, 10 times in a row. Spec:

- Intel i5-6500T (4) @ 2.496GHz
- 16GB RAM, 2GB swap
- FreeBSD 13.0-RELEASE amd64
- ZFS on root

It does finish on my VPS which has similar spec but more swap, however, the memory/swap usage is very high. Spec:

- Intel Xeon Platinum 8171M (4) @ 2.095GHz
- 16GB RAM, 32GB swap
- FreeBSD 13.0-RELEASE amd64
- UFS on root, with ZFS enabled on datadisks

I haven't looked at it closely, so will report back if I notice anything. Thanks!
Comment 6 Guangyuan Yang freebsd_committer 2021-06-30 01:33:57 UTC
Cannot get lang/rust-nightly to build on my build box with 16G memory with the code generation unit change. The CPU usage is down to one core, but the RAM pressure is still very high, and the whole process ended up getting OOM'ed.
Comment 7 commit-hook freebsd_committer 2021-09-19 09:16:18 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=6f1fefb50e755d727f471aeb75ebe4e28f876b4b

commit 6f1fefb50e755d727f471aeb75ebe4e28f876b4b
Author:     Tobias Kortkamp <tobik@FreeBSD.org>
AuthorDate: 2021-09-07 08:14:14 +0000
Commit:     Tobias Kortkamp <tobik@FreeBSD.org>
CommitDate: 2021-09-19 09:03:21 +0000

    lang/rust: Update to 1.55.0

    - Set codegen-units=1 [1]
    - Add hack to skip cargo update on git sources as a step towards solving [2]
    - Fix 'capacity overflow' panics on armv* [3]

    Changes:        https://blog.rust-lang.org/2021-09-09/Rust-1.55.0.html
    PR:             258337
    PR:             256099 [1]
    PR:             256581 [2]
    PR:             257419 [3]
    Reviewed by:    mikael, pkubaj
    Exp-run by:     antoine
    Differential Revision:  https://reviews.freebsd.org/D31872
    With hat:       rust

 Mk/Uses/cargo.mk                                   |   2 +-
 Mk/bsd.gecko.mk                                    |   2 +-
 lang/rust-bootstrap/Makefile                       |   8 +-
 lang/rust-bootstrap/distinfo                       |   6 +-
 lang/rust/Makefile                                 |  12 +--
 lang/rust/distinfo                                 | 114 ++++++++++-----------
 ...m-project_compiler-rt_lib_builtins_cpu__model.c |  21 ++--
 ...ols_cargo_src_cargo_sources_git_source.rs (new) |  45 ++++++++
 ...rc_tools_cargo_src_cargo_util_toml_mod.rs (new) |  22 ++++
 .../patch-vendor_openssl-sys_build_main.rs (gone)  |  19 ----
 ..._src_unix_bsd_freebsdlike_freebsd_mod.rs (gone) |  12 ---
 ..._unix_bsd_freebsdlike_freebsd_powerpc.rs (gone) |  50 ---------
 .../powerpc64-elfv1/patch-src_bootstrap_native.rs  |  10 +-
 ...h-compiler_rustc__target_src_spec_mod.rs (gone) |  10 --
 ...rc_spec_powerpc64le__unknown__freebsd.rs (gone) |  19 ----
 15 files changed, 154 insertions(+), 198 deletions(-)
Comment 8 Mark Millard 2021-09-23 19:24:07 UTC
(In reply to commit-hook from comment #7)

An FYI for systems with more resources . . .

Prior to this change during a from-scratch bulk -a using ALLOW_PARALLEL_JOBS= :

[05:52:06] [16] [04:29:11] Finished lang/rust | rust-1.54.0_2: Success

on the same machine after the change (again from-scratch
using ALLOW_PARALLEL_JOBS= ):

[12:39:47] [14] [11:20:24] Finished lang/rust | rust-1.55.0: Success

So about 2.5 times longer (about 4.5 hrs -> 11.3 hrs).

For reference:

HoneyComb (16 Cortex-A72's) with 64 GiBytes of RAM, root on ZFS,
Optane 480 media. Large swap on USB3 SSD media but top indicated
it was unused during both the bulk -a builds.

This test does not control just what the other 15 builders
were doing in the overlapping time frames in each bulk -a
but all the other builders were busy with a sequence of
builds over that time. The load averages were well over 16
but I do not have record of such over time for either bulk -a .

I've another bulk -a going on that machine and it may be about
a week before it finishes. (The 11:20:24 figure is from that
ongoing bulk -a .)
Comment 9 Mark Millard 2021-09-23 19:36:00 UTC
(In reply to Mark Millard from comment #8)

I forgot to list that:

USE_TMPFS="data"

was in use.

I've built rust by itself with USE_TMPFS=yes (so "wrkdir data") in the
past and the tmpfs use grew to around 17 GiBytes. Luckily I had swap
configured that was sufficient for the machine that was done on at the
time.

Having USE_TMPFS allowing significant tmpfs sizes for port builds
using huge amounts of disk space basically requires an environment
with sufficient resources arranged up front.

The use of PCIe OPTANE media helps avoid I/O being as much of an issue
as it could be with, say, spinning media.
Comment 10 Mark Millard 2021-09-23 19:55:59 UTC
(In reply to Mark Millard from comment #9)

I found my old note about the tmpfs use for USE_TMPFS=yes
for lang/rust :

# df -m | grep tmpfs
Filesystem 1M-blocks   Used  Avail Capacity  Mounted on
. . .
tmpfs         301422  17859 283563     6%    /usr/local/poudriere/data/.m/FBSDFSSDjail-default/01/wrkdirs
. . .

So the 17 GiBytes was only the "wrkdirs" contribution.
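
For reference, the `Used` column of `df -m` is in 1 MiB blocks, so the conversion to GiBytes can be sketched as below. The parsing assumes the whitespace-separated column layout shown in the output above, and the variable names are only illustrative.

```python
# The tmpfs line from the `df -m` output above (columns are in 1 MiB blocks).
df_line = (
    "tmpfs         301422  17859 283563     6%    "
    "/usr/local/poudriere/data/.m/FBSDFSSDjail-default/01/wrkdirs"
)

# Columns: Filesystem, 1M-blocks, Used, Avail, Capacity, Mounted on
fields = df_line.split()
used_mib = int(fields[2])
used_gib = used_mib / 1024

print(f"{fields[5]}: {used_mib} MiB used = {used_gib:.1f} GiB")
```

That works out to roughly 17.4 GiBytes for the wrkdirs tmpfs alone.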
Comment 11 Mikael Urankar freebsd_committer 2021-09-24 09:22:20 UTC
(In reply to Mark Millard from comment #8)
I have similar result on my amd64 box:

rust 1.55.0 with codegen-units=1
build time: 00:39:59


rust 1.55.0 without codegen-units=1
build time: 00:23:15
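
To put these figures next to the ones from comment #8, a small sketch like the following converts the poudriere HH:MM:SS build times to seconds and computes the slowdown. The helper name `hms_to_seconds` is made up for illustration; the times are copied from the two comments.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert a poudriere-style HH:MM:SS duration to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds

# (default codegen-units, codegen-units=1) build times from the thread.
# Note: the aarch64 pair also differs in rust version (1.54.0_2 vs 1.55.0).
reports = {
    "HoneyComb aarch64 (comment #8)": ("04:29:11", "11:20:24"),
    "amd64 (comment #11)": ("00:23:15", "00:39:59"),
}

slowdown = {}
for name, (default_time, single_unit_time) in reports.items():
    slowdown[name] = hms_to_seconds(single_unit_time) / hms_to_seconds(default_time)
    print(f"{name}: {slowdown[name]:.2f}x longer with codegen-units=1")
```

By this arithmetic the aarch64 bulk build took roughly 2.5 times as long and the amd64 build roughly 1.7 times as long.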
Comment 12 Mark Millard 2021-09-24 18:17:40 UTC
(In reply to Daniel Engberg from comment #0)

What USE_TMPFS (or analogous) was in use?
Comment 13 Mark Millard 2021-09-24 18:18:38 UTC
(In reply to Guangyuan Yang from comment #5)

What USE_TMPFS (or analogous) setting was in use?
Comment 14 Mark Millard 2021-09-24 19:08:43 UTC
(In reply to Guangyuan Yang from comment #5)

Unfortunately messages such as:

pid . . . (. . .), jid . . ., uid . . ., was killed: out of swap space

can be a misnomer for the "out of swap space" part: it can
be reported even when none of the swap space had been in use.
There are other reasons possible for why kills happen. One
point is that FreeBSD will not swap out a process that stays
runnable, even if its active memory use keeps the free RAM
minimal, it just continues to page in and out.

If it really was out of swap space there would also be messages
like:

swap_pager_getswapspace(. . .): failed

or:

swap_pager: out of swap space

Other causes for the kills include:

Sustained low free RAM (via stays-runnable processes).
A sufficiently delayed pageout.
The swap blk uma zone was exhausted.
The swap pctrie uma zone was exhausted.

The first two of those have some tunables
that you might want to try:

# Delay when persistent low free RAM leads to
# Out Of Memory killing of processes:
vm.pageout_oom_seq=120

# For plenty of swap/paging space (will not
# run out), avoid pageout delays leading to
# Out Of Memory killing of processes:
vm.pfault_oom_attempts=-1

# For possibly insufficient swap/paging space
# (might run out), increase the pageout delay
# that leads to Out Of Memory killing of
# processes (showing defaults at the time):
#vm.pfault_oom_attempts= 3
#vm.pfault_oom_wait= 10
# (The multiplication is the total but there
# are other potential tradeoffs in the factors
# multiplied, even for nearly the same total.)


I'll note that vm.pageout_oom_seq has a default of 12
but can be much larger than 120, such as 1024 or 10240
or even more. Larger figures increase the time before
kills start happening because of sustained low free RAM.
But no setting is designed to disable the kills from
eventually happening on some scale.
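
The remark that "the multiplication is the total" can be made concrete with a tiny sketch. It assumes that vm.pfault_oom_wait is a wait in seconds per attempt and that -1 for vm.pfault_oom_attempts means the fault handler keeps retrying; the function name is hypothetical and not part of any FreeBSD interface.

```python
from typing import Optional

def pfault_oom_total_wait(attempts: int, wait_seconds: int) -> Optional[int]:
    """Approximate total time a page fault keeps retrying before OOM handling.

    Returns None when attempts is -1 (assumed to mean: retry indefinitely).
    """
    if attempts == -1:
        return None
    return attempts * wait_seconds

# Defaults quoted above: vm.pfault_oom_attempts=3, vm.pfault_oom_wait=10
default_total = pfault_oom_total_wait(3, 10)
# Same total from different factors, which may still behave differently
alternative_total = pfault_oom_total_wait(6, 5)
# The setting suggested above for systems with plenty of swap
unlimited = pfault_oom_total_wait(-1, 10)

print(default_total, alternative_total, unlimited)
```

With the defaults this gives a total of about 30 seconds, and the same total can be reached with other attempts/wait combinations.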
Comment 15 Mark Millard 2021-09-25 01:27:53 UTC
(In reply to Guangyuan Yang from comment #5)

The following is based on (in part):

USE_TMPFS="data"
ALLOW_PARALLEL_JOBS=

for building rust-1.54.0_2 (so: before the codegen-units change).
It is a root-on-ZFS context. Also in use was /boot/loader.conf
having:

vm.pageout_oom_seq=120
vm.pfault_oom_attempts=-1

I'll report figures based on my local top patches that record
and report various "Maximum Observed" figures (MaxObs???? naming).

poudriere output:

. . .
[00:00:23] Building 1 packages using 1 builders
[00:00:23] Starting/Cloning builders
[00:00:27] Hit CTRL+t at any time to see build progress and stats
[00:00:27] [01] [00:00:00] Building lang/rust | rust-1.54.0_2
[05:10:56] [01] [05:10:29] Finished lang/rust | rust-1.54.0_2: Success
[05:11:35] Stopping 1 builders
. . .

Where the top output reported:

. . .;  load averages:  . . . MaxObs:  5.83,  5.09,  4.93                                                                                            . . .
. . . threads: . . . 21 MaxObsRunning
. . .
Mem: . . . 2285Mi MaxObsActive . . .
. . .
Swap: 14336Mi Total, 14336Mi Free

(The "Swap:" line did not report any positive amount used.)

No console messages at all.

In other words: it never got near starting to use the
swap partition that was active.


For reference . . .

System: MACCHIATObin Double Shot (4 Cortex-A72's) with 16 GiBytes
        RAM. (So an aarch64 context.) Root-on-ZFS with no special
        tuning. main [So: 14]. 14336 MiByte swap partition active.
        The boot media is a portable USB3 SSD.

# uname -apKU
FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #12 main-n249019-0637070b5bca-dirty: Tue Aug 31 02:24:20 PDT 2021     root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72  arm64 aarch64 1400032 1400032

But:

# poudriere jail -j13_0R-CA72 -i
Jail name:         13_0R-CA72
Jail version:      13.0-RELEASE-p4
Jail arch:         arm64.aarch64
Jail method:       null
Jail mount:        /usr/obj/DESTDIRs/13_0R-CA72-poud
Jail fs:           
Jail updated:      2021-09-06 19:07:54
Jail pkgbase:      disabled

And:

# cd /usr/ports
# ~/fbsd-based-on-what-commit.sh 
branch: main
merge-base: b0c4eaac2a3aa9bc422c21b9d398e4dbfea18736
merge-base: CommitDate: 2021-09-07 21:55:24 +0000
b0c4eaac2a3a (HEAD -> main, freebsd/main, freebsd/HEAD) security/suricata: Add patch for upstream locking fix
n557269 (--first-parent --count for merge-base)
Comment 16 Mark Millard 2021-09-25 01:47:58 UTC
(In reply to Tobias Kortkamp from comment #1)

Based on comment #15 I expect that codegen-units was
misidentified as the cause of the memory usage/pressure.
I expect that USE_TMPFS including wrkdir, which for
lang/rust can take 17 GiByte+, was instead the driving
issue for memory use/pressure. USE_TMPFS="data"
(avoiding wrkdir) is the primary thing that deals with
the memory use/pressure from what I can tell.
(USE_TMPFS=yes is equivalent to "wrkdir data".)
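
As a back-of-the-envelope illustration of why tmpfs for wrkdir can dominate, the sketch below adds the tmpfs figure from comment #10 to the MaxObsActive figure from comment #15 and compares the sum with the RAM plus swap of the build box in comment #5. It assumes, hypothetically, that that box had wrkdir in tmpfs (its USE_TMPFS setting was not reported), that "16GB"/"2GB" mean GiBytes, and that figures measured on different machines are roughly comparable.

```python
MIB_PER_GIB = 1024

wrkdir_tmpfs_mib = 17859      # df -m "Used" for the wrkdirs tmpfs (comment #10)
max_active_mib = 2285         # MaxObsActive with USE_TMPFS="data" (comment #15)

ram_mib = 16 * MIB_PER_GIB    # build box in comment #5 (assumed GiB)
swap_mib = 2 * MIB_PER_GIB

# Rough demand if wrkdir lives in tmpfs versus if it does not.
demand_with_wrkdir = wrkdir_tmpfs_mib + max_active_mib
capacity = ram_mib + swap_mib

print(f"with wrkdir in tmpfs: ~{demand_with_wrkdir} MiB needed, {capacity} MiB available")
print(f"without wrkdir in tmpfs: ~{max_active_mib} MiB active")
```

Under those assumptions the tmpfs-backed wrkdir alone would push demand past RAM plus swap, while the active memory of the build itself is a small fraction of it.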

Based on comment #8 and comment #11 I believe the change
has negative consequences for various contexts, in part
based on lack of control from the OPTIONS.

(The default should track what FreeBSD wants for the official
package builders for the tradeoff for more-time vs. better code
generation. It is possible that would be the new setting. Such
is not for me to say. But . . .)

Given that USE_TMPFS="data" is what makes the big difference
for memory use/pressure, I'd suggest reverting the change made
for this bugzilla submittal until OPTIONS has control of the
codegen-units setting (the rust default vs. 1) and the default
for the OPTION is set to what the long term official package
builds should be based on.
Comment 17 Mark Millard 2021-09-25 02:19:32 UTC
(In reply to Mark Millard from comment #15)

I forgot to mention that I have set larger timeout
values in /usr/local/etc/poudriere.conf than the
defaults. So my experiment would not show reaching
a default timeout, not that I expect such would have
occurred in that experiment.
Comment 18 Daniel Engberg freebsd_committer 2021-09-25 05:08:33 UTC
When I did some testing it did help because files were better optimized; however, it uses a single thread, just like when you use lto vs. thinlto. This behaviour is also documented in Rust's documentation for this option.
Comment 19 Mark Millard 2021-09-25 06:47:18 UTC
(In reply to Daniel Engberg from comment #18)

If that help was with memory use/memory pressure, I'd
not expect it to be as big of a difference as "wrkdir data"
vs. just "data" for USE_TMPFS: "data" uses vastly less
memory than the 17 GiByte+ figure. How much of a difference
did codegen-units=1 make in your context?

See comment 6 for someone reporting codegen-units=1 being
insufficient in their context. (Many of my notes are tied
to trying to help that person since they gave enough detail
for me to have specific suggestions and experiments to try
and my own experiments to report on.)

My hope is that the build-time/code-optimization tradeoff ends
up under explicit control at some point. I do not expect
general agreement about lang/rust build time frames being
shorter (default codegen-units) vs. the consequences of taking
the larger build times such as more optimized code
(codegen-units=1). I'd expect the default to be for the choice
made for the official package builders.
Comment 20 Mark Millard 2021-09-25 19:34:01 UTC
(In reply to Mark Millard from comment #15)

I've started a bulk lang/rust on a Rock64 (4 Cortex-A53's) with 4 GiByte
of RAM and 14 GiByte of swap and root on UFS (no ZFS use). (I normally
avoid ZFS on systems with less than 8 GiBytes of RAM.)

Again: It is based on (in part):

USE_TMPFS="data"
ALLOW_PARALLEL_JOBS=

for building rust-1.54.0_2 (so: before the codegen-units change).
It is a root-on-UFS context. Also in use was /boot/loader.conf
having:

vm.pageout_oom_seq=120
vm.pfault_oom_attempts=-1

Again I have larger than default poudriere timeout settings.

I'll report figures based on my local top patches that record
and report various "Maximum Observed" figures (MaxObs???? naming).

I expect that it will complete without using any swap space. (But
the Cortex-A53's will take a long time compared to the prior
MACCHIATObin Double Shot experiment.) It is possible that I'll
have to adjust some timeout(s) and retry: lang/rust will be the
largest thing that I've built in such a context.


I will note that, with 4 GiByte of RAM, the system would complain about
being mistuned for swap with even near 16 GiBytes of swap.
Comment 21 Mark Millard 2021-09-26 02:01:07 UTC
(In reply to Mark Millard from comment #20)

I've also started a lang/rust build on a Orange Pi+ 2E
(4 Cortex-A7's, so armv7) with 2 GiBytes of RAM and
3 GiByte of swap. USB2 port, so slower I/O.

USE_TMPFS="data"
ALLOW_PARALLEL_JOBS=

and:

vm.pageout_oom_seq=120
vm.pfault_oom_attempts=-1

in use again, with larger than default poudriere timeouts.

This will likely use a non-trivial amount of swap, unlike
the Rock64. (The Rock64 has used somewhat under 6 MiByte
of swap early on. I've seen FreeBSD do such small usage
when the need is non-obvious various times before.)

This will also likely take a very long time to complete
and may well need bigger timeouts. (Bigger vm.pageout_oom_seq
too?) But I expect with appropriate values for such set the
rust build will complete in this context.

(I'm planning on adjusting timeouts to allow rust builds
on these systems. So I've other reasons for the experiments
but might as well report the results.)

Again rust-1.54.0_2 (before the codegen-units=1 change).
1.54 had some problems on armv7 but, as I remember, not in
building, only in later use. My prior armv7 build was on a Cortex-A72
(aarch64) targeting Cortex-A7 (armv7) via a jail that used -a
arm.armv7 . (The Cortex-A72 can execute Cortex-A7 code.)
But there was lots of RAM and cores for that, unlike this
experiment.
Comment 22 Mark Millard 2021-09-26 06:29:49 UTC
(In reply to Mark Millard from comment #21)

The armv7 (Cortex-A7) test is stopped for now because poudriere's
time reporting is messed up, such as:

[00:00:00] Creating the reference jail... done
. . .
[00:00:00] Balancing pool
[main-CA7-default] [2021-09-25_23h11m13s] [balancing_pool:] Queued: 70 Built: 0  Failed: 0  Skipped: 0  Ignored: 0  Fetched: 0  Tobuild: 70  Time: -258342:-3:-36
[00:00:00] Recording filesystem state for prepkg... done
. . .
Comment 23 Mark Millard 2021-09-26 20:06:04 UTC
(In reply to Mark Millard from comment #20)

For the Rock64 rust-1.54.0_2 build test with 4GiBytes of RAM using
USE_TMPFS="data" and ALLOW_PARALLEL_JOBS= and vm.pageout_oom_seq=120
and vm.pfault_oom_attempts=-1 but not using codegen-units=1 :

. . .
[00:01:22] Building 1 packages using 1 builders
[00:01:22] Starting/Cloning builders
[00:01:34] Hit CTRL+t at any time to see build progress and stats
[00:01:34] [01] [00:00:00] Building lang/rust | rust-1.54.0_2
[16:11:35] [01] [16:10:01] Finished lang/rust | rust-1.54.0_2: Success
[16:12:12] Stopping 1 builders

where:

last pid: . . .  load averages:  . . . MaxObs:  5.60,  5.01,  4.85                                                                                                . . .
. . . threads:    . . . 11 MaxObsRunning
. . .
Mem: . . . 2407Mi MaxObsActive, 995248Ki MaxObsWired, 3161Mi MaxObs(Act+Wir+Lndry)
Swap: 14336Mi Total, . . . 10712Ki MaxObsUsed, 2457Mi MaxObs(Act+Lndry+SwapUsed), 3171Mi MaxObs(Act+Wir+Lndry+SwapUsed)

So, somewhat under 10.5 MiBytes of swap used at some point (maximum
observed by top). If no swap had been made active, it likely still
would have finished just fine: no swap space (partition) required.

Reminder: This was a UFS context with a USB3 SSD media, no ZFS use.
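
The swap figure above comes from converting top's Ki value to MiBytes. A small sketch of that conversion follows, assuming the binary suffixes (Ki, Mi) seen in the top output and using a made-up helper name:

```python
# Multipliers to MiB for the binary-unit suffixes seen in the top output.
TO_MIB = {"Ki": 1 / 1024, "Mi": 1.0, "Gi": 1024.0}

def to_mib(value: str) -> float:
    """Convert a top-style figure such as '10712Ki' or '2407Mi' to MiB."""
    number, suffix = value[:-2], value[-2:]
    return int(number) * TO_MIB[suffix]

# Maximum-observed figures from the Rock64 build above.
for label, raw in [
    ("MaxObsActive", "2407Mi"),
    ("MaxObsWired", "995248Ki"),
    ("MaxObsUsed (swap)", "10712Ki"),
]:
    print(f"{label}: {to_mib(raw):.2f} MiB")
```

The swap line comes out at about 10.46 MiB, consistent with "somewhat under 10.5 MiBytes".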
Comment 24 Mark Millard 2021-09-27 06:15:50 UTC
(In reply to Mark Millard from comment #22)

I've started the 2 GiByte RAM armv7 test again,
after patching poudriere-devel for the time
reporting issue.
Comment 25 Mark Millard 2021-09-27 23:50:12 UTC
(In reply to Mark Millard from comment #21)

For this armv7 test I should have listed that I was going to use:

USE_TMPFS=no

(instead of "data").

The test is still running.
Comment 26 Mark Millard 2021-09-28 07:49:43 UTC
(In reply to Mark Millard from comment #25)

For the Orange Pi+ 2E (armv7) rust-1.54.0_2 build test with 2GiBytes
of RAM using USE_TMPFS=no and ALLOW_PARALLEL_JOBS= and
vm.pageout_oom_seq=120 and vm.pfault_oom_attempts=-1 but not using
codegen-units=1 :

. . .
[00:02:32] Building 1 packages using 1 builders
[00:02:32] Starting/Cloning builders
[00:03:21] Hit CTRL+t at any time to see build progress and stats
[00:03:21] [01] [00:00:00] Building lang/rust | rust-1.54.0_2
[25:09:49] [01] [25:06:28] Finished lang/rust | rust-1.54.0_2: Success
[25:10:27] Stopping 1 builders
. . .

. . .  load averages:  . . . MaxObs:  5.50,  5.13,  4.88                                                                                               . . .
. . . threads:    . . . 11 MaxObsRunning
. . .
Mem: . . . 1559Mi MaxObsActive, 257660Ki MaxObsWired, 1837Mi MaxObs(Act+Wir+Lndry)
Swap: 3072Mi Total, . . . 320604Ki MaxObsUsed, 1898Mi MaxObs(Act+Lndry+SwapUsed), 2113Mi MaxObs(Act+Wir+Lndry+SwapUsed)

So: Well under 350 MiBytes of swap used for USE_TMPFS=no with 2 GiBytes of RAM.
Swap space likely required, given its size vs. the 2 GiBytes. (USE_TMPFS="data"
would have used more swap space.)

Reminder: This was a UFS context with a USB3 SSD media, no ZFS use.
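
One rough way to read the figures from this test and the Rock64 test (comment #23) side by side is to compare the maximum observed Act+Wir+Lndry+SwapUsed against installed RAM. This is only an illustrative heuristic, not something top or poudriere reports, and it ignores other memory categories; the names below are made up.

```python
# (installed RAM in MiB, MaxObs(Act+Wir+Lndry+SwapUsed) in MiB, MaxObsUsed swap in KiB)
builds = {
    "Rock64, 4 GiB (comment #23)": (4 * 1024, 3171, 10712),
    "Orange Pi+ 2E, 2 GiB (this test)": (2 * 1024, 2113, 320604),
}

def peak_exceeds_ram(ram_mib: int, peak_mib: int) -> bool:
    """True when the observed peak demand is larger than installed RAM."""
    return peak_mib > ram_mib

for name, (ram_mib, peak_mib, swap_kib) in builds.items():
    verdict = "exceeds RAM" if peak_exceeds_ram(ram_mib, peak_mib) else "fits in RAM"
    print(f"{name}: peak {peak_mib} MiB vs {ram_mib} MiB RAM ({verdict}), "
          f"max swap used {swap_kib / 1024:.0f} MiB")
```

By this measure the Rock64 peak stays within its 4 GiBytes, while the Orange Pi+ 2E peak is slightly above its 2 GiBytes, which fits the conclusions drawn for the two tests.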
Comment 27 Mark Millard 2021-10-12 21:35:47 UTC
(In reply to Mark Millard from comment #23)

Looks like the Rock64 test was with USE_TMPFS=no instead of
USE_TMPFS="data" .