Bug 274705 - unreasonably large stack reservation for armv7 processes on arm64
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: arm
Version: 13.2-STABLE
Hardware: arm64 Any
Importance: --- Affects Only Me
Assignee: Konstantin Belousov
URL: https://reviews.freebsd.org/D42451
Keywords:
Depends on:
Blocks:
 
Reported: 2023-10-24 22:02 UTC by Robert Clausecker
Modified: 2023-11-24 16:07 UTC
CC List: 4 users

See Also:
fuz: mfc-stable14?
fuz: mfc-stable13?


Attachments
memory-wasting test program (511 bytes, text/plain)
2023-10-24 22:02 UTC, Robert Clausecker
arm64: improve UVA layout for 32bit processes (3.22 KB, patch)
2023-10-25 01:40 UTC, Konstantin Belousov

Description Robert Clausecker freebsd_committer freebsd_triage 2023-10-24 22:02:53 UTC
Created attachment 245854
memory-wasting test program

I am trying to understand why armv7 processes on arm64 can only allocate around 2 GB of memory despite being allowed to use the whole 4 GB of virtual address space.

Debugging with kib, we found that 1 GB of address space is reserved for the stack starting around the 3 GB mark.  As mmap() doesn't try to find memory beyond the stack, this limits it to finding memory between text and stack, which amounts to around 2 GB.
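
For readers without access to the attachment, the limit can be probed with a program along the following lines.  This is a minimal sketch, not the attached test program: the function name count_buffers, the cap on the number of buffers, and the use of malloc() rather than mmap() are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Allocate buffers of chunk bytes each until malloc() fails or limit
 * buffers have been obtained.  The buffers are deliberately leaked so
 * that the address space stays occupied while the process is inspected.
 * Returns the number of buffers successfully allocated.
 */
size_t
count_buffers(size_t chunk, size_t limit)
{
	size_t n;

	for (n = 0; n < limit; n++) {
		void *p = malloc(chunk);

		if (p == NULL)
			break;
		/* Touch the pages so that they become resident. */
		memset(p, 0xa5, chunk);
	}
	return (n);
}
```

Called as count_buffers(1024 * 1024, 4096) from an armv7 binary, the return value is the number of 1 MB buffers obtained before allocation failed.  If the process is kept alive afterwards, procstat -v on its PID should show the resulting memory map.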

The attached test program fails after allocating 2022 buffers of 1 MB each, and we get a memory map like this:

  PID      START        END PRT  RES PRES REF SHD FLAG  TP PATH
 2844    0x10000    0x11000 r--    1    3   3   1 CN--- vn /usr/home/fuz/test
 2844    0x20000    0x21000 r-x    1    3   3   1 CN--- vn /usr/home/fuz/test
 2844    0x30000    0x31000 r--    1    0   1   0 C---- vn /usr/home/fuz/test
 2844    0x40000    0x41000 rw-    1    1   1   0 ----- df 
 2844 0x40040000 0x40045000 r--    5   29  35  11 CN--- vn /libexec/ld-elf.so.1
 2844 0x40045000 0x4004b000 rw-    1    1   1   0 ----- df 
 2844 0x40054000 0x4006d000 r-x   25   29  35  11 CN--- vn /libexec/ld-elf.so.1
 2844 0x4007c000 0x4007d000 r--    1    0   1   0 C---- vn /libexec/ld-elf.so.1
 2844 0x4008c000 0x400af000 rw-   20   20   1   0 ----- df 
 2844 0x400af000 0x400b3000 r--    4   14  46  22 CN--- vn /lib/libgcc_s.so.1
 2844 0x400b3000 0x400c2000 ---    0    0   0   0 CN--- gd 
 2844 0x400c2000 0x400cc000 r-x   10   14  46  22 CN--- vn /lib/libgcc_s.so.1
 2844 0x400cc000 0x400db000 ---    0    0   0   0 CN--- gd 
 2844 0x400db000 0x400dc000 rw-    1    0   1   0 C---- vn /lib/libgcc_s.so.1
 2844 0x400dc000 0x400eb000 ---    0    0   0   0 CN--- gd 
 2844 0x400eb000 0x400ec000 rw-    1    0   1   0 C---- vn /lib/libgcc_s.so.1
 2844 0x400ec000 0x40133000 r--   71  360  55  31 CN--- vn /lib/libc.so.7
 2844 0x40133000 0x40142000 ---    0    0   0   0 CN--- gd 
 2844 0x40142000 0x40299000 r-x  282  360  55  31 CN--- vn /lib/libc.so.7
 2844 0x40299000 0x402a8000 ---    0    0   0   0 CN--- gd 
 2844 0x402a8000 0x402ac000 r--    4    0   2   0 C---- vn /lib/libc.so.7
 2844 0x402ac000 0x402ad000 rw-    1    0   2   0 C---- vn /lib/libc.so.7
 2844 0x402ad000 0x402bc000 ---    0    0   0   0 CN--- gd 
 2844 0x402bc000 0x402c1000 rw-    5    0   1   0 C---- vn /lib/libc.so.7
 2844 0x402c1000 0x403de000 rw-  272  272   1   0 ----- df 
 2844 0x40400000 0x68981000 rw- 165235 165235   1   0 ----- df 
 2844 0x68a00000 0xba916000 rw- 335612 335612   1   0 ----- df 
 2844 0xbaa00000 0xbff4f000 rw- 20366 20366   1   0 ----- df 
 2844 0xbfffe000 0xfffde000 ---    0    0   0   0 ----- gd 
 2844 0xfffde000 0xffffe000 rw-    3    3   1   0 ---D- df 
 2844 0xffffe000 0xfffff000 r-x    1    1 161   0 ----- ph 

clearly showing how a large guard mapping for the stack from 0xbfffe000 to 0xfffde000 prevents the heap from growing further.  At the other end, mmap() doesn't want to allocate below 0x40040000, which cuts another GB off the available address space and leaves around 2 GB to allocate.

Setting ulimit -Hs 65536 does not affect the behaviour, but compiling with -Wl,-z,stack-size=65536 does, reducing the size of the stack mapping.

Perhaps the guard page sizes and mmap lower limits could be adjusted to the defaults used for i386 processes on amd64 hosts?
Comment 1 Robert Clausecker freebsd_committer freebsd_triage 2023-10-24 22:06:05 UTC
The lower address limit depends on the data segment rlimit, which is obeyed; reducing the hard data segment rlimit causes mmap to allocate more pages.
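
As an illustration, a process can lower its own data segment limit before it starts allocating.  The following is a sketch under assumptions: cap_rlimit is a hypothetical helper, and whether lowering the limit from inside the process affects mmap() in the same way as lowering it in the parent shell has not been verified here.

```c
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>

/*
 * Lower both the soft and the hard limit of the given resource to at
 * most cap bytes.  Limits that are already at or below cap are left
 * unchanged.  Returns 0 on success and -1 on failure.
 */
int
cap_rlimit(int resource, rlim_t cap)
{
	struct rlimit rl;

	if (getrlimit(resource, &rl) != 0)
		return (-1);
	if (rl.rlim_max == RLIM_INFINITY || rl.rlim_max > cap)
		rl.rlim_max = cap;
	if (rl.rlim_cur == RLIM_INFINITY || rl.rlim_cur > cap)
		rl.rlim_cur = cap;
	return (setrlimit(resource, &rl));
}
```

For example, cap_rlimit(RLIMIT_DATA, (rlim_t)512 * 1024 * 1024) would request a 512 MB data segment limit.  Note that a lowered hard limit cannot be raised again by an unprivileged process.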
Comment 2 Robert Clausecker freebsd_committer freebsd_triage 2023-10-24 22:10:37 UTC
For comparison, for i386 tasks on amd64, the behaviour is much more reasonable.  There it looks like this:

  PID              START                END PRT  RES PRES REF SHD FLAG  TP PATH
91169           0x400000           0x401000 r--    1    4   3   1 CN--- vn /usr/home/fuz/test
91169           0x401000           0x402000 r-x    1    4   3   1 CN--- vn /usr/home/fuz/test
91169           0x402000           0x403000 r--    1    0   1   0 C---- vn /usr/home/fuz/test
91169           0x403000           0x404000 rw-    1    1   1   0 ----- df 
91169         0x20403000         0x20409000 r--    6   26   3   1 CN--- vn /libexec/ld-elf32.so.1
91169         0x20409000         0x2041d000 r-x   20   26   3   1 CN--- vn /libexec/ld-elf32.so.1
91169         0x2041d000         0x2041e000 r--    1    0   2   0 C---- vn /libexec/ld-elf32.so.1
91169         0x2041e000         0x2041f000 rw-    1    0   2   0 C---- vn /libexec/ld-elf32.so.1
91169         0x2041f000         0x20442000 rw-   18   18   1   0 ----- df 
91169         0x20442000         0x204b0000 r--   65  325   4   2 CN--- vn /usr/lib32/libc.so.7
91169         0x204b0000         0x205ec000 r-x  252  325   4   2 CN--- vn /usr/lib32/libc.so.7
91169         0x205ec000         0x205f1000 r--    5    0   2   0 C---- vn /usr/lib32/libc.so.7
91169         0x205f1000         0x205f2000 rw-    1    0   2   0 C---- vn /usr/lib32/libc.so.7
91169         0x205f2000         0x205f6000 rw-    4    0   1   0 C---- vn /usr/lib32/libc.so.7
91169         0x205f6000         0x20719000 rw-  276  276   1   0 ----- df 
91169         0x20800000         0x48d81000 rw- 165238 165238   1   0 ----- df 
91169         0x48e00000         0x9ad16000 rw- 335624 335624   1   0 ----- df 
91169         0x9ae00000         0xe2068000 rw- 290755 290755   1   0 ----- df 
91169         0xfbffe000         0xfffde000 ---    0    0   0   0 ----- gd 
91169         0xfffde000         0xffffe000 rw-    3    3   1   0 ---D- df 
91169         0xffffe000         0xfffff000 r-x    1    1 212   0 ----- ph
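
The difference between the two layouts can be put in numbers.  The sketch below takes the addresses from the two memory maps quoted in this report and computes the sizes of the relevant ranges.  The struct layout, its field names, and span_mib are hypothetical names introduced for illustration, and reading the start of the executable's anonymous rw- mapping as the start of the data segment is an assumption.

```c
#include <stdint.h>

struct layout {
	const char *name;
	uint32_t data_start;	/* start of the executable's rw- df mapping */
	uint32_t mmap_base;	/* lowest address at which mmap() placed anything */
	uint32_t guard_start;	/* start of the stack guard (gd) entry */
	uint32_t guard_end;	/* end of the stack guard (gd) entry */
};

/* Addresses copied from the armv7 and i386 memory maps in this report. */
const struct layout layouts[] = {
	{ "armv7", 0x00040000, 0x40040000, 0xbfffe000, 0xfffde000 },
	{ "i386",  0x00403000, 0x20403000, 0xfbffe000, 0xfffde000 },
};

/* Size of the half-open address range [start, end) in MiB, rounded down. */
uint32_t
span_mib(uint32_t start, uint32_t end)
{
	return ((end - start) >> 20);
}
```

For the armv7 map, span_mib() gives 1024 MiB between the data segment and the lowest mmap() address, a 1023 MiB stack guard, and 2047 MiB between the two.  For the i386 map the same ranges come out as 512 MiB, 63 MiB, and 3515 MiB.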
Comment 3 Konstantin Belousov freebsd_committer freebsd_triage 2023-10-25 01:40:17 UTC
Created attachment 245855
arm64: improve UVA layout for 32bit processes

not even compiled
Comment 4 Robert Clausecker freebsd_committer freebsd_triage 2023-10-25 02:28:53 UTC
Needs #include <sys/sysctl.h> to build.  My build machine is rebooting for some reason; I will test once it is back up.
Comment 5 Robert Clausecker freebsd_committer freebsd_triage 2023-11-02 22:43:49 UTC
With the missing include added, I can confirm that the patch fixes this issue for me.
Comment 6 commit-hook freebsd_committer freebsd_triage 2023-11-03 21:19:06 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=e0b169f1d0061a31fccb6e24253011db40c5fdd7

commit e0b169f1d0061a31fccb6e24253011db40c5fdd7
Author:     Robert Clausecker <fuz@FreeBSD.org>
AuthorDate: 2023-11-03 07:50:10 +0000
Commit:     Robert Clausecker <fuz@FreeBSD.org>
CommitDate: 2023-11-03 21:16:26 +0000

    graphics/lux: unbreak on armv7

    An upcoming patch will fix the misconfiguration that restricts the
    address space for armv7 processes on arm64 to ~2GB instead of the ~3.5GB
    it should have been.  With that patch applied, the port builds fine.
    As a temporary workaround, the following sysctls can be set to effect
    the same change (though affecting arm64 processes too):

        kern.maxssiz=67108864
        kern.maxdsiz=536870912

    armv6 stays broken as we cannot run armv6 processes on arm64 (see
    PR #256132).

    PR:             274705
    MFH:            2023Q4
    See also:       https://reviews.freebsd.org/D42451

 graphics/lux/Makefile | 1 -
 1 file changed, 1 deletion(-)
Comment 7 commit-hook freebsd_committer freebsd_triage 2023-11-03 21:25:37 UTC
A commit in branch 2023Q4 references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=aaef966067395755338246f68386aa44ac8acf82

commit aaef966067395755338246f68386aa44ac8acf82
Author:     Robert Clausecker <fuz@FreeBSD.org>
AuthorDate: 2023-11-03 07:50:10 +0000
Commit:     Robert Clausecker <fuz@FreeBSD.org>
CommitDate: 2023-11-03 21:24:20 +0000

    graphics/lux: unbreak on armv7

    An upcoming patch will fix the misconfiguration that restricts the
    address space for armv7 processes on arm64 to ~2GB instead of the ~3.5GB
    it should have been.  With that patch applied, the port builds fine.
    As a temporary workaround, the following sysctls can be set to effect
    the same change (though affecting arm64 processes too):

        kern.maxssiz=67108864
        kern.maxdsiz=536870912

    armv6 stays broken as we cannot run armv6 processes on arm64 (see
    PR #256132).

    PR:             274705
    MFH:            2023Q4
    See also:       https://reviews.freebsd.org/D42451

    (cherry picked from commit e0b169f1d0061a31fccb6e24253011db40c5fdd7)

 graphics/lux/Makefile | 1 -
 1 file changed, 1 deletion(-)
Comment 8 commit-hook freebsd_committer freebsd_triage 2023-11-04 16:48:43 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=967022aa5aa60a18764a668ae0fb78e39e16fa8e

commit 967022aa5aa60a18764a668ae0fb78e39e16fa8e
Author:     Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2023-10-25 01:03:09 +0000
Commit:     Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2023-11-04 16:47:50 +0000

    arm64: improve UVA layout for 32bit processes

    Add compat.aarch32 tunables for maxssiz, maxdsiz, and maxvmem.
    Set the default values same as for amd64.
    Fix freebsd32 sysentvec on arm64 to provide sv_maxssiz, and sv_fixlimit.

    PR:     274705
    Reviewed by:    markj
    Tested by:      fuz
    Sponsored by:   The FreeBSD Foundation
    MFC after:      1 week
    Differential revision:  https://reviews.freebsd.org/D42451

 sys/arm64/arm64/elf32_machdep.c | 54 +++++++++++++++++++++++++++++++++++++++--
 1 file changed, 52 insertions(+), 2 deletions(-)
Comment 9 commit-hook freebsd_committer freebsd_triage 2023-11-11 00:41:00 UTC
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=0dc8af9dae2ca5419a0d3313d0dcb42c6b5d6d38

commit 0dc8af9dae2ca5419a0d3313d0dcb42c6b5d6d38
Author:     Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2023-10-25 01:03:09 +0000
Commit:     Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2023-11-10 12:15:46 +0000

    arm64: improve UVA layout for 32bit processes

    PR:     274705

    (cherry picked from commit 967022aa5aa60a18764a668ae0fb78e39e16fa8e)

 sys/arm64/arm64/elf32_machdep.c | 54 +++++++++++++++++++++++++++++++++++++++--
 1 file changed, 52 insertions(+), 2 deletions(-)
Comment 10 commit-hook freebsd_committer freebsd_triage 2023-11-11 00:41:01 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=9e1efa0f88356747d88e310209e57ba8f689fa4e

commit 9e1efa0f88356747d88e310209e57ba8f689fa4e
Author:     Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2023-10-25 01:03:09 +0000
Commit:     Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2023-11-11 00:40:25 +0000

    arm64: improve UVA layout for 32bit processes

    PR:     274705

    (cherry picked from commit 967022aa5aa60a18764a668ae0fb78e39e16fa8e)

 sys/arm64/arm64/elf32_machdep.c | 54 +++++++++++++++++++++++++++++++++++++++--
 1 file changed, 52 insertions(+), 2 deletions(-)