Created attachment 245854 [details]
memory-wasting test program

I am trying to understand why armv7 processes on arm64 can only allocate around 2 GB of memory despite being allowed to use the whole 4 GB of virtual address space.

Debugging with kib, we found that 1 GB of address space is reserved for the stack, starting around the 3 GB mark. As mmap() doesn't try to find memory beyond the stack, this limits it to finding memory between text and stack, which amounts to around 2 GB.

Running the attached test program, it fails after allocating 2022 buffers of 1 MB each, and we get a memory map like this:

  PID      START        END PRT    RES   PRES REF SHD FLAG  TP PATH
 2844    0x10000    0x11000 r--      1      3   3   1 CN--- vn /usr/home/fuz/test
 2844    0x20000    0x21000 r-x      1      3   3   1 CN--- vn /usr/home/fuz/test
 2844    0x30000    0x31000 r--      1      0   1   0 C---- vn /usr/home/fuz/test
 2844    0x40000    0x41000 rw-      1      1   1   0 ----- df
 2844 0x40040000 0x40045000 r--      5     29  35  11 CN--- vn /libexec/ld-elf.so.1
 2844 0x40045000 0x4004b000 rw-      1      1   1   0 ----- df
 2844 0x40054000 0x4006d000 r-x     25     29  35  11 CN--- vn /libexec/ld-elf.so.1
 2844 0x4007c000 0x4007d000 r--      1      0   1   0 C---- vn /libexec/ld-elf.so.1
 2844 0x4008c000 0x400af000 rw-     20     20   1   0 ----- df
 2844 0x400af000 0x400b3000 r--      4     14  46  22 CN--- vn /lib/libgcc_s.so.1
 2844 0x400b3000 0x400c2000 ---      0      0   0   0 CN--- gd
 2844 0x400c2000 0x400cc000 r-x     10     14  46  22 CN--- vn /lib/libgcc_s.so.1
 2844 0x400cc000 0x400db000 ---      0      0   0   0 CN--- gd
 2844 0x400db000 0x400dc000 rw-      1      0   1   0 C---- vn /lib/libgcc_s.so.1
 2844 0x400dc000 0x400eb000 ---      0      0   0   0 CN--- gd
 2844 0x400eb000 0x400ec000 rw-      1      0   1   0 C---- vn /lib/libgcc_s.so.1
 2844 0x400ec000 0x40133000 r--     71    360  55  31 CN--- vn /lib/libc.so.7
 2844 0x40133000 0x40142000 ---      0      0   0   0 CN--- gd
 2844 0x40142000 0x40299000 r-x    282    360  55  31 CN--- vn /lib/libc.so.7
 2844 0x40299000 0x402a8000 ---      0      0   0   0 CN--- gd
 2844 0x402a8000 0x402ac000 r--      4      0   2   0 C---- vn /lib/libc.so.7
 2844 0x402ac000 0x402ad000 rw-      1      0   2   0 C---- vn /lib/libc.so.7
 2844 0x402ad000 0x402bc000 ---      0      0   0   0 CN--- gd
 2844 0x402bc000 0x402c1000 rw-      5      0   1   0 C---- vn /lib/libc.so.7
 2844 0x402c1000 0x403de000 rw-    272    272   1   0 ----- df
 2844 0x40400000 0x68981000 rw- 165235 165235   1   0 ----- df
 2844 0x68a00000 0xba916000 rw- 335612 335612   1   0 ----- df
 2844 0xbaa00000 0xbff4f000 rw-  20366  20366   1   0 ----- df
 2844 0xbfffe000 0xfffde000 ---      0      0   0   0 ----- gd
 2844 0xfffde000 0xffffe000 rw-      3      3   1   0 ---D- df
 2844 0xffffe000 0xfffff000 r-x      1      1 161   0 ----- ph

This clearly shows how a large guard mapping for the stack from 0xbfffe000 to 0xfffde000 prevents the heap from growing further. On the other side, mmap() doesn't want to allocate below 0x40040000, cutting another GB off the available memory, which leaves around 2 GB to allocate.

Setting ulimit -Hs 65536 does not affect the behaviour, but compiling with -Wl,-z,stack-size=65536 does, reducing the size of the stack mapping.

Perhaps the guard page sizes and mmap lower limits could be adjusted to the defaults used for i386 processes on amd64 hosts?
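To put numbers on the map above, here is a minimal sketch that adds up the large anonymous mappings and the stack reservation. The row selection and the helper names (`heap_bytes`, `stack_reservation`) are mine, chosen for illustration; only the addresses come from the listing.

```python
# Selected rows from the armv7 map above: (start, end, type).
ARMV7_ROWS = [
    (0x40400000, 0x68981000, "df"),  # large anonymous mappings (the 1 MB buffers)
    (0x68A00000, 0xBA916000, "df"),
    (0xBAA00000, 0xBFF4F000, "df"),
    (0xBFFFE000, 0xFFFDE000, "gd"),  # stack guard mapping
    (0xFFFDE000, 0xFFFFE000, "df"),  # grown part of the stack
]

MIB = 1 << 20


def heap_bytes(rows):
    """Total size of the anonymous mappings that end below the stack guard."""
    guard_start = min(s for s, _, t in rows if t == "gd")
    return sum(e - s for s, e, t in rows if t == "df" and e <= guard_start)


def stack_reservation(rows):
    """Guard plus grown stack: from the guard start to the highest mapped end."""
    guard_start = min(s for s, _, t in rows if t == "gd")
    top = max(e for _, e, _ in rows)
    return top - guard_start


print("heap mappings: %.1f MiB" % (heap_bytes(ARMV7_ROWS) / MIB))
print("stack reservation: %.1f MiB" % (stack_reservation(ARMV7_ROWS) / MIB))
```

The guard and the grown stack together span exactly 1 GiB, and the three big anonymous mappings add up to a little over the 2022 MB the test program managed to allocate.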
The lower address limit depends on the data segment rlimit, which is obeyed; reducing the hard data segment rlimit causes mmap to allocate more pages.
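As a rough model of that dependency (an assumption on my part, not taken from the kernel source in this thread): if the lowest address mmap() picks is the start of the data segment plus the hard data limit, rounded up to a page, then a 1 GiB limit reproduces the 0x40040000 seen in the armv7 map. The 1 GiB figure is inferred from the map, and `mmap_floor` is a hypothetical helper name.

```python
PAGE_SIZE = 4096  # assumed page size


def round_page(addr):
    """Round an address up to the next page boundary."""
    return (addr + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1)


def mmap_floor(data_start, data_limit):
    """Modelled lowest mmap() address: data segment start plus data rlimit."""
    return round_page(data_start + data_limit)


DATA_START = 0x40000  # first rw- 'df' mapping in the armv7 listing

# Assumed 1 GiB hard data limit: matches where ld-elf.so.1 was mapped.
print(hex(mmap_floor(DATA_START, 1 << 30)))

# Lowering the hard limit to 512 MiB moves the floor down by 512 MiB.
print(hex(mmap_floor(DATA_START, 512 << 20)))
```

Under this model every byte taken off the hard data limit is a byte gained for mmap(), which agrees with the observation that reducing the limit lets mmap allocate more pages.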
For comparison, for i386 tasks on amd64, the behaviour is much more reasonable. There it looks like this:

  PID      START        END PRT    RES   PRES REF SHD FLAG  TP PATH
91169   0x400000   0x401000 r--      1      4   3   1 CN--- vn /usr/home/fuz/test
91169   0x401000   0x402000 r-x      1      4   3   1 CN--- vn /usr/home/fuz/test
91169   0x402000   0x403000 r--      1      0   1   0 C---- vn /usr/home/fuz/test
91169   0x403000   0x404000 rw-      1      1   1   0 ----- df
91169 0x20403000 0x20409000 r--      6     26   3   1 CN--- vn /libexec/ld-elf32.so.1
91169 0x20409000 0x2041d000 r-x     20     26   3   1 CN--- vn /libexec/ld-elf32.so.1
91169 0x2041d000 0x2041e000 r--      1      0   2   0 C---- vn /libexec/ld-elf32.so.1
91169 0x2041e000 0x2041f000 rw-      1      0   2   0 C---- vn /libexec/ld-elf32.so.1
91169 0x2041f000 0x20442000 rw-     18     18   1   0 ----- df
91169 0x20442000 0x204b0000 r--     65    325   4   2 CN--- vn /usr/lib32/libc.so.7
91169 0x204b0000 0x205ec000 r-x    252    325   4   2 CN--- vn /usr/lib32/libc.so.7
91169 0x205ec000 0x205f1000 r--      5      0   2   0 C---- vn /usr/lib32/libc.so.7
91169 0x205f1000 0x205f2000 rw-      1      0   2   0 C---- vn /usr/lib32/libc.so.7
91169 0x205f2000 0x205f6000 rw-      4      0   1   0 C---- vn /usr/lib32/libc.so.7
91169 0x205f6000 0x20719000 rw-    276    276   1   0 ----- df
91169 0x20800000 0x48d81000 rw- 165238 165238   1   0 ----- df
91169 0x48e00000 0x9ad16000 rw- 335624 335624   1   0 ----- df
91169 0x9ae00000 0xe2068000 rw- 290755 290755   1   0 ----- df
91169 0xfbffe000 0xfffde000 ---      0      0   0   0 ----- gd
91169 0xfffde000 0xffffe000 rw-      3      3   1   0 ---D- df
91169 0xffffe000 0xfffff000 r-x      1      1 212   0 ----- ph
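The difference between the two layouts can be summarised as the window between the first mmap()ed object and the start of the stack guard. The sketch below takes those two addresses from each listing; the dictionary layout and the name `window_mib` are illustrative only.

```python
MIB = 1 << 20

# Lowest mmap()ed address (the run-time linker) and start of the stack
# guard mapping, read off the two memory maps in this report.
LAYOUTS = {
    "armv7": {"mmap_floor": 0x40040000, "guard_start": 0xBFFFE000},
    "i386": {"mmap_floor": 0x20403000, "guard_start": 0xFBFFE000},
}


def window_mib(layout):
    """Address space between the mmap floor and the stack guard, in MiB."""
    return (layout["guard_start"] - layout["mmap_floor"]) / MIB


for name, layout in LAYOUTS.items():
    print("%-5s %.1f MiB" % (name, window_mib(layout)))
```

The armv7 process gets just under 2 GiB to work with, while the i386 process gets a bit over 3.4 GiB out of the same 4 GiB address space.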
Created attachment 245855 [details]
arm64: improve UVA layout for 32bit processes

Not even compiled.
Needs #include <sys/sysctl.h> to build. Will test once my build machine is back up; it is rebooting for some reason.
With the missing include added in, I can confirm that the patch does the trick for me and fixes this issue.
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=e0b169f1d0061a31fccb6e24253011db40c5fdd7

commit e0b169f1d0061a31fccb6e24253011db40c5fdd7
Author:     Robert Clausecker <fuz@FreeBSD.org>
AuthorDate: 2023-11-03 07:50:10 +0000
Commit:     Robert Clausecker <fuz@FreeBSD.org>
CommitDate: 2023-11-03 21:16:26 +0000

    graphics/lux: unbreak on armv7

    An upcoming patch will fix the misconfiguration that restricts the
    address space for armv7 processes on arm64 to ~2GB instead of the
    ~3.5GB it should have been. With that patch applied, the port
    builds fine.

    As a temporary workaround, the following sysctls can be set to
    effect the same change (though affecting arm64 processes too):

        kern.maxssiz=67108864
        kern.maxdsiz=536870912

    armv6 stays broken as we cannot run armv6 processes on arm64
    (see PR #256132).

    PR:             274705
    MFH:            2023Q4
    See also:       https://reviews.freebsd.org/D42451

 graphics/lux/Makefile | 1 -
 1 file changed, 1 deletion(-)
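For readers wondering what the two workaround values amount to, the following sketch converts them to MiB and estimates the resulting allocation window. The estimate assumes the stack reservation ends at 0xffffe000 and the data segment starts at 0x40000, as in the armv7 map earlier in this report, and that the mmap floor is data start plus data limit; those are assumptions of this sketch, not statements from the commit.

```python
MIB = 1 << 20

# Values from the workaround in the commit message.
WORKAROUND = {
    "kern.maxssiz": 67108864,
    "kern.maxdsiz": 536870912,
}

# Assumed from the armv7 memory map in this report.
STACK_TOP = 0xFFFFE000   # end of the stack mapping
DATA_START = 0x40000     # start of the data segment


def to_mib(nbytes):
    """Convert a byte count to whole MiB."""
    return nbytes // MIB


def estimated_window(maxssiz, maxdsiz):
    """Space between the modelled mmap floor and the stack reservation."""
    guard_start = STACK_TOP - maxssiz
    mmap_floor = DATA_START + maxdsiz
    return guard_start - mmap_floor


for name, value in WORKAROUND.items():
    print("%s = %d MiB" % (name, to_mib(value)))

window = estimated_window(WORKAROUND["kern.maxssiz"], WORKAROUND["kern.maxdsiz"])
print("estimated window: %.1f MiB" % (window / MIB))
```

With a 64 MiB stack limit and a 512 MiB data limit the estimate comes out at roughly 3.4 GiB, in line with the ~3.5GB figure quoted in the commit message.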
A commit in branch 2023Q4 references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=aaef966067395755338246f68386aa44ac8acf82

commit aaef966067395755338246f68386aa44ac8acf82
Author:     Robert Clausecker <fuz@FreeBSD.org>
AuthorDate: 2023-11-03 07:50:10 +0000
Commit:     Robert Clausecker <fuz@FreeBSD.org>
CommitDate: 2023-11-03 21:24:20 +0000

    graphics/lux: unbreak on armv7

    An upcoming patch will fix the misconfiguration that restricts the
    address space for armv7 processes on arm64 to ~2GB instead of the
    ~3.5GB it should have been. With that patch applied, the port
    builds fine.

    As a temporary workaround, the following sysctls can be set to
    effect the same change (though affecting arm64 processes too):

        kern.maxssiz=67108864
        kern.maxdsiz=536870912

    armv6 stays broken as we cannot run armv6 processes on arm64
    (see PR #256132).

    PR:             274705
    MFH:            2023Q4
    See also:       https://reviews.freebsd.org/D42451

    (cherry picked from commit e0b169f1d0061a31fccb6e24253011db40c5fdd7)

 graphics/lux/Makefile | 1 -
 1 file changed, 1 deletion(-)
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=967022aa5aa60a18764a668ae0fb78e39e16fa8e

commit 967022aa5aa60a18764a668ae0fb78e39e16fa8e
Author:     Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2023-10-25 01:03:09 +0000
Commit:     Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2023-11-04 16:47:50 +0000

    arm64: improve UVA layout for 32bit processes

    Add compat.aarch32 tunables for maxssiz, maxdsiz, and maxvmem.
    Set the default values same as for amd64.
    Fix freebsd32 sysentvec on arm64 to provide sv_maxssiz, and sv_fixlimit.

    PR:     274705
    Reviewed by:    markj
    Tested by:      fuz
    Sponsored by:   The FreeBSD Foundation
    MFC after:      1 week
    Differential revision:  https://reviews.freebsd.org/D42451

 sys/arm64/arm64/elf32_machdep.c | 54 +++++++++++++++++++++++++++++++++++++++--
 1 file changed, 52 insertions(+), 2 deletions(-)
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=0dc8af9dae2ca5419a0d3313d0dcb42c6b5d6d38

commit 0dc8af9dae2ca5419a0d3313d0dcb42c6b5d6d38
Author:     Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2023-10-25 01:03:09 +0000
Commit:     Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2023-11-10 12:15:46 +0000

    arm64: improve UVA layout for 32bit processes

    PR:     274705

    (cherry picked from commit 967022aa5aa60a18764a668ae0fb78e39e16fa8e)

 sys/arm64/arm64/elf32_machdep.c | 54 +++++++++++++++++++++++++++++++++++++++--
 1 file changed, 52 insertions(+), 2 deletions(-)
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=9e1efa0f88356747d88e310209e57ba8f689fa4e

commit 9e1efa0f88356747d88e310209e57ba8f689fa4e
Author:     Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2023-10-25 01:03:09 +0000
Commit:     Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2023-11-11 00:40:25 +0000

    arm64: improve UVA layout for 32bit processes

    PR:     274705

    (cherry picked from commit 967022aa5aa60a18764a668ae0fb78e39e16fa8e)

 sys/arm64/arm64/elf32_machdep.c | 54 +++++++++++++++++++++++++++++++++++++++--
 1 file changed, 52 insertions(+), 2 deletions(-)
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=12e7fdc3d5e4ef441fe41ad46978d4940ff991b4

commit 12e7fdc3d5e4ef441fe41ad46978d4940ff991b4
Author:     Robert Clausecker <fuz@FreeBSD.org>
AuthorDate: 2024-04-21 21:13:59 +0000
Commit:     Robert Clausecker <fuz@FreeBSD.org>
CommitDate: 2024-05-06 19:07:49 +0000

    misc/openvdb: builds fine in an armv7 jail on arm64

    The process limits have been revised in a fix to PR 274705,
    making this port build fine.

    PR:             274705
    MFH:            2024Q2
    Approved by:    portmgr (build fix blanket)

 misc/openvdb/Makefile | 2 --
 1 file changed, 2 deletions(-)
A commit in branch 2024Q2 references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=643a47d64f9f777ebd9b1900928842c21865f616

commit 643a47d64f9f777ebd9b1900928842c21865f616
Author:     Robert Clausecker <fuz@FreeBSD.org>
AuthorDate: 2024-04-21 21:13:59 +0000
Commit:     Robert Clausecker <fuz@FreeBSD.org>
CommitDate: 2024-05-06 20:18:29 +0000

    misc/openvdb: builds fine in an armv7 jail on arm64

    The process limits have been revised in a fix to PR 274705,
    making this port build fine.

    PR:             274705
    MFH:            2024Q2
    Approved by:    portmgr (build fix blanket)

    (cherry picked from commit 12e7fdc3d5e4ef441fe41ad46978d4940ff991b4)

 misc/openvdb/Makefile | 2 --
 1 file changed, 2 deletions(-)