I think it should return a value equal or slightly smaller than RLIMIT_STACK instead. It does so on Ubuntu at least.

Apparently, Mono (most notably used in the popular Unity game engine) relies on this for setting stack guards:
https://github.com/mono/mono/blob/da11592cbea4269971f4b1f9624769a85cc10660/mono/utils/mono-threads-linux.c#L13-L38,
https://github.com/mono/mono/blob/43190aeb5f7e4d7e0185d3b656054bf232219fe2/mono/mini/mini-exceptions.c#L3160-L3175.

Reproducer:

% uname -a
FreeBSD desktop 12.2-RELEASE-p1 FreeBSD 12.2-RELEASE-p1 GENERIC amd64

% cat apparent_stack_size.c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h>

int main() {

  char cmd[100];
  snprintf(cmd, sizeof(cmd), "cat /proc/%d/maps | tail -n 5", getpid());
  system(cmd);

  size_t size = 0;
  void*  addr = NULL;

  pthread_attr_t attr;
  assert(pthread_attr_init(&attr) == 0);
  assert(pthread_getattr_np(pthread_self(), &attr) == 0);
  assert(pthread_attr_getstack(&attr, &addr, &size) == 0);
  assert(pthread_attr_destroy(&attr) == 0);

  fprintf(stderr, "stack size = %zd\n", size);

  return 0;
}

% /compat/linux/bin/cc apparent_stack_size.c -pthread -o test
% ./test
00000008011c7000-00000008011c9000 rw-p 0038a000 00:00 391497 /compat/linux/usr/lib64/libc-2.17.so
00000008011c9000-00000008011ce000 rw-p 00000000 00:00 0
00007fffdffff000-00007ffffffdf000 ---p 00000000 00:00 0
00007ffffffdf000-00007ffffffff000 rw-p 00000000 00:00 0 [stack]
00007ffffffff000-0000800000000000 r-xs 00000000 00:00 0 [vdso]
stack size = 126976

As it happens, glibc reads /proc/self/maps and compares the stack entry to the preceding entry. You know, just in case:

  /* The limit might be too high. */
  if ((size_t) iattr->stacksize > (size_t) iattr->stackaddr - last_to)
    iattr->stacksize = (size_t) iattr->stackaddr - last_to;

(https://sourceware.org/git/?p=glibc.git;a=blob;f=nptl/pthread_getattr_np.c;h=25807cb529880d67a6561b6ebcd45042e89dea3e;hb=HEAD#l144)
What's wrong here? 00007ffffffdf000-00007ffffffff000 is 128kB; the reported pthread value is pretty close to that. I.e., what about this triggers a bad behavior in Mono?
(In reply to Conrad Meyer from comment #1)
> 00007ffffffdf000-00007ffffffff000 is 128kB

Yeah, sorry. In fact this is the case on Linux as well. It's just that pthread_getattr_np is not supposed to be return an actual allocated value, but rather how much stack can grow. (Well, probably, I'm exactly sure if it's specified anywhere.)
(In reply to Alex S from comment #2)
s/be//
s/exactly sure/not exactly sure/
Here's how this looks on Ubuntu:

xubuntu@xubuntu:~$ uname -a
Linux xubuntu 5.4.0-42-generic #46-Ubuntu SMP Fri Jul 10 00:24:02 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
xubuntu@xubuntu:~$ gcc apparent_stack_size.c -pthread -o test
xubuntu@xubuntu:~$ ./test
7feaba441000-7feaba442000 rw-p 00000000 00:00 0
7ffca0aef000-7ffca0b10000 rw-p 00000000 00:00 0 [stack]
7ffca0b6b000-7ffca0b6e000 r--p 00000000 00:00 0 [vvar]
7ffca0b6e000-7ffca0b6f000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 --xp 00000000 00:00 0 [vsyscall]
stack size = 8384512
xubuntu@xubuntu:~$ ulimit -s 3000
xubuntu@xubuntu:~$ ./test
7f07aa268000-7f07aa269000 rw-p 00000000 00:00 0
7ffc606ee000-7ffc6070f000 rw-p 00000000 00:00 0 [stack]
7ffc60765000-7ffc60768000 r--p 00000000 00:00 0 [vvar]
7ffc60768000-7ffc60769000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 --xp 00000000 00:00 0 [vsyscall]
stack size = 3063808
Ok, here's what's going on:

Our rlimit value (cur) is fine; same as glibc (8MB).

Glibc is parsing /proc/self/maps to limit the pthread "stack size" based on adjacent mappings, *which it assumes cannot be part of the stack*.

In FreeBSD, we actually have explicit no-access mapping for the region the stack can grow into (rlim_max):

00007fffdffff000-00007ffffffdf000 ---p 00000000 00:00 0
00007ffffffdf000-00007ffffffff000 rw-p 00000000 00:00 0 [stack]

I.e., that earlier mapping also corresponds to the stack. Linux doesn't do this, or doesn't report it in /proc/self/maps.
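To make the arithmetic concrete, here is a minimal C model of the size computation. Only the final clamp is taken from the glibc snippet quoted earlier in this thread; the preceding two steps, the function name and all parameter names are illustrative assumptions, not glibc's actual code. In particular, placing stack_end one page below the end of the [stack] entry is an assumption, chosen because it reproduces the reported 126976.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified, hypothetical model of the initial-thread stack size
 * computation. Names are illustrative, not glibc's.
 *   rlim_cur   soft RLIMIT_STACK in bytes
 *   stack_end  page-aligned top of the stack in use (assumed input)
 *   to         end address of the [stack] entry in /proc/self/maps
 *   last_to    end address of the entry preceding [stack]
 */
uint64_t
model_stack_size(uint64_t rlim_cur, uint64_t stack_end, uint64_t to,
    uint64_t last_to, uint64_t page_size)
{
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	assert(last_to <= stack_end && stack_end <= to);
	assert(to - stack_end <= rlim_cur);

	/* Start from the rlimit, minus what lies above stack_end. */
	uint64_t size = rlim_cur - (to - stack_end);

	/* Round down to a page boundary. */
	size &= ~(page_size - 1);

	/* "The limit might be too high": clamp to the previous mapping. */
	if (size > stack_end - last_to)
		size = stack_end - last_to;

	return size;
}
```

Under these assumptions, feeding in the FreeBSD mappings with the --- entry as the preceding one gives 126976, while using the entry before it (ending at 00000008011ce000) leaves the rlimit-derived 8384512 untouched, which is the value Ubuntu reports.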
(In reply to Conrad Meyer from comment #5) Yep, that's pretty much it. Do you think it makes sense to just hide that entry?
In exec, we map the stack with vm_map_stack() with rlim_cur (I think); in vm_map_stack, we set the init_ssize with MIN(sysctl kern.sgrowsiz, rlim_cur). kern.sgrowsiz is 128kB.

There's a comment about the behavior in vm/vm_map.c:4565. At line 4585 we insert the normal stack mapping. At line 4599 we insert the reservation for the unallocated portion of the stack with no access (---).

We won't insert the --- mapping if gap_bot == gap_top, which I think only happens if kern.sgrowsiz happens to match the stack rlim_cur.

So... a crappy workaround here might be to set kern.sgrowsiz to 8MB. Obviously, that's system-wide, and doesn't chase rlim_cur. I'm not sure of the ramifications. I don't think this actually faults in backing physical memory pages, and both RW- and no-prot (---) pages consume the same amount of virtual memory. So it might be pretty harmless.
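The split described above can be sketched as a small C model. This is not the kernel code: the struct, the function and the parameter names are hypothetical, and the 512 MiB total reservation used in the usage note is an assumption inferred from the combined size of the two entries in the reproducer's output.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of the two stack-related map entries. */
struct stack_layout {
	uint64_t gap_bot;   /* start of the no-access (---) reservation */
	uint64_t gap_top;   /* end of the reservation, start of the rw stack */
	uint64_t stack_top; /* end of the rw stack mapping */
};

/*
 * Model of the split: the top init_ssize bytes are mapped read-write,
 * the rest of the reservation below them gets no access.
 */
struct stack_layout
model_stack_layout(uint64_t stack_top, uint64_t max_ssize, uint64_t sgrowsiz)
{
	struct stack_layout l;
	/* init_ssize = MIN(sgrowsiz, max_ssize) */
	uint64_t init_ssize = sgrowsiz < max_ssize ? sgrowsiz : max_ssize;

	assert(max_ssize <= stack_top);

	l.stack_top = stack_top;
	l.gap_top = stack_top - init_ssize;
	l.gap_bot = stack_top - max_ssize;
	return l;
}
```

With stack_top = 0x7ffffffff000, a 512 MiB reservation and sgrowsiz = 128 kB, the model yields gap_bot = 0x7fffdffff000 and gap_top = 0x7ffffffdf000, matching the two entries in the reproducer. In this model gap_bot equals gap_top only once sgrowsiz reaches the size of the whole reservation.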
Sure, we could also just try to hide that entry. I'm honestly not sure what the point of the explicit --- mapping is.
I think this may have been introduced in r320317 (19bd0d9c85cc): Implement address space guards.
(In reply to Conrad Meyer from comment #8)

I noticed neither Linux nor FreeBSD actually bothers with accurate stack mappings in /proc/self/maps. About multiple threads Linux's documentation plainly states "[stack:<tid>] (from Linux 3.4 to 4.4) … This field was removed in Linux 4.5, since providing this information for a process with large numbers of threads is expensive." However, this looks a bit fishy even with a basic single-threaded test:

xubuntu@xubuntu:~$ cat address.c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main() {
  int i = 1;
  fprintf(stderr, "[[%p]]\n", &i);
  system("cat /proc/self/maps | tail -n 5");
  return 0;
}
xubuntu@xubuntu:~$ gcc address.c -o test
xubuntu@xubuntu:~$ ./test
[[0x7ffe7a0cbca4]]
7f4f3c7af000-7f4f3c7b0000 rw-p 00000000 00:00 0
7ffe37e2e000-7ffe37e4f000 rw-p 00000000 00:00 0 [stack]
7ffe37fd0000-7ffe37fd3000 r--p 00000000 00:00 0 [vvar]
7ffe37fd3000-7ffe37fd4000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 --xp 00000000 00:00 0 [vsyscall]

I mean, 0x7ffe7a0cbca4 is right between [vdso] and [vsyscall]. Since apparently we are already in the pure fantasy land, it's probably not a big deal to mess this a bit more.
You need to use /proc/getpid()/maps — your most recent demo is showing cat’s memory map.
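One way to sidestep the problem is to not spawn cat at all: inside the process, /proc/self/maps refers to the process itself. The following is a minimal sketch of that approach; both function names are made up for illustration, and it assumes each maps line starts with the usual "from-to" hex range.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Return true if addr falls inside the "from-to" range of one maps line. */
bool
maps_line_contains(const char *line, uint64_t addr)
{
	uint64_t from, to;

	assert(line != NULL);
	if (sscanf(line, "%" SCNx64 "-%" SCNx64, &from, &to) != 2)
		return false;
	return from <= addr && addr < to;
}

/*
 * Scan the calling process's own maps. The file is opened directly,
 * so "self" resolves to this process rather than to a child like cat.
 */
bool
own_maps_contain(const void *p)
{
	char line[512];
	bool found = false;
	FILE *f = fopen("/proc/self/maps", "r");

	if (f == NULL)
		return false;
	while (!found && fgets(line, sizeof(line), f) != NULL)
		found = maps_line_contains(line, (uint64_t)(uintptr_t)p);
	fclose(f);
	return found;
}
```

Calling own_maps_contain(&i) for a local variable i would then check the address against the caller's own mappings, whereas the address printed in the previous comment falls outside the [stack] line printed next to it because that line belonged to cat.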
(In reply to Conrad Meyer from comment #11) Rampant copy-pasting finally got me, ROFL.
(In reply to Alex S from comment #12) OK, just for completeness, FreeBSD — pretty accurate, Linux — only for initial thread: https://gist.github.com/shkhln/af421368a36727926ad9103e8f59b455.
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=ef1976ccf5420d0912afcb49733c7a88643069da

commit ef1976ccf5420d0912afcb49733c7a88643069da
Author:     Dmitry Chagin <dchagin@FreeBSD.org>
AuthorDate: 2022-06-22 11:49:40 +0000
Commit:     Dmitry Chagin <dchagin@FreeBSD.org>
CommitDate: 2022-06-22 11:49:40 +0000

    linprocfs: Skip printing of the guard page in the /proc/self/maps

    To calculate the base (lowest addressable) address of the stack of the initial thread glibc parses /proc/self/maps. In fact, the base address is calculated as 'to' value of stack entry of the /proc/self/maps - stack size limit (if the stack grows down). The base address should fit in between preceding entry and stack entry of the /proc/self/maps.

    In FreeBSD, since 19bd0d9 (Implement address space guards), we actually have two mappings for the stack region. The first one is the no-access mapping for the region the stack can grow into (guard page), and the second - initial stack region with size sgrowsiz. The first mapping confuses Glibc, in the end which is improperly calculate stack size and the base address.

    PR:                     253337
    Reviewed by:            kib
    Differential revision:  https://reviews.freebsd.org/D35537
    MFC after:              2 week

 sys/compat/linprocfs/linprocfs.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=b1d0fe755bb11be36d97885dbab0ac66aabb5877

commit b1d0fe755bb11be36d97885dbab0ac66aabb5877
Author:     Dmitry Chagin <dchagin@FreeBSD.org>
AuthorDate: 2022-06-22 11:49:40 +0000
Commit:     Dmitry Chagin <dchagin@FreeBSD.org>
CommitDate: 2022-07-06 11:02:15 +0000

    linprocfs: Skip printing of the guard page in the /proc/self/maps

    To calculate the base (lowest addressable) address of the stack of the initial thread glibc parses /proc/self/maps. In fact, the base address is calculated as 'to' value of stack entry of the /proc/self/maps - stack size limit (if the stack grows down). The base address should fit in between preceding entry and stack entry of the /proc/self/maps.

    In FreeBSD, since 19bd0d9 (Implement address space guards), we actually have two mappings for the stack region. The first one is the no-access mapping for the region the stack can grow into (guard page), and the second - initial stack region with size sgrowsiz. The first mapping confuses Glibc, in the end which is improperly calculate stack size and the base address.

    PR:                     253337
    Reviewed by:            kib
    Differential revision:  https://reviews.freebsd.org/D35537
    MFC after:              2 week

    (cherry picked from commit ef1976ccf5420d0912afcb49733c7a88643069da)

 sys/compat/linprocfs/linprocfs.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)
in stable/13