Bug 253337 - Linuxulator: glibc's pthread_getattr_np reports stack size as 124K
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: Unspecified
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: Dmitry Chagin
Blocks: 247219
Reported: 2021-02-08 06:42 UTC by Alex S
Modified: 2022-07-06 11:08 UTC
Description Alex S 2021-02-08 06:42:37 UTC
I think it should return a value equal to or slightly smaller than
RLIMIT_STACK instead. It does so on Ubuntu at least.

Apparently, Mono (most notably used in the popular Unity game engine)
relies on this for setting stack guards:
https://github.com/mono/mono/blob/da11592cbea4269971f4b1f9624769a85cc10660/mono/utils/mono-threads-linux.c#L13-L38,
https://github.com/mono/mono/blob/43190aeb5f7e4d7e0185d3b656054bf232219fe2/mono/mini/mini-exceptions.c#L3160-L3175.

Reproducer:
% uname -a
FreeBSD desktop 12.2-RELEASE-p1 FreeBSD 12.2-RELEASE-p1 GENERIC  amd64
% cat apparent_stack_size.c
#define _GNU_SOURCE

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h>

int main() {
  char cmd[100];
  snprintf(cmd, sizeof(cmd), "cat /proc/%d/maps | tail -n 5", getpid());
  system(cmd);

  size_t size = 0;
  void*  addr = NULL;

  pthread_attr_t attr;
  assert(pthread_attr_init(&attr) == 0);
  assert(pthread_getattr_np(pthread_self(), &attr) == 0);
  assert(pthread_attr_getstack(&attr, &addr, &size) == 0);
  assert(pthread_attr_destroy(&attr) == 0);

  fprintf(stderr, "stack size = %zu\n", size);

  return 0;
}
% /compat/linux/bin/cc apparent_stack_size.c -pthread -o test
% ./test
00000008011c7000-00000008011c9000 rw-p 0038a000 00:00 391497     /compat/linux/usr/lib64/libc-2.17.so
00000008011c9000-00000008011ce000 rw-p 00000000 00:00 0
00007fffdffff000-00007ffffffdf000 ---p 00000000 00:00 0
00007ffffffdf000-00007ffffffff000 rw-p 00000000 00:00 0           [stack]
00007ffffffff000-0000800000000000 r-xs 00000000 00:00 0           [vdso]
stack size = 126976

As it happens, glibc reads /proc/self/maps and compares the stack entry
to the preceding entry. You know, just in case:

  /* The limit might be too high.  */
  if ((size_t) iattr->stacksize
      > (size_t) iattr->stackaddr - last_to)
    iattr->stacksize = (size_t) iattr->stackaddr - last_to;

(https://sourceware.org/git/?p=glibc.git;a=blob;f=nptl/pthread_getattr_np.c;h=25807cb529880d67a6561b6ebcd45042e89dea3e;hb=HEAD#l144)
Comment 1 Conrad Meyer freebsd_committer freebsd_triage 2021-02-08 07:59:58 UTC
What's wrong here?  00007ffffffdf000-00007ffffffff000 is 128kB; the reported pthread value is pretty close to that.  I.e., what about this triggers a bad behavior in Mono?
Comment 2 Alex S 2021-02-08 08:11:52 UTC
(In reply to Conrad Meyer from comment #1)

> 00007ffffffdf000-00007ffffffff000 is 128kB

Yeah, sorry. In fact this is the case on Linux as well. It's just that pthread_getattr_np is not supposed to be return an actual allocated value, but rather how much stack can grow. (Well, probably, I'm exactly sure if it's specified anywhere.)
Comment 3 Alex S 2021-02-08 08:12:47 UTC
(In reply to Alex S from comment #2)

s/be//
s/exactly sure/not exactly sure/
Comment 4 Alex S 2021-02-08 08:31:58 UTC
Here's how this looks on Ubuntu:

xubuntu@xubuntu:~$ uname -a
Linux xubuntu 5.4.0-42-generic #46-Ubuntu SMP Fri Jul 10 00:24:02 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
xubuntu@xubuntu:~$ gcc apparent_stack_size.c -pthread -o test
xubuntu@xubuntu:~$ ./test
7feaba441000-7feaba442000 rw-p 00000000 00:00 0 
7ffca0aef000-7ffca0b10000 rw-p 00000000 00:00 0                          [stack]
7ffca0b6b000-7ffca0b6e000 r--p 00000000 00:00 0                          [vvar]
7ffca0b6e000-7ffca0b6f000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 --xp 00000000 00:00 0                  [vsyscall]
stack size = 8384512
xubuntu@xubuntu:~$ ulimit -s 3000
xubuntu@xubuntu:~$ ./test
7f07aa268000-7f07aa269000 rw-p 00000000 00:00 0 
7ffc606ee000-7ffc6070f000 rw-p 00000000 00:00 0                          [stack]
7ffc60765000-7ffc60768000 r--p 00000000 00:00 0                          [vvar]
7ffc60768000-7ffc60769000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 --xp 00000000 00:00 0                  [vsyscall]
stack size = 3063808
Comment 5 Conrad Meyer freebsd_committer freebsd_triage 2021-02-13 16:44:49 UTC
Ok, here's what's going on:

Our rlimit value (cur) is fine; same as glibc (8MB).

Glibc is parsing /proc/self/maps to limit the pthread "stack size" based on adjacent mappings, *which it assumes cannot be part of the stack*.  In FreeBSD, we actually have an explicit no-access mapping for the region the stack can grow into (rlim_max):

00007fffdffff000-00007ffffffdf000 ---p 00000000 00:00 0
00007ffffffdf000-00007ffffffff000 rw-p 00000000 00:00 0           [stack]

I.e., that earlier mapping also corresponds to the stack.

Linux doesn't do this, or doesn't report it in /proc/self/maps.
Comment 6 Alex S 2021-02-13 16:54:50 UTC
(In reply to Conrad Meyer from comment #5)

Yep, that's pretty much it. Do you think it makes sense to just hide that entry?
Comment 7 Conrad Meyer freebsd_committer freebsd_triage 2021-02-13 17:16:47 UTC
In exec, we map the stack with vm_map_stack() with rlim_cur (I think); in vm_map_stack, we set init_ssize to MIN(sysctl kern.sgrowsiz, rlim_cur).

kern.sgrowsiz is 128kB.

There's a comment about the behavior in vm/vm_map.c:4565.  At line 4585 we insert the normal stack mapping.  At line 4599 we insert the reservation for the unallocated portion of the stack with no access (---).

We won't insert the --- mapping if gap_bot == gap_top, which I think only happens if kern.sgrowsiz happens to match the stack limit's rlim_cur.

So... a crappy workaround here might be to set kern.sgrowsiz to 8MB.  Obviously, that's system-wide, and doesn't chase rlim_cur.  I'm not sure of the ramifications.  I don't think this actually faults in backing physical memory pages, and both RW- and no-prot (---) pages consume the same amount of virtual memory.  So it might be pretty harmless.
Comment 8 Conrad Meyer freebsd_committer freebsd_triage 2021-02-13 17:17:32 UTC
Sure, we could also just try to hide that entry.  I'm honestly not sure what the point of the explicit --- mapping is.
Comment 9 Conrad Meyer freebsd_committer freebsd_triage 2021-02-13 17:19:49 UTC
I think this may have been introduced in r320317 (19bd0d9c85cc):
Implement address space guards.
Comment 10 Alex S 2021-02-13 18:23:53 UTC
(In reply to Conrad Meyer from comment #8)

I noticed neither Linux nor FreeBSD actually bothers with accurate
stack mappings in /proc/self/maps. Regarding multiple threads, Linux's
documentation plainly states "[stack:<tid>] (from Linux 3.4 to 4.4) …
This field was removed in Linux 4.5, since providing this information
for a process with large numbers of threads is expensive."

However, this looks a bit fishy even with a basic single-threaded test:

xubuntu@xubuntu:~$ cat address.c 
#define _GNU_SOURCE

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main() {
  int i = 1;
  fprintf(stderr, "[[%p]]\n", &i);
  system("cat /proc/self/maps | tail -n 5");
  return 0;
}
xubuntu@xubuntu:~$ gcc address.c -o test
xubuntu@xubuntu:~$ ./test
[[0x7ffe7a0cbca4]]
7f4f3c7af000-7f4f3c7b0000 rw-p 00000000 00:00 0 
7ffe37e2e000-7ffe37e4f000 rw-p 00000000 00:00 0                          [stack]
7ffe37fd0000-7ffe37fd3000 r--p 00000000 00:00 0                          [vvar]
7ffe37fd3000-7ffe37fd4000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 --xp 00000000 00:00 0                  [vsyscall]

I mean, 0x7ffe7a0cbca4 is right between [vdso] and [vsyscall].

Since apparently we are already in the pure fantasy land,
it's probably not a big deal to mess with this a bit more.
Comment 11 Conrad Meyer freebsd_committer freebsd_triage 2021-02-13 19:02:19 UTC
You need to use /proc/getpid()/maps — your most recent demo is showing cat’s memory map.
Comment 12 Alex S 2021-02-13 19:04:27 UTC
(In reply to Conrad Meyer from comment #11)

Rampant copy-pasting finally got me, ROFL.
Comment 13 Alex S 2021-02-13 19:38:11 UTC
(In reply to Alex S from comment #12)

OK, just for completeness, FreeBSD — pretty accurate, Linux — only for initial thread: https://gist.github.com/shkhln/af421368a36727926ad9103e8f59b455.
Comment 14 commit-hook freebsd_committer freebsd_triage 2022-06-22 11:51:38 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=ef1976ccf5420d0912afcb49733c7a88643069da

commit ef1976ccf5420d0912afcb49733c7a88643069da
Author:     Dmitry Chagin <dchagin@FreeBSD.org>
AuthorDate: 2022-06-22 11:49:40 +0000
Commit:     Dmitry Chagin <dchagin@FreeBSD.org>
CommitDate: 2022-06-22 11:49:40 +0000

    linprocfs: Skip printing of the guard page in the /proc/self/maps

    To calculate the base (lowest addressable) address of the stack of the
    initial thread glibc parses /proc/self/maps.
    In fact, the base address is calculated as 'to' value of stack entry of the
    /proc/self/maps - stack size limit (if the stack grows down).
    The base address should fit in between preceding entry and stack entry of
    the /proc/self/maps.
    In FreeBSD, since 19bd0d9 (Implement address space guards), we actually
    have two mappings for the stack region. The first one is the no-access
    mapping for the region the stack can grow into (guard page), and the
    second - initial stack region with size sgrowsiz.
    The first mapping confuses Glibc, in the end which is improperly
    calculate stack size and the base address.

    PR:                     253337
    Reviewed by:            kib
    Differential revision:  https://reviews.freebsd.org/D35537
    MFC after:              2 week

 sys/compat/linprocfs/linprocfs.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)
Comment 15 commit-hook freebsd_committer freebsd_triage 2022-07-06 11:04:34 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=b1d0fe755bb11be36d97885dbab0ac66aabb5877

commit b1d0fe755bb11be36d97885dbab0ac66aabb5877
Author:     Dmitry Chagin <dchagin@FreeBSD.org>
AuthorDate: 2022-06-22 11:49:40 +0000
Commit:     Dmitry Chagin <dchagin@FreeBSD.org>
CommitDate: 2022-07-06 11:02:15 +0000

    linprocfs: Skip printing of the guard page in the /proc/self/maps

    To calculate the base (lowest addressable) address of the stack of the
    initial thread glibc parses /proc/self/maps.
    In fact, the base address is calculated as 'to' value of stack entry of the
    /proc/self/maps - stack size limit (if the stack grows down).
    The base address should fit in between preceding entry and stack entry of
    the /proc/self/maps.
    In FreeBSD, since 19bd0d9 (Implement address space guards), we actually
    have two mappings for the stack region. The first one is the no-access
    mapping for the region the stack can grow into (guard page), and the
    second - initial stack region with size sgrowsiz.
    The first mapping confuses Glibc, in the end which is improperly
    calculate stack size and the base address.

    PR:                     253337
    Reviewed by:            kib
    Differential revision:  https://reviews.freebsd.org/D35537
    MFC after:              2 week

    (cherry picked from commit ef1976ccf5420d0912afcb49733c7a88643069da)

 sys/compat/linprocfs/linprocfs.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)
Comment 16 Dmitry Chagin freebsd_committer freebsd_triage 2022-07-06 11:08:39 UTC
in stable/13