A complex application run in limited address space eventually fails to allocate a mutex. pthread_mutex_init normally calls __thr_calloc which calls __crt_calloc in rtld-malloc.c. When out of memory __crt_calloc calls morecore, which always adds NPOOLPAGES (128K) to the requested size (circa 128 bytes for a mutex). This way pthread_mutex_init and others will eventually fail when the address space is both limited and fragmented. A simple fix will be to retry morepages(amt/pagesz) every time morepages(amt/pagesz + NPOOLPAGES) fails.
Yes, I think something along these lines should help. Can you confirm? diff --git a/libexec/rtld-elf/rtld_malloc.c b/libexec/rtld-elf/rtld_malloc.c index 6604aa7201f8..61c18abb0d49 100644 --- a/libexec/rtld-elf/rtld_malloc.c +++ b/libexec/rtld-elf/rtld_malloc.c @@ -184,7 +184,9 @@ morecore(int bucket) nblks = 1; } if (amt > pagepool_end - pagepool_start) - if (morepages(amt/pagesz + NPOOLPAGES) == 0) + if (morepages(amt / pagesz + NPOOLPAGES) == 0 && + /* Retry with min required size */ + morepages(amt / pagesz) == 0) return; op = (union overhead *)pagepool_start; pagepool_start += amt;
Created attachment 228635 [details] 0 mutexes bug
(In reply to Konstantin Belousov from comment #1) Sorry, I cannot confirm, but after some more testing I think that fixing rtld won't help much. Please have a look at the attached program. On 11.2-RELEASE it prints alloc 51712 Kbytes freed 51712 Kbytes 612224 mutexes: 76528 Kbytes Whereas on 13.0-RELEASE the result is alloc 45312 Kbytes freed 45312 Kbytes 0 mutexes: 0 Kbytes (It MUST be compiled with -pthread) Apparently jemalloc does not unmap free pages on the second system, which is where I noticed pthread_mutex_init failures.
So there are two unrelated issues. One is jemalloc, and there I could not help, ask jemalloc developers perhaps. Another *is* in crt_malloc, and my fixed patch somewhat helps your problem: solo% LD_LIBRARY_PATH=/usr/obj/usr/home/kostik/work/DEV/src/amd64.amd64/lib/libthr ktrace -f /tmp/ktrace.out ./jemalloc_limit alloc 51200 Kbytes freed 51200 Kbytes jemalloc_limit: pthread_mutex_init: Cannot allocate memory 320 mutexes: 40 Kbytes I suspect you rebuilt rtld instead of libthr. The fixed patch is available at https://reviews.freebsd.org/D32474
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=19e008e7f79ce55182d227be8513b3fa520471d8 commit 19e008e7f79ce55182d227be8513b3fa520471d8 Author: Konstantin Belousov <kib@FreeBSD.org> AuthorDate: 2021-10-11 16:13:31 +0000 Commit: Konstantin Belousov <kib@FreeBSD.org> CommitDate: 2021-10-12 23:37:09 +0000 crt_malloc: Be more persistent when handling mmap() failure In the situation with limited address space, together with fragmentation, it is possible for mmap() request in morecore() to fail when asking for required size + NPOOLPAGES, but succeed without the addend. Retry allocation there. PR: 259076 Reported by: Denis Koreshkov <dynamic-wind@mail.ru> Reviewed by: arichardson Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32474 libexec/rtld-elf/rtld_malloc.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
All right, here's a worse bug in rtld-malloc. Once morepages has failed, it sets pagepool_start to MAP_FAILED, that is -1. If the next call to __crt_malloc is for the same bucket, morecore is called again but does not call morepages, because (amt > pagepool_end - pagepool_start) is TRUE. And the free list loop crashes. A quick test (with -lpthread, release >= 12.1): #include <sys/time.h> #include <sys/resource.h> #include <pthread.h> int main() { pthread_mutex_t t; int lim = 1<<26; struct rlimit rl = { lim,lim }; setrlimit(RLIMIT_AS,&rl); for(;;) { if (pthread_mutex_init(&t,NULL)) { pthread_mutex_init(&t,NULL); break; } } return 0; }
(In reply to Denis Koreshkov from comment #6) This was fixed by 19e008e7f79ce55182d227be8513b3fa520471d8 as well.
(In reply to Konstantin Belousov from comment #7) If you meant these lines, they do not help in the case mmap returns MAP_FAILED. + if (pagepool_start == MAP_FAILED) + pagepool_start = 0; offset = (uintptr_t)pagepool_start - rounddown2( (uintptr_t)pagepool_start, pagesz); pagepool_start = mmap(0, n * pagesz, PROT_READ | PROT_WRITE, ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(In reply to Denis Koreshkov from comment #8) Why do not they help? When mmap(2) returns MMAP_FAILED, morepages() returns 0. Then, morecore would not touch pagepool_start. Hmm, It is probably wrong that pagepool_end is not reset there. I think I can agree to some part of your note, that we should e.g. set both pagepool_start and pagepool_end to 0 to avoid munmap(2)-ing something not own by us on the next allocation attempt. Like this (didn't tested) diff --git a/libexec/rtld-elf/rtld_malloc.c b/libexec/rtld-elf/rtld_malloc.c index 64218b5bb786..54159bec8e4e 100644 --- a/libexec/rtld-elf/rtld_malloc.c +++ b/libexec/rtld-elf/rtld_malloc.c @@ -271,21 +271,21 @@ morepages(int n) } } - if (pagepool_start == MAP_FAILED) - pagepool_start = 0; offset = (uintptr_t)pagepool_start - rounddown2( (uintptr_t)pagepool_start, pagesz); - pagepool_start = mmap(0, n * pagesz, PROT_READ | PROT_WRITE, + addr = mmap(0, n * pagesz, PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE, -1, 0); - if (pagepool_start == MAP_FAILED) { + if (addr == MAP_FAILED) { #ifdef IN_RTLD rtld_fdprintf(STDERR_FILENO, _BASENAME_RTLD ": morepages: " "cannot mmap anonymous memory: %s\n", rtld_strerror(errno)); #endif + pagepool_start = pagepool_end = 0; return (0); } + pagepool_start = addr; pagepool_end = pagepool_start + n * pagesz; pagepool_start += offset;
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=73dddffc3175581ba99f6ced9a2e508a0e880e59 commit 73dddffc3175581ba99f6ced9a2e508a0e880e59 Author: Konstantin Belousov <kib@FreeBSD.org> AuthorDate: 2021-10-15 17:59:37 +0000 Commit: Konstantin Belousov <kib@FreeBSD.org> CommitDate: 2021-10-18 22:02:47 +0000 crt_malloc: more accurate handling of mmap(2) failure Reset both pagepool_start and pagepool_end after a mmap(2) failure, to avoid using invalid pagepool either for allocation or munmap(2). PR: 259076 Noted by: Denis Koreshkov <dynamic-wind@mail.ru> Reviewed by: arichardson Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32514 libexec/rtld-elf/rtld_malloc.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=e5a8b8152ade3c658a9a180f0bd2f6ae82302948 commit e5a8b8152ade3c658a9a180f0bd2f6ae82302948 Author: Konstantin Belousov <kib@FreeBSD.org> AuthorDate: 2021-10-11 16:13:31 +0000 Commit: Konstantin Belousov <kib@FreeBSD.org> CommitDate: 2021-10-22 23:32:27 +0000 crt_malloc: Be more persistent when handling mmap() failure PR: 259076 (cherry picked from commit 19e008e7f79ce55182d227be8513b3fa520471d8) libexec/rtld-elf/rtld_malloc.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=00df149452bfd1b9eee9acba842bb6bc874c5ae3 commit 00df149452bfd1b9eee9acba842bb6bc874c5ae3 Author: Konstantin Belousov <kib@FreeBSD.org> AuthorDate: 2021-10-15 17:59:37 +0000 Commit: Konstantin Belousov <kib@FreeBSD.org> CommitDate: 2021-10-22 23:32:27 +0000 crt_malloc: more accurate handling of mmap(2) failure PR: 259076 (cherry picked from commit 73dddffc3175581ba99f6ced9a2e508a0e880e59) libexec/rtld-elf/rtld_malloc.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)