Bug 200138 - [PATCH] Fixed per-thread 'specific' array allocation conflict incurred in libthr by some foreign malloc libraries
Summary: [PATCH] Fixed per-thread 'specific' array allocation conflict incurred in lib...
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: threads (show other bugs)
Version: 10.1-STABLE
Hardware: Any Any
: --- Affects Only Me
Assignee: freebsd-threads (Nobody)
URL:
Keywords: patch
Depends on:
Blocks:
 
Reported: 2015-05-12 02:08 UTC by Yuri Victorovich
Modified: 2019-01-21 17:53 UTC (History)
3 users (show)

See Also:


Attachments
patch (1.04 KB, patch)
2015-05-12 02:08 UTC, Yuri Victorovich
no flags Details | Diff
Allocate specific together with the struct thread, avoids calloc call. (9.04 KB, patch)
2015-05-12 11:13 UTC, Konstantin Belousov
no flags Details | Diff
Use mmap for allocator (9.12 KB, patch)
2015-05-12 12:56 UTC, Konstantin Belousov
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Yuri Victorovich freebsd_committer freebsd_triage 2015-05-12 02:08:54 UTC
Created attachment 156674 [details]
patch

While investigating the message coming from "Thread %p has exited with %d leftover", I found that _pthread_setspecific is entered twice with the same TCB, _get_curthread() returned the same thread pointer. Second invocation was incurred by the google tcmalloc library (devel/google-perftools port).

> (gdb) bt
> #0  yuri_enter (serno=60) at /usr/src/lib/libthr/thread/thr_spec.c:60
> #1  0x0000000804e23e89 in _pthread_setspecific (userkey=3, value=0x740a080) at /usr/src/lib/libthr/thread/thr_spec.c:228
> #2  0x000000080408b95c in ?? () from /usr/local/lib/qt4/libQtCore.so.4
> #3  0x0000000804e1da3a in thread_start (curthread=0x73f6800) at /usr/src/lib/libthr/thread/thr_create.c:288
> #4  0x0000000000000000 in ?? ()
> Backtrace stopped: Cannot access memory at address 0x7fffdf9fb000


> (gdb) bt
> #0  yuri_enter (serno=61) at /usr/src/lib/libthr/thread/thr_spec.c:60
> #1  0x0000000804e23e89 in _pthread_setspecific (userkey=2, value=0x7a9520) at /usr/src/lib/libthr/thread/thr_spec.c:228
> #2  0x00000008008ff4de in perftools_pthread_setspecific(int, void*) () from /usr/local/lib/libtcmalloc.so.4
> #3  0x00000008008f2322 in tcmalloc::ThreadCache::CreateCacheIfNecessary() () from /usr/local/lib/libtcmalloc.so.4
> #4  0x00000008008ea666 in ?? () from /usr/local/lib/libtcmalloc.so.4
> #5  0x00000008008ffd92 in tc_calloc () from /usr/local/lib/libtcmalloc.so.4
> #6  0x0000000804e243a1 in pthread_key_allocate_data () at /usr/src/lib/libthr/thread/thr_spec.c:212
> #7  0x0000000804e23f19 in _pthread_setspecific (userkey=3, value=0x740a080) at /usr/src/lib/libthr/thread/thr_spec.c:231
> #8  0x000000080408b95c in ?? () from /usr/local/lib/qt4/libQtCore.so.4
> #9  0x0000000804e1da3a in thread_start (curthread=0x73f6800) at /usr/src/lib/libthr/thread/thr_create.c:288
> #10 0x0000000000000000 in ?? ()

This causes some memory loss, and stray message about leftover TLS data.

The attached patch corrects for such potential problem by double-checking the pthread's "specific" array after allocating it, and frees the newly allocated copy if it happens to be already allocated.

Please keep the comment, because I think these lines may appear very confusing and wrong to people without the knowledge of this specific situation.
Comment 1 Yuri Victorovich freebsd_committer freebsd_triage 2015-05-12 02:09:29 UTC
Please MFC to 10.X
Comment 2 Konstantin Belousov freebsd_committer freebsd_triage 2015-05-12 07:41:17 UTC
Thread cannot race with itself.  pthread_setspecific() is only touching the current thread 'specific' member, so what race you are talking about ?
Comment 3 Yuri Victorovich freebsd_committer freebsd_triage 2015-05-12 09:07:37 UTC
Why do you close the bug without understanding? You are trying to blow me off?
Comment 4 Yuri Victorovich freebsd_committer freebsd_triage 2015-05-12 09:12:02 UTC
"Race condition" is maybe a wrong term, but it is close.

Like I said in the description, stray messages are printed in the presence of an external malloc library: "Thread %p has exited with %d leftover". And memory for per-thread key-based "specific" array gets lost.
Comment 5 Konstantin Belousov freebsd_committer freebsd_triage 2015-05-12 09:48:55 UTC
(In reply to yuri from comment #4)
Your patch makes absolutely no sense, you just repeat the code from _pthread_setspecific() into pthread_key_allocate_data().  Checking the same thread-local condition twice cannot help, as well as doing it n times, n > 1.

There is nothing wrong with _pthread_setspecific being called several times from the same thread, for the same key.
Comment 6 Yuri Victorovich freebsd_committer freebsd_triage 2015-05-12 09:54:29 UTC
There is the recursion going on. calloc calls _pthread_setspecific again, and after its return it is already allocated.
Comment 7 Konstantin Belousov freebsd_committer freebsd_triage 2015-05-12 10:15:56 UTC
(In reply to yuri from comment #6)
What does prevent the recursion from going infinite ?
And your patch still does not make any sense, since assignment to the curthread->specific only happens after the calloc() returned.

This sounds like the application bug anyway, because malloc/calloc interposers must handle the system libraries requirements.
Comment 8 Yuri Victorovich freebsd_committer freebsd_triage 2015-05-12 10:21:45 UTC
> What does prevent the recursion from going infinite?
This question is besides the point of this PR. Recursion happens anyway, with or without this patch. If the malloc library will cause infinite recursion, then this would be a bug. But this doesn't actually happen.

> This sounds like the application bug anyway, because malloc/calloc interposers must handle the system libraries requirements.
Could you please specify which application are you talking about? And what are the requirements, and where are they specified?
Comment 9 Yuri Victorovich freebsd_committer freebsd_triage 2015-05-12 10:26:09 UTC
You keep saying that this patch makes no sense. And it does fix the error message that used to be printed before all the time, and also it does fix the memory leak.

How can one reconcile your opinion that "it doesn't make sense" with the obvious positive practical consequences that have pretty good theoretical explanation?
Comment 10 Konstantin Belousov freebsd_committer freebsd_triage 2015-05-12 11:12:49 UTC
(In reply to yuri from comment #8)
Requirements for the malloc interposers are established by the system source code.  Third-party malloc library has to adhere to the constraints put by the system, if the intent is to replace the system component.

I am saying that your statements do not make sense exactly because they do not make sense.  The mere fact that a patch results in not printing a message, does not make the patch valid.

That said, I think it would be more useful to ease the life of malloc(3) interposers by avoiding the calloc call in the pthread_setspecific(3).  The size of the thread-specific is around 1K per thread, which is almost free for the userspace to carry together with the struct thread.  Non-trivial threading programs would use explicit TLS for sure.

Try the patch I am attaching.  I change specific to be a part of the struct thread, and also I cleaned style and removed false or trivial comments.
Comment 11 Konstantin Belousov freebsd_committer freebsd_triage 2015-05-12 11:13:45 UTC
Created attachment 156695 [details]
Allocate specific together with the struct thread, avoids calloc call.
Comment 12 Yuri Victorovich freebsd_committer freebsd_triage 2015-05-12 11:22:21 UTC
Now it prints this warning:
> Warning: old _pthread_cleanup_push was called, stack unwinding is disabled.
Comment 13 Yuri Victorovich freebsd_committer freebsd_triage 2015-05-12 11:31:32 UTC
Also sizeof(pthread_specific_elem)=12 * PTHREAD_KEYS_MAX=256 = 3kB.

Every thread will always have 3kB attached. I am sure there are a lot of threads which don't use pthread_setspecific at all, and an extra 3K will be a burden for them.
Comment 14 Konstantin Belousov freebsd_committer freebsd_triage 2015-05-12 12:55:27 UTC
(In reply to yuri from comment #13)
Around a 2-3KB of data for the thread is not a burden, comparing with the >=1M userspace thread stack allocated for each thread.  Also, I believe that any non-trivial threading program does use thread-specific data, not least because libc uses per-thread keys in nss and locale code.

But I can suggest to either use the standard libc malloc(3), which does not have the problem, or direct mmap(2) calls.  The later would result in some overhead due to unused part of the page for specific array, but this could not be important, since keys are used rarely, according to you previous reply.

I tend to think that the mmap(2) is preferred, but __malloc() use would also be fine.
Comment 15 Konstantin Belousov freebsd_committer freebsd_triage 2015-05-12 12:56:22 UTC
Created attachment 156696 [details]
Use mmap for allocator
Comment 16 Yuri Victorovich freebsd_committer freebsd_triage 2015-05-13 02:15:08 UTC
The mmap patch works, but with his patch every process will have the tiny separate extra mmapped block per thread. Currently every new thread creates one extra mmapped block, and this will be the second extra mmapped block. This doesn't look nice.

Currently every new thread gets its own mmaped block of the size 0x20000 = 131kB. So maybe 2.3% addition isn't that bad, I don't know. Especially that this memory isn't the real memory allocation unless actually used.

Is it possible to merge them? Adding this 3kB "specific" array block to that 0x2000-sized per-thread block?
Comment 17 Konstantin Belousov freebsd_committer freebsd_triage 2015-05-13 07:46:16 UTC
(In reply to yuri from comment #16)
I am not sure what 'mapped block of size 128K you mean.  A new thread gets allocated the stack, which has 1M or 2M size, and struct pthread of size 640B on 32bit arches.  I already proposed to put the keys table into the struct pthread, which you did not liked.

I still think that making struct pthread and keys live together is the best route if malloc() has to be avoided from pthread_setspecific().  I am also fine with the mmap() patch, despite the fact that some portion of the whole page is wasted.  I do not want to write a special allocator for this single purpose.
Comment 18 Yuri Victorovich freebsd_committer freebsd_triage 2015-05-13 09:12:20 UTC
Oh no, I like it.

Go ahead commit it if you think it is good.
Comment 19 Yuri Victorovich freebsd_committer freebsd_triage 2015-05-14 21:40:34 UTC
Ideally everything in libthr that is fixed-sized should be allocated with the single mmap, I think. But this is apparently not the case.

In my process, I see that 0x20000-sized block is always mmapped per-thread. Must be something google tcmalloc is doing. I first thought this is what libthr was doing.

But for the time being, we need the solution, so commit your patch making struct pthread and keys live together.
Comment 20 commit-hook freebsd_committer freebsd_triage 2015-05-15 08:41:18 UTC
A commit references this bug:

Author: kib
Date: Fri May 15 08:40:18 UTC 2015
New revision: 282948
URL: https://svnweb.freebsd.org/changeset/base/282948

Log:
  Some third-party malloc(3) implementations use pthread_setspecific(3)
  to handle per-thread information.  Since our pthread_setspecific()
  implementation calls calloc(3) to allocate per-thread specific data
  storage, things get complicated.

  Switch the allocator to use bare mmap(2).  There is some loss of the
  allocated page, since e.g. on amd64, PTHREAD_KEYS_MAX * sizeof(struct
  pthread_specific_elem) is 3K (it actually spans whole page due to
  padding), but I believe it is more acceptable than additional code for
  specialized allocator().

  The alternatives would either to make the specific data array be part of
  the struct thread, or use internal bindings to call the libc malloc,
  avoiding interposing.

  Also do the style pass over the thr_spec.c, esp. simplify the
  conditionals nesting by returning early when an error detected.
  Remove trivial comments.

  Found by:	yuri@rawbw.com
  PR:	200138
  Sponsored by:	The FreeBSD Foundation
  MFC after:	2 weeks

Changes:
  head/lib/libthr/thread/thr_spec.c
Comment 21 Oleksandr Tymoshenko freebsd_committer freebsd_triage 2019-01-21 17:53:10 UTC
There is a commit referencing this PR, but it's still not closed and has been inactive for some time. Closing the PR as fixed but feel free to re-open it if the issue hasn't been completely resolved.

Thanks