Summary: | Deadlock between _rtld_atfork_pre and _thr_attr_init | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Product: | Base System | Reporter: | KJ Tsanaktsidis <kj> | ||||||||
Component: | threads | Assignee: | Konstantin Belousov <kib> | ||||||||
Status: | Closed FIXED | ||||||||||
Severity: | Affects Some People | CC: | cy, emaste, kib, markj, natalino.picone | ||||||||
Priority: | --- | ||||||||||
Version: | 13.2-RELEASE | ||||||||||
Hardware: | amd64 | ||||||||||
OS: | Any | ||||||||||
Attachments: |
|
Description
KJ Tsanaktsidis
2023-05-18 13:22:04 UTC
Yes, this is indeed a bug in jemalloc. It seems that the problem is that malloc_mutex_prefork() from jemalloc/src/mutex.c does not lock _all_ jemalloc mutexes, in particular, the extent mtx. OTOH, note that after fork(), in multi-threaded process child, only async-signal safe functions are guaranteed to work. Thanks for your quick eyes. > It seems that the problem is that malloc_mutex_prefork() from jemalloc/src/mutex.c does not lock _all_ jemalloc mutexes, in particular, the extent mtx. I guess I could try patching this and see if it makes the deadlocks on my machine go away. I'll give it a go over the weekend and report back! > OTOH, note that after fork(), in multi-threaded process child, only async-signal safe functions are guaranteed to work. Yeah.... this is unfortunately very common in the Ruby world though, because of a combination of two things: * Gems (especially gems by monitoring providers like Datadog, Sentry, NewRelic, etc) that maintain background threads (to e.g. report metrics & exceptions asynchronously to some central service) * Pre-forking webservers (like Unicorn) are frequently used to get good use of multi-core machines (since only one Ruby thread can run Ruby code at a time). These webservers often load all of an applications gems, and _then_ fork workers, to get good use of shared copy-on-write pages. Most gems that maintain background threads install after_fork hooks to restart them when they're used inside preforking webservers. The documentation for fork(2) does say: > The FreeBSD implementation of fork() provides a usable malloc(3), and rtld(1) services in the child process. when referring to multi-threaded forks, so I guess _this_ is supposed to work. I do note however that the man page says nothing about DNS resolution working in the child of a multithreaded fork :/ but that's a problem for another day (maybe Ruby ensures that the resolver is not in use before forking? I haven't looked) Yes, I was too optimistic in stating that in the fork(2) man page. Problem with the complete fix is that all jemalloc(2) locks needs to be taken in pre-fork hook, in the right order. Can you please provide me the stand-alone ruby program that demonstrates the problem in non-disputable way, and the exact instructions how to launch it? > Can you please provide me the stand-alone ruby program that demonstrates the problem in non-disputable way, and the exact instructions how to launch it? Unfortunately I still haven't been able to isolate the problem outside of the actual Ruby test harness.... so this is all a bit involved. I've been trying to develop a minimal reproduction (with either C or Ruby) but I haven't had any luck so far. * You'll need `git`, `autoconf`, `automake`, `bison`, `m4`, `libffi`, `libunwind`, and `libyaml`. * Clone my Ruby fork: `git clone https://github.com/KJTsanaktsidis/ruby.git` * Check out this commit (includes some _other_ fixes necessary to surface this problem): `git checkout d5471e6e1e8b81d7c7e960538d138868db0e0707` * `autoreconf -fi` * `./configure` (I don't think any particular flags are needed) * `make` Then, use the built ruby to run this particular test in a loop, until eventually it hangs: ``` while ./miniruby -I./lib -I. -I.ext/common ./tool/runruby.rb --extout=.ext -- ./test/runner.rb test/ruby/test_process.rb -n test_daemon_no_threads; do echo "ok"; done; ``` Created attachment 242274 [details]
Another gdb backtrace of the problem
I did make a bit more progress debugging the issue though - I don't think the problem has anything to do with jemalloc at all - it's _just_ an RTLD issue. I captured a different backtrace of the problem and attached it. In this one thread #2 is just sitting there chilling on an unrelated mutex; it is _not_ doing any jemalloc things. So, thread #1 manages to get deadlocked during `_rtld_atfork_pre` all by itself. In particular, it's blocked acquiring the `rtld_bind_lock` in write mode, and the lock appears to already be held in write mode (by who??): ``` (gdb) print *((struct rtld_lock *)lock) $21 = {lock = {rw_state = -1610612736, rw_flags = 2, rw_blocked_readers = 1, rw_blocked_writers = 0, rw_spare = {0, 0, 0, 0}}, _pad = '\000' <repeats 31 times>} # bitmask against URWLOCK constants, from https://github.com/freebsd/freebsd-src/blob/1377eb268a5a8a5573bda33a651963259c43c3ed/sys/sys/umtx.h#L62 (gdb) print /x ((struct rtld_lock *)lock)->lock.rw_state & 0x80000000U $22 = 0x80000000 (gdb) print /x ((struct rtld_lock *)lock)->lock.rw_state & 0x40000000U $23 = 0x0 (gdb) print /x ((struct rtld_lock *)lock)->lock.rw_state & 0x1FFFFFFFU $24 = 0x0 (gdb) ``` Also also, I'm not 100% sure yet that this wasn't just luck so far, but it _seems_ I can avoid the deadlock by setting `LD_BIND_NOW`... Anyway I'll keep hunting for an isolated reproduction. If you have a stab in the dark though I'm happy to try a patch as well. Created attachment 242275 [details]
Isolated reproduction of the issue
Ahah! I managed to make an isolated program which causes the issue. See the file `reproduction.c` I added as an attachment. This will eventually hang not printing anything, and attaching GDB to the child process (not the parent) will show a similar stack trace going through `_rtld_atfork_pre`: ``` clang -std=c11 reproduction.c -lpthread -o reproduction while true; do ./reproduction; done; ``` However, this will _NOT_ hang: ``` while true; do LD_BIND_NOW=yes ./reproduction; done; ``` (on my machine, in any case). The trick seems to be doing a multithreaded fork, followed by _another_ multithreaded fork. Thanks again for having a look at this Konstantin! Please try the following. Does the patch help in your settings? I am esp. interested in the ruby test. diff --git a/lib/libthr/thread/thr_rtld.c b/lib/libthr/thread/thr_rtld.c index a9d1924967a7..c9c4871bab54 100644 --- a/lib/libthr/thread/thr_rtld.c +++ b/lib/libthr/thread/thr_rtld.c @@ -80,7 +80,7 @@ _thr_rtld_lock_create(void) busy_places |= (1 << locki); l = &lock_place[locki]; - l->lock.rw_flags = URWLOCK_PREFER_READER; +// l->lock.rw_flags = URWLOCK_PREFER_READER; return (l); } Or better, try https://reviews.freebsd.org/D40178 That should be the final fix, assuming it works. Yes - https://reviews.freebsd.org/D40178 seems to fix the issue with running the ruby tests. I applied it on top of the `releng/13.2` branch on my system and followed the I've been running the test in question for half an hour in a loop and no failures yet! Thanks so much for your help debugging this. I must say - this has been my first experience with anything in BSD-world, and I've really appreciated having the source for the base system all in one giant git repo together. Makes it easy to go spelunking and understand how it all fits together. As someone who recently spent many hours staring at #ifdef soup across glibc, libgcc_s, and the Linux kernel recently, it's been very refreshing! I don't really know how the backport process works - is your intention to backport this to 13.1/13.2 or would the fix just apply from 13.3 forward? If there's no backport I'll probably add `LD_BIND_NOW=yes` to the Ruby test runner for FreeBSD <= 13.2 - does this seem reasonable? LD_BIND_NOW is not a fix, it mostly avoids taking the bind lock in parallel with the new thread creating its TLS segment, which also requires the same lock. If so critical, I can prepare errata notice for 13.2 after the change sit in main and stable/13 for some time. It simply some additional work both for me and for security team which manages updates on releng branches. I don't think it's critical that this be patched in 13.2 just for the sake of the Ruby CI tests. If I find someone who actually cares about this in a real application that's a different story I guess. I realise LD_BIND_NOW isn't a real fix - more just something I could add to the test suite to avoid CI failures (rubyci.org actually still runs builds on FreeBSD 12.4 & 13.1, and it'd be nice if they also stopped failing for this reason). (In reply to KJ Tsanaktsidis from comment #14) My 2ยข. This long-standing bug affecting FreeBSD starting from 13.0 was not investigated because it was very specific. I think this should be released asap as it's not strictly related to Ruby, but it's in the thread lib inside the base. In my case, I had to revert the upgrade to 13.0 because of it. A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=6f49eafb056cfa0703dfc97a731cabe4ed2596b8 commit 6f49eafb056cfa0703dfc97a731cabe4ed2596b8 Author: Konstantin Belousov <kib@FreeBSD.org> AuthorDate: 2023-05-20 08:11:54 +0000 Commit: Konstantin Belousov <kib@FreeBSD.org> CommitDate: 2023-05-22 13:46:42 +0000 libthr rtld locks: do not leak URWLOCK_READ_WAITERS into child Since there is only the current thread in the child, no pending readers exist. Clear the bit, since it confuses future attempts to acquire write ownership of the rtld locks, due to URWLOCK_PREFER_READERS flag. To be future-proof, clear all state about pending writers and readers. PR: 271490 Reported and tested by: KJ Tsanaktsidis <kj@kjtsanaktsidis.id.au> Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D40178 lib/libthr/thread/thr_fork.c | 4 ++++ lib/libthr/thread/thr_private.h | 2 ++ lib/libthr/thread/thr_rtld.c | 11 +++++++++++ 3 files changed, 17 insertions(+) A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=5b45cd83c44edb9386eed1155acd3c52f36bf12c commit 5b45cd83c44edb9386eed1155acd3c52f36bf12c Author: Konstantin Belousov <kib@FreeBSD.org> AuthorDate: 2023-05-20 08:11:54 +0000 Commit: Konstantin Belousov <kib@FreeBSD.org> CommitDate: 2023-05-29 03:20:42 +0000 libthr rtld locks: do not leak URWLOCK_READ_WAITERS into child PR: 271490 (cherry picked from commit 6f49eafb056cfa0703dfc97a731cabe4ed2596b8) lib/libthr/thread/thr_fork.c | 4 ++++ lib/libthr/thread/thr_private.h | 2 ++ lib/libthr/thread/thr_rtld.c | 11 +++++++++++ 3 files changed, 17 insertions(+) |