Bug 271490

Summary: Deadlock between _rtld_atfork_pre and _thr_attr_init
Product: Base System Reporter: KJ Tsanaktsidis <kj>
Component: threadsAssignee: Konstantin Belousov <kib>
Status: Closed FIXED    
Severity: Affects Some People CC: cy, emaste, kib, markj, natalino.picone
Priority: ---    
Version: 13.2-RELEASE   
Hardware: amd64   
OS: Any   
Attachments:
Description Flags
GDB output from my stuck process
none
Another gdb backtrace of the problem
none
Isolated reproduction of the issue none

Description KJ Tsanaktsidis 2023-05-18 13:22:04 UTC
Created attachment 242250 [details]
GDB output from my stuck process

I'm working on fixing some test failures in Ruby on FreeBSD, and I've found what I think is a deadlock in FreeBSD between two threads, one of which is forking and one of which is just starting.

The Ruby test which does this looks something like the following:

```
require 'timeout'
def test_daemon_no_threads
  data = Timeout.timeout(3) do
    IO.popen("-") do |f|
        break f.readlines.map(&:chomp) if f
        th = Thread.start { sleep 3 }
        Process.daemon(true, true)
        puts "this is sometimes never reached!"
    end
  end
end
```

The test forks, and then in the child, starts a thread and then forks again (via a call to daemon(3)). On my machine, this will semi-reliably produce a deadlock inside the call to `Process.daemon` (and before the second fork actually takes place). The thread calling `daemon(3)` gets stuck acquiring locks in `_rtld_atfork_pre`, whilst the new thread (the one which is started as `Thread.start { sleep 3 }` is stuck inside jemalloc's `extent_deactivate` as part of a call to `_thr_attr_init`.

---

To reproduce this in the Ruby source, one can checkout the latest Ruby master from github.com/ruby/ruby.git, do the autoconf/configure dance, run `make`, and then run

```
while ./miniruby -I./lib -I. -I.ext/common ./tool/runruby.rb --extout=.ext -- ./test/runner.rb test/ruby/test_process.rb -n test_daemon_no_threads; do echo "ok"; done;
```

On my machine, this will eventually get stuck forever in the test. 

n.b. - if you find the test failing for this reason - http://rubyci.s3.amazonaws.com/freebsd13/ruby-master/log/20230517T003001Z.fail.html.gz - that's a _different_ issue, in Ruby itself, that I am trying to fix - that's how I discovered this deadlock thing...

---

I've attached a pair of backtraces I got from attaching gdb to the process. I don't really know where to go from here to debug the issue though; I can't seem to make it happen in a similar C program, and by inspection I can't really see why there's a deadlock at all. I can't work out why the forking thread would hold any jemalloc locks, and I _really_ can't see why the new thread would hold any rtld locks.

Any thoughts? Thanks!
Comment 1 Konstantin Belousov freebsd_committer freebsd_triage 2023-05-18 17:44:25 UTC
Yes, this is indeed a bug in jemalloc.  It seems that the problem is that
malloc_mutex_prefork() from jemalloc/src/mutex.c does not lock _all_ jemalloc
mutexes, in particular, the extent mtx.

OTOH, note that after fork(), in multi-threaded process child, only async-signal
safe functions are guaranteed to work.
Comment 2 KJ Tsanaktsidis 2023-05-19 00:40:21 UTC
Thanks for your quick eyes.

>  It seems that the problem is that
malloc_mutex_prefork() from jemalloc/src/mutex.c does not lock _all_ jemalloc
mutexes, in particular, the extent mtx.

I guess I could try patching this and see if it makes the deadlocks on my machine go away. I'll give it a go over the weekend and report back!

> OTOH, note that after fork(), in multi-threaded process child, only async-signal
safe functions are guaranteed to work.

Yeah.... this is unfortunately very common in the Ruby world though, because of a combination of two things:

* Gems (especially gems by monitoring providers like Datadog, Sentry, NewRelic, etc) that maintain background threads (to e.g. report metrics & exceptions asynchronously to some central service)
* Pre-forking webservers (like Unicorn) are frequently used to get good use of multi-core machines (since only one Ruby thread can run Ruby code at a time). These webservers often load all of an applications gems, and _then_ fork workers, to get good use of shared copy-on-write pages.

Most gems that maintain background threads install after_fork hooks to restart them when they're used inside preforking webservers.

The documentation for fork(2) does say:

> The FreeBSD implementation of fork() provides a usable malloc(3), and rtld(1) services in the child process.

when referring to multi-threaded forks, so I guess _this_ is supposed to work. I do note however that the man page says nothing about DNS resolution working in the child of a multithreaded fork :/ but that's a problem for another day (maybe Ruby ensures that the resolver is not in use before forking? I haven't looked)
Comment 3 Konstantin Belousov freebsd_committer freebsd_triage 2023-05-19 04:26:50 UTC
Yes, I was too optimistic in stating that in the fork(2) man page.

Problem with the complete fix is that all jemalloc(2) locks needs to be taken
in pre-fork hook, in the right order.
Comment 4 Konstantin Belousov freebsd_committer freebsd_triage 2023-05-19 05:31:22 UTC
Can you please provide me the stand-alone ruby program that demonstrates the
problem in non-disputable way, and the exact instructions how to launch it?
Comment 5 KJ Tsanaktsidis 2023-05-19 11:20:15 UTC
> Can you please provide me the stand-alone ruby program that demonstrates the
problem in non-disputable way, and the exact instructions how to launch it?

Unfortunately I still haven't been able to isolate the problem outside of the actual Ruby test harness.... so this is all a bit involved. I've been trying to develop a minimal reproduction (with either C or Ruby) but I haven't had any luck so far.

* You'll need `git`, `autoconf`, `automake`, `bison`, `m4`, `libffi`, `libunwind`, and `libyaml`.
* Clone my Ruby fork: `git clone https://github.com/KJTsanaktsidis/ruby.git`
* Check out this commit (includes some _other_ fixes necessary to surface this problem): `git checkout d5471e6e1e8b81d7c7e960538d138868db0e0707`
* `autoreconf -fi`
* `./configure` (I don't think any particular flags are needed)
* `make`

Then, use the built ruby to run this particular test in a loop, until eventually it hangs:

```
while ./miniruby -I./lib -I. -I.ext/common ./tool/runruby.rb --extout=.ext -- ./test/runner.rb test/ruby/test_process.rb -n test_daemon_no_threads; do echo "ok"; done;
```
Comment 6 KJ Tsanaktsidis 2023-05-19 11:20:55 UTC
Created attachment 242274 [details]
Another gdb backtrace of the problem
Comment 7 KJ Tsanaktsidis 2023-05-19 11:24:49 UTC
I did make a bit more progress debugging the issue though - I don't think the problem has anything to do with jemalloc at all - it's _just_ an RTLD issue.

I captured a different backtrace of the problem and attached it. In this one thread #2 is just sitting there chilling on an unrelated mutex; it is _not_ doing any jemalloc things. So, thread #1 manages to get deadlocked during `_rtld_atfork_pre` all by itself.

In particular, it's blocked acquiring the `rtld_bind_lock` in write mode, and the lock appears to already be held in write mode (by who??):

```
(gdb) print *((struct rtld_lock *)lock)

$21 = {lock = {rw_state = -1610612736, rw_flags = 2, rw_blocked_readers = 1, rw_blocked_writers = 0, rw_spare = {0, 0, 0, 0}},
  _pad = '\000' <repeats 31 times>}
# bitmask against URWLOCK constants, from https://github.com/freebsd/freebsd-src/blob/1377eb268a5a8a5573bda33a651963259c43c3ed/sys/sys/umtx.h#L62
(gdb) print /x ((struct rtld_lock *)lock)->lock.rw_state & 0x80000000U

$22 = 0x80000000
(gdb) print /x ((struct rtld_lock *)lock)->lock.rw_state & 0x40000000U

$23 = 0x0
(gdb) print /x ((struct rtld_lock *)lock)->lock.rw_state & 0x1FFFFFFFU

$24 = 0x0
(gdb)
```

Also also, I'm not 100% sure yet that this wasn't just luck so far, but it _seems_ I can avoid the deadlock by setting `LD_BIND_NOW`...

Anyway I'll keep hunting for an isolated reproduction. If you have a stab in the dark though I'm happy to try a patch as well.
Comment 8 KJ Tsanaktsidis 2023-05-19 11:52:13 UTC
Created attachment 242275 [details]
Isolated reproduction of the issue
Comment 9 KJ Tsanaktsidis 2023-05-19 11:56:04 UTC
Ahah! I managed to make an isolated program which causes the issue. See the file `reproduction.c` I added as an attachment.

This will eventually hang not printing anything, and attaching GDB to the child process (not the parent) will show a similar stack trace going through `_rtld_atfork_pre`:

```
clang -std=c11 reproduction.c -lpthread -o reproduction
while true; do ./reproduction; done;
```

However, this will _NOT_ hang:

```
while true; do LD_BIND_NOW=yes ./reproduction; done;
```

(on my machine, in any case).

The trick seems to be doing a multithreaded fork, followed by _another_ multithreaded fork.

Thanks again for having a look at this Konstantin!
Comment 10 Konstantin Belousov freebsd_committer freebsd_triage 2023-05-20 08:11:32 UTC
Please try the following.  Does the patch help in your settings?
I am esp. interested in the ruby test.

diff --git a/lib/libthr/thread/thr_rtld.c b/lib/libthr/thread/thr_rtld.c
index a9d1924967a7..c9c4871bab54 100644
--- a/lib/libthr/thread/thr_rtld.c
+++ b/lib/libthr/thread/thr_rtld.c
@@ -80,7 +80,7 @@ _thr_rtld_lock_create(void)
        busy_places |= (1 << locki);
 
        l = &lock_place[locki];
-       l->lock.rw_flags = URWLOCK_PREFER_READER;
+//     l->lock.rw_flags = URWLOCK_PREFER_READER;
        return (l);
 }
Comment 11 Konstantin Belousov freebsd_committer freebsd_triage 2023-05-20 08:30:07 UTC
Or better, try https://reviews.freebsd.org/D40178
That should be the final fix, assuming it works.
Comment 12 KJ Tsanaktsidis 2023-05-21 06:28:48 UTC
Yes - https://reviews.freebsd.org/D40178 seems to fix the issue with running the ruby tests. I applied it on top of the `releng/13.2` branch on my system and followed the  I've been running the test in question for half an hour in a loop and no failures yet!

Thanks so much for your help debugging this. I must say - this has been my first experience with anything in BSD-world, and I've really appreciated having the source for the base system all in one giant git repo together. Makes it easy to go spelunking and understand how it all fits together. As someone who recently spent many hours staring at #ifdef soup across glibc, libgcc_s, and the Linux kernel recently, it's been very refreshing!

I don't really know how the backport process works - is your intention to backport this to 13.1/13.2 or would the fix just apply from 13.3 forward? If there's no backport I'll probably add `LD_BIND_NOW=yes` to the Ruby test runner for FreeBSD <= 13.2 - does this seem reasonable?
Comment 13 Konstantin Belousov freebsd_committer freebsd_triage 2023-05-21 11:25:40 UTC
LD_BIND_NOW is not a fix, it mostly avoids taking the bind lock in parallel
with the new thread creating its TLS segment, which also requires the same
lock.

If so critical, I can prepare errata notice for 13.2 after the change sit in
main and stable/13 for some time.  It simply some additional work both for me
and for security team which manages updates on releng branches.
Comment 14 KJ Tsanaktsidis 2023-05-21 12:16:22 UTC
I don't think it's critical that this be patched in 13.2 just for the sake of the Ruby CI tests. If I find someone who actually cares about this in a real application that's a different story I guess.

I realise LD_BIND_NOW isn't a real fix - more just something I could add to the test suite to avoid CI failures (rubyci.org actually still runs builds on FreeBSD 12.4 & 13.1, and it'd be nice if they also stopped failing for this reason).
Comment 15 Natalino Picone 2023-05-22 08:50:05 UTC
(In reply to KJ Tsanaktsidis from comment #14)

My 2ยข.
This long-standing bug affecting FreeBSD starting from 13.0 was not investigated because it was very specific.
I think this should be released asap as it's not strictly related to Ruby, but it's in the thread lib inside the base.
In my case, I had to revert the upgrade to 13.0 because of it.
Comment 16 commit-hook freebsd_committer freebsd_triage 2023-05-22 13:47:15 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=6f49eafb056cfa0703dfc97a731cabe4ed2596b8

commit 6f49eafb056cfa0703dfc97a731cabe4ed2596b8
Author:     Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2023-05-20 08:11:54 +0000
Commit:     Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2023-05-22 13:46:42 +0000

    libthr rtld locks: do not leak URWLOCK_READ_WAITERS into child

    Since there is only the current thread in the child, no pending readers
    exist.  Clear the bit, since it confuses future attempts to acquire
    write ownership of the rtld locks, due to URWLOCK_PREFER_READERS flag.

    To be future-proof, clear all state about pending writers and readers.

    PR:     271490
    Reported and tested by: KJ Tsanaktsidis <kj@kjtsanaktsidis.id.au>
    Reviewed by:    markj
    Sponsored by:   The FreeBSD Foundation
    MFC after:      1 week
    Differential revision:  https://reviews.freebsd.org/D40178

 lib/libthr/thread/thr_fork.c    |  4 ++++
 lib/libthr/thread/thr_private.h |  2 ++
 lib/libthr/thread/thr_rtld.c    | 11 +++++++++++
 3 files changed, 17 insertions(+)
Comment 17 commit-hook freebsd_committer freebsd_triage 2023-05-29 03:21:38 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=5b45cd83c44edb9386eed1155acd3c52f36bf12c

commit 5b45cd83c44edb9386eed1155acd3c52f36bf12c
Author:     Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2023-05-20 08:11:54 +0000
Commit:     Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2023-05-29 03:20:42 +0000

    libthr rtld locks: do not leak URWLOCK_READ_WAITERS into child

    PR:     271490

    (cherry picked from commit 6f49eafb056cfa0703dfc97a731cabe4ed2596b8)

 lib/libthr/thread/thr_fork.c    |  4 ++++
 lib/libthr/thread/thr_private.h |  2 ++
 lib/libthr/thread/thr_rtld.c    | 11 +++++++++++
 3 files changed, 17 insertions(+)