Created attachment 221461 [details]
NSS module and test program

This bug was discovered on FreeBSD 13-CURRENT but can also be reproduced on 12.2-RELEASE. It causes a process to hang when fork(2) is called and a specific NSS (Name Service Switch) module is used.

How to reproduce:

1) Download the archive in the attachment.
2) Compile the NSS stub module (do not forget the .1 at the end of the compiled module's name):
   cc -shared -fPIC -pthread -o nss_stub.so.1 nss_stub.c
3) Copy nss_stub.so.1 to /usr/local/lib.
4) Edit /etc/nsswitch.conf and replace 'hosts: files dns' with 'hosts: files dns stub'.
5) Compile the test program:
   cc -o bug bug.c
6) Run it. It will hang, and even killall -9 bug won't kill it.

There is a small and unpleasant discussion on the freebsd-net mailing list with Konstantin Belousov, who wanted me to reproduce this bug without editing /etc/nsswitch.conf. I think that is either impossible, because the NSS system is somehow interfering with fork, or beyond my competence. So the provided way to reproduce the bug is as minimal as I can get.
I can reproduce this 100% of the time on a -current VM using the supplied test code. I noticed a few things:

-- For me, the parent process seems to be hanging during fork(); I see no evidence the child process is ever spawned.

-- wmesg for the process is 'umtxn', and ddb shows what looks like the main thread attempting to take a userspace lock, going through umtxq_lock(), and sleeping in sleepq_wait_sig().

-- I tried to write a smaller test program to reproduce the failure by simulating the locking done by the NS dispatcher and the pthread_create() issued by the stub, but this did not reproduce the hang.

-- However, if I just link the original test program against libpthread ('cc -o bug -pthread bug.c'), then I can no longer reproduce the hang.

This tells me the problem might have something to do with some bit of static umtx initialization that happens when linking against libpthread/libthr. If this initialization hasn't happened by the time the NS dispatcher (which loads the stub through dlopen()) is invoked, then fork() ends up stuck in a umtx wait that never gets signaled. It might also be related to the __isthreaded checks made by lib/libc/net/nsdispatch.c, which smell fishy to me.

At the very least, it might be possible to make a smaller repro case by writing a test program (that does not link libpthread) which dlopen()s a simple library (which does link libpthread) and calls an entry point that spawns a thread.
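For illustration, the library side of such a reduced case might look roughly like the following. This is only a sketch under assumptions: the entry point name spawn_thread and the worker body are hypothetical and are not taken from either attachment.

```c
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical worker: it only needs to exist, so it sleeps briefly. */
static void *worker(void *arg)
{
    (void)arg;
    sleep(1);
    return NULL;
}

/*
 * Hypothetical entry point exported by the shared library.
 * Spawns one detached thread and returns 0 on success, -1 on failure.
 * Not static, so that dlsym() can find it.
 */
int spawn_thread(void)
{
    pthread_t tid;

    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;
    return pthread_detach(tid) == 0 ? 0 : -1;
}
```

Built with -shared -fPIC -pthread, this gives a library that pulls in the threading library while the program that loads it does not link against it.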
This is a smaller case: just a library linked against pthread which spawns a thread, and a main program which loads that library and forks.

Instructions are similar:

1) Download newtest.tar.gz
2) Compile the library:
   cc -shared -fPIC -pthread -o bug-lib.so bug-lib.c
3) Compile the main program:
   cc -o demo demo.c
4) Run it:
   env LD_LIBRARY_PATH=. ./demo
Created attachment 221484 [details] The new test case
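The attachment contents are not reproduced in this thread. As a rough sketch of what the main program side of such a test does (load a library, call an entry point in it, then fork), something like the following would work. The helper name load_and_fork, the entry point name passed to it, and the convention that the entry point returns 0 on success are all assumptions made for illustration.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: dlopen() the library, call the named entry point,
 * then fork() and wait for the child.
 * Returns the child's exit status, or -1 on any failure.
 */
static int load_and_fork(const char *libpath, const char *symbol)
{
    void *handle = dlopen(libpath, RTLD_NOW);
    if (handle == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return -1;
    }

    /* Converting the dlsym() result to a function pointer is the usual POSIX idiom. */
    int (*entry)(void) = (int (*)(void))dlsym(handle, symbol);
    if (entry == NULL || entry() != 0) {
        dlclose(handle);
        return -1;
    }

    /* On an affected system, this is the call that never returns. */
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(0); /* child: nothing to do */

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A main() that calls load_and_fork("bug-lib.so", "spawn_thread") and is compiled without -pthread would follow the same shape as the steps above; the symbol name here is a placeholder.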
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=85d028223bc2768651f4d44881644ceb5dc2a664

commit 85d028223bc2768651f4d44881644ceb5dc2a664
Author:     Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2021-01-12 09:02:37 +0000
Commit:     Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2021-01-12 10:45:44 +0000

    libthr malloc: support recursion on thr_malloc_umtx.

    One possible way the recursion can happen is during fork: suppose
    that fork is called from early code that did not triggered
    jemalloc(3) initialization yet. Then we lock thr_malloc lock, and
    call malloc_prefork() that might require initialization of jemalloc
    pthread_mutexes, calling into libthr malloc. It is safe to allow
    recursion for this occurence.

    PR:             252579
    Reported by:    Vasily Postnicov <shamaz.mazum@gmail.com>
    MFC after:      1 week
    Sponsored by:   The FreeBSD Foundation

 lib/libthr/thread/thr_malloc.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)
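The commit message describes one thread re-entering a lock it already holds. A common way to tolerate that is to remember the owner and count nested acquisitions rather than blocking. The following is a minimal sketch of that general technique on top of a plain pthread mutex. It is not the code from the commit, and every name in it (rec_lock, self_id and so on) is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not the libthr implementation. */
struct rec_lock {
    pthread_mutex_t mtx;     /* underlying non-recursive lock */
    _Atomic uintptr_t owner; /* identity of the holder, 0 when free */
    unsigned level;          /* nested acquisitions by the holder */
};

#define REC_LOCK_INITIALIZER { PTHREAD_MUTEX_INITIALIZER, 0, 0 }

/* The address of a thread-local object is unique among live threads. */
static _Thread_local char self_anchor;

static uintptr_t self_id(void)
{
    return (uintptr_t)&self_anchor;
}

static void rec_lock_acquire(struct rec_lock *l)
{
    uintptr_t me = self_id();

    if (atomic_load(&l->owner) == me) {
        /* Re-entry by the holder: count it instead of blocking forever. */
        l->level++;
        return;
    }
    pthread_mutex_lock(&l->mtx);
    atomic_store(&l->owner, me);
}

/* The caller must currently hold the lock. */
static void rec_lock_release(struct rec_lock *l)
{
    if (l->level > 0) {
        /* Leaving a nested acquisition; the mutex stays held. */
        l->level--;
        return;
    }
    atomic_store(&l->owner, 0);
    pthread_mutex_unlock(&l->mtx);
}
```

Only the holder ever sees its own identity in owner, so the level counter is touched by one thread at a time. Without the owner check, the second rec_lock_acquire() from the same thread would wait on a lock that only that thread can release, which matches the kind of self-deadlock the commit message describes.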
Thanks, I confirm that the bug is fixed.
A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=20432a4fa157be15465e3aefc7977b494c812584

commit 20432a4fa157be15465e3aefc7977b494c812584
Author:     Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2021-01-12 09:02:37 +0000
Commit:     Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2021-01-18 05:11:07 +0000

    libthr malloc: support recursion on thr_malloc_umtx.

    PR:             252579

    (cherry picked from commit 85d028223bc2768651f4d44881644ceb5dc2a664)

 lib/libthr/thread/thr_malloc.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)