Created attachment 155764 [details] The test source. The attached source unexpectedly hangs on sysconf(_SC_OPEN_MAX) when run under FreeBSD localhost 11.0-CURRENT FreeBSD 11.0-CURRENT #0 r280598: Wed Mar 25 15:54:11 UTC 2015 root@releng2.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64 in a VirtualBox session. How to reproduce: $ cat run.sh clang test.cc -lpthread -lc++ || exit 1 while true; do date ./a.out done $ ./run.sh The problem usually expose itself after a dozen of iterations. Actual output: $ ./run.sh Mon Apr 20 13:53:30 EEST 2015 before sysconf() Expected output: Mon Apr 20 13:53:30 EEST 2015 before sysconf() after sysconf() before sysconf() after sysconf() ...
Reproduced on FreeBSD 10.1-STABLE r280427+86df2de(stable-10) on the second try. procstat shows: PID TID COMM TDNAME KSTACK 29430 101383 a.out - mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_wait_sig+0xf _sleep+0x27d kern_wait6+0x5f4 sys_wait4+0x72 amd64_syscall+0x357 Xfast_syscall+0xfb 29431 104036 a.out - mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_wait_sig+0xf _sleep+0x27d umtxq_sleep+0x125 do_rw_rdlock+0x39b __umtx_op_rw_rdlock+0x4b amd64_syscall+0x357 Xfast_syscall+0xfb
Pointed out by kib, when invoking the fork syscall directly the child inherits potentially locked libc mutexes, and it's probably the printf that hangs, not the sysconf. What was the original underlying issue here?
userland backtraces: (lldb) bt * thread #1: tid = 103971, 0x0000000800f8e198 libc.so.7`__syscall + 8 at __syscall.S:3 * frame #0: 0x0000000800f8e198 libc.so.7`__syscall + 8 at __syscall.S:3 frame #1: 0x0000000000400e32 a.out`zz_fork(file=0x0000000000401222, line=269) + 114 at pr199557.cc:244 frame #2: 0x0000000000400f7b a.out`zz_pthread_create(th=0x00007fffffffe528, attr=0x0000000000000000, callback=0x0000000000401090, param=0x0000000000000000) + 235 at pr199557.cc:269 frame #3: 0x00000000004010dc a.out`main + 44 at pr199557.cc:294 frame #4: 0x000000000040087f a.out`_start(ap=<unavailable>, cleanup=<unavailable>) + 367 at crt1.c:78 (lldb) bt * thread #1: tid = 103861, 0x0000000800833b3c libthr.so.3 at _umtx_op_err.S:37 * frame #0: 0x0000000800833b3c libthr.so.3 at _umtx_op_err.S:37 frame #1: 0x00000008008314dc libthr.so.3`_thr_rtld_rlock_acquire [inlined] _thr_rwlock_rdlock(flags=<unavailable>, tsp=<unavailable>) + 76 at thr_umtx.h:196 frame #2: 0x0000000800831490 libthr.so.3`_thr_rtld_rlock_acquire(lock=0x0000000800a42680) + 64 at thr_rtld.c:123 frame #3: 0x000000080060bf42 ld-elf.so.1`rlock_acquire(lock=0x000000080081f860, lockstate=0x00007fffffffe358) + 50 at rtld_lock.c:197 frame #4: 0x0000000800605c5d ld-elf.so.1`_rtld_bind(obj=0x0000000800621000, reloff=0) + 45 at rtld.c:679 frame #5: 0x00000008006034fd ld-elf.so.1`??? + 45 at rtld_start.S:121 frame #6: 0x0000000000400d9b a.out`zz_sysconf(file=0x0000000000401222, line=269) + 43 at pr199557.cc:227 frame #7: 0x0000000000400e01 a.out`zz_fork(file=0x0000000000401222, line=269) + 65 at pr199557.cc:240 frame #8: 0x0000000000400f7b a.out`zz_pthread_create(th=0x00007fffffffe528, attr=0x0000000000000000, callback=0x0000000000401090, param=0x0000000000000000) + 235 at pr199557.cc:269 frame #9: 0x00000000004010dc a.out`main + 44 at pr199557.cc:294 frame #10: 0x000000000040087f a.out`_start(ap=<unavailable>, cleanup=<unavailable>) + 367 at crt1.c:78
See http://svnweb.freebsd.org/changeset/base/266609 for the fix for this issue in the fork() case. In any case calling the fork syscall directly is going to be error-prone, so it would be useful to have a brief description of how tsan intends to interact with a target process fork.
(In reply to Ed Maste from comment #2) > Pointed out by kib, when invoking the fork syscall directly the child > inherits potentially locked libc mutexes, and it's probably the printf > that hangs, not the sysconf. Note that the code doesn't leverage libc's printf(), but rather uses its own version of it. > What was the original underlying issue here? The original problem is random Tsan tests hanging on high loaded machines like the FreeBSD sanitizers buildbot: http://lab.llvm.org:8011/builders/sanitizer_x86_64-freebsd/builds/4978/steps/make-check-tsan/logs/stdio It mostly work fine when the tests run in about 4 threads, but tends to fail constantly when run in about 64 threads--doesn't matter how much cores the hardware has. This is how it leads to the point of hang: during execution of a Tsan tests the Tsan STL catches a data race or other condition triggering a run-time report. That report contains references to related object files and associated addresses. These file names together with the addresses are then translated into source file name, line numbers and function names by calling the llvm-symbolizer or, if the RTL is unable to locate it, the addr2line command. To catch the output of the symbolizing tool the RTL fork() the process and redirects the streams. This is where the whole thing hangs. Note is that if I comment the sysconf(_SC_OPEN_MAX) call, then it hangs on one of the other subsequent system calls, like read() so it appears to not to be a sysconf()-specific issue. Another note is that the RTL sources refer to a specific reason about why it tries to use the syscall fork() in place of the libc's intercepted one: // Real fork() may call user callbacks registered with pthread_atfork(). pid = internal_fork(); The internal_fork() function is just the syscall: int internal_fork() { #if SANITIZER_USES_CANONICAL_LINUX_SYSCALLS return internal_syscall(SYSCALL(clone), SIGCHLD, 0); #else return internal_syscall(SYSCALL(fork)); #endif }
(In reply to ikosarev from comment #5) It is actually rtld' locks which are inherited blocked and thus symbol resolution is non-functional. As a quick test, try your test with LD_BIND_NOW=1 env var set. Direct calls to the fork syscall from the threading process is not supposed to work.
There is a proposal for an async-signal safe version of fork() called _Fork(), which does not call atfork handlers, at http://austingroupbugs.net/view.php?id=62 . This would help if the only problem with calling fork() is that it executes atfork handlers. It still executes a fair bit of code, but no user code. To make _Fork() async-signal safe, the malloc handling would have to be disabled as well, making malloc/free in the child more unsafe (but also interfering less with other threads in the parent). The handling of the lock for sem_open() and sem_close() uses pthread_atfork() and would be disabled as well. This may be useful for this and other situations that want to fork from signal handlers or other strange thread states. I have not found common implementations of _Fork(), though.