Bug 254927

Summary: rtld: dangling locks after fix for bug 254774
Product: Base System Reporter: Eric van Gyzen <vangyzen>
Component: binAssignee: Konstantin Belousov <kib>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: ---    
Version: CURRENT   
Hardware: Any   
OS: Any   
Bug Depends on: 254774    
Bug Blocks:    

Description Eric van Gyzen freebsd_committer freebsd_triage 2021-04-09 19:33:30 UTC
After the fix for bug 254774, some processes hang due to apparent dangling locks in rtld.  Reverting commit d36d68161517 fixes the hang.

I first saw the hang with gdk-pixbuf-query-loaders.  (This is called from some package post-install scripts, so I first noticed it when running tools/build/beinstall.sh.)  Simply run it without arguments.  I also see it with thunderbird.

Here are some notes from debugging:

98991  0  I+   0:00.00   `-- /bin/sh ./tools/build/beinstall.sh
58781  0  I+   0:00.00     `-- pkg upgrade
58782  0  S+   0:15.73       `-- pkg upgrade
59415  0  I+   0:00.00         `-- /bin/sh -c set -- libavif-0.9.0_1\nif ! /usr/sbin/service ldconfig restart >/dev/null; then\n\011\011if [ -z "${INSTALL_AS_USER}" ]; then\n\011\011\011exit 1\n\011\011fi\n\011fi\n/usr/local/bin/gdk-pixbuf-query-loaders > /dev/null 2>&1 && /usr/local/bin/gdk-pixbuf-query-loaders > /usr/local/lib/gdk-pixbuf-2.0/2.10.0/loaders.cache 2>/dev/null || /usr/bin/true
59428  0  I+   0:00.02           `-- /usr/local/bin/gdk-pixbuf-query-loaders (gdk-pixbuf-query-lo)

[9/41] Extracting librsvg2-rust-2.50.3_2: 100%
(hang)

[28/41] Extracting libavif-0.9.0_1: 100%
(hang)

# gdb /usr/local/bin/gdk-pixbuf-query-loaders 58871
GNU gdb (GDB) 10.1 [GDB v10.1 for FreeBSD]
[...]
(gdb) info thr
  Id   Target Id                                  Frame
* 1    LWP 114003 of process 58871                _umtx_op_err ()
    at /usr/src/lib/libthr/arch/amd64/amd64/_umtx_op_err.S:40
  2    LWP 178846 of process 58871 "pool-spawner" _umtx_op_err ()
    at /usr/src/lib/libthr/arch/amd64/amd64/_umtx_op_err.S:40
  3    LWP 178847 of process 58871 "gmain"        0x000000080071680a in ?? ()
   from /lib/libc.so.7
(gdb) bt
#0  _umtx_op_err () at /usr/src/lib/libthr/arch/amd64/amd64/_umtx_op_err.S:40
#1  0x00000008005c6830 in __thr_rwlock_wrlock
    (rwlock=0x8005cac80, tsp=<optimized out>)
    at /usr/src/lib/libthr/thread/thr_umtx.c:327
#2  0x00000008005c0f11 in _thr_rwlock_wrlock
    (rwlock=<optimized out>, tsp=<optimized out>)
    at /usr/src/lib/libthr/thread/thr_umtx.h:241
#3  _thr_rtld_wlock_acquire (lock=0x8005cac80)
    at /usr/src/lib/libthr/thread/thr_rtld.c:143
#4  0x0000000800216b3b in wlock_acquire
    (lock=0x800224640 <rtld_locks>, lockstate=lockstate@entry=0x7fffffffca90)
    at /usr/src/libexec/rtld-elf/rtld_lock.c:248
#5  0x0000000800211e71 in tls_get_addr_slow
    (dtvp=<optimized out>, index=3, offset=0)
    at /usr/src/libexec/rtld-elf/rtld.c:4878
#6  0x0000000800210df9 in rtld_fill_dl_phdr_info
    (obj=0x0, phdr_info=0x7fffffffcbf0)
    at /usr/src/libexec/rtld-elf/rtld.c:3896
#7  _rtld_addr_phdr (addr=0x802e1a200, phdr_info=0x7fffffffcbf0)
    at /usr/src/libexec/rtld-elf/rtld.c:3781
#8  0x000000080073f565 in __cxa_finalize (dso=0x8005cac80)
    at /usr/src/lib/libc/stdlib/atexit.c:209
#9  0x0000000802d6ae8c in  () at /usr/local/lib/libpixman-1.so.0
#10 0x0000000800c17808 in  ()
#11 0x0000000800c14008 in  ()
#12 0x00007fffffffd110 in  ()
#13 0x0000000802e10945 in _fini () at /usr/local/lib/libpixman-1.so.0
#14 0x00007fffffffd110 in  ()
#15 0x00000008002162a4 in objlist_call_fini
    (list=<optimized out>, root=<optimized out>, lockstate=0x800c16008)
    at /usr/src/libexec/rtld-elf/rtld.c:2766
(gdb) thr 2
[Switching to thread 2 (LWP 178846 of process 58871)]
#0  _umtx_op_err () at /usr/src/lib/libthr/arch/amd64/amd64/_umtx_op_err.S:40
40      RSYSCALL_ERR(_umtx_op)
(gdb) bt
#0  _umtx_op_err () at /usr/src/lib/libthr/arch/amd64/amd64/_umtx_op_err.S:40
#1  0x00000008005c6660 in _thr_umtx_timedwait_uint
    (mtx=0x80023d008, id=id@entry=0, clockid=<optimized out>, abstime=<optimized out>, shared=<optimized out>, shared@entry=0)
    at /usr/src/lib/libthr/thread/thr_umtx.c:236
#2  0x00000008005bd119 in _thr_sleep
    (curthread=curthread@entry=0x801012500, clockid=0, abstime=abstime@entry=0x0) at /usr/src/lib/libthr/thread/thr_kern.c:199
#3  0x00000008005b869f in cond_wait_user
    (cvp=0x801018e80, mp=0x800c22c88, abstime=0x0, cancel=1)
    at /usr/src/lib/libthr/thread/thr_cond.c:320
#4  cond_wait_common
    (cond=<optimized out>, mutex=<optimized out>, abstime=0x0, cancel=1)
    at /usr/src/lib/libthr/thread/thr_cond.c:380
#5  0x0000000800377185 in g_cond_wait () at /usr/local/lib/libglib-2.0.so.0
#6  0x00000008002eec35 in  () at /usr/local/lib/libglib-2.0.so.0
#7  0x000000080034e1f8 in  () at /usr/local/lib/libglib-2.0.so.0
#8  0x000000080034dbf2 in  () at /usr/local/lib/libglib-2.0.so.0
#9  0x00000008005b982b in thread_start (curthread=0x801012500)
    at /usr/src/lib/libthr/thread/thr_create.c:292
#10 0x0000000000000000 in  ()
(gdb) thr 3
[Switching to thread 3 (LWP 178847 of process 58871)]
#0  0x000000080071680a in ?? () from /lib/libc.so.7
(gdb) btr
Undefined command: "btr".  Try "help".
(gdb) bt
#0  0x000000080071680a in  () at /lib/libc.so.7
#1  0x00000008005c4f56 in __thr_poll (fds=0x801008210, nfds=1, timeout=-1)
    at /usr/src/lib/libthr/thread/thr_syscalls.c:338
#2  0x0000000800323471 in  () at /usr/local/lib/libglib-2.0.so.0
#3  0x0000000800323596 in g_main_context_iteration ()
    at /usr/local/lib/libglib-2.0.so.0
#4  0x00000008003255d1 in  () at /usr/local/lib/libglib-2.0.so.0
#5  0x000000080034dbf2 in  () at /usr/local/lib/libglib-2.0.so.0
#6  0x00000008005b982b in thread_start (curthread=0x801012a00)
    at /usr/src/lib/libthr/thread/thr_create.c:292
#7  0x0000000000000000 in  ()
(gdb) thr 1
[Switching to thread 1 (LWP 114003 of process 58871)]
#0  _umtx_op_err () at /usr/src/lib/libthr/arch/amd64/amd64/_umtx_op_err.S:40
40      RSYSCALL_ERR(_umtx_op)
(gdb) f 5
#5  0x0000000800211e71 in tls_get_addr_slow (dtvp=<optimized out>, index=3,
    offset=0) at /usr/src/libexec/rtld-elf/rtld.c:4878
(gdb) p rtld_bind_lock
$1 = (rtld_lock_t) 0x800224640 <rtld_locks>
(gdb) p *rtld_bind_lock
$2 = {handle = 0x8005cac80, mask = 1}
(gdb) p *(struct rtld_lock*)rtld_bind_lock.handle
$4 = {lock = {rw_state = 1, rw_flags = 2, rw_blocked_readers = 0,
    rw_blocked_writers = 0, rw_spare = {0, 0, 0, 0}},
  _pad = '\000' <repeats 31 times>}
(gdb) p dtv[0]
$5 = 3
(gdb) p tls_dtv_generation
$6 = 3
(gdb) p index
$7 = 3
(gdb) p dtv[index + 1]
$8 = 0
(gdb) f 3
#3  _thr_rtld_wlock_acquire (lock=0x8005cac80)
    at /usr/src/lib/libthr/thread/thr_rtld.c:143
143             while (_thr_rwlock_wrlock(&l->lock, NULL) != 0)
(gdb) set $l = (struct rtld_lock *)lock
(gdb) p $l->lock
$11 = {rw_state = 1, rw_flags = 2, rw_blocked_readers = 0,
  rw_blocked_writers = 0, rw_spare = {0, 0, 0, 0}}
Comment 1 Eric van Gyzen freebsd_committer freebsd_triage 2021-04-15 22:09:51 UTC
This is probably fixed by
    https://reviews.freebsd.org/D29623
I haven't verified, but reports on the mailing lists sound very similar to mine.