Bug 233204 - rtld issue on aarch64
Summary: rtld issue on aarch64
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: arm
Version: CURRENT
Hardware: arm64 Any
Importance: --- Affects Many People
Assignee: Michal Meloun
Depends on:
Blocks: 228892 232149 232311
Reported: 2018-11-13 18:04 UTC by Mikael Urankar
Modified: 2019-01-16 19:28 UTC (History)

test program (1.02 KB, text/plain)
2018-11-13 18:04 UTC, Mikael Urankar

Description Mikael Urankar freebsd_committer 2018-11-13 18:04:45 UTC
Created attachment 199211
test program

I'm seeing the following crash in rtld on aarch64 when a program uses dlopen, pthreads, and TLS variables. It is reproducible with the test program available at [1]:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  free_tls (tcb=0x4028e010, tcbsize=16, tcbalign=16) at /usr/src/libexec/rtld-elf/rtld.c:4842
4842        dtvsize = dtv[1];

(gdb) bt
#0  free_tls (tcb=0x4028e010, tcbsize=16, tcbalign=16) at /usr/src/libexec/rtld-elf/rtld.c:4842
#1  0x0000000040235910 in _rtld_free_tls (tcb=0x4028e010, tcbsize=16, tcbalign=<optimized out>)
   at /usr/src/libexec/rtld-elf/rtld.c:5062
#2  0x00000000402acde4 in _thr_free (curthread=0x406c4000, thread=0x406c4500) at /usr/src/lib/libthr/thread/thr_list.c:199
#3  0x00000000402accf0 in _thr_gc (curthread=0x406c4000) at /usr/src/lib/libthr/thread/thr_list.c:129
#4  0x00000000402ad164 in _thr_alloc (curthread=0x406c4000) at /usr/src/lib/libthr/thread/thr_list.c:141
#5  0x00000000402a2124 in _pthread_create (thread=0xffffffffe948, attr=0x0, start_routine=0x406d906c <do_something>, arg=0x0)
   at /usr/src/lib/libthr/thread/thr_create.c:81
#6  0x0000000000210364 in main ()

(gdb) p *0x4028e010
$1 = 666

The tcb points to my __thread variable, which seems wrong.
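
To illustrate why a tcb that really points at a __thread variable ends in a fault at `dtvsize = dtv[1]`, here is a minimal model. It is a sketch, not rtld code. It assumes the TLS Variant I layout, where the first word of the TCB holds the DTV pointer and `dtv[1]` holds the number of DTV slots. The names `model_tcb`, `model_dtv_word`, and `model_tcb_is_sane` are hypothetical, and the stand-in variable is pointer-sized for simplicity, whereas the variable in the report is printed by gdb as an int.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical, simplified model (not rtld code).
 * Assumption: the first word of a Variant I TCB is the DTV pointer,
 * and dtv[1] is the number of DTV slots.
 */
struct model_tcb {
	uintptr_t *dtv;
	void *reserved;
};

/* Return the word that free_tls-style code would treat as the DTV pointer. */
static uintptr_t
model_dtv_word(const void *tcb)
{
	uintptr_t word;

	assert(tcb != NULL);
	/* Copy the first word out instead of casting, to avoid aliasing issues. */
	memcpy(&word, tcb, sizeof(word));
	return (word);
}

/* For this model only: does the first word match the DTV we expect? */
static int
model_tcb_is_sane(const void *tcb, const uintptr_t *expected_dtv)
{
	return (model_dtv_word(tcb) == (uintptr_t)expected_dtv);
}
```

With a well-formed `model_tcb`, `model_dtv_word` returns the address of the DTV array. If the "tcb" is actually a variable holding 666, the returned word is 666 (0x29a), and reading `dtv[1]` through that value would touch a low, almost certainly unmapped address, which is consistent with the SIGSEGV in frame #0.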

I don't have the knowledge to debug this problem further, so any help would be greatly appreciated.

It crashes on 11.2-RELEASE and 13.0-CURRENT r340197.

[1]  http://mikael.urankar.free.fr/FreeBSD/aarch64/test.c
Comment 1 Greg V 2018-11-13 20:04:35 UTC
Is this strictly dlopen related?

I'm seeing a weird segfault in LDC: https://github.com/ldc-developers/druntime/pull/146#issuecomment-412674323. It doesn't look similar, since the crash is not *in* rtld, but it's still related to aarch64 + TLS.
Comment 2 Michal Meloun freebsd_committer 2018-11-14 04:49:51 UTC
Yep, TLS relocation for dlopened libraries is broken on AArch64.
Can you test the (still WIP) patch?
Comment 3 Mikael Urankar freebsd_committer 2018-11-14 16:03:44 UTC
(In reply to Michal Meloun from comment #2)
Unfortunately it still crashes, but the backtrace is different:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00000000406d9094 in do_something (arg=0x0) at test_lib.c:8
8               a = 666;
[Current thread is 1 (LWP 100642)]
(gdb) bt
#0  0x00000000406d9094 in do_something (arg=0x0) at test_lib.c:8
#1  0x00000000402a28c4 in thread_start (curthread=0x406c5900) at /usr/src/lib/libthr/thread/thr_create.c:292
#2  0x00000000402a2470 in _pthread_create (thread=0xffffffffe988, attr=<optimized out>, start_routine=<optimized out>, arg=<optimized out>) at /usr/src/lib/libthr/thread/thr_create.c:188
#3  0x0000000000210394 in main () at test.c:33
Comment 4 Mikael Urankar freebsd_committer 2018-11-14 16:10:24 UTC
Your current WIP patch fixes bug #232149 :)
Comment 5 Michal Meloun freebsd_committer 2018-12-12 12:08:53 UTC
The final (and much more complex) version of this patch is under review now:

Comment 6 commit-hook freebsd_committer 2018-12-15 10:39:07 UTC
A commit references this bug:

Author: mmel
Date: Sat Dec 15 10:38:10 UTC 2018
New revision: 342113
URL: https://svnweb.freebsd.org/changeset/base/342113

  Improve R_AARCH64_TLSDESC relocation.
  The original code did not support dynamically loaded libraries and used
  suboptimal access to TLS variables.
  New implementation removes lazy resolving of TLS relocation - due to flaw
  in TLSDESC design is impossible to switch resolver function at runtime
  without expensive locking.

  Due to this, 3 specialized resolvers are implemented:
   - load time resolver for TLS relocation from libraries loaded with main
     executable (thus with known TLS offset).
   - resolver for undefined thread weak symbols.
   - slower lazy resolver for dynamically loaded libraries with fast path for
     already resolved symbols.

  PR:		228892, 232149, 233204, 232311
  MFC after:	2 weeks
  Differential Revision:	https://reviews.freebsd.org/D18417
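
The three resolvers listed in the commit message above can be pictured as a choice made once, when the TLSDESC relocation is processed. The sketch below only illustrates that decision. The enum, the function names, and the two boolean inputs are hypothetical, and the real rtld code may be structured differently. It also assumes the caller has already rejected undefined non-weak symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the three resolvers named in the commit message. */
enum tlsdesc_resolver {
	TLSDESC_STATIC,		/* library loaded with the main executable */
	TLSDESC_UNDEF_WEAK,	/* undefined thread weak symbol */
	TLSDESC_DYNAMIC		/* dynamically loaded library, slower path */
};

/*
 * Pick a resolver once, at relocation time, so that it never has to be
 * switched at runtime. Assumption: an undefined symbol reaching this
 * point is weak.
 */
static enum tlsdesc_resolver
choose_tlsdesc_resolver(bool symbol_defined, bool static_tls_offset_known)
{
	if (!symbol_defined)
		return (TLSDESC_UNDEF_WEAK);
	if (static_tls_offset_known)
		return (TLSDESC_STATIC);
	return (TLSDESC_DYNAMIC);
}

/* Human-readable name, for logging in this sketch only. */
static const char *
tlsdesc_resolver_name(enum tlsdesc_resolver r)
{
	switch (r) {
	case TLSDESC_STATIC:
		return ("static");
	case TLSDESC_UNDEF_WEAK:
		return ("undef-weak");
	case TLSDESC_DYNAMIC:
		return ("dynamic");
	}
	assert(!"unknown resolver kind");
	return (NULL);
}
```

In this model, a library opened with dlopen has no TLS offset known at load time, so its symbols would get the dynamic resolver. That is the case exercised by the test program in this report.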

Comment 7 Torsten Zuehlsdorff freebsd_committer 2019-01-16 16:49:05 UTC
Is the issue solved after the commit? If so, please close the PR.

Currently it blocks 232311, which is already solved by this commit, but that bug cannot be closed while this one is still open. ;)