See Dimitry's thread at http://lists.llvm.org/pipermail/release-testers/2017-August/000475.html. This started after the jemalloc update in r319971 https://reviews.freebsd.org/rS319971 ref12-i386% clang --version FreeBSD clang version 5.0.0 (trunk 308421) (based on LLVM 5.0.0svn) ... hello.c is just a printf("hello world"); ref12-i386% cc -g -fsanitize=address hello.c ref12-i386% ./a.out ==14688==AddressSanitizer CHECK failed: /usr/src/contrib/compiler-rt/lib/asan/asan_poisoning.cc:36 "((AddrIsAlignedByGranularity(addr))) != (0)" (0x0, 0x0) #0 0x806f0e3 (/home/emaste/a.out+0x806f0e3) #1 0x805dcc3 (/home/emaste/a.out+0x805dcc3) #2 0x80dfc35 (/home/emaste/a.out+0x80dfc35) #3 0x806961d (/home/emaste/a.out+0x806961d) #4 0x80698bf (/home/emaste/a.out+0x80698bf) #5 0x806ec73 (/home/emaste/a.out+0x806ec73) #6 0x8092785 (/home/emaste/a.out+0x8092785) ref12-i386%
I did a bit of further research, adding some instrumentation to ASan. In particular, to its initialization in ClearShadowForThreadStackAndTLS() here: https://github.com/llvm-mirror/compiler-rt/blob/master/lib/asan/asan_thread.cc#L301 and the TLS support here: https://github.com/llvm-mirror/compiler-rt/blob/master/lib/sanitizer_common/sanitizer_linux_libcdep.cc#L358 It turns out that the beginning of its TLS area (tls_begin) is not aligned to 8 bytes anymore, after r319971: $ asprintf.cc.tmp ==65176==DEBUG lib/asan/asan_thread.cc(228): stack_top=0x00000000, stack_bottom=0x00000000 ==65176==DEBUG lib/sanitizer_common/sanitizer_linux_libcdep.cc(358): segbase=0x28423a0c ==65176==DEBUG lib/sanitizer_common/sanitizer_linux_libcdep.cc(368): dtv[0]=0x00000002, dtv[1]=0x00000002, dtv[2]=0x284239f4 ==65176==DEBUG lib/sanitizer_common/sanitizer_linux_libcdep.cc(402): tls_addr=0x284239f4, tls_size=0x00000018 ==65176==DEBUG lib/sanitizer_common/sanitizer_linux_libcdep.cc(407): stack_top=0xbfbdf000, stack_bottom=0xbfbff000 ==65176==DEBUG lib/asan/asan_thread.cc(235): stack_top=0xbfbdf000, stack_bottom=0xbfbff000 ==65176==DEBUG lib/asan/asan_thread.cc(292): stack_top=0xbfbdf000, stack_bottom=0xbfbff000 ==65176==DEBUG lib/asan/asan_thread.cc(296): tls_begin=0x284239f4, tls_end=0x28423a0c ==65176==AddressSanitizer CHECK failed: lib/asan/asan_poisoning.cc:36 "((AddrIsAlignedByGranularity(addr))) != (0)" (0x0, 0x0) #0 0x80b5850 in __asan::AsanCheckFailed(char const*, int, char const, unsigned long long, unsigned long long) lib/asan/asan_rtl.cc:69:3 #1 0x80c76ca in __sanitizer::CheckFailed(char const*, int, char const, unsigned long long, unsigned long long) lib/sanitizer_common/sanitizer_termination.cc:79:5 #2 0x80af170 in __asan::PoisonShadow(unsigned long, unsigned long, unsigned char) lib/asan/asan_poisoning.cc:36:3 #3 0x80b7581 in ClearShadowForThreadStackAndTLS lib/asan/asan_thread.cc:298:5 #4 0x80b7581 in __asan::AsanThread::Init(void) lib/asan/asan_thread.cc:240 #5 0x80b774d in __asan::AsanThread::ThreadStart(unsigned long, __sanitizer::atomic_uintptr_t*) lib/asan/asan_thread.cc:249:3 #6 0x80b54c5 in __asan::AsanInitInternal(void) lib/asan/asan_rtl.cc:591:16 #7 0x807c7aa in clock_gettime lib/sanitizer_common/sanitizer_common_interceptors.inc:1995:3 E.g in its GetTls() routine it retrieves segbase (probably from %gs), then derives the DTV array from that, and uses the third element as the TLS base address. In this particular case it ends up at 0x284239f4, which is not 8-byte aligned, causing the internal CHECK failure. In contrast, with older libc, from r319970 or before, the TLS base is aligned to 8 bytes, whether that is by accident or on purpose, I don't know: $ LD_LIBRARY_PATH=/usr/obj/usr/src/lib/libc.r319970 asprintf.cc.tmp ==65175==DEBUG lib/asan/asan_thread.cc(228): stack_top=0x00000000, stack_bottom=0x00000000 ==65175==DEBUG lib/sanitizer_common/sanitizer_linux_libcdep.cc(358): segbase=0x281212f8 ==65175==DEBUG lib/sanitizer_common/sanitizer_linux_libcdep.cc(368): dtv[0]=0x00000002, dtv[1]=0x00000002, dtv[2]=0x281212e0 ==65175==DEBUG lib/sanitizer_common/sanitizer_linux_libcdep.cc(402): tls_addr=0x281212e0, tls_size=0x00000018 ==65175==DEBUG lib/sanitizer_common/sanitizer_linux_libcdep.cc(407): stack_top=0xbfbdf000, stack_bottom=0xbfbff000 ==65175==DEBUG lib/asan/asan_thread.cc(235): stack_top=0xbfbdf000, stack_bottom=0xbfbff000 ==65175==DEBUG lib/asan/asan_thread.cc(292): stack_top=0xbfbdf000, stack_bottom=0xbfbff000 ==65175==DEBUG lib/asan/asan_thread.cc(296): tls_begin=0x281212e0, tls_end=0x281212f8 x1 1x DONE And then all the ASan machinery works fine. I have tried to follow how the DTV array is filled, but I end up in rtld instead of libc, so I don't fully understand how replacing libc can make a difference in this case. Konstantin, maybe you can shed some light? Of course I know that there is no guarantee at all for any other alignment than 4 bytes on i386, but it seems that we have somehow always managed (until now) to align the TLS base addresses (at least for ASan's threads) to 8 bytes.... Could we do some sort of compat hack to make this work? Because after r319971, all executables with ASan fail in the above manner, even those compiled long ago with earlier versions of clang (and maybe even gcc).
(In reply to Dimitry Andric from comment #1) Module tls is allocated in rtld.c:allocate_module_tls(). We use the requested alignment of the TLS initialization segment as the guaranteed alignment value. There are several possibilities how something might not get aligned: 1. The segment has inappropriate alignment requirement, check with readelf. 2. The symbol inside the TLS segment is unaligned, so the segment alignment does not help. Again, check with readelf. 3. The allocate_module_tls() is buggy. Probably check this last, by inserting rtld_printf() into rtld code. I would bet that the cause is either 1 or 2, esp. because this sounds like a deja vu, isn't it ?
Looking at libclang_rt.asan-i386.so it does have PT_TLS with 4-byte alignment, and perhaps that should be changed on general principle: % readelf -l /usr/lib/clang/5.0.0/lib/freebsd/libclang_rt.asan-i386.so Elf file type is DYN (Shared object file) Entry point 0xbb80 There are 6 program headers, starting at offset 52 Program Headers: Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align LOAD 0x000000 0x00000000 0x00000000 0xc21c4 0xc21c4 R E 0x1000 LOAD 0x0c21c4 0x000c31c4 0x000c31c4 0x02f08 0x49a58c RW 0x1000 DYNAMIC 0x0c2470 0x000c3470 0x000c3470 0x000f8 0x000f8 RW 0x4 TLS 0x0c21c4 0x000c31c4 0x000c31c4 0x00000 0x00018 R 0x4 GNU_EH_FRAME 0x0b5a50 0x000b5a50 0x000b5a50 0x02934 0x02934 R 0x4 GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RW 0x4 Section to Segment mapping: Segment Sections... 00 .hash .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .text .fini .rodata .eh_frame_hdr .eh_frame .comment .gnu_debuglink .shstrtab 01 .tbss .ctors .dtors .jcr .data.rel.ro .dynamic .got .got.plt .data .bss 02 .dynamic 03 .tbss .ctors 04 .eh_frame_hdr 05 ref12-i386% but note that asan is not asserting that one of its symbols is not 8-byte aligned but rather (some #ifdefs trimmed, showing only the FreeBSD/i386 path): static void **ThreadSelfSegbase() { void **segbase = 0; // sysarch(I386_GET_GSBASE, segbase); __asm __volatile("mov %%gs:0, %0" : "=r" (segbase)); return segbase; } uptr ThreadSelf() { return (uptr)ThreadSelfSegbase()[2]; } static void GetTls(uptr *addr, uptr *size) { void** segbase = ThreadSelfSegbase(); *addr = 0; *size = 0; if (segbase != 0) { // tcbalign = 16 // tls_size = round(tls_static_space, tcbalign); // dtv = segbase[1]; // dtv[2] = segbase - tls_static_space; void **dtv = (void**) segbase[1]; *addr = (uptr) dtv[2]; *size = (*addr == 0) ? 0 : ((uptr) segbase[0] - (uptr) dtv[2]); } } and later asserts that the addr returned from GetTls is 8-byte aligned
(In reply to Ed Maste from comment #3) So it asserts that the TLS segment for module with the index 1 is 8-bytes aligned. Which is strange, indeed.
(In reply to Ed Maste from comment #3) > Looking at libclang_rt.asan-i386.so it does have PT_TLS with 4-byte > alignment, and perhaps that should be changed on general principle: > > % readelf -l /usr/lib/clang/5.0.0/lib/freebsd/libclang_rt.asan-i386.so Unfortunately, in most cases this .so is not used, but libclang_rt.asan-i386.a is statically linked into the asan-instrumented executable. For instance, I have this asan-instrumented executable, which is built on FreeBSD 10/i386, using clang 4.0.1: $ ldd ./asantest-f10-i386 ./asantest-f10-i386: libthr.so.3 => /lib/libthr.so.3 (0x28124000) librt.so.1 => /usr/lib/librt.so.1 (0x28149000) libm.so.5 => /lib/libm.so.5 (0x2814f000) libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x2817a000) libc.so.7 => /lib/libc.so.7 (0x2818f000) It has these program headers: Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align PHDR 0x000034 0x08048034 0x08048034 0x00140 0x00140 R E 0x4 INTERP 0x000174 0x08048174 0x08048174 0x00015 0x00015 R 0x1 [Requesting program interpreter: /libexec/ld-elf.so.1] LOAD 0x000000 0x08048000 0x08048000 0xb2cec 0xb2cec R E 0x1000 LOAD 0x0b36fc 0x080fc6fc 0x080fc6fc 0x02df4 0x459ed4 RW 0x1000 DYNAMIC 0x0b39a8 0x080fc9a8 0x080fc9a8 0x00110 0x00110 RW 0x4 NOTE 0x00018c 0x0804818c 0x0804818c 0x00030 0x00030 R 0x4 TLS 0x0b36fc 0x080fc6fc 0x080fc6fc 0x00000 0x00018 R 0x4 GNU_EH_FRAME 0x09dc48 0x080e5c48 0x080e5c48 0x028fc 0x028fc R 0x4 GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RWE 0x10 GNU_RELRO 0x0b36fc 0x080fc6fc 0x080fc6fc 0x00904 0x00904 R 0x1 Most align fields are set to 4, which seems logical on i386. Meanwhile, I compared the program headers of libc.so.7 r319970: Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align LOAD 0x000000 0x00000000 0x00000000 0x1660f8 0x1660f8 R E 0x1000 LOAD 0x1660f8 0x001670f8 0x001670f8 0x06918 0x1b820 RW 0x1000 DYNAMIC 0x168dd8 0x00169dd8 0x00169dd8 0x000c8 0x000c8 RW 0x4 TLS 0x1660f8 0x001670f8 0x001670f8 0x00040 0x0005c R 0x4 GNU_EH_FRAME 0x165cf8 0x00165cf8 0x00165cf8 0x000dc 0x000dc R 0x4 GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RW 0x4 and of libc.so.7 r319971: Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align LOAD 0x000000 0x00000000 0x00000000 0x190368 0x190368 R E 0x1000 LOAD 0x190368 0x00191368 0x00191368 0x07ab8 0x274e0 RW 0x1000 DYNAMIC 0x194148 0x00195148 0x00195148 0x000d0 0x000d0 RW 0x4 TLS 0x190368 0x00191368 0x00191368 0x00954 0x00970 R 0x4 GNU_EH_FRAME 0x18ff68 0x0018ff68 0x0018ff68 0x000dc 0x000dc R 0x4 GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RW 0x4 What strikes me is that the size of the TLS section has greatly increased, from 0x40 to 0x954. E.g. it has increased 370%, and also its size is no longer a multiple of 16 bytes. Maybe the new jemalloc adds a lot more 'static' TLS data?
(In reply to Dimitry Andric from comment #5) Still the question is up. Why this code wants 8-byte alignment of the TLS segment for module with index 1 ? What would not work otherwise ?
(In reply to Konstantin Belousov from comment #6) > (In reply to Dimitry Andric from comment #5) > Still the question is up. Why this code wants 8-byte alignment of the TLS > segment for module with index 1 ? > > What would not work otherwise ? Ah sorry, that is the way upstream compiler-rt has apparently hardcoded it, e.g. they use a define SHADOW_GRANULARITY for this, here: https://github.com/llvm-mirror/compiler-rt/blob/master/lib/asan/asan_mapping.h#L193 127 static const u64 kDefaultShadowScale = 3; ... 145 #define SHADOW_SCALE kDefaultShadowScale ... 193 #define SHADOW_GRANULARITY (1ULL << SHADOW_SCALE) The same granularity scale values is found in the llvm instrumentation libraries, here: https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/Instrumentation/AddressSanitizer.cpp#L70 70 static const uint64_t kDefaultShadowScale = 3; In the past I have attempted to lower this scale to 2, but ran into hundreds of test failures, so I abandoned that effort. I think the assumption is working without issue for e.g. Linux, since they left the SysV ABI a long time ago there. And apparently we have been lucky somehow, all this time; it has worked well since the first time we tried AddressSanitizer, up to 12.0 before jemalloc 5.0.0.
Some possibly useful references: This (GetThreadStackAndTls) originally arrived in https://reviews.llvm.org/rL156542. FreeBSD support was added in https://reviews.llvm.org/rL203235. Thread sanitizer algorithm description at https://github.com/google/sanitizers/wiki/ThreadSanitizerAlgorithm I think the use of dtv[2] is indeed strange - AFAICT the intent is that it should be returning the address and size of the entire TLS block -- at least, the contiguous block of "initially available modules." I suppose there are a few fairly straightforward changes/hacks that could be tried: 1. Change the FreeBSD GetTls in sanitizer_linux_libcdep.cc to set *addr = 0 and *size = 0. My guess is that this will just exclude all tls accesses from the sanitizer. If that works I'd be happy enough with it as an initial stopgap on FreeBSD/i386 (asan that does not detect errors with tls is better than asan that does not work at all). 2. Make TLS_TCB_ALIGN 8 for i386 in libc tls.c 3. Round addr down to be 8-byte aligned (increasing size as necessary) All of this goop is to implement a function void GetThreadStackAndTls(bool main, uptr *stk_addr, uptr *stk_size, uptr *tls_addr, uptr *tls_size) to fetch the extent of the current thread's stack and tls regions; ideally we can replace this with a clean and consistent FreeBSD-specific interface.
(In reply to Ed Maste from comment #8) Note that libc tls.c is only used for static binaries (at least this is my current understanding).
Created attachment 187379 [details] Ensure alignment of jemalloc's TSD is 8 bytes on 32-bit platforms Here is a possible way of fixing this issue: it ensures that the big tsd_t struct in contrib/jemalloc/src/tsd.c is aligned to 8 bytes, on !LP64 platforms (i.e. i386, but it could also apply to arm or mips, if AddressSanitizer ever gets to work there). Before r319971, on i386 the TLS items in libc.so.7 added up to 80 bytes of storage (a multiple of 8 bytes), assuming __je_tsd_initialized is aligned at 4 bytes: $ readelf -sW /usr/obj/head-r319970/usr/src/lib/libc/libc.so.7|grep -w TLS 1139: 00000058 4 TLS GLOBAL DEFAULT 17 _ThreadRuneLocale@@FBSD_1.3 547: 00000054 4 TLS LOCAL DEFAULT 17 __thread_locale 607: 00000000 64 TLS LOCAL DEFAULT 16 __je_tsd_tls 608: 0000004c 1 TLS LOCAL DEFAULT 17 __je_tsd_initialized 3088: 00000058 4 TLS GLOBAL DEFAULT 17 _ThreadRuneLocale With r319971, this increased quite a lot, to 2404 bytes (which is not a multiple of 8 bytes anymore, unfortunately): $ readelf -sW /usr/obj/head-r319971/usr/src/lib/libc/libc.so.7|grep -w TLS 1139: 0000096c 4 TLS GLOBAL DEFAULT 17 _ThreadRuneLocale@@FBSD_1.3 441: 00000968 4 TLS LOCAL DEFAULT 17 __thread_locale 500: 00000000 2388 TLS LOCAL DEFAULT 16 __je_tsd_tls 502: 00000960 1 TLS LOCAL DEFAULT 17 __je_tsd_initialized 2845: 0000096c 4 TLS GLOBAL DEFAULT 17 _ThreadRuneLocale All the growth is in the __je_tsd_tls struct, as you can see. If this struct is forced to align to 8 bytes, the total amount of TLS data also becomes aligned to 8 bytes. An alternative would be to unconditionally align the struct at, say, 16 bytes, and get rid of the #ifdef. Yet another alternative would be to place some bogus padding 4 byte entity somewhere else in libc.so to ensure the TLS data is a multiple of 8 bytes. Suggestions as to where are welcome. :)
(In reply to Dimitry Andric from comment #10) I somewhat prefer the unconditional alignment to 16 bytes, without check for !LP64.
Created attachment 187398 [details] Ensure jemalloc's TSD is aligned to 16 bytes Okay, unconditionally aligning to 16 bytes looks a bit cleaner, and should not matter for any other arch. I will commit this tomorrow, if there are no objections.
The patch in attachment 187398 [details] seems fine to me.
A commit references this bug: Author: dim Date: Mon Oct 23 21:31:04 UTC 2017 New revision: 324938 URL: https://svnweb.freebsd.org/changeset/base/324938 Log: After jemalloc was updated to version 5.0.0 in r319971, i386 executables linked with AddressSanitizer (even those linked on earlier versions of FreeBSD, or with external versions of clang) started failing with errors similar to: ==14688==AddressSanitizer CHECK failed: /usr/src/contrib/compiler-rt/lib/asan/asan_poisoning.cc:36 "((AddrIsAlignedByGranularity(addr))) != (0)" (0x0, 0x0) This is because AddressSanitizer expects all the TLS data in the program to be aligned to at least 8 bytes. Before the jemalloc 5.0.0 update, all the TLS data in the i386 version of libc.so added up to 80 bytes (a multiple of 8), but 5.0.0 made this grow to 2404 bytes (not a multiple of 8). This is due to added caching data in jemalloc's internal struct tsd_s. To fix AddressSanitizer, ensure this struct is aligned to at least 16 bytes, which can be done unconditionally for all architectures. (An earlier version of the fix aligned the struct to 8 bytes, but only for ILP32 architectures. This was deemed unnecessarily complicated.) PR: 221337 X-MFC-With: r319971 Changes: head/contrib/jemalloc/include/jemalloc/internal/tsd.h