Bug 221337

Summary: -fsanitize=address (asan) fails on i386
Product: Base System Reporter: Ed Maste <emaste>
Component: binAssignee: Dimitry Andric <dim>
Status: Closed FIXED    
Severity: Affects Only Me CC: andreast, dim, jasone, kib
Priority: ---    
Version: CURRENT   
Hardware: Any   
OS: Any   
Attachments:
Description Flags
Ensure alignment of jemalloc's TSD is 8 bytes on 32-bit platforms
none
Ensure jemalloc's TSD is aligned to 16 bytes none

Description Ed Maste freebsd_committer freebsd_triage 2017-08-08 17:12:22 UTC
See Dimitry's thread at http://lists.llvm.org/pipermail/release-testers/2017-August/000475.html. This started after the jemalloc update in r319971 https://reviews.freebsd.org/rS319971

ref12-i386% clang --version
FreeBSD clang version 5.0.0 (trunk 308421) (based on LLVM 5.0.0svn)
...

hello.c is just a printf("hello world");

ref12-i386% cc -g -fsanitize=address hello.c
ref12-i386% ./a.out
==14688==AddressSanitizer CHECK failed: /usr/src/contrib/compiler-rt/lib/asan/asan_poisoning.cc:36 "((AddrIsAlignedByGranularity(addr))) != (0)" (0x0, 0x0)
    #0 0x806f0e3  (/home/emaste/a.out+0x806f0e3)
    #1 0x805dcc3  (/home/emaste/a.out+0x805dcc3)
    #2 0x80dfc35  (/home/emaste/a.out+0x80dfc35)
    #3 0x806961d  (/home/emaste/a.out+0x806961d)
    #4 0x80698bf  (/home/emaste/a.out+0x80698bf)
    #5 0x806ec73  (/home/emaste/a.out+0x806ec73)
    #6 0x8092785  (/home/emaste/a.out+0x8092785)

ref12-i386%
Comment 1 Dimitry Andric freebsd_committer freebsd_triage 2017-08-15 21:12:47 UTC
I did a bit of further research, adding some instrumentation to ASan.  In particular, to its initialization in ClearShadowForThreadStackAndTLS() here:

https://github.com/llvm-mirror/compiler-rt/blob/master/lib/asan/asan_thread.cc#L301

and the TLS support here:

https://github.com/llvm-mirror/compiler-rt/blob/master/lib/sanitizer_common/sanitizer_linux_libcdep.cc#L358

It turns out that the beginning of its TLS area (tls_begin) is not aligned to 8 bytes anymore, after r319971:

$ asprintf.cc.tmp
==65176==DEBUG lib/asan/asan_thread.cc(228): stack_top=0x00000000, stack_bottom=0x00000000
==65176==DEBUG lib/sanitizer_common/sanitizer_linux_libcdep.cc(358): segbase=0x28423a0c
==65176==DEBUG lib/sanitizer_common/sanitizer_linux_libcdep.cc(368): dtv[0]=0x00000002, dtv[1]=0x00000002, dtv[2]=0x284239f4
==65176==DEBUG lib/sanitizer_common/sanitizer_linux_libcdep.cc(402): tls_addr=0x284239f4, tls_size=0x00000018
==65176==DEBUG lib/sanitizer_common/sanitizer_linux_libcdep.cc(407): stack_top=0xbfbdf000, stack_bottom=0xbfbff000
==65176==DEBUG lib/asan/asan_thread.cc(235): stack_top=0xbfbdf000, stack_bottom=0xbfbff000
==65176==DEBUG lib/asan/asan_thread.cc(292): stack_top=0xbfbdf000, stack_bottom=0xbfbff000
==65176==DEBUG lib/asan/asan_thread.cc(296): tls_begin=0x284239f4, tls_end=0x28423a0c
==65176==AddressSanitizer CHECK failed: lib/asan/asan_poisoning.cc:36 "((AddrIsAlignedByGranularity(addr))) != (0)" (0x0, 0x0)
    #0 0x80b5850 in __asan::AsanCheckFailed(char const*, int, char const, unsigned long long, unsigned long long) lib/asan/asan_rtl.cc:69:3
    #1 0x80c76ca in __sanitizer::CheckFailed(char const*, int, char const, unsigned long long, unsigned long long) lib/sanitizer_common/sanitizer_termination.cc:79:5
    #2 0x80af170 in __asan::PoisonShadow(unsigned long, unsigned long, unsigned char) lib/asan/asan_poisoning.cc:36:3
    #3 0x80b7581 in ClearShadowForThreadStackAndTLS lib/asan/asan_thread.cc:298:5
    #4 0x80b7581 in __asan::AsanThread::Init(void) lib/asan/asan_thread.cc:240
    #5 0x80b774d in __asan::AsanThread::ThreadStart(unsigned long, __sanitizer::atomic_uintptr_t*) lib/asan/asan_thread.cc:249:3
    #6 0x80b54c5 in __asan::AsanInitInternal(void) lib/asan/asan_rtl.cc:591:16
    #7 0x807c7aa in clock_gettime lib/sanitizer_common/sanitizer_common_interceptors.inc:1995:3

E.g in its GetTls() routine it retrieves segbase (probably from %gs), then derives the DTV array from that, and uses the third element as the TLS base address.  In this particular case it ends up at 0x284239f4, which is not 8-byte aligned, causing the internal CHECK failure.

In contrast, with older libc, from r319970 or before, the TLS base is aligned to 8 bytes, whether that is by accident or on purpose, I don't know:

$ LD_LIBRARY_PATH=/usr/obj/usr/src/lib/libc.r319970 asprintf.cc.tmp
==65175==DEBUG lib/asan/asan_thread.cc(228): stack_top=0x00000000, stack_bottom=0x00000000
==65175==DEBUG lib/sanitizer_common/sanitizer_linux_libcdep.cc(358): segbase=0x281212f8
==65175==DEBUG lib/sanitizer_common/sanitizer_linux_libcdep.cc(368): dtv[0]=0x00000002, dtv[1]=0x00000002, dtv[2]=0x281212e0
==65175==DEBUG lib/sanitizer_common/sanitizer_linux_libcdep.cc(402): tls_addr=0x281212e0, tls_size=0x00000018
==65175==DEBUG lib/sanitizer_common/sanitizer_linux_libcdep.cc(407): stack_top=0xbfbdf000, stack_bottom=0xbfbff000
==65175==DEBUG lib/asan/asan_thread.cc(235): stack_top=0xbfbdf000, stack_bottom=0xbfbff000
==65175==DEBUG lib/asan/asan_thread.cc(292): stack_top=0xbfbdf000, stack_bottom=0xbfbff000
==65175==DEBUG lib/asan/asan_thread.cc(296): tls_begin=0x281212e0, tls_end=0x281212f8
x1 1x
DONE

And then all the ASan machinery works fine.

I have tried to follow how the DTV array is filled, but I end up in rtld instead of libc, so I don't fully understand how replacing libc can make a difference in this case.  Konstantin, maybe you can shed some light?

Of course I know that there is no guarantee at all for any other alignment than 4 bytes on i386, but it seems that we have somehow always managed (until now) to align the TLS base addresses (at least for ASan's threads) to 8 bytes....

Could we do some sort of compat hack to make this work?  Because after r319971, all executables with ASan fail in the above manner, even those compiled long ago with earlier versions of clang (and maybe even gcc).
Comment 2 Konstantin Belousov freebsd_committer freebsd_triage 2017-08-16 10:37:37 UTC
(In reply to Dimitry Andric from comment #1)
Module tls is allocated in rtld.c:allocate_module_tls().  We use the requested alignment of the TLS initialization segment as the guaranteed alignment value.

There are several possibilities how something might not get aligned:
1. The segment has inappropriate alignment requirement, check with readelf.
2. The symbol inside the TLS segment is unaligned, so the segment alignment does not help.  Again, check with readelf.
3. The allocate_module_tls() is buggy. Probably check this last, by inserting rtld_printf() into rtld code.

I would bet that the cause is either 1 or 2, esp. because this sounds like a deja vu, isn't it ?
Comment 3 Ed Maste freebsd_committer freebsd_triage 2017-08-16 12:41:57 UTC
Looking at libclang_rt.asan-i386.so it does have PT_TLS with 4-byte alignment, and perhaps that should be changed on general principle:

% readelf -l /usr/lib/clang/5.0.0/lib/freebsd/libclang_rt.asan-i386.so

Elf file type is DYN (Shared object file)
Entry point 0xbb80
There are 6 program headers, starting at offset 52

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x000000 0x00000000 0x00000000 0xc21c4 0xc21c4 R E 0x1000
  LOAD           0x0c21c4 0x000c31c4 0x000c31c4 0x02f08 0x49a58c RW  0x1000
  DYNAMIC        0x0c2470 0x000c3470 0x000c3470 0x000f8 0x000f8 RW  0x4
  TLS            0x0c21c4 0x000c31c4 0x000c31c4 0x00000 0x00018 R   0x4
  GNU_EH_FRAME   0x0b5a50 0x000b5a50 0x000b5a50 0x02934 0x02934 R   0x4
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x4

 Section to Segment mapping:
  Segment Sections...
   00     .hash .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .text .fini .rodata .eh_frame_hdr .eh_frame .comment .gnu_debuglink .shstrtab 
   01     .tbss .ctors .dtors .jcr .data.rel.ro .dynamic .got .got.plt .data .bss 
   02     .dynamic 
   03     .tbss .ctors 
   04     .eh_frame_hdr 
   05     
ref12-i386% 

but note that asan is not asserting that one of its symbols is not 8-byte aligned but rather (some #ifdefs trimmed, showing only the FreeBSD/i386 path):

static void **ThreadSelfSegbase() {
  void **segbase = 0;
  // sysarch(I386_GET_GSBASE, segbase);
  __asm __volatile("mov %%gs:0, %0" : "=r" (segbase));
  return segbase;
}

uptr ThreadSelf() {
  return (uptr)ThreadSelfSegbase()[2];
}

static void GetTls(uptr *addr, uptr *size) {
  void** segbase = ThreadSelfSegbase();
  *addr = 0;
  *size = 0;
  if (segbase != 0) {
    // tcbalign = 16
    // tls_size = round(tls_static_space, tcbalign);
    // dtv = segbase[1];
    // dtv[2] = segbase - tls_static_space;
    void **dtv = (void**) segbase[1];
    *addr = (uptr) dtv[2];
    *size = (*addr == 0) ? 0 : ((uptr) segbase[0] - (uptr) dtv[2]);
  }
}

and later asserts that the addr returned from GetTls is 8-byte aligned
Comment 4 Konstantin Belousov freebsd_committer freebsd_triage 2017-08-16 12:58:03 UTC
(In reply to Ed Maste from comment #3)
So it asserts that the TLS segment for module with the index 1 is 8-bytes aligned.  Which is strange, indeed.
Comment 5 Dimitry Andric freebsd_committer freebsd_triage 2017-08-16 17:08:37 UTC
(In reply to Ed Maste from comment #3)
> Looking at libclang_rt.asan-i386.so it does have PT_TLS with 4-byte
> alignment, and perhaps that should be changed on general principle:
> 
> % readelf -l /usr/lib/clang/5.0.0/lib/freebsd/libclang_rt.asan-i386.so

Unfortunately, in most cases this .so is not used, but libclang_rt.asan-i386.a is statically linked into the asan-instrumented executable.

For instance, I have this asan-instrumented executable, which is built on FreeBSD 10/i386, using clang 4.0.1:

$ ldd ./asantest-f10-i386
./asantest-f10-i386:
	libthr.so.3 => /lib/libthr.so.3 (0x28124000)
	librt.so.1 => /usr/lib/librt.so.1 (0x28149000)
	libm.so.5 => /lib/libm.so.5 (0x2814f000)
	libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x2817a000)
	libc.so.7 => /lib/libc.so.7 (0x2818f000)

It has these program headers:

  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  PHDR           0x000034 0x08048034 0x08048034 0x00140 0x00140 R E 0x4
  INTERP         0x000174 0x08048174 0x08048174 0x00015 0x00015 R   0x1
      [Requesting program interpreter: /libexec/ld-elf.so.1]
  LOAD           0x000000 0x08048000 0x08048000 0xb2cec 0xb2cec R E 0x1000
  LOAD           0x0b36fc 0x080fc6fc 0x080fc6fc 0x02df4 0x459ed4 RW  0x1000
  DYNAMIC        0x0b39a8 0x080fc9a8 0x080fc9a8 0x00110 0x00110 RW  0x4
  NOTE           0x00018c 0x0804818c 0x0804818c 0x00030 0x00030 R   0x4
  TLS            0x0b36fc 0x080fc6fc 0x080fc6fc 0x00000 0x00018 R   0x4
  GNU_EH_FRAME   0x09dc48 0x080e5c48 0x080e5c48 0x028fc 0x028fc R   0x4
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RWE 0x10
  GNU_RELRO      0x0b36fc 0x080fc6fc 0x080fc6fc 0x00904 0x00904 R   0x1

Most align fields are set to 4, which seems logical on i386.

Meanwhile, I compared the program headers of libc.so.7 r319970:

  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x000000 0x00000000 0x00000000 0x1660f8 0x1660f8 R E 0x1000
  LOAD           0x1660f8 0x001670f8 0x001670f8 0x06918 0x1b820 RW  0x1000
  DYNAMIC        0x168dd8 0x00169dd8 0x00169dd8 0x000c8 0x000c8 RW  0x4
  TLS            0x1660f8 0x001670f8 0x001670f8 0x00040 0x0005c R   0x4
  GNU_EH_FRAME   0x165cf8 0x00165cf8 0x00165cf8 0x000dc 0x000dc R   0x4
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x4

and of libc.so.7 r319971:

  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x000000 0x00000000 0x00000000 0x190368 0x190368 R E 0x1000
  LOAD           0x190368 0x00191368 0x00191368 0x07ab8 0x274e0 RW  0x1000
  DYNAMIC        0x194148 0x00195148 0x00195148 0x000d0 0x000d0 RW  0x4
  TLS            0x190368 0x00191368 0x00191368 0x00954 0x00970 R   0x4
  GNU_EH_FRAME   0x18ff68 0x0018ff68 0x0018ff68 0x000dc 0x000dc R   0x4
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x4

What strikes me is that the size of the TLS section has greatly increased, from 0x40 to 0x954.  E.g. it has increased 370%, and also its size is no longer a multiple of 16 bytes.

Maybe the new jemalloc adds a lot more 'static' TLS data?
Comment 6 Konstantin Belousov freebsd_committer freebsd_triage 2017-08-16 17:30:15 UTC
(In reply to Dimitry Andric from comment #5)
Still the question is up.  Why this code wants 8-byte alignment of the TLS segment for module with index 1 ?

What would not work otherwise ?
Comment 7 Dimitry Andric freebsd_committer freebsd_triage 2017-08-16 17:47:02 UTC
(In reply to Konstantin Belousov from comment #6)
> (In reply to Dimitry Andric from comment #5)
> Still the question is up.  Why this code wants 8-byte alignment of the TLS
> segment for module with index 1 ?
> 
> What would not work otherwise ?

Ah sorry, that is the way upstream compiler-rt has apparently hardcoded it, e.g. they use a define SHADOW_GRANULARITY for this, here:

https://github.com/llvm-mirror/compiler-rt/blob/master/lib/asan/asan_mapping.h#L193

   127  static const u64 kDefaultShadowScale = 3;
...
   145  #define SHADOW_SCALE kDefaultShadowScale
...
   193  #define SHADOW_GRANULARITY (1ULL << SHADOW_SCALE)

The same granularity scale values is found in the llvm instrumentation libraries, here:
https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/Instrumentation/AddressSanitizer.cpp#L70

    70  static const uint64_t kDefaultShadowScale = 3;

In the past I have attempted to lower this scale to 2, but ran into hundreds of test failures, so I abandoned that effort.  I think the assumption is working without issue for e.g. Linux, since they left the SysV ABI a long time ago there.  And apparently we have been lucky somehow, all this time; it has worked well since the first time we tried AddressSanitizer, up to 12.0 before jemalloc 5.0.0.
Comment 8 Ed Maste freebsd_committer freebsd_triage 2017-08-18 01:42:41 UTC
Some possibly useful references:

This (GetThreadStackAndTls) originally arrived in https://reviews.llvm.org/rL156542. FreeBSD support was added in https://reviews.llvm.org/rL203235.

Thread sanitizer algorithm description at https://github.com/google/sanitizers/wiki/ThreadSanitizerAlgorithm

I think the use of dtv[2] is indeed strange - AFAICT the intent is that it should be returning the address and size of the entire TLS block -- at least, the contiguous block of "initially available modules."

I suppose there are a few fairly straightforward changes/hacks that could be tried:

1. Change the FreeBSD GetTls in sanitizer_linux_libcdep.cc to set *addr = 0 and *size = 0. My guess is that this will just exclude all tls accesses from the sanitizer. If that works I'd be happy enough with it as an initial stopgap on FreeBSD/i386 (asan that does not detect errors with tls is better than asan that does not work at all).
2. Make TLS_TCB_ALIGN 8 for i386 in libc tls.c
3. Round addr down to be 8-byte aligned (increasing size as necessary)

All of this goop is to implement a function void GetThreadStackAndTls(bool main, uptr *stk_addr, uptr *stk_size, uptr *tls_addr, uptr *tls_size) to fetch the extent of the current thread's stack and tls regions; ideally we can replace this with a clean and consistent FreeBSD-specific interface.
Comment 9 Konstantin Belousov freebsd_committer freebsd_triage 2017-08-18 13:37:05 UTC
(In reply to Ed Maste from comment #8)
Note that libc tls.c is only used for static binaries (at least this is my current understanding).
Comment 10 Dimitry Andric freebsd_committer freebsd_triage 2017-10-22 21:57:30 UTC
Created attachment 187379 [details]
Ensure alignment of jemalloc's TSD is 8 bytes on 32-bit platforms

Here is a possible way of fixing this issue: it ensures that the big tsd_t struct in contrib/jemalloc/src/tsd.c is aligned to 8 bytes, on !LP64 platforms (i.e. i386, but it could also apply to arm or mips, if AddressSanitizer ever gets to work there).

Before r319971, on i386 the TLS items in libc.so.7 added up to 80 bytes of storage (a multiple of 8 bytes), assuming __je_tsd_initialized is aligned at 4 bytes:

$ readelf -sW /usr/obj/head-r319970/usr/src/lib/libc/libc.so.7|grep -w TLS
  1139: 00000058     4 TLS     GLOBAL DEFAULT   17 _ThreadRuneLocale@@FBSD_1.3
   547: 00000054     4 TLS     LOCAL  DEFAULT   17 __thread_locale
   607: 00000000    64 TLS     LOCAL  DEFAULT   16 __je_tsd_tls
   608: 0000004c     1 TLS     LOCAL  DEFAULT   17 __je_tsd_initialized
  3088: 00000058     4 TLS     GLOBAL DEFAULT   17 _ThreadRuneLocale

With r319971, this increased quite a lot, to 2404 bytes (which is not a multiple of 8 bytes anymore, unfortunately):

$ readelf -sW /usr/obj/head-r319971/usr/src/lib/libc/libc.so.7|grep -w TLS
  1139: 0000096c     4 TLS     GLOBAL DEFAULT   17 _ThreadRuneLocale@@FBSD_1.3
   441: 00000968     4 TLS     LOCAL  DEFAULT   17 __thread_locale
   500: 00000000  2388 TLS     LOCAL  DEFAULT   16 __je_tsd_tls
   502: 00000960     1 TLS     LOCAL  DEFAULT   17 __je_tsd_initialized
  2845: 0000096c     4 TLS     GLOBAL DEFAULT   17 _ThreadRuneLocale

All the growth is in the __je_tsd_tls struct, as you can see.  If this struct is forced to align to 8 bytes, the total amount of TLS data also becomes aligned to 8 bytes.

An alternative would be to unconditionally align the struct at, say, 16 bytes, and get rid of the #ifdef.

Yet another alternative would be to place some bogus padding 4 byte entity somewhere else in libc.so to ensure the TLS data is a multiple of 8 bytes.  Suggestions as to where are welcome. :)
Comment 11 Konstantin Belousov freebsd_committer freebsd_triage 2017-10-23 08:36:42 UTC
(In reply to Dimitry Andric from comment #10)
I somewhat prefer the unconditional alignment to 16 bytes, without check for !LP64.
Comment 12 Dimitry Andric freebsd_committer freebsd_triage 2017-10-23 16:59:03 UTC
Created attachment 187398 [details]
Ensure jemalloc's TSD is aligned to 16 bytes

Okay, unconditionally aligning to 16 bytes looks a bit cleaner, and should not matter for any other arch.  I will commit this tomorrow, if there are no objections.
Comment 13 Ed Maste freebsd_committer freebsd_triage 2017-10-23 17:43:05 UTC
The patch in attachment 187398 [details] seems fine to me.
Comment 14 commit-hook freebsd_committer freebsd_triage 2017-10-23 21:31:47 UTC
A commit references this bug:

Author: dim
Date: Mon Oct 23 21:31:04 UTC 2017
New revision: 324938
URL: https://svnweb.freebsd.org/changeset/base/324938

Log:
  After jemalloc was updated to version 5.0.0 in r319971, i386 executables
  linked with AddressSanitizer (even those linked on earlier versions of
  FreeBSD, or with external versions of clang) started failing with errors
  similar to:

    ==14688==AddressSanitizer CHECK failed:
    /usr/src/contrib/compiler-rt/lib/asan/asan_poisoning.cc:36
    "((AddrIsAlignedByGranularity(addr))) != (0)" (0x0, 0x0)

  This is because AddressSanitizer expects all the TLS data in the program
  to be aligned to at least 8 bytes.

  Before the jemalloc 5.0.0 update, all the TLS data in the i386 version
  of libc.so added up to 80 bytes (a multiple of 8), but 5.0.0 made this
  grow to 2404 bytes (not a multiple of 8).  This is due to added caching
  data in jemalloc's internal struct tsd_s.

  To fix AddressSanitizer, ensure this struct is aligned to at least 16
  bytes, which can be done unconditionally for all architectures.  (An
  earlier version of the fix aligned the struct to 8 bytes, but only for
  ILP32 architectures.  This was deemed unnecessarily complicated.)

  PR:		221337
  X-MFC-With:	r319971

Changes:
  head/contrib/jemalloc/include/jemalloc/internal/tsd.h