Bug 227552 - w, uptime i386 coredump in libxo
Summary: w, uptime i386 coredump in libxo
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: 11.1-STABLE
Hardware: i386 Any
Importance: --- Affects Many People
Assignee: Phil Shafer
URL:
Keywords: patch
Depends on:
Blocks:
 
Reported: 2018-04-16 13:30 UTC by jachmann@unitix.org
Modified: 2018-05-19 21:16 UTC
CC List: 13 users

See Also:


Attachments
Fix from Kai Wang (kaiw@) for elfcopy (1.51 KB, patch)
2018-05-13 05:38 UTC, Phil Shafer
Fix from Kai Wang (kaiw@) for readelf (1.10 KB, patch)
2018-05-13 05:39 UTC, Phil Shafer

Description jachmann@unitix.org 2018-04-16 13:30:06 UTC

    
Comment 1 jachmann@unitix.org 2018-04-16 13:35:46 UTC
w and uptime are broken on i386 

gdb trace:
(gdb) r
Starting program: /usr/obj/usr/src/usr.bin/w/w.full 

Program received signal SIGSEGV, Segmentation fault.
ifree (tsd=0x28000000) at arena.h:799
799             return (*mapbitsp);
Current language:  auto; currently minimal
(gdb) where
#0  ifree (tsd=0x28000000) at arena.h:799
#1  0x28155316 in __free (ptr=0x280601ef) at tsd.h:716
#2  0x28095b07 in xo_do_emit_fields ()
    at /usr/src/contrib/libxo/libxo/libxo.c:6419
#3  0x28093a1c in xo_do_emit (xop=<value optimized out>, 
    flags=<value optimized out>, fmt=0x804ad4d "{:time-of-day/%s} ")
    at /usr/src/contrib/libxo/libxo/libxo.c:6470
#4  0x28093b61 in xo_emit (fmt=0x804ad4d "{:time-of-day/%s} ")
    at /usr/src/contrib/libxo/libxo/libxo.c:6541
#5  0x08049f50 in main (argc=<value optimized out>, argv=<value optimized out>)
    at /usr/src/usr.bin/w/w.c:475
(gdb)
Comment 2 jachmann@unitix.org 2018-04-19 15:35:43 UTC
As far as I can see, it only dumps core on my i386 machine, not on my amd64 one.

Last Changed Rev: 331722
Last Changed Date: 2018-03-29 04:50:57 +0200 (Thu, 29 Mar 2018)
Comment 3 jachmann@unitix.org 2018-04-23 09:03:00 UTC
I just tried an 11.2-PRERELEASE memstick image:

https://download.freebsd.org/ftp/snapshots/i386/i386/ISO-IMAGES/11.2/FreeBSD-11.2-PRERELEASE-i386-20180420-r332802-memstick.img.xz

w and uptime are broken there as well.

Just FYI.
Comment 4 Ian Pallfreeman 2018-04-23 14:44:23 UTC
Nothing useful to add to diagnosis, just "bump" and "me too".
Comment 6 jachmann@unitix.org 2018-04-24 18:23:29 UTC
It's still there.

Look:

Starting program: /usr/obj/usr/src/usr.bin/w/w.full 

Program received signal SIGSEGV, Segmentation fault.
ifree (tsd=0x28000000) at arena.h:799
799             return (*mapbitsp);
Current language:  auto; currently minimal
(gdb) bt
#0  ifree (tsd=0x28000000) at arena.h:799
#1  0x28155506 in __free (ptr=0x280601ef) at tsd.h:716
#2  0x28095b07 in xo_do_emit_fields()
    at /usr/src/contrib/libxo/libxo/libxo.c:6419
#3  0x28093a1c in xo_do_emit (xop=<value optimized out>, 
    flags=<value optimized out>, fmt=0x804ad4d "{:time-of-day/%s} ")
    at /usr/src/contrib/libxo/libxo/libxo.c:6470
#4  0x28093b61 in xo_emit (fmt=0x804ad4d "{:time-of-day/%s} ")
    at /usr/src/contrib/libxo/libxo/libxo.c:6541
#5  0x08049f50 in main (argc=<value optimized out>, argv=<value optimized out>)
    at /usr/src/usr.bin/w/w.c:475


And xo_do_emit_fields is from libxo, isn't it?

And w.c has become quite complex in the meantime, not only because of libxo.

So what now?
Comment 7 Oleksandr Tymoshenko freebsd_committer freebsd_triage 2018-04-24 18:51:03 UTC
Confirmed. Looks like the problem is in libxo. Either xo_default_handle is not properly zero-initialized or memory is being corrupted.
Comment 8 Oleksandr Tymoshenko freebsd_committer freebsd_triage 2018-04-24 18:53:15 UTC
(In reply to Oleksandr Tymoshenko from comment #7)

To clarify: probably not in libxo itself, since there were no recent changes there, but the bug manifests itself in libxo.
Comment 9 Oleksandr Tymoshenko freebsd_committer freebsd_triage 2018-04-24 20:46:33 UTC
xo_default_handle is passed as an argument to xo_init_handle, so I added a breakpoint and checked its contents. Since it's static, it's supposed to be zero-initialized, but instead it holds a lot of garbage values. xo_default_handle is a thread-local variable, so that might be a contributing factor.

root@freebsd:/home/gonzo # /usr/local/bin/gdb w
GNU gdb (GDB) 8.1 [GDB v8.1 for FreeBSD]
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i386-portbld-freebsd11.1".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from w...Reading symbols from /usr/lib/debug//usr/bin/w.debug...done.
done.
(gdb) break xo_init_handle
Function "xo_init_handle" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (xo_init_handle) pending.
(gdb) run
Starting program: /usr/bin/w

Breakpoint 1, xo_init_handle (xop=0x2806aff0) at /usr/src/contrib/libxo/libxo/libxo.c:640
640	    xop->xo_opaque = stdout;
(gdb) p *xop
$1 = {xo_flags = 0, xo_iflags = 2883994737386192896, xo_style = 45072, xo_indent = 10246, xo_indent_by = 41472, xo_write = 0x1,
  xo_close = 0x280601ef <__pthread_cleanup_push_imp_int+31>, xo_flush = 0x2806b020, xo_formatter = 0x2806a400, xo_checkpointer = 0x5d,
  xo_opaque = 0x280601ef <__pthread_cleanup_push_imp_int+31>, xo_data = {xb_bufp = 0x2806b030 "@\260\006(", xb_curp = 0x2806a600 "z\270P\325\001", xb_size = 161},
  xo_fmt = {xb_bufp = 0x280601ef <__pthread_cleanup_push_imp_int+31> "\377\220\344\001", xb_curp = 0x2806b040 "", xb_size = 671524864}, xo_attrs = {
    xb_bufp = 0x147 <error: Cannot access memory at address 0x147>, xb_curp = 0x280601ef <__pthread_cleanup_push_imp_int+31> "\377\220\344\001", xb_size = 0},
  xo_predicate = {xb_bufp = 0x2806aa00 "z\270P\325\001", xb_curp = 0x164 <error: Cannot access memory at address 0x164>, xb_size = 671482351},
  xo_stack = 0x2806c000, xo_depth = 0, xo_stack_size = 671506032, xo_info = 0x280601ef <__pthread_cleanup_push_imp_int+31>, xo_info_count = 671527024,
  xo_vap = 0x2806ac00 "z\270P\325\001", xo_leading_xpath = 0x421 <error: Cannot access memory at address 0x421>, xo_mbstate = {
    __mbstate8 = "\357\001\006(\000\000\000\000\000\252\006(-\004\000\000\357\001\006(\000\000\000\000\000\252\006(\377\001\000\000\357\001\006(\240\260\006(\000\250\006(f\t\000\000\357\001\006(\000\000\000\000\000\252\006(s\t\000\000\357\001\006(\000\000\000\000\000\252\006(\030\n\000\000\357\001\006(\000\000\000\000\000\252\006(q\005\000\000\357\001\006(\340\260\006(\000\240\006(\000\000\000\000\357\001\006(\360\260\006(\000\242\006(\000\000\000", _mbstateL = 671482351},
  xo_anchor_offset = 671482351, xo_anchor_columns = 671527168, xo_anchor_min_width = 671523840, xo_units_offset = 0, xo_columns = 671482351,
  xo_color_map_fg = "\020\261\006(\000\246\006(", xo_color_map_bg = "\000\000\000\357\001\006( \261", xo_colors = {xoc_effects = 6 '\006', xoc_col_fg = 40 '(',
    xoc_col_bg = 0 '\000'}, xo_color_buf = {xb_bufp = 0x0, xb_curp = 0x280601ef <__pthread_cleanup_push_imp_int+31> "\377\220\344\001", xb_size = 671527216},
  xo_version = 0x2806aa00 "z\270P\325\001", xo_errno = 0, xo_gt_domain = 0x280601ef <__pthread_cleanup_push_imp_int+31> "\377\220\344\001", xo_encoder = 0x0,
  xo_private = 0x2806ac00}
(gdb)
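
For illustration, the expectation behind the observation above (a static thread-local object with no initializer must start out as all zero bytes) can be written as a small check. This is only a sketch: struct demo_handle and the demo_* functions are hypothetical stand-ins, not libxo code, and __thread is the GCC/Clang extension that libxo's THREAD_LOCAL macro expands to.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for xo_handle_t; the real structure is much larger. */
struct demo_handle {
    unsigned long flags;
    void *opaque;
    char buf[64];
};

/* No initializer: C requires the initial contents to be all zero. */
static __thread struct demo_handle demo_default_handle;

/* Return 1 if every byte of the calling thread's copy is zero, else 0. */
int
demo_handle_is_zeroed(void)
{
    static const struct demo_handle zero = { 0 };  /* all-zero reference */

    return memcmp(&demo_default_handle, &zero, sizeof(zero)) == 0;
}

/* Fail loudly, as an init path could, if the loader handed us garbage. */
void
demo_handle_require_zeroed(void)
{
    assert(demo_handle_is_zeroed());
}
```

Calling demo_handle_is_zeroed() before anything writes to the variable should return 1 on a healthy system; the gdb dump above corresponds to the case where such a check would fail.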
Comment 10 Dimitry Andric freebsd_committer 2018-04-26 16:39:51 UTC
(In reply to Oleksandr Tymoshenko from comment #9)
> xo_default_handle is passed as an argument to xo_init_handle so I added a
> breakpoint and checked its content. Since it's static it's supposed to be
> zero-initialized but instead there are a lot of garbage values.
> xo_default_handle is thread-local variable so it might be a contributing
> factor.
...
> (gdb) p *xop
> $1 = {xo_flags = 0, xo_iflags = 2883994737386192896, xo_style = 45072,
> xo_indent = 10246, xo_indent_by = 41472, xo_write = 0x1,
>   xo_close = 0x280601ef <__pthread_cleanup_push_imp_int+31>, xo_flush =

It definitely seems to have something to do with TLS.  The libxo.so.0 file shipped in the FreeBSD-11.2-PRERELEASE-i386-20180420-r332802 snapshot has:

Program Header:
    LOAD off    0x00000000 vaddr 0x00000000 paddr 0x00000000 align 2**12
         filesz 0x00017160 memsz 0x00017160 flags r-x
    LOAD off    0x00017160 vaddr 0x00018160 paddr 0x00018160 align 2**12
         filesz 0x00000604 memsz 0x00000654 flags rw-
 DYNAMIC off    0x00017264 vaddr 0x00018264 paddr 0x00018264 align 2**2
         filesz 0x000000d8 memsz 0x000000d8 flags rw-
     TLS off    0x00017160 vaddr 0x00018764 paddr 0x00018764 align 2**3
         filesz 0x00000000 memsz 0x00000050 flags r--
   STACK off    0x00000000 vaddr 0x00000000 paddr 0x00000000 align 2**2
         filesz 0x00000000 memsz 0x00000000 flags rw-

but if I install this snapshot onto a machine, check out stable/11 r332802 and rebuild lib/libxo, the resulting libxo.so.0 has:

Program Header:
    LOAD off    0x00000000 vaddr 0x00000000 paddr 0x00000000 align 2**12
         filesz 0x00017160 memsz 0x00017160 flags r-x
    LOAD off    0x00017160 vaddr 0x00018160 paddr 0x00018160 align 2**12
         filesz 0x00000604 memsz 0x00000654 flags rw-
 DYNAMIC off    0x00017264 vaddr 0x00018264 paddr 0x00018264 align 2**2
         filesz 0x000000d8 memsz 0x000000d8 flags rw-
     TLS off    0x00017160 vaddr 0x00018160 paddr 0x00018160 align 2**3
         filesz 0x00000000 memsz 0x00000658 flags r--
   STACK off    0x00000000 vaddr 0x00000000 paddr 0x00000000 align 2**2
         filesz 0x00000000 memsz 0x00000000 flags rw-

I.e., the shipped version has a TLS segment of just 0x50 bytes (at vaddr 0x18764), while the recompiled version has 0x658 bytes (at vaddr 0x18160).  The recompiled version also works just fine, with every test I throw at it.

I don't know how the shipped versions are built, but I suspect there is something off there.
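
The two sets of program headers above can be cross-checked with a little arithmetic. The sketch below only restates the numbers from the listings; struct demo_phdr and the demo_* names are made up for the example. It shows that the shipped TLS header starts exactly where the file-backed part of the writable LOAD segment ends (0x18160 + 0x604 = 0x18764) and stops where that segment ends (0x187b4), so it covers only the 0x50-byte zero-fill tail. What that implies about the root cause is not established here.

```c
#include <assert.h>
#include <stdint.h>

/* Only the program-header fields needed for the comparison. */
struct demo_phdr {
    uint32_t vaddr;
    uint32_t filesz;
    uint32_t memsz;
};

/* Writable LOAD segment; identical in the shipped and rebuilt libraries. */
const struct demo_phdr demo_load_rw     = { 0x00018160, 0x604, 0x654 };
/* TLS header as shipped in the snapshot, and as produced by a rebuild. */
const struct demo_phdr demo_tls_shipped = { 0x00018764, 0x000, 0x050 };
const struct demo_phdr demo_tls_rebuilt = { 0x00018160, 0x000, 0x658 };

/* First address past the file-backed part of a segment. */
uint32_t
demo_file_end(const struct demo_phdr *ph)
{
    assert(ph->filesz <= ph->memsz);
    return ph->vaddr + ph->filesz;
}

/* First address past the in-memory image of a segment. */
uint32_t
demo_mem_end(const struct demo_phdr *ph)
{
    return ph->vaddr + ph->memsz;
}
```

With these values, demo_file_end(&demo_load_rw) equals the shipped TLS vaddr, and demo_mem_end() gives the same result for the LOAD segment and the shipped TLS header, while the rebuilt TLS header starts at the beginning of the LOAD segment.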
Comment 11 Dimitry Andric freebsd_committer 2018-04-28 12:40:20 UTC
I'm now not so sure anymore about the TLS section being a problem. On a stable/11 i386 box with r332318 (as of 2018-04-09), I do *not* see crashes in w or uptime, even though the TLS section appears to be 0x50 bytes:

$ ldd /usr/bin/uptime
/usr/bin/uptime:
	libkvm.so.7 => /lib/libkvm.so.7 (0x28070000)
	libsbuf.so.6 => /lib/libsbuf.so.6 (0x2807d000)
	libxo.so.0 => /lib/libxo.so.0 (0x28080000)
	libutil.so.9 => /lib/libutil.so.9 (0x28099000)
	libc.so.7 => /lib/libc.so.7 (0x280ab000)
	libelf.so.2 => /lib/libelf.so.2 (0x2820a000)

$ uptime
 2:36PM  up 21 mins, 1 user, load averages: 0.32, 0.26, 0.23

$ readelf -l /lib/libxo.so.0 | grep 'Type\|TLS'
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  TLS            0x017160 0x00018764 0x00018764 0x00000 0x00050 R   0x8

$ readelf -l /usr/obj/usr/src/lib/libxo/libxo.so.0.full | grep 'Type\|TLS'
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  TLS            0x017160 0x00018160 0x00018160 0x00000 0x00658 R   0x8

$ readelf -l /usr/obj/usr/src/lib/libxo/libxo.so.0 | grep 'Type\|TLS'
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  TLS            0x017160 0x00018160 0x00018160 0x00000 0x00654 R   0x8

So, libxo.so.0.full, the actual output of the link stage, has a TLS MemSize of 0x658 bytes, libxo.so.0, which is produced by:

objcopy --strip-debug --add-gnu-debuglink=libxo.so.0.debug  libxo.so.0.full libxo.so.0

has a TLS MemSize of 0x654 bytes, and the final version installed by installworld, and stripped during that time, has a TLS MemSize of 0x50 bytes.

However, at this revision, r332318, it does not crash.
Comment 12 Dimitry Andric freebsd_committer 2018-04-30 12:07:35 UTC
I bisected, and it turns out r331838 (the merge of clang 6.0.0 and follow-up fixes) is the first revision with those segfaults:

# ulimit -c 0; for i in /jail/test-r*; do echo "Using jail: $i"; chroot $i /usr/bin/w; done
Using jail: /jail/test-r331837
12:00PM  up 13:47, 0 users, load averages: 0.23, 0.24, 0.60
USER       TTY      FROM                                      LOGIN@  IDLE WHAT
Using jail: /jail/test-r331838
Segmentation fault

Since all of the jail in r331837 has been compiled with clang 5.0.1, and all of r331838 with clang 6.0.0, it is hard to say what is the exact cause.

Interestingly, moving around the libraries used by w seems to influence the crash, at least for me.  So for example:

$ ldd /usr/bin/w
/usr/bin/w:
        libkvm.so.7 => /lib/libkvm.so.7 (0x28070000)
        libsbuf.so.6 => /lib/libsbuf.so.6 (0x2807d000)
        libxo.so.0 => /lib/libxo.so.0 (0x28080000)
        libutil.so.9 => /lib/libutil.so.9 (0x28099000)
        libc.so.7 => /lib/libc.so.7 (0x280ab000)
        libelf.so.2 => /lib/libelf.so.2 (0x2820a000)

$ /usr/bin/w
 2:05PM  up 13:53, 2 users, load averages: 2.31, 0.76, 0.66
USER       TTY      FROM                                      LOGIN@  IDLE WHAT
dim        pts/2    coleburn.home.andric.com                  2:02PM     - w

$ mkdir ~/foo

$ cp /lib/libkvm.so.7 /lib/libsbuf.so.6 /lib/libxo.so.0 /lib/libutil.so.9 /lib/libc.so.7 /lib/libelf.so.2 ~/foo

$ LD_LIBRARY_PATH=~/foo /usr/bin/w
Segmentation fault (core dumped)

Meaning, the exact same .so files, but in a different path, crash!  Currently, I'm thinking that this may be something in the dynamic linker, but I'm still not sure.
Comment 13 Tijl Coosemans freebsd_committer 2018-04-30 13:32:52 UTC
Note that libc has a 16 byte aligned TLS section because of JEMALLOC_ALIGNED(16) in contrib/jemalloc/include/jemalloc/internal/tsd.h while the size of the TLS section is not a multiple of 16.  I reported a problem with this when that was added.  I suspect that rtld doesn't allocate enough extra bytes if it needs to realign the section causing overlap between sections, but I never investigated that and simply made the jemalloc struct 8 byte aligned.
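
To make the alignment concern concrete: a block whose size is a multiple of 8 but not of 16 needs extra padding before a 16-byte-aligned block can follow it. The helper below is a generic round-up sketch, not rtld's actual allocation code; the demo_* names are hypothetical, and the value 0x658 is borrowed from libxo's .tbss size purely as an example of such a size. Whether rtld mishandles this case is unverified.

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to the next multiple of align (align must be a power of 2). */
size_t
demo_round_up(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}

/* Padding bytes needed after a block of the given size. */
size_t
demo_padding(size_t size, size_t align)
{
    return demo_round_up(size, align) - size;
}
```

For example, demo_padding(0x658, 8) is 0, but demo_padding(0x658, 16) is 8: an allocator that reserved only the unpadded size would let the next 16-byte-aligned block overlap or be misplaced.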
Comment 14 Dimitry Andric freebsd_committer 2018-04-30 21:42:02 UTC
(In reply to Tijl Coosemans from comment #13)
> Note that libc has a 16 byte aligned TLS section because of
> JEMALLOC_ALIGNED(16) in contrib/jemalloc/include/jemalloc/internal/tsd.h
> while the size of the TLS section is not a multiple of 16.  I reported a
> problem with this when that was added.  I suspect that rtld doesn't allocate
> enough extra bytes if it needs to realign the section causing overlap
> between sections, but I never investigated that and simply made the jemalloc
> struct 8 byte aligned.

Were there any updates to rtld in head for this alignment stuff, that you recall?
Comment 15 Tijl Coosemans freebsd_committer 2018-05-01 12:05:47 UTC
(In reply to Dimitry Andric from comment #14)
Not that I recall, but I just tried to reproduce the problem I had back then and everything seems fine now, so it's possible that it was fixed.
Comment 16 Paul Boehmer 2018-05-07 15:23:21 UTC
Not sure if this sheds light on the bug, but I tracked it down to revision r331838.  With r331837, uptime and w both work fine.  Something in the Clang/LLVM update?
Comment 17 Paul Boehmer 2018-05-07 15:28:17 UTC
(In reply to Paul Boehmer from comment #16)
Derp, didn't notice comment 12.  Apologies for the noise.
Comment 18 Phil Shafer freebsd_committer 2018-05-08 01:31:42 UTC
Adding my recent email to freebsd-arch@:


From:     Phil Shafer <phil@juniper.net>
To:       <freebsd-arch@freebsd.org>
Subject:  initialization problem w/ thread-specific .tbss data on i386
Date:     Mon, 07 May 2018 17:27:03 -0400

I have a problem reported with libxo-based applications running
under FreeBSD-11-stable on i386 boxes that I think is related
to rtld:

When I breakpoint on main() and dump the contents of my uninitialized
thread-specific variable, it has not been initialized to zeroes.

I don't see this problem on 64-bit systems, only on i386 ones.

When I look at the rtld code, it appears to memset the .tbss to
zero (/usr/src/libexec/rtld-elf/rtld.c:allocate_tls) in the
non-arch-specific code so the arch shouldn't matter, but something
is not working right.

So I'm looking for a helpful clue, such as how to debug rtld to see
why this isn't being zeroed.  I thought I'd use:

    gdb /libexec/ld-elf.so.1
    run /usr/bin/uptime

but this doesn't work for me (SEGV with a callstack that doesn't
make sense).

For this instance, the workaround is to initialize the contents
of xo_default_handle to zero so it's not in the .tbss, but I'd like
to understand what's failing.  In truth, I just have a hard time
blaming rtld, even though this issue is an obscure intersection
of weird things (.tbss on i386).  Perhaps it's something wrong with
how the library is built or similar.  But given that it's not zeroed
when main() gets control, something's clearly broken.

Details follow:

I declare my variable as:

    #define THREAD_LOCAL(_x) __thread _x
    ...
    static THREAD_LOCAL(xo_handle_t) xo_default_handle;

To help debug this issue, I made the following change to the sources
to help with gdb's inability to show thread-local variables ("Cannot
find thread-local variables on this target"):

    --- contrib/libxo/libxo/libxo.c.save    2018-05-04 17:26:29.079500000 -0400
    +++ contrib/libxo/libxo/libxo.c 2018-05-04 17:28:06.570875000 -0400
    @@ -8349,3 +8349,11 @@
         xop->xo_style = XO_STYLE_ENCODER;
         xop->xo_encoder = encoder;
     }
    +
    +void xo_print_handle (void);
    +void
    +xo_print_handle (void)
    +{
    +    fprintf(stderr, "xo_default_handle: %p %zu\n",
    +            (void *)&xo_default_handle, sizeof(xo_handle_t));
    +}

When I run the failing command (uptime) under gdb and breakpoint
on main, my thread-local variable is not set to zeroes:

    % gdb uptime
    GNU gdb 6.1.1 [FreeBSD]
    ...
    This GDB was configured as "i386-marcel-freebsd"...
    (gdb) b main
    Breakpoint 1 at 0x8049be5: file /usr/src/usr.bin/w/w.c, line 145.
    (gdb) run
    Starting program: /usr/home/phil/work/lib/uptime

    Breakpoint 1, main (argc=1, argv=0xbfbfe60c) at /usr/src/usr.bin/w/w.c:145
    145             (void)setlocale(LC_ALL, "");
    Current language:  auto; currently minimal
    (gdb) call xo_print_handle()
    xo_default_handle: 0x2806aff0 328
    $1 = 34
    (gdb) x/82x 0x2806aff0
    0x2806aff0:     0x00000000      0x00000000      0x00000000      0x280601ef
    0x2806b000:     0x2806b010      0x2806a200      0x00000001      0x280601ef
    0x2806b010:     0x2806b020      0x2806a400      0x0000005d      0x280601ef
    0x2806b020:     0x2806b030      0x2806a600      0x000000a1      0x280601ef
    0x2806b030:     0x2806b040      0x2806a800      0x00000147      0x280601ef
    0x2806b040:     0x00000000      0x2806aa00      0x00000164      0x280601ef
    0x2806b050:     0x2806c000      0x00000000      0x28065e70      0x280601ef
    0x2806b060:     0x2806b070      0x2806ac00      0x00000421      0x280601ef
    0x2806b070:     0x00000000      0x2806aa00      0x0000042d      0x280601ef
    0x2806b080:     0x00000000      0x2806aa00      0x000001ff      0x280601ef
    0x2806b090:     0x2806b0a0      0x2806a800      0x00000976      0x280601ef
    0x2806b0a0:     0x00000000      0x2806aa00      0x00000983      0x280601ef
    0x2806b0b0:     0x00000000      0x2806aa00      0x00000a18      0x280601ef
    0x2806b0c0:     0x00000000      0x2806aa00      0x00000571      0x280601ef
    0x2806b0d0:     0x2806b0e0      0x2806a000      0x00000000      0x280601ef
    0x2806b0e0:     0x2806b0f0      0x2806a200      0x00000000      0x280601ef
    0x2806b0f0:     0x2806b100      0x2806a400      0x00000000      0x280601ef
    0x2806b100:     0x2806b110      0x2806a600      0x00000000      0x280601ef
    0x2806b110:     0x2806b120      0x2806a800      0x00000000      0x280601ef
    0x2806b120:     0x2806b130      0x2806aa00      0x00000000      0x280601ef
    0x2806b130:     0x00000000      0x2806ac00
    (gdb)

objdump shows the lib does have a .tbss:

     14 .tbss         00000658  000181f8  000181f8  000171f8  2**3
                      ALLOC, THREAD_LOCAL

Thanks,
 Phil
Comment 19 Phil Shafer freebsd_committer 2018-05-08 01:46:04 UTC
The work around is:

@@ -1376,8 +1380,8 @@
     xo_retain_entry_t *xr_bucket[RETAIN_HASH_SIZE];
 } xo_retain_t;

-static THREAD_LOCAL(xo_retain_t) xo_retain;
-static THREAD_LOCAL(unsigned) xo_retain_count;
+static THREAD_LOCAL(xo_retain_t) xo_retain = { 0 };
+static THREAD_LOCAL(unsigned) xo_retain_count = 0;

 /*
  * Simple hash function based on Thomas Wang's paper.  The original is

Thanks,
 Phil
Comment 20 Dimitry Andric freebsd_committer 2018-05-08 19:09:44 UTC
Hmm, now that we've identified .tbss as a contributor to the problem, it looks relevant that the r331838 version of libxo.so.0 (compiled with the clang 6.0.0 update) does NOT have a "section to segment mapping" for .tbss:

======================================================================
File: libxo.so.0.r331837
[...]
Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x000000 0x00000000 0x00000000 0x17224 0x17224 R E 0x1000
  LOAD           0x017228 0x00018228 0x00018228 0x0074d 0x007a0 RW  0x1000
  DYNAMIC        0x017324 0x00018324 0x00018324 0x000d8 0x000d8 RW  0x4
  TLS            0x017228 0x00018228 0x00018228 0x00000 0x00658 R   0x8
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x4

 Section to Segment mapping:
  Segment Sections...
   00     .hash .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .text .fini .rodata .eh_frame .comment .gnu_debuglink .shstrtab
   01     .tbss .ctors .dtors .jcr .data.rel.ro .dynamic .got .got.plt .data .bss
   02     .dynamic
   03     .tbss .ctors .dtors .jcr .data.rel.ro .dynamic .got .got.plt
   04
There are 28 section headers, starting at offset 0x17b4c:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
[...]
  [15] .tbss             NOBITS          00018228 017228 000658 00 WAT  0   0  8

======================================================================
File: /jail/test-r331838/lib/libxo.so.0.r331838
[...]
Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x000000 0x00000000 0x00000000 0x17160 0x17160 R E 0x1000
  LOAD           0x017160 0x00018160 0x00018160 0x00604 0x00654 RW  0x1000
  DYNAMIC        0x017264 0x00018264 0x00018264 0x000d8 0x000d8 RW  0x4
  TLS            0x017160 0x00018764 0x00018764 0x00000 0x00050 R   0x8
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x4

 Section to Segment mapping:
  Segment Sections...
   00     .hash .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .text .fini .rodata .eh_frame .comment .gnu_debuglink .shstrtab
   01     .ctors .dtors .jcr .data.rel.ro .dynamic .got .got.plt .data .bss
   02     .dynamic
   03     .bss
   04
There are 28 section headers, starting at offset 0x1793c:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
[...]
  [15] .tbss             NOBITS          00018160 017160 000658 00 WAT  0   0  8


In both r331837 and r331838 worlds, /usr/bin/ld is the GNU BFD ld, so it can't be caused by lld being updated from 5.0 to 6.0.  It must be something in an object file that is being linked into libxo.so.0.
Comment 21 Dimitry Andric freebsd_committer 2018-05-08 19:33:09 UTC
(In reply to Dimitry Andric from comment #20)
> it
> looks relevant that the r331838 version of libxo.so.0 (compiled with the
> clang 6.0.0 update) does NOT have a "section to segment mapping" for .tbss

Interestingly, with the non-stripped versions of libxo.so, this is not the case:

======================================================================
File: libxo.so.0.full.r331837
[...]
 Section to Segment mapping:
  Segment Sections...
   00     .hash .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .text .fini .rodata .eh_frame .comment .debug_pubnames .debug_info .debug_abbrev .debug_line .debug_frame .debug_str .debug_loc .debug_macinfo .debug_pubtypes .debug_ranges .shstrtab .symtab .strtab
   01     .tbss .ctors .dtors .jcr .data.rel.ro .dynamic .got .got.plt .data .bss
   02     .dynamic
   03     .tbss .ctors .dtors .jcr .data.rel.ro .dynamic .got .got.plt
[...]
Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
[...]
  [15] .tbss             NOBITS          00018228 017228 000658 00 WAT  0   0  8

======================================================================
File: libxo.so.0.full.r331838
[...]
 Section to Segment mapping:
  Segment Sections...
   00     .hash .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .text .fini .rodata .eh_frame .comment .debug_pubnames .debug_info .debug_abbrev .debug_line .debug_frame .debug_str .debug_loc .debug_macinfo .debug_pubtypes .debug_ranges .shstrtab .symtab .strtab
   01     .ctors .dtors .jcr .data.rel.ro .dynamic .got .got.plt .data .bss
   02     .dynamic
   03     .tbss .ctors .dtors .jcr .data.rel.ro .dynamic .got .got.plt .data .bss
[...]
Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
[...]
  [15] .tbss             NOBITS          00018160 017160 000658 00 WAT  0   0  8

So in case of r331837, segments 01 *and* 03 have a .tbss mapping, but in case of r331838, only segment 03 has it.  And after stripping, the r331838 version even misses the .tbss mappings completely:

======================================================================
File: libxo.so.0.r331838
[...]
 Section to Segment mapping:
  Segment Sections...
   00     .hash .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .text .fini .rodata .eh_frame .comment .gnu_debuglink .shstrtab .symtab .strtab
   01     .ctors .dtors .jcr .data.rel.ro .dynamic .got .got.plt .data .bss
   02     .dynamic
   03     .ctors .dtors .jcr .data.rel.ro .dynamic .got .got.plt .data .bss

It seems elftoolchain strip completely eradicates the mapping, for some reason?
Comment 22 Phil Shafer freebsd_committer 2018-05-08 20:38:52 UTC
I can confirm that "uptime" functions correctly with the non-stripped libxo:

% ln -s /usr/obj/usr/src/usr.bin/w/w.full /tmp/uptime
% /tmp/uptime
Segmentation fault (core dumped)
% env LD_LIBRARY_PATH=/usr/obj/usr/src/lib/libxo/ ldd /tmp/uptime
/tmp/uptime:
        libkvm.so.7 => /lib/libkvm.so.7 (0x28070000)
        libsbuf.so.6 => /lib/libsbuf.so.6 (0x2807d000)
        libxo.so.0 => /usr/obj/usr/src/lib/libxo//libxo.so.0 (0x28080000)
        libutil.so.9 => /lib/libutil.so.9 (0x28099000)
        libc.so.7 => /lib/libc.so.7 (0x280ab000)
        libelf.so.2 => /lib/libelf.so.2 (0x2820a000)
% env LD_LIBRARY_PATH=/usr/obj/usr/src/lib/libxo/ /tmp/uptime
 4:37PM  up 4 days,  8:32, 3 users, load averages: 0.69, 0.65, 0.54

Thanks,
 Phil
Comment 23 Phil Shafer freebsd_committer 2018-05-08 21:43:18 UTC
Another interesting data point, not that I'm sure what it means:

    % env LD_PRELOAD=/usr/lib/libpthread.so /tmp/uptime
     5:26PM  up 4 days,  9:22, 3 users, load averages: 0.55, 0.52, 0.51

(where /tmp/uptime is a symlink to /usr/obj/.../w/w).
(see https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=227552)

Does this mean that the use of __thread requires -lpthread?  My
understanding was that the startup code handled thread-specific
data for the main thread of execution.

Thanks,
 Phil
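
As a point of reference for the question above, here is a minimal sketch of a __thread variable used from the main thread only, with no pthread calls at all. It assumes, as the comment does, that the run-time linker or C startup code sets up the main thread's TLS block; the demo_* names are hypothetical and nothing here is taken from libxo.

```c
#include <assert.h>

/* Hypothetical per-thread call counter; starts at zero in every thread. */
static __thread unsigned demo_calls;

/* Increment the calling thread's counter and return the new value. */
unsigned
demo_bump(void)
{
    unsigned next = demo_calls + 1;

    assert(next != 0);  /* guard against wrap-around in this demo */
    demo_calls = next;
    return demo_calls;
}
```

If the main thread's TLS is set up and zeroed correctly, the first call returns 1 and the second returns 2, whether or not libpthread is loaded.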
Comment 24 Phil Shafer freebsd_committer 2018-05-11 17:22:08 UTC
I'm looking into why readelf output differs between the stripped and unstripped versions of the library, per comment #20.  readelf.c:2381 has the following code:

2371            printf("\n Section to Segment mapping:\n");
2372            printf("  Segment Sections...\n");
2373            for (i = 0; (size_t)i < phnum; i++) {
2374                    if (gelf_getphdr(re->elf, i, &phdr) != &phdr) {
2375                            warnx("gelf_getphdr failed: %s", elf_errmsg(-1));
2376                            continue;
2377                    }
2378                    printf("   %2.2d     ", i);
2379                    /* skip NULL section. */
2380                    for (j = 1; (size_t)j < re->shnum; j++)
2381                            if (re->sl[j].addr >= phdr.p_vaddr &&
2382                                re->sl[j].addr + re->sl[j].sz <=
2383                                phdr.p_vaddr + phdr.p_memsz)
2384                                    printf("%s ", re->sl[j].name);
2385                    printf("\n");

For the unstripped library, the output is:

 Section to Segment mapping:
  Segment Sections...
   00     .hash .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .text .fini .rodata .eh_frame .comment .debug_pubnames .debug_info .debug_abbrev .debug_line .debug_frame .debug_str .debug_loc .debug_macinfo .debug_pubtypes .debug_ranges .shstrtab .symtab .strtab
   01     .ctors .dtors .jcr .data.rel.ro .dynamic .got .got.plt .data .bss
   02     .dynamic
   03     .tbss .ctors .dtors .jcr .data.rel.ro .dynamic .got .got.plt .data .bss
   04

where the stripped library says:

 Section to Segment mapping:
  Segment Sections...
   00     .hash .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .text .fini .rodata .eh_frame .comment .shstrtab
   01     .ctors .dtors .jcr .data.rel.ro .dynamic .got .got.plt .data .bss
   02     .dynamic
   03     .bss
   04

So I breakpointed on line 2381 when i == 3 and j == 15.

For the unstripped library (the working one):

(gdb) p re->sl[j]
$18 = {name = 0x28626087 ".tbss", scn = 0x28621780, off = 94712, sz = 1624, entsize = 0,
  align = 8, type = 8, flags = 1027, addr = 98808, link = 0, info = 0}
(gdb) p phdr
$19 = {p_type = 7, p_flags = 4, p_offset = 94712, p_vaddr = 98808, p_paddr = 98808,
  p_filesz = 0, p_memsz = 1624, p_align = 8}
(gdb) p (re->sl[j].addr >= phdr.p_vaddr)
$20 = 1
(gdb) p (re->sl[j].addr + re->sl[j].sz <= phdr.p_vaddr + phdr.p_memsz)
$21 = 1

Both conditions are true.

For the stripped library (the failing one):

(gdb) p re->sl[j]
$13 = {name = 0x28621077 ".tbss", scn = 0x2861d780, off = 94712, sz = 1624, entsize = 0,
  align = 8, type = 8, flags = 1027, addr = 98808, link = 0, info = 0}
(gdb) p phdr
$15 = {p_type = 7, p_flags = 4, p_offset = 94712, p_vaddr = 100340, p_paddr = 100340,
  p_filesz = 0, p_memsz = 80, p_align = 8}
(gdb) p (re->sl[j].addr >= phdr.p_vaddr)
$14 = 0

The section's address (98808) is less than the segment's (100340), so
the section is no longer listed.
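
The containment test from the readelf excerpt can be replayed in isolation with the values gdb printed. This is a sketch that mirrors the two comparisons on lines 2381-2383; the struct and function names are made up for the example and are not elftoolchain's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The section fields readelf compares (re->sl[j].addr and .sz). */
struct demo_section {
    uint64_t addr;
    uint64_t sz;
};

/* The program-header fields readelf compares. */
struct demo_segment {
    uint64_t p_vaddr;
    uint64_t p_memsz;
};

/* 1 if the section lies entirely inside the segment's memory image. */
int
demo_section_in_segment(const struct demo_section *s,
    const struct demo_segment *p)
{
    assert(s != NULL && p != NULL);
    return s->addr >= p->p_vaddr &&
        s->addr + s->sz <= p->p_vaddr + p->p_memsz;
}
```

With .tbss at addr 98808 and sz 1624, the unstripped TLS header (p_vaddr 98808, p_memsz 1624) contains the section, while the stripped one (p_vaddr 100340, p_memsz 80) does not, which matches the two readelf listings.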

Perhaps strip is not updating the addresses as it removes sections?  Or is there a disagreement between clang-6 and binutils about ELF layout?

Thanks,
 Phil
Comment 25 Phil Shafer freebsd_committer 2018-05-11 17:30:05 UTC
FWIW, here's the diff between unstripped and stripped readelf output:

@@ -10,14 +10,14 @@
   Version:                           0x1
   Entry point address:               0x2e60
   Start of program headers:          52 (bytes into file)
-  Start of section headers:          287684 (bytes into file)
+  Start of section headers:          96676 (bytes into file)
   Flags:                             0
   Size of this header:               52 (bytes)
   Size of program headers:           32 (bytes)
   Number of program headers:         5
   Size of section headers:           40 (bytes)
-  Number of section headers:         39
-  Section header string table index: 36
+  Number of section headers:         27
+  Section header string table index: 26

 Elf file type is DYN (Shared object file)
 Entry point 0x2e60
@@ -28,17 +28,17 @@
   LOAD           0x000000 0x00000000 0x00000000 0x171f8 0x171f8 R E 0x1000
   LOAD           0x0171f8 0x000181f8 0x000181f8 0x005fc 0x0064c RW  0x1000
   DYNAMIC        0x0172f4 0x000182f4 0x000182f4 0x000d8 0x000d8 RW  0x4
-  TLS            0x0171f8 0x000181f8 0x000181f8 0x00000 0x00658 R   0x8
+  TLS            0x0171f8 0x000187f4 0x000187f4 0x00000 0x00050 R   0x8
   GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x4

  Section to Segment mapping:
   Segment Sections...
-   00     .hash .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .text .fini .rodata .eh_frame .comment .debug_pubnames .debug_info .debug_abbrev .debug_line .debug_frame .debug_str .debug_loc .debug_macinfo .debug_pubtypes .debug_ranges .shstrtab .symtab .strtab
+   00     .hash .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .text .fini .rodata .eh_frame .comment .shstrtab
    01     .ctors .dtors .jcr .data.rel.ro .dynamic .got .got.plt .data .bss
    02     .dynamic
-   03     .tbss .ctors .dtors .jcr .data.rel.ro .dynamic .got .got.plt .data .bss
+   03     .bss
    04
-There are 39 section headers, starting at offset 0x463c4:
+There are 27 section headers, starting at offset 0x179a4:

 Section Headers:
   [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
@@ -49,8 +49,8 @@
   [ 4] .dynstr           STRTAB          00001648 001648 0009c2 00   A  0   0  1
   [ 5] .gnu.version      SUNW_versym     0000200a 00200a 000188 02   A  3   0  2
   [ 6] .gnu.version_r    SUNW_verneed    00002194 002194 000030 00   A  4   1  4
-  [ 7] .rel.dyn          REL             000021c4 0021c4 000438 08   A  3   0  4
-  [ 8] .rel.plt          REL             000025fc 0025fc 0002c0 08   A  3  10  4
+  [ 7] .rel.dyn          REL             000021c4 0021c4 000438 08  AI  3   0  4
+  [ 8] .rel.plt          REL             000025fc 0025fc 0002c0 08  AI  3  10  4
   [ 9] .init             PROGBITS        000028bc 0028bc 000011 00  AX  0   0  4
   [10] .plt              PROGBITS        000028d0 0028d0 000590 04  AX  0   0  4
   [11] .text             PROGBITS        00002e60 002e60 0129c0 00  AX  0   0 16
@@ -68,19 +68,7 @@
   [23] .data             PROGBITS        00018578 017578 00027c 00  WA  0   0  4
   [24] .bss              NOBITS          000187f4 0177f4 000050 00  WA  0   0  4
   [25] .comment          PROGBITS        00000000 0177f4 0000e6 01  MS  0   0  1
-  [26] .debug_pubnames   PROGBITS        00000000 0178da 0018dc 00      0   0  1
-  [27] .debug_info       PROGBITS        00000000 0191b6 00e557 00      0   0  1
-  [28] .debug_abbrev     PROGBITS        00000000 02770d 000951 00      0   0  1
-  [29] .debug_line       PROGBITS        00000000 02805e 00bb0b 00      0   0  1
-  [30] .debug_frame      PROGBITS        00000000 033b6c 001498 00      0   0  4
-  [31] .debug_str        PROGBITS        00000000 035004 0023d9 01  MS  0   0  1
-  [32] .debug_loc        PROGBITS        00000000 0373dd 00ce49 00      0   0  1
-  [33] .debug_macinfo    PROGBITS        00000000 044226 000003 00      0   0  1
-  [34] .debug_pubtypes   PROGBITS        00000000 044229 0009b5 00      0   0  1
-  [35] .debug_ranges     PROGBITS        00000000 044bde 001688 00      0   0  1
-  [36] .shstrtab         STRTAB          00000000 046266 00015e 00      0   0  1
-  [37] .symtab           SYMTAB          00000000 0469dc 000eb0 10     38  40  4
-  [38] .strtab           STRTAB          00000000 04788c 000c9d 00      0   0  1
+  [26] .shstrtab         STRTAB          00000000 0178da 0000c8 00      0   0  1
 Key to Flags:
   W (write), A (alloc), X (execute), M (merge), S (strings)
   I (info), L (link order), G (group), x (unknown)

Of particular interest is the TLS line, which changes from:

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  ...
  TLS            0x0171f8 0x000181f8 0x000181f8 0x00000 0x00658 R   0x8


to:

  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  ...
  TLS            0x0171f8 0x000187f4 0x000187f4 0x00000 0x00050 R   0x8

The size shrinks from 0x658 to 0x50 bytes. Given that xo_default_handle alone is 328 bytes (0x148), the stripped TLS segment is too small to hold it, so this is likely what is broken.

Not that this explains why the section's starting address is below segment #3's starting address...
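
For reference, the hexadecimal fields can be decoded as follows. This is only an arithmetic sketch using values from the readelf diff above, and the variable names are made up for illustration. It also shows that the stripped TLS header carries the same address and size as the .bss section header, which may hint at where strip takes the values from, although the diff alone does not prove that.

```python
# Values from the readelf diff above.
TLS_VADDR_UNSTRIPPED = 0x181F8
TLS_MEMSZ_UNSTRIPPED = 0x658
TLS_VADDR_STRIPPED = 0x187F4
TLS_MEMSZ_STRIPPED = 0x50

BSS_ADDR = 0x187F4  # section [24] .bss, Addr column
BSS_SIZE = 0x50     # section [24] .bss, Size column

XO_DEFAULT_HANDLE_SIZE = 328  # bytes, as stated above

# Does xo_default_handle fit in the TLS segment?
handle_fits_unstripped = XO_DEFAULT_HANDLE_SIZE <= TLS_MEMSZ_UNSTRIPPED
handle_fits_stripped = XO_DEFAULT_HANDLE_SIZE <= TLS_MEMSZ_STRIPPED

# The stripped TLS header has the same address and size as .bss.
tls_matches_bss = (TLS_VADDR_STRIPPED, TLS_MEMSZ_STRIPPED) == (BSS_ADDR, BSS_SIZE)

print(TLS_MEMSZ_UNSTRIPPED, TLS_MEMSZ_STRIPPED)  # 1624 80
print(hex(XO_DEFAULT_HANDLE_SIZE))               # 0x148
print(handle_fits_unstripped, handle_fits_stripped, tls_matches_bss)  # True False True
```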

Thanks,
 Phil
Comment 26 Phil Shafer freebsd_committer 2018-05-11 20:48:58 UTC
Looks to be a "strip" issue:

Jimi [lib/test]% mkdir works fails
Jimi [lib/test]% install -s /usr/obj/usr/src/lib/libxo/libxo.so.0.full works/libxo.so.0
Jimi [lib/test]% install -s /usr/obj/usr/src/lib/libxo/libxo.so.0.full fails/libxo.so.0
Jimi [lib/test]% ll */*0
-rwxr-xr-x  1 phil  phil  97756 May 11 16:43 fails/libxo.so.0*
-rwxr-xr-x  1 phil  phil  97756 May 11 16:43 works/libxo.so.0*
Jimi [lib/test]% env LD_LIBRARY_PATH=works /tmp/uptime
 4:45PM  up 7 days,  8:40, 3 users, load averages: 0.55, 0.45, 0.43
Jimi [lib/test]% env LD_LIBRARY_PATH=fails /tmp/uptime
 4:45PM  up 7 days,  8:40, 3 users, load averages: 0.51, 0.44, 0.43
Jimi [lib/test]% strip fails/libxo.so.0
Jimi [lib/test]% env LD_LIBRARY_PATH=fails /tmp/uptime
Segmentation fault (core dumped)
Jimi [lib/test]% readelf -e works/libxo.so.0 > works/out
Jimi [lib/test]% readelf -e fails/libxo.so.0 > fails/out
Jimi [lib/test]% diff -u works/out fails/out
--- works/out   2018-05-11 16:45:46.660037000 -0400
+++ fails/out   2018-05-11 16:45:56.004434000 -0400
@@ -28,7 +28,7 @@
   LOAD           0x000000 0x00000000 0x00000000 0x171f8 0x171f8 R E 0x1000
   LOAD           0x0171f8 0x000181f8 0x000181f8 0x005fc 0x0064c RW  0x1000
   DYNAMIC        0x0172f4 0x000182f4 0x000182f4 0x000d8 0x000d8 RW  0x4
-  TLS            0x0171f8 0x000181f8 0x000181f8 0x00000 0x0064c R   0x8
+  TLS            0x0171f8 0x000187f4 0x000187f4 0x00000 0x00050 R   0x8
   GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x4

  Section to Segment mapping:
@@ -36,7 +36,7 @@
    00     .hash .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .text .fini .rodata .eh_frame .comment .shstrtab
    01     .ctors .dtors .jcr .data.rel.ro .dynamic .got .got.plt .data .bss
    02     .dynamic
-   03     .ctors .dtors .jcr .data.rel.ro .dynamic .got .got.plt .data .bss
+   03     .bss
    04
 There are 27 section headers, starting at offset 0x179a4:

Jimi [lib/test]% which strip
/usr/bin/strip
Jimi [lib/test]%


So "strip" (but not "install -s"?) doctors the TLS program header, reducing its length and causing TLS bss data to be uninitialized.   Both versions have the .tbss section missing from the "Section to Segment" mapping.
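
Rather than eyeballing the diff, the TLS line can be pulled out of the readelf output and compared field by field. The following is a rough Python sketch, assuming the column layout shown in the output above. The names ProgramHeader and find_tls are hypothetical helpers, not part of any existing tool.

```python
import re
from typing import NamedTuple, Optional


class ProgramHeader(NamedTuple):
    kind: str
    offset: int
    vaddr: int
    paddr: int
    filesz: int
    memsz: int


# Matches program header lines such as:
#   TLS  0x0171f8 0x000181f8 0x000181f8 0x00000 0x0064c R   0x8
_PHDR_RE = re.compile(
    r"^\s*(?P<kind>[A-Z_]+)\s+"
    r"(?P<offset>0x[0-9a-f]+)\s+(?P<vaddr>0x[0-9a-f]+)\s+"
    r"(?P<paddr>0x[0-9a-f]+)\s+(?P<filesz>0x[0-9a-f]+)\s+"
    r"(?P<memsz>0x[0-9a-f]+)\s"
)

_FIELDS = ("offset", "vaddr", "paddr", "filesz", "memsz")


def find_tls(readelf_text: str) -> Optional[ProgramHeader]:
    """Return the first TLS program header found in readelf output, if any."""
    for line in readelf_text.splitlines():
        m = _PHDR_RE.match(line)
        if m and m.group("kind") == "TLS":
            return ProgramHeader("TLS", *(int(m.group(f), 16) for f in _FIELDS))
    return None


# TLS lines copied from works/out and fails/out above.
works = "  TLS            0x0171f8 0x000181f8 0x000181f8 0x00000 0x0064c R   0x8"
fails = "  TLS            0x0171f8 0x000187f4 0x000187f4 0x00000 0x00050 R   0x8"

before, after = find_tls(works), find_tls(fails)
print(hex(before.vaddr), hex(before.memsz))  # 0x181f8 0x64c
print(hex(after.vaddr), hex(after.memsz))    # 0x187f4 0x50
```

In practice the text would come from "readelf -e" run on each library, as in the session above.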

Thanks,
 Phil
Comment 27 Phil Shafer freebsd_committer 2018-05-11 21:02:05 UTC
Even odder, running "strip" a second time on an already-stripped file produces the same TLS length change:

Jimi [lib/test]% strip -o mine /usr/obj/usr/src/lib/libxo/libxo.so.0.full
Jimi [lib/test]% readelf -e mine > before.elf
Jimi [lib/test]% strip mine
Jimi [lib/test]% readelf -e mine > after.elf
Jimi [lib/test]% diff -u before.elf after.elf
--- before.elf  2018-05-11 16:56:33.492235000 -0400
+++ after.elf   2018-05-11 16:56:40.876225000 -0400
@@ -28,7 +28,7 @@
   LOAD           0x000000 0x00000000 0x00000000 0x171f8 0x171f8 R E 0x1000
   LOAD           0x0171f8 0x000181f8 0x000181f8 0x005fc 0x0064c RW  0x1000
   DYNAMIC        0x0172f4 0x000182f4 0x000182f4 0x000d8 0x000d8 RW  0x4
-  TLS            0x0171f8 0x000181f8 0x000181f8 0x00000 0x0064c R   0x8
+  TLS            0x0171f8 0x000187f4 0x000187f4 0x00000 0x00050 R   0x8
   GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x4

  Section to Segment mapping:
@@ -36,7 +36,7 @@
    00     .hash .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .text .fini .rodata .eh_frame .comment .shstrtab
    01     .ctors .dtors .jcr .data.rel.ro .dynamic .got .got.plt .data .bss
    02     .dynamic
-   03     .ctors .dtors .jcr .data.rel.ro .dynamic .got .got.plt .data .bss
+   03     .bss
    04
 There are 27 section headers, starting at offset 0x179a4:

I see the same issue when "strip" is used twice in a row (both with "-o"):

Jimi [lib/test]% strip -o mine /usr/obj/usr/src/lib/libxo/libxo.so.0.full
Jimi [lib/test]% readelf -e mine > before.elf
Jimi [lib/test]% strip -o never mine
Jimi [lib/test]% readelf -e never > after.elf
Jimi [lib/test]% diff -u before.elf after.elf
--- before.elf  2018-05-11 16:58:07.845980000 -0400
+++ after.elf   2018-05-11 16:58:44.398731000 -0400
@@ -28,7 +28,7 @@
   LOAD           0x000000 0x00000000 0x00000000 0x171f8 0x171f8 R E 0x1000
   LOAD           0x0171f8 0x000181f8 0x000181f8 0x005fc 0x0064c RW  0x1000
   DYNAMIC        0x0172f4 0x000182f4 0x000182f4 0x000d8 0x000d8 RW  0x4
-  TLS            0x0171f8 0x000181f8 0x000181f8 0x00000 0x0064c R   0x8
+  TLS            0x0171f8 0x000187f4 0x000187f4 0x00000 0x00050 R   0x8
   GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x4

  Section to Segment mapping:
@@ -36,7 +36,7 @@
    00     .hash .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .text .fini .rodata .eh_frame .comment .shstrtab
    01     .ctors .dtors .jcr .data.rel.ro .dynamic .got .got.plt .data .bss
    02     .dynamic
-   03     .ctors .dtors .jcr .data.rel.ro .dynamic .got .got.plt .data .bss
+   03     .bss
    04
 There are 27 section headers, starting at offset 0x179a4:

----------

Off to look at strip.....  Please holler if this sounds familiar....

Thanks,
 Phil
Comment 28 Phil Shafer freebsd_committer 2018-05-13 05:38:04 UTC
Created attachment 193345
Fix from Kai Wang (kaiw@) for elfcopy

Many thanks to Kai Wang (kaiw@) for this fix!

Thanks,
 Phil
Comment 29 Phil Shafer freebsd_committer 2018-05-13 05:39:05 UTC
Created attachment 193346
Fix from Kai Wang (kaiw@) for readelf

Companion fix from Kai for readelf.

Thanks,
 Phil
Comment 30 Phil Shafer freebsd_committer 2018-05-13 06:22:57 UTC
Building head now; will commit fix tomorrow.

Thanks,
 Phil
Comment 31 commit-hook freebsd_committer 2018-05-14 05:21:54 UTC
A commit references this bug:

Author: phil
Date: Mon May 14 05:21:19 UTC 2018
New revision: 333600
URL: https://svnweb.freebsd.org/changeset/base/333600

Log:
  Handle thread-local storage (TLS) segments correctly when
  copying (objcopy) and displaying (readelf) them.

  PR:		227552
  Submitted by:	kaiw (maintainer)
  Reported by:	jachmann@unitix.org
  Reviewed by:	phil
  MFC after:	1 day

Changes:
  head/contrib/elftoolchain/elfcopy/elfcopy.h
  head/contrib/elftoolchain/elfcopy/sections.c
  head/contrib/elftoolchain/elfcopy/segments.c
  head/contrib/elftoolchain/readelf/readelf.c
Comment 32 Phil Shafer freebsd_committer 2018-05-14 05:23:12 UTC
Fix from kaiw@ is in head at r333600.  Building 11-stable now; will MFC tomorrow.

Thanks,
 Phil
Comment 33 Adam Stylinski 2018-05-17 14:11:36 UTC
I'm still seeing this bug in 11-STABLE with i386, even after these patches.  

root@fbsd-stable-vm:/usr/src/tools/tools/nanobsd # /usr/obj/nanobsd.ALIX/_.w/usr/bin/uptime 
Segmentation fault (core dumped)

(lldb) target create "/usr/obj/nanobsd.ALIX/_.w/usr/bin/uptime"
Current executable set to '/usr/obj/nanobsd.ALIX/_.w/usr/bin/uptime' (i386).
(lldb) r
Process 51608 launching
Process 51608 launched: '/usr/obj/nanobsd.ALIX/_.w/usr/bin/uptime' (i386)
Process 51608 stopped
* thread #1, name = 'uptime', stop reason = signal SIGBUS: hardware error
    frame #0: 0xffffffff
error: Bad address
(lldb) bt
* thread #1, name = 'uptime', stop reason = signal SIGBUS: hardware error
  * frame #0: 0xfffffff

(compiling i386 nanobsd images on amd64).
Comment 34 Dimitry Andric freebsd_committer 2018-05-17 14:33:51 UTC
(In reply to Adam Stylinski from comment #33)
> I'm still seeing this bug in 11-STABLE with i386, even after these patches.  
> 
> root@fbsd-stable-vm:/usr/src/tools/tools/nanobsd #
> /usr/obj/nanobsd.ALIX/_.w/usr/bin/uptime 
> Segmentation fault (core dumped)

It's probably loading a bad copy of libxo.so.0, from /lib.  What is the output of:

ldd /usr/obj/nanobsd.ALIX/_.w/usr/bin/uptime

and for the libxo.so.0 file listed there, show the output of:

readelf -lW <path from ldd output above>/libxo.so.0
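
The two steps above (find the libxo.so.0 that the binary resolves to, then inspect it) can be scripted. Here is a small Python sketch that extracts the resolved path from ldd output. The function name resolved_path is made up for illustration, and the sample text only mimics the format ldd prints.

```python
import re
from typing import Optional

# Matches ldd lines such as:
#   libxo.so.0 => /lib/libxo.so.0 (0x28080000)
_LDD_RE = re.compile(
    r"^\s*(?P<name>\S+)\s+=>\s+(?P<path>\S+)\s+\(0x[0-9a-f]+\)\s*$"
)


def resolved_path(ldd_text: str, library: str) -> Optional[str]:
    """Return the path ldd reports for the given library name, if listed."""
    for line in ldd_text.splitlines():
        m = _LDD_RE.match(line)
        if m and m.group("name") == library:
            return m.group("path")
    return None


# Sample text in the format ldd prints.
sample = (
    "/usr/bin/uptime:\n"
    "\tlibsbuf.so.6 => /lib/libsbuf.so.6 (0x2807d000)\n"
    "\tlibxo.so.0 => /lib/libxo.so.0 (0x28080000)\n"
)

path = resolved_path(sample, "libxo.so.0")
print(f"readelf -lW {path}")  # readelf -lW /lib/libxo.so.0
```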
Comment 35 Adam Stylinski 2018-05-17 14:41:17 UTC
(In reply to Dimitry Andric from comment #34)

root@fbsd-stable-vm:/usr/src/tools/tools/nanobsd # ldd /usr/obj/nanobsd.ALIX/_.w/usr/bin/uptime
/usr/obj/nanobsd.ALIX/_.w/usr/bin/uptime:
	libkvm.so.7 => /usr/lib32/libkvm.so.7 (0x28071000)
	libsbuf.so.6 => /usr/lib32/libsbuf.so.6 (0x2807e000)
	libxo.so.0 => /usr/lib32/libxo.so.0 (0x28081000)
	libutil.so.9 => /usr/lib32/libutil.so.9 (0x2809a000)
	libc.so.7 => /usr/lib32/libc.so.7 (0x280ac000)
	libelf.so.2 => /usr/lib32/libelf.so.2 (0x28213000)

root@fbsd-stable-vm:/usr/src/tools/tools/nanobsd # readelf -lW /usr/lib32/libxo.so.0

Elf file type is DYN (Shared object file)
Entry point 0x2e40
There are 5 program headers, starting at offset 52

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x000000 0x00000000 0x00000000 0x171cc 0x171cc R E 0x1000
  LOAD           0x0171d0 0x000181d0 0x000181d0 0x00604 0x00654 RW  0x1000
  DYNAMIC        0x0172d4 0x000182d4 0x000182d4 0x000d8 0x000d8 RW  0x4
  TLS            0x0171d0 0x000187d4 0x000187d4 0x00000 0x00050 R   0x8
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x4

 Section to Segment mapping:
  Segment Sections...
   00     .hash .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .text .fini .rodata .eh_frame .comment .gnu_debuglink .shstrtab 
   01     .ctors .dtors .jcr .data.rel.ro .dynamic .got .got.plt .data .bss 
   02     .dynamic 
   03     .bss 
   04   

And on the hardware in question:

root@wap1:~ # ldd `which uptime`
/usr/bin/uptime:
	libkvm.so.7 => /lib/libkvm.so.7 (0x28070000)
	libsbuf.so.6 => /lib/libsbuf.so.6 (0x2807d000)
	libxo.so.0 => /lib/libxo.so.0 (0x28080000)
	libutil.so.9 => /lib/libutil.so.9 (0x28099000)
	libc.so.7 => /lib/libc.so.7 (0x280ab000)
	libelf.so.2 => /lib/libelf.so.2 (0x2820a000)

root@wap1:~ # readelf -lW /lib/libxo.so.0 

Elf file type is DYN (Shared object file)
Entry point 0x2e40
There are 5 program headers, starting at offset 52

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x000000 0x00000000 0x00000000 0x17160 0x17160 R E 0x1000
  LOAD           0x017160 0x00018160 0x00018160 0x00604 0x00654 RW  0x1000
  DYNAMIC        0x017264 0x00018264 0x00018264 0x000d8 0x000d8 RW  0x4
  TLS            0x017160 0x00018764 0x00018764 0x00000 0x00050 R   0x8
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x4

 Section to Segment mapping:
  Segment Sections...
   00     .hash .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .text .fini .rodata .eh_frame .comment .gnu_debuglink .shstrtab 
   01     .ctors .dtors .jcr .data.rel.ro .dynamic .got .got.plt .data .bss 
   02     .dynamic 
   03     .bss 
   04 

root@wap1:~ # uptime
Segmentation fault
Comment 36 Dimitry Andric freebsd_committer 2018-05-17 14:48:41 UTC
(In reply to Adam Stylinski from comment #35)
> (In reply to Dimitry Andric from comment #34)
> 
> root@fbsd-stable-vm:/usr/src/tools/tools/nanobsd # ldd
> /usr/obj/nanobsd.ALIX/_.w/usr/bin/uptime
> /usr/obj/nanobsd.ALIX/_.w/usr/bin/uptime:
> 	libkvm.so.7 => /usr/lib32/libkvm.so.7 (0x28071000)
> 	libsbuf.so.6 => /usr/lib32/libsbuf.so.6 (0x2807e000)
> 	libxo.so.0 => /usr/lib32/libxo.so.0 (0x28081000)
> 	libutil.so.9 => /usr/lib32/libutil.so.9 (0x2809a000)
> 	libc.so.7 => /usr/lib32/libc.so.7 (0x280ac000)
> 	libelf.so.2 => /usr/lib32/libelf.so.2 (0x28213000)

Hmm, this is weird. It should not link to 32-bit libraries, unless uptime itself is a 32-bit executable?  Are you doing a cross-build here?


> root@fbsd-stable-vm:/usr/src/tools/tools/nanobsd # readelf -lW
> /usr/lib32/libxo.so.0
> 
> Elf file type is DYN (Shared object file)
> Entry point 0x2e40
> There are 5 program headers, starting at offset 52
> 
> Program Headers:
>   Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
>   LOAD           0x000000 0x00000000 0x00000000 0x171cc 0x171cc R E 0x1000
>   LOAD           0x0171d0 0x000181d0 0x000181d0 0x00604 0x00654 RW  0x1000
>   DYNAMIC        0x0172d4 0x000182d4 0x000182d4 0x000d8 0x000d8 RW  0x4
>   TLS            0x0171d0 0x000187d4 0x000187d4 0x00000 0x00050 R   0x8

Yeah, this is definitely a messed-up TLS segment, produced by the buggy version of strip.


> And on the hardware in question:
> 
> root@wap1:~ # ldd `which uptime`
> /usr/bin/uptime:
> 	libkvm.so.7 => /lib/libkvm.so.7 (0x28070000)
> 	libsbuf.so.6 => /lib/libsbuf.so.6 (0x2807d000)
> 	libxo.so.0 => /lib/libxo.so.0 (0x28080000)
> 	libutil.so.9 => /lib/libutil.so.9 (0x28099000)
> 	libc.so.7 => /lib/libc.so.7 (0x280ab000)
> 	libelf.so.2 => /lib/libelf.so.2 (0x2820a000)

This looks more normal...


> root@wap1:~ # readelf -lW /lib/libxo.so.0 
> 
> Elf file type is DYN (Shared object file)
> Entry point 0x2e40
> There are 5 program headers, starting at offset 52
> 
> Program Headers:
>   Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
>   LOAD           0x000000 0x00000000 0x00000000 0x17160 0x17160 R E 0x1000
>   LOAD           0x017160 0x00018160 0x00018160 0x00604 0x00654 RW  0x1000
>   DYNAMIC        0x017264 0x00018264 0x00018264 0x000d8 0x000d8 RW  0x4
>   TLS            0x017160 0x00018764 0x00018764 0x00000 0x00050 R   0x8

But that is still messed up.  For some reason, it still used the buggy version of strip.
Comment 37 Adam Stylinski 2018-05-17 14:55:29 UTC
(In reply to Dimitry Andric from comment #36)

The build machine in question is on amd64 and is using nanobsd scripts to build images for i386. The install from which this build is happening is on 11-stable, updated as of yesterday afternoon (with make delete-old and make delete-old-libs run).  The same source tree (updated via svnup) is used to build the nanobsd image.
Comment 38 commit-hook freebsd_committer 2018-05-17 21:50:00 UTC
A commit references this bug:

Author: marius
Date: Thu May 17 21:49:35 UTC 2018
New revision: 333770
URL: https://svnweb.freebsd.org/changeset/base/333770

Log:
  MFC: r333600 (phil)

  Handle thread-local storage (TLS) segments correctly when
  copying (objcopy) and displaying (readelf) them.

  PR:		227552
  Submitted by:	kaiw (maintainer)
  Approved by:	re (gjb)

Changes:
_U  stable/11/
  stable/11/contrib/elftoolchain/elfcopy/elfcopy.h
  stable/11/contrib/elftoolchain/elfcopy/sections.c
  stable/11/contrib/elftoolchain/elfcopy/segments.c
  stable/11/contrib/elftoolchain/readelf/readelf.c
Comment 39 Adam Stylinski 2018-05-17 22:43:13 UTC
(In reply to commit-hook from comment #38)

Ahh, had this not been MFC'd yet?  I thought I saw in the web svn frontend that it had been, but maybe I was mistakenly browsing /base.
Comment 40 Phil Shafer freebsd_committer 2018-05-17 23:10:48 UTC
The fix is not MFC'd to stable/11 yet.   I was waiting for the 3-day minimum MFC delay, but am away on business.  I'll try to get it in tomorrow.  If not, it will be MFC'd next week.

Thanks,
 Phil
Comment 41 Adam Stylinski 2018-05-18 13:42:18 UTC
(In reply to Phil Shafer from comment #40)

It looks like kaiw did it already.
Comment 42 Dimitry Andric freebsd_committer 2018-05-19 21:16:21 UTC
Fix implemented in head r333600, merged to stable/11 in r333770, and also available in 11.2-BETA2.