Bug 231790

Summary: objcopy: corrupts relocation entries in big-endian mips64 output when adjusting symbol indexes
Product: Base System Reporter: Ed Maste <emaste>
Component: kernAssignee: Ed Maste <emaste>
Status: Closed FIXED    
Severity: Affects Only Me CC: jhb, jkoshy, mips, sbruno
Priority: --- Keywords: regression
Version: CURRENT   
Hardware: mips   
OS: Any   
URL: https://reviews.freebsd.org/D17380
See Also: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=231887
Bug Depends on:    
Bug Blocks: 228911    

Description Ed Maste freebsd_committer 2018-09-28 17:08:08 UTC
On an EdgeRouter Lite

uname -a: FreeBSD  12.0-ALPHA7 FreeBSD 12.0-ALPHA7 r338982+19c946c159e5(master) 
ERL  mips

# kldload /boot/kernel/geom_nop.ko
kldload: unexpected relocation type 3
kldload: unexpected relocation type 12
kldload: unexpected relocation type 11
<many more lines, types 3, 11, and 12>
cpu:0-Trap cause = 2 (TLB miss (load or instr. fetch) - kernel mode)
panic: trap
cpuid = 0
time = 1538152731
Uptime: 3h32m5s

Some examples of the relocations:
# for r in R_MIPS_REL32 R_MIPS_GPREL32 R_MIPS_16; do readelf -r /boot/kernel/geom_nop.ko | grep $r | head -n 5; done
000000000028 000200000003 R_MIPS_REL32        0000000000000000 .rodata + 420
000000000184 000200000003 R_MIPS_REL32        0000000000000000 .rodata + 3a0
0000000002a8 000200000003 R_MIPS_REL32        0000000000000000 .rodata + 420
000000000578 000200000003 R_MIPS_REL32        0000000000000000 .rodata + 2da4
0000000005c8 000200000003 R_MIPS_REL32        0000000000000000 .rodata + 2da4
000000000034 004a0000000c R_MIPS_GPREL32      0000000000000000 __mtx_unlock_sleep + 0
00000000004c 00630000000c R_MIPS_GPREL32      0000000000000000  + 0
000000000098 00040000000c R_MIPS_GPREL32      0000000000000000 .data + 0
0000000000b0 00300000000c R_MIPS_GPREL32      0000000000000000 arc4random + 0
0000000000d0 00090000000c R_MIPS_GPREL32      0000000000000478 g_nop_orphan + 0
000000000000 000200000001 R_MIPS_16           0000000000000000 .rodata + 0
000000000020 000200000001 R_MIPS_16           0000000000000000 .rodata + 440
000000000040 000200000001 R_MIPS_16           0000000000000000 .rodata + 478
000000000060 000200000001 R_MIPS_16           0000000000000000 .rodata + 4b0
000000000080 000200000001 R_MIPS_16           0000000000000000 .rodata + 2dd0
Comment 1 Sean Bruno freebsd_committer 2018-09-30 00:33:16 UTC
This works at svn r338477.  r338478 (which is a likely culprit), doesn't build until the follow on commit at r338485, and at that point the kldload -> panic error occurs.
Comment 2 Sean Bruno freebsd_committer 2018-09-30 19:40:16 UTC
Reverting 338478 and 338485 allows kldload to work again.
Comment 3 Ed Maste freebsd_committer 2018-10-01 11:57:21 UTC
Parsing the bad one readelf emits very many errors:
readelf: Error:  bad symbol index: 0000011b in reloc

And relocation differences:

│ @@ -1,36543 +1,36543 @@                                                       
│                                                                               
│  Relocation section '.rela.text' at offset 0x21508 contains 11885 entries:    
│      Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
│ -0000000000000018  000000030000000b R_MIPS_CALL16          0000000000000000 .rodata.str1.8 + 0
│ +0000000000000018  000000020000001d R_MIPS_HIGHEST         0000000000000000 .rodata + 0
│                      Type2: R_MIPS_NONE                                       
│                      Type3: R_MIPS_NONE                                       
│ -000000000000001c  0000000300000000 R_MIPS_NONE            0000000000000000 .rodata.str1.8 + 0
│ +000000000000001c  0000000200000005 R_MIPS_HI16            0000000000000000 .rodata + 0
│                      Type2: R_MIPS_NONE                                       
│                      Type3: R_MIPS_NONE                                       
│ -0000000000000020  000000030000000a R_MIPS_PC16            0000000000000000 .rodata.str1.8 + 0
│ +0000000000000020  000000020000001c R_MIPS_HIGHER          0000000000000000 .rodata + 0
│                      Type2: R_MIPS_NONE                                       
│                      Type3: R_MIPS_NONE                                       
│ -0000000000000024  0000000300000000 R_MIPS_NONE            0000000000000000 .rodata.str1.8 + 0
│ +0000000000000024  0000000200000006 R_MIPS_LO16            0000000000000000 .rodata + 0
│                      Type2: R_MIPS_NONE                                       
│                      Type3: R_MIPS_NONE                                       
│ -0000000000000040  000000040000000b R_MIPS_CALL16          0000000000000000 .data + 0
│ +0000000000000040  000000030000001d R_MIPS_HIGHEST         0000000000000000 .rodata.str1.8 + 0
│                      Type2: R_MIPS_NONE                                       
│                      Type3: R_MIPS_NONE                                       
│ -0000000000000044  0000000400000000 R_MIPS_NONE            0000000000000000 .data + 0
│ +0000000000000044  0000000300000005 R_MIPS_HI16            0000000000000000 .rodata.str1.8 + 0
│                      Type2: R_MIPS_NONE                                       
│                      Type3: R_MIPS_NONE                                       
│ -0000000000000048  000000040000000a R_MIPS_PC16            0000000000000000 .data + 0
│ +0000000000000048  000000030000001c R_MIPS_HIGHER          0000000000000000 .rodata.str1.8 + 0
│                      Type2: R_MIPS_NONE                                       
│                      Type3: R_MIPS_NONE                                       
│ -000000000000004c  0000000400000000 R_MIPS_NONE            0000000000000000 .data + 0
│ +000000000000004c  0000000300000006 R_MIPS_LO16            0000000000000000 .rodata.str1.8 + 0
│                      Type2: R_MIPS_NONE                                       
│                      Type3: R_MIPS_NONE                                       
│ -0000000000000060  000000040000000b R_MIPS_CALL16          0000000000000000 .data + 8
│ +0000000000000060  000000030000001d R_MIPS_HIGHEST         0000000000000000 .rodata.str1.8 + 8
│                      Type2: R_MIPS_NONE                                       
│                      Type3: R_MIPS_NONE                                       
│ -0000000000000064  0000000400000000 R_MIPS_NONE            0000000000000000 .data + 8
│ +0000000000000064  0000000300000005 R_MIPS_HI16            0000000000000000 .rodata.str1.8 + 8
│                      Type2: R_MIPS_NONE                                       
│                      Type3: R_MIPS_NONE                                       
│ -0000000000000068  000000040000000a R_MIPS_PC16            0000000000000000 .data + 8
│ +0000000000000068  000000030000001c R_MIPS_HIGHER          0000000000000000 .rodata.str1.8 + 8
│                      Type2: R_MIPS_NONE                                       
│                      Type3: R_MIPS_NONE                                       
│ -000000000000006c  0000000400000000 R_MIPS_NONE            0000000000000000 .data + 8
│ +000000000000006c  0000000300000006 R_MIPS_LO16            0000000000000000 .rodata.str1.8 + 8
│                      Type2: R_MIPS_NONE                                       
│                      Type3: R_MIPS_NONE                                       
...
Comment 4 Ed Maste freebsd_committer 2018-10-01 13:11:58 UTC
From another module,

  [ 3] .rela.text        RELA             0000000000000000  00004400
       00000000000095b8  0000000000000018   I      21     2     8

-00004400: 0000 0000 0000 0028 0000 0002 0000 0003  .......(........
+00004400: 0000 0000 0000 0028 0000 0001 0000 0004  .......(........
           -offset------------ -sym ndx- 

 00004410: 0000 0000 0000 0420 0000 0000 0000 0034  ....... .......4
           -addend------------ -offset------------

The symbol index and relocation type info is corrupt, but note that the sum of the symbol index and the following 32-bit word is equal: it appears we're adjusting the symbol index (i.e., because we've removed symbols) at the wrong time relative to endianness conversion.
Comment 5 John Baldwin freebsd_committer freebsd_triage 2018-10-01 16:40:59 UTC
This almost feels as if the EL change is being used on EB object files instead of only EL object files.  Those types of errors and wrong output from readelf are what I saw on EL prior to the change.  I'll have to try to get some time to build a new world to test.  However, just using readelf on an amd64 host against an existing mips64 BE object file is probably sufficient for testing.
Comment 6 Ed Maste freebsd_committer 2018-10-01 18:40:16 UTC
As far as I can tell readelf is broken before and after r338478 on BE mips64.

It looks like it is indeed a false-positive invocation of the le conversion - a breakpoint on _libelf_mips64el_r_info_tom when running "objcopy --strip-debug geom_mirror.ko.works out" stops on:

* thread #1, name = 'objcopy', stop reason = breakpoint 2.1
  * frame #0: 0x000000528b50f5c4 libelf.so.2`_libelf_mips64el_r_info_tom(r_info=8589934621) at gelf_mips64el.c:74
    frame #1: 0x000000528b50e9f2 libelf.so.2`gelf_getrela(ed=<unavailable>, ndx=<unavailable>, dst=0x00007fffffffe400) at gelf_rela.c:97
    frame #2: 0x00000000002124b7 objcopy`copy_content [inlined] update_reloc(ecp=<unavailable>) at sections.c:832
    frame #3: 0x00000000002123c1 objcopy`copy_content(ecp=<unavailable>) at sections.c:604
    frame #4: 0x000000000020db67 objcopy`create_elf(ecp=0x0000000800877000) at main.c:389
    frame #5: 0x000000000020f0a3 objcopy`create_file(ecp=0x0000000800877000, src="geom_mirror.ko.works", dst="geom_mirror.ko.works") at main.c:705
    frame #6: 0x000000000020ec81 objcopy`main [inlined] elfcopy_main(ecp=<unavailable>, argc=<unavailable>, argv=<unavailable>) at main.c:1028
    frame #7: 0x000000000020ec79 objcopy`main(argc=4, argv=0x00007fffffffe750) at main.c:1597
    frame #8: 0x0000000000209095 objcopy`_start(ap=<unavailable>, cleanup=<unavailable>) at crt1.c:74
Comment 7 Ed Maste freebsd_committer 2018-10-01 20:40:53 UTC
In create_file() we create the output ELF object with elf_begin(..., ELF_C_WRITE_, ...).

It calls _libelf_open_object where we default to the host's endianness for e_byteorder: e->e_byteorder = LIBELF_PRIVATE(byteorder);

We set the output elf header endianness as appropriate:
        if (ecp->oed == ELFDATANONE)
                ecp->oed = ieh.e_ident[EI_DATA];
        oeh.e_ident[EI_DATA]  = ecp->oed;

Then the oeh is passed to gelf_update_ehdr, but it does not update e_byteorder; this is done in _libelf_resync_elf().

There are several ways this could be addressed - I'm not sure what's best off hand, but updating e_byteorder in gelf_update_ehdr might be the easiest short-term fix.
Comment 8 John Baldwin freebsd_committer freebsd_triage 2018-10-01 20:55:06 UTC
Adding Joseph for his thoughts on how best to handle this.  The short-term fix you suggest might be the right way, but perhaps we need to be checking some other field instead in the _libelf_mips64el() function?  I'll defer to Joseph.
Comment 9 Ed Maste freebsd_committer 2018-10-01 21:04:11 UTC
(In reply to John Baldwin from comment #8)
Yes, after writing that comment I did some further poking around and believe just using e_ident[EI_DATA] in `_libelf_is_mips64el` is the best short-term fix - see review D17380. This also fixes readelf.
Comment 10 Ed Maste freebsd_committer 2018-10-02 14:34:42 UTC
Patch ready in D17380, waiting on re@ approval.
Comment 11 commit-hook freebsd_committer 2018-10-02 15:08:55 UTC
A commit references this bug:

Author: emaste
Date: Tue Oct  2 15:08:42 UTC 2018
New revision: 339083
URL: https://svnweb.freebsd.org/changeset/base/339083

Log:
  libelf: correct mips64el test to use ELF header

  libelf maintains two views of endianness: e_byteorder, and
  e_ident[EI_DATA] in the ELF header itself.  e_byteorder is not always
  kept in sync, so use the ELF header endianness to test for mips64el.

  PR:		231790
  Bisected by:	sbruno
  Reviewed by:	jhb
  Approved by:	re (kib)
  MFC with:	r338478
  Sponsored by:	The FreeBSD Foundation
  Differential Revision:	https://reviews.freebsd.org/D17380

Changes:
  head/contrib/elftoolchain/libelf/gelf_mips64el.c
Comment 12 commit-hook freebsd_committer 2018-11-09 21:45:47 UTC
A commit references this bug:

Author: emaste
Date: Fri Nov  9 21:45:44 UTC 2018
New revision: 340309
URL: https://svnweb.freebsd.org/changeset/base/340309

Log:
  Fix objcopy for little-endian MIPS64 objects.

  MFC r338478 (jhb): Fix objcopy for little-endian MIPS64 objects.

  MIPS64 does not store the 'r_info' field of a relocation table entry as
  a 64-bit value consisting of a 32-bit symbol index in the high 32 bits
  and a 32-bit type in the low 32 bits as on other architectures.  Instead,
  the 64-bit 'r_info' field is really a 32-bit symbol index followed by four
  individual byte type fields.  For big-endian MIPS64, treating this as a
  64-bit integer happens to be compatible with the layout expected by other
  architectures (symbol index in upper 32-bits of resulting "native" 64-bit
  integer).  However, for little-endian MIPS64 the parsed 64-bit integer
  contains the symbol index in the low 32 bits and the 4 individual byte
  type fields in the upper 32-bits (but as if the upper 32-bits were
  byte-swapped).

  To cope, add two helper routines in gelf_getrel.c to translate between the
  correct native 'r_info' value and the value obtained after the normal
  byte-swap translation.  Use these routines in gelf_getrel(), gelf_getrela(),
  gelf_update_rel(), and gelf_update_rela().  This fixes 'readelf -r' on
  little-endian MIPS64 objects which was previously decoding incorrect
  relocations as well as 'objcopy: invalid symbox index' warnings from
  objcopy when extracting debug symbols from kernel modules.

  Even with this fixed, objcopy was still crashing when trying to extract
  debug symbols from little-endian MIPS64 modules.  The workaround in
  gelf_*rel*() depends on the current ELF object having a valid ELF header
  so that the 'e_machine' field can be compared against EM_MIPS.  objcopy
  was parsing the relocation entries to possibly rewrite the 'r_info' fields
  in the update_relocs() function before writing the initial ELF header to
  the destination object file.  Move the initial write of the ELF header
  earlier before copy_contents() so that update_relocs() uses the correct
  symbol index values.

  Note that this change should really go upstream.  The binutils readelf
  source has a similar hack for MIPS64EL though I implemented this version
  from scratch using the MIPS64 ABI PDF as a reference.

  MFC r339083 (emaste): libelf: correct mips64el test to use ELF header

  libelf maintains two views of endianness: e_byteorder, and
  e_ident[EI_DATA] in the ELF header itself.  e_byteorder is not always
  kept in sync, so use the ELF header endianness to test for mips64el.

  MFC r339473 (emaste): libelf: also test for 64-bit ELF in _libelf_is_mips64el

  Although _libelf_is_mips64el is only called in contexts where we've
  already checked that e_class is ELFCLASS64 but this may change in the
  future.  Add a safety belt so that we don't access an invalid e_ehdr64
  union member if it does.

  PR:		231790

Changes:
_U  stable/11/
  stable/11/contrib/elftoolchain/elfcopy/main.c
  stable/11/contrib/elftoolchain/libelf/Makefile
  stable/11/contrib/elftoolchain/libelf/_libelf.h
  stable/11/contrib/elftoolchain/libelf/gelf_mips64el.c
  stable/11/contrib/elftoolchain/libelf/gelf_rel.c
  stable/11/contrib/elftoolchain/libelf/gelf_rela.c
  stable/11/sys/sys/param.h