Bug 225105 - Linux static golang binaries crash at startup
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: freebsd-emulation mailing list
Reported: 2018-01-12 12:07 UTC by Edward Tomasz Napierala
Modified: 2018-08-27 07:45 UTC

Description Edward Tomasz Napierala freebsd_committer 2018-01-12 12:07:27 UTC
Statically linked Linux binaries for golang seem to crash at startup, like this:

% kdump                       
 88877 ktrace   RET   ktrace 0
 88877 ktrace   CALL  execve(0x7fffffffec95,0x7fffffffea08,0x7fffffffea18)
 88877 ktrace   NAMI  "./gofmt"
 88877 ktrace   PFLT  0x617000 0x2<VM_PROT_WRITE>
 88877 ktrace   PRET  KERN_SUCCESS
 88877 ktrace   PFLT  0x7fffffffe000 0x2<VM_PROT_WRITE>
 88877 ktrace   PRET  KERN_SUCCESS
 88877 ktrace   PFLT  0x7fffffffd000 0x2<VM_PROT_WRITE>
 88877 ktrace   PRET  KERN_SUCCESS
 88877 gofmt    RET   linux_execve 0
 88877 gofmt    PFLT  0x618000 0x2<VM_PROT_WRITE>
 88877 gofmt    PRET  KERN_SUCCESS
 88877 gofmt    PFLT  0x636000 0x2<VM_PROT_WRITE>
 88877 gofmt    PRET  KERN_SUCCESS
 88877 gofmt    CALL  linux_arch_prctl(0x1002,0x618be8)
 88877 gofmt    RET   linux_arch_prctl 0
 88877 gofmt    PSIG  SIGSEGV SIG_DFL code=SEGV_MAPERR
 88877 gofmt    NAMI  "gofmt.core"

% gdb801 ./gofmt              
GNU gdb (GDB) 8.0.1 [GDB v8.0.1 for FreeBSD]
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd12.0".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./gofmt...done.
warning: Missing auto-load script at offset 0 in section .debug_gdb_scripts
of file /usr/home/en322/aosp/prebuilts/go/linux-x86/bin/gofmt.
Use `info auto-load python-scripts [REGEXP]' to list them.
(gdb) run
Starting program: /usr/home/en322/aosp/prebuilts/go/linux-x86/bin/gofmt 

Program received signal SIGSEGV, Segmentation fault.
runtime.rt0_go () at prebuilts/go/linux-x86/src/runtime/asm_amd64.s:149
149     prebuilts/go/linux-x86/src/runtime/asm_amd64.s: No such file or directory.
(gdb) where
#0  runtime.rt0_go () at prebuilts/go/linux-x86/src/runtime/asm_amd64.s:149
#1  0x0000000000000000 in ?? ()
(gdb) disass
Dump of assembler code for function runtime.rt0_go:
   0x0000000000453510 <+0>:     mov    %rdi,%rax
   0x0000000000453513 <+3>:     mov    %rsi,%rbx
   0x0000000000453516 <+6>:     sub    $0x27,%rsp
   0x000000000045351a <+10>:    and    $0xfffffffffffffff0,%rsp
   0x000000000045351e <+14>:    mov    %rax,0x10(%rsp)
   0x0000000000453523 <+19>:    mov    %rbx,0x18(%rsp)
   0x0000000000453528 <+24>:    lea    0x1c5151(%rip),%rdi        # 0x618680 <runtime.g0>
   0x000000000045352f <+31>:    lea    -0xff98(%rsp),%rbx
   0x0000000000453537 <+39>:    mov    %rbx,0x10(%rdi)
   0x000000000045353b <+43>:    mov    %rbx,0x18(%rdi)
   0x000000000045353f <+47>:    mov    %rbx,(%rdi)
   0x0000000000453542 <+50>:    mov    %rsp,0x8(%rdi)
   0x0000000000453546 <+54>:    xor    %eax,%eax
   0x0000000000453548 <+56>:    cpuid
   0x000000000045354a <+58>:    mov    %eax,%esi
   0x000000000045354c <+60>:    cmp    $0x0,%eax
   0x000000000045354f <+63>:    je     0x453656 <runtime.rt0_go+326>
   0x0000000000453555 <+69>:    cmp    $0x756e6547,%ebx
   0x000000000045355b <+75>:    jne    0x45357b <runtime.rt0_go+107>
   0x000000000045355d <+77>:    cmp    $0x49656e69,%edx
   0x0000000000453563 <+83>:    jne    0x45357b <runtime.rt0_go+107>
   0x0000000000453565 <+85>:    cmp    $0x6c65746e,%ecx
   0x000000000045356b <+91>:    jne    0x45357b <runtime.rt0_go+107>
   0x000000000045356d <+93>:    movb   $0x1,0x1e2ed0(%rip)        # 0x636444 <runtime.isIntel>
   0x0000000000453574 <+100>:   movb   $0x1,0x1e2ecd(%rip)        # 0x636448 <runtime.lfenceBeforeRdtsc>
   0x000000000045357b <+107>:   mov    $0x1,%eax
   0x0000000000453580 <+112>:   cpuid
   0x0000000000453582 <+114>:   mov    %eax,0x1e2f1c(%rip)        # 0x6364a4 <runtime.processorVersionInfo>
   0x0000000000453588 <+120>:   test   $0x4000000,%edx
   0x000000000045358e <+126>:   setne  0x1e2ebd(%rip)        # 0x636452 <runtime.support_sse2>
   0x0000000000453595 <+133>:   test   $0x200,%ecx
   0x000000000045359b <+139>:   setne  0x1e2eb3(%rip)        # 0x636455 <runtime.support_ssse3>
   0x00000000004535a2 <+146>:   test   $0x80000,%ecx
   0x00000000004535a8 <+152>:   setne  0x1e2ea4(%rip)        # 0x636453 <runtime.support_sse41>
   0x00000000004535af <+159>:   test   $0x100000,%ecx
   0x00000000004535b5 <+165>:   setne  0x1e2e98(%rip)        # 0x636454 <runtime.support_sse42>
   0x00000000004535bc <+172>:   test   $0x800000,%ecx
   0x00000000004535c2 <+178>:   setne  0x1e2e88(%rip)        # 0x636451 <runtime.support_popcnt>
   0x00000000004535c9 <+185>:   test   $0x2000000,%ecx
   0x00000000004535cf <+191>:   setne  0x1e2e74(%rip)        # 0x63644a <runtime.support_aes>
   0x00000000004535d6 <+198>:   test   $0x8000000,%ecx
   0x00000000004535dc <+204>:   setne  0x1e2e6d(%rip)        # 0x636450 <runtime.support_osxsave>
   0x00000000004535e3 <+211>:   test   $0x10000000,%ecx
   0x00000000004535e9 <+217>:   setne  0x1e2e5b(%rip)        # 0x63644b <runtime.support_avx>
   0x00000000004535f0 <+224>:   cmp    $0x7,%esi
   0x00000000004535f3 <+227>:   jl     0x453632 <runtime.rt0_go+290>
   0x00000000004535f5 <+229>:   mov    $0x7,%eax
   0x00000000004535fa <+234>:   xor    %ecx,%ecx
   0x00000000004535fc <+236>:   cpuid
   0x00000000004535fe <+238>:   test   $0x8,%ebx
   0x0000000000453604 <+244>:   setne  0x1e2e42(%rip)        # 0x63644d <runtime.support_bmi1>
   0x000000000045360b <+251>:   test   $0x20,%ebx
   0x0000000000453611 <+257>:   setne  0x1e2e34(%rip)        # 0x63644c <runtime.support_avx2>
   0x0000000000453618 <+264>:   test   $0x100,%ebx
   0x000000000045361e <+270>:   setne  0x1e2e29(%rip)        # 0x63644e <runtime.support_bmi2>
   0x0000000000453625 <+277>:   test   $0x200,%ebx
   0x000000000045362b <+283>:   setne  0x1e2e1d(%rip)        # 0x63644f <runtime.support_erms>
   0x0000000000453632 <+290>:   cmpb   $0x1,0x1e2e17(%rip)        # 0x636450 <runtime.support_osxsave>
   0x0000000000453639 <+297>:   jne    0x453648 <runtime.rt0_go+312>
   0x000000000045363b <+299>:   xor    %ecx,%ecx
   0x000000000045363d <+301>:   xgetbv
   0x0000000000453640 <+304>:   and    $0x6,%eax
   0x0000000000453643 <+307>:   cmp    $0x6,%eax
   0x0000000000453646 <+310>:   je     0x453656 <runtime.rt0_go+326>
   0x0000000000453648 <+312>:   movb   $0x0,0x1e2dfc(%rip)        # 0x63644b <runtime.support_avx>
   0x000000000045364f <+319>:   movb   $0x0,0x1e2df6(%rip)        # 0x63644c <runtime.support_avx2>
   0x0000000000453656 <+326>:   mov    0x1c43cb(%rip),%rax        # 0x617a28 <_cgo_init>
   0x000000000045365d <+333>:   test   %rax,%rax
   0x0000000000453660 <+336>:   je     0x453688 <runtime.rt0_go+376>
   0x0000000000453662 <+338>:   mov    %rdi,%rcx
   0x0000000000453665 <+341>:   lea    0x1bc4(%rip),%rsi        # 0x455230 <setg_gcc>
   0x000000000045366c <+348>:   callq  *%rax
   0x000000000045366e <+350>:   lea    0x1c500b(%rip),%rcx        # 0x618680 <runtime.g0>
   0x0000000000453675 <+357>:   mov    (%rcx),%rax
   0x0000000000453678 <+360>:   add    $0x370,%rax
   0x000000000045367e <+366>:   mov    %rax,0x10(%rcx)
   0x0000000000453682 <+370>:   mov    %rax,0x18(%rcx)
   0x0000000000453686 <+374>:   jmp    0x4536b7 <runtime.rt0_go+423>
   0x0000000000453688 <+376>:   lea    0x1c5551(%rip),%rdi        # 0x618be0 <runtime.m0+96>
   0x000000000045368f <+383>:   callq  0x457940 <runtime.settls>
=> 0x0000000000453694 <+388>:   movq   $0x123,%fs:0xfffffffffffffff8
   0x00000000004536a1 <+401>:   mov    0x1c5538(%rip),%rax        # 0x618be0 <runtime.m0+96>
   0x00000000004536a8 <+408>:   cmp    $0x123,%rax
   0x00000000004536ae <+414>:   je     0x4536b7 <runtime.rt0_go+423>
   0x00000000004536b0 <+416>:   mov    %eax,0x0
   0x00000000004536b7 <+423>:   lea    0x1c4fc2(%rip),%rcx        # 0x618680 <runtime.g0>
   0x00000000004536be <+430>:   mov    %rcx,%fs:0xfffffffffffffff8
   0x00000000004536c7 <+439>:   lea    0x1c54b2(%rip),%rax        # 0x618b80 <runtime.m0>
   0x00000000004536ce <+446>:   mov    %rcx,(%rax)
   0x00000000004536d1 <+449>:   mov    %rax,0x30(%rcx)
   0x00000000004536d5 <+453>:   cld    
   0x00000000004536d6 <+454>:   callq  0x4378c0 <runtime.check>
   0x00000000004536db <+459>:   mov    0x10(%rsp),%eax
   0x00000000004536df <+463>:   mov    %eax,(%rsp)
   0x00000000004536e2 <+466>:   mov    0x18(%rsp),%rax
   0x00000000004536e7 <+471>:   mov    %rax,0x8(%rsp)
   0x00000000004536ec <+476>:   callq  0x4372c0 <runtime.args>
   0x00000000004536f1 <+481>:   callq  0x4267a0 <runtime.osinit>
   0x00000000004536f6 <+486>:   callq  0x42b1f0 <runtime.schedinit>
   0x00000000004536fb <+491>:   lea    0x11f8e6(%rip),%rax        # 0x572fe8 <runtime.mainPC>
   0x0000000000453702 <+498>:   push   %rax
   0x0000000000453703 <+499>:   pushq  $0x0
   0x0000000000453705 <+501>:   callq  0x431b00 <runtime.newproc>
   0x000000000045370a <+506>:   pop    %rax
   0x000000000045370b <+507>:   pop    %rax
   0x000000000045370c <+508>:   callq  0x42d0b0 <runtime.mstart>
   0x0000000000453711 <+513>:   movl   $0xf1,0xf1
   0x000000000045371c <+524>:   retq 

That part of the go runtime source looks like this:

needtls:
#ifdef GOOS_plan9
        // skip TLS setup on Plan 9
        JMP ok
#endif
#ifdef GOOS_solaris
        // skip TLS setup on Solaris
        JMP ok
#endif

        LEAQ    runtime·m0+m_tls(SB), DI
        CALL    runtime·settls(SB)

        // store through it, to make sure it works
        get_tls(BX)
        MOVQ    $0x123, g(BX)
        MOVQ    runtime·m0+m_tls(SB), AX
        CMPQ    AX, $0x123
        JEQ 2(PC)
        MOVL    AX, 0   // abort
ok:
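
For readers unfamiliar with this check, a rough userland model in C may help. It assumes settls asks for an %fs base 8 bytes past the TLS array, which the trace suggests (linux_arch_prctl receives 0x618be8, while runtime.m0+96 is 0x618be0). The names m0_tls, fs_base, settls_sim and tls_selftest are made up for illustration, and a plain variable stands in for the %fs base register:

```c
#include <stdint.h>

/* Stand-in for runtime.m0's TLS slots; the name and size are illustrative. */
static uint64_t m0_tls[6];

/* Stand-in for the %fs base register that the kernel is asked to set. */
static uintptr_t fs_base;

/*
 * Models settls: request an %fs base 8 bytes past the TLS array, so that
 * the -8(%fs) slot used by the runtime lands on m0_tls[0].
 */
static void
settls_sim(uint64_t *tls)
{
	fs_base = (uintptr_t)tls + 8;
}

/*
 * Models the check in rt0_go: store a marker through the %fs-relative slot
 * and confirm it shows up in the TLS array. Returns 1 on success.
 */
static int
tls_selftest(void)
{
	settls_sim(m0_tls);
	*(uint64_t *)(fs_base - 8) = 0x123;	/* movq $0x123, %fs:-8 */
	return (m0_tls[0] == 0x123);
}
```

On the real system the store goes through the hardware %fs base, so if the kernel has not installed the requested value, the store itself faults. That would be consistent with the SEGV_MAPERR at rt0_go+388 shown above.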
Comment 1 Edward Tomasz Napierala freebsd_committer 2018-01-12 12:28:46 UTC
FWIW, reverting r313993 doesn't seem to fix it.
Comment 2 Edward Tomasz Napierala freebsd_committer 2018-01-12 13:50:23 UTC
Also FWIW, reverting r313993 seems to break dynamically linked golang binaries.
Comment 3 Conrad Meyer freebsd_committer 2018-01-12 20:06:46 UTC
What is the Linux code for runtime.settls?  And what is the value of %fs after the crash?  Can you share the Linux gofmt binary somewhere (freefall)?
Comment 4 Conrad Meyer freebsd_committer 2018-01-12 20:27:22 UTC
(In reply to Edward Tomasz Napierala from comment #1)
I think r313993 did (sort of) introduce this bug.  I'm curious why reverting it does not fix the issue.  I think one problem may be the lack of set_pcb_flags(pcb, PCB_FULL_IRET) in linux_set_cloned_tls().

Here's the problem: AMD64_SET_FSBASE expects a pointer to a pointer:

          case AMD64_SET_FSBASE:
                  error = copyin(uap->parms, &a64base, sizeof(a64base));
                  if (!error) {
                          if (a64base < VM_MAXUSER_ADDRESS) {
                                  set_pcb_flags(pcb, PCB_FULL_IRET);
                                  pcb->pcb_fsbase = a64base;
                                  td->td_frame->tf_fs = _ufssel;
                          } else
                                  error = EINVAL;
                  }
                  break;

linux_arch_prctl() after r313993 is just passing in the pointer value itself:

	case LINUX_ARCH_SET_FS:
		bsd_args.op = AMD64_SET_FSBASE;
		bsd_args.parms = (void *)args->addr;
		error = sysarch(td, &bsd_args);

Previously, it would set the value args->addr directly:


	case LINUX_ARCH_SET_FS:
		error = linux_set_cloned_tls(td, (void *)args->addr);
...
linux_set_cloned_tls(struct thread *td, void *desc)
{
...
	pcb = td->td_pcb;
	pcb->pcb_fsbase = (register_t)desc;


Please try this patch:

--- a/sys/amd64/linux/linux_machdep.c
+++ b/sys/amd64/linux/linux_machdep.c
@@ -234,14 +234,14 @@ linux_arch_prctl(struct thread *td, struct linux_arch_prctl_args *args)
        switch (args->code) {
        case LINUX_ARCH_SET_GS:
                bsd_args.op = AMD64_SET_GSBASE;
-               bsd_args.parms = (void *)args->addr;
+               bsd_args.parms = (void *)&args->addr;
                error = sysarch(td, &bsd_args);
                if (error == EINVAL)
                        error = EPERM;
                break;
        case LINUX_ARCH_SET_FS:
                bsd_args.op = AMD64_SET_FSBASE;
-               bsd_args.parms = (void *)args->addr;
+               bsd_args.parms = (void *)&args->addr;
                error = sysarch(td, &bsd_args);
                if (error == EINVAL)
                        error = EPERM;


I would also consider changing linux_set_cloned_tls to match sysarch() AMD64_SET_FSBASE:

@@ -271,6 +271,7 @@ linux_set_cloned_tls(struct thread *td, void *desc)
                return (EPERM);

        pcb = td->td_pcb;
+       set_pcb_flags(pcb, PCB_FULL_IRET);
        pcb->pcb_fsbase = (register_t)desc;
        td->td_frame->tf_fs = _ufssel;



Or better yet, just invoking sysarch() as well:

@@ -265,14 +265,13 @@ linux_arch_prctl(struct thread *td, struct linux_arch_prctl_args *args)
 int
 linux_set_cloned_tls(struct thread *td, void *desc)
 {
-       struct pcb *pcb;
-
-       if ((uint64_t)desc >= VM_MAXUSER_ADDRESS)
-               return (EPERM);
-
-       pcb = td->td_pcb;
-       pcb->pcb_fsbase = (register_t)desc;
-       td->td_frame->tf_fs = _ufssel;
+       struct sysarch_args bsd_args;
+       int error;

-       return (0);
+       bsd_args.op = AMD64_SET_FSBASE;
+       bsd_args.parms = (void *)&desc;
+       error = sysarch(td, &bsd_args);
+       if (error == EINVAL)
+               error = EPERM;
+       return (error);
 }
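
To make the value-versus-pointer mismatch concrete, here is a small userland sketch in C. It is only a model: set_fsbase_sim, call_with_value and call_with_pointer are hypothetical names, memcpy() stands in for copyin(), and the user/kernel address-space distinction that the real copyin() enforces is ignored entirely.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for pcb->pcb_fsbase. */
static uint64_t pcb_fsbase;

/*
 * Models the AMD64_SET_FSBASE handler: parms is the address of a variable
 * holding the new base. memcpy() stands in for copyin().
 */
static void
set_fsbase_sim(const void *parms)
{
	uint64_t base;

	memcpy(&base, parms, sizeof(base));
	pcb_fsbase = base;
}

/* Models the r313993 call: the desired base itself is passed as parms. */
static uint64_t
call_with_value(uint64_t *tls)
{
	set_fsbase_sim(tls);
	return (pcb_fsbase);
}

/* Models the patched call: the address of the desired base is passed. */
static uint64_t
call_with_pointer(uint64_t *tls)
{
	uint64_t addr = (uint64_t)(uintptr_t)tls;

	set_fsbase_sim(&addr);
	return (pcb_fsbase);
}
```

Given a TLS block whose first word is 0xdeadbeef, call_with_value() ends up installing 0xdeadbeef (the block's contents) as the base, whereas call_with_pointer() installs the block's address.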
Comment 5 Edward Tomasz Napierala freebsd_committer 2018-01-13 14:02:37 UTC
The first chunk (passing pointers to pointers) breaks the dynamically linked binary:

% ./go
cannot set up thread-local storage: cannot set %fs base address for thread-local storage

The ktrace output looks like this:

 64990 go       CALL  linux_arch_prctl(0x1002,0x800ac31c0)
 64990 go       RET   linux_arch_prctl -1 errno -14 Bad address
 64990 go       CALL  writev(0x2,0x7fffffffc410,0x3)
 64990 go       GIO   fd 2 wrote 89 bytes
       "cannot set up thread-local storage: cannot set %fs base address for thread-local storage
       "
 64990 go       RET   writev 89/0x59
 64990 go       CALL  linux_exit_group(0x7f)

It also doesn't fix the statically linked binary, although it changes the way it fails:
(gdb) run
Starting program: /usr/home/en322/aosp/prebuilts/go/linux-x86/bin/gofmt 

Program received signal SIGSEGV, Segmentation fault.
runtime.settls () at prebuilts/go/linux-x86/src/runtime/sys_linux_amd64.s:524
524     prebuilts/go/linux-x86/src/runtime/sys_linux_amd64.s: No such file or directory.
(gdb) where
#0  runtime.settls () at prebuilts/go/linux-x86/src/runtime/sys_linux_amd64.s:524
#1  0x0000000000453694 in runtime.rt0_go () at prebuilts/go/linux-x86/src/runtime/asm_amd64.s:145
#2  0x0000000000000000 in ?? ()
Comment 6 Edward Tomasz Napierala freebsd_committer 2018-01-13 14:04:03 UTC
Now, reverting r313993 and instead applying the second chunk (set_pcb_flags()) does seem to make both dynamic and static binaries work.
Comment 7 Edward Tomasz Napierala freebsd_committer 2018-01-13 14:09:08 UTC
If instead I revert r313993 and apply the final chunk (turning linux_set_cloned_tls() into a wrapper), we're back to both binaries crashing.
Comment 8 Edward Tomasz Napierala freebsd_committer 2018-01-13 14:12:59 UTC
The binaries in question are here:

https://people.freebsd.org/~trasz/go
https://people.freebsd.org/~trasz/gofmt
Comment 9 Conrad Meyer freebsd_committer 2018-01-13 19:29:28 UTC
(In reply to Edward Tomasz Napierala from comment #8)
Thanks.  I hope to take a deeper look soon.
Comment 10 chardon.frederic 2018-08-27 07:10:25 UTC
(In reply to Conrad Meyer from comment #9)

Any news on this? I hit the same problem; doing what trasz described in comment #6 solved the issue.
Thanks
Comment 11 Yanko Yankulov 2018-08-27 07:45:06 UTC
Hi guys,


I hit this a few weeks ago but hadn't noticed the ticket. The sysarch() call tries to copy its arguments in from userspace and fails because they are on the kernel stack. A working (but ugly) solution is to duplicate the code in linux_machdep:

--- a/sys/amd64/linux/linux_machdep.c
+++ b/sys/amd64/linux/linux_machdep.c
@@ -240,10 +240,14 @@ linux_arch_prctl(struct thread *td, struct linux_arch_prctl_args *args)
                        error = EPERM;
                break;
        case LINUX_ARCH_SET_FS:
-               bsd_args.op = AMD64_SET_FSBASE;
-               bsd_args.parms = (void *)args->addr;
-               error = sysarch(td, &bsd_args);
-               if (error == EINVAL)
+               if (args->addr < VM_MAXUSER_ADDRESS) {
+                       struct pcb *pcb = curthread->td_pcb;
+                       set_pcb_flags(pcb, PCB_FULL_IRET);
+                       pcb->pcb_fsbase = args->addr;
+                       td->td_frame->tf_fs = _ufssel;
+                       error = 0;
+               }
+               else
                        error = EPERM;

A better solution would be to change sysarch() to accept an additional parameter describing where the memory lives (userspace or kernel), but I haven't had time to explore this path yet.
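
As a rough illustration of that idea (not a proposed patch), the following userland C sketch models a fetch routine that is told where its argument lives. Everything here is hypothetical: enum arg_seg, user_mem, copyin_sim and fetch_base are invented names, and a static buffer plays the role of the user address space so that a "kernel" pointer can be rejected with EFAULT the way the trace in comment #5 shows.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical tag telling the callee where the argument memory lives. */
enum arg_seg { ARG_USERSPACE, ARG_SYSSPACE };

/* Simulated user address space; anything outside it counts as kernel. */
static unsigned char user_mem[64];

/* Returns nonzero if [p, p + len) lies entirely inside user_mem. */
static int
is_user_range(const void *p, size_t len)
{
	uintptr_t a = (uintptr_t)p;
	uintptr_t lo = (uintptr_t)user_mem;

	return (a >= lo && a + len <= lo + sizeof(user_mem));
}

/* Models copyin(): refuses source addresses outside simulated user memory. */
static int
copyin_sim(const void *uaddr, void *kaddr, size_t len)
{
	if (!is_user_range(uaddr, len))
		return (EFAULT);
	memcpy(kaddr, uaddr, len);
	return (0);
}

/* Fetches the new base, using a plain copy for kernel-resident arguments. */
static int
fetch_base(const void *parms, enum arg_seg seg, uint64_t *base)
{
	if (seg == ARG_SYSSPACE) {
		memcpy(base, parms, sizeof(*base));
		return (0);
	}
	return (copyin_sim(parms, base, sizeof(*base)));
}
```

In this model, a kernel-stack variable passed with ARG_USERSPACE fails with EFAULT, while the same variable passed with ARG_SYSSPACE is read directly. A caller such as linux_arch_prctl() would then use the second form.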

Hope this helps.