Bug 181328

Summary: GCC 4.2.1 20070831 may delay initialization of automatic struct variable too much
Product: Base System Reporter: Patrik Nyblom <patrik.nyblom>
Component: gnuAssignee: freebsd-bugs mailing list <bugs>
Status: Closed Overcome By Events    
Severity: Affects Only Me CC: emaste, pfg
Priority: Normal    
Version: 9.1-RELEASE   
Hardware: Any   
OS: Any   

Description Patrik Nyblom 2013-08-15 16:30:00 UTC
This problem was found on FreeBSD 9.1-RELEASE-p4 when building
Erlang/OTP. An automatic struct initialization is delayed beyond the
place where it's used. The code looks like this:

Eterm build_exec_return(Process *p, int rc, RestartContext *restartp, Eterm orig_subject) 
{
    Eterm res;
    Eterm *hp;
    if (rc <= 0) {
       ....
    } else {
	ReturnInfo *ri = restartp->ret_info; 
 	ReturnInfo defri = {RetNone,0,{0}};
	if (ri == NULL) {
	    ri = &defri;
	}
	if (ri->type == RetNone) {
	    ...
	} else if (ri->type == RetIndex){
	    ...
	} else {
	    ...
	}      
    ...
}

I.e. if restartp->ret_info == NULL, ri will be set to &defri, which is
initialized to {RetNone,0,{0}}.

What happened was that the first branch ( else if (ri->type ==
RetNone)) was *not* taken when restartp->ret_info was null. Instead
the default branch, i.e. the last else branch was taken.

This happens when -O2 or -O3 optimization is on. It goes away with the
slightest code change, so it's really hard to boil it down to a
smaller example. I have prepares two source files, one which is
basically gcc -E of the source file from the Erlang VM, and one that
is a main program and a lot of stubs to make a runnable program. Here
is a gdb session which shows the problem:

$ gcc  -Werror=return-type  -g  -O3 -fomit-frame-pointer  -Wall -Wstrict-prototypes -Wmissing-prototypes -Wdeclaration-after-statement -o t a.c b.c -lpthread
$ gdb t
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...
(gdb) break build_exec_return
Breakpoint 1 at 0x4012e0: file a.c, line 16411.
(gdb) run
Starting program: /ldisk/pan/report/t 
[New LWP 5286071]
[New Thread 801007400 (LWP 5286071/t)]
[Switching to Thread 801007400 (LWP 5286071/t)]

Breakpoint 1, build_exec_return (p=0x0, rc=1, restartp=0x7fffffffd970, 
    orig_subject=0) at a.c:16411
16411   {
(gdb) n
16414       if (rc <= 0) {
(gdb) 
16411   {
(gdb) 
16414       if (rc <= 0) {
(gdb) 
16429    ReturnInfo *ri = restartp->ret_info;
(gdb) 
16431    if (ri == ((void *)0)) {
(gdb) 
16434    if (ri->type == RetNone) {
(gdb) 
16431    if (ri == ((void *)0)) {
(gdb) 
16434    if (ri->type == RetNone) {
(gdb) 
16436    } else if (ri->type == RetIndex){
(gdb) 
16430    ReturnInfo defri = {RetNone,0,{0}};
(gdb) 
16436    } else if (ri->type == RetIndex){
(gdb) 
16501        if (restartp->flags & 0x1) {
(gdb) quit

As can be seen, the initialization of defri is done after the
comparision of ri->type, so, a little bit depending on what was in the
memory where defri resides, it will take any branc.  The else branch
beeing the most likely.

Looking at the assembler instructions:
0x00000000004012e0 <build_exec_return+0>:       push   %r15
0x00000000004012e2 <build_exec_return+2>:       mov    %rdx,%r15
0x00000000004012e5 <build_exec_return+5>:       push   %r14
0x00000000004012e7 <build_exec_return+7>:       push   %r13
0x00000000004012e9 <build_exec_return+9>:       push   %r12
0x00000000004012eb <build_exec_return+11>:      push   %rbp
0x00000000004012ec <build_exec_return+12>:      push   %rbx
0x00000000004012ed <build_exec_return+13>:      sub    $0x98,%rsp
0x00000000004012f4 <build_exec_return+20>:      test   %esi,%esi
0x00000000004012f6 <build_exec_return+22>:      mov    %rdi,0x10(%rsp)
0x00000000004012fb <build_exec_return+27>:      mov    %esi,0xc(%rsp)
0x00000000004012ff <build_exec_return+31>:      jle    0x4016e5 <build_exec_return+1029>
0x0000000000401305 <build_exec_return+37>:      mov    0x88(%rdx),%rbp
0x000000000040130c <build_exec_return+44>:      lea    0x70(%rsp),%rax
0x0000000000401311 <build_exec_return+49>:      mov    $0x490b,%ebx
0x0000000000401316 <build_exec_return+54>:      test   %rbp,%rbp
0x0000000000401319 <build_exec_return+57>:      cmove  %rax,%rbp
0x000000000040131d <build_exec_return+61>:      mov    0x0(%rbp),%eax
0x0000000000401320 <build_exec_return+64>:      cmp    $0x3,%eax
0x0000000000401323 <build_exec_return+67>:      je     0x401500 <build_exec_return+544>
0x0000000000401329 <build_exec_return+73>:      test   %eax,%eax
0x000000000040132b <build_exec_return+75>:      movl   $0x3,0x70(%rsp)
0x0000000000401333 <build_exec_return+83>:      movl   $0x0,0x74(%rsp)
0x000000000040133b <build_exec_return+91>:      movl   $0x0,0x78(%rsp)
0x0000000000401343 <build_exec_return+99>:      jne    0x401515 <build_exec_return+565>

We see the three instructions initializing the struct
(0x000000000040132b, 0x0000000000401333 and 0x000000000040133b)
happening after the cmp, but before the jne, which seems to me the
wrong place.

Doing even the slightest change, like adding an exit(1); to the else
branch, will instead give the following:

> gdb ./t
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...
(gdb) break build_exec_return
Breakpoint 1 at 0x4012e0: file a.c, line 16411.
(gdb) run
Starting program: /ldisk/pan/report/t 
[New LWP 5285172]
[New Thread 801007400 (LWP 5285172/t)]
[Switching to Thread 801007400 (LWP 5285172/t)]

Breakpoint 1, build_exec_return (p=0x0, rc=1, restartp=0x7fffffffd970, orig_subject=0) at a.c:16411
16411   {
(gdb) n
16414       if (rc <= 0) {
(gdb) 
16411   {
(gdb) 
16414       if (rc <= 0) {
(gdb) 
16429    ReturnInfo *ri = restartp->ret_info;
(gdb) 
16431    if (ri == ((void *)0)) {
(gdb) 
16430    ReturnInfo defri = {RetNone,0,{0}};
(gdb) 
16434    if (ri->type == RetNone) {
(gdb) 
16431    if (ri == ((void *)0)) {
(gdb) 
16434    if (ri->type == RetNone) {
(gdb) p *ri
$1 = {type = RetNone, num_spec = -126, v = {0}}

Here the type field is initialized before the comparision (the other
fields are also initialized, but only just before they are used). This
is as it should be. The assembler looks like this now:

0x00000000004012e0 <build_exec_return+0>:       push   %r15
0x00000000004012e2 <build_exec_return+2>:       push   %r14
0x00000000004012e4 <build_exec_return+4>:       mov    %rdi,%r14
0x00000000004012e7 <build_exec_return+7>:       push   %r13
0x00000000004012e9 <build_exec_return+9>:       push   %r12
0x00000000004012eb <build_exec_return+11>:      push   %rbp
0x00000000004012ec <build_exec_return+12>:      push   %rbx
0x00000000004012ed <build_exec_return+13>:      sub    $0x98,%rsp
0x00000000004012f4 <build_exec_return+20>:      test   %esi,%esi
0x00000000004012f6 <build_exec_return+22>:      mov    %esi,0x14(%rsp)
0x00000000004012fa <build_exec_return+26>:      mov    %rdx,0x8(%rsp)
0x00000000004012ff <build_exec_return+31>:      jle    0x401642 <build_exec_return+866>
0x0000000000401305 <build_exec_return+37>:      mov    0x88(%rdx),%rbp
0x000000000040130c <build_exec_return+44>:      lea    0x70(%rsp),%rax
0x0000000000401311 <build_exec_return+49>:      movl   $0x3,0x70(%rsp)
0x0000000000401319 <build_exec_return+57>:      mov    $0x490b,%ebx
0x000000000040131e <build_exec_return+62>:      test   %rbp,%rbp
0x0000000000401321 <build_exec_return+65>:      cmove  %rax,%rbp
0x0000000000401325 <build_exec_return+69>:      mov    0x0(%rbp),%eax
0x0000000000401328 <build_exec_return+72>:      cmp    $0x3,%eax
0x000000000040132b <build_exec_return+75>:      je     0x4014b1 <build_exec_return+465>
0x0000000000401331 <build_exec_return+81>:      test   %eax,%eax
0x0000000000401333 <build_exec_return+83>:      movl   $0x0,0x74(%rsp)
0x000000000040133b <build_exec_return+91>:      movl   $0x0,0x78(%rsp)
0x0000000000401343 <build_exec_return+99>:      jne    0x4014c6 <build_exec_return+486>

The type field is initialized (0x0000000000401311) before the first
cmp (0x0000000000401328), so everything is fine. The rest is
initialized when needed.

As i mentioned, the bug is very elusive. I've prepared a boiled down
set of files, including a make file, which can be used to repeat the
first gdb session. The program dums core when the wrong branch is
taken, but prints out a message if the program behaves as
expected. The tar file with the two c files and a Makefile can be
fetched from http://erlang.org/~pan/BSD_GCC_trouble_report.tar The set
of files is too large to be accepted as a shar attachment to the
trouble report.

Sorry for the bulky trouble report, that I cannot be more specific and
that I cannot find a smaller example!

Cheers,
Patrik

How-To-Repeat: A tar file with the test program can be fetched from:
http://erlang.org/~pan/BSD_GCC_trouble_report.tar

Untar, make and run the program with gdb, break in build_exec_return,
run and step through the function, as in the description above. The
defri struct will not be initialized before ri is dereferenced and the
program will most probably take the wrong branch. Just running the
program gives segmentation fault.
Comment 1 Pedro F. Giffuni freebsd_committer 2015-01-22 15:49:30 UTC
Thank you for taking the time to report this.
As of FreeBSD 10.x we are not shipping the old gcc 4.2.1 by default and the old gcc in base for the 9.x series will not be receiving major updates.

FWIW, On FreeBSD 10.1, using clang, I got this:

$ make
cc  -Werror=return-type  -g  -O3 -fomit-frame-pointer  -Wall -Wstrict-prototypes -Wmissing-prototypes -Wdeclaration-after-statement -o t a.c b.c -lpthread
a.c:3249:8: error: unsupported inline asm: input with type 'ethr_sint_t'
      (aka 'long') matching output with type 'char'
   "3"(new[1]),
       ^~~~~~
1 error generated.
b.c:3252:8: error: unsupported inline asm: input with type 'ethr_sint_t'
      (aka 'long') matching output with type 'char'
   "3"(new[1]),
       ^~~~~~
1 error generated.
*** Error code 1

Stop.

Perhaps you could try newer gcc or clang for ports?
Comment 2 Eitan Adler freebsd_committer freebsd_triage 2018-05-20 23:50:26 UTC
For bugs matching the following conditions:
- Status == In Progress
- Assignee == "bugs@FreeBSD.org"
- Last Modified Year <= 2017

Do
- Set Status to "Open"
Comment 3 Ed Maste freebsd_committer 2019-08-12 18:01:01 UTC
Close per discussion in comment 1.