This problem was found on FreeBSD 9.1-RELEASE-p4 when building Erlang/OTP. An automatic struct initialization is delayed beyond the place where it's used. The code looks like this: Eterm build_exec_return(Process *p, int rc, RestartContext *restartp, Eterm orig_subject) { Eterm res; Eterm *hp; if (rc <= 0) { .... } else { ReturnInfo *ri = restartp->ret_info; ReturnInfo defri = {RetNone,0,{0}}; if (ri == NULL) { ri = &defri; } if (ri->type == RetNone) { ... } else if (ri->type == RetIndex){ ... } else { ... } ... } I.e. if restartp->ret_info == NULL, ri will be set to &defri, which is initialized to {RetNone,0,{0}}. What happened was that the first branch ( else if (ri->type == RetNone)) was *not* taken when restartp->ret_info was null. Instead the default branch, i.e. the last else branch was taken. This happens when -O2 or -O3 optimization is on. It goes away with the slightest code change, so it's really hard to boil it down to a smaller example. I have prepares two source files, one which is basically gcc -E of the source file from the Erlang VM, and one that is a main program and a lot of stubs to make a runnable program. Here is a gdb session which shows the problem: $ gcc -Werror=return-type -g -O3 -fomit-frame-pointer -Wall -Wstrict-prototypes -Wmissing-prototypes -Wdeclaration-after-statement -o t a.c b.c -lpthread $ gdb t GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "amd64-marcel-freebsd"... (gdb) break build_exec_return Breakpoint 1 at 0x4012e0: file a.c, line 16411. (gdb) run Starting program: /ldisk/pan/report/t [New LWP 5286071] [New Thread 801007400 (LWP 5286071/t)] [Switching to Thread 801007400 (LWP 5286071/t)] Breakpoint 1, build_exec_return (p=0x0, rc=1, restartp=0x7fffffffd970, orig_subject=0) at a.c:16411 16411 { (gdb) n 16414 if (rc <= 0) { (gdb) 16411 { (gdb) 16414 if (rc <= 0) { (gdb) 16429 ReturnInfo *ri = restartp->ret_info; (gdb) 16431 if (ri == ((void *)0)) { (gdb) 16434 if (ri->type == RetNone) { (gdb) 16431 if (ri == ((void *)0)) { (gdb) 16434 if (ri->type == RetNone) { (gdb) 16436 } else if (ri->type == RetIndex){ (gdb) 16430 ReturnInfo defri = {RetNone,0,{0}}; (gdb) 16436 } else if (ri->type == RetIndex){ (gdb) 16501 if (restartp->flags & 0x1) { (gdb) quit As can be seen, the initialization of defri is done after the comparision of ri->type, so, a little bit depending on what was in the memory where defri resides, it will take any branc. The else branch beeing the most likely. Looking at the assembler instructions: 0x00000000004012e0 <build_exec_return+0>: push %r15 0x00000000004012e2 <build_exec_return+2>: mov %rdx,%r15 0x00000000004012e5 <build_exec_return+5>: push %r14 0x00000000004012e7 <build_exec_return+7>: push %r13 0x00000000004012e9 <build_exec_return+9>: push %r12 0x00000000004012eb <build_exec_return+11>: push %rbp 0x00000000004012ec <build_exec_return+12>: push %rbx 0x00000000004012ed <build_exec_return+13>: sub $0x98,%rsp 0x00000000004012f4 <build_exec_return+20>: test %esi,%esi 0x00000000004012f6 <build_exec_return+22>: mov %rdi,0x10(%rsp) 0x00000000004012fb <build_exec_return+27>: mov %esi,0xc(%rsp) 0x00000000004012ff <build_exec_return+31>: jle 0x4016e5 <build_exec_return+1029> 0x0000000000401305 <build_exec_return+37>: mov 0x88(%rdx),%rbp 0x000000000040130c <build_exec_return+44>: lea 0x70(%rsp),%rax 0x0000000000401311 <build_exec_return+49>: mov $0x490b,%ebx 0x0000000000401316 <build_exec_return+54>: test %rbp,%rbp 0x0000000000401319 <build_exec_return+57>: cmove %rax,%rbp 0x000000000040131d <build_exec_return+61>: mov 0x0(%rbp),%eax 0x0000000000401320 <build_exec_return+64>: cmp $0x3,%eax 0x0000000000401323 <build_exec_return+67>: je 0x401500 <build_exec_return+544> 0x0000000000401329 <build_exec_return+73>: test %eax,%eax 0x000000000040132b <build_exec_return+75>: movl $0x3,0x70(%rsp) 0x0000000000401333 <build_exec_return+83>: movl $0x0,0x74(%rsp) 0x000000000040133b <build_exec_return+91>: movl $0x0,0x78(%rsp) 0x0000000000401343 <build_exec_return+99>: jne 0x401515 <build_exec_return+565> We see the three instructions initializing the struct (0x000000000040132b, 0x0000000000401333 and 0x000000000040133b) happening after the cmp, but before the jne, which seems to me the wrong place. Doing even the slightest change, like adding an exit(1); to the else branch, will instead give the following: > gdb ./t GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "amd64-marcel-freebsd"... (gdb) break build_exec_return Breakpoint 1 at 0x4012e0: file a.c, line 16411. (gdb) run Starting program: /ldisk/pan/report/t [New LWP 5285172] [New Thread 801007400 (LWP 5285172/t)] [Switching to Thread 801007400 (LWP 5285172/t)] Breakpoint 1, build_exec_return (p=0x0, rc=1, restartp=0x7fffffffd970, orig_subject=0) at a.c:16411 16411 { (gdb) n 16414 if (rc <= 0) { (gdb) 16411 { (gdb) 16414 if (rc <= 0) { (gdb) 16429 ReturnInfo *ri = restartp->ret_info; (gdb) 16431 if (ri == ((void *)0)) { (gdb) 16430 ReturnInfo defri = {RetNone,0,{0}}; (gdb) 16434 if (ri->type == RetNone) { (gdb) 16431 if (ri == ((void *)0)) { (gdb) 16434 if (ri->type == RetNone) { (gdb) p *ri $1 = {type = RetNone, num_spec = -126, v = {0}} Here the type field is initialized before the comparision (the other fields are also initialized, but only just before they are used). This is as it should be. The assembler looks like this now: 0x00000000004012e0 <build_exec_return+0>: push %r15 0x00000000004012e2 <build_exec_return+2>: push %r14 0x00000000004012e4 <build_exec_return+4>: mov %rdi,%r14 0x00000000004012e7 <build_exec_return+7>: push %r13 0x00000000004012e9 <build_exec_return+9>: push %r12 0x00000000004012eb <build_exec_return+11>: push %rbp 0x00000000004012ec <build_exec_return+12>: push %rbx 0x00000000004012ed <build_exec_return+13>: sub $0x98,%rsp 0x00000000004012f4 <build_exec_return+20>: test %esi,%esi 0x00000000004012f6 <build_exec_return+22>: mov %esi,0x14(%rsp) 0x00000000004012fa <build_exec_return+26>: mov %rdx,0x8(%rsp) 0x00000000004012ff <build_exec_return+31>: jle 0x401642 <build_exec_return+866> 0x0000000000401305 <build_exec_return+37>: mov 0x88(%rdx),%rbp 0x000000000040130c <build_exec_return+44>: lea 0x70(%rsp),%rax 0x0000000000401311 <build_exec_return+49>: movl $0x3,0x70(%rsp) 0x0000000000401319 <build_exec_return+57>: mov $0x490b,%ebx 0x000000000040131e <build_exec_return+62>: test %rbp,%rbp 0x0000000000401321 <build_exec_return+65>: cmove %rax,%rbp 0x0000000000401325 <build_exec_return+69>: mov 0x0(%rbp),%eax 0x0000000000401328 <build_exec_return+72>: cmp $0x3,%eax 0x000000000040132b <build_exec_return+75>: je 0x4014b1 <build_exec_return+465> 0x0000000000401331 <build_exec_return+81>: test %eax,%eax 0x0000000000401333 <build_exec_return+83>: movl $0x0,0x74(%rsp) 0x000000000040133b <build_exec_return+91>: movl $0x0,0x78(%rsp) 0x0000000000401343 <build_exec_return+99>: jne 0x4014c6 <build_exec_return+486> The type field is initialized (0x0000000000401311) before the first cmp (0x0000000000401328), so everything is fine. The rest is initialized when needed. As i mentioned, the bug is very elusive. I've prepared a boiled down set of files, including a make file, which can be used to repeat the first gdb session. The program dums core when the wrong branch is taken, but prints out a message if the program behaves as expected. The tar file with the two c files and a Makefile can be fetched from http://erlang.org/~pan/BSD_GCC_trouble_report.tar The set of files is too large to be accepted as a shar attachment to the trouble report. Sorry for the bulky trouble report, that I cannot be more specific and that I cannot find a smaller example! Cheers, Patrik How-To-Repeat: A tar file with the test program can be fetched from: http://erlang.org/~pan/BSD_GCC_trouble_report.tar Untar, make and run the program with gdb, break in build_exec_return, run and step through the function, as in the description above. The defri struct will not be initialized before ri is dereferenced and the program will most probably take the wrong branch. Just running the program gives segmentation fault.
Thank you for taking the time to report this. As of FreeBSD 10.x we are not shipping the old gcc 4.2.1 by default and the old gcc in base for the 9.x series will not be receiving major updates. FWIW, On FreeBSD 10.1, using clang, I got this: $ make cc -Werror=return-type -g -O3 -fomit-frame-pointer -Wall -Wstrict-prototypes -Wmissing-prototypes -Wdeclaration-after-statement -o t a.c b.c -lpthread a.c:3249:8: error: unsupported inline asm: input with type 'ethr_sint_t' (aka 'long') matching output with type 'char' "3"(new[1]), ^~~~~~ 1 error generated. b.c:3252:8: error: unsupported inline asm: input with type 'ethr_sint_t' (aka 'long') matching output with type 'char' "3"(new[1]), ^~~~~~ 1 error generated. *** Error code 1 Stop. Perhaps you could try newer gcc or clang for ports?
For bugs matching the following conditions: - Status == In Progress - Assignee == "bugs@FreeBSD.org" - Last Modified Year <= 2017 Do - Set Status to "Open"
Close per discussion in comment 1.