On a recent FreeBSD stable/11 the rescue binary content varies depending on the number of cpu cores. With a 2 core system I can reproduce the binaries from the official 11.4-BETA2 release, with a 4 core system I get different results: SHA256 (/images/FreeBSD/FreeBSD-11.4-BETA2-amd64-r361067-official/rescue/cp) = 3c96ab697fc721c226f99971863bcadbbbc73ccfec1abe2f04cbce60e3276a89 SHA256 (/images/FreeBSD/FreeBSD-11.4-BETA2-amd64-r361067-rebuild-2cores-t520/rescue/cp) = 3c96ab697fc721c226f99971863bcadbbbc73ccfec1abe2f04cbce60e3276a89 SHA256 (/images/FreeBSD/FreeBSD-11.4-BETA2-amd64-r361067-rebuild-4cores/rescue/cp) = 06e0b2964901fa02ed2f4e02c1334cb82ce1d97f4a6fc7a2632a6f74b35955d1 Diffoscope seems to indicate that some offsets are different. The output starts with: --- FreeBSD/FreeBSD-11.4-BETA2-amd64-r361067-official/rescue/cp +++ FreeBSD/FreeBSD-11.4-BETA2-amd64-r361067-rebuild-4cores/rescue/cp ├── readelf --wide --sections {} │ @@ -1,16 +1,16 @@ │ There are 19 section headers, starting at offset 0xa18bb0: │ │ Section Headers: │ [Nr] Name Type Addr Off Size ES Flg Lk Inf Al │ [ 0] NULL 0000000000000000 000000 000000 00 0 0 0 │ [ 1] .note.tag NOTE 0000000000400190 000190 000030 00 A 0 0 4 │ [ 2] .init PROGBITS 00000000004001c0 0001c0 000018 00 AX 0 0 4 │ - [ 3] .text PROGBITS 0000000000400200 000200 6dab08 00 AX 0 0 64 │ - [ 4] .fini PROGBITS 0000000000adad08 6dad08 00000e 00 AX 0 0 4 │ + [ 3] .text PROGBITS 0000000000400200 000200 6daa88 00 AX 0 0 64 │ + [ 4] .fini PROGBITS 0000000000adac88 6dac88 00000e 00 AX 0 0 4 │ [ 5] .rodata PROGBITS 0000000000adb000 6db000 216bd8 00 A 0 0 4096 │ [ 6] .eh_frame_hdr PROGBITS 0000000000cf1bd8 8f1bd8 0245ec 00 A 0 0 4 │ [ 7] .eh_frame X86_64_UNWIND 0000000000d161c8 9161c8 0bb90c 00 A 0 0 8 │ [ 8] .tdata PROGBITS 0000000000fd2000 9d2000 000068 00 WAT 0 0 16 │ [ 9] .tbss NOBITS 0000000000fd2070 9d2068 000840 00 WAT 0 0 16 │ [10] .ctors PROGBITS 0000000000fd2070 9d2070 0000c0 00 WA 0 0 8 │ [11] .dtors PROGBITS 0000000000fd2130 9d2130 000018 00 WA 0 0 8 ├── readelf --wide --debug-dump=frames {} │ @@ -9894,21179 +9894,21195 @@ │ DW_CFA_nop │ DW_CFA_nop │ DW_CFA_nop │ DW_CFA_nop │ DW_CFA_nop │ DW_CFA_nop │ │ -00005ee0 0000002c 00005ee4 FDE cie=00000000 pc=0043c780..0043d57c │ +00005ee0 0000002c 00005ee4 FDE cie=00000000 pc=0043c780..0043d4c3 │ DW_CFA_advance_loc: 1 to 0043c781 │ DW_CFA_def_cfa_offset: 16 │ DW_CFA_offset: r6 (rbp) at cfa-16 │ DW_CFA_advance_loc: 3 to 0043c784 │ DW_CFA_def_cfa_register: r6 (rbp) │ DW_CFA_advance_loc: 16 to 0043c794 │ DW_CFA_offset: r3 (rbx) at cfa-56 │ DW_CFA_offset: r12 (r12) at cfa-48 │ DW_CFA_offset: r13 (r13) at cfa-40 │ DW_CFA_offset: r14 (r14) at cfa-32 │ DW_CFA_offset: r15 (r15) at cfa-24 │ - DW_CFA_advance_loc2: 3554 to 0043d576 │ + DW_CFA_advance_loc2: 3369 to 0043d4bd │ DW_CFA_def_cfa: r7 (rsp) ofs 8 │ - DW_CFA_advance_loc: 1 to 0043d577 │ + DW_CFA_advance_loc: 1 to 0043d4be │ DW_CFA_def_cfa: r6 (rbp) ofs 16 │ DW_CFA_nop │ DW_CFA_nop The whole diff in text format has a size of 217M so I'm not attaching it. I first saw the issue after rebasing ElectroBSD on r360524 and confirmed that it still exists in r360986. The problem does not exist in r357496 and earlier but I don't yet know the exact revision when the problem was introduced. I can't reproduce the issue on FreeBSD-13.0-CURRENT amd64 using the 20200514-r361019 snapshot. I haven't tested 12/stable.
Notably, .text is different (different size 0x6dab08 vs 0x6daa88). The relocations and framedata seem to correspond to that (though the section-level diff does not show any difference in .eh_frame?).
diffoscope should disassemble the .text sections and compare the result; would you be able to extract and post a few dozen lines from where .text diverges? Is this unaffected by -j? Are there other differences between the 2 core and 4 core systems?
Created attachment 214763 [details] First 600 lines of the text section difference
I'm always compiling with -j2. Once I noticed the issue using completely different machines I reproduced it with bhyve changing only the core number.
Ah, the early differences are just different offsets due to later changes, │ - 40024a: e8 21 94 69 00 callq 0xa99670 │ + 40024a: e8 a1 93 69 00 callq 0xa995f0 Do you have somewhere you could share the full text diff? Or, find the first non-callq difference, perhaps around 0x9f....
I uploaded the whole compressed diff to: https://www.fabiankeil.de/tmp/FreeBSD-11.4-BETA2-amd64-r361067-rescue-cp.diff.gz (33 MB)
Diff seems to start here (I reordered lines for clarity): │ @@ -63992,183658 +63992,183673 @@ ... │ - 43c7b5: 49 89 cd mov %rcx,%r13 │ + 43c7b5: 49 89 cf mov %rcx,%r15 // ^^ r13 -> r15 │ - 43c7b8: 49 29 c5 sub %rax,%r13 │ + 43c7b8: 49 29 c7 sub %rax,%r15 │ - 43c7bb: 4c 89 ec mov %r13,%rsp │ + 43c7bb: 4c 89 fc mov %r15,%rsp ... │ - 43c803: 4c 63 c2 movslq %edx,%r8 │ + 43c803: 4c 63 da movslq %edx,%r11 // ^^ r8 -> r11 │ - 43c806: 49 c1 e0 03 shl $0x3,%r8 │ + 43c806: 49 c1 e3 03 shl $0x3,%r11 │ - 43c80a: 4c 03 05 e7 7a bc 00 add 0xbc7ae7(%rip),%r8 # 0x10042f8 │ + 43c80a: 4c 03 1d e7 7a bc 00 add 0xbc7ae7(%rip),%r11 # 0x10042f8 │ - 43c811: 4c 89 05 d0 7a bc 00 mov %r8,0xbc7ad0(%rip) # 0x10042e8 │ + 43c811: 4c 89 1d d0 7a bc 00 mov %r11,0xbc7ad0(%rip) # 0x10042e8 │ - 43c818: 4c 3b 05 d1 7a bc 00 cmp 0xbc7ad1(%rip),%r8 # 0x10042f0 │ + 43c818: 4c 3b 1d d1 7a bc 00 cmp 0xbc7ad1(%rip),%r11 # 0x10042f0 │ 43c81f: 76 07 jbe 0x43c828 │ - 43c821: 4c 89 05 c8 7a bc 00 mov %r8,0xbc7ac8(%rip) # 0x10042f0 │ + 43c821: 4c 89 1d c8 7a bc 00 mov %r11,0xbc7ac8(%rip) # 0x10042f0 These particular changes are not the ones that affect code *size* (those must be later), but do show reproducibility problems.
First code-size change is shortly afterwards: │ - 43c83e: 45 31 c0 xor %r8d,%r8d │ - 43c841: 44 8a 3b mov (%rbx),%r15b │ + 43c83e: 45 31 db xor %r11d,%r11d │ + 43c841: 8a 0b mov (%rbx),%cl %cl (low 8 bits of %ecx) has a shorter mov.b representation than %r15b (low 8 bits of amd64 %r15).
So clearly register selection is non-deterministic for some reason. Perhaps some optimization pass runs for a non-deterministic amount of time or % of system memory, and scaling -j impacts that?
Here is where the interesting parts start - I've rearranged the diff -/+ to make it easier to see where differences exist: │ - 43c803: 4c 63 c2 movslq %edx,%r8 │ + 43c803: 4c 63 da movslq %edx,%r11 │ - 43c806: 49 c1 e0 03 shl $0x3,%r8 │ + 43c806: 49 c1 e3 03 shl $0x3,%r11 │ - 43c80a: 4c 03 05 e7 7a bc 00 add 0xbc7ae7(%rip),%r8 # 0x10042f8 │ + 43c80a: 4c 03 1d e7 7a bc 00 add 0xbc7ae7(%rip),%r11 # 0x10042f8 │ - 43c811: 4c 89 05 d0 7a bc 00 mov %r8,0xbc7ad0(%rip) # 0x10042e8 │ + 43c811: 4c 89 1d d0 7a bc 00 mov %r11,0xbc7ad0(%rip) # 0x10042e8 │ - 43c818: 4c 3b 05 d1 7a bc 00 cmp 0xbc7ad1(%rip),%r8 # 0x10042f0 │ + 43c818: 4c 3b 1d d1 7a bc 00 cmp 0xbc7ad1(%rip),%r11 # 0x10042f0 │ 43c81f: 76 07 jbe 0x43c828 │ - 43c821: 4c 89 05 c8 7a bc 00 mov %r8,0xbc7ac8(%rip) # 0x10042f0 │ + 43c821: 4c 89 1d c8 7a bc 00 mov %r11,0xbc7ac8(%rip) # 0x10042f0 │ 43c828: 48 b8 00 00 00 00 01 mov $0x100000000,%rax │ 43c82f: 00 00 00 │ 43c832: 49 01 c6 add %rax,%r14 │ 43c835: 49 c1 fe 20 sar $0x20,%r14 │ 43c839: 4c 01 f3 add %r14,%rbx │ 43c83c: eb 03 jmp 0x43c841 │ - 43c83e: 45 31 c0 xor %r8d,%r8d │ + 43c83e: 45 31 db xor %r11d,%r11d │ - 43c841: 44 8a 3b mov (%rbx),%r15b │ + 43c841: 8a 0b mov (%rbx),%cl │ - 43c844: 45 31 c9 xor %r9d,%r9d │ + 43c843: 45 31 c9 xor %r9d,%r9d │ - 43c847: 45 84 ff test %r15b,%r15b │ + 43c846: 84 c9 test %cl,%cl │ + 43c848: 4c 89 ad 00 ff ff ff mov %r13,-0x100(%rbp) │ - 43c84a: 74 60 je 0x43c8ac │ + 43c84f: 74 4f je 0x43c8a0 │ - 43c84c: 45 31 f6 xor %r14d,%r14d │ + 43c851: 45 31 e4 xor %r12d,%r12d │ - 43c84f: 48 b8 01 00 00 00 89 mov $0x1288900000001,%rax │ - 43c856: 28 01 00 │ + 43c854: 48 b8 01 00 00 00 89 mov $0x1288900000001,%rax │ + 43c85b: 28 01 00 │ - 43c859: 0f 1f 80 00 00 00 00 nopl 0x0(%rax) │ + 43c85e: 66 90 xchg %ax,%ax │ 43c860: ba 01 00 00 00 mov $0x1,%edx │ - 43c865: 44 89 f9 mov %r15d,%ecx │ - 43c868: 48 d3 e2 shl %cl,%rdx │ + 43c865: 48 d3 e2 shl %cl,%rdx │ - 43c86b: 41 80 ff 3f cmp $0x3f,%r15b │ + 43c868: 80 f9 3f cmp $0x3f,%cl │ - 43c86f: 0f 87 73 01 00 00 ja 0x43c9e8 │ + 43c86b: 0f 87 60 01 00 00 ja 0x43c9d1 │ - 43c875: 48 21 c2 and %rax,%rdx │ + 43c871: 48 21 c2 and %rax,%rdx │ - 43c878: 0f 84 6a 01 00 00 je 0x43c9e8 │ + 43c874: 0f 84 57 01 00 00 je 0x43c9d1 │ - 43c87e: 47 88 7c 35 01 mov %r15b,0x1(%r13,%r14,1) │ + 43c87a: 43 88 4c 27 01 mov %cl,0x1(%r15,%r12,1) │ - 43c883: 43 c6 44 35 02 00 movb $0x0,0x2(%r13,%r14,1) │ + 43c87f: 43 c6 44 27 02 00 movb $0x0,0x2(%r15,%r12,1) │ - 43c889: 46 0f b6 7c 33 01 movzbl 0x1(%rbx,%r14,1),%r15d │ + 43c885: 42 0f b6 4c 23 01 movzbl 0x1(%rbx,%r12,1),%ecx │ - 43c88f: 49 83 c6 01 add $0x1,%r14 │ + 43c88b: 49 83 c4 01 add $0x1,%r12 │ + 43c88f: 84 c9 test %cl,%cl │ - 43c803: 4c 63 c2 movslq %edx,%r8 │ + 43c803: 4c 63 da movslq %edx,%r11 │ - 43c806: 49 c1 e0 03 shl $0x3,%r8 │ + 43c806: 49 c1 e3 03 shl $0x3,%r11 │ - 43c80a: 4c 03 05 e7 7a bc 00 add 0xbc7ae7(%rip),%r8 # 0x10042f8 │ + 43c80a: 4c 03 1d e7 7a bc 00 add 0xbc7ae7(%rip),%r11 # 0x10042f8 │ - 43c811: 4c 89 05 d0 7a bc 00 mov %r8,0xbc7ad0(%rip) # 0x10042e8 │ + 43c811: 4c 89 1d d0 7a bc 00 mov %r11,0xbc7ad0(%rip) # 0x10042e8 │ - 43c818: 4c 3b 05 d1 7a bc 00 cmp 0xbc7ad1(%rip),%r8 # 0x10042f0 │ + 43c818: 4c 3b 1d d1 7a bc 00 cmp 0xbc7ad1(%rip),%r11 # 0x10042f0 │ 43c81f: 76 07 jbe 0x43c828 │ - 43c821: 4c 89 05 c8 7a bc 00 mov %r8,0xbc7ac8(%rip) # 0x10042f0 │ + 43c821: 4c 89 1d c8 7a bc 00 mov %r11,0xbc7ac8(%rip) # 0x10042f0 │ - 43c865: 44 89 f9 mov %r15d,%ecx │ - 43c868: 48 d3 e2 shl %cl,%rdx │ + 43c865: 48 d3 e2 shl %cl,%rdx │ - 43c86b: 41 80 ff 3f cmp $0x3f,%r15b │ + 43c868: 80 f9 3f cmp $0x3f,%cl │ - 43c86f: 0f 87 73 01 00 00 ja 0x43c9e8 │ + 43c86b: 0f 87 60 01 00 00 ja 0x43c9d1 │ - 43c875: 48 21 c2 and %rax,%rdx │ + 43c871: 48 21 c2 and %rax,%rdx │ - 43c878: 0f 84 6a 01 00 00 je 0x43c9e8 │ + 43c874: 0f 84 57 01 00 00 je 0x43c9d1 │ - 43c87e: 47 88 7c 35 01 mov %r15b,0x1(%r13,%r14,1) │ + 43c87a: 43 88 4c 27 01 mov %cl,0x1(%r15,%r12,1) I have trouble figuring out how core counts would affect the compiler in this way, though.
(In reply to Conrad Meyer from comment #9) Perhaps this is compiler nondeterminism introduced in Clang 10?
Beats me; I think dim@ might have more of a clue. You or dim@ probably have better channels for inquiring further. :-)
This is an interesting bug, but what is the easiest way to reproduce? Does anybody know which particular .c file, or even which particular function is apparently compiled differently? Note that clang is really single threaded, so I don't understand how the number of cores can affect the result. Maybe the non-determinism (if it really exists) is independent of the number of cores, but changing that influences stack allocations, etc? And is this *only* on 11.x? Because that is still using BFD ld, as far as I know. Although the linker shouldn't matter...
I currently use the lazy approach and build complete releases. Obviously this makes bisecting slow but I haven't had time to simplify the steps yet. I've only seen the issue on 11.x but haven't tested 12.x.
Are people using -j here to build /usr/src/rescue, and if so, with which setting? I've run a plain 'make' on a VM with 4 cores at first, then 2 cores, but the resulting /usr/obj/usr/src/rescue directories are bit-identical then. Maybe I should retry with -j4 when I configure 4 cores, and -j2 when I configure 2 cores?
Okay, I can reproduce, and it appears to be something in usr.bin/printf/printf.c, which gets built as part of sh. In particular, in the printf_doformat function. One version gives: 0000000000000440 <printf_doformat>: 440: 55 push %rbp 441: 48 89 e5 mov %rsp,%rbp 444: 41 57 push %r15 446: 41 56 push %r14 448: 41 55 push %r13 44a: 41 54 push %r12 44c: 53 push %rbx 44d: 48 81 ec e8 00 00 00 sub $0xe8,%rsp 454: 49 89 f4 mov %rsi,%r12 457: 48 89 fb mov %rdi,%rbx 45a: 48 8b 05 00 00 00 00 mov 0x0(%rip),%rax # 461 <printf_doformat+0x21> 461: 48 89 45 d0 mov %rax,-0x30(%rbp) 465: e8 00 00 00 00 callq 46a <printf_doformat+0x2a> another: 0000000000000440 <printf_doformat>: 440: 55 push %rbp 441: 48 89 e5 mov %rsp,%rbp 444: 41 57 push %r15 446: 41 56 push %r14 448: 41 55 push %r13 44a: 41 54 push %r12 44c: 53 push %rbx 44d: 48 81 ec e8 00 00 00 sub $0xe8,%rsp 454: 49 89 f5 mov %rsi,%r13 457: 48 89 fb mov %rdi,%rbx 45a: 48 8b 05 00 00 00 00 mov 0x0(%rip),%rax # 461 <printf_doformat+0x21> 461: 48 89 45 d0 mov %rax,-0x30(%rbp) 465: e8 00 00 00 00 callq 46a <printf_doformat+0x2a> The C function starts with: static char * printf_doformat(char *fmt, int *rval) { static const char skip1[] = "#'-+ 0"; int fieldwidth, haveprec, havewidth, mod_ldbl, precision; char convch, nextch; char start[strlen(fmt) + 1]; char **fargv; char *dptr; int l; and the starting piece of code corresponds to the dynamic array declaration, char start[strlen(fmt) + 1]. As to why this is happening, no clue yet... I'm trying to build clang with UBSan, to see if it hits something in particular.
In the initial report I wrote: > I first saw the issue after rebasing ElectroBSD on > r360524 and confirmed that it still exists in r360986. As it turns out that wasn't entirely correct. While both revisions result in base.txz differences with different cpu cores, the rescue binaries in r360524 are reproducible but the /bin/sh differs. Presumably because of the printf.c issue found by Dimitry. In r360986 the /bin/sh issue is already gone. I'll continue bisecting ...
So after quite a bit of head scratching, I found that the problem appears to be caused by upstream commit https://github.com/llvm/llvm-project/commit/b4a99a061f517e60985667e39519f60186cbb469: commit b4a99a061f517e60985667e39519f60186cbb469 Author: Alexandre Ganea <alexandre.ganea@ubisoft.com> Date: Mon Jan 13 10:40:04 2020 -0500 [Clang][Driver] Re-use the calling process instead of creating a new process for the cc1 invocation With this patch, the clang tool will now call the -cc1 invocation directly inside the same process. Previously, the -cc1 invocation was creating, and waiting for, a new process. This patch therefore reduces the number of created processes during a build, thus it reduces build times on platforms where process creation can be costly (Windows) and/or impacted by a antivirus. It also makes debugging a bit easier, as there's no need to attach to the secondary -cc1 process anymore, breakpoints will be hit inside the same process. Crashes or signaling inside the -cc1 invocation will have the same side-effect as before, and will be reported through the same means. This behavior can be controlled at compile-time through the CLANG_SPAWN_CC1 cmake flag, which defaults to OFF. Setting it to ON will revert to the previous behavior, where any -cc1 invocation will create/fork a secondary process. At run-time, it is also possible to tweak the CLANG_SPAWN_CC1 environment variable. Setting it and will override the compile-time setting. A value of 0 calls -cc1 inside the calling process; a value of 1 will create a secondary process, as before. Differential Revision: https://reviews.llvm.org/D69825 For some reason, not spawning a second process for the cc1 stage has this particular influence on the end result, and *only* for the printf.c file! I'm still digging further into this rabbit hole, but for now a good workaround is to set the environment variable CLANG_SPAWN_CC1=1. This forces the clang driver to fork and exec a new instance for the cc1 stage, and clears up the problem completely for me.
Thanks a lot for looking into this, Dimitry. Which stable/11 revision are you currently testing on? I tried the work around with a system based on r360986 and still ended up with unreproducible rescue binaries like before. I'm currently trying it with a system based on r360524 to see if it makes /bin/sh reproducible for me.
(In reply to Fabian Keil from comment #19) > Which stable/11 revision are you currently testing on? On head, base r361410 (as of 2020-05-23) in fact. > I tried the work around with a system based on r360986 and still ended up with unreproducible rescue binaries like before. Yes, unfortunately upstream replaced the CLANG_SPAWN_CC1 environment variable right after llvm 10 was branched, in https://github.com/llvm/llvm-project/commit/c4a134a5107c1926262ee887057036ca53aa7265: Author: Nico Weber <thakis@chromium.org> Date: Wed Jan 15 10:45:02 2020 -0500 Replace CLANG_SPAWN_CC1 env var with a driver mode flag Flags are clang's default UI is flags. We can have an env var in addition to that, but in D69825 nobody has yet mentioned why this needs an env var, so omit it for now. If someone needs to set the flag via env var, the existing CCC_OVERRIDE_OPTIONS mechanism works for it (set CCC_OVERRIDE_OPTIONS=+-fno-integrated-cc1 for example). Also mention the cc1-in-process change in the release notes. Also spruce up the test a bit so it actually tests something :) Differential Revision: https://reviews.llvm.org/D72769 (cherry picked from commit 8e5018e990b701391e6c33ba85b012343df67272) It's a bit awkward to add -fno-integrated-cc1 to each CC invocation, but it be done either with the mechanism shown in the commit mesage above, or by setting CFLAGS="-O2 -fno-integrated-cc1" in make.conf. Alternatively, I could flip the default in lib/clang/include/clang/Config/config.h. In any case, I haven't yet gotten to the bottom of why this causes changed behavior.
Building r360524 with CLANG_SPAWN_CC1=1 resulted in unreproducible /bin/sh binaries which is unsurprising if the variable is no longer read by clang. Surprising is that I also built systems based on r361652 (stable/11 from 2020-05-30) on a 4-core system and a 2-core system, once per system with CLANG_SPAWN_CC1=1 and once without, and only got reproducible rescue binaries with the variable set. I confirmed that there is no code left that is reading the variable so it seems like the issue isn't triggered 100% reliably. This would also explain why I didn't see it with the FreeBSD-13.0-CURRENT 20200514-r361019 snapshot even though you are seeing it on HEAD. I'll test the lib/clang/include/clang/Config/config.h modification next. Setting it permanently wouldn't be a problem for ElectroBSD.
D'oh. I realized that some of my reproduction attempts were flawed because I didn't set WITHOUT_SYSTEM_COMPILER and only built the tested release once to safe time, thus testing the currently installed system compiler. Normally I build a release, install the release and rebuild the release with itself. Anyway, doing it properly with a system based on r361652 with "#define CLANG_SPAWN_CC1 1" seems to get worse results. Instead of a completely reproducible build I got an unreproducible /bin/sh, unreproducible rescue binaries and an unreproducible /usr/bin/printf.
Hello everyone, I am the author of the change pointed out by Dimitry above. I am really sorry for all the wasted time on this. This seems to unearth a determinism issue in the register allocation in LLVM. It could be the -fintegrated-cc1 change or it could be a latent bug. Could you possibly C-Reduce the printf.c bug please? Additionally, can the bug be reproduced with the latest LLVM? Many thanks!
(In reply to Alexandre Ganea from comment #23) This is the tricky part! I've only been able to reproduce this problem until now within the FreeBSD build system. That is, the result of one particular file being compiled is dependent on whether make is running with or without -j (and in some cases, the value of the -j option). As of now, I can "fix" the problem by using #define CLANG_SPAWN_CC1 1 in lib/clang/include/clang/Config/config.h, which effectively disables the in-process cc1 by default. At least then you always get the same object file output. But as to *why* the result differs when CLANG_SPAWN_CC1=0, and being invoked under different variations of make or make -j, it is not clear yet. One thing which Bryan Drewery just mentioned on IRC is that make -j messes with stderr, causing it to no longer be a tty. I think that must have some sort of influence, but it's going to be tricky to debug (especially if you want to send debug output to stderr :).
(In reply to Alexandre Ganea from comment #23) Thanks for the reply on this issue Alexandre. It seems rather unlikely that your change could be directly responsible for this issue and that it's more likely just exposing a latent issue or some sort of second order effect. It seems like we might want to disable the behaviour by default in stable/11 and stable/12, what do you think dim?
Created attachment 215200 [details] Reproduction files for printf.c This is a tarball with reproduction files for compiling printf.c, which should be completely standalone. Contents: bug246630-repro/printf-699ef3.c bug246630-repro/printf-699ef3.sh Note that the shell script already invokes clang -cc1, so it will not do anything special (AFAIK) with the spawn-new-cc setting. But it could be used to attempt to detect undefined behavior or usage of uninitialized memory. Since valgrind doesn't work on FreeBSD, I had to resort to running it on Linux, and there I get: ==120363== Conditional jump or move depends on uninitialised value(s) ==120363== at 0x1634474: llvm::ConstantExpr::getGetElementPtr(llvm::Type*, llvm::Constant*, llvm::ArrayRef<llvm::Value*>, bool, llvm::Optional<unsigned int>, llvm::Type*) (Constants.cpp:2191) ==120363== by 0x112D6D9: getGetElementPtr (Constants.h:1163) ==120363== by 0x112D6D9: (anonymous namespace)::SymbolicallyEvaluateGEP(llvm::GEPOperator const*, llvm::ArrayRef<llvm::Constant*>, llvm::DataLayout const&, llvm::TargetLibraryInfo const*) (ConstantFolding.cpp:1005) ==120363== by 0x112DF70: (anonymous namespace)::ConstantFoldInstOperandsImpl(llvm::Value const*, unsigned int, llvm::ArrayRef<llvm::Constant*>, llvm::DataLayout const&, llvm::TargetLibraryInfo const*) (ConstantFolding.cpp:1039) ==120363== by 0x112C165: (anonymous namespace)::ConstantFoldConstantImpl(llvm::Constant const*, llvm::DataLayout const&, llvm::TargetLibraryInfo const*, llvm::SmallDenseMap<llvm::Constant*, llvm::Constant*, 4u, llvm::DenseMapInfo<llvm::Constant*>, llvm::detail::DenseMapPair<llvm::Constant*, llvm::Constant*> >&) [clone .part.0] (ConstantFolding.cpp:1114) ==120363== by 0x112C5CF: llvm::ConstantFoldConstant(llvm::Constant const*, llvm::DataLayout const&, llvm::TargetLibraryInfo const*) (ConstantFolding.cpp:1194) ==120363== by 0x188F410: prepareICWorklistFromFunction (InstructionCombining.cpp:3584) ==120363== by 0x188F410: combineInstructionsOverFunction(llvm::Function&, llvm::InstCombineWorklist&, llvm::AAResults*, llvm::AssumptionCache&, llvm::TargetLibraryInfo&, llvm::DominatorTree&, llvm::OptimizationRemarkEmitter&, llvm::BlockFrequencyInfo*, llvm::ProfileSummaryInfo*, unsigned int, llvm::LoopInfo*) (InstructionCombining.cpp:3703) ==120363== by 0x189205F: runOnFunction (InstructionCombining.cpp:3789) ==120363== by 0x189205F: llvm::InstructionCombiningPass::runOnFunction(llvm::Function&) (InstructionCombining.cpp:3768) ==120363== by 0x16F4352: llvm::FPPassManager::runOnFunction(llvm::Function&) (LegacyPassManager.cpp:1482) ==120363== by 0x16F4DE8: llvm::FPPassManager::runOnModule(llvm::Module&) (LegacyPassManager.cpp:1518) ==120363== by 0x16F51A2: runOnModule (LegacyPassManager.cpp:1583) ==120363== by 0x16F51A2: llvm::legacy::PassManagerImpl::run(llvm::Module&) (LegacyPassManager.cpp:1695) ==120363== by 0x1FF4CFE: EmitAssembly (BackendUtil.cpp:954) ==120363== by 0x1FF4CFE: clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::DataLayout const&, llvm::Module*, clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream> >) (BackendUtil.cpp:1677) ==120363== by 0x2C471A8: clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (CodeGenAction.cpp:335) ==120363== Uninitialised value was created by a stack allocation ==120363== at 0x112C653: (anonymous namespace)::SymbolicallyEvaluateGEP(llvm::GEPOperator const*, llvm::ArrayRef<llvm::Constant*>, llvm::DataLayout const&, llvm::TargetLibraryInfo const*) (ConstantFolding.c I'm trying to reduce this now.
A commit references this bug: Author: dim Date: Wed Jun 3 16:51:30 UTC 2020 New revision: 361755 URL: https://svnweb.freebsd.org/changeset/base/361755 Log: Disable clang's -fintegrated-cc1 stage by default In bug 246630, it was found that part of the rescue binary could be compiled to very slightly different (but still equivalent) machine code, depending on the number of simultaneous make jobs (via the -j option). This turned out to be caused by the upstream change that made clang's first stage compiler (i.e. the -cc1 stage) run as part of the initial clang process invocation, instead of forking and exec'ing a new clang process. We are currently investigating the root cause for the difference in output, but while that is ongoing, disable the integrated cc1 stage for now to work around it. You can always turn it on explicitly by using the -fintegrated-cc1 option, or turn it off with -fno-integrated-cc1. Direct commit to stable/{11,12}, so this can hopefully end up in the upcoming 11.4-RELEASE. Reported by: Fabian Keil <fk@fabiankeil.de> PR: 246630 Changes: stable/11/lib/clang/include/clang/Config/config.h stable/12/lib/clang/include/clang/Config/config.h
A commit references this bug: Author: dim Date: Wed Jun 3 21:22:15 UTC 2020 New revision: 361772 URL: https://svnweb.freebsd.org/changeset/base/361772 Log: MF11 r361755: Disable clang's -fintegrated-cc1 stage by default In bug 246630, it was found that part of the rescue binary could be compiled to very slightly different (but still equivalent) machine code, depending on the number of simultaneous make jobs (via the -j option). This turned out to be caused by the upstream change that made clang's first stage compiler (i.e. the -cc1 stage) run as part of the initial clang process invocation, instead of forking and exec'ing a new clang process. We are currently investigating the root cause for the difference in output, but while that is ongoing, disable the integrated cc1 stage for now to work around it. You can always turn it on explicitly by using the -fintegrated-cc1 option, or turn it off with -fno-integrated-cc1. Direct commit to stable/{11,12}, so this can hopefully end up in the upcoming 11.4-RELEASE. Approved by: re (gjb) Reported by: Fabian Keil <fk@fabiankeil.de> PR: 246630 Changes: _U releng/11.4/ releng/11.4/lib/clang/include/clang/Config/config.h
As reported in comment 22, the "#define CLANG_SPAWN_CC1 1" change seems to result in worse results on stable/11 for me. I've just confirmed this with a system based on r361755: building it on two systems, one with 2 cpu cores, one with 4, resulted in an unreproducible /bin/sh, unreproducible rescue binaries and an unreproducible /usr/bin/printf. Maybe there's a relevant difference between the clang 10.0.0 in stable/11 and the more recent one in HEAD.
(In reply to Fabian Keil from comment #29) That's very strange. I could fairly easily reproduce the different printf.o output by building either with "make -j2", or with "make" on the same machine. Since "make -j" appears to mess around with ttys and stderr, it could explain the slightly different memory contents of the program at startup. In any case, removing the integrated cc1 stage always resulted in the same printf.o output. Are you saying you ran the same build on different machines? Are you 100% sure that all the includes pulled into printf.c were exactly the same? Is there any way you can do a sha256sum of the .i files? > Maybe there's a relevant difference between the clang 10.0.0 in stable/11 and the more recent one in HEAD. I don't think so, head has 10.0.1-rc1, but it showed precisely the effects I mentioned before. Disabling integrated cc1 fixed it for me.
(In reply to Dimitry Andric from comment #30) Yes, I used different machines but the same source. Using jails, I built a release, installed the release (in the jail) and built the release again (because my build scripts aren't setting WITHOUT_SYSTEM_COMPILER, yet). The generated source tar balls were identical so the printf includes should be as well. I assume you are talking about the ".i" files from clang and will upload the SHA256 sums. They were identical on both machines.
Created attachment 215251 [details] SHA256 sums of clang interface .i files from a system based on r361755
Created attachment 215531 [details] Merge llvm-project commit 0cecafd647cc I have been able to consistently reproduce the different outputs with a script, and by bisecting on a 11.x machine (which took a LONG time!) I found that upstream's https://reviews.llvm.org/rG0cecafd647cc fixes it for my case. That is, resetting CLANG_SPAWN_CC1 back to 0 and merging the non-test parts of the upstream change into our contrib/llvm-project seems to make this output reproducible again for me. I would appreciate it if you could apply this on your various systems as a test-run, and check if it now produces consistent output every time.
I applied the patch on my systems and indeed got reproducible binaries. Thanks a lot for tracking this down.
A commit references this bug: Author: dim Date: Thu Jun 18 20:41:44 UTC 2020 New revision: 362341 URL: https://svnweb.freebsd.org/changeset/base/362341 Log: Merge commit 0cecafd647cc from llvm git (by Alina Sbirlea): [BasicAA] Make BasicAA a cfg pass. Summary: Part of the changes in D44564 made BasicAA not CFG only due to it using PhiAnalysisValues which may have values invalidated. Subsequent patches (rL340613) appear to have addressed this limitation. BasicAA should not be invalidated by non-CFG-altering passes. A concrete example is MemCpyOpt which preserves CFG, but we are testing it invalidates BasicAA. llvm-dev RFC: https://groups.google.com/forum/#!topic/llvm-dev/eSPXuWnNfzM Reviewers: john.brawn, sebpop, hfinkel, brzycki Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74353 This fixes an issue with clang's -fintegrated-cc1 feature, which could make it output slightly different assembly code, depending on the way it was invoked. In r361755 we attempted to work around it by disabling the integrated cc1 stage, but it did not solve the root cause for all situations. Extensive testing and bisecting showed that the above change finally makes the output deterministic, even if -fintegrated-cc1 is on. Reported by: Fabian Keil <fk@fabiankeil.de> PR: 246630 MFC after: 3 days Changes: head/contrib/llvm-project/llvm/lib/Analysis/BasicAliasAnalysis.cpp
A commit references this bug: Author: dim Date: Sun Jun 21 09:24:48 UTC 2020 New revision: 362461 URL: https://svnweb.freebsd.org/changeset/base/362461 Log: MFC r362341: Merge commit 0cecafd647cc from llvm git (by Alina Sbirlea): [BasicAA] Make BasicAA a cfg pass. Summary: Part of the changes in D44564 made BasicAA not CFG only due to it using PhiAnalysisValues which may have values invalidated. Subsequent patches (rL340613) appear to have addressed this limitation. BasicAA should not be invalidated by non-CFG-altering passes. A concrete example is MemCpyOpt which preserves CFG, but we are testing it invalidates BasicAA. llvm-dev RFC: https://groups.google.com/forum/#!topic/llvm-dev/eSPXuWnNfzM Reviewers: john.brawn, sebpop, hfinkel, brzycki Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74353 This fixes an issue with clang's -fintegrated-cc1 feature, which could make it output slightly different assembly code, depending on the way it was invoked. In r361755 we attempted to work around it by disabling the integrated cc1 stage, but it did not solve the root cause for all situations. Extensive testing and bisecting showed that the above change finally makes the output deterministic, even if -fintegrated-cc1 is on. Reported by: Fabian Keil <fk@fabiankeil.de> PR: 246630 Changes: _U stable/11/ _U stable/11/contrib/llvm-project/llvm/ stable/11/contrib/llvm-project/llvm/lib/Analysis/BasicAliasAnalysis.cpp stable/11/lib/clang/include/clang/Config/config.h _U stable/12/ stable/12/contrib/llvm-project/llvm/lib/Analysis/BasicAliasAnalysis.cpp stable/12/lib/clang/include/clang/Config/config.h
MARKED AS SPAM