Bug 246630 - stable/11 regression: base.txz reproducibility depends on number of cpu cores
Summary: stable/11 regression: base.txz reproducibility depends on number of cpu cores
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: bin (show other bugs)
Version: Unspecified
Hardware: amd64 Any
: --- Affects Some People
Assignee: freebsd-bugs (Nobody)
URL:
Keywords: regression
Depends on:
Blocks:
 
Reported: 2020-05-21 11:52 UTC by Fabian Keil
Modified: 2020-06-21 20:39 UTC (History)
4 users (show)

See Also:


Attachments
First 600 lines of the text section difference (32.48 KB, patch)
2020-05-22 17:33 UTC, Fabian Keil
no flags Details | Diff
Reproduction files for printf.c (57.35 KB, application/octet-stream)
2020-06-03 16:27 UTC, Dimitry Andric
no flags Details
SHA256 sums of clang interface .i files from a system based on r361755 (11.11 KB, text/plain)
2020-06-05 14:59 UTC, Fabian Keil
no flags Details
Merge llvm-project commit 0cecafd647cc (1.53 KB, patch)
2020-06-13 22:30 UTC, Dimitry Andric
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Fabian Keil 2020-05-21 11:52:05 UTC
On a recent FreeBSD stable/11 the rescue binary content varies depending
on the number of cpu cores.

With a 2 core system I can reproduce the binaries from the official
11.4-BETA2 release, with a 4 core system I get different results:

SHA256 (/images/FreeBSD/FreeBSD-11.4-BETA2-amd64-r361067-official/rescue/cp) = 3c96ab697fc721c226f99971863bcadbbbc73ccfec1abe2f04cbce60e3276a89
SHA256 (/images/FreeBSD/FreeBSD-11.4-BETA2-amd64-r361067-rebuild-2cores-t520/rescue/cp) = 3c96ab697fc721c226f99971863bcadbbbc73ccfec1abe2f04cbce60e3276a89
SHA256 (/images/FreeBSD/FreeBSD-11.4-BETA2-amd64-r361067-rebuild-4cores/rescue/cp) = 06e0b2964901fa02ed2f4e02c1334cb82ce1d97f4a6fc7a2632a6f74b35955d1

Diffoscope seems to indicate that some offsets are different.
The output starts with:

--- FreeBSD/FreeBSD-11.4-BETA2-amd64-r361067-official/rescue/cp
+++ FreeBSD/FreeBSD-11.4-BETA2-amd64-r361067-rebuild-4cores/rescue/cp
├── readelf --wide --sections {}
│ @@ -1,16 +1,16 @@
│  There are 19 section headers, starting at offset 0xa18bb0:
│  
│  Section Headers:
│    [Nr] Name              Type            Addr             Off    Size   ES Flg Lk Inf Al
│    [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
│    [ 1] .note.tag         NOTE            0000000000400190 000190 000030 00   A  0   0  4
│    [ 2] .init             PROGBITS        00000000004001c0 0001c0 000018 00  AX  0   0  4
│ -  [ 3] .text             PROGBITS        0000000000400200 000200 6dab08 00  AX  0   0 64
│ -  [ 4] .fini             PROGBITS        0000000000adad08 6dad08 00000e 00  AX  0   0  4
│ +  [ 3] .text             PROGBITS        0000000000400200 000200 6daa88 00  AX  0   0 64
│ +  [ 4] .fini             PROGBITS        0000000000adac88 6dac88 00000e 00  AX  0   0  4
│    [ 5] .rodata           PROGBITS        0000000000adb000 6db000 216bd8 00   A  0   0 4096
│    [ 6] .eh_frame_hdr     PROGBITS        0000000000cf1bd8 8f1bd8 0245ec 00   A  0   0  4
│    [ 7] .eh_frame         X86_64_UNWIND   0000000000d161c8 9161c8 0bb90c 00   A  0   0  8
│    [ 8] .tdata            PROGBITS        0000000000fd2000 9d2000 000068 00 WAT  0   0 16
│    [ 9] .tbss             NOBITS          0000000000fd2070 9d2068 000840 00 WAT  0   0 16
│    [10] .ctors            PROGBITS        0000000000fd2070 9d2070 0000c0 00  WA  0   0  8
│    [11] .dtors            PROGBITS        0000000000fd2130 9d2130 000018 00  WA  0   0  8
├── readelf --wide --debug-dump=frames {}
│ @@ -9894,21179 +9894,21195 @@
│    DW_CFA_nop
│    DW_CFA_nop
│    DW_CFA_nop
│    DW_CFA_nop
│    DW_CFA_nop
│    DW_CFA_nop
│  
│ -00005ee0 0000002c 00005ee4 FDE cie=00000000 pc=0043c780..0043d57c
│ +00005ee0 0000002c 00005ee4 FDE cie=00000000 pc=0043c780..0043d4c3
│    DW_CFA_advance_loc: 1 to 0043c781
│    DW_CFA_def_cfa_offset: 16
│    DW_CFA_offset: r6 (rbp) at cfa-16
│    DW_CFA_advance_loc: 3 to 0043c784
│    DW_CFA_def_cfa_register: r6 (rbp)
│    DW_CFA_advance_loc: 16 to 0043c794
│    DW_CFA_offset: r3 (rbx) at cfa-56
│    DW_CFA_offset: r12 (r12) at cfa-48
│    DW_CFA_offset: r13 (r13) at cfa-40
│    DW_CFA_offset: r14 (r14) at cfa-32
│    DW_CFA_offset: r15 (r15) at cfa-24
│ -  DW_CFA_advance_loc2: 3554 to 0043d576
│ +  DW_CFA_advance_loc2: 3369 to 0043d4bd
│    DW_CFA_def_cfa: r7 (rsp) ofs 8
│ -  DW_CFA_advance_loc: 1 to 0043d577
│ +  DW_CFA_advance_loc: 1 to 0043d4be
│    DW_CFA_def_cfa: r6 (rbp) ofs 16
│    DW_CFA_nop
│    DW_CFA_nop

The whole diff in text format has a size of 217M so I'm not attaching it.

I first saw the issue after rebasing ElectroBSD on r360524 and confirmed
that it still exists in r360986.

The problem does not exist in r357496 and earlier but I don't yet know
the exact revision when the problem was introduced.

I can't reproduce the issue on FreeBSD-13.0-CURRENT amd64
using the 20200514-r361019 snapshot.

I haven't tested 12/stable.
Comment 1 Conrad Meyer freebsd_committer 2020-05-21 13:46:59 UTC
Notably, .text is different (different size 0x6dab08 vs 0x6daa88).  The relocations and framedata seem to correspond to that (though the section-level diff does not show any difference in .eh_frame?).
Comment 2 Ed Maste freebsd_committer 2020-05-22 17:20:03 UTC
diffoscope should disassemble the .text sections and compare the result; would you be able to extract and post a few dozen lines from where .text diverges?

Is this unaffected by -j? Are there other differences between the 2 core and 4 core systems?
Comment 3 Fabian Keil 2020-05-22 17:33:09 UTC
Created attachment 214763 [details]
First 600 lines of the text section difference
Comment 4 Fabian Keil 2020-05-22 17:36:32 UTC
I'm always compiling with -j2.

Once I noticed the issue using completely different machines I
reproduced it with bhyve changing only the core number.
Comment 5 Ed Maste freebsd_committer 2020-05-22 17:38:51 UTC
Ah, the early differences are just different offsets due to later changes,

│ -  40024a:	e8 21 94 69 00       	callq  0xa99670
│ +  40024a:	e8 a1 93 69 00       	callq  0xa995f0

Do you have somewhere you could share the full text diff? Or, find the first non-callq difference, perhaps around 0x9f....
Comment 6 Fabian Keil 2020-05-22 17:52:06 UTC
I uploaded the whole compressed diff to:
https://www.fabiankeil.de/tmp/FreeBSD-11.4-BETA2-amd64-r361067-rescue-cp.diff.gz
(33 MB)
Comment 7 Conrad Meyer freebsd_committer 2020-05-22 18:23:37 UTC
Diff seems to start here (I reordered lines for clarity):

│ @@ -63992,183658 +63992,183673 @@
...
│ -  43c7b5:    49 89 cd                mov    %rcx,%r13
│ +  43c7b5:    49 89 cf                mov    %rcx,%r15

// ^^ r13 -> r15

│ -  43c7b8:    49 29 c5                sub    %rax,%r13
│ +  43c7b8:    49 29 c7                sub    %rax,%r15

│ -  43c7bb:    4c 89 ec                mov    %r13,%rsp
│ +  43c7bb:    4c 89 fc                mov    %r15,%rsp

...

│ -  43c803:    4c 63 c2                movslq %edx,%r8
│ +  43c803:    4c 63 da                movslq %edx,%r11

// ^^ r8 -> r11

│ -  43c806:    49 c1 e0 03             shl    $0x3,%r8
│ +  43c806:    49 c1 e3 03             shl    $0x3,%r11

│ -  43c80a:    4c 03 05 e7 7a bc 00    add    0xbc7ae7(%rip),%r8        # 0x10042f8
│ +  43c80a:    4c 03 1d e7 7a bc 00    add    0xbc7ae7(%rip),%r11        # 0x10042f8

│ -  43c811:    4c 89 05 d0 7a bc 00    mov    %r8,0xbc7ad0(%rip)        # 0x10042e8
│ +  43c811:    4c 89 1d d0 7a bc 00    mov    %r11,0xbc7ad0(%rip)        # 0x10042e8

│ -  43c818:    4c 3b 05 d1 7a bc 00    cmp    0xbc7ad1(%rip),%r8        # 0x10042f0
│ +  43c818:    4c 3b 1d d1 7a bc 00    cmp    0xbc7ad1(%rip),%r11        # 0x10042f0

│    43c81f:    76 07                   jbe    0x43c828
│ -  43c821:    4c 89 05 c8 7a bc 00    mov    %r8,0xbc7ac8(%rip)        # 0x10042f0
│ +  43c821:    4c 89 1d c8 7a bc 00    mov    %r11,0xbc7ac8(%rip)        # 0x10042f0


These particular changes are not the ones that affect code *size* (those must be later), but do show reproducibility problems.
Comment 8 Conrad Meyer freebsd_committer 2020-05-22 18:26:09 UTC
First code-size change is shortly afterwards:

│ -  43c83e:    45 31 c0                xor    %r8d,%r8d
│ -  43c841:    44 8a 3b                mov    (%rbx),%r15b

│ +  43c83e:    45 31 db                xor    %r11d,%r11d
│ +  43c841:    8a 0b                   mov    (%rbx),%cl

%cl (low 8 bits of %ecx) has a shorter mov.b representation than %r15b (low 8 bits of amd64 %r15).
Comment 9 Conrad Meyer freebsd_committer 2020-05-22 18:27:03 UTC
So clearly register selection is non-deterministic for some reason.  Perhaps some optimization pass runs for a non-deterministic amount of time or % of system memory, and scaling -j impacts that?
Comment 10 Ed Maste freebsd_committer 2020-05-22 18:29:33 UTC
Here is where the interesting parts start - I've rearranged the diff -/+ to make it easier to see where differences exist:

│ -  43c803:    4c 63 c2                movslq %edx,%r8
│ +  43c803:    4c 63 da                movslq %edx,%r11
│ -  43c806:    49 c1 e0 03             shl    $0x3,%r8
│ +  43c806:    49 c1 e3 03             shl    $0x3,%r11
│ -  43c80a:    4c 03 05 e7 7a bc 00    add    0xbc7ae7(%rip),%r8        # 0x10042f8
│ +  43c80a:    4c 03 1d e7 7a bc 00    add    0xbc7ae7(%rip),%r11        # 0x10042f8
│ -  43c811:    4c 89 05 d0 7a bc 00    mov    %r8,0xbc7ad0(%rip)        # 0x10042e8
│ +  43c811:    4c 89 1d d0 7a bc 00    mov    %r11,0xbc7ad0(%rip)        # 0x10042e8
│ -  43c818:    4c 3b 05 d1 7a bc 00    cmp    0xbc7ad1(%rip),%r8        # 0x10042f0
│ +  43c818:    4c 3b 1d d1 7a bc 00    cmp    0xbc7ad1(%rip),%r11        # 0x10042f0
│    43c81f:    76 07                   jbe    0x43c828
│ -  43c821:    4c 89 05 c8 7a bc 00    mov    %r8,0xbc7ac8(%rip)        # 0x10042f0
│ +  43c821:    4c 89 1d c8 7a bc 00    mov    %r11,0xbc7ac8(%rip)        # 0x10042f0
│    43c828:    48 b8 00 00 00 00 01    mov    $0x100000000,%rax
│    43c82f:    00 00 00
│    43c832:    49 01 c6                add    %rax,%r14
│    43c835:    49 c1 fe 20             sar    $0x20,%r14
│    43c839:    4c 01 f3                add    %r14,%rbx
│    43c83c:    eb 03                   jmp    0x43c841
│ -  43c83e:    45 31 c0                xor    %r8d,%r8d
│ +  43c83e:    45 31 db                xor    %r11d,%r11d
│ -  43c841:    44 8a 3b                mov    (%rbx),%r15b
│ +  43c841:    8a 0b                   mov    (%rbx),%cl
│ -  43c844:    45 31 c9                xor    %r9d,%r9d
│ +  43c843:    45 31 c9                xor    %r9d,%r9d
│ -  43c847:    45 84 ff                test   %r15b,%r15b
│ +  43c846:    84 c9                   test   %cl,%cl
│ +  43c848:    4c 89 ad 00 ff ff ff    mov    %r13,-0x100(%rbp)
│ -  43c84a:    74 60                   je     0x43c8ac
│ +  43c84f:    74 4f                   je     0x43c8a0
│ -  43c84c:    45 31 f6                xor    %r14d,%r14d
│ +  43c851:    45 31 e4                xor    %r12d,%r12d
│ -  43c84f:    48 b8 01 00 00 00 89    mov    $0x1288900000001,%rax
│ -  43c856:    28 01 00
│ +  43c854:    48 b8 01 00 00 00 89    mov    $0x1288900000001,%rax
│ +  43c85b:    28 01 00
│ -  43c859:    0f 1f 80 00 00 00 00    nopl   0x0(%rax)
│ +  43c85e:    66 90                   xchg   %ax,%ax
│    43c860:    ba 01 00 00 00          mov    $0x1,%edx
│ -  43c865:    44 89 f9                mov    %r15d,%ecx
│ -  43c868:    48 d3 e2                shl    %cl,%rdx
│ +  43c865:    48 d3 e2                shl    %cl,%rdx
│ -  43c86b:    41 80 ff 3f             cmp    $0x3f,%r15b
│ +  43c868:    80 f9 3f                cmp    $0x3f,%cl
│ -  43c86f:    0f 87 73 01 00 00       ja     0x43c9e8
│ +  43c86b:    0f 87 60 01 00 00       ja     0x43c9d1
│ -  43c875:    48 21 c2                and    %rax,%rdx
│ +  43c871:    48 21 c2                and    %rax,%rdx
│ -  43c878:    0f 84 6a 01 00 00       je     0x43c9e8
│ +  43c874:    0f 84 57 01 00 00       je     0x43c9d1
│ -  43c87e:    47 88 7c 35 01          mov    %r15b,0x1(%r13,%r14,1)
│ +  43c87a:    43 88 4c 27 01          mov    %cl,0x1(%r15,%r12,1)
│ -  43c883:    43 c6 44 35 02 00       movb   $0x0,0x2(%r13,%r14,1)
│ +  43c87f:    43 c6 44 27 02 00       movb   $0x0,0x2(%r15,%r12,1)
│ -  43c889:    46 0f b6 7c 33 01       movzbl 0x1(%rbx,%r14,1),%r15d
│ +  43c885:    42 0f b6 4c 23 01       movzbl 0x1(%rbx,%r12,1),%ecx
│ -  43c88f:    49 83 c6 01             add    $0x1,%r14
│ +  43c88b:    49 83 c4 01             add    $0x1,%r12
│ +  43c88f:    84 c9                   test   %cl,%cl
│ -  43c803:    4c 63 c2                movslq %edx,%r8
│ +  43c803:    4c 63 da                movslq %edx,%r11
│ -  43c806:    49 c1 e0 03             shl    $0x3,%r8
│ +  43c806:    49 c1 e3 03             shl    $0x3,%r11
│ -  43c80a:    4c 03 05 e7 7a bc 00    add    0xbc7ae7(%rip),%r8        # 0x10042f8
│ +  43c80a:    4c 03 1d e7 7a bc 00    add    0xbc7ae7(%rip),%r11        # 0x10042f8
│ -  43c811:    4c 89 05 d0 7a bc 00    mov    %r8,0xbc7ad0(%rip)        # 0x10042e8
│ +  43c811:    4c 89 1d d0 7a bc 00    mov    %r11,0xbc7ad0(%rip)        # 0x10042e8
│ -  43c818:    4c 3b 05 d1 7a bc 00    cmp    0xbc7ad1(%rip),%r8        # 0x10042f0
│ +  43c818:    4c 3b 1d d1 7a bc 00    cmp    0xbc7ad1(%rip),%r11        # 0x10042f0
│    43c81f:    76 07                   jbe    0x43c828
│ -  43c821:    4c 89 05 c8 7a bc 00    mov    %r8,0xbc7ac8(%rip)        # 0x10042f0
│ +  43c821:    4c 89 1d c8 7a bc 00    mov    %r11,0xbc7ac8(%rip)        # 0x10042f0
│ -  43c865:    44 89 f9                mov    %r15d,%ecx
│ -  43c868:    48 d3 e2                shl    %cl,%rdx
│ +  43c865:    48 d3 e2                shl    %cl,%rdx
│ -  43c86b:    41 80 ff 3f             cmp    $0x3f,%r15b
│ +  43c868:    80 f9 3f                cmp    $0x3f,%cl
│ -  43c86f:    0f 87 73 01 00 00       ja     0x43c9e8
│ +  43c86b:    0f 87 60 01 00 00       ja     0x43c9d1
│ -  43c875:    48 21 c2                and    %rax,%rdx
│ +  43c871:    48 21 c2                and    %rax,%rdx
│ -  43c878:    0f 84 6a 01 00 00       je     0x43c9e8
│ +  43c874:    0f 84 57 01 00 00       je     0x43c9d1
│ -  43c87e:    47 88 7c 35 01          mov    %r15b,0x1(%r13,%r14,1)
│ +  43c87a:    43 88 4c 27 01          mov    %cl,0x1(%r15,%r12,1)

I have trouble figuring out how core counts would affect the compiler in this way, though.
Comment 11 Ed Maste freebsd_committer 2020-05-22 18:30:36 UTC
(In reply to Conrad Meyer from comment #9)
Perhaps this is compiler nondeterminism introduced in Clang 10?
Comment 12 Conrad Meyer freebsd_committer 2020-05-22 18:32:04 UTC
Beats me; I think dim@ might have more of a clue.  You or dim@ probably have better channels for inquiring further. :-)
Comment 13 Dimitry Andric freebsd_committer 2020-05-23 10:39:51 UTC
This is an interesting bug, but what is the easiest way to reproduce? Does anybody know which particular .c file, or even which particular function is apparently compiled differently?

Note that clang is really single threaded, so I don't understand how the number of cores can affect the result. Maybe the non-determinism (if it really exists) is independent of the number of cores, but changing that influences stack allocations, etc?

And is this *only* on 11.x? Because that is still using BFD ld, as far as I know. Although the linker shouldn't matter...
Comment 14 Fabian Keil 2020-05-23 14:09:16 UTC
I currently use the lazy approach and build complete releases.
Obviously this makes bisecting slow but I haven't had time to
simplify the steps yet.

I've only seen the issue on 11.x but haven't tested 12.x.
Comment 15 Dimitry Andric freebsd_committer 2020-05-23 15:21:09 UTC
Are people using -j here to build /usr/src/rescue, and if so, with which setting?  I've run a plain 'make' on a VM with 4 cores at first, then 2 cores, but the resulting /usr/obj/usr/src/rescue directories are bit-identical then.

Maybe I should retry with -j4 when I configure 4 cores, and -j2 when I configure 2 cores?
Comment 16 Dimitry Andric freebsd_committer 2020-05-23 17:11:55 UTC
Okay, I can reproduce, and it appears to be something in usr.bin/printf/printf.c, which gets built as part of sh. In particular, in the printf_doformat function.

One version gives:

0000000000000440 <printf_doformat>:
     440:       55                      push   %rbp
     441:       48 89 e5                mov    %rsp,%rbp
     444:       41 57                   push   %r15
     446:       41 56                   push   %r14
     448:       41 55                   push   %r13
     44a:       41 54                   push   %r12
     44c:       53                      push   %rbx
     44d:       48 81 ec e8 00 00 00    sub    $0xe8,%rsp
     454:       49 89 f4                mov    %rsi,%r12
     457:       48 89 fb                mov    %rdi,%rbx
     45a:       48 8b 05 00 00 00 00    mov    0x0(%rip),%rax        # 461 <printf_doformat+0x21>
     461:       48 89 45 d0             mov    %rax,-0x30(%rbp)
     465:       e8 00 00 00 00          callq  46a <printf_doformat+0x2a>

another:

0000000000000440 <printf_doformat>:
     440:       55                      push   %rbp
     441:       48 89 e5                mov    %rsp,%rbp
     444:       41 57                   push   %r15
     446:       41 56                   push   %r14
     448:       41 55                   push   %r13
     44a:       41 54                   push   %r12
     44c:       53                      push   %rbx
     44d:       48 81 ec e8 00 00 00    sub    $0xe8,%rsp
     454:       49 89 f5                mov    %rsi,%r13
     457:       48 89 fb                mov    %rdi,%rbx
     45a:       48 8b 05 00 00 00 00    mov    0x0(%rip),%rax        # 461 <printf_doformat+0x21>
     461:       48 89 45 d0             mov    %rax,-0x30(%rbp)
     465:       e8 00 00 00 00          callq  46a <printf_doformat+0x2a>

The C function starts with:

static char *
printf_doformat(char *fmt, int *rval)
{
        static const char skip1[] = "#'-+ 0";
        int fieldwidth, haveprec, havewidth, mod_ldbl, precision;
        char convch, nextch;
        char start[strlen(fmt) + 1];
        char **fargv;
        char *dptr;
        int l;

and the starting piece of code corresponds to the dynamic array declaration, char start[strlen(fmt) + 1].

As to why this is happening, no clue yet... I'm trying to build clang with UBSan, to see if it hits something in particular.
Comment 17 Fabian Keil 2020-05-26 18:41:43 UTC
In the initial report I wrote:

> I first saw the issue after rebasing ElectroBSD on
> r360524 and confirmed that it still exists in r360986.

As it turns out that wasn't entirely correct.

While both revisions result in base.txz differences with different cpu cores, the rescue binaries in r360524 are reproducible but the /bin/sh differs. Presumably because of the printf.c issue found by Dimitry.

In r360986 the /bin/sh issue is already gone.

I'll continue bisecting ...
Comment 18 Dimitry Andric freebsd_committer 2020-05-29 19:33:24 UTC
So after quite a bit of head scratching, I found that the problem appears to be caused by upstream commit https://github.com/llvm/llvm-project/commit/b4a99a061f517e60985667e39519f60186cbb469:

commit b4a99a061f517e60985667e39519f60186cbb469
Author: Alexandre Ganea <alexandre.ganea@ubisoft.com>
Date:   Mon Jan 13 10:40:04 2020 -0500

    [Clang][Driver] Re-use the calling process instead of creating a new process for the cc1 invocation

    With this patch, the clang tool will now call the -cc1 invocation directly inside the same process. Previously, the -cc1 invocation was creating, and waiting for, a new process.
    This patch therefore reduces the number of created processes during a build, thus it reduces build times on platforms where process creation can be costly (Windows) and/or impacted by a antivirus.
    It also makes debugging a bit easier, as there's no need to attach to the secondary -cc1 process anymore, breakpoints will be hit inside the same process.

    Crashes or signaling inside the -cc1 invocation will have the same side-effect as before, and will be reported through the same means.

    This behavior can be controlled at compile-time through the CLANG_SPAWN_CC1 cmake flag, which defaults to OFF. Setting it to ON will revert to the previous behavior, where any -cc1 invocation will create/fork a secondary process.
    At run-time, it is also possible to tweak the CLANG_SPAWN_CC1 environment variable. Setting it and will override the compile-time setting. A value of 0 calls -cc1 inside the calling process; a value of 1 will create a secondary process, as before.

    Differential Revision: https://reviews.llvm.org/D69825

For some reason, not spawning a second process for the cc1 stage has this particular influence on the end result, and *only* for the printf.c file!

I'm still digging further into this rabbit hole, but for now a good workaround is to set the environment variable CLANG_SPAWN_CC1=1. This forces the clang driver to fork and exec a new instance for the cc1 stage, and clears up the problem completely for me.
Comment 19 Fabian Keil 2020-05-30 11:01:00 UTC
Thanks a lot for looking into this, Dimitry.

Which stable/11 revision are you currently testing on?

I tried the work around with a system based on r360986 and still ended up with unreproducible rescue binaries like before.

I'm currently trying it with a system based on r360524 to see if it makes /bin/sh reproducible for me.
Comment 20 Dimitry Andric freebsd_committer 2020-05-31 15:18:39 UTC
(In reply to Fabian Keil from comment #19)
> Which stable/11 revision are you currently testing on?

On head, base r361410 (as of 2020-05-23) in fact.

> I tried the work around with a system based on r360986 and still ended up with unreproducible rescue binaries like before.

Yes, unfortunately upstream replaced the CLANG_SPAWN_CC1 environment variable right after llvm 10 was branched, in https://github.com/llvm/llvm-project/commit/c4a134a5107c1926262ee887057036ca53aa7265:

Author: Nico Weber <thakis@chromium.org>
Date:   Wed Jan 15 10:45:02 2020 -0500

    Replace CLANG_SPAWN_CC1 env var with a driver mode flag

    Flags are clang's default UI is flags.

    We can have an env var in addition to that, but in D69825 nobody has yet
    mentioned why this needs an env var, so omit it for now.  If someone
    needs to set the flag via env var, the existing CCC_OVERRIDE_OPTIONS
    mechanism works for it (set CCC_OVERRIDE_OPTIONS=+-fno-integrated-cc1
    for example).

    Also mention the cc1-in-process change in the release notes.

    Also spruce up the test a bit so it actually tests something :)

    Differential Revision: https://reviews.llvm.org/D72769

    (cherry picked from commit 8e5018e990b701391e6c33ba85b012343df67272)

It's a bit awkward to add -fno-integrated-cc1 to each CC invocation, but it be done either with the mechanism shown in the commit mesage above, or by setting CFLAGS="-O2 -fno-integrated-cc1" in make.conf.

Alternatively, I could flip the default in lib/clang/include/clang/Config/config.h.

In any case, I haven't yet gotten to the bottom of why this causes changed behavior.
Comment 21 Fabian Keil 2020-05-31 15:59:14 UTC
Building r360524 with CLANG_SPAWN_CC1=1 resulted in unreproducible /bin/sh binaries which is unsurprising if the variable is no longer read by clang.

Surprising is that I also built systems based on r361652 (stable/11 from 2020-05-30) on a 4-core system and a 2-core system, once per system with CLANG_SPAWN_CC1=1 and once without, and only got reproducible rescue binaries with the variable set.

I confirmed that there is no code left that is reading the variable so it seems like the issue isn't triggered 100% reliably.

This would also explain why I didn't see it with the FreeBSD-13.0-CURRENT 20200514-r361019 snapshot even though you are seeing it on HEAD.

I'll test the lib/clang/include/clang/Config/config.h modification next. Setting it permanently wouldn't be a problem for ElectroBSD.
Comment 22 Fabian Keil 2020-06-01 07:01:42 UTC
D'oh. I realized that some of my reproduction attempts were flawed because I didn't set WITHOUT_SYSTEM_COMPILER and only built the tested release once to safe time, thus testing the currently installed system compiler.

Normally I build a release, install the release and rebuild the release with itself.

Anyway, doing it properly with a system based on r361652 with "#define CLANG_SPAWN_CC1 1" seems to get worse results.

Instead of a completely reproducible build I got an unreproducible /bin/sh, unreproducible rescue binaries and an unreproducible /usr/bin/printf.
Comment 23 Alexandre Ganea 2020-06-02 01:31:40 UTC
Hello everyone,
I am the author of the change pointed out by Dimitry above. I am really sorry for all the wasted time on this. This seems to unearth a determinism issue in the register allocation in LLVM. It could be the -fintegrated-cc1 change or it could be a latent bug.
Could you possibly C-Reduce the printf.c bug please? Additionally, can the bug be reproduced with the latest LLVM?
Many thanks!
Comment 24 Dimitry Andric freebsd_committer 2020-06-02 18:42:21 UTC
(In reply to Alexandre Ganea from comment #23)

This is the tricky part! I've only been able to reproduce this problem until now within the FreeBSD build system. That is, the result of one particular file being compiled is dependent on whether make is running with or without -j (and in some cases, the value of the -j option).

As of now, I can "fix" the problem by using #define CLANG_SPAWN_CC1 1 in lib/clang/include/clang/Config/config.h, which effectively disables the in-process cc1 by default. At least then you always get the same object file output.

But as to *why* the result differs when CLANG_SPAWN_CC1=0, and being invoked under different variations of make or make -j, it is not clear yet.

One thing which Bryan Drewery just mentioned on IRC is that make -j messes with stderr, causing it to no longer be a tty. I think that must have some sort of influence, but it's going to be tricky to debug (especially if you want to send debug output to stderr :).
Comment 25 Ed Maste freebsd_committer 2020-06-03 13:48:53 UTC
(In reply to Alexandre Ganea from comment #23)
Thanks for the reply on this issue Alexandre. It seems rather unlikely that your change could be directly responsible for this issue and that it's more likely just exposing a latent issue or some sort of second order effect.

It seems like we might want to disable the behaviour by default in stable/11 and stable/12, what do you think dim?
Comment 26 Dimitry Andric freebsd_committer 2020-06-03 16:27:30 UTC
Created attachment 215200 [details]
Reproduction files for printf.c

This is a tarball with reproduction files for compiling printf.c, which should be completely standalone. Contents:

bug246630-repro/printf-699ef3.c
bug246630-repro/printf-699ef3.sh

Note that the shell script already invokes clang -cc1, so it will not do anything special (AFAIK) with the spawn-new-cc setting. But it could be used to attempt to detect undefined behavior or usage of uninitialized memory.

Since valgrind doesn't work on FreeBSD, I had to resort to running it on Linux, and there I get:

==120363== Conditional jump or move depends on uninitialised value(s)
==120363==    at 0x1634474: llvm::ConstantExpr::getGetElementPtr(llvm::Type*, llvm::Constant*, llvm::ArrayRef<llvm::Value*>, bool, llvm::Optional<unsigned int>, llvm::Type*) (Constants.cpp:2191)
==120363==    by 0x112D6D9: getGetElementPtr (Constants.h:1163)
==120363==    by 0x112D6D9: (anonymous namespace)::SymbolicallyEvaluateGEP(llvm::GEPOperator const*, llvm::ArrayRef<llvm::Constant*>, llvm::DataLayout const&, llvm::TargetLibraryInfo const*) (ConstantFolding.cpp:1005)
==120363==    by 0x112DF70: (anonymous namespace)::ConstantFoldInstOperandsImpl(llvm::Value const*, unsigned int, llvm::ArrayRef<llvm::Constant*>, llvm::DataLayout const&, llvm::TargetLibraryInfo const*) (ConstantFolding.cpp:1039)
==120363==    by 0x112C165: (anonymous namespace)::ConstantFoldConstantImpl(llvm::Constant const*, llvm::DataLayout const&, llvm::TargetLibraryInfo const*, llvm::SmallDenseMap<llvm::Constant*, llvm::Constant*, 4u, llvm::DenseMapInfo<llvm::Constant*>, llvm::detail::DenseMapPair<llvm::Constant*, llvm::Constant*> >&) [clone .part.0] (ConstantFolding.cpp:1114)
==120363==    by 0x112C5CF: llvm::ConstantFoldConstant(llvm::Constant const*, llvm::DataLayout const&, llvm::TargetLibraryInfo const*) (ConstantFolding.cpp:1194)
==120363==    by 0x188F410: prepareICWorklistFromFunction (InstructionCombining.cpp:3584)
==120363==    by 0x188F410: combineInstructionsOverFunction(llvm::Function&, llvm::InstCombineWorklist&, llvm::AAResults*, llvm::AssumptionCache&, llvm::TargetLibraryInfo&, llvm::DominatorTree&, llvm::OptimizationRemarkEmitter&, llvm::BlockFrequencyInfo*, llvm::ProfileSummaryInfo*, unsigned int, llvm::LoopInfo*) (InstructionCombining.cpp:3703)
==120363==    by 0x189205F: runOnFunction (InstructionCombining.cpp:3789)
==120363==    by 0x189205F: llvm::InstructionCombiningPass::runOnFunction(llvm::Function&) (InstructionCombining.cpp:3768)
==120363==    by 0x16F4352: llvm::FPPassManager::runOnFunction(llvm::Function&) (LegacyPassManager.cpp:1482)
==120363==    by 0x16F4DE8: llvm::FPPassManager::runOnModule(llvm::Module&) (LegacyPassManager.cpp:1518)
==120363==    by 0x16F51A2: runOnModule (LegacyPassManager.cpp:1583)
==120363==    by 0x16F51A2: llvm::legacy::PassManagerImpl::run(llvm::Module&) (LegacyPassManager.cpp:1695)
==120363==    by 0x1FF4CFE: EmitAssembly (BackendUtil.cpp:954)
==120363==    by 0x1FF4CFE: clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::DataLayout const&, llvm::Module*, clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream> >) (BackendUtil.cpp:1677)
==120363==    by 0x2C471A8: clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (CodeGenAction.cpp:335)
==120363==  Uninitialised value was created by a stack allocation
==120363==    at 0x112C653: (anonymous namespace)::SymbolicallyEvaluateGEP(llvm::GEPOperator const*, llvm::ArrayRef<llvm::Constant*>, llvm::DataLayout const&, llvm::TargetLibraryInfo const*) (ConstantFolding.c

I'm trying to reduce this now.
Comment 27 commit-hook freebsd_committer 2020-06-03 16:52:32 UTC
A commit references this bug:

Author: dim
Date: Wed Jun  3 16:51:30 UTC 2020
New revision: 361755
URL: https://svnweb.freebsd.org/changeset/base/361755

Log:
  Disable clang's -fintegrated-cc1 stage by default

  In bug 246630, it was found that part of the rescue binary could be
  compiled to very slightly different (but still equivalent) machine code,
  depending on the number of simultaneous make jobs (via the -j option).

  This turned out to be caused by the upstream change that made clang's
  first stage compiler (i.e. the -cc1 stage) run as part of the initial
  clang process invocation, instead of forking and exec'ing a new clang
  process.

  We are currently investigating the root cause for the difference in
  output, but while that is ongoing, disable the integrated cc1 stage for
  now to work around it. You can always turn it on explicitly by using the
  -fintegrated-cc1 option, or turn it off with -fno-integrated-cc1.

  Direct commit to stable/{11,12}, so this can hopefully end up in the
  upcoming 11.4-RELEASE.

  Reported by:	Fabian Keil <fk@fabiankeil.de>
  PR:		246630

Changes:
  stable/11/lib/clang/include/clang/Config/config.h
  stable/12/lib/clang/include/clang/Config/config.h
Comment 28 commit-hook freebsd_committer 2020-06-03 21:23:03 UTC
A commit references this bug:

Author: dim
Date: Wed Jun  3 21:22:15 UTC 2020
New revision: 361772
URL: https://svnweb.freebsd.org/changeset/base/361772

Log:
  MF11 r361755:

  Disable clang's -fintegrated-cc1 stage by default

  In bug 246630, it was found that part of the rescue binary could be
  compiled to very slightly different (but still equivalent) machine code,
  depending on the number of simultaneous make jobs (via the -j option).

  This turned out to be caused by the upstream change that made clang's
  first stage compiler (i.e. the -cc1 stage) run as part of the initial
  clang process invocation, instead of forking and exec'ing a new clang
  process.

  We are currently investigating the root cause for the difference in
  output, but while that is ongoing, disable the integrated cc1 stage for
  now to work around it. You can always turn it on explicitly by using the
  -fintegrated-cc1 option, or turn it off with -fno-integrated-cc1.

  Direct commit to stable/{11,12}, so this can hopefully end up in the
  upcoming 11.4-RELEASE.

  Approved by:	re (gjb)
  Reported by:	Fabian Keil <fk@fabiankeil.de>
  PR:		246630

Changes:
_U  releng/11.4/
  releng/11.4/lib/clang/include/clang/Config/config.h
Comment 29 Fabian Keil 2020-06-04 07:13:43 UTC
As reported in comment 22, the "#define CLANG_SPAWN_CC1 1" change seems to result in worse results on stable/11 for me.

I've just confirmed this with a system based on r361755: building it on two systems, one with 2 cpu cores, one with 4, resulted in an unreproducible /bin/sh, unreproducible rescue binaries and an unreproducible /usr/bin/printf.

Maybe there's a relevant difference between the clang 10.0.0 in stable/11 and the more recent one in HEAD.
Comment 30 Dimitry Andric freebsd_committer 2020-06-04 18:15:45 UTC
(In reply to Fabian Keil from comment #29)
That's very strange. I could fairly easily reproduce the different printf.o output by building either with "make -j2", or with "make" on the same machine. 

Since "make -j" appears to mess around with ttys and stderr, it could explain the slightly different memory contents of the program at startup. In any case, removing the integrated cc1 stage always resulted in the same printf.o output.

Are you saying you ran the same build on different machines? Are you 100% sure that all the includes pulled into printf.c were exactly the same? Is there any way you can do a sha256sum of the .i files?


> Maybe there's a relevant difference between the clang 10.0.0 in stable/11 and the more recent one in HEAD.

I don't think so, head has 10.0.1-rc1, but it showed precisely the effects I mentioned before. Disabling integrated cc1 fixed it for me.
Comment 31 Fabian Keil 2020-06-05 14:56:41 UTC
(In reply to Dimitry Andric from comment #30)

Yes, I used different machines but the same source.

Using jails, I built a release, installed the release (in the jail) and built the release again (because my build scripts aren't setting WITHOUT_SYSTEM_COMPILER, yet).

The generated source tar balls were identical so the printf includes should be as well.

I assume you are talking about the ".i" files from clang and will upload the SHA256 sums. They were identical on both machines.
Comment 32 Fabian Keil 2020-06-05 14:59:04 UTC
Created attachment 215251 [details]
SHA256 sums of clang interface .i files from a system based on r361755
Comment 33 Dimitry Andric freebsd_committer 2020-06-13 22:30:47 UTC
Created attachment 215531 [details]
Merge llvm-project commit 0cecafd647cc

I have been able to consistently reproduce the different outputs with a script, and by bisecting on a 11.x machine (which took a LONG time!) I found that upstream's https://reviews.llvm.org/rG0cecafd647cc fixes it for my case.

That is, resetting CLANG_SPAWN_CC1 back to 0 and merging the non-test parts of the upstream change into our contrib/llvm-project seems to make this output reproducible again for me.

I would appreciate it if you could apply this on your various systems as a test-run, and check if it now produces consistent output every time.
Comment 34 Fabian Keil 2020-06-15 07:35:40 UTC
I applied the patch on my systems and indeed got reproducible binaries.

Thanks a lot for tracking this down.
Comment 35 commit-hook freebsd_committer 2020-06-18 20:41:49 UTC
A commit references this bug:

Author: dim
Date: Thu Jun 18 20:41:44 UTC 2020
New revision: 362341
URL: https://svnweb.freebsd.org/changeset/base/362341

Log:
  Merge commit 0cecafd647cc from llvm git (by Alina Sbirlea):

    [BasicAA] Make BasicAA a cfg pass.

    Summary:
    Part of the changes in D44564 made BasicAA not CFG only due to it
    using PhiAnalysisValues which may have values invalidated. Subsequent
    patches (rL340613) appear to have addressed this limitation.

    BasicAA should not be invalidated by non-CFG-altering passes. A
    concrete example is MemCpyOpt which preserves CFG, but we are testing
    it invalidates BasicAA.

    llvm-dev RFC:
    https://groups.google.com/forum/#!topic/llvm-dev/eSPXuWnNfzM

    Reviewers: john.brawn, sebpop, hfinkel, brzycki

    Subscribers: hiraditya, llvm-commits

    Tags: #llvm

    Differential Revision: https://reviews.llvm.org/D74353

  This fixes an issue with clang's -fintegrated-cc1 feature, which could
  make it output slightly different assembly code, depending on the way it
  was invoked.

  In r361755 we attempted to work around it by disabling the integrated
  cc1 stage, but it did not solve the root cause for all situations.

  Extensive testing and bisecting showed that the above change finally
  makes the output deterministic, even if -fintegrated-cc1 is on.

  Reported by:	Fabian Keil <fk@fabiankeil.de>
  PR:		246630
  MFC after:	3 days

Changes:
  head/contrib/llvm-project/llvm/lib/Analysis/BasicAliasAnalysis.cpp
Comment 36 commit-hook freebsd_committer 2020-06-21 09:25:16 UTC
A commit references this bug:

Author: dim
Date: Sun Jun 21 09:24:48 UTC 2020
New revision: 362461
URL: https://svnweb.freebsd.org/changeset/base/362461

Log:
  MFC r362341:

  Merge commit 0cecafd647cc from llvm git (by Alina Sbirlea):

    [BasicAA] Make BasicAA a cfg pass.

    Summary:
    Part of the changes in D44564 made BasicAA not CFG only due to it
    using PhiAnalysisValues which may have values invalidated. Subsequent
    patches (rL340613) appear to have addressed this limitation.

    BasicAA should not be invalidated by non-CFG-altering passes. A
    concrete example is MemCpyOpt which preserves CFG, but we are testing
    it invalidates BasicAA.

    llvm-dev RFC:
    https://groups.google.com/forum/#!topic/llvm-dev/eSPXuWnNfzM

    Reviewers: john.brawn, sebpop, hfinkel, brzycki

    Subscribers: hiraditya, llvm-commits

    Tags: #llvm

    Differential Revision: https://reviews.llvm.org/D74353

  This fixes an issue with clang's -fintegrated-cc1 feature, which could
  make it output slightly different assembly code, depending on the way it
  was invoked.

  In r361755 we attempted to work around it by disabling the integrated
  cc1 stage, but it did not solve the root cause for all situations.

  Extensive testing and bisecting showed that the above change finally
  makes the output deterministic, even if -fintegrated-cc1 is on.

  Reported by:	Fabian Keil <fk@fabiankeil.de>
  PR:		246630

Changes:
_U  stable/11/
_U  stable/11/contrib/llvm-project/llvm/
  stable/11/contrib/llvm-project/llvm/lib/Analysis/BasicAliasAnalysis.cpp
  stable/11/lib/clang/include/clang/Config/config.h
_U  stable/12/
  stable/12/contrib/llvm-project/llvm/lib/Analysis/BasicAliasAnalysis.cpp
  stable/12/lib/clang/include/clang/Config/config.h