Bug 226059 - Clang produces vmovaps with unaligned operand
Summary: Clang produces vmovaps with unaligned operand
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: bin (show other bugs)
Version: CURRENT
Hardware: Any Any
: --- Affects Only Me
Assignee: Dimitry Andric
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-02-20 10:34 UTC by Gleb Popov
Modified: 2018-08-08 15:18 UTC (History)
3 users (show)

See Also:


Attachments
Tarball with requested files (785.50 KB, application/octet-stream)
2018-03-04 19:23 UTC, Gleb Popov
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Gleb Popov freebsd_committer freebsd_triage 2018-02-20 10:34:23 UTC
I am compiling lang/ghc with CPUTYPE?=native set. The ghc-pkg executable produced during build gets broken. During staging step, the following command is run

/wrkdirs/usr/ports/lang/ghc/work/stage/usr/local/lib/ghc-8.0.2/bin/ghc-pkg --force --global-package-db "/wrkdirs/usr/ports/lang/ghc/work/stage/usr/local/lib/ghc-8.0.2/package.conf.d" update rts/dist/package.conf.install

but it fails with SIGBUS. Debugging under LLDB gives

(lldb) run
Process 29833 stopped
* thread #1, name = 'ghc-pkg', stop reason = signal SIGBUS: hardware error
    frame #0: 0x0000000804e476c6 libHSrts-ghc8.0.2.so`initGcThreads [inlined] new_gc_thread(n=0) at GC.c:818
   815          ws->todo_q = newWSDeque(128);
   816          ws->todo_overflow = NULL;
   817          ws->n_todo_overflow = 0;
-> 818          ws->todo_large_objects = NULL;
   819 
   820          ws->part_list = NULL;
   821          ws->n_part_blocks = 0;
(lldb) disas -F intel
->  0x804e476c6 <+326>: vmovaps ymmword ptr [rbx + 0x40], ymm0

The value of rbx + 0x40 mod 16 = 0, but rbx + 0x40 mod 32 = 16, which is a source of the error, I suspect.

Steps to reproduce:

# echo CPUTYPE?=native >> /usr/local/etc/poudriere.d/<yourjail>-<yourports>-make.conf
# poudriere testport -j yourjail -p yourports -I lang/ghc

Once that fails, enter the jail and

# cd /wrkdirs/usr/ports/lang/ghc/work/ghc-8.0.2
# lldb -- /wrkdirs/usr/ports/lang/ghc/work/stage/usr/local/lib/ghc-8.0.2/bin/ghc-pkg --force --global-package-db "/wrkdirs/usr/ports/lang/ghc/work/stage/usr/local/lib/ghc-8.0.2/package.conf.d" update rts/dist/package.conf.install

The same bug is present on 11.1-RELEASE too.
Comment 1 Dimitry Andric freebsd_committer freebsd_triage 2018-02-21 07:35:25 UTC
I assume this is occurring on i386?  And what actual CPU type do you have?

If you want to see what clang auto-detects, run:

clang -v -march=native -c -x c /dev/null -o /dev/null 2>&1 | grep target-cpu

and look for the -target-cpu option.
Comment 2 Gleb Popov freebsd_committer freebsd_triage 2018-02-21 08:42:00 UTC
(In reply to Dimitry Andric from comment #1)

Nope, I'm on amd64:

"/usr/bin/clang" -cc1 -triple x86_64-unknown-freebsd12.0 <...> -target-cpu skylake <...>
Comment 3 Dimitry Andric freebsd_committer freebsd_triage 2018-02-21 14:15:25 UTC
Unfortunately I do not have skylake or AVX2 capable hardware at this moment, and -march=native on my IvyBridge machine works just fine (i.e, no SIGBUS).

Having this reduced to a somewhat smaller test case would be nice...
Comment 4 Dimitry Andric freebsd_committer freebsd_triage 2018-02-23 10:15:31 UTC
(In reply to arrowd from comment #2)
> Nope, I'm on amd64:
> 
> "/usr/bin/clang" -cc1 -triple x86_64-unknown-freebsd12.0 <...> -target-cpu
> skylake <...>

Is it possible for you to figure out how GC.c is compiled on your system by the ghc build process, and then manually run the same command, adding "-v -save-temps" ?

Then please put the .c, .ii, .s and .o files in a tarball, together with a log of the full compiler output (e.g. the intermediate command lines that it shows via -v), and attach that.
Comment 5 Gleb Popov freebsd_committer freebsd_triage 2018-03-04 19:23:05 UTC
Created attachment 191205 [details]
Tarball with requested files

Here is the tarball with files you requested.
Comment 6 Conrad Meyer freebsd_committer freebsd_triage 2018-08-05 01:55:42 UTC
&the_gc_thread is cast to gc_thread *t.

new_gc_thread((gc_thread *)&the_gc_thread) =>
  ws = &t->gens[...]

ws is gen_workspace, which is __aligned(64) (supposedly).  (This is why Clang is able to generate the aligned AVX op.)

Neither gc_thread nor gc_thread::gens are tagged with any explicit alignment constraint.

the_gc_thread is declared as 'StgWord8 the_gc_thread[sizeof(gc_thread) + 64 * sizeof(gen_workspace)];', which has no alignment requirements.

That's the problem.  This is bogus code in GHC.
Comment 7 Gleb Popov freebsd_committer freebsd_triage 2018-08-08 14:04:30 UTC
Fixed upstream: https://ghc.haskell.org/trac/ghc/ticket/15482

Thanks cem for your analysis.
Comment 8 Conrad Meyer freebsd_committer freebsd_triage 2018-08-08 15:08:02 UTC
(In reply to Gleb Popov from comment #7)
Happy to help!  It was a fun puzzle and you did all the hard work for me :-).
Comment 9 Conrad Meyer freebsd_committer freebsd_triage 2018-08-08 15:18:41 UTC
(In reply to Gleb Popov from comment #7)
Please note that the upstream GHC fix is incorrect and continue to follow-up with them about that.

Their change c6cc93bca only aligns the array to W_ aka StgWord aka StgWord64 aka unsigned long (8 bytes).

This is insufficient for AVX2 alignment[1] (16 bytes for xmm, 32 for ymm) and still violates the guarantee attached to the gen_workspace structure (64 byte alignment).  They need to remove the 64-byte gen_workspace alignment or add 64-byte alignment to the array to remove their UB.  (They could align both to the smaller 32 bytes and still allow the compiler to take advantage of AVX2.)

I don't know what lead them to believe an 8-byte alignment would fix an unaligned 32-byte AVX access.

[1]: https://www.felixcloutier.com/x86/MOVAPS.html