Summary: | Clang produces vmovaps with unaligned operand | ||||||
---|---|---|---|---|---|---|---|
Product: | Base System | Reporter: | Gleb Popov <arrowd> | ||||
Component: | bin | Assignee: | Dimitry Andric <dim> | ||||
Status: | Closed FIXED | ||||||
Severity: | Affects Only Me | CC: | cem, dim, eadler | ||||
Priority: | --- | ||||||
Version: | CURRENT | ||||||
Hardware: | Any | ||||||
OS: | Any | ||||||
See Also: | https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229788 | ||||||
Description
Gleb Popov
2018-02-20 10:34:23 UTC
I assume this is occurring on i386? And what actual CPU type do you have? If you want to see what clang auto-detects, run:

    clang -v -march=native -c -x c /dev/null -o /dev/null 2>&1 | grep target-cpu

and look for the -target-cpu option.

(In reply to Dimitry Andric from comment #1)
Nope, I'm on amd64:

    "/usr/bin/clang" -cc1 -triple x86_64-unknown-freebsd12.0 <...> -target-cpu skylake <...>

Unfortunately I do not have Skylake or AVX2-capable hardware at the moment, and -march=native on my Ivy Bridge machine works just fine (i.e., no SIGBUS). Having this reduced to a somewhat smaller test case would be nice...

(In reply to arrowd from comment #2)
> Nope, I'm on amd64:
>
> "/usr/bin/clang" -cc1 -triple x86_64-unknown-freebsd12.0 <...> -target-cpu
> skylake <...>

Is it possible for you to figure out how GC.c is compiled on your system by the ghc build process, and then manually run the same command, adding "-v -save-temps"? Then please put the .c, .ii, .s and .o files in a tarball, together with a log of the full compiler output (e.g. the intermediate command lines that it shows via -v), and attach that.

Created attachment 191205
Tarball with requested files
Here is the tarball with files you requested.
&the_gc_thread is cast to gc_thread *t:

    new_gc_thread((gc_thread *)&the_gc_thread) => ws = &t->gens[...]

ws points to a gen_workspace, which is __aligned(64) (supposedly). (This is why Clang is able to generate the aligned AVX op.) Neither gc_thread nor gc_thread::gens is tagged with any explicit alignment constraint.

the_gc_thread is declared as 'StgWord8 the_gc_thread[sizeof(gc_thread) + 64 * sizeof(gen_workspace)];', which has no alignment requirements. That's the problem. This is bogus code in GHC.

Fixed upstream: https://ghc.haskell.org/trac/ghc/ticket/15482

Thanks cem for your analysis.

(In reply to Gleb Popov from comment #7)
Happy to help! It was a fun puzzle and you did all the hard work for me :-).

(In reply to Gleb Popov from comment #7)
Please note that the upstream GHC fix is incorrect, and continue to follow up with them about that. Their change c6cc93bca only aligns the array to W_ aka StgWord aka StgWord64 aka unsigned long (8 bytes). This is insufficient for AVX2 alignment[1] (16 bytes for xmm, 32 for ymm) and still violates the guarantee attached to the gen_workspace structure (64-byte alignment).

They need to either remove the 64-byte gen_workspace alignment or add 64-byte alignment to the array to remove their UB. (They could align both to the smaller 32 bytes and still allow the compiler to take advantage of AVX2.) I don't know what led them to believe an 8-byte alignment would fix an unaligned 32-byte AVX access.

[1]: https://www.felixcloutier.com/x86/MOVAPS.html
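For anyone following along, the mismatch described above can be sketched in a few lines of C11. This is only an illustration: `workspace`, `thread_state`, `raw_storage`, `aligned_storage` and `is_aligned` are hypothetical stand-ins, not the GHC RTS definitions, and the element count of 4 is arbitrary.

```c
#include <assert.h>   /* static_assert */
#include <stdalign.h> /* alignas, alignof */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for gen_workspace: the type promises 64-byte alignment. */
typedef struct {
    alignas(64) unsigned char payload[64];
} workspace;

/* Hypothetical stand-in for gc_thread: ends in an array of workspaces. */
typedef struct {
    int id;
    workspace gens[]; /* flexible array member */
} thread_state;

/* Problematic pattern: a plain byte array guarantees only 1-byte alignment,
 * so casting its address to thread_state * breaks the promise made above. */
uint8_t raw_storage[sizeof(thread_state) + 4 * sizeof(workspace)];

/* Repaired pattern: give the backing array the alignment of the type
 * that will be stored in it. */
alignas(thread_state) uint8_t aligned_storage[sizeof(thread_state) + 4 * sizeof(workspace)];

/* An 8-byte (word) alignment on the array would still fall short of
 * what the struct type promises. */
static_assert(alignof(uint64_t) < alignof(workspace),
              "word alignment is weaker than the workspace alignment");

/* Returns nonzero when p is a multiple of n bytes. */
int is_aligned(const void *p, size_t n)
{
    return ((uintptr_t)p % n) == 0;
}
```

With these declarations, `is_aligned(aligned_storage, 64)` always holds, whereas `is_aligned(raw_storage, 64)` depends on where the linker happens to place the array, which would be consistent with the crash showing up only in some builds.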