Bug 238082

Summary: graphics/mesa-dri: clang 8 (from head@r347549) crashes during build on powerpc64 (WITH_CLANG_IS_CC)
Product: Base System
Reporter: Mark Millard <marklmi26-fbsd>
Component: bin
Assignee: Dimitry Andric <dim>
Status: Closed FIXED
Severity: Affects Only Me
CC: dim, toolchain, x11
Priority: ---
Keywords: regression, toolchain
Version: CURRENT
Flags: dim: mfc-stable12+, dim: mfc-stable11+
Hardware: powerpc   
OS: Any   
See Also: https://bugs.llvm.org/show_bug.cgi?id=42010
Bug Depends on:    
Bug Blocks: 236062    
Attachments:
/tmp/nir_constant_expressions-9b094e.c from clang's saved failure information (flags: none)
/tmp/nir_constant_expressions-9b094e.sh from clang's saved failure information (flags: none)

Description Mark Millard 2019-05-23 19:27:08 UTC
Created attachment 204574 [details]
/tmp/nir_constant_expressions-9b094e.c from clang's saved failure information

[This is from the unusual context of a powerpc64 system (an old
PowerMac) built with system clang 8 and base/binutils instead of
gcc 4.2.1 based tooling.]

In a poudriere bulk build in a powerpc64 context (old PowerMac)
that was built with and uses system clang 8 and base/binutils
instead of the gcc 4.2.1 toolchain, I got:

[09:05:56] [04] [00:14:42] Saved graphics/mesa-dri | mesa-dri-18.3.2_2 wrkdir to: /usr/local/poudriere/data/wrkdirs/FBSDpowerpc64-default/default/mesa-dri-18.3.2_2.tbz
[09:05:57] [04] [00:14:43] Finished graphics/mesa-dri | mesa-dri-18.3.2_2: Failed: build
[09:05:59] [04] [00:14:45] Skipping x11-drivers/xf86-input-keyboard | xf86-input-keyboard-1.9.0_3: Dependent port graphics/mesa-dri | mesa-dri-18.3.2_2 failed
[09:05:59] [04] [00:14:45] Skipping x11-drivers/xf86-input-mouse | xf86-input-mouse-1.9.3_2: Dependent port graphics/mesa-dri | mesa-dri-18.3.2_2 failed
[09:05:59] [04] [00:14:45] Skipping x11-drivers/xf86-video-scfb | xf86-video-scfb-0.0.4_7: Dependent port graphics/mesa-dri | mesa-dri-18.3.2_2 failed
[09:05:59] [04] [00:14:45] Skipping x11-drivers/xf86-video-vesa | xf86-video-vesa-2.4.0_2: Dependent port graphics/mesa-dri | mesa-dri-18.3.2_2 failed
[09:05:59] [04] [00:14:45] Skipping x11/xorg-minimal | xorg-minimal-7.5.2_2: Dependent port graphics/mesa-dri | mesa-dri-18.3.2_2 failed
[09:05:59] [04] [00:14:45] Skipping x11-servers/xorg-server | xorg-server-1.18.4_11,1: Dependent port graphics/mesa-dri | mesa-dri-18.3.2_2 failed

I do have a backtrace:

. . .
Core was generated by `/usr/bin/cc -cc1 -triple powerpc64-unknown-freebsd13.0 -emit-obj -disable-free -'.
Program terminated with signal SIGABRT, Aborted.
#0  .__sys_thr_kill () at thr_kill.S:3
3	RSYSCALL(thr_kill)
(gdb) bt
#0  .__sys_thr_kill () at thr_kill.S:3
#1  0x00000000133072d0 in __raise (s=330578472) at /usr/src/lib/libc/gen/raise.c:52
#2  0x00000000132c7898 in abort () at /usr/src/lib/libc/stdlib/abort.c:79
#3  0x00000000132f6c64 in __assert (func=<optimized out>, file=<optimized out>, line=<optimized out>, failedexpr=<optimized out>) at /usr/src/lib/libc/gen/assert.c:51
#4  0x00000000130f7c18 in WidenVectorResult () at /usr/src/contrib/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp:2531
#5  0x0000000012ad91f0 in run () at /usr/src/contrib/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:281
#6  0x0000000012adfa5c in LegalizeTypes () at /usr/src/contrib/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:1115
#7  0x000000001297ebb4 in CodeGenAndEmitDAG () at /usr/src/contrib/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:776
#8  0x000000001297e114 in SelectBasicBlock () at /usr/src/contrib/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:669
#9  0x000000001297cbc4 in SelectAllBasicBlocks () at /usr/src/contrib/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:1784
#10 0x0000000000000000 in ?? ()

(gdb) up 4
#4  0x00000000130f7c18 in WidenVectorResult () at /usr/src/contrib/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp:2531
2531	      assert(!TLI.isOperationLegalOrCustom(N->getOpcode(), WideVecVT) &&

(gdb) list
2526	    // libcalls on the undef elements. We are assuming that if the scalar op
2527	    // requires expanding, then the vector op needs expanding too.
2528	    EVT VT = N->getValueType(0);
2529	    if (TLI.isOperationExpand(N->getOpcode(), VT.getScalarType())) {
2530	      EVT WideVecVT = TLI.getTypeToTransformTo(*DAG.getContext(), VT);
2531	      assert(!TLI.isOperationLegalOrCustom(N->getOpcode(), WideVecVT) &&
2532	             "Target supports vector op, but scalar requires expansion?");
2533	      Res = DAG.UnrollVectorOp(N, WideVecVT.getVectorNumElements());
2534	      break;
2535	    }



Unfortunately, poudriere bulk's tar archives of failures do not
capture the /tmp/* material referenced in:

cc: error: unable to execute command: Abort trap (core dumped)
cc: error: clang frontend command failed due to signal (use -v to see invocation)
FreeBSD clang version 8.0.0 (tags/RELEASE_800/final 356365) (based on LLVM 8.0.0)
Target: powerpc64-unknown-freebsd13.0
Thread model: posix
InstalledDir: /usr/bin
cc: note: diagnostic msg: PLEASE submit a bug report to https://bugs.freebsd.org/submit/ and include the crash backtrace, preprocessed source, and associated run script.
cc: note: diagnostic msg: 
********************

PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
cc: note: diagnostic msg: /tmp/nir_constant_expressions-b77016.c
cc: note: diagnostic msg: /tmp/nir_constant_expressions-b77016.sh
cc: note: diagnostic msg: 

********************
gmake[5]: *** [Makefile:2829: nir/nir_constant_expressions.lo] Error 1
gmake[5]: *** Waiting for unfinished jobs....
gmake[5]: Leaving directory '/wrkdirs/us

But I managed to repeat the failure from the context of the tar
archive that poudriere bulk produced.
Comment 1 Mark Millard 2019-05-23 19:28:28 UTC
Created attachment 204575 [details]
/tmp/nir_constant_expressions-9b094e.sh from clang's saved failure information
Comment 2 Jan Beich freebsd_committer freebsd_triage 2019-05-23 19:48:34 UTC
Over to toolchain@ but CC'ing port maintainer as well.
Comment 3 Jan Beich freebsd_committer freebsd_triage 2019-05-24 10:35:04 UTC
Clang 6.0.1 on 12.0 and devel/llvm70 before ports r490610 don't crash. Can someone bisect, minimize and submit upstream for feedback?

$ sh nir_constant_expressions-9b094e.sh
Assertion failed: (!TLI.isOperationLegalOrCustom(N->getOpcode(), WideVecVT) && "Target supports vector op, but scalar requires expansion?"), function WidenVectorResult, file /usr/src/contrib/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp, line 2532.
Abort trap
Comment 4 Dimitry Andric freebsd_committer freebsd_triage 2019-05-24 20:20:09 UTC
Reproduced, minimized and reported upstream here: https://bugs.llvm.org/show_bug.cgi?id=42010
Comment 5 commit-hook freebsd_committer freebsd_triage 2019-05-26 15:46:02 UTC
A commit references this bug:

Author: dim
Date: Sun May 26 15:44:58 UTC 2019
New revision: 348288
URL: https://svnweb.freebsd.org/changeset/base/348288

Log:
  Pull in r361696 from upstream llvm trunk (by Sanjay Patel):

    [SelectionDAG] soften assertion when legalizing narrow vector FP ops

    The test based on PR42010:
    https://bugs.llvm.org/show_bug.cgi?id=42010

    ...may show an inaccuracy for PPC's target defs, but we should not be
    so aggressive with an assert here. There's no telling what
    out-of-tree targets look like.

  This fixes an assertion when building the graphics/mesa-dri port for
  PowerPC64.

  Reported by:	Mark Millard <marklmi26-fbsd@yahoo.com>
  PR:		238082
  MFC after:	3 days

Changes:
  head/contrib/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
Comment 6 Mark Millard 2019-05-28 19:44:24 UTC
(In reply to commit-hook from comment #5)

I finally got around to testing the patch with a
rebuild attempt for graphics/mesa-dri . It just
failed a different assert:

Core was generated by `/usr/bin/cc -cc1 -triple powerpc64-unknown-freebsd13.0 -emit-obj -disable-free -'.
Program terminated with signal SIGABRT, Aborted.
#0  .__sys_thr_kill () at thr_kill.S:3
3	RSYSCALL(thr_kill)
(gdb) bt
#0  .__sys_thr_kill () at thr_kill.S:3
#1  0x00000000132c7898 in abort () at /usr/src/lib/libc/stdlib/abort.c:80
#2  0x00000000132d34bc in fprintf (fp=<optimized out>, fmt=<optimized out>) at /usr/src/lib/libc/stdio/fprintf.c:57
#3  0x00000000132f6c64 in __assert (func=<optimized out>, file=<optimized out>, line=<optimized out>, failedexpr=<optimized out>) at /usr/src/lib/libc/gen/assert.c:51
#4  0x00000000130f7c18 in getVectorNumElements () at /usr/src/contrib/llvm/include/llvm/CodeGen/ValueTypes.h:274
#5  WidenVectorResult () at /usr/src/contrib/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp:2531
#6  0x0000000000000000 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
. . .
#4  0x00000000130f7c18 in getVectorNumElements () at /usr/src/contrib/llvm/include/llvm/CodeGen/ValueTypes.h:274
274	      assert(isVector() && "Invalid vector type!");
(gdb) list
269	      return getExtendedVectorElementType();
270	    }
271	
272	    /// Given a vector type, return the number of elements it contains.
273	    unsigned getVectorNumElements() const {
274	      assert(isVector() && "Invalid vector type!");
275	      if (isSimple())
276	        return V.getVectorNumElements();
277	      return getExtendedVectorNumElements();
278	    }
Comment 7 Dimitry Andric freebsd_committer freebsd_triage 2019-05-28 20:27:09 UTC
(In reply to Mark Millard from comment #6)
> I finally got around to testing the patch with a
> rebuild attempt for graphics/mesa-dri . It just
> failed a different assert:
...
> 269	      return getExtendedVectorElementType();
> 270	    }
> 271	
> 272	    /// Given a vector type, return the number of elements it contains.
> 273	    unsigned getVectorNumElements() const {
> 274	      assert(isVector() && "Invalid vector type!");

Right, more tunnels in the rabbit hole. :)  Can you please upload these new .c and .sh files dropped after the assertion?
Comment 8 Mark Millard 2019-05-28 20:32:52 UTC
(In reply to Mark Millard from comment #6)

This looks like possible operator error on my part, since a retry
did not fail. (I went back through applying the patch to what
poudriere bulk sees as the system clang 8.)

But in trying to see what might be going on, I noticed there
is logic like:

# need LLVM for libEGL wherever possible, but mixing GCC and LLVM breaks Gallium
.if ${CHOSEN_COMPILER_TYPE} == clang \
 || (${COMPONENT} == libs && ${ARCH} != sparc64)        # no working LLVM
MESA_LLVM_VER?= 60
.endif

.if "${MESA_LLVM_VER}" != ""
BUILD_DEPENDS+= llvm${MESA_LLVM_VER}>=3.9.0_4:devel/llvm${MESA_LLVM_VER}
.if ${COMPONENT} != libs
RUN_DEPENDS+=   llvm${MESA_LLVM_VER}>=3.9.0_4:devel/llvm${MESA_LLVM_VER}
.endif
CONFIGURE_ENV+= LLVM_CONFIG=${LOCALBASE}/bin/llvm-config${MESA_LLVM_VER}
LDFLAGS+=       -Wl,-rpath=${LOCALBASE}/llvm${MESA_LLVM_VER}/lib
CONFIGURE_ARGS+=        --enable-llvm
.else
CONFIGURE_ARGS+=        --disable-llvm
.endif


So it may be that devel/llvm60 ( or devel/llvm* ) needs to be patched
in order to work for contexts that would use it.

It seems that devel/llvm60 got its pkg added but mesa-dri actually
used what poudriere bulk saw as the system cc (system-clang-8 in my
context).

As for my 2nd test:

[00:01:44] [01] [00:00:00] Building graphics/mesa-dri | mesa-dri-18.3.2_2
[00:23:45] [01] [00:22:01] Finished graphics/mesa-dri | mesa-dri-18.3.2_2: Success

So I guess I somehow had the context for poudriere bulk messed up
relative to the system cc that it saw.
Comment 9 Mark Millard 2019-05-28 20:38:41 UTC
(In reply to Dimitry Andric from comment #7)

poudriere bulk does not capture the /tmp/* files.

When I attempted to reproduce outside poudriere via
an expansion of the tar it produces (as my starting
point), it failed to reproduce. Ultimately that led
to my comment #8 material, including a successful
build via poudriere bulk as well.

At this point I do not know how to repeat the newly
observed failure point.
Comment 10 commit-hook freebsd_committer freebsd_triage 2019-05-29 18:24:16 UTC
A commit references this bug:

Author: dim
Date: Wed May 29 18:23:18 UTC 2019
New revision: 348367
URL: https://svnweb.freebsd.org/changeset/base/348367

Log:
  MFC r348288:

  Pull in r361696 from upstream llvm trunk (by Sanjay Patel):

    [SelectionDAG] soften assertion when legalizing narrow vector FP ops

    The test based on PR42010:
    https://bugs.llvm.org/show_bug.cgi?id=42010

    ...may show an inaccuracy for PPC's target defs, but we should not be
    so aggressive with an assert here. There's no telling what
    out-of-tree targets look like.

  This fixes an assertion when building the graphics/mesa-dri port for
  PowerPC64.

  Reported by:	Mark Millard <marklmi26-fbsd@yahoo.com>
  PR:		238082

Changes:
_U  stable/12/
  stable/12/contrib/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
Comment 11 commit-hook freebsd_committer freebsd_triage 2019-05-29 18:33:26 UTC
A commit references this bug:

Author: dim
Date: Wed May 29 18:32:44 UTC 2019
New revision: 348368
URL: https://svnweb.freebsd.org/changeset/base/348368

Log:
  MFC r348288:

  Pull in r361696 from upstream llvm trunk (by Sanjay Patel):

    [SelectionDAG] soften assertion when legalizing narrow vector FP ops

    The test based on PR42010:
    https://bugs.llvm.org/show_bug.cgi?id=42010

    ...may show an inaccuracy for PPC's target defs, but we should not be
    so aggressive with an assert here. There's no telling what
    out-of-tree targets look like.

  This fixes an assertion when building the graphics/mesa-dri port for
  PowerPC64.

  Approved by:	re (kib)
  Reported by:	Mark Millard <marklmi26-fbsd@yahoo.com>
  PR:		238082

Changes:
_U  stable/11/
  stable/11/contrib/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
Comment 12 Dimitry Andric freebsd_committer freebsd_triage 2019-05-29 18:39:15 UTC
(In reply to Mark Millard from comment #9)
> At this point I do not know how to repeat the newly
> observed failure point.

Ok, closing this bug now, as I merged the fix to both stable/11 and stable/12.