Bug 258358 - Fails to build on riscv: arm_bf16.h: sh: clang-tblgen: Exec format error
Summary: Fails to build on riscv: arm_bf16.h: sh: clang-tblgen: Exec format error
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: riscv
Version: CURRENT
Hardware: riscv Any
Importance: --- Affects Some People
Assignee: freebsd-toolchain (Nobody)
Keywords: needs-qa, regression
Depends on:
Reported: 2021-09-08 02:49 UTC by Klaus Küchemann
Modified: 2022-10-04 10:16 UTC
CC List: 8 users

See Also:
Flags: koobs: maintainer-feedback? (arichardson)
       koobs: maintainer-feedback? (toolchain)


Description Klaus Küchemann 2021-09-08 02:49:06 UTC
root@rv64:/usr/src # uname -paKU
FreeBSD rv64 14.0-CURRENT FreeBSD 14.0-CURRENT #3 main-n249176-936f4a42fa2a: Mon Sep  6 23:02:54 UTC 2021     root@rv64:/usr/obj/usr/src/riscv.riscv64/sys/GENERIC-NODEBUG  riscv riscv64 1400032 1400032
root@rv64:/usr/src # sysctl hw
hw.machine: riscv
hw.ncpu: 4
hw.byteorder: 1234
hw.physmem: 8578408448
root@rv64:/usr/src # git show HEAD
commit 85bea309f935111cb362035795a59c263536b065 (HEAD -> main, origin/main, origin/HEAD)
Date:   Tue Sep 7 17:28:50 2021 +0100
Cross-built from aarch64 to RISC-V into an NFS directory, then PXE-booted
(make buildkernel/installkernel directly on the hardware was successful)

-- error: --
root@rv64:/usr/src # make -j4 buildworld
===> usr.sbin/autofs (includes)
--- includes_subdir_lib ---
--- arm_bf16.h ---
clang-tblgen -gen-arm-bf16  -I /usr/src/contrib/llvm-project/clang/include/clang/Basic -d arm_bf16.h.d  -o arm_bf16.h /usr/src/contrib/llvm-project/clang/include/clang/Basic/arm_bf16.td
ELF binary type "0" not known.
sh: clang-tblgen: Exec format error
*** [arm_bf16.h] Error code 126

make[5]: stopped in /usr/src/lib/clang/headers
      158.16 real       237.44 user        63.86 sys

make[1]: stopped in /usr/src


-- somewhat crude but effective workaround: --

root@rv64:/usr/src # clang-tblgen --version
LLVM (http://llvm.org/):
  LLVM version 12.0.1
  Optimized build with assertions.
  Default target: riscv64-unknown-freebsd14.0
  Host CPU: (unknown)

root@rv64:/usr/src # brandelf -v /usr/obj/usr/src/riscv.riscv64/tmp/obj-tools/usr.bin/clang/clang-tblgen/clang-tblgen
File '/usr/obj/usr/src/riscv.riscv64/tmp/obj-tools/usr.bin/clang/clang-tblgen/clang-tblgen' is of brand 'SVR4' (0). 

root@rv64:/usr/src # cp /usr/bin/clang-tblgen /usr/obj/usr/src/riscv.riscv64/tmp/obj-tools/usr.bin/clang/clang-tblgen/clang-tblgen   :-)

root@rv64:/usr/src # make -j4 buildworld NO_CLEAN=yes ....

This bug happens only when building directly on the hardware, not when cross-compiling.
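The `brandelf -v` output above reports the OS/ABI byte of the ELF identification header (`e_ident[EI_OSABI]`); 'SVR4' (0) means the byte is ELFOSABI_NONE, which appears to be the same value the kernel prints in `ELF binary type "0" not known.` A minimal Python sketch of that lookup follows. It assumes the name table is a subset of the one brandelf uses (values are from the ELF gABI), and `osabi_brand` / `osabi_brand_of_file` are hypothetical helper names, not existing tools.

```python
# Sketch: map the ELF OS/ABI byte to the brand names brandelf(1) prints.
# Assumption: this table is only a subset of brandelf's; values follow the ELF gABI.
EI_OSABI = 7  # index of the OS/ABI byte inside e_ident
OSABI_NAMES = {0: "SVR4", 3: "Linux", 6: "Solaris", 9: "FreeBSD"}


def osabi_brand(header: bytes) -> tuple[str, int]:
    """Return (name, value) for the OS/ABI byte of a 16-byte e_ident array."""
    if len(header) < 16 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    value = header[EI_OSABI]
    return OSABI_NAMES.get(value, "unknown"), value


def osabi_brand_of_file(path: str) -> tuple[str, int]:
    """Read only the e_ident array from disk and classify it."""
    with open(path, "rb") as f:
        return osabi_brand(f.read(16))
```

For the bootstrap clang-tblgen in this report one would expect `("SVR4", 0)`, i.e. no OS/ABI brand, so the kernel has to find some other way to recognise the binary.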
Comment 1 Klaus Küchemann 2021-09-08 03:22:34 UTC
(In reply to Klaus Küchemann from comment #0)

Well, of course the same happens for llvm-tblgen:

ELF binary type "0" not known.
sh: llvm-tblgen: Exec format error
*** [llvm/Frontend/OpenMP/OMP.h.inc] Error code 126

make[6]: stopped in /usr/src/lib/clang/libllvm
--- cddl/lib__L ---

make[4]: stopped in /usr/src/cddl/lib

make[3]: stopped in /usr/src
--- lib__L ---

root@rv64:/usr/src # brandelf -v /usr/obj/usr/src/riscv.riscv64/tmp/obj-tools/usr.bin/clang/llvm-tblgen/llvm-tblgen
File '/usr/obj/usr/src/riscv.riscv64/tmp/obj-tools/usr.bin/clang/llvm-tblgen/llvm-tblgen' is of brand 'SVR4' (0).

root@rv64:/usr/src # cp /usr/bin/llvm-tblgen /usr/obj/usr/src/riscv.riscv64/tmp/obj-tools/usr.bin/clang/llvm-tblgen/llvm-tblgen
...continue ...
Comment 2 Kubilay Kocak freebsd_committer freebsd_triage 2021-09-08 07:30:40 UTC
May be related to base 8e1c989abbd1db4dac5b2149886012d43e27b9a9
Comment 3 Alex Richardson freebsd_committer 2021-09-08 09:05:06 UTC
That suggests to me that none of the bootstrap tools are executable, since they are missing the notes section. You should see it with all other tools built during bootstrap, but since base 8e1c989abbd1db4dac5b2149886012d43e27b9a9, tblgen will probably be the first, since the host tools can be used for everything else. If you build with BOOTSTRAP_ALL_TOOLS=1 it should fall over earlier.

What does `readelf --notes /usr/obj/usr/src/riscv.riscv64/tmp/obj-tools/usr.bin/clang/clang-tblgen/clang-tblgen` print?
Comment 4 Jessica Clarke freebsd_committer 2021-09-08 09:29:25 UTC
This is a known issue. We build the bootstrap tool as a static binary and with --gc-sections. The latter is key, as ELF notes are not considered GC roots when they are COMDAT, which was made the case last year for the ABI note. Because RISC-V isn't using EI_OSABI on FreeBSD, there's then no branding in the binary for the kernel to recognise (you need at least one of EI_OSABI, the dynamic linker path or the ELF note). The fix for this is to stop making the note COMDAT and put it only in crti.

There is a workaround. Find the -Wl,--gc-sections (IIRC it’s in usr.bin/clang/llvm.build.mk) and add ,-melf64lriscv_fbsd to the end of it.
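The reasoning in comment 4 can be modelled as a small decision function: the kernel needs at least one of a FreeBSD EI_OSABI value, a dynamic linker (PT_INTERP) path, or the FreeBSD ABI note. A static bootstrap tool has no PT_INTERP, so once --gc-sections discards the note and EI_OSABI stays 0, nothing is left. Our understanding (an assumption worth confirming with `readelf -h`, which should then report a FreeBSD OS/ABI) is that the `_fbsd` suffix on the `-m` emulation makes the linker write ELFOSABI_FREEBSD (9) into EI_OSABI, which is why the workaround helps. The class and function names are hypothetical, and the check order is simplified relative to the real kernel.

```python
from dataclasses import dataclass
from typing import Optional

ELFOSABI_FREEBSD = 9
FREEBSD_INTERP = "/libexec/ld-elf.so.1"  # FreeBSD's run-time linker


@dataclass
class BrandInputs:
    osabi: int               # e_ident[EI_OSABI]
    interp: Optional[str]    # PT_INTERP path; None for static binaries
    has_abi_note: bool       # "FreeBSD" ABI note present in a PT_NOTE segment


def freebsd_brand_source(b: BrandInputs) -> Optional[str]:
    """Return which ELF field identifies the image as FreeBSD, or None.

    Simplified model: the real kernel supports more brands and
    interpreter paths, and applies its own order of checks.
    """
    if b.has_abi_note:
        return "abi-note"
    if b.osabi == ELFOSABI_FREEBSD:
        return "osabi"
    if b.interp == FREEBSD_INTERP:
        return "interp"
    return None  # corresponds to: ELF binary type "<osabi>" not known.
```

With these inputs, the bootstrap clang-tblgen from the report is `BrandInputs(0, None, False)` and yields `None`, while the same tool linked with `-melf64lriscv_fbsd` would become `BrandInputs(9, None, False)` and be branded via `"osabi"`.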
Comment 5 Klaus Küchemann 2021-09-08 23:48:53 UTC
While slow (no CPU frequency support yet), the build succeeded after overwriting /usr/obj/....clang-tblgen and llvm-tblgen
with clang-tblgen and llvm-tblgen from /usr/bin/:
>>> World build completed on Wed Sep  8 19:23:01 UTC 2021
>>> World built in 57683 seconds, ncpu: 4, make -j4

I guess I'll leave creating a patch to those of you who are familiar with the exact root cause of the issue.
Comment 6 Jessica Clarke freebsd_committer 2021-09-08 23:51:39 UTC
(In reply to Klaus Küchemann from comment #5)

That's not supported and likely to break across major LLVM versions. I suggest you use the workaround I mentioned instead (though I misspoke, it's lib/clang/llvm.build.mk, not usr.bin/clang/llvm.build.mk).
Comment 7 Klaus Küchemann 2021-09-09 00:05:12 UTC
(In reply to Jessica Clarke from comment #6)
Yeah, thanks for the hint and workaround; you're right, it would break in future builds.
Since the build needed 16 hours on the FU540, I'll fall back to cross-compiling from aarch64 again,
perhaps until something like this arises: https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg2196804.html
Comment 8 Klaus Küchemann 2021-09-19 21:19:24 UTC
Just to confirm:

make clean in /usr/src (no src/make.conf), applied patch:
-- lib/clang/llvm.build.mk  --
 @@ -104,7 +104,7 @@ CFLAGS+=	-fdata-sections
 .if ${LINKER_TYPE} == "mac"
 LDFLAGS+=	-Wl,-dead_strip
-LDFLAGS+=	-Wl,--gc-sections
+LDFLAGS+=	-Wl,--gc-sections,-melf64lriscv_fbsd

>>> World build completed on Sun Sep 19 20:47:54 UTC 2021
>>> World built in 50245 seconds, ncpu: 4, make -j4
<hw.fdt.model: SiFive HiFive Unleashed A00>
(down from 16 to 13 hours after raising the core clock from the default 1 GHz to hw.clock.coreclk.frequency: 1399999944)

I'll leave it up to you whether to mark this PR as fixed/closed or to keep it open until your official solution(s) land.

Comment 9 Michael Tuexen freebsd_committer 2022-09-24 19:41:06 UTC
Any chance this could be fixed?
Comment 10 Alex Richardson freebsd_committer 2022-10-04 10:16:26 UTC
I am aware of at least two workarounds, but neither can really be committed as is:
- Remove -Wl,--gc-sections from all makefiles (will result in larger binaries)
- Rebuilding lib/csu with https://reviews.freebsd.org/D35534 (breaks the build with older linkers/compilers; not a big deal for RISC-V, but it is for other architectures).

As jrtc27@ said in https://reviews.freebsd.org/D35534, the real solution would be to move the ELF notes so they can't be GC'd, but someone needs to do that work (I don't have time right now).