Bug 258358 - Fails to build on riscv: arm_bf16.h: sh: clang-tblgen: Exec format error
Summary: Fails to build on riscv: arm_bf16.h: sh: clang-tblgen: Exec format error
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: riscv
Version: CURRENT
Hardware: riscv Any
Importance: --- Affects Some People
Assignee: freebsd-toolchain (Nobody)
URL:
Keywords: needs-qa, regression
Duplicates: 270610
Depends on:
Blocks:
 
Reported: 2021-09-08 02:49 UTC by Klaus Küchemann
Modified: 2023-04-08 06:50 UTC
CC List: 13 users

See Also:
koobs: maintainer-feedback? (arichardson)
koobs: maintainer-feedback? (toolchain)


Description Klaus Küchemann 2021-09-08 02:49:06 UTC
--environment:--
root@rv64:/usr/src # uname -paKU
FreeBSD rv64 14.0-CURRENT FreeBSD 14.0-CURRENT #3 main-n249176-936f4a42fa2a: Mon Sep  6 23:02:54 UTC 2021     root@rv64:/usr/obj/usr/src/riscv.riscv64/sys/GENERIC-NODEBUG  riscv riscv64 1400032 1400032
-
root@rv64:/usr/src # sysctl hw
hw.machine: riscv
hw.ncpu: 4
hw.byteorder: 1234
hw.physmem: 8578408448
-
root@rv64:/usr/src # git show HEAD
commit 85bea309f935111cb362035795a59c263536b065 (HEAD -> main, origin/main, origin/HEAD)
...
Date:   Tue Sep 7 17:28:50 2021 +0100
-
Cross-built from aarch64 to RISC-V into an NFS directory, then PXE-booted
(make buildkernel/installkernel directly on the hardware succeeded)

-- error:---
root@rv64:/usr/src # make -j4 buildworld
...
..
===> usr.sbin/autofs (includes)
--- includes_subdir_lib ---
--- arm_bf16.h ---
clang-tblgen -gen-arm-bf16  -I /usr/src/contrib/llvm-project/clang/include/clang/Basic -d arm_bf16.h.d  -o arm_bf16.h /usr/src/contrib/llvm-project/clang/include/clang/Basic/arm_bf16.td
ELF binary type "0" not known.
sh: clang-tblgen: Exec format error
*** [arm_bf16.h] Error code 126

make[5]: stopped in /usr/src/lib/clang/headers
      158.16 real       237.44 user        63.86 sys

make[1]: stopped in /usr/src

.....

-- somewhat raw but effective workaround : ---

<<<<
root@rv64:/usr/src # clang-tblgen --version
LLVM (http://llvm.org/):
  LLVM version 12.0.1
  Optimized build with assertions.
  Default target: riscv64-unknown-freebsd14.0
  Host CPU: (unknown)

root@rv64:/usr/src # brandelf -v /usr/obj/usr/src/riscv.riscv64/tmp/obj-tools/usr.bin/clang/clang-tblgen/clang-tblgen
File '/usr/obj/usr/src/riscv.riscv64/tmp/obj-tools/usr.bin/clang/clang-tblgen/clang-tblgen' is of brand 'SVR4' (0). 
>>>>>

root@rv64:/usr/src # cp /usr/bin/clang-tblgen /usr/obj/usr/src/riscv.riscv64/tmp/obj-tools/usr.bin/clang/clang-tblgen/clang-tblgen   :-)

-
root@rv64:/usr/src # make -j4 buildworld NO_CLEAN=yes ....
continue...
---

This bug happens only when building directly on the hardware, not when cross-compiling.
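The `brandelf -v` check in the description reports the EI_OSABI byte of the ELF header (e_ident[7]), where 0 means SVR4/none and 9 means FreeBSD. A minimal POSIX sh sketch that reads the same byte on hosts without brandelf, assuming only od(1) and tr(1); the function name `elf_osabi` is hypothetical:

```shell
# Hypothetical helper: print the EI_OSABI byte of an ELF file, roughly the
# value brandelf -v shows. Needs only od(1) and tr(1).
elf_osabi() {
    f=$1
    # e_ident starts with the 4-byte magic 0x7f 'E' 'L' 'F'.
    magic=$(od -An -tx1 -N4 "$f" 2>/dev/null | tr -d ' \n')
    if [ "$magic" != "7f454c46" ]; then
        echo "not an ELF file: $f" >&2
        return 1
    fi
    # EI_OSABI is e_ident[7]; print it as an unsigned decimal byte.
    osabi=$(od -An -tu1 -j7 -N1 "$f" | tr -d ' \n')
    case $osabi in
        0) echo "0 (SVR4/none)" ;;
        9) echo "9 (FreeBSD)" ;;
        *) echo "$osabi" ;;
    esac
}
```

For example, `elf_osabi /usr/obj/usr/src/riscv.riscv64/tmp/obj-tools/usr.bin/clang/clang-tblgen/clang-tblgen` should report 0, matching the brandelf output above.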
Comment 1 Klaus Küchemann 2021-09-08 03:22:34 UTC
(In reply to Klaus Küchemann from comment #0)

Well, of course the same happens for llvm-tblgen:

ELF binary type "0" not known.
sh: llvm-tblgen: Exec format error
*** [llvm/Frontend/OpenMP/OMP.h.inc] Error code 126

make[6]: stopped in /usr/src/lib/clang/libllvm
--- cddl/lib__L ---

make[4]: stopped in /usr/src/cddl/lib

make[3]: stopped in /usr/src
--- lib__L ---

-
root@rv64:/usr/src # brandelf -v /usr/obj/usr/src/riscv.riscv64/tmp/obj-tools/usr.bin/clang/llvm-tblgen/llvm-tblgen
File '/usr/obj/usr/src/riscv.riscv64/tmp/obj-tools/usr.bin/clang/llvm-tblgen/llvm-tblgen' is of brand 'SVR4' (0).

-
root@rv64:/usr/src # cp /usr/bin/llvm-tblgen /usr/obj/usr/src/riscv.riscv64/tmp/obj-tools/usr.bin/clang/llvm-tblgen/llvm-tblgen
--
...continue ...
Comment 2 Kubilay Kocak freebsd_committer freebsd_triage 2021-09-08 07:30:40 UTC
May be related to base 8e1c989abbd1db4dac5b2149886012d43e27b9a9
Comment 3 Alex Richardson freebsd_committer freebsd_triage 2021-09-08 09:05:06 UTC
That suggests to me that none of the bootstrap tools are executable, since they are missing the notes section. You should see it with all the other tools built during bootstrap, but since base 8e1c989abbd1db4dac5b2149886012d43e27b9a9, tblgen will probably be the first, because the host tools can be used for everything else. If you build with BOOTSTRAP_ALL_TOOLS=1 it should fall over earlier.

What does `readelf --notes /usr/obj/usr/src/riscv.riscv64/tmp/obj-tools/usr.bin/clang/clang-tblgen/clang-tblgen` print?
Comment 4 Jessica Clarke freebsd_committer freebsd_triage 2021-09-08 09:29:25 UTC
This is a known issue. We build the bootstrap tools as static binaries and with --gc-sections. The latter is key: ELF notes are not treated as GC roots when they are COMDAT, which is what the ABI note was made last year. Because RISC-V isn’t using EI_OSABI on FreeBSD, there’s then no branding in the binary for the kernel to recognise (you need at least one of EI_OSABI, the dynamic linker path or the ELF note). The fix for this is to stop making the note COMDAT and to put it only in crti.

There is a workaround: find the -Wl,--gc-sections flag (IIRC it’s in usr.bin/clang/llvm.build.mk) and append ,-melf64lriscv_fbsd to it.
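Comment 4 lists three ways the kernel can recognise a FreeBSD binary: EI_OSABI, a dynamic linker path (PT_INTERP), or the ELF ABI note. The sketch below, a hypothetical `elf_brand_sources` helper, reports which of these an ELF file carries. It assumes od(1) plus a readelf(1) whose output names the INTERP segment and the note owner; the note check matches any FreeBSD-owned note, which is looser than the kernel's own test for the ABI tag.

```shell
# Hypothetical helper: list which FreeBSD branding sources an ELF file has.
# Heuristic: the PT_INTERP and note checks depend on readelf(1) output text.
elf_brand_sources() {
    f=$1
    magic=$(od -An -tx1 -N4 "$f" 2>/dev/null | tr -d ' \n')
    if [ "$magic" != "7f454c46" ]; then
        echo "not an ELF file: $f" >&2
        return 1
    fi
    found=""
    # 1. EI_OSABI, byte 7 of e_ident; 9 is ELFOSABI_FREEBSD.
    osabi=$(od -An -tu1 -j7 -N1 "$f" | tr -d ' \n')
    if [ "$osabi" = 9 ]; then found="$found EI_OSABI"; fi
    # 2. A PT_INTERP program header, i.e. a dynamic linker path.
    #    Static binaries such as the bootstrap tblgen tools have none.
    if readelf -l "$f" 2>/dev/null | grep -q 'INTERP'; then
        found="$found PT_INTERP"
    fi
    # 3. Any FreeBSD-owned ELF note (looser than the kernel's ABI tag check).
    if readelf -n "$f" 2>/dev/null | grep -q 'FreeBSD'; then
        found="$found note"
    fi
    found=${found# }
    echo "${found:-none}"
}
```

On a static bootstrap tool whose note was discarded by --gc-sections, one would expect it to print `none`; relinking with -melf64lriscv_fbsd should make `EI_OSABI` appear.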
Comment 5 Klaus Küchemann 2021-09-08 23:48:53 UTC
While slow (no cpu.freq support yet), the build succeeded after overwriting /usr/obj/....clang-tblgen and llvm-tblgen
with clang-tblgen and llvm-tblgen from /usr/bin/:
--------------------------------------------------------------
>>> World build completed on Wed Sep  8 19:23:01 UTC 2021
>>> World built in 57683 seconds, ncpu: 4, make -j4
--------------------------------------------------------------

I'll leave writing a patch to those of you who are familiar with the exact root cause of the issue.
Thanks.
Comment 6 Jessica Clarke freebsd_committer freebsd_triage 2021-09-08 23:51:39 UTC
(In reply to Klaus Küchemann from comment #5)

That's not supported and likely to break across major LLVM versions. I suggest you use the workaround I mentioned instead (though I misspoke, it's lib/clang/llvm.build.mk, not usr.bin/clang/llvm.build.mk).
Comment 7 Klaus Küchemann 2021-09-09 00:05:12 UTC
(In reply to Jessica Clarke from comment #6)
Yeah, thanks for the hint and the workaround; you're right, it would break in future builds.
Since the build took 16 hours on the FU540, I'll fall back to cross-compiling from aarch64 again,
perhaps until something like this arrives: https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg2196804.html
Comment 8 Klaus Küchemann 2021-09-19 21:19:24 UTC
Just to confirm:

After make clean in /usr/src (no src/make.conf), I applied this patch:
-- lib/clang/llvm.build.mk  --
 @@ -104,7 +104,7 @@ CFLAGS+=	-fdata-sections
 .if ${LINKER_TYPE} == "mac"
 LDFLAGS+=	-Wl,-dead_strip
 .else
-LDFLAGS+=	-Wl,--gc-sections
+LDFLAGS+=	-Wl,--gc-sections,-melf64lriscv_fbsd
 .endif
--

Result:
--------------------------------------------------------------
>>> World build completed on Sun Sep 19 20:47:54 UTC 2021
>>> World built in 50245 seconds, ncpu: 4, make -j4
--------------------------------------------------------------
<hw.fdt.model: SiFive HiFive Unleashed A00>
(down from 16 to just under 14 hours, by manually raising the core clock from the default 1 GHz to hw.clock.coreclk.frequency: 1399999944, i.e. about 1.4 GHz)


I'll leave it up to you to mark this PR as fixed/closed or to keep it open until your official solution(s) land.

thanks
K.
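The one-line llvm.build.mk patch above can also be applied with sed. A sketch, assuming a stock lib/clang/llvm.build.mk whose flag line reads exactly `LDFLAGS+=<whitespace>-Wl,--gc-sections`; the helper name is hypothetical, and this is the local workaround from comment 4 for native riscv64 builds, not a committable fix:

```shell
# Hypothetical helper: append ",-melf64lriscv_fbsd" to the --gc-sections
# linker flag in llvm.build.mk. Local, native riscv64 builds only.
apply_riscv_gc_workaround() {
    mk=$1
    if [ ! -f "$mk" ]; then
        echo "no such file: $mk" >&2
        return 1
    fi
    # The "$" anchor keeps this idempotent: once the flag has been extended,
    # the line no longer ends in --gc-sections and is left alone.
    sed 's/^\(LDFLAGS+=[[:space:]]*-Wl,--gc-sections\)$/\1,-melf64lriscv_fbsd/' \
        "$mk" > "$mk.tmp" && mv "$mk.tmp" "$mk"
}
```

Usage: `apply_riscv_gc_workaround /usr/src/lib/clang/llvm.build.mk`, then run buildworld as usual. Writing to a temporary file and renaming it sidesteps the differing `sed -i` syntax between FreeBSD and GNU sed.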
Comment 9 Michael Tuexen freebsd_committer freebsd_triage 2022-09-24 19:41:06 UTC
Any chance this could be fixed?
Comment 10 Alex Richardson freebsd_committer freebsd_triage 2022-10-04 10:16:26 UTC
I am aware of at least two workarounds, but neither can really be committed as is:
- Remove -Wl,--gc-sections from all makefiles (will result in larger binaries)
- Rebuild lib/csu with https://reviews.freebsd.org/D35534 (breaks the build with older linkers/compilers; not a big deal for RISC-V, but it is for other architectures).

As jrtc27@ said in https://reviews.freebsd.org/D35534, the real solution would be to move the ELF notes so they can't be GC'd, but someone needs to do that work (I don't have time right now).
Comment 11 Jessica Clarke freebsd_committer freebsd_triage 2023-04-02 21:23:56 UTC
*** Bug 270610 has been marked as a duplicate of this bug. ***
Comment 12 Robert Clausecker freebsd_committer freebsd_triage 2023-04-02 21:26:49 UTC
This affects 13.2-RC6.  Maybe we want a workaround to be put in so riscv64 consumers can upgrade from source?
Comment 13 Jessica Clarke freebsd_committer freebsd_triage 2023-04-02 21:27:58 UTC
We didn't have one for 13.0->13.1. Almost nobody's seriously using RISC-V and upgrading from source natively. For those that do they can apply the one-line workaround.
Comment 14 Robert Clausecker freebsd_committer freebsd_triage 2023-04-02 21:35:53 UTC
The only supported way to upgrade riscv64 systems is by source.  You are telling me that without hand-patching, riscv64 systems simply cannot be upgraded.  This is a sad state of affairs.  Why not just put the workaround in?  It seems to be doing the trick for now and can always be replaced with a proper solution later.
Comment 15 Colin Percival freebsd_committer freebsd_triage 2023-04-02 21:45:38 UTC
Wearing the release engineer hat: We're not going to delay the release any further for a build failure on a tier 2 architecture.

This seems like something which could be done as an Errata Notice though?
Comment 16 Robert Clausecker freebsd_committer freebsd_triage 2023-04-02 22:08:17 UTC
(In reply to Colin Percival from comment #15)

An erratum sounds reasonable. Perhaps a pointer could also be added to UPDATING?
Comment 17 Mark Millard 2023-04-02 23:48:18 UTC
(In reply to Robert Clausecker from comment #14)

One can buildworld buildkernel on other types of machines,
such as amd64 and aarch64. One can installkernel and
installworld to some types of external media from such
machines as well. (I've no clue how common such external
media is for riscv64 use.)

In Jessica's statement "[a]lmost nobody's seriously using
RISC-V and upgrading from source natively" the "natively"
was important but dropped in your response: the suggestion
was not to do self-hosted builds/updates, but still to do
source-code-based builds.

(I've presumed same-endian contexts in the above wording.)
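The cross-build route described above might look roughly like this on an amd64 or aarch64 host. TARGET and TARGET_ARCH are the standard build(7) cross-build variables; the function name, the DRY_RUN/SRC/JOBS knobs and the mount point are illustrative assumptions, and the riscv64 root filesystem is assumed to be mounted on the build host:

```shell
# Hypothetical helper: cross-build riscv64 world and kernel on another
# architecture, then install into a riscv64 root mounted at $1.
# DRY_RUN=1 prints the make commands instead of running them.
cross_update_riscv64() {
    destdir=${1:?usage: cross_update_riscv64 /path/to/mounted/riscv64/root}
    src=${SRC:-/usr/src}
    run=""
    if [ -n "${DRY_RUN:-}" ]; then run=echo; fi
    # Standard build(7) cross-build variables for 64-bit RISC-V.
    targ="TARGET=riscv TARGET_ARCH=riscv64"
    # $targ is left unquoted on purpose so it splits into two arguments.
    $run make -C "$src" -j"${JOBS:-4}" $targ buildworld buildkernel &&
        $run make -C "$src" $targ DESTDIR="$destdir" installkernel installworld
}
```

For example, `DRY_RUN=1 cross_update_riscv64 /mnt` only prints the two make invocations. This covers building and installing; merging configuration files is a separate step.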
Comment 18 Robert Clausecker freebsd_committer freebsd_triage 2023-04-02 23:58:01 UTC
(In reply to Mark Millard from comment #17)

Such a procedure requires disassembling the machine to access the storage from another device and is thus fairly cumbersome.  Last time I checked, upgrading does not really work with this mechanism either as upgrading involves running some binaries to merge configuration and such, which doesn't work when cross-building.  It is additionally not documented anywhere.  Is this really how we want things to be?

It is a reasonable expectation that any FreeBSD system should be able to upgrade itself following procedures given in the handbook with no external computer needed.  Yes, upgrading is important to me.  I absolutely do not want to set up the system anew every time a new release comes out.  This is repulsive and a huge waste of time.
Comment 19 Mark Millard 2023-04-03 00:32:14 UTC
(In reply to Robert Clausecker from comment #18)

https://docs.freebsd.org/en/articles/committers-guide/#archs
reports for tier 2:

QUOTE
Tier 2 platforms should be self-hosting either via the in-tree toolchain or an external toolchain. If an external toolchain is required, official binary packages for an external toolchain will be provided.
END QUOTE

riscv64, riscv64sf are listed as tier 2 for 13.x and later on
https://www.freebsd.org/platforms/ . (There are notes around for
riscv64sf possibly being dropped.)

As for my cross builds/installs notes . . .

Definitely true that I have not been through all the steps
for cross builds/installs in some time. Do not take the
below as an overall claim that there would not be problems.

Having poor access to the media would be a pain for sure.
Nothing here deals with that.

As I remember, some activities like etcupdate (these days)
work from the builder/installer machine when done
appropriately (referencing the appropriate source tree, being
told the architecture and the target's mount, and the like).
For example, etcupdate has:

     -M options     Pass options as additional parameters to make(1) when
                    building a “current” tree.  This can be used for to set
                    the TARGET or TARGET_ARCH variables for a cross-build.

(not just the likes of -s and -D and -d for where to get
or put information).
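Following the -M description quoted above, an etcupdate run from the build host against the mounted riscv64 root might look like the sketch below. -s, -D, -d and -M are documented etcupdate(8) options; the explicit -d that keeps etcupdate's work directory on the target, the function name and the DRY_RUN switch are assumptions for illustration:

```shell
# Hypothetical helper: merge /etc changes into a riscv64 root mounted at $1,
# running etcupdate(8) on the (non-riscv64) build host.
# DRY_RUN=1 prints the command instead of running it.
cross_etcupdate_riscv64() {
    destdir=${1:?usage: cross_etcupdate_riscv64 /path/to/mounted/riscv64/root}
    run=""
    if [ -n "${DRY_RUN:-}" ]; then run=echo; fi
    # -s: source tree, -D: target root, -d: work directory (kept on the
    # target here), -M: extra make(1) arguments for the cross "current" tree.
    $run etcupdate -s "${SRC:-/usr/src}" -D "$destdir" \
        -d "$destdir/var/db/etcupdate" \
        -M "TARGET=riscv TARGET_ARCH=riscv64"
}
```

`DRY_RUN=1 cross_etcupdate_riscv64 /mnt` prints the command; without DRY_RUN it runs etcupdate (as root) on the build host.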