Bug 258494 - lang/erlang lang/erlang-runtime21 lang/erlang-runtime23: clang 13 build breaks dtrace if PGO is enabled
Status: New
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: Any Any
Importance: --- Affects Some People
Assignee: freebsd-erlang (Nobody)
URL:
Keywords:
Depends on:
Blocks: 258209
Reported: 2021-09-14 08:07 UTC by Dimitry Andric
Modified: 2021-09-15 19:32 UTC (History)
CC List: 1 user

See Also:
bugzilla: maintainer-feedback? (erlang)


Description Dimitry Andric freebsd_committer 2021-09-14 08:07:11 UTC
During an exp-run for llvm 13 (see bug 258209), it turned out that lang/erlang and lang/erlang-runtime2[1-3] fail to build with clang 13:

http://gohan04.nyi.freebsd.org/data/mainamd64PR258209-default/2021-09-05_20h27m09s/logs/errors/erlang-21.3.8.24_1,4.log
http://gohan04.nyi.freebsd.org/data/mainamd64PR258209-default/2021-09-05_20h27m09s/logs/errors/erlang-runtime21-21.3.8.24_1.log
http://gohan04.nyi.freebsd.org/data/mainamd64PR258209-default/2021-09-05_20h27m09s/logs/errors/erlang-runtime23-23.3.4.6.log

What appears to happen is that PGO is enabled for these versions of Erlang, and the build starts with a PGO-instrumented beam.smp:

gmake[5]: Entering directory '/wrkdirs/usr/ports/lang/erlang/work/otp-OTP-21.3.8.24/erts/emulator'
if utils/gen_git_version amd64-portbld-freebsd14.0/gen_git_version.mk; then touch beam/erl_bif_info.c; fi
echo " PROFILE beam.prof.smp"
 PROFILE beam.prof.smp
rm -f obj/amd64-portbld-freebsd14.0/opt/smp/erl*.profraw
set -e; LLVM_PROFILE_FILE="obj/amd64-portbld-freebsd14.0/opt/smp/erlc-%m.profraw" \
          ERL_FLAGS="-emu_type prof +S 1" erlc -W  -DPGO \
  -o obj/amd64-portbld-freebsd14.0/opt/smp test/estone_SUITE.erl > obj/amd64-portbld-freebsd14.0/opt/smp/PROFILE_LOG

after which it does a test run, uses llvm-profdata to merge the collected profiling data, and compiles beam_emu_pu.o with the merged profile:

llvm-profdata merge -output obj/amd64-portbld-freebsd14.0/opt/smp/default.profdata obj/amd64-portbld-freebsd14.0/opt/smp/*.profraw
cc -fprofile-instr-use=obj/amd64-portbld-freebsd14.0/opt/smp/default.profdata \
  -Werror=undef -Werror=implicit -Werror=return-type \
  -O3 -fomit-frame-pointer -pipe -fno-omit-frame-pointer -DMAP_NORESERVE=0 \
  -fstack-protector-strong -fno-strict-aliasing \
  -I/wrkdirs/usr/ports/lang/erlang/work/otp-OTP-21.3.8.24/erts/amd64-portbld-freebsd14.0 \
  -DHAVE_CONFIG_H -Wall -Wstrict-prototypes -Wmissing-prototypes -Wdeclaration-after-statement \
  -DUSE_THREADS -D_THREAD_SAFE -D_REENTRANT -DPOSIX_THREADS \
  -Iamd64-portbld-freebsd14.0/opt/smp -Ibeam -Isys/unix -Isys/common -Iamd64-portbld-freebsd14.0 \
  -Ipcre -Ihipe -I../include -I../include/amd64-portbld-freebsd14.0 \
  -I../include/internal -I../include/internal/amd64-portbld-freebsd14.0 \
  -c beam/beam_emu.c -o obj/amd64-portbld-freebsd14.0/opt/smp/beam_emu_pu.o
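
For readers unfamiliar with clang's instrumentation-based PGO, the cycle in the log above boils down to four shell commands. The sketch below only generates them as strings. It is a hypothetical helper (pgo_script is not part of the erts build), only the PGO-related flags are kept, the link step that turns the instrumented object into the profiling beam.smp is left out, and the use of -fprofile-instr-generate and the _pg.o name for the instrumented object are assumptions, since neither appears in the log.

```python
import os
import shlex
from typing import List


def pgo_script(src: str, objdir: str, train_cmd: str) -> List[str]:
    """Return the shell commands of a clang instrumentation-PGO cycle for src.

    Hypothetical helper: only PGO-related flags are shown, and the link step
    that builds the instrumented executable is omitted.
    """
    q = shlex.quote
    stem = os.path.splitext(os.path.basename(src))[0]
    profdata = f"{objdir}/default.profdata"
    return [
        # 1. Instrumented object (flag and _pg.o name assumed, not from the log).
        f"cc -fprofile-instr-generate -c {q(src)} -o {q(f'{objdir}/{stem}_pg.o')}",
        # 2. Training run; %m expands to a per-binary signature (see clang docs).
        f"LLVM_PROFILE_FILE={q(f'{objdir}/erlc-%m.profraw')} {train_cmd}",
        # 3. Merge raw profiles; the glob stays unquoted so the shell expands it.
        f"llvm-profdata merge -output {q(profdata)} {q(objdir)}/*.profraw",
        # 4. Optimized object compiled with the merged profile.
        f"cc -fprofile-instr-use={q(profdata)} -O3 -c {q(src)} -o {q(f'{objdir}/{stem}_pu.o')}",
    ]


for cmd in pgo_script("beam/beam_emu.c", "obj/amd64-portbld-freebsd14.0/opt/smp",
                      "erlc -W -DPGO test/estone_SUITE.erl"):
    print(cmd)
```

It is step 4 that produces the beam_emu_pu.o object which dtrace later chokes on.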

Later, it runs dtrace over all the collected objects, and this dies:

dtrace -G -C -Ibeam \
  -s beam/erlang_dtrace.d \
  -o obj/amd64-portbld-freebsd14.0/opt/smp/erlang_pu_dtrace.o
  ... long list of objects ...
dtrace: failed to link script beam/erlang_dtrace.d: an error was encountered while processing obj/amd64-portbld-freebsd14.0/opt/smp/beam_emu_pu.o
gmake[5]: *** [amd64-portbld-freebsd14.0/Makefile:1005: obj/amd64-portbld-freebsd14.0/opt/smp/erlang_pu_dtrace.o] Error 1
gmake[5]: Leaving directory '/wrkdirs/usr/ports/lang/erlang/work/otp-OTP-21.3.8.24/erts/emulator'

Something in beam_emu_pu.o (emitted by clang or llvm 13) is tripping up dtrace, but I have very little knowledge about dtrace so I need help here. :)

Now some other Erlang runtimes, such as lang/erlang-runtime24, *do* build successfully with clang 13, but only because upstream disabled the PGO feature as a side effect of https://github.com/erlang/otp/commit/b165524c732 ("erts: Implement the BeamAsm JIT"):

--- a/erts/configure.in
+++ b/erts/configure.in
...
@@ -704,6 +719,9 @@ else
   fi
 fi

+dnl Disable pgo for now
+USE_PGO=false
+
 AC_SUBST(USE_PGO)
 AC_SUBST(PROFILE_COMPILER)

(This took a *lot* of time to bisect due to Erlang's awful non-linear history... :)

So I'm unsure why upstream disabled this "for now", since it has stayed disabled for more than a year. We could work around the other Erlang failures by also disabling PGO there, or, with help from someone knowledgeable about dtrace, try to get to the bottom of why dtrace dies on PGO object files produced by clang 13.
Comment 1 Dimitry Andric freebsd_committer 2021-09-15 19:32:43 UTC
(In reply to Dimitry Andric from comment #0)
> Something in beam_emu_pu.o (emitted by clang or llvm 13) is tripping up dtrace, but I have very little knowledge about dtrace so I need help here. :)

So there are two objects that are instrumented with profile generation code, beam_emu_pg.o and erl_process_pg.o. These are linked into a beam.smp executable which generates profile data, and the data is used to build beam_emu_pu.o and erl_process_pu.o.

The difference between llvm 12 and llvm 13 output is in the produced sections for the _pu.o files. With llvm 12, there is a .llvm.call-graph-profile section (to be used by the linker to rearrange 'hot' and 'cold' parts):

  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
...
  [19] .llvm.call-graph-profile LOOS+0xfff4c02  0000000000000000 02e9f0 000e80 10   E 21   0  1

whereas with llvm 13, there is an additional .rel.llvm.call-graph-profile section:

  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
...
  [17] .llvm.call-graph-profile LOOS+0xfff4c09  0000000000000000 0279a9 000750 08   E 22   0  1
  [18] .rel.llvm.call-graph-profile REL             0000000000000000 030888 001d40 10     22  17  8
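
The numbers in the two listings can be cross-checked with a little arithmetic. The values below are copied from the readelf output; the interpretation in this paragraph is an inference from the numbers, not something the listings prove. SHT_LOOS is 0x60000000, so the section type changed from 0x6fff4c02 to 0x6fff4c09, the entry size dropped from 16 to 8 bytes, and the relocation section holds exactly two Elf64_Rel entries per call-graph entry. That is consistent with LLVM 13 keeping only the 8-byte edge weight in .llvm.call-graph-profile and encoding the caller and callee symbols as relocations, which would also mean that stripping the .rel section leaves the weights without their symbols.

```python
SHT_LOOS = 0x60000000  # start of the OS-specific section type range

# (sh_type offset from SHT_LOOS, sh_size, sh_entsize), copied from the listings
LLVM12_CGP = (0xFFF4C02, 0xE80, 0x10)
LLVM13_CGP = (0xFFF4C09, 0x750, 0x08)
LLVM13_REL_SIZE, REL_ENTSIZE = 0x1D40, 0x10  # Elf64_Rel = r_offset + r_info


def summarize(sec):
    """Return (absolute sh_type, number of fixed-size entries)."""
    loos_offset, size, entsize = sec
    return SHT_LOOS + loos_offset, size // entsize


for label, sec in (("llvm 12", LLVM12_CGP), ("llvm 13", LLVM13_CGP)):
    sh_type, n = summarize(sec)
    print(f"{label}: type {sh_type:#x}, {n} entries")
# llvm 12: type 0x6fff4c02, 232 entries
# llvm 13: type 0x6fff4c09, 234 entries

n_rel = LLVM13_REL_SIZE // REL_ENTSIZE
print(f"llvm 13: {n_rel} relocations = {n_rel // summarize(LLVM13_CGP)[1]} per entry")
# llvm 13: 468 relocations = 2 per entry
```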

I have verified that removing the .rel.llvm.call-graph-profile section from the _pu.o files (using objcopy --remove .rel.llvm.call-graph-profile) makes dtrace not crash anymore.

However, I think the .rel.llvm.call-graph-profile section might contain information that is useful to the linker. So the question is still what is in this particular section that makes dtrace crash.
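
One way to start on that question is to look at what each relocation section points at. The following is a hypothetical helper using only the Python standard library (the names read_sections and describe_relocations are made up here). It assumes a little-endian ELF64 object such as the amd64 beam_emu_pu.o and does not handle extended section numbering. For every SHT_REL/SHT_RELA section it prints the section being relocated (sh_info, the Inf column in the listing above), whether that target is SHF_ALLOC, and how many relocations of each r_type it contains, i.e. the kind of detail that dtrace -G's relocation processing may have to cope with.

```python
import struct
import sys
from collections import Counter
from typing import List, NamedTuple

SHT_RELA, SHT_REL = 4, 9  # relocation section types (with / without addend)
SHF_ALLOC = 0x2           # section occupies memory at run time


class Section(NamedTuple):
    name: str
    type: int
    flags: int
    offset: int
    size: int
    info: int
    entsize: int


def read_sections(data: bytes) -> List[Section]:
    """Parse the section header table of a little-endian ELF64 file."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a little-endian ELF64 file")
    # Elf64_Ehdr: only e_shoff, e_shentsize, e_shnum and e_shstrndx are needed.
    (_, _, _, _, _, _, shoff, _, _, _, _,
     shentsize, shnum, shstrndx) = struct.unpack_from("<16sHHIQQQIHHHHHH", data, 0)
    # Elf64_Shdr: name, type, flags, addr, offset, size, link, info, addralign, entsize
    raw = [struct.unpack_from("<IIQQQQIIQQ", data, shoff + i * shentsize)
           for i in range(shnum)]
    str_off, str_size = raw[shstrndx][4], raw[shstrndx][5]
    strtab = data[str_off:str_off + str_size]

    def name_at(off: int) -> str:
        return strtab[off:strtab.index(b"\0", off)].decode()

    return [Section(name_at(h[0]), h[1], h[2], h[4], h[5], h[7], h[9]) for h in raw]


def describe_relocations(data: bytes) -> List[str]:
    """One line per REL/RELA section: target section, its flags, r_type histogram."""
    sections = read_sections(data)
    lines = []
    for s in sections:
        if s.type not in (SHT_REL, SHT_RELA) or s.entsize == 0:
            continue
        target = sections[s.info]  # sh_info = index of the section being relocated
        # r_info is the second 8-byte field of both Elf64_Rel and Elf64_Rela;
        # its low 32 bits hold the relocation type.
        types = Counter(
            struct.unpack_from("<Q", data, s.offset + i * s.entsize + 8)[0] & 0xFFFFFFFF
            for i in range(s.size // s.entsize))
        alloc = "alloc" if target.flags & SHF_ALLOC else "non-alloc"
        lines.append(f"{s.name} -> [{s.info}] {target.name} ({alloc}, "
                     f"type {target.type:#x}): r_type counts {dict(sorted(types.items()))}")
    return lines


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        print("\n".join(describe_relocations(f.read())))
```

It would be run as, e.g., `python3 relscan.py obj/amd64-portbld-freebsd14.0/opt/smp/beam_emu_pu.o` (the script name is arbitrary); `readelf -r` on the same object lists the individual relocations if more detail is needed.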