Bug 258494 - lang/erlang lang/erlang-runtime21 lang/erlang-runtime23: clang 13 build breaks dtrace if PGO is enabled
Status: Closed FIXED
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: Any Any
Importance: --- Affects Some People
Assignee: freebsd-erlang (Nobody)
URL:
Keywords:
Depends on:
Blocks: 258209
Reported: 2021-09-14 08:07 UTC by Dimitry Andric
Modified: 2021-10-03 10:21 UTC
CC List: 1 user

See Also:
bugzilla: maintainer-feedback? (erlang)



Description Dimitry Andric freebsd_committer freebsd_triage 2021-09-14 08:07:11 UTC
During an exp-run for llvm 13 (see bug 258209), it turned out that lang/erlang and lang/erlang-runtime2[13] fail to build with clang 13:

http://gohan04.nyi.freebsd.org/data/mainamd64PR258209-default/2021-09-05_20h27m09s/logs/errors/erlang-21.3.8.24_1,4.log
http://gohan04.nyi.freebsd.org/data/mainamd64PR258209-default/2021-09-05_20h27m09s/logs/errors/erlang-runtime21-21.3.8.24_1.log
http://gohan04.nyi.freebsd.org/data/mainamd64PR258209-default/2021-09-05_20h27m09s/logs/errors/erlang-runtime23-23.3.4.6.log

What appears to happen is that for these versions of erlang, PGO (profile-guided optimization) is enabled, and the build first produces a profile-instrumented beam.smp:

gmake[5]: Entering directory '/wrkdirs/usr/ports/lang/erlang/work/otp-OTP-21.3.8.24/erts/emulator'
if utils/gen_git_version amd64-portbld-freebsd14.0/gen_git_version.mk; then touch beam/erl_bif_info.c; fi
echo " PROFILE beam.prof.smp"
 PROFILE beam.prof.smp
rm -f obj/amd64-portbld-freebsd14.0/opt/smp/erl*.profraw
set -e; LLVM_PROFILE_FILE="obj/amd64-portbld-freebsd14.0/opt/smp/erlc-%m.profraw" \
          ERL_FLAGS="-emu_type prof +S 1" erlc -W  -DPGO \
  -o obj/amd64-portbld-freebsd14.0/opt/smp test/estone_SUITE.erl > obj/amd64-portbld-freebsd14.0/opt/smp/PROFILE_LOG

after which it does a test run, uses llvm-profdata to merge the raw profiles into default.profdata, and compiles beam_emu_pu.o with that profile:

llvm-profdata merge -output obj/amd64-portbld-freebsd14.0/opt/smp/default.profdata obj/amd64-portbld-freebsd14.0/opt/smp/*.profraw
cc -fprofile-instr-use=obj/amd64-portbld-freebsd14.0/opt/smp/default.profdata  -Werror=undef -Werror=implicit -Werror=return-type   -O3 -fomit-frame-pointer -pipe  -fno-omit-frame-pointer -DMAP_NORESERVE=0 -fstack-protector-strong -fno-strict-aliasing  -I/wrkdirs/usr/ports/lang/erlang/work/otp-OTP-21.3.8.24/erts/amd64-portbld-freebsd14.0    -DHAVE_CONFIG_H -Wall -Wstrict-prototypes -Wmissing-prototypes -Wdeclaration-after-statement -DUSE_THREADS -D_THREAD_SAFE -D_REENTRANT -DPOSIX_THREADS   -Iamd64-portbld-freebsd14.0/opt/smp -Ibeam -Isys/unix -Isys/common -Iamd64-portbld-freebsd14.0 -Ipcre -Ihipe -I../include -I../include/amd64-portbld-freebsd14.0 -I../include/internal -I../include/internal/amd64-portbld-freebsd14.0 -c beam/beam_emu.c -o obj/amd64-portbld-freebsd14.0/opt/smp/beam_emu_pu.o
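For readers unfamiliar with clang's instrumentation-based PGO, the stages in the logs above can be sketched as a dry run that only builds command lines. This is a minimal sketch, not the erlang Makefile logic: the helper name `pgo_plan` is hypothetical, and the `-fprofile-instr-generate` flag for the `_pg` objects is an assumption, since the logs only show the `-fprofile-instr-use` side. Linking the instrumented objects into beam.smp is omitted.

```python
from typing import List


def pgo_plan(src: str, objdir: str, training_cmd: List[str]) -> List[List[str]]:
    """Return (but never run) the argv lists of a hypothetical 4-stage PGO build."""
    stem = src.rsplit("/", 1)[-1].rsplit(".", 1)[0]  # "beam/beam_emu.c" -> "beam_emu"
    profdata = f"{objdir}/default.profdata"
    return [
        # 1. Instrumented build (assumed flag): the object counts executions at run time.
        ["cc", "-O3", "-fprofile-instr-generate", "-c", src,
         "-o", f"{objdir}/{stem}_pg.o"],
        # 2. Training run. %m expands to a per-binary signature, so processes of
        #    the same instrumented binary merge into a shared pool of raw profiles.
        ["env", f"LLVM_PROFILE_FILE={objdir}/erlc-%m.profraw", *training_cmd],
        # 3. Merge raw profiles into the indexed format clang consumes.
        #    The real build relies on the shell to expand this glob; here it is a plain string.
        ["llvm-profdata", "merge", "-output", profdata, f"{objdir}/*.profraw"],
        # 4. Optimized rebuild of the same source using the merged profile.
        ["cc", "-O3", f"-fprofile-instr-use={profdata}", "-c", src,
         "-o", f"{objdir}/{stem}_pu.o"],
    ]
```

The `_pg`/`_pu` suffixes mirror the object names in this report; everything else is illustrative.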

Later, it runs dtrace over all the collected objects, and this dies:

dtrace -G -C -Ibeam \
  -s beam/erlang_dtrace.d \
  -o obj/amd64-portbld-freebsd14.0/opt/smp/erlang_pu_dtrace.o
  ... long list of objects ...
dtrace: failed to link script beam/erlang_dtrace.d: an error was encountered while processing obj/amd64-portbld-freebsd14.0/opt/smp/beam_emu_pu.o
gmake[5]: *** [amd64-portbld-freebsd14.0/Makefile:1005: obj/amd64-portbld-freebsd14.0/opt/smp/erlang_pu_dtrace.o] Error 1
gmake[5]: Leaving directory '/wrkdirs/usr/ports/lang/erlang/work/otp-OTP-21.3.8.24/erts/emulator'

Something in beam_emu_pu.o (emitted by clang or llvm 13) is tripping up dtrace, but I have very little knowledge about dtrace so I need help here. :)

Now some other erlang runtimes such as lang/erlang-runtime24 *do* build successfully with clang 13, but this is only because upstream disabled the PGO feature, as a side effect of https://github.com/erlang/otp/commit/b165524c732 ("erts: Implement the BeamAsm JIT"): 

--- a/erts/configure.in
+++ b/erts/configure.in
...
@@ -704,6 +719,9 @@ else
   fi
 fi

+dnl Disable pgo for now
+USE_PGO=false
+
 AC_SUBST(USE_PGO)
 AC_SUBST(PROFILE_COMPILER)

(This took a *lot* of time to bisect due to erlang's awful non-linear history... :)

I'm unsure why upstream disabled this "for now", given that it has stayed disabled for more than a year. We could work around the remaining erlang failures by also disabling PGO there or, with help from someone knowledgeable about dtrace, try to get to the bottom of why dtrace dies on PGO object files produced by clang 13.
Comment 1 Dimitry Andric freebsd_committer freebsd_triage 2021-09-15 19:32:43 UTC
(In reply to Dimitry Andric from comment #0)
> Something in beam_emu_pu.o (emitted by clang or llvm 13) is tripping up dtrace, but I have very little knowledge about dtrace so I need help here. :)

So there are two objects that are instrumented with profile generation code: beam_emu_pg.o and erl_process_pg.o. These are linked into a beam.smp executable that generates profile data, which is then used to build beam_emu_pu.o and erl_process_pu.o.

The difference between the llvm 12 and llvm 13 output lies in the sections produced for the _pu.o files. With llvm 12, there is a .llvm.call-graph-profile section (to be used by the linker to rearrange 'hot' and 'cold' parts):

  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
...
  [19] .llvm.call-graph-profile LOOS+0xfff4c02  0000000000000000 02e9f0 000e80 10   E 21   0  1

whereas with llvm 13, there is an additional .rel.llvm.call-graph-profile section:

  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
...
  [17] .llvm.call-graph-profile LOOS+0xfff4c09  0000000000000000 0279a9 000750 08   E 22   0  1
  [18] .rel.llvm.call-graph-profile REL             0000000000000000 030888 001d40 10     22  17  8

I have verified that removing the .rel.llvm.call-graph-profile section from the _pu.o files (using objcopy --remove-section .rel.llvm.call-graph-profile) stops dtrace from crashing.

However, I think the .rel.llvm.call-graph-profile section might contain information that is useful to the linker (its Inf value of 17 indicates that it holds the relocations for section 17, .llvm.call-graph-profile). So the question remains what in this particular section makes dtrace crash.
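The key observation above is that llvm 13 adds a REL section whose Inf column (sh_info) points at section 17, .llvm.call-graph-profile. A hypothetical helper (all names are illustrative) can pull that out of `readelf -S` text so objects from the two compilers are easy to compare. It assumes the exact column layout shown above, and it also reports the section type, which itself differs between the two outputs (LOOS+0xfff4c02 with llvm 12, LOOS+0xfff4c09 with llvm 13).

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class Section(NamedTuple):
    index: int
    name: str
    sh_type: str
    info: int  # sh_info; for REL/RELA sections, the index of the section being relocated


# Matches data rows such as "  [17] .llvm.call-graph-profile LOOS+0xfff4c09 ...",
# but not the "[Nr] Name ..." header row.
_ROW = re.compile(r"^\s*\[\s*(\d+)\]\s+(\S+)\s+(.*)$")


def parse_sections(readelf_s: str) -> List[Section]:
    """Parse section rows from `readelf -S` output in the layout shown above."""
    sections = []
    for line in readelf_s.splitlines():
        m = _ROW.match(line)
        if not m:
            continue
        fields = m.group(3).split()
        # Type is the first field. The Flg column can be empty, so Inf is
        # located from the end: it is always the second-to-last field.
        sections.append(Section(int(m.group(1)), m.group(2), fields[0], int(fields[-2])))
    return sections


def call_graph_sections(readelf_s: str) -> List[Tuple[str, str, Optional[str]]]:
    """Return (name, type, relocated section name) for each call-graph-profile section."""
    sections = parse_sections(readelf_s)
    by_index = {s.index: s for s in sections}
    result = []
    for s in sections:
        if not s.name.endswith(".llvm.call-graph-profile"):
            continue
        target = None
        if s.sh_type in ("REL", "RELA") and s.info in by_index:
            target = by_index[s.info].name
        result.append((s.name, s.sh_type, target))
    return result
```

Fed the llvm 13 table above, it returns both sections, with the REL entry resolving to .llvm.call-graph-profile; the llvm 12 table yields only the single non-relocation section.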
Comment 2 Dimitry Andric freebsd_committer freebsd_triage 2021-10-02 11:52:30 UTC
Unless somebody objects, I will commit a change tomorrow that effectively adds:

--- a/erts/configure.in
+++ b/erts/configure.in
...
@@ -704,6 +719,9 @@ else
   fi
 fi

+dnl Disable pgo for now
+USE_PGO=false
+
 AC_SUBST(USE_PGO)
 AC_SUBST(PROFILE_COMPILER)

to the affected erlang ports. This will work around the dtrace failures until somebody with dtrace knowledge can pick this up again.
Comment 3 commit-hook freebsd_committer freebsd_triage 2021-10-03 10:20:27 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=9ce64e91907aaa43fc43a7b2faaae5bf99faaa56

commit 9ce64e91907aaa43fc43a7b2faaae5bf99faaa56
Author:     Dimitry Andric <dim@FreeBSD.org>
AuthorDate: 2021-10-02 14:52:35 +0000
Commit:     Dimitry Andric <dim@FreeBSD.org>
CommitDate: 2021-10-03 10:13:35 +0000

    lang/(erlang(-runtime2[13])?): work around dtrace failures with clang 13

    During an exp-run for llvm 13 (see bug 258209), it turned out that
    lang/erlang and lang/erlang-runtime2[13] fail to build with clang 13.

    What appears to happen is that for these versions of erlang, PGO is
    enabled, and it first builds a PGO-enabled beam.smp:

    gmake[5]: Entering directory '/wrkdirs/usr/ports/lang/erlang/work/otp-OTP-21.3.8.24/erts/emulator'
    if utils/gen_git_version amd64-portbld-freebsd14.0/gen_git_version.mk; then touch beam/erl_bif_info.c; fi
    echo " PROFILE beam.prof.smp"
     PROFILE beam.prof.smp
    rm -f obj/amd64-portbld-freebsd14.0/opt/smp/erl*.profraw
    set -e; LLVM_PROFILE_FILE="obj/amd64-portbld-freebsd14.0/opt/smp/erlc-%m.profraw" \
              ERL_FLAGS="-emu_type prof +S 1" erlc -W  -DPGO \
      -o obj/amd64-portbld-freebsd14.0/opt/smp test/estone_SUITE.erl > obj/amd64-portbld-freebsd14.0/opt/smp/PROFILE_LOG

    after which it does a test run, and uses llvm-profdata to merge the
    profiling data into beam_emu_pu.o:

      llvm-profdata merge -output obj/amd64-portbld-freebsd14.0/opt/smp/default.profdata obj/amd64-portbld-freebsd14.0/opt/smp/*.profraw
      cc -fprofile-instr-use=obj/amd64-portbld-freebsd14.0/opt/smp/default.profdata  -Werror=undef -Werror=implicit -Werror=return-type   -O3 -fomit-frame-pointer -pipe  -fno-omit-frame-pointer -DMAP_NORESERVE=0 -fstack-protector-strong -fno-strict-aliasing  -I/wrkdirs/usr/ports/lang/erlang/work/otp-OTP-21.3.8.24/erts/amd64-portbld-freebsd14.0    -DHAVE_CONFIG_H -Wall -Wstrict-prototypes -Wmissing-prototypes -Wdeclaration-after-statement -DUSE_THREADS -D_THREAD_SAFE -D_REENTRANT -DPOSIX_THREADS   -Iamd64-portbld-freebsd14.0/opt/smp -Ibeam -Isys/unix -Isys/common -Iamd64-portbld-freebsd14.0 -Ipcre -Ihipe -I../include -I../include/amd64-portbld-freebsd14.0 -I../include/internal -I../include/internal/amd64-portbld-freebsd14.0 -c beam/beam_emu.c -o obj/amd64-portbld-freebsd14.0/opt/smp/beam_emu_pu.o

    Later, it runs dtrace over all the collected objects, and this dies:

      dtrace -G -C -Ibeam \
        -s beam/erlang_dtrace.d \
        -o obj/amd64-portbld-freebsd14.0/opt/smp/erlang_pu_dtrace.o
        ... long list of objects ...
      dtrace: failed to link script beam/erlang_dtrace.d: an error was encountered while processing obj/amd64-portbld-freebsd14.0/opt/smp/beam_emu_pu.o
      gmake[5]: *** [amd64-portbld-freebsd14.0/Makefile:1005: obj/amd64-portbld-freebsd14.0/opt/smp/erlang_pu_dtrace.o] Error 1
      gmake[5]: Leaving directory '/wrkdirs/usr/ports/lang/erlang/work/otp-OTP-21.3.8.24/erts/emulator'

    Something in beam_emu_pu.o (emitted by clang or llvm 13) is tripping up
    dtrace, but I have very little knowledge about dtrace so I need help
    here. :)

    Now some other erlang runtimes such as lang/erlang-runtime24 *do* build
    successfully with clang 13, but this is only because upstream disabled
    the PGO feature, as a side effect of
    https://github.com/erlang/otp/commit/b165524c732 ("erts: Implement the
    BeamAsm JIT"):

    --- a/erts/configure.in
    +++ b/erts/configure.in
    ...
    @@ -704,6 +719,9 @@ else
       fi
     fi

    +dnl Disable pgo for now
    +USE_PGO=false
    +
     AC_SUBST(USE_PGO)
     AC_SUBST(PROFILE_COMPILER)

    I am unsure why upstream disabled this "for now", as it has been
    disabled for more than a year. So, for now, work around the dtrace
    failures by disabling PGO using the configure flag --disable-pgo, when
    building with clang >= 13.

    PR:             258494
    Approved by:    maintainer timeout (2 weeks)
    MFH:            2021Q4

 lang/erlang-runtime21/Makefile | 7 ++++++-
 lang/erlang-runtime23/Makefile | 7 ++++++-
 lang/erlang/Makefile           | 7 ++++++-
 3 files changed, 18 insertions(+), 3 deletions(-)
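The commit message above describes the workaround as passing --disable-pgo to configure only when building with clang >= 13. The actual Makefile change is not reproduced here (only its diffstat), so the following is a hypothetical sketch of the gating decision alone; the function name and the dotted version-string format are assumptions, not the ports framework's interface.

```python
from typing import List


def configure_args(compiler_type: str, compiler_version: str) -> List[str]:
    """Hypothetical gate: extra configure flags for the erlang ports."""
    major = int(compiler_version.split(".", 1)[0])  # "13.0.0" -> 13
    if compiler_type == "clang" and major >= 13:
        # Disable PGO so no clang 13 profile-use objects reach `dtrace -G`.
        return ["--disable-pgo"]
    return []
```

Gating on the compiler version keeps PGO enabled for older clang releases, where dtrace processes the objects without trouble.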