Bug 276170 - LLVM bug prevents enabling PGO optimization for Python 3.11+
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: 14.0-RELEASE
Hardware: arm64 Any
Importance: --- Affects Only Me
Assignee: freebsd-toolchain (Nobody)
URL:
Keywords: regression
Depends on:
Blocks:
 
Reported: 2024-01-07 11:43 UTC by dmilith
Modified: 2024-01-12 13:12 UTC
Description dmilith 2024-01-07 11:43:58 UTC
To make a long story short:

clang: error: clang frontend command failed with exit code 134 (use -v to see invocation)
FreeBSD clang version 16.0.6 (https://github.com/llvm/llvm-project.git llvmorg-16.0.6-0-g7cbf1a259152)
Target: aarch64-unknown-freebsd14.0
Thread model: posix
InstalledDir: /usr/bin
#0 0x0000000004a3a0e8 (/usr/bin/clang+0x4a3a0e8)
#1 0x0000000004a384c4 (/usr/bin/clang+0x4a384c4)
#2 0x00000000049e99f4 (/usr/bin/clang+0x49e99f4)
#3 0x0000000045e67034 (/lib/libthr.so.3+0x2b034)

PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang: note: diagnostic msg: /tmp/token-f9bcce.c
clang: note: diagnostic msg: /tmp/token-f9bcce.sh
clang: note: diagnostic msg:
Comment 1 dmilith 2024-01-07 11:47:03 UTC
I can't attach both of these files because of some ridiculous limitations.
Comment 3 dmilith 2024-01-07 11:51:34 UTC
To reproduce, build Python 3.9 or later (3.11 and 3.12 behave the same) with --enable-optimizations, which enables PGO, and Clang will then fail.
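
A minimal sketch of that reproduction, assuming an unpacked CPython source tree. The helper name reproduce_pgo_build and the example directory are hypothetical and do not come from the report:

```shell
# Hypothetical helper: configure and build an unpacked CPython tree with PGO.
# On the affected system the compiler crash is expected during "make".
reproduce_pgo_build() {
    srcdir=$1
    if [ ! -x "$srcdir/configure" ]; then
        echo "no configure script in $srcdir" >&2
        return 1
    fi
    # Run in a subshell so the caller's working directory is unchanged.
    ( cd "$srcdir" && ./configure --enable-optimizations && make )
}

# Example (the path is an assumption): reproduce_pgo_build ./Python-3.12.1
```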
Comment 4 dmilith 2024-01-07 11:57:08 UTC
Also, it's worth noting that this works on FreeBSD up to version 13.2 without any issues.
Comment 5 Mark Linimon freebsd_committer freebsd_triage 2024-01-07 12:16:45 UTC
^Triage: assign to responsible group.

Bugmeister comment: I'm sorry that you don't like the filesize restrictions, but they are necessary for us to be able to keep Bugzilla running.
Comment 6 Dimitry Andric freebsd_committer freebsd_triage 2024-01-07 13:44:25 UTC
I could download the repro .c and .sh files from GitHub, but apparently this also needs a "code.profclangd" file which has been produced by an earlier profiling run:

+ clang -cc1 -triple aarch64-unknown-freebsd14.0 -emit-llvm-bc '-flto=full' -flto-unit -disable-free -clear-ast-before-backend -disable-llvm-verifier -discard-value-names -main-file-name token.c -mrelocation-model pic -pic-level 2 -fhalf-no-semantic-interposition '-mframe-pointer=non-leaf' '-ffp-contract=on' -fno-rounding-math -mconstructor-aliases '-funwind-tables=2' -target-cpu generic -target-feature +neon -target-feature +v8a -target-abi aapcs -mllvm -treat-scalable-fixed-error-as-warning '-debug-info-kind=standalone' '-dwarf-version=5' '-debugger-tuning=gdb' '-fprofile-instrument-use-path=code.profclangd' '-fcoverage-compilation-dir=/Software/Python312/.src_34340eab1ef9bee6/Python-3.12.1' -D NDEBUG -D Py_BUILD_CORE -O3 -Wformat -Wformat-security -Wformat -Wformat-security '-Werror=implicit-function-declaration' -w '-std=c11' '-fdebug-compilation-dir=/Software/Python312/.src_34340eab1ef9bee6/Python-3.12.1' -ferror-limit 1 '-fvisibility=hidden' -fwrapv -pthread -stack-protector 1 -stack-protector-buffer-size 4 -stack-protector-buffer-size 4 -fno-signed-char '-fgnuc-version=4.2.1' -vectorize-loops -vectorize-slp -faddrsig '-D__GCC_HAVE_DWARF2_CFI_ASM=1' -x c token-7986e1.c
error: Error in reading profile code.profclangd: No such file or directory

If you still have this file, can you also upload it to GitHub?
Comment 7 dmilith 2024-01-07 13:48:32 UTC
I'm sorry. Didn't want to be rude. I noticed the generated file size later. Didn't expect a 1MiB C file :)
Comment 8 Dimitry Andric freebsd_committer freebsd_triage 2024-01-07 13:55:38 UTC
It's not exceptional for a preprocessed .c file to be more than 1MB. If you preprocess some C++ sources it gets even crazier. :)  In any case, you can save quite a bit of space by xz'ing such files, as they compress very well.
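
As a small sketch of the xz suggestion, the helper below compresses each crash artifact and keeps the original next to it. The helper name compress_repro_files is hypothetical. The flags used are standard xz options: -k keeps the input file and -9 selects the highest regular preset.

```shell
# Hypothetical helper: compress crash artifacts with xz before attaching them.
# Each input FILE gets a FILE.xz next to it; the original is kept (-k).
compress_repro_files() {
    for f in "$@"; do
        xz -9 -k -- "$f" || return 1
    done
}

# Example (file names from the report):
# compress_repro_files /tmp/token-f9bcce.c /tmp/token-f9bcce.sh
```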
Comment 9 dmilith 2024-01-07 14:07:11 UTC
(In reply to Dimitry Andric from comment #6)

I did a clean build of Python312 with "--enable-optimizations". Here I've packed everything: https://software.verknowsys.com/fbsd-bug-reports/cleanbuild-Python-3.12.1-FreeBSD14.0-arm64.zip

The archive contains all code* files generated in the Python build dir.
All files generated by clang during the crash are under "tmp_generated_by_the_build". There are 4 kinds of crashes from what I understand.
Comment 10 Dimitry Andric freebsd_committer freebsd_triage 2024-01-07 14:20:29 UTC
I can't reproduce any crashes here, at least on my 15-CURRENT machine. Since 15-CURRENT uses a newer version of clang, I have used an older binary which identifies itself as "FreeBSD clang version 16.0.6 (https://github.com/llvm/llvm-project.git llvmorg-16.0.6-0-g7cbf1a259152)", corresponding to your version.

With the unpacked contents of your zipfile, I can run all .sh files (I only replaced the clang command in there to point at clang 16.0.6) successfully, and they all produce .bc files as intended:

$ ls -l *.bc
-rw-r--r--  1 dim dim 258948 2024-01-07 15:17:06 pegen-d7bb62.bc
-rw-r--r--  1 dim dim  52752 2024-01-07 15:17:07 pegen_errors-c153bb.bc
-rw-r--r--  1 dim dim   4244 2024-01-07 15:17:07 python-50eaf7.bc
-rw-r--r--  1 dim dim  14520 2024-01-07 15:17:07 token-7986e1.bc
-rw-r--r--  1 dim dim  14428 2024-01-07 15:17:07 token-9bf20d.bc

However, as indicated, this is on an amd64 host with 15-CURRENT and ample RAM.

I suspect that something else may be going on on your system. Do you see any out of memory errors or other indications like segfaults in dmesg or system logs?

Can you also share the full error message shown when an instance of clang crashes? There should be some more information about what exactly is causing the 134 exit code, which usually indicates an assertion failure.
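
For context on the 134 exit code: a shell reports a process killed by signal N as exit status 128 + N, and 134 - 128 = 6 is SIGABRT, the signal raised by abort() when an assertion fails. A minimal sketch that decodes such a status (the helper name signal_from_status is hypothetical):

```shell
# Hypothetical helper: print the signal name behind an exit status above 128.
signal_from_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        # kill -l with a number prints the matching signal name.
        kill -l "$((status - 128))"
    else
        echo "status $status does not indicate death by signal" >&2
        return 1
    fi
}

signal_from_status 134
```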
Comment 11 dmilith 2024-01-07 15:00:18 UTC
I'm quite certain about the VM software. I have used the same QEMU 6.1 + HVF version for several years now, as for all other build hosts. On the FreeBSD v13.2-aarch64 VM, the PGO works flawlessly.

The full stdout/stderr output from the compilation is here: https://gist.github.com/dmilith/4ba2f5dbdc3026f1638f801f317834b1
Comment 12 dmilith 2024-01-07 15:12:20 UTC
Also, it's 100% reproducible. Even from the sh file:

#16:11:12 vks5-14-0 /tmp λ cp /Software/Python312/.src_34340eab1ef9bee6/Python-3.12.1/code.profclangd ./
#16:11:37 vks5-14-0 /tmp λ sh ./pegen_errors-927930.sh
Expected<T> must be checked before access or destruction.
Expected<T> value was in success state. (Note: Expected<T> values in success mode must still be checked prior to being destroyed).
PLEASE submit a bug report to https://bugs.freebsd.org/submit/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.      Program arguments: /usr/bin/clang -cc1 -triple aarch64-unknown-freebsd14.0 -emit-llvm-bc -flto=thin -flto-unit -disable-free -clear-ast-before-backend -disable-llvm-verifier -discard-value-names -main-file-name pegen_errors.c -mrelocation-model pic -pic-level 2 -fhalf-no-semantic-interposition -mframe-pointer=non-leaf -ffp-contract=on -fno-rounding-math -mconstructor-aliases -funwind-tables=2 -target-cpu generic -target-feature +neon -target-feature +v8a -target-abi aapcs -mllvm -treat-scalable-fixed-error-as-warning -debug-info-kind=standalone -dwarf-version=5 -debugger-tuning=gdb -fprofile-instrument-use-path=code.profclangd -fcoverage-compilation-dir=/Software/Python312/.src_34340eab1ef9bee6/Python-3.12.1 -D NDEBUG -D Py_BUILD_CORE -O3 -Wformat -Wformat-security -Wformat -Wformat-security -Werror=implicit-function-declaration -w -std=c11 -fdebug-compilation-dir=/Software/Python312/.src_34340eab1ef9bee6/Python-3.12.1 -ferror-limit 1 -fvisibility=hidden -ftrapv -fwrapv -pthread -stack-protector 1 -stack-protector-buffer-size 4 -stack-protector-buffer-size 4 -fno-signed-char -fgnuc-version=4.2.1 -vectorize-loops -vectorize-slp -faddrsig -D__GCC_HAVE_DWARF2_CFI_ASM=1 -x c pegen_errors-927930.c
#0 0x0000000004a3a0e8 (/usr/bin/clang+0x4a3a0e8)
#1 0x0000000004a384c4 (/usr/bin/clang+0x4a384c4)
#2 0x0000000004a3a7c8 (/usr/bin/clang+0x4a3a7c8)
#3 0x0000000045e67034 (/lib/libthr.so.3+0x2b034)
zsh: exit 134   sh ./pegen_errors-927930.sh
Comment 13 Dimitry Andric freebsd_committer freebsd_triage 2024-01-07 15:18:37 UTC
(In reply to dmilith from comment #12)
So on what host are you running this? A 14.0-RELEASE arm64 one?

I vaguely remember this "Expected<T> must be checked before access or destruction" showing up in the past for some lldb users, but that was never solved as it was not reproducible... At least not for me.
Comment 14 Dimitry Andric freebsd_committer freebsd_triage 2024-01-07 15:28:25 UTC
Never mind, I can reproduce here on a 14.0-RELEASE-p3 arm64 VM, with the system clang which is 16.0.6.

However, I also tried clang 17 from ports and that works fine with all the files. Since I am currently working on MFCing llvm 17 to stable/14 and stable/13, I'm unsure whether it is worth spending the time to figure out which upstream commit fixed this.

That said, for 14.0 release it might be fixed in an errata notice, but it is quite a lot of work, and this problem only occurs in a very specific use case (which can be worked around easily), and will be made moot anyway after 17 is MFCd...
Comment 15 dmilith 2024-01-07 15:43:36 UTC
Cool, it means the fix is going to land ~14.1, right? Thanks for the confirmation.
Comment 16 Dimitry Andric freebsd_committer freebsd_triage 2024-01-07 15:51:30 UTC
If you need to work around it now, you can do "pkg install llvm17" from ports, and use that one instead, to get a PGO'd Python. (I hope that runs faster than regular Python ... :)
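
A rough sketch of that workaround, under stated assumptions: the llvm17 package is taken to install clang17 and llvm-profdata17 under /usr/local/bin, and Python's configure script is taken to honor a preset LLVM_PROFDATA variable. The helper name build_python_with_llvm17 and the example path are hypothetical.

```shell
# Hypothetical helper: configure and build CPython with PGO using clang 17
# from the llvm17 package instead of the base system clang 16.0.6.
# Assumptions: clang17 and llvm-profdata17 live under /usr/local/bin, and
# configure picks up CC and LLVM_PROFDATA from the environment.
build_python_with_llvm17() {
    srcdir=$1
    (
        cd "$srcdir" || exit 1
        CC=/usr/local/bin/clang17 \
        LLVM_PROFDATA=/usr/local/bin/llvm-profdata17 \
            ./configure --enable-optimizations && make
    )
}

# Example, after "pkg install llvm17" (the path is an assumption):
# build_python_with_llvm17 ./Python-3.12.1
```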
Comment 17 dmilith 2024-01-07 16:02:29 UTC
I already worked around it by simply adding "--disable-optimizations" for Python39, Python311 and Python312 on my FreeBSD-14.0 build system. It works :)

Thank you!
Comment 18 dmilith 2024-01-07 16:07:09 UTC
(In reply to Dimitry Andric from comment #16)

Well, Python can be significantly faster with PGO indeed :)

I wonder why "--enable-optimizations" is not enabled for Python port builds?
Comment 19 Mark Millard 2024-01-07 16:17:47 UTC
(In reply to dmilith from comment #12)

Looks like this is an internal report not directly about the source code
being compiled.

See, for example: https://reviews.llvm.org/D138781

where another example of this type of message is referenced and, apparently,
fixed for RISCV.
Comment 20 dmilith 2024-01-12 12:13:27 UTC
Well, it's worth mentioning that the issue is:
- Present on Python 3.9+, but also any PGO optimization of any software will crash the compiler on arm64/aarch64.
- If the port definitions used the "--enable-optimizations" configure option, it would affect 100% of the users.

Cheers!
Comment 21 Dimitry Andric freebsd_committer freebsd_triage 2024-01-12 13:12:36 UTC
(In reply to dmilith from comment #20)
A quick workaround would be to disable the PGO option by default in the port, *iff* the architecture is arm64. I think there are more examples of that in the ports tree.
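
The idea of switching PGO off only on arm64 can be illustrated in plain shell. This is not how the ports framework expresses per-architecture option defaults, only a sketch of the decision itself. The helper name pgo_configure_flag is hypothetical, and the architecture strings assume that FreeBSD reports this platform as aarch64 or arm64.

```shell
# Hypothetical helper: choose the Python configure flag from the architecture.
# With no argument, the architecture is taken from "uname -p".
pgo_configure_flag() {
    arch=${1:-$(uname -p)}
    case "$arch" in
        aarch64|arm64) echo "--disable-optimizations" ;;  # affected by the clang 16.0.6 crash
        *)             echo "--enable-optimizations" ;;
    esac
}

# Example: ./configure "$(pgo_configure_flag)"
```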