Bug 265809 - clang never finishes on one particular C++ file (for science/psi4)
Status: In Progress
Alias: None
Product: Base System
Classification: Unclassified
Component: misc
Version: 13.1-STABLE
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: freebsd-toolchain (Nobody)
URL: https://github.com/llvm/llvm-project/...
Keywords:
Depends on:
Blocks:
 
Reported: 2022-08-13 06:40 UTC by Yuri Victorovich
Modified: 2023-07-03 20:52 UTC
CC List: 3 users


Attachments
unity_1935_cxx.cxx.E.cxx.bz2 (494.30 KB, application/x-bzip)
2022-08-13 06:40 UTC, Yuri Victorovich
SIGQUIT from mariadb105-server-10.5.17 (8.19 KB, text/plain)
2022-09-15 17:20 UTC, Morgan Wesström
pars0pars.cc from mariadb105-server-10.5.17 (59.17 KB, text/plain)
2022-09-15 17:21 UTC, Morgan Wesström

Description Yuri Victorovich freebsd_committer freebsd_triage 2022-08-13 06:40:04 UTC
Created attachment 235873
unity_1935_cxx.cxx.E.cxx.bz2

clang never finishes on the attached (compressed) C++ unit.

The command
> c++ -fno-strict-aliasing -fno-omit-frame-pointer -march=native -O3 -fPIC -c unity_1935_cxx.cxx.E.cxx

doesn't finish in hours.

But when -fno-strict-aliasing is removed it finishes within a minute.
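
One way to make this comparison repeatable is to run each variant of the command under a time limit. The following is a minimal sketch, not something used in this report: the helper names (finishes_within, without_flag, compare_flag) and the choice of time limit are assumptions made for illustration.

```python
import subprocess


def finishes_within(argv, seconds):
    """Return True if the command exits within `seconds`, False if it timed out.

    A command that exits with an error still counts as finished; on timeout
    subprocess.run kills the child before raising TimeoutExpired.
    """
    try:
        subprocess.run(argv, timeout=seconds,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        return True
    except subprocess.TimeoutExpired:
        return False


def without_flag(argv, flag):
    """Return a copy of argv with every occurrence of `flag` removed."""
    return [arg for arg in argv if arg != flag]


def compare_flag(argv, flag, seconds):
    """Run argv as given and again with `flag` removed; report which finished in time."""
    return {
        "with": finishes_within(argv, seconds),
        "without": finishes_within(without_flag(argv, flag), seconds),
    }
```

For the command above, this could be called as compare_flag(["c++", "-fno-strict-aliasing", "-fno-omit-frame-pointer", "-march=native", "-O3", "-fPIC", "-c", "unity_1935_cxx.cxx.E.cxx"], "-fno-strict-aliasing", 600). Going by the behaviour described here, the expected result would be False for "with" and True for "without".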

> $ c++ --version
> FreeBSD clang version 14.0.5 (https://github.com/llvm/llvm-project.git llvmorg-14.0.5-0-gc12386ae247c)
> Target: x86_64-unknown-freebsd13.1
> Thread model: posix
> InstalledDir: /usr/bin


This problem was originally discovered when one C++ module in the psi4 project (https://github.com/psi4/psi4) never finished compiling.
Comment 1 John F. Carr 2022-08-15 18:57:59 UTC
The hang (or exponential runtime) appears to be in LLVM's DAGCombiner::Run working on the function deriv2eri3_aB_P__0__F__1___TwoPRep_unit__0__P__1___Ab__up_0.  To narrow it down I ran clang++ -emit-llvm to get an IR file.  That ran quickly.  Then I ran llc from an LLVM 14 build (it's not installed by default) on that IR file and that part hung.  The C++ file does not compile with LLVM 15 clang due to changes in intrinsics.  The LLVM 15 llc program hangs on the output from LLVM 14 clang.
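
The split into a front-end step and a back-end step can be sketched roughly as follows. The exact flags used for this analysis are not stated in the comment, so the command lines here (-S -emit-llvm for clang++, a plain -O3 for llc), the output file names, and the helper names are assumptions. llc has to come from a matching LLVM build, since it is not installed by default.

```python
import subprocess


def stage_commands(source, clang="clang++", llc="llc", opt_level="-O3"):
    """Build the two command lines: C++ to textual IR, then IR to assembly.

    The flags are illustrative assumptions, not the ones used in the report.
    """
    ir_file = source + ".ll"
    asm_file = source + ".s"
    return [
        ("frontend", [clang, opt_level, "-S", "-emit-llvm", source, "-o", ir_file]),
        ("backend", [llc, opt_level, ir_file, "-o", asm_file]),
    ]


def first_slow_stage(stages, seconds):
    """Run the stages in order and return the name of the first one that
    exceeds the time limit, or None if all of them finish.

    A stage that fails outright raises CalledProcessError, because the
    later stage depends on its output.
    """
    for name, argv in stages:
        try:
            subprocess.run(argv, timeout=seconds, check=True)
        except subprocess.TimeoutExpired:
            return name
    return None
```

With the attached file, first_slow_stage(stage_commands("unity_1935_cxx.cxx.E.cxx"), 600) would be expected to return "backend" if the behaviour matches what is described above.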

Is anybody in particular responsible for reporting LLVM bugs upstream?
Comment 2 Morgan Wesström 2022-09-13 18:20:16 UTC
I believe I'm hitting the same or a related problem after a source upgrade from FreeBSD-13.0-RELEASE-p5 to FreeBSD-13.1-RELEASE-p2. When recompiling all ports afterwards, the build consistently gets stuck on the same file from databases/mariadb105-server-10.5.17:

/usr/ports/databases/mariadb105-server/work/mariadb-10.5.17/storage/innobase/pars/pars0sym.cc

The compilation process sits at 100% CPU but writes nothing to the object file, and truss(1) shows no syscalls at this point. I let it sit in this state for 8+ hours without any progress, then killed the process.

I quickly installed a fresh FreeBSD-13.1-p2 in a virtual machine and verified that the ports compilation worked there. The only things that differ between my machines and a default setup are a few exclusions in /etc/src.conf and a CPUTYPE?=bonnell directive in /etc/make.conf.

I reverted those changes and recompiled my system, and the ports compilation of mariadb105-server then worked normally again. I restored CPUTYPE?=bonnell and recompiled the system again, and once more the compilation of mariadb got stuck at the exact same file.

My knowledge is limited and I don't know how to track down the problem further but the CPUTYPE directive adds a -march= to the compiler arguments just as the original post shows. Perhaps there's a malfunction here in the clang/llvm 13 version included in FreeBSD-13.1? I had no such problems with clang/llvm 11 in FreeBSD-13.0.
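
To check which architecture option actually reaches the compiler, the CFLAGS used by the build can be inspected. The sketch below assumes the string comes from running make -V CFLAGS in the port directory; that step, the helper name arch_flags, and the sample value in the usage note are assumptions and were not taken from this system.

```python
import shlex


def arch_flags(cflags):
    """Return the -march=, -mtune= and -mcpu= options found in a CFLAGS string."""
    prefixes = ("-march=", "-mtune=", "-mcpu=")
    return [tok for tok in shlex.split(cflags) if tok.startswith(prefixes)]
```

For a hypothetical value such as "-O2 -pipe -march=bonnell -fstack-protector-strong", arch_flags returns ["-march=bonnell"]. An empty list would suggest that the CPUTYPE setting is not being applied.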

As an extra test I also tried to compile mariadb106-server and it experiences the exact same stuck behaviour, albeit on a different file but in the same directory. I forgot to make a note of that filename unfortunately.
Comment 3 John F. Carr 2022-09-14 14:50:59 UTC
(In reply to Morgan Wesström from comment #2)

I was unable to reproduce the hang.  It may be unrelated to the original bug.  There are many reasons a compiler might hang.  If you send the compiler a QUIT signal while it is stuck it will crash and save enough information to submit a bug report: the preprocessed source and a command line to invoke the compiler.
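
As a rough illustration of that suggestion, the compile can be started with a time limit and sent SIGQUIT if it is still running when the limit expires. This is only a sketch: the function name run_then_quit, the grace period, and the fallback to SIGKILL are assumptions. Doing the same by hand amounts to kill -QUIT with the PID of the stuck compiler.

```python
import signal
import subprocess


def run_then_quit(argv, seconds, grace=60):
    """Run argv; if it is still running after `seconds`, send it SIGQUIT.

    Returns the exit status (a negative signal number if the process was
    killed by a signal). If the process has not exited `grace` seconds
    after SIGQUIT, it is killed with SIGKILL instead.
    """
    proc = subprocess.Popen(argv)
    try:
        return proc.wait(timeout=seconds)
    except subprocess.TimeoutExpired:
        proc.send_signal(signal.SIGQUIT)
        try:
            # Give the compiler time to write its crash report.
            return proc.wait(timeout=grace)
        except subprocess.TimeoutExpired:
            proc.kill()
            return proc.wait()
```

The preprocessed source and run script mentioned above are written by the compiler itself; this helper only delivers the signal and waits.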
Comment 4 Morgan Wesström 2022-09-14 15:06:24 UTC
(In reply to John F. Carr from comment #3)
Thank you, John. These are two identical 10-year-old Atom D525 boxes. They are slow and take about 18 hours to recompile the whole base system. I will try your suggestion but it will take a few days before I have something to report. :)
Comment 5 Dimitry Andric freebsd_committer freebsd_triage 2022-09-14 15:24:20 UTC
I can reproduce this with Yuri's original test case. I'm currently attempting to reduce the test case to something that can be checked against various versions of clang, to see if it is a regression. After it has been reduced, I will probably submit it as an upstream bug.
Comment 6 Morgan Wesström 2022-09-15 17:19:43 UTC
(In reply to John F. Carr from comment #3)

I've recompiled my system again with CPUTYPE?=bonnell and can now reproduce the stall again. My earlier copy and paste unfortunately referenced the wrong file. The correct file should be pars0pars.cc for mariadb105-server-10.5.17. I apologize for that error. I have attached the crash backtrace and the source file.

If this isn't related to the original bug report, feel free to delete my posts and advise me if I should create a new bug report for this. I realize old Atom CPUs don't get much love these days and that I may be the only person affected.
Comment 7 Morgan Wesström 2022-09-15 17:20:39 UTC
Created attachment 236572
SIGQUIT from mariadb105-server-10.5.17
Comment 8 Morgan Wesström 2022-09-15 17:21:10 UTC
Created attachment 236573
pars0pars.cc from mariadb105-server-10.5.17
Comment 9 John F. Carr 2022-09-15 17:52:07 UTC
(In reply to Morgan Wesström from comment #7)

There's nothing BSD-specific here so I filed a bug report with LLVM.  https://github.com/llvm/llvm-project/issues/57764
Comment 10 Morgan Wesström 2022-09-15 18:26:35 UTC
(In reply to John F. Carr from comment #9)

Greatly appreciated, thank you. :) This is far beyond my level of understanding but I've subscribed to that thread and will monitor it in case more info is requested. Once again, I apologize for hijacking this thread with an unrelated issue; it was not my intention.
Comment 11 John F. Carr 2023-07-03 20:45:11 UTC
LLVM issue 57764 has been accidentally fixed between 16.0 and the latest main branch.  There is no obviously relevant commit message but the latest llc compiles the IR file in 2 minutes instead of forever.
Comment 12 Dimitry Andric freebsd_committer freebsd_triage 2023-07-03 20:52:43 UTC
(In reply to John F. Carr from comment #11)
I don't see that. With LLVM main (llvmorg-17-init-16183-g7b31a73ffe8) I still get:

Assertion failed: ((ExtraInfo->getCascade(Intf->reg()) < Cascade || VirtReg.isSpillable() < Intf->isSpillable()) && "Cannot decrease cascade number, illegal eviction"), function evictInterference, file /home/dim/src/llvm/llvm-project/llvm/lib/CodeGen/RegAllocGreedy.cpp, line 505.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.      Program arguments: /home/dim/ins/llvmorg-17-init-16183-g7b31a73ffe8/bin/clang -O2 -c hang.ll
1.      Code generation
2.      Running pass 'Function Pass Manager' on module 'hang.ll'.
3.      Running pass 'Greedy Register Allocator' on function '@_Z26pars_info_add_int4_literalP11pars_info_tPKcm'

on your .ll test case. Maybe you have a 16.0 version with assertions disabled?
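
A quick way to tell the two kinds of build apart is to look at the tool's version banner. The sketch below assumes that an assertions-enabled LLVM build includes the words "with assertions" in its --version output (for example "Optimized build with assertions."). That detail is from memory, and the helper names are made up for illustration.

```python
import subprocess


def built_with_assertions(version_text):
    """True if the --version text of an LLVM tool mentions an assertions build."""
    return "with assertions" in version_text


def tool_has_assertions(tool="llc"):
    """Run `tool --version` and check its banner; `tool` must be on PATH."""
    result = subprocess.run([tool, "--version"],
                            capture_output=True, text=True, check=True)
    return built_with_assertions(result.stdout)
```

If tool_has_assertions("llc") returns False for the 16.0 binary, the assertion in RegAllocGreedy.cpp would simply not be checked there, which could explain why one build finishes in 2 minutes while the other aborts.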