Bug 276690 - Compilation of a particular module never ends causing runaway builds for the port graphics/diplib on arm64 architecture
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: misc
Version: 14.0-STABLE
Hardware: arm64 Any
Importance: --- Affects Only Me
Assignee: freebsd-toolchain (Nobody)
URL: https://pkg-status.freebsd.org/ampere...
Keywords:
Depends on:
Blocks:
 
Reported: 2024-01-28 19:36 UTC by Yuri Victorovich
Modified: 2024-02-09 18:54 UTC
5 users

See Also:


Description Yuri Victorovich freebsd_committer freebsd_triage 2024-01-28 19:36:40 UTC
The problem occurs consistently on both 14 and 15.

See the linked build log.

Also see the history of fallouts: https://portsfallout.com/fallout?port=graphics%2Fdiplib&maintainer=yuri%40FreeBSD.org&env=&category=&flavor=
Comment 1 Mark Millard 2024-01-28 20:00:01 UTC
(In reply to Yuri Victorovich from comment #0)

Freshports indicates:

FreeBSD:15:aarch64	3.4.1
2024-01-17 20:15 – repo build date
2024-01-18 17:00 – processed by FreshPorts
2024-01-28 19:00 – last checked by FreshPorts

as a successful build for 15.

https://portsfallout.com/fallout?port=graphics%2Fdiplib&maintainer=yuri%40FreeBSD.org&env=&category=&flavor

shows no main-arm64 failures after 2023-12-08 05:30.

It also shows only aarch64 (arm64) failures, apart from one powerpc64le
failure. So is "Hardware: any any" a misclassification?
Comment 2 Yuri Victorovich freebsd_committer freebsd_triage 2024-01-28 20:46:16 UTC
https://portsfallout.com shows that the problem existed at least from September 2023, and probably earlier.
Comment 3 Mark Millard 2024-01-28 20:56:35 UTC
(In reply to Yuri Victorovich from comment #2)

For main [so: 15] the problem last happened with:

build started at Fri Dec  8 02:53:32 UTC 2023
port directory: /usr/ports/graphics/diplib
package name: diplib-3.4.0
building for: FreeBSD main-arm64-default-job-11 15.0-CURRENT FreeBSD 15.0-CURRENT 1500005 arm64
. . .
Host OSVERSION: 1500000
Jail OSVERSION: 1500005

The first working one was with:

build started at Tue Dec 19 18:15:30 UTC 2023
port directory: /usr/ports/graphics/diplib
package name: diplib-3.4.0
building for: FreeBSD main-arm64-default-job-08 15.0-CURRENT FreeBSD 15.0-CURRENT 1500007 arm64
. . .
Host OSVERSION: 1500006
Jail OSVERSION: 1500007

The update to llvm17 came after 1500005 but before 1500007:

Bump __FreeBSD_version for llvm 17.0.6 merge
PR:		273753
MFC after:	1 month
Diffstat (limited to 'sys/sys/param.h')
-rw-r--r--	sys/sys/param.h	2	
1 files changed, 1 insertions, 1 deletions
diff --git a/sys/sys/param.h b/sys/sys/param.h
index 107b86707c9e..c79c46ab4342 100644
--- a/sys/sys/param.h
+++ b/sys/sys/param.h
@@ -73,7 +73,7 @@
  * cannot include sys/param.h and should only be updated here.
  */
 #undef __FreeBSD_version
-#define __FreeBSD_version 1500005
+#define __FreeBSD_version 1500006
 
 /*
  * __FreeBSD_kernel__ indicates that this system uses the kernel of FreeBSD,


That might explain the difference for main [so: 15].
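A minimal C sketch of the bracketing reasoning above: given build results sorted by build date, find the last failing jail OSVERSION that is followed only by successes. The `build_result` struct and `find_fix_window()` are hypothetical names used only for illustration, and the sketch assumes the results are already sorted; it only brackets the change, it does not identify the commit.

```c
#include <assert.h>
#include <stddef.h>

/* One package-build attempt: the jail's __FreeBSD_version and the outcome. */
struct build_result {
    long jail_osversion;
    int succeeded; /* 1 = package built, 0 = failed or hung */
};

/*
 * Scan results (sorted by build date) for the last failure that is followed
 * only by successes. On finding one, store the last failing and first passing
 * OSVERSION and return 1. Return 0 if every build passed, the latest build
 * still fails, or there are no results.
 */
static int find_fix_window(const struct build_result *r, size_t n,
                           long *last_bad, long *first_good)
{
    assert(r != NULL || n == 0);
    size_t i = n;
    /* Walk backwards over the trailing run of successful builds. */
    while (i > 0 && r[i - 1].succeeded)
        i--;
    if (i == 0 || i == n)
        return 0;
    *last_bad = r[i - 1].jail_osversion;
    *first_good = r[i].jail_osversion;
    return 1;
}
```

With the two builds quoted above, `{1500005, 0}, {1500007, 1}` yields the window 1500005 to 1500007, which brackets the 1500006 bump.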
Comment 4 Mark Millard 2024-01-28 21:08:24 UTC
Note: I wish it were easier to go through the fallout information by
the system toolchain vintage used in the Jail OSVERSION. Here, comparing
llvm16 vs. llvm17 results would be handy.
Comment 5 Yuri Victorovich freebsd_committer freebsd_triage 2024-01-28 21:09:55 UTC
This type of problem is more likely to be a compiler-related bug than a kernel-related issue.
Comment 6 Yuri Victorovich freebsd_committer freebsd_triage 2024-01-28 21:10:44 UTC
To reduce fallout, I disabled this port on this architecture until this is fixed.
Comment 7 Warner Losh freebsd_committer freebsd_triage 2024-01-28 21:25:41 UTC
So this is aarch64 building for aarch64, right? No user-static involved?
Comment 8 Dimitry Andric freebsd_committer freebsd_triage 2024-01-28 21:34:37 UTC
I can reproduce. It's hanging on copy_buffer.cpp:

clang -cc1 -triple aarch64-unknown-freebsd14.0 -emit-obj -disable-free -clear-ast-before-backend -disable-llvm-verifier -discard-value-names -main-file-name copy_buffer.cpp -mrelocation-model pic -pic-level 2 -fhalf-no-semantic-interposition '-mframe-pointer=non-leaf' -relaxed-aliasing '-ffp-contract=on' -fno-rounding-math -mconstructor-aliases '-funwind-tables=2' -target-cpu generic -target-feature +neon -target-feature +v8a -target-abi aapcs -mllvm -treat-scalable-fixed-error-as-warning '-debugger-tuning=gdb' '-fcoverage-compilation-dir=/wrkdirs/usr/ports/graphics/diplib/work/.build' -sys-header-deps -D DIP_CONFIG_DIP_BUILD_SHARED -D DIP_CONFIG_DOCTEST_IN_SHARED_LIB -D DIP_CONFIG_ENABLE_DOCTEST -D DIP_CONFIG_ENABLE_STACK_TRACE -D DIP_CONFIG_ENABLE_UNICODE -D DIP_CONFIG_HAS_ICS -D DIP_CONFIG_HAS_JPEG -D DIP_CONFIG_HAS_PRETTY_FUNCTION -D DIP_CONFIG_HAS_TIFF -D 'DIP_COPYRIGHT_YEAR="2023"' -D 'DIP_DEBUG_VERSION=0' -D 'DIP_EXAMPLES_DIR="/wrkdirs/usr/ports/graphics/diplib/work/diplib-3.4.1/examples"' -D DIP_EXPORTS -D 'DIP_VERSION_STRING="3.4.1"' -D DOCTEST_CONFIG_NO_SHORT_MACRO_NAMES -D EIGEN_DONT_PARALLELIZE -D EIGEN_MPL2_ONLY -D ICS_ZLIB -D NDEBUG -O2 -Wall -Wconversion -Wsign-conversion -Wno-c++17-extensions -Wno-gnu-line-marker -pedantic '-std=c++14' -fdeprecated-macro '-fdebug-compilation-dir=/wrkdirs/usr/ports/graphics/diplib/work/.build' -ferror-limit 19 '-fvisibility=hidden' -fvisibility-inlines-hidden -fopenmp -stack-protector 2 -fno-signed-char '-fgnuc-version=4.2.1' -fcxx-exceptions -fexceptions -vectorize-loops -vectorize-slp -faddrsig '-D__GCC_HAVE_DWARF2_CFI_ASM=1' -x c++ copy_buffer-bb5d14.cpp

I'm going to check whether this is reducible, or if there is an upstream fix.

Did this port ever compile successfully, with an older version of clang, or with an older version of the port?
Comment 9 Mark Millard 2024-01-28 21:53:55 UTC
(In reply to Yuri Victorovich from comment #6)

But it builds for main [so: 15] when the jail OSVERSION
includes llvm 17.0.6. The older jail OSVERSIONs with
llvm16 fail.

FYI (it is a mess to figure this out):

1302510 llvm 17 based
1400504 llvm 17 based
1500006 llvm 17 based

1302507 llvm 16 based
1400091 llvm 16 based

1302505 llvm 15 based
1400079 llvm 15 based

1301504 llvm 14 based
1400059 llvm 14 based

1400042 llvm 13 based

1300513 llvm 12 based
1400023 llvm 12 based

1300137 llvm 10 updates
1300096 llvm 10 updates
1300084 llvm 10 based
1300056 llvm fixes
1200060 llvm 6 based
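The list above can be turned into a lookup from jail OSVERSION to the base system LLVM major. This is a hedged sketch: it assumes each "llvm N based" entry marks the first OSVERSION on its branch (major = version / 100000) to use that LLVM major, it deliberately covers only the llvm 14 to 17 entries for 13.x, 14.x and 15.x, and `system_llvm_major()` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* One "llvm N based" entry from the list above. */
struct llvm_import {
    long osversion; /* first __FreeBSD_version with this LLVM major */
    int llvm_major;
};

/* Assumption: only the llvm 14-17 entries for 13.x, 14.x and 15.x. */
static const struct llvm_import llvm_imports[] = {
    {1301504, 14}, {1302505, 15}, {1302507, 16}, {1302510, 17},
    {1400059, 14}, {1400079, 15}, {1400091, 16}, {1400504, 17},
    {1500006, 17},
};

/*
 * Return the LLVM major of the base system for a jail OSVERSION: the newest
 * table entry on the same major branch that is <= osversion. Returns 0 if
 * the table has no such entry.
 */
static int system_llvm_major(long osversion)
{
    assert(osversion > 0);
    long branch = osversion / 100000; /* e.g. 1400504 -> 14 */
    long best_version = 0;
    int best_major = 0;
    for (size_t i = 0; i < sizeof llvm_imports / sizeof llvm_imports[0]; i++) {
        long v = llvm_imports[i].osversion;
        if (v / 100000 == branch && v <= osversion && v > best_version) {
            best_version = v;
            best_major = llvm_imports[i].llvm_major;
        }
    }
    return best_major;
}
```

For example, 1400097 falls after the 1400091 entry and gives 16, 1500007 gives 17, and 1500005 gives 0 because the only 15.x entry in this table is 1500006.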
Comment 10 Mark Millard 2024-01-28 21:56:02 UTC
(In reply to Dimitry Andric from comment #8)

It started building just fine for the main-arm64
build servers when they switched to a Jail OSVERSION
that was llvm17 based.
Comment 11 Mark Millard 2024-01-28 21:57:52 UTC
(In reply to Yuri Victorovich from comment #5)

True.

But 1500006 was created to indicate the toolchain
update to llvm17, not a kernel update.
Comment 12 Mark Millard 2024-01-28 22:06:52 UTC
(In reply to Warner Losh from comment #7)

Official ampere* based builds (based on jail OSVERSIONs
that use llvm16). No qemu use.
Comment 13 Dimitry Andric freebsd_committer freebsd_triage 2024-01-29 21:39:23 UTC
It's weird: I can find an upstream commit that appears to fix this hang, https://github.com/llvm/llvm-project/commit/56e60bc5bbfb8fdf2b22a897e8801c87771c84e8 ("TargetLowering: fix an infinite DAG combine in SimplifySETCC"), but it appears to already be included in llvm 17.0.6.

I thought I could apply that change and make it work, but now I am having trouble reproducing the original hang. :)
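To illustrate what an "infinite DAG combine" means in general, here is a toy model in C. It is not LLVM's SelectionDAG code and not the actual SimplifySETCC logic: two hypothetical rewrite rules each treat the other's output as non-canonical, so a rewrite-until-nothing-changes loop never reaches a fixed point. The `setcc` struct, both rules and the `max_steps` cap are assumptions made for the sketch; the cap exists only so the example terminates, whereas a loop without one would keep spinning, consistent with a compile that never ends.

```c
#include <assert.h>
#include <stddef.h>

/* Toy comparison node "x OP c", with OP either < or <=. */
enum cmp_op { CMP_LT, CMP_LE };
struct setcc {
    enum cmp_op op;
    long c;
};

/* A rewrite rule returns 1 if it changed the node, 0 otherwise. */
typedef int (*rule_fn)(struct setcc *);

/* Rule A prefers <=:  x < c   ==>  x <= c - 1 */
static int rule_lt_to_le(struct setcc *n)
{
    if (n->op != CMP_LT)
        return 0;
    n->op = CMP_LE;
    n->c -= 1;
    return 1;
}

/* Rule B prefers <:   x <= c  ==>  x < c + 1  (exactly undoes rule A) */
static int rule_le_to_lt(struct setcc *n)
{
    if (n->op != CMP_LE)
        return 0;
    n->op = CMP_LT;
    n->c += 1;
    return 1;
}

/*
 * Apply the rules until none fires (a fixed point) or max_steps rewrites
 * have been made. Returns the number of rewrites; a value equal to
 * max_steps means the loop was cut off before confirming a fixed point.
 */
static int combine(struct setcc *n, const rule_fn *rules, size_t nrules,
                   int max_steps)
{
    assert(max_steps >= 0);
    int steps = 0;
    int changed = 1;
    while (changed && steps < max_steps) {
        changed = 0;
        for (size_t i = 0; i < nrules && steps < max_steps; i++) {
            if (rules[i](n)) {
                changed = 1;
                steps++;
            }
        }
    }
    return steps;
}
```

With only rule A enabled, `{CMP_LT, 10}` settles after one rewrite as `x <= 9`; with both rules enabled the loop hits the cap, and with an even cap the node ends up back at `x < 10`.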
Comment 14 Dimitry Andric freebsd_committer freebsd_triage 2024-01-29 22:09:59 UTC
Ah, I was misled a bit. 14.0-RELEASE-p3 has clang 16.0.6, so it will indeed have this bug, as https://github.com/llvm/llvm-project/commit/56e60bc5bbfb (aka llvmorg-17-init-17782-g56e60bc5bbfb) was well after 16.0 released. It did make it into 17.0.0 though.

So maybe a workaround would be to use devel/llvm17 for this port, at least on 13.x and 14.x? And maybe only on arm64, since it does not seem to trigger the problem when targeting x86?
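The workaround suggested above boils down to a simple condition. As a hedged sketch in C (in the port itself this would be expressed in the port's Makefile, not in C), assuming the upstream fix first shipped in clang 17 and that only arm64 targets hit the hang; `needs_llvm17_workaround()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: should this build use the newer LLVM 17 toolchain
 * instead of the base system clang? Assumes the fix first shipped in
 * clang 17 and that only arm64 (aarch64) targets are affected.
 */
static bool needs_llvm17_workaround(const char *target_arch, int base_clang_major)
{
    assert(target_arch != NULL);
    bool is_arm64 = strcmp(target_arch, "aarch64") == 0 ||
                    strcmp(target_arch, "arm64") == 0;
    return is_arm64 && base_clang_major < 17;
}
```

Keying on the base clang major rather than on 13.x/14.x directly means the condition stops firing on its own once a branch's base system moves to clang 17.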