Summary: | graphics/mesa-dri: Segmentation fault occurs while executing atexit handlers (affects lang/clover: Segmentation fault in OpenCL programs) | |
---|---|---|---
Product: | Ports & Packages | Reporter: | shamaz.mazum
Component: | Individual Port(s) | Assignee: | freebsd-x11 (Nobody) <x11>
Status: | In Progress | |
Severity: | Affects Some People | CC: | crahman, dumbbell, grahamperrin, jbeich, kle, lhersch, manu, swills, val, vedran
Priority: | --- | Keywords: | crash, needs-qa
Version: | Latest | Flags: | bugzilla: maintainer-feedback? (x11)
Hardware: | Any | |
OS: | Any | |
See Also: | https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=225415 https://bugs.freedesktop.org/show_bug.cgi?id=91869 | |
Thank you for the report and patch. You mention "the program will crash at exit"; could you detail which programs crash? Are they programs from ports/packages? Are they custom programs for which you're unable to provide additional information, cores, or backtraces? It may also be worth reporting this issue upstream, as they may have a much better chance of reproducing or isolating it, particularly if debug output or traces cannot be provided.
> could you detail which programs crash? Are they programs from ports/packages?
I mean ALL programs using the clover OpenCL provider, including devel/clinfo. One important thing I forgot to mention is that you must have a suitable GPU (an AMD Radeon HD or RX series card will do) and either the drm-kmod port installed or the old radeonkms module loaded (the latter works only for old HD cards).
Try to launch clinfo and you will see something like this:
Number of platforms 1
Platform Name Clover
Platform Vendor Mesa
Platform Version OpenCL 1.1 Mesa 18.3.2
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_icd
Platform Extensions function suffix MESA
Platform Name Clover
Number of devices 1
Device Name Radeon RX 580 Series (POLARIS10, DRM 3.23.0, 12.0-RELEASE-p10, LLVM 8.0.1)
Device Vendor AMD
Device Vendor ID 0x1002
Device Version OpenCL 1.1 Mesa 18.3.2
Driver Version 18.3.2
Device OpenCL C Version OpenCL C 1.1
Device Type GPU
Device Profile FULL_PROFILE
Max compute units 36
Max clock frequency 1411MHz
Max work item dimensions 3
Max work item sizes 256x256x256
Max work group size 256
Preferred work group size multiple 64
Preferred / native vector sizes
char 16 / 16
short 8 / 8
int 4 / 4
long 2 / 2
half 8 / 8 (cl_khr_fp16)
float 4 / 4
double 2 / 2 (cl_khr_fp64)
Half-precision Floating-point support (cl_khr_fp16)
Denormals No
Infinity and NANs Yes
Round to nearest Yes
Round to zero No
Round to infinity No
IEEE754-2008 fused multiply-add No
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Single-precision Floating-point support (core)
Denormals No
Infinity and NANs Yes
Round to nearest Yes
Round to zero No
Round to infinity No
IEEE754-2008 fused multiply-add No
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Address bits 64, Little-Endian
Global memory size 4294967296 (4GiB)
Error Correction support No
Max memory allocation 3435973836 (3.2GiB)
Unified memory for Host and Device No
Minimum alignment for any data type 128 bytes
Alignment of base address 32768 bits (4096 bytes)
Global Memory cache type None
Image support No
Local memory type Local
Local memory size 32768 (32KiB)
Max constant buffer size 2147483647 (2GiB)
Max number of constant args 16
Max size of kernel argument 1024
Queue properties
Out-of-order execution No
Profiling Yes
Profiling timer resolution 0ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels No
Device Available Yes
Compiler Available Yes
Device Extensions cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp64 cl_khr_fp16
NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) Clover
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) Success [MESA]
clCreateContext(NULL, ...) [default] Success [MESA]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) Success (1)
Platform Name Clover
Device Name Radeon RX 580 Series (POLARIS10, DRM 3.23.0, 12.0-RELEASE-p10, LLVM 8.0.1)
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) Success (1)
Platform Name Clover
Device Name Radeon RX 580 Series (POLARIS10, DRM 3.23.0, 12.0-RELEASE-p10, LLVM 8.0.1)
ICD loader properties
ICD loader Name OpenCL ICD Loader
ICD loader Vendor OCL Icd free software
ICD loader Version 2.2.12
ICD loader Profile OpenCL 2.2
NOTE: your OpenCL library declares to support OpenCL 2.2,
but it seems to support up to OpenCL 2.1 only.
Segmentation fault (core dumped)
I think other FreeBSD+OpenCL+clover users may be able to confirm this. My guess is that this is FreeBSD-specific, because otherwise it would have been noticed (I may be wrong, but I think there are not many OpenCL users on FreeBSD).
If you can obtain a backtrace of the crash (as an attachment), that might be handy.

This is a backtrace (it's short, so there is no need for an attachment).
(lldb) bt
* thread #1, name = 'clinfo', stop reason = signal SIGSEGV
  * frame #0: 0x0000000806ce81d0
    frame #1: 0x0000000800a030c5 libc.so.7`__cxa_finalize(dso=0x0000000000000000) at atexit.c:239
    frame #2: 0x0000000800992cc1 libc.so.7`exit(status=0) at exit.c:74
    frame #3: 0x0000000000401526 clinfo`___lldb_unnamed_symbol1$$clinfo + 390

Nothing useful to add here (sorry) other than that this is reproducible with FreeBSD 14.0-CURRENT. Background to my test: <https://forums.freebsd.org/threads/78825/>
----
% uname -KrU
14.0-CURRENT 1400021 1400021
% sudo pkg install -q -y clover
grahamperrin's password:
=====
Message from clover-20.2.3:
--
===> NOTICE:
This port is deprecated; you may wish to reconsider installing it:
Uses EOL Python 2.7 via devel/libclc.
It is scheduled to be removed on or after 2021-06-23.
% clinfo > /dev/null
Segmentation fault (core dumped)
% gdb attach clinfo.core
GNU gdb (GDB) 10.2 [GDB v10.2 for FreeBSD]
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd14.0".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
attach: No such file or directory.
[New LWP 107531]
[New LWP 130945]
[New LWP 130946]
[New LWP 130947]
[New LWP 130948]
Core was generated by `clinfo'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00000008107d5f00 in ?? ()
[Current thread is 1 (LWP 107531)]
(gdb) bt
#0 0x00000008107d5f00 in ?? ()
#1 0x00000008003e0524 in ?? ()
#2 0x00007fffffffe388 in ?? ()
#3 0x0000000800b35a00 in ?? ()
#4 0x0000000000000000 in ?? ()
(gdb) q
% sudo pkg delete -q -y clover libclc && sudo pkg clean -q -y
%

With FreeBSD 12.4, mesa-dri-22.3.3_2, clover-22.3.3_1 and clinfo-3.0.21.02.21 the problem still exists. However, the above (slightly customized) patch does not help me with the mentioned versions.

The patch works after all. I had not noticed that clover is a slave port of mesa-dri and thus did not rebuild it after patching.

attachment 207731 [details] doesn't help here on an Intel iGPU (see ports d8990eff958b). Rusticl isn't affected despite using the same compute support in Gallium.
$ pkg install mesa-devel clinfo
$ IRIS_ENABLE_CLOVER=1 clinfo >/dev/null 2>&1
Segmentation fault
(lldb) bt
* thread #1, name = 'clinfo', stop reason = signal SIGSEGV: invalid address (fault address: 0x840e27a10)
  * frame #0: 0x0000000840e27a10
    frame #1: 0x000000082348e55e libc.so.7`__cxa_finalize + 366
    frame #2: 0x000000082348eae1 libc.so.7`exit + 33
    frame #3: 0x00000000002098b7 clinfo`_start(ap=<unavailable>, cleanup=<unavailable>) at crt1_c.c:75:2

While trying to use OpenCL with an RX5700XT on 14-CURRENT, this problem occurred. Sure enough, the patch allows clinfo to run to completion. The patch is a little different in mesa-22.3.7, and I've attached the modified patch (for /usr/ports/graphics/mesa-dri/files; be sure to rebuild all the associated port components).

Created attachment 241985 [details]
Constructor/destructor workaround (mesa-22.3.7+)
Updated patch for more recent versions of mesa.
I'm pretty sure that I saw this issue when trying to run piglit a while ago. That being said, I'm not comfortable patching our tree without having upstream in the loop first. Can you open an MR upstream, please?

I believe I am seeing this issue. I will check if it happens on Linux as well.

Doesn't happen on Linux 6.10.8 with Mesa 24.2.1 and LLVM 18.1.8. Any ideas why this could be FreeBSD-specific?

Hi all, as of 2025, here is some additional information about clover on Linux.

With Mesa 25.0 and LLVM 19, clover seems to perform quite well with a Radeon RX 580 GPU (mentioned by the OP), which is GCN4 hardware.
https://gitlab.freedesktop.org/mesa/mesa/-/issues/12332#note_2712711
https://gitlab.freedesktop.org/mesa/mesa/-/issues/12404

However, clover regularly suffers breakages, mostly because of new or removed LLVM features. This seems to be the case again for LLVM 20.x and 21.x.
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33805#note_2802600

Finally, clover is in general not operational on GCN5 and newer Radeon hardware, including the RDNA series. Most likely this is because something is not set up properly on the LLVM side.
https://gitlab.freedesktop.org/mesa/mesa/-/issues/4189
So it is currently not possible to use clover on a Radeon RX5700XT graphics card. Rusticl exists as an alternative, which should also be an option on BSD. ;-)

Note that this information relates predominantly to GCN Radeon hardware and the radeonsi driver. In contrast, older TeraScale Radeon hardware and the r600 Mesa driver have additional problems, most likely because of some TeraScale-specific LLVM flaws.

For general awareness: according to recent information, clover is regarded as deprecated as of March 2025, so it looks like it will be removed in Mesa 25.1. Until recently, the consensus was to postpone the deletion of clover until rusticl is also available for the r600 Mesa driver. But it looks like this has now changed; further information can be found in the corresponding Mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19385
Created attachment 207731 [details]
Constructor/destructor workaround
Hello. This is a long-standing bug (at least for me), but I've only now decided to report it. When you run an OpenCL program that uses lang/clover, the program crashes at exit. I've tracked the problem to the file src/util/u_queue.c in the Mesa sources. The segmentation fault occurs while executing atexit handlers (one such handler is added in global_init()). BTW, there is nothing wrong with the atexit_handler() function itself: it can be empty and the result will be the same (segmentation fault). I tried to reproduce this behavior in a test program without success, so I have no idea what is causing the bug. Meanwhile, I wrote a small workaround using constructor and destructor functions (supported by both clang and gcc). This works just fine without any errors.
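For readers who have not opened the attachment, the general shape of a constructor/destructor workaround can be sketched as follows. This is only an illustration of the technique, not the attached patch and not Mesa's code: every `demo_*` name is hypothetical, and the real global_init() and atexit_handler() in src/util/u_queue.c manage different state. The sketch assumes clang or gcc, which both support the `constructor` and `destructor` function attributes.

```c
#include <assert.h>

/* Hypothetical stand-ins for library-wide state (not Mesa's real data). */
static int demo_initialized = 0;
static int demo_live_queues = 0;

/* Hypothetical cleanup routine; it plays the role an atexit handler
 * would otherwise play. */
static void demo_cleanup(void)
{
    demo_live_queues = 0;
    demo_initialized = 0;
}

/* Runs when the object containing it is loaded, before main(). */
__attribute__((constructor))
static void demo_global_init(void)
{
    demo_initialized = 1;
}

/* Runs when the object is unloaded or the process exits normally.
 * Nothing is registered through atexit() here. */
__attribute__((destructor))
static void demo_global_fini(void)
{
    demo_cleanup();
}

/* Reports whether the constructor has already run. */
int demo_is_initialized(void)
{
    return demo_initialized;
}

/* Pretends to create a queue; returns the number of live queues. */
int demo_add_queue(void)
{
    assert(demo_initialized); /* the constructor must have run first */
    return ++demo_live_queues;
}
```

The design point is that setup and teardown are tied to the loading and unloading of the object that contains them, rather than to a handler registered at run time with atexit(). Whether that difference explains the crash is not established in this report.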