Bug 240761 - graphics/mesa-dri: Segmentation fault occurs while executing atexit handlers (affects lang/clover: Segmentation fault in OpenCL programs)
Status: In Progress
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: Any Any
Importance: --- Affects Some People
Assignee: freebsd-x11 (Nobody)
URL:
Keywords: crash, needs-qa
Depends on:
Blocks:
 
Reported: 2019-09-23 07:26 UTC by shamaz.mazum
Modified: 2024-10-27 01:08 UTC

See Also:
bugzilla: maintainer-feedback? (x11)


Attachments
Constructor/destructor workaround (882 bytes, patch)
2019-09-23 07:26 UTC, shamaz.mazum
Constructor/destructor workaround (mesa-22.3.7+) (956 bytes, patch)
2023-05-05 05:03 UTC, crahman

Description shamaz.mazum 2019-09-23 07:26:26 UTC
Created attachment 207731
Constructor/destructor workaround

Hello. This is a long-standing bug (at least for me), but I decided to report it only now. Any OpenCL program that uses lang/clover crashes at exit.

I've tracked the problem to the file src/util/u_queue.c in the Mesa sources. The segmentation fault occurs while the atexit handlers are being executed (one such handler is registered in global_init()). Incidentally, there is nothing wrong with the atexit_handler() function itself: it can be empty and the result is the same (segmentation fault).
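
For readers unfamiliar with the pattern, here is a minimal sketch of one-time registration of an exit handler, as described above. It is not the Mesa source: the names queue_init_sketch, global_init_sketch, exit_handler_sketch and the two counters are hypothetical, and a plain flag stands in for whatever once-mechanism the real code uses.

```c
#include <stdlib.h>

/* Hypothetical names throughout; this is not the Mesa source. */
static int init_done = 0;           /* guard so registration happens once */
static int handlers_registered = 0; /* how many times atexit() succeeded */

/* Deliberately empty: per the report, the handler body is not what crashes. */
static void exit_handler_sketch(void)
{
}

/* One-time setup that registers the exit handler with libc. */
static void global_init_sketch(void)
{
    /* atexit() returns 0 on success. */
    if (atexit(exit_handler_sketch) == 0)
        handlers_registered++;
}

/* Meant to be called on every queue creation; only the first call registers.
 * The plain flag is for brevity and is not thread-safe; real code would
 * use call_once() or pthread_once(). */
int queue_init_sketch(void)
{
    if (!init_done) {
        init_done = 1;
        global_init_sketch();
    }
    return handlers_registered;
}
```

The point of the sketch is only that atexit() stores a function pointer in a process-wide list that libc walks during exit(), independently of what the handler does.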

I tried to reproduce this behavior in a test program without success, so I have no idea what is causing the bug.

Meanwhile, I wrote a small workaround using constructor and destructor functions (supported by both clang and gcc). It works without any errors.
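
The idea behind such a workaround can be sketched roughly as follows. This is an illustration only, not the attached patch: the names queue_list_init, queue_list_fini, queue_list_ready and queue_list_is_ready are hypothetical, and it assumes a compiler that supports __attribute__((constructor)) and __attribute__((destructor)), as clang and gcc do. Setup and cleanup are tied to the loading and unloading of the object that contains them, so no atexit() registration is involved.

```c
/* Hypothetical state standing in for a global list such as the one
 * kept in u_queue.c. */
static int queue_list_ready = 0;

/* Runs when the containing object is loaded, before main(). */
__attribute__((constructor))
static void queue_list_init(void)
{
    queue_list_ready = 1;
}

/* Runs when the containing object is unloaded (for example on dlclose())
 * or at normal process exit. */
__attribute__((destructor))
static void queue_list_fini(void)
{
    queue_list_ready = 0;
}

/* Helper so callers can check that the constructor has already run. */
int queue_list_is_ready(void)
{
    return queue_list_ready;
}
```

By the time main() starts, queue_list_is_ready() already returns 1, because the constructor ran during program start-up.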
Comment 1 Kubilay Kocak freebsd_committer freebsd_triage 2019-09-23 08:02:22 UTC
Thank you for the report and patch.

You mention "the program will crash at exit"; could you detail which programs crash? Are they programs from ports/packages? Or are they custom programs for which you're unable to provide additional information, cores, or backtraces?

It may also be worth reporting this issue upstream, as they may have a much better chance of reproducing or isolating it, particularly if debug output or traces cannot be provided.
Comment 2 shamaz.mazum 2019-09-23 08:23:33 UTC
> could you detail which programs crash? Are they programs from ports/packages?

I mean ALL programs that use the clover OpenCL provider, including devel/clinfo. One important thing I forgot to mention: you must have a suitable GPU (an AMD Radeon HD or RX series card will do) and either the drm-kmod port installed or the old radeonkms module loaded (the latter works only for old HD cards).

Try to launch clinfo and you will see something like this:

Number of platforms                               1
  Platform Name                                   Clover
  Platform Vendor                                 Mesa
  Platform Version                                OpenCL 1.1 Mesa 18.3.2
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd
  Platform Extensions function suffix             MESA

  Platform Name                                   Clover
Number of devices                                 1
  Device Name                                     Radeon RX 580 Series (POLARIS10, DRM 3.23.0, 12.0-RELEASE-p10, LLVM 8.0.1)
  Device Vendor                                   AMD
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 1.1 Mesa 18.3.2
  Driver Version                                  18.3.2
  Device OpenCL C Version                         OpenCL C 1.1 
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Max compute units                               36
  Max clock frequency                             1411MHz
  Max work item dimensions                        3
  Max work item sizes                             256x256x256
  Max work group size                             256
  Preferred work group size multiple              64
  Preferred / native vector sizes                 
    char                                                16 / 16      
    short                                                8 / 8       
    int                                                  4 / 4       
    long                                                 2 / 2       
    half                                                 8 / 8        (cl_khr_fp16)
    float                                                4 / 4       
    double                                               2 / 2        (cl_khr_fp64)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Address bits                                    64, Little-Endian
  Global memory size                              4294967296 (4GiB)
  Error Correction support                        No
  Max memory allocation                           3435973836 (3.2GiB)
  Unified memory for Host and Device              No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       32768 bits (4096 bytes)
  Global Memory cache type                        None
  Image support                                   No
  Local memory type                               Local
  Local memory size                               32768 (32KiB)
  Max constant buffer size                        2147483647 (2GiB)
  Max number of constant args                     16
  Max size of kernel argument                     1024
  Queue properties                                
    Out-of-order execution                        No
    Profiling                                     Yes
  Profiling timer resolution                      0ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
  Device Available                                Yes
  Compiler Available                              Yes
  Device Extensions                               cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp64 cl_khr_fp16

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  Clover
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [MESA]
  clCreateContext(NULL, ...) [default]            Success [MESA]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 Clover
    Device Name                                   Radeon RX 580 Series (POLARIS10, DRM 3.23.0, 12.0-RELEASE-p10, LLVM 8.0.1)
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 Clover
    Device Name                                   Radeon RX 580 Series (POLARIS10, DRM 3.23.0, 12.0-RELEASE-p10, LLVM 8.0.1)

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.2.12
  ICD loader Profile                              OpenCL 2.2
	NOTE:	your OpenCL library declares to support OpenCL 2.2,
		but it seems to support up to OpenCL 2.1 only.

Segmentation fault (core dumped)

I think other FreeBSD+OpenCL+clover users may be able to confirm this. My guess is that this is FreeBSD-specific, because otherwise it would have been noticed already (I may be wrong, but I think there are not many OpenCL users on FreeBSD).
Comment 3 Kubilay Kocak freebsd_committer freebsd_triage 2019-09-25 02:50:58 UTC
If you can obtain a backtrace of the crash (as an attachment), that might be handy
Comment 4 shamaz.mazum 2019-09-25 14:49:45 UTC
This is a backtrace (it's short, so no attachment is needed).

(lldb) bt
* thread #1, name = 'clinfo', stop reason = signal SIGSEGV
  * frame #0: 0x0000000806ce81d0
    frame #1: 0x0000000800a030c5 libc.so.7`__cxa_finalize(dso=0x0000000000000000) at atexit.c:239
    frame #2: 0x0000000800992cc1 libc.so.7`exit(status=0) at exit.c:74
    frame #3: 0x0000000000401526 clinfo`___lldb_unnamed_symbol1$$clinfo + 390
Comment 5 Graham Perrin freebsd_committer freebsd_triage 2021-06-13 00:20:35 UTC
Nothing useful to add here (sorry), other than that this is reproducible with FreeBSD 14.0-CURRENT.

Background to my test: <https://forums.freebsd.org/threads/78825/>

----

% uname -KrU
14.0-CURRENT 1400021 1400021
% sudo pkg install -q -y clover
grahamperrin's password:
=====
Message from clover-20.2.3:

--
===>   NOTICE:

This port is deprecated; you may wish to reconsider installing it:

Uses EOL Python 2.7 via devel/libclc.

It is scheduled to be removed on or after 2021-06-23.
% clinfo > /dev/null
Segmentation fault (core dumped)
% gdb attach clinfo.core
GNU gdb (GDB) 10.2 [GDB v10.2 for FreeBSD]
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd14.0".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
attach: No such file or directory.
[New LWP 107531]
[New LWP 130945]
[New LWP 130946]
[New LWP 130947]
[New LWP 130948]
Core was generated by `clinfo'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00000008107d5f00 in ?? ()
[Current thread is 1 (LWP 107531)]
(gdb) bt
#0  0x00000008107d5f00 in ?? ()
#1  0x00000008003e0524 in ?? ()
#2  0x00007fffffffe388 in ?? ()
#3  0x0000000800b35a00 in ?? ()
#4  0x0000000000000000 in ?? ()
(gdb) q
% sudo pkg delete -q -y clover libclc && sudo pkg clean -q -y
%
Comment 6 Lars Herschke 2023-02-02 18:01:00 UTC
With FreeBSD 12.4, mesa-dri-22.3.3_2, clover-22.3.3_1 and clinfo-3.0.21.02.21 the problem still exists.

However, the above (slightly customized) patch does not help me with the mentioned versions.
Comment 7 Lars Herschke 2023-02-03 10:31:02 UTC
The patch works after all. I had not noticed that clover is a slave port of mesa-dri and thus did not rebuild it after patching.
Comment 8 Jan Beich freebsd_committer freebsd_triage 2023-03-11 19:29:45 UTC
Attachment 207731 doesn't help here on an Intel iGPU (see ports d8990eff958b). Rusticl isn't affected despite using the same compute support in Gallium.

$ pkg install mesa-devel clinfo
$ IRIS_ENABLE_CLOVER=1 clinfo >/dev/null 2>&1
Segmentation fault
(lldb) bt
* thread #1, name = 'clinfo', stop reason = signal SIGSEGV: invalid address (fault address: 0x840e27a10)
  * frame #0: 0x0000000840e27a10
    frame #1: 0x000000082348e55e libc.so.7`__cxa_finalize + 366
    frame #2: 0x000000082348eae1 libc.so.7`exit + 33
    frame #3: 0x00000000002098b7 clinfo`_start(ap=<unavailable>, cleanup=<unavailable>) at crt1_c.c:75:2
Comment 9 crahman 2023-05-05 05:01:05 UTC
While trying to use OpenCL with an RX5700XT on 14-CURRENT, this problem occurred.

Sure enough, the patch allows clinfo to run to completion.

The patch is a little different in mesa-22.3.7, and I've attached the modified patch (for /usr/ports/graphics/mesa-dri/files; be sure to rebuild all the associated port components).
Comment 10 crahman 2023-05-05 05:03:34 UTC
Created attachment 241985
Constructor/destructor workaround (mesa-22.3.7+)

Updated patch for more recent versions of mesa.
Comment 11 Emmanuel Vadot freebsd_committer freebsd_triage 2023-11-22 10:49:49 UTC
I'm pretty sure that I saw this issue when trying to run piglit a while ago.
That being said, I'm not comfortable patching our tree without having upstream in the loop first. Can you open an MR upstream, please?
Comment 12 Vedran Miletic 2024-09-05 07:01:23 UTC
I believe I am seeing this issue. I will check if it happens on Linux as well.
Comment 13 Vedran Miletic 2024-09-06 18:22:50 UTC
Doesn't happen on Linux 6.10.8 with Mesa 24.2.1 and LLVM 18.1.8. Any ideas why this could be FreeBSD-specific?