| Summary: | After upgrading to 12.2-BETA3 and mesa-dri-20.2.0_1 OpenCL (clover) stopped working | | |
|---|---|---|---|
| Product: | Ports & Packages | Reporter: | shamaz.mazum |
| Component: | Individual Port(s) | Assignee: | freebsd-x11 (Nobody) <x11> |
| Status: | Closed FIXED | | |
| Severity: | Affects Only Me | CC: | lwhsu, tjlegg, zeising |
| Priority: | --- | | |
| Version: | Latest | | |
| Hardware: | Any | | |
| OS: | Any | | |

Description
shamaz.mazum
2020-10-22 07:48:46 UTC
Have you tried rebuilding drm-fbsd12.0-kmod locally?

Do you have clinfo installed? If so, can you provide the output? My output:

$ clinfo
MESA-LOADER: failed to retrieve device information
MESA-LOADER: failed to retrieve device information
MESA-LOADER: failed to retrieve device information
MESA-LOADER: failed to retrieve device information
Segmentation fault (core dumped)

It used to locate the AMD processor prior to the latest libdrm update.

(In reply to Niclas Zeising from comment #1)
> Have you tried rebuilding drm-fbsd12.0-kmod locally?

Yes, I rebuilt all modules which come from ports after upgrading to 12.2

(In reply to tjlegg from comment #2)
> If so, can you provide the output?

Sure, all seems to be working:

Number of platforms 1
  Platform Name Clover
  Platform Vendor Mesa
  Platform Version OpenCL 1.1 Mesa 20.2.0
  Platform Profile FULL_PROFILE
  Platform Extensions cl_khr_icd
  Platform Extensions function suffix MESA

  Platform Name Clover
Number of devices 1
  Device Name Radeon RX 580 Series (POLARIS10, DRM 3.23.0, 12.2-BETA3, LLVM 10.0.1)
  Device Vendor AMD
  Device Vendor ID 0x1002
  Device Version OpenCL 1.1 Mesa 20.2.0
  Driver Version 20.2.0
  Device OpenCL C Version OpenCL C 1.1
  Device Type GPU
  Device Profile FULL_PROFILE
  Device Available Yes
  Compiler Available Yes
  Max compute units 36
  Max clock frequency 1411MHz
  Max work item dimensions 3
  Max work item sizes 256x256x256
  Max work group size 256
  Preferred work group size multiple 64
  Preferred / native vector sizes
    char 16 / 16
    short 8 / 8
    int 4 / 4
    long 2 / 2
    half 0 / 0 (n/a)
    float 4 / 4
    double 2 / 2 (cl_khr_fp64)
  Half-precision Floating-point support (n/a)
  Single-precision Floating-point support (core)
    Denormals No
    Infinity and NANs Yes
    Round to nearest Yes
    Round to zero No
    Round to infinity No
    IEEE754-2008 fused multiply-add No
    Support is emulated in software No
    Correctly-rounded divide and sqrt operations No
  Double-precision Floating-point support (cl_khr_fp64)
    Denormals Yes
    Infinity and NANs Yes
    Round to nearest Yes
    Round to zero Yes
    Round to infinity Yes
    IEEE754-2008 fused multiply-add Yes
    Support is emulated in software No
  Address bits 64, Little-Endian
  Global memory size 4294967296 (4GiB)
  Error Correction support No
  Max memory allocation 3435973836 (3.2GiB)
  Unified memory for Host and Device No
  Minimum alignment for any data type 128 bytes
  Alignment of base address 32768 bits (4096 bytes)
  Global Memory cache type None
  Image support No
  Local memory type Local
  Local memory size 32768 (32KiB)
  Max number of constant args 16
  Max constant buffer size 2147483392 (2GiB)
  Max size of kernel argument 1024
  Queue properties
    Out-of-order execution No
    Profiling Yes
  Profiling timer resolution 0ns
  Execution capabilities
    Run OpenCL kernels Yes
    Run native kernels No
  Device Extensions cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp64

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) Clover
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) Success [MESA]
  clCreateContext(NULL, ...) [default] Success [MESA]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT) Success (1)
    Platform Name Clover
    Device Name Radeon RX 580 Series (POLARIS10, DRM 3.23.0, 12.2-BETA3, LLVM 10.0.1)
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) Success (1)
    Platform Name Clover
    Device Name Radeon RX 580 Series (POLARIS10, DRM 3.23.0, 12.2-BETA3, LLVM 10.0.1)
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) Success (1)
    Platform Name Clover
    Device Name Radeon RX 580 Series (POLARIS10, DRM 3.23.0, 12.2-BETA3, LLVM 10.0.1)

ICD loader properties
  ICD loader Name OpenCL ICD Loader
  ICD loader Vendor OCL Icd free software
  ICD loader Version 2.2.12
  ICD loader Profile OpenCL 2.2
  NOTE: your OpenCL library declares to support OpenCL 2.2,
        but it seems to support up to OpenCL 1.0 only.
  NOTE: your OpenCL library only supports OpenCL 1.0,
        but some installed platforms support OpenCL 1.1.
        Programs using 1.1 features may crash
        or behave unexpectedly

Turns out, FreeBSD version is irrelevant. I've downgraded mesa-dri/mesa-libs/clover to 19.0.8_9 and everything is working again. So the bug is in the new mesa

(In reply to shamaz.mazum from comment #0)
> drmn0: GPU fault detected: 147 0x00004802
> drmn0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
> drmn0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0E048002
> drmn0: VM fault (0x02, vmid 7) at page 0, read from 'TC4' (0x54433400) (72)

Does userland print anything relevant? Can you compare "clinfo" output between Mesa versions? Can you confirm the regression on -CURRENT?

Mesa upstream is poorly tested on old kernel drivers. Stable Linux distros have old Mesa + old kernel but on FreeBSD packages are rolling while GPU kernel drivers are frozen except on -CURRENT. Also, Mesa ports are currently split in a way which is error-prone.

> Does userland print anything relevant?
I checked if OpenCL functions return any errors. No, they do not. Difference in output of clinfo between versions:
--- test1 2020-11-05 08:44:01.248555000 +0300
+++ test2 2020-11-05 08:46:25.847840000 +0300
@@ -1,18 +1,18 @@
Number of platforms 1
Platform Name Clover
Platform Vendor Mesa
- Platform Version OpenCL 1.1 Mesa 19.0.8
+ Platform Version OpenCL 1.1 Mesa 20.2.0
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_icd
Platform Extensions function suffix MESA
Platform Name Clover
Number of devices 1
- Device Name Radeon RX 580 Series (POLARIS10, DRM 3.23.0, 12.2-RELEASE, LLVM 8.0.1)
+ Device Name Radeon RX 580 Series (POLARIS10, DRM 3.23.0, 12.2-RELEASE, LLVM 10.0.1)
Device Vendor AMD
Device Vendor ID 0x1002
- Device Version OpenCL 1.1 Mesa 19.0.8
- Driver Version 19.0.8
+ Device Version OpenCL 1.1 Mesa 20.2.0
+ Driver Version 20.2.0
Device OpenCL C Version OpenCL C 1.1
Device Type GPU
Device Profile FULL_PROFILE
@@ -29,17 +29,10 @@
short 8 / 8
int 4 / 4
long 2 / 2
- half 8 / 8 (cl_khr_fp16)
+ half 0 / 0 (n/a)
float 4 / 4
double 2 / 2 (cl_khr_fp64)
- Half-precision Floating-point support (cl_khr_fp16)
- Denormals No
- Infinity and NANs Yes
- Round to nearest Yes
- Round to zero No
- Round to infinity No
- IEEE754-2008 fused multiply-add No
- Support is emulated in software No
+ Half-precision Floating-point support (n/a)
Single-precision Floating-point support (core)
Denormals No
Infinity and NANs Yes
@@ -69,7 +62,7 @@
Local memory type Local
Local memory size 32768 (32KiB)
Max number of constant args 16
- Max constant buffer size 2147483647 (2GiB)
+ Max constant buffer size 2147483392 (2GiB)
Max size of kernel argument 1024
Queue properties
Out-of-order execution No
@@ -78,33 +71,4 @@
Execution capabilities
Run OpenCL kernels Yes
Run native kernels No
- Device Extensions cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp64 cl_khr_fp16
-
-NULL platform behavior
- clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) Clover
- clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) Success [MESA]
- clCreateContext(NULL, ...) [default] Success [MESA]
- clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT) Success (1)
- Platform Name Clover
- Device Name Radeon RX 580 Series (POLARIS10, DRM 3.23.0, 12.2-RELEASE, LLVM 8.0.1)
- clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform
- clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) Success (1)
- Platform Name Clover
- Device Name Radeon RX 580 Series (POLARIS10, DRM 3.23.0, 12.2-RELEASE, LLVM 8.0.1)
- clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
- clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform
- clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) Success (1)
- Platform Name Clover
- Device Name Radeon RX 580 Series (POLARIS10, DRM 3.23.0, 12.2-RELEASE, LLVM 8.0.1)
-
-ICD loader properties
- ICD loader Name OpenCL ICD Loader
- ICD loader Vendor OCL Icd free software
- ICD loader Version 2.2.12
- ICD loader Profile OpenCL 2.2
- NOTE: your OpenCL library declares to support OpenCL 2.2,
- but it seems to support up to OpenCL 1.0 only.
- NOTE: your OpenCL library only supports OpenCL 1.0,
- but some installed platforms support OpenCL 1.1.
- Programs using 1.1 features may crash
- or behave unexpectedly
+ Device Extensions cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp64
Tried to replace clCreateCommandQueue() with modern clCreateCommandQueueWithProperties() without any success.
I am reluctant to upgrade to CURRENT, but I can try to cherry-pick related linux KPI commits on top of releng/12.2 if you can say which commits are relevant.
Btw, upgrading mesa-libs and mesa-dri, but keeping the old version of clover seems to be an option.
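
For readers who want the extension change in the clinfo diff above at a glance, here is a minimal sketch that compares the "Device Extensions" lines of two clinfo outputs. The two strings are copied from the diff in this report; the helper name `device_extensions` is hypothetical and not part of clinfo or Mesa, and the parsing assumes the extensions appear on a single line, as they do here.

```python
# Sketch: list OpenCL device extensions that differ between two clinfo outputs.
# The sample lines below are taken verbatim from the diff in this report.

MESA_19_0_8 = (
    "Device Extensions cl_khr_byte_addressable_store "
    "cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics "
    "cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics "
    "cl_khr_int64_base_atomics cl_khr_int64_extended_atomics "
    "cl_khr_fp64 cl_khr_fp16"
)

MESA_20_2_0 = (
    "Device Extensions cl_khr_byte_addressable_store "
    "cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics "
    "cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics "
    "cl_khr_int64_base_atomics cl_khr_int64_extended_atomics "
    "cl_khr_fp64"
)


def device_extensions(clinfo_text):
    """Return the extension names found on the 'Device Extensions' line.

    Hypothetical helper: assumes one device and a single-line extension list.
    """
    prefix = "Device Extensions"
    for line in clinfo_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(prefix):
            return set(stripped[len(prefix):].split())
    return set()


old_ext = device_extensions(MESA_19_0_8)
new_ext = device_extensions(MESA_20_2_0)

removed = old_ext - new_ext  # present in 19.0.8, missing in 20.2.0
added = new_ext - old_ext    # present in 20.2.0 only

print("removed:", sorted(removed))
print("added:", sorted(added))
```

With the two lines above, the only extension that disappears is cl_khr_fp16, which matches the "half 0 / 0 (n/a)" change in the diff. To run it on saved outputs, the strings could be replaced with the contents of the files named test1 and test2 in the diff header.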
(In reply to shamaz.mazum from comment #6)
> - Device Name Radeon RX 580 Series (POLARIS10, DRM 3.23.0, 12.2-RELEASE, LLVM 8.0.1)
> + Device Name Radeon RX 580 Series (POLARIS10, DRM 3.23.0, 12.2-RELEASE, LLVM 10.0.1)

Can you try forcing LLVM_DEFAULT=80 in graphics/mesa-dri/Makefile.common then rebuilding lang/clover? I suspect it won't help but who knows.

> I am reluctant to upgrade to CURRENT

Try upgrading only kernel (keep COMPAT_FREEBSD12 from GENERIC) but leave userland (base and ports) intact, so it's easy to go back even without bectl(8).

$ make kernel-toolchain
$ make buildkernel
$ make installkernel INSTKERNNAME=kernel.current
$ nextboot -k kernel.current

> I can try to cherry-pick related linux KPI commits on top of releng/12.2
> if you can say which commits are relevant.

Each X.Y -> X.Y+1 update of GPU drivers spans *hundreds* of commits. And I've never had an AMD GPU, so can't make a guess. Besides, it'd be a waste of time looking through kernel commits if the issue can be reproduced on -CURRENT.

Forgot to mention, building drm-current-kmod or drm-devel-kmod for non-default -CURRENT kernel would require adjusting some variables e.g.,

$ make install OSVERSION=9999999 KMODDIR=/boot/kernel.current SRC_BASE=/path/to/current/source/if/not/usr/src

> Try upgrading only kernel (keep COMPAT_FREEBSD12 from GENERIC) but leave userland (base and ports) intact, so it's easy to go back even without bectl(8).
drm-current-kmod fails to build:
--- amdgpu_dm_pp_smu.o ---
/usr/ports/graphics/drm-devel-kmod/work/drm-kmod-drm_v5.4.62_3/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c:948:48: error: incompatible pointer types assigning to 'enum pp_smu_status (*)(struct pp_smu *, bool)' (aka 'enum pp_smu_status (*)(struct pp_smu *, _Bool)') from 'enum pp_smu_status (struct pp_smu *, BOOLEAN)' (aka 'enum pp_smu_status (struct pp_smu *, unsigned char)') [-Werror,-Wincompatible-pointer-types]
funcs->nv_funcs.set_pstate_handshake_support = pp_nv_set_pstate_handshake_support;
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
*** [amdgpu_dm_pp_smu.o] Error code 1
I think I must upgrade /usr/include/sys too, which I do not want to do ;) Unless there is another way
> drm-current-kmod fails to build:
Replacing BOOLEAN with bool is enough. The bug is still there with drm-current-kmod and llvm80.
It works again with mesa 20.2.3! PR can be closed