Bug 214204

Summary: lang/clover: Unable to run OpenCL programs while X11 session is active
Product: Ports & Packages
Component: Individual Port(s)
Version: Latest
Hardware: amd64
OS: Any
Status: New
Severity: Affects Only Me
Priority: ---
Reporter: Alexey Dokuchaev <danfe>
Assignee: freebsd-x11 (Nobody) <x11>
CC: greg, jbeich, w.schwarzenfeld
Flags: bugzilla: maintainer-feedback? (x11)

Description Alexey Dokuchaev freebsd_committer 2016-11-04 10:26:07 UTC
I decided to play with OpenCL a bit on my AMD A8-5550M APU laptop, so I read our wiki [1], installed the necessary ports, googled for the source code of a simple OpenCL program [2], built it, and ran it.

It ran OK, but only when I switched from my X11 session to the console, or when I ran it as root.  Inside X, it reports:

> radeon: Failed to get PCI ID, error number -13
> Using platform: Clover
>  No devices found. Check OpenCL installation!
devel/clinfo behaves the same way (finds no devices) and prints the same error.  My user belongs to the "video" group, and the permissions on /dev/dri/* are sane (the default, "rw-" for the "video" group).
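
The group and permission check described above can be scripted. The following is a minimal sketch, not part of the original report: the helper names describe_node and current_groups are hypothetical, and it only inspects file modes and group membership. A passing result does not rule out the failure reported here, since the permissions already looked sane when the error occurred.

```python
import glob
import grp
import os
import stat


def group_name(gid):
    """Map a numeric gid to its name, falling back to the number itself."""
    try:
        return grp.getgrgid(gid).gr_name
    except KeyError:
        return str(gid)


def current_groups():
    """Names of the groups this process belongs to (effective gid included)."""
    gids = set(os.getgroups()) | {os.getegid()}
    return {group_name(gid) for gid in gids}


def describe_node(path):
    """Return (mode string, owning group, readable-and-writable flag) for path."""
    st = os.stat(path)
    # os.access() checks against the real uid/gid of the process.
    usable = os.access(path, os.R_OK | os.W_OK)
    return stat.filemode(st.st_mode), group_name(st.st_gid), usable


if __name__ == "__main__":
    print("groups:", sorted(current_groups()))
    # /dev/dri/* is the path pattern named in the report.
    for node in sorted(glob.glob("/dev/dri/*")):
        print(node, *describe_node(node))
```

Running it once inside the X11 session and once from the console would show whether the file-level view differs between the two cases.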

-13 indicates "permission denied" (-EACCES).  Apparently, a similar (or perhaps related) bug was discovered and fixed in Mesa back in 2012 [3].  The code in current Mesa is different, though.  Perhaps FreeBSD needs special treatment, or the original bug has resurfaced.  I'm not an expert on the Mesa codebase, but I will gladly provide any additional details and am happy to test patches.
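
For reference, the mapping from the negative error number to its symbolic name can be checked with a few lines of Python. This is only an illustrative sketch: describe_errno is a hypothetical helper, and errno values are platform-specific (13 is EACCES on FreeBSD and Linux).

```python
import errno
import os


def describe_errno(code):
    """Translate a (possibly negated) errno value into (name, message)."""
    number = abs(code)  # the radeon message prints the value negated
    name = errno.errorcode.get(number, "UNKNOWN")
    return name, os.strerror(number)


if __name__ == "__main__":
    print(describe_errno(-13))
```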

[1] https://wiki.freebsd.org/Graphics/OpenCL
[2] http://simpleopencl.blogspot.ru/2013/06/tutorial-simple-start-with-opencl-and-c.html
[3] https://lists.freedesktop.org/archives/mesa-commit/2012-July/038068.html
Comment 1 Walter Schwarzenfeld freebsd_triage 2018-01-18 02:07:02 UTC
Is this still relevant?
Comment 2 Alexey Dokuchaev freebsd_committer 2018-01-18 10:14:34 UTC
I believe so. I cannot test it right now on the same hardware, but I do not recall any commits or discussions that might have changed anything in this regard since it was reported.
Comment 3 Jan Beich freebsd_committer 2018-01-19 03:01:55 UTC
(In reply to Alexey Dokuchaev from comment #0)
> radeon: Failed to get PCI ID, error number -13

Find out which DRM ioctl fails and/or try radeonkms from graphics/drm-next-kmod. Maybe sharing a device between r600 and clover requires userptr or is supported only on newer hardware.
Comment 4 Jan Beich freebsd_committer 2018-01-19 04:06:37 UTC
Greg, can you reproduce the bug on amdgpu? For one, i915kms via lang/beignet requires userptr for OpenCL 2.0, but maybe amdgpu/radeonkms via lang/clover requires it for any OpenCL version.
Comment 5 Alexey Dokuchaev freebsd_committer 2018-01-19 06:29:13 UTC
Playing with `graphics/drm-next-kmod' back in May 2017 did not bring me any luck: loading radeonkms immediately locked up the entire system, end of story.

(Not that I was very frustrated by that fact; I'm not really interested in anything next-ish anyway, I prefer to work with in-tree code and preferably of older versions.)
Comment 6 Greg V 2018-01-19 12:26:37 UTC
(In reply to Jan Beich from comment #4)
clover never worked for me on amdgpu: it crashes instantly, regardless of X11 sessions or anything else. I'm not sure it's even supposed to work on amdgpu.

I'll post the stack trace here when I get home.
Comment 7 Greg V 2018-01-19 20:19:09 UTC
Err, actually it hangs in a mutex while creating a pthread o_0: https://gist.github.com/myfreeweb/c27af4790d88c37fb0f9314883bc1d78
Comment 8 Jan Beich freebsd_committer 2018-01-20 00:39:11 UTC
(In reply to Greg V from comment #7)
That looks similar to bug 220767. Try prepending LD_PRELOAD=/lib/libthr.so.3 to the command.
Comment 9 Jan Beich freebsd_committer 2018-01-20 00:42:58 UTC
or rebuild devel/clinfo with LDFLAGS+=-lpthread
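
The LD_PRELOAD workaround can also be applied from a small wrapper script. This is a minimal sketch under stated assumptions: the library path is the one given in comment 8, the helpers preload_env and run_with_libthr are hypothetical names, and entries in LD_PRELOAD are joined with a colon.

```python
import os
import subprocess

LIBTHR = "/lib/libthr.so.3"  # path taken from comment 8


def preload_env(library, base=None):
    """Return a copy of the environment with library first in LD_PRELOAD."""
    env = dict(os.environ if base is None else base)
    existing = env.get("LD_PRELOAD")
    # Keep anything already preloaded, but put our library in front.
    env["LD_PRELOAD"] = f"{library}:{existing}" if existing else library
    return env


def run_with_libthr(argv):
    """Run argv with libthr preloaded; returns the CompletedProcess."""
    return subprocess.run(argv, env=preload_env(LIBTHR), check=False)


# Example (assumes clinfo is installed and on PATH):
# result = run_with_libthr(["clinfo"])
# print(result.returncode)
```

Unlike rebuilding devel/clinfo with the extra linker flag, this leaves the installed binary untouched, which makes it easier to compare behaviour with and without the preload.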
Comment 10 Greg V 2018-01-20 10:33:26 UTC
(In reply to Jan Beich from comment #9)
wow thanks, it works!! It segfaults on exit:

* thread #1, name = 'clinfo', stop reason = signal SIGSEGV
  * frame #0: 0x0000000807356260
    frame #1: 0x0000000800a2b8f5 libc.so.7`__cxa_finalize(dso=0x0000000000000000) at atexit.c:239
    frame #2: 0x00000008009b9051 libc.so.7`exit(status=0) at exit.c:74
    frame #3: 0x0000000000401526 clinfo`___lldb_unnamed_symbol1$$clinfo + 390

but outputs all device information just fine. (Currently running under Weston, not Xorg)