Bug 217886 - xorg fails to init GL on stable/11 because of a devel/libdevq bug(?)
Status: Closed FIXED
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-x11 (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-03-18 11:13 UTC by holindho
Modified: 2017-04-09 08:40 UTC

Attachments
Fix libdevq on stable/11 (754 bytes, text/x-csrc)
2017-03-18 11:13 UTC, holindho
update libdrm to 2.4.76 and drop dependency on libdevq (44.91 KB, patch)
2017-03-30 13:01 UTC, Matthew Rezny
update libdrm to 2.4.76 and drop dependency on libdevq (44.82 KB, patch)
2017-03-30 17:08 UTC, Matthew Rezny

Description holindho 2017-03-18 11:13:58 UTC
Created attachment 180922
Fix libdevq on stable/11

The machine is a MacPro1,1 with an ATI Radeon 5xxx video card. Xorg fails to init GL as follows on FreeBSD stable/11 amd64 with the xorg ati driver:

pci id for fd 6: 0000:0000, driver (null)
EGL_MESA_drm_image required.
pci id for fd 7: 0000:0000, driver (null)

The error means that gbm/EGL cannot init, which causes the rest of the xorg/mesa GL stack to fail. Xorg still works in software renderer mode, though.

The reason seems to be in the (fragile) libdevq and kernel interaction. 

Sysctl gives:
dev.vgapci.0.%location: slot=0 function=0 dbsf=pci0:8:0:0

...which is not supported by libdevq.

Attached a patch to correct libdevq's behaviour.
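
For illustration only, here is a minimal sketch of reading the bus address from the dbsf field of a %location string like the one above. This is not the attached patch and not libdevq's code; parse_dbsf is a hypothetical helper, and it assumes the four dbsf fields are printed in decimal.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not libdevq code): pull domain:bus:slot:function
 * out of a "%location" sysctl string by reading its "dbsf=pciD:B:S:F"
 * field. Assumes the four numbers are decimal.
 * Returns 0 on success, -1 if the field is missing or malformed.
 */
int
parse_dbsf(const char *location, int *domain, int *bus, int *slot, int *func)
{
    const char *p;

    assert(location != NULL);

    /* Locate the dbsf field anywhere in the string. */
    p = strstr(location, "dbsf=pci");
    if (p == NULL)
        return (-1);

    /* All four components must be present. */
    if (sscanf(p, "dbsf=pci%d:%d:%d:%d", domain, bus, slot, func) != 4)
        return (-1);

    return (0);
}
```

Fed the string "slot=0 function=0 dbsf=pci0:8:0:0", this yields 0:8:0:0 without consulting any other sysctl node.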
Comment 1 holindho 2017-03-18 11:31:12 UTC
The second form of the original code fails on my machine because of:

dev.vgapci.0.%parent: pci7

...which then yields a wrong 0:7:0:0 instead of 0:8:0:0.

Since there's a dbsf field, I don't see why it shouldn't be used.
Comment 2 Matthew Rezny freebsd_committer 2017-03-24 23:20:50 UTC
libdevq is on the way out and I can assure you the new solution, implemented directly in libdrm, will not have this problem on your system because I do not use the dev.vgapci node at all. libdrm will get the bus ID from hw.dri nodes, and then use an ioctl on /dev/pci to retrieve the (sub)vendor/product and revision for the device.

I have already had success testing on a few systems.  Although I have already completely replaced use of libdevq, the previous approach was incomplete. There is a bit more work to be done before I declare libdrm ready.
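
As a rough sketch of the first step described above (not the actual libdrm patch), a bus ID string of the form reported by hw.dri.N.busid, e.g. "pci:0000:24:00.0", could be split into its components as follows. The struct pci_addr type and parse_drm_busid function are hypothetical names, and the sketch assumes domain, bus and slot are hexadecimal. The follow-up lookup through /dev/pci is FreeBSD-specific (presumably the PCIOCGETCONF ioctl from pci(4), though that is an assumption) and is left out here.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical container for a PCI address; not a libdrm type. */
struct pci_addr {
    unsigned domain;
    unsigned bus;
    unsigned slot;
    unsigned func;
};

/*
 * Hypothetical helper: parse "pci:DDDD:BB:SS.F" into its parts.
 * Assumes domain, bus and slot are hexadecimal and function is decimal.
 * Returns 0 on success, -1 if the string does not match.
 */
int
parse_drm_busid(const char *busid, struct pci_addr *out)
{
    assert(busid != NULL && out != NULL);

    if (sscanf(busid, "pci:%x:%x:%x.%u",
        &out->domain, &out->bus, &out->slot, &out->func) != 4)
        return (-1);

    return (0);
}
```

The resulting domain/bus/slot/function tuple is what would then be matched against the kernel's PCI device list to obtain the vendor, product and revision.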
Comment 3 holindho 2017-03-25 08:33:18 UTC
(In reply to Matthew Rezny from comment #2)

Good to hear it's getting replaced. Tracing the code from gbm/EGL down to libdevq revealed plenty of duplicated, fragile-looking code that attempts to figure out the device / bus ids.

The root cause of why I'm seeing the problem and many others are not seems to be in the ACPI parser in the kernel. Apple's firmware is probably a little broken and hands out weird PCI bus entries which, for instance, Linux seems to skip as invalid, but FreeBSD takes them in, creating an inconsistent PCI device tree. 

However, the libdevq parsing that the patch addresses still seems broken to me: the first form does not appear valid at all on 11.0 systems (which just fall back to the older parsing method). I can live with my local patch, though, until a better libdrm emerges.
Comment 4 Matthew Rezny freebsd_committer 2017-03-30 13:01:05 UTC
Created attachment 181312
update libdrm to 2.4.76 and drop dependency on libdevq

I completed the work on libdrm earlier this week and it works in all the scenarios I can test. I have attached a patch in case you would like to test before it is committed to ports.
Comment 5 Jan Beich freebsd_committer 2017-03-30 14:30:04 UTC
Comment on attachment 181312
update libdrm to 2.4.76 and drop dependency on libdevq

libdevq removal regresses bug 217585 while libdrm-2.4.76 works fine as is.

$ ls -lL /dev/dri
total 0
crw-rw----  1 root  video  0x25a Mar 30 08:17 card0
crw-rw----  1 root  video  0x29a Mar 30 08:17 controlD64
crw-rw----  1 root  video  0x2da Mar 30 08:17 renderD128

$ LIBGL_DEBUG=verbose glxgears
libGL: OpenDriver: trying /usr/local/lib/dri/i915_dri.so
libGL: Can't open configuration file /home/foo/.drirc: No such file or directory.
libGL: Using DRI2 for screen 0
libGL: Can't open configuration file /home/foo/.drirc: No such file or directory.
Running synchronized to the vertical refresh.  The framerate should be
approximately the same as the monitor refresh rate.
73 frames in 5.1 seconds = 14.434 FPS
^C
Comment 6 Jan Beich freebsd_committer 2017-03-30 14:46:35 UTC
Comment on attachment 181312
update libdrm to 2.4.76 and drop dependency on libdevq

After removing i915kms from kld_list in /etc/rc.conf and letting the intel DDX do the loading, I see DRM_MAJOR == 3.

$ ls -lL /dev/dri
total 0
crw-rw----  1 root  video  0x30d Mar 30 14:46 card0
crw-rw----  1 root  video  0x34d Mar 30 14:46 controlD64
crw-rw----  1 root  video  0x38d Mar 30 14:46 renderD128
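
For reference, a small sketch of how a major number can be read off the raw device numbers shown by ls. It assumes the legacy FreeBSD 11-era encoding (major in bits 8..15, minor in the remaining bits); legacy_major and legacy_minor are hypothetical stand-ins for the system's major()/minor() macros, used so the arithmetic is visible.

```c
/*
 * Hypothetical stand-ins for major()/minor(), assuming the legacy
 * FreeBSD 11-era dev_t layout: major in bits 8..15, minor elsewhere.
 */
unsigned
legacy_major(unsigned long dev)
{
    return ((unsigned)((dev >> 8) & 0xff));
}

unsigned
legacy_minor(unsigned long dev)
{
    return ((unsigned)(dev & 0xffff00ffUL));
}
```

Under that assumption, 0x30d decodes to major 3, while the 0x25a from the earlier listing decodes to major 2, so the value is not stable across boots or load order.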
Comment 7 Nils Beyer 2017-03-30 14:54:07 UTC
My system: FreeBSD 12.0-CURRENT #0 334829e6c(drm-next)-dirty
My card: Radeon RX460

Thanks for the patch - applied it successfully (had to remove four empty previous patch files).

Now "glxinfo" tries to load "amdgpu_dri.so". Because this file doesn't exist on
my system, I simply copied "radeonsi_dri.so" to "amdgpu_dri.so".

Of course, it doesn't work:
--------------------------------------------------------------------------------
libGL error: MESA-LOADER: failed to retrieve device information
libGL error: core dri driver extension not found
libGL error: failed to load driver: amdgpu
libGL error: MESA-LOADER: failed to retrieve device information
libGL error: core dri or dri2 extension not found
libGL error: failed to load driver: amdgpu
--------------------------------------------------------------------------------

sysctl hw.dri
--------------------------------------------------------------------------------
hw.dri.timestamp_precision: 20
hw.dri.vblank_offdelay: 5000
hw.dri.debug: 0
hw.dri.0.modesetting: 1
hw.dri.0.busid: pci:0000:24:00.0
hw.dri.0.vblank: 
crtc ref count    last     enabled inmodeset
  00  00 00003788 00000001 00      00
  01  01 00000000 00000000 00      01
  02  01 00000000 00000000 00      01
  03  01 00000000 00000000 00      01
  04  01 00000000 00000000 00      01
hw.dri.0.bufs: 
hw.dri.0.clients: 
a dev            pid   uid      magic     ioctls
y drm/0        101792     0          0          0

hw.dri.0.vm: 
slot offset             size       type flags address            mtrr

hw.dri.0.name: amdgpu 0x199 pci:0000:24:00.0
--------------------------------------------------------------------------------


I feel we're almost there...
Comment 8 Matthew Rezny freebsd_committer 2017-03-30 15:04:00 UTC
(In reply to Jan Beich (mail not working) from comment #6)

What kernel are you running? DRM_MAJOR has been defined as 0 the entire time libdevq was used. I had to hack libdrm to ignore the value of DRM_MAJOR for drm-next until it was confirmed that it should be 1 there. How do you have 3?
I removed the hack to ignore DRM_MAJOR, which would be the cause of the regression in your case.
Comment 9 Nils Beyer 2017-03-30 15:10:25 UTC
(In reply to myself from comment #7)

Sorry, here is the verbose version of 'glxinfo':

env LIBGL_DEBUG=verbose glxinfo
--------------------------------------------------------------------------------
name of display: :0
libGL: Can't open configuration file /home/nbe/.drirc: No such file or directory.
libGL error: MESA-LOADER: failed to retrieve device information
libGL: using driver amdgpu for 4
libGL: OpenDriver: trying /usr/local/lib/dri/amdgpu_dri.so
libGL: driver does not expose __driDriverGetExtensions_amdgpu(): Undefined symbol "__driDriverGetExtensions_amdgpu"
libGL error: core dri driver extension not found
libGL error: failed to load driver: amdgpu
libGL error: MESA-LOADER: failed to retrieve device information
libGL: using driver amdgpu for 4
libGL: OpenDriver: trying /usr/local/lib/dri/amdgpu_dri.so
libGL: driver does not expose __driDriverGetExtensions_amdgpu(): Undefined symbol "__driDriverGetExtensions_amdgpu"
libGL error: core dri or dri2 extension not found
libGL error: failed to load driver: amdgpu
libGL: OpenDriver: trying /usr/local/lib/dri/swrast_dri.so
libGL: Can't open configuration file /home/nbe/.drirc: No such file or directory.
libGL: Can't open configuration file /home/nbe/.drirc: No such file or directory.
--------------------------------------------------------------------------------
Comment 10 Matthew Rezny freebsd_committer 2017-03-30 15:11:08 UTC
(In reply to Nils Beyer from comment #7)

amdgpu_dri.so and radeonsi_dri.so are not interchangeable; they are paired with different kernel drivers. From the output of the hw.dri sysctl I assume you are running drm-next and have the amdgpu drm driver loaded, so you are just missing the amdgpu dri driver from Mesa. I should probably go enable that in the port; there is no reason not to build it for those using stock ports on a drm-next kernel.
Comment 11 Jan Beich freebsd_committer 2017-03-30 15:23:09 UTC
(In reply to Matthew Rezny from comment #8)
> What kernel are you running?

drm-next (c2af518bfd1) merged with /head@r316260 with some minor hacks like bug 206711 on amd64. GPU is Skylake GT2 which is supported by i965.

> DRM_MAJOR has been defined as 0 the entire time libdevq was used.

drm-next probably changed how drm is initialized, which affects the dev_t value. According to the major(3) manpage, it can no longer be used to identify device classes. I don't remember exactly when major/minor was dropped, but it was probably around the same time devfs was introduced.

> How do you have 3?

I don't know, probably specific to my hardware, kernel configuration, etc. DRM_MAJOR rarely dropped to 1, it was mostly 2 or 3. I didn't care until ports r433862 moved libdevq logic to libdrm and broke my setup.
Comment 12 Nils Beyer 2017-03-30 15:24:28 UTC
(In reply to Matthew Rezny from comment #10)

understood. If you have a patch ready or can tell me how to enable "amdgpu"
build, I'll be more than happy to test it...
Comment 13 Matthew Rezny freebsd_committer 2017-03-30 15:41:16 UTC
(In reply to Jan Beich (mail not working) from comment #11)

Ok, so it was bad luck that everyone that I checked with had major 1 on drm-next. I will remove the DRM_MAJOR checks once again.
Comment 14 Matthew Rezny freebsd_committer 2017-03-30 17:08:46 UTC
Created attachment 181319
update libdrm to 2.4.76 and drop dependency on libdevq

The DRM_MAJOR checks have been removed once again. Please test on drm-next and let me know the results.
Comment 15 Matthew Rezny freebsd_committer 2017-03-30 17:15:01 UTC
(In reply to Nils Beyer from comment #12)

Unfortunately, not just yet. I was assuming there was a switch for amdgpu that I had never bothered to flip, but upon inspecting the configure script I found that amdgpu is under the switch for radeonsi. So it looks like it should build but obviously it isn't, thus some investigation is needed.

As far as I know, drm-next has Mesa 17 in their ports, and as they are actively working on amdgpu support I expect they have amdgpu_dri.so. You may want to try their ports tree instead of stock.
Comment 16 Nils Beyer 2017-03-30 17:38:21 UTC
(In reply to Matthew Rezny from comment #15)

it seems that "amdgpu_dri.so" is the closed-source AMDGPU-Pro driver from AMD
itself. So something is borked now since "amdgpu.ko" wants that now...
Comment 17 Nils Beyer 2017-03-30 17:48:20 UTC
For what it's worth, I've downgraded to libdrm 2.4.75, and Mesa wants
"radeonsi_dri.so":
---------------------------------------------------------------------------------
#env LIBGL_DEBUG=verbose glxinfo
name of display: :0
libGL: Can't open configuration file /root/.drirc: No such file or directory.
libGL error: pci id for fd 4: 0000:0000, driver (null)
libGL error: No driver found
libGL error: failed to load driver: (null)
libGL error: pci id for fd 4: 0000:0000, driver (null)
libGL: OpenDriver: trying /usr/local/lib/dri/radeonsi_dri.so
pci id for fd 5: 0000:0000, driver (null)
libGL error: failed to create dri screen
libGL error: failed to load driver: radeonsi
libGL: OpenDriver: trying /usr/local/lib/dri/swrast_dri.so
---------------------------------------------------------------------------------
Comment 18 Matthew Rezny freebsd_committer 2017-03-30 17:55:42 UTC
(In reply to Nils Beyer from comment #17)

Is that with or without libdevq? You should be able to just change 2.4.76 to 2.4.75 in the Makefile, run makesum, and then make as usual. I did all the libdevq replacement work on 2.4.75 and did not have to adjust the patch for 2.4.76. There were changes in 2.4.76 for Polaris support, so there's a chance a change was made that causes it to try amdgpu instead of radeonsi.

Unfortunately, the newest I have uses r600, so I have no direct experience with radeonsi or amdgpu and must rely on user reports.
Comment 19 Jan Beich freebsd_committer 2017-03-30 18:06:49 UTC
Comment on attachment 181319
update libdrm to 2.4.76 and drop dependency on libdevq

Works fine now, thank you. If you plan to upstream this at some point, it would be better to apply the DRM_MAJOR checks everywhere and then let FreeBSD use a fallback, e.g.:

  #ifdef __linux__
  #define DRM_MAJOR 226
  #endif

  #ifndef DRM_MAJOR
  #define DRM_MAJOR 0 /* ignore if unknown or unstable (e.g. FreeBSD) */
  #endif
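
A minimal sketch of how a check could consume such a fallback, treating 0 as "do not compare". The names drm_major_acceptable and drm_node_major_ok are hypothetical and are not libdrm functions; this only illustrates the idea, not how libdrm implements it.

```c
#ifdef __linux__
#define DRM_MAJOR 226
#endif

#ifndef DRM_MAJOR
#define DRM_MAJOR 0 /* ignore if unknown or unstable (e.g. FreeBSD) */
#endif

/*
 * Hypothetical helper: accept a device node's major number if the
 * expected major is 0 (unknown/unstable) or if the two match.
 */
int
drm_major_acceptable(unsigned node_major, unsigned expected_major)
{
    if (expected_major == 0)
        return (1);
    return (node_major == expected_major);
}

/* Hypothetical wrapper using the compile-time DRM_MAJOR. */
int
drm_node_major_ok(unsigned node_major)
{
    return (drm_major_acceptable(node_major, DRM_MAJOR));
}
```

With this shape the check stays in place on every platform, and only the constant decides whether it actually rejects anything.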
Comment 20 Nils Beyer 2017-03-30 18:08:54 UTC
That was the original package version from the FreeBSD repository. You're correct,
after downversioning the Makefile to 2.4.75 (still having your patches active),
it still wants "amdgpu_dri.so".

I think it's time to bother the "drm-next" people, now. Thanks for your help
nonetheless...
Comment 21 Matthew Rezny freebsd_committer 2017-03-30 19:28:23 UTC
(In reply to Jan Beich (mail not working) from comment #19)

Thank you for confirming that.

I need to check with DragonFly to see if the MAJOR_ID is correct for them, in which case I'd like to keep the explicit define to 0, or if they are in the same situation as us, in which case the default to 0 makes more sense. Either way, I would like to make the value for Linux explicit instead of the default. That detail will get worked out in time. Since I did some refactoring to add support for our platform, I expect that upstream will want me to split this work into a series of patches, and there may be some additional refactoring needed for their acceptance. Of course, I'd like wider testing, as in from all users of ports, before I begin the upstreaming process.
Comment 22 holindho 2017-03-31 05:06:19 UTC
Comment on attachment 181319
update libdrm to 2.4.76 and drop dependency on libdevq

Thanks for the effort, my use cases appear to work with this patch. Tested on a MacPro1,1 / Radeon 5770, FreeBSD 11.0-p8 RELEASE. I had dri3 disabled in the Mesa builds and I did a "pkg delete libdevq" before testing just to make sure.
Comment 23 Matthew Rezny freebsd_committer 2017-04-09 08:40:29 UTC
Resolved by ports r438051