Bug 275388 - graphics/mesa-dri: 23.1.8 breaks some radeon cards
Summary: graphics/mesa-dri: 23.1.8 breaks some radeon cards
Status: Closed FIXED
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: freebsd-x11 (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-11-27 20:41 UTC by Patrick Mackinlay
Modified: 2023-12-26 02:53 UTC
CC List: 5 users

See Also:
bugzilla: maintainer-feedback? (x11)


Attachments
patch (2.37 KB, patch)
2023-12-26 02:53 UTC, Ivan Rozhuk
rozhuk.im: maintainer-approval?

Description Patrick Mackinlay 2023-11-27 20:41:02 UTC
I have a FreeBSD 13.2 machine with an AMD Radeon RX 550 / 550 Series card running X11 with the amdgpu driver.

Upgrading

mesa-dri: 22.3.7_3 -> 23.1.8_1
mesa-libs: 22.3.7_2 -> 23.1.8

breaks X11. The display is just a mess of randomly coloured pixels. Apps such as chrome/firefox/thunderbird have new log lines reporting

ac_rtld error: !data || data->d_size != shdr->sh_size
LLVM failed to upload shader
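
For anyone checking whether they hit the same failure, a minimal sketch that scans application output for the two messages quoted above. The function name is hypothetical; it only matches the literal strings from this report.

```shell
# Succeeds (exit 0) if stdin contains either error message quoted in this report.
has_shader_upload_failure() {
    grep -E -q 'ac_rtld error:|LLVM failed to upload shader'
}

# Example (assumption: the application writes these messages to stderr):
#   firefox 2>&1 | has_shader_upload_failure && echo "affected"
```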

My Xorg log reports
[66.956] (II) AMDGPU(0): glamor X acceleration enabled on AMD Radeon RX 550 / 550 Series (polaris12, LLVM 16.0.6, DRM 3.40, 13.2-RELEASE-p2)

I have another machine with onboard Intel graphics, with exactly the same kernel and set of ports, and it works fine after the upgrade.

Other people have reported similar issues at:

https://forums.freebsd.org/threads/mesa-dri-23-1-8-and-radeon-570-and-failed-to-build-shader-variant.91095/

Other people have reported
RX 6700 XT: working
Radeon RX 6800: broken
Radeon 570: broken
Comment 1 Jan Beich freebsd_committer freebsd_triage 2023-11-27 23:13:39 UTC
Can you reproduce with modesetting DDX instead of xf86-video-amdgpu? It should be default without xorg.conf(5) or can be forced by using Driver "modesetting" in Device section.

Can you reproduce with graphics/mesa-devel? If not then updating to 23.2.1 or the upcoming 23.3.0 may help.
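
A minimal sketch of forcing the modesetting DDX as suggested above. The helper name, the file name 20-modesetting.conf and the Identifier "Card0" are placeholders; the target directory is an assumption (on FreeBSD it is typically /usr/local/etc/X11/xorg.conf.d).

```shell
# Write a Device section that forces the generic modesetting driver.
# Usage: write_modesetting_conf <directory>
write_modesetting_conf() {
    conf_dir=$1
    mkdir -p "$conf_dir" || return 1
    # File name and Identifier are arbitrary placeholders.
    cat > "$conf_dir/20-modesetting.conf" <<'EOF'
Section "Device"
    Identifier "Card0"
    Driver     "modesetting"
EndSection
EOF
}

# Example (path is an assumption, run as root):
#   write_modesetting_conf /usr/local/etc/X11/xorg.conf.d
```

Removing the file again returns Xorg to its automatic driver selection.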
Comment 2 Ivan Rozhuk 2023-11-28 00:44:50 UTC
+1 Radeon RX 5600 XT

Installing graphics/mesa-devel "fixes" the issue.

Rebuilding and reinstalling xf86-video-amdgpu, mesa-*, xorg-server does not help.
Replacing "amdgpu" -> "modesetting" does not help.
Comment 3 Emmanuel Vadot freebsd_committer freebsd_triage 2023-11-28 08:16:40 UTC
I have no problems here with my AMD Radeon RX 550 Series
All tests were done without xorg.conf, with xf86-video-amdgpu/ati installed and with those not installed.

"
[66.956] (II) AMDGPU(0): glamor X acceleration enabled on AMD Radeon RX 550 / 550 Series (polaris12, LLVM 16.0.6, DRM 3.40, 13.2-RELEASE-p2)
"

You seem to be using LLVM16, which isn't the default. Any other non-default settings?
Comment 4 Ivan Rozhuk 2023-11-28 08:52:51 UTC
(In reply to Emmanuel Vadot from comment #3)

All my settings:
http://netlab.dhis.org/download/software/os_cfg/FBSD/13/base/
+
http://netlab.dhis.org/download/software/os_cfg/FBSD/13/wks/
LLVM is 15; I force mesa's default LLVM for all other ports except Firefox.

I have not tried xorg without a config, but I suspect that only the EFI framebuffer will work, without any acceleration.


Even more: after the mesa update but before the system reboot, SDL apps and mpv (with GPU video output) show the same visual noise.

After rebooting the system, instead of the slim login prompt I see fragments of app windows, with their text, that were open before the reboot.
Comment 5 Emmanuel Vadot freebsd_committer freebsd_triage 2023-11-28 08:57:13 UTC
(In reply to Ivan Rozhuk from comment #4)
> I have not tried xorg without a config, but I suspect that only the EFI framebuffer will work, without any acceleration.

Why? Config-less Xorg has worked for years.

Please test with official packages or without any settings.
Comment 6 Patrick Mackinlay 2023-11-28 15:50:14 UTC
I am not sure why my X11 drivers were built with llvm16; the build server generally builds ports with BATCH=1. There don't seem to be any other non-default options. For some reason the build server built most things with llvm15 but the rest with llvm16:

Installed packages to be REMOVED:
        llvm16: 16.0.6_6
        mesa-dri: 23.1.8_1
        xf86-input-evdev: 2.10.6_7
        xf86-input-keyboard: 1.9.0_5
        xf86-input-libinput: 1.3.0
        xf86-input-mouse: 1.9.3_4
        xf86-input-synaptics: 1.9.1_10
        xf86-video-amdgpu: 22.0.0_1
        xf86-video-ati: 19.1.0_6,1
        xf86-video-intel: 2.99.917.923,1
        xf86-video-nv: 2.1.22
        xf86-video-scfb: 0.0.7_1
        xf86-video-vesa: 2.5.0_2
        xorg-drivers: 7.7_7
        xorg-server: 21.1.9,1

I rebuilt those with llvm15 and the issue persisted with the amdgpu driver, the modesetting driver and with no xorg config file (amdgpu).

However, I can confirm that installing the latest mesa-devel (I was not using mesa-devel before) fixes the problem. So the problem goes away either if you revert this upgrade of the mesa ports:

mesa-dri: 22.3.7_3 -> 23.1.8_1
mesa-libs: 22.3.7_2 -> 23.1.8

or if you install mesa-devel-23.3.b.389, leaving the following mesa ports installed:

mesa-devel-23.3.b.389 
mesa-dri-23.1.8_1
mesa-libs-23.1.8
Comment 7 Patrick Mackinlay 2023-11-28 15:56:00 UTC
I should mention that I also tested the official mesa-dri and mesa-libs packages, with only the amdgpu driver and without mesa-devel, and the issue persisted.
Comment 8 Ivan Rozhuk 2023-11-29 04:05:30 UTC
(In reply to Emmanuel Vadot from comment #5)

> Please test with official packages or without any settings.

I cannot break my main workstation for tests.

The same config on a Ryzen 5750G works without issues.
I assume that support for some amdgpu family chips has been broken in mesa.
Comment 9 Jan Beich freebsd_committer freebsd_triage 2023-11-30 03:24:08 UTC
Does the patch in bug 275443 help?
Comment 10 Emmanuel Vadot freebsd_committer freebsd_triage 2023-11-30 09:00:41 UTC
(In reply to Jan Beich from comment #9)
I hope not, because in comment 7 they said that using the official packages makes no difference, and the official packages shouldn't have this problem.
Comment 11 Tao Zhen 2023-12-01 02:21:02 UTC
(In reply to Jan Beich from comment #9)
The patch fixed my display. I have RX580 on 14.0. Thanks!
Comment 12 Iouri V. Ivliev 2023-12-01 11:23:59 UTC
(In reply to Jan Beich from comment #9)
mesa-dri-23.1.8_2 linked against libelf.so.1 from elfutils-0.187 works for me.
But mesa-dri-23.1.8_1 and libelf.so.2 from the base system doesn't.

# uname -srm
FreeBSD 13.2-STABLE amd64
# pkg info -x llvm
llvm15-15.0.7_7
# pciconf -lv vgapci0
vgapci0@pci0:10:0:0:    class=0x030000 rev=0xc8 hdr=0x00 vendor=0x1002 device=0x1638 subvendor=0x1043 subdevice=0x8809
    vendor     = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    device     = 'Cezanne [Radeon Vega Series / Radeon Vega Mobile Series]'
    class      = display
    subclass   = VGA
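
A small sketch for checking which libelf an installed mesa driver is linked against. The SONAME mapping (libelf.so.1 from devel/elfutils, libelf.so.2 from base) follows this comment; the function name is hypothetical and the driver path in the usage line is an assumption.

```shell
# Classify libelf linkage from `ldd` output read on stdin.
# Mapping per this comment: libelf.so.1 = devel/elfutils, libelf.so.2 = base.
# If both appear in the input, the last match is the one reported.
classify_libelf() {
    awk '
        /libelf\.so\.1/ { found = "elfutils (ports)" }
        /libelf\.so\.2/ { found = "base system" }
        END { print (found ? found : "none") }
    '
}

# Example (driver path is an assumption):
#   ldd /usr/local/lib/dri/radeonsi_dri.so | classify_libelf
```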
Comment 13 Emmanuel Vadot freebsd_committer freebsd_triage 2023-12-01 11:58:55 UTC
OK, so I tested on my side.
Using libelf from base works perfectly fine.
The problem is indeed that if one has elfutils installed, it confuses mesa.
None of this would happen if everyone used poudriere to build packages in a clean env, btw ...
Comment 14 Ivan Rozhuk 2023-12-01 12:20:05 UTC
(In reply to Emmanuel Vadot from comment #13)

I do not see how libelf breaks mesa; I only see that it is broken for some H/W.
My mesa uses the system libelf: I applied https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=273839 and have libelf.pc in the system.
Only glib20 somehow continues to use libelf from ports.
Comment 15 Emmanuel Vadot freebsd_committer freebsd_triage 2023-12-02 09:04:11 UTC
(In reply to Ivan Rozhuk from comment #14)

It's nice to hear only now that you have a heavily modified system ...
Anyway, even with your libelf.pc this doesn't change a thing.
Mesa with https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26284 applied and -Dlibelf=disabled (as it is right now) will ignore any .pc file for libelf. But since you have libelf from devel/elfutils installed and you don't build your packages with poudriere, mesa will link with libelf from base but use the headers from devel/elfutils (see bug #275443 for an explanation). This causes the "LLVM failed to upload shader" error.
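
A rough sketch of how one might check a build host for the header situation described here: a libelf.h under LOCALBASE sitting next to the one from base. The function name and its default paths (/usr/local, /usr/include) are assumptions, and finding both headers only shows that the mix-up is possible, not that a given build was affected.

```shell
# Report whether a ports-installed libelf.h could shadow the base header.
# Usage: libelf_header_conflict [localbase] [base_include_dir]
# Returns 1 when both headers exist, 0 otherwise.
libelf_header_conflict() {
    localbase=${1:-/usr/local}
    base_inc=${2:-/usr/include}
    if [ -f "$localbase/include/libelf.h" ] && [ -f "$base_inc/libelf.h" ]; then
        echo "conflict: $localbase/include/libelf.h may shadow $base_inc/libelf.h"
        return 1
    fi
    echo "ok: no duplicate libelf.h"
}

# Example:
#   libelf_header_conflict || pkg info devel/elfutils
```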
Comment 16 Patrick Mackinlay 2023-12-02 17:56:56 UTC
(In reply to Jan Beich from comment #9)

I can confirm that the patch fixes mesa-dri-23.1.8_1 for me
Comment 17 George Mitchell 2023-12-02 22:50:38 UTC
I had a similar problem with drm-510-kmod and amdgpu, and reverting mesa-dri and mesa-libs to my previously installed version fixed it.

I note that mesa version 23.3.0 has hit the ports tree; that might help, or it might be just the same ...
Comment 18 Ivan Rozhuk 2023-12-03 20:34:56 UTC
+1 on Ryzen 5 2500U

Updating mesa does not help.


Also, it links with libs that are not present in the Makefile:
mesa-devel is missing a required shared library: libxcb-keysyms.so.1
mesa-devel is missing a required shared library: libelf.so.1
mesa-dri is missing a required shared library: libxcb-keysyms.so.1
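
A small sketch that groups report lines of the form quoted above by library, which makes it easier to see which undeclared dependency affects which package. The function name is hypothetical, and the thread does not say which command produced these lines.

```shell
# Group "is missing a required shared library" lines by library.
# Reads report lines on stdin; prints "<library>: <pkg> <pkg> ..." sorted by library.
summarize_missing_libs() {
    sed -n 's/^\(.*\) is missing a required shared library: \(.*\)$/\2 \1/p' |
        LC_ALL=C sort |
        awk '{
                 if (!($1 in pkgs)) order[++n] = $1          # remember first-seen order
                 pkgs[$1] = (pkgs[$1] == "" ? "" : pkgs[$1] " ") $2
             }
             END { for (i = 1; i <= n; i++) print order[i] ": " pkgs[order[i]] }'
}
```

Fed the three lines from this comment, it lists mesa-devel under libelf.so.1 and both mesa-devel and mesa-dri under libxcb-keysyms.so.1.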
Comment 19 commit-hook freebsd_committer freebsd_triage 2023-12-04 08:56:11 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=9f41e650f5645f1e50d8e51eb53ea231ff9f5149

commit 9f41e650f5645f1e50d8e51eb53ea231ff9f5149
Author:     Emmanuel Vadot <manu@FreeBSD.org>
AuthorDate: 2023-12-04 08:51:25 +0000
Commit:     Emmanuel Vadot <manu@FreeBSD.org>
CommitDate: 2023-12-04 08:55:01 +0000

    graphics/mesa: Fix port when elfutils is installed

    By default when building in a clean env (i.e. poudriere) libelf from base
    will be used.
    When building with an unclean env and if devel/elfutils is installed build
    system will be confused and use libelf headers from ${LOCALBASE}/include but
    libelf from base.

    Fix this.

    Sponsored by:   Beckhoff Automation GmbH & Co. KG
    PR:             275388

 graphics/mesa-dri/Makefile.common | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)
Comment 20 Benjamin Takacs 2023-12-04 10:17:27 UTC
(In reply to commit-hook from comment #19)
As noted in PR 275443 and in https://docs.freebsd.org/en/books/porters-handbook/makefiles/#makefile-automatic-dependencies, that is only a workaround, not a fix. It shouldn't be used, because it causes problems when the build system changes, and with it the dependencies of the port.
Comment 21 Ivan Rozhuk 2023-12-05 07:10:41 UTC
(In reply to Benjamin Takacs from comment #20)
+1, I do not like this type of "auto" deps.
Add an option for this, or disable linking against the ports libelf.

I do not see any benefit from the ports libelf.
Comment 22 Emmanuel Vadot freebsd_committer freebsd_triage 2023-12-05 08:23:44 UTC
(In reply to Ivan Rozhuk from comment #21)

I've not been able to completely disable header pollution when elfutils is installed, which is why I've done it this way.
If you don't want problems, just use poudriere to compile ports in a clean env.
Now, can anyone who reported having problems confirm that the last update fixes it for them, please?
Comment 23 George Mitchell 2023-12-05 15:04:24 UTC
(In reply to Emmanuel Vadot from comment #22)
I am one of those apostates who uses portmaster, as I consider poudriere much too heavyweight for my use.  (I would use packages except it's almost impossible to avoid print/cups with packages.)  I am happy to report that the latest change indeed DOES fix the problem for me.
Comment 24 Ivan Rozhuk 2023-12-05 16:50:18 UTC
(In reply to Emmanuel Vadot from comment #22)
poudriere = "works on developer host", like docker, venv and other heavy workarounds that compensate for low developer/admin skills.

I do not use poudriere, if only because it requires ZFS and other stuff.

At some point I will probably write a shell script that creates a chroot, mounts everything except /usr/local and /var via nullfs, and installs everything required to build packages via pkg create + pkg add.
There is a small overhead for pkg create + add for the deps, and the same for exporting the result.

Even portmaster can be extended to do this.
Comment 25 gnikl 2023-12-05 22:04:35 UTC
(In reply to commit-hook from comment #19)
>    When building with an unclean env and if devel/elfutils is installed build
>    system will be confused and use libelf headers from ${LOCALBASE}/include but
>    libelf from base.
This is a genuine mesa bug, isn't it?
Disabling libelf support with -Dlibelf=disabled should cause mesa not to use libelf at all. Thus still using libelf headers and/or linking against libelf should simply not happen. Why have an option to disable a feature if it does not do what it is intended for?
Comment 26 Emmanuel Vadot freebsd_committer freebsd_triage 2023-12-06 06:56:15 UTC
(In reply to gnikl from comment #25)

Yes and no.
Mesa needs libelf. What -Dlibelf=disabled does is stop meson from using pkg-config to get libelf info, but if libelf is not found (or is disabled), meson.build falls back to cc.find_library (see https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/meson.build?ref_type=heads#L1849).
And since other deps add ${LOCALBASE}/include to the cflags, the libelf headers from elfutils get picked up, resulting in a confused mesa trying to use libelf from base with struct/function definitions from elfutils.
Comment 27 Emmanuel Vadot freebsd_committer freebsd_triage 2023-12-06 06:56:44 UTC
(In reply to George Mitchell from comment #23)

Thanks for confirming, closing the bug now.
Comment 28 Ivan Rozhuk 2023-12-26 02:53:40 UTC
Created attachment 247252 [details]
patch

This patch:
 - always forces use of libelf from the base system
 - disables libudev, openmp, xcb_keysyms - no more auto-detection for port deps