Summary: | graphics/mesa-dri: 22.3.0 breaks direct rendering (radeon, SUMO, r600_dri) | ||
---|---|---|---|
Product: | Ports & Packages | Reporter: | Felix Palmen <zirias> |
Component: | Individual Port(s) | Assignee: | freebsd-x11 (Nobody) <x11> |
Status: | Closed FIXED | ||
Severity: | Affects Some People | CC: | KOT, dev, george, grahamperrin, manu, owen, tatsuki_makino |
Priority: | --- | Keywords: | crash |
Version: | Latest | Flags: | bugzilla: maintainer-feedback? (x11) |
Hardware: | Any | ||
OS: | Any | ||
URL: | https://gitlab.freedesktop.org/mesa/mesa/-/issues/7931 | ||
Description
Felix Palmen
2022-12-12 08:34:26 UTC
Are you using gnome ? If yes this could be related to https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=267915

(In reply to Emmanuel Vadot from comment #1)
No, currently x11-wm/fvwm3 with x11-wm/picom as compositor, but the latter now just segfaults on startup. I'll first verify by reverting mesa-* to 22.2.3 which will take some time because I have to wait for large builds...

(In reply to Felix Palmen from comment #2)
I don't really know much on how to use those but those seems to work on my amdgpu machine. Please try some other window manager (even wayland with sway) before reverting mesa.

(In reply to Emmanuel Vadot from comment #3)
> Please try some other window manager

I don't think this would do any good. fvwm3 itself doesn't use glx, picom now crashes on start, other glx clients I tried (glxgears, mpv) fail as well, and glxinfo now shows me:

| OpenGL renderer string: llvmpipe (LLVM 15.0.6, 128 bits)

Also, Xorg crashing unless you force radeon(4) to use EXA (instead of glamor) is new and certainly unrelated to the wm.

> (even wayland with sway)

Uhm, TBH, I have no idea about wayland and never tried it. Could attempt this later … first I want to be 100% certain that the problem with X11 now is indeed the new mesa.

Additional info, as I do have KDE plasma5 installed which also worked fine before, I gave it a shot: It doesn't even show a desktop because plasmashell now also segfaults.

Created attachment 238734 [details]
Xorg.0.log from boot environment n259662-ebdf27b6f367-d

n259662-ebdf27b6f367-c is not bugged. Four mesa- packages are locked.

n259662-ebdf27b6f367-d is amongst the boot environments that _are_ bugged. Whilst this environment was active, I unlocked the four packages then upgraded these three alone:

mesa-gallium-va
mesa-gallium-vdpau
mesa-libs

* startx, run manually, worked as expected following a restart.

After upgrading mesa-dri then reboot -r

* startx leads to a crash of Xorg.
----

% bectl list -c creation | grep ebdf27b6f367
n259662-ebdf27b6f367-a - - 3.27G 2022-12-11 12:41
n259662-ebdf27b6f367-b - - 45.2M 2022-12-11 14:01
n259662-ebdf27b6f367-c NR / 303G 2022-12-12 07:07
n259662-ebdf27b6f367-d - - 515M 2022-12-12 17:43
% pkg lock -l
Currently locked packages:
drm-510-kmod-5.10.113_8
mesa-dri-22.2.3
mesa-gallium-va-22.2.3
mesa-gallium-vdpau-22.2.3
mesa-libs-22.2.3
% uname -aKU
FreeBSD mowa219-gjp4-8570p-freebsd 14.0-CURRENT FreeBSD 14.0-CURRENT #27 main-n259662-ebdf27b6f367-dirty: Sun Dec 11 11:31:52 GMT 2022 grahamperrin@mowa219-gjp4-8570p-freebsd:/usr/obj/usr/src/amd64.amd64/sys/GENERIC-NODEBUG amd64 1400074 1400074
% grep Xorg /var/log/messages
Dec 11 13:49:41 mowa219-gjp4-8570p-freebsd kernel: pid 63653 (Xorg), jid 0, uid 0: exited on signal 6 (core dumped)
Dec 12 18:08:16 mowa219-gjp4-8570p-freebsd kernel: pid 4564 (Xorg), jid 0, uid 0: exited on signal 6 (core dumped)
%

<https://bsd-hardware.info/?probe=64c92d49d9>

(In reply to Felix Palmen from comment #0)
> … Xorg crashes on startup trying to load the radeon driver with a
> segmentation fault. …

Here (with AMD Thames [Radeon HD 7550M/7570M/7650M], <https://bsd-hardware.info/?probe=64c92d49d9#pci:1002-6841-103c-17a9>), the radeonkms module loads without difficulty.

My first observation of the crash of Xorg was with SDDM (the norm for me),

% sysrc sddm_enable
sddm_enable: YES
%

With SDDM _not_ started automatically, I can trigger the crash as root with startx,

# grep -v \#\ /root/.xinitrc
#!/bin/sh
/usr/local/bin/twm &
sleep 1
exec xterm
#

I tried to make a debug version, but Mk/Uses/meson.mk does not seem to be working.
This is the backtrace when startx fails.
* thread #1, name = 'Xorg', stop reason = signal SIGABRT
  * frame #0: 0x0000000800844d9a libc.so.7`__sys_thr_kill + 10
    frame #1: 0x0000000800843174 libc.so.7`__raise + 52
    frame #2: 0x00000008007b8e09 libc.so.7`abort + 73
    frame #3: 0x00000000003c0aaa Xorg`OsAbort + 26
    frame #4: 0x00000000003c7bac Xorg`___lldb_unnamed_symbol5157 + 76
    frame #5: 0x00000000003c6302 Xorg`FatalError + 274
    frame #6: 0x00000000003be365 Xorg`___lldb_unnamed_symbol5066 + 165
    frame #7: 0x0000000800668c80 libthr.so.3`___lldb_unnamed_symbol638 + 224
    frame #8: 0x000000080066816e libthr.so.3`___lldb_unnamed_symbol619 + 318
    frame #9: 0x00007ffffffff003
    frame #10: 0x00000008031de9ff r600_dri.so
  thread #2, name = 'Xorg', stop reason = signal SIGABRT
    frame #0: 0x00000008007e9c2a libc.so.7`__sys_poll + 10
    frame #1: 0x0000000800665b06 libthr.so.3`___lldb_unnamed_symbol575 + 54
    frame #2: 0x0000000800633d70 libudev.so.0`___lldb_unnamed_symbol247 + 192
    frame #3: 0x0000000800662fd4 libthr.so.3`___lldb_unnamed_symbol538 + 324
  thread #3, name = 'Xorg:rcs0', stop reason = signal SIGABRT
    frame #0: 0x00000008006722bc libthr.so.3`___lldb_unnamed_symbol732 + 92
    frame #1: 0x000000080066f91f libthr.so.3`___lldb_unnamed_symbol701 + 623
    frame #2: 0x00000008023d80fb r600_dri.so`___lldb_unnamed_symbol10178 + 43
    frame #3: 0x0000000802355352 r600_dri.so`___lldb_unnamed_symbol9104 + 50
    frame #4: 0x00000008023d8440 r600_dri.so`___lldb_unnamed_symbol10179 + 672
    frame #5: 0x0000000800662fd4 libthr.so.3`___lldb_unnamed_symbol538 + 324

After downgrading mesa packages to 22.2.3 (reverted commit 855947ebf7e738232a8bbf6d47cc56f2896f276f), everything is back to normal again, glxinfo shows:

| OpenGL renderer string: AMD SUMO (DRM 2.50.0 / 13.1-RELEASE-p5, LLVM 13.0.1)

(In reply to Graham Perrin from comment #7)
> > … Xorg crashes on startup trying to load the radeon driver with a
> > segmentation fault. …
>
> Here [...] the radeonkms module loads without difficulty.
I was talking about the Xorg radeon driver (radeon_drv.so from x11-drivers/xf86-video-ati), not the kernel module. The kernel module seems unrelated here.

(In reply to Felix Palmen from comment #9)
Does it help to rebuild and force reinstall all xorg ports ? (just wondering if there is some abi break in mesa somewhere in the radeon dri driver).

A few tests that one should do :
- Test kmscube under the console
- Compile mesa from the git branch 22.3.0 with the patches from the ports tree applied so one will have a better stacktrace.

Sorry I can't help more but I don't have affected hardware.

Cheers,

Created attachment 238746 [details]
backtrace

comment #8 is wrong :)
Mk/Uses/meson.mk is working correctly.
The backtrace is wrong because it reads what 22.3.0 dumped with the 22.2.3 library alignment.
The correct backtrace is attached.

(In reply to Tatsuki Makino from comment #12)
Isn't this a problem on part of llvm15?

(In reply to Tatsuki Makino from comment #13)
You can always test to rebuild mesa-dri 22.3 and changing to 13 here : https://cgit.freebsd.org/ports/tree/graphics/mesa-dri/Makefile.common#n86

(In reply to Emmanuel Vadot from comment #14)
The debug version compiled with llvm13 fails startx with the same result as attachment 238746 [details].
I guess it is not related to the llvm version change.

Same here with REDWOOD, FirePro V4800, two screens, 13.1-RELEASE.

I also get segfaults with mesa 22.3.x, both Xorg and wayland crash immediately. Did not get any useful debug stacktrace, only the crash handler of Xorg shows up. Neither does the Xorg log tell anything, it just says segfault.

As a hint for testing, I can provoke the crash with mesa-22.2.3 and mesa-devel installed additionally. Which is easier to test and bisect as there are no dependencies, and reverting it is just "pkg delete mesa-devel".
I was able to bisect the crash in my case to this commit in mesa:

https://gitlab.freedesktop.org/mesa/mesa/-/commit/7662a5e9d34515bd44a97b3726490f31490b57c6

Unfortunately I don't know what to make of it. The commit just removes some obsolete functionality, which may trigger a different path or leave something uninitialized.

Furthermore, there's something which broke rendering prior to this commit - with the screen just left blank. Couldn't bisect that one yet.

(In reply to Emmanuel Vadot from comment #11)
> Does it help to rebuild and force reinstall all xorg ports ? (just wondering
> if there is some abi break in mesa somewhere in the radeon dri driver).

I started a bulk build of my whole ports list into a separate "set" with package fetching disabled. Will install this on my desktop with 'pkg upgrade -f'. So, I'll be able to answer this question in a few days...

Thanks everyone else for the debugging work so far!

(In reply to Florian Walpen from comment #16)
For the records, the problem with the blank screen seems to be unrelated and was introduced by commit

https://gitlab.freedesktop.org/mesa/mesa/-/commit/dfbb4b384aa93160f1baa3497c35d82f2b7dcbc0

The file in question isn't there anymore in current mesa, so this bug may already be fixed (but I cannot test due to the segfault).

Created attachment 238895 [details]
Backtrace with mesa-devel-22.3.b.2118
I was able to get a proper stacktrace with mesa-devel-22.3.b.2118, see attachment. It's exactly like the one by Tatsuki Makino, so we're definitely looking at the same problem here.
Created attachment 238898 [details]
Valgrind log of Xorg segfault, verbose.
Also I was able to get a trace of this in valgrind, attached here. It's verbose and kind of lengthy, search for "Invalid read".
Backtrace is the same as the others, and the read address matches what I see in the core dump. What is interesting is the "Use of uninitialized value" just before the invalid read, at the same place in the stack. It suggests that the invalid read is a consequence of operating on uninitialized data, in the red-black-tree implementation of C++ std::set.
Unless our standard library is seriously broken, this looks like a case for upstream mesa to me. It's their internal data structures with a custom allocator. Did anybody already report it there?
Ah yes, my card is Caicos [Radeon HD 6450/7450/8450 / R5 230 OEM] :)

(In reply to Florian Walpen from comment #19)
There are the following differences between 22.2 and 22.3 for this section called sfn_optimizer.cpp:363:17.

@@ -358,62 +354,148 @@
 auto src = instr->psrc(0);
 auto dest = instr->dest();
- for (auto& i : instr->dest()->uses()) {
+ for (auto& i : dest->uses()) {
 /* SSA can always be propagated, registers only in the same block

The change seems to be a result of the previous assignment to dest, but it may be important to call instr->dest() with for here.
But I don't know about C++ :)

Spanning multiple boot environments:

% grep mesa-dri /var/log/messages
Dec 17 15:10:51 mowa219-gjp4-8570p-freebsd pkg[53532]: mesa-dri upgraded: 22.2.3 -> 22.3.1
Dec 17 15:27:24 mowa219-gjp4-8570p-freebsd pkg[2866]: mesa-dri reinstalled: 22.3.1 -> 22.3.1
Dec 18 03:28:00 mowa219-gjp4-8570p-freebsd pkg[3421]: mesa-dri upgraded: 22.2.3 -> 22.3.0
Dec 18 03:34:15 mowa219-gjp4-8570p-freebsd pkg[2869]: mesa-dri upgraded: 22.3.0 -> 22.3.1
Dec 18 03:37:39 mowa219-gjp4-8570p-freebsd pkg[2926]: mesa-dri reinstalled: 22.3.1 -> 22.3.1
Dec 18 10:38:59 mowa219-gjp4-8570p-freebsd pkg[66195]: mesa-dri-22.3.1 deinstalled
%

Now:

* xorg and xorg-server have a missing dependency (mesa-dri)

* SDDM, KDE Plasma etc. work fine, for me, despite what's missing.

______________________________________________________________________________________________

Side note: in one of the recent boot environments, pkg-autoremove(8) identified mesa-dri as orphaned (whilst xorg and xorg-server were installed). Misidentification, I guess, but I don't intend to make a bug report for this unless someone directs me to do so.

(In reply to Emmanuel Vadot from comment #11)
> Does it help to rebuild and force reinstall all xorg ports ? (just wondering
> if there is some abi break in mesa somewhere in the radeon dri driver).

Unfortunately no luck with that.
Just to be sure, rebuilt and reinstalled ALL ports and the result is Xorg segfaulting on startup...

Yes, this looks very much like an upstream bug. I didn't report yet, but tried to find existing reports, because I'd assume Linux to be affected as well, but couldn't find anything so far.

Could one of you who already obtained meaningful stacktraces do it please?

@Tatsuki Makino
> There are the following differences between 22.2 and 22.3 for this section called sfn_optimizer.cpp:363:17.

These shouldn't have any effect on the compiled binary, as dest() is inlined and multiple calls will be optimized out. But thanks anyway :-)

@Felix Palmen
> Yes, this looks very much like an upstream bug. I didn't report yet, but tried to find existing reports, because I'd assume Linux to be affected as well, but couldn't find anything so far.

That makes it a bit suspicious, I'd also expect this to happen on Linux too. But I'm out of ideas.

> Could one of you who already obtained meaningful stacktraces do it please?

I'll get to it later this evening.

*** Bug 268451 has been marked as a duplicate of this bug. ***

Reverting to mesa 22.2.3 fixed the problem for me.

Created attachment 238934 [details]
Xorg.0.log.old

(In reply to Graham Perrin from comment #22)
Crash with mesa-devel-22.3.b.2234 in lieu of mesa-dri.

I have the .core file, however this is not useful:

…
Core was generated by `/usr/local/libexec/Xorg :0 -auth /root/.serverauth.71358'.
Program terminated with signal SIGABRT, Aborted.
Sent by thr_kill() from pid 71371 and user 0.
#0 0x000000082a022daa in ?? ()
[Current thread is 1 (LWP 100835)]
(gdb) bt
#0 0x000000082a022daa in ?? ()
#1 0x0000000829f915f4 in ?? ()
#2 0x00000000000189e3 in ?? ()
#3 0xc41f90ba55d83a4c in ?? ()
#4 0x0000000820798854 in ?? ()
#5 0x0000000820799170 in ?? ()
#6 0x0000000820798870 in ?? ()
#7 0x000000082a04e659 in ?? ()
#8 0xc41f90ba55d83a4c in ?? ()
#9 0x0000000000000000 in ?? ()
(gdb) q
%

For now, I'm back to SDDM and Plasma with neither mesa-dri nor mesa-devel.

Created attachment 238935 [details]
backtrace from lldb

(In reply to Graham Perrin from comment #28)

lldb
pkg info -x mesa
uname -aKU

Outputs attached. Backtrace is from the .core at the time of using mesa-devel-22.3.b.2234 in lieu of mesa-dri.

Upstream issue:

https://gitlab.freedesktop.org/mesa/mesa/-/issues/7931

Feel free to chime in if I forgot something.

(In reply to Graham Perrin from comment #29)
You need to have mesa-devel with debug symbols _installed_ to get a meaningful stacktrace. Unless I'm misinterpreting what you are doing.

> For now, I'm back to SDDM and Plasma with neither mesa-dri nor mesa-devel.

Didn't know that this works, what does kinfocenter say about the graphics? I suppose it's in some kind of framebuffer mode then?

Seems to be fixed upstream, thanks for reporting there:

https://gitlab.freedesktop.org/mesa/mesa/-/issues/7931

Created attachment 238945 [details]
Add upstream patch to the graphics/mesa-dri port.
Patch to add the upstream patch to the graphics/mesa-dri port. Apply with "git am".
This fixed the issue for me, please test.
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=ad2ced80deaee8815b72d927a8e6d2b687b4864b

commit ad2ced80deaee8815b72d927a8e6d2b687b4864b
Author: Florian Walpen <dev@submerge.ch>
AuthorDate: 2022-12-20 16:55:02 +0000
Commit: Emmanuel Vadot <manu@FreeBSD.org>
CommitDate: 2022-12-20 17:16:51 +0000

    graphics/mesa-dri: Fix a crash for radeon r600 graphic cards.

    Add an upstream patch to fix an immediate crash of Xorg and wayland on systems with radeon r600 based graphic cards.

    See: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7931

    PR: 268327

 graphics/mesa-dri/Makefile | 1 +
 ...src_gallium_drivers_r600_sfn_sfn__optimizer.cpp (new) | 16 ++++++++++++++++
 2 files changed, 17 insertions(+)

(In reply to commit-hook from comment #34)
When mesa-dri with this commit is used, Xorg can be started.
Thank you very much.

Created attachment 238950 [details]
Screenshot: Plasma with neither mesa-dri nor mesa-devel

(In reply to Florian Walpen from comment #31)
>> … SDDM and Plasma with neither mesa-dri nor mesa-devel.

There was a system tray notification: about software rendering, and the possibility of degradation.

I sensed no degradation.

> … what does kinfocenter say about the graphics? …

OpenGL (EGL) information captured in the attached screenshot.

Created attachment 238951 [details]
Graphics information from Info Center in Plasma

(In reply to commit-hook from comment #34)
Fixed for me.

AMD TURKS

% pciconf -lv | grep -B4 VGA
vgapci0@pci0:1:0:0: class=0x030000 rev=0x00 hdr=0x00 vendor=0x1002 device=0x6841 subvendor=0x103c subdevice=0x17a9
    vendor = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    device = 'Thames [Radeon HD 7550M/7570M/7650M]'
    class = display
    subclass = VGA
% pkg info -x mesa-dri
mesa-dri-22.3.1_1
%

(In reply to commit-hook from comment #34)
With this update, the crash is fixed for me. (I am using xf86-video-ati; see duplicate bug #268451.) Thanks!
(In reply to Graham Perrin from comment #36)
> There was a system tray notification: about software rendering, and the possibility of degradation.
>
> I sensed no degradation.

Thanks for the infos - arguably it should be slower than HW acceleration... I thought that KWin would fallback to mesa llvmpipe, but obviously there's another graphics backend involved here. Anyway, hopefully this workaround isn't needed anymore.

(In reply to Florian Walpen from comment #33)
> Patch to add the upstream patch to the graphics/mesa-dri port.

Are you sure this is complete? Just asking because the upstream issue has a few other commits linked as well.... but then:

> This fixed the issue for me, please test.

Everything I tested on my machine is fine again! (Output of glxinfo, glxgears, compositing with picom using glx backend, h.264 video rendering using vaapi ...)

With all the other confirmations, time to close?

(In reply to Felix Palmen from comment #40)
I think so.
Thanks a lot everyone for bisecting and reporting upstream.

(In reply to Felix Palmen from comment #40)
> Are you sure this is complete? Just asking because the upstream issue has a few other commits linked as well....

There were multiple attempts and some rebases, but AFAICT only this one commit was merged. And it fixes exactly the issue he identified: Deleting an entry from the C++ std::set that our iterator currently points to. This invalidates the iterator. The fix is to increment the iterator before deletion, which is correct according to specs, see

https://en.cppreference.com/w/cpp/container/set/erase

(In reply to Florian Walpen from comment #42)
As I said, everything seems to work correctly here, I was just worried some weird edge case might be missing. So, thanks a lot for your work, and kudos for your C++ expertise! :)

*** Bug 268654 has been marked as a duplicate of this bug. ***
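
For readers who want to see the iterator invalidation from comment #42 in isolation, here is a minimal sketch. It is not the mesa code or the upstream patch: erase_even and check_erase_even are hypothetical names, and a std::set<int> stands in for mesa's internal set with its custom allocator. It only illustrates the pattern described above, advancing the iterator before erasing the element it pointed to.

```cpp
#include <cassert>
#include <set>

// Hypothetical helper, not mesa code: remove every even value from a
// std::set while iterating over it.
void erase_even(std::set<int>& values)
{
    for (auto it = values.begin(); it != values.end(); ) {
        if (*it % 2 == 0) {
            // Broken variant (undefined behaviour):
            //     values.erase(it);
            //     ++it;   // `it` was invalidated by erase()
            //
            // Safe variant: step the iterator past the element first,
            // then erase through a copy of the old position.
            auto victim = it++;
            values.erase(victim);
        } else {
            ++it;
        }
    }
}

// Small self-check for the sketch above.
void check_erase_even()
{
    std::set<int> values{1, 2, 3, 4, 5, 6};
    erase_even(values);
    assert((values == std::set<int>{1, 3, 5}));
}
```

Since C++11, writing it = values.erase(it) is an equivalent way to express the safe variant, because erase() returns the iterator following the removed element.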