Bug 274770 - graphics/drm-515-kmod - 6750XT panic / hard crashes AMDGPU
Summary: graphics/drm-515-kmod - 6750XT panic / hard crashes AMDGPU
Status: New
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: freebsd-x11 (Nobody)
URL:
Keywords: regression
Depends on:
Blocks:
 
Reported: 2023-10-28 13:26 UTC by Clay Ayers
Modified: 2024-03-21 20:41 UTC
CC List: 13 users

See Also:
bugzilla: maintainer-feedback? (x11)


Attachments
dmesg (83.35 KB, text/plain)
2023-10-28 13:26 UTC, Clay Ayers
Core Crash Dump (273.35 KB, text/plain)
2023-10-28 14:51 UTC, Clay Ayers
Latest Core Dump (241.82 KB, application/x-troff-man)
2023-11-05 13:18 UTC, Clay Ayers
Core Dump - Unplug Monitor (175.83 KB, application/x-troff-man)
2023-11-07 17:47 UTC, Clay Ayers
/var/log/messages for black screen freeze (621.19 KB, text/plain)
2024-01-29 12:21 UTC, illegalcoding

Description Clay Ayers 2023-10-28 13:26:42 UTC
Created attachment 245940
dmesg

FreeBSD 14.0-RC2, drm-515-kmod-5.15.118: ever since this version of drm-515-kmod, I've had full system lockups where my monitor goes off and does not come back up. I've also had it reboot my system. I was not doing anything but running X Window with cwm and picom, just idling. My Xorg.log has no relevant information, so what logs do I need to share for this?
Comment 1 Clay Ayers 2023-10-28 14:51:29 UTC
Created attachment 245946
Core Crash Dump

I turned on core dumps, and here is what happened: I was using Firefox and the system restarted.
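For anyone who wants to capture the same kind of crash report, here is a minimal sketch of the /etc/rc.conf settings that typically enable kernel crash dumps on FreeBSD, assuming a swap partition large enough to hold the dump. After the next panic, savecore(8) copies the dump into dumpdir at boot, and crashinfo(8) usually writes a text summary (core.txt.N) next to it; that summary is the kind of file attached above.

```shell
# /etc/rc.conf fragment (sketch): dump kernel memory to swap on panic.
dumpdev="AUTO"          # use the configured swap device as the dump device
dumpdir="/var/crash"    # where savecore(8) saves vmcore.N and core.txt.N at boot
```

The same dump device setting can be applied with `sysrc dumpdev=AUTO`; it takes effect on the next boot or after `service dumpon start`.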
Comment 2 Jan Beich freebsd_committer freebsd_triage 2023-10-28 15:03:33 UTC
(In reply to Clay Ayers from comment #0)
> drm-515-kmod-5.15.118 ever since this version

Do you mean drm-515-kmod before ports e14404ac73e7 works fine? If so, add "regression" to the Keywords field in Bugzilla.
Comment 3 Clay Ayers 2023-10-28 18:23:45 UTC
(In reply to Jan Beich from comment #2)
I mean this one: e97a9e1fca316c785ecd436a51d1a1bd026ed23a

https://cgit.freebsd.org/ports/commit/?id=e97a9e1fca316c785ecd436a51d1a1bd026ed23a

It was working fine before the update (freebsd-drm-kmod-5.15.118-drm_v5.15.118_2_GH0.tar.gz) that I got from a pkg upgrade.

Do I still need to add regression?
Comment 4 Clay Ayers 2023-11-01 02:33:14 UTC
So far I haven't had the crash since disabling Picom. I will test some 3D applications like Quake and Blender to see whether they crash it or whether it's just Picom causing the crash.
Comment 5 Vladimir Kondratyev freebsd_committer freebsd_triage 2023-11-01 11:44:52 UTC
You may try drm-kmod 5.15 with Ubuntu 20 drm backports and dma-buf reverted to GPL code: https://github.com/wulf7/drm-kmod.git branch 5.15-lts-focal.
It is quite stable on a bunch of green_sardine and yellow_carp laptops.
Comment 6 Clay Ayers 2023-11-05 13:17:33 UTC
So this looks like a bug in graphics/drm-515-kmod triggered by Picom, because it only happens when I'm running Picom. I have not been able to make the crash happen without Picom. Using Blender and 3D games, FreeBSD is stable. With Picom running, after an unpredictable amount of time or usage, it hard crashes and resets my FreeBSD system. I am unsure how to get good logs for this.
Comment 7 Clay Ayers 2023-11-05 13:18:31 UTC
Created attachment 246139
Latest Core Dump

Latest core dump on RC4; as far as I know, running Picom was what crashed it.
Comment 8 Clay Ayers 2023-11-07 17:47:09 UTC
Created attachment 246182
Core Dump - Unplug Monitor

I unplugged my Wacom One monitor while in console mode (I was not running X), and it restarted my system. So it looks like this is not a Picom problem after all. From what I can tell, the crash dump points to drm-515-kmod.
Comment 9 Clay Ayers 2023-11-07 17:52:46 UTC
Cross reported here https://github.com/freebsd/drm-kmod/issues/263
Comment 10 Clay Ayers 2023-11-08 18:08:10 UTC
Resizing Luakit also crashes my system. Do I need to post more of my core dumps?
Comment 11 Clay Ayers 2023-11-18 19:27:11 UTC
(In reply to Vladimir Kondratyev from comment #5)
I am now testing the 5.15-lts branch. Thank you for the suggestion. I manually replaced my .ko files after compiling; I hope that works.
Comment 12 Clay Ayers 2023-11-20 20:59:31 UTC
After doing what Vladimir suggested and installing the 515-lts branch, I can report stability for 2 solid days now with no crashes. 

This:
https://github.com/wulf7/drm-kmod/tree/5.15-lts

Is what I am referring to.

It has solved this problem for now, but how can I identify what broke it in the current port?
Comment 13 Vladimir Kondratyev freebsd_committer freebsd_triage 2023-11-22 08:57:48 UTC
(In reply to Clay Ayers from comment #12)

You are running the wrong branch. It has a memory leak that hides one known dma-buf bug at the cost of several gigabytes of lost memory per day.

The right branch is: https://github.com/wulf7/drm-kmod/tree/5.15-lts-focal
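A minimal sketch of fetching and building that branch, assuming (as is usual for out-of-tree FreeBSD kernel modules) that the kernel sources are installed under /usr/src; the check for sys/conf/kmod.mk and the default clone directory are assumptions for illustration, not part of the branch's own instructions.

```shell
#!/bin/sh
# Sketch: build the 5.15-lts-focal branch of wulf7/drm-kmod.
# Assumption: building needs the FreeBSD kernel sources, detected here
# by the presence of sys/conf/kmod.mk under the source root.

REPO="https://github.com/wulf7/drm-kmod.git"
BRANCH="5.15-lts-focal"

# Succeeds if the source tree at $1 (default /usr/src) looks usable.
have_kernel_src() {
    [ -f "${1:-/usr/src}/sys/conf/kmod.mk" ]
}

# Clones the branch into $1 (hypothetical default ./drm-kmod) and builds it.
build_drm_kmod() {
    dir="${1:-./drm-kmod}"
    if ! have_kernel_src; then
        echo "kernel sources not found under /usr/src" >&2
        return 1
    fi
    git clone --depth 1 -b "$BRANCH" "$REPO" "$dir" || return 1
    (cd "$dir" && make -j "$(sysctl -n hw.ncpu)")
}
```

Running `build_drm_kmod` leaves the resulting .ko files inside the clone, as in the `find` output shown in comment #29.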

It also has a lot of external monitor support fixes for recent amdgpu display cores backported through Ubuntu 20.

Unfortunately, I am not able to reproduce the aforementioned bug, so this branch fixes it by replacing the BSD-licensed part of dma-buf with GPL'd code.
Comment 14 Marek Krzywdziński 2023-11-22 16:59:42 UTC
I can trigger a hard reboot (crash) every time with an AMD Radeon RX 580 and X11 with two screens connected via DisplayPort and HDMI. The system is stable with only a single screen connected to either DP or HDMI. By the way, in this setup the screen on DP has the higher resolution.
Comment 15 Stefan Schwarzer 2023-11-28 17:53:40 UTC
I can confirm that the problem in comment #14 is fixed by the driver from comment #13. I had the same problem (6800 XT), and it has completely vanished now.
Comment 16 Clay Ayers 2024-01-15 02:30:41 UTC
(In reply to Stefan Schwarzer from comment #15)
I can confirm this as well. It's been pretty stable for me, although I suspect a memory leak, as I think I was getting lockups after leaving X11 on for days. At any rate, I'm testing the new packages that just dropped today: gpu-firmware-amd-kmod-navy-flounder-20230625.pkg
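To check a suspected leak like the one described in comment #13, one option is to log wired kernel memory periodically and see whether it grows steadily while X11 sits idle. A minimal sketch, assuming the leak shows up as wired memory (an assumption, not something confirmed in this thread), using the vm.stats.vm.v_wire_count and hw.pagesize sysctls; the log path is hypothetical.

```shell
#!/bin/sh
# Converts a page count ($1) and a page size in bytes ($2) to MiB.
pages_to_mib() {
    echo $(( $1 * $2 / 1048576 ))
}

# Appends a timestamped wired-memory reading every $1 seconds (default 600)
# to $2 (hypothetical default /var/tmp/wired.log). Runs until interrupted.
log_wired() {
    interval="${1:-600}"
    log="${2:-/var/tmp/wired.log}"
    pagesize=$(sysctl -n hw.pagesize)
    while :; do
        wired=$(sysctl -n vm.stats.vm.v_wire_count)
        printf '%s %s MiB\n' "$(date '+%F %T')" \
            "$(pages_to_mib "$wired" "$pagesize")" >> "$log"
        sleep "$interval"
    done
}
```

A steadily climbing figure over a day of idling would be consistent with a leak; a flat one would point elsewhere.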
Comment 17 Stefan Schwarzer 2024-01-17 06:51:09 UTC
(In reply to Clay Ayers from comment #16)
I'm testing that too; for now it seems to be stable. Usually I got crashes within one hour, but it was stable for the whole day yesterday.
Comment 18 illegalcoding 2024-01-29 12:21:58 UTC
Created attachment 248059
/var/log/messages for black screen freeze
Comment 19 illegalcoding 2024-01-29 12:26:23 UTC
I am facing this same issue (with an AMD RX 6600 and 1 DP and 1 HDMI monitor), and compiling the 515-lts-focal drm-kmod seems to have helped, but now I'm getting weird lock ups where both screens go black but the OS doesn't panic or anything. It just starts with everything freezing (but I can still move my cursor) and then eventually both my displays go black (I'm pretty sure sound still works though). I have to hard reset my PC to get it working again.

There's some interesting stuff I saw in my /var/log/messages, so I have attached that.
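When sharing /var/log/messages, it can help to pull out just the graphics-related kernel lines first. A minimal sketch that filters a log for lines mentioning drm or amdgpu, assuming the driver's messages carry those tags; the patterns are an illustration, not a list of known error strings for this bug.

```shell
#!/bin/sh
# Prints lines from log file $1 (default /var/log/messages) that mention
# drm or amdgpu, case-insensitively, prefixed with their line numbers.
gpu_log_lines() {
    grep -n -i -E 'drm|amdgpu' "${1:-/var/log/messages}"
}
```

The line numbers make it easy to point at the surrounding context in the full attachment.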

P.S:
It actually crashed while I was writing this too, so they seem to be getting more and more frequent :/
Comment 20 Alexander Vereeken 2024-02-08 06:17:32 UTC
Hello,

I can confirm with an RX 6700 XT as well that the panics are gone when using the branch https://github.com/wulf7/drm-kmod/tree/5.15-lts-focal.

Are there plans for something like that to get pulled into the main tree?
Comment 21 Vladimir Kondratyev freebsd_committer freebsd_triage 2024-02-09 21:55:00 UTC
(In reply to Alexander Vereeken from comment #20)

> Are there plans for something like that to get pulled into the main tree?
Yes. I'll open a merge request soon.
Comment 22 Alexander Vereeken 2024-02-10 10:57:24 UTC
(In reply to Vladimir Kondratyev from comment #21)

Nice, thank you!
Comment 23 Vladimir Kondratyev freebsd_committer freebsd_triage 2024-02-11 13:11:19 UTC
(In reply to Alexander Vereeken from comment #22)

>Nice, thank you!
See: https://github.com/freebsd/drm-kmod/pull/285

Requires some testing.
Comment 24 Andrew 2024-02-18 23:43:18 UTC
I have problems with a Radeon RX 6600 after recent updates. I run FreeBSD 14.0-RELEASE-p5 and the latest packages. Before the updates, the card worked just fine. Now startx gives me "Segmentation fault at address 0x2".

Does it make sense for me to try https://github.com/wulf7/drm-kmod/tree/5.15-lts-focal? If yes what exactly do I need to build and how to deploy the new binaries?
Comment 25 Yasushi Hayakawa 2024-02-21 15:19:11 UTC
(In reply to Andrew from comment #24)

I had the same problem with RX6600M on 14.0-RELEASE. In my case it was caused by mesa-devel, not drm-kmod. After deinstallation of mesa-devel package, Xserver works fine again using 5.15-lts-focal.
Comment 26 Alexander Vereeken 2024-02-21 20:41:14 UTC
Hello,

(In reply to Yasushi Hayakawa from comment #25)

If you mean the problem at startup, then it's not related to drm-kmod.

I reported this problem (PR 271136), but it was never properly looked into.

However, the quick sledgehammer fix is:

rm -r -d /usr/local/lib/dri && cp -r '/usr/local/lib/dri-devel' '/usr/local/lib/dri'
Comment 27 Alexander Vereeken 2024-02-21 20:43:21 UTC
(In reply to Alexander Vereeken from comment #26)

To revert this: pkg install -f mesa-dri
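A somewhat safer variant of the workaround in comment #26 keeps the original directory instead of deleting it, so it can be put back without reinstalling. A minimal sketch, assuming mesa-devel has installed its drivers under dri-devel as described above; the function names, the library-directory parameter and the dri.orig backup name are hypothetical. Reinstalling with `pkg install -f mesa-dri` remains the revert path given in comment #27.

```shell
#!/bin/sh
# Swaps in the mesa-devel DRI drivers under $1 (default /usr/local/lib),
# keeping the original drivers in dri.orig.
use_dri_devel() {
    lib="${1:-/usr/local/lib}"
    [ -d "$lib/dri-devel" ] || { echo "no $lib/dri-devel" >&2; return 1; }
    [ -e "$lib/dri.orig" ] && { echo "$lib/dri.orig already exists" >&2; return 1; }
    mv "$lib/dri" "$lib/dri.orig" || return 1
    cp -R "$lib/dri-devel" "$lib/dri"
}

# Puts the original drivers back from dri.orig.
restore_dri() {
    lib="${1:-/usr/local/lib}"
    [ -d "$lib/dri.orig" ] || { echo "no backup in $lib/dri.orig" >&2; return 1; }
    rm -rf "$lib/dri"
    mv "$lib/dri.orig" "$lib/dri"
}
```

Both functions refuse to run when their preconditions are missing, so a second call cannot overwrite the saved copy.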
Comment 28 Yasushi Hayakawa 2024-02-22 00:10:30 UTC
(In reply to Alexander Vereeken from comment #26)

Thank you for your suggestion. 
I have no problem with stable mesa-dri package, and I can use 32-bit applications on recent WINE 8.02 package without mesa-devel.
Comment 29 Andrew 2024-02-22 18:40:28 UTC
(In reply to Yasushi Hayakawa from comment #25)
How exactly can I try the 515-lts-focal branch? I cloned the repo, switched to the branch, built from the root and got a bunch of *.ko files:

andrew@obama:drm-kmod$ find . -name \*.ko
./ttm/ttm.ko
./i915/i915kms.ko
./dmabuf/dmabuf.ko
./amd/amdgpu/amdgpu.ko
./radeon/radeonkms.ko
./drm/drm.ko

Do I just copy them over to /boot/modules? What about a bunch of amdgpu_dimgrey_cavefish*.ko files?
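One way to deploy the modules built above is to copy them into /boot/modules, where the drm-kmod ports install their kernel modules, after backing up whatever is already there. A minimal sketch, assuming that location and that the build tree is the current directory; the function name and backup directory are hypothetical. The firmware modules (amdgpu_dimgrey_cavefish*.ko and similar) are not built from this tree and are left alone.

```shell
#!/bin/sh
# Copies every *.ko under build tree $1 (default .) into $2
# (default /boot/modules), saving any module it overwrites into $2/backup.
install_kmods() {
    src="${1:-.}"
    dest="${2:-/boot/modules}"
    mkdir -p "$dest/backup" || return 1
    find "$src" -name '*.ko' | while read -r ko; do
        name=$(basename "$ko")
        if [ -f "$dest/$name" ]; then
            cp -p "$dest/$name" "$dest/backup/$name" || exit 1
        fi
        install -m 555 "$ko" "$dest/$name" || exit 1
    done
}
```

After copying, running `kldxref /boot/modules` refreshes the module hints, and `kldload amdgpu` (or `kld_list="amdgpu"` in /etc/rc.conf) loads the new driver.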
Comment 30 Andrew 2024-02-22 23:40:06 UTC
(In reply to Andrew from comment #29)
Or do I replace /usr/ports/graphics/drm-515-kmod with files from https://github.com/wulf7/drm-kmod/tree/5.15-lts-focal and use regular ports commands to build and install?
Comment 31 Andrew 2024-02-23 00:00:33 UTC
(In reply to Andrew from comment #30)
Tried:

make
make install

Got the same error from X11:

...
[    41.095] (II) LoadModule: "amdgpu"
[    41.095] (WW) Warning, couldn't open module amdgpu
[    41.095] (EE) Failed to load module "amdgpu" (module does not exist, 0)
[    41.095] (EE) No drivers available.
...

For comparison, I tried reinstalling the official port graphics/drm-515-kmod: same error.
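These log lines come from Xorg failing to load its own amdgpu driver module (amdgpu_drv.so), which is separate from the amdgpu.ko kernel module that drm-515-kmod provides; that may explain why reinstalling the port made no difference. A minimal sketch to check for that file, assuming the default Xorg module path on FreeBSD; if it is missing, the x11-drivers/xf86-video-amdgpu package is the usual provider.

```shell
#!/bin/sh
# Succeeds if the Xorg amdgpu DDX driver exists under module path $1
# (default /usr/local/lib/xorg/modules).
have_xorg_amdgpu() {
    [ -f "${1:-/usr/local/lib/xorg/modules}/drivers/amdgpu_drv.so" ]
}
```

Usage: `have_xorg_amdgpu || pkg install xf86-video-amdgpu`.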
Comment 32 Son Phan Trung 2024-02-27 12:35:48 UTC
(In reply to Andrew from comment #29)

> What about a bunch of amdgpu_dimgrey_cavefish*.ko files?

Well, those files are handled separately from the main drm package: the gpu-firmware-kmod package provides them (specifically, the gpu-firmware-amd-kmod-dimgrey-cavefish package).
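The firmware package name appears to follow the chip codename with underscores turned into hyphens: the amdgpu_dimgrey_cavefish*.ko files come from gpu-firmware-amd-kmod-dimgrey-cavefish, and comment #16 mentions gpu-firmware-amd-kmod-navy-flounder. A minimal sketch of that mapping, assuming the pattern holds for other chips (not verified for every package); the function name is hypothetical.

```shell
#!/bin/sh
# Maps an amdgpu firmware codename such as dimgrey_cavefish to the
# package name that, by the naming pattern assumed here, provides it.
fw_pkg() {
    printf 'gpu-firmware-amd-kmod-%s\n' "$(printf '%s' "$1" | tr '_' '-')"
}
```

Usage: `pkg install "$(fw_pkg dimgrey_cavefish)"`.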
Comment 33 Alexander Vereeken 2024-03-14 19:02:26 UTC
(In reply to Vladimir Kondratyev from comment #23)

I can confirm that this branch is stable.

No more kernel panics since I started using this branch.
Comment 34 Spíosra 2024-03-21 20:41:10 UTC
Experiencing similar issues with my AMD RX 6800 XT. The system will suddenly reboot. It seems more predictable when WebRTC-related software is involved (Discord or Google Meet on Firefox, Chromium, etc.). No interesting logs. Very curious.

Neither 5.15-lts-focal nor downgrading drm_kmod to 5.10 fixes it for me.