Bug 257786 - AMD GPU freeze with the new g20210330 gpu firmwares
Summary: AMD GPU freeze with the new g20210330 gpu firmwares
Status: New
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: Johannes M Dieterich
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-08-12 14:09 UTC by Ali Abdallah
Modified: 2021-10-06 10:18 UTC
CC List: 7 users

See Also:


Attachments
Attaching the relevant part of dmesg when the problem occurred. (26.32 KB, text/plain)
2021-08-12 14:09 UTC, Ali Abdallah
pciconf -lvb full output (11.69 KB, text/plain)
2021-08-12 14:31 UTC, Ali Abdallah
Update amdgpu firmwares to the latest (1.34 KB, patch)
2021-08-12 17:40 UTC, Jung-uk Kim
Update amdgpu firmwares to the latest (4.23 KB, patch)
2021-08-12 18:33 UTC, Jung-uk Kim
Update amdgpu firmwares to 20210716 (4.23 KB, patch)
2021-08-12 21:11 UTC, Jung-uk Kim

Description Ali Abdallah 2021-08-12 14:09:00 UTC
I recently switched my ports tree from 2021Q2 to 2021Q3. After updating
and rebooting my FreeBSD 13.0 system, I started to notice random system
freezes. I can still ssh into the frozen system, and dmesg shows:

---
Aug  4 08:58:51 Fryzen495 kernel: drmn0: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process  pid 100349 thread  pid 100349)
Aug  4 08:58:51 Fryzen495 kernel: drmn0:   in page starting at address 0x000080012c3f0000 from client 27
Aug  4 08:58:51 Fryzen495 kernel: drmn0: VM_L2_PROTECTION_FAULT_STATUS:0x00141051
Aug  4 08:58:51 Fryzen495 kernel: drmn0:      MORE_FAULTS: 0x1
Aug  4 08:58:51 Fryzen495 kernel: drmn0:      WALKER_ERROR: 0x0
Aug  4 08:58:51 Fryzen495 kernel: drmn0:      PERMISSION_FAULTS: 0x5
Aug  4 08:58:51 Fryzen495 kernel: drmn0:      MAPPING_ERROR: 0x0
Aug  4 08:58:51 Fryzen495 kernel: drmn0:      RW: 0x1
---
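For anyone trying to read the fault above: the MORE_FAULTS, WALKER_ERROR, PERMISSION_FAULTS, MAPPING_ERROR and RW lines are bit fields unpacked from the single VM_L2_PROTECTION_FAULT_STATUS word. The sketch below decodes that word by shift and mask. It is a minimal illustration, not part of drm-kmod; the helper name is hypothetical and the bit positions are an assumption taken from the Linux amdgpu GC 9.x register headers, so verify them there before relying on them. Applied to 0x00141051 it reproduces the values the driver printed (and vmid:1 from the first log line).

```python
# Hypothetical helper, not part of FreeBSD or drm-kmod.
# ASSUMPTION: bit positions follow the Linux amdgpu GC 9.x register
# headers for VM_L2_PROTECTION_FAULT_STATUS; check before relying on them.
FIELDS = {
    # name: (shift, width in bits)
    "MORE_FAULTS": (0, 1),
    "WALKER_ERROR": (1, 3),
    "PERMISSION_FAULTS": (4, 4),
    "MAPPING_ERROR": (8, 1),
    "CID": (9, 9),
    "RW": (18, 1),
    "ATOMIC": (19, 1),
    "VMID": (20, 4),
}


def decode_vm_l2_fault_status(status: int) -> dict:
    """Split a raw VM_L2_PROTECTION_FAULT_STATUS value into named fields."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, (shift, width) in FIELDS.items()
    }


if __name__ == "__main__":
    # Raw status value copied from the dmesg excerpt in this report.
    for name, value in decode_vm_l2_fault_status(0x00141051).items():
        print(f"{name}: {value:#x}")
```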

The only change between 2021Q2 and 2021Q3 that seemed relevant to me is
the newer GPU firmware, g20210330 versus g20210224. I downgraded to
g20210224, rebooted the system, and it has been running as stably as before.

My system is a ThinkPad T495 with a Picasso GPU. Please don't hesitate to ask for more information.
Comment 1 Ali Abdallah 2021-08-12 14:09:44 UTC
Created attachment 227129
Attaching the relevant part of dmesg when the problem occurred.
Comment 2 Ali Abdallah 2021-08-12 14:31:51 UTC
Created attachment 227131
pciconf -lvb full output
Comment 4 Jung-uk Kim freebsd_committer 2021-08-12 17:40:38 UTC
Created attachment 227136
Update amdgpu firmwares to the latest

Please try this patch.
Comment 5 Jung-uk Kim freebsd_committer 2021-08-12 18:33:52 UTC
Created attachment 227137
Update amdgpu firmwares to the latest

Sorry, the previous patch was incomplete.  Please try this instead.
Comment 6 Jung-uk Kim freebsd_committer 2021-08-12 18:39:34 UTC
(In reply to Jung-uk Kim from comment #5)
Actually, this firmware caused problems on my Picasso platform, i.e., a Ryzen 5 3500U, while the firmware in the ports tree works just fine. :-(
Comment 7 Jung-uk Kim freebsd_committer 2021-08-12 21:07:27 UTC
(In reply to Jung-uk Kim from comment #6)
I tried every amdgpu firmware release since 20210315; 20210511 and 20210716 worked fine for me.
Comment 8 Jung-uk Kim freebsd_committer 2021-08-12 21:11:40 UTC
Created attachment 227143
Update amdgpu firmwares to 20210716

This patch syncs the amdgpu firmwares with linux-firmware-20210716. Please try this too if possible.
Comment 9 Ali Abdallah 2021-08-13 06:37:58 UTC
I will give linux-firmware-20210716 a try and report back here.
Comment 10 Ali Abdallah 2021-08-25 15:33:32 UTC
I have been using linux-firmware-20210716 on my T495 since Thu Aug 19 18:05 without any issues so far.
Comment 11 Ali Abdallah 2021-09-01 09:42:06 UTC
(In reply to Ali Abdallah from comment #10)

I spoke too soon: today I had the same crash with linux-firmware-20210716, so I am going back to g20210224.
Comment 12 Evilham 2021-10-06 10:18:07 UTC
Hello, I have been having graphics-related crashes since April (at first I thought it was this: https://github.com/freebsd/drm-kmod/issues/78).

A couple of weeks ago I decided to try the patch in this PR that upgrades `graphics/drm-firmware-kmod` to 20210812, and since then the problem has been gone.

This is a Lenovo A485 with an AMD Ryzen 7 PRO 2700U w/ Radeon Vega Mobile Gfx.