Bug 253801 - graphics/drm-fbsd13-kmod panic when resuming from sleep
Status: Open
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: amd64 Any
Importance: --- Affects Many People
Assignee: freebsd-x11 (Nobody)
URL:
Keywords: crash, i915, panic
Depends on:
Blocks:
 
Reported: 2021-02-23 17:39 UTC by Patricio Villar
Modified: 2021-07-22 02:10 UTC
CC List: 6 users

See Also:
bugzilla: maintainer-feedback? (x11)


Attachments
photo showing interesting console errors (178.31 KB, image/jpeg)
2021-02-23 17:39 UTC, Patricio Villar

Description Patricio Villar 2021-02-23 17:39:50 UTC
Created attachment 222765
photo showing interesting console errors

OS: FreeBSD 13.0-BETA3
Graphics: Intel HD Graphics 5500
Driver: modesetting

Sometimes, graphics won't work after waking up from suspend. It shows:
"drmn0: Failed to idle engines, declaring wedged!"
Note this doesn't cause a panic-reboot.

Other times, it seems to resume just fine, but triggers a panic after a few seconds of use. It either logs me out of my running session, or instantly reboots my system.

These issues don't happen all the time. Actually, I'm not sure how to consistently reproduce them :(
But as far as I can tell, it's been happening since at least 13.0-ALPHA3.

Possibly a duplicate of:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=253797
Comment 1 Dave Cottlehuber freebsd_committer 2021-03-06 08:38:42 UTC
I see a similar issue here, except that it is 100% reproducible (0 successful resumes out of more than 11) on 13.0-BETA4. No visible panic, just a black screen from startup, but you can tell from the heat of the laptop that it's locked up :).

Intel HD Graphics 620 (Dell XPS 13)
- https://wiki.freebsd.org/Laptops/Dell_XPS13_9360 is the exact laptop
- dmesg, acpiconf, sysctl, loader.conf etc https://gist.github.com/dch/463caaaf723eabf84cf678b618b2d206

Tried with and without the following loader.conf tunables, and with the patch below:
# https://github.com/freebsd/drm-kmod/issues/14
hw.i915kms.enable_psr=0
# cpu/gpu power tuning
drm.i915.enable_rc6=7
kern.hz=100
debug.cpufreq.lowest=600
hint.p4tcc.0.disabled="1"
hint.acpi_throttle.0.disabled="1"
Comment 2 Patricio Villar 2021-03-14 17:38:21 UTC
Since 13.0-RC1 came out, this bug has only happened to me once. So at least I can say its occurrence has been greatly reduced.
Does this mean it is not really a drm-kmod bug but a base system issue? Given the drm port hasn't been updated/modified at all in the meantime...
Comment 3 Dave Cottlehuber freebsd_committer 2021-03-16 09:53:21 UTC
I still get 100% hangs, now on 13.0-RC2.

I've had to update my tunables, since they were from an earlier port version, but that makes no difference to the outcome.

hw.i915kms.enable_dc=0
hw.i915kms.reset=1
kern.hz=100
debug.cpufreq.lowest=600
hint.p4tcc.0.disabled=1
hint.acpi_throttle.0.disabled=1
Comment 4 Patricio Villar 2021-03-16 12:50:55 UTC
(In reply to Dave Cottlehuber from comment #3)
Mmm that's a shame. But I think yours is rather a different problem actually, different symptoms.
Comment 5 Patricio Villar 2021-04-02 14:44:59 UTC
13.0-RC2 and RC3 were nearly flawless and this issue became really hard to see.
Unfortunately, 13.0-RC4 brought a big regression in this regard. It now happens most of the time!

At least, I think I was able to understand a little bit more:
When resuming, if I see this console message: "drmn0: Failed to idle engines, declaring wedged!" then it completely fails to resume X.

Whereas, when the above line doesn't appear and this other line still shows: "i915 raw-wakerefs=3 wakelocks=3 on cleanup" then it looks like all is fine for a few seconds, but after a while, Xorg is killed and restarted all of a sudden and I'm right back at my display manager (SDDM).

Any ideas?
Comment 6 Patricio Villar 2021-04-05 19:33:05 UTC
Same with 13.0-RC5...
I'd really like to see this issue go away before 13.0-RELEASE, otherwise it will be a real disappointment to have to deal with this bug potentially till 13.1-RELEASE :(
Comment 7 Graham Perrin 2021-04-15 16:38:29 UTC
How are things with graphics/drm-fbsd13-kmod 5.4.92.g20210202 on 13.0-RELEASE?
Comment 8 Patricio Villar 2021-04-15 19:16:50 UTC
(In reply to Graham Perrin from comment #7)
Hi Graham, fortunately things have been working really great since I updated to 13.0-RELEASE!!
The reason I hadn't reported it yet was because I wanted to test some more to be really sure all was indeed fine.
So far so good!
Comment 9 Dave Cottlehuber freebsd_committer 2021-04-16 06:12:43 UTC
For me, I see no changes. 100% reliable on 12.2R (since 11.something even, IIRC). And on 13.0R no successful resumes at all.
Comment 10 Patricio Villar 2021-04-16 16:57:29 UTC
I'm afraid I spoke too soon :(

On FreeBSD 13.0-RELEASE, resuming from sleep is behaving the way it did around 13.0-RC2/RC3 for me. That is, it works most of the time, but still fails once in a while. Unlike 12.2-RELEASE, where it just works all of the time...

Anyway, I discovered that when I get the dreaded "drmn0: Failed to idle engines, declaring wedged!" message, I can:

kldunload i915kms; kldload i915kms; service sddm restart

to get back X, at the cost of losing all my unsaved open files and stuff. At least this way I avoid having to reboot my system.
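
In case it helps anyone, the manual recovery above could be wrapped in a small script. This is only a sketch under my own assumptions: the RECOVER_GPU switch and the function names are made up for illustration, and it chains the commands with && instead of ; so that a failed unload stops the sequence. Note that dmesg may still contain the message from an earlier resume, so a match does not prove the GPU is wedged right now.

```sh
#!/bin/sh
# Sketch only: detect the "wedged" console message and reload i915kms.

WEDGED_MSG='Failed to idle engines, declaring wedged!'

# Succeeds (exit status 0) if the text on stdin contains the wedged message.
is_wedged() {
    grep -qF "$WEDGED_MSG"
}

# Reload the driver and restart the display manager.
# This kills the running X session, so unsaved work is lost.
recover_gpu() {
    kldunload i915kms && kldload i915kms && service sddm restart
}

# RECOVER_GPU is a hypothetical opt-in switch: nothing happens unless it is 1.
if [ "${RECOVER_GPU:-0}" = 1 ]; then
    if dmesg | is_wedged; then
        recover_gpu
    fi
fi
```

It would have to run as root (for example, RECOVER_GPU=1 sh recover-gpu.sh from a virtual console), since loading and unloading kernel modules requires it.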
Comment 11 Alexey Dokuchaev freebsd_committer 2021-04-17 08:03:53 UTC
(In reply to Patricio Villar from comment #10)
> That is, [resuming from sleep] works most of the time, but still fails once
> in a while.  Unlike 12.2-RELEASE, where it just works all of the time...
Could it be that back when you were on 12.2, you also had a different version of the `graphics/gpu-firmware-kmod' port installed?

My i5-7200U-based (Kaby Lake, HD620) laptop stopped resuming reliably after a big DRM-related ports update which I did in January.  At first I suspected that the `graphics/drm-current-kmod' port was causing this (they've recently started to follow Linux 5.4 and I've seen people report a handful of regressions compared to 4.16), so I iteratively tried every port revision down to drm-current-kmod-4.16.g20200320 (r548207), which definitely worked before, but resume was still broken.  Then I downgraded the firmware port to gpu-firmware-kmod-g20200130 (r524664) and resume became reliable again.

It's kind of strange, as commit logs mention only changes related to AMD chips, but you might still wanna try to downgrade the firmware package and see if it makes a difference.
Comment 12 Patricio Villar 2021-04-17 15:15:43 UTC
Thank you, but just downgrading gpu-firmware-kmod didn't make a difference for me (I kept drm-kmod 5.4.92.g20210202).
As I said, on 13.0-RELEASE it's definitely harder to trigger this panic compared to other 13.0-* builds, but it still happens from time to time. It's not 100% reliable, unlike the previous major release.
Comment 13 Patricio Villar 2021-04-17 15:34:59 UTC
Oh I forgot to add that I usually do around 5-6 suspend/resume cycles for it to happen BTW
Comment 14 Patricio Villar 2021-05-03 22:08:31 UTC
UPDATE: I think I finally made some progress toward working around this problem:

I used to have a little devd script that locked my screen when resuming from sleep. It never did any harm on FreeBSD 11/12 so I didn't think it could be problematic now, but it turns out it may have been the culprit... After all, something definitely changed on 13.0.

What I did was modify the moment when this script was executed: instead of becoming active at resume, it is now triggered when closing the lid (which means, before suspend).

I don't want to assume this is the ultimate solution or something, especially since so many assumptions have been made to this point, but so far it REALLY seems to have fixed this issue for me.

I've had 20+ successful and uninterrupted (that is, without shutting down or rebooting in between) suspend/resume cycles without any issue. And I've really tried to make it difficult to succeed by having multiple programs running at the same time and stuff like that.

I don't know if some of you too have some kind of custom scripts that get executed at resume time, but if that's the case, I encourage you to switch them to run while going to sleep instead, or even drop them altogether, and see if it makes a change for you.
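
For reference, a devd rule that fires on lid close rather than on resume might look roughly like the fragment below. This is a sketch, not the actual rule from my system: the file name and the lock script path are hypothetical, and the notify value for "lid closed" is my understanding of how the ACPI lid driver reports it.

```
# /usr/local/etc/devd/lock-on-lid.conf (hypothetical file name)
# Run a lock script when the lid closes, that is, before suspend.
notify 10 {
	match "system"		"ACPI";
	match "subsystem"	"Lid";
	# Assumed: 0x00 means lid closed, 0x01 means lid opened.
	match "notify"		"0x00";
	# Hypothetical script; devd runs actions as root, so the script has to
	# arrange for the locker to run inside the user's X session.
	action "/usr/local/bin/lock-session.sh";
};
```

devd has to be restarted (service devd restart) to pick up a new rule. A rule matching subsystem "Resume" instead would be the resume-time variant that seemed to cause trouble here.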
Comment 15 Maxnix 2021-06-27 16:48:12 UTC
Hello all. I'd like to report my experience as well:

OS: FreeBSD 13.0-RELEASE
Graphics: Intel HD Graphics 4600
Driver: Modesetting

Initially, my Xorg session crashed whenever my conky+dzen2 bar was running.

While investigating the problem, I noticed that Xorg also crashed when suspending with Firefox running, or with VLC playing a video (even if paused).

The common factor among dzen, Firefox and VLC seemed to be OpenGL, so I tried these steps to test and work around the problem:

1. Since I use a script to lock my session at suspend too (launched by /etc/rc.suspend), I added an if statement that terminates dzen2 before suspending if it is running. The script that locks my session launches it again after the resume and unlock.

2. Disabled Firefox hardware acceleration.

3. Tried to suspend while VLC was playing a video using the XCB (NOT XCB-X11) output.

This has led to successful suspend/resume cycles so far, so OpenGL use by a running program seems to be the culprit of the crash.
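
The if statement from step 1 could be sketched like this. It is an illustration rather than my exact script: the function name is made up, and it assumes pgrep/pkill with -x (exact process name match) are available, as they are in the FreeBSD base system.

```sh
#!/bin/sh
# Sketch: terminate a program before suspend, but only if it is running.

# stop_if_running NAME: send SIGTERM to processes named exactly NAME, if any.
stop_if_running() {
    if pgrep -x "$1" >/dev/null 2>&1; then
        pkill -x "$1"
    fi
}

# Stop every program named on the command line; with no arguments, do nothing.
for prog in "$@"; do
    stop_if_running "$prog"
done
```

Called as "sh stop-before-suspend.sh dzen2" (hypothetical file name) from the script that /etc/rc.suspend launches, it would stop the bar before the machine goes to sleep; restarting it after unlock is left to the lock script.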

HTH