Bug 263632 - framework laptop crashes a few seconds after resuming from sleep
Summary: framework laptop crashes a few seconds after resuming from sleep
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-04-28 16:19 UTC by Jonathan Vasquez
Modified: 2022-05-18 19:11 UTC

See Also:


Attachments
last dmesg (19.76 KB, text/plain)
2022-04-28 16:20 UTC, Jonathan Vasquez
recent dmesg (85.47 KB, text/plain)
2022-05-06 20:10 UTC, Jonathan Vasquez
dump3 (21.45 KB, text/plain)
2022-05-18 19:05 UTC, Jonathan Vasquez
graphics-1.jpg (991.54 KB, image/jpeg)
2022-05-18 19:10 UTC, Jonathan Vasquez

Description Jonathan Vasquez 2022-04-28 16:19:09 UTC
Hey all,

It seems that my Framework laptop crashes after it resumes from sleep. I'll land in either the SDDM window or the Plasma desktop (depending on how fast I can type my password lol), and then a few seconds later it will crash (freeze completely).

I'm running FreeBSD 14.0-CURRENT #0 main-n255077-490a0f77de7. My /etc/rc.conf does have the blocklist for ng_ubt mentioned here: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=260161

My system is also running ZFS so /var/crash reports are turned off, but the dmesg output (and /var/log/messages) seems to have some output that may help. It might be related to the i915 (drm-devel-kmod-5.7.19.g20220223) driver rather than to Bluetooth.

- Jonathan
Comment 1 Jonathan Vasquez 2022-04-28 16:20:49 UTC
Created attachment 233563 [details]
last dmesg

Lines 242 - 318 may be relevant in the 'last dmesg' attachment.
Comment 2 Jonathan Vasquez 2022-04-28 16:21:37 UTC
Current: /etc/rc.conf

hostname="leslie"
dumpdev="NO"
zfs_enable="YES"

## [Start Other]

#wlans_rtwn0="wlan0"
#ifconfig_wlan0="inet 192.168.1.101 netmask 255.255.255.0 ssid Summerland WPA"
#ifconfig_wlan0="WPA SYNCDHCP"
#defaultrouter="192.168.1.1"

## [End Other]

kld_list="i915kms if_iwlwifi"
devmatch_blocklist="ng_ubt"

ifconfig_ue0="DHCP"

wlans_iwlwifi0="wlan0"
ifconfig_wlan0="WPA DHCP"
create_args_wlan0="regdomain FCC country US"

dbus_enable="YES"
sddm_enable="YES"
syncthing_enable="YES"

# Syncthing
syncthing_user="jon"
syncthing_group="jon"

webcamd_enable="YES"
Comment 3 Mark Johnston freebsd_committer 2022-04-28 16:27:42 UTC
Does the system fully crash, or is the display just frozen?  Can you ssh into the system while it's in that state?

> My /etc/rc.conf does have the blocklist for ng_ubt mentioned here

I think you don't need that anymore, but it shouldn't matter.
Comment 4 Graham Perrin 2022-04-28 17:08:09 UTC
(In reply to Jonathan Vasquez from comment #0)

> 490a0f77de7

Was drm-devel-kmod-5.7.19.g20220223 built (and installed) whilst running that version? Or installed as a package from the latest repo?
Comment 5 Jonathan Vasquez 2022-04-28 17:17:34 UTC
Hey Mark, Graham,

@Mark

Thanks for that. I tried to ssh into the system but it was completely down. (I connected beforehand and verified I was able to actually ssh into the system, then put it to sleep, resumed, and wasn't able to connect at all. It probably would have taken a few seconds for the network to come back up, but by the time it would have been available, the system had already crashed.)

Also sounds good about the "ng_ubt" situation, that makes sense since I didn't notice any USB enumeration delays necessarily during my boots recently so this issue would have been fixed a while ago when it was first reported.

@Graham

Funny you mentioned that. So I installed this system a few times over the past few days on 13.0-RELEASE, 13.1-RC4, and a few times on a FreeBSD 14-CURRENT snapshot (FreeBSD-14.0-CURRENT-amd64-20220421-b91a48693a5-254961) and when I had tried to do a `pkg install drm-devel-kmod`. A few days ago when I was on 14-CURRENT and installed drm-devel-kmod, it worked fine. However when I reinstalled it from the same snapshot on 14-CURRENT, the drm-devel-kmod package wouldn't install anymore, I believe it complained about the version number being too new (FreeBSD version specifically, I think it was incremented by 1). At that point I decided it probably would be better for me to just check out the latest CURRENT sources and rebuild world/kernel. Once I did that, I built drm-devel-kmod from source, and that's what I'm using atm :).
Comment 6 Jonathan Vasquez 2022-04-28 17:19:38 UTC
@Graham

Arg.. typo.

".. (FreeBSD-14.0-CURRENT-amd64-20220421-b91a48693a5-254961) and when I had tried to do a `pkg install drm-devel-kmod` it worked fine. However when I reinstalled it from the same snapshot on 14-CURRENT" .. **
Comment 7 Jonathan Vasquez 2022-04-28 20:09:11 UTC
Some minor updates. I originally said that I have dumps disabled since this system is running ZFS (and I remember there was an issue with ZFS + crash dumps). Checking my server's config, I noticed I had written a comment about this: I disabled crash dumps because they caused issues when using a mirrored swap configuration on ZFS. Since my laptop isn't using mirrored swap, I re-enabled them (dumpdev="YES", rather than "AUTO" or "NO").

The first few times I put the computer to sleep and then woke it, the computer didn't actually crash. I then switched dumpdev back to NO and the computer crashed within seconds. After switching it back to YES it started crashing again, so unfortunately that didn't lead to anything.

However, I then left dumpdev enabled and tried to get some crash dumps from the sleep/wake cycle. Since I originally installed FreeBSD with only 16 GB of physical swap space, and this machine has 32 GB of RAM, I received messages saying that there is no suitable dump device and that savecore is not run. To mitigate this, I tried to just create a 64 GB swap file, but that didn't work either. Furthermore, trying to do a 'dumpon /dev/md0' yielded the following:

root@leslie:~ # dumpon -l
/dev/null
root@leslie:~ # dumpon /dev/md0
dumpon: ioctl(DIOCSKERNELDUMP): Operation not supported

I suppose this is related to swapfiles not being supported for crash dumps?

https://forums.freebsd.org/threads/how-to-enable-crash-dumps.84340/#post-559174

SirDice also mentioned that swap files probably can't be used for this. Since it was pretty time consuming to get my system up to this point, I don't want to do the work to reinstall the entire system just to realign the physical swap.. eventually I'll need to do it if I want to properly test out FreeBSD's hibernate support however.

To finish off, the few times that I did wake up the system and it didn't immediately crash, I did notice the following lines in dmesg, maybe this helps in some way:

drmn0: GPU HANG: ecode 12:1:85dffffb, in MainThread [100939]
drmn0: Resetting rcs0 for stopped heartbeat on rcs0
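
For anyone collecting these messages, here is a minimal sketch of a filter that pulls only the i915 hang and reset lines out of a kernel log. The function name `gpu_hang_lines` is hypothetical, and the patterns are simply taken from the messages quoted in this report, so other hang-related output may be missed.

```shell
# Hypothetical helper: read a kernel log on stdin and print only the
# drm hang/reset lines. Patterns mirror the messages quoted in this report.
gpu_hang_lines() {
    grep -E 'drmn[0-9]+: (GPU HANG|Resetting .* for stopped heartbeat|.* context reset due to GPU hang)'
}
```

Typical use would be `dmesg | gpu_hang_lines` or `gpu_hang_lines < /var/log/messages`.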
Comment 8 Jonathan Vasquez 2022-04-29 21:31:34 UTC
I decided to spend some time today to reinstall the system and have a proper physical swap partition so that we can get the core dumps (I wanted to do some other changes to my core partition layout anyways). I backed up my /usr/src and other relevant dirs/configs so that I could quickly get back up and running, this worked perfectly :). I'll be attempting to generate some core dumps for this and will post them most likely over the weekend.
Comment 9 Jonathan Vasquez 2022-05-02 18:53:35 UTC
Hey all, some minor updates regarding crash dumps after I successfully enabled core dump functionality. I had several crashes today (regarding graphics / wireless) and I checked to see if there were any dumps generated, and there were none. I also tried to force saving the core dumps, but that yielded nothing. If anyone has any ideas on how I can get the core dumps after the crashes, let me know and I can retry extracting them from my box. I'm thinking that "kern-dbg" is something that is already provided by me doing a full compilation of CURRENT for world/kernel, where the kernel already has debugging symbols enabled by default. When I get a chance, I'll try to read a bit more about kernel debugging in the handbook and see if that has any revelations.
Comment 10 Jonathan Vasquez 2022-05-04 03:57:01 UTC
Some interesting updates.

So it seems that it may not be the kernel that's crashing (yet) but the X server in combination with the kernel.. basically the behavior that I'm experiencing can be reproduced (non-deterministically but it will happen) if I do:

xrandr --output eDP-1 --scale 0.8

You can use any other number as well. As long as you cause frequent changes, it will eventually happen. Now something that's further interesting is that once it locks up, pressing keys or moving your mouse won't work. Attempting to switch to a virtual terminal also won't work, or at least it would seem like it didn't. The first interesting thing I noticed was that even though I wasn't able to do much after it locked up, if I pressed the power button, the system actually received the signal and I could see my tty1 again, and the system eventually shut down. So that would indicate that the kernel hasn't actually crashed. I also saw the following message (same as before but with another line):

drmn0: GPU HANG: ecode 12:1:85dffffb, in MainThread [100328]
drmn0: Resetting rcs0 for stopped heartbeat on rcs0
drmn0: Xorg[100328] context reset due to GPU hang

The second interesting thing was that once I got it locked again, I was actually able to - eventually - switch to a TTY. I just kept pressing Ctrl + Alt + F[1-4] until one of them got through to the system.

Lastly, after the system got locked, I managed to switch to a TTY and then kill X, and start another session immediately. Once I locked it for a second time in a row, my system was truly frozen and I had no chance of switching to a TTY and the power button also stopped getting its shutdown signal through to the kernel. I had to hard power off the machine. So it's interesting that a second "hard crash" managed to be worse than the first, as if there was some remnant state that managed to worsen things.
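
A rough sketch of how the "frequent changes" reproduction could be scripted. The helper name `stress_scale`, the `XRANDR` override variable, and the choice of alternating between 0.8 and 1.0 are all assumptions made for illustration; only the `xrandr --output eDP-1 --scale 0.8` invocation comes from the steps above.

```shell
# Command used to change the scale; override with XRANDR=echo for a dry run.
XRANDR="${XRANDR:-xrandr}"

# Hypothetical helper: issue COUNT scale changes on OUTPUT, alternating
# between 0.8 and 1.0, pausing DELAY seconds (default 1) between changes.
stress_scale() {
    output="$1"
    count="$2"
    delay="${3:-1}"
    i=0
    while [ "$i" -lt "$count" ]; do
        # Even iterations scale down to 0.8, odd ones return to 1.0.
        if [ $((i % 2)) -eq 0 ]; then scale=0.8; else scale=1.0; fi
        $XRANDR --output "$output" --scale "$scale" || return 1
        sleep "$delay"
        i=$((i + 1))
    done
}
```

For example, `stress_scale eDP-1 50` would issue 50 changes one second apart, and `XRANDR=echo stress_scale eDP-1 4 0` only prints the arguments that would be passed. Since the hang is non-deterministic, the count needed to trigger it is unknown.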
Comment 11 Jonathan Vasquez 2022-05-06 20:10:42 UTC
Created attachment 233776 [details]
recent dmesg

A few days ago I decided to give Wayland / sway a shot and I've noticed that the system has been a lot more stable on Wayland than on X11 (w/ i3 or Plasma). Specifically, I'm not getting as many i915 related crashes anymore. This prompted me to try putting the computer to sleep to see if the i915 errors we were getting on resume would be reduced or disappear. Unfortunately, even if Wayland/sway seem to be "surviving" the i915 driver (it would sometimes still hang and crash, but I would be able to either get out and restart sway, or sway would come back to life after a few seconds), the system still crashed a few seconds after resume. We still see the usual errors. I did get these errors now that I'm on Wayland, but they don't seem to be causing me any noticeable issues.

Attaching recent dmesg.
Comment 12 bkidney@briankidney.ca 2022-05-10 00:13:40 UTC
I have similar issues with my Framework Laptop. I have been able to trace it to the suspend/resume of the if_iwlwifi driver in my case. As far as I know, suspend/resume is not working for this driver and is on the list of "next steps" of the developer working on it.

The way I isolated this was to add the iwlwifi driver to the devmatch_blocklist after booting, then suspend and resume. Without this in rc.conf my system consistently freezes a few seconds after resume. With this added, the system has no problem resuming. It is a test, not a workaround. Right now, as far as I can tell, you cannot suspend/resume with working wifi with the Intel AX210.
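
A sketch of what such a test configuration might look like in /etc/rc.conf. It assumes the module name is if_iwlwifi (as in the kld_list from comment 2) and keeps the driver from loading at boot, which may differ slightly from adding the entry after booting as described above.

```shell
# /etc/rc.conf fragment for the isolation test (a test, not a workaround).
# Assumption: the module is named if_iwlwifi, as in the kld_list from comment 2.
kld_list="i915kms"                 # if_iwlwifi dropped so it is not loaded explicitly
devmatch_blocklist="if_iwlwifi"    # keep devmatch from auto-loading it
```

With this in place there is no wifi at all, so it only helps to confirm whether the freeze on resume follows the wireless driver.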
Comment 13 Jonathan Vasquez 2022-05-10 00:50:33 UTC
Hey Brian,

Thanks for reporting that feedback. If that's true then that opens up a bunch more possibilities. Specifically, maybe it isn't the graphics driver that's causing a full system hang (although we obviously see GPU HANG messages), but it would also mean that the wifi driver could be causing a panic just through general use (since I can't put the computer to sleep, I pretty much just turn it on, try not to stress out the graphics too much, and then turn it off). I sometimes would be using it and it would just lock up. That's just one possibility. I think it may actually be a combination of two bad bugs, one in each driver, since from what I reported before, stressing out the card (even in a light way) could trigger a lock up. Last night I was moving some windows around from my external monitor (I'm using i3 so moving it from that monitor's workspace) to the framework monitor, and then it locked up. It doesn't happen all the time but you really don't know when it's gonna happen. Also, things like restarting i3 a lot or using xrandr to change the scaling (try doing it a lot) will eventually lock up the system.
Comment 14 Jonathan Vasquez 2022-05-18 19:05:05 UTC
Some more updates. So I was able to fix my computer just generally crashing on X11 (with any DE, although I normally use i3 or KDE) by adding the following to my /usr/local/etc/X11/xorg.conf.d/10-video.conf:

```
Section "Device"
        Identifier  "Card0"
        Driver      "modesetting"
        BusID       "PCI:0:2:0"
        Option      "AccelMethod"       "SNA"
EndSection
```

Once I added that, I re-tested the resume behavior since as I said earlier, there are probably multiple bugs here (some in the graphics layer, and some in the wireless layer).

During my first three tests, I did 'acpiconf -s 3', waited about 30 seconds, 2 minutes, and 5 minutes respectively. None of these crashed. However, the wireless driver stopped working on the first resume. The good news is that I was able to get more data out of 'dmesg' regarding iwlwifi (attached is 'dump3').

During the third resume, I tried a 'service netif restart wlan0' to see if it would crash (given what we know about the old state), and yup, it crashed immediately. On a fresh reboot, I tried again my 'acpiconf -s 3' tests, and the system started to have non-deterministic behavior as follows:

1. The system resumed fine, didn't crash, but wifi wasn't working. Restarting the wifi driver actually yielded a working connection.

2. The system resumed, and crashed about 10 seconds later, this was a hard crash and the system actually spat out A LOT of output very quickly, ending with an error in the graphics driver (drm-510-kmod, attached as graphics-1.jpg). I was expecting a crash dump to be properly saved since I have the dumpdev properly set now (and even using a 32 GB physical swap :O), but when the system started up it didn't dump anything. No cores were found.

3. The system resumed, and immediately decided to shut down (as if the power button was pressed).

Should I open another bug report specifically targeting the graphics portion of this since it does seem we are running into two separate bugs (given this is primarily for the wireless crash on resume)?
Comment 15 Jonathan Vasquez 2022-05-18 19:05:20 UTC
Created attachment 234027 [details]
dump3
Comment 16 Jonathan Vasquez 2022-05-18 19:10:13 UTC
Created attachment 234028 [details]
graphics-1.jpg
Comment 17 Jonathan Vasquez 2022-05-18 19:11:18 UTC
The words under the light in graphics-1.jpg are "vpanic" and "panic". I'm also running the following now:

FreeBSD 13.1-STABLE #1 stable/13-n250869-2430388070f: Tue May 17 11:51:35 EDT 2022     root@leslie:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64