Hey all,

It seems that my Framework laptop crashes after it resumes from sleep. I land in either the SDDM window or the Plasma desktop (depending on how fast I can type my password, lol), and a few seconds later the system freezes completely.

I'm running FreeBSD 14.0-CURRENT #0 main-n255077-490a0f77de7. My /etc/rc.conf does have the blocklist for ng_ubt mentioned here: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=260161

My system is also running ZFS, so crash dumps to /var/crash are turned off, but the dmesg output (and /var/log/messages) has some output that may help. The crash might be related to the i915 driver (drm-devel-kmod-5.7.19.g20220223) rather than to Bluetooth.

- Jonathan
Created attachment 233563 [details] last dmesg Lines 242 - 318 may be relevant in the 'last dmesg' attachment.
Current: /etc/rc.conf

hostname="leslie"
dumpdev="NO"
zfs_enable="YES"

## [Start Other]
#wlans_rtwn0="wlan0"
#ifconfig_wlan0="inet 192.168.1.101 netmask 255.255.255.0 ssid Summerland WPA"
#ifconfig_wlan0="WPA SYNCDHCP"
#defaultrouter="192.168.1.1"
## [End Other]

kld_list="i915kms if_iwlwifi"
devmatch_blocklist="ng_ubt"
ifconfig_ue0="DHCP"
wlans_iwlwifi0="wlan0"
ifconfig_wlan0="WPA DHCP"
create_args_wlan0="regdomain FCC country US"
dbus_enable="YES"
sddm_enable="YES"
syncthing_enable="YES"

# Syncthing
syncthing_user="jon"
syncthing_group="jon"
webcamd_enable="YES"
Does the system fully crash, or is the display just frozen? Can you ssh into the system while it's in that state?

> My /etc/rc.conf does have the blocklist for ng_ubt mentioned here

I think you don't need that anymore, but it shouldn't matter.
(In reply to Jonathan Vasquez from comment #0)
> 490a0f77de7

Was drm-devel-kmod-5.7.19.g20220223 built (and installed) whilst running that version? Or installed as a package from the latest repo?
Hey Mark, Graham,

@Mark Thanks for that. I tried to ssh into the system but it was completely down. (I connected beforehand and verified that I was able to ssh into the system, then put it to sleep, resumed, and wasn't able to connect at all. It probably would have taken a few seconds for the network to come back up, but by the time it would have been available, the system had already crashed.) Also sounds good about the "ng_ubt" situation. That makes sense, since I haven't noticed any USB enumeration delays during my recent boots, so this issue would have been fixed a while ago after it was first reported.

@Graham Funny you mentioned that. So I installed this system a few times over the past few days on 13.0-RELEASE, 13.1-RC4, and a few times on a FreeBSD 14-CURRENT snapshot (FreeBSD-14.0-CURRENT-amd64-20220421-b91a48693a5-254961) and when I had tried to do a `pkg install drm-devel-kmod`. A few days ago when I was on 14-CURRENT and installed drm-devel-kmod, it worked fine. However, when I reinstalled the system from the same snapshot on 14-CURRENT, the drm-devel-kmod package wouldn't install anymore. I believe it complained about the version number being too new (the FreeBSD version specifically, I think it was incremented by 1). At that point I decided it would probably be better to just check out the latest CURRENT sources and rebuild world/kernel. Once I did that, I built drm-devel-kmod from source, and that's what I'm using atm :).
@Graham Arg.. typo. ".. (FreeBSD-14.0-CURRENT-amd64-20220421-b91a48693a5-254961) and when I had tried to do a `pkg install drm-devel-kmod` it worked fine. However when I reinstalled it from the same snapshot on 14-CURRENT" .. **
Some minor updates. I originally said that I have dumps disabled since this system is running ZFS (and I remembered there being an issue with ZFS + crash dumps). Checking my server's config, I noticed I had written a comment about this: I disabled crash dumps because they caused issues when using a mirrored swap configuration on ZFS. Since my laptop isn't using mirrored swap, I re-enabled them (dumpdev="YES", rather than "AUTO" or "NO").

The first few times I put the computer to sleep and then woke it, the computer didn't actually crash. I then switched dumpdev back to NO and the computer crashed within seconds. After I switched it back to YES, it started crashing again, so unfortunately that didn't lead to anything.

However, I then left dumpdev enabled and tried to get some crash dumps from the sleep/wake cycle. Since I originally installed FreeBSD with only 16 GB of physical swap space, and this machine has 32 GB of memory, I received messages saying that there is no suitable dump device and that savecore was not run. To mitigate this, I tried to just create a 64 GB swap file but that didn't work either. Furthermore, trying to do a 'dumpon /dev/md0' yielded the following:

root@leslie:~ # dumpon -l
/dev/null
root@leslie:~ # dumpon /dev/md0
dumpon: ioctl(DIOCSKERNELDUMP): Operation not supported

I suppose this is related to swap files not being supported for crash dumps? https://forums.freebsd.org/threads/how-to-enable-crash-dumps.84340/#post-559174 SirDice also mentioned that swap files probably can't be used for this.

Since it was pretty time consuming to get my system up to this point, I don't want to do the work to reinstall the entire system just to realign the physical swap.. eventually I'll need to do it if I want to properly test out FreeBSD's hibernate support however.
To finish off, the few times that I did wake up the system and it didn't immediately crash, I did notice the following lines in dmesg, maybe this helps in some way:

drmn0: GPU HANG: ecode 12:1:85dffffb, in MainThread [100939]
drmn0: Resetting rcs0 for stopped heartbeat on rcs0
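For anyone following along, the dump device problem described above can be checked roughly as follows. This is a minimal sketch, assuming swap lives on a physical partition rather than a swap file or md(4) device. The commands are standard FreeBSD tools, but which device ends up being selected depends on the local swap layout, so treat it as a starting point rather than a verified recipe for this machine.

```shell
# Sketch only: assumes at least one physical swap partition is configured.

# Show the configured swap devices and their sizes.
swapinfo -h

# Let rc(8) pick a suitable swap device as the dump device at boot.
sysrc dumpdev="AUTO"

# Apply the setting now without rebooting, then confirm which device is active.
service dumpon restart
dumpon -l

# After a panic and reboot, check whether a dump is waiting to be saved.
savecore -C
```

If `dumpon -l` still prints /dev/null after this, no dump device was accepted and a panic cannot leave a core behind.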
I decided to spend some time today to reinstall the system and have a proper physical swap partition so that we can get the core dumps (I wanted to do some other changes to my core partition layout anyways). I backed up my /usr/src and other relevant dirs/configs so that I could quickly get back up and running, this worked perfectly :). I'll be attempting to generate some core dumps for this and will post them most likely over the weekend.
Hey all, some minor updates regarding crash dumps after I successfully enabled core dump functionality. I had several crashes today (regarding graphics / wireless) and I checked to see if there were any dumps generated, and there were none. I also tried to force saving the core dumps, but that yielded nothing. If anyone has any ideas about how I can get the core dumps after the crashes, let me know and I can retry extracting them from my box. I'm thinking that "kern-dbg" is something that is already provided by me doing a full compilation of CURRENT for world/kernel, where the kernel already has debugging symbols enabled by default. When I get a chance, I'll try to read a bit more about kernel debugging in the handbook and see if that has any revelations.
Some interesting updates. So it seems that it may not be the kernel that's crashing (yet) but the X server in combination with the kernel. Basically, the behavior that I'm experiencing can be reproduced (non-deterministically, but it will happen) if I do:

xrandr --output eDP-1 --scale 0.8

You can use any other number as well. As long as you cause frequent changes, it will eventually happen.

Now something that's further interesting is that once it locks up, pressing keys or moving your mouse won't work. Attempting to switch to a virtual terminal also won't work, or at least it would seem like it didn't.

The first interesting thing I noticed was that even though I wasn't able to do much after it locked up, if I pressed the power button, the system actually received the signal, I could see my tty1 again, and the system eventually shut down. So that would indicate that the kernel hasn't actually crashed. I also saw the following messages (same as before but with another line):

drmn0: GPU HANG: ecode 12:1:85dffffb, in MainThread [100328]
drmn0: Resetting rcs0 for stopped heartbeat on rcs0
drmn0: Xorg[100328] context reset due to GPU hang

The second interesting thing was that once I got it locked again, I was actually able to - eventually - switch to a TTY. I just kept pressing Ctrl + Alt + F[1-4] until one of them got through to the system.

Lastly, after the system got locked, I managed to switch to a TTY, kill X, and start another session immediately. Once I locked it for a second time in a row, my system was truly frozen: I had no chance of switching to a TTY, and the power button also stopped getting its shutdown signal through to the kernel. I had to hard power off the machine. So it's interesting that a second "hard crash" managed to be worse than the first, as if there was some remnant state that managed to worsen things.
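The "frequent changes" reproduction described above could be scripted along these lines. This is only a sketch: the function name `stress_scale`, the `OUTPUT` and `DRY_RUN` variables, the second scale value, and the one-second pause are all illustrative assumptions, and only the `xrandr --output eDP-1 --scale 0.8` call comes from the report itself.

```shell
#!/bin/sh
# Sketch of a reproducer: alternate the panel scale until the GPU hang shows up.
# Set DRY_RUN=1 to print the xrandr commands instead of running them.

stress_scale() {
    iterations=${1:-50}            # how many scale changes to issue
    output=${OUTPUT:-eDP-1}        # output name as reported by xrandr
    i=0
    while [ "$i" -lt "$iterations" ]; do
        # Alternate between two scale factors so every call changes the mode.
        if [ $((i % 2)) -eq 0 ]; then scale=0.8; else scale=1; fi
        if [ -n "${DRY_RUN:-}" ]; then
            echo "xrandr --output $output --scale $scale"
        else
            xrandr --output "$output" --scale "$scale"
            sleep 1
        fi
        i=$((i + 1))
    done
}
```

Running `DRY_RUN=1 stress_scale 4` first shows what would be executed; `stress_scale 100` from a terminal inside the X session then issues the real calls. Keeping an ssh session open from another machine makes it easier to tell a frozen display from a frozen kernel.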
Created attachment 233776 [details] recent dmesg

A few days ago I decided to give Wayland / sway a shot and I've noticed that the system has been a lot more stable on Wayland than on X11 (w/ i3 or Plasma). Specifically, I'm not getting as many i915 related crashes anymore. This prompted me to try putting the computer to sleep to see if the i915 errors we were getting on resume would be reduced or disappear. Unfortunately, even if Wayland/sway seem to be "surviving" the i915 driver (it would sometimes still hang and crash, but I would be able to either get out and restart sway, or sway would come back to life after a few seconds), the system still crashed a few seconds after resume. We still see the usual errors. I did get these errors now that I'm on Wayland, but they don't seem to be causing me any noticeable issues. Attaching recent dmesg.
I have similar issues with my Framework Laptop. I have been able to trace it to the suspend/resume of the if_iwlwifi driver in my case. As far as I know, suspend/resume is not working for this driver and is on the list of "next steps" of the developer working on it. The way I isolated this was to add the iwlwifi driver to the devmatch_blocklist after booting, then suspend and resume. Without this in rc.conf my system consistently freezes a few seconds after resume. With this added, the system has no problem resuming. It is a test, not a workaround. Right now, as far as I can tell, you cannot suspend/resume with working wifi with the Intel AX210.
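The isolation test described above could look roughly like this. It is a sketch under assumptions: the module name `if_iwlwifi` is taken from the `kld_list` posted earlier in this report, and if the driver is also listed in `kld_list` it would have to be removed from there as well, otherwise it is still loaded at boot.

```shell
# Sketch of the isolation test, run as root.

# Keep devmatch from auto-loading the wireless driver on the next boot.
sysrc devmatch_blocklist+="if_iwlwifi"

# After rebooting, confirm the module is not loaded (no output expected).
kldstat | grep iwlwifi

# Suspend to S3, resume, and watch whether the freeze still occurs.
acpiconf -s 3
```

As the comment says, this only tells you whether the driver is involved. The machine has no wifi while the module is blocked.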
Hey Brian,

Thanks for reporting that feedback. If that's true, then that opens up a bunch more possibilities. Specifically, maybe it isn't the graphics driver that's causing a full system hang (although we obviously see GPU HANG messages), but it would also mean that the wifi driver could be causing a panic just through general use (since I can't put the computer to sleep, I pretty much just turn it on, try not to stress out the graphics too much, and then turn it off). I sometimes would be using it and it would just lock up. That's just one possibility.

I think it may actually be a combination of two bad bugs, one in each driver, since from what I reported before, stressing out the card (even in a light way) could trigger a lock up. Last night I was moving some windows around from my external monitor (I'm using i3, so moving them from that monitor's workspace) to the Framework monitor .. and then it locked up. It doesn't happen all the time, but you really don't know when it's gonna happen. Also, things like restarting i3 a lot or using xrandr to change the scaling (try doing it a lot) will eventually lock up the system.
Some more updates. So I was able to fix my computer just generally crashing on X11 (with any DE, although I normally use i3 or KDE) by adding the following to my /usr/local/etc/X11/xorg.conf.d/10-video.conf:

```
Section "Device"
    Identifier "Card0"
    Driver "modesetting"
    BusID "PCI:0:2:0"
    Option "AccelMethod" "SNA"
EndSection
```

Once I added that, I re-tested the resume behavior since, as I said earlier, there are probably multiple bugs here (some in the graphics layer, and some in the wireless layer). During my first three tests, I did 'acpiconf -s 3' and waited about 30 seconds, 2 minutes, and 5 minutes respectively. None of these crashed. However, the wireless driver stopped working on the first resume. The good news is that I was able to get more data out of 'dmesg' regarding iwlwifi (attached is 'dump3'). During the third resume, I tried a 'service netif restart wlan0' to see if it would crash (given what we know about the old state), and yup, it crashed immediately.

On a fresh reboot, I tried my 'acpiconf -s 3' tests again, and the system started to show non-deterministic behavior as follows:

1. The system resumed fine, didn't crash, but wifi wasn't working. Restarting the wifi driver actually yielded a working connection.

2. The system resumed, and crashed about 10 seconds later. This was a hard crash and the system actually spat out A LOT of output very quickly, ending with an error in the graphics driver (drm-510-kmod, attached as graphics-1.jpg). I was expecting a crash dump to be properly saved since I have the dumpdev properly set now (and even using a 32 GB physical swap :O), but when the system started up it didn't dump anything. No cores were found.

3. The system resumed, and immediately decided to shut down (as if the power button was pressed).

Should I open another bug report specifically targeting the graphics portion of this, since it does seem we are running into two separate bugs (given this report is primarily for the wireless crash on resume)?
Created attachment 234027 [details] dump3
Created attachment 234028 [details] graphics-1.jpg
The words under the light in graphics-1.jpg are "vpanic" and "panic". I'm also running the following now:

FreeBSD 13.1-STABLE #1 stable/13-n250869-2430388070f: Tue May 17 11:51:35 EDT 2022 root@leslie:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64
I can confirm that it's the iwlwifi driver that is causing the crash to happen on resume. I've switched to wifibox on my thinkpad x1c7 which had the same issue. Today I got a chance to try out this same fix on the framework laptop and it worked. Without using iwlwifi and using wifibox instead, I'm able to sleep/resume my computer successfully and still have wifi.
(In reply to Jonathan Vasquez from comment #18)
> … iwlwifi … causing the crash to happen on resume. …

Triage: component: kern, should be wireless.

(In reply to Jonathan Vasquez from comment #17)
> … now:
>
> FreeBSD 13.1-STABLE #1 stable/13-n250869-2430388070f: Tue May 17 11:51:35 EDT 2022 root@leslie:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64

– <https://cgit.freebsd.org/src/log/?h=stable%2F13&qt=range&q=2430388070f>

Please try a more recent STABLE. Bug 263613 comment 14 is notable.

bz@ any other thought? TIA
(In reply to Graham Perrin from comment #19)
> Please try a more recent STABLE.

For clarity: this should be with an installation of graphics/drm-510-kmod that is built after updating the OS (plus a restart of the OS, for the module to become effective).

(In reply to Jonathan Vasquez from comment #16)

I wonder whether you have, or had, a combination of bugs, not all of which relate to iwlwifi …
Hey Graham, I've been upgrading my system on 13.1-STABLE (including using the latest drm-510-kmod from ports) the last few months and the issue still occurs. My current build is stable/13-n252072-1243360b1a0/GENERIC. - Jonathan
Also to clarify, after I disabled iwlwifi, I can use sleep/resume fine. I'm using "wifibox" in the meantime to get wireless internet.
I'll take it and try to have a look in the coming weeks.
Hey Bjoern,

I recompiled the latest 13.1-STABLE (stable/13-n252112-84d3fc26e3a) and re-tested a few scenarios. All are using the same settings in /etc/rc.conf:

```
wlans_iwlwifi0="wlan0"
ifconfig_wlan0="WPA SYNCDHCP"
create_args_wlan0="country US regdomain FCC"
```

Scenario 1
-----------

If the machine boots up and gets a DHCP address (and we can ping a website), we can proceed to run 'zzz' and then resume. Upon resume, the machine didn't crash immediately. However, the internet no longer worked, even though ifconfig says it's associated and has an IP. netstat also looked correct; I've attached the files for that. You can also see some useful information in the dmesg output. Doing a 'service netif restart wlan0' will crash the machine. You'll see the message saying that wpa_supplicant is stopping, then some PID information, and then the machine will hard lock.

Scenario 2
------------

If you start up the machine but it just keeps saying "DHCPREQUEST" (and other DHCP messages) without actually getting an IP, you can press Ctrl + C to cancel it and continue execution. Once you are back on your ttyv0 terminal (with no X, just pure CLI), you can log in as root and do 'service netif restart wlan0'. That caused the machine to crash immediately, and I'm actually getting an infinite loop crash where I see some stuff related to 'drm-510-kmod'. However, that doesn't make any sense since, as I mentioned, this is done from ttyv0 without X running and I just did 'service netif restart wlan0'.
Created attachment 235909 [details] dmesg 2022-08-14-1
Created attachment 235910 [details] ifconfig-netstat-2022-08-14-1
I have found a workaround for this issue by adding

```
/usr/sbin/service netif stop wlan0
```

to rc.suspend and

```
/usr/sbin/service netif start wlan0
```

to rc.resume. This is currently working on 14-CURRENT (commit main-n264816-a4aaee2120ce).
So here's my guess for a "next step". I think we need to know if the panic occurs at exactly the same place every time. I'm going to guess the answer is "no" -- which might indicate that the iwlwifi wakeup routine is not cleaning up after itself, and it's merely stochastic that the crash happens in the drm code. I'd have to know a lot more about the structure of both the iwl and drm code to be able to guess anything else. mcl
I see this too on an (old) Dell XPS13 with intel 8265 added to it - https://wiki.freebsd.org/Laptops/Dell_XPS13_9360 consistently on 13/14/15 over the months.
Hi, I recently had reports from people who said they moved through town, did suspend/resume, and it did not crash and worked afterwards. I am still lacking details, but I want to ask whether anyone can confirm that problems still exist on the latest 14 or 15?
Suspend/resume is still not working for me unless I use the workaround I posted previously.

FreeBSD 15.0-CURRENT #28 main-n266169-a69b6af2024f
Framework Laptop (Original 13")
Intel(R) Wi-Fi 6 AX210

If you need any additional information, let me know.
Suspend and resume kernel panics on me as well. I am running 15-CURRENT to get the latest drm-kmod. It panics in the iwl driver. The trick of stopping wlan0 and starting it again is currently working (after one test). But I'll be trying more times!
*** Bug 277734 has been marked as a duplicate of this bug. ***
Hey Bjoern,

I wanted to give an update since I recently purchased a new Intel AX210 card (and swapped out my Atheros chip, which has been working pretty well, although FreeBSD in general has slow performance with basically all of the wireless chips I've tried) in order to re-test the iwlwifi driver. These results are on the latest FreeBSD 14-RELEASE (FreeBSD leslie 14.0-RELEASE-p6 FreeBSD 14.0-RELEASE-p6 #0: Tue Mar 26 20:26:20 UTC 2024 root@amd64-builder.daemonology.net:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64).

The good news is that basic packet passing is working and I can reach the internet, my NAS (over NFS), watch videos, and that sort of basic stuff.

The bad news is that if I put the machine to sleep and then resume, even if it doesn't crash during the first attempt, the connection never automatically works again. ifconfig mentions that it's still associated, but I can't ping anything. After putting the machine through sleep/resume a few more times in a row, I eventually got the machine to crash on the third attempt. This was while connected on the 2.4 GHz frequency. Attempting to connect to my 5 GHz channel never worked. I did see it associate for a few seconds and then disconnect (same passphrase, and I know both of my frequencies work perfectly fine with my other devices).

Another thing is that the performance of the chip is just way too slow to be usable. I tried transferring a 2.1 GB file and after 6 minutes I gave up. I tried transferring a 684 MB file from my NAS (over NFS) and it took 14 minutes to complete. I did see the following line in `ifconfig` slowly drop from 54 Mbps to about 6 Mbps over the duration of the 14 minutes, and it may have even dropped lower:

media: IEEE 802.11 Wireless Ethernet OFDM/54Mbps mode 11g
(In reply to Jonathan Vasquez from comment #34)

You want to try 13.3-R, stable/13, stable/14 or main, or at least wait for the upcoming 14.1-R, all of which should have a lot more stability (even if nothing else) than 14.0.
Thanks Bjoern, I'm compiling stable/14 now and will let you know what happens. Earlier today I did a fresh install of FreeBSD 14-RELEASE and the card was able to connect to the 5 GHz network with no issue and it didn't disconnect. Updating to -p6 afterwards still kept the connection. The only thing I can think of that may be the difference is that yesterday I was using a static IP configuration, whereas for this test I was just using DHCP.
Hey Bjoern,

I'm on stable/14 (FreeBSD leslie 14.0-STABLE FreeBSD 14.0-STABLE #0 stable/14-n267200-50f771371356: Sat Apr 13 20:23:30 EDT 2024 root@leslie:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64) and have some updates.

1. I re-transferred that 684 MB file from my home NAS using the wireless card, and instead of taking 14m, it now only took 5m48s, so a big improvement. Doing some basic 'ls -lh' while it was transferring, it was transferring at about 2 MB per sec.

2. After the above, I tested sleep/resume. The system came back up without crashing, but the internet connection was broken even though 'ifconfig' said that it was associated to the network. I then did 4 more sleep/resume cycles in sequence to see if the system would crash. It didn't :).

3. After those tests were done, I tested tear down, since I remember we mentioned that the iwlwifi driver/card doesn't like being torn down and it could crash. So I first did a `service netif restart && service routing restart` and the internet worked again (doing netif alone wasn't enough, I needed routing as well, most likely because I'm using a static IP configuration):

wlans_iwlwifi0="wlan0"
ifconfig_wlan0="WPA inet 192.168.1.101 netmask 255.255.255.0"
ifconfig_wlan0_ipv6="inet6 accept_rtadv"
defaultrouter="192.168.1.1"
create_args_wlan0="country US regdomain FCC"

I then did two more sleep/resume cycles and it didn't crash (but internet broke again).

4. After this, I stress tested the tear down by doing the netif/routing restarts 8 more times in a row. It did not crash.

This is all good news and is an indication that your stability fixes are working, although I'll need to use it more over the coming days/weeks to see how stable it actually is. I'll be trying to play around with if_lagg and seeing how well it works for failover recovery between my ethernet interface that's on a Type-C dongle and the built-in Intel AX210 wireless card, so I can more easily switch between locations at home ;).
I did hear that FreeBSD's network stack in general is at an 802.11g level, so regardless of what the driver is (Atheros, Intel, etc), the maximum speed will be at g speeds. Do you know what the status is of upgrading FreeBSD's wifi stack to 802.11n/ac/ax? I believe I read in one of the recent FreeBSD Foundation reports that more investment will start going into the wireless network stack specifically. If you have any links where I can read more about that, that would be helpful.

Other info (also attached dmesg-2024-04-14-1136):

wlan0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=0
    ether c8:15:4e:aa:6d:92
    inet 192.168.1.101 netmask 0xffffff00 broadcast 192.168.1.255
    inet6 fe80::ca15:4eff:feaa:6d92%wlan0 prefixlen 64 scopeid 0x2
    groups: wlan
    ssid <redacted> channel 40 (5200 MHz 11a) bssid c8:7f:54:b5:d5:f4
    regdomain FCC country US authmode WPA2/802.11i privacy ON
    deftxkey UNDEF AES-CCM 3:128-bit txpower 17 bmiss 7 mcastrate 6
    mgmtrate 6 scanvalid 60 wme roaming MANUAL
    parent interface: iwlwifi0
    media: IEEE 802.11 Wireless Ethernet OFDM/54Mbps mode 11a
    status: associated
    nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>

- Jonathan
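To put the two NFS transfers reported above side by side (684 MB in 14 minutes on 14.0-RELEASE-p6, and in 5m48s on stable/14), the effective throughput can be worked out with a small helper. This is just arithmetic on the reported numbers, assuming 1 MB = 10^6 bytes; the function name `throughput` is made up for illustration.

```shell
# throughput SIZE_MB SECONDS: print the average rate in MB/s and Mbit/s.
throughput() {
    awk -v mb="$1" -v s="$2" \
        'BEGIN { printf "%.2f MB/s (%.1f Mbit/s)\n", mb / s, mb * 8 / s }'
}

throughput 684 840   # 14 minutes on 14.0-RELEASE-p6 -> 0.81 MB/s (6.5 Mbit/s)
throughput 684 348   # 5m48s on stable/14            -> 1.97 MB/s (15.7 Mbit/s)
```

The second figure lines up with the "about 2 MB per sec" observed via 'ls -lh', and both are well below the 54 Mbps media rate that ifconfig reports.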
Created attachment 249972 [details] dmesg-2024-04-14-1136
(In reply to Jonathan Vasquez from comment #37)

First, thanks for all the testing and reporting back. Sounds like good news.

When doing the suspend/resume, are you using the workaround from comment #27 (https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=263632#c27)? I assume not, given you are manually restarting afterwards? I wonder if these days, rather than doing a full service restart, it would be sufficient to do an ifconfig wlan0 down / ifconfig wlan0 up instead?

As for more speed, ath/iwn/run/rtwn/.. do 11n; I hope we'll get back to that for iwlwifi and rtw88 (and other LinuxKPI based drivers) in the next weeks.
You're welcome Bjoern. I haven't used the automatic workaround but good call out since I can use that in the meantime. I did do something like that in the past when I was messing with devd and wifibox. I'll test the simple up/down and see what happens.
I tested the `ifconfig wlan0 down` and `ifconfig wlan0 up` approach but it didn't work unfortunately, it just stayed like this:

wlan0: flags=8c43<UP,BROADCAST,RUNNING,DRV_OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=0
    ether c8:15:4e:aa:6d:92
    inet 192.168.1.102 netmask 0xffffff00 broadcast 192.168.1.255
    groups: wlan
    ssid "" channel 40 (5200 MHz 11a)
    regdomain FCC country US authmode WPA1+WPA2/802.11i privacy ON
    deftxkey UNDEF AES-CCM 2:128-bit txpower 17 bmiss 7 mcastrate 6
    mgmtrate 6 scanvalid 60 wme roaming MANUAL
    parent interface: iwlwifi0
    media: IEEE 802.11 Wireless Ethernet autoselect (autoselect)
    status: no carrier
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>

However, I was able to get more log output in dmesg after I did that. The output is attached as 'dmesg-2024-04-14-1650.txt'. I tried the above up/down two times but neither worked. If you look for "iwlwifi0: Error sending TXPATH_FLUSH: time out after 2000ms.", you can see the segments where I did "down". The full service restart fixed it again.
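For anyone going through the attached dmesg files, a small filter like the one below pulls out the driver messages quoted in this report, with line numbers. It is a sketch: the function name `find_driver_errors` is made up, and the patterns are simply the strings that have appeared in the comments so far, so other relevant messages may be missed.

```shell
# Print lines matching the driver errors quoted in this report, with line
# numbers. Reads the given file(s), or standard input if none are given.
find_driver_errors() {
    grep -n -E 'GPU HANG|context reset due to GPU hang|Error sending TXPATH_FLUSH' "$@"
}
```

For example, `find_driver_errors dmesg-2024-04-14-1650.txt` on the saved attachment, or `dmesg | find_driver_errors` on a live system.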
Created attachment 249978 [details] dmesg-2024-04-14-1650
(In reply to Jonathan Vasquez from comment #41) Thanks for testing the down/up as well.