Bug 263632 - iwlwifi(4) crashes (freezes) a few seconds after resuming from sleep (suspend/resume not working; workaround known for now)
Status: In Progress
Alias: None
Product: Base System
Classification: Unclassified
Component: wireless
Version: Unspecified
Hardware: Any Any
Importance: --- Affects Some People
Assignee: Bjoern A. Zeeb
URL:
Keywords: crash
Duplicates: 277734
Depends on:
Blocks: frameworklaptop 14.0r iwlwifi
Reported: 2022-04-28 16:19 UTC by Jonathan Vasquez
Modified: 2024-12-09 15:56 UTC
CC List: 12 users

Attachments
last dmesg (19.76 KB, text/plain), 2022-04-28 16:20 UTC, Jonathan Vasquez
recent dmesg (85.47 KB, text/plain), 2022-05-06 20:10 UTC, Jonathan Vasquez
dump3 (21.45 KB, text/plain), 2022-05-18 19:05 UTC, Jonathan Vasquez
graphics-1.jpg (991.54 KB, image/jpeg), 2022-05-18 19:10 UTC, Jonathan Vasquez
dmesg 2022-08-14-1 (21.13 KB, text/plain), 2022-08-14 18:58 UTC, Jonathan Vasquez
ifconfig-netstat-2022-08-14-1 (2.14 KB, text/plain), 2022-08-14 19:00 UTC, Jonathan Vasquez
dmesg-2024-04-14-1136 (78.59 KB, text/plain), 2024-04-14 15:37 UTC, Jonathan Vasquez
dmesg-2024-04-14-1650 (47.74 KB, text/plain), 2024-04-14 20:52 UTC, Jonathan Vasquez
recent iwlwifi lines only from /var/log/messages (11.76 KB, text/plain), 2024-11-27 06:06 UTC, Graham Perrin

Description Jonathan Vasquez 2022-04-28 16:19:09 UTC
Hey all,

It seems that my Framework laptop crashes after it resumes from sleep. I'll land either in the SDDM window (or Plasma Desktop, depending on how fast I can type my password lol), and then a few seconds later it will crash (freeze completely).

I'm running FreeBSD 14.0-CURRENT #0 main-n255077-490a0f77de7. My /etc/rc.conf does have the blocklist for ng_ubt mentioned here: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=260161

My system is also running ZFS so /var/crash reports are turned off, but the dmesg output (and /var/log/messages) seems to have some output that may help. It might be related to the i915 (drm-devel-kmod-5.7.19.g20220223) driver rather than bluetooth related.

- Jonathan
Comment 1 Jonathan Vasquez 2022-04-28 16:20:49 UTC
Created attachment 233563 [details]
last dmesg

Lines 242 - 318 may be relevant in the 'last dmesg' attachment.
Comment 2 Jonathan Vasquez 2022-04-28 16:21:37 UTC
Current: /etc/rc.conf

hostname="leslie"
dumpdev="NO"
zfs_enable="YES"

## [Start Other]

#wlans_rtwn0="wlan0"
#ifconfig_wlan0="inet 192.168.1.101 netmask 255.255.255.0 ssid Summerland WPA"
#ifconfig_wlan0="WPA SYNCDHCP"
#defaultrouter="192.168.1.1"

## [End Other]

kld_list="i915kms if_iwlwifi"
devmatch_blocklist="ng_ubt"

ifconfig_ue0="DHCP"

wlans_iwlwifi0="wlan0"
ifconfig_wlan0="WPA DHCP"
create_args_wlan0="regdomain FCC country US"

dbus_enable="YES"
sddm_enable="YES"
syncthing_enable="YES"

# Syncthing
syncthing_user="jon"
syncthing_group="jon"

webcamd_enable="YES"
Comment 3 Mark Johnston freebsd_committer freebsd_triage 2022-04-28 16:27:42 UTC
Does the system fully crash, or is the display just frozen?  Can you ssh into the system while it's in that state?

> My /etc/rc.conf does have the blocklist for ng_ubt mentioned here

I think you don't need that anymore, but it shouldn't matter.
Comment 4 Graham Perrin freebsd_committer freebsd_triage 2022-04-28 17:08:09 UTC
(In reply to Jonathan Vasquez from comment #0)

> 490a0f77de7

Was drm-devel-kmod-5.7.19.g20220223 built (and installed) whilst running that version? Or installed as a package from the latest repo?
Comment 5 Jonathan Vasquez 2022-04-28 17:17:34 UTC
Hey Mark, Graham,

@Mark

Thanks for that. I tried to ssh into the system but it was completely down. (I connected beforehand and verified I was able to actually ssh into the system, then put it to sleep, resumed, and wasn't able to connect at all.. it probably would have taken a few seconds for the network to come back up but by the time it would have been available, the system had already crashed).

Also sounds good about the "ng_ubt" situation, that makes sense since I didn't notice any USB enumeration delays necessarily during my boots recently so this issue would have been fixed a while ago when it was first reported.

@Graham

Funny you mentioned that. So I installed this system a few times over the past few days on 13.0-RELEASE, 13.1-RC4, and a few times on a FreeBSD 14-CURRENT snapshot (FreeBSD-14.0-CURRENT-amd64-20220421-b91a48693a5-254961) and when I had tried to do a `pkg install drm-devel-kmod`. A few days ago when I was on 14-CURRENT and installed drm-devel-kmod, it worked fine. However when I reinstalled it from the same snapshot on 14-CURRENT, the drm-devel-kmod package wouldn't install anymore, I believe it complained about the version number being too new (FreeBSD version specifically, I think it was incremented by 1). At that point I decided it probably would be better for me to just check out the latest CURRENT sources and rebuild world/kernel. Once I did that, I built drm-devel-kmod from source, and that's what I'm using atm :).
Comment 6 Jonathan Vasquez 2022-04-28 17:19:38 UTC
@Graham

Arg.. typo.

".. (FreeBSD-14.0-CURRENT-amd64-20220421-b91a48693a5-254961) and when I had tried to do a `pkg install drm-devel-kmod` it worked fine. However when I reinstalled it from the same snapshot on 14-CURRENT" .. **
Comment 7 Jonathan Vasquez 2022-04-28 20:09:11 UTC
Some minor updates. I originally said that I have dumps disabled since this system is running ZFS (and I remember there was an issue with ZFS + crash dumps). Checking my server's config, I noticed I had written a comment about this, which was that I disabled crash dumps because it caused issues when using a mirrored swap configuration on ZFS. Since my laptop isn't using mirrored swap, I re-enabled it (dumpdev="YES" rather than "AUTO" or "NO"). The first few times I put the computer to sleep and then woke it, the computer didn't actually crash. I then switched dumpdev back to NO and the computer crashed within seconds. After switching it back to YES, it started crashing again, so unfortunately that didn't lead to anything. However, I then left dumpdev enabled and tried to get some crash dumps from the sleep/wake cycle. Since I originally installed FreeBSD with only 16 GB of physical swap space, and this machine has 32 GB of RAM, I received messages saying that there is no suitable dump device and that savecore is not run. To mitigate this, I tried to just create a 64 GB swap file but that didn't work either. Furthermore, trying to do a 'dumpon /dev/md0' yielded the following:

root@leslie:~ # dumpon -l
/dev/null
root@leslie:~ # dumpon /dev/md0
dumpon: ioctl(DIOCSKERNELDUMP): Operation not supported

I suppose this is related to swapfiles not being supported for crash dumps?

https://forums.freebsd.org/threads/how-to-enable-crash-dumps.84340/#post-559174

SirDice also mentioned that swap files probably can't be used for this. Since it was pretty time consuming to get my system up to this point, I don't want to do the work to reinstall the entire system just to realign the physical swap.. eventually I'll need to do it if I want to properly test out FreeBSD's hibernate support however.

To finish off, the few times that I did wake up the system and it didn't immediately crash, I did notice the following lines in dmesg, maybe this helps in some way:

drmn0: GPU HANG: ecode 12:1:85dffffb, in MainThread [100939]
drmn0: Resetting rcs0 for stopped heartbeat on rcs0
Comment 8 Jonathan Vasquez 2022-04-29 21:31:34 UTC
I decided to spend some time today to reinstall the system and have a proper physical swap partition so that we can get the core dumps (I wanted to do some other changes to my core partition layout anyways). I backed up my /usr/src and other relevant dirs/configs so that I could quickly get back up and running, this worked perfectly :). I'll be attempting to generate some core dumps for this and will post them most likely over the weekend.
Comment 9 Jonathan Vasquez 2022-05-02 18:53:35 UTC
Hey all, some minor updates regarding crash dumps after I successfully enabled core dump functionality. I had several crashes today (regarding graphics / wireless) and I checked to see if there were any dumps generated, and there were none. I also tried to force saving the core dumps but that yielded nothing. If anyone has any ideas of how I can get the core dumps after the crashes, let me know and I can retry extracting them from my box. I'm thinking that "kern-dbg" is something that is already provided by me doing a full compilation of CURRENT for world/kernel, where the kernel already has debugging symbols enabled by default. When I get a chance, I'll try and read a bit more about kernel debugging in the Handbook and see if that has any revelations.
Comment 10 Jonathan Vasquez 2022-05-04 03:57:01 UTC
Some interesting updates.

So it seems that it may not be the kernel that's crashing (yet) but the X server in combination with the kernel.. basically the behavior that I'm experiencing can be reproduced (non-deterministically but it will happen) if I do:

xrandr --output eDP-1 --scale 0.8

You can use any other number as well. As long as you cause frequent changes, it will eventually happen. Now something that's further interesting is that once it locks up, pressing keys or moving your mouse won't work. Attempting to switch to a virtual terminal also won't work, or at least it would seem like it didn't. The first interesting thing I noticed was that even though I wasn't able to do much after it locked up, if I pressed the power button, the system actually received the signal and I could see my tty1 again and the system eventually shut down. So that would indicate that the kernel hasn't actually crashed. I also saw the following message (same as before but with another line):

drmn0: GPU HANG: ecode 12:1:85dffffb, in MainThread [100328]
drmn0: Resetting rcs0 for stopped heartbeat on rcs0
drmn0: Xorg[100328] context reset due to GPU hang

The second interesting thing was that once I got it locked again, I was actually able to - eventually - switch to a TTY. I just kept pressing Ctrl + Alt + F[1-4] until one of them got through to the system.

Lastly, after the system got locked, I managed to switch to a TTY and then kill X, and start another session immediately. Once I locked it for a second time in a row, my system was truly frozen and I had no chance of switching to a TTY and the power button also stopped getting its shutdown signal through to the kernel. I had to hard power off the machine. So it's interesting that a second "hard crash" managed to be worse than the first, as if there was some remnant state that managed to worsen things.
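
For anyone trying to reproduce this, the repeated scale changes could be scripted roughly as below. This is only a sketch under assumptions: the function name `scale_loop`, the `XRANDR` dry-run override, the default count, and the second scale value of 1 are made up for illustration; only `eDP-1` and `--scale 0.8` come from the command above.

```shell
#!/bin/sh
# Hypothetical helper: flip the scale back and forth COUNT times.
# Set XRANDR=echo to print the commands instead of running xrandr.
scale_loop() {
    count=${1:-50}        # number of round trips (arbitrary default)
    output=${2:-eDP-1}    # output name taken from the command above
    i=0
    while [ "$i" -lt "$count" ]; do
        ${XRANDR:-xrandr} --output "$output" --scale 0.8
        ${XRANDR:-xrandr} --output "$output" --scale 1
        i=$((i + 1))
    done
}
```

Calling `scale_loop 100` from a terminal inside the X session would then issue 200 scale changes; whether and when the hang appears is non-deterministic, as described above.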
Comment 11 Jonathan Vasquez 2022-05-06 20:10:42 UTC
Created attachment 233776 [details]
recent dmesg

A few days ago I decided to give Wayland / sway a shot and I've noticed that the system has been a lot more stable on Wayland than on X11 (w/ i3 or Plasma). Specifically, I'm not getting as many i915 related crashes anymore. This prompted me to try and put the computer to sleep and see if the i915 errors we were getting on resume would be reduced or disappear. Unfortunately, even if Wayland/sway seem to be "surviving" the i915 driver (it would sometimes still hang and crash, but I would be able to either get out and restart sway, or sway would come back to life after a few seconds), the system still crashed a few seconds after resume. We still see the usual errors. I did get these errors now that I'm on Wayland, but they don't seem to be causing me any noticeable issues.

Attaching recent dmesg.
Comment 12 bkidney@briankidney.ca 2022-05-10 00:13:40 UTC
I have similar issues with my Framework Laptop. I have been able to trace it to the suspend/resume of the if_iwlwifi driver in my case. As far as I know suspend/resume is not working for this driver and is on the list of "next steps" of the developer working on it.

The way I isolated this was to add the iwlwifi driver to the devmatch_blocklist, boot, and then suspend and resume. Without this in rc.conf my system consistently freezes a few seconds after resume. With this added, the system has no problem resuming. It is a test, not a workaround. Right now, as far as I can tell, you cannot suspend/resume with working wifi with the Intel AX210.
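
As a rough illustration of that test, the rc.conf fragment might look like the following. The module name `if_iwlwifi` and the `kld_list` contents are assumptions taken from the rc.conf in comment #2, not from this machine's actual configuration.

```shell
# /etc/rc.conf fragment (sketch, assumed values): keep devmatch from
# auto-loading the wireless driver so suspend/resume can be tested without it.
devmatch_blocklist="if_iwlwifi"
# If the driver is also listed in kld_list it would still be loaded
# explicitly, so it is left out here.
kld_list="i915kms"
```

With this in place there is no wifi at all, which is why it only serves to isolate the driver.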
Comment 13 Jonathan Vasquez 2022-05-10 00:50:33 UTC
Hey Brian,

Thanks for reporting that feedback. If that's true then that opens up a bunch more possibilities. Specifically, maybe it isn't the graphics driver that's causing a full system hang (although we obviously see GPU HANG messages), but it would also mean that the wifi driver could be causing a panic just through general use (since I can't put the computer to sleep, I pretty much just turn it on, try not to stress out the graphics too much, and then turn it off). I sometimes would be using it and it would just lock up. That's just one possibility. I think it may actually be a combination of two bad bugs, one in each driver, since from what I reported before, stressing out the card (even in a light way) could trigger a lock up. Last night I was moving some windows around from my external monitor (I'm using i3 so moving it from that monitor's workspace) to the framework monitor .. and then it locked up. It doesn't happen all the time but you really don't know when it's gonna happen. Also things like restarting i3 a lot or using xrandr to change the scaling (try doing it a lot) will eventually lock up the system.
Comment 14 Jonathan Vasquez 2022-05-18 19:05:05 UTC
Some more updates. So I was able to fix my computer just generally crashing on X11 (with any DE, although I normally use i3 or KDE) by adding the following to my /usr/local/etc/X11/xorg.conf.d/10-video.conf:

```
Section "Device"
        Identifier  "Card0"
        Driver      "modesetting"
        BusID       "PCI:0:2:0"
        Option      "AccelMethod"       "SNA"
EndSection
```

Once I added that, I re-tested the resume behavior since as I said earlier, there are probably multiple bugs here (some in the graphics layer, and some in the wireless layer).

During my first three tests, I did 'acpiconf -s 3', waited about 30 seconds, 2 minutes, and 5 minutes respectively. None of these crashed. However, the wireless driver stopped working on the first resume. The good news is that I was able to get more data out of 'dmesg' regarding iwlwifi (attached is 'dump3').

During the third resume, I tried a 'service netif restart wlan0' to see if it would crash (given what we know about the old state), and yup, it crashed immediately. On a fresh reboot, I tried again my 'acpiconf -s 3' tests, and the system started to have non-deterministic behavior as follows:

1. The system resumed fine, didn't crash, but wifi wasn't working. Restarting the wifi driver actually yielded a working connection.

2. The system resumed, and crashed about 10 seconds later, this was a hard crash and the system actually spat out A LOT of output very quickly, ending with an error in the graphics driver (drm-510-kmod, attached as graphics-1.jpg). I was expecting a crash dump to be properly saved since I have the dumpdev properly set now (and even using a 32 GB physical swap :O), but when the system started up it didn't dump anything. No cores were found.

3. The system resumed, and immediately decided to shut down (as if the power button was pressed).

Should I open another bug report specifically targeting the graphics portion of this since it does seem we are running into two separate bugs (given this is primarily for the wireless crash on resume)?
Comment 15 Jonathan Vasquez 2022-05-18 19:05:20 UTC
Created attachment 234027 [details]
dump3
Comment 16 Jonathan Vasquez 2022-05-18 19:10:13 UTC
Created attachment 234028 [details]
graphics-1.jpg
Comment 17 Jonathan Vasquez 2022-05-18 19:11:18 UTC
The words under the light in graphics-1.jpg are "vpanic" and "panic". I'm also running the following now:

FreeBSD 13.1-STABLE #1 stable/13-n250869-2430388070f: Tue May 17 11:51:35 EDT 2022     root@leslie:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64
Comment 18 Jonathan Vasquez 2022-08-06 01:53:36 UTC
I can confirm that it's the iwlwifi driver that is causing the crash to happen on resume. I've switched to wifibox on my thinkpad x1c7 which had the same issue. Today I got a chance to try out this same fix on the framework laptop and it worked. Without using iwlwifi and using wifibox instead, I'm able to sleep/resume my computer successfully and still have wifi.
Comment 19 Graham Perrin freebsd_committer freebsd_triage 2022-08-14 08:01:27 UTC
(In reply to Jonathan Vasquez from comment #18)

> … iwlwifi … causing the crash to happen on resume. …

Triage: component: kern, should be wireless. 


(In reply to Jonathan Vasquez from comment #17)

> … now:
> 
> FreeBSD 13.1-STABLE #1 stable/13-n250869-2430388070f: Tue May 17 11:51:35 EDT 2022     root@leslie:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64

– <https://cgit.freebsd.org/src/log/?h=stable%2F13&qt=range&q=2430388070f>

Please try a more recent STABLE. 

Bug 263613 comment 14 is notable. 

bz@ any other thought? TIA
Comment 20 Graham Perrin freebsd_committer freebsd_triage 2022-08-14 08:11:42 UTC
(In reply to Graham Perrin from comment #19)

> Please try a more recent STABLE. 

For clarity: this should be with an installation of graphics/drm-510-kmod that is built after updating the OS (plus a restart of the OS, for the module to become effective). 


(In reply to Jonathan Vasquez from comment #16)

I wonder whether you have, or had, a combination of bugs, not all of which relate to iwlwifi …
Comment 21 Jonathan Vasquez 2022-08-14 13:21:10 UTC
Hey Graham,

I've been upgrading my system on 13.1-STABLE (including using the latest drm-510-kmod from ports) the last few months and the issue still occurs. My current build is stable/13-n252072-1243360b1a0/GENERIC.

- Jonathan
Comment 22 Jonathan Vasquez 2022-08-14 13:22:33 UTC
Also to clarify, after I disabled iwlwifi, I can use sleep/resume fine. I'm using "wifibox" in the meantime to get wireless internet.
Comment 23 Bjoern A. Zeeb freebsd_committer freebsd_triage 2022-08-14 14:25:34 UTC
I'll take it and try to have a look in the next weeks.
Comment 24 Jonathan Vasquez 2022-08-14 18:58:31 UTC
Hey Bjoern,

I recompiled the latest 13.1-STABLE (stable/13-n252112-84d3fc26e3a) and re-tested a few scenarios.

All are using the same settings in /etc/rc.conf:

```
wlans_iwlwifi0="wlan0"
ifconfig_wlan0="WPA SYNCDHCP"
create_args_wlan0="country US regdomain FCC"
```

Scenario 1
-----------
If the machine boots up and gets a DHCP address (and we can ping a website), we can proceed to run 'zzz' and then resume. Upon resume, the machine didn't crash immediately. However, the internet no longer worked, even though ifconfig says it's associated and has an IP. netstat looked correct as well; I've attached the files for that. You can also see some useful information in the dmesg output.

Doing a 'service netif restart wlan0' will crash the machine. You'll see the message saying that wpa_supplicant is stopping, then some PID information, and then the machine will hard lock.

Scenario 2
------------
If you start up the machine but the machine just keeps saying "DHCPREQUEST" (and other DHCP messages), but doesn't actually get an IP, you could Ctrl + C to cancel it and continue execution. Once you are back on your ttyv0 terminal (with no X, just pure CLI), you can log in as root and do 'service netif restart wlan0'. That caused the machine to crash immediately, and I'm actually getting an infinite loop crash where I see some stuff related to 'drm-510-kmod'. However, that doesn't make any sense since, as I mentioned, this is done from ttyv0 without X running and I just did 'service netif restart wlan0'.
Comment 25 Jonathan Vasquez 2022-08-14 18:58:53 UTC
Created attachment 235909 [details]
dmesg 2022-08-14-1
Comment 26 Jonathan Vasquez 2022-08-14 19:00:36 UTC
Created attachment 235910 [details]
ifconfig-netstat-2022-08-14-1
Comment 27 bkidney@briankidney.ca 2023-08-21 02:29:38 UTC
I have found a workaround for this issue by adding

```
/usr/sbin/service netif stop wlan0
```

to rc.suspend and 

```
/usr/sbin/service netif start wlan0
```

to rc.resume.

This is currently working on 14-CURRENT (commit main-n264816-a4aaee2120ce).
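
The two hook lines could also be wrapped in one small helper, sketched below under assumptions: the function name `wlan_hook` and the `SERVICE` dry-run override are hypothetical, and the stock /etc/rc.suspend and /etc/rc.resume scripts would still need a call to it added by hand.

```shell
#!/bin/sh
# Sketch only: stop the interface before suspend, start it after resume.
# Set SERVICE=echo to print the command instead of running service(8).
wlan_hook() {
    action=$1            # "stop" (from rc.suspend) or "start" (from rc.resume)
    iface=${2:-wlan0}    # interface name from this comment
    case "$action" in
        stop|start)
            ${SERVICE:-/usr/sbin/service} netif "$action" "$iface"
            ;;
        *)
            echo "usage: wlan_hook stop|start [iface]" >&2
            return 64
            ;;
    esac
}
```

For example, `wlan_hook stop` would go in rc.suspend and `wlan_hook start` in rc.resume; the plain one-line commands above do the same thing without the argument check.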
Comment 28 Mark Linimon freebsd_committer freebsd_triage 2023-09-21 02:33:42 UTC
So here's my guess for a "next step".

I think we need to know if the panic occurs at exactly the same place every time.

I'm going to guess the answer is "no" -- which might indicate that the iwlwifi wakeup routine is not cleaning up after itself, and it's merely stochastic that the crash happens in the drm code.

I'd have to know a lot more about the structure of both the iwl and drm code to be able to guess anything else.

mcl
Comment 29 Dave Cottlehuber freebsd_committer freebsd_triage 2023-10-04 17:45:49 UTC
I see this too on an (old) Dell XPS13 with intel 8265 added to it
- https://wiki.freebsd.org/Laptops/Dell_XPS13_9360
consistently on 13/14/15 over the months.
Comment 30 Bjoern A. Zeeb freebsd_committer freebsd_triage 2023-10-25 21:32:36 UTC
Hi,

I had reports from people recently who said they moved through town and did suspend/resume and not crash and it worked afterwards.  I am still lacking details but I want to ask back if anyone can confirm problems still existing on latest 14 or 15?
Comment 31 bkidney@briankidney.ca 2023-10-30 03:22:21 UTC
Still suspend/resume not working for me unless I use the workaround I posted previously.

FreeBSD 15.0-CURRENT #28 main-n266169-a69b6af2024f
Framework Laptop (Original 13")
Intel(R) Wi-Fi 6 AX210

If you need any additional information, let me know.
Comment 32 mmatalka 2023-12-30 09:08:14 UTC
Suspend and resume kernel panics on me as well. I am running 15-CURRENT to get the latest drm-kmod. It kernel panics in the iwl driver. The trick of stopping wlan0 and starting it again is currently working (after one test). But I'll be trying more times!
Comment 33 Bjoern A. Zeeb freebsd_committer freebsd_triage 2024-03-18 20:03:15 UTC
*** Bug 277734 has been marked as a duplicate of this bug. ***
Comment 34 Jonathan Vasquez 2024-04-13 00:49:20 UTC
Hey Bjoern,

I wanted to give an update since I recently purchased a new Intel AX210 card (and swapped out my Atheros chip, which has been working pretty well, but FreeBSD in general has slow performance with basically all of the wireless chips I've tried) in order to re-test the iwlwifi driver.

These results are on the latest FreeBSD 14-RELEASE (FreeBSD leslie 14.0-RELEASE-p6 FreeBSD 14.0-RELEASE-p6 #0: Tue Mar 26 20:26:20 UTC 2024     root@amd64-builder.daemonology.net:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64).

The good news is that the basic packet passing is working and I can reach the internet, my NAS (over NFS), watch videos, and that sort of basic stuff.

The bad news is that if I put the machine to sleep and then resume, even if it doesn't crash during the first attempt, the connection never automatically works again. ifconfig mentions that it's still associated, but I can't ping anything. After suspending and resuming the machine a few more times in a row, it eventually crashed on the third attempt.

This was connected on 2.4 GHz frequency. Attempting to connect to my 5 GHz channel never worked. I did see it associated for a few seconds, and then disconnected (same passphrase and I know both of my frequencies work perfectly fine with my other devices).

Another thing is that the performance of the chip is just way too slow to be usable. I tried transferring a 2.1 GB file and after 6 minutes I gave up. I tried transferring a 684 MB file from my NAS (over NFS) and it took 14 minutes to complete. I did see the following line in `ifconfig` slowly drop from 54 Mbps to about 6 Mbps over the duration of the 14 minutes, it may have even dropped lower:

media: IEEE 802.11 Wireless Ethernet OFDM/54Mbps mode 11g
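
To put a number on that transfer, a small sketch of the arithmetic follows. The helper name `mb_per_sec` is made up for illustration; the 684 MB and 14 minutes are the figures quoted above.

```shell
#!/bin/sh
# mb_per_sec SIZE_MB SECONDS: print the average throughput in MB/s.
mb_per_sec() {
    awk -v mb="$1" -v s="$2" 'BEGIN { printf "%.2f\n", mb / s }'
}

# 684 MB in 14 minutes (840 seconds)
mb_per_sec 684 $((14 * 60))   # prints 0.81
```

About 0.81 MB/s is roughly 6.5 Mbit/s, which is at least in the same range as the rate that ifconfig ended up reporting.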
Comment 35 Bjoern A. Zeeb freebsd_committer freebsd_triage 2024-04-13 11:10:51 UTC
(In reply to Jonathan Vasquez from comment #34)

You want to try 13.3-R, stable/13, stable/14 or main, or wait for the upcoming 14.1-R at least, which all should have a lot more stability (even if nothing else) than 14.0.
Comment 36 Jonathan Vasquez 2024-04-13 22:57:58 UTC
Thanks Bjoern, I'm compiling stable/14 now and will let you know what happens.

Earlier today I did a fresh install on FreeBSD 14-RELEASE and the card was able to connect to the 5 GHz network with no issue and it didn't disconnect. Updating to -p6 afterwards still kept the connection. The only thing I could think that may be the difference was that yesterday I was using a static IP configuration, whereas for this test I was just using DHCP.
Comment 37 Jonathan Vasquez 2024-04-14 15:36:17 UTC
Hey Bjoern,

I'm on stable/14 (FreeBSD leslie 14.0-STABLE FreeBSD 14.0-STABLE #0 stable/14-n267200-50f771371356: Sat Apr 13 20:23:30 EDT 2024     root@leslie:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64) and have some updates.

1. I re-transferred that 684 MB file from my home NAS again using the wireless card and instead of it taking 14m, it now only took 5m48s, so a big improvement. Doing some basic 'ls -lh' while it was transferring, it was basically transferring at about 2 MB per sec.

2. After the above, I tested sleep/resume. The system came back up without crashing, but the internet connection was broken even though 'ifconfig' said that it was associated to the network. I then did 4 more sleep/resume in sequence to see if the system would crash. It didn't :).

3. After those tests were done, I then tested tear down since I remember we mentioned that the iwlwifi driver/card doesn't like being torn down and it could crash. So I first did a `service netif restart && service routing restart` and the internet worked again (doing netif alone wasn't enough, I needed routing as well, most likely because I'm using a static IP configuration):

wlans_iwlwifi0="wlan0"
ifconfig_wlan0="WPA inet 192.168.1.101 netmask 255.255.255.0"
ifconfig_wlan0_ipv6="inet6 accept_rtadv"
defaultrouter="192.168.1.1"
create_args_wlan0="country US regdomain FCC"

I then did two more sleep/resume and it didn't crash (but internet broke again).

4. After this, I stress tested the tear down by doing the netif/routing restarts 8 more times in a row. It did not crash.
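
The restart sequence from items 3 and 4 can be sketched as a small loop. This is an illustration under assumptions rather than what was actually typed: the function name `restart_net` and the `SERVICE` dry-run override are hypothetical, and the default count of 8 just mirrors item 4.

```shell
#!/bin/sh
# Sketch: run "service netif restart && service routing restart" COUNT times,
# stopping at the first failure. Set SERVICE=echo for a dry run.
restart_net() {
    count=${1:-8}
    i=1
    while [ "$i" -le "$count" ]; do
        echo "restart $i/$count"
        ${SERVICE:-service} netif restart &&
            ${SERVICE:-service} routing restart || return 1
        i=$((i + 1))
    done
}
```

Running `restart_net 8` as root would repeat the tear down eight times in a row; a hard lock would of course end the loop on its own.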

This is all good news and is an indication that your stability fixes are working, although I'll need to use it throughout the days/weeks more to see how stable it actually is. I'll be trying to play around with if_lagg and seeing how well it works for failover recovery between my ethernet interface that's on a Type-C dongle and the built in Intel AX210 wireless card, so I can more easily switch between locations at home ;).

I did hear that FreeBSD's network stack in general is at an 802.11g level so regardless of what the driver is (Atheros, Intel, etc), the maximum speed will be at g speeds. Do you know what's the status with upgrading FreeBSD's wifi stack to 802.11n/ac/ax? I believe I read in one of the recent FreeBSD Foundation reports that more investments will start occurring for the wireless network stack specifically. If you have any links where I can read more up on that, that would be helpful.

Other info (also attached dmesg-2024-04-14-1136):

wlan0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=0
	ether c8:15:4e:aa:6d:92
	inet 192.168.1.101 netmask 0xffffff00 broadcast 192.168.1.255
	inet6 fe80::ca15:4eff:feaa:6d92%wlan0 prefixlen 64 scopeid 0x2
	groups: wlan
	ssid <redacted> channel 40 (5200 MHz 11a) bssid c8:7f:54:b5:d5:f4
	regdomain FCC country US authmode WPA2/802.11i privacy ON
	deftxkey UNDEF AES-CCM 3:128-bit txpower 17 bmiss 7 mcastrate 6
	mgmtrate 6 scanvalid 60 wme roaming MANUAL
	parent interface: iwlwifi0
	media: IEEE 802.11 Wireless Ethernet OFDM/54Mbps mode 11a
	status: associated
	nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>

- Jonathan
Comment 38 Jonathan Vasquez 2024-04-14 15:37:08 UTC
Created attachment 249972 [details]
dmesg-2024-04-14-1136
Comment 39 Bjoern A. Zeeb freebsd_committer freebsd_triage 2024-04-14 19:37:55 UTC
(In reply to Jonathan Vasquez from comment #37)

First thanks for all the testing and reporting back.  Sounds like good news.

When using the suspend/resume are you using the workaround from (Comment #27) https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=263632#c27 ? I assume not given you are manually restarting afterwards?

I wonder if these days, rather than doing a full service restart, it would be sufficient to do an ifconfig wlan0 down / ifconfig wlan0 up instead?

As it goes for more speed,  ath/iwn/run/rtwn/.. do 11n;  I hope we'll get back to that for iwlwifi and rtw88  (and other LinuxKPI based drivers) in the next weeks.
Comment 40 Jonathan Vasquez 2024-04-14 20:22:46 UTC
You're welcome Bjoern.

I haven't used the automatic workaround but good call out since I can use that in the meantime. I did do something like that in the past when I was messing with devd and wifibox. I'll test the simple up/down and see what happens.
Comment 41 Jonathan Vasquez 2024-04-14 20:52:21 UTC
I tested the `ifconfig wlan0 down` and `ifconfig wlan0 up` approach but it didn't work unfortunately, it just stayed like this:

wlan0: flags=8c43<UP,BROADCAST,RUNNING,DRV_OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=0
	ether c8:15:4e:aa:6d:92
	inet 192.168.1.102 netmask 0xffffff00 broadcast 192.168.1.255
	groups: wlan
	ssid "" channel 40 (5200 MHz 11a)
	regdomain FCC country US authmode WPA1+WPA2/802.11i privacy ON
	deftxkey UNDEF AES-CCM 2:128-bit txpower 17 bmiss 7 mcastrate 6
	mgmtrate 6 scanvalid 60 wme roaming MANUAL
	parent interface: iwlwifi0
	media: IEEE 802.11 Wireless Ethernet autoselect (autoselect)
	status: no carrier
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>

However I was able to get more log output in dmesg after I did that. The output is attached as 'dmesg-2024-04-14-1650'. I tried the above up/down two times but neither worked. If you look for "iwlwifi0: Error sending TXPATH_FLUSH: time out after 2000ms.", you can see the segments when I did "down". The full service restart fixed it again.
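
For scripting a check like this, the status line can be pulled out of the ifconfig output with a small filter. This is a sketch under assumptions: the function name `wlan_status` is made up, and it only reads text in the format shown above.

```shell
#!/bin/sh
# wlan_status: read `ifconfig <iface>` output on stdin and print whatever
# follows "status:" (for example "associated" or "no carrier").
wlan_status() {
    sed -n 's/^[[:space:]]*status:[[:space:]]*//p'
}

# Intended use on the affected machine (not run here):
#   ifconfig wlan0 | wlan_status
```

Note that earlier comments saw "associated" while nothing could be pinged, so the status alone is not proof of a working connection and would need to be paired with a ping.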
Comment 42 Jonathan Vasquez 2024-04-14 20:52:39 UTC
Created attachment 249978 [details]
dmesg-2024-04-14-1650
Comment 43 Bjoern A. Zeeb freebsd_committer freebsd_triage 2024-04-15 15:49:16 UTC
(In reply to Jonathan Vasquez from comment #41)

Thanks for testing the down/up as well.
Comment 44 Graham Perrin 2024-11-27 06:06:38 UTC
Created attachment 255483 [details]
recent iwlwifi lines only from /var/log/messages

In this attachment: I assume that all lines at Nov 27 05:20:08 
were a symptom of this bug 263632. If not, I can make a separate report. 

Before working around: 


% date ; uptime
Wed 27 Nov 2024 05:33:44 GMT
 5:33a.m.  up  4:31, 7 users, load averages: 2.49, 2.55, 1.88
% grep suspend /var/log/messages
Nov 26 08:10:07 mowa219-gjp4-zbook-freebsd acpi[18334]: suspend at 20241126 08:10:07
Nov 27 05:19:33 mowa219-gjp4-zbook-freebsd acpi[65154]: suspend at 20241127 05:19:33
% ifconfig wlan1 | grep -A 6 authmode
        regdomain ETSI country GB authmode WPA2/802.11i privacy ON
        deftxkey UNDEF TKIP 2:128-bit txpower 17 bmiss 7 mcastrate 6
        mgmtrate 6 scanvalid 60 wme roaming MANUAL
        parent interface: iwlwifi0
        media: IEEE 802.11 Wireless Ethernet OFDM/54Mbps mode 11a
        status: associated
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
% route show default
   route to: default
destination: default
       mask: default
    gateway: 192.168.1.1
        fib: 0
  interface: wlan1
      flags: <UP,GATEWAY,DONE,STATIC>
 recvpipe  sendpipe  ssthresh  rtt,msec    mtu        weight    expire
       0         0         0         0      1500         1         0 
% ping -4 -c 2 freshports.org
ping: cannot resolve freshports.org: Host name lookup failure
% 


A workaround, which I'll note separately (it's unusual), was performed at 05:35.
Comment 45 Graham Perrin 2024-11-27 06:18:01 UTC
Below, the 05:35 workaround and other relevant information. 

(The longhand workaround suits me for reasons that are off-topic from 263632.)

% su -
Password:
root@mowa219-gjp4-zbook-freebsd:~ # route delete default ; ifconfig gif0 down ; service netif stop em0 > & /dev/null ; ifconfig wlan0 destroy ; ifconfig wlan1 destroy ; sleep 1 ; service netif start wlan1 > & /dev/null ; sleep 15 ; resolvconf -i ; route show default ; ping -4 -c 2 freshports.org
delete net default
ifconfig: interface wlan0 does not exist
wlan1 
   route to: default
destination: default
       mask: default
    gateway: 192.168.1.1
        fib: 0
  interface: wlan1
      flags: <UP,GATEWAY,DONE,STATIC>
 recvpipe  sendpipe  ssthresh  rtt,msec    mtu        weight    expire
       0         0         0         0      1500         1         0 
PING freshports.org (54.227.255.74): 56 data bytes
64 bytes from 54.227.255.74: icmp_seq=0 ttl=52 time=91.210 ms
64 bytes from 54.227.255.74: icmp_seq=1 ttl=52 time=91.527 ms

--- freshports.org ping statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 91.210/91.369/91.527/0.159 ms
root@mowa219-gjp4-zbook-freebsd:~ # uname -bmvKU
FreeBSD 15.0-CURRENT main-n273879-3f0289ea7f66 GENERIC-NODEBUG amd64 1500028 1500028 1910200f0b8b51e4e493dbf2e93bd1589b540b29
root@mowa219-gjp4-zbook-freebsd:~ # 


% pciconf -lv | grep -A 3 iwlwifi
iwlwifi0@pci0:61:0:0:   class=0x028000 rev=0x6b hdr=0x00 vendor=0x8086 device=0x08b1 subvendor=0x8086 subdevice=0xc060
    vendor     = 'Intel Corporation'
    device     = 'Wireless 7260'
    class      = network
% sysrc devmatch_blocklist
devmatch_blocklist: i915kms if_iwm
% sysrc kld_list
kld_list: fusefs filemon nvidia-modeset
% grep -B 4 -A 1 ifconfig_wlan1 /etc/rc.conf

wlans_iwlwifi0="wlan1"
create_args_wlan1="wlanmode sta country GB regdomain etsi"
# create_args_wlan1="country GB regdomain etsi"
ifconfig_wlan1="WPA DHCP"

%