Bug 269824 - iwlwifi sporadic loss of connectivity, then kernel panic on service netif restart
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 13.2-RELEASE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: freebsd-wireless (Nobody)
Keywords: crash, needs-qa
Blocks: iwlwifi
Reported: 2023-02-25 11:48 UTC by Maxim Usatov
Modified: 2024-02-19 16:51 UTC

See Also:
bz: mfc-stable14+
bz: mfc-stable13+


Description Maxim Usatov 2023-02-25 11:48:26 UTC
Experiencing sporadic loss of wireless connectivity; ifconfig just shows "" for the ssid. Running service netif restart then results in a kernel panic.

Relevant logs:
Feb 25 11:37:58 freebsd devd[2204]: Processing event '!system=IFNET subsystem=wlan0 type=LINK_DOWN'
Feb 25 11:37:58 freebsd kernel: wlan0: link state changed to DOWN
Feb 25 11:37:58 freebsd wpa_supplicant[337]: wlan0: CTRL-EVENT-DISCONNECTED bssid=28:b3:71:17:08:98 reason=0
Feb 25 11:37:58 freebsd wpa_supplicant[337]: ioctl[SIOCS80211, op=20, val=0, arg_len=7]: Can't assign requested address
Feb 25 11:38:31 freebsd ntpd[2586]: Soliciting pool server 162.159.200.1
Feb 25 11:38:49 freebsd dbus-daemon[2810]: [session uid=1001 pid=2808] Activating service name='org.xfce.Xfconf' requested by ':1.11' (uid=1001 pid=2841 comm="")
Feb 25 11:38:49 freebsd dbus-daemon[2810]: [session uid=1001 pid=2808] Successfully activated service 'org.xfce.Xfconf'
Feb 25 11:39:11 freebsd devd[2204]: Processing event '!system=DEVFS subsystem=CDEV type=CREATE cdev=pts/0'
Feb 25 11:39:19 freebsd devd[2204]: Processing event '!system=IFNET subsystem=lo0 type=LINK_DOWN'
Feb 25 11:39:19 freebsd kernel: lo0: link state changed to DOWN
Feb 25 11:39:19 freebsd dhclient[1833]: My address (192.168.0.33) was deleted, dhclient exiting
Feb 25 11:39:19 freebsd wpa_supplicant[337]: wlan0: CTRL-EVENT-DSCP-POLICY clear_all
Feb 25 11:39:19 freebsd syslogd: last message repeated 1 times
Feb 25 11:39:19 freebsd wpa_supplicant[337]: wlan0: CTRL-EVENT-TERMINATING 
Feb 25 11:39:19 freebsd dhclient[1833]: connection closed
Feb 25 11:39:19 freebsd dhclient[1833]: exiting.
Feb 25 11:40:49 freebsd syslogd: restart
Feb 25 11:40:49 freebsd syslogd: kernel boot file is /boot/kernel/kernel
Feb 25 11:40:49 freebsd kernel: iwlwifi0: iwl_trans_send_cmd bad state = 0
Feb 25 11:40:49 freebsd kernel: iwlwifi0: Failed to remove MAC context: -5
Feb 25 11:40:49 freebsd kernel: iwlwifi0: iwl_trans_send_cmd bad state = 0
Feb 25 11:40:49 freebsd kernel: iwlwifi0: Failed to synchronize multicast groups update
Feb 25 11:40:49 freebsd kernel: 
Feb 25 11:40:49 freebsd syslogd: last message repeated 1 times
Feb 25 11:40:49 freebsd kernel: Fatal trap 12: page fault while in kernel mode
Feb 25 11:40:49 freebsd kernel: cpuid = 6; apic id = 18
Feb 25 11:40:49 freebsd kernel: fault virtual address	= 0x440
Feb 25 11:40:49 freebsd kernel: fault code		= supervisor read data, page not present
Feb 25 11:40:49 freebsd kernel: instruction pointer	= 0x20:0xffffffff80bf90ce
Feb 25 11:40:49 freebsd kernel: stack pointer	        = 0x28:0xfffffe013038f9e0
Feb 25 11:40:49 freebsd kernel: frame pointer	        = 0x28:0xfffffe013038fa60
Feb 25 11:40:49 freebsd kernel: code segment		= base rx0, limit 0xfffff, type 0x1b
Feb 25 11:40:49 freebsd kernel: 			= DPL 0, pres 1, long 1, def32 0, gran 1
Feb 25 11:40:49 freebsd kernel: processor eflags	= interrupt enabled, resume, IOPL = 0
Feb 25 11:40:49 freebsd kernel: current process		= 19185 (ifconfig)
Feb 25 11:40:49 freebsd kernel: trap number		= 12
Feb 25 11:40:49 freebsd kernel: panic: page fault
Feb 25 11:40:49 freebsd kernel: cpuid = 6
Feb 25 11:40:49 freebsd kernel: time = 1677325160
Feb 25 11:40:49 freebsd kernel: KDB: stack backtrace:
Feb 25 11:40:49 freebsd kernel: #0 0xffffffff80c694c5 at kdb_backtrace+0x65
Feb 25 11:40:49 freebsd kernel: #1 0xffffffff80c1bb7f at vpanic+0x17f
Feb 25 11:40:49 freebsd kernel: #2 0xffffffff80c1b9f3 at panic+0x43
Feb 25 11:40:49 freebsd kernel: #3 0xffffffff810afdf5 at trap_fatal+0x385
Feb 25 11:40:49 freebsd kernel: #4 0xffffffff810afe4f at trap_pfault+0x4f
Feb 25 11:40:49 freebsd kernel: #5 0xffffffff810875d8 at calltrap+0x8
Feb 25 11:40:49 freebsd kernel: #6 0xffffffff80d9ccc3 at ieee80211_node_psq_drain+0xf3
Feb 25 11:40:49 freebsd kernel: #7 0xffffffff80d90d57 at node_cleanup+0xa7
Feb 25 11:40:49 freebsd kernel: #8 0xffffffff80d90c75 at node_free+0x25
Feb 25 11:40:49 freebsd kernel: #9 0xffffffff80d9138b at ieee80211_node_vdetach+0x2b
Feb 25 11:40:49 freebsd kernel: #10 0xffffffff80d68d7c at ieee80211_vap_detach+0x44c
Feb 25 11:40:49 freebsd kernel: #11 0xffffffff80e55ace at lkpi_ic_vap_delete+0x9e
Feb 25 11:40:49 freebsd kernel: #12 0xffffffff80d35421 at if_clone_destroyif+0x1c1
Feb 25 11:40:49 freebsd kernel: #13 0xffffffff80d35206 at if_clone_destroy+0x196
Feb 25 11:40:49 freebsd kernel: #14 0xffffffff80d3256b at ifioctl+0x32b
Feb 25 11:40:49 freebsd kernel: #15 0xffffffff80c8982b at kern_ioctl+0x25b
Feb 25 11:40:49 freebsd kernel: #16 0xffffffff80c89531 at sys_ioctl+0xf1
Feb 25 11:40:49 freebsd kernel: #17 0xffffffff810b06ec at amd64_syscall+0x10c
Feb 25 11:40:49 freebsd kernel: Uptime: 2h13m11s

My laptop hwprobe: https://bsd-hardware.info/?probe=b7a491a010

FreeBSD 13.1-RELEASE-p6 GENERIC
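
For anyone triaging a similar report: the frames in a backtrace like the one above can be pulled out of a saved copy of /var/log/messages with a few lines of Python. This is only a sketch. The regular expression assumes the "#N 0xADDR at func+0xOFF" layout seen in this log, and parse_backtrace and first_frame_after_trap are names made up for this example.

```python
import re

# Matches the frame lines printed after "KDB: stack backtrace:", e.g.
# "... kernel: #6 0xffffffff80d9ccc3 at ieee80211_node_psq_drain+0xf3"
FRAME_RE = re.compile(r"#(\d+) (0x[0-9a-f]+) at (\w+)\+(0x[0-9a-f]+)")


def parse_backtrace(log_text):
    """Return a list of (frame_number, function, offset) tuples."""
    frames = []
    for line in log_text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            frames.append((int(m.group(1)), m.group(3), int(m.group(4), 16)))
    return frames


def first_frame_after_trap(frames):
    """Return the function in the frame right after calltrap, or None."""
    for i, (_, func, _) in enumerate(frames):
        if func == "calltrap" and i + 1 < len(frames):
            return frames[i + 1][1]
    return None


# Four lines copied from the log in this report.
SAMPLE = """\
Feb 25 11:40:49 freebsd kernel: #4 0xffffffff810afe4f at trap_pfault+0x4f
Feb 25 11:40:49 freebsd kernel: #5 0xffffffff810875d8 at calltrap+0x8
Feb 25 11:40:49 freebsd kernel: #6 0xffffffff80d9ccc3 at ieee80211_node_psq_drain+0xf3
Feb 25 11:40:49 freebsd kernel: #7 0xffffffff80d90d57 at node_cleanup+0xa7
"""

if __name__ == "__main__":
    print(first_frame_after_trap(parse_backtrace(SAMPLE)))
```

Run against the log above, this points at ieee80211_node_psq_drain, the frame directly after calltrap.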
Comment 1 Graham Perrin freebsd_committer freebsd_triage 2023-02-25 16:49:18 UTC
From <https://bsd-hardware.info/?probe=b7a491a010#pci:8086-51f0-8086-0090>: 

> … 8086:51f0:8086:0090 … Alder Lake-P PCH CNVi WiFi
Comment 2 Ihor Antonov 2023-02-25 19:08:32 UTC
I occasionally have the same/similar problem: sporadic loss of connectivity, and also
unreliable establishment of the connection. It can take 5-10 runs of
"service netif restart" before a connection is successfully established.

I am on 14-Current,
TigerLake laptop with AX201 wifi chip

Panic also happens sometimes after resume from sleep, when I run "service netif restart"
Comment 3 Maxim Usatov 2023-04-12 07:41:10 UTC
Upgraded to FreeBSD 13.2. Same occasional kernel panics.
Comment 4 Bjoern A. Zeeb freebsd_committer freebsd_triage 2023-09-30 08:46:18 UTC
If you can try main, please update to or past the revision mentioned in:
https://lists.freebsd.org/archives/freebsd-wireless/2023-September/001441.html
Comment 5 Eirik Oeverby 2023-10-06 08:52:54 UTC
(In reply to Bjoern A. Zeeb from comment #4)
I'm on releng/14, has this hit that branch yet? I can't take this laptop to -current at the moment, sorry - but I'm seeing all the panics/spontaneous reboots and random network disconnects that others are reporting.
Comment 6 Bjoern A. Zeeb freebsd_committer freebsd_triage 2023-10-06 10:05:45 UTC
(In reply to Eirik Oeverby from comment #5)

No, not everything has hit stable/14 yet.  I hope it will within the next few hours.  I got interrupted twice yesterday, losing network for a few hours.
Comment 7 Eirik Oeverby 2023-10-08 10:18:33 UTC
(In reply to Bjoern A. Zeeb from comment #6)
As a follow-up, it seems like I'll lose wifi connectivity anything from every 10-15 minutes to (sometimes) every couple of hours. I'm not able to pinpoint anything in particular causing it. My IRC log tells me I'm disconnecting every 10-20 minutes during a long period of inactivity (as in, computer put aside, but not going to sleep). If I wait long enough it will reconnect, but running `ifconfig wlan0 scan` resolves the issue temporarily.
Nothing in dmesg.
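
The manual workaround described above (issue a scan once the link has dropped) could be scripted roughly as follows. This is a sketch under assumptions: it relies on ifconfig wlan0 printing a "status:" line that reads "associated" while connected, and wlan_status, needs_rescan and rescan_if_down are names made up for this example. It only automates the workaround and does nothing about the underlying driver problem.

```python
import re
import subprocess


def wlan_status(ifconfig_output):
    """Return the text after 'status:' in ifconfig output, or None if absent."""
    m = re.search(r"^\s*status:\s*(.+?)\s*$", ifconfig_output, re.MULTILINE)
    return m.group(1) if m else None


def needs_rescan(ifconfig_output):
    """True when the interface does not report itself as associated."""
    return wlan_status(ifconfig_output) != "associated"


def rescan_if_down(iface="wlan0"):
    """Run 'ifconfig <iface> scan' if the interface is not associated.

    Needs root privileges; returns True if a scan was triggered.
    """
    out = subprocess.run(["ifconfig", iface], capture_output=True,
                         text=True, check=False).stdout
    if needs_rescan(out):
        subprocess.run(["ifconfig", iface, "scan"], check=False)
        return True
    return False
```

Calling rescan_if_down() from cron or a small loop every minute or so would mimic the manual procedure.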
Comment 8 Bjoern A. Zeeb freebsd_committer freebsd_triage 2023-10-08 12:34:39 UTC
(In reply to Eirik Oeverby from comment #7)

So for now most changes are in stable/14 -- you probably noticed.

Questions:

(0) which chipset/firmware version is this?

(1) concerning the topic: "sporadic loss of connectivity" still there;  panics are gone?  -- there's one known issue to cause a panic still (tracked elsewhere).

(2) Can you increase wpa_supplicant logging (-ddd) and see if anything shows up in the logs that gives you a reason?

(3)  Out of fun and curiosity, what happens if you let a ping to your gateway run while otherwise "idle" (laptop put away)?
Comment 9 Eirik Oeverby 2023-10-08 14:36:55 UTC
(In reply to Bjoern A. Zeeb from comment #8)
0: Is this enough?
iwlwifi0: <iwlwifi> mem 0xea338000-0xea33bfff at device 20.3 on pci0
iwlwifi0: successfully loaded firmware image 'iwlwifi-QuZ-a0-hr-b0-73.ucode'
iwlwifi0: api flags index 2 larger than supported by driver
iwlwifi0: TLV_FW_FSEQ_VERSION: FSEQ Version: 89.3.35.37
iwlwifi0: loaded firmware version 73.35c0a2c6.0 QuZ-a0-hr-b0-73.ucode op_mode iwlmvm
iwlwifi0: Detected Intel(R) Wi-Fi 6 AX201 160MHz, REV=0x351
uhub0: 18 ports with 18 removable, self powered
iwlwifi0: Detected RF HR B3, rfid=0x10a100

1: Not entirely, I have had spontaneous reboots on resume or even when issuing a scan or restarting netif - but no record of them since I'm in X when it happens and it always just boots. It didn't use to do that, not sure if it's a hardware reset or something I can prevent through some sysctl. What hasn't happened (yet) is that I need to hard-reset the machine; this sometimes happened on netif restart, especially related to being associated with an iPhone hotspot - usually afterwards, when trying to associate with something else.

2: I've done that now; where should I expect the logs? I see a bit in /var/log/messages so I'll get back with info if I get any.

3: I'll let you know.

PS: Whenever it reconnects (following a manual scan), speed tends to drop. I usually find myself at the lowest possible speed, like 11 or 6Mbps or whatever it is. Hideously slow, anyway. A netif restart gets me back to 54 for a bit.

/Eirik
Comment 10 Eirik Oeverby 2023-10-08 22:39:48 UTC
(In reply to Bjoern A. Zeeb from comment #8)
3: I get "Network is down" when it drops off - which it still does. Also wildly varying response times; periods of ~1-2ms, then a few hundred, then bursts of 1-10 seconds delay (but not necessarily any loss), then back to a few dozen or whatnot.
The node I'm pinging is my freebsd firewall, and there's no other traffic to speak of on the network.
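
To put numbers on a pattern like this, the ping output can be saved to a file and summarized afterwards. A minimal sketch, assuming the usual "time=... ms" reply lines and an error line containing "Network is down"; summarize_ping, the 100 ms threshold and the sample address 192.168.0.1 are made up for this example.

```python
import re

TIME_RE = re.compile(r"time=([0-9.]+) ms")


def summarize_ping(lines, slow_ms=100.0):
    """Count replies, slow replies and 'Network is down' errors in ping output."""
    rtts = []
    down = 0
    for line in lines:
        m = TIME_RE.search(line)
        if m:
            rtts.append(float(m.group(1)))
        elif "Network is down" in line:
            down += 1
    return {
        "replies": len(rtts),
        "slow": sum(1 for r in rtts if r >= slow_ms),
        "down_errors": down,
        "max_ms": max(rtts) if rtts else None,
    }


# Illustrative lines only, not taken from a real capture.
SAMPLE = [
    "64 bytes from 192.168.0.1: icmp_seq=0 ttl=64 time=1.532 ms",
    "64 bytes from 192.168.0.1: icmp_seq=1 ttl=64 time=312.004 ms",
    "ping: sendto: Network is down",
    "64 bytes from 192.168.0.1: icmp_seq=3 ttl=64 time=2047.118 ms",
]

if __name__ == "__main__":
    print(summarize_ping(SAMPLE))
```

Feeding it the lines of a saved log (for example from open("ping.log")) gives a quick count of drops versus slow replies.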
Comment 11 rkoberman 2023-10-09 04:30:07 UTC
I'm also seeing this. Restarting the interface often causes a panic. If I have X running, the system freezes, but I exited X and restarted the interface and it crashed. Looks like the failure locks up the graphics. I then started over, exited X and restarted the interface. Panic immediately after "Starting wpa_supplicant."
Sun Oct  8 17:37:18 PDT 2023

FreeBSD ptavv 15.0-CURRENT FreeBSD 15.0-CURRENT #7 main-n265807-04c8bfc17610: Sat Oct  7 23:34:33 PDT 2023     root@ptavv:/usr/obj/usr/src/amd64.amd64/sys/GENERIC  amd64

The driver calls the interface an AX211, but pciconf says it's an Alder Lake-P PCH CNVi WiFi.

Here is the start of core.txt. I can attach the full file (166K), if it would help.

panic: lkpi_sta_auth_to_scan: lsta 0xfffff8000bb92000 state not NONE: 0, nstate 1 arg 1

Unread portion of the kernel message buffer:
<6>wlan0: ieee80211_new_state_locked: pending SCAN -> AUTH transition lost
<4>Invalid TXQ id
WARNING !drm_modeset_is_locked(&crtc->mutex) failed at /usr/obj/usr/src/amd64.amd64/sys/GENERIC/usr/ports/graphics/drm-515-kmod/work/drm-kmod-drm_v5.15.25_5/drivers/gpu/drm/drm_atomic_helper.$
WARNING !drm_modeset_is_locked(&crtc->mutex) failed at /usr/obj/usr/src/amd64.amd64/sys/GENERIC/usr/ports/graphics/drm-515-kmod/work/drm-kmod-drm_v5.15.25_5/drivers/gpu/drm/drm_atomic_helper.$
WARNING !drm_modeset_is_locked(&crtc->mutex) failed at /usr/obj/usr/src/amd64.amd64/sys/GENERIC/usr/ports/graphics/drm-515-kmod/work/drm-kmod-drm_v5.15.25_5/drivers/gpu/drm/drm_atomic_helper.$
WARNING !drm_modeset_is_locked(&crtc->mutex) failed at /usr/obj/usr/src/amd64.amd64/sys/GENERIC/usr/ports/graphics/drm-515-kmod/work/drm-kmod-drm_v5.15.25_5/drivers/gpu/drm/drm_atomic_helper.$
WARNING !drm_modeset_is_locked(&dev->mode_config.connection_mutex) failed at /usr/obj/usr/src/amd64.amd64/sys/GENERIC/usr/ports/graphics/drm-515-kmod/work/drm-kmod-drm_v5.15.25_5/drivers/gpu/$


Fatal trap 9: general protection fault while in kernel mode
cpuid = 8; apic id = 18
instruction pointer     = 0x20:0xffffffff831d25d0
stack pointer           = 0x28:0xfffffe015adc85f0
frame pointer           = 0x28:0xfffffe015adc8630
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = resume, IOPL = 0
current process         = 0 (iwlwifi0 net80211 t)
rdi: 85f9894d80558b4c rsi: fffff8000bdcc028 rdx: fffff8000bdcc010
rcx: fffffe014884c3f8  r8: 85f9894d80558b14  r9: 0000000000000000
rax: fffffe015adc89c8 rbx: fffffe014884f2b0 rbp: fffffe015adc8630
r10: fffffe015bf91a72 r11: fffff80319cd8800 r12: fffff80049b92480
r13: fffffe014884c000 r14: fffff80319cd8800 r15: 0000000000000000
trap number             = 9
panic: general protection fault
cpuid = 8
time = 1696811656

KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe015adc8330
vpanic() at vpanic+0x132/frame 0xfffffe015adc8460
panic() at panic+0x43/frame 0xfffffe015adc84c0
trap_fatal() at trap_fatal+0x40c/frame 0xfffffe015adc8520
calltrap() at calltrap+0x8/frame 0xfffffe015adc8520
--- trap 0x9, rip = 0xffffffff831d25d0, rsp = 0xfffffe015adc85f0, rbp = 0xfffffe015adc8630 ---
intel_atomic_get_global_obj_state() at intel_atomic_get_global_obj_state+0x90/frame 0xfffffe015adc8630
skl_compute_wm() at skl_compute_wm+0xaec/frame 0xfffffe015adc8850
intel_atomic_check() at intel_atomic_check+0xeff/frame 0xfffffe015adc8920
drm_atomic_check_only() at drm_atomic_check_only+0x4a3/frame 0xfffffe015adc8990
drm_atomic_commit() at drm_atomic_commit+0x13/frame 0xfffffe015adc89b0
drm_client_modeset_commit_atomic() at drm_client_modeset_commit_atomic+0x158/frame 0xfffffe015adc8a20
drm_client_modeset_commit_locked() at drm_client_modeset_commit_locked+0x74/frame 0xfffffe015adc8a70
drm_client_modeset_commit() at drm_client_modeset_commit+0x21/frame 0xfffffe015adc8a90
drm_fb_helper_restore_fbdev_mode_unlocked() at drm_fb_helper_restore_fbdev_mode_unlocked+0x83/frame 0xfffffe015adc8ac0
vt_kms_postswitch() at vt_kms_postswitch+0x181/frame 0xfffffe015adc8af0
vt_window_switch() at vt_window_switch+0x25e/frame 0xfffffe015adc8b30
vtterm_cngrab() at vtterm_cngrab+0x4f/frame 0xfffffe015adc8b50
cngrab() at cngrab+0x26/frame 0xfffffe015adc8b70
vpanic() at vpanic+0xd1/frame 0xfffffe015adc8ca0
panic() at panic+0x43/frame 0xfffffe015adc8d00
lkpi_sta_auth_to_scan() at lkpi_sta_auth_to_scan+0x2a7/frame 0xfffffe015adc8d80
lkpi_iv_newstate() at lkpi_iv_newstate+0x253/frame 0xfffffe015adc8df0
ieee80211_newstate_cb() at ieee80211_newstate_cb+0x1e7/frame 0xfffffe015adc8e40
taskqueue_run_locked() at taskqueue_run_locked+0xab/frame 0xfffffe015adc8ec0
taskqueue_thread_loop() at taskqueue_thread_loop+0xd3/frame 0xfffffe015adc8ef0
fork_exit() at fork_exit+0x82/frame 0xfffffe015adc8f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe015adc8f30
--- trap 0x74117411, rip = 0x174b174b174b174b, rsp = 0x9f3c9f3c9f3c9f3c, rbp = 0xa1f0a1f0a1f0a1f
Comment 12 Bjoern A. Zeeb freebsd_committer freebsd_triage 2023-10-09 09:15:54 UTC
(In reply to rkoberman from comment #11)

If you scroll down to the dmesg section in your core.txt, what's before the
"wlan0: ieee80211_new_state_locked: pending SCAN -> AUTH transition lost" line?

Alternatively can you email me the entire core.txt to bz@ privately?
Comment 13 Bjoern A. Zeeb freebsd_committer freebsd_triage 2023-10-09 09:17:56 UTC
(In reply to Eirik Oeverby from comment #10)

Thanks Eirik.  I'll email you offline.
Comment 14 Bjoern A. Zeeb freebsd_committer freebsd_triage 2023-10-25 21:28:27 UTC
If you two can still reproduce this, can you please see comment 25 from PR 271979 and give that change a try (and see if we can confirm the cause)?

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=271979#c25
Comment 15 Bjoern A. Zeeb freebsd_committer freebsd_triage 2024-02-19 16:51:25 UTC
Both the original node_free (see also  PR 273985) and the scan_to_auth panics (and netif restart) are believed to be fixed and should not be seen in 15/14/13/13.3 from RC1 on anymore.

I'll close this.  In case you see them still after updating, please re-open!
In case things work, kindly leave a note for history here :)

Thanks for reporting, your patience and all the testing!