Bug 273985 - iwlwifi: AX201: crash (kernel panic) on ifconfig wlan0 destroy (ieee80211_node_vdetach / node_free / _ieee80211_free_node)
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 13.2-RELEASE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: Bjoern A. Zeeb
URL:
Keywords: crash, needs-qa
Depends on:
Blocks: iwlwifi
Reported: 2023-09-20 19:19 UTC by Maxim Filimonov
Modified: 2024-02-19 17:06 UTC

See Also:
bz: mfc-stable14+
bz: mfc-stable13+


Attachments
a more debug backtrace (17.74 KB, text/plain)
2023-09-23 16:55 UTC, Maxim Filimonov
core.txt for the aforementioned backtrace (74.97 KB, application/gzip)
2023-09-23 17:17 UTC, Maxim Filimonov
Picture of kernel panic on driver unload (581.15 KB, image/jpeg)
2024-01-07 08:01 UTC, mmatalka

Description Maxim Filimonov 2023-09-20 19:19:40 UTC
FreeBSD-13.2-RELEASE-p3. amd64. ThinkPad T14s gen 1.

As many people have mentioned, iwlwifi isn't that stable. In my case, __sometimes__ (more often after total connectivity loss or UDP failure) iwlwifi crashes the system on wlan0 destruction. Here's the backtrace:

(lldb) bt
* thread #1, name = '(pid 3984) ifconfig (crashed)'
  * frame #0: 0xffffffff80c065ae kernel`doadump + 46
    frame #1: 0xffffffff80c0638a kernel`kern_reboot + 1082
    frame #2: 0xffffffff80c0682e kernel`vpanic + 446
    frame #3: 0xffffffff80c06663 kernel`panic + 67
    frame #4: 0xffffffff810b1fa7 kernel`trap_fatal + 903
    frame #5: 0xffffffff810b1fff kernel`trap_pfault + 79
    frame #6: 0xffffffff81088ed8 kernel`calltrap + 8
    frame #7: 0xfffffe012388e1b0
    frame #8: 0xffffffff80d8f873 kernel`ieee80211_node_psq_drain + 243
    frame #9: 0xffffffff80d83907 kernel`node_cleanup + 167
    frame #10: 0xffffffff80d83825 kernel`node_free + 37
    frame #11: 0xffffffff80d83f3b kernel`ieee80211_node_vdetach + 43
    frame #12: 0xffffffff80d5bd7b kernel`ieee80211_vap_detach + 1099
    frame #13: 0xffffffff80e47d7b kernel`lkpi_ic_vap_delete + 171
    frame #14: 0xffffffff80d22931 kernel`ifc_simple_destroy_wrapper + 33
    frame #15: 0xffffffff80d21c59 kernel`if_clone_destroyif_flags + 201
    frame #16: 0xffffffff80d21b31 kernel`if_clone_destroy + 273
    frame #17: 0xffffffff80d1f07f kernel`ifioctl + 1775
    frame #18: 0xffffffff80c748fd kernel`kern_ioctl + 621
    frame #19: 0xffffffff80c745e0 kernel`sys_ioctl + 256
    frame #20: 0xffffffff810b289c kernel`amd64_syscall + 268
    frame #21: 0xffffffff810897eb kernel`fast_syscall_common + 248

It resembles what I've seen in related PRs, but it is not exactly the same.
Comment 1 Graham Perrin 2023-09-21 02:02:51 UTC
> ThinkPad T14s gen 1.

Can you describe the Wi-Fi hardware? 

Thanks, and then we can aim for bug 269842 and this bug (273985) to not have identical summary lines.
Comment 2 Maxim Filimonov 2023-09-21 08:31:58 UTC
The hardware is as follows:

iwlwifi0: Detected Intel(R) Wi-Fi 6 AX201 160MHz, REV=0x351

At least, that's what dmesg says about it.
Comment 3 Maxim Filimonov 2023-09-23 16:55:23 UTC
Created attachment 245166
a more debug backtrace

Here's a more detailed debug backtrace I collected recently. It looks like exactly the same core dump.

Note that it was caught on the following kernel:
% uname -v
FreeBSD 13.2-STABLE #2 stable/13-n256346-a4916232acd6-dirty: Thu Sep 21 02:20:19 +04 2023     root@hamster:/usr/obj/usr/src/amd64.amd64/sys/GENERIC
Comment 4 Maxim Filimonov 2023-09-23 17:17:32 UTC
Created attachment 245168
core.txt for the aforementioned backtrace

Forgot to attach the core.txt file.
Comment 5 Maxim Filimonov 2023-10-02 13:22:46 UTC
Will there be fixes for this in 13-STABLE, by the way?
Comment 6 Bjoern A. Zeeb freebsd_committer freebsd_triage 2023-10-02 13:45:05 UTC
(In reply to Maxim Filimonov from comment #5)

This one isn't fixed yet, but the intention is to get it all (driver update, LinuxKPI update, net80211 changes) merged to stable/14, and I also hope that it can all be merged smoothly to stable/13 too.  That should happen in the next few days.
Comment 7 Maxim Filimonov 2023-12-11 11:03:39 UTC
Any updates on this? It reproduces with the following version:
14.0-STABLE FreeBSD 14.0-STABLE #4 stable/14-n265760-ae8387cc818a
Comment 8 Cheng Cui freebsd_committer freebsd_triage 2023-12-11 19:15:49 UTC
Also hit this crash reliably.

root@n1_iwl_vm:~ # uname -a
FreeBSD n1_iwl_vm 15.0-CURRENT FreeBSD 15.0-CURRENT #19 main-488bc7e9a: Tue Nov 21 11:42:00 EST 2023     root@n1_iwl_vm:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64
root@n1_iwl_vm:~ # 
root@n1_iwl_vm:~ # ifconfig wlan0 destroy


Fatal trap 9: general protection fault while in kernel mode
cpuid = 0; apic id = 00
instruction pointer	= 0x20:0xffffffff80b2a5c8
stack pointer	        = 0x28:0xfffffe0074741a40
frame pointer	        = 0x28:0xfffffe0074741a80
code segment		= base rx0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 1139 (ifconfig)
rdi: deadc0dedeadc0f6 rsi: fffff8000598d740 rdx: 0000000000000000
rcx: 0000000000000865  r8: 0000000000000001  r9: 0000000000010000
rax: 0000000000000001 rbx: deadc0dedeadc10e rbp: fffffe0074741a80
r10: 0000000000000001 r11: 0000000000010000 r12: 0000000000000865
r13: deadc0dedeadc0de r14: deadc0dedeadc10e r15: ffffffff8116aaf8
trap number		= 9
panic: general protection fault
cpuid = 0
time = 1702316103
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0074741780
vpanic() at vpanic+0x132/frame 0xfffffe00747418b0
panic() at panic+0x43/frame 0xfffffe0074741910
trap_fatal() at trap_fatal+0x40c/frame 0xfffffe0074741970
calltrap() at calltrap+0x8/frame 0xfffffe0074741970
--- trap 0x9, rip = 0xffffffff80b2a5c8, rsp = 0xfffffe0074741a40, rbp = 0xfffffe0074741a80 ---
__mtx_lock_flags() at __mtx_lock_flags+0x48/frame 0xfffffe0074741a80
_ieee80211_free_node() at _ieee80211_free_node+0x34/frame 0xfffffe0074741ac0
ieee80211_node_vdetach() at ieee80211_node_vdetach+0x2b/frame 0xfffffe0074741ae0
ieee80211_vap_detach() at ieee80211_vap_detach+0x612/frame 0xfffffe0074741b20
lkpi_ic_vap_delete() at lkpi_ic_vap_delete+0xae/frame 0xfffffe0074741b50
wlan_clone_destroy() at wlan_clone_destroy+0x12/frame 0xfffffe0074741b60
if_clone_destroyif_flags() at if_clone_destroyif_flags+0x6a/frame 0xfffffe0074741ba0
if_clone_destroy() at if_clone_destroy+0x100/frame 0xfffffe0074741be0
ifioctl() at ifioctl+0x8a5/frame 0xfffffe0074741cd0
kern_ioctl() at kern_ioctl+0x286/frame 0xfffffe0074741d30
sys_ioctl() at sys_ioctl+0x152/frame 0xfffffe0074741e00
amd64_syscall() at amd64_syscall+0x153/frame 0xfffffe0074741f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe0074741f30
--- syscall (54, FreeBSD ELF64, ioctl), rip = 0x3c753c0388a, rsp = 0x3c74f4a9a38, rbp = 0x3c74f4a9a70 ---
KDB: enter: panic
[ thread pid 1139 tid 100090 ]
Stopped at      kdb_enter+0x32: movq    $0,0xe2ab23(%rip)
db> dump
Dumping 338 out of 6111 MB:..5%..15%..24%..34%..43%..53%..62%..71%..81%..95%
Dump complete
db>
Comment 9 Cheng Cui freebsd_committer freebsd_triage 2023-12-11 19:17:06 UTC
(In reply to Cheng Cui from comment #8)

Please let me know if the core file is needed, and I can upload it to freefall.
Comment 10 mmatalka 2024-01-07 07:58:31 UTC
I'm not sure if this is the right bug report, but I wanted to post that I'm on 15.0-CURRENT #6 main-n267044-eb4d13126d85 and sometimes experience a panic on suspend/resume.  It doesn't happen all the time.  My suspend/resume procedure involves unloading the wifi kernel module (I thought this might resolve the issue) and stopping the wlan0 interface.  No dice.
Comment 11 mmatalka 2024-01-07 08:01:54 UTC
Created attachment 247506
Picture of kernel panic on driver unload
Comment 12 mmatalka 2024-01-30 19:11:04 UTC
Just a small update, I'm on 15.0-CURRENT #9 main-n267794-72dd306e44bc and still hitting this.  It happens when I am suspending.  It's not entirely consistent, I can almost always suspend once successfully, sometimes twice, but almost definitely not three times.
Comment 13 mmatalka 2024-01-30 19:11:46 UTC
I'm also on 6.1lts for drm-kmod and latest drm-kmod-firmware
Comment 14 commit-hook freebsd_committer freebsd_triage 2024-02-14 19:50:16 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=0936c648ad0ee5152dc19f261e77fe9c1833fe05

commit 0936c648ad0ee5152dc19f261e77fe9c1833fe05
Author:     Bjoern A. Zeeb <bz@FreeBSD.org>
AuthorDate: 2024-02-05 14:51:08 +0000
Commit:     Bjoern A. Zeeb <bz@FreeBSD.org>
CommitDate: 2024-02-14 19:48:04 +0000

    LinuxKPI: 802.11: update the ni/lsta reference cycle

    Update the ni/lsta reference cycle, add extra checks and assertions.
    This is to accomodate problems we were seeing based on net80211
    behaviour (join1() and (*iv_update_bss)() as well as state changes for
    new iv_bss nodes during an active session).
    This should hopefully help to stabilise behaviour until the underlying
    problems gets properly addressed (for this and all other device drivers).

    PR:             272607, 273985, 274003
    MFC after:      3 days
    Reviewed by:    cc
    Differential Revision: https://reviews.freebsd.org/D43753

 sys/compat/linuxkpi/common/src/linux_80211.c | 209 +++++++++++++++++----------
 sys/compat/linuxkpi/common/src/linux_80211.h |   1 +
 2 files changed, 130 insertions(+), 80 deletions(-)
Comment 15 Maxim Filimonov 2024-02-17 22:52:37 UTC
Will this be MFCd into stable/14?
Comment 16 commit-hook freebsd_committer freebsd_triage 2024-02-18 21:12:01 UTC
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=12887199b37469c98a47baf66cd3cc182c79fbd6

commit 12887199b37469c98a47baf66cd3cc182c79fbd6
Author:     Bjoern A. Zeeb <bz@FreeBSD.org>
AuthorDate: 2024-02-05 14:51:08 +0000
Commit:     Bjoern A. Zeeb <bz@FreeBSD.org>
CommitDate: 2024-02-18 18:31:17 +0000

    LinuxKPI: 802.11: update the ni/lsta reference cycle

    Update the ni/lsta reference cycle, add extra checks and assertions.
    This is to accomodate problems we were seeing based on net80211
    behaviour (join1() and (*iv_update_bss)() as well as state changes for
    new iv_bss nodes during an active session).
    This should hopefully help to stabilise behaviour until the underlying
    problems gets properly addressed (for this and all other device drivers).

    PR:             272607, 273985, 274003
    Reviewed by:    cc
    Differential Revision: https://reviews.freebsd.org/D43753

    (cherry picked from commit 0936c648ad0ee5152dc19f261e77fe9c1833fe05)

 sys/compat/linuxkpi/common/src/linux_80211.c | 209 +++++++++++++++++----------
 sys/compat/linuxkpi/common/src/linux_80211.h |   1 +
 2 files changed, 130 insertions(+), 80 deletions(-)
Comment 17 Bjoern A. Zeeb freebsd_committer freebsd_triage 2024-02-18 21:14:34 UTC
(In reply to Maxim Filimonov from comment #15)

That just happened, along with a lot of other changes.
Comment 18 commit-hook freebsd_committer freebsd_triage 2024-02-19 08:09:11 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=223edc1a3c2fc86dbc7fa0ecd00f26a85d7c7b43

commit 223edc1a3c2fc86dbc7fa0ecd00f26a85d7c7b43
Author:     Bjoern A. Zeeb <bz@FreeBSD.org>
AuthorDate: 2024-02-05 14:51:08 +0000
Commit:     Bjoern A. Zeeb <bz@FreeBSD.org>
CommitDate: 2024-02-19 08:02:02 +0000

    LinuxKPI: 802.11: update the ni/lsta reference cycle

    Update the ni/lsta reference cycle, add extra checks and assertions.
    This is to accomodate problems we were seeing based on net80211
    behaviour (join1() and (*iv_update_bss)() as well as state changes for
    new iv_bss nodes during an active session).
    This should hopefully help to stabilise behaviour until the underlying
    problems gets properly addressed (for this and all other device drivers).

    PR:             272607, 273985, 274003
    Reviewed by:    cc
    Differential Revision: https://reviews.freebsd.org/D43753

    (cherry picked from commit 0936c648ad0ee5152dc19f261e77fe9c1833fe05)

 sys/compat/linuxkpi/common/src/linux_80211.c | 209 +++++++++++++++++----------
 sys/compat/linuxkpi/common/src/linux_80211.h |   1 +
 2 files changed, 130 insertions(+), 80 deletions(-)
Comment 19 mmatalka 2024-02-19 12:17:20 UTC
I've been using this patch all week and suspend/resume has not failed on me yet.  Great!
Comment 20 commit-hook freebsd_committer freebsd_triage 2024-02-19 16:10:44 UTC
A commit in branch releng/13.3 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=9b2da4bc5a68294bc1dcfdd0d0ccadf747bafd67

commit 9b2da4bc5a68294bc1dcfdd0d0ccadf747bafd67
Author:     Bjoern A. Zeeb <bz@FreeBSD.org>
AuthorDate: 2024-02-05 14:51:08 +0000
Commit:     Bjoern A. Zeeb <bz@FreeBSD.org>
CommitDate: 2024-02-19 16:09:22 +0000

    LinuxKPI: 802.11: update the ni/lsta reference cycle

    Update the ni/lsta reference cycle, add extra checks and assertions.
    This is to accomodate problems we were seeing based on net80211
    behaviour (join1() and (*iv_update_bss)() as well as state changes for
    new iv_bss nodes during an active session).
    This should hopefully help to stabilise behaviour until the underlying
    problems gets properly addressed (for this and all other device drivers).

    Approved by:    re (cperciva)
    PR:             272607, 273985, 274003
    Reviewed by:    cc
    Differential Revision: https://reviews.freebsd.org/D43753

    (cherry picked from commit 0936c648ad0ee5152dc19f261e77fe9c1833fe05)
    (cherry picked from commit 223edc1a3c2fc86dbc7fa0ecd00f26a85d7c7b43)

 sys/compat/linuxkpi/common/src/linux_80211.c | 209 +++++++++++++++++----------
 sys/compat/linuxkpi/common/src/linux_80211.h |   1 +
 2 files changed, 130 insertions(+), 80 deletions(-)
Comment 21 Bjoern A. Zeeb freebsd_committer freebsd_triage 2024-02-19 16:35:54 UTC
(In reply to mmatalka from comment #19)

Great.  Thanks a lot for the feedback!


To everyone:  thanks for reporting and testing!
I believe this is fixed in all branches now (15/14/13/13.3).
In case you hit it again, please re-open.
I annotated the title with the usual function names we go through so people searching can better find it.