(This is a laptop running a fresh -CURRENT; the device was:

iwlwifi0: <iwlwifi> mem 0xecc00000-0xecc01fff at device 0.0 on pci3
iwlwifi0: Detected crf-id 0xbadcafe, cnv-id 0x10 wfpm id 0x80000000
iwlwifi0: PCI dev 24fd/0010, rev=0x230, rfid=0xd55555d5
iwlwifi0: successfully loaded firmware image 'iwlwifi-8265-36.ucode'
iwlwifi0: loaded firmware version 36.ca7b901d.0 8265-36.ucode op_mode iwlmvm
iwlwifi0: Detected Intel(R) Dual Band Wireless AC 8265, REV=0x230

The kernel is built with WITNESS / INVARIANTS enabled.

It seems that the 802.11 stack was trying to transition from RUN to INIT, and the driver returned -EIO because the firmware reported ADD_STA_MODIFY_NON_EXISTING_STA (=0x8) in iwl_mvm_drain_sta().
)

Tue Nov 21 22:33:33 PST 2023

FreeBSD p51.home.us.delphij.net 15.0-CURRENT FreeBSD 15.0-CURRENT #1 main-n266520-f930dac6d584: Mon Nov 20 15:48:41 PST 2023 delphij@p51.home.us.delphij.net:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64

panic: INIT state change failed

GNU gdb (GDB) 13.2 [GDB v13.2 for FreeBSD]
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd15.0".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /boot/kernel/kernel...
Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug...

Unread portion of the kernel message buffer:
iwlwifi0: linuxkpi_ieee80211_connection_loss: vif 0xfffffe01773abc80 vap 0xfffffe01773ab010 state RUN
<6>wlan0: link state changed to DOWN
<118>Nov 21 22:32:11 p51 wpa_supplicant[423]: ioctl[SIOCS80211, op=20, val=0, arg_len=7]: Can't assign requested address
iwlwifi0: Couldn't drain frames for staid 0, status 0x8
iwlwifi0: lkpi_sta_run_to_init:1954: mo_sta_state(NOTEXIST) failed: -5
iwlwifi0: lkpi_iv_newstate: error -5 during state transition 5 (RUN) -> 0 (INIT)
Dumping 2446 out of 32422 MB: (CTRL-C to abort) (CTRL-C to abort) ..1% (CTRL-C to abort) (CTRL-C to abort) (CTRL-C to abort) ..11%..21%..31%..41%..51%..61%..71%..81%..91%

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
57 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) #0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
        td = <optimized out>
#1 doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:405
        error = 0
        coredump = <optimized out>
#2 0xffffffff85dee143 in vt_kms_postswitch () from /boot/modules/drm.ko
No symbol table info available.
#3 0xffffffff8099ac81 in vt_window_switch (vw=0xfffff8000233bd80, vw@entry=0xffffffff816a9c98 <vt_conswindow>)
    at /usr/src/sys/dev/vt/vt_core.c:612
        vd = 0xffffffff816a9de8 <vt_consdev>
        curvw = 0xfffff80006dfcd80
        kbd = <optimized out>
#4 0xffffffff8099bfdf in vtterm_cngrab (tm=<unavailable>, tm@entry=<error reading variable: value is not available>)
    at /usr/src/sys/dev/vt/vt_core.c:1863
        vw = 0xffffffff816a9c98 <vt_conswindow>
        vd = 0xffffffff816a9de8 <vt_consdev>
#5 0xffffffff80aeb106 in cngrab () at /usr/src/sys/kern/kern_cons.c:385
        cnd = 0xffffffff8196d7e0 <cn_devtab>
        cn = <unavailable>
#6 0xffffffff80b5bd7f in vpanic (fmt=0xffffffff8120c4c9 "INIT state change failed", ap=ap@entry=0xffffffff82761dd0)
    at /usr/src/sys/kern/kern_shutdown.c:942
        buf = "INIT state change failed", '\000' <repeats 231 times>
        __pc = <optimized out>
        __pc = <optimized out>
        __pc = <optimized out>
        other_cpus = {__bits = {127, 0 <repeats 15 times>}}
        td = 0xfffff80001f31000
        bootopt = 256
        newpanic = <optimized out>
#7 0xffffffff80b5bbf3 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:894
        ap = {{gp_offset = 8, fp_offset = 48, overflow_arg_area = 0xffffffff82761e00, reg_save_area = 0xffffffff82761da0}}
#8 0xffffffff80d104e1 in ieee80211_newstate_cb (xvap=0xfffffe01773ab010, npending=<optimized out>)
    at /usr/src/sys/net80211/ieee80211_proto.c:2552
        vap = 0xfffffe01773ab010
        ic = <optimized out>
        arg = 0
        ostate = IEEE80211_S_RUN
        rc = -5
        nstate = <optimized out>
#9 0xffffffff80bc1f8b in taskqueue_run_locked (queue=queue@entry=0xfffff800028c6000)
    at /usr/src/sys/kern/subr_taskqueue.c:512
        et = {et_link = {tqe_next = 0x0, tqe_prev = 0x8}, et_td = 0xffffffff811ba967, et_section = {bucket = 0}, et_old_priority = 0 '\000'}
        tb = {tb_running = 0xfffffe01773ab320, tb_seq = 25, tb_canceling = false, tb_link = {le_next = 0x0, le_prev = 0xfffff800028c6010}}
        in_net_epoch = false
        task = 0xfffffe01773ab320
        pending = 1
#10 0xffffffff80bc3043 in taskqueue_thread_loop (arg=arg@entry=0xfffffe0176a55110)
    at /usr/src/sys/kern/subr_taskqueue.c:824
        tqp = <optimized out>
        tq = 0xfffff800028c6000
#11 0xffffffff80b11372 in fork_exit (callout=0xffffffff80bc2f70 <taskqueue_thread_loop>, arg=0xfffffe0176a55110, frame=0xffffffff82761f40)
    at /usr/src/sys/kern/kern_fork.c:1160
        __pc = <optimized out>
        __pc = <optimized out>
        td = 0xfffff80001f31000
        p = 0xffffffff8196c4c0 <proc0>
        dtd = <optimized out>
#12 <signal handler called>
No locals.
#13 0x00001dd895cec5ba in ?? ()
No symbol table info available.
Backtrace stopped: Cannot access memory at address 0x1dd89daf8f48
(kgdb)

Despite looking different, this is a duplicate of 271979, with different states as a result of the node swap from net80211.

*** This bug has been marked as a duplicate of bug 271979 ***

I'll re-open this.

lkpi_sta_run_to_init() should not be affected by the problem in 271979 given it does not re-lookup ni -- at least not in my local tree.

Also, you made it all the way to mo_sta_state(NOTEXIST). That smells like a bss_conf update removed the sta for us. I've hit this before. Maybe only with pre-22000 cards?

Test case (likely to reproduce):
- get into RUN
- take your AP away
- wait for connection loss to happen
- that will trigger the RUN -> INIT newstate change

Things will proceed from that.

My 8265 is currently buried somewhere, but I'll go and give it a try in a few days, hopefully.

Can you reproduce this somehow?

(In reply to Bjoern A. Zeeb from comment #2)

Obviously we do hit the beacon loss first:

[2339.509731] iwlwifi0: linuxkpi_ieee80211_beacon_loss: vif 0xffff0001d6b09c80 vap 0xffff0001d6b09010 state RUN
[2339.612089] iwlwifi0: linuxkpi_ieee80211_beacon_loss: vif 0xffff0001d6b09c80 vap 0xffff0001d6b09010 state RUN
[2339.618158] wlan1: link state changed to DOWN

which will get us into sta_beacon_miss and switch to SCAN. And lkpi_sta_run_to_scan simply calls lkpi_sta_run_to_init.

It doesn't go kaboom on a modern card; in this case a B200.

I'll go and find my 8265 and try again in the next few days.

A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=713db49d06deee90dd358b2e4b9ca05368a5eaf6

commit 713db49d06deee90dd358b2e4b9ca05368a5eaf6
Author:     Bjoern A. Zeeb <bz@FreeBSD.org>
AuthorDate: 2024-01-10 10:14:16 +0000
Commit:     Bjoern A. Zeeb <bz@FreeBSD.org>
CommitDate: 2024-02-14 19:47:21 +0000

    net80211: deal with lost state transitions

    Since 5efea30f039c4 we can possibly lose a state transition which can cause trouble further down the road. The reproducer from 643d6dce6c1e can trigger these for example. Drivers for firmware based wireless cards have worked around some of this (and other) problems in the past.

    Add an array of tasks rather than a single one as we would simply get npending > 1 and lose order with other tasks. Try to keep state changes updated as queued in case we end up with more than one at a time. While this is not ideal either (call it a hack) it will sort the problem for now. We will queue in ieee80211_new_state_locked() and do checks there and dequeue in ieee80211_newstate_cb(). If we still overrun the (currently) 8 slots we will drop the state change rather than overwrite the last one. When dequeing we will update iv_nstate and keep it around for historic reasons for the moment.

    The longer term we should make the callers of ieee80211_new_state[_locked]() actually use the returned errors and act appropriately but that will touch a lot more places and drivers (possibly incl. changed behaviour for ioctls).

    rtwn(4) and rum(4) should probably be revisted and net80211 internals removed (for rum(4) at least the current logic still seems prone to races).

    PR:             271979, 271988, 275255, 263613, 274003
    Sponsored by:   The FreeBSD Foundation (in 2023)
    MFC after:      3 days
    Reviewed by:    cc
    Differential Revision: https://reviews.freebsd.org/D43389

 sys/dev/rtwn/if_rtwn.c         |   4 +-
 sys/dev/usb/wlan/if_rum.c      |   4 +-
 sys/net80211/ieee80211.c       |   4 +-
 sys/net80211/ieee80211_ddb.c   |  13 ++++-
 sys/net80211/ieee80211_proto.c | 124 ++++++++++++++++++++++++++++++++++-------
 sys/net80211/ieee80211_var.h   |  13 ++++-
 6 files changed, 134 insertions(+), 28 deletions(-)
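
For readers following along, the queueing behaviour described in the commit message above (a fixed number of slots, in-order dequeue, and dropping the new request rather than overwriting on overrun) can be sketched roughly as below. This is only an illustration under assumptions: every name (sk_state, sk_nstate_queue, sk_nstate_enqueue, sk_nstate_dequeue) is hypothetical, locking and the taskqueue are left out entirely, and it is not the code from the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All names here are hypothetical; this is not the net80211 code. */
#define SK_NSTATE_SLOTS	8	/* slot count taken from the commit message */

enum sk_state { SK_INIT, SK_SCAN, SK_AUTH, SK_ASSOC, SK_RUN };

struct sk_nstate_queue {
	enum sk_state	slots[SK_NSTATE_SLOTS];
	unsigned int	head;	/* index of the oldest queued change */
	unsigned int	count;	/* number of queued changes */
};

/*
 * Queue a state change.  On overrun the new request is dropped and the
 * already queued ones are left untouched.
 */
static bool
sk_nstate_enqueue(struct sk_nstate_queue *q, enum sk_state nstate)
{
	assert(q != NULL);
	if (q->count == SK_NSTATE_SLOTS)
		return (false);
	q->slots[(q->head + q->count) % SK_NSTATE_SLOTS] = nstate;
	q->count++;
	return (true);
}

/* Take the oldest queued state change, so submission order is kept. */
static bool
sk_nstate_dequeue(struct sk_nstate_queue *q, enum sk_state *nstate)
{
	assert(q != NULL && nstate != NULL);
	if (q->count == 0)
		return (false);
	*nstate = q->slots[q->head];
	q->head = (q->head + 1) % SK_NSTATE_SLOTS;
	q->count--;
	return (true);
}
```

In this sketch a caller would have to check the return value of sk_nstate_enqueue() to notice a dropped state change, which is the same concern the commit message raises about callers ignoring returned errors.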

A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=2ac8a2189ac6707f48f77ef2e36baf696a0d2f40

commit 2ac8a2189ac6707f48f77ef2e36baf696a0d2f40
Author:     Bjoern A. Zeeb <bz@FreeBSD.org>
AuthorDate: 2024-02-03 16:33:56 +0000
Commit:     Bjoern A. Zeeb <bz@FreeBSD.org>
CommitDate: 2024-02-14 19:47:53 +0000

    LinuxKPI: 802.11: band-aid for invalid state changes after (*iv_update_bss)

    With firmware based solutions we cannot just jump from an active session to a new iv_bss node without tearing down state for the old and bringing up the new node. This likely used to work on softmac based cards/drivers where one could essentially set the state and fire at will.

    We track (*iv_update_bss) calls from net80211 and set a local flag that we are out of synch and do not allow any further operations up the state machine until we hit INIT or SCAN. That means someone will take the state down, clean up firmware state and then we can join again and build up state.

    Apparently this problem has been "known" for a while as native iwm(4) and others have similar workarounds (though less strict) and can be equally pestered into bad states. For LinuxKPI all the KASSERTs just massively brought this problem out.

    The solution will be some rewrites in net80211. Until then, try to keep us more stable at least and not die on second join1() calls triggered by service netif start wlan0 and similar.

    PR:             271979, 271988, 275255, 263613, 274003
    Sponsored by:   The FreeBSD Foundation (2023, partial)
    MFC after:      3 days
    Reviewed by:    cc
    Differential Revision: https://reviews.freebsd.org/D43725

 sys/compat/linuxkpi/common/src/linux_80211.c | 309 +++++++++++++++++++--------
 sys/compat/linuxkpi/common/src/linux_80211.h |   2 +
 2 files changed, 216 insertions(+), 95 deletions(-)
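
The gating idea in the commit message above (mark the interface out of synch when the BSS node is swapped, refuse transitions up the state machine, and only recover once INIT or SCAN has been reached) might look roughly like the sketch below. Treat it as a simplified model under assumptions: the names (lk_state, lk_vif, lk_update_bss, lk_newstate) are hypothetical, EBUSY is chosen only for illustration, and the teardown to INIT or SCAN is assumed to succeed, which a real driver cannot take for granted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* All names here are hypothetical; this is not the LinuxKPI 802.11 code. */
enum lk_state { LK_INIT, LK_SCAN, LK_AUTH, LK_ASSOC, LK_RUN };

struct lk_vif {
	enum lk_state	state;
	bool		synched;	/* false once the BSS node was swapped */
};

/* Stand-in for the hook that runs when the stack swaps the BSS node. */
static void
lk_update_bss(struct lk_vif *vif)
{
	assert(vif != NULL);
	vif->synched = false;
}

static int
lk_newstate(struct lk_vif *vif, enum lk_state nstate)
{
	assert(vif != NULL);

	/*
	 * Going down to INIT or SCAN is always allowed.  The sketch assumes
	 * the teardown succeeds, so old firmware state is gone and we count
	 * as in synch again.
	 */
	if (nstate == LK_INIT || nstate == LK_SCAN) {
		vif->state = nstate;
		vif->synched = true;
		return (0);
	}

	/* Refuse to move up the state machine while out of synch. */
	if (!vif->synched)
		return (EBUSY);

	vif->state = nstate;
	return (0);
}
```

The design choice this models is that the driver never tries to patch up firmware state for a swapped node in place; it waits for a full teardown and then builds state up again from the bottom.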

A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=b392b36d3776b696601ce0253256803276d24ea2

commit b392b36d3776b696601ce0253256803276d24ea2
Author:     Bjoern A. Zeeb <bz@FreeBSD.org>
AuthorDate: 2024-01-10 10:14:16 +0000
Commit:     Bjoern A. Zeeb <bz@FreeBSD.org>
CommitDate: 2024-02-18 18:31:17 +0000

    net80211: deal with lost state transitions

    Since 5efea30f039c4 we can possibly lose a state transition which can cause trouble further down the road. The reproducer from 643d6dce6c1e can trigger these for example. Drivers for firmware based wireless cards have worked around some of this (and other) problems in the past.

    Add an array of tasks rather than a single one as we would simply get npending > 1 and lose order with other tasks. Try to keep state changes updated as queued in case we end up with more than one at a time. While this is not ideal either (call it a hack) it will sort the problem for now. We will queue in ieee80211_new_state_locked() and do checks there and dequeue in ieee80211_newstate_cb(). If we still overrun the (currently) 8 slots we will drop the state change rather than overwrite the last one. When dequeing we will update iv_nstate and keep it around for historic reasons for the moment.

    The longer term we should make the callers of ieee80211_new_state[_locked]() actually use the returned errors and act appropriately but that will touch a lot more places and drivers (possibly incl. changed behaviour for ioctls).

    rtwn(4) and rum(4) should probably be revisted and net80211 internals removed (for rum(4) at least the current logic still seems prone to races).

    Given this changes the internal structure of 'struct ieee80211vap', which gets allocated by the drivers, and we do not have enough spares, all wireless drivers need to be recompiled. Given we are forced to do the update, we leave fields in the middle of the struct and add more spares at the same time.

    __FreeBSD_version gets updated to 1400509 to be able to detect this change.

    PR:             271979, 271988, 275255, 263613, 274003
    Sponsored by:   The FreeBSD Foundation (in 2023)
    Reviewed by:    cc
    Differential Revision: https://reviews.freebsd.org/D43389

    (cherry picked from commit 713db49d06deee90dd358b2e4b9ca05368a5eaf6)
    (cherry picked from commit a890a3a5ddf33acb0a4000885945b89156799b07)

 UPDATING                       |   6 ++
 sys/dev/rtwn/if_rtwn.c         |   4 +-
 sys/dev/usb/wlan/if_rum.c      |   4 +-
 sys/net80211/ieee80211.c       |   4 +-
 sys/net80211/ieee80211_ddb.c   |  13 ++++-
 sys/net80211/ieee80211_proto.c | 124 ++++++++++++++++++++++++++++++++++-------
 sys/net80211/ieee80211_var.h   |  15 +++--
 sys/sys/param.h                |   2 +-
 8 files changed, 142 insertions(+), 30 deletions(-)

A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=8c450ea1083b03f30871506b59034f26bc608972

commit 8c450ea1083b03f30871506b59034f26bc608972
Author:     Bjoern A. Zeeb <bz@FreeBSD.org>
AuthorDate: 2024-02-03 16:33:56 +0000
Commit:     Bjoern A. Zeeb <bz@FreeBSD.org>
CommitDate: 2024-02-18 18:31:17 +0000

    LinuxKPI: 802.11: band-aid for invalid state changes after (*iv_update_bss)

    With firmware based solutions we cannot just jump from an active session to a new iv_bss node without tearing down state for the old and bringing up the new node. This likely used to work on softmac based cards/drivers where one could essentially set the state and fire at will.

    We track (*iv_update_bss) calls from net80211 and set a local flag that we are out of synch and do not allow any further operations up the state machine until we hit INIT or SCAN. That means someone will take the state down, clean up firmware state and then we can join again and build up state.

    Apparently this problem has been "known" for a while as native iwm(4) and others have similar workarounds (though less strict) and can be equally pestered into bad states. For LinuxKPI all the KASSERTs just massively brought this problem out.

    The solution will be some rewrites in net80211. Until then, try to keep us more stable at least and not die on second join1() calls triggered by service netif start wlan0 and similar.

    PR:             271979, 271988, 275255, 263613, 274003
    Sponsored by:   The FreeBSD Foundation (2023, partial)
    Reviewed by:    cc
    Differential Revision: https://reviews.freebsd.org/D43725

    (cherry picked from commit 2ac8a2189ac6707f48f77ef2e36baf696a0d2f40)

 sys/compat/linuxkpi/common/src/linux_80211.c | 309 +++++++++++++++++++--------
 sys/compat/linuxkpi/common/src/linux_80211.h |   2 +
 2 files changed, 216 insertions(+), 95 deletions(-)

A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=a7e1fc7f620d3341549c1380f550aaafbdb45622

commit a7e1fc7f620d3341549c1380f550aaafbdb45622
Author:     Bjoern A. Zeeb <bz@FreeBSD.org>
AuthorDate: 2024-01-10 10:14:16 +0000
Commit:     Bjoern A. Zeeb <bz@FreeBSD.org>
CommitDate: 2024-02-19 08:02:01 +0000

    net80211: deal with lost state transitions

    Since 5efea30f039c4 we can possibly lose a state transition which can cause trouble further down the road. The reproducer from 643d6dce6c1e can trigger these for example. Drivers for firmware based wireless cards have worked around some of this (and other) problems in the past.

    Add an array of tasks rather than a single one as we would simply get npending > 1 and lose order with other tasks. Try to keep state changes updated as queued in case we end up with more than one at a time. While this is not ideal either (call it a hack) it will sort the problem for now. We will queue in ieee80211_new_state_locked() and do checks there and dequeue in ieee80211_newstate_cb(). If we still overrun the (currently) 8 slots we will drop the state change rather than overwrite the last one. When dequeing we will update iv_nstate and keep it around for historic reasons for the moment.

    The longer term we should make the callers of ieee80211_new_state[_locked]() actually use the returned errors and act appropriately but that will touch a lot more places and drivers (possibly incl. changed behaviour for ioctls).

    rtwn(4) and rum(4) should probably be revisted and net80211 internals removed (for rum(4) at least the current logic still seems prone to races).

    PR:             271979, 271988, 275255, 263613, 274003
    Sponsored by:   The FreeBSD Foundation (in 2023)
    Reviewed by:    cc
    Differential Revision: https://reviews.freebsd.org/D43389

    (cherry picked from commit 713db49d06deee90dd358b2e4b9ca05368a5eaf6)

    Given this changes the internal structure of 'struct ieee80211vap', which gets allocated by the drivers, and we do not have enough spares, all wireless drivers need to be recompiled. Given we are forced to do the update, we leave fields in the middle of the struct and add more spares at the same time.

    __FreeBSD_version gets updated to 1303501 to be able to detect this change.

    (cherry picked from commit a890a3a5ddf33acb0a4000885945b89156799b07)

 UPDATING                       |   6 ++
 sys/dev/rtwn/if_rtwn.c         |   4 +-
 sys/dev/usb/wlan/if_rum.c      |   4 +-
 sys/net80211/ieee80211.c       |   4 +-
 sys/net80211/ieee80211_ddb.c   |  15 ++++-
 sys/net80211/ieee80211_proto.c | 124 ++++++++++++++++++++++++++++++++++-------
 sys/net80211/ieee80211_var.h   |  18 +++---
 sys/sys/param.h                |   2 +-
 8 files changed, 143 insertions(+), 34 deletions(-)

A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=184ccc414686ea32c64f063c081c7cc1adeae7c3

commit 184ccc414686ea32c64f063c081c7cc1adeae7c3
Author:     Bjoern A. Zeeb <bz@FreeBSD.org>
AuthorDate: 2024-02-03 16:33:56 +0000
Commit:     Bjoern A. Zeeb <bz@FreeBSD.org>
CommitDate: 2024-02-19 08:02:02 +0000

    LinuxKPI: 802.11: band-aid for invalid state changes after (*iv_update_bss)

    With firmware based solutions we cannot just jump from an active session to a new iv_bss node without tearing down state for the old and bringing up the new node. This likely used to work on softmac based cards/drivers where one could essentially set the state and fire at will.

    We track (*iv_update_bss) calls from net80211 and set a local flag that we are out of synch and do not allow any further operations up the state machine until we hit INIT or SCAN. That means someone will take the state down, clean up firmware state and then we can join again and build up state.

    Apparently this problem has been "known" for a while as native iwm(4) and others have similar workarounds (though less strict) and can be equally pestered into bad states. For LinuxKPI all the KASSERTs just massively brought this problem out.

    The solution will be some rewrites in net80211. Until then, try to keep us more stable at least and not die on second join1() calls triggered by service netif start wlan0 and similar.

    PR:             271979, 271988, 275255, 263613, 274003
    Sponsored by:   The FreeBSD Foundation (2023, partial)
    Reviewed by:    cc
    Differential Revision: https://reviews.freebsd.org/D43725

    (cherry picked from commit 2ac8a2189ac6707f48f77ef2e36baf696a0d2f40)

 sys/compat/linuxkpi/common/src/linux_80211.c | 309 +++++++++++++++++++--------
 sys/compat/linuxkpi/common/src/linux_80211.h |   2 +
 2 files changed, 216 insertions(+), 95 deletions(-)

A commit in branch releng/13.3 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=d4b4efc6db6c6c3a9abf2f187ba1ccc0e40028cf

commit d4b4efc6db6c6c3a9abf2f187ba1ccc0e40028cf
Author:     Bjoern A. Zeeb <bz@FreeBSD.org>
AuthorDate: 2024-02-03 16:33:56 +0000
Commit:     Bjoern A. Zeeb <bz@FreeBSD.org>
CommitDate: 2024-02-19 16:09:22 +0000

    LinuxKPI: 802.11: band-aid for invalid state changes after (*iv_update_bss)

    With firmware based solutions we cannot just jump from an active session to a new iv_bss node without tearing down state for the old and bringing up the new node. This likely used to work on softmac based cards/drivers where one could essentially set the state and fire at will.

    We track (*iv_update_bss) calls from net80211 and set a local flag that we are out of synch and do not allow any further operations up the state machine until we hit INIT or SCAN. That means someone will take the state down, clean up firmware state and then we can join again and build up state.

    Apparently this problem has been "known" for a while as native iwm(4) and others have similar workarounds (though less strict) and can be equally pestered into bad states. For LinuxKPI all the KASSERTs just massively brought this problem out.

    The solution will be some rewrites in net80211. Until then, try to keep us more stable at least and not die on second join1() calls triggered by service netif start wlan0 and similar.

    Approved by:    re (cperciva)
    PR:             271979, 271988, 275255, 263613, 274003
    Sponsored by:   The FreeBSD Foundation (2023, partial)
    Reviewed by:    cc
    Differential Revision: https://reviews.freebsd.org/D43725

    (cherry picked from commit 2ac8a2189ac6707f48f77ef2e36baf696a0d2f40)
    (cherry picked from commit 184ccc414686ea32c64f063c081c7cc1adeae7c3)

 sys/compat/linuxkpi/common/src/linux_80211.c | 309 +++++++++++++++++++--------
 sys/compat/linuxkpi/common/src/linux_80211.h |   2 +
 2 files changed, 216 insertions(+), 95 deletions(-)

A commit in branch releng/13.3 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=9b998db87c28356fce21784c4f8bfb8737615e1f

commit 9b998db87c28356fce21784c4f8bfb8737615e1f
Author:     Bjoern A. Zeeb <bz@FreeBSD.org>
AuthorDate: 2024-01-10 10:14:16 +0000
Commit:     Bjoern A. Zeeb <bz@FreeBSD.org>
CommitDate: 2024-02-19 16:07:20 +0000

    net80211: deal with lost state transitions

    Since 5efea30f039c4 we can possibly lose a state transition which can cause trouble further down the road. The reproducer from 643d6dce6c1e can trigger these for example. Drivers for firmware based wireless cards have worked around some of this (and other) problems in the past.

    Add an array of tasks rather than a single one as we would simply get npending > 1 and lose order with other tasks. Try to keep state changes updated as queued in case we end up with more than one at a time. While this is not ideal either (call it a hack) it will sort the problem for now. We will queue in ieee80211_new_state_locked() and do checks there and dequeue in ieee80211_newstate_cb(). If we still overrun the (currently) 8 slots we will drop the state change rather than overwrite the last one. When dequeing we will update iv_nstate and keep it around for historic reasons for the moment.

    The longer term we should make the callers of ieee80211_new_state[_locked]() actually use the returned errors and act appropriately but that will touch a lot more places and drivers (possibly incl. changed behaviour for ioctls).

    rtwn(4) and rum(4) should probably be revisted and net80211 internals removed (for rum(4) at least the current logic still seems prone to races).

    PR:             271979, 271988, 275255, 263613, 274003
    Sponsored by:   The FreeBSD Foundation (in 2023)
    Reviewed by:    cc
    Differential Revision: https://reviews.freebsd.org/D43389

    (cherry picked from commit 713db49d06deee90dd358b2e4b9ca05368a5eaf6)

    Given this changes the internal structure of 'struct ieee80211vap', which gets allocated by the drivers, and we do not have enough spares, all wireless drivers need to be recompiled. Given we are forced to do the update, we leave fields in the middle of the struct and add more spares at the same time.

    __FreeBSD_version will get updated to 1303001 to be able to detect this change.

    Approved by:    re (cperciva)

    (cherry picked from commit a890a3a5ddf33acb0a4000885945b89156799b07)
    (cherry picked from commit a7e1fc7f620d3341549c1380f550aaafbdb45622)

 sys/dev/rtwn/if_rtwn.c         |   4 +-
 sys/dev/usb/wlan/if_rum.c      |   4 +-
 sys/net80211/ieee80211.c       |   4 +-
 sys/net80211/ieee80211_ddb.c   |  15 ++++-
 sys/net80211/ieee80211_proto.c | 124 ++++++++++++++++++++++++++++++++++-------
 sys/net80211/ieee80211_var.h   |  18 +++---
 6 files changed, 136 insertions(+), 33 deletions(-)
I believe this should be fixed in all branches now. Can you re-test or can we close this?
Also just reported here with a 9xxx card: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=274003#c28 Will try to track this here.
I'm seeing very frequent (i.e. several times per day) panics after upgrading from 13.2R to 13.3R and possibly they are related to this bug. I'm not able to get a proper core on this machine (and I still can't understand why), but in the logs I find: Apr 18 15:00:25 hector kernel: iwlwifi0: linuxkpi_ieee80211_beacon_loss: vif 0xfffffe00ac5b1e80 vap 0xfffffe00ac5b1010 state RUN Apr 18 15:00:28 hector syslogd: last message repeated 1 times Apr 18 15:00:29 hector kernel: ipfw: 9999 Deny UDP 192.168.113.82:137 192.168.113.255:137 in via wlan0 Apr 18 15:00:29 hector wpa_supplicant[93738]: wlan0: CTRL-EVENT-DISCONNECTED bssid=fc:f5:28:ca:f1:12 reason=0 Apr 18 15:00:29 hector kernel: iwlwifi0: linuxkpi_ieee80211_connection_loss: vif 0xfffffe00ac5b1e80 vap 0xfffffe00ac5b1010 state RUN Apr 18 15:00:29 hector kernel: wlan0: link state changed to DOWN Apr 18 15:00:29 hector wpa_supplicant[93738]: BSSID fc:f5:28:ca:f1:12 ignore list count incremented to 2, ignoring for 10 seconds Apr 18 15:00:29 hector wpa_supplicant[93738]: ioctl[SIOCS80211, op=20, val=0, arg_len=7]: Can't assign requested address Apr 18 15:00:29 hector dhclient[97089]: wlan0 link state up -> down Apr 18 15:00:29 hector devd[99040]: Processing event '!system=IFNET subsystem=wlan0 type=LINK_DOWN' Apr 18 15:00:29 hector devd[99040]: Pushing table Apr 18 15:00:29 hector devd[99040]: Processing notify event Apr 18 15:00:29 hector devd[99040]: Popping table Apr 18 15:00:29 hector dbus-daemon[3053]: [system] Activating service name='org.freedesktop.ConsoleKit' requested by ':1.2' (uid=0 pid=24346 comm="") (using servicehelper) Apr 18 15:00:29 hector kernel: iwlwifi0: Couldn't drain frames for staid 0, status 0x8 Apr 18 15:00:29 hector kernel: iwlwifi0: lkpi_sta_run_to_init:2173: mo_sta_state(NOTEXIST) failed: -5 Apr 18 15:00:29 hector kernel: iwlwifi0: lkpi_iv_newstate: error -5 during state transition 5 (RUN) -> 0 (INIT) Apr 18 15:00:29 hector dbus-daemon[3053]: [system] Activating service 
name='org.freedesktop.PolicyKit1' requested by ':1.3' (uid=0 pid=24293 comm="") (using servicehelper) Apr 18 15:00:29 hector dbus-daemon[3053]: [system] Successfully activated service 'org.freedesktop.ConsoleKit' Apr 18 15:00:29 hector polkitd[24756]: Started polkitd version 124 Apr 18 15:00:29 hector polkitd[24756]: Loading rules from directory /usr/local/etc/polkit-1/rules.d Apr 18 15:00:29 hector polkitd[24756]: Loading rules from directory /usr/local/share/polkit-1/rules.d Apr 18 15:00:29 hector polkitd[24756]: Finished loading, compiling and executing 1 rules Apr 18 15:00:29 hector dbus-daemon[3053]: [system] Successfully activated service 'org.freedesktop.PolicyKit1' Apr 18 15:00:29 hector polkitd[24756]: Acquired the name org.freedesktop.PolicyKit1 on the system bus Apr 18 15:00:29 hector dbus-daemon[29677]: [session uid=1001 pid=28981] Activating service name='org.a11y.Bus' requested by ':1.0' (uid=1001 pid=25777 comm="") Apr 18 15:00:29 hector dbus-daemon[29677]: [session uid=1001 pid=28981] Successfully activated service 'org.a11y.Bus' Apr 18 15:00:29 hector dbus-daemon[29677]: [session uid=1001 pid=28981] Activating service name='org.xfce.Xfconf' requested by ':1.2' (uid=1001 pid=25777 comm="") Apr 18 15:00:29 hector dbus-daemon[29677]: [session uid=1001 pid=28981] Successfully activated service 'org.xfce.Xfconf' Apr 18 15:00:30 hector dbus-daemon[29677]: [session uid=1001 pid=28981] Activating service name='org.gtk.vfs.Daemon' requested by ':1.6' (uid=1001 pid=36842 comm="") Apr 18 15:00:30 hector dbus-daemon[29677]: [session uid=1001 pid=28981] Successfully activated service 'org.gtk.vfs.Daemon' Apr 18 15:00:30 hector dbus-daemon[3053]: [system] Activating service name='org.freedesktop.UPower' requested by ':1.6' (uid=1001 pid=40674 comm="") (using servicehelper) Apr 18 15:00:30 hector dbus-daemon[3053]: [system] Successfully activated service 'org.freedesktop.UPower' Apr 18 15:00:30 hector wpa_supplicant[93738]: wlan0: Trying to associate with 
fc:f5:28:ca:f1:13 (SSID='CCBiesse' freq=5180 MHz) Apr 18 15:00:30 hector kernel: iwlwifi0: lkpi_sta_scan_to_auth:1033: lvif 0xfffffe00ac5b1000 vap 0xfffffe00ac5b1010 iv_bss 0xfffffe00adb4c000 lvif_bss 0xfffff8000565a000 lvif_bss->ni 0xfffffe00aed99000 synched 0 Apr 18 15:00:30 hector kernel: iwlwifi0: lkpi_iv_newstate: error 16 during state transition 1 (SCAN) -> 2 (AUTH) Apr 18 15:01:09 hector syslogd: restart Apr 18 15:01:09 hector syslogd: kernel boot file is /boot/kernel/kernel Apr 18 15:01:09 hector kernel: Sleeping thread (tid 100785, pid 0) owns a non-sleepable lock Apr 18 15:01:09 hector kernel: KDB: stack backtrace of thread 100785: Apr 18 15:01:09 hector kernel: sched_switch() at sched_switch+0x7d1/frame 0xfffffe00ab30fe20 Apr 18 15:01:09 hector kernel: mi_switch() at mi_switch+0xbf/frame 0xfffffe00ab30fe40 Apr 18 15:01:09 hector kernel: _sleep() at _sleep+0x1f0/frame 0xfffffe00ab30fec0 Apr 18 15:01:09 hector kernel: taskqueue_thread_loop() at taskqueue_thread_loop+0xb1/frame 0xfffffe00ab30fef0 Apr 18 15:01:09 hector kernel: fork_exit() at fork_exit+0x7d/frame 0xfffffe00ab30ff30 Apr 18 15:01:09 hector kernel: fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00ab30ff30 Apr 18 15:01:09 hector kernel: --- trap 0xc, rip = 0x63639d6e3da, rsp = 0x6364d128f48, rbp = 0x6364d128f60 --- Apr 18 15:01:09 hector kernel: panic: sleeping thread Apr 18 15:01:09 hector kernel: cpuid = 2 Apr 18 15:01:09 hector kernel: time = 1713445230 Apr 18 15:01:09 hector kernel: KDB: stack backtrace: Apr 18 15:01:09 hector kernel: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00ab2ba960 Apr 18 15:01:09 hector kernel: vpanic() at vpanic+0x152/frame 0xfffffe00ab2ba9b0 Apr 18 15:01:09 hector kernel: panic() at panic+0x43/frame 0xfffffe00ab2baa10 Apr 18 15:01:09 hector kernel: propagate_priority() at propagate_priority+0x293/frame 0xfffffe00ab2baa50 Apr 18 15:01:09 hector kernel: turnstile_wait() at turnstile_wait+0x314/frame 0xfffffe00ab2baaa0 Apr 18 15:01:09 
hector kernel: __mtx_lock_sleep() at __mtx_lock_sleep+0x17b/frame 0xfffffe00ab2bab30 Apr 18 15:01:09 hector kernel: linuxkpi_ieee80211_find_sta() at linuxkpi_ieee80211_find_sta+0xd0/frame 0xfffffe00ab2bab70 Apr 18 15:01:09 hector kernel: linuxkpi_ieee80211_find_sta_by_ifaddr() at linuxkpi_ieee80211_find_sta_by_ifaddr+0x7f/frame 0xfffffe00ab2babc0 Apr 18 15:01:09 hector kernel: iwl_mvm_rx_mpdu_mq() at iwl_mvm_rx_mpdu_mq+0x420/frame 0xfffffe00ab2bacd0 Apr 18 15:01:09 hector kernel: iwl_pcie_rx_handle() at iwl_pcie_rx_handle+0x444/frame 0xfffffe00ab2badd0 Apr 18 15:01:09 hector kernel: iwl_pcie_napi_poll_msix() at iwl_pcie_napi_poll_msix+0x30/frame 0xfffffe00ab2bae20 Apr 18 15:01:09 hector kernel: lkpi_napi_task() at lkpi_napi_task+0xf/frame 0xfffffe00ab2bae40 Apr 18 15:01:09 hector kernel: taskqueue_run_locked() at taskqueue_run_locked+0x182/frame 0xfffffe00ab2baec0 Apr 18 15:01:09 hector kernel: taskqueue_thread_loop() at taskqueue_thread_loop+0xc2/frame 0xfffffe00ab2baef0 Apr 18 15:01:09 hector kernel: fork_exit() at fork_exit+0x7d/frame 0xfffffe00ab2baf30 Apr 18 15:01:09 hector kernel: fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00ab2baf30 Apr 18 15:01:09 hector kernel: --- trap 0xc, rip = 0x63639d6e3da, rsp = 0x63644046f48, rbp = 0x63644046f60 --- Apr 18 15:01:09 hector kernel: Uptime: 33s Another: Apr 18 15:05:32 hector wpa_supplicant[21281]: wlan0: CTRL-EVENT-DISCONNECTED bssid=fc:f5:28:ca:f1:13 reason=0 Apr 18 15:05:32 hector kernel: iwlwifi0: linuxkpi_ieee80211_beacon_loss: vif 0xfffffe00abd8de80 vap 0xfffffe00abd8d010 state RUN Apr 18 15:05:32 hector syslogd: last message repeated 1 times Apr 18 15:05:32 hector kernel: wlan0: link state changed to DOWN Apr 18 15:05:32 hector devd[76299]: Processing event '!system=IFNET subsystem=wlan0 type=LINK_DOWN' Apr 18 15:05:32 hector dhclient[22403]: wlan0 link state up -> down Apr 18 15:05:32 hector devd[76299]: Pushing table Apr 18 15:05:32 hector devd[76299]: Processing notify event Apr 18 15:05:32 hector 
kernel: iwlwifi0: Couldn't drain frames for staid 0, status 0x8
Apr 18 15:05:32 hector kernel: iwlwifi0: lkpi_sta_run_to_init:2173: mo_sta_state(NOTEXIST) failed: -5
Apr 18 15:05:32 hector kernel: iwlwifi0: lkpi_iv_newstate: error -5 during state transition 5 (RUN) -> 1 (SCAN)
Apr 18 15:05:32 hector devd[76299]: Popping table
Apr 18 15:05:33 hector wpa_supplicant[21281]: wlan0: Trying to associate with fc:f5:28:ca:f1:13 (SSID='CCBiesse' freq=5180 MHz)
Apr 18 15:05:33 hector kernel: iwlwifi0: lkpi_sta_scan_to_auth:1033: lvif 0xfffffe00abd8d000 vap 0xfffffe00abd8d010 iv_bss 0xfffffe00ade45000 lvif_bss 0xfffff80005c8f800 lvif_bss->ni 0xfffffe00ac03c000 synched 0
Apr 18 15:05:33 hector kernel: iwlwifi0: lkpi_iv_newstate: error 16 during state transition 1 (SCAN) -> 2 (AUTH)
Apr 18 15:05:33 hector kernel: Sleeping thread (tid 100787, pid 0) owns a non-sleepable lock
Apr 18 15:05:33 hector kernel: KDB: stack backtrace of thread 100787:
Apr 18 15:05:33 hector kernel: sched_switch() at sched_switch+0x7d1/frame 0xfffffe00b0d0ae20
Apr 18 15:05:33 hector kernel: mi_switch() at mi_switch+0xbf/frame 0xfffffe00b0d0ae40
Apr 18 15:05:33 hector kernel: _sleep() at _sleep+0x1f0/frame 0xfffffe00b0d0aec0
Apr 18 15:06:26 hector syslogd: restart
Apr 18 15:06:26 hector syslogd: kernel boot file is /boot/kernel/kernel
Apr 18 15:06:26 hector kernel: fork_exit() at fork_exit+0x7d/frame 0xfffffe00b0d0af30
Apr 18 15:06:26 hector kernel: fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00b0d0af30
Apr 18 15:06:26 hector kernel: --- trap 0xfc9226bb, rip = 0x9e2138e7aa6c807d, rsp = 0xce04afe17783e5b4, rbp = 0x7adcc9aeee864891 ---
Apr 18 15:06:26 hector kernel: panic: sleeping thread
Apr 18 15:06:26 hector kernel: cpuid = 3
Apr 18 15:06:26 hector kernel: time = 1713445533
Apr 18 15:06:26 hector kernel: KDB: stack backtrace:
Apr 18 15:06:26 hector kernel: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00b0dbc960
Apr 18 15:06:26 hector kernel: vpanic() at
vpanic+0x152/frame 0xfffffe00b0dbc9b0
Apr 18 15:06:26 hector kernel: panic() at panic+0x43/frame 0xfffffe00b0dbca10
Apr 18 15:06:26 hector kernel: propagate_priority() at propagate_priority+0x293/frame 0xfffffe00b0dbca50
Apr 18 15:06:26 hector kernel: turnstile_wait() at turnstile_wait+0x314/frame 0xfffffe00b0dbcaa0
Apr 18 15:06:26 hector kernel: __mtx_lock_sleep() at __mtx_lock_sleep+0x17b/frame 0xfffffe00b0dbcb30
Apr 18 15:06:26 hector kernel: linuxkpi_ieee80211_find_sta() at linuxkpi_ieee80211_find_sta+0xd0/frame 0xfffffe00b0dbcb70
Apr 18 15:06:26 hector kernel: linuxkpi_ieee80211_find_sta_by_ifaddr() at linuxkpi_ieee80211_find_sta_by_ifaddr+0x7f/frame 0xfffffe00b0dbcbc0
Apr 18 15:06:26 hector kernel: iwl_mvm_rx_mpdu_mq() at iwl_mvm_rx_mpdu_mq+0x420/frame 0xfffffe00b0dbccd0
Apr 18 15:06:26 hector kernel: iwl_pcie_rx_handle() at iwl_pcie_rx_handle+0x444/frame 0xfffffe00b0dbcdd0
Apr 18 15:06:26 hector kernel: iwl_pcie_napi_poll_msix() at iwl_pcie_napi_poll_msix+0x30/frame 0xfffffe00b0dbce20
Apr 18 15:06:26 hector kernel: lkpi_napi_task() at lkpi_napi_task+0xf/frame 0xfffffe00b0dbce40
Apr 18 15:06:26 hector kernel: taskqueue_run_locked() at taskqueue_run_locked+0x182/frame 0xfffffe00b0dbcec0
Apr 18 15:06:26 hector kernel: taskqueue_thread_loop() at taskqueue_thread_loop+0xc2/frame 0xfffffe00b0dbcef0
Apr 18 15:06:26 hector kernel: fork_exit() at fork_exit+0x7d/frame 0xfffffe00b0dbcf30
Apr 18 15:06:26 hector kernel: fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00b0dbcf30
Apr 18 15:06:26 hector kernel: --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
Apr 18 15:06:26 hector kernel: Uptime: 4m36s

I've got several others. Once I also entered a loop where the laptop would panic and reboot before I reached the prompt. Luckily, entering single-user mode, starting the network from there, and then moving to multi-user mode solved it.
# pciconf -lv iwlwifi0
iwlwifi0@pci0:0:12:0: class=0x028000 rev=0x06 hdr=0x00 vendor=0x8086 device=0x31dc subvendor=0x8086 subdevice=0x0264
    vendor = 'Intel Corporation'
    device = 'Gemini Lake PCH CNVi WiFi'
    class = network

(should be Intel 9461).
(In reply to ml from comment #14) P.S. I'm writing here at Bjoern's invitation (he was the one who said I'm possibly seeing this specific issue).
I tested this in a VM with a fairly recent -current (commit da4230af3fda).

ifconfig wlan0 create wlandev iwlwifi0
service netif start wlan0

works fine and the machine stayed up quite a while. But when I did a "shutdown now" it panicked. I repeated this a few times:

syslogd: exiting on signal 15
iwlwifi0: Couldn't drain frames for staid 0, status 0x8
iwlwifi0: lkpi_sta_run_to_init:2309: mo_sta_state(NOTEXIST) failed: -5
iwlwifi0: lkpi_iv_newstate: error -5 during state transition 5 (RUN) -> 0 (INIT)
panic: INIT state change failed
cpuid = 0
time = 1715667397
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00688bbc60
vpanic() at vpanic+0x13f/frame 0xfffffe00688bbd90
panic() at panic+0x43/frame 0xfffffe00688bbdf0
ieee80211_newstate_cb() at ieee80211_newstate_cb+0x422/frame 0xfffffe00688bbe40
taskqueue_run_locked() at taskqueue_run_locked+0x1c2/frame 0xfffffe00688bbec0
taskqueue_thread_loop() at taskqueue_thread_loop+0xd3/frame 0xfffffe00688bbef0
fork_exit() at fork_exit+0x82/frame 0xfffffe00688bbf30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00688bbf30
--- trap 0xc, rip = 0x39f11dbd331a, rsp = 0x39f122c47f48, rbp = 0x39f122c47f60 ---
KDB: enter: panic
[ thread pid 0 tid 100266 ]
Stopped at kdb_enter+0x33: movq $0,0x1053452(%rip)
Can everyone please try the patch from https://reviews.freebsd.org/D45293 ? You can download the raw diff from https://reviews.freebsd.org/D45293?download=true if you don't use arc. The change should apply to main, stable/14 and stable/13.
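For anyone not using arc, fetching and applying the raw diff might look roughly like the sketch below. It assumes the source tree is a git checkout in /usr/src, that the kernel config is GENERIC, and that the raw diff is in a form git apply accepts; the SRC, DIFF, KERNCONF and DRY_RUN variables and the function names are made up for this example and are not part of the review.

```shell
#!/bin/sh
# Sketch only: fetch the raw diff for D45293 and apply it to a src checkout.
# SRC, DIFF and KERNCONF are assumptions; adjust them to your setup.
# Set DRY_RUN=1 to print the steps instead of running them.

SRC=${SRC:-/usr/src}
DIFF=${DIFF:-/tmp/D45293.diff}
KERNCONF=${KERNCONF:-GENERIC}
URL='https://reviews.freebsd.org/D45293?download=true'

# Run a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: download, verify, apply, then rebuild the kernel.
apply_d45293() {
    run fetch -o "$DIFF" "$URL" || return 1
    # Check that the diff applies cleanly before touching the tree.
    run git -C "$SRC" apply --check "$DIFF" || return 1
    run git -C "$SRC" apply "$DIFF" || return 1
    run make -C "$SRC" buildkernel KERNCONF="$KERNCONF" || return 1
    run make -C "$SRC" installkernel KERNCONF="$KERNCONF"
}
```

Calling `DRY_RUN=1 apply_d45293` only prints the five steps; calling `apply_d45293` as root performs them, after which a reboot is needed to run the patched kernel. If the tree is not a git checkout, `patch -p1` from the top of the tree is the usual alternative to `git apply`.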
I applied the patch on -current and the system worked fine and didn't crash at reboot time. It worked fine for several more reboots.
(In reply to Bakul Shah from comment #18) Wow, that was fast. Thank you! Can you remind me of the PCI ID of your card?
iwlwifi0@pci0:0:7:0: class=0x028000 rev=0x29 hdr=0x00 vendor=0x8086 device=0x2526 subvendor=0x8086 subdevice=0x0014
    vendor = 'Intel Corporation'
    device = 'Wi-Fi 5(802.11ac) Wireless-AC 9x6x [Thunder Peak]'
    class = network

iwlwifi0: Detected Intel(R) Wireless-AC 9260 160MHz, REV=0x321
iwlwifi0: base HW address: 8c:a9:82:fc:e8:9c, OTP minor version: 0x4
iwlwifi0: <iwlwifi> mem 0xc1034000-0xc1037fff at device 7.0 on pci0
iwlwifi0: Detected crf-id 0x2816, cnv-id 0x1000200 wfpm id 0x80000000
iwlwifi0: PCI dev 2526/0014, rev=0x321, rfid=0x105110
iwlwifi0: successfully loaded firmware image 'iwlwifi-9260-th-b0-jf-b0-46.ucode'

Note that on reboot the device seems to be in some odd state and I see messages like

iwlwifi0: loaded firmware version 46.ff18e32a.0 9260-th-b0-jf-b0-46.ucode op_mode iwlmvm
iwlwifi0: Detected Intel(R) Wireless-AC 9260 160MHz, REV=0x321
iwlwifi0: Failed to load firmware chunk!
iwlwifi0: iwlwifi transaction failed, dumping registers
iwlwifi0: iwlwifi device config registers:
iwlwifi 0000:00:07.0: 0000 86 80 26 25 07 04 10 00 29 00 80 02 00 00 00 00 |..
...
iwlwifi0: Could not load the [0] uCode section
iwlwifi0: Failed to start INIT ucode: -60
iwlwifi0: WRT: Collecting data: ini trigger 13 fired (delay=0ms).
iwlwifi0: Not valid error log pointer 0x00000000 for Init uCode
iwlwifi0: IML/ROM dump:
...
iwlwifi0: 0xE27C6CB6 | FSEQ_CLASS_TP_VERSION
iwlwifi0: Failed to run INIT ucode: -60
iwlwifi0: retry init count 0
iwlwifi0: Detected Intel(R) Wireless-AC 9260 160MHz, REV=0x321

But it seems to work fine (at least with my manual ifconfig ... create...). Haven't tried enabling it from rc.conf yet.
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=5a4d24610fc6143ac1d570fe2b5160e8ae893c2c

commit 5a4d24610fc6143ac1d570fe2b5160e8ae893c2c
Author: Bjoern A. Zeeb <bz@FreeBSD.org>
AuthorDate: 2024-05-22 02:24:51 +0000
Commit: Bjoern A. Zeeb <bz@FreeBSD.org>
CommitDate: 2024-05-22 21:04:19 +0000

    LinuxKPI: 802.11: change teardown order to avoid iwlwifi firmware crashes

    While the previous order worked well for iwlwifi 22000 and later chipsets (AXxxx, BE200), earlier chipsets had trouble and ran into firmware crashes. Change the teardown order to avoid these problems.

    The inline comments in lkpi_sta_run_to_init() (and lkpi_disassoc()) try to document the new order and also the old problems we were seeing (too early sta removal or silent non-removal) leading to follow-up problems.

    There is a possible further problem still lingering but a lot harder to trigger (see comment in review) and likely related to some other doings so we'll track it separately.

    Sponsored by: The FreeBSD Foundation
    MFC after: 3 days
    PR: 275255
    Tested with: AX210, 8265 (bz); 9260 (Bakul Shah)
    Differential Revision: https://reviews.freebsd.org/D45293

 sys/compat/linuxkpi/common/src/linux_80211.c | 84 ++++++++++++++++++----------
 1 file changed, 55 insertions(+), 29 deletions(-)
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=7ad7453748e2adafa1e1a3e44b02fc852d4c5301

commit 7ad7453748e2adafa1e1a3e44b02fc852d4c5301
Author: Bjoern A. Zeeb <bz@FreeBSD.org>
AuthorDate: 2024-05-22 02:24:51 +0000
Commit: Bjoern A. Zeeb <bz@FreeBSD.org>
CommitDate: 2024-06-12 13:58:36 +0000

    LinuxKPI: 802.11: change teardown order to avoid iwlwifi firmware crashes

    While the previous order worked well for iwlwifi 22000 and later chipsets (AXxxx, BE200), earlier chipsets had trouble and ran into firmware crashes. Change the teardown order to avoid these problems.

    The inline comments in lkpi_sta_run_to_init() (and lkpi_disassoc()) try to document the new order and also the old problems we were seeing (too early sta removal or silent non-removal) leading to follow-up problems.

    There is a possible further problem still lingering but a lot harder to trigger (see comment in review) and likely related to some other doings so we'll track it separately.

    Sponsored by: The FreeBSD Foundation
    PR: 275255
    Tested with: AX210, 8265 (bz); 9260 (Bakul Shah)
    Differential Revision: https://reviews.freebsd.org/D45293

    (cherry picked from commit 5a4d24610fc6143ac1d570fe2b5160e8ae893c2c)

 sys/compat/linuxkpi/common/src/linux_80211.c | 84 ++++++++++++++++++----------
 1 file changed, 55 insertions(+), 29 deletions(-)
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=def43d8a4a3c0a2868fe74ef6aefe16c435ea19c

commit def43d8a4a3c0a2868fe74ef6aefe16c435ea19c
Author: Bjoern A. Zeeb <bz@FreeBSD.org>
AuthorDate: 2024-05-22 02:24:51 +0000
Commit: Bjoern A. Zeeb <bz@FreeBSD.org>
CommitDate: 2024-06-14 14:55:16 +0000

    LinuxKPI: 802.11: change teardown order to avoid iwlwifi firmware crashes

    While the previous order worked well for iwlwifi 22000 and later chipsets (AXxxx, BE200), earlier chipsets had trouble and ran into firmware crashes. Change the teardown order to avoid these problems.

    The inline comments in lkpi_sta_run_to_init() (and lkpi_disassoc()) try to document the new order and also the old problems we were seeing (too early sta removal or silent non-removal) leading to follow-up problems.

    There is a possible further problem still lingering but a lot harder to trigger (see comment in review) and likely related to some other doings so we'll track it separately.

    Sponsored by: The FreeBSD Foundation
    PR: 275255
    Tested with: AX210, 8265 (bz); 9260 (Bakul Shah)
    Differential Revision: https://reviews.freebsd.org/D45293

    (cherry picked from commit 5a4d24610fc6143ac1d570fe2b5160e8ae893c2c)

 sys/compat/linuxkpi/common/src/linux_80211.c | 84 ++++++++++++++++++----------
 1 file changed, 55 insertions(+), 29 deletions(-)
We believe this is fixed in main, stable/13 and stable/14. If you run a release before (not including) 14.2 or 13.4 then there is no need to report it anymore. If you have a chance in these cases to try a stable branch after the commits it would be great. Thanks to everyone who reported the problem, provided debug information or tested the change(s).
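To check whether a given source tree already contains the change, something like the sketch below may help. It assumes the tree is a git checkout with full history (a shallow clone may not be able to answer); the three hashes are the main, stable/14 and stable/13 commits listed in this PR, while the function name has_fix and the default path are invented for the example.

```shell
#!/bin/sh
# Sketch only: report whether HEAD of a src checkout contains one of the
# commits that fixed this PR. has_fix is a hypothetical helper name.
has_fix() {
    src=${1:-/usr/src}
    # main, stable/14, stable/13 (hashes from the commit notifications above)
    for c in 5a4d24610fc6143ac1d570fe2b5160e8ae893c2c \
             7ad7453748e2adafa1e1a3e44b02fc852d4c5301 \
             def43d8a4a3c0a2868fe74ef6aefe16c435ea19c; do
        # Exit status 0 means commit $c is an ancestor of HEAD.
        if git -C "$src" merge-base --is-ancestor "$c" HEAD 2>/dev/null; then
            echo "fix present ($c)"
            return 0
        fi
    done
    echo "fix not found"
    return 1
}
```

Running `has_fix /usr/src` prints which of the commits was found, or "fix not found" if none is reachable from HEAD (or if the path is not a usable git checkout).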