Bug 271988 - iwlwifi - AX210 wireless cannot work normally in FreeBSD 13.2 (also Invalid TXQ id?)
Status: In Progress
Alias: None
Product: Base System
Classification: Unclassified
Component: wireless
Version: 13.2-STABLE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: Bjoern A. Zeeb
URL:
Keywords:
Depends on:
Blocks: iwlwifi
Reported: 2023-06-14 08:17 UTC by Kevin
Modified: 2024-03-20 19:54 UTC

Description Kevin 2023-06-14 08:17:16 UTC
The AX210 does not work properly on my laptop after installing FreeBSD 13.2.
FreeBSD 13.1 did not have this issue.
If I comment out one line in rc.conf, the wireless works, but without IPv6.
Here is part of my rc.conf:

wlans_iwlwifi0="wlan0"
ifconfig_wlan0="WPA DHCP"
# ifconfig_wlan0_ipv6="inet6 accept_rtadv"

I cannot enable ifconfig_wlan0_ipv6="inet6 accept_rtadv" in my rc.conf; otherwise the wireless breaks.

Here is part of the dmesg output when ifconfig_wlan0_ipv6="inet6 accept_rtadv" is set in my rc.conf:

Intel(R) Wireless WiFi based driver for FreeBSD
acpi_wmi0: <ACPI-WMI mapping> on acpi0
acpi_wmi1: <ACPI-WMI mapping> on acpi0
acpi_wmi2: <ACPI-WMI mapping> on acpi0
acpi_wmi2: Embedded MOF found
ACPI: \134_SB.PCI0.WMI1.WQXM: 1 arguments were passed to a non-method ACPI object (Buffer) (20201113/nsarguments-361)
ichsmb0: <Intel Sunrise Point-H SMBus controller> port 0xf000-0xf01f mem 0xda12a000-0xda12a0ff at device 31.4 on pci0
smbus0: <System Management Bus> on ichsmb0
iwlwifi0: <iwlwifi> mem 0xdc200000-0xdc203fff at device 0.0 on pci6
iwlwifi0: successfully loaded firmware image 'iwlwifi-ty-a0-gf-a0-73.ucode'
iwlwifi0: api flags index 2 larger than supported by driver
iwlwifi0: TLV_FW_FSEQ_VERSION: FSEQ Version: 0.0.2.36
iwlwifi0: loaded firmware version 73.35c0a2c6.0 ty-a0-gf-a0-73.ucode op_mode iwlmvm
iwlwifi0: Detected Intel(R) Wi-Fi 6 AX210 160MHz, REV=0x420
iwlwifi0: successfully loaded firmware image 'iwlwifi-ty-a0-gf-a0.pnvm'
iwlwifi0: loaded PNVM version 881c99e1
iwlwifi0: Detected RF GF, rfid=0x10d000
iwlwifi0: base HW address: 84:7b:27:92:6e:59
wlan0: Ethernet address: 84:7b:27:92:6e:59
lo0: link state changed to UP
alc0: link state changed to DOWN
wlan0: ieee80211_new_state_locked: pending SCAN -> AUTH transition lost
Invalid TXQ id
iwlwifi0: lkpi_iv_newstate: error -5 during state transition 2 (AUTH) -> 1 (SCAN)
WARNING mvmvif->ap_sta_id != 0xFF failed at /usr/src/sys/contrib/dev/iwlwifi/mvm/sta.c:1761
wlan0: ieee80211_new_state_locked: pending ASSOC -> RUN transition lost
iwlwifi0: Microcode SW error detected. Restarting 0x0.
iwlwifi0: Start IWL Error Log Dump:
iwlwifi0: Transport status: 0x0000004B, valid: 6
iwlwifi0: Loaded firmware version: 73.35c0a2c6.0 ty-a0-gf-a0-73.ucode
iwlwifi0: 0x00000071 | NMI_INTERRUPT_UMAC_FATAL
iwlwifi0: 0x00A08200 | trm_hw_status0
iwlwifi0: 0x00000000 | trm_hw_status1
iwlwifi0: 0x004DB676 | branchlink2
iwlwifi0: 0x004D1896 | interruptlink1
iwlwifi0: 0x004D1896 | interruptlink2
iwlwifi0: 0x00016B8A | data1
iwlwifi0: 0x00000010 | data2
iwlwifi0: 0x00000000 | data3
iwlwifi0: 0x00017636 | beacon time
iwlwifi0: 0x57A96A05 | tsf low
iwlwifi0: 0x00000027 | tsf hi
iwlwifi0: 0x00000000 | time gp1
iwlwifi0: 0x00DA033C | time gp2
iwlwifi0: 0x00000001 | uCode revision type
iwlwifi0: 0x00000049 | uCode version major
iwlwifi0: 0x35C0A2C6 | uCode version minor
iwlwifi0: 0x00000420 | hw version
iwlwifi0: 0x00C89002 | board version
iwlwifi0: 0x805CFC01 | hcmd
iwlwifi0: 0x24020000 | isr0
iwlwifi0: 0x61000000 | isr1
iwlwifi0: 0x48F00002 | isr2
iwlwifi0: 0x00C3400C | isr3
iwlwifi0: 0x00200000 | isr4
iwlwifi0: 0x0A01001C | last cmd Id
iwlwifi0: 0x00016B8A | wait_event
iwlwifi0: 0x00000010 | l2p_control
iwlwifi0: 0x00018034 | l2p_duration
iwlwifi0: 0x0000003F | l2p_mhvalid
iwlwifi0: 0x00CF00F8 | l2p_addr_match
iwlwifi0: 0x00000009 | lmpm_pmg_sel
iwlwifi0: 0x00000000 | timestamp
iwlwifi0: 0x0000689C | flow_handler
iwlwifi0: Start IWL Error Log Dump:
iwlwifi0: Transport status: 0x0000004B, valid: 7
iwlwifi0: 0x20101F05 | ADVANCED_SYSASSERT
iwlwifi0: 0x00000000 | umac branchlink1
iwlwifi0: 0x8045F174 | umac branchlink2
iwlwifi0: 0x010815F6 | umac interruptlink1
iwlwifi0: 0x00000000 | umac interruptlink2
iwlwifi0: 0x00000000 | umac data1
iwlwifi0: 0x00000003 | umac data2
iwlwifi0: 0xDEADBEEF | umac data3
iwlwifi0: 0x00000049 | umac major
iwlwifi0: 0x35C0A2C6 | umac minor
iwlwifi0: 0x00DA0335 | frame pointer
iwlwifi0: 0xC0886BFC | stack pointer
iwlwifi0: 0x003C0128 | last host cmd
iwlwifi0: 0x00000000 | isr status reg
iwlwifi0: IML/ROM dump:
iwlwifi0: 0x00000B03 | IML/ROM error/state
iwlwifi0: 0x00007ED0 | IML/ROM data1
iwlwifi0: 0x00000090 | IML/ROM WFPM_AUTH_KEY_0
iwlwifi0: Fseq Registers:
iwlwifi0: 0x60000000 | FSEQ_ERROR_CODE
iwlwifi0: 0x80440005 | FSEQ_TOP_INIT_VERSION
iwlwifi0: 0x00080009 | FSEQ_CNVIO_INIT_VERSION
iwlwifi0: 0x0000A652 | FSEQ_OTP_VERSION
iwlwifi0: 0x00000002 | FSEQ_TOP_CONTENT_VERSION
iwlwifi0: 0x4552414E | FSEQ_ALIVE_TOKEN
iwlwifi0: 0x00400410 | FSEQ_CNVI_ID
iwlwifi0: 0x00400410 | FSEQ_CNVR_ID
iwlwifi0: 0x00400410 | CNVI_AUX_MISC_CHIP
iwlwifi0: 0x00400410 | CNVR_AUX_MISC_CHIP
iwlwifi0: 0x00009061 | CNVR_SCU_SD_REGS_SD_REG_DIG_DCDC_VTRIM
iwlwifi0: 0x00000061 | CNVR_SCU_SD_REGS_SD_REG_ACTIVE_VDIG_MIRROR
iwlwifi0: WRT: Collecting data: ini trigger 4 fired (delay=0ms).
iwlwifi0: FW error in SYNC CMD MAC_CONTEXT_CMD
#0 0xffffffff80e5ac03 at linux_dump_stack+0x23
#1 0xffffffff8435eed4 at iwl_trans_txq_send_hcmd+0x414
#2 0xffffffff8430603e at iwl_trans_send_cmd+0xce
#3 0xffffffff84345f99 at iwl_mvm_send_cmd_pdu+0x49
#4 0xffffffff84315c31 at iwl_mvm_mac_ctx_send+0x431
#5 0xffffffff8431e662 at iwl_mvm_bss_info_changed+0x282
#6 0xffffffff80e4f20a at lkpi_sta_assoc_to_run+0x27a
#7 0xffffffff80e548b8 at lkpi_iv_newstate+0x1b8
#8 0xffffffff80d8b96a at ieee80211_newstate_cb+0x17a
#9 0xffffffff80c64d51 at taskqueue_run_locked+0x181
#10 0xffffffff80c66013 at taskqueue_thread_loop+0xc3
#11 0xffffffff80bc0a6d at fork_exit+0x7d
#12 0xffffffff81090fae at fork_trampoline+0xe
iwlwifi0: Failed to send MAC context (action:2): -5
iwlwifi0: failed to update MAC 0xfffffe025014dc9eM
iwlwifi0: mcast filter cmd error. ret=-5
iwlwifi0: Failed to synchronize multicast groups update
iwlwifi0: failed to update power mode
iwlwifi0: mcast filter cmd error. ret=-5
iwlwifi0: Failed to synchronize multicast groups update
WARNING iwl_mvm_enable_beacon_filter(mvm, vif, 0) failed at /usr/src/sys/contrib/dev/iwlwifi/mvm/mac80211.c:3277
iwlwifi0: Failed to send MAC context (action:2): -5
iwlwifi0: Failed to send MAC context (action:2): -5
iwlwifi0: failed to update MAC 0xfffffe025014dc9eM
iwlwifi0: Failed to send MAC context (action:2): -5
iwlwifi0: failed to update MAC 0xfffffe025014dc9eM
wlan0: link state changed to UP
iwlwifi0: failed to update power mode
iwlwifi0: mcast filter cmd error. ret=-5
iwlwifi0: Failed to synchronize multicast groups update
WARNING i != mvmvif->ap_sta_id && !sta->tdls failed at /usr/src/sys/contrib/dev/iwlwifi/mvm/mac80211.c:4914
iwlwifi0: Failed to send flush command (-5)
iwlwifi0: flush request fail
iwlwifi0: Failed to send flush command (-5)
iwlwifi0: flush request fail
iwlwifi0: Couldn't send the SESSION_PROTECTION_CMD
wlan0: link state changed to DOWN
WARNING i != mvmvif->ap_sta_id && !sta->tdls failed at /usr/src/sys/contrib/dev/iwlwifi/mvm/mac80211.c:4914
WARNING trans->state != IWL_TRANS_FW_ALIVE failed at /usr/src/sys/contrib/dev/iwlwifi/iwl-trans.h:1367
iwlwifi0: iwl_trans_wait_txq_empty bad state = 0
iwlwifi0: iwl_trans_wait_txq_empty bad state = 0
iwlwifi0: Failed to trigger RX queues sync (-5)
iwlwifi0: Failed to send MAC context (action:2): -5
iwlwifi0: Failed to synchronize multicast groups update
iwlwifi0: Failed to send MAC context (action:2): -5
iwlwifi0: failed to update MAC 0xfffffe025014dc9eM
iwlwifi0: Failed to remove station. Id=1
iwlwifi0: failed to remove AP station
iwlwifi0: Failed to send MAC context (action:2): -5
iwlwifi0: failed to update MAC 0xfffffe025014dc9eM (clear after unassoc)
iwlwifi0: Failed to synchronize multicast groups update
iwlwifi0: Failed to send MAC context (action:2): -5
iwlwifi0: failed to update MAC 0xfffffe025014dc9eM
iwlwifi0: Failed to send binding (action:3): -5
iwlwifi0: PHY ctxt cmd error. ret=-5
iwlwifi0: Scan failed! ret -5
iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5
iwlwifi0: Scan failed! ret -5
iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5
iwlwifi0: Scan failed! ret -5
iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5
iwlwifi0: Scan failed! ret -5
iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5
iwlwifi0: Scan failed! ret -5
iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5
iwlwifi0: Scan failed! ret -5
iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5
iwlwifi0: Scan failed! ret -5
iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5
iwlwifi0: Scan failed! ret -5
iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5
iwlwifi0: Scan failed! ret -5
iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5
iwlwifi0: Scan failed! ret -5
iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5
iwlwifi0: Scan failed! ret -5
iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5
iwlwifi0: Scan failed! ret -5
iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5
iwlwifi0: Scan failed! ret -5
iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5
iwlwifi0: Scan failed! ret -5
iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5
iwlwifi0: Scan failed! ret -5
iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5
iwlwifi0: Failed to synchronize multicast groups update
iwlwifi0: Failed to synchronize multicast groups update
iwlwifi0: Failed to synchronize multicast groups update
iwlwifi0: Scan failed! ret -5
iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5
iwlwifi0: Scan failed! ret -5
iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5
iwlwifi0: Scan failed! ret -5
iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5
iwlwifi0: Scan failed! ret -5
iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5
iwlwifi0: Scan failed! ret -5
iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5
iwlwifi0: Scan failed! ret -5
iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5
iwlwifi0: Scan failed! ret -5
iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5
Security policy loaded: MAC/ntpd (mac_ntpd)
ACPI Warning: \134_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20201113/nsarguments-212)
iwlwifi0: Scan failed! ret -5
iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5
iwlwifi0: Scan failed! ret -5
iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5
iwlwifi0: Scan failed! ret -5
iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5
iwlwifi0: Scan failed! ret -5
iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5
iwlwifi0: Scan failed! ret -5
iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5
iwlwifi0: Scan failed! ret -5
iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5
iwlwifi0: Scan failed! ret -5
iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5
iwlwifi0: Scan failed! ret -5
iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5
iwlwifi0: Scan failed! ret -5
iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5
iwlwifi0: Scan failed! ret -5
iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5
iwlwifi0: Scan failed! ret -5
iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5
iwlwifi0: Scan failed! ret -5
iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5
iwlwifi0: Scan failed! ret -5
iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5
iwlwifi0: Scan failed! ret -5
iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5
iwlwifi0: Scan failed! ret -5
iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5
iwlwifi0: Scan failed! ret -5
iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5
iwlwifi0: Scan failed! ret -5
iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5
iwlwifi0: Scan failed! ret -5
iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5

Here is the output of pciconf -lv:

iwlwifi0@pci0:63:0:0:   class=0x028000 rev=0x1a hdr=0x00 vendor=0x8086 device=0x2725 subvendor=0x8086 subdevice=0x0024
    vendor     = 'Intel Corporation'
    device     = 'Wi-Fi 6 AX210/AX211/AX411 160MHz'
    class      = network

Other people have reported a similar problem on FreeBSD 13.2.
Comment 1 Bjoern A. Zeeb freebsd_committer freebsd_triage 2023-09-30 08:49:57 UTC
If you can try main, please update to or past the revision mentioned in:
https://lists.freebsd.org/archives/freebsd-wireless/2023-September/001441.html
Comment 2 Bjoern A. Zeeb freebsd_committer freebsd_triage 2023-10-25 21:16:16 UTC
(In reply to Bjoern A. Zeeb from comment #1)

In case you have any chance, can you try 14 or 15 and see if this is gone for you?
Comment 3 Vincent Milum Jr 2023-11-08 00:03:02 UTC
I'm seeing this exact same issue on 14.0-RC4.

I also commented out the IPv6 config from rc.conf, and magically WiFi started working again after reboot.
Comment 4 Bjoern A. Zeeb freebsd_committer freebsd_triage 2023-11-08 00:26:00 UTC
(In reply to Vincent Milum Jr from comment #3)

Hello Vincent,

Which error exactly did you get?  Do you have a full log of it?

How did you install 14.0-RC4?  Was this a fresh install or a source update?
Comment 5 Vincent Milum Jr 2023-11-08 03:58:59 UTC
FreeBSD 14.0-RC4 fresh install via ISO image.

Also note that just doing a WiFi SSID scan from the installer image also fails.

Full dmesg:
https://gist.github.com/darkain/449cb04e6e932f0b468293711edd657a
Comment 6 Bjoern A. Zeeb freebsd_committer freebsd_triage 2023-11-09 20:05:13 UTC
(In reply to Vincent Milum Jr from comment #5)

Thanks.  This seems to be triggered by a possible race in net80211 that also affects other drivers; because LinuxKPI/iwlwifi does a lot more work, it seems to expose the race more easily.  I'll follow up in a few days.  It seems cc@ can reproduce it reliably and will try to gather more info, as I no longer can.
Comment 7 Bjoern A. Zeeb freebsd_committer freebsd_triage 2023-11-29 01:50:33 UTC
(In reply to Vincent Milum Jr from comment #5)

The installer problem is tracked here now:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=274990
Comment 9 Bjoern A. Zeeb freebsd_committer freebsd_triage 2023-12-19 01:11:53 UTC
(In reply to Bjoern A. Zeeb from comment #2)

Kevin, any chance for you to test main, a recent stable/14 or recent stable/13?
Comment 10 Kevin 2023-12-19 08:39:08 UTC
Yes, I have been trying to solve this problem by updating the system. A few days ago I updated to a newer stable/14 (FreeBSD 14.0-STABLE #0 stable/14-n265989-c07ebf5becae) and still got the same error. However, I accidentally discovered a way to make IPv6 work properly, which is to use Link Aggregation and Failover. Below is my configuration.


cloned_interfaces="lagg0 lo1"
##Link Aggregation and Failover
#
# Configuring Dynamic IPv6 Address
ipv6_activate_all_interfaces="YES"
rtsold_enable="YES"

#ifconfig_alc0="up"
#ifconfig_wlan0="up"
ifconfig_alc0="ether 47:b6:76:66:6e:59"
wlans_iwlwifi0="wlan0"
#ifconfig_wlan0_ipv6="inet6 accept_rtadv"
create_args_wlan0="wlanmode sta regdomain FCC country US"
wlandebug_wlan0="+state +node +auth +assoc +scan +output +dot1xsm +wpa +alc"
#cloned_interfaces="lagg0"
ifconfig_lagg0="up laggproto failover laggport alc0 laggport wlan0 SYNCDHCP"
#ifconfig_lagg0_ipv6="inet6 accept_rtadv"
ifconfig_wlan0="WPA mode 11a"


This allows me to use the network normally with IPv6 enabled.
"lo1" is the interface for my BastilleBSD setup. If you don't use jails, you can ignore it.
Also, since I want to enable IPv6 on all interfaces instead of configuring them one by one, I used ipv6_activate_all_interfaces="YES" and set net.inet6.ip6.accept_rtadv = "1".
I only know that this configuration lets me use IPv6 normally; I don't know why. I hope this helps anyone with the same problem, and that this bug can be fixed soon.
Comment 11 Kevin 2023-12-19 08:55:05 UTC
(In reply to Bjoern A. Zeeb from comment #9)
If there is a fix for this bug, I'd be happy to test it.
Comment 12 commit-hook freebsd_committer freebsd_triage 2024-02-14 19:50:36 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=713db49d06deee90dd358b2e4b9ca05368a5eaf6

commit 713db49d06deee90dd358b2e4b9ca05368a5eaf6
Author:     Bjoern A. Zeeb <bz@FreeBSD.org>
AuthorDate: 2024-01-10 10:14:16 +0000
Commit:     Bjoern A. Zeeb <bz@FreeBSD.org>
CommitDate: 2024-02-14 19:47:21 +0000

    net80211: deal with lost state transitions

    Since 5efea30f039c4 we can possibly lose a state transition which can
    cause trouble further down the road.
    The reproducer from 643d6dce6c1e can trigger these for example.
    Drivers for firmware based wireless cards have worked around some of
    this (and other) problems in the past.

    Add an array of tasks rather than a single one as we would simply
    get npending > 1 and lose order with other tasks.  Try to keep state
    changes updated as queued in case we end up with more than one at a
    time.  While this is not ideal either (call it a hack) it will sort
    the problem for now.
    We will queue in ieee80211_new_state_locked() and do checks there
    and dequeue in ieee80211_newstate_cb().
    If we still overrun the (currently) 8 slots we will drop the state
    change rather than overwrite the last one.
    When dequeuing we will update iv_nstate and keep it around for historic
    reasons for the moment.

    The longer term we should make the callers of
    ieee80211_new_state[_locked]() actually use the returned errors
    and act appropriately but that will touch a lot more places and
    drivers (possibly incl. changed behaviour for ioctls).

    rtwn(4) and rum(4) should probably be revisited and net80211 internals
    removed (for rum(4) at least the current logic still seems prone to
    races).

    PR:             271979, 271988, 275255, 263613, 274003
    Sponsored by:   The FreeBSD Foundation (in 2023)
    MFC after:      3 days
    Reviewed by:    cc
    Differential Revision: https://reviews.freebsd.org/D43389

 sys/dev/rtwn/if_rtwn.c         |   4 +-
 sys/dev/usb/wlan/if_rum.c      |   4 +-
 sys/net80211/ieee80211.c       |   4 +-
 sys/net80211/ieee80211_ddb.c   |  13 ++++-
 sys/net80211/ieee80211_proto.c | 124 ++++++++++++++++++++++++++++++++++-------
 sys/net80211/ieee80211_var.h   |  13 ++++-
 6 files changed, 134 insertions(+), 28 deletions(-)
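
The queuing scheme described in the commit message above can be pictured with a small model. This is only a sketch in Python, not the net80211 code: the class and method names are hypothetical, and the only details taken from the commit message are the bounded number of slots (currently 8), FIFO ordering, and dropping a new request on overrun instead of overwriting the last queued one.

```python
from collections import deque

NSTATE_SLOTS = 8  # slot count quoted in the commit message


class PendingTransitions:
    """Toy model of a bounded FIFO of pending state changes (hypothetical names)."""

    def __init__(self, slots=NSTATE_SLOTS):
        self.slots = slots
        self.pending = deque()
        self.dropped = 0

    def enqueue(self, nstate, arg=0):
        """Queue side, standing in for ieee80211_new_state_locked()."""
        if len(self.pending) >= self.slots:
            # Overrun: drop the new request rather than overwrite the last one.
            self.dropped += 1
            return False
        self.pending.append((nstate, arg))
        return True

    def dequeue(self):
        """Callback side, standing in for ieee80211_newstate_cb()."""
        return self.pending.popleft() if self.pending else None
```

With a single pending slot, a second request that arrives before the callback has run cannot be kept alongside the first; that appears to be the situation the "transition lost" lines in the dmesg output above report. With several slots, both requests are kept and handled in the order they were queued.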
Comment 13 commit-hook freebsd_committer freebsd_triage 2024-02-14 19:50:39 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=2ac8a2189ac6707f48f77ef2e36baf696a0d2f40

commit 2ac8a2189ac6707f48f77ef2e36baf696a0d2f40
Author:     Bjoern A. Zeeb <bz@FreeBSD.org>
AuthorDate: 2024-02-03 16:33:56 +0000
Commit:     Bjoern A. Zeeb <bz@FreeBSD.org>
CommitDate: 2024-02-14 19:47:53 +0000

    LinuxKPI: 802.11: band-aid for invalid state changes after (*iv_update_bss)

    With firmware based solutions we cannot just jump from an active session
    to a new iv_bss node without tearing down state for the old and bringing
    up the new node.  This likely used to work on softmac based cards/drivers
    where one could essentially set the state and fire at will.

    We track (*iv_update_bss) calls from net80211 and set a local flag that
    we are out of synch and do not allow any further operations up the state
    machine until we hit INIT or SCAN.  That means someone will take the state
    down, clean up firmware state and then we can join again and build up
    state.

    Apparently this problem has been "known" for a while as native iwm(4) and
    others have similar workarounds (though less strict) and can be equally
    pestered into bad states.  For LinuxKPI all the KASSERTs just massively
    brought this problem out.  The solution will be some rewrites in net80211.
    Until then, try to keep us more stable at least and not die on second
    join1() calls triggered by service netif start wlan0 and similar.

    PR:             271979, 271988, 275255, 263613, 274003
    Sponsored by:   The FreeBSD Foundation (2023, partial)
    MFC after:      3 days
    Reviewed by:    cc
    Differential Revision: https://reviews.freebsd.org/D43725

 sys/compat/linuxkpi/common/src/linux_80211.c | 309 +++++++++++++++++++--------
 sys/compat/linuxkpi/common/src/linux_80211.h |   2 +
 2 files changed, 216 insertions(+), 95 deletions(-)
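
The band-aid described in the commit message above amounts to a gate on the state machine. The following Python sketch is a simplified model under stated assumptions, not the LinuxKPI implementation: the class, the method names, and the reduced state list are hypothetical, and the numeric values only provide an ordering for this example.

```python
from enum import IntEnum


class State(IntEnum):
    # Reduced state list; values only give an ordering for this sketch.
    INIT = 0
    SCAN = 1
    AUTH = 2
    ASSOC = 3
    RUN = 4


class VapModel:
    """Toy model of the 'out of synch' flag (hypothetical names)."""

    def __init__(self):
        self.state = State.INIT
        self.out_of_sync = False

    def update_bss(self):
        """Stands in for an (*iv_update_bss) call: firmware state is now stale."""
        self.out_of_sync = True

    def new_state(self, nstate):
        """Return True if the transition is accepted, False if refused."""
        if nstate in (State.INIT, State.SCAN):
            # Going all the way down tears down firmware state and clears the flag.
            self.out_of_sync = False
            self.state = nstate
            return True
        if self.out_of_sync and nstate > self.state:
            # Refuse to move up the state machine until INIT or SCAN was reached.
            return False
        self.state = nstate
        return True
```

In this model, once update_bss() has been called, a request to move from AUTH to ASSOC is refused until the state has been taken down to INIT or SCAN, after which joining can proceed again.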
Comment 14 commit-hook freebsd_committer freebsd_triage 2024-02-18 21:12:11 UTC
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=b392b36d3776b696601ce0253256803276d24ea2

commit b392b36d3776b696601ce0253256803276d24ea2
Author:     Bjoern A. Zeeb <bz@FreeBSD.org>
AuthorDate: 2024-01-10 10:14:16 +0000
Commit:     Bjoern A. Zeeb <bz@FreeBSD.org>
CommitDate: 2024-02-18 18:31:17 +0000

    net80211: deal with lost state transitions

    Since 5efea30f039c4 we can possibly lose a state transition which can
    cause trouble further down the road.
    The reproducer from 643d6dce6c1e can trigger these for example.
    Drivers for firmware based wireless cards have worked around some of
    this (and other) problems in the past.

    Add an array of tasks rather than a single one as we would simply
    get npending > 1 and lose order with other tasks.  Try to keep state
    changes updated as queued in case we end up with more than one at a
    time.  While this is not ideal either (call it a hack) it will sort
    the problem for now.
    We will queue in ieee80211_new_state_locked() and do checks there
    and dequeue in ieee80211_newstate_cb().
    If we still overrun the (currently) 8 slots we will drop the state
    change rather than overwrite the last one.
    When dequeuing we will update iv_nstate and keep it around for historic
    reasons for the moment.

    The longer term we should make the callers of
    ieee80211_new_state[_locked]() actually use the returned errors
    and act appropriately but that will touch a lot more places and
    drivers (possibly incl. changed behaviour for ioctls).

    rtwn(4) and rum(4) should probably be revisited and net80211 internals
    removed (for rum(4) at least the current logic still seems prone to
    races).

    Given this changes the internal structure of 'struct ieee80211vap',
    which gets allocated by the drivers, and we do not have enough
    spares, all wireless drivers need to be recompiled.
    Given we are forced to do the update, we leave fields in the middle
    of the struct and add more spares at the same time.
    __FreeBSD_version gets updated to 1400509 to be able to detect
    this change.

    PR:             271979, 271988, 275255, 263613, 274003
    Sponsored by:   The FreeBSD Foundation (in 2023)
    Reviewed by:    cc
    Differential Revision: https://reviews.freebsd.org/D43389

    (cherry picked from commit 713db49d06deee90dd358b2e4b9ca05368a5eaf6)
    (cherry picked from commit a890a3a5ddf33acb0a4000885945b89156799b07)

 UPDATING                       |   6 ++
 sys/dev/rtwn/if_rtwn.c         |   4 +-
 sys/dev/usb/wlan/if_rum.c      |   4 +-
 sys/net80211/ieee80211.c       |   4 +-
 sys/net80211/ieee80211_ddb.c   |  13 ++++-
 sys/net80211/ieee80211_proto.c | 124 ++++++++++++++++++++++++++++++++++-------
 sys/net80211/ieee80211_var.h   |  15 +++--
 sys/sys/param.h                |   2 +-
 8 files changed, 142 insertions(+), 30 deletions(-)
Comment 15 commit-hook freebsd_committer freebsd_triage 2024-02-18 21:12:19 UTC
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=8c450ea1083b03f30871506b59034f26bc608972

commit 8c450ea1083b03f30871506b59034f26bc608972
Author:     Bjoern A. Zeeb <bz@FreeBSD.org>
AuthorDate: 2024-02-03 16:33:56 +0000
Commit:     Bjoern A. Zeeb <bz@FreeBSD.org>
CommitDate: 2024-02-18 18:31:17 +0000

    LinuxKPI: 802.11: band-aid for invalid state changes after (*iv_update_bss)

    With firmware based solutions we cannot just jump from an active session
    to a new iv_bss node without tearing down state for the old and bringing
    up the new node.  This likely used to work on softmac based cards/drivers
    where one could essentially set the state and fire at will.

    We track (*iv_update_bss) calls from net80211 and set a local flag that
    we are out of synch and do not allow any further operations up the state
    machine until we hit INIT or SCAN.  That means someone will take the state
    down, clean up firmware state and then we can join again and build up
    state.

    Apparently this problem has been "known" for a while as native iwm(4) and
    others have similar workarounds (though less strict) and can be equally
    pestered into bad states.  For LinuxKPI all the KASSERTs just massively
    brought this problem out.  The solution will be some rewrites in net80211.
    Until then, try to keep us more stable at least and not die on second
    join1() calls triggered by service netif start wlan0 and similar.

    PR:             271979, 271988, 275255, 263613, 274003
    Sponsored by:   The FreeBSD Foundation (2023, partial)
    Reviewed by:    cc
    Differential Revision: https://reviews.freebsd.org/D43725

    (cherry picked from commit 2ac8a2189ac6707f48f77ef2e36baf696a0d2f40)

 sys/compat/linuxkpi/common/src/linux_80211.c | 309 +++++++++++++++++++--------
 sys/compat/linuxkpi/common/src/linux_80211.h |   2 +
 2 files changed, 216 insertions(+), 95 deletions(-)
Comment 16 commit-hook freebsd_committer freebsd_triage 2024-02-19 08:09:19 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=a7e1fc7f620d3341549c1380f550aaafbdb45622

commit a7e1fc7f620d3341549c1380f550aaafbdb45622
Author:     Bjoern A. Zeeb <bz@FreeBSD.org>
AuthorDate: 2024-01-10 10:14:16 +0000
Commit:     Bjoern A. Zeeb <bz@FreeBSD.org>
CommitDate: 2024-02-19 08:02:01 +0000

    net80211: deal with lost state transitions

    Since 5efea30f039c4 we can possibly lose a state transition which can
    cause trouble further down the road.
    The reproducer from 643d6dce6c1e can trigger these for example.
    Drivers for firmware based wireless cards have worked around some of
    this (and other) problems in the past.

    Add an array of tasks rather than a single one as we would simply
    get npending > 1 and lose order with other tasks.  Try to keep state
    changes updated as queued in case we end up with more than one at a
    time.  While this is not ideal either (call it a hack) it will sort
    the problem for now.
    We will queue in ieee80211_new_state_locked() and do checks there
    and dequeue in ieee80211_newstate_cb().
    If we still overrun the (currently) 8 slots we will drop the state
    change rather than overwrite the last one.
    When dequeuing we will update iv_nstate and keep it around for historic
    reasons for the moment.

    The longer term we should make the callers of
    ieee80211_new_state[_locked]() actually use the returned errors
    and act appropriately but that will touch a lot more places and
    drivers (possibly incl. changed behaviour for ioctls).

    rtwn(4) and rum(4) should probably be revisited and net80211 internals
    removed (for rum(4) at least the current logic still seems prone to
    races).

    PR:             271979, 271988, 275255, 263613, 274003
    Sponsored by:   The FreeBSD Foundation (in 2023)
    Reviewed by:    cc
    Differential Revision: https://reviews.freebsd.org/D43389

    (cherry picked from commit 713db49d06deee90dd358b2e4b9ca05368a5eaf6)

    Given this changes the internal structure of 'struct ieee80211vap',
    which gets allocated by the drivers, and we do not have enough
    spares, all wireless drivers need to be recompiled.
    Given we are forced to do the update, we leave fields in the middle
    of the struct and add more spares at the same time.
    __FreeBSD_version gets updated to 1303501 to be able to detect
    this change.

    (cherry picked from commit a890a3a5ddf33acb0a4000885945b89156799b07)

 UPDATING                       |   6 ++
 sys/dev/rtwn/if_rtwn.c         |   4 +-
 sys/dev/usb/wlan/if_rum.c      |   4 +-
 sys/net80211/ieee80211.c       |   4 +-
 sys/net80211/ieee80211_ddb.c   |  15 ++++-
 sys/net80211/ieee80211_proto.c | 124 ++++++++++++++++++++++++++++++++++-------
 sys/net80211/ieee80211_var.h   |  18 +++---
 sys/sys/param.h                |   2 +-
 8 files changed, 143 insertions(+), 34 deletions(-)
Comment 17 commit-hook freebsd_committer freebsd_triage 2024-02-19 08:09:30 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=184ccc414686ea32c64f063c081c7cc1adeae7c3

commit 184ccc414686ea32c64f063c081c7cc1adeae7c3
Author:     Bjoern A. Zeeb <bz@FreeBSD.org>
AuthorDate: 2024-02-03 16:33:56 +0000
Commit:     Bjoern A. Zeeb <bz@FreeBSD.org>
CommitDate: 2024-02-19 08:02:02 +0000

    LinuxKPI: 802.11: band-aid for invalid state changes after (*iv_update_bss)

    With firmware based solutions we cannot just jump from an active session
    to a new iv_bss node without tearing down state for the old and bringing
    up the new node.  This likely used to work on softmac based cards/drivers
    where one could essentially set the state and fire at will.

    We track (*iv_update_bss) calls from net80211 and set a local flag that
    we are out of synch and do not allow any further operations up the state
    machine until we hit INIT or SCAN.  That means someone will take the state
    down, clean up firmware state and then we can join again and build up
    state.

    Apparently this problem has been "known" for a while as native iwm(4) and
    others have similar workarounds (though less strict) and can be equally
    pestered into bad states.  For LinuxKPI all the KASSERTs just massively
    brought this problem out.  The solution will be some rewrites in net80211.
    Until then, try to keep us more stable at least and not die on second
    join1() calls triggered by service netif start wlan0 and similar.

    PR:             271979, 271988, 275255, 263613, 274003
    Sponsored by:   The FreeBSD Foundation (2023, partial)
    Reviewed by:    cc
    Differential Revision: https://reviews.freebsd.org/D43725

    (cherry picked from commit 2ac8a2189ac6707f48f77ef2e36baf696a0d2f40)

 sys/compat/linuxkpi/common/src/linux_80211.c | 309 +++++++++++++++++++--------
 sys/compat/linuxkpi/common/src/linux_80211.h |   2 +
 2 files changed, 216 insertions(+), 95 deletions(-)
Comment 18 commit-hook freebsd_committer freebsd_triage 2024-02-19 16:10:58 UTC
A commit in branch releng/13.3 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=9b998db87c28356fce21784c4f8bfb8737615e1f

commit 9b998db87c28356fce21784c4f8bfb8737615e1f
Author:     Bjoern A. Zeeb <bz@FreeBSD.org>
AuthorDate: 2024-01-10 10:14:16 +0000
Commit:     Bjoern A. Zeeb <bz@FreeBSD.org>
CommitDate: 2024-02-19 16:07:20 +0000

    net80211: deal with lost state transitions

    Since 5efea30f039c4 we can possibly lose a state transition which can
    cause trouble further down the road.
    The reproducer from 643d6dce6c1e can trigger these for example.
    Drivers for firmware based wireless cards have worked around some of
    this (and other) problems in the past.

    Add an array of tasks rather than a single one as we would simply
    get npending > 1 and lose order with other tasks.  Try to keep state
    changes updated as queued in case we end up with more than one at a
    time.  While this is not ideal either (call it a hack) it will sort
    the problem for now.
    We will queue in ieee80211_new_state_locked() and do checks there
    and dequeue in ieee80211_newstate_cb().
    If we still overrun the (currently) 8 slots we will drop the state
    change rather than overwrite the last one.
    When dequeuing we will update iv_nstate and keep it around for historic
    reasons for the moment.

    In the longer term we should make the callers of
    ieee80211_new_state[_locked]() actually use the returned errors
    and act appropriately but that will touch a lot more places and
    drivers (possibly incl. changed behaviour for ioctls).

    rtwn(4) and rum(4) should probably be revisited and net80211 internals
    removed (for rum(4) at least the current logic still seems prone to
    races).

    PR:             271979, 271988, 275255, 263613, 274003
    Sponsored by:   The FreeBSD Foundation (in 2023)
    Reviewed by:    cc
    Differential Revision: https://reviews.freebsd.org/D43389

    (cherry picked from commit 713db49d06deee90dd358b2e4b9ca05368a5eaf6)

    Given this changes the internal structure of 'struct ieee80211vap',
    which gets allocated by the drivers, and we do not have enough
    spares, all wireless drivers need to be recompiled.
    Given we are forced to do the update, we leave fields in the middle
    of the struct and add more spares at the same time.
    __FreeBSD_version will get updated to 1303001 to be able to detect
    this change.

    Approved by:    re (cperciva)

    (cherry picked from commit a890a3a5ddf33acb0a4000885945b89156799b07)
    (cherry picked from commit a7e1fc7f620d3341549c1380f550aaafbdb45622)

 sys/dev/rtwn/if_rtwn.c         |   4 +-
 sys/dev/usb/wlan/if_rum.c      |   4 +-
 sys/net80211/ieee80211.c       |   4 +-
 sys/net80211/ieee80211_ddb.c   |  15 ++++-
 sys/net80211/ieee80211_proto.c | 124 ++++++++++++++++++++++++++++++++++-------
 sys/net80211/ieee80211_var.h   |  18 +++---
 6 files changed, 136 insertions(+), 33 deletions(-)
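
The queueing scheme described in the commit message above (a fixed number of slots, enqueue on request, dequeue in the callback, drop on overrun rather than overwrite) can be illustrated with a small ring buffer. This is only a sketch under assumptions: the type and function names (state_queue, sq_enqueue, sq_dequeue) and the state enum are hypothetical and are not the net80211 structures or API; only the slot count of 8 comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not the net80211 code. */
#define NSTATE_SLOTS 8	/* the commit message mentions (currently) 8 slots */

enum vap_state { ST_INIT, ST_SCAN, ST_AUTH, ST_ASSOC, ST_RUN };

struct state_queue {
	enum vap_state slots[NSTATE_SLOTS];
	size_t head;			/* index of the oldest queued entry */
	size_t count;			/* number of entries currently queued */
	enum vap_state last_dequeued;	/* kept up to date on every dequeue */
};

static void
sq_init(struct state_queue *q)
{
	assert(q != NULL);
	q->head = 0;
	q->count = 0;
	q->last_dequeued = ST_INIT;
}

/*
 * Queue a requested state change in arrival order.  If all slots are
 * in use the new request is dropped and false is returned; an already
 * queued entry is never overwritten.
 */
static bool
sq_enqueue(struct state_queue *q, enum vap_state s)
{
	if (q->count == NSTATE_SLOTS)
		return (false);
	q->slots[(q->head + q->count) % NSTATE_SLOTS] = s;
	q->count++;
	return (true);
}

/* Take the oldest queued state change, if any, preserving order. */
static bool
sq_dequeue(struct state_queue *q, enum vap_state *out)
{
	if (q->count == 0)
		return (false);
	*out = q->slots[q->head];
	q->head = (q->head + 1) % NSTATE_SLOTS;
	q->count--;
	q->last_dequeued = *out;
	return (true);
}
```

In this sketch a caller that sees sq_enqueue() return false knows its request was lost, which mirrors the commit message's point that callers should eventually act on the returned error.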
Comment 19 commit-hook freebsd_committer freebsd_triage 2024-02-19 16:11:00 UTC
A commit in branch releng/13.3 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=d4b4efc6db6c6c3a9abf2f187ba1ccc0e40028cf

commit d4b4efc6db6c6c3a9abf2f187ba1ccc0e40028cf
Author:     Bjoern A. Zeeb <bz@FreeBSD.org>
AuthorDate: 2024-02-03 16:33:56 +0000
Commit:     Bjoern A. Zeeb <bz@FreeBSD.org>
CommitDate: 2024-02-19 16:09:22 +0000

    LinuxKPI: 802.11: band-aid for invalid state changes after (*iv_update_bss)

    With firmware based solutions we cannot just jump from an active session
    to a new iv_bss node without tearing down state for the old and bringing
    up the new node.  This likely used to work on softmac based cards/drivers
    where one could essentially set the state and fire at will.

    We track (*iv_update_bss) calls from net80211 and set a local flag that
    we are out of synch and do not allow any further operations up the state
    machine until we hit INIT or SCAN.  That means someone will take the state
    down, clean up firmware state and then we can join again and build up
    state.

    Apparently this problem has been "known" for a while as native iwm(4) and
    others have similar workarounds (though less strict) and can be equally
    pestered into bad states.  For LinuxKPI all the KASSERTs just massively
    brought this problem out.  The solution will be some rewrites in net80211.
    Until then, try to keep us more stable at least and not die on second
    join1() calls triggered by service netif start wlan0 and similar.

    Approved by:    re (cperciva)
    PR:             271979, 271988, 275255, 263613, 274003
    Sponsored by:   The FreeBSD Foundation (2023, partial)
    Reviewed by:    cc
    Differential Revision: https://reviews.freebsd.org/D43725

    (cherry picked from commit 2ac8a2189ac6707f48f77ef2e36baf696a0d2f40)
    (cherry picked from commit 184ccc414686ea32c64f063c081c7cc1adeae7c3)

 sys/compat/linuxkpi/common/src/linux_80211.c | 309 +++++++++++++++++++--------
 sys/compat/linuxkpi/common/src/linux_80211.h |   2 +
 2 files changed, 216 insertions(+), 95 deletions(-)
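
The band-aid described in the commit message above (set a flag on a BSS update, refuse to move up the state machine until INIT or SCAN is reached) can be sketched as follows. This is a simplified illustration under assumptions, not the LinuxKPI implementation: the struct, the function names, and the choice of EBUSY as the refusal error are all hypothetical, and the sketch does not validate which transitions are otherwise legal.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical names throughout; this is not the LinuxKPI 802.11 code. */
enum sta_state { S_INIT, S_SCAN, S_AUTH, S_ASSOC, S_RUN };

struct vif_sync {
	enum sta_state state;	/* last state that was actually applied */
	bool out_of_sync;	/* set once the BSS node changed under us */
};

static void
vs_init(struct vif_sync *v)
{
	v->state = S_INIT;
	v->out_of_sync = false;
}

/* Stand-in for tracking an (*iv_update_bss) call: just remember it. */
static void
vs_note_bss_update(struct vif_sync *v)
{
	v->out_of_sync = true;
}

/*
 * Request a state change.  Going down to INIT or SCAN is always
 * accepted and clears the flag, since that is where the old state is
 * torn down.  Anything else is refused while the flag is set.
 * EBUSY is an assumed error value chosen for illustration.
 */
static int
vs_request_state(struct vif_sync *v, enum sta_state next)
{
	if (next == S_INIT || next == S_SCAN) {
		v->out_of_sync = false;
		v->state = next;
		return (0);
	}
	if (v->out_of_sync)
		return (EBUSY);
	v->state = next;
	return (0);
}
```

The design choice shown is that the flag is sticky: a single BSS update blocks every upward transition until something takes the interface back down, after which a fresh join can build up state again.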
Comment 20 Bjoern A. Zeeb freebsd_committer freebsd_triage 2024-02-19 17:03:20 UTC
So the initial report is a PR 274382 problem "Invalid TXQ id".
I'll add that to "See also".

Kevin, could you confirm that this was always the issue?
Do you still see firmware crashes?

Given people generally have no trouble with IPv6 on iwlwifi, I keep wondering if changing rc.conf simply changes timing or lagg does an extra down/up cycle and it now works "by accident" not triggering the above mentioned problem anymore.

We should probably check again once 274382 is fixed.

If you have any further news in the meantime, please let us know.
Comment 21 Kevin 2024-03-20 01:35:26 UTC
(In reply to Bjoern A. Zeeb from comment #20)

Great work!

After an upgrade, this issue has been solved on my laptop. The system now works normally with IPv6 enabled; however, IPv6 does not work perfectly.

My laptop gets a public IPv6 address for a short time after startup, and then it is replaced with a private IPv6 address.

I'm not sure if my system config causes this issue.

"service netif restart" does not solve this issue; only a reboot does.
Comment 22 Bjoern A. Zeeb freebsd_committer freebsd_triage 2024-03-20 19:54:24 UTC
(In reply to Kevin from comment #21)

Hi Kevin,

I am not sure what you mean by a "private" IPv6 address?
If you don't want to share the ifconfig output publicly, please email it to me at bz@ and I'll have a look at the "before" and "after" output and we can sort that out.

Otherwise, in case there is a multicast filter or other issue, there are a few things to look at.

Which FreeBSD version are you on now?

Can you check lifetimes using ifconfig -L and ndp -pn and ndp -rn at the time when it stops working?

Also can you go and check based on wpa_supplicant logging, how long it has been since your last (re)assoc or re-key?

Does running rtsol wlan0 bring things back at that point?  Please do not do a service netif restart (at least not before all this), as it destroys the wlan interface and creates a new one and all state is gone;  ifconfig wlan0 down; ifconfig wlan0 up  is a lot better to try before that.

What does a ping6 -n ff02::1%wlan0 show;  anything but your own interface link-local address?