Bug 271979 - bsdinstall(8): iwlwifi(4): system crash when authenticating for Wi-Fi: panic: lkpi_sta_auth_to_scan: lsta 0x... state not NONE: 0, nstate 1 arg 1 (fixed?)
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: Bjoern A. Zeeb
URL:
Keywords: crash, install, needs-qa
Depends on:
Blocks: 14.0r iwlwifi
Reported: 2023-06-13 12:56 UTC by Hitch
Modified: 2024-06-08 01:07 UTC
CC List: 16 users

See Also:
bz: mfc-stable14+
bz: mfc-stable13+


Attachments
Backtrace after the installation failed and dropped out to the debugger. (104.08 KB, image/jpeg)
2023-06-13 18:13 UTC, Hitch

Description Hitch 2023-06-13 12:56:02 UTC
During installation of FreeBSD 14.0-CURRENT, the installation proceeds well up to the point where it asks how I would like to connect: Ethernet or wireless. When I choose wireless, the system scans for wireless networks and finds them. I choose my wireless network and am prompted for the password. As soon as I enter it, the installer exits to a prompt and the system stops completely. I've attempted the install twice, with the same result both times. The system drops to a db> prompt, which permits limited options; none of them resume the installation.
Comment 1 Mina Galić freebsd_triage 2023-06-13 16:10:07 UTC
Which Wi-Fi driver is the installer selecting?
As you're already in the debugger, can you share a "backtrace" from the panic?
https://docs.freebsd.org/en/books/developers-handbook/kerneldebug/
Comment 2 Hitch 2023-06-13 18:13:55 UTC
Created attachment 242764
Backtrace after the installation failed and dropped out to the debugger.

This is the backtrace after the installer failed after attempting to connect to wifi.
Comment 3 Jessica Clarke freebsd_committer freebsd_triage 2023-06-13 20:50:23 UTC
What does `show panic` say? I have a hunch this is the same bug we see and that bz@ knows about but has yet to sit down and fix.
Comment 4 Bjoern A. Zeeb freebsd_committer freebsd_triage 2023-06-13 21:58:13 UTC
(In reply to Jessica Clarke from comment #3)

And which Chipset/Vendor is this (Realtek or Intel)?
Comment 5 Jessica Clarke freebsd_committer freebsd_triage 2023-06-13 22:02:36 UTC
(In reply to Bjoern A. Zeeb from comment #4)

Our issue is with iwlwifi giving "panic: lkpi_sta_auth_to_scan: lsta 0x... state not NONE: 0, nstate 1 arg 1" which jhb@ emailed you about a month ago. The backtrace in this bug report matches that, but there is another KASSERT that it could be (though that seems unlikely).
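
As a reading aid for the panic string quoted above, here is a minimal sketch that maps its numbers to names. The thread itself only confirms that net80211 state 1 is SCAN and 2 is AUTH; the remaining values are an assumption based on enum ieee80211_state (net80211) and enum ieee80211_sta_state (LinuxKPI 802.11) and should be checked against your source tree. The function names are hypothetical.

```shell
#!/bin/sh
# Decode the numbers in "... state not NONE: <sta>, nstate <n> arg <a>".
# Assumption: numeric values follow enum ieee80211_state (net80211) and
# enum ieee80211_sta_state (LinuxKPI 802.11); verify against your tree.

net80211_state_name() {
    case "$1" in
        0) echo INIT ;;
        1) echo SCAN ;;
        2) echo AUTH ;;
        3) echo ASSOC ;;
        4) echo CAC ;;
        5) echo RUN ;;
        6) echo CSA ;;
        7) echo SLEEP ;;
        *) echo "unknown($1)" ;;
    esac
}

lkpi_sta_state_name() {
    # The panic format uses %#x, so non-zero values carry a 0x prefix.
    case "$1" in
        0)     echo NOTEXIST ;;
        1|0x1) echo NONE ;;
        2|0x2) echo AUTH ;;
        3|0x3) echo ASSOC ;;
        4|0x4) echo AUTHORIZED ;;
        *)     echo "unknown($1)" ;;
    esac
}

decode_panic() {
    # $1 = panic message text; prints nothing if the pattern is absent.
    pair=$(printf '%s\n' "$1" |
        sed -n 's/.*state not NONE: \([0-9a-fx]*\), nstate \([0-9]*\).*/\1 \2/p')
    [ -n "$pair" ] || return 1
    set -- $pair
    echo "lsta state $(lkpi_sta_state_name "$1"), requested net80211 state $(net80211_state_name "$2")"
}
```

Under those assumptions, the message in this report reads as: the LinuxKPI station entry was in state 0 (not 1, NONE) when net80211 requested the move back to SCAN.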
Comment 6 Graham Perrin freebsd_committer freebsd_triage 2023-06-16 06:40:55 UTC
^Triage: summary, component, keywords, make the former assignee a cc recipient. 

(In reply to Hitch from comment #0)

Thank you, can you tell what Wi-Fi hardware is in the notebook? 

For an exact answer, you can boot from the installer and use a shell to run the following command: 


pciconf -lv | grep -B 3 network
Comment 7 Oleg 2023-07-07 01:07:08 UTC
I think I've been encountering the same panic with my Intel Wi-Fi 6 AX201 card. If I type "ifconfig wlan create wlandev iwlwifi0", "ifconfig wlan0 channel 153", "wpa_supplicant -i wlan0 -c /etc/wpa_supplicant.conf &" and "dhclient wlan0", I can access the internet without issues and there is no panic. However, if I type "ifconfig wlan0 up" right after creating the wlan device, there is a panic after I type "wpa_supplicant -i wlan0 -c /etc/wpa_supplicant.conf &". I don't need to type "ifconfig wlan0 up", because I can still access the internet with the commands already mentioned; however, bsdinstall wants the interface up early, so in bsdinstall I encounter this panic. Here's the backtrace:

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:59
59              __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) #0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:59
#1  doadump (textdump=textdump@entry=1)
    at /usr/src/sys/kern/kern_shutdown.c:407
#2  0xffffffff80b4bb60 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:528
#3  0xffffffff80b4c07d in vpanic (
    fmt=0xffffffff811d6263 "%s: lsta %p state not NONE: %#x, nstate %d arg %d\n", ap=ap@entry=0xfffffe01e9a94ce0) at /usr/src/sys/kern/kern_shutdown.c:972
#4  0xffffffff80b4be03 in panic (fmt=<unavailable>)
    at /usr/src/sys/kern/kern_shutdown.c:896
#5  0xffffffff80dcbd0c in lkpi_sta_auth_to_scan (vap=0xfffffe01f95cd010,
    nstate=IEEE80211_S_SCAN, arg=1)
    at /usr/src/sys/compat/linuxkpi/common/src/linux_80211.c:1153
#6  0xffffffff80dd2963 in lkpi_iv_newstate (vap=0xfffffe01f95cd010,
    nstate=IEEE80211_S_SCAN, arg=1)
    at /usr/src/sys/compat/linuxkpi/common/src/linux_80211.c:2045
#7  0xffffffff80cf8937 in ieee80211_newstate_cb (xvap=0xfffffe01f95cd010,
    npending=<optimized out>) at /usr/src/sys/net80211/ieee80211_proto.c:2548
#8  0xffffffff80bb055b in taskqueue_run_locked (
    queue=queue@entry=0xfffff80001cc8e00)
    at /usr/src/sys/kern/subr_taskqueue.c:514
#9  0xffffffff80bb1613 in taskqueue_thread_loop (
    arg=arg@entry=0xfffffe01ea0de110)
    at /usr/src/sys/kern/subr_taskqueue.c:826
#10 0xffffffff80b02500 in fork_exit (
    callout=0xffffffff80bb1540 <taskqueue_thread_loop>,
    arg=0xfffffe01ea0de110, frame=0xfffffe01e9a94f40)
    at /usr/src/sys/kern/kern_fork.c:1131
#11 <signal handler called>
#12 0xdeadc0dedeadc0de in ?? ()
Backtrace stopped: Cannot access memory at address 0xdeadc0dedeadc0de
(kgdb)
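
The two command orders described in this comment can be sketched as a small script. This is only an illustration: the helper names run and bring_up and the DRY_RUN switch are hypothetical, the channel step is kept in both orders as an assumption, and wpa_supplicant's -B (daemonize) option stands in for the trailing "&" of the original command. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of the two bring-up orders from comment 7.
# DRY_RUN defaults to 1, so commands are printed and not executed.
# Set DRY_RUN=0 only on a test machine: "early-up" is the order that panicked.
: "${DRY_RUN:=1}"

run() {
    if [ "$DRY_RUN" -eq 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

bring_up() {
    # $1 = "early-up" adds the "ifconfig wlan0 up" step that preceded the panic.
    run ifconfig wlan create wlandev iwlwifi0
    if [ "${1:-}" = "early-up" ]; then
        run ifconfig wlan0 up
    fi
    # Channel 153 is specific to the reporter's network.
    run ifconfig wlan0 channel 153
    run wpa_supplicant -B -i wlan0 -c /etc/wpa_supplicant.conf
    run dhclient wlan0
}
```

Calling bring_up with no argument prints the order reported to work; bring_up early-up prints the order reported to panic.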
Comment 8 Oleg 2023-07-07 22:25:32 UTC
Under the GENERIC-NODEBUG kernel, the OS doesn't crash, but I get these messages:

iwlwifi0: Microcode SW error detected. Restarting 0x0.
iwlwifi0: Start IWL Error Log Dump:
iwlwifi0: Transport status: 0x0000004B, valid: 6
iwlwifi0: Loaded firmware version: 73.35c0a2c6.0 QuZ-a0-hr-b0-73.ucode
iwlwifi0: 0x00000071 | NMI_INTERRUPT_UMAC_FATAL    
iwlwifi0: 0x00A0A200 | trm_hw_status0
iwlwifi0: 0x00000000 | trm_hw_status1
iwlwifi0: 0x004CC0FE | branchlink2
iwlwifi0: 0x004C2512 | interruptlink1
iwlwifi0: 0x004C2512 | interruptlink2
iwlwifi0: 0x00014D96 | data1
iwlwifi0: 0x00001000 | data2
iwlwifi0: 0x00000000 | data3
iwlwifi0: 0x00000000 | beacon time
iwlwifi0: 0x0002B287 | tsf low
iwlwifi0: 0x00000000 | tsf hi
iwlwifi0: 0x00000000 | time gp1
iwlwifi0: 0x00030DF6 | time gp2
iwlwifi0: 0x00000001 | uCode revision type
iwlwifi0: 0x00000049 | uCode version major
iwlwifi0: 0x35C0A2C6 | uCode version minor
iwlwifi0: 0x00000351 | hw version
iwlwifi0: 0x18C89001 | board version
iwlwifi0: 0x8065FC41 | hcmd
iwlwifi0: 0x24020000 | isr0
iwlwifi0: 0x61000000 | isr1
iwlwifi0: 0x08F00002 | isr2
iwlwifi0: 0x00C3000C | isr3
iwlwifi0: 0x00000000 | isr4
iwlwifi0: 0x00000000 | last cmd Id
iwlwifi0: 0x00014D96 | wait_event
iwlwifi0: 0x00000050 | l2p_control
iwlwifi0: 0x00018014 | l2p_duration
iwlwifi0: 0x0000003F | l2p_mhvalid
iwlwifi0: 0x00000000 | l2p_addr_match
iwlwifi0: 0x00000009 | lmpm_pmg_sel
iwlwifi0: 0x00000000 | timestamp
iwlwifi0: 0x00001054 | flow_handler
iwlwifi0: Start IWL Error Log Dump:
iwlwifi0: Transport status: 0x0000004B, valid: 7
iwlwifi0: 0x20103020 | ADVANCED_SYSASSERT
iwlwifi0: 0x00000000 | umac branchlink1
iwlwifi0: 0x80455E18 | umac branchlink2
iwlwifi0: 0x01077D90 | umac interruptlink1
iwlwifi0: 0x00000000 | umac interruptlink2
iwlwifi0: 0x00000000 | umac data1
iwlwifi0: 0x00000000 | umac data2
iwlwifi0: 0x000000FF | umac data3
iwlwifi0: 0x00000049 | umac major
iwlwifi0: 0x35C0A2C6 | umac minor
iwlwifi0: 0x00030DF1 | frame pointer
iwlwifi0: 0xC0885EE4 | stack pointer
iwlwifi0: 0x0016012B | last host cmd
iwlwifi0: 0x00000000 | isr status reg
iwlwifi0: IML/ROM dump:
iwlwifi0: 0x00000003 | IML/ROM error/state
iwlwifi0: 0x0000569B | IML/ROM data1
iwlwifi0: 0x00000080 | IML/ROM WFPM_AUTH_KEY_0
iwlwifi0: Fseq Registers:
iwlwifi0: 0x60000000 | FSEQ_ERROR_CODE
iwlwifi0: 0x80290033 | FSEQ_TOP_INIT_VERSION
iwlwifi0: 0x00090006 | FSEQ_CNVIO_INIT_VERSION
iwlwifi0: 0x0000A482 | FSEQ_OTP_VERSION
iwlwifi0: 0x00000003 | FSEQ_TOP_CONTENT_VERSION
iwlwifi0: 0x4552414E | FSEQ_ALIVE_TOKEN
iwlwifi0: 0x20000302 | FSEQ_CNVI_ID
iwlwifi0: 0x01300504 | FSEQ_CNVR_ID
iwlwifi0: 0x20000302 | CNVI_AUX_MISC_CHIP
iwlwifi0: 0x01300504 | CNVR_AUX_MISC_CHIP
iwlwifi0: 0x05B0905B | CNVR_SCU_SD_REGS_SD_REG_DIG_DCDC_VTRIM
iwlwifi0: 0x0000025B | CNVR_SCU_SD_REGS_SD_REG_ACTIVE_VDIG_MIRROR
iwlwifi0: WRT: Collecting data: ini trigger 4 fired (delay=0ms).
iwlwifi0: FW error in SYNC CMD BINDING_CONTEXT_CMD
#0 0xffffffff80db16bb at linux_dump_stack+0x1b
#1 0xffffffff833b5a43 at iwl_trans_txq_send_hcmd+0x3f3
#2 0xffffffff8335ce1e at iwl_trans_send_cmd+0xce
#3 0xffffffff8339ca9b at iwl_mvm_send_cmd_status+0x2b
#4 0xffffffff8339cb9f at iwl_mvm_send_cmd_pdu_status+0x4f
#5 0xffffffff83365aae at iwl_mvm_binding_update+0x1fe
#6 0xffffffff83376edc at __iwl_mvm_assign_vif_chanctx+0x7c
#7 0xffffffff83373ae5 at iwl_mvm_assign_vif_chanctx+0x65
#8 0xffffffff80daba97 at lkpi_80211_mo_assign_vif_chanctx+0x27
#9 0xffffffff80da435c at lkpi_sta_scan_to_auth+0x4bc
#10 0xffffffff80dab2ca at lkpi_iv_newstate+0x39a
#11 0xffffffff80cdcf8e at ieee80211_newstate_cb+0xee
#12 0xffffffff80bad472 at taskqueue_run_locked+0x182
#13 0xffffffff80bae702 at taskqueue_thread_loop+0xc2
#14 0xffffffff80b0311f at fork_exit+0x7f
#15 0xffffffff80fefbce at fork_trampoline+0xe
iwlwifi0: Failed to send binding (action:1): -5
iwlwifi0: PHY ctxt cmd error. ret=-5
iwlwifi0: lkpi_iv_newstate: error -5 during state transition 1 (SCAN) -> 2 (AUTH)
iwlwifi0: No queue was found. Dropping TX
iwlwifi0: Failed to trigger RX queues sync (-5)
WARNING !mvmvif->phy_ctxt failed at /usr/src/sys/contrib/dev/iwlwifi/mvm/mac80211.c:3158
iwlwifi0: Scan failed! ret -5
iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5
iwlwifi0: Scan failed! ret -5
iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5
iwlwifi0: Scan failed! ret -5
iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5
iwlwifi0: Scan failed! ret -5
iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5
iwlwifi0: Scan failed! ret -5
iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5
iwlwifi0: Scan failed! ret -5
iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5
iwlwifi0: Scan failed! ret -5
iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5
iwlwifi0: Scan failed! ret -5
iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5
iwlwifi0: Scan failed! ret -5
iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5
iwlwifi0: Scan failed! ret -5
iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5
iwlwifi0: Scan failed! ret -5
iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5
iwlwifi0: Scan failed! ret -5
iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5
iwlwifi0: Scan failed! ret -5
iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5
iwlwifi0: Scan failed! ret -5
iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5
iwlwifi0: Scan failed! ret -5
iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5
iwlwifi0: Scan failed! ret -5
iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5

Once again, this only happens if I type "ifconfig wlan0 up" before executing the wpa_supplicant command. If I don't type it, I won't encounter issues and will be able to access the internet with the wi-fi card. So, the FreeBSD kernel considers the "ifconfig wlan0 up" command evil.
Comment 9 Nils Beyer 2023-07-11 12:40:53 UTC
(In reply to Oleg from comment #8)

> Once again, this only happens if I type "ifconfig wlan0 up" before executing the wpa_supplicant command.

I'm experiencing exactly the same problem and behavior you describe here. My setup is a fail-over LAGG containing "wlan0" as the secondary port. I was bringing it up before attaching it to "lagg0" and spawning "wpa_supplicant". I've removed that "ifconfig wlan0 up" from my script and "wlan0" now works as expected.
Comment 10 Bjoern A. Zeeb freebsd_committer freebsd_triage 2023-09-30 08:49:07 UTC
If you can try main, please update to or past the revision mentioned in:
https://lists.freebsd.org/archives/freebsd-wireless/2023-September/001441.html
Comment 11 Oleg 2023-09-30 13:09:16 UTC
I no longer encounter this bug after compiling the latest 15-CURRENT kernel.
Comment 12 Oleg 2023-10-05 18:47:40 UTC
However, with the 15-CURRENT kernel that was compiled today, new bugs appear to have been introduced, such as:

panic: lkpi_sta_auth_to_assoc: lsta 0xfffff80439fc1000 state not NONE: 0
<6>wlan0: ieee80211_new_state_locked: pending SCAN -> AUTH transition lost
<4>Invalid TXQ id
Comment 13 Bjoern A. Zeeb freebsd_committer freebsd_triage 2023-10-06 11:55:56 UTC
(In reply to Oleg from comment #12)

What was logged before that? The actual problem is much earlier in the message buffer.
Comment 14 Oleg 2023-10-06 15:15:04 UTC
__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
57		__asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) #0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
#1  doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:405
#2  0xffffffff837ed323 in vt_kms_postswitch () from /boot/modules/drm.ko
#3  0xffffffff80992fde in vt_window_switch (vw=0xfffff800213893c0)
    at /usr/src/sys/dev/vt/vt_core.c:595
#4  0xffffffff80b4ed33 in kern_reboot (howto=4)
    at /usr/src/sys/kern/kern_shutdown.c:501
#5  0xffffffff80b4f50f in vpanic (fmt=0xffffffff8118fa9b "%s", 
    ap=ap@entry=0xfffffe023c222450) at /usr/src/sys/kern/kern_shutdown.c:970
#6  0xffffffff80b4f2b3 in panic (fmt=<unavailable>)
    at /usr/src/sys/kern/kern_shutdown.c:894
#7  0xffffffff8104ecbc in trap_fatal (frame=0xfffffe023c222550, eva=259)
    at /usr/src/sys/amd64/amd64/trap.c:952
#8  0xffffffff8104ed6e in trap_pfault (frame=0xfffffe023c222550, 
    usermode=false, signo=<optimized out>, ucode=<optimized out>)
    at /usr/src/sys/amd64/amd64/trap.c:760
#9  <signal handler called>
#10 0xffffffff836c65d0 in intel_atomic_get_global_obj_state ()
   from /boot/modules/i915kms.ko
#11 0xffffffff8367c35c in skl_compute_wm () from /boot/modules/i915kms.ko
#12 0xffffffff8364464f in intel_atomic_check () from /boot/modules/i915kms.ko
#13 0xffffffff837ad783 in drm_atomic_check_only () from /boot/modules/drm.ko
#14 0xffffffff837adbc3 in drm_atomic_commit () from /boot/modules/drm.ko
#15 0xffffffff837bd298 in drm_client_modeset_commit_atomic ()
   from /boot/modules/drm.ko
#16 0xffffffff837bd384 in drm_client_modeset_commit_locked ()
   from /boot/modules/drm.ko
#17 0xffffffff837bd511 in drm_client_modeset_commit ()
   from /boot/modules/drm.ko
#18 0xffffffff837fff13 in drm_fb_helper_restore_fbdev_mode_unlocked ()
   from /boot/modules/drm.ko
#19 0xffffffff837ed461 in vt_kms_postswitch () from /boot/modules/drm.ko
#20 0xffffffff80992ea1 in vt_window_switch (vw=0xfffffe01eab6c2b0, 
    vw@entry=0xffffffff816a9c98 <vt_conswindow>)
    at /usr/src/sys/dev/vt/vt_core.c:612
#21 0xffffffff809941ff in vtterm_cngrab (tm=<optimized out>)
    at /usr/src/sys/dev/vt/vt_core.c:1863
#22 0xffffffff80adf1c6 in cngrab () at /usr/src/sys/kern/kern_cons.c:385
#23 0xffffffff80b4f441 in vpanic (
    fmt=0xffffffff8125d520 "%s: lsta %p state not NONE: %#x\n", 
    ap=ap@entry=0xfffffe023c222d00) at /usr/src/sys/kern/kern_shutdown.c:942
#24 0xffffffff80b4f2b3 in panic (
    fmt=0x103 <error: Cannot access memory at address 0x103>)
    at /usr/src/sys/kern/kern_shutdown.c:894
#25 0xffffffff80dd2334 in lkpi_sta_auth_to_assoc (vap=0xfffffe023ba37010, 
    nstate=<optimized out>, arg=<optimized out>)
    at /usr/src/sys/compat/linuxkpi/common/src/linux_80211.c:1247
#26 0xffffffff80dd8dc3 in lkpi_iv_newstate (vap=0xfffffe023ba37010, 
    nstate=IEEE80211_S_ASSOC, arg=0)
    at /usr/src/sys/compat/linuxkpi/common/src/linux_80211.c:2064
#27 0xffffffff80cfe837 in ieee80211_newstate_cb (xvap=0xfffffe023ba37010, 
    npending=<optimized out>) at /usr/src/sys/net80211/ieee80211_proto.c:2546
#28 0xffffffff80bb4afb in taskqueue_run_locked (
    queue=queue@entry=0xfffff800034a3000)
    at /usr/src/sys/kern/subr_taskqueue.c:512
#29 0xffffffff80bb5bb3 in taskqueue_thread_loop (
    arg=arg@entry=0xfffffe023b9e6110)
    at /usr/src/sys/kern/subr_taskqueue.c:824
#30 0xffffffff80b05082 in fork_exit (
    callout=0xffffffff80bb5ae0 <taskqueue_thread_loop>, 
    arg=0xfffffe023b9e6110, frame=0xfffffe023c222f40)
    at /usr/src/sys/kern/kern_fork.c:1160
#31 <signal handler called>
(kgdb)
Comment 15 Bjoern A. Zeeb freebsd_committer freebsd_triage 2023-10-06 16:10:47 UTC
(In reply to Oleg from comment #14)

The actual problem is before the KASSERT which just catches it.  Can you check the message buffer (dmesg) of the core file?  There's likely a firmware crash a few lines up with surrounding information.
Comment 16 Oleg 2023-10-06 16:34:26 UTC
Are you looking for this information:

panic: lkpi_sta_auth_to_assoc: lsta 0xfffff80439fc1000 state not NONE: 0

GNU gdb (GDB) 13.2 [GDB v13.2 for FreeBSD]
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd15.0".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /boot/kernel/kernel...
Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug...

Unread portion of the kernel message buffer:
<6>wlan0: ieee80211_new_state_locked: pending SCAN -> AUTH transition lost
<4>Invalid TXQ id
WARNING !drm_modeset_is_locked(&crtc->mutex) failed at /wrkdirs/usr/ports/graphics/drm-515-kmod/work/drm-kmod-drm_v5.15.25_5/drivers/gpu/drm/drm_atomic_helper.c:621
WARNING !drm_modeset_is_locked(&crtc->mutex) failed at /wrkdirs/usr/ports/graphics/drm-515-kmod/work/drm-kmod-drm_v5.15.25_5/drivers/gpu/drm/drm_atomic_helper.c:621
WARNING !drm_modeset_is_locked(&crtc->mutex) failed at /wrkdirs/usr/ports/graphics/drm-515-kmod/work/drm-kmod-drm_v5.15.25_5/drivers/gpu/drm/drm_atomic_helper.c:621
WARNING !drm_modeset_is_locked(&dev->mode_config.connection_mutex) failed at /wrkdirs/usr/ports/graphics/drm-515-kmod/work/drm-kmod-drm_v5.15.25_5/drivers/gpu/drm/drm_atomic_helper.c:671
kernel trap 12 with interrupts disabled


Fatal trap 12: page fault while in kernel mode
cpuid = 4; apic id = 04
fault virtual address	= 0x103
fault code		= supervisor read data, page not present
instruction pointer	= 0x20:0xffffffff836c65d0
stack pointer	        = 0x28:0xfffffe023c222610
frame pointer	        = 0x28:0xfffffe023c222650
code segment		= base rx0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= resume, IOPL = 0
current process		= 0 (iwlwifi0 net80211 t)
rdi: 0000000000000103 rsi: fffff800022b1828 rdx: fffff800022b1810
rcx: fffffe01eab693f8  r8: 00000000000000cb  r9: 0000000000000000
rax: fffffe023c2229e8 rbx: fffffe01eab6c2b0 rbp: fffffe023c222650
r10: fffffe023bb1ca72 r11: fffff8043acbc800 r12: fffff80480717a80
r13: fffffe01eab69000 r14: fffff8043acbc800 r15: 0000000000000000
trap number		= 12
panic: page fault
cpuid = 4
time = 1696530515
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe023c2222e0
vpanic() at vpanic+0x132/frame 0xfffffe023c222410
panic() at panic+0x43/frame 0xfffffe023c222470
trap_fatal() at trap_fatal+0x40c/frame 0xfffffe023c2224d0
trap_pfault() at trap_pfault+0xae/frame 0xfffffe023c222540
calltrap() at calltrap+0x8/frame 0xfffffe023c222540
--- trap 0xc, rip = 0xffffffff836c65d0, rsp = 0xfffffe023c222610, rbp = 0xfffffe023c222650 ---
intel_atomic_get_global_obj_state() at intel_atomic_get_global_obj_state+0x90/frame 0xfffffe023c222650
skl_compute_wm() at skl_compute_wm+0xaec/frame 0xfffffe023c222870
intel_atomic_check() at intel_atomic_check+0xeff/frame 0xfffffe023c222940
drm_atomic_check_only() at drm_atomic_check_only+0x4a3/frame 0xfffffe023c2229b0
drm_atomic_commit() at drm_atomic_commit+0x13/frame 0xfffffe023c2229d0
drm_client_modeset_commit_atomic() at drm_client_modeset_commit_atomic+0x158/frame 0xfffffe023c222a40
drm_client_modeset_commit_locked() at drm_client_modeset_commit_locked+0x74/frame 0xfffffe023c222a90
drm_client_modeset_commit() at drm_client_modeset_commit+0x21/frame 0xfffffe023c222ab0
drm_fb_helper_restore_fbdev_mode_unlocked() at drm_fb_helper_restore_fbdev_mode_unlocked+0x83/frame 0xfffffe023c222ae0
vt_kms_postswitch() at vt_kms_postswitch+0x181/frame 0xfffffe023c222b10
vt_window_switch() at vt_window_switch+0x121/frame 0xfffffe023c222b50
vtterm_cngrab() at vtterm_cngrab+0x4f/frame 0xfffffe023c222b70
cngrab() at cngrab+0x26/frame 0xfffffe023c222b90
vpanic() at vpanic+0xd1/frame 0xfffffe023c222cc0
panic() at panic+0x43/frame 0xfffffe023c222d20
lkpi_sta_auth_to_assoc() at lkpi_sta_auth_to_assoc+0x234/frame 0xfffffe023c222d80
lkpi_iv_newstate() at lkpi_iv_newstate+0x253/frame 0xfffffe023c222df0
ieee80211_newstate_cb() at ieee80211_newstate_cb+0x1e7/frame 0xfffffe023c222e40
taskqueue_run_locked() at taskqueue_run_locked+0xab/frame 0xfffffe023c222ec0
taskqueue_thread_loop() at taskqueue_thread_loop+0xd3/frame 0xfffffe023c222ef0
fork_exit() at fork_exit+0x82/frame 0xfffffe023c222f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe023c222f30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
Dumping 2378 out of 65308 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

?
Comment 17 Bjoern A. Zeeb freebsd_committer freebsd_triage 2023-10-08 12:40:17 UTC
(In reply to Oleg from comment #16)

> Are you looking for this information:

If you have a core.txt.<0> file, there should be a section further down with the title below (without the leading "::"), which should contain even more message buffer information.

:: ------------------------------------------------------------------------
:: dmesg


Can I ask, given the topic of the PR:  is this when running an installed system or during bsdinstall?

Also, which chipset and firmware version are you on now (and which FreeBSD hash)?
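
A minimal sketch for pulling that section out of a crash summary file, assuming each section starts with a line of dashes followed by a title line (such as "dmesg", as quoted above) and runs until the next line of dashes. The helper name extract_section is hypothetical.

```shell
#!/bin/sh
# Print the body of one section of a crash summary file.
# Assumption: a section is introduced by a line of dashes followed by a
# title line, and ends at the next line of dashes.
extract_section() {
    # $1 = section title (for example "dmesg"), $2 = path to the file
    awk -v title="$1" '
        /^----------+$/ {            # separator line
            if (printing) exit       # next section reached, stop
            after_sep = 1
            next
        }
        after_sep && $0 == title {   # title line right after a separator
            printing = 1
            after_sep = 0
            next
        }
        { after_sep = 0 }
        printing { print }
    ' "$2"
}
```

For example, extract_section dmesg /var/crash/core.txt.0 would print the saved message buffer, where the path is only an example and depends on where your crash summaries are written.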
Comment 18 Oleg 2023-10-08 13:40:39 UTC
It happened when running an installed system after I typed "kldunload if_iwlwifi" on FreeBSD 15.0-CURRENT #0 main-n265776-b6a61ac2d475. Typing "kldunload if_iwlwifi" sometimes leads to a crash, and sometimes it doesn't. Is this what you are asking about:

Autoloading module: if_iwlwifi
Intel(R) Wireless WiFi based driver for FreeBSD
Autoloading module: ig4
pci0: driver added
found->	vendor=0x8086, dev=0x43ef, revid=0x11
	domain=0, bus=0, slot=20, func=2
	class=05-00-00, hdrtype=0x00, mfdev=0
	cmdreg=0x0002, statreg=0x0010, cachelnsz=16 (dwords)
	lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns)
	powerspec 3  supports D0 D3  current D0
pci0:0:20:2: reprobing on driver added
found->	vendor=0x8086, dev=0x43f0, revid=0x11
	domain=0, bus=0, slot=20, func=3
	class=02-80-00, hdrtype=0x00, mfdev=1
	cmdreg=0x0002, statreg=0x0010, cachelnsz=16 (dwords)
	lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns)
	intpin=a, irq=255
	powerspec 3  supports D0 D3  current D0
	MSI supports 1 message, 64 bit
	MSI-X supports 16 messages in map 0x10
pci0:0:20:3: reprobing on driver added
iwlwifi0: <iwlwifi> mem 0x6001114000-0x6001117fff at device 20.3 on pci0
iwlwifi0: attempting to allocate 16 MSI-X vectors (16 supported)
msi: routing MSI-X IRQ 145 to local APIC 0 vector 59
msi: routing MSI-X IRQ 146 to local APIC 2 vector 49
msi: routing MSI-X IRQ 147 to local APIC 4 vector 49
msi: routing MSI-X IRQ 148 to local APIC 6 vector 49
msi: routing MSI-X IRQ 149 to local APIC 8 vector 49
msi: routing MSI-X IRQ 150 to local APIC 10 vector 49
msi: routing MSI-X IRQ 151 to local APIC 12 vector 48
msi: routing MSI-X IRQ 152 to local APIC 14 vector 48
msi: routing MSI-X IRQ 153 to local APIC 16 vector 48
msi: routing MSI-X IRQ 154 to local APIC 18 vector 49
msi: routing MSI-X IRQ 155 to local APIC 0 vector 60
msi: routing MSI-X IRQ 156 to local APIC 2 vector 50
msi: routing MSI-X IRQ 157 to local APIC 4 vector 50
msi: routing MSI-X IRQ 158 to local APIC 6 vector 50
msi: routing MSI-X IRQ 159 to local APIC 8 vector 50
msi: routing MSI-X IRQ 160 to local APIC 10 vector 50
iwlwifi0: using IRQs 145-160 for MSI-X
msi: Assigning MSI-X IRQ 146 to local APIC 0 vector 61
msi: Assigning MSI-X IRQ 147 to local APIC 1 vector 49
msi: Assigning MSI-X IRQ 148 to local APIC 2 vector 49
msi: Assigning MSI-X IRQ 149 to local APIC 3 vector 49
msi: Assigning MSI-X IRQ 150 to local APIC 4 vector 49
msi: Assigning MSI-X IRQ 151 to local APIC 5 vector 49
msi: Assigning MSI-X IRQ 152 to local APIC 6 vector 49
msi: Assigning MSI-X IRQ 153 to local APIC 7 vector 49
msi: Assigning MSI-X IRQ 154 to local APIC 8 vector 49
msi: Assigning MSI-X IRQ 155 to local APIC 9 vector 49
msi: Assigning MSI-X IRQ 156 to local APIC 10 vector 49
msi: Assigning MSI-X IRQ 157 to local APIC 11 vector 49
msi: Assigning MSI-X IRQ 158 to local APIC 12 vector 48
msi: Assigning MSI-X IRQ 159 to local APIC 13 vector 49
iwlwifi0: Detected crf-id 0x3617, cnv-id 0x20000302 wfpm id 0x80000000
iwlwifi0: PCI dev 43f0/0074, rev=0x351, rfid=0x10a100
firmware: 'iwlwifi-QuZ-a0-hr-b0-77.ucode' version 77: 1404840 bytes loaded at 0xffffffff83aef000
iwlwifi0: successfully loaded firmware image 'iwlwifi-QuZ-a0-hr-b0-77.ucode'
iwlwifi0: api flags index 2 larger than supported by driver
iwlwifi0: TLV_FW_FSEQ_VERSION: FSEQ Version: 89.3.35.37
iwl-debug-yoyo.bin: could not load firmware image, error 2
iwl-debug-yoyo.bin: could not load firmware image, error 2
iwl-debug-yoyo_bin: could not load firmware image, error 2
iwl_debug_yoyo_bin: could not load firmware image, error 2
iwlwifi0: loaded firmware version 77.2df8986f.0 QuZ-a0-hr-b0-77.ucode op_mode iwlmvm
iwlwifi0: Detected Intel(R) Wi-Fi 6 AX201 160MHz, REV=0x351
iwlwifi0: Detected RF HR B5, rfid=0x10a100
iwlwifi0: base HW address: 10:3d:1c:9c:8d:1c
iwlwifi0: 11a rates: 6Mbps 9Mbps 12Mbps 18Mbps 24Mbps 36Mbps 48Mbps 54Mbps
iwlwifi0: 11b rates: 1Mbps 2Mbps 5.5Mbps 11Mbps
iwlwifi0: 11g rates: 1Mbps 2Mbps 5.5Mbps 11Mbps 6Mbps 9Mbps 12Mbps 18Mbps 24Mbps 36Mbps 48Mbps 54Mbps
found->	vendor=0x8086, dev=0x43e8, revid=0x11
	domain=0, bus=0, slot=21, func=0
	class=0c-80-00, hdrtype=0x00, mfdev=1
	cmdreg=0x0004, statreg=0x0010, cachelnsz=16 (dwords)
	lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns)
	intpin=a, irq=255
	powerspec 3  supports D0 D3  current D0
?
Comment 19 Oleg 2023-10-08 14:04:50 UTC
And now I experienced this panic after compiling the latest 15-CURRENT kernel:
panic: lkpi_sta_auth_to_scan: lsta 0xfffff800022bb000 state not NONE: 0, nstate 1 arg 1

Before the latest Wi-Fi-related commits to the kernel, this bug was always triggered if I typed "ifconfig wlan0 up" before executing the wpa_supplicant command. Now it is triggered only sometimes:

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
57		__asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) #0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
#1  doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:405
#2  0xffffffff837ed323 in vt_kms_postswitch () from /boot/modules/drm.ko
#3  0xffffffff8099300e in vt_window_switch (vw=0xfffff800272252c0)
    at /usr/src/sys/dev/vt/vt_core.c:595
#4  0xffffffff80b4ed63 in kern_reboot (howto=4)
    at /usr/src/sys/kern/kern_shutdown.c:501
#5  0xffffffff80b4f53f in vpanic (fmt=0xffffffff8118fa9b "%s", 
    ap=ap@entry=0xfffffe0201c09430) at /usr/src/sys/kern/kern_shutdown.c:970
#6  0xffffffff80b4f2e3 in panic (fmt=<unavailable>)
    at /usr/src/sys/kern/kern_shutdown.c:894
#7  0xffffffff8104ecbc in trap_fatal (frame=0xfffffe0201c09530, 
    eva=18446744069414584320) at /usr/src/sys/amd64/amd64/trap.c:952
#8  0xffffffff8104ed6e in trap_pfault (frame=0xfffffe0201c09530, 
    usermode=false, signo=<optimized out>, ucode=<optimized out>)
    at /usr/src/sys/amd64/amd64/trap.c:760
#9  <signal handler called>
#10 0xffffffff836c65d0 in intel_atomic_get_global_obj_state ()
   from /boot/modules/i915kms.ko
#11 0xffffffff8367c35c in skl_compute_wm () from /boot/modules/i915kms.ko
#12 0xffffffff8364464f in intel_atomic_check () from /boot/modules/i915kms.ko
#13 0xffffffff837ad783 in drm_atomic_check_only () from /boot/modules/drm.ko
#14 0xffffffff837adbc3 in drm_atomic_commit () from /boot/modules/drm.ko
#15 0xffffffff837bd298 in drm_client_modeset_commit_atomic ()
   from /boot/modules/drm.ko
#16 0xffffffff837bd384 in drm_client_modeset_commit_locked ()
   from /boot/modules/drm.ko
#17 0xffffffff837bd511 in drm_client_modeset_commit ()
   from /boot/modules/drm.ko
#18 0xffffffff837fff13 in drm_fb_helper_restore_fbdev_mode_unlocked ()
   from /boot/modules/drm.ko
#19 0xffffffff837ed461 in vt_kms_postswitch () from /boot/modules/drm.ko
#20 0xffffffff80992ed1 in vt_window_switch (vw=0xfffffe01eaccc2b0, 
    vw@entry=0xffffffff816a9c98 <vt_conswindow>)
    at /usr/src/sys/dev/vt/vt_core.c:612
#21 0xffffffff8099422f in vtterm_cngrab (tm=<optimized out>)
    at /usr/src/sys/dev/vt/vt_core.c:1863
#22 0xffffffff80adf1f6 in cngrab () at /usr/src/sys/kern/kern_cons.c:385
#23 0xffffffff80b4f471 in vpanic (
    fmt=0xffffffff811e2547 "%s: lsta %p state not NONE: %#x, nstate %d arg %d\n", ap=ap@entry=0xfffffe0201c09ce0) at /usr/src/sys/kern/kern_shutdown.c:942
#24 0xffffffff80b4f2e3 in panic (
    fmt=0xffffffff00000000 <error: Cannot access memory at address 0xffffffff00000000>) at /usr/src/sys/kern/kern_shutdown.c:894
#25 0xffffffff80dd1d37 in lkpi_sta_auth_to_scan (vap=0xfffffe0201449010, 
    nstate=IEEE80211_S_SCAN, arg=1)
    at /usr/src/sys/compat/linuxkpi/common/src/linux_80211.c:1167
#26 0xffffffff80dd9223 in lkpi_iv_newstate (vap=0xfffffe0201449010, 
    nstate=IEEE80211_S_SCAN, arg=1)
    at /usr/src/sys/compat/linuxkpi/common/src/linux_80211.c:2064
#27 0xffffffff80cfeb37 in ieee80211_newstate_cb (xvap=0xfffffe0201449010, 
    npending=<optimized out>) at /usr/src/sys/net80211/ieee80211_proto.c:2546
#28 0xffffffff80bb4b2b in taskqueue_run_locked (
    queue=queue@entry=0xfffff804663bd500)
    at /usr/src/sys/kern/subr_taskqueue.c:512
#29 0xffffffff80bb5be3 in taskqueue_thread_loop (
    arg=arg@entry=0xfffffe01fdb75110)
    at /usr/src/sys/kern/subr_taskqueue.c:824
#30 0xffffffff80b050b2 in fork_exit (
    callout=0xffffffff80bb5b10 <taskqueue_thread_loop>, 
    arg=0xfffffe01fdb75110, frame=0xfffffe0201c09f40)
    at /usr/src/sys/kern/kern_fork.c:1160
#31 <signal handler called>
(kgdb)
Comment 20 rkoberman 2023-10-09 00:55:04 UTC
I just tried another "service netif restart wlan0" and my system froze. No panic. Nothing logged. No core dump. Display froze after "Starting wpa_supplicant." Nothing abnormal until that point.

After a reboot I tried again without X running. This time I got a panic:
ptavv dumped core - see /var/crash/vmcore.5

Sun Oct  8 17:37:18 PDT 2023

FreeBSD ptavv 15.0-CURRENT FreeBSD 15.0-CURRENT #7 main-n265807-04c8bfc17610: Sat Oct  7 23:34:33 PDT 2023     root@ptavv:/usr/obj/usr/src/amd64.amd64/sys/GENERIC  amd64

panic: lkpi_sta_auth_to_scan: lsta 0xfffff8000bb92000 state not NONE: 0, nstate 1 arg 1

I can attach the full file if it looks useful.
One oddity is that I see several drm items in the stack dump:
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe015adc8330
vpanic() at vpanic+0x132/frame 0xfffffe015adc8460
panic() at panic+0x43/frame 0xfffffe015adc84c0
trap_fatal() at trap_fatal+0x40c/frame 0xfffffe015adc8520
calltrap() at calltrap+0x8/frame 0xfffffe015adc8520
--- trap 0x9, rip = 0xffffffff831d25d0, rsp = 0xfffffe015adc85f0, rbp = 0xfffffe015adc8630 ---
intel_atomic_get_global_obj_state() at intel_atomic_get_global_obj_state+0x90/frame 0xfffffe015adc8630
skl_compute_wm() at skl_compute_wm+0xaec/frame 0xfffffe015adc8850
intel_atomic_check() at intel_atomic_check+0xeff/frame 0xfffffe015adc8920
drm_atomic_check_only() at drm_atomic_check_only+0x4a3/frame 0xfffffe015adc8990
drm_atomic_commit() at drm_atomic_commit+0x13/frame 0xfffffe015adc89b0
drm_client_modeset_commit_atomic() at drm_client_modeset_commit_atomic+0x158/frame 0xfffffe015adc8a20
drm_client_modeset_commit_locked() at drm_client_modeset_commit_locked+0x74/frame 0xfffffe015adc8a70
drm_client_modeset_commit() at drm_client_modeset_commit+0x21/frame 0xfffffe015adc8a90
drm_fb_helper_restore_fbdev_mode_unlocked() at drm_fb_helper_restore_fbdev_mode_unlocked+0x83/frame 0xfffffe015adc8ac0
vt_kms_postswitch() at vt_kms_postswitch+0x181/frame 0xfffffe015adc8af0
vt_window_switch() at vt_window_switch+0x25e/frame 0xfffffe015adc8b30
vtterm_cngrab() at vtterm_cngrab+0x4f/frame 0xfffffe015adc8b50
cngrab() at cngrab+0x26/frame 0xfffffe015adc8b70
vpanic() at vpanic+0xd1/frame 0xfffffe015adc8ca0
panic() at panic+0x43/frame 0xfffffe015adc8d00

This crash occurred after I had terminated the X session and was in text mode on vty0. 

Let me know what else I might be able to provide.
Comment 21 rkoberman 2023-10-09 01:23:53 UTC
(In reply to rkoberman from comment #20)
Forgive me. I posted this to the wrong ticket. I'll reenter it (corrected) to the proper place.
Comment 22 Bjoern A. Zeeb freebsd_committer freebsd_triage 2023-10-09 20:04:27 UTC
(In reply to Oleg from comment #19)

Oleg, if you update to latest main, you may hopefully see some more information or error printed before the KASSERT triggers.  It would be helpful to know that.
Comment 23 Oleg 2023-10-09 21:10:00 UTC
After updating to the latest kernel, I haven't been able to trigger the bug, even after many attempts (neither typing "kldunload if_iwlwifi" nor running "ifconfig wlan0 up" early triggered it). I don't know why. In my previous message, I said that the bug was triggered only sometimes, but today I haven't been able to trigger it at all.
Comment 24 Jean-Sébastien Pédron freebsd_committer freebsd_triage 2023-10-12 22:34:28 UTC
Hi!

I also get this panic when running a kernel based on commit 7cff9672de44824d7d59cb562f53992a055e49cc. To be exact, I have a few more commits on top of it for upcoming updates to drm-kmod.

It's easy to reproduce: I simply use "service netif restart wlan0" (it was skipped during boot).

Here are the few lines before the panic and the backtrace:

<6>wlan0: ieee80211_new_state_locked: pending SCAN -> AUTH transition lost
<4>Invalid TXQ id
iwl_mvm_tx_mpdu:1204: fc 0x00b0 tid 8 txq_id 65535 mvm 0xfffffe0147794408 skb 0xfffff8000b884000 { len 30 } info 0xfffffe00c83f5ce8 sta 0xfffff803c451d880 (if you see this please report to PR 274382)
panic: lkpi_sta_auth_to_scan: lsta 0xfffff8000f352000 state not NONE: 0, nstate 1 arg 1

cpuid = 6
time = 1697050125
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0147abdb70
vpanic() at vpanic+0x132/frame 0xfffffe0147abdca0
panic() at panic+0x43/frame 0xfffffe0147abdd00
lkpi_sta_auth_to_scan() at lkpi_sta_auth_to_scan+0x2c8/frame 0xfffffe0147abdd80
lkpi_iv_newstate() at lkpi_iv_newstate+0x253/frame 0xfffffe0147abddf0
ieee80211_newstate_cb() at ieee80211_newstate_cb+0x1e7/frame 0xfffffe0147abde40
taskqueue_run_locked() at taskqueue_run_locked+0xab/frame 0xfffffe0147abdec0
taskqueue_thread_loop() at taskqueue_thread_loop+0xd3/frame 0xfffffe0147abdef0
fork_exit() at fork_exit+0x82/frame 0xfffffe0147abdf30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0147abdf30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
Uptime: 14m18s
Dumping 892 out of 16038 MB:..2%..11%..22%..31%..42%..51%..61%..72%..81%..92%

__curthread () at /home/dumbbell/Documents/freebsd/src/sys/amd64/include/pcpu_aux.h:57
57		__asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) bt
#0  __curthread () at /home/dumbbell/Documents/freebsd/src/sys/amd64/include/pcpu_aux.h:57
#1  doadump (textdump=textdump@entry=1) at /home/dumbbell/Documents/freebsd/src/sys/kern/kern_shutdown.c:405
#2  0xffffffff80b4f3e0 in kern_reboot (howto=260) at /home/dumbbell/Documents/freebsd/src/sys/kern/kern_shutdown.c:526
#3  0xffffffff80b4f8df in vpanic (fmt=0xffffffff811e6539 "%s: lsta %p state not NONE: %#x, nstate %d arg %d\n", ap=ap@entry=0xfffffe0147abdce0)
    at /home/dumbbell/Documents/freebsd/src/sys/kern/kern_shutdown.c:969
#4  0xffffffff80b4f683 in panic (fmt=<unavailable>) at /home/dumbbell/Documents/freebsd/src/sys/kern/kern_shutdown.c:894
#5  0xffffffff80dd2568 in lkpi_sta_auth_to_scan (vap=0xfffffe014a16d010, nstate=IEEE80211_S_SCAN, arg=1)
    at /home/dumbbell/Documents/freebsd/src/sys/compat/linuxkpi/common/src/linux_80211.c:1175
#6  0xffffffff80dd9c93 in lkpi_iv_newstate (vap=0xfffffe014a16d010, nstate=IEEE80211_S_SCAN, arg=1)
    at /home/dumbbell/Documents/freebsd/src/sys/compat/linuxkpi/common/src/linux_80211.c:2113
#7  0xffffffff80cff027 in ieee80211_newstate_cb (xvap=0xfffffe014a16d010, npending=<optimized out>) at /home/dumbbell/Documents/freebsd/src/sys/net80211/ieee80211_proto.c:2546
#8  0xffffffff80bb4ecb in taskqueue_run_locked (queue=queue@entry=0xfffff8000b21a600) at /home/dumbbell/Documents/freebsd/src/sys/kern/subr_taskqueue.c:512
#9  0xffffffff80bb5f83 in taskqueue_thread_loop (arg=arg@entry=0xfffffe0147798110) at /home/dumbbell/Documents/freebsd/src/sys/kern/subr_taskqueue.c:824
#10 0xffffffff80b05452 in fork_exit (callout=0xffffffff80bb5eb0 <taskqueue_thread_loop>, arg=0xfffffe0147798110, frame=0xfffffe0147abdf40)
    at /home/dumbbell/Documents/freebsd/src/sys/kern/kern_fork.c:1160
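
For readers trying to decode the numbers in the panic string, here is a minimal C sketch. It assumes that the "state" field follows the Linux mac80211 station-state numbering (NOTEXIST = 0, NONE = 1, ...) and that "nstate" follows the net80211 VAP-state numbering (INIT = 0, SCAN = 1, ...); the latter matches nstate=IEEE80211_S_SCAN in the backtrace. The enum and function names below are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed to mirror net80211's VAP states (the "nstate" field). */
enum vap_state { S_INIT, S_SCAN, S_AUTH, S_ASSOC, S_CAC, S_RUN, S_CSA, S_SLEEP };

/* Assumed to mirror mac80211's station states (the "state" field). */
enum sta_state { STA_NOTEXIST, STA_NONE, STA_AUTH, STA_ASSOC, STA_AUTHORIZED };

const char *
vap_state_name(int s)
{
	static const char *names[] = {
		"INIT", "SCAN", "AUTH", "ASSOC", "CAC", "RUN", "CSA", "SLEEP"
	};

	return (s >= S_INIT && s <= S_SLEEP) ? names[s] : "?";
}

const char *
sta_state_name(int s)
{
	static const char *names[] = {
		"NOTEXIST", "NONE", "AUTH", "ASSOC", "AUTHORIZED"
	};

	return (s >= STA_NOTEXIST && s <= STA_AUTHORIZED) ? names[s] : "?";
}

/*
 * Pull the three numbers out of a message shaped like
 * "... state not NONE: 0, nstate 1 arg 1".  Returns 1 on success.
 * %i accepts both a plain "0" and the "0x1" form that %#x prints.
 */
int
parse_panic(const char *msg, int *sta, int *nstate, int *arg)
{
	const char *p;

	assert(msg != NULL);
	p = strstr(msg, "state not NONE: ");
	if (p == NULL)
		return 0;
	return sscanf(p, "state not NONE: %i, nstate %d arg %d",
	    sta, nstate, arg) == 3;
}
```

Under those assumptions, "state not NONE: 0, nstate 1" would read as: the station entry was in NOTEXIST while the VAP was asked to go to SCAN.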
Comment 25 Bjoern A. Zeeb freebsd_committer freebsd_triage 2023-10-25 21:20:35 UTC
Hi,

if you get either one or both of:

(a) panic: lkpi_sta_auth_to_scan: ... (or other state names)
(b) ieee80211_new_state_locked: pending SCAN -> AUTH transition lost (or other state names)

could you apply the following patch:

https://people.freebsd.org/~bz/wireless/20231025-01-80211-newstate.diff

The patch will (1) give more information and (2) disable an extra case. Please report back with the results.

Note: the "ieee80211_new_state_locked:2682: RUN -> INIT (INIT) transition discarded" log messages are generally not interesting, but I enabled them for the full picture.
Comment 26 rkoberman 2023-10-26 03:21:46 UTC
(In reply to Bjoern A. Zeeb from comment #25)
Patched and rebuilt the kernel. Crash looks a lot like the previous ones. After my last kernel update I am seeing some new messages during boot. I suspect that they are not new information, but they do look odd to me.
iwlwifi0: WRT: Invalid buffer destination
iwlwifi0: WFPM_UMAC_PD_NOTIFICATION: 0x20
iwlwifi0: WFPM_LMAC2_PD_NOTIFICATION: 0x1f
iwlwifi0: WFPM_AUTH_KEY_0: 0x90
iwlwifi0: CNVI_SCU_SEQ_DATA_DW9: 0x0
iwlwifi0: RFIm is deactivated, reason = 4

I also see:
wlan0: ieee80211_new_state_locked:2718: pending SCAN -> AUTH transition lost
wlan0: ieee80211_new_state_locked:2718: pending AUTH -> SCAN transition lost

Do you want the core file?
Comment 27 Jean-Sébastien Pédron freebsd_committer freebsd_triage 2023-11-01 11:39:05 UTC
(In reply to Bjoern A. Zeeb from comment #25)

Here are the steps I used to reproduce:

(the if_iwlwifi module was already loaded)
ifconfig wlan0 create wlandev iwlwifi0 country FR
env wlans_iwlwifi0="wlan0" create_args_wlan0="country FR" ifconfig_wlan0="WPA DHCP" ifconfig_wlan0_ipv6="inet6 accept_rtadv" service netif restart wlan0

And here is the output with your patch:

== The last lines of /var/log/messages ==

Nov  1 11:07:20 iss kernel: iwlwifi0: WRT: Invalid buffer destination
Nov  1 11:07:21 iss kernel: iwlwifi0: WFPM_UMAC_PD_NOTIFICATION: 0x20
Nov  1 11:07:21 iss kernel: iwlwifi0: WFPM_LMAC2_PD_NOTIFICATION: 0x1f
Nov  1 11:07:21 iss kernel: iwlwifi0: WFPM_AUTH_KEY_0: 0x90
Nov  1 11:07:21 iss kernel: iwlwifi0: CNVI_SCU_SEQ_DATA_DW9: 0x0
Nov  1 11:07:21 iss kernel: wlan0: Ethernet address: 04:cf:4b:1d:fe:fc
Nov  1 11:07:38 iss wpa_supplicant[1534]: Successfully initialized wpa_supplicant
Nov  1 11:07:38 iss wpa_supplicant[1534]: ioctl[SIOCS80211, op=20, val=0, arg_len=7]: Invalid argument
Nov  1 11:07:38 iss syslogd: last message repeated 1 times
Nov  1 11:07:38 iss wpa_supplicant[1535]: ioctl[SIOCS80211, op=103, val=0, arg_len=128]: Operation now in progress
Nov  1 11:07:38 iss wpa_supplicant[1535]: wlan0: CTRL-EVENT-SCAN-FAILED ret=-1 retry=1
Nov  1 11:07:39 iss wpa_supplicant[1535]: ioctl[SIOCS80211, op=103, val=0, arg_len=128]: Operation now in progress
Nov  1 11:07:39 iss wpa_supplicant[1535]: wlan0: CTRL-EVENT-SCAN-FAILED ret=-1 retry=1

== kgdb ==

(...)
Reading symbols from /boot/kernel.drm/kernel...
Reading symbols from /usr/lib/debug//boot/kernel.drm/kernel.debug...

Unread portion of the kernel message buffer:
<6>wlan0: ieee80211_new_state_locked:2718: pending SCAN -> AUTH transition lost
<4>Invalid TXQ id
iwl_mvm_tx_mpdu:1204: fc 0x00b0 tid 8 txq_id 65535 mvm 0xfffffe01762c6408 skb 0xfffff802d41a6800 { len 30 } info 0xfffffe0038f6bce8 sta 0xfffff80114044880 (if you see this please report to PR 274382)
panic: lkpi_sta_auth_to_scan: lsta 0xfffff80114c1e800 state not NONE: 0, nstate 1 arg 1

cpuid = 15
time = 1698833262
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0175ce8b70
vpanic() at vpanic+0x171/frame 0xfffffe0175ce8ca0
panic() at panic+0x43/frame 0xfffffe0175ce8d00
lkpi_sta_auth_to_scan() at lkpi_sta_auth_to_scan+0x2c8/frame 0xfffffe0175ce8d80
lkpi_iv_newstate() at lkpi_iv_newstate+0x253/frame 0xfffffe0175ce8df0
ieee80211_newstate_cb() at ieee80211_newstate_cb+0x1e7/frame 0xfffffe0175ce8e40
taskqueue_run_locked() at taskqueue_run_locked+0xab/frame 0xfffffe0175ce8ec0
taskqueue_thread_loop() at taskqueue_thread_loop+0xd3/frame 0xfffffe0175ce8ef0
fork_exit() at fork_exit+0x82/frame 0xfffffe0175ce8f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0175ce8f30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
Uptime: 5m22s
Dumping 1320 out of 32422 MB:..2%..11%..21%..31%..42%..51%..61%..71%..82%..91%

(kgdb) bt
#0  __curthread ()
    at /home/dumbbell/Documents/freebsd/src/sys/amd64/include/pcpu_aux.h:57
#1  doadump (textdump=textdump@entry=1)
    at /home/dumbbell/Documents/freebsd/src/sys/kern/kern_shutdown.c:406
#2  0xffffffff80b4ffd0 in kern_reboot (howto=260)
    at /home/dumbbell/Documents/freebsd/src/sys/kern/kern_shutdown.c:527
#3  0xffffffff80b5050e in vpanic (
    fmt=0xffffffff811e7898 "%s: lsta %p state not NONE: %#x, nstate %d arg %d\n", ap=ap@entry=0xfffffe0175ce8ce0)
    at /home/dumbbell/Documents/freebsd/src/sys/kern/kern_shutdown.c:976
#4  0xffffffff80b50273 in panic (fmt=<unavailable>)
    at /home/dumbbell/Documents/freebsd/src/sys/kern/kern_shutdown.c:895
#5  0xffffffff80dd3ab8 in lkpi_sta_auth_to_scan (vap=0xfffffe017908f010,
    nstate=IEEE80211_S_SCAN, arg=1)
    at /home/dumbbell/Documents/freebsd/src/sys/compat/linuxkpi/common/src/linux_80211.c:1175
#6  0xffffffff80ddb1e3 in lkpi_iv_newstate (vap=0xfffffe017908f010,
    nstate=IEEE80211_S_SCAN, arg=1)
    at /home/dumbbell/Documents/freebsd/src/sys/compat/linuxkpi/common/src/linux_80211.c:2113
#7  0xffffffff80cfff87 in ieee80211_newstate_cb (xvap=0xfffffe017908f010,
    npending=<optimized out>)
    at /home/dumbbell/Documents/freebsd/src/sys/net80211/ieee80211_proto.c:2546
#8  0xffffffff80bb5d2b in taskqueue_run_locked (
    queue=queue@entry=0xfffff80002a93100)
    at /home/dumbbell/Documents/freebsd/src/sys/kern/subr_taskqueue.c:512
#9  0xffffffff80bb6de3 in taskqueue_thread_loop (
    arg=arg@entry=0xfffffe01762ca110)
    at /home/dumbbell/Documents/freebsd/src/sys/kern/subr_taskqueue.c:824
#10 0xffffffff80b05eb2 in fork_exit (
    callout=0xffffffff80bb6d10 <taskqueue_thread_loop>,
    arg=0xfffffe01762ca110, frame=0xfffffe0175ce8f40)
    at /home/dumbbell/Documents/freebsd/src/sys/kern/kern_fork.c:1160
#11 <signal handler called>
Comment 28 Cheng Cui freebsd_committer freebsd_triage 2023-11-09 22:04:37 UTC
I hit this panic on main with the newstate-logging patch applied.

cc@n1_iwl_vm:~ % uname -a
FreeBSD n1_iwl_vm 15.0-CURRENT FreeBSD 15.0-CURRENT #1 main-f7d16a627-dirty: Thu Nov  9 16:03:11 EST 2023     cc@n1_iwl_vm:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64
cc@n1_iwl_vm:~ % 

To reproduce, just reboot with the following configuration.
/etc/rc.conf
wlans_iwlwifi0="wlan0"
ifconfig_wlan0="WPA SYNCDHCP"
create_args_wlan0="country US regdomain fcc"
wlandebug_wlan0="+state "

/boot/loader.conf
boot_verbose="YES"
kern.msgbufsize=1146880

console prints before panic:
...
iwlwifi0: Detected crf-id 0x3617, cnv-id 0x100530 wfpm id 0x80000000
iwlwifi0: PCI dev 2723/0084, rev=0x340, rfid=0x10a100
firmware: 'iwlwifi-cc-a0-77.ucode' version 77: 1366144 bytes loaded at 0xffffffff826a5000
iwlwifi0: successfully loaded firmware image 'iwlwifi-cc-a0-77.ucode'
iwlwifi0: api flags index 2 larger than supported by driver
iwlwifi0: TLV_FW_FSEQ_VERSION: FSEQ Version: 89.3.35.37
iwl-debug-yoyo.bin: could not load firmware image, error 2
iwl-debug-yoyo.bin: could not load firmware image, error 2
iwl-debug-yoyo_bin: could not load firmware image, error 2
iwl_debug_yoyo_bin: could not load firmware image, error 2
iwlwifi0: loaded firmware version 77.2df8986f.0 cc-a0-77.ucode op_mode iwlmvm
iwlwifi0: Detected Intel(R) Wi-Fi 6 AX200 160MHz, REV=0x340
iwlwifi0: Detected RF HR B3, rfid=0x10a100
iwlwifi0: base HW address: e0:2e:0b:92:e5:82
iwlwifi0: 11a rates: 6Mbps 9Mbps 12Mbps 18Mbps 24Mbps 36Mbps 48Mbps 54Mbps
iwlwifi0: 11b rates: 1Mbps 2Mbps 5.5Mbps 11Mbps
iwlwifi0: 11g rates: 1Mbps 2Mbps 5.5Mbps 11Mbps 6Mbps 9Mbps 12Mbps 18Mbps 24Mbps 36Mbps 48Mbps 54Mbps
pci0: driver added
wlan0: bpf attached
wlan0: bpf attached
wlan0: Ethernet address: e0:2e:0b:92:e5:82
net.wlan.0.debug: 0x0 => 0x80000<state>
Created wlan(4) interfaces: wlan0.
lo0: link state changed to UP
vtnet0: link state changed to UP
Starting dhclient.
DHCPREQUEST on vtnet0 to 255.255.255.255 port 67
DHCPACK from 192.168.1.1
Bogus Host Name option 12: n1_iwl_vm (n1_iwl_vm)
bound to 192.168.1.154 -- renewal in 21600 seconds.
Starting wpa_supplicant.
wlan0: start running, 0 vaps running
wlan0: ieee80211_start_locked: up parent iwlwifi0
wlan0: start running, 1 vaps running
wlan0: ieee80211_new_state_locked:2746: starting state update INIT -> INIT (SCAN)
wlan0: ieee80211_new_state_locked: INIT -> SCAN (arg 0) (nrunning 0 nscanning 0)
wlan0: ieee80211_newstate_cb:2517: running state update INIT -> SCAN (1)
wlan0: ieee80211_newstate_cb: INIT -> SCAN arg 0
wlan0: sta_newstate: INIT -> SCAN (0)
Starting dhclient.
wlan0: no link .....wlan0: ieee80211_new_state_locked:2746: starting state update SCAN -> SCAN (AUTH)
wlan0: ieee80211_new_state_locked: SCAN -> AUTH (arg 192) (nrunning 0 nscanning 0)
wlan0: ieee80211_newstate_cb:2517: running state update SCAN -> AUTH (1)
wlan0: ieee80211_newstate_cb: SCAN -> AUTH arg 192
wlan0: [f4:69:42:57:3f:0e] station assoc via MLME
wlan0: ieee80211_new_state_locked:2731: pending SCAN -> AUTH (now to AUTH) transition lost
wlan0: ieee80211_new_state_locked:2746: starting state update SCAN -> AUTH (AUTH)
wlan0: ieee80211_new_state_locked: SCAN -> AUTH (arg 192) (nrunning 0 nscanning 0)
wlan0: sta_newstate: SCAN -> AUTH (192)
wlan0: ieee80211_newstate_cb:2517: running state update AUTH -> AUTH (1)
wlan0: ieee80211_newstate_cb: AUTH -> AUTH arg 192
Invalid TXQ idiwl_mvm_tx_mpdu:1204: fc 0x00b0 tid 8 txq_id 65535 mvm 0xfffffe00b1250408 skb 0xfffff80007865800 { len 30 } info 0xfffffe00745dcce8 sta 0xfffff80005760880 (if you see this please report to PR 274382)
wlan0: ni 0xfffffe00b15bf000 vap 0xfffffe00b12e0010 mode STA state AUTH m 0xfffff800078b1b00 status 4543576
wlan0: ni 0xfffffe00b15bf000 mode STA state AUTH arg 0x2 status 4543576
wlan0: sta_newstate: AUTH -> AUTH (192)
wlan0: ni 0xfffffe00b15bf000 vap 0xfffffe00b12e0010 mode STA state AUTH m 0xfffff8000773cb00 status 1
wlan0: ni 0xfffffe00b15bf000 mode STA state AUTH arg 0x2 status 1
wlan0: vap 0xfffffe00b12e0010 mode STA state AUTH flags 0x2400 & 0x80
wlan0: ieee80211_new_state_locked:2746: starting state update AUTH -> AUTH (SCAN)
wlan0: ieee80211_new_state_locked: AUTH -> SCAN (arg 1) (nrunning 0 nscanning 0)
wlan0: ieee80211_newstate_cb:2517: running state update AUTH -> SCAN (1)
wlan0: ieee80211_newstate_cb: AUTH -> SCAN arg 1
panic: lkpi_sta_auth_to_scan: lsta 0xfffff80007756800 state not NONE: 0, nstate 1 arg 1

cpuid = 6
time = 1699566558
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00b0eb5b70
vpanic() at vpanic+0x132/frame 0xfffffe00b0eb5ca0
panic() at panic+0x43/frame 0xfffffe00b0eb5d00
lkpi_sta_auth_to_scan() at lkpi_sta_auth_to_scan+0x2c8/frame 0xfffffe00b0eb5d80
lkpi_iv_newstate() at lkpi_iv_newstate+0x253/frame 0xfffffe00b0eb5df0
ieee80211_newstate_cb() at ieee80211_newstate_cb+0x226/frame 0xfffffe00b0eb5e40
taskqueue_run_locked() at taskqueue_run_locked+0xab/frame 0xfffffe00b0eb5ec0
taskqueue_thread_loop() at taskqueue_thread_loop+0xd3/frame 0xfffffe00b0eb5ef0
fork_exit() at fork_exit+0x82/frame 0xfffffe00b0eb5f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00b0eb5f30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
[ thread pid 0 tid 100168 ]
Stopped at      kdb_enter+0x32: movq    $0,0xe2aee3(%rip)
db> dump
Dumping 362 out of 6111 MB:..5%..14%..23%..31%..45%..53%..62%..71%..84%..93%
Dump complete
db>
Comment 29 Bjoern A. Zeeb freebsd_committer freebsd_triage 2023-11-09 23:45:29 UTC
(In reply to Cheng Cui from comment #28)

Ok, based on the wlandebug +state and the additional logging from [1] here's the race in net80211 affecting possibly all drivers:
 
[1] https://people.freebsd.org/~bz/wireless/20231109-01-net80211-newstate-logging.diff
 
>>>> ANNOTATED OUTPUT from Comment 28 [https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=271979#c28];  hope we can confirm this with time stamps on.

wlan0: sta_newstate: INIT -> SCAN (0)
Starting dhclient.
wlan0: no link .....
 
[1] wlan0: ieee80211_new_state_locked:2746: starting state update SCAN -> SCAN (AUTH)
[1] wlan0: ieee80211_new_state_locked: SCAN -> AUTH (arg 192) (nrunning 0 nscanning 0)
[1] wlan0: ieee80211_newstate_cb:2517: running state update SCAN -> AUTH (1)
[1] wlan0: ieee80211_newstate_cb: SCAN -> AUTH arg 192

LinuxKPI running lkpi_sta_scan_to_auth() around here...

wlan0: [f4:69:42:57:3f:0e] station assoc via MLME
The line above is ioctl logging; the ioctl triggers ieee80211_sta_join -> ieee80211_sta_join1 -> ieee80211_new_state(vap, AUTH, IEEE80211_FC0_SUBTYPE_DEAUTH=192) -> [2]
ieee80211_sta_join() would allocate a new node (ni) and lsta in LinuxKPI.
ieee80211_sta_join1() would then call iv_update_bss() and that would swap nodes.
That explains the previous error Colin saw with the queue not having the valid node anymore, and it also explains why we later panic, as the state is not correct anymore.
If the assumption is correct, a KASSERT in iv_update_bss() could probably catch this.  I'll post a patch for that as well.  I have a big XXX in that code anyway because of this.

[2] wlan0: ieee80211_new_state_locked:2731: pending SCAN -> AUTH (now to AUTH) transition lost
[2] wlan0: ieee80211_new_state_locked:2746: starting state update SCAN -> AUTH (AUTH)
[2] wlan0: ieee80211_new_state_locked: SCAN -> AUTH (arg 192) (nrunning 0 nscanning 0)

LinuxKPI calls into the original handler for [1] which means lkpi_sta_scan_to_auth() is done:
[1] wlan0: sta_newstate: SCAN -> AUTH (192)
    here iv_state gets updated from SCAN to AUTH,

[2] wlan0: ieee80211_newstate_cb:2517: running state update AUTH -> AUTH (1)
[2] wlan0: ieee80211_newstate_cb: AUTH -> AUTH arg 192

                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The [2] SCAN -> AUTH turned into AUTH -> AUTH.  LinuxKPI runs lkpi_sta_a_to_a() here; there is possibly a further mgmt protection problem, given our sta_to_auth has not finished yet -- if we had a reply and moved to assoc (cc@ to test), that would explain the PR in the next line:

Invalid TXQ idiwl_mvm_tx_mpdu:1204: fc 0x00b0 tid 8 txq_id 65535 mvm 0xfffffe00b1250408 skb 0xfffff80007865800 { len 30 } info 0xfffffe00745dcce8 sta 0xfffff80005760880 (if you see this please report to PR 274382)
wlan0: ni 0xfffffe00b15bf000 vap 0xfffffe00b12e0010 mode STA state AUTH m 0xfffff800078b1b00 status 4543576
wlan0: ni 0xfffffe00b15bf000 mode STA state AUTH arg 0x2 status 4543576
[2] wlan0: sta_newstate: AUTH -> AUTH (192)

This should call sta_authretry() with 192 >> 8 == 0 == IEEE80211_STATUS_SUCCESS, which sends another b0 (authentication) frame.
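
The arithmetic in that remark can be checked with a minimal C sketch. It assumes, as the `192 >> 8` expression suggests, that the newstate arg carries a management-frame subtype in its low 8 bits and a status code in the bits above; the macro and function names are hypothetical stand-ins, with values taken from the numbers quoted in this comment (192 for DEAUTH, b0 for authentication, 0 for success).

```c
#include <assert.h>
#include <stdint.h>

/* Management-frame subtype values in the first frame-control byte. */
#define FC0_SUBTYPE_MASK	0xf0
#define FC0_SUBTYPE_AUTH	0xb0	/* the "b0 (authentication)" frame */
#define FC0_SUBTYPE_DEAUTH	0xc0	/* 192 decimal, as quoted above */
#define STATUS_SUCCESS		0

/* Assumption: the low 8 bits of arg carry the frame subtype. */
int
arg_subtype(int arg)
{
	return arg & 0xff;
}

/* Assumption: the bits above the low byte carry a status code. */
int
arg_status(int arg)
{
	assert(arg >= 0);	/* avoid shifting a negative value */
	return arg >> 8;
}

/* For an fc value printed as "fc 0x00b0", the subtype is in the low byte. */
int
fc_subtype(uint16_t fc)
{
	return fc & FC0_SUBTYPE_MASK;
}
```

With arg = 192 the subtype part is 0xc0 and the status part is 0, and fc 0x00b0 decodes to the authentication subtype, which is consistent with the annotation.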

wlan0: ni 0xfffffe00b15bf000 vap 0xfffffe00b12e0010 mode STA state AUTH m 0xfffff8000773cb00 status 1
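
The ordering described in the annotated log can be replayed with a small single-threaded toy model in C. This is only a sketch under stated assumptions, not net80211 code: the real implementation uses locks, a taskqueue, and flags that are not modelled here, and the struct and function names (toy_vap, request_state, callback_begin, callback_end, replay) are hypothetical. It only shows how a second request that arrives while the first callback is still in the driver ends up running as AUTH -> AUTH.

```c
#include <assert.h>

enum state { S_INIT, S_SCAN, S_AUTH };

struct toy_vap {
	enum state iv_state;	/* state the VAP is currently in */
	enum state iv_nstate;	/* most recently requested state */
	int pending;		/* a request is queued or still in progress */
	int queued;		/* callbacks waiting to run */
	int lost;		/* count of "transition lost" events */
};

struct transition {
	enum state from;
	enum state to;
};

/* Models a state request: a still-pending request is overwritten. */
void
request_state(struct toy_vap *v, enum state n)
{
	if (v->pending)
		v->lost++;	/* "pending X -> Y transition lost" */
	v->iv_nstate = n;
	v->pending = 1;
	v->queued++;
}

/* First half of the callback: snapshot what the driver is told. */
struct transition
callback_begin(struct toy_vap *v)
{
	struct transition t = { v->iv_state, v->iv_nstate };

	assert(v->queued > 0);
	v->queued--;
	return t;
}

/* Second half: the driver has finished, so commit the new state. */
void
callback_end(struct toy_vap *v, struct transition t)
{
	v->iv_state = t.to;
	if (v->queued == 0)
		v->pending = 0;
}

/* Replays the [1]/[2] ordering and returns what [2] runs as. */
struct transition
replay(int *lost)
{
	struct toy_vap v = { S_SCAN, S_SCAN, 0, 0, 0 };
	struct transition t1, t2;

	request_state(&v, S_AUTH);	/* [1] SCAN -> AUTH requested */
	t1 = callback_begin(&v);	/* [1] driver busy with SCAN -> AUTH */
	request_state(&v, S_AUTH);	/* [2] arrives while [1] is pending */
	callback_end(&v, t1);		/* [1] commits iv_state = AUTH */
	t2 = callback_begin(&v);	/* [2] now sees AUTH -> AUTH */
	callback_end(&v, t2);
	*lost = v.lost;
	return t2;
}
```

In this model the second request is counted as one lost transition and its callback observes AUTH as both the old and the new state, which is the same shape as the [2] lines in the log.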
Comment 30 Cheng Cui freebsd_committer freebsd_triage 2023-11-21 17:16:02 UTC
I found a good workaround. Basically, the following commands did the job without scanning or restarting the wlan0 interface.

root@n1_iwl_vm:~ # ifconfig
lo0: flags=1008049<UP,LOOPBACK,RUNNING,MULTICAST,LOWER_UP> metric 0 mtu 16384
        options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
        inet 127.0.0.1 netmask 0xff000000
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
        groups: lo
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
root@n1_iwl_vm:~ # 
root@n1_iwl_vm:~ # uname -a
FreeBSD n1_iwl_vm 15.0-CURRENT FreeBSD 15.0-CURRENT #19 main-488bc7e9a: Tue Nov 21 11:42:00 EST 2023     root@n1_iwl_vm:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64

root@n1_iwl_vm:~ # pciconf -lv | grep -B3 network
iwlwifi0@pci0:0:5:0:    class=0x028000 rev=0x1a hdr=0x00 vendor=0x8086 device=0x2723 subvendor=0x8086 subdevice=0x0084
    vendor     = 'Intel Corporation'
    device     = 'Wi-Fi 6 AX200'
    class      = network
root@n1_iwl_vm:~ #

root@n1_iwl_vm:~ # ifconfig wlan0 create wlandev iwlwifi0 regdomain fcc country US
root@n1_iwl_vm:~ # wpa_supplicant -B -i wlan0 -c /etc/wpa_supplicant.conf
Successfully initialized wpa_supplicant
ioctl[SIOCS80211, op=20, val=0, arg_len=7]: Invalid argument
ioctl[SIOCS80211, op=20, val=0, arg_len=7]: Invalid argument

root@n1_iwl_vm:~ # ifconfig
lo0: flags=1008049<UP,LOOPBACK,RUNNING,MULTICAST,LOWER_UP> metric 0 mtu 16384
        options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
        inet 127.0.0.1 netmask 0xff000000
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
        groups: lo
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
wlan0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=0
        ether e0:2e:0b:92:e5:82
        groups: wlan
        ssid SpectrumSetup-0F channel 157 (5785 MHz 11a) bssid f4:69:42:57:3f:0e
        regdomain FCC country US authmode WPA2/802.11i privacy ON
        deftxkey UNDEF AES-CCM 2:128-bit txpower 23 bmiss 7 mcastrate 6
        mgmtrate 6 scanvalid 60 wme roaming MANUAL
        parent interface: iwlwifi0
        media: IEEE 802.11 Wireless Ethernet OFDM/36Mbps mode 11a
        status: associated                                                      <<< associated!
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>

root@n1_iwl_vm:~ # dhclient wlan0
DHCPREQUEST on wlan0 to 255.255.255.255 port 67                                 <<< request IP addr through DHCP
DHCPACK from 192.168.1.1
Nov 21 11:51:35 n1_iwl_vm dhclient[654]: Bogus Host Name option 12: n1_iwl_vm (n1_iwl_vm)
Bogus Host Name option 12: n1_iwl_vm (n1_iwl_vm)
bound to 192.168.1.214 -- renewal in 21600 seconds.

root@n1_iwl_vm:~ # ifconfig
lo0: flags=1008049<UP,LOOPBACK,RUNNING,MULTICAST,LOWER_UP> metric 0 mtu 16384
        options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
        inet 127.0.0.1 netmask 0xff000000
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
        groups: lo
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
wlan0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=0
        ether e0:2e:0b:92:e5:82
        inet 192.168.1.214 netmask 0xffffff00 broadcast 192.168.1.255
        groups: wlan
        ssid SpectrumSetup-0F channel 157 (5785 MHz 11a) bssid f4:69:42:57:3f:0e
        regdomain FCC country US authmode WPA2/802.11i privacy ON
        deftxkey UNDEF AES-CCM 2:128-bit txpower 23 bmiss 7 mcastrate 6
        mgmtrate 6 scanvalid 60 wme roaming MANUAL
        parent interface: iwlwifi0
        media: IEEE 802.11 Wireless Ethernet OFDM/36Mbps mode 11a
        status: associated
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>

root@n1_iwl_vm:~ # ping -c 3 -S 192.168.1.214 192.168.1.1
PING 192.168.1.1 (192.168.1.1) from 192.168.1.214: 56 data bytes
64 bytes from 192.168.1.1: icmp_seq=0 ttl=64 time=2.904 ms
64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=1.073 ms
64 bytes from 192.168.1.1: icmp_seq=2 ttl=64 time=1.924 ms

--- 192.168.1.1 ping statistics ---
3 packets transmitted, 3 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 1.073/1.967/2.904/0.748 ms
Comment 31 Cheng Cui freebsd_committer freebsd_triage 2023-11-21 17:35:38 UTC
(In reply to Cheng Cui from comment #30)

Well, my workaround above may not work. I tested it multiple times. Sometimes it crashes and sometimes it works. :(
Comment 32 Cheng Cui freebsd_committer freebsd_triage 2023-11-21 17:47:27 UTC
(In reply to Cheng Cui from comment #31)

Workaround update:

Well, I found what I was missing. It looks like adding the ssid up front during "ifconfig wlan0 create" makes it stable. I have seen no more crashes.

root@n1_iwl_vm:~ # ifconfig
lo0: flags=1008049<UP,LOOPBACK,RUNNING,MULTICAST,LOWER_UP> metric 0 mtu 16384
        options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
        inet 127.0.0.1 netmask 0xff000000
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
        groups: lo
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
root@n1_iwl_vm:~ # ifconfig wlan0 create wlandev iwlwifi0 regdomain fcc country US ssid SpectrumSetup-0F
root@n1_iwl_vm:~ # ifconfig
lo0: flags=1008049<UP,LOOPBACK,RUNNING,MULTICAST,LOWER_UP> metric 0 mtu 16384
        options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
        inet 127.0.0.1 netmask 0xff000000
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
        groups: lo
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
wlan0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=0
        ether e0:2e:0b:92:e5:82
        groups: wlan
        ssid SpectrumSetup-0F channel 1 (2412 MHz 11b)
        regdomain FCC country US authmode OPEN privacy OFF txpower 30 bmiss 7
        scanvalid 60 wme bintval 0
        parent interface: iwlwifi0
        media: IEEE 802.11 Wireless Ethernet autoselect (autoselect)
        status: no carrier
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
root@n1_iwl_vm:~ # wpa_supplicant -B -i wlan0 -c /etc/wpa_supplicant.conf
Successfully initialized wpa_supplicant
ioctl[SIOCS80211, op=20, val=0, arg_len=7]: Invalid argument
ioctl[SIOCS80211, op=20, val=0, arg_len=7]: Invalid argument
root@n1_iwl_vm:~ # iwlwifi0: Not associated and the session protection is over already...
iwlwifi0: linuxkpi_ieee80211_connection_loss: vif 0xfffffe00bdea8c80 vap 0xfffffe00bdea8010 state AUTH

root@n1_iwl_vm:~ # ifconfig
lo0: flags=1008049<UP,LOOPBACK,RUNNING,MULTICAST,LOWER_UP> metric 0 mtu 16384
        options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
        inet 127.0.0.1 netmask 0xff000000
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
        groups: lo
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
wlan0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=0
        ether e0:2e:0b:92:e5:82
        groups: wlan
        ssid SpectrumSetup-0F channel 1 (2412 MHz 11g) bssid f4:69:42:57:3f:0d
        regdomain FCC country US authmode WPA2/802.11i privacy ON
        deftxkey UNDEF AES-CCM 2:128-bit txpower 30 bmiss 7 scanvalid 60
        protmode CTS wme roaming MANUAL
        parent interface: iwlwifi0
        media: IEEE 802.11 Wireless Ethernet OFDM/36Mbps mode 11g
        status: associated
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
root@n1_iwl_vm:~ # dhclient wlan0
DHCPREQUEST on wlan0 to 255.255.255.255 port 67
DHCPACK from 192.168.1.1
Nov 21 12:42:57 n1_iwl_vm dhclient[653]: Bogus Host Name option 12: n1_iwl_vm (n1_iwl_vm)
Bogus Host Name option 12: n1_iwl_vm (n1_iwl_vm)
bound to 192.168.1.214 -- renewal in 21600 seconds.
root@n1_iwl_vm:~ # ifconfig
lo0: flags=1008049<UP,LOOPBACK,RUNNING,MULTICAST,LOWER_UP> metric 0 mtu 16384
        options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
        inet 127.0.0.1 netmask 0xff000000
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
        groups: lo
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
wlan0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=0
        ether e0:2e:0b:92:e5:82
        inet 192.168.1.214 netmask 0xffffff00 broadcast 192.168.1.255
        groups: wlan
        ssid SpectrumSetup-0F channel 1 (2412 MHz 11g) bssid f4:69:42:57:3f:0d
        regdomain FCC country US authmode WPA2/802.11i privacy ON
        deftxkey UNDEF AES-CCM 2:128-bit txpower 30 bmiss 7 scanvalid 60
        protmode CTS wme roaming MANUAL
        parent interface: iwlwifi0
        media: IEEE 802.11 Wireless Ethernet OFDM/36Mbps mode 11g
        status: associated
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
root@n1_iwl_vm:~ # ping -c 3 -S 192.168.1.214 192.168.1.1
PING 192.168.1.1 (192.168.1.1) from 192.168.1.214: 56 data bytes
64 bytes from 192.168.1.1: icmp_seq=0 ttl=64 time=3.485 ms
64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=2.810 ms
64 bytes from 192.168.1.1: icmp_seq=2 ttl=64 time=1.336 ms

--- 192.168.1.1 ping statistics ---
3 packets transmitted, 3 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 1.336/2.544/3.485/0.897 ms
root@n1_iwl_vm:~ #
Comment 33 Bjoern A. Zeeb freebsd_committer freebsd_triage 2023-11-22 00:51:33 UTC
(In reply to Cheng Cui from comment #30)


> root@n1_iwl_vm:~ # ifconfig wlan0 create wlandev iwlwifi0 regdomain fcc country US
> root@n1_iwl_vm:~ # wpa_supplicant -B -i wlan0 -c /etc/wpa_supplicant.conf
> Successfully initialized wpa_supplicant

There is a possible panic there; I had a situation where I could run that for almost a day in a row without provoking it, while others can trigger it within seconds.
Comment 34 Bjoern A. Zeeb freebsd_committer freebsd_triage 2023-11-22 00:53:23 UTC
(In reply to Cheng Cui from comment #32)

> Workaround update:
> 
> Well, I found what I was missing. It looks like adding the ssid up front during "ifconfig wlan0 create" makes it stable. I have seen no more crashes.

That likely only means you have a single BSSID for that SSID.  Once you have two or three APs for the same SSID and you set wpa_supplicant.conf to ignore the one net80211 would pick to try to assoc to after a scan, I assume you will still see the crash.
Comment 35 Bjoern A. Zeeb freebsd_committer freebsd_triage 2023-11-22 22:30:20 UTC
*** Bug 275255 has been marked as a duplicate of this bug. ***
Comment 36 Alex Wied 2023-11-27 20:19:21 UTC
I am on 15.0-CURRENT (305a2676ae93fb50a623024d51039415521cb2da), I have multiple base stations under one SSID, and I am experiencing this same crash on boot:

wlans_iwlwifi0="wlan0"
ifconfig_wlan0="WPA SYNCDHCP"
ifconfig_wlan0_ipv6="inet6 auto_linklocal accept_rtadv"

#32 0xffffffff80dd3b08 in lkpi_sta_auth_to_scan (vap=0xfffffe01636e8010, nstate=IEEE80211_S_SCAN, arg=1) at /usr/src/sys/compat/linuxkpi/common/src/linux_80211.c:1175
#33 0xffffffff80ddb263 in lkpi_iv_newstate (vap=0xfffffe01636e8010, nstate=IEEE80211_S_SCAN, arg=1) at /usr/src/sys/compat/linuxkpi/common/src/linux_80211.c:2113
#34 0xffffffff80cffee7 in ieee80211_newstate_cb (xvap=0xfffffe01636e8010, npending=<optimized out>) at /usr/src/sys/net80211/ieee80211_proto.c:2546
Comment 37 commit-hook freebsd_committer freebsd_triage 2023-12-01 01:49:16 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=643d6dce6c1e39f067f8d0feea8615913b324891

commit 643d6dce6c1e39f067f8d0feea8615913b324891
Author:     Bjoern A. Zeeb <bz@FreeBSD.org>
AuthorDate: 2023-12-01 01:37:25 +0000
Commit:     Bjoern A. Zeeb <bz@FreeBSD.org>
CommitDate: 2023-12-01 01:48:34 +0000

    tools/net80211: add mlme_assoc

    mlme_assoc is a tool to trigger net80211::ieee80211_sta_join1() calls
    which in certain conditions cause problems to the LinuxKPI 802.11 compat
    code (but also believed to possibly cause problems in case of race to
    other firmware based drivers).  This has proven to be a good reproducer
    for the problem even on setups which otherwise could run for days without
    hitting it.

    Sponsored by:   The FreeBSD Foundation
    PR:             271979

 tools/tools/net80211/mlme_assoc/Makefile (new)     |   7 +
 tools/tools/net80211/mlme_assoc/README (new)       |  51 ++++++
 tools/tools/net80211/mlme_assoc/mlme_assoc.c (new) | 200 +++++++++++++++++++++
 3 files changed, 258 insertions(+)
Comment 38 commit-hook freebsd_committer freebsd_triage 2024-02-14 19:50:24 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=2ac8a2189ac6707f48f77ef2e36baf696a0d2f40

commit 2ac8a2189ac6707f48f77ef2e36baf696a0d2f40
Author:     Bjoern A. Zeeb <bz@FreeBSD.org>
AuthorDate: 2024-02-03 16:33:56 +0000
Commit:     Bjoern A. Zeeb <bz@FreeBSD.org>
CommitDate: 2024-02-14 19:47:53 +0000

    LinuxKPI: 802.11: band-aid for invalid state changes after (*iv_update_bss)

    With firmware based solutions we cannot just jump from an active session
    to a new iv_bss node without tearing down state for the old and bringing
    up the new node.  This likely used to work on softmac based cards/drivers
    where one could essentially set the state and fire at will.

    We track (*iv_update_bss) calls from net80211 and set a local flag that
    we are out of synch and do not allow any further operations up the state
    machine until we hit INIT or SCAN.  That means someone will take the state
    down, clean up firmware state and then we can join again and build up
    state.

    Apparently this problem has been "known" for a while as native iwm(4) and
    others have similar workarounds (though less strict) and can be equally
    pestered into bad states.  For LinuxKPI all the KASSERTs just massively
    brought this problem out.  The solution will be some rewrites in net80211.
    Until then, try to keep us more stable at least and not die on second
    join1() calls triggered by service netif start wlan0 and similar.

    PR:             271979, 271988, 275255, 263613, 274003
    Sponsored by:   The FreeBSD Foundation (2023, partial)
    MFC after:      3 days
    Reviewed by:    cc
    Differential Revision: https://reviews.freebsd.org/D43725

 sys/compat/linuxkpi/common/src/linux_80211.c | 309 +++++++++++++++++++--------
 sys/compat/linuxkpi/common/src/linux_80211.h |   2 +
 2 files changed, 216 insertions(+), 95 deletions(-)
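The gating described in the commit message above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: the names `sketch_vap`, `sketch_update_bss` and `sketch_newstate` are hypothetical and do not appear in linux_80211.c, the state list is a simplified subset, and treating "up the state machine" as "target state above the current one" is an interpretation of the commit text rather than a copy of the real logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified subset of the station states, lowest first (assumption). */
enum sketch_state { S_INIT, S_SCAN, S_AUTH, S_ASSOC, S_RUN };

/* Hypothetical per-vap bookkeeping; not the real LinuxKPI structure. */
struct sketch_vap {
	enum sketch_state state;
	bool out_of_sync;	/* bss node was swapped underneath us */
};

/* Models an (*iv_update_bss) call: remember that we are out of sync. */
void
sketch_update_bss(struct sketch_vap *vap)
{
	assert(vap != NULL);
	vap->out_of_sync = true;
}

/*
 * Models a state change request.  Returns 0 if accepted, -1 if refused.
 * Reaching INIT or SCAN stands for "state taken down, firmware cleaned
 * up", which clears the flag; while the flag is set, moves to a higher
 * state are refused.
 */
int
sketch_newstate(struct sketch_vap *vap, enum sketch_state nstate)
{
	assert(vap != NULL);
	if (nstate == S_INIT || nstate == S_SCAN)
		vap->out_of_sync = false;
	else if (vap->out_of_sync && nstate > vap->state)
		return (-1);
	vap->state = nstate;
	return (0);
}
```

In this model a second join while in AUTH (a `sketch_update_bss()` call) makes the following move to ASSOC fail, and only a pass through SCAN or INIT lets the state machine climb again.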
Comment 39 commit-hook freebsd_committer freebsd_triage 2024-02-14 19:50:28 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=713db49d06deee90dd358b2e4b9ca05368a5eaf6

commit 713db49d06deee90dd358b2e4b9ca05368a5eaf6
Author:     Bjoern A. Zeeb <bz@FreeBSD.org>
AuthorDate: 2024-01-10 10:14:16 +0000
Commit:     Bjoern A. Zeeb <bz@FreeBSD.org>
CommitDate: 2024-02-14 19:47:21 +0000

    net80211: deal with lost state transitions

    Since 5efea30f039c4 we can possibly lose a state transition which can
    cause trouble further down the road.
    The reproducer from 643d6dce6c1e can trigger these for example.
    Drivers for firmware based wireless cards have worked around some of
    this (and other) problems in the past.

    Add an array of tasks rather than a single one as we would simply
    get npending > 1 and lose order with other tasks.  Try to keep state
    changes updated as queued in case we end up with more than one at a
    time.  While this is not ideal either (call it a hack) it will sort
    the problem for now.
    We will queue in ieee80211_new_state_locked() and do checks there
    and dequeue in ieee80211_newstate_cb().
    If we still overrun the (currently) 8 slots we will drop the state
    change rather than overwrite the last one.
    When dequeuing we will update iv_nstate and keep it around for historic
    reasons for the moment.

    The longer term we should make the callers of
    ieee80211_new_state[_locked]() actually use the returned errors
    and act appropriately but that will touch a lot more places and
    drivers (possibly incl. changed behaviour for ioctls).

    rtwn(4) and rum(4) should probably be revisited and net80211 internals
    removed (for rum(4) at least the current logic still seems prone to
    races).

    PR:             271979, 271988, 275255, 263613, 274003
    Sponsored by:   The FreeBSD Foundation (in 2023)
    MFC after:      3 days
    Reviewed by:    cc
    Differential Revision: https://reviews.freebsd.org/D43389

 sys/dev/rtwn/if_rtwn.c         |   4 +-
 sys/dev/usb/wlan/if_rum.c      |   4 +-
 sys/net80211/ieee80211.c       |   4 +-
 sys/net80211/ieee80211_ddb.c   |  13 ++++-
 sys/net80211/ieee80211_proto.c | 124 ++++++++++++++++++++++++++++++++++-------
 sys/net80211/ieee80211_var.h   |  13 ++++-
 6 files changed, 134 insertions(+), 28 deletions(-)
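The queueing behaviour described in the commit message above (a fixed number of slots, in-order dequeue, and dropping a new state change on overrun instead of overwriting a queued one) can be sketched as a small ring buffer. This is a user-space illustration only: `sketch_queue`, `sketch_enqueue` and `sketch_dequeue` are hypothetical names, the real code uses an array of tasks inside 'struct ieee80211vap', and only the slot count of 8 is taken from the commit text.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NSLOTS	8	/* "(currently) 8 slots" per the commit message */

struct sketch_change {
	int nstate;		/* requested target state */
	int arg;		/* argument passed along with it */
};

struct sketch_queue {
	struct sketch_change slots[SKETCH_NSLOTS];
	unsigned int head;	/* index of the oldest queued change */
	unsigned int count;	/* number of queued changes */
	int nstate;		/* last dequeued target, kept like iv_nstate */
};

/* Queue a state change; returns -1 and drops it if all slots are in use. */
int
sketch_enqueue(struct sketch_queue *q, int nstate, int arg)
{
	unsigned int tail;

	assert(q != NULL);
	if (q->count == SKETCH_NSLOTS)
		return (-1);	/* overrun: drop, do not overwrite */
	tail = (q->head + q->count) % SKETCH_NSLOTS;
	q->slots[tail].nstate = nstate;
	q->slots[tail].arg = arg;
	q->count++;
	return (0);
}

/* Take the oldest change off the queue; returns -1 if it is empty. */
int
sketch_dequeue(struct sketch_queue *q, struct sketch_change *out)
{
	assert(q != NULL && out != NULL);
	if (q->count == 0)
		return (-1);
	*out = q->slots[q->head];
	q->head = (q->head + 1) % SKETCH_NSLOTS;
	q->count--;
	q->nstate = out->nstate;	/* updated on dequeue */
	return (0);
}
```

The point of the design is ordering: with a single pending slot a second request would replace the first, whereas here each request is either processed in the order it was queued or refused with an error the caller can see.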
Comment 40 commit-hook freebsd_committer freebsd_triage 2024-02-18 21:12:03 UTC
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=8c450ea1083b03f30871506b59034f26bc608972

commit 8c450ea1083b03f30871506b59034f26bc608972
Author:     Bjoern A. Zeeb <bz@FreeBSD.org>
AuthorDate: 2024-02-03 16:33:56 +0000
Commit:     Bjoern A. Zeeb <bz@FreeBSD.org>
CommitDate: 2024-02-18 18:31:17 +0000

    LinuxKPI: 802.11: band-aid for invalid state changes after (*iv_update_bss)

    With firmware based solutions we cannot just jump from an active session
    to a new iv_bss node without tearing down state for the old and bringing
    up the new node.  This likely used to work on softmac based cards/drivers
    where one could essentially set the state and fire at will.

    We track (*iv_update_bss) calls from net80211 and set a local flag that
    we are out of synch and do not allow any further operations up the state
    machine until we hit INIT or SCAN.  That means someone will take the state
    down, clean up firmware state and then we can join again and build up
    state.

    Apparently this problem has been "known" for a while as native iwm(4) and
    others have similar workarounds (though less strict) and can be equally
    pestered into bad states.  For LinuxKPI all the KASSERTs just massively
    brought this problem out.  The solution will be some rewrites in net80211.
    Until then, try to keep us more stable at least and not die on second
    join1() calls triggered by service netif start wlan0 and similar.

    PR:             271979, 271988, 275255, 263613, 274003
    Sponsored by:   The FreeBSD Foundation (2023, partial)
    Reviewed by:    cc
    Differential Revision: https://reviews.freebsd.org/D43725

    (cherry picked from commit 2ac8a2189ac6707f48f77ef2e36baf696a0d2f40)

 sys/compat/linuxkpi/common/src/linux_80211.c | 309 +++++++++++++++++++--------
 sys/compat/linuxkpi/common/src/linux_80211.h |   2 +
 2 files changed, 216 insertions(+), 95 deletions(-)
Comment 41 commit-hook freebsd_committer freebsd_triage 2024-02-18 21:12:08 UTC
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=b392b36d3776b696601ce0253256803276d24ea2

commit b392b36d3776b696601ce0253256803276d24ea2
Author:     Bjoern A. Zeeb <bz@FreeBSD.org>
AuthorDate: 2024-01-10 10:14:16 +0000
Commit:     Bjoern A. Zeeb <bz@FreeBSD.org>
CommitDate: 2024-02-18 18:31:17 +0000

    net80211: deal with lost state transitions

    Since 5efea30f039c4 we can possibly lose a state transition which can
    cause trouble further down the road.
    The reproducer from 643d6dce6c1e can trigger these for example.
    Drivers for firmware based wireless cards have worked around some of
    this (and other) problems in the past.

    Add an array of tasks rather than a single one as we would simply
    get npending > 1 and lose order with other tasks.  Try to keep state
    changes updated as queued in case we end up with more than one at a
    time.  While this is not ideal either (call it a hack) it will sort
    the problem for now.
    We will queue in ieee80211_new_state_locked() and do checks there
    and dequeue in ieee80211_newstate_cb().
    If we still overrun the (currently) 8 slots we will drop the state
    change rather than overwrite the last one.
    When dequeuing we will update iv_nstate and keep it around for historic
    reasons for the moment.

    The longer term we should make the callers of
    ieee80211_new_state[_locked]() actually use the returned errors
    and act appropriately but that will touch a lot more places and
    drivers (possibly incl. changed behaviour for ioctls).

    rtwn(4) and rum(4) should probably be revisited and net80211 internals
    removed (for rum(4) at least the current logic still seems prone to
    races).

    Given this changes the internal structure of 'struct ieee80211vap',
    which gets allocated by the drivers, and we do not have enough
    spares, all wireless drivers need to be recompiled.
    Given we are forced to do the update, we leave fields in the middle
    of the struct and add more spares at the same time.
    __FreeBSD_version gets updated to 1400509 to be able to detect
    this change.

    PR:             271979, 271988, 275255, 263613, 274003
    Sponsored by:   The FreeBSD Foundation (in 2023)
    Reviewed by:    cc
    Differential Revision: https://reviews.freebsd.org/D43389

    (cherry picked from commit 713db49d06deee90dd358b2e4b9ca05368a5eaf6)
    (cherry picked from commit a890a3a5ddf33acb0a4000885945b89156799b07)

 UPDATING                       |   6 ++
 sys/dev/rtwn/if_rtwn.c         |   4 +-
 sys/dev/usb/wlan/if_rum.c      |   4 +-
 sys/net80211/ieee80211.c       |   4 +-
 sys/net80211/ieee80211_ddb.c   |  13 ++++-
 sys/net80211/ieee80211_proto.c | 124 ++++++++++++++++++++++++++++++++++-------
 sys/net80211/ieee80211_var.h   |  15 +++--
 sys/sys/param.h                |   2 +-
 8 files changed, 142 insertions(+), 30 deletions(-)
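For code built outside the tree, the version bumps mentioned in the MFC commit messages in this report (1400509 on stable/14, 1303501 on stable/13, 1303001 on releng/13.3) are what allows detecting the changed 'struct ieee80211vap' layout. A minimal sketch of such a check follows, written as a plain function so it can be tried anywhere; `sketch_vap_layout` is a hypothetical helper, and the value for main is not stated in this report, so anything outside the 13.x and 14.x ranges is reported as unknown. A real driver would more likely compare `__FreeBSD_version` in an `#if`.

```c
#include <assert.h>

/* Values quoted from the commit messages in this report. */
#define SKETCH_VAP_CHANGE_14	1400509	/* stable/14 */
#define SKETCH_VAP_CHANGE_13	1303001	/* releng/13.3; stable/13 uses 1303501 */

/*
 * Returns 1 if the given __FreeBSD_version value has the new vap layout,
 * 0 if it predates the change, and -1 if this sketch cannot tell
 * (branches other than 13.x and 14.x).
 */
int
sketch_vap_layout(int osversion)
{
	assert(osversion > 0);
	if (osversion >= 1400000 && osversion < 1500000)
		return (osversion >= SKETCH_VAP_CHANGE_14);
	if (osversion >= 1300000 && osversion < 1400000)
		return (osversion >= SKETCH_VAP_CHANGE_13);
	return (-1);
}
```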
Comment 42 commit-hook freebsd_committer freebsd_triage 2024-02-18 21:12:25 UTC
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=e1d739471efdc6fe32af570e4bd07875a7e502ff

commit e1d739471efdc6fe32af570e4bd07875a7e502ff
Author:     Bjoern A. Zeeb <bz@FreeBSD.org>
AuthorDate: 2023-12-01 01:37:25 +0000
Commit:     Bjoern A. Zeeb <bz@FreeBSD.org>
CommitDate: 2024-02-18 18:31:13 +0000

    tools/net80211: add mlme_assoc

    mlme_assoc is a tool to trigger net80211::ieee80211_sta_join1() calls
    which in certain conditions cause problems to the LinuxKPI 802.11 compat
    code (but also believed to possibly cause problems in case of race to
    other firmware based drivers).  This has proven to be a good reproducer
    for the problem even on setups which otherwise could run for days without
    hitting it.

    Sponsored by:   The FreeBSD Foundation
    PR:             271979

    (cherry picked from commit 643d6dce6c1e39f067f8d0feea8615913b324891)

 tools/tools/net80211/mlme_assoc/Makefile (new)     |   7 +
 tools/tools/net80211/mlme_assoc/README (new)       |  51 ++++++
 tools/tools/net80211/mlme_assoc/mlme_assoc.c (new) | 200 +++++++++++++++++++++
 3 files changed, 258 insertions(+)
Comment 43 commit-hook freebsd_committer freebsd_triage 2024-02-19 08:09:05 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=184ccc414686ea32c64f063c081c7cc1adeae7c3

commit 184ccc414686ea32c64f063c081c7cc1adeae7c3
Author:     Bjoern A. Zeeb <bz@FreeBSD.org>
AuthorDate: 2024-02-03 16:33:56 +0000
Commit:     Bjoern A. Zeeb <bz@FreeBSD.org>
CommitDate: 2024-02-19 08:02:02 +0000

    LinuxKPI: 802.11: band-aid for invalid state changes after (*iv_update_bss)

    With firmware based solutions we cannot just jump from an active session
    to a new iv_bss node without tearing down state for the old and bringing
    up the new node.  This likely used to work on softmac based cards/drivers
    where one could essentially set the state and fire at will.

    We track (*iv_update_bss) calls from net80211 and set a local flag that
    we are out of synch and do not allow any further operations up the state
    machine until we hit INIT or SCAN.  That means someone will take the state
    down, clean up firmware state and then we can join again and build up
    state.

    Apparently this problem has been "known" for a while as native iwm(4) and
    others have similar workarounds (though less strict) and can be equally
    pestered into bad states.  For LinuxKPI all the KASSERTs just massively
    brought this problem out.  The solution will be some rewrites in net80211.
    Until then, try to keep us more stable at least and not die on second
    join1() calls triggered by service netif start wlan0 and similar.

    PR:             271979, 271988, 275255, 263613, 274003
    Sponsored by:   The FreeBSD Foundation (2023, partial)
    Reviewed by:    cc
    Differential Revision: https://reviews.freebsd.org/D43725

    (cherry picked from commit 2ac8a2189ac6707f48f77ef2e36baf696a0d2f40)

 sys/compat/linuxkpi/common/src/linux_80211.c | 309 +++++++++++++++++++--------
 sys/compat/linuxkpi/common/src/linux_80211.h |   2 +
 2 files changed, 216 insertions(+), 95 deletions(-)
Comment 44 commit-hook freebsd_committer freebsd_triage 2024-02-19 08:09:20 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=a7e1fc7f620d3341549c1380f550aaafbdb45622

commit a7e1fc7f620d3341549c1380f550aaafbdb45622
Author:     Bjoern A. Zeeb <bz@FreeBSD.org>
AuthorDate: 2024-01-10 10:14:16 +0000
Commit:     Bjoern A. Zeeb <bz@FreeBSD.org>
CommitDate: 2024-02-19 08:02:01 +0000

    net80211: deal with lost state transitions

    Since 5efea30f039c4 we can possibly lose a state transition which can
    cause trouble further down the road.
    The reproducer from 643d6dce6c1e can trigger these for example.
    Drivers for firmware based wireless cards have worked around some of
    this (and other) problems in the past.

    Add an array of tasks rather than a single one as we would simply
    get npending > 1 and lose order with other tasks.  Try to keep state
    changes updated as queued in case we end up with more than one at a
    time.  While this is not ideal either (call it a hack) it will sort
    the problem for now.
    We will queue in ieee80211_new_state_locked() and do checks there
    and dequeue in ieee80211_newstate_cb().
    If we still overrun the (currently) 8 slots we will drop the state
    change rather than overwrite the last one.
    When dequeuing we will update iv_nstate and keep it around for historic
    reasons for the moment.

    The longer term we should make the callers of
    ieee80211_new_state[_locked]() actually use the returned errors
    and act appropriately but that will touch a lot more places and
    drivers (possibly incl. changed behaviour for ioctls).

    rtwn(4) and rum(4) should probably be revisited and net80211 internals
    removed (for rum(4) at least the current logic still seems prone to
    races).

    PR:             271979, 271988, 275255, 263613, 274003
    Sponsored by:   The FreeBSD Foundation (in 2023)
    Reviewed by:    cc
    Differential Revision: https://reviews.freebsd.org/D43389

    (cherry picked from commit 713db49d06deee90dd358b2e4b9ca05368a5eaf6)

    Given this changes the internal structure of 'struct ieee80211vap',
    which gets allocated by the drivers, and we do not have enough
    spares, all wireless drivers need to be recompiled.
    Given we are forced to do the update, we leave fields in the middle
    of the struct and add more spares at the same time.
    __FreeBSD_version gets updated to 1303501 to be able to detect
    this change.

    (cherry picked from commit a890a3a5ddf33acb0a4000885945b89156799b07)

 UPDATING                       |   6 ++
 sys/dev/rtwn/if_rtwn.c         |   4 +-
 sys/dev/usb/wlan/if_rum.c      |   4 +-
 sys/net80211/ieee80211.c       |   4 +-
 sys/net80211/ieee80211_ddb.c   |  15 ++++-
 sys/net80211/ieee80211_proto.c | 124 ++++++++++++++++++++++++++++++++++-------
 sys/net80211/ieee80211_var.h   |  18 +++---
 sys/sys/param.h                |   2 +-
 8 files changed, 143 insertions(+), 34 deletions(-)
Comment 45 commit-hook freebsd_committer freebsd_triage 2024-02-19 08:09:31 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=135f22ad82f6b5179f40123a8b0b743428146729

commit 135f22ad82f6b5179f40123a8b0b743428146729
Author:     Bjoern A. Zeeb <bz@FreeBSD.org>
AuthorDate: 2023-12-01 01:37:25 +0000
Commit:     Bjoern A. Zeeb <bz@FreeBSD.org>
CommitDate: 2024-02-19 08:01:59 +0000

    tools/net80211: add mlme_assoc

    mlme_assoc is a tool to trigger net80211::ieee80211_sta_join1() calls
    which in certain conditions cause problems to the LinuxKPI 802.11 compat
    code (but also believed to possibly cause problems in case of race to
    other firmware based drivers).  This has proven to be a good reproducer
    for the problem even on setups which otherwise could run for days without
    hitting it.

    Sponsored by:   The FreeBSD Foundation
    PR:             271979

    (cherry picked from commit 643d6dce6c1e39f067f8d0feea8615913b324891)

 tools/tools/net80211/mlme_assoc/Makefile (new)     |   7 +
 tools/tools/net80211/mlme_assoc/README (new)       |  51 ++++++
 tools/tools/net80211/mlme_assoc/mlme_assoc.c (new) | 200 +++++++++++++++++++++
 3 files changed, 258 insertions(+)
Comment 46 commit-hook freebsd_committer freebsd_triage 2024-02-19 16:10:51 UTC
A commit in branch releng/13.3 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=9b998db87c28356fce21784c4f8bfb8737615e1f

commit 9b998db87c28356fce21784c4f8bfb8737615e1f
Author:     Bjoern A. Zeeb <bz@FreeBSD.org>
AuthorDate: 2024-01-10 10:14:16 +0000
Commit:     Bjoern A. Zeeb <bz@FreeBSD.org>
CommitDate: 2024-02-19 16:07:20 +0000

    net80211: deal with lost state transitions

    Since 5efea30f039c4 we can possibly lose a state transition which can
    cause trouble further down the road.
    The reproducer from 643d6dce6c1e can trigger these for example.
    Drivers for firmware based wireless cards have worked around some of
    this (and other) problems in the past.

    Add an array of tasks rather than a single one as we would simply
    get npending > 1 and lose order with other tasks.  Try to keep state
    changes updated as queued in case we end up with more than one at a
    time.  While this is not ideal either (call it a hack) it will sort
    the problem for now.
    We will queue in ieee80211_new_state_locked() and do checks there
    and dequeue in ieee80211_newstate_cb().
    If we still overrun the (currently) 8 slots we will drop the state
    change rather than overwrite the last one.
    When dequeuing we will update iv_nstate and keep it around for historic
    reasons for the moment.

    The longer term we should make the callers of
    ieee80211_new_state[_locked]() actually use the returned errors
    and act appropriately but that will touch a lot more places and
    drivers (possibly incl. changed behaviour for ioctls).

    rtwn(4) and rum(4) should probably be revisited and net80211 internals
    removed (for rum(4) at least the current logic still seems prone to
    races).

    PR:             271979, 271988, 275255, 263613, 274003
    Sponsored by:   The FreeBSD Foundation (in 2023)
    Reviewed by:    cc
    Differential Revision: https://reviews.freebsd.org/D43389

    (cherry picked from commit 713db49d06deee90dd358b2e4b9ca05368a5eaf6)

    Given this changes the internal structure of 'struct ieee80211vap',
    which gets allocated by the drivers, and we do not have enough
    spares, all wireless drivers need to be recompiled.
    Given we are forced to do the update, we leave fields in the middle
    of the struct and add more spares at the same time.
    __FreeBSD_version will get updated to 1303001 to be able to detect
    this change.

    Approved by:    re (cperciva)

    (cherry picked from commit a890a3a5ddf33acb0a4000885945b89156799b07)
    (cherry picked from commit a7e1fc7f620d3341549c1380f550aaafbdb45622)

 sys/dev/rtwn/if_rtwn.c         |   4 +-
 sys/dev/usb/wlan/if_rum.c      |   4 +-
 sys/net80211/ieee80211.c       |   4 +-
 sys/net80211/ieee80211_ddb.c   |  15 ++++-
 sys/net80211/ieee80211_proto.c | 124 ++++++++++++++++++++++++++++++++++-------
 sys/net80211/ieee80211_var.h   |  18 +++---
 6 files changed, 136 insertions(+), 33 deletions(-)
Comment 47 commit-hook freebsd_committer freebsd_triage 2024-02-19 16:11:09 UTC
A commit in branch releng/13.3 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=d4b4efc6db6c6c3a9abf2f187ba1ccc0e40028cf

commit d4b4efc6db6c6c3a9abf2f187ba1ccc0e40028cf
Author:     Bjoern A. Zeeb <bz@FreeBSD.org>
AuthorDate: 2024-02-03 16:33:56 +0000
Commit:     Bjoern A. Zeeb <bz@FreeBSD.org>
CommitDate: 2024-02-19 16:09:22 +0000

    LinuxKPI: 802.11: band-aid for invalid state changes after (*iv_update_bss)

    With firmware based solutions we cannot just jump from an active session
    to a new iv_bss node without tearing down state for the old and bringing
    up the new node.  This likely used to work on softmac based cards/drivers
    where one could essentially set the state and fire at will.

    We track (*iv_update_bss) calls from net80211 and set a local flag that
    we are out of synch and do not allow any further operations up the state
    machine until we hit INIT or SCAN.  That means someone will take the state
    down, clean up firmware state and then we can join again and build up
    state.

    Apparently this problem has been "known" for a while as native iwm(4) and
    others have similar workarounds (though less strict) and can be equally
    pestered into bad states.  For LinuxKPI all the KASSERTs just massively
    brought this problem out.  The solution will be some rewrites in net80211.
    Until then, try to keep us more stable at least and not die on second
    join1() calls triggered by service netif start wlan0 and similar.

    Approved by:    re (cperciva)
    PR:             271979, 271988, 275255, 263613, 274003
    Sponsored by:   The FreeBSD Foundation (2023, partial)
    Reviewed by:    cc
    Differential Revision: https://reviews.freebsd.org/D43725

    (cherry picked from commit 2ac8a2189ac6707f48f77ef2e36baf696a0d2f40)
    (cherry picked from commit 184ccc414686ea32c64f063c081c7cc1adeae7c3)

 sys/compat/linuxkpi/common/src/linux_80211.c | 309 +++++++++++++++++++--------
 sys/compat/linuxkpi/common/src/linux_80211.h |   2 +
 2 files changed, 216 insertions(+), 95 deletions(-)
Comment 48 Bjoern A. Zeeb freebsd_committer freebsd_triage 2024-02-19 16:55:52 UTC
The firmware crashes seen should have been gone for a while now.

I believe the lkpi_sta_auth_to_scan panic is fixed in 15, 14, 13, and in 13.3 from RC1 on.

I'll leave it open for a few more days.  It would be great if some of the people who have seen this could confirm that it is no longer the case.

FYI: suspend/resume is tracked in bug 263632 and the "Invalid TXQ id" is tracked in bug 274382; both are considered different problems.  Please follow up there if there is any news on those.
Comment 49 mark 2024-02-24 07:53:53 UTC
Not sure if I belong here or not, but I just installed and couldn't get Wi-Fi working in the installer. Upon booting and setting up ifconfig/wpa_supplicant, I'm constantly spammed by iwlwifi every few nanoseconds.

The issue seems to be either being unable to detect the network or a problem logging in. The panic messages the driver spams me with tend to point in the direction of authentication. Other than the panic messages I'm getting, I haven't really found where the rest of the logs are.
Comment 50 Bjoern A. Zeeb freebsd_committer freebsd_triage 2024-02-24 08:02:16 UTC
(In reply to mark from comment #49)

Which release/snapshot image did you try and which chipset do you have?
Comment 51 mark 2024-02-24 16:15:02 UTC
13.2, I think it's RC1? I have the Killer Wi-Fi 6 AX 200. I'll try 13.1 and report back if it's still not working. I was also getting some weird issues when trying to add the device, so I'm not sure if it's a related problem or something else.
Comment 52 mark 2024-02-24 16:17:05 UTC
(In reply to Bjoern A. Zeeb from comment #50)

Sorry, that last comment was in response to your question. I didn't notice the reply button at first :p
Comment 53 Bjoern A. Zeeb freebsd_committer freebsd_triage 2024-02-24 17:42:16 UTC
(In reply to mark from comment #51)

Try a 13.3-RC1 once it is out. 13.3-BETA3 was too early for some bug fixes.

Or the Feb 22 14.0-STABLE snapshot image maybe: https://download.freebsd.org/snapshots/amd64/amd64/ISO-IMAGES/14.0/?C=M&O=D  That should have all the latest bits.

13.2 or 13.1 won't do you much good.
Comment 54 mark 2024-02-24 19:08:10 UTC
(In reply to Bjoern A. Zeeb from comment #53)
The latest snapshot worked. How do I get updates to work? Would I have to wait for the latest 14 release?
Comment 55 Bjoern A. Zeeb freebsd_committer freebsd_triage 2024-02-24 19:12:37 UTC
(In reply to mark from comment #54)

No freebsd-update for stable, indeed.  You'd have to wait for 14.1 or alternatively 13.3-R, which is due soon.  Or you need to manually track stable for a while.
Comment 56 Bjoern A. Zeeb freebsd_committer freebsd_triage 2024-02-24 19:13:24 UTC
(In reply to Bjoern A. Zeeb from comment #55)

Pressed the button too early.   Fantastic news by the way and thanks for testing and reporting back!   It's much appreciated.
Comment 57 Bjoern A. Zeeb freebsd_committer freebsd_triage 2024-06-08 01:07:27 UTC
We believe this is all fixed.
In case this specific issue still shows up please re-open.

Thanks a lot to everyone reporting and testing!