During installation of FreeBSD 14.0-CURRENT, the installer proceeds normally up to the point where it asks how I would like to connect: Ethernet or wireless. When I choose wireless, the system scans for wireless networks and finds them. I choose my wireless network and am prompted for the password; as soon as I enter it, the installer exits to a prompt and the system stops completely. I've attempted the install twice, with the same result both times. The system exits to a db> prompt, which permits limited options. None of them resume the installation.
Which Wi-Fi driver is the installer selecting? As you're already in the debugger, can you share a "backtrace" from the panic? https://docs.freebsd.org/en/books/developers-handbook/kerneldebug/
Created attachment 242764: backtrace captured after the installer failed while attempting to connect to Wi-Fi and dropped out to the debugger.
What does `show panic` say? I have a hunch this is the same bug we see and that bz@ knows about but has yet to sit down and fix.
(In reply to Jessica Clarke from comment #3) And which Chipset/Vendor is this (Realtek or Intel)?
(In reply to Bjoern A. Zeeb from comment #4) Our issue is with iwlwifi giving "panic: lkpi_sta_auth_to_scan: lsta 0x... state not NONE: 0, nstate 1 arg 1" which jhb@ emailed you about a month ago. The backtrace in this bug report matches that, but there is another KASSERT that it could be (though that seems unlikely).
^Triage: summary, component, keywords, make the former assignee a cc recipient. (In reply to Hitch from comment #0) Thank you, can you tell what Wi-Fi hardware is in the notebook? For an exact answer, you can boot from the installer and use a shell to run the following command: pciconf -lv | grep -B 3 network
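The fixed "-B 3" window in the command above only works when the class line sits within three lines of the start of a device entry. As a hedged alternative, the sketch below keeps every complete device record whose details include "class = network". The function name network_devices is made up for illustration, and the record layout (an unindented header line followed by indented detail lines) is an assumption about pciconf -lv output rather than something stated in this report.

```sh
#!/bin/sh
# network_devices: hypothetical helper, not part of FreeBSD.
# Reads "pciconf -lv" style text on stdin and prints each whole device
# record whose details contain "class = network".
# Assumption: a record starts with an unindented line and its detail
# lines are indented.
network_devices() {
    awk '
        /^[^ \t]/ {                       # unindented line: new record
            if (keep) printf "%s", rec    # flush the previous match
            rec = ""; keep = 0
        }
        { rec = rec $0 "\n" }             # accumulate the current record
        /class[ \t]*=[ \t]*network/ { keep = 1 }
        END { if (keep) printf "%s", rec }
    '
}

# Only query real hardware where pciconf exists (FreeBSD).
if command -v pciconf >/dev/null 2>&1; then
    pciconf -lv | network_devices
fi
```

From the installer shell this would be used as "pciconf -lv | network_devices"; the original grep command remains the quicker choice when awk is not at hand.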
I think I've been encountering the same panic with my Intel Wi-Fi 6 AX201 card. If I type "ifconfig wlan create wlandev iwlwifi0", "ifconfig wlan0 channel 153", "wpa_supplicant -i wlan0 -c /etc/wpa_supplicant.conf &", and "dhclient wlan0", I can access the internet without issues and there is no panic. However, if I type "ifconfig wlan0 up" right after creating the wlan device, there is a panic after I type "wpa_supplicant -i wlan0 -c /etc/wpa_supplicant.conf &". I don't really need to type "ifconfig wlan0 up", because I can still access the internet with the commands already mentioned; however, bsdinstall runs this "up" step early, so in bsdinstall I encounter this panic. Here's the backtrace:
__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:59
59 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) #0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:59
#1 doadump (textdump=textdump@entry=1) at /usr/src/sys/kern/kern_shutdown.c:407
#2 0xffffffff80b4bb60 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:528
#3 0xffffffff80b4c07d in vpanic (fmt=0xffffffff811d6263 "%s: lsta %p state not NONE: %#x, nstate %d arg %d\n", ap=ap@entry=0xfffffe01e9a94ce0) at /usr/src/sys/kern/kern_shutdown.c:972
#4 0xffffffff80b4be03 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:896
#5 0xffffffff80dcbd0c in lkpi_sta_auth_to_scan (vap=0xfffffe01f95cd010, nstate=IEEE80211_S_SCAN, arg=1) at /usr/src/sys/compat/linuxkpi/common/src/linux_80211.c:1153
#6 0xffffffff80dd2963 in lkpi_iv_newstate (vap=0xfffffe01f95cd010, nstate=IEEE80211_S_SCAN, arg=1) at /usr/src/sys/compat/linuxkpi/common/src/linux_80211.c:2045
#7 0xffffffff80cf8937 in ieee80211_newstate_cb (xvap=0xfffffe01f95cd010, npending=<optimized out>) at /usr/src/sys/net80211/ieee80211_proto.c:2548
#8 0xffffffff80bb055b in taskqueue_run_locked (queue=queue@entry=0xfffff80001cc8e00) at /usr/src/sys/kern/subr_taskqueue.c:514
#9 0xffffffff80bb1613 in taskqueue_thread_loop (arg=arg@entry=0xfffffe01ea0de110) at /usr/src/sys/kern/subr_taskqueue.c:826
#10 0xffffffff80b02500 in fork_exit (callout=0xffffffff80bb1540 <taskqueue_thread_loop>, arg=0xfffffe01ea0de110, frame=0xfffffe01e9a94f40) at /usr/src/sys/kern/kern_fork.c:1131
#11 <signal handler called>
#12 0xdeadc0dedeadc0de in ?? ()
Backtrace stopped: Cannot access memory at address 0xdeadc0dedeadc0de
(kgdb)
Under the GENERIC-NODEBUG kernel, the OS doesn't crash, but I get these messages: iwlwifi0: Microcode SW error detected. Restarting 0x0. iwlwifi0: Start IWL Error Log Dump: iwlwifi0: Transport status: 0x0000004B, valid: 6 iwlwifi0: Loaded firmware version: 73.35c0a2c6.0 QuZ-a0-hr-b0-73.ucode iwlwifi0: 0x00000071 | NMI_INTERRUPT_UMAC_FATAL iwlwifi0: 0x00A0A200 | trm_hw_status0 iwlwifi0: 0x00000000 | trm_hw_status1 iwlwifi0: 0x004CC0FE | branchlink2 iwlwifi0: 0x004C2512 | interruptlink1 iwlwifi0: 0x004C2512 | interruptlink2 iwlwifi0: 0x00014D96 | data1 iwlwifi0: 0x00001000 | data2 iwlwifi0: 0x00000000 | data3 iwlwifi0: 0x00000000 | beacon time iwlwifi0: 0x0002B287 | tsf low iwlwifi0: 0x00000000 | tsf hi iwlwifi0: 0x00000000 | time gp1 iwlwifi0: 0x00030DF6 | time gp2 iwlwifi0: 0x00000001 | uCode revision type iwlwifi0: 0x00000049 | uCode version major iwlwifi0: 0x35C0A2C6 | uCode version minor iwlwifi0: 0x00000351 | hw version iwlwifi0: 0x18C89001 | board version iwlwifi0: 0x8065FC41 | hcmd iwlwifi0: 0x24020000 | isr0 iwlwifi0: 0x61000000 | isr1 iwlwifi0: 0x08F00002 | isr2 iwlwifi0: 0x00C3000C | isr3 iwlwifi0: 0x00000000 | isr4 iwlwifi0: 0x00000000 | last cmd Id iwlwifi0: 0x00014D96 | wait_event iwlwifi0: 0x00000050 | l2p_control iwlwifi0: 0x00018014 | l2p_duration iwlwifi0: 0x0000003F | l2p_mhvalid iwlwifi0: 0x00000000 | l2p_addr_match iwlwifi0: 0x00000009 | lmpm_pmg_sel iwlwifi0: 0x00000000 | timestamp iwlwifi0: 0x00001054 | flow_handler iwlwifi0: Start IWL Error Log Dump: iwlwifi0: Transport status: 0x0000004B, valid: 7 iwlwifi0: 0x20103020 | ADVANCED_SYSASSERT iwlwifi0: 0x00000000 | umac branchlink1 iwlwifi0: 0x80455E18 | umac branchlink2 iwlwifi0: 0x01077D90 | umac interruptlink1 iwlwifi0: 0x00000000 | umac interruptlink2 iwlwifi0: 0x00000000 | umac data1 iwlwifi0: 0x00000000 | umac data2 iwlwifi0: 0x000000FF | umac data3 iwlwifi0: 0x00000049 | umac major iwlwifi0: 0x35C0A2C6 | umac minor iwlwifi0: 0x00030DF1 | frame pointer iwlwifi0: 0xC0885EE4 | stack pointer 
iwlwifi0: 0x0016012B | last host cmd iwlwifi0: 0x00000000 | isr status reg iwlwifi0: IML/ROM dump: iwlwifi0: 0x00000003 | IML/ROM error/state iwlwifi0: 0x0000569B | IML/ROM data1 iwlwifi0: 0x00000080 | IML/ROM WFPM_AUTH_KEY_0 iwlwifi0: Fseq Registers: iwlwifi0: 0x60000000 | FSEQ_ERROR_CODE iwlwifi0: 0x80290033 | FSEQ_TOP_INIT_VERSION iwlwifi0: 0x00090006 | FSEQ_CNVIO_INIT_VERSION iwlwifi0: 0x0000A482 | FSEQ_OTP_VERSION iwlwifi0: 0x00000003 | FSEQ_TOP_CONTENT_VERSION iwlwifi0: 0x4552414E | FSEQ_ALIVE_TOKEN iwlwifi0: 0x20000302 | FSEQ_CNVI_ID iwlwifi0: 0x01300504 | FSEQ_CNVR_ID iwlwifi0: 0x20000302 | CNVI_AUX_MISC_CHIP iwlwifi0: 0x01300504 | CNVR_AUX_MISC_CHIP iwlwifi0: 0x05B0905B | CNVR_SCU_SD_REGS_SD_REG_DIG_DCDC_VTRIM iwlwifi0: 0x0000025B | CNVR_SCU_SD_REGS_SD_REG_ACTIVE_VDIG_MIRROR iwlwifi0: WRT: Collecting data: ini trigger 4 fired (delay=0ms). iwlwifi0: FW error in SYNC CMD BINDING_CONTEXT_CMD #0 0xffffffff80db16bb at linux_dump_stack+0x1b #1 0xffffffff833b5a43 at iwl_trans_txq_send_hcmd+0x3f3 #2 0xffffffff8335ce1e at iwl_trans_send_cmd+0xce #3 0xffffffff8339ca9b at iwl_mvm_send_cmd_status+0x2b #4 0xffffffff8339cb9f at iwl_mvm_send_cmd_pdu_status+0x4f #5 0xffffffff83365aae at iwl_mvm_binding_update+0x1fe #6 0xffffffff83376edc at __iwl_mvm_assign_vif_chanctx+0x7c #7 0xffffffff83373ae5 at iwl_mvm_assign_vif_chanctx+0x65 #8 0xffffffff80daba97 at lkpi_80211_mo_assign_vif_chanctx+0x27 #9 0xffffffff80da435c at lkpi_sta_scan_to_auth+0x4bc #10 0xffffffff80dab2ca at lkpi_iv_newstate+0x39a #11 0xffffffff80cdcf8e at ieee80211_newstate_cb+0xee #12 0xffffffff80bad472 at taskqueue_run_locked+0x182 #13 0xffffffff80bae702 at taskqueue_thread_loop+0xc2 #14 0xffffffff80b0311f at fork_exit+0x7f #15 0xffffffff80fefbce at fork_trampoline+0xe iwlwifi0: Failed to send binding (action:1): -5 iwlwifi0: PHY ctxt cmd error. ret=-5 iwlwifi0: lkpi_iv_newstate: error -5 during state transition 1 (SCAN) -> 2 (AUTH) iwlwifi0: No queue was found. 
Dropping TX iwlwifi0: Failed to trigger RX queues sync (-5) WARNING !mvmvif->phy_ctxt failed at /usr/src/sys/contrib/dev/iwlwifi/mvm/mac80211.c:3158 iwlwifi0: Scan failed! ret -5 iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5 iwlwifi0: Scan failed! ret -5 iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5 iwlwifi0: Scan failed! ret -5 iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5 iwlwifi0: Scan failed! ret -5 iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5 iwlwifi0: Scan failed! ret -5 iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5 iwlwifi0: Scan failed! ret -5 iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5 iwlwifi0: Scan failed! ret -5 iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5 iwlwifi0: Scan failed! ret -5 iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5 iwlwifi0: Scan failed! ret -5 iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5 iwlwifi0: Scan failed! ret -5 iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5 iwlwifi0: Scan failed! ret -5 iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5 iwlwifi0: Scan failed! ret -5 iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5 iwlwifi0: Scan failed! ret -5 iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5 iwlwifi0: Scan failed! ret -5 iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5 iwlwifi0: Scan failed! ret -5 iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5 iwlwifi0: Scan failed! ret -5 iwlwifi0: ERROR: lkpi_ic_scan_start: hw_scan returned -5 Once again, this only happens if I type "ifconfig wlan0 up" before executing the wpa_supplicant command. If I don't type it, I won't encounter issues and will be able to access the internet with the wi-fi card. So, the FreeBSD kernel considers the "ifconfig wlan0 up" command evil.
(In reply to Oleg from comment #8) > Once again, this only happens if I type "ifconfig wlan0 up" before executing the wpa_supplicant command. I'm experiencing exactly the same problem and behavior you describe here. My setup is a failover lagg with "wlan0" as the secondary port, and I was bringing it up before attaching it to "lagg0" and starting "wpa_supplicant". After removing that "ifconfig wlan0 up" from my script, "wlan0" now works as expected.
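The ordering that this comment and comment #8 report as working can be written down as a small script. This is a sketch only: the run helper, the DRY_RUN switch and the variable names are illustrative, the device name and channel are taken from the report above, and wpa_supplicant's -B (run in the background) flag stands in for the trailing "&" used there.

```sh
#!/bin/sh
# Sketch of the bring-up order reported to avoid the panic.
# DRY_RUN=1 (the default here) only prints the commands.
DRY_RUN=${DRY_RUN:-1}
WLANDEV=${WLANDEV:-iwlwifi0}   # parent device, as named in the report
CHANNEL=${CHANNEL:-153}        # channel used in the report; adjust as needed

# run: hypothetical helper that prints or executes a command.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

bring_up_wlan() {
    # Assumption: the created interface is wlan0 (no other wlan exists).
    run ifconfig wlan create wlandev "$WLANDEV"
    run ifconfig wlan0 channel "$CHANNEL"
    # Deliberately no "ifconfig wlan0 up" here: per the reports above,
    # issuing it before wpa_supplicant is what triggers the panic.
    run wpa_supplicant -B -i wlan0 -c /etc/wpa_supplicant.conf
    run dhclient wlan0
}

bring_up_wlan
```

Running it with DRY_RUN=0 as root would execute the commands; this only reproduces the workaround and does not address the underlying bug.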
If you can try main, please update to or past the revision mentioned in: https://lists.freebsd.org/archives/freebsd-wireless/2023-September/001441.html
I no longer encounter this bug after compiling the latest 15-CURRENT kernel.
However, with the 15-CURRENT kernel compiled today, new bugs appear to have been introduced, such as "panic: lkpi_sta_auth_to_assoc: lsta 0xfffff80439fc1000 state not NONE: 0", together with the messages "<6>wlan0: ieee80211_new_state_locked: pending SCAN -> AUTH transition lost" and "<4>Invalid TXQ id".
(In reply to Oleg from comment #12) What was logged before that? The actual problem is much earlier in the message buffer.
__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
57 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) #0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
#1 doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:405
#2 0xffffffff837ed323 in vt_kms_postswitch () from /boot/modules/drm.ko
#3 0xffffffff80992fde in vt_window_switch (vw=0xfffff800213893c0) at /usr/src/sys/dev/vt/vt_core.c:595
#4 0xffffffff80b4ed33 in kern_reboot (howto=4) at /usr/src/sys/kern/kern_shutdown.c:501
#5 0xffffffff80b4f50f in vpanic (fmt=0xffffffff8118fa9b "%s", ap=ap@entry=0xfffffe023c222450) at /usr/src/sys/kern/kern_shutdown.c:970
#6 0xffffffff80b4f2b3 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:894
#7 0xffffffff8104ecbc in trap_fatal (frame=0xfffffe023c222550, eva=259) at /usr/src/sys/amd64/amd64/trap.c:952
#8 0xffffffff8104ed6e in trap_pfault (frame=0xfffffe023c222550, usermode=false, signo=<optimized out>, ucode=<optimized out>) at /usr/src/sys/amd64/amd64/trap.c:760
#9 <signal handler called>
#10 0xffffffff836c65d0 in intel_atomic_get_global_obj_state () from /boot/modules/i915kms.ko
#11 0xffffffff8367c35c in skl_compute_wm () from /boot/modules/i915kms.ko
#12 0xffffffff8364464f in intel_atomic_check () from /boot/modules/i915kms.ko
#13 0xffffffff837ad783 in drm_atomic_check_only () from /boot/modules/drm.ko
#14 0xffffffff837adbc3 in drm_atomic_commit () from /boot/modules/drm.ko
#15 0xffffffff837bd298 in drm_client_modeset_commit_atomic () from /boot/modules/drm.ko
#16 0xffffffff837bd384 in drm_client_modeset_commit_locked () from /boot/modules/drm.ko
#17 0xffffffff837bd511 in drm_client_modeset_commit () from /boot/modules/drm.ko
#18 0xffffffff837fff13 in drm_fb_helper_restore_fbdev_mode_unlocked () from /boot/modules/drm.ko
#19 0xffffffff837ed461 in vt_kms_postswitch () from /boot/modules/drm.ko
#20 0xffffffff80992ea1 in vt_window_switch (vw=0xfffffe01eab6c2b0, vw@entry=0xffffffff816a9c98 <vt_conswindow>) at /usr/src/sys/dev/vt/vt_core.c:612
#21 0xffffffff809941ff in vtterm_cngrab (tm=<optimized out>) at /usr/src/sys/dev/vt/vt_core.c:1863
#22 0xffffffff80adf1c6 in cngrab () at /usr/src/sys/kern/kern_cons.c:385
#23 0xffffffff80b4f441 in vpanic (fmt=0xffffffff8125d520 "%s: lsta %p state not NONE: %#x\n", ap=ap@entry=0xfffffe023c222d00) at /usr/src/sys/kern/kern_shutdown.c:942
#24 0xffffffff80b4f2b3 in panic (fmt=0x103 <error: Cannot access memory at address 0x103>) at /usr/src/sys/kern/kern_shutdown.c:894
#25 0xffffffff80dd2334 in lkpi_sta_auth_to_assoc (vap=0xfffffe023ba37010, nstate=<optimized out>, arg=<optimized out>) at /usr/src/sys/compat/linuxkpi/common/src/linux_80211.c:1247
#26 0xffffffff80dd8dc3 in lkpi_iv_newstate (vap=0xfffffe023ba37010, nstate=IEEE80211_S_ASSOC, arg=0) at /usr/src/sys/compat/linuxkpi/common/src/linux_80211.c:2064
#27 0xffffffff80cfe837 in ieee80211_newstate_cb (xvap=0xfffffe023ba37010, npending=<optimized out>) at /usr/src/sys/net80211/ieee80211_proto.c:2546
#28 0xffffffff80bb4afb in taskqueue_run_locked (queue=queue@entry=0xfffff800034a3000) at /usr/src/sys/kern/subr_taskqueue.c:512
#29 0xffffffff80bb5bb3 in taskqueue_thread_loop (arg=arg@entry=0xfffffe023b9e6110) at /usr/src/sys/kern/subr_taskqueue.c:824
#30 0xffffffff80b05082 in fork_exit (callout=0xffffffff80bb5ae0 <taskqueue_thread_loop>, arg=0xfffffe023b9e6110, frame=0xfffffe023c222f40) at /usr/src/sys/kern/kern_fork.c:1160
#31 <signal handler called>
(kgdb)
(In reply to Oleg from comment #14) The actual problem is before the KASSERT which just catches it. Can you check the message buffer (dmesg) of the core file? There's likely a firmware crash a few lines up with surrounding information.
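One way to look for the earlier failure is to search the message buffer for the driver messages that show up elsewhere in this report. The sketch below assumes those strings are a useful starting set, not a complete list; find_wifi_errors is a hypothetical helper name.

```sh
#!/bin/sh
# find_wifi_errors: hypothetical helper. Prints, with line numbers, the
# lines that match messages quoted in this bug report. Reads the named
# files, or stdin when no file is given.
find_wifi_errors() {
    grep -n -E \
        -e 'Microcode SW error detected' \
        -e 'FW error in SYNC CMD' \
        -e 'transition lost' \
        -e 'Invalid TXQ id' \
        -e 'state not NONE' \
        "$@"
}
```

On a live system this could be used as "dmesg | find_wifi_errors"; for a crash it could be pointed at the saved crash summary file instead. The first match is usually the place to start reading upward from.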
Are you looking for this information: panic: lkpi_sta_auth_to_assoc: lsta 0xfffff80439fc1000 state not NONE: 0 GNU gdb (GDB) 13.2 [GDB v13.2 for FreeBSD] Copyright (C) 2023 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-portbld-freebsd15.0". Type "show configuration" for configuration details. For bug reporting instructions, please see: <https://www.gnu.org/software/gdb/bugs/>. Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>. For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from /boot/kernel/kernel... Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug... Unread portion of the kernel message buffer: <6>wlan0: ieee80211_new_state_locked: pending SCAN -> AUTH transition lost <4>Invalid TXQ id WARNING !drm_modeset_is_locked(&crtc->mutex) failed at /wrkdirs/usr/ports/graphics/drm-515-kmod/work/drm-kmod-drm_v5.15.25_5/drivers/gpu/drm/drm_atomic_helper.c:621 WARNING !drm_modeset_is_locked(&crtc->mutex) failed at /wrkdirs/usr/ports/graphics/drm-515-kmod/work/drm-kmod-drm_v5.15.25_5/drivers/gpu/drm/drm_atomic_helper.c:621 WARNING !drm_modeset_is_locked(&crtc->mutex) failed at /wrkdirs/usr/ports/graphics/drm-515-kmod/work/drm-kmod-drm_v5.15.25_5/drivers/gpu/drm/drm_atomic_helper.c:621 WARNING !drm_modeset_is_locked(&dev->mode_config.connection_mutex) failed at /wrkdirs/usr/ports/graphics/drm-515-kmod/work/drm-kmod-drm_v5.15.25_5/drivers/gpu/drm/drm_atomic_helper.c:671 kernel trap 12 with interrupts disabled Fatal trap 12: page fault while in kernel mode cpuid = 4; apic id = 04 fault virtual address = 0x103 fault code = supervisor read data, page not present instruction pointer = 
0x20:0xffffffff836c65d0 stack pointer = 0x28:0xfffffe023c222610 frame pointer = 0x28:0xfffffe023c222650 code segment = base rx0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = resume, IOPL = 0 current process = 0 (iwlwifi0 net80211 t) rdi: 0000000000000103 rsi: fffff800022b1828 rdx: fffff800022b1810 rcx: fffffe01eab693f8 r8: 00000000000000cb r9: 0000000000000000 rax: fffffe023c2229e8 rbx: fffffe01eab6c2b0 rbp: fffffe023c222650 r10: fffffe023bb1ca72 r11: fffff8043acbc800 r12: fffff80480717a80 r13: fffffe01eab69000 r14: fffff8043acbc800 r15: 0000000000000000 trap number = 12 panic: page fault cpuid = 4 time = 1696530515 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe023c2222e0 vpanic() at vpanic+0x132/frame 0xfffffe023c222410 panic() at panic+0x43/frame 0xfffffe023c222470 trap_fatal() at trap_fatal+0x40c/frame 0xfffffe023c2224d0 trap_pfault() at trap_pfault+0xae/frame 0xfffffe023c222540 calltrap() at calltrap+0x8/frame 0xfffffe023c222540 --- trap 0xc, rip = 0xffffffff836c65d0, rsp = 0xfffffe023c222610, rbp = 0xfffffe023c222650 --- intel_atomic_get_global_obj_state() at intel_atomic_get_global_obj_state+0x90/frame 0xfffffe023c222650 skl_compute_wm() at skl_compute_wm+0xaec/frame 0xfffffe023c222870 intel_atomic_check() at intel_atomic_check+0xeff/frame 0xfffffe023c222940 drm_atomic_check_only() at drm_atomic_check_only+0x4a3/frame 0xfffffe023c2229b0 drm_atomic_commit() at drm_atomic_commit+0x13/frame 0xfffffe023c2229d0 drm_client_modeset_commit_atomic() at drm_client_modeset_commit_atomic+0x158/frame 0xfffffe023c222a40 drm_client_modeset_commit_locked() at drm_client_modeset_commit_locked+0x74/frame 0xfffffe023c222a90 drm_client_modeset_commit() at drm_client_modeset_commit+0x21/frame 0xfffffe023c222ab0 drm_fb_helper_restore_fbdev_mode_unlocked() at drm_fb_helper_restore_fbdev_mode_unlocked+0x83/frame 0xfffffe023c222ae0 vt_kms_postswitch() at vt_kms_postswitch+0x181/frame 
0xfffffe023c222b10 vt_window_switch() at vt_window_switch+0x121/frame 0xfffffe023c222b50 vtterm_cngrab() at vtterm_cngrab+0x4f/frame 0xfffffe023c222b70 cngrab() at cngrab+0x26/frame 0xfffffe023c222b90 vpanic() at vpanic+0xd1/frame 0xfffffe023c222cc0 panic() at panic+0x43/frame 0xfffffe023c222d20 lkpi_sta_auth_to_assoc() at lkpi_sta_auth_to_assoc+0x234/frame 0xfffffe023c222d80 lkpi_iv_newstate() at lkpi_iv_newstate+0x253/frame 0xfffffe023c222df0 ieee80211_newstate_cb() at ieee80211_newstate_cb+0x1e7/frame 0xfffffe023c222e40 taskqueue_run_locked() at taskqueue_run_locked+0xab/frame 0xfffffe023c222ec0 taskqueue_thread_loop() at taskqueue_thread_loop+0xd3/frame 0xfffffe023c222ef0 fork_exit() at fork_exit+0x82/frame 0xfffffe023c222f30 fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe023c222f30 --- trap 0, rip = 0, rsp = 0, rbp = 0 --- Dumping 2378 out of 65308 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% ?
(In reply to Oleg from comment #16) > Are you looking for this information:
If you have a core.txt.<0>, there should be a section further down with the following title (without the leading "::"), which should contain even more of the message buffer:
:: ------------------------------------------------------------------------
:: dmesg
Can I ask, given the topic of the PR: did this happen on an installed system or during bsdinstall? Also, which chipset and firmware version are you on now (and which FreeBSD hash)?
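To pull just that section out of a crash summary file, something like the following could be used. It is a sketch under the layout described in the comment above (a line of dashes followed by a title line reading exactly "dmesg"); dmesg_section is a hypothetical helper name.

```sh
#!/bin/sh
# dmesg_section: hypothetical helper. Prints the body of the "dmesg"
# section of a crash summary file (given as an argument or on stdin).
# Assumption: every section starts with a line of at least five dashes
# followed by a title line, and the wanted title is exactly "dmesg".
dmesg_section() {
    awk '
        /^-----/   { after_rule = 1; next }   # section separator line
        after_rule { after_rule = 0           # the line after it is a title
                     printing = ($0 == "dmesg"); next }
        printing   { print }
    ' "$@"
}
```

The output can then be paged or searched for the firmware messages that precede the KASSERT.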
It happened when running an installed system after I typed "kldunload if_iwlwifi" on FreeBSD 15.0-CURRENT #0 main-n265776-b6a61ac2d475. Typing "kldunload if_iwlwifi" sometimes leads to a crash, and sometimes it doesn't. Is this what you are asking about: Autoloading module: if_iwlwifi Intel(R) Wireless WiFi based driver for FreeBSD Autoloading module: ig4 pci0: driver added found-> vendor=0x8086, dev=0x43ef, revid=0x11 domain=0, bus=0, slot=20, func=2 class=05-00-00, hdrtype=0x00, mfdev=0 cmdreg=0x0002, statreg=0x0010, cachelnsz=16 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) powerspec 3 supports D0 D3 current D0 pci0:0:20:2: reprobing on driver added found-> vendor=0x8086, dev=0x43f0, revid=0x11 domain=0, bus=0, slot=20, func=3 class=02-80-00, hdrtype=0x00, mfdev=1 cmdreg=0x0002, statreg=0x0010, cachelnsz=16 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) intpin=a, irq=255 powerspec 3 supports D0 D3 current D0 MSI supports 1 message, 64 bit MSI-X supports 16 messages in map 0x10 pci0:0:20:3: reprobing on driver added iwlwifi0: <iwlwifi> mem 0x6001114000-0x6001117fff at device 20.3 on pci0 iwlwifi0: attempting to allocate 16 MSI-X vectors (16 supported) msi: routing MSI-X IRQ 145 to local APIC 0 vector 59 msi: routing MSI-X IRQ 146 to local APIC 2 vector 49 msi: routing MSI-X IRQ 147 to local APIC 4 vector 49 msi: routing MSI-X IRQ 148 to local APIC 6 vector 49 msi: routing MSI-X IRQ 149 to local APIC 8 vector 49 msi: routing MSI-X IRQ 150 to local APIC 10 vector 49 msi: routing MSI-X IRQ 151 to local APIC 12 vector 48 msi: routing MSI-X IRQ 152 to local APIC 14 vector 48 msi: routing MSI-X IRQ 153 to local APIC 16 vector 48 msi: routing MSI-X IRQ 154 to local APIC 18 vector 49 msi: routing MSI-X IRQ 155 to local APIC 0 vector 60 msi: routing MSI-X IRQ 156 to local APIC 2 vector 50 msi: routing MSI-X IRQ 157 to local APIC 4 vector 50 msi: routing MSI-X IRQ 158 to local APIC 6 vector 50 msi: routing MSI-X IRQ 159 to 
local APIC 8 vector 50 msi: routing MSI-X IRQ 160 to local APIC 10 vector 50 iwlwifi0: using IRQs 145-160 for MSI-X msi: Assigning MSI-X IRQ 146 to local APIC 0 vector 61 msi: Assigning MSI-X IRQ 147 to local APIC 1 vector 49 msi: Assigning MSI-X IRQ 148 to local APIC 2 vector 49 msi: Assigning MSI-X IRQ 149 to local APIC 3 vector 49 msi: Assigning MSI-X IRQ 150 to local APIC 4 vector 49 msi: Assigning MSI-X IRQ 151 to local APIC 5 vector 49 msi: Assigning MSI-X IRQ 152 to local APIC 6 vector 49 msi: Assigning MSI-X IRQ 153 to local APIC 7 vector 49 msi: Assigning MSI-X IRQ 154 to local APIC 8 vector 49 msi: Assigning MSI-X IRQ 155 to local APIC 9 vector 49 msi: Assigning MSI-X IRQ 156 to local APIC 10 vector 49 msi: Assigning MSI-X IRQ 157 to local APIC 11 vector 49 msi: Assigning MSI-X IRQ 158 to local APIC 12 vector 48 msi: Assigning MSI-X IRQ 159 to local APIC 13 vector 49 iwlwifi0: Detected crf-id 0x3617, cnv-id 0x20000302 wfpm id 0x80000000 iwlwifi0: PCI dev 43f0/0074, rev=0x351, rfid=0x10a100 firmware: 'iwlwifi-QuZ-a0-hr-b0-77.ucode' version 77: 1404840 bytes loaded at 0xffffffff83aef000 iwlwifi0: successfully loaded firmware image 'iwlwifi-QuZ-a0-hr-b0-77.ucode' iwlwifi0: api flags index 2 larger than supported by driver iwlwifi0: TLV_FW_FSEQ_VERSION: FSEQ Version: 89.3.35.37 iwl-debug-yoyo.bin: could not load firmware image, error 2 iwl-debug-yoyo.bin: could not load firmware image, error 2 iwl-debug-yoyo_bin: could not load firmware image, error 2 iwl_debug_yoyo_bin: could not load firmware image, error 2 iwlwifi0: loaded firmware version 77.2df8986f.0 QuZ-a0-hr-b0-77.ucode op_mode iwlmvm iwlwifi0: Detected Intel(R) Wi-Fi 6 AX201 160MHz, REV=0x351 iwlwifi0: Detected RF HR B5, rfid=0x10a100 iwlwifi0: base HW address: 10:3d:1c:9c:8d:1c iwlwifi0: 11a rates: 6Mbps 9Mbps 12Mbps 18Mbps 24Mbps 36Mbps 48Mbps 54Mbps iwlwifi0: 11b rates: 1Mbps 2Mbps 5.5Mbps 11Mbps iwlwifi0: 11g rates: 1Mbps 2Mbps 5.5Mbps 11Mbps 6Mbps 9Mbps 12Mbps 18Mbps 24Mbps 36Mbps 48Mbps 54Mbps 
found-> vendor=0x8086, dev=0x43e8, revid=0x11 domain=0, bus=0, slot=21, func=0 class=0c-80-00, hdrtype=0x00, mfdev=1 cmdreg=0x0004, statreg=0x0010, cachelnsz=16 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) intpin=a, irq=255 powerspec 3 supports D0 D3 current D0 ?
I have now hit this panic after compiling the latest 15-CURRENT kernel: "panic: lkpi_sta_auth_to_scan: lsta 0xfffff800022bb000 state not NONE: 0, nstate 1 arg 1". Before the latest Wi-Fi-related commits to the kernel, this bug was always triggered if I typed "ifconfig wlan0 up" before executing the wpa_supplicant command. Now it is triggered only sometimes:
__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
57 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) #0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
#1 doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:405
#2 0xffffffff837ed323 in vt_kms_postswitch () from /boot/modules/drm.ko
#3 0xffffffff8099300e in vt_window_switch (vw=0xfffff800272252c0) at /usr/src/sys/dev/vt/vt_core.c:595
#4 0xffffffff80b4ed63 in kern_reboot (howto=4) at /usr/src/sys/kern/kern_shutdown.c:501
#5 0xffffffff80b4f53f in vpanic (fmt=0xffffffff8118fa9b "%s", ap=ap@entry=0xfffffe0201c09430) at /usr/src/sys/kern/kern_shutdown.c:970
#6 0xffffffff80b4f2e3 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:894
#7 0xffffffff8104ecbc in trap_fatal (frame=0xfffffe0201c09530, eva=18446744069414584320) at /usr/src/sys/amd64/amd64/trap.c:952
#8 0xffffffff8104ed6e in trap_pfault (frame=0xfffffe0201c09530, usermode=false, signo=<optimized out>, ucode=<optimized out>) at /usr/src/sys/amd64/amd64/trap.c:760
#9 <signal handler called>
#10 0xffffffff836c65d0 in intel_atomic_get_global_obj_state () from /boot/modules/i915kms.ko
#11 0xffffffff8367c35c in skl_compute_wm () from /boot/modules/i915kms.ko
#12 0xffffffff8364464f in intel_atomic_check () from /boot/modules/i915kms.ko
#13 0xffffffff837ad783 in drm_atomic_check_only () from /boot/modules/drm.ko
#14 0xffffffff837adbc3 in drm_atomic_commit () from /boot/modules/drm.ko
#15 0xffffffff837bd298 in drm_client_modeset_commit_atomic () from /boot/modules/drm.ko
#16 0xffffffff837bd384 in drm_client_modeset_commit_locked () from /boot/modules/drm.ko
#17 0xffffffff837bd511 in drm_client_modeset_commit () from /boot/modules/drm.ko
#18 0xffffffff837fff13 in drm_fb_helper_restore_fbdev_mode_unlocked () from /boot/modules/drm.ko
#19 0xffffffff837ed461 in vt_kms_postswitch () from /boot/modules/drm.ko
#20 0xffffffff80992ed1 in vt_window_switch (vw=0xfffffe01eaccc2b0, vw@entry=0xffffffff816a9c98 <vt_conswindow>) at /usr/src/sys/dev/vt/vt_core.c:612
#21 0xffffffff8099422f in vtterm_cngrab (tm=<optimized out>) at /usr/src/sys/dev/vt/vt_core.c:1863
#22 0xffffffff80adf1f6 in cngrab () at /usr/src/sys/kern/kern_cons.c:385
#23 0xffffffff80b4f471 in vpanic (fmt=0xffffffff811e2547 "%s: lsta %p state not NONE: %#x, nstate %d arg %d\n", ap=ap@entry=0xfffffe0201c09ce0) at /usr/src/sys/kern/kern_shutdown.c:942
#24 0xffffffff80b4f2e3 in panic (fmt=0xffffffff00000000 <error: Cannot access memory at address 0xffffffff00000000>) at /usr/src/sys/kern/kern_shutdown.c:894
#25 0xffffffff80dd1d37 in lkpi_sta_auth_to_scan (vap=0xfffffe0201449010, nstate=IEEE80211_S_SCAN, arg=1) at /usr/src/sys/compat/linuxkpi/common/src/linux_80211.c:1167
#26 0xffffffff80dd9223 in lkpi_iv_newstate (vap=0xfffffe0201449010, nstate=IEEE80211_S_SCAN, arg=1) at /usr/src/sys/compat/linuxkpi/common/src/linux_80211.c:2064
#27 0xffffffff80cfeb37 in ieee80211_newstate_cb (xvap=0xfffffe0201449010, npending=<optimized out>) at /usr/src/sys/net80211/ieee80211_proto.c:2546
#28 0xffffffff80bb4b2b in taskqueue_run_locked (queue=queue@entry=0xfffff804663bd500) at /usr/src/sys/kern/subr_taskqueue.c:512
#29 0xffffffff80bb5be3 in taskqueue_thread_loop (arg=arg@entry=0xfffffe01fdb75110) at /usr/src/sys/kern/subr_taskqueue.c:824
#30 0xffffffff80b050b2 in fork_exit (callout=0xffffffff80bb5b10 <taskqueue_thread_loop>, arg=0xfffffe01fdb75110, frame=0xfffffe0201c09f40) at /usr/src/sys/kern/kern_fork.c:1160
#31 <signal handler called>
(kgdb)
I just tried another "service netif restart wlan0" and my system froze. No panic. Nothing logged. No core dump. The display froze after "Starting wpa_supplicant." Nothing abnormal until that point. After a reboot I tried again without X running. This time I got a panic:
ptavv dumped core - see /var/crash/vmcore.5
Sun Oct 8 17:37:18 PDT 2023
FreeBSD ptavv 15.0-CURRENT FreeBSD 15.0-CURRENT #7 main-n265807-04c8bfc17610: Sat Oct 7 23:34:33 PDT 2023 root@ptavv:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64
panic: lkpi_sta_auth_to_scan: lsta 0xfffff8000bb92000 state not NONE: 0, nstate 1 arg 1
I can attach the full file if it looks useful. One oddity is that I see several drm items in the stack dump:
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe015adc8330
vpanic() at vpanic+0x132/frame 0xfffffe015adc8460
panic() at panic+0x43/frame 0xfffffe015adc84c0
trap_fatal() at trap_fatal+0x40c/frame 0xfffffe015adc8520
calltrap() at calltrap+0x8/frame 0xfffffe015adc8520
--- trap 0x9, rip = 0xffffffff831d25d0, rsp = 0xfffffe015adc85f0, rbp = 0xfffffe015adc8630 ---
intel_atomic_get_global_obj_state() at intel_atomic_get_global_obj_state+0x90/frame 0xfffffe015adc8630
skl_compute_wm() at skl_compute_wm+0xaec/frame 0xfffffe015adc8850
intel_atomic_check() at intel_atomic_check+0xeff/frame 0xfffffe015adc8920
drm_atomic_check_only() at drm_atomic_check_only+0x4a3/frame 0xfffffe015adc8990
drm_atomic_commit() at drm_atomic_commit+0x13/frame 0xfffffe015adc89b0
drm_client_modeset_commit_atomic() at drm_client_modeset_commit_atomic+0x158/frame 0xfffffe015adc8a20
drm_client_modeset_commit_locked() at drm_client_modeset_commit_locked+0x74/frame 0xfffffe015adc8a70
drm_client_modeset_commit() at drm_client_modeset_commit+0x21/frame 0xfffffe015adc8a90
drm_fb_helper_restore_fbdev_mode_unlocked() at drm_fb_helper_restore_fbdev_mode_unlocked+0x83/frame 0xfffffe015adc8ac0
vt_kms_postswitch() at vt_kms_postswitch+0x181/frame 0xfffffe015adc8af0
vt_window_switch() at vt_window_switch+0x25e/frame 0xfffffe015adc8b30
vtterm_cngrab() at vtterm_cngrab+0x4f/frame 0xfffffe015adc8b50
cngrab() at cngrab+0x26/frame 0xfffffe015adc8b70
vpanic() at vpanic+0xd1/frame 0xfffffe015adc8ca0
panic() at panic+0x43/frame 0xfffffe015adc8d00
This crash occurred after I had terminated the X session and was in text mode on vty0. Let me know what else I might be able to provide.
(In reply to rkoberman from comment #20) Forgive me. I posted this to the wrong ticket. I'll reenter it (corrected) to the proper place.
(In reply to Oleg from comment #19) Oleg, if you update to latest main, you may hopefully see some more information or error printed before the KASSERT triggers. It would be helpful to know that.
After updating to the latest kernel, I haven't been able to trigger the bug despite many attempts: neither typing "kldunload if_iwlwifi" nor typing "ifconfig wlan0 up" early triggers it. I don't know why. In my previous message, I said that the bug was triggered only sometimes, but today I haven't been able to trigger it at all.
Hi! I also get this panic when running a kernel based on commit 7cff9672de44824d7d59cb562f53992a055e49cc. To be exact, I have a few more commits on top of it for upcoming updates to drm-kmod. It's easy to reproduce: I simply use "service netif restart wlan0" (it was skipped during boot). Here are the few lines before the panic and the backtrace:
<6>wlan0: ieee80211_new_state_locked: pending SCAN -> AUTH transition lost
<4>Invalid TXQ id
iwl_mvm_tx_mpdu:1204: fc 0x00b0 tid 8 txq_id 65535 mvm 0xfffffe0147794408 skb 0xfffff8000b884000 { len 30 } info 0xfffffe00c83f5ce8 sta 0xfffff803c451d880 (if you see this please ro PR 274382)
panic: lkpi_sta_auth_to_scan: lsta 0xfffff8000f352000 state not NONE: 0, nstate 1 arg 1
cpuid = 6
time = 1697050125
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0147abdb70
vpanic() at vpanic+0x132/frame 0xfffffe0147abdca0
panic() at panic+0x43/frame 0xfffffe0147abdd00
lkpi_sta_auth_to_scan() at lkpi_sta_auth_to_scan+0x2c8/frame 0xfffffe0147abdd80
lkpi_iv_newstate() at lkpi_iv_newstate+0x253/frame 0xfffffe0147abddf0
ieee80211_newstate_cb() at ieee80211_newstate_cb+0x1e7/frame 0xfffffe0147abde40
taskqueue_run_locked() at taskqueue_run_locked+0xab/frame 0xfffffe0147abdec0
taskqueue_thread_loop() at taskqueue_thread_loop+0xd3/frame 0xfffffe0147abdef0
fork_exit() at fork_exit+0x82/frame 0xfffffe0147abdf30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0147abdf30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
Uptime: 14m18s
Dumping 892 out of 16038 MB:..2%..11%..22%..31%..42%..51%..61%..72%..81%..92%
__curthread () at /home/dumbbell/Documents/freebsd/src/sys/amd64/include/pcpu_aux.h:57
57 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) bt
#0 __curthread () at /home/dumbbell/Documents/freebsd/src/sys/amd64/include/pcpu_aux.h:57
#1 doadump (textdump=textdump@entry=1) at /home/dumbbell/Documents/freebsd/src/sys/kern/kern_shutdown.c:405
#2 0xffffffff80b4f3e0 in kern_reboot (howto=260) at /home/dumbbell/Documents/freebsd/src/sys/kern/kern_shutdown.c:526
#3 0xffffffff80b4f8df in vpanic (fmt=0xffffffff811e6539 "%s: lsta %p state not NONE: %#x, nstate %d arg %d\n", ap=ap@entry=0xfffffe0147abdce0) at /home/dumbbell/Documents/freebsd/src/sys/kern/kern_shutdown.c:969
#4 0xffffffff80b4f683 in panic (fmt=<unavailable>) at /home/dumbbell/Documents/freebsd/src/sys/kern/kern_shutdown.c:894
#5 0xffffffff80dd2568 in lkpi_sta_auth_to_scan (vap=0xfffffe014a16d010, nstate=IEEE80211_S_SCAN, arg=1) at /home/dumbbell/Documents/freebsd/src/sys/compat/linuxkpi/common/src/linux_80211.c:1175
#6 0xffffffff80dd9c93 in lkpi_iv_newstate (vap=0xfffffe014a16d010, nstate=IEEE80211_S_SCAN, arg=1) at /home/dumbbell/Documents/freebsd/src/sys/compat/linuxkpi/common/src/linux_80211.c:2113
#7 0xffffffff80cff027 in ieee80211_newstate_cb (xvap=0xfffffe014a16d010, npending=<optimized out>) at /home/dumbbell/Documents/freebsd/src/sys/net80211/ieee80211_proto.c:2546
#8 0xffffffff80bb4ecb in taskqueue_run_locked (queue=queue@entry=0xfffff8000b21a600) at /home/dumbbell/Documents/freebsd/src/sys/kern/subr_taskqueue.c:512
#9 0xffffffff80bb5f83 in taskqueue_thread_loop (arg=arg@entry=0xfffffe0147798110) at /home/dumbbell/Documents/freebsd/src/sys/kern/subr_taskqueue.c:824
#10 0xffffffff80b05452 in fork_exit (callout=0xffffffff80bb5eb0 <taskqueue_thread_loop>, arg=0xfffffe0147798110, frame=0xfffffe0147abdf40) at /home/dumbbell/Documents/freebsd/src/sys/kern/kern_fork.c:1160
Hi, if you get either or both of:

(a) panic: lkpi_sta_auth_to_scan: ... (or other state names)
(b) ieee80211_new_state_locked: pending SCAN -> AUTH transition lost (or other state names)

could you apply the following patch and report back:

https://people.freebsd.org/~bz/wireless/20231025-01-80211-newstate.diff

It will (1) give more information and (2) disable an extra case.

Note: the "ieee80211_new_state_locked:2682: RUN -> INIT (INIT) transition discarded" log messages are generally not interesting, but I enabled them for the full picture.
(In reply to Bjoern A. Zeeb from comment #25)
Patched and rebuilt the kernel. Crash looks a lot like the previous ones.

After my last kernel update I am seeing some new messages during boot. I suspect that they are not new information, but they do look odd to me.

iwlwifi0: WRT: Invalid buffer destination
iwlwifi0: WFPM_UMAC_PD_NOTIFICATION: 0x20
iwlwifi0: WFPM_LMAC2_PD_NOTIFICATION: 0x1f
iwlwifi0: WFPM_AUTH_KEY_0: 0x90
iwlwifi0: CNVI_SCU_SEQ_DATA_DW9: 0x0
iwlwifi0: RFIm is deactivated, reason = 4

I also see:

wlan0: ieee80211_new_state_locked:2718: pending SCAN -> AUTH transition lost
wlan0: ieee80211_new_state_locked:2718: pending AUTH -> SCAN transition lost

Do you want the core file?
(In reply to Bjoern A. Zeeb from comment #25) Here are the steps I used to reproduce: (the if_iwlwifi module was already loaded) ifconfig wlan0 create wlandev iwlwifi0 country FR env wlans_iwlwifi0="wlan0" create_args_wlan0="country FR" ifconfig_wlan0="WPA DHCP" ifconfig_wlan0_ipv6="inet6 accept_rtadv" service netif restart wlan0 And here is the output with your patch: == The last lines of /var/log/messages == Nov 1 11:07:20 iss kernel: iwlwifi0: WRT: Invalid buffer destination Nov 1 11:07:21 iss kernel: iwlwifi0: WFPM_UMAC_PD_NOTIFICATION: 0x20 Nov 1 11:07:21 iss kernel: iwlwifi0: WFPM_LMAC2_PD_NOTIFICATION: 0x1f Nov 1 11:07:21 iss kernel: iwlwifi0: WFPM_AUTH_KEY_0: 0x90 Nov 1 11:07:21 iss kernel: iwlwifi0: CNVI_SCU_SEQ_DATA_DW9: 0x0 Nov 1 11:07:21 iss kernel: wlan0: Ethernet address: 04:cf:4b:1d:fe:fc Nov 1 11:07:38 iss wpa_supplicant[1534]: Successfully initialized wpa_supplicant Nov 1 11:07:38 iss wpa_supplicant[1534]: ioctl[SIOCS80211, op=20, val=0, arg_len=7]: Invalid argument Nov 1 11:07:38 iss syslogd: last message repeated 1 times Nov 1 11:07:38 iss wpa_supplicant[1535]: ioctl[SIOCS80211, op=103, val=0, arg_len=128]: Operation now in progress Nov 1 11:07:38 iss wpa_supplicant[1535]: wlan0: CTRL-EVENT-SCAN-FAILED ret=-1 retry=1 Nov 1 11:07:39 iss wpa_supplicant[1535]: ioctl[SIOCS80211, op=103, val=0, arg_len=128]: Operation now in progress Nov 1 11:07:39 iss wpa_supplicant[1535]: wlan0: CTRL-EVENT-SCAN-FAILED ret=-1 retry=1 == kgdb == (...) Reading symbols from /boot/kernel.drm/kernel... Reading symbols from /usr/lib/debug//boot/kernel.drm/kernel.debug... 
Unread portion of the kernel message buffer:
<6>wlan0: ieee80211_new_state_locked:2718: pending SCAN -> AUTH transition lost
<4>Invalid TXQ id
iwl_mvm_tx_mpdu:1204: fc 0x00b0 tid 8 txq_id 65535 mvm 0xfffffe01762c6408 skb 0xfffff802d41a6800 { len 30 } info 0xfffffe0038f6bce8 sta 0xfffff80114044880 (if you see this please report to PR 274382)
panic: lkpi_sta_auth_to_scan: lsta 0xfffff80114c1e800 state not NONE: 0, nstate 1 arg 1
cpuid = 15
time = 1698833262
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0175ce8b70
vpanic() at vpanic+0x171/frame 0xfffffe0175ce8ca0
panic() at panic+0x43/frame 0xfffffe0175ce8d00
lkpi_sta_auth_to_scan() at lkpi_sta_auth_to_scan+0x2c8/frame 0xfffffe0175ce8d80
lkpi_iv_newstate() at lkpi_iv_newstate+0x253/frame 0xfffffe0175ce8df0
ieee80211_newstate_cb() at ieee80211_newstate_cb+0x1e7/frame 0xfffffe0175ce8e40
taskqueue_run_locked() at taskqueue_run_locked+0xab/frame 0xfffffe0175ce8ec0
taskqueue_thread_loop() at taskqueue_thread_loop+0xd3/frame 0xfffffe0175ce8ef0
fork_exit() at fork_exit+0x82/frame 0xfffffe0175ce8f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0175ce8f30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
Uptime: 5m22s
Dumping 1320 out of 32422 MB:..2%..11%..21%..31%..42%..51%..61%..71%..82%..91%

(kgdb) bt
#0 __curthread () at /home/dumbbell/Documents/freebsd/src/sys/amd64/include/pcpu_aux.h:57
#1 doadump (textdump=textdump@entry=1)
    at /home/dumbbell/Documents/freebsd/src/sys/kern/kern_shutdown.c:406
#2 0xffffffff80b4ffd0 in kern_reboot (howto=260)
    at /home/dumbbell/Documents/freebsd/src/sys/kern/kern_shutdown.c:527
#3 0xffffffff80b5050e in vpanic (fmt=0xffffffff811e7898 "%s: lsta %p state not NONE: %#x, nstate %d arg %d\n", ap=ap@entry=0xfffffe0175ce8ce0)
    at /home/dumbbell/Documents/freebsd/src/sys/kern/kern_shutdown.c:976
#4 0xffffffff80b50273 in panic (fmt=<unavailable>)
    at /home/dumbbell/Documents/freebsd/src/sys/kern/kern_shutdown.c:895
#5 0xffffffff80dd3ab8 in lkpi_sta_auth_to_scan (vap=0xfffffe017908f010, nstate=IEEE80211_S_SCAN, arg=1)
    at /home/dumbbell/Documents/freebsd/src/sys/compat/linuxkpi/common/src/linux_80211.c:1175
#6 0xffffffff80ddb1e3 in lkpi_iv_newstate (vap=0xfffffe017908f010, nstate=IEEE80211_S_SCAN, arg=1)
    at /home/dumbbell/Documents/freebsd/src/sys/compat/linuxkpi/common/src/linux_80211.c:2113
#7 0xffffffff80cfff87 in ieee80211_newstate_cb (xvap=0xfffffe017908f010, npending=<optimized out>)
    at /home/dumbbell/Documents/freebsd/src/sys/net80211/ieee80211_proto.c:2546
#8 0xffffffff80bb5d2b in taskqueue_run_locked (queue=queue@entry=0xfffff80002a93100)
    at /home/dumbbell/Documents/freebsd/src/sys/kern/subr_taskqueue.c:512
#9 0xffffffff80bb6de3 in taskqueue_thread_loop (arg=arg@entry=0xfffffe01762ca110)
    at /home/dumbbell/Documents/freebsd/src/sys/kern/subr_taskqueue.c:824
#10 0xffffffff80b05eb2 in fork_exit (callout=0xffffffff80bb6d10 <taskqueue_thread_loop>, arg=0xfffffe01762ca110, frame=0xfffffe0175ce8f40)
    at /home/dumbbell/Documents/freebsd/src/sys/kern/kern_fork.c:1160
#11 <signal handler called>
Hit this panic in main with a patch to newstate-logging. cc@n1_iwl_vm:~ % uname -a FreeBSD n1_iwl_vm 15.0-CURRENT FreeBSD 15.0-CURRENT #1 main-f7d16a627-dirty: Thu Nov 9 16:03:11 EST 2023 cc@n1_iwl_vm:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64 cc@n1_iwl_vm:~ % The reproduce method is just reboot with the following rc.conf setup. /etc/rc.conf wlans_iwlwifi0="wlan0" ifconfig_wlan0="WPA SYNCDHCP" create_args_wlan0="country US regdomain fcc" wlandebug_wlan0="+state " /boot/loader.conf boot_verbose="YES" kern.msgbufsize=1146880 console prints before panic: ... iwlwifi0: Detected crf-id 0x3617, cnv-id 0x100530 wfpm id 0x80000000 iwlwifi0: PCI dev 2723/0084, rev=0x340, rfid=0x10a100 firmware: 'iwlwifi-cc-a0-77.ucode' version 77: 1366144 bytes loaded at 0xffffffff826a5000 iwlwifi0: successfully loaded firmware image 'iwlwifi-cc-a0-77.ucode' iwlwifi0: api flags index 2 larger than supported by driver iwlwifi0: TLV_FW_FSEQ_VERSION: FSEQ Version: 89.3.35.37 iwl-debug-yoyo.bin: could not load firmware image, error 2 iwl-debug-yoyo.bin: could not load firmware image, error 2 iwl-debug-yoyo_bin: could not load firmware image, error 2 iwl_debug_yoyo_bin: could not load firmware image, error 2 iwlwifi0: loaded firmware version 77.2df8986f.0 cc-a0-77.ucode op_mode iwlmvm iwlwifi0: Detected Intel(R) Wi-Fi 6 AX200 160MHz, REV=0x340 iwlwifi0: Detected RF HR B3, rfid=0x10a100 iwlwifi0: base HW address: e0:2e:0b:92:e5:82 iwlwifi0: 11a rates: 6Mbps 9Mbps 12Mbps 18Mbps 24Mbps 36Mbps 48Mbps 54Mbps iwlwifi0: 11b rates: 1Mbps 2Mbps 5.5Mbps 11Mbps iwlwifi0: 11g rates: 1Mbps 2Mbps 5.5Mbps 11Mbps 6Mbps 9Mbps 12Mbps 18Mbps 24Mbps 36Mbps 48Mbps 54Mbps pci0: driver added wlan0: bpf attached wlan0: bpf attached wlan0: Ethernet address: e0:2e:0b:92:e5:82 net.wlan.0.debug: 0x0 => 0x80000<state> Created wlan(4) interfaces: wlan0. lo0: link state changed to UP vtnet0: link state changed to UP Starting dhclient. 
DHCPREQUEST on vtnet0 to 255.255.255.255 port 67 DHCPACK from 192.168.1.1 Bogus Host Name option 12: n1_iwl_vm (n1_iwl_vm) bound to 192.168.1.154 -- renewal in 21600 seconds. Starting wpa_supplicant. wlan0: start running, 0 vaps running wlan0: ieee80211_start_locked: up parent iwlwifi0 wlan0: start running, 1 vaps running wlan0: ieee80211_new_state_locked:2746: starting state update INIT -> INIT (SCAN) wlan0: ieee80211_new_state_locked: INIT -> SCAN (arg 0) (nrunning 0 nscanning 0) wlan0: ieee80211_newstate_cb:2517: running state update INIT -> SCAN (1) wlan0: ieee80211_newstate_cb: INIT -> SCAN arg 0 wlan0: sta_newstate: INIT -> SCAN (0) Starting dhclient. wlan0: no link .....wlan0: ieee80211_new_state_locked:2746: starting state update SCAN -> SCAN (AUTH) wlan0: ieee80211_new_state_locked: SCAN -> AUTH (arg 192) (nrunning 0 nscanning 0) wlan0: ieee80211_newstate_cb:2517: running state update SCAN -> AUTH (1) wlan0: ieee80211_newstate_cb: SCAN -> AUTH arg 192 wlan0: [f4:69:42:57:3f:0e] station assoc via MLME wlan0: ieee80211_new_state_locked:2731: pending SCAN -> AUTH (now to AUTH) transition lost wlan0: ieee80211_new_state_locked:2746: starting state update SCAN -> AUTH (AUTH) wlan0: ieee80211_new_state_locked: SCAN -> AUTH (arg 192) (nrunning 0 nscanning 0) wlan0: sta_newstate: SCAN -> AUTH (192) wlan0: ieee80211_newstate_cb:2517: running state update AUTH -> AUTH (1) wlan0: ieee80211_newstate_cb: AUTH -> AUTH arg 192 Invalid TXQ idiwl_mvm_tx_mpdu:1204: fc 0x00b0 tid 8 txq_id 65535 mvm 0xfffffe00b1250408 skb 0xfffff80007865800 { len 30 } info 0xfffffe00745dcce8 sta 0xfffff80005760880 (if you see this please report to PR 274382) wlan0: ni 0xfffffe00b15bf000 vap 0xfffffe00b12e0010 mode STA state AUTH m 0xfffff800078b1b00 status 4543576 wlan0: ni 0xfffffe00b15bf000 mode STA state AUTH arg 0x2 status 4543576 wlan0: sta_newstate: AUTH -> AUTH (192) wlan0: ni 0xfffffe00b15bf000 vap 0xfffffe00b12e0010 mode STA state AUTH m 0xfffff8000773cb00 status 1 wlan0: ni 
0xfffffe00b15bf000 mode STA state AUTH arg 0x2 status 1 wlan0: vap 0xfffffe00b12e0010 mode STA state AUTH flags 0x2400 & 0x80 wlan0: ieee80211_new_state_locked:2746: starting state update AUTH -> AUTH (SCAN) wlan0: ieee80211_new_state_locked: AUTH -> SCAN (arg 1) (nrunning 0 nscanning 0) wlan0: ieee80211_newstate_cb:2517: running state update AUTH -> SCAN (1) wlan0: ieee80211_newstate_cb: AUTH -> SCAN arg 1 panic: lkpi_sta_auth_to_scan: lsta 0xfffff80007756800 state not NONE: 0, nstate 1 arg 1 cpuid = 6 time = 1699566558 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00b0eb5b70 vpanic() at vpanic+0x132/frame 0xfffffe00b0eb5ca0 panic() at panic+0x43/frame 0xfffffe00b0eb5d00 lkpi_sta_auth_to_scan() at lkpi_sta_auth_to_scan+0x2c8/frame 0xfffffe00b0eb5d80 lkpi_iv_newstate() at lkpi_iv_newstate+0x253/frame 0xfffffe00b0eb5df0 ieee80211_newstate_cb() at ieee80211_newstate_cb+0x226/frame 0xfffffe00b0eb5e40 taskqueue_run_locked() at taskqueue_run_locked+0xab/frame 0xfffffe00b0eb5ec0 taskqueue_thread_loop() at taskqueue_thread_loop+0xd3/frame 0xfffffe00b0eb5ef0 fork_exit() at fork_exit+0x82/frame 0xfffffe00b0eb5f30 fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00b0eb5f30 --- trap 0, rip = 0, rsp = 0, rbp = 0 --- KDB: enter: panic [ thread pid 0 tid 100168 ] Stopped at kdb_enter+0x32: movq $0,0xe2aee3(%rip) db> dump Dumping 362 out of 6111 MB:..5%..14%..23%..31%..45%..53%..62%..71%..84%..93% Dump complete db>
(In reply to Cheng Cui from comment #28) Ok, based on the wlandebug +state and the additional logging from [1] here's the race in net80211 affecting possibly all drivers: [1] https://people.freebsd.org/~bz/wireless/20231109-01-net80211-newstate-logging.diff >>>> ANNOTATED OUTPUT from Comment 28 [https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=271979#c28]; hope we can confirm this with time stamps on. wlan0: sta_newstate: INIT -> SCAN (0) Starting dhclient. wlan0: no link ..... [1] wlan0: ieee80211_new_state_locked:2746: starting state update SCAN -> SCAN (AUTH) [1] wlan0: ieee80211_new_state_locked: SCAN -> AUTH (arg 192) (nrunning 0 nscanning 0) [1] wlan0: ieee80211_newstate_cb:2517: running state update SCAN -> AUTH (1) [1] wlan0: ieee80211_newstate_cb: SCAN -> AUTH arg 192 LinuxKPI running lkpi_sta_scan_to_auth() around here... wlan0: [f4:69:42:57:3f:0e] station assoc via MLME ioctl logging, triggering ieee80211_sta_join -> ieee80211_sta_join1 -> ieee80211_new_state(vap, AUTH, IEEE80211_FC0_SUBTYPE_DEAUTH=192) -> [2] ieee80211_sta_join() would allocate a new node (ni) and lsta in LinuxKPI. ieee80211_sta_join1() would then call iv_update_bss() and that would swap nodes. That explains the previous error Colin saw with the queue not having the valid node anymore and also explains why we later panic as the state is not correct anymore. If the assumption is correct a KASSERT in iv_update_bss() could probably catch this. I'll post a patch for that as well. I have a big XXX in that code anyway because of this. 
[2] wlan0: ieee80211_new_state_locked:2731: pending SCAN -> AUTH (now to AUTH) transition lost [2] wlan0: ieee80211_new_state_locked:2746: starting state update SCAN -> AUTH (AUTH) [2] wlan0: ieee80211_new_state_locked: SCAN -> AUTH (arg 192) (nrunning 0 nscanning 0) LinuxKPI calls into the original handler for [1] which means lkpi_sta_scan_to_auth() is done: [1] wlan0: sta_newstate: SCAN -> AUTH (192) here iv_state gets updated from SCAN to AUTH, [2] wlan0: ieee80211_newstate_cb:2517: running state update AUTH -> AUTH (1) [2] wlan0: ieee80211_newstate_cb: AUTH -> AUTH arg 192 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The [2] SCAN -> AUTH turned into AUTH -> AUTH; a further (LinuxKPI runs lkpi_sta_a_to_a() possibly mgmt protection problem given our sta_to_auth has not finished yet -- if we had a reply and moved to assoc; cc@ to test, which would explain the PR in the next line): Invalid TXQ idiwl_mvm_tx_mpdu:1204: fc 0x00b0 tid 8 txq_id 65535 mvm 0xfffffe00b1250408 skb 0xfffff80007865800 { len 30 } info 0xfffffe00745dcce8 sta 0xfffff80005760880 (if you see this please report to PR 274382) wlan0: ni 0xfffffe00b15bf000 vap 0xfffffe00b12e0010 mode STA state AUTH m 0xfffff800078b1b00 status 4543576 wlan0: ni 0xfffffe00b15bf000 mode STA state AUTH arg 0x2 status 4543576 [2] wlan0: sta_newstate: AUTH -> AUTH (192) should call sta_authretry(, with 192 >> 8 == 0 == IEEE80211_STATUS_SUCCESS) -> Sends another b0 (authentication). wlan0: ni 0xfffffe00b15bf000 vap 0xfffffe00b12e0010 mode STA state AUTH m 0xfffff8000773cb00 status 1
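The interleaving in the annotated output above can be replayed in a small user-space model. This is only a sketch under stated assumptions: `vap_model`, `request_state()`, `callback_begin()`, `callback_commit()` and `replay_race()` are hypothetical names and not net80211 functions, and the single `target` field only loosely mirrors a lone pending state. It tries to show why a request issued while the vap was still in SCAN can end up being executed as AUTH -> AUTH.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, reduced state set; the real net80211 enum has more states. */
enum wstate { S_INIT, S_SCAN, S_AUTH };

static const char *const wstate_name[] = { "INIT", "SCAN", "AUTH" };

struct vap_model {
	enum wstate cur;     /* committed state, loosely like iv_state */
	enum wstate target;  /* the one pending target, loosely like iv_nstate */
	enum wstate seen;    /* state the requester observed when queueing */
	bool pending;        /* a deferred transition is waiting */
};

/* Queue a transition; the requester only knows the state it sees right now. */
static void
request_state(struct vap_model *v, enum wstate target)
{
	v->seen = v->cur;
	v->target = target;
	v->pending = true;
}

/* Deferred worker, step 1: pick up the pending request (driver now busy). */
static enum wstate
callback_begin(struct vap_model *v)
{
	assert(v->pending);
	v->pending = false;
	return (v->target);
}

/* Deferred worker, step 2: log and commit the transition. */
static void
callback_commit(struct vap_model *v, enum wstate target)
{
	printf("%s -> %s\n", wstate_name[v->cur], wstate_name[target]);
	v->cur = target;
}

/*
 * Replay the order from the log: request [2] arrives while the callback
 * for request [1] is still in progress. Returns true when request [2] is
 * executed from a different state than its requester observed.
 */
bool
replay_race(void)
{
	struct vap_model v = { .cur = S_SCAN };
	enum wstate t1, t2, seen2;
	bool stale;

	request_state(&v, S_AUTH);   /* [1] queued from SCAN */
	t1 = callback_begin(&v);     /* [1] starts; cur is still SCAN */
	request_state(&v, S_AUTH);   /* [2] queued, also seeing SCAN */
	seen2 = v.seen;
	callback_commit(&v, t1);     /* [1] finishes: SCAN -> AUTH */
	t2 = callback_begin(&v);     /* [2] starts with cur == AUTH */
	stale = (v.cur != seen2);
	callback_commit(&v, t2);     /* [2] runs as AUTH -> AUTH */
	return (stale);
}
```

Calling `replay_race()` from a test program returns true in this model, which corresponds to the "[2] SCAN -> AUTH turned into AUTH -> AUTH" observation; the node swap done by iv_update_bss() is deliberately left out of the sketch.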
I found a good workaround. Basically, the following commands did the job without scanning or restarting the wlan0 interface. root@n1_iwl_vm:~ # ifconfig lo0: flags=1008049<UP,LOOPBACK,RUNNING,MULTICAST,LOWER_UP> metric 0 mtu 16384 options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6> inet 127.0.0.1 netmask 0xff000000 inet6 ::1 prefixlen 128 inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1 groups: lo nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL> root@n1_iwl_vm:~ # root@n1_iwl_vm:~ # uname -a FreeBSD n1_iwl_vm 15.0-CURRENT FreeBSD 15.0-CURRENT #19 main-488bc7e9a: Tue Nov 21 11:42:00 EST 2023 root@n1_iwl_vm:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64 root@n1_iwl_vm:~ # pciconf -lv | grep -B3 network iwlwifi0@pci0:0:5:0: class=0x028000 rev=0x1a hdr=0x00 vendor=0x8086 device=0x2723 subvendor=0x8086 subdevice=0x0084 vendor = 'Intel Corporation' device = 'Wi-Fi 6 AX200' class = network root@n1_iwl_vm:~ # root@n1_iwl_vm:~ # ifconfig wlan0 create wlandev iwlwifi0 regdomain fcc country US root@n1_iwl_vm:~ # wpa_supplicant -B -i wlan0 -c /etc/wpa_supplicant.conf Successfully initialized wpa_supplicant ioctl[SIOCS80211, op=20, val=0, arg_len=7]: Invalid argument ioctl[SIOCS80211, op=20, val=0, arg_len=7]: Invalid argument root@n1_iwl_vm:~ # ifconfig lo0: flags=1008049<UP,LOOPBACK,RUNNING,MULTICAST,LOWER_UP> metric 0 mtu 16384 options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6> inet 127.0.0.1 netmask 0xff000000 inet6 ::1 prefixlen 128 inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1 groups: lo nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL> wlan0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500 options=0 ether e0:2e:0b:92:e5:82 groups: wlan ssid SpectrumSetup-0F channel 157 (5785 MHz 11a) bssid f4:69:42:57:3f:0e regdomain FCC country US authmode WPA2/802.11i privacy ON deftxkey UNDEF AES-CCM 2:128-bit txpower 23 bmiss 7 mcastrate 6 mgmtrate 6 scanvalid 60 wme roaming MANUAL parent interface: iwlwifi0 media: IEEE 802.11 Wireless Ethernet OFDM/36Mbps mode 
11a status: associated <<< associated! nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL> root@n1_iwl_vm:~ # dhclient wlan0 DHCPREQUEST on wlan0 to 255.255.255.255 port 67 <<< request IP addr through DHCP DHCPACK from 192.168.1.1 Nov 21 11:51:35 n1_iwl_vm dhclient[654]: Bogus Host Name option 12: n1_iwl_vm (n1_iwl_vm) Bogus Host Name option 12: n1_iwl_vm (n1_iwl_vm) bound to 192.168.1.214 -- renewal in 21600 seconds. root@n1_iwl_vm:~ # ifconfig lo0: flags=1008049<UP,LOOPBACK,RUNNING,MULTICAST,LOWER_UP> metric 0 mtu 16384 options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6> inet 127.0.0.1 netmask 0xff000000 inet6 ::1 prefixlen 128 inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1 groups: lo nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL> wlan0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500 options=0 ether e0:2e:0b:92:e5:82 inet 192.168.1.214 netmask 0xffffff00 broadcast 192.168.1.255 groups: wlan ssid SpectrumSetup-0F channel 157 (5785 MHz 11a) bssid f4:69:42:57:3f:0e regdomain FCC country US authmode WPA2/802.11i privacy ON deftxkey UNDEF AES-CCM 2:128-bit txpower 23 bmiss 7 mcastrate 6 mgmtrate 6 scanvalid 60 wme roaming MANUAL parent interface: iwlwifi0 media: IEEE 802.11 Wireless Ethernet OFDM/36Mbps mode 11a status: associated nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL> root@n1_iwl_vm:~ # ping -c 3 -S 192.168.1.214 192.168.1.1 PING 192.168.1.1 (192.168.1.1) from 192.168.1.214: 56 data bytes 64 bytes from 192.168.1.1: icmp_seq=0 ttl=64 time=2.904 ms 64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=1.073 ms 64 bytes from 192.168.1.1: icmp_seq=2 ttl=64 time=1.924 ms --- 192.168.1.1 ping statistics --- 3 packets transmitted, 3 packets received, 0.0% packet loss round-trip min/avg/max/stddev = 1.073/1.967/2.904/0.748 ms
(In reply to Cheng Cui from comment #30)
Well, the workaround above is not reliable. I tested it multiple times: sometimes it crashes and sometimes it works. :(
(In reply to Cheng Cui from comment #31) Workaround update: Well, I retrieved what I am missing. It looks adding the ssid in the first place during "ifconfig wlan0 create" makes it stable. I found no more crashes. root@n1_iwl_vm:~ # ifconfig lo0: flags=1008049<UP,LOOPBACK,RUNNING,MULTICAST,LOWER_UP> metric 0 mtu 16384 options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6> inet 127.0.0.1 netmask 0xff000000 inet6 ::1 prefixlen 128 inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1 groups: lo nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL> root@n1_iwl_vm:~ # ifconfig wlan0 create wlandev iwlwifi0 regdomain fcc country US ssid SpectrumSetup-0F root@n1_iwl_vm:~ # ifconfig lo0: flags=1008049<UP,LOOPBACK,RUNNING,MULTICAST,LOWER_UP> metric 0 mtu 16384 options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6> inet 127.0.0.1 netmask 0xff000000 inet6 ::1 prefixlen 128 inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1 groups: lo nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL> wlan0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500 options=0 ether e0:2e:0b:92:e5:82 groups: wlan ssid SpectrumSetup-0F channel 1 (2412 MHz 11b) regdomain FCC country US authmode OPEN privacy OFF txpower 30 bmiss 7 scanvalid 60 wme bintval 0 parent interface: iwlwifi0 media: IEEE 802.11 Wireless Ethernet autoselect (autoselect) status: no carrier nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL> root@n1_iwl_vm:~ # wpa_supplicant -B -i wlan0 -c /etc/wpa_supplicant.conf Successfully initialized wpa_supplicant ioctl[SIOCS80211, op=20, val=0, arg_len=7]: Invalid argument ioctl[SIOCS80211, op=20, val=0, arg_len=7]: Invalid argument root@n1_iwl_vm:~ # iwlwifi0: Not associated and the session protection is over already... 
iwlwifi0: linuxkpi_ieee80211_connection_loss: vif 0xfffffe00bdea8c80 vap 0xfffffe00bdea8010 state AUTH root@n1_iwl_vm:~ # ifconfig lo0: flags=1008049<UP,LOOPBACK,RUNNING,MULTICAST,LOWER_UP> metric 0 mtu 16384 options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6> inet 127.0.0.1 netmask 0xff000000 inet6 ::1 prefixlen 128 inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1 groups: lo nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL> wlan0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500 options=0 ether e0:2e:0b:92:e5:82 groups: wlan ssid SpectrumSetup-0F channel 1 (2412 MHz 11g) bssid f4:69:42:57:3f:0d regdomain FCC country US authmode WPA2/802.11i privacy ON deftxkey UNDEF AES-CCM 2:128-bit txpower 30 bmiss 7 scanvalid 60 protmode CTS wme roaming MANUAL parent interface: iwlwifi0 media: IEEE 802.11 Wireless Ethernet OFDM/36Mbps mode 11g status: associated nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL> root@n1_iwl_vm:~ # dhclient wlan0 DHCPREQUEST on wlan0 to 255.255.255.255 port 67 DHCPACK from 192.168.1.1 Nov 21 12:42:57 n1_iwl_vm dhclient[653]: Bogus Host Name option 12: n1_iwl_vm (n1_iwl_vm) Bogus Host Name option 12: n1_iwl_vm (n1_iwl_vm) bound to 192.168.1.214 -- renewal in 21600 seconds. 
root@n1_iwl_vm:~ # ifconfig lo0: flags=1008049<UP,LOOPBACK,RUNNING,MULTICAST,LOWER_UP> metric 0 mtu 16384 options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6> inet 127.0.0.1 netmask 0xff000000 inet6 ::1 prefixlen 128 inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1 groups: lo nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL> wlan0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500 options=0 ether e0:2e:0b:92:e5:82 inet 192.168.1.214 netmask 0xffffff00 broadcast 192.168.1.255 groups: wlan ssid SpectrumSetup-0F channel 1 (2412 MHz 11g) bssid f4:69:42:57:3f:0d regdomain FCC country US authmode WPA2/802.11i privacy ON deftxkey UNDEF AES-CCM 2:128-bit txpower 30 bmiss 7 scanvalid 60 protmode CTS wme roaming MANUAL parent interface: iwlwifi0 media: IEEE 802.11 Wireless Ethernet OFDM/36Mbps mode 11g status: associated nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL> root@n1_iwl_vm:~ # ping -c 3 -S 192.168.1.214 192.168.1.1 PING 192.168.1.1 (192.168.1.1) from 192.168.1.214: 56 data bytes 64 bytes from 192.168.1.1: icmp_seq=0 ttl=64 time=3.485 ms 64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=2.810 ms 64 bytes from 192.168.1.1: icmp_seq=2 ttl=64 time=1.336 ms --- 192.168.1.1 ping statistics --- 3 packets transmitted, 3 packets received, 0.0% packet loss round-trip min/avg/max/stddev = 1.336/2.544/3.485/0.897 ms root@n1_iwl_vm:~ #
(In reply to Cheng Cui from comment #30)
> root@n1_iwl_vm:~ # ifconfig wlan0 create wlandev iwlwifi0 regdomain fcc country US
> root@n1_iwl_vm:~ # wpa_supplicant -B -i wlan0 -c /etc/wpa_supplicant.conf
> Successfully initialized wpa_supplicant

That sequence can still panic. I have had situations where I could run it for almost a day in a row without provoking the panic, while others can trigger it within seconds.
(In reply to Cheng Cui from comment #32)
> Workaround update:
>
> Well, I retrieved what I am missing. It looks adding the ssid in the first place during "ifconfig wlan0 create" makes it stable. I found no more crashes.

That only means you likely have just one BSSID for that SSID. Once you have two or three APs for the same SSID and you set wpa_supplicant.conf to ignore the one net80211 would pick to try to associate to after a scan, I assume you will still see the crash.
*** Bug 275255 has been marked as a duplicate of this bug. ***
I am on 15.0-CURRENT (305a2676ae93fb50a623024d51039415521cb2da), I have multiple base stations under one SSID, and I am experiencing this same crash on boot:

wlans_iwlwifi0="wlan0"
ifconfig_wlan0="WPA SYNCDHCP"
ifconfig_wlan0_ipv6="inet6 auto_linklocal accept_rtadv"

#32 0xffffffff80dd3b08 in lkpi_sta_auth_to_scan (vap=0xfffffe01636e8010, nstate=IEEE80211_S_SCAN, arg=1)
    at /usr/src/sys/compat/linuxkpi/common/src/linux_80211.c:1175
#33 0xffffffff80ddb263 in lkpi_iv_newstate (vap=0xfffffe01636e8010, nstate=IEEE80211_S_SCAN, arg=1)
    at /usr/src/sys/compat/linuxkpi/common/src/linux_80211.c:2113
#34 0xffffffff80cffee7 in ieee80211_newstate_cb (xvap=0xfffffe01636e8010, npending=<optimized out>)
    at /usr/src/sys/net80211/ieee80211_proto.c:2546
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=643d6dce6c1e39f067f8d0feea8615913b324891 commit 643d6dce6c1e39f067f8d0feea8615913b324891 Author: Bjoern A. Zeeb <bz@FreeBSD.org> AuthorDate: 2023-12-01 01:37:25 +0000 Commit: Bjoern A. Zeeb <bz@FreeBSD.org> CommitDate: 2023-12-01 01:48:34 +0000 tools/net80211: add mlme_assoc mlme_assoc is a tool to trigger net80211::ieee80211_sta_join1() calls which in certain conditions cause problems to the LinuxKPI 802.11 compat code (but also believed to possibly cause problems in case of race to other firmware based drivers). This has proven to be a good reproducer for the problem even on setups which otherwise could run for days without hitting it. Sponsored by: The FreeBSD Foundation PR: 271979 tools/tools/net80211/mlme_assoc/Makefile (new) | 7 + tools/tools/net80211/mlme_assoc/README (new) | 51 ++++++ tools/tools/net80211/mlme_assoc/mlme_assoc.c (new) | 200 +++++++++++++++++++++ 3 files changed, 258 insertions(+)
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=2ac8a2189ac6707f48f77ef2e36baf696a0d2f40 commit 2ac8a2189ac6707f48f77ef2e36baf696a0d2f40 Author: Bjoern A. Zeeb <bz@FreeBSD.org> AuthorDate: 2024-02-03 16:33:56 +0000 Commit: Bjoern A. Zeeb <bz@FreeBSD.org> CommitDate: 2024-02-14 19:47:53 +0000 LinuxKPI: 802.11: band-aid for invalid state changes after (*iv_update_bss) With firmware based solutions we cannot just jump from an active session to a new iv_bss node without tearing down state for the old and bringing up the new node. This likely used to work on softmac based cards/drivers where one could essentially set the state and fire at will. We track (*iv_update_bss) calls from net80211 and set a local flag that we are out of synch and do not allow any further operations up the state machine until we hit INIT or SCAN. That means someone will take the state down, clean up firmware state and then we can join again and build up state. Apparently this problem has been "known" for a while as native iwm(4) and others have similar workarounds (though less strict) and can be equally pestered into bad states. For LinuxKPI all the KASSERTs just massively brought this problem out. The solution will be some rewrites in net80211. Until then, try to keep us more stable at least and not die on second join1() calls triggered by service netif start wlan0 and similar. PR: 271979, 271988, 275255, 263613, 274003 Sponsored by: The FreeBSD Foundation (2023, partial) MFC after: 3 days Reviewed by: cc Differential Revision: https://reviews.freebsd.org/D43725 sys/compat/linuxkpi/common/src/linux_80211.c | 309 +++++++++++++++++++-------- sys/compat/linuxkpi/common/src/linux_80211.h | 2 + 2 files changed, 216 insertions(+), 95 deletions(-)
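The gating idea in this commit message (set a flag when the BSS node is swapped, then refuse to climb the state machine until INIT or SCAN is reached) can be illustrated with a tiny model. This is a hedged sketch and not the LinuxKPI code: `lkpi_model`, `model_update_bss()` and `model_newstate()` are hypothetical names, and refusing every target other than INIT or SCAN while the flag is set is a simplifying assumption about what "operations up the state machine" covers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced state set used only for this illustration. */
enum wstate { W_INIT, W_SCAN, W_AUTH, W_ASSOC, W_RUN };

struct lkpi_model {
	enum wstate state;  /* state the driver/firmware side believes in */
	bool out_of_sync;   /* set once the BSS node was swapped underneath */
};

/* Stand-in for noticing an (*iv_update_bss) call from net80211. */
void
model_update_bss(struct lkpi_model *m)
{
	assert(m != NULL);
	m->out_of_sync = true;
}

/*
 * Stand-in for a newstate handler. Going down to INIT or SCAN is always
 * accepted and clears the flag, because firmware state gets torn down on
 * that path. Anything else is refused while the flag is set (assumption).
 */
bool
model_newstate(struct lkpi_model *m, enum wstate nstate)
{
	assert(m != NULL);
	if (nstate == W_INIT || nstate == W_SCAN) {
		m->out_of_sync = false;
		m->state = nstate;
		return (true);
	}
	if (m->out_of_sync)
		return (false);  /* keep the old state; do not build on it */
	m->state = nstate;
	return (true);
}
```

In this model a second join that swaps the node while in AUTH makes a following move to ASSOC fail, and only after a pass through SCAN can AUTH be entered again. That is the "someone will take the state down ... and then we can join again" behaviour the message describes, reduced to its control flow.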
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=713db49d06deee90dd358b2e4b9ca05368a5eaf6 commit 713db49d06deee90dd358b2e4b9ca05368a5eaf6 Author: Bjoern A. Zeeb <bz@FreeBSD.org> AuthorDate: 2024-01-10 10:14:16 +0000 Commit: Bjoern A. Zeeb <bz@FreeBSD.org> CommitDate: 2024-02-14 19:47:21 +0000 net80211: deal with lost state transitions Since 5efea30f039c4 we can possibly lose a state transition which can cause trouble further down the road. The reproducer from 643d6dce6c1e can trigger these for example. Drivers for firmware based wireless cards have worked around some of this (and other) problems in the past. Add an array of tasks rather than a single one as we would simply get npending > 1 and lose order with other tasks. Try to keep state changes updated as queued in case we end up with more than one at a time. While this is not ideal either (call it a hack) it will sort the problem for now. We will queue in ieee80211_new_state_locked() and do checks there and dequeue in ieee80211_newstate_cb(). If we still overrun the (currently) 8 slots we will drop the state change rather than overwrite the last one. When dequeing we will update iv_nstate and keep it around for historic reasons for the moment. The longer term we should make the callers of ieee80211_new_state[_locked]() actually use the returned errors and act appropriately but that will touch a lot more places and drivers (possibly incl. changed behaviour for ioctls). rtwn(4) and rum(4) should probably be revisted and net80211 internals removed (for rum(4) at least the current logic still seems prone to races). 
PR: 271979, 271988, 275255, 263613, 274003 Sponsored by: The FreeBSD Foundation (in 2023) MFC after: 3 days Reviewed by: cc Differential Revision: https://reviews.freebsd.org/D43389 sys/dev/rtwn/if_rtwn.c | 4 +- sys/dev/usb/wlan/if_rum.c | 4 +- sys/net80211/ieee80211.c | 4 +- sys/net80211/ieee80211_ddb.c | 13 ++++- sys/net80211/ieee80211_proto.c | 124 ++++++++++++++++++++++++++++++++++------- sys/net80211/ieee80211_var.h | 13 ++++- 6 files changed, 134 insertions(+), 28 deletions(-)
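The queueing scheme described in this commit message (several slots instead of one pending state, order preserved, a new request dropped rather than overwriting an older one when all slots are full) resembles a bounded FIFO. The following is a minimal user-space sketch of that idea, not the net80211 implementation: `nstate_queue`, `nstate_enqueue()` and `nstate_dequeue()` are hypothetical names, and only the slot count of 8 is taken from the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NSTATE_SLOTS 8  /* the commit message says "(currently) 8 slots" */

/* One queued state change request (hypothetical layout). */
struct nstate_req {
	int nstate;
	int arg;
};

struct nstate_queue {
	struct nstate_req slot[NSTATE_SLOTS];
	size_t head;       /* index of the oldest queued request */
	size_t count;      /* number of queued requests */
	unsigned dropped;  /* requests refused because the queue was full */
};

/* Queue a request; when full, drop the new one and keep the older ones. */
bool
nstate_enqueue(struct nstate_queue *q, int nstate, int arg)
{
	size_t tail;

	assert(q != NULL);
	if (q->count == NSTATE_SLOTS) {
		q->dropped++;
		return (false);
	}
	tail = (q->head + q->count) % NSTATE_SLOTS;
	q->slot[tail].nstate = nstate;
	q->slot[tail].arg = arg;
	q->count++;
	return (true);
}

/* Take the oldest request, so transitions run in the order they were queued. */
bool
nstate_dequeue(struct nstate_queue *q, struct nstate_req *out)
{
	assert(q != NULL && out != NULL);
	if (q->count == 0)
		return (false);
	*out = q->slot[q->head];
	q->head = (q->head + 1) % NSTATE_SLOTS;
	q->count--;
	return (true);
}
```

A zero-initialised `struct nstate_queue` is an empty queue. Returning false on overflow gives the caller something to act on, which matches the message's longer-term wish that callers of ieee80211_new_state[_locked]() use the returned errors.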
A commit in branch stable/14 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=8c450ea1083b03f30871506b59034f26bc608972 commit 8c450ea1083b03f30871506b59034f26bc608972 Author: Bjoern A. Zeeb <bz@FreeBSD.org> AuthorDate: 2024-02-03 16:33:56 +0000 Commit: Bjoern A. Zeeb <bz@FreeBSD.org> CommitDate: 2024-02-18 18:31:17 +0000 LinuxKPI: 802.11: band-aid for invalid state changes after (*iv_update_bss) With firmware based solutions we cannot just jump from an active session to a new iv_bss node without tearing down state for the old and bringing up the new node. This likely used to work on softmac based cards/drivers where one could essentially set the state and fire at will. We track (*iv_update_bss) calls from net80211 and set a local flag that we are out of synch and do not allow any further operations up the state machine until we hit INIT or SCAN. That means someone will take the state down, clean up firmware state and then we can join again and build up state. Apparently this problem has been "known" for a while as native iwm(4) and others have similar workarounds (though less strict) and can be equally pestered into bad states. For LinuxKPI all the KASSERTs just massively brought this problem out. The solution will be some rewrites in net80211. Until then, try to keep us more stable at least and not die on second join1() calls triggered by service netif start wlan0 and similar. PR: 271979, 271988, 275255, 263613, 274003 Sponsored by: The FreeBSD Foundation (2023, partial) Reviewed by: cc Differential Revision: https://reviews.freebsd.org/D43725 (cherry picked from commit 2ac8a2189ac6707f48f77ef2e36baf696a0d2f40) sys/compat/linuxkpi/common/src/linux_80211.c | 309 +++++++++++++++++++-------- sys/compat/linuxkpi/common/src/linux_80211.h | 2 + 2 files changed, 216 insertions(+), 95 deletions(-)
A commit in branch stable/14 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=b392b36d3776b696601ce0253256803276d24ea2 commit b392b36d3776b696601ce0253256803276d24ea2 Author: Bjoern A. Zeeb <bz@FreeBSD.org> AuthorDate: 2024-01-10 10:14:16 +0000 Commit: Bjoern A. Zeeb <bz@FreeBSD.org> CommitDate: 2024-02-18 18:31:17 +0000 net80211: deal with lost state transitions Since 5efea30f039c4 we can possibly lose a state transition which can cause trouble further down the road. The reproducer from 643d6dce6c1e can trigger these for example. Drivers for firmware based wireless cards have worked around some of this (and other) problems in the past. Add an array of tasks rather than a single one as we would simply get npending > 1 and lose order with other tasks. Try to keep state changes updated as queued in case we end up with more than one at a time. While this is not ideal either (call it a hack) it will sort the problem for now. We will queue in ieee80211_new_state_locked() and do checks there and dequeue in ieee80211_newstate_cb(). If we still overrun the (currently) 8 slots we will drop the state change rather than overwrite the last one. When dequeing we will update iv_nstate and keep it around for historic reasons for the moment. The longer term we should make the callers of ieee80211_new_state[_locked]() actually use the returned errors and act appropriately but that will touch a lot more places and drivers (possibly incl. changed behaviour for ioctls). rtwn(4) and rum(4) should probably be revisted and net80211 internals removed (for rum(4) at least the current logic still seems prone to races). Given this changes the internal structure of 'struct ieee80211vap', which gets allocated by the drivers, and we do not have enough spares, all wireless drivers need to be recompiled. Given we are forced to do the update, we leave fields in the middle of the struct and add more spares at the same time. 
__FreeBSD_version gets updated to 1400509 to be able to detect this change. PR: 271979, 271988, 275255, 263613, 274003 Sponsored by: The FreeBSD Foundation (in 2023) Reviewed by: cc Differential Revision: https://reviews.freebsd.org/D43389 (cherry picked from commit 713db49d06deee90dd358b2e4b9ca05368a5eaf6) (cherry picked from commit a890a3a5ddf33acb0a4000885945b89156799b07) UPDATING | 6 ++ sys/dev/rtwn/if_rtwn.c | 4 +- sys/dev/usb/wlan/if_rum.c | 4 +- sys/net80211/ieee80211.c | 4 +- sys/net80211/ieee80211_ddb.c | 13 ++++- sys/net80211/ieee80211_proto.c | 124 ++++++++++++++++++++++++++++++++++------- sys/net80211/ieee80211_var.h | 15 +++-- sys/sys/param.h | 2 +- 8 files changed, 142 insertions(+), 30 deletions(-)
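The queueing idea from the commit message above (a fixed number of slots, enqueue on the state change request, dequeue in the callback, and drop rather than overwrite on overrun) is roughly a small ring buffer. The following is a minimal sketch under that reading, not the actual net80211 implementation: the `sketch_*` names and the struct layout are invented here, and only the slot count of 8 comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define	SKETCH_NSLOTS	8	/* "(currently) 8 slots" per the commit message */

/* One pending state change request (hypothetical layout). */
struct sketch_change {
	int nstate;
	int arg;
};

struct sketch_queue {
	struct sketch_change slots[SKETCH_NSLOTS];
	size_t head;	/* index of the oldest queued entry */
	size_t count;	/* number of queued entries */
};

/* Queue a request; returns false (request dropped) when all slots are in use. */
static bool
sketch_enqueue(struct sketch_queue *q, int nstate, int arg)
{
	size_t tail;

	assert(q != NULL);
	if (q->count == SKETCH_NSLOTS)
		return (false);	/* drop the new one, keep what is queued */
	tail = (q->head + q->count) % SKETCH_NSLOTS;
	q->slots[tail].nstate = nstate;
	q->slots[tail].arg = arg;
	q->count++;
	return (true);
}

/* Take the oldest request in FIFO order; returns false if nothing is queued. */
static bool
sketch_dequeue(struct sketch_queue *q, struct sketch_change *out)
{
	assert(q != NULL && out != NULL);
	if (q->count == 0)
		return (false);
	*out = q->slots[q->head];
	q->head = (q->head + 1) % SKETCH_NSLOTS;
	q->count--;
	return (true);
}
```

The point of dropping on overrun is that requests already queued keep their order; the caller sees the failure and can decide what to do, which is also what the commit message hopes callers of ieee80211_new_state[_locked]() will eventually do with the returned errors.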
A commit in branch stable/14 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=e1d739471efdc6fe32af570e4bd07875a7e502ff commit e1d739471efdc6fe32af570e4bd07875a7e502ff Author: Bjoern A. Zeeb <bz@FreeBSD.org> AuthorDate: 2023-12-01 01:37:25 +0000 Commit: Bjoern A. Zeeb <bz@FreeBSD.org> CommitDate: 2024-02-18 18:31:13 +0000 tools/net80211: add mlme_assoc mlme_assoc is a tool to trigger net80211::ieee80211_sta_join1() calls which in certain conditions cause problems to the LinuxKPI 802.11 compat code (but also believed to possibly cause problems in case of race to other firmware based drivers). This has proven to be a good reproducer for the problem even on setups which otherwise could run for days without hitting it. Sponsored by: The FreeBSD Foundation PR: 271979 (cherry picked from commit 643d6dce6c1e39f067f8d0feea8615913b324891) tools/tools/net80211/mlme_assoc/Makefile (new) | 7 + tools/tools/net80211/mlme_assoc/README (new) | 51 ++++++ tools/tools/net80211/mlme_assoc/mlme_assoc.c (new) | 200 +++++++++++++++++++++ 3 files changed, 258 insertions(+)
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=184ccc414686ea32c64f063c081c7cc1adeae7c3 commit 184ccc414686ea32c64f063c081c7cc1adeae7c3 Author: Bjoern A. Zeeb <bz@FreeBSD.org> AuthorDate: 2024-02-03 16:33:56 +0000 Commit: Bjoern A. Zeeb <bz@FreeBSD.org> CommitDate: 2024-02-19 08:02:02 +0000 LinuxKPI: 802.11: band-aid for invalid state changes after (*iv_update_bss) With firmware based solutions we cannot just jump from an active session to a new iv_bss node without tearing down state for the old and bringing up the new node. This likely used to work on softmac based cards/drivers where one could essentially set the state and fire at will. We track (*iv_update_bss) calls from net80211 and set a local flag that we are out of synch and do not allow any further operations up the state machine until we hit INIT or SCAN. That means someone will take the state down, clean up firmware state and then we can join again and build up state. Apparently this problem has been "known" for a while as native iwm(4) and others have similar workarounds (though less strict) and can be equally pestered into bad states. For LinuxKPI all the KASSERTs just massively brought this problem out. The solution will be some rewrites in net80211. Until then, try to keep us more stable at least and not die on second join1() calls triggered by service netif start wlan0 and similar. PR: 271979, 271988, 275255, 263613, 274003 Sponsored by: The FreeBSD Foundation (2023, partial) Reviewed by: cc Differential Revision: https://reviews.freebsd.org/D43725 (cherry picked from commit 2ac8a2189ac6707f48f77ef2e36baf696a0d2f40) sys/compat/linuxkpi/common/src/linux_80211.c | 309 +++++++++++++++++++-------- sys/compat/linuxkpi/common/src/linux_80211.h | 2 + 2 files changed, 216 insertions(+), 95 deletions(-)
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=a7e1fc7f620d3341549c1380f550aaafbdb45622 commit a7e1fc7f620d3341549c1380f550aaafbdb45622 Author: Bjoern A. Zeeb <bz@FreeBSD.org> AuthorDate: 2024-01-10 10:14:16 +0000 Commit: Bjoern A. Zeeb <bz@FreeBSD.org> CommitDate: 2024-02-19 08:02:01 +0000 net80211: deal with lost state transitions Since 5efea30f039c4 we can possibly lose a state transition which can cause trouble further down the road. The reproducer from 643d6dce6c1e can trigger these for example. Drivers for firmware based wireless cards have worked around some of this (and other) problems in the past. Add an array of tasks rather than a single one as we would simply get npending > 1 and lose order with other tasks. Try to keep state changes updated as queued in case we end up with more than one at a time. While this is not ideal either (call it a hack) it will sort the problem for now. We will queue in ieee80211_new_state_locked() and do checks there and dequeue in ieee80211_newstate_cb(). If we still overrun the (currently) 8 slots we will drop the state change rather than overwrite the last one. When dequeing we will update iv_nstate and keep it around for historic reasons for the moment. The longer term we should make the callers of ieee80211_new_state[_locked]() actually use the returned errors and act appropriately but that will touch a lot more places and drivers (possibly incl. changed behaviour for ioctls). rtwn(4) and rum(4) should probably be revisted and net80211 internals removed (for rum(4) at least the current logic still seems prone to races). 
PR: 271979, 271988, 275255, 263613, 274003 Sponsored by: The FreeBSD Foundation (in 2023) Reviewed by: cc Differential Revision: https://reviews.freebsd.org/D43389 (cherry picked from commit 713db49d06deee90dd358b2e4b9ca05368a5eaf6) Given this changes the internal structure of 'struct ieee80211vap', which gets allocated by the drivers, and we do not have enough spares, all wireless drivers need to be recompiled. Given we are forced to do the update, we leave fields in the middle of the struct and add more spares at the same time. __FreeBSD_version gets updated to 1303501 to be able to detect this change. (cherry picked from commit a890a3a5ddf33acb0a4000885945b89156799b07) UPDATING | 6 ++ sys/dev/rtwn/if_rtwn.c | 4 +- sys/dev/usb/wlan/if_rum.c | 4 +- sys/net80211/ieee80211.c | 4 +- sys/net80211/ieee80211_ddb.c | 15 ++++- sys/net80211/ieee80211_proto.c | 124 ++++++++++++++++++++++++++++++++++------- sys/net80211/ieee80211_var.h | 18 +++--- sys/sys/param.h | 2 +- 8 files changed, 143 insertions(+), 34 deletions(-)
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=135f22ad82f6b5179f40123a8b0b743428146729 commit 135f22ad82f6b5179f40123a8b0b743428146729 Author: Bjoern A. Zeeb <bz@FreeBSD.org> AuthorDate: 2023-12-01 01:37:25 +0000 Commit: Bjoern A. Zeeb <bz@FreeBSD.org> CommitDate: 2024-02-19 08:01:59 +0000 tools/net80211: add mlme_assoc mlme_assoc is a tool to trigger net80211::ieee80211_sta_join1() calls which in certain conditions cause problems to the LinuxKPI 802.11 compat code (but also believed to possibly cause problems in case of race to other firmware based drivers). This has proven to be a good reproducer for the problem even on setups which otherwise could run for days without hitting it. Sponsored by: The FreeBSD Foundation PR: 271979 (cherry picked from commit 643d6dce6c1e39f067f8d0feea8615913b324891) tools/tools/net80211/mlme_assoc/Makefile (new) | 7 + tools/tools/net80211/mlme_assoc/README (new) | 51 ++++++ tools/tools/net80211/mlme_assoc/mlme_assoc.c (new) | 200 +++++++++++++++++++++ 3 files changed, 258 insertions(+)
A commit in branch releng/13.3 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=9b998db87c28356fce21784c4f8bfb8737615e1f commit 9b998db87c28356fce21784c4f8bfb8737615e1f Author: Bjoern A. Zeeb <bz@FreeBSD.org> AuthorDate: 2024-01-10 10:14:16 +0000 Commit: Bjoern A. Zeeb <bz@FreeBSD.org> CommitDate: 2024-02-19 16:07:20 +0000 net80211: deal with lost state transitions Since 5efea30f039c4 we can possibly lose a state transition which can cause trouble further down the road. The reproducer from 643d6dce6c1e can trigger these for example. Drivers for firmware based wireless cards have worked around some of this (and other) problems in the past. Add an array of tasks rather than a single one as we would simply get npending > 1 and lose order with other tasks. Try to keep state changes updated as queued in case we end up with more than one at a time. While this is not ideal either (call it a hack) it will sort the problem for now. We will queue in ieee80211_new_state_locked() and do checks there and dequeue in ieee80211_newstate_cb(). If we still overrun the (currently) 8 slots we will drop the state change rather than overwrite the last one. When dequeing we will update iv_nstate and keep it around for historic reasons for the moment. The longer term we should make the callers of ieee80211_new_state[_locked]() actually use the returned errors and act appropriately but that will touch a lot more places and drivers (possibly incl. changed behaviour for ioctls). rtwn(4) and rum(4) should probably be revisted and net80211 internals removed (for rum(4) at least the current logic still seems prone to races). 
PR: 271979, 271988, 275255, 263613, 274003 Sponsored by: The FreeBSD Foundation (in 2023) Reviewed by: cc Differential Revision: https://reviews.freebsd.org/D43389 (cherry picked from commit 713db49d06deee90dd358b2e4b9ca05368a5eaf6) Given this changes the internal structure of 'struct ieee80211vap', which gets allocated by the drivers, and we do not have enough spares, all wireless drivers need to be recompiled. Given we are forced to do the update, we leave fields in the middle of the struct and add more spares at the same time. __FreeBSD_version will get updated to 1303001 to be able to detect this change. Approved by: re (cperciva) (cherry picked from commit a890a3a5ddf33acb0a4000885945b89156799b07) (cherry picked from commit a7e1fc7f620d3341549c1380f550aaafbdb45622) sys/dev/rtwn/if_rtwn.c | 4 +- sys/dev/usb/wlan/if_rum.c | 4 +- sys/net80211/ieee80211.c | 4 +- sys/net80211/ieee80211_ddb.c | 15 ++++- sys/net80211/ieee80211_proto.c | 124 ++++++++++++++++++++++++++++++++++------- sys/net80211/ieee80211_var.h | 18 +++--- 6 files changed, 136 insertions(+), 33 deletions(-)
A commit in branch releng/13.3 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=d4b4efc6db6c6c3a9abf2f187ba1ccc0e40028cf commit d4b4efc6db6c6c3a9abf2f187ba1ccc0e40028cf Author: Bjoern A. Zeeb <bz@FreeBSD.org> AuthorDate: 2024-02-03 16:33:56 +0000 Commit: Bjoern A. Zeeb <bz@FreeBSD.org> CommitDate: 2024-02-19 16:09:22 +0000 LinuxKPI: 802.11: band-aid for invalid state changes after (*iv_update_bss) With firmware based solutions we cannot just jump from an active session to a new iv_bss node without tearing down state for the old and bringing up the new node. This likely used to work on softmac based cards/drivers where one could essentially set the state and fire at will. We track (*iv_update_bss) calls from net80211 and set a local flag that we are out of synch and do not allow any further operations up the state machine until we hit INIT or SCAN. That means someone will take the state down, clean up firmware state and then we can join again and build up state. Apparently this problem has been "known" for a while as native iwm(4) and others have similar workarounds (though less strict) and can be equally pestered into bad states. For LinuxKPI all the KASSERTs just massively brought this problem out. The solution will be some rewrites in net80211. Until then, try to keep us more stable at least and not die on second join1() calls triggered by service netif start wlan0 and similar. Approved by: re (cperciva) PR: 271979, 271988, 275255, 263613, 274003 Sponsored by: The FreeBSD Foundation (2023, partial) Reviewed by: cc Differential Revision: https://reviews.freebsd.org/D43725 (cherry picked from commit 2ac8a2189ac6707f48f77ef2e36baf696a0d2f40) (cherry picked from commit 184ccc414686ea32c64f063c081c7cc1adeae7c3) sys/compat/linuxkpi/common/src/linux_80211.c | 309 +++++++++++++++++++-------- sys/compat/linuxkpi/common/src/linux_80211.h | 2 + 2 files changed, 216 insertions(+), 95 deletions(-)
The firmware crashes seen should have been gone for a while now. I believe the lkpi_sta_auth_to_scan panic is fixed in 15, 14, 13, and in 13.3 from RC1 on. I'll leave this open for a few more days. It would be great if some of the people who have seen this could confirm that it is no longer the case. FYI: suspend/resume is tracked in 263632 and the "Invalid TXQ id" is tracked in 274382; they are considered different problems. Please follow up there if there is any news on those.
Not sure if this belongs here or not, but I just installed and couldn't get Wi-Fi working in the installer. Upon booting and setting up ifconfig/wpa_supplicant, I'm constantly spammed by iwlwifi every few nanoseconds. The issue seems to be either that it is unable to detect the network or a problem logging in. The panic messages the driver spams tend to point in the direction of authentication. Other than the panic messages I'm getting, I haven't really found where the rest of the logs are.
(In reply to mark from comment #49) Which release/snapshot image did you try and which chipset do you have?
13.2, I think it's RC1? I have the Killer Wi-Fi 6 AX 200. I'll try 13.1 and report back if it's still not working. I was also getting some weird issues when trying to add the device, so I'm not sure if it's a related problem or something else.
(In reply to Bjoern A. Zeeb from comment #50) Sorry, that last comment was in response to your question. I didn't notice the reply button at first :p
(In reply to mark from comment #51) Try a 13.3-RC1 once it is out. 13.3-BETA3 was too early for some bug fixes. Or the Feb 22 14.0-STABLE snapshot image maybe: https://download.freebsd.org/snapshots/amd64/amd64/ISO-IMAGES/14.0/?C=M&O=D That should have all the latest bits. 13.2 or 13.1 won't do you much good.
(In reply to Bjoern A. Zeeb from comment #53) The latest snapshot worked. How do I get updates to work? Would I have to wait for the latest 14 release?
(In reply to mark from comment #54) No freebsd-update for stable, indeed. You'd have to wait for 14.1 or, alternatively, 13.3-R, which is due soon. Or you need to manually track stable for a while.
(In reply to Bjoern A. Zeeb from comment #55) Pressed the button too early. Fantastic news by the way and thanks for testing and reporting back! It's much appreciated.
We believe this is all fixed. In case this specific issue still shows up, please re-open. Thanks a lot to everyone for reporting and testing!