"meta-bug" for collecting all (new) issues after the iwlwifi driver and firmware update 2023-09-21.
guest VM: -current (commit: 8a77bc5e1b - past your 16e688b2a commit)
host: stable/14 host (Ryzen)

It came up fine but *crashed* on "service netif restart wlan0". In case it matters, I am running GENERIC-NODEBUG, wifi: Intel 9260
[Note: This is the first time it worked at all in a VM!]

dhclient exiting
Stopping wpa_supplicant.
Waiting for PIDS: 354iwlwifi0: Couldn't drain frames for staid 0, status 0x8
iwlwifi0: lkpi_iv_newstate: error -5 during state transition 5 (RUN) -> 0 (INIT)
.
Stopping Network: wlan0.
wlan0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=0
        ether xxxxxxxxxxxxx
        groups: wlan
        ssid "" channel 8 (2447 MHz 11g)
        regdomain FCC country US authmode OPEN privacy OFF txpower 30
        bmiss 7 scanvalid 60 protmode CTS wme
        parent interface: iwlwifi0
        media: IEEE 802.11 Wireless Ethernet autoselect (autoselect)
        status: no carrier
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
Sep 30 22:48:50 fbsd15 dhclient[666]: connection closed
Sep 30 22:48:50 fbsd15 dhclient[666]: exiting.

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x458
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80b27cac
stack pointer = 0x28:0xfffffe00748e29c0
frame pointer = 0x28:0xfffffe00748e2a40
code segment = base rx0, limit 0xfffff, type 0x1b
             = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 2378 (ifconfig)
rdi: fffffe0075f7c1b0 rsi: 0000000000000004 rdx: 0000000000000000
rcx: fffffe00759ab740 r8: 0000000000000000 r9: fffff80001054b98
rax: 0000000000000000 rbx: 0000000000000000 rbp: fffffe00748e2a40
r10: 0000000000000000 r11: fffff8000103dd70 r12: fffffe00748e29e0
r13: fffffe00759ab740 r14: 0000000000000000 r15: fffffe0075f7c1b0
trap number = 12
...
panic() at panic+0x43/frame 0xfffffe00748e2830
trap_fatal() at trap_fatal+0x40c/frame 0xfffffe00748e2890
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00748e28f0
calltrap() at calltrap+0x8/frame 0xfffffe00748e28f0
--- trap 0xc, rip = 0xffffffff80b27cac, rsp = 0xfffffe00748e29c0, rbp = 0xfffffe00748e2a40 ---
__mtx_lock_sleep() at __mtx_lock_sleep+0xbc/frame 0xfffffe00748e2a40
ieee80211_node_psq_drain() at ieee80211_node_psq_drain+0x100/frame 0xfffffe00748e2a90
node_cleanup() at node_cleanup+0x65/frame 0xfffffe00748e2ac0
node_free() at node_free+0x25/frame 0xfffffe00748e2ae0
ieee80211_node_vdetach() at ieee80211_node_vdetach+0x2b/frame 0xfffffe00748e2b00
ieee80211_vap_detach() at ieee80211_vap_detach+0x41d/frame 0xfffffe00748e2b40
lkpi_ic_vap_delete() at lkpi_ic_vap_delete+0x9d/frame 0xfffffe00748e2b80
wlan_clone_destroy() at wlan_clone_destroy+0x12/frame 0xfffffe00748e2b90
if_clone_destroy() at if_clone_destroy+0x91/frame 0xfffffe00748e2bd0
ifioctl() at ifioctl+0x899/frame 0xfffffe00748e2cc0
kern_ioctl() at kern_ioctl+0x255/frame 0xfffffe00748e2d30
sys_ioctl() at sys_ioctl+0x123/frame 0xfffffe00748e2e00
amd64_syscall() at amd64_syscall+0x109/frame 0xfffffe00748e2f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00748e2f30
--- syscall (54, FreeBSD ELF64, ioctl), rip = 0x1188e953e28a, rsp = 0x1188e5b34f68, rbp = 0x1188e5b34fa0 ---
KDB: enter: panic
[ thread pid 2378 tid 100283 ]
Stopped at kdb_enter+0x32: movq $0,0xe29e83(%rip)
db>
(In reply to Bakul Shah from comment #1)

Initially, looking at the backtrace, I thought it was a problem related to what I posted in the follow-up (backtraces may differ): https://lists.freebsd.org/archives/freebsd-wireless/2023-September/001449.html

But thanks to the full output being posted, the real problem comes out of iwlwifi:

iwlwifi0: Couldn't drain frames for staid 0, status 0x8

It's a FreeBSD-enhanced error message already. Given the status I added to that error, it seems the sta is already gone when we try to drain (ADD_STA_MODIFY_NON_EXISTING_STA -- driver requested to modify a station that doesn't exist). So after all it could be related to the node_free() problem in your backtrace, which I started tracing on Saturday. I'll follow up when that is supposed to be fixed and we'll see if this goes away then too.
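For anyone reading along, the status in that message can be decoded with a small lookup like the one below. This is only a sketch: the thread confirms that 0x8 corresponds to ADD_STA_MODIFY_NON_EXISTING_STA; the other three values are as I recall them from the upstream iwlwifi firmware API header and should be checked against the driver sources. The helper name add_sta_status_name() is made up for illustration.

```c
#include <stdint.h>

/*
 * ADD_STA response status values. Only 0x8 is confirmed by this thread;
 * the others are assumptions recalled from the upstream iwlwifi headers.
 */
enum add_sta_status {
	ADD_STA_SUCCESS                 = 0x1,
	ADD_STA_STATIONS_OVERLOAD       = 0x2,
	ADD_STA_IMMEDIATE_BA_FAILURE    = 0x4,
	ADD_STA_MODIFY_NON_EXISTING_STA = 0x8,
};

/* Hypothetical helper: map a status word to a printable name. */
static const char *
add_sta_status_name(uint32_t status)
{
	switch (status) {
	case ADD_STA_SUCCESS:
		return ("ADD_STA_SUCCESS");
	case ADD_STA_STATIONS_OVERLOAD:
		return ("ADD_STA_STATIONS_OVERLOAD");
	case ADD_STA_IMMEDIATE_BA_FAILURE:
		return ("ADD_STA_IMMEDIATE_BA_FAILURE");
	case ADD_STA_MODIFY_NON_EXISTING_STA:
		return ("ADD_STA_MODIFY_NON_EXISTING_STA");
	default:
		return ("unknown");
	}
}
```

Calling add_sta_status_name(0x8) gives the name quoted in the comment above, which is how "status 0x8" maps to "the station no longer exists".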
(In reply to Bjoern A. Zeeb from comment #2) Also seems to be related to PR 273985.
After exiting the VM, the pci slot has to be reset (from the host) before the interface works again. But even then it rarely worked. Typically I would see:

Oct 2 11:58:39 fbsd15 dhclient[999]: send_packet: No buffer space available
iwlwifi0: No beacon heard and the time event is over already...
iwlwifi0: Couldn't drain frames for staid 0, status 0x8
iwlwifi0: lkpi_iv_newstate: error -5 during state transition 5 (RUN) -> 0 (INIT)
iwlwifi0: Queue 5 is active on fifo 3 and stuck for 10000 ms. SW [2, 3] HW [2, 3] FH TRB=0x080305001
iwlwifi0: Microcode SW error detected. Restarting 0x0.
iwlwifi0: Start IWL Error Log Dump:
iwlwifi0: Transport status: 0x0000004A, valid: 6
iwlwifi0: Loaded firmware version: 46.ff18e32a.0 9260-th-b0-jf-b0-46.ucode
iwlwifi0: 0x00000084 | NMI_INTERRUPT_UNKNOWN
iwlwifi0: 0x00A022F0 | trm_hw_status0
iwlwifi0: 0x00000000 | trm_hw_status1
iwlwifi0: 0x00481CEE | branchlink2
etc.

On "service netif stop wlan0" the interface disappears. Then "service netif start wlan0" doesn't work and fails with:

iwlwifi0: lkpi_ic_vap_create: failed to start hw: 17
ifconfig: SIOCIFCREATE2 (wlan0): Input/output error

I then recompiled the kernel from scratch with debug symbols (by removing WITHOUT_DEBUG_FILES=yes MK_DEBUG_FILES=no). Now it never works, with the same symptoms.

And it panics after a while:

Fatal trap 9: general protection fault while in kernel mode
cpuid = 0; apic id = 00
instruction pointer = 0x20:0xffffffff81f855f2
stack pointer = 0x28:0xfffffe0074175de0
frame pointer = 0x28:0xfffffe0074175de0
code segment = base rx0, limit 0xfffff, type 0x1b
             = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 6 (txg_thread_enter)
rdi: fffffe006aa4d130 rsi: fffff80003592880 rdx: fffffe006aa4d140
rcx: 7fffffff00000000 r8: 0000000000000000 r9: fffffe0074176000
rax: fffff80004cfcd60 rbx: fffffe006aa4c000 rbp: fffffe0074175de0
r10: 0000000000000000 r11: 000000007fffdb42 r12: fffffe006aa4d130
r13: 0000000000000064 r14: fffffe006aa4d110 r15: fffffe006aaae020
trap number = 9
panic: general protection fault
cpuid = 0
time = 1696273625
KDB: stack backtrace:

I can try to capture more data now that I can use gdb from the host to the VM.
^Triage: * kern (component) and crash (keyword) for kernel panics. (In reply to Bjoern A. Zeeb from comment #0) > "meta-bug" for collecting all (new) issues after the iwlwifi > driver and firmware update 2023-09-21. Would you like people's different cases to be mixed into this one report, or separated out?
(In reply to Graham Perrin from comment #5) Feel free to put them here.
(In reply to Bakul Shah from comment #4)

(1) The "etc." bit is actually important.
(2) Given the firmware version, is this a 9xxx or an 8xxx? I cannot remember which device you had.
(3) "iwlwifi0: lkpi_ic_vap_create: failed to start hw: 17" is understood and mentioned in https://cgit.FreeBSD.org/src/commit/?id=dbf7691999abe501e0ebc0fe4d8d9e97718d3890
(In reply to Bjoern A. Zeeb from comment #7) # (3) should be fixed by https://cgit.FreeBSD.org/src/commit/?id=6c38c6b1b917957d420902213f318bf0153214f2
(2) Device: iwlwifi0@pci0:0:7:0: class=0x028000 rev=0x29 hdr=0x00 vendor=0x8086 device=0x2526 subvendor=0x8086 subdevice=0x0014 vendor = 'Intel Corporation' device = 'Wireless-AC 9260' class = network (1) What the system reports (I haven't applied your latest fix): Autoloading module: if_iwlwifi Intel(R) Wireless WiFi based driver for FreeBSD iwlwifi0: <iwlwifi> mem 0xc1034000-0xc1037fff at device 7.0 on pci0 iwlwifi0: Detected crf-id 0x2816, cnv-id 0x1000200 wfpm id 0x80000000 iwlwifi0: PCI dev 2526/0014, rev=0x321, rfid=0x105110 iwlwifi0: successfully loaded firmware image 'iwlwifi-9260-th-b0-jf-b0-46.ucode' iwlwifi0: WRT: Overriding region id 0 iwlwifi0: WRT: Overriding region id 1 iwlwifi0: WRT: Overriding region id 2 iwlwifi0: WRT: Overriding region id 3 iwlwifi0: WRT: Overriding region id 4 iwlwifi0: WRT: Overriding region id 6 iwlwifi0: WRT: Overriding region id 8 iwlwifi0: WRT: Overriding region id 9 iwlwifi0: WRT: Overriding region id 10 iwlwifi0: WRT: Overriding region id 11 iwlwifi0: WRT: Overriding region id 15 iwlwifi0: WRT: Overriding region id 16 iwlwifi0: WRT: Overriding region id 18 iwlwifi0: WRT: Overriding region id 19 iwlwifi0: WRT: Overriding region id 20 iwlwifi0: WRT: Overriding region id 21 iwlwifi0: WRT: Overriding region id 28 iwlwifi0: loaded firmware version 46.ff18e32a.0 9260-th-b0-jf-b0-46.ucode op_mode iwlmvm iwlwifi0: Detected Intel(R) Wireless-AC 9260 160MHz, REV=0x321 iwlwifi0: SecBoot CPU1 Status: 0x3000001, CPU2 Status: 0x0 iwlwifi0: WFPM_ARC1_PD_NOTIFICATION: 0x2f iwlwifi0: HPM_SECONDARY_DEVICE_STATE: 0x42 iwlwifi0: WFPM_MAC_OTP_CFG7_ADDR: 0x0 iwlwifi0: WFPM_MAC_OTP_CFG7_DATA: 0x4 iwlwifi0: UMAC PC: 0xc0080000 iwlwifi0: LMAC PC: 0x605dc iwlwifi0: WRT: Collecting data: ini trigger 13 fired (delay=0ms). 
iwlwifi0: Not valid error log pointer 0x00000000 for Init uCode iwlwifi0: IML/ROM dump: iwlwifi0: 0x00000000 | IML/ROM error/state iwlwifi0: 0x03000001 | IML/ROM data1 iwlwifi0: Fseq Registers: iwlwifi0: 0xE3667178 | FSEQ_ERROR_CODE iwlwifi0: 0x00000000 | FSEQ_TOP_INIT_VERSION iwlwifi0: 0xDCA958AE | FSEQ_CNVIO_INIT_VERSION iwlwifi0: 0x0000A371 | FSEQ_OTP_VERSION iwlwifi0: 0x3DDC2BB5 | FSEQ_TOP_CONTENT_VERSION iwlwifi0: 0xE6DEEC56 | FSEQ_ALIVE_TOKEN iwlwifi0: 0x2D435D4F | FSEQ_CNVI_ID iwlwifi0: 0x70DAA327 | FSEQ_CNVR_ID iwlwifi0: 0x01000200 | CNVI_AUX_MISC_CHIP iwlwifi0: 0x01300202 | CNVR_AUX_MISC_CHIP iwlwifi0: 0x0000485B | CNVR_SCU_SD_REGS_SD_REG_DIG_DCDC_VTRIM iwlwifi0: 0x0BADCAFE | CNVR_SCU_SD_REGS_SD_REG_ACTIVE_VDIG_MIRROR iwlwifi0: 0xC90E5BC2 | FSEQ_PREV_CNVIO_INIT_VERSION iwlwifi0: 0xEB17CF2B | FSEQ_WIFI_FSEQ_VERSION iwlwifi0: 0x03176B8B | FSEQ_BT_FSEQ_VERSION iwlwifi0: 0xC27A3CBE | FSEQ_CLASS_TP_VERSION iwlwifi0: Failed to start INIT ucode: -60 iwlwifi0: WRT: Collecting data: ini trigger 13 fired (delay=0ms). iwlwifi0: Failed to run INIT ucode: -60 iwlwifi0: retry init count 0 iwlwifi0: Detected Intel(R) Wireless-AC 9260 160MHz, REV=0x321 Invalid rxb from HW 0 iwlwifi0: Microcode SW error detected. Restarting 0x0. 
iwlwifi0: Not valid error log pointer 0x00000000 for Init uCode iwlwifi0: IML/ROM dump: iwlwifi0: 0x00002320 | IML/ROM error/state iwlwifi0: 0x00000003 | IML/ROM data1 iwlwifi0: Fseq Registers: iwlwifi0: 0xE3667178 | FSEQ_ERROR_CODE iwlwifi0: 0x00000000 | FSEQ_TOP_INIT_VERSION iwlwifi0: 0xDCA958AE | FSEQ_CNVIO_INIT_VERSION iwlwifi0: 0x0000A371 | FSEQ_OTP_VERSION iwlwifi0: 0x3DDC2BB5 | FSEQ_TOP_CONTENT_VERSION iwlwifi0: 0xE6DEEC56 | FSEQ_ALIVE_TOKEN iwlwifi0: 0x2D435D4F | FSEQ_CNVI_ID iwlwifi0: 0x70DAA327 | FSEQ_CNVR_ID iwlwifi0: 0x01000200 | CNVI_AUX_MISC_CHIP iwlwifi0: 0x01300202 | CNVR_AUX_MISC_CHIP iwlwifi0: 0x0000485B | CNVR_SCU_SD_REGS_SD_REG_DIG_DCDC_VTRIM iwlwifi0: 0x0BADCAFE | CNVR_SCU_SD_REGS_SD_REG_ACTIVE_VDIG_MIRROR iwlwifi0: 0xC90E5BC2 | FSEQ_PREV_CNVIO_INIT_VERSION iwlwifi0: 0xEB17CF2B | FSEQ_WIFI_FSEQ_VERSION iwlwifi0: 0x03176B8B | FSEQ_BT_FSEQ_VERSION iwlwifi0: 0xC27A3CBE | FSEQ_CLASS_TP_VERSION Invalid rxb from HW 0 iwlwifi0: SecBoot CPU1 Status: 0x3, CPU2 Status: 0x2320 iwlwifi0: WFPM_ARC1_PD_NOTIFICATION: 0x20 iwlwifi0: HPM_SECONDARY_DEVICE_STATE: 0x42 iwlwifi0: WFPM_MAC_OTP_CFG7_ADDR: 0x0 iwlwifi0: WFPM_MAC_OTP_CFG7_DATA: 0x4 iwlwifi0: UMAC PC: 0x8044f384 iwlwifi0: LMAC PC: 0xe8 iwlwifi0: WRT: Collecting data: ini trigger 13 fired (delay=0ms). iwlwifi0: Failed to start INIT ucode: -60 iwlwifi0: WRT: Collecting data: ini trigger 13 fired (delay=0ms). iwlwifi0: Failed to run INIT ucode: -60 iwlwifi0: retry init count 1 iwlwifi0: Detected Intel(R) Wireless-AC 9260 160MHz, REV=0x321 iwlwifi0: base HW address: 8c:a9:82:fc:e8:9c, OTP minor version: 0x4 ELF ldconfig path: /lib /usr/lib /usr/lib/compat /usr/local/lib /usr/local/lib/compat/pkg /usr/local/lib/compat/pkg /usr/local/lib/perl5/5.34/mach/CORE /usr/local/llvm15/lib 32-bit compatibility ldconfig path: /usr/lib32 Setting hostname: fbsd15.bitblocks.com. 
Setting up harvesting: PURE_RDRAND,[CALLOUT],[UMA],[FS_ATIME],SWI,INTERRUPT,NET_NG,[NET_ETHER],NET_TUN,MOUSE,KEYBOARD,ATTACH,CACHED Feeding entropy: . wlan0: Ethernet address: 8c:a9:82:fc:e8:9c Created wlan(4) interfaces: wlan0. lo0: link state changed to UP Starting wpa_supplicant. Starting Network: lo0 em0 wlan0. lo0: flags=1008049<UP,LOOPBACK,RUNNING,MULTICAST,LOWER_UP> metric 0 mtu 16384 options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6> inet 127.0.0.1 netmask 0xff000000 inet6 ::1 prefixlen 128 inet6 fe80::1%lo0 prefixlen 64 scopeid 0x2 groups: lo nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL> em0: flags=1008802<BROADCAST,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500 options=4e504bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,LRO,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6,HWSTATS,MEXTPG> ether 58:9c:fc:0c:17:bb media: Ethernet autoselect (1000baseT <full-duplex>) status: active nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL> ... panic: lkpi_sta_scan_to_auth: lsta 0xfffff800049b5000 state not NOTEXIST: 0x1 cpuid = 0 time = 1696343345 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00741dfb70 vpanic() at vpanic+0x132/frame 0xfffffe00741dfca0 panic() at panic+0x43/frame 0xfffffe00741dfd00 lkpi_sta_scan_to_auth() at lkpi_sta_scan_to_auth+0x602/frame 0xfffffe00741dfd80 lkpi_iv_newstate() at lkpi_iv_newstate+0x253/frame 0xfffffe00741dfdf0 ieee80211_newstate_cb() at ieee80211_newstate_cb+0x1e7/frame 0xfffffe00741dfe40 taskqueue_run_locked() at taskqueue_run_locked+0xab/frame 0xfffffe00741dfec0 taskqueue_thread_loop() at taskqueue_thread_loop+0xd3/frame 0xfffffe00741dfef0 fork_exit() at fork_exit+0x82/frame 0xfffffe00741dff30 fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00741dff30 --- trap 0x16, rip = 0x2fcfc427ccba, rsp = 0x2fcfca331f48, rbp = 0x2fcfca331f60 --- KDB: enter: panic [ thread pid 0 tid 100264 ] Stopped at kdb_enter+0x32: movq $0,0xe2aa13(%rip) db>
Running the latest kernel. Now it panics every time. Resetting the pci slot on the host doesn't help. The system does come up multiuser but panics after a few seconds.

Autoloading module: if_iwlwifi
Intel(R) Wireless WiFi based driver for FreeBSD
iwlwifi0: <iwlwifi> mem 0xc1034000-0xc1037fff at device 7.0 on pci0
iwlwifi0: Detected crf-id 0x2816, cnv-id 0x1000200 wfpm id 0x80000000
iwlwifi0: PCI dev 2526/0014, rev=0x321, rfid=0x105110
iwlwifi0: successfully loaded firmware image 'iwlwifi-9260-th-b0-jf-b0-46.ucode'
iwlwifi0: WRT: Overriding region id 0
iwlwifi0: WRT: Overriding region id 1
iwlwifi0: WRT: Overriding region id 2
iwlwifi0: WRT: Overriding region id 3
iwlwifi0: WRT: Overriding region id 4
iwlwifi0: WRT: Overriding region id 6
iwlwifi0: WRT: Overriding region id 8
iwlwifi0: WRT: Overriding region id 9
iwlwifi0: WRT: Overriding region id 10
iwlwifi0: WRT: Overriding region id 11
iwlwifi0: WRT: Overriding region id 15
iwlwifi0: WRT: Overriding region id 16
iwlwifi0: WRT: Overriding region id 18
iwlwifi0: WRT: Overriding region id 19
iwlwifi0: WRT: Overriding region id 20
iwlwifi0: WRT: Overriding region id 21
iwlwifi0: WRT: Overriding region id 28
iwlwifi0: loaded firmware version 46.ff18e32a.0 9260-th-b0-jf-b0-46.ucode op_mode iwlmvm
iwlwifi0: Detected Intel(R) Wireless-AC 9260 160MHz, REV=0x321
iwlwifi0: SecBoot CPU1 Status: 0xa5a5a5a2, CPU2 Status: 0xa5a5a5a2
iwlwifi0: WFPM_ARC1_PD_NOTIFICATION: 0xa5a5a5a2
iwlwifi0: HPM_SECONDARY_DEVICE_STATE: 0xa5a5a5a2
iwlwifi0: WFPM_MAC_OTP_CFG7_ADDR: 0xa5a5a5a2
iwlwifi0: WFPM_MAC_OTP_CFG7_DATA: 0xa5a5a5a2
iwlwifi0: UMAC PC: 0xa5a5a5a2
iwlwifi0: LMAC PC: 0xa5a5a5a2
iwlwifi0: WRT: Collecting data: ini trigger 13 fired (delay=0ms).
iwlwifi0: Not valid error log pointer 0x00000000 for Init uCode
iwlwifi0: Hardware error detected. Restarting.
iwlwifi0: IML/ROM dump: iwlwifi0: 0xA5A5 | IML/ROM SYSASSERT iwlwifi0: 0xA5A5A5A2 | IML/ROM error/state iwlwifi0: 0xA5A5A5A2 | IML/ROM data1 iwlwifi0: Fseq Registers: iwlwifi0: 0xA5A5A5A2 | FSEQ_ERROR_CODE iwlwifi0: 0xA5A5A5A2 | FSEQ_TOP_INIT_VERSION iwlwifi0: 0xA5A5A5A2 | FSEQ_CNVIO_INIT_VERSION iwlwifi0: 0xA5A5A5A2 | FSEQ_OTP_VERSION iwlwifi0: 0xA5A5A5A2 | FSEQ_TOP_CONTENT_VERSION iwlwifi0: 0xA5A5A5A2 | FSEQ_ALIVE_TOKEN iwlwifi0: 0xA5A5A5A2 | FSEQ_CNVI_ID iwlwifi0: 0xA5A5A5A2 | FSEQ_CNVR_ID iwlwifi0: 0xA5A5A5A2 | CNVI_AUX_MISC_CHIP iwlwifi0: 0xA5A5A5A2 | CNVR_AUX_MISC_CHIP iwlwifi0: 0xA5A5A5A2 | CNVR_SCU_SD_REGS_SD_REG_DIG_DCDC_VTRIM iwlwifi0: 0xA5A5A5A2 | CNVR_SCU_SD_REGS_SD_REG_ACTIVE_VDIG_MIRROR iwlwifi0: 0xA5A5A5A2 | FSEQ_PREV_CNVIO_INIT_VERSION iwlwifi0: 0xA5A5A5A2 | FSEQ_WIFI_FSEQ_VERSION iwlwifi0: 0xA5A5A5A2 | FSEQ_BT_FSEQ_VERSION iwlwifi0: 0xA5A5A5A2 | FSEQ_CLASS_TP_VERSION iwlwifi0: Failed to start INIT ucode: -60 iwlwifi0: WRT: Collecting data: ini trigger 13 fired (delay=0ms). iwlwifi0: Hardware error detected. Restarting. iwlwifi0: Hardware error detected. Restarting. iwlwifi0: WRT: Failed to dump region: id=1, type=10 iwlwifi0: Hardware error detected. Restarting. iwlwifi0: Hardware error detected. Restarting. iwlwifi0: Hardware error detected. Restarting. iwlwifi0: Hardware error detected. Restarting. iwlwifi0: Hardware error detected. Restarting. 
iwlwifi0: WRT: Failed to dump region: id=21, type=10 iwlwifi0: WRT: Failed to dump region: id=1, type=10 iwlwifi0: WRT: Failed to dump region: id=21, type=10 iwlwifi0: Failing on timeout while stopping DMA channel 8 [0xa5a5a5a2] iwlwifi0: Failed to run INIT ucode: -60 iwlwifi0: retry init count 0 iwlwifi0: Detected Intel(R) Wireless-AC 9260 160MHz, REV=0x321 iwlwifi0: base HW address: 8c:a9:82:fc:e8:9c, OTP minor version: 0x4 ELF ldconfig path: /lib /usr/lib /usr/lib/compat /usr/local/lib /usr/local/lib/compat/pkg /usr/local/lib/compat/pkg /usr/local/lib/perl5/5.34/mach/CORE /usr/local/llvm15/lib 32-bit compatibility ldconfig path: /usr/lib32 Setting hostname: fbsd15.bitblocks.com. Setting up harvesting: PURE_RDRAND,[CALLOUT],[UMA],[FS_ATIME],SWI,INTERRUPT,NET_NG,[NET_ETHER],NET_TUN,MOUSE,KEYBOARD,ATTACH,CACHED Feeding entropy: . wlan0: Ethernet address: 8c:a9:82:fc:e8:9c Created wlan(4) interfaces: wlan0. [comes up multiuser] Oct 4 10:20:14 fbsd15 ntpd[887]: error resolving pool 0.freebsd.pool.ntp.org: Name does not resolve (8) Oct 4 10:20:15 fbsd15 ntpd[887]: error resolving pool 2.freebsd.pool.ntp.org: Name does not resolve (8) iwlwifi0: No beacon heard and the time event is over already... 
Oct 4 10:20:21 fbsd15 wpa_supplicant[352]: ioctl[SIOCS80211, op=20, val=0, arg_len=7]: Can't assign requested address iwlwifi0: Couldn't drain frames for staid 0, status 0x8 iwlwifi0: lkpi_iv_newstate: error -5 during state transition 5 (RUN) -> 0 (INIT) panic: lkpi_sta_scan_to_auth: lsta 0xfffff80004419000 state not NOTEXIST: 0x1 cpuid = 0 time = 1696440022 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00741dfb70 vpanic() at vpanic+0x132/frame 0xfffffe00741dfca0 panic() at panic+0x43/frame 0xfffffe00741dfd00 lkpi_sta_scan_to_auth() at lkpi_sta_scan_to_auth+0x602/frame 0xfffffe00741dfd80 lkpi_iv_newstate() at lkpi_iv_newstate+0x253/frame 0xfffffe00741dfdf0 ieee80211_newstate_cb() at ieee80211_newstate_cb+0x1e7/frame 0xfffffe00741dfe40 taskqueue_run_locked() at taskqueue_run_locked+0xab/frame 0xfffffe00741dfec0 taskqueue_thread_loop() at taskqueue_thread_loop+0xd3/frame 0xfffffe00741dfef0 fork_exit() at fork_exit+0x82/frame 0xfffffe00741dff30 fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00741dff30 --- trap 0x16, rip = 0x6bb99f17cba, rsp = 0x6bb9f531f48, rbp = 0x6bb9f531f60 --- KDB: enter: panic [ thread pid 0 tid 100264 ] Stopped at kdb_enter+0x32: movq $0,0xe2a9d3(%rip) db>
(In reply to Bakul Shah from comment #10)

Does this also happen if you power off your *host* for a few seconds and boot up again? Given you say you have to reset PCI on the host (and now that doesn't help either), I don't want to rule out a bhyve/passthru problem. Which FreeBSD version are you running on the host?

iwlwifi0: Detected Intel(R) Wireless-AC 9260 160MHz, REV=0x321
iwlwifi0: SecBoot CPU1 Status: 0xa5a5a5a2, CPU2 Status: 0xa5a5a5a2
                               ^^^^^^^^^^^^^
0xa5a5a5a2 everywhere is a different problem, highly likely unrelated to iwlwifi or linuxkpi.
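A quick way to flag a dump like this when triaging logs is to check whether every register value in it is identical. The helper below is purely illustrative (the name dump_is_constant() and the idea of feeding it values parsed from the console output are assumptions, not driver code); it only says the dump carries no per-register information, not why.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical triage helper: returns true if all values in a register
 * dump are the same, as with the 0xa5a5a5a2 dump above.
 */
static bool
dump_is_constant(const uint32_t *regs, size_t n)
{
	size_t i;

	/* A single value (or none) cannot show a pattern. */
	if (n < 2)
		return (false);
	for (i = 1; i < n; i++) {
		if (regs[i] != regs[0])
			return (false);
	}
	return (true);
}
```

Fed the values from the failing boot, this returns true; fed the values from the earlier boot (0x3000001, 0x0, 0x2f, ...), it returns false.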
It came up fine after I rebooted the host. But crashed on first "service netif restart wlan0" service netif restart wlan0 Stopping wpa_supplicant. iwlwifi0: Couldn't drain frames for staid 0, status 0x8 iwlwifi0: lkpi_iv_newstate: error -5 during state transition 5 (RUN) -> 0 (INIT) Oct 6 07:33:31 fbsd15 dhclient[5603]: Interface wlan0 is down, dhclient exiting Oct 6 07:33:31 fbsd15 dhclient[5603]: connection closed Oct 6 07:33:31 fbsd15 dhclient[5603]: exiting. Stopping Network: wlan0. wlan0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500 options=0 ether 8c:a9:82:fc:e8:9c groups: wlan ssid "" channel 8 (2447 MHz 11g) regdomain FCC country US authmode OPEN privacy OFF txpower 30 bmiss 7 scanvalid 60 protmode CTS wme parent interface: iwlwifi0 media: IEEE 802.11 Wireless Ethernet autoselect (autoselect) status: no carrier nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL> Fatal trap 9: general protection fault while in kernel mode cpuid = 0; apic id = 00 instruction pointer = 0x20:0xffffffff80cf1871 stack pointer = 0x28:0xfffffe0074ad8ab0 frame pointer = 0x28:0xfffffe0074ad8ac0 code segment = base rx0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 17317 (ifconfig) rdi: fffffe007cb81000 rsi: fffff8000103daa0 rdx: 0000000000000005 rcx: fffff8007e667c80 r8: fffff800bac2d788 r9: 00000000baba7800 rax: deadc0dedeadc0de rbx: fffffe007cb81000 rbp: fffffe0074ad8ac0 r10: 0000000000000000 r11: 0000000000010000 r12: fffffe007547c038 r13: fffffe007547c000 r14: deadc0dedeadc0de r15: fffff80001773800 trap number = 9 panic: general protection fault cpuid = 0 time = 1696602811 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0074ad87f0 vpanic() at vpanic+0x132/frame 0xfffffe0074ad8920 panic() at panic+0x43/frame 0xfffffe0074ad8980 trap_fatal() at trap_fatal+0x40c/frame 0xfffffe0074ad89e0 calltrap() at calltrap+0x8/frame 0xfffffe0074ad89e0 
--- trap 0x9, rip = 0xffffffff80cf1871, rsp = 0xfffffe0074ad8ab0, rbp = 0xfffffe0074ad8ac0 --- node_free() at node_free+0x11/frame 0xfffffe0074ad8ac0 ieee80211_node_vdetach() at ieee80211_node_vdetach+0x2b/frame 0xfffffe0074ad8ae0 ieee80211_vap_detach() at ieee80211_vap_detach+0x612/frame 0xfffffe0074ad8b20 lkpi_ic_vap_delete() at lkpi_ic_vap_delete+0xae/frame 0xfffffe0074ad8b50 wlan_clone_destroy() at wlan_clone_destroy+0x12/frame 0xfffffe0074ad8b60 if_clone_destroyif_flags() at if_clone_destroyif_flags+0x6a/frame 0xfffffe0074ad8ba0 if_clone_destroy() at if_clone_destroy+0x100/frame 0xfffffe0074ad8be0 ifioctl() at ifioctl+0x8a5/frame 0xfffffe0074ad8cd0 kern_ioctl() at kern_ioctl+0x286/frame 0xfffffe0074ad8d30 sys_ioctl() at sys_ioctl+0x152/frame 0xfffffe0074ad8e00 amd64_syscall() at amd64_syscall+0x153/frame 0xfffffe0074ad8f30 fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe0074ad8f30 --- syscall (54, FreeBSD ELF64, ioctl), rip = 0x3c0f3a32629a, rsp = 0x3c0f37307a48, rbp = 0x3c0f37307a80 --- KDB: enter: panic [ thread pid 17317 tid 100521 ] Stopped at kdb_enter+0x32: movq $0,0xe2a953(%rip) db>
(In reply to Bakul Shah from comment #12)

That's the currently known one: a net80211 node reference count problem or a race. That's work in progress.
Using "bhyve -G 1234" I was able to glean some more details. "service netif restart wlan0" triggers a panic. The relevant part of the backtrace:

#16 <signal handler called>
#17 ieee80211_ratectl_node_deinit (ni=0xfffffe0075cb2000) at /home/FreeBSD/current/sys/net80211/ieee80211_ratectl.h:127
#18 node_free (ni=0xfffffe0075cb2000) at /home/FreeBSD/current/sys/net80211/ieee80211_node.c:1301
#19 0xffffffff80cf237b in ieee80211_node_vdetach (vap=vap@entry=0xfffffe00757e9010) at /home/FreeBSD/current/sys/net80211/ieee80211_node.c:206
#20 0xffffffff80cc52d2 in ieee80211_vap_detach (vap=vap@entry=0xfffffe00757e9010)

Poking around with gdb, notice the value of vap in frame 17 (shown below), which seems to be uninitialized! ni seems ok. Seems weird! Is this a race condition, some precondition not being checked, or gdb lying? I will continue looking. Should this be reported under some other bug number?

(gdb) f 17
#17 ieee80211_ratectl_node_deinit (ni=0xfffffe0075cb2000) at /home/FreeBSD/current/sys/net80211/ieee80211_ratectl.h:127
127             vap->iv_rate->ir_node_deinit(ni);
(gdb) l
122     static __inline void
123     ieee80211_ratectl_node_deinit(struct ieee80211_node *ni)
124     {
125             const struct ieee80211vap *vap = ni->ni_vap;
126
127             vap->iv_rate->ir_node_deinit(ni);
128     }
129
130     static int __inline
131     ieee80211_ratectl_rate(struct ieee80211_node *ni, void *arg, uint32_t iarg)
(gdb) p vap
$6 = (const struct ieee80211vap *) 0xdeadc0dedeadc0de
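A note on the 0xdeadc0dedeadc0de value: 0xdeadc0de is, as far as I know, the fill pattern FreeBSD debug kernels write into freed memory. Assuming that is what happened here, ni->ni_vap was read from a node that had already been freed, which would fit a reference count problem or race better than an uninitialized variable. A tiny check for that pattern, with a made-up helper name, could look like this:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed free-fill pattern; verify against the kernel you are running. */
#define	JUNK32	0xdeadc0deU

/*
 * Hypothetical helper: true if a 64-bit pointer value consists of the
 * junk pattern repeated in both 32-bit halves.
 */
static bool
looks_like_freed_ptr(uint64_t p)
{
	return ((uint32_t)p == JUNK32 && (uint32_t)(p >> 32) == JUNK32);
}
```

With the values from the gdb session, vap (0xdeadc0dedeadc0de) matches while ni (0xfffffe0075cb2000) does not.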
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=0936c648ad0ee5152dc19f261e77fe9c1833fe05 commit 0936c648ad0ee5152dc19f261e77fe9c1833fe05 Author: Bjoern A. Zeeb <bz@FreeBSD.org> AuthorDate: 2024-02-05 14:51:08 +0000 Commit: Bjoern A. Zeeb <bz@FreeBSD.org> CommitDate: 2024-02-14 19:48:04 +0000 LinuxKPI: 802.11: update the ni/lsta reference cycle Update the ni/lsta reference cycle, add extra checks and assertions. This is to accomodate problems we were seeing based on net80211 behaviour (join1() and (*iv_update_bss)() as well as state changes for new iv_bss nodes during an active session). This should hopefully help to stabilise behaviour until the underlying problems gets properly addressed (for this and all other device drivers). PR: 272607, 273985, 274003 MFC after: 3 days Reviewed by: cc Differential Revision: https://reviews.freebsd.org/D43753 sys/compat/linuxkpi/common/src/linux_80211.c | 209 +++++++++++++++++---------- sys/compat/linuxkpi/common/src/linux_80211.h | 1 + 2 files changed, 130 insertions(+), 80 deletions(-)
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=2ac8a2189ac6707f48f77ef2e36baf696a0d2f40 commit 2ac8a2189ac6707f48f77ef2e36baf696a0d2f40 Author: Bjoern A. Zeeb <bz@FreeBSD.org> AuthorDate: 2024-02-03 16:33:56 +0000 Commit: Bjoern A. Zeeb <bz@FreeBSD.org> CommitDate: 2024-02-14 19:47:53 +0000 LinuxKPI: 802.11: band-aid for invalid state changes after (*iv_update_bss) With firmware based solutions we cannot just jump from an active session to a new iv_bss node without tearing down state for the old and bringing up the new node. This likely used to work on softmac based cards/drivers where one could essentially set the state and fire at will. We track (*iv_update_bss) calls from net80211 and set a local flag that we are out of synch and do not allow any further operations up the state machine until we hit INIT or SCAN. That means someone will take the state down, clean up firmware state and then we can join again and build up state. Apparently this problem has been "known" for a while as native iwm(4) and others have similar workarounds (though less strict) and can be equally pestered into bad states. For LinuxKPI all the KASSERTs just massively brought this problem out. The solution will be some rewrites in net80211. Until then, try to keep us more stable at least and not die on second join1() calls triggered by service netif start wlan0 and similar. PR: 271979, 271988, 275255, 263613, 274003 Sponsored by: The FreeBSD Foundation (2023, partial) MFC after: 3 days Reviewed by: cc Differential Revision: https://reviews.freebsd.org/D43725 sys/compat/linuxkpi/common/src/linux_80211.c | 309 +++++++++++++++++++-------- sys/compat/linuxkpi/common/src/linux_80211.h | 2 + 2 files changed, 216 insertions(+), 95 deletions(-)
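The band-aid described in this commit message can be pictured roughly as follows. This is a minimal sketch of the idea only, not the code from linux_80211.c: the struct, the function names and the simplified rule (refuse everything except INIT and SCAN while out of sync) are assumptions for illustration. The state numbering follows the log output in this report (0 = INIT, 5 = RUN); the values in between are assumed.

```c
#include <stdbool.h>

/* Simplified state numbering; 0 (INIT) and 5 (RUN) match the logs above. */
enum sketch_state {
	S_INIT = 0, S_SCAN = 1, S_AUTH = 2, S_ASSOC = 3, S_CAC = 4, S_RUN = 5
};

/* Hypothetical per-vap bookkeeping. */
struct vap_sketch {
	enum sketch_state cur;
	bool out_of_sync;	/* set after an (*iv_update_bss) call */
};

/* Called when net80211 swaps the bss node under an active session. */
static void
sketch_update_bss(struct vap_sketch *v)
{
	v->out_of_sync = true;
}

/* Returns 0 if the transition is accepted, -1 if it is refused. */
static int
sketch_newstate(struct vap_sketch *v, enum sketch_state next)
{
	/* Going down to INIT or SCAN tears down state and resyncs. */
	if (next == S_INIT || next == S_SCAN) {
		v->out_of_sync = false;
		v->cur = next;
		return (0);
	}
	/* While out of sync, do not move up the state machine. */
	if (v->out_of_sync)
		return (-1);
	v->cur = next;
	return (0);
}
```

The point of the design is that a second join while in RUN no longer reaches the firmware with stale state; the session has to be taken down first and rebuilt from INIT or SCAN.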
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=713db49d06deee90dd358b2e4b9ca05368a5eaf6 commit 713db49d06deee90dd358b2e4b9ca05368a5eaf6 Author: Bjoern A. Zeeb <bz@FreeBSD.org> AuthorDate: 2024-01-10 10:14:16 +0000 Commit: Bjoern A. Zeeb <bz@FreeBSD.org> CommitDate: 2024-02-14 19:47:21 +0000 net80211: deal with lost state transitions Since 5efea30f039c4 we can possibly lose a state transition which can cause trouble further down the road. The reproducer from 643d6dce6c1e can trigger these for example. Drivers for firmware based wireless cards have worked around some of this (and other) problems in the past. Add an array of tasks rather than a single one as we would simply get npending > 1 and lose order with other tasks. Try to keep state changes updated as queued in case we end up with more than one at a time. While this is not ideal either (call it a hack) it will sort the problem for now. We will queue in ieee80211_new_state_locked() and do checks there and dequeue in ieee80211_newstate_cb(). If we still overrun the (currently) 8 slots we will drop the state change rather than overwrite the last one. When dequeing we will update iv_nstate and keep it around for historic reasons for the moment. The longer term we should make the callers of ieee80211_new_state[_locked]() actually use the returned errors and act appropriately but that will touch a lot more places and drivers (possibly incl. changed behaviour for ioctls). rtwn(4) and rum(4) should probably be revisted and net80211 internals removed (for rum(4) at least the current logic still seems prone to races). 
PR: 271979, 271988, 275255, 263613, 274003 Sponsored by: The FreeBSD Foundation (in 2023) MFC after: 3 days Reviewed by: cc Differential Revision: https://reviews.freebsd.org/D43389 sys/dev/rtwn/if_rtwn.c | 4 +- sys/dev/usb/wlan/if_rum.c | 4 +- sys/net80211/ieee80211.c | 4 +- sys/net80211/ieee80211_ddb.c | 13 ++++- sys/net80211/ieee80211_proto.c | 124 ++++++++++++++++++++++++++++++++++------- sys/net80211/ieee80211_var.h | 13 ++++- 6 files changed, 134 insertions(+), 28 deletions(-)
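The queueing scheme in this commit message (a fixed number of slots, enqueue when a state change is requested, dequeue in the callback, drop the new request on overrun rather than overwrite the last one) can be sketched as a small ring buffer. The names below are hypothetical and the real change uses an array of tasks in net80211; only the slot count of 8 and the drop-on-overrun behaviour are taken from the message.

```c
#include <stdbool.h>

#define	NSLOTS	8	/* "(currently) 8 slots" per the commit message */

/* Hypothetical queue of pending state changes, oldest first. */
struct pending_q {
	int	 slot[NSLOTS];
	unsigned head;		/* index of the oldest entry */
	unsigned count;		/* number of queued entries */
};

/* Queue a state change; on overrun drop it instead of overwriting. */
static bool
pending_enqueue(struct pending_q *q, int nstate)
{
	if (q->count == NSLOTS)
		return (false);
	q->slot[(q->head + q->count) % NSLOTS] = nstate;
	q->count++;
	return (true);
}

/* Take the oldest queued state change, preserving request order. */
static bool
pending_dequeue(struct pending_q *q, int *nstate)
{
	if (q->count == 0)
		return (false);
	*nstate = q->slot[q->head];
	q->head = (q->head + 1) % NSLOTS;
	q->count--;
	return (true);
}
```

Compared with a single pending task, this keeps every requested transition and its order until the slots run out, which is the "lost state transition" case the commit is about.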
A commit in branch stable/14 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=12887199b37469c98a47baf66cd3cc182c79fbd6 commit 12887199b37469c98a47baf66cd3cc182c79fbd6 Author: Bjoern A. Zeeb <bz@FreeBSD.org> AuthorDate: 2024-02-05 14:51:08 +0000 Commit: Bjoern A. Zeeb <bz@FreeBSD.org> CommitDate: 2024-02-18 18:31:17 +0000 LinuxKPI: 802.11: update the ni/lsta reference cycle Update the ni/lsta reference cycle, add extra checks and assertions. This is to accomodate problems we were seeing based on net80211 behaviour (join1() and (*iv_update_bss)() as well as state changes for new iv_bss nodes during an active session). This should hopefully help to stabilise behaviour until the underlying problems gets properly addressed (for this and all other device drivers). PR: 272607, 273985, 274003 Reviewed by: cc Differential Revision: https://reviews.freebsd.org/D43753 (cherry picked from commit 0936c648ad0ee5152dc19f261e77fe9c1833fe05) sys/compat/linuxkpi/common/src/linux_80211.c | 209 +++++++++++++++++---------- sys/compat/linuxkpi/common/src/linux_80211.h | 1 + 2 files changed, 130 insertions(+), 80 deletions(-)
A commit in branch stable/14 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=b392b36d3776b696601ce0253256803276d24ea2 commit b392b36d3776b696601ce0253256803276d24ea2 Author: Bjoern A. Zeeb <bz@FreeBSD.org> AuthorDate: 2024-01-10 10:14:16 +0000 Commit: Bjoern A. Zeeb <bz@FreeBSD.org> CommitDate: 2024-02-18 18:31:17 +0000 net80211: deal with lost state transitions Since 5efea30f039c4 we can possibly lose a state transition which can cause trouble further down the road. The reproducer from 643d6dce6c1e can trigger these for example. Drivers for firmware based wireless cards have worked around some of this (and other) problems in the past. Add an array of tasks rather than a single one as we would simply get npending > 1 and lose order with other tasks. Try to keep state changes updated as queued in case we end up with more than one at a time. While this is not ideal either (call it a hack) it will sort the problem for now. We will queue in ieee80211_new_state_locked() and do checks there and dequeue in ieee80211_newstate_cb(). If we still overrun the (currently) 8 slots we will drop the state change rather than overwrite the last one. When dequeing we will update iv_nstate and keep it around for historic reasons for the moment. The longer term we should make the callers of ieee80211_new_state[_locked]() actually use the returned errors and act appropriately but that will touch a lot more places and drivers (possibly incl. changed behaviour for ioctls). rtwn(4) and rum(4) should probably be revisted and net80211 internals removed (for rum(4) at least the current logic still seems prone to races). Given this changes the internal structure of 'struct ieee80211vap', which gets allocated by the drivers, and we do not have enough spares, all wireless drivers need to be recompiled. Given we are forced to do the update, we leave fields in the middle of the struct and add more spares at the same time. 
__FreeBSD_version gets updated to 1400509 to be able to detect this change. PR: 271979, 271988, 275255, 263613, 274003 Sponsored by: The FreeBSD Foundation (in 2023) Reviewed by: cc Differential Revision: https://reviews.freebsd.org/D43389 (cherry picked from commit 713db49d06deee90dd358b2e4b9ca05368a5eaf6) (cherry picked from commit a890a3a5ddf33acb0a4000885945b89156799b07) UPDATING | 6 ++ sys/dev/rtwn/if_rtwn.c | 4 +- sys/dev/usb/wlan/if_rum.c | 4 +- sys/net80211/ieee80211.c | 4 +- sys/net80211/ieee80211_ddb.c | 13 ++++- sys/net80211/ieee80211_proto.c | 124 ++++++++++++++++++++++++++++++++++------- sys/net80211/ieee80211_var.h | 15 +++-- sys/sys/param.h | 2 +- 8 files changed, 142 insertions(+), 30 deletions(-)
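For readers following along, the queueing described in the commit message above (a fixed number of slots, enqueue on request, dequeue in order, drop on overrun rather than overwrite) can be sketched roughly as follows. This is only an illustration: the type and function names (nstate_queue, nstate_enqueue, nstate_dequeue, the S_* states) are hypothetical and are not the actual net80211 fields or functions, and locking and the taskqueue are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch; not the real net80211 data structures. */
#define NSTATE_SLOTS 8	/* the commit message mentions (currently) 8 slots */

enum sketch_state { S_INIT, S_SCAN, S_AUTH, S_ASSOC, S_RUN };

struct nstate_queue {
	enum sketch_state slots[NSTATE_SLOTS];
	size_t head;	/* index of the oldest queued transition */
	size_t count;	/* number of transitions currently queued */
};

/* Queue a requested state; return false (drop it) if all slots are in use. */
bool
nstate_enqueue(struct nstate_queue *q, enum sketch_state s)
{
	if (q->count == NSTATE_SLOTS)
		return (false);	/* drop rather than overwrite the last one */
	q->slots[(q->head + q->count) % NSTATE_SLOTS] = s;
	q->count++;
	return (true);
}

/* Take the oldest requested state off the queue, preserving order. */
bool
nstate_dequeue(struct nstate_queue *q, enum sketch_state *out)
{
	if (q->count == 0)
		return (false);
	assert(q->head < NSTATE_SLOTS);
	*out = q->slots[q->head];
	q->head = (q->head + 1) % NSTATE_SLOTS;
	q->count--;
	return (true);
}
```

In this sketch a caller that gets false back from nstate_enqueue() knows the request was dropped, which matches the commit message's remark that callers should eventually act on returned errors.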
A commit in branch stable/14 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=8c450ea1083b03f30871506b59034f26bc608972 commit 8c450ea1083b03f30871506b59034f26bc608972 Author: Bjoern A. Zeeb <bz@FreeBSD.org> AuthorDate: 2024-02-03 16:33:56 +0000 Commit: Bjoern A. Zeeb <bz@FreeBSD.org> CommitDate: 2024-02-18 18:31:17 +0000 LinuxKPI: 802.11: band-aid for invalid state changes after (*iv_update_bss) With firmware based solutions we cannot just jump from an active session to a new iv_bss node without tearing down state for the old and bringing up the new node. This likely used to work on softmac based cards/drivers where one could essentially set the state and fire at will. We track (*iv_update_bss) calls from net80211 and set a local flag that we are out of synch and do not allow any further operations up the state machine until we hit INIT or SCAN. That means someone will take the state down, clean up firmware state and then we can join again and build up state. Apparently this problem has been "known" for a while as native iwm(4) and others have similar workarounds (though less strict) and can be equally pestered into bad states. For LinuxKPI all the KASSERTs just massively brought this problem out. The solution will be some rewrites in net80211. Until then, try to keep us more stable at least and not die on second join1() calls triggered by service netif start wlan0 and similar. PR: 271979, 271988, 275255, 263613, 274003 Sponsored by: The FreeBSD Foundation (2023, partial) Reviewed by: cc Differential Revision: https://reviews.freebsd.org/D43725 (cherry picked from commit 2ac8a2189ac6707f48f77ef2e36baf696a0d2f40) sys/compat/linuxkpi/common/src/linux_80211.c | 309 +++++++++++++++++++-------- sys/compat/linuxkpi/common/src/linux_80211.h | 2 + 2 files changed, 216 insertions(+), 95 deletions(-)
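As a rough illustration of the band-aid described in the commit message above: a flag is set when the bss changes under an active session, and transitions up the state machine are refused until the state goes back to INIT or SCAN. All names here (sketch_vap, sketch_update_bss, sketch_newstate, the B_* states) are hypothetical, the choice of EBUSY as the error is arbitrary, and treating "active session" as "state beyond SCAN" is an assumption; the real LinuxKPI code is considerably more involved.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch; not the real LinuxKPI 802.11 structures. */
enum bss_state { B_INIT, B_SCAN, B_AUTH, B_ASSOC, B_RUN };

struct sketch_vap {
	enum bss_state state;
	bool bss_out_of_sync;	/* set when the bss changed mid-session */
};

/* Called when the stack swaps in a new bss node. */
void
sketch_update_bss(struct sketch_vap *v)
{
	assert(v != NULL);
	/* Assumption: only a change during an active session is a problem. */
	if (v->state > B_SCAN)
		v->bss_out_of_sync = true;
}

/* Returns 0 on success, EBUSY if the transition is refused. */
int
sketch_newstate(struct sketch_vap *v, enum bss_state nstate)
{
	assert(v != NULL);
	if (nstate == B_INIT || nstate == B_SCAN) {
		/* State was taken down; we may join again afterwards. */
		v->bss_out_of_sync = false;
		v->state = nstate;
		return (0);
	}
	if (v->bss_out_of_sync && nstate > v->state)
		return (EBUSY);	/* no moves up the state machine */
	v->state = nstate;
	return (0);
}
```

The point of the sketch is only the ordering constraint: once out of sync, something has to take the state down to INIT or SCAN before a new join can build state up again.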
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=184ccc414686ea32c64f063c081c7cc1adeae7c3 commit 184ccc414686ea32c64f063c081c7cc1adeae7c3 Author: Bjoern A. Zeeb <bz@FreeBSD.org> AuthorDate: 2024-02-03 16:33:56 +0000 Commit: Bjoern A. Zeeb <bz@FreeBSD.org> CommitDate: 2024-02-19 08:02:02 +0000 LinuxKPI: 802.11: band-aid for invalid state changes after (*iv_update_bss) With firmware based solutions we cannot just jump from an active session to a new iv_bss node without tearing down state for the old and bringing up the new node. This likely used to work on softmac based cards/drivers where one could essentially set the state and fire at will. We track (*iv_update_bss) calls from net80211 and set a local flag that we are out of synch and do not allow any further operations up the state machine until we hit INIT or SCAN. That means someone will take the state down, clean up firmware state and then we can join again and build up state. Apparently this problem has been "known" for a while as native iwm(4) and others have similar workarounds (though less strict) and can be equally pestered into bad states. For LinuxKPI all the KASSERTs just massively brought this problem out. The solution will be some rewrites in net80211. Until then, try to keep us more stable at least and not die on second join1() calls triggered by service netif start wlan0 and similar. PR: 271979, 271988, 275255, 263613, 274003 Sponsored by: The FreeBSD Foundation (2023, partial) Reviewed by: cc Differential Revision: https://reviews.freebsd.org/D43725 (cherry picked from commit 2ac8a2189ac6707f48f77ef2e36baf696a0d2f40) sys/compat/linuxkpi/common/src/linux_80211.c | 309 +++++++++++++++++++-------- sys/compat/linuxkpi/common/src/linux_80211.h | 2 + 2 files changed, 216 insertions(+), 95 deletions(-)
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=a7e1fc7f620d3341549c1380f550aaafbdb45622 commit a7e1fc7f620d3341549c1380f550aaafbdb45622 Author: Bjoern A. Zeeb <bz@FreeBSD.org> AuthorDate: 2024-01-10 10:14:16 +0000 Commit: Bjoern A. Zeeb <bz@FreeBSD.org> CommitDate: 2024-02-19 08:02:01 +0000 net80211: deal with lost state transitions Since 5efea30f039c4 we can possibly lose a state transition which can cause trouble further down the road. The reproducer from 643d6dce6c1e can trigger these for example. Drivers for firmware based wireless cards have worked around some of this (and other) problems in the past. Add an array of tasks rather than a single one as we would simply get npending > 1 and lose order with other tasks. Try to keep state changes updated as queued in case we end up with more than one at a time. While this is not ideal either (call it a hack) it will sort the problem for now. We will queue in ieee80211_new_state_locked() and do checks there and dequeue in ieee80211_newstate_cb(). If we still overrun the (currently) 8 slots we will drop the state change rather than overwrite the last one. When dequeing we will update iv_nstate and keep it around for historic reasons for the moment. The longer term we should make the callers of ieee80211_new_state[_locked]() actually use the returned errors and act appropriately but that will touch a lot more places and drivers (possibly incl. changed behaviour for ioctls). rtwn(4) and rum(4) should probably be revisted and net80211 internals removed (for rum(4) at least the current logic still seems prone to races). 
PR: 271979, 271988, 275255, 263613, 274003 Sponsored by: The FreeBSD Foundation (in 2023) Reviewed by: cc Differential Revision: https://reviews.freebsd.org/D43389 (cherry picked from commit 713db49d06deee90dd358b2e4b9ca05368a5eaf6) Given this changes the internal structure of 'struct ieee80211vap', which gets allocated by the drivers, and we do not have enough spares, all wireless drivers need to be recompiled. Given we are forced to do the update, we leave fields in the middle of the struct and add more spares at the same time. __FreeBSD_version gets updated to 1303501 to be able to detect this change. (cherry picked from commit a890a3a5ddf33acb0a4000885945b89156799b07) UPDATING | 6 ++ sys/dev/rtwn/if_rtwn.c | 4 +- sys/dev/usb/wlan/if_rum.c | 4 +- sys/net80211/ieee80211.c | 4 +- sys/net80211/ieee80211_ddb.c | 15 ++++- sys/net80211/ieee80211_proto.c | 124 ++++++++++++++++++++++++++++++++++------- sys/net80211/ieee80211_var.h | 18 +++--- sys/sys/param.h | 2 +- 8 files changed, 143 insertions(+), 34 deletions(-)
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=223edc1a3c2fc86dbc7fa0ecd00f26a85d7c7b43 commit 223edc1a3c2fc86dbc7fa0ecd00f26a85d7c7b43 Author: Bjoern A. Zeeb <bz@FreeBSD.org> AuthorDate: 2024-02-05 14:51:08 +0000 Commit: Bjoern A. Zeeb <bz@FreeBSD.org> CommitDate: 2024-02-19 08:02:02 +0000 LinuxKPI: 802.11: update the ni/lsta reference cycle Update the ni/lsta reference cycle, add extra checks and assertions. This is to accomodate problems we were seeing based on net80211 behaviour (join1() and (*iv_update_bss)() as well as state changes for new iv_bss nodes during an active session). This should hopefully help to stabilise behaviour until the underlying problems gets properly addressed (for this and all other device drivers). PR: 272607, 273985, 274003 Reviewed by: cc Differential Revision: https://reviews.freebsd.org/D43753 (cherry picked from commit 0936c648ad0ee5152dc19f261e77fe9c1833fe05) sys/compat/linuxkpi/common/src/linux_80211.c | 209 +++++++++++++++++---------- sys/compat/linuxkpi/common/src/linux_80211.h | 1 + 2 files changed, 130 insertions(+), 80 deletions(-)
A commit in branch releng/13.3 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=9b2da4bc5a68294bc1dcfdd0d0ccadf747bafd67 commit 9b2da4bc5a68294bc1dcfdd0d0ccadf747bafd67 Author: Bjoern A. Zeeb <bz@FreeBSD.org> AuthorDate: 2024-02-05 14:51:08 +0000 Commit: Bjoern A. Zeeb <bz@FreeBSD.org> CommitDate: 2024-02-19 16:09:22 +0000 LinuxKPI: 802.11: update the ni/lsta reference cycle Update the ni/lsta reference cycle, add extra checks and assertions. This is to accomodate problems we were seeing based on net80211 behaviour (join1() and (*iv_update_bss)() as well as state changes for new iv_bss nodes during an active session). This should hopefully help to stabilise behaviour until the underlying problems gets properly addressed (for this and all other device drivers). Approved by: re (cperciva) PR: 272607, 273985, 274003 Reviewed by: cc Differential Revision: https://reviews.freebsd.org/D43753 (cherry picked from commit 0936c648ad0ee5152dc19f261e77fe9c1833fe05) (cherry picked from commit 223edc1a3c2fc86dbc7fa0ecd00f26a85d7c7b43) sys/compat/linuxkpi/common/src/linux_80211.c | 209 +++++++++++++++++---------- sys/compat/linuxkpi/common/src/linux_80211.h | 1 + 2 files changed, 130 insertions(+), 80 deletions(-)
A commit in branch releng/13.3 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=d4b4efc6db6c6c3a9abf2f187ba1ccc0e40028cf commit d4b4efc6db6c6c3a9abf2f187ba1ccc0e40028cf Author: Bjoern A. Zeeb <bz@FreeBSD.org> AuthorDate: 2024-02-03 16:33:56 +0000 Commit: Bjoern A. Zeeb <bz@FreeBSD.org> CommitDate: 2024-02-19 16:09:22 +0000 LinuxKPI: 802.11: band-aid for invalid state changes after (*iv_update_bss) With firmware based solutions we cannot just jump from an active session to a new iv_bss node without tearing down state for the old and bringing up the new node. This likely used to work on softmac based cards/drivers where one could essentially set the state and fire at will. We track (*iv_update_bss) calls from net80211 and set a local flag that we are out of synch and do not allow any further operations up the state machine until we hit INIT or SCAN. That means someone will take the state down, clean up firmware state and then we can join again and build up state. Apparently this problem has been "known" for a while as native iwm(4) and others have similar workarounds (though less strict) and can be equally pestered into bad states. For LinuxKPI all the KASSERTs just massively brought this problem out. The solution will be some rewrites in net80211. Until then, try to keep us more stable at least and not die on second join1() calls triggered by service netif start wlan0 and similar. Approved by: re (cperciva) PR: 271979, 271988, 275255, 263613, 274003 Sponsored by: The FreeBSD Foundation (2023, partial) Reviewed by: cc Differential Revision: https://reviews.freebsd.org/D43725 (cherry picked from commit 2ac8a2189ac6707f48f77ef2e36baf696a0d2f40) (cherry picked from commit 184ccc414686ea32c64f063c081c7cc1adeae7c3) sys/compat/linuxkpi/common/src/linux_80211.c | 309 +++++++++++++++++++-------- sys/compat/linuxkpi/common/src/linux_80211.h | 2 + 2 files changed, 216 insertions(+), 95 deletions(-)
A commit in branch releng/13.3 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=9b998db87c28356fce21784c4f8bfb8737615e1f commit 9b998db87c28356fce21784c4f8bfb8737615e1f Author: Bjoern A. Zeeb <bz@FreeBSD.org> AuthorDate: 2024-01-10 10:14:16 +0000 Commit: Bjoern A. Zeeb <bz@FreeBSD.org> CommitDate: 2024-02-19 16:07:20 +0000 net80211: deal with lost state transitions Since 5efea30f039c4 we can possibly lose a state transition which can cause trouble further down the road. The reproducer from 643d6dce6c1e can trigger these for example. Drivers for firmware based wireless cards have worked around some of this (and other) problems in the past. Add an array of tasks rather than a single one as we would simply get npending > 1 and lose order with other tasks. Try to keep state changes updated as queued in case we end up with more than one at a time. While this is not ideal either (call it a hack) it will sort the problem for now. We will queue in ieee80211_new_state_locked() and do checks there and dequeue in ieee80211_newstate_cb(). If we still overrun the (currently) 8 slots we will drop the state change rather than overwrite the last one. When dequeing we will update iv_nstate and keep it around for historic reasons for the moment. The longer term we should make the callers of ieee80211_new_state[_locked]() actually use the returned errors and act appropriately but that will touch a lot more places and drivers (possibly incl. changed behaviour for ioctls). rtwn(4) and rum(4) should probably be revisted and net80211 internals removed (for rum(4) at least the current logic still seems prone to races). 
PR: 271979, 271988, 275255, 263613, 274003 Sponsored by: The FreeBSD Foundation (in 2023) Reviewed by: cc Differential Revision: https://reviews.freebsd.org/D43389 (cherry picked from commit 713db49d06deee90dd358b2e4b9ca05368a5eaf6) Given this changes the internal structure of 'struct ieee80211vap', which gets allocated by the drivers, and we do not have enough spares, all wireless drivers need to be recompiled. Given we are forced to do the update, we leave fields in the middle of the struct and add more spares at the same time. __FreeBSD_version will get updated to 1303001 to be able to detect this change. Approved by: re (cperciva) (cherry picked from commit a890a3a5ddf33acb0a4000885945b89156799b07) (cherry picked from commit a7e1fc7f620d3341549c1380f550aaafbdb45622) sys/dev/rtwn/if_rtwn.c | 4 +- sys/dev/usb/wlan/if_rum.c | 4 +- sys/net80211/ieee80211.c | 4 +- sys/net80211/ieee80211_ddb.c | 15 ++++- sys/net80211/ieee80211_proto.c | 124 ++++++++++++++++++++++++++++++++++------- sys/net80211/ieee80211_var.h | 18 +++--- 6 files changed, 136 insertions(+), 33 deletions(-)
I believe all reports should be fixed in all branches now. Any further problems are better tracked individually at this point. Please check the open reports under the iwlwifi meta-bug https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=273620 and add to those, re-open, or open a new one. Thanks for all the testing and reporting!
Still panics in a VM (same setup as in comment 1). Running ef75877fc2d9 I did
# ifconfig wlan create wlandev iwlwifi0
# wpa_supplicant -i wlan0 -c /etc/wpa_supplicant.conf &
<after a while>
# ifconfig wlan0 down
wlan0: CTRL-EVENT-DISCONNECTED bssid=XXXXXXXXX reason=3 locally_generated=1
Feb 19 12:33:10 fbsd15 dhclient[1386]: Interface wlan0 is down, dhclient exiting
iwlwifi0: Couldn't drain frames for staid 0, status 0x8
iwlwifi0: lkpi_sta_run_to_init:2304: mo_sta_state(NOTEXIST) failed: -5
iwlwifi0: lkpi_iv_newstate: error -5 during state transition 5 (RUN) -> 0 (INIT)
and it panicked. In gdb:
(gdb) f 1
#1 0xffffffff80d06653 in ieee80211_newstate_cb (xvap=0xfffffe00750cc010, npending=<optimized out>) at /home/FreeBSD/current/sys/net80211/ieee80211_proto.c:2616
2616 KASSERT(nstate != IEEE80211_S_INIT,
(In reply to Bakul Shah from comment #28) Still a 9260? Given you are only doing an ifconfig down, the panic is different from #c1, and also not one of the #c9, #c10, #c12 or #c14 bits. I've seen this elsewhere as well. We'll need to go and see if there's a bss_conf update on the pre-22000 cards which removes the sta for us. I'll try to find the PR. Meanwhile, can you email me the full gdb backtrace to bz@
Still 9260! Yes, there may have been other bugs. This one is pretty easy to trigger, but once in a while the system panics when running wpa_supplicant, before ifconfig wlan0 down, though still in the same way, in ieee80211_newstate_cb(). There is not much to the gdb backtrace, see below, but I can try to extract more info.
(gdb) where
#0 panic (fmt=0xffffffff811ca7a4 "INIT state change failed") at /home/FreeBSD/current/sys/kern/kern_shutdown.c:888
#1 0xffffffff80d06653 in ieee80211_newstate_cb (xvap=0xfffffe0074e08010, npending=<optimized out>) at /home/FreeBSD/current/sys/net80211/ieee80211_proto.c:2616
#2 0xffffffff80bbb41b in taskqueue_run_locked ( queue=queue@entry=0xfffff800038caa00) at /home/FreeBSD/current/sys/kern/subr_taskqueue.c:517
#3 0xffffffff80bbc4d3 in taskqueue_thread_loop ( arg=arg@entry=0xfffffe0075006110) at /home/FreeBSD/current/sys/kern/subr_taskqueue.c:829
#4 0xffffffff80b09882 in fork_exit ( callout=0xffffffff80bbc400 <taskqueue_thread_loop>, arg=0xfffffe0075006110, frame=0xfffffe007411cf40) at /home/FreeBSD/current/sys/kern/kern_fork.c:1157
#5 <signal handler called>
#6 0x0000174eb7b2751a in ?? ()
Backtrace stopped: Cannot access memory at address 0x174ec2d27f48
(In reply to Bakul Shah from comment #30) Let us move this to PR 275255. It seems to be the same problem as in #c28, and also with a pre-22000 card.