Created attachment 239026 [details]
proposed patch

On the latest CURRENT I get a crash when executing the following commands:

# killall wpa_supplicant
# /etc/rc.d/netif start

The hardware:

rtw880: <rtw_8822ce> port 0x2000-0x20ff mem 0xd0500000-0xd050ffff at device 0.0 on pci1
rtw880: successfully loaded firmware image 'rtw88/rtw8822c_fw.bin'
rtw880: Firmware version 9.9.10, H2C version 15

I have also attached a proposed patch.

Fatal trap 12: page fault while in kernel mode
cpuid = 5; apic id = 05
fault virtual address = 0x68
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80bc7d19
stack pointer = 0x28:0xfffffe012ed83c40
frame pointer = 0x28:0xfffffe012ed83c80
code segment = base rx0, limit 0xfffff, type 0x1b
 = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 0 (rtw880 net80211 tas)
rdi: 50 rsi: fffffe0131b9ce40 rdx: 0
rcx: b93 r8: 40 r9: fffff800086fad00
rax: 1 rbx: 0 rbp: fffffe012ed83c80
r10: fffff803ffd9d200 r11: ffffffff81f334a8 r12: ffffffff813133bd
r13: b93 r14: 68 r15: fffff80030d9d800
trap number = 12
panic: page fault
cpuid = 5
time = 1672032240
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe012ed83a00
vpanic() at vpanic+0x151/frame 0xfffffe012ed83a50
panic() at panic+0x43/frame 0xfffffe012ed83ab0
trap_fatal() at trap_fatal+0x409/frame 0xfffffe012ed83b10
trap_pfault() at trap_pfault+0xab/frame 0xfffffe012ed83b70
calltrap() at calltrap+0x8/frame 0xfffffe012ed83b70
--- trap 0xc, rip = 0xffffffff80bc7d19, rsp = 0xfffffe012ed83c40, rbp = 0xfffffe012ed83c80 ---
__mtx_lock_flags() at __mtx_lock_flags+0x49/frame 0xfffffe012ed83c80
lkpi_ic_raw_xmit() at lkpi_ic_raw_xmit+0x2e/frame 0xfffffe012ed83cb0
ieee80211_send_probereq() at ieee80211_send_probereq+0x4fa/frame 0xfffffe012ed83d50
ieee80211_swscan_probe_curchan() at ieee80211_swscan_probe_curchan+0x71/frame 0xfffffe012ed83d90
scan_curchan() at scan_curchan+0x67/frame 0xfffffe012ed83dd0
scan_curchan_task() at scan_curchan_task+0x2c4/frame 0xfffffe012ed83e40
taskqueue_run_locked() at taskqueue_run_locked+0xaa/frame 0xfffffe012ed83ec0
taskqueue_thread_loop() at taskqueue_thread_loop+0xc2/frame 0xfffffe012ed83ef0
fork_exit() at fork_exit+0x80/frame 0xfffffe012ed83f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe012ed83f30
--- trap 0, rip = 0xfffff8002d385fd8, rsp = 0xffffffffffff8000, rbp = 0 ---
??() at 0xfffff8002d385fd8
KDB: enter: panic
Created attachment 239027 [details]
proposed patch

The previous patch still contained some debugging printfs; a cleaned-up one is attached.
Comment on attachment 239027 [details]
proposed patch

According to your backtrace you are scanning. If the lsta is missing (or has changed) at that point, the problem lies entirely elsewhere, not in raw_xmit.
(In reply to Bjoern A. Zeeb from comment #2)

My initial report should have contained more details; fixing that now.

I'm connected to a WPA2 SSID and ping is working. Then I run "killall wpa_supplicant" followed by "/etc/rc.d/netif start" and get the panic.

The relevant part of rc.conf:

wlans_rtw880="wlan0"
ifconfig_wlan0="WPA DHCP"

wpa_supplicant.conf:

network={
 ssid="<SSID>"
 psk="<PSK>"
}

When I issue "killall wpa_supplicant", the following series of calls happens:

ieee80211_newstate_cb
lkpi_iv_newstate (nstate=IEEE80211_S_INIT)
lkpi_sta_run_to_init
lkpi_lsta_remove (this is where ni->ni_drv_data is set to NULL)

According to the comments near "sta_state_fsm[]", the transition from IEEE80211_S_RUN to IEEE80211_S_INIT is made when a DISASSOC frame is sent.

When I run "/etc/rc.d/netif start" I get the backtrace from the original message: the 80211 stack tries to start an active scan and send a probe request, but ni_drv_data is NULL and therefore I get the panic.

Currently I'm not sure how to fix this other than checking lsta in raw_xmit and allocating it manually if the check fails. If the patch looks fishy, can you point the direction where to dig this further?
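The sequence described in this comment can be modelled in a few lines of userland C. This is only a simplified sketch, not the LinuxKPI code: `struct node`, `struct drv_sta`, `model_newstate()` and `model_raw_xmit()` are hypothetical stand-ins for the net80211 node, the lsta, `lkpi_iv_newstate()` and `lkpi_ic_raw_xmit()`. It assumes the transmit path touches the driver data unconditionally, and shows a defensive check that returns an error in place of the NULL dereference.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

enum state { S_INIT, S_SCAN, S_RUN };

/* Hypothetical stand-in for the per-station driver state (the lsta). */
struct drv_sta {
	int tx_count;
};

/* Hypothetical stand-in for the node; drv_data models ni->ni_drv_data. */
struct node {
	struct drv_sta *drv_data;
};

/* RUN -> INIT tears the station state down, as lkpi_lsta_remove is
 * reported to do in the comment above. */
void
model_newstate(struct node *ni, enum state *cur, enum state nstate)
{
	assert(ni != NULL && cur != NULL);
	if (*cur == S_RUN && nstate == S_INIT) {
		free(ni->drv_data);
		ni->drv_data = NULL;
	}
	*cur = nstate;
}

/* Transmit path. Without the NULL check, the access below is the
 * modelled equivalent of the reported page fault. */
int
model_raw_xmit(struct node *ni)
{
	assert(ni != NULL);
	if (ni->drv_data == NULL)
		return (ENXIO);
	ni->drv_data->tx_count++;
	return (0);
}
```

Calling `model_raw_xmit()` after a RUN to INIT transition returns `ENXIO` in this model, whereas the real code path faults because nothing re-creates the driver data before the next scan.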
(In reply to Mikhail Pchelin from comment #3) Probably I should have said "*a* direction where to dig this further".
Created attachment 239374 [details]
patch v3

Next attempt to fix the panic. Thinking about it more, I must agree that fixing up lsta in the xmit function is wrong.

The panic occurs during the transition from the INIT state to SCAN. Currently this transition is not handled in any special way: the handler is lkpi_sta_state_do_nada(), which is a stub. I suggest adding a new handler that checks whether lsta exists and allocates it if it is not set.

INIT->SCAN also happens on boot, but in that case we also create a VAP, so the lsta allocation is handled in that chain and there is no crash.

Opinions?
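The idea behind patch v3 can be sketched as a small userland state-transition table. This is an illustration of the approach only, not the attached patch: the table layout, `state_init_to_scan()`, `run_transition()` and the structs are hypothetical, and the real `sta_state_fsm[]` is not reproduced here. The stub `state_do_nada()` stands in for `lkpi_sta_state_do_nada()`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

enum state { S_INIT, S_SCAN, S_RUN, S_MAX };

/* Hypothetical stand-ins for the lsta and the net80211 node. */
struct drv_sta {
	int added_to_drv;
};

struct node {
	struct drv_sta *drv_data;	/* models ni->ni_drv_data */
};

typedef int (*state_handler)(struct node *);

/* Stub handler: the transition needs no driver-side work. */
int
state_do_nada(struct node *ni)
{
	(void)ni;
	return (0);
}

/* Sketch of the suggested INIT -> SCAN handler: make sure the
 * per-station driver state exists before scanning starts. */
int
state_init_to_scan(struct node *ni)
{
	assert(ni != NULL);
	if (ni->drv_data == NULL) {
		ni->drv_data = calloc(1, sizeof(*ni->drv_data));
		if (ni->drv_data == NULL)
			return (ENOMEM);
	}
	return (0);
}

/* Transition table indexed [from][to]; NULL means "not modelled". */
static const state_handler fsm[S_MAX][S_MAX] = {
	[S_INIT][S_SCAN] = state_init_to_scan,
	[S_SCAN][S_INIT] = state_do_nada,
};

int
run_transition(struct node *ni, enum state from, enum state to)
{
	state_handler h = fsm[from][to];

	return (h != NULL ? h(ni) : EINVAL);
}
```

In this model the handler is idempotent: on the boot path, where the driver data already exists, it leaves the pointer untouched, and on the restart path it re-creates it. Whether allocating in the state handler is the right place in the real driver is exactly the open question of this comment.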
I see the problem now. There may be a secondary problem as well; I'll need to sit down and confirm. I'll get back to you in the next few days.
(In reply to Bjoern A. Zeeb from comment #6) Thanks. Triage: CC wireless@ (the previously assigned group/list).
(In reply to Bjoern A. Zeeb from comment #6)

Thanks for taking this. My patch does fix the panic, but with this simple test scenario:

while true; do killall wpa_supplicant && /etc/rc.d/netif start && sleep 5; done

I sometimes (after about 10-15 minutes) see crashes like this one:

(kgdb) bt
#0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:59
#1 dump_savectx () at /usr/src/sys/kern/kern_shutdown.c:405
#2 0xffffffff80bee818 in dumpsys (di=0x0) at /usr/src/sys/x86/include/dump.h:87
#3 doadump (textdump=textdump@entry=0) at /usr/src/sys/kern/kern_shutdown.c:434
#4 0xffffffff804b519a in db_dump (dummy=<optimized out>, dummy2=<unavailable>, dummy3=<unavailable>, dummy4=<unavailable>) at /usr/src/sys/ddb/db_command.c:593
#5 0xffffffff804b4fa0 in db_command (last_cmdp=<optimized out>, cmd_table=<optimized out>, dopager=true) at /usr/src/sys/ddb/db_command.c:506
#6 0xffffffff804b4c6d in db_command_loop () at /usr/src/sys/ddb/db_command.c:553
#7 0xffffffff804b8306 in db_trap (type=<optimized out>, code=<optimized out>) at /usr/src/sys/ddb/db_main.c:270
#8 0xffffffff80c3ddee in kdb_trap (type=type@entry=3, code=<unavailable>, code@entry=0, tf=tf@entry=0xfffffe00c2228a20) at /usr/src/sys/kern/subr_kdb.c:745
#9 0xffffffff810d27f7 in trap (frame=0xfffffe00c2228a20) at /usr/src/sys/amd64/amd64/trap.c:611
#10 <signal handler called>
#11 kdb_enter (why=<optimized out>, msg=<optimized out>) at /usr/src/sys/kern/subr_kdb.c:509
#12 0xffffffff80bee9c2 in vpanic (fmt=<optimized out>, ap=ap@entry=0xfffffe00c2228b70) at /usr/src/sys/kern/kern_shutdown.c:967
#13 0xffffffff80bee763 in panic (fmt=0xffffffff81e8ff30 <cnputs_mtx> "K\206)\201\377\377\377\377") at /usr/src/sys/kern/kern_shutdown.c:903
#14 0xffffffff810d2c89 in trap_fatal (frame=0xfffffe00c2228c60, eva=0) at /usr/src/sys/amd64/amd64/trap.c:955
#15 0xffffffff810d2d3b in trap_pfault (frame=0xfffffe00c2228c60, usermode=false, signo=<optimized out>, ucode=<optimized out>) at /usr/src/sys/amd64/amd64/trap.c:763
#16 <signal handler called>
#17 0xffffffff80e5d94b in lkpi_lsta_remove (lsta=lsta@entry=0xfffff800889fcc00, lvif=0xfffffe013e35e000) at /usr/src/sys/compat/linuxkpi/common/src/linux_80211.c:174
#18 0xffffffff80e5bd4b in lkpi_ic_node_free (ni=0xfffffe0140eb9000) at /usr/src/sys/compat/linuxkpi/common/src/linux_80211.c:2982
#19 0xffffffff80e5e606 in lkpi_ieee80211_free_skb_mbuf (p=0xfffff800421ee500) at /usr/src/sys/compat/linuxkpi/common/src/linux_80211.c:4428
#20 0xffffffff80e734f6 in linuxkpi_kfree_skb (skb=0xfffffe0140907000) at /usr/src/sys/compat/linuxkpi/common/src/linux_skbuff.c:236
#21 0xffffffff83b207f2 in ?? ()
#22 0x0000000000000000 in ?? ()

or this one:

(kgdb) bt
#0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:59
#1 dump_savectx () at /usr/src/sys/kern/kern_shutdown.c:405
#2 0xffffffff80bee818 in dumpsys (di=0x0) at /usr/src/sys/x86/include/dump.h:87
#3 doadump (textdump=textdump@entry=0) at /usr/src/sys/kern/kern_shutdown.c:434
#4 0xffffffff804b519a in db_dump (dummy=<optimized out>, dummy2=<unavailable>, dummy3=<unavailable>, dummy4=<unavailable>) at /usr/src/sys/ddb/db_command.c:593
#5 0xffffffff804b4fa0 in db_command (last_cmdp=<optimized out>, cmd_table=<optimized out>, dopager=true) at /usr/src/sys/ddb/db_command.c:506
#6 0xffffffff804b4c6d in db_command_loop () at /usr/src/sys/ddb/db_command.c:553
#7 0xffffffff804b8306 in db_trap (type=<optimized out>, code=<optimized out>) at /usr/src/sys/ddb/db_main.c:270
#8 0xffffffff80c3ddee in kdb_trap (type=type@entry=3, code=<unavailable>, code@entry=0, tf=tf@entry=0xfffffe0132e81660) at /usr/src/sys/kern/subr_kdb.c:745
#9 0xffffffff810d27f7 in trap (frame=0xfffffe0132e81660) at /usr/src/sys/amd64/amd64/trap.c:611
#10 <signal handler called>
#11 kdb_enter (why=<optimized out>, msg=<optimized out>) at /usr/src/sys/kern/subr_kdb.c:509
#12 0xffffffff80bee9c2 in vpanic (fmt=<optimized out>, ap=ap@entry=0xfffffe0132e817b0) at /usr/src/sys/kern/kern_shutdown.c:967
#13 0xffffffff80bee763 in panic (fmt=0xffffffff81e8ff30 <cnputs_mtx> "K\206)\201\377\377\377\377") at /usr/src/sys/kern/kern_shutdown.c:903
#14 0xffffffff810d2c89 in trap_fatal (frame=0xfffffe0132e818a0, eva=0) at /usr/src/sys/amd64/amd64/trap.c:955
#15 0xffffffff810d2d3b in trap_pfault (frame=0xfffffe0132e818a0, usermode=false, signo=<optimized out>, ucode=<optimized out>) at /usr/src/sys/amd64/amd64/trap.c:763
#16 <signal handler called>
#17 0xffffffff80e5d94b in lkpi_lsta_remove (lsta=lsta@entry=0xfffff800089f8c00, lvif=0xfffffe013201c000) at /usr/src/sys/compat/linuxkpi/common/src/linux_80211.c:174
#18 0xffffffff80e5bd4b in lkpi_ic_node_free (ni=0xfffffe0133437000) at /usr/src/sys/compat/linuxkpi/common/src/linux_80211.c:2982
#19 0xffffffff80d8d448 in ieee80211_sta_join1 (selbs=selbs@entry=0xfffffe013343f000) at /usr/src/sys/net80211/ieee80211_node.c:870
#20 0xffffffff80d8e35c in ieee80211_sta_join (vap=vap@entry=0xfffffe013201c010, chan=<optimized out>, se=<optimized out>) at /usr/src/sys/net80211/ieee80211_node.c:1046
#21 0xffffffff80d82247 in setmlme_assoc_sta (vap=0xfffffe013201c010, mac=0xfffffe0132e81a94 "\344\312\022\231}\375MGTS_GPON_8D02", ssid_len=<optimized out>, ssid=<optimized out>) at /usr/src/sys/net80211/ieee80211_ioctl.c:1576
#22 ieee80211_ioctl_setmlme (vap=vap@entry=0xfffffe013201c010, ireq=ireq@entry=0xfffffe0132e81d50) at /usr/src/sys/net80211/ieee80211_ioctl.c:1633
#23 0xffffffff80d7fca8 in ieee80211_ioctl_set80211 (vap=vap@entry=0xfffffe013201c010, cmd=<optimized out>, ireq=ireq@entry=0xfffffe0132e81d50) at /usr/src/sys/net80211/ieee80211_ioctl.c:2953
#24 0xffffffff80d7e82b in ieee80211_ioctl (ifp=0xfffff80034b50800, cmd=2149607914, data=0xfffffe0132e81d50 "wlan0") at /usr/src/sys/net80211/ieee80211_ioctl.c:3633
#25 0xffffffff80d1e504 in ifioctl (so=0xfffff8003490e780, cmd=2149607914, data=<optimized out>, td=0xfffffe01327e7740) at /usr/src/sys/net/if.c:3161
#26 0xffffffff80c66bc2 in fo_ioctl (fp=0xfffff8006452a9b0, com=2149607914, data=0x24b, active_cred=0x10000, td=<optimized out>) at /usr/src/sys/sys/file.h:367
#27 kern_ioctl (td=td@entry=0xfffffe01327e7740, fd=<optimized out>, com=com@entry=2149607914, data=0x24b <error: Cannot access memory at address 0x24b>, data@entry=0xfffffe0132e81d50 "wlan0") at /usr/src/sys/kern/sys_generic.c:807
#28 0xffffffff80c6690a in sys_ioctl (td=0xfffffe01327e7740, uap=0xfffffe01327e7b38) at /usr/src/sys/kern/sys_generic.c:715
#29 0xffffffff810d363e in syscallenter (td=<optimized out>) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:190
#30 amd64_syscall (td=0xfffffe01327e7740, traced=0) at /usr/src/sys/amd64/amd64/trap.c:1200
#31 <signal handler called>
#32 0x00002d35fbf8a95a in ?? ()

Currently I'm not sure whether this is caused by the patch or is a different issue.
Hi, it's been a long time, given that almost all focus was (and currently is) on iwlwifi. I believe the latest changes to net80211/LinuxKPI should have fixed this issue too. Is there any chance you can test 15, 14, 13, or 13.3 (starting with RC1) and report back? Or are you no longer interested in the LinuxKPI-based rtw88, and should I close this?