Bug 268565 - panic after "killall wpa_supplicant" followed by "/etc/rc.d/netif start" with rtw880 (fixed?)
Summary: panic after "killall wpa_supplicant" followed by "/etc/rc.d/netif start" with...
Status: In Progress
Alias: None
Product: Base System
Classification: Unclassified
Component: wireless
Version: CURRENT
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: Bjoern A. Zeeb
URL:
Keywords: crash
Depends on:
Blocks: 273621
Reported: 2022-12-26 05:33 UTC by Mikhail Pchelin
Modified: 2024-02-19 17:14 UTC (History)
CC List: 4 users

See Also:


Attachments
proposed patch (932 bytes, text/plain)
2022-12-26 05:33 UTC, Mikhail Pchelin
proposed patch (846 bytes, patch)
2022-12-26 05:39 UTC, Mikhail Pchelin
patch v3 (1.58 KB, patch)
2023-01-10 09:31 UTC, Mikhail Pchelin

Description Mikhail Pchelin freebsd_committer freebsd_triage 2022-12-26 05:33:07 UTC
Created attachment 239026 [details]
proposed patch

On the latest CURRENT I get a crash when executing the following commands:

# killall wpa_supplicant
# /etc/rc.d/netif start

The hardware:

rtw880: <rtw_8822ce> port 0x2000-0x20ff mem 0xd0500000-0xd050ffff at device 0.0 on pci1
rtw880: successfully loaded firmware image 'rtw88/rtw8822c_fw.bin'
rtw880: Firmware version 9.9.10, H2C version 15

I am also attaching a proposed patch.

Fatal trap 12: page fault while in kernel mode
cpuid = 5; apic id = 05
fault virtual address	= 0x68
fault code		= supervisor read data, page not present
instruction pointer	= 0x20:0xffffffff80bc7d19
stack pointer	        = 0x28:0xfffffe012ed83c40
frame pointer	        = 0x28:0xfffffe012ed83c80
code segment		= base rx0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 0 (rtw880 net80211 tas)
rdi:               50 rsi: fffffe0131b9ce40 rdx:                0
rcx:              b93  r8:               40  r9: fffff800086fad00
rax:                1 rbx:                0 rbp: fffffe012ed83c80
r10: fffff803ffd9d200 r11: ffffffff81f334a8 r12: ffffffff813133bd
r13:              b93 r14:               68 r15: fffff80030d9d800
trap number		= 12
panic: page fault
cpuid = 5
time = 1672032240
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe012ed83a00
vpanic() at vpanic+0x151/frame 0xfffffe012ed83a50
panic() at panic+0x43/frame 0xfffffe012ed83ab0
trap_fatal() at trap_fatal+0x409/frame 0xfffffe012ed83b10
trap_pfault() at trap_pfault+0xab/frame 0xfffffe012ed83b70
calltrap() at calltrap+0x8/frame 0xfffffe012ed83b70
--- trap 0xc, rip = 0xffffffff80bc7d19, rsp = 0xfffffe012ed83c40, rbp = 0xfffffe012ed83c80 ---
__mtx_lock_flags() at __mtx_lock_flags+0x49/frame 0xfffffe012ed83c80
lkpi_ic_raw_xmit() at lkpi_ic_raw_xmit+0x2e/frame 0xfffffe012ed83cb0
ieee80211_send_probereq() at ieee80211_send_probereq+0x4fa/frame 0xfffffe012ed83d50
ieee80211_swscan_probe_curchan() at ieee80211_swscan_probe_curchan+0x71/frame 0xfffffe012ed83d90
scan_curchan() at scan_curchan+0x67/frame 0xfffffe012ed83dd0
scan_curchan_task() at scan_curchan_task+0x2c4/frame 0xfffffe012ed83e40
taskqueue_run_locked() at taskqueue_run_locked+0xaa/frame 0xfffffe012ed83ec0
taskqueue_thread_loop() at taskqueue_thread_loop+0xc2/frame 0xfffffe012ed83ef0
fork_exit() at fork_exit+0x80/frame 0xfffffe012ed83f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe012ed83f30
--- trap 0, rip = 0xfffff8002d385fd8, rsp = 0xffffffffffff8000, rbp = 0 ---
??() at 0xfffff8002d385fd8
KDB: enter: panic
Comment 1 Mikhail Pchelin freebsd_committer freebsd_triage 2022-12-26 05:39:07 UTC
Created attachment 239027 [details]
proposed patch

The previous proposed patch still contained some debugging printfs; a new one is attached.
Comment 2 Bjoern A. Zeeb freebsd_committer freebsd_triage 2022-12-28 21:47:17 UTC
Comment on attachment 239027 [details]
proposed patch

According to your backtrace you are scanning.
If the lsta is missing (or changed) in that case, the problem lies entirely elsewhere, not in raw_xmit.
Comment 3 Mikhail Pchelin freebsd_committer freebsd_triage 2022-12-29 06:44:28 UTC
(In reply to Bjoern A. Zeeb from comment #2)

My initial report should have contained more details; fixing that now:

I'm connected to a WPA2 SSID and ping is working. Then I run "killall wpa_supplicant" followed by "/etc/rc.d/netif start" and get the panic.

Relevant part of rc.conf:

wlans_rtw880="wlan0"
ifconfig_wlan0="WPA DHCP"

wpa_supplicant.conf:

network={
  ssid="<SSID>"
  psk="<PSK>"
}

When I issue "killall wpa_supplicant", the following series of calls happens:

ieee80211_newstate_cb
lkpi_iv_newstate (nstate=IEEE80211_S_INIT)
lkpi_sta_run_to_init
lkpi_lsta_remove (this is where ni->ni_drv_data is set to NULL)

According to the comments near "sta_state_fsm[]", the transition from IEEE80211_S_RUN to IEEE80211_S_INIT is done when a DISASSOC frame is sent.

When I run "/etc/rc.d/netif start" I get the backtrace from the original message: the net80211 stack tries to start an active scan and send a probe request, but ni_drv_data is NULL and therefore I get the panic.
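
The fault address in the original trace (0x68) is consistent with a member access through a NULL pointer: the CPU faults at NULL plus the member's offset. Below is a minimal userspace sketch of that arithmetic. `struct fake_lsta` and `lock_address()` are hypothetical, and placing the lock at offset 0x68 is an assumption made to match the trace; the real LinuxKPI structure layout is not shown in this report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's per-station state; NOT the real layout. */
struct fake_lsta {
	char	pad[0x68];	/* assumed: 0x68 bytes of members before the lock */
	long	lock;		/* stand-in for the mutex handed to the lock routine */
};

/* Compile-time check of the layout assumed for this illustration. */
static_assert(offsetof(struct fake_lsta, lock) == 0x68,
    "lock is assumed to sit at offset 0x68");

/*
 * Address that would be touched for &base->lock, computed without
 * dereferencing base.
 */
uintptr_t
lock_address(uintptr_t base)
{
	return (base + offsetof(struct fake_lsta, lock));
}
```

With a NULL base, `lock_address(0)` evaluates to 0x68, the same small address reported as the fault virtual address.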

Currently I'm not sure how to fix this other than checking the lsta in raw_xmit and allocating it manually if the check fails. If the patch looks fishy, can you point the direction where to dig this further?
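
The workaround described here (check the lsta in the transmit path and allocate it on demand) can be modelled in userspace roughly as follows. This is only a sketch and not the attached patch: `model_lsta`, `model_node`, `model_lsta_remove()` and `model_raw_xmit()` are hypothetical stand-ins for the real net80211/LinuxKPI types and functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins; not the real kernel structures. */
struct model_lsta {
	int	tx_count;	/* number of frames "sent" through this state */
};

struct model_node {
	struct model_lsta *drv_data;	/* plays the role of ni->ni_drv_data */
};

/* RUN -> INIT teardown: free the driver state and clear the pointer. */
void
model_lsta_remove(struct model_node *ni)
{
	assert(ni != NULL);
	free(ni->drv_data);
	ni->drv_data = NULL;
}

/*
 * Transmit path with the workaround: if the driver state is missing,
 * allocate it on demand instead of dereferencing NULL.
 * Returns 0 on success or ENOMEM if the allocation fails.
 */
int
model_raw_xmit(struct model_node *ni)
{
	assert(ni != NULL);
	if (ni->drv_data == NULL) {
		ni->drv_data = calloc(1, sizeof(*ni->drv_data));
		if (ni->drv_data == NULL)
			return (ENOMEM);
	}
	ni->drv_data->tx_count++;
	return (0);
}
```

As comment 2 points out, a missing lsta during scanning suggests the real problem is elsewhere, so a check like this only masks the symptom.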
Comment 4 Mikhail Pchelin freebsd_committer freebsd_triage 2022-12-29 06:54:23 UTC
(In reply to Mikhail Pchelin from comment #3)
Probably I should have said "*a* direction where to dig this further".
Comment 5 Mikhail Pchelin freebsd_committer freebsd_triage 2023-01-10 09:31:18 UTC
Created attachment 239374 [details]
patch v3

Next attempt to fix the panic.

Thinking about it more, I must agree that fixing up the lsta in the xmit function is wrong.

The panic occurs during the transition from the INIT state to SCAN. Currently this transition is not handled in any special way: the handler is lkpi_sta_state_do_nada(), which is a stub. I suggest adding a dedicated handler that checks whether the lsta exists and allocates it if it is not set.

INIT->SCAN also happens on boot, but in that case we also create a VAP, so the lsta allocation is handled in that chain and there is no crash.

Opinions?
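
The idea of replacing the stub with a dedicated INIT->SCAN handler can be sketched as a small userspace state table. This is a simplified model under stated assumptions, not patch v3 and not the real sta_state_fsm[] layout: every `model_*` name, the two-dimensional table, and the 64-byte allocation are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

enum model_state { S_INIT, S_SCAN, S_RUN, S_MAX };

struct model_vap {
	void	*lsta;		/* plays the role of ni->ni_drv_data */
};

typedef int (*model_handler)(struct model_vap *);

/* Stub used for transitions that need no work. */
int
model_do_nada(struct model_vap *vap)
{
	(void)vap;
	return (0);
}

/* Dedicated INIT -> SCAN handler: make sure the station state exists. */
int
model_init_to_scan(struct model_vap *vap)
{
	if (vap->lsta == NULL) {
		vap->lsta = calloc(1, 64);	/* size is arbitrary in this model */
		if (vap->lsta == NULL)
			return (-1);
	}
	return (0);
}

/* RUN -> INIT handler: tear the station state down again. */
int
model_run_to_init(struct model_vap *vap)
{
	free(vap->lsta);
	vap->lsta = NULL;
	return (0);
}

/* Transition table indexed by [old state][new state]; NULL means "stub". */
static const model_handler model_fsm[S_MAX][S_MAX] = {
	[S_INIT][S_SCAN] = model_init_to_scan,
	[S_RUN][S_INIT] = model_run_to_init,
};

/* Dispatch a state change, falling back to the stub for unlisted pairs. */
int
model_newstate(struct model_vap *vap, enum model_state from,
    enum model_state to)
{
	model_handler h;

	assert(from < S_MAX && to < S_MAX);
	h = model_fsm[from][to];
	return (h != NULL ? h(vap) : model_do_nada(vap));
}
```

In this model a RUN->INIT followed by INIT->SCAN leaves the vap with valid station state again, which is the property the transmit path relies on.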
Comment 6 Bjoern A. Zeeb freebsd_committer freebsd_triage 2023-01-11 22:41:54 UTC
I see the problem now. There's a possible secondary one; I'll need to sit down and confirm. I'll get back to you in the next few days.
Comment 7 Graham Perrin freebsd_committer freebsd_triage 2023-01-12 07:10:41 UTC
(In reply to Bjoern A. Zeeb from comment #6)

Thanks. 

Triage: CC wireless@ (the previously assigned group/list).
Comment 8 Mikhail Pchelin freebsd_committer freebsd_triage 2023-01-12 12:42:05 UTC
(In reply to Bjoern A. Zeeb from comment #6)

Thanks for taking this.

My patch does fix the panic, but with this simple test scenario:

while true; do killall wpa_supplicant && /etc/rc.d/netif start && sleep 5; done

I sometimes (after roughly 10-15 minutes) see crashes

like this one:

(kgdb) bt
#0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:59
#1  dump_savectx () at /usr/src/sys/kern/kern_shutdown.c:405
#2  0xffffffff80bee818 in dumpsys (di=0x0) at /usr/src/sys/x86/include/dump.h:87
#3  doadump (textdump=textdump@entry=0) at /usr/src/sys/kern/kern_shutdown.c:434
#4  0xffffffff804b519a in db_dump (dummy=<optimized out>, dummy2=<unavailable>, dummy3=<unavailable>, dummy4=<unavailable>) at /usr/src/sys/ddb/db_command.c:593
#5  0xffffffff804b4fa0 in db_command (last_cmdp=<optimized out>, cmd_table=<optimized out>, dopager=true) at /usr/src/sys/ddb/db_command.c:506
#6  0xffffffff804b4c6d in db_command_loop () at /usr/src/sys/ddb/db_command.c:553
#7  0xffffffff804b8306 in db_trap (type=<optimized out>, code=<optimized out>) at /usr/src/sys/ddb/db_main.c:270
#8  0xffffffff80c3ddee in kdb_trap (type=type@entry=3, code=<unavailable>, code@entry=0, tf=tf@entry=0xfffffe00c2228a20) at /usr/src/sys/kern/subr_kdb.c:745
#9  0xffffffff810d27f7 in trap (frame=0xfffffe00c2228a20) at /usr/src/sys/amd64/amd64/trap.c:611
#10 <signal handler called>
#11 kdb_enter (why=<optimized out>, msg=<optimized out>) at /usr/src/sys/kern/subr_kdb.c:509
#12 0xffffffff80bee9c2 in vpanic (fmt=<optimized out>, ap=ap@entry=0xfffffe00c2228b70) at /usr/src/sys/kern/kern_shutdown.c:967
#13 0xffffffff80bee763 in panic (fmt=0xffffffff81e8ff30 <cnputs_mtx> "K\206)\201\377\377\377\377") at /usr/src/sys/kern/kern_shutdown.c:903
#14 0xffffffff810d2c89 in trap_fatal (frame=0xfffffe00c2228c60, eva=0) at /usr/src/sys/amd64/amd64/trap.c:955
#15 0xffffffff810d2d3b in trap_pfault (frame=0xfffffe00c2228c60, usermode=false, signo=<optimized out>, ucode=<optimized out>) at /usr/src/sys/amd64/amd64/trap.c:763
#16 <signal handler called>
#17 0xffffffff80e5d94b in lkpi_lsta_remove (lsta=lsta@entry=0xfffff800889fcc00, lvif=0xfffffe013e35e000) at /usr/src/sys/compat/linuxkpi/common/src/linux_80211.c:174
#18 0xffffffff80e5bd4b in lkpi_ic_node_free (ni=0xfffffe0140eb9000) at /usr/src/sys/compat/linuxkpi/common/src/linux_80211.c:2982
#19 0xffffffff80e5e606 in lkpi_ieee80211_free_skb_mbuf (p=0xfffff800421ee500) at /usr/src/sys/compat/linuxkpi/common/src/linux_80211.c:4428
#20 0xffffffff80e734f6 in linuxkpi_kfree_skb (skb=0xfffffe0140907000) at /usr/src/sys/compat/linuxkpi/common/src/linux_skbuff.c:236
#21 0xffffffff83b207f2 in ?? ()
#22 0x0000000000000000 in ?? ()


or this one:

(kgdb) bt
#0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:59
#1  dump_savectx () at /usr/src/sys/kern/kern_shutdown.c:405
#2  0xffffffff80bee818 in dumpsys (di=0x0) at /usr/src/sys/x86/include/dump.h:87
#3  doadump (textdump=textdump@entry=0) at /usr/src/sys/kern/kern_shutdown.c:434
#4  0xffffffff804b519a in db_dump (dummy=<optimized out>, dummy2=<unavailable>, dummy3=<unavailable>, dummy4=<unavailable>) at /usr/src/sys/ddb/db_command.c:593
#5  0xffffffff804b4fa0 in db_command (last_cmdp=<optimized out>, cmd_table=<optimized out>, dopager=true) at /usr/src/sys/ddb/db_command.c:506
#6  0xffffffff804b4c6d in db_command_loop () at /usr/src/sys/ddb/db_command.c:553
#7  0xffffffff804b8306 in db_trap (type=<optimized out>, code=<optimized out>) at /usr/src/sys/ddb/db_main.c:270
#8  0xffffffff80c3ddee in kdb_trap (type=type@entry=3, code=<unavailable>, code@entry=0, tf=tf@entry=0xfffffe0132e81660) at /usr/src/sys/kern/subr_kdb.c:745
#9  0xffffffff810d27f7 in trap (frame=0xfffffe0132e81660) at /usr/src/sys/amd64/amd64/trap.c:611
#10 <signal handler called>
#11 kdb_enter (why=<optimized out>, msg=<optimized out>) at /usr/src/sys/kern/subr_kdb.c:509
#12 0xffffffff80bee9c2 in vpanic (fmt=<optimized out>, ap=ap@entry=0xfffffe0132e817b0) at /usr/src/sys/kern/kern_shutdown.c:967
#13 0xffffffff80bee763 in panic (fmt=0xffffffff81e8ff30 <cnputs_mtx> "K\206)\201\377\377\377\377") at /usr/src/sys/kern/kern_shutdown.c:903
#14 0xffffffff810d2c89 in trap_fatal (frame=0xfffffe0132e818a0, eva=0) at /usr/src/sys/amd64/amd64/trap.c:955
#15 0xffffffff810d2d3b in trap_pfault (frame=0xfffffe0132e818a0, usermode=false, signo=<optimized out>, ucode=<optimized out>) at /usr/src/sys/amd64/amd64/trap.c:763
#16 <signal handler called>
#17 0xffffffff80e5d94b in lkpi_lsta_remove (lsta=lsta@entry=0xfffff800089f8c00, lvif=0xfffffe013201c000) at /usr/src/sys/compat/linuxkpi/common/src/linux_80211.c:174
#18 0xffffffff80e5bd4b in lkpi_ic_node_free (ni=0xfffffe0133437000) at /usr/src/sys/compat/linuxkpi/common/src/linux_80211.c:2982
#19 0xffffffff80d8d448 in ieee80211_sta_join1 (selbs=selbs@entry=0xfffffe013343f000) at /usr/src/sys/net80211/ieee80211_node.c:870
#20 0xffffffff80d8e35c in ieee80211_sta_join (vap=vap@entry=0xfffffe013201c010, chan=<optimized out>, se=<optimized out>) at /usr/src/sys/net80211/ieee80211_node.c:1046
#21 0xffffffff80d82247 in setmlme_assoc_sta (vap=0xfffffe013201c010, mac=0xfffffe0132e81a94 "\344\312\022\231}\375MGTS_GPON_8D02", ssid_len=<optimized out>, ssid=<optimized out>) at /usr/src/sys/net80211/ieee80211_ioctl.c:1576
#22 ieee80211_ioctl_setmlme (vap=vap@entry=0xfffffe013201c010, ireq=ireq@entry=0xfffffe0132e81d50) at /usr/src/sys/net80211/ieee80211_ioctl.c:1633
#23 0xffffffff80d7fca8 in ieee80211_ioctl_set80211 (vap=vap@entry=0xfffffe013201c010, cmd=<optimized out>, ireq=ireq@entry=0xfffffe0132e81d50) at /usr/src/sys/net80211/ieee80211_ioctl.c:2953
#24 0xffffffff80d7e82b in ieee80211_ioctl (ifp=0xfffff80034b50800, cmd=2149607914, data=0xfffffe0132e81d50 "wlan0") at /usr/src/sys/net80211/ieee80211_ioctl.c:3633
#25 0xffffffff80d1e504 in ifioctl (so=0xfffff8003490e780, cmd=2149607914, data=<optimized out>, td=0xfffffe01327e7740) at /usr/src/sys/net/if.c:3161
#26 0xffffffff80c66bc2 in fo_ioctl (fp=0xfffff8006452a9b0, com=2149607914, data=0x24b, active_cred=0x10000, td=<optimized out>) at /usr/src/sys/sys/file.h:367
#27 kern_ioctl (td=td@entry=0xfffffe01327e7740, fd=<optimized out>, com=com@entry=2149607914, data=0x24b <error: Cannot access memory at address 0x24b>, data@entry=0xfffffe0132e81d50 "wlan0") at /usr/src/sys/kern/sys_generic.c:807
#28 0xffffffff80c6690a in sys_ioctl (td=0xfffffe01327e7740, uap=0xfffffe01327e7b38) at /usr/src/sys/kern/sys_generic.c:715
#29 0xffffffff810d363e in syscallenter (td=<optimized out>) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:190
#30 amd64_syscall (td=0xfffffe01327e7740, traced=0) at /usr/src/sys/amd64/amd64/trap.c:1200
#31 <signal handler called>
#32 0x00002d35fbf8a95a in ?? ()

Currently I'm not sure whether this is caused by the patch or is a different issue.
Comment 9 Bjoern A. Zeeb freebsd_committer freebsd_triage 2024-02-19 17:14:25 UTC
Hi,

It's been a long time, given that almost all focus was (and still is) on iwlwifi.
I believe the latest changes to net80211/LinuxKPI should have fixed this issue too.

Is there any chance you can test 15, 14, 13, or 13.3 (starting with RC1) and report back?

Or are you no longer interested in the LinuxKPI-based rtw88, and should I close this?