Summary: | [iwn] panic in AMPDU tx code "ni: no node" | ||
---|---|---|---|
Product: | Base System | Reporter: | Adrian Chadd <adrian> |
Component: | wireless | Assignee: | Andriy Voskoboinyk <avos> |
Status: | Closed FIXED | ||
Severity: | Affects Only Me | CC: | avos, gl00my, marinintim+freebsd, mizhka |
Priority: | --- | Keywords: | crash |
Version: | CURRENT | ||
Hardware: | Any | ||
OS: | Any |
Description
Adrian Chadd
2014-08-13 16:51:00 UTC
It seems like I'm having the same problem with an N-2230 on CURRENT; it panics once in a while (I have 9 core dumps since New Year). I'm not sure how to debug it.

Hello! While working with the Wi-Fi drivers in Haiku (those drivers are ported from FreeBSD) I found bugs that are potentially present in FreeBSD as well. iprowifi4965 and iprowifi3945 are affected, maybe others too. I will describe them using the iprowifi4965 code. iwn_tx_done() has KASSERT(data->ni != NULL, ("no node")) at the beginning, and it triggers in some situations. The sequence can be:

1) a hardware interrupt is scheduled;
2) iwn_hw_stop() is called (it disables/enables interrupts), and the node is destroyed here;
3) the scheduled (threaded) interrupt handler now runs, although the event is no longer valid -> panic.

In fact, I thought it was a Haiku-specific bug because of the threaded interrupt handlers, but I found this bug report and it looks the same to me. To reproduce it on Haiku I run a flood ping to the router while rebooting the system. On FreeBSD I am not sure what can cause iwn_hw_stop() while packets are being sent... Hope it helps.

A commit references this bug:

Author: avos
Date: Wed Jan 16 12:33:07 UTC 2019
New revision: 343094
URL: https://svnweb.freebsd.org/changeset/base/343094

Log:
iwn(4): (partially) rewrite A-MPDU Tx path

Generic Tx stats fixes:
  - do not try to parse "aggregation status" for single frames; send them to iwn_tx_done() instead;
  - try to attach mbuf / node reference pair to reported BA events; allows to fix reported status for ieee80211_tx_complete() and ifnet counters (previously all A-MPDU frames were counted as failed - see PR 210211); requires few more firmware bug workarounds;
  - preserve short / long retry counters for wlan_amrr(4) (disabled for now - causes significant performance degradation).

- Add new IWN_DEBUG_AMPDU debug category.
- Add one more check into iwn_tx_data() to prevent aggregation ring overflow.
- Workaround 'seqno % 256' != 'current Tx slot' case (until D9195 is not in the tree).
- Improve watchdog timer updates (previously watchdog check was omitted when at least one frame was transmitted).
- Stop Tx when memory leak in currently used ring was detected (unlikely to happen).
- Few other minor fixes.

Was previously tested with:
- Intel 6205, STA mode (Tx aggregation behaves much better now).
- Intel 4965AGN, STA mode (still unstable).

PR: 192641, 210211
Reviewed by: adrian, dhw
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D10728

Changes:
head/sys/dev/iwn/if_iwn.c
head/sys/dev/iwn/if_iwn_debug.h
head/sys/dev/iwn/if_iwnreg.h
head/sys/dev/iwn/if_iwnvar.h

Hi,

Thank you, Andriy!!!

Last night my laptop panicked with a similar error on r343662:

__curthread () at ./machine/pcpu.h:230
230 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (OFFSETOF_CURTHREAD));
(kgdb) #0 __curthread () at ./machine/pcpu.h:230
#1 doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:371
#2 0xffffffff8049aa3b in db_dump (dummy=<optimized out>, dummy2=<unavailable>, dummy3=<unavailable>, dummy4=<unavailable>) at /usr/src/sys/ddb/db_command.c:574
#3 0xffffffff8049a809 in db_command (last_cmdp=<optimized out>, cmd_table=<optimized out>, dopager=1) at /usr/src/sys/ddb/db_command.c:481
#4 0xffffffff8049a584 in db_command_loop () at /usr/src/sys/ddb/db_command.c:534
#5 0xffffffff8049d73f in db_trap (type=<optimized out>, code=<optimized out>) at /usr/src/sys/ddb/db_main.c:252
#6 0xffffffff80c257a4 in kdb_trap (type=3, code=0, tf=<optimized out>) at /usr/src/sys/kern/subr_kdb.c:692
#7 0xffffffff810adf9b in trap (frame=0xfffffe00005646c0) at /usr/src/sys/amd64/amd64/trap.c:619
#8 <signal handler called>
#9 kdb_enter (why=0xffffffff8133e78d "panic", msg=<optimized out>) at /usr/src/sys/kern/subr_kdb.c:479
#10 0xffffffff80bdd351 in vpanic (fmt=<optimized out>, ap=0xfffffe0000564830) at /usr/src/sys/kern/kern_shutdown.c:866
#11 0xffffffff80bdd0e3 in panic ( fmt=0xffffffff81e96658 <cnputs_mtx> "R80\201\377\377\377\377") at /usr/src/sys/kern/kern_shutdown.c:804
#12 0xffffffff806ac92f in iwn_tx_done (sc=0xfffffe0005df9000, desc=<optimized out>, rtsfailcnt=5654144, ackfailcnt=128, status=0 '\000') at /usr/src/sys/dev/iwn/if_iwn.c:3659
#13 0xffffffff806b29d4 in iwn_notif_intr (sc=<optimized out>) at /usr/src/sys/dev/iwn/if_iwn.c:4033
#14 0xffffffff806a9992 in iwn_intr (arg=0xfffffe0005df9000) at /usr/src/sys/dev/iwn/if_iwn.c:4327
#15 0xffffffff80ba02c7 in intr_event_execute_handlers (ie=<optimized out>, p=<optimized out>) at /usr/src/sys/kern/kern_intr.c:1119
#16 ithread_execute_handlers (ie=<optimized out>, p=<optimized out>) at /usr/src/sys/kern/kern_intr.c:1132
#17 ithread_loop (arg=<optimized out>) at /usr/src/sys/kern/kern_intr.c:1212
#18 0xffffffff80b9cf44 in fork_exit ( callout=0xffffffff80ba0140 <ithread_loop>, arg=0xfffff80004a59000, frame=0xfffffe0000564ac0) at /usr/src/sys/kern/kern_fork.c:1055
#19 <signal handler called>
(kgdb)

Conditions: I have an unstable connection to the Wi-Fi AP because of the long distance between the laptop and the router. Sometimes connectivity is lost. I can upload the dump / textdump if required.

Best regards

Hi, is it a 4965agn? (I cannot determine the chipset from the backtrace.) I remember there were some similar issues with wpi(4) (fixed a few years ago); does this happen when power save is turned off? (ifconfig wlan0 -powersave, run before the NIC is connected, since otherwise it will be ignored until the next reassociation.)

P.S. The ackfailcnt / rtsfailcnt values are bogus; I hope they were corrupted in the backtrace only...

A firmware panic and/or device restart may result in this issue too: there is a (possible) race between beacon miss and the initialization path; I will try to add a workaround here a bit later.
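The failure mode discussed above is a Tx-done notification that gets processed after the stop path has already released the node references. It can be modelled in a few lines of user-space C. This is a simplified, single-threaded sketch, not iwn(4) code. `struct softc`, `struct tx_slot`, `tx_queue()`, `hw_stop()` and `tx_done()` are hypothetical stand-ins. A return value of -1 marks the point where the kernel's `KASSERT(data->ni != NULL, ("no node"))` would fire.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SLOTS 4

/* Hypothetical, heavily simplified stand-ins for the driver structures. */
struct node {
	int refcnt;
};

struct tx_slot {
	struct node *ni;	/* node reference held while the frame is queued */
};

struct softc {
	bool running;		/* "device is running" flag */
	struct tx_slot ring[RING_SLOTS];
};

/* Queue a frame: the ring slot takes a reference on the node. */
static void
tx_queue(struct softc *sc, int slot, struct node *ni)
{
	assert(slot >= 0 && slot < RING_SLOTS);
	ni->refcnt++;
	sc->ring[slot].ni = ni;
}

/* Models the stop path: mark the device stopped, release every node reference. */
static void
hw_stop(struct softc *sc)
{
	sc->running = false;
	for (int i = 0; i < RING_SLOTS; i++) {
		if (sc->ring[i].ni != NULL) {
			sc->ring[i].ni->refcnt--;
			sc->ring[i].ni = NULL;
		}
	}
}

/*
 * Models the Tx-done handler for one ring slot.
 * Returns 0 on success, 1 if a stale event was dropped because the device
 * is not running, and -1 where the kernel code would hit the "no node" KASSERT.
 */
static int
tx_done(struct softc *sc, int slot, bool check_running)
{
	if (check_running && !sc->running)
		return 1;
	struct tx_slot *data = &sc->ring[slot];
	if (data->ni == NULL)
		return -1;
	data->ni->refcnt--;
	data->ni = NULL;
	return 0;
}
```

Calling `hw_stop()` between `tx_queue()` and `tx_done()` reproduces the "no node" condition. Passing `check_running = true` drops the stale notification instead.

A plain running flag only covers the window while the device is stopped. If the device is restarted before the stale event is handled, this sketch would need a stronger check. That limitation belongs to the model, not to a description of the committed fix.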
avos@,

The chipset is a Centrino Wireless-N 2200 (vendor=8086 chip=0891).

Last logs from wpa_supplicant:

Feb 5 02:17:46 gidrarium wpa_supplicant[325]: wlan0: Event SCAN_RESULTS (3) received
Feb 5 02:17:46 gidrarium kernel: iwn0: scan timeout
Feb 5 02:17:46 gidrarium wpa_supplicant[325]: Received 0 bytes of scan results (0 BSSes)
Feb 5 02:17:46 gidrarium kernel: iwn0: iwn_read_firmware: ucode rev=0x12a80601
Feb 5 02:17:46 gidrarium wpa_supplicant[325]: wlan0: BSS: Start scan result update 127
Feb 5 02:17:46 gidrarium wpa_supplicant[325]: wlan0: BSS: Remove id 187 BSSID c4:71:54:3e:88:ac SSID 'kvartira50' due to no match in scan
Feb 5 02:17:46 gidrarium wpa_supplicant[325]: wlan0: BSS: Remove id 179 BSSID b0:4e:26:b4:2b:be SSID 'TP-L-492' due to no match in scan
Feb 5 02:17:46 gidrarium wpa_supplicant[325]: BSS: last_scan_res_used=0/32
Feb 5 02:17:46 gidrarium wpa_supplicant[325]: wlan0: New scan results available (own=0 ext=0)
Feb 5 02:17:46 gidrarium wpa_supplicant[325]: WPS: Unknown Vendor Extension (Vendor ID 9442)
Feb 5 02:17:46 gidrarium syslogd: last message repeated 5 times
Feb 5 02:17:46 gidrarium wpa_supplicant[325]: wlan0: Radio work 'scan'@0x800e1a380 done in 4.997372 seconds
Feb 5 02:17:46 gidrarium wpa_supplicant[325]: wlan0: radio_work_free('scan'@0x800e1a380): num_active_works --> 0
Feb 5 02:17:46 gidrarium wpa_supplicant[325]: wlan0: No suitable network found
Feb 5 02:17:46 gidrarium wpa_supplicant[325]: wlan0: Setting scan request: 5.000000 sec

I wonder whether it is possible to check the timestamp of the panic in the dump.

That is it: 'scan timeout' causes a device restart.

A commit references this bug:

Author: avos
Date: Wed Feb 6 01:34:14 UTC 2019
New revision: 343815
URL: https://svnweb.freebsd.org/changeset/base/343815

Log:
iwn(4): plug initialization path vs interrupt handler races

There are few places in interrupt handler where the driver lock is dropped; ensure that device is still running before processing remaining ring entries.
PR: 192641
MFC after: 5 days

Changes:
head/sys/dev/iwn/if_iwn.c

Thank you, Andriy!

A commit references this bug:

Author: avos
Date: Mon Feb 11 00:31:59 UTC 2019
New revision: 343992
URL: https://svnweb.freebsd.org/changeset/base/343992

Log:
MFC r343815:
iwn(4): plug initialization path vs interrupt handler races

There are few places in interrupt handler where the driver lock is dropped; ensure that device is still running before processing remaining ring entries.

PR: 192641

Changes:
_U stable/11/
stable/11/sys/dev/iwn/if_iwn.c
_U stable/12/
stable/12/sys/dev/iwn/if_iwn.c

A commit references this bug:

Author: avos
Date: Sat Feb 16 01:19:15 UTC 2019
New revision: 344197
URL: https://svnweb.freebsd.org/changeset/base/344197

Log:
MFC r343094:
iwn(4): (partially) rewrite A-MPDU Tx path

The change fixes ifnet counters and improves Tx stability when A-MPDU is used.

MFC r343292:
iwn(4): drop i_seq field initialization for A-MPDU frames

It is done by net80211 since r319460.

PR: 192641, 210211
Reviewed by: adrian, dhw
Differential Revision: https://reviews.freebsd.org/D10728

Changes:
_U stable/12/
stable/12/sys/dev/iwn/if_iwn.c
stable/12/sys/dev/iwn/if_iwn_debug.h
stable/12/sys/dev/iwn/if_iwnreg.h
stable/12/sys/dev/iwn/if_iwnvar.h

Looks like the issue can be closed now.
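The r343815 fix (merged to stable/11 and stable/12 in r343992) makes the interrupt handler confirm the device is still running before it continues with the remaining ring entries, after any point where it dropped the driver lock. Below is a minimal user-space sketch of that pattern, not the actual iwn(4) code. It assumes a POSIX `pthread_mutex_t` in place of the driver lock. `struct softc`, `softc_init()`, `sketch_stop()`, `notif_intr()` and `upcall_that_stops()` are hypothetical names. The upcall stands in for whatever work the real handler does with the lock dropped.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define NENTRIES 8	/* pending notifications in the hypothetical ring */

struct softc {
	pthread_mutex_t mtx;			/* stands in for the driver lock */
	bool running;				/* cleared by the stop path */
	int processed;				/* entries handled by notif_intr() */
	void (*upcall)(struct softc *);		/* work done with the lock dropped */
};

static void
softc_init(struct softc *sc, void (*upcall)(struct softc *))
{
	pthread_mutex_init(&sc->mtx, NULL);
	sc->running = true;
	sc->processed = 0;
	sc->upcall = upcall;
}

/* Hypothetical stop/init path; takes the driver lock like the real one would. */
static void
sketch_stop(struct softc *sc)
{
	pthread_mutex_lock(&sc->mtx);
	sc->running = false;
	pthread_mutex_unlock(&sc->mtx);
}

/* Interrupt-handler loop over the pending notifications. */
static void
notif_intr(struct softc *sc)
{
	pthread_mutex_lock(&sc->mtx);
	for (int i = 0; i < NENTRIES; i++) {
		/* Re-check state: the lock may have been dropped on the previous pass. */
		if (!sc->running)
			break;
		sc->processed++;
		if (sc->upcall != NULL) {
			pthread_mutex_unlock(&sc->mtx);
			sc->upcall(sc);		/* other code paths may run here */
			pthread_mutex_lock(&sc->mtx);
		}
	}
	pthread_mutex_unlock(&sc->mtx);
}

/* Simulates the stop path running while notif_intr() has dropped the lock. */
static void
upcall_that_stops(struct softc *sc)
{
	/* The running check in notif_intr() must keep us from being called after a stop. */
	assert(sc->running);
	sketch_stop(sc);
}
```

Without an upcall, all `NENTRIES` entries are handled. With `upcall_that_stops`, the loop stops after the first entry instead of continuing over a ring that the stop path has already torn down.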