Bug 167113 - [ath] AR5210: "stuck" TX seems to be occurring, without watchdog timeout firing
Summary: [ath] AR5210: "stuck" TX seems to be occurring, without watchdog timeout firing
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: wireless
Version: 9.0-RELEASE
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: freebsd-wireless (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-04-20 01:30 UTC by Adrian Chadd
Modified: 2018-05-28 19:45 UTC
0 users

See Also:


Description Adrian Chadd freebsd_committer 2012-04-20 01:30:10 UTC
When using an AR5210 NIC and with -bgscan disabled, I've noticed that TX will occasionally hang.

A 'scan' (which resets the NIC) will make things work again.

A watchdog timeout isn't occurring, so the watchdog is being tickled somehow. However, the data TXQ shows 2-3 frames still sitting in the hardware queue, as well as a number of frames buffered in the software queue.
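
To illustrate what "tickled" could mean here, below is a toy model of a countdown watchdog that is re-armed whenever something new is queued. It is only a sketch of one hypothetical mechanism, not the driver's code: the names (model_wd, model_wd_arm, model_wd_tick, model_wd_simulate) and the re-arm value of 5 ticks are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed re-arm value, for illustration only. */
enum { MODEL_WD_TICKS = 5 };

/* Hypothetical stand-in for a driver watchdog; not a real ath(4) structure. */
struct model_wd {
	int timer;	/* remaining ticks; 0 means disarmed */
};

/* Re-arm the countdown, as a TX enqueue path might do. */
static void
model_wd_arm(struct model_wd *wd)
{
	assert(wd != NULL);
	wd->timer = MODEL_WD_TICKS;
}

/* Called once per tick; returns true when the watchdog fires. */
static bool
model_wd_tick(struct model_wd *wd)
{
	assert(wd != NULL);
	return (wd->timer != 0 && --wd->timer == 0);
}

/*
 * Run the watchdog for `ticks` ticks while nothing ever completes.
 * The timer is re-armed every `rearm_every` ticks (0 = never again).
 * Returns the tick at which the watchdog fired, or -1 if it never did.
 */
static int
model_wd_simulate(int ticks, int rearm_every)
{
	struct model_wd wd = { 0 };
	int t;

	model_wd_arm(&wd);
	for (t = 1; t <= ticks; t++) {
		if (model_wd_tick(&wd))
			return (t);
		if (rearm_every > 0 && t % rearm_every == 0)
			model_wd_arm(&wd);
	}
	return (-1);
}
```

In this model, model_wd_simulate(20, 0) fires at tick 5, while model_wd_simulate(20, 3) never fires even though no frame completes. Whether the real driver re-arms its timer in a comparable way on this path is exactly what still needs checking.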

The relevant dmesg output:

(sysctl dev.ath.1.txagg=1)

.
.
HW TXQ 0: axq_depth=2, axq_aggr_depth=0
HW TXQ 1: axq_depth=0, axq_aggr_depth=0
Total TX buffers: 77; Total TX buffers busy: 0

(here, ifconfig wlan1 scan)

wlan1: [00:24:6c:04:ed:39] sta power save mode on
ar5210: dma receive failed to stop in 10ms
AR_CR=0x24
AR_DIAG_SW=0x40
ath1: ath_tx_tid_drain: node 0xc78c6000: bf=0xc787b570: addbaw=0, dobaw=0, seqno_assign=0, seqno_required=0, seqno=-1, retry=0
ath1: ath_tx_tid_drain: node 0xc78c6000: bf=0xc787b570: tid txq_depth=51 hwq_depth=0
ath1: ath_tx_tid_drain: node 0xc78c6000: bf=0xc787b570: tid 16: txq_depth=0, txq_aggr_depth=0, sched=1, paused=0, hwq_depth=0, incomp=0, baw_head=0, baw_tail=0 txa_start=-1, ni_txseqs=45773
TODS 00:30:ab:17:81:47->00:1f:6c:9a:3f:1b(00:24:6c:04:ed:39) data 0M
 0801 0000 0024 6c04 ed39 0030 ab17 8147 001f 6c9a 3f1b a029 aaaa 0300 0000 0800 4510 0034 fe95 4000 4006 a3ea c0a8 643c cb38 a816 28bf 0016 eae0 2e41 38e2 060a 8010 0401 383a 0000 0101 080a 158d 61be 067f a3a0

Fix: 

I'm not sure. I don't know why frames are going into the software queue here - no aggregation has been negotiated, so in theory everything _should_ be queued directly to the hardware.

However, ath_tx_swq() is incorrectly checking the hardware queue depth against the sc_hwq_limit for non-aggregate traffic, and it's being software queued.

So I -think- in this case, non-aggregate traffic is still being software queued _and_ only two frames are ever being queued to the hardware. That's likely very sub-optimal, but it's making this particular bug show its ugly head.
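
A minimal sketch of the suspected queueing decision, to make the theory above concrete. This is not the actual ath_tx_swq() code: the struct, the function names and the limit of 2 are hypothetical, with the limit chosen only so the model lines up with the axq_depth=2 seen in the log. The second function shows one possible alternative (apply the depth limit only when aggregation has been negotiated); it is not a tested fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed hardware queue limit, for illustration only. */
enum { MODEL_HWQ_LIMIT = 2 };

/* Hypothetical stand-in for a TX queue; not a real ath(4) structure. */
struct model_txq {
	int hw_depth;	/* frames handed to the hardware queue */
	int sw_depth;	/* frames parked in the software queue */
};

/*
 * Behaviour as described in this report: the depth limit is applied to
 * every frame, whether or not aggregation has been negotiated.
 */
static void
model_enqueue_as_reported(struct model_txq *q, bool aggr_negotiated)
{
	assert(q != NULL);
	(void)aggr_negotiated;		/* deliberately ignored */
	if (q->hw_depth >= MODEL_HWQ_LIMIT)
		q->sw_depth++;
	else
		q->hw_depth++;
}

/*
 * One possible alternative: only hold frames back in software when
 * aggregation is actually in use.
 */
static void
model_enqueue_alternative(struct model_txq *q, bool aggr_negotiated)
{
	assert(q != NULL);
	if (aggr_negotiated && q->hw_depth >= MODEL_HWQ_LIMIT)
		q->sw_depth++;
	else
		q->hw_depth++;
}
```

Feeding 53 non-aggregate frames through model_enqueue_as_reported() with no completions leaves hw_depth=2 and sw_depth=51; the same frames through model_enqueue_alternative() all land in the hardware queue. The model never completes a frame, so it says nothing about why the two hardware-queued frames get stuck in the first place.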

What I need to check:

* Are we somehow missing TX interrupts? (eg RAC style bugs)
* There are frames in the hardware TXQ, so have they actually completed? I should turn on reset debugging (sysctl dev.ath.1.debug=0x20) and see what the descriptor dump looks like. If they have completed, a TX interrupt should've occurred.
* .. am I also getting TXEOL from the AR5210? That's how the TX interrupt mitigation technique is supposed to work.

How-To-Repeat:

* Bring the AR5210 'up'
* disable bgscan (ifconfig wlanX -bgscan)
* Do some small amount of traffic (eg web, ssh) and see it occasionally hang
* check the output of sysctl dev.ath.X.txagg=1
Comment 1 Mark Linimon freebsd_committer freebsd_triage 2012-04-20 02:12:26 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-wireless


Over to maintainer(s).
Comment 2 Eitan Adler freebsd_committer freebsd_triage 2018-05-28 19:45:29 UTC
batch change:

For bugs that match the following
-  Status Is In progress 
AND
- Untouched since 2018-01-01.
AND
- Affects Base System OR Documentation

DO:

Reset to open status.


Note:
I did a quick pass but if you are getting this email it might be worthwhile to double check to see if this bug ought to be closed.