Bug 165866 - [ath] TX hangs, requiring a "scan" to properly reset the interface
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: wireless
Version: Unspecified
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: freebsd-wireless (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-03-08 23:40 UTC by Adrian Chadd
Modified: 2019-01-20 01:14 UTC

See Also:


Attachments
file.diff (884 bytes, patch)
2012-03-08 23:40 UTC, Adrian Chadd
b.txt.gz (21.15 KB, application/x-gzip)
2012-03-08 23:54 UTC, Adrian Chadd

Description Adrian Chadd freebsd_committer freebsd_triage 2012-03-08 23:40:10 UTC
I've been seeing TX hangs during my tests.

Investigating showed that the TX queue would grow and busy buffers would stay busy.

For example, from sysctl dev.ath.0.txagg=1:


HW TXQ 0: axq_depth=0, axq_aggr_depth=0
HW TXQ 1: axq_depth=184, axq_aggr_depth=0
HW TXQ 2: axq_depth=0, axq_aggr_depth=0
HW TXQ 3: axq_depth=0, axq_aggr_depth=0
HW TXQ 8: axq_depth=1, axq_aggr_depth=0
Busy: 14
Total TX buffers: 15; Total TX buffers busy: 1

This occurred even with a completely idle access point that only responded to probe requests, i.e. with no active associations.

The only way to flush things was a 'scan', which forcibly flushes the TX queue; pending frames are then either handled or deleted.

I then flipped on reset debugging (sysctl dev.ath.0.debug=0x20) and forced a scan whenever I saw this occur.

I also dumped the relevant registers when this occurred. I found that the TXDP for this queue was completely in the wrong place.

I also found that the TX descriptor list made no sense: there were incomplete and complete descriptor lists in the same TX queue, as well as NULL link pointers halfway through the list.

So, I figured something is splicing the list together incorrectly.
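
The kind of inconsistency described above can be checked for mechanically. The following is only a rough sketch under simplifying assumptions, not driver code: struct txbuf_view and first_broken_link() are hypothetical names, each buffer is reduced to the bus address of its first descriptor plus the link field of its last descriptor, and the final buffer in the queue is assumed to carry a NULL (0) link.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one queued TX buffer (not the real ath_buf). */
struct txbuf_view {
	uint32_t daddr;		/* bus address of the buffer's first descriptor */
	uint32_t last_link;	/* link field of its last descriptor, 0 = NULL */
};

/*
 * Walk the buffers in software queue order and return the index of the
 * first buffer whose link does not point at the buffer that follows it,
 * or -1 if the chain is consistent.  The final buffer is assumed to end
 * the chain with a NULL (0) link.
 */
static int
first_broken_link(const struct txbuf_view *q, size_t n)
{
	size_t i;

	assert(q != NULL || n == 0);

	for (i = 0; i < n; i++) {
		uint32_t want = (i + 1 < n) ? q[i + 1].daddr : 0;

		if (q[i].last_link != want)
			return (int)i;
	}
	return -1;
}
```

A queue whose second buffer carries a NULL link while a third buffer is still queued behind it would be reported at index 1, which is the "NULL link pointer halfway through the list" pattern.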

Fix: This particular patch seems to quieten down the issues. I'm going to run this a bit more and see what happens.

How-To-Repeat: This kernel was compiled with TDMA support, so the ATH_BUF_BUSY flag would be set.

* set it up on a 2.4GHz channel;
* make sure there are lots of STAs and APs around;
* notice the high level of probe request traffic;
* .. wait.
Comment 1 Adrian Chadd freebsd_committer freebsd_triage 2012-03-08 23:50:34 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-wireless

Change to owner
Comment 2 dfilter service freebsd_committer freebsd_triage 2012-03-08 23:53:47 UTC
Author: adrian
Date: Thu Mar  8 23:53:38 2012
New Revision: 232707
URL: http://svn.freebsd.org/changeset/base/232707

Log:
  Correctly initialise the TXQ link pointer to the last descriptor in
  the last buffer in the list.
  
  The current behaviour (due to me, so pointy hat is firmly on my head here)
  was incorrect - it was setting the link pointer to the last descriptor
  of the _first_ buffer in the TXQ.  Instead, it should have set it to the
  last descriptor in the _last_ buffer in the TXQ.
  
  This showed up as occasional TX stalls with frames in the TXQ but no
  TX progress being made.  Further inspection showed the TXQ looked like
  it contained multiple "lists" of frames - there'd be a list of correct
  frames, then a NULL link pointer, but there'd be a next buffer in the
  list.
  
  Since this code is only called upon an interface reset, it's likely
  this only began showing up when I started doing stress testing
  in environments which annoy the radios enough to cause lockups.
  
  I've not yet seen any TX stalls with this patch applied.
  
  PR:		kern/165866

Modified:
  head/sys/dev/ath/if_ath_tx.c

Modified: head/sys/dev/ath/if_ath_tx.c
==============================================================================
--- head/sys/dev/ath/if_ath_tx.c	Thu Mar  8 23:52:22 2012	(r232706)
+++ head/sys/dev/ath/if_ath_tx.c	Thu Mar  8 23:53:38 2012	(r232707)
@@ -623,19 +623,22 @@ void
 ath_txq_restart_dma(struct ath_softc *sc, struct ath_txq *txq)
 {
 	struct ath_hal *ah = sc->sc_ah;
-	struct ath_buf *bf;
+	struct ath_buf *bf, *bf_last;
 
 	ATH_TXQ_LOCK_ASSERT(txq);
 
 	/* This is always going to be cleared, empty or not */
 	txq->axq_flags &= ~ATH_TXQ_PUTPENDING;
 
+	/* XXX make this ATH_TXQ_FIRST */
 	bf = TAILQ_FIRST(&txq->axq_q);
+	bf_last = ATH_TXQ_LAST(txq, axq_q_s);
+
 	if (bf == NULL)
 		return;
 
 	ath_hal_puttxbuf(ah, txq->axq_qnum, bf->bf_daddr);
-	txq->axq_link = &bf->bf_lastds->ds_link;
+	txq->axq_link = &bf_last->bf_lastds->ds_link;
 	ath_hal_txstart(ah, txq->axq_qnum);
 }
 
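
To illustrate why the choice of buffer matters in r232707, here is a small userland sketch. It is a toy model under stated assumptions, not the driver: struct desc, struct buf, struct txq, enqueue() and reachable_after_restart() are hypothetical stand-ins, every buffer owns exactly one descriptor, and "bus addresses" are just 1-based indexes into a descriptor array with 0 meaning NULL. Only the TAILQ macros from <sys/queue.h> are real.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/queue.h>

/* Hypothetical stand-ins for the driver structures. */
struct desc {
	uint32_t ds_link;		/* "address" of next descriptor, 0 = end */
};

struct buf {
	struct desc *bf_lastds;		/* last descriptor of this buffer */
	TAILQ_ENTRY(buf) bf_list;
};

TAILQ_HEAD(bufhead, buf);

struct txq {
	struct bufhead axq_q;
	uint32_t *axq_link;		/* where the next frame gets chained in */
};

/* Old behaviour: link pointer taken from the FIRST buffer in the queue. */
static void
restart_buggy(struct txq *q)
{
	struct buf *bf = TAILQ_FIRST(&q->axq_q);

	if (bf == NULL)
		return;
	q->axq_link = &bf->bf_lastds->ds_link;
}

/* New behaviour: link pointer taken from the LAST buffer in the queue. */
static void
restart_fixed(struct txq *q)
{
	struct buf *bf = TAILQ_FIRST(&q->axq_q);
	struct buf *bf_last = TAILQ_LAST(&q->axq_q, bufhead);

	if (bf == NULL)
		return;
	assert(bf_last != NULL);
	q->axq_link = &bf_last->bf_lastds->ds_link;
}

/* Chain a new buffer in: patch *axq_link, then move axq_link to the new tail. */
static void
enqueue(struct txq *q, struct buf *bf, uint32_t daddr)
{
	TAILQ_INSERT_TAIL(&q->axq_q, bf, bf_list);
	if (q->axq_link != NULL)
		*q->axq_link = daddr;
	q->axq_link = &bf->bf_lastds->ds_link;
}

/*
 * Queue three chained buffers, run a restart, enqueue a fourth buffer and
 * count how many descriptors are reachable by following ds_link from the head.
 */
static int
reachable_after_restart(int use_fix)
{
	struct desc d[4] = { {2}, {3}, {0}, {0} };	/* d0 -> d1 -> d2 -> end */
	struct buf b[4];
	struct txq q;
	uint32_t addr;
	int i, count = 0;

	TAILQ_INIT(&q.axq_q);
	q.axq_link = NULL;
	for (i = 0; i < 4; i++)
		b[i].bf_lastds = &d[i];
	for (i = 0; i < 3; i++)
		TAILQ_INSERT_TAIL(&q.axq_q, &b[i], bf_list);

	if (use_fix)
		restart_fixed(&q);
	else
		restart_buggy(&q);

	enqueue(&q, &b[3], 4);

	for (addr = 1; addr != 0 && count < 4; addr = d[addr - 1].ds_link)
		count++;
	return count;
}
```

In this model, reachable_after_restart(0) returns 2: the new frame is spliced in right after the first descriptor, so the second and third buffers drop out of the chain and the third buffer keeps a NULL link even though another buffer follows it in the queue. reachable_after_restart(1) returns 4, i.e. the whole queue stays on one chain.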
Comment 3 Adrian Chadd freebsd_committer freebsd_triage 2012-03-08 23:54:18 UTC
.. and a compressed version.



adrian
Comment 4 Eitan Adler freebsd_committer freebsd_triage 2018-05-28 19:46:56 UTC
batch change:

For bugs that match the following
-  Status Is In progress 
AND
- Untouched since 2018-01-01.
AND
- Affects Base System OR Documentation

DO:

Reset to open status.


Note:
I did a quick pass but if you are getting this email it might be worthwhile to double check to see if this bug ought to be closed.
Comment 5 Oleksandr Tymoshenko freebsd_committer freebsd_triage 2019-01-20 01:14:03 UTC
There was a commit referencing this bug, but it's still not closed and has been inactive for some time. Closing as fixed. Please re-open it if the issue hasn't been completely resolved.