I've been seeing TX hangs during my tests. Investigating showed that the TX queue would grow and busy buffers would stay busy.

E.g., from sysctl dev.ath.0.txagg=1:

HW TXQ 0: axq_depth=0, axq_aggr_depth=0
HW TXQ 1: axq_depth=184, axq_aggr_depth=0
HW TXQ 2: axq_depth=0, axq_aggr_depth=0
HW TXQ 3: axq_depth=0, axq_aggr_depth=0
HW TXQ 8: axq_depth=1, axq_aggr_depth=0
Busy: 14
Total TX buffers: 15; Total TX buffers busy: 1

This occurred even with a completely idle access point that only responded to probe requests, i.e. with no active associations. The only way to flush things was a 'scan': this forcibly flushes the TX queue, and pending frames are either handled or deleted.

I then flipped on reset debugging (sysctl dev.ath.0.debug=0x20) and forced a scan whenever I saw this occur. I also dumped the relevant registers when this occurred.

I found that the TXDP for this queue was completely in the wrong place. I also found that the TX descriptor list made no sense: there were incomplete and complete descriptor lists in the same TX queue, as well as NULL link pointers halfway through the list.

So I figured something is splicing the list together incorrectly.

Fix:

This particular patch seems to quieten down the issues. I'm going to run this a bit more and see what happens.

How-To-Repeat:

This kernel was compiled with TDMA support, so the ATH_BUF_BUSY flag would be set.

* set it up on a 2.4GHz channel;
* make sure there are lots of STAs and APs around;
* notice the high level of probe request traffic;
* .. wait.
Responsible Changed
From-To: freebsd-bugs->freebsd-wireless

Change to owner
Author: adrian
Date: Thu Mar 8 23:53:38 2012
New Revision: 232707
URL: http://svn.freebsd.org/changeset/base/232707

Log:
Correctly initialise the TXQ link pointer to the last descriptor in the last buffer in the list.

The current behaviour (due to me, so pointy hat is firmly on my head here) was incorrect - it was setting the link pointer to the last descriptor of the _first_ buffer in the TXQ. Instead, it should have set it to the last descriptor in the _last_ buffer in the TXQ.

This showed up as occasional TX stalls with frames in the TXQ but no TX progress being made. Further inspection showed the TXQ looked like it contained multiple "lists" of frames - there'd be a list of correct frames, then a NULL link pointer, but there'd be a next buffer in the list.

Since this code is only called upon an interface reset, it's likely this only began showing up when I started doing stress testing in environments which annoy the radios enough to cause lockups.

I've not yet seen any TX stalls with this patch applied.
PR: kern/165866

Modified:
  head/sys/dev/ath/if_ath_tx.c

Modified: head/sys/dev/ath/if_ath_tx.c
==============================================================================
--- head/sys/dev/ath/if_ath_tx.c	Thu Mar 8 23:52:22 2012	(r232706)
+++ head/sys/dev/ath/if_ath_tx.c	Thu Mar 8 23:53:38 2012	(r232707)
@@ -623,19 +623,22 @@ void
 ath_txq_restart_dma(struct ath_softc *sc, struct ath_txq *txq)
 {
 	struct ath_hal *ah = sc->sc_ah;
-	struct ath_buf *bf;
+	struct ath_buf *bf, *bf_last;
 
 	ATH_TXQ_LOCK_ASSERT(txq);
 
 	/* This is always going to be cleared, empty or not */
 	txq->axq_flags &= ~ATH_TXQ_PUTPENDING;
 
+	/* XXX make this ATH_TXQ_FIRST */
 	bf = TAILQ_FIRST(&txq->axq_q);
+	bf_last = ATH_TXQ_LAST(txq, axq_q_s);
+
 	if (bf == NULL)
 		return;
 
 	ath_hal_puttxbuf(ah, txq->axq_qnum, bf->bf_daddr);
-	txq->axq_link = &bf->bf_lastds->ds_link;
+	txq->axq_link = &bf_last->bf_lastds->ds_link;
 	ath_hal_txstart(ah, txq->axq_qnum);
 }
 
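For readers who want to see the first-versus-last distinction in isolation, here is a minimal userland sketch using the <sys/queue.h> TAILQ macros. It is not driver code: every toy_* name is hypothetical, each buffer carries a single descriptor for simplicity, and it assumes that ATH_TXQ_LAST behaves like TAILQ_LAST on the queue head. It only shows which buffer's link field the queue's link pointer ends up referring to under the old and new logic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/queue.h>

/* Hypothetical stand-ins; NOT the real ath(4) structures. */
struct toy_desc {
	uint32_t ds_link;		/* fake "next descriptor" address, 0 = end */
};

struct toy_buf {
	TAILQ_ENTRY(toy_buf) bf_list;
	struct toy_desc bf_desc;	/* one descriptor per buffer (simplification) */
	struct toy_desc *bf_lastds;	/* last descriptor of this buffer */
};

TAILQ_HEAD(toy_bufq, toy_buf);

struct toy_txq {
	struct toy_bufq axq_q;
	uint32_t *axq_link;		/* where the next queued frame would be chained */
};

/* Build a queue of n buffers, in order, with no link pointer set yet. */
void
toy_txq_fill(struct toy_txq *txq, struct toy_buf *bufs, int n)
{
	TAILQ_INIT(&txq->axq_q);
	txq->axq_link = NULL;
	for (int i = 0; i < n; i++) {
		bufs[i].bf_desc.ds_link = 0;
		bufs[i].bf_lastds = &bufs[i].bf_desc;
		TAILQ_INSERT_TAIL(&txq->axq_q, &bufs[i], bf_list);
	}
}

/* Old logic: link pointer taken from the FIRST buffer in the queue. */
void
toy_restart_old(struct toy_txq *txq)
{
	struct toy_buf *bf = TAILQ_FIRST(&txq->axq_q);

	if (bf == NULL)
		return;
	txq->axq_link = &bf->bf_lastds->ds_link;
}

/* New logic: link pointer taken from the LAST buffer in the queue. */
void
toy_restart_new(struct toy_txq *txq)
{
	struct toy_buf *bf = TAILQ_FIRST(&txq->axq_q);
	struct toy_buf *bf_last = TAILQ_LAST(&txq->axq_q, toy_bufq);

	if (bf == NULL)
		return;
	txq->axq_link = &bf_last->bf_lastds->ds_link;
}

/* Return the index of the buffer whose link field axq_link points at, or -1. */
int
toy_link_owner(const struct toy_txq *txq, const struct toy_buf *bufs, int n)
{
	for (int i = 0; i < n; i++)
		if (txq->axq_link == &bufs[i].bf_desc.ds_link)
			return i;
	return -1;
}
```

With three buffers queued, toy_restart_old() leaves the link pointer inside buffer 0, so anything chained afterwards would be written into the middle of the list, whereas toy_restart_new() leaves it inside buffer 2, the tail. On an empty queue both functions return without touching the link pointer.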
.. and a compressed version. adrian
batch change:

For bugs that match the following
- Status Is In progress
AND
- Untouched since 2018-01-01.
AND
- Affects Base System OR Documentation

DO:

Reset to open status.

Note:
I did a quick pass but if you are getting this email it might be worthwhile to double check to see if this bug ought to be closed.
There was a commit referencing this bug, but it's still not closed and has been inactive for some time. Closing as fixed. Please re-open it if the issue hasn't been completely resolved.