Bug 165475 - [ath] operational mode change doesn't poke the underlying rate control module hard enough
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: wireless
Version: 9.0-RELEASE
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: freebsd-wireless (Nobody)
Reported: 2012-02-25 19:50 UTC by Adrian Chadd
Modified: 2019-01-20 01:13 UTC

Description Adrian Chadd freebsd_committer freebsd_triage 2012-02-25 19:50:11 UTC
This reared its ugly head when testing with an AR5211 (11b/11a, no 11g.)


Specifically:

* the operational mode change has occurred (sc->sc_currates is pointing to the 11b table);
* ath_sample_node->ratemask is 0xff for some reason - likely indicating it was assembled from the 11a rate table (which in ath_hal/ar5211/ar5211_phy.c has 8 11a rates in it);
* so ath_rate_findrate() thinks best_rix is fine and the current rate table mapping is fine.
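
The mismatch described in the list above can be illustrated with a small standalone sketch. It is not driver code: highest_rix() and ratemask_fits() are hypothetical helpers, and the table sizes used in the note below (8 entries for 11a as mentioned above, 4 entries for 11b, one per 802.11b rate) are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Return the highest rate index set in a ratemask, or -1 if the mask is empty. */
int
highest_rix(uint32_t ratemask)
{
	int rix = -1;

	for (int i = 0; i < 32; i++)
		if (ratemask & ((uint32_t)1 << i))
			rix = i;
	return (rix);
}

/*
 * Hypothetical check: a ratemask is only usable with a rate table if
 * every set bit indexes an entry of that table.
 */
int
ratemask_fits(uint32_t ratemask, int rate_count)
{
	assert(rate_count > 0);
	return (highest_rix(ratemask) < rate_count);
}
```

With these helpers, ratemask_fits(0xff, 8) holds for an 8-entry table, while ratemask_fits(0xff, 4) fails for a 4-entry one: bits 4 through 7 name indices the smaller table does not have.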

This is likely very similar to other issues with rate control in ath being slightly weird after an operational mode change, if the NIC hasn't transitioned back into the original operating mode. The rate control code isn't informed of this (it only gets told of association/reassociation, and ath_rate_sample is only updating the rate table on _new_ associations) so it doesn't realise it has to rethink its current rate table setup.

Fix: 

I'm not yet sure.

Because of background scanning, it's entirely possible the NIC will spend a non-zero amount of time off channel, TX'ing things which SHOULD have fixed rates.

The ath_rate module code isn't currently informed about channel changes, as the channel change doesn't inform all associated nodes of this fact.

Any rate control lookups during off-channel times will cause things to be confused.

I should first check whether this crash occurred while the NIC was off-channel. If so, it shouldn't have tried TXing a data frame at that point. No, I just checked - ni->ni_vap->iv_flags is 0x430c4010 - and 0x80 is IEEE80211_F_SCAN; 0x100 is IEEE80211_F_ASCAN. Neither bit is set in that value, so the vap was not flagged as scanning.
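
The flag check can be reproduced outside the kernel with a minimal sketch. The two bit values are the ones quoted in this comment; vap_is_scanning() and check_reported_flags() are hypothetical names, not net80211 functions.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values as quoted in the bug description. */
#define	IEEE80211_F_SCAN	0x00000080
#define	IEEE80211_F_ASCAN	0x00000100

/* Return 1 if either scan flag is set in the given iv_flags value, else 0. */
int
vap_is_scanning(uint32_t iv_flags)
{
	return ((iv_flags & (IEEE80211_F_SCAN | IEEE80211_F_ASCAN)) != 0);
}

/* Apply the check to the iv_flags value reported in this bug. */
void
check_reported_flags(void)
{
	assert(!vap_is_scanning(0x430c4010u));
}
```

The low 16 bits of 0x430c4010 are 0x4010, which has neither 0x80 nor 0x100 set.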

So first let's see if _why_ the NIC is in 11b mode can be made obvious. Then, once that's done, figure out why the transition didn't trigger a rate control update.
How-To-Repeat: 

Setup:

* net80211/ath and kernel built with full debugging, assert, witness, etc
* associated to an 11a AP (so it has the 11a OFDM table)
* running iperf
* the session hangs for some reason, I'm not quite sure yet
* .. then the bgscan code kicks in and starts scanning
* .. and for some reason, the NIC is in 11b mode now, and tries TX'ing
* But the "best rix" in ath_rate_findrate (in ath_rate_sample) is referencing an 11a rate, not an 11b rate - i.e., rix is greater than the largest rix in the current rate table.
* .. so things panic.
Comment 1 Adrian Chadd freebsd_committer freebsd_triage 2012-02-25 19:53:43 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-wireless

Reassign
Comment 2 dfilter service freebsd_committer freebsd_triage 2012-02-26 06:05:00 UTC
Author: adrian
Date: Sun Feb 26 06:04:44 2012
New Revision: 232170
URL: http://svn.freebsd.org/changeset/base/232170

Log:
  Add in some debugging code to check whether the current rate table has
  been bait-and-switched from the rate control code.
  
  This will avoid the panic that I saw and will avoid sending invalid rates
  (eg 11a/11g OFDM rates when in 11b, on 11b-only NICs (AR5211)) where the
  rate table is not "big".
  
  It also will point out situations where this occurs for the 11n NICs
  which will have sufficiently large rate tables that "invalid rix" doesn't
  occur.
  
  I'll try to follow this up with a commit that adds a current operating mode
  check. The "rix" is only relevant to the current operating mode and rate
  table.
  
  PR:	kern/165475

Modified:
  head/sys/dev/ath/ath_rate/sample/sample.c
  head/sys/dev/ath/ath_rate/sample/sample.h

Modified: head/sys/dev/ath/ath_rate/sample/sample.c
==============================================================================
--- head/sys/dev/ath/ath_rate/sample/sample.c	Sun Feb 26 02:24:40 2012	(r232169)
+++ head/sys/dev/ath/ath_rate/sample/sample.c	Sun Feb 26 06:04:44 2012	(r232170)
@@ -495,6 +495,14 @@ ath_rate_findrate(struct ath_softc *sc, 
 
 	ath_rate_update_static_rix(sc, &an->an_node);
 
+	if (sn->currates != sc->sc_currates) {
+		device_printf(sc->sc_dev, "%s: currates != sc_currates!\n",
+		    __func__);
+		rix = 0;
+		*try0 = ATH_TXMAXTRY;
+		goto done;
+	}
+
 	if (sn->static_rix != -1) {
 		rix = sn->static_rix;
 		*try0 = ATH_TXMAXTRY;
@@ -621,6 +629,20 @@ ath_rate_findrate(struct ath_softc *sc, 
 	}
 	*try0 = mrr ? sn->sched[rix].t0 : ATH_TXMAXTRY;
 done:
+
+	/*
+	 * This bug totally sucks and should be fixed.
+	 *
+	 * For now though, let's not panic, so we can start to figure
+	 * out how to better reproduce it.
+	 */
+	if (rix < 0 || rix >= rt->rateCount) {
+		printf("%s: ERROR: rix %d out of bounds (rateCount=%d)\n",
+		    __func__,
+		    rix,
+		    rt->rateCount);
+		    rix = 0;	/* XXX just default for now */
+	}
 	KASSERT(rix >= 0 && rix < rt->rateCount, ("rix is %d", rix));
 
 	*rix0 = rix;
@@ -1073,6 +1095,8 @@ ath_rate_ctl_reset(struct ath_softc *sc,
         sn->static_rix = -1;
 	ath_rate_update_static_rix(sc, ni);
 
+	sn->currates = sc->sc_currates;
+
 	/*
 	 * Construct a bitmask of usable rates.  This has all
 	 * negotiated rates minus those marked by the hal as

Modified: head/sys/dev/ath/ath_rate/sample/sample.h
==============================================================================
--- head/sys/dev/ath/ath_rate/sample/sample.h	Sun Feb 26 02:24:40 2012	(r232169)
+++ head/sys/dev/ath/ath_rate/sample/sample.h	Sun Feb 26 06:04:44 2012	(r232170)
@@ -86,6 +86,8 @@ struct sample_node {
 	uint32_t ratemask;		/* bit mask of valid rate indices */
 	const struct txschedule *sched;	/* tx schedule table */
 
+	const HAL_RATE_TABLE *currates;
+
 	struct rate_stats stats[NUM_PACKET_SIZE_BINS][SAMPLE_MAXRATES];
 	int last_sample_rix[NUM_PACKET_SIZE_BINS];
 
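
The guard pattern in the r232170 diff (remember which rate table the node was set up against, and fall back to index 0 if it no longer matches or the index is out of range) can be sketched standalone as follows. This is a simplified illustration, not the driver code: struct rate_table, struct softc, struct node_state, node_reset() and pick_rix() are hypothetical stand-ins for HAL_RATE_TABLE, struct ath_softc, struct sample_node and the functions touched by the commit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for HAL_RATE_TABLE. */
struct rate_table {
	int rateCount;
};

/* Hypothetical stand-in for struct ath_softc. */
struct softc {
	const struct rate_table *sc_currates;	/* table for the current mode */
};

/* Hypothetical stand-in for struct sample_node. */
struct node_state {
	const struct rate_table *currates;	/* table seen at last reset */
	int best_rix;				/* index chosen by rate control */
};

/* Record the active table, mirroring the ath_rate_ctl_reset() hunk. */
void
node_reset(struct node_state *sn, const struct softc *sc)
{
	sn->currates = sc->sc_currates;
	sn->best_rix = 0;
}

/*
 * Pick a rate index, falling back to index 0 when the node's cached
 * table is stale or the chosen index is outside the current table.
 */
int
pick_rix(const struct node_state *sn, const struct softc *sc)
{
	const struct rate_table *rt = sc->sc_currates;
	int rix;

	assert(rt != NULL && rt->rateCount > 0);
	if (sn->currates != rt)
		return (0);	/* table was switched underneath the node */
	rix = sn->best_rix;
	if (rix < 0 || rix >= rt->rateCount)
		rix = 0;	/* out of bounds: default to index 0 */
	return (rix);
}
```

In this sketch a node that was reset against an 8-entry table and chose index 7 gets index 0 back once sc_currates points at a different table, instead of indexing past the end of the new one.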
Comment 3 Eitan Adler freebsd_committer freebsd_triage 2018-05-28 19:44:54 UTC
batch change:

For bugs that match the following
-  Status Is In progress 
AND
- Untouched since 2018-01-01.
AND
- Affects Base System OR Documentation

DO:

Reset to open status.


Note:
I did a quick pass but if you are getting this email it might be worthwhile to double check to see if this bug ought to be closed.
Comment 4 Oleksandr Tymoshenko freebsd_committer freebsd_triage 2019-01-20 01:13:07 UTC
There was a commit referencing this bug, but it's still not closed and has been inactive for some time. Closing as fixed. Please re-open it if the issue hasn't been completely resolved.