This reared its ugly head when testing with an AR5211 (11b/11a, no 11g). Specifically:

* the operational mode change has occurred (sc->sc_currates is pointing to the 11b table);
* ath_sample_node->ratemask is 0xff for some reason, likely indicating it was assembled from the 11a rate table (which in ath_hal/ar5211/ar5211_phy.c has 8 11a rates in it);
* so ath_rate_findrate() thinks best_rix is fine and the current rate table mapping is fine.

This is likely very similar to other issues with rate control in ath being slightly weird after an operational mode change, if the NIC hasn't transitioned back into the original operating mode. The rate control code isn't informed of this (it only gets told of association/reassociation, and ath_rate_sample only updates the rate table on _new_ associations), so it doesn't realise it has to rethink its current rate table setup.

Fix:

I'm not yet sure.

Because of background scanning, it's entirely possible the NIC will spend a non-zero amount of time off channel, TX'ing things which SHOULD have fixed rates. The ath_rate module code isn't currently informed about channel changes, as the channel change doesn't inform all associated nodes of this fact. Any rate control lookups during off-channel times will cause things to be confused.

I should first check whether this crash occurred with the NIC being in off-channel mode. If so, it shouldn't have tried TXing a data frame at this point.

No, I just checked: ni->ni_vap->iv_flags is 0x430c4010, and 0x80 is IEEE80211_F_SCAN; 0x100 is IEEE80211_F_ASCAN. Neither bit is set in that value, so the VAP wasn't flagged as scanning.

So first, let's see whether it can be made obvious _why_ the NIC is in 11b mode. Then, once that's done, figure out why the transition didn't trigger a rate control update.

How-To-Repeat:

Setup:

* net80211/ath and kernel built with full debugging, assert, witness, etc.
* associated to an 11a AP (so it has the 11a OFDM table)
* running iperf
* the session hangs for some reason, I'm not quite sure why yet
* .. then the bgscan code kicks in and starts scanning
* .. and for some reason, the NIC is in 11b mode now, and tries TX'ing
* But the "best rix" in ath_rate_findrate (in ath_rate_sample) is referencing an 11a rate, not an 11b rate, i.e. rix > the current greatest rix in the config.
* .. so things panic.
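The flag check and the ratemask observation above can be reproduced in user space. What follows is only a sketch: the two flag values are the ones quoted in the report, while vap_is_scanning(), highest_rix() and ratemask_exceeds_table() are hypothetical helpers that do not exist in net80211 or ath. The 4-entry size used for the 11b table in the usage note is an assumption; the report only states that the 11a table has 8 rates.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values as quoted in the report. */
#define IEEE80211_F_SCAN	0x00000080u
#define IEEE80211_F_ASCAN	0x00000100u

/* Hypothetical helper: nonzero if either scan bit is set in iv_flags. */
static int
vap_is_scanning(uint32_t iv_flags)
{
	return (iv_flags & (IEEE80211_F_SCAN | IEEE80211_F_ASCAN)) != 0;
}

/* Hypothetical helper: highest rate index enabled in ratemask, or -1. */
static int
highest_rix(uint32_t ratemask)
{
	int rix = -1;

	for (int i = 0; i < 32; i++) {
		if (ratemask & (1u << i))
			rix = i;
	}
	return rix;
}

/*
 * Hypothetical helper: nonzero if ratemask references a rate index
 * that does not exist in a table holding rate_count entries.
 */
static int
ratemask_exceeds_table(uint32_t ratemask, int rate_count)
{
	assert(rate_count > 0);
	return highest_rix(ratemask) >= rate_count;
}
```

With the value from the report, vap_is_scanning(0x430c4010) is zero, which matches the conclusion that the VAP was not flagged as scanning. A ratemask of 0xff enables indices 0 through 7, so ratemask_exceeds_table(0xff, 8) is zero for the 8-rate 11a table, but it is nonzero for any smaller table, such as an assumed 4-rate 11b table.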
Responsible Changed
From-To: freebsd-bugs->freebsd-wireless

Reassign
Author: adrian
Date: Sun Feb 26 06:04:44 2012
New Revision: 232170
URL: http://svn.freebsd.org/changeset/base/232170

Log:
  Add in some debugging code to check whether the current rate table
  has been bait-and-switched from the rate control code.

  This will avoid the panic that I saw and will avoid sending invalid
  rates (eg 11a/11g OFDM rates when in 11b, on 11b-only NICs (AR5211))
  where the rate table is not "big".

  It also will point out situations where this occurs for the 11n NICs
  which will have sufficiently large rate tables that "invalid rix" doesn't
  occur.

  I'll try to follow this up with a commit that adds a current operating mode
  check. The "rix" is only relevant to the current operating mode and
  rate table.

  PR: kern/165475

Modified:
  head/sys/dev/ath/ath_rate/sample/sample.c
  head/sys/dev/ath/ath_rate/sample/sample.h

Modified: head/sys/dev/ath/ath_rate/sample/sample.c
==============================================================================
--- head/sys/dev/ath/ath_rate/sample/sample.c	Sun Feb 26 02:24:40 2012	(r232169)
+++ head/sys/dev/ath/ath_rate/sample/sample.c	Sun Feb 26 06:04:44 2012	(r232170)
@@ -495,6 +495,14 @@ ath_rate_findrate(struct ath_softc *sc,

 	ath_rate_update_static_rix(sc, &an->an_node);

+	if (sn->currates != sc->sc_currates) {
+		device_printf(sc->sc_dev, "%s: currates != sc_currates!\n",
+		    __func__);
+		rix = 0;
+		*try0 = ATH_TXMAXTRY;
+		goto done;
+	}
+
 	if (sn->static_rix != -1) {
 		rix = sn->static_rix;
 		*try0 = ATH_TXMAXTRY;
@@ -621,6 +629,20 @@ ath_rate_findrate(struct ath_softc *sc,
 	}
 	*try0 = mrr ? sn->sched[rix].t0 : ATH_TXMAXTRY;
 done:
+
+	/*
+	 * This bug totally sucks and should be fixed.
+	 *
+	 * For now though, let's not panic, so we can start to figure
+	 * out how to better reproduce it.
+	 */
+	if (rix < 0 || rix >= rt->rateCount) {
+		printf("%s: ERROR: rix %d out of bounds (rateCount=%d)\n",
+		    __func__,
+		    rix,
+		    rt->rateCount);
+		rix = 0;	/* XXX just default for now */
+	}
 	KASSERT(rix >= 0 && rix < rt->rateCount, ("rix is %d", rix));

 	*rix0 = rix;
@@ -1073,6 +1095,8 @@ ath_rate_ctl_reset(struct ath_softc *sc,
 	sn->static_rix = -1;
 	ath_rate_update_static_rix(sc, ni);

+	sn->currates = sc->sc_currates;
+
 	/*
 	 * Construct a bitmask of usable rates. This has all
 	 * negotiated rates minus those marked by the hal as

Modified: head/sys/dev/ath/ath_rate/sample/sample.h
==============================================================================
--- head/sys/dev/ath/ath_rate/sample/sample.h	Sun Feb 26 02:24:40 2012	(r232169)
+++ head/sys/dev/ath/ath_rate/sample/sample.h	Sun Feb 26 06:04:44 2012	(r232170)
@@ -86,6 +86,8 @@ struct sample_node {
 	uint32_t ratemask;		/* bit mask of valid rate indices */
 	const struct txschedule *sched;	/* tx schedule table */

+	const HAL_RATE_TABLE *currates;
+
 	struct rate_stats stats[NUM_PACKET_SIZE_BINS][SAMPLE_MAXRATES];
 	int last_sample_rix[NUM_PACKET_SIZE_BINS];
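The two checks added in r232170 can be exercised outside the kernel with mock types. This is a minimal sketch, not the driver code: struct mock_rate_table, struct mock_sample_node, struct mock_softc and safe_rix() are hypothetical stand-ins for HAL_RATE_TABLE, struct sample_node, struct ath_softc and the tail of ath_rate_findrate(). The sketch also leaves out the device_printf()/printf() diagnostics and simply returns the fallback index.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures named in the diff. */
struct mock_rate_table {
	int rateCount;				/* number of entries in the table */
};

struct mock_sample_node {
	const struct mock_rate_table *currates;	/* table seen at last reset */
};

struct mock_softc {
	const struct mock_rate_table *sc_currates; /* table currently in use */
};

/*
 * Return a rate index that is safe to use with sc->sc_currates.
 * Falls back to index 0 when the node's cached table is stale or when
 * best_rix lies outside the current table, mirroring the two checks
 * in the commit.
 */
static int
safe_rix(const struct mock_softc *sc, const struct mock_sample_node *sn,
    int best_rix)
{
	const struct mock_rate_table *rt = sc->sc_currates;

	assert(rt != NULL && rt->rateCount > 0);

	/* Check 1: the rate table changed since the node was last reset. */
	if (sn->currates != sc->sc_currates)
		return 0;

	/* Check 2: the chosen index does not exist in the current table. */
	if (best_rix < 0 || best_rix >= rt->rateCount)
		return 0;

	return best_rix;
}
```

For a node whose currates still points at an 8-entry table while sc_currates points at a smaller one, safe_rix() returns 0 whatever best_rix is. Once the node has been reset against the current table, in-range indices pass through unchanged and out-of-range ones fall back to 0. As the commit log says, this only avoids the panic; it does not explain why the table changed underneath the rate control code.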
batch change:

For bugs that match the following
- Status Is In progress
AND
- Untouched since 2018-01-01.
AND
- Affects Base System OR Documentation

DO:

Reset to open status.

Note: I did a quick pass, but if you are getting this email it might be worthwhile to double check to see if this bug ought to be closed.
There was a commit referencing this bug, but it's still not closed and has been inactive for some time. Closing as fixed. Please re-open it if the issue hasn't been completely resolved.