There are panics in the net80211 fast-frame (FF) queue ageing and flushing code.

It looks like the staging queue ends up being empty, and the net80211 FF routines have KASSERT()s to make sure the queue isn't empty. I'm guessing it's a sanity check: the routines shouldn't be called when the queues are empty. However, the emptiness check is done without the comlock being held, so it's entirely plausible that there's a race or preemption between the check and the actual ageing/emptying of the queue, where another thread (on another CPU, or one that preempted us) empties the FF AC queue for us. The original caller then trips the KASSERT() and panics.

kgdb analysis of a crashdump shows:

* ath_tx_processq()
* ieee80211_ff_flush()
* ieee80211_ff_age()

ieee80211_ff_flush() checks if the queue is empty and, if not, calls ieee80211_ff_age(). The FF routines are called from a number of places, and these calls can and do overlap.

Fix:

The solutions?

* stick the ieee80211_ff_*() calls in a specific taskqueue and call them from there, rather than from the TX, RX and TX completion contexts;
* grab the comlock before checking, and make sure the function expects the comlock to be held and releases the comlock afterwards;
* just accept (and document) that the check is racy/opportunistic, and remove the "is the queue empty?" KASSERT()s in the FF code.

How-To-Repeat:

* run 9-stable or -head with assert/witness enabled;
* run iperf TCP between FF-capable stations and wait a while; it'll eventually trigger!
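The second option in the Fix list (check and drain under the same lock) can be sketched in userspace as below. This is only an illustration of the locking pattern, not net80211 code: `struct stageq`, `stageq_age_locked()` and `stageq_flush()` are hypothetical names, and a pthread mutex stands in for the comlock.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for an FF staging queue; NOT the net80211 structure. */
struct stageq {
    pthread_mutex_t lock;   /* plays the role of the comlock */
    int depth;              /* number of frames currently staged */
};

/*
 * Drain the queue and return how many frames were flushed.
 * Caller must hold q->lock. An empty queue is tolerated here
 * rather than asserted on, so a stale caller cannot panic.
 */
static int stageq_age_locked(struct stageq *q)
{
    int flushed = q->depth;

    q->depth = 0;
    return flushed;
}

/*
 * Check for work and drain within a single lock hold, so no other
 * thread can empty the queue between the check and the drain.
 */
static int stageq_flush(struct stageq *q)
{
    int flushed = 0;

    pthread_mutex_lock(&q->lock);
    if (q->depth > 0)
        flushed = stageq_age_locked(q);
    pthread_mutex_unlock(&q->lock);
    return flushed;
}
```

With a queue initialised as `{ PTHREAD_MUTEX_INITIALIZER, 3 }`, the first `stageq_flush()` call drains three frames and any later or concurrent call simply finds nothing to do. In the kernel the equivalent would be a lock-held assertion in the ageing routine instead of an "is the queue empty?" assertion, but that is an assumption about how such a fix could look, not a description of the committed change.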
Responsible Changed From-To: freebsd-bugs->freebsd-wireless punt to maintainer list
batch change:

For bugs that match the following
- Status Is In progress
AND
- Untouched since 2018-01-01.
AND
- Affects Base System OR Documentation

DO:

Reset to open status.

Note: I did a quick pass, but if you are getting this email it might be worthwhile to double check to see if this bug ought to be closed.
I believe this has been fixed in base r244044. Please re-open the PR if it's not the case.
There was a commit referencing this bug (base r244044), but it's still not closed and has been inactive for some time. Closing as fixed. Please re-open it if the issue hasn't been completely resolved.
Yes, it was later fixed once more in base r244051; locking was added in base r302283.