174283 – [net80211] panics in ieee80211_ff_age() and ieee80211_ff_flush()

Status:	Closed FIXED

Alias:	None

Product:	Base System
Classification:	Unclassified
Component:	wireless
Version:	Unspecified
Hardware:	Any Any

Importance:	Normal Affects Only Me
Assignee:	freebsd-wireless (Nobody)

URL:
Keywords:

Depends on:
Blocks:

Reported:	2012-12-09 00:50 UTC by Adrian Chadd
Modified:	2019-01-20 10:58 UTC
CC List:	2 users

See Also:

Description Adrian Chadd freebsd_committer

2012-12-09 00:50:00 UTC

There are panics in the net80211 fast-frame queue ageing and flushing code.

It looks like the staging queue ends up being empty, and the net80211 FF routines have KASSERT()s to make sure the queue isn't empty.  I'm guessing it's a sanity check: the routines shouldn't be called when the queues are empty.

However, the check is done without the comlock being held, so it's entirely plausible that there'll be a race or preemption between the emptiness check and actually ageing/emptying the queue: another thread (on another CPU, or one that preempted us) empties the FF AC queue for us, and once control returns to the first thread, it panics.
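
To make the suspected interleaving concrete, here is a minimal userland sketch that replays it deterministically in a single thread. It is not the net80211 code: `struct stageq` and every function name below are made up for illustration, and the "panic" is modelled as a `false` return instead of a KASSERT().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for one FF staging queue (not the net80211 struct). */
struct stageq {
	int depth;	/* number of frames currently staged */
};

/* Unlocked "is there anything to do?" check, as a caller might perform it. */
static bool
stageq_nonempty(const struct stageq *q)
{
	return (q->depth > 0);
}

/* Drain everything; models another thread flushing the same queue. */
static void
stageq_drain(struct stageq *q)
{
	q->depth = 0;
}

/*
 * Models an ageing routine that insists on a non-empty queue.
 * Returns false at the point where a KASSERT-style check would fire.
 */
static bool
stageq_age_strict(struct stageq *q)
{
	if (q->depth == 0)
		return (false);	/* the kernel would panic here */
	q->depth = 0;
	return (true);
}

/*
 * Replays the interleaving: A checks, B drains, A acts on the stale check.
 * Returns true if A's strict routine would have tripped its assertion.
 */
static bool
replay_race(void)
{
	struct stageq q = { .depth = 3 };
	bool a_saw_work = stageq_nonempty(&q);	/* thread A: unlocked check */

	stageq_drain(&q);			/* thread B: wins the race */
	if (!a_saw_work)
		return (false);
	return (!stageq_age_strict(&q));	/* thread A: stale decision */
}
```

Calling `replay_race()` returns true, i.e. thread A reaches the strict routine with an empty queue even though its own check said there was work to do.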

kgdb analysis of a crashdump shows:

* ath_tx_processq()
* ieee80211_ff_flush()
* ieee80211_ff_age()

ieee80211_ff_flush() checks if the queue is empty and, if not, it calls ieee80211_ff_age().

There's a bunch of places the FF routines are called from and these can and do overlap.

Fix: 

The solutions?

* stick the ieee80211_ff_*() calls in a specific taskqueue and call them from there, rather than from the TX, RX and TX completion contexts;
* grab the comlock before checking, and make sure the function expects the comlock to be held and releases the comlock afterwards;
* just accept (and document) that the check is racy/opportunistic, and remove the "is the queue empty?" KASSERT()s in the FF code.
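
As a rough userland sketch of the second and third options (not the committed fix), the difference looks something like this. A pthread mutex stands in for the comlock, and `struct ff_queue`, `ffq_flush_locked()` and `ffq_flush_opportunistic()` are hypothetical names that do not exist in net80211.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical staging queue; the mutex stands in for the comlock. */
struct ff_queue {
	pthread_mutex_t lock;
	int depth;	/* number of frames currently staged */
};

/*
 * Option 2: take the lock first, then check and drain under it.
 * Returns the number of frames flushed; 0 simply means "nothing to do".
 */
static int
ffq_flush_locked(struct ff_queue *q)
{
	int flushed;

	pthread_mutex_lock(&q->lock);
	flushed = q->depth;	/* the check happens under the lock */
	q->depth = 0;
	pthread_mutex_unlock(&q->lock);
	return (flushed);
}

/*
 * Option 3: keep the unlocked peek as a cheap hint only, and let the
 * locked path tolerate an empty queue instead of asserting on it.
 */
static int
ffq_flush_opportunistic(struct ff_queue *q)
{
	if (q->depth == 0)	/* racy peek; deliberately only a hint */
		return (0);
	return (ffq_flush_locked(q));	/* may legitimately return 0 */
}
```

Either way, an empty queue becomes an ordinary outcome: a caller that loses the race gets 0 back rather than tripping an assertion.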

How-To-Repeat:

* run 9-stable or -head with assert/witness enabled;
* iperf TCP between FF capable stations - just wait a while, it'll eventually trigger!

Comment 1 Adrian Chadd freebsd_committer

2012-12-09 00:50:24 UTC

Responsible Changed
From-To: freebsd-bugs->freebsd-wireless

punt to maintainer list

Comment 2 Eitan Adler freebsd_committer

2018-05-28 19:48:42 UTC

batch change:

For bugs that match the following
-  Status Is In progress 
AND
- Untouched since 2018-01-01.
AND
- Affects Base System OR Documentation

DO:

Reset to open status.


Note:
I did a quick pass but if you are getting this email it might be worthwhile to double check to see if this bug ought to be closed.

Comment 3 Oleksandr Tymoshenko freebsd_committer

2019-01-19 06:44:42 UTC

I believe this has been fixed in base r244044. Please re-open the PR if it's not the case.

Comment 4 Oleksandr Tymoshenko freebsd_committer

2019-01-20 03:24:13 UTC

There was a commit referencing this bug (base r244044), but it's still not closed and has been inactive for some time. Closing as fixed. Please re-open it if the issue hasn't been completely resolved.

Comment 5 Andriy Voskoboinyk freebsd_committer

2019-01-20 10:58:40 UTC

Yes, it was later fixed once more in base r244051; locking was added in base r302283.