Bug 165966 - [ath] ath0: device timeout on SMP machines due to race conditions
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: wireless
Version: 9.0-RELEASE
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: freebsd-wireless (Nobody)
Reported: 2012-03-12 07:40 UTC by Adrian Chadd
Modified: 2018-05-28 19:42 UTC

Description Adrian Chadd freebsd_committer freebsd_triage 2012-03-12 07:40:09 UTC
I've traced down a device timeout to the following race condition.

* Something queues a frame via ath_start();
* The TX is queued to the hardware;
* An interrupt occurs immediately;
* The taskqueue is scheduled;
* The completion occurs via ath_tx_processq();
* sc_wd_timer is set to 0;
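* ... then control returns to ath_start(), which sets sc_wd_timer to 5. The frame has already completed, so nothing clears the timer again and the watchdog fires 5 seconds later.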
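
The ordering above can be replayed deterministically in userland. This is only a rough sketch, not driver code: struct sim_softc and the sim_* functions are hypothetical stand-ins, and the watchdog tick (decrement once per second, fire on reaching zero) is an assumption about how sc_wd_timer is consumed. Only sc_wd_timer and the values 0 and 5 come from the report.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the softc; only sc_wd_timer is named in the report. */
struct sim_softc {
    int sc_wd_timer;   /* watchdog countdown in seconds; 0 means disarmed */
    int txq_depth;     /* frames outstanding on the simulated hardware queue */
};

/* First half of the transmit path: the frame is handed to the hardware. */
static void sim_tx_handoff(struct sim_softc *sc)
{
    sc->txq_depth++;
}

/* Completion path (the role ath_tx_processq() plays): frame done, timer cleared. */
static void sim_tx_complete(struct sim_softc *sc)
{
    assert(sc->txq_depth > 0);
    sc->txq_depth--;
    sc->sc_wd_timer = 0;
}

/* Tail of the transmit path (the role ath_start() plays): arm the watchdog. */
static void sim_arm_watchdog(struct sim_softc *sc)
{
    sc->sc_wd_timer = 5;
}

/* One watchdog tick (assumed once per second); true means "device timeout". */
static bool sim_watchdog_tick(struct sim_softc *sc)
{
    return sc->sc_wd_timer != 0 && --sc->sc_wd_timer == 0;
}

/*
 * Replay one interleaving and report whether a timeout fires within
 * `seconds` ticks. completion_before_arm == true is the racy ordering
 * from the list above; false is the expected ordering.
 */
static bool sim_run(bool completion_before_arm, int seconds)
{
    struct sim_softc sc = { 0, 0 };

    sim_tx_handoff(&sc);
    if (completion_before_arm) {
        sim_tx_complete(&sc);    /* interrupt + taskqueue win the race */
        sim_arm_watchdog(&sc);   /* timer armed with nothing left to clear it */
    } else {
        sim_arm_watchdog(&sc);
        sim_tx_complete(&sc);    /* completion disarms the timer as intended */
    }

    for (int i = 0; i < seconds; i++) {
        if (sim_watchdog_tick(&sc))
            return true;
    }
    return false;
}
```

In this model sim_run(true, 5) reports a timeout even though no frame is outstanding, while sim_run(false, 5) does not, which matches the spurious "device timeout" seen 5 seconds after the last completion.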

I'm not (yet) sure whether it's occurring on both CPUs or on a single CPU with preemption going on.

Here's a snippet of the debug log, captured with sysctl dev.ath.0.debug=0x3023 and a couple of added statements that print a message whenever sc_wd_timer is set to a non-zero value:


[100080] ath0: ath_tx_dmasetup: m 0xc5d3a900 len 128
TODS 00:03:7f:07:44:78->f4:ec:38:9c:47:24(00:1b:b1:58:f6:f0) data QoS [TID 0] WEP [IV 05 00 00 00 00 00 KID 0] 6M
 8841 3c00 001b b158 f6f0 0003 7f07 4478 f4ec 389c 4724 4000 0000 4724 0500 0020 0000 0000 aaaa 0300 0000 0800 4500 0054 001a 0000 4001 ad94 0a0b 0147 c0a8 0101 0800 8bea 8908 0000 4f5d 95ad 0007 12f8 0809 0a0b 0c0d 0e0f 1011 1213 1415 1617 1819 1a1b 1c1d 1e1f 2021 2223 2425 2627 2829 2a2b 2c2d 2e2f 3031 3233 3435 3637
[100080] ath0: ath_tx_chaindesclist: 0: 00000000 0253a95c 613f008a 00008080 24348000 0004b50c
[100080] ath0: ath_tx_handoff_hw: TXDP[1] = 0x1f2a2b00 (0xd8a58b00) depth 1
[100022] ath0: ath_intr: status 0xc0
[100124] ath0: ath_tx_processq: tx queue 1 head 0x1f2a2b00 link 0xd8a58b00
Q1[  0] (DS.V:0xd8a58b00 DS.P:0x1f2a2b00) L:00000000 D:0253a95c F:0011 *
[100022] ath0: ath_intr: status 0x9
        TXF: 0011 Seq: 64 swtry: 0 ADDBAW?: 0 DOBAW?: 0
        613f008a 00008080 24348000 0004b50c c32a0001 0005e081
  [end]
[100022] ath0: ath_intr: status 0x9

... then, 5 seconds later:


[100022] ath0: ath_intr: status 0x9
ath0: device timeout
[100124] ath0: ath_reset: called

The number at the beginning of each line is the thread ID (printed via a patched device_printf()).

Fix: 

I'm honestly not sure at this point. :-)
Comment 1 Adrian Chadd freebsd_committer freebsd_triage 2012-03-12 07:42:32 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-wireless

Reassign
Comment 2 Eitan Adler freebsd_committer freebsd_triage 2018-05-28 19:42:27 UTC
batch change:

For bugs that match the following
-  Status Is In progress 
AND
- Untouched since 2018-01-01.
AND
- Affects Base System OR Documentation

DO:

Reset to open status.


Note:
I did a quick pass but if you are getting this email it might be worthwhile to double check to see if this bug ought to be closed.