Bug 163759

Summary: [ath] ath(4) "stops working" in hostap mode
Product: Base System
Reporter: Nathan Lay <nsl03>
Component: wireless
Assignee: freebsd-wireless (Nobody) <wireless>
Status: Open ---
Severity: Affects Only Me
Priority: Normal
Version: Unspecified
Hardware: Any
OS: Any

Description Nathan Lay 2012-01-02 01:00:23 UTC
At arbitrary times, ath(4) "stops working" while in hostap mode: the access point vanishes from the view of other wireless clients, and neither bringing the interface down and up nor destroying and recreating it fixes the problem. Tcpdump confirms that the access point really is no longer visible. Reloading the driver, however, can remedy the problem. The affected device is:

ath0: <Atheros 5416> mem 0xfe9f0000-0xfe9fffff irq 16 at device 0.0 on pci1
ath0: AR5418 mac 12.10 RF2133 phy 8.1

Here is how it is configured and used:
create_args_wlan0="wlanmode hostap -bgscan"
ifconfig_wlan0="channel 5:ht/40 ssid Lamp up"
autobridge_bridge0="wlan0 lan0"

It also sits behind pf.

The kernel is not compiled with ATH_ENABLE_11N.

It is also worth noting that the same configuration worked without problems on 8.x.

Other suspicious behavior:
athstats before the problem:
 bexmit bmiss
   4410     0
     10     0
     10     0
      9     0

athstats after the problem:
 bexmit bmiss
  52014     0
      5     0
      4     0
      5     0

dmesg frequently reports stuck-beacon resets, both before and after the problem:
ath0: stuck beacon; resetting (bmiss count 4)
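The stuck-beacon message reads like a consecutive-miss counter hitting a threshold. The sketch below is a simplified model, not the ath(4) source: `simulate_beacon_slots` and `BSTUCK_THRESHOLD` are hypothetical names, and the threshold of 4 is an assumption chosen only to match "bmiss count 4" in the log line above. At each beacon slot it checks whether the previous beacon is still pending and "resets" once the counter reaches the threshold.

```python
# Simplified, hypothetical model of a stuck-beacon detector.
# Assumption: at each beacon slot the driver checks whether the previous
# beacon is still pending; after BSTUCK_THRESHOLD consecutive pending slots
# it logs a message and resets. The value 4 only mirrors the log line.

BSTUCK_THRESHOLD = 4


def simulate_beacon_slots(pending_flags, threshold=BSTUCK_THRESHOLD):
    """Return the log messages produced for a sequence of beacon slots.

    pending_flags[i] is True if, at slot i, the previous beacon had not
    yet been transmitted (the beacon queue was still busy).
    """
    messages = []
    bmiss_count = 0
    for pending in pending_flags:
        if pending:
            bmiss_count += 1
            if bmiss_count >= threshold:
                messages.append(
                    f"ath0: stuck beacon; resetting (bmiss count {bmiss_count})"
                )
                bmiss_count = 0  # assumed: the reset clears the counter
        else:
            bmiss_count = 0  # a beacon went out; start counting again
    return messages


# Four consecutive pending slots produce one reset message.
print(simulate_beacon_slots([True] * 4))
```

In this model a single beacon that does go out resets the count, so only runs of consecutive misses trigger a reset; that is consistent with, but not proof of, how the driver behaves.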
 
Here is the output of athstats after the problem:
352222   data frames received
317354   data frames transmit
113      tx frames with an alternate rate
11861    long on-chip tx retries
755      tx failed 'cuz too many retries
691      stuck beacon conditions
1M       current transmit rate
5537     tx frames with no ack marked
309725   tx frames with short preamble
11773    rx failed 'cuz of bad CRC
3053     rx failed 'cuz of PHY err
    3053     CCK restart
52218    beacons transmitted
181      periodic calibrations
-0/+0    TDMA slot adjust (usecs, smoothed)
58       rssi of last ack
31       avg recv rssi
-96      rx noise floor
2092     tx frames through raw api
241      cabq frames transmitted
97       cabq xmit overflowed beacon interval
1        spur immunity level
54       ANI increased spur immunity
53       ANI decrease spur immunity
693      ANI enabled OFDM weak signal detect
693      ANI disabled CCK weak signal threshold
13947724 cumulative OFDM phy error count
13047675 cumulative CCK phy error count
902      ANI forced listen time to zero
11860    missing ACK's
21408    bad FCS
24       average rssi (beacons only)
Antenna profile:
[0] tx   316491 rx    23814
[1] tx        0 rx   328408

How-To-Repeat: There is no known way to reproduce the problem. However, the following kernel options:
options ATH_DEBUG
options AH_DEBUG
options ATH_DIAGAPI

seem to make the problem happen more frequently.
Comment 1 Mark Linimon freebsd_committer freebsd_triage 2012-01-02 05:18:43 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-wireless

Over to maintainer(s).
Comment 2 Adrian Chadd freebsd_committer freebsd_triage 2012-01-02 06:16:21 UTC
A little more digging has shown at least one source of these: software
retries are sneaking onto the list.

I.e.:

* force 11n aggregation on and push a whole bunch of traffic;
* enable debugging: sysctl dev.ath.1.debug=0x7c002000 (the SW TX
handling bits and the TX_PROC debugging);
* run ping -i 0.3 <ip> in one screen;
* scan in the other (ifconfig wlan1 scan);
* notice the tid_drain messages being logged.
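The debug mask used in the steps above can be checked by splitting it into individual bits. A minimal sketch; `set_bits` and `mask_from_bits` are hypothetical helpers, and the split between the TX_PROC bit and the SW TX group is inferred from the comment's description (one low bit plus a run of five high bits), not taken from the driver headers.

```python
# Decompose the dev.ath.N.debug value from the steps above into its bits.
DEBUG_MASK = 0x7C002000


def set_bits(mask):
    """Return the indices of the bits set in mask, lowest first."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


def mask_from_bits(bits):
    """Rebuild a mask from a list of bit indices."""
    m = 0
    for b in bits:
        m |= 1 << b
    return m


# Assumed grouping, based on the comment's wording:
LOW_PART = DEBUG_MASK & 0xFFFF    # 0x2000, presumably the TX_PROC bit
HIGH_PART = DEBUG_MASK & ~0xFFFF  # 0x7c000000, presumably the SW TX bits

print(set_bits(DEBUG_MASK))
print(hex(LOW_PART), hex(HIGH_PART))
```

Rebuilding a mask from a subset of the indices (for example `mask_from_bits([13])`) gives a value that enables only part of the logging, which can help when the full output is too noisy.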

What I've seen:

* a frame is queued via ath_start() or ath_raw_xmit();
* .. it makes it out to the hardware;
* ath_tx_processq() is called from the flush routine, with dosched=0;
* .. and the frame needs a retry, for reasons I haven't yet figured out.
Since aggregation is up, the frame is retried in software;
* .. so the frame is put back on the software queue, but the scheduler
isn't called for it, so it just sits on the software queue;
* .. then drain is called while that frame is still on the software queue.
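The sequence above can be modelled as a small state machine that only tracks where a frame sits. This is a hypothetical sketch, not the ath(4) code: `Tid`, `queue_frame`, `process_completions` and `drain` are illustrative stand-ins for the driver paths named in the comment, and retry placement and locking are ignored.

```python
from collections import deque
from dataclasses import dataclass, field


# Hypothetical per-TID state; the real driver tracks much more.
@dataclass
class Tid:
    sw_queue: deque = field(default_factory=deque)  # frames waiting in software
    hw_queue: deque = field(default_factory=deque)  # frames handed to hardware
    scheduled: bool = False                          # on the scheduler list?


def queue_frame(tid, frame):
    """Stand-in for ath_start()/ath_raw_xmit(): frame goes out to hardware."""
    tid.hw_queue.append(frame)


def process_completions(tid, needs_retry, dosched):
    """Stand-in for ath_tx_processq(). Frames needing a retry are put back
    on the software queue (aggregation is up, so retries are in software).
    The TID is only scheduled again when dosched is true."""
    while tid.hw_queue:
        frame = tid.hw_queue.popleft()
        if needs_retry(frame):
            tid.sw_queue.append(frame)
    if dosched and tid.sw_queue:
        tid.scheduled = True


def drain(tid):
    """Stand-in for the drain path: return frames left on the software queue."""
    leftover = list(tid.sw_queue)
    tid.sw_queue.clear()
    return leftover


# Flush path: completion processing with dosched=0, then drain.
tid = Tid()
queue_frame(tid, "frame-1")
process_completions(tid, needs_retry=lambda f: True, dosched=False)
stranded = drain(tid)
print(stranded)  # ['frame-1']
```

With `dosched=False` the retried frame is never scheduled, so it is still on the software queue when drain runs; with `dosched=True` the same TID would have been put back on the scheduler list first.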

So now, what should I do here? Hum.


adrian
Comment 3 Eitan Adler freebsd_committer freebsd_triage 2018-05-28 19:44:05 UTC
batch change:

For bugs that match the following
-  Status Is In progress 
AND
- Untouched since 2018-01-01.
AND
- Affects Base System OR Documentation

DO:

Reset to open status.


Note:
I did a quick pass but if you are getting this email it might be worthwhile to double check to see if this bug ought to be closed.