At seemingly random times, ath "stops working" while in hostap mode. The access point disappears from the view of wireless clients, and neither bringing the interface down and up nor destroying and recreating it fixes this. Tcpdump confirms that the access point really is no longer visible. Reloading the driver, however, does remedy the problem. The affected device is given below:
ath0: <Atheros 5416> mem 0xfe9f0000-0xfe9fffff irq 16 at device 0.0 on pci1
ath0: AR5418 mac 12.10 RF2133 phy 8.1
Here is how it is configured and used:
create_args_wlan0="wlanmode hostap -bgscan"
ifconfig_wlan0="channel 5:ht/40 ssid Lamp up"
It also sits behind pf.
The kernel is not compiled with ATH_ENABLE_11N.
It is also worth mentioning that the same configuration worked without problems in 8.x.
Other suspicious behavior:
athstats before the problem:
athstats after the problem:
dmesg frequently reports beacon misses before and after the problem:
ath0: stuck beacon; resetting (bmiss count 4)
Here is the output of athstats after the problem:
352222 data frames received
317354 data frames transmit
113 tx frames with an alternate rate
11861 long on-chip tx retries
755 tx failed 'cuz too many retries
691 stuck beacon conditions
1M current transmit rate
5537 tx frames with no ack marked
309725 tx frames with short preamble
11773 rx failed 'cuz of bad CRC
3053 rx failed 'cuz of PHY err
3053 CCK restart
52218 beacons transmitted
181 periodic calibrations
-0/+0 TDMA slot adjust (usecs, smoothed)
58 rssi of last ack
31 avg recv rssi
-96 rx noise floor
2092 tx frames through raw api
241 cabq frames transmitted
97 cabq xmit overflowed beacon interval
1 spur immunity level
54 ANI increased spur immunity
53 ANI decrease spur immunity
693 ANI enabled OFDM weak signal detect
693 ANI disabled CCK weak signal threshold
13947724 cumulative OFDM phy error count
13047675 cumulative CCK phy error count
902 ANI forced listen time to zero
11860 missing ACK's
21408 bad FCS
24 average rssi (beacons only)
 tx 316491 rx 23814
 tx 0 rx 328408
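To put the counters above in proportion, here is a minimal sketch that parses a few of the athstats lines and computes a rough stuck-beacon rate. The helper name `parse_athstats` and the `SAMPLE` excerpt are illustrative only (they are not part of athstats), and the ratio assumes both counters cover the same interval.

```python
def parse_athstats(text):
    """Map each description to its leading integer counter.

    Lines that do not start with a plain non-negative integer
    (for example "1M current transmit rate") are skipped.
    """
    stats = {}
    for line in text.splitlines():
        count, _, label = line.strip().partition(" ")
        if count.isdigit() and label:
            stats[label] = int(count)
    return stats


# Excerpt copied from the athstats output in this report.
SAMPLE = """\
691 stuck beacon conditions
52218 beacons transmitted
755 tx failed 'cuz too many retries
317354 data frames transmit
1M current transmit rate
"""

stats = parse_athstats(SAMPLE)

# Rough rate: stuck beacon conditions per beacon actually transmitted.
stuck_ratio = stats["stuck beacon conditions"] / stats["beacons transmitted"]
print(f"stuck beacons per beacon sent: {stuck_ratio:.2%}")
```

Running the same helper over an athstats capture taken before the problem would allow a before/after comparison of the same ratio.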
How-To-Repeat: No known way to repeat the problem. However, the following:
seem to make the problem happen more frequently.
Over to maintainer(s).
A little more digging has shown at least one source of these: software
retries are sneaking onto the list.
* force 11n aggregation up - do a whole bunch of traffic;
* enable debugging - sysctl dev.ath.1.debug=0x7c002000 - that's the
SW TX handling bits and the TX_PROC debugging;
* ping -i 0.3 <ip> in one screen
* scan in the other (ifconfig wlan1 scan)
* notice the tid_drain things being logged.
What I've seen:
* frame is queued via ath_start() or ath_raw_xmit()
* .. it makes it out to the hardware
* ath_tx_processq() is called in the flush routine, with dosched=0
* .. and it requires a retry, for reasons I haven't yet figured out.
Since aggregation is up, the frame is retried in software.
* .. so the frame is put back on the software queue, but sched isn't
called for it, so it just sits there.
* .. then drain is called, with a software-queued frame in the queue.
So now, what should I do here? Hum.
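The sequence described above can be sketched as a toy model. This is not the driver code: `ToyTxPath`, its methods, and `run` are hypothetical names made up for illustration, and only the `dosched` flag mirrors the description. It shows how a frame retried in software ends up stranded on the software queue when completion processing runs without scheduling.

```python
from collections import deque


class ToyTxPath:
    """Toy model of a hardware queue plus a software retry queue."""

    def __init__(self):
        self.hwq = deque()   # frames handed to the "hardware"
        self.swq = deque()   # frames waiting in software for a retry
        self.completed = []  # frames that finished successfully

    def start(self, frame):
        # The frame is queued and makes it straight out to the hardware.
        self.hwq.append(frame)

    def sched(self):
        # Push software-queued frames back out to the hardware.
        while self.swq:
            self.hwq.append(self.swq.popleft())

    def processq(self, needs_retry, dosched):
        # Handle completions; frames needing a retry go back to software.
        while self.hwq:
            frame = self.hwq.popleft()
            if needs_retry(frame):
                self.swq.append(frame)
            else:
                self.completed.append(frame)
        # Without dosched, nothing reschedules the retried frames.
        if dosched:
            self.sched()

    def drain(self):
        # Return whatever is still sitting on the software queue.
        leftover = list(self.swq)
        self.swq.clear()
        return leftover


def run(dosched):
    tx = ToyTxPath()
    tx.start("frame-1")
    tx.processq(needs_retry=lambda frame: True, dosched=dosched)
    return tx.drain()


print("dosched=0 leaves on swq:", run(False))
print("dosched=1 leaves on swq:", run(True))
```

In this model the only difference between the two runs is whether `sched()` is called after completion processing, which matches the observation that drain finds a software-queued frame only in the `dosched=0` flush path.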
For bugs that match the following
- Status Is In progress
- Untouched since 2018-01-01.
- Affects Base System OR Documentation
Reset to open status.
I did a quick pass, but if you are getting this email it might be worthwhile to double-check whether this bug ought to be closed.