Created attachment 165714
- no new devices or drivers were being attached when the error occurred
- system had been running stably for more than a day before the crash
- job running at the time (or shortly before): cross-compiling a kernel for arm
- no user activity at the time of the crash
From the core.txt, the crash occurred due to a divide by zero in the ath(4) driver. Specifically, this line in ar9300_ani.c:
ani_state->ofdm_phy_err_count * 1000 / ani_state->listen_time;
This means 'listen_time' must be zero.
Some other places in the debugging code handle the listen_time == 0 case explicitly, e.g.:
/* express ofdm_phy_err_count as errors/second */
log_data.ofdm_phy_err_count = ani_state->listen_time ?
ani_state->ofdm_phy_err_count * 1000 / ani_state->listen_time : 0;
/* express cck_phy_err_count as errors/second */
log_data.cck_phy_err_count = ani_state->listen_time ?
ani_state->cck_phy_err_count * 1000 / ani_state->listen_time : 0;
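As a rough sketch only, the same guard could be factored into a small helper so that every errors-per-second calculation shares it. The struct and function names below are hypothetical stand-ins, and the field types are an assumption rather than a copy of the driver's definitions:

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the relevant ANI state fields.
 * The types are assumed for illustration, not taken from ar9300.
 */
struct ani_state_sketch {
	uint32_t ofdm_phy_err_count;
	uint32_t cck_phy_err_count;
	int32_t  listen_time;		/* accumulated listen time, in ms */
};

/*
 * Express an error count as errors/second.
 * Returns 0 when no (or a non-positive) listen time has accumulated,
 * so the division can never see a zero divisor.
 */
static uint64_t
errs_per_second(uint32_t err_count, int32_t listen_time)
{
	if (listen_time <= 0)
		return (0);
	/* Widen before multiplying so err_count * 1000 cannot wrap. */
	return (((uint64_t)err_count * 1000) / (uint64_t)listen_time);
}
```

With a helper like this, the unguarded expression would become something like errs_per_second(ani_state->ofdm_phy_err_count, ani_state->listen_time).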
There is also this comment where listen_time is updated:
/* XXX beware of overflow? */
ani_state->listen_time += listen_time;
I suspect you were bitten by the overflow wrapping to zero. I've added Adrian who might have a suggestion on how best to handle the overflow to zero. The code is the same in HEAD so I suspect this is busted there as well.
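One possible way to address the "XXX beware of overflow?" note, offered purely as a sketch and not as the actual fix: accumulate with a saturating add, so the running total sticks at its maximum instead of wrapping. The helper name is hypothetical, and treating listen_time as a signed 32-bit value is an assumption:

```c
#include <stdint.h>

/*
 * Hypothetical helper: add delta to total without ever wrapping.
 * Assumes listen_time is a signed 32-bit millisecond counter.
 * Signed overflow is undefined behaviour in C, so the check is done
 * before the addition rather than after it.
 */
static int32_t
listen_time_add_saturating(int32_t total, int32_t delta)
{
	if (delta <= 0)
		return (total);		/* ignore bogus or empty intervals */
	if (total > INT32_MAX - delta)
		return (INT32_MAX);	/* clamp instead of wrapping */
	return (total + delta);
}
```

The update would then read something like ani_state->listen_time = listen_time_add_saturating(ani_state->listen_time, listen_time). A saturated total skews the computed rates, so resetting the counters when the limit is reached may be preferable; either way the divisor should still be checked for zero before dividing.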
(In reply to John Baldwin from comment #1)
Thanks, John, for the analysis of the symptoms. The explanation sounds reasonable.