Bug 206334 - [ath] panic integer divide fault
Summary: [ath] panic integer divide fault
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 10.2-RELEASE
Hardware: Any Any
Importance: --- Affects Some People
Assignee: freebsd-wireless (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-01-17 14:27 UTC by Kris
Modified: 2016-08-08 08:14 UTC
CC List: 2 users

See Also:


Attachments
core.txt (133.77 KB, text/plain)
2016-01-17 14:27 UTC, Kris

Description Kris 2016-01-17 14:27:10 UTC
Created attachment 165714
core.txt

- no new devices or drivers were being attached when the error occurred
- the system had been running stably for more than a day before the crash
- job running at the time of the crash (or shortly before): cross-compiling a kernel for arm
- no user activity at the time of the crash
Comment 1 John Baldwin freebsd_committer freebsd_triage 2016-01-22 17:33:03 UTC
From the core.txt, the crash occurred due to a divide by zero in the ath(4) driver.  Specifically, this line in ar9300_ani.c:

    ofdm_phy_err_rate =
        ani_state->ofdm_phy_err_count * 1000 / ani_state->listen_time;

This means 'listen_time' must be zero.

Some other places in the debugging code handle the listen_time == 0 case explicitly, e.g.:

        /* express ofdm_phy_err_count as errors/second */
        log_data.ofdm_phy_err_count = ani_state->listen_time ?
            ani_state->ofdm_phy_err_count * 1000 / ani_state->listen_time : 0;
        /* express cck_phy_err_count as errors/second */
        log_data.cck_phy_err_count =  ani_state->listen_time ?
            ani_state->cck_phy_err_count * 1000 / ani_state->listen_time  : 0;
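
A minimal sketch of the same guard applied to the faulting computation. The struct is a hypothetical stand-in holding only the two fields involved (the field types are assumptions; the real ANI state has more members), and `ofdm_err_rate` is an illustrative helper name, not a function in the driver:

```c
#include <stdint.h>

/* Hypothetical stand-in for the driver's ANI state; field types are assumed. */
struct ani_state_sketch {
    uint32_t ofdm_phy_err_count;
    uint32_t listen_time;
};

/*
 * Mirrors the expression from ar9300_ani.c quoted above, but returns 0
 * instead of dividing when no listen time has been accumulated.
 */
static uint32_t
ofdm_err_rate(const struct ani_state_sketch *s)
{
    if (s->listen_time == 0)
        return 0;
    return s->ofdm_phy_err_count * 1000 / s->listen_time;
}
```

Returning 0 matches what the debugging code does for the same case. Whether 0 is the right rate to feed into the ANI decision logic is a separate question this sketch does not answer.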


There is this comment here where listen_time is updated:

    /* XXX beware of overflow? */
    ani_state->listen_time += listen_time;

I suspect you were bitten by the overflow wrapping to zero: if the accumulated listen_time wraps around to exactly zero, the division above faults.  I've added Adrian, who might have a suggestion on how best to handle the overflow to zero.  The code is the same in HEAD, so I suspect this is busted there as well.
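
To illustrate the suspected failure mode, here is a small sketch that assumes the accumulator behaves like an unsigned 32-bit counter (the field's actual type is not shown in this report). `wrap_add_u32` shows plain addition wrapping to zero, and `sat_add_u32` shows one possible alternative that clamps at the maximum instead. Both helper names are hypothetical and neither is taken from the driver:

```c
#include <stdint.h>

/* Plain addition: unsigned operands wrap modulo 2^32, so the sum can be 0. */
static uint32_t
wrap_add_u32(uint32_t total, uint32_t delta)
{
    return total + delta;
}

/*
 * Saturating addition: clamp at UINT32_MAX instead of wrapping, so a
 * nonzero accumulator can never become zero through overflow.
 */
static uint32_t
sat_add_u32(uint32_t total, uint32_t delta)
{
    if (delta > UINT32_MAX - total)
        return UINT32_MAX;
    return total + delta;
}
```

Saturation is only one option. Resetting the accumulated counters before they can overflow, or guarding the division itself, would also avoid the fault.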
Comment 2 Kris 2016-01-25 15:23:11 UTC
(In reply to John Baldwin from comment #1)
Thanks, John, for the analysis of the symptoms. The explanation sounds reasonable.