Bug 203887

Summary: Integer divide panic
Product: Base System
Reporter: ml
Component: wireless
Assignee: freebsd-wireless (Nobody) <wireless>
Status: Open ---    
Severity: Affects Some People
CC: adrian
Priority: ---    
Version: 10.1-RELEASE   
Hardware: Any   
OS: Any   

Description ml 2015-10-20 10:19:24 UTC
Hello.

My laptop just had a panic: I was not there at the time, so I just found out it had rebooted.

kgdb says:

> Unread portion of the kernel message buffer:
> 
> 
> Fatal trap 18: integer divide fault while in kernel mode
> cpuid = 0; apic id = 00
> instruction pointer	= 0x20:0xffffffff803788bd
> stack pointer	        = 0x28:0xfffffe0110e988e0
> frame pointer	        = 0x28:0xfffffe0110e98930
> code segment		= base rx0, limit 0xfffff, type 0x1b
> 			= DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags	= interrupt enabled, resume, IOPL = 0
> current process		= 11 (swi4: clock)
> trap number		= 18
> panic: integer divide fault
> cpuid = 0
> KDB: stack backtrace:
> #0 0xffffffff8056ef60 at kdb_backtrace+0x60
> #1 0xffffffff80537685 at panic+0x155
> #2 0xffffffff807cc2bf at trap_fatal+0x38f
> #3 0xffffffff807cbf1c at trap+0x75c
> #4 0xffffffff807b1df2 at calltrap+0x8
> #5 0xffffffff803824e8 at ar9300_ani_poll_freebsd+0x48
> #6 0xffffffff80330676 at ath_calibrate+0xf6
> #7 0xffffffff8054c747 at softclock_call_cc+0x177
> #8 0xffffffff8054cb84 at softclock+0x94
> #9 0xffffffff8050b7eb at intr_event_execute_handlers+0xab
> #10 0xffffffff8050bc36 at ithread_loop+0x96
> #11 0xffffffff8050940a at fork_exit+0x9a
> #12 0xffffffff807b232e at fork_trampoline+0xe
> Uptime: 3h0m33s

I guess the problem is in /usr/src/sys/contrib/dev/ath/ath_hal/ar9300/ar9300_ani.c:1180:
1175	     */
1176	    if (!DO_ANI(ah)) {
1177	        return;
1178	    }
1179	
1180	    ofdm_phy_err_rate =
1181	        ani_state->ofdm_phy_err_count * 1000 / ani_state->listen_time;
1182	    cck_phy_err_rate =
1183	        ani_state->cck_phy_err_count * 1000 / ani_state->listen_time;

Probably ani_state->listen_time is zero, but kgdb won't let me check.



This is 10.1/amd64 with 

ath0@pci0:3:0:0:	class=0x028000 card=0xe052105b chip=0x0034168c rev=0x01 hdr=0x00
    vendor     = 'Atheros Communications Inc.'
    device     = 'AR9462 Wireless Network Adapter'
    class      = network


I hope this is enough info to track this down; if not, I'm willing to provide more.



This crash is quite rare; my laptop usually works fine.
Comment 1 Adrian Chadd freebsd_committer freebsd_triage 2015-10-20 18:24:45 UTC
I'm guessing it's:

    /* XXX beware of overflow? */
    ani_state->listen_time += listen_time;

in that function.

Just add this underneath:

if (ani_state->listen_time == 0) {
        /* restart ANI period if listen_time is invalid */
        HALDEBUG(ah, HAL_DEBUG_ANI,
            "%s: listen_time=%d - calling ar9300_ani_restart\n",
            __func__, listen_time);
        ar9300_ani_restart(ah);
        return;
}

I bet that fixes it!
Comment 2 ml 2015-10-23 14:47:15 UTC
I rebuilt my kernel with this patch and had no error.

I'll report back if I see this panic again, although it was very rare in any case (so no panic doesn't mean the problem is fixed).
Comment 3 ml 2017-01-11 07:50:04 UTC
Hello.

As I said, I added the proposed patch a long time ago and had no such panics for more than a year.

However, my laptop had another one yesterday.



ani_state->listen_time is in fact 0!

This is strange, since I don't think anything should change it between 
    ani_state->listen_time += listen_time;
and
    ofdm_phy_err_rate =
        ani_state->ofdm_phy_err_count * 1000 / ani_state->listen_time;

Is it possible that something changed this in another thread? (Please forgive my ignorance of the kernel if it's something that stupid.)

In any case, I'm enabling AH_DEBUG in the kernel.



P.S. In the meantime I upgraded to 10.3.
Comment 4 Adrian Chadd freebsd_committer freebsd_triage 2017-01-11 09:09:23 UTC
Hi! I thought I fixed this in freebsd-head! Have you tried 11.0-RELEASE?


-a
Comment 5 Adrian Chadd freebsd_committer freebsd_triage 2017-01-11 09:10:04 UTC
and yeah, I thought there was locking protecting the ANI path. I'll go check.

(The locking would be done by the driver, not the HAL.)
Comment 6 ml 2017-01-11 13:41:36 UTC
I'm not prepared to switch to 11.0 now (for reasons I won't bore you with).
I might consider it in the short term if it really fixes the problem but, if possible, I'd rather patch the driver.
Would that be possible?
Comment 7 Eitan Adler freebsd_committer freebsd_triage 2018-05-28 19:48:23 UTC
batch change:

For bugs that match the following
-  Status Is In progress 
AND
- Untouched since 2018-01-01.
AND
- Affects Base System OR Documentation

DO:

Reset to open status.


Note:
I did a quick pass but if you are getting this email it might be worthwhile to double check to see if this bug ought to be closed.
Comment 8 ml 2018-06-11 08:46:27 UTC
Hello.

While I've upgraded to 11.1 in the meantime, I'm still running with the patch posted above. It reduces the panics by a large factor, but I still experience one from time to time (read: one every several months).