Hello.
My laptop just had a panic. I was not there at the time, so I only found out afterwards that it had rebooted.

kgdb says:

> Unread portion of the kernel message buffer:
>
>
> Fatal trap 18: integer divide fault while in kernel mode
> cpuid = 0; apic id = 00
> instruction pointer = 0x20:0xffffffff803788bd
> stack pointer = 0x28:0xfffffe0110e988e0
> frame pointer = 0x28:0xfffffe0110e98930
> code segment = base rx0, limit 0xfffff, type 0x1b
> = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = 11 (swi4: clock)
> trap number = 18
> panic: integer divide fault
> cpuid = 0
> KDB: stack backtrace:
> #0 0xffffffff8056ef60 at kdb_backtrace+0x60
> #1 0xffffffff80537685 at panic+0x155
> #2 0xffffffff807cc2bf at trap_fatal+0x38f
> #3 0xffffffff807cbf1c at trap+0x75c
> #4 0xffffffff807b1df2 at calltrap+0x8
> #5 0xffffffff803824e8 at ar9300_ani_poll_freebsd+0x48
> #6 0xffffffff80330676 at ath_calibrate+0xf6
> #7 0xffffffff8054c747 at softclock_call_cc+0x177
> #8 0xffffffff8054cb84 at softclock+0x94
> #9 0xffffffff8050b7eb at intr_event_execute_handlers+0xab
> #10 0xffffffff8050bc36 at ithread_loop+0x96
> #11 0xffffffff8050940a at fork_exit+0x9a
> #12 0xffffffff807b232e at fork_trampoline+0xe
> Uptime: 3h0m33s

I guess the problem is in /usr/src/sys/contrib/dev/ath/ath_hal/ar9300/ar9300_ani.c:1180:

1175      */
1176     if (!DO_ANI(ah)) {
1177         return;
1178     }
1179
1180     ofdm_phy_err_rate =
1181         ani_state->ofdm_phy_err_count * 1000 / ani_state->listen_time;
1182     cck_phy_err_rate =
1183         ani_state->cck_phy_err_count * 1000 / ani_state->listen_time;

Probably ani_state->listen_time is zero, but kgdb won't let me check.

This is 10.1/amd64 with

ath0@pci0:3:0:0: class=0x028000 card=0xe052105b chip=0x0034168c rev=0x01 hdr=0x00
    vendor = 'Atheros Communications Inc.'
    device = 'AR9462 Wireless Network Adapter'
    class = network

I hope this is the info needed to track this down; if not, I'm willing to provide more.
This crash is quite rare, since my laptop usually works fine.
I'm guessing it's:

    /* XXX beware of overflow? */
    ani_state->listen_time += listen_time;

in that function. Just add this underneath:

    if (ani_state->listen_time == 0) {
        /* restart ANI period if listen_time is invalid */
        HALDEBUG(ah, HAL_DEBUG_ANI,
            "%s: listen_time=%d - calling ar9300_ani_restart\n",
            __func__, listen_time);
        ar9300_ani_restart(ah);
        return;
    }

i bet that fixes it!
I rebuilt my kernel with this patch and have had no errors so far. I'll report back if I see this panic again, although it was very rare in any case (so the absence of a panic doesn't mean the problem is fixed).
Hello.
As I said, I added the proposed patch a long time ago and had no such panics for more than a year. However, my laptop had another one yesterday.

ani_state->listen_time is in fact 0!

This is strange, since I don't think anything should change it between

    ani_state->listen_time += listen_time;

and

    ofdm_phy_err_rate =
        ani_state->ofdm_phy_err_count * 1000 / ani_state->listen_time;

Is it possible something changed this in another thread? (Please forgive my ignorance of the kernel if it's something that stupid.)

In any case, I'm enabling AH_DEBUG in the kernel.

P.S. In the meantime I upgraded to 10.3.
hi! i thought i fixed this in freebsd-head! have you tried 11.0-RELEASE? -a
and yeah, I thought there was locking protecting the ANI path. I'll go check. (The locking would be done by the driver, not the HAL.)
I'm not prepared to switch to 11.0 now (for reasons I won't bore you with). I might consider it in the short term if it really fixes the problem, but, if possible, I'd rather patch the driver. Would that be possible?
batch change:

For bugs that match the following
- Status Is In progress
AND
- Untouched since 2018-01-01.
AND
- Affects Base System OR Documentation

DO:

Reset to open status.

Note:
I did a quick pass but if you are getting this email it might be worthwhile to double check to see if this bug ought to be closed.
Hello.
While I've upgraded to 11.1 in the meantime, I'm still running with the patch posted above. It reduces the panics by a big factor, but I still experience one from time to time (read: one every several months).
^Triage: I'm sorry that this PR did not get addressed in a timely fashion. By now, the version that it was created against is long out of support. Please re-open if it is still a problem on a supported version.
(In reply to Mark Linimon from comment #9)
I'm not so sure... I'm still running with the patch above on 14.2. However, the aforementioned laptop is now my "backup" one, so I very seldom power it up, and consequently I haven't seen a panic in years.
I won't reopen for now.
Thanks anyway.