Bug 59876

Summary: APM suspend/resume broken with FreeBSD 5.2-BETA on IBM Thinkpad A30p
Product: Base System
Reporter: Jesse D.Guardiani <jesse>
Component: misc
Assignee: freebsd-bugs (Nobody) <bugs>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: Unspecified   
Hardware: Any   
OS: Any   

Description Jesse D.Guardiani 2003-12-01 18:40:16 UTC
	
        I enable the software watchdog, then suspend the machine and capture
        the output on the serial console:

Lock GEOM topology not exclusively locked @ ../../../geom/geom_subr.c:261
acd0: WARNING - removed from configuration
sio4: detached

       The machine is fully suspended at this point. Now I resume the machine:

pci_cfgintr: 0:29 INTA BIOS irq 9
pci_cfgintr: 0:29 INTB BIOS irq 11
pci_cfgintr: 0:29 INTC BIOS irq 9
pci_cfgintr: 0:31 INTB BIOS irq 5
pci_cfgintr: 0:31 INTB BIOS irq 5
pci_cfgintr: 0:31 INTB BIOS irq 5
pci_cfgintr: 1:0 INTA BIOS irq 9
pci_cfgintr: 2:0 INTA BIOS irq 9
pci_cfgintr: 2:0 INTB BIOS irq 5
pci_cfgintr: 2:0 INTC BIOS irq 9
pci_cfgintr: 2:2 INTA BIOS irq 9
pci_cfgintr: 2:8 INTA BIOS irq 10
ata0: resetting devices ..

       Normally the machine would lock up at this point with the hard disk light on,
       and the only way to recover is to power it off and back on. However, since I
       have enabled the software watchdog, I simply wait a few seconds and receive
       this:

interrupt                   total
irq0: clk                          18254
irq1: atkbd0                           9
irq3: sio1                             6
irq4: sio0                           882
irq6: fdc0                             1
irq9: cbb0 wi0++                      37
irq13: npx0                            1
irq14: ata0                         3524
irq15: ata1                           32
Total                       22746
watchdog_fire(c073ba80,2,c06d9616,f5,d2a0bca4) at watchdog_fire+0xb5
hardclock(d2a0bca4,0,c06f57a0,bf,c3a6bd00) at hardclock+0x10a
clkintr(d2a0bca4,d2a0bc70,c0526125,c07134e0,0) at clkintr+0xa9
intr_execute_handlers(c072c880,d2a0bca4,c07134e0,2bc530c4,c19d2c5c) at intr_execute_handlers+0xb8
atpic_handle_intr(0) at atpic_handle_intr+0xbf
Xatpic_intr0() at Xatpic_intr0+0x1e
--- interrupt, eip = 0xc06925b5, esp = 0xd2a0bce8, ebp = 0xd2a0bce8 ---
cpu_idle_default(d2a0bd10,c050b55c,c0739bc0,2,c06da083) at cpu_idle_default+0x5
cpu_idle(c0739bc0,2,c06da083,53,c050b520) at cpu_idle+0x1f
idle_proc(0,d2a0bd48,c06d9f44,311,0) at idle_proc+0x3c
fork_exit(c050b520,0,d2a0bd48) at fork_exit+0xb4
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xd2a0bd7c, ebp = 0 ---
Debugger("watchdog timeout")
Stopped at      Debugger+0x54:  xchgl   %ebx,in_Debugger.0
db>

       I hope the above information is helpful. I'm not a kernel programmer. I AM
       familiar with C, but I'm not very good with debuggers. Please let me know
       if you need additional info. Thanks!

Fix: 

unknown.
How-To-Repeat: 	
        Step 1: Fn + F4 to suspend machine
        Step 2: Fn to resume machine
Comment 1 Jesse D.Guardiani 2003-12-09 14:55:59 UTC
Had a chance to repeat this procedure last night
and panic the kernel at the DDB prompt. Here's
the backtrace from my core dump:

#0  doadump () at ../../../kern/kern_shutdown.c:240
#1  0xc051fd2c in boot (howto=260) at ../../../kern/kern_shutdown.c:372
#2  0xc05200b7 in panic () at ../../../kern/kern_shutdown.c:550
#3  0xc0527dc5 in mi_switch () at ../../../kern/kern_synch.c:470
#4  0xc051fa18 in boot (howto=256) at ../../../kern/kern_shutdown.c:312
#5  0xc05200b7 in panic () at ../../../kern/kern_shutdown.c:550
#6  0xc044d392 in db_panic () at ../../../ddb/db_command.c:450
#7  0xc044d2f2 in db_command (last_cmdp=0xc07305a0, cmd_table=0x0, aux_cmd_tablep=0xc06fa26c, aux_cmd_tablep_end=0xc06fa270)
    at ../../../ddb/db_command.c:346
#8  0xc044d435 in db_command_loop () at ../../../ddb/db_command.c:472
#9  0xc0450435 in db_trap (type=3, code=0) at ../../../ddb/db_trap.c:73
#10 0xc068a61c in kdb_trap (type=3, code=0, regs=0xd2a0bb9c) at ../../../i386/i386/db_interface.c:171
#11 0xc069a7c8 in trap (frame=
      {tf_fs = 24, tf_es = -761266160, tf_ds = 16, tf_edi = 0, tf_esi = 14591, tf_ebp = -761218072, tf_isp = -761218104, tf_ebx = 0, tf_edx = 0, tf_ecx = 32, tf_eax = 29, tf_trapno = 3, tf_err = 0, tf_eip = -1066882860, tf_cs = 8, tf_eflags = 134, tf_esp = -1066453058, tf_ss = -1066559792})
    at ../../../i386/i386/trap.c:580
#12 0xc068c018 in calltrap () at {standard input}:94
#13 0xc04fc051 in watchdog_fire () at ../../../kern/kern_clock.c:557
#14 0xc04fba6a in hardclock (frame=0xd2a0bca4) at ../../../kern/kern_clock.c:257
#15 0xc069d8a9 in clkintr (frame=0xd2a0bca4) at ../../../i386/isa/clock.c:193
#16 0xc0690178 in intr_execute_handlers (isrc=0xc072c880, iframe=0xd2a0bca4) at ../../../i386/i386/intr_machdep.c:192
#17 0xc069d3bf in atpic_handle_intr (iframe=
      {if_vec = 0, if_fs = -1066598376, if_es = 16, if_ds = 1687683088, if_edi = 0, if_esi = -1046550976, if_ebp = -761217816, if_ebx = -1046553508, if_edx = -1066156608, if_ecx = 2, if_eax = 0, if_eip = -1066850891, if_cs = 8, if_eflags = 582, if_esp = -761217808, if_ss = -1066850849})
    at ../../../i386/isa/atpic.c:368
#18 0xc069d51e in Xatpic_intr0 () at {standard input}:32
#19 0xc06925df in cpu_idle () at ../../../i386/i386/machdep.c:1074
#20 0xc050b55c in idle_proc (dummy=0x0) at ../../../kern/kern_idle.c:86
#21 0xc050b294 in fork_exit (callout=0xc050b520 <idle_proc>, arg=0x0, frame=0x0) at ../../../kern/kern_fork.c:793


I can't make heads or tails of the above, so I'm open to suggestions.
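
One detail that makes the gdb output easier to line up with the DDB trace: gdb prints the trap-frame fields as signed decimals, while DDB prints addresses in hex. The following is a minimal sketch of the conversion, assuming 32-bit i386 register values; the helper names `frame_value_to_addr` and `print_frame_addr` are made up for illustration and are not part of gdb or the kernel.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: reinterpret a signed 32-bit value, as gdb shows it
 * in the trap frame, as the unsigned address DDB would print.
 * Converting int32_t to uint32_t wraps modulo 2^32, which is exactly the
 * reinterpretation wanted here.
 */
static uint32_t frame_value_to_addr(int32_t value)
{
    return (uint32_t)value;
}

/* Hypothetical helper: print one frame field in DDB-style hex. */
static void print_frame_addr(const char *name, int32_t value)
{
    assert(name != NULL);
    printf("%s = 0x%08x\n", name, (unsigned int)frame_value_to_addr(value));
}
```

Calling `print_frame_addr("if_eip", -1066850891)` prints `if_eip = 0xc06925b5`, which is the same `eip` that the DDB trace reports for the interrupted `cpu_idle_default+0x5`. The `tf_eip` value in frame #11, -1066882860, works out to 0xc068a8d4 the same way.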

-- 
Jesse Guardiani, Systems Administrator
WingNET Internet Services,
P.O. Box 2605 // Cleveland, TN 37320-2605
423-559-LINK (v)  423-559-5145 (f)
http://www.wingnet.net
Comment 2 Nate Lawson 2003-12-09 20:08:09 UTC
The software watchdog trap is working as expected.  It correctly generates
a trap once the kernel is no longer poking it because interrupts are being
lost.

The real problem is that interrupts are being lost after resume.  It
appears that the ata controller is not resetting properly.  I am now
experiencing this problem also.  However, this behavior appeared before
the 1203 import and appears to be a regression.  I had the same problem
back in the summer but it was fixed by a commit by sos@.  It worked for a
few months and is back to the original behavior (hanging with the drive
light on upon resume).  Perhaps he can look into this?

-Nate
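
For readers unfamiliar with the mechanism described above, here is a simplified model of a software watchdog. It is only a sketch under stated assumptions and is not the code in kern_clock.c: the struct and the `wd_init`, `wd_pat`, and `wd_tick` names are hypothetical. The idea is that every clock tick decrements a counter, healthy code periodically resets ("pats") it, and the watchdog fires when the counter runs out. This matches the trace in the description, where `watchdog_fire` is called from `hardclock`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a software watchdog; not the FreeBSD implementation. */
struct sw_watchdog {
    int timeout_ticks;  /* clock ticks allowed between two pats */
    int remaining;      /* ticks left before the watchdog fires */
    bool fired;         /* set once the timeout has expired */
};

static void wd_init(struct sw_watchdog *wd, int timeout_ticks)
{
    assert(timeout_ticks > 0);
    wd->timeout_ticks = timeout_ticks;
    wd->remaining = timeout_ticks;
    wd->fired = false;
}

/* Called by code that is still making progress: restart the countdown. */
static void wd_pat(struct sw_watchdog *wd)
{
    wd->remaining = wd->timeout_ticks;
}

/* Called on every clock tick; returns true once the watchdog has fired. */
static bool wd_tick(struct sw_watchdog *wd)
{
    if (wd->fired)
        return true;
    if (--wd->remaining <= 0)
        wd->fired = true;   /* a real kernel would enter the debugger here */
    return wd->fired;
}
```

In this model, if the code that calls `wd_pat` stalls while `wd_tick` keeps running, the watchdog fires after `timeout_ticks` ticks. A firing watchdog therefore indicates that something else has stopped making progress, not that the watchdog itself is at fault.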
Comment 3 Jesse D.Guardiani 2004-01-19 03:30:46 UTC
I'd like to note that I've been using `boot -vD` as a temporary
workaround to this problem for more than a month now.

This is a reliable workaround. The machine still crashes on
resume from suspend about once a week (I suspend at least twice
per day), but this is exactly the same behavior I got with
5.1-RELEASE.

I've recommended it to a number of people who have emailed
me privately and it works for them too.

Comment 4 Serge Semenenko 2004-03-17 18:44:21 UTC
On my Thinkpad T20 I use the following patch against 5.2-RELEASE:

--- ata-all.c.saved     Wed Mar 17 00:39:08 2004
+++ ata-all.c   Wed Mar 17 17:22:07 2004
@@ -238,11 +238,20 @@

     /* reset the HW */
     ata_printf(ch, -1, "resetting devices ..\n");
-    ATA_FORCELOCK_CH(ch, ATA_CONTROL);
-    ch->running = NULL;
     devices = ch->devices;
+    /* initialize the softc basics */
+    ata_generic_hw(ch);
+    ch->device[MASTER].channel = ch;
+    ch->device[MASTER].unit = ATA_MASTER;
+    ch->device[MASTER].mode = ATA_PIO;
+    ch->device[SLAVE].channel = ch;
+    ch->device[SLAVE].unit = ATA_SLAVE;
+    ch->device[SLAVE].mode = ATA_PIO;
+    ch->state = ATA_IDLE;
+    /* initialise device(s) on this channel */
+    ch->locking(ch, ATA_LF_LOCK);
     ch->hw.reset(ch);
-    ATA_UNLOCK_CH(ch);
+    ch->locking(ch, ATA_LF_UNLOCK);

     /* detach what left the channel during reset */
     if ((misdev = devices & ~ch->devices)) {


Serge
Comment 5 Jesse D.Guardiani 2004-03-17 19:38:47 UTC
I've been running 5.2.1-RELEASE for about a week now. I ran 5.2.1-RC2
for a few weeks before this.

Both versions:

a.) Do NOT exhibit the "Crash every time on resume unless booted with
    `boot -vD`" behavior that I experienced under 5.2-RELEASE.

b.) DO crash every fourth or fifth resume. This is a regression from 5.1-RELEASE
    under which my machine used to crash every 8 or 10 resumes. It's also technically
    a regression from the reliability I got with `boot -vD` under 5.2-RELEASE, but at
    least 5.2.1-RELEASE boots without that kind of hackery.

I suppose this ticket can be closed now. I'm still having problems, but they're much
different in nature from the original reason this ticket was created.

Comment 6 Mark Linimon 2004-03-29 18:04:24 UTC
State Changed
From-To: open->closed

Closed at submitter's request in misfiled followup misc/64318.