Bug 17592

Summary: ata READ/WRITE command timeouts
Product: Base System
Reporter: dforste <dforste>
Component: kern
Assignee: Søren Schmidt <sos>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: 4.0-RELEASE   
Hardware: Any   
OS: Any   

Description dforste 2000-03-25 06:20:01 UTC
Periodically the ata driver will report:
ad0: READ command timeout - resetting
ata0: resetting devices .. done

This happens pretty consistently after the hard drive has been sitting
idle for a bit (~20 sec?) and then receives an I/O request (a similar
message is reported for WRITE requests). There's a short pause (4-5 sec)
and then everything works fine.

I'm running FreeBSD 4.0-RELEASE (installed from original ISO) on my Sony
PCG-748 laptop.  
The kernel reports:
atapci0: <Intel PIIX4 ATA33 controller> port 0xfcd0-0xfcdf at device 7.1 on pci0
...
ad0: 3909MB <FUJITSU MHC2040AT> [7944/16/63] at ata0-master using UDMA33
acd0: CDROM <TOSHIBA CD-ROM XM-1802B> at ata1-master using WDMA2

Basically the GENERIC kernel plus APM and sound, minus extraneous hardware.

How-To-Repeat: Wait a bit (~20 seconds) with no disk activity and then perform a
READ/WRITE request on the IDE drive.
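
The How-To-Repeat steps could be scripted roughly as below. This is only a sketch under stated assumptions: `timed_write_after_idle` is a hypothetical helper (not part of the original report), Python is assumed to be available on the machine, and the test file is assumed to live on the affected IDE drive. It uses a synced write rather than a read, since a read may be served from the buffer cache without touching the disk.

```python
import os
import time


def timed_write_after_idle(path, idle_seconds=20.0, nbytes=4096):
    """Stay idle, then force one write to disk and return its duration in seconds.

    Hypothetical helper: the ~20 second idle period comes from the report;
    the file path and write size are assumptions chosen for illustration.
    """
    # Leave this process idle; other disk activity on the system is not controlled here.
    time.sleep(idle_seconds)

    start = time.monotonic()
    with open(path, "wb") as f:
        f.write(b"x" * nbytes)
        f.flush()
        # fsync asks the OS to push the data to the device instead of
        # leaving it in the buffer cache.
        os.fsync(f.fileno())
    return time.monotonic() - start
```

If the behaviour described above is present, a call such as `timed_write_after_idle("/var/tmp/probe.bin")` might be expected to take several seconds instead of a few milliseconds, with the timeout message appearing on the console at about the same time.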
Comment 1 dan freebsd_committer freebsd_triage 2000-03-25 16:02:48 UTC
Responsible Changed
From-To: freebsd-bugs->sos

This is Soren's area.

Comment 2 Cy Schubert 2000-08-13 17:05:36 UTC
I seem to have a similar problem, e.g. same message, with a Western 
Digital 2.5 GB disk.

Aug 13 08:41:16 cwsys /kernel: ata1-master: timeout waiting to give command=c8 s=d0 e=00
Aug 13 08:41:16 cwsys /kernel: ad2: error executing command - resetting
Aug 13 08:41:16 cwsys /kernel: ata1: resetting devices .. done

I'm not exactly sure whether the original cause of the problem, the 
timeout itself, is a FreeBSD bug (PR 17592) or a drive problem.  This 
drive has suffered timeouts under FreeBSD using DMA mode ever since it 
was new about 5 years ago, yet the Western Digital diagnostics see no 
problem, nor does PIO mode have any problem (or no flags when previous 
versions of FreeBSD were installed).

Relevant dmesg output:

atapci0: <Intel PIIX3 ATA controller> port 0xf000-0xf00f at device 7.1 on pci0
ata0: at 0x1f0 irq 14 on atapci0
ata1: at 0x170 irq 15 on atapci0
ad0: 2014MB <WDC AC22100H> [4092/16/63] at ata0-master using WDMA2
ad2: 2441MB <WDC AC22500L> [4960/16/63] at ata1-master using WDMA2

uname -a output:

FreeBSD cwsys 4.1-RELEASE FreeBSD 4.1-RELEASE #5: Sun Aug 13 08:36:00 PDT 2000     root@cwsys:/usr/opt/cvs-410r/src/sys/compile/CWSYS  i386

ad0 has no problems, yet ad2 has had random timeouts ever since it was 
new, which can be recreated by doing a stat of all the files in a 
large directory.
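
A minimal sketch of that kind of reproduction, assuming Python is available on the machine; `stat_all` is a hypothetical helper for illustration, not something taken from the original setup, and the directory to use is left to the reader.

```python
import os


def stat_all(directory):
    """Call lstat() on every entry in a directory and return how many were examined.

    Hypothetical helper: statting many files forces the kernel to fetch
    their metadata, which generates a burst of small reads when that
    metadata is not already cached.
    """
    count = 0
    with os.scandir(directory) as entries:
        for entry in entries:
            # lstat does not follow symlinks, so only entries in this
            # directory itself are touched.
            os.lstat(entry.path)
            count += 1
    return count
```

Running `stat_all` against a large directory on the affected disk, while watching the console or /var/log/messages, would be one way to check whether the timeouts follow the access pattern described above.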


Regards,                       Phone:  (250)387-8437
Cy Schubert                      Fax:  (250)387-5766
Team Leader, Sun/DEC Team   Internet:  Cy.Schubert@osg.gov.bc.ca
Open Systems Group, ITSD, ISTA
Province of BC
Comment 3 wildph 2000-09-30 22:30:15 UTC
Hi there,

I'm getting this sort of thing too, using a WD Caviar hard drive. Relevant messages are below; ask me if you need more. The motherboard is Pentium-based, with the Intel 430HX chipset.

Sometimes the "ata0: resetting devices" step appears to work. Most times the Unix box locks up and needs to be physically rebooted (power cycle or reset switch).

I CVSup'd to 4.1.1-STABLE last night and am keeping an eye on it.

Cheers

Graeme.

<at boot>
/kernel: ad0: 3020MB <WDC AC33100H> [6136/16/63] at ata0-master using WDMA2

<during operation /var/log/messages>
Sep 16 13:39:33 p200 /kernel: FreeBSD 4.1-STABLE #1: Sun Aug 13 21:05:36 BST 2000
-------
Sep 16 16:49:38 p200 /kernel: ad0: WRITE command timeout - resetting
Sep 16 16:49:38 p200 /kernel: ata0: resetting devices .. done
Sep 16 17:36:31 p200 /kernel: ad0: READ command timeout - resetting
Sep 16 17:36:31 p200 /kernel: ata0: resetting devices .. done
<reboot>
Sep 23 00:00:10 p200 /kernel: ad0: WRITE command timeout - resetting
Sep 23 00:00:10 p200 /kernel: ata0: resetting devices .. done
<reboot>
--------
 
Comment 4 Søren Schmidt freebsd_committer freebsd_triage 2000-11-14 08:40:04 UTC
State Changed
From-To: open->closed

Upgrade to 4.2; that should solve your problem.
Comment 5 chris 2000-11-30 03:40:36 UTC
Greetings.  We were having problems very similar to the ones described in
kern/17592.  The recommended fix was to upgrade to 4.2, which we did.  We
continue to have these problems, so I thought I'd submit a follow-up
report in case this PR needs re-opening.

uname -a:

FreeBSD nollie.summersault.com 4.2-RELEASE FreeBSD 4.2-RELEASE #0: Tue Nov 21 13:04:31 EST 2000 root@nollie.summersault.com:/usr/src/sys/compile/NOLLIE.112100 i386

From dmesg:

CPU: Pentium III/Pentium III Xeon/Celeron (651.48-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x681  Stepping = 1
  Features=0x383f9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
<snip>
ad0: 19574MB <WDC WD205BA> [39770/16/63] at ata0-master UDMA66
ata1-master: timeout waiting for command=ef s=00 e=00
(null): MODE_SENSE_BIG command timeout - resetting
ata1: resetting devices .. done
(null): MODE_SENSE_BIG command timeout - resetting
ata1: resetting devices .. done
(null): MODE_SENSE_BIG command timeout - resetting
ata1: resetting devices .. done
(null): MODE_SENSE_BIG command timeout - resetting
ata1: resetting devices .. done

We used to get these kinds of messages in /var/log/messages right before a
crash:

Nov 10 18:53:52 nollie /kernel: acd0: PREVENT_ALLOW command timeout - resetting
Nov 10 18:53:52 nollie /kernel: ata1: resetting devices .. done

but with 4.2 we no longer do.

The basic behavior is that the system just freezes up and can only be
recovered with a push of the reset button.

Other random notes:
  -The crashes seem to be happening on a semi-regular basis, about every 4
days, but not at the same time for each crash.
  -The drive is only a few months old, and has been tested several times.
  -Possibly unrelated: the kernel displays a message "14: not found" right
after displaying "WARNING: / was not properly dismounted", but this
doesn't show up in dmesg.  I've seen this on a few other FreeBSD boxes we
have.

I'm compiling a kernel with debug symbols now, so hopefully I'll have more
information to offer soon.

Anyone else reporting continued problems since upgrading to 4.2?

Thanks,
Chris

-- Chris Hardie -----------------------------
----- mailto:chris@summersault.com ----------
-------- http://www.summersault.com/chris/ --
Comment 6 chris 2000-12-04 18:06:49 UTC
Soren,

We continue to have problems with our system locking up on a regular basis
as described in my 29 Nov 2000 followup to this PR.  I ran the debug
kernel and no core dump was produced.  This suggests either a problem with
the hardware itself, or a problem with FreeBSD's interaction with the
hardware.  Upgrading to 4.2 didn't seem to help, so I'm wondering what you
would suggest as our next step in figuring out what's wrong and how to
proceed with fixing it.  At this point I'm at a loss for figuring out
what's happening at the moment of the crash.

Hopefully unrelated is the note that this is the same box where we had
problems described in PR#16740.  I hope to avoid the dismissal of the
problem as a hardware defect because we've tested and re-tested the
motherboard several times, but I thought I should mention it.

Thanks,
Chris


-- Chris Hardie -----------------------------
----- mailto:chris@summersault.com ----------
-------- http://www.summersault.com/chris/ --