| Summary: | ATA driver does not recover from READ_DMA TIMEOUT | ||
|---|---|---|---|
| Product: | Base System | Reporter: | Patrick Mackinlay <patrick> |
| Component: | kern | Assignee: | Søren Schmidt <sos> |
| Status: | Closed FIXED | ||
| Severity: | Affects Only Me | ||
| Priority: | Normal | ||
| Version: | 5.2.1-RELEASE | ||
| Hardware: | Any | ||
| OS: | Any | ||
|
Description
Patrick Mackinlay
2004-04-26 18:30:21 UTC
Responsible Changed From-To: freebsd-bugs->sos

Sounds like an ata(4) issue, so over to ata maintainer.

Believe I am having same problem. And that kern/62897 is probably the same thing too.

Bought a brand new Dell 400SC, then a pair of Hitachi HDS722516VLSA80 160G SATA drives. The base Seagate ST340014A 40G is on parallel ATA partitioned with sysinstall's "auto" defaults. System withstood a couple of days of abuse including "make world" before installing the SATA drives, leaving the PATA 40G booting FreeBSD 5.2.1-p9.

Partitioned the 160's with 1G of swap at the start, remainder native FreeBSD. Have not used the swap partitions.

Striped the two large partitions with vinum. Then started filling via ftp. Instantly locked the machine requiring power cycle to recover.

Have removed vinum, newfs'ed the bare partitions ad[46]s1d and tried using them simply. cp from PATA to the fs on ad6s1d works just great. cp of files on the fs at ad6s1d to the fs on ad4s1d gets READ_DMA timeout at 1349058560 bytes into the first file. This cp process is stuck. It's not moving. It's not responding to kill. Apparently everything to ad4 is blocked until this clears.

Shutdown gave up on syncing 22 buffers. Fsck reports "bad inode number 1083392 to nextinode" now on ad4s1d. Time for newfs.

CPU is a P4-2.8G 512k with Hyperthreading enabled. Disabling HT appears to have ended the problem and results in a reliable machine.

Hello,

I can reproduce this every time. I finally identified the file that is using the disk sectors that are causing the fault and renamed the file to "/file_system_mount_point/broken". This is a workaround that works, however what really needs to be fixed is the ata driver. It quite clearly does not handle hard disk failures properly.

Patrick

David Kelly wrote:
> Believe I am having same problem. And that kern/62897 is probably the
> same thing too.
>
> Bought a brand new Dell 400SC, then a pair of Hitachi HDS722516VLSA80
> 160G SATA drives. The base Seagate ST340014A 40G is on parallel ATA
> partitioned with sysinstall's "auto" defaults. System withstood a couple
> of days of abuse including "make world" before installing the SATA
> drives, leaving the PATA 40G booting FreeBSD 5.2.1-p9.
>
> Partitioned the 160's with 1G of swap at the start, remainder native
> FreeBSD. Have not used the swap partitions.
>
> Striped the two large partitions with vinum. Then started filling via
> ftp. Instantly locked the machine requiring power cycle to recover.
>
> Have removed vinum, newfs'ed the bare partitions ad[46]s1d and tried
> using them simply. cp from PATA to the fs on ad6s1d works just great. cp
> of files on the fs at ad6s1d to the fs on ad4s1d gets READ_DMA timeout
> at 1349058560 bytes into the first file. This cp process is stuck. It's
> not moving. It's not responding to kill. Apparently everything to ad4 is
> blocked until this clears.
>
> Shutdown gave up on syncing 22 buffers. Fsck reports "bad inode number
> 1083392 to nextinode" now on ad4s1d. Time for newfs.
>
> CPU is a P4-2.8G 512k with Hyperthreading enabled. Disabling HT appears
> to have ended the problem and results in a reliable machine.

--
Patrick Mackinlay
patrick@spacesurfer.com
http://patrick.spacesurfer.com/
tel: +44.7050699851
fax: +44.7050699852
Yahoo messenger: patrick00_uk
SpaceSurfer Limited
http://www.spacereg.com/

On Jul 30, 2004, at 12:40 PM, Patrick Mackinlay wrote:
> I can reproduce this every time. I finally identified the file that is
> using the disk sectors that are causing the fault and renamed the file
> to "/file_system_mount_point/broken". This is a workaround that works,
> however what really needs to be fixed is the ata driver. It quite
> clearly does not handle hard disk failures properly.
That cause sounds different from my problem, although it's likely we are
both hanging on the same error-handling problem. Sounds like Patrick
has a bad block on the media? See badsect(8) for something that might
help create a band-aid.
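
For anyone trying to pin down which part of a file sits on a bad block, here is a minimal userland sketch in Python. It is not anything posted in this PR: the function names, the 64 KiB chunk size, and the 512-byte sector size are all assumptions made for illustration. It reads a file in fixed-size chunks and reports the byte offset of the first chunk whose read fails. On a system hitting this bug the read may simply block in the kernel instead of returning an error, just like the stuck cp described above, so treat it as a diagnostic aid only.

```python
import os

SECTOR_SIZE = 512        # assumption: 512-byte logical sectors
CHUNK_SIZE = 64 * 1024   # assumption: probe the file in 64 KiB reads


def first_unreadable_offset(read_at, chunk_size=CHUNK_SIZE):
    """Return the offset of the first chunk that fails to read, else None.

    read_at(offset, size) is a hypothetical callback that returns bytes
    (empty at end of file) or raises OSError on an I/O error.
    """
    offset = 0
    while True:
        try:
            data = read_at(offset, chunk_size)
        except OSError:
            return offset        # the kernel reported an error for this chunk
        if not data:
            return None          # reached end of file without any error
        offset += len(data)


def scan_file(path, chunk_size=CHUNK_SIZE):
    """Probe a real file with positional reads (os.pread, Unix only)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return first_unreadable_offset(
            lambda off, size: os.pread(fd, size, off), chunk_size
        )
    finally:
        os.close(fd)


def to_sector(offset, sector_size=SECTOR_SIZE):
    """Convert a byte offset into a sector index relative to the file start."""
    return offset // sector_size
```

Applied to the offset reported earlier in this PR, `to_sector(1349058560)` gives 2634880. That index is relative to the start of the file, not a disk LBA, so it only narrows down where in the file the trouble begins.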
Since posting earlier I have disabled hyperthreading of the CPU in the
BIOS and have written over 50G to each of the "problem" SATA drives,
reading from one or the other.
Am now confident hyperthreading (SMP) was the root of my problem and am
ready to set the machine to the tasks it was purchased for, with HT
disabled.
State Changed From-To: open->closed
You should try -current (or soon to be 5.3) as I've fixed a couple of races that could provoke this. |