| Summary: | ATA system freezes during heavy loads | ||
|---|---|---|---|
| Product: | Base System | Reporter: | Kirk Strauser <kirk> |
| Component: | kern | Assignee: | Søren Schmidt <sos> |
| Status: | Closed FIXED | ||
| Severity: | Affects Only Me | ||
| Priority: | Normal | ||
| Version: | 5.2-CURRENT | ||
| Hardware: | Any | ||
| OS: | Any | ||
Hello...
I have the same problem:
ad0: TIMEOUT - WRITE_DMA retrying (2 retries left)
ata0: resetting devices
ad0: FAILURE - already active DMA on this device
ad0: setting up DMA failed
[only reset helps]
I tried three different hard disks (IBM/Seagate) and two different
UDMA cables without any change. The system still crashes under heavy I/O
load! In PIO mode everything works fine, but *very* slowly...
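For anyone checking whether a machine is hitting the same errors, here is a minimal sketch that counts the relevant kernel messages in a log file. The function name `count_ata_errors` and the default path `/var/log/messages` are assumptions for illustration; the pattern only covers the messages quoted in this report.

```shell
#!/bin/sh
# Count ATA DMA timeout and "already active DMA" lines in a log file.
# count_ata_errors is a hypothetical helper; the default log path is an
# assumption, so pass another file as the first argument if needed.
count_ata_errors() {
    logfile="${1:-/var/log/messages}"
    # grep -c prints the number of matching lines (0 if there are none);
    # -E enables the alternation used in the pattern.
    grep -cE 'ad[0-9]+: (TIMEOUT - (READ|WRITE)_DMA|FAILURE - already active DMA)' "$logfile"
}
```

Calling `count_ata_errors /var/log/messages` after a period of heavy I/O prints the number of matching lines; a non-zero count suggests the same failure pattern.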
# uname -a
FreeBSD 5.2-RC FreeBSD 5.2-RC #2: Sat Dec 27 18:28:11 CET 2003
wiwi@:/usr/src/sys/i386/compile/mail i386
(just cvsup'ed)
# atacontrol list
ATA channel 0:
Master: ad0 <IC35L120AVV207-1/V24OA66A> ATA/ATAPI rev 6
Slave: no device present
ATA channel 1:
Master: acd0 <COMPAQ DVD-ROM GDR8160B/0012> ATA/ATAPI rev 0
Slave: no device present
Dec 27 20:04:29 syslogd: kernel boot file is /boot/kernel/kernel
Dec 27 20:04:29 kernel: Copyright (c) 1992-2003 The FreeBSD Project.
Dec 27 20:04:29 kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989,
1991, 1992, 1993, 1994
Dec 27 20:04:29 kernel: The Regents of the University of California. All
rights reserved.
Dec 27 20:04:29 kernel: FreeBSD 5.2-RC #2: Sat Dec 27 18:28:11 CET 2003
Dec 27 20:04:29 kernel: wiwi@:/usr/src/sys/i386/compile/mail
Dec 27 20:04:29 kernel: Preloaded elf kernel "/boot/kernel/kernel" at
0xc09c5000.
Dec 27 20:04:29 kernel: Preloaded elf module "/boot/kernel/acpi.ko" at
0xc09c50cc.
Dec 27 20:04:29 kernel: Timecounter "i8254" frequency 1193182 Hz quality 0
Dec 27 20:04:29 kernel: CPU: Intel(R) Pentium(R) 4 CPU 2.40GHz
(2392.30-MHz 686-class CPU)
Dec 27 20:04:29 kernel: Origin = "GenuineIntel" Id = 0xf24 Stepping = 4
Dec 27 20:04:29 kernel: Features=0x3febfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM>
Dec 27 20:04:29 kernel: real memory = 260046848 (248 MB)
Dec 27 20:04:29 kernel: avail memory = 242958336 (231 MB)
Dec 27 20:04:29 kernel: Pentium Pro MTRR support enabled
Dec 27 20:04:29 kernel: ACPI-0264: *** Error: Looking up
[\_SB_.PCI0.IDE_.SIDE] in namespace, AE_NOT_FOUND
Dec 27 20:04:29 kernel: ACPI-1287: *** Error: , AE_NOT_FOUND
Dec 27 20:04:29 kernel: npx0: [FAST]
Dec 27 20:04:29 kernel: npx0: <math processor> on motherboard
Dec 27 20:04:29 kernel: npx0: INT 16 interface
Dec 27 20:04:29 kernel: acpi0: <COMPAQ CPQ0050 > on motherboard
Dec 27 20:04:29 kernel: pcibios: BIOS version 2.10
Dec 27 20:04:29 kernel: Using $PIR table, 10 entries at 0xc00ecef0
Dec 27 20:04:29 kernel: ACPI-0438: *** Error: Looking up [\OSFG] in
namespace, AE_NOT_FOUND
Dec 27 20:04:29 kernel: SearchNode 0xc16859c0 StartNode 0xc16859c0
ReturnNode 0
Dec 27 20:04:29 kernel: ACPI-1287: *** Error: Method execution failed
[\_SB_.PCI0._INI] (Node 0xc16859c0), AE_NOT_FOUND
Dec 27 20:04:29 kernel: acpi0: Power Button (fixed)
Dec 27 20:04:29 kernel: Timecounter "ACPI-fast" frequency 3579545 Hz
quality 1000
Dec 27 20:04:29 kernel: acpi_timer0: <24-bit timer at 3.579545MHz> port
0xf808-0xf80b on acpi0
Dec 27 20:04:29 kernel: pcib0: <ACPI Host-PCI bridge> on acpi0
Dec 27 20:04:29 kernel: pci0: <ACPI PCI bus> on pcib0
Dec 27 20:04:29 kernel: pcib0: slot 2 INTA is routed to irq 10
Dec 27 20:04:29 kernel: pcib0: slot 29 INTA is routed to irq 10
Dec 27 20:04:29 kernel: pcib0: slot 29 INTB is routed to irq 11
Dec 27 20:04:29 kernel: pcib0: slot 29 INTD is routed to irq 5
Dec 27 20:04:29 kernel: pcib0: slot 31 INTA is routed to irq 11
Dec 27 20:04:29 kernel: pcib0: slot 31 INTB is routed to irq 5
Dec 27 20:04:29 kernel: agp0: <Intel 82845G (845G GMCH) SVGA controller>
mem 0xfa400000-0xfa47ffff,0xf0000000-0xf7ffffff irq 10 at device 2.0 on pci0
Dec 27 20:04:29 kernel: agp0: detected 8060k stolen memory
Dec 27 20:04:29 kernel: agp0: aperture size is 128M
Dec 27 20:04:29 kernel: uhci0: <Intel 82801DB (ICH4) USB controller USB-A>
port 0x2440-0x245f irq 10 at device 29.0 on pci0
Dec 27 20:04:29 kernel: usb0: <Intel 82801DB (ICH4) USB controller USB-A>
on uhci0
Dec 27 20:04:29 kernel: usb0: USB revision 1.0
Dec 27 20:04:29 kernel: uhub0: Intel UHCI root hub, class 9/0, rev
1.00/1.00, addr 1
Dec 27 20:04:29 kernel: uhub0: 2 ports with 2 removable, self powered
Dec 27 20:04:29 kernel: uhci1: <Intel 82801DB (ICH4) USB controller USB-B>
port 0x2460-0x247f irq 11 at device 29.1 on pci0
Dec 27 20:04:29 kernel: usb1: <Intel 82801DB (ICH4) USB controller USB-B>
on uhci1
Dec 27 20:04:29 kernel: usb1: USB revision 1.0
Dec 27 20:04:29 kernel: uhub1: Intel UHCI root hub, class 9/0, rev
1.00/1.00, addr 1
Dec 27 20:04:29 kernel: uhub1: 2 ports with 2 removable, self powered
Dec 27 20:04:29 kernel: pci0: <serial bus, USB> at device 29.7 (no driver
attached)
Dec 27 20:04:29 kernel: pcib1: <ACPI PCI-PCI bridge> at device 30.0 on pci0
Dec 27 20:04:29 kernel: pcib1: could not get PCI interrupt routing table
for \_SB_.PCI0.HUB_ - AE_NOT_FOUND
Dec 27 20:04:29 kernel: pci5: <ACPI PCI bus> on pcib1
Dec 27 20:04:29 kernel: fxp0: <Intel 82801DB (ICH4) Pro/100 VM Ethernet>
port 0x1000-0x103f mem 0xfa500000-0xfa500fff irq 5 at device 8.0 on pci5
Dec 27 20:04:29 kernel: fxp0: Ethernet address 00:08:02:a6:7a:19
Dec 27 20:04:29 kernel: miibus0: <MII bus> on fxp0
Dec 27 20:04:29 kernel: inphy0: <i82562EM 10/100 media interface> on
miibus0
Dec 27 20:04:29 kernel: inphy0: 10baseT, 10baseT-FDX, 100baseTX,
100baseTX-FDX, auto
Dec 27 20:04:29 kernel: isab0: <PCI-ISA bridge> at device 31.0 on pci0
Dec 27 20:04:29 kernel: isa0: <ISA bus> on isab0
Dec 27 20:04:29 kernel: atapci0: <Intel ICH4 UDMA100 controller> port
0x24a0-0x24af,0x24c4-0x24c7,0x24b8-0x24bf,0x24c0-0x24c3,0x24b0-0x24b7 irq
11 at device 31.1 on pci0
Dec 27 20:04:29 kernel: ata0: at 0x1f0 irq 14 on atapci0
Dec 27 20:04:29 kernel: ata0: [MPSAFE]
Dec 27 20:04:29 kernel: ata1: at 0x170 irq 15 on atapci0
Dec 27 20:04:29 kernel: ata1: [MPSAFE]
Dec 27 20:04:29 kernel: pci0: <multimedia, audio> at device 31.5 (no
driver attached)
Dec 27 20:04:29 kernel: pmtimer0 on isa0
Dec 27 20:04:29 kernel: atkbdc0: <Keyboard controller (i8042)> at port
0x64,0x60 on isa0
Dec 27 20:04:29 kernel: atkbd0: <AT Keyboard> flags 0x1 irq 1 on atkbdc0
Dec 27 20:04:29 kernel: kbd0 at atkbd0
Dec 27 20:04:29 kernel: psm0: <PS/2 Mouse> irq 12 on atkbdc0
Dec 27 20:04:29 kernel: psm0: model IntelliMouse Explorer, device ID 4
Dec 27 20:04:29 kernel: fdc0: <Enhanced floppy controller (i82077, NE72065
or clone)> at port 0x3f7,0x3f0-0x3f5 irq 6 drq 2 on isa0
Dec 27 20:04:29 kernel: fdc0: FIFO enabled, 8 bytes threshold
Dec 27 20:04:29 kernel: fd0: <1440-KB 3.5" drive> on fdc0 drive 0
Dec 27 20:04:29 kernel: ppc0: <Parallel port> at port 0x378-0x37f irq 7 on
isa0
Dec 27 20:04:29 kernel: ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in
COMPATIBLE mode
Dec 27 20:04:29 kernel: ppc0: FIFO with 16/16/13 bytes threshold
Dec 27 20:04:29 kernel: ppbus0: <Parallel port bus> on ppc0
Dec 27 20:04:29 kernel: plip0: <PLIP network interface> on ppbus0
Dec 27 20:04:29 kernel: lpt0: <Printer> on ppbus0
Dec 27 20:04:29 kernel: lpt0: Interrupt-driven port
Dec 27 20:04:29 kernel: ppi0: <Parallel I/O> on ppbus0
Dec 27 20:04:29 kernel: sc0: <System console> at flags 0x100 on isa0
Dec 27 20:04:29 kernel: sc0: VGA <16 virtual consoles, flags=0x300>
Dec 27 20:04:29 kernel: sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
Dec 27 20:04:29 kernel: sio0: type 16550A
Dec 27 20:04:29 kernel: sio1 at port 0x2f8-0x2ff irq 3 on isa0
Dec 27 20:04:29 kernel: sio1: type 16550A
Dec 27 20:04:29 kernel: vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem
0xa0000-0xbffff on isa0
Dec 27 20:04:29 kernel: Timecounter "TSC" frequency 2392296452 Hz quality
800
Dec 27 20:04:29 kernel: Timecounters tick every 10.000 msec
Dec 27 20:04:29 kernel: GEOM: create disk ad0 dp=0xc2d2e160
Dec 27 20:04:29 kernel: ad0: 117800MB <IC35L120AVV207-1> [239340/16/63] at
ata0-master UDMA100
Dec 27 20:04:29 kernel: acd0: DVDROM <COMPAQ DVD-ROM GDR8160B> at
ata1-master PIO4
Best regards,
Christian Wittenhorst
Hi, it's me again...
MAINTAINER: can you please raise the severity of this bug report to
"CRITICAL"? I think this bug is indeed critical!
I checked this on different machines. The bug seems systematic. I
experienced this bug on all (=4) machines I installed 5.2RC on. This bug
seems unrelated to kern/57174 (ata tagging); all machines have a standard
install and a GENERIC kernel config with only SMP disabled! The same machines
worked fine with 5.1/5.0!
both ATA 0&1 crash:
ATA channel 0:
Master: ad0 <QUANTUM FIREBALL CR4.3A/A5U.1000> ATA/ATAPI rev 4
Slave: no device present
ATA channel 1:
Master: ad2 <IC35L080AVVA07-0/VA4OA52A> ATA/ATAPI rev 5
Slave: no device present
only ATA 0 crashed so far:
ATA channel 0:
Master: ad0 <ST340016A/3.21> ATA/ATAPI rev 5
Slave: no device present
ATA channel 1:
Master: acd0 <COMPAQ DVD-ROM GDR8160B/0012> ATA/ATAPI rev 0
Slave: no device present
only ATA 0 crashed so far:
ATA channel 0:
Master: ad0 <IC35L060AVV207-0/V22OA66A> ATA/ATAPI rev 6
Slave: no device present
ATA channel 1:
Master: acd0 <Compaq CRD-8322B/1.03> ATA/ATAPI rev 0
Slave: no device present
both ATA 0&1 crash:
ATA channel 0:
Master: ad0 <IC35L060AVV207-0/V22OA63A> ATA/ATAPI rev 6
Slave: no device present
ATA channel 1:
Master: ad2 <IC35L120AVV207-0/V24OA66A> ATA/ATAPI rev 6
Slave: acd0 <Compaq CRD-8322B/1.06> ATA/ATAPI rev 0
All it takes to crash the ATA system is heavy I/O load over a longer
period (>5 min). Rebuilding bogofilter's spam databases is enough to crash
the machines:
#!/bin/sh
# Feed every archived spam message to bogofilter to generate sustained disk I/O.
for i in /mail/data/user/progon-net/Archive/Spam/2003-11/*.
do
echo "$i"
bogofilter -c /etc/bogofilter.cf -s < "$i"
done
If you need more information, just drop an email...
Best regards,
Christian Wittenhorst
Responsible Changed From-To: freebsd-bugs->sos
Over to maintainer.
State Changed From-To: open->closed
This is believed to have been fixed recently in -current.
I built world after cvsup'ing -CURRENT on 2003-12-11 and am still having the same ATA READ_DMA hangs that started in early October on my system. I can repeat the hangs at will; the machine serves as an Amanda server, and launching a backup for itself plus 3 client machines is guaranteed to trigger it:
ad0: TIMEOUT - READ_DMA retrying (2 retries left)
ata0: resetting devices ..
ad0: FAILURE - already active DMA on this device
ad0: setting up DMA failed
When this happens, the system is effectively dead until I reset it. I can run for days on end by booting with the drive in BIOSPIO mode (PIO4 has the same problem as the UDMA modes), but that's not really my ideal long-term solution as it slows the system to a crawl.
The drive in question is a Western Digital WD1200JB-00DUA3 (Caviar 120GB special edition) attached to an Asus P3V4X (Via chipset) motherboard. The combination has worked perfectly from the server's 4.8-STABLE days, through 5.0, and up until the last two months when I started experiencing this immediately after an upgrade.
The sysutils/smartctl reports:
SMART overall-health self-assessment test result: PASSED
That, coupled with the problem's appearance immediately after an upgrade leads me to suspect a problem with the ATA driver and not with the hardware itself.
Fix:
A temporary workaround is to run:
atacontrol mode 0 BIOSPIO BIOSPIO
The system will still crash at PIO4 and above, so stick with BIOSPIO for stability.
How-To-Repeat:
Do anything that stresses the ATA drive, such as run multiple copies of tar.
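The "multiple copies of tar" approach can be sketched as a small script. This is only an illustration, not the exact procedure used in the report: the function name `stress_tar`, its arguments, and the defaults (`/usr/src`, 4 copies) are hypothetical. It generates read load only, which matches the READ_DMA case described here.

```shell
#!/bin/sh
# Run several tar readers in parallel over one directory to create
# sustained disk read load. stress_tar and its defaults are hypothetical.
stress_tar() {
    dir="${1:-/usr/src}"     # directory to read (assumed default)
    copies="${2:-4}"         # number of concurrent tar processes
    n=0
    while [ "$n" -lt "$copies" ]; do
        # Write the archive to a pipe and discard it. The pipe through cat
        # matters: GNU tar skips reading file contents when its archive
        # is /dev/null directly.
        tar -cf - "$dir" 2>/dev/null | cat > /dev/null &
        n=$((n + 1))
    done
    # Block until every background pipeline has finished.
    wait
    echo "finished $copies tar runs over $dir"
}
```

For example, `stress_tar /usr/src 8` starts eight readers. On an affected system the hang would be expected before the final message is printed, so run it only on a machine that can be reset.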