Bug 82261 - DMA-support on Sparc64 broken
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: sparc64
Version: 6.0-CURRENT
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: Søren Schmidt
Reported: 2005-06-15 10:20 UTC by Sebastian Koehler
Modified: 2005-08-29 12:28 UTC

Attachments
ata-chipset.c.diff (1.77 KB, patch)
2005-07-08 21:19 UTC, Sebastian Koehler
ata-chipset.c.diff2 (2.52 KB, text/plain; charset=us-ascii)
2005-08-20 00:59 UTC, marius

Description Sebastian Koehler 2005-06-15 10:20:16 UTC
A clean installation using FreeBSD media causes errors when DMA mode is used to access the IDE disks.

messages during installation:
ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=2570528
ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=2570624
ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=2574752

If the system is installed using hw.ata.ata_dma=0, the following happens when it is then booted with DMA enabled:
dc1: Ethernet address: 00:03:ba:0f:22:55
dc1: if_start running deferred for Giant
dc1: [GIANT-LOCKED]
pci0: <serial bus, USB> at device 10.0 (no driver attached)
atapci0: <AcerLabs M5229 UDMA66 controller> port 0x10200-0x10207,0x10218-0x1021b,0x10210-0x10217,0x10208-0x1020b,0x10220-0x1022f at device 13.0 on pci0
ata2: <ATA channel 0> on atapci0
ata3: <ATA channel 1> on atapci0
stray level interrupt 14
rtc0: <Real Time Clock> at port 0x70-0x71 on isa0
uart0: <16550 or compatible> at port 0x3f8-0x3ff irq 43 on isa0
uart0: console (9600,n,8,1)
uart1: <16550 or compatible> at port 0x2e8-0x2ef irq 43 on isa0
Timecounters tick every 1.000 msec
ad0: 38166MB <Seagate ST340824A 3.28> at ata2-master UDMA66
acd0: CDRW <RICOH CD-R/RW MP7200A/1.30> at ata3-master UDMA33
Trying to mount root from ufs:/dev/ad0a
/libexec/ld-elf.so.1: /lib/libncurses.so.5: invalid file format
Enter full pathname of shell or RETURN for /bin/sh: 

or:
dc1: Ethernet address: 00:03:ba:0f:22:55
dc1: if_start running deferred for Giant
dc1: [GIANT-LOCKED]
pci0: <serial bus, USB> at device 10.0 (no driver attached)
atapci0: <AcerLabs M5229 UDMA66 controller> port 0x10200-0x10207,0x10218-0x1021b,0x10210-0x10217,0x10208-0x1020b,0x10220-0x1022f at device 13.0 on pci0
ata2: <ATA channel 0> on atapci0
ata3: <ATA channel 1> on atapci0
stray level interrupt 14
rtc0: <Real Time Clock> at port 0x70-0x71 on isa0
uart0: <16550 or compatible> at port 0x3f8-0x3ff irq 43 on isa0
uart0: console (9600,n,8,1)
uart1: <16550 or compatible> at port 0x2e8-0x2ef irq 43 on isa0
Timecounters tick every 1.000 msec
ad0: 38166MB <Seagate ST340824A 3.28> at ata2-master UDMA66
acd0: CDRW <RICOH CD-R/RW MP7200A/1.30> at ata3-master UDMA33
Trying to mount root from ufs:/dev/ad0a
init in malloc(): error: recursive call
init in malloc(): error: recursive call
init in malloc(): error: recursive call
init in malloc(): error: recursive call
init in malloc(): error: recursive call
init in malloc(): error: recursive call
init in malloc(): error: recursive call
init in malloc(): error: recursive call
init in malloc(): error: recursive call
init in malloc(): error: recursive call
init in malloc(): error: recursive call
init in malloc(): error: recursive call

The system works fine with hw.ata.ata_dma=0:
dc1: Ethernet address: 00:03:ba:0f:22:55
dc1: if_start running deferred for Giant
dc1: [GIANT-LOCKED]
pci0: <serial bus, USB> at device 10.0 (no driver attached)
atapci0: <AcerLabs M5229 UDMA66 controller> port 0x10200-0x10207,0x10218-0x1021b,0x10210-0x10217,0x10208-0x1020b,0x10220-0x1022f at device 13.0 on pci0
ata2: <ATA channel 0> on atapci0
ata3: <ATA channel 1> on atapci0
rtc0: <Real Time Clock> at port 0x70-0x71 on isa0
uart0: <16550 or compatible> at port 0x3f8-0x3ff irq 43 on isa0
uart0: console (9600,n,8,1)
uart1: <16550 or compatible> at port 0x2e8-0x2ef irq 43 on isa0
Timecounters tick every 1.000 msec
ad0: 38166MB <Seagate ST340824A 3.28> at ata2-master PIO4
acd0: CDRW <RICOH CD-R/RW MP7200A/1.30> at ata3-master UDMA33
Trying to mount root from ufs:/dev/ad0a
Loading configuration files.
Entropy harvesting: interrupts ethernet point_to_point kickstart.
swapon: adding /dev/ad0b as swap device
Starting file system checks:
/dev/ad0a: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ad0a: clean, 102079 free (975 frags, 12638 blocks, 0.8% fragmentation)
/dev/ad0e: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ad0e: clean, 127341 free (29 frags, 15914 blocks, 0.0% fragmentation)
/dev/ad0f: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ad0f: clean, 17986047 free (4295 frags, 2247719 blocks, 0.0% fragmentation)
/dev/ad0d: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ad0d: clean, 127258 free (42 frags, 15902 blocks, 0.0% fragmentation)

Fix: 

If DMA is not used (hw.ata.ata_dma=0 in the boot loader), the messages go away and the HDD can be accessed without errors, but only in PIO4.
How-To-Repeat: Try to access IDE drives in a Sun Netra X1 using DMA mode. Tested with FreeBSD installation media 5.3-RELEASE and 6.0-CURRENT-SNAP004. Earlier releases were not tested.
Comment 1 marius 2005-06-15 11:44:40 UTC
On Wed, Jun 15, 2005 at 09:14:32AM +0000, Sebastian Koehler wrote:
> 
> A clean installation using FreeBSD media cause errors when DMA mode is used to access the IDE disks.
> 
> messages during installation:
> ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=2570528
> ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=2570624
> ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=2574752
> 

Is this with original drives from Sun or with vanilla off-the-shelf
ones?
Comment 2 Sebastian Koehler 2005-06-15 14:12:00 UTC
The drive has a yellow Sun label on it with P/N 370-4419-01. It's a 40GB
Seagate, model ST340824A.
Comment 3 Sebastian Koehler 2005-07-08 21:19:10 UTC
Hi list,

I got the attached patch from Marius Strobl. Unfortunately it did not
fix the problem. With ata_generic_reset(dev) placed below the
if (ctlr...) check in ata_ali_reset(), data corruption still occurs
when the system boots with hw.ata.ata_dma=1. See the next lines for details.

...
Additional routing options:.
Starting devd.
Mounting NFS file systems:.
Creating and/or trimming log files:.
Starting syslogd.
Checking for core dump on /dev/ad0b...
/libexec/ld-elf.so.1: /lib/libz.so.2: invalid file format
ELF ldconfig path: /lib /usr/lib /usr/lib/compat /usr/local/lib
Starting local daemons:.
...

sunshine# sysctl -a | grep ata_dma
hw.ata.ata_dma: 1
sunshine#

sunshine# dmesg
...
uart1: <16550 or compatible> at port 0x2e8-0x2ef irq 43 on isa0
Timecounters tick every 1.000 msec
ad0: 38166MB <Seagate ST340824A 3.28> at ata2-master UDMA66
Trying to mount root from ufs:/dev/ad0a
ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=256
ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=394976
ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=3635040
dc0: failed to force tx and rx to idle state
dc0: failed to force tx and rx to idle state
dc0: failed to force tx and rx to idle state
dc0: failed to force tx and rx to idle state
ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=224

Sebastian
Comment 4 Linh Pham 2005-07-28 03:25:22 UTC
I just wanted to report that I experienced the same problem with my Sun
Blade 100 system (using both the stock Sun hard drive and two
off-the-shelf drives, one Seagate 7200.7 ATA100 and an older WD 8GB)
running 5.4-RELEASE.

I'm guessing this impacts all Blade 100/150, Netra X1 and Fire V100
servers as they all seem to use the same chipset (give or take a little
bit).

Setting hw.ata.ata_dma to 0 does fix the problem with my Blade 100 (and
Blade 150, nearly if not identical motherboard but US-IIi instead of
US-IIe).

I haven't run into the same problem while running 5.4-RELEASE or
6.0-BETA on a Sun Ultra 10 using a Seagate Barracuda IV 40GB drive;
probably because it uses a different ATA controller (which isn't great
to begin with).

-- 
Linh Pham
question@closedsrc.org
http://closedsrc.org/
Comment 5 marius 2005-08-10 21:23:36 UTC
On Wed, Jun 15, 2005 at 09:14:32AM +0000, Sebastian Koehler wrote:
> 
> >Number:         82261
> >Category:       sparc64
> >Synopsis:       DMA-support on Sparc64 broken
> >Confidential:   no
> >Severity:       serious
> >Priority:       high
> >Responsible:    freebsd-sparc64
> >State:          closed
> >Quarter:        
> >Keywords:       
> >Date-Required:
> >Class:          sw-bug
> >Submitter-Id:   current-users
> >Arrival-Date:   Wed Jun 15 09:20:16 GMT 2005
> >Closed-Date:
> >Last-Modified:
> >Originator:     Sebastian Koehler
> >Release:        6.0-CURRENT-SNAP004
> >Organization:
> >Environment:
> FreeBSD  6.0-20050601-SNAP FreeBSD 6.0-20050601-SNAP #0: Thu Jun  2 05:29:17 UTC 2005     root@u60.samsco.home:/usr/obj/usr/src/sys/GENERIC  sparc64


Søren, could you please look into this? AFAIK you also have a
Sun Netra X1. Like a couple of other Sun models, these use an
onboard AcerLabs M5229 rev. 0xc3, and at least the 'TIMEOUT -
WRITE_DMA retrying' warnings have been reported for pretty
much all of them; they seem much less likely to occur with the
original Sun-supplied disks, though. On the other hand,
there are a few reports like <200508071916.50197.Chris@LainOS.org>
on freebsd-current@ and this PR that the ATA disks aren't
usable at all. The problems seem to have started some time
in the earlier 5.x days, but an exact date isn't known, and they
still persist after ATA mkIII. AFAICT the problems are also
limited to UDMA66 and don't happen when restricting to UDMA33.
Given that this also affects a couple of other models like
the AX1105, Blade 100, Fire V100, etc., and it's not possible
to plug in another controller on some of them, this unfortunately
is a show-stopper type of problem.
The AcerLabs M5229 rev. 0xc3 is also known to suffer from a
silicon bug that can cause data corruption, but this doesn't
seem to be the cause of the above-mentioned problems (the
workaround is to disable and re-enable the respective channel
via the IDE interface control register of the accompanying
ISA bridge on reset; see the audit trail of the PR for a
patch; the info is from OpenSolaris, and an equivalent patch
was also incorporated into NetBSD). The data corruption issue
was seen under FreeBSD before other issues
like the WRITE_DMA timeouts occurred; it only seems to happen
occasionally and not to cause permanent problems like the other
issues.
Thanks.
Comment 6 Marius Strobl freebsd_committer freebsd_triage 2005-08-10 21:43:51 UTC
Responsible Changed
From-To: freebsd-sparc64->sos


Over to sos.
Comment 7 Søren Schmidt freebsd_committer freebsd_triage 2005-08-15 12:31:39 UTC
State Changed
From-To: open->analyzed

Hmm, my (only) SUN is a Netra T1 which has the c3 step 5229 chip as well,
however it doesn't seem to have any problems at all using ATA66 disks, so
at least the problem is not "universal".
Anyhow, I seem to recall a SUN engineer once telling me that they did change
some HW to make it less "friendly" to non-SUN supplied disks, but I forgot
the details.
I guess the best fix would be to simply disallow modes beyond ATA33 on SUN
hardware, as it seems that would allow at least some DMA on those affected
systems, or did I get that wrong?
Comment 8 marius 2005-08-15 19:10:03 UTC
> Hmm, my (only) SUN is a Netra T1 which has the c3 step 5229 chip as well, 
> however it doesn't seem to have any problems at all using ATA66 disks, so 
> at least the problem is not "universal". 

By "ATA66 disks" you mean you also gave it a try with non-Sun supplied
disks in addition to the originally supplied one?

> Anyhow, I seem to recall a SUN engineer once telling me that they did change
> some HW to make it less "friendly" to non-SUN supplied disks, but I forgot
> the details.
> I guess the best fix would be to simply disallow modes beyond ATA33 on SUN
> hardware, as it seems that would allow at least some DMA on those affected
> systems, or did I get that wrong?

From my own experience and other reports I can say that the WRITE_DMA
timeouts (which in general don't seem to be fatal, they're just reported
over and over again) vanish when limiting to ATA33, but I don't know about
the particular problem of this PR (permanent data corruption). Sebastian,
could you please test what happens if you just limit to ATA33 instead
of disabling DMA completely? The simplest approach to achieve this
probably is to replace the 80-conductor cable with a 40-conductor one.
Disallowing modes beyond ATA33 on Sun hardware in general as a
"fix" for the WRITE_DMA timeouts (and maybe this PR) is pretty gross,
as there don't seem to be problems with ATA66 and up when putting PCI
ATA add-on cards into sparc64 machines (modulo stuff that needs the
firmware on these cards to be executed). So restricting the ATA33
limitation on sparc64 to the onboard M5229 rev. 0xc3 would be desirable.
Also, as most of the onboard M5229 seem to work just fine at ATA66 on
sparc64 when using Sun-supplied disks, modulo the occasional data
corruption due to the silicon bug, it would be nice to default to ATA33
with these controllers but make this limitation easily overridable
(or maybe not limit to ATA33 by default but throttle the DMA mode
in case a WRITE_DMA timeout occurs).

Marius
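The policy proposed above (limit only the onboard M5229 rev. 0xc3 on sparc64, keep the limit overridable, and optionally step down after a timeout) can be sketched as follows. This is only an illustration in Python, not ata(4) code: the function names, the `override` flag and the three-step mode ladder are assumptions made for the example, and the chip ID is the value pciconf reports for this controller.

```python
# Illustrative policy sketch only; not taken from the ata(4) driver.
ALI_M5229 = 0x522910b9                 # chip= value reported by pciconf
MODES = ["PIO4", "UDMA33", "UDMA66"]   # simplified ladder, slowest first

def initial_mode(chip, rev, arch, wanted="UDMA66", override=False):
    """Pick the starting mode, capping the affected controller at UDMA33."""
    affected = arch == "sparc64" and chip == ALI_M5229 and rev == 0xc3
    too_fast = MODES.index(wanted) > MODES.index("UDMA33")
    if affected and too_fast and not override:
        return "UDMA33"
    return wanted

def throttle(mode):
    """Step down one mode after a WRITE_DMA timeout (hypothetical hook)."""
    return MODES[max(MODES.index(mode) - 1, 0)]

print(initial_mode(ALI_M5229, 0xc3, "sparc64"))
print(initial_mode(ALI_M5229, 0xc3, "sparc64", override=True))
print(throttle("UDMA66"))
```

Keeping the cap keyed on chip ID, revision and architecture leaves add-on PCI ATA cards in sparc64 machines unaffected, which is the concern raised above.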
Comment 9 Sebastian Koehler 2005-08-16 14:24:19 UTC
It is exactly as you described, Marius. When I use an old 40-conductor IDE
cable, the system works fine. Please see dmesg for details:

uhub0: 2 ports with 2 removable, self powered
atapci0: <AcerLabs M5229 UDMA66 controller> port 0x10200-0x10207,0x10218-0x1021b,0x10210-0x10217,0x10208-0x1020b,0x10220-0x1022f at device 13.0 on pci0
ata2: <ATA channel 0> on atapci0
ata3: <ATA channel 1> on atapci0
nexus0: <syscons>, type (unknown) (no driver attached)
rtc0: <Real Time Clock> at port 0x70-0x71 pnpid @@Kd041 on isa0
uart0: <16550 or compatible> at port 0x3f8-0x3ff irq 43 pnpid @HEd041 on isa0
uart0: console (9600,n,8,1)
uart1: <16550 or compatible> at port 0x2e8-0x2ef irq 43 pnpid @HEd041 on isa0
Timecounters tick every 1.000 msec
ad0: DMA limited to UDMA33, controller found non-ATA66 cable
ad0: 38166MB <Seagate ST340824A 3.28> at ata2-master UDMA33
Trying to mount root from ufs:/dev/ad0a
dc0: failed to force tx and rx to idle state
dc0: failed to force tx and rx to idle state
dc0: failed to force tx and rx to idle state
dc0: failed to force tx and rx to idle state
dc1: link state changed to UP

At this point the system is able to access the disk via UDMA33 and no data gets
lost. I just wonder why Solaris has no problem accessing the disk
via UDMA66. I hope you can send me a patch for testing purposes or commit
it into HEAD.

Best Regards,
Sebastian
Comment 10 Sebastian Koehler 2005-08-16 17:58:13 UTC
Hope you can find something in the output.

# pciconf -l -v
isab0@pci0:7:0: class=0x060100 card=0x153310b9 chip=0x153310b9 rev=0x00 hdr=0x00
    vendor   = 'Acer Labs Incorporated (ALi)'
    device   = 'ALI M1533 Aladdin IV ISA Bridge'
    class    = bridge
    subclass = PCI-ISA
none0@pci0:3:0: class=0x000000 card=0x00000000 chip=0x710110b9 rev=0x00 hdr=0x00
    vendor   = 'Acer Labs Incorporated (ALi)'
    device   = 'ALI M7101 Power Management Controller'
    class    = old
    subclass = non-VGA display device
none1@pci0:3:0: class=0x000000 card=0x00000000 chip=0x710110b9 rev=0x00 hdr=0x00
    vendor   = 'Acer Labs Incorporated (ALi)'
    device   = 'ALI M7101 Power Management Controller'
    class    = old
    subclass = non-VGA display device
dc0@pci0:12:0:  class=0x020000 card=0x00000000 chip=0x91021282 rev=0x31 hdr=0x00
    vendor   = 'Davicom Semiconductor Inc.'
    device   = 'DM9102/A/AF Dell 4300S - CNET Pro200WL Ethernet Adapter'
    class    = network
    subclass = ethernet
dc1@pci0:5:0:   class=0x020000 card=0x00000000 chip=0x91021282 rev=0x31 hdr=0x00
    vendor   = 'Davicom Semiconductor Inc.'
    device   = 'DM9102/A/AF Dell 4300S - CNET Pro200WL Ethernet Adapter'
    class    = network
    subclass = ethernet
ohci0@pci0:10:0:        class=0x0c0310 card=0x00000000 chip=0x523710b9 rev=0x03 hdr=0x00
    vendor   = 'Acer Labs Incorporated (ALi)'
    device   = 'M5237 OpenHCI 1.1 USB Controller'
    class    = serial bus
    subclass = USB
atapci0@pci0:13:0:      class=0x0101ff card=0x00000000 chip=0x522910b9 rev=0xc3 hdr=0x00
    vendor   = 'Acer Labs Incorporated (ALi)'
    device   = 'M1543 Southbridge EIDE Controller'
    class    = mass storage
    subclass = ATA

# pciconf -r -b atapci0@pci0:13:0: 0:255
b9 10 29 52 05 00 90 02  c3 ff 01 01 00 10 00 00
01 02 01 00 19 02 01 00  11 02 01 00 09 02 01 00
21 02 01 00 00 00 00 00  00 00 00 00 00 00 00 00
00 00 00 00 60 00 00 00  00 00 00 00 0c 01 02 04
06 00 00 7f 00 00 00 00  00 02 00 c9 00 80 ba 1a
03 00 00 81 50 55 44 44  01 00 31 00 03 00 00 00
01 00 02 00 00 00 00 00  00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00  21 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
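The first row of the raw dump can be cross-checked against the pciconf -l -v listing for atapci0. The sketch below assumes the standard PCI configuration header layout (little-endian vendor ID at offset 0x00, device ID at 0x02, revision ID at 0x08); the helper name decode_header is made up for this example.

```python
# First 16 bytes of the atapci0 config space, copied from the dump above.
ROW0 = "b9 10 29 52 05 00 90 02  c3 ff 01 01 00 10 00 00"

def decode_header(row):
    """Return (vendor, device, revision) from the start of a PCI header."""
    raw = bytes.fromhex(row.replace(" ", ""))
    vendor = int.from_bytes(raw[0:2], "little")   # offset 0x00
    device = int.from_bytes(raw[2:4], "little")   # offset 0x02
    revision = raw[8]                             # offset 0x08
    return vendor, device, revision

vendor, device, revision = decode_header(ROW0)
chip = (device << 16) | vendor   # same packing as pciconf's chip= field
print(f"vendor=0x{vendor:04x} device=0x{device:04x} "
      f"rev=0x{revision:02x} chip=0x{chip:08x}")
```

The decoded values agree with chip=0x522910b9 rev=0xc3 in the listing, so the dump does belong to the M5229 discussed in this PR.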
Comment 11 Sebastian Koehler 2005-08-16 20:10:14 UTC
Here is the output of the registers at different modes.

PIO4:
b9 10 29 52 05 00 90 02  c3 ff 01 01 00 10 00 00
01 02 01 00 19 02 01 00  11 02 01 00 09 02 01 00
21 02 01 00 00 00 00 00  00 00 00 00 00 00 00 00
00 00 00 00 60 00 00 00  00 00 00 00 0c 01 02 04
06 00 00 7f 00 00 00 00  00 02 01 c9 00 80 ba 1a
03 00 00 81 50 55 44 44  01 00 31 00 03 00 00 00
01 00 02 00 00 00 00 00  00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00  21 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00

UDMA33:
b9 10 29 52 05 00 90 02  c3 ff 01 01 00 10 00 00
01 02 01 00 19 02 01 00  11 02 01 00 09 02 01 00
21 02 01 00 00 00 00 00  00 00 00 00 00 00 00 00
00 00 00 00 60 00 00 00  00 00 00 00 0c 01 02 04
06 00 00 7f 00 00 00 00  00 02 01 c9 00 80 ba 1a
03 00 00 81 55 55 4a 44  01 00 31 00 03 00 00 00
01 00 02 00 00 00 00 00  00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00  21 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00

UDMA66:
b9 10 29 52 05 00 90 02  c3 ff 01 01 00 10 00 00
01 02 01 00 19 02 01 00  11 02 01 00 09 02 01 00
21 02 01 00 00 00 00 00  00 00 00 00 00 00 00 00
00 00 00 00 60 00 00 00  00 00 00 00 0c 01 02 04
06 00 00 7f 00 00 00 00  00 02 00 c9 00 80 ba 1a
03 00 00 81 55 55 48 44  01 00 31 00 03 00 00 00
01 00 02 00 00 00 00 00  00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00  21 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
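To see at a glance which registers change between the three modes, the dumps can be compared byte by byte. The sketch below is a convenience for readers, not part of the original report: it copies only the rows at offsets 0x40 and 0x50, since every other row is identical in the three dumps above, and the helper names are made up.

```python
# Rows 0x40 and 0x50 of each dump; all other rows are identical above.
DUMPS = {
    "PIO4":   ["06 00 00 7f 00 00 00 00  00 02 01 c9 00 80 ba 1a",
               "03 00 00 81 50 55 44 44  01 00 31 00 03 00 00 00"],
    "UDMA33": ["06 00 00 7f 00 00 00 00  00 02 01 c9 00 80 ba 1a",
               "03 00 00 81 55 55 4a 44  01 00 31 00 03 00 00 00"],
    "UDMA66": ["06 00 00 7f 00 00 00 00  00 02 00 c9 00 80 ba 1a",
               "03 00 00 81 55 55 48 44  01 00 31 00 03 00 00 00"],
}
BASE = 0x40  # config-space offset of the first copied row

def parse(rows):
    """Turn the hex rows into one bytes object."""
    return bytes.fromhex("".join(rows).replace(" ", ""))

regs = {mode: parse(rows) for mode, rows in DUMPS.items()}

def differing_offsets(regs):
    """Offsets at which at least two modes hold different values."""
    modes = list(regs)
    length = len(regs[modes[0]])
    return [BASE + i for i in range(length)
            if len({regs[m][i] for m in modes}) > 1]

diffs = differing_offsets(regs)
for off in diffs:
    values = ", ".join(f"{m}=0x{regs[m][off - BASE]:02x}" for m in regs)
    print(f"0x{off:02x}: {values}")
```

Only three bytes differ between the modes. What each of those registers controls is not stated in this thread and would have to be checked against the Acer documentation.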
Comment 12 Søren Schmidt freebsd_committer freebsd_triage 2005-08-16 21:14:05 UTC
On 16/08/2005, at 21:10, Sebastian Koehler wrote:

> Here is the output of the registers at different modes.

(dumps deleted)

Looks just as they should and exactly like what I see on my (working)
Netra T1, which is what I expected, but better safe....
I can only assume that SUN did something "smart" to the HW that you
need to know about to make things work. Someone with access to the
failing HW and the proper measurement / testing equipment needs to dig
in there and figure out how/what they changed.
We can lobotomize ATA to only use at max UDMA33 on this Acer chip on
the sparc platform as a workaround, ugly but functional :)

- Søren
Comment 13 Sebastian Koehler 2005-08-19 22:32:20 UTC
I've noticed your changes committed to HEAD. I've rebuilt world and
kernel with sources from today.
Your workaround did not resolve the issue completely but changed it
slightly. When the 80-conductor cable is used, timeouts happen only from time
to time, but disk access results in error messages. Some samples are attached
for you.

1st sample:
GDB: no debug ports present
KDB: debugger backends: ddb
KDB: current backend: ddb
Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD 7.0-CURRENT #5: Fri Aug 19 16:14:20 CEST 2005
    root@:/usr/obj/usr/src/sys/GENERIC
WARNING: WITNESS option enabled, expect reduced performance.
Timecounter "tick" frequency 500000000 Hz quality 1000
real memory  = 536870912 (512 MB)
avail memory = 505454592 (482 MB)
cpu0: Sun Microsystems UltraSparc-IIe Processor (500.00 MHz CPU)
nexus0: <Open Firmware Nexus device>
pcib0: <U2P UPA-PCI bridge> on nexus0
pcib0: Sabre, impl 0, version 0, ign 0x7c0, bus A
pcib0: [FAST]
pcib0: [GIANT-LOCKED]
pcib0: [FAST]
pcib0: [GIANT-LOCKED]
pcib0 dvma: DVMA map: 0x60000000 to 0x63ffffff
pci0: <OFW PCI bus> on pcib0
isab0: <PCI-ISA bridge> at device 7.0 on pci0
isa0: <ISA bus> on isab0
pci0: <old, non-VGA display device> at device 3.0 (no driver attached)
pci0: <old, non-VGA display device> at device 3.0 (no driver attached)
dc0: <Davicom DM9102A 10/100BaseTX> port 0x10000-0x100ff at device 12.0 on pci0
miibus0: <MII bus> on dc0
ukphy0: <Generic IEEE 802.3u media interface> on miibus0
ukphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
dc0: Ethernet address: 00:03:ba:0f:22:55
dc1: <Davicom DM9102A 10/100BaseTX> port 0x10100-0x101ff mem 0x2000-0x20ff at device 5.0 on pci0
miibus1: <MII bus> on dc1
ukphy1: <Generic IEEE 802.3u media interface> on miibus1
ukphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
dc1: Ethernet address: 00:03:ba:0f:22:55
ohci0: <AcerLabs M5237 (Aladdin-V) USB controller> mem 0x1000000-0x1000fff at device 10.0 on pci0
ohci0: [GIANT-LOCKED]
usb0: OHCI version 1.0, legacy support
usb0: <AcerLabs M5237 (Aladdin-V) USB controller> on ohci0
usb0: USB revision 1.0
uhub0: AcerLabs OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
atapci0: <AcerLabs M5229 UDMA66 controller> port 0x10200-0x10207,0x10218-0x1021b,0x10210-0x10217,0x10208-0x1020b,0x10220-0x1022f at device 13.0 on pci0
atapci0: using PIO transfers above 137GB as workaround for 48bit DMA access bug, expect reduced performance
ata2: <ATA channel 0> on atapci0
ata3: <ATA channel 1> on atapci0
nexus0: <syscons>, type (unknown) (no driver attached)
rtc0: <Real Time Clock> at port 0x70-0x71 pnpid @@Kd041 on isa0
uart0: <16550 or compatible> at port 0x3f8-0x3ff irq 43 pnpid @HEd041 on isa0
uart0: console (9600,n,8,1)
uart1: <16550 or compatible> at port 0x2e8-0x2ef irq 43 pnpid @HEd041 on isa0
Timecounters tick every 1.000 msec
ad0: 38166MB <Seagate ST340824A 3.28> at ata2-master UDMA66
Trying to mount root from ufs:/dev/ad0a
Aug 20 01:17:33 init: can't exec /bin/sh for /etc/rc: Exec format error

2nd sample:
Timecounters tick every 1.000 msec
ad0: 38166MB <Seagate ST340824A 3.28> at ata2-master UDMA66
Trying to mount root from ufs:/dev/ad0a
...
/dev/ad0d: clean, 96562 free (154 frags, 12051 blocks, 0.1% fragmentation)
ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=384
Setting hostname: sunshine.thrillkill.lan.
dc0: failed to force tx and rx to idle state
...
Creating and/or trimming log files:.
Starting syslogd.
ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=2839808
add net default: gateway 192.168.25.254
...
Starting sshd.
g_vfs_done():ad0a[READ(offset=8070450532280647680, length=16384)]error = 5
vnode_pager_getpages: I/O read error
vm_fault: pager read error, pid 358 (sshd)
ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=47264
Segmentation fault (core dumped)
g_vfs_done():ad0a[READ(offset=8070450532280647680, length=16384)]error = 5
vnode_pager_getpages: I/O read error
vm_fault: pager read error, pid 363 (sendmail)
Segmentation fault

Best Regards,
Sebastian
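The read offset in the g_vfs_done() messages of the second sample can be sanity-checked against the disk size from dmesg. This is a quick arithmetic sketch; it assumes that the "38166MB" reported for ad0 means units of 2**20 bytes, and the variable names are chosen for the example.

```python
DISK_MB = 38166                 # size reported for ad0 in dmesg
offset = 8070450532280647680    # from the g_vfs_done() message
length = 16384                  # request length from the same message

disk_bytes = DISK_MB * 1024 * 1024   # assumption: MB means 2**20 bytes
beyond_disk = offset + length > disk_bytes
aligned = offset % length == 0

print("offset in hex:", hex(offset))
print("disk size in bytes:", disk_bytes)
print("request lies beyond end of disk:", beyond_disk)
print("offset is a multiple of the request length:", aligned)
```

The requested offset lies far beyond the end of the roughly 40 GB disk, so the read could never succeed. That is consistent with corrupted data being handed back by earlier transfers rather than with a failing medium, although the thread does not confirm where the bad value came from.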
Comment 14 marius 2005-08-20 00:59:06 UTC
On Tue, Aug 16, 2005 at 03:24:19PM +0200, Sebastian Koehler wrote:
> 
> At this point system is able to access disks via UDMA33 and no data get
> lost. I'm just wondering that Solaris has no problem accessing the disk
> via UDMA66. Hope you can send me a patch for testing purposes or commit
> it into HEAD.
> 

Well, looks like I found the cause of the endless "TIMEOUT - WRITE_DMA
retrying" messages I'm seeing at ATA66. AFAICT the firmware initializes
the M5229 to use the ATA66 byte counter (probably also in the higher
modes on PATA chips newer than 0xc3) instead of triggering an interrupt
at the zero count of the transfer buffer counter. That is not what we
want in ata(4), as it doesn't use the ATA66 byte counter. This involves
some guesswork as I don't have datasheets for these chips to back this
up. Soeren, do you have them and can you confirm this?
With the attached patch (against latest -current) I can no longer
provoke the mentioned messages. Sebastian, could you please check
whether it also solves your problems at ATA66?

Marius
Comment 15 Sebastian Koehler 2005-08-20 11:47:36 UTC
With the patch, the error messages and the data corruption are gone in UDMA66
mode. I think you've got the right idea, even without whitepapers. :)

Best Regards,
Sebastian
Comment 16 marius 2005-08-20 12:46:43 UTC
On Sat, Aug 20, 2005 at 12:47:36PM +0200, Sebastian Koehler wrote:
> With the patch error messages and the data corruption are gone in UDMA66
> mode. I think you've got the right idea, even though without whitepapers. :)
> 

Excellent. Soeren, could you please approve this patch or in case you
don't like something about the changes (style, comments, ...) modify
them accordingly and commit them yourself?

Marius
Comment 17 Søren Schmidt freebsd_committer freebsd_triage 2005-08-20 15:34:54 UTC
On 20/08/2005, at 13:46, Marius Strobl wrote:

> On Sat, Aug 20, 2005 at 12:47:36PM +0200, Sebastian Koehler wrote:
>
>> With the patch error messages and the data corruption are gone in
>> UDMA66
>> mode. I think you've got the right idea, even though without
>> whitepapers. :)
>>
>>
>
> Excellent. Soeren, could you please approve this patch or in case you
> don't like something about the changes (style, comments, ...) modify
> them accordingly and commit them yourself?

I'll look at it, but I'm busy this weekend. I have docs for the chip
so....

I'd like you guys to test it *without* the reset hack, as that should
not be needed the way we do things...

- Søren
Comment 18 Sebastian Koehler 2005-08-23 05:13:32 UTC
> Yes, the above snippet was from my original version of the patch
> (the first in the audit trail in the PR) where ata_ali_reset() was
> hooked up for all flavours of Acer chips. For testing your previous
> version I just changed the chiprev to be checked for <= 0xc3 which
> then again would only apply to 0xc2 and 0xc3 as ata_ali_reset()
> is only called for ALINEW in your version...
> Anyway, your new version works fine here, thanks!

Patch is also working for me and my Netra X1. Thank you!

Best Regards,
Comment 19 Søren Schmidt freebsd_committer freebsd_triage 2005-08-29 12:28:13 UTC
State Changed
From-To: analyzed->closed

Fix has been committed based on the code here and Acer docs.