Bug 75317

Summary: [busdma] [patch] ATA DMA broken on PCalpha
Product: Base System Reporter: Sven Petai <hadara>
Component: alpha Assignee: freebsd-alpha (Nobody) <alpha>
Status: Closed FIXED    
Severity: Affects Only Me CC: sos
Priority: Normal    
Version: 6.0-CURRENT   
Hardware: Any   
OS: Any   

Description Sven Petai 2004-12-20 16:40:12 UTC
After mounting root, the machine fails to boot with varying symptoms: sometimes
it hangs, sometimes it takes a machine check, etc. I traced it down to the introduction
of version 1.129 of src/sys/dev/ata/ata-dma.c, which among other changes removes the code
that split segments into page-sized chunks to avoid running into some kind
of bug in busdma. That code was commented as:
"A maximum segment size was specified for bus_dma_tag_create, but
 some busdma code does not seem to honor this, so fix up if needed."

System: FreeBSD alpha 6.0-CURRENT FreeBSD 6.0-CURRENT #23: Mon Dec 20 05:07:30 EET 2004     root@alpha:/mnt/disk/obj/usr/src/sys/HADARA  alpha
The dmesg can be found at http://bsd.ee/~hadara/debug/pcalpha/kernel.txt
but the relevant details should be:
Digital AlphaPC 164LX 533 MHz, 531MHz
8192 byte page size, 1 processor.
...
ad0: 1226MB <FUJITSU M1636TAU/5045> [2491/16/63] at ata0-master WDMA2
ad1: 3052MB <SAMSUNG SV0322A/JK200-36> [11024/9/63] at ata0-slave WDMA2
...
This is an IDE-only machine.

Fix: An easy but ugly workaround is to disable ATA DMA from the loader with
set hw.ata.ata_dma=0
A slightly better workaround is the following patch, which just reverts to
the previous behaviour:



--- sys/dev/ata/ata-dma.c.orig  Mon Dec 20 08:27:25 2004
+++ sys/dev/ata/ata-dma.c       Mon Dec 20 08:47:15 2004
@@ -198,16 +198,22 @@
 {
     struct ata_dmasetprd_args *args = xsc;
     struct ata_dma_prdentry *prd = args->dmatab;
-    int i;
+    int i,j;
+    bus_size_t cnt;
+    u_int32_t lastcount;
 
     if ((args->error = error))
        return;
 
+    lastcount = j = 0;
     for (i = 0; i < nsegs; i++) {
-       prd[i].addr = htole32(segs[i].ds_addr);
-       prd[i].count = htole32(segs[i].ds_len);
+       for (cnt = 0; cnt < segs[i].ds_len; cnt += PAGE_SIZE, j++) {
+             prd[j].addr = htole32(segs[i].ds_addr + cnt);
+             lastcount = ulmin(segs[i].ds_len - cnt, PAGE_SIZE) & 0xffff;
+             prd[j].count = htole32(lastcount);
+        }
     }
-    prd[i - 1].count |= htole32(ATA_DMA_EOT);
+    prd[j - 1].count |= htole32(lastcount | ATA_DMA_EOT);
 }
 
 static int

Of course the real solution should be finding and fixing the bug in the busdma code.
How-To-Repeat: Just try to boot fbsd 5.3 release on similar hardware.
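
For readers without the kernel tree at hand, the page-sized splitting that the workaround patch in the description restores can be sketched in user space roughly as follows. This is only an illustration under stated assumptions: the sketch_* names and struct layouts are hypothetical stand-ins for bus_dma_segment_t and ata_dma_prdentry, the EOT value is assumed, and the htole32() byte swapping of the real driver is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 8192u        /* Alpha page size, as in the dmesg excerpt */
#define SKETCH_DMA_EOT   0x80000000u  /* stand-in for ATA_DMA_EOT; value assumed */

/* Hypothetical stand-ins for bus_dma_segment_t and ata_dma_prdentry. */
struct sketch_seg { uint32_t addr; uint32_t len; };
struct sketch_prd { uint32_t addr; uint32_t count; };

/*
 * Split every segment into chunks of at most SKETCH_PAGE_SIZE bytes and
 * write one PRD entry per chunk.  Returns the number of entries written,
 * or 0 if the table is too small.  The last entry is tagged with EOT.
 */
static size_t
sketch_fill_prd(const struct sketch_seg *segs, size_t nsegs,
    struct sketch_prd *prd, size_t maxprd)
{
    size_t i, j = 0;
    uint32_t off, chunk;

    assert(segs != NULL && prd != NULL);
    for (i = 0; i < nsegs; i++) {
        for (off = 0; off < segs[i].len; off += chunk) {
            if (j == maxprd)
                return 0;
            /* Remaining bytes in this segment, capped at one page. */
            chunk = segs[i].len - off;
            if (chunk > SKETCH_PAGE_SIZE)
                chunk = SKETCH_PAGE_SIZE;
            prd[j].addr = segs[i].addr + off;
            prd[j].count = chunk;
            j++;
        }
    }
    if (j > 0)
        prd[j - 1].count |= SKETCH_DMA_EOT;
    return j;
}
```

Note that this splits by length from the start of each segment, so it only keeps entries within a page when segments start page-aligned.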
Comment 1 Sven Petai 2004-12-22 17:24:16 UTC
I managed to put a wrong link in the original
bug report. The correct dmesg -v output is located at
http://bsd.ee/~hadara/debug/pcalpha/pcalpha_panic_08.11.2004.txt


I think I now understand what causes this bug.
First of all, there seems to be a slight bug in Alpha's
bus_dmamap_load(), which could be fixed with the following
patch:

--- sys/alpha/alpha/busdma_machdep.c.orig       Wed Dec 22 17:11:27 2004
+++ sys/alpha/alpha/busdma_machdep.c    Wed Dec 22 17:13:57 2004
@@ -581,7 +581,8 @@
                if (sg->ds_len == 0) {
                        sg->ds_addr = paddr + alpha_XXX_dmamap_or;
                        sg->ds_len = size;
-               } else if (paddr == nextpaddr) {
+               } else if (paddr == nextpaddr &&
+            (sg->ds_len + size) <= dmat->maxsegsz) {
                        sg->ds_len += size;
                } else {
                        /* Go to the next segment */


Without that, we really could return larger segments than
the maxsegsz specified to the bus_dma_tag_create function.
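
The effect of that extra condition can be illustrated with a small user-space sketch of the coalescing loop. It is not the kernel code: co_seg and co_load are hypothetical names, and the sketch assumes every input chunk is non-empty and no larger than maxsegsz.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct co_seg { uint64_t addr; uint64_t len; };  /* hypothetical segment type */

/*
 * Build segments from n physical chunks (paddr[k], size[k]).  A chunk is
 * merged into the current segment only if it is physically contiguous
 * with it AND the merged length stays within maxsegsz.
 * Returns the number of segments used; 0 means no input or that more
 * than maxsegs segments would be needed.
 */
static size_t
co_load(const uint64_t *paddr, const uint64_t *size, size_t n,
    uint64_t maxsegsz, struct co_seg *segs, size_t maxsegs)
{
    size_t k, cur = 0;
    uint64_t nextpaddr = 0;

    assert(maxsegsz > 0 && maxsegs > 0);
    if (n == 0)
        return 0;
    segs[0].addr = 0;
    segs[0].len = 0;
    for (k = 0; k < n; k++) {
        if (segs[cur].len == 0) {
            /* First chunk starts the first segment. */
            segs[cur].addr = paddr[k];
            segs[cur].len = size[k];
        } else if (paddr[k] == nextpaddr &&
            segs[cur].len + size[k] <= maxsegsz) {
            /* Contiguous and still within the size limit: coalesce. */
            segs[cur].len += size[k];
        } else {
            /* Go to the next segment. */
            if (++cur == maxsegs)
                return 0;
            segs[cur].addr = paddr[k];
            segs[cur].len = size[k];
        }
        nextpaddr = paddr[k] + size[k];
    }
    return cur + 1;
}
```

With three contiguous 8k pages and maxsegsz set to 8k, this yields three segments; dropping the size check from the else-if would yield a single 24k segment instead.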

Anyway, this still doesn't make things work correctly for me,
because the real problem seems to be the Pyxis
page-crossing bug. Basically, it comes down to corrupted
DMA transfers larger than 8k. It didn't cause problems before,
since we never did transfers larger than PAGE_SIZE before
the ATA DMA change mentioned in the original report.
There is detection code for the buggy chip in
src/sys/alpha/pci/cia.c,
but it's a little too naive, since it assumes only DEC_ST550 can
have it. In reality the chip seems to be used in some very early
revisions of the 164LX (SX too?). There doesn't seem to be a
reliable way to detect whether we have the faulty chip, since
it was worked around in later revisions by making changes
elsewhere.

One possible easy solution would be to hack ata_dmaalloc() to
use PAGE_SIZE as the max segment size argument to the
bus_dma_tag_create function if the machine has a Pyxis chip at all,
no matter whether it's faulty or not.
Would that be acceptable, and if so, what is the best way to propagate
knowledge about the existence of this chip from the cia driver to ATA?
Comment 2 Nathan Whitehorn 2006-03-07 22:24:43 UTC
This occurs on my 164SX running 6.1-PRERELEASE and can be reproduced on 
6.0-RELEASE and 5.4-STABLE as well. None of the above patches fix the 
problem on 6.0. With ATA drives in the system, and DMA on, machine 
checks occur every few hours at best, every few minutes at worst. With 
the same drives attached by firewire, I haven't had a single machine 
check ever.
Comment 3 Remko Lodder freebsd_committer freebsd_triage 2007-03-26 21:36:44 UTC
State Changed
From-To: open->closed

alpha is no longer supported and time will not be invested to get things 
fixed. Our apologies for our lack of attention for this issue.
Comment 4 Remko Lodder freebsd_committer freebsd_triage 2007-03-27 06:28:25 UTC
State Changed
From-To: closed->open

People are actively working on this, reopen the ticket (Thanks John, again my apologies)
Comment 5 dfilter service freebsd_committer freebsd_triage 2007-11-27 17:43:56 UTC
jhb         2007-11-27 17:43:50 UTC

  FreeBSD src repository

  Modified files:        (Branch: RELENG_6)
    sys/alpha/alpha      busdma_machdep.c 
  Log:
  Cleanup the alpha bus dma code a bit and sync it up with i386.  Changes
  include:
  - Honor alignment and boundary restrictions on DMA tags by using bounce
    pages for misaligned buffers and not coalescing pages if the resulting
    segment would cross a boundary.
  - Teach the _bus_dmamap_load_buffer() helper function to use bounce pages
    when needed and change bus_dmamap_load() to use the helper function
    instead of largely duplicating it.  As a side effect, this enables bounce
    page support for the other load routines (load_mbuf(), load_mbuf_sg(),
    and load_uio()).
  
  Honoring the boundary restrictions partially helps with the Alpha ATA DMA
  problem.  More work is needed for that however (and forthcoming).
  
  PR:             alpha/75317
  Tested by:      wilko
  Approved by:    re (kensmith)
  
  Revision  Changes    Path
  1.51.2.2  +155 -158  src/sys/alpha/alpha/busdma_machdep.c
_______________________________________________
cvs-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/cvs-all
To unsubscribe, send any mail to "cvs-all-unsubscribe@freebsd.org"
Comment 6 dfilter service freebsd_committer freebsd_triage 2007-12-10 20:14:23 UTC
jhb         2007-12-10 20:14:16 UTC

  FreeBSD src repository

  Modified files:        (Branch: RELENG_6)
    sys/alpha/alpha      busdma_machdep.c 
    sys/alpha/include    md_var.h 
    sys/alpha/pci        cia.c 
  Log:
  - Add a workaround for the DMA bugs on some alpha chipsets that ATA DMA
    trips over often.  Specifically, in these chipsets DMA transfers that
    cross a page boundary result in data corruption.  The workaround is to
    not allow any DMA transfers for non-static DMA maps (i.e. "real"
    transfers as opposed to work areas allocated with bus_dmamem_alloc()) to
    cross a page in a single S/G element.  This behavior is enabled by
    setting 'busdma_pyxis_bug' to 1.
  - Add a new tunable 'machdep.busdma_pyxis_bug' that can be used to enable
    the workaround from the loader.  This can be used to enable it on
    chipsets where we don't automatically enable it.
  - Auto-enable the workaround for buggy PYXIS 1 chipsets supported via
    cia(4).
  
  PR:             alpha/75317
  
  Revision   Changes    Path
  1.51.2.3   +23 -6     src/sys/alpha/alpha/busdma_machdep.c
  1.23.10.1  +1 -0      src/sys/alpha/include/md_var.h
  1.44.2.1   +1 -0      src/sys/alpha/pci/cia.c
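
The rule described in the commit log above (no single S/G element may cross a page) can be sketched in user space as follows. This is an illustration only, not the code committed to busdma_machdep.c; px_seg and px_split are hypothetical names, and the page size is assumed to be the 8192 bytes reported in the original dmesg.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PX_PAGE_SIZE 8192u  /* assumed Alpha page size; must be a power of two */

struct px_seg { uint64_t addr; uint64_t len; };  /* hypothetical S/G element */

/*
 * Emit S/G elements for [addr, addr + len) so that no element crosses a
 * page boundary.  Returns the number of elements, or 0 if out is too small.
 */
static size_t
px_split(uint64_t addr, uint64_t len, struct px_seg *out, size_t maxout)
{
    size_t n = 0;

    assert(out != NULL);
    while (len > 0) {
        /* Bytes left until the end of the page that contains addr. */
        uint64_t room = PX_PAGE_SIZE - (addr & (PX_PAGE_SIZE - 1));
        uint64_t chunk = len < room ? len : room;

        if (n == maxout)
            return 0;
        out[n].addr = addr;
        out[n].len = chunk;
        n++;
        addr += chunk;
        len -= chunk;
    }
    return n;
}
```

Unlike a split that only caps each element at PAGE_SIZE bytes, this cuts at the page boundary itself, so a page-sized buffer that starts in the middle of a page becomes two elements.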
Comment 7 John Baldwin freebsd_committer freebsd_triage 2008-02-11 19:16:28 UTC
State Changed
From-To: open->closed

This is believed to be fixed in 6.3 based on at least one positive user 
report.