Bug 24092

Summary: Disk data corruption using FreeBSD_4_2_0_RELEASE
Product: Base System
Reporter: Martin Birgmeier <Martin.Birgmeier>
Component: kern
Assignee: freebsd-bugs (Nobody) <bugs>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: 4.2-RELEASE   
Hardware: Any   
OS: Any   

Description Martin Birgmeier 2001-01-05 20:10:01 UTC
See below. When doing a cmp -x on the files just copied, it turns
out that blocks of size 2 ** n, with n between 6 and 12 inclusive,
are corrupt (sometimes more than one such block in the same
file).
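
To measure the corrupt block sizes instead of reading them off the
cmp output by eye, something like the following could be used. This is
only a sketch: the function name diff_runs is made up for illustration,
and it assumes that cmp -l prints one line per differing byte, starting
with the 1-based decimal byte offset. Bytes inside a corrupt block that
happen to match the original split a block into several shorter runs,
so the reported lengths are lower bounds.

```shell
#! /bin/sh

# Hypothetical helper: print "offset length" for every contiguous run
# of differing bytes between the two files given as arguments.
diff_runs() {
        cmp -l "$1" "$2" 2>/dev/null | awk '
                # Each input line: <1-based offset> <octal byte> <octal byte>
                # A gap in the offsets closes the current run.
                NR > 1 && $1 != prev + 1 { print start, len; len = 0 }
                # The first differing byte of a run records its start.
                len == 0 { start = $1 }
                { len++; prev = $1 }
                END { if (len > 0) print start, len }
        '
}
```

Called as `diff_runs /d/5s4g/fileX /d/6s4e/file1`, it prints one line per
run; lengths such as 64, 128, ..., 4096 would match the pattern above.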

Fortunately, mostly (but not only!) long files are affected.

Fix: 

Unknown.

However, I tried the following, without any improvements:

- In /sys/dev/ata/ata-all.c, made ata_umode() return -1 always. As a
  result, the disks used WDMA2.

- In /sys/dev/ata/ata-disk.c, ad_attach() (only one at a time of the
  following items):
  . fixed adp->transfersize at DEV_BSIZE
  . disabled write caching

With all this, I am pretty sure that the problem lies not with my
hardware, but within the vm/buffer subsystem and its interaction
with some other service, possibly malloc (corruptions seem to
always be powers of two in length, see above).

-- 
Martin Birgmeier

Vienna
Austria
How-To-Repeat: 
Use the following shell script. The file named by SRC holds the test data:
% ls -l /d/5s4g/fileX
-rw-r--r--  1 root  wheel  1083285504 Jan  2 14:24 .../fileX
%

----------------------------------------------------------------------
#! /bin/sh

SRC=/d/5s4g/fileX
DST=/d/6s4e/file

for i in 1 2 3 4 5 6 7 8
do
        echo "*** $i ***"
        dd if="$SRC" of="${DST}$i" bs=102400k || break
        for j in 1 2 3
        do
                cmp "$SRC" "${DST}$i" && break
        done
done
----------------------------------------------------------------------

What happens is that in about 50% of the cases, the compare does not
succeed (I once had a case where a compare failed on the first try,
but later succeeded; hence the triple comparison).
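
A compare that fails once and later succeeds suggests that reads can
return bad data as well, not only writes. A rough way to check this
separately from the copy is to checksum the same file several times and
see whether the results agree. The sketch below uses the POSIX cksum
utility; the function name stable_read and its return codes are made up
for illustration. Repeated reads may be served from the buffer cache
rather than the disk, so agreement here does not prove that the on-disk
data is intact.

```shell
#! /bin/sh

# Hypothetical helper: checksum FILE N times (default 3).
# Returns 0 if all reads agree, 1 if any read differs from the first,
# and 2 if the file cannot be read at all.
stable_read() {
        file=$1
        n=${2:-3}
        first=$(cksum < "$file") || return 2
        i=1
        while [ "$i" -lt "$n" ]
        do
                [ "$(cksum < "$file")" = "$first" ] || return 1
                i=$((i + 1))
        done
        return 0
}
```

For example, `stable_read /d/6s4e/file1 3 || echo "unstable read"` could
be run on a copy before comparing it against the source.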

This happens most often on large(r) files, which is exactly the reason
why I am using a file of about 1 GB for testing.

Notes: I tested copying a file of 800 MB under Win98 twice - no
problems.  In addition, I installed Suse Linux 7.0 on ad0s3, and
tested copying a file of about 400 MB four times using the above
shell script (as in `for i in 1 2 3 4'...). No problems. The file
sizes are somewhat smaller because I don't have much disk space
devoted to the other environments.
Comment 1 Martin Birgmeier 2001-01-06 20:05:24 UTC
Quite (or not so?) unbelievably, upgrading the motherboard's BIOS
seems to do the trick: From a7v1004c.zip to a7v1005a.zip.

I'll watch it some more and post a final note once everything
indeed seems to be working well again.

I guess that now some chipset registers are initialized `more
correctly'. Would be nice if FreeBSD could do that instead of
relying on the BIOS, but I understand the task involved.

-- 
Martin Birgmeier

Vienna
Austria
Comment 2 Martin Birgmeier 2001-01-20 20:00:31 UTC
This indeed seems to have been a BIOS problem. No more data
corruption since the BIOS update.

Someone please close this PR.

-- 
Martin Birgmeier

Vienna
Austria
Comment 3 dwmalone freebsd_committer freebsd_triage 2001-01-20 21:06:07 UTC
State Changed
From-To: open->closed

Closed at request of submitter.