Bug 77163

Summary: File cache gets corrupted, system randomly hangs, sometimes with disk corruption
Product: Base System
Reporter: yuri
Component: kern
Assignee: freebsd-bugs (Nobody) <bugs>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: 5.3-RELEASE   
Hardware: Any   
OS: Any   

Description yuri 2005-02-06 10:30:24 UTC
My system periodically hangs, and sometimes there is disk corruption after reboot.

While the system was running, I compared a number of identical files between the hard drive and their CD copies. The comparison fails randomly (roughly one file out of 100, in a set totaling 560MB).
Once a file is in the cache, it fails every time it is compared again. When it leaves the cache, it no longer fails, but some other file will. Sometimes the bad copy is the one from CD, sometimes the one from HD.

The differences I've spotted are runs of 1 to ~32 contiguous bytes.
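
For anyone trying to reproduce this kind of check, here is a minimal Python sketch of the comparison described above. It is not the tool used in this report; the function names diff_runs and compare_trees are hypothetical, and it assumes both trees share the same relative file layout and that each file fits in memory. It reports each mismatch as an (offset, length) run, so that short runs like the 1 to ~32 byte ones mentioned here are easy to spot.

```python
from pathlib import Path


def diff_runs(a: bytes, b: bytes) -> list[tuple[int, int]]:
    """Return (offset, length) for each contiguous run of differing bytes."""
    runs = []
    start = None  # offset where the current differing run began, if any
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i - start))
            start = None
    common = min(len(a), len(b))
    if start is not None:
        # A run that was still open when the shorter input ended.
        runs.append((start, common - start))
    if len(a) != len(b):
        # A length mismatch is reported as its own trailing run.
        runs.append((common, abs(len(a) - len(b))))
    return runs


def compare_trees(root_a: Path, root_b: Path) -> dict[str, list[tuple[int, int]]]:
    """Compare every file under root_a with the same relative path under root_b."""
    mismatches = {}
    for file_a in sorted(root_a.rglob("*")):
        if not file_a.is_file():
            continue
        rel = file_a.relative_to(root_a)
        file_b = root_b / rel
        if not file_b.is_file():
            continue  # only files present in both trees are compared
        runs = diff_runs(file_a.read_bytes(), file_b.read_bytes())
        if runs:
            mismatches[str(rel)] = runs
    return mismatches
```

Calling compare_trees on the HD and CD mount points several times in a row and comparing the returned dictionaries would show whether the same file keeps failing while it stays cached, which is the pattern reported here.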

What's unusual about my system:
* I have a SATA RAID disk array (2 identical Maxtor 6Y120M0/YAR51HW0 disks, mirrored) (supported only since 5.3?)
* I run i386 on AMD64
* I have a recent NVidia card, but the problem happens even without its drivers installed.

To my mind, the fact that the copy read from the CD can also be the bad one indicates that it's not the HD hardware.
And it's not memory: I ran with each of the two 512MB memory modules separately, and it happens with both of them.

It looks like something in the kernel is doing a bad write to memory.

I know this is a tough one, but I am lost with this problem.

Fix: N/A
How-To-Repeat: N/A
Comment 1 Poul-Henning Kamp 2005-02-06 13:11:56 UTC
This sounds exactly like the stuff I fought for half a year.

My motherboard was an Arima/Rioworks HDAMA and something on that board
just didn't like Promise chips.  This was a bit of a problem as the
onboard SATA channels are Promise.

I've heard that recent BIOS updates should have fixed it, but I have
not been able to check it.

If you have an HDAMA motherboard and a BIOS upgrade does not fix it,
return the board and tell them that you have the "promise data corruption
problem" and want a board that works.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
Comment 2 dwmalone 2005-02-06 17:19:05 UTC
On Sun, Feb 06, 2005 at 10:25:35AM +0000, Yuri wrote:
> Difference in copy of file coming from CD to my mind is telling that it's not HD hardware.
> And it's not memory: I've ran each of two 512MB memory cards separately -- happens on both of them.
> 
> Looks like someone in kernel does a bad write in the memory.

We saw a problem like this once and it was the disk controller's fault;
sometimes it wouldn't finish the DMA of data into memory.

	David.
Comment 3 yuri 2005-02-06 20:26:39 UTC
>We saw a problem like this once and it was the disk controler fault,
>sometimes it wouldn't finish the DMA of data into memory.
>
>	David.
>  
>
I upgraded the BIOS and the problem seems to be gone.

Just for the record: the motherboard is an MSI MS-6702; the BIOS was v.1.0,
upgraded to v.2.0. The board has a Promise SATA controller by Marvell.

Thank you!
Yuri
Comment 4 Mark Linimon freebsd_committer freebsd_triage 2005-02-06 20:47:04 UTC
State Changed
From-To: open->closed

Submitter notes that the problem went away after a BIOS upgrade.
Comment 5 yuri 2005-02-07 06:43:05 UTC
Also for the record: the BIOS update also fixed a memory clock problem.
DDR400 memory was unable to run @ 400, only @ 300; after the upgrade
that problem is gone too.


Yuri