Bug 44006

Summary: Filesystem corruption with ata(4) software-raid on HPT370.
Product: Base System
Reporter: Paweł Małachowski <pawmal>
Component: kern
Assignee: freebsd-bugs (Nobody) <bugs>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: 4.7-RELEASE   
Hardware: Any   
OS: Any   

Description Paweł Małachowski 2002-10-13 14:00:04 UTC
I've created an ata(4) software-raid with an HPT370 ATA-RAID controller and two MAXTOR 6L040J2 (740X) 40GB hard drives. After that, fsck started reporting the following problems:

PARTIALLY ALLOCATED INODE I=7595634
UNEXPECTED SOFT UPDATE INCONSISTENCY

CLEAR? [yn] y

PARTIALLY ALLOCATED INODE I=7595638
UNEXPECTED SOFT UPDATE INCONSISTENCY

CLEAR? [yn] y

** Phase 2 - Check Pathnames
UNALLOCATED  I=7595634  OWNER=root MODE=0
SIZE=0 MTIME=Jan  1 01:00 1970
NAME=/chroot/cvsup/home/ncvs/ports/games/mangband/files

UNEXPECTED SOFT UPDATE INCONSISTENCY

REMOVE? [yn] y

** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
LINK COUNT DIR I=7595633  OWNER=9012 MODE=40775
SIZE=512 MTIME=Oct 13 13:49 2002  COUNT 3 SHOULD BE 2
ADJUST? [yn] y

UNREF FILE  I=7599504  OWNER=9012 MODE=100444
SIZE=2026 MTIME=Oct  2 06:50 2002

NO lost+found DIRECTORY
CREATE? [yn] y

UNREF FILE  I=7599505  OWNER=9012 MODE=100444
SIZE=1770 MTIME=Oct  2 06:50 2002
RECONNECT? [yn] y

UNREF FILE  I=7599506  OWNER=9012 MODE=100444
SIZE=1024 MTIME=Oct  2 06:50 2002
RECONNECT? [yn] y

UNREF FILE  I=7599507  OWNER=9012 MODE=100444
SIZE=1952 MTIME=Jan 16 16:27 2001
RECONNECT? [yn] y

** Phase 5 - Check Cyl groups
SUMMARY INFORMATION BAD
SALVAGE? [yn] y

BLK(S) MISSING IN BIT MAPS
SALVAGE? [yn] y

FREE BLK COUNT(S) WRONG IN SUPERBLK
SALVAGE? [yn] y

I removed the ata-raid (ar0), mounted the filesystems from the first HDD (ad4), ran fsck, stressed the disk, and everything was OK (no such problems). Then I mounted the filesystems from the second HDD (ad6), ran fsck, stressed it, and everything was OK, too. So, my hardware isn't faulty.
Then I copied one drive to the other (using dd or the HPT BIOS), created the ata-raid, and after a few hours my filesystems were in an inconsistent state again.
CPU cooling is OK, and the memory is well tested and reported to be good.
This looks to be related to write operations on ar0, the software ata-raid device.

Fix: 

Don't know.
How-To-Repeat: Create an ata-raid on identical hard disks using dd+atacontrol or the HPT BIOS. Run a complete `make buildworld' 2-4 times. After this heavy load the filesystem is corrupted and `make buildworld' fails.
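
The repeated-build step above could be automated along these lines. This is only a sketch, not part of the original report: the helper name run_until_failure is hypothetical, and the build directory /usr/src in the usage note is an assumption about where the source tree lives.

```python
import subprocess


def run_until_failure(cmd, max_runs, cwd=None):
    """Run cmd up to max_runs times.

    Returns the 1-based number of the first run that exited non-zero,
    or None if every run succeeded.
    """
    for attempt in range(1, max_runs + 1):
        result = subprocess.run(cmd, cwd=cwd)
        if result.returncode != 0:
            return attempt
    return None
```

For example, run_until_failure(["make", "buildworld"], 4, cwd="/usr/src") would stop at the first failing build and report which pass it was. A failing build is only a symptom; the filesystem still has to be checked with fsck afterwards.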
Comment 1 Søren Schmidt freebsd_committer freebsd_triage 2003-05-05 10:48:15 UTC
State Changed
From-To: open->closed

You cannot just dd one disk to another and then create a RAID on top of that.

You need to create the RAID *before* you create (disklabel newfs) your filesystems. 

What you encounter is problems because you fool the system into using a 
mirror where the two parts are almost but not entirely identical.
Comment 2 Paweł Małachowski 2003-05-05 19:47:28 UTC
On 5 May 2003 at 2:50, Søren Schmidt wrote:

> Synopsis: Filesystem corruption with ata(4) software-raid on HPT370.
> 
> State-Changed-From-To: open->closed
> State-Changed-By: sos
> State-Changed-When: Mon May 5 02:48:15 PDT 2003
> State-Changed-Why: 
> You cannot just dd one disk to another and then create a RAID on top of that.
> 
> You need to create the RAID *before* you create (disklabel newfs) your filesystems.
> 
> What you encounter is problems because you fool the system into using a
> mirror where the two parts are almost but not entirely identical.

I was not clear: the ATA-RAID was created before installing the system.
The filesystem was getting corrupted, so I tried to synchronize the
disks using the HPT BIOS. The problem was still there, so _then_ I
decided to copy the disks using dd.
Of course, the system never complained to me about any inconsistency
between the two disks in the RAID1 (when the data on disk1 and disk2 are
known to be different, shouldn't the array be degraded?).
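
One way to check whether the two halves of a mirror really hold the same data is to compare them block by block. The following is a minimal sketch, not something taken from this PR: the function name first_difference and the 1 MiB chunk size are arbitrary choices, and it assumes both disks (or image files of them) can be read while the array is not in use.

```python
CHUNK = 1 << 20  # compare 1 MiB at a time (arbitrary choice)


def first_difference(path_a, path_b, chunk=CHUNK):
    """Return the byte offset of the first mismatch, or None if identical."""
    offset = 0
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            block_a = a.read(chunk)
            block_b = b.read(chunk)
            if block_a != block_b:
                # Locate the exact byte inside the mismatching chunk.
                for i, (x, y) in enumerate(zip(block_a, block_b)):
                    if x != y:
                        return offset + i
                # No differing byte in the common part: one input is shorter.
                return offset + min(len(block_a), len(block_b))
            if not block_a:
                return None  # both inputs ended with no mismatch
            offset += len(block_a)
```

With the device names from this report, the call would be first_difference("/dev/ad4", "/dev/ad6"). A mismatch is not by itself proof of corruption, since areas holding per-disk RAID metadata may legitimately differ; the offset only shows where to start looking.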

However, I think this PR should stay closed because I know of machines using
HPT370 without such problems -- I suspect my card was broken somehow.
I have simply removed that HPT controller from my PC, so I can't verify that
right now.


-- 
Pawel Malachowski