Bug 95459 - Rebooting the system while rebuilding RAID (Intel MatrixRAID) results in data loss
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 6.1-BETA4
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: freebsd-bugs (Nobody)
Reported: 2006-04-07 06:40 UTC by oleg dashevskii
Modified: 2018-01-03 05:12 UTC

Attachments
ata-raid.c.patch7 (7.69 KB, text/plain)
2008-09-05 03:16 UTC, hsakamt
ata-raid.h.patch (1.29 KB, patch)
2008-09-05 03:16 UTC, hsakamt
ata-raid.c.patch6 (7.71 KB, text/plain)
2008-09-05 03:16 UTC, hsakamt

Description oleg dashevskii 2006-04-07 06:40:12 UTC
I have a motherboard with an ICH7 chipset, which supports RAID. Using the BIOS utility, I created a RAID1 array from two SATA disks (150 GB each) and then installed FreeBSD 6.1-BETA4. No problems: ar0 was detected and everything worked.

After installation, I wanted to check that the RAID1 was working, so I pulled the power cord from one of the disks. This was detected immediately and the array went into the DEGRADED state. I reconnected the power (the disk was detected again) and used "atacontrol addspare" followed by "atacontrol rebuild" to rebuild the array.

The rebuild was nearly complete when I decided to reboot the box. To my surprise, the BIOS no longer detected the RAID. The first disk was labeled "Single" or "Separate" or something similar (I don't remember exactly), and the second was labeled "Spare". No RAID volumes were detected (as shown on the screen) and FreeBSD would not boot. So I had to "un-RAID" both disks, recreate the array, and reinstall FreeBSD.

I then decided to see what would happen if I waited for the rebuild to complete. Right after it finished, the ATA driver hung for nearly 10 seconds. It recovered with the following messages:
ad6: WARNING - WRITE_DMA taskqueue timeout - completing request directly
ad6: WARNING - WRITE_DMA48 freeing taskqueue zombie request

This is bad news: you use RAID1 for redundancy, but if you happen to reboot during a rebuild, you lose ALL your data.

Microsoft Windows XP in a similar situation is able to resume the rebuild from the point where it stopped when the reboot was initiated.

Fix: none

How-To-Repeat:

1. Get a working RAID1 of two disks on an Intel MatrixRAID controller (ICH7 chipset).

2. Put the array into the DEGRADED state by removing the power cord from one of the disks.

3. Restore power, return the disk to the array with e.g. "atacontrol addspare ar0 ad4", and start the rebuild with "atacontrol rebuild ar0".

4. Reboot the system while the rebuild is in progress.

5. Result: the system doesn't boot and the data are LOST.
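
Since the failure in step 4 comes from rebooting in the middle of a rebuild, one cautious workaround is to check the array state before rebooting. The following is a minimal Python sketch and not part of the original report. It assumes that the output of "atacontrol status ar0" contains the text "status:" followed by an uppercase state word such as READY, DEGRADED, or REBUILDING; that format is an assumption and should be verified on the target system. The names array_state and safe_to_reboot are hypothetical.

```python
import re

# Assumed shape of the status line printed by `atacontrol status ar0`, e.g.
#   "ar0: ATA RAID1 status: REBUILDING 45% completed"
# This format is an assumption; adjust the pattern to match the real output.
STATUS_RE = re.compile(r"status:\s*([A-Z]+)")


def array_state(status_text):
    """Return the state word found in the status text, or None if absent."""
    match = STATUS_RE.search(status_text)
    return match.group(1) if match else None


def safe_to_reboot(status_text):
    """Treat only a READY array as safe; any other or unparsable state is not."""
    return array_state(status_text) == "READY"
```

The text passed to safe_to_reboot would be captured from the real command, for example with subprocess.run(["atacontrol", "status", "ar0"], capture_output=True, text=True).stdout. Treating unparsable output as unsafe is deliberate: a wrong "safe" answer is what costs the array here.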
Comment 1 Wouter de Jong 2006-11-13 10:47:44 UTC
I get the same error message _after_ rebuilding (and ar0 goes back to status READY).
However, neither my data nor the RAID is lost.
The system just 'hangs'.

This happens with both ICH5 and ICH6 chipsets (different SuperMicro server models).

Nov 13 10:41:16 monitoring0-1 kernel: ad4: 76319MB <WDC WD800JD-00LSA0 06.01D06> at ata2-master SATA150
Nov 13 10:41:37 monitoring0-1 kernel: ad4: inserted into ar0 disk0 as spare
Nov 13 11:34:22 monitoring0-1 kernel: ad6: WARNING - WRITE_DMA taskqueue timeout - completing request directly
Nov 13 11:34:22 monitoring0-1 kernel: ad4: WARNING - WRITE_DMA freeing taskqueue zombie request
Nov 13 11:34:22 monitoring0-1 kernel: ad6: WARNING - WRITE_DMA freeing taskqueue zombie request
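
To compare occurrences across machines, these warnings can be tallied per device. Below is a minimal Python sketch, assuming log lines in the format quoted above. The function name count_ata_warnings is hypothetical, and the pattern covers only the two warning messages that appear in this report.

```python
import re
from collections import Counter

# Matches the two ATA driver warnings quoted in this report, e.g.
#   "ad6: WARNING - WRITE_DMA taskqueue timeout - completing request directly"
#   "ad4: WARNING - WRITE_DMA48 freeing taskqueue zombie request"
WARNING_RE = re.compile(
    r"\b(ad\d+): WARNING - (\w+) "
    r"(taskqueue timeout|freeing taskqueue zombie request)"
)


def count_ata_warnings(lines):
    """Count (device, kind) pairs for the warnings found in the log lines."""
    counts = Counter()
    for line in lines:
        match = WARNING_RE.search(line)
        if match:
            device, _command, kind = match.groups()
            counts[(device, kind)] += 1
    return counts
```

Feeding it the lines of /var/log/messages gives a per-disk count of timeouts and zombie requests, which shows at a glance whether both members of the array are affected.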

After rebooting, everything is OK _after_ a forced fsck in single-user mode.

Regards,

Wouter
Comment 2 hsakamt 2008-09-05 03:16:09 UTC
Hi,

I wrote a patch for 6_STABLE and 7_STABLE.
ata-raid.h.patch is common to both versions.

I hope this patch, kern/102211 (whose patch also resolves kern/108924), and
kern/124064 will be committed in the next RELEASE, if it's not too late.

---
 Hideki Sakamoto
Comment 3 Eitan Adler freebsd_committer freebsd_triage 2017-12-31 07:59:07 UTC
For bugs matching the following criteria:

Status: In Progress Changed: (is less than) 2014-06-01

Reset to default assignee and clear in-progress tags.

Mail being skipped