Bug 92786 - [ata] [patch] ATA fixes, write support for LSI v3 RAID
Summary: [ata] [patch] ATA fixes, write support for LSI v3 RAID
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 6.0-RELEASE
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: freebsd-bugs mailing list
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-02-04 04:10 UTC by Garry Belka
Modified: 2018-01-03 05:16 UTC (History)
0 users

See Also:


Attachments
ata-incl.patch (58.99 KB, patch)
2006-02-04 04:10 UTC, Garry Belka

Description Garry Belka 2006-02-04 04:10:04 UTC
This patch fixes, or helps to avoid, several ATA subsystem problems:
 system crashes after a RAID hard disk failure, or after removal of a
 hot-swappable disk;
 deadlock during RAID label access or update;
 systems failing to come up after a reboot, about once every 8-10 reboots;
 system crashes during atacontrol attach or detach commands;
 minor fixes.

In addition, this patch includes write support for LSIv3 RAID.
This is the hardware used by a current Intel Server Board (SE7520JR2).

It also introduces a scheme for defining array-specific label write functions
that divides the functionality into two pieces: one fills in the label data
in memory, and the other writes the label. If adopted, this scheme may make
it possible to use the same labelling routines with GEOM modules.
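
The two-piece split can be sketched roughly as follows. This is only an
illustration: every name here (label_ops, demo_fill, mem_write, label_update)
and the label layout are hypothetical and are not taken from ata-incl.patch.
The write step is a stand-in that copies into memory rather than issuing disk
I/O.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names throughout; none of these come from ata-incl.patch. */
#define LABEL_SIZE 512

struct array_config {
    uint32_t generation;    /* bumped on each configuration change */
    uint8_t  disk_count;    /* number of member disks */
};

/* Per-format operations: one piece fills a buffer, the other writes it. */
struct label_ops {
    void (*fill)(const struct array_config *cfg, uint8_t *buf);
    int  (*write)(const uint8_t *buf, size_t len, void *backend);
};

/* Piece 1: encode the label in memory only. The layout is made up. */
void
demo_fill(const struct array_config *cfg, uint8_t *buf)
{
    memset(buf, 0, LABEL_SIZE);
    memcpy(buf, "DEMO", 4);
    buf[4] = cfg->disk_count;
    buf[5] = (uint8_t)(cfg->generation & 0xff);
}

/* Piece 2: stand-in writer that copies into a memory "disk". */
int
mem_write(const uint8_t *buf, size_t len, void *backend)
{
    assert(len <= LABEL_SIZE);
    memcpy(backend, buf, len);
    return 0;
}

/* Generic driver: knows nothing about the label format or the transport. */
int
label_update(const struct label_ops *ops, const struct array_config *cfg,
    void *backend)
{
    uint8_t buf[LABEL_SIZE];

    ops->fill(cfg, buf);
    return ops->write(buf, sizeof(buf), backend);
}
```

Because the fill step does no I/O, a different consumer (for example a GEOM
module) could in principle reuse it and supply its own write step.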

Fix: Some problems can be traced to missing synchronisation. Locking was added.

In other cases ATA requests weren't tracked correctly, and the system tried to
access device structures after they were freed. We added reference counters
to requests, to be used in suspicious circumstances.
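
As a rough illustration of the reference counting idea (not the code in the
patch), a request can be kept alive until its last holder lets go. The type
and function names below are hypothetical, and a real kernel version would
need the counter protected by a lock or atomic operations.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical request type; this is not the real struct ata_request. */
struct demo_request {
    int  refs;          /* number of current holders */
    int *freed_flag;    /* demo only: set to 1 when the request is freed */
};

/* Allocate a request with one reference owned by the caller. */
struct demo_request *
request_alloc(int *freed_flag)
{
    struct demo_request *r = malloc(sizeof(*r));

    if (r == NULL)
        return NULL;
    r->refs = 1;
    r->freed_flag = freed_flag;
    return r;
}

/* Take an extra reference before handing the request to another path. */
void
request_hold(struct demo_request *r)
{
    assert(r->refs > 0);
    r->refs++;
}

/* Drop a reference; the memory is freed only when the last one goes. */
void
request_release(struct demo_request *r)
{
    assert(r->refs > 0);
    if (--r->refs == 0) {
        if (r->freed_flag != NULL)
            *r->freed_flag = 1;
        free(r);
    }
}
```

With this pattern, a completion path that still holds a reference cannot find
the request freed underneath it by a detach on another path.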

The larger problem is architectural: RAID label read and write are synchronous,
and stop any I/O completion on a channel until label requests are completed.
As a result, the software interrupt servicing channel completion may have
several completion tasks waiting to be executed. Now, if one of the requests
queued on a channel ahead of the label request is a RAID composite request
that depends on a read operation on a different channel to complete
before it can continue, we are in a deadlock.

The patch makes label updates asynchronous, which takes care of most
situations. The price is changed semantics for label write: it now reports
success after starting the operation. I think this is acceptable,
because a failure of the subsequent write will cause an array reconfiguration
and a visible message to the user.
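
The changed semantics can be pictured with the sketch below. It is an
assumption-laden illustration, not the patch: the names (demo_array,
label_write_start, label_write_done) are hypothetical, and setting a flag
stands in for the array reconfiguration and user-visible message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical array state; "reconfigured" stands in for the real
 * reconfiguration and console message. */
struct demo_array {
    int reconfigured;
};

struct label_write {
    struct demo_array *array;
    int pending;            /* 1 while the write is in flight */
};

/* Start the write and return at once: 0 means "started", not "on disk". */
int
label_write_start(struct label_write *lw, struct demo_array *a)
{
    lw->array = a;
    lw->pending = 1;
    return 0;
}

/* Called later from the completion path with the real I/O result. */
void
label_write_done(struct label_write *lw, int error)
{
    assert(lw->pending);
    lw->pending = 0;
    if (error != 0)
        lw->array->reconfigured = 1;
}
```

The caller never blocks the completion path waiting for the label, and a late
failure still surfaces through the array state rather than the return value.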

However, the problem remains in the label read case. Not wanting to introduce
massive architectural changes, I left it synchronous. So the deadlock
described above may still happen if somebody uses atacontrol
at an unlucky moment.

Also, see the comments above regarding LSIv3 label support and the proposed
label code structure.
How-To-Repeat: Reboot the system with a RAID array multiple times,
or simulate a disk failure (e.g., remove or detach a disk) during normal disk operation or an array rebuild,
or run the rebuild cycle several times.
Comment 1 Matteo Riondato freebsd_committer 2006-02-04 09:11:48 UTC
Responsible Changed
From-To: freebsd-bugs->sos

Assign to sos@, aka Mr. ATA =)
Comment 2 Mark Linimon freebsd_committer freebsd_triage 2009-05-12 05:35:37 UTC
Responsible Changed
From-To: sos->freebsd-bugs

sos@ is not actively working on ATA-related PRs. 

To submitter: is this still an issue?
Comment 3 Eitan Adler freebsd_committer freebsd_triage 2017-12-31 08:00:32 UTC
For bugs matching the following criteria:

Status: In Progress Changed: (is less than) 2014-06-01

Reset to default assignee and clear in-progress tags.

Mail being skipped