Bug 124969 - gvinum(8): gvinum raid5 plex does not detect missing subdisk
Summary: gvinum(8): gvinum raid5 plex does not detect missing subdisk
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 6.3-STABLE
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-06-25 02:30 UTC by Dan Ports
Modified: 2018-01-03 05:16 UTC

See Also:


Attachments
gvinum_detect_down_disk.diff (11.50 KB, patch)
2008-07-09 18:38 UTC, Ulf Lilleengen
no flags

Description Dan Ports 2008-06-25 02:30:01 UTC
I am using gvinum to create a RAID 5 array with three drives (i.e. a
single raid5 plex with three subdisks). Recently, one drive failed. While
the failed drive was still present at boot, the array continued to work
fine, albeit degraded, as one would expect. However, with the drive
removed, gvinum does not properly detect the plex's configuration on boot:

2 drives:
D b                     State: up       /dev/ad11s1d    A: 0/474891 MB (0%)
D a                     State: up       /dev/ad10s1d    A: 0/474891 MB (0%)

1 volume:
V space                 State: up       Plexes:       1 Size:        463 GB

1 plex:
P space.p0           R5 State: degraded Subdisks:     2 Size:        463 GB

3 subdisks:
S space.p0.s2           State: down     D: c            Size:        463 GB
S space.p0.s1           State: up       D: b            Size:        463 GB
S space.p0.s0           State: up       D: a            Size:        463 GB

Note that space.p0 has a capacity of 463 GB, the size of a single drive,
when it should be twice that: a three-subdisk RAID 5 plex spends one
subdisk's worth of space on parity, so its usable size should be
(3 - 1) x 463 GB, i.e. about 926 GB. It seems as though the plex isn't
aware that the downed subdisk ever existed. As a result, the volume is up,
but its data is not valid.

It seems a rather alarming flaw that a RAID 5 array fails to work
correctly when one drive is not present!

Fix: 

No fix, but the following thread appears to describe the same problem,
and includes an analysis. However, the problem appears to still exist.
(I'm running 6.3-STABLE, and haven't tried either 7-STABLE or -CURRENT,
but a cursory examination of the CVS history provides no indication that
the problem has been fixed in other branches.)

http://lists.freebsd.org/pipermail/freebsd-geom/2007-March/002109.html

I'm willing to poke at this problem a bit more, but am probably the
wrong person to do so since I currently have neither the time nor any
geom experience.
How-To-Repeat: Create a gvinum raid5 plex with three subdisks, then remove the drive
corresponding to one of them.
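
A rough sketch of the setup, for reference (device names and stripe size are
just examples, and I'm paraphrasing the config syntax from memory; see
gvinum(8) for the exact format):

# raid5.conf: three drives, one subdisk each, in a single raid5 plex
drive a device /dev/ad10s1d
drive b device /dev/ad11s1d
drive c device /dev/ad12s1d
volume space
  plex org raid5 512k
    sd length 0 drive a
    sd length 0 drive b
    sd length 0 drive c

# create the array (a freshly created raid5 plex may still need its parity
# initialized/synced before real use; check gvinum(8))
gvinum create raid5.conf
gvinum list

# then shut down, physically remove one of the three drives, boot again,
# and run "gvinum list" to compare the detected configuration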
Comment 1 Dan Ports 2008-06-25 06:00:36 UTC
Oops -- meant to file this under kern rather than misc. Sorry about
that!

Dan

-- 
Dan R. K. Ports                              <drkp-f@ambulatoryclam.net>
Research Henchman
Massachusetts Institute of Technology                     <drkp@mit.edu>
Computer Science and Artificial Intelligence Lab    <drkp@csail.mit.edu>
Comment 2 Mark Linimon freebsd_committer freebsd_triage 2008-06-25 06:25:24 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-geom

Over to maintainer(s).
Comment 3 Ulf Lilleengen 2008-06-25 20:42:06 UTC
This is a known issue, and I've fixed it in patches that have been pending
review for a few months now... ;) If it's very critical for you right now, I
can create a patch for you and request a commit for it, but since there is
some gvinum restructuring I'd like to get into the tree, I'd rather not fix
the same issues twice. That said, I agree this is a special case, so I'll try
to get a fix out soon. I'm sorry for the inconvenience.

-- 
Ulf Lilleengen
Comment 4 Ulf Lilleengen 2008-07-09 18:38:24 UTC
Hello,

I think I managed to solve the issue; at least it works in CURRENT. This
patch is for RELENG_6, but I've not been able to test it there. Be careful: I
make no guarantees about what it will do :) However, it doesn't touch any
data; it just handles some extra fields and flags in the structures. I ended
up making it possible to have "referenced" drives (drives with no real
device), as well as subdisks that don't have a "real" drive yet. I also fixed
some issues with the plex states when things are tasted, but please be
careful, watch the states, and make sure they are what they "should" be
(e.g. a failed drive that is returning should never have the UP state, since
it must be synchronized first).

I even found a "real" bug in CURRENT while making this patch. Please
tell me how it fares.
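
To check the states after booting with the patch, something along these lines
should do (I'm writing the commands from memory, so double-check against
gvinum(8); the subdisk name is just the one from your listing):

# with the failed drive removed, the plex should stay degraded and the
# missing subdisk should stay down
gvinum list

# when the failed/replaced drive comes back, its subdisk should come up as
# stale rather than up, and has to be resynchronized before it can be
# trusted, e.g.:
gvinum start space.p0.s2
gvinum list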

-- 
Ulf Lilleengen
Comment 5 Ulf Lilleengen 2008-07-09 18:49:07 UTC
Hi again,

I also put the patch here:
http://people.freebsd.org/~lulf/patches/gvinum/gvinum_detect_down_disk.diff

In case you love GNATS as much as I do :)
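
If it helps, applying it to a RELENG_6 source tree should be roughly the
following (the -p level is a guess, adjust it to match the paths in the diff,
and substitute your own kernel config name for GENERIC):

cd /usr/src
fetch http://people.freebsd.org/~lulf/patches/gvinum/gvinum_detect_down_disk.diff
patch -p0 < gvinum_detect_down_disk.diff
make buildkernel KERNCONF=GENERIC
make installkernel KERNCONF=GENERIC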

-- 
Ulf Lilleengen
Comment 6 Dan Ports 2008-07-29 00:53:05 UTC
 I finally got a chance to try your patch (unfortunately, work and life
intervened for a couple of weeks). I had to make a minor change or two
to get it to compile (e.g. a missing brace). Will provide an updated
patch shortly.

 Unfortunately, though the missing disk is now detected, the plex
configuration is not quite right. Here's the output:

3 drives:
D b                     State: up       /dev/ad11s1d    A: 0/474891 MB (0%)
D a                     State: up       /dev/ad10s1d    A: 0/474891 MB (0%)
D c                     State: down     /dev/???        

1 volume:
V space                 State: up       Plexes:       1 Size: 463 GB

1 plex:
P space.p0           R5 State: degraded Subdisks:     2 Size: 463 GB

3 subdisks:
S space.p0.s0           State: up       D: a            Size: 463 GB
S space.p0.s1           State: stale    D: b            Size: 463 GB
S space.p0.s2           State: down     D: c            Size: 0

 Note that drive c / subdisk s2 is now correctly detected as missing, but
the plex still contains only two subdisks and is half as large as it should
be. Also, I'm not sure why subdisk s1 is marked stale.
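
For what it's worth, this is roughly how I've been poking at it (command
names from memory; gvinum(8) has the details):

# verbose listing, to see per-subdisk sizes and offsets
gvinum list -v

# dump the configuration gvinum thinks it has, in config-file form, to
# compare against the original three-subdisk setup
gvinum printconfig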

 Thanks,

 Dan

-- 
Dan R. K. Ports                              <drkp-f@ambulatoryclam.net>
Research Henchman
Massachusetts Institute of Technology                     <drkp@mit.edu>
Computer Science and Artificial Intelligence Lab    <drkp@csail.mit.edu>
Comment 7 Eitan Adler freebsd_committer freebsd_triage 2017-12-31 08:01:36 UTC
For bugs matching the following criteria:

Status: In Progress Changed: (is less than) 2014-06-01

Reset to default assignee and clear in-progress tags.

Mail being skipped