I am using gvinum to create a RAID 5 array with three drives (i.e. a single raid5 plex with three subdisks). Recently, one drive failed. While the failed drive was still present at boot, the array continued to work fine, albeit degraded, as one would expect. However, with the drive removed, gvinum does not properly detect the plex's configuration on boot:

2 drives:
D b              State: up       /dev/ad11s1d    A: 0/474891 MB (0%)
D a              State: up       /dev/ad10s1d    A: 0/474891 MB (0%)

1 volume:
V space          State: up       Plexes: 1       Size: 463 GB

1 plex:
P space.p0    R5 State: degraded Subdisks: 2     Size: 463 GB

3 subdisks:
S space.p0.s2    State: down     D: c            Size: 463 GB
S space.p0.s1    State: up       D: b            Size: 463 GB
S space.p0.s0    State: up       D: a            Size: 463 GB

Note that space.p0 has a capacity of 463 GB, the size of a single drive, when it should be twice that (a three-disk RAID 5 has the usable capacity of two disks, roughly 926 GB here). It seems as though the plex is not aware that the downed subdisk ever existed. As a result, the volume is up, but its data is not valid.

It is a rather alarming flaw that a RAID 5 array fails to work correctly when one drive is not present!

Fix: No fix, but the following thread appears to describe the same problem and includes an analysis. The problem still appears to exist. (I'm running 6.3-STABLE and haven't tried either 7-STABLE or -CURRENT, but a cursory examination of the CVS history gives no indication that it has been fixed in other branches.)

http://lists.freebsd.org/pipermail/freebsd-geom/2007-March/002109.html

I'm willing to poke at this problem a bit more, but am probably the wrong person to do so, since I currently have neither the time nor any geom experience.

How-To-Repeat: Create a gvinum raid5 plex with three subdisks, then remove the drive corresponding to one of them.
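For reference, a setup like the one above can be reproduced with a vinum(8)-style configuration file. This is only a sketch: the devices for drives a and b are taken from the listing above, while the device for drive c (/dev/ad12s1d) and the 256k stripe size are assumptions for illustration.

    # raid5.conf -- hypothetical description file; /dev/ad12s1d and the
    # 256k stripe size are guesses, the other two devices appear above
    drive a device /dev/ad10s1d
    drive b device /dev/ad11s1d
    drive c device /dev/ad12s1d
    volume space
      plex org raid5 256k
        sd length 0 drive a
        sd length 0 drive b
        sd length 0 drive c

    # create the objects, then disconnect drive c and reboot
    gvinum create raid5.conf
    gvinum list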
Oops -- meant to file this under kern rather than misc. Sorry about that!

Dan

--
Dan R. K. Ports <drkp-f@ambulatoryclam.net>
Research Henchman, Massachusetts Institute of Technology <drkp@mit.edu>
Computer Science and Artificial Intelligence Lab <drkp@csail.mit.edu>
Responsible Changed From-To: freebsd-bugs->freebsd-geom Over to maintainer(s).
Quoting Dan Ports <drkp-f@ambulatoryclam.net>:

> [original problem report for PR 124969, quoted in full]

This is a known issue, and I've fixed it in patches that have been pending review (for a few months now... ;). If it's very critical for you right now, I can create a patch for you and request a commit for it, but as there is some gvinum restructuring I'd like to get into the tree, I'd rather not fix the same issues twice. I agree this is a special case, though, so I'll try to get out a fix soon. I'm sorry for the inconvenience.

--
Ulf Lilleengen
Hello,

I think I managed to solve the issue; at least it works in CURRENT. This patch is for RELENG_6, but I've not been able to test it. Be careful, I make no guarantees of what it will do :) However, it doesn't touch any data; it just handles some extra fields and flags in the structures.

I ended up making it possible to have "referenced" drives (drives with no real device), as well as subdisks that don't yet have a "real" drive. I also fixed some issues with the plex states when things are tasted, but please be careful: watch the states and make sure they are what they "should" be (e.g. a failed drive that returns should never have the UP state, since it must be synchronized first). I even found a "real" bug in CURRENT while making this patch.

Please tell me how it fares.

--
Ulf Lilleengen
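The "referenced drive" idea can be illustrated with a minimal standalone sketch; this is not the gvinum code or the patch, and all struct, flag, and function names below are made up for illustration. The point is that when the tasted on-disk configuration names a drive for which no device was found, a placeholder object is created so the plex still knows about all of its subdisks and keeps its full geometry, with only the missing subdisk marked down.

    /* Illustrative only -- not the gvinum data structures or the patch. */
    #include <stdio.h>
    #include <string.h>

    #define MY_DRIVE_UP         1
    #define MY_DRIVE_REFERENCED 2   /* named in the config, but no device found */

    struct my_drive {
        char name[16];
        char device[32];            /* empty if the drive is only referenced */
        int  state;
    };

    /*
     * Find the drive a subdisk refers to; if no device with that name was
     * tasted, create a placeholder so the plex keeps its third subdisk and
     * its full size instead of silently shrinking to two subdisks.
     */
    static struct my_drive *
    lookup_or_reference_drive(struct my_drive *drives, int *ndrives,
        const char *name)
    {
        for (int i = 0; i < *ndrives; i++)
            if (strcmp(drives[i].name, name) == 0)
                return &drives[i];

        struct my_drive *d = &drives[(*ndrives)++];
        snprintf(d->name, sizeof(d->name), "%s", name);
        d->device[0] = '\0';
        d->state = MY_DRIVE_REFERENCED;
        printf("drive %s not found, creating referenced placeholder\n", name);
        return d;
    }

    int
    main(void)
    {
        struct my_drive drives[8] = {
            { "a", "/dev/ad10s1d", MY_DRIVE_UP },
            { "b", "/dev/ad11s1d", MY_DRIVE_UP },
        };
        int ndrives = 2;

        /* Drive "c" failed and is physically absent at boot time. */
        lookup_or_reference_drive(drives, &ndrives, "c");
        printf("plex now sees %d drives/subdisks\n", ndrives);
        return 0;
    }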
Hi again,

I also put the patch here, in case you love GNATS as much as I do :)

http://people.freebsd.org/~lulf/patches/gvinum/gvinum_detect_down_disk.diff

--
Ulf Lilleengen
I finally got a chance to try your patch (unfortunately, work and life intervened for a couple of weeks). I had to make a minor change or two to get it to compile (e.g. a missing brace); I will provide an updated patch shortly.

Unfortunately, though the missing disk is now detected, the plex configuration is still not quite right. Here's the output:

3 drives:
D b              State: up       /dev/ad11s1d    A: 0/474891 MB (0%)
D a              State: up       /dev/ad10s1d    A: 0/474891 MB (0%)
D c              State: down     /dev/???

1 volume:
V space          State: up       Plexes: 1       Size: 463 GB

1 plex:
P space.p0    R5 State: degraded Subdisks: 2     Size: 463 GB

3 subdisks:
S space.p0.s0    State: up       D: a            Size: 463 GB
S space.p0.s1    State: stale    D: b            Size: 463 GB
S space.p0.s2    State: down     D: c            Size: 0

Note that drive c / subdisk s2 is now correctly reported as missing, but the plex still contains only two subdisks and is half as large as it should be. Also, I am not sure why subdisk s1 is marked stale.

Thanks,
Dan

--
Dan R. K. Ports <drkp-f@ambulatoryclam.net>
Research Henchman, Massachusetts Institute of Technology <drkp@mit.edu>
Computer Science and Artificial Intelligence Lab <drkp@csail.mit.edu>
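For anyone following along, the object states can be re-checked after applying such a patch with the standard gvinum(8) commands below; nothing here is specific to the patch, and whether a stale subdisk can actually be resynchronized on this release is a separate question.

    gvinum list          # show drives, volumes, plexes and subdisks with their states
    gvinum printconfig   # dump the configuration gvinum believes is on disk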
For bugs matching the following criteria:

  Status: In Progress
  Changed: (is less than) 2014-06-01

Reset to default assignee and clear in-progress tags. Mail being skipped.