I am writing capacity expansion support for geom_raid5, created by Arne Woerner. This expansion works great, yet I also need growfs to expand the filesystem on it afterwards. Through testing Arne and I discovered a bug or weakness in growfs: it will wrongfully assume the expanded disk area is full of zero's. growfs itself does not show any error and seems to work fine. It is the fsck afterwards that *can* kill a lot of data. It's safe to assume this is caused by 'stale' data on the expanded disk area, which confuses fsck because it thinks its part of the filesystem where in fact it is not 'active' filesystem data. In most cases, fsck will report a ton of errors, clear bogus files and leave you with a healthy filesystem where only your original files remained. In some cases, though, I managed to kill off about 40% of my original files after expanding using growfs. I did a fsck before using growfs to verify the filesystem was clean, after using growfs I ran fsck again and it did an 'extra step' (2b I think) to find more DUPs (duplicate inodes I assume?). After fsck finished, it left me with a crippled filesystem where lots of files were gone. Also fsck was not able to resolve all issues, no matter how much I ran fsck there remained some errors. I suspect this is because the bogus data in the expanded area was part of a real UFS filesystem in the past, and as such many UFS file structures remained that would have confused fsck. This behavior is not acceptable! Simply referring to the "you should have made a backup" is no excuse for having a flawed growfs. Growfs is required to make a proper RAID5 expansion possible - a feature that will be quite hot when geom_raid5 is integrated into FreeBSD source tree. Please help resolve this issue and pave the way for the "Just Works" principle that makes software popular and sexy. Fix: The solution is rather simple: growfs should write zeroes to the expanding area before writing the metadata structures. This will ensure no bogus filesystem structures exist which may lead to fsck mutilating the filesystem. Until such feature exists, the manpage should make it very clear that a zero-write procedure has to be performed manually, using dd. Workaround: before using growfs, blank the area using dd: dd if=/dev/zero of=<disk> bs=512 oseek=<sector starting with expansion> Contact via email for details. - Veronica How-To-Repeat: * create a filesystem shorter than the device, using the -s argument and populate with data: newfs -s 30000 /dev/ad4 mount /dev/ad4 /mnt (copy data onto /mnt until its full) umount /dev/ad4 * now write random data to the area which will be expanded dd if=/dev/urandom of=/dev/ad4 oseek=30000 bs=512 * check filesystem for errors (should be clean) fsck -t ufs /dev/ad4 * grow filesystem growfs /dev/ad4 * check again for errors fsck -t ufs -y /dev/ad4 Result: You will find tons if not millions of errors. Most will be about the bogus data in the expanded area. But in a few cases fsck after growfs destroyed 'original' files as well, deleting over 40% of the original filesystem.
I took a look at this. Let me say up front that I am not an fs expert, I'm just reading the code, so any of this could be wrong. It looks like this was introduced in revision 1.23 of src/sbin/growfs/growfs.c, commited by scottl. There are actually two different bugs for UFS1 and UFS2. The idea is that inodes have to be initialized to zero before they are used, and fsck complains if this isn't the case, even if those inodes are not in use according to the cylinder group bitmap. In UFS1, newfs is expected to initialize all inodes. In UFS2, inode initialization is "lazy"; there is a field cg_initediblk in the cylinder group indicating how many inodes have been initialized, starting from the first. (It looks like this and cg_niblk count inodes, not blocks, despite the name.) If more are needed, the kernel initializes them as necessary. For some reason, newfs initializes the first two blocks worth of inodes in every cg. growfs works by creating new cylinder groups in the filesystem, each of which comes with its own chunk of inodes. Prior to revision 1.23, growfs initialized the inodes in all the cgs it created, for both UFS1 and UFS2. Revision 1.23 attempted to optimize this behavior. For UFS2, it removed the inode initialization completely, but did not adjust cg_initediblk accordingly, so the kernel and fsck still thought that all inodes were initialized when actually they were not. My patch fixes this by setting cg_initedblk to zero. It would be possible to behave like newfs and initialize the first few inodes, but I don't see why that is necessary. It is simpler to skip it entirely, and in my tests it works fine. For UFS1, revision 1.23 changes it so that all *but* the first two blocks of inodes are initialized. It looks like the author was confused by some out-of-context code from newfs (which I'm not sure is exactly right itself), since this doesn't make any sense. My patch changes it back to initializing all inodes, and removes some leftover UFS2 initialization code as well, which is no longer used. I tested this patch on a md filesystem, filling it with random bytes, creating and filling a 100 MB filesystem, then growing it to 200 MB. fsck passes the new filesystem and all files remain intact, with both UFS1 and UFS2. --- /usr/src/sbin/growfs/growfs.c Tue Nov 29 23:20:16 2005 +++ ./growfs.c Tue Oct 16 23:07:47 2007 @@ -376,7 +376,6 @@ long d, dlower, dupper, blkno, start; ufs2_daddr_t i, cbase, dmax; struct ufs1_dinode *dp1; - struct ufs2_dinode *dp2; struct csum *cs; if (iobuf == NULL && (iobuf = malloc(sblock.fs_bsize)) == NULL) { @@ -401,7 +400,7 @@ acg.cg_magic = CG_MAGIC; acg.cg_cgx = cylno; acg.cg_niblk = sblock.fs_ipg; - acg.cg_initediblk = sblock.fs_ipg; + acg.cg_initediblk = 0; acg.cg_ndblk = dmax - cbase; if (sblock.fs_contigsumsize > 0) acg.cg_nclusterblks = acg.cg_ndblk / sblock.fs_frag; @@ -445,26 +444,16 @@ setbit(cg_inosused(&acg), i); acg.cg_cs.cs_nifree--; } - /* - * XXX Newfs writes out two blocks of initialized inodes - * unconditionally. Should we check here to make sure that they - * were actually written? - */ if (sblock.fs_magic == FS_UFS1_MAGIC) { bzero(iobuf, sblock.fs_bsize); - for (i = 2 * sblock.fs_frag; i < sblock.fs_ipg / INOPF(&sblock); + for (i = 0; i < sblock.fs_ipg / INOPF(&sblock); i += sblock.fs_frag) { dp1 = (struct ufs1_dinode *)iobuf; - dp2 = (struct ufs2_dinode *)iobuf; #ifdef FSIRAND - for (j = 0; j < INOPB(&sblock); j++) - if (sblock.fs_magic == FS_UFS1_MAGIC) { - dp1->di_gen = random(); - dp1++; - } else { - dp2->di_gen = random(); - dp2++; - } + for (j = 0; j < INOPB(&sblock); j++) { + dp1->di_gen = random(); + dp1++; + } #endif wtfs(fsbtodb(&sblock, cgimin(&sblock, cylno) + i), sblock.fs_bsize, iobuf, fso, Nflag); -- Nate Eldredge nge@cs.hmc.edu
Responsible Changed From-To: freebsd-bugs->gavin Take. The UFS2 bug was fixed in r197763, the UFS1 bug can still be recreated on HEAD with: mdconfig -a -t swap -s 1G dd if=/dev/random of=/dev/md0 bs=1048576 newfs -s 30000 -O 1 /dev/md0 mount /dev/md0 /mnt cp -r /usr/src/sys /mnt umount /mnt fsck -t ufs /dev/md0 growfs /dev/md0 fsck -t ufs /dev/md0
Author: gavin Date: Tue Feb 2 19:51:30 2010 New Revision: 203397 URL: http://svn.freebsd.org/changeset/base/203397 Log: Merge r201401 from head: Remove dead code. This section of code is only run in the (sblock.fs_magic == FS_UFS1_MAGIC) case, so the check within the loop is redundant. PR: bin/115174 (partly) Submitted by: Nate Eldredge nge cs.hmc.edu Reviewed by: mjacob Approved by: ed (mentor, implicit) Modified: stable/8/sbin/growfs/growfs.c Directory Properties: stable/8/sbin/growfs/ (props changed) Modified: stable/8/sbin/growfs/growfs.c ============================================================================== --- stable/8/sbin/growfs/growfs.c Tue Feb 2 19:44:52 2010 (r203396) +++ stable/8/sbin/growfs/growfs.c Tue Feb 2 19:51:30 2010 (r203397) @@ -376,7 +376,6 @@ initcg(int cylno, time_t utime, int fso, long d, dlower, dupper, blkno, start; ufs2_daddr_t i, cbase, dmax; struct ufs1_dinode *dp1; - struct ufs2_dinode *dp2; struct csum *cs; if (iobuf == NULL && (iobuf = malloc(sblock.fs_bsize)) == NULL) { @@ -455,16 +454,11 @@ initcg(int cylno, time_t utime, int fso, for (i = 2 * sblock.fs_frag; i < sblock.fs_ipg / INOPF(&sblock); i += sblock.fs_frag) { dp1 = (struct ufs1_dinode *)iobuf; - dp2 = (struct ufs2_dinode *)iobuf; #ifdef FSIRAND - for (j = 0; j < INOPB(&sblock); j++) - if (sblock.fs_magic == FS_UFS1_MAGIC) { - dp1->di_gen = random(); - dp1++; - } else { - dp2->di_gen = random(); - dp2++; - } + for (j = 0; j < INOPB(&sblock); j++) { + dp1->di_gen = random(); + dp1++; + } #endif wtfs(fsbtodb(&sblock, cgimin(&sblock, cylno) + i), sblock.fs_bsize, iobuf, fso, Nflag); _______________________________________________ svn-src-all@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-all To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
Author: gavin Date: Sat Feb 13 16:22:08 2010 New Revision: 203835 URL: http://svn.freebsd.org/changeset/base/203835 Log: When growing a UFS1 filesystem, we need to initialise all inodes in any new cylinder groups that are created. When the filesystem is first created, newfs always initialises the first two blocks of inodes, and then in the UFS1 case will also initialise the remaining inode blocks. The changes in growfs.c 1.23 broke the initialisation of all inodes, seemingly based on this implementation detail in newfs(8). The result was that instead of initialising all inodes, we would actually end up initialising all but the first two blocks of inodes. If the filesystem was grown into empty (all-zeros) space then the resulting filesystem was fine, however when grown onto non-zeroed space the filesystem produced would appear to have massive corruption on the first fsck after growing. A test case for this problem can be found in the PR audit trail. Fix this by once again initialising all inodes in the UFS1 case. PR: bin/115174 Submitted by: Nate Eldredgei nge cs.hmc.edu Reviewed by: mjacob MFC after: 1 month Modified: head/sbin/growfs/growfs.c Modified: head/sbin/growfs/growfs.c ============================================================================== --- head/sbin/growfs/growfs.c Sat Feb 13 16:04:58 2010 (r203834) +++ head/sbin/growfs/growfs.c Sat Feb 13 16:22:08 2010 (r203835) @@ -450,13 +450,11 @@ initcg(int cylno, time_t utime, int fso, acg.cg_cs.cs_nifree--; } /* - * XXX Newfs writes out two blocks of initialized inodes - * unconditionally. Should we check here to make sure that they - * were actually written? + * For the old file system, we have to initialize all the inodes. */ if (sblock.fs_magic == FS_UFS1_MAGIC) { bzero(iobuf, sblock.fs_bsize); - for (i = 2 * sblock.fs_frag; i < sblock.fs_ipg / INOPF(&sblock); + for (i = 0; i < sblock.fs_ipg / INOPF(&sblock); i += sblock.fs_frag) { dp1 = (struct ufs1_dinode *)iobuf; #ifdef FSIRAND _______________________________________________ svn-src-all@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-all To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
State Changed From-To: open->patched UFS1 case is fixed in head in r203835. UFS2 case was fixed in r197763. I'm planning on MFCing r203835 in a month.
Author: gavin Date: Wed Mar 17 20:27:35 2010 New Revision: 205259 URL: http://svn.freebsd.org/changeset/base/205259 Log: Merge r203835 from head: When growing a UFS1 filesystem, we need to initialise all inodes in any new cylinder groups that are created. When the filesystem is first created, newfs always initialises the first two blocks of inodes, and then in the UFS1 case will also initialise the remaining inode blocks. The changes in growfs.c 1.23 broke the initialisation of all inodes, seemingly based on this implementation detail in newfs(8). The result was that instead of initialising all inodes, we would actually end up initialising all but the first two blocks of inodes. If the filesystem was grown into empty (all-zeros) space then the resulting filesystem was fine, however when grown onto non-zeroed space the filesystem produced would appear to have massive corruption on the first fsck after growing. A test case for this problem can be found in the PR audit trail. Fix this by once again initialising all inodes in the UFS1 case. PR: bin/115174 Submitted by: "Nate Eldredge" <nge cs.hmc.edu> Reviewed by: mjacob Modified: stable/8/sbin/growfs/growfs.c Directory Properties: stable/8/sbin/growfs/ (props changed) Modified: stable/8/sbin/growfs/growfs.c ============================================================================== --- stable/8/sbin/growfs/growfs.c Wed Mar 17 20:23:14 2010 (r205258) +++ stable/8/sbin/growfs/growfs.c Wed Mar 17 20:27:35 2010 (r205259) @@ -445,13 +445,11 @@ initcg(int cylno, time_t utime, int fso, acg.cg_cs.cs_nifree--; } /* - * XXX Newfs writes out two blocks of initialized inodes - * unconditionally. Should we check here to make sure that they - * were actually written? + * For the old file system, we have to initialize all the inodes. */ if (sblock.fs_magic == FS_UFS1_MAGIC) { bzero(iobuf, sblock.fs_bsize); - for (i = 2 * sblock.fs_frag; i < sblock.fs_ipg / INOPF(&sblock); + for (i = 0; i < sblock.fs_ipg / INOPF(&sblock); i += sblock.fs_frag) { dp1 = (struct ufs1_dinode *)iobuf; #ifdef FSIRAND _______________________________________________ svn-src-all@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-all To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
Author: gavin Date: Wed Mar 17 20:30:37 2010 New Revision: 205260 URL: http://svn.freebsd.org/changeset/base/205260 Log: Merge r201401 from head: Remove dead code. This section of code is only run in the (sblock.fs_magic == FS_UFS1_MAGIC) case, so the check within the loop is redundant. Merge 203835 from head: When growing a UFS1 filesystem, we need to initialise all inodes in any new cylinder groups that are created. When the filesystem is first created, newfs always initialises the first two blocks of inodes, and then in the UFS1 case will also initialise the remaining inode blocks. The changes in growfs.c 1.23 broke the initialisation of all inodes, seemingly based on this implementation detail in newfs(8). The result was that instead of initialising all inodes, we would actually end up initialising all but the first two blocks of inodes. If the filesystem was grown into empty (all-zeros) space then the resulting filesystem was fine, however when grown onto non-zeroed space the filesystem produced would appear to have massive corruption on the first fsck after growing. A test case for this problem can be found in the PR audit trail. Fix this by once again initialising all inodes in the UFS1 case. PR: bin/115174 Submitted by: "Nate Eldredge" <nge cs.hmc.edu> Reviewed by: mjacob Modified: stable/7/sbin/growfs/growfs.c Directory Properties: stable/7/sbin/growfs/ (props changed) Modified: stable/7/sbin/growfs/growfs.c ============================================================================== --- stable/7/sbin/growfs/growfs.c Wed Mar 17 20:27:35 2010 (r205259) +++ stable/7/sbin/growfs/growfs.c Wed Mar 17 20:30:37 2010 (r205260) @@ -375,7 +375,6 @@ initcg(int cylno, time_t utime, int fso, long blkno, start; ufs2_daddr_t i, cbase, dmax; struct ufs1_dinode *dp1; - struct ufs2_dinode *dp2; struct csum *cs; uint d, dupper, dlower; @@ -446,25 +445,18 @@ initcg(int cylno, time_t utime, int fso, acg.cg_cs.cs_nifree--; } /* - * XXX Newfs writes out two blocks of initialized inodes - * unconditionally. Should we check here to make sure that they - * were actually written? + * For the old file system, we have to initialize all the inodes. */ if (sblock.fs_magic == FS_UFS1_MAGIC) { bzero(iobuf, sblock.fs_bsize); - for (i = 2 * sblock.fs_frag; i < sblock.fs_ipg / INOPF(&sblock); + for (i = 0; i < sblock.fs_ipg / INOPF(&sblock); i += sblock.fs_frag) { dp1 = (struct ufs1_dinode *)iobuf; - dp2 = (struct ufs2_dinode *)iobuf; #ifdef FSIRAND - for (j = 0; j < INOPB(&sblock); j++) - if (sblock.fs_magic == FS_UFS1_MAGIC) { - dp1->di_gen = random(); - dp1++; - } else { - dp2->di_gen = random(); - dp2++; - } + for (j = 0; j < INOPB(&sblock); j++) { + dp1->di_gen = random(); + dp1++; + } #endif wtfs(fsbtodb(&sblock, cgimin(&sblock, cylno) + i), sblock.fs_bsize, iobuf, fso, Nflag); _______________________________________________ svn-src-all@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-all To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
State Changed From-To: patched->closed Fixed in head, and merged to stable/8, stable/7 and stable/6. Thanks for your bug report!
Author: gavin Date: Wed Mar 17 20:32:13 2010 New Revision: 205261 URL: http://svn.freebsd.org/changeset/base/205261 Log: Merge r201401 from head: Remove dead code. This section of code is only run in the (sblock.fs_magic == FS_UFS1_MAGIC) case, so the check within the loop is redundant. Merge 203835 from head: When growing a UFS1 filesystem, we need to initialise all inodes in any new cylinder groups that are created. When the filesystem is first created, newfs always initialises the first two blocks of inodes, and then in the UFS1 case will also initialise the remaining inode blocks. The changes in growfs.c 1.23 broke the initialisation of all inodes, seemingly based on this implementation detail in newfs(8). The result was that instead of initialising all inodes, we would actually end up initialising all but the first two blocks of inodes. If the filesystem was grown into empty (all-zeros) space then the resulting filesystem was fine, however when grown onto non-zeroed space the filesystem produced would appear to have massive corruption on the first fsck after growing. A test case for this problem can be found in the PR audit trail. Fix this by once again initialising all inodes in the UFS1 case. PR: bin/115174 Submitted by: "Nate Eldredge" <nge cs.hmc.edu> Reviewed by: mjacob Modified: stable/6/sbin/growfs/growfs.c Directory Properties: stable/6/sbin/growfs/ (props changed) Modified: stable/6/sbin/growfs/growfs.c ============================================================================== --- stable/6/sbin/growfs/growfs.c Wed Mar 17 20:30:37 2010 (r205260) +++ stable/6/sbin/growfs/growfs.c Wed Mar 17 20:32:13 2010 (r205261) @@ -376,7 +376,6 @@ initcg(int cylno, time_t utime, int fso, long d, dlower, dupper, blkno, start; ufs2_daddr_t i, cbase, dmax; struct ufs1_dinode *dp1; - struct ufs2_dinode *dp2; struct csum *cs; if (iobuf == NULL && (iobuf = malloc(sblock.fs_bsize)) == NULL) { @@ -446,25 +445,18 @@ initcg(int cylno, time_t utime, int fso, acg.cg_cs.cs_nifree--; } /* - * XXX Newfs writes out two blocks of initialized inodes - * unconditionally. Should we check here to make sure that they - * were actually written? + * For the old file system, we have to initialize all the inodes. */ if (sblock.fs_magic == FS_UFS1_MAGIC) { bzero(iobuf, sblock.fs_bsize); - for (i = 2 * sblock.fs_frag; i < sblock.fs_ipg / INOPF(&sblock); + for (i = 0; i < sblock.fs_ipg / INOPF(&sblock); i += sblock.fs_frag) { dp1 = (struct ufs1_dinode *)iobuf; - dp2 = (struct ufs2_dinode *)iobuf; #ifdef FSIRAND - for (j = 0; j < INOPB(&sblock); j++) - if (sblock.fs_magic == FS_UFS1_MAGIC) { - dp1->di_gen = random(); - dp1++; - } else { - dp2->di_gen = random(); - dp2++; - } + for (j = 0; j < INOPB(&sblock); j++) { + dp1->di_gen = random(); + dp1++; + } #endif wtfs(fsbtodb(&sblock, cgimin(&sblock, cylno) + i), sblock.fs_bsize, iobuf, fso, Nflag); _______________________________________________ svn-src-all@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-all To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"