Bug 144214 - zfsboot fails on gang block after upgrade to zfs v14
Summary: zfsboot fails on gang block after upgrade to zfs v14
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: Unspecified
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: Andriy Gapon
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-02-22 20:50 UTC by c.kworr
Modified: 2010-06-07 15:04 UTC
0 users

See Also:


Attachments
zfs-boot-gang.patch (549 bytes, patch)
2010-05-13 07:59 UTC, Andriy Gapon
gang.diff (1.03 KB, patch)
2010-05-27 00:02 UTC, Andriy Gapon

Description c.kworr 2010-02-22 20:50:01 UTC
The bug is hard to reproduce. All of the following must hold:

 * i386 arch,
 * `make installkernel`,
 * RELENG_8 after zfs v14 import,
 * zpool with zfs v14,
 * zfsboot installed.

The boot partition is NOT compressed. Changing the boot partition does not help, not even creating a new one.

After that, loading part of the kernel or other modules sometimes yields "ZFS: gang block detected".

Fix: 

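Bootability can be restored by copying /boot elsewhere and back (the trailing slashes make rsync copy the contents of each directory rather than nesting one directory inside the other):

rsync -lrptygoWSH --delete /boot/ /somewhere/boot/
rm -rf /boot
rsync -lrptygoWSH --delete /somewhere/boot/ /boot/

How-To-Repeat: Any `make installkernel` can trigger it.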
Comment 1 Remko Lodder freebsd_committer freebsd_triage 2010-02-24 06:48:26 UTC
Responsible Changed
From-To: freebsd-i386->freebsd-fs

Reassign to fs team
Comment 2 Andrei V. Lavreniyuk 2010-02-24 08:29:48 UTC
Hi!


Fix:

# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ad0





-- 
  Best regards, Andrei V. Lavreniyuk.
Comment 3 c.kworr 2010-02-25 22:33:23 UTC
# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada0
gpart: No such geom: ada0

Could you explain what exactly it should do? Is the idea to use gptzfsboot instead of
plain zfsboot?

-- 
Sphinx of black quartz judge my vow.
Comment 4 Andrei V. Lavreniyuk 2010-02-26 09:42:11 UTC
Hi!


http://wiki.freebsd.org/ZFS

http://blogs.freebsdish.org/lulf/2008/12/16/setting-up-a-zfs-only-system/



-- 
  Best regards, Andrei V. Lavreniyuk.
Comment 5 c.kworr 2010-02-26 09:56:19 UTC
26.02.2010 11:42, Andrei V. Lavreniyuk wrote:
> http://wiki.freebsd.org/ZFS
>
> http://blogs.freebsdish.org/lulf/2008/12/16/setting-up-a-zfs-only-system/

This does not apply to my case. I have no partitions at all; I'm booting from a
disk dedicated to ZFS. Unless I'm missing something, gpart requires a
separate partition holding gptzfsboot to work.

-- 
Sphinx of black quartz judge my vow.
Comment 6 Dan Naumov 2010-02-26 23:57:16 UTC
> I have no partitions at all and I'm booting from a ZFS dedicated disk.

Sure you do have partitions. Booting off a purely ZFS disk without a
valid partition table is not, and has never been, possible in either
FreeBSD or Solaris.

What you need to do is update your bootcode (because it has changed
for zfs v14). How you do this depends on whether you use zfsboot or
gptzfsboot. The links above describe the process.


- Sincerely,
Dan Naumov
Comment 7 Daniel Gerzo freebsd_committer freebsd_triage 2010-04-29 01:11:16 UTC
Hello,

this bug seems to still be present in stable/8. The proposed workaround
seems to work. You can find the console screenshot at
http://danger.rulez.sk/dockdrop/144214.png

-- 
S pozdravom / Best regards
   Daniel Gerzo, FreeBSD committer
Comment 8 Daniel Gerzo 2010-04-29 01:13:42 UTC
Hi,

note that the HDD was almost full (97%, ca. 2GB free) when the box
died.

-- 
S pozdravom / Best regards
   Daniel Gerzo
Comment 9 Martin Matuska freebsd_committer freebsd_triage 2010-04-29 01:16:24 UTC
This bug is still present.
We had to forcibly reboot a server with a 97% full zpool (~2GB free space)
and ran into "ZFS: gang block detected".
The workaround of re-creating and re-populating /boot worked.
Comment 10 Andriy Gapon 2010-04-29 13:10:28 UTC
Just to be on the safe side: have you guys actually updated the boot blocks on your system?
I.e. the code that runs before the loader and that resides outside of the filesystems.

-- 
Andriy Gapon
Comment 11 Andriy Gapon 2010-05-13 06:59:28 UTC
I had a private conversation with Daniel Gerzo (danger@), and neither he nor mm@
is sure that the system for which they reported the problem had the latest boot
blocks, which are supposed to actually support zfs gang blocks.

P.S. gang block support seems to have been added to stable/8 by rnoland@ on 21
Nov 2009 in r199634, so anything before that is not expected to work.

-- 
Andriy Gapon
Comment 12 Andriy Gapon 2010-05-13 07:59:00 UTC
It seems that I have been misunderstanding the problem.
"ZFS: gang block detected" won't even appear if boot code is too old.

Having briefly glanced over the code and compared it with the code in osol and
zio_gang_tree_issue(), I think the attached change (zfs-boot-gang.patch) is needed.
But I am not sure whether it is a real fix for the issue at hand.

If anyone can reproduce the problem, could you please test this change?
Thanks!

-- 
Andriy Gapon
Comment 13 Robert Noland freebsd_committer freebsd_triage 2010-05-13 14:52:14 UTC
Andriy Gapon wrote:
> It seems that I have been misunderstanding the problem.
> "ZFS: gang block detected" won't even appear if boot code is too old.
> 
> Having briefly glanced over the code and comparing it to the code in osol and in
> zio_gang_tree_issue(), I think the following change is needed.
> But I am not sure if it is a real fix for the issue at hand.
> 
> If anyone can reproduce the problem, could you please test this change?
> Thanks!

This looks sane.  I was never actually able to test it, since 
reproducing the issue is rather tricky.

robert.

Comment 14 c.kworr 2010-05-14 15:12:23 UTC
2010/5/13 Andriy Gapon <avg@icyb.net.ua>:
>
> It seems that I have been misunderstanding the problem.
> "ZFS: gang block detected" won't even appear if boot code is too old.
>
> Having briefly glanced over the code and comparing it to the code in osol and in
> zio_gang_tree_issue(), I think the following change is needed.
> But I am not sure if it is a real fix for the issue at hand.
>
> If anyone can reproduce the problem, could you please test this change?
> Thanks!

Tested it. Same problem.

1. Rebuilt and reinstalled on i386. Filled the disk up (600M free of 120G, 0.5%).
2. Immediately after the boot starts, the screen bursts into psychedelic colors
and the computer reboots.
3. Booted from ftp://ftp.freebsd.org/pub/FreeBSD/snapshots/201004/FreeBSD-8.0-STABLE-201004-i386-livefs.iso
in VirtualBox i386. Updated the boot code with dd.
4. Same as step 2, but in VirtualBox i386 the dash spins for a very long time,
then "ZFS: gang block detected" is printed and the boot hangs.
5. Booted from an amd64 install and updated the boot code with dd.
6. Booted on amd64. Immediately after the boot starts, "ZFS: gang block
detected" is printed and the boot hangs.
7. Booted from an amd64 install. Transferred /boot to another disk and back.
8. Booted on amd64. Immediately after the boot starts, "ZFS: gang block
detected" is printed and the boot hangs.
9. Booted from an amd64 install. Deleted some files (800M free; the files
were written contiguously). Transferred /boot to another disk and back.
10. Booted on amd64.

Results:
1. The patch changes something. However, zfsloader(?) still cannot be read completely.
2. The bug can happen on amd64 too. More extreme conditions may be needed(?).
3. I'll post a follow-up on booting successfully on the original i386 hardware.

-- 
Sphinx of black quartz judge my vow.
Comment 15 Andriy Gapon 2010-05-14 19:15:31 UTC
on 14/05/2010 17:12 Volodymyr Kostyrko said the following:
> 2010/5/13 Andriy Gapon <avg@icyb.net.ua>:
>> It seems that I have been misunderstanding the problem.
>> "ZFS: gang block detected" won't even appear if boot code is too old.
>>
>> Having briefly glanced over the code and comparing it to the code in osol and in
>> zio_gang_tree_issue(), I think the following change is needed.
>> But I am not sure if it is a real fix for the issue at hand.
>>
>> If anyone can reproduce the problem, could you please test this change?
>> Thanks!
> 
> Tested it. Same problem.

Sigh. I see almost no other obvious differences from other code that is
supposed to support gang blocks.

> 1. Rebuild and reinstall on i386. Filling disk up (600M free of 120G, 0.5%).
> 2. Immediately after starting boot screen bursts into psychic colors.
> Computer reboots.

With unpatched boot code I presume?

> 3. Booted from ftp://ftp.freebsd.org/pub/FreeBSD/snapshots/201004/FreeBSD-8.0-STABLE-201004-i386-livefs.iso
> in VirtualBox i386. Boot code updated with dd.

Have you updated both zfsboot and zfsloader?
Are you sure that you used patched versions? (asking just in case)

> 4. Same as p2. in vBox i386 takes looong time to rotate dash then
> spits "ZFS: gang block detected" and hangs.

Nothing else gets printed?
Asking because of this screenshot:
http://danger.rulez.sk/dockdrop/144214.png

> 5. Booted from amd64 install, updated boot code with dd.
> 6. Booted on amd64. Immediately after starting boot spits out "ZFS:
> gang block detected" and hangs.
> 7. Booted from amd64 install. /boot transferred transferred to/from other disk.
> 8. Booted on amd64. Immediately after starting boot spits out "ZFS:
> gang block detected" and hangs.

amd64 has exactly the same boot code as i386. Perhaps some differences
could arise during compilation, but even if so, they should not matter much in our
case.

> 9. Booted from amd64 install. Some files deleted (800M free, files
> were written contiguously). /boot transferred transferred to/from
> other disk.
> 10. Booted on amd64.

I am not much interested in the workarounds; if they work, fine, but we are mainly
trying to fix the boot code. Only the behavior of the installed zfsboot and
zfsloader is of interest to us.

> Results:
> 1. Patch changes something. However zfsloader(?) still can't be read completely.
> 2. Bug can happen on amd64. More extreme conditions needed(?).
> 3. I'll post a follow-up on successfully booting on original i386 hardware.

Can you please also share output of 'zfs get all' for the boot filesystem?
Thank you for your help!

And one last thing that I could think of:
--- a/sys/boot/zfs/zfsimpl.c
+++ b/sys/boot/zfs/zfsimpl.c
@@ -1001,7 +1001,7 @@ zio_read(spa_t *spa, const blkptr_t *bp, void *buf)
 		if (DVA_GET_GANG(dva)) {
 			printf("ZFS: gang block detected!\n");
 			if (zio_read_gang(spa, bp, dva, buf))
-				return (EIO);
+				continue;
 		} else {
 			vdevid = DVA_GET_VDEV(dva);
 			offset = DVA_GET_OFFSET(dva);

This should be applied in addition to the previous patch.
If this still doesn't work, then it would make sense to add printfs in various
places in the zio_read_gang() function to see what happens there.

-- 
Andriy Gapon
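To make the effect of replacing `return (EIO)` with `continue` concrete, here is a toy model in C. It is a hypothetical sketch, not the boot code: `toy_dva_t`, `toy_read_copy()` and `toy_zio_read()` are invented for illustration, and it assumes (as the `continue` at the top of the loop in zio_read() suggests) that zio_read() walks every DVA of a block pointer, i.e. every copy of the block, and only gives up when none of them can be read. It also assumes that a failed non-gang copy already fell through to the next copy before the patch.

```c
#include <errno.h>

#define TOY_NDVAS 3 /* assumed: up to three copies (DVAs) per block pointer */

/* Hypothetical stand-in for one DVA of a block pointer. */
typedef struct {
	int valid;    /* 0: empty slot, skipped */
	int is_gang;  /* 1: this copy is stored as a gang block */
	int readable; /* 1: reading this copy succeeds */
	int payload;  /* the data "stored" in this copy */
} toy_dva_t;

/* Read one copy; fails with EIO if the copy is marked unreadable. */
static int
toy_read_copy(const toy_dva_t *dva, int *out)
{
	if (!dva->readable)
		return (EIO);
	*out = dva->payload;
	return (0);
}

/*
 * Walk the copies in order and return the first one that reads cleanly.
 * old_behavior != 0 models the pre-patch code, which returned EIO as soon
 * as a gang copy failed; any other failure moves on to the next copy.
 */
int
toy_zio_read(const toy_dva_t dvas[TOY_NDVAS], int old_behavior, int *out)
{
	int i;

	for (i = 0; i < TOY_NDVAS; i++) {
		const toy_dva_t *dva = &dvas[i];

		if (!dva->valid)
			continue;
		if (toy_read_copy(dva, out) != 0) {
			if (dva->is_gang && old_behavior)
				return (EIO);
			continue;
		}
		return (0);
	}
	return (EIO); /* no copy could be read */
}
```

With a first copy that is an unreadable gang block and a second, healthy copy, the old behavior fails the whole read, while the patched behavior falls back to the second copy.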
Comment 16 Andriy Gapon freebsd_committer freebsd_triage 2010-05-27 00:02:03 UTC
Here's a new patch (gang.diff) that, I strongly believe, should fix the problem for real.
This is a "production ready" version of the patch, which drops the "ZFS: gang
block detected!" message; please keep that message in your sources during testing/verification.

Thanks!
-- 
Andriy Gapon
Comment 17 Andriy Gapon freebsd_committer freebsd_triage 2010-05-28 07:53:51 UTC
State Changed
From-To: open->analyzed

It seems that I've got interested and involved in this PR. 


Comment 18 Andriy Gapon freebsd_committer freebsd_triage 2010-05-28 07:53:51 UTC
Responsible Changed
From-To: freebsd-fs->avg

It seems that I've got interested and involved in this PR.
Comment 19 dfilter service freebsd_committer freebsd_triage 2010-05-28 08:34:31 UTC
Author: avg
Date: Fri May 28 07:34:20 2010
New Revision: 208610
URL: http://svn.freebsd.org/changeset/base/208610

Log:
  boot/zfs: fix gang block reading code
  
  - use correct size (512) while reading a gang block
  - skip holes while reading child blocks
  - advance buffer pointer while reading child blocks
  
  PR:		144214
  MFC after:	10 days

Modified:
  head/sys/boot/zfs/zfsimpl.c

Modified: head/sys/boot/zfs/zfsimpl.c
==============================================================================
--- head/sys/boot/zfs/zfsimpl.c	Fri May 28 06:49:57 2010	(r208609)
+++ head/sys/boot/zfs/zfsimpl.c	Fri May 28 07:34:20 2010	(r208610)
@@ -958,12 +958,17 @@ zio_read_gang(spa_t *spa, const blkptr_t
 			break;
 	if (!vdev || !vdev->v_read)
 		return (EIO);
-	if (vdev->v_read(vdev, bp, &zio_gb, offset, SPA_GANGBLOCKSIZE))
+	if (vdev->v_read(vdev, NULL, &zio_gb, offset, SPA_GANGBLOCKSIZE))
 		return (EIO);
 
 	for (i = 0; i < SPA_GBH_NBLKPTRS; i++) {
-		if (zio_read(spa, &zio_gb.zg_blkptr[i], buf))
+		blkptr_t *gbp = &zio_gb.zg_blkptr[i];
+
+		if (BP_IS_HOLE(gbp))
+			continue;
+		if (zio_read(spa, gbp, buf))
 			return (EIO);
+		buf = (char*)buf + BP_GET_PSIZE(gbp);
 	}
  
 	return (0);
@@ -994,9 +999,8 @@ zio_read(spa_t *spa, const blkptr_t *bp,
 			continue;
 
 		if (DVA_GET_GANG(dva)) {
-			printf("ZFS: gang block detected!\n");
 			if (zio_read_gang(spa, bp, dva, buf))
-				return (EIO); 
+				continue;
 		} else {
 			vdevid = DVA_GET_VDEV(dva);
 			offset = DVA_GET_OFFSET(dva);
_______________________________________________
svn-src-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
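The last two items in the commit log (skip holes, advance the buffer pointer) can be illustrated with a toy model of how a gang header's children are stitched back into one buffer. This is a hypothetical sketch, not the boot loader's code: the `toy_*` names, the in-memory "disk", and the use of `psize == 0` to mark a hole are assumptions; only the loop shape mirrors the committed zio_read_gang(). The size constants are likewise assumed from the ZFS on-disk format (a 512-byte gang header holding 128-byte block pointers plus a 40-byte checksum tail, hence three children). The first item, passing NULL instead of bp to v_read, presumably matters because a read given a block pointer is sized and checksummed against that pointer rather than against the 512-byte header.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Assumed on-disk sizes from the ZFS format (not taken from this PR). */
#define TOY_GANGBLOCKSIZE 512 /* SPA_GANGBLOCKSIZE: size of a gang header */
#define TOY_BLKPTR_SIZE   128 /* sizeof(blkptr_t) */
#define TOY_ECK_SIZE      40  /* checksum tail at the end of the header */
#define TOY_GBH_NBLKPTRS \
	((TOY_GANGBLOCKSIZE - TOY_ECK_SIZE) / TOY_BLKPTR_SIZE) /* == 3 */

/* Hypothetical stand-in for blkptr_t: where the data is and how big. */
typedef struct {
	size_t offset; /* offset of the child's data on the toy disk */
	size_t psize;  /* physical size; 0 marks a hole (unused slot) */
} toy_blkptr_t;

/* Hypothetical stand-in for the gang header's array of child pointers. */
typedef struct {
	toy_blkptr_t child[TOY_GBH_NBLKPTRS];
} toy_gang_header_t;

static const char toy_disk[] = "HELLO, gang block!";

/* Copy one child's data from the toy disk into buf. */
static int
toy_read(const toy_blkptr_t *bp, void *buf)
{
	if (bp->offset + bp->psize > sizeof(toy_disk))
		return (EIO);
	memcpy(buf, toy_disk + bp->offset, bp->psize);
	return (0);
}

/* Reassemble a logical block from its gang children, in order. */
int
toy_read_gang(const toy_gang_header_t *gh, void *buf)
{
	int i;

	for (i = 0; i < TOY_GBH_NBLKPTRS; i++) {
		const toy_blkptr_t *gbp = &gh->child[i];

		if (gbp->psize == 0) /* skip holes */
			continue;
		if (toy_read(gbp, buf))
			return (EIO);
		/* advance so the next child lands right after this one */
		buf = (char *)buf + gbp->psize;
	}
	return (0);
}
```

For example, with children `{0, 7}`, `{7, 11}` and a hole, `toy_read_gang()` fills a zeroed buffer with `HELLO, gang block!`. Without the `buf` advance, every child would be copied to the start of the buffer, so the second child would overwrite the first; and in the real code an empty slot would presumably be handed to zio_read() as if it pointed at data.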
Comment 20 Andriy Gapon freebsd_committer freebsd_triage 2010-05-28 08:44:31 UTC
State Changed
From-To: analyzed->patched

The fix is committed to head.
Comment 21 c.kworr 2010-05-28 22:29:52 UTC
2010/5/26 Andriy Gapon <avg@freebsd.org>:
> Here's a new patch that, as I strongly believe, should fix the problem for real.
> I am sending "production ready" version of the patch, please keep "ZFS: gang
> block detected!" message in your sources during testing/verification.

Yes, this patch works. After reinstalling the boot code, the message
"ZFS: gang block detected!" appears multiple times, but the system proceeds
with the boot sequence.

-- 
Sphinx of black quartz judge my vow.
Comment 22 dfilter service freebsd_committer freebsd_triage 2010-06-07 14:37:32 UTC
Author: avg
Date: Mon Jun  7 13:37:13 2010
New Revision: 208892
URL: http://svn.freebsd.org/changeset/base/208892

Log:
  MFC r208610: boot/zfs: fix gang block reading code
  
  - use correct size (512) while reading a gang block
  - skip holes while reading child blocks
  - advance buffer pointer while reading child blocks
  
  PR:		144214
  Approved by:	re(kib)

Modified:
  stable/8/sys/boot/zfs/zfsimpl.c
Directory Properties:
  stable/8/sys/   (props changed)
  stable/8/sys/amd64/include/xen/   (props changed)
  stable/8/sys/cddl/contrib/opensolaris/   (props changed)
  stable/8/sys/contrib/dev/acpica/   (props changed)
  stable/8/sys/contrib/pf/   (props changed)
  stable/8/sys/dev/xen/xenpci/   (props changed)
  stable/8/sys/geom/sched/   (props changed)

Modified: stable/8/sys/boot/zfs/zfsimpl.c
==============================================================================
--- stable/8/sys/boot/zfs/zfsimpl.c	Mon Jun  7 11:33:20 2010	(r208891)
+++ stable/8/sys/boot/zfs/zfsimpl.c	Mon Jun  7 13:37:13 2010	(r208892)
@@ -958,12 +958,17 @@ zio_read_gang(spa_t *spa, const blkptr_t
 			break;
 	if (!vdev || !vdev->v_read)
 		return (EIO);
-	if (vdev->v_read(vdev, bp, &zio_gb, offset, SPA_GANGBLOCKSIZE))
+	if (vdev->v_read(vdev, NULL, &zio_gb, offset, SPA_GANGBLOCKSIZE))
 		return (EIO);
 
 	for (i = 0; i < SPA_GBH_NBLKPTRS; i++) {
-		if (zio_read(spa, &zio_gb.zg_blkptr[i], buf))
+		blkptr_t *gbp = &zio_gb.zg_blkptr[i];
+
+		if (BP_IS_HOLE(gbp))
+			continue;
+		if (zio_read(spa, gbp, buf))
 			return (EIO);
+		buf = (char*)buf + BP_GET_PSIZE(gbp);
 	}
  
 	return (0);
@@ -994,9 +999,8 @@ zio_read(spa_t *spa, const blkptr_t *bp,
 			continue;
 
 		if (DVA_GET_GANG(dva)) {
-			printf("ZFS: gang block detected!\n");
 			if (zio_read_gang(spa, bp, dva, buf))
-				return (EIO); 
+				continue;
 		} else {
 			vdevid = DVA_GET_VDEV(dva);
 			offset = DVA_GET_OFFSET(dva);
Comment 23 dfilter service freebsd_committer freebsd_triage 2010-06-07 14:44:13 UTC
Author: avg
Date: Mon Jun  7 13:44:04 2010
New Revision: 208893
URL: http://svn.freebsd.org/changeset/base/208893

Log:
  MFC r208610: boot/zfs: fix gang block reading code
  
  - use correct size (512) while reading a gang block
  - skip holes while reading child blocks
  - advance buffer pointer while reading child blocks
  
  PR:		144214

Modified:
  stable/7/sys/boot/zfs/zfsimpl.c
Directory Properties:
  stable/7/sys/   (props changed)
  stable/7/sys/cddl/contrib/opensolaris/   (props changed)
  stable/7/sys/contrib/dev/acpica/   (props changed)
  stable/7/sys/contrib/pf/   (props changed)

Modified: stable/7/sys/boot/zfs/zfsimpl.c
==============================================================================
--- stable/7/sys/boot/zfs/zfsimpl.c	Mon Jun  7 13:37:13 2010	(r208892)
+++ stable/7/sys/boot/zfs/zfsimpl.c	Mon Jun  7 13:44:04 2010	(r208893)
@@ -914,12 +914,17 @@ zio_read_gang(spa_t *spa, const blkptr_t
 			break;
 	if (!vdev || !vdev->v_read)
 		return (EIO);
-	if (vdev->v_read(vdev, bp, &zio_gb, offset, SPA_GANGBLOCKSIZE))
+	if (vdev->v_read(vdev, NULL, &zio_gb, offset, SPA_GANGBLOCKSIZE))
 		return (EIO);
 
 	for (i = 0; i < SPA_GBH_NBLKPTRS; i++) {
-		if (zio_read(spa, &zio_gb.zg_blkptr[i], buf))
+		blkptr_t *gbp = &zio_gb.zg_blkptr[i];
+
+		if (BP_IS_HOLE(gbp))
+			continue;
+		if (zio_read(spa, gbp, buf))
 			return (EIO);
+		buf = (char*)buf + BP_GET_PSIZE(gbp);
 	}
  
 	return (0);
@@ -950,9 +955,8 @@ zio_read(spa_t *spa, const blkptr_t *bp,
 			continue;
 
 		if (DVA_GET_GANG(dva)) {
-			printf("ZFS: gang block detected!\n");
 			if (zio_read_gang(spa, bp, dva, buf))
-				return (EIO); 
+				continue;
 		} else {
 			vdevid = DVA_GET_VDEV(dva);
 			offset = DVA_GET_OFFSET(dva);
Comment 24 Andriy Gapon freebsd_committer freebsd_triage 2010-06-07 15:03:31 UTC
State Changed
From-To: patched->closed

This should now be resolved in all stable branches.