Bug 209096 - zfsroot bricked on 10.3-RELEASE
Summary: zfsroot bricked on 10.3-RELEASE
Status: Closed (Works As Intended)
Alias: None
Product: Base System
Classification: Unclassified
Component: misc
Version: 10.3-RELEASE
Hardware: Any, OS: Any
Importance: Normal, Affects Many People
Assignee: freebsd-fs (Nobody)
URL:
Keywords: needs-patch, needs-qa
Depends on:
Blocks:
 
Reported: 2016-04-27 12:41 UTC by Daniel Ylitalo
Modified: 2016-08-03 22:08 UTC
CC List: 4 users

See Also:
koobs: mfc-stable11?
koobs: mfc-stable10?


Attachments
screenshot of boot error (182.22 KB, image/jpeg), 2016-08-02 11:27 UTC, Daniel Ylitalo
zdb -C output (87.70 KB, image/jpeg), 2016-08-02 12:41 UTC, Daniel Ylitalo
lsdev -v not found (60.53 KB, image/jpeg), 2016-08-02 12:46 UTC, Daniel Ylitalo
zdb mos output (139.93 KB, image/jpeg), 2016-08-03 07:58 UTC, Daniel Ylitalo
zfsboottest output (108.64 KB, image/jpeg), 2016-08-03 08:05 UTC, Daniel Ylitalo
Working boot lsdev -v (114.25 KB, image/jpeg), 2016-08-03 21:45 UTC, Daniel Ylitalo

Description Daniel Ylitalo 2016-04-27 12:41:17 UTC
Hi,

I just tried going to 10.3-RELEASE from 10.2-RELEASE-p14

I'm running a zfsroot system in a single disk installed with default options from the 10.2-RELEASE installer.

When I ran freebsd-update -r 10.3-RELEASE and rebooted after the kernel components were installed, I was shown this error:
ZFS: i/o error all block copies unavailable

I was then dropped to the boot loader prompt:
boot:

Entering ? there showed the boot directory along with all the other usual directories.

The only way I was able to recover from this was to boot the 10.2-RELEASE live CD and issue the following:
zpool import -R /mnt -f zroot   # force-import the pool under /mnt
mv /mnt/boot /mnt/boot.bricked   # set the broken /boot aside
cp -Rp /boot /mnt/boot   # replace it with the live CD's /boot
cp /mnt/boot.bricked/loader.conf /mnt/boot/loader.conf   # restore the local loader settings
reboot

Trying the above commands with the 10.3-RELEASE live CD ended with the same result as the freebsd-update upgrade (dropped to boot:).

My ZFS root looks like this:
===================================================================
[root@content-01-group-01-stockholm-se ~]# gpart show
=>         34  11710627773  mfid0  GPT  (5.5T)
           34            6         - free -  (3.0K)
           40         1024      1  freebsd-boot  (512K)
         1064          984         - free -  (492K)
         2048      8388608      2  freebsd-swap  (4.0G)
      8390656  11702235136      3  freebsd-zfs  (5.4T)
  11710625792         2015         - free -  (1.0M)
===================================================================
[root@content-01-group-01-stockholm-se ~]# zpool status
  pool: zroot
 state: ONLINE
  scan: scrub repaired 0 in 4h29m with 0 errors on Tue Apr 26 23:44:35 2016
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       ONLINE       0     0     0
          mfid0p3   ONLINE       0     0     0

errors: No known data errors
Comment 1 Steven Hartland freebsd_committer freebsd_triage 2016-07-10 15:46:40 UTC
Can you check the messages during boot with the new kernel to see whether mfid0 gets created?
Comment 2 Daniel Ylitalo 2016-07-11 10:55:54 UTC
Hmm, is there any way to enable verbose booting or something?

Or where would I look for this message?

The boot loader outputs nothing before the "ZFS: i/o error all block copies unavailable" message, and then I'm dropped to boot:

(This is before the kernel is loaded.)
Comment 3 Steven Hartland freebsd_committer freebsd_triage 2016-07-11 11:45:26 UTC
So this was a loader message, not a kernel message?

If so, can you confirm which boot method you're using? I'm guessing BIOS, not EFI.

If so, did you update your boot blocks? E.g.:
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 mfid0
Comment 4 Daniel Ylitalo 2016-07-11 14:25:26 UTC
No, I haven't tried updating the bootcode; should that be necessary between release upgrades?

(10.2 is running fine.)

If you want, I can try that from the 10.3 live CD in about two weeks; the failover server for that cluster is currently offline, so I can't take this server down to test until it's back online.
Comment 5 Daniel Ylitalo 2016-07-11 14:26:56 UTC
Sorry, I missed your first question: yes, BIOS boot, not EFI.
Comment 6 Steven Hartland freebsd_committer freebsd_triage 2016-07-11 14:27:32 UTC
Yep, give that a go.
Comment 7 Daniel Ylitalo 2016-08-02 11:26:39 UTC
That kind of made things worse; now it won't boot even if I copy back the /boot folder from the live CD.

I can still import the zroot pool from both the 10.2 and 10.3 live CDs.

I'm also attaching a screenshot in case it tells you anything.
Comment 8 Daniel Ylitalo 2016-08-02 11:27:11 UTC
Created attachment 173189 [details]
screenshot of boot error
Comment 9 Steven Hartland freebsd_committer freebsd_triage 2016-08-02 11:46:33 UTC
Is your pool configured with blocks > 128K? Booting from such blocks isn't supported.
Comment 10 Daniel Ylitalo 2016-08-02 11:50:55 UTC
How can I see that?

This pool was created by the 10.2 installer using the zfsroot option, so it shouldn't contain anything unsupported.
Comment 11 Steven Hartland freebsd_committer freebsd_triage 2016-08-02 12:21:10 UTC
The following should show you:
zpool get feature@large_blocks

Note that setting a recordsize larger than 128K on a dataset is what moves this feature from enabled to active.

The message in your screenshot may be a red herring, but it's worth checking.
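For reference, a minimal sketch of both checks (assuming the pool name zroot from this report):
zpool get feature@large_blocks zroot   # "active" means large blocks are actually in use
zfs get -r recordsize zroot            # any dataset with recordsize > 128K would be a problem for booting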
Comment 12 Steven Hartland freebsd_committer freebsd_triage 2016-08-02 12:27:00 UTC
Also, what's the output of:
zdb -C
and:
zpool status

For zdb, if you're booted from alternative media, you may need to use -U to make sure it uses the cache file from your main pool.
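For example, with the pool imported under /mnt from a live CD, something like this (the cache file path is an assumption based on the default layout):
zdb -U /mnt/boot/zfs/zpool.cache -C zroot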
Comment 13 Andriy Gapon freebsd_committer freebsd_triage 2016-08-02 12:32:56 UTC
Can you also capture the full output of the lsdev -v command issued at the loader boot prompt? Your problem may have to do with the firmware reporting a wrong value for the disk size (especially given that your disk seems to be larger than 2 TB).
Comment 14 Daniel Ylitalo 2016-08-02 12:35:29 UTC
This is what it shows:

NAME    PROPERTY                 VALUE              SOURCE
zroot   feature@large_blocks     enabled            local
Comment 15 Daniel Ylitalo 2016-08-02 12:41:07 UTC
Created attachment 173192 [details]
zdb -C output
Comment 16 Daniel Ylitalo 2016-08-02 12:46:13 UTC
Hmm, the boot loader won't let me run lsdev; attaching a screenshot.
Comment 17 Daniel Ylitalo 2016-08-02 12:46:34 UTC
Created attachment 173193 [details]
lsdev -v not found
Comment 18 Daniel Ylitalo 2016-08-02 13:00:18 UTC
The zpool status output is in my initial post.
Comment 19 Steven Hartland freebsd_committer freebsd_triage 2016-08-02 13:06:03 UTC
What's the content of /boot/loader.conf on the pool?
Comment 20 Daniel Ylitalo 2016-08-02 13:19:22 UTC
Here is the loader.conf:

kern.geom.label.gptid.enable="0"
zfs_load="YES"
vfs.zfs.arc_max="4294967296"

net.inet.tcp.tcbhashsize="32768"
net.inet.tcp.hostcache.bucketlimit="128"
net.inet.tcp.hostcache.hashsize="16384"
net.inet.tcp.syncache.hashsize="16384"
net.inet.tcp.syncache.bucketlimit="128"
net.link.ifqmaxlen="2048"
Comment 21 Daniel Ylitalo 2016-08-02 13:47:25 UTC
Any suggestions? :)

Or am I better off just reinstalling this machine tomorrow with regular UFS, since it's a single drive anyway?
Comment 22 Andriy Gapon freebsd_committer freebsd_triage 2016-08-02 14:07:04 UTC
(In reply to Daniel Ylitalo from comment #17)
Ah, sorry, you are not even getting to the loader, which does support lsdev.
You are stuck at the zfsboot stage.
Comment 23 Andriy Gapon freebsd_committer freebsd_triage 2016-08-02 14:23:08 UTC
(In reply to Daniel Ylitalo from comment #21)

You can check the current location of zroot's MOS (meta-object set) like this:
zdb -dddd zroot 0
If this command fails, try prepending -e before the other options.
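For example (assuming the pool is not currently imported, which is the case -e handles):
zdb -e -dddd zroot 0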

You can also check the pool with the zfsboottest tool if you have a /usr/src checkout:
cd /usr/src/tools/tools/zfsboottest/
make
./zfsboottest.sh zroot
Comment 24 Daniel Ylitalo 2016-08-03 07:58:43 UTC
Created attachment 173211 [details]
zdb mos output

I have no idea what the output is telling me, but it looks good at least :)
Comment 25 Daniel Ylitalo 2016-08-03 08:05:53 UTC
Created attachment 173212 [details]
zfsboottest output

zfsboottest says everything is OK :(
Comment 26 Daniel Ylitalo 2016-08-03 08:18:50 UTC
Do any of you have a verbose version of the boot loader I can install so we can see what's going on?
Comment 27 Andriy Gapon freebsd_committer freebsd_triage 2016-08-03 12:17:36 UTC
(In reply to Daniel Ylitalo from comment #26)
Based on the latest information you have provided and the history of your problem, I think the problem is caused by the BIOS or other firmware components in your system.

So, the next step would be to check whether there are newer versions of the BIOS and the disk controller firmware, and see if upgrading fixes the issue.

If that does not help, or just as an alternative solution, you can split your freebsd-zfs partition in two such that the first one does not cross the 2 TiB boundary (measured from the start of the disk). Then you can create a boot/root pool on top of that partition. You were prepared to go back to UFS, so this would let you keep using ZFS, at the cost of moving data; see the sketch below.
BTW, if you go with this solution and you get a bootable system, I would still like to see the output of lsdev -v from the loader prompt (which you get by selecting the "escape to loader prompt" option in the loader menu).
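For illustration, a rough repartitioning sketch along those lines (the device name mfid0, the partition indexes, and the 1990G size are assumptions based on the gpart output above; adjust to your layout, and back up the data first):
gpart delete -i 3 mfid0                        # remove the existing 5.4T freebsd-zfs partition
gpart add -t freebsd-zfs -s 1990G -i 3 mfid0   # boot/root partition, ends below the 2 TiB boundary
gpart add -t freebsd-zfs -i 4 mfid0            # data partition, the rest of the disk
zpool create zroot /dev/mfid0p3                # bootable root pool
zpool create zdata /dev/mfid0p4                # separate pool for bulk data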
Comment 28 Daniel Ylitalo 2016-08-03 13:31:04 UTC
I'll do the split setup just for testing; the server is going to be reinstalled anyhow, so I can reinstall it twice :P

Strange that the zroot works fine on 10.2 but not on 10.3, though.
Comment 29 karl 2016-08-03 13:34:14 UTC
(In reply to Daniel Ylitalo from comment #28)

Not really, if the disk is > 2 TB. On the original install the kernel was likely within the first 2 TB, but once you start filling the disk, the newly written /boot/kernel directory may land in blocks beyond that boundary, and then it's unreadable at boot time.

This can happen on a UFS disk too; I'm not sure that's what's going on here, but it's certainly possible.
Comment 30 Daniel Ylitalo 2016-08-03 14:34:21 UTC
Hehe, good to know. If that's the case, I'll stop updating our content servers that don't run FreeBSD on a SATA DOM :)
Comment 31 karl 2016-08-03 14:47:53 UTC
(In reply to Daniel Ylitalo from comment #30)

This has ALWAYS been dangerous on a disk that is > 2 TB; the solution is to place boot (or boot/root, if they're on the same filesystem) within the first 2 TB and *not* allow that filesystem to cross the 2 TB boundary. This is an especially nasty problem because everything appears to be OK, and is for a while, right up until an update allocates space beyond the boundary; then you are suddenly hosed, as the machine will not boot.

It is safe to have root/boot on a disk larger than 2 TB, but *not* to let the size of that filesystem extend across that boundary.
Comment 32 Daniel Ylitalo 2016-08-03 21:43:36 UTC
It looks very much like the "outside 2 TB" issue: I just wiped the existing partitions, did a zfsroot reinstall with the 10.3 installer, and it booted up fine.

Sorry for hassling you guys about it.
Comment 33 Daniel Ylitalo 2016-08-03 21:45:43 UTC
Created attachment 173256 [details]
Working boot lsdev -v

Attaching the lsdev -v output of a working boot, as requested.
Comment 34 Steven Hartland freebsd_committer freebsd_triage 2016-08-03 21:46:15 UTC
Thanks for confirming. I think we should consider adding checks/warnings for this, as the current messages are particularly poor at helping to identify the real issue.
Comment 35 Andriy Gapon freebsd_committer freebsd_triage 2016-08-03 21:53:47 UTC
(In reply to Daniel Ylitalo from comment #33)
Hmm, so it seems that lsdev -v does not report the disk size as I expected.
So it's not that useful :-(

BTW, it seems that you are still using a single huge partition for ZFS...
If you split it into an under-2 TB partition and an above-2 TB partition, you could easily test the theory by making the > 2 TB partition a root ZFS pool and checking whether you can boot from it.
Comment 36 Daniel Ylitalo 2016-08-03 22:08:32 UTC
(In reply to Andriy Gapon from comment #35)

Yeah, I couldn't figure out how to split the partitions in the installer, so I just went with the full-disk install to see if it would even work, and it did.

I've put a 32 GB SATA DOM in the server to install FreeBSD on, just to keep things simple (that's how our newer servers operate anyhow). I'll go with regular UFS on the large mfi array, as it's already behind a RAID controller; we were just testing zfsroot to see how well it worked on that server.

But if you really want me to, I can give it a go if you give me a few pointers on how to create the two partitions and then tell the installer to install on the second one.