Bug 221077 - Boot from ZFS fails following freebsd-update
Summary: Boot from ZFS fails following freebsd-update
Status: Closed Works As Intended
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 10.3-RELEASE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-fs mailing list
URL:
Keywords: regression
Depends on:
Blocks:
 
Reported: 2017-07-28 23:51 UTC by Jason W. Bacon
Modified: 2019-09-06 17:33 UTC
8 users

See Also:


Attachments
snapshot of the console with the error message (299.24 KB, image/jpeg)
2019-01-29 08:09 UTC, Laurent Frigault

Description Jason W. Bacon freebsd_committer 2017-07-28 23:51:29 UTC
This system was running without a hiccup for over a year.  After running freebsd-update fetch install today, I'm getting the following:

ZFS: i/o error - all block copies unavailable
ZFS: can't find dataset u
gptzfsboot: failed to mount default pool

Symptoms look similar to a problem I had previously solved on another system:

https://forums.freebsd.org/threads/54422/

Specifically, I had done the following a couple of times successfully, but it's not working in this case:

1) Boot from 10.2-RELEASE USB stick, CD, or DVD
2) Select Live CD from Install/Shell/Live CD menu
3) Log in as root
4) Run the following at the shell prompt:
# zpool import -R /mnt -f zroot # Probably equivalent to what you did for this purpose
# cd /mnt
# mv boot boot.orig
# mkdir boot
# cd boot.orig
# cp -Rp * /mnt/boot # Note -p to make sure permissions are correct in the new /boot
# cd /
# zpool export zroot
# reboot

Also tried reinstalling the boot block:

gpart bootcode -b /mnt/boot/pmbr -p /mnt/boot/gptzfsboot -i 1 mfid0

This is a ZFS filesystem on a hardware RAID 5, PERC R710 (LSI chipset).  From pciconf -l:

mfi0@pci0:1:0:0:        class=0x010400 card=0x1f161028 chip=0x00791000 rev=0x05 hdr=0x00
    vendor     = 'LSI Logic / Symbios Logic'
    device     = 'MegaRAID SAS 2108 [Liberator]'
    class      = mass storage
    subclass   = RAID
Comment 1 Jason W. Bacon freebsd_committer 2017-07-29 14:41:28 UTC
BTW, this is a production system that I need to bring back online ASAP.

If there is any info I can gather to help with diagnosis before I wipe it and reinstall the system, let me know.
Comment 2 Alexander Leidinger freebsd_committer 2017-07-29 16:25:56 UTC
Try loading zfsloader.old instead of zfsloader from the boot prompt (when you see the first spinner at boot, press space; you will see the path used to load zfsloader, and you can replace it by appending ".old").

Please report back if this works; it gives a hint where the problem may be.
Comment 3 Jason W. Bacon freebsd_committer 2017-07-29 18:35:05 UTC
Unfortunately, no such file.

I looked around my other systems, including some that haven't been updated in a while, and the zfsloader files are all the same:

SHA256 (/boot/zfsloader) = 442ca49cd5594c149a7b4e9683bc623de6e94c3e43ebbe6ba33d8c2f994ef160

Thanks anyway...
Comment 4 Jason W. Bacon freebsd_committer 2017-07-29 18:50:14 UTC
BTW, I forgot to mention that this is 10.3-RELEASE.

Also, not that it matters, but I misreported the PERC model.  It's actually an R700.

Lastly, not sure what this means, but the version looks weird.  Instead of 10.3-RELEASE-p20 like my other systems, I get this (after booting from a USB drive and importing zroot):

root@login:~ # chroot /zroot/
root@login.peregrine / # uname -a
FreeBSD login.peregrine.hpc.uwm.edu 10.3-RELEASE FreeBSD 10.3-RELEASE #0 r297264: Fri Mar 25 02:10:02 UTC 2016     root@releng1.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC  amd64
root@login.peregrine / # uname -v
FreeBSD 10.3-RELEASE #0 r297264: Fri Mar 25 02:10:02 UTC 2016     root@releng1.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC
Comment 5 Jason W. Bacon freebsd_committer 2017-08-02 18:27:30 UTC
Couldn't hold off on restoring this system, but I saved a copy of /boot and dd'd a copy of the boot block just in case someone wants to examine it at some point.

I have other root-on-zfs installations that are less critical, so if this happens again, we can use them for diagnosis.

For my mission-critical systems, until this issue is understood and resolved, I'm installing to UFS2 and leaving most of the disk free for a post-install ZFS pool setup, e.g.

Filesystem             Size    Used   Avail Capacity  Mounted on
/dev/mfid0p2           242G    3.7G    219G     2%    /
devfs                  1.0K    1.0K      0B   100%    /dev
procfs                 4.0K    4.0K      0B   100%    /proc
fdescfs                1.0K    1.0K      0B   100%    /dev/fd
zroot/share1           5.0T    726G    4.3T    14%    /share1
zroot/sharedapps       4.3T    5.3G    4.3T     0%    /sharedapps
zroot/tmp              4.3T     68K    4.3T     0%    /tmp
zroot/home             4.3T    7.1G    4.3T     0%    /usr/home
zroot/var              4.3T    502M    4.3T     0%    /var
zroot/var/cache        4.3T     25K    4.3T     0%    /var/cache
zroot/var/cache/pkg    4.3T    211M    4.3T     0%    /var/cache/pkg
Comment 6 Allan Jude freebsd_committer 2017-08-10 15:15:24 UTC
It is not clear: are you booting from ZFS or UFS?

It looks like your / is on UFS, in which case you should have installed gptboot, not gptzfsboot.
Comment 7 Jason W. Bacon freebsd_committer 2017-08-10 15:29:01 UTC
The problem occurred when booting from ZFS.

Since I had to get this system back online and reliable ASAP, I backed it up and reinstalled with a UFS boot partition.

I had a similar issue twice on another system booting from ZFS a couple years ago, but it was easily worked around by recreating the contents of /boot.  That little hack didn't work this time.

As I mentioned, I will continue to boot from ZFS on some of my development systems and "hope" the problem returns so we can properly diagnose it.

I'll add that the problem seems to occur after particularly heavy writes to the pool, if 3 times qualifies as a pattern.

In this case, I had just installed a few hundred ports from source.

In the previous case, I had done an rsync backup to the system from another server I was upgrading.
Comment 8 Jason W. Bacon freebsd_committer 2017-10-23 13:48:33 UTC
Just for the sake of completeness, I should mention that I've had this issue on two machines, and both were running ZFS on top of a PERC hardware RAID.  I have used root on ZFS on other systems over the past few years, where ZFS is doing the volume management (raidz2) and have not had any issues so far.
Comment 9 Jason W. Bacon freebsd_committer 2017-10-31 16:42:40 UTC
Add one more data point: I experienced the issue on a ProLiant server with hardware RAID 1 immediately after installation.  Switching to UFS2 resolved the issue.

I have still not seen a problem where ZFS is doing the volume management, so this is looking more like an issue of ZFS on top of hardware RAIDs.
Comment 10 Laurent Frigault 2019-01-28 20:12:34 UTC
I've had this issue 3 times on at least 2 different kinds of Dell servers.  Each time, it was on a GPT ZFS-boot server.
The last time was when upgrading from 11.2 to 12.0.

I reinstalled them with a small UFS boot partition.

Regards,
Comment 11 Jason W. Bacon freebsd_committer 2019-01-28 20:18:34 UTC
Were you using a ZFS filesystem on a hardware RAID?  A single disk?  Or were you using some sort of ZFS software RAID?

In every case where I experienced this issue, it was on a hardware RAID. Single disks and RAIDZs have never given me trouble.
Comment 12 Laurent Frigault 2019-01-28 20:23:15 UTC
(In reply to Jason W. Bacon from comment #11)

Each time, the server was running ZFS on a hardware RAID volume.
Comment 13 Steven Hartland freebsd_committer 2019-01-28 20:52:20 UTC
Could anyone provide partition information for an affected machine prior to fixing?
Comment 14 Jason W. Bacon freebsd_committer 2019-01-28 21:58:25 UTC
I suspect that ZFS may have a conflict with certain RAID controllers, perhaps with both trying to use the same lower blocks for volume management?  I don't know much about ZFS internals, so this is just a barely-educated guess, but every case I've seen of this problem with full details available (all of mine and a few others) has involved ZFS on a hardware RAID.

ZFS on a hardware RAID seems to work fine as long as you don't try to boot from it.  Booting from ZFS on a single disk or allowing ZFS to do volume management has never given me a problem.  I've been using it regularly as long as it has been available in FreeBSD.

If you look around you'll see other comments about why ZFS should not be used on a hardware RAID.  I haven't yet found one specifically about this issue, but they're worth reading.
Comment 15 Laurent Frigault 2019-01-28 22:27:31 UTC
(In reply to Steven Hartland from comment #13)

The partition scheme on my last server that had the issue must have looked like this:


=>        40  5859442608  mfid0  GPT  (2.7T)
          40        1024      1  freebsd-boot  (512K)
        1064    67108864      2  freebsd-swap  (32G)
    67109898      xxxxxx      3  freebsd-zfs  (2.7T)
     yyyyyyy           1         - free -  (512B)
Comment 16 Steven Hartland freebsd_committer 2019-01-28 23:52:28 UTC
(In reply to Laurent Frigault from comment #15)
Hmm, nothing odd there; even with HW RAID, the ZFS partition is far from both the start and the end of the disk, so it should be fine.
Comment 17 Steven Hartland freebsd_committer 2019-01-29 00:34:05 UTC
It's been a while since I looked at the ZFS boot code, and it seems it has been refactored a bit.

Looking at the original error, I think it's coming from zfs_mount_dataset; however, I also think the error message is being corrupted. I would have expected a number at the end of "can't find dataset" instead of a raw 'u'.

A similar issue with "failed to mount default pool": the next bit should list the pool, but you have nothing.

I'm wondering if you have some other partition which looks like a pool but isn't, and that's what it's actually trying to boot from.

What's the output of lsdev at the boot loader prompt?
Comment 18 Laurent Frigault 2019-01-29 08:09:11 UTC
Created attachment 201495 [details]
snapshot of the console with the error message

The servers are running and I can't stop them to run lsdev.
I have a screen snapshot of the console (via the Dell iDRAC virtual console) from the last time it happened.
Comment 19 Andriy Gapon freebsd_committer 2019-01-29 12:09:49 UTC
(In reply to Steven Hartland from comment #17)
I think the 'u' is a result of the code being shared between (gpt)zfsboot and zfsloader. The printf in stand/i386/boot2/boot2.c is very primitive: it simply consumes '%j' without any output, because it does not expect 'j' (or any size modifiers), and then just prints the 'u' as a literal character. It seems that printf supports only %c, %s and %u.
Comment 20 Steven Hartland freebsd_committer 2019-01-29 12:34:40 UTC
(In reply to Andriy Gapon from comment #19)
Thanks that would make sense
Comment 21 Warner Losh freebsd_committer 2019-01-29 16:18:17 UTC
Yes.  Due to the extreme space limitations of boot2, the printf in it has been shaved to the bone. boot2 doesn't support ZFS at all; it is only for UFS, since it lives in the UFS boot block area.

But for *gptboot, we use the printf in libsa/printf.c, which at least pretends to support 'j' modifiers (and the pretense appears to be quite good from my reading of it). So it's quite odd to see 'u' printed there. A quick grep of the symbols for the .o's brought in shows they are all U. There is a zfs_printf, but it just calls printf and returns 0.
Comment 22 Andriy Gapon freebsd_committer 2019-01-29 17:38:19 UTC
(In reply to Warner Losh from comment #21)
Maybe because zfsboot != gptzfsboot...
Comment 23 Warner Losh freebsd_committer 2019-01-29 18:12:53 UTC
(In reply to Andriy Gapon from comment #22)
zfsboot is also not boot2 either. It also uses libsa's printf. zfsboot doesn't use boot2's printf either.
Comment 24 martin 2019-01-30 11:11:56 UTC
(In reply to Warner Losh from comment #23)
The original report was in 10.3-RELEASE, which I think uses https://svnweb.freebsd.org/base/release/10.3.0/sys/boot/common/util.c?view=markup#l118 for printf in zfsboot in gptzfsboot (without %j support).
Comment 25 Jason W. Bacon freebsd_committer 2019-09-02 01:27:42 UTC
I think this is some sort of conflict between ZFS and certain hardware RAID controllers.  Creating separate volumes for each disk and using RAIDZ instead of installing a ZFS filesystem on top of a hardware RAID config seems to avoid the issue.
Comment 26 Miroslav Lachman 2019-09-02 10:30:08 UTC
(In reply to Jason W. Bacon from comment #25)
Why was this closed?
If ZFS stopped working after an update procedure, then there is a bug.
It is true that it is better to run a ZFS pool on top of individual disks, but it definitely should work on top of any GEOM or RAID device without this kind of failure.
I have been using ZFS on FreeBSD since the early days, always on individual disks, but this year I was forced to maintain one system with a configuration similar to the one in this PR: a ZFS pool on top of an Avago RAID card which cannot expose individual disk devices.
It is working fine now, but what will it do after a system upgrade? (running 11.2 now)
Comment 27 Jason W. Bacon freebsd_committer 2019-09-02 13:18:03 UTC
I think the proximity of the original failure to an update was a coincidence.  I've since seen a couple of systems fail randomly in the absence of updates, and one (HP ProLiant) simply would not work from the beginning with ZFS on a hardware RAID.  After configuring each disk as a separate volume in the RAID controller, everything was fine.  You're welcome to reopen the PR if you want to pursue it further.  Very few people run ZFS on top of a hardware RAID, though, so it may be difficult to support.
Comment 28 Miroslav Lachman 2019-09-02 16:49:00 UTC
(In reply to Jason W. Bacon from comment #27)
If I had a choice I would choose individual disks, but this machine is a rented one without any option to configure it differently. Tech support told us the controller does not support JBOD.

Anyway, what is the root cause of the ZFS pool failure? Are only the boot records damaged, or are the whole pool and the data on it corrupted?
Comment 29 Jason W. Bacon freebsd_committer 2019-09-06 17:33:13 UTC
Only the boot records.

You don't really need JBOD support.  Creating a separate 1-disk volume for each drive has worked for me.  It's not ideal, but it seems to be reliable.  This is what I had to do for the ProLiant server, and I've had no problems with it.