Bug 192066 - sysutils/grub2 and ZFS: wrong lz4 endianness
Status: Closed FIXED
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: John Marino
 
Reported: 2014-07-23 17:34 UTC by Andrey Zholos
Modified: 2014-07-27 18:15 UTC

Attachments
fix (512 bytes, patch)
2014-07-23 17:34 UTC, Andrey Zholos

Description Andrey Zholos 2014-07-23 17:34:17 UTC
Created attachment 144914
fix

I am using GRUB to boot the kernel directly from ZFS.

Not long after an upgrade to a recent 10-stable r268881, GRUB stopped being able
to see the pool and boot. Having completed an appropriate recovery effort and
finally booting the system again, I used gdb on grub-probe to determine that the
problem was in lz4 decompression of the uberblock.

Here is the problematic code in GRUB 2.00 (with FreeBSD port patches):

grub-core/fs/zfs/zfs_lz4.c:

    #if BYTE_ORDER == BIG_ENDIAN

Apparently <sys/endian.h> isn't included, so both macros are undefined. The
preprocessor treats undefined identifiers in #if as 0, so the test becomes
0 == 0 and the code incorrectly assumes a big-endian system. Based on this
assumption it byte-swaps a 2-byte offset field in the compressed data, which
makes the data appear corrupt, and decompression fails.
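
The following is only a minimal sketch of the pitfall, not the GRUB code
itself. The names ASSUMED_BIG_ENDIAN, read_offset_as_built and demo_misread
are hypothetical and exist purely for illustration. The helper mimics what a
little-endian host ends up doing once the #if has wrongly selected the
big-endian branch.

```c
#include <assert.h>
#include <stdint.h>

/* Reproduce the situation: neither macro is defined at this point. */
#undef BYTE_ORDER
#undef BIG_ENDIAN

/* Undefined identifiers evaluate to 0 inside #if, so this is 0 == 0. */
#if BYTE_ORDER == BIG_ENDIAN
#define ASSUMED_BIG_ENDIAN 1
#else
#define ASSUMED_BIG_ENDIAN 0
#endif

/* Hypothetical helper: decode a 2-byte LZ4 match offset the way the
 * mis-detected build effectively does. LZ4 stores the offset
 * little-endian, so the swapped interpretation is wrong. */
uint16_t read_offset_as_built(const uint8_t *p)
{
#if ASSUMED_BIG_ENDIAN
    return (uint16_t)((p[0] << 8) | p[1]);   /* swapped: wrong for LZ4 */
#else
    return (uint16_t)(p[0] | (p[1] << 8));   /* correct little-endian read */
#endif
}

/* An offset of 4 is stored as bytes 04 00 but gets decoded as 0x0400. */
void demo_misread(void)
{
    const uint8_t raw[2] = { 0x04, 0x00 };
    assert(ASSUMED_BIG_ENDIAN == 1);
    assert(read_offset_as_built(raw) == 0x0400);
}
```

Compiling with -Wundef (GCC and Clang) warns about undefined identifiers in
#if and would have flagged this.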

I am not sure why this problem happened to manifest just now, since GRUB hasn't
been updated in a while, but I think the recent kernel happens to lz4-compress
the uberblock and earlier kernels happened to lzjb-compress or not compress it,
leaving the problem unnoticed.

This causes disturbing messages like "error: no such device: <pool id>." and
"lz4 decompression failed" at the GRUB prompt, and this:

# grub-probe -d /dev/gpt/mypool
grub-probe: error: unknown filesystem.

The fix is simply to add #include <sys/endian.h> at the top of zfs_lz4.c, after
which grub-probe recognizes the pool:

# grub-probe -d /dev/gpt/mypool
zfs

Note I am also using the patch from bug 188524 for the "hole_birth" feature and
I haven't enabled the "embedded_data" feature on my pool yet. A newly created
pool doesn't work in GRUB because of those feature flags, regardless of lz4.

The latest GRUB source uses grub_le_to_cpu16() instead of BYTE_ORDER, so the
problem should resolve itself in future versions.
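
For comparison, here is a rough sketch of reading the offset without relying
on any byte-order macro. This is not GRUB's implementation of
grub_le_to_cpu16(). The read_le16 and demo_read_le16 helpers are hypothetical
and only show the general idea of assembling the value from individual bytes,
which gives the same result on any host.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: decode a little-endian 16-bit value byte by byte.
 * No BYTE_ORDER test is needed, so a missing header cannot change the
 * result. */
uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | ((uint16_t)p[1] << 8));
}

/* Bytes 34 12 on disk always decode to 0x1234. */
void demo_read_le16(void)
{
    const uint8_t raw[2] = { 0x34, 0x12 };
    assert(read_le16(raw) == 0x1234);
}
```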
Comment 1 John Marino freebsd_committer freebsd_triage 2014-07-24 08:30:19 UTC
incredibly, grub2 is unmaintained...
Comment 2 commit-hook freebsd_committer freebsd_triage 2014-07-27 18:14:24 UTC
A commit references this bug:

Author: marino
Date: Sun Jul 27 18:13:32 UTC 2014
New revision: 363087
URL: http://svnweb.freebsd.org/changeset/ports/363087

Log:
  sysutils/grub2: Fix wrong lz4 endianness and general port cleanup

  Due to lack of inclusion of <sys/endian.h>, the lz4 code incorrectly
  assumes a big-endian system.  The resulting issues manifest as errors like
  "error: no such device: <pool id>." and "lz4 decompression failed" at the
  grub prompt.  Modify existing patch to add <sys/endian.h>.

  While here, simplify the port with OPTIONS_SUB framework and fix the
  manpage stuff on the options which apparently has been broken since
  this unmaintained port was staged.

  PR:		192066
  Submitted by:	Andrey Zholos

Changes:
  head/sysutils/grub2/Makefile
  head/sysutils/grub2/files/patch-grub-2.00-zfs-feature-flag-support
  head/sysutils/grub2/pkg-plist
Comment 3 John Marino freebsd_committer freebsd_triage 2014-07-27 18:15:35 UTC
Thanks!