Bug 218568

Summary: vdev_geom_attach_by_guids can attach to wrong partition
Product: Base System Reporter: Alan Somers <asomers>
Component: kern Assignee: Alan Somers <asomers>
Status: Closed FIXED    
Severity: Affects Some People Flags: asomers: mfc-stable11+
Priority: ---    
Version: CURRENT   
Hardware: Any   
OS: Any   

Description Alan Somers freebsd_committer freebsd_triage 2017-04-11 17:29:47 UTC
When opening a vdev whose path is unknown, vdev_geom must find a geom provider with a label whose guids match the desired vdev.  However, due to partitioning, it is possible that two non-synonymous providers will share two labels.  For example, if the first partition starts at the beginning of the drive, then ada0 and ada0p1 will share the first label.  More troubling, if the last partition runs to the end of the drive, then ada0p3 and ada0 will share the last label.  If vdev_geom opens ada0 when it should've opened ada0p3, then the pool won't be readable.  If it opens ada0 when it should've opened ada0p1, then it will corrupt some other partition when it writes the 3rd and 4th labels.
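
To make the overlap concrete, here is a rough sketch of where the four labels land on a provider.  It assumes the usual ZFS layout of four 256 KiB labels, two at the front and two at the end, with the provider size rounded down to a multiple of the label size.  The helper names label_offset and label_abs are made up for illustration; they are not the kernel's functions.

```c
#include <assert.h>
#include <stdint.h>

#define LABEL_SIZE  (256ULL * 1024)  /* size of one ZFS vdev label */
#define VDEV_LABELS 4                /* two at the front, two at the end */

/*
 * Byte offset of label l (0..3) relative to the start of a provider
 * that is `size` bytes long.  Hypothetical helper, for illustration only.
 */
static uint64_t
label_offset(uint64_t size, int l)
{
	assert(l >= 0 && l < VDEV_LABELS);
	/* The usable size is rounded down to a multiple of the label size. */
	uint64_t psize = size - (size % LABEL_SIZE);

	if (l < VDEV_LABELS / 2)
		return ((uint64_t)l * LABEL_SIZE);
	return (psize - (uint64_t)(VDEV_LABELS - l) * LABEL_SIZE);
}

/*
 * Absolute offset on the underlying disk of label l, for a provider
 * that begins `start` bytes into the disk.
 */
static uint64_t
label_abs(uint64_t start, uint64_t size, int l)
{
	return (start + label_offset(size, l));
}
```

With a made-up 1 GiB disk, label_abs(0, 1 GiB, 3) and label_abs(512 MiB, 512 MiB, 3) give the same byte offset, so a partition that ends at the end of the disk and the whole disk see the same last label.  Likewise any partition starting at offset 0 shares label 0 with the whole disk.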

The easiest way to observe this problem is to install ZFS to a pair of mirrored drives, using the default GPT partitioning scheme.  Power off, swap the drives, and power back on.  Depending on the order in which geom probes its providers, ZFS may or may not attach to the correct partitions.  If it doesn't, then importing the pool will fail somewhere further up the stack, and the system will be left at the mountroot> prompt.
Comment 1 Alan Somers freebsd_committer freebsd_triage 2017-04-11 17:30:23 UTC
I already have a fix for this issue in our private FreeBSD fork.  I'll try to merge it into head.
Comment 2 Alan Somers freebsd_committer freebsd_triage 2017-04-13 15:22:27 UTC
Fixed by r316760

 Fix vdev_geom_attach_by_guids for partitioned disks

  When opening a vdev whose path is unknown, vdev_geom must find a geom
  provider with a label whose guids match the desired vdev. However, due to
  partitioning, it is possible that two non-synonymous providers will share
  some labels. For example, if the first partition starts at the beginning of
  the drive, then ada0 and ada0p1 will share the first label. More troubling,
  if the last partition runs to the end of the drive, then ada0p3 and ada0
  will share the last label. If vdev_geom opens ada0 when it should've opened
  ada0p3, then the pool won't be readable. If it opens ada0 when it should've
  opened ada0p1, then it will corrupt some other partition when it writes the
  3rd and 4th labels.

  The easiest way to reproduce this problem is to install a mirrored root pool
  with the default partition layout, then swap the positions of the two boot
  drives and reboot.  Whether the bug manifests depends on the order in which
  geom lists its providers, which is arbitrary.

  Fix this situation by modifying the search algorithm to prefer geom
  providers that have all four labels intact. If no such provider exists, then
  open whichever provider has the most.

  Reviewed by:  mav
  MFC after:    3 weeks
  Sponsored by: Spectra Logic Corp
  Differential Revision:        https://reviews.freebsd.org/D10365
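
The selection rule described in the commit message can be sketched roughly as follows.  This is not the code from r316760; struct candidate and pick_provider are hypothetical names, and the sketch assumes the number of matching labels per provider has already been counted.

```c
#include <assert.h>
#include <stddef.h>

#define VDEV_LABELS 4	/* ZFS keeps four label copies per vdev */

/* Hypothetical candidate record; not the kernel's actual data structure. */
struct candidate {
	const char *name;	/* provider name, e.g. "ada0p3" */
	int matching_labels;	/* labels whose guids match, 0..VDEV_LABELS */
};

/*
 * Return the index of the provider to open, or -1 if none has a
 * matching label.  A provider with all four labels intact wins
 * immediately; otherwise the one with the most matching labels wins,
 * and ties go to the earliest candidate.
 */
static int
pick_provider(const struct candidate *c, size_t n)
{
	int best = -1;
	int best_labels = 0;

	for (size_t i = 0; i < n; i++) {
		assert(c[i].matching_labels >= 0 &&
		    c[i].matching_labels <= VDEV_LABELS);
		if (c[i].matching_labels > best_labels) {
			best = (int)i;
			best_labels = c[i].matching_labels;
			if (best_labels == VDEV_LABELS)
				break;	/* cannot do better than a full set */
		}
	}
	return (best);
}
```

For example, given candidates ada0 with 2 matching labels and ada0p3 with 4 (illustrative counts), pick_provider returns the index of ada0p3 regardless of the order in which the two are listed.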
Comment 3 Alan Somers freebsd_committer freebsd_triage 2017-05-22 15:17:42 UTC
MFCed to stable/11 by r317833