Bug 253471 - Change 348902 : vdev_geom_attach_by_guids() breaks other use cases
Summary: Change 348902 : vdev_geom_attach_by_guids() breaks other use cases
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 12.2-STABLE
Hardware: Any Any
: --- Affects Some People
Assignee: freebsd-bugs (Nobody)
URL:
Keywords: cam
Depends on:
Blocks:
 
Reported: 2021-02-12 21:15 UTC by Dave Baukus
Modified: 2021-02-14 18:18 UTC (History)
4 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Dave Baukus 2021-02-12 21:15:17 UTC
Change MFC r344316, SVN Importer: FreeBSD/base rev 348906 breaks the
following scenario:

- Assume 3 blank, all ZFS label cleared, disks in the system.
- Shutdown, remove one of the blanks, put in on the shelf, reboot.
- Create a mirror pool on the 2 remaining disk, shutdown.
- Replace one of the mirror disks with the with the previously removed
  blank, put the zfs mirror member on the shelf, reboot.

All these machinations are required to ensure the blank disk enumerates
the same as the mirror disk member (has the same vdev_path).

zpool status will now report the pool as a healthy "ONLINE" with
a missing disk:

# zpool status -x
  pool: pool200
 state: ONLINE
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-4J
  scan: none requested
config:

        NAME                     STATE     READ WRITE CKSUM
        pool200                  ONLINE       0     0     0
          mirror-0               ONLINE       0     0     0
            1165392866195823403  UNAVAIL      0     0     0  was /dev/da8
            da9                  ONLINE       0     0     0


This is because change 348906 allows vdev_geom_attach_by_guids() to attach a
vdev that has a "NO_MATCH" guid but satisfies the requested vdev_path:


} else if (match == best_match) {
  /* match = NO_MATCH and best_match is initialized to NO_MATCH;
   * if the paths match we will attach it.
   */
   if (strcmp(pp->name, vdpath) == 0) {
        best_pp = pp;
   }
}


From ZFS_LOG() via vdev_geom_open_by_guids() and others:
vdev_geom_attach:240[1]: Attaching to da8.
vdev_geom_attach:309[1]: Created consumer for da8.
vdev_geom_open_by_guids:797[1]: Attach by guid [17284178551510092527:1165392866195823403] succeeded, provider da8.


Later back in vdev_validate(), this condition never gets vdev_propagate()ed
when detected because for this one error, vdev_set_state() is called with
"is_open" true:

if ((label = vdev_label_read_config(vd, txg)) == NULL) {
   vdev_set_state(vd, B_TRUE, VDEV_STATE_CANT_OPEN,
      VDEV_AUX_BAD_LABEL);
   return (0);
}

It is curious to me that for this one error path vdev_validate() calls
vdev_set_state(..., B_TRUE, ...) whereas all other are B_FALSE.


For my purposes, reverting 348906 fixes the issue, but changing 
   vdev_set_state(vd, B_TRUE, VDEV_STATE_CANT_OPEN, VDEV_AUX_BAD_LABEL);
to
   vdev_set_state(vd, B_FALSE, VDEV_STATE_CANT_OPEN, VDEV_AUX_BAD_LABEL);

also solves the problem.
Comment 1 Dave Baukus 2021-02-12 22:17:19 UTC
Note my original comment references the wrong change; I corrected the summary.
The change that introduces the issue I describe is:

348902 Not 348906

MFC r344316
SVN Importer: FreeBSD/base rev 348902
Comment 2 Dave Baukus 2021-02-12 22:31:13 UTC
A possible fix that works for PJD's scenario and the one I describe:

In vdev_geom_attach_by_guids()

} else if (best_match != NO_MATCH && match == best_match) {
    if (strcmp(pp->name, vdpath) == 0) {
       best_pp = pp;
    }
}
Comment 3 Li-Wen Hsu freebsd_committer 2021-02-14 18:18:47 UTC
CC committer of base r348902 and base r344316