Bug 253471

Summary: Change 348902 : vdev_geom_attach_by_guids() breaks other use cases
Product: Base System Reporter: Dave Baukus <daveb>
Component: kernAssignee: freebsd-geom (Nobody) <geom>
Status: Open ---    
Severity: Affects Some People CC: daveb, grahamperrin, lwhsu, mav, pjd
Priority: --- Keywords: cam
Version: 12.2-STABLE   
Hardware: Any   
OS: Any   
URL: https://github.com/freebsd/freebsd-src/commit/11c875933712fa6c00b2896ade2c3f02a4d4e063

Description Dave Baukus 2021-02-12 21:15:17 UTC
Change MFC r344316, SVN Importer: FreeBSD/base rev 348906 breaks the
following scenario:

- Assume 3 blank, all ZFS label cleared, disks in the system.
- Shutdown, remove one of the blanks, put in on the shelf, reboot.
- Create a mirror pool on the 2 remaining disk, shutdown.
- Replace one of the mirror disks with the with the previously removed
  blank, put the zfs mirror member on the shelf, reboot.

All these machinations are required to ensure the blank disk enumerates
the same as the mirror disk member (has the same vdev_path).

zpool status will now report the pool as a healthy "ONLINE" with
a missing disk:

# zpool status -x
  pool: pool200
 state: ONLINE
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-4J
  scan: none requested
config:

        NAME                     STATE     READ WRITE CKSUM
        pool200                  ONLINE       0     0     0
          mirror-0               ONLINE       0     0     0
            1165392866195823403  UNAVAIL      0     0     0  was /dev/da8
            da9                  ONLINE       0     0     0


This is because change 348906 allows vdev_geom_attach_by_guids() to attach a
vdev that has a "NO_MATCH" guid but satisfies the requested vdev_path:


} else if (match == best_match) {
  /* match = NO_MATCH and best_match is initialized to NO_MATCH;
   * if the paths match we will attach it.
   */
   if (strcmp(pp->name, vdpath) == 0) {
        best_pp = pp;
   }
}


From ZFS_LOG() via vdev_geom_open_by_guids() and others:
vdev_geom_attach:240[1]: Attaching to da8.
vdev_geom_attach:309[1]: Created consumer for da8.
vdev_geom_open_by_guids:797[1]: Attach by guid [17284178551510092527:1165392866195823403] succeeded, provider da8.


Later back in vdev_validate(), this condition never gets vdev_propagate()ed
when detected because for this one error, vdev_set_state() is called with
"is_open" true:

if ((label = vdev_label_read_config(vd, txg)) == NULL) {
   vdev_set_state(vd, B_TRUE, VDEV_STATE_CANT_OPEN,
      VDEV_AUX_BAD_LABEL);
   return (0);
}

It is curious to me that for this one error path vdev_validate() calls
vdev_set_state(..., B_TRUE, ...) whereas all other are B_FALSE.


For my purposes, reverting 348906 fixes the issue, but changing 
   vdev_set_state(vd, B_TRUE, VDEV_STATE_CANT_OPEN, VDEV_AUX_BAD_LABEL);
to
   vdev_set_state(vd, B_FALSE, VDEV_STATE_CANT_OPEN, VDEV_AUX_BAD_LABEL);

also solves the problem.
Comment 1 Dave Baukus 2021-02-12 22:17:19 UTC
Note my original comment references the wrong change; I corrected the summary.
The change that introduces the issue I describe is:

348902 Not 348906

MFC r344316
SVN Importer: FreeBSD/base rev 348902
Comment 2 Dave Baukus 2021-02-12 22:31:13 UTC
A possible fix that works for PJD's scenario and the one I describe:

In vdev_geom_attach_by_guids()

} else if (best_match != NO_MATCH && match == best_match) {
    if (strcmp(pp->name, vdpath) == 0) {
       best_pp = pp;
    }
}
Comment 3 Li-Wen Hsu freebsd_committer freebsd_triage 2021-02-14 18:18:47 UTC
CC committer of base r348902 and base r344316
Comment 4 Dave Baukus 2021-10-28 17:21:12 UTC
Same issue exists in Stable13
Comment 5 Graham Perrin freebsd_committer freebsd_triage 2022-10-19 06:29:05 UTC
Reassignment. 

Whilst here, 

(In reply to Dave Baukus from comment #0)

> Change MFC r344316, …

– a GitHub URL for r344316.