Bug 225960

Summary: zfs: g_access leak when unmounting UFS on a zvol
Product: Base System Reporter: Alan Somers <asomers>
Component: kernAssignee: Andriy Gapon <avg>
Status: Closed FIXED    
Severity: Affects Some People CC: avg, daveb, fs
Priority: ---    
Version: CURRENT   
Hardware: Any   
OS: Any   
Attachments:
Description Flags
possible solution none

Description Alan Somers freebsd_committer freebsd_triage 2018-02-16 20:35:13 UTC
Sometimes a g_access leak can be triggered by creating a UFS filesystem on a zvol, and then unmounting it.  The result is a zvol whose geom mode is r0w0e2, even though it has no consumers.  The zvol cannot be destroyed, and neither can its parent pool.  Rebooting solves the problem.  While not 100% reproducible, this problem is frequently reproduced by the sys/cddl/zfs/tests/cli_root/zfs_copies/zfs_copies_test:zfs_copies_001_pos test from the ZFS test suite.
Comment 1 commit-hook freebsd_committer freebsd_triage 2018-02-16 20:40:28 UTC
A commit references this bug:

Author: asomers
Date: Fri Feb 16 20:40:05 UTC 2018
New revision: 329408
URL: https://svnweb.freebsd.org/changeset/base/329408

Log:
  Skip zfs_copies_006_pos

  It frequently triggers a g_access leak, creating a non-destroyable pool

  PR:		225960
  Sponsored by:	Spectra Logic Corp

Changes:
  projects/zfsd/head/tests/sys/cddl/zfs/tests/cli_root/zfs_copies/zfs_copies_test.sh
Comment 2 Andriy Gapon freebsd_committer freebsd_triage 2018-02-16 21:03:05 UTC
I've just seen this.  I tried to zfs destroy a volume with a mounted (ro) UFS filesystem on it.  The destroy failed with EBUSY, as expected.  But after I unmounted the filesystem the destroy would still fail in the same way.
Comment 3 Andriy Gapon freebsd_committer freebsd_triage 2018-02-16 21:17:44 UTC
Could be a g_label problem though:
    <geom id="0xfffff80023b33200">
      <class ref="0xffffffff80d86a90"/>
      <name>zvol/pond/zvol2s1</name>
      <rank>3</rank>
      <config>
      </config>
        <consumer id="0xfffff8002215e300">
          <geom ref="0xfffff80023b33200"/>
          <provider ref="0xfffff8002291ad00"/>
          <mode>r0w0e2</mode>
          <config>
          </config>
        </consumer>
        <provider id="0xfffff80023b33100">
          <geom ref="0xfffff80023b33200"/>
          <mode>r0w0e0</mode>
          <name>ufsid/4e6d250390a79ac0</name>
          <mediasize>1073705472</mediasize>
          <sectorsize>512</sectorsize>
          <stripesize>4096</stripesize>
          <stripeoffset>3584</stripeoffset>
          <config>
            <index>0</index>
            <length>1073705472</length>
            <seclength>2097081</seclength>
            <offset>0</offset>
            <secoffset>0</secoffset>
          </config>
        </provider>
    </geom>

I don't think that the consumer should have r0w0e2 mode while the provider has r0w0e0.  The class is LABEL and apparently it's the ufsid variety.
Comment 4 Alan Somers freebsd_committer freebsd_triage 2018-02-16 21:42:45 UTC
Yeah, and kern.geom.label.ufsid.enable=0 makes the problem go away.
Comment 5 Andriy Gapon freebsd_committer freebsd_triage 2018-02-16 21:44:37 UTC
I now think that this could be a / the deep problem in GEOM.
The problem is that g_access() must be called with the GEOM topology lock held.
And that gives a false impression that the lock is indeed held across the call.
But this isn't true because many classes, ZVOL being one of the many, need to perform I/O in their access method.  So, they must drop and pick up the topology lock.
That, of course, can break many assumptions.

Specifically, looking at g_slice_access() we can see that the code assumes that all invocations are serialized.  Indeed, if another call to g_slice_access() is permitted while the consumer's access bits are not updated yet, then the following condition can be true multiple times:

        /* On first open, grab an extra "exclusive" bit */
        if (cp->acr == 0 && cp->acw == 0 && cp->ace == 0)
                de++;

And if that ever happens and we have extra grabs on cp->ace, then the following condition will never be true (because cp->ace + de > 1):

        /* ... and let go of it on last close */
        if ((cp->acr + dr) == 0 && (cp->acw + dw) == 0 && (cp->ace + de) == 1)
                de--;
Comment 6 Andriy Gapon freebsd_committer freebsd_triage 2018-02-19 15:00:01 UTC
Created attachment 190802 [details]
possible solution

I have come up with a possible solution, haven't tested it yet.
Perhaps there is a better way to fix the issue in GEOM core code, but I couldn'r think of one.
Comment 7 Alan Somers freebsd_committer freebsd_triage 2018-02-19 18:13:09 UTC
Your patch works for me.
Comment 8 Andriy Gapon freebsd_committer freebsd_triage 2018-02-21 11:52:07 UTC
I created a review request for this https://reviews.freebsd.org/D14458

Maybe there is a smarter way to tackle the problem, but I couldn't come up with anything else...
Comment 9 Andriy Gapon freebsd_committer freebsd_triage 2018-02-21 12:21:49 UTC
(In reply to Andriy Gapon from comment #8)
FWIW, I have never been able to reproduce the original problem.
Comment 10 Alan Somers freebsd_committer freebsd_triage 2018-02-21 16:04:41 UTC
avg, I made a mistake with my reproduction instructions.  You should run zfs_copies_006_pos, not 001_pos.  And zfsd must be running at the same time.

sudo service zfsd onestart
cd /usr/tests/sys/cddl/zfs/tests/cli_root/zfs_copies
sudo kyua debug zfs_copies_test:zfs_copies_006_pos
Comment 11 Andriy Gapon freebsd_committer freebsd_triage 2018-02-22 10:29:35 UTC
(In reply to Alan Somers from comment #10)
I figured out the correct test name from your commit to disable it, so I was already using it.  I wasn't running zfsd, I am doing doing that now.  But still can't reproduce.  Maybe because of my GEOM configuration where I use multiple slices of a single disk to emulate multiple disks.
Comment 12 Dave Baukus 2018-02-22 14:13:02 UTC
(In reply to Andriy Gapon from comment #11)
Do you have Alan's latest zfsd changes ? this is what exasperated the condition.
Comment 13 Alan Somers freebsd_committer freebsd_triage 2018-02-22 15:42:25 UTC
(In reply to Dave Baukus from comment #12)

Specifically, the problem should be easiest to reproduce with zfsd built from at least revision 329284 but less than 329344.
Comment 14 Andriy Gapon freebsd_committer freebsd_triage 2018-03-04 13:56:34 UTC
A more generic patch for the problem: https://reviews.freebsd.org/D14533
Comment 15 commit-hook freebsd_committer freebsd_triage 2018-03-15 09:17:10 UTC
A commit references this bug:

Author: avg
Date: Thu Mar 15 09:16:11 UTC 2018
New revision: 330977
URL: https://svnweb.freebsd.org/changeset/base/330977

Log:
  g_access: deal with races created by geoms that drop the topology lock

  The problem is that g_access() must be called with the GEOM topology
  lock held.  And that gives a false impression that the lock is indeed
  held across the call.  But this isn't always true because many classes,
  ZVOL being one of the many, need to drop the lock.  It's either to
  perform an I/O on the first open or to acquire a different lock (like in
  g_mirror_access).

  That, of course, can break many assumptions.  For example,
  g_slice_access() adds an extra exclusive count on the first open. As
  described above, an underlying geom may drop the topology lock and that
  would open a race with another thread that would also request another
  extra exclusive count.  In general, two consumers may be granted
  incompatible accesses.

  To avoid this problem the code is changed to mark a geom with special
  flag before calling its access method and clear the flag afterwards.  If
  another thread sees that flag, then it means that the topology lock has
  been dropped (either by the geom in question or downstream from it), so
  it is not safe to make another access call.  So, the second thread would
  use g_topology_sleep() to wait until the flag is cleared and only then
  would it proceed with the access.

  Also see http://docs.freebsd.org/cgi/mid.cgi?809d9254-ee56-59d8-69a4-08838e985cea

  PR:		225960
  Reported by:	asomers
  Reviewed by:	markj, mav
  MFC after:	3 weeks
  Differential Revision: https://reviews.freebsd.org/D14533

Changes:
  head/sys/geom/geom.h
  head/sys/geom/geom_subr.c
Comment 16 commit-hook freebsd_committer freebsd_triage 2018-03-15 09:28:23 UTC
A commit references this bug:

Author: avg
Date: Thu Mar 15 09:28:11 UTC 2018
New revision: 330979
URL: https://svnweb.freebsd.org/changeset/base/330979

Log:
  re-enable zfs_copies_006_pos test after a fix in r330977

  The test was disabled in r329408.

  PR:		225960

Changes:
  head/tests/sys/cddl/zfs/tests/cli_root/zfs_copies/zfs_copies_test.sh
Comment 17 commit-hook freebsd_committer freebsd_triage 2018-04-06 12:14:16 UTC
A commit references this bug:

Author: avg
Date: Fri Apr  6 12:13:32 UTC 2018
New revision: 332095
URL: https://svnweb.freebsd.org/changeset/base/332095

Log:
  MFC r330977: g_access: deal with races created by geoms that drop the topology lock

  PR:		225960

Changes:
_U  stable/11/
  stable/11/sys/geom/geom.h
  stable/11/sys/geom/geom_subr.c
Comment 18 commit-hook freebsd_committer freebsd_triage 2018-04-06 12:24:28 UTC
A commit references this bug:

Author: avg
Date: Fri Apr  6 12:24:00 UTC 2018
New revision: 332096
URL: https://svnweb.freebsd.org/changeset/base/332096

Log:
  MFC r330977: g_access: deal with races created by geoms that drop the topology lock

  PR:		225960

Changes:
_U  stable/10/
  stable/10/sys/geom/geom.h
  stable/10/sys/geom/geom_subr.c