Bug 239243 - [zfs] regression: "zfs mount -a" mount order is incorrect became incorrect
Summary: [zfs] regression: "zfs mount -a" mount order is incorrect became incorrect
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 11.3-STABLE
Hardware: Any Any
: --- Affects Only Me
Assignee: freebsd-fs (Nobody)
URL:
Keywords: regression
Depends on:
Blocks:
 
Reported: 2019-07-16 09:41 UTC by Bertrand Petit
Modified: 2019-08-07 09:36 UTC (History)
3 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Bertrand Petit 2019-07-16 09:41:41 UTC
I just attempted an upgrade to 11.3-STABLE (svn rev 349940) and had to rollback to svn rev 339356 because zfs' are now mounted in an incorrect order.

The boot pool has the following layout:

NAME                     CANMOUNT  MOUNTPOINT
asroot                        off  none
asroot/home                    on  /home
asroot/sys                    off  none
asroot/sys/root               off  none
asroot/sys/root/default        on  /
asroot/sys/usr                 on  /usr
asroot/sys/usr/local           on  /usr/local
asroot/sys/usr/obj             on  /usr/obj
asroot/sys/usr/ports           on  /usr/ports
asroot/sys/usr/src             on  /usr/src
asroot/sys/var                 on  /var
asroot/sys/var/audit           on  /var/audit
asroot/sys/var/crash           on  /var/crash
asroot/sys/var/log             on  /var/log
asroot/sys/var/mail            on  /var/mail
asroot/sys/var/tmp             on  /var/tmp
asroot/tmp                     on  /tmp

more filesystems are mounted to /home and /usr from another pool. Please note that another zfs has the same mountpoint as asroot/sys/usr/local but it has the canmount property set to noauto.

At boot time, under r339356, all the zfs' are properly mounted in a correct order. However, under r349940, the mount order looks to be

asroot/sys/root/default
[...]
asroot/sys/usr/local
[...]
asroot/sys/usr

[I wasn't able to determine the exact mount order because "zfs mount -v -a" did not produce any output contrary to the what is claimed in the man page.]

The net effect of this regression is that /usr/local got overlaid by /usr and it is thus prevented from being accessed. The mount order is incorrect, I had to rollback.
Comment 1 Bertrand Petit 2019-07-20 09:56:23 UTC
See also https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=237397
Comment 2 fullermd 2019-07-21 17:07:32 UTC
Bug 237517 now contains a fix from ZoL.
Comment 3 Bertrand Petit 2019-07-22 15:42:35 UTC
(In reply to fullermd from comment #2)
I confirm that this patch fixes the here reported bug.
Comment 4 commit-hook freebsd_committer freebsd_triage 2019-07-26 13:12:44 UTC
A commit references this bug:

Author: bapt
Date: Fri Jul 26 13:12:33 UTC 2019
New revision: 350358
URL: https://svnweb.freebsd.org/changeset/base/350358

Log:
  Fix a bug introduced with parallel mounting of zfs

  Incorporate a fix from zol:
  https://github.com/zfsonlinux/zfs/commit/ab5036df1ccbe1b18c1ce6160b5829e8039d94ce

  commit log from upstream:
   Fix race in parallel mount's thread dispatching algorithm

  Strategy of parallel mount is as follows.

  1) Initial thread dispatching is to select sets of mount points that
   don't have dependencies on other sets, hence threads can/should run
   lock-less and shouldn't race with other threads for other sets. Each
   thread dispatched corresponds to top level directory which may or may
   not have datasets to be mounted on sub directories.

  2) Subsequent recursive thread dispatching for each thread from 1)
   is to mount datasets for each set of mount points. The mount points
   within each set have dependencies (i.e. child directories), so child
   directories are processed only after parent directory completes.

  The problem is that the initial thread dispatching in
  zfs_foreach_mountpoint() can be multi-threaded when it needs to be
  single-threaded, and this puts threads under race condition. This race
  appeared as mount/unmount issues on ZoL for ZoL having different
  timing regarding mount(2) execution due to fork(2)/exec(2) of mount(8).
  `zfs unmount -a` which expects proper mount order can't unmount if the
  mounts were reordered by the race condition.

  There are currently two known patterns of input list `handles` in
  `zfs_foreach_mountpoint(..,handles,..)` which cause the race condition.

  1) #8833 case where input is `/a /a /a/b` after sorting.
   The problem is that libzfs_path_contains() can't correctly handle an
   input list with two same top level directories.
   There is a race between two POSIX threads A and B,
    * ThreadA for "/a" for test1 and "/a/b"
    * ThreadB for "/a" for test0/a
   and in case of #8833, ThreadA won the race. Two threads were created
   because "/a" wasn't considered as `"/a" contains "/a"`.

  2) #8450 case where input is `/ /var/data /var/data/test` after sorting.
   The problem is that libzfs_path_contains() can't correctly handle an
   input list containing "/".
   There is a race between two POSIX threads A and B,
    * ThreadA for "/" and "/var/data/test"
    * ThreadB for "/var/data"
   and in case of #8450, ThreadA won the race. Two threads were created
   because "/var/data" wasn't considered as `"/" contains "/var/data"`.
   In other words, if there is (at least one) "/" in the input list,
   the initial thread dispatching must be single-threaded since every
   directory is a child of "/", meaning they all directly or indirectly
   depend on "/".

  In both cases, the first non_descendant_idx() call fails to correctly
  determine "path1-contains-path2", and as a result the initial thread
  dispatching creates another thread when it needs to be single-threaded.
  Fix a conditional in libzfs_path_contains() to consider above two.

  Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
  Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
  Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>

  PR:		237517, 237397, 239243
  Submitted by:	Matthew D. Fuller <fullermd@over-yonder.net> (by email)
  MFC after:	3 days

Changes:
  head/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_mount.c
Comment 5 commit-hook freebsd_committer freebsd_triage 2019-07-29 08:15:07 UTC
A commit references this bug:

Author: bapt
Date: Mon Jul 29 08:14:36 UTC 2019
New revision: 350401
URL: https://svnweb.freebsd.org/changeset/base/350401

Log:
  MFC r350358:

  Fix a bug introduced with parallel mounting of zfs

  Incorporate a fix from zol:
  https://github.com/zfsonlinux/zfs/commit/ab5036df1ccbe1b18c1ce6160b5829e8039d94ce

  commit log from upstream:
   Fix race in parallel mount's thread dispatching algorithm

  Strategy of parallel mount is as follows.

  1) Initial thread dispatching is to select sets of mount points that
   don't have dependencies on other sets, hence threads can/should run
   lock-less and shouldn't race with other threads for other sets. Each
   thread dispatched corresponds to top level directory which may or may
   not have datasets to be mounted on sub directories.

  2) Subsequent recursive thread dispatching for each thread from 1)
   is to mount datasets for each set of mount points. The mount points
   within each set have dependencies (i.e. child directories), so child
   directories are processed only after parent directory completes.

  The problem is that the initial thread dispatching in
  zfs_foreach_mountpoint() can be multi-threaded when it needs to be
  single-threaded, and this puts threads under race condition. This race
  appeared as mount/unmount issues on ZoL for ZoL having different
  timing regarding mount(2) execution due to fork(2)/exec(2) of mount(8).
  `zfs unmount -a` which expects proper mount order can't unmount if the
  mounts were reordered by the race condition.

  There are currently two known patterns of input list `handles` in
  `zfs_foreach_mountpoint(..,handles,..)` which cause the race condition.

  1) #8833 case where input is `/a /a /a/b` after sorting.
   The problem is that libzfs_path_contains() can't correctly handle an
   input list with two same top level directories.
   There is a race between two POSIX threads A and B,
    * ThreadA for "/a" for test1 and "/a/b"
    * ThreadB for "/a" for test0/a
   and in case of #8833, ThreadA won the race. Two threads were created
   because "/a" wasn't considered as `"/a" contains "/a"`.

  2) #8450 case where input is `/ /var/data /var/data/test` after sorting.
   The problem is that libzfs_path_contains() can't correctly handle an
   input list containing "/".
   There is a race between two POSIX threads A and B,
    * ThreadA for "/" and "/var/data/test"
    * ThreadB for "/var/data"
   and in case of #8450, ThreadA won the race. Two threads were created
   because "/var/data" wasn't considered as `"/" contains "/var/data"`.
   In other words, if there is (at least one) "/" in the input list,
   the initial thread dispatching must be single-threaded since every
   directory is a child of "/", meaning they all directly or indirectly
   depend on "/".

  In both cases, the first non_descendant_idx() call fails to correctly
  determine "path1-contains-path2", and as a result the initial thread
  dispatching creates another thread when it needs to be single-threaded.
  Fix a conditional in libzfs_path_contains() to consider above two.

  Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
  Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
  Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>

  PR:  237517, 237397, 239243
  Submitted by: Matthew D. Fuller <fullermd@over-yonder.net> (by email)

Changes:
_U  stable/12/
  stable/12/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_mount.c
Comment 6 commit-hook freebsd_committer freebsd_triage 2019-07-29 08:24:12 UTC
A commit references this bug:

Author: bapt
Date: Mon Jul 29 08:23:15 UTC 2019
New revision: 350402
URL: https://svnweb.freebsd.org/changeset/base/350402

Log:
  MFC r350358:

  Fix a bug introduced with parallel mounting of zfs

  Incorporate a fix from zol:
  https://github.com/zfsonlinux/zfs/commit/ab5036df1ccbe1b18c1ce6160b5829e8039d94ce

  commit log from upstream:
   Fix race in parallel mount's thread dispatching algorithm

  Strategy of parallel mount is as follows.

  1) Initial thread dispatching is to select sets of mount points that
   don't have dependencies on other sets, hence threads can/should run
   lock-less and shouldn't race with other threads for other sets. Each
   thread dispatched corresponds to top level directory which may or may
   not have datasets to be mounted on sub directories.

  2) Subsequent recursive thread dispatching for each thread from 1)
   is to mount datasets for each set of mount points. The mount points
   within each set have dependencies (i.e. child directories), so child
   directories are processed only after parent directory completes.

  The problem is that the initial thread dispatching in
  zfs_foreach_mountpoint() can be multi-threaded when it needs to be
  single-threaded, and this puts threads under race condition. This race
  appeared as mount/unmount issues on ZoL for ZoL having different
  timing regarding mount(2) execution due to fork(2)/exec(2) of mount(8).
  `zfs unmount -a` which expects proper mount order can't unmount if the
  mounts were reordered by the race condition.

  There are currently two known patterns of input list `handles` in
  `zfs_foreach_mountpoint(..,handles,..)` which cause the race condition.

  1) #8833 case where input is `/a /a /a/b` after sorting.
   The problem is that libzfs_path_contains() can't correctly handle an
   input list with two same top level directories.
   There is a race between two POSIX threads A and B,
    * ThreadA for "/a" for test1 and "/a/b"
    * ThreadB for "/a" for test0/a
   and in case of #8833, ThreadA won the race. Two threads were created
   because "/a" wasn't considered as `"/a" contains "/a"`.

  2) #8450 case where input is `/ /var/data /var/data/test` after sorting.
   The problem is that libzfs_path_contains() can't correctly handle an
   input list containing "/".
   There is a race between two POSIX threads A and B,
    * ThreadA for "/" and "/var/data/test"
    * ThreadB for "/var/data"
   and in case of #8450, ThreadA won the race. Two threads were created
   because "/var/data" wasn't considered as `"/" contains "/var/data"`.
   In other words, if there is (at least one) "/" in the input list,
   the initial thread dispatching must be single-threaded since every
   directory is a child of "/", meaning they all directly or indirectly
   depend on "/".

  In both cases, the first non_descendant_idx() call fails to correctly
  determine "path1-contains-path2", and as a result the initial thread
  dispatching creates another thread when it needs to be single-threaded.
  Fix a conditional in libzfs_path_contains() to consider above two.

  Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
  Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
  Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>

  PR:  237517, 237397, 239243
  Submitted by: Matthew D. Fuller <fullermd@over-yonder.net> (by email)

Changes:
_U  stable/11/
  stable/11/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_mount.c