Bug 209112

Summary: /usr/sbin/jail jails fail to launch with possible race when jails mount common dir with nullfs
Product: Base System Reporter: agifford
Component: binAssignee: Jamie Gritton <jamie>
Status: Closed FIXED    
Severity: Affects Some People CC: agh, ari, djn, jamie, marcnarc, pat, qjail1
Priority: ---    
Version: 10.3-RELEASE   
Hardware: Any   
OS: Any   

Description agifford 2016-04-27 20:51:51 UTC
On a host with multiple jails (configured using /etc/jail.conf) that mount a common directory as read-only with the jail, some jails will randomly, silently fail to launch due to nullfs mounting failing (silently).  This also occurs on FreeBSD 10.2 and possibly earlier.

DETAILS:

I've got a FreeBSD 10.3 host with three jails that use nullfs to mount a common read-only base system. On reboot, only one or two of the three will start, and I cannot predict which ones. The first jail listed in /etc/jail.conf will usually launch just fine. But subsequent ones fail (and I cannot predict which ones will succedd or fail). There are NO logs indicating the reason for failure on the main system, nor in the jails' individual console log files.

To track down the problem, I added some debugging logging into the /etc/rc.d/jail script, and some exec.prestart/exec.poststart lines to my jail.conf configuration:

/etc/jail.conf:

jail1 {
  host.hostname  = "jail1.example.org";
  path  = "/usr/local/jail/jail1";
  ip4.addr  = 127.0.0.11;
  mount  = "/usr/local/jail/basejail_2016_04_19 /usr/local/jail/jail1/basejail nullfs ro 0 0";
  exec.consolelog = "/var/log/jail_${host.hostname}.log";
  exec.prestart = "/bin/sh -c 'echo PRESTART_${host.hostname} >> /tmp/DEBUG'";
  exec.poststart = "/bin/sh -c 'echo POSTSTART_${host.hostname} >> /tmp/DEBUG'";
}

jail2 {
  host.hostname  = "jail2.example.org";
  path  = "/usr/local/jail/jail2";
  ip4.addr  = 127.0.0.12;
  mount  = "/usr/local/jail/basejail_2016_04_19 /usr/local/jail/jail2/basejail nullfs ro 0 0";
  exec.consolelog = "/var/log/jail_${host.hostname}.log";
  exec.prestart = "/bin/sh -c 'echo PRESTART_${host.hostname} >> /tmp/DEBUG'";
  exec.poststart = "/bin/sh -c 'echo POSTSTART_${host.hostname} >> /tmp/DEBUG'";
}

jail3 {
  host.hostname  = "jail3.example.org";
  path  = "/usr/local/jail/jail3";
  ip4.addr  = 127.0.0.11;
  mount  = "/usr/local/jail/basejail_2016_04_19 /usr/local/jail/jail3/basejail nullfs ro 0 0";
  exec.consolelog = "/var/log/jail_${host.hostname}.log";
  exec.prestart = "/bin/sh -c 'echo PRESTART_${host.hostname} >> /tmp/DEBUG'";
  exec.poststart = "/bin/sh -c 'echo POSTSTART_${host.hostname} >> /tmp/DEBUG'";
}


To the /etc/rc.d/jail script, in the jail_start() function, in the _ALL case statement subsection, to capture the output stored in the $_tmp file on error/failure I added:

echo  "DEBUG: Contents of '$_tmp' are:" >> /tmp/DEBUG
cat $_tmp >> /tmp/DEBUG
echo  "DEBUG: END OF '$_tmp' CONTENTS" >> /tmp/DEBUG

I reboot the FreeBSD 10.3 system.  Only a SINGLE jail started, the first one. The output of /tmp/DEBUG showed me:

PRESTART_jail1
POSTSTART_jail1
DEBUG: Contents of '/tmp/jail.hyLntGie' are:
mount_nullfs: /usr/local/jail/jail2/basejail: Operation not supported by device
mount_nullfs: /usr/local/jail/jail3/basejail: Operation not supported by device
jail: jail2: /sbin/mount -t nullfs -o ro /usr/local/jail/basejail_2016_04_19 /usr/local/jail/jail2/basejail: failed
jail: jail3: /sbin/mount -t nullfs -o ro /usr/local/jail/basejail_2016_04_19 /usr/local/jail/jail3/basejail: failed
jail1: created
DEBUG: jail_start(): END OF '/tmp/jail.hyLntGie' CONTENTS

Ah ha!  The nullfs mounting failed!

BUG #1: Apparently the /usr/sbin/jail command must attempt to launch jails in parallel and there may be some file system resource that the parallel mounting of the common directory is encountering.

And unfortunately the failure was SILENT!  No logs!


BUG #2: The /etc/rc.d/jail script is NOT LOGGING the failure information!


WORKAROUND FOR THE INTERIM:

I can force /usr/sbin/jail to launch my jails sequentially by adding to each jail's /etc/jail.conf section a "depend =" line, like:

jail2 {
  ...
  depend = jail1;
  ...
}

jail3 {
  ...
  depend = jail2;
  ...
}


This strikes me as a very brittle work-around.  And if one jail fails to launch for some other reason, all subsequent jails would fail.

The best solution would be to eliminate whatever resource contention is going on here.

Google searches revealed a jail_parallel_start=NO rc.conf variable, but those appeared to be related to the /etc/rc.d/jail script doing things in parallel.  In this bug, it is the /usr/sbin/jail command executing as a single process that is likely doing things in parallel (or perhaps sequentially but quickly enough that there is some resource contention in the nullfs mounting still) unless the depend= settings are included.

Thanks for your help!
 
Aaron out.
Comment 1 Jamie Gritton freebsd_committer freebsd_triage 2016-04-29 23:24:18 UTC
There are a couple problems with the jail startup script, neither of which is exactly at the center of this complaint.

The first is the lack of logging in case of failure.  The $_tmp file is removed (after printing only the last line), which it should be preserved somehow.  I don't know if printing the whole thing out is the right answer, or appending it to some long file, but removing it is problematic.

The second is the parallel start.  There is a jail_parallel_start flag, but it's only checked when starting a subset of jails (i.e. "service jail start foo bar") and not when starting all jails.

Both of these should be fixed, but they're not the source of your problem, only something that makes it harder to solve.  Unfortunately, I wasn't able to recreate your issue, and was in fact able to create 100 jails each with a nullfs-mount directory, in a setup based on your noted jail.conf file.

Still, I have a guess: the nullfs module.  The error you're getting is ENODEV, is given when mounting a filesystem type that depends on a kernel module and loading that module fails.  Now there's some kernel locking around the module load that should make it so different mounts tying to load the same module at the same time don't run into each other, and like I mentioned it worked out for me.  But of the places where ENODEV was mentioned, that seemed the most likely.

I'll leave this bug open, but I'll only be looking at the jail part of it, not at the central issue.  That's out of my area and what I failed to duplicate.
Comment 2 Joe Barbish 2016-05-02 12:44:44 UTC
There is more that one way to code the use of nullfs common base file system jail environments. sysutils/qjail uses the mount.fstab jail.config variable that points to a file containing the fstab statement. Second thing do not use the "service jail" command to start/stop your jails. Use the native jail(8) command instead. It's bad practice to mix the use of the old way and new way of jail control. I would not call this a work-a-round for your problem, but more like the correct intended usage. As a side affect your jails will start/stop faster.
Comment 3 ari 2016-06-27 06:14:57 UTC
I'm seeing similar issues in my environment:

* FreeBSD 10.2
* about 30 jails started on boot
* usually about 4-6 of those jails fail to start
* no error logs anywhere I can find
* no nullfs or unionfs
* ZFS volume per jail

Starting up the jails individually never fails, so I'm guessing some sort of race condition. But maybe a different one to the original author of this bug.

At any rate, better logging in the jail system would be very welcome.
Comment 4 Daniel Neades 2016-07-06 12:13:37 UTC
I am also seeing this issue with FreeBSD 10.3-Release-p4 – I have six jails, and only two or three start on boot (two fail due to a dependency on another jail that fails to start). I encounter no problems when starting the jails post-boot.

I added logging to /etc/rc.d/jail in a similar manner to the original poster, and see this output:

----
mount_nullfs: /jails/pgsql/data: mount_nullfs: /jails/hg/data: Operation not supported by device
Operation not supported by device
jail: pgsql: /sbin/mount -t nullfs -o rw /data/pgsql /jails/pgsql/data: failed
jail: hg: /sbin/mount -t nullfs -o rw /data/hg /jails/hg/data: failed
jail: support: skipped
jail: logic: skipped
www-bhs-0: created
backup-bhs-0: created

...
----

Here is my jail.conf:

----
# START BLOCK: Araxis jail defaults -- DO NOT EDIT
exec.start = "/bin/sh /etc/rc";
exec.stop = "/bin/sh /etc/rc.shutdown";
exec.clean;
mount.devfs;
mount.fdescfs;
allow.noset_hostname;
host = new;

# temporarily allow during development/testing
allow.raw_sockets;
# END BLOCK: Araxis jail defaults
# START BLOCK: Araxis jail settings for jail www-bhs-0.araxis.net -- DO NOT EDIT
www-bhs-0 {
  mount.fstab = "/jails/.fstabs/www-bhs-0";
  path = "/jails/www-bhs-0";
  host.hostname = www-bhs-0.araxis.net;
  ip4.addr = "lo1|10.11.11.2/32";
  ip6.addr = "ix0|2607:5300:60:9b9c::2/64";
}
# END BLOCK: Araxis jail settings for jail www-bhs-0.araxis.net
# START BLOCK: Araxis jail settings for jail backup-bhs-0.araxis.net -- DO NOT EDIT
backup-bhs-0 {
  mount.fstab = "/jails/.fstabs/backup-bhs-0";
  path = "/jails/backup-bhs-0";
  host.hostname = backup-bhs-0.araxis.net;
  ip4.addr = "lo2|10.12.12.8/32";
  enforce_statfs = 1;
  allow.mount;
  allow.mount.zfs;
  exec.poststart = "/sbin/zfs set jailed=on zroot/bkup && /sbin/zfs jail ${name} zroot/bkup && jexec ${name} /sbin/zfs mount -a";
  exec.prestop = "/sbin/zfs unjail ${name} zroot/bkup && /sbin/zfs set jailed=off zroot/bkup";
}
# END BLOCK: Araxis jail settings for jail backup-bhs-0.araxis.net
# START BLOCK: Araxis jail settings for jail pgsql.araxis.net -- DO NOT EDIT
pgsql {
  mount.fstab = "/jails/.fstabs/pgsql";
  path = "/jails/pgsql";
  host.hostname = pgsql.araxis.net;
  ip4.addr = "lo2|10.12.12.4/32";
  allow.sysvipc;
}
# END BLOCK: Araxis jail settings for jail pgsql.araxis.net
# START BLOCK: Araxis jail settings for jail support.araxis.net -- DO NOT EDIT
support {
  mount.fstab = "/jails/.fstabs/support";
  path = "/jails/support";
  host.hostname = support.araxis.net;
  ip4.addr = "lo2|10.12.12.5/32";
  depend = pgsql;
}
# END BLOCK: Araxis jail settings for jail support.araxis.net
# START BLOCK: Araxis jail settings for jail logic.araxis.com -- DO NOT EDIT
logic {
  mount.fstab = "/jails/.fstabs/logic";
  path = "/jails/logic";
  host.hostname = logic.araxis.com;
  ip4.addr = "lo2|10.12.12.6/32";
  depend = pgsql;
}
# END BLOCK: Araxis jail settings for jail logic.araxis.com
# START BLOCK: Araxis jail settings for jail hg.araxis.net -- DO NOT EDIT
hg {
  mount.fstab = "/jails/.fstabs/hg";
  path = "/jails/hg";
  host.hostname = hg.araxis.net;
  ip4.addr = "lo2|10.12.12.3/32";
}
# END BLOCK: Araxis jail settings for jail hg.araxis.net
----

The various fstab files are nearly identical. Here’s the one (/jails/.fstabs/pgsql) for the pgsql jail, which always fails to start on boot:

----
# Device    Mountpoint    FStype    Options   Dump    Pass#
/data/pgsql   /jails/pgsql/data    nullfs    rw    0   0
----
Comment 5 Daniel Neades 2016-07-06 12:20:03 UTC
I should also like to add my voice to those requesting logging of boot-time jail startup problems. The lack of any diagnostics cost me several hours of time in tracking down this problem.
Comment 6 Daniel Neades 2016-07-06 12:57:08 UTC
I have what seems (on first try, at least) to be a viable alternative workaround that is (somewhat) less icky than adding artificial dependencies between jails. Simply add the following two lines to /etc/rc.conf (or /etc/rc.conf.d/jail):

jail_parallel_start="NO"
jail_list="list of all jails to start"

Specifying the jail_list explicitly means that the jail_parallel_start setting can take effect, since the default _ALL case in jail_start() (which ignores jail_parallel_start) is then bypassed.
Comment 7 commit-hook freebsd_committer freebsd_triage 2016-07-14 20:16:24 UTC
A commit references this bug:

Author: jamie
Date: Thu Jul 14 20:15:56 UTC 2016
New revision: 302856
URL: https://svnweb.freebsd.org/changeset/base/302856

Log:
  Fix up the order in which jail creation processes are run, to preserve
  the config file's order in the non-parallel-start case.

  PR:		209112
  MFC after:	3 days

Changes:
  head/usr.sbin/jail/command.c
  head/usr.sbin/jail/jailp.h
  head/usr.sbin/jail/state.c
Comment 8 commit-hook freebsd_committer freebsd_triage 2016-07-14 20:17:27 UTC
A commit references this bug:

Author: jamie
Date: Thu Jul 14 20:17:08 UTC 2016
New revision: 302857
URL: https://svnweb.freebsd.org/changeset/base/302857

Log:
  Start jails non-parallel if jail_parallel_start is NO.  This was true
  for an explicitly specified jail list; now it's also true for all jails.

  PR:		209112
  MFC after:	3 days

Changes:
  head/etc/rc.d/jail
Comment 9 Jamie Gritton freebsd_committer freebsd_triage 2016-07-14 20:20:21 UTC
I've made some changes to make jail_parallel_start work like it should, even when the jails aren't explicitly listed.  I'm also printing the entire jail(8) output when creation (or removal) fails, instead of just the last line.  This may be verbose on errors, but that's where verbosity is typically a good thing.

What this doesn't do is fix the underlying problem.  But at least the default case (jail_parallel_start="NO") won't have this problem.
Comment 10 commit-hook freebsd_committer freebsd_triage 2016-07-17 14:06:39 UTC
A commit references this bug:

Author: jamie
Date: Sun Jul 17 14:06:18 UTC 2016
New revision: 302954
URL: https://svnweb.freebsd.org/changeset/base/302954

Log:
  MFC r302856:

    Fix up the order in which jail creation processes are run, to preserve
    the config file's order in the non-parallel-start case.

  PR:		209112
  Approved by:	re (gjb)

Changes:
_U  stable/11/
  stable/11/usr.sbin/jail/command.c
  stable/11/usr.sbin/jail/jailp.h
  stable/11/usr.sbin/jail/state.c
Comment 11 commit-hook freebsd_committer freebsd_triage 2016-07-17 14:08:42 UTC
A commit references this bug:

Author: jamie
Date: Sun Jul 17 14:07:51 UTC 2016
New revision: 302955
URL: https://svnweb.freebsd.org/changeset/base/302955

Log:
  MFC r302857:

    Start jails non-parallel if jail_parallel_start is NO.  This was true
    for an explicitly specified jail list; now it's also true for all jails.

  PR:		209112
  Approved by:	re (gjb)

Changes:
_U  stable/11/
  stable/11/etc/rc.d/jail
Comment 12 commit-hook freebsd_committer freebsd_triage 2016-07-17 14:15:46 UTC
A commit references this bug:

Author: jamie
Date: Sun Jul 17 14:15:09 UTC 2016
New revision: 302958
URL: https://svnweb.freebsd.org/changeset/base/302958

Log:
  MFC r302856:

      Fix up the order in which jail creation processes are run, to preserve
      the config file's order in the non-parallel-start case.

  PR:		209112

Changes:
_U  stable/10/
  stable/10/usr.sbin/jail/command.c
  stable/10/usr.sbin/jail/jailp.h
  stable/10/usr.sbin/jail/state.c
Comment 13 commit-hook freebsd_committer freebsd_triage 2016-07-17 14:16:48 UTC
A commit references this bug:

Author: jamie
Date: Sun Jul 17 14:16:22 UTC 2016
New revision: 302959
URL: https://svnweb.freebsd.org/changeset/base/302959

Log:
  MFC r302857:

    Start jails non-parallel if jail_parallel_start is NO.  This was true
    for an explicitly specified jail list; now it's also true for all jails.

  PR:		209112

Changes:
_U  stable/10/
  stable/10/etc/rc.d/jail
Comment 14 Marc Branchaud 2018-04-25 19:53:54 UTC
Commit base r302857 broke non-parallel jail starts for me when it was MFC'd to 10.4.

I can start each jail individually, but if I try to start them all with a single "/etc/rc.d/jail start" some of them fail.

Since r302857, when jail_parallel_start="NO" the rc script adds "-p1" to the jail command's arguments.  Things work fine if I remove that flag.
Comment 15 Mark Linimon freebsd_committer freebsd_triage 2024-01-20 21:35:17 UTC
^Triage: FreeBSD 10 is long out of support.