Bug 26740

Summary: [PATCH] jail improvement
Product: Base System Reporter: Oliver Fromme <olli>
Component: kernAssignee: Robert Watson <rwatson>
Status: Closed FIXED    
Severity: Affects Only Me CC: olli
Priority: Normal    
Version: Unspecified   
Hardware: Any   
OS: Any   
Attachments:
Description Flags
file.diff none

Description Oliver Fromme 2001-04-21 00:20:02 UTC
This patch adds a small feature to jails.

Processes within a jails cannot see things outside their
jail (files, processes, sockets) -- however, thay still
can see _all_ mountpoints.  This patch fixes that.  In
particular, it does two things:
-1-  Mounts which are outside of a jail are not returned
     to processes within that jail.
-2-  Mounts within a jail get their jail chroot prefix
     stripped off.  See example below.

The patch modifies statfs(), fstatfs() and getfsstat().

Without my patch, the output of "mount" within a jail on
my test machine looks like this:

   /dev/da0a on / (ufs, local, soft-updates)
   /dev/da0f on /sea (ufs, local, soft-updates)
   /dev/da0e on /var (ufs, local, soft-updates)
   procfs on /proc (procfs, local)
   /dev/vn0c on /usr (ufs, local, soft-updates)
   /dev/vn1c on /jail/olli (ufs, local, read-only)
   /dev/vn2c on /jail/olli (ufs, local, union, soft-updates)
   procfs on /jail/olli/proc (procfs, local)
   /dev/vn3c on /jail/olli/home (ufs, local, read-only)

With my patch, the output is this:

   /dev/vn1c on / (ufs, local, read-only)
   /dev/vn2c on / (ufs, local, union, soft-updates)
   procfs on /proc (procfs, local)
   /dev/vn3c on /home (ufs, local, read-only)

There's a small point to note:  If there aren't any mounts
within the jail, getfsstat() returns 0 (an empty list).
/bin/df handles this well, but /sbin/mount prints
,,error 0``.  I tend to think that that's a bug in
/sbin/mount.  I'll submit a fix if desired.

Fix: NOTE:  This patch is against -stable.  Unfortunately, I
don't have a machine left for installing -current.
I've had a quick look at the relevant files in -current
in the CVS repository, and I _think_ it shouldn't be too
painful to apply the patch there (change p->pr_prison to
p->p_ucred->cr_prison etc.), but I can't test it myself
on -current.  (It's running perfectly fine on my -stable
box, of course.)

How-To-Repeat: 
Set up a jail environment somewhere, start the jail and
type "df" or "mount".  You'll see all mountpoints,
including those in other jails and those that aren't in
any jails.
Comment 1 Robert Watson freebsd_committer freebsd_triage 2001-04-22 21:28:32 UTC
Responsible Changed
From-To: freebsd-bugs->rwatson

After a quick chat with Poul-Henning, we concluded I'd take ownership of 
this bug report/feature request.  However, we have some concerns about 
the effectiveness of the patch in slightly less usual file system namespaces 
(such as ones where mount within chroot occurs).
Comment 2 Oliver Fromme 2002-02-27 14:51:05 UTC
I have updated and improved the patch.
The following is the complete patch for 4.5-RELEASE.

--- src/sys/kern/kern_jail.c.orig	Mon Sep  3 10:17:12 2001
+++ src/sys/kern/kern_jail.c	Wed Feb 27 15:40:43 2002
@@ -69,6 +69,9 @@
 	error = copyinstr(j.hostname, &pr->pr_host, sizeof pr->pr_host, 0);
 	if (error) 
 		goto bail;
+	error = copyinstr(j.path, &pr->pr_path, sizeof pr->pr_path, 0);
+	if (error)
+		goto bail;
 	pr->pr_ip = j.ip_number;
 
 	ca.path = j.path;
--- src/sys/kern/vfs_syscalls.c.orig	Mon Jan  7 21:47:34 2002
+++ src/sys/kern/vfs_syscalls.c	Wed Feb 27 13:21:06 2002
@@ -59,6 +59,7 @@
 #include <sys/unistd.h>
 #include <sys/vnode.h>
 #include <sys/proc.h>
+#include <sys/jail.h>
 #include <sys/dirent.h>
 #include <sys/extattr.h>
 
@@ -72,6 +73,9 @@
 
 static int change_dir __P((struct nameidata *ndp, struct proc *p));
 static void checkdirs __P((struct vnode *olddp));
+static int check_prison_mount __P((struct proc *p, struct statfs *sp,
+    int specific));
+static void trim_prison_mount __P((struct statfs *sp, int pripl));
 static int chroot_refuse_vdir_fds __P((struct filedesc *fdp));
 static int getutimes __P((const struct timeval *, struct timespec *));
 static int setfown __P((struct proc *, struct vnode *, uid_t, gid_t));
@@ -356,7 +360,7 @@
 /*
  * Scan all active processes to see if any of them have a current
  * or root directory onto which the new filesystem has just been
- * mounted. If so, replace them with the new mount point.
+ * mounted. If so, replace them with the new mountpoint.
  */
 static void
 checkdirs(olddp)
@@ -611,6 +615,90 @@
 }
 
 /*
+ * Check if we're imprisoned and whether the mountpoint is inside of
+ * our prison.  Return values:
+ *    -1   if we're not supposed to see this mount.
+ *     0   if it's alright.
+ *    >0   strip that many characters from the beginning of the path of
+ *         the mountpoint.  E.g. if the mountpoint is /foo/bar and the
+ *         jail is chrooted at /foo, 4 is be returned, so only /bar
+ *         will be left.
+ *
+ * The "specific" flag indicates if the request has been issued for
+ * this specific filesystem (1), or if this is part of an enumeration
+ * of all filesystems (0).  In the former case, we allow it to be
+ * visible _IF_ it is a prefix of the jail chroot directory.  This
+ * makes "ls" etc. work if the mountpoint of the jail root is outside
+ * of the jail.  In the latter case, this filesystem is hidden.
+ * Example:  The jail chroot is /usr/jail (which is not a separate
+ * filesystem, but /usr is).  Ordinarily, "ls" within the jail would
+ * fail ("no such file or directory"), because it calls fstatfs() on
+ * the directory.  Similarly for "df ." etc.  Therefore, /usr must be
+ * visible within the jail for fstatfs() (it'll show up as "/" within
+ * the jail).
+ */
+static int
+check_prison_mount(p, sp, specific)
+	struct proc *p;
+	struct statfs *sp;
+	int specific;
+{
+	register char *prip;	/* prison path */
+	register int pripl;	/* prison path length */
+
+	/* short-cut:  no prison -> no restriction */
+	if (!p->p_prison)
+		return (0);
+	prip = p->p_prison->pr_path;
+	pripl = strlen(prip);
+	if (pripl > 0 && prip[pripl - 1] == '/')
+		pripl--;	/* ignore trailing slash, if any */
+	/*
+	 * First, if this is a specific filesystem request, check if
+	 * the mountpoint is a prefix of the jail root.
+	 * If it is, then strip the whole path, so it'll become "/".
+	 * Otherwise continue with the normal (non-specific) check.
+	 */
+	if (specific) {
+		int mntlen = strlen(sp->f_mntonname);
+
+		if (strncmp(prip, sp->f_mntonname, mntlen) == 0 &&
+		    (prip[pripl] == '\0' || prip[pripl] == '/'))
+			return mntlen;	/* strip all */
+	}
+	/*
+	 * Note that it is not sufficient to check for the
+	 * first <pripl> characters to be the same.
+	 * We also have to make sure that the next character
+	 * in the mountpoint path is either '\0' or '/'.
+	 * Otherwise a jail in "/foo/bar" would be allowed
+	 * to see a mount at "/foo/barbara".
+	 */
+	if (strncmp(prip, sp->f_mntonname, pripl) != 0 ||
+	    (sp->f_mntonname[pripl] != '\0' && sp->f_mntonname[pripl] != '/'))
+		return (-1);	/* not our business */
+	return (pripl);
+}
+
+/*
+ * Remove the jail chroot path from the mountpoint,
+ * so that imprisoned users see everything relative
+ * to their jail chroot.
+ */
+static void
+trim_prison_mount(sp, pripl)
+	struct statfs *sp;
+	int pripl;	/* prison path length */
+{
+	strcpy(sp->f_mntonname, sp->f_mntonname + pripl);
+	/* If there's nothing left, this ought to be "/". */
+	if (sp->f_mntonname[0] == '\0') {
+		sp->f_mntonname[0] = '/';
+		sp->f_mntonname[1] = '\0';
+	}
+}
+
+/*
  * Get filesystem statistics.
  */
 #ifndef _SYS_SYSPROTO_H_
@@ -631,6 +719,8 @@
 	register struct mount *mp;
 	register struct statfs *sp;
 	int error;
+	int pripl;
+	int nonsu;
 	struct nameidata nd;
 	struct statfs sb;
 
@@ -641,13 +731,19 @@
 	sp = &mp->mnt_stat;
 	NDFREE(&nd, NDF_ONLY_PNBUF);
 	vrele(nd.ni_vp);
+	if ((pripl = check_prison_mount(p, sp, 1)) < 0)
+		return (ENOENT);
 	error = VFS_STATFS(mp, sp, p);
 	if (error)
 		return (error);
 	sp->f_flags = mp->mnt_flag & MNT_VISFLAGMASK;
-	if (suser_xxx(p->p_ucred, 0, 0)) {
+	nonsu = suser_xxx(p->p_ucred, 0, 0);
+	if (nonsu || pripl) {
 		bcopy((caddr_t)sp, (caddr_t)&sb, sizeof(sb));
-		sb.f_fsid.val[0] = sb.f_fsid.val[1] = 0;
+		if (nonsu)
+			sb.f_fsid.val[0] = sb.f_fsid.val[1] = 0;
+		if (pripl)
+			trim_prison_mount(&sb, pripl);
 		sp = &sb;
 	}
 	return (copyout((caddr_t)sp, (caddr_t)SCARG(uap, buf), sizeof(*sp)));
@@ -675,6 +771,8 @@
 	struct mount *mp;
 	register struct statfs *sp;
 	int error;
+	int pripl;
+	int nonsu;
 	struct statfs sb;
 
 	if ((error = getvnode(p->p_fd, SCARG(uap, fd), &fp)) != 0)
@@ -683,13 +781,19 @@
 	if (mp == NULL)
 		return (EBADF);
 	sp = &mp->mnt_stat;
+	if ((pripl = check_prison_mount(p, sp, 1)) < 0)
+		return (ENOENT);
 	error = VFS_STATFS(mp, sp, p);
 	if (error)
 		return (error);
 	sp->f_flags = mp->mnt_flag & MNT_VISFLAGMASK;
-	if (suser_xxx(p->p_ucred, 0, 0)) {
+	nonsu = suser_xxx(p->p_ucred, 0, 0);
+	if (nonsu || pripl) {
 		bcopy((caddr_t)sp, (caddr_t)&sb, sizeof(sb));
-		sb.f_fsid.val[0] = sb.f_fsid.val[1] = 0;
+		if (nonsu)
+			sb.f_fsid.val[0] = sb.f_fsid.val[1] = 0;
+		if (pripl)
+			trim_prison_mount(&sb, pripl);
 		sp = &sb;
 	}
 	return (copyout((caddr_t)sp, (caddr_t)SCARG(uap, buf), sizeof(*sp)));
@@ -718,13 +822,16 @@
 	register struct statfs *sp;
 	caddr_t sfsp;
 	long count, maxcount, error;
+	int pripl;
+	struct statfs sb;
 
 	maxcount = SCARG(uap, bufsize) / sizeof(struct statfs);
 	sfsp = (caddr_t)SCARG(uap, buf);
 	count = 0;
 	simple_lock(&mountlist_slock);
 	for (mp = TAILQ_FIRST(&mountlist); mp != NULL; mp = nmp) {
-		if (vfs_busy(mp, LK_NOWAIT, &mountlist_slock, p)) {
+		pripl = check_prison_mount(p, &mp->mnt_stat, 0);
+		if (pripl < 0 || vfs_busy(mp, LK_NOWAIT, &mountlist_slock, p)) {
 			nmp = TAILQ_NEXT(mp, mnt_list);
 			continue;
 		}
@@ -744,6 +851,11 @@
 				continue;
 			}
 			sp->f_flags = mp->mnt_flag & MNT_VISFLAGMASK;
+			if (pripl) {
+				bcopy((caddr_t)sp, (caddr_t)&sb, sizeof(sb));
+				trim_prison_mount(&sb, pripl);
+				sp = &sb;
+			}
 			error = copyout((caddr_t)sp, sfsp, sizeof(*sp));
 			if (error) {
 				vfs_unbusy(mp, p);
--- src/sys/sys/jail.h.orig	Wed Nov  1 18:58:06 2000
+++ src/sys/sys/jail.h	Wed Feb 27 15:40:17 2002
@@ -30,6 +30,17 @@
 MALLOC_DECLARE(M_PRISON);
 #endif
 
+
+#ifndef MNAMELEN
+/* This is taken from sys/mount.h. */
+#ifdef __i386__
+#define	MNAMELEN	80	/* length of buffer for returned name */
+#endif
+#ifdef __alpha__
+#define	MNAMELEN	72	/* length of buffer for returned name */
+#endif
+#endif
+
 /*
  * This structure describes a prison.  It is pointed to by all struct
  * proc's of the inmates.  pr_ref keeps track of them and is used to
@@ -39,6 +50,7 @@
 struct prison {
 	int		pr_ref;
 	char 		pr_host[MAXHOSTNAMELEN];
+	char		pr_path[MNAMELEN];
 	u_int32_t	pr_ip;
 	void		*pr_linux;
 };
Comment 3 Oliver Fromme 2003-01-28 12:12:38 UTC
Hi,

It is obvious that the patch in this PR will not be comitted.
Therefore I suggest that this PR be closed.

I have submitted a new patch:  PR kern/47586
It fixes the shortcomings of the old patch in this PR (in
particular, it scales much better, and also works fine in
combined chroot/jail environments).

-- 
Oliver Fromme, Konrad-Celtis-Str. 72, 81369 Munich, Germany

``All that we see or seem is just a dream within a dream.''
(E. A. Poe)
Comment 4 Maxim Konovalov freebsd_committer freebsd_triage 2003-03-24 15:43:42 UTC
State Changed
From-To: open->closed

Closed at the submitter's request, obsoleted by kern/47586.