Bug 51583 - [nullfs] [patch] allow to work with devices and sockets over nullfs [STABLE, 5.0-CURRENT]
Summary: [nullfs] [patch] allow to work with devices and sockets over nullfs [STABLE, ...
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 5.0-CURRENT
Hardware: Any Any
: Normal Affects Only Me
Assignee: freebsd-fs
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2003-04-29 14:30 UTC by mitya
Modified: 2012-04-24 20:25 UTC (History)
0 users

See Also:


Attachments
file.diff (431 bytes, patch)
2003-04-29 14:30 UTC, mitya
no flags Details | Diff
file.diff (622 bytes, patch)
2003-04-29 14:30 UTC, mitya
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description mitya 2003-04-29 14:30:03 UTC
1) The system may panic when accessing devices over nullfs:
mount_null(fs) /dev /mnt
dd if=/mnt/zero of=/tmp/t.dat

The system will panic.

2) It is impossible to work with unix sockets (PF_LOCAL) over nullfs:
To reproduce: start mysqld.  It creates socket /tmp/mysql.sock to 
communicate with local mysql client.
mount_null(fs) /tmp /mnt
Try to use this socket:
mysql -S /mnt/mysql.sock mysql

Connection to mysqld will fail.

Fix: I am attaching patches for both -current and -stable.
I am not sure how correct they are, but they do work.

-STABLE:

-CURRENT:
Comment 1 tjr freebsd_committer 2003-06-17 14:19:41 UTC
Responsible Changed
From-To: freebsd-bugs->tjr

I'll check the patch out and commit it if it seems ok.
Comment 2 tjr freebsd_committer 2004-02-15 07:24:11 UTC
Responsible Changed
From-To: tjr->freebsd-bugs

Unassign. I haven't had time to fix this.
Comment 3 Mark Linimon freebsd_committer 2004-07-16 05:42:53 UTC
State Changed
From-To: open->feedback

Is this still a problem with more recent versions of -stable and 
-current, or has it been fixed on one or the other?
Comment 4 Dmitry Sivachenko freebsd_committer 2004-07-16 06:00:49 UTC
State Changed
From-To: feedback->open

It's still a problem.
Comment 5 linimon 2004-07-16 18:17:32 UTC
Adding to audit trail from email received:

From: Jeremie Le Hen <jeremie.le-hen@epita.fr>

I tried the attached patch, and it's appear to be only a half-fix.
To test it, I made a small server which listens on a named Unix socket
and a client sending an hello message through it.  Here are the results
on a 5-CURRENT (about 50 days ago) (`lower' means the program opens the
socket on the lower filesystem while `upper' means it is on the null
layer) :

			Without patch		With Dmitry
						Sivachenko's patch

client(lower) /		OK			OK
server(lower)

client(upper) /		Failed			OK
server(lower)

client(lower) /		Failed			Failed
server(upper)

client(upper) /		OK			OK
server(upper)


I hope this can help.  I tried to understand why it fails, but my
programming skills are not good enough.

Regards,
-- 
Jeremie LE HEN aka TtZ/TataZ                          jeremie.le-hen@epita.fr
                                                                 ttz@epita.fr
Comment 6 bwb 2004-08-01 22:33:55 UTC
The reason it fails in case #3 is that vp->v_socket is being set on the
upper layer by unp_bind(), but not on the lower layer, because that code
is ignorant of filesystem layering.

I'm not sure what the best fix is for this.  Is it acceptable to have
null_lookup simply return the lower vnode when it's of type VSOCK, or
would that break something else?
Comment 7 spell 2004-11-01 15:57:40 UTC
This patch is working on my 5.3-STABLE Oct 22
(referring to Mark Linimon's classification,
I have "client(upper)/server(lower)" situation),
but "find" and "ls -R" on nullfs mount point
after outputing some amount of files
exit with "fts_read: No such file or directory".
Comment 8 timbob 2006-12-18 07:01:40 UTC
I confirm that this is still a problem on 6.1-RELEASE (-p11).

In particular, it prevents using nullfs to run X clients through Unix
domain sockets from a chroot or jail (i.e. connecting via /tmp/.X11-unix/X0).
(http://lists.freebsd.org/pipermail/freebsd-emulation/2006-December/002912.html)

The Linimon/Le Hen tests can be verified using the net/netcat port:

    setup
    -----
    mkdir lower
    mkdir upper
    mount -t nullfs lower upper
    touch lower/testport

    test.sh
    --------
    #!/bin.sh

    rm lower/testport
    nc -lU $BIND/testport &
    SERVER=$!
    echo test | nc -U $CONNECT/testport
    echo $?
    kill $SERVER 2>/dev/null

    tests
    -----
    BIND=lower CONNECT=lower ./test.sh # works
    BIND=upper CONNECT=upper ./test.sh # works
    BIND=lower CONNECT=upper ./test.sh # FAILS before patch, works after
    BIND=upper CONNECT=lower ./test.sh # FAILS before and after patch

The Sivachenko patch corrects the following situation:

    analysis: BIND=lower CONNECT=upper
    ----------------------------------
    1. s = socket(AF_UNIX, SOCK_STREAM, 0)
       falloc: creates a new open file in the process descriptor table
       socreate: associates a socket with this file and associates the unix
                 domain protosw functions

    2. connect(s, name, namelen)
       kern_connect -> so_connect -> pru_connect -> unp_connect

       * unp_connect (src/sys/kern/uipc_usrreq.c, v1.155.2.3)

       * call namei to retrieve the requested vnode
         -calls null_lookup in src/sys/fs/null_vnops.c
         -in turn calls null_nodeget in src/sys/fs/null_subr.c
          returns the UPPER vnode
          DOES NOT copy the v_un field from the lower vnode.

       * line 962: so2 = vp->v_socket;              (vp->v_un.vu_socket)
         DIRECT ACCESS to v_un field of (UPPER) vnode.

The Buchanan analysis refers to a different location:

    analysis: BIND=upper CONNECT=lower
    ----------------------------------
    1. s = socket(AF_UNIX, SOCK_STREAM, 0)
    2. bind(s, name, namelen)
       
       * unp_bind (src/sys/kern/uipc_usrreq.c, v1.155.2.3)
       * creates new vnodes
       * line 902: vp->v_socket = unp->unp_socket;  (vp->v_un.vu_socket)
         DIRECT ACCESS to v_un field of (UPPER) vnode.

    Not fixed by the submitted patch.
    Messy. The socket information in the upper (nullfs) vnode must somehow be
    passed down into the lower vnode...

A proper fix is beyond me.
Comment 9 Mark Linimon freebsd_committer 2009-05-18 05:25:47 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-fs

Over to maintainer(s).
Comment 10 dfilter freebsd_committer 2012-02-29 21:38:56 UTC
Author: trociny
Date: Wed Feb 29 21:38:31 2012
New Revision: 232317
URL: http://svn.freebsd.org/changeset/base/232317

Log:
  Introduce VOP_UNP_BIND(), VOP_UNP_CONNECT(), and VOP_UNP_DETACH()
  operations for setting and accessing vnode's v_socket field.
  
  The operations are necessary to implement proper unix socket handling
  on layered file systems like nullfs(5).
  
  This change fixes the long standing issue with nullfs(5) being in that
  unix sockets did not work between lower and upper layers: if we bound
  to a socket on the lower layer we could connect only to the lower
  path; if we bound to the upper layer we could connect only to the
  upper path. The new behavior is one can connect to both the lower and
  the upper paths regardless what layer path one binds to.
  
  PR:		kern/51583, kern/159663
  Suggested by:	kib
  Reviewed by:	arch
  MFC after:	2 weeks

Modified:
  head/UPDATING
  head/sys/kern/uipc_usrreq.c
  head/sys/kern/vfs_default.c
  head/sys/kern/vnode_if.src
  head/sys/sys/vnode.h

Modified: head/UPDATING
==============================================================================
--- head/UPDATING	Wed Feb 29 21:11:02 2012	(r232316)
+++ head/UPDATING	Wed Feb 29 21:38:31 2012	(r232317)
@@ -22,6 +22,14 @@ NOTE TO PEOPLE WHO THINK THAT FreeBSD 10
 	machines to maximize performance.  (To disable malloc debugging, run
 	ln -s aj /etc/malloc.conf.)
 
+20120229:
+	Now unix domain sockets behave "as expected" on	nullfs(5). Previously
+	nullfs(5) did not pass through all behaviours to the underlying layer,
+	as a result if we bound to a socket on the lower layer we could connect
+	only to the lower path; if we bound to the upper layer we could connect
+	only to	the upper path. The new behavior is one can connect to both the
+	lower and the upper paths regardless what layer path one binds to.
+
 20120211:
 	The getifaddrs upgrade path broken with 20111215 has been restored.
 	If you have upgraded in between 20111215 and 20120209 you need to

Modified: head/sys/kern/uipc_usrreq.c
==============================================================================
--- head/sys/kern/uipc_usrreq.c	Wed Feb 29 21:11:02 2012	(r232316)
+++ head/sys/kern/uipc_usrreq.c	Wed Feb 29 21:38:31 2012	(r232317)
@@ -542,7 +542,7 @@ restart:
 
 	UNP_LINK_WLOCK();
 	UNP_PCB_LOCK(unp);
-	vp->v_socket = unp->unp_socket;
+	VOP_UNP_BIND(vp, unp->unp_socket);
 	unp->unp_vnode = vp;
 	unp->unp_addr = soun;
 	unp->unp_flags &= ~UNP_BINDING;
@@ -638,7 +638,7 @@ uipc_detach(struct socket *so)
 	 * XXXRW: Should assert vp->v_socket == so.
 	 */
 	if ((vp = unp->unp_vnode) != NULL) {
-		unp->unp_vnode->v_socket = NULL;
+		VOP_UNP_DETACH(vp);
 		unp->unp_vnode = NULL;
 	}
 	unp2 = unp->unp_conn;
@@ -1308,7 +1308,7 @@ unp_connect(struct socket *so, struct so
 	 * and to protect simultaneous locking of multiple pcbs.
 	 */
 	UNP_LINK_WLOCK();
-	so2 = vp->v_socket;
+	VOP_UNP_CONNECT(vp, &so2);
 	if (so2 == NULL) {
 		error = ECONNREFUSED;
 		goto bad2;
@@ -2318,17 +2318,15 @@ vfs_unp_reclaim(struct vnode *vp)
 
 	active = 0;
 	UNP_LINK_WLOCK();
-	so = vp->v_socket;
+	VOP_UNP_CONNECT(vp, &so);
 	if (so == NULL)
 		goto done;
 	unp = sotounpcb(so);
 	if (unp == NULL)
 		goto done;
 	UNP_PCB_LOCK(unp);
-	if (unp->unp_vnode != NULL) {
-		KASSERT(unp->unp_vnode == vp,
-		    ("vfs_unp_reclaim: vp != unp->unp_vnode"));
-		vp->v_socket = NULL;
+	if (unp->unp_vnode == vp) {
+		VOP_UNP_DETACH(vp);
 		unp->unp_vnode = NULL;
 		active = 1;
 	}

Modified: head/sys/kern/vfs_default.c
==============================================================================
--- head/sys/kern/vfs_default.c	Wed Feb 29 21:11:02 2012	(r232316)
+++ head/sys/kern/vfs_default.c	Wed Feb 29 21:38:31 2012	(r232317)
@@ -123,6 +123,9 @@ struct vop_vector default_vnodeops = {
 	.vop_unlock =		vop_stdunlock,
 	.vop_vptocnp =		vop_stdvptocnp,
 	.vop_vptofh =		vop_stdvptofh,
+	.vop_unp_bind =		vop_stdunp_bind,
+	.vop_unp_connect =	vop_stdunp_connect,
+	.vop_unp_detach =	vop_stdunp_detach,
 };
 
 /*
@@ -1037,6 +1040,30 @@ vop_stdadvise(struct vop_advise_args *ap
 	return (error);
 }
 
+int
+vop_stdunp_bind(struct vop_unp_bind_args *ap)
+{
+
+	ap->a_vp->v_socket = ap->a_socket;
+	return (0);
+}
+
+int
+vop_stdunp_connect(struct vop_unp_connect_args *ap)
+{
+
+	*ap->a_socket = ap->a_vp->v_socket;
+	return (0);
+}
+
+int
+vop_stdunp_detach(struct vop_unp_detach_args *ap)
+{
+
+	ap->a_vp->v_socket = NULL;
+	return (0);
+}
+
 /*
  * vfs default ops
  * used to fill the vfs function table to get reasonable default return values.

Modified: head/sys/kern/vnode_if.src
==============================================================================
--- head/sys/kern/vnode_if.src	Wed Feb 29 21:11:02 2012	(r232316)
+++ head/sys/kern/vnode_if.src	Wed Feb 29 21:38:31 2012	(r232317)
@@ -640,6 +640,26 @@ vop_advise {
 	IN int advice;
 };
 
+%% unp_bind	vp	E E E
+
+vop_unp_bind {
+	IN struct vnode *vp;
+	IN struct socket *socket;
+};
+
+%% unp_connect	vp	L L L
+
+vop_unp_connect {
+	IN struct vnode *vp;
+	OUT struct socket **socket;
+};
+
+%% unp_detach	vp	= = =
+
+vop_unp_detach {
+	IN struct vnode *vp;
+};
+
 # The VOPs below are spares at the end of the table to allow new VOPs to be
 # added in stable branches without breaking the KBI.  New VOPs in HEAD should
 # be added above these spares.  When merging a new VOP to a stable branch,

Modified: head/sys/sys/vnode.h
==============================================================================
--- head/sys/sys/vnode.h	Wed Feb 29 21:11:02 2012	(r232316)
+++ head/sys/sys/vnode.h	Wed Feb 29 21:38:31 2012	(r232317)
@@ -703,6 +703,9 @@ int	vop_stdpathconf(struct vop_pathconf_
 int	vop_stdpoll(struct vop_poll_args *);
 int	vop_stdvptocnp(struct vop_vptocnp_args *ap);
 int	vop_stdvptofh(struct vop_vptofh_args *ap);
+int	vop_stdunp_bind(struct vop_unp_bind_args *ap);
+int	vop_stdunp_connect(struct vop_unp_connect_args *ap);
+int	vop_stdunp_detach(struct vop_unp_detach_args *ap);
 int	vop_eopnotsupp(struct vop_generic_args *ap);
 int	vop_ebadf(struct vop_generic_args *ap);
 int	vop_einval(struct vop_generic_args *ap);
_______________________________________________
svn-src-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
Comment 11 dfilter freebsd_committer 2012-04-24 20:08:50 UTC
Author: trociny
Date: Tue Apr 24 19:08:40 2012
New Revision: 234660
URL: http://svn.freebsd.org/changeset/base/234660

Log:
  MFC r232317:
  
  Introduce VOP_UNP_BIND(), VOP_UNP_CONNECT(), and VOP_UNP_DETACH()
  operations for setting and accessing vnode's v_socket field.
  
  The operations are necessary to implement proper unix socket handling
  on layered file systems like nullfs(5).
  
  This change fixes the long standing issue with nullfs(5) being in that
  unix sockets did not work between lower and upper layers: if we bound
  to a socket on the lower layer we could connect only to the lower
  path; if we bound to the upper layer we could connect only to the
  upper path. The new behavior is one can connect to both the lower and
  the upper paths regardless what layer path one binds to.
  
  PR:		kern/51583, kern/159663
  Suggested by:	kib
  Reviewed by:	arch

Modified:
  stable/9/UPDATING   (contents, props changed)
  stable/9/sys/kern/uipc_usrreq.c
  stable/9/sys/kern/vfs_default.c
  stable/9/sys/kern/vnode_if.src
  stable/9/sys/sys/vnode.h
Directory Properties:
  stable/9/sys/   (props changed)

Modified: stable/9/UPDATING
==============================================================================
--- stable/9/UPDATING	Tue Apr 24 19:00:42 2012	(r234659)
+++ stable/9/UPDATING	Tue Apr 24 19:08:40 2012	(r234660)
@@ -9,6 +9,14 @@ handbook.
 Items affecting the ports and packages system can be found in
 /usr/ports/UPDATING.  Please read that file before running portupgrade.
 
+20120422:
+	Now unix domain sockets behave "as expected" on	nullfs(5). Previously
+	nullfs(5) did not pass through all behaviours to the underlying layer,
+	as a result if we bound to a socket on the lower layer we could connect
+	only to the lower path; if we bound to the upper layer we could connect
+	only to	the upper path. The new behavior is one can connect to both the
+	lower and the upper paths regardless what layer path one binds to.
+
 20120109:
 	The acpi_wmi(4) status device /dev/wmistat has been renamed to
 	/dev/wmistat0.

Modified: stable/9/sys/kern/uipc_usrreq.c
==============================================================================
--- stable/9/sys/kern/uipc_usrreq.c	Tue Apr 24 19:00:42 2012	(r234659)
+++ stable/9/sys/kern/uipc_usrreq.c	Tue Apr 24 19:08:40 2012	(r234660)
@@ -541,7 +541,7 @@ restart:
 
 	UNP_LINK_WLOCK();
 	UNP_PCB_LOCK(unp);
-	vp->v_socket = unp->unp_socket;
+	VOP_UNP_BIND(vp, unp->unp_socket);
 	unp->unp_vnode = vp;
 	unp->unp_addr = soun;
 	unp->unp_flags &= ~UNP_BINDING;
@@ -637,7 +637,7 @@ uipc_detach(struct socket *so)
 	 * XXXRW: Should assert vp->v_socket == so.
 	 */
 	if ((vp = unp->unp_vnode) != NULL) {
-		unp->unp_vnode->v_socket = NULL;
+		VOP_UNP_DETACH(vp);
 		unp->unp_vnode = NULL;
 	}
 	unp2 = unp->unp_conn;
@@ -1307,7 +1307,7 @@ unp_connect(struct socket *so, struct so
 	 * and to protect simultaneous locking of multiple pcbs.
 	 */
 	UNP_LINK_WLOCK();
-	so2 = vp->v_socket;
+	VOP_UNP_CONNECT(vp, &so2);
 	if (so2 == NULL) {
 		error = ECONNREFUSED;
 		goto bad2;
@@ -2317,17 +2317,15 @@ vfs_unp_reclaim(struct vnode *vp)
 
 	active = 0;
 	UNP_LINK_WLOCK();
-	so = vp->v_socket;
+	VOP_UNP_CONNECT(vp, &so);
 	if (so == NULL)
 		goto done;
 	unp = sotounpcb(so);
 	if (unp == NULL)
 		goto done;
 	UNP_PCB_LOCK(unp);
-	if (unp->unp_vnode != NULL) {
-		KASSERT(unp->unp_vnode == vp,
-		    ("vfs_unp_reclaim: vp != unp->unp_vnode"));
-		vp->v_socket = NULL;
+	if (unp->unp_vnode == vp) {
+		VOP_UNP_DETACH(vp);
 		unp->unp_vnode = NULL;
 		active = 1;
 	}

Modified: stable/9/sys/kern/vfs_default.c
==============================================================================
--- stable/9/sys/kern/vfs_default.c	Tue Apr 24 19:00:42 2012	(r234659)
+++ stable/9/sys/kern/vfs_default.c	Tue Apr 24 19:08:40 2012	(r234660)
@@ -123,6 +123,9 @@ struct vop_vector default_vnodeops = {
 	.vop_unlock =		vop_stdunlock,
 	.vop_vptocnp =		vop_stdvptocnp,
 	.vop_vptofh =		vop_stdvptofh,
+	.vop_unp_bind =		vop_stdunp_bind,
+	.vop_unp_connect =	vop_stdunp_connect,
+	.vop_unp_detach =	vop_stdunp_detach,
 };
 
 /*
@@ -1037,6 +1040,30 @@ vop_stdadvise(struct vop_advise_args *ap
 	return (error);
 }
 
+int
+vop_stdunp_bind(struct vop_unp_bind_args *ap)
+{
+
+	ap->a_vp->v_socket = ap->a_socket;
+	return (0);
+}
+
+int
+vop_stdunp_connect(struct vop_unp_connect_args *ap)
+{
+
+	*ap->a_socket = ap->a_vp->v_socket;
+	return (0);
+}
+
+int
+vop_stdunp_detach(struct vop_unp_detach_args *ap)
+{
+
+	ap->a_vp->v_socket = NULL;
+	return (0);
+}
+
 /*
  * vfs default ops
  * used to fill the vfs function table to get reasonable default return values.

Modified: stable/9/sys/kern/vnode_if.src
==============================================================================
--- stable/9/sys/kern/vnode_if.src	Tue Apr 24 19:00:42 2012	(r234659)
+++ stable/9/sys/kern/vnode_if.src	Tue Apr 24 19:08:40 2012	(r234660)
@@ -640,23 +640,31 @@ vop_advise {
 	IN int advice;
 };
 
-# The VOPs below are spares at the end of the table to allow new VOPs to be
-# added in stable branches without breaking the KBI.  New VOPs in HEAD should
-# be added above these spares.  When merging a new VOP to a stable branch,
-# the new VOP should replace one of the spares.
+%% unp_bind	vp	E E E
 
-vop_spare1 {
+vop_unp_bind {
 	IN struct vnode *vp;
+	IN struct socket *socket;
 };
 
-vop_spare2 {
+%% unp_connect	vp	L L L
+
+vop_unp_connect {
 	IN struct vnode *vp;
+	OUT struct socket **socket;
 };
 
-vop_spare3 {
+%% unp_detach	vp	= = =
+
+vop_unp_detach {
 	IN struct vnode *vp;
 };
 
+# The VOPs below are spares at the end of the table to allow new VOPs to be
+# added in stable branches without breaking the KBI.  New VOPs in HEAD should
+# be added above these spares.  When merging a new VOP to a stable branch,
+# the new VOP should replace one of the spares.
+
 vop_spare4 {
 	IN struct vnode *vp;
 };

Modified: stable/9/sys/sys/vnode.h
==============================================================================
--- stable/9/sys/sys/vnode.h	Tue Apr 24 19:00:42 2012	(r234659)
+++ stable/9/sys/sys/vnode.h	Tue Apr 24 19:08:40 2012	(r234660)
@@ -707,6 +707,9 @@ int	vop_stdpathconf(struct vop_pathconf_
 int	vop_stdpoll(struct vop_poll_args *);
 int	vop_stdvptocnp(struct vop_vptocnp_args *ap);
 int	vop_stdvptofh(struct vop_vptofh_args *ap);
+int	vop_stdunp_bind(struct vop_unp_bind_args *ap);
+int	vop_stdunp_connect(struct vop_unp_connect_args *ap);
+int	vop_stdunp_detach(struct vop_unp_detach_args *ap);
 int	vop_eopnotsupp(struct vop_generic_args *ap);
 int	vop_ebadf(struct vop_generic_args *ap);
 int	vop_einval(struct vop_generic_args *ap);
_______________________________________________
svn-src-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
Comment 12 Mikolaj Golub freebsd_committer 2012-04-24 20:24:35 UTC
State Changed
From-To: open->closed

The fix merged to stable/9.