When mounting a filesystem that contains a socket with nullfs, a new instance of the socket is generated. Processes listening on one instance of this socket do not respond to connect() requests on the other instance. How-To-Repeat: $ mkdir a b $ ./server a/sock & [1] 2093 $ ./client a/sock MESSAGE FROM CLIENT: hello from a client MESSAGE FROM SERVER: hello from the server $ sudo mount -t nullfs a b $ ./client b/sock connect() failed
Attached server.c and client.c I used for the test (found on the net). -- Robert Millan
I found a thread from 2007 with further discussion about this problem: http://lists.freebsd.org/pipermail/freebsd-fs/2007-February/002669.html There's a proposed solution which might cause consistency problems. If I find some time, I'll study the code and try to figure out how to do this properly. No promises though :-)
2011/9/24 Robert Millan <rmh@freebsd.org>: > I found a thread from 2007 with further discussion about this problem: > > http://lists.freebsd.org/pipermail/freebsd-fs/2007-February/002669.html Hi, I've looked at the situation in a bit more detail, for now only with sockets in mind (not named pipes). My understanding is (please correct me if I'm wrong): - nullfs holds reference counts for each vnode, but sockets have their own mechanism for reference counting (so_count / soref / sorele). vnode reference counting doesn't protect against socket being closed, which would leave a stale pointer in the upper nullfs layer. - Increasing the reference count of the socket itself can't be done in null_nodeget() because this function is merely a getter whose call doesn't indicate any meaningful event. - It's not clear to me that there's any event in time where the socket reference can be increased. If mounting a nullfs were that event, then all existing sockets would be soref'ed but we wouldn't be soref'ing future sockets created in the lower layer after the mount. This doesn't seem correct. - Possible solution: null_nodeget() semantics are replaced with something that actually allows vnodes in the upper layer to be created and destroyed. - Possible solution: upper layer has a memory structure to keep track of which sockets in the lower layer have been soref'ed.
On Sun, 25 Sep 2011 17:32:27 +0200 Robert Millan wrote: RM> 2011/9/24 Robert Millan <rmh@freebsd.org>: >> I found a thread from 2007 with further discussion about this problem: >> >> http://lists.freebsd.org/pipermail/freebsd-fs/2007-February/002669.html RM> Hi, RM> I've looked at the situation in a bit more detail, for now only with RM> sockets in mind (not named pipes). My understanding is (please RM> correct me if I'm wrong): RM> - nullfs holds reference counts for each vnode, but sockets have their RM> own mechanism for reference counting (so_count / soref / sorele). RM> vnode reference counting doesn't protect against socket being closed, RM> which would leave a stale pointer in the upper nullfs layer. RM> - Increasing the reference count of the socket itself can't be done in RM> null_nodeget() because this function is merely a getter whose call RM> doesn't indicate any meaningful event. RM> - It's not clear to me that there's any event in time where the socket RM> reference can be increased. If mounting a nullfs were that event, RM> then all existing sockets would be soref'ed but we wouldn't be RM> soref'ing future sockets created in the lower layer after the mount. RM> This doesn't seem correct. RM> - Possible solution: null_nodeget() semantics are replaced with RM> something that actually allows vnodes in the upper layer to be created RM> and destroyed. RM> - Possible solution: upper layer has a memory structure to keep track RM> of which sockets in the lower layer have been soref'ed. It looks like there is no need in setting vp->v_un = lowervp->v_un for VFIFO. They work without this modification bypassing vnode operations to lover node and lowervp->v_un is used. The issue is only with local sockets, because when bind or connnect is called for nullfs file the upper v_un is used. For me the approach "vp->v_un = lowervp->v_un" has many complications. May be it is much easier to use always only lower vnode? What we need for this is to make bind and connect get the lower vnode when they are called on nullfs file. As a proof of concept below is a patch that implements it. Currently I am not sure that vrele/vref magic is done properly, but it looks like it works for me. The issues with this approach I see so far: - we need an additional flag for namei; - nullfs can be unmounted with a socket file still being opened. -- Mikolaj Golub
2011/9/25 Mikolaj Golub <trociny@freebsd.org>: > As a proof of concept below is a patch that implements it. This works very well, I'm currently using your patch to run X11 over a nullfs-mounted /tmp. > The issues with this approach I see so far: > > - we need an additional flag for namei; What does this involve?
RM> 2011/9/25 Mikolaj Golub <trociny@freebsd.org>: >> As a proof of concept below is a patch that implements it. RM> This works very well, I'm currently using your patch to run X11 over a RM> nullfs-mounted /tmp. >> The issues with this approach I see so far: >> >> - we need an additional flag for namei; RM> What does this involve? Well, adding yet another flag just to handle this one case might be not very good idea :-) But actually it is possible to do without the additional flag, with the only hack in nullfs code: in lookup and create return lower vnode if it is a socket, like in the patch below. It works for me but I have not tested much and not checked yet if use cases are possible when this makes undesirable effect. -- Mikolaj Golub
Hi Mikolaj, > But actually it is possible to do without the additional flag, with the only > hack in nullfs code: in lookup and create return lower vnode if it is a > socket, like in the patch below. It works for me but I have not tested much > and not checked yet if use cases are possible when this makes undesirable > effect. I've been using your patch (with 8.1 kernel on amd64) for two months now, and didn't notice any ill effects. Do you have plans on committing it?
On Sun, 27 Nov 2011 18:44:30 +0100 Robert Millan wrote: RM> Hi Mikolaj, Hi >> But actually it is possible to do without the additional flag, with the only >> hack in nullfs code: in lookup and create return lower vnode if it is a >> socket, like in the patch below. It works for me but I have not tested much >> and not checked yet if use cases are possible when this makes undesirable >> effect. RM> I've been using your patch (with 8.1 kernel on amd64) for two months RM> now, and didn't notice any ill effects. RM> Do you have plans on committing it? Thanks for testing! I wouldn't like to commit it without trying to find a better solution. After I got some suggestions from kib@ I hope will be able to come with something better than this patch. -- Mikolaj Golub
Responsible Changed From-To: freebsd-bugs->freebsd-fs Over to maintainer(s).
Author: trociny Date: Wed Feb 29 21:38:31 2012 New Revision: 232317 URL: http://svn.freebsd.org/changeset/base/232317 Log: Introduce VOP_UNP_BIND(), VOP_UNP_CONNECT(), and VOP_UNP_DETACH() operations for setting and accessing vnode's v_socket field. The operations are necessary to implement proper unix socket handling on layered file systems like nullfs(5). This change fixes the long standing issue with nullfs(5) being in that unix sockets did not work between lower and upper layers: if we bound to a socket on the lower layer we could connect only to the lower path; if we bound to the upper layer we could connect only to the upper path. The new behavior is one can connect to both the lower and the upper paths regardless what layer path one binds to. PR: kern/51583, kern/159663 Suggested by: kib Reviewed by: arch MFC after: 2 weeks Modified: head/UPDATING head/sys/kern/uipc_usrreq.c head/sys/kern/vfs_default.c head/sys/kern/vnode_if.src head/sys/sys/vnode.h Modified: head/UPDATING ============================================================================== --- head/UPDATING Wed Feb 29 21:11:02 2012 (r232316) +++ head/UPDATING Wed Feb 29 21:38:31 2012 (r232317) @@ -22,6 +22,14 @@ NOTE TO PEOPLE WHO THINK THAT FreeBSD 10 machines to maximize performance. (To disable malloc debugging, run ln -s aj /etc/malloc.conf.) +20120229: + Now unix domain sockets behave "as expected" on nullfs(5). Previously + nullfs(5) did not pass through all behaviours to the underlying layer, + as a result if we bound to a socket on the lower layer we could connect + only to the lower path; if we bound to the upper layer we could connect + only to the upper path. The new behavior is one can connect to both the + lower and the upper paths regardless what layer path one binds to. + 20120211: The getifaddrs upgrade path broken with 20111215 has been restored. If you have upgraded in between 20111215 and 20120209 you need to Modified: head/sys/kern/uipc_usrreq.c ============================================================================== --- head/sys/kern/uipc_usrreq.c Wed Feb 29 21:11:02 2012 (r232316) +++ head/sys/kern/uipc_usrreq.c Wed Feb 29 21:38:31 2012 (r232317) @@ -542,7 +542,7 @@ restart: UNP_LINK_WLOCK(); UNP_PCB_LOCK(unp); - vp->v_socket = unp->unp_socket; + VOP_UNP_BIND(vp, unp->unp_socket); unp->unp_vnode = vp; unp->unp_addr = soun; unp->unp_flags &= ~UNP_BINDING; @@ -638,7 +638,7 @@ uipc_detach(struct socket *so) * XXXRW: Should assert vp->v_socket == so. */ if ((vp = unp->unp_vnode) != NULL) { - unp->unp_vnode->v_socket = NULL; + VOP_UNP_DETACH(vp); unp->unp_vnode = NULL; } unp2 = unp->unp_conn; @@ -1308,7 +1308,7 @@ unp_connect(struct socket *so, struct so * and to protect simultaneous locking of multiple pcbs. */ UNP_LINK_WLOCK(); - so2 = vp->v_socket; + VOP_UNP_CONNECT(vp, &so2); if (so2 == NULL) { error = ECONNREFUSED; goto bad2; @@ -2318,17 +2318,15 @@ vfs_unp_reclaim(struct vnode *vp) active = 0; UNP_LINK_WLOCK(); - so = vp->v_socket; + VOP_UNP_CONNECT(vp, &so); if (so == NULL) goto done; unp = sotounpcb(so); if (unp == NULL) goto done; UNP_PCB_LOCK(unp); - if (unp->unp_vnode != NULL) { - KASSERT(unp->unp_vnode == vp, - ("vfs_unp_reclaim: vp != unp->unp_vnode")); - vp->v_socket = NULL; + if (unp->unp_vnode == vp) { + VOP_UNP_DETACH(vp); unp->unp_vnode = NULL; active = 1; } Modified: head/sys/kern/vfs_default.c ============================================================================== --- head/sys/kern/vfs_default.c Wed Feb 29 21:11:02 2012 (r232316) +++ head/sys/kern/vfs_default.c Wed Feb 29 21:38:31 2012 (r232317) @@ -123,6 +123,9 @@ struct vop_vector default_vnodeops = { .vop_unlock = vop_stdunlock, .vop_vptocnp = vop_stdvptocnp, .vop_vptofh = vop_stdvptofh, + .vop_unp_bind = vop_stdunp_bind, + .vop_unp_connect = vop_stdunp_connect, + .vop_unp_detach = vop_stdunp_detach, }; /* @@ -1037,6 +1040,30 @@ vop_stdadvise(struct vop_advise_args *ap return (error); } +int +vop_stdunp_bind(struct vop_unp_bind_args *ap) +{ + + ap->a_vp->v_socket = ap->a_socket; + return (0); +} + +int +vop_stdunp_connect(struct vop_unp_connect_args *ap) +{ + + *ap->a_socket = ap->a_vp->v_socket; + return (0); +} + +int +vop_stdunp_detach(struct vop_unp_detach_args *ap) +{ + + ap->a_vp->v_socket = NULL; + return (0); +} + /* * vfs default ops * used to fill the vfs function table to get reasonable default return values. Modified: head/sys/kern/vnode_if.src ============================================================================== --- head/sys/kern/vnode_if.src Wed Feb 29 21:11:02 2012 (r232316) +++ head/sys/kern/vnode_if.src Wed Feb 29 21:38:31 2012 (r232317) @@ -640,6 +640,26 @@ vop_advise { IN int advice; }; +%% unp_bind vp E E E + +vop_unp_bind { + IN struct vnode *vp; + IN struct socket *socket; +}; + +%% unp_connect vp L L L + +vop_unp_connect { + IN struct vnode *vp; + OUT struct socket **socket; +}; + +%% unp_detach vp = = = + +vop_unp_detach { + IN struct vnode *vp; +}; + # The VOPs below are spares at the end of the table to allow new VOPs to be # added in stable branches without breaking the KBI. New VOPs in HEAD should # be added above these spares. When merging a new VOP to a stable branch, Modified: head/sys/sys/vnode.h ============================================================================== --- head/sys/sys/vnode.h Wed Feb 29 21:11:02 2012 (r232316) +++ head/sys/sys/vnode.h Wed Feb 29 21:38:31 2012 (r232317) @@ -703,6 +703,9 @@ int vop_stdpathconf(struct vop_pathconf_ int vop_stdpoll(struct vop_poll_args *); int vop_stdvptocnp(struct vop_vptocnp_args *ap); int vop_stdvptofh(struct vop_vptofh_args *ap); +int vop_stdunp_bind(struct vop_unp_bind_args *ap); +int vop_stdunp_connect(struct vop_unp_connect_args *ap); +int vop_stdunp_detach(struct vop_unp_detach_args *ap); int vop_eopnotsupp(struct vop_generic_args *ap); int vop_ebadf(struct vop_generic_args *ap); int vop_einval(struct vop_generic_args *ap); _______________________________________________ svn-src-all@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-all To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
Author: trociny Date: Tue Apr 24 19:08:40 2012 New Revision: 234660 URL: http://svn.freebsd.org/changeset/base/234660 Log: MFC r232317: Introduce VOP_UNP_BIND(), VOP_UNP_CONNECT(), and VOP_UNP_DETACH() operations for setting and accessing vnode's v_socket field. The operations are necessary to implement proper unix socket handling on layered file systems like nullfs(5). This change fixes the long standing issue with nullfs(5) being in that unix sockets did not work between lower and upper layers: if we bound to a socket on the lower layer we could connect only to the lower path; if we bound to the upper layer we could connect only to the upper path. The new behavior is one can connect to both the lower and the upper paths regardless what layer path one binds to. PR: kern/51583, kern/159663 Suggested by: kib Reviewed by: arch Modified: stable/9/UPDATING (contents, props changed) stable/9/sys/kern/uipc_usrreq.c stable/9/sys/kern/vfs_default.c stable/9/sys/kern/vnode_if.src stable/9/sys/sys/vnode.h Directory Properties: stable/9/sys/ (props changed) Modified: stable/9/UPDATING ============================================================================== --- stable/9/UPDATING Tue Apr 24 19:00:42 2012 (r234659) +++ stable/9/UPDATING Tue Apr 24 19:08:40 2012 (r234660) @@ -9,6 +9,14 @@ handbook. Items affecting the ports and packages system can be found in /usr/ports/UPDATING. Please read that file before running portupgrade. +20120422: + Now unix domain sockets behave "as expected" on nullfs(5). Previously + nullfs(5) did not pass through all behaviours to the underlying layer, + as a result if we bound to a socket on the lower layer we could connect + only to the lower path; if we bound to the upper layer we could connect + only to the upper path. The new behavior is one can connect to both the + lower and the upper paths regardless what layer path one binds to. + 20120109: The acpi_wmi(4) status device /dev/wmistat has been renamed to /dev/wmistat0. Modified: stable/9/sys/kern/uipc_usrreq.c ============================================================================== --- stable/9/sys/kern/uipc_usrreq.c Tue Apr 24 19:00:42 2012 (r234659) +++ stable/9/sys/kern/uipc_usrreq.c Tue Apr 24 19:08:40 2012 (r234660) @@ -541,7 +541,7 @@ restart: UNP_LINK_WLOCK(); UNP_PCB_LOCK(unp); - vp->v_socket = unp->unp_socket; + VOP_UNP_BIND(vp, unp->unp_socket); unp->unp_vnode = vp; unp->unp_addr = soun; unp->unp_flags &= ~UNP_BINDING; @@ -637,7 +637,7 @@ uipc_detach(struct socket *so) * XXXRW: Should assert vp->v_socket == so. */ if ((vp = unp->unp_vnode) != NULL) { - unp->unp_vnode->v_socket = NULL; + VOP_UNP_DETACH(vp); unp->unp_vnode = NULL; } unp2 = unp->unp_conn; @@ -1307,7 +1307,7 @@ unp_connect(struct socket *so, struct so * and to protect simultaneous locking of multiple pcbs. */ UNP_LINK_WLOCK(); - so2 = vp->v_socket; + VOP_UNP_CONNECT(vp, &so2); if (so2 == NULL) { error = ECONNREFUSED; goto bad2; @@ -2317,17 +2317,15 @@ vfs_unp_reclaim(struct vnode *vp) active = 0; UNP_LINK_WLOCK(); - so = vp->v_socket; + VOP_UNP_CONNECT(vp, &so); if (so == NULL) goto done; unp = sotounpcb(so); if (unp == NULL) goto done; UNP_PCB_LOCK(unp); - if (unp->unp_vnode != NULL) { - KASSERT(unp->unp_vnode == vp, - ("vfs_unp_reclaim: vp != unp->unp_vnode")); - vp->v_socket = NULL; + if (unp->unp_vnode == vp) { + VOP_UNP_DETACH(vp); unp->unp_vnode = NULL; active = 1; } Modified: stable/9/sys/kern/vfs_default.c ============================================================================== --- stable/9/sys/kern/vfs_default.c Tue Apr 24 19:00:42 2012 (r234659) +++ stable/9/sys/kern/vfs_default.c Tue Apr 24 19:08:40 2012 (r234660) @@ -123,6 +123,9 @@ struct vop_vector default_vnodeops = { .vop_unlock = vop_stdunlock, .vop_vptocnp = vop_stdvptocnp, .vop_vptofh = vop_stdvptofh, + .vop_unp_bind = vop_stdunp_bind, + .vop_unp_connect = vop_stdunp_connect, + .vop_unp_detach = vop_stdunp_detach, }; /* @@ -1037,6 +1040,30 @@ vop_stdadvise(struct vop_advise_args *ap return (error); } +int +vop_stdunp_bind(struct vop_unp_bind_args *ap) +{ + + ap->a_vp->v_socket = ap->a_socket; + return (0); +} + +int +vop_stdunp_connect(struct vop_unp_connect_args *ap) +{ + + *ap->a_socket = ap->a_vp->v_socket; + return (0); +} + +int +vop_stdunp_detach(struct vop_unp_detach_args *ap) +{ + + ap->a_vp->v_socket = NULL; + return (0); +} + /* * vfs default ops * used to fill the vfs function table to get reasonable default return values. Modified: stable/9/sys/kern/vnode_if.src ============================================================================== --- stable/9/sys/kern/vnode_if.src Tue Apr 24 19:00:42 2012 (r234659) +++ stable/9/sys/kern/vnode_if.src Tue Apr 24 19:08:40 2012 (r234660) @@ -640,23 +640,31 @@ vop_advise { IN int advice; }; -# The VOPs below are spares at the end of the table to allow new VOPs to be -# added in stable branches without breaking the KBI. New VOPs in HEAD should -# be added above these spares. When merging a new VOP to a stable branch, -# the new VOP should replace one of the spares. +%% unp_bind vp E E E -vop_spare1 { +vop_unp_bind { IN struct vnode *vp; + IN struct socket *socket; }; -vop_spare2 { +%% unp_connect vp L L L + +vop_unp_connect { IN struct vnode *vp; + OUT struct socket **socket; }; -vop_spare3 { +%% unp_detach vp = = = + +vop_unp_detach { IN struct vnode *vp; }; +# The VOPs below are spares at the end of the table to allow new VOPs to be +# added in stable branches without breaking the KBI. New VOPs in HEAD should +# be added above these spares. When merging a new VOP to a stable branch, +# the new VOP should replace one of the spares. + vop_spare4 { IN struct vnode *vp; }; Modified: stable/9/sys/sys/vnode.h ============================================================================== --- stable/9/sys/sys/vnode.h Tue Apr 24 19:00:42 2012 (r234659) +++ stable/9/sys/sys/vnode.h Tue Apr 24 19:08:40 2012 (r234660) @@ -707,6 +707,9 @@ int vop_stdpathconf(struct vop_pathconf_ int vop_stdpoll(struct vop_poll_args *); int vop_stdvptocnp(struct vop_vptocnp_args *ap); int vop_stdvptofh(struct vop_vptofh_args *ap); +int vop_stdunp_bind(struct vop_unp_bind_args *ap); +int vop_stdunp_connect(struct vop_unp_connect_args *ap); +int vop_stdunp_detach(struct vop_unp_detach_args *ap); int vop_eopnotsupp(struct vop_generic_args *ap); int vop_ebadf(struct vop_generic_args *ap); int vop_einval(struct vop_generic_args *ap); _______________________________________________ svn-src-all@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-all To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
State Changed From-To: open->closed The fix merged to stable/9.