Bug 251363 - use unionfs as a disk-cache for NFS [feature]
Summary: use unionfs as a disk-cache for NFS [feature]
Status: Closed Not Accepted
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 12.2-RELEASE
Hardware: Any Any
: --- Affects Some People
Assignee: freebsd-fs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-11-25 03:30 UTC by Gunther Schadow
Modified: 2021-06-02 00:15 UTC (History)
4 users (show)

See Also:


Attachments
the patch as a separate file (5.05 KB, patch)
2020-11-26 02:57 UTC, Gunther Schadow
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Gunther Schadow 2020-11-25 03:30:43 UTC
I just asked a question on stackexchange and developed a solution which could be useful to others too. May I refer you to the detail (nicely written up and laid out with source code edits to all but get this done here: https://unix.stackexchange.com/questions/621430/has-anybody-heard-of-a-trick-to-make-a-local-hard-drive-based-cache-of-nfs-p

Summary:

Problem: NFS can be slow, when starting binaries (e.g. /usr/bin) over NFS, such as in a network booted system. RAM buffer cache might not be sufficient to avoid slowness.

Idea: it seems we should be able to have a local disk cache which would save the file(s) locally as they are pulled from NFS. The unionfs kernel module (mount_unionfs) has a copy-on-write feature, and with only fairly few simple changes in the source code, we could turn this into a copy-always (copy-on-read) disk cache for NFS. Here is the /etc/fstab file that I am thinking:

# Custom /etc/fstab for FreeBSD VM images
/dev/gpt/rootfs  /        ufs      rw      1       1
/dev/gpt/varfs   /var     ufs      rw      1       1
fdesc            /dev/fd  fdescfs  rw      0       0
proc             /proc    procfs   rw      0       0
/usr             /.usr    nullfs   rw      0       0
fs-xxxxxxxx.efs.rrrr.amazonaws.com:/ /usr nfs rw,nfsv4,minorversion=1,oneopenown,...,late,bg 0 0
/dev/gpt/cachefs /usr     unionfs  rw,copypolicy=always,copymode=transparent,whiteout=whenneeded 0 0

in my source code walk-through I am showing how this can be implemented. I don't have a source tree installed currently and only an AWS instance at hand, so it will take me some time to actually do this. But I wanted to throw it out here to see if others find it cool and useful and if they want to review/criticize my code proposal already or run with it?
Comment 1 Gunther Schadow 2020-11-25 05:02:37 UTC
Update, I have applied the patch of the 12.2-RELEASE kernel source and building it now.
Comment 2 Gunther Schadow 2020-11-25 05:15:11 UTC
Cross references on the subject:

Feature request and mentioning Solaris cachefs and Linux FS-cache: 
https://lists.freebsd.org/pipermail/freebsd-questions/2008-August/181446.html
Comment 3 Gunther Schadow 2020-11-25 05:16:30 UTC
... the above request answered here: https://lists.freebsd.org/pipermail/freebsd-fs/2008-September/005140.html, it was acknowledged but said that nothing exists.
Comment 4 Gunther Schadow 2020-11-25 05:18:04 UTC
Packrat patches for NFSv4, experimental: https://people.freebsd.org/~rmacklem/
Comment 5 Gunther Schadow 2020-11-25 15:57:02 UTC
Updated the stackexchange link with a draft design to allow to use rm and rmdir to evict entries from the cache, new whitemode UNIONFS_WHITE_NEVER would do that. I love how the kernel sources seem to be quite clean and it is easy to add these improvements. At least that's my impression before actually having run any of them (kernel is still compiling on a AWS EC2 t2.micro instance with no remaining CPU credits -- reminds me of compiling the 386BSD kernel in 1992 on my i486 PC (or the Ultix 4.2 kernel on the VAX 6000-340 ;-) )
Comment 6 Gunther Schadow 2020-11-26 01:01:49 UTC
Success! Here is my full diff in /usr/src/sys/fs/unionfs
# svnlite diff -x-w
Index: union.h
===================================================================
--- union.h     (revision 368012)
+++ union.h     (working copy)
@@ -40,6 +40,12 @@

 #ifdef _KERNEL

+/* copy policy from lower to upper layer */
+typedef enum _unionfs_copypolicy {
+        UNIONFS_COPY_ON_WRITE = 0,
+        UNIONFS_COPY_ALWAYS
+} unionfs_copypolicy;
+
 /* copy method of attr from lower to upper */
 typedef enum _unionfs_copymode {
        UNIONFS_TRADITIONAL = 0,
@@ -57,6 +63,7 @@
        struct vnode   *um_lowervp;     /* VREFed once */
        struct vnode   *um_uppervp;     /* VREFed once */
        struct vnode   *um_rootvp;      /* ROOT vnode */
+        unionfs_copypolicy um_copypolicy;
        unionfs_copymode um_copymode;
        unionfs_whitemode um_whitemode;
        uid_t           um_uid;
Index: union_vfsops.c
===================================================================
--- union_vfsops.c      (revision 368012)
+++ union_vfsops.c      (working copy)
@@ -89,6 +89,7 @@
        gid_t           gid;
        u_short         udir;
        u_short         ufile;
+       unionfs_copypolicy copypolicy;
        unionfs_copymode copymode;
        unionfs_whitemode whitemode;
        struct nameidata nd, *ndp;
@@ -102,6 +103,7 @@
        gid = 0;
        udir = 0;
        ufile = 0;
+       copypolicy = UNIONFS_COPY_ON_WRITE;
        copymode = UNIONFS_TRANSPARENT; /* default */
        whitemode = UNIONFS_WHITE_ALWAYS;
        ndp = &nd;
@@ -190,6 +192,20 @@
                                return (EINVAL);
                        }
                }
+               if (vfs_getopt(mp->mnt_optnew, "copypolicy", (void **)&tmp,
+                              NULL) == 0) {
+                 if (tmp == NULL) {
+                   vfs_mount_error(mp, "Invalid copy policy");
+                   return (EINVAL);
+                 } else if (strcasecmp(tmp, "always") == 0)
+                   copypolicy = UNIONFS_COPY_ALWAYS;
+                 else if (strcasecmp(tmp, "onwrite") == 0)
+                   copypolicy = UNIONFS_COPY_ON_WRITE;
+                 else {
+                   vfs_mount_error(mp, "Invalid copy policy");
+                   return (EINVAL);
+                 }
+               }
                if (vfs_getopt(mp->mnt_optnew, "copymode", (void **)&tmp,
                    NULL) == 0) {
                        if (tmp == NULL) {
@@ -229,7 +245,7 @@

        UNIONFSDEBUG("unionfs_mount: uid=%d, gid=%d\n", uid, gid);
        UNIONFSDEBUG("unionfs_mount: udir=0%03o, ufile=0%03o\n", udir, ufile);
-       UNIONFSDEBUG("unionfs_mount: copymode=%d\n", copymode);
+        UNIONFSDEBUG("unionfs_mount: copypolicy=%d, copymode=%d, whitemode=%d\n", copypolicy, copymode, whitemode);

        /*
         * Find upper node
@@ -265,6 +281,7 @@
        ump->um_gid = gid;
        ump->um_udir = udir;
        ump->um_ufile = ufile;
+       ump->um_copypolicy = copypolicy;
        ump->um_copymode = copymode;
        ump->um_whitemode = whitemode;

Index: union_vnops.c
===================================================================
--- union_vnops.c       (revision 368012)
+++ union_vnops.c       (working copy)
@@ -464,6 +464,7 @@
 {
        int             error;
        struct unionfs_node *unp;
+       struct unionfs_mount *ump;
        struct unionfs_node_status *unsp;
        struct vnode   *uvp;
        struct vnode   *lvp;
@@ -477,6 +478,7 @@

        error = 0;
        unp = VTOUNIONFS(ap->a_vp);
+       ump = MOUNTTOUNIONFSMOUNT(ap->a_vp->v_mount);
        uvp = unp->un_uppervp;
        lvp = unp->un_lowervp;
        targetvp = NULLVP;
@@ -498,7 +500,7 @@
        }
        if (targetvp == NULLVP) {
                if (uvp == NULLVP) {
-                       if ((ap->a_mode & FWRITE) && lvp->v_type == VREG) {
+                       if (((ap->a_mode & FWRITE) || (ump->um_copypolicy == UNIONFS_COPY_ALWAYS)) && lvp->v_type == VREG) {
                                error = unionfs_copyfile(unp,
                                    !(ap->a_mode & O_TRUNC), cred, td);
                                if (error != 0)
r


Testing:

# mdconfig -a -t vnode /data/cachfs
md0
newfs /dev/md0
/dev/md0: 1024.0MB (2097152 sectors) block size 32768, fragment size 4096
        using 4 cylinder groups of 256.03MB, 8193 blks, 32896 inodes.
super-block backups (for fsck_ffs -b #) at:
 192, 524544, 1048896, 1573248
# mkdir /cache
# mount /dev/md0 /cache
# mount_unionfs -o copypolicy=always /cache /worker-efs
# /efs/bin/less
...
# /efs/local/bin/python3.7
...
# find /cache
cache
cache/.snap
cache/bin
cache/bin/less
cache/sbin
cache/local
cache/local/bin
cache/local/bin/python3.7
cache/local/lib
cache/local/lib/python3.7
cache/local/lib/python3.7/lib-dynload
cache/local/lib/python3.7/lib-dynload/readline.so
cache/local/lib/python3.7/encodings
cache/local/lib/python3.7/encodings/__pycache__
cache/local/lib/python3.7/encodings/__pycache__/__init__.cpython-37.pyc
cache/local/lib/python3.7/encodings/__pycache__/aliases.cpython-37.pyc
cache/local/lib/python3.7/encodings/__pycache__/utf_8.cpython-37.pyc
cache/local/lib/python3.7/encodings/__pycache__/latin_1.cpython-37.pyc
cache/local/lib/python3.7/__pycache__
cache/local/lib/python3.7/__pycache__/codecs.cpython-37.pyc
cache/local/lib/python3.7/__pycache__/io.cpython-37.pyc
cache/local/lib/python3.7/__pycache__/abc.cpython-37.pyc
cache/local/lib/python3.7/__pycache__/site.cpython-37.pyc
cache/local/lib/python3.7/__pycache__/os.cpython-37.pyc
cache/local/lib/python3.7/__pycache__/stat.cpython-37.pyc
cache/local/lib/python3.7/__pycache__/_collections_abc.cpython-37.pyc
cache/local/lib/python3.7/__pycache__/posixpath.cpython-37.pyc
cache/local/lib/python3.7/__pycache__/genericpath.cpython-37.pyc
cache/local/lib/python3.7/__pycache__/_sitebuiltins.cpython-37.pyc
cache/local/lib/python3.7/__pycache__/_bootlocale.cpython-37.pyc
cache/local/lib/python3.7/__pycache__/rlcompleter.cpython-37.pyc
cache/local/lib/python3.7/site-packages
cache/local/lib/python3.7/site-packages/easy-install.pth
...

So this is really cool, I can even evict the cache by just writing to the unionfs!

This is nice, so I might not even need that UNIONFS_WIHTE_NEVER.
Comment 7 Gunther Schadow 2020-11-26 01:08:02 UTC
Erratum: I said "I can evict from the cache by writing to the unionfs" -- I meant by removing files from that /cache file system. I wasn't clear that in order to mount a unionfs, you must already have the upper layer file system mounted somewhere. And then you can access just the upper layer, where we can clear the cache. So I do not need that UNIONFS_WHITE_NEVER option at all.

What might be cool was some kind of hook to register if the unionfs_copyfile errors with no space left on device, we could call that evictor hook, which then would delete from the upper layer by oldest atime.
Comment 8 Gunther Schadow 2020-11-26 02:32:37 UTC
Another erratum, I didn't realize at first the unionfs doesn't even require a block device nor a mount-point directory, but rather, it can use any directory. This is awesome, so none of this mdconfig and newfs suff is needed at all. Forget that part of my test.
Comment 9 Gunther Schadow 2020-11-26 02:50:15 UTC
For completeness sake, I updated the mount_unionfs.8 man file also.

Full diff out of /usr/src with svnlite diff -x-w:

Index: sbin/mount_unionfs/mount_unionfs.8
===================================================================
--- sbin/mount_unionfs/mount_unionfs.8  (revision 368012)
+++ sbin/mount_unionfs/mount_unionfs.8  (working copy)
@@ -83,6 +83,16 @@
 However,
 .Ar uniondir
 remains the mount point.
+.It Cm copypolicy No = Cm onwrite | always
+The normal behavior is copy-on-write with
+.Cm onwrite
+this is the standaed unionfs behavior. If
+.Cm always
+is spefied, then the a copy is made on the upper layer even if the
+file or directory is accessed only for reading. This effectively
+turns the upper layer into a cache which is an easy way to implement
+a disk cache over an NFS filesystem, speeding up all subsequent accesses,
+very useful for network booted systems.
 .It Cm copymode No = Cm traditional | transparent | masquerade
 Specifies the way to create a file or a directory in the upper layer
 automatically when needed.
Index: sys/fs/unionfs/union.h
===================================================================
--- sys/fs/unionfs/union.h      (revision 368012)
+++ sys/fs/unionfs/union.h      (working copy)
@@ -40,6 +40,12 @@

 #ifdef _KERNEL

+/* copy policy from lower to upper layer */
+typedef enum _unionfs_copypolicy {
+        UNIONFS_COPY_ON_WRITE = 0,
+        UNIONFS_COPY_ALWAYS
+} unionfs_copypolicy;
+
 /* copy method of attr from lower to upper */
 typedef enum _unionfs_copymode {
        UNIONFS_TRADITIONAL = 0,
@@ -57,6 +63,7 @@
        struct vnode   *um_lowervp;     /* VREFed once */
        struct vnode   *um_uppervp;     /* VREFed once */
        struct vnode   *um_rootvp;      /* ROOT vnode */
+        unionfs_copypolicy um_copypolicy;
        unionfs_copymode um_copymode;
        unionfs_whitemode um_whitemode;
        uid_t           um_uid;
Index: sys/fs/unionfs/union_vfsops.c
===================================================================
--- sys/fs/unionfs/union_vfsops.c       (revision 368012)
+++ sys/fs/unionfs/union_vfsops.c       (working copy)
@@ -89,6 +89,7 @@
        gid_t           gid;
        u_short         udir;
        u_short         ufile;
+       unionfs_copypolicy copypolicy;
        unionfs_copymode copymode;
        unionfs_whitemode whitemode;
        struct nameidata nd, *ndp;
@@ -102,6 +103,7 @@
        gid = 0;
        udir = 0;
        ufile = 0;
+       copypolicy = UNIONFS_COPY_ON_WRITE;
        copymode = UNIONFS_TRANSPARENT; /* default */
        whitemode = UNIONFS_WHITE_ALWAYS;
        ndp = &nd;
@@ -190,6 +192,20 @@
                                return (EINVAL);
                        }
                }
+               if (vfs_getopt(mp->mnt_optnew, "copypolicy", (void **)&tmp,
+                              NULL) == 0) {
+                 if (tmp == NULL) {
+                   vfs_mount_error(mp, "Invalid copy policy");
+                   return (EINVAL);
+                 } else if (strcasecmp(tmp, "always") == 0)
+                   copypolicy = UNIONFS_COPY_ALWAYS;
+                 else if (strcasecmp(tmp, "onwrite") == 0)
+                   copypolicy = UNIONFS_COPY_ON_WRITE;
+                 else {
+                   vfs_mount_error(mp, "Invalid copy policy");
+                   return (EINVAL);
+                 }
+               }
                if (vfs_getopt(mp->mnt_optnew, "copymode", (void **)&tmp,
                    NULL) == 0) {
                        if (tmp == NULL) {
@@ -229,7 +245,7 @@

        UNIONFSDEBUG("unionfs_mount: uid=%d, gid=%d\n", uid, gid);
        UNIONFSDEBUG("unionfs_mount: udir=0%03o, ufile=0%03o\n", udir, ufile);
-       UNIONFSDEBUG("unionfs_mount: copymode=%d\n", copymode);
+        UNIONFSDEBUG("unionfs_mount: copypolicy=%d, copymode=%d, whitemode=%d\n", copypolicy, copymode, whitemode);

        /*
         * Find upper node
@@ -265,6 +281,7 @@
        ump->um_gid = gid;
        ump->um_udir = udir;
        ump->um_ufile = ufile;
+       ump->um_copypolicy = copypolicy;
        ump->um_copymode = copymode;
        ump->um_whitemode = whitemode;

Index: sys/fs/unionfs/union_vnops.c
===================================================================
--- sys/fs/unionfs/union_vnops.c        (revision 368012)
+++ sys/fs/unionfs/union_vnops.c        (working copy)
@@ -464,6 +464,7 @@
 {
        int             error;
        struct unionfs_node *unp;
+       struct unionfs_mount *ump;
        struct unionfs_node_status *unsp;
        struct vnode   *uvp;
        struct vnode   *lvp;
@@ -477,6 +478,7 @@

        error = 0;
        unp = VTOUNIONFS(ap->a_vp);
+       ump = MOUNTTOUNIONFSMOUNT(ap->a_vp->v_mount);
        uvp = unp->un_uppervp;
        lvp = unp->un_lowervp;
        targetvp = NULLVP;
@@ -498,7 +500,7 @@
        }
        if (targetvp == NULLVP) {
                if (uvp == NULLVP) {
-                       if ((ap->a_mode & FWRITE) && lvp->v_type == VREG) {
+                       if (((ap->a_mode & FWRITE) || (ump->um_copypolicy == UNIONFS_COPY_ALWAYS)) && lvp->v_type == VREG) {
                                error = unionfs_copyfile(unp,
                                    !(ap->a_mode & O_TRUNC), cred, td);
                                if (error != 0)
r
Comment 10 Gunther Schadow 2020-11-26 02:54:07 UTC
(forget the r in last line above)

I read the Handbook about how contributions are submitted and I am glad to have found out that they are submitted on bug tickets. So that is what I did.

I think you want to include this in some of the next releases. It's such a useful and inoffensive new feature!
Comment 11 Gunther Schadow 2020-11-26 02:57:20 UTC
Created attachment 219985 [details]
the patch as a separate file
Comment 12 Gunther Schadow 2020-11-28 03:48:07 UTC
While it works in principle, there is a little problem.

Here is on a system which actually has /usr mounted via NFS. Root sets up the unionfs cache:

# mkdir /var/cache/
# mount -t unionfs -o copypolicy=always /var/cache/usr /usr
# df
Filesystem                                        1K-blocks              Used            Avail Capacity  Mounted on
/dev/gpt/rootfs                                      507280            245844           220856    53%    /
devfs                                                     1                 1                0   100%    /dev
/dev/gpt/varfs                                       507536            158192           308744    34%    /var
fdescfs                                                   1                 1                0   100%    /dev/fd
procfs                                                    4                 4                0   100%    /proc
/usr                                                 507280            245844           220856    53%    /.usr
fs-ca0cf63f.efs.us-east-1.amazonaws.com:/  9007199254739968           6639576 9007199248100392     0%    /usr
<above>:/var/cache/usr                    -9007199254234480 -9007199254583824           308744   100%    /usr

Now another user just uses which(1)

$ which java
bash: /usr/bin/which: Permission denied
$ ls -l /usr/bin/which
-r-xr-xr-x  1 root  wheel  7816 Oct 23 06:48 /usr/bin/which
$ ls -l /var/cache/usr/bin
total 0

oh, so the copying of the file did not work? Why?

$ find /var/cache/ -ls
 16394        8 drwxr-xr-x    3 root wheel   512 Nov 26 23:12 /var/cache/
 16384        0 lrwxr-xr-x    1 root wheel    18 Oct 31 18:16 /var/cache/pkg -> /usr/var/cache/pkg
 16407        8 drwxr-xr-x    8 root wheel   512 Nov 28 03:10 /var/cache/usr
 16387        8 drwxr-xr-x    5 root wheel   512 Nov 28 03:07 /var/cache/usr/local
 16410        8 drwxr-xr-x    2 root wheel   512 Oct 31 03:59 /var/cache/usr/local/etc
 16422        8 drwxr-xr-x    2 root wheel   512 Oct 31 03:59 /var/cache/usr/local/share
 16423        8 drwxr-xr-x    2 root wheel   512 Oct 31 03:59 /var/cache/usr/local/lib
 16416        8 drwxr-xr-x    2 root wheel   512 Oct 23 06:50 /var/cache/usr/sbin
 16417        8 drwxr-xr-x    2 root wheel   512 Oct 31 18:20 /var/cache/usr/bin
 16418        8 drwxr-xr-x    4 root wheel   512 Nov 28 03:07 /var/cache/usr/share
 16420        8 drwxr-xr-x    3 root wheel   512 Nov 28 03:07 /var/cache/usr/share/nls
 16421        8 drwxr-xr-x    2 root wheel   512 Oct 23 06:49 /var/cache/usr/share/nls/C
 16431        8 drwxr-xr-x    2 root wheel   512 Oct 23 06:48 /var/cache/usr/share/zoneinfo
 16424        8 drwxr-xr-x    2 root wheel   512 Nov 28 03:10 /var/cache/usr/lib
 16440      104 -r--r--r--    1 root wheel 53048 Oct 23 06:45 /var/cache/usr/lib/libpam.so.6
 16434        8 drwxr-xr-x    2 root wheel   512 Nov 28 03:10 /var/cache/usr/libexec
 16439       40 -r-xr-xr-x    1 root wheel 16576 Oct 23 06:45 /var/cache/usr/libexec/atrun

Could it be that the problem comes when it's not the root user who is trying to access the unionfs system? Yes, that could be! Because the expectation of unionfs_copyfile(...) is that the user actually had write permission!

Prove that, as toot:

# which java
/usr/local/bin/java

and now over for the normal user:

$ which java
/usr/local/bin/java

working too now, and indeed it was copied;

$ ls -l /var/cache/usr/bin/which
-r-xr-xr-x  1 root  wheel  7816 Oct 23 06:48 /var/cache/usr/bin/which

So, to fix this, what I need is a way to run unionfs_copyfile(...) under root permissions, it being inside the kernel I thought that's what it would do anyway. But apparently it doesn't and apparently this issue doesn't get to be seen.

But what about standard unionfs use as copy on write. What if the lower layer has a file with user root, with group write permission, and now I come as the non-root user, but in the same group, and wanting to write this file.

Let's try.

# mkdir test-lower
# mkdir test-upper
# ls -l test-*
drwxr-xr-x  2 root      wheel    512 Nov 28 03:34 test-lower
drwxr-xr-x  2 root      wheel    512 Nov 28 03:34 test-upper
# echo "I am root" > test-lower/data
# chmod 664 test-lower/data
# mount -t unionfs test-upper test-lower
# df
Filesystem                                       1K-blocks    Used            Avail Capacity  Mounted on
...
<above>:/tmp/test-upper                            1014560  753136           220844    77%    /tmp/test-lower

$ cat test-lower/data
I am root
$ echo "But I'm not root" > test-lower/data
bash: test-lower/data: Permission denied

Look at that! But without unionfs I was able to do that! Let's unmount that unionfs

# umount test-lower

and try again:

$ echo "But I'm not root" > test-lower/data
$ cat test-lower/data
But I'm not root

So, as we can see, even your normal unionfs is not really transparent! The copy should have been made with the exact same modes and user:group.

The man page says something that is not actually happening:

             copymode = traditional | transparent | masquerade
                     Specifies the way to create a file or a directory in the
                     upper layer automatically when needed.  The traditional
                     mode uses the same way as the old unionfs for backward
                     compatibility, and transparent duplicates the file and
                     directory mode bits and the ownership in the lower layer
                     to the created file in the upper layer.  For behavior of
                     the masquerade mode, see MASQUERADE MODE below.

"transparent duplicates the file and directory mode bits and the ownership in the lower layer to the created file in the upper layer."

And that is totally not what's happening! Looks like a bug in unionfs to me now. But one that I'm determined to squash.
Comment 13 Gunther Schadow 2020-11-28 05:05:27 UTC
Further confirmation of the problem, which is now a bug of unionfs not following its own specification for copymode = transparent.

The files do not get copied with the original owner credentials. Proof:

# chmod g+w test-upper
# mount -t unionfs test-upper test-lower
# rm test-upper/data

$ cat test-lower/data
I'm root
$ echo "But I'm not root, and I mean it!" > test-lower/data
$ ls -l test-upper/
total 4
-rw-rw-r--  1 user wheel  33 Nov 28 04:20 data

so now the file is owned by user

$ rm test-upper/data
$ ls -l test-lower/
total 4
-rw-rw-r--  1 root  wheel  17 Nov 28 03:39 data

so clearly the user is not the right one. Why?

in unionfs_subr.c:
----------------------------------------------------------
void
unionfs_create_uppervattr_core(struct unionfs_mount *ump,
                               struct vattr *lva,
                               struct vattr *uva,
                               struct thread *td)
{
        VATTR_NULL(uva);
        uva->va_type = lva->va_type;
        uva->va_atime = lva->va_atime;
        uva->va_mtime = lva->va_mtime;
        uva->va_ctime = lva->va_ctime;

        switch (ump->um_copymode) {
        case UNIONFS_TRANSPARENT:
                uva->va_mode = lva->va_mode;
                uva->va_uid = lva->va_uid;
                uva->va_gid = lva->va_gid;
                break;
----------------------------------------------------------

it should have been done! What if this is an issue with my code changes? No, I verified it, it is happening on the original unionfs.ko just the same.

I will make another ticket from that, because it's not my fault. But it's a major road block to getting this to work.
Comment 14 Gunther Schadow 2020-11-28 05:46:15 UTC
Cross referencing the bug 251433 I just filed, since that is in fact a real bug.

The question is does it really matter to me? What is my workaround?

Clearly I need to be able to make the copy on the upper layer. And it is root that must make that copy. The effective user id to decide if the VOP_CREATE is allowed must be root, or whatever user did the mount_unionfs. Not the user accessing the file right now.

The owner uid and gid should be exactly that from the lower layer. Not the euid of the user accessing the file.

I need to understand how this VOP_CREATE ultimately works. 

Note that it seems to work right with the shadow directories. So that might be a clue.
Comment 15 Gunther Schadow 2020-11-29 23:16:49 UTC
See the related bug 251433 for the completion of this work as the unionfs_copyfile now works with root cred and the copymode=transparent will properly reflect the owner uid and gid.

There is one problem mentioned on that ticket that in theory provides write permissions as root the first time the copyfile from lower to upper layer is performed. But that is worked around / guarded by checking with VOP_ACCESS to make sure that the calling user has all requested permissions before we do the copyfile.
Comment 16 Gunther Schadow 2020-12-23 17:18:00 UTC
Bump. Ping! Any comments? Can this go into the next release?
Comment 17 Wayne Willcox 2021-03-09 23:32:23 UTC
Looks pretty interesting to me I was just looking for a way to cache files from nfs.
Comment 18 Kirk McKusick freebsd_committer 2021-05-18 20:46:33 UTC
Using the union filesystem to cache NFS is a bad idea. The whole point of NFS is to maintain cache coherency between the clients and the server. If a client writes a file, NFS will ensure that all the other NFS clients can see those changes. If a new copy of a binary is installed on the server, NFS will ensure that all the clients see the updated binary. The union filesystem provides no coherency at all.

NFS already does caching using the buffer cache. The buffer cache is a better way to cache binaries as it only holds the pages of the binary that you are actively using. By contrast unionfs is going to copy the entire binary image before it can start up. And it will hold all of the binary in memory even if only a small part of it is being used.

This technique may be usable in your environment, but it is absolutely not useful as a general-purpose tool.
Comment 19 Gunther Schadow 2021-05-19 02:27:24 UTC
I am certainly honored to be replied to by the one and only Kirk McKusick, and of course I understand the cache consistency issue. HOWEVER, I am sure you will also accept that the buffer cache is a very limited resource and generally too small to hold a lot of binaries just in case.

The truth is that /bin, /usr/bin, /lib and all that good stuff hardly never changes in a production system except for times of controlled upgrades. Therefore the argument that the UNIONFS cache would become stale is not really hard hitting.

I think my approach has a very good purpose when and where applied with an understanding of the consequences.

Finally, if you really wanted to do an unlimited cache that is kept consistent, then nothing should stop the addition of some additional update checks before every access of the local cache on the source NFS, and likewise, writes could be written through to the NFS backend. In other words, NFS could have a local unlimited size disk cache feature. But I doubt anyone would have fulfilled my wish, so I went ahead and provided small improvements to the UNIONFS code to help me with my imperfect solution that's nevertheless good enough for me.
Comment 20 Kirk McKusick freebsd_committer 2021-05-19 04:52:35 UTC
(In reply to Gunther Schadow from comment #19)
The buffer cache is no longer a limited resource as it has become merged with the VM cache. Actually, the buffer cache is just a compatibility shim over the VM cache. The point is that the entire physical memory is managed by the VM cache. Most of the pages are associated with pages in filesystems and some are associated with running programs (stacks and heap pages). So whether you use those pages as part of NFS or the memory filesystem aggregated through the union filesystem, they are all coming from the same pool. Given the broken semantics and extra overhead of the union filesystem, you are better off letting NFS rather than the union and memory filesystem manage the pages.

The role of the union filesystem is to allow the appearance of modification of read-only media like DVDs or CD-ROMs.
Comment 21 Rick Macklem freebsd_committer 2021-05-19 14:24:13 UTC
A lack of any cache consistency guarantees is a major
limitation and I tend to agree with Kirk that this
should not be recommended practice.
If it works for you, that's fine, but others would
not understand the limitations of not having cache
consistency and that would probably result in bug
reports related to it.

When I wrote the packrats code, cache consistency was
maintained by the NFSv4 delegation.
Although the code basically worked (I had not written
the recovery code needed after a client reboot), I
left the bits to rot because I did not think it was
widely useful. I may eventually revisit that, since
I can see cases where it could be useful. My intent was
to handle laptops mounting through wifi/wan connections
better, since those are often high latency connections.
I suspect that some of your performance improvement is
a result of the lack of cache consistency. For example,
flushing of all written data to the server upon close(2) can be a big
performance hit, but is required for close->open consistency.
--> I honestly do not know how the Linux fscache code
    works, but I suspect that some variant of cache
    consistency is maintained by it.

Another issue is that the comment in the bugs section
of the mount_unionfs man page still applies (and is in
caps). I don't know the details w.r.t. how unionfs is
broken, but if it were easy to fix, someone would have
done so. I do know that people report issues w.r.t. using
unionfs on an NFS mount and I simply point out to them
that it is broken.
It does not make much sense to implement a new feature on
top of a broken one, imho.

kib@ is probably the only person who knows what the breakage
in unionfs is and what needs to be done to fix it.

rick
ps: I don't think comments like "the one and only" are
    appreciated in the context you used it.
Comment 22 Kirk McKusick freebsd_committer 2021-06-02 00:15:18 UTC
I have closed this proposed extension because it provides incorrect consistency semantics as explained in comments 18, 20, and 21.