Bug 235299

Summary: remove limitation in NFS directory exports
Product: Base System
Reporter: Josh Paetzel <jpaetzel>
Component: kern
Assignee: Andriy Gapon <avg>
Status: New
Severity: Affects Many People
CC: rmacklem
Priority: ---
Version: 12.0-RELEASE
Hardware: Any
OS: Any

Description Josh Paetzel 2019-01-29 15:26:33 UTC
Currently mountd doesn't really know about directory-level exports.

If you have a filesystem /storage containing two directories, A and B, and you try the following exports:

/storage/A -maproot=root:wheel 10.0.0.1
/storage/B -ro 10.0.0.1

mountd will explode because it thinks you are exporting the same filesystem to the same host with different options.
Comment 1 Andriy Gapon 2019-01-30 13:14:15 UTC
The error actually seems to come from vfs_hang_addrlist() in the kernel.
Comment 2 Rick Macklem 2019-01-30 23:21:50 UTC
That is correct behaviour. The example shows /storage being exported
as read/write for 10.0.0.1 in the first line and read-only in the second
line.

Exports are per-filesystem in the kernel.

The exporting of different directories within a file system is
referred to in "man exports" as administrative controls; these only
define which directories an NFSv3 client can mount via the Mount
protocol. For NFSv4, they are meaningless, since the Mount protocol
is not used.

Personally, I would have preferred that these "administrative controls"
did not exist, but I lost those arguments. (In the early days, it was
because Sun supported them. More recently, it came up when a utility
called nfse was being looked at as a replacement for mountd and it did
not support these "administrative controls" and that was considered
a POLA violation.)

You are welcome to try to make "man exports" explain this more clearly,
since my attempts at it have never been successful.

rick
ps: Making the kernel understand directory exports is basically not
    feasible, since only mount structures remain in the kernel and no
    knowledge of directory subtrees exists in the kernel.
Comment 3 Josh Paetzel 2019-01-30 23:56:09 UTC
OK, that being said, I have another idea.

What if we pretended directories were filesystems?  Since directory names have to be unique, we could use that to build an FSID hash and tell the NFS server that the directories are indeed individual filesystems.

I believe this is the technique used by Linux's exportfs.
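A minimal sketch of the idea (purely hypothetical; this is not how mountd or the kernel works today): derive a stable pseudo-FSID by hashing the exported directory's path, so that each exported directory can be presented to the NFS server as if it were a distinct filesystem. The function name and the choice of SHA-256 are assumptions for illustration only.

```python
import hashlib
import struct

def pseudo_fsid(path: str) -> tuple:
    """Derive a stable pair of 32-bit words (the shape of a kernel
    fsid) from an exported directory path.  Purely illustrative:
    a real fsid identifies a mounted filesystem, not a directory."""
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    return struct.unpack("<II", digest[:8])

# Distinct directory paths yield distinct, reproducible pseudo-FSIDs.
fsid_a = pseudo_fsid("/storage/A")
fsid_b = pseudo_fsid("/storage/B")
assert fsid_a != fsid_b
assert fsid_a == pseudo_fsid("/storage/A")
```

As the following comments point out, the hard part is not generating an identifier but associating every file (and hard link) with one of these fake filesystems.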
Comment 4 Rick Macklem 2019-01-31 01:53:01 UTC
Directory names don't need to be unique, so I am not sure what you are saying?
Now, I suppose that mountd could get the unique file handle for the
directory, and an index of those could be maintained in the kernel for
the exports to reference.
But then, how do files get associated with one of these?
(Remember that every file gets a vnode and it references the mount point,
 which is where the exports currently live. You propose a separate
 "fake mountpoint" for each directory. What about subdirectories, or the
 following case of multiple hard links? Hard links are a nightmare for
 distributed file systems.)

For your example, suppose we create a file in /storage with two hard links
for the file:
/storage/A/foo
/storage/B/bar

Both of these paths represent the same file. For your example, whether it is
exported to 10.0.0.1 as read/write or read-only is ambiguous.
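
The ambiguity is easy to see from the inode's point of view; here is a small sketch (using a temporary directory rather than /storage, so it can run anywhere):

```python
import os
import tempfile

# Recreate the /storage/A/foo and /storage/B/bar situation in a temp dir.
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "A"))
os.mkdir(os.path.join(root, "B"))

foo = os.path.join(root, "A", "foo")
bar = os.path.join(root, "B", "bar")
with open(foo, "w") as f:
    f.write("data")
os.link(foo, bar)          # second hard link to the same file

# Both paths resolve to the same inode.  The file itself carries no
# record of which directory it was reached through, so per-directory
# export options (rw via A, ro via B) cannot be enforced per file.
assert os.stat(foo).st_ino == os.stat(bar).st_ino
assert os.stat(foo).st_nlink == 2
```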

There are userspace NFS servers out there that would assign two different
file handles to the above file, one for each path it might be looked up via.
The Linux and FreeBSD clients break for these servers, because they don't
recognize the two files as the "same file" and maintain separate caches in
the client. (Not a good plan and no way to make it work for at least NFSv3.)
Comment 5 Rick Macklem 2019-01-31 01:58:18 UTC
If you want to export a subdirectory to the same client with different
options (rw vs read-only for example), then just make it a separate file
system. (Not a fake file system for each directory. You can't have hard links
across file systems, so the hard links problems don't exist.)
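
For example, with ZFS this is a one-line operation per directory (a sketch, assuming a pool named storage and the options from the original example):

```shell
# Make A and B real file systems instead of plain directories:
zfs create storage/A
zfs create storage/B

# Each can then get its own options for the same client in /etc/exports:
#   /storage/A -maproot=root:wheel 10.0.0.1
#   /storage/B -ro 10.0.0.1
```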

I know of a site that has over 20,000 file systems on their server.
(This is easy for ZFS. Unfortunately updates to exports take a long time,
 since with entries per file system, 20,000 is a lot of changes to update.
 Although it would be nice to do, I haven't found an easy way to implement
 "change these exports and leave the rest unchanged".
 It is easy to have an "add these exports and leave the current ones unchanged",
 but that wasn't general enough for this site.)
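
In userland terms, the missing "change these exports and leave the rest unchanged" operation amounts to a set difference over export entries; a hypothetical sketch (not mountd code, and it glosses over the real difficulties of canonicalizing entries and of the kernel interface only supporting add/flush):

```python
def export_delta(old_lines, new_lines):
    """Given the old and new exports file contents as lists of entry
    strings, return (entries_to_remove, entries_to_add), leaving
    unchanged entries alone.  Illustrative only."""
    old, new = set(old_lines), set(new_lines)
    return sorted(old - new), sorted(new - old)

old = ["/storage/A -maproot=root:wheel 10.0.0.1", "/storage/B -ro 10.0.0.1"]
new = ["/storage/A -maproot=root:wheel 10.0.0.1", "/storage/B -ro 10.0.0.2"]
removed, added = export_delta(old, new)
# removed == ["/storage/B -ro 10.0.0.1"]
# added   == ["/storage/B -ro 10.0.0.2"]
```

With 20,000 filesystems, only touching the changed entries is what would make updates fast; the sketch shows the easy half of the problem.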
Comment 6 Josh Paetzel 2019-01-31 02:18:27 UTC
In the situation we are in, we are limited to a single filesystem. Otherwise, yes, multiple filesystems would be the answer.
Comment 7 Rick Macklem 2019-01-31 02:55:36 UTC
Hmm. Don't you have multiple file systems that are faked as one file system?
(Getting rather off topic for a FreeBSD bug, but...)
Btw, I now realize you meant "names within a directory are unique", but
that still leaves the situation of multiple hard links in different directories,
which makes "exports per directory" impractical.
Comment 8 Andriy Gapon 2019-01-31 06:37:15 UTC
(In reply to Rick Macklem from comment #2)

From a usability point of view (an end-user's point of view), it makes little sense that I can do
> /storage/A 10.0.0.1
> /storage/B -ro 10.0.0.2
but cannot do
> /storage/A 10.0.0.1
> /storage/B -ro 10.0.0.1

I get the impression that this is an implementation detail, not a deep design issue. But I haven't dug into the code sufficiently yet.

And a quick question, does the kernel really have to be involved with those "administrative controls" at all?

P.S.
I realize that this request cannot be implemented for NFSv4 and all the controls are per filesystem only in that case.
Comment 9 Josh Paetzel 2019-01-31 17:29:08 UTC
(In reply to Rick Macklem from comment #7)

Sort of.  Our CloudFS is multiple filesystems stitched together; however, from an individual NFS server's perspective it is one filesystem.