|Summary:||remove limitation in NFS directory exports|
|Product:||Base System||Reporter:||Josh Paetzel <jpaetzel>|
|Component:||kern||Assignee:||Andriy Gapon <avg>|
|Severity:||Affects Many People||CC:||rmacklem|
Description Josh Paetzel 2019-01-29 15:26:33 UTC
Currently mountd doesn't really know about directory-level exports. If you have a filesystem /storage containing directories A and B, and you try the following exports:
/storage/A -maproot=root:wheel 10.0.0.1
/storage/B -ro 10.0.0.1
mountd will explode because it thinks you are exporting the same filesystem to the same host with different options.
Comment 1 Andriy Gapon 2019-01-30 13:14:15 UTC
The error actually seems to come from vfs_hang_addrlist() in the kernel.
Comment 2 Rick Macklem 2019-01-30 23:21:50 UTC
That is correct behaviour. The example shows /storage being exported as read/write for 10.0.0.1 in the first line and read-only in the second line. Exports are per-filesystem in the kernel.
The exporting of different directories within a file system is referred to in "man exports" as administrative controls; these only define which directories an NFSv3 client can mount via the Mount protocol. For NFSv4, they are meaningless, since the Mount protocol is not used.
Personally, I would have preferred that these "administrative controls" did not exist, but I lost those arguments. (In the early days, it was because Sun supported them. More recently, it came up when a utility called nfse was being looked at as a replacement for mountd; it did not support these "administrative controls" and that was considered a POLA violation.)
You are welcome to try and make "man exports" explain this more clearly, since my attempts at it have never been successful.
rick
ps: Making the kernel understand directory exports is basically not feasible, since only mount structures remain in the kernel and no knowledge of directory subtrees exists in the kernel.
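To illustrate the per-filesystem point above, here is a toy model in Python. It is not mountd or kernel code: the function name `hang_exports`, the `mount_of` mapping, and the rule that identical options are harmless are all assumptions made for this sketch. It only shows why two export lines for different directories of one filesystem collide when they name the same host with different options.

```python
# Toy model only; NOT the real mountd or vfs_hang_addrlist() logic.
# All names here (hang_exports, mount_of, ExportConflict) are hypothetical.
from typing import Dict, FrozenSet, Iterable, List, Tuple

ExportLine = Tuple[str, Iterable[str], str]  # (directory, options, host)


class ExportConflict(Exception):
    """Raised when one filesystem gets two option sets for the same host."""


def hang_exports(
    lines: List[ExportLine], mount_of: Dict[str, str]
) -> Dict[Tuple[str, str], FrozenSet[str]]:
    # Exports are keyed by (filesystem, host): the directory is dropped,
    # because in this model only the mount point carries export state.
    table: Dict[Tuple[str, str], FrozenSet[str]] = {}
    for directory, options, host in lines:
        key = (mount_of[directory], host)
        opts = frozenset(options)
        # Assumption: repeating identical options is treated as harmless.
        if key in table and table[key] != opts:
            raise ExportConflict(
                f"{key[0]} already exported to {host} with different options"
            )
        table[key] = opts
    return table
```

With `mount_of = {"/storage/A": "/storage", "/storage/B": "/storage"}`, exporting A and B to two different hosts yields two table entries, while exporting both to 10.0.0.1 with different options raises `ExportConflict`, because the directory never takes part in the key.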
Comment 3 Josh Paetzel 2019-01-30 23:56:09 UTC
OK, that being said, I have another idea then. What if we pretended directories were filesystems? Since directory names have to be unique, we could use that to build an FSID hash and tell the NFS server that the directories are indeed individual filesystems. I believe this is the technique used by Linux's exportfs.
Comment 4 Rick Macklem 2019-01-31 01:53:01 UTC
Directory names don't need to be unique, so I am not sure what you are saying?
Now, I suppose that mountd could get the unique file handle for the directory and then an index of those could be maintained in the kernel for the exports to be referenced to. But then, how do files get associated with one of these? (Remember that every file gets a vnode and it references the mount point, which is where the exports currently live. You propose a separate "fake mountpoint" for each directory. What about subdirectories, or the following case of multiple hard links? Hard links are a nightmare for distributed file systems.)
For your example, suppose we create a file in /storage with two hard links for the file:
/storage/A/foo
/storage/B/bar
Both these paths represent the same file. For your example, whether it is exported to 10.0.0.1 as read/write or read-only is ambiguous.
There are userspace NFS servers out there that would assign two different file handles to the above file, one for each path it might be looked up via. The Linux and FreeBSD clients break for these servers, because they don't recognize the two files as the "same file" and maintain separate caches in the client. (Not a good plan and no way to make it work for at least NFSv3.)
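The hard-link case can be reproduced locally. The following minimal Python sketch uses a temporary directory standing in for /storage (an assumption for illustration; the helper name `same_file_via_two_paths` is also made up). It creates A/foo, hard-links it as B/bar, and compares the device and inode numbers that identify the underlying file.

```python
import os
import tempfile
from typing import Tuple


def same_file_via_two_paths() -> Tuple[bool, int]:
    """Create A/foo and a hard link B/bar; report whether both names
    resolve to the same (device, inode) pair, plus the link count."""
    with tempfile.TemporaryDirectory() as storage:  # stand-in for /storage
        dir_a = os.path.join(storage, "A")
        dir_b = os.path.join(storage, "B")
        os.mkdir(dir_a)
        os.mkdir(dir_b)

        foo = os.path.join(dir_a, "foo")
        bar = os.path.join(dir_b, "bar")
        with open(foo, "w") as f:
            f.write("data")
        os.link(foo, bar)  # second name for the same file, in another directory

        st_foo = os.stat(foo)
        st_bar = os.stat(bar)
        same = (st_foo.st_dev, st_foo.st_ino) == (st_bar.st_dev, st_bar.st_ino)
        return same, st_foo.st_nlink
```

Both names map to one file object, so a per-directory export rule has no single directory to attribute the file to: under A it would be read/write, under B read-only.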
Comment 5 Rick Macklem 2019-01-31 01:58:18 UTC
If you want to export a subdirectory to the same client with different options (rw vs read-only for example), then just make it a separate file system. (Not a fake file system for each directory. You can't have hard links across file systems, so the hard link problems don't exist.) I know of a site that has over 20,000 file systems on their server. (This is easy for ZFS. Unfortunately, updates to exports take a long time, since with entries per file system, 20,000 is a lot of changes to update. Although it would be nice to do, I haven't found an easy way to implement "change these exports and leave the rest unchanged". It is easy to have an "add these exports and leave the current ones unchanged", but that wasn't general enough for this site.)
Comment 6 Josh Paetzel 2019-01-31 02:18:27 UTC
In our situation we are limited to a single filesystem. Otherwise, yes, multiple filesystems would be the answer.
Comment 7 Rick Macklem 2019-01-31 02:55:36 UTC
Hmm. Don't you have multiple file systems that are faked as one file system? (Getting rather off topic for a FreeBSD bug, but...) Btw, I now realize you meant "names within a directory are unique", but that still leaves the situation of multiple hard links in different directories, which makes "exports per directory" impractical.
Comment 8 Andriy Gapon 2019-01-31 06:37:15 UTC
(In reply to Rick Macklem from comment #2)
From a usability point of view (from an end-user's point of view) it makes little sense that I can do
> /storage/A 10.0.0.1
> /storage/B -ro 10.0.0.2
but cannot do
> /storage/A 10.0.0.1
> /storage/B -ro 10.0.0.1
I get the impression that this is an implementation detail, not a deep design issue. But I haven't dug into the code sufficiently yet.
And a quick question: does the kernel really have to be involved with those "administrative controls" at all?
P.S. I realize that this request cannot be implemented for NFSv4 and all the controls are per filesystem only in that case.