Bug 197695 - Fix broken KERN_PROC_FILEDESC sysctl
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: Any Any
Importance: --- Affects Many People
Assignee: freebsd-bugs (Nobody)
Depends on:
Reported: 2015-02-16 00:50 UTC by Niall Douglas
Modified: 2017-02-09 14:54 UTC

test case (1.95 KB, text/x-csrc)
2015-02-19 04:57 UTC, Niall Douglas
test case output (2.11 KB, text/plain)
2015-02-19 04:58 UTC, Niall Douglas

Description Niall Douglas 2015-02-16 00:50:54 UTC
I recently discovered that FreeBSD is the only major operating system that does not provide a way to retrieve the current canonical path of an open file descriptor. I think this should be fixed, because without this facility, writing code that unlinks file entries race-free is extremely tough: you cannot avoid accidentally deleting the wrong file if another process changes the filing system underneath you, and you cannot use the trick of openat() + fstatat() + unlinkat() on the current path of a file descriptor to ensure you are deleting the correct file. (Of course, it would be super great if POSIX allowed one to delete and rename files via an open file descriptor, as Windows does; writing race-free filing system code would then be much easier. But I digress.)
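
For illustration, the openat() + fstatat() + unlinkat() trick could look roughly like the sketch below. The helper name unlink_if_same_file and the choice of ESTALE for a mismatch are my own assumptions, not an existing API, and the path argument is assumed to come from whatever per-descriptor path query the platform offers.

```cpp
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>
#include <string>

// Hypothetical helper: remove the directory entry at `path` only if it still
// names the same file (device + inode) as the open descriptor `fd`.
// Returns 0 on success, -1 with errno set (ESTALE when the entry differs).
int unlink_if_same_file(int fd, const std::string &path)
{
    struct stat want;
    if (fstat(fd, &want) == -1)
        return -1;

    // Split the path into its parent directory and leaf name.
    std::string::size_type slash = path.rfind('/');
    std::string dir = (slash == std::string::npos) ? "."
                    : (slash == 0)                 ? "/"
                                                   : path.substr(0, slash);
    std::string leaf = (slash == std::string::npos) ? path : path.substr(slash + 1);
    if (leaf.empty()) {
        errno = EINVAL;
        return -1;
    }

    // Pin the parent directory so the check and the unlink both resolve the
    // leaf relative to the same directory, even if its ancestors are renamed.
    int dirfd = openat(AT_FDCWD, dir.c_str(), O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    if (dirfd == -1)
        return -1;

    struct stat have;
    int rc = fstatat(dirfd, leaf.c_str(), &have, AT_SYMLINK_NOFOLLOW);
    if (rc == 0 && (have.st_dev != want.st_dev || have.st_ino != want.st_ino)) {
        errno = ESTALE;  // the entry now refers to some other file
        rc = -1;
    }
    if (rc == 0)
        rc = unlinkat(dirfd, leaf.c_str(), 0);

    int saved = errno;  // keep the interesting errno across close()
    close(dirfd);
    errno = saved;
    return rc;
}
```

Note that a small window remains between fstatat() and unlinkat(), so this narrows the race rather than eliminating it; it is only usable at all if the current path of the descriptor can be obtained in the first place.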

How other OSs implement path reading:

* Windows: NtQueryObject(hFile, ObjectNameInformation, nameFull.Buffer, sizeof(nameFull.Buffer), &returnedLength). This returns the NT kernel path for an open file handle. With a bit of work, this can be converted into a DOS style path. On Windows, the NT kernel path used to open a handle is retained per handle, so hard links for the same file don't confound. Also, NT usefully supplies a boolean which indicates if the file is deleted or not.

* Linux: readlink("/proc/self/fd/NNN", buffer, sizeof(buffer)) returning the length of the buffer filled. Linux usefully prepends (older kernels) or appends (newer kernels) the string "(deleted)" if the file is deleted. Unfortunately hard links can confound on Linux, so the path returned may be very different to the one you opened. You basically get back _some_ path referring to that inode, whichever the kernel found first in its caches.

* Mac OS X: fcntl(fd, F_GETPATH, buffer). The size of the buffer is not passed in, which I think was a real oversight, and the call does not report how much of the buffer was filled either. Also, if the file is deleted you just get back the last known good path, and I don't know if this API is confounded by hard links.
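
As a point of comparison, the Linux approach from the list above can be wrapped as in this minimal sketch. The helper name fd_path_linux and the grow-and-retry loop are illustrative assumptions; only readlink() on /proc/self/fd/NNN comes from the description above.

```cpp
#include <unistd.h>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: best-effort path of an open descriptor on Linux,
// read from the /proc/self/fd/NNN symlink. Returns "" on failure
// (including systems where /proc is not available).
std::string fd_path_linux(int fd)
{
    const std::string link = "/proc/self/fd/" + std::to_string(fd);
    std::vector<char> buf(256);
    for (;;) {
        ssize_t n = readlink(link.c_str(), buf.data(), buf.size());
        if (n == -1)
            return std::string();
        // readlink() does not null-terminate and truncates silently, so a
        // result that fills the buffer is retried with a larger one.
        if (static_cast<size_t>(n) < buf.size())
            return std::string(buf.data(), static_cast<size_t>(n));
        buf.resize(buf.size() * 2);
    }
}
```

The returned string is whatever the kernel reports, so callers would still need to look for the "(deleted)" marker themselves and cannot assume the path is the one originally opened.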

How FreeBSD might implement this:

1. /proc is deprecated on FreeBSD, so Linux's approach is out. I dislike the OS X API as OS X did it, but if one added a size_t* bufsize and let one query the buffer size by passing a null buffer it would look a lot better. Some method of indicating if the file is deleted (e.g. a null string return) even better again.

2. As an alternative to implementing our own F_GETPATH which doesn't match OS X's API, the following code could work:

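        // Sketch: fd is the descriptor being queried; _path and pathlock are members.
        size_t len;
        int mib[4]={CTL_KERN, KERN_PROC, KERN_PROC_FILEDESC, getpid()};
        BOOST_AFIO_ERRHOS(sysctl(mib, 4, NULL, &len, NULL, 0));
        std::vector<char> buffer(len*2);  // headroom in case the table grows
        len=buffer.size();                // tell the kernel the real buffer size
        BOOST_AFIO_ERRHOS(sysctl(mib, 4, buffer.data(), &len, NULL, 0));
        for(char *p=buffer.data(); p<buffer.data()+len;)
        {
          struct kinfo_file *kif=(struct kinfo_file *) p;
          if(kif->kf_fd==fd)
          {
            lock_guard<pathlock_t> g(pathlock);
            _path=kif->kf_path;
            return _path;
          }
          p+=kif->kf_structsize;  // records are variable length
        }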

Right now FreeBSD returns path information per fd via KERN_PROC_FILEDESC for just about every type of file descriptor *except* regular files. If you use procstat you'll see this for yourself - regular files always get a null path.

Ideally the kernel would track the path used to open each fd as it changed over time, which would prevent hard link confounding. However, I can see that this would require filing system support. One alternative could be to return a null terminated sequence of path fragments within the mounted filing system, one per hard link, but again I can see filing system support might be needed.

Looking at kernel source code, ZFS provides a ZFS_IOC_OBJ_TO_PATH ioctl which will return you a path from a supplied ZFS object, however I note that it returns exactly one path, so I assume that ZFS objects are one per hard link. UFS appears to provide a SAVENAME facility in ufs_lookup_ino(), so in theory there it's easy. I didn't look into the other filing systems, but it doesn't look like implementing this would be hard for someone familiar with the FreeBSD kernel.

And procstat and lsof would now return more useful information, also a win.

Comment 1 Niall Douglas 2015-02-19 04:57:48 UTC
Created attachment 153164 [details]
test case
Comment 2 Niall Douglas 2015-02-19 04:58:23 UTC
Created attachment 153165 [details]
test case output
Comment 3 Niall Douglas 2015-02-19 04:59:40 UTC
It turns out that I
Comment 4 Niall Douglas 2015-02-19 05:04:11 UTC
It turns out that I was wrong about there being no method of retrieving a canonical path for an open file descriptor on FreeBSD. There in fact is, and it is called KERN_PROC_FILEDESC.

Unfortunately, KERN_PROC_FILEDESC is broken for regular files. For directories it works perfectly, even tracking renames and deletions with ease. The attached test case and its output show the broken behaviour:

1. Files opened with O_CREAT never get a path. Creating the file, closing it and opening it again gets the file descriptor its path.

2. As soon as the process does its first rename(), the paths of all regular file descriptors are reset to null; directory file descriptors are unaffected.

3. Directory file descriptors work correctly.

This was tested on FreeBSD 10.1 with ZFS on root. Other filing systems may vary.
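
The observations above can be checked with a loop along the following lines. This is a minimal sketch and not the attached test case: the helper name fd_paths is hypothetical, and on anything other than FreeBSD it simply returns an empty map.

```cpp
#include <sys/types.h>
#include <unistd.h>
#include <cassert>
#include <map>
#include <string>
#include <vector>
#if defined(__FreeBSD__)
#include <sys/sysctl.h>
#include <sys/user.h>
#endif

// Hypothetical helper: map each open descriptor of this process to the path
// reported by KERN_PROC_FILEDESC (empty string when the kernel has no path).
std::map<int, std::string> fd_paths()
{
    std::map<int, std::string> out;
#if defined(__FreeBSD__)
    int mib[4] = {CTL_KERN, KERN_PROC, KERN_PROC_FILEDESC,
                  static_cast<int>(getpid())};
    size_t len = 0;
    if (sysctl(mib, 4, nullptr, &len, nullptr, 0) == -1)
        return out;
    std::vector<char> buf(len * 2);  // headroom in case the table grows
    len = buf.size();
    if (sysctl(mib, 4, buf.data(), &len, nullptr, 0) == -1)
        return out;
    for (char *p = buf.data(); p < buf.data() + len;) {
        struct kinfo_file *kif = reinterpret_cast<struct kinfo_file *>(p);
        if (kif->kf_structsize <= 0)
            break;  // malformed record; stop rather than loop forever
        if (kif->kf_fd >= 0)  // negative values mark cwd, root, text and so on
            out[kif->kf_fd] = kif->kf_path;
        p += kif->kf_structsize;  // records are variable length
    }
#endif
    return out;
}
```

Calling fd_paths() after open() with O_CREAT, after a close and reopen, and after a rename(), then printing the entry for each descriptor, would show which descriptors have lost their path.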

Comment 5 Jilles Tjoelker freebsd_committer 2015-03-08 14:29:30 UTC
As it is now, KERN_PROC_FILEDESC is designed for debugging only. The namecache is best-effort. See for example SVN r275897, which states that newly created files are deliberately not entered into the namecache (as part of a change to enter core dump filenames into the namecache).

A second problem is that KERN_PROC_FILEDESC processes all file descriptors at once and may therefore be slow.

I think directory file descriptors always know their names because directories have a single "..", all the way up to the root.
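
To illustrate why a single ".." is enough to name a directory, here is a userspace sketch that rebuilds a directory's path by walking upwards. It is an analogy only and not the kernel's implementation; the helper name dir_path_via_dotdot is hypothetical.

```cpp
#include <dirent.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper: absolute path of an open directory, found by opening
// ".." repeatedly and searching each parent for the entry that matches the
// child's device and inode. Returns "" on failure.
std::string dir_path_via_dotdot(int dirfd)
{
    std::string path;
    int cur = dup(dirfd);  // leave the caller's descriptor untouched
    if (cur == -1)
        return std::string();
    for (;;) {
        struct stat me, up;
        int parent = openat(cur, "..", O_RDONLY | O_DIRECTORY | O_CLOEXEC);
        if (parent == -1 || fstat(cur, &me) == -1 || fstat(parent, &up) == -1) {
            if (parent != -1)
                close(parent);
            close(cur);
            return std::string();
        }
        if (me.st_dev == up.st_dev && me.st_ino == up.st_ino) {
            // At the root, ".." refers to the directory itself.
            close(parent);
            close(cur);
            return path.empty() ? std::string("/") : path;
        }
        std::string name;
        int scanfd = dup(parent);  // fdopendir() takes ownership of scanfd
        DIR *d = (scanfd == -1) ? nullptr : fdopendir(scanfd);
        if (d != nullptr) {
            while (struct dirent *e = readdir(d)) {
                if (!strcmp(e->d_name, ".") || !strcmp(e->d_name, ".."))
                    continue;
                struct stat st;
                if (fstatat(parent, e->d_name, &st, AT_SYMLINK_NOFOLLOW) == 0 &&
                    st.st_dev == me.st_dev && st.st_ino == me.st_ino) {
                    name = e->d_name;
                    break;
                }
            }
            closedir(d);
        } else if (scanfd != -1) {
            close(scanfd);
        }
        close(cur);
        cur = parent;  // continue one level up
        if (name.empty()) {  // child not found, e.g. it was removed
            close(cur);
            return std::string();
        }
        path = "/" + name + path;
    }
}
```

A regular file has no ".." entry and may have several hard links in different directories, so the same walk has no unique starting point for it.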
Comment 6 Niall Douglas 2015-03-13 20:55:14 UTC
(In reply to Jilles Tjoelker from comment #5)

I appreciate the explanation.

What I'll do is open a new feature request, this time to add an F_GETPATH implementation. Thanks.

Comment 7 Niall Douglas 2015-03-13 21:00:17 UTC
Superseded by #198570, so closing.