Bug 201611 - [patch] Add devfs_get_cdevpriv_from_file(9)
Summary: [patch] Add devfs_get_cdevpriv_from_file(9)
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: CURRENT
Hardware: Any Any
: --- Affects Some People
Assignee: FreeBSD bugs mailing list
Keywords: patch
Depends on:
Reported: 2015-07-16 08:11 UTC by Andy Ritger
Modified: 2015-09-29 00:43 UTC (History)
4 users (show)

See Also:

patch to implement devfs_get_cdevpriv_from_file(9) (1.14 KB, patch)
2015-07-16 08:11 UTC, Andy Ritger
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Andy Ritger 2015-07-16 08:11:44 UTC
Created attachment 158831 [details]
patch to implement devfs_get_cdevpriv_from_file(9)

In the NVIDIA GPU driver stack for FreeBSD, we have a variety of cases
where one user-space driver component calls our kernel-space driver
component to allocate an object (e.g., a block of video memory) and then
needs to share the object with another user-space driver component.

We would like to share these objects between user-space driver components
by passing a file descriptor over a UNIX domain socket.  E.g.,

    Process A:
    - alloc video memory
    - fd = open("/dev/nvidiactl");
    - call kernel driver to associate video memory with fd
    - pass fd to process B
    - close fd

    Process B:
    - receive fd from process A
    - call kernel driver to retrieve video memory assocated with fd
    - close fd

But to do this, we would like to be able to map from a file descriptor to
the per-open file descriptor data (registered with devfs_set_cdevpriv(9)
and normally retrieved using devfs_get_cdevpriv(9)).

If I understand correctly, this can be done with (pseudo-code):

    struct file *fp;
    struct cdev_privdata *p;
    struct driver_per_open *popen;

    fget(curthread, fd, &fp);

    p = fp->f_cdevpriv;
    popen = p->cdpd_data;

    /* use popen... */

    fdrop(fp, curthread);

However, it looks like struct cdev_privdata is not intended to be accessed
by drivers.

Would it be reasonable to extend the devfs_cdevpriv(9) family of
functions with a function that can retrieve the driver specific data,
given a struct file?  That way, drivers could do:

    struct file *fp;
    struct driver_per_open *popen;

    fget(curthread, fd, &fp);

    devfs_get_cdevpriv_from_file(&popen, fp);

    /* use popen... */

    fdrop(fp, curthread);

The attached patch implements devfs_get_cdevpriv_from_file().
Comment 1 Andy Ritger 2015-07-23 23:24:42 UTC
Actually, for the driver to use devfs_get_cdevpriv_from_file(), as in:

    struct file *fp;
    struct driver_per_open *popen;

    fget(curthread, fd, &fp);

    devfs_get_cdevpriv_from_file(&popen, fp);

    /* use popen... */

    fdrop(fp, curthread);

someone would need to validate that the 'popen' was allocated by the driver.  Otherwise, an attacker could open some other device file and pass that fd into this code, tricking the driver into misinterpreting what file::f_cdevpriv::cdpd_data points to.

I think this could be validated by comparing file::f_cdevpriv::cdpd_dtr to the function pointer (cdevpriv_dtr_t) that was passed into devfs_set_cdevpriv().  Would it make sense for devfs_get_cdevpriv_from_file() to take a cdevpriv_dtr_t as an argument?  Or is there a better way to structure this?
Comment 2 Konstantin Belousov freebsd_committer 2015-09-18 19:48:45 UTC
(In reply to Andy Ritger from comment #0)
I do not quite understand this.

Lets clarify the notational system first.  I propose to rename the file descriptor we talk about in the context of the process B as fdb.  Also, we note that both fd (from the process A filedescriptors namespace) and fdb (from B) point to the same file F.

In the process B, to retrieve the video memory associated with F, you perform some kernel call, most likely ioctl(fdb, OPcode, ...).  Then, during the call to the ioctl handler for OPcode, you get the same pointer to the file private data F by devfs_get_cdevpriv().  It does not matter if it is in context of ioctl(fd, OP1code) in A or in context of ioctl(fdb, OPcode) in B.  The underlying file is same, since it was passed by ancillary data, and cdevpriv is same.
Comment 3 Andy Ritger 2015-09-18 20:15:54 UTC
(In reply to Konstantin Belousov from comment #2)

Thanks for taking a look.

Sorry that I didn't describe things well in my original post.  The intent
is for this usage pattern:

    process A:
        fda_dev = open("/dev/nvidia0");

        // [...allocate resources...]

        // allocate file to pass resource to process B
        fda = open("/dev/nvidia0");

        // associate resource with fda
        params.resource = handleForResource;
        params.fd = fda;
        ioctl(fda_dev, opcode, &params, sizeof(params));


    process B:
        fdb_dev = open("/dev/nvidia0");

        // resource resource from process A
        fdb = RecvFdFromProcessA();

        // look up resource assocated with fd
        params.fd = fdb;
        ioctl(fdb_dev, opcode, &params, sizeof(params));

        resource = params.handleForResource

(i.e., the fd for the resource is different than the fd on which the
ioctl is performed)

It is important that fd[ab] are different than fd[ab]_dev: process A
doesn't want to give process B access to _everything_: just a specific

Does that make sense?

Comment 4 Andy Ritger 2015-09-18 20:36:14 UTC
I should note that this sort of usage pattern is used by dma-buf on Linux, as I understand it, so this may be interesting for people bringing up modern DRM on FreeBSD.
Comment 5 John Baldwin freebsd_committer freebsd_triage 2015-09-18 23:44:37 UTC
Would you be open to creating new fd's to describe the resources themselves instead?  That is, if you had a fd directly for the buffer you want to share, you could share the buffer by passing the fd for the buffer rather than having to proxy it via ioctls.  It seems that you are already sort-of doing this, but you are overloading fda so you can use an ioctl to "bind" a buffer to it and then send it across.  What if instead you had something like:

process A:

   fda_dev = open("/dev/nvidiactl");

   params.handleForResource = resource;
   ioctl(fda_dev, opcode, &params);

   /* ioctl creates new fd in params.fd for the resource */


process B:

   fdb = RecvFdFromProcessA();
   mmap(..., fdb);

This is assuming you want to pass memory buffers around without copying.

Note that if all you want is a new file descriptor type that can be mmap'd and references an existing VM object (such as the OBJT_SG objects the nvidia driver already creates internally) then such a file descriptor is probably fairly simple to write (especially in HEAD with the fo_mmap hook in fileops).  shm_open() would serve as a good reference, though we could also help with implementing an API for you to just takes an existing VM object and allocates a new fd for the current process that only permits mmap but nothing else.
Comment 6 Andy Ritger 2015-09-22 20:18:06 UTC
(In reply to John Baldwin from comment #5)

Thanks for looking at this, John.

If the ioctl called by Process A returned a new file descriptor, that
would be workable.  Though, the user-space code to use this would then
need to be a bit different between FreeBSD and other UNIX platforms: at
least on Linux, I'm not sure how straight-forward it would be to create
a new file descriptor within the kernel code that processes the ioctl.
We chose opening the file in user space, in order to create the fd,
as the least complex way to create it.

In process B, the resource that is being shared won't necessarily be
mmaped into the process's CPU virtual address space: sharing a resource
can mean other things, such as mapping into the GPU's (not CPU's virtual
address space).

Is there concern with looking up the cdevpriv from the struct file?
Comment 7 John Baldwin freebsd_committer freebsd_triage 2015-09-22 20:43:03 UTC
It's a bit hackish to abuse another file descriptor to pass an unrelated resource.  As you noted, now you have to have extra checks in place to know if the cdevpriv you are looking at is safe for you to examine.  It's not clear to me how you can really do that safely (perhaps you can check the cdevsw pointer of the fd but that's pretty gross).

I think my suggestion is more in line with the existing UNIX way of doing things in that you use file descriptors as handles on resources (the same way you would use a handle on Windows for these sorts of things).  You can create whatever kind of file descriptor you want and it will be self-contained as it will have its assigned file ops which will always be correct (and thus not be subject to the race you mentioned previously) when passed across a UNIX domain socket.

Having a file descriptor for a VM object is a fairly simple thing to do (it is how shm_open() is implemented in FreeBSD).  I (and possibly Konstantin) can help with creating helper routines if needed (e.g. something like create_memobject_file() that takes a VM object and some flags (e.g. if it's read-only or not, supports changing size or not) and creates a new file descriptor in the current process that can be returned via a member in a structure from an ioctl, etc.).  It is of course possible to define other file descriptor types if you need something besides just memory.
Comment 8 Andy Ritger 2015-09-29 00:43:28 UTC
(In reply to John Baldwin from comment #7)

Thanks, John.  Those are all fair points.  Looking back at my notes, the reason we originally implemented our API as having user-space, rather than kernel-space, create the file descriptor was that doing it in user-space minimized how much in-kernel API churn we would be exposed to (i.e., concern over the API for allocating an fd within the kernel changing across kernel versions).  To be fair, that's much more of a concern on Linux than on FreeBSD.

In any case, having the ioctl return an fd for the resource should be workable.

On the receiving side, I worry that we may still need the flexibility of an ioctl implementation within the kernel looking up information from a file descriptor.

Are you very familiar with the Linux dma-buf interface?  There is some documentation on it here:
I expect those working on porting DRM from Linux to FreeBSD will eventually need to implement drm-buf on FreeBSD, and some of these same fd translation questions will come up, there.