Created attachment 158831 [details] patch to implement devfs_get_cdevpriv_from_file(9) In the NVIDIA GPU driver stack for FreeBSD, we have a variety of cases where one user-space driver component calls our kernel-space driver component to allocate an object (e.g., a block of video memory) and then needs to share the object with another user-space driver component. We would like to share these objects between user-space driver components by passing a file descriptor over a UNIX domain socket. E.g., Process A: - alloc video memory - fd = open("/dev/nvidiactl"); - call kernel driver to associate video memory with fd - pass fd to process B - close fd Process B: - receive fd from process A - call kernel driver to retrieve video memory assocated with fd - close fd But to do this, we would like to be able to map from a file descriptor to the per-open file descriptor data (registered with devfs_set_cdevpriv(9) and normally retrieved using devfs_get_cdevpriv(9)). If I understand correctly, this can be done with (pseudo-code): struct file *fp; struct cdev_privdata *p; struct driver_per_open *popen; fget(curthread, fd, &fp); p = fp->f_cdevpriv; popen = p->cdpd_data; /* use popen... */ fdrop(fp, curthread); However, it looks like struct cdev_privdata is not intended to be accessed by drivers. Would it be reasonable to extend the devfs_cdevpriv(9) family of functions with a function that can retrieve the driver specific data, given a struct file? That way, drivers could do: struct file *fp; struct driver_per_open *popen; fget(curthread, fd, &fp); devfs_get_cdevpriv_from_file(&popen, fp); /* use popen... */ fdrop(fp, curthread); The attached patch implements devfs_get_cdevpriv_from_file().
Actually, for the driver to use devfs_get_cdevpriv_from_file(), as in: struct file *fp; struct driver_per_open *popen; fget(curthread, fd, &fp); devfs_get_cdevpriv_from_file(&popen, fp); /* use popen... */ fdrop(fp, curthread); someone would need to validate that the 'popen' was allocated by the driver. Otherwise, an attacker could open some other device file and pass that fd into this code, tricking the driver into misinterpreting what file::f_cdevpriv::cdpd_data points to. I think this could be validated by comparing file::f_cdevpriv::cdpd_dtr to the function pointer (cdevpriv_dtr_t) that was passed into devfs_set_cdevpriv(). Would it make sense for devfs_get_cdevpriv_from_file() to take a cdevpriv_dtr_t as an argument? Or is there a better way to structure this?
(In reply to Andy Ritger from comment #0) I do not quite understand this. Lets clarify the notational system first. I propose to rename the file descriptor we talk about in the context of the process B as fdb. Also, we note that both fd (from the process A filedescriptors namespace) and fdb (from B) point to the same file F. In the process B, to retrieve the video memory associated with F, you perform some kernel call, most likely ioctl(fdb, OPcode, ...). Then, during the call to the ioctl handler for OPcode, you get the same pointer to the file private data F by devfs_get_cdevpriv(). It does not matter if it is in context of ioctl(fd, OP1code) in A or in context of ioctl(fdb, OPcode) in B. The underlying file is same, since it was passed by ancillary data, and cdevpriv is same.
(In reply to Konstantin Belousov from comment #2) Thanks for taking a look. Sorry that I didn't describe things well in my original post. The intent is for this usage pattern: process A: fda_dev = open("/dev/nvidia0"); // [...allocate resources...] // allocate file to pass resource to process B fda = open("/dev/nvidia0"); // associate resource with fda params.resource = handleForResource; params.fd = fda; ioctl(fda_dev, opcode, ¶ms, sizeof(params)); SendFdToProcessB(fda); close(fda); process B: fdb_dev = open("/dev/nvidia0"); // resource resource from process A fdb = RecvFdFromProcessA(); // look up resource assocated with fd params.fd = fdb; ioctl(fdb_dev, opcode, ¶ms, sizeof(params)); close(fdb); resource = params.handleForResource (i.e., the fd for the resource is different than the fd on which the ioctl is performed) It is important that fd[ab] are different than fd[ab]_dev: process A doesn't want to give process B access to _everything_: just a specific resource. Does that make sense? Thanks.
I should note that this sort of usage pattern is used by dma-buf on Linux, as I understand it, so this may be interesting for people bringing up modern DRM on FreeBSD.
Would you be open to creating new fd's to describe the resources themselves instead? That is, if you had a fd directly for the buffer you want to share, you could share the buffer by passing the fd for the buffer rather than having to proxy it via ioctls. It seems that you are already sort-of doing this, but you are overloading fda so you can use an ioctl to "bind" a buffer to it and then send it across. What if instead you had something like: process A: fda_dev = open("/dev/nvidiactl"); params.handleForResource = resource; ioctl(fda_dev, opcode, ¶ms); /* ioctl creates new fd in params.fd for the resource */ SendFdToProcessB(params.fd); close(params.fd); process B: fdb = RecvFdFromProcessA(); mmap(..., fdb); This is assuming you want to pass memory buffers around without copying. Note that if all you want is a new file descriptor type that can be mmap'd and references an existing VM object (such as the OBJT_SG objects the nvidia driver already creates internally) then such a file descriptor is probably fairly simple to write (especially in HEAD with the fo_mmap hook in fileops). shm_open() would serve as a good reference, though we could also help with implementing an API for you to just takes an existing VM object and allocates a new fd for the current process that only permits mmap but nothing else.
(In reply to John Baldwin from comment #5) Thanks for looking at this, John. If the ioctl called by Process A returned a new file descriptor, that would be workable. Though, the user-space code to use this would then need to be a bit different between FreeBSD and other UNIX platforms: at least on Linux, I'm not sure how straight-forward it would be to create a new file descriptor within the kernel code that processes the ioctl. We chose opening the file in user space, in order to create the fd, as the least complex way to create it. In process B, the resource that is being shared won't necessarily be mmaped into the process's CPU virtual address space: sharing a resource can mean other things, such as mapping into the GPU's (not CPU's virtual address space). Is there concern with looking up the cdevpriv from the struct file?
It's a bit hackish to abuse another file descriptor to pass an unrelated resource. As you noted, now you have to have extra checks in place to know if the cdevpriv you are looking at is safe for you to examine. It's not clear to me how you can really do that safely (perhaps you can check the cdevsw pointer of the fd but that's pretty gross). I think my suggestion is more in line with the existing UNIX way of doing things in that you use file descriptors as handles on resources (the same way you would use a handle on Windows for these sorts of things). You can create whatever kind of file descriptor you want and it will be self-contained as it will have its assigned file ops which will always be correct (and thus not be subject to the race you mentioned previously) when passed across a UNIX domain socket. Having a file descriptor for a VM object is a fairly simple thing to do (it is how shm_open() is implemented in FreeBSD). I (and possibly Konstantin) can help with creating helper routines if needed (e.g. something like create_memobject_file() that takes a VM object and some flags (e.g. if it's read-only or not, supports changing size or not) and creates a new file descriptor in the current process that can be returned via a member in a structure from an ioctl, etc.). It is of course possible to define other file descriptor types if you need something besides just memory.
(In reply to John Baldwin from comment #7) Thanks, John. Those are all fair points. Looking back at my notes, the reason we originally implemented our API as having user-space, rather than kernel-space, create the file descriptor was that doing it in user-space minimized how much in-kernel API churn we would be exposed to (i.e., concern over the API for allocating an fd within the kernel changing across kernel versions). To be fair, that's much more of a concern on Linux than on FreeBSD. In any case, having the ioctl return an fd for the resource should be workable. On the receiving side, I worry that we may still need the flexibility of an ioctl implementation within the kernel looking up information from a file descriptor. Are you very familiar with the Linux dma-buf interface? There is some documentation on it here: https://www.kernel.org/doc/Documentation/dma-buf-sharing.txt I expect those working on porting DRM from Linux to FreeBSD will eventually need to implement drm-buf on FreeBSD, and some of these same fd translation questions will come up, there.