Bug 224358 - x11/nvidia-driver: libnvidia-fatbinaryloader.so not installed
Summary: x11/nvidia-driver: libnvidia-fatbinaryloader.so not installed
Status: New
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: Alexey Dokuchaev
Depends on: 217901
Reported: 2017-12-15 04:45 UTC by Henry Hu
Modified: 2018-11-21 20:19 UTC
CC List: 2 users

See Also:
bugzilla: maintainer-feedback? (danfe)

patch to install libnvidia-fatbinaryloader (1.97 KB, patch)
2017-12-15 04:45 UTC, Henry Hu
no flags
ktrace log (76.29 KB, text/plain)
2018-11-12 03:17 UTC, Henry Hu
no flags
patch (37.12 KB, patch)
2018-11-12 17:20 UTC, Tijl Coosemans
no flags

Description Henry Hu 2017-12-15 04:45:32 UTC
Created attachment 188856
patch to install libnvidia-fatbinaryloader

Recent versions of nvidia-driver install a 32-bit libcuda.so that can be used for CUDA.
However, this library depends on libnvidia-fatbinaryloader.so, which is also shipped but not installed.
A simple patch is attached to resolve this issue.
Comment 1 Tijl Coosemans freebsd_committer 2018-11-09 21:47:44 UTC
Can you try the patch in bug 217901?  It reworks the installation of the Linux libraries.
Comment 2 Henry Hu 2018-11-10 21:07:29 UTC
(In reply to Tijl Coosemans from comment #1)

I've tried your patch, and Google Earth works fine.
CUDA test program deviceQueryDrv can find libcuda.so and libnvidia-fatbinaryloader.so, so your patch fixes this PR.
On the other hand, it still reports that cuInit() returns 999, so there are other problems to solve before we can use CUDA.
Comment 3 Alex S 2018-11-10 21:47:28 UTC
CUDA is unlikely to work without the nvidia-uvm kernel module from the Linux driver package.
Comment 4 Tijl Coosemans freebsd_committer 2018-11-11 11:38:09 UTC
(In reply to Henry Hu from comment #2)
Can you run the CUDA test program using ktrace?  That should give us some more information about what it tries to do.

ktrace -i -f /where/you/want/ktrace.out testprogram
kdump -H -f /where/you/want/ktrace.out > /where/you/want/ktrace.txt

Then attach ktrace.txt to this bug (compressed with bzip2 or something if it's too big).
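Once ktrace.txt exists, the NVIDIA device nodes the program tried to open can be pulled out of it. A minimal POSIX sh sketch, assuming kdump prints each path lookup as a NAMI record followed by the quoted path; the function name nvidia_paths is hypothetical and not part of the port:

```shell
# Hypothetical helper: print each /dev/nvidia* path that appears in a
# NAMI (path lookup) record of kdump text output, once per path.
nvidia_paths() {
    awk '{
        # Find the NAMI record type; the quoted path is the next field.
        for (i = 1; i < NF; i++) {
            if ($i == "NAMI") {
                p = $(i + 1)
                gsub(/"/, "", p)              # strip the surrounding quotes
                if (p ~ /^\/dev\/nvidia/) print p
            }
        }
    }' "$1" | LC_ALL=C sort -u                # byte order, duplicates removed
}
```

For example, nvidia_paths /where/you/want/ktrace.txt should list nodes such as /dev/nvidiactl or /dev/nvidia-uvm, if the program looked them up.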
Comment 5 Henry Hu 2018-11-12 03:17:06 UTC
Created attachment 199156
ktrace log

It seems to be accessing /dev/nvidia-uvm.
Comment 6 Tijl Coosemans freebsd_committer 2018-11-12 10:23:19 UTC
(In reply to Henry Hu from comment #5)
Right, nvidia-uvm seems to be open source, so it should be possible to port it (perhaps using linuxkpi in base and linuxkpi_gplv2 in graphics/drm-devel-kmod), but I don't have time for that right now.
Comment 7 Tijl Coosemans freebsd_committer 2018-11-12 17:20:04 UTC
Created attachment 199178

I noticed that nvidia-uvm also has an unsupported mode, which is trivial to port, so here's a new version of the patch for x11/nvidia-driver.  It should now install a dummy nvidia-uvm kernel module that you can load with kldload nvidia-uvm.  You will probably also need to adjust the permissions on /dev/nvidia-uvm.
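Loading the module and opening up the device node can be wrapped in a small sh sketch, run as root. The helper name nvidia_uvm_setup and the default mode 0666 are assumptions, not something the patch prescribes; a stricter mode combined with a dedicated group may be preferable. A chmod on a devfs node lasts only until the next reboot; devfs.rules(5) is the persistent route.

```shell
# Hypothetical helper: load the dummy nvidia-uvm module if it is not
# loaded yet, then set the mode of its device node. Run as root.
nvidia_uvm_setup() {
    uvm_mode=${1:-0666}                  # assumed default; tighten as needed
    uvm_node=${2:-/dev/nvidia-uvm}
    if ! kldstat -q -n nvidia-uvm; then  # -q: exit status only, no output
        kldload nvidia-uvm || return 1
    fi
    if [ ! -c "$uvm_node" ]; then        # devfs nodes are character devices
        echo "nvidia_uvm_setup: $uvm_node not found" >&2
        return 1
    fi
    chmod "$uvm_mode" "$uvm_node"        # not persistent across reboots
}
```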

Please give it a try.  If it doesn't work then create another ktrace.
Comment 8 Alex S 2018-11-12 18:11:35 UTC
I tried unsupported mode before (https://github.com/shkhln/nvshim/blob/master/src/libc/sys/ioctl.c#L15) and I think it is, well, actually unsupported. Too lazy to set up a proper Linux system for testing.
Comment 9 Alex S 2018-11-12 22:43:21 UTC
> I tried unsupported mode before (https://github.com/shkhln/nvshim/blob/master/src/libc/sys/ioctl.c#L15) and I think it is, well, actually unsupported.

Ok, turns out I'm just dumb. Disregard that.
Comment 10 Tijl Coosemans freebsd_committer 2018-11-13 15:03:33 UTC
Comment on attachment 199178

I've uploaded a new patch to bug 217901 addressing issues with the ioctl handler.
Comment 11 Tijl Coosemans freebsd_committer 2018-11-13 15:37:06 UTC
The port of nvidia-uvm is incomplete.  It doesn't handle ioctl calls from Linux programs yet.
Comment 12 Tijl Coosemans freebsd_committer 2018-11-19 10:51:09 UTC
Patch4 in bug 217901 contains an updated nvidia-uvm module (still unsupported mode).  Please give it a try and provide another ktrace if it doesn't work.
Comment 13 Alex S 2018-11-20 06:37:16 UTC
There are quite a few stubs in nvidia.ko, and some of them might be required for CUDA. For example, running matrixMul from the CUDA SDK and glxgears with this DTrace script:

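#!/usr/sbin/dtrace -s

/* matrixMul records 1 and every other process records 0, so min()
   leaves 1 only for functions that matrixMul alone calls. */
nvidia:*:entry, nvidia-modeset:*:entry /execname == "matrixMul"/ {
  @counts[probefunc] = min(1);
}

nvidia:*:entry, nvidia-modeset:*:entry /execname != "matrixMul"/ {
  @counts[probefunc] = min(0);
}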

Gives me:

os_lock_user_pages 1
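The min() aggregation in the script above acts as a set difference: a function ends at 1 only if no process other than matrixMul entered it. The same logic as an offline filter, sketched in sh/awk; only_called_by is a hypothetical helper that reads one "execname function" pair per line on stdin:

```shell
# Hypothetical helper mirroring the D script's min() trick offline.
# Input (stdin): one "execname function" pair per line.
# Output: functions that appeared only with the given execname.
only_called_by() {
    awk -v target="$1" '
        {
            v = ($1 == target) ? 1 : 0     # 1 for the target, 0 otherwise
            if (!($2 in m) || v < m[$2])   # keep the per-function minimum
                m[$2] = v
        }
        END {
            for (f in m)
                if (m[f] == 1) print f     # never seen outside the target
        }
    ' | LC_ALL=C sort
}
```

The pairs could come, for instance, from dtrace -q -n 'fbt:nvidia::entry { printf("%s %s\n", execname, probefunc); }' running while both programs execute; that probe description is an assumption, and its output can be large.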
Comment 14 Tijl Coosemans freebsd_committer 2018-11-21 13:54:12 UTC
(In reply to Alex S from comment #13)
Does the CUDA program get further now that /dev/nvidia-uvm exists?  Or do you see this without /dev/nvidia-uvm as well?
Comment 15 Alex S 2018-11-21 17:25:38 UTC
(In reply to Tijl Coosemans from comment #14)

Are you able to test 390.87 yourself?

> Or do you see this without /dev/nvidia-uvm as well?

Without /dev/nvidia-uvm I see CUDA trying to pass an error code (-2) into an ioctl call: https://forums.freebsd.org/threads/linux-binary-compatibility-nvidia-drivers-and-cuda-for-blender.65065/#post-382015.

> Does the CUDA program get further now that /dev/nvidia-uvm exists?

Please note that I'm on 11.2-RELEASE and I'm not currently able to test your patches. Other than that, with UVM in "unsupported mode" the matrixMul sample prints a vague "all CUDA-capable devices are busy or unavailable" error message. Replacing "return NV_ERR_NOT_SUPPORTED" with "return NV_OK" (without a proper implementation) in os_lock_user_pages and os_unlock_user_pages seems to trick it into actually reading some (garbage) data. DTrace reports that nv_register_user_pages and nv_unregister_user_pages are being called.
Comment 16 Alex S 2018-11-21 17:40:00 UTC
(In reply to Alex S from comment #15)

> that that
* than that
Comment 17 Tijl Coosemans freebsd_committer 2018-11-21 20:19:32 UTC
(In reply to Alex S from comment #15)
I cannot test 390, only 304.  And I'm only interested in enabling linux64 in the nvidia-driver to make linux-c7 the default.  The nvidia-driver is the last blocker for that.  If I can get CUDA working at the same time that would be a nice bonus, but it's not a priority for me.  I can take a look at os_lock_user_pages and friends, but no promises.

If you modified your 11.2 kernel as in bug 206711, you should be able to test my patch for x11/nvidia-driver.