224358 – x11/nvidia-driver: libnvidia-fatbinaryloader.so not installed


Status:	Closed FIXED

Alias:	None

Product:	Ports & Packages
Classification:	Unclassified
Component:	Individual Port(s)
Version:	Latest
Hardware:	Any Any

Importance:	--- Affects Only Me
Assignee:	Alexey Dokuchaev


Depends on:	217901

Reported:	2017-12-15 04:45 UTC by Henry Hu
Modified:	2021-10-16 16:35 UTC
CC List:	3 users


Flags:	bugzilla: maintainer-feedback? (danfe)

Attachments
patch to install libnvidia-fatbinaryloader (1.97 KB, patch) 2017-12-15 04:45 UTC, Henry Hu
ktrace log (76.29 KB, text/plain) 2018-11-12 03:17 UTC, Henry Hu
patch (37.12 KB, patch) 2018-11-12 17:20 UTC, Tijl Coosemans


Description Henry Hu 2017-12-15 04:45:32 UTC

Created attachment 188856
patch to install libnvidia-fatbinaryloader

Recent versions of nvidia-driver install a 32-bit libcuda.so, which can be used for CUDA.
However, this library depends on libnvidia-fatbinaryloader.so, which is also shipped but not installed.
A simple patch is attached to resolve this issue.

Comment 1 Tijl Coosemans freebsd_committer 2018-11-09 21:47:44 UTC

Can you try the patch in bug 217901?  It reworks the installation of the Linux libraries.

Comment 2 Henry Hu 2018-11-10 21:07:29 UTC

(In reply to Tijl Coosemans from comment #1)

I've tried your patch, and Google Earth works fine.
The CUDA test program deviceQueryDrv can now find libcuda.so and libnvidia-fatbinaryloader.so, so your patch fixes this PR.
On the other hand, cuInit() still returns 999, so there are other problems to solve before we can use CUDA.
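For readers unfamiliar with the return value mentioned above: in the CUDA driver API, 999 is CUDA_ERROR_UNKNOWN, which by itself says little about the cause. A minimal sketch of decoding such codes follows. The table is deliberately partial, and the helper name describe_curesult is made up for this illustration.

```python
# Partial table of CUDA driver API result codes (CUresult).
# Only a handful of well-known values are listed; this is not the full enum.
CURESULT_NAMES = {
    0: "CUDA_SUCCESS",
    1: "CUDA_ERROR_INVALID_VALUE",
    2: "CUDA_ERROR_OUT_OF_MEMORY",
    3: "CUDA_ERROR_NOT_INITIALIZED",
    100: "CUDA_ERROR_NO_DEVICE",
    999: "CUDA_ERROR_UNKNOWN",
}


def describe_curesult(code):
    """Return the symbolic name for a CUresult value, if this table knows it."""
    return CURESULT_NAMES.get(code, "unrecognized CUresult %d" % code)


# The value reported for cuInit() in this comment:
print(describe_curesult(999))
```

Because the code is the generic "unknown" error, a system call trace is the more useful next step for finding out what actually failed.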

Comment 3 Alex S 2018-11-10 21:47:28 UTC

CUDA is unlikely to work without the nvidia-uvm kernel module from the Linux driver package.

Comment 4 Tijl Coosemans freebsd_committer 2018-11-11 11:38:09 UTC

(In reply to Henry Hu from comment #2)
Can you run the CUDA test program using ktrace?  That should give us some more information about what it tries to do.

ktrace -i -f /where/you/want/ktrace.out testprogram
kdump -H -f /where/you/want/ktrace.out > /where/you/want/ktrace.txt

Then attach ktrace.txt to this bug (compressed with bzip2 or something if it's too big).

Comment 5 Henry Hu 2018-11-12 03:17:06 UTC

Created attachment 199156
ktrace log

It seems to be accessing /dev/nvidia-uvm.
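One way to spot this kind of access in a large kdump text file is to pair each NAMI (path lookup) record with the RET record that follows it and keep only the failures. The sketch below assumes the usual kdump line layout of pid, optional thread id, command, record type and details. The sample lines are hand-written to illustrate that layout and are not taken from the attached log. It also ignores interleaving between threads, so treat its output as a hint rather than proof.

```python
import re

# Assumed record shapes in kdump text output:
#   ... NAMI  "/some/path"
#   ... RET   syscall_name -1 errno N Message
NAMI_RE = re.compile(r'\bNAMI\s+"([^"]+)"')
RET_RE = re.compile(r'\bRET\s+(\S+)\s+(.*)')
ERRNO_RE = re.compile(r'^-1\s+errno\s+(\d+)')


def failed_lookups(lines):
    """Return (path, syscall, errno) for failed syscalls that follow a NAMI record."""
    results = []
    last_path = None
    for line in lines:
        nami = NAMI_RE.search(line)
        if nami:
            last_path = nami.group(1)
            continue
        ret = RET_RE.search(line)
        if ret:
            err = ERRNO_RE.match(ret.group(2))
            if err and last_path is not None:
                results.append((last_path, ret.group(1), int(err.group(1))))
            # A RET record closes the syscall, so forget the pending path.
            last_path = None
    return results


# Hand-written sample in the assumed format (hypothetical pid/tid/addresses).
SAMPLE = """\
  1234 100456 deviceQueryDrv CALL  linux_open(0x80a1234,0x2,0)
  1234 100456 deviceQueryDrv NAMI  "/dev/nvidia-uvm"
  1234 100456 deviceQueryDrv RET   linux_open -1 errno 2 No such file or directory
  1234 100456 deviceQueryDrv CALL  linux_open(0x80a5678,0x2,0)
  1234 100456 deviceQueryDrv NAMI  "/dev/nvidiactl"
  1234 100456 deviceQueryDrv RET   linux_open 3
"""

for path, syscall, errno_value in failed_lookups(SAMPLE.splitlines()):
    print(path, syscall, errno_value)
```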

Comment 6 Tijl Coosemans freebsd_committer 2018-11-12 10:23:19 UTC

(In reply to Henry Hu from comment #5)
Right, nvidia-uvm seems to be open source, so it should be possible to port it (perhaps using linuxkpi in base and linuxkpi_gplv2 in graphics/drm-devel-kmod), but I don't have time for that right now.

Comment 7 Tijl Coosemans freebsd_committer 2018-11-12 17:20:04 UTC

Created attachment 199178
patch

I noticed that nvidia-uvm also has an unsupported mode, which is trivial to port, so here's a new version of the patch for x11/nvidia-driver.  It should now install a dummy nvidia-uvm kernel module that you can load with kldload nvidia-uvm.  You probably also need to adjust the permissions on /dev/nvidia-uvm.

Please give it a try.  If it doesn't work then create another ktrace.
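Before re-running the CUDA test, it may help to confirm that the device node exists and that the current user can open it for reading and writing. A minimal sketch follows. The helper name device_status is made up here, and the check only reflects the permissions of the user running the script.

```python
import os


def device_status(path):
    """Classify a device node as missing, inaccessible, or usable by this user."""
    if not os.path.exists(path):
        return "missing"
    if not os.access(path, os.R_OK | os.W_OK):
        return "no read/write access"
    return "ok"


# Path taken from the comment above; the result depends on the local system.
print(device_status("/dev/nvidia-uvm"))
```

A result of "missing" suggests the module is not loaded, while "no read/write access" points at the permissions mentioned above.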

Comment 8 Alex S 2018-11-12 18:11:35 UTC

I tried unsupported mode before (https://github.com/shkhln/nvshim/blob/master/src/libc/sys/ioctl.c#L15) and I think it is, well, actually unsupported. Too lazy to set up a proper Linux system for testing.

Comment 9 Alex S 2018-11-12 22:43:21 UTC

> I tried unsupported mode before (https://github.com/shkhln/nvshim/blob/master/src/libc/sys/ioctl.c#L15) and I think it is, well, actually unsupported.

Ok, turns out I'm just dumb. Disregard that.

Comment 10 Tijl Coosemans freebsd_committer 2018-11-13 15:03:33 UTC

Comment on attachment 199178
patch

I've uploaded a new patch to bug 217901 addressing issues with the ioctl handler.

Comment 11 Tijl Coosemans freebsd_committer 2018-11-13 15:37:06 UTC

The port of nvidia-uvm is incomplete.  It doesn't handle ioctl calls from Linux programs yet.

Comment 12 Tijl Coosemans freebsd_committer 2018-11-19 10:51:09 UTC

Patch4 in bug 217901 contains an updated nvidia-uvm module (still unsupported mode).  Please give it a try and provide another ktrace if it doesn't work.

Comment 13 Alex S 2018-11-20 06:37:16 UTC

There are quite a few stubs in nvidia.ko, and some of them might be required for CUDA. For example, running matrixMul from the CUDA SDK and glxgears with this dtrace script:

#!/usr/sbin/dtrace -s

nvidia:*:entry, nvidia-modeset:*:entry /execname == "matrixMul"/ {
  @counts[probefunc] = min(1);
}

nvidia:*:entry, nvidia-modeset:*:entry /execname != "matrixMul"/ {
  @counts[probefunc] = min(0);
}

Gives me:

os_lock_user_pages 1
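The min() aggregation in the script works as a set difference: a function ends up with 1 only if every call to it came from matrixMul, and any call from another program pulls it down to 0. The same logic can be sketched in Python as follows. The call list is invented for illustration, and only os_lock_user_pages is a name taken from this report.

```python
def exclusive_functions(calls, target="matrixMul"):
    """Return functions that were called by `target` and by no other program.

    `calls` is an iterable of (execname, function_name) pairs, mimicking
    one DTrace probe firing per pair.
    """
    counts = {}
    for execname, func in calls:
        value = 1 if execname == target else 0
        # Same effect as @counts[probefunc] = min(value) in the script.
        counts[func] = min(counts.get(func, value), value)
    return sorted(func for func, value in counts.items() if value == 1)


# Hypothetical probe firings; the "hypothetical_*" names are placeholders.
calls = [
    ("matrixMul", "os_lock_user_pages"),
    ("matrixMul", "hypothetical_shared_fn"),
    ("glxgears", "hypothetical_shared_fn"),
    ("glxgears", "hypothetical_gl_only_fn"),
]

print(exclusive_functions(calls))
```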

Comment 14 Tijl Coosemans freebsd_committer 2018-11-21 13:54:12 UTC

(In reply to Alex S from comment #13)
Does the CUDA program get further now that /dev/nvidia-uvm exists?  Or do you see this without /dev/nvidia-uvm as well?

Comment 15 Alex S 2018-11-21 17:25:38 UTC

(In reply to Tijl Coosemans from comment #14)

Are you able to test 390.87 yourself?

> Or do you see this without /dev/nvidia-uvm as well?

Without /dev/nvidia-uvm I see CUDA trying to pass an error code (-2) into an ioctl call: https://forums.freebsd.org/threads/linux-binary-compatibility-nvidia-drivers-and-cuda-for-blender.65065/#post-382015

> Does the CUDA program get further now that /dev/nvidia-uvm exists?

Please note that I'm on 11.2-RELEASE and I'm not currently able to test your patches. Other that that, with "unsupported mode UVM" matrixMul sample prints some vague "all CUDA-capable devices are busy or unavailable" error message. Replacing "return NV_ERR_NOT_SUPPORTED" with "return NV_OK" (without proper implementation) in os_lock_user_pages and os_unlock_user_pages seems to trick it into actually reading some (garbage) data. Dtrace reports nv_register_user_pages and nv_unregister_user_pages being called.

Comment 16 Alex S 2018-11-21 17:40:00 UTC

(In reply to Alex S from comment #15)

> that that
* than that

Comment 17 Tijl Coosemans freebsd_committer 2018-11-21 20:19:32 UTC

(In reply to Alex S from comment #15)
I cannot test 390, only 304.  And I'm only interested in enabling linux64 in the nvidia-driver to make linux-c7 the default.  The nvidia-driver is the last blocker for that.  If I can get CUDA working at the same time that would be a nice bonus, but it's not a priority for me.  I can take a look at os_lock_user_pages and friends, but no promises.

If you modified your 11.2 kernel as in bug 206711 you should be able to test my patch for x11/nvidia-driver.

Comment 18 Gleb Popov freebsd_committer 2019-11-26 11:44:34 UTC

The linux-nvidia-libs port installs libnvidia-fatbinaryloader.so, and linux-c7 is now the default.

Can this be closed?

Comment 19 Henry Hu 2019-11-27 01:08:28 UTC

(In reply to Gleb Popov from comment #18)
I think that it can be closed.

Comment 20 Alex S 2021-10-16 16:35:24 UTC

(In reply to Alex S from comment #13)

For the record, upstream was kind enough to implement os_lock_user_pages (and related functions) in 495.29.05, so now we only have to deal with the UVM stuff.