Bug 286034 - x11/nvidia-driver problems
Summary: x11/nvidia-driver problems
Status: New
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: freebsd-x11 (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2025-04-11 08:26 UTC by rob2g2
Modified: 2025-04-11 11:31 UTC
1 user

See Also:
bugzilla: maintainer-feedback? (x11)


Description rob2g2 2025-04-11 08:26:55 UTC
Using FreeBSD 14.2-RELEASE, upgrading to nvidia-driver-570.124.04.1401000 brought some issues. 

First, I am using xfce4. Since upgrading nvidia-driver to the version above, xfwm4 keeps consuming memory without releasing it, until it eventually uses around 40 GB of resident (RES) memory out of my 64 GB. This seems to be a known NVIDIA issue, see https://forums.developer.nvidia.com/t/extreme-growing-memory-usage-in-x11-opengl-or-vulkan-applications-after-suspend-resume/329078 - and, for reference, the corresponding xfwm4 issue: https://gitlab.xfce.org/xfce/xfwm4/-/issues/825

Also, my NVIDIA GeForce GTX 1650 occasionally stops working and the desktop freezes. This happens rarely, about once a week, and leaves the following in dmesg:

NVRM: GPU at PCI:0000:0a:00: GPU-eba90a43-57cc-af7c-1928-bf26dbe69c93
NVRM: Xid (PCI:0000:0a:00): 62, 000120ab 00012107 00011c38 00015afb 00015f06 00013f17 00000011 00000000
NVRM: Xid (PCI:0000:0a:00): 119, Timeout after 6s of waiting for RPC response from GPU0 GSP! Expected function 76 (GSP_RM_CONTROL) (0x20800a56 0x5c).
NVRM: GPU0 GSP RPC buffer contains function 76 (GSP_RM_CONTROL) and data 0x0000000020800a56 0x000000000000005c.
NVRM: GPU0 RPC history (CPU -> GSP):
NVRM:     entry function                   data0              data1              ts_start           ts_end             duration actively_polling
NVRM:      0    76   GSP_RM_CONTROL        0x0000000020800a56 0x000000000000005c 0x0006325f5ea56d56 0x0000000000000000          y
NVRM:     -1    76   GSP_RM_CONTROL        0x00000000c3700104 0x0000000000000014 0x0006325f5e685c87 0x0006325f5e68606f   1000us  
NVRM:     -2    76   GSP_RM_CONTROL        0x00000000c3700104 0x0000000000000014 0x0006325f5e68589f 0x0006325f5e685c87   1000us  
NVRM:     -3    76   GSP_RM_CONTROL        0x00000000c3700104 0x0000000000000014 0x0006325f5e684516 0x0006325f5e684ce7   2001us  
NVRM:     -4    76   GSP_RM_CONTROL        0x00000000c3700104 0x0000000000000014 0x0006325f5e681e07 0x0006325f5e681e07           
NVRM:     -5    76   GSP_RM_CONTROL        0x00000000c3700104 0x0000000000000014 0x0006325f5e680697 0x0006325f5e680a7f   1000us  
NVRM:     -6    76   GSP_RM_CONTROL        0x00000000c3700104 0x0000000000000014 0x0006325f5e67e757 0x0006325f5e67e757           
NVRM:     -7    76   GSP_RM_CONTROL        0x00000000c3700104 0x0000000000000014 0x0006325f5e67d7b7 0x0006325f5e67db9f   1000us  
NVRM: GPU0 RPC event history (CPU <- GSP):
NVRM:     entry function                   data0              data1              ts_start           ts_end             duration during_incomplete_rpc
NVRM:      0    4130 RECOVERY_ACTION       0x0000000000000000 0x0000000000000000 0x0006325f5ea56d56 0x0006325f5ea56d56          y
NVRM:     -1    4102 OS_ERROR_LOG          0x0000000000000000 0x0000000000000000 0x0006325f5ea56d56 0x0006325f5ea56d56          y
NVRM:     -2    4128 GSP_POST_NOCAT_RECORD 0x0000000000000003 0x00000000000120ab 0x0006325f5ea56d56 0x0006325f5ea56d56          y
NVRM:     -3    4128 GSP_POST_NOCAT_RECORD 0x0000000000000005 0x00000285057cb854 0x0006325f22f7deeb 0x0006325f22f7deeb           
NVRM:     -4    4128 GSP_POST_NOCAT_RECORD 0x0000000000000002 0x0000000000000025 0x0006325ee4d1cabb 0x0006325ee4d1cabb           
NVRM:     -5    4099 POST_EVENT            0x0000000000000001 0x0000000000000000 0x0006325ee4d1c6d3 0x0006325ee4d1c6d3           
NVRM:     -6    4128 GSP_POST_NOCAT_RECORD 0x0000000000000005 0x00000285057cb854 0x0006325ee44b83bc 0x0006325ee44b83bc           
NVRM:     -7    4128 GSP_POST_NOCAT_RECORD 0x0000000000000005 0x00000285057cb854 0x0006325ee44ac06b 0x0006325ee44ac06b           
#0 0xffffffff847a9d38 at os_dump_stack+0x18
#1 0xffffffff840bdc68 at _nv013200rm+0x508
NVRM: Xid (PCI:0000:0a:00): 154, GPU recovery action changed from 0x0 (None) to 0x1 (GPU Reset Required)
NVRM: Xid (PCI:0000:0a:00): 119, pid=6098, name=thunderbird, Timeout after 6s of waiting for RPC response from GPU0 GSP! Expected function 76 (GSP_RM_CONTROL) (0x20800a6a 0x0).
NVRM: Xid (PCI:0000:0a:00): 119, pid=70894, name=xfwm4, Timeout after 6s of waiting for RPC response from GPU0 GSP! Expected function 10 (FREE) (0xbeef0403 0x0).
NVRM: Rate limiting GSP RPC error prints for GPU at PCI:0000:0a:00 (printing 1 of every 30).  The GPU likely needs to be reset.
NVRM: Xid (PCI:0000:0a:00): 16, Head 00000003 Count 006c8dcf
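To see at a glance which Xid codes occur and how often, the `NVRM: Xid (...)` lines can be tallied. The following is a minimal Python sketch; the function name `tally_xids` is hypothetical, and it assumes the line format shown in the log above. Here it is fed a short excerpt of that log; on the affected machine the text of `dmesg` output could be passed in instead.

```python
import re
from collections import Counter

# Matches lines like "NVRM: Xid (PCI:0000:0a:00): 119, ..." and captures
# the PCI address and the numeric Xid code (format assumed from this log).
XID_RE = re.compile(r"NVRM: Xid \((PCI:[0-9a-fA-F:.]+)\): (\d+),")


def tally_xids(dmesg_text: str) -> Counter:
    """Count NVRM Xid codes per (PCI address, code) in dmesg text."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        m = XID_RE.search(line)
        if m:
            counts[(m.group(1), int(m.group(2)))] += 1
    return counts


if __name__ == "__main__":
    # Excerpt of the log above; other NVRM lines are simply ignored.
    sample = """\
NVRM: Xid (PCI:0000:0a:00): 62, 000120ab 00012107
NVRM: Xid (PCI:0000:0a:00): 119, Timeout after 6s of waiting for RPC response from GPU0 GSP!
NVRM: GPU0 RPC history (CPU -> GSP):
NVRM: Xid (PCI:0000:0a:00): 154, GPU recovery action changed from 0x0 (None) to 0x1 (GPU Reset Required)
NVRM: Xid (PCI:0000:0a:00): 119, pid=6098, name=thunderbird, Timeout after 6s
NVRM: Xid (PCI:0000:0a:00): 16, Head 00000003 Count 006c8dcf
"""
    for (pci, code), n in sorted(tally_xids(sample).items()):
        print(f"{pci} Xid {code}: {n}")
```

On this excerpt, Xid 119 (the GSP RPC timeouts) is the only code that repeats.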
Comment 1 Tomoaki AOKI 2025-04-11 11:23:39 UTC
(In reply to rob2g2 from comment #0)
> Using FreeBSD 14.2-RELEASE, upgrading to nvidia-driver-570.124.04.1401000 brought some issues.

This could be the problem: in your version string, 1401000 means the package was built for 14.1-RELEASE. That said, x11/nvidia-driver is relatively robust in this respect compared with graphics/*drm-*-kmod, which are quite sensitive to the version of the base system's linuxkpi.ko.
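The version arithmetic can be checked mechanically. This minimal Python sketch assumes, as described here, that the last dot-separated field of the package version is the __FreeBSD_version (MMmmXXX) the package was built against; both function names are hypothetical. The running system's value would come from `sysctl -n kern.osreldate`; the sketch hardcodes 1402000, assumed to be the value for 14.2-RELEASE.

```python
def split_pkg_version(pkg_version: str) -> tuple[str, int]:
    """Split '570.124.04.1401000' into ('570.124.04', 1401000).

    Assumes the last dot-separated field is the __FreeBSD_version
    the package was built against.
    """
    driver, _, osversion = pkg_version.rpartition(".")
    return driver, int(osversion)


def osversion_to_release(osversion: int) -> str:
    """Decode an MMmmXXX-style __FreeBSD_version into 'MM.m'."""
    major = osversion // 100000
    minor = (osversion // 1000) % 100
    return f"{major}.{minor}"


if __name__ == "__main__":
    driver, built_for = split_pkg_version("570.124.04.1401000")
    # Assumed value: kern.osreldate on a 14.2-RELEASE system.
    running = 1402000
    print(f"nvidia-driver {driver} built for {osversion_to_release(built_for)}, "
          f"running {osversion_to_release(running)}")
    # Compare only major and minor (drop the last three digits).
    if built_for // 1000 != running // 1000:
        print("package was built against a different minor release")
```

For the reporter's package this decodes to 14.1 against a 14.2 system, which is the mismatch described above.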

More suspicious, though, is

> NVRM: Xid (PCI:0000:0a:00): 119, Timeout after 6s of waiting for RPC response from GPU0 GSP! Expected function 76 (GSP_RM_CONTROL) (0x20800a56 0x5c).

and similar lines in the output that follows.

These seem to be problems related to GSP (GPU System Processor), which Turing and later generations of NVIDIA GPUs have, and your GeForce GTX 1650 is a Turing-generation GPU.

So you could try adding hw.nvidia.registry.EnableGpuFirmware=0 to /boot/loader.conf and rebooting. This disables the GSP firmware.
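A duplicate or stale assignment in loader.conf can make it unclear which value the loader actually uses. The hypothetical Python helper `set_loader_tunable` below is a minimal sketch of setting the tunable exactly once. It only transforms text (it does not touch /boot/loader.conf), assumes one assignment per line, and writes the value in the quoted `name="value"` form used in loader.conf(5). After editing the real file as root and rebooting, `kenv hw.nvidia.registry.EnableGpuFirmware` should show the value the loader passed to the kernel.

```python
TUNABLE = "hw.nvidia.registry.EnableGpuFirmware"


def set_loader_tunable(conf_text: str, name: str, value: str) -> str:
    """Return loader.conf text with name="value" set exactly once.

    The first uncommented assignment of `name` is replaced, later
    duplicates are dropped, and the line is appended if absent.
    Comments (including commented-out assignments) are kept as-is.
    """
    new_line = f'{name}="{value}"'
    out, replaced = [], False
    for line in conf_text.splitlines():
        # A commented line keeps its leading '#', so its key never matches.
        key = line.split("=", 1)[0].strip()
        if key == name:
            if not replaced:
                out.append(new_line)
                replaced = True
            continue  # drop duplicate assignments of the same tunable
        out.append(line)
    if not replaced:
        out.append(new_line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    current = "# existing settings\n#hw.nvidia.registry.EnableGpuFirmware=1\n"
    print(set_loader_tunable(current, TUNABLE, "0"), end="")
```

Running the helper twice yields the same text, so it is safe to apply repeatedly.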