Bug 285803 - graphics/nvidia-drm-66-kmod Suspend failure
Summary: graphics/nvidia-drm-66-kmod Suspend failure
Status: New
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-ports-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2025-03-31 10:40 UTC by Ben Hutton
Modified: 2025-04-15 13:47 UTC (History)
CC List: 5 users

See Also:
bugzilla: maintainer-feedback? (ashafer)


Attachments
Nvidia Log (413.38 KB, text/plain)
2025-04-01 23:42 UTC, Ben Hutton

Description Ben Hutton 2025-03-31 10:40:24 UTC
After upgrading graphics/nvidia-drm-66-kmod from 550.127.05.1500034_1 to 570.124.04.1500034_1, suspend no longer works. When I close the lid on my laptop, the screen just goes blank and the machine does not enter suspend mode. After a few minutes I get dumped to the terminal, and the messages below appear in /var/log/messages. I also captured the errors below on the console.

Note: I also upgraded from FreeBSD Current 1500034 to 1500035, and both versions have the same suspend issue.

Also note that I tried nvidia-drm both installed from the latest pkg and compiled from the latest ports.

This is occurring on a Lenovo ThinkPad P1 Gen 3 with an Nvidia Quadro T2000 and the latest BIOS, running FreeBSD Current with KDE Plasma 6 and SDDM. If I stay on the 550.127.05.1500034_1 version of the DRM driver, suspend/resume works.

More than happy to assist with debugging. While I have rolled back to the previous boot environment, I did save the boot environment with the issue.

Console Errors:

DEVICE_SUSPEND(acpi0) failed: 5
DEVICE_SUSPEND(nexus0) failed: 5
acpi0: device_suspend failed

... kernel: nvidia-modeset: ERROR: GPU:0: Failed detecting connected display devices


/var/log/messages

Mar 31 08:48:47 tesla kernel: vgapci1: child drmn1 requested pci_set_powerstate   
Mar 31 08:48:47 tesla kernel: NVRM: GPU at PCI:0000:01:00: GPU-b3a0ee73-8822-4b5d-35ac-5c952972aa91   
Mar 31 08:48:47 tesla kernel: NVRM: Xid (PCI:0000:01:00): 120, GSP task exception: load access fault (cause:0x5) @ pc:0x50fe804, task:1   
Mar 31 08:48:47 tesla kernel: NVRM:     Reported by libos task:0 v2.0 [0] @ ts:1743410864
Mar 31 08:48:47 tesla kernel: NVRM:     RISC-V CSR State:   
Mar 31 08:48:47 tesla kernel: NVRM:         mstatus:0x000000001e000000  mscratch:0x0000000000000000     mie:0x0000000000000880  mip:0x0000000000000000   
Mar 31 08:48:47 tesla kernel: NVRM:         mepc:0x00000000050fe804  mbadaddr:0x0000000000000080  mcause:0x0000000000000005   
Mar 31 08:48:47 tesla kernel: NVRM:     RISC-V GPR State:   
Mar 31 08:48:47 tesla kernel: NVRM:         ra:0x00000000050fe7cc   sp:0x0000000005b91a30   gp:0x0000000000000000   tp:0x0000000000000000   
Mar 31 08:48:47 tesla kernel: NVRM:         a0:0x0000000000000000   a1:0x0000000005b3c060   a2:0x0000000000000004   a3:0x0000000000000000   
Mar 31 08:48:47 tesla kernel: NVRM:         a4:0x0000000000000000   a5:0x0000000000000100   a6:0x0000000000000001   a7:0x0000000000000003   
Mar 31 08:48:47 tesla kernel: NVRM:         s0:0x0000000005b91a90   s1:0x0000000005b91ad0   s2:0x0000000005b91ab0   s3:0x0000000004775cb0   
Mar 31 08:48:47 tesla kernel: NVRM:         s4:0x800000000019f710   s5:0x80000000001b9a90   s6:0x800000000022bbb0   s7:0x00000000041db000   
Mar 31 08:48:47 tesla kernel: NVRM:         s8:0x80000000003325d0   s9:0x00000000041db000  s10:0x0000000000000000  s11:0x0000000000073d00   
Mar 31 08:48:47 tesla kernel: NVRM:         t0:0x0000000000000009   t1:0x0000000005a86dec   t2:0x800000000003ef70   t3:0x0000000000000020   
Mar 31 08:48:47 tesla kernel: NVRM:         t4:0x0000000000000000   t5:0x0000000005b91811   t6:0x0000000000000000   
Mar 31 08:48:47 tesla kernel: NVRM:     Stack Trace:   
Mar 31 08:48:47 tesla kernel: NVRM:         0x00000000050fe804   
Mar 31 08:48:47 tesla kernel: NVRM:         0x00000000051133dc   
Mar 31 08:48:47 tesla kernel: NVRM:         0x000000000511d8e4   
Mar 31 08:48:47 tesla kernel: NVRM:         0x00000000050b8098   
Mar 31 08:48:47 tesla kernel: NVRM:         0x000000000521ecd4   
Mar 31 08:48:47 tesla kernel: NVRM:         0x0000000005ad099c   
Mar 31 08:48:47 tesla kernel: NVRM:         0x0000000005a86398   
Mar 31 08:48:47 tesla kernel: NVRM:         0x0000000005a88a80   
Mar 31 08:48:47 tesla kernel: NVRM:         0x0000000005aa2c6c   
Mar 31 08:48:47 tesla kernel: NVRM:         0x0000000005aadd68   
Mar 31 08:48:47 tesla kernel: NVRM:     PC Trace:   
Mar 31 08:48:47 tesla kernel: NVRM:         0x0000000004018064  0x000000000401a8a8  0x0000000004018d0c  0x000000000400e35c  0x0000000004018c4c   
Mar 31 08:48:47 tesla kernel: NVRM:         0x000000000401832c  0x000000000400e35c  0x00000000040183a0  0x0000000004018c3c  0x0000000004018110   
Mar 31 08:48:47 tesla kernel: NVRM:         0x0000000004018c1c  0x000000000401aa84  0x0000000004018a84  0x00000000040181d8  0x0000000004018b90   
Mar 31 08:48:47 tesla kernel: NVRM:         0x000000000401832c  0x000000000400e35c  0x00000000040183a0  0x0000000004018bc8  0x00000000040181d8   
Mar 31 08:48:47 tesla kernel: NVRM:         0x0000000004018b90  0x000000000401832c  0x000000000400e35c  0x00000000040183a0  0x0000000004018bc8   
Mar 31 08:48:47 tesla kernel: NVRM:         0x00000000040181d8  0x0000000004018b90  0x000000000401832c  0x000000000400e35c  0x00000000040183a0   
Mar 31 08:48:47 tesla kernel: NVRM:         0x0000000004018bc8  0x00000000040181d8  0x0000000004018b90  0x000000000401832c  0x000000000400e35c   
Mar 31 08:48:47 tesla kernel: NVRM:         0x00000000040183a0   
Mar 31 08:48:47 tesla kernel: NVRM:     External I/O Register State:   
Mar 31 08:48:47 tesla kernel: NVRM:         0x00111360:0x00000000   0x00111364:0xbadf5108   0x00111368:0x0000e828   0x0011136c:0x00000000   
Mar 31 08:48:47 tesla kernel: NVRM:         0x001112b4:0x00040040   0x001112b8:0x00000040   0x001112bc:0x00000000   0x00111344:0x11100000   
Mar 31 08:48:47 tesla kernel: NVRM:         0x00110008:0x00008050   0x0011010c:0x00000000   0x00110118:0x00012022   0x00110110:0x003f8950   
Mar 31 08:48:47 tesla kernel: NVRM:         0x00110128:0x00000000   0x00110114:0x00005060   0x0011011c:0x00000010   
Mar 31 08:48:47 tesla kernel: NVRM:     ------------[ end crash report ]------------
Mar 31 08:48:47 tesla kernel: NVRM: Xid (PCI:0000:01:00): 119, Timeout after 6s of waiting for RPC response from GPU0 GSP! Expected function 47 (UNLOADING_GUEST_DRIVER) (0x0 0x0).   
Mar 31 08:48:47 tesla kernel: NVRM: GPU0 GSP RPC buffer contains function 4128 (GSP_POST_NOCAT_RECORD) and data 0x0000000000000005 0x00000000050fe7cc.   
Mar 31 08:48:47 tesla kernel: NVRM: GPU0 RPC history (CPU -> GSP):   
Mar 31 08:48:47 tesla kernel: NVRM:     entry function                   data0              data1              ts_start           ts_end             duration actively_polling   
Mar 31 08:48:47 tesla kernel: NVRM:      0    47   UNLOADING_GUEST_DRIVE 0x0000000000000000 0x0000000000000000 0x00063198c4b1748d 0x0000000000000000          y   
Mar 31 08:48:47 tesla kernel: NVRM:     -1    76   GSP_RM_CONTROL        0x0000000020800301 0x0000000000000014 0x00063198c4a61a4c 0x00063198c4a61a4c              
Mar 31 08:48:47 tesla kernel: NVRM:     -2    103  GSP_RM_ALLOC          0x000000000000007e 0x0000000000000018 0x00063198c4a61664 0x00063198c4a61a4c   1000us     
Mar 31 08:48:47 tesla kernel: NVRM:     -3    76   GSP_RM_CONTROL        0x00000000a06f0103 0x0000000000000002 0x00063198c4a6127c 0x00063198c4a61664   1000us     
Mar 31 08:48:47 tesla kernel: NVRM:     -4    103  GSP_RM_ALLOC          0x000000000000c5b5 0x0000000000000008 0x00063198c4a60e96 0x00063198c4a6127c    998us     
Mar 31 08:48:47 tesla kernel: NVRM:     -5    103  GSP_RM_ALLOC          0x000000000000c46f 0x0000000000000170 0x00063198c4a5c074 0x00063198c4a5d3fc   5000us     
Mar 31 08:48:47 tesla kernel: NVRM:     -6    76   GSP_RM_CONTROL        0x0000000020802a08 0x0000000000000004 0x00063198c4a5bc8c 0x00063198c4a5c074   1000us     
Mar 31 08:48:47 tesla kernel: NVRM:     -7    10   FREE                  0x000000000000000c 0x0000000000000000 0x00063198c4a5bc8c 0x00063198c4a5bc8c              
Mar 31 08:48:47 tesla kernel: NVRM: GPU0 RPC event history (CPU <- GSP):   
Mar 31 08:48:47 tesla kernel: NVRM:     entry function                   data0              data1              ts_start           ts_end             duration during_incomplete_rpc   
Mar 31 08:48:47 tesla kernel: NVRM:      0    4128 GSP_POST_NOCAT_RECORD 0x0000000000000005 0x00000000050fe7cc 0x00063198c4b27e2d 0x00063198c4b27e2d          y   
Mar 31 08:48:47 tesla kernel: NVRM:     -1    4128 GSP_POST_NOCAT_RECORD 0x0000000000000002 0x0000000000000028 0x00063198c4b1ce64 0x00063198c4b1ce64          y   
Mar 31 08:48:47 tesla kernel: NVRM:     -2    4111 PERF_BRIDGELESS_INFO_ 0x0000000000000000 0x0000000000000000 0x00063198c481bdb5 0x00063198c481bdb5
Mar 31 08:48:47 tesla kernel: NVRM:     -3    4111 PERF_BRIDGELESS_INFO_ 0x0000000000000000 0x0000000000000000 0x00063198c478a38f 0x00063198c478a38f
Mar 31 08:48:47 tesla kernel: NVRM:     -4    4099 POST_EVENT            0x0000000000000000 0x0000000000000000 0x00063198c476193b 0x00063198c476193b              
Mar 31 08:48:47 tesla kernel: NVRM:     -5    4099 POST_EVENT            0x0000000000000000 0x0000000000000000 0x00063198c476193b 0x00063198c476193b              
Mar 31 08:48:47 tesla kernel: NVRM:     -6    4099 POST_EVENT            0x0000000000000000 0x0000000000000000 0x00063198c475d6df 0x00063198c475d6df              
Mar 31 08:48:47 tesla kernel: NVRM:     -7    4099 POST_EVENT            0x0000000000000000 0x0000000000000000 0x00063198c475d6df 0x00063198c475d6df              
Mar 31 08:48:47 tesla kernel: #0 0xffffffff855a9d28 at os_dump_stack+0x18   
Mar 31 08:48:47 tesla kernel: #1 0xffffffff84ebdc68 at nv013200rm+0x508   
Mar 31 08:48:47 tesla kernel: DEVICE_SUSPEND(nvidia0) failed: 5
Mar 31 08:48:47 tesla kernel: DEVICE_SUSPEND(vgapci0) failed: 5   
Mar 31 08:48:47 tesla kernel: DEVICE_SUSPEND(pci1) failed: 5   
Mar 31 08:48:47 tesla kernel: DEVICE_SUSPEND(pcib1) failed: 5   
Mar 31 08:48:47 tesla kernel: vgapci1: child drmn1 requested pci_set_powerstate   
Mar 31 08:48:47 tesla kernel: vgapci1: child drmn1 requested pci_enable_io   
Mar 31 08:48:47 tesla syslogd: last message repeated 1 times   
Mar 31 08:48:47 tesla kernel: pci6: failed to set ACPI power state D3 on \_SB_.PCI0.RP07.PXSX: AE_BAD_PARAMETER
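
For anyone reading the log above, here is a minimal shell sketch for pulling out the key facts. The helper names (suspend_failures, xid_codes, ts_delta_us) are made up for illustration and are not part of any NVIDIA or FreeBSD tool. The 5 in "failed: 5" is errno 5 (EIO). The sketch assumes the ts_start/ts_end columns are microsecond timestamps, which is only inferred from the duration column: entry -2 in the CPU -> GSP table differs by 0x3e8 and is listed as 1000us.

```sh
#!/bin/sh
# Hypothetical helpers for reading the log excerpt above; the names are
# illustrative only.

# Print "<device> <errno>" for each "DEVICE_SUSPEND(<device>) failed: <errno>"
# line. Reads the named files, or stdin when no file is given.
suspend_failures() {
    sed -n 's/.*DEVICE_SUSPEND(\([^)]*\)) failed: \([0-9][0-9]*\).*/\1 \2/p' "$@"
}

# Print the Xid number from each "NVRM: Xid (<pci address>): <number>, ..." line.
xid_codes() {
    sed -n 's/.*NVRM: Xid ([^)]*): \([0-9][0-9]*\),.*/\1/p' "$@"
}

# Difference between two hex timestamps from the RPC history tables.
# Assumption: the values are microseconds, inferred from the duration column.
ts_delta_us() {
    echo $(( $2 - $1 ))
}
```

Run against the excerpt, xid_codes should report 120 and then 119, and suspend_failures should show the failure starting at nvidia0 and propagating to its parents. `ts_delta_us 0x00063198c4b1748d 0x00063198c4b27e2d` gives 68000, so under the microsecond assumption the most recent GSP_POST_NOCAT_RECORD event (entry 0) arrived about 68 ms after the UNLOADING_GUEST_DRIVER RPC was issued.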
Comment 1 Tomoaki AOKI 2025-03-31 12:10:03 UTC
Maybe Bug 285741 is related (the reporter says he's using only x11/nvidia-driver, not graphics/nvidia-drm-*-kmod).

But the failure reported there is on resume, so I'm not 100% sure these are actually the same issue.
And the last of the 550 series drivers, 550.144.03, which we skipped, is reported to be the latest working driver for him, just like the previous 550.127.05.

Anyway, I'm quite surprised that at least 2 people have been using suspend/resume, as I have never succeeded in resuming on ThinkPads since FreeBSD switched from APM to ACPI.
This is why I've never tested suspend/resume when submitting patches (it is simply impossible for me to test).
Comment 2 Eric Turgeon freebsd_committer freebsd_triage 2025-03-31 20:13:43 UTC
(In reply to Tomoaki AOKI from comment #1)
The failure is not on resume; the machine never goes fully to sleep. The screen goes to sleep, but the desktop system does not.
Comment 3 Austin Shafer 2025-04-01 15:13:32 UTC
Can you please capture a nvidia-bug-report.sh log? This is on Plasma 6 X11 correct? Does this happen with other desktop environments like xfce? I assume it will.

Is this with the laptop in hybrid graphics mode or NVIDIA-only mode? Does doing a regular VT switch to and from KDE work?

You should verify that hw.nvidiadrm.modeset=1 and hw.nvidiadrm.fbdev=0. If you've installed things through ports this should be the case.

One of the other new things is GSP firmware being enabled on FreeBSD: https://download.nvidia.com/XFree86/FreeBSD-x86_64/560.31.02/README/gsp.html

I would see if you can reproduce with disabling GSP, you should just be able to add  hw.nvidia.registry.EnableGpuFirmware=0 to loader.conf.
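
A minimal sketch of the loader.conf change suggested above, for anyone who wants to script it. set_tunable is a hypothetical helper, not an existing FreeBSD command; it replaces any existing assignment of the tunable instead of appending a duplicate, and it assumes the usual name="value" loader.conf syntax.

```sh
#!/bin/sh
# Hypothetical helper (not an existing FreeBSD tool): set NAME="VALUE" in a
# loader.conf-style file, replacing any existing assignment of NAME.
set_tunable() {
    conf=$1 name=$2 value=$3
    tmp=$(mktemp) || return 1
    # Keep every line that does not start with "NAME=" (plain string match,
    # so the dots in the tunable name are not treated as regex wildcards).
    if [ -f "$conf" ]; then
        awk -v n="$name=" 'index($0, n) != 1' "$conf" > "$tmp"
    fi
    printf '%s="%s"\n' "$name" "$value" >> "$tmp"
    # Write back in place so the file keeps its owner and mode.
    cat "$tmp" > "$conf" && rm -f "$tmp"
}

# Example (as root):
#   set_tunable /boot/loader.conf hw.nvidia.registry.EnableGpuFirmware 0
```

Since loader.conf is read at boot, the new value should only take effect after a reboot.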
Comment 4 Ben Hutton 2025-04-01 23:42:42 UTC
Created attachment 259260
Nvidia Log
Comment 5 Ben Hutton 2025-04-01 23:54:52 UTC
(In reply to Austin Shafer from comment #3)

> Can you please capture a nvidia-bug-report.sh log? This is on Plasma 6 X11 correct? Does this happen with other desktop environments like xfce? I assume it will.

- Have uploaded the log
- Yes it is Plasma 6 X11
- It does fail with xfce as well.

> Is this with the laptop in hybrid graphics mode or NVIDIA-only mode? Does doing a regular VT switch to and from KDE work?

- Yes it's in hybrid mode
- It does do a VT switch to and from KDE

> You should verify that hw.nvidiadrm.modeset=1 and hw.nvidiadrm.fbdev=0. If you've installed things through ports this should be the case.

- Yes both are set as above

> One of the other new things is GSP firmware being enabled on FreeBSD: https://download.nvidia.com/XFree86/FreeBSD-x86_64/560.31.02/README/gsp.html
>
> I would see if you can reproduce with disabling GSP, you should just be able to add  hw.nvidia.registry.EnableGpuFirmware=0 to loader.conf.

- With hw.nvidia.registry.EnableGpuFirmware=0 set suspend now works
Comment 6 commit-hook freebsd_committer freebsd_triage 2025-04-15 13:33:00 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=9c0e0196bdc6ddf75e801bda7f673ee2db645ad7

commit 9c0e0196bdc6ddf75e801bda7f673ee2db645ad7
Author:     Austin Shafer <ashafer@FreeBSD.org>
AuthorDate: 2025-04-14 16:19:19 +0000
Commit:     Austin Shafer <ashafer@FreeBSD.org>
CommitDate: 2025-04-15 13:31:07 +0000

    x11/nvidia-driver: disable GSP Firmware by default

    Users have reported issues with suspend/resume when GSP firmware is
    enabled. This change disables GSP to unbreak desktop use cases while
    a fix is delivered in a future driver version

    PR:             285803
    Reviewed by:    Tomoaki AOKI <junchoon@dec.sakura.ne.jp>
    Approved by:    kbowling (mentor)
    Differential Revision:  https://reviews.freebsd.org/D49828

 x11/nvidia-driver/Makefile                                    |  8 +++++++-
 .../files/extra-gsp-patch-src-nvidia_subr.c.in (new)          | 11 +++++++++++
 2 files changed, 18 insertions(+), 1 deletion(-)
Comment 7 Chad Jacob Milios 2025-04-15 13:47:24 UTC
(In reply to commit-hook from comment #6)

Don't forget to bump nvidia-secondary-driver when bumping nvidia-driver.

Or... let's ask: why does nvidia-secondary-driver explicitly set PORTREVISION anyway?