283815 – listing dev.vgapci.X.%iommu hangs indefinitely on NVIDIA card

Status:	In Progress

Alias:	None

Product:	Base System
Classification:	Unclassified
Component:	kern
Version:	14.2-RELEASE
Hardware:	Any Any

Importance:	--- Affects Only Me
Assignee:	Konstantin Belousov

URL:
Keywords:	regression

Depends on:
Blocks:

Reported:	2025-01-03 13:09 UTC by Anton Saietskii
Modified:	2025-01-07 15:37 UTC
CC List:	2 users

See Also:

Attachments
/var/run/dmesg.boot (verbose) (83.42 KB, text/plain) 2025-01-03 13:09 UTC, Anton Saietskii
'config -x /boot/kernel/kernel' output (1.51 KB, text/plain) 2025-01-03 13:10 UTC, Anton Saietskii
'kldstat -v' output (8.83 KB, text/plain) 2025-01-03 13:10 UTC, Anton Saietskii

Description Anton Saietskii 2025-01-03 13:09:38 UTC

Created attachment 256370 [details]
/var/run/dmesg.boot (verbose)

After upgrading to 14.2R from 14.1R, I noticed that listing all sysctls doesn't work anymore. It stops here:
<CUT>
dev.ahci.%parent:
dev.vgapci.1.%iommu: rid=0x10
dev.vgapci.1.%parent: pci0
dev.vgapci.1.%pnpinfo: vendor=0x8086 device=0x591b subvendor=0x1028 subdevice=0x07b1 class=0x030000
dev.vgapci.1.%location: slot=2 function=0 dbsf=pci0:0:2:0 handle=\_SB_.PCI0.GFX0
dev.vgapci.1.%driver: vgapci
dev.vgapci.1.%desc: VGA-compatible display
dev.vgapci.0.wake: 0
^C^C^C

The process enters the R+ state, uses 100% CPU, and becomes unkillable:
jason@jnb: [?:0] ~ $ ps auwwx | grep '[s]ysctl'
jason        25384 100.0  0.0   16888   4880  6  R+   15:00    0:49.15 sysctl -a
jason@jnb: [?:0] ~ $ kill -9 25384
jason@jnb: [?:0] ~ $ ps auwwx | grep '[s]ysctl'
jason        25384 100.0  0.0   16888   4880  6  R+   15:00    3:19.35 sysctl -a
jason@jnb: [?:0] ~ $ procstat -kk 25384
  PID    TID COMM                TDNAME              KSTACK
25384 101712 sysctl              -                   pci_find_cap_method+0x11a iommu_get_requester+0x192 device_sysctl_handler+0x216 sysctl_root_handler_locked+0x8a sysctl_root+0x1fa userland_sysctl+0x115 sys___sysctl+0x60 amd64_syscall+0xed fast_syscall_common+0xf8

In this state, subsequent 'sysctl kern.geom' works fine, while 'sysctl dev.cpu' also hangs:
jason@jnb: [?:0] ~ $ ps auwwx | grep '[s]ysctl'
jason        25436 100.0  0.0   13816   2236  5  R+   15:05    1:30.48 sysctl dev.cpu
jason        25384 100.0  0.0   16888   4880  6  R+   15:00    6:31.35 sysctl -a
jason@jnb: [?:0] ~ $ procstat -kk 25436 25384
  PID    TID COMM                TDNAME              KSTACK
25436 101172 sysctl              -                   sysctl_root_handler_locked+0x143 sysctl_root+0x1fa userland_sysctl+0x115 sys___sysctl+0x60 amd64_syscall+0xed fast_syscall_common+0xf8
25384 101712 sysctl              -                   pci_find_cap_method+0x17a iommu_get_requester+0x192 device_sysctl_handler+0x216 sysctl_root_handler_locked+0x8a sysctl_root+0x1fa userland_sysctl+0x115 sys___sysctl+0x60 amd64_syscall+0xed fast_syscall_common+0xf8
jason@jnb: [?:0] ~ $

No changes were made to the system configuration during the upgrade.

Comment 1 Anton Saietskii 2025-01-03 13:10:12 UTC

Created attachment 256371 [details]
'config -x /boot/kernel/kernel' output

Comment 2 Anton Saietskii 2025-01-03 13:10:32 UTC

Created attachment 256372 [details]
'kldstat -v' output

Comment 3 Anton Saietskii 2025-01-03 16:40:50 UTC

Narrowed down the issue to a single OID.

I have the following GPU:
vgapci0@pci0:1:0:0:     class=0x030000 rev=0xa1 hdr=0x00 vendor=0x10de device=0x1bb7 subvendor=0x1028 subdevice=0x07b1
    vendor     = 'NVIDIA Corporation'
    device     = 'GP104GLM [Quadro P4000 Mobile]'
    class      = display
    subclass   = VGA
    cap 01[60] = powerspec 3  supports D0 D3  current D0
    cap 05[68] = MSI supports 1 message, 64 bit
    cap 10[78] = PCI-Express 2 legacy endpoint max data 256(256) RO NS
                 max read 512
                 link x16(x16) speed 8.0(8.0) ASPM L0s/L1(L0s/L1) ClockPM disabled
    ecap 0002[100] = VC 1 max VC0
    ecap 0018[250] = LTR 1
    ecap 0004[128] = Power Budgeting 1
    ecap 0001[420] = AER 2 0 fatal 0 non-fatal 1 corrected
    ecap 000b[600] = Vendor [1] ID 0001 Rev 1 Length 36
                 0b 00 01 90 01 00 41 02 02 00 41 01 01 18 00 00
                 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
                 00 00 00 00
    ecap 0019[900] = PCIe Sec 1 lane errors 0

It's unused in my system, so it is turned off with sysutils/acpi_call and xmj@'s turn_off_gpu.sh from TuningPowerConsumption [0]:
vgapci0@pci0:1:0:0:     class=0x030000 rev=0xa1 hdr=0x00 vendor=0x10de device=0x1bb7 subvendor=0x1028 subdevice=0x07b1
    vendor     = 'NVIDIA Corporation'
    device     = 'GP104GLM [Quadro P4000 Mobile]'
    class      = display
    subclass   = VGA
(With '\_SB.PCI0.PEG0.PEGP._OFF' method.)

After the NVIDIA GPU is turned off, any sysctl call that tries to read dev.vgapci.X.%iommu hangs. Reading other OIDs, e.g. dev.vgapci.0.%driver, works fine.
NOTE: this isn't a DRM issue, as I don't have any NVIDIA-related modules loaded.

[0]: https://wiki.freebsd.org/TuningPowerConsumption

Comment 4 Anton Saietskii 2025-01-03 17:17:51 UTC

Compared 'vga'-containing commits between releng/14.2 and releng/14.1 -- identical.

Then compared 'iommu'-containing commits (as this OID hangs) and discovered ec8d60f0d9b762880482e39f567db552c152d3a2 by kib@, which exposes the value. This commit is only present in releng/14.2, so I believe it's the trigger. However, it is most likely only a trigger, not the root cause.

Comment 5 Konstantin Belousov freebsd_committer

2025-01-06 23:09:15 UTC

iommu_get_requester() loops somewhere in the call to pci_find_cap_method().
The latter accesses the PCI config space directly, trying to read the header
and to iterate the list of capabilities, for instance, to read the PCIe cap.

To further diagnose the problem, you might try to instrument pci_find_cap_method()
to see which registers it tries to read. My guess is that the cap read cycle gets
something like 0xff as the offset of the next capability and then loops back.
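
The guess above can be checked without kernel changes by a small userspace model of the capability walk. The sketch below is not the kernel code: read_all_ones() and trace_cap_walk() are hypothetical names, and the assumption is that every config read from the powered-down device returns 0xff. It prints each pointer followed and stops as soon as an offset is visited twice.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define CAP_PTR_REG	0x34	/* capabilities pointer in a type 0 header */

/* Hypothetical stand-in: a powered-down device answers every read with 0xff. */
static uint8_t
read_all_ones(unsigned off)
{
	(void)off;
	return (0xff);
}

/*
 * Follow the capability list, printing every entry visited.
 * Returns the offset that was reached a second time, or 0 if the
 * list terminated normally with a zero next-pointer.
 */
static unsigned
trace_cap_walk(uint8_t (*rd)(unsigned))
{
	bool seen[256] = { false };
	uint8_t ptr = rd(CAP_PTR_REG);

	while (ptr != 0) {
		if (seen[ptr]) {
			printf("loop: offset 0x%02x visited twice\n", ptr);
			return (ptr);
		}
		seen[ptr] = true;
		printf("cap at 0x%02x: id=0x%02x\n", ptr, rd(ptr));
		ptr = rd(ptr + 1u);	/* next-capability pointer */
	}
	return (0);
}
```

Calling trace_cap_walk(read_all_ones) reports offset 0xff as the revisited entry: the next-pointer read at that entry yields 0xff again, so an unbounded walk never sees the terminating zero.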

Comment 6 Konstantin Belousov freebsd_committer

2025-01-06 23:33:12 UTC

Try https://reviews.freebsd.org/D48348

Comment 7 Anton Saietskii 2025-01-07 10:06:35 UTC

(In reply to Konstantin Belousov from comment #6)

Thanks for the prompt response.
I can confirm that with the patch applied, reading the OID in question doesn't hang anymore -- I can see 'dev.vgapci.0.%iommu: rid=0x100' both before and after powering down the NVIDIA GPU.

Comment 8 commit-hook freebsd_committer

2025-01-07 15:35:34 UTC

A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=6ba2c036a0117ac02f9979b7dc49f15e9c1ea9c9

commit 6ba2c036a0117ac02f9979b7dc49f15e9c1ea9c9
Author:     Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2025-01-06 23:29:18 +0000
Commit:     Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2025-01-07 15:34:59 +0000

    pci_find_cap_method(): limit number of iterations for finding a capability

    Powered down device might return 0xff of extended config registers
    reads, causing loop.

    PR:     283815
    Reviewed by:    imp
    Sponsored by:   The FreeBSD Foundation
    MFC after:      1 week
    Differential revision:  https://reviews.freebsd.org/D48348

 sys/dev/pci/pci.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)
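
The idea described in the commit message, limiting the number of iterations of the capability walk, can be illustrated with a userspace sketch. This is not the committed diff: find_cap_bounded(), CAP_WALK_MAX and the two reader functions are hypothetical, and the bound of 48 is an assumption (192 bytes of capability space after the 64-byte header, with 4-byte-aligned entries). The "healthy" table mirrors the capability offsets from the pciconf output in comment 3.

```c
#include <stdint.h>

#define CAP_PTR_REG	0x34	/* capabilities pointer in a type 0 header */
#define CAP_ID_PCIE	0x10	/* PCI-Express capability ID */
#define CAP_WALK_MAX	48	/* hypothetical bound, see lead-in */

enum { CAP_NOT_FOUND = -1, CAP_WALK_ABORTED = -2 };

typedef uint8_t (*cfg_read8_t)(unsigned off);

/* Hypothetical powered-down device: every read returns 0xff. */
static uint8_t
read_all_ones(unsigned off)
{
	(void)off;
	return (0xff);
}

/* Powered-up device: caps 01 at 0x60, 05 at 0x68, 10 at 0x78. */
static uint8_t
read_healthy(unsigned off)
{
	static const uint8_t cfg[256] = {
		[0x34] = 0x60,
		[0x60] = 0x01, [0x61] = 0x68,
		[0x68] = 0x05, [0x69] = 0x78,
		[0x78] = 0x10, [0x79] = 0x00,
	};

	return (off < sizeof(cfg) ? cfg[off] : 0xff);
}

/*
 * Return the offset of the requested capability, CAP_NOT_FOUND if the
 * list ended without a match, or CAP_WALK_ABORTED if the iteration
 * limit was reached before the list terminated.
 */
static int
find_cap_bounded(cfg_read8_t rd, uint8_t capability)
{
	uint8_t ptr = rd(CAP_PTR_REG);
	int cnt;

	for (cnt = 0; ptr != 0 && cnt < CAP_WALK_MAX; cnt++) {
		if (rd(ptr) == capability)
			return (ptr);
		ptr = rd(ptr + 1u);	/* next-capability pointer */
	}
	return (ptr == 0 ? CAP_NOT_FOUND : CAP_WALK_ABORTED);
}
```

With read_healthy the PCIe capability is found at 0x78; with read_all_ones the walk gives up after CAP_WALK_MAX steps instead of spinning, which is the behavior change the reporter observed in comment 7.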