Bug 291337 - amdgpu crash when calling unregister_fictitious_range() / on 14.3-RELEASE or 15.0-RELEASE
Status: Open
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: amd64 Any
Importance: --- Affects Many People
Assignee: freebsd-x11 (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2025-12-01 21:11 UTC by Pierre Beyssac
Modified: 2025-12-09 09:33 UTC

See Also:
linimon: maintainer-feedback? (x11)


Attachments
stack trace with system information (140.61 KB, application/x-troff-man)
2025-12-01 21:11 UTC, Pierre Beyssac
system dmesg (17.24 KB, application/x-troff-man)
2025-12-01 21:12 UTC, Pierre Beyssac
Xorg.0.log file *after* fixing the crash (5.69 KB, text/plain)
2025-12-01 21:15 UTC, Pierre Beyssac
A diff is worth a thousand words (440 bytes, patch)
2025-12-04 18:58 UTC, Pierre Beyssac
pciconf -lv (11.13 KB, text/plain)
2025-12-08 18:56 UTC, Pierre Beyssac

Description Pierre Beyssac 2025-12-01 21:11:57 UTC
Created attachment 265758 [details]
stack trace with system information

My system has a built-in GPU in a Ryzen 5 5600G. Driver firmware = green_sardine.

When I try to install an external GPU (AMD RX 9060 XT), amdgpu crashes on module load, when calling vm_phys_fictitious_unreg_range().

vm_phys_fictitious_unreg_range() calls RB_FIND at line 1215 (line numbers refer to the FreeBSD 15.0 source), which returns a NULL pointer in "seg", so it crashes when dereferencing it on the next line:
  if (seg->start != start || seg->end != end)

The root cause seems to be that amdgpu fails to initialize the driver (probably because of the 9060 XT, as it doesn't crash when the card is not present).

[drm] initializing kernel modesetting (IP DISCOVERY 0x1002:0x7590 0x1EAE:0x8601 0xC0).
[drm] register mmio base: 0xFCD00000
[drm] register mmio size: 524288
drmn0: Fatal error during GPU init
drmn0: amdgpu: finishing device.

So it tries to unregister the fictitious range by calling vm_phys_fictitious_unreg_range(), but it has likely never called vm_phys_fictitious_reg_range().

The crash happens in vm_phys_fictitious_unreg_range(), but it feels like a bug in amdgpu, which is not doing proper resource tracking. That said, vm_phys_fictitious_unreg_range() could also check the pointer before using it.
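
The two sides of the problem described above can be sketched in a simplified user-space model. This is only an illustration under stated assumptions, not the kernel or drm-kmod code: the names fict_seg, fict_find, fict_reg_range, fict_unreg_range, drv_state, drv_init and drv_fini are hypothetical, and a linear scan over a small array stands in for the RB tree lookup.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SEGS 8

/* Simplified, hypothetical stand-in for a registered range [start, end). */
struct fict_seg {
	uint64_t start;
	uint64_t end;
	bool     in_use;
};

static struct fict_seg fict_segs[MAX_SEGS];

/* Plays the role of RB_FIND: returns an overlapping segment, or NULL. */
static struct fict_seg *
fict_find(uint64_t start, uint64_t end)
{
	for (size_t i = 0; i < MAX_SEGS; i++) {
		struct fict_seg *s = &fict_segs[i];
		if (s->in_use && start < s->end && end > s->start)
			return (s);
	}
	return (NULL);
}

static int
fict_reg_range(uint64_t start, uint64_t end)
{
	assert(start < end);
	if (fict_find(start, end) != NULL)
		return (EEXIST);
	for (size_t i = 0; i < MAX_SEGS; i++) {
		if (!fict_segs[i].in_use) {
			fict_segs[i] = (struct fict_seg){ start, end, true };
			return (0);
		}
	}
	return (ENOMEM);
}

/* Callee-side guard: report a missing range instead of dereferencing NULL. */
static int
fict_unreg_range(uint64_t start, uint64_t end)
{
	struct fict_seg *seg = fict_find(start, end);

	if (seg == NULL)
		return (ENOENT);
	if (seg->start != start || seg->end != end)
		return (EINVAL);
	seg->in_use = false;
	return (0);
}

/* Caller-side tracking: hypothetical driver state that remembers registration. */
struct drv_state {
	uint64_t aper_start;
	uint64_t aper_end;
	bool     range_registered;
};

static int
drv_init(struct drv_state *st, bool init_fails)
{
	if (init_fails)
		return (EIO);	/* init fails before the range is registered */
	int error = fict_reg_range(st->aper_start, st->aper_end);
	if (error == 0)
		st->range_registered = true;
	return (error);
}

static void
drv_fini(struct drv_state *st)
{
	if (!st->range_registered)
		return;		/* nothing was registered, nothing to undo */
	(void)fict_unreg_range(st->aper_start, st->aper_end);
	st->range_registered = false;
}
```

In this model, calling drv_fini() after a failed drv_init() is a no-op, and calling fict_unreg_range() directly on a range that was never registered returns ENOENT rather than crashing. Whether the real function should return an error or assert is a design decision for the maintainers.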
Comment 1 Pierre Beyssac 2025-12-01 21:12:39 UTC
Created attachment 265759 [details]
system dmesg
Comment 2 Pierre Beyssac 2025-12-01 21:15:50 UTC
Created attachment 265760 [details]
Xorg.0.log file *after* fixing the crash

I have commented out the call to unregister_fictitious_range() in amdgpu. This fixes the crash, but Xorg still fails to start with the integrated GPU (it works when the new card is not present in the machine).
Comment 3 Pierre Beyssac 2025-12-01 21:18:19 UTC
Notes:
- exact same crash on 14.3-RELEASE with drm-kmod 61 instead of 66.
- adding the card to the "pptdevs" loader variable doesn't help at all; it even breaks the text console once executed.
Comment 4 Pierre Beyssac 2025-12-04 18:58:03 UTC
Created attachment 265846 [details]
A diff is worth a thousand words

This quick & dirty patch doesn't fix the underlying issue, but it fixes the crash.
Comment 5 Emmanuel Vadot freebsd_committer freebsd_triage 2025-12-05 18:29:44 UTC
I think that this card isn't supported yet. What you're patching is the unload code, which hasn't worked on FreeBSD for a long time (and yes, at some point we should fix this).
Comment 6 Pierre Beyssac 2025-12-05 23:40:11 UTC
(In reply to Emmanuel Vadot from comment #5)

Thanks Emmanuel. I wasn't too sure whether it was supported.

My hope at this time is just for the card to be ignored by amdgpu so that I can leave it in the machine :) or ideally, be able to configure a passthru to get it to work in another OS in bhyve.
Comment 7 Bjoern A. Zeeb freebsd_committer freebsd_triage 2025-12-08 11:46:51 UTC
(In reply to Pierre Beyssac from comment #6)

You can use pptdevs in loader.conf (see man vmm) to "take it away" from amdgpu.
I believe passthru for amdgpu is not there yet for bhyve but you'd be prepared for that this way as well ;-)

There are other ways to just "disable" it (like devctl disable) to prevent any driver from attaching.  See the "disabled" in https://man.freebsd.org/cgi/man.cgi?query=device.hints .

Hope that helps for now.
Comment 8 Pierre Beyssac 2025-12-08 18:56:10 UTC
(In reply to Bjoern A. Zeeb from comment #7)

Thanks for the device.hints tip, I haven't tried this yet.

I already tried pptdevs, but it didn't seem to be honored by amdgpu: it still probes the device and crashes. Maybe my specification is wrong?
The line was:
  pptdevs="14/0/0 14/0/1 14/0/2 14/0/3 14/0/4 14/0/6"
and the relevant pciconf output is below (full system pciconf -lv as an attachment).

vgapci1@pci0:14:0:0:	class=0x030000 rev=0xc9 hdr=0x00 vendor=0x1002 device=0x1638 subvendor=0x1043 subdevice=0x8809
    vendor     = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    device     = 'Cezanne [Radeon Vega Series / Radeon Vega Mobile Series]'
    class      = display
    subclass   = VGA
hdac1@pci0:14:0:1:	class=0x040300 rev=0x00 hdr=0x00 vendor=0x1002 device=0x1637 subvendor=0x1043 subdevice=0x8809
    vendor     = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    device     = 'Renoir/Cezanne HDMI/DP Audio Controller'
    class      = multimedia
    subclass   = HDA
none0@pci0:14:0:2:	class=0x108000 rev=0x00 hdr=0x00 vendor=0x1022 device=0x15df subvendor=0x1043 subdevice=0x8809
    vendor     = 'Advanced Micro Devices, Inc. [AMD]'
    device     = 'Raven/Raven2/FireFlight/Renoir/Cezanne Platform Security Processor'
    class      = encrypt/decrypt
xhci1@pci0:14:0:3:	class=0x0c0330 rev=0x00 hdr=0x00 vendor=0x1022 device=0x1639 subvendor=0x1043 subdevice=0x87e1
    vendor     = 'Advanced Micro Devices, Inc. [AMD]'
    device     = 'Renoir/Cezanne USB 3.1'
    class      = serial bus
    subclass   = USB
xhci2@pci0:14:0:4:	class=0x0c0330 rev=0x00 hdr=0x00 vendor=0x1022 device=0x1639 subvendor=0x1043 subdevice=0x87e1
    vendor     = 'Advanced Micro Devices, Inc. [AMD]'
    device     = 'Renoir/Cezanne USB 3.1'
    class      = serial bus
    subclass   = USB
hdac2@pci0:14:0:6:	class=0x040300 rev=0x00 hdr=0x00 vendor=0x1022 device=0x15e3 subvendor=0x1043 subdevice=0x86c7
    vendor     = 'Advanced Micro Devices, Inc. [AMD]'
    device     = 'Family 17h/19h/1ah HD Audio Controller'
    class      = multimedia
    subclass   = HDA
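
One way to double-check a pptdevs specification is to derive it from the pciconf output by vendor and device ID. In the excerpt above, selector 14:0:0 carries device=0x1638 (Cezanne), while the drm log in the description reports 0x1002:0x7590 for the device that fails to initialize; the selector of that device is not shown in this excerpt. The following is a minimal sketch, assuming the `pciconf -l` summary-line layout quoted above and the bus/slot/func form used in the pptdevs line; the function name pciconf_line_to_ppt is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one summary line of `pciconf -l` output. If its vendor/device IDs
 * match the requested ones, write "bus/slot/func" into out.
 * Returns 1 on match, 0 on no match, -1 if the line cannot be parsed.
 */
static int
pciconf_line_to_ppt(const char *line, unsigned want_vendor,
    unsigned want_device, char *out, size_t outlen)
{
	unsigned dom, bus, slot, func, vendor, device;
	const char *sel, *v, *d;

	assert(line != NULL && out != NULL && outlen > 0);

	/* Selector looks like "name@pci<domain>:<bus>:<slot>:<func>:". */
	sel = strchr(line, '@');
	if (sel == NULL ||
	    sscanf(sel + 1, "pci%u:%u:%u:%u:", &dom, &bus, &slot, &func) != 4)
		return (-1);

	/* The leading space keeps "subvendor=" and "subdevice=" from matching. */
	v = strstr(line, " vendor=");
	d = strstr(line, " device=");
	if (v == NULL || d == NULL ||
	    sscanf(v, " vendor=%x", &vendor) != 1 ||
	    sscanf(d, " device=%x", &device) != 1)
		return (-1);

	if (vendor != want_vendor || device != want_device)
		return (0);

	/* The domain is parsed but not part of the bus/slot/func form. */
	snprintf(out, outlen, "%u/%u/%u", bus, slot, func);
	return (1);
}
```

Fed the vgapci1 line above with IDs 0x1002 and 0x1638, this yields the string 14/0/0; asking for 0x1002 and 0x7590 against the same line reports no match, so the entry for the new card would have to come from its own line in the full pciconf attachment.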
Comment 9 Pierre Beyssac 2025-12-08 18:56:48 UTC
Created attachment 266000 [details]
pciconf -lv
Comment 10 Emmanuel Vadot freebsd_committer freebsd_triage 2025-12-09 08:52:06 UTC
I think that you need to load vmm(4) before amdgpu, otherwise the PCI device will not be assigned to the ppt driver.
Comment 11 Pierre Beyssac 2025-12-09 09:33:45 UTC
(In reply to Emmanuel Vadot from comment #10)
I load vmm from loader.conf, and amdgpu by hand after boot or from rc.conf.