Created attachment 265758 [details]
stack trace with system information

My system has a built-in GPU in a Ryzen 5 5600G (driver firmware: green_sardine). When I install an external GPU (AMD RX 9060 XT), amdgpu crashes on module load, in vm_phys_fictitious_unreg_range().

vm_phys_fictitious_unreg_range() calls RB_FIND at line 1215 (line numbers from the FreeBSD 15.0 source code), which returns a NULL pointer in "seg", so the kernel crashes when dereferencing it on the next line:

    if (seg->start != start || seg->end != end)

The root cause seems to be that amdgpu fails to initialize the driver (probably because of the 9060 XT, as there is no crash when the card is not present):

    [drm] initializing kernel modesetting (IP DISCOVERY 0x1002:0x7590 0x1EAE:0x8601 0xC0).
    [drm] register mmio base: 0xFCD00000
    [drm] register mmio size: 524288
    drmn0: Fatal error during GPU init
    drmn0: amdgpu: finishing device.

So it tries to unregister the range by calling vm_phys_fictitious_unreg_range(), but it has likely never called vm_phys_fictitious_reg_range().

The crash happens in vm_phys_fictitious_unreg_range(), but it feels like a bug in amdgpu not doing proper resource tracking, although vm_phys_fictitious_unreg_range() could also check the pointer before using it.
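To illustrate the kind of pointer check suggested above, here is a minimal userland sketch. It is not the kernel code: the struct `fict_seg`, the fixed-size array, and the functions `fict_find()`, `fict_reg_range()` and `fict_unreg_range()` are hypothetical stand-ins, and a linear search replaces the RB_FIND lookup. It only shows that unregistering a range that was never registered can return an error instead of dereferencing NULL.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a registered fictitious range. */
struct fict_seg {
	uint64_t start;
	uint64_t end;
	int in_use;
};

#define MAX_SEGS 8
static struct fict_seg segs[MAX_SEGS];

/* Linear search standing in for the RB_FIND lookup; returns NULL on a miss. */
static struct fict_seg *
fict_find(uint64_t start, uint64_t end)
{
	for (size_t i = 0; i < MAX_SEGS; i++) {
		if (segs[i].in_use && start >= segs[i].start &&
		    end <= segs[i].end)
			return (&segs[i]);
	}
	return (NULL);
}

/* Record a range; returns 0 on success, -1 if the table is full. */
static int
fict_reg_range(uint64_t start, uint64_t end)
{
	for (size_t i = 0; i < MAX_SEGS; i++) {
		if (!segs[i].in_use) {
			segs[i].start = start;
			segs[i].end = end;
			segs[i].in_use = 1;
			return (0);
		}
	}
	return (-1);
}

/*
 * Remove a range; returns 0 on success, -1 if the range was never
 * registered or does not match exactly. The NULL check comes before
 * any access to seg->start or seg->end.
 */
static int
fict_unreg_range(uint64_t start, uint64_t end)
{
	struct fict_seg *seg = fict_find(start, end);

	if (seg == NULL)
		return (-1);
	if (seg->start != start || seg->end != end)
		return (-1);
	seg->in_use = 0;
	return (0);
}
```

With this shape, a caller whose initialization failed before registering anything gets -1 back from `fict_unreg_range()` rather than a crash. Whether the real function should return an error, panic with a clearer message, or stay as it is remains a design decision for the kernel side.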
Created attachment 265759 [details] system dmesg
Created attachment 265760 [details]
Xorg.0.log file *after* fixing the crash

I have commented out the call to vm_phys_fictitious_unreg_range() in amdgpu. This fixes the crash, but Xorg still fails to start with the built-in GPU (it works when the new card is not present in the machine).
Notes:
- exact same crash on 14.3-RELEASE with drm-kmod 61 instead of 66.
- adding the card in the "pptdevs" loader variable doesn't help at all; it even breaks the text console once executed.
Created attachment 265846 [details]
A diff is worth a thousand words

This quick & dirty patch doesn't fix the underlying issue, but it fixes the crash.
I think that this card isn't supported yet. What you're patching is the unload code, which hasn't worked on FreeBSD for a long time (and yes, at some point we should fix this).
(In reply to Emmanuel Vadot from comment #5) Thanks Emmanuel. I wasn't too sure whether it was supported. My hope at this time is just for the card to be ignored by amdgpu so that I can leave it in the machine :) or ideally, be able to configure a passthru to get it to work in another OS in bhyve.
(In reply to Pierre Beyssac from comment #6) You can use pptdevs in loader.conf (see man vmm) to "take it away" from amdgpu. I believe passthru for amdgpu is not there yet for bhyve but you'd be prepared for that this way as well ;-) There are other ways to just "disable" it (like devctl disable) to prevent any driver from attaching. See the "disabled" in https://man.freebsd.org/cgi/man.cgi?query=device.hints . Hope that helps for now.
(In reply to Bjoern A. Zeeb from comment #7)
Thanks for the device.hints tip, I haven't tried this yet.

I already tried pptdevs, but it didn't seem to be honored by amdgpu: it still probes the device and crashes. Maybe my specification is wrong? The line was:

    pptdevs="14/0/0 14/0/1 14/0/2 14/0/3 14/0/4 14/0/6"

and the relevant pciconf output is below (full system pciconf -lv as an attachment).

vgapci1@pci0:14:0:0: class=0x030000 rev=0xc9 hdr=0x00 vendor=0x1002 device=0x1638 subvendor=0x1043 subdevice=0x8809
    vendor     = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    device     = 'Cezanne [Radeon Vega Series / Radeon Vega Mobile Series]'
    class      = display
    subclass   = VGA
hdac1@pci0:14:0:1: class=0x040300 rev=0x00 hdr=0x00 vendor=0x1002 device=0x1637 subvendor=0x1043 subdevice=0x8809
    vendor     = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    device     = 'Renoir/Cezanne HDMI/DP Audio Controller'
    class      = multimedia
    subclass   = HDA
none0@pci0:14:0:2: class=0x108000 rev=0x00 hdr=0x00 vendor=0x1022 device=0x15df subvendor=0x1043 subdevice=0x8809
    vendor     = 'Advanced Micro Devices, Inc. [AMD]'
    device     = 'Raven/Raven2/FireFlight/Renoir/Cezanne Platform Security Processor'
    class      = encrypt/decrypt
xhci1@pci0:14:0:3: class=0x0c0330 rev=0x00 hdr=0x00 vendor=0x1022 device=0x1639 subvendor=0x1043 subdevice=0x87e1
    vendor     = 'Advanced Micro Devices, Inc. [AMD]'
    device     = 'Renoir/Cezanne USB 3.1'
    class      = serial bus
    subclass   = USB
xhci2@pci0:14:0:4: class=0x0c0330 rev=0x00 hdr=0x00 vendor=0x1022 device=0x1639 subvendor=0x1043 subdevice=0x87e1
    vendor     = 'Advanced Micro Devices, Inc. [AMD]'
    device     = 'Renoir/Cezanne USB 3.1'
    class      = serial bus
    subclass   = USB
hdac2@pci0:14:0:6: class=0x040300 rev=0x00 hdr=0x00 vendor=0x1022 device=0x15e3 subvendor=0x1043 subdevice=0x86c7
    vendor     = 'Advanced Micro Devices, Inc. [AMD]'
    device     = 'Family 17h/19h/1ah HD Audio Controller'
    class      = multimedia
    subclass   = HDA
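As a way to sanity-check a pptdevs value like the one quoted above, here is a small userland sketch that splits the string into bus/slot/function triples. It assumes the entries are space-separated decimal "bus/slot/func" values, as described in vmm(4); the struct `bsf` and the function `parse_pptdevs()` are hypothetical names for this illustration and are not taken from the vmm or amdgpu sources.

```c
#include <stdio.h>

/* One bus/slot/function triple from a pptdevs entry (hypothetical type). */
struct bsf {
	int bus;
	int slot;
	int func;
};

/*
 * Parse "b/s/f b/s/f ..." into out[]. Returns the number of entries
 * parsed, or -1 on a malformed entry, an out-of-range value, or more
 * than max entries.
 */
static int
parse_pptdevs(const char *s, struct bsf *out, int max)
{
	int n = 0;

	for (;;) {
		int consumed = 0;

		/* Skip separators between entries. */
		while (*s == ' ')
			s++;
		if (*s == '\0')
			break;
		if (n == max)
			return (-1);
		if (sscanf(s, "%d/%d/%d%n", &out[n].bus, &out[n].slot,
		    &out[n].func, &consumed) != 3)
			return (-1);
		/* PCI limits: 256 buses, 32 slots, 8 functions. */
		if (out[n].bus < 0 || out[n].bus > 255 ||
		    out[n].slot < 0 || out[n].slot > 31 ||
		    out[n].func < 0 || out[n].func > 7)
			return (-1);
		s += consumed;
		/* An entry must end at a space or at the end of the string. */
		if (*s != ' ' && *s != '\0')
			return (-1);
		n++;
	}
	return (n);
}
```

Each parsed triple can then be compared by hand with the bus:slot:function part of the pciconf selectors (for example the 14:0:0 in pci0:14:0:0) to confirm that the entries name the intended devices.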
Created attachment 266000 [details] pciconf -lv
I think that you need to load vmm(4) before amdgpu; otherwise the PCI device will not be assigned to the ppt driver.
(In reply to Emmanuel Vadot from comment #10)
I load vmm from loader.conf, and amdgpu either by hand after boot or from rc.conf.