Bug 289320 - graphics/drm-66-kmod: Page fault on 16-CURRENT after 16-CURRENT rename
Status: Open
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: Any Any
Importance: --- Affects Some People
Assignee: freebsd-x11 (Nobody)
Reported: 2025-09-05 13:47 UTC by Cy Schubert
Modified: 2025-11-17 18:26 UTC
bugzilla: maintainer-feedback? (x11)

Description Cy Schubert freebsd_committer freebsd_triage 2025-09-05 13:47:24 UTC
When running kldload amdgpu.ko on an AMD Framework 13 running 16-CURRENT, I get:

#8  vm_phys_fictitious_unreg_range (start=<optimized out>, end=<optimized out>)
    at /opt/src/git-src/sys/vm/vm_phys.c:1216
        pi = <optimized out>
        pe = <optimized out>
        tmp = {node = {rbe_link = {<optimized out>, <optimized out>,
              <optimized out>}}, start = <optimized out>, end = 0,
          first_page = <optimized out>}
        seg = 0x0

It worked perfectly fine yesterday on 15-CURRENT.

The strange thing is that drm-66-kmod works on my HP 840 G5, though that machine is Intel-based.

graphics/drm-66-kmod was rebuilt against the new system on both machines.
Comment 1 Marek Zarychta 2025-09-05 14:04:58 UTC
Perhaps rebuilding gpu-firmware will help.
Comment 2 Cy Schubert freebsd_committer freebsd_triage 2025-09-06 06:26:30 UTC
Kernel messages gleaned from the dump:

drmn0: could not load firmware image 'amdgpu/gc_11_0_1_mes.bin'
[drm ERROR :amdgpu_device_ip_early_init] early_init of IP block <mes_v11_0> failed -19
drmn0: Fatal error during GPU init
drmn0: amdgpu: finishing device.
Fatal trap 12: page fault while in kernel mode
cpuid = 10; apic id = 0a
fault virtual address   = 0x18
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80a29ef4
stack pointer           = 0x28:0xfffffe023c7927e0
frame pointer           = 0x28:0xfffffe023c792800
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 47797 (kldload)
rdi: 0000000000000000 rsi: 0000000000000000 rdx: 0000000000000000
rcx: 0000000000000000  r8: fffffffffffffff0  r9: fffffffffffffff0
rax: 0000000000000001 rbx: fffff801505ee780 rbp: fffffe023c792800
r10: 000000000000007c r11: ffffffcffffffff5 r12: fffff801071e4200
r13: fffff801071e5a00 r14: 0000000000000000 r15: fffffe0265d39000
trap number             = 12
panic: page fault
cpuid = 10
time = 1757136418
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe023c792530
vpanic() at vpanic+0x136/frame 0xfffffe023c792660
panic() at panic+0x43/frame 0xfffffe023c7926c0
trap_pfault() at trap_pfault+0x3c9/frame 0xfffffe023c792710
calltrap() at calltrap+0x8/frame 0xfffffe023c792710
--- trap 0xc, rip = 0xffffffff80a29ef4, rsp = 0xfffffe023c7927e0, rbp = 0xfffffe023c792800 ---
vm_phys_fictitious_unreg_range() at vm_phys_fictitious_unreg_range+0xd4/frame 0xfffffe023c792800
unregister_fictitious_range() at unregister_fictitious_range+0xc/frame 0xfffffe023c792810
amdgpu_device_fini_hw() at amdgpu_device_fini_hw+0x117/frame 0xfffffe023c792850
amdgpu_driver_load_kms() at amdgpu_driver_load_kms+0x84/frame 0xfffffe023c792880
amdgpu_pci_probe() at amdgpu_pci_probe+0x29f/frame 0xfffffe023c7928d0
linux_pci_attach_device() at linux_pci_attach_device+0x5ac/frame 0xfffffe023c792930
device_attach() at device_attach+0x43d/frame 0xfffffe023c792980
bus_generic_driver_added() at bus_generic_driver_added+0x73/frame 0xfffffe023c7929a0
devclass_driver_added() at devclass_driver_added+0x29/frame 0xfffffe023c7929d0
devclass_add_driver() at devclass_add_driver+0x11e/frame 0xfffffe023c792a10
_linux_pci_register_driver() at _linux_pci_register_driver+0xcc/frame 0xfffffe023c792a40
amdgpu_evh() at amdgpu_evh+0x73/frame 0xfffffe023c792a50
module_register_init() at module_register_init+0x85/frame 0xfffffe023c792a80
linker_load_module() at linker_load_module+0xc0f/frame 0xfffffe023c792d80
kern_kldload() at kern_kldload+0x165/frame 0xfffffe023c792dd0
sys_kldload() at sys_kldload+0x59/frame 0xfffffe023c792e00
amd64_syscall() at amd64_syscall+0x126/frame 0xfffffe023c792f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe023c792f30
--- syscall (304, FreeBSD ELF64, kldload), rip = 0x1105f4e7efca, rsp = 0x1105f260d758, rbp = 0x1105f260dcd0 ---
Uptime: 29m14s
Dumping 4069 out of 96010 MB:

-19 == ENODEV, operation not supported by device. This worked earlier in the day prior to switching the FreeBSD version from 15-PRERELEASE to 16-CURRENT.
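
Linux-derived drivers such as amdgpu report failures as negated errno values, which is why the log shows -19 rather than 19. A minimal sketch of that decoding follows. errno_name is a hypothetical helper, not an existing tool, and its table lists only a few values that are numerically the same on FreeBSD and Linux.

```sh
#!/bin/sh
# Hypothetical helper: map a (possibly negated) errno value to its
# symbolic name. Only a few common values are listed; see errno(2)
# for the full table.
errno_name() {
    case "${1#-}" in          # strip a leading minus sign, as in "-19"
        2)  echo ENOENT ;;
        12) echo ENOMEM ;;
        19) echo ENODEV ;;
        22) echo EINVAL ;;
        *)  echo UNKNOWN ;;
    esac
}

errno_name -19    # prints ENODEV
```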
Comment 3 Cy Schubert freebsd_committer freebsd_triage 2025-09-06 14:13:05 UTC
Ok, now this is weird. I reverted "Update main to 16" 8b4e4c2737305df8807abc6cd054a32586085c93. The panic is resolved. What is going on in drm-66-kmod?
Comment 4 Chuck Tuffli freebsd_committer freebsd_triage 2025-09-06 19:50:05 UTC
FWIW, I'm seeing the same (or at least similar) panic also on a Framework 13, but that system is running

FreeBSD 15.0-PRERELEASE main-n280086-851dc7f859c2 GENERIC amd64

with a drm-kmod built against the running kernel. I see this panic with both the 6.6-lts and 6.1-lts branches. Note my panic backtrace looks the same, but the panic message is:

panic: Start of segment isn't less than end (start: 0 end: 0)
Comment 5 Cy Schubert freebsd_committer freebsd_triage 2025-09-08 03:30:40 UTC
(In reply to Chuck Tuffli from comment #4)

Your revision level is not discussed by this PR because 65059dd2b6f94e570acc645be82b8ea056316459 was reverted restoring functionality to all but amdgpu. This PR is about the problem coming back when 8b4e4c2737305df8807abc6cd054a32586085c93 was committed. Update to 3aa0a0aaa23b95dbf0ef58b16b313637f515b460 to resolve the drm-66-kmod and iwlwifi panics. Update to the next commit (8b4e4c2737305df8807abc6cd054a32586085c93) and the panic comes back for amdgpu only (but not for i915 or for iwlwifi).
Comment 6 Cy Schubert freebsd_committer freebsd_triage 2025-09-10 06:19:10 UTC
I have worked around the problem. This is not a final solution but a circumvention.

cd /usr/ports/graphics/gpu-firmware-kmod
make
cd $WRKSRC/drm-kmod-firmware-20230625_8/amdgpukmsfw-files

In my case this is /export/wrkdir/amd64/usr/ports/graphics/gpu-firmware-amd-kmod/work-aldebaran/drm-kmod-firmware-20230625_8/amdgpukmsfw-files

mkdir /boot/firmware/amdgpu
cp * /boot/firmware/amdgpu

kldload amdgpu will load the appropriate firmware.
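
The manual copy above could be wrapped in a small script along the following lines. This is only a sketch: copy_fw is a hypothetical helper name, it copies just the .bin files rather than everything in the directory, and the source directory must be whatever amdgpukmsfw-files path the port build produced on your system.

```sh
#!/bin/sh
# Hypothetical helper wrapping the mkdir/cp steps of the workaround.
#   $1: directory holding the extracted firmware images (amdgpukmsfw-files)
#   $2: destination directory, /boot/firmware/amdgpu in this report
copy_fw() {
    src=$1
    dst=$2
    [ -d "$src" ] || { echo "copy_fw: no such directory: $src" >&2; return 1; }
    # Expand the glob into the positional parameters and bail out if
    # nothing matched (the pattern is then left unexpanded).
    set -- "$src"/*.bin
    [ -e "$1" ] || { echo "copy_fw: no .bin files in $src" >&2; return 1; }
    mkdir -p "$dst" || return 1
    cp "$@" "$dst"/
}

# Example (as root), run from inside the amdgpukmsfw-files directory:
#   copy_fw "$PWD" /boot/firmware/amdgpu
```

Using mkdir -p means the script can be re-run after a firmware update without failing on the existing directory.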

The problem was discovered by booting verbose, capturing the dump, then looking at the captured dmesg buffer to see what it was barfing on.

Reinstalling graphics/gpu-firmware-amd-kmod on the new 16-CURRENT system does not fix the problem but this is where the problem lies.

The workaround is clunky but it will get us over the hump until a permanent solution is found.

I've bumped the severity from only affects me to affects some people, as it appears others are interested.
Comment 7 Cy Schubert freebsd_committer freebsd_triage 2025-09-10 06:26:56 UTC
BTW, the drm-kmods should try to print an error message when kernel modules are not loaded and exit. At the point of the panic it may be a good idea to test for a NULL pointer, print a message, stop initializing, and let the boot continue.
Comment 8 Guido Falsi freebsd_committer freebsd_triage 2025-09-10 07:56:28 UTC
(In reply to Cy Schubert from comment #6)

I'm in the process of updating my personal PCs (desktop and laptop) to 16 and these machines have amd graphics, so I have been following this.

Since, from what you say, the problem is the location of the firmware files, a simpler workaround might be (untested):

# ln -s /boot/modules /boot/firmware/amdgpu

If I stumble upon this issue, I'm going to test this idea, but maybe I misunderstood your workaround?
Comment 9 Cy Schubert freebsd_committer freebsd_triage 2025-09-10 12:52:22 UTC
(In reply to Guido Falsi from comment #8)

No.
Comment 10 Guido Falsi freebsd_committer freebsd_triage 2025-09-14 19:49:32 UTC
A quick follow-up to say that I'm not experiencing this issue, as I had feared I would.

I've updated to base r4cb50d74c19c014e8099272777eb20aaf834d61c from Fri Sep 5 08:00:52 2025 +0200

I do use amdgpu and load it at boot and have seen no crashes.

I don't know what makes my laptop and desktop machines different from cy's, but since it is working for me I have little more to contribute here, I fear.
Comment 11 Bjoern A. Zeeb freebsd_committer freebsd_triage 2025-11-10 16:43:33 UTC
I had a similar issue with amdgpu on a FW16 which would go like:
kldload amdgpu
black screen
count to 5
See POST

No ddb, no kernel dump, ...  I tried 6.1, 6.6, and master at least from github built along the kernel.

I thought it was switching to GENERIC-NODEBUG that made it work at the beginning of October, but it seems it might instead have been that I put linux-firmware.git (for too much wifi) into /boot/firmware?

fwget -n says I should install:
gpu-firmware-amd-kmod-dcn-3-1-4
gpu-firmware-amd-kmod-gc-11-0-1
gpu-firmware-amd-kmod-psp-13-0-4
gpu-firmware-amd-kmod-sdma-6-0-1
gpu-firmware-amd-kmod-vcn-4-0-2

pkg-plist says, for the file that did not load for cy@:

% grep gc_11_0_1_mes pkg-plist 
%%GC_11_0_1%%/%%KMODDIR%%/amdgpu_gc_11_0_1_mes_bin.ko
%%GC_11_0_1%%/%%KMODDIR%%/amdgpu_gc_11_0_1_mes1_bin.ko
%%GC_11_0_1%%/%%KMODDIR%%/amdgpu_gc_11_0_1_mes_2_bin.ko

So if the plist is correct, that file should be there, but the PCI ID of your card may differ; it would be interesting to know.

If you boot verbose (boot -v), you will see more information about which firmware files the kernel tries to load, including the .ko files from /boot/kernel and /boot/modules if there is no plain firmware file in /boot/firmware/. The plain file in /boot/firmware/ is tried first.
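
To check by hand which of those files exist for a given image, something like the following sketch may help. It is not the kernel's actual lookup code. fw_modname, fw_candidates and fw_find are hypothetical helper names, the image-name-to-.ko mapping is inferred from the failing image name in comment 2 and the pkg-plist entries above, and the relative order of /boot/kernel and /boot/modules is an assumption.

```sh
#!/bin/sh
# Derive the firmware module file name from an image name by replacing
# "/" and "." with "_" (inferred mapping), e.g.
#   amdgpu/gc_11_0_1_mes.bin -> amdgpu_gc_11_0_1_mes_bin.ko
fw_modname() {
    printf '%s.ko\n' "$(printf '%s' "$1" | tr '/.' '__')"
}

# List candidate paths: the plain file in /boot/firmware first,
# then the .ko firmware modules.
fw_candidates() {
    echo "/boot/firmware/$1"
    echo "/boot/kernel/$(fw_modname "$1")"
    echo "/boot/modules/$(fw_modname "$1")"
}

# Print the first candidate that exists, or fail with a message.
# Assumes firmware names contain no whitespace.
fw_find() {
    for p in $(fw_candidates "$1"); do
        if [ -e "$p" ]; then
            echo "$p"
            return 0
        fi
    done
    echo "fw_find: no firmware found for $1" >&2
    return 1
}

fw_candidates amdgpu/gc_11_0_1_mes.bin
```

Running fw_find amdgpu/gc_11_0_1_mes.bin before kldload amdgpu gives a quick indication of whether the image that failed in comment 2 is present in any of these locations.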

If the problem is indeed that amdgpu makes the kernel panic or even resets the machine if it fails to load firmware, then the driver needs some improvements.
Comment 12 Cy Schubert freebsd_committer freebsd_triage 2025-11-11 05:40:04 UTC
(In reply to Bjoern A. Zeeb from comment #11)

This is exactly how I clued into simply copying the .bin files to /boot/firmware/amdgpu. The weird thing is that when the uname was 15-CURRENT it worked fine but after uname changed to 16-CURRENT it panicked. It's definitely a driver issue.
Comment 13 Emmanuel Vadot freebsd_committer freebsd_triage 2025-11-11 11:09:08 UTC
Loading the amdgpu firmware .ko files works perfectly here on 16-CURRENT, and it seems it also works for Guido, so something is wrong with your setup.

As for the panic, it is known: amdgpu needs the firmware and will try to kldunload if it can't load it, and this has never worked on FreeBSD (well, it did a long time ago and then stopped working).