Bug 287165 - drm/pseudofs panic in radeonkms.ko
Summary: drm/pseudofs panic in radeonkms.ko
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-bugs (Nobody)
URL:
Keywords: crash
Depends on:
Blocks:
 
Reported: 2025-05-30 17:38 UTC by Steve Kargl
Modified: 2025-06-10 18:18 UTC
CC List: 4 users

See Also:


Attachments
dmesg.boot showing the radeonkms loading info (18.99 KB, text/plain)
2025-05-30 17:52 UTC, Steve Kargl
proposed patch (3.49 KB, patch)
2025-06-10 13:04 UTC, Mark Johnston
proposed patch (406 bytes, patch)
2025-06-10 13:11 UTC, Mark Johnston

Comment 1 Marek Zarychta 2025-05-30 17:44:13 UTC
For recent CURRENT, probably drm-66-kmod or at least drm-61-kmod should be used nowadays.
Comment 2 Steve Kargl freebsd_committer freebsd_triage 2025-05-30 17:52:04 UTC
Created attachment 260809
dmesg.boot showing the radeonkms loading info
Comment 3 Steve Kargl freebsd_committer freebsd_triage 2025-05-30 17:56:06 UTC
(In reply to Marek Zarychta from comment #1)

Both drm-61 and drm-66 end up in an endless reboot loop
upon loading radeonkms.ko.  That is, 

boot -> load radeonkms.ko -> reboot

drm-515 and CURRENT circa February 2025 work fine.
Something has changed in the past few weeks.  Scanning
https://lists.freebsd.org/archives/dev-commits-src-main/
turns up nothing that jumps out as a problematic commit.
Comment 4 Steve Kargl freebsd_committer freebsd_triage 2025-05-30 18:01:48 UTC
% pciconf -vl
...
vgapci0@pci0:1:0:0:     class=0x030000 rev=0x00 hdr=0x00 vendor=0x1002 device=0x6779 subvendor=0x1092 subdevice=0x6450
    vendor     = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    device     = 'Caicos [Radeon HD 6450/7450/8450 / R5 230 OEM]'
    class      = display
    subclass   = VGA
Comment 5 Steve Kargl freebsd_committer freebsd_triage 2025-05-30 18:34:35 UTC
Looking more closely at dmesg.boot, one finds the initial
report from loading radeonkms.ko:

  [drm] radeon kernel modesetting enabled.
  drmn0: <drmn> on vgapci0
  vgapci0: child drmn0 requested pci_enable_io
  vgapci0: child drmn0 requested pci_enable_io
  sysctl_add_oid: can't re-use a leaf (hw.dri.debug)!
  [drm] initializing kernel modesetting (CAICOS 0x1002:0x6779 0x1092:0x6450 0x00).

Hmmm, that looks suspicious.  Now looking at the initial part of core.txt.1:

  panic: pfs_add_node(): homonymous siblings

  Reading symbols from /usr/lib/debug//boot/kernel/cpuctl.ko.debug...
  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
  57              __asm("movq %%gs:%c1,%0" : "=r" (td)
  (kgdb) #0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57

Learned a new word today.

  homonymous /hō-mŏn′ə-məs, hə-/
  adjective

  Having the same name.
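For context, the panic comes from pseudofs refusing to create a second node with the same name under one parent directory. The sketch below is a simplified userland illustration of that kind of sibling-name check under stated assumptions: struct node and add_node() are hypothetical stand-ins, not the pseudofs API, and where pfs_add_node() panics, this sketch only reports the clash and returns -1.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, heavily simplified node; not the real struct pfs_node. */
struct node {
	const char	*name;
	struct node	*parent;
	struct node	*first_child;
	struct node	*next_sibling;
};

/*
 * Link child under parent unless a sibling already carries the same name.
 * pfs_add_node() panics in that case; this sketch reports it and returns -1.
 */
static int
add_node(struct node *parent, struct node *child)
{
	struct node *it;

	for (it = parent->first_child; it != NULL; it = it->next_sibling) {
		if (strcmp(it->name, child->name) == 0) {
			fprintf(stderr, "add_node(): homonymous siblings: '%s'\n",
			    child->name);
			return (-1);
		}
	}
	child->parent = parent;
	child->next_sibling = parent->first_child;	/* push onto sibling list */
	parent->first_child = child;
	return (0);
}
```

Adding a node named 'radeon_ring_gfx' once succeeds; adding a second node with that name under the same parent, without removing the first, is rejected, which is the situation the panic message describes.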
Comment 6 Marek Zarychta 2025-05-30 18:57:01 UTC
That's a bit odd... I have almost the same graphics card, and it runs perfectly on CURRENT, including suspend/resume support. To prevent problems, I always build kmods after installkernel.

The problem is probably somewhere between the CPU and the graphics, since I am using this card with an Intel processor.

vgapci0@pci0:1:0:0:	class=0x030000 rev=0x00 hdr=0x00 vendor=0x1002 device=0x6778 subvendor=0x1028 subdevice=0x2120
    vendor     = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    device     = 'Caicos XT [Radeon HD 7470/8470 / R5 235/310 OEM]'
    class      = display
    subclass   = VGA
Comment 7 Steve Kargl freebsd_committer freebsd_triage 2025-05-30 20:31:35 UTC
(In reply to Marek Zarychta from comment #6)
The full updating dance for me:

% cd /usr/ports
% git pull -ff
% cd ../src
% git pull -ff
% make -j7 buildworld
% make -j7 buildkernel
% make installkernel
<reboot to single user>
# mount -a
# etcupdate -p
# make installworld
# etcupdate -B 
# make delete-old
# vi /etc/rc.conf (comment out kld_list)
# sync
# reboot
% pkg delete -f drm-515-kmod
% pkg delete -f gpu-firmware\*
% cd /usr/ports/graphics/drm-515-kmod
% make -j7 && make install && make clean
% cd ../gpu-firmware-radeon-kmod
% make -j7 && make install && make clean
% vi /etc/rc.conf (restore kld_list)
% shutdown -r now

Log in, run startx, watch the system panic.

The panic occurs with both a custom kernel and GENERIC.
In an odd twist of fate, the only kernel crash dump that I
have is from the first time the system panicked.  With all
other panics, the system simply hangs with a black screen
and hitting the reset button is required.
Comment 8 Marek Zarychta 2025-05-30 21:39:37 UTC
(In reply to Steve Kargl from comment #7)
I am sorry, I am not in a position to judge whether the upgrade procedure is 100% or 101% correct. I am trying to help here since the graphics are similar.

Driven by curiosity, I have just upgraded to the most recent CURRENT, replaced drm-61-kmod with drm-66-kmod, rebooted using UEFI and then BIOS methods, and everything seems to work fine. TBH, my upgrade procedure is simplified: installkernel and installworld (over NFS), etcupdate, pkg upgrade, portupgrade drm-66-kmod and then reboot. 

Let me paste excerpts from dmesg below. There are a few errors; they are similar to yours but not fatal.

One thing I have noticed: when booting via UEFI, the screen goes blank for a short period of time (still in text mode); when booting via the BIOS method, only the resolution changes and the screen does not go blank during boot.

From dmesg(8), this time booted from legacy BIOS:
[drm] radeon kernel modesetting enabled.
drmn0: <drmn> on vgapci0
vgapci0: child drmn0 requested pci_enable_io
vgapci0: child drmn0 requested pci_enable_io
sysctl_add_oid: can't re-use a leaf (hw.dri.debug)!
[drm] initializing kernel modesetting (CAICOS 0x1002:0x6778 0x1028:0x2120 0x00).
[drm ERROR :radeon_atombios_init] Unable to find PCI I/O BAR; using MMIO for ATOM IIO
ATOM BIOS: C26411
drmn0: VRAM: 1024M 0x0000000000000000 - 0x000000003FFFFFFF (1024M used)
drmn0: GTT: 1024M 0x0000000040000000 - 0x000000007FFFFFFF
[drm] Detected VRAM RAM=1024M, BAR=256M
[drm] RAM width 64bits DDR
[drm] radeon: 1024M of VRAM memory ready
[drm] radeon: 1024M of GTT memory ready.
[drm] Loading CAICOS Microcode
radeon/CAICOS_pfp.bin: could not load binary firmware /boot/firmware/radeon/CAICOS_pfp.bin either
CAICOS_pfp.bin: could not load binary firmware /boot/firmware/CAICOS_pfp.bin either
radeon_CAICOS_pfp.bin: could not load binary firmware /boot/firmware/radeon_CAICOS_pfp.bin either
drmn0: successfully loaded firmware image 'radeon/CAICOS_pfp.bin'
radeon/CAICOS_me.bin: could not load binary firmware /boot/firmware/radeon/CAICOS_me.bin either
CAICOS_me.bin: could not load binary firmware /boot/firmware/CAICOS_me.bin either
radeon_CAICOS_me.bin: could not load binary firmware /boot/firmware/radeon_CAICOS_me.bin either
drmn0: successfully loaded firmware image 'radeon/CAICOS_me.bin'
radeon/BTC_rlc.bin: could not load binary firmware /boot/firmware/radeon/BTC_rlc.bin either
BTC_rlc.bin: could not load binary firmware /boot/firmware/BTC_rlc.bin either
radeon_BTC_rlc.bin: could not load binary firmware /boot/firmware/radeon_BTC_rlc.bin either
drmn0: successfully loaded firmware image 'radeon/BTC_rlc.bin'
radeon/CAICOS_mc.bin: could not load binary firmware /boot/firmware/radeon/CAICOS_mc.bin either
CAICOS_mc.bin: could not load binary firmware /boot/firmware/CAICOS_mc.bin either
radeon_CAICOS_mc.bin: could not load binary firmware /boot/firmware/radeon_CAICOS_mc.bin either
drmn0: successfully loaded firmware image 'radeon/CAICOS_mc.bin'
radeon/CAICOS_smc.bin: could not load binary firmware /boot/firmware/radeon/CAICOS_smc.bin either
CAICOS_smc.bin: could not load binary firmware /boot/firmware/CAICOS_smc.bin either
radeon_CAICOS_smc.bin: could not load binary firmware /boot/firmware/radeon_CAICOS_smc.bin either
drmn0: successfully loaded firmware image 'radeon/CAICOS_smc.bin'
[drm] Internal thermal controller without fan control
[drm] radeon: dpm initialized
radeon/SUMO_uvd.bin: could not load binary firmware /boot/firmware/radeon/SUMO_uvd.bin either
SUMO_uvd.bin: could not load binary firmware /boot/firmware/SUMO_uvd.bin either
radeon_SUMO_uvd.bin: could not load binary firmware /boot/firmware/radeon_SUMO_uvd.bin either
drmn0: successfully loaded firmware image 'radeon/SUMO_uvd.bin'
[drm] GART: num cpu pages 262144, num gpu pages 262144
[drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
[drm] PCIE GART of 1024M enabled (table at 0x0000000000162000).
drmn0: WB enabled
drmn0: fence driver on ring 0 use gpu addr 0x0000000040000c00
drmn0: fence driver on ring 3 use gpu addr 0x0000000040000c0c
drmn0: fence driver on ring 5 use gpu addr 0x0000000000072118
drmn0: radeon: MSI limited to 32-bit
drmn0: radeon: using MSI.
[drm] radeon: irq initialized.
[drm] ring test on 0 succeeded in 3 usecs
[drm] ring test on 3 succeeded in 8 usecs
[drm] ring test on 5 succeeded in 2 usecs
[drm] UVD initialized successfully.
[drm] ib test on ring 0 succeeded in 0 usecs
[drm] ib test on ring 3 succeeded in 0 usecs
[drm] ib test on ring 5 succeeded
lkpi_iicbb0: <LinuxKPI I2CBB> on drmn0

(...)

[drm] Radeon Display Connectors
[drm] Connector 0:
[drm]   DP-1
[drm]   HPD2
[drm]   DDC: 0x6460 0x6460 0x6464 0x6464 0x6468 0x6468 0x646c 0x646c
[drm]   Encoders:
[drm]     DFP1: INTERNAL_UNIPHY1
[drm] Connector 1:
[drm]   DVI-I-1
[drm]   HPD4
[drm]   DDC: 0x6450 0x6450 0x6454 0x6454 0x6458 0x6458 0x645c 0x645c
[drm]   Encoders:
[drm]     DFP2: INTERNAL_UNIPHY
[drm]     CRT1: INTERNAL_KLDSCP_DAC1
[drm] Initialized radeon 2.50.0 20080528 for drmn0 on minor 0
[drm] fb mappable at 0xE0363000
[drm] vram apper at 0xE0000000
[drm] size 7299072
[drm] fb depth is 24
[drm]    pitch is 6912
VT: Replacing driver "vga" with new "drmfb".
start FB_INFO:
height=1024 width=1280 depth=32
pbase=0xe0363000 vbase=0xfffff800e0363000
name=drmn0 id=radeondrmfb flags=0x0 stride=6912
end FB_INFO
Comment 9 Steve Kargl freebsd_committer freebsd_triage 2025-06-02 22:21:08 UTC
Taking a relatively giant step backwards, I have downgraded
from a once-functional radeonkms.ko from drm-515-kmod to the
vesa driver.  A git bisection of both /usr/src and /usr/ports
is likely to take a while: 1) I need to learn how to do the
bisection, and 2) I need to back up to mid-February with a
guess at the git hash.
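git bisect automates this as a binary search over history (git bisect start, git bisect bad, git bisect good <rev>, then mark each commit it checks out as good or bad). A minimal C sketch of the underlying search, assuming one known-good and one known-bad endpoint and that the breakage persists once introduced; first_bad() and the is_bad callback (standing in for "rebuild world/kernel and drm-515-kmod, kldload, startx") are hypothetical names:

```c
#include <stddef.h>

/*
 * Binary search for the first bad commit, the idea behind git bisect.
 * Assumes n >= 2, commits[0] is known good, commits[n - 1] is known bad,
 * and every commit after the first bad one is also bad.
 * is_bad() stands in for "rebuild, reboot, kldload radeonkms.ko, startx".
 */
static size_t
first_bad(const char *const commits[], size_t n, int (*is_bad)(const char *))
{
	size_t good = 0, bad = n - 1;

	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		if (is_bad(commits[mid]))
			bad = mid;	/* breakage is at or before mid */
		else
			good = mid;	/* breakage is after mid */
	}
	return (bad);
}
```

With eight candidate commits and both endpoints known, the loop runs three tests, which is what makes bisection attractive when every test is a multi-hour rebuild.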
Comment 10 Bjoern A. Zeeb freebsd_committer freebsd_triage 2025-06-03 09:20:43 UTC
(In reply to Steve Kargl from comment #5)

Do you have a backtrace in your core.txt.1 or can you make the file available?
Comment 11 Steve Kargl freebsd_committer freebsd_triage 2025-06-03 14:54:25 UTC
(In reply to Bjoern A. Zeeb from comment #10)

I have core.txt.1, vmcore.1, and info.1 as well as *.2 files.
Unfortunately, I've built and installed dozens of kernels and have
lost the kernel.debug files.  I can upload the files to my
home directory kargl@freefall.freebsd.org later today.

I'll try adding the dump_stack() call you mentioned in another
email to see if I can get additional information.
Comment 12 Steve Kargl freebsd_committer freebsd_triage 2025-06-03 19:21:29 UTC
(In reply to Steve Kargl from comment #11)
Bjoern, I have uploaded the *.0 files to the directory drm/
in my home directory on freefall (aka kargl/drm).

Note that I added the dump_stack() call, which appears in core.txt.0,
and updated to include your recent change that prints the node name.

Finally, I saved a copy of /usr/lib/debug/boot, so I have the
*.debug files but have not uploaded those, yet.  Let me know 
if you need those.
Comment 13 Bjoern A. Zeeb freebsd_committer freebsd_triage 2025-06-03 20:11:35 UTC
(In reply to Steve Kargl from comment #12)

So my assumption was correct?  We go through evergreen_startup() twice.


...
drmn0: radeon: using MSI.
[drm] radeon: irq initialized.
#0 0xffffffff808bcdeb at linux_dump_stack+0x1b
#1 0xffffffff82a67adc at evergreen_startup+0x15ec
#2 0xffffffff82a67fb6 at evergreen_init+0x276
#3 0xffffffff82abdc35 at radeon_device_init+0x835
#4 0xffffffff82acebbe at radeon_driver_load_kms+0x19e
#5 0xffffffff82ba4147 at drm_dev_register+0x1c7
#6 0xffffffff82ac4cdc at radeon_pci_probe+0x15c
#7 0xffffffff808c5020 at linux_pci_attach_device+0x440
#8 0xffffffff806ac61a at device_attach+0x3fa
#9 0xffffffff806ae370 at bus_generic_driver_added+0x90
#10 0xffffffff806a9ba9 at devclass_driver_added+0x29
#11 0xffffffff806a9ac8 at devclass_add_driver+0x138
#12 0xffffffff808c5f51 at _linux_pci_register_driver+0xc1
#13 0xffffffff82ac4b4e at radeonkms_evh+0x3e
#14 0xffffffff80652be0 at module_register_init+0xb0
#15 0xffffffff80642e0b at linker_load_module+0xbeb
#16 0xffffffff80644b25 at kern_kldload+0x125
#17 0xffffffff80644bb9 at sys_kldload+0x59
[drm] ring test on 0 succeeded in 4 usecs
[drm] ring test on 3 succeeded in 6 usecs
[drm] ring test on 5 succeeded in 3 usecs
[drm] UVD initialized successfully.
[drm] ib test on ring 0 succeeded in 0 usecs
[drm] ib test on ring 3 succeeded in 0 usecs
[drm] ib test on ring 5 succeeded
...
...
[drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
[drm] PCIE GART of 1024M enabled (table at 0x0000000000162000).
drmn0: WB enabled
drmn0: fence driver on ring 0 use gpu addr 0x0000000040000c00
drmn0: fence driver on ring 3 use gpu addr 0x0000000040000c0c
drmn0: fence driver on ring 5 use gpu addr 0x0000000000072118
#0 0xffffffff808bcdeb at linux_dump_stack+0x1b
#1 0xffffffff82a67adc at evergreen_startup+0x15ec
#2 0xffffffff82a66333 at evergreen_resume+0x63
#3 0xffffffff82abeea0 at radeon_gpu_reset+0x290
#4 0xffffffff82ac9ea8 at radeon_gem_wait_idle_ioctl+0xb8
#5 0xffffffff82bb4f16 at drm_ioctl_kernel+0xc6
#6 0xffffffff82bb528d at drm_ioctl+0x29d
#7 0xffffffff808ba3b1 at linux_file_ioctl+0x301
#8 0xffffffff806e495e at kern_ioctl+0x1de
#9 0xffffffff806e471f at sys_ioctl+0x12f
#10 0xffffffff80a01e4e at amd64_syscall+0x13e
#11 0xffffffff809d4ccb at fast_syscall_common+0xf8
panic: pfs_add_node(): homonymous siblings: 'radeon_ring_gfx' type 5


I am adding dumbbell to Cc.  Someone needs to figure out where in all this the cleanup does not happen.  I do not see any debugfs_remove*() calls in amdgpu; there are likely other code paths that get there (or possibly other KPI functions that would clean up the pfs).

The other question I cannot answer yet is where CONFIG_DEBUG_FS gets turned on for the build, so that you hit this code path in the first place.
I do not know how the
./kconfig.mk:           DEBUG_FS \
magic works.
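As background on why the build flag matters: Linux-derived driver code commonly wraps debugfs setup in #ifdef CONFIG_DEBUG_FS (or a similar test), so whether the path that ends in pfs_add_node() is compiled at all depends on whether -DCONFIG_DEBUG_FS reaches the compiler. A minimal sketch, assuming that pattern; example_ring_debugfs_init() and the counter are hypothetical and not taken from the radeon sources:

```c
#include <stdio.h>

static int debugfs_entries_created;	/* counts simulated debugfs nodes */

/*
 * Hypothetical stand-in for a driver's per-ring debugfs setup. Built with
 * -DCONFIG_DEBUG_FS, the body creates an entry; built without it, the
 * function does nothing and the debugfs/pseudofs path is never reached.
 */
static void
example_ring_debugfs_init(const char *name)
{
#ifdef CONFIG_DEBUG_FS
	printf("creating debugfs entry '%s'\n", name);
	debugfs_entries_created++;
#else
	(void)name;	/* debugfs support compiled out */
#endif
}
```

Compiling with cc -DCONFIG_DEBUG_FS enables the body; without the flag, the call becomes a no-op, so a second pass through the same init code cannot create a duplicate entry.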

It came in with:
---
commit aec60ec819e1d8aa7961be2effef3f6b22741486
Author:     Jake Freeland <jfree@freebsd.org>
AuthorDate: Mon Oct 10 18:13:11 2022 -0500
Commit:     Emmanuel Vadot <manu@bidouilliste.com>
CommitDate: Tue Oct 11 09:43:23 2022 +0200
 
    Add support for CONFIG_DEBUG_FS build flag
---

Hmmm Ok.  Entirely untested: could you try this patch?

diff --git radeon/Makefile radeon/Makefile
index f731eb961e..788e1fbb77 100644
--- radeon/Makefile
+++ radeon/Makefile
@@ -129,7 +129,7 @@ CFLAGS+= -I${SRCDIR:H}/amd/include
 
 CFLAGS+= '-DKBUILD_MODNAME="${KMOD}"'
 CFLAGS+= '-DLINUXKPI_PARAM_PREFIX=radeon_' -DDRM_SYSCTL_PARAM_PREFIX=_${KMOD}
-CFLAGS+= ${KCONFIG:C/(.*)/-DCONFIG_\1/}
+CFLAGS+= ${KCONFIG:NDEBUG_FS:C/(.*)/-DCONFIG_\1/}
 
 CFLAGS.gcc+= -Wno-redundant-decls -Wno-cast-qual -Wno-unused-but-set-variable \
        -Wno-maybe-uninitialized
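Reading the Makefile line the patch touches, ${KCONFIG:C/(.*)/-DCONFIG_\1/} turns every word of KCONFIG into a -DCONFIG_<word> flag, and the added :NDEBUG_FS modifier drops the DEBUG_FS word before that substitution. A rough C analogue of the patched expression, for illustration only; kconfig_to_cflags() and the option names FOO and BAR are hypothetical, and unlike bmake's :N it matches the excluded word exactly rather than as a glob pattern:

```c
#include <stdio.h>
#include <string.h>

/*
 * Rough analogue of ${KCONFIG:N<exclude>:C/(.*)/-DCONFIG_\1/}.
 * Writes space-separated -DCONFIG_<opt> flags into out (outsz >= 1),
 * skipping the word equal to exclude (NULL skips nothing).
 * Returns the string length, or (size_t)-1 if out is too small.
 */
static size_t
kconfig_to_cflags(const char *const opts[], size_t n, const char *exclude,
    char *out, size_t outsz)
{
	size_t len = 0;

	out[0] = '\0';
	for (size_t i = 0; i < n; i++) {
		if (exclude != NULL && strcmp(opts[i], exclude) == 0)
			continue;	/* the :N<word> filter */
		int w = snprintf(out + len, outsz - len, "%s-DCONFIG_%s",
		    len > 0 ? " " : "", opts[i]);
		if (w < 0 || (size_t)w >= outsz - len)
			return ((size_t)-1);	/* output truncated */
		len += (size_t)w;
	}
	return (len);
}
```

With KCONFIG set to FOO DEBUG_FS BAR, the unpatched form yields -DCONFIG_FOO -DCONFIG_DEBUG_FS -DCONFIG_BAR, while excluding DEBUG_FS yields -DCONFIG_FOO -DCONFIG_BAR.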
Comment 14 Steve Kargl freebsd_committer freebsd_triage 2025-06-03 23:16:38 UTC
bz, thanks for looking into the issue.

I've managed to back up to "git checkout 'main@{2025-03-15 12:00:00}'",
which has hash d3c4b002d.  After rebuilding and reinstalling world/kernel,
and rebuilding gpu-firmware and drm-515-kmod, I can successfully load
radeonkms.ko and run startx.  The desktop I expect comes up.

So, whatever is causing the issue appears in src/ after the above
date.  I'll move forward to 2025-04-15.  It takes 3-4 hours to rebuild
everything.
Comment 15 Bjoern A. Zeeb freebsd_committer freebsd_triage 2025-06-03 23:35:03 UTC
(In reply to Steve Kargl from comment #14)

If that does not work, I would go backwards, trying
62d51a43825bb632f542f4e89d57f3dbdb08095f (Apr 11)
86db734ae292fee58532f09b17b50438f6889cc8 (Apr 3)

Both are followed by a set of LinuxKPI changes, so testing at these commits would back those changes out, which may help narrow it down with the manual bisect.

HTH
Comment 16 Steve Kargl freebsd_committer freebsd_triage 2025-06-04 20:15:25 UTC
I have been able to build world/kernel from src/ of 2025-04-15
(aka adc33d3288).  I rebuilt drm-515-kmod and gpu-firmware-kmod.
After rebooting the system, I can now kldload radeonkms.ko.  startx
brings up the expected desktop.

I am now rebuilding src/ circa 2025-05-01 (aka 8d136fb027).
Comment 17 Jean-Sébastien Pédron freebsd_committer freebsd_triage 2025-06-09 12:43:44 UTC
Hi!

I have a hard time following the discussion between the mailing list thread and this problem report, so let me try to rephrase:

1. There is a panic in pseudofs because the radeon driver wants to declare two entries with the same name.

2. It looks like this double attempt comes from the fact that the same code path is executed twice during init.

The problem appeared between commits 9b2a503a1179 and 6c3a4b5f9b7b in freebsd-src HEAD.

Am I correct?

About the panic, it looks like it has been addressed by a commit from kib@:
https://cgit.freebsd.org/src/commit/?id=e9897199576a40360440aa4d2aa48d61c4010f11

That doesn’t change the fact that the initialization is apparently called twice. However, I can’t find the dmesgs mentioned during the discussions where the dump_stack() traces appear, demonstrating the two inits. Could you please attach these dmesgs here?

Thank you!
Comment 18 Bjoern A. Zeeb freebsd_committer freebsd_triage 2025-06-09 14:38:17 UTC
(In reply to Jean-Sébastien Pédron from comment #17)

See comment #11 (it's on freefall, I believe).
Comment 19 Steve Kargl freebsd_committer freebsd_triage 2025-06-09 14:52:08 UTC
(In reply to Jean-Sébastien Pédron from comment #17)

In the "good" case where I can load radeonkms.ko
and use startx to bring up my desktop, the driver
is only initialized **once**.

In the "bad" case, after loading radeonkms.ko, the
use of startx causes a panic.  The panic appears to
be due to an attempt to initialize the driver a
second time.

kib's patch addresses the panic, but does not address
why the initialization of the driver is occurring
**twice**.  I reduced the range of commits to the
range you quoted.  Unfortunately, slow hardware and
keeping world/kernel in sync is taking a long time
to find the commit that is causing the actual
problem.

Look in the directory kargl/drm on freefall for the
crash dump.
Comment 20 Steve Kargl freebsd_committer freebsd_triage 2025-06-09 21:19:35 UTC
I found it.  It's markj's commit about jiffies.  The four consecutive commits are:

4fa275a5f357 - main - queue(3): Add simple tests for some macros...  Olivier Certner
325aa4dbd10d - main - linuxkpi: Introduce a properly typed jiffies Mark Johnston
901256f6ea3c - main - mlx5: jiffies is unsigned long Mark Johnston
87e57632bf88 - main - ofed: jiffies is unsigned long Mark Johnston

4fa275a boots, I can kldload radeonkms.ko, and startx brings up my desktop.
In fact, I'm typing this in firefox at the moment.  There is only one dump_stack()
message from evergreen.c.

87e5763 boots, I can kldload radeonkms.ko, and startx causes a panic.
My custom kernel uses neither mlx5 nor ofed.  That leaves 325aa4d as
the commit causing the issue. There are two dump_stack() messages from
evergreen.c in /var/crash/core.txt.3.
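Some background on why the type of jiffies matters: in Linux code, jiffies is an unsigned long tick counter, and timeouts are compared with wrap-safe macros such as time_after(a, b), which boils down to ((long)((b) - (a)) < 0) and uses typecheck() to insist both operands are unsigned long. The sketch below contrasts that comparison with a naive a > b at the wrap point. ticks_after() and naive_after() are hypothetical names, and this is not a reconstruction of the drm-kmod fix in attachment 261142.

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Wrap-safe "a is later than b" for tick counts, modelled on Linux's
 * time_after(). Unsigned subtraction wraps modulo ULONG_MAX + 1; converting
 * the difference to long (two's complement on the platforms FreeBSD
 * supports) gives a negative value when a is ahead of b by less than half
 * the counter range.
 */
static bool
ticks_after(unsigned long a, unsigned long b)
{
	return ((long)(b - a) < 0);
}

/* Naive comparison: gives the wrong answer once the counter wraps. */
static bool
naive_after(unsigned long a, unsigned long b)
{
	return (a > b);
}
```

For example, with b = ULONG_MAX - 2 and a = 2 (five ticks later, after the counter wrapped), ticks_after(a, b) is true while naive_after(a, b) is false.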
Comment 21 Mark Johnston freebsd_committer freebsd_triage 2025-06-10 13:04:15 UTC
Created attachment 261141
proposed patch

Steve, can you please test this patch to drm-kmod and let us know if it fixes the problem when my commits are reapplied?
Comment 22 Bjoern A. Zeeb freebsd_committer freebsd_triage 2025-06-10 13:09:34 UTC
(In reply to Mark Johnston from comment #21)

Mark, can you elaborate?  That's an E1000 patch.  How does that have an impact on drm-kmod?  Wrong patch file?
Comment 23 Mark Johnston freebsd_committer freebsd_triage 2025-06-10 13:11:16 UTC
Created attachment 261142
proposed patch

Sigh, thanks Bjoern, I meant this one.  It's already applied to 6.x branches, but not 5.15 for some reason.
Comment 24 Steve Kargl freebsd_committer freebsd_triage 2025-06-10 18:18:12 UTC
(In reply to Mark Johnston from comment #23)

Mark, your patch seems to fix the issue.

I built and installed 87e57632, and rebooted system.
Then, updated gpu-firmware and drm-515-kmod.  After
kldload of radeonkms.ko, startx brought up the
expected desktop.  The dump_stack() call in
evergreen.c was executed only once, so initialization
only occurs once.