229995 – [panic, regression] radeonkms "timed sleep before timers are working"

Status:	Open

Alias:	None

Product:	Base System
Classification:	Unclassified
Component:	kern
Version:	11.2-RELEASE
Hardware:	amd64 Any

Importance:	--- Affects Some People
Assignee:	freebsd-bugs (Nobody)

URL:
Keywords:	crash, regression

Depends on:
Blocks:

Reported:	2018-07-23 23:05 UTC by Andrew Daugherity
Modified:	2022-10-12 00:50 UTC
CC List:	5 users

See Also:

Attachments
dmesg w/panic (7.87 KB, text/plain) 2018-07-23 23:05 UTC, Andrew Daugherity

Description Andrew Daugherity 2018-07-23 23:05:04 UTC

Created attachment 195401
dmesg w/panic

On a system where I have radeonkms activated via /boot/loader.conf:
kern.vty=vt
radeonkms_load="YES"
radeonkmsfw_R100_cp_load="YES"

This worked properly in 10.2 and 10.3.  I recently upgraded to 11.2 and was greeted by a kernel panic:
====
drmn0: <ATI ES1000 RN50> on vgapci0
info: [drm] RADEON_IS_PCI
[...]
info: [drm] Loading R100 Microcode
info: [drm] radeon: ring at 0x00000000C0001000
info: [drm] ring test succeeded in 2 usecs
panic: timed sleep before timers are working
cpuid = 2
KDB: stack backtrace:
#0 0xffffffff80b3d567 at kdb_backtrace+0x67
#1 0xffffffff80af6b07 at vpanic+0x177
#2 0xffffffff80af6983 at panic+0x43
#3 0xffffffff80b4a363 at sleepq_set_timeout_sbt+0x103
#4 0xffffffff80a96bbc at _cv_timedwait_sbt+0x13c
#5 0xffffffff826e6b05 at radeon_fence_wait_seq+0x1d5
#6 0xffffffff826e68ea at radeon_fence_wait+0x2a
#7 0xffffffff8271e2c2 at r100_ib_test+0x202
#8 0xffffffff826f803a at radeon_ib_ring_tests+0x2a
#9 0xffffffff826e1e61 at radeon_device_init+0x511
#10 0xffffffff826ee443 at radeon_driver_load_kms+0xa3
#11 0xffffffff8289aa06 at drm_get_pci_dev+0x436
#12 0xffffffff8289d6fa at drm_attach_helper+0x13a
#13 0xffffffff826e579f at radeon_attach+0x4f
#14 0xffffffff80b2fc98 at device_attach+0x3b8
#15 0xffffffff80b30f3d at bus_generic_attach+0x3d
#16 0xffffffff8077f0fe at vga_pci_attach+0x3e
#17 0xffffffff80b2fc98 at device_attach+0x3b8
Uptime: 1s
Automatic reboot in 15 seconds - press a key on the console to abort
====

It only panics when loading the module at boot.  If I 'kldload radeonkms' after the system has booted, there is no problem.  (So doing that via rc.local is a workaround at least...)

Note that when the system is up, kldload will automatically load the necessary firmware modules, but loader(8) will not, hence my specifying radeonkmsfw_R100_cp_load explicitly.  If I leave that line out, the radeonkms driver will complain about missing firmware, but the system will not panic.

The full dmesg is attached.  For comparison, a successful load of radeonkms after bootup produces mostly the same output up to the "ring test succeeded" message, after which it prints another 30 lines or so:
====
info: [drm] ring test succeeded in 0 usecs
info: [drm] ib test succeeded in 0 usecs
info: [drm] radeon_device_init: Taking over the fictitious range 0xe0000000-0xe4000000
radeon_iicbb0 on drmn0
iicbus0: <Philips I2C bus> on iicbb0 addr 0xff
iic0: <I2C generic I/O> on iicbus0
radeon_iicbb1 on drmn0
iicbus1: <Philips I2C bus> on iicbb1 addr 0xff
iic1: <I2C generic I/O> on iicbus1
radeon_iicbb2 on drmn0
iicbus2: <Philips I2C bus> on iicbb2 addr 0xff
iic2: <I2C generic I/O> on iicbus2
radeon_iicbb3 on drmn0
iicbus3: <Philips I2C bus> on iicbb3 addr 0xff
iic3: <I2C generic I/O> on iicbus3
info: [drm] Radeon Display Connectors
info: [drm] Connector 0:
info: [drm]   VGA-1
info: [drm]   DDC: 0x60 0x60 0x60 0x60 0x60 0x60 0x60 0x60
info: [drm]   Encoders:
info: [drm]     CRT1: INTERNAL_DAC1
info: [drm] Connector VGA-1: get mode from tunables:
info: [drm]   - kern.vt.fb.modes.VGA-1
info: [drm]   - kern.vt.fb.default_mode
info: [drm] fb mappable at 0xE0040000
info: [drm] vram apper at 0xE0000000
info: [drm] size 2621440
info: [drm] fb depth is 16
info: [drm]    pitch is 2560
fbd0 on drmn0
VT: Replacing driver "vga" with new "fb".
====

Workaround:
Comment out any 'radeonkms*_load' lines in /boot/loader.conf and add 'kldload radeonkms' to /etc/rc.local, to load the driver later in the boot process.
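
As a rough sketch, the workaround above comes down to the following two file fragments. The lines are the ones quoted earlier in this report; whether kern.vty=vt still needs to be set depends on the release, so treat that line as carried over rather than required.

```sh
# /boot/loader.conf
# Keep vt(4) selected, but stop loader(8) from loading the driver and firmware.
kern.vty=vt
#radeonkms_load="YES"
#radeonkmsfw_R100_cp_load="YES"

# /etc/rc.local
# Load the driver once the system is up; kldload pulls in the
# required firmware modules automatically at this point.
kldload radeonkms
```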

Comment 1 Andrew Daugherity 2018-08-07 21:40:52 UTC

Tested several other FreeBSD versions via editing /boot/loader.conf on the mini-memstick installer:
10.4: OK
11.0, 11.1: panics same as 11.2
12.0-current: panics and reboots immediately.  Using a serial console, the last message printed is the "[drm] ring test succeeded" message (same place where 11.x panics).

The 11.2 installer also reboots immediately, rather than showing the panic as the installed system does.  I guess there's a loader setting controlling that?

Comment 2 Alexey Dokuchaev freebsd_committer 2020-09-28 10:26:02 UTC

(In reply to Andrew Daugherity from comment #1)
Thanks for those details, Andrew; this might help us track down and fix the regressions reported by users of the radeon(4) driver on recent versions of FreeBSD.

It's 2020 now, and there have been changes in the FreeBSD graphics stack which might or might not make a difference with respect to the problem you've been seeing.  Did you try any later versions of FreeBSD and the DRM code?  Note that DRM support has moved out of the base system/kernel to the `graphics/drm-*-kmod' and `graphics/gpu-firmware-kmod' ports.

Adding x11@ to the CC list so they can track this bug, even though it was originally filed under base system/kern.

Comment 3 Andrew Daugherity 2020-09-30 17:48:25 UTC

(In reply to Alexey Dokuchaev from comment #2)
I'm running 12.1 on a similar server (also an embedded Radeon R100) and using the drm-kmod package.  It was installed as 12.0, which had the same issue with the included drivers as 11.x, as noted in comment 1.  However, prompted by the deprecation notice I installed drm-legacy-kmod.  I don't recall if I ever tried loading that via /boot/loader.conf, but as the package message suggests using the kld_list setting in rc.conf, I'm following that method.  (Although it neglects to mention that you must also load drm2.ko that way [1], or else it will use the kernel's [deprecated] drm2.ko with the radeonkms from the package, and still generate deprecation warnings.)
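
A minimal sketch of that rc.conf setup for drm-legacy-kmod, for anyone following along. The /boot/modules paths are an assumption (the usual install location for kernel modules from packages), so check where the package actually put drm2.ko and radeonkms.ko on your system.

```sh
# /etc/rc.conf
# Assumed paths: packages normally install kernel modules under /boot/modules.
# drm2.ko is listed explicitly, and first, so that the package's copy is used
# instead of the deprecated drm2.ko shipped with the kernel.
kld_list="/boot/modules/drm2.ko /boot/modules/radeonkms.ko"
```

Modules named in kld_list are loaded by the rc scripts during normal startup, well after the point where loader(8) would have loaded them.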

I've since switched to the non-legacy drm-kmod package, which also works fine, at least in 16 bpp modes.  Other depths had issues with both the legacy/kernel drivers (they defaulted to 8 bpp, which was grayscale for some reason; 24 bpp had an incorrect console palette) and the new ones (display corruption in some depths; sorry, I forget which, but at least the new drivers default to 16 bpp, which works).

[ Aside: the vt(4) man page only mentions kern.vt.fb.default_mode="<X>x<Y>" and nothing about setting depth, which is apparently done by appending it to the resolution, e.g. "1280x1024-16".  I found that in a mailing list; seems like an oversight to not include it in the man page... ]
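
For reference, a sketch of what that looks like in /boot/loader.conf. The first line is the form described above; the per-connector variant uses the tunable name printed by the driver in the dmesg output ("kern.vt.fb.modes.VGA-1"), and that it accepts the same "-<depth>" suffix is an assumption I have not verified.

```sh
# /boot/loader.conf
# Default mode for vt(4) framebuffers: <X>x<Y>, optionally followed by -<depth>.
kern.vt.fb.default_mode="1280x1024-16"

# Per-connector override (assumed to take the same value format).
kern.vt.fb.modes.VGA-1="1280x1024-16"
```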

So... while it _is_ a bug that it panics if you load the drivers on boot, loading the drivers later is an easy workaround (and maybe even the preferred method?), so it's not that big of a problem.  I was unaware of the kld_list rc.conf setting until the package told me to use it -- that's a cleaner method, for sure. 

[1] https://github.com/FreeBSDDesktop/drm-legacy/issues/7#issuecomment-658407985

Comment 4 Niclas Zeising freebsd_committer 2020-10-01 06:02:56 UTC

(In reply to Andrew Daugherity from comment #3)

Loading drm-kmod drivers from loader.conf has never been supported.

Comment 5 Warner Losh freebsd_committer 2020-10-01 15:07:42 UTC

To elaborate, loading the linuxkpi drivers from loader.conf causes the modules to load earlier in the boot sequence, before interrupts are enabled, threads are running, and everything is fully initialized. The code depends on these things working, so when they aren't, the system crashes.

One of three things can make this better: (1) make the code fail to load if loaded too early and print a warning (fail safe); (2) fix the code to work early in boot; or (3) make the code defer attaching the drivers until after interrupts are running.

1 & 3 are likely easy, but would require some fussing around to get right. 2 is likely quite hard and would take a fair amount of effort. Of the three, I'd recommend #3 for anybody looking at this bug. I'm not entirely sure that #3 would work, but reasonably sure.