Created attachment 229310 (pciconf)

I use stable/13 built from source, and I also use drm-kmod built from git, tracking the stable branch. My desktop is Wayland with sway. When using Firefox, flickering with black rectangles can happen several times a day. It may disappear on its own, but sometimes (not too often) the laptop freezes so that only a hard reboot helps. Nothing is ever saved in dmesg except the following:

drmn0: GPU HANG: ecode 6:1:bb0fffff, in sway [100936]
drmn0: Resetting chip for stopped heartbeat on rcs0
drmn0: sway[100936] context reset due to GPU hang

This is a relatively old laptop with a Sandy Bridge CPU. I suspect that certain changes in linuxkpi caused this behavior, but I cannot trace back when the issue started. It was very stable before. I tried the 5.4-stable, 5.5-stable, 5.6 and master branches of drm-kmod. Because there is so little information in dmesg and similar logs, I cannot provide technical details beyond this story. Any debugging steps or guide for tracking down the issue would be much appreciated.
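As a possible first debugging step (a sketch, not an official FreeBSD or drm-kmod tool), the i915 hang messages can be collected mechanically from dmesg output or /var/log/messages, so that the ecode and the offending process can be compared across incidents. The helper name `parse_gpu_hangs` and its regular expression are assumptions based only on the message formats quoted in this thread; the pattern also accepts the newer `drmn0: [drm] GPU HANG: ...` wording, where no process is named.

```python
import re
from typing import NamedTuple, Optional

# Matches messages such as
#   "drmn0: GPU HANG: ecode 6:1:bb0fffff, in sway [100936]"
#   "drmn0: [drm] GPU HANG: ecode 7:0:00000000"
# The pattern is inferred from the logs in this report (an assumption).
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>\d+:\d+:[0-9a-f]{8})"
    r"(?:, in (?P<proc>\S+) \[(?P<pid>\d+)\])?"
)


class GpuHang(NamedTuple):
    ecode: str
    process: Optional[str]
    pid: Optional[int]


def parse_gpu_hangs(lines):
    """Return one GpuHang per 'GPU HANG' message found in the given lines."""
    hangs = []
    for line in lines:
        m = HANG_RE.search(line)
        if m is None:
            continue  # not a hang message (e.g. the "Resetting ..." line)
        pid = m.group("pid")
        hangs.append(GpuHang(m.group("ecode"), m.group("proc"),
                             int(pid) if pid else None))
    return hangs
```

For example, `parse_gpu_hangs(open("/var/log/messages", errors="replace"))` would list every logged hang; the syslog prefix does not matter because the pattern is searched anywhere in the line.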
Created attachment 229311 (dmesg)
For testing purposes I updated to the latest HEAD of drm-kmod and ran sway for a day, which after some time resulted in a kernel crash. Here is the debugger information:

Reading symbols from /usr/obj/usr/src/amd64.amd64/sys/CHAKLUNCHIK/kernel.full...

Unread portion of the kernel message buffer:
drmn0: GPU HANG: ecode 6:1:00000000, in sway [100963]

Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address = 0x61
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80999d27
stack pointer = 0x28:0xfffffe001b159b80
frame pointer = 0x28:0xfffffe001b159bc0
code segment = base rx0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 0 (linuxkpi_short_wq_3)
trap number = 12
WARNING !drm_modeset_is_locked(&crtc->mutex) failed at /var/src/drm-kmod/drivers/gpu/drm/drm_atomic_helper.c:616
WARNING !drm_modeset_is_locked(&crtc->mutex) failed at /var/src/drm-kmod/drivers/gpu/drm/drm_atomic_helper.c:616
WARNING !drm_modeset_is_locked(&dev->mode_config.connection_mutex) failed at /var/src/drm-kmod/drivers/gpu/drm/drm_atomic_helper.c:661
<4>WARN_ON(!mutex_is_locked(&fbc->lock))WARN_ON(!mutex_is_locked(&fbc->lock))WARN_ON(!mutex_is_locked(&fbc->lock))WARN_ON(!mutex_is_locked(&fbc->lock))
panic: page fault
cpuid = 2
time = 1637288382
KDB: stack backtrace:
#0 0xffffffff80773ebb at kdb_backtrace+0x6b
#1 0xffffffff80726fb7 at vpanic+0x187
#2 0xffffffff80726e23 at panic+0x43
#3 0xffffffff80a43fd7 at trap_fatal+0x387
#4 0xffffffff80a4402f at trap_pfault+0x4f
#5 0xffffffff80a437a0 at trap+0x4a0
#6 0xffffffff80a1c878 at calltrap+0x8
#7 0xffffffff80999e1d at kmem_free+0x2d
#8 0xffffffff81d6633c at __i915_gpu_coredump_free+0xfc
#9 0xffffffff81d3d4e9 at intel_gt_handle_error+0xa9
#10 0xffffffff81d2bef0 at heartbeat+0x110
#11 0xffffffff808f19fd at linux_work_fn+0xed
#12 0xffffffff807889d7 at taskqueue_run_locked+0x197
#13 0xffffffff80789d03 at taskqueue_thread_loop+0xc3
#14 0xffffffff806e48ae at fork_exit+0x8e
#15 0xffffffff80a1d8ee at fork_trampoline+0xe
WARNING !drm_modeset_is_locked(&crtc->mutex) failed at /var/src/drm-kmod/drivers/gpu/drm/drm_atomic_helper.c:616
WARNING !drm_modeset_is_locked(&crtc->mutex) failed at /var/src/drm-kmod/drivers/gpu/drm/drm_atomic_helper.c:616
WARNING !drm_modeset_is_locked(&dev->mode_config.connection_mutex) failed at /var/src/drm-kmod/drivers/gpu/drm/drm_atomic_helper.c:661
<4>WARN_ON(!mutex_is_locked(&fbc->lock))WARN_ON(!mutex_is_locked(&fbc->lock))WARN_ON(!mutex_is_locked(&fbc->lock))WARN_ON(!mutex_is_locked(&fbc->lock))
Uptime: 15h52m16s
WARNING !drm_modeset_is_locked(&crtc->mutex) failed at /var/src/drm-kmod/drivers/gpu/drm/drm_atomic_helper.c:616
WARNING !drm_modeset_is_locked(&crtc->mutex) failed at /var/src/drm-kmod/drivers/gpu/drm/drm_atomic_helper.c:616
WARNING !drm_modeset_is_locked(&dev->mode_config.connection_mutex) failed at /var/src/drm-kmod/drivers/gpu/drm/drm_atomic_helper.c:661
<4>WARN_ON(!mutex_is_locked(&fbc->lock))WARN_ON(!mutex_is_locked(&fbc->lock))WARN_ON(!mutex_is_locked(&fbc->lock))WARN_ON(!mutex_is_locked(&fbc->lock))
Dumping 1930 out of 16242 MB:..1% (CTRL-C to abort) ..11%..21%..31%..41%..51%..61%..71%..81%..91%

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
55 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
A fresh stable/13 build with the latest master of drm-kmod crashed while I was attempting to watch a video:

kgdb -n 0
GNU gdb (GDB) 11.2 [GDB v11.2 for FreeBSD]
This GDB was configured as "x86_64-portbld-freebsd13.0".
Reading symbols from /usr/obj/usr/src/amd64.amd64/sys/CHAKLUNCHIK/kernel.full...

Unread portion of the kernel message buffer:
drmn0: GPU HANG: ecode 6:1:ccddeeff, in Renderer [101008]
drmn0: Resetting chip for stopped heartbeat on rcs0
drmn0: Renderer[101008] context reset due to GPU hang
drmn0: GPU HANG: ecode 6:1:7f7f7f7f, in Renderer [101008]

Fatal trap 12: page fault while in kernel mode
cpuid = 4; apic id = 04
fault virtual address = 0x61
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80986ac7
stack pointer = 0x28:0xfffffe001b163b70
frame pointer = 0x28:0xfffffe001b163bb0
code segment = base rx0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 0 (linuxkpi_short_wq_1)
trap number = 12
panic: page fault
cpuid = 4
time = 1643805299
KDB: stack backtrace:
#0 0xffffffff8076258b at kdb_backtrace+0x6b
#1 0xffffffff8071646f at vpanic+0x17f
#2 0xffffffff807162e3 at panic+0x43
#3 0xffffffff80a2e455 at trap_fatal+0x385
#4 0xffffffff80a2e4af at trap_pfault+0x4f
#5 0xffffffff80a07528 at calltrap+0x8
#6 0xffffffff80986bbd at kmem_free+0x2d
#7 0xffffffff81d731ac at __i915_gpu_coredump_free+0x12c
#8 0xffffffff81d473f9 at intel_gt_handle_error+0xa9
#9 0xffffffff81d34e50 at heartbeat+0x110
#10 0xffffffff808df36d at linux_work_fn+0xed
#11 0xffffffff807770e7 at taskqueue_run_locked+0x187
#12 0xffffffff80778402 at taskqueue_thread_loop+0xc2
#13 0xffffffff806d4082 at fork_exit+0x82
#14 0xffffffff80a0859e at fork_trampoline+0xe
Uptime: 3h32m56s
Dumping 1024 out of 16242 MB:..2%..11%..21%..32%..41%..52%..61%..71%..82%..91%

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
55 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) trace
Tracepoint 1 at 0xffffffff8071622e: file /usr/src/sys/amd64/include/pcpu_aux.h, line 55.
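Both panics above run through the same frames from `kmem_free` up to `__i915_gpu_coredump_free` and `heartbeat`, even though the addresses differ between builds. When collecting further crash reports, a small sketch like the following can extract the function names from each `KDB: stack backtrace:` block and list the frames two crashes share. The helper names are hypothetical, and the regular expression is inferred from the KDB output format in this thread; it tolerates a frame split across a line break.

```python
import re

# Matches KDB frames such as "#7 0xffffffff81d6633c at __i915_gpu_coredump_free+0xfc".
# Format inferred from the backtraces in this report (an assumption).
FRAME_RE = re.compile(r"#\d+\s+0x[0-9a-f]+\s+at\s+(\w+)\+0x[0-9a-f]+")


def backtrace_functions(text):
    """Return the function names of all frames, in order of appearance."""
    return FRAME_RE.findall(text)


def common_frames(trace_a, trace_b):
    """Functions present in both backtraces, in the order they appear in trace_a."""
    b_funcs = set(backtrace_functions(trace_b))
    return [fn for fn in backtrace_functions(trace_a) if fn in b_funcs]
```

Comparing only function names (not addresses or offsets) keeps the comparison meaningful across different kernel and drm-kmod builds.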
It seems there is more information when drm is compiled with CONFIG_I915_DEBUG_GEM. You might want to try it by modifying kconfig.mk (but I have no idea whether all the code compiles on FreeBSD with it).
Would it then be DRM_I915_DEBUG_GEM in KCONFIG= (in kconfig.mk)?
Yeah, something like:

diff --git a/kconfig.mk b/kconfig.mk
index 6ade219bfb13..d12856f86937 100644
--- a/kconfig.mk
+++ b/kconfig.mk
@@ -15,7 +15,8 @@ KCONFIG=	DRM_AMDGPU_CIK \
 	DRM_I915_TIMESLICE_DURATION=1 \
 	DRM_I915_MAX_REQUEST_BUSYWAIT=8000 \
 	DRM_MIPI_DSI \
-	DRM_PANEL_ORIENTATION_QUIRKS
+	DRM_PANEL_ORIENTATION_QUIRKS \
+	DRM_I915_DEBUG_GEM
 
 .if empty(NO_FBDEV)
 KCONFIG+=	DRM_FBDEV_EMULATION \
Thank you. Unrelated: can I build only the i915-specific code, i.e. without amdgpu and radeon? I will try this if it compiles. Note that the crash happens sporadically, so I don't know when to expect the next one; hopefully much sooner.
export KMODS="drm linuxkpi i915"
make
With the debug change the code compiles; however, on reboot I get:

feeding entropy: loading module

which hangs forever, so I had to revert the change and recompile without debug to be able to boot the notebook.
I pulled the very latest changes in drm-kmod, compiled, and ran for a while until the system rebooted (no crash core was saved), but the panic is recorded in /var/log/messages:

Feb 3 11:44:25 chaklunchik kernel: drmn0: GPU HANG: ecode 6:1:bb13ffff, in Renderer [101226]
Feb 3 11:44:25 chaklunchik kernel: drmn0: Resetting chip for stopped heartbeat on rcs0
Feb 3 11:44:25 chaklunchik kernel: drmn0: Renderer[101226] context reset due to GPU hang
Feb 3 11:44:25 chaklunchik kernel: drmn0: GPU HANG: ecode 6:1:00000000, in Renderer [101226]
Feb 3 11:44:25 chaklunchik kernel:
Feb 3 11:44:25 chaklunchik syslogd: last message repeated 1 times
Feb 3 11:44:25 chaklunchik kernel: Fatal trap 12: page fault while in kernel mode
Feb 3 11:44:25 chaklunchik kernel: cpuid = 5; apic id = 05
Feb 3 11:44:25 chaklunchik kernel: fault virtual address = 0x61
Feb 3 11:44:25 chaklunchik kernel: fault code = supervisor read data, page not present
Feb 3 11:44:25 chaklunchik kernel: instruction pointer = 0x20:0xffffffff80986ac7
Feb 3 11:44:25 chaklunchik kernel: stack pointer = 0x28:0xfffffe001b190b70
Feb 3 11:44:25 chaklunchik kernel: frame pointer = 0x28:0xfffffe001b190bb0
Feb 3 11:44:25 chaklunchik kernel: code segment = base rx0, limit 0xfffff, type 0x1b
Feb 3 11:44:25 chaklunchik kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
Feb 3 11:44:25 chaklunchik kernel: processor eflags = interrupt enabled, resume, IOPL = 0
Feb 3 11:44:25 chaklunchik kernel: current process = 0 (linuxkpi_short_wq_8)
Feb 3 11:44:25 chaklunchik kernel: trap number = 12
Feb 3 11:44:25 chaklunchik kernel: panic: page fault
Feb 3 11:44:25 chaklunchik kernel: cpuid = 5
Feb 3 11:44:25 chaklunchik kernel: time = 1643881438
Feb 3 11:44:25 chaklunchik kernel: KDB: stack backtrace:
Feb 3 11:44:25 chaklunchik kernel: #0 0xffffffff8076258b at kdb_backtrace+0x6b
Feb 3 11:44:25 chaklunchik kernel: #1 0xffffffff8071646f at vpanic+0x17f
Feb 3 11:44:25 chaklunchik kernel: #2 0xffffffff807162e3 at panic+0x43
Feb 3 11:44:25 chaklunchik kernel: #3 0xffffffff80a2e455 at trap_fatal+0x385
Feb 3 11:44:25 chaklunchik kernel: #4 0xffffffff80a2e4af at trap_pfault+0x4f
Feb 3 11:44:25 chaklunchik kernel: #5 0xffffffff80a07528 at calltrap+0x8
Feb 3 11:44:25 chaklunchik kernel: #6 0xffffffff80986bbd at kmem_free+0x2d
Feb 3 11:44:25 chaklunchik kernel: #7 0xffffffff81d731ac at __i915_gpu_coredump_free+0x12c
Feb 3 11:44:25 chaklunchik kernel: #8 0xffffffff81d473f9 at intel_gt_handle_error+0xa9
Feb 3 11:44:25 chaklunchik kernel: #9 0xffffffff81d34e50 at heartbeat+0x110
Feb 3 11:44:25 chaklunchik kernel: #10 0xffffffff808df36d at linux_work_fn+0xed
Feb 3 11:44:25 chaklunchik kernel: #11 0xffffffff807770e7 at taskqueue_run_locked+0x187
Feb 3 11:44:25 chaklunchik kernel: #12 0xffffffff80778402 at taskqueue_thread_loop+0xc2
Feb 3 11:44:25 chaklunchik kernel: #13 0xffffffff806d4082 at fork_exit+0x82
Feb 3 11:44:25 chaklunchik kernel: #14 0xffffffff80a0859e at fork_trampoline+0xe

I am not sure whether any of these crashes are helpful. When I try the supported drm-kmod versions (i.e. from ports), I have the same issues. Maybe I could try OpenBSD, which has drm synced with Linux kernel 5.15.x, to see whether anything improves.
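When no crash dump is saved, as in this case, the panic text survives only in /var/log/messages, with a syslog prefix on every line. A minimal sketch for stripping that prefix from kernel lines, so the panic reads like console output again and can be pasted or compared more easily. The helper name is hypothetical, and the prefix pattern ("Mon D HH:MM:SS host kernel: ") is inferred from the excerpt above; lines from other daemons, such as syslogd's "last message repeated" notes, are skipped.

```python
import re

# Prefix such as "Feb  3 11:44:25 chaklunchik kernel: " (format inferred from this report).
SYSLOG_KERNEL_RE = re.compile(r"^[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} \S+ kernel: ?")


def kernel_lines(lines):
    """Yield the message part of syslog lines written by the kernel; skip other daemons."""
    for line in lines:
        m = SYSLOG_KERNEL_RE.match(line)
        if m:
            yield line[m.end():].rstrip("\n")
```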
After looking for similar issues reported upstream, there seems to be a generic pattern in GPU issues like the one I am observing. For the sake of information, I will leave a reference here; it lists many other reports. Currently I do not know whether the problem has ever been fully resolved, even in the very latest Linux kernel, or whether the hangs are triggered by something outside the kernel, i.e. Mesa. https://gitlab.freedesktop.org/drm/intel/-/issues?label_name[]=GPU%20hang
This is most likely connected to #261707 and the dependent #261773. I have been testing the reverting change for the last ~5 days and have had no issues so far. Maybe this bug can be closed for now.
I have also experienced a GPU hang with the i915kms driver ("i915 1.6.0 20200313" from "dmesg"; I had installed the "drm-devel-kmod" package) in X11 on window manager restart. This has been on a Framework laptop with an Intel i5-1135G7 CPU, running -CURRENT from the 20220303 snapshot and later "main-n253773-57014f21e75" of 20220313. From the most recent hang:

drmn0: GPU HANG: ecode 12:1:85dffffb, in MainThread [101934]
drmn0: Resetting rcs0 for stopped heartbeat on rsc0

Yesterday (or the day before), after suspending the 'puter, it could not resume and I had to do a hard shutdown. I can't say for sure whether that was connected to the driver.
(In reply to Oleh Vinichenko from comment #12)
> this is most likely connected to #261707 and dependent #261773, …

Thank you … linked (Bugzilla style): bug 261707, bug 261773
I have the same issue.

FreeBSD freebsd.my.domain 13.1-RELEASE FreeBSD 13.1-RELEASE releng/13.1-n250148-fc952ac2212 GENERIC amd64

drm-510-kmod-5.10.113_1 DRM drivers modules
gpu-firmware-kmod-g20210330 Firmware modules for the linuxkpi-based KMS components

The system goes to sleep, but on wake-up it freezes. Screenshots of the kernel panic: https://ibb.co/zPbpC7T https://ibb.co/bXDGW7V
(In reply to Oleh Vinichenko from comment #10)
I think I am experiencing the same issue after updating to drm-510-kmod on -STABLE (bug 266315).
A similar issue on my notebook with i915, I think:

Reading symbols from /boot/kernel/kernel...
Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug...

Unread portion of the kernel message buffer:
drmn0: [drm] Resetting rcs0 for invalid CSB event
drmn0: [drm] GPU HANG: ecode 11:1:f1973ffc, in Xwayland [100909]

Fatal trap 12: page fault while in kernel mode
cpuid = 5; apic id = 05
fault virtual address = 0x61
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80f52807
stack pointer = 0x28:0xfffffe00e288acc0
frame pointer = 0x28:0xfffffe00e288ad00
code segment = base rx0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 0 (linuxkpi_short_wq_0)
trap number = 12
panic: page fault
cpuid = 5
time = 1663856399
KDB: stack backtrace:
#0 0xffffffff80c69465 at kdb_backtrace+0x65
#1 0xffffffff80c1bb1f at vpanic+0x17f
#2 0xffffffff80c1b993 at panic+0x43
#3 0xffffffff810afdf5 at trap_fatal+0x385
#4 0xffffffff810afe4f at trap_pfault+0x4f
#5 0xffffffff81087528 at calltrap+0x8
#6 0xffffffff80f528fd at kmem_free+0x2d
#7 0xffffffff85b7259d at __i915_gpu_coredump_free+0x12d
#8 0xffffffff85b3d5a9 at execlists_capture_work+0xa9
#9 0xffffffff80e6e533 at linux_work_fn+0xe3
#10 0xffffffff80c7da41 at taskqueue_run_locked+0x181
#11 0xffffffff80c7ed52 at taskqueue_thread_loop+0xc2
#12 0xffffffff80bd8a5e at fork_exit+0x7e
#13 0xffffffff8108859e at fork_trampoline+0xe
Uptime: 9h15m46s
Dumping 2321 out of 32484 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
55 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,

----
FreeBSD 13.1-RELEASE releng/13.1-n250148-fc952ac2212 GENERIC

drm-510-kmod-5.10.113_6 DRM drivers modules
drm-kmod-20220907_1 Metaport of DRM modules for the linuxkpi-based KMS components
...

I can provide additional details if necessary.
(In reply to stephan from comment #17)
Does it happen with linuxkpi as a module or compiled into the kernel?
I have a possibly related issue: when switching between VTs with Alt-Fx (not using X; X is idle on F9), after some switch the screen shows a couple of "WARNING !drm_modeset_is_locked ..." messages as quoted above (but with a source path into /usr/ports), and no further interaction is possible. A hard poweroff is necessary, so there is no crash dump.

I am using a recent 13.1-STABLE 38fe63afdc9d (OSVERSION = 1301507) and ports from quarterly (drm-510-kmod-5.10.113_7) on a Livebook A3511 (CPU i3-1115G4).

kldloaded modules:
vmm.ko i915kms.ko drm.ko linuxkpi_gplv2.ko dmabuf.ko if_iwlwifi.ko

Settings:
compat.linuxkpi.i915_enable_dc=2
compat.linuxkpi.i915_enable_fbc=1
With bug 261773 fixed in February 2022, I wonder whether what remains here is a mixture of reports …

(In reply to Peter Much from comment #19)
> … no further interaction is possible,

If things seem stuck at ttyv0, does e.g. Alt-F3 followed by Alt-F2 have any effect?

> hard poweroff necessary, …

Try this in your /etc/sysctl.conf:

hw.acpi.power_button_state="S5"

Then restart the computer, and the next time things seem stuck, try a _short_ press on the power button (not so hard).
(In reply to Graham Perrin from comment #20)
This has not happened recently, probably because of either
* moving along with STABLE, or
* having a (crude but somehow working) patch and workaround for the primary source of trouble, which is iwlwifi (as discussed in bug 263613 and a bunch of others; that driver appears to have some difficulty separating duties between linuxkpi and native code, and apparently produces strangely invalid objects in the kernel).

I now strictly unload and reload that module whenever in doubt, and things are better.
I have never used WiFi: my base system is built with WITHOUT_WIRELESS_SUPPORT and no WiFi modules have ever been loaded, so to my understanding my hang has nothing to do with WiFi.
This bug may be affecting me on the GENERIC RELEASE kernel, 14.0-p6. Here's my dmesg:

Apr 1 01:18:07 lappy kernel: ath0: ath_rate_tx_complete: ts_rate=27 ts_finaltsi=0, final_rix=0
Apr 1 01:18:07 lappy kernel: ath0: bad series0 hwrate 0x1b, tries 2 ts_status 0x0
Apr 1 01:42:30 lappy kernel: ath0: ath_intr: TSFOOR
Apr 1 01:44:19 lappy kernel: drmn0: [drm] GPU HANG: ecode 7:0:00000000
Apr 1 01:44:19 lappy kernel: drmn0: [drm] Resetting rcs0 for no heartbeat on rcs0
Apr 1 01:44:19 lappy kernel: drmn0: [drm] GPU HANG: ecode 7:0:00000000
Apr 1 01:44:19 lappy kernel: drmn0: [drm] Resetting rcs0 for no heartbeat on rcs0
Apr 1 01:44:19 lappy kernel: drmn0: [drm] GPU HANG: ecode 7:0:00000000
Apr 1 01:44:19 lappy kernel: drmn0: [drm] Resetting rcs0 for no heartbeat on rcs0
Apr 1 01:44:25 lappy acpi[31352]: suspend at 20240401 01:44:25
Apr 1 01:44:49 lappy kernel: drmn0: [drm] GPU HANG: ecode 7:0:00000000
Apr 1 01:44:49 lappy kernel: drmn0: [drm] Resetting rcs0 for no heartbeat on rcs0
Apr 1 01:44:49 lappy kernel: drmn0: [drm] GPU HANG: ecode 7:0:00000000
Apr 1 01:44:49 lappy kernel: drmn0: [drm] Resetting rcs0 for no heartbeat on rcs0
Apr 1 01:44:49 lappy kernel: drmn0: [drm] GPU HANG: ecode 7:0:00000000
Apr 1 01:44:49 lappy kernel: drmn0: [drm] Resetting rcs0 for no heartbeat on rcs0
Apr 1 01:44:49 lappy kernel: drmn0: [drm] GPU HANG: ecode 7:0:00000000
Apr 1 01:44:49 lappy kernel: drmn0: [drm] Resetting rcs0 for no heartbeat on rcs0
Apr 1 01:45:00 lappy kernel: drmn0: [drm] GPU HANG: ecode 7:0:00000000
Apr 1 01:45:00 lappy kernel: drmn0: [drm] Resetting rcs0 for no heartbeat on rcs0
Apr 1 01:45:00 lappy kernel: drmn0: [drm] GPU HANG: ecode 7:0:00000000
Apr 1 01:45:00 lappy kernel: drmn0: [drm] Resetting rcs0 for no heartbeat on rcs0
Apr 1 01:46:31 lappy syslogd: kernel boot file is /boot/kernel/kernel
Apr 1 01:46:31 lappy kernel: ---<<BOOT>>---
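In this log, three GPU HANG messages are logged before the `acpi[...]: suspend at` entry and six after it, so the hangs are not limited to the resume path. For longer logs, a sketch that counts that split might look like this. The helper name is hypothetical, and it assumes only the literal substrings "GPU HANG" and "suspend at" seen in the excerpt above.

```python
def hangs_around_suspend(lines):
    """Count 'GPU HANG' lines before and after the first 'suspend at' line.

    Returns (before, after). If no suspend marker is found, every hang
    is counted as 'before'.
    """
    before = after = 0
    suspended = False
    for line in lines:
        if not suspended and "suspend at" in line:
            suspended = True  # first suspend marker splits the log
        elif "GPU HANG" in line:
            if suspended:
                after += 1
            else:
                before += 1
    return before, after
```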