Bug 274515 - instapanic with i915 and Linux client under Wayland
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: Any Any
Importance: --- Affects Some People
Assignee: Mark Johnston
URL:
Keywords: crash
Depends on:
Blocks: 14.0r
Reported: 2023-10-16 14:40 UTC by Edward Tomasz Napierala
Modified: 2024-01-09 18:01 UTC (History)
Description Edward Tomasz Napierala freebsd_committer freebsd_triage 2023-10-16 14:40:25 UTC
FreeBSD/amd64 panics when I run the glxgears(1) binary from amd64 Ubuntu Focal under native Wayland with i915.  It's fairly reproducible (9 out of 10 tries).  This doesn't happen when I replace Wayland with traditional Xorg, nor does it happen with glxgears from ports.  The backtrace looks like this:

drmn0: [drm] Resetting rcs0 for CS error
drmn0: [drm] Xwayland[101131] context reset due to GPU hang
drmn0: [drm] GPU HANG: ecode 9:1:bcff835b, in Xwayland [101131]
drmn0: [drm] Resetting rcs0 for CS error
drmn0: [drm] Xwayland[101131] context reset due to GPU hang
drmn0: [drm] GPU HANG: ecode 9:1:bd5699c3, in Xwayland [101131]
Kernel page fault with the following non-sleepable locks held:
exclusive rw kernel vm object (kernel vm object) r = 0 (0xffffffff81ad1c48) locked @ /usr/home/trasz/git/freebsd-src/sys/vm/vm_kern.c:605
stack backtrace:
#0 0xffffffff80bc29c5 at witness_debugger+0x65
#1 0xffffffff80bc3af9 at witness_warn+0x3e9
#2 0xffffffff8104ed48 at trap_pfault+0x88
#3 0xffffffff81021358 at calltrap+0x8
#4 0xffffffff80eeb9dd at kmem_free+0x2d
#5 0xffffffff8376b05c at __i915_gpu_coredump_free+0xfc
#6 0xffffffff83715d9b at execlists_capture_work+0xab
#7 0xffffffff80df1e03 at linux_work_fn+0xe3
#8 0xffffffff80bb497b at taskqueue_run_locked+0xab
#9 0xffffffff80bb5a33 at taskqueue_thread_loop+0xd3
#10 0xffffffff80b04f02 at fork_exit+0x82
#11 0xffffffff810223be at fork_trampoline+0xe


Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 03
fault virtual address	= 0x61
fault code		= supervisor read data, page not present
instruction pointer	= 0x20:0xffffffff80eeb858
stack pointer	        = 0x28:0xfffffe00c612fcf0
frame pointer	        = 0x28:0xfffffe00c612fd20
code segment		= base rx0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 0 (linuxkpi_short_wq_4)
rdi: ffffffff81ad1ca0 rsi: 000fffffa00be3d2 rdx: 00000000fffffa00
rcx: ffffffffffffffd8  r8: 00000000ffffffff  r9: 0000000000000000
rax: 0000000000000000 rbx: 0000000000001000 rbp: fffffe00c612fd20
r10: fffff8042e576c00 r11: 0000000000010000 r12: fffff80264e25a00
r13: fffffa00be3d2000 r14: 0000000000000000 r15: fffff80009501c00
trap number		= 12
panic: page fault
cpuid = 3
time = 1697302088
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00c612f9c0
vpanic() at vpanic+0x132/frame 0xfffffe00c612faf0
panic() at panic+0x43/frame 0xfffffe00c612fb50
trap_fatal() at trap_fatal+0x40c/frame 0xfffffe00c612fbb0
trap_pfault() at trap_pfault+0xae/frame 0xfffffe00c612fc20
calltrap() at calltrap+0x8/frame 0xfffffe00c612fc20
--- trap 0xc, rip = 0xffffffff80eeb858, rsp = 0xfffffe00c612fcf0, rbp = 0xfffffe00c612fd20 ---
_kmem_unback() at _kmem_unback+0x78/frame 0xfffffe00c612fd20
kmem_free() at kmem_free+0x2d/frame 0xfffffe00c612fd40
__i915_gpu_coredump_free() at __i915_gpu_coredump_free+0xfc/frame 0xfffffe00c612fd80
execlists_capture_work() at execlists_capture_work+0xab/frame 0xfffffe00c612fdf0
linux_work_fn() at linux_work_fn+0xe3/frame 0xfffffe00c612fe40
taskqueue_run_locked() at taskqueue_run_locked+0xab/frame 0xfffffe00c612fec0
taskqueue_thread_loop() at taskqueue_thread_loop+0xd3/frame 0xfffffe00c612fef0
fork_exit() at fork_exit+0x82/frame 0xfffffe00c612ff30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00c612ff30
--- trap 0xbe8d4339, rip = 0x8f1e1dc67eb85dfb, rsp = 0x64f5f62d9553b610, rbp = 0x1ba64ffe8da156e ---
Comment 1 Mark Johnston freebsd_committer freebsd_triage 2023-10-16 14:41:57 UTC
Could you please see if this patch helps?  https://reviews.freebsd.org/D40028

It won't fix the underlying problem that triggers the GPU reset, but the panic should at least be gone.
Comment 2 Edward Tomasz Napierala freebsd_committer freebsd_triage 2023-10-17 14:17:34 UTC
It does fix the panic, thank you :)

Any chance to get this in before 14.0?
Comment 3 commit-hook freebsd_committer freebsd_triage 2023-10-17 15:56:26 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=6223d0b67af923f53d962a9bf594dc37004dffe8

commit 6223d0b67af923f53d962a9bf594dc37004dffe8
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2023-10-17 14:26:18 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2023-10-17 15:19:06 +0000

    linuxkpi: Handle direct-mapped addresses in linux_free_kmem()

    See the analysis in PR 271333.  It is possible for driver code to
    allocate a page, store its address as returned by page_address(), then
    call free_page() on that address.  On most systems that'll result in the
    LinuxKPI calling kmem_free() with a direct-mapped address, which is not
    legal.

    Fix the problem by making linux_free_kmem() check the address to see
    whether it's direct-mapped or not, and handling it appropriately.

    PR:             271333, 274515
    Reviewed by:    hselasky, bz
    Tested by:      trasz
    MFC after:      1 week
    Sponsored by:   The FreeBSD Foundation
    Differential Revision:  https://reviews.freebsd.org/D40028

 sys/compat/linuxkpi/common/src/linux_page.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)
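
The check described in the commit message above can be illustrated with a small userland sketch. This is not the committed code from linux_page.c: every name prefixed with sketch_ is hypothetical, and the direct-map bounds are assumed illustrative values (the real bounds are defined per platform by the kernel). The sketch only shows the dispatch idea: an address inside the direct map is translated back to a physical address so the backing page can be freed directly, and any other address goes down the kmem_free() path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed direct-map window, for illustration only. */
#define SKETCH_DMAP_MIN UINT64_C(0xfffff80000000000)
#define SKETCH_DMAP_MAX UINT64_C(0xfffffc0000000000)

/* Which release path the address should take. */
enum free_path { FREE_VIA_PAGE, FREE_VIA_KMEM };

struct free_decision {
	enum free_path path;
	uint64_t phys;		/* meaningful only for FREE_VIA_PAGE */
};

/* True if addr falls inside the assumed direct-map window. */
static bool
sketch_is_direct_mapped(uint64_t addr)
{
	return (addr >= SKETCH_DMAP_MIN && addr < SKETCH_DMAP_MAX);
}

/* A direct-mapped address is the physical address plus a fixed offset. */
static uint64_t
sketch_dmap_to_phys(uint64_t addr)
{
	assert(sketch_is_direct_mapped(addr));
	return (addr - SKETCH_DMAP_MIN);
}

/*
 * Hypothetical stand-in for the fixed free routine: instead of freeing
 * anything, report which path would be taken for the given address.
 */
static struct free_decision
sketch_free_kmem(uint64_t addr)
{
	struct free_decision d = { FREE_VIA_KMEM, 0 };

	if (sketch_is_direct_mapped(addr)) {
		/* Page came from the page allocator: free it by page. */
		d.path = FREE_VIA_PAGE;
		d.phys = sketch_dmap_to_phys(addr);
	}
	return (d);
}
```

Under these assumed bounds, a sample input such as 0xfffffa00be3d2000 (the same shape as the r13 value in the register dump above) would take the page path with physical address 0x200be3d2000, while an address such as 0xfffffe00c612fcf0 would still be handed to the kmem path.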
Comment 4 commit-hook freebsd_committer freebsd_triage 2023-10-24 13:39:24 UTC
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=4862eb8604d503b52e7c3aa7ff32155b75a1ff93

commit 4862eb8604d503b52e7c3aa7ff32155b75a1ff93
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2023-10-17 14:26:18 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2023-10-24 13:20:01 +0000

    linuxkpi: Handle direct-mapped addresses in linux_free_kmem()

    See the analysis in PR 271333.  It is possible for driver code to
    allocate a page, store its address as returned by page_address(), then
    call free_page() on that address.  On most systems that'll result in the
    LinuxKPI calling kmem_free() with a direct-mapped address, which is not
    legal.

    Fix the problem by making linux_free_kmem() check the address to see
    whether it's direct-mapped or not, and handling it appropriately.

    PR:             271333, 274515
    Reviewed by:    hselasky, bz
    Tested by:      trasz
    MFC after:      1 week
    Sponsored by:   The FreeBSD Foundation
    Differential Revision:  https://reviews.freebsd.org/D40028

    (cherry picked from commit 6223d0b67af923f53d962a9bf594dc37004dffe8)

 sys/compat/linuxkpi/common/src/linux_page.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)
Comment 5 commit-hook freebsd_committer freebsd_triage 2023-10-25 16:57:07 UTC
A commit in branch releng/14.0 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=87dbb943df73022dd98487c123aeb125da11c4af

commit 87dbb943df73022dd98487c123aeb125da11c4af
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2023-10-17 14:26:18 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2023-10-25 16:53:01 +0000

    linuxkpi: Handle direct-mapped addresses in linux_free_kmem()

    See the analysis in PR 271333.  It is possible for driver code to
    allocate a page, store its address as returned by page_address(), then
    call free_page() on that address.  On most systems that'll result in the
    LinuxKPI calling kmem_free() with a direct-mapped address, which is not
    legal.

    Fix the problem by making linux_free_kmem() check the address to see
    whether it's direct-mapped or not, and handling it appropriately.

    Approved by:    re (gjb)
    PR:             271333, 274515
    Reviewed by:    hselasky, bz
    Tested by:      trasz
    MFC after:      1 week
    Sponsored by:   The FreeBSD Foundation
    Differential Revision:  https://reviews.freebsd.org/D40028

    (cherry picked from commit 6223d0b67af923f53d962a9bf594dc37004dffe8)
    (cherry picked from commit 4862eb8604d503b52e7c3aa7ff32155b75a1ff93)

 sys/compat/linuxkpi/common/src/linux_page.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)
Comment 6 commit-hook freebsd_committer freebsd_triage 2024-01-09 18:01:03 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=6cba7aec21bcd957478a987f9391fd33a4babdac

commit 6cba7aec21bcd957478a987f9391fd33a4babdac
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2023-10-17 14:26:18 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2024-01-09 17:59:49 +0000

    linuxkpi: Handle direct-mapped addresses in linux_free_kmem()

    See the analysis in PR 271333.  It is possible for driver code to
    allocate a page, store its address as returned by page_address(), then
    call free_page() on that address.  On most systems that'll result in the
    LinuxKPI calling kmem_free() with a direct-mapped address, which is not
    legal.

    Fix the problem by making linux_free_kmem() check the address to see
    whether it's direct-mapped or not, and handling it appropriately.

    PR:             271333, 274515
    Reviewed by:    hselasky, bz
    Tested by:      trasz
    MFC after:      1 week
    Sponsored by:   The FreeBSD Foundation
    Differential Revision:  https://reviews.freebsd.org/D40028

    (cherry picked from commit 6223d0b67af923f53d962a9bf594dc37004dffe8)

 sys/compat/linuxkpi/common/src/linux_page.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)