Only seen once. Was visiting duckduckgo with the tor-browser from pkg at the time.

FreeBSD desktop.example.com 13.2-RELEASE FreeBSD 13.2-RELEASE releng/13.2-n254617-525ecfdad597 GENERIC amd64

May 8 12:38:43 desktop syslogd: kernel boot file is /boot/kernel/kernel
May 8 12:38:43 desktop kernel: drmn0: [drm] GPU HANG: ecode 6:1:39393938, in Isolated Web Conten [109930]
May 8 12:38:43 desktop kernel: drmn0: [drm] Resetting chip for stopped heartbeat on rcs0
May 8 12:38:43 desktop kernel: drmn0: [drm] Isolated Web Conten[109930] context reset due to GPU hang
May 8 12:38:43 desktop kernel: drmn0: [drm] GPU HANG: ecode 6:1:39393938, in Isolated Web Conten [109930]
May 8 12:38:43 desktop kernel:
May 8 12:38:43 desktop syslogd: last message repeated 1 times
May 8 12:38:43 desktop kernel: Fatal trap 12: page fault while in kernel mode
May 8 12:38:43 desktop kernel: cpuid = 6; apic id = 06
May 8 12:38:43 desktop kernel: fault virtual address = 0x61
May 8 12:38:43 desktop kernel: fault code = supervisor read data, page not present
May 8 12:38:43 desktop kernel: instruction pointer = 0x20:0xffffffff80f55587
May 8 12:38:43 desktop kernel: stack pointer = 0x28:0xfffffe001b7c9b60
May 8 12:38:43 desktop kernel: frame pointer = 0x28:0xfffffe001b7c9ba0
May 8 12:38:43 desktop kernel: code segment = base rx0, limit 0xfffff, type 0x1b
May 8 12:38:43 desktop kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
May 8 12:38:43 desktop kernel: processor eflags = interrupt enabled, resume, IOPL = 0
May 8 12:38:43 desktop kernel: current process = 0 (linuxkpi_short_wq_0)
May 8 12:38:43 desktop kernel: trap number = 12
May 8 12:38:43 desktop kernel: panic: page fault
May 8 12:38:43 desktop kernel: cpuid = 6
May 8 12:38:43 desktop kernel: time = 1683567443
May 8 12:38:43 desktop kernel: KDB: stack backtrace:
May 8 12:38:43 desktop kernel: #0 0xffffffff80c53dc5 at kdb_backtrace+0x65
May 8 12:38:43 desktop kernel: #1 0xffffffff80c06741 at vpanic+0x151
May 8 12:38:43 desktop kernel: #2 0xffffffff80c065e3 at panic+0x43
May 8 12:38:43 desktop kernel: #3 0xffffffff810b1fa7 at trap_fatal+0x387
May 8 12:38:43 desktop kernel: #4 0xffffffff810b1fff at trap_pfault+0x4f
May 8 12:38:43 desktop kernel: #5 0xffffffff81088e78 at calltrap+0x8
May 8 12:38:43 desktop kernel: #6 0xffffffff80f5567d at kmem_free+0x2d
May 8 12:38:43 desktop kernel: #7 0xffffffff8271e81d at __i915_gpu_coredump_free+0x12d
May 8 12:38:43 desktop kernel: #8 0xffffffff826efdd9 at intel_gt_handle_error+0xa9
May 8 12:38:43 desktop kernel: #9 0xffffffff826db131 at heartbeat+0x2a1
May 8 12:38:43 desktop kernel: #10 0xffffffff80e63653 at linux_work_fn+0xe3
May 8 12:38:43 desktop kernel: #11 0xffffffff80c68961 at taskqueue_run_locked+0x191
May 8 12:38:43 desktop kernel: #12 0xffffffff80c69c23 at taskqueue_thread_loop+0xc3
May 8 12:38:43 desktop kernel: #13 0xffffffff80bc2fce at fork_exit+0x7e
May 8 12:38:43 desktop kernel: #14 0xffffffff81089eee at fork_trampoline+0xe
May 8 12:38:43 desktop kernel: Uptime: 15d21h24m51s
Looks like a LinuxKPI bug:

- i915_vma_coredump contains an array of pages, freed in __i915_gpu_coredump_free() -> cleanup_gt() -> i915_vma_coredump_free() with

      for (page = 0; page < vma->page_count; page++)
              free_page((unsigned long)vma->pages[page]);

- free_page() just calls FreeBSD's kmem_free(). That is, it expects to receive a page mapped into the kernel map.
- Looks like those pages are allocated by pool_alloc() in i915_gpu_error.c. It uses alloc_page() in the LinuxKPI, which just allocates and returns an unmapped page. pool_alloc() extracts the direct map address.

So, i915kms is passing a direct map address to free_page(), which doesn't handle that. Probably free_page() should handle direct-mapped addresses by resolving them to a page and freeing that with linux_free_pages(page, 0).
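To make the mismatch concrete, here is a minimal userspace sketch of the dispatch suggested above. It is not the LinuxKPI code and not the eventual patch: every model_* name, both address ranges, and the slot-tracking arrays are assumptions invented for illustration. The real kmem_free() faults on a direct-mapped address (as in the backtrace), whereas this model simply returns false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout for this model only: two disjoint 8-page regions. */
#define MODEL_PAGE_SIZE UINT64_C(4096)
#define MODEL_NPAGES    8
#define MODEL_DMAP_BASE UINT64_C(0xfffff80000000000) /* pretend direct map */
#define MODEL_KMEM_BASE UINT64_C(0xfffffe0000000000) /* pretend kernel map */

/* true = in use. One array per allocator, so a mismatched free is visible. */
bool model_page_used[MODEL_NPAGES];
bool model_kmem_used[MODEL_NPAGES];

/* Is va inside the MODEL_NPAGES-page region that starts at base? */
bool model_in_region(uint64_t va, uint64_t base)
{
    return va >= base && va < base + MODEL_NPAGES * MODEL_PAGE_SIZE;
}

bool model_is_dmap(uint64_t va)
{
    return model_in_region(va, MODEL_DMAP_BASE);
}

/* Take the first free slot and return its address, or 0 if none is left. */
uint64_t model_take_slot(bool *used, uint64_t base)
{
    for (int i = 0; i < MODEL_NPAGES; i++) {
        if (!used[i]) {
            used[i] = true;
            return base + (uint64_t)i * MODEL_PAGE_SIZE;
        }
    }
    return 0;
}

/* Release the slot behind va; false if va is outside the region or unused. */
bool model_release_slot(bool *used, uint64_t base, uint64_t va)
{
    if (!model_in_region(va, base))
        return false;
    uint64_t idx = (va - base) / MODEL_PAGE_SIZE;
    assert(idx < MODEL_NPAGES);
    if (!used[idx])
        return false;
    used[idx] = false;
    return true;
}

/* Stand-in for alloc_page() followed by taking the direct map address. */
uint64_t model_alloc_page(void)
{
    return model_take_slot(model_page_used, MODEL_DMAP_BASE);
}

/* Stand-in for an allocation from the kernel map. */
uint64_t model_kmem_alloc(void)
{
    return model_take_slot(model_kmem_used, MODEL_KMEM_BASE);
}

/* Stand-in for kmem_free(): only kernel-map addresses are acceptable. */
bool model_kmem_free(uint64_t va)
{
    return model_release_slot(model_kmem_used, MODEL_KMEM_BASE, va);
}

/* Behaviour described in the analysis: every address goes to kmem_free(). */
bool model_free_page_unchecked(uint64_t va)
{
    return model_kmem_free(va);
}

/*
 * Suggested behaviour: a direct-mapped address is resolved back to its page
 * and released through the page allocator; anything else goes to kmem_free().
 */
bool model_free_page_checked(uint64_t va)
{
    if (model_is_dmap(va))
        return model_release_slot(model_page_used, MODEL_DMAP_BASE, va);
    return model_kmem_free(va);
}
```

In this model, an address from model_alloc_page() is rejected by model_free_page_unchecked() but released by model_free_page_checked(), while an address from model_kmem_alloc() is released by either path.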
https://reviews.freebsd.org/D40028
Thanks for looking at this. Just curious if this crash was related to the GPU HANG. Ever since 12.3-RELEASE I've seen the occasional GPU HANG (but never a crash) and was wondering if this fix might take care of those. See https://lists.freebsd.org/archives/freebsd-stable/2022-January/000501.html
(In reply to Greg Balfour from comment #3) Hmm, I don't *think* the patch is likely to help with that. If the bug fixed by the patch is triggered, I'd expect to see a kernel panic. I'm not sure how to begin tracking down the cause of GPU driver hangs.
^Triage: clarify Summary.
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=6223d0b67af923f53d962a9bf594dc37004dffe8

commit 6223d0b67af923f53d962a9bf594dc37004dffe8
Author: Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2023-10-17 14:26:18 +0000
Commit: Mark Johnston <markj@FreeBSD.org>
CommitDate: 2023-10-17 15:19:06 +0000

    linuxkpi: Handle direct-mapped addresses in linux_free_kmem()

    See the analysis in PR 271333. It is possible for driver code to allocate a page, store its address as returned by page_address(), then call free_page() on that address. On most systems that'll result in the LinuxKPI calling kmem_free() with a direct-mapped address, which is not legal.

    Fix the problem by making linux_free_kmem() check the address to see whether it's direct-mapped or not, and handling it appropriately.

    PR: 271333, 274515
    Reviewed by: hselasky, bz
    Tested by: trasz
    MFC after: 1 week
    Sponsored by: The FreeBSD Foundation
    Differential Revision: https://reviews.freebsd.org/D40028

 sys/compat/linuxkpi/common/src/linux_page.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=4862eb8604d503b52e7c3aa7ff32155b75a1ff93

commit 4862eb8604d503b52e7c3aa7ff32155b75a1ff93
Author: Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2023-10-17 14:26:18 +0000
Commit: Mark Johnston <markj@FreeBSD.org>
CommitDate: 2023-10-24 13:20:01 +0000

    linuxkpi: Handle direct-mapped addresses in linux_free_kmem()

    See the analysis in PR 271333. It is possible for driver code to allocate a page, store its address as returned by page_address(), then call free_page() on that address. On most systems that'll result in the LinuxKPI calling kmem_free() with a direct-mapped address, which is not legal.

    Fix the problem by making linux_free_kmem() check the address to see whether it's direct-mapped or not, and handling it appropriately.

    PR: 271333, 274515
    Reviewed by: hselasky, bz
    Tested by: trasz
    MFC after: 1 week
    Sponsored by: The FreeBSD Foundation
    Differential Revision: https://reviews.freebsd.org/D40028

    (cherry picked from commit 6223d0b67af923f53d962a9bf594dc37004dffe8)

 sys/compat/linuxkpi/common/src/linux_page.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)
A commit in branch releng/14.0 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=87dbb943df73022dd98487c123aeb125da11c4af

commit 87dbb943df73022dd98487c123aeb125da11c4af
Author: Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2023-10-17 14:26:18 +0000
Commit: Mark Johnston <markj@FreeBSD.org>
CommitDate: 2023-10-25 16:53:01 +0000

    linuxkpi: Handle direct-mapped addresses in linux_free_kmem()

    See the analysis in PR 271333. It is possible for driver code to allocate a page, store its address as returned by page_address(), then call free_page() on that address. On most systems that'll result in the LinuxKPI calling kmem_free() with a direct-mapped address, which is not legal.

    Fix the problem by making linux_free_kmem() check the address to see whether it's direct-mapped or not, and handling it appropriately.

    Approved by: re (gjb)
    PR: 271333, 274515
    Reviewed by: hselasky, bz
    Tested by: trasz
    MFC after: 1 week
    Sponsored by: The FreeBSD Foundation
    Differential Revision: https://reviews.freebsd.org/D40028

    (cherry picked from commit 6223d0b67af923f53d962a9bf594dc37004dffe8)
    (cherry picked from commit 4862eb8604d503b52e7c3aa7ff32155b75a1ff93)

 sys/compat/linuxkpi/common/src/linux_page.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=6cba7aec21bcd957478a987f9391fd33a4babdac

commit 6cba7aec21bcd957478a987f9391fd33a4babdac
Author: Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2023-10-17 14:26:18 +0000
Commit: Mark Johnston <markj@FreeBSD.org>
CommitDate: 2024-01-09 17:59:49 +0000

    linuxkpi: Handle direct-mapped addresses in linux_free_kmem()

    See the analysis in PR 271333. It is possible for driver code to allocate a page, store its address as returned by page_address(), then call free_page() on that address. On most systems that'll result in the LinuxKPI calling kmem_free() with a direct-mapped address, which is not legal.

    Fix the problem by making linux_free_kmem() check the address to see whether it's direct-mapped or not, and handling it appropriately.

    PR: 271333, 274515
    Reviewed by: hselasky, bz
    Tested by: trasz
    MFC after: 1 week
    Sponsored by: The FreeBSD Foundation
    Differential Revision: https://reviews.freebsd.org/D40028

    (cherry picked from commit 6223d0b67af923f53d962a9bf594dc37004dffe8)

 sys/compat/linuxkpi/common/src/linux_page.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)