(I don't think I have fully understood the problem yet; this bug mainly serves as a memo documenting what we have done so far and what we already know. Thanks to markj@ for hints on how to get useful debugging information.)
Set up kernel crash dump for DRM
Set the following sysctls:
As well as a dump device.
After setting this up, I was able to get a kernel crash dump, with the following backtrace:
[drm:gen8_init_common_ring] Execlists enabled for rcs0
[drm:init_workarounds_ring] rcs0: Number of context specific w/a: 15
[drm:gen8_init_common_ring] Execlists enabled for bcs0
[drm:gen8_init_common_ring] Execlists enabled for vcs0
[drm:gen8_init_common_ring] Execlists enabled for vecs0
panic: vm_page_wire: page 0xfffffe000c2da0a8 does not belong to an object
cpuid = 7
time = 1568513321
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
vpanic() at vpanic+0x19d/frame 0xfffffe00e4b12750
panic() at panic+0x43/frame 0xfffffe00e4b127b0
vm_page_wire() at vm_page_wire+0x9a/frame 0xfffffe00e4b127d0
gen8_ppgtt_cleanup() at gen8_ppgtt_cleanup+0xaf/frame 0xfffffe00e4b12810
i915_ppgtt_release() at i915_ppgtt_release+0x52/frame 0xfffffe00e4b12830
i915_gem_context_free() at i915_gem_context_free+0x1e0/frame
contexts_free_worker() at contexts_free_worker+0x8d/frame 0xfffffe00e4b12880
linux_work_fn() at linux_work_fn+0xe7/frame 0xfffffe00e4b128e0
taskqueue_run_locked() at taskqueue_run_locked+0x10c/frame
taskqueue_thread_loop() at taskqueue_thread_loop+0x88/frame
fork_exit() at fork_exit+0x84/frame 0xfffffe00e4b129b0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00e4b129b0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
And the assertion was triggered here:
#1 0xffffffff80bd2830 in kern_reboot (howto=260) at
Current language: auto; currently minimal
#2 0xffffffff80bd2ca9 in vpanic (fmt=<value optimized out>, ap=<value
optimized out>) at /usr/src/sys/kern/kern_shutdown.c:908
#3 0xffffffff80bd29e3 in panic (fmt=<value optimized out>) at
835 vpanic(fmt, ap);
#4 0xffffffff80f25c5a in vm_page_wire (m=<value optimized out>) at
85 __asm __volatile("addq\t%1,%%gs:(%0)"
#5 0xffffffff84db8d2f in gen8_ppgtt_cleanup (vm=0xfffffe015261a000) at
#6 0xffffffff84db4812 in i915_ppgtt_release (kref=<value optimized
warning: Source file is more recent than executable.
2266 /* vmas should already be unbound and destroyed */
In r352110, vm_page_wire() was modified to require that the page belong to a VM object, and the requirement is enforced with an assertion.
The Linux get_page() API does essentially the same thing as wiring the page, but it is not yet clear to me whether we can always assert that the page is already mapped (in FreeBSD's terms).
A quick hack is to replace the vm_page_wire(page) call in sys/compat/linuxkpi/common/include/linux/mm.h with a call to vm_page_wire_mapped(page) plus an assertion that the call succeeded. With that change, my laptop works again on CURRENT.
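To illustrate the distinction the hack relies on, here is a minimal userland sketch in C. It is not the kernel code: the toy_* names, the struct layout, and the blocked flag are hypothetical stand-ins. The modelled behaviour is an assumption based on the description above: vm_page_wire() asserts that the page has an owning object, while vm_page_wire_mapped() makes no such check but can fail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for kernel types; NOT the real FreeBSD definitions. */
struct toy_object {
	int unused;
};

struct toy_page {
	struct toy_object *object;	/* owning VM object, or NULL */
	unsigned int wire_count;	/* number of wirings held */
	bool blocked;			/* assumed: mappings being torn down */
};

/*
 * Models the check that vm_page_wire() gained in r352110: wiring a
 * page that does not belong to an object trips the assertion.
 */
bool
toy_page_wire_would_panic(const struct toy_page *m)
{
	return (m->object == NULL);
}

/*
 * Models vm_page_wire_mapped() as assumed here: no object check,
 * but the attempt may fail, so the caller must look at the result.
 */
bool
toy_page_wire_mapped(struct toy_page *m)
{
	if (m->blocked)
		return (false);
	m->wire_count++;
	return (true);
}

/*
 * Models the hacked get_page(): wire via the "mapped" variant and
 * assert that it succeeded instead of silently ignoring a failure.
 */
void
toy_get_page(struct toy_page *page)
{
	bool wired = toy_page_wire_mapped(page);

	assert(wired);
	(void)wired;	/* avoid an unused warning when NDEBUG is set */
}
```

In this model, a page with object == NULL would trip the vm_page_wire()-style check, yet toy_get_page() still wires it, which mirrors why the hack avoids the panic. It does not show that the hack is correct in the driver.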
^Triage: Assign to reporter (committer) for coordination at least until a more appropriate assignee is determined.
We discussed this over email, and I claimed that the assertion in vm_page_wire() is too strong. However, the problematic vm_page_wire() call is wrong too, so I submitted this instead: https://github.com/FreeBSDDesktop/kms-drm/pull/175