Bug 242961

Summary: Crashes (elf64_coredump … vm_object_set_writeable_dirty) after the recent vm patch series
Product: Base System
Reporter: Val Packett <val>
Component: kern
Assignee: Mark Johnston <markj>
Status: Closed
Resolution: FIXED
Severity: Affects Only Me
CC: jeff, markj
Priority: ---
Version: CURRENT
Hardware: Any
OS: Any

Description Val Packett 2019-12-29 15:57:58 UTC
Either the series with https://reviews.freebsd.org/D22885 or 'Correctly implement PMAP_ENTER_NOREPLACE…' is causing my system to crash very soon after entering the desktop (wayfire). (I reverted both 'PMAP_…' and everything from 'Remove some unused functions' to 'Don't update per-page activation counts…' and that fixed the problem.)

A dump I got doesn't seem desktop/gpu specific in any way, but seems to point at the coredump functionality:


Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address   = 0x89
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff806e8d84
stack pointer           = 0x0:0xfffffe00cdc812c0
frame pointer           = 0x0:0xfffffe00cdc812c0
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 63690 (cron)
trap number             = 12
panic: page fault
cpuid = 2
time = 1577630640
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00cdc80f30
vpanic() at vpanic+0x17e/frame 0xfffffe00cdc80f90
panic() at panic+0x43/frame 0xfffffe00cdc80ff0
trap_fatal() at trap_fatal+0x386/frame 0xfffffe00cdc81050
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00cdc810c0
trap() at trap+0x288/frame 0xfffffe00cdc811f0
calltrap() at calltrap+0x8/frame 0xfffffe00cdc811f0
--- trap 0xc, rip = 0xffffffff806e8d84, rsp = 0xfffffe00cdc812c0, rbp = 0xfffffe00cdc812c0 ---
vm_object_set_writeable_dirty() at vm_object_set_writeable_dirty+0x4/frame 0xfffffe00cdc812c0
vm_fault() at vm_fault+0x163f/frame 0xfffffe00cdc81400
vm_fault_quick_hold_pages() at vm_fault_quick_hold_pages+0x18a/frame 0xfffffe00cdc81480
vn_io_fault1() at vn_io_fault1+0x268/frame 0xfffffe00cdc815d0
vn_rdwr() at vn_rdwr+0x295/frame 0xfffffe00cdc816a0
vn_rdwr_inchunks() at vn_rdwr_inchunks+0x90/frame 0xfffffe00cdc81720
elf64_coredump() at elf64_coredump+0xbda/frame 0xfffffe00cdc81820
sigexit() at sigexit+0xba2/frame 0xfffffe00cdc81b00
postsig() at postsig+0x2f5/frame 0xfffffe00cdc81bc0
ast() at ast+0x2e7/frame 0xfffffe00cdc81bf0
doreti_ast() at doreti_ast+0x1f/frame 0x7fffffffdcb0

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
55              __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) bt
[…]
#8  <signal handler called>
#9  vm_object_set_writeable_dirty (object=0x0) at /usr/src/sys/vm/vm_object.c:2236
#10 0xffffffff806d461f in vm_fault_dirty (entry=0xfffff8003205e000, m=0xfffffe0008806d60, prot=<optimized out>,
    fault_type=<optimized out>, fault_flags=0) at /usr/src/sys/vm/vm_fault.c:249
#11 vm_fault (map=0xfffff8002e7a5000, vaddr=140737488240640, fault_type=1 '\001', fault_flags=0,
    m_hold=0xfffffe00cdc814c0) at /usr/src/sys/vm/vm_fault.c:1358
#12 0xffffffff806d58ba in vm_fault_quick_hold_pages (map=0xfffff8002e7a5000, addr=140737488240640,
    len=<optimized out>, prot=1 '\001', ma=0xfffffe00cdc81490, max_count=<optimized out>)
    at /usr/src/sys/vm/vm_fault.c:1657
#13 0xffffffff80510908 in vn_io_fault1 (vp=<optimized out>, uio=0xfffffe00cdc81608, args=0xfffffe00cdc81638,
    td=0xfffff80056797000) at /usr/src/sys/kern/vfs_vnops.c:1111
#14 0xffffffff80510565 in vn_rdwr (rw=<optimized out>, vp=0xfffff801264f6000, base=<optimized out>,
    len=<optimized out>, offset=<optimized out>, segflg=<optimized out>, ioflg=16641,
    active_cred=0xfffff80018530e00, file_cred=0x0, aresid=0xfffffe00cdc816e0, td=0xfffff80056797000)
    at /usr/src/sys/kern/vfs_vnops.c:603
Comment 1 Mark Johnston freebsd_committer freebsd_triage 2019-12-29 16:54:03 UTC
This doesn't look directly related to my changes.  Can you print *m from frame 10 and fs from frame 11?
Comment 2 Val Packett 2019-12-29 17:04:16 UTC
(In reply to Mark Johnston from comment #1)

(kgdb) frame 10
#10 0xffffffff806d461f in vm_fault_dirty (entry=0xfffff8003205e000, m=0xfffffe0008806d60, prot=<optimized out>, 
    fault_type=<optimized out>, fault_flags=0) at /usr/src/sys/vm/vm_fault.c:249
warning: Source file is more recent than executable.
249                 (fault_flags & VM_FAULT_DIRTY) != 0;
(kgdb) p *m
$1 = {plinks = {q = {tqe_next = 0xfffffe0008446218, tqe_prev = 0xfffffe0008329d50}, s = {ss = {
        sle_next = 0xfffffe0008446218}}, memguard = {p = 18446741874824995352, v = 18446741874823830864}, uma = {
      slab = 0xfffffe0008446218, zone = 0xfffffe0008329d50}}, listq = {tqe_next = 0xfffffe0002d760c0, 
    tqe_prev = 0xfffffe0004f8e140}, object = 0x0, pindex = 6, phys_addr = 5617664000, md = {pv_list = {
      tqh_first = 0xfffff801464c72f8, tqh_last = 0xfffff801464c7300}, pv_gen = 1549922559, pat_mode = 6}, 
  ref_count = 0, busy_lock = 2, a = {{flags = 3, queue = 255 '\377', act_count = 5 '\005'}, _bits = 100597763}, 
  order = 13 '\r', pool = 0 '\000', flags = 1 '\001', oflags = 0 '\000', psind = 0 '\000', segind = 10 '\n', 
  valid = 255 '\377', dirty = 255 '\377'}
(kgdb) frame 11
#11 vm_fault (map=0xfffff8002e7a5000, vaddr=140737488240640, fault_type=1 '\001', fault_flags=0, 
    m_hold=0xfffffe00cdc814c0) at /usr/src/sys/vm/vm_fault.c:1358
1358            /*
(kgdb) p fs
$2 = <optimized out>

I guess I'll have to try a debug kernel…
Comment 3 Val Packett 2019-12-29 17:50:30 UTC
Debug kernel (also no longer reverting PMAP_ENTER_NOREPLACE, since it's for i386 anyway); different panic:

panic: Bad link elm 0xfffffe0009572b20 next->prev != elm

#4  0xffffffff80423673 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:835
#5  0xffffffff806e8fa9 in _vm_page_pqstate_commit_dequeue (pq=<optimized out>, m=0xfffffe0009572b20, old=0xfffffe00ba1fe888, 
    new=...) at /usr/src/sys/vm/vm_page.c:3332
#6  0xffffffff806e50e6 in vm_page_pqstate_commit_dequeue (m=0xfffffe0009572b20, old=0xfffffe00ba1fe888, new=...)
    at /usr/src/sys/vm/vm_page.c:3369
#7  0xffffffff806e4f76 in vm_page_pqstate_commit (m=0xfffffe0009572b20, old=<unavailable>, new=...)
    at /usr/src/sys/vm/vm_page.c:3446
#8  0xffffffff806e137a in vm_page_mvqueue (m=0xfffffe0009572b20, 
    nqueue=<error reading variable: Cannot access memory at address 0x0>, 
    nflag=<error reading variable: Cannot access memory at address 0x20>) at /usr/src/sys/vm/vm_page.c:4030
#9  vm_page_deactivate (m=0xfffffe0009572b20) at /usr/src/sys/vm/vm_page.c:4051
#10 0xffffffff806c86e5 in fault_page_release (mp=<optimized out>) at /usr/src/sys/vm/vm_fault.c:165
#11 vm_fault (map=0xfffff80144e66000, vaddr=34363002880, fault_type=<optimized out>, fault_flags=0, m_hold=0x0)
    at /usr/src/sys/vm/vm_fault.c:1259
#12 0xffffffff806c762e in vm_fault_trap (map=0xfffff80144e66000, vaddr=<optimized out>, fault_type=<optimized out>, fault_flags=0, 
    signo=0xfffffe00ba1febc4, ucode=<optimized out>) at /usr/src/sys/vm/vm_fault.c:571
#13 0xffffffff80779013 in trap_pfault (frame=<optimized out>, usermode=<optimized out>, signo=<optimized out>, 
    ucode=0xfffffe00ba1febc0) at /usr/src/sys/amd64/amd64/trap.c:828
#14 0xffffffff80778670 in trap (frame=0xfffffe00ba1fec00) at /usr/src/sys/amd64/amd64/trap.c:347
#15 <signal handler called>
#16 0x000000080030499c in ?? ()
Comment 4 Mark Johnston freebsd_committer freebsd_triage 2019-12-29 17:54:32 UTC
Are you able to trigger this with r356173 applied?
Comment 5 Val Packett 2019-12-29 18:29:04 UTC
(In reply to Mark Johnston from comment #4)

Yes. Different once again. On a non-debug kernel it's back to the coredump path, but with different frames inside:

<6>pid 47948 (wayfire), jid 0, uid 1001: exited on signal 10 (core dumped)
<6>pid 50112 (dbus-daemon), jid 0, uid 1001: exited on signal 10
panic: vm_radix_insert: key 501 is already present

[…]
#4  0xffffffff80425fa3 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:835
#5  0xffffffff806f98ab in vm_radix_insert (rtree=<optimized out>, page=<optimized out>) at /usr/src/sys/vm/vm_radix.c:367
#6  0xffffffff806ec734 in vm_page_insert_after (m=<optimized out>, object=0xfffff8016f2b7420, pindex=1281, mpred=0xfffffe0009b36e80)
    at /usr/src/sys/vm/vm_page.c:1526
#7  vm_page_alloc_domain_after (object=0xfffff8016f2b7420, pindex=1281, domain=0, req=32832, mpred=0xfffffe0009b36e80)
    at /usr/src/sys/vm/vm_page.c:2086
#8  0xffffffff806ec2e4 in vm_page_alloc_after (object=0xfffff8016f2b7420, pindex=1281, req=<unavailable>, mpred=0xfffffe0009b36e80)
    at /usr/src/sys/vm/vm_page.c:1925
#9  vm_page_alloc (object=0xfffff8016f2b7420, pindex=1281, req=64) at /usr/src/sys/vm/vm_page.c:1895
#10 0xffffffff806d36d6 in vm_fault (map=0xfffff800357c5000, vaddr=34375471104, fault_type=<optimized out>, fault_flags=0, 
    m_hold=0xfffffe00d17d44f0) at /usr/src/sys/vm/vm_fault.c:934
#11 0xffffffff806d58ba in vm_fault_quick_hold_pages (map=0xfffff800357c5000, addr=34375471104, len=<optimized out>, prot=1 '\001', 
    ma=0xfffffe00d17d4490, max_count=<optimized out>) at /usr/src/sys/vm/vm_fault.c:1657
#12 0xffffffff80510908 in vn_io_fault1 (vp=<optimized out>, uio=0xfffffe00d17d4608, args=0xfffffe00d17d4638, td=0xfffff80079d02000)
    at /usr/src/sys/kern/vfs_vnops.c:1111
#13 0xffffffff80510565 in vn_rdwr (rw=<optimized out>, vp=0xfffff8014a719780, base=<optimized out>, len=<optimized out>, 
    offset=<optimized out>, segflg=<optimized out>, ioflg=16641, active_cred=0xfffff80052af2700, file_cred=0x0, 
    aresid=0xfffffe00d17d46e0, td=0xfffff80079d02000) at /usr/src/sys/kern/vfs_vnops.c:603
#14 0xffffffff80510ac0 in vn_rdwr_inchunks (rw=UIO_WRITE, vp=0xfffff8014a719780, base=0x800ef5000, len=1146880, offset=8323072, 
    segflg=UIO_USERSPACE, ioflg=16641, active_cred=0xfffff80052af2700, file_cred=0x0, aresid=0x0, td=0xfffff80079d02000)
    at /usr/src/sys/kern/vfs_vnops.c:658
#15 0xffffffff803aff0a in core_write (p=<optimized out>, base=0x800a00000, len=6344704, offset=3125248, 
    seg=<error reading variable: Cannot access memory at address 0x0>) at /usr/src/sys/kern/imgact_elf.c:1508
#16 core_output (base=0x800a00000, len=6344704, offset=3125248, p=<optimized out>, tmpbuf=<optimized out>)
    at /usr/src/sys/kern/imgact_elf.c:1527
#17 elf64_coredump (td=<optimized out>, vp=0xfffffe00d04a3000, limit=0, flags=<optimized out>)
    at /usr/src/sys/kern/imgact_elf.c:1662
#18 0xffffffff8042a752 in coredump (td=0xfffff80079d02000) at /usr/src/sys/kern/kern_sig.c:3686
#19 sigexit (td=0xfffff80079d02000, sig=10) at /usr/src/sys/kern/kern_sig.c:3190
#20 0xffffffff8042b265 in postsig (sig=10) at /usr/src/sys/kern/kern_sig.c:3088
#21 0xffffffff804895b7 in ast (framep=0xfffffe00d17d4c00) at /usr/src/sys/kern/subr_trap.c:324
#22 0xffffffff80759079 in doreti_ast () at /usr/src/sys/amd64/amd64/exception.S:1152
#23 0x000000000000000d in ?? ()
Comment 6 Mark Johnston freebsd_committer freebsd_triage 2019-12-29 18:32:52 UTC
(In reply to Greg V from comment #5)
This is strange; my changes really shouldn't be affecting the page lifecycle.  The only thing left to try at this point is an INVARIANTS kernel.  The panic in comment 3 could be fixed by r356173, but the rest seem unrelated.
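(For reference, an INVARIANTS kernel is typically built from a config along these lines; these are standard FreeBSD kernel options, and the config name is illustrative:)

```
include GENERIC
ident   DEBUG-INVARIANTS
options INVARIANTS         # enable kernel consistency checks (KASSERTs)
options INVARIANT_SUPPORT  # support code required by INVARIANTS
```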
Comment 7 Val Packett 2019-12-29 18:53:01 UTC
(In reply to Mark Johnston from comment #6)

INVARIANTS doesn't catch this; when I said debug kernel, INVARIANTS was enabled. This is also from the debug kernel:

#0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:392
#2  0xffffffff804234c0 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:479
#3  0xffffffff80423916 in vpanic (fmt=<optimized out>, ap=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:908
#4  0xffffffff80423673 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:835
#5  0xffffffff806e177b in vm_page_insert_after (m=0xfffffe000c000710, object=0xfffff8020129b420, pindex=855, 
    mpred=0xfffffe000bf28c60) at /usr/src/sys/vm/vm_page.c:1502
#6  0xffffffff81e9e1f7 in ttm_bo_vm_fault () from /boot/modules/ttm.ko
#7  0xffffffff81b76dd8 in linux_cdev_pager_populate (vm_obj=0x357, pidx=<optimized out>, fault_type=<optimized out>, 
    max_prot=<optimized out>, first=0xfffffe00cfb5a920, last=0xfffffe00cfb5a930)
    at /usr/src/sys/compat/linuxkpi/common/src/linux_compat.c:553
#8  0xffffffff806c7dcc in vm_pager_populate (object=<unavailable>, pidx=<unavailable>, fault_type=<optimized out>, 
    max_prot=<unavailable>, first=<optimized out>, last=<optimized out>) at /usr/src/sys/vm/vm_pager.h:172
#9  vm_fault_populate (fs=<optimized out>, prot=3 '\003', fault_type=<optimized out>, fault_flags=0, wired=0, m_hold=0x0)
    at /usr/src/sys/vm/vm_fault.c:429
#10 vm_fault (map=0xfffff800bf644000, vaddr=34658910208, fault_type=<optimized out>, fault_flags=0, m_hold=0x0)
    at /usr/src/sys/vm/vm_fault.c:891
#11 0xffffffff806c762e in vm_fault_trap (map=0xfffff800bf644000, vaddr=<optimized out>, fault_type=<optimized out>, fault_flags=0, 
    signo=0xfffffe00cfb5abc4, ucode=<optimized out>) at /usr/src/sys/vm/vm_fault.c:571
#12 0xffffffff80779013 in trap_pfault (frame=<optimized out>, usermode=<optimized out>, signo=<optimized out>, 
    ucode=0xfffffe00cfb5abc0) at /usr/src/sys/amd64/amd64/trap.c:828
#13 0xffffffff80778670 in trap (frame=0xfffffe00cfb5ac00) at /usr/src/sys/amd64/amd64/trap.c:347
#14 <signal handler called>
#15 0x000000080097390a in ?? ()

I'll try reverting only the last half of the patch series.
Comment 8 Mark Johnston freebsd_committer freebsd_triage 2019-12-29 18:57:57 UTC
I found a bug by causing some of my desktop applications to dump core; I suspect this is what you are hitting.  TTM uses the plinks.q fields for its own purposes, but we were inadvertently putting TTM-managed pages on the global page queues.

diff --git a/sys/vm/vm_page.c b/sys/vm/vm_page.c
index e99732028af5..971d8a0ed236 100644
--- a/sys/vm/vm_page.c
+++ b/sys/vm/vm_page.c
@@ -4013,7 +4013,7 @@ vm_page_mvqueue(vm_page_t m, const uint8_t nqueue, const uint16_t nflag)
        KASSERT(nflag == PGA_REQUEUE || nflag == PGA_REQUEUE_HEAD,
            ("%s: invalid flags %x", __func__, nflag));
 
-       if ((m->oflags & VPO_UNMANAGED) != 0)
+       if ((m->oflags & VPO_UNMANAGED) != 0 || (m->flags & PG_FICTITIOUS) != 0)
                return;
 
        old = vm_page_astate_load(m);
Comment 9 Val Packett 2019-12-29 19:14:31 UTC
(In reply to Mark Johnston from comment #8)

Now panicked inside ttm_pool_populate (that's progress, I guess!)

Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 03                                                                                                              
fault virtual address   = 0x0                                                                                                        
fault code              = supervisor read data, page not present                                                                     
instruction pointer     = 0x20:0xffffffff81ea00a3                                                                                    
stack pointer           = 0x28:0xfffffe00db84a310                                                                                    
frame pointer           = 0x28:0xfffffe00db84a3b0                                                                                    
code segment            = base rx0, limit 0xfffff, type 0x1b                                                                         
                        = DPL 0, pres 1, long 1, def32 0, gran 1                                                                     
processor eflags        = interrupt enabled, resume, IOPL = 0                                                                        
current process         = 21437 (wayfire)                                                                                            
trap number             = 12
panic: page fault

#8  <signal handler called>
#9  0xffffffff81ea00a3 in ttm_pool_populate () from /boot/modules/ttm.ko
#10 0xffffffff81ea03df in ttm_populate_and_map_pages () from /boot/modules/ttm.ko
#11 0xffffffff81ea1816 in ttm_tt_bind () from /boot/modules/ttm.ko
#12 0xffffffff81e99cdc in ttm_bo_handle_move_mem () from /boot/modules/ttm.ko
#13 0xffffffff81e97248 in ttm_bo_validate () from /boot/modules/ttm.ko
#14 0xffffffff81e97605 in ttm_bo_init_reserved () from /boot/modules/ttm.ko
#15 0xffffffff81bb1c7b in amdgpu_bo_do_create () from /boot/modules/amdgpu.ko
#16 0xffffffff81bb124e in amdgpu_bo_create () from /boot/modules/amdgpu.ko
#17 0xffffffff81ba5b5c in amdgpu_gem_create_ioctl () from /boot/modules/amdgpu.ko
#18 0xffffffff81e31cf7 in drm_ioctl_kernel () from /boot/modules/drm.ko
#19 0xffffffff81e31fd7 in drm_ioctl () from /boot/modules/drm.ko
#20 0xffffffff81ba1d8b in amdgpu_drm_ioctl () from /boot/modules/amdgpu.ko
Comment 10 Val Packett 2019-12-29 19:18:30 UTC
And now for something different… ZFS! (also still inside coredump)

panic: vm_radix_insert: key fe321 is already present                                                                                 

#4  0xffffffff80425fa3 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:835
#5  0xffffffff806f990b in vm_radix_insert (rtree=<optimized out>, page=<optimized out>) at /usr/src/sys/vm/vm_radix.c:367
#6  0xffffffff806ec774 in vm_page_insert_after (m=<optimized out>, object=0xffffffff80cee598 <kernel_object_store>, pindex=1041185, 
    mpred=0xfffffe000fa6e5d0) at /usr/src/sys/vm/vm_page.c:1526
#7  vm_page_alloc_domain_after (object=0xffffffff80cee598 <kernel_object_store>, pindex=1041185, domain=0, req=8802, 
    mpred=0xfffffe000fa6e5d0) at /usr/src/sys/vm/vm_page.c:2086
#8  0xffffffff806d77ca in kmem_back_domain (domain=0, object=0xffffffff80cee598 <kernel_object_store>, addr=18446741878950854656, 
    size=262144, flags=2305) at /usr/src/sys/vm/vm_kern.c:477
#9  0xffffffff806d769f in kmem_malloc_domain (domain=0, size=262144, flags=2305) at /usr/src/sys/vm/vm_kern.c:415
#10 kmem_malloc_domainset (ds=<optimized out>, size=<optimized out>, flags=2305) at /usr/src/sys/vm/vm_kern.c:439
#11 0xffffffff806cde7d in keg_alloc_slab (keg=0xfffff80003f1ad20, zone=0xfffff80003f46000, domain=<optimized out>, flags=2, 
    aflags=<unavailable>) at /usr/src/sys/vm/uma_core.c:1318
#12 0xffffffff806d0d71 in keg_fetch_slab (keg=<optimized out>, zone=0xfffff80003f46000, rdomain=-1, flags=2)
    at /usr/src/sys/vm/uma_core.c:3209
#13 zone_import (arg=<optimized out>, bucket=0xfffff8017a26f798, max=<optimized out>, domain=-1, flags=<optimized out>)
    at /usr/src/sys/vm/uma_core.c:3287
#14 0xffffffff806cc26d in zone_alloc_bucket (zone=0xfffff80003f46000, udata=0xfffff80003f32e80, domain=<optimized out>, flags=2)
    at /usr/src/sys/vm/uma_core.c:3348
#15 cache_alloc (zone=0xfffff80003f46000, cache=<optimized out>, udata=0xfffff80003f32e80, flags=<optimized out>)
    at /usr/src/sys/vm/uma_core.c:3061
#16 0xffffffff806cbaed in uma_zalloc_arg (zone=0xfffff80003f46000, udata=0xfffff80003f32e80, flags=2)
    at /usr/src/sys/vm/uma_core.c:2954
#17 0xffffffff8117cebf in arc_get_data_buf (hdr=0xfffff8028a6c2800, size=262144, tag=<optimized out>)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:5185
#18 arc_buf_alloc_impl (hdr=0xfffff8028a6c2800, tag=<optimized out>, compressed=<optimized out>, fill=0, ret=<optimized out>)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:3036
#19 0xffffffff8117c971 in arc_alloc_buf (spa=<optimized out>, tag=<optimized out>, type=<optimized out>, size=262144)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:3566
#20 0xffffffff8118b73e in dbuf_read_impl (db=0xfffff8022bd38780, zio=0xfffff8001f8b8000, flags=10)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c:1268
#21 dbuf_read (db=0xfffff8022bd38780, zio=0xfffff8001f8b8000, flags=<optimized out>)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c:1437
#22 0xffffffff811aaa4f in dmu_tx_check_ioerr (zio=0xfffff8001f8b8000, dn=0xfffff8025200e5c0, level=<optimized out>, 
    blkid=<optimized out>) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c:201
#23 0xffffffff811a8f37 in dmu_tx_count_write (txh=0xfffff8022b075980, off=129236992, len=65536)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c:245
#24 0xffffffff811a8e6a in dmu_tx_hold_write (tx=<optimized out>, object=<optimized out>, off=129236992, len=<optimized out>)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c:299
#25 0xffffffff81266225 in zfs_write (vp=0xfffff80248fe5960, uio=0xfffffe00cfb95608, ioflag=0, cr=0xfffff801abd60100, 
    ct=<optimized out>) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:1009
#26 zfs_freebsd_write (ap=<optimized out>) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:4843
#27 0xffffffff807ad45d in VOP_WRITE_APV (vop=0xffffffff813121d0 <zfs_vnodeops>, a=0xfffffe00cfb95438) at vnode_if.c:967
#28 0xffffffff805140b5 in VOP_WRITE (vp=<unavailable>, uio=0xfffffe00cfb95608, ioflag=<unavailable>, cred=<optimized out>)
    at ./vnode_if.h:413
#29 vn_io_fault_doio (args=0xfffffe00cfb95638, uio=0xfffffe00cfb95608, td=0xfffff8006c031000) at /usr/src/sys/kern/vfs_vnops.c:965
#30 0xffffffff805107fc in vn_io_fault1 (vp=<optimized out>, uio=0xfffffe00cfb95608, args=0xfffffe00cfb95638, td=0xfffff8006c031000)
    at /usr/src/sys/kern/vfs_vnops.c:1075
#31 0xffffffff80510565 in vn_rdwr (rw=<optimized out>, vp=0xfffff80248fe5960, base=<optimized out>, len=<optimized out>, 
    offset=<optimized out>, segflg=<optimized out>, ioflg=16641, active_cred=0xfffff801abd60100, file_cred=0x0, 
    aresid=0xfffffe00cfb956e0, td=0xfffff8006c031000) at /usr/src/sys/kern/vfs_vnops.c:603
#32 0xffffffff80510ac0 in vn_rdwr_inchunks (rw=UIO_WRITE, vp=0xfffff80248fe5960, base=0x8101af000, len=25120768, offset=129236992, 
    segflg=UIO_USERSPACE, ioflg=16641, active_cred=0xfffff801abd60100, file_cred=0x0, aresid=0x0, td=0xfffff8006c031000)
    at /usr/src/sys/kern/vfs_vnops.c:658
#33 0xffffffff803aff0a in core_write (p=<optimized out>, base=0x80fa00000, len=33177600, offset=121180160, 
    seg=<error reading variable: Cannot access memory at address 0x0>) at /usr/src/sys/kern/imgact_elf.c:1508
#34 core_output (base=0x80fa00000, len=33177600, offset=121180160, p=<optimized out>, tmpbuf=<optimized out>)
    at /usr/src/sys/kern/imgact_elf.c:1527
#35 elf64_coredump (td=<optimized out>, vp=0xfffffe00f0fa5000, limit=0, flags=<optimized out>)
    at /usr/src/sys/kern/imgact_elf.c:1662
Comment 11 Mark Johnston freebsd_committer freebsd_triage 2019-12-29 19:24:25 UTC
(In reply to Greg V from comment #10)
I presume these are all with INVARIANTS enabled?

Can you try this patch instead of the last one?

diff --git a/sys/vm/vm_page.c b/sys/vm/vm_page.c
index e99732028af5..b9ba4b502459 100644
--- a/sys/vm/vm_page.c
+++ b/sys/vm/vm_page.c
@@ -4013,7 +4013,7 @@ vm_page_mvqueue(vm_page_t m, const uint8_t nqueue, const uint16_t nflag)
        KASSERT(nflag == PGA_REQUEUE || nflag == PGA_REQUEUE_HEAD,
            ("%s: invalid flags %x", __func__, nflag));
 
-       if ((m->oflags & VPO_UNMANAGED) != 0)
+       if ((m->oflags & VPO_UNMANAGED) != 0 || vm_page_wired(m))
                return;
 
        old = vm_page_astate_load(m);
Comment 12 Val Packett 2019-12-29 19:43:41 UTC
(In reply to Mark Johnston from comment #11)

Not all of them, no.

And the one with vm_page_wired seems to help, everything works so far!

Thanks for the very fast response :)
Comment 13 Mark Johnston freebsd_committer freebsd_triage 2019-12-29 19:48:42 UTC
(In reply to Greg V from comment #12)
Ok, thanks.  The TTM fault handler plays some games and broke an assumption in my patches.  My desktop uses amdgpu, so I'm not sure why I didn't hit this.  I guess you are using more of the DRM API than my setup does.
Comment 14 commit-hook freebsd_committer freebsd_triage 2019-12-29 20:01:06 UTC
A commit references this bug:

Author: markj
Date: Sun Dec 29 20:01:03 UTC 2019
New revision: 356183
URL: https://svnweb.freebsd.org/changeset/base/356183

Log:
  Restore a vm_page_wired() check in vm_page_mvqueue() after r356156.

  We now set PGA_DEQUEUE on a managed page when it is wired after
  allocation, and vm_page_mvqueue() ignores pages with this flag set,
  ensuring that they do not end up in the page queues.  However, this is
  not sufficient for managed fictitious pages or pages managed by the
  TTM.  In particular, the TTM makes use of the plinks.q queue linkage
  fields for its own purposes.

  PR:	242961
  Reported and tested by:	Greg V <greg@unrelenting.technology>

Changes:
  head/sys/vm/vm_page.c