Bug 256806

Summary: panic: vm_page_free_prep: freeing mapped page
Product: Base System Reporter: Niels Bakker <niels=freebsd>
Component: kernAssignee: freebsd-bugs (Nobody) <bugs>
Status: Closed Unable to Reproduce    
Severity: Affects Only Me CC: bdrewery, chris, grahamperrin, markj, niels=freebsd
Priority: --- Keywords: crash
Version: 13.0-STABLE   
Hardware: amd64   
OS: Any   

Description Niels Bakker 2021-06-24 13:55:00 UTC
FreeBSD 13.0-STABLE #6 stable/13-n246074-d77e57f1256-dirty (stable/13 plus the patch in #256610 applied)

The system crashed somewhere between 05:29:27-05:31:29 in the morning.

panic: vm_page_free_prep: freeing mapped page 0xfffffe0014633408
cpuid = 2
time = 1624505370
KDB: stack backtrace:
#0 0xffffffff80bfe6f5 at kdb_backtrace+0x65
#1 0xffffffff80bb40d1 at vpanic+0x181
#2 0xffffffff80bb3ea3 at panic+0x43
#3 0xffffffff80efd535 at vm_page_free_prep+0x215
#4 0xffffffff80ef54e2 at vm_page_free_toq+0x12
#5 0xffffffff80ef0a17 at vm_object_page_remove+0xb7
#6 0xffffffff80ee9620 at vm_map_entry_delete+0x120
#7 0xffffffff80ee52db at vm_map_delete+0x12b
#8 0xffffffff80ee9840 at vm_map_remove+0x80
#9 0xffffffff80b689f6 at exec_new_vmspace+0x1c6
#10 0xffffffff80b3c0ba at exec_elf64_imgact+0x84a
#11 0xffffffff80b6758c at kern_execve+0x63c
#12 0xffffffff80b66bda at sys_execve+0x5a
#13 0xffffffff8101aeae at amd64_syscall+0x12e
#14 0xffffffff80ff0f3e at fast_syscall_common+0xf8
Uptime: 2d5h39m2s

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
55              __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) #0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=<optimized out>)
    at /usr/src/sys/kern/kern_shutdown.c:399
#2  0xffffffff80bb3cfa in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:486
#3  0xffffffff80bb4140 in vpanic (fmt=<optimized out>, ap=<optimized out>)
    at /usr/src/sys/kern/kern_shutdown.c:919
#4  0xffffffff80bb3ea3 in panic (fmt=<unavailable>)
    at /usr/src/sys/kern/kern_shutdown.c:843
#5  0xffffffff80efd535 in vm_page_free_prep (m=m@entry=0xfffffe0014633408)
    at /usr/src/sys/vm/vm_page.c:3813
#6  0xffffffff80ef54e2 in vm_page_free_toq (m=m@entry=0xfffffe0014633408)
    at /usr/src/sys/vm/vm_page.c:3866
#7  0xffffffff80ef54cb in vm_page_free (m=<unavailable>, 
    m@entry=0xfffffe0014633408) at /usr/src/sys/vm/vm_page.c:1332
#8  0xffffffff80ef0a17 in vm_object_page_remove (object=0xfffff80290744318, 
    start=0, end=<optimized out>, options=2)
    at /usr/src/sys/vm/vm_object.c:2137
#9  0xffffffff80ee9620 in vm_map_entry_delete (
    map=map@entry=0xfffffe012b0ee3e0, entry=entry@entry=0xfffff8007818be40)
    at /usr/src/sys/vm/vm_map.c:3870
#10 0xffffffff80ee52db in vm_map_delete (map=map@entry=0xfffffe012b0ee3e0, 
    start=<optimized out>, start@entry=4096, end=end@entry=140737488355328)
    at /usr/src/sys/vm/vm_map.c:3984
#11 0xffffffff80ee9840 in vm_map_remove (map=map@entry=0xfffffe012b0ee3e0, 
    start=4096, end=140737488355328) at /usr/src/sys/vm/vm_map.c:4002
#12 0xffffffff80b689f6 in exec_new_vmspace (
    imgp=imgp@entry=0xfffffe0120557888, 
    sv=sv@entry=0xffffffff8196b580 <elf64_freebsd_sysvec_la48>)
    at /usr/src/sys/kern/kern_exec.c:1082
#13 0xffffffff80b3c0ba in exec_elf64_imgact (imgp=<optimized out>, 
    imgp@entry=<error reading variable: value is not available>)
    at /usr/src/sys/kern/imgact_elf.c:1248
#14 0xffffffff80b6758c in do_execve (td=0xfffffe011b85d3a0, 
    args=0xfffffe0120557a48, mac_p=<optimized out>, 
    oldvmspace=0xfffffe012b0ee3e0) at /usr/src/sys/kern/kern_exec.c:616
#15 kern_execve (td=<optimized out>, td@entry=0xfffffe011b85d3a0, 
    args=args@entry=0xfffffe0120557a48, mac_p=<optimized out>, 
    mac_p@entry=0x0, oldvmspace=oldvmspace@entry=0xfffffe012b0ee3e0)
    at /usr/src/sys/kern/kern_exec.c:353
#16 0xffffffff80b66bda in sys_execve (td=0xfffffe011b85d3a0, 
    td@entry=<error reading variable: value is not available>, 
    uap=0xfffffe011b85d788, 
    uap@entry=<error reading variable: value is not available>)
    at /usr/src/sys/kern/kern_exec.c:228
#17 0xffffffff8101aeae in syscallenter (td=<optimized out>)
    at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:189
#18 amd64_syscall (td=0xfffffe011b85d3a0, traced=0)
    at /usr/src/sys/amd64/amd64/trap.c:1184
#19 <signal handler called>
#20 0x00000000038fdf76 in ?? ()
Backtrace stopped: Cannot access memory at address 0xc0005be920
(kgdb) 

(kgdb) frame 14
#14 0xffffffff80b6758c in do_execve (td=0xfffffe011b85d3a0, args=0xfffffe0120557a48, mac_p=<optimized out>, 
    oldvmspace=0xfffffe012b0ee3e0) at /usr/src/sys/kern/kern_exec.c:616
616			error = (*execsw[i]->ex_imgact)(imgp);
(kgdb) print *args
$3 = {buf = 0xfffffe00c501c000 "/bin/ps", bufkva = 0xfffff80021ec45e0, 
  begin_argv = 0xfffffe00c501c008 "/bin/ps", begin_envv = 0xfffffe00c501c01a "MAIL=/var/mail/telegraf", 
  endp = 0xfffffe00c501c0dd "IC=en_US.UTF-8", fname = 0xfffffe00c501c000 "/bin/ps", fname_buf = 0x0, 
  stringspace = 524075, argc = 3, envc = 9, fd = 0, fdp = 0x0}
(kgdb) f 15
#15 kern_execve (td=<optimized out>, td@entry=0xfffffe011b85d3a0, args=args@entry=0xfffffe0120557a48, 
    mac_p=<optimized out>, mac_p@entry=0x0, oldvmspace=oldvmspace@entry=0xfffffe012b0ee3e0)
    at /usr/src/sys/kern/kern_exec.c:353
353		return (do_execve(td, args, mac_p, oldvmspace));
(kgdb) p *args
$1 = {buf = 0xfffffe00c501c000 "/bin/ps", bufkva = 0xfffff80021ec45e0, 
  begin_argv = 0xfffffe00c501c008 "/bin/ps", begin_envv = 0xfffffe00c501c01a "MAIL=/var/mail/telegraf", 
  endp = 0xfffffe00c501c0dd "IC=en_US.UTF-8", fname = 0xfffffe00c501c000 "/bin/ps", fname_buf = 0x0, 
  stringspace = 524075, argc = 3, envc = 9, fd = 0, fdp = 0x0}
(kgdb)
Comment 1 Mark Johnston freebsd_committer freebsd_triage 2021-06-24 14:00:06 UTC
Can you show kgdb output of:

(kgdb) p/x *(vm_page_t)0xfffffe0014633408

?
Comment 2 Niels Bakker 2021-06-24 14:01:33 UTC
__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
55		__asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) p/x *(vm_page_t)0xfffffe0014633408
$1 = {plinks = {q = {tqe_next = 0xfffffe0006b84ef8, tqe_prev = 0xfffffe000370cf08}, s = {ss = {
        sle_next = 0xfffffe0006b84ef8}}, memguard = {p = 0xfffffe0006b84ef8, v = 0xfffffe000370cf08}, uma = {
      slab = 0xfffffe0006b84ef8, zone = 0xfffffe000370cf08}}, listq = {tqe_next = 0xfffffe0010d52228, 
    tqe_prev = 0xfffff80290744360}, object = 0xfffff80290744318, pindex = 0x0, phys_addr = 0x322f55000, md = {
    pv_list = {tqh_first = 0xfffff801406ed4d8, tqh_last = 0xfffff801406ed4e0}, pv_gen = 0x72b3e843, 
    pat_mode = 0x6}, ref_count = 0x80000000, busy_lock = 0x1b85d3a2, a = {{flags = 0x9, queue = 0x1, 
      act_count = 0x5}, _bits = 0x5010009}, order = 0xd, pool = 0x0, flags = 0x1, oflags = 0x0, psind = 0x0, 
  segind = 0xa, valid = 0xff, dirty = 0xff}
(kgdb)
Comment 3 Mark Johnston freebsd_committer freebsd_triage 2021-06-24 14:11:20 UTC
(In reply to Niels Bakker from comment #2)
Thanks.  Please also show:

(kgdb) p/x *((vm_page_t)0xfffffe0014633408)->object
Comment 4 Niels Bakker 2021-06-24 14:36:37 UTC
(kgdb) p/x *((vm_page_t)0xfffffe0014633408)->object
$1 = {lock = {lock_object = {lo_name = 0xffffffff8114f4a3, lo_flags = 0x25630000, lo_data = 0x0, 
      lo_witness = 0x0}, rw_lock = 0xfffffe011b85d3a0}, object_list = {tqe_next = 0xfffff80290744420, 
    tqe_prev = 0xfffff80290744230}, shadow_head = {lh_first = 0xfffff803db649a50}, shadow_list = {
    le_next = 0x0, le_prev = 0xfffff80154173b88}, memq = {tqh_first = 0xfffffe0014633408, 
    tqh_last = 0xfffffe0010ad82f8}, rtree = {rt_root = 0xfffff803a8a57f30}, size = 0x28a, domain = {
    dr_policy = 0x0, dr_iter = 0x0}, generation = 0x1, cleangeneration = 0x1, ref_count = 0x2, 
  shadow_count = 0x1, memattr = 0x6, type = 0x1, flags = 0x3210, pg_color = 0x368, paging_in_progress = {
    __count = 0x1}, busy = {__count = 0x0}, resident_page_count = 0x1c, backing_object = 0x0, 
  backing_object_offset = 0x0, pager_object_list = {tqe_next = 0x0, tqe_prev = 0x0}, rvq = {lh_first = 0x0}, 
  handle = 0x0, un_pager = {vnp = {vnp_size = 0xb2a, writemappings = 0xfffff80113e04e10}, devp = {
      devp_pglist = {tqh_first = 0xb2a, tqh_last = 0xfffff80113e04e10}, ops = 0x0, dev = 0x0}, sgp = {
      sgp_pglist = {tqh_first = 0xb2a, tqh_last = 0xfffff80113e04e10}}, swp = {swp_tmpfs = 0xb2a, swp_blks = {
        pt_root = 0xfffff80113e04e10}, writemappings = 0x0}, phys = {ops = 0xb2a, {
        data_ptr = 0xfffff80113e04e10, data_val = 0xfffff80113e04e10}}}, cred = 0xfffff80060ea2e00, 
  charge = 0x28a000, umtx_data = 0x0}
Comment 5 Mark Johnston freebsd_committer freebsd_triage 2021-06-24 14:54:12 UTC
So OBJ_ONEMAPPING is set, and vm_map_delete() should have unmapped the range.  And yet a page in that range is still mapped.  PIP is non-zero and the object has no backing object.  Could you finally show:

(kgdb) f 9
(kgdb) p/x *entry

?
Comment 6 Niels Bakker 2021-06-24 14:55:26 UTC
(kgdb) f 9
#9  0xffffffff80ee9620 in vm_map_entry_delete (map=map@entry=0xfffffe012b0ee3e0, 
    entry=entry@entry=0xfffff8007818be40) at /usr/src/sys/vm/vm_map.c:3870
3870				vm_object_page_remove(object, offidxstart, offidxend,
(kgdb) p/x *entry
$1 = {left = 0xfffffe012b0ee3e0, right = 0xfffff80060bc9c00, start = 0x807368000, end = 0x8075f2000, 
  next_read = 0x80758e000, max_free = 0x7f3fda7f2000, object = {vm_object = 0xfffff80290744318, 
    sub_map = 0xfffff80290744318}, offset = 0x0, eflags = 0xc, protection = 0x3, max_protection = 0x7, 
  inheritance = 0x1, read_ahead = 0x0, wired_count = 0x0, cred = 0x0, wiring_thread = 0x0}
Comment 7 Mark Johnston freebsd_committer freebsd_triage 2021-06-24 16:04:02 UTC
Ok, so either the mappings belong to a different address space (i.e., OBJ_ONEMAPPING should not be set), or a new mapping was somehow created after the pmap_remove() call.  We can look at the pv entry to see if the mapping matches our map entry's bounds:

(kgdb) p *((vm_page_t)0xfffffe0014633408)->md.pv_list.tqh_first
Comment 8 Niels Bakker 2021-06-24 16:19:39 UTC
$1 = {pv_va = 34480750592, pv_next = {tqe_next = 0x0, tqe_prev = 0xfffffe0014633440}}
Comment 9 Mark Johnston freebsd_committer freebsd_triage 2021-06-30 14:12:27 UTC
(In reply to Niels Bakker from comment #8)
The mapping address is the same as the beginning of the vm_map_entry range.  I believe it isn't possible for a page to be mapped into the pmap after the pmap_remove() call, since we're in execve and other threads are suspended.

One other possibility is that the PTE was damaged somehow (e.g., due to a hardware bit-flip), so PG_V is clear and pmap_remove() skipped it.  Unfortunately, I think we can't check this because page table pages are not included in minidumps.

Has the panic occurred more than once?
Comment 10 Niels Bakker 2021-06-30 14:16:54 UTC
>Has the panic occurred more than once?
No. I've had other kernel panics with 13.0 that were actual bugs, so I figured I submit a bugreport for this panic too. Given that the hardware is from 2014 I'm more than ok witih writing this off to random bitflips.

Thank you so much for your effort in looking into this!
Comment 11 Bryan Drewery freebsd_committer freebsd_triage 2021-10-04 17:49:23 UTC
Bug 258932 may be the same thing.