| Summary: | vm_reserv_depopulate panic | | |
|---|---|---|---|
| Product: | Base System | Reporter: | dgilbert |
| Component: | kern | Assignee: | freebsd-bugs (Nobody) <bugs> |
| Status: | Open | | |
| Severity: | Affects Some People | CC: | jrtc27, markj |
| Priority: | --- | | |
| Version: | CURRENT | | |
| Hardware: | Any | | |
| OS: | Any | | |
Description (dgilbert, 2021-08-25 20:50:37 UTC):
Re observed on amd64. Google has 5 hits for `"vm_reserv_depopulate" "is clear"` (I assume this bug report will shortly appear too), all but one of which are just for the code itself; the last is a 5-year-old GitHub Gist (https://gist.github.com/nomadlogic/ba58e8fd01267fbf7a2fa4fcee29e2f7) for FreeBSD 12.0-CURRENT on amd64 when using the old freebsd-base-graphics tree. Looking at vm_reserv itself: whilst a caller can do stupid things and potentially cause crashes, my initial reading is that this KASSERT should be impossible no matter what the caller is doing.

The vmcore file on its own isn't useful without a copy of the corresponding kernel (/boot/kernel) and debug files (/usr/lib/debug/boot/kernel). It would be useful to see a dump of the vm_page and reservation in question:

```
(kgdb) frame 14
(kgdb) p/x *m
(kgdb) p/x *(vm_reserv_t)0xffffffd3e672c560
```

I will get on this ... but it might be tomorrow. I will run those commands _and_ I will upload those files.

```
(kgdb) frame 14
#14 0xffffffc0005a8da0 in vm_page_free_prep (m=0xffffffd3f1012168)
    at /usr/src/sys/vm/vm_page.c:3842
warning: Source file is more recent than executable.
3842            if ((m->flags & PG_PCPU_CACHE) == 0 && vm_reserv_free_page(m))
(kgdb) p/x *m
$1 = {plinks = {q = {tqe_next = 0xffffffd3f10121d0, tqe_prev = 0xffffffd3f1012100}, s = {ss = {
        sle_next = 0xffffffd3f10121d0}}, memguard = {p = 0xffffffd3f10121d0, v = 0xffffffd3f1012100}, uma = {
      slab = 0xffffffd3f10121d0, zone = 0xffffffd3f1012100}}, listq = {tqe_next = 0xffffffd3f10121d0,
    tqe_prev = 0xffffffd3f1012110}, object = 0x0, pindex = 0x2d0, phys_addr = 0x21f2d0000, md = {pv_list = {
      tqh_first = 0x0, tqh_last = 0xffffffd3f10121a0}, pv_gen = 0xf, pv_memattr = 0x2}, ref_count = 0x0,
  busy_lock = 0xfffffffe, a = {{flags = 0x18, queue = 0x1, act_count = 0x5}, _bits = 0x5010018},
  order = 0xc, pool = 0x0, flags = 0x0, oflags = 0x0, psind = 0x0, segind = 0x1, valid = 0x0, dirty = 0x0}
(kgdb) p/x *(vm_reserv_t)0xffffffd3e672c560
$2 = {lock = {lock_object = {lo_name = 0xffffffc00066006c, lo_flags = 0x1030000, lo_data = 0x0,
      lo_witness = 0xffffffd3ffd8e180}, mtx_lock = 0xffffffc2227f7100}, partpopq = {
    tqe_next = 0xffffffd3e6756fe0, tqe_prev = 0xffffffd3e679c240}, objq = {le_next = 0xffffffd3e67b04a0,
    le_prev = 0xffffffd0a46be0c0}, object = 0xffffffd0a46be000, pindex = 0x200, pages = 0xffffffd3f100cce8,
  popcnt = 0xef, domain = 0x0, inpartpopq = 0x1, lasttick = 0xa9d59012, popmap = {0x0, 0x0, 0x0,
    0xffffffffff000000, 0xfffffffffffe040f, 0x24fc0ffffc925927, 0xffffff0847fc9249, 0x1fffffffffffffff}}
```

https://termbin.com/q8g9 for that last bit. Heh... my standard window is 120 these days.

I tarred /boot/kernel and /usr/lib/debug/boot/kernel into the nextcloud directory. You can fetch them from the same place (https://nextcloud.towernet.ca/s/wPpj7zgxgDBAZ6q).

(In reply to dgilbert from comment #6)
Thank you. Is the panic reproducible at all?
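The dumped values can be checked by hand: the freed page's bit in the reservation's popmap is already clear, which is exactly the inconsistency the "is clear" KASSERT in vm_reserv_depopulate() trips over. A quick sanity-check sketch (plain arithmetic on the values above, not kernel code; it assumes the usual 512-page level-0 reservation with 64-bit popmap words):

```python
# Values taken from the kgdb dump ($1 and $2) above.
page_pindex = 0x2d0          # vm_page.pindex
resv_pindex = 0x200          # vm_reserv.pindex (first pindex of the reservation)
popmap = [0x0, 0x0, 0x0, 0xffffffffff000000, 0xfffffffffffe040f,
          0x24fc0ffffc925927, 0xffffff0847fc9249, 0x1fffffffffffffff]

# Index of the page within the 512-page reservation, then the
# word/bit position of that page in the 64-bit popmap words.
index = page_pindex - resv_pindex        # 0xd0 = 208
word, bit = divmod(index, 64)            # word 3, bit 16
populated = (popmap[word] >> bit) & 1

print(f"index={index} word={word} bit={bit} populated={populated}")
# index=208 word=3 bit=16 populated=0
```

popmap[3] is 0xffffffffff000000, whose low 24 bits are clear, so bit 16 for this page reads 0 (already free) even though the page is only now being freed.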
(In reply to dgilbert from comment #4)
I don't see anything obviously inconsistent, except:

popcnt(0xffffffffff000000) + popcnt(0xfffffffffffe040f) + popcnt(0x24fc0ffffc925927) + popcnt(0xffffff0847fc9249) + popcnt(0x1fffffffffffffff) = 231

and rv->popcnt = 239...

make -j8 buildworld produced that ... but it's the only time it happened to me. make -j4 subsequently passed. There are 4 processors on the box. I can run a few more make -j8 on it. Question, tho: can I upgrade to the security patches, or should I continue to test on this week-or-two-old version?

(In reply to dgilbert from comment #8)
I don't see any problem with updating first.

Here's what I've found so far. If I make -j8 with ccache full of answers, we're fine. If I make -j4, we're fine. If I make -j8 with ccache empty (but being filled), then 3 out of 4 (so far) buildworlds have ended in a random crash, and one in the panic you're looking at. The code does compile (at -j4). The system does have ZFS running on an NVMe drive. I'm going to keep trying to trigger the panic, but my feeling is the panic is only one of the possible outcomes of the error.

(In reply to dgilbert from comment #10)
It would be useful to see at least the panic message and stack trace from the other panics you've hit.

The count is one panic and 3 crashes out of 4 total attempts at -j8.
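The popcount mismatch noted in that comment is easy to reproduce from the dump. A small check (again just arithmetic on the dumped values, not kernel code):

```python
# Popmap words from the reservation dumped earlier (kgdb $2).
popmap = [0x0, 0x0, 0x0, 0xffffffffff000000, 0xfffffffffffe040f,
          0x24fc0ffffc925927, 0xffffff0847fc9249, 0x1fffffffffffffff]

# Total number of set bits across all popmap words.
total = sum(bin(w).count("1") for w in popmap)

print(total, 0xef, 0xef - total)
# 231 239 8
```

So the cached popcnt field claims 8 more populated pages than the popmap actually records, consistent with the theory that the popmap bits (or popcnt) were corrupted before the freed page's bit was consulted.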