panic: destroying non-empty racct: 3399680 allocated for resource 4 cpuid = 8 time = 1504448756 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe535fec7600 vpanic() at vpanic+0x19c/frame 0xfffffe535fec7680 kassert_panic() at kassert_panic+0x126/frame 0xfffffe535fec76f0 racct_destroy_locked() at racct_destroy_locked+0x7a/frame 0xfffffe535fec7720 racct_destroy() at racct_destroy+0x35/frame 0xfffffe535fec7740 uifree() at uifree+0x8b/frame 0xfffffe535fec7760 crfree() at crfree+0x85/frame 0xfffffe535fec7780 thread_wait() at thread_wait+0x9b/frame 0xfffffe535fec77a0 proc_reap() at proc_reap+0x3fe/frame 0xfffffe535fec77e0 proc_to_reap() at proc_to_reap+0x36c/frame 0xfffffe535fec7830 kern_wait6() at kern_wait6+0x28d/frame 0xfffffe535fec78d0 sys_wait4() at sys_wait4+0x78/frame 0xfffffe535fec7ac0 amd64_syscall() at amd64_syscall+0x769/frame 0xfffffe535fec7bf0 Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe535fec7bf0 --- syscall (7, FreeBSD ELF64, sys_wait4), rip = 0x79629a, rsp = 0x7fffffffcf78, rbp = 0x7fffffffd5f0 --- Uptime: 2d19h33m23s borg.lerctr.org /var/crash # cat version.txt FreeBSD 12.0-CURRENT #5 r323021: Wed Aug 30 12:44:11 CDT 2017 root@borg.lerctr.org:/usr/obj/usr/src/sys/VT-LER borg.lerctr.org /var/crash #
This happens on a large poudriere run.
I don't think avg@ can really provide much input here. This looks like a wait*(2)/proc change caused this issue.
This is still happening. I'll attach the text dumps I Have.
Created attachment 186996 [details] Latest (2017-10-07) text dump
Created attachment 186997 [details] info (2017-10-07) from last dump
I've also just got this, also at the end of a poudriere run: panic: destroying non-empty racct: 2334720 allocated for resource 4 cpuid = 0 curthread: 0xfffff80192300580 stack: 0xfffffe090f856000 - 0xfffffe090f85a000 stack pointer: 0xfffffe090f859910 time = 1520178482 KDB: stack backtrace: db_trace_self_wrapper() at 0xffffffff804670bb = db_trace_self_wrapper+0x2b/frame 0xfffffe090f859840 kdb_backtrace() at 0xffffffff806d4c79 = kdb_backtrace+0x39/frame 0xfffffe090f8598f0 vpanic() at 0xffffffff80699da6 = vpanic+0x166/frame 0xfffffe090f859930 kassert_panic() at 0xffffffff80699adb = kassert_panic+0x16b/frame 0xfffffe090f8599a0 racct_destroy_locked() at 0xffffffff80689f82 = racct_destroy_locked+0x72/frame 0xfffffe090f8599d0 racct_destroy() at 0xffffffff80689ee5 = racct_destroy+0x35/frame 0xfffffe090f8599f0 prison_racct_free_locked() at 0xffffffff80667adc = prison_racct_free_locked+0x4c/frame 0xfffffe090f859a10 prison_racct_free() at 0xffffffff80667a2d = prison_racct_free+0x6d/frame 0xfffffe090f859a30 prison_racct_detach() at 0xffffffff80667b8e = prison_racct_detach+0x3e/frame 0xfffffe090f859a50 prison_deref() at 0xffffffff8066583e = prison_deref+0x2fe/frame 0xfffffe090f859a90 prison_remove_one() at 0xffffffff80666ca5 = prison_remove_one+0x145/frame 0xfffffe090f859ac0 sys_jail_remove() at 0xffffffff80666b49 = sys_jail_remove+0x1f9/frame 0xfffffe090f859b10 syscallenter() at 0xffffffff808fd92b = syscallenter+0x2cb/frame 0xfffffe090f859b50 amd64_syscall() at 0xffffffff808fd4e7 = amd64_syscall+0x17/frame 0xfffffe090f859bf0 Xfast_syscall() at 0xffffffff808e485b = Xfast_syscall+0xfb/frame 0xfffffe090f859bf0
I think that my "fix" for bug 210315 was not a real fix. I have another theory now which, if it holds, may explain all panics reported so far in that bug and in this bug. I suspect that the problem could be with this pattern used in many places where process credentials change: proc_set_cred(p, newcred); PROC_UNLOCK(p); racct_proc_ucred_changed(p, oldcred, newcred); You can see this in do_jail_attach() as well as in sys_setloginclass(), sys_setuid() and alike. Now, let's compare this to how RACCT_RSS is updated by vm_daemon(): PROC_LOCK(p) racct_set(p, RACCT_RSS, rsize); PROC_UNLOCK(p); This happens concurrently to jail_attach or setuid, etc, so it's possible that once in a while this update will happen while the process has already been assigned the newcred, but before racct_proc_ucred_changed() is called. In this case the new value would be applied to the process and its _new_ credentials will be adjusted. But when racct_proc_ucred_changed() is called we will transfer the process's resource from oldcred to newcred. So, essentially we will removed more than needed (provided the delta was positive) from oldcred and add more than needed to newcred. I think that the theory explains the panic in comment #6 where a jail got some extra RACCT_RSS. It also seems to explain bug# 210315, comment #2 when an underflow happened in the global space (prison0) during shutdown. I think that doing racct_proc_ucred_changed() under the proc lock would prevent the problem, but there must be a good reason why that function requires that it is not called with under the lock... So, not sure...
*** Bug 210315 has been marked as a duplicate of this bug. ***
I devised a scenario for this bug. Running it for 10 minutes reliably reproduces the crash. In one shell executed this small script as root: while true ; do ( sleep 2 ; exec su -c xuser -m root -c ':' ) & sleep 0.001 ; done The idea behind this script is to make a process noticed by vm_daemon so that its RACCT_RSS gets set and then call setloginclass() hoping to expose the race described in comment #7. "xuser" login class is selected because it's not used by anything else, so the experiment is more pure. In another shell executed this dtrace command: dtrace -w -n 'sys_setloginclass:entry { self->t = 1; } sys_setloginclass:return { self->t = 0; } proc_set_cred:entry /self->t/ { chill(100000000); }' The idea is to make proc_set_cred() take much longer than usual while holding the proc lock, so that vm_daemon has much higher chances of running into that lock and getting blocked on it. When sys_setloginclass() releases the lock, vm_daemon gets a chance to run and expose the race. When stop the test and initiate a graceful reboot, I get a panic about RACCT_RSS underflow in "root" login class: panic: racct_adjust_resource: resource 4 usage < 0 cpuid = 2 time = 1523437499 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe002b46d860 vpanic() at vpanic+0x19c/frame 0xfffffe002b46d8e0 kassert_panic() at kassert_panic+0x126/frame 0xfffffe002b46d950 racct_adjust_resource() at racct_adjust_resource+0xd2/frame 0xfffffe002b46d980 racct_set_locked() at racct_set_locked+0x135/frame 0xfffffe002b46d9d0 racct_set() at racct_set+0x51/frame 0xfffffe002b46da00 vmspace_exit() at vmspace_exit+0x147/frame 0xfffffe002b46da40 exit1() at exit1+0x5ad/frame 0xfffffe002b46dab0 sys_sys_exit() at sys_sys_exit+0xd/frame 0xfffffe002b46dac0 amd64_syscall() at amd64_syscall+0x79b/frame 0xfffffe002b46dbf0 fast_syscall_common() at fast_syscall_common+0x105/frame 0xfffffe002b46dbf0 --- syscall (1, FreeBSD ELF64, sys_sys_exit), rip = 0x8008eb77a, rsp = 0x7fffffffe1a8, rbp = 0x7fffffffe1c0 --- (kgdb) bt #0 kdb_enter (why=0xffffffff814ee179 "panic", msg=<optimized out>) at /usr/devel/svn/autoro/sys/kern/subr_kdb.c:479 #1 0xffffffff80ac6549 in vpanic (fmt=<optimized out>, ap=0xfffffe002b46d920) at /usr/devel/svn/autoro/sys/kern/kern_shutdown.c:826 #2 0xffffffff80ac62f6 in kassert_panic (fmt=0xffffffff814ec548 "%s: resource %d usage < 0") at /usr/devel/svn/autoro/sys/kern/kern_shutdown.c:723 #3 0xffffffff80ab6ea2 in racct_adjust_resource (racct=0xfffff800074c6dd0, resource=4, amount=<optimized out>) at /usr/devel/svn/autoro/sys/kern/kern_racct.c:526 #4 0xffffffff80ab6965 in racct_set_locked (p=<optimized out>, resource=4, amount=<optimized out>, force=0) at /usr/devel/svn/autoro/sys/kern/kern_racct.c:618 #5 0xffffffff80ab67d1 in racct_set (p=0xfffff800149f7000, resource=4, amount=0) at /usr/devel/svn/autoro/sys/kern/kern_racct.c:739 #6 0xffffffff80de9577 in vmspace_container_reset (p=<optimized out>) at /usr/devel/svn/autoro/sys/vm/vm_map.c:318 #7 vmspace_exit (td=0xfffff8005d353000) at /usr/devel/svn/autoro/sys/vm/vm_map.c:427 #8 0xffffffff80a8175d in exit1 (td=0xfffff8005d353000, rval=<optimized out>, signo=0) at /usr/devel/svn/autoro/sys/kern/kern_exit.c:403 #9 0xffffffff80a811ad in sys_sys_exit (td=0xffffffff81d58408 <cnputs_mtx>, uap=<optimized out>) at /usr/devel/svn/autoro/sys/kern/kern_exit.c:181 #10 0xffffffff80f7df7b in syscallenter (td=0xfffff8005d353000) at /usr/devel/svn/autoro/sys/amd64/amd64/../../kern/subr_syscall.c:134 #11 amd64_syscall (td=0xfffff8005d353000, traced=0) at /usr/devel/svn/autoro/sys/amd64/amd64/trap.c:936 (kgdb) fr 3 #3 0xffffffff80ab6ea2 in racct_adjust_resource (racct=0xfffff800074c6dd0, resource=4, amount=<optimized out>) at /usr/devel/svn/autoro/sys/kern/kern_racct.c:526 526 KASSERT(RACCT_IS_SLOPPY(resource) || RACCT_IS_DECAYING(resource), (kgdb) p *racct $1 = {r_resources = {15961661599, 98304, 794624, 0, -1314816, 0, 3, 0, 51683328, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 2817797305501, 132851, 0, 0, 0, 0}, r_rule_links = {lh_first = 0x0}} (kgdb) p *p->p_ucred->cr_loginclass $4 = {lc_next = {le_next = 0xfffff800030692c0, le_prev = 0xfffff8000377b380}, lc_name = "root", '\000' <repeats 28 times>, lc_refcount = 6, lc_racct = 0xfffff800074c6dd0} (kgdb) p *p->p_ucred->cr_loginclass->lc_racct $5 = {r_resources = {15961661599, 98304, 794624, 0, -1314816, 0, 3, 0, 51683328, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 2817797305501, 132851, 0, 0, 0, 0}, r_rule_links = {lh_first = 0x0}}
See https://reviews.freebsd.org/D15048
A commit references this bug: Author: avg Date: Fri Apr 20 13:08:05 UTC 2018 New revision: 332816 URL: https://svnweb.freebsd.org/changeset/base/332816 Log: call racct_proc_ucred_changed() under the proc lock The lock is required to ensure that the switch to the new credentials and the transfer of the process's accounting data from the old credentials to the new ones is done atomically. Otherwise, some updates may be applied to the new credentials and then additionally transferred from the old credentials if the updates happen after proc_set_cred() and before racct_proc_ucred_changed(). The problem is especially pronounced for RACCT_RSS because - there is a strict accounting for this resource (it's reclaimable) - it's updated asynchronously by the vm daemon - it's updated by setting an absolute value instead of applying a delta I had to remove a call to rctl_proc_ucred_changed() from racct_proc_ucred_changed() and make all callers of latter call the former as well. The reason is that rctl_proc_ucred_changed, as it is implemented now, cannot be called while holding the proc lock, so the lock is dropped after calling racct_proc_ucred_changed. Additionally, I've added calls to crhold / crfree around the rctl call, because without the proc lock there is no gurantee that the new credentials, owned by the process, will stay stable. That does not eliminate a possibility that the credentials passed to the rctl will get stale. Ideally, rctl_proc_ucred_changed should be able to work under the proc lock. Many thanks to kib for pointing out the above problems. PR: 222027 Discussed with: kib No comment: trasz MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D15048 Changes: head/sys/kern/kern_jail.c head/sys/kern/kern_loginclass.c head/sys/kern/kern_prot.c head/sys/kern/kern_racct.c head/sys/kern/kern_rctl.c