Bug 210315 - panic: destroying non-empty racct: 2113536 allocated for resource 4
Summary: panic: destroying non-empty racct: 2113536 allocated for resource 4
Status: Closed DUPLICATE of bug 222027
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: CURRENT
Hardware: Any Any
: --- Affects Some People
Assignee: Andriy Gapon
URL:
Keywords: patch
Depends on:
Blocks:
 
Reported: 2016-06-16 08:38 UTC by Andriy Gapon
Modified: 2018-04-06 17:27 UTC (History)
3 users (show)

See Also:
avg: mfc-stable11?
avg: mfc-stable10?


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Andriy Gapon freebsd_committer freebsd_triage 2016-06-16 08:38:18 UTC
I have got this panic twice so far.  Both times were just at the end of a poudriere bulk build.  The problem is not readily reproducible, so it could be that a certain port somehow provokes it.
I can provide more details. I can test any diagnostic patches.

Some details:
(kgdb) bt
#0  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:295
#1  0xffffffff8062d7ef in kern_reboot (howto=<optimized out>) at
/usr/src/sys/kern/kern_shutdown.c:363
#2  0xffffffff8062de38 in vpanic (fmt=<optimized out>, ap=0xfffffe0519b73920) at
/usr/src/sys/kern/kern_shutdown.c:639
#3  0xffffffff8062db43 in panic (fmt=<unavailable>) at
/usr/src/sys/kern/kern_shutdown.c:572
#4  0xffffffff8061ef1c in racct_destroy_locked (racctp=<optimized out>) at
/usr/src/sys/kern/kern_racct.c:478
#5  0xffffffff8061ee45 in racct_destroy (racct=0xfffff802f6301518) at
/usr/src/sys/kern/kern_racct.c:495
#6  0xffffffff805fdd3c in prison_racct_free_locked (prr=0xfffff802f6301400) at
/usr/src/sys/kern/kern_jail.c:4564
#7  0xffffffff805fdc8d in prison_racct_free (prr=0xfffff802f6301400) at
/usr/src/sys/kern/kern_jail.c:4583
#8  0xffffffff805fddee in prison_racct_detach (pr=0xfffff802b0730000) at
/usr/src/sys/kern/kern_jail.c:4658
#9  0xffffffff805fb2cb in prison_deref (pr=<optimized out>, flags=3) at
/usr/src/sys/kern/kern_jail.c:2663
#10 0xffffffff805fca25 in prison_remove_one (pr=<optimized out>) at
/usr/src/sys/kern/kern_jail.c:2358
#11 0xffffffff805fc8e4 in sys_jail_remove (td=<optimized out>, uap=<optimized
out>) at /usr/src/sys/kern/kern_jail.c:2313
#12 0xffffffff80820ddd in syscallenter (td=0xfffff801146019e0,
sa=0xfffffe0519b73b80) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:135
#13 0xffffffff808209af in amd64_syscall (td=0xfffff801146019e0, traced=0) at
/usr/src/sys/amd64/amd64/trap.c:943

RACCT_RSS is 4.

(kgdb) p *prr
$5 = {
  prr_next = {
    le_next = 0xfffff80382fe4400,
    le_prev = 0xfffff8017ac90600
  },
  prr_name = "basejail-default-job-03", '\000' <repeats 232 times>,
  prr_refcount = 0,
  prr_racct = 0xfffff802e3f520b0
}
(kgdb) p *prr->prr_racct
$6 = {
  r_resources = {13884177072, 0, 0, 0, 2113536, 0 <repeats 14 times>,
13611325009, 0},
  r_rule_links = {
    lh_first = 0x0
  }
}
Comment 1 Andriy Gapon freebsd_committer freebsd_triage 2016-12-09 12:19:32 UTC
I've just got another of these panics.

I wonder if it could be caused by this code:

			size = vmspace_resident_count(vm);
			if (size >= limit) {
				vm_pageout_map_deactivate_pages(
				    &vm->vm_map, limit);
			}
#ifdef RACCT
			if (racct_enable) {
				rsize = IDX_TO_OFF(size);
				PROC_LOCK(p);
				racct_set(p, RACCT_RSS, rsize);
				ravailable = racct_get_available(p, RACCT_RSS);
				PROC_UNLOCK(p);


Specifically, we calculate rsize from size, but the value of the latter could be outdated if size was larger than limit and vm_pageout_map_deactivate_pages() got called.

As a side note, I wonder why greater-than-or-equal is used for comparing size and limit.
Comment 2 Larry Rosenman freebsd_committer freebsd_triage 2017-01-20 14:40:04 UTC
Andriy asked me to add my panic to this, it seems related.

FreeBSD borg.lerctr.org 12.0-CURRENT FreeBSD 12.0-CURRENT #13 r311997: Sat Jan 14 22:35:29 CST 2017     root@borg.lerctr.org:/usr/obj/usr/src/sys/VT-LER  amd64

panic: racct_adjust_resource: resource 4 usage < 0

GNU gdb (GDB) 7.12 [GDB v7.12 for FreeBSD]
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd12.0".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /boot/kernel/kernel...Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug...done.
done.

Unread portion of the kernel message buffer:
<118>.
<118>Terminated
<118>Jan 19 17:54:50 192.168.200.11 last message repeated 13 times
<118>Jan 19 17:54:59 borg syslogd: exiting on signal 15
panic: racct_adjust_resource: resource 4 usage < 0
cpuid = 1
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe2eb7c18830
vpanic() at vpanic+0x186/frame 0xfffffe2eb7c188b0
kassert_panic() at kassert_panic+0x126/frame 0xfffffe2eb7c18920
racct_adjust_resource() at racct_adjust_resource+0xca/frame 0xfffffe2eb7c18950
racct_set_locked() at racct_set_locked+0xec/frame 0xfffffe2eb7c18990
racct_set() at racct_set+0x54/frame 0xfffffe2eb7c189c0
vmspace_exit() at vmspace_exit+0x147/frame 0xfffffe2eb7c18a00
exit1() at exit1+0x56b/frame 0xfffffe2eb7c18a60
sys_sys_exit() at sys_sys_exit+0xd/frame 0xfffffe2eb7c18a70
amd64_syscall() at amd64_syscall+0x2ea/frame 0xfffffe2eb7c18bf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe2eb7c18bf0
--- syscall (1, FreeBSD ELF64, sys_sys_exit), rip = 0x8025c916a, rsp = 0x7fffffffebd8, rbp = 0x7fffffffebf0 ---
Uptime: 4d4h19m38s
Dumping 12670 out of 64463 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

Reading symbols from /boot/kernel/zfs.ko...Reading symbols from /usr/lib/debug//boot/kernel/zfs.ko.debug...done.
done.
Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from /usr/lib/debug//boot/kernel/opensolaris.ko.debug...done.
done.
Reading symbols from /boot/kernel/linux.ko...Reading symbols from /usr/lib/debug//boot/kernel/linux.ko.debug...done.
done.
Reading symbols from /boot/kernel/linux_common.ko...Reading symbols from /usr/lib/debug//boot/kernel/linux_common.ko.debug...done.
done.
Reading symbols from /boot/kernel/if_lagg.ko...Reading symbols from /usr/lib/debug//boot/kernel/if_lagg.ko.debug...done.
done.
Reading symbols from /boot/kernel/snd_envy24ht.ko...Reading symbols from /usr/lib/debug//boot/kernel/snd_envy24ht.ko.debug...done.
done.
Reading symbols from /boot/kernel/snd_spicds.ko...Reading symbols from /usr/lib/debug//boot/kernel/snd_spicds.ko.debug...done.
done.
Reading symbols from /boot/kernel/coretemp.ko...Reading symbols from /usr/lib/debug//boot/kernel/coretemp.ko.debug...done.
done.
Reading symbols from /boot/kernel/fuse.ko...Reading symbols from /usr/lib/debug//boot/kernel/fuse.ko.debug...done.
done.
Reading symbols from /boot/kernel/ichsmb.ko...Reading symbols from /usr/lib/debug//boot/kernel/ichsmb.ko.debug...done.
done.
Reading symbols from /boot/kernel/smbus.ko...Reading symbols from /usr/lib/debug//boot/kernel/smbus.ko.debug...done.
done.
Reading symbols from /boot/kernel/ichwd.ko...Reading symbols from /usr/lib/debug//boot/kernel/ichwd.ko.debug...done.
done.
Reading symbols from /boot/kernel/cpuctl.ko...Reading symbols from /usr/lib/debug//boot/kernel/cpuctl.ko.debug...done.
done.
Reading symbols from /boot/kernel/cryptodev.ko...Reading symbols from /usr/lib/debug//boot/kernel/cryptodev.ko.debug...done.
done.
Reading symbols from /boot/kernel/dtraceall.ko...Reading symbols from /usr/lib/debug//boot/kernel/dtraceall.ko.debug...done.
done.
Reading symbols from /boot/kernel/profile.ko...Reading symbols from /usr/lib/debug//boot/kernel/profile.ko.debug...done.
done.
Reading symbols from /boot/kernel/dtrace.ko...Reading symbols from /usr/lib/debug//boot/kernel/dtrace.ko.debug...done.
done.
Reading symbols from /boot/kernel/systrace_freebsd32.ko...Reading symbols from /usr/lib/debug//boot/kernel/systrace_freebsd32.ko.debug...done.
done.
Reading symbols from /boot/kernel/systrace.ko...Reading symbols from /usr/lib/debug//boot/kernel/systrace.ko.debug...done.
done.
Reading symbols from /boot/kernel/sdt.ko...Reading symbols from /usr/lib/debug//boot/kernel/sdt.ko.debug...done.
done.
Reading symbols from /boot/kernel/fasttrap.ko...Reading symbols from /usr/lib/debug//boot/kernel/fasttrap.ko.debug...done.
done.
Reading symbols from /boot/kernel/fbt.ko...Reading symbols from /usr/lib/debug//boot/kernel/fbt.ko.debug...done.
done.
Reading symbols from /boot/kernel/dtnfscl.ko...Reading symbols from /usr/lib/debug//boot/kernel/dtnfscl.ko.debug...done.
done.
Reading symbols from /boot/kernel/dtmalloc.ko...Reading symbols from /usr/lib/debug//boot/kernel/dtmalloc.ko.debug...done.
done.
Reading symbols from /boot/modules/vboxdrv.ko...(no debugging symbols found)...done.
Reading symbols from /boot/kernel/ipmi.ko...Reading symbols from /usr/lib/debug//boot/kernel/ipmi.ko.debug...done.
done.
Reading symbols from /boot/kernel/ipmi_linux.ko...Reading symbols from /usr/lib/debug//boot/kernel/ipmi_linux.ko.debug...done.
done.
Reading symbols from /boot/kernel/hwpmc.ko...Reading symbols from /usr/lib/debug//boot/kernel/hwpmc.ko.debug...done.
done.
Reading symbols from /boot/kernel/filemon.ko...Reading symbols from /usr/lib/debug//boot/kernel/filemon.ko.debug...done.
done.
Reading symbols from /boot/kernel/uhid.ko...Reading symbols from /usr/lib/debug//boot/kernel/uhid.ko.debug...done.
done.
Reading symbols from /boot/modules/vboxnetflt.ko...(no debugging symbols found)...done.
Reading symbols from /boot/kernel/netgraph.ko...Reading symbols from /usr/lib/debug//boot/kernel/netgraph.ko.debug...done.
done.
Reading symbols from /boot/kernel/ng_ether.ko...Reading symbols from /usr/lib/debug//boot/kernel/ng_ether.ko.debug...done.
done.
Reading symbols from /boot/modules/vboxnetadp.ko...(no debugging symbols found)...done.
Reading symbols from /boot/kernel/nullfs.ko...Reading symbols from /usr/lib/debug//boot/kernel/nullfs.ko.debug...done.
done.
Reading symbols from /boot/kernel/fdescfs.ko...Reading symbols from /usr/lib/debug//boot/kernel/fdescfs.ko.debug...done.
done.
Reading symbols from /boot/kernel/linprocfs.ko...Reading symbols from /usr/lib/debug//boot/kernel/linprocfs.ko.debug...done.
done.
Reading symbols from /boot/kernel/tmpfs.ko...Reading symbols from /usr/lib/debug//boot/kernel/tmpfs.ko.debug...done.
done.
__curthread () at ./machine/pcpu.h:222
222             __asm("movq %%gs:%1,%0" : "=r" (td)
(kgdb) #0  __curthread () at ./machine/pcpu.h:222
#1  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:318
#2  0xffffffff80a2ffb5 in kern_reboot (howto=<optimized out>)
    at /usr/src/sys/kern/kern_shutdown.c:386
#3  0xffffffff80a30590 in vpanic (fmt=<optimized out>, ap=0xfffffe2eb7c188f0)
    at /usr/src/sys/kern/kern_shutdown.c:779
#4  0xffffffff80a303c6 in kassert_panic (
    fmt=0xffffffff813ee4fb "%s: resource %d usage < 0")
    at /usr/src/sys/kern/kern_shutdown.c:669
#5  0xffffffff80a21eca in racct_adjust_resource (racct=0xfffff8001b7c00d0,
    resource=4, amount=<optimized out>) at /usr/src/sys/kern/kern_racct.c:528
#6  0xffffffff80a21acc in racct_set_locked (p=0xfffff80055f41528,
    resource=<optimized out>, amount=0, force=0)
    at /usr/src/sys/kern/kern_racct.c:718
#7  0xffffffff80a21994 in racct_set (p=0xfffff80055f41528, resource=4,
    amount=0) at /usr/src/sys/kern/kern_racct.c:741
#8  0xffffffff80d0f8e7 in vmspace_container_reset (p=<optimized out>)
    at /usr/src/sys/vm/vm_map.c:311
#9  vmspace_exit (td=<optimized out>) at /usr/src/sys/vm/vm_map.c:420
#10 0xffffffff809f01ab in exit1 (td=<optimized out>, rval=<optimized out>,
    signo=<optimized out>) at /usr/src/sys/kern/kern_exit.c:399
#11 0xffffffff809efc3d in sys_sys_exit (td=<unavailable>, uap=<optimized out>)
    at /usr/src/sys/kern/kern_exit.c:178
#12 0xffffffff80e9a98a in syscallenter (td=0xfffff80055de6000,
    sa=<optimized out>)
    at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:135
#13 amd64_syscall (td=0xfffff80055de6000, traced=0)
    at /usr/src/sys/amd64/amd64/trap.c:902
#14 <signal handler called>
Can't read data for section '.eh_frame' in file '/'
(kgdb)

vmcore IS available.
Comment 3 Larry Rosenman freebsd_committer freebsd_triage 2017-01-20 15:08:59 UTC
my panic was on a shutdown -r for the record.
Comment 4 Larry Rosenman freebsd_committer freebsd_triage 2017-01-20 19:31:37 UTC
I had run a BUNCH of poudriere runs during this boots lifetime if that's relevant.

My poudriere web is at:
http://home.lerctr.org:8888/
Comment 5 Edward Tomasz Napierala freebsd_committer freebsd_triage 2017-01-21 11:45:25 UTC
I have no idea what the reason could be, and I don't really have a good idea on how to track down the root of the cause for those bugs.

Perhaps we should just throw those kasserts under DIAGNOSTICS for now?
Comment 6 Andriy Gapon freebsd_committer freebsd_triage 2017-01-21 21:02:10 UTC
Disabling the checks would, of course, make the symptoms disappear.
But I still wonder what causes the problem(s).

In Larry's case it's especially strange, because starting with racct_set_locked() all calculations are done under the lock.  So, how can it be possible that we end up with any value other than zero in racct_adjust_resource()?
Essentially we do:
delta = 0 - value;
value = value + delta;

It would be interesting to examine the coredump with kgdb.
maybe it would give some clues about how the impossible became possible.

FWIW, I was not able to find any write access to r_resources that was not protected by racct_lock.
Comment 7 Larry Rosenman freebsd_committer freebsd_triage 2017-01-21 21:04:15 UTC
I do have the core, if you tell me commands, or I could give someone(tm) an account....

borg.lerctr.org /var/crash $ ls -lh *6*
-rw-r--r--  1 root  wheel   164K Jan 19 18:04 core.txt.6
-rw-r--r--  1 root  wheel   475B Jan 19 18:00 info.6
-rw-r--r--  1 root  wheel   8.2K Jan 19 18:04 panicmail.6
-rw-r--r--  1 root  wheel    12K Jan 19 18:04 panicmail.6.enc
-rw-r--r--  1 root  wheel    12G Jan 19 18:03 vmcore.6
borg.lerctr.org /var/crash $
Comment 8 Edward Tomasz Napierala freebsd_committer freebsd_triage 2017-01-22 12:50:52 UTC
I suspect it was a racct for either user, jail, or login class, not the proc racct.
Comment 9 Edward Tomasz Napierala freebsd_committer freebsd_triage 2017-01-22 12:52:37 UTC
Larry, thanks for the offer, but I don't really have any idea what to look for.  The problem with those kasserts is they fire when they detect something went wrong, not when something _is_ going wrong.
Comment 10 Larry Rosenman freebsd_committer freebsd_triage 2017-01-22 14:36:59 UTC
I've held off svn up'ing to keep the source tree intact.  Do you want me to continue in that vain and save the vmcore or don't you think you can use it?

With poudriere in the mix jail's are a good possibility.
Comment 11 Andriy Gapon freebsd_committer freebsd_triage 2017-01-30 13:14:14 UTC
(In reply to Edward Tomasz Napierala from comment #8)
Yes, that must be true.  It is impossible to get that problem for the process racct.

I have one hypothesis now.
Consider the following sequence of events:
0. the vm daemon gets a reference on a vm space of a process
1. the vm daemon invokes the first racct_set(p, RACCT_RSS, rsize) call for the process
2. the process exits and vmspace_exit() is called for the the vm space
3. the vm daemon calls racct_set(p, RACCT_RSS, rsize) within if (rsize > ravailable) block
4. the vm daemon releases the vm space

I think that in this scenario we would do the following:
1. set RACCT_RSS of the process to X
2. set RACCT_RSS of the process to zero
3. set RACCT_RSS of the process (now a zombie) to Y

I think that that last call would screw up the accounting for raccts that include the process.  In fact, it also screws up the accounting for the process itself, but that is masked by the code in racct_proc_exit() that sets to zero all reclaimable resources.
Ah, and because of what racct_proc_exit() does, we have this:
4. set RACCT_RSS of the process (now a zombie) to zero

So, it looks like only the jail accounting is affected and we see it only in the window between exit1() dropping the reference to the jail and proc_reap().

So, to recap, the full sequence:
0. the vm daemon gets a reference on a vm space of a jailed process
1. the vm daemon invokes the first racct_set(p, RACCT_RSS, rsize) call for the process
2. the process exits and vmspace_exit() is called for the the vm space
3. the vm daemon calls racct_set(p, RACCT_RSS, rsize) within if (rsize > ravailable) block
4. the vm daemon releases the vm space
5. the process drops its reference to the jail
6. the jail gets destroyed <-- here we hit the problem

I think that the above could explain the panics that I saw.
Not sure if the Larry's panic can be fit into the hypothesis.
Comment 12 Larry Rosenman freebsd_committer freebsd_triage 2017-01-30 13:56:49 UTC
I **WAS** running LOTS of Poudriere runs, ergo LOTS of jails, so yes. 

My one panic was at a shutdown -r.
Comment 13 Andriy Gapon freebsd_committer freebsd_triage 2017-02-07 09:36:38 UTC
(In reply to Andriy Gapon from comment #11)
An attempt to fix the problem is here: https://reviews.freebsd.org/D9464
Comment 14 commit-hook freebsd_committer freebsd_triage 2017-02-14 13:55:01 UTC
A commit references this bug:

Author: avg
Date: Tue Feb 14 13:54:05 UTC 2017
New revision: 313730
URL: https://svnweb.freebsd.org/changeset/base/313730

Log:
  try to fix RACCT_RSS accounting

  There could be a race between the vm daemon setting RACCT_RSS based on
  the vm space and vmspace_exit (called from exit1) resetting RACCT_RSS to
  zero.  In that case we can get a zombie process with non-zero RACCT_RSS.
  If the process is jailed, that may break accounting for the jail.
  There could be other consequences.

  Fix this race in the vm daemon by updating RACCT_RSS only when a process
  is in the normal state.  Also, make accounting a little bit more
  accurate by refreshing the page resident count after calling
  vm_pageout_map_deactivate_pages().
  Finally, add an assert that the RSS is zero when a process is reaped.

  PR:		210315
  Reviewed by:	trasz
  Differential Revision: https://reviews.freebsd.org/D9464

Changes:
  head/sys/kern/kern_racct.c
  head/sys/vm/vm_pageout.c
Comment 15 Andriy Gapon freebsd_committer freebsd_triage 2017-02-27 17:30:21 UTC
MFC-ed as base r314347 and base r314348.
Comment 16 Andriy Gapon freebsd_committer freebsd_triage 2018-04-06 17:24:42 UTC
The bug has not been really fixed.
Comment 17 Andriy Gapon freebsd_committer freebsd_triage 2018-04-06 17:27:05 UTC
But it's been superseded by bug 222027, so mark this one as a duplicate of that one (even though this is an earlier bug and that one is more recent).

*** This bug has been marked as a duplicate of bug 222027 ***