Bug 193457 - Enabling RACCT/RCTL causes system hang
Summary: Enabling RACCT/RCTL causes system hang
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 9.3-RELEASE
Hardware: amd64 Any
: --- Affects Some People
Assignee: Edward Tomasz Napierala
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-09-08 08:34 UTC by j.david.lists
Modified: 2014-09-15 13:07 UTC (History)
2 users (show)

See Also:


Attachments
Test case (585 bytes, text/plain)
2014-09-09 07:42 UTC, Edward Tomasz Napierala
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description j.david.lists 2014-09-08 08:34:42 UTC
When compiling a 9.3 kernel with the RACCT & RCTL options for rctl, the resulting system hangs repeatably.

To reproduce this, create a persistent jail and run some processes in it.  Nothing needs to be set up for rctl limits.

Once there are no more processes in the jail, run rctl on the jail:

# rctl -hu jail:jailname

The entire system will hang.

db> show threads
  100285 (0xfffffe0040a8a900) (stack 0xffffff812beaa000)  sched_switch() at sched_switch+0x250/frame 0xffffff812bead770
... probably over 100 more sched_switch() lines
  100046 (0xfffffe0002d03000) (stack 0xffffff81222a8000)  sched_switch() at sched_switch+0x250/frame 0xffffff81222abb60
  100045 (0xfffffe0002d03480) (stack 0xffffff81222a3000)  fork_trampoline() at fork_trampoline
  100044 (0xfffffe0002d03900) (stack 0xffffff812229e000)  sched_switch() at sched_switch+0x250/frame 0xffffff81222a1b60
  100041 (0xfffffe0002d05900) (stack 0xffffff812228f000)  sched_switch() at sched_switch+0x250/frame 0xffffff8122292b60
  100040 (0xfffffe0002d06000) (stack 0xffffff812228a000)  sched_switch() at sched_switch+0x250/frame 0xffffff812228db60
  100039 (0xfffffe0002c01000) (stack 0xffffff8122285000)  fork_trampoline() at fork_trampoline
  100036 (0xfffffe0002c0b000) (stack 0xffffff80f77ec000)  sched_switch() at sched_switch+0x250/frame 0xffffff80f77efb60
  100035 (0xfffffe0002c0b480) (stack 0xffffff80f77e7000)  sched_switch() at sched_switch+0x250/frame 0xffffff80f77eab60
  100034 (0xfffffe0002c0b900) (stack 0xffffff80f77e2000)  fork_trampoline() at fork_trampoline
  100033 (0xfffffe0002c0c000) (stack 0xffffff80f77d5000)  sched_switch() at sched_switch+0x250/frame 0xffffff80f77d8b60
  100032 (0xfffffe0002c0c480) (stack 0xffffff80f77d0000)  fork_trampoline() at fork_trampoline
  100026 (0xfffffe0002b28900) (stack 0xffffff80f77ac000)  fork_trampoline() at fork_trampoline
  100025 (0xfffffe0002b29000) (stack 0xffffff80002f5000)  fork_trampoline() at fork_trampoline
  100024 (0xfffffe0002b29480) (stack 0xffffff80002cd000)  fork_trampoline() at fork_trampoline
  100023 (0xfffffe0002b29900) (stack 0xffffff80002a5000)  sched_switch() at sched_switch+0x250/frame 0xffffff80002a8b60
  100022 (0xfffffe0002b23000) (stack 0xffffff80002a0000)  sched_switch() at sched_switch+0x250/frame 0xffffff80002a3b60
  100016 (0xfffffe0002b26000) (stack 0xffffff8000282000)  fork_trampoline() at fork_trampoline
  100014 (0xfffffe000296f480) (stack 0xffffff8000278000)  sched_switch() at sched_switch+0x250/frame 0xffffff800027bb60
  100008 (0xfffffe000295c900) (stack 0xffffff800025a000)  fork_trampoline() at fork_trampoline
  100007 (0xfffffe000296c000) (stack 0xffffff8000255000)  sched_switch() at sched_switch+0x250/frame 0xffffff8000258b60
  100006 (0xfffffe000296c480) (stack 0xffffff8000250000)  sched_switch() at sched_switch+0x250/frame 0xffffff8000253b60
  100005 (0xfffffe000296c900) (stack 0xffffff800024b000)  sched_switch() at sched_switch+0x250/frame 0xffffff800024eb60
  100004 (0xfffffe000295b000) (stack 0xffffff8000246000)  cpustop_handler() at cpustop_handler+0x27/frame 0xffffff8000233cf0
  100003 (0xfffffe000295b480) (stack 0xffffff8000241000)  kdb_break() at kdb_break+0x55/frame 0xffffff8000244990
  100002 (0xfffffe000295b900) (stack 0xffffff800023c000)  sched_switch() at sched_switch+0x250/frame 0xffffff800023f720
... another dozen or so sched_switch() threads
  100000 (0xffffffff8152eb80) (stack 0xffffffff817ff000)  sched_switch() at sched_switch+0x250/frame 0xffffffff81802bb0

db> show all procs
  pid  ppid  pgrp   uid   state   wmesg         wchan        cmd
 7255   645  7255   106  DsJ     allpriso 0xffffffff8152f340 jail_start
 7252  7233  7233     0  D       allproc  0xffffffff815326d0 php
 7233  7231  7233     0  Ss      wait     0xfffffe004070f000 sh
 7231   709   709     0  S       piperd   0xfffffe0002f312d8 cron
 1634  1616  1634  1000  Ds+     allproc  0xffffffff815326d0 bash
 1616  1527  1527  1000  S       select   0xfffffe0040a60dc0 sshd
 1527   689  1527     0  Ss      select   0xfffffe00405656c0 sshd
  937   645   937 25000  SsJ     select   0xfffffe0002f1c740 python2.7
  931   632   627     0  S       select   0xfffffe00409a7740 php
  929     0     0     0  DL      proctree 0xffffffff81532658 [newnfs 1]
  928     0     0     0  DL      proctree 0xffffffff81532658 [newnfs 0]
  925   924   925  1000  Ss+     ttyin    0xfffffe0002cf4ca8 bash
  924   921   921  1000  S       select   0xfffffe0002ded640 sshd
  921   689   921     0  Ss      select   0xfffffe00403f12c0 sshd
  860   643   627 25000  S       (threaded)                  httpd
100260                   S       accept   0xfffffe004041e30e httpd
100259                   S       uwait    0xfffffe00409a9280 httpd
.... lots of httpd's waiting around, probably not relevant ....
100132                   S       uwait    0xfffffe0040567500 httpd
100130                   S       piperd   0xfffffe0040757888 httpd
  855   643   627     0  S       piperd   0xfffffe0002f08b60 apache_log_client
  854   643   627 65534  S       piperd   0xfffffe0040757b60 clogd
  852   634   627    59  S       kqread   0xfffffe00408a9300 unbound
  751     1   751     0  Ss+     ttyin    0xfffffe0002cf78a8 getty
  750     1   750     0  Ss+     ttyin    0xfffffe0002d0d4a8 getty
  749     1   749     0  Ss+     ttyin    0xfffffe0002d0d8a8 getty
  748     1   748     0  Ss+     ttyin    0xfffffe0002d0f0a8 getty
  747     1   747     0  Ss+     ttyin    0xfffffe0002d0f4a8 getty
  746     1   746     0  Ss+     ttyin    0xfffffe0002d0fca8 getty
  745     1   745     0  Ss+     ttyin    0xfffffe0002d104a8 getty
  744     1   744     0  Ss+     ttyin    0xfffffe0002d100a8 getty
  743     1   743     0  Ss+     ttyin    0xfffffe0002d0f8a8 getty
  709     1   709     0  Ss      nanslp   0xffffffff81499470 cron
  705     1   705    25  Ss      pause    0xfffffe004043c548 sendmail
  701     1   701     0  Ss      select   0xfffffe0040565cc0 sendmail
  689     1   689     0  Ss      select   0xfffffe00402581c0 sshd
  656     1   656   181  Ss      select   0xfffffe0040258ac0 nrpe2
  643   637   627     0  D       proctree 0xffffffff81532658 httpd
  639   633   627     0  S       select   0xfffffe00409a8340 ntpd
  638   629   627     0  S       select   0xfffffe0002f1cdc0 supervise
  637   629   627     0  S       select   0xfffffe0002d857c0 supervise
  636   629   627     0  S       select   0xfffffe00403f15c0 supervise
  634   629   627     0  S       select   0xfffffe0002f1a4c0 supervise
  633   629   627     0  S       select   0xfffffe0002f1bb40 supervise
  632   629   627     0  S       select   0xfffffe0002f1a240 supervise
  630     1   627     0  S       piperd   0xfffffe00401cf000 readproctitle
  629     1   627     0  D       proctree 0xffffffff81532658 svscan
  571     1   571     0  Ss      rpcsvc   0xfffffe0002f1c6a8 NLM: master
  568     1   568     0  Ss      select   0xfffffe0002f1c3c0 rpc.statd
  545     0     0     0  DL      mdwait   0xfffffe004020d800 [md0]
  534     1   534     0  Ss      select   0xfffffe0002f1a6c0 amd
  523     1   523     0  Ss      select   0xfffffe0002dee4c0 rpcbind
  497     1   497     0  Ss      select   0xfffffe0002f1bbc0 syslogd
  379     1   379     0  Ss      select   0xfffffe0002f1a840 devd
   20     0     0     0  DL      -        0xffffffff81498320 [schedcpu]
   19     0     0     0  DL      -        0xffffffff81498320 [racctd]
   18     0     0     0  DL      sdflush  0xffffffff814c9594 [softdepflush]
   17     0     0     0  DL      syncer   0xffffffff814c2998 [syncer]
   16     0     0     0  DL      vlruwt   0xfffffe0002d8c000 [vnlru]
    9     0     0     0  DL      psleep   0xffffffff814c25bc [bufdaemon]
    8     0     0     0  DL      pgzero   0xffffffff814cae5c [pagezero]
    7     0     0     0  DL      pollid   0xffffffff81497e70 [idlepoll]
    6     0     0     0  DL      psleep   0xffffffff814ca0a0 [vmdaemon]
    5     0     0     0  DL      psleep   0xffffffff81541e84 [pagedaemon]
    4     0     0     0  DL      -        0xffffffff81460130 [xpt_thrd]
    3     0     0     0  DL      waiting_ 0xffffffff81535d48 [sctp_iterator]
    2     0     0     0  DL      pftm     0xffffffff80336bc0 [pfpurge]
   15     0     0     0  DL      (threaded)                  [usb]
100030                   D       -        0xffffff8000765ef0 [usbus0]
100029                   D       -        0xffffff8000765e98 [usbus0]
100028                   D       -        0xffffff8000765e40 [usbus0]
100027                   D       -        0xffffff8000765de8 [usbus0]
   14     0     0     0  DL      -        0xffffffff81498320 [yarrow]
   13     0     0     0  DL      (threaded)                  [geom]
100011                   D       -        0xffffffff8152e5d8 [g_down]
100010                   D       -        0xffffffff8152e5d0 [g_up]
100009                   D       -        0xffffffff8152e5c8 [g_event]
   12     0     0     0  WL      (threaded)                  [intr]
100046                   I                                   [swi0: uart]
100045                   I                                   [irq12: psm0]
100044                   I                                   [irq1: atkbd0]
100041                   I                                   [irq263: virtio_pci3]
100040                   I                                   [irq262: virtio_pci3]
100039                   I                                   [irq261: virtio_pci3]
100036                   I                                   [irq260: virtio_pci2]
100035                   I                                   [irq259: virtio_pci2]
100034                   I                                   [irq258: virtio_pci2]
100033                   I                                   [irq257: virtio_pci1]
100032                   I                                   [irq256: virtio_pci1]
100026                   I                                   [irq11: uhci0+]
100025                   I                                   [irq15: ata1]
100024                   I                                   [irq14: ata0]
100023                   I                                   [swi6: task queue]
100022                   I                                   [swi2: cambio]
100016                   I                                   [swi5: fast taskq]
100014                   I                                   [swi6: Giant taskq]
100008                   I                                   [swi3: vm]
100007                   I                                   [swi1: netisr 0]
100006                   I                                   [swi4: clock]
100005                   I                                   [swi4: clock]
   11     0     0     0  RL      (threaded)                  [idle]
100004                   Run     CPU 1                       [idle: cpu1]
100003                   Run     CPU 0                       [idle: cpu0]
    1     0     1     0  SLs     wait     0xfffffe0002958950 [init]
   10     0     0     0  DL      audit_wo 0xffffffff81539480 [audit]
    0     0     0     0  DLs     (threaded)                  [kernel]
100047                   D       -        0xfffffe0002d1dc00 [mca taskq]
100043                   D       -        0xfffffe0002ca8000 [vtnet1 txq 0]
100042                   D       -        0xfffffe0002ca8080 [vtnet1 rxq 0]
100038                   D       -        0xfffffe0002c0aa80 [vtnet0 txq 0]
100037                   D       -        0xfffffe0002c0ab00 [vtnet0 rxq 0]
100031                   D       vtbslp   0xfffffe0002c00080 [virtio_balloon]
100021                   D       -        0xfffffe0002b0b380 [acpi_task_2]
100020                   D       -        0xfffffe0002b0b380 [acpi_task_1]
100019                   D       -        0xfffffe0002b0b380 [acpi_task_0]
100018                   D       -        0xfffffe0002b0b400 [kqueue taskq]
100017                   D       -        0xfffffe0002b0b480 [ffs_trim taskq]
100015                   D       -        0xfffffe0002b0c080 [thread taskq]
100012                   D       -        0xfffffe0002970980 [firmware taskq]
100000                   D       swapin   0xffffffff8152e6c8 [swapper]


The stuck processes above are those that show state "D" with allproc, proctree, or allpriso.

Here is the info about racctd:

db> show proc 19  
Process 19 (racctd) at 0xfffffe0002d89000:
 state: NORMAL
 uid: 0  gids: 0
 parent: pid 0 at 0xffffffff8152e6c8
 ABI: null
 threads: 1
100059                   D       -        0xffffffff81498320 [racctd]


And here are the stack traces for some stuck processes:

db> trace 1634
Tracing pid 1634 tid 100074 td 0xfffffe0002ea4480
sched_switch() at sched_switch+0x250/frame 0xffffff812bab1930
mi_switch() at mi_switch+0xe1/frame 0xffffff812bab1970
sleepq_wait() at sleepq_wait+0x3a/frame 0xffffff812bab19a0
_sx_xlock_hard() at _sx_xlock_hard+0x4b9/frame 0xffffff812bab1a50
fork1() at fork1+0x31e/frame 0xffffff812bab1b00
sys_fork() at sys_fork+0x19/frame 0xffffff812bab1b20
amd64_syscall() at amd64_syscall+0x338/frame 0xffffff812bab1c30
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xffffff812bab1c30
--- syscall (2, FreeBSD ELF64, sys_fork), rip = 0x800bb308c, rsp = 0x7fffffffe4e8, rbp = 0x7fffffffe530 ---

db> trace 7255
Tracing pid 7255 tid 100285 td 0xfffffe0040a8a900
sched_switch() at sched_switch+0x250/frame 0xffffff812bead770
mi_switch() at mi_switch+0xe1/frame 0xffffff812bead7b0
sleepq_wait() at sleepq_wait+0x3a/frame 0xffffff812bead7e0
_sx_xlock_hard() at _sx_xlock_hard+0x4b9/frame 0xffffff812bead890
kern_jail_set() at kern_jail_set+0x3eb8/frame 0xffffff812beadaf0
sys_jail_set() at sys_jail_set+0x41/frame 0xffffff812beadb20
amd64_syscall() at amd64_syscall+0x338/frame 0xffffff812beadc30
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xffffff812beadc30
--- syscall (507, FreeBSD ELF64, sys_jail_set), rip = 0x80191ab1c, rsp = 0x7fffffffe268, rbp = 0x7fffffffe310 ---

So it looks like racctd is holding onto a lock that is deadlocking any other thread or process from touching the process or jail tables.  The underlying cause is probably related to racctd trying to handle the empty jail.

This issue is extremely reproducible for us, so if any other information would be helpful, please let me know.
Comment 1 Edward Tomasz Napierala freebsd_committer freebsd_triage 2014-09-09 07:42:02 UTC
Created attachment 147098 [details]
Test case
Comment 2 Edward Tomasz Napierala freebsd_committer freebsd_triage 2014-09-09 07:42:40 UTC
Comment on attachment 147098 [details]
Test case

Testcase, submitted by bug originator.
Comment 3 commit-hook freebsd_committer freebsd_triage 2014-09-09 16:05:55 UTC
A commit references this bug:

Author: trasz
Date: Tue Sep  9 16:05:33 UTC 2014
New revision: 271317
URL: http://svnweb.freebsd.org/changeset/base/271317

Log:
  Avoid unlocking unlocked mutex in RCTL jail code.  Specific test case
  is attached to PR.

  PR:		193457
  MFC after:	1 week
  Sponsored by:	The FreeBSD Foundation

Changes:
  head/sys/kern/kern_jail.c
Comment 4 commit-hook freebsd_committer freebsd_triage 2014-09-15 13:02:15 UTC
A commit references this bug:

Author: trasz
Date: Mon Sep 15 13:01:48 UTC 2014
New revision: 271622
URL: http://svnweb.freebsd.org/changeset/base/271622

Log:
  MFC r271317:

  Avoid unlocking unlocked mutex in RCTL jail code.  Specific test case
  is attached to PR.

  PR:		193457
  Approved by:	re (kib)
  Sponsored by:	The FreeBSD Foundation

Changes:
_U  stable/10/
  stable/10/sys/kern/kern_jail.c