Bug 190300

Summary: vmdaemon kernel thread consumes 100% cpu
Product: Base System
Reporter: zuborg
Component: kern
Assignee: freebsd-bugs (Nobody) <bugs>
Status: Closed FIXED
Severity: Affects Only Me
CC: alc, emaste, pi, steve.polyack
Priority: Normal
Version: 10.0-RELEASE
Hardware: Any
OS: Any

Description zuborg 2014-05-27 13:00:00 UTC
The 10.0 release has a bug that causes the `vmdaemon` kernel thread to consume all available CPU.

The server runs postgresql-9.3 (which uses mmap() actively) and ~30 perl processes, which also use mmap() for their data files.
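For illustration only, here is a minimal sketch (in C) of the kind of mmap() access pattern described above: map a large data file read-only and scan its pages. This is an assumption about the workload, not the actual perl or postgres code; the data file is passed on the command line and the 4096-byte page stride is an assumed page size.

    /* mmap() access-pattern sketch (illustrative, not the real application). */
    #include <sys/types.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <err.h>
    #include <fcntl.h>
    #include <unistd.h>

    int
    main(int argc, char *argv[])
    {
            if (argc != 2)
                    errx(1, "usage: mmapscan <datafile>");
            int fd = open(argv[1], O_RDONLY);
            if (fd == -1)
                    err(1, "open");
            struct stat sb;
            if (fstat(fd, &sb) == -1)
                    err(1, "fstat");
            char *p = mmap(NULL, sb.st_size, PROT_READ, MAP_SHARED, fd, 0);
            if (p == MAP_FAILED)
                    err(1, "mmap");
            /* Touch every page so the file-backed mapping becomes resident. */
            volatile char sum = 0;
            for (off_t off = 0; off < sb.st_size; off += 4096)
                    sum += p[off];
            munmap(p, sb.st_size);
            close(fd);
            return (0);
    }

The top(1) output below shows vmdaemon pinned near 100% of a CPU while the perl and postgres processes run: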


last pid: 97992;  load averages:  3.34,  4.16,  4.39                                                                                                                  up 4+23:02:49  11:50:38
144 processes: 6 running, 123 sleeping, 1 waiting, 14 lock
CPU:  8.0% user,  0.0% nice,  4.8% system,  0.3% interrupt, 86.9% idle
Mem: 14G Active, 42G Inact, 4289M Wired, 122M Cache, 1668M Buf, 1240M Free
Swap: 4096M Total, 4096M Free

  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
   10 root         24 155 ki31     0K   384K CPU23  23 2845.2 2065.48% idle
   16 root          1 -16    -     0K    16K RUN     0  23:23  98.73% vmdaemon
76669 cdn           1  31    0  7270M  2634M accept  4  13:04  18.26% perl5.16.3
76674 cdn           1  30    0  7270M  2628M accept  7  12:38  17.14% perl5.16.3
76665 cdn           1  29    0  7270M  2610M accept  5  12:00  15.87% perl5.16.3
76672 cdn           1  28    0  7270M  2649M accept  2  14:34  15.58% perl5.16.3
76668 cdn           1  28    0  7270M  2621M accept 10  12:02  15.43% perl5.16.3
76673 cdn           1  29    0  7270M  2632M accept 18  12:58  14.99% perl5.16.3
76667 cdn           1  27    0  7271M  2611M CPU14  14  11:55  14.75% perl5.16.3
76676 cdn           1  27    0  7274M  2648M CPU7    7  13:41  14.65% perl5.16.3
76670 cdn           1  28    0  7270M  2621M accept 12  12:12  14.55% perl5.16.3
76671 cdn           1  27    0  7270M  2611M accept 22  12:02  13.92% perl5.16.3
76675 cdn           1  28    0  7270M  2600M accept  2  11:57  13.62% perl5.16.3
76666 cdn           1  25    0  7270M  2619M select 13  12:25   9.81% perl5.16.3
76695 cdn           1  26    0  7294M  2504M select 19   4:06   8.01% perl5.16.3
76702 cdn           1  23    0  7294M  2514M select  5   4:15   5.57% perl5.16.3
76701 cdn           1  25    0  7294M  2511M select 21   4:10   5.42% perl5.16.3
76699 cdn           1  24    0  7294M  2488M select 20   4:01   5.42% perl5.16.3
76694 cdn           1  25    0  7294M  2490M select  7   3:53   5.22% perl5.16.3
76691 cdn           1  23    0  7295M  2493M select 12   3:49   5.13% perl5.16.3
   15 root          1 -16    -     0K    16K psleep  6  10:34   4.54% pagedaemon
76693 cdn           1  23    0  7295M  2486M select 17   3:52   4.39% perl5.16.3
76700 cdn           1  22    0  7294M  2500M select  9   3:53   3.86% perl5.16.3
76698 cdn           1  22    0  7294M  2486M select 20   3:54   3.52% perl5.16.3
76692 cdn           1  22    0  7290M  2482M select 22   3:44   3.03% perl5.16.3
76697 cdn           1  22    0  7290M  2466M select 10   3:43   2.78% perl5.16.3
76715 pgsql         1  22    0 21253M  5046M *vm ob 14   1:38   2.54% postgres
76696 cdn           1  21    0  7294M  2470M select  6   3:44   2.44% perl5.16.3
76687 pgsql         1  25    0 21253M   140M sbwait 20   0:41   1.90% postgres
76683 pgsql         1  24    0 21253M   139M sbwait  6   0:44   1.86% postgres
42720 pgsql         1  23    0 21253M 13997M *vm ob 14  21:26   1.32% postgres
76710 pgsql         1  23    0 21253M  4971M *vm ob  8   1:37   1.27% postgres
76706 pgsql         1  23    0 21253M  5005M *vm ob 14   1:38   0.83% postgres
76704 pgsql         1  23    0 21253M  5011M *vm ob 21   1:38   0.83% postgres
76713 pgsql         1  23    0 21253M  5024M *vm ob  2   1:37   0.83% postgres
76711 pgsql         1  23    0 21253M  4964M *vm ob 23   1:36   0.83% postgres
76712 pgsql         1  24    0 21253M  4994M *vm ob 23   1:37   0.63% postgres
76707 pgsql         1  22    0 21253M  5023M *vm ob  3   1:38   0.59% postgres
76708 pgsql         1  22    0 21253M  5024M *vm ob  8   1:37   0.59% postgres

How-To-Repeat: I don't know exactly; maybe run postgresql-9.3 under some load.
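Since no exact reproduction steps are known, the following is only a rough sketch of a load generator that might exercise the same path alongside a busy postgresql-9.3 instance: several workers keep a large file-backed mapping resident, similar to the ~30 perl processes above. The worker count, the program name "mmapload", and the data file are assumptions, not a confirmed reproducer.

    /* Hypothetical mmap-heavy load generator (sketch, not a confirmed reproducer). */
    #include <sys/types.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <err.h>
    #include <fcntl.h>
    #include <unistd.h>

    #define NWORKERS 30                      /* roughly matches the ~30 perl processes */

    static void
    worker(const char *path)
    {
            int fd = open(path, O_RDONLY);
            if (fd == -1)
                    err(1, "open");
            struct stat sb;
            if (fstat(fd, &sb) == -1)
                    err(1, "fstat");
            char *p = mmap(NULL, sb.st_size, PROT_READ, MAP_SHARED, fd, 0);
            if (p == MAP_FAILED)
                    err(1, "mmap");
            /* Walk the mapping forever so the pages stay referenced. */
            volatile char sum = 0;
            for (;;) {
                    for (off_t off = 0; off < sb.st_size; off += 4096)
                            sum += p[off];
            }
    }

    int
    main(int argc, char *argv[])
    {
            if (argc != 2)
                    errx(1, "usage: mmapload <datafile>");
            for (int i = 0; i < NWORKERS; i++)
                    if (fork() == 0)
                            worker(argv[1]);        /* child never returns */
            for (;;)
                    pause();                        /* parent just waits */
    }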
Comment 1 zuborg 2014-05-27 14:45:05 UTC
FreeBSD 9.x is not affected.

`swapoff -a` didn't help

When I reduced the load on the server, 'vmdaemon' continued to use 100% CPU,
but it stopped doing so when I restarted the perl worker processes.
Comment 2 steve.polyack 2014-07-25 12:19:29 UTC
We're also seeing this on a FreeBSD server that was recently upgraded from 9.2 to 10.0 and is running postgresql-9.3. The server has 256GB of RAM, but only 64GB is dedicated to postgres via shared_buffers.

We've also noticed mbuf allocation failures (denials) and very poor interactive performance when vmdaemon CPU usage is high, which is almost all of the time.

# vmstat -z
ITEM                   SIZE  LIMIT     USED     FREE      REQ FAIL SLEEP

UMA Kegs:               384,      0,      95,       5,      95,   0,   0
UMA Zones:             4736,      0,      95,       0,      95,   0,   0
UMA Slabs:               80,      0,    7694,       6,   22132,   0,   0
UMA RCntSlabs:           88,      0,   11976,      39,   11976,   0,   0
UMA Hash:               256,      0,       2,      13,       8,   0,   0
4 Bucket:                32,      0,     157,    6468,37583750,   0,   0
8 Bucket:                64,      0,     236,    3794,    5426,   0,   0
16 Bucket:              128,      0,     371,    3659,     728,  36,   0
32 Bucket:              256,      0,     593,    6292,  162352,4213,   0
64 Bucket:              512,      0,    1019,    1493,  444970,232031,   0
128 Bucket:            1024,      0,    1428,    1264, 1637010, 277,   0
vmem btag:               56,      0,    3206,   12059,  299550, 215,   0
VM OBJECT:              256,      0,   33864,    2376,  798557,   0,   0
RADIX NODE:             144,      0, 4311308,   12742,15145691,  57,   0
MAP:                    240,      0,       3,      61,       3,   0,   0
KMAP ENTRY:             128,      0,       7,     396,       7,   0,   0
MAP ENTRY:              128,      0,    6265,    5112, 1725230,   0,   0
VMSPACE:                448,      0,     145,    1106,   34759,   0,   0
fakepg:                 104,      0,       0,       0,       0,   0,   0
mt_zone:               4112,      0,     343,       0,     343,   0,   0
16:                      16,      0,    4637,    3144, 1155861,   0,   0
32:                      32,      0,    4603,    3897,  821025,   0,   0
64:                      64,      0,    9892,   68910,122945074,   0,   0
128:                    128,      0,    7760,   48350, 9999121,   0,   0
256:                    256,      0,    7126,   48704, 9244069,   0,   0
512:                    512,      0,    1053,   32379,  123871,   0,   0
1024:                  1024,      0,     123,     673,   40412,   0,   0
2048:                  2048,      0,     341,     207,   46366,   0,   0
4096:                  4096,      0,     573,     190,   39254,   0,   0
SLEEPQUEUE:              80,      0,     622,    3532,     622,   0,   0
uint64 pcpu:              8,      0,    1552,     112,    1552,   0,   0
Files:                   80,      0,    2208,    4542,39412500,   0,   0
TURNSTILE:              136,      0,     622,    1518,     622,   0,   0
rl_entry:                40,      0,     415,    3585,     415,   0,   0
umtx pi:                 96,      0,       0,       0,       0,   0,   0
MAC labels:              40,      0,       0,       0,       0,   0,   0
PROC:                  1208,      0,     162,     219,   34775,   0,   0
THREAD:                1168,      0,     571,      50,     626,   0,   0
cpuset:                  72,      0,     176,     209,     208,   0,   0
audit_record:          1248,      0,       0,       0,       0,   0,   0
mbuf_packet:            256, 104592120,   16370,    7380,112650926,   0,   0
mbuf:                   256, 104592120,      16,    4494,100391690,   5,   0
mbuf_cluster:          2048, 16342518,   23750,       0,   23750,12060,   0
mbuf_jumbo_page:       4096, 8171259,       0,     101,37590064,  92,   0
mbuf_jumbo_9k:         9216, 2421113,       0,       0,       0,   0,   0
mbuf_jumbo_16k:       16384, 1361876,       0,       0,       0,   0,   0
mbuf_ext_refcnt:          4,      0,       0,       0,       0,   0,   0
g_bio:                  248,      0,       0,    7392,12897141,   0,   0
ttyinq:                 160,      0,     300,    1125,    1245,   0,   0
ttyoutq:                256,      0,     157,    1193,     655,   0,   0
ata_request:            336,      0,       0,       0,       0,   0,   0
vtnet_tx_hdr:            24,      0,       0,       0,       0,   0,   0
FPU_save_area:          832,      0,       0,       0,       0,   0,   0
VNODE:                  472,      0,  114408,     928,  155200,   0,   0
VNODEPOLL:              112,      0,       0,     770,      13,   0,   0
BUF TRIE:               144,      0,    3714,  102234, 9440298,   0,   0
S VFS Cache:            108,      0,  120884,    3891,  223399,   0,   0
STS VFS Cache:          148,      0,       0,       0,       0,   0,   0
L VFS Cache:            328,      0,     441,     375,     845,   0,   0
LTS VFS Cache:          368,      0,       0,       0,       0,   0,   0
NAMEI:                 1024,      0,       0,     520,41296842,   0,   0
DIRHASH:               1024,      0,     934,     210,    1275,   0,   0
NCLNODE:                528,      0,       1,      13,       1,   0,   0
Mountpoints:            816,      0,      10,      70,      11,   0,   0
pipe:                   744,      0,     116,     524,   24303,   0,   0
procdesc:               128,      0,       0,       0,       0,   0,   0
ksiginfo:               112,      0,     373,    4002,   16780,   0,   0
itimer:                 352,      0,       0,      33,       1,   0,   0
KNOTE:                  128,      0,      23,    3728, 2156120,   0,   0
socket:                 696, 8386445,     245,     485,  109956,   0,   0
ipq:                     56, 510703,       0,       0,       0,   0,   0
udp_inpcb:              392, 8386450,       6,     924,   98789,   0,   0
udpcb:                   16, 8386663,       6,    4010,   98789,   0,   0
tcp_inpcb:              392, 8386450,     176,     784,    2934,   0,   0
tcpcb:                 1024, 8386448,     176,     372,    2934,   0,   0
tcptw:                   88,  27810,       0,    1125,     258,   0,   0
syncache:               160,  15375,       0,     750,    2342,   0,   0
hostcache:              136,  15370,      17,     679,      58,   0,   0
tcpreass:                40, 1021500,       0,     800,   33960,   0,   0
sackhole:                32,      0,       0,     375,      37,   0,   0
sctp_ep:               1408, 8386446,       0,       0,       0,   0,   0
sctp_asoc:             2352,  40000,       0,       0,       0,   0,   0
sctp_laddr:              48,  80012,       0,     664,       6,   0,   0
sctp_raddr:             728,  80000,       0,       0,       0,   0,   0
sctp_chunk:             136, 400026,       0,       0,       0,   0,   0
sctp_readq:             104, 400026,       0,       0,       0,   0,   0
sctp_stream_msg_out:    104, 400026,       0,       0,       0,   0,   0
sctp_asconf:             40, 400000,       0,       0,       0,   0,   0
sctp_asconf_ack:         48, 400060,       0,       0,       0,   0,   0
ripcb:                  392, 8386450,       0,      30,       1,   0,   0
unpcb:                  240, 8386448,      63,    1889,    8227,   0,   0
rtentry:                200,      0,      25,     475,      25,   0,   0
selfd:                   56,      0,     554,    4629,51234943,   0,   0
SWAPMETA:               288, 32685042,       0,       0,       0,   0,   0
FFS inode:              168,      0,  114348,    1871,  155129,   0,   0
FFS1 dinode:            128,      0,       0,       0,       0,   0,   0
FFS2 dinode:            256,      0,  114348,    1857,  155128,   0,   0
Comment 3 Kurt Jaeger freebsd_committer freebsd_triage 2014-08-13 15:26:15 UTC
Are you aware of the recent work on that topic?

https://www.freebsd.org/news/status/report-2014-04-2014-06.html#PostgreSQL-Performance-Improvements

Maybe kib@ knows more about this?
Comment 4 steve.polyack 2014-08-18 18:43:17 UTC
I can confirm that r265945 fixes this.  Unfortunately, it's only available in the 10-STABLE tree.  We patched it into 10.0-RELEASE internally.
Comment 5 steve.polyack 2014-09-03 22:15:18 UTC
My initial assessment was incorrect: r265945 does not fix the problem.  However, r270666 by alc does fix it.  This has been MFC'd to 10-STABLE in r270995.

I think we can mark this as fixed.
Comment 6 Alan Cox freebsd_committer freebsd_triage 2014-09-06 18:57:10 UTC
The patch has been applied to 10-STABLE.  The fix will appear in 10.1-RELEASE.