On 12-CURRENT r326527, running a poudriere build results in high CPU load (up to 100%) for the ZFS kernel threads "arc_dnlc_evicts_thr" and "arc_reclaim_thread":

------------------------------------------------------------------------------------------------
last pid:  8708;  load averages:  5.53,  1.49,  0.55    up 0+00:01:05  14:09:07
787 processes: 18 running, 728 sleeping, 2 zombie, 39 waiting
Mem: 379M Active, 72M Inact, 765M Wired, 14G Free
ARC: 250M Total, 22M MFU, 178M MRU, 314K Anon, 6575K Header, 43M Other
     100M Compressed, 292M Uncompressed, 2.91:1 Ratio
Swap: 32G Total, 32G Free

  PID USERNAME PRI NICE   SIZE    RES STATE  C   TIME    WCPU COMMAND
   17 root      -8    -     0K   192K CPU6   6   0:18  70.65% zfskern{arc_dnlc_evicts_thr}
 1328 root      52    0 14300K  5280K piperd 0   0:09  22.36% sh
   11 root     155 ki31     0K   128K RUN    5   0:21  16.46% idle{idle: cpu5}
   11 root     155 ki31     0K   128K RUN    7   0:28  16.16% idle{idle: cpu7}
   11 root     155 ki31     0K   128K RUN    0   0:29  15.48% idle{idle: cpu0}
   11 root     155 ki31     0K   128K RUN    1   0:28  15.38% idle{idle: cpu1}
   17 root      -8    -     0K   192K arc_re 1   0:09  14.79% zfskern{arc_reclaim_thread}
   11 root     155 ki31     0K   128K RUN    6   0:28  14.45% idle{idle: cpu6}
   11 root     155 ki31     0K   128K RUN    4   0:27  14.36% idle{idle: cpu4}
   11 root     155 ki31     0K   128K RUN    2   0:28  14.06% idle{idle: cpu2}
   11 root     155 ki31     0K   128K RUN    3   0:28  13.77% idle{idle: cpu3}
 1513 root      52    0 13024K  3968K nanslp 4   0:00   0.49% sh
   15 root     -16    -     0K    48K -      3   0:00   0.10% cam{doneq1}
    0 root     -16    -     0K  9024K swapin 4   0:28   0.00% kernel{swapper}
    0 root       8    -     0K  9024K -      3   0:00   0.00% kernel{thread taskq}
   12 root     -96    -     0K   624K WAIT   5   0:00   0.00% intr{irq281: hdac1}
   12 root     -60    -     0K   624K WAIT   0   0:00   0.00% intr{swi4: clock (0)}
   15 root     -16    -     0K    48K -      6   0:00   0.00% cam{scanner}
------------------------------------------------------------------------------------------------

I have 16GB of RAM and ARC is not limited, so there should be no memory pressure. This behaviour does not exist on 11-STABLE (r326088): there, both threads reach only about 1-2% at most (on similar hardware). Any ideas how to fix this?
Addendum: even a simple "svnlite up" in "/usr/src" or "/usr/ports" drives "arc_reclaim_thread" up to 70% load...
On my colleague's notebook running 12-CURRENT r325310, the behaviour isn't as pronounced; during "svnlite up", his "arc_reclaim_thread" only rises to 6-8% load - still higher than on 11-STABLE, though...
Ok, I've tried several revisions, with good and bad results:

--------------------------------------------------------------------
326398  bad
326390  bad
326362  bad
326347  bad
326346  good
326343  good
326333  good
326286  good
--------------------------------------------------------------------

So, starting with commit r326347 - "Eliminate kmem_arena and kmem_object in preparation for further NUMA commits." - my system here freaks out...
Cc: committer of r326346.
I can also reproduce this issue. In addition, my ARC size now hovers around arc_min: I expect >90G uncompressed and I get <3G. This leads to terrible poudriere performance.
(In reply to Nikolai Lifanov from comment #5) I can confirm the low ARC usage as well (now at 1GB usage for an unlimited ARC with 16GB RAM). It seems parts of the ARC are being thrown away every millisecond or so...
I've noticed this before r326346 - probably a week earlier. I haven't been able to pin it on any particular commit yet.
I will try to revert r326347 and see if it makes a difference.

(In reply to Cy Schubert from comment #7) I think I got my dates wrong. It may be r326347. I will revert and try.
I can confirm that reverting r326347 resolves the issue. My zpools no longer thrash. Performance, due to greatly reduced I/O, has vastly improved. So far so good:

last pid: 64954;  load averages:  1.87,  1.86,  2.05   up 0+00:58:45  21:12:59
71 processes: 3 running, 68 sleeping
CPU:  0.0% user, 64.0% nice, 33.3% system,  0.2% interrupt,  2.6% idle
Mem: 1285M Active, 1367M Inact, 4489M Wired, 650M Buf, 800M Free
ARC: 1997M Total, 356M MFU, 1313M MRU, 8244K Anon, 23M Header, 298M Other
     1302M Compressed, 3372M Uncompressed, 2.59:1 Ratio
Swap: 20G Total, 20G Free

  PID USERNAME   THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAN
64687 root         1  52   10 10604K  1996K select  1   0:00   1.63% make
60776 root         1  52   10 14460K  6092K select  1   0:00   0.41% make
33699 root         1  20    0 14312K  5308K select  0   0:00   0.22% screen
60761 root         1  46   10 10984K  2520K select  1   0:00   0.14% make
61698 root         1  20    0 13920K  4124K CPU1    1   0:00   0.11% top
33888 root         1  30   10 10196K  1500K select  0   0:00   0.11% make
64686 root         1  52   10 11692K  3180K wait    0   0:00   0.11% sh
33914 root         1  30   10 10996K  2520K select  0   0:00   0.10% make
 2712 root         1  20    0 14352K  5624K select  1   0:00   0.02% devd
 3166 uucp         1  20    0 12288K  3856K nanslp  1   0:00   0.01% upsmon
 3127 root         1  20    0 10860K  2456K select  0   0:00   0.01% powerd
 3124 root         2  20    0 19264K 19396K select  1   0:00   0.01% ntpd
 2795 root         1  20    0 11400K  2616K select  0   0:00   0.01% syslog
 3358 root         1  20    0 15748K  6640K select  1   0:00   0.00% sendma

And a different machine:

last pid: 81498;  load averages:  1.86,  1.88,  2.04   up 0+00:59:45  21:13:59
62 processes: 2 running, 60 sleeping
CPU:  0.4% user, 13.9% nice, 19.4% system,  0.6% interrupt, 65.7% idle
Mem: 1291M Active, 1381M Inact, 4490M Wired, 650M Buf, 778M Free
ARC: 2031M Total, 349M MFU, 1331M MRU, 8254K Anon, 24M Header, 319M Other
     1313M Compressed, 3410M Uncompressed, 2.60:1 Ratio
Swap: 20G Total, 20G Free

  PID USERNAME   THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAN
81192 root         1  83   10 19976K 11788K CPU1    1   0:00   5.95% make
33699 root         1  20    0 14312K  5308K select  1   0:01   0.37% screen
33888 root         1  30   10 10196K  1500K select  0   0:00   0.29% make
81169 root         1  52   10 14320K  5936K select  0   0:00   0.22% make
33914 root         1  30   10 11016K  2524K select  0   0:00   0.16% make
61698 root         1  20    0 13920K  4124K CPU0    0   0:00   0.10% top
 2712 root         1  20    0 14352K  5624K select  0   0:00   0.02% devd
 3127 root         1  20    0 10860K  2456K select  1   0:00   0.01% powerd
 3166 uucp         1  20    0 12288K  3856K nanslp  1   0:00   0.01% upsmon
 3124 root         2  20    0 19264K 19396K select  0   0:00   0.01% ntpd
 2795 root         1  20    0 11400K  2616K select  0   0:00   0.01% syslog
 3358 root         1  20    0 15748K  6640K select  0   0:00   0.00% sendma
25899 root         1  20    0 12876K  3684K select  0   0:00   0.00% klogin
33698 root         1  20    0 12392K  3648K pause   1   0:00   0.00% screen
During a buildworld -DNO_CLEAN (second machine):

                             Total     MFU     MRU    Anon     Hdr   L2Hdr   Other
ZFS ARC                      2001M    464M   1300M   8695K     30M      0K    198M

                            rate    hits  misses  total hits  total misses
arcstats                  : 71%      660     258      274342        697629
arcstats.demand_data      : 67%      218     107       96070         81229
arcstats.demand_metadata  : 25%       44     129       43412        569950
arcstats.prefetch_data    :  0%        0      22       10019         27947
arcstats.prefetch_metadata:100%      398       0      124841         18503
zfetchstats               :  4%       99    2013       28345       3844648
arcstats.l2               :  0%        0       0           0             0
vdev_cache_stats          : 58%        7       5        9786         19927

I'm getting pretty healthy hit rates, whereas with r326347 the hit rates were in the low single digits.
last pid: 23981;  load averages:  1.85,  1.80,  1.61   up 0+00:48:45  21:50:20
133 processes: 4 running, 129 sleeping
CPU:  4.8% user, 45.2% nice, 27.7% system,  0.2% interrupt, 22.1% idle
Mem: 1509M Active, 1341M Inact, 37M Laundry, 3593M Wired, 178M Buf, 1281M Free
ARC: 1324M Total, 20M MFU, 653M MRU, 134M Anon, 19M Header, 498M Other
     154M Compressed, 582M Uncompressed, 3.78:1 Ratio
Swap: 12G Total, 12G Free

  PID USERNAME   THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAN
 2658 cy           6  23    0 92956K 45940K select  3   1:50  10.70% mate-t
19465 root         1  25    0 56596K 47756K zio->i  2   0:09  10.37% rsync
 2503 root         1  24    0 67200K 43964K select  2   1:47   9.91% Xorg
20179 root         1  24    0 56192K 47576K db->db  2   0:07   9.54% rsync
23948 root         1  52   10 10920K  2036K select  1   0:00   3.42% make
19772 root         1  20    0 11040K  2656K select  1   0:00   0.81% moused
23213 root         1   4   10 12992K  4312K select  0   0:00   0.21% make
12248 root         1   4   10 18088K  9680K select  3   0:01   0.17% make
23492 root         1  52   10 10560K  1804K select  1   0:00   0.14% make
23860 cy           1  20    0 13940K  4152K CPU2    2   0:00   0.11% top
 2591 cy           1  20    0 16804K  6684K select  2   0:02   0.09% xscree
23947 root         1  52   10 11688K  3176K wait    0   0:00   0.09% sh
 2560 cy           1  20    0 21684K  9336K select  2   0:02   0.09% mwm
 3205 root         1  30   10 11084K  2392K select  3   0:00   0.08% make
72684 root         1  30   10 10224K  1388K select  0   0:01   0.08% make

MFU should be much higher.
r326346 also needs to be backed out. Will try that too.
Looking at the workloads of my four systems, I'm not entirely convinced reverting r326346 will make as much difference as removing r326347 did. Though ARC is more in line with how it was before r326347, MFU fluctuates widely, from 55M to 500M. Still, that is better than the single-digit MB before. I will see if reverting r326346 makes a difference too.

last pid: 50534;  load averages:  1.32,  0.94,  0.96   up 0+01:05:30  22:07:05
123 processes: 2 running, 121 sleeping
CPU:  1.6% user,  4.2% nice,  5.0% system,  0.6% interrupt, 88.7% idle
Mem: 1018M Active, 1310M Inact, 3670M Wired, 178M Buf, 1763M Free
ARC: 1327M Total, 177M MFU, 651M MRU, 15M Anon, 24M Header, 460M Other
     283M Compressed, 943M Uncompressed, 3.33:1 Ratio
Swap: 12G Total, 12G Free

  PID USERNAME   THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAN
 2503 root         1  24    0 58704K 35612K select  1   2:16   7.25% Xorg
 2658 cy           5  22    0 93644K 46620K CPU0    0   2:16   6.99% mate-t
50487 root         1   4   10 10476K  1836K select  2   0:00   1.53% make
49701 root         1  52   10 17436K  9024K select  1   0:00   0.16% make
 2591 cy           1  20    0 16804K  6684K select  2   0:02   0.08% xscree
50489 root         1  52   10 11668K  3172K wait    3   0:00   0.08% sh
 2560 cy           1  20    0 21684K  9336K select  0   0:03   0.07% mwm
23860 cy           1  20    0 14168K  4388K CPU3    3   0:01   0.07% top
50486 root         1  52   10 11688K  3176K wait    2   0:00   0.07% sh
25264 root         1  30   10 10224K  1388K select  1   0:00   0.07% make
48259 root         1  30   10 10956K  2336K select  2   0:00   0.06% make
46730 root         1  30   10 11036K  2344K select  1   0:00   0.06% make
 3041 cy           1  20    0 12708K  3704K select  0   0:01   0.05% krlogi
25290 root         1  30   10 11128K  2416K select  2   0:00   0.05% make
49044 root         1  20    0 11040K  2656K select  0   0:00   0.04% moused
So far so good (machine 1):

last pid: 47502;  load averages:  1.60,  1.92,  1.69   up 0+00:34:24  22:57:38
51 processes: 2 running, 49 sleeping
CPU:  0.6% user,  0.0% nice,  9.1% system,  0.2% interrupt, 90.2% idle
Mem: 295M Active, 1858M Inact, 4233M Wired, 645M Buf, 1555M Free
ARC: 1844M Total, 516M MFU, 980M MRU, 5136K Anon, 17M Header, 325M Other
     1121M Compressed, 2912M Uncompressed, 2.60:1 Ratio
Swap: 20G Total, 20G Free

  PID USERNAME   THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAN
47500 root         1  30    0 12268K  3628K CPU1    1   0:04  11.72% cpdup
18042 root         1  20    0 14132K  4340K CPU0    0   0:01   0.08% top
 3127 root         2  20    0 19264K 19396K select  1   0:00   0.07% ntpd
 3570 root         1  20    0 14312K  5184K select  0   0:06   0.01% screen
 3130 root         1  20    0 10860K  2456K select  0   0:00   0.01% powerd
 3123 root         9  52    0 29220K  5564K uwait   1   0:00   0.01% nscd
99226 root         1  20    0 12876K  3684K select  1   0:00   0.00% klogin
 2911 root         1  20    0 12064K  3192K select  0   0:03   0.00% ypserv
 2795 root         1  20    0 11348K  2564K select  0   0:00   0.00% syslog
 2920 root         1  20    0 13448K 13536K select  1   0:00   0.00% amd
 2712 root         1  20    0 14352K  5624K select  1   0:00   0.00% devd
 3932 named        5  52    0 56212K 38848K sigwai  1   0:00   0.00% named
 3169 uucp         1  20    0 12288K  3860K connec  1   0:00   0.00% upsmon
 3359 root         1  20    0 15748K  6640K select  1   0:00   0.00% sendma
Machine 2, running buildworld -DNO_CLEAN and two rsyncs -- much better:

last pid:  4958;  load averages:  3.53,  1.93,  1.21   up 0+00:38:03  23:13:38
124 processes: 6 running, 118 sleeping
CPU:  1.9% user, 83.1% nice, 12.5% system,  0.0% interrupt,  2.6% idle
Mem: 1199M Active, 840M Inact, 44M Laundry, 5251M Wired, 81M Buf, 427M Free
ARC: 3059M Total, 639M MFU, 1797M MRU, 10M Anon, 52M Header, 561M Other
     1778M Compressed, 5101M Uncompressed, 2.87:1 Ratio
Swap: 12G Total, 12G Free

  PID USERNAME   THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAN
 4847 root         1  90   10    98M 68872K RUN     3   0:03  91.98% cc
 4949 root         1  85   10 92972K 61428K RUN     0   0:01  62.55% cc
 3818 root         1  26    0   242M   234M select  2   0:02  17.95% rsync
 4945 root         1  52   10 10568K  1952K select  2   0:00   8.38% make
 4321 root         1  25    0   269M   261M select  1   0:01   7.72% rsync
 2639 cy           5  21    0 93388K 45296K CPU3    3   0:51   3.28% mate-t
 2581 root         1  22    0 69376K 46348K select  0   0:55   2.47% Xorg
 2855 cy          55  22    0  7866M   468M select  3   0:45   0.62% firefo
 4948 root         1  52   10 70348K 38172K wait    0   0:00   0.40% cc
 4825 root         1   4   10 11052K  2460K select  0   0:00   0.27% make
 3824 root         1  20    0 11004K  2504K sbwait  2   0:00   0.15% rsh
 3459 root         1  20    0 11040K  2656K select  0   0:00   0.14% moused
 4941 root         1  52   10 11732K  3196K wait    2   0:00   0.12% sh
 4944 root         1  52   10 11696K  3184K wait    3   0:00   0.12% sh
 3367 root         1  52   10 28328K 20252K select  1   0:01   0.11% make
I've done some testing with r326347 reverted, and with r326346 + r326347 reverted. My tests included I/O-intensive operations: buildworld/buildkernel -DNO_CLEAN and parallel rsync. Reverting r326346 + r326347 showed no significant improvement over reverting only r326347 (below). r326347 definitely caused the issue. Note the MFU. Performance is significantly better due to improved ARC caching.

last pid:  2156;  load averages:  2.56,  2.32,  1.66   up 0+00:25:52  00:32:57
89 processes: 4 running, 85 sleeping
CPU:  1.3% user, 75.8% nice, 20.5% system,  0.8% interrupt,  1.7% idle
Mem: 390M Active, 602M Inact, 5631M Wired, 63M Buf, 1317M Free
ARC: 3585M Total, 1106M MFU, 2029M MRU, 14M Anon, 60M Header, 376M Other
     2734M Compressed, 6084M Uncompressed, 2.23:1 Ratio
Swap: 20G Total, 20G Free

  PID USERNAME   THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAN
63633 root         1  52    0 36720K 27164K RUN     1   2:28   7.38% pkg
 2136 root         1   4   10 10548K  1932K select  0   0:00   6.32% make
 1909 root         1   4   10 11808K  3144K select  1   0:00   0.36% make
 2096 root         1  24    0 13804K  3952K CPU1    1   0:00   0.11% top
 2135 root         1  52   10 11700K  3124K wait    1   0:00   0.11% sh
 3598 root         1  20    0 14312K  5216K select  0   0:02   0.10% screen
98994 root         1   4   10 28516K 20320K select  1   0:01   0.10% make
 1828 root         1  52   10 10404K  1608K select  1   0:00   0.07% make
61839 root         1  30   10 10196K  1324K select  1   0:00   0.05% make
 1906 root         1   4   10 10360K  1468K select  1   0:00   0.05% make
98907 root         1  30   10 10980K  2340K select  1   0:00   0.04% make
99013 root         1  30   10 23784K 15508K select  1   0:00   0.04% make
 2712 root         1  20    0 14352K  5624K select  1   0:00   0.02% devd
 3137 root         1  20    0 10860K  2456K select  0   0:00   0.01% powerd
I can definitely confirm that running stock r326658 with r326347 reverted fixes the ARC under-use for me.
I think this diff will address the problem: https://reviews.freebsd.org/D13412
(In reply to Mark Johnston from comment #18) I confirmed that the fix in D13412 resolves the problem for me.
A commit references this bug:

Author: markj
Date: Thu Dec  7 19:38:09 UTC 2017
New revision: 326664
URL: https://svnweb.freebsd.org/changeset/base/326664

Log:
  Fix the UMA reclaim worker after r326347.

  atomic_set_*() sets a bit in the target memory location, so
  atomic_set_int(&uma_reclaim_needed, 0) does not do what it looks like
  it does.

  PR:		224080
  Reviewed by:	jeff, kib
  Differential Revision:	https://reviews.freebsd.org/D13412

Changes:
  head/sys/vm/uma_core.c
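The pitfall the commit describes can be demonstrated in a small userland sketch. FreeBSD's kernel atomic_set_int(p, v) ORs v into *p rather than storing v; the toy functions below (invented names, using C11 atomics in place of the kernel primitives) contrast the buggy "clear" with the intended one:

```c
#include <stdatomic.h>

/*
 * FreeBSD's atomic_set_int(p, v) is a bitwise OR, not an assignment.
 * Modeled here with C11 atomic_fetch_or() in userland.
 */
static int
clear_with_or(int initial)
{
	atomic_int flag = initial;

	atomic_fetch_or(&flag, 0);	/* the buggy "clear": OR with 0 is a no-op */
	return (atomic_load(&flag));
}

static int
clear_with_store(int initial)
{
	atomic_int flag = initial;

	atomic_store(&flag, 0);		/* the intended clear: an actual store */
	return (atomic_load(&flag));
}
```

So with uma_reclaim_needed left permanently non-zero, the reclaim worker kept running, which matches the spinning arc/uma threads reported above.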
(In reply to commit-hook from comment #20) Thanks for the commit. Unfortunately, the issue is still not solved for me.

How to repeat:
1) freshly reboot FreeBSD 12-CURRENT r326668M
2) login as "root"
3) start "top -s 1 -P -H -S"
4) switch to another VTY
5) login as "root"
6) start "svnlite up /usr/src"
7) switch back to the VTY where "top" is running

"arc_reclaim_thread" rises up to 70% CPU load...
(In reply to Nils Beyer from comment #21) ...and total ARC usage stays under 1GB even though "vfs.zfs.arc_max" is set to 16GB...
Can you tell me what the following two sysctls hold:

vm.kmem_map_free
vm.kmem_map_size

and then run this for a few seconds:

dtrace -n 'fbt::uma_reclaim_wakeup:entry { @[stack()] = count (); }'
(In reply to Jeff Roberson from comment #23)
--------------------------------------------------------------------------------
vm.kmem_map_free: 15830609920
vm.kmem_map_size: 801755136

# dtrace -n 'fbt::uma_reclaim_wakeup:entry { @[stack()] = count (); }'
dtrace: description 'fbt::uma_reclaim_wakeup:entry ' matched 1 probe
^C (after svnlite up finishes)
--------------------------------------------------------------------------------
I have to add that I'm not using any debug options in my kernel config...
I see another problem in arc_available_memory(): we're using vmem_size(kernel_arena, VMEM_FREE) to determine whether to reap memory. It used to be vmem_size(kmem_arena), and kmem_arena was initialized with vm_kmem_size bytes. But kernel_arena is grown on demand, so it's now bogus to use the arena's size to determine whether to reclaim memory.
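To make that distinction concrete, here is a toy model (all struct fields, function names, and numbers are invented for illustration and are not the kernel vmem API). A fixed-size arena imported up front reports meaningful free space, while a demand-grown arena only reports the slack inside what it has imported so far, so its "free" figure stays tiny even when the system has plenty of headroom:

```c
/* Toy model of an arena, sizes in MB (invented fields, not kernel vmem). */
struct toy_arena {
	long	imported_mb;	/* spans currently imported into the arena */
	long	inuse_mb;	/* allocations handed out of those spans */
};

/* Analogue of vmem_size(arena, VMEM_FREE): slack inside imported spans. */
static long
toy_vmem_free(const struct toy_arena *a)
{
	return (a->imported_mb - a->inuse_mb);
}

/* The kind of check arc_available_memory() was making against the arena. */
static int
toy_should_reap(const struct toy_arena *a, long threshold_mb)
{
	return (toy_vmem_free(a) < threshold_mb);
}
```

With a pre-sized arena like the old kmem_arena (say 16384 MB imported, 1024 MB in use), toy_should_reap() stays false; with a demand-grown arena that has imported only a little beyond its current use, the same check fires constantly despite ample real memory, which is the spurious-reclaim behaviour described here.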
We can use the new uma limit facility. I can produce a diff in a couple of hours. This is essentially the same thing that kmem_map_free does, however:

static int
sysctl_kmem_map_free(SYSCTL_HANDLER_ARGS)
{
	u_long size, limit;

	/* The sysctl is unsigned, implement as a saturation value. */
	size = uma_size();
	limit = uma_limit();
	if (size > limit)
		size = 0;
	else
		size = limit - size;
	return (sysctl_handle_long(oidp, &size, 0, req));
}

We should make a uma accessor to compute this.
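Such an accessor would be the saturating subtraction from the sysctl handler, factored out. A hedged userland sketch (the toy_* names and stub values are invented here; this is not the committed kernel code):

```c
/* Stubs standing in for UMA's bookkeeping, in MB (invented values). */
static unsigned long toy_uma_size = 2048;	/* currently allocated */
static unsigned long toy_uma_limit = 8192;	/* configured cap */

/*
 * uma_avail()-style accessor: headroom below the limit, saturating
 * at zero so callers never see an underflowed "negative" value.
 */
static unsigned long
toy_uma_avail(void)
{
	if (toy_uma_size > toy_uma_limit)
		return (0);
	return (toy_uma_limit - toy_uma_size);
}
```

Both the sysctl handler and any low-memory check in the ARC code could then share this one computation instead of each reimplementing the saturation logic.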
Confirmed. The latest patch partially fixes the problem.

Mem: 186M Active, 60M Inact, 2123M Wired, 103M Buf, 2572M Free
ARC: 1012M Total, 368M MFU, 487M MRU, 11M Anon, 19M Header, 128M Other

This is on my sandbox machine (5 GB RAM) (let's call it machine 3 now).

bob<1># sysctl vm.kmem_map_free
vm.kmem_map_free: 1089998848
bob<1># sysctl vm.kmem_map_size
vm.kmem_map_size: 520613888
bob<1># sysctl hw.physmem
hw.physmem: 5346369536
bob<1>#

bob# dtrace -n 'fbt::uma_reclaim_wakeup:entry { @[stack()] = count (); }'
dtrace: description 'fbt::uma_reclaim_wakeup:entry ' matched 1 probe

There is no output. I'll get a chance to test more fully on another machine (machine 2 before) tonight. I've set up a test partition to boot from to allow quick testing.
Continuing to test on my sandbox (the other machines are still busy with other builds):

ARC: 429M Total, 9067K MFU, 262M MRU, 2618K Anon, 4246K Header, 151M Other
     ^^^^        ^^^^^

buildworld/buildkernel -DNO_CLEAN and a tar of /usr/src are in progress.

ARC: 445M Total, 3880K MFU, 281M MRU, 2288K Anon, 3056K Header, 156M Other
     ^^^^        ^^^^^

I unset the arbitrary kmem_size_max and kmem_size limits. They are now:

bob<2># sysctl vm.kmem_size
vm.kmem_size: 5181820928
bob<2># sysctl vm.kmem_size_max
vm.kmem_size_max: 1319413950874
bob<2>#
(In reply to Jeff Roberson from comment #26) I've applied a patch similar to the one discussed here. There is some improvement, but more testing and comparison with r326347 removed is needed.

On my two 8 GB machines (with zfs arc tuning removed from loader.conf) I get:

2 core + 2 threads:

CPU:  0.9% user, 95.9% nice,  3.0% system,  0.0% interrupt,  0.2% idle
Mem: 1500M Active, 318M Inact, 33M Laundry, 1762M Wired, 154M Buf, 4148M Free
ARC: 840M Total, 141M MFU, 545M MRU, 16M Anon, 12M Header, 127M Other
     288M Compressed, 893M Uncompressed, 3.10:1 Ratio

slippy# sysctl vm.kmem_size
vm.kmem_size: 8138756096
slippy# sysctl vm.kmem_size_max
vm.kmem_size_max: 1319413950874
slippy# sysctl hw.physmem
hw.physmem: 8365903872
slippy#
slippy$ sysctl vfs.zfs.arc_max
vfs.zfs.arc_max: 7065014272
slippy$

2 core:

CPU:  0.0% user, 94.9% nice,  4.1% system,  0.6% interrupt,  0.4% idle
Mem: 752M Active, 269M Inact, 1705M Wired, 111M Buf, 5215M Free
ARC: 862M Total, 218M MFU, 456M MRU, 15M Anon, 15M Header, 156M Other
     343M Compressed, 1079M Uncompressed, 3.15:1 Ratio

cwsys# sysctl vm.kmem_size
vm.kmem_size: 8325849088
cwsys# sysctl vm.kmem_size_max
vm.kmem_size_max: 1319413950874
cwsys# sysctl hw.physmem
hw.physmem: 8566394880
cwsys#
cwsys$ sysctl vfs.zfs.arc_max
vfs.zfs.arc_max: 7252107264
cwsys$

It appears ARC is being limited to ~840-860 MB on the 8 GB machines.

The 5 GB machine (2 core):

CPU:  0.0% user, 95.1% nice,  4.7% system,  0.2% interrupt,  0.0% idle
Mem: 590M Active, 147M Inact, 1126M Wired, 101M Buf, 3077M Free
ARC: 488M Total, 50M MFU, 334M MRU, 6630K Anon, 5713K Header, 91M Other
     107M Compressed, 431M Uncompressed, 4.02:1 Ratio

bob# sysctl vm.kmem_size
vm.kmem_size: 5180735488
bob# sysctl vm.kmem_size_max
vm.kmem_size_max: 1319413950874
bob# sysctl hw.physmem
hw.physmem: 5345284096
bob#
bob# sysctl vfs.zfs.arc_max
vfs.zfs.arc_max: 4106993664
bob#

We might want to consider reverting r326347 until we can fix this.
With r326347 removed and the kmem_size, kmem_size_max and arc_max tunables removed from loader.conf:

Laptop with 8 GB, running two processes each of tar of /usr/src and /usr/obj, plus buildworld/buildkernel -DNO_CLEAN and firefox:

CPU:  2.7% user, 95.0% nice,  2.1% system,  0.1% interrupt,  0.0% idle
Mem: 617M Active, 67M Inact, 204M Laundry, 6505M Wired, 144M Buf, 369M Free
ARC: 4879M Total, 1830M MFU, 2630M MRU, 21M Anon, 54M Header, 343M Other
     3983M Compressed, 10G Uncompressed, 2.69:1 Ratio
Swap: 12G Total, 382M Used, 12G Free, 3% Inuse, 128K In

slippy$ sysctl vm.kmem_size
vm.kmem_size: 8138752000
slippy$ sysctl vm.kmem_size_max
vm.kmem_size_max: 1319413950874
slippy$ sysctl hw.physmem
hw.physmem: 8365899776
slippy$ sysctl vfs.zfs.arc_max
vfs.zfs.arc_max: 7065010176
slippy$

Server machine in the basement, also with 8 GB, likewise running two processes each of tar of /usr/src and /usr/obj and buildworld/buildkernel -DNO_CLEAN, no firefox:

CPU:  0.0% user, 97.0% nice,  2.6% system,  0.4% interrupt,  0.0% idle
Mem: 257M Active, 246M Inact, 36K Laundry, 6690M Wired, 110M Buf, 746M Free
ARC: 5153M Total, 1262M MFU, 3472M MRU, 20M Anon, 53M Header, 346M Other
     4364M Compressed, 11G Uncompressed, 2.51:1 Ratio
Swap: 20G Total, 95M Used, 20G Free

cwsys$ sysctl vm.kmem_size
vm.kmem_size: 8325844992
cwsys$ sysctl vm.kmem_size_max
vm.kmem_size_max: 1319413950874
cwsys$ sysctl hw.physmem
hw.physmem: 8566390784
cwsys$ sysctl vfs.zfs.arc_max
vfs.zfs.arc_max: 7252103168
cwsys$

This is also with the additional patch discussed in comment #26 removed.
Created attachment 188780 [details]
Proposed fix

Please test this fix for arclowmem et al.
Currently running on four amd64 machines. Initial tests indicate this PR is fixed. Thanks Jeff. I can test on an i386 machine this weekend.
On i386 this probably needs a cast:

/opt/src/svn-current/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:6115:21: error: ordered comparison between pointer and integer ('uint64_t' (aka 'unsigned long long') and 'long (*)(void)') [-Werror]
        available_memory = MIN(available_memory, uma_avail);
                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/src/svn-current/sys/sys/param.h:305:23: note: expanded from macro 'MIN'
#define MIN(a,b) (((a)<(b))?(a):(b))
                    ~~~^~~~
arc.c line 6115 should be:

available_memory = MIN(available_memory, uma_avail());
                                                  ^^

That'll fix the i386 build.
(In reply to Jeff Roberson from comment #31) Thank you. Confirmed - with this patch, "arc_reclaim_thread" stays quiet and ARC usage rises accordingly. Disk usage has become normal and CPU time goes to user processes. Any "arclowmem" tests, anyone?
I can confirm that uma_avail) should've been uma_avail()) on all architectures. pho has reported that this patch gives him performance identical to the code pre-dating the change.
There are also the rpi2 and UFS-related list submittals against -r326347 causing booting to hang, starting with:

https://lists.freebsd.org/pipermail/freebsd-hackers/2017-December/051939.html

and ending with:

https://lists.freebsd.org/pipermail/freebsd-hackers/2017-December/051947.html

These are tied to boot hangs with the kernel thread (100001) involving:

. . .
uma_zcreate() at uma_zcreate+0x10c
         pc = 0xc052b024  lr = 0xc050cc90 (ffs_mount+0x80)
         sp = 0xd42a5858  fp = 0xd42a5980
         r4 = 0xd7844000  r5 = 0x00000000
         r6 = 0x00000003  r7 = 0xc09a99c0
         r8 = 0x00000000  r9 = 0xd429f000
        r10 = 0xd828c400
ffs_mount() at ffs_mount+0x80
         pc = 0xc050cc90  lr = 0xc032e1d4 (vfs_donmount+0xeec)
         sp = 0xd42a5988  fp = 0xd42a5b38
         r4 = 0xffffffff  r5 = 0xd74ec120
         r6 = 0xd429f000  r7 = 0x00000000
         r8 = 0x00000000  r9 = 0xc425acd0
        r10 = 0xd828c400
. . .

and the uma thread involving:

. . .
uma_reclaim_locked() at uma_reclaim_locked+0x200
         pc = 0xc052d9f8  lr = 0xc052de70 (uma_reclaim_worker+0x4c)
         sp = 0xd87dadf0  fp = 0xd87dae20
         r4 = 0xc09a9c68  r5 = 0xc4242d80
         r6 = 0xc06df86e  r7 = 0x00000000
         r8 = 0x00000100  r9 = 0xc09b3d08
        r10 = 0xc09a9c7c
uma_reclaim_worker() at uma_reclaim_worker+0x4c
         pc = 0xc052de70  lr = 0xc0230f10 (fork_exit+0xa0)
         sp = 0xd87dae28  fp = 0xd87dae40
         r4 = 0xd7ad8740  r5 = 0xd70ce000
         r6 = 0xc052de24  r7 = 0x00000000
         r8 = 0xd87dae48  r9 = 0xd7ada3a0
        r10 = 0xc0824e8c
. . .

(extracted from the last of those messages). Bisection activity isolated -r326347 as the revision where the problem started.
(In reply to Mark Millard from comment #37) It probably makes sense to track this as a separate incident. It's easier to find this through keywords. What do you think?
(In reply to Cy Schubert from comment #38) I can extract from the prior list messages and make a submittal if desired.

There is also a complaint for pine64 from Emmanuel Vadot:

https://lists.freebsd.org/pipermail/svn-src-head/2017-December/106954.html

reporting that as of -r326347:

    The board just hang when plugin usb device

That would seem to match my rpi2 reports: I have a USB SSD on a powered hub for the root file system on the rpi2 that I was using. (Recently the mmcsd0 mechanism that holds in the sdcard broke, but the eject mechanism still works; I hold in the sdcard during the early boot. But I've used a USB SSD for such small boards for a long time.)
(In reply to Cy Schubert from comment #38) I've submitted bugzilla 224330 for Emmanuel Vadot's and my list messages tied to USB on pine64 (Emmanuel) and rpi2 (me). Mostly I referenced the list messages instead of copying over the information. I do not have access to the Keywords field there, nor to the See Also field here in bugzilla 224080.
Another -r326347 and zfs combination list report is part of the material in the pair:

https://lists.freebsd.org/pipermail/freebsd-current/2017-November/067613.html
https://lists.freebsd.org/pipermail/freebsd-current/2017-November/067618.html

These folks might be able to test whether their problems have been fixed when things are ready for such.
The first report says r326347 was the last known good kernel. There may be other unrelated bugs.
(In reply to Jeff Roberson from comment #42) You may well be correct about unrelated bugs. But it looks like I parse the 067613.html report differently than you do:

A) First it says:

    Most recent CURRENT, as with r326363, crashes on booting the kernel

This is different from what it later reports about -r326347, as I parse the statements.

B) Then it says:

    The last known half-ways good kernel is 12.0-CURRENT #157 r326347

Note the "half-ways" part. It then continues with what I interpreted as a description of "half-ways" in a booted -r326347 context (so, not for -r326363):

    When performing buildworld/buildkernel/relaese/package or poudriere
    on a ZFS filesystem, I can now bring down the system almost in a
    reproducible way.

I could be wrong, but I do not see that as a report of "the last known good kernel".
I have sent the patch and a short description to the person who made the report. I agree I may have misparsed.
A commit references this bug:

Author: jeff
Date: Tue Jan  2 04:35:57 UTC 2018
New revision: 327485
URL: https://svnweb.freebsd.org/changeset/base/327485

Log:
  Fix arc after r326347 broke various memory limit queries.  Use UMA
  features rather than kmem arena size to determine available memory.

  Initialize the UMA limit to LONG_MAX to avoid spurious wakeups on boot
  before the real limit is set.

  PR:		224330 (partial), 224080
  Reviewed by:	markj, avg
  Sponsored by:	Netflix / Dell EMC Isilon
  Differential Revision:	https://reviews.freebsd.org/D13494

Changes:
  head/sys/cddl/compat/opensolaris/sys/kmem.h
  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
  head/sys/vm/uma.h
  head/sys/vm/uma_core.c
  head/sys/vm/uma_int.h
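The LONG_MAX detail from the log can be illustrated with a toy model (the toy_* names and MB units are invented for illustration; this is not the committed code). If the reclaim condition is "in use exceeds limit" and the limit defaulted to 0 before tuning, every early-boot allocation would spuriously wake the reclaim worker; defaulting to LONG_MAX keeps the check false until the real limit is configured:

```c
#include <limits.h>

/* Toy model of the reclaim-wakeup condition, sizes in MB (invented names). */
static long toy_limit = LONG_MAX;	/* boot-time default per the fix */

/* The worker should only be kicked when usage exceeds the configured cap. */
static int
toy_need_reclaim(long inuse_mb)
{
	return (inuse_mb > toy_limit);
}
```

Once boot-time tuning later stores the real cap into toy_limit, the same check starts enforcing it normally.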