| Summary: | ZFS ARC and L2ARC are unrealistically large, maybe after r307265 | | |
|---|---|---|---|
| Product: | Base System | Reporter: | Lev A. Serebryakov <lev> |
| Component: | kern | Assignee: | Andriy Gapon <avg> |
| Status: | Closed FIXED | | |
| Severity: | Affects Some People | CC: | avg, ben.rubson, k_georgiev, lev, mav, remi.guyomarch |
| Priority: | --- | | |
| Version: | 11.0-STABLE | | |
| Hardware: | Any | | |
| OS: | Any | | |
| Attachments: | 179917 (the local patch), 180167 (dtrace script), 180235 (the patch) | | |
Maybe related to bug #216364.

(In reply to Lev A. Serebryakov from comment #0) Lev, I haven't forgotten about this issue. Unfortunately, I cannot devote as much time to it as I would like. I revisited it again and couldn't spot anything new besides what my old patch was supposed to fix: https://docs.freebsd.org/cgi/getmsg.cgi?fetch=24428+0+archive/2016/freebsd-fs/20161106.freebsd-fs I know that George Wilson was going to work on this problem. Perhaps he has something for you to test or debug.

Lev, I couldn't reproduce your problem using FreeBSD head and a similar configuration (raidz + l2arc); I couldn't reproduce the problem with L2ARC accounting as much as I tried.

I can provide any additional information, including dumping kernel structures with the debugger and such.
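For anyone else trying to reproduce the raidz + l2arc configuration without spare disks, a throwaway file-backed pool is one option; this is only a sketch (device names, sizes and the pool name are arbitrary and not taken from this report):

# Hypothetical reproduction setup: file-backed md devices, raidz + cache vdev
truncate -s 2g /tmp/zd0 /tmp/zd1 /tmp/zd2 /tmp/zd3 /tmp/zd4 /tmp/zl2
for f in /tmp/zd0 /tmp/zd1 /tmp/zd2 /tmp/zd3 /tmp/zd4 /tmp/zl2; do
    mdconfig -a -t vnode -f $f       # prints md0..md5 on a fresh system
done
zpool create testpool raidz md0 md1 md2 md3 md4 cache md5
# generate I/O churn, then watch the cache vdev accounting:
zpool list -v testpool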
Now it shows:
% zpool list -v
NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
zroot 39.9G 14.6G 25.3G - 49% 36% 1.00x ONLINE -
gpt/root 39.9G 14.6G 25.3G - 49% 36%
zstor 13.6T 9.01T 4.62T - 24% 66% 1.00x ONLINE -
raidz1 13.6T 9.01T 4.62T - 24% 66%
ada1 - - - - - -
ada2 - - - - - -
ada3 - - - - - -
ada4 - - - - - -
ada5 - - - - - -
cache - - - - - -
gpt/l2arc 185G 1.05T 16.0E - 0% 581%
%
which looks completely unrealistic, as it would imply more than 4x compression!
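One way to sanity-check whether this is real data or broken accounting is to compare the L2ARC kstats against the cache provider's actual capacity; a small sketch, using the kstat names that appear later in this report and the gpt/l2arc provider from the listing above:

# L2ARC sizes as accounted by ZFS (bytes)
sysctl kstat.zfs.misc.arcstats.l2_size kstat.zfs.misc.arcstats.l2_asize
# actual capacity of the cache provider
diskinfo -v /dev/gpt/l2arc | grep bytes
# l2_asize several times larger than the provider means the accounting is off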
(In reply to Lev A. Serebryakov from comment #4) It has nothing to do with compression: the numbers are for the physical / actual space on disk. It's a problem with the space accounting. I am not sure whether the bad accounting and the checksum errors have the same cause, or whether the checksum errors somehow cause the bad accounting. Is there a chance that you could use a different disk for the cache device?

I could use another 750 EVO disk, a whole 120GB one instead of a partition on the 250GB, in a day or two (I'm waiting for an additional HBA right now!). Maybe the checksum errors are a result of TRIM errors? Right now this SSD doesn't have any quirks in our (FreeBSD) sources, but its "big brothers" (like the 850 EVO and 850 PRO) are marked with "no TRIM with NCQ" (see the discussion about this SSD and its quirks here: https://reviews.freebsd.org/D9478). But ZFS (the zroot pool) on the same device (another partition) doesn't show any errors; I've checked it with "scrub".

(In reply to Lev A. Serebryakov from comment #6) Hard to tell. To be honest, the TRIM code for L2ARC is rather useless. I would remove it altogether to exclude its effects (see the sketch below for a sysctl-level way to take TRIM out of the picture).

I also faced this "16.0E" issue (without compression enabled): https://www.illumos.org/issues/7410

Created attachment 179917 [details]
the local patch
Just to clarify, I am using head plus the local patch, not a vanilla head.
The patch should be the same as the one I sent you earlier.
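As mentioned above: short of patching the L2ARC TRIM code out, TRIM can be taken out of the picture with the existing knob that also shows up in the zfs-stats dump later in this report (vfs.zfs.trim.enabled). A minimal sketch, assuming the knob is a boot-time tunable on this branch:

# disable ZFS TRIM entirely to exclude its effects from the next test run
echo 'vfs.zfs.trim.enabled=0' >> /boot/loader.conf
# reboot, then confirm:
sysctl vfs.zfs.trim.enabled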
(In reply to Andriy Gapon from comment #9) Ok, I've replaced the L2ARC with a whole Samsung 850 EVO 250GB SSD on an "mpr" (LSI-3008) HBA and rebuilt the system from r313940 (stable/11) + this patch. Let's see!

Same result.
11.0-STABLE FreeBSD 11.0-STABLE #8 r313940M: Sun Feb 19 15:16:42 MSK 2017
Attached patch was applied!
% sudo smartctl -A /dev/da5
smartctl 6.5 2016-05-07 r4318 [FreeBSD 11.0-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Samsung based SSDs
Device Model: Samsung SSD 850 EVO 250GB
Serial Number: S2R4NB0J115081X
LU WWN Device Id: 5 002538 d41a03ed9
Firmware Version: EMT02B6Q
User Capacity: 250,059,350,016 bytes [250 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2, ATA8-ACS T13/1699-D revision 4c
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Mon Feb 20 14:41:13 2017 MSK
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
...
% zpool list -v
NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
zroot 95.5G 18.1G 77.4G - 16% 18% 1.00x ONLINE -
gpt/root 95.5G 18.1G 77.4G - 16% 18%
zstor 13.6T 9.15T 4.48T - 25% 67% 1.00x ONLINE -
raidz1 13.6T 9.15T 4.48T - 25% 67%
da1 - - - - - -
da0 - - - - - -
da2 - - - - - -
da3 - - - - - -
da4 - - - - - -
cache - - - - - -
da5 233G 526G 16.0E - 0% 225%
% sysctl -a | grep l2
kern.features.linuxulator_v4l2: 1
kern.cam.ctl2cam.max_sense: 252
vfs.zfs.l2c_only_size: 0
vfs.zfs.l2arc_norw: 1
vfs.zfs.l2arc_feed_again: 1
vfs.zfs.l2arc_noprefetch: 1
vfs.zfs.l2arc_feed_min_ms: 200
vfs.zfs.l2arc_feed_secs: 1
vfs.zfs.l2arc_headroom: 2
vfs.zfs.l2arc_write_boost: 8388608
vfs.zfs.l2arc_write_max: 8388608
vfs.cache.numfullpathfail2: 0
kstat.zfs.misc.arcstats.l2_write_buffer_list_null_iter: 4522
kstat.zfs.misc.arcstats.l2_write_buffer_list_iter: 567486
kstat.zfs.misc.arcstats.l2_write_buffer_bytes_scanned: 10356335975424
kstat.zfs.misc.arcstats.l2_write_pios: 131233
kstat.zfs.misc.arcstats.l2_write_buffer_iter: 142392
kstat.zfs.misc.arcstats.l2_write_full: 34822
kstat.zfs.misc.arcstats.l2_write_not_cacheable: 6310602
kstat.zfs.misc.arcstats.l2_write_io_in_progress: 376
kstat.zfs.misc.arcstats.l2_write_in_l2: 63312026
kstat.zfs.misc.arcstats.l2_write_spa_mismatch: 3663622
kstat.zfs.misc.arcstats.l2_write_passed_headroom: 255811
kstat.zfs.misc.arcstats.l2_write_trylock_fail: 6901
kstat.zfs.misc.arcstats.l2_padding_needed: 0
kstat.zfs.misc.arcstats.l2_hdr_size: 93854992
kstat.zfs.misc.arcstats.l2_asize: 565424867840
kstat.zfs.misc.arcstats.l2_size: 567082335232
kstat.zfs.misc.arcstats.l2_io_error: 0
kstat.zfs.misc.arcstats.l2_cksum_bad: 430864
kstat.zfs.misc.arcstats.l2_abort_lowmem: 3
kstat.zfs.misc.arcstats.l2_free_on_write: 273
kstat.zfs.misc.arcstats.l2_evict_l1cached: 69699
kstat.zfs.misc.arcstats.l2_evict_reading: 0
kstat.zfs.misc.arcstats.l2_evict_lock_retry: 10
kstat.zfs.misc.arcstats.l2_writes_lock_retry: 4
kstat.zfs.misc.arcstats.l2_writes_error: 0
kstat.zfs.misc.arcstats.l2_writes_done: 131233
kstat.zfs.misc.arcstats.l2_writes_sent: 131233
kstat.zfs.misc.arcstats.l2_write_bytes: 648248758272
kstat.zfs.misc.arcstats.l2_read_bytes: 326535054336
kstat.zfs.misc.arcstats.l2_rw_clash: 0
kstat.zfs.misc.arcstats.l2_feeds: 142392
kstat.zfs.misc.arcstats.l2_misses: 3610592
kstat.zfs.misc.arcstats.l2_hits: 1151244
kstat.zfs.misc.arcstats.evict_l2_skip: 0
kstat.zfs.misc.arcstats.evict_l2_ineligible: 351227303936
kstat.zfs.misc.arcstats.evict_l2_eligible: 21983154688
kstat.zfs.misc.arcstats.evict_l2_cached: 1323198704640
%
kstat.zfs.misc.arcstats.l2_cksum_bad starts to rise right after L2ARC "overfill"!
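One way to catch the moment the counter starts climbing is a crude periodic sampler of the relevant kstats; a sketch (the interval and log path are arbitrary):

# sample L2ARC fill and checksum-error counters once a minute
while true; do
    date
    sysctl kstat.zfs.misc.arcstats.l2_asize \
           kstat.zfs.misc.arcstats.l2_size \
           kstat.zfs.misc.arcstats.l2_cksum_bad
    sleep 60
done >> /var/tmp/l2arc-watch.log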
Same thing here, running 10.3-STABLE r313140.
It did NOT happen on r301989.
This is a large virtual NAS, offering both NFSv3 and SMB shares. Cache devices are also virtualized; TRIM isn't running here.
# zpool list -v tank
NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
tank 288T 174T 114T - 19% 60% 1.00x ONLINE -
raidz2 48,0T 28,9T 19,0T - 19% 60%
da9 - - - - - -
da10 - - - - - -
da11 - - - - - -
da12 - - - - - -
da13 - - - - - -
da14 - - - - - -
raidz2 48,0T 28,9T 19,0T - 19% 60%
da15 - - - - - -
da16 - - - - - -
da17 - - - - - -
da18 - - - - - -
da19 - - - - - -
da20 - - - - - -
raidz2 48,0T 28,9T 19,0T - 19% 60%
da21 - - - - - -
da22 - - - - - -
da23 - - - - - -
da24 - - - - - -
da25 - - - - - -
da26 - - - - - -
raidz2 48,0T 29,0T 19,0T - 19% 60%
da27 - - - - - -
da28 - - - - - -
da29 - - - - - -
da30 - - - - - -
da31 - - - - - -
da32 - - - - - -
raidz2 48,0T 28,9T 19,0T - 19% 60%
da33 - - - - - -
da34 - - - - - -
da35 - - - - - -
da36 - - - - - -
da37 - - - - - -
da38 - - - - - -
raidz2 48,0T 28,9T 19,0T - 19% 60%
da39 - - - - - -
da40 - - - - - -
da41 - - - - - -
da42 - - - - - -
da43 - - - - - -
da44 - - - - - -
log - - - - - -
mirror 2,98G 2,01M 2,98G - 20% 0%
da1 - - - - - -
da2 - - - - - -
cache - - - - - -
da3 256G 764G 16,0E - 0% 298%
da4 256G 757G 16,0E - 0% 295%
da5 256G 762G 16,0E - 0% 297%
da6 256G 747G 16,0E - 0% 291%
da7 256G 776G 16,0E - 0% 303%
da8 256G 743G 16,0E - 0% 290%
# zfs-stats -a
------------------------------------------------------------------------
ZFS Subsystem Report Mon Feb 20 16:33:13 2017
------------------------------------------------------------------------
System Information:
Kernel Version: 1003511 (osreldate)
Hardware Platform: amd64
Processor Architecture: amd64
ZFS Storage pool Version: 5000
ZFS Filesystem Version: 5
FreeBSD 10.3-STABLE #2 r313140M: Fri Feb 3 09:38:12 CET 2017 root
16:33 up 7 days, 8:32, 1 user, load averages: 0,16 0,37 0,55
------------------------------------------------------------------------
System Memory:
0.00% 5.28 MiB Active, 0.47% 670.72 MiB Inact
89.61% 125.70 GiB Wired, 0.00% 0 Cache
9.92% 13.92 GiB Free, 0.00% 4.00 KiB Gap
Real Installed: 160.00 GiB
Real Available: 89.98% 143.97 GiB
Real Managed: 97.44% 140.28 GiB
Logical Total: 160.00 GiB
Logical Used: 90.89% 145.42 GiB
Logical Free: 9.11% 14.58 GiB
Kernel Memory: 1.30 GiB
Data: 97.94% 1.27 GiB
Text: 2.06% 27.29 MiB
Kernel Memory Map: 140.28 GiB
Size: 82.86% 116.24 GiB
Free: 17.14% 24.04 GiB
------------------------------------------------------------------------
ARC Summary: (HEALTHY)
Memory Throttle Count: 0
ARC Misc:
Deleted: 23.76m
Recycle Misses: 0
Mutex Misses: 36.19k
Evict Skips: 6.43k
ARC Size: 83.09% 115.73 GiB
Target Size: (Adaptive) 83.11% 115.76 GiB
Min Size (Hard Limit): 12.50% 17.41 GiB
Max Size (High Water): 8:1 139.28 GiB
ARC Size Breakdown:
Recently Used Cache Size: 62.61% 72.48 GiB
Frequently Used Cache Size: 37.39% 43.28 GiB
ARC Hash Breakdown:
Elements Max: 14.41m
Elements Current: 98.47% 14.19m
Collisions: 16.02m
Chain Max: 7
Chains: 2.27m
------------------------------------------------------------------------
ARC Efficiency: 3.28b
Cache Hit Ratio: 18.94% 620.69m
Cache Miss Ratio: 81.06% 2.66b
Actual Hit Ratio: 5.26% 172.47m
Data Demand Efficiency: 30.02% 138.53m
Data Prefetch Efficiency: 82.77% 124.81m
CACHE HITS BY CACHE LIST:
Anonymously Used: 71.34% 442.81m
Most Recently Used: 2.11% 13.07m
Most Frequently Used: 25.68% 159.41m
Most Recently Used Ghost: 0.02% 102.06k
Most Frequently Used Ghost: 0.86% 5.31m
CACHE HITS BY DATA TYPE:
Demand Data: 6.70% 41.58m
Prefetch Data: 16.64% 103.30m
Demand Metadata: 3.92% 24.33m
Prefetch Metadata: 72.74% 451.49m
CACHE MISSES BY DATA TYPE:
Demand Data: 3.65% 96.95m
Prefetch Data: 0.81% 21.51m
Demand Metadata: 95.51% 2.54b
Prefetch Metadata: 0.03% 880.35k
------------------------------------------------------------------------
L2 ARC Summary: (DEGRADED)
Passed Headroom: 975.75k
Tried Lock Failures: 121.14m
IO In Progress: 5
Low Memory Aborts: 217
Free on Write: 181.66k
Writes While Full: 46.56k
R/W Clashes: 0
Bad Checksums: 3.00m
IO Errors: 0
SPA Mismatch: 1.97b
L2 ARC Size: (Adaptive) 6.82 TiB
Header Size: 0.01% 973.67 MiB
L2 ARC Evicts:
Lock Retries: 120
Upon Reading: 0
L2 ARC Breakdown: 2.66b
Hit Ratio: 0.61% 16.16m
Miss Ratio: 99.39% 2.64b
Feeds: 688.84k
L2 ARC Buffer:
Bytes Scanned: 5.87 PiB
Buffer Iterations: 688.84k
List Iterations: 2.75m
NULL List Iterations: 2.97k
L2 ARC Writes:
Writes Sent: 100.00% 365.54k
------------------------------------------------------------------------
File-Level Prefetch: (HEALTHY)
DMU Efficiency: 16.05b
Hit Ratio: 2.21% 354.01m
Miss Ratio: 97.79% 15.69b
Colinear: 0
Hit Ratio: 100.00% 0
Miss Ratio: 100.00% 0
Stride: 0
Hit Ratio: 100.00% 0
Miss Ratio: 100.00% 0
DMU Misc:
Reclaim: 0
Successes: 100.00% 0
Failures: 100.00% 0
Streams: 0
+Resets: 100.00% 0
-Resets: 100.00% 0
Bogus: 0
------------------------------------------------------------------------
VDEV Cache Summary: 5.56m
Hit Ratio: 22.19% 1.23m
Miss Ratio: 65.26% 3.63m
Delegations: 12.55% 696.99k
------------------------------------------------------------------------
ZFS Tunables (sysctl):
kern.maxusers 9550
vm.kmem_size 150625865728
vm.kmem_size_scale 1
vm.kmem_size_min 0
vm.kmem_size_max 1319413950874
vfs.zfs.trim.max_interval 1
vfs.zfs.trim.timeout 30
vfs.zfs.trim.txg_delay 32
vfs.zfs.trim.enabled 0
vfs.zfs.vol.unmap_enabled 1
vfs.zfs.vol.mode 1
vfs.zfs.version.zpl 5
vfs.zfs.version.spa 5000
vfs.zfs.version.acl 1
vfs.zfs.version.ioctl 7
vfs.zfs.debug 0
vfs.zfs.super_owner 0
vfs.zfs.sync_pass_rewrite 2
vfs.zfs.sync_pass_dont_compress 5
vfs.zfs.sync_pass_deferred_free 2
vfs.zfs.zio.dva_throttle_enabled 1
vfs.zfs.zio.exclude_metadata 0
vfs.zfs.zio.use_uma 1
vfs.zfs.cache_flush_disable 0
vfs.zfs.zil_replay_disable 0
vfs.zfs.min_auto_ashift 12
vfs.zfs.max_auto_ashift 13
vfs.zfs.vdev.trim_max_pending 10000
vfs.zfs.vdev.bio_delete_disable 0
vfs.zfs.vdev.bio_flush_disable 0
vfs.zfs.vdev.queue_depth_pct 1000
vfs.zfs.vdev.write_gap_limit 4096
vfs.zfs.vdev.read_gap_limit 32768
vfs.zfs.vdev.aggregation_limit 131072
vfs.zfs.vdev.trim_max_active 64
vfs.zfs.vdev.trim_min_active 1
vfs.zfs.vdev.scrub_max_active 60
vfs.zfs.vdev.scrub_min_active 1
vfs.zfs.vdev.async_write_max_active 100
vfs.zfs.vdev.async_write_min_active 10
vfs.zfs.vdev.async_read_max_active 60
vfs.zfs.vdev.async_read_min_active 10
vfs.zfs.vdev.sync_write_max_active 200
vfs.zfs.vdev.sync_write_min_active 100
vfs.zfs.vdev.sync_read_max_active 100
vfs.zfs.vdev.sync_read_min_active 100
vfs.zfs.vdev.max_active 1000
vfs.zfs.vdev.async_write_active_max_dirty_percent 60
vfs.zfs.vdev.async_write_active_min_dirty_percent 30
vfs.zfs.vdev.mirror.non_rotating_seek_inc 1
vfs.zfs.vdev.mirror.non_rotating_inc 0
vfs.zfs.vdev.mirror.rotating_seek_offset 1048576
vfs.zfs.vdev.mirror.rotating_seek_inc 5
vfs.zfs.vdev.mirror.rotating_inc 0
vfs.zfs.vdev.trim_on_init 1
vfs.zfs.vdev.cache.bshift 16
vfs.zfs.vdev.cache.size 4194304
vfs.zfs.vdev.cache.max 65536
vfs.zfs.vdev.metaslabs_per_vdev 200
vfs.zfs.txg.timeout 4
vfs.zfs.space_map_blksz 4096
vfs.zfs.spa_min_slop 134217728
vfs.zfs.spa_slop_shift 5
vfs.zfs.spa_asize_inflation 24
vfs.zfs.deadman_enabled 0
vfs.zfs.deadman_checktime_ms 5000
vfs.zfs.deadman_synctime_ms 1000000
vfs.zfs.debug_flags 0
vfs.zfs.recover 0
vfs.zfs.spa_load_verify_data 1
vfs.zfs.spa_load_verify_metadata 1
vfs.zfs.spa_load_verify_maxinflight 10000
vfs.zfs.ccw_retry_interval 300
vfs.zfs.check_hostid 1
vfs.zfs.mg_fragmentation_threshold 85
vfs.zfs.mg_noalloc_threshold 0
vfs.zfs.condense_pct 200
vfs.zfs.metaslab.bias_enabled 1
vfs.zfs.metaslab.lba_weighting_enabled 1
vfs.zfs.metaslab.fragmentation_factor_enabled 1
vfs.zfs.metaslab.preload_enabled 1
vfs.zfs.metaslab.preload_limit 3
vfs.zfs.metaslab.unload_delay 8
vfs.zfs.metaslab.load_pct 50
vfs.zfs.metaslab.min_alloc_size 33554432
vfs.zfs.metaslab.df_free_pct 4
vfs.zfs.metaslab.df_alloc_threshold 131072
vfs.zfs.metaslab.debug_unload 0
vfs.zfs.metaslab.debug_load 0
vfs.zfs.metaslab.fragmentation_threshold 70
vfs.zfs.metaslab.gang_bang 16777217
vfs.zfs.free_bpobj_enabled 1
vfs.zfs.free_max_blocks -1
vfs.zfs.no_scrub_prefetch 0
vfs.zfs.no_scrub_io 0
vfs.zfs.resilver_min_time_ms 3000
vfs.zfs.free_min_time_ms 1000
vfs.zfs.scan_min_time_ms 1000
vfs.zfs.scan_idle 50
vfs.zfs.scrub_delay 4
vfs.zfs.resilver_delay 2
vfs.zfs.top_maxinflight 32
vfs.zfs.zfetch.array_rd_sz 1048576
vfs.zfs.zfetch.max_distance 8388608
vfs.zfs.zfetch.min_sec_reap 2
vfs.zfs.zfetch.max_streams 64
vfs.zfs.prefetch_disable 0
vfs.zfs.delay_scale 500000
vfs.zfs.delay_min_dirty_percent 60
vfs.zfs.dirty_data_sync 67108864
vfs.zfs.dirty_data_max_percent 10
vfs.zfs.dirty_data_max_max 4294967296
vfs.zfs.dirty_data_max 4294967296
vfs.zfs.max_recordsize 1048576
vfs.zfs.send_holes_without_birth_time 1
vfs.zfs.mdcomp_disable 0
vfs.zfs.nopwrite_enabled 1
vfs.zfs.dedup.prefetch 1
vfs.zfs.l2c_only_size 0
vfs.zfs.mfu_ghost_data_esize 62297529344
vfs.zfs.mfu_ghost_metadata_esize 0
vfs.zfs.mfu_ghost_size 62297529344
vfs.zfs.mfu_data_esize 66706433536
vfs.zfs.mfu_metadata_esize 2435194880
vfs.zfs.mfu_size 69802579968
vfs.zfs.mru_ghost_data_esize 60685495808
vfs.zfs.mru_ghost_metadata_esize 0
vfs.zfs.mru_ghost_size 60685495808
vfs.zfs.mru_data_esize 49709753856
vfs.zfs.mru_metadata_esize 1613580288
vfs.zfs.mru_size 51551468032
vfs.zfs.anon_data_esize 0
vfs.zfs.anon_metadata_esize 0
vfs.zfs.anon_size 6747136
vfs.zfs.l2arc_norw 0
vfs.zfs.l2arc_feed_again 1
vfs.zfs.l2arc_noprefetch 0
vfs.zfs.l2arc_feed_min_ms 200
vfs.zfs.l2arc_feed_secs 1
vfs.zfs.l2arc_headroom 32
vfs.zfs.l2arc_write_boost 268435456
vfs.zfs.l2arc_write_max 67108864
vfs.zfs.arc_meta_limit 77309411328
vfs.zfs.arc_free_target 254980
vfs.zfs.compressed_arc_enabled 1
vfs.zfs.arc_shrink_shift 7
vfs.zfs.arc_average_blocksize 8192
vfs.zfs.arc_min 18694015488
vfs.zfs.arc_max 149552123904
------------------------------------------------------------------------
Created attachment 180167 [details]
dtrace script

(In reply to Lev A. Serebryakov from comment #11) Lev, assuming that you are still observing the condition, could you please run the attached DTrace script for a few minutes and attach its output to this bug? Also, is your kernel + zfs compiled with INVARIANTS? If not, it might be worthwhile enabling that option to increase the chances of catching the bug.

(In reply to Andriy Gapon from comment #13) No INVARIANTS, but I'll add them today. And I'll run the script and return with results.

Looks like dtrace -s doesn't like
this->tail = (arc_buf_hdr_t *)((char*)this->tail_ - this->offset);
and
this->head = (arc_buf_hdr_t *)((char*)this->head_ - this->offset);
Should I add some include files?

You mean it doesn't like the types? Try writing them as `arc_buf_hdr_t (note the leading backtick).

The error is:
dtrace: failed to compile script l2arc.d: line 13: syntax error near ")"
If I comment out line 13, the same happens on line 15. Parentheses look balanced to me! The backtick doesn't help.

Please try to write them as zfs.ko`arc_buf_hdr_t or kernel`arc_buf_hdr_t.

Same error, both with zfs.ko` and kernel`.

Can you see arc_buf_hdr_t in the output of 'ctfdump -t /boot/kernel/zfs.ko'? How about arc_buf_hdr? Could you try to use (struct zfs.ko`arc_buf_hdr *) instead of (arc_buf_hdr_t *)? I assume you use ZFS as a module?

Yes, ZFS is used as a module.

% kldstat
Id Refs Address            Size     Name
 1   50 0xffffffff80200000 d7afc0   kernel
 2    1 0xffffffff80f7c000 300088   zfs.ko
 3   11 0xffffffff8127d000 ab00     opensolaris.ko
 4    1 0xffffffff81411000 3318b    linux.ko
 5    3 0xffffffff81445000 2b9e     linux_common.ko
 6    1 0xffffffff81448000 2e845    linux64.ko
 7    1 0xffffffff81477000 4192     linprocfs.ko
 8    1 0xffffffff8147c000 357      dtraceall.ko
 9    9 0xffffffff8147d000 389c6    dtrace.ko
10    1 0xffffffff814b6000 623      dtmalloc.ko
11    1 0xffffffff814b7000 18ba     dtnfscl.ko
12    1 0xffffffff814b9000 1dcb     fbt.ko
13    1 0xffffffff814bb000 531c1    fasttrap.ko
14    1 0xffffffff8150f000 b9f      sdt.ko
15    1 0xffffffff81510000 6dc4     systrace.ko
16    1 0xffffffff81517000 6d24     systrace_freebsd32.ko
17    1 0xffffffff8151e000 fb7      profile.ko
%
% ctfdump -t /boot/kernel/zfs.ko
/boot/kernel/zfs.ko does not contain .SUNW_ctf data
%
% cat /etc/src.conf
BATCH_DELETE_OLD_FILES=yes
WITHOUT_TESTS=yes
%

"struct zfs.ko`arc_buf_hdr" gives another error:
dtrace: failed to compile script l2arc.d: line 1: probe description fbt::l2arc_write_buffers:entry does not match any probes

% sudo dtrace -l | grep l2arc
27809  fbt  zfs  l2arc_do_free_on_write  entry
27810  fbt  zfs  l2arc_evict  entry
27811  fbt  zfs  l2arc_evict  return
27812  fbt  zfs  l2arc_feed_thread  entry
27813  fbt  zfs  l2arc_read_done  entry
27814  fbt  zfs  l2arc_write_done  entry
29273  fbt  zfs  l2arc_init  entry
29731  fbt  zfs  l2arc_stop  entry
30261  fbt  zfs  l2arc_remove_vdev  entry
31154  fbt  zfs  l2arc_vdev_present  entry
31155  fbt  zfs  l2arc_vdev_present  return
31189  fbt  zfs  l2arc_fini  entry
32113  fbt  zfs  l2arc_start  entry
32114  fbt  zfs  l2arc_start  return
32334  fbt  zfs  l2arc_add_vdev  entry
34552  sdt  zfs  none  l2arc-hit
34553  sdt  zfs  none  l2arc-read
34554  sdt  zfs  none  l2arc-miss
34565  sdt  zfs  none  l2arc-evict
34566  sdt  zfs  none  l2arc-write
34567  sdt  zfs  none  l2arc-iodone
%

Should it be "l2arc_write_done"? But there is no "l2arc_write_done:return"!

(In reply to Lev A. Serebryakov from comment #21)
> Should it be "l2arc_write_done"?
No... the function probably got inlined. Sorry then, I can't help you with this DTrace script.
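The missing .SUNW_ctf data shown above is the likely reason the typed casts could not be resolved. A possible (untested here) route for retrying the script is rebuilding the kernel and modules with CTF data and then confirming the type is visible:

# GENERIC already carries 'makeoptions WITH_CTF=1'; a custom kernel config may need it added
cd /usr/src
make buildkernel KERNCONF=GENERIC && make installkernel KERNCONF=GENERIC
# after reboot, the type should show up:
ctfdump -t /boot/kernel/zfs.ko | grep arc_buf_hdr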
I could rebuild the module with debug options and without optimization…

Ok, DTrace is not an option for now, but INVARIANTS gave me a panic very quickly:

Panic String: solaris assert: write_psize <= target_sz (0x803000 <= 0x800000), file: /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c, line: 7140

I have the core and all that stuff, of course, and can provide additional info. But for now I'm turning off L2ARC, as I need a stable system.

(In reply to Lev A. Serebryakov from comment #24) Could you please send me a copy of your arc.c file exactly as it is now in your source tree?

Sent by e-mail.

Created attachment 180235 [details]
the patch
Re-uploading the patch to try to fix DOS newlines.
Lev, could you please follow up on what you see with the latest patch? Thank you!

After 2 days with the properly applied patch it seems to work! L2ARC has been at 100% fill for about 36 hours already, with no overfill and no checksum errors (!!!). I'm very sorry about the mis-applied patch, which led to such a long investigation.

A commit references this bug:

Author: avg
Date: Sat Feb 25 17:03:49 UTC 2017
New revision: 314274
URL: https://svnweb.freebsd.org/changeset/base/314274

Log:
  l2arc: try to fix write size calculation broken by Compressed ARC commit

  While there, make a change to not evict a first buffer outside the requested eviction range.

  To do:
  - give more consistent names to the size variables
  - upstream to OpenZFS

  PR: 216178
  Reported by: lev
  Tested by: lev
  MFC after: 2 weeks

Changes:
  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c

(In reply to Lev A. Serebryakov from comment #29) No worries. Thank you for your persistence with testing!

Now L2ARC is OK, but ARC is 19G on a system with 16G of RAM (and swap is empty). Looks like ARC has the same problem. And no, my data is not compressible, and a lot of programs (like a Java instance with a 2G heap!) are running and not swapped out, so some of that memory is taken by things other than ARC.

(In reply to Lev A. Serebryakov from comment #32) Lev, it's a completely different problem. You can open a new bug report for it if you'd like; no need to discuss it here. Please also see https://github.com/openzfs/openzfs/pull/300

There will be a new bug, not from me but from a friend of mine, who has very nice graphs from a monitoring system that show the huge impact of this bug on performance.

MFCed as base r315072 and base r315073.
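For anyone verifying the fix after updating past r314274 (or the MFCs r315072/r315073), the checks used in this thread boil down to watching the cache vdev capacity and the bad-checksum counter under sustained load; a sketch using the pool name from this report:

# cache vdev should stay at or below 100% CAP
zpool list -v zstor | grep -A 2 cache
# l2_cksum_bad should stay flat (ideally 0) instead of climbing
sysctl kstat.zfs.misc.arcstats.l2_cksum_bad kstat.zfs.misc.arcstats.l2_asize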
I have two ZFS pools: one with a single device (zroot) and one raidz with 5 devices (zstor). zstor has an L2ARC of 185GB.

% zpool list -v
NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
zroot 39.9G 13.7G 26.2G - 50% 34% 1.00x ONLINE -
gpt/root 39.9G 13.7G 26.2G - 50% 34%
zstor 13.6T 8.55T 5.08T - 22% 62% 1.00x ONLINE -
raidz1 13.6T 8.55T 5.08T - 22% 62%
ada1 - - - - - -
ada2 - - - - - -
ada3 - - - - - -
ada4 - - - - - -
ada5 - - - - - -
cache - - - - - -
gpt/l2arc 185G 185G 148M - 0% 99%
%

Both pools have compression enabled (lz4), but the compression ratio on zstor is negligible (it contains mostly media files, like music, films and RAW digital photos). zroot has a compression ratio of about 1.4:

% zfs get compressratio zroot zstor
NAME PROPERTY VALUE SOURCE
zroot compressratio 1.43x -
zstor compressratio 1.00x -
%

My system has 16GB of physical memory. After an upgrade to a post-r307266 system (10-STABLE), different system tools started to show an unrealistically large ARC. An upgrade to 11-STABLE (after r307265) doesn't help either. It looks like this:

(1) top output:
Mem: 195M Active, 4678M Inact, 9656M Wired, 1373M Free
ARC: 75G Total, 247M MFU, 6135M MRU, 1190K Anon, 77M Header, 556M Other
Swap: 8192M Total, 8192M Free

(2) zfs-stats -A output:
ARC Size: 515.61% 74.92 GiB
Target Size: (Adaptive) 100.00% 14.53 GiB
Min Size (Hard Limit): 12.50% 1.82 GiB
Max Size (High Water): 8:1 14.53 GiB

This starts after some uptime with disk activity (like buildworld & backup) and can only be reset to the "normal" state with a reboot. Sometimes L2ARC starts to grow up to 400% of its size, too. L2ARC shows a lot of checksum errors as it grows larger than its physical size.
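To see the inflated ARC accounting directly, rather than through top or zfs-stats, the raw arcstats kstats can be compared against the configured limit; a sketch (these are standard arcstats counters, though not all of them are quoted verbatim above):

# reported ARC size vs. adaptive target and hard maximum (bytes)
sysctl kstat.zfs.misc.arcstats.size \
       kstat.zfs.misc.arcstats.c \
       kstat.zfs.misc.arcstats.c_max \
       vfs.zfs.arc_max
# on this 16GB machine, arcstats.size should never be anywhere near 75G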