Bug 216178 - ZFS ARC and L2ARC are unrealistically large, maybe after r307265
Summary: ZFS ARC and L2ARC are unrealistically large, maybe after r307265
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 11.0-STABLE
Hardware: Any Any
Importance: --- Affects Some People
Assignee: Andriy Gapon
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-01-17 12:32 UTC by Lev A. Serebryakov
Modified: 2017-03-11 16:05 UTC (History)
6 users

See Also:


Attachments
the local patch (2.66 KB, patch)
2017-02-12 20:12 UTC, Andriy Gapon
no flags Details | Diff
dtrace script (1.02 KB, text/plain)
2017-02-20 16:11 UTC, Andriy Gapon
no flags Details
the patch (2.59 KB, patch)
2017-02-22 23:26 UTC, Andriy Gapon
no flags Details | Diff

Description Lev A. Serebryakov freebsd_committer 2017-01-17 12:32:28 UTC
I have two ZFS pools: one with single device (zroot) and one raidz with 5 devices (zstor). zstor has L2ARC of 185GB.

% zpool list -v
NAME          SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
zroot        39.9G  13.7G  26.2G         -    50%    34%  1.00x  ONLINE  -
  gpt/root   39.9G  13.7G  26.2G         -    50%    34%
zstor        13.6T  8.55T  5.08T         -    22%    62%  1.00x  ONLINE  -
  raidz1     13.6T  8.55T  5.08T         -    22%    62%
    ada1         -      -      -         -      -      -
    ada2         -      -      -         -      -      -
    ada3         -      -      -         -      -      -
    ada4         -      -      -         -      -      -
    ada5         -      -      -         -      -      -
cache            -      -      -         -      -      -
  gpt/l2arc   185G   185G   148M         -     0%    99%
%

Both pools have compression enabled (lz4), but the compression ratio on zstor is negligible (it contains mostly media files, like music, films and RAW digital photos). zroot has a compression ratio of about 1.4:

% zfs get compressratio zroot zstor
NAME   PROPERTY       VALUE  SOURCE
zroot  compressratio  1.43x  -
zstor  compressratio  1.00x  -
%

My system has 16GB of physical memory.

After upgrading to a post-r307266 system (10-STABLE), various system tools started to show an unrealistically large ARC. Upgrading to 11-STABLE (after r307265) doesn't help either. It looks like this:

(1) top output:
Mem: 195M Active, 4678M Inact, 9656M Wired, 1373M Free
ARC: 75G Total, 247M MFU, 6135M MRU, 1190K Anon, 77M Header, 556M Other
Swap: 8192M Total, 8192M Free

(2) zfs-stats -A output:
ARC Size:                               515.61% 74.92   GiB
        Target Size: (Adaptive)         100.00% 14.53   GiB
        Min Size (Hard Limit):          12.50%  1.82    GiB
        Max Size (High Water):          8:1     14.53   GiB


This starts after some uptime with disk activity (like buildworld & backup) and can only be reset to a "normal" state by rebooting.

Sometimes the L2ARC starts to grow up to 400% of its size, too. The L2ARC shows a lot of checksum errors as it grows larger than its physical size.
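For reference, the overaccounting can be checked directly against the ARC kstats; a minimal check from a Bourne shell, assuming the stock sysctl names on FreeBSD 10/11 with the base ZFS module:

# arcstats.size should normally stay at or below arc_max; values several
# times larger indicate the overaccounting described above.
sysctl kstat.zfs.misc.arcstats.size \
       kstat.zfs.misc.arcstats.c \
       vfs.zfs.arc_max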
Comment 1 Lev A. Serebryakov freebsd_committer 2017-01-23 14:27:23 UTC
Maybe related to bug #216364.
Comment 2 Andriy Gapon freebsd_committer 2017-01-30 15:36:29 UTC
(In reply to Lev A. Serebryakov from comment #0)

Lev,

I haven't forgotten about this issue.
Unfortunately, I cannot devote as much time to it as I would like.
I revisited it again and couldn't spot anything new besides what my old patch was supposed to fix:
https://docs.freebsd.org/cgi/getmsg.cgi?fetch=24428+0+archive/2016/freebsd-fs/20161106.freebsd-fs

I know that George Wilson was going to work on this problem.  Perhaps, he has something for you to test or debug.
Comment 3 Andriy Gapon freebsd_committer 2017-02-07 22:01:54 UTC
Lev,

I couldn't reproduce your problem using FreeBSD head and a similar configuration (raidz + L2ARC); I could not trigger the L2ARC accounting problem no matter how much I tried.
Comment 4 Lev A. Serebryakov freebsd_committer 2017-02-08 13:09:05 UTC
I can provide any additional information, including dumping kernel structures with a debugger and such.

Now it shows:

% zpool list -v
NAME          SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
zroot        39.9G  14.6G  25.3G         -    49%    36%  1.00x  ONLINE  -
  gpt/root   39.9G  14.6G  25.3G         -    49%    36%
zstor        13.6T  9.01T  4.62T         -    24%    66%  1.00x  ONLINE  -
  raidz1     13.6T  9.01T  4.62T         -    24%    66%
    ada1         -      -      -         -      -      -
    ada2         -      -      -         -      -      -
    ada3         -      -      -         -      -      -
    ada4         -      -      -         -      -      -
    ada5         -      -      -         -      -      -
cache            -      -      -         -      -      -
  gpt/l2arc   185G  1.05T  16.0E         -     0%   581%
%

which looks completely unrealistic, as it would imply more than 4x compression!
Comment 5 Andriy Gapon freebsd_committer 2017-02-08 13:25:36 UTC
(In reply to Lev A. Serebryakov from comment #4)
It has nothing to do with compression; the numbers are for the physical/actual space on disk.  It's a problem with the space accounting.
I am not sure whether the bad accounting and the checksum errors have the same cause, or whether the checksum errors somehow cause the bad accounting.
Is there a chance that you could use a different disk for the cache device?
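A rough way to cross-check the space accounting described above against the real capacity of the cache device (kstat names are the ones that appear later in this report; the device path is the one from the original description):

sysctl -n kstat.zfs.misc.arcstats.l2_asize   # bytes ZFS believes are allocated on the cache device
sysctl -n kstat.zfs.misc.arcstats.l2_size    # logical bytes it believes are cached
diskinfo -v /dev/gpt/l2arc | grep bytes      # actual capacity of the cache device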
Comment 6 Lev A. Serebryakov freebsd_committer 2017-02-08 13:34:19 UTC
I could use another 750 EVO disk, a whole 120GB one instead of a partition on the 250GB, in a day or two (I'm waiting for an additional HBA right now!).

Maybe the checksum errors are a result of TRIM errors? Right now this SSD doesn't have any quirks in our (FreeBSD) sources, but its "big brothers" (like the 850 EVO and 850 PRO) are marked with "no TRIM with NCQ" (see the discussion about this SSD and its quirks here: https://reviews.freebsd.org/D9478). But ZFS (the zroot pool) on the same device (another partition) doesn't show any errors; I've checked it with "scrub".
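A minimal way to repeat that check (pool name as in this report; the status output should be consulted once the scrub has finished):

zpool scrub zroot
zpool status -v zroot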
Comment 7 Andriy Gapon freebsd_committer 2017-02-08 13:39:59 UTC
(In reply to Lev A. Serebryakov from comment #6)
Hard to tell.  To be honest, the TRIM code for L2ARC is rather useless; I would remove it altogether to exclude its effects.
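One way to exclude TRIM as a factor is the vfs.zfs.trim.enabled tunable; note that this is only a diagnostic sketch, it is a boot-time tunable (a reboot is required), and it disables ZFS TRIM for all pools, not just the L2ARC device:

# /boot/loader.conf
vfs.zfs.trim.enabled=0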
Comment 8 Ben RUBSON 2017-02-09 14:30:16 UTC
I also faced this "16.0E" issue (without compression enabled):
https://www.illumos.org/issues/7410
Comment 9 Andriy Gapon freebsd_committer 2017-02-12 20:12:56 UTC
Created attachment 179917 [details]
the local patch

Just to clarify, I am using head plus the local patch, not a vanilla head.
The patch should be the same as the one I sent you earlier.
Comment 10 Lev A. Serebryakov freebsd_committer 2017-02-19 11:13:08 UTC
(In reply to Andriy Gapon from comment #9)
OK, I've replaced the L2ARC with a whole Samsung 850 EVO 250GB SSD on an "mpr" (LSI-3008) HBA and rebuilt the system from r313940 (stable/11) + this patch. Let's see!
Comment 11 Lev A. Serebryakov freebsd_committer 2017-02-20 11:45:44 UTC
Same result.

11.0-STABLE FreeBSD 11.0-STABLE #8 r313940M: Sun Feb 19 15:16:42 MSK 2017
The attached patch was applied!

% sudo smartctl -A /dev/da5
smartctl 6.5 2016-05-07 r4318 [FreeBSD 11.0-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Samsung based SSDs
Device Model:     Samsung SSD 850 EVO 250GB
Serial Number:    S2R4NB0J115081X
LU WWN Device Id: 5 002538 d41a03ed9
Firmware Version: EMT02B6Q
User Capacity:    250,059,350,016 bytes [250 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4c
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Feb 20 14:41:13 2017 MSK
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
...
% zpool list -v
NAME         SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
zroot       95.5G  18.1G  77.4G         -    16%    18%  1.00x  ONLINE  -
  gpt/root  95.5G  18.1G  77.4G         -    16%    18%
zstor       13.6T  9.15T  4.48T         -    25%    67%  1.00x  ONLINE  -
  raidz1    13.6T  9.15T  4.48T         -    25%    67%
    da1         -      -      -         -      -      -
    da0         -      -      -         -      -      -
    da2         -      -      -         -      -      -
    da3         -      -      -         -      -      -
    da4         -      -      -         -      -      -
cache           -      -      -         -      -      -
  da5        233G   526G  16.0E         -     0%   225%
% sysctl -a | grep l2
kern.features.linuxulator_v4l2: 1
kern.cam.ctl2cam.max_sense: 252
vfs.zfs.l2c_only_size: 0
vfs.zfs.l2arc_norw: 1
vfs.zfs.l2arc_feed_again: 1
vfs.zfs.l2arc_noprefetch: 1
vfs.zfs.l2arc_feed_min_ms: 200
vfs.zfs.l2arc_feed_secs: 1
vfs.zfs.l2arc_headroom: 2
vfs.zfs.l2arc_write_boost: 8388608
vfs.zfs.l2arc_write_max: 8388608
vfs.cache.numfullpathfail2: 0
kstat.zfs.misc.arcstats.l2_write_buffer_list_null_iter: 4522
kstat.zfs.misc.arcstats.l2_write_buffer_list_iter: 567486
kstat.zfs.misc.arcstats.l2_write_buffer_bytes_scanned: 10356335975424
kstat.zfs.misc.arcstats.l2_write_pios: 131233
kstat.zfs.misc.arcstats.l2_write_buffer_iter: 142392
kstat.zfs.misc.arcstats.l2_write_full: 34822
kstat.zfs.misc.arcstats.l2_write_not_cacheable: 6310602
kstat.zfs.misc.arcstats.l2_write_io_in_progress: 376
kstat.zfs.misc.arcstats.l2_write_in_l2: 63312026
kstat.zfs.misc.arcstats.l2_write_spa_mismatch: 3663622
kstat.zfs.misc.arcstats.l2_write_passed_headroom: 255811
kstat.zfs.misc.arcstats.l2_write_trylock_fail: 6901
kstat.zfs.misc.arcstats.l2_padding_needed: 0
kstat.zfs.misc.arcstats.l2_hdr_size: 93854992
kstat.zfs.misc.arcstats.l2_asize: 565424867840
kstat.zfs.misc.arcstats.l2_size: 567082335232
kstat.zfs.misc.arcstats.l2_io_error: 0
kstat.zfs.misc.arcstats.l2_cksum_bad: 430864
kstat.zfs.misc.arcstats.l2_abort_lowmem: 3
kstat.zfs.misc.arcstats.l2_free_on_write: 273
kstat.zfs.misc.arcstats.l2_evict_l1cached: 69699
kstat.zfs.misc.arcstats.l2_evict_reading: 0
kstat.zfs.misc.arcstats.l2_evict_lock_retry: 10
kstat.zfs.misc.arcstats.l2_writes_lock_retry: 4
kstat.zfs.misc.arcstats.l2_writes_error: 0
kstat.zfs.misc.arcstats.l2_writes_done: 131233
kstat.zfs.misc.arcstats.l2_writes_sent: 131233
kstat.zfs.misc.arcstats.l2_write_bytes: 648248758272
kstat.zfs.misc.arcstats.l2_read_bytes: 326535054336
kstat.zfs.misc.arcstats.l2_rw_clash: 0
kstat.zfs.misc.arcstats.l2_feeds: 142392
kstat.zfs.misc.arcstats.l2_misses: 3610592
kstat.zfs.misc.arcstats.l2_hits: 1151244
kstat.zfs.misc.arcstats.evict_l2_skip: 0
kstat.zfs.misc.arcstats.evict_l2_ineligible: 351227303936
kstat.zfs.misc.arcstats.evict_l2_eligible: 21983154688
kstat.zfs.misc.arcstats.evict_l2_cached: 1323198704640
%

kstat.zfs.misc.arcstats.l2_cksum_bad starts to rise right after the L2ARC "overfill"!
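A simple way to watch that correlation over time from a Bourne shell (the 60-second interval is arbitrary):

while :; do
        date
        # l2_asize crossing the device capacity should coincide with
        # l2_cksum_bad starting to climb, per the observation above.
        sysctl kstat.zfs.misc.arcstats.l2_asize \
               kstat.zfs.misc.arcstats.l2_cksum_bad
        sleep 60
done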
Comment 12 Rémi Guyomarch 2017-02-20 15:37:24 UTC
Same thing here, running 10.3-STABLE r313140.
It did NOT happen on r301989.

This is a large virtual NAS, offering both NFSv3 and SMB shares. Cache devices are also virtualized; TRIM isn't running here.


# zpool list -v tank
NAME         SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
tank         288T   174T   114T         -    19%    60%  1.00x  ONLINE  -
  raidz2    48,0T  28,9T  19,0T         -    19%    60%
    da9         -      -      -         -      -      -
    da10        -      -      -         -      -      -
    da11        -      -      -         -      -      -
    da12        -      -      -         -      -      -
    da13        -      -      -         -      -      -
    da14        -      -      -         -      -      -
  raidz2    48,0T  28,9T  19,0T         -    19%    60%
    da15        -      -      -         -      -      -
    da16        -      -      -         -      -      -
    da17        -      -      -         -      -      -
    da18        -      -      -         -      -      -
    da19        -      -      -         -      -      -
    da20        -      -      -         -      -      -
  raidz2    48,0T  28,9T  19,0T         -    19%    60%
    da21        -      -      -         -      -      -
    da22        -      -      -         -      -      -
    da23        -      -      -         -      -      -
    da24        -      -      -         -      -      -
    da25        -      -      -         -      -      -
    da26        -      -      -         -      -      -
  raidz2    48,0T  29,0T  19,0T         -    19%    60%
    da27        -      -      -         -      -      -
    da28        -      -      -         -      -      -
    da29        -      -      -         -      -      -
    da30        -      -      -         -      -      -
    da31        -      -      -         -      -      -
    da32        -      -      -         -      -      -
  raidz2    48,0T  28,9T  19,0T         -    19%    60%
    da33        -      -      -         -      -      -
    da34        -      -      -         -      -      -
    da35        -      -      -         -      -      -
    da36        -      -      -         -      -      -
    da37        -      -      -         -      -      -
    da38        -      -      -         -      -      -
  raidz2    48,0T  28,9T  19,0T         -    19%    60%
    da39        -      -      -         -      -      -
    da40        -      -      -         -      -      -
    da41        -      -      -         -      -      -
    da42        -      -      -         -      -      -
    da43        -      -      -         -      -      -
    da44        -      -      -         -      -      -
log             -      -      -         -      -      -
  mirror    2,98G  2,01M  2,98G         -    20%     0%
    da1         -      -      -         -      -      -
    da2         -      -      -         -      -      -
cache           -      -      -         -      -      -
  da3        256G   764G  16,0E         -     0%   298%
  da4        256G   757G  16,0E         -     0%   295%
  da5        256G   762G  16,0E         -     0%   297%
  da6        256G   747G  16,0E         -     0%   291%
  da7        256G   776G  16,0E         -     0%   303%
  da8        256G   743G  16,0E         -     0%   290%


# zfs-stats -a

------------------------------------------------------------------------
ZFS Subsystem Report                            Mon Feb 20 16:33:13 2017
------------------------------------------------------------------------

System Information:

        Kernel Version:                         1003511 (osreldate)
        Hardware Platform:                      amd64
        Processor Architecture:                 amd64

        ZFS Storage pool Version:               5000
        ZFS Filesystem Version:                 5

FreeBSD 10.3-STABLE #2 r313140M: Fri Feb 3 09:38:12 CET 2017 root
16:33  up 7 days,  8:32, 1 user, load averages: 0,16 0,37 0,55

------------------------------------------------------------------------

System Memory:

        0.00%   5.28    MiB Active,     0.47%   670.72  MiB Inact
        89.61%  125.70  GiB Wired,      0.00%   0 Cache
        9.92%   13.92   GiB Free,       0.00%   4.00    KiB Gap

        Real Installed:                         160.00  GiB
        Real Available:                 89.98%  143.97  GiB
        Real Managed:                   97.44%  140.28  GiB

        Logical Total:                          160.00  GiB
        Logical Used:                   90.89%  145.42  GiB
        Logical Free:                   9.11%   14.58   GiB

Kernel Memory:                                  1.30    GiB
        Data:                           97.94%  1.27    GiB
        Text:                           2.06%   27.29   MiB

Kernel Memory Map:                              140.28  GiB
        Size:                           82.86%  116.24  GiB
        Free:                           17.14%  24.04   GiB

------------------------------------------------------------------------

ARC Summary: (HEALTHY)
        Memory Throttle Count:                  0

ARC Misc:
        Deleted:                                23.76m
        Recycle Misses:                         0
        Mutex Misses:                           36.19k
        Evict Skips:                            6.43k

ARC Size:                               83.09%  115.73  GiB
        Target Size: (Adaptive)         83.11%  115.76  GiB
        Min Size (Hard Limit):          12.50%  17.41   GiB
        Max Size (High Water):          8:1     139.28  GiB

ARC Size Breakdown:
        Recently Used Cache Size:       62.61%  72.48   GiB
        Frequently Used Cache Size:     37.39%  43.28   GiB

ARC Hash Breakdown:
        Elements Max:                           14.41m
        Elements Current:               98.47%  14.19m
        Collisions:                             16.02m
        Chain Max:                              7
        Chains:                                 2.27m

------------------------------------------------------------------------

ARC Efficiency:                                 3.28b
        Cache Hit Ratio:                18.94%  620.69m
        Cache Miss Ratio:               81.06%  2.66b
        Actual Hit Ratio:               5.26%   172.47m

        Data Demand Efficiency:         30.02%  138.53m
        Data Prefetch Efficiency:       82.77%  124.81m

        CACHE HITS BY CACHE LIST:
          Anonymously Used:             71.34%  442.81m
          Most Recently Used:           2.11%   13.07m
          Most Frequently Used:         25.68%  159.41m
          Most Recently Used Ghost:     0.02%   102.06k
          Most Frequently Used Ghost:   0.86%   5.31m

        CACHE HITS BY DATA TYPE:
          Demand Data:                  6.70%   41.58m
          Prefetch Data:                16.64%  103.30m
          Demand Metadata:              3.92%   24.33m
          Prefetch Metadata:            72.74%  451.49m

        CACHE MISSES BY DATA TYPE:
          Demand Data:                  3.65%   96.95m
          Prefetch Data:                0.81%   21.51m
          Demand Metadata:              95.51%  2.54b
          Prefetch Metadata:            0.03%   880.35k

------------------------------------------------------------------------

L2 ARC Summary: (DEGRADED)
        Passed Headroom:                        975.75k
        Tried Lock Failures:                    121.14m
        IO In Progress:                         5
        Low Memory Aborts:                      217
        Free on Write:                          181.66k
        Writes While Full:                      46.56k
        R/W Clashes:                            0
        Bad Checksums:                          3.00m
        IO Errors:                              0
        SPA Mismatch:                           1.97b

L2 ARC Size: (Adaptive)                         6.82    TiB
        Header Size:                    0.01%   973.67  MiB

L2 ARC Evicts:
        Lock Retries:                           120
        Upon Reading:                           0

L2 ARC Breakdown:                               2.66b
        Hit Ratio:                      0.61%   16.16m
        Miss Ratio:                     99.39%  2.64b
        Feeds:                                  688.84k

L2 ARC Buffer:
        Bytes Scanned:                          5.87    PiB
        Buffer Iterations:                      688.84k
        List Iterations:                        2.75m
        NULL List Iterations:                   2.97k

L2 ARC Writes:
        Writes Sent:                    100.00% 365.54k

------------------------------------------------------------------------

File-Level Prefetch: (HEALTHY)

DMU Efficiency:                                 16.05b
        Hit Ratio:                      2.21%   354.01m
        Miss Ratio:                     97.79%  15.69b

        Colinear:                               0
          Hit Ratio:                    100.00% 0
          Miss Ratio:                   100.00% 0

        Stride:                                 0
          Hit Ratio:                    100.00% 0
          Miss Ratio:                   100.00% 0

DMU Misc:
        Reclaim:                                0
          Successes:                    100.00% 0
          Failures:                     100.00% 0

        Streams:                                0
          +Resets:                      100.00% 0
          -Resets:                      100.00% 0
          Bogus:                                0

------------------------------------------------------------------------

VDEV Cache Summary:                             5.56m
        Hit Ratio:                      22.19%  1.23m
        Miss Ratio:                     65.26%  3.63m
        Delegations:                    12.55%  696.99k

------------------------------------------------------------------------

ZFS Tunables (sysctl):
        kern.maxusers                           9550
        vm.kmem_size                            150625865728
        vm.kmem_size_scale                      1
        vm.kmem_size_min                        0
        vm.kmem_size_max                        1319413950874
        vfs.zfs.trim.max_interval               1
        vfs.zfs.trim.timeout                    30
        vfs.zfs.trim.txg_delay                  32
        vfs.zfs.trim.enabled                    0
        vfs.zfs.vol.unmap_enabled               1
        vfs.zfs.vol.mode                        1
        vfs.zfs.version.zpl                     5
        vfs.zfs.version.spa                     5000
        vfs.zfs.version.acl                     1
        vfs.zfs.version.ioctl                   7
        vfs.zfs.debug                           0
        vfs.zfs.super_owner                     0
        vfs.zfs.sync_pass_rewrite               2
        vfs.zfs.sync_pass_dont_compress         5
        vfs.zfs.sync_pass_deferred_free         2
        vfs.zfs.zio.dva_throttle_enabled        1
        vfs.zfs.zio.exclude_metadata            0
        vfs.zfs.zio.use_uma                     1
        vfs.zfs.cache_flush_disable             0
        vfs.zfs.zil_replay_disable              0
        vfs.zfs.min_auto_ashift                 12
        vfs.zfs.max_auto_ashift                 13
        vfs.zfs.vdev.trim_max_pending           10000
        vfs.zfs.vdev.bio_delete_disable         0
        vfs.zfs.vdev.bio_flush_disable          0
        vfs.zfs.vdev.queue_depth_pct            1000
        vfs.zfs.vdev.write_gap_limit            4096
        vfs.zfs.vdev.read_gap_limit             32768
        vfs.zfs.vdev.aggregation_limit          131072
        vfs.zfs.vdev.trim_max_active            64
        vfs.zfs.vdev.trim_min_active            1
        vfs.zfs.vdev.scrub_max_active           60
        vfs.zfs.vdev.scrub_min_active           1
        vfs.zfs.vdev.async_write_max_active     100
        vfs.zfs.vdev.async_write_min_active     10
        vfs.zfs.vdev.async_read_max_active      60
        vfs.zfs.vdev.async_read_min_active      10
        vfs.zfs.vdev.sync_write_max_active      200
        vfs.zfs.vdev.sync_write_min_active      100
        vfs.zfs.vdev.sync_read_max_active       100
        vfs.zfs.vdev.sync_read_min_active       100
        vfs.zfs.vdev.max_active                 1000
        vfs.zfs.vdev.async_write_active_max_dirty_percent 60
        vfs.zfs.vdev.async_write_active_min_dirty_percent 30
        vfs.zfs.vdev.mirror.non_rotating_seek_inc 1
        vfs.zfs.vdev.mirror.non_rotating_inc    0
        vfs.zfs.vdev.mirror.rotating_seek_offset 1048576
        vfs.zfs.vdev.mirror.rotating_seek_inc   5
        vfs.zfs.vdev.mirror.rotating_inc        0
        vfs.zfs.vdev.trim_on_init               1
        vfs.zfs.vdev.cache.bshift               16
        vfs.zfs.vdev.cache.size                 4194304
        vfs.zfs.vdev.cache.max                  65536
        vfs.zfs.vdev.metaslabs_per_vdev         200
        vfs.zfs.txg.timeout                     4
        vfs.zfs.space_map_blksz                 4096
        vfs.zfs.spa_min_slop                    134217728
        vfs.zfs.spa_slop_shift                  5
        vfs.zfs.spa_asize_inflation             24
        vfs.zfs.deadman_enabled                 0
        vfs.zfs.deadman_checktime_ms            5000
        vfs.zfs.deadman_synctime_ms             1000000
        vfs.zfs.debug_flags                     0
        vfs.zfs.recover                         0
        vfs.zfs.spa_load_verify_data            1
        vfs.zfs.spa_load_verify_metadata        1
        vfs.zfs.spa_load_verify_maxinflight     10000
        vfs.zfs.ccw_retry_interval              300
        vfs.zfs.check_hostid                    1
        vfs.zfs.mg_fragmentation_threshold      85
        vfs.zfs.mg_noalloc_threshold            0
        vfs.zfs.condense_pct                    200
        vfs.zfs.metaslab.bias_enabled           1
        vfs.zfs.metaslab.lba_weighting_enabled  1
        vfs.zfs.metaslab.fragmentation_factor_enabled 1
        vfs.zfs.metaslab.preload_enabled        1
        vfs.zfs.metaslab.preload_limit          3
        vfs.zfs.metaslab.unload_delay           8
        vfs.zfs.metaslab.load_pct               50
        vfs.zfs.metaslab.min_alloc_size         33554432
        vfs.zfs.metaslab.df_free_pct            4
        vfs.zfs.metaslab.df_alloc_threshold     131072
        vfs.zfs.metaslab.debug_unload           0
        vfs.zfs.metaslab.debug_load             0
        vfs.zfs.metaslab.fragmentation_threshold 70
        vfs.zfs.metaslab.gang_bang              16777217
        vfs.zfs.free_bpobj_enabled              1
        vfs.zfs.free_max_blocks                 -1
        vfs.zfs.no_scrub_prefetch               0
        vfs.zfs.no_scrub_io                     0
        vfs.zfs.resilver_min_time_ms            3000
        vfs.zfs.free_min_time_ms                1000
        vfs.zfs.scan_min_time_ms                1000
        vfs.zfs.scan_idle                       50
        vfs.zfs.scrub_delay                     4
        vfs.zfs.resilver_delay                  2
        vfs.zfs.top_maxinflight                 32
        vfs.zfs.zfetch.array_rd_sz              1048576
        vfs.zfs.zfetch.max_distance             8388608
        vfs.zfs.zfetch.min_sec_reap             2
        vfs.zfs.zfetch.max_streams              64
        vfs.zfs.prefetch_disable                0
        vfs.zfs.delay_scale                     500000
        vfs.zfs.delay_min_dirty_percent         60
        vfs.zfs.dirty_data_sync                 67108864
        vfs.zfs.dirty_data_max_percent          10
        vfs.zfs.dirty_data_max_max              4294967296
        vfs.zfs.dirty_data_max                  4294967296
        vfs.zfs.max_recordsize                  1048576
        vfs.zfs.send_holes_without_birth_time   1
        vfs.zfs.mdcomp_disable                  0
        vfs.zfs.nopwrite_enabled                1
        vfs.zfs.dedup.prefetch                  1
        vfs.zfs.l2c_only_size                   0
        vfs.zfs.mfu_ghost_data_esize            62297529344
        vfs.zfs.mfu_ghost_metadata_esize        0
        vfs.zfs.mfu_ghost_size                  62297529344
        vfs.zfs.mfu_data_esize                  66706433536
        vfs.zfs.mfu_metadata_esize              2435194880
        vfs.zfs.mfu_size                        69802579968
        vfs.zfs.mru_ghost_data_esize            60685495808
        vfs.zfs.mru_ghost_metadata_esize        0
        vfs.zfs.mru_ghost_size                  60685495808
        vfs.zfs.mru_data_esize                  49709753856
        vfs.zfs.mru_metadata_esize              1613580288
        vfs.zfs.mru_size                        51551468032
        vfs.zfs.anon_data_esize                 0
        vfs.zfs.anon_metadata_esize             0
        vfs.zfs.anon_size                       6747136
        vfs.zfs.l2arc_norw                      0
        vfs.zfs.l2arc_feed_again                1
        vfs.zfs.l2arc_noprefetch                0
        vfs.zfs.l2arc_feed_min_ms               200
        vfs.zfs.l2arc_feed_secs                 1
        vfs.zfs.l2arc_headroom                  32
        vfs.zfs.l2arc_write_boost               268435456
        vfs.zfs.l2arc_write_max                 67108864
        vfs.zfs.arc_meta_limit                  77309411328
        vfs.zfs.arc_free_target                 254980
        vfs.zfs.compressed_arc_enabled          1
        vfs.zfs.arc_shrink_shift                7
        vfs.zfs.arc_average_blocksize           8192
        vfs.zfs.arc_min                         18694015488
        vfs.zfs.arc_max                         149552123904

------------------------------------------------------------------------
Comment 13 Andriy Gapon freebsd_committer 2017-02-20 16:11:44 UTC
Created attachment 180167 [details]
dtrace script

(In reply to Lev A. Serebryakov from comment #11)

Lev, assuming that you are still observing the condition, could you please run the attached DTrace script for a few minutes and attach its output to this bug?

Also, is your kernel + zfs compiled with INVARIANTS?
If not, it might be worthwhile enabling that option to increase chances of catching the bug.
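For reference, the standard kernel configuration lines for that option are sketched below (INVARIANTS requires INVARIANT_SUPPORT); the kernel and the zfs module have to be rebuilt and reinstalled for it to take effect:

options         INVARIANTS
options         INVARIANT_SUPPORT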
Comment 14 Lev A. Serebryakov freebsd_committer 2017-02-22 09:55:01 UTC
(In reply to Andriy Gapon from comment #13)
No INVARIANTS, but I'll add them today. And I'll run the script and come back with the results.
Comment 15 Lev A. Serebryakov freebsd_committer 2017-02-22 10:00:09 UTC
Looks like dtrace -s doesn't like

this->tail = (arc_buf_hdr_t *)((char*)this->tail_ - this->offset);

and

this->head = (arc_buf_hdr_t *)((char*)this->head_ - this->offset);

Should I add some include files?
Comment 16 Andriy Gapon freebsd_committer 2017-02-22 10:28:15 UTC
You mean it doesn't like the types?
Try writing them as `arc_buf_hdr_t (note the leading backtick).
Comment 17 Lev A. Serebryakov freebsd_committer 2017-02-22 10:42:19 UTC
Error is:

dtrace: failed to compile script l2arc.d: line 13: syntax error near ")"

If I comment out line 13, the same error appears on line 15. The parentheses look balanced to me!

Backtick doesn't help.
Comment 18 Andriy Gapon freebsd_committer 2017-02-22 11:57:25 UTC
Please try to write them as zfs.ko`arc_buf_hdr_t or kernel`arc_buf_hdr_t.
Comment 19 Lev A. Serebryakov freebsd_committer 2017-02-22 12:06:14 UTC
Same error, both with zfs.ko` and kernel`
Comment 20 Andriy Gapon freebsd_committer 2017-02-22 12:14:10 UTC
Can you see arc_buf_hdr_t in the output of 'ctfdump -t /boot/kernel/zfs.ko'?
How about arc_buf_hdr?

Could you try to use (struct zfs.ko`arc_buf_hdr *) instead of (arc_buf_hdr_t *)?
I assume you use ZFS as a module?
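The checks asked for above boil down to something like the following (the probe name is the one the attached script tries to use, and it may simply not exist if the function was inlined):

ctfdump -t /boot/kernel/zfs.ko | grep -i arc_buf_hdr   # is the type visible via CTF?
dtrace -ln 'fbt:zfs:l2arc_write_buffers:entry'         # does the fbt probe exist?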
Comment 21 Lev A. Serebryakov freebsd_committer 2017-02-22 12:31:48 UTC
Yes, ZFS is used as module.

% kldstat
Id Refs Address            Size     Name
 1   50 0xffffffff80200000 d7afc0   kernel
 2    1 0xffffffff80f7c000 300088   zfs.ko
 3   11 0xffffffff8127d000 ab00     opensolaris.ko
 4    1 0xffffffff81411000 3318b    linux.ko
 5    3 0xffffffff81445000 2b9e     linux_common.ko
 6    1 0xffffffff81448000 2e845    linux64.ko
 7    1 0xffffffff81477000 4192     linprocfs.ko
 8    1 0xffffffff8147c000 357      dtraceall.ko
 9    9 0xffffffff8147d000 389c6    dtrace.ko
10    1 0xffffffff814b6000 623      dtmalloc.ko
11    1 0xffffffff814b7000 18ba     dtnfscl.ko
12    1 0xffffffff814b9000 1dcb     fbt.ko
13    1 0xffffffff814bb000 531c1    fasttrap.ko
14    1 0xffffffff8150f000 b9f      sdt.ko
15    1 0xffffffff81510000 6dc4     systrace.ko
16    1 0xffffffff81517000 6d24     systrace_freebsd32.ko
17    1 0xffffffff8151e000 fb7      profile.ko
% ctfdump -t /boot/kernel/zfs.ko
/boot/kernel/zfs.ko does not contain .SUNW_ctf data
% cat /etc/src.conf
BATCH_DELETE_OLD_FILES=yes
WITHOUT_TESTS=yes
%

"struct zfs.ko`arc_buf_hdr" gives antoher error:

dtrace: failed to compile script l2arc.d: line 1: probe description fbt::l2arc_write_buffers:entry does not match any probes

% sudo dtrace -l | grep l2arc
27809        fbt               zfs            l2arc_do_free_on_write entry
27810        fbt               zfs                       l2arc_evict entry
27811        fbt               zfs                       l2arc_evict return
27812        fbt               zfs                 l2arc_feed_thread entry
27813        fbt               zfs                   l2arc_read_done entry
27814        fbt               zfs                  l2arc_write_done entry
29273        fbt               zfs                        l2arc_init entry
29731        fbt               zfs                        l2arc_stop entry
30261        fbt               zfs                 l2arc_remove_vdev entry
31154        fbt               zfs                l2arc_vdev_present entry
31155        fbt               zfs                l2arc_vdev_present return
31189        fbt               zfs                        l2arc_fini entry
32113        fbt               zfs                       l2arc_start entry
32114        fbt               zfs                       l2arc_start return
32334        fbt               zfs                    l2arc_add_vdev entry
34552        sdt               zfs                              none l2arc-hit
34553        sdt               zfs                              none l2arc-read
34554        sdt               zfs                              none l2arc-miss
34565        sdt               zfs                              none l2arc-evict
34566        sdt               zfs                              none l2arc-write
34567        sdt               zfs                              none l2arc-iodone
%

Should it be "l2arc_write_done"? But there is no "l2arc_write_done:return"!
Comment 22 Andriy Gapon freebsd_committer 2017-02-22 13:09:23 UTC
(In reply to Lev A. Serebryakov from comment #21)
> Should it be "l2arc_write_done"?

No... the function probably got inlined.
Sorry then, I can't help you with this DTrace script.
Comment 23 Lev A. Serebryakov freebsd_committer 2017-02-22 13:11:49 UTC
I could rebuild the module with debug options and without optimization…
Comment 24 Lev A. Serebryakov freebsd_committer 2017-02-22 22:02:21 UTC
OK, DTrace is not an option for now, but INVARIANTS gives me a panic very quickly:

  Panic String: solaris assert: write_psize <= target_sz (0x803000 <= 0x800000), file: /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c, line: 7140

I have the core dump and all that stuff, of course, and can provide additional info. But for now I'm turning off L2ARC, as I need a stable system.
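For completeness, taking the cache device out of the pool (the step taken here to keep the system stable) is a single command; the device name is the one from the listing in comment 11:

zpool remove zstor da5   # detach the cache vdev; pool data is unaffected
zpool status zstor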
Comment 25 Andriy Gapon freebsd_committer 2017-02-22 22:42:06 UTC
(In reply to Lev A. Serebryakov from comment #24)
Could you please send me a copy of your arc.c file exactly as it is now in your source tree?
Comment 26 Lev A. Serebryakov freebsd_committer 2017-02-22 22:48:19 UTC
Sent by e-mail.
Comment 27 Andriy Gapon freebsd_committer 2017-02-22 23:26:40 UTC
Created attachment 180235 [details]
the patch

Re-uploading the patch to try to fix DOS newlines.
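In case the downloaded attachment still carries CR/LF line endings, one way to normalize it before applying (file names are placeholders, and the patch strip level depends on how the paths are recorded in the diff):

tr -d '\r' < the-patch.diff > the-patch-unix.diff
cd /usr/src && patch -p0 < the-patch-unix.diff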
Comment 28 Andriy Gapon freebsd_committer 2017-02-24 07:38:02 UTC
Lev, could you please follow up on what you see with the latest patch?
Thank you!
Comment 29 Lev A. Serebryakov freebsd_committer 2017-02-25 00:14:04 UTC
After 2 days with the properly applied patch, it seems to work!
I have had 100% fill of the L2ARC for about 36 hours already, with no overfill and no checksum errors (!!!).

I'm very sorry about the mis-applied patch which led to such a long investigation.
Comment 30 commit-hook freebsd_committer 2017-02-25 17:04:47 UTC
A commit references this bug:

Author: avg
Date: Sat Feb 25 17:03:49 UTC 2017
New revision: 314274
URL: https://svnweb.freebsd.org/changeset/base/314274

Log:
  l2arc: try to fix write size calculation broken by Compressed ARC commit

  While there, make a change to not evict a first buffer outside the
  requested eviction range.

  To do:
  - give more consistent names to the size variables
  - upstream to OpenZFS

  PR:		216178
  Reported by:	lev
  Tested by:	lev
  MFC after:	2 weeks

Changes:
  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
Comment 31 Andriy Gapon freebsd_committer 2017-02-26 14:18:55 UTC
(In reply to Lev A. Serebryakov from comment #29)
No worries.  Thank you for your persistence with testing!
Comment 32 Lev A. Serebryakov freebsd_committer 2017-03-07 10:49:27 UTC
Now the L2ARC is OK, but the ARC is 19G on a system with 16G of RAM (and swap is empty).
It looks like the ARC has the same problem. And no, my data is not compressible, and a lot of programs (like a Java instance with a 2G heap!) are running and not swapped out, so some of that memory is taken by things other than the ARC.
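A quick sanity check of that observation, assuming the stock sysctl names:

sysctl -n hw.physmem                      # physical memory, in bytes
sysctl -n kstat.zfs.misc.arcstats.size    # current ARC size, in bytes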
Comment 33 Andriy Gapon freebsd_committer 2017-03-07 10:52:39 UTC
(In reply to Lev A. Serebryakov from comment #32)

Lev, it's a completely different problem.
You can open a new bug report for it if you'd like; no need to discuss it here.
Please also see this https://github.com/openzfs/openzfs/pull/300
Comment 34 Lev A. Serebryakov freebsd_committer 2017-03-07 11:12:17 UTC
There will be a new bug report, not from me, but from a friend of mine, who has very nice graphs from a monitoring system showing the huge impact of this bug on performance.
Comment 35 Andriy Gapon freebsd_committer 2017-03-11 16:04:29 UTC
MFCed as base r315072 and base r315073.