Bug 209396 - ZFS primarycache attribute affects secondary cache as well
Status: Closed Works As Intended
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 10.3-RELEASE
Hardware: Any
OS: Any
Importance: --- Affects Only Me
Assignee: freebsd-fs (Nobody)
 
Reported: 2016-05-09 14:25 UTC by noah.bergbauer
Modified: 2022-06-06 20:08 UTC
CC List: 1 user

Description noah.bergbauer 2016-05-09 14:25:54 UTC
# zpool create testpool gpt/test0 cache gpt/test1
# zpool list -v testpool
NAME          SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
testpool     1,98G   248K  1,98G         -     0%     0%  1.00x  ONLINE  -
  gpt/test0  1,98G   248K  1,98G         -     0%     0%
cache            -      -      -         -      -      -
  gpt/test1  2,00G    36K  2,00G         -     0%     0%
# zfs create -o compression=off testpool/testset
# zfs set mountpoint=/testset testpool/testset
# dd if=/dev/zero of=/testset/test.bin bs=1M count=1K
1024+0 records in
1024+0 records out
1073741824 bytes transferred in 10.510661 secs (102157401 bytes/sec)
# zpool list -v testpool
NAME          SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
testpool     1,98G  1011M  1021M         -    33%    49%  1.00x  ONLINE  -
  gpt/test0  1,98G  1011M  1021M         -    33%    49%
cache            -      -      -         -      -      -
  gpt/test1  2,00G  1010M  1,01G         -     0%    49%


So far so good: The data was written both to the actual pool and to the cache device. But what if we want to cache this huge file only in L2ARC and not in memory?
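
To see where both cache properties stand before changing anything, a quick check (just a sketch; output omitted) would be:

# zfs get primarycache,secondarycache testpool/testset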


# zfs set primarycache=metadata testpool/testset
# zfs get secondarycache testpool/testset
NAME              PROPERTY        VALUE           SOURCE
testpool/testset  secondarycache  all             default
# dd if=/testset/test.bin of=/dev/null bs=1M
1024+0 records in
1024+0 records out
1073741824 bytes transferred in 0.182155 secs (5894663608 bytes/sec)


Still working as expected: This read was (obviously) serviced straight from RAM because setting primarycache didn't immediately drop the cache. However, touching the data with this read should cause it to be evicted from the ARC.
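
One way to double-check that the data buffers really left the ARC after this read would be to compare the ARC size before and after (a sketch; this assumes FreeBSD's kstat.zfs.misc.arcstats sysctls, whose exact counter names can differ between releases):

# sysctl kstat.zfs.misc.arcstats.size    # should drop by roughly the file size once the buffers are evicted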

# zpool list -v testpool
NAME          SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
testpool     1,98G  1,00G  1006M         -    33%    50%  1.00x  ONLINE  -
  gpt/test0  1,98G  1,00G  1006M         -    33%    50%
cache            -      -      -         -      -      -
  gpt/test1  2,00G   556K  2,00G         -     0%     0%
# dd if=/testset/test.bin of=/dev/null bs=1M
1024+0 records in
1024+0 records out
1073741824 bytes transferred in 8.518173 secs (126053066 bytes/sec)


The speed shows that this read came from disk, which is expected because RAM caching is now disabled. What's not expected is that the data was removed from the cache device as well. Regardless of the workload (sequential or random), ZFS will no longer use the L2ARC for this dataset even though secondarycache is set to all. There is *some* I/O still going on, so perhaps it is still caching some metadata, since that is what primarycache is set to.
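
One way to confirm whether the cache device is touched at all during these reads would be to watch per-vdev I/O in a second terminal while dd runs (a sketch):

# zpool iostat -v testpool 1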



Note that I modified some sysctls to speed up cache warming for this test:
vfs.zfs.l2arc_norw: 0
vfs.zfs.l2arc_feed_again: 1
vfs.zfs.l2arc_noprefetch: 0
vfs.zfs.l2arc_feed_secs: 0
vfs.zfs.l2arc_write_max: 1000000000000
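
(For reference, a sketch of how such tunables can be set at runtime; depending on the FreeBSD release, some of them may instead need to go into /boot/loader.conf:)

# sysctl vfs.zfs.l2arc_norw=0
# sysctl vfs.zfs.l2arc_feed_again=1
# sysctl vfs.zfs.l2arc_noprefetch=0
# sysctl vfs.zfs.l2arc_feed_secs=0
# sysctl vfs.zfs.l2arc_write_max=1000000000000
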
Comment 1 Bryan Drewery freebsd_committer freebsd_triage 2022-06-06 20:08:29 UTC
I didn't realize this until today myself, but this is working as intended. The L2ARC only takes metadata/data from the ARC before it is evicted, so with primarycache=metadata, data can never make it to the L2ARC because it was never in the ARC in the first place.

https://www.mail-archive.com/zfs-discuss@opensolaris.org/msg45601.html

I can't find a great source for this (e.g. in a manpage), but the code spells it out:
sys/contrib/openzfs/module/zfs/arc.c

    * 2. The L2ARC attempts to cache data from the ARC before it is evicted.
    * It does this by periodically scanning buffers from the eviction-end of
    * the MFU and MRU ARC lists, copying them to the L2ARC devices if they are
    * not already there. It scans until a headroom of buffers is satisfied,
    * which itself is a buffer for ARC eviction. [...]
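
A quick way to see the consequence on a setup like the reporter's (just a sketch; it assumes the kstat.zfs.misc.arcstats sysctls, and counter names vary between releases):

# sysctl kstat.zfs.misc.arcstats.l2_size      # note the value
# dd if=/testset/test.bin of=/dev/null bs=1M  # with primarycache=metadata on the dataset
# sysctl kstat.zfs.misc.arcstats.l2_size      # stays flat: the data never entered the ARC, so it is never fed to the L2ARC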

It also won't cache prefetched data by default. Apparently primarycache=metadata also effectively disables prefetch (https://github.com/openzfs/zfs/issues/1773).

https://github.com/openzfs/zfs/issues/12028 discusses the issue reported in this bug, and also the fact that this primarycache=metadata, secondarycache=all setup can lead to read amplification.
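
A rough way to observe that amplification (a sketch; exact numbers will vary): run the read in one terminal and watch pool-level reads in another, then compare the bytes dd reports against what the pool actually read from disk.

# zpool iostat testpool 5                      # one terminal
# dd if=/testset/test.bin of=/dev/null bs=1M   # the other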


If I've misunderstood and your report is about the fact that data already cached in the ARC is not sent to the L2ARC, and you think it should be, then a better place to report it is https://github.com/openzfs/zfs anyway. I doubt they would handle that, as it's quite a hack to set primarycache=all, cache a bunch of data, and then change it to metadata and expect that to fill up the L2ARC. Eventually that data would leave the L2ARC as well and you would need to repeat the hack manually.
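
For completeness, that hack would look roughly like this (a sketch, not a recommendation; the effect only lasts until the data ages out of the L2ARC again):

# zfs set primarycache=all testpool/testset
# dd if=/testset/test.bin of=/dev/null bs=1M   # warm the ARC so the feed thread can copy the buffers out
# zfs set primarycache=metadata testpool/testset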