# zpool create testpool gpt/test0 cache gpt/test1
# zpool list -v testpool
NAME         SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
testpool    1,98G   248K  1,98G         -     0%     0%  1.00x  ONLINE  -
  gpt/test0 1,98G   248K  1,98G         -     0%     0%
cache           -      -      -         -      -      -
  gpt/test1 2,00G    36K  2,00G         -     0%     0%
# zfs create -o compression=off testpool/testset
# zfs set mountpoint=/testset testpool/testset
# dd if=/dev/zero of=/testset/test.bin bs=1M count=1K
1024+0 records in
1024+0 records out
1073741824 bytes transferred in 10.510661 secs (102157401 bytes/sec)
# zpool list -v testpool
NAME         SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
testpool    1,98G  1011M  1021M         -    33%    49%  1.00x  ONLINE  -
  gpt/test0 1,98G  1011M  1021M         -    33%    49%
cache           -      -      -         -      -      -
  gpt/test1 2,00G  1010M  1,01G         -     0%    49%

So far so good: the data was written both to the actual pool and to the cache device. But what if we want to cache this huge file only in L2ARC and not in memory?

# zfs set primarycache=metadata testpool/testset
# zfs get secondarycache testpool/testset
NAME              PROPERTY        VALUE  SOURCE
testpool/testset  secondarycache  all    default
# dd if=/testset/test.bin of=/dev/null bs=1M
1024+0 records in
1024+0 records out
1073741824 bytes transferred in 0.182155 secs (5894663608 bytes/sec)

Still working as expected: this read was (obviously) serviced straight from RAM, because setting primarycache does not immediately drop already-cached data. However, touching the data with this read should cause it to be evicted from the ARC.

# zpool list -v testpool
NAME         SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
testpool    1,98G  1,00G  1006M         -    33%    50%  1.00x  ONLINE  -
  gpt/test0 1,98G  1,00G  1006M         -    33%    50%
cache           -      -      -         -      -      -
  gpt/test1 2,00G   556K  2,00G         -     0%     0%
# dd if=/testset/test.bin of=/dev/null bs=1M
1024+0 records in
1024+0 records out
1073741824 bytes transferred in 8.518173 secs (126053066 bytes/sec)

The speed (~120 MB/s instead of several GB/s) shows that this read came from disk, which is expected because RAM caching is now disabled for this dataset.
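To watch the cache device fill (or drain) without eyeballing the whole table, the cache vdev's ALLOC column can be pulled out of the `zpool list -v` output with a little awk. This is just a convenience sketch, fed here with a copy of the transcript above; in practice you would pipe in live `zpool list -v testpool` output:

```shell
# Sample `zpool list -v` output copied from the transcript above
# (first columns only; real output has more).
zpool_list_output='NAME        SIZE  ALLOC   FREE
testpool   1,98G  1011M  1021M
  gpt/test0 1,98G 1011M  1021M
cache          -      -      -
  gpt/test1 2,00G 1010M  1,01G'

# Print the ALLOC column (field 3) for every vdev listed below the
# "cache" separator row, i.e. for the cache devices only.
echo "$zpool_list_output" | awk '
  $1 == "cache" { incache = 1; next }  # rows after this are cache vdevs
  incache       { print $3 }           # field 3 is ALLOC
'
# prints "1010M"
```

With a live pool this fits naturally in a loop, e.g. `while :; do zpool list -v testpool | awk '...'; sleep 1; done`, to observe L2ARC warm-up over time.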
What's not expected is that the data was removed from the cache device as well. Regardless of the workload (sequential or random), ZFS will no longer use L2ARC for this dataset, even though secondarycache is set to all. There is *some* I/O still going on, so it is presumably still caching metadata, as that is what primarycache is set to.

Note that I modified some sysctls to speed up cache warming for this test:

vfs.zfs.l2arc_norw: 0
vfs.zfs.l2arc_feed_again: 1
vfs.zfs.l2arc_noprefetch: 0
vfs.zfs.l2arc_feed_secs: 0
vfs.zfs.l2arc_write_max: 1000000000000
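For reference, the same tunables can be persisted in /etc/sysctl.conf on FreeBSD. The fragment below is a sketch with the values used in the test above; the comments describe the tunables as I understand them, and these settings deliberately remove L2ARC feed throttling, so they are only sensible for testing, not production:

```
# /etc/sysctl.conf fragment (FreeBSD): L2ARC warm-up tunables from the test
vfs.zfs.l2arc_norw=0                   # allow reads from L2ARC while writing to it
vfs.zfs.l2arc_feed_again=1             # "turbo" warm-up: feed more aggressively
vfs.zfs.l2arc_noprefetch=0             # also let prefetched buffers into L2ARC
vfs.zfs.l2arc_feed_secs=0              # no pause between feed-thread cycles
vfs.zfs.l2arc_write_max=1000000000000  # effectively uncap bytes written per cycle
```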
I didn't realize this until today myself, but this is working as intended. The L2ARC only takes metadata/data from the ARC shortly before it is evicted; with primarycache=metadata, file data can therefore never make it to L2ARC, because it was never in the ARC in the first place. See https://www.mail-archive.com/zfs-discuss@opensolaris.org/msg45601.html

I can't find a great source for this in a manpage, but the code spells it out in sys/contrib/openzfs/module/zfs/arc.c:

 * 2. The L2ARC attempts to cache data from the ARC before it is evicted.
 *    It does this by periodically scanning buffers from the eviction-end of
 *    the MFU and MRU ARC lists, copying them to the L2ARC devices if they are
 *    not already there. It scans until a headroom of buffers is satisfied,
 *    which itself is a buffer for ARC eviction.
[...]

It also won't cache prefetched data by default, and primarycache=metadata apparently also effectively disables prefetch (https://github.com/openzfs/zfs/issues/1773). https://github.com/openzfs/zfs/issues/12028 discusses this issue, as well as the fact that the primarycache=metadata, secondarycache=all setup can lead to read amplification.

If I've misunderstood, and your report is about the fact that data already cached in the ARC is not sent to L2ARC (and you think it should be), then a better place to report that is https://github.com/openzfs/zfs anyway. I doubt they would handle it, as it's quite a hack to set primarycache=all, cache a bunch of data, and then change to metadata and expect that to fill up the L2ARC. Eventually that data would leave the L2ARC as well, and you would need to repeat the hack manually.
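To make the feed rule concrete, here is a toy shell model; it is purely illustrative (the function and variable names are made up, not real ZFS code). The point it demonstrates: a buffer is only eligible for the L2ARC if the dataset's primarycache setting let it into the ARC in the first place.

```shell
# Toy model (assumption: heavily simplified) of the arc.c behavior quoted
# above: the L2ARC feed thread only sees buffers on the ARC eviction lists,
# so anything that bypassed the ARC can never be copied to L2ARC.
primarycache=metadata   # the dataset property set in the test above

read_block() {
  kind=$1   # "data" or "metadata"
  # primarycache decides whether this buffer enters the ARC at all
  if [ "$primarycache" = all ] || [ "$primarycache" = "$kind" ]; then
    echo "$kind: cached in ARC, eligible for L2ARC"
  else
    echo "$kind: bypassed ARC, can never reach L2ARC"
  fi
}

read_block metadata
read_block data
# prints:
#   metadata: cached in ARC, eligible for L2ARC
#   data: bypassed ARC, can never reach L2ARC
```

This matches the observation in the report: with primarycache=metadata and secondarycache=all, only metadata traffic keeps trickling into the cache device.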