Bug 250323 - OpenZFS L2ARC size shrinking over time
Summary: OpenZFS L2ARC size shrinking over time
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: Any Any
Importance: --- Affects Many People
Assignee: freebsd-fs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-10-13 19:00 UTC by Stefan Eßer
Modified: 2020-10-13 23:43 UTC
2 users

See Also:


Description Stefan Eßer freebsd_committer 2020-10-13 19:00:58 UTC
After the switch-over to OpenZFS in -CURRENT I have observed that the L2ARC shrinks over time (at a rate of 10 to 20 MB/day).

My system uses a 1 TB NVMe SSD partitioned into 64 GB of swap (generally
unused) and 256 GB of ZFS cache (L2ARC) to speed up reads from a 3*6 TB
raidz1.

(L2ARC persistence is great, especially on a system that is used for
development and rebooted into the latest -CURRENT about once per week!)


After a reboot, the full cache partition is available, but even when
measured only minutes apart, the reported size of the L2ARC is declining.

The following two values were obtained just 120 seconds apart:

kstat.zfs.misc.arcstats.l2_asize: 273831726080

kstat.zfs.misc.arcstats.l2_asize: 273831644160

[After finishing the text of this mail I checked the value of that
variable one more time, perhaps 10 minutes later:

kstat.zfs.misc.arcstats.l2_asize: 273827724288

That corresponds to roughly 4 MB lost over about 10 minutes.]
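The deltas between the quoted samples can be checked with a few lines of arithmetic (a sketch; the 120-second and roughly-10-minute intervals are the ones stated above):

```python
# kstat.zfs.misc.arcstats.l2_asize samples quoted above
first  = 273831726080  # initial sample
second = 273831644160  # 120 seconds later
third  = 273827724288  # roughly 10 minutes after the first sample

delta_120s = first - second           # bytes lost in 120 s
delta_10m  = first - third            # bytes lost in ~10 minutes

print(delta_120s)                     # 81920
print(round(delta_10m / 2**20, 1))    # 3.8 (MiB), the "some 4 MB" above
# Extrapolating the 120 s sample to a full day:
print(delta_120s / 120 * 86400 / 2**20)  # ~56 MiB/day
```

Note that the per-day rate extrapolated from this short, idle interval is only a lower bound; the longer-running figures later in this report shrink much faster.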


I first noticed this effect with the zfs-stats command, which was
recently updated to support the OpenZFS sysctl variables (committed to
ports a few days ago).

After 6 days of uptime the output of "uptime; zfs-stats -L" is:


12:31PM  up 6 days, 7 mins, 2 users, load averages: 2.67, 0.73, 0.36

------------------------------------------------------------------------
ZFS Subsystem Report                            Mon Oct 12 12:31:57 2020
------------------------------------------------------------------------

L2 ARC Summary: (HEALTHY)
    Low Memory Aborts:				87
    Free on Write:				5.81	k
    R/W Clashes:				0
    Bad Checksums:				0
    IO Errors:					0

L2 ARC Size: (Adaptive)				160.09	GiB
    Decompressed Data Size:			373.03	GiB
    Compression Factor:				2.33
    Header Size:			0.12%	458.14	MiB

L2 ARC Evicts:
    Lock Retries:				61
    Upon Reading:				9

L2 ARC Breakdown:				12.66	m
    Hit Ratio:				75.69%	9.58	m
    Miss Ratio:				24.31%	3.08	m
    Feeds:					495.76	k

L2 ARC Writes:
    Writes Sent:         		100.00%	48.94	k

------------------------------------------------------------------------


After a reboot and with the persistent L2ARC now reported to be
available again (and filled with the expected amount of data):


13:24  up 28 mins, 2 users, load averages: 0,09 0,05 0,01

------------------------------------------------------------------------
ZFS Subsystem Report                            Mon Oct 12 13:24:56 2020
------------------------------------------------------------------------

L2 ARC Summary: (HEALTHY)
    Low Memory Aborts:				0
    Free on Write:				0
    R/W Clashes:				0
    Bad Checksums:				0
    IO Errors:					0

L2 ARC Size: (Adaptive)				255.03	GiB
    Decompressed Data Size:			633.21	GiB
    Compression Factor:				2.48
    Header Size:			0.14%	901.41	MiB

L2 ARC Breakdown:				9.11	k
    Hit Ratio:				35.44%	3.23	k
    Miss Ratio:				64.56%	5.88	k
    Feeds:					1.57	k

L2 ARC Writes:
    Writes Sent:				100.00%	205

------------------------------------------------------------------------


After 32 hours of uptime with only light load: 255 GiB -> 242 GiB

 8:58PM  up 1 day,  8:01, 1 user, load averages: 4.03, 1.02, 0.41

------------------------------------------------------------------------
ZFS Subsystem Report				Tue Oct 13 20:58:02 2020
------------------------------------------------------------------------

L2 ARC Summary: (HEALTHY)
	Low Memory Aborts:			7
	Free on Write:				78
	R/W Clashes:				0
	Bad Checksums:				0
	IO Errors:				0

L2 ARC Size: (Adaptive)				242.33	GiB
	Decompressed Data Size:			603.73	GiB
	Compression Factor:			2.49
	Header Size:			0.13%	828.25	MiB

L2 ARC Evicts:
	Lock Retries:				4
	Upon Reading:				0

L2 ARC Breakdown:				1.34	m
	Hit Ratio:			74.50%	1.00	m
	Miss Ratio:			25.50%	342.50	k
	Feeds:					110.24	k

L2 ARC Writes:
	Writes Sent:			100.00%	11.82	k

------------------------------------------------------------------------
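The longer-term rate implied by the two zfs-stats snapshots above (255.03 GiB shortly after reboot, 242.33 GiB after about 32 hours) works out as follows (a sketch using the GiB figures from the reports):

```python
start_gib = 255.03   # L2 ARC Size shortly after reboot
end_gib   = 242.33   # after ~32 hours of light load
hours     = 32

loss_gib = start_gib - end_gib        # ~12.7 GiB lost
per_day  = loss_gib / hours * 24      # ~9.5 GiB/day
print(round(loss_gib, 1), round(per_day, 1))
```

That is orders of magnitude faster than the idle minute-scale rate sampled earlier, which suggests the decline scales with L2ARC activity.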

I do not know whether this is just an accounting effect or whether the
usable size of the L2ARC is actually shrinking, but since the expected
amount of data is present in the L2ARC after a reboot, I assume it is
just an accounting error.

Even so, I think this should be investigated and fixed: the value might
wrap around after several weeks of uptime, and if it is used for more
than display purposes, that could lead to unexpected behavior.
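To record the decline over longer periods, one could poll the kstat periodically along these lines (a hypothetical monitoring sketch, not part of the report; the sampling helper shells out to sysctl(8) on FreeBSD, while the rate computation is testable on its own):

```python
import subprocess

def read_l2_asize():
    """Current L2ARC allocated size in bytes, via sysctl(8) (FreeBSD)."""
    out = subprocess.check_output(
        ["sysctl", "-n", "kstat.zfs.misc.arcstats.l2_asize"], text=True)
    return int(out.strip())

def shrink_rate_per_day(samples):
    """samples: [(unix_time, l2_asize_bytes), ...] in time order.
    Returns bytes lost per day between the first and last sample
    (negative if the L2ARC grew)."""
    (t0, s0), (t1, s1) = samples[0], samples[-1]
    return (s0 - s1) / (t1 - t0) * 86400

# Applied to the two samples quoted earlier (120 s apart):
demo = [(0, 273831726080), (120, 273831644160)]
print(shrink_rate_per_day(demo) / 2**20)   # ~56 MiB/day

# On the affected machine one would instead append
# (time.time(), read_l2_asize()) pairs at regular intervals.
```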
Comment 1 Stefan Eßer freebsd_committer 2020-10-13 19:03:24 UTC
I'm adding Matt Macy and Allan Jude as the two committers currently most active with regard to the migration to OpenZFS.