Bug 189865 - [zfs] [patch] zfs_dirty_data_max{,_max,_percent} not exported as loader tunables
Summary: [zfs] [patch] zfs_dirty_data_max{,_max,_percent} not exported as loader tunables
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 9.2-STABLE
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: Steven Hartland
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-05-16 17:10 UTC by Nathaniel Filardo
Modified: 2019-01-21 09:40 UTC
CC List: 1 user

See Also:


Attachments
file.diff (1.22 KB, patch)
2014-05-16 17:10 UTC, Nathaniel Filardo
no flags

Description Nathaniel Filardo 2014-05-16 17:10:01 UTC
On machines with gobs of RAM, zfs_dirty_data_max is computed as zfs_dirty_data_max_percent (i.e. 10) percent of memory, capped at zfs_dirty_data_max_max (i.e. 4G). That much dirty data may take tens of minutes to sync to disk, especially if it is spread out across the disk (for example, syncing 4G of scattered dirty data at a few MB/s of random-write throughput takes well over ten minutes), during which time any program that attempts to write to disk eventually stalls, because there are at most three txgs pending.  It would be nice to limit transactions to something smaller so that these latency spikes go away.

Fix: I think something like the attached (compiled but untested) patch would do the trick?
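For illustration, the behaviour being asked for is a boot-time cap along these lines (tunable name as eventually added in r266497 below; the value is an example only):

    # Hypothetical /boot/loader.conf entry: cap dirty data at 1 GiB
    # rather than the autocalculated 10%-of-RAM (up to 4G) value.
    vfs.zfs.dirty_data_max="1073741824"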
Comment 1 Mark Linimon freebsd_committer freebsd_triage 2014-05-20 04:51:56 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-fs

Over to maintainer(s).
Comment 2 Steven Hartland 2014-05-20 09:38:04 UTC
Exposing zfs_dirty_data_max directly doesn't make sense, as it's
a calculated value: zfs_dirty_data_max_percent percent of
all memory, capped at zfs_dirty_data_max_max.

Given this, it could be limited by setting zfs_dirty_data_max_max.

The following could be exposed:
zfs_dirty_data_max_max
zfs_dirty_data_max_percent
zfs_dirty_data_sync
zfs_delay_min_dirty_percent
zfs_delay_scale

Would that fulfil your requirement?

    Regards
    Steve
Comment 3 Nathaniel Filardo 2014-05-21 05:09:20 UTC
On Tue, May 20, 2014 at 09:38:04AM +0100, Steven Hartland wrote:
> Exposing zfs_dirty_data_max directly doesn't make sense, as it's
> a calculated value: zfs_dirty_data_max_percent percent of
> all memory, capped at zfs_dirty_data_max_max.


I'm pretty sure the intention is that it is computed that way only if not
set already -- there's a comparison for == 0 before the value is assigned.
See arc_init():
http://fxr.watson.org/fxr/source/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c?im=excerpts#L4150
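The relevant logic, paraphrased from the linked arc_init() (a sketch of
the linked source, not a verbatim copy), is:

    /* Autocalculate only if no tunable preset a value. */
    if (zfs_dirty_data_max == 0) {
            zfs_dirty_data_max = ptob(physmem) *
                zfs_dirty_data_max_percent / 100;
            zfs_dirty_data_max = MIN(zfs_dirty_data_max,
                zfs_dirty_data_max_max);
    }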

And in the Old World, zfs.write_limit_override was similarly exported to
override the computed zfs.write_limit_max.  That said, no,
I don't really care too much about this particular tunable; I was just
mirroring Solaris.
 
> Given this, it could be limited by setting zfs_dirty_data_max_max.


Sure.
 
> The following could be exposed:
> zfs_dirty_data_max_max
> zfs_dirty_data_max_percent
> zfs_dirty_data_sync
> zfs_delay_min_dirty_percent
> zfs_delay_scale
> 
> Would that fulfil your requirement?


It's overkill for my case, but yes, those should probably all be exposed.

Cheers,
--nwf;
Comment 4 Steven Hartland freebsd_committer freebsd_triage 2014-05-21 14:12:54 UTC
Responsible Changed
From-To: freebsd-fs->smh

I'll take it.
Comment 5 dfilter service freebsd_committer freebsd_triage 2014-05-21 14:36:08 UTC
Author: smh
Date: Wed May 21 13:36:04 2014
New Revision: 266497
URL: http://svnweb.freebsd.org/changeset/base/266497

Log:
  Added sysctls / tunables for ZFS dirty data tuning
  
  Added the following new sysctls / tunables:
  * vfs.zfs.dirty_data_max
  * vfs.zfs.dirty_data_max_max
  * vfs.zfs.dirty_data_max_percent
  * vfs.zfs.dirty_data_sync
  * vfs.zfs.delay_min_dirty_percent
  * vfs.zfs.delay_scale
  
  PR:		kern/189865
  MFC after:	2 weeks
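To illustrate the flag semantics in the diff below (values are examples
only): oids marked CTLFLAG_RWTUN or CTLFLAG_RW may be changed at runtime
via sysctl(8), while CTLFLAG_RDTUN oids are read-only after boot and are
set from /boot/loader.conf:

    # Runtime-adjustable (CTLFLAG_RWTUN / CTLFLAG_RW):
    sysctl vfs.zfs.dirty_data_max=1073741824
    sysctl vfs.zfs.delay_min_dirty_percent=70

    # Boot-time only (CTLFLAG_RDTUN), in /boot/loader.conf:
    vfs.zfs.dirty_data_max_max="4294967296"
    vfs.zfs.dirty_data_max_percent="10"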

Modified:
  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c

Modified: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c
==============================================================================
--- head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c	Wed May 21 11:53:15 2014	(r266496)
+++ head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c	Wed May 21 13:36:04 2014	(r266497)
@@ -46,6 +46,11 @@
 #include <sys/zil_impl.h>
 #include <sys/dsl_userhold.h>
 
+#ifdef __FreeBSD__
+#include <sys/sysctl.h>
+#include <sys/types.h>
+#endif
+
 /*
  * ZFS Write Throttle
  * ------------------
@@ -130,33 +135,83 @@ uint64_t zfs_delay_scale = 1000 * 1000 *
  * per-pool basis using zfs.conf.
  */
 
+#ifdef __FreeBSD__
+
+extern int zfs_vdev_async_write_active_max_dirty_percent;
 
 SYSCTL_DECL(_vfs_zfs);
-#if 0
-TUNABLE_INT("vfs.zfs.no_write_throttle", &zfs_no_write_throttle);
-SYSCTL_INT(_vfs_zfs, OID_AUTO, no_write_throttle, CTLFLAG_RDTUN,
-    &zfs_no_write_throttle, 0, "");
-TUNABLE_INT("vfs.zfs.write_limit_shift", &zfs_write_limit_shift);
-SYSCTL_INT(_vfs_zfs, OID_AUTO, write_limit_shift, CTLFLAG_RDTUN,
-    &zfs_write_limit_shift, 0, "2^N of physical memory");
-SYSCTL_DECL(_vfs_zfs_txg);
-TUNABLE_INT("vfs.zfs.txg.synctime_ms", &zfs_txg_synctime_ms);
-SYSCTL_INT(_vfs_zfs_txg, OID_AUTO, synctime_ms, CTLFLAG_RDTUN,
-    &zfs_txg_synctime_ms, 0, "Target milliseconds to sync a txg");
-
-TUNABLE_QUAD("vfs.zfs.write_limit_min", &zfs_write_limit_min);
-SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, write_limit_min, CTLFLAG_RDTUN,
-    &zfs_write_limit_min, 0, "Minimum write limit");
-TUNABLE_QUAD("vfs.zfs.write_limit_max", &zfs_write_limit_max);
-SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, write_limit_max, CTLFLAG_RDTUN,
-    &zfs_write_limit_max, 0, "Maximum data payload per txg");
-TUNABLE_QUAD("vfs.zfs.write_limit_inflated", &zfs_write_limit_inflated);
-SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, write_limit_inflated, CTLFLAG_RDTUN,
-    &zfs_write_limit_inflated, 0, "Maximum size of the dynamic write limit");
-TUNABLE_QUAD("vfs.zfs.write_limit_override", &zfs_write_limit_override);
-SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, write_limit_override, CTLFLAG_RDTUN,
-    &zfs_write_limit_override, 0,
-    "Force a txg if dirty buffers exceed this value (bytes)");
+
+TUNABLE_QUAD("vfs.zfs.dirty_data_max", &zfs_dirty_data_max);
+SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, dirty_data_max, CTLFLAG_RWTUN,
+    &zfs_dirty_data_max, 0,
+    "The dirty space limit in bytes after which new writes are halted until "
+    "space becomes available");
+
+TUNABLE_QUAD("vfs.zfs.dirty_data_max_max", &zfs_dirty_data_max_max);
+SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, dirty_data_max_max, CTLFLAG_RDTUN,
+    &zfs_dirty_data_max_max, 0,
+    "The absolute cap on diry_data_max when auto calculating");
+
+TUNABLE_INT("vfs.zfs.dirty_data_max_percent", &zfs_dirty_data_max_percent);
+SYSCTL_INT(_vfs_zfs, OID_AUTO, dirty_data_max_percent, CTLFLAG_RDTUN,
+    &zfs_dirty_data_max_percent, 0,
+    "The percent of physical memory used to auto calculate dirty_data_max");
+
+TUNABLE_QUAD("vfs.zfs.dirty_data_sync", &zfs_dirty_data_sync);
+SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, dirty_data_sync, CTLFLAG_RWTUN,
+    &zfs_dirty_data_sync, 0,
+    "Force at txg if the number of dirty buffer bytes exceed this value");
+
+static int sysctl_zfs_delay_min_dirty_percent(SYSCTL_HANDLER_ARGS);
+/* No zfs_delay_min_dirty_percent tunable due to limit requirements */
+SYSCTL_PROC(_vfs_zfs, OID_AUTO, delay_min_dirty_percent,
+    CTLTYPE_INT | CTLFLAG_MPSAFE | CTLFLAG_RW, 0, sizeof(int),
+    sysctl_zfs_delay_min_dirty_percent, "I",
+    "The limit of outstanding dirty data before transations are delayed");
+
+static int sysctl_zfs_delay_scale(SYSCTL_HANDLER_ARGS);
+/* No zfs_delay_scale tunable due to limit requirements */
+SYSCTL_PROC(_vfs_zfs, OID_AUTO, delay_scale,
+    CTLTYPE_U64 | CTLFLAG_MPSAFE | CTLFLAG_RW, 0, sizeof(uint64_t),
+    sysctl_zfs_delay_scale, "QU",
+    "Controls how quickly the delay approaches infinity");
+
+static int
+sysctl_zfs_delay_min_dirty_percent(SYSCTL_HANDLER_ARGS)
+{
+	int val, err;
+
+	val = zfs_delay_min_dirty_percent;
+	err = sysctl_handle_int(oidp, &val, 0, req);
+	if (err != 0 || req->newptr == NULL)
+		return (err);
+
+	if (val < zfs_vdev_async_write_active_max_dirty_percent)
+		return (EINVAL);
+
+	zfs_delay_min_dirty_percent = val;
+
+	return (0);
+}
+
+static int
+sysctl_zfs_delay_scale(SYSCTL_HANDLER_ARGS)
+{
+	uint64_t val;
+	int err;
+
+	val = zfs_delay_scale;
+	err = sysctl_handle_64(oidp, &val, 0, req);
+	if (err != 0 || req->newptr == NULL)
+		return (err);
+
+	if (val > UINT64_MAX / zfs_dirty_data_max)
+		return (EINVAL);
+
+	zfs_delay_scale = val;
+
+	return (0);
+}
 #endif
 
 hrtime_t zfs_throttle_delay = MSEC2NSEC(10);
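Note that the two SYSCTL_PROC handlers above enforce cross-variable
limits, which is why no boot-time tunables are registered for them: the
valid range depends on values that are not known until the related
variables are set. A hypothetical session (assuming the stock default of
60 for the vdev async write limit):

    # Rejected with EINVAL: below
    # vfs.zfs.vdev.async_write_active_max_dirty_percent (60 by default)
    sysctl vfs.zfs.delay_min_dirty_percent=50

    # Rejected with EINVAL: zfs_delay_scale * zfs_dirty_data_max would
    # overflow a uint64_t
    sysctl vfs.zfs.delay_scale=18446744073709551615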
Comment 6 Eitan Adler freebsd_committer freebsd_triage 2018-05-28 19:48:24 UTC
batch change:

For bugs that match the following:
- Status is "In Progress"
AND
- Untouched since 2018-01-01
AND
- Affects Base System OR Documentation

DO:

Reset to open status.


Note:
I did a quick pass, but if you are getting this email it might be worthwhile to double-check whether this bug ought to be closed.
Comment 7 Oleksandr Tymoshenko freebsd_committer freebsd_triage 2019-01-21 09:40:42 UTC
There is a commit referencing this PR, but it's still not closed and has been inactive for some time. Closing the PR as fixed, but feel free to re-open it if the issue hasn't been completely resolved.

Thanks