On machines with gobs of RAM, zfs_dirty_data_max is calculated as zfs_dirty_data_max_percent (i.e. 10) percent of memory, capped at zfs_dirty_data_max_max (i.e. 4G). That much dirty data may take tens of minutes to sync to disk, especially if it is spread out across the disk, and during that time any program that attempts to write to disk eventually stalls, because at most three txgs can be pending. It would be nice to limit transactions to something smaller so that these latency spikes go away.

Fix: I think something like the following (compiled but untested) patch would do the trick?
Responsible Changed From-To: freebsd-bugs->freebsd-fs Over to maintainer(s).
Exposing zfs_dirty_data_max directly doesn't make sense as it's a calculated value, based off zfs_dirty_data_max_percent% of all memory and capped at zfs_dirty_data_max_max. Given this it could be limited via setting zfs_dirty_data_max_max.

The following could be exposed:
zfs_dirty_data_max_max
zfs_dirty_data_max_percent
zfs_dirty_data_sync
zfs_delay_min_dirty_percent
zfs_delay_scale

Would that fulfil your requirement?

Regards
Steve
On Tue, May 20, 2014 at 09:38:04AM +0100, Steven Hartland wrote:
> Exposing zfs_dirty_data_max directly doesn't make sense as it's
> a calculated value based off zfs_dirty_data_max_percent% of
> all memory and capped at zfs_dirty_data_max_max.

I'm pretty sure the intention is that it is computed that way only if not set already -- there's a comparison for == 0 before the value is assigned. See arc_init():

http://fxr.watson.org/fxr/source/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c?im=excerpts#L4150

And in the Old World, zfs.write_limit_override was similarly exported to override the similar computation of zfs.write_limit_max.

That said, no, I don't really care too much about this particular tunable; I was just mirroring Solaris.

> Given this it could be limited via setting zfs_dirty_data_max_max.

Sure.

> The following could be exposed:
> zfs_dirty_data_max_max
> zfs_dirty_data_max_percent
> zfs_dirty_data_sync
> zfs_delay_min_dirty_percent
> zfs_delay_scale
>
> Would that fulfil your requirement?

It's overkill for my case, but yes, those should probably all be exposed.

Cheers,
--nwf;
Responsible Changed From-To: freebsd-fs->smh I'll take it.
Author: smh
Date: Wed May 21 13:36:04 2014
New Revision: 266497
URL: http://svnweb.freebsd.org/changeset/base/266497

Log:
  Added sysctls / tunables for ZFS dirty data tuning

  Added the following new sysctls / tunables:
  * vfs.zfs.dirty_data_max
  * vfs.zfs.dirty_data_max_max
  * vfs.zfs.dirty_data_max_percent
  * vfs.zfs.dirty_data_sync
  * vfs.zfs.delay_min_dirty_percent
  * vfs.zfs.delay_scale

  PR:		kern/189865
  MFC after:	2 weeks

Modified:
  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c

Modified: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c
==============================================================================
--- head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c	Wed May 21 11:53:15 2014	(r266496)
+++ head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c	Wed May 21 13:36:04 2014	(r266497)
@@ -46,6 +46,11 @@
 #include <sys/zil_impl.h>
 #include <sys/dsl_userhold.h>
 
+#ifdef __FreeBSD__
+#include <sys/sysctl.h>
+#include <sys/types.h>
+#endif
+
 /*
  * ZFS Write Throttle
  * ------------------
@@ -130,33 +135,83 @@ uint64_t zfs_delay_scale = 1000 * 1000 *
  * per-pool basis using zfs.conf.
  */
 
+#ifdef __FreeBSD__
+
+extern int zfs_vdev_async_write_active_max_dirty_percent;
 SYSCTL_DECL(_vfs_zfs);
-#if 0
-TUNABLE_INT("vfs.zfs.no_write_throttle", &zfs_no_write_throttle);
-SYSCTL_INT(_vfs_zfs, OID_AUTO, no_write_throttle, CTLFLAG_RDTUN,
-    &zfs_no_write_throttle, 0, "");
-TUNABLE_INT("vfs.zfs.write_limit_shift", &zfs_write_limit_shift);
-SYSCTL_INT(_vfs_zfs, OID_AUTO, write_limit_shift, CTLFLAG_RDTUN,
-    &zfs_write_limit_shift, 0, "2^N of physical memory");
-SYSCTL_DECL(_vfs_zfs_txg);
-TUNABLE_INT("vfs.zfs.txg.synctime_ms", &zfs_txg_synctime_ms);
-SYSCTL_INT(_vfs_zfs_txg, OID_AUTO, synctime_ms, CTLFLAG_RDTUN,
-    &zfs_txg_synctime_ms, 0, "Target milliseconds to sync a txg");
-
-TUNABLE_QUAD("vfs.zfs.write_limit_min", &zfs_write_limit_min);
-SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, write_limit_min, CTLFLAG_RDTUN,
-    &zfs_write_limit_min, 0, "Minimum write limit");
-TUNABLE_QUAD("vfs.zfs.write_limit_max", &zfs_write_limit_max);
-SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, write_limit_max, CTLFLAG_RDTUN,
-    &zfs_write_limit_max, 0, "Maximum data payload per txg");
-TUNABLE_QUAD("vfs.zfs.write_limit_inflated", &zfs_write_limit_inflated);
-SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, write_limit_inflated, CTLFLAG_RDTUN,
-    &zfs_write_limit_inflated, 0, "Maximum size of the dynamic write limit");
-TUNABLE_QUAD("vfs.zfs.write_limit_override", &zfs_write_limit_override);
-SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, write_limit_override, CTLFLAG_RDTUN,
-    &zfs_write_limit_override, 0,
-    "Force a txg if dirty buffers exceed this value (bytes)");
+
+TUNABLE_QUAD("vfs.zfs.dirty_data_max", &zfs_dirty_data_max);
+SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, dirty_data_max, CTLFLAG_RWTUN,
+    &zfs_dirty_data_max, 0,
+    "The dirty space limit in bytes after which new writes are halted until "
+    "space becomes available");
+
+TUNABLE_QUAD("vfs.zfs.dirty_data_max_max", &zfs_dirty_data_max_max);
+SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, dirty_data_max_max, CTLFLAG_RDTUN,
+    &zfs_dirty_data_max_max, 0,
+    "The absolute cap on dirty_data_max when auto calculating");
+
+TUNABLE_INT("vfs.zfs.dirty_data_max_percent", &zfs_dirty_data_max_percent);
+SYSCTL_INT(_vfs_zfs, OID_AUTO, dirty_data_max_percent, CTLFLAG_RDTUN,
+    &zfs_dirty_data_max_percent, 0,
+    "The percent of physical memory used to auto calculate dirty_data_max");
+
+TUNABLE_QUAD("vfs.zfs.dirty_data_sync", &zfs_dirty_data_sync);
+SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, dirty_data_sync, CTLFLAG_RWTUN,
+    &zfs_dirty_data_sync, 0,
+    "Force a txg if the number of dirty buffer bytes exceeds this value");
+
+static int sysctl_zfs_delay_min_dirty_percent(SYSCTL_HANDLER_ARGS);
+/* No zfs_delay_min_dirty_percent tunable due to limit requirements */
+SYSCTL_PROC(_vfs_zfs, OID_AUTO, delay_min_dirty_percent,
+    CTLTYPE_INT | CTLFLAG_MPSAFE | CTLFLAG_RW, 0, sizeof(int),
+    sysctl_zfs_delay_min_dirty_percent, "I",
+    "The limit of outstanding dirty data before transactions are delayed");
+
+static int sysctl_zfs_delay_scale(SYSCTL_HANDLER_ARGS);
+/* No zfs_delay_scale tunable due to limit requirements */
+SYSCTL_PROC(_vfs_zfs, OID_AUTO, delay_scale,
+    CTLTYPE_U64 | CTLFLAG_MPSAFE | CTLFLAG_RW, 0, sizeof(uint64_t),
+    sysctl_zfs_delay_scale, "QU",
+    "Controls how quickly the delay approaches infinity");
+
+static int
+sysctl_zfs_delay_min_dirty_percent(SYSCTL_HANDLER_ARGS)
+{
+	int val, err;
+
+	val = zfs_delay_min_dirty_percent;
+	err = sysctl_handle_int(oidp, &val, 0, req);
+	if (err != 0 || req->newptr == NULL)
+		return (err);
+
+	if (val < zfs_vdev_async_write_active_max_dirty_percent)
+		return (EINVAL);
+
+	zfs_delay_min_dirty_percent = val;
+
+	return (0);
+}
+
+static int
+sysctl_zfs_delay_scale(SYSCTL_HANDLER_ARGS)
+{
+	uint64_t val;
+	int err;
+
+	val = zfs_delay_scale;
+	err = sysctl_handle_64(oidp, &val, 0, req);
+	if (err != 0 || req->newptr == NULL)
+		return (err);
+
+	if (val > UINT64_MAX / zfs_dirty_data_max)
+		return (EINVAL);
+
+	zfs_delay_scale = val;
+
+	return (0);
+}
 #endif
 
 hrtime_t zfs_throttle_delay = MSEC2NSEC(10);
batch change:

For bugs that match the following
- Status Is In progress AND
- Untouched since 2018-01-01 AND
- Affects Base System OR Documentation

DO:
Reset to open status.

Note: I did a quick pass, but if you are getting this email it might be worthwhile to double check whether this bug ought to be closed.
There is a commit referencing this PR, but it's still not closed and has been inactive for some time. Closing the PR as fixed, but feel free to re-open it if the issue hasn't been completely resolved. Thanks