Created attachment 186440: ABD chunk size 1K

ABD was developed by Delphix and spent most of its life in testing with a value of 1024. Late in the review process at OpenZFS, the value was bumped to 4K on the grounds that "page-aligned memory allocations perform better"[1]. That is frequently true; however, under production workloads we have found it to be very wasteful. The original 1K value should be restored and pushed upstream into OpenZFS. This change has already been made in Joyent's branch of Illumos[2].

[1] https://github.com/openzfs/openzfs/pull/326#issuecomment-291223116
[2] https://github.com/joyent/illumos-joyent/commit/09443b7960ae0f0a3ddcf56d1879e006c2790316
A commit references this bug:

Author: avg
Date: Wed Sep 20 08:36:31 UTC 2017
New revision: 323797
URL: https://svnweb.freebsd.org/changeset/base/323797

Log:
  add vfs_zfs.abd_chunk_size tunable

  It is reported that the default value of 4KB results in a substantial
  memory use overhead (at least, on some configurations). Using 1KB seems
  to reduce the overhead significantly.

  PR: 222377
  Reported by: Sean Chittenden <sean@chittenden.org>
  MFC after: 1 week

Changes:
  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/abd.c
Just for the record, I've applied the patch to 11-STABLE from today and it's working fine.
(In reply to Borja Marcos from comment #2) Please disregard, I added the comment to the wrong bug :/
A commit references this bug:

Author: avg
Date: Sun Oct 1 14:58:44 UTC 2017
New revision: 324160
URL: https://svnweb.freebsd.org/changeset/base/324160

Log:
  MFC r323797: add vfs_zfs.abd_chunk_size tunable

  It is reported that the default value of 4KB results in a substantial
  memory use overhead (at least, on some configurations). Using 1KB seems
  to reduce the overhead significantly.

  PR: 222377

Changes:
_U  stable/11/
  stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/abd.c
One reason to make 4K the default rather than 1K is storage interfaces such as NVMe, which would have to read-then-write for blocks smaller than 4K.
It depends on the NVMe drive. NVMe works better at 4K because the DMA interfaces are optimized for 4K pages. Both NVMe drives and SSDs really want to write 4K these days (they are either 4K native, or 4K native with 512-byte emulation). And there are newer drives that want this to be 16K (16K native with 4K emulation).
(In reply to Eitan Adler from comment #5) The comment is totally irrelevant. This has nothing to do with storage block sizes.
To follow up on my own comment at https://reviews.freebsd.org/D12396#293994: it's worth pointing out that we eventually abandoned this change and went back to a 4K ABD chunk size. So while 1K may have been more memory efficient in the short term, it ended up being suboptimal in the long run. I'm abandoning this issue in the hope that no one has to relearn our lessons.

https://github.com/joyent/illumos-joyent/commit/2bd6ca8c3cc70becca5f99bbf557b70ac3dfdaf7
https://smartos.org/bugview/OS-6387
https://smartos.org/bugview/OS-6363

Allocator-native chunk sizes will probably always be better with regard to long-term memory fragmentation.