Bug 222377

Summary: ZFS ABD wasteful...
Product: Base System Reporter: Sean Chittenden <seanc>
Component: kern Assignee: freebsd-fs (Nobody) <fs>
Status: Closed FIXED    
Severity: Affects Only Me CC: avg, borjam, emaste, imp
Priority: ---    
Version: CURRENT   
Hardware: Any   
OS: Any   
URL: https://reviews.freebsd.org/D12396
Attachments:
Description Flags
ABD chunk size 1K none

Description Sean Chittenden freebsd_committer freebsd_triage 2017-09-16 23:20:42 UTC
Created attachment 186440
ABD chunk size 1K

ABD was developed by Delphix and spent most of its life in testing with a chunk size of 1024 bytes.  Late in the review process over at OpenZFS the value was bumped to 4K on the grounds that "page-aligned memory allocations perform better"[1].  That is frequently true, but under production workloads we have found the 4K value to be very wasteful.  The original 1K value should be restored and pushed upstream into OpenZFS.  This change has already been made in Joyent's branch of Illumos[2].

[1] https://github.com/openzfs/openzfs/pull/326#issuecomment-291223116

[2] https://github.com/joyent/illumos-joyent/commit/09443b7960ae0f0a3ddcf56d1879e006c2790316
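
To make the "wasteful" claim concrete, here is a rough sketch of the rounding overhead for a single buffer. It is a simplified model, not code from abd.c: it assumes every buffer is stored as a run of equal-sized chunks, and the function names and sample buffer sizes are made up for illustration. It models only per-buffer rounding and says nothing about allocator behavior or fragmentation.

```python
def abd_chunks_needed(buf_size: int, chunk_size: int) -> int:
    """Fixed-size chunks needed to hold buf_size bytes (rounded up)."""
    return -(-buf_size // chunk_size)  # ceiling division on integers


def abd_wasted_bytes(buf_size: int, chunk_size: int) -> int:
    """Bytes allocated beyond buf_size because of chunk rounding."""
    return abd_chunks_needed(buf_size, chunk_size) * chunk_size - buf_size


if __name__ == "__main__":
    # Hypothetical buffer sizes; small compressed blocks are the bad case.
    for buf_size in (512, 1536, 2560, 4608, 131072):
        for chunk_size in (1024, 4096):
            wasted = abd_wasted_bytes(buf_size, chunk_size)
            print(f"buf={buf_size:>6} chunk={chunk_size:>4} wasted={wasted:>4}")
```

In this model a 512-byte buffer wastes 3584 bytes with 4K chunks but only 512 bytes with 1K chunks, while a 128K buffer wastes nothing either way.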
Comment 1 commit-hook freebsd_committer freebsd_triage 2017-09-20 08:37:33 UTC
A commit references this bug:

Author: avg
Date: Wed Sep 20 08:36:31 UTC 2017
New revision: 323797
URL: https://svnweb.freebsd.org/changeset/base/323797

Log:
  add vfs_zfs.abd_chunk_size tunable

  It is reported that the default value of 4KB results in a substantial
  memory use overhead (at least, on some configurations).  Using 1KB seems
  to reduce the overhead significantly.

  PR:		222377
  Reported by:	Sean Chittenden <sean@chittenden.org>
  MFC after:	1 week

Changes:
  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/abd.c
Comment 2 Borja Marcos 2017-09-20 11:07:55 UTC
Just for the record, I've applied the patch to 11-STABLE from today and it's
working fine.
Comment 3 Borja Marcos 2017-09-20 11:09:58 UTC
(In reply to Borja Marcos from comment #2)
Please disregard, I added the comment to the wrong bug :/
Comment 4 commit-hook freebsd_committer freebsd_triage 2017-10-01 14:59:30 UTC
A commit references this bug:

Author: avg
Date: Sun Oct  1 14:58:44 UTC 2017
New revision: 324160
URL: https://svnweb.freebsd.org/changeset/base/324160

Log:
  MFC r323797: add vfs_zfs.abd_chunk_size tunable

  It is reported that the default value of 4KB results in a substantial
  memory use overhead (at least, on some configurations).  Using 1KB seems
  to reduce the overhead significantly.

  PR:		222377

Changes:
_U  stable/11/
  stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/abd.c
Comment 5 Eitan Adler freebsd_committer freebsd_triage 2018-05-29 04:22:34 UTC
One reason to make 4K the default rather than 1K is for storage interfaces like nvme which would have to read-then-write for <4k blocks.
Comment 6 Warner Losh freebsd_committer freebsd_triage 2018-05-29 04:45:25 UTC
It depends on the nvme drive. nvme works better at 4k because the dma interfaces are optimized for 4k pages.

Both nvme and ssds really want to write 4k these days (they are either 4k native, or 4k native 512 emulated). And there's newer drives that want this to be 16k (4k emulated 16k native).
Comment 7 Andriy Gapon freebsd_committer freebsd_triage 2018-05-29 07:42:21 UTC
(In reply to Eitan Adler from comment #5)
The comment is totally irrelevant.
This has nothing to do with storage block sizes.
Comment 8 Sean Chittenden freebsd_committer freebsd_triage 2018-05-29 18:42:03 UTC
And to follow up with my own comment from https://reviews.freebsd.org/D12396#293994

It's worth pointing out that we eventually abandoned this change and went back to a 4K ABD chunk size. So while 1K may have been more memory efficient in the short term, it ended up being suboptimal in the long run. I'm abandoning this issue and hoping no one repeats our lessons.

https://github.com/joyent/illumos-joyent/commit/2bd6ca8c3cc70becca5f99bbf557b70ac3dfdaf7
https://smartos.org/bugview/OS-6387
https://smartos.org/bugview/OS-6363

Allocator-native chunk sizes will probably always be better with regard to long-term memory fragmentation.