This isn't so much a problem as it is an RFE. Basically, the SHA256 checksum code within ZFS looks like it could use a helping hand. In my limited testing, Solaris 11 appears to have at least a 20-25% efficiency edge when doing SHA256 checksumming for ZFS. IANAP (I am not a programmer), but it would be extremely nice to have the same (or better) efficiency for ZFS on FreeBSD. I have not done specific testing with Fletcher4, but it also seemed to be slightly better tuned in Solaris 11.
State Changed From-To: open->suspended assign, and note that someone will need to provide a patch.
Responsible Changed From-To: freebsd-bugs->freebsd-fs
Responsible Changed From-To: freebsd-fs->zfs-devel Assign to zfs-devel@
We would need something more to go on than "looks like", I'm afraid. Also, Fletcher4 is the default checksum and achieves ~4 GB/s per core in hashing performance, whereas SHA-256, even with hand-written assembly, manages less than 1/10th of that. So if you're looking for checksum performance, use the default Fletcher4 instead of SHA-256. That said, newer processors do have hardware support that could be used to accelerate SHA-256; details can be found here: http://download.intel.com/embedded/processor/whitepaper/327457.pdf These sorts of core feature enhancements should be discussed and implemented upstream at illumos. Regards, Steve
OK, thank you Steven. I'll gather more detailed information once my test environment is fully fleshed out, so that I have true apples-to-apples numbers and can constrain my testing to one hardware platform (the platforms were slightly different, though with the same processors and memory). I'll file that as an RFE with Illumos if that's what you think is best. I just wanted to put it out there, since I certainly noticed the difference over many weeks of testing different platforms here (OmniOS, Solaris 11.0, Solaris 11.1, FreeBSD 9.1, Nexenta 4 CE). I didn't really know where I should file this particular RFE, so I figured I'd start with the kernel team. I didn't think the SHA256 implementation in FreeBSD was taken exactly from Illumos.
Actually, after double-checking, it looks like FreeBSD doesn't use the same SHA-256 implementation in ZFS as illumos, so there may well be something to look at there. It would be good to know the difference in performance between FreeBSD and OpenIndiana (an illumos distribution). Regards, Steve
I'll gather some numbers comparing OmniOS (Illumos) and FreeBSD and get back to you with the results. I can use my Xeon E3-1240 at home for the benchmarking; it'll just take me some time to pull everything together. Are there any specific tools you'd like me to run, or just basic zpool iostat and mpstat / top -P?
Steven, it also looks as if kern/125738 is related to hardware acceleration of SHA256 in ZFS where it's available. PJD took that one, but it doesn't look like he had time to work on it. So they are similar, though this request is a bit broader. Also, is there any way to scrub my mobile number from the ticket details? I totally spaced out that it's in my default signature here at work. --Jason
FWIW, I spent a full day trying to accelerate Fletcher-4 using SIMD instructions (tested on Sandy Bridge and Nehalem). I was unable to improve on the current code; the Fletcher-4 hash is very fast and doesn't vectorize well. However, I believe that AVX2 will probably be able to beat the non-vectorized version. I plan to try it out as soon as I can get my hands on a Haswell CPU. I've also spent several weeks analyzing the strength of Fletcher-4, and concluded that it's really quite good: certainly good enough for every non-cryptographic application. My recommendation is that all ZFS users should prefer Fletcher-4 over SHA-256. I haven't tried vectorizing SHA-256 and don't plan to.
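To illustrate why Fletcher-4 "doesn't vectorize well", here is a minimal scalar sketch of the algorithm as it is commonly described for ZFS: four 64-bit running sums over the buffer read as native-endian 32-bit words. The names `fletcher4` and `fletcher4_t` are hypothetical; this is not the FreeBSD or illumos source, just an illustration under those assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type: the four 64-bit running sums of the checksum. */
typedef struct {
    uint64_t a, b, c, d;
} fletcher4_t;

/*
 * Hypothetical scalar Fletcher-4 over native-endian 32-bit words.
 * Each sum depends on the one updated just before it in the same
 * iteration and on its own value from the previous iteration, so the
 * whole loop is one long serial dependency chain.
 */
static fletcher4_t fletcher4(const void *buf, size_t size)
{
    const unsigned char *p = buf;
    fletcher4_t s = {0, 0, 0, 0};

    /* The input is assumed to be a whole number of 32-bit words. */
    assert(size % sizeof(uint32_t) == 0);

    for (size_t off = 0; off < size; off += sizeof(uint32_t)) {
        uint32_t w;
        memcpy(&w, p + off, sizeof(w)); /* avoids unaligned loads */
        s.a += w;
        s.b += s.a;
        s.c += s.b;
        s.d += s.c;
    }
    return s;
}
```

Because `b`, `c` and `d` cannot be computed until the preceding sum is known, consecutive words cannot simply be handed to separate SIMD lanes. A vectorized version would have to keep several independent sets of sums over interleaved words and then combine them at the end, and that extra bookkeeping can easily eat the gain on older instruction sets.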
Thank you very much for following up on this. Any further optimization of checksums is gravy for everyone who uses FreeBSD/ZFS. Sure, Fletcher4 is pretty light, but every little bit helps: fewer CPU cycles used = less latency = more win. For crypto/dedup applications, I think effort should perhaps be focused on an optimized implementation of Keccak (the SHA-3 winner) instead of SHA256?
Responsible Changed From-To: zfs-devel->freebsd-fs Apparently there is no such alias; reassign to freebsd-fs.
Batch change: for bugs that match the following (Status is In Progress AND untouched since 2018-01-01 AND affects Base System OR Documentation), reset to open status. Note: I did a quick pass, but if you are getting this email it might be worthwhile to double-check whether this bug ought to be closed.
Nobody came forward to do this, so no point in keeping the issue open.