Initially I stumbled on this problem on TrueNAS 12, but for debugging purposes I reproduced it on FreeBSD 12.0, 12.1, and 12.2, because TrueNAS uses FreeBSD as its upstream/base OS.
I have set up a FreeBSD 12.x NFSv4 server requiring krb5i (note the "i": with integrity). Clients are Linux 5.8. Everything is joined to Active Directory and uses aes256-cts-hmac-sha1-96 as the cipher suite for Kerberos.
If I run the FreeBSD server inside a VM on an Intel Atom C3558 CPU, only small file transfers succeed. File transfers over 200MB become increasingly unreliable: they either hang (server timeout) or terminate with an input/output error. After network traffic ceases, gssd on the server still shows high CPU usage for a while. Server-side logs do not contain anything related. This CPU has AES-NI and SHA support.
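A minimal sketch of how the failure could be reproduced from the Linux client, assuming the export is already mounted with sec=krb5i. The function name copy_and_verify, the test file name, and the mount point /mnt/nfs are hypothetical and not taken from the report.

```shell
#!/bin/sh
# Sketch: write a random file of SIZE_MB megabytes, copy it into TARGET_DIR
# (assumed to be the krb5i NFS mount) and compare the copy with the source.
# Usage: copy_and_verify SIZE_MB TARGET_DIR
copy_and_verify() {
    size_mb=$1
    target_dir=$2
    src=$(mktemp) || return 2
    # Random payload; bs is given in bytes so it works with any dd.
    if ! dd if=/dev/urandom of="$src" bs=1048576 count="$size_mb" 2>/dev/null; then
        rm -f "$src"
        return 2
    fi
    dst="$target_dir/krb5i-test-$size_mb.bin"
    # cp reports an error if the write or the final flush to the server fails.
    if cp "$src" "$dst" && cmp -s "$src" "$dst"; then
        echo "OK ${size_mb}MB"
        rc=0
    else
        echo "FAIL ${size_mb}MB"
        rc=1
    fi
    rm -f "$src" "$dst"
    return $rc
}

# Example (hypothetical mount point):
#   for s in 50 200 500; do copy_and_verify "$s" /mnt/nfs; done
```

A hang shows up as the loop never finishing, so wrapping the call in a timeout may be useful. Note that cmp may read back from the client page cache, so the cp exit status is the more reliable signal here.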
What I have tried to narrow down the culprit:
1) Downgrading the security to krb5 (no integrity, just auth) fixed the transfers and saturated the gigabit link.
2) Disabling the aesni module fixed the failing transfers with krb5i.
3) Patching the aesni module (so that detection of CPU support for SHA always failed) also fixed the failing transfers, even with the aesni module loaded.
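For anyone repeating steps 1 and 2, the commands involved look roughly like the following. This is a sketch only: server.example.com, /export and /mnt/nfs are placeholder names, it assumes aesni is built as a loadable module rather than compiled into the kernel, and the module patch from step 3 is not shown.

```shell
# On the Linux client: mount with integrity protection (the failing case) ...
mount -t nfs4 -o sec=krb5i server.example.com:/export /mnt/nfs

# ... or remount with authentication only (step 1).
umount /mnt/nfs
mount -t nfs4 -o sec=krb5 server.example.com:/export /mnt/nfs

# On the FreeBSD server: check whether aesni is loaded, then unload it (step 2).
kldstat -q -m aesni && echo "aesni is loaded"
kldunload aesni
```

The server export has to permit the chosen security flavor for either mount to succeed.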
I reproduced this on an AMD Ryzen 7 3800X CPU too, which also has SHA extensions. NFS transfers fail with krb5i if the aesni module is loaded.
I tried running the crypto tests from the FreeBSD test suite. They passed successfully.
One interesting thing is that forcing sync on the NFS mount on the Linux client makes transfers succeed even with the aesni module loaded on the FreeBSD server, but at 2-3x lower speed (about 25MB/s instead of 80-100MB/s). Normally the Linux client piles up the data in memory until the application closes/locks/flushes the file or there is no more memory, and only then does the client start sending it to the server.
If nfsd on FreeBSD is explicitly limited to a single thread, i.e. rc.conf with:
nfs_server_flags="-t -n 1"
then transfers succeed with krb5i/krb5p and the aesni module loaded, even if the Linux client does not use the sync mount option. If the thread count is set to > 1, the original problem reappears. Some sort of thread safety / locking issue in the SHA part of the aesni module?
I am sorry for not replying earlier. I did not get any email about updates to this bug.
I tried patching aesni_cipher_setup() and leaving only
kt = is_fpu_kern_thread(0);
as per D28485, but it didn't help. Note that I am on 12.2 and that line looked a bit different than in D28485; however, I think the idea behind it was the same anyway.