Bug 251462 - Failing transfers over nfsv4 with krb5i on CPU with SHA acceleration
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 12.2-RELEASE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: freebsd-bugs (Nobody)
Reported: 2020-11-29 18:33 UTC by Žilvinas Žaltiena
Modified: 2021-04-07 19:06 UTC (History)
Description Žilvinas Žaltiena 2020-11-29 18:33:22 UTC
I initially stumbled on this problem on TrueNAS 12, but because TrueNAS uses FreeBSD as its upstream/base OS, I reproduced it on FreeBSD 12.0, 12.1, and 12.2 for debugging purposes.

I have set up a FreeBSD 12.x NFSv4 server requiring krb5i (note the "i": with integrity). The clients run Linux 5.8. Everything is joined to Active Directory and uses aes256-cts-hmac-sha1-96 as the Kerberos cipher suite.

The problem:

If I run the FreeBSD server inside a VM on an Intel Atom C3558 CPU, only small file transfers succeed. File transfers over 200MB become increasingly unreliable: they either hang (server timeout) or terminate with an input/output error. After network traffic ceases, gssd on the server still shows high CPU usage for a while. The server-side logs contain nothing related. This CPU has AES-NI and SHA support.
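
A minimal sketch of how the failure could be checked from the Linux client, assuming the export is already mounted. The function name copy_and_verify and all paths are hypothetical placeholders, not part of the original report.

```shell
#!/bin/sh
# Hypothetical client-side check: copy a file onto the NFS mount and
# compare it with the source. Paths and sizes below are placeholders.

# copy_and_verify SRC DST: copy SRC to DST, then compare the two files.
# Returns 0 only if the copy succeeded and the contents match.
copy_and_verify() {
    src=$1
    dst=$2
    # On NFS the client flushes dirty data when cp closes the file,
    # so an input/output error is expected to surface in cp's exit status.
    cp "$src" "$dst" || return 1
    cmp -s "$src" "$dst"
}

# Example run (assumed mount point /mnt/nfs, 300 MB of random data):
#   dd if=/dev/urandom of=/tmp/testfile bs=1M count=300
#   copy_and_verify /tmp/testfile /mnt/nfs/testfile && echo OK || echo FAILED
```

The comparison may be served from the client's page cache, so the exit status of cp is the more reliable signal here; a hang would have to be caught separately, for example by watching the run manually.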

What I have tried to narrow down the culprit:
1) Downgrading the security to krb5 (no integrity, just authentication) fixed the transfers, which then saturated the gigabit link.
2) Disabling the aesni module fixed the failing transfers with krb5i.
3) Patching the aesni module so that detection of CPU SHA support always fails also fixed the failing transfers, even with the aesni module loaded.
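
For anyone repeating steps 2 and 3, the following is a rough sketch of how one might confirm that a server CPU advertises SHA extensions and that aesni is loaded. The helper name cpu_has_sha is hypothetical, and the sketch assumes that aesni is loaded as a module (aesni.ko) rather than compiled into the kernel.

```shell
#!/bin/sh
# cpu_has_sha FILE: succeed if the boot messages in FILE list SHA among
# the CPU's "Structured Extended Features" (as printed by FreeBSD on amd64).
cpu_has_sha() {
    grep 'Structured Extended Features=' "$1" | grep -qw 'SHA'
}

# Assumed usage on the FreeBSD server, run as root:
#   cpu_has_sha /var/run/dmesg.boot && echo "CPU advertises SHA extensions"
#   kldstat -n aesni.ko      # shows the module if it is currently loaded
#   kldunload aesni          # step 2: retest krb5i transfers without it
# If the module cannot be unloaded, removing it from the boot-time module
# list and rebooting is an alternative.
```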
Comment 1 Žilvinas Žaltiena 2020-12-15 11:52:07 UTC
I reproduced this on an AMD Ryzen 7 3800X CPU too, which also has SHA extensions. NFS transfers fail with krb5i if the aesni module is loaded.

I tried running the crypto tests from the FreeBSD test suite. They passed.

One interesting thing: forcing sync on the NFS mount on the Linux client makes transfers succeed even with the aesni module loaded on the FreeBSD server, but at 2-3x lower speed (25MB/s with sync vs 80-100MB/s otherwise). Normally the Linux client accumulates data in memory until the application closes, locks, or flushes the file, or until memory runs out, and only then starts sending it to the server.
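
As an illustration of the sync workaround, here is a small sketch that prints the Linux mount command with and without the extra option. The helper nfs_mount_cmd, the server name, and the paths are hypothetical; vers, sec, and sync are standard Linux NFS mount options.

```shell
#!/bin/sh
# nfs_mount_cmd SOURCE TARGET [EXTRA]: print a Linux mount command for an
# NFSv4 krb5i mount. EXTRA, if given, is appended to the option list.
nfs_mount_cmd() {
    opts="vers=4,sec=krb5i${3:+,$3}"
    echo "mount -t nfs -o $opts $1 $2"
}

# Assumed server and paths:
#   nfs_mount_cmd nfs.example.com:/export /mnt/nfs        # default, write-back caching
#   nfs_mount_cmd nfs.example.com:/export /mnt/nfs sync   # the workaround described above
```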
Comment 2 Žilvinas Žaltiena 2020-12-18 19:55:43 UTC
If nfsd on FreeBSD is explicitly limited to a single thread, i.e. rc.conf contains:

nfs_server_flags="-t -n 1"

then transfers succeed with krb5i/krb5p and the aesni module loaded, even if the Linux client does not use the sync mount option. If the thread count is set to > 1, the original problem reappears. Could this be some sort of thread safety / locking issue in the SHA part of the aesni module?
Comment 3 Konstantin Belousov freebsd_committer 2021-02-04 18:16:51 UTC
Try https://reviews.freebsd.org/D28485
Comment 4 Žilvinas Žaltiena 2021-04-07 19:06:24 UTC
I am sorry for not replying earlier. I did not receive any email about the update to this bug.

I tried patching aesni_cipher_setup() to leave only

kt = is_fpu_kern_thread(0);

as per D28485, but it didn't help. Note that I am on 12.2, where that line looked a bit different than in D28485; however, I think the idea behind it was the same.