Bug 267631 - slow nfs from stock FreeBSD kvm guest/client to linux kvm host/server
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 13.0-RELEASE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: freebsd-fs (Nobody)
Keywords: performance
 
Reported: 2022-11-08 02:12 UTC by Andrew Cagney
Modified: 2023-09-17 14:39 UTC
CC List: 4 users

Description Andrew Cagney 2022-11-08 02:12:56 UTC
First the numbers:

FreeBSD:
real	11m53.066s
user	0m36.488s
sys	0m5.237s
NetBSD (9.99.something):
real    3m52.182s
user    0m46.372s
sys     0m19.655s
OpenBSD (7.2):
real	3m24.953s
user	0m42.260s
sys	0m9.800s

now the details:

The test is to build libreswan over NFS on a BSD KVM guest, using the Linux host as the NFS server.  FreeBSD and NetBSD both use GCC, so I don't think it is the compiler (OpenBSD uses LLVM).

Visually, it is the linking step that is slower; the resulting binaries are all in the same ballpark, size-wise:
-rwxr-xr-x. 1 cagney qemu   9400760 Nov  7 19:13 OBJ.kvm.fedora/programs/pluto/pluto
-rwxr-xr-x. 1 cagney cagney 9079352 Nov  7 19:24 OBJ.kvm.freebsd/programs/pluto/pluto
-rwxr-xr-x. 1 cagney cagney 9817860 Nov  7 19:36 OBJ.kvm.netbsd/programs/pluto/pluto
-rwxr-xr-x. 1 cagney cagney 7869320 Nov  7 20:04 OBJ.kvm.openbsd/programs/pluto/pluto

KVM Host and NFS server:
  Linux 6.0.5-200.fc36.x86_64
KVM Guest and NFS client (the KVM guests are all configured the same way):
  FreeBSD freebsd 13.0-RELEASE FreeBSD 13.0-RELEASE

The vm was created using:

sudo virt-install \
	--connect=qemu:///system --check=path_in_use=off --graphics=none --virt-type=kvm --noreboot --console=pty,target_type=serial --vcpus=4 --memory=5120  --cpu=host-passthrough --network=network:swandefault,model=virtio --rng=type=random,device=/dev/random --security=type=static,model=dac,label='1000:107',relabel=yes \
	--filesystem=target=bench,type=mount,accessmode=squash,source=/home/libreswan/wip-misc \
	--filesystem=target=pool,type=mount,accessmode=squash,source=/home/pool \
	--filesystem=target=source,type=mount,accessmode=squash,source=/home/libreswan/wip-misc \
	--filesystem=target=testing,type=mount,accessmode=squash,source=/home/libreswan/wip-misc/testing \
	--name=m.freebsd \
	--os-variant=freebsd13.1 \
	--disk=cache=writeback,path=/home/pool/m.freebsd.qcow2 \
	--import \
	--noautoconsole

FreeBSD's network interface is:

  vtnet0: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=4c07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,LINKSTATE,TXCSUM_IPV6>
	ether 52:54:00:11:50:39
	inet 192.168.234.153 netmask 0xffffff00 broadcast 192.168.234.255
	media: Ethernet autoselect (10Gbase-T <full-duplex>)
	status: active
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>

so the interface is the paravirtualized virtio driver, as expected.

FreeBSD's fstab looks like:

# Device	Mountpoint	FStype	Options	Dump	Pass#
/dev/vtbd0s1a	/		ufs	rw	1	1
/dev/vtbd0s1b	none		swap	sw	0	0
192.168.234.1:/home/pool     /pool       nfs     rw
192.168.234.1:/home/libreswan/wip-webkvm    /bench       nfs     rw
192.168.234.1:/home/libreswan/wip-webkvm   /source         nfs     rw
192.168.234.1:/home/libreswan/wip-webkvm/testing  /testing        nfs     rw

so nothing custom (presumably it is using TCP).
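
To double-check what the client actually negotiated (NFS version, transport, rsize/wsize), the in-use options can be dumped on the client with:

# nfsstat -m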

Linux's exportfs shows entries such as:

/home/libreswan/wip-webkvm/testing
192.168.234.0/24(sync,wdelay,hide,no_subtree_check,anonuid=1000,anongid=1000,sec=sys,rw,secure,root_squash,all_squash)


Presumably I've mis-configured FreeBSD but I'm at a loss as to what.
Comment 1 Andrew Cagney 2022-11-08 15:42:09 UTC
Changing:
	--os-variant=freebsd13.1 \
to the more correct:
	--os-variant=freebsd13.0
doesn't help.

However, the same setup on an older (OS) and slower (hardware) host does not show the slowdown, which would suggest the virtual network interface (not surprising).

Good host:
Linux 5.15.5-100.fc34.x86_64
qemu-kvm-5.2.0-8.fc34.x86_64

Bad host:
Linux 6.0.5-200.fc36.x86_64
qemu-kvm-6.2.0-16.fc36.x86_64
Comment 2 Vladimir Druzenko freebsd_committer freebsd_triage 2023-09-16 17:21:08 UTC
Try mount_nfs -o readahead=16.
Just tested and got 540+MBytes/s read bandwidth via 10GE.
Without readahead (i.e. readahead=1):  63 MBytes/s
readahead=2:   95 MBytes/s
readahead=4:  159 MBytes/s
readahead=8:  280 MBytes/s
readahead=16: 540 MBytes/s

16 is the maximum:
src/sys/fs/nfs/nfs.h:#define  NFS_MAXRAHEAD   16              /* Max. read ahead # blocks */
But I think I could get more bandwidth if I increased it.
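
For reference, the same option can also go into the client's fstab; a sketch reusing one of the export paths from the original report:

192.168.234.1:/home/pool    /pool    nfs    rw,readahead=16    0    0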
Comment 3 Vladimir Druzenko freebsd_committer freebsd_triage 2023-09-16 17:51:48 UTC
BTW, the mount_nfs man page says that 4 is the maximum value and that 0 is allowed:
             readahead=⟨value⟩
                     Set the read-ahead count to the specified value.  This
                     may be in the range of 0 - 4, and determines how many
                     blocks will be read ahead when a large file is being read
                     sequentially.  Trying a value greater than 1 for this is
                     suggested for mounts with a large bandwidth * delay
                     product.

With a value of 0, mount_nfs prints an error message and exits. And the maximum value is now 16, not 4.

IMHO, it would be better to raise this maximum to at least 128 (or even 1024, to "cover" links faster than 10GE).

I can't test kernel patches often: I have two hosts with 10GE and both are in production.

One more question for fs@: is it possible to set the default readahead on the server side?

P.S. This may be a topic for another PR, but maybe I'll create one later if someone from fs@ asks for it.
Comment 4 Andrew Cagney 2023-09-16 18:05:45 UTC
(In reply to Vladimir Druzenko from comment #2)
FYI, in the test scenario everything is virtual.
I'll clarify the subject line.
Comment 5 Vladimir Druzenko freebsd_committer freebsd_triage 2023-09-16 19:17:09 UTC
(In reply to Andrew Cagney from comment #4)
Anyway try readahead=16.
Comment 6 Rick Macklem freebsd_committer freebsd_triage 2023-09-16 23:01:42 UTC
As you've noted, NFS performance issues are often
network interface related. However, here are a few
tunables you can try, beyond "readahead", which was
already mentioned. (And, yes, the man page is out of
date w.r.t. readahead. You can build a kernel from
sources with the value bumped up from 16, but I doubt
a value greater than 16 will be needed?)

First, do this on the client when the mount is established:
# nfsstat -m
This will show you what it is actually using.
You probably have NFSv3,TCP and an rsize of 64K or 128K.
If you stick "vfs.maxbcachebuf=1048576" in the client's
/boot/loader.conf, the rsize will probably go up to 1Mbyte.
(It will also recommend that you increase kern.ipc.maxsockbuf
 and will suggest a value. I'd increase it to at least the recommended
 value.)

A large rsize/wsize will have a similar effect to increasing
readahead, but will affect writing as well as reading.
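
A sketch of the client-side settings described above (the kern.ipc.maxsockbuf number below is only a placeholder; use whatever value the kernel actually recommends when the mount is made):

# /boot/loader.conf (vfs.maxbcachebuf is a boot-time tunable)
vfs.maxbcachebuf="1048576"

# /etc/sysctl.conf (kern.ipc.maxsockbuf can also be set at runtime)
kern.ipc.maxsockbuf=4194304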

nocto - Close to Open consistency will improve correctness
        when multiple clients access the same files. If your
        files are not being manipulated by multiple clients
        concurrently, turning it off can help.

Most 10Gbps net interfaces use multiple queues and pin a TCP
connection to a queue.  As such, an NFS mount with a single
TCP connection can only get a fraction of the bandwidth.
nconnect=N can help here, but it only works for NFSv4.1/4.2,
    so you also need to specify the "nfsv4" or "vers=4"
    mount option. (No idea w.r.t. VMs.)
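
For example, something along these lines (a sketch only; the path is reused from the original report, and I have no idea whether it helps with a virtio NIC in a VM):

# mount -t nfs -o nfsv4,nconnect=4 192.168.234.1:/home/pool /pool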

Then there are all the cache timeouts "acdirmin,...".
Similar to nocto, longer timeouts have a negative impact
if other clients (or processes locally on the server) are changing
things. But longer timeouts result in better caching.

Then there is "noatime", since few care about the access time
being up to date.
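
Putting several of the above together as an fstab line (a sketch only; the timeout values are just illustrations, and nocto/long timeouts only make sense if nothing else modifies the files concurrently):

192.168.234.1:/home/pool  /pool  nfs  rw,nfsv4,nconnect=4,readahead=16,nocto,noatime,acregmax=60,acdirmax=60  0  0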

However, nothing above will fix poor performance caused by a
poor net interface.
Comment 7 Andrew Cagney 2023-09-17 00:20:12 UTC
(In reply to Rick Macklem from comment #6)
I have a work-around (read the sources from NFS+KVM, write the build output to /tmp), and this works with the stock kernel and virtual network driver.
Comment 8 Andrew Cagney 2023-09-17 00:26:13 UTC
A blocker preventing me from using 13.1 was fixed in 13.2.

Retesting with stock 13.2:

# uname -a
FreeBSD freebsd 13.2-RELEASE FreeBSD 13.2-RELEASE releng/13.2-n254617-525ecfdad597 GENERIC amd64

$ uname -a
Linux bernard 6.4.14-200.fc38.x86_64 #1 SMP PREEMPT_DYNAMIC Sat Sep  2 16:36:06 UTC 2023 x86_64 GNU/Linux
$ rpm -q qemu-kvm
qemu-kvm-7.2.5-1.fc38.x86_64

results in:

real    1m22.930s
user    0m52.125s
sys     0m7.070s

so somewhere between
  FreeBSD 13.0 -> 13.2
  Linux 6.0.5 -> 6.4.14
  QEMU 6.2.0 -> 7.2.5
this has been fixed.
Comment 9 Vladimir Druzenko freebsd_committer freebsd_triage 2023-09-17 01:54:55 UTC
(In reply to Rick Macklem from comment #6)
Thanks a lot for the information!
Where is the best place to continue this discussion?
Comment 10 Rick Macklem freebsd_committer freebsd_triage 2023-09-17 14:39:52 UTC
Either freebsd-current@ or freebsd-fs@ are
good bets.