First the numbers:

FreeBSD:
real    11m53.066s
user    0m36.488s
sys     0m5.237s

NetBSD (9.99.something):
real    3m52.182s
user    0m46.372s
sys     0m19.655s

OpenBSD (7.2):
real    3m24.953s
user    0m42.260s
sys     0m9.800s

Now the details:

The test is to build libreswan over NFS on a BSD KVM guest, using a Linux host as the NFS server. FreeBSD and NetBSD both use GCC, so I don't think it is the compiler (OpenBSD uses LLVM). Visually, linking is slower; all binaries are in the same ballpark:

-rwxr-xr-x. 1 cagney qemu   9400760 Nov  7 19:13 OBJ.kvm.fedora/programs/pluto/pluto
-rwxr-xr-x. 1 cagney cagney 9079352 Nov  7 19:24 OBJ.kvm.freebsd/programs/pluto/pluto
-rwxr-xr-x. 1 cagney cagney 9817860 Nov  7 19:36 OBJ.kvm.netbsd/programs/pluto/pluto
-rwxr-xr-x. 1 cagney cagney 7869320 Nov  7 20:04 OBJ.kvm.openbsd/programs/pluto/pluto

KVM host and NFS server:
Linux 6.0.5-200.fc36.x86_64

KVM guest and NFS client (the KVM guests are configured the same way):
FreeBSD freebsd 13.0-RELEASE FreeBSD 13.0-RELEASE

The VM was created using:

sudo virt-install \
    --connect=qemu:///system \
    --check=path_in_use=off \
    --graphics=none \
    --virt-type=kvm \
    --noreboot \
    --console=pty,target_type=serial \
    --vcpus=4 \
    --memory=5120 \
    --cpu=host-passthrough \
    --network=network:swandefault,model=virtio \
    --rng=type=random,device=/dev/random \
    --security=type=static,model=dac,label='1000:107',relabel=yes \
    --filesystem=target=bench,type=mount,accessmode=squash,source=/home/libreswan/wip-misc \
    --filesystem=target=pool,type=mount,accessmode=squash,source=/home/pool \
    --filesystem=target=source,type=mount,accessmode=squash,source=/home/libreswan/wip-misc \
    --filesystem=target=testing,type=mount,accessmode=squash,source=/home/libreswan/wip-misc/testing \
    --name=m.freebsd \
    --os-variant=freebsd13.1 \
    --disk=cache=writeback,path=/home/pool/m.freebsd.qcow2 \
    --import \
    --noautoconsole

FreeBSD's network interface is:

vtnet0: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=4c07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,LINKSTATE,TXCSUM_IPV6>
        ether 52:54:00:11:50:39
        inet 192.168.234.153 netmask 0xffffff00 broadcast 192.168.234.255
        media: Ethernet autoselect (10Gbase-T <full-duplex>)
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>

so it should be virtualized.

FreeBSD's fstab looks like:

# Device        Mountpoint      FStype  Options Dump    Pass#
/dev/vtbd0s1a   /               ufs     rw      1       1
/dev/vtbd0s1b   none            swap    sw      0       0
192.168.234.1:/home/pool                          /pool     nfs   rw
192.168.234.1:/home/libreswan/wip-webkvm          /bench    nfs   rw
192.168.234.1:/home/libreswan/wip-webkvm          /source   nfs   rw
192.168.234.1:/home/libreswan/wip-webkvm/testing  /testing  nfs   rw

so nothing custom (presumably it is using TCP).

Linux's exportfs shows entries such as:

/home/libreswan/wip-webkvm/testing
        192.168.234.0/24(sync,wdelay,hide,no_subtree_check,anonuid=1000,anongid=1000,sec=sys,rw,secure,root_squash,all_squash)

Presumably I've mis-configured FreeBSD, but I'm at a loss as to what.
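(For reference, the timings above come from running the build in the NFS-mounted source tree, roughly along the lines of the following; the exact make target is an approximation, not copied verbatim from the build logs:)

cd /source
time gmake programs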
Changing:

    --os-variant=freebsd13.1 \

to the more correct:

    --os-variant=freebsd13.0 \

doesn't help. However, the same setup on an older (OS), slower (hardware) machine does perform better, which would suggest the network interface (not surprising).

Good host:
Linux 5.15.5-100.fc34.x86_64
qemu-kvm-5.2.0-8.fc34.x86_64

Bad host:
Linux 6.0.5-200.fc36.x86_64
qemu-kvm-6.2.0-16.fc36.x86_64
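One way to separate raw virtio-net throughput from NFS behaviour would be an iperf3 run between guest and host (iperf3 is a suggestion here, not something from the setup above; it has to be installed from packages on both sides, and the address is the NFS server's from the fstab):

# on the Linux host
iperf3 -s

# on the FreeBSD guest
iperf3 -c 192.168.234.1 -t 10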
Try mount_nfs -o readahead=16. Just tested and got 540+ MBytes/s read bandwidth via 10GE.

Read bandwidth (MBytes/s) by readahead:
without readahead == readahead=1: 63
readahead=2:  95
readahead=4:  159
readahead=8:  280
readahead=16: 540

16 is the maximum:

src/sys/fs/nfs/nfs.h:#define NFS_MAXRAHEAD 16 /* Max. read ahead # blocks */

But I think I could get more bandwidth if I increased it.
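For the mounts in comment #0 that would look something like this (server address and paths copied from there; untested on that particular setup):

# one-off, from the guest
mount_nfs -o readahead=16 192.168.234.1:/home/pool /pool

# or persistently in /etc/fstab
192.168.234.1:/home/pool    /pool    nfs    rw,readahead=16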
BTW, the mount_nfs man page says that 4 is the max value and that 0 is allowed:

     readahead=<value>
             Set the read-ahead count to the specified value.  This may be
             in the range of 0 - 4, and determines how many blocks will be
             read ahead when a large file is being read sequentially.
             Trying a value greater than 1 for this is suggested for mounts
             with a large bandwidth * delay product.

With value 0, mount_nfs prints an error message and exits. And the max value is 16 now, not 4. IMHO, it would be better to increase this value to at least 128 (or even 1024, to "cover" more than 10GE). I can't test kernel patches often: I have 2 hosts with 10GE and both are in production.

One more question to fs@: is it possible to set a default readahead on the server side?

P.S. This may be a topic for another PR, but maybe I'll create one later if someone from fs@ asks for it.
(In reply to Vladimir Druzenko from comment #2)
FYI, in the test scenario everything is virtual. I'll clarify the subject line.
(In reply to Andrew Cagney from comment #4)
Anyway, try readahead=16.
As you've noted, NFS performance issues are often network interface related. However, here are a few tunables you can try, beyond "readahead", which was already mentioned. (And, yes, the man page is out of date w.r.t. readahead. You can build a kernel from sources with the value bumped up from 16, but I doubt a value greater than 16 will be needed.)

First, do this on the client once the mount is established:

# nfsstat -m

This will show you what it is actually using. You probably have NFSv3, TCP and an rsize of 64K or 128K. If you stick "vfs.maxbcachebuf=1048576" in the client's /boot/loader.conf, the rsize will probably go up to 1 Mbyte. (It will also recommend that you increase kern.ipc.maxsockbuf and will suggest a value. I'd increase it to at least the recommended value.) A large rsize/wsize will have a similar effect to increasing readahead, but will affect writing as well as reading.

nocto - close-to-open consistency improves correctness when multiple clients access the same files. If your files are not being manipulated by multiple clients concurrently, turning it off can help.

Most 10Gbps net interfaces use multiple queues and pin a TCP connection to a queue. As such, an NFS mount with a single TCP connection can only get a fraction of the bandwidth. nconnect=N can help here, but it only works for NFSv4.1/4.2, so you also need to specify the "nfsv4" or "vers=4" mount option. (No idea w.r.t. VMs.)

Then there are all the cache timeouts "acdirmin,...". Similar to nocto, longer timeouts have a negative impact if other clients (or processes locally on the server) are changing things, but longer timeouts result in better caching.

Then there is "noatime", since few care about the access time being up to date.

However, nothing above will fix poor performance caused by a poor net interface.
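Pulling those suggestions together, a rough sketch of the client-side settings might look like this (illustrative only, reusing the server address and paths from comment #0; the maxsockbuf number is a placeholder, use at least whatever value the boot-time message actually recommends):

/boot/loader.conf:
    vfs.maxbcachebuf="1048576"

/etc/sysctl.conf:
    kern.ipc.maxsockbuf=4194304    # placeholder; use at least the recommended value

/etc/fstab:
    192.168.234.1:/home/pool    /pool    nfs    rw,readahead=16,nocto,noatime

Adding nconnect=N would also require the "nfsv4" (or vers=4) option, and the NFSv4 path the client sees depends on the Linux server's pseudo-root configuration, so it may differ from the v3 export path above.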
(In reply to Rick Macklem from comment #6)
I have a work-around - read from NFS+KVM, write to /tmp - and this works with the stock kernel and virtual network driver.
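(Concretely, that work-around amounts to keeping the sources on NFS but sending the object files to local disk. A sketch, assuming libreswan's build honors an OBJDIR make variable, which is a guess based on the OBJ.kvm.* directories in comment #0; adjust to whatever the build actually uses:)

# sources stay on the NFS mount, compiler/linker output goes to local /tmp
cd /source && time gmake OBJDIR=/tmp/OBJ.kvm.freebsd programs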
A blocker preventing me from using 13.1 was fixed in 13.2. Retesting with stock 13.2:

# uname -a
FreeBSD freebsd 13.2-RELEASE FreeBSD 13.2-RELEASE releng/13.2-n254617-525ecfdad597 GENERIC amd64

$ uname -a
Linux bernard 6.4.14-200.fc38.x86_64 #1 SMP PREEMPT_DYNAMIC Sat Sep 2 16:36:06 UTC 2023 x86_64 GNU/Linux
$ rpm -q qemu-kvm
qemu-kvm-7.2.5-1.fc38.x86_64

results in:

real    1m22.930s
user    0m52.125s
sys     0m7.070s

so somewhere between:

FreeBSD 13.0 -> 13.2
Linux 6.0.5 -> 6.4.14
QEMU 6.2.0 -> 7.2.5

this has been fixed.
(In reply to Rick Macklem from comment #6)
Thanks a lot for the information! Where is the best place to continue this discussion?
Either freebsd-current@ or freebsd-fs@ are good bets.