I have a FreeBSD 14 server and client. Both have Intel X540 10GBase-T adapters and are connected via CAT7 cabling and a Netgear switch with the respective 10GBase-T ports. Via iperf3, I measure 1233 MiB/s (9.87 Gbit/s) of throughput. Via nc, I measure 1160 MiB/s. Via NFS, I only get around 190-250 MiB/s. I did not expect to get the full 1100 MiB/s with NFS, but I did hope for at least 600-800 MB/s.

Various guides suggest tinkering with different TCP-related sysctls, but I haven't had any luck improving the performance. And since nc also manages to push more than 1 GByte/s over TCP, this doesn't seem to be the core of the problem. I have replaced the base system's ix driver with the one from ports, but no change. Again, I don't think the driver or the network stack has an issue per se; it seems to be NFS-related.

I have used default options for the mounts. This is what nfsstat shows for the NFSv3 mount:
```
nfsv3,tcp,resvport,nconnect=1,hard,cto,lockd,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=65536,wsize=65536,readdirsize=65536,readahead=1,wcommitsize=16777216,timeout=120,retrans=2
```
and for the NFSv4 mount:
```
nfsv4,minorversion=2,tcp,resvport,nconnect=1,hard,cto,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=65536,wsize=65536,readdirsize=65536,readahead=1,wcommitsize=16777216,timeout=120,retrans=2147483647
```
Am I missing something? Is this a bug or a configuration problem? I will try to set up a Linux NFS client to see whether the issue is client- or server-related. Thanks for your help!

P.S.: The server has an NVMe raidz and can sustain throughput above 900 MiB/s while reading and writing hundreds of gigabytes from/to different datasets of the pool, even with encryption and compression enabled. So I don't think the disks are a limiting factor.
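For reference, the raw-TCP numbers above were obtained roughly like this (the exact flags, port number, and the host name "server" are from memory, not copied from my shell history):
```
# iperf3 baseline
iperf3 -s                                             # on the server
iperf3 -c server -t 30                                # on the client
# plain TCP with nc, pushing zeroes through the socket
nc -l 5001 > /dev/null                                # on the server
dd if=/dev/zero bs=1M count=20000 | nc server 5001    # on the client
```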
You could try these mount options:
  nconnect=4 (or 8) on the NFSv4 mount only (it doesn't work for NFSv3)
  readahead=4 (or 8)
You can also try bumping up the rsize/wsize. For the server, set nfs_server_maxio=1048576 in its /etc/rc.conf. For the client, set vfs.maxbcachebuf=1048576 in its /boot/loader.conf. A mount done after these changes should default to rsize=1048576,wsize=1048576 (you can then also try 256K via the rsize and wsize mount options).
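Roughly, the settings would look like this (the export path and host name are placeholders):
```
# server side, /etc/rc.conf -- lets nfsd advertise a larger maximum I/O size
nfs_server_maxio="1048576"

# client side, /boot/loader.conf -- raises the client's maximum buffer cache block size
vfs.maxbcachebuf="1048576"

# client side, after restarting nfsd on the server and rebooting the client,
# a new mount should default to rsize=1048576,wsize=1048576
# (nconnect only applies to NFSv4 mounts)
mount -t nfs -o nfsv4,nconnect=4,readahead=4 server:/pool/data /mnt
```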
Thanks for the reply! With rsize=262144,wsize=262144 I get 293 MiB/s. With rsize=1048576,wsize=1048576 I get 395 MiB/s. This is already an improvement, but still not where I would like it to be. nconnect and readahead don't seem to make much of a difference, and neither does NFSv3 vs. NFSv4.

Interestingly, if I boot Linux on the client and perform a regular NFS mount with no options supplied, I get almost 600 MiB/s. This was measured even before changing the server setting, which seems to indicate that the problem is client-side. These are the options that Linux reports as being used:
```
rw,relatime,vers=3,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=1.3.1.2,mountvers=3,mountport=820,mountproto=udp,local_lock=none,addr=1.3.1.2
```
So Linux gets a much higher throughput with lower rsize and wsize. I will try Linux with higher rsize and wsize next.
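For completeness, the rsize/wsize comparison on the FreeBSD client was done roughly like this (host name and paths are placeholders, not the actual setup):
```
# umount/mount between runs so results aren't served from the client's buffer cache
umount /mnt
mount -t nfs -o nfsv3,rsize=1048576,wsize=1048576 server:/pool/data /mnt
dd if=/mnt/bigfile of=/dev/null bs=1M status=progress                # read test
dd if=/dev/zero of=/mnt/scratch bs=1M count=16384 status=progress    # write test
```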
I can confirm that rsize and wsize do not strongly influence the performance of the Linux NFSv3 client; choosing values between 128 KiB and 1 MiB, I always get 550-600 MiB/s. The Linux client is, however, strongly influenced by the device MTU. I am currently operating both server and client at an MTU of 9000. If I lower the MTU to 1500, the Linux client performs *very similarly* to the FreeBSD client (at any MTU), delivering 200-250 MiB/s. I find it strange that the FreeBSD NFS client does not benefit from an MTU higher than 1500. Could this be a hint at what's going wrong?
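For reference, the MTU comparison was done roughly like this (ix0 is the FreeBSD interface; the Linux device name is just an example, and the switch ports also have to allow jumbo frames):
```
# FreeBSD client/server
ifconfig ix0 mtu 9000       # jumbo frames
ifconfig ix0 mtu 1500       # back to the default for comparison

# Linux client
ip link set dev enp1s0 mtu 9000
```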
I'd recommend that you start a discussion on a mailing list (freebsd-current@ or freebsd-stable@), since others may have insight related to tuning the Intel NIC/driver. On the list, I also suggest asking what performance people get with non-Intel NICs. One well-known problem (that may never be fixed) is that the use of 9K jumbo mbufs can cause fragmentation of the mbuf pool. (A NIC driver does not need to use 9K jumbo mbufs for 9K MTU packets, but some do.)
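If you want to check whether 9K jumbo mbufs are in use and the pool is under pressure, the counters are visible from userland (the grep patterns below are just a convenience, not specific to the ix driver):
```
# summary of mbuf cluster usage, including 9k jumbo clusters and denied requests
netstat -m | grep -E '9k|denied'
# per-zone view; a growing FAIL count for mbuf_jumbo_9k hints at pool fragmentation
vmstat -z | grep -E 'ITEM|mbuf_jumbo_9k'
```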
Thanks for your reply! I will ask on the mailing lists, although I still consider the current behaviour a bug. I mean, out of the box a crucial networking component of the base system achieves less than 20% of the theoretical performance, and only around a third of what the respective component on Linux delivers by default.
(In reply to Rick Macklem from comment #1)
A quick test between two FreeBSD 13.2 servers with 10G Intel X710 ethernet (writing to a ZFS zpool with spinning disks, log on SSD, cache on SSD):
```
# mount -t nfs -o sec=sys,vers=4 dedur01:/test /mnt
# cd /mnt
# dd if=/dev/zero of=16G bs=1M count=16384 status=progress
  16982736896 bytes (17 GB, 16 GiB) transferred 61.065s, 278 MB/s
# dd of=/dev/null if=16G bs=1M count=16384 status=progress
  17151557632 bytes (17 GB, 16 GiB) transferred 49.015s, 350 MB/s
# cd /
# umount /mnt
# mount -t nfs -o sec=sys,vers=4,readahead=8 dedur01:/test /mnt
# cd /mnt
# dd of=/dev/null if=16G bs=1M count=16384 status=progress
  16490954752 bytes (16 GB, 15 GiB) transferred 16.019s, 1029 MB/s
# nfsstat -m
dedur01:/test on /mnt
nfsv4,minorversion=2,tcp,resvport,nconnect=1,hard,cto,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=65536,wsize=65536,readdirsize=65536,readahead=8,wcommitsize=16777216,timeout=120,retrans=2147483647
# ifconfig ixl0
ixl0: flags=8963<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=4e507bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6,NOMAP>
        ether 3c:fd:fe:24:e7:e0
        media: Ethernet autoselect (10Gbase-Twinax <full-duplex>)
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
```
If I redo the read test without unmounting and remounting I get some silly number like 4 GB/s or so (cached locally in RAM I suppose :-)
(In reply to Peter Eriksson from comment #6)
Yes. If you do not umount/mount, the file (or at least part of it) will be in the client's buffer cache (kernel RAM). Does anyone know what differs between the X540 and X710?
(In reply to Rick Macklem from comment #7)
The X540 is a much older network card - around 5 years older than the X710. From the manual pages:

X540
  Bus: PCIe 2.1 x8
  Chipset: Intel 82598EB
  Driver: ixgbe
  Max MTU: 16144
  Features: Jumbo Frames, MSIX, TSO & RSS

X710
  Bus: PCIe 3.0 x8
  Chipset: Intel 700-series
  Driver: ixl
  Max MTU: 9706
  Features: Jumbo Frames, TX/RX checksum offload, TSO (TCP Segmentation Offload), LRO (Large Receive Offload), VLAN tag insertion/extraction, VLAN checksum offload, VLAN TSO, RSS (Receive Side Scaling)

I don't have any X540/ixgbe boards here, so I can't test that combo unfortunately, but looking at the feature sets it seems the X540 lacks the LRO feature...
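One way to check on the actual hardware would be to toggle LRO at runtime and compare throughput (ix0 is assumed as the client's interface name):
```
ifconfig ix0 -lro              # disable large receive offload
# ... rerun the NFS read test ...
ifconfig ix0 lro               # re-enable it
ifconfig ix0 | grep options    # confirm whether LRO shows up in the flags
```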
Thank you for the replies, and sorry for my late response (I've been travelling a lot for work). I am still very interested in debugging this further!

@Peter Eriksson It's hopeful to see higher speeds with FreeBSD on other devices at least.

> The X540 is a much older network card - around 5 years older than the X710.
> I don't have any X540/ixgbe boards here so can't test that combo unfortunately, but looking at the feature sets it seems the X540 lacks the LRO feature...

Hm, but I do get much higher speeds on Linux, so how can this be a hardware issue? I am not sure whether LRO is the crucial thing here, but at least `ifconfig ix0` on the client does list LRO among its features:
```
# ifconfig -v ix0
ix0: flags=1008843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 9000
        options=4e53fbb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6,HWSTATS,MEXTPG>
        ether 98:b7:85:1f:2e:72
        inet 192.168.3.80 netmask 0xffffff00 broadcast 192.168.3.255
        media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
```
Ok, wow. I have just moved the client to a different room/RJ45 wall socket, and now I am getting NFS speeds around 950 MiB/s with default settings and 1150 MiB/s with nconnect=8,readahead=8. This suggests some SNAFU with both CAT7 cables going to the other outlet. I am really puzzled why that would affect NFS on FreeBSD more strongly than NFS on Linux or nc on FreeBSD, but that's what it looks like right now :o
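In hindsight, a couple of quick checks might have pointed at the cabling earlier (interface name as on my client; the server address is a placeholder):
```
ifconfig ix0 | grep media      # confirm the negotiated speed/duplex on the link
netstat -i -d -I ix0           # input/output errors and drops on the interface
ping -D -s 8972 server         # verify that jumbo frames survive the switch path unfragmented
```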
(In reply to Hannes Hauswedell from comment #10)
I assume we can close this PR then?
(In reply to Rick Macklem from comment #11)
Yes, I am closing it. I am still puzzled by some aspects, but I don't think anyone here can clear this up. Thank you for your comments!