Bug 6032 - poor TCP performance using FDDI over long delay path
Summary: poor TCP performance using FDDI over long delay path
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 2.2.5-RELEASE
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: freebsd-bugs (Nobody)
Depends on:
Reported: 1998-03-16 16:10 UTC by Curtis Villamizar
Modified: 2001-03-28 19:59 UTC

See Also:

file.diff (849 bytes, patch)
1998-03-16 16:10 UTC, Curtis Villamizar
no flags Details | Diff
file.diff (838 bytes, patch)
1998-03-16 16:10 UTC, Curtis Villamizar
no flags Details | Diff

Description Curtis Villamizar 1998-03-16 16:10:01 UTC
Change the window size to a large value (for example 128 KB).  Expect to
get 40-80 Mb/s.  Instead FreeBSD yields about 1 MB/s.  BSDI and other
BSD or *ix flavors yield 40-80 Mb/s as expected.

A big part of the problem is the function tcp_mss in tcp_input.c, which
sets the window size back to a small value.

Fix: The email message below sums it up.  I never did look into the reason
why increasing MCLSHIFT would result in an unusable kernel so I'm
sending this in as one bug report.  If I get a chance I'll look at the
MCLSHIFT problem and also try to figure out why setting NMBCLUSTERS to
2048 or above was a problem.  You can do whatever you want with these
patches.  This is simply a performance issue.

Subject: FreeBSD performance problem solved
Date: Thu, 19 Feb 1998 22:23:07 -0500
From: Curtis Villamizar <curtis@brookfield.ans.net>

The FreeBSD performance problem we had run into previously has now
been solved.  It may not be the best way for the general FreeBSD
audience, but it is completely solved for our purposes.

The executive summary is:

    - the kernel no longer resets the window size back to a small
      value for no apparent reason (see below)
    - we now can use just under a 1MB window (about the same as BSDI)
    - some kernel tuning (page buffer size, number of clusters) was
      done to make FDDI MTU work slightly faster
    - we get 20 Mb/s with 192 KB window and 70 msec RTT
    - we get 77 Mb/s with 896 KB window and 70 msec RTT (6.7 sec transfer)
    - we get 88 Mb/s with 896 KB window and 70 msec RTT (47 sec transfer)
    - we get 89 Mb/s with 896 KB window and 70 msec RTT (184 sec transfer)
    - these are slightly better than the BSDI figures (I think? Bill?)

The 2GB transfer in just over 3 minutes is getting quite close to FDDI
line rate.

The gory details are listed below.  I'll be sending separate bug
reports to the FreeBSD team on the tcp_mss issue and the inability to
change MCLSHIFT or increase NMBCLUSTERS to 2048.


All the kernel stuff is in /sys which is really a symbolic link to
/usr/src/sys.  Some of the key directories are netinet where all the
ip, udp, and tcp code is, kern where all the socket code is, vm where
the virtual memory code is, and sys where system header files are.

The main culprit was the function tcp_mss in tcp_input.c.  This
function is called when a TCP SYN or SYN ACK arrives.  Its purpose in
life is to adjust the initial MSS and when doing so also adjust the
buffer size if appropriate.  One of the new "features" of tcp_mss is
that it now looks up the route that would be used for the socket
return path and unconditionally resets the send and recv buffer sizes
if there is a sendspace or recvspace parameter on the route, even if
the buffer sizes had been set by setsockopt.  When I found this in the
code my first reaction was to not touch the source and just explicitly
set the sendspace or recvspace on the route to 10/8.  This effort was
foiled by the fact that tcp_mss seems to have picked up the wrong
route.  I then decided to get rid of the problem for good and just
change the code so it will only increase the buffer sizes according to
the route, but never decrease them.

The patch is attached below (file.diff).

Another change is to SB_MAX (which can also be changed with

The change to the page size makes a full MTU packet fit within a page
and allows the kernel code to do less copying.

options NMBCLUSTERS=1024

We could increase this to something over 1024.  At the POC lab it
would not take 2048.  This is sort of odd, since that would have only
been 4 MB dedicated to clusters on a 64 MB machine.  This could be a
magical power-of-two boundary for some other reason that I wasn't able
to locate in the source code.  I was never successful in increasing
the cluster size from 2048 to 8192 (increasing MCLSHIFT from 11 to 13).
Again, there are dependencies on the relative size of some things in
the kernel that aren't documented (and might be regarded as bugs).

Increasing NMBCLUSTERS to 2048 or more or increasing MCLSHIFT from 11
to 13 will have to be exercises for a later date.  These are tuning
beyond what we really need.

Fooling with these latter optimizations gave us unusable kernels in the
POC lab, so I didn't want to play with them unless I was within walking
distance of the reset button and had a console and keyboard.

*** tcp_input.c.orig	Thu Feb 19 21:56:49 1998
--- tcp_input.c	Thu Feb 19 21:56:14 1998
*** 2075,2080 ****
--- 2075,2082 ----
  	if ((bufsize = rt->rt_rmx.rmx_sendpipe) == 0)
  		bufsize = so->so_snd.sb_hiwat;
+ 	if (bufsize < so->so_snd.sb_hiwat)
+ 	  bufsize = so->so_snd.sb_hiwat;
  	if (bufsize < mss)
  		mss = bufsize;
  	else {
*** 2089,2094 ****
--- 2091,2098 ----
  	if ((bufsize = rt->rt_rmx.rmx_recvpipe) == 0)
  		bufsize = so->so_rcv.sb_hiwat;
+ 	if (bufsize < so->so_rcv.sb_hiwat)
+ 	  bufsize = so->so_rcv.sb_hiwat;
  	if (bufsize > mss) {
  		bufsize = roundup(bufsize, mss);
  		if (bufsize > sb_max)
Run ttcp or netperf over a long-delay path with a 128 KB window.  Source
for ttcp is freely available (source on request if you don't have it).
Comment 1 Poul-Henning Kamp freebsd_committer 2001-03-28 19:59:24 UTC
State Changed
From-To: open->closed

Jlemon will judge this one.