Matt, I'm CC'ing you because it looks like the bug is also in Dragonfly, so I think you'll want to commit something similar. The inflight window scaling algorithm keeps a decaying average of a socket's bandwidth, but on the first call to tcp_xmit_bandwidth_limit, the sequence number of the previous packet is not known, so the (ack_seq - tp->t_bw_rtseq) clause just gives you the raw sequence number, resulting in a random value for the calculated bandwidth. Until enough packets have traveled so the average has decayed to the correct value, the calculated window is large enough that it's not even used. On a dialup, for example, it never gets a chance. Fix: The minimal fix is to check for (tp->t_bw_rtttime == 0 || tp->t_bw_rtseq == 0 || (int)bw < 0), and if it evalutes true, store the current values and return. I recommend committing the whole patch though. It changed the debugging output to print the instantaneous bw, and the current and previous average bw. How-To-Repeat: Run net.inet.tcp.hostcache.list, and take a look at the BANDWIDTH column. Note that there is a secondary bug in this sysctl output that multiplies the bandwidth by 8, even though the original value is already in octets/sec.
Hmm. Well, looking at the original code it looks like t_bw_rtttime is properly initialized in tcp_newtcpcb(), so it will never be 0. t_bw_rtseq is initialized for outgoing connections, but not properly initialized for incoming connections. The original code was rather messy and really needs some reorganization, so I would recommend doing some reorganizing rather then just patching more mess on top. * Remove initialization of t_bw_rttime so it can be tested against 0 in the bandwidth limiting code. * Auto-initialize t_bw_rtttime in tcp_xmit_bandwidth_limit by testing it against 0. * Add an idle check. * Assign an initial bandwidth of tcp_inflight_min so high speed connections do not have to ramp-up the exponential average all the way from 0. The patch below is against DragonFly and will not apply to FreeBSD, but is included to demonstrate my recommendation. -Matt Matthew Dillon <dillon@backplane.com>
Updated patch including Matt's recommended fix: Index: tcp_hostcache.c =================================================================== RCS file: /home/ncvs/src/sys/netinet/tcp_hostcache.c,v retrieving revision 1.7 diff -u -p -r1.7 tcp_hostcache.c --- tcp_hostcache.c 16 Aug 2004 18:32:07 -0000 1.7 +++ tcp_hostcache.c 15 Dec 2004 17:42:02 -0000 @@ -175,7 +175,7 @@ SYSCTL_INT(_net_inet_tcp_hostcache, OID_ &tcp_hostcache.purgeall, 0, "Expire all entires on next purge run"); SYSCTL_PROC(_net_inet_tcp_hostcache, OID_AUTO, list, - CTLTYPE_STRING | CTLFLAG_RD | CTLFLAG_SKIP, 0, 0, + CTLTYPE_STRING | CTLFLAG_RD, 0, 0, sysctl_tcp_hc_list, "A", "List of all hostcache entries"); @@ -676,7 +676,7 @@ sysctl_tcp_hc_list(SYSCTL_HANDLER_ARGS) (RTM_RTTUNIT / (hz * TCP_RTT_SCALE))), msec(hc_entry->rmx_rttvar * (RTM_RTTUNIT / (hz * TCP_RTT_SCALE))), - hc_entry->rmx_bandwidth * 8, + hc_entry->rmx_bandwidth, hc_entry->rmx_cwnd, hc_entry->rmx_sendpipe, hc_entry->rmx_recvpipe, Index: tcp_subr.c =================================================================== RCS file: /home/ncvs/src/sys/netinet/tcp_subr.c,v retrieving revision 1.201.2.4 diff -u -p -r1.201.2.4 tcp_subr.c --- tcp_subr.c 1 Dec 2004 05:37:28 -0000 1.201.2.4 +++ tcp_subr.c 16 Dec 2004 06:38:48 -0000 @@ -631,7 +631,6 @@ tcp_newtcpcb(inp) tp->snd_bwnd = TCP_MAXWIN << TCP_MAX_WINSHIFT; tp->snd_ssthresh = TCP_MAXWIN << TCP_MAX_WINSHIFT; tp->t_rcvtime = ticks; - tp->t_bw_rtttime = ticks; /* * IPv4 TTL initialization is necessary for an IPv6 socket as well, * because the socket may be bound to an IPv6 wildcard address, @@ -1907,9 +1906,10 @@ tcp_twrespond(struct tcptw *tw, int flag void tcp_xmit_bandwidth_limit(struct tcpcb *tp, tcp_seq ack_seq) { - u_long bw; + u_long bw, avgbw; u_long bwnd; int save_ticks; + int delta_ticks; /* * If inflight_enable is disabled in the middle of a tcp connection, @@ -1922,29 +1922,42 @@ tcp_xmit_bandwidth_limit(struct tcpcb *t } /* + * Validate the delta time. If a connection is new or has been idle + * a long time we have to reset the bandwidth calculator. + */ + save_ticks = ticks; + delta_ticks = save_ticks - tp->t_bw_rtttime; + if (tp->t_bw_rtttime == 0 || delta_ticks < 0 || delta_ticks > hz * 10) { + tp->t_bw_rtttime = ticks; + tp->t_bw_rtseq = ack_seq; + if (tp->snd_bandwidth == 0) + tp->snd_bandwidth = tcp_inflight_min; + return; + } + if (delta_ticks == 0) + return; + + /* + * Sanity check, plus ignore pure window update acks. + */ + if ((int)(ack_seq - tp->t_bw_rtseq) <= 0) + return; + + /* * Figure out the bandwidth. Due to the tick granularity this * is a very rough number and it MUST be averaged over a fairly * long period of time. XXX we need to take into account a link * that is not using all available bandwidth, but for now our * slop will ramp us up if this case occurs and the bandwidth later * increases. - * - * Note: if ticks rollover 'bw' may wind up negative. We must - * effectively reset t_bw_rtttime for this case. */ - save_ticks = ticks; - if ((u_int)(save_ticks - tp->t_bw_rtttime) < 1) - return; - bw = (int64_t)(ack_seq - tp->t_bw_rtseq) * hz / - (save_ticks - tp->t_bw_rtttime); + bw = (int64_t)(ack_seq - tp->t_bw_rtseq) * hz / delta_ticks; + tp->t_bw_rtttime = save_ticks; tp->t_bw_rtseq = ack_seq; - if (tp->t_bw_rtttime == 0 || (int)bw < 0) - return; - bw = ((int64_t)tp->snd_bandwidth * 15 + bw) >> 4; - tp->snd_bandwidth = bw; + avgbw = ((int64_t)tp->snd_bandwidth * 15 + bw) >> 4; /* * Calculate the semi-static bandwidth delay product, plus two maximal @@ -1975,22 +1988,25 @@ tcp_xmit_bandwidth_limit(struct tcpcb *t * no other choice. */ #define USERTT ((tp->t_srtt + tp->t_rttbest) / 2) - bwnd = (int64_t)bw * USERTT / (hz << TCP_RTT_SHIFT) + tcp_inflight_stab * tp->t_maxseg / 10; + bwnd = (int64_t)avgbw * USERTT / (hz << TCP_RTT_SHIFT) + tcp_inflight_stab * tp->t_maxseg / 10; #undef USERTT if (tcp_inflight_debug > 0) { static int ltime; if ((u_int)(ticks - ltime) >= hz / tcp_inflight_debug) { ltime = ticks; - printf("%p bw %ld rttbest %d srtt %d bwnd %ld\n", + printf("%p bw %lu(%lu,%lu) rttbest %d srtt %d bwnd %ld\n", tp, - bw, + avgbw, tp->snd_bandwidth, bw, tp->t_rttbest, tp->t_srtt, bwnd ); } } + + tp->snd_bandwidth = avgbw; + if ((long)bwnd < tcp_inflight_min) bwnd = tcp_inflight_min; if (bwnd > tcp_inflight_max) Index: tcp_usrreq.c =================================================================== RCS file: /home/ncvs/src/sys/netinet/tcp_usrreq.c,v retrieving revision 1.107 diff -u -p -r1.107 tcp_usrreq.c --- tcp_usrreq.c 16 Aug 2004 18:32:07 -0000 1.107 +++ tcp_usrreq.c 16 Dec 2004 06:38:59 -0000 @@ -864,7 +864,6 @@ tcp_connect(tp, nam, td) tp->t_state = TCPS_SYN_SENT; callout_reset(tp->tt_keep, tcp_keepinit, tcp_timer_keep, tp); tp->iss = tcp_new_isn(tp); - tp->t_bw_rtseq = tp->iss; tcp_sendseqinit(tp); /* @@ -959,7 +958,6 @@ tcp6_connect(tp, nam, td) tp->t_state = TCPS_SYN_SENT; callout_reset(tp->tt_keep, tcp_keepinit, tcp_timer_keep, tp); tp->iss = tcp_new_isn(tp); - tp->t_bw_rtseq = tp->iss; tcp_sendseqinit(tp); /* -- Dan Nelson dnelson@allantgroup.com
Dan Nelson wrote: > Updated patch including Matt's recommended fix: > > [...] > diff -u -p -r1.201.2.4 tcp_subr.c > --- tcp_subr.c 1 Dec 2004 05:37:28 -0000 1.201.2.4 > +++ tcp_subr.c 16 Dec 2004 06:38:48 -0000 > [...] > @@ -1922,29 +1922,42 @@ tcp_xmit_bandwidth_limit(struct tcpcb *t > } > > /* > + * Validate the delta time. If a connection is new or has been idle > + * a long time we have to reset the bandwidth calculator. > + */ > + save_ticks = ticks; > + delta_ticks = save_ticks - tp->t_bw_rtttime; > + if (tp->t_bw_rtttime == 0 || delta_ticks < 0 || delta_ticks > hz * 10) { > + tp->t_bw_rtttime = ticks; > + tp->t_bw_rtseq = ack_seq; > + if (tp->snd_bandwidth == 0) > + tp->snd_bandwidth = tcp_inflight_min; > + return; > + } > + if (delta_ticks == 0) > + return; > + > + /* > + * Sanity check, plus ignore pure window update acks. > + */ > + if ((int)(ack_seq - tp->t_bw_rtseq) <= 0) > + return; I wonder, isn't there a flaw in the logic with regard to the sequence number handling? If the sequence number wraps around 't_bw_rtseq' no longer gets set and therefore the bandwidth calculation stops until 'ack_seq' either catches up with 't_bw_rtseq' again (which would take quite a while), or 'ticks' wraps around as well, or there is inactivity for more than 10 seconds. This is probably not the intended behavior. I think the correct approach would be to separate the handling of the two cases "sequence number wrap-around" and "window update acks". I suggest to define another variable 'delta_bytes' and initialize it like this: int delta_bytes; [...] delta_bytes = ack_seq - tp->t_bw_rtseq; Then compare it with zero where you compare 'delta_ticks' with zero, and check for less than zero (wrap-around) where you check 'delta_ticks' for less than zero. Combine the respective test results with '||'. Of course, you can then use 'delta_bytes' for the bandwidth calculation further down in the code, too. Uwe -- Uwe Doering | EscapeBox - Managed On-Demand UNIX Servers gemini@geminix.org | http://www.escapebox.net
In the last episode (Dec 18), Uwe Doering said: > Dan Nelson wrote: > >Updated patch including Matt's recommended fix: > > > >+ /* > >+ * Sanity check, plus ignore pure window update acks. > >+ */ > >+ if ((int)(ack_seq - tp->t_bw_rtseq) <= 0) > >+ return; > > I wonder, isn't there a flaw in the logic with regard to the sequence > number handling? If the sequence number wraps around 't_bw_rtseq' no > longer gets set and therefore the bandwidth calculation stops until > 'ack_seq' either catches up with 't_bw_rtseq' again (which would take > quite a while), or 'ticks' wraps around as well, or there is > inactivity for more than 10 seconds. This is probably not the > intended behavior. I think the code works as-is. ack_seq and tp->t_bw_rtseq are both of type "tcp_seq" which is a u_int32_t. Wrap-around is handled transparently when your variables are unsigned and your sequence space covers all possible values. It's the magic of mod(2^32) arithmetic :) The (int) cast just makes the if simpler. Without the cast it would read if (ack_seq - tp->t_bw_rtseq > 2147483648U || ack_seq == tp->t_bw_rtseq) The sanity check is probably not even necessary, as any really invalid sequence numbers would have caused the packet to be dropped before it got this far. -- Dan Nelson dnelson@allantgroup.com
Dan Nelson wrote: > In the last episode (Dec 18), Uwe Doering said: >>Dan Nelson wrote: >> >>>Updated patch including Matt's recommended fix: >>> >>>+ /* >>>+ * Sanity check, plus ignore pure window update acks. >>>+ */ >>>+ if ((int)(ack_seq - tp->t_bw_rtseq) <= 0) >>>+ return; >> >>I wonder, isn't there a flaw in the logic with regard to the sequence >>number handling? If the sequence number wraps around 't_bw_rtseq' no >>longer gets set and therefore the bandwidth calculation stops until >>'ack_seq' either catches up with 't_bw_rtseq' again (which would take >>quite a while), or 'ticks' wraps around as well, or there is >>inactivity for more than 10 seconds. This is probably not the >>intended behavior. > > I think the code works as-is. ack_seq and tp->t_bw_rtseq are both of > type "tcp_seq" which is a u_int32_t. Wrap-around is handled > transparently when your variables are unsigned and your sequence space > covers all possible values. It's the magic of mod(2^32) arithmetic :) > The (int) cast just makes the if simpler. Without the cast it would > read > > if (ack_seq - tp->t_bw_rtseq > 2147483648U || ack_seq == tp->t_bw_rtseq) > > The sanity check is probably not even necessary, as any really invalid > sequence numbers would have caused the packet to be dropped before it > got this far. You are correct. Unlike 'ticks', which is of type 'int', a sequence number wrap-around has no effect on the calculation of the byte count due to using 'u_int32_t'. Tricky. ;-) Therefore, I propose testing for equality only (window update acks). Checking for byte counts of more than 2147483648U and declaring that bogus is a little arbitrary. If we are concerned about invalid sequence numbers, they can result in a lower byte count as well. The way it is now (with the patch installed) it looks too much like a wrap-around check and is therefore misleading. So I suggest to either get rid of it or add a comment that explains the situation. Uwe -- Uwe Doering | EscapeBox - Managed On-Demand UNIX Servers gemini@geminix.org | http://www.escapebox.net
Uwe Doering wrote: > Dan Nelson wrote: >> In the last episode (Dec 18), Uwe Doering said: >>> Dan Nelson wrote: >>> >>>> Updated patch including Matt's recommended fix: >>>> >>>> + /* >>>> + * Sanity check, plus ignore pure window update acks. >>>> + */ >>>> + if ((int)(ack_seq - tp->t_bw_rtseq) <= 0) >>>> + return; >>> >>> I wonder, isn't there a flaw in the logic with regard to the sequence >>> number handling? If the sequence number wraps around 't_bw_rtseq' no >>> longer gets set and therefore the bandwidth calculation stops until >>> 'ack_seq' either catches up with 't_bw_rtseq' again (which would take >>> quite a while), or 'ticks' wraps around as well, or there is >>> inactivity for more than 10 seconds. This is probably not the >>> intended behavior. >> >> I think the code works as-is. ack_seq and tp->t_bw_rtseq are both of >> type "tcp_seq" which is a u_int32_t. Wrap-around is handled >> transparently when your variables are unsigned and your sequence space >> covers all possible values. It's the magic of mod(2^32) arithmetic :) >> The (int) cast just makes the if simpler. Without the cast it would >> read >> >> if (ack_seq - tp->t_bw_rtseq > 2147483648U || ack_seq == tp->t_bw_rtseq) >> >> The sanity check is probably not even necessary, as any really invalid >> sequence numbers would have caused the packet to be dropped before it >> got this far. > > You are correct. Unlike 'ticks', which is of type 'int', a sequence > number wrap-around has no effect on the calculation of the byte count > due to using 'u_int32_t'. Tricky. ;-) > > Therefore, I propose testing for equality only (window update acks). > Checking for byte counts of more than 2147483648U and declaring that > bogus is a little arbitrary. If we are concerned about invalid sequence > numbers, they can result in a lower byte count as well. The way it is > now (with the patch installed) it looks too much like a wrap-around > check and is therefore misleading. So I suggest to either get rid of it > or add a comment that explains the situation. On second thought, checking for less than zero (by means of the 'int' cast) might have its merits. This may be a protection against out-of-order ACKs, which could in fact be valid, but calculating a byte count from an ACK for an earlier packet when we've already processed a later ACK would be bogus. Hard to tell what Matt's intentions were. In the end the patch appears to be okay as it is. Nevertheless, my suggestion to comment this sufficiently still stands. Uwe -- Uwe Doering | EscapeBox - Managed On-Demand UNIX Servers gemini@geminix.org | http://www.escapebox.net
In the last episode (Dec 21), Uwe Doering said: > Uwe Doering wrote: > >Dan Nelson wrote: > >>In the last episode (Dec 18), Uwe Doering said: > >>>Dan Nelson wrote: > >>> > >>>>Updated patch including Matt's recommended fix: > >>>> > >>>>+ /* > >>>>+ * Sanity check, plus ignore pure window update acks. > >>>>+ */ > >>>>+ if ((int)(ack_seq - tp->t_bw_rtseq) <= 0) > >>>>+ return; > > On second thought, checking for less than zero (by means of the 'int' > cast) might have its merits. This may be a protection against > out-of-order ACKs, which could in fact be valid, but calculating a byte > count from an ACK for an earlier packet when we've already processed a > later ACK would be bogus. I am pretty sure only useful acks get to this point. I've changed that bit of code to just be + /* + * Ignore pure window update acks. + */ + if (ack_seq == tp->t_bw_rtseq) + return; and added another check for negative sequences that increments a sysctl counter. So far netstat -s has counted 567 out-of-order packets but my counter is still at 0. -- Dan Nelson dnelson@allantgroup.com
In the last episode (Dec 21), Uwe Doering said: > Uwe Doering wrote: > >Dan Nelson wrote: > >>In the last episode (Dec 18), Uwe Doering said: > >>>Dan Nelson wrote: > >>> > >>>>Updated patch including Matt's recommended fix: > >>>> > >>>>+ /* > >>>>+ * Sanity check, plus ignore pure window update acks. > >>>>+ */ > >>>>+ if ((int)(ack_seq - tp->t_bw_rtseq) <= 0) > >>>>+ return; > > On second thought, checking for less than zero (by means of the 'int' > cast) might have its merits. This may be a protection against > out-of-order ACKs, which could in fact be valid, but calculating a byte > count from an ACK for an earlier packet when we've already processed a > later ACK would be bogus. I am pretty sure only useful acks get to this point. I've changed that bit of code to just be + /* + * Ignore pure window update acks. + */ + if (ack_seq == tp->t_bw_rtseq) + return; and added another check for negative sequences that increments a sysctl counter. So far netstat -s has counted 567 out-of-order packets but my counter is still at 0. -- Dan Nelson dnelson@allantgroup.com _______________________________________________ freebsd-bugs@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-bugs To unsubscribe, send any mail to "freebsd-bugs-unsubscribe@freebsd.org"
In the last episode (Dec 22), Dan Nelson said: > I am pretty sure only useful acks get to this point. I've changed that > bit of code to just be > > + /* > + * Ignore pure window update acks. > + */ > + if (ack_seq == tp->t_bw_rtseq) > + return; > > and added another check for negative sequences that increments a sysctl > counter. So far netstat -s has counted 567 out-of-order packets but my > counter is still at 0. And after 40 days of uptime on my mail/squid servers, the counter is still at zero, so the code works perfectly for me. It'd be nice if this was committed before 5.4. Final patch follows: Index: tcp_hostcache.c =================================================================== RCS file: /home/ncvs/src/sys/netinet/tcp_hostcache.c,v retrieving revision 1.7.2.1 diff -u -p -r1.7.2.1 tcp_hostcache.c --- tcp_hostcache.c 31 Jan 2005 23:26:36 -0000 1.7.2.1 +++ tcp_hostcache.c 8 Feb 2005 07:09:31 -0000 @@ -175,7 +175,7 @@ SYSCTL_INT(_net_inet_tcp_hostcache, OID_ &tcp_hostcache.purgeall, 0, "Expire all entires on next purge run"); SYSCTL_PROC(_net_inet_tcp_hostcache, OID_AUTO, list, - CTLTYPE_STRING | CTLFLAG_RD | CTLFLAG_SKIP, 0, 0, + CTLTYPE_STRING | CTLFLAG_RD, 0, 0, sysctl_tcp_hc_list, "A", "List of all hostcache entries"); @@ -676,7 +676,7 @@ sysctl_tcp_hc_list(SYSCTL_HANDLER_ARGS) (RTM_RTTUNIT / (hz * TCP_RTT_SCALE))), msec(hc_entry->rmx_rttvar * (RTM_RTTUNIT / (hz * TCP_RTT_SCALE))), - hc_entry->rmx_bandwidth * 8, + hc_entry->rmx_bandwidth, hc_entry->rmx_cwnd, hc_entry->rmx_sendpipe, hc_entry->rmx_recvpipe, Index: tcp_subr.c =================================================================== RCS file: /home/ncvs/src/sys/netinet/tcp_subr.c,v retrieving revision 1.201.2.12 diff -u -p -r1.201.2.12 tcp_subr.c --- tcp_subr.c 31 Jan 2005 23:26:36 -0000 1.201.2.12 +++ tcp_subr.c 8 Feb 2005 07:09:33 -0000 @@ -628,7 +632,6 @@ tcp_newtcpcb(inp) tp->snd_bwnd = TCP_MAXWIN << TCP_MAX_WINSHIFT; tp->snd_ssthresh = TCP_MAXWIN << TCP_MAX_WINSHIFT; tp->t_rcvtime = ticks; - tp->t_bw_rtttime = ticks; /* * IPv4 TTL initialization is necessary for an IPv6 socket as well, * because the socket may be bound to an IPv6 wildcard address, @@ -1934,9 +1937,10 @@ tcp_twrespond(struct tcptw *tw, int flag void tcp_xmit_bandwidth_limit(struct tcpcb *tp, tcp_seq ack_seq) { - u_long bw; + u_long bw, avgbw; u_long bwnd; int save_ticks; + int delta_ticks; INP_LOCK_ASSERT(tp->t_inpcb); @@ -1951,29 +1955,50 @@ tcp_xmit_bandwidth_limit(struct tcpcb *t } /* + * Validate the delta time. If a connection is new or has been idle + * a long time we have to reset the bandwidth calculator. + */ + save_ticks = ticks; + delta_ticks = save_ticks - tp->t_bw_rtttime; + if (tp->t_bw_rtttime == 0 || delta_ticks < 0 || delta_ticks > hz * 10) { + tp->t_bw_rtttime = ticks; + tp->t_bw_rtseq = ack_seq; + if (tp->snd_bandwidth == 0) + tp->snd_bandwidth = tcp_inflight_min; + return; + } + if (delta_ticks == 0) + return; + + /* + * Ignore pure window update acks. + */ + if (ack_seq == tp->t_bw_rtseq) + return; + + /* Sanity check */ + if (SEQ_LT(ack_seq, tp->t_bw_rtseq)) { + if (tcp_inflight_debug > 0) { + printf ("%p invalid sequence: %u vs %u", + tp, ack_seq, tp->t_bw_rtseq); + } + } + + /* * Figure out the bandwidth. Due to the tick granularity this * is a very rough number and it MUST be averaged over a fairly * long period of time. XXX we need to take into account a link * that is not using all available bandwidth, but for now our * slop will ramp us up if this case occurs and the bandwidth later * increases. - * - * Note: if ticks rollover 'bw' may wind up negative. We must - * effectively reset t_bw_rtttime for this case. */ - save_ticks = ticks; - if ((u_int)(save_ticks - tp->t_bw_rtttime) < 1) - return; - bw = (int64_t)(ack_seq - tp->t_bw_rtseq) * hz / - (save_ticks - tp->t_bw_rtttime); + bw = (int64_t)(ack_seq - tp->t_bw_rtseq) * hz / delta_ticks; + tp->t_bw_rtttime = save_ticks; tp->t_bw_rtseq = ack_seq; - if (tp->t_bw_rtttime == 0 || (int)bw < 0) - return; - bw = ((int64_t)tp->snd_bandwidth * 15 + bw) >> 4; - tp->snd_bandwidth = bw; + avgbw = ((int64_t)tp->snd_bandwidth * 15 + bw) >> 4; /* * Calculate the semi-static bandwidth delay product, plus two maximal @@ -2004,22 +2030,25 @@ tcp_xmit_bandwidth_limit(struct tcpcb *t * no other choice. */ #define USERTT ((tp->t_srtt + tp->t_rttbest) / 2) - bwnd = (int64_t)bw * USERTT / (hz << TCP_RTT_SHIFT) + tcp_inflight_stab * tp->t_maxseg / 10; + bwnd = (int64_t)avgbw * USERTT / (hz << TCP_RTT_SHIFT) + tcp_inflight_stab * tp->t_maxseg / 10; #undef USERTT if (tcp_inflight_debug > 0) { static int ltime; if ((u_int)(ticks - ltime) >= hz / tcp_inflight_debug) { ltime = ticks; - printf("%p bw %ld rttbest %d srtt %d bwnd %ld\n", + printf("%p bw %lu(%lu,%lu) rttbest %d srtt %d bwnd %ld\n", tp, - bw, + avgbw, tp->snd_bandwidth, bw, tp->t_rttbest, tp->t_srtt, bwnd ); } } + + tp->snd_bandwidth = avgbw; + if ((long)bwnd < tcp_inflight_min) bwnd = tcp_inflight_min; if (bwnd > tcp_inflight_max) Index: tcp_usrreq.c =================================================================== RCS file: /home/ncvs/src/sys/netinet/tcp_usrreq.c,v retrieving revision 1.107.2.1 diff -u -p -r1.107.2.1 tcp_usrreq.c --- tcp_usrreq.c 31 Jan 2005 23:26:37 -0000 1.107.2.1 +++ tcp_usrreq.c 8 Feb 2005 07:09:34 -0000 @@ -864,7 +864,6 @@ tcp_connect(tp, nam, td) tp->t_state = TCPS_SYN_SENT; callout_reset(tp->tt_keep, tcp_keepinit, tcp_timer_keep, tp); tp->iss = tcp_new_isn(tp); - tp->t_bw_rtseq = tp->iss; tcp_sendseqinit(tp); /* @@ -959,7 +958,6 @@ tcp6_connect(tp, nam, td) tp->t_state = TCPS_SYN_SENT; callout_reset(tp->tt_keep, tcp_keepinit, tcp_timer_keep, tp); tp->iss = tcp_new_isn(tp); - tp->t_bw_rtseq = tp->iss; tcp_sendseqinit(tp); /* -- Dan Nelson dnelson@allantgroup.com
Responsible Changed From-To: freebsd-bugs->silby Assign ownership to silby since he's been spending quality time with TCP lately.
Responsible Changed From-To: silby->andre take over.
I wonder if this could be related to why my 6.2-STABLE machine reports completely unreasonable numbers for BANDWIDTH, which often exceed the speed of the uplink on my machine, sometimes by an order of magnitude. It looks like the patch still will apply cleanly to 6.2. Once I can get a console on my box, I'll give it a spin.
Responsible Changed From-To: andre->silby silby is currently the most active... silby - if you don't want to deal with it (or others) assign it to me rather than let it languish
Reassign to the wild with permission of assignee. To submitter: is this report still valid?
It looks like the inflight bandwidth limiter and all the code calculating the bandwidth was removed back in 2010 ( base r212765 ), so there's no need for this bugreport any more. I've closed it.
As far as I can tell, a lot (almost all) of the code around the discussion has been change/deleted/reworked. Only weird thing I noticed was that "BANDWIDTH" column is pretty useless now. I think when r212765 removed snd_bwnd and snd_bandwidth, hc_entry->rmx_bandwidth should also have been removed but I guess it was kept for backwards compat was set to 0. Which is still true. I've tried a couple head and couple stable-10 boxes and bandwidth reported in sysctl net.inet.tcp.hostcache.list is always 0. I am wondering if it's time for us to just delete that BANDWIDTH column.
(In reply to Dan Nelson from comment #15) Grr. totally didn't see this comment before my comment #16. Would have saved some archeology that I had to do to dig out the rev number :-) Anyways. should we now just remove the BANDWIDTH column?
I doubt anyone is trying to parse the output of net.inet.tcp.hostcache.list, so it could be removed. At some point the spare fields in struct tcpcb could be removed, too. There are still a lot of padding bytes left at the end of the struct, though, so there's no real hurry to reclaim space which would potentially break the ABI for modules using struct tcpcb.
There's at least one "well known" script out there that does, and was used to optimize certain kernel parameters based on average RTT, etc values that came from hostcachelist. It doesn't appear to use the (unreliable?) BANDWIDTH column though: http://spatula.net/blog/2007/04/freebsd-network-performance-tuning.html
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204528 to take care of it.