| Summary: | NATD appears to memory leak when a connection fails from the internal network to the external network. |
|---|---|
| Product: | Base System |
| Component: | bin |
| Reporter: | brian <brian> |
| Assignee: | freebsd-bugs (Nobody) <bugs> |
| Status: | Closed FIXED |
| Severity: | Affects Only Me |
| Priority: | Normal |
| Version: | 3.4-STABLE |
| Hardware: | Any |
| OS: | Any |

Description
brian
2000-04-13 03:20:01 UTC
On Wed, Apr 12, 2000 at 07:18:39PM -0700, brian@pocketscience.com wrote:
>
> In production, we are making several connection attempts to do AOL
> polling. Some are getting a failure to connect (actually, a
> significant number are). Since we have noticed this behavior (a bug
> on our end), we have also noticed that natd memory leaks, actually
> pretty significantly.
>
> We're pulling ~50k connections/hour. It takes ~16 hours for the
> daemon to leak enough that the network dies on the machine, until
> you restart natd.
>
Are these TCP connections? (I will assume that they are below).
Are these connections to the same remote machine/port?
Are these connections from the same local machine/port?

> >How-To-Repeat:
> Set up natd.
>
> from an internal machine, make several network connections that get
> dropped on the remote end (not denied, but connection timeouts)
>
It is unclear what you mean. Do these connections get established,
and then single-dropped by the remote end, or not established at all?
In the first case, turning on and tuning a system-wide TCP keepalive
on the client side might help. Do you have it enabled? What are the
values of net.inet.tcp.*keep* MIB variables?

Did you try running natd(8) with -log option, and monitoring the
memory usage by `tail -f /var/log/alias.log'?

--
Ruslan Ermilov		Sysadmin and DBA of the
ru@ucb.crimea.ua	United Commercial Bank,
ru@FreeBSD.org		FreeBSD committer,
+380.652.247.647	Simferopol, Ukraine

http://www.FreeBSD.org	The Power To Serve
http://www.oracle.com	Enabling The Information Age

Ruslan Ermilov wrote:
>
> [...]
> Are these TCP connections? (I will assume that they are below).
> Are these connections to the same remote machine/port?
> Are these connections from the same local machine/port?

Yes, I am sorry, they are TCP connections.
They are all connecting to americaonline.aol.com (this is DNS
load-balanced) port 5190 (aol in /etc/services).
The local port changes, since it's ~100 processes each on 7 internal
machines.

> [...]
> It is unclear what you mean. Do these connections get established,
> and then single-dropped by the remote end, or not established at all?
> In the first case, turning on and tuning a system-wide TCP keepalive
> on the client side might help. Do you have it enabled? What are the
> values of net.inet.tcp.*keep* MIB variables?

They're never established. They fail to successfully connect. A tcpdump
shows a lot of SYNs and very few FINs. The client machines are
Solaris, so I am not sure how to do any TCP tuning.

> Did you try running natd(8) with -log option, and monitoring the
> memory usage by `tail -f /var/log/alias.log'?

I will see if I can do this.

On another note, I added an ipfw rule to keep state for all these
connections, and now it's not leaking the way it was before.
(ipfw add 50 allow tcp from any to any 5190 keep-state)

Please note, I am not a TCP hacker, and I am learning these things as I
go along. I totally appreciate your help here, friend. Thank you so
much.

On Thu, Apr 13, 2000 at 01:34:15PM -0700, Brian Nelson wrote:
> [...]
> They're never established. They fail to successfully connect. A tcpdump
> shows a lot of SYNs and very few FINs. The client machines are
> Solaris, so I am not sure how to do any TCP tuning.
>
Probably, I have a solution for you, but I need to know some details.

Who (in normal circumstances) closes the connection (sends FIN)?
Client or server?

Also, I would like to take a look at a tcpdump(1) log of one of these
failing connections (without your keep-state rule for ipfw(8)).
The failing connection should be: client sends SYN and never gets
either RST or SYN-ACK back from the server.

Cheers,
--
Ruslan Ermilov

On Wed, Apr 12, 2000 at 07:18:39PM -0700, brian@pocketscience.com wrote:
>
[...]
> from an internal machine, make several network connections that get
> dropped on the remote end (not denied, but connection timeouts)
>
Please try the following patch. It is for RELENG_3 (latest) sources.
Extract patch to the current directory, then follow instructions:

# mv ./p /tmp
# cd /usr/src/lib/libalias
# patch </tmp/p
# make clean all install	# build/install new library
# cd /usr/src/sbin/natd
# make clean all install	# build/install natd with new library

BACKGROUND

The problem was that the TCP link's timeout was set to TCP_EXPIRE_CONNECTED
(86400 secs) right after the first SYN from the client (or from the server
for incoming connections). With this change, this huge timeout value will
only be applied to ESTABLISHED connections, i.e. only after SYN was seen
from both client and server side. TCP links corresponding to failed TCP
connections (those which never receive either SYN-ACK or RST from the
server) will be dropped after the TCP_EXPIRE_INITIAL (300 seconds) timeout.

Cheers,
--
Ruslan Ermilov

This seems to have worked! It has been running for hours, and we're still at
~600k.
Thanks a lot for your help! Is this going into -current or -stable any
time soon?

State Changed
From-To: open->closed

Fixed in 5.0-CURRENT, 4.0-STABLE and 3.4-STABLE, file
src/lib/libalias/alias_db.c, revisions 1.26, 1.21.2.2 and 1.10.2.6
respectively.
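
For readers who want the BACKGROUND note from the thread in concrete form, here is a minimal C sketch of the timeout choice before and after the change. It is not the code from src/lib/libalias/alias_db.c: only the two constants and their values come from the thread, and the struct, the function names and the steady-state estimate are hypothetical. The estimate also assumes that every one of the ~50k attempts per hour fails, which the report does not say, so treat its result as an upper bound.

```c
/* Illustrative sketch only, NOT the real libalias source.
 * Constants and values are quoted from the thread; all other
 * names here are hypothetical. */
#include <assert.h>
#include <stdbool.h>

#define TCP_EXPIRE_INITIAL     300   /* seconds, as quoted in the thread */
#define TCP_EXPIRE_CONNECTED 86400   /* seconds, as quoted in the thread */

/* Hypothetical per-link record: which sides have sent a SYN so far. */
struct tcp_link {
    bool syn_from_client;
    bool syn_from_server;
    int  expire_secs;
};

/* Behaviour described as the bug: the first SYN from either side
 * already selects the day-long timeout. */
void note_syn_before_fix(struct tcp_link *l, bool from_client)
{
    if (from_client)
        l->syn_from_client = true;
    else
        l->syn_from_server = true;
    l->expire_secs = TCP_EXPIRE_CONNECTED;
}

/* Behaviour described as the fix: the long timeout applies only once
 * a SYN has been seen from both sides; until then the link keeps the
 * short initial timeout. */
void note_syn_after_fix(struct tcp_link *l, bool from_client)
{
    if (from_client)
        l->syn_from_client = true;
    else
        l->syn_from_server = true;
    l->expire_secs = (l->syn_from_client && l->syn_from_server)
        ? TCP_EXPIRE_CONNECTED
        : TCP_EXPIRE_INITIAL;
}

/* Rough steady-state number of never-answered links held at once:
 * arrival rate multiplied by lifetime (integer division, rounds down). */
long long steady_state_links(long long failed_per_hour, long long expire_secs)
{
    assert(failed_per_hour >= 0 && expire_secs >= 0);
    return failed_per_hour * expire_secs / 3600;
}
```

Under the all-attempts-fail assumption, steady_state_links(50000, 86400) gives 1200000 entries, while steady_state_links(50000, 300) gives 4166. That gap is in line with the reported symptom of natd growing until it was restarted, and with the process staying small once the patch was applied.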