Bug 27890

Summary: FreeBSD does not always seem to take the best route
Product: Base System
Reporter: Andre Albsmeier <Andre.Albsmeier>
Component: kern
Assignee: freebsd-bugs (Nobody) <bugs>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: 4.3-STABLE   
Hardware: Any   
OS: Any   

Description Andre Albsmeier 2001-06-05 17:40:00 UTC
I have observed this behaviour for a long time now but finally had
time to dig into it... I reference syslogd as an example here
but I think the problem lies in the network code of the kernel...


Simple network:
 - two routers (1 and 2)
 - host C with IP 192.168.1.3
 - host S with IP 192.168.2.1

All machines are FreeBSD 4.3-STABLE.

Router 1 routes pkts between the Internet and 192.168.1.0 
Router 2 routes pkts between 192.168.1.0 and 192.168.2.0


           +-----+                 +-----+
  default  |     |   192.168.1.0   |     |   192.168.2.0
-----------|  1  |--------+--------|  2  |--------+-------- more hosts
           |     |        |        |     |        |
           +-----+        |        +-----+        |
                          |                       |
                       +-----+                 +-----+
                       |     |                 |     |
           192.168.1.3 |  C  |                 |  S  | 192.168.2.1
                       |     |                 |     |
                       +-----+                 +-----+


Relevant parts of netstat -rn on C during normal operation:
-------------------------------------------------------------
Destination        Gateway            Flags     Netif Expire
default            192.168.1.1        UGSc      fxp0
127.0.0.1          127.0.0.1          UH        lo0
192.168.1          link#1             UC        fxp0 =>
192.168.1.1        0:e0:18:90:91:bb   UHLW      fxp0   1182
192.168.1.2        0:e0:18:90:94:c8   UHLW      fxp0   1058
192.168.1.3        0:e0:18:90:45:dc   UHLW      lo0
192.168.1.255      ff:ff:ff:ff:ff:ff  UHLWb     fxp0
192.168.2          192.168.1.2        UGc       fxp0


The syslogd on host C is configured to log messages
to syslogd running on host S. This works perfectly,
all messages appear on host S.

Now we delete the route to net 192.168.2.0 on host C (this
can happen automatically if router 2 and/or its routed goes
down for a while). If syslogd now wants to send a message
to S, the kernel uses the default route, which is expected
because the route to net 192.168.2.0 is gone. We can see the
packets go to router 1. I consider this correct behaviour
as well.
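
A minimal sketch of the longest-prefix-match rule that explains the
fallback described above. It is a toy model in Python, not kernel code.
The /24 masks are an assumption (netstat's classful "192.168.2" notation
does not print them), and `lookup` is a hypothetical helper name.

```python
import ipaddress


def lookup(table, dst):
    """Return the gateway of the most specific route that contains dst."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, gateway in table:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, gateway)
    if best is None:
        raise LookupError("no route to " + dst)
    return best[1]


# Routes taken from host C's table; the /24 masks are assumed.
table = [
    ("0.0.0.0/0", "192.168.1.1"),       # default, via router 1
    ("192.168.1.0/24", "link#1"),       # directly attached network
    ("192.168.2.0/24", "192.168.1.2"),  # via router 2
]

# With the network route present, S is reached through router 2.
with_net_route = lookup(table, "192.168.2.1")

# With the network route deleted, only the default route matches.
reduced = [r for r in table if r[0] != "192.168.2.0/24"]
without_net_route = lookup(reduced, "192.168.2.1")

print(with_net_route, without_net_route)
```

A fresh lookup always picks router 2 while the 192.168.2 route exists,
so the behaviour reported here has to come from somewhere other than the
lookup rule itself.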

Now we bring back the route to net 192.168.2.0 again on host
C exactly as it was before (e.g. by restarting router 2 and/or
its routed). We can verify this with netstat -rn on C. We can
also ping host S or telnet to it or do other stuff which all
work perfectly.

The problem is that each time syslogd on C wants to send
a packet to S, the kernel still uses 1 as router even though
it should send them through 2.  After HUPing or restarting
syslogd on C (which means that the UDP socket is closed and
opened again) things are back to normal.

It seems that as long as packets can be sent somewhere, the
kernel doesn't bother to check whether there is a better route
to the destination until the socket is closed and opened again.

Fix: 

Unknown. I am happy to test suggestions, of course.
How-To-Repeat: 
See above.
Comment 1 dwmalone 2001-06-05 20:05:57 UTC
On Tue, Jun 05, 2001 at 06:30:14PM +0200, Andre Albsmeier wrote:
> The problem is that each time when syslogd on C wants to send
> a packet to S, the kernel still uses 1 as router even though
> it should send them through 2.  After HUPing or restarting
> syslogd on C (which means that the UDP socket is closed and
> opened again) things are back to normal.

This sounds like it is to do with the caching of recently used
routes. Does the effect go away if you leave it for a while?
Adjusting some of the following sysctls might change how long
you have to wait:

net.inet.ip.rtexpire
net.inet.ip.rtminexpire
net.inet.ip.rtmaxcache

(Some of the networking people could definitely provide more
details.)

	David.
Comment 2 ru freebsd_committer freebsd_triage 2001-06-06 09:24:19 UTC
On Tue, Jun 05, 2001 at 06:30:14PM +0200, Andre Albsmeier wrote:
> [...]

I can't reproduce this problem on my 4.3-STABLE box.

Yes, the UDP socket initially holds a reference to the protocol-cloned
route to the destination host S through router 1, and UDP packets
go through that router.

In my tests, router 1 (192.168.1.1) was a host *not* configured
to act as a router, so all "foreign" packets sent to it got
silently ignored.  I used the ports/net/netcat utility to connect
to the UDP `echo' port of the destination S (192.168.2.1):

Fig.1: Initial state, before UDP socket is open.

: # netstat -arn
: Destination        Gateway            Flags     Refs     Use     Netif Expire
: default            192.168.1.1        UGSc        0        2      rl0
: 127.0.0.1          127.0.0.1          UH          1        6      lo0
: 192.168.1          link#1             UC          3        0      rl0 =>


Fig.2: We connect(2) UDP socket to the "echo" port on S (192.168.2.1).

: # nc -u 192.168.2.1 echo
: ping1
: ping2
: ping3
[...]

As you can see, we receive no echoes back.
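
For readers unfamiliar with connect(2) on a datagram socket, here is a
minimal Python sketch of what `nc -u` does in Fig.2. It assumes a
stand-in echo peer on the loopback address instead of host S, so unlike
the test above it does get its echo back.

```python
import socket

# Stand-in for the "echo" service on S, bound to an ephemeral loopback port.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))
server.settimeout(2.0)
server_addr = server.getsockname()

# connect() on a UDP socket sends nothing; it only fixes the peer address,
# so later send() calls need no destination argument.
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.settimeout(2.0)
client.connect(server_addr)
client.send(b"ping1")

# The stand-in peer echoes the datagram back to its sender.
data, peer = server.recvfrom(1024)
server.sendto(data, peer)

reply = client.recv(1024)
print(reply)

client.close()
server.close()
```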


Fig.3: Routing table after UDP socket is open.

: # netstat -arn
: Destination        Gateway            Flags     Refs     Use     Netif Expire
: default            192.168.1.1        UGSc        1        2      rl0
: 127.0.0.1          127.0.0.1          UH          1        6      lo0
: 192.168.1          link#1             UC          4        0      rl0 =>
: 192.168.2.1        192.168.1.1        UGHW        1       14      rl0

The route to S (192.168.2.1) was cloned (W) from the `default' route.
refcnt=1 on the 192.168.2.1 route indicates that the UDP socket holds
a reference to this route.

Fig.4: I manually add the route to the 192.168.2 network.

: # route add -net 192.168.2   192.168.1.2 
: add net 192.168.2: gateway 192.168.1.2 

Fig.5: Routing table after the route to the 192.168.2 network was added.

: # netstat -arn
: Destination        Gateway            Flags     Refs     Use     Netif Expire
: default            192.168.1.1        UGSc        1        2      rl0
: 127.0.0.1          127.0.0.1          UH          1        6      lo0
: 192.168.1          link#1             UC          4        0      rl0 =>
: 192.168.2          192.168.1.2        UGSc        0        0      rl0

As you can see, the route to the 192.168.2.1 host is deleted from the routing
table.  It actually doesn't get freed completely, as it has a non-zero reference
count (the UDP socket still holds on to it), but instead it gets marked as DOWN,
and will be freed and reallocated in ip_output() on the next use.
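
The deferred free described above can be modelled roughly as follows.
This is a toy Python sketch, not the kernel's data structures: the
`Route`, `RoutingTable` and `release` names are hypothetical, and
`freed` merely stands in for the memory being returned.

```python
class Route:
    def __init__(self, dest, gateway):
        self.dest = dest
        self.gateway = gateway
        self.up = True       # models the UP flag
        self.refcnt = 0      # number of holders, e.g. a socket's cached route
        self.freed = False   # stands in for the entry's memory being released


class RoutingTable:
    def __init__(self):
        self.routes = {}

    def add(self, dest, gateway):
        rt = Route(dest, gateway)
        self.routes[dest] = rt
        return rt

    def hold(self, dest):
        """Hand out a counted reference to an entry."""
        rt = self.routes[dest]
        rt.refcnt += 1
        return rt

    def delete(self, dest):
        """Unlink the entry; free it only if nobody holds a reference."""
        rt = self.routes.pop(dest)
        rt.up = False
        if rt.refcnt == 0:
            rt.freed = True
        return rt


def release(rt):
    """Drop one reference; the last holder of a DOWN entry frees it."""
    rt.refcnt -= 1
    if rt.refcnt == 0 and not rt.up:
        rt.freed = True


table = RoutingTable()
table.add("192.168.2.1", "192.168.1.1")   # host route cloned from default
cached = table.hold("192.168.2.1")        # the UDP socket's reference

table.delete("192.168.2.1")
state_after_delete = (cached.up, cached.freed, "192.168.2.1" in table.routes)

release(cached)                           # what the next send would do
state_after_release = (cached.up, cached.freed)
print(state_after_delete, state_after_release)
```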

Fig.6: We continue to send UDP datagrams.

: # nc -u 192.168.2.1 echo (continued)
: ping4
: ping4
: ping5
: ping5
: ping6
: ping6

As you can see, this time we get the echoes back.

Fig.7: Routing table after we sent more UDP datagrams.

: # netstat -arn -finet
: Destination        Gateway            Flags     Refs     Use     Netif Expire
: default            192.168.1.1        UGSc        0        2      rl0
: 127.0.0.1          127.0.0.1          UH          1        6      lo0
: 192.168.1          link#1             UC          4        0      rl0 =>
: 192.168.2          192.168.1.2        UGSc        1        3      rl0

The refcount on the 192.168.2 route has grown to 1, indicating that the
UDP socket now holds on to this route.  The `Use' count of 3 corresponds
to our three UDP datagrams (ping4, ping5, and ping6).

Could you please repeat these steps in your environment and try to
find where it behaves differently in your case?


Cheers,
-- 
Ruslan Ermilov		Oracle Developer/DBA,
ru@sunbay.com		Sunbay Software AG,
ru@FreeBSD.org		FreeBSD committer,
+380.652.512.251	Simferopol, Ukraine

http://www.FreeBSD.org	The Power To Serve
http://www.oracle.com	Enabling The Information Age
Comment 3 Andre Albsmeier 2001-06-06 11:29:04 UTC
Thanks for helping...

On Wed, 06-Jun-2001 at 11:24:19 +0300, Ruslan Ermilov wrote:
>
> ...
> 
> I can't reproduce this problem on my 4.3-STABLE box.
> 
> Yes, the UDP socket has the reference to the protocol-cloned
> route to the destination host S through the router 1 initially,
> and UDP packets go through that router.
> 
> In my tests, router 1 (192.168.1.1) was the host *not* configured
> to act as the router, so all "foreign" packets sent to it got

OK, I have blocked packets coming from C on router 1. So
I think I got the same config as you.


> silently ignored.  I used the ports/net/netcat utility to connect
> to the UDP `echo' port of the destination S (192.168.2.1):
> 
> Fig.1: Initial state, before UDP socket is open.
> 
> : # netstat -arn
> : Destination        Gateway            Flags     Refs     Use     Netif Expire
> : default            192.168.1.1        UGSc        0        2      rl0
> : 127.0.0.1          127.0.0.1          UH          1        6      lo0
> : 192.168.1          link#1             UC          3        0      rl0 =>
> 
> 
> Fig.2: We connect(2) UDP socket to the "echo" port on S (192.168.2.1).
> 
> : # nc -u 192.168.2.1 echo
> : ping1
> : ping2
> : ping3
> [...]
> 
> As you can see, we receive no echos back.

OK, same here.


> Fig.3: Routing table after UDP socket is open.
> 
> : # netstat -arn
> : Destination        Gateway            Flags     Refs     Use     Netif Expire
> : default            192.168.1.1        UGSc        1        2      rl0
> : 127.0.0.1          127.0.0.1          UH          1        6      lo0
> : 192.168.1          link#1             UC          4        0      rl0 =>
> : 192.168.2.1        192.168.1.1        UGHW        1       14      rl0
> 
> The route to S (192.168.2.1) was cloned (W) from the `default' route.
> refcnt=1 on the 192.168.2.1 route indicates that the UDP socket holds
> a reference to this route.

Same here:

192.168.2.1       192.168.1.1        UGHW        1      425     fxp0


> Fig.4: I manually add the route to the 192.168.2 network.
> 
> : # route add -net 192.168.2   192.168.1.2 
> : add net 192.168.2: gateway 192.168.1.2 

OK, I don't add it manually but wait until routed messages from
192.168.1.2 bring it back.


> 
> Fig.5: Routing table after the route to the 192.168.2 network was added.
> 
> : # netstat -arn
> : Destination        Gateway            Flags     Refs     Use     Netif Expire
> : default            192.168.1.1        UGSc        1        2      rl0
> : 127.0.0.1          127.0.0.1          UH          1        6      lo0
> : 192.168.1          link#1             UC          4        0      rl0 =>
> : 192.168.2          192.168.1.2        UGSc        0        0      rl0

Yup, same here


> As you can see, the route to the 192.168.2.1 host is deleted from the routing
> table.  It actually doesn't get freed completely, as it had non-zero reference
> count (UDP socket still holds on it), but instead it gets marked as DOWN, and
> will be freed and reallocated in ip_output() on the next use.
> 
> Fig.6: We continue to send UDP datagrams.
> 
> : # nc -u 192.168.2.1 echo (continued)
> : ping4
> : ping4
> : ping5
> : ping5
> : ping6
> : ping6
> 
> As you can see, this time we get the echos back.

Yes, same here :-(


> Fig.7: Routing table after we sent more UDP datagrams.
> 
> : # netstat -arn -finet
> : Destination        Gateway            Flags     Refs     Use     Netif Expire
> : default            192.168.1.1        UGSc        0        2      rl0
> : 127.0.0.1          127.0.0.1          UH          1        6      lo0
> : 192.168.1          link#1             UC          4        0      rl0 =>
> : 192.168.2          192.168.1.2        UGSc        1        3      rl0
> 
> The refcount on 192.168.2 route has grown to 1, indicating that the
> UDP socket now holds on this route.  The `Use' count of 3 corresponds
> to our three UDP datagrams (ping4, ping5, and ping6).
> 
> Could you please repeat these steps in your environment, and try to
> detect where it behaved differently in your case.

It doesn't behave differently, that's interesting. May I ask you to
try it using syslogd?

- Let host C log to host S (with the route installed).
- Watch C's messages appear on S.
- Delete C's route to S (via router 2)
- Let host C log again (run tcpdump on router 1 to see the packets come in)
- Install the route to S (via router 2) again on C
- Log more stuff. If you don't see the packets go into router 1 anymore
  I am really lost...

Thanks,

	-Andre
Comment 4 Andre Albsmeier 2001-06-06 12:19:26 UTC
On Tue, 05-Jun-2001 at 20:05:57 +0100, David Malone wrote:
> On Tue, Jun 05, 2001 at 06:30:14PM +0200, Andre Albsmeier wrote:
> > The problem is that each time when syslogd on C wants to send
> > a packet to S, the kernel still uses 1 as router even though
> > it should send them through 2.  After HUPing or restarting
> > syslogd on C (which means that the UDP socket is closed and
> > opened again) things are back to normal.
> 
> This sounds like it is to do with the caching of recently used
> routes. Does the effect go away if you leave it for a while?

I have had it running for 3 hours now but it didn't change. As
soon as I HUP'ed syslogd it worked.


> Adjusting some of the following sysctls might change how long
> you have to wait:
> 
> net.inet.ip.rtexpire
> net.inet.ip.rtminexpire
> net.inet.ip.rtmaxcache
> 
> (Some of the networking people could definitely provide more
> details.)

Ruslan Ermilov <ru@FreeBSD.org> sent me a mail and asked
me to reproduce the behaviour with netcat. This worked
properly so maybe it is really some issue with syslogd.
I will try to isolate the problem as soon as Ruslan
can reproduce it with syslogd.

Thanks,

	-Andre
Comment 5 ru freebsd_committer freebsd_triage 2001-06-06 13:32:05 UTC
On Wed, Jun 06, 2001 at 12:29:04PM +0200, Andre Albsmeier wrote:
> [...]
> 
> It doesn't behave differently, that's interesting. May I ask you to
> try it using syslogd?
> 
> - Let host C log to host S (with the route installed).
> - Watch C's messages appear on S.
> - Delete C's route to S (via router 2)
> - Let host C log again (run tcpdump on router 1 to see the packets come in)
> - Install the route to S (via router 2) again on C
> - Log more stuff. If you don't see the packets go into router 1 anymore
>   I am really lost...
> 
Yes, I have reproduced the problem here.  My test missed one step.
OK, now about what happens here.

Initially, there is the route (cloned from the network route) to S
(192.168.2.1) through router 2 (192.168.1.2).  The UDP socket uses
this route initially.  When this route (and the 192.168.2 network
route) disappear, on the next write (!), ip_output() detects that
the S route is DOWN, and "allocates" (caches) another route, which
happens to be the "default" route pointing to router 1 (192.168.1.1).
Later, when the route to the 192.168.2 network gets installed again,
it's not taken into account, as the cached ("default") route is still
UP.

Unfortunately, there is no easy way to fix this.  Checking for
the best-match route on every write may be too time consuming.
As a workaround, you can delete and re-add your "default"
route.  This worked for me here.  `route delete default' will
remove the "default" route from the routing table, but because
it has a refcnt>0 the entry is not freed immediately; it is
marked as DOWN instead.  ip_output() for this UDP socket's write
will detect that the cached route is DOWN, will free it, and
allocate a new route, which will be the route to the 192.168.2
network through router 2 (192.168.1.2) this time.
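
The stale cache and the workaround can be replayed in a small Python
model. This is only a sketch under simplifying assumptions: it ignores
route cloning and reference counts, assumes /24 masks, and all class
names are hypothetical. The only rule taken from the analysis above is
that a send re-runs the lookup solely when the cached route is DOWN.

```python
import ipaddress


class Route:
    def __init__(self, prefix, gateway):
        self.net = ipaddress.ip_network(prefix)
        self.gateway = gateway
        self.up = True


class RoutingTable:
    def __init__(self):
        self.routes = []

    def add(self, prefix, gateway):
        self.routes.append(Route(prefix, gateway))

    def delete(self, prefix):
        net = ipaddress.ip_network(prefix)
        for rt in self.routes:
            if rt.net == net:
                rt.up = False            # holders see the entry as DOWN
        self.routes = [rt for rt in self.routes if rt.net != net]

    def lookup(self, dst):
        addr = ipaddress.ip_address(dst)
        matches = [rt for rt in self.routes if addr in rt.net]
        return max(matches, key=lambda rt: rt.net.prefixlen)


class Socket:
    def __init__(self, table, dst):
        self.table, self.dst, self.cached = table, dst, None

    def send(self):
        # Re-run the lookup only if there is no cached route or it is DOWN.
        if self.cached is None or not self.cached.up:
            self.cached = self.table.lookup(self.dst)
        return self.cached.gateway


table = RoutingTable()
table.add("0.0.0.0/0", "192.168.1.1")        # default, router 1
table.add("192.168.2.0/24", "192.168.1.2")   # router 2
sock = Socket(table, "192.168.2.1")

trace = [sock.send()]                        # via router 2
table.delete("192.168.2.0/24")
trace.append(sock.send())                    # falls back to default, cached
table.add("192.168.2.0/24", "192.168.1.2")
trace.append(sock.send())                    # stale: cached default is UP
table.delete("0.0.0.0/0")                    # workaround: delete default...
table.add("0.0.0.0/0", "192.168.1.1")        # ...and re-add it
trace.append(sock.send())                    # via router 2 again
print(trace)
```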

The actual fix would be to notify the protocol (from within the
routing code) whenever its routing table is modified.  This
notification could then be saved in a variable as a timestamp,
and every PCB-cached route could have a similar timestamp as
well, indicating when this "caching" took place.  With that,
ip_output() would "invalidate" a cached route if it was
cached before the last routing table modification was done.
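
A rough Python sketch of this idea, with one substitution stated
plainly: it uses a monotonically increasing generation counter instead
of a timestamp, which gives the same "cached before the last change"
test without depending on clock resolution. The class names, the
`lookups` counter and the /24 masks are assumptions for illustration,
not an actual kernel patch.

```python
import ipaddress


class RoutingTable:
    def __init__(self):
        self.routes = {}
        self.generation = 0              # bumped on every modification

    def add(self, prefix, gateway):
        self.routes[ipaddress.ip_network(prefix)] = gateway
        self.generation += 1

    def delete(self, prefix):
        del self.routes[ipaddress.ip_network(prefix)]
        self.generation += 1

    def lookup(self, dst):
        addr = ipaddress.ip_address(dst)
        best = max((n for n in self.routes if addr in n),
                   key=lambda n: n.prefixlen)
        return self.routes[best]


class Socket:
    def __init__(self, table, dst):
        self.table, self.dst = table, dst
        self.cached_gateway = None
        self.cached_generation = -1      # generation at caching time
        self.lookups = 0                 # how many full lookups were needed

    def send(self):
        # Invalidate the cache if the table changed since it was filled.
        if self.cached_generation != self.table.generation:
            self.cached_gateway = self.table.lookup(self.dst)
            self.cached_generation = self.table.generation
            self.lookups += 1
        return self.cached_gateway


table = RoutingTable()
table.add("0.0.0.0/0", "192.168.1.1")
table.add("192.168.2.0/24", "192.168.1.2")
sock = Socket(table, "192.168.2.1")

trace = [sock.send(), sock.send()]           # second send hits the cache
table.delete("192.168.2.0/24")
trace.append(sock.send())                    # default route, router 1
table.add("192.168.2.0/24", "192.168.1.2")
trace.append(sock.send())                    # router 2 again, no HUP needed
print(trace, sock.lookups)
```

The per-send cost is a single integer comparison; the full lookup runs
only after the table has actually changed. The price is that any change
to the table, even an unrelated one, invalidates every cached route.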

I could probably try to implement this, if no one else can
come up with a better idea.


Cheers,
Comment 6 ru freebsd_committer freebsd_triage 2001-06-06 15:27:57 UTC
State Changed
From-To: open->closed

The analysis shows this PR is a duplicate of PR kern/10778.
Comment 7 Andre Albsmeier 2001-06-06 15:29:33 UTC
On Wed, 06-Jun-2001 at 15:32:05 +0300, Ruslan Ermilov wrote:
> On Wed, Jun 06, 2001 at 12:29:04PM +0200, Andre Albsmeier wrote:
> > > : 127.0.0.1          127.0.0.1          UH          1        6      lo0
> > > : 192.168.1          link#1             UC          4        0      rl0 =>
> > > : 192.168.2          192.168.1.2        UGSc        1        3      rl0
> > > 
> > > The refcount on 192.168.2 route has grown to 1, indicating that the
> > > UDP socket now holds on this route.  The `Use' count of 3 corresponds
> > > to our three UDP datagrams (ping4, ping5, and ping6).
> > > 
> > > Could you please repeat these steps in your environment, and try to
> > > detect where it behaved differently in your case.
> > 
> > It doesn't behave differently, that's interesting. May I ask you to
> > try it using syslogd?
> > 
> > - Let host C log to host S (with the route installed).
> > - Watch C's messages appear on S.
> > - Delete C's route to S (via router 2)
> > - Let host C log again (run tcpdump on router 1 to see the packets come in)
> > - Install the route to S (via router 2) again on C
> > - Log more stuff. If you don't see the packets go into router 1 anymore
> >   I am really lost...
> > 
> Yes, I have reproduced the problem here.  My test misses one step.

Hmm, I just wonder why syslogd behaves differently...

> OK, now about what happens here.
> 
> Initially, there is the route (cloned from the network route) to S
> (192.168.2.1) through the router 2 (192.168.1.2).  UDP socket uses
> this route initially.  When this (and the 192.168.2 network) routes
> disappear, on the next write (!), ip_output() detects that the S
> route is DOWN, and "allocates" (caches) another route, which happens
> to be the "default" route pointing to router 1 (192.168.1.1).
> Later, when the route to the 192.168.2 network gets installed again,
> it's not taken into account, as the cached ("default") route is still
> UP.

So this would match my (rather amateurish) description when saying:

It seems that as long as packets can be sent somewhere, the
kernel doesn't bother to check whether there is a better route
to the destination until the socket is closed and opened again.


> Unfortunately, there is no easy way to fix this.  Checking for
> the best-match route on every write may be too time consuming.
> As the workaround, you can delete and re-add your "default"
> route.  This worked for me here.  `route delete default' will

Just tried it, worked here as well.

> delete the "default" route from the routing table, but because
> it has a refcnt>0 will not delete it immediately, but will mark
> it as DOWN.  ip_output() for this UDP socket's write will detect
> that the cached route is DOWN, will free it, and allocate a new
> route, which will be the route to the 192.168.2 network through
> router 2 (192.168.1.2) this time.
> 
> The actual fix would be to notify protocol (from within the
> routing code) whenever its routing table is modified.  This
> notification could then be saved in a variable as timestamp,
> and every PCB-cached route could have a similar timestamp as
> well, indicating when this "caching" took place.  Having
> that, ip_output() would "invalidate" cached route if it was
> cached before the last routing table modification was done.
> 
> I could probably try to implement this, if no one else can
> come up with a better idea.

I can only offer to test any new code since my knowledge about
the corresponding parts in the kernel is not sufficient to
implement it.

Thanks so far,

	-Andre
Comment 8 ru freebsd_committer freebsd_triage 2001-06-06 15:56:15 UTC
On Wed, Jun 06, 2001 at 04:29:33PM +0200, Andre Albsmeier wrote:
> On Wed, 06-Jun-2001 at 15:32:05 +0300, Ruslan Ermilov wrote:
> > On Wed, Jun 06, 2001 at 12:29:04PM +0200, Andre Albsmeier wrote:
> > > > : 127.0.0.1          127.0.0.1          UH          1        6      lo0
> > > > : 192.168.1          link#1             UC          4        0      rl0 =>
> > > > : 192.168.2          192.168.1.2        UGSc        1        3      rl0
> > > > 
> > > > The refcount on 192.168.2 route has grown to 1, indicating that the
> > > > UDP socket now holds on this route.  The `Use' count of 3 corresponds
> > > > to our three UDP datagrams (ping4, ping5, and ping6).
> > > > 
> > > > Could you please repeat these steps in your environment, and try to
> > > > detect where it behaved differently in your case.
> > > 
> > > It doesn't behave differently, that's interesting. May I ask you to
> > > try it using syslogd?
> > > 
> > > - Let host C log to host S (with the route installed).
> > > - Watch C's messages appear on S.
> > > - Delete C's route to S (via router 2)
> > > - Let host C log again (run tcpdump on router 1 to see the packets come in)
> > > - Install the route to S (via router 2) again on C
> > > - Log more stuff. If you don't see the packets go into router 1 anymore
> > >   I am really lost...
> > > 
> > Yes, I have reproduced the problem here.  My test misses one step.
> 
> Hmm, I just wonder why syslogd behaves differently...
> 
Because my test missed one step: the route to S through router 2
should exist initially to reproduce this with netcat(1).  You
then send some data, delete the route, again send data so that
the "default" route gets cached, and install the route to S
again.

