Bug 29170

Summary: [patch] ARP request fails after "bad gateway value" in if_ether.c
Product: Base System
Reporter: Voradesh Yenbut <yenbut>
Component: kern
Assignee: Remko Lodder <remko>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: Unspecified   
Hardware: Any   
OS: Any   

Description Voradesh Yenbut 2001-07-23 22:00:00 UTC
We have several FreeBSD systems running DNS servers.  For some unknown
reason, one of the systems, serving a subnet where most clients run
Windows 2000, occasionally fails to do ARP address resolution.

The kernel logged messages like the following:

  arp_rtrequest: bad gateway value
  arplookup 128.95.8.74 failed: could not allocate llinfo
  arpresolve: can't allocate llinfo for 128.95.8.74rt

  arp_rtrequest: bad gateway value
  arplookup 128.95.8.233 failed: could not allocate llinfo
  arpresolve: can't allocate llinfo for 128.95.8.233rt

  arp_rtrequest: bad gateway value
  arplookup 128.95.8.232 failed: could not allocate llinfo
  arpresolve: can't allocate llinfo for 128.95.8.232rt

  arplookup 128.95.8.233 failed: could not allocate llinfo
  arpresolve: can't allocate llinfo for 128.95.8.233rt

  arp_rtrequest: bad gateway value
  arplookup 128.95.8.230 failed: could not allocate llinfo
  arpresolve: can't allocate llinfo for 128.95.8.230rt

  arp_rtrequest: bad gateway value
  arplookup 128.95.8.160 failed: could not allocate llinfo
  arpresolve: can't allocate llinfo for 128.95.8.160rt

ARP requests to the addresses above failed afterward.  A system reboot
made ARP requests work again, but sooner or later the same problem
comes back.

While searching the FreeBSD mailing lists for a solution, I found several
reports of similar problems, but no good solution.

Fix: 

I don't completely understand the ARP code, so I may not have the insight
to really correct the problem, but the following patch seems to work
around it ("bad gateway value" is still logged, but there are no more
messages about llinfo, and ARP works for the address that caused the message):

--- if_ether.c  2001/07/23 16:35:07     1.1
+++ if_ether.c  2001/07/23 19:13:24
@@ -199,7 +199,13 @@
        case RTM_RESOLVE:
                if (gate->sa_family != AF_LINK ||
                    gate->sa_len < sizeof(null_sdl)) {
-                       log(LOG_DEBUG, "arp_rtrequest: bad gateway value\n");
+                       log(LOG_DEBUG, "arp_rtrequest: %s bad gateway value %s\n",
+                           inet_ntoa(SIN(rt_key(rt))->sin_addr),
+                           gate->sa_family != AF_LINK? "family": "");
+                       rtrequest(RTM_DELETE,
+                                 (struct sockaddr *)rt_key(rt),
+                                 rt->rt_gateway,
+                                 rt_mask(rt), rt->rt_flags, 0);
                        break;
                }
                SDL(gate)->sdl_type = rt->rt_ifp->if_type;

How-To-Repeat: 

I don't know how to repeat this, but it can be simulated by adding a
condition in arp_rtrequest() of /usr/src/sys/netinet/if_ether.c that
breaks out of RTM_RESOLVE.  For example,

 The following code uses a static variable:

   static int toggle = 1;  /* added */

 to simulate a single fault through the bad gateway value condition.

        case RTM_RESOLVE:
                if (gate->sa_family != AF_LINK ||
                    toggle ||                           /* added */
                    gate->sa_len < sizeof(null_sdl)) {
                        log(LOG_DEBUG, "arp_rtrequest: bad gateway value\n");
                        if (toggle) toggle = 0;         /* added */
                        break;
                }

 After a system reboot, the system will log "arp_rtrequest: bad
 gateway value" for the first host it tries to contact, which is
 likely to be its default gateway.  Even though toggle's value
 is then 0, subsequent attempts to contact the host generate the messages:

  arplookup xx.xx.x.xxx failed: could not allocate llinfo
  arpresolve: can't allocate llinfo for xx.xx.xx.xxrt

This leads me to believe that a route is not automatically cleaned up
if it runs into an error for some reason.
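
The behaviour described above can be mimicked with a small userland toy
model.  This is only a sketch of my hypothesis, not kernel code: the
toy_* names, the fixed-size table and the delete_on_error switch are all
made up for illustration.  A failed resolve leaves a host entry without
link-layer info in the table, and every later lookup finds that stale
entry instead of trying again, unless the entry is deleted on error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_TABLE_SIZE 8

/* Hypothetical stand-in for a cloned host route with optional llinfo. */
struct toy_route {
    bool     in_use;      /* slot holds a host route */
    bool     has_llinfo;  /* link-layer info was attached at resolve time */
    uint32_t addr;        /* IPv4 address, host byte order */
};

static struct toy_route toy_table[TOY_TABLE_SIZE];

/* Empty the toy table (models a reboot). */
static void toy_reset(void)
{
    memset(toy_table, 0, sizeof(toy_table));
}

static struct toy_route *toy_find(uint32_t addr)
{
    for (size_t i = 0; i < TOY_TABLE_SIZE; i++) {
        if (toy_table[i].in_use && toy_table[i].addr == addr)
            return &toy_table[i];
    }
    return NULL;
}

/*
 * Models the RTM_RESOLVE step: create the host entry, then attach llinfo.
 * gateway_ok == false models the "bad gateway value" branch.
 * delete_on_error selects whether the half-built entry is removed again.
 */
static void toy_resolve(uint32_t addr, bool gateway_ok, bool delete_on_error)
{
    struct toy_route *rt = NULL;

    for (size_t i = 0; i < TOY_TABLE_SIZE && rt == NULL; i++) {
        if (!toy_table[i].in_use)
            rt = &toy_table[i];
    }
    assert(rt != NULL);   /* the sketch assumes the table never fills up */

    rt->in_use = true;
    rt->addr = addr;
    rt->has_llinfo = false;

    if (!gateway_ok) {
        if (delete_on_error)
            rt->in_use = false;   /* clean up the broken entry */
        return;                   /* otherwise it stays behind, without llinfo */
    }
    rt->has_llinfo = true;
}

/* Models the lookup: true if an entry with usable llinfo is available. */
static bool toy_lookup(uint32_t addr, bool gateway_ok, bool delete_on_error)
{
    struct toy_route *rt = toy_find(addr);

    if (rt == NULL) {
        toy_resolve(addr, gateway_ok, delete_on_error);
        rt = toy_find(addr);
    }
    return rt != NULL && rt->has_llinfo;
}
```

With delete_on_error set to false, one failed toy_lookup() for an address
makes every later toy_lookup() for that address fail until toy_reset() is
called, which matches what we see until the machine is rebooted.  With
delete_on_error set to true, the next lookup starts from scratch and can
succeed.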
Comment 1 ru freebsd_committer freebsd_triage 2001-10-18 15:20:34 UTC
State Changed
From-To: open->feedback

Do you have a routed(8) daemon running? 


Comment 2 ru freebsd_committer freebsd_triage 2001-10-18 15:20:34 UTC
Responsible Changed
From-To: freebsd-bugs->ru

I can easily reproduce this with routed(8) and route(8), 
and I understand what's going on, but I am not sure whether 
this is a routed(8) problem or a kernel one.
Comment 3 Paul Herman 2001-11-29 00:22:08 UTC
The following patch (against 4.4-RELEASE) solves this problem.  In
-CURRENT it's a little different, but the same if condition should
apply, as long as it appears before the rt_setgate() statement.

Voradesh, does this solve your problem?

-Paul.

Index: sys/net/rtsock.c
===================================================================
RCS file: /mnt/ncvs/src/sys/net/rtsock.c,v
retrieving revision 1.44.2.4
diff -u -r1.44.2.4 rtsock.c
--- sys/net/rtsock.c	2001/07/11 09:37:37	1.44.2.4
+++ sys/net/rtsock.c	2001/11/27 01:33:03
@@ -399,6 +399,14 @@
 			break;

 		case RTM_CHANGE:
+			/* Don't let the user specify non-link information
+			 * for a gateway if the RTF_LLINFO flag is set.
+			 * We'll just leave the gateway alone.
+			 */
+			if (gate && (rt->rt_flags & RTF_LLINFO) &&
+			    gate->sa_family != AF_LINK)
+				gate = rt->rt_gateway;
+
 			if (gate && (error = rt_setgate(rt, rt_key(rt), gate)))
 				senderr(error);
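
For anyone who wants to play with the condition outside the kernel, here
is a minimal userland sketch of the same check.  The sketch_* types and
constants are stand-ins invented for this example (the real definitions
live in <sys/socket.h> and <net/route.h>), so treat the numeric values as
illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative values only; the kernel headers define the real ones. */
enum { SKETCH_AF_INET = 2, SKETCH_AF_LINK = 18 };
#define SKETCH_RTF_LLINFO 0x400

/* Hypothetical, cut-down versions of struct sockaddr and struct rtentry. */
struct sketch_sockaddr {
    unsigned char sa_len;
    unsigned char sa_family;
};

struct sketch_rtentry {
    int                           rt_flags;
    const struct sketch_sockaddr *rt_gateway;
};

/*
 * Returns the gateway that should be handed on to the "set gateway" step:
 * for a route flagged LLINFO, a non-link gateway supplied by the caller
 * is ignored and the route's current gateway is kept instead.
 */
static const struct sketch_sockaddr *
sketch_pick_gateway(const struct sketch_rtentry *rt,
                    const struct sketch_sockaddr *gate)
{
    assert(rt != NULL);

    if (gate != NULL && (rt->rt_flags & SKETCH_RTF_LLINFO) &&
        gate->sa_family != SKETCH_AF_LINK)
        return rt->rt_gateway;   /* leave the gateway alone */

    return gate;                 /* NULL or an acceptable gateway */
}
```

A route without the LLINFO flag, or a request that carries no gateway at
all, passes through unchanged, so only the ARP-style entries are
protected.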
Comment 4 Voradesh Yenbut 2001-11-29 23:31:17 UTC
Thanks for the patch.  Unfortunately, it did not solve my problem.

The kernel was changed from 4.2 to 4.4 with the patch. After a while
the usual error messages were printed, and no communication to IP addresses
listed in the message was possible afterward.

Below is an example of the messages (192.168.8.85 is an HP LaserJet 4M and 
128.95.8.25 is a win2k machine.)


Nov 29 14:58:08 bs8 /kernel: arp_rtrequest: bad gateway value
Nov 29 14:58:08 bs8 /kernel: arplookup 192.168.8.85 failed: could not allocate llinfo
Nov 29 14:58:08 bs8 /kernel: arpresolve: can't allocate llinfo for 192.168.8.85rt

Nov 29 14:58:31 bs8 /kernel: arplookup 192.168.8.85 failed: could not allocate llinfo
Nov 29 14:58:31 bs8 /kernel: arpresolve: can't allocate llinfo for 192.168.8.85rt
Nov 29 14:58:31 bs8 /kernel: arplookup 192.168.8.85 failed: could not allocate llinfo
Nov 29 14:58:31 bs8 /kernel: arpresolve: can't allocate llinfo for 192.168.8.85rt

Nov 29 15:10:22 bs8 /kernel: arp_rtrequest: bad gateway value
Nov 29 15:10:22 bs8 /kernel: arplookup 128.95.8.25 failed: could not allocate llinfo
Nov 29 15:10:22 bs8 /kernel: arpresolve: can't allocate llinfo for 128.95.8.25rt
Nov 29 15:10:29 bs8 /kernel: arplookup 128.95.8.25 failed: could not allocate llinfo
Nov 29 15:10:29 bs8 /kernel: arplookup 128.95.8.25 failed: could not allocate llinfo
Nov 29 15:10:29 bs8 /kernel: arpresolve: can't allocate llinfo for 128.95.8.25rt
Nov 29 15:10:29 bs8 /kernel: arplookup 128.95.8.25 failed: could not allocate llinfo
Nov 29 15:10:29 bs8 /kernel: arpresolve: can't allocate llinfo for 128.95.8.25rt
Nov 29 15:10:33 bs8 /kernel: arplookup 128.95.8.25 failed: could not allocate llinfo
Nov 29 15:10:33 bs8 /kernel: arpresolve: can't allocate llinfo for 128.95.8.25rt
Nov 29 15:10:45 bs8 /kernel: arplookup 128.95.8.25 failed: could not allocate llinfo
Nov 29 15:10:45 bs8 /kernel: arpresolve: can't allocate llinfo for 128.95.8.25rt
Nov 29 15:10:46 bs8 /kernel: arplookup 128.95.8.25 failed: could not allocate llinfo
Nov 29 15:10:46 bs8 /kernel: arpresolve: can't allocate llinfo for 128.95.8.25rt
Comment 5 ru freebsd_committer freebsd_triage 2001-11-30 13:19:25 UTC
On Thu, Nov 29, 2001 at 03:31:17PM -0800, Voradesh Yenbut wrote:
> Thanks for the patch.  Unfortunately, it did not solve my problem.
> 
> The kernel was changed from 4.2 to 4.4 with the patch. After a while
> the usual error messages were printed, and no communication to IP addresses
> listed in the message was possible afterward.
> 
> Below is an example of the messages (192.168.8.85 is an HP LaserJet 4M and 
> 128.95.8.25 is a win2k machine.)
> 
> 
> Nov 29 14:58:08 bs8 /kernel: arp_rtrequest: bad gateway value
> Nov 29 14:58:08 bs8 /kernel: arplookup 192.168.8.85 failed: could not allocate llinfo
> Nov 29 14:58:08 bs8 /kernel: arpresolve: can't allocate llinfo for 192.168.8.85rt
> 
> Nov 29 14:58:31 bs8 /kernel: arplookup 192.168.8.85 failed: could not allocate llinfo
> Nov 29 14:58:31 bs8 /kernel: arpresolve: can't allocate llinfo for 192.168.8.85rt
> Nov 29 14:58:31 bs8 /kernel: arplookup 192.168.8.85 failed: could not allocate llinfo
> Nov 29 14:58:31 bs8 /kernel: arpresolve: can't allocate llinfo for 192.168.8.85rt
> 
> Nov 29 15:10:22 bs8 /kernel: arp_rtrequest: bad gateway value
> Nov 29 15:10:22 bs8 /kernel: arplookup 128.95.8.25 failed: could not allocate llinfo
> Nov 29 15:10:22 bs8 /kernel: arpresolve: can't allocate llinfo for 128.95.8.25rt
> Nov 29 15:10:29 bs8 /kernel: arplookup 128.95.8.25 failed: could not allocate llinfo
> Nov 29 15:10:29 bs8 /kernel: arplookup 128.95.8.25 failed: could not allocate llinfo
> Nov 29 15:10:29 bs8 /kernel: arpresolve: can't allocate llinfo for 128.95.8.25rt
> Nov 29 15:10:29 bs8 /kernel: arplookup 128.95.8.25 failed: could not allocate llinfo
> Nov 29 15:10:29 bs8 /kernel: arpresolve: can't allocate llinfo for 128.95.8.25rt
> Nov 29 15:10:33 bs8 /kernel: arplookup 128.95.8.25 failed: could not allocate llinfo
> Nov 29 15:10:33 bs8 /kernel: arpresolve: can't allocate llinfo for 128.95.8.25rt
> Nov 29 15:10:45 bs8 /kernel: arplookup 128.95.8.25 failed: could not allocate llinfo
> Nov 29 15:10:45 bs8 /kernel: arpresolve: can't allocate llinfo for 128.95.8.25rt
> Nov 29 15:10:46 bs8 /kernel: arplookup 128.95.8.25 failed: could not allocate llinfo
> Nov 29 15:10:46 bs8 /kernel: arpresolve: can't allocate llinfo for 128.95.8.25rt
> 
Your routing table is screwed up.  The "can't allocate llinfo" messages
say so.


Cheers,
-- 
Ruslan Ermilov		Oracle Developer/DBA,
ru@sunbay.com		Sunbay Software AG,
ru@FreeBSD.org		FreeBSD committer,
+380.652.512.251	Simferopol, Ukraine

http://www.FreeBSD.org	The Power To Serve
http://www.oracle.com	Enabling The Information Age
Comment 6 ru freebsd_committer freebsd_triage 2001-12-01 17:28:55 UTC
On Wed, Nov 28, 2001 at 04:22:08PM -0800, Paul Herman wrote:
> 
> The following patch (against 4.4-RELEASE) solves this problem.  In
> -CURRENT it's a little different, but the same if condition should
> apply, as long as it appears before the rt_setgate() statement.
> 
> Voradesh, does this solve your problem?
> 
> -Paul.
> 
> Index: sys/net/rtsock.c
> ===================================================================
> RCS file: /mnt/ncvs/src/sys/net/rtsock.c,v
> retrieving revision 1.44.2.4
> diff -u -r1.44.2.4 rtsock.c
> --- sys/net/rtsock.c	2001/07/11 09:37:37	1.44.2.4
> +++ sys/net/rtsock.c	2001/11/27 01:33:03
> @@ -399,6 +399,14 @@
>  			break;
> 
>  		case RTM_CHANGE:
> +			/* Don't let the user specify non-link information
> +			 * for a gateway if the RTF_LLINFO flag is set.
> +			 * We'll just leave the gateway alone.
> +			 */
> +			if (gate && (rt->rt_flags & RTF_LLINFO) &&
> +			    gate->sa_family != AF_LINK)
> +				gate = rt->rt_gateway;
> +
>  			if (gate && (error = rt_setgate(rt, rt_key(rt), gate)))
>  				senderr(error);
> 
Paul,

If we deny this combo for RTM_CHANGE, we should then deny it for RTM_ADD
as well.  For example, "route add -host 1.2.3.4 5.6.7.8 -llinfo" shouldn't
create RTF_LLINFO entry with AF_INET gateway.  Perhaps in this case (RTM_ADD),
the code should return EINVAL.
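
Roughly what I have in mind for the RTM_ADD side, as a userland sketch
only.  The demo_* names and constant values are invented for this
example and are not the kernel's definitions, and I have not checked how
callers such as routed(8) would react to the error.

```c
#include <errno.h>
#include <stddef.h>

/* Illustrative values only; the kernel headers define the real ones. */
enum { DEMO_AF_INET = 2, DEMO_AF_LINK = 18 };
#define DEMO_RTF_LLINFO 0x400

/* Hypothetical, cut-down version of struct sockaddr. */
struct demo_sockaddr {
    unsigned char sa_len;
    unsigned char sa_family;
};

/*
 * Validate an "add route" request before anything is created.
 * Returns 0 if the request may proceed, or EINVAL if it asks for an
 * LLINFO entry whose gateway is not a link-layer address.
 */
static int
demo_validate_add(int flags, const struct demo_sockaddr *gate)
{
    if ((flags & DEMO_RTF_LLINFO) && gate != NULL &&
        gate->sa_family != DEMO_AF_LINK)
        return EINVAL;

    return 0;
}
```

Unlike the RTM_CHANGE case, there is no existing gateway to fall back on
when a route is being added, which is why rejecting the request looks
like the more natural choice here.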


Comment 7 Paul Herman 2001-12-01 23:02:02 UTC
On Sat, 1 Dec 2001, Ruslan Ermilov wrote:

> On Wed, Nov 28, 2001 at 04:22:08PM -0800, Paul Herman wrote:
> >
> > The following patch (against 4.4-RELEASE) solves this problem.  In
> > -CURRENT it's a little different, but the same if condition should
> > apply, as long as it appears before the rt_setgate() statement.
>
> If we deny this combo for RTM_CHANGE, we should then deny it for
> RTM_ADD as well.  For example, "route add -host 1.2.3.4 5.6.7.8
> -llinfo" shouldn't create RTF_LLINFO entry with AF_INET gateway.
> Perhaps in this case (RTM_ADD), the code should return EINVAL.

Hi Ruslan,

Yes.  In fact, it should ideally be in rt_setgate() which will
catch all cases.  The reason I didn't do this was because the IPV6
stack, as I found out, *does* put AF_INET information as a gateway
with the LLINFO bit set. :-( This is why I went conservative and
only made a small change.

Adding it to RTM_ADD I think would be a good thing, and returning
EINVAL should be OK as long as it works with routed (haven't
checked.)

-Paul.
Comment 8 Remko Lodder freebsd_committer freebsd_triage 2006-11-12 10:36:34 UTC
State Changed
From-To: feedback->open

Reset state to open; feedback was received a while ago.
Comment 9 Remko Lodder freebsd_committer freebsd_triage 2006-12-13 14:29:12 UTC
State Changed
From-To: open->feedback

Steal this ticket from ru to obtain feedback about the 
current status of this problem (I will bring it back 
to Ruslan with more information if possible :-)). 


Comment 10 Remko Lodder freebsd_committer freebsd_triage 2006-12-13 14:29:12 UTC
Responsible Changed
From-To: ru->remko

Grab the ticket from ru so that I can track the feedback.
Comment 11 Remko Lodder freebsd_committer freebsd_triage 2008-02-21 17:57:03 UTC
State Changed
From-To: feedback->closed

Feedback timeout (never received)