Bug 21079

Summary: IPSEC, kernel ARPs for tunnel endpoint instead of next-hop gateway
Product: Base System Reporter: Brian Candler <B.Candler>
Component: kern Assignee: Hajimu UMEMOTO <ume>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: 4.1-RELEASE   
Hardware: Any   
OS: Any   

Description Brian Candler 2000-09-06 17:10:04 UTC
When sending IPSEC packets, and the ARP cache for the next hop expires,
the machine tries to ARP for the tunnel endpoint address instead of
the next hop router. Thus the symptom is that connectivity drops after
a few minutes.

e.g.

      +- - - - - - - - +
     R1                R2
     |                 |
     A                 B
  -+-+---           -+-+---
   W1                W2

Box A is a FreeBSD-4.1 PC configured as IPSEC VPN gateway.

W1, W2 are workstations. A points its default route at R1.

This works for a couple of minutes, until A's ARP entry for R1 expires.
At that point, A sends out ARP packets for B's IP address, not R1's IP
address!

The kernel logs the following message:

Sep  4 10:33:01 godl-vpn /kernel: arplookup b.b.b.b failed: host is not on local network

(where b.b.b.b is B's IP address, i.e. the remote tunnel endpoint)

arp -an shows:
? (b.b.b.b) at (incomplete) [ethernet]

Connectivity is lost until you manually do
  # arp -d b.b.b.b
  # ping b.b.b.b
At this point the IPSEC packets start to flow, until the ARP cache expires again.

Fix: 

Workaround: add static ARP entry for the gateway, so that it never expires.
arp -S 192.168.1.254 gg:gg:gg:gg:gg:gg
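
The workaround above needs the gateway's IP and MAC address. A minimal
sketch of how they could be looked up, assuming FreeBSD-style output from
`route -n get default` and `arp -n` (the helper names gateway_of and mac_of
are hypothetical, not part of any tool):

```shell
#!/bin/sh
# Sketch only: helpers for building the static-ARP workaround command.
# The output formats parsed here are assumptions based on FreeBSD's
# route(8) and arp(8); check them on your release before relying on this.

# Print the gateway address from `route -n get default` output on stdin.
gateway_of() {
    awk '$1 == "gateway:" { print $2; exit }'
}

# Print the MAC address from an `arp -n <host>` line on stdin, e.g.
#   ? (192.168.1.254) at 00:11:22:33:44:55 on fxp0 [ethernet]
# Prints nothing when the entry is "(incomplete)".
mac_of() {
    awk '$3 == "at" && $4 ~ /^[0-9a-fA-F]+(:[0-9a-fA-F]+)+$/ { print $4; exit }'
}

# Intended use on the gateway box, as root, while the entry is still valid:
#   gw=$(route -n get default | gateway_of)
#   mac=$(arp -n "$gw" | mac_of)
#   [ -n "$mac" ] && arp -S "$gw" "$mac"
```

Both helpers only parse text on stdin, so they can be tried against saved
output before touching the live ARP table.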

How-To-Repeat:

You can do this with just one PC running FreeBSD, as it doesn't matter
that the remote end does not exist.

(1) Create /etc/ipsec.conf

[Replace 192.168.1.180 with the IP address of your PC's Ethernet interface,
but leave all the other numbers as they are here]

flush;
add 192.168.1.180 192.0.2.1 esp 256
        -E des-cbc 0x1111111111111111
        -A hmac-md5 0x22222222222222222222222222222222;
add 192.0.2.1 192.168.1.180 esp 256
        -E des-cbc 0x1111111111111111
        -A hmac-md5 0x22222222222222222222222222222222;

spdflush;
spdadd 10.0.0.0/24[any] 10.0.1.0/24[any] any
        -P out ipsec esp/tunnel/192.168.1.180-192.0.2.1/require;
spdadd 10.0.1.0/24[any] 10.0.0.0/24[any] any
        -P in ipsec esp/tunnel/192.0.2.1-192.168.1.180/require;

(2) ifconfig lo0 10.0.0.1 netmask 255.255.255.0 alias
(3) setkey -f /etc/ipsec.conf
(4) ping -S 10.0.0.1 10.0.1.1

Make sure there is no other IP traffic being generated by this PC (i.e.
no ntpd etc)

(5) On another VC, run a tcpdump. You should see
16:49:18.950061 192.168.1.180 > 192.0.2.1: ESP(spi=256,seq=0x1b)
16:49:19.960064 192.168.1.180 > 192.0.2.1: ESP(spi=256,seq=0x1c)
16:49:20.970104 192.168.1.180 > 192.0.2.1: ESP(spi=256,seq=0x1d)
16:49:21.980124 192.168.1.180 > 192.0.2.1: ESP(spi=256,seq=0x1e)

(except with 192.168.1.180 changed to your local Ethernet IP address)

(6) On a third VC, type "arp -d 192.168.1.254" (but use your PC's
gateway address instead of 192.168.1.254)

Go back to the second VC and you will see:

16:49:22.990120 arp who-has 192.0.2.1 tell 192.168.1.180
                            ^^^^^^^^^
i.e. it is ARPing for the tunnel endpoint, not the gateway.

If you have a Cisco on your network in its default mode (gratuitous
proxy ARP) then the Cisco will respond, but the kernel will ignore it.

(7) arp -n 192.168.1.254
? (192.168.1.254) at (incomplete) [ethernet]

(8) Stop the ping -S process and do
arp -d 192.168.1.254
ping 192.168.1.254

Then restart the ping -S; the packets will be flowing again.
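
The check in steps (5) and (6) can be automated. A minimal sketch, assuming
tcpdump prints ARP requests in the "arp who-has X tell Y" form shown above
(newer tcpdump versions format this line differently); the function names
are hypothetical:

```shell
#!/bin/sh
# Sketch only: spot ARP requests aimed at the tunnel endpoint in tcpdump
# text output. Assumes the line format shown in this report:
#   16:49:22.990120 arp who-has 192.0.2.1 tell 192.168.1.180

# Print the target address of every ARP request on stdin.
arp_targets() {
    awk '$2 == "arp" && $3 == "who-has" { print $4 }'
}

# Succeed if any ARP request on stdin asks for address $1.
arps_for_endpoint() {
    arp_targets | grep -Fqx "$1"
}

# Intended use, on text saved from the tcpdump session:
#   if arps_for_endpoint 192.0.2.1 < tcpdump.txt; then
#       echo "bug reproduced: ARPing for the tunnel endpoint"
#   fi
```

A match for 192.0.2.1 means the bug is present; a request for the gateway
address (192.168.1.254 in this example) is the expected behaviour.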
Comment 1 Brian Candler 2000-09-06 17:17:58 UTC
Turns out this has already been reported to the KAME project as
http://orange.kame.net/dev/query-pr.cgi?pr=233
Comment 2 Brian Candler 2000-09-06 17:26:18 UTC
[And the last part of the description should read as:]

arp -an shows:
? (g.g.g.g) at (incomplete) [ethernet]

where g.g.g.g is R1's IP address on the link to A, i.e. A's default gateway.

Connectivity is lost until you manually do
  # arp -d g.g.g.g
  # ping g.g.g.g

At this point the IPSEC packets start to flow, until the ARP cache expires again.
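
For an unpatched box, the manual recovery above could be scripted as a
stopgap. A minimal sketch, assuming `arp -an` prints entries in the form
shown in this report; the function name and the gateway address in the
usage note are hypothetical:

```shell
#!/bin/sh
# Sketch only: detect the stuck "(incomplete)" ARP entry for the gateway.
# Assumes `arp -an` lines look like:
#   ? (192.168.1.254) at (incomplete) [ethernet]

# Succeed if the `arp -an` output on stdin lists address $1 as incomplete.
gateway_incomplete() {
    awk -v gw="($1)" '
        $2 == gw && $4 == "(incomplete)" { found = 1 }
        END { exit !found }
    '
}

# Intended use, as root, e.g. from cron:
#   gw=192.168.1.254            # hypothetical gateway address
#   if arp -an | gateway_incomplete "$gw"; then
#       arp -d "$gw"
#       ping -c 1 "$gw"
#   fi
```

This only shortens the outage; the static ARP entry from the original
report avoids it altogether.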
Comment 3 Sheldon Hearn freebsd_committer freebsd_triage 2000-09-06 19:41:54 UTC
Responsible Changed
From-To: freebsd-bugs->ume

Over to maintainer.
Comment 4 jq 2001-01-11 16:33:29 UTC
I experienced this problem when trying to connect 4 LANs
together for a newly merged company.

One box, with either racoon or manual keying, was failing
mysteriously many times per hour.

A related problem report at kame.net documents a better
solution for many cases.
http://orange.kame.net/dev/query-pr.cgi?pr=233

If the IPSEC gateway has a single external interface, then the
IPSEC_SRCSEL option will correct the problem.

I suggest that a good interim solution is the following:

1.	Add the line
IPSEC_SRCSEL		opt_ipsec.h
to /usr/src/sys/conf/options

2.	Add this, with suitable commentary to LINT
to provide a mechanism for people to more easily identify
and fix this problem until Kame+FreeBSD finds a more general
solution.
options         IPSEC_SRCSEL            #Prevent arp cache hangs
(That comment is not good but you get the idea)

*** options.orig        Thu Jan 11 11:22:12 2001
--- options     Thu Jan 11 11:24:26 2001
***************
*** 230,235 ****
--- 230,236 ----
  IPSEC                 opt_ipsec.h
  IPSEC_ESP             opt_ipsec.h
  IPSEC_DEBUG           opt_ipsec.h
+ IPSEC_SRCSEL          opt_ipsec.h
  IPDIVERT
  DUMMYNET              opt_ipdn.h
  IPFILTER              opt_ipfilter.h
Comment 5 itojun 2001-01-21 16:05:28 UTC
	i'm re-sending this as it did not show up onto GNATS.

itojun

------- Forwarded Messages

To: "James E. Quick" <jq@quick.com>
cc: gnats@FreeBSD.org
cc: Hajimu UMEMOTO <ume@mahoroba.org>
cc: core@kame.net
In-reply-to: ume's message of Sun, 21 Jan 2001 03:25:52 JST.
      <20010121.032552.41664235.ume@mahoroba.org>
Subject: Re: Fw: kern/21079: IPSEC, kernel ARPs for tunnel endpoint instead of next-hop gateway
From: itojun@iijlab.net
Date: Sun, 21 Jan 2001 03:40:31 +0900
Message-ID: <28110.980016031@coconut.itojun.org>
Sender: itojun@itojun.org


> If the IPSEC gateway has a single external interface, then the
> IPSEC_SRCSEL option will correct the problem.
> I suggest that a good interim solution is the following:
> 1.	Add the line
> IPSEC_SRCSEL		opt_ipsec.h
> to /usr/src/sys/conf/options

	unfortunately, no.
	by enabling IPSEC_SRCSEL you will lose interoperability with others
	due to wrongly picked source address on IPsec tunnelled packet (outer
	header).  your main problem (ARP target address) gets solved by the
	sideeffect of IPSEC_SRCSEL.

	so, please do not enable IPSEC_SRCSEL.  we need to come up with the
	right solution.

itojun

------- Message 2

to: "James E. Quick" <jq@quick.com>
cc: gnats@FreeBSD.org, Hajimu UMEMOTO <ume@mahoroba.org>, core@kame.net
In-reply-to: itojun's message of Sun, 21 Jan 2001 03:40:31 JST.
      <28110.980016031@coconut.itojun.org>
Subject: Re: kern/21079: IPSEC, kernel ARPs for tunnel endpoint instead of next-hop gateway
From: itojun@iijlab.net
Date: Sun, 21 Jan 2001 03:59:38 +0900
Message-ID: <28576.980017178@coconut.itojun.org>
Sender: itojun@itojun.org


>	unfortunately, no.
>	by enabling IPSEC_SRCSEL you will lose interoperability with others
>	due to wrongly picked source address on IPsec tunnelled packet (outer
>	header).  your main problem (ARP target address) gets solved by the
>	sideeffect of IPSEC_SRCSEL.
>
>	so, please do not enable IPSEC_SRCSEL.  we need to come up with the
>	right solution.

	does it do the right thing?
	http://orange.kame.net/dev/cvsweb.cgi/kame/kame/sys/netinet6/ipsec.c
	revision 1.84 -> 1.85

itojun

------- End of Forwarded Messages
Comment 6 Hajimu UMEMOTO freebsd_committer freebsd_triage 2001-03-16 18:06:08 UTC
State Changed
From-To: open->feedback

I just merged the fix from KAME. 
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/netinet6/ipsec.c.diff?r1=1.9&r2=1.10
Comment 7 Brian Candler 2001-04-16 12:58:45 UTC
> State-Changed-From-To: open->feedback

I have just tested this with FreeBSD 4.2-20010323-STABLE and 4.3RC3 (both of
which have ipsec.c v1.3.2.5) and the problem appears to be fixed. Thank you!
Comment 8 Hajimu UMEMOTO freebsd_committer freebsd_triage 2001-04-16 16:45:34 UTC
State Changed
From-To: feedback->closed

Thank you for your report.  Since this problem seems to be gone,
I am closing this PR.