| Summary: | IPSEC, kernel ARPs for tunnel endpoint instead of next-hop gateway | ||
|---|---|---|---|
| Product: | Base System | Reporter: | Brian Candler <B.Candler> |
| Component: | kern | Assignee: | Hajimu UMEMOTO <ume> |
| Status: | Closed FIXED | ||
| Severity: | Affects Only Me | ||
| Priority: | Normal | ||
| Version: | 4.1-RELEASE | ||
| Hardware: | Any | ||
| OS: | Any | ||
Turns out this has already been reported to the KAME project as
http://orange.kame.net/dev/query-pr.cgi?pr=233

[And the last part of the description should read as:]

arp -an shows:

    ? (g.g.g.g) at (incomplete) [ethernet]

where g.g.g.g is R1's IP address on the link to A, i.e. A's default gateway.

Connectivity is lost until you manually do

    # arp -d g.g.g.g
    # ping g.g.g.g

At this point the IPSEC packets start to flow, until the ARP cache expires again.

Responsible Changed From-To: freebsd-bugs->ume

Over to maintainer.

I experienced this problem when trying to connect 4 LANs together for a newly merged company. One box, with either racoon or manual keying, was failing mysteriously many times per hour.

A related problem report at Kame.net documents a better solution for many cases.
http://orange.kame.net/dev/query-pr.cgi?pr=233

If the IPSEC gateway has a single external interface, then the IPSEC_SRCSEL option will correct the problem.

I suggest that a good interim solution is the following:

1. Add the line

    IPSEC_SRCSEL opt_ipsec.h

   to /usr/src/sys/conf/options

2. Add this, with suitable commentary, to LINT to provide a mechanism for people to more easily identify and fix this problem until Kame+FreeBSD finds a more general solution.

    options IPSEC_SRCSEL #Prevent arp cache hangs

   (That comment is not good but you get the idea)

*** options.orig Thu Jan 11 11:22:12 2001
--- options Thu Jan 11 11:24:26 2001
***************
*** 230,235 ****
--- 230,236 ----
  IPSEC opt_ipsec.h
  IPSEC_ESP opt_ipsec.h
  IPSEC_DEBUG opt_ipsec.h
+ IPSEC_SRCSEL opt_ipsec.h
  IPDIVERT
  DUMMYNET opt_ipdn.h
  IPFILTER opt_ipfilter.h

i'm re-sending this as it did not show up onto GNATS.
itojun

------- Forwarded Messages

Return-Path: <owner-core@kame.net>
Received: from orange.kame.net (orange.kame.net [203.178.141.194]) by coconut.itojun.org (8.9.3+3.2W/3.7W) with ESMTP id DAA28128 for <itojun@itojun.org>; Sun, 21 Jan 2001 03:40:55 +0900 (JST)
Received: from coconut.itojun.org ([210.160.95.97]) by orange.kame.net (8.9.3+3.2W/3.7W/smtpfeed 1.06) with ESMTP id DAA74151 for <core@kame.net>; Sun, 21 Jan 2001 03:40:54 +0900 (JST)
Received: from kiwi.itojun.org (localhost.itojun.org [127.0.0.1]) by coconut.itojun.org (8.9.3+3.2W/3.7W) with ESMTP id DAA28112; Sun, 21 Jan 2001 03:40:31 +0900 (JST)
To: "James E. Quick" <jq@quick.com>
cc: gnats@FreeBSD.org
cc: Hajimu UMEMOTO <ume@mahoroba.org>
cc: core@kame.net
In-reply-to: ume's message of Sun, 21 Jan 2001 03:25:52 JST. <20010121.032552.41664235.ume@mahoroba.org>
X-Template-Reply-To: itojun@itojun.org
X-Template-Return-Receipt-To: itojun@itojun.org
X-PGP-Fingerprint: F8 24 B4 2C 8C 98 57 FD 90 5F B4 60 79 54 16 E2
Subject: Re: Fw: kern/21079: IPSEC, kernel ARPs for tunnel endpoint instead of next-hop gateway
From: itojun@iijlab.net
Date: Sun, 21 Jan 2001 03:40:31 +0900
Message-ID: <28110.980016031@coconut.itojun.org>
Sender: itojun@itojun.org
X-Filter: mailagent [version 3.0 PL68] for itojun@itojun.org

> If the IPSEC gateway has a single external interface, then the
> IPSEC_SRCSEL option will correct the problem.
> I suggest that a good interim solution is the following:
> 1. Add the line
> IPSEC_SRCSEL opt_ipsec.h
> to /usr/src/sys/conf/options

unfortunately, no. by enabling IPSEC_SRCSEL you will lose interoperability with others due to wrongly picked source address on IPsec tunnelled packet (outer header). your main problem (ARP target address) gets solved by the side effect of IPSEC_SRCSEL.

so, please do not enable IPSEC_SRCSEL. we need to come up with the right solution.
itojun

------- Message 2

Return-Path: <owner-core@kame.net>
Received: from orange.kame.net (orange.kame.net [203.178.141.194]) by coconut.itojun.org (8.9.3+3.2W/3.7W) with ESMTP id DAA28585 for <itojun@itojun.org>; Sun, 21 Jan 2001 03:59:52 +0900 (JST)
Received: from coconut.itojun.org (coconut.itojun.org [210.160.95.97]) by orange.kame.net (8.9.3+3.2W/3.7W/smtpfeed 1.06) with ESMTP id DAA74435 for <core@kame.net>; Sun, 21 Jan 2001 03:59:50 +0900 (JST)
Received: from kiwi.itojun.org (localhost.itojun.org [127.0.0.1]) by coconut.itojun.org (8.9.3+3.2W/3.7W) with ESMTP id DAA28578; Sun, 21 Jan 2001 03:59:39 +0900 (JST)
to: "James E. Quick" <jq@quick.com>
cc: gnats@FreeBSD.org, Hajimu UMEMOTO <ume@mahoroba.org>, core@kame.net
In-reply-to: itojun's message of Sun, 21 Jan 2001 03:40:31 JST. <28110.980016031@coconut.itojun.org>
X-Template-Reply-To: itojun@itojun.org
X-Template-Return-Receipt-To: itojun@itojun.org
X-PGP-Fingerprint: F8 24 B4 2C 8C 98 57 FD 90 5F B4 60 79 54 16 E2
Subject: Re: kern/21079: IPSEC, kernel ARPs for tunnel endpoint instead of next-hop gateway
From: itojun@iijlab.net
Date: Sun, 21 Jan 2001 03:59:38 +0900
Message-ID: <28576.980017178@coconut.itojun.org>
Sender: itojun@itojun.org
X-Filter: mailagent [version 3.0 PL68] for itojun@itojun.org

> unfortunately, no.
> by enabling IPSEC_SRCSEL you will lose interoperability with others
> due to wrongly picked source address on IPsec tunnelled packet (outer
> header). your main problem (ARP target address) gets solved by the
> sideeffect of IPSEC_SRCSEL.
>
> so, please do not enable IPSEC_SRCSEL. we need to come up with the
> right solution.

does it do the right thing?
http://orange.kame.net/dev/cvsweb.cgi/kame/kame/sys/netinet6/ipsec.c
revision 1.84 -> 1.85

itojun

------- End of Forwarded Messages

State Changed From-To: open->feedback

I just merged the fix from KAME.
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/netinet6/ipsec.c.diff?r1=1.9&r2=1.10

> State-Changed-From-To: open->feedback
I have just tested this with FreeBSD 4.2-20010323-STABLE and 4.3RC3 (both of
which have ipsec.c v1.3.2.5) and the problem appears to be fixed. Thank you!

State Changed From-To: feedback->closed

Thank you for your report. Since this problem seems to be gone, I am closing this PR.

When sending IPSEC packets, if the ARP cache entry for the next hop expires, the machine tries to ARP for the tunnel endpoint address instead of the next-hop router. The symptom is that connectivity drops after a few minutes.

e.g.

         +- - - - - - - - +
        R1                R2
         |                |
         A                B
      -+-+---          -+-+---
       W1               W2

Box A is a FreeBSD-4.1 PC configured as an IPSEC VPN gateway. W1, W2 are workstations. A points its default route at R1.

This works for a couple of minutes, until A's ARP entry for R1 expires. At that point, A sends out ARP packets for B's IP address, not R1's IP address!

The kernel logs the following message:

    Sep 4 10:33:01 godl-vpn /kernel: arplookup b.b.b.b failed: host is not on local network

(where b.b.b.b is B's IP address, i.e. the remote tunnel endpoint)

arp -an shows:

    ? (b.b.b.b) at (incomplete) [ethernet]

Connectivity is lost until you manually do

    # arp -d b.b.b.b
    # ping b.b.b.b

At this point the IPSEC packets start to flow, until the ARP cache expires again.

Fix:

Workaround: add a static ARP entry for the gateway, so that it never expires.

    arp -S 192.168.1.254 gg:gg:gg:gg:gg:gg

How-To-Repeat:

You can do this with just one PC running FreeBSD, as it doesn't matter that the remote end does not exist.

(1) Create /etc/ipsec.conf
[Replace 192.168.1.180 with the IP address of your PC's ethernet interface, but leave all the other numbers as they are here]

    flush;
    add 192.168.1.180 192.0.2.1 esp 256 -E des-cbc 0x1111111111111111 -A hmac-md5 0x22222222222222222222222222222222;
    add 192.0.2.1 192.168.1.180 esp 256 -E des-cbc 0x1111111111111111 -A hmac-md5 0x22222222222222222222222222222222;
    spdflush;
    spdadd 10.0.0.0/24[any] 10.0.1.0/24[any] any -P out ipsec esp/tunnel/192.168.1.180-192.0.2.1/require;
    spdadd 10.0.1.0/24[any] 10.0.0.0/24[any] any -P in ipsec esp/tunnel/192.0.2.1-192.168.1.180/require;

(2) ifconfig lo0 10.0.0.1 netmask 255.255.255.0 alias

(3) setkey -f /etc/ipsec.conf

(4) ping -S 10.0.0.1 10.0.1.1

Make sure there is no other IP traffic being generated by this PC (i.e. no ntpd etc)

(5) On another VC, run a tcpdump. You should see

    16:49:18.950061 192.168.1.180 > 192.0.2.1: ESP(spi=256,seq=0x1b)
    16:49:19.960064 192.168.1.180 > 192.0.2.1: ESP(spi=256,seq=0x1c)
    16:49:20.970104 192.168.1.180 > 192.0.2.1: ESP(spi=256,seq=0x1d)
    16:49:21.980124 192.168.1.180 > 192.0.2.1: ESP(spi=256,seq=0x1e)

(except with 192.168.1.180 changed to your local ethernet address)

(6) On a third VC, type "arp -d 192.168.1.254"
(but use your PC's gateway address instead of 192.168.1.254)

Go back to the second VC and you will see:

    16:49:22.990120 arp who-has 192.0.2.1 tell 192.168.1.180
                                ^^^^^^^^^

i.e. it is ARPing for the tunnel endpoint, not the gateway. If you have a Cisco on your network in its default mode (gratuitous proxy ARP) then the Cisco will respond, but the kernel will ignore it.

(7) arp -n 192.168.1.254

    ? (192.168.1.254) at (incomplete) [ethernet]

(8) Stop the ping -S process and do

    arp -d 192.168.1.254
    ping 192.168.1.254

Then restart the ping -S; the packets will be flowing again.
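
On an unpatched kernel, the symptom can be spotted mechanically by looking for an "(incomplete)" gateway entry in the arp -an output, as in step (7). The following is only a rough sketch of such a check, not part of the original report: the function names incomplete_arp_entries and arp_entry_stuck are made up for illustration, and the parsing assumes the arp -an line format quoted above.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not from the report).

# Print the IP address of every incomplete entry found in `arp -an`
# output read from stdin. Assumes the line format quoted in this report:
#   ? (192.168.1.254) at (incomplete) [ethernet]
incomplete_arp_entries() {
    awk '$3 == "at" && $4 == "(incomplete)" {
        gsub(/[()]/, "", $2)   # strip the parentheses around the address
        print $2
    }'
}

# Succeed (exit status 0) if address $1 has an incomplete entry in the
# `arp -an` output read from stdin; fail otherwise.
arp_entry_stuck() {
    incomplete_arp_entries | grep -Fqx -- "$1"
}

# Possible use, with 192.168.1.254 standing in for the gateway address;
# the two recovery commands are the manual workaround from step (8):
#
#   if arp -an | arp_entry_stuck 192.168.1.254; then
#       arp -d 192.168.1.254
#       ping -c 1 192.168.1.254
#   fi
```

The helpers read from stdin rather than running arp themselves, so they can be tried against saved output. A static ARP entry, as given under Fix, remains the simpler workaround; a check like this would only help confirm that a hang was caused by this bug.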