Summary: r293159 breaks OpenVPN routing
Component: kern
Assignee: Alexander V. Chernikov <melifaro>
Severity: Affects Many People
CC: arkadiusz.majewski, bdrewery, daniel, emaste, garga, gasparello, gnn, guyyur, josh.cepek, mandree, marek, melifaro, mmpestorich, olivier, pi, re, tom, tom
Description daniel.engberg.lists 2016-03-09 07:31:53 UTC
Hi,

I have a box that acts as a firewall (pf), gateway and VPN gateway running OpenVPN. Upgrading from -CURRENT r290676 to r295667 broke some of the functionality, namely the ability to route traffic from the VPN to other networks.

The network setup looks like this:

  Network A (AMD64) - 192.168.20.0/24 (VPN: 10.0.9.1)
  Network B - 192.168.40.0/24 (VPN: 10.0.9.240)
  Network C (AMD64) - 192.168.1.0/24 (VPN: 10.0.9.253)

Networks B and C connect to Network A and access devices on Network A as well as on each other's networks; Network A (the box itself) acts as a hub in that regard. This is set up using tunneling (tun interfaces).

Upgrading to r295667 (including rebuilding everything) breaks this completely (you cannot ping the other nodes either), so I decided to do some backtracking to see where it stopped working. Each revision was tested using full rebuilds (world, kernel, ports), no partial ones.

  r290676 - OK
  r290866 - OK
  r291136 - OK
  r291262 - OK
  r291465 - OK
  r291855 - OK
  r292004 - OK
  r292019 - OK
  r292158 - OK
  r292483 - OK
  r292626 - OK
  r293017 - OK
  r293108 - OK
  r293313 - Broken

The only related commit I can find is r293311, which seems very reasonable. However, it's not completely broken, as Network C (client) can connect to other networks via VPN running r295667, which seems a bit weird to me (if the hub is working, that is). Network B is a Linux client which also works, but I don't think that's relevant in this case.

Both Network A and Network C have no blocking filtering on the tun interfaces:

  pass in quick on tun0 all
  pass out quick on tun0 all

Unfortunately I'm not a developer, so I can't really tell what's broken, but I'm willing to test patches etc. If there's anything else you need, or if you have questions, just fire off a mail and I'll try to respond as usefully as possible.

Keep up the good work!

Best regards,
Daniel Engberg
Comment 1 guyyur 2016-03-09 17:54:37 UTC
Does the route for the VPN network have Netif lo0 instead of tun0 on your box? Reverting the r293159 changes in rtsock.c fixed it for me.

I posted to freebsd-net last month:
https://lists.freebsd.org/pipermail/freebsd-net/2016-February/044591.html

rib_lookup_info doesn't return info->rti_info[RTAX_GATEWAY] because the found interface route has a gateway but no RTF_GATEWAY in rt_flags. The check then fails to clear RTF_GATEWAY because ss.ss_family is 0, not AF_LINK. With info.rti_flags still containing RTF_GATEWAY instead of RTF_GWFLAG_COMPAT, rtrequest1_fib -> rt_getifa_fib -> ifa_ifwithroute then returns lo0.

Before the change, the check was on rt_gateway->sa_family, regardless of whether RTF_GATEWAY was set.
Comment 2 daniel.engberg.lists 2016-03-10 08:34:50 UTC
Oh, sorry for missing your post. From what I can tell, I'm seeing the same issue:

  192.168.1.0/24 10.0.9.253 UGS lo0

So, yeah... I've updated the bug report, and thank you for your findings.
Comment 3 Bryan Drewery 2016-03-17 19:53:02 UTC
I suspect there is a leak in this commit as well.

  # vmstat -m|grep routetbl
  routetbl 8928 4484K - 13324 512,1024

This keeps growing for me.
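A rough way to watch for the growth described above is to sample the routetbl counter at intervals. The sketch below is only illustrative: `routetbl_inuse` is a hypothetical helper name, and it assumes the `vmstat -m` column order is Type, InUse, MemUse, HighUse, Requests, Size(s), which matches the sample line in this comment.

```shell
# Hypothetical helper (not part of FreeBSD): read `vmstat -m` output on
# stdin and print the InUse count of the routetbl malloc type.
# Assumed column order: Type InUse MemUse HighUse Requests Size(s)
routetbl_inuse() {
    awk '$1 == "routetbl" { print $2 }'
}
```

Running `vmstat -m | routetbl_inuse` a few minutes apart and comparing the numbers gives a quick indication; a value that rises steadily while the routing table itself is stable would be consistent with a leak, though it is not proof of one.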
Comment 4 Bryan Drewery 2016-03-24 03:42:25 UTC
(In reply to Bryan Drewery from comment #3)
> I suspect there is a leak in this commit as well.
>
> # vmstat -m|grep routetbl
> routetbl 8928 4484K - 13324 512,1024
>
> This keeps growing for me.

This particular issue is fixed in r297222.
Comment 5 Mark Linimon 2016-04-18 00:57:49 UTC
Over to committer of 293159.
Comment 6 daniel.engberg.lists 2016-05-23 18:14:05 UTC
Given that the FreeBSD 11 code freeze is approaching, could someone please look into this, as the committer doesn't seem to respond? I'd like to think that people would want this software working when 11 is released.
Comment 7 Olivier Cochard 2016-06-02 21:38:49 UTC
I've tried to reproduce this problem (without pf) in my lab using a fresh r301229, but "it works for me".

My simple OpenVPN lab setup is detailed here:
http://bsdrp.net/documentation/examples/gre_ipsec_and_openvpn#openvpn

Should I set up a hub-and-spoke design to reproduce this problem?

Regards,
Olivier
Comment 8 guyyur 2016-06-02 22:08:02 UTC
Olivier,

Did you try to connect with more than one openvpn client? For me the problem was only seen when the second client tried to connect. The first client got the address assigned to the remote side of the tunnel, and there is a correct route for it. The second client got an address that required using the /24 route, which incorrectly leads to lo0.

Bad route for 192.168.170.0/24 with r293159:

  # netstat -rnf inet | grep -e Destination -e 192.168.170
  Destination        Gateway            Flags     Netif Expire
  192.168.170.0/24   192.168.170.1      UGS         lo0
  192.168.170.1      link#4             UHS         lo0
  192.168.170.2      link#4             UH         tun0

Good route for 192.168.170.0/24 with the r293159 rtsock.c changes reverted:

  # netstat -rnf inet | grep -e Destination -e 192.168.170
  Destination        Gateway            Flags     Netif Expire
  192.168.170.0/24   192.168.170.1      UGS        tun0
  192.168.170.1      link#4             UHS         lo0
  192.168.170.2      link#4             UH         tun0

  # ifconfig tun0 inet
  tun0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> metric 0 mtu 1500
          options=80000<LINKSTATE>
          inet 192.168.170.1 --> 192.168.170.2 netmask 0xffffff00
          Opened by PID 671

You can also manually create the tunnel and route to compare 10.3-RELEASE to 11-CURRENT:

  ifconfig tun0 create
  ifconfig tun0 192.168.170.1 192.168.170.2 mtu 1500 netmask 255.255.255.0 up
  route add -net 192.168.170.0 192.168.170.1 255.255.255.0
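The bad/good comparison above can be checked mechanically. This is a minimal sketch, not an existing tool: `find_lo0_gateway_routes` is a hypothetical name, and it assumes the `netstat -rn` column order shown in this comment (Destination, Gateway, Flags, Netif).

```shell
# Hypothetical helper (not part of FreeBSD): read `netstat -rn` output on
# stdin and print every route that carries the G (gateway) flag but is
# bound to lo0, which is the symptom shown in the "bad route" table.
# Assumed column order: Destination Gateway Flags Netif [Expire]
find_lo0_gateway_routes() {
    awk '$1 != "Destination" && $3 ~ /G/ && $4 == "lo0" {
        print $1, "via", $2, "on", $4
    }'
}
```

Fed with `netstat -rnf inet`, it should report the 192.168.170.0/24 entry from the bad table and print nothing for the good one; the UHS host route on lo0 is deliberately ignored because it has no G flag.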
Comment 9 daniel.engberg.lists 2016-06-02 23:20:41 UTC
As guyyur says it only occurs after the second client connects.
Comment 10 Olivier Cochard 2016-06-03 13:16:38 UTC
I've added another openvpn client, and neither client has any problem reaching the hub subnet or the other client's subnet.
Comment 11 Olivier Cochard 2016-06-03 20:39:01 UTC
Because I didn't manage to break my OpenVPN setup (using the default openvpn topology), I've tried your manual way of reproducing the problem.

On a 10.3-RELEASE-p2:

  [root@hp]~# ifconfig tun0 create
  [root@hp]~# ifconfig tun0 192.168.170.1 192.168.170.2 mtu 1500 netmask 255.255.255.0 up
  [root@hp]~# route add -net 192.168.170.0 192.168.170.1 255.255.255.0
  add net 192.168.170.0: gateway 192.168.170.1 fib 0
  [root@hp]~# ifconfig tun0 inet
  tun0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> metric 0 mtu 1500
          options=80000<LINKSTATE>
          inet 192.168.170.1 --> 192.168.170.2 netmask 0xffffff00
  [root@hp]~# netstat -rnf inet | grep -e Destination -e 192.168.170
  Destination        Gateway            Flags     Netif Expire
  192.168.170.0/24   192.168.170.1      UGS        tun0
  192.168.170.1      link#15            UHS         lo0
  192.168.170.2      link#15            UH         tun0

On a 11.0-ALPHA1 r301229:

  [root@11-alpha1]~# ifconfig tun0 create
  [root@11-alpha1]~# ifconfig tun0 192.168.170.1 192.168.170.2 mtu 1500 netmask 255.255.255.0 up
  [root@11-alpha1]~# route add -net 192.168.170.0 192.168.170.1 255.255.255.0
  [root@11-alpha1]~# ifconfig tun0 inet
  tun0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> metric 0 mtu 1500
          options=80000<LINKSTATE>
          inet 192.168.170.1 --> 192.168.170.2 netmask 0xffffff00
  [root@11-alpha1]~# netstat -rnf inet | grep -e Destination -e 192.168.170
  Destination        Gateway            Flags     Netif Expire
  192.168.170.0/24   192.168.170.1      UGS         lo0
  192.168.170.1      link#9             UHS         lo0
  192.168.170.2      link#9             UH         tun0

So I confirm there is a "Netif" difference between 10.3 and head.
Comment 12 daniel.engberg.lists 2016-06-04 07:44:45 UTC
Olivier,

If you're using the 11 box as the hub/server, it'll break for sure. I can attach my config if you want.
Comment 13 Alexander V. Chernikov 2016-06-05 16:47:28 UTC
I'm sorry for disappearing for several months w/o handling this issue. Also, many thanks to Guy Yur for analysing the problem - indeed, all interface/loopback routes are installed w/ a link-level sockaddr_dl "gateway" (used solely to save the ifnet index of the target interface) and w/o the RTF_GATEWAY flag. I totally forgot this case when making this change.

The problem here is that sockaddr_dl is a pretty big structure (64 bytes vs 32-byte sockaddr_in6), so most of the users (which actually don't care about these "gateways") would still need to allocate this buffer in order to avoid receiving ENOMEM. Will think about it a bit and come w/ a fix tomorrow.
Comment 14 daniel.engberg.lists 2016-06-17 17:39:52 UTC
Any ideas Alexander?
Comment 15 daniel.engberg.lists 2016-07-07 20:59:41 UTC
I've e-mailed re@ about this regression in case they've missed it.
Comment 16 Glen Barber 2016-08-08 15:59:21 UTC
(In reply to Alexander V. Chernikov from comment #13)
> I'm sorry for disappearing for several months w/o handling this issue.
> Also, many thanks to Guy Yur for analysing the problem - indeed, all
> interface/loopback routes are installed w/ a link-level sockaddr_dl "gateway"
> (used solely to save the ifnet index of the target interface) and w/o the
> RTF_GATEWAY flag. I totally forgot this case when making this change.
>
> The problem here is that sockaddr_dl is a pretty big structure (64 bytes vs
> 32-byte sockaddr_in6), so most of the users (which actually don't care about
> these "gateways") would still need to allocate this buffer in order to avoid
> receiving ENOMEM. Will think about it a bit and come w/ a fix tomorrow.

Is there any update on this?
Comment 17 Anton 2016-10-06 17:39:58 UTC
Hi,

I have the same problem: with one instance of the OpenVPN client, the gateway (UGS) route goes via lo0 instead of the tun0 interface (FreeBSD 11.0 RC3). As a workaround I have to manually change the default gateway:

  # route change default ...

Is there any progress on solving this issue?
Comment 18 Renato Botelho 2016-11-01 20:56:59 UTC
This is the upstream ticket: https://community.openvpn.net/openvpn/ticket/425#comment:14
Comment 19 Renato Botelho 2016-11-01 20:59:16 UTC
*** Bug 213709 has been marked as a duplicate of this bug. ***
Comment 20 daniel.engberg.lists 2016-11-04 06:30:49 UTC
FYI, OpenVPN devs are working on this issue and will try to push it into the next release which will be available soon.
Comment 21 Renato Botelho 2016-11-08 19:00:18 UTC
OpenVPN developers came up with a solution. I've imported it to pfSense 2.4.0-BETA so users can test it. https://github.com/pfsense/FreeBSD-ports/commit/153999c431c59ac95d71f3214a48d9032a566c58
Comment 22 Renato Botelho 2016-11-08 19:41:55 UTC
Adding mandree@ to CC list since he is OpenVPN maintainer
Comment 23 Matthias Andree 2016-11-09 22:05:58 UTC
Let's take the upstream patch for a spin in the port. Commit coming up.
Comment 24 commit-hook 2016-11-09 22:06:54 UTC
A commit references this bug:

Author: mandree
Date: Wed Nov 9 22:06:27 UTC 2016
New revision: 425811
URL: https://svnweb.freebsd.org/changeset/ports/425811

Log:
  Experimental patch for topology subnet.

  Added as an extra patch behind an option that defaults to ON so people
  can still opt out, this is slated for an upcoming 2.3.14 release that is,
  however, not yet scheduled.

  PR: 207831 (related)
  Obtained from: Gert Doering, via upstream Git repository 446ef5bda4cdc75d

Changes:
  head/security/openvpn/Makefile
  head/security/openvpn/files/extra-patch-fix-subnet
Comment 25 Matthias Andree 2016-11-09 22:07:20 UTC
Note that the openvpn fix changes how openvpn configures the interface and routing and thus sidesteps the regression, which may have an impact on other applications as well.
Comment 26 daniel.engberg.lists 2016-11-18 18:39:50 UTC
Works for me (tm), should we update the "Topic/Subject" to something else?