Bug 207831 - r293159 breaks OpenVPN routing
Summary: r293159 breaks OpenVPN routing
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 11.0-STABLE
Hardware: Any Any
Importance: --- Affects Many People
Assignee: Alexander V. Chernikov
URL:
Keywords: regression
Duplicates: 213709
Depends on:
Blocks:
 
Reported: 2016-03-09 07:31 UTC by daniel.engberg.lists
Modified: 2018-08-29 21:29 UTC
CC List: 18 users

See Also:


Description daniel.engberg.lists 2016-03-09 07:31:53 UTC
Hi,

I have a box that acts as a firewall (pf), gateway and VPN gateway running OpenVPN. Upgrading from -CURRENT r290676 to r295667 broke some of the functionality, namely the ability to route traffic from the VPN to other networks.

The network setup looks like this:

Network A (AMD64) - 192.168.20.0/24 (VPN: 10.0.9.1)
Network B - 192.168.40.0/24 (VPN: 10.0.9.240)
Network C (AMD64) - 192.168.1.0/24 (VPN: 10.0.9.253)

Networks B and C connect to Network A and can reach devices on Network A as well as each other's networks; Network A (the box itself) acts as a hub in that regard. This is set up using tunneling (tun interfaces).

Upgrading to r295667 (including rebuilding everything) breaks this completely (you cannot ping the other nodes either), so I decided to do some backtracking to see where it stopped working. This was tested using full rebuilds (world, kernel, ports), not partial ones.

r290676 - OK
r290866 - OK
r291136 - OK
r291262 - OK
r291465 - OK
r291855 - OK
r292004 - OK
r292019 - OK
r292158 - OK
r292483 - OK
r292626 - OK
r293017 - OK
r293108 - OK
r293313 - Broken

The only related commit I can find is r293311, which seems very reasonable.
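
For anyone wanting to narrow the range further, the next revision to build can be picked by halving the gap between the last good and the first bad revision. This is only a minimal sketch in POSIX sh: the function name `next_rev` is made up for illustration, and since not every revision number necessarily touches the kernel, the nearest one that does may have to be used instead.

```shell
# Hypothetical helper: print the midpoint between the last known-good
# revision ($1) and the first known-bad revision ($2).
# Shell integer division rounds down.
next_rev() {
    echo $(( ($1 + $2) / 2 ))
}

# Last good and first bad revisions from the list above.
next_rev 293108 293313    # prints 293210
```

After each build, the tested revision replaces either the good or the bad bound, and the step is repeated until the two are adjacent.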

However it's not completely broken as Network C (client) can connect to other networks via VPN running r295667 which seems a bit weird to me (if the hub is working that is). Network B is a Linux client which also works but I don't think that's relevant in this case.

Both Network A and Network C have no blocking filtering on the tun interfaces.

pass in quick on tun0 all
pass out quick on tun0 all

Unfortunately I'm not a developer so I can't really tell what's really broken but I'm willing to test patches etc.

If there's anything else you need, or if you have questions, just fire off a mail and I'll try to respond as usefully as possible.

Keep up the good work!

Best regards,
Daniel Engberg
Comment 1 guyyur 2016-03-09 17:54:37 UTC
Does the route for the VPN network have Netif lo0 instead of tun0 on your box?

Reverting the r293159 changes in rtsock.c fixed it for me.

I posted to freebsd-net last month:
https://lists.freebsd.org/pipermail/freebsd-net/2016-February/044591.html

rib_lookup_info doesn't return info->rti_info[RTAX_GATEWAY] because the found interface route has a gateway but no RTF_GATEWAY in rt_flags.
The check then fails to clear RTF_GATEWAY because ss.ss_family is 0, not AF_LINK.

With info.rti_flags still containing RTF_GATEWAY instead of RTF_GWFLAG_COMPAT,
rtrequest1_fib -> rt_getifa_fib -> ifa_ifwithroute then returns lo0.

Before the change the check was for rt_gateway->sa_family without caring if RTF_GATEWAY is set or not.
Comment 2 daniel.engberg.lists 2016-03-10 08:34:50 UTC
Oh, sorry for missing your post.
From what I can tell I seem to see the same issue.

192.168.1.0/24     10.0.9.253         UGS         lo0

So, yeah... I've updated the bug report, and thank you for your findings.
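
To check other machines for the same symptom, something like the following could be used. It is only a heuristic sketch: the function name is made up, it assumes the `netstat -rn` column order Destination, Gateway, Flags, Netif, and it skips reject/blackhole routes (flags R or B), which can legitimately sit on lo0.

```shell
# Hypothetical helper: read `netstat -rn` output on stdin and print the
# destination of every gateway route (flag G) whose Netif is a loopback
# interface, ignoring reject (R) and blackhole (B) routes.
gateway_routes_on_loopback() {
    awk '$3 ~ /G/ && $3 !~ /[RB]/ && $4 ~ /^lo[0-9]+$/ { print $1 }'
}

# Usage on the box in question:
#   netstat -rnf inet | gateway_routes_on_loopback
```

Empty output means no gateway route points at a loopback interface; any destination printed is worth a closer look.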
Comment 3 Bryan Drewery freebsd_committer 2016-03-17 19:53:02 UTC
I suspect there is a leak in this commit as well.

# vmstat -m|grep routetbl
     routetbl  8928  4484K       -    13324  512,1024

This keeps growing for me.
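
To confirm the growth, the routetbl line can be sampled twice and the InUse counts compared. A rough sketch, assuming the column order shown above (Type, InUse, MemUse, HighUse, Requests, Size(s)); the function names are made up for illustration, and the field positions only hold for type names without spaces, such as routetbl.

```shell
# Hypothetical helper: read `vmstat -m` output on stdin and print the
# InUse and MemUse columns for the malloc type named in $1.
malloc_type_usage() {
    awk -v type="$1" '$1 == type { print $2, $3 }'
}

# Hypothetical helper: succeed (exit status 0) if the later InUse
# sample ($2) is larger than the earlier one ($1).
inuse_grew() {
    [ "$2" -gt "$1" ]
}

# Usage on FreeBSD:
#   vmstat -m | malloc_type_usage routetbl
```

A count that keeps rising across samples while the routing table itself stays the same size is what points at a leak.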
Comment 4 Bryan Drewery freebsd_committer 2016-03-24 03:42:25 UTC
(In reply to Bryan Drewery from comment #3)
> I suspect there is a leak in this commit as well.
> 
> # vmstat -m|grep routetbl
>      routetbl  8928  4484K       -    13324  512,1024
> 
> This keeps growing for me.

This particular issue is fixed in r297222.
Comment 5 Mark Linimon freebsd_committer freebsd_triage 2016-04-18 00:57:49 UTC
Over to committer of 293159.
Comment 6 daniel.engberg.lists 2016-05-23 18:14:05 UTC
Given that the FreeBSD 11 code freeze is approaching, could someone please look into this, as the committer doesn't seem to respond? I'd like to think that people would like this software working when 11 is released.
Comment 7 Olivier Cochard freebsd_committer 2016-06-02 21:38:49 UTC
I've tried to reproduce this problem (without pf) in my lab using a fresh r301229, but "it works for me".
My simple OpenVPN lab setup is detailed here: http://bsdrp.net/documentation/examples/gre_ipsec_and_openvpn#openvpn

Should I set up a hub-and-spoke design to reproduce this problem?

Regards,

Olivier
Comment 8 guyyur 2016-06-02 22:08:02 UTC
Olivier,

Did you try to connect with more than one openvpn client?

For me the problem was only seen when the second client tried to connect.
The first client got the address assigned to the remote side of the tunnel
and there is a correct route for it.
The second client got an address that required using the /24 route which
incorrectly leads to lo0.


Bad route for 192.168.170.0/24 with r293159:
# netstat -rnf inet | grep -e Destination -e 192.168.170
Destination        Gateway            Flags     Netif Expire
192.168.170.0/24   192.168.170.1      UGS         lo0
192.168.170.1      link#4             UHS         lo0
192.168.170.2      link#4             UH         tun0


Good route for 192.168.170.0/24 with r293159 rtsock.c changes reverted:
# netstat -rnf inet | grep -e Destination -e 192.168.170
Destination        Gateway            Flags     Netif Expire
192.168.170.0/24   192.168.170.1      UGS        tun0
192.168.170.1      link#4             UHS         lo0
192.168.170.2      link#4             UH         tun0


# ifconfig tun0 inet
tun0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> metric 0 mtu 1500
        options=80000<LINKSTATE>
        inet 192.168.170.1 --> 192.168.170.2 netmask 0xffffff00
        Opened by PID 671


You can also manually create the tunnel and route to compare 10.3-RELEASE to 11-CURRENT:
ifconfig tun0 create
ifconfig tun0 192.168.170.1 192.168.170.2 mtu 1500 netmask 255.255.255.0 up
route add -net 192.168.170.0 192.168.170.1 255.255.255.0
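
The manual steps above can be wrapped into a small pass/fail check. This is only a sketch: both function names are made up, the wrapper needs root on a FreeBSD box, and it is merely defined here, not run. It assumes Netif is the fourth column of the `netstat -rn` output, as in the tables above.

```shell
# Hypothetical helper: read `netstat -rn` output on stdin and print the
# Netif column of the route whose destination equals $1.
route_netif() {
    awk -v dest="$1" '$1 == dest { print $4 }'
}

# Hypothetical wrapper around the manual steps: returns 0 if the /24
# route ends up on tun0, 1 if it ends up elsewhere (e.g. lo0), and 2
# if the setup commands themselves fail.
check_tun_route() {
    ifconfig tun0 create &&
    ifconfig tun0 192.168.170.1 192.168.170.2 mtu 1500 netmask 255.255.255.0 up &&
    route add -net 192.168.170.0 192.168.170.1 255.255.255.0 || return 2
    netif=$(netstat -rnf inet | route_netif 192.168.170.0/24)
    [ "$netif" = "tun0" ]
}
```

Running `check_tun_route` on both systems should then show the difference directly in its exit status.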
Comment 9 daniel.engberg.lists 2016-06-02 23:20:41 UTC
As guyyur says it only occurs after the second client connects.
Comment 10 Olivier Cochard freebsd_committer 2016-06-03 13:16:38 UTC
I've added another OpenVPN client, and the clients have no problem reaching either the hub subnet or the other client's subnet.
Comment 11 Olivier Cochard freebsd_committer 2016-06-03 20:39:01 UTC
Because I couldn't manage to break my OpenVPN setup (using the default OpenVPN topology), I tried your manual way of reproducing the problem.

On a 10.3-RELEASE-p2:

[root@10.3]~# ifconfig tun0 create
[root@10.3]~# ifconfig tun0 192.168.170.1 192.168.170.2 mtu 1500 netmask 255.255.255.0 up
[root@10.3]~# route add -net 192.168.170.0 192.168.170.1 255.255.255.0
add net 192.168.170.0: gateway 192.168.170.1 fib 0
[root@hp]~# ifconfig tun0 inet
tun0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> metric 0 mtu 1500
        options=80000<LINKSTATE>
        inet 192.168.170.1 --> 192.168.170.2 netmask 0xffffff00

		
[root@10.3]~# netstat -rnf inet | grep -e Destination -e 192.168.170
Destination        Gateway            Flags      Netif Expire
192.168.170.0/24   192.168.170.1      UGS        tun0
192.168.170.1      link#15            UHS         lo0
192.168.170.2      link#15            UH         tun0


On a 11.0-ALPHA1 r301229:

[root@11-alpha1]~# ifconfig tun0 create
[root@11-alpha1]~# ifconfig tun0 192.168.170.1 192.168.170.2 mtu 1500 netmask 255.255.255.0 up
[root@11-alpha1]~# route add -net 192.168.170.0 192.168.170.1 255.255.255.0
[root@11-alpha1]~# ifconfig tun0 inet
tun0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> metric 0 mtu 1500
        options=80000<LINKSTATE>
        inet 192.168.170.1 --> 192.168.170.2  netmask 0xffffff00         

[root@11-alpha1]~# netstat -rnf inet | grep -e Destination -e 192.168.170
Destination        Gateway            Flags     Netif Expire
192.168.170.0/24   192.168.170.1      UGS         lo0
192.168.170.1      link#9             UHS         lo0
192.168.170.2      link#9             UH         tun0   

So I can confirm there is a "Netif" difference between 10.3 and head.
Comment 12 daniel.engberg.lists 2016-06-04 07:44:45 UTC
Olivier,

If you're using the 11-box as HUB/server it'll break for sure.

I can attach my config if you want.
Comment 13 Alexander V. Chernikov freebsd_committer 2016-06-05 16:47:28 UTC
I'm sorry for disappearing for several months w/o handling this issue.
Also, many thanks to Guy Yur for analysing the problem - indeed, all interface/loopback routes are installed w/ a link-level sockaddr_dl "gateway" (used solely to save the ifnet index of the target interface) and w/o the RTF_GATEWAY flag. I totally forgot this case when making this change.

The problem here is that sockaddr_dl is a pretty big structure (64 bytes vs 32-byte sockaddr_in6), so most of the users (which actually don't care about these "gateways") would still need to allocate this buffer in order to avoid receiving ENOMEM. Will think about it a bit and come up w/ a fix tomorrow.
Comment 14 daniel.engberg.lists 2016-06-17 17:39:52 UTC
Any ideas Alexander?
Comment 15 daniel.engberg.lists 2016-07-07 20:59:41 UTC
I've e-mailed re@ about this regression in case they've missed it.
Comment 16 Glen Barber freebsd_committer 2016-08-08 15:59:21 UTC
(In reply to Alexander V. Chernikov from comment #13)
> I'm sorry for disappearing for several months w/o handling this issue.
> Also, many thanks to Guy Yur for analysing the problem - indeed, all
> interface/loopback routes are installed w/ a link-level sockaddr_dl "gateway"
> (used solely to save the ifnet index of the target interface) and w/o the
> RTF_GATEWAY flag. I totally forgot this case when making this change.
> 
> The problem here is that sockaddr_dl is a pretty big structure (64 bytes vs
> 32-byte sockaddr_in6), so most of the users (which actually don't care about
> these "gateways") would still need to allocate this buffer in order to avoid
> receiving ENOMEM. Will think about it a bit and come up w/ a fix tomorrow.

Is there any update to this?
Comment 17 Anton 2016-10-06 17:39:58 UTC
Hi,

I have the same problem: with one instance of the OpenVPN client, the UGS gateway route goes via lo0 instead of the tun0 interface (FreeBSD 11.0 RC3).
As a workaround I have to manually change the default gateway:
# route change default ...

What is the progress on solving this issue?
Comment 18 Renato Botelho freebsd_committer 2016-11-01 20:56:59 UTC
This is the upstream ticket: https://community.openvpn.net/openvpn/ticket/425#comment:14
Comment 19 Renato Botelho freebsd_committer 2016-11-01 20:59:16 UTC
*** Bug 213709 has been marked as a duplicate of this bug. ***
Comment 20 daniel.engberg.lists 2016-11-04 06:30:49 UTC
FYI, OpenVPN devs are working on this issue and will try to push it into the next release which will be available soon.
Comment 21 Renato Botelho freebsd_committer 2016-11-08 19:00:18 UTC
OpenVPN developers came up with a solution. I've imported it to pfSense 2.4.0-BETA so users can test it.

https://github.com/pfsense/FreeBSD-ports/commit/153999c431c59ac95d71f3214a48d9032a566c58
Comment 22 Renato Botelho freebsd_committer 2016-11-08 19:41:55 UTC
Adding mandree@ to the CC list since he is the OpenVPN maintainer.
Comment 23 Matthias Andree freebsd_committer 2016-11-09 22:05:58 UTC
Let's take the upstream patch for a spin in the port. Commit coming up.
Comment 24 commit-hook freebsd_committer 2016-11-09 22:06:54 UTC
A commit references this bug:

Author: mandree
Date: Wed Nov  9 22:06:27 UTC 2016
New revision: 425811
URL: https://svnweb.freebsd.org/changeset/ports/425811

Log:
  Experimental patch for topology subnet.

  Added as an extra patch behind an option that defaults to ON so people
  can still opt out, this is slated for an upcoming 2.3.14 release that
  is, however, not yet scheduled.

  PR:		207831 (related)
  Obtained from:	Gert Doering, via upstream Git repository 446ef5bda4cdc75d

Changes:
  head/security/openvpn/Makefile
  head/security/openvpn/files/extra-patch-fix-subnet
Comment 25 Matthias Andree freebsd_committer 2016-11-09 22:07:20 UTC
Note that the openvpn fix changes how openvpn configures the interface and routing and thus sidesteps the regression, which may have an impact on other applications as well.
Comment 26 daniel.engberg.lists 2016-11-18 18:39:50 UTC
Works for me (tm), should we update the "Topic/Subject" to something else?