Bug 233283

Summary: IPv6 routing problem when using FreeBSD as a VPS at a cloud provider
Product: Base System
Reporter: peos42 <peo_s>
Component: kern
Assignee: freebsd-net mailing list <net>
Status: New
Severity: Affects Some People
CC: ae, bz, cem, emaste, hrs, jamie, jinmei, lx, sephe
Priority: ---
Version: 11.2-STABLE
Hardware: amd64
OS: Any
Attachments: Proposed patch (Flags: none)

Description peos42 2018-11-18 00:20:28 UTC

I recently ran into a problem after replacing a Linux server with FreeBSD. The server is a VPS at the cloud provider RamNode.

They (i.e. RamNode) have provided an IPv6 gateway that is outside my allotted /64 block. They have a /48 at each DC, and the IPv6 gw lies inside that /48 but outside my /64. So FreeBSD requires me to change the prefix length from /64 to /48 for internet access to work. They also state this here: https://clientarea.ramnode.com/knowledgebase.php?action=displayarticle&id=44

This means I have a problem communicating with a set of servers over IPv6 because of the mask we have specified. Unfortunately, I now have a server with which I can only use IPv4 because of this.

Linux and Windows obviously accept gateways outside their own network scope. Why, I do not know... I of course think this is wrong. OpenBSD and FreeBSD don't accept gateways outside the netmask scope. Whether that is RFC compliant or not, I do not know; I have not checked. And I think FreeBSD is actually behaving correctly.

I logged a case at RamNode...

The problem is that RamNode states that most cloud providers behave in the same way. So this becomes a problem whenever we want to use FreeBSD with IPv6 at cloud providers.

RamNode stated:
This kind of setup does appear to be odd but if you search you will see there are a number of large providers that take the same approach. Users on these other providers also experience issues with the gateway being outside of the subnet on BSD. Unfortunately I do not have specific knowledge as to why our configuration is this way but it does appear to be common.

So, I work primarily with security. As I do not see any immediate security issue in doing this, is it possible to add an rc.conf flag to accept gateways outside the netmask scope? Otherwise, FreeBSD is not the horse to bet on for the future for VPSes on the internet.

Comment 1 Conrad Meyer freebsd_committer 2018-11-18 01:13:29 UTC
What's the problem with configuring the /48 on your gateway interface?
Comment 2 peos42 2018-11-18 01:35:25 UTC
I cannot access some hosts over IPv6 (these are also hosted at RamNode within their block). 

When the VPS was Linux with the same IP and the net-mask on the server it worked ok (Linux used the default gw with a /64 set as net-mask). 

When I reinstalled the VPS with FreeBSD 11.2, nothing worked over IPv6 with the initial config. Then I saw the post I referred to and changed the netmask to /48 so that the gw was included in it. After that, internet access and everything else worked, but not IPv6 communication to some hosts. This is most likely due to the /48 netmask: why send traffic on to the gw if you think the host is on your local network, as the mask says so?
Comment 3 Conrad Meyer freebsd_committer 2018-11-18 03:14:08 UTC
I see.  As a workaround, maybe two addresses can be configured on the interface?  A /128 to the VPS' gateway, and a second address from your actual /64 allocation?  That way it is valid to send traffic to the VPS gateway via the interface, but hosts outside of your /64 are directed over the gateway rather than the local link?
Comment 4 Bjoern A. Zeeb freebsd_committer 2018-11-19 00:22:23 UTC
(In reply to peos42 from comment #0)

I used to have such a setup with a very well known European hoster.  It's idiotic IPv4 behaviour (and was exactly that there as well) and it'll eventually cause them a lot of trouble in IPv6 land as their neighbour tables on the L2/3 device in front of you can easily fill up.  My European one after 1.5 years of silence has just updated and rolled out the new setup with a transition period years after.  They never said anything but I was happy they listened.

The solution for any hoster is to have a fe80::1/64 as a default gateway on all interfaces for all customers.  It's a link-local address, there'll not be too many of them, and then, given they know the ether address of their customers, they route whatever network their customers get to that; no extra neighbour table addresses; their router is a lot less attackable as there's no public /64 on each interface, etc.  So much more to say about all this but that's their problem and not yours.

You can still make this work with FreeBSD and some "glue" and magic and I'll just braindump here what comes to my mind:
(a) set your ipv6_default_interface to your external interface
(b) look at ndp -an to find your router's link-local address and then set ipv6_defaultrouter="fe80:....%${ipv6_default_interface}"

    Note this is a hack, as that address can change if your hoster changes things or moves the VM around; in a more or less static setup it works;  it could be "automated";

(c) I wonder if ping6 -n ff02::2%<interface> will give you answers, that should be the same address as in (b).  If the address from (b) changes you might be out of luck and the best you could do is to script a "checker" which validates the address every minute and updates the IPv6 default route accordingly.

(d) The above assumes that calling rtsol on the interface doesn't help you in that setup.  Would be great if it would.

(e) alternatively: you might be able to set the default gateway using -link;  can't remember if that works;  haven't tried that in years.

Try and see if you can work it out from there.  I'd be curious to hear...
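
As a rough illustration of the "checker" idea in (c), the following is a minimal sketch and not a tested solution. `router_from_ping6` is a hypothetical helper that only parses ping6 output; the interface name vtnet0 and the route(8) invocations in the commented usage are assumptions that would need checking on the actual VPS.

```sh
#!/bin/sh
# Sketch of the "checker" from (c): learn the router's link-local address
# by pinging the all-routers multicast group, then compare it with the
# current IPv6 default route.

# router_from_ping6 (hypothetical helper): read ping6 output on stdin and
# print the address of the first responder. It expects reply lines such as
#   16 bytes from fe80::1%vtnet0, icmp_seq=0 hlim=64 time=0.149 ms
router_from_ping6() {
    sed -n 's/^[0-9]* bytes from \([^,]*\),.*/\1/p' | head -n 1
}

# Possible usage from cron (untested; IFACE is an assumption):
#   IFACE=vtnet0
#   new=$(ping6 -n -c 2 "ff02::2%${IFACE}" | router_from_ping6)
#   cur=$(route -n get -inet6 default | awk '/gateway:/ { print $2 }')
#   if [ -n "$new" ] && [ "$new" != "$cur" ]; then
#       route change -inet6 default "$new"
#   fi
```

If more than one router answers on the link, picking the first responder is arbitrary, so a real script would want a stricter rule.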
Comment 5 Andrey V. Elsukov freebsd_committer 2018-11-19 10:10:22 UTC
I saw that Sepherosa has added support for non-prefix directly reachable routes to DragonflyBSD. I have also seen the question several times on our mailing lists why such routes do not work on FreeBSD without properly configured prefixes. Maybe it is time to rethink this and add such support?

The noted commits are

But FreeBSD needs a bit more changes.
Comment 6 peos42 2018-11-20 08:30:48 UTC
Maybe there is a reason why DragonflyBSD fixed it. 

The cloud provider in the same support case I started this thread with said:

Additionally, if BSD followed RFC compliance for neighbour table discovery (https://tools.ietf.org/html/rfc4861) it would not be an issue, but they do not. This has been known and unfortunately has affected *BSD all the way back to 2012. It's actually BSD that is not RFC compliant in this case.

I have not looked that deep. But is it the case that FreeBSD does not follow RFC 4861 regarding Neighbour Discovery?

If it does not, then I suggest adding this to the future to-do list for fixing.
Comment 7 Andrey V. Elsukov freebsd_committer 2018-11-20 12:04:17 UTC
Created attachment 199377 [details]
Proposed patch

I just tried a patch, and it seems that with it I can add an on-link route to an address that is not in the list of configured prefixes, and ND6 is able to send NS and receive NA. The patch should be applicable to FreeBSD 11+.
Comment 8 Conrad Meyer freebsd_committer 2018-11-20 18:42:54 UTC
(In reply to peos42 from comment #6)
Could they be more specific in how they think BSD is non-compliant with that RFC?  It's a large document and the critique is not specific.
Comment 9 Conrad Meyer freebsd_committer 2018-11-20 18:50:42 UTC
(In reply to peos42 from comment #6)
Maybe this part?

   Router Advertisements contain a list of prefixes used for on-link
   determination and/or autonomous address configuration; flags
   associated with the prefixes specify the intended uses of a
   particular prefix.  Hosts use the advertised on-link prefixes to
   build and maintain a list that is used in deciding when a packet's
   destination is on-link or beyond a router.

So far, so good.

                                               Note that a destination
   can be on-link even though it is not covered by any advertised on-
   link prefix.  In such cases, a router can send a Redirect informing
   the sender that the destination is a neighbor.

So I guess that may be the complaint here?
Comment 10 Conrad Meyer freebsd_committer 2018-11-20 18:54:01 UTC
Further (§8.3, Host Specification):

   A host receiving a valid redirect SHOULD update its Destination Cache
   accordingly so that subsequent traffic goes to the specified target.
   If the Target and Destination Addresses are the same, the host MUST
   treat the Target as on-link.  If the Target Address is not the same
   as the Destination Address, the host MUST set IsRouter to TRUE for
   the target.
Comment 11 peos42 2018-11-20 18:58:22 UTC
(In reply to Conrad Meyer from comment #8)

RFC 4861 says:
   If the source address of the packet prompting the solicitation is the 
   same as one of the addresses assigned to the outgoing interface, that 
   address SHOULD be placed in the IP Source Address of the outgoing 
   solicitation.  Otherwise, any one of the addresses assigned to the 
   interface should be used.   

So it IS permissible for another address to appear here. RFC 5942 that updates RFC 4861 seems to not change this.

This is probably why it works on Linux, Windows, DragonflyBSD etc. I guess they have seen this, as the statement is quite clear.

Comment 12 jinmei 2018-11-20 21:26:52 UTC
I don't think this text is relevant to the topic:

   If the source address of the packet prompting the solicitation is the
   same as one of the addresses assigned to the outgoing interface, that
   address SHOULD be placed in the IP Source Address of the outgoing
   solicitation.  Otherwise, any one of the addresses assigned to the
   interface should be used.

The "otherwise" case is basically about a forwarding node (router), in which case the source address of the packet being forwarded is normally different from any of the addresses of the outgoing interface of the forwarding node.  Obviously this case should be an exception to the SHOULD.  As far as I know FreeBSD is compliant with this spec.

Besides, I don't see any relevance of the source address selection of outgoing NS to this issue.

The problem description is a bit unclear, but I don't see anything in the FreeBSD's implementation that may be related to this trouble and is not RFC-compliant.  If I were to guess, the expected operation here is to allow the user to manually specify an on-link prefix (in this case, that would be <router's IPv6 address>/128).  As far as I know there's no RFC that requires a host to implement such a manual configuration.  But supporting it may not be a bad idea.  And, if we add support for it, I'd do so by extending 'ndp' so that it allows the user to manually create an entry that would be listed by 'ndp -p', rather than allowing route(8) to tweak the routing table that causes the same effect (which b72db1d3321d7a80f4da3f727765bcc200f30278 of the dragonfly patch seems to do).
Comment 13 Conrad Meyer freebsd_committer 2018-11-20 22:02:20 UTC
(In reply to Andrey V. Elsukov from comment #7)
Isn't this patch a bit of a kludge?  The existing check for the entry in our L2 entry cache should be sufficient — why don't we populate LLE cache with on-link off-prefix routers?

It's not clear to me the exact ordering, but it seems somehow we get a router advertisement and insert it into our routing table without populating the LLE of the sender in the LLE cache.  I think we must be violating the following somehow (or ignoring SHOULD):

   After extracting information from the fixed part of the Router
   Advertisement message, the advertisement is scanned for valid
   options.  If the advertisement contains a Source Link-Layer Address
   option, the link-layer address SHOULD be recorded in the Neighbor
   Cache entry for the router (creating an entry if necessary) and the
   IsRouter flag in the Neighbor Cache entry MUST be set to TRUE.  If no
   Source Link-Layer Address is included, but a corresponding Neighbor
   Cache entry exists, its IsRouter flag MUST be set to TRUE.

Maybe it's bogus that nd6_onlink_ns_rfc4861 defaults to off?
Comment 14 Andrey V. Elsukov freebsd_committer 2018-11-21 08:56:45 UTC
(In reply to Conrad Meyer from comment #13)
> (In reply to Andrey V. Elsukov from comment #7)
> Isn't this patch a bit of a kludge?  The existing check for the entry in our
> L2 entry cache should be sufficient — why don't we populate LLE cache with
> on-link off-prefix routers?
> It's not clear to me the exact ordering, but it seems somehow we get a
> router advertisement and insert it into our routing table without populating
> the LLE of the sender in the LLE cache.

Such a route can be added by an administrator. The main user complaint is that for IPv4 you can add a route like `route add -host A.B.C.D -iface em0`, but for IPv6 this won't work, because you need to have a prefix configured on the interface. Without the prefix, ND6 will think that the address is not a neighbor on this link and won't send NS, and you will get an ENOBUFS error when you try to send a packet to the specified host. This patch adds the check, so the kernel will now at least try to resolve the address on the interface.
So, in general you are able to add an on-link route to your gateway like this:
route -6 add -host fd00::1 -iface em0
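
For the original use case, a default route through that gateway would presumably follow the on-link host route. The sketch below only prints the two commands as a dry run; `print_onlink_gw_cmds` is a hypothetical helper, fd00::1 and em0 are the example values from above, and the whole thing assumes a kernel with the proposed patch applied.

```sh
#!/bin/sh
# Dry-run sketch: print (do not execute) the route(8) commands for a
# gateway that lies outside every prefix configured on the interface.
# Assumes the proposed patch; without it the host route alone does not
# make ND6 resolve the gateway.
print_onlink_gw_cmds() {
    gw=$1
    iface=$2
    # 1. Declare the gateway directly reachable on the interface.
    echo "route -6 add -host ${gw} -iface ${iface}"
    # 2. Send all other IPv6 traffic through that gateway.
    echo "route -6 add default ${gw}"
}

print_onlink_gw_cmds fd00::1 em0
```

With this arrangement the interface keeps its /64, so other hosts in the provider's /48 are reached via the gateway rather than treated as on-link.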
Comment 15 Conrad Meyer freebsd_committer 2018-11-21 16:38:52 UTC
(In reply to Andrey V. Elsukov from comment #14)
I see, thanks for explaining Andrey.
Comment 16 Jamie Landeg-Jones 2019-01-05 15:07:41 UTC
(In reply to Bjoern A. Zeeb from comment #4)

I have some VPSes with Vultr, and their default FreeBSD image seems to be set up in a similar way, though they use accept_rtadv.

Your router-ping idea works there:

catflap% netstat -rn6|grep vtnet0

default                           fe80::fc00:ff:fe05:f2a7%vtnet0 UG    20736465   1500   vtnet0
2001:19f0:300:2185::/64           link#1                        U             0   1500   vtnet0
fe80::%vtnet0/64                  link#1                        U        343447   1500   vtnet0
fe80::5400:ff:fe05:f2a7%vtnet0    link#1                        UHS         107  16384      lo0

catflap% ping6 -n ff02::2%vtnet0
PING6(56=40+8+8 bytes) fe80::5400:ff:fe05:f2a7%vtnet0 --> ff02::2%vtnet0
16 bytes from fe80::fc00:ff:fe05:f2a7%vtnet0, icmp_seq=0 hlim=64 time=0.149 ms
16 bytes from fe80::fc00:ff:fe05:f2a7%vtnet0, icmp_seq=1 hlim=64 time=0.130 ms
16 bytes from fe80::fc00:ff:fe05:f2a7%vtnet0, icmp_seq=2 hlim=64 time=0.149 ms
--- ff02::2%vtnet0 ping6 statistics ---
3 packets transmitted, 3 packets received, 0.0% packet loss
round-trip min/avg/max/std-dev = 0.130/0.143/0.149/0.009 ms

catflap% grep vtnet0_ipv6 /etc/rc.conf
ifconfig_vtnet0_ipv6="inet6 2001:19f0:300:2185::1:1 prefixlen 64 accept_rtadv"
Comment 17 Jamie Landeg-Jones 2019-01-05 15:12:39 UTC
ifconfig_vtnet0_ipv6="inet6 2001:19f0:300:2185::1:1 prefixlen 64 accept_rtadv"
rtsold_flags="-Fa"      # Flags to an IPv6 router solicitation
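
To check that a configuration like this actually ends up with a link-local default router, something along these lines might help. It is only a sketch: `default_gw6` and `is_link_local` are hypothetical helpers, and the link-local test is a simplification that only matches addresses written with the usual fe80: prefix.

```sh
#!/bin/sh
# default_gw6: read `netstat -rn6` output on stdin and print the gateway
# column of the first "default" entry.
default_gw6() {
    awk '$1 == "default" { print $2; exit }'
}

# is_link_local: succeed if the address starts with fe80:
# (simplified; fe80::/10 formally covers more than this textual prefix).
is_link_local() {
    case $1 in
        fe80:*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Possible usage on the VPS:
#   gw=$(netstat -rn6 | default_gw6)
#   is_link_local "$gw" && echo "default router is link-local: $gw"
```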