Bug 261129 - IPv6 default route vanishes with rtadvd/rtsold
Summary: IPv6 default route vanishes with rtadvd/rtsold
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: 13.0-RELEASE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: Alexander V. Chernikov
URL: https://meka.rs/blog/2022/01/15/freeb...
Keywords:
Depends on:
Blocks:
 
Reported: 2022-01-11 21:48 UTC by Goran Mekić
Modified: 2024-10-27 16:48 UTC
CC List: 12 users

See Also:


Attachments
router-ifconfig (1.01 KB, text/plain)
2022-01-11 21:49 UTC, Goran Mekić
router-netstat (2.12 KB, text/plain)
2022-01-11 21:50 UTC, Goran Mekić
router-rc.conf (553 bytes, text/plain)
2022-01-11 21:50 UTC, Goran Mekić
desktop-netstat (1.62 KB, text/plain)
2022-01-11 21:51 UTC, Goran Mekić
desktop-rc.conf (106 bytes, text/plain)
2022-01-11 21:52 UTC, Goran Mekić
rtadvd.conf (41 bytes, text/plain)
2022-01-11 22:00 UTC, Goran Mekić

Description Goran Mekić 2022-01-11 21:48:51 UTC
As I only started researching IPv6, there's a high probability I did something wrong, so if you see something off, please let me know. What I observe is the desktop acquiring an IPv6 address and setting a proper default route, and after a few minutes that route is gone. Strangely, when IPv6 is configured through dhcpcd, the default route remains. I saw the same behavior on a laptop which has wlan0 and em0 as part of lagg0, so if there's a wrong config somewhere, I suspect it's on the router.
Comment 1 Goran Mekić 2022-01-11 21:49:45 UTC
Created attachment 230928
router-ifconfig
Comment 2 Goran Mekić 2022-01-11 21:50:15 UTC
Created attachment 230929
router-netstat
Comment 3 Goran Mekić 2022-01-11 21:50:45 UTC
Created attachment 230930
router-rc.conf
Comment 4 Goran Mekić 2022-01-11 21:51:13 UTC
Created attachment 230931
desktop-netstat
Comment 5 Goran Mekić 2022-01-11 21:52:03 UTC
Created attachment 230932
desktop-rc.conf
Comment 6 Goran Mekić 2022-01-11 22:00:16 UTC
Created attachment 230934
rtadvd.conf
Comment 7 Goran Mekić 2022-01-12 22:43:03 UTC
I would like to examine the traffic with wireshark, as the machine in question is a desktop anyway, but I don't know what to look for. As an IPv6 novice, I don't know enough about how router advertisement works, but other than that I'm happy to dig into the code of rtadvd/rtsold and figure out what's wrong. I "just" need help with the IPv6 RA protocol.

One of the ideas proposed to me was to give dnsmasq a try and, if it works, compare the differences. I'll experiment with it for a while, but somebody helping me understand RA would speed up the process a lot.
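
For anyone capturing the traffic, a reasonable starting point is to look only at Router Solicitations (ICMPv6 type 133) and Router Advertisements (type 134). The sketch below is an assumption-laden example, not something from the report: the interface name re0 is a placeholder, and the filter assumes no IPv6 extension headers precede the ICMPv6 header, so the type byte sits at offset 40. In each RA, the router lifetime field is the value to check: per RFC 4861, a lifetime of 0 means the sender should not be used as a default router, and a non-zero lifetime bounds how long the default route stays valid unless a later RA refreshes it.

```shell
#!/bin/sh
# pcap filter for Router Solicitation (type 133) and Router Advertisement (type 134).
# Assumes no IPv6 extension headers, so the ICMPv6 type is the byte at offset 40.
RA_FILTER='icmp6 and (ip6[40] == 133 or ip6[40] == 134)'

# capture_ra IFACE: print matching packets verbosely (needs root).
# The default interface name re0 is a placeholder; pass your own.
capture_ra() {
    tcpdump -i "${1:-re0}" -n -vv "$RA_FILTER"
}

# Example: capture_ra re0
```

Comparing the interval between RAs with the advertised router lifetime may show whether the route simply expires before the next advertisement arrives.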
Comment 8 Shawn Webb 2022-01-12 22:52:30 UTC
I'm seeing somewhat the same behavior. I'm unsure whether the default route is set in the first place (then removed), though. I am seeing that the IPv6 RAs coming from my router/firewall include setting the default route. On some systems, the default route is indeed set. On others, it's not.

Note that all non-BSD systems (Android phone, smart TVs, etc.) have their default route set.
Comment 9 Goran Mekić 2022-01-12 23:37:54 UTC
I tried dhcpcd and it does set the route as it should. The problem is that when I use dhcpcd on the host, vnet jails don't have network access unless I set IPv6 addresses on both ends of the epair. If I use rtsold on the host, vnet jails work without setting an address on the host's end of the epair. I don't know if this helps at all with this issue or if it's a completely isolated problem.
Comment 10 Goran Mekić 2022-01-14 14:17:19 UTC
If I add ipv6_defaultrouter="..." (value the same that rtsold sets for a while), I get this:

netstat -rn6 | grep '^default'
default                           fe80::5a9c:fcff:fe10:6c2c%re0 UG          re0
default                           fe80::5a9c:fcff:fe10:6c2c%re0 UGS         re0

One is a static route, and the other was acquired through rtsold. Interestingly, neither route is ever lost, but it's a slightly awkward situation.
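
The two entries differ only in the S flag, which netstat(1) uses to mark manually added (static) routes. As a minimal sketch for telling them apart when logging the routing table over time, the helper below labels each default route by that flag. The function name classify_default6 is hypothetical, and it assumes the column order shown above (destination, gateway, flags, interface).

```shell
#!/bin/sh
# classify_default6: read `netstat -rn6` output on stdin and label each
# default route by whether its flags column contains S (static).
classify_default6() {
    awk '$1 == "default" {
        kind = ($3 ~ /S/) ? "static" : "dynamic"
        print kind, $2, $4
    }'
}

# Example: netstat -rn6 | classify_default6
```

Running this periodically (for instance from a loop with a timestamp) would show which of the two entries disappears and when.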
Comment 11 Goran Mekić 2022-01-22 16:08:30 UTC
In my blog post I detailed the whole setup, so it might come in handy.
Comment 12 Marek Zarychta 2022-06-12 17:33:10 UTC
I am also hitting this on two hosts with statically configured IPv6 addresses and routes, but the setup is more complex here. 

I am using two different subnets:
1) 2a02:a:b:c::x:y/64 - the subnet with default route.
2) 2001:470:x:y::v:z/64 - fib 1 subnet assigned from still working IPv6 tunnel to HE.

The first of the machines is an endpoint for the HE tunnel and acts as a router for this 2001:470:x:y::v:z/64 tunneled subnet, but the main NIC address is set from the ISP-assigned pool:

awg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=8000b<RXCSUM,TXCSUM,VLAN_MTU,LINKSTATE>
        (...)
	inet6 2a02:a:b:c::x:y prefixlen 64 prefer_source #(basic assignment, prefer_source fixes connectivity via default gw)
	inet6 2001:470:x:y::1 prefixlen 64  #(alias 1)
	inet6 2001:470:x:y::x:z prefixlen 128 #(alias 2)
	inet6 2a02:a:b:c::x:z  prefixlen 128 #(alias 2)
	media: Ethernet autoselect (1000baseT <full-duplex>)
	status: active
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>

Default gateways:
1. fib 0:
# netstat -rn6 | grep default
default     2001:470:x:y::1    UGS        awg0
2. fib 1:
# netstat -rn6 -F 1 | grep default
default  2001:470:1:c84::16    UGS        gif0

The second machine is a host on the LAN with IPv6-only jails and also has both subnets enabled plus some aliased IPv6 addresses set like above.

Initially, everything works, but after some time it looks like an RA message from the gif(4) tunnel to HE somehow changes the default route in fib 1, which breaks connectivity. The default route is set to 2001:470:1:x::y. "service routing restart" fixes this, but with the notice: "[nhop_ctl] inet6.0 nhop_free: failed to unlink nh#8/inet6/awg0/2001:470:1:x::y"
 
The same happens on the second machine, but events are not in sync with the first machine. The RAs arrive from different sources, times of the route change differ, also different gateways are set.

I wonder whether the traffic could/should be filtered on gif(4).
Now I am trying the sysctl setting "net.inet6.ip6.no_radr=1"; this could probably fix the issue.

Any additional clues will be appreciated.
Comment 13 Marek Zarychta 2022-06-12 18:02:49 UTC
(In reply to Marek Zarychta from comment #12)
>Default gateways:
>1. fib 0:
># netstat -rn6 | grep default
>default     2001:470:x:y::1    UGS        awg0
>2. fib 1:
># netstat -rn6 -F 1 | grep default
>default  2001:470:1:c84::16    UGS        gif0

Errata, the initial settings are:

Default gateways:
1. fib 0:
# netstat -rn6 | grep default
default   2a02:a:b:c::1     UGS       awg0
2. fib 1:
# netstat -rn6 -F 1 | grep default
default  2001:470:a:b::1   UGS        gif0
Comment 14 Tatsuki Makino 2022-06-13 07:41:30 UTC
I have forgotten a lot of things, as my current environment is still not configured for IPv6.... :)

We can see the neighbors' status with "ndp -n -a".
We can even find neighbors with "ping6 -c 1 ff02::1%<ifname like lo0 here>".

I would be interested to know if there is a difference between good and bad :)
Comment 15 Marek Zarychta 2022-06-13 09:13:28 UTC
(In reply to Tatsuki Makino from comment #14)
I have hijacked this PR, but you are probably referring to Goran's original report.

I have made no progress with solving my issue. So far I found that with the settings:
ifconfig gif0 inet6 no_radr
ifconfig awg0 inet6 no_radr
and sysctls:
net.inet6.ip6.accept_rtadv=0
net.inet6.ip6.no_radr=1
the default route still gets overwritten with 2001:470:1:c84::16, 2001:470:1:c84::28 or 2001:470:1:c84::29.

I will check if filtering icmp6 on gif(4) helps and maybe will turn this off to check if incoming RAs from gif(4) are really the culprit.
Comment 16 Marek Zarychta 2022-06-14 07:44:12 UTC
After taking some measurements and running tests, I have so far come to the following conclusions:

1. The default route gets _silently_ corrupted regardless of the deployed route.algo, for an unknown reason (overflow?), with no traces observable either with route(8) monitor or with an increased net.route.algo.debug_level.

2. Setting net.inet6.ip6.no_radr=1 net.inet6.ip6.accept_rtadv=0 and respectively disabling these options on the interfaces doesn't prevent the default route from being corrupted. Changing the value of net.add_addr_allfibs doesn't help either. 

3. Restarting "routing" service fixes the issue for some time.

4. The FreeBSD routing stack in its current state doesn't allow using two different IPv6 GUA subnets on the same interface in production, on either 13-STABLE or CURRENT (tested on amd64 and arm64).

5. PF(4) supports rtable, route-to and reply-to for IPv6 traffic allowing deployment of more advanced network scenarios.

6. IPv6 on FreeBSD still needs more testing, especially in multihomed scenarios where multiple FIBs are involved.
Comment 17 Marek Zarychta 2022-06-17 05:58:27 UTC
Update

A couple of days ago I rewrote the set of slapdash PF rules suspecting them as the cause, especially initially abused "rtable" statements. The "rtable" had been replaced with "reply-to" or deleted where possible and it was the right step. It is worth mentioning that a few rules with "rtable" were preserved though. When the issue got sorted out, to find the culprit one of the borked rules used previously was reintroduced:
"pass in quick on gif0 inet6 to ($gif_if) rtable 1"
which led to the corruption of the default route in FIBs 0 and 1 within a few hours. Maybe this happens due to incorrectly recognised protocol 41? 

Final conclusions:

1. FreeBSD routing stack is capable of using two different IPv6 GUA subnets on the same interface on both CURRENT and 13-STABLE.

2. Rules with "rtable" statements for IPv6 traffic should be introduced with care in the PF(4) configuration file and avoided when possible.

3. IPv6 on FreeBSD still needs more testing, especially in multihomed scenarios where multiple FIBs are involved.

It is probably my fault. I am sorry for making noise on the freebsd-net@ mailing list and hijacking this PR, but the feedback provided might be useful. If you still consider this a bug, please let me know, and I will submit a new PR.
Comment 18 FiLiS freebsd_committer freebsd_triage 2024-03-05 12:05:39 UTC
I ran into a similar issue on 14.0. Adding `ipv6_defaultrouter="fe80::%re0"` to rc.conf solved this one for me.
After rebooting, everything is fine. But when connecting to OpenVPN or tailscale networks and disconnecting afterwards, the v6 route gets lost somehow. v4 seems fine the whole time, even when routing everything through an OpenVPN/tailscale tunnel and then turning it off again.
Comment 19 Tatsuki Makino 2024-10-27 05:36:17 UTC
To build something like the setup shown below, /etc/rc.conf needed to define a value like:
ipv6_cpe_wanif="em0"

Also, it is mandatory to send RA from the bridge interface.
If it is sent by a member of the bridge, it does not work for some reason.

For the time being, this is still a work in progress :)
Below is the figure :)

    (WAN)
      |
      | RA
      v
+-----+-----+
|    em0    |
|           |
|  FreeBSD  |  
|           |
|    em1    |
+-----+-----+
      |
      | RA by rtadvd
      v
    (LAN)
Comment 20 Patrick M. Hausen 2024-10-27 16:48:20 UTC
> Also, it is mandatory to send RA from the bridge interface.

It is mandatory to have all layer 3 configuration, specifically all addresses on the bridge interface. This is documented in the bridge section in the Handbook.

A bridge member interface must not have any IP address. It becomes a layer 2 port like in a switch.
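
As an untested sketch of that rule in /etc/rc.conf form: all layer 3 settings and the RA sender sit on the bridge, while the members are only brought up. The interface names (bridge0, em1, em2) and the 2001:db8::/32 documentation prefix are placeholders, not values from this report. rtadvd also needs a link-local address on bridge0; depending on the release this may have to be enabled explicitly, so check if_bridge(4) before relying on it.

```shell
# /etc/rc.conf fragment (sketch; names and addresses are placeholders)
cloned_interfaces="bridge0"
ifconfig_bridge0="addm em1 addm em2 up"
ifconfig_em1="up"                 # member: layer 2 only, no addresses
ifconfig_em2="up"                 # member: layer 2 only, no addresses
ifconfig_bridge0_ipv6="inet6 2001:db8:0:1::1 prefixlen 64"
ipv6_gateway_enable="YES"
rtadvd_enable="YES"
rtadvd_interfaces="bridge0"       # advertise from the bridge, not a member
```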