SLAAC is supposed to both configure an interface and add its routes to the routing table. Most of the time it succeeds. However, the test case for BUG196361 revealed occasional failures. If you configure an epair interface (both sides) immediately after creating it with "ifconfig epair create", sometimes the interface will get configured but no routes will be added.

Workarounds are:
1) Add a 1 second sleep between "ifconfig epair create" and statically configuring the A half of the epair. It is not sufficient to add the sleep between statically configuring the A half and using SLAAC to configure the B half.
2) Add a longish (precise time unknown, but > 5 seconds) sleep between destroying an epair interface and creating a new one. This bug has not been observed the first time that an epair is created.

The test case, currently disabled, is sys/netinet/fibs_test:slaac_on_nondefault_fib6
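Workaround 1 can be sketched as a small shell helper. This is only an illustration, not code from the test suite: the function name create_epair_settled is hypothetical, and it assumes that "ifconfig epair create" prints the name of the A half (for example "epair0a") on stdout.

```sh
# Hypothetical helper: create an epair, then pause before anything
# configures it. Prints the base name (e.g. "epair0") on success.
# Assumption: "ifconfig epair create" prints the A-half name, e.g. "epair0a".
create_epair_settled() {
    delay=${1:-1}                       # seconds to wait; 1 per workaround 1
    epair_a=$(ifconfig epair create) || return 1
    sleep "$delay"                      # must come BEFORE configuring the A half
    echo "${epair_a%a}"                 # strip the trailing "a"
}

# Example use on a FreeBSD host, as root (addresses are placeholders):
#   base=$(create_epair_settled 1)
#   setfib 1 ifconfig "${base}a" inet6 2001:db8::1/64 fib 1
#   setfib 2 ifconfig "${base}b" inet6 -ifdisabled accept_rtadv fib 2 up
```

The delay sits between creation and the static configuration of the A half because, per the description above, sleeping later (between configuring A and configuring B) does not help.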
The problem seems to be that if you destroy an epair and then recreate it within about 60s, the SLAAC address from the previous (destroyed) interface gets assigned to the newly created interface. I don't yet know why, but I can demonstrate it by running fibs_test:slaac_on_nondefault_fib6 twice in a row with the attached patch applied. The patch randomizes the addresses for each iteration. Note how the second run's failure message shows 2001:db8:3325:4cc5:ff:c0ff:fe00:60b assigned to epair0b. This matches the prefix from the first run, not the prefix from the second run.

$ sudo kyua debug fibs_test:slaac_on_nondefault_fib6 && sudo kyua debug fibs_test:slaac_on_nondefault_fib6
fib is 2
fib is 3
net.inet6.ip6.forwarding: 1 -> 1
net.inet6.ip6.rfc6204w3: 1 -> 1
PREFIX is 2001:db8:3325:4cc5
setfib 2 ifconfig epair0a inet6 2001:db8:3325:4cc5::2/64 fib 2
setfib 3 ifconfig epair0b inet6 -ifdisabled accept_rtadv fib 3 up
Executing command [ ifconfig epair0b ]
Executing command [ netstat -rnf inet6 -F 3 ]
Executing command [ netstat -rnf inet6 -F 3 ]
Executing command [ netstat -rnf inet6 -F 3 ]
Executing command [ netstat -rnf inet6 -F 0 ]
Executing command [ netstat -rnf inet6 -F 0 ]
Executing command [ netstat -rnf inet6 -F 0 ]
Executing command [ netstat -rnf inet6 -F 1 ]
Executing command [ netstat -rnf inet6 -F 1 ]
Executing command [ netstat -rnf inet6 -F 1 ]
ifconfig epair0a destroy
net.inet6.ip6.rfc6204w3: 1 -> 1
net.inet6.ip6.forwarding: 1 -> 1
fibs_test:slaac_on_nondefault_fib6 -> passed
fib is 2
fib is 3
net.inet6.ip6.forwarding: 1 -> 1
net.inet6.ip6.rfc6204w3: 1 -> 1
PREFIX is 2001:db8:78e6:5bce
setfib 2 ifconfig epair0a inet6 2001:db8:78e6:5bce::2/64 fib 2
setfib 3 ifconfig epair0b inet6 -ifdisabled accept_rtadv fib 3 up
Executing command [ ifconfig epair0b ]
ifconfig epair0a destroy
net.inet6.ip6.rfc6204w3: 1 -> 1
net.inet6.ip6.forwarding: 1 -> 1
Fail: regexp inet6 2001:db8:78e6:5bce:.*prefixlen 64.*autoconf not in stdout
epair0b: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8<VLAN_MTU>
        ether 02:ff:c0:00:06:0b
        inet6 fe80::ff:c0ff:fe00:60b%epair0b prefixlen 64 scopeid 0x6
        inet6 2001:db8:3325:4cc5:ff:c0ff:fe00:60b prefixlen 64 tentative detached autoconf
        nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
        media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
        status: active
        fib: 3
        groups: epair
Files left in work directory after failure: forwarding.state, ifaces_to_cleanup, rfc6204w3.state, rtadvd.pid, rtadvd.sock
fibs_test:slaac_on_nondefault_fib6 -> failed: atf-check failed; see the output of the test for details
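The check that fails in the second run can be reproduced by hand with a filter over ifconfig output. This is a minimal sketch built around the regexp quoted in the failure message; the function name has_slaac_addr is hypothetical and not part of fibs_test.sh.

```sh
# Hypothetical helper: reads ifconfig output on stdin and succeeds only if
# it shows an autoconf (SLAAC) address under the given /64 prefix.
# $1 is the prefix without a trailing colon, e.g. "2001:db8:78e6:5bce".
has_slaac_addr() {
    # Same pattern as the test's failure message:
    #   inet6 <PREFIX>:.*prefixlen 64.*autoconf
    grep -Eq "inet6 $1:.*prefixlen 64.*autoconf"
}

# Example use on a FreeBSD host:
#   ifconfig epair0b | has_slaac_addr 2001:db8:78e6:5bce || echo "stale or missing SLAAC address"
```

Fed the epair0b output above, the filter matches the first run's prefix (2001:db8:3325:4cc5) and rejects the second run's prefix (2001:db8:78e6:5bce), which is the mismatch being pointed out.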
Created attachment 180917: Patch for the slaac_on_nondefault_fib6 testcase
In https://reviews.freebsd.org/D9451, jhujhiti noticed that the IFDISABLED flag doesn't get added to epair0b immediately upon creation, but shortly after. He suspected that was the cause of the problem. However, it's not; it's just a red herring. The IFDISABLED flag gets added by devd, which runs pccard_ether, which runs "service netif quietstart" on the new interfaces. Disabling devd does not fix the test. On the second interface creation, the address from the previous run is still present.
Turned out to be a bug in the test, not the kernel. Fixed in r315656
A commit references this bug:

Author: asomers
Date: Mon Mar 20 23:07:35 UTC 2017
New revision: 315656
URL: https://svnweb.freebsd.org/changeset/base/315656

Log:
  Fix back-to-back runs of sys/netinet/fibs_test;slaac_on_nondefault_fib6

  This test was failing if run twice because rtadvd takes too long to die. The rtadvd process from the first run was still running when the second run created its interfaces. The solution is to use SIGKILL during the cleanup instead of SIGTERM so rtadvd will die faster.

  While I'm here, randomize the addresses used for the test, which makes bugs like this easier to spot, and fix the cleanup order to be the opposite of the setup order

  PR: 217871
  MFC after: 18 days
  X-MFC-With: 315458
  Sponsored by: Spectra Logic Corp

Changes:
  head/tests/sys/netinet/fibs_test.sh
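The race described in the log (an old rtadvd still alive when the next run creates its interfaces) can also be guarded against by waiting for the process to disappear after killing it. The sketch below is an illustration only and is not what r315656 does; the commit only switches the signal to SIGKILL. The function name kill_and_wait and its timeout parameter are hypothetical.

```sh
# Hypothetical helper: SIGKILL the process named in a pid file, then poll
# until it is really gone (or the timeout in seconds expires).
kill_and_wait() {
    pidfile=$1
    timeout=${2:-10}
    pid=$(cat "$pidfile") || return 1
    kill -KILL "$pid" 2>/dev/null       # SIGKILL, as in the commit, not SIGTERM
    # "kill -0" sends no signal; it only tests whether the pid still exists.
    while kill -0 "$pid" 2>/dev/null; do
        [ "$timeout" -le 0 ] && return 1
        sleep 1
        timeout=$((timeout - 1))
    done
    rm -f "$pidfile"
}

# Example use in a cleanup routine:
#   kill_and_wait rtadvd.pid 10 || echo "rtadvd is still running"
```

Polling with a bound keeps the cleanup from hanging forever if the process cannot be killed, while still ensuring the next run does not start alongside the previous daemon.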
This is a good find, and it does fix rapid test runs, but unfortunately the original issue I had while developing persists. The rtsol in the test times out unless I insert a sleep like so:

# Configure epair interfaces
get_epair
sleep 1
setup_iface "$EPAIRA" "$FIB0" inet6 ${ADDR} ${MASK}
echo setfib $FIB1 ifconfig "$EPAIRB" inet6 -ifdisabled accept_rtadv fib $FIB1 up
setfib $FIB1 ifconfig "$EPAIRB" inet6 -ifdisabled accept_rtadv fib $FIB1 up

This sleep after epair creation is enough to fix it consistently. Moving the sleep down one line, below setup_iface, does not fix it. So it would seem that the issue is with the router interface rather than the client interface. I'm a bit puzzled as to the cause... It seems to be an issue prior to initializing inet6 on the interface at all.
How do you reproduce the failure now? I haven't seen any failures since 315656.
(In reply to Alan Somers from comment #7)

I'm simply running the latest CURRENT (well, master on the github mirror) unmodified. GENERIC kernel with the following /boot/loader.conf:

kern.geom.label.gptid.enable="0"
zfs_load="YES"
coretemp_load="YES"
if_epair_load="YES"
net.fibs=3
net.add_addr_allfibs=0
boot_multicons="YES"
boot_serial="YES"
comconsole_speed="57600"
console="comconsole,vidconsole"

The git HEAD it's running right now is c83649c43c7 which looks like r315762 in SVN.
But what command are you running? "kyua test" of the entire directory or just that single test, or something else? How often do you need to run it before it fails?
(In reply to Alan Somers from comment #9)

Ah, of course :)

kyua test -k /usr/tests/sys/netinet/Kyuafile

It fails right after boot, and consistently after that. rtsol tries for 10 seconds and will send multiple router solicitations and not get a reply to any of them. If I remove the sleep and insert ifconfig "$EPAIRA" like this...

# Configure epair interfaces
get_epair
ifconfig "$EPAIRA"
setup_iface "$EPAIRA" "$FIB0" inet6 ${ADDR} ${MASK}
ifconfig "$EPAIRA"

... the only unusual thing I see is a duplicate ether address that disappears after configuration:

Standard output:
fib is 1
fib is 2
net.inet6.ip6.forwarding: 0 -> 1
net.inet6.ip6.rfc6204w3: 0 -> 1
### first ifconfig
epair0a: flags=8842<BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8<VLAN_MTU>
        ether 02:00:c0:00:04:0a
        ether 02:00:c0:00:04:0a
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
        status: active
        groups: epair
setfib 1 ifconfig epair0a inet6 2001:db8:3e0d:a5a3::2/64 fib 1
### second ifconfig
epair0a: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8<VLAN_MTU>
        ether 02:00:c0:00:04:0a
        inet6 2001:db8:3e0d:a5a3::2 prefixlen 64 tentative
        inet6 fe80::c0ff:fe00:40a%epair0a prefixlen 64 tentative scopeid 0x4
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
        status: active
        fib: 1
        groups: epair
setfib 2 ifconfig epair0b inet6 -ifdisabled accept_rtadv fib 2 up
Executing command [ ifconfig epair0b ]
ifconfig epair0a destroy
net.inet6.ip6.forwarding: 1 -> 0
net.inet6.ip6.rfc6204w3: 1 -> 0

Standard error:
Fail: regexp inet6 2001:db8:3e0d:a5a3:.*prefixlen 64.*autoconf not in stdout
epair0b: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8<VLAN_MTU>
        ether 02:ff:c0:00:05:0b
        inet6 fe80::ff:c0ff:fe00:50b%epair0b prefixlen 64 scopeid 0x5
        nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
        media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
        status: active
        fib: 2
        groups: epair

My initial thought was that this is a DAD issue, but if that were the case, a sleep after adding the inet6 address should be what fixes it, rather than before.
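One way to test the DAD hypothesis directly, rather than with a fixed sleep, is to wait until no inet6 address on the router interface is still marked "tentative" before starting rtadvd. The helper below is only a sketch for that experiment: the name wait_for_dad and its retry count are hypothetical, and it relies solely on the "tentative" keyword visible in the ifconfig output above.

```sh
# Hypothetical helper: poll an interface until none of its inet6 addresses
# is flagged "tentative" (i.e. DAD has finished), or give up.
# $1 = interface name, $2 = maximum number of one-second tries (default 10).
wait_for_dad() {
    iface=$1
    tries=${2:-10}
    while [ "$tries" -gt 0 ]; do
        # Note: a nonexistent interface yields no output and counts as "done".
        if ! ifconfig "$iface" | grep -q 'inet6 .*tentative'; then
            return 0
        fi
        sleep 1
        tries=$((tries - 1))
    done
    return 1
}

# Example use between setup_iface and starting rtadvd:
#   wait_for_dad "$EPAIRA" 10 || echo "DAD still pending on $EPAIRA"
```

If the failure persists even when this wait succeeds, that would support the conclusion above that DAD is not the cause.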
I just upgraded to r315874 and decreased net.fibs from 4 to 3 but I still can't reproduce your failure.
(In reply to Alan Somers from comment #11) I'll rebuild my test machine from scratch over the weekend and see if it goes away.
Good news and bad news: The issue happens on a fresh install (11.0-RELEASE-p1 -> Subversion head), but I've somewhat isolated it. This simple script fails consistently - epair0b does not get an address:

ifconfig epair0 create
setfib 1 ifconfig epair0a inet6 2001:db8::1/64 fib 1
setfib 2 ifconfig epair0b inet6 -ifdisabled accept_rtadv fib 2 up
rtadvd -p rtadvd.pid -C rtadvd.sock -c /dev/null epair0a
rtsol epair0b
ifconfig epair0b
pkill -kill -F rtadvd.pid
rm -f rtadvd.pid rtadvd.sock
ifconfig epair0a destroy

If I remove "setfib 1", on the first run after boot, rtsol will complain that epair0b is disabled (and the subsequent ifconfig does show that the IFDISABLED flag is never removed from nd6 options, odd...), but epair0b successfully gets an address on every execution of the script after that.

While this is pretty clearly some kind of bug, could the test case avoid it by not calling setfib in setup_iface? I'm not clear on why setfib is used given that the ifconfig always uses the fib argument too.
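To make the IFDISABLED observation scriptable, the nd6 options line can be parsed out of ifconfig output. The helper below is a sketch with a hypothetical name (nd6_has_flag); it assumes the "nd6 options=NN<FLAG,FLAG,...>" format shown in the ifconfig output earlier in this report.

```sh
# Hypothetical helper: reads ifconfig output on stdin and succeeds if the
# "nd6 options=...<...>" list contains exactly the named flag.
# $1 = flag name, e.g. IFDISABLED or ACCEPT_RTADV.
nd6_has_flag() {
    flag=$1
    # Extract the comma-separated list between < and >, one flag per line,
    # then look for an exact whole-line match.
    sed -n 's/.*nd6 options=[0-9a-f]*<\([^>]*\)>.*/\1/p' |
        tr ',' '\n' |
        grep -Fqx -- "$flag"
}

# Example use right before rtsol in the script above:
#   ifconfig epair0b | nd6_has_flag IFDISABLED && echo "epair0b is still IFDISABLED"
```

Matching whole flag names avoids false positives from substrings, and the check can be dropped in before rtsol to confirm whether the flag was really cleared by "-ifdisabled".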
A commit references this bug:

Author: asomers
Date: Mon Apr 17 20:13:22 UTC 2017
New revision: 317067
URL: https://svnweb.freebsd.org/changeset/base/317067

Log:
  MFC r313025, r313395, r314113, r314442, r315458, r315656

  r313025:
  Add tests for multi-fib IPv6 routing

  PR: 196361
  Submitted by: jhujhiti@adjectivism.org
  Reported by: Jason Healy <jhealy@logn.net>
  MFC after: 4 weeks
  Sponsored by: Spectra Logic Corp

  r313395:
  Add fibs_test:udp_dontroute6, another IPv6 multi-FIB test

  PR: 196361
  MFC after: 3 weeks
  Sponsored by: Spectra Logic Corp

  r314113:
  Remove tests/sys/netinet/fibs_tests's dependency on net/socat

  Instead of bridging two tap interfaces with socat, just use an epair pair.

  MFC after: 3 weeks
  Sponsored by: Spectra Logic Corp

  r314442:
  Add an ATF test for IPv6 SLAAC with multiple fibs

  Tests that an interface can get a SLAAC address and that it inserts its routes into the correct fib. Does not test anything to do with NDP.

  PR: 196361
  Reviewed by: Erick Turnquist <jhujhiti@adjectivism.org>
  MFC after: 3 weeks
  Sponsored by: Spectra Logic Corp
  Differential Revision: https://reviews.freebsd.org/D9776

  r315458:
  Constrain IPv6 routes to single FIBs when net.add_addr_allfibs=0

  sys/netinet6/icmp6.c
    Use the interface's FIB for source address selection in ICMPv6 error responses.

  sys/netinet6/in6.c
    In in6_newaddrmsg, announce arrival of local addresses on the interface's FIB only. In in6_lltable_rtcheck, use a per-fib ND6 cache instead of a single cache.

  sys/netinet6/in6_src.c
    In in6_selectsrc, use the caller's fib instead of the default fib. In in6_selectsrc_socket, remove a superfluous check.

  sys/netinet6/nd6.c
    In nd6_lle_event, use the interface's fib for routing socket messages. In nd6_is_new_addr_neighbor, check all FIBs when trying to determine whether an address is a neighbor. Also, simplify the code for point to point interfaces.

  sys/netinet6/nd6.h
  sys/netinet6/nd6.c
  sys/netinet6/nd6_rtr.c
    Make defrouter_select fib-aware, and make all of its callers pass in the interface fib.

  sys/netinet6/nd6_nbr.c
    When inputting a Neighbor Solicitation packet, consider the interface fib instead of the default fib for DAD. Output NS and Neighbor Advertisement packets on the correct fib.

  sys/netinet6/nd6_rtr.c
    Allow installing the same host route on different interfaces in different FIBs. If rt_add_addr_allfibs=0, only install or delete the prefix route on the interface fib.

  tests/sys/netinet/fibs_test.sh
    Clear some expected failures, but add a skip for the newly revealed BUG217871.

  PR: 196361
  Submitted by: Erick Turnquist <jhujhiti@adjectivism.org>
  Reported by: Jason Healy <jhealy@logn.net>
  Reviewed by: asomers
  MFC after: 3 weeks
  Sponsored by: Spectra Logic Corp
  Differential Revision: https://reviews.freebsd.org/D9451

  r315656:
  Fix back-to-back runs of sys/netinet/fibs_test;slaac_on_nondefault_fib6

  This test was failing if run twice because rtadvd takes too long to die. The rtadvd process from the first run was still running when the second run created its interfaces. The solution is to use SIGKILL during the cleanup instead of SIGTERM so rtadvd will die faster.

  While I'm here, randomize the addresses used for the test, which makes bugs like this easier to spot, and fix the cleanup order to be the opposite of the setup order

  PR: 217871
  MFC after: 18 days
  X-MFC-With: 315458
  Sponsored by: Spectra Logic Corp

Changes:
_U stable/11/
  stable/11/sys/netinet6/icmp6.c
  stable/11/sys/netinet6/in6.c
  stable/11/sys/netinet6/in6_src.c
  stable/11/sys/netinet6/nd6.c
  stable/11/sys/netinet6/nd6.h
  stable/11/sys/netinet6/nd6_nbr.c
  stable/11/sys/netinet6/nd6_rtr.c
  stable/11/tests/sys/netinet/fibs_test.sh
  stable/11/tests/sys/netinet/udp_dontroute.c