Hi,

OpenVPN maintainer here. Going from FreeBSD 11 to FreeBSD 12, I discovered a new effect that I can only explain by "there is a race condition in the kernel", and it affects us in non-pretty ways.

What I observe:

- the tun or tap interface is initialized "the normal way" (like always)

- that is, instantiate the interface by open("/dev/tun0", O_RDWR), then configure it by exec()'ing "ifconfig" statements for IPv4 and IPv6

- afterwards, the interface "frequently but not always" has IFDISABLED set, and that sticks

  tun0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> metric 0 mtu 1500
          options=80000<LINKSTATE>
          inet6 fe80::250:56ff:fe9c:dffb%tun0 prefixlen 64 tentative scopeid 0x3
          inet6 fd00:abcd:204:2::1000 prefixlen 64 tentative
          inet 10.204.2.6 --> 10.204.2.5 netmask 0xffffffff
          groups: tun
          nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
          Opened by PID 48722

  If the interface is in this state, IPv6 will not work (you can send packets over the interface, but since the IPv6 address is marked "tentative", it will use some other available IPv6 address, which is not accepted by the server).

- running "ifconfig tun0 inet6 -ifdisabled" will always fix the problem and make IPv6 communication work

- if I add an exec() of "ifconfig tun0 inet6 -ifdisabled" right after "ifconfig tun0 inet6 fd00:abcd:...", it will work "more often", but sometimes the IFDISABLED state still happens

- if I add a sleep(1), and *then* "ifconfig -ifdisabled", I can run this 100 times in a row (I did!) and never see the problem

- both tun and tap are affected, but for whatever reason, it seems to happen "more frequently" for me on tap devices. But that might have been random coincidence.

- Testing on older FreeBSD versions (up to and including 11.3) never exhibited this sort of spurious failure. We run lots of client instance tests and I pride myself on "all tests green" :-)

So, is this enough information to trigger some "oh, yes, this makes sense" moment?

Or shall I provide a test program to trigger this (I do not have one but could build one)?
Hi Gert, thank you for your report.

A reduced test case/program would certainly be ideal, but the following information would also be useful if you can provide it:

- Exact FreeBSD versions affected (uname -a)
- Minimum network configuration necessary to reproduce (/etc/rc.conf, openvpn or pure tun/tap config (ideally))
- truss or similar trace output under reproduction (as an attachment)
- /var/run/dmesg.boot output (as an attachment)
- confirmation whether openvpn/openvpn-devel from ports/packages also exhibits this issue.

Additionally, we've had many changes in CURRENT since 12 was branched, so I'd probably test a 13-CURRENT snapshot too (the issue may have been identified and resolved).

^Triage: cc net/openvpn maintainer
(In reply to Gert Doering from comment #0)

Hi,

If you could provide me a test case that reproduces it somewhat reliably, I'd appreciate it; in 12.2, tun/tap have been consolidated into a single driver and some other races fixed along the way. You could try testing on recent stable/12 to see if you can reproduce it at all before writing a reduced test case.

I'll look over the driver briefly and see if I can spot an immediate issue along these lines.
Is there any additional information about the interface in the syslog? It looks like there are only three possible places where ND6_IFF_IFDISABLED can be set, and two of them output errors:

  $ifname: possible hardware address duplication detected, disable IPv6
  Cannot enable an interface with a link-local address marked duplicate.

Note that the second message doesn't include an interface name.
Hi,

thanks for your quick replies. A few answers right away:

- Exact FreeBSD versions affected (uname -a)

FreeBSD fbsd-tc.ov.greenie.net 12.1-RELEASE-p2 FreeBSD 12.1-RELEASE-p2 GENERIC amd64

- Minimum network configuration necessary to reproduce (/etc/rc.conf, openvpn or pure tun/tap config (ideally))

Sharing the actual OpenVPN config is complicated (the config is very simple, but it's part of our CI tests, so without "the other end" it won't do much).

- truss or similar trace output under reproduction (as an attachment)
- /var/run/dmesg.boot output (as an attachment)
- confirmation whether openvpn/openvpn-devel from ports/packages also exhibits this issue.

Right now, my system is stubbornly refusing to show the effect at all, while it nicely did so in "80%" of the cases beforehand... But when it does show up, the effect is the same for openvpn (2.4.x) and openvpn-devel (which is our git master state as of two days ago, with no FreeBSD specific patches).

I think the easiest way is to write a test program, which I'll do tonight and attach.
(In reply to Kyle Evans from comment #3)

  nd6_dad_timer: cancel DAD on tun0 because of ND6_IFF_IFDISABLED.
  nd6_dad_timer: cancel DAD on tun0 because of ND6_IFF_IFDISABLED.

This is what I have, but it's sort of not explaining "why ifdisabled", just logging the consequences.

Might this be related?

  nd6_dad_timer: called with non-tentative address fe80:3::250:56ff:fe9c:dffb(tun0)
(In reply to Gert Doering from comment #5)

Interesting; if you're not getting at least this message:

  $ifname: possible hardware address duplication detected, disable IPv6

then, AFAICT, this probably indicates that rtsold or nd6 has for some reason explicitly disabled it... I guess probably the former.

CC'ing bz@ because I'm less sure now that it's specifically a tun/tap issue and not a broader issue.
(In reply to Kyle Evans from comment #6) s/nd6/ndp/
I do not understand enough of the FreeBSD IPv6 magic bits to reasonably argue either way. The rc.conf of this system is very minimal:

---------------------- cut ------------------------
hostname="fbsd-tc.ov.greenie.net"
ifconfig_em0="inet 194.97.140.21 netmask 255.255.255.224"
defaultrouter="194.97.140.30"
ipv6_defaultrouter="2001:608:0:814::ffff"
ifconfig_em0_ipv6="inet6 2001:608:0:814::f000:21/64"
sshd_enable="YES"
ntpdate_enable="YES"
ntpd_enable="YES"
# Set dumpdev to "AUTO" to enable crash dumps, "NO" to disable
dumpdev="AUTO"
zfs_enable="YES"
inetd_enable=YES # gert, 30.9.19, amanda
---------------------- cut ------------------------

but I'd still argue it should be deterministic either way, like "always IFDISABLED" or "never".

Anyway, test program coming. Might take until tomorrow.
So. I have a nice test program now which can be run in a loop and then looked at later.

As I said, it's somewhat spurious on my system: sometimes it triggers 10 times in a row, then 20 times not at all.

If it triggers, the output looks like this:

  cc -o tester -Wall tester.c
  sudo ./tester tun0
  /dev/tun0 open ok, fd=3
  run cmd: 'ifconfig tun0 inet6 fd00:abcd:204:4::1001/64 mtu 1500 up'
  read 296 bytes, time since start=0s
  tun0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> metric 0 mtu 1500
          options=80000<LINKSTATE>
          inet6 fe80::250:56ff:fe9c:dffb%tun0 prefixlen 64 tentative scopeid 0x3
          inet6 fd00:abcd:204:4::1001 prefixlen 64 tentative
          groups: tun
          nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
          Opened by PID 62433
  read 307 bytes, time since start=1s
  tun0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> metric 0 mtu 1500
          options=80000<LINKSTATE>
          inet6 fe80::250:56ff:fe9c:dffb%tun0 prefixlen 64 tentative scopeid 0x3
          inet6 fd00:abcd:204:4::1001 prefixlen 64 tentative
          groups: tun
          nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
          Opened by PID 62433
  *** FOUND ***

(The tester program will loop for 10 seconds, run "ifconfig $dev" in a pipe, and print "FOUND!" if it found IFDISABLED in the output, which could be extended to do something else on-trigger.)

Attachment coming.
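For readers without access to the attachment, the detection step described above can be sketched as follows. This is not the attached tester.c; it is a minimal illustration that assumes the "ifconfig $dev" output has already been read into a string, and the function name has_ifdisabled is made up for this example.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not taken from the attached tester.c):
 * return 1 if the given "ifconfig <dev>" output has an
 * "nd6 options=" line that lists IFDISABLED, 0 otherwise.
 */
int has_ifdisabled(const char *text)
{
    const char *p = text;

    /* Visit every "nd6 options=" occurrence in the output. */
    while ((p = strstr(p, "nd6 options=")) != NULL) {
        const char *eol = strchr(p, '\n');
        size_t len = eol ? (size_t)(eol - p) : strlen(p);
        const char *hit = strstr(p, "IFDISABLED");

        /* Only count the flag if it starts on this same line. */
        if (hit != NULL && (size_t)(hit - p) < len)
            return 1;
        p += len;   /* continue the search after this line */
    }
    return 0;
}
```

A polling loop would then call this once per second on fresh ifconfig output and report the first time it returns 1. Matching only within the "nd6 options=" line avoids a false hit if the string IFDISABLED ever appears elsewhere in the output.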
Created attachment 216689
test program to reproduce behaviour
This is driving me totally insane. The first 4 times I ran the tester, it had IFDISABLED. The next 50 or so times, nothing... I'm not sure what else is needed to trigger it. I ran a few openvpn instances in between ("if it's related to additional configs, like routing changes") but that doesn't trigger it either.
You just need to enable IPv6 in rc.conf:

  ipv6_activate_all_interfaces="YES"

When it is enabled, all new interfaces will be created without the IFDISABLED flag.
New test program coming, so please ignore this one. There is something else needed to make this trigger more often, like "packets in queue for this interface" or an actual reader on the tunfd.

(The OpenVPN test suite triggers this in 90% of runs for some cases, openvpn itself triggers it in about 20% of all cases *with the same options* as when called by the test suite, and the test program in the version uploaded so far triggers it in 1%...)
(In reply to Andrey V. Elsukov from comment #12)

rc.conf settings cannot be the correct answer for spurious and timing-dependent differences in behaviour.

Also, I cannot control what our users will do (and they might not have IPv6 on the LAN side, so might not have turned it on in rc.conf), so I need a reliable and robust way to bring up a tunnel interface with IPv6, independent of rc.conf settings.

Adding some argument to our ifconfig call (or adding some extra ioctl()s or whatever) is not a problem, but right now I need an extra "sleep(1)" before the "ifconfig $dev inet6 -ifdisabled", which looks racy and probably is...
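The workaround mentioned here ("sleep(1)", then "ifconfig $dev inet6 -ifdisabled") could look roughly like the sketch below. This is an illustration only, not OpenVPN's actual code: the function names build_enable_cmd and clear_ifdisabled_after_delay are hypothetical, and the fixed delay only makes the race window unlikely to be hit rather than closing it.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: write "ifconfig <dev> inet6 -ifdisabled" into buf.
 * Returns 0 on success, -1 if the device name is empty, contains
 * anything but letters and digits, or does not fit into buf.
 */
int build_enable_cmd(char *buf, size_t len, const char *dev)
{
    if (dev == NULL || *dev == '\0')
        return -1;
    for (const char *p = dev; *p != '\0'; p++) {
        if (!isalnum((unsigned char)*p))
            return -1;          /* keep shell metacharacters out */
    }
    int n = snprintf(buf, len, "ifconfig %s inet6 -ifdisabled", dev);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical workaround: wait delay_s seconds so that the devd/rc
 * scripts have (probably) finished, then clear IFDISABLED again.
 * Returns 0 if ifconfig exited successfully, -1 otherwise.
 */
int clear_ifdisabled_after_delay(const char *dev, unsigned delay_s)
{
    char cmd[64];

    if (build_enable_cmd(cmd, sizeof(cmd), dev) != 0)
        return -1;
    sleep(delay_s);
    return system(cmd) == 0 ? 0 : -1;
}
```

Calling clear_ifdisabled_after_delay("tun0", 1) after the regular inet6 configuration corresponds to the sequence described in this comment. The device name check is there only because the command is passed through system().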
(In reply to Gert Doering from comment #14)

The presence of this flag is controlled by the /etc/network.subr script, and this has always worked via rc.conf.
(In reply to Andrey V. Elsukov from comment #15)

This is still missing the point.

If having ipv6_activate_all_interfaces="YES" is a hard requirement now, the interface should *always* come up with IFDISABLED, not "just sometimes".

This needs to be deterministic: behave the same on every instantiation. No dependency on timing or other random factors.

My tun/tap interfaces are not brought up by /etc/rc.*, so I'm not sure if rc.subr should play a role here. Maybe it does, due to some dynamic invocation on interface creation that I am not aware of. Pointers welcome.
When you create a tun/tap interface, the kernel generates a LINK_UP event; devd then handles this event and invokes the /etc/pccard_ether script, which uses routines from the network.subr script. You may have some races with this script.
This depends on whether the devd(8) daemon is running at the moment the network interface is created. If devd is running (and modern FreeBSD tends to require this), it is notified by the kernel about the creation of a new network interface. With the default /etc/devd.conf, devd responds by invoking "/etc/pccard_ether $subsystem start", which processes all interfaces, not only "pccard" ones. The script performs multiple actions on the new interface, according to the settings in /etc/rc.conf.
(In reply to Andrey V. Elsukov from comment #17)

Thanks for the explanation. I wonder if there is an easy way to trace these scripts, to see when exactly something was ifconfig'ed?

I've sprinkled the pccard_ether script with "logger" commands now, and I can see that it will not touch the tun0 interface "if it is already up" (in pccard_ether_start()). In that case, I have no IFDISABLED.

If I do "ifconfig tun0 create" by hand, I see it enters quietstart, and I do get IFDISABLED.

The added "logger" commands do modify the timing, so it's indeed a race condition. If I have no loggers in the script *before* the decisions about "is this interface already up or not" are made, I can see it go to quietstart, and OpenVPN and the kernel message confirm "now it's IFDISABLED".

  Jul 23 12:50:09 fbsd-tc root[74575]: pccard_ether_start, ifn=tun0
  Jul 23 12:50:09 fbsd-tc root[74579]: pccard_ether_start, quietstart
  Jul 23 12:50:10 fbsd-tc kernel: nd6_dad_timer: cancel DAD on tun0 because of ND6_IFF_IFDISABLED.

This is good news, actually, because it turns out to be "very likely not a kernel bug". Apologies for jumping to a conclusion here.

On the other hand it's bad news, because I do not know how to fix this "for good". If an OpenVPN user does not have IPv6 enabled on his regular LAN interfaces, he might not have ipv6_activate_all_interfaces="YES" set. Even for someone who *has* IPv6 active (like me, on that test box) that setting might not be set, because having "ifconfig_em0_ipv6" and "ipv6_defaultrouter" is perfectly fine to get what I want ("a static v6 address + default route").

But they want IPv6 to work inside the tunnel, if their VPN server has working v6 (it might be their only way to reach the IPv6 Internet).

So what should I do? Stick to "sleep(1); ifconfig tun0 inet6 -ifdisabled"?
Just for the record: if I insert a "sleep(1)" in the tester.c between the open() call to /dev/tun0 and the ifconfig statement, *and* insert a "sleep 1" in pccard_ether between "`ifconfig -ul`" and "if [ "${uif}" = "${ifn}" ]", then the time window for the race is wide enough that "it always happens".

Of course it would be great to find a non-racy solution that does not involve telling users "you must change your rc.conf to make openvpn+ipv6 work reliably for you" (they might actually have "on all interfaces" disabled on purpose).

But I can already say: thanks for all your help!
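The ordering described in the last few comments can be written down as a small toy model. It is only a way to reason about the observations in this PR, not kernel or rc code: the three timestamps and all names are assumptions, and the case where the script acts before the program configures the interface is deliberately left open because the thread does not settle it.

```c
/* Possible end states of the IFDISABLED flag in this toy model. */
enum model_outcome { MODEL_ENABLED, MODEL_DISABLED, MODEL_UNDETERMINED };

/*
 * All times are in milliseconds since the interface was created.
 *   t_check: pccard_ether samples the list of "up" interfaces
 *   t_up:    the program runs "ifconfig <dev> inet6 ... up"
 *   t_apply: the rc scripts would set ifdisabled (after t_check)
 */
enum model_outcome race_outcome(int t_check, int t_up, int t_apply)
{
    if (t_up <= t_check)
        return MODEL_ENABLED;      /* script sees the interface up, leaves it alone */
    if (t_apply > t_up)
        return MODEL_DISABLED;     /* script disables IPv6 after the program set it up */
    return MODEL_UNDETERMINED;     /* script acted first; not covered by the observations */
}
```

With the two extra sleeps described above, the script samples the interface list right after creation, the program configures the interface about one second later, and the script applies ifdisabled after that, for example race_outcome(10, 1000, 1010), which lands in the MODEL_DISABLED branch. Without the sleeps, the three timestamps are close together and the outcome depends on scheduling.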
New interfaces, including openvpn's tunnels, should obey the settings in /etc/rc.conf, and if this configuration file says to enable/disable IPv6 for dynamic interfaces, that configuration should be respected. There are already several ways to (un)configure the -ifdisabled flag. Another way is the following:

  ifconfig_DEFAULT="-ifdisabled"

This also applies to new tunnels that have no specific lines like "ifconfig_tun0".
Gert,

can you try this patch (totally untested, not even compile tested):

https://people.freebsd.org/~bz/tmp/20201031-02-ip6-ifdisabled.diff

This will enable ND6_IFF_IFDISABLED when the interface is created in the kernel before return to user space (if I am right), and user space (driven from devd -> /etc/pccard_ether => ... netif start IF => ifdisabled) should no longer execute the last bit and hence not race with your program.

I might have seen a similar issue with ppp/tun0 lately.

Also adding @hrs to Cc: as he knows this logic a lot better than me and might know of other pitfalls of doing this.
(In reply to Bjoern A. Zeeb from comment #22)

Bjoern, thanks for looking into this.

I have stared at the diff, and it looks like a reasonable approach that solves both the "if people do not want IPv6, they should not get it" requirement, and the "but if an interface is created under program control, and the program configures IPv6, there should not be a race with a RC script that turns IPv6 back off".

I have applied the patch to a 12.2-RELEASE system (kernel + /etc/rc.subr) and can confirm that it works. As in:

-- it compiles :-)

-- if I bring up an interface manually ("ifconfig tun7 create up"), the interface has the desired (as in: keep existing behaviour) property of "nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>"

-- if I bring up the interface from within OpenVPN (open /dev/tun, exec "ifconfig ... inet6 ...", removing the "sleep(1); ifconfig ... -ifdisabled" workaround) the resulting interface has IPv6, and does not have "IFDISABLED".

Since this was a race condition before (sometimes it worked, sometimes it failed), I've run the particular test a few dozen times, without a single failure. And no more "nd6_dad_timer: cancel DAD on tun0 because of ND6_IFF_IFDISABLED." in dmesg either :-)

So, for me, this patch is the right answer :-)

gert
I'll "steal" this PR from kevans then and handle it ;-)
Opened an official review here: https://reviews.freebsd.org/D27324
A commit references this bug:

Author: bz
Date: Wed Nov 25 20:58:01 UTC 2020
New revision: 368031
URL: https://svnweb.freebsd.org/changeset/base/368031

Log:
  IPv6: set ifdisabled in the kernel rather than in rc

  Enable ND6_IFF_IFDISABLED when the interface is created in the kernel before return to user space.

  This avoids a race when an interface is create by a program which also calls ifconfig IF inet6 -ifdisabled and races with the devd -> /etc/pccard_ether -> .. netif start IF -> ifdisabled calls (the devd/rc framework disabling IPv6 again after the program had enabled it already).

  In case the global net.inet6.ip6.accept_rtadv was turned on, we also default to enabling IPv6 on the interfaces, rather than disabling them.

  PR: 248172
  Reported by: Gert Doering (gert greenie.muc.de)
  Reviewed by: glebius (, phk)
  MFC after: 2 weeks
  Differential Revision: https://reviews.freebsd.org/D27324

Changes:
  head/libexec/rc/network.subr
  head/sys/netinet6/nd6.c
I think this is in FreeBSD-13 at least these days.
(In reply to Bjoern A. Zeeb from comment #27) Anything to be done for 12.4?