Summary: | [if_bridge] IPv6 ndp does not work across local bridge members | | |
---|---|---|---|
Product: | Base System | Reporter: | Martin Birgmeier <d8zNeCFG> |
Component: | kern | Assignee: | freebsd-net (Nobody) <net> |
Status: | Open --- | | |
Severity: | Affects Only Me | CC: | dgeo, donner, kp, lwhsu, melifaro, philip, pmh, qingli |
Priority: | --- | Keywords: | ipv6 |
Version: | 12.2-RELEASE | | |
Hardware: | Any | | |
OS: | Any | | |
Description
Martin Birgmeier
2020-07-11 14:30:50 UTC
(In reply to Martin Birgmeier from comment #0)
Just to be sure, could you please provide the "ndp -a" output for both before and after the bridge0 creation?

Hi Li,

Since you want it "before and after the creation of bridge0", the following is from the host; but the issue actually occurs on the client - I'll provide the output for that, too.

Host before "bridge0 create" and "tap904 create":

[0]# ndp -a
Neighbor                                Linklayer Address Netif  Expire    S Flags
2002:b2bf:ee7e:4d42:22cf:30ff:fe55:5cb6 20:cf:30:55:5c:b6 re0    permanent R
fec0::4d42:22cf:30ff:fe55:5cb6          20:cf:30:55:5c:b6 re0    permanent R
fec0:0:0:4d42::e1                       20:cf:30:55:5c:b6 re0    permanent R
fe80::22cf:30ff:fe55:5cb6%re0           20:cf:30:55:5c:b6 re0    permanent R
gandalf.xyzzy                           00:03:0d:4f:f3:a7 re0    23h57m34s S R
fe80::203:dff:fe4f:f3a7%re0             00:03:0d:4f:f3:a7 re0    23h55m33s S R
fe80::218:e7ff:fee0:807b%re0            00:18:e7:e0:80:7b re0    23h55m33s S R
hal.xyzzy                               20:cf:30:55:5c:b6 re0    permanent R
mizar.xyzzy                             f0:de:f1:98:86:a9 re0    23h58m35s S
[0]#

After "ifconfig bridge0 create && ifconfig bridge0 addm re0 && ifconfig bridge0 up":

[0]# ndp -a
Neighbor                                Linklayer Address Netif  Expire    S Flags
2002:b2bf:ee7e:4d42:22cf:30ff:fe55:5cb6 20:cf:30:55:5c:b6 re0    permanent R
fec0::4d42:22cf:30ff:fe55:5cb6          20:cf:30:55:5c:b6 re0    permanent R
fec0:0:0:4d42::e1                       20:cf:30:55:5c:b6 re0    permanent R
fe80::22cf:30ff:fe55:5cb6%re0           20:cf:30:55:5c:b6 re0    permanent R
gandalf.xyzzy                           00:03:0d:4f:f3:a7 re0    23h58m48s S R
fe80::203:dff:fe4f:f3a7%re0             00:03:0d:4f:f3:a7 re0    23h51m46s S R
fe80::218:e7ff:fee0:807b%re0            00:18:e7:e0:80:7b re0    23h51m46s S R
hal.xyzzy                               20:cf:30:55:5c:b6 re0    permanent R
mizar.xyzzy                             f0:de:f1:98:86:a9 re0    23h59m48s S
[0]#

After "ifconfig tap904 create && ifconfig bridge0 addm tap904":

[0]# ndp -a
Neighbor                                Linklayer Address Netif  Expire    S Flags
2002:b2bf:ee7e:4d42:22cf:30ff:fe55:5cb6 20:cf:30:55:5c:b6 re0    permanent R
fec0::4d42:22cf:30ff:fe55:5cb6          20:cf:30:55:5c:b6 re0    permanent R
fec0:0:0:4d42::e1                       20:cf:30:55:5c:b6 re0    permanent R
fe80::22cf:30ff:fe55:5cb6%re0           20:cf:30:55:5c:b6 re0    permanent R
gandalf.xyzzy                           00:03:0d:4f:f3:a7 re0    23h58m2s  S R
fe80::203:dff:fe4f:f3a7%re0             00:03:0d:4f:f3:a7 re0    23h51m0s  S R
fe80::218:e7ff:fee0:807b%re0            00:18:e7:e0:80:7b re0    23h51m0s  S R
hal.xyzzy                               20:cf:30:55:5c:b6 re0    permanent R
mizar.xyzzy                             f0:de:f1:98:86:a9 re0    23h59m2s  S
[0]#

Now starting the bhyve VM; the rest is from inside the VM.

Before manually added ndp entries:

[0]# ndp -a
Neighbor                                Linklayer Address Netif  Expire    S Flags
v904.xyzzy                              00:a0:98:50:35:17 vtnet0 permanent R
gandalf.xyzzy                           00:03:0d:4f:f3:a7 vtnet0 23h59m57s S R
fe80::203:dff:fe4f:f3a7%vtnet0          00:03:0d:4f:f3:a7 vtnet0 23h59m2s  S R
fe80::218:e7ff:fee0:807b%vtnet0         00:18:e7:e0:80:7b vtnet0 23h59m2s  S R
2002:b2bf:ee7e:4d42:2a0:98ff:fe50:3517  00:a0:98:50:35:17 vtnet0 permanent R
fec0::4d42:2a0:98ff:fe50:3517           00:a0:98:50:35:17 vtnet0 permanent R
fe80::2a0:98ff:fe50:3517%vtnet0         00:a0:98:50:35:17 vtnet0 permanent R
mizar.xyzzy                             f0:de:f1:98:86:a9 vtnet0 23h59m57s S
[0]#

After "ndp -s fec0:0:0:4d42::e 20:cf:30:55:5c:b6 && ndp -s fec0:0:0:4d42::e1 20:cf:30:55:5c:b6" (the host has two IPv6 addresses assigned to its interface; fec0:0:0:4d42::e resolves to hal.xyzzy):

[0]# ndp -a
Neighbor                                Linklayer Address Netif  Expire    S Flags
fec0:0:0:4d42::e1                       20:cf:30:55:5c:b6 vtnet0 permanent R
v904.xyzzy                              00:a0:98:50:35:17 vtnet0 permanent R
gandalf.xyzzy                           00:03:0d:4f:f3:a7 vtnet0 23h58m54s S R
fe80::203:dff:fe4f:f3a7%vtnet0          00:03:0d:4f:f3:a7 vtnet0 23h57m59s S R
fe80::218:e7ff:fee0:807b%vtnet0         00:18:e7:e0:80:7b vtnet0 23h57m59s S R
2002:b2bf:ee7e:4d42:2a0:98ff:fe50:3517  00:a0:98:50:35:17 vtnet0 permanent R
fec0::4d42:2a0:98ff:fe50:3517           00:a0:98:50:35:17 vtnet0 permanent R
fe80::2a0:98ff:fe50:3517%vtnet0         00:a0:98:50:35:17 vtnet0 permanent R
hal.xyzzy                               20:cf:30:55:5c:b6 vtnet0 permanent R
mizar.xyzzy                             f0:de:f1:98:86:a9 vtnet0 23h58m54s S
[0]#

-- Martin

Isn't the IP configuration (both v4 and v6) supposed to go on the bridge interface instead of em0?
There should be a message upon inserting em0 as a member: "IPv6 addresses on em0 have been removed before adding it as a member to prevent IPv6 address scope violation."

To clarify the bhyve use case a little further:

Setup: bhyve (tap0) - bridge - vlan0

```
root@host:~ # ifconfig vm-service
vm-service: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        ether 76:92:90:55:ad:c5
        id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
        maxage 20 holdcnt 6 proto stp-rstp maxaddr 2000 timeout 1200
        root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
        member: tap0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 15 priority 128 path cost 2000000
        member: vlan_service flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 9 priority 128 path cost 20000
        groups: bridge vm-switch viid-aaabf@
        nd6 options=41<PERFORMNUD,NO_RADR>
root@host:~ # ifconfig vlan-service
vlan_service: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=200401<RXCSUM,LRO,RXCSUM_IPV6>
        ether 40:62:31:11:af:6f
        inet 172.24.0.254 netmask 0xffffff00 broadcast 172.24.0.255
        inet 172.24.0.1 netmask 0xffffffff broadcast 172.24.0.1
        inet 172.24.0.153 netmask 0xffffffff broadcast 172.24.0.153
        inet6 fe80::4262:31ff:fe11:af6f%vlan_service prefixlen 64 scopeid 0x9
        inet6 fd55:3904:d01f:0:4262:31ff:fe11:af6f prefixlen 64
        inet6 fd55:3904:d01f::153 prefixlen 64
        inet6 fd55:3904:d01f::1 prefixlen 64
        inet6 2404:c804:1637:4c00:4262:31ff:fe11:af6f prefixlen 64
        groups: vlan
        vlan: 2750 vlanpcp: 0 parent interface: igb0
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        nd6 options=61<PERFORMNUD,AUTO_LINKLOCAL,NO_RADR>
root@linux:~# ip neighbour ls
172.24.0.153 dev enp0s5 lladdr 58:9c:fc:00:41:7a REACHABLE
172.24.0.254 dev enp0s5 lladdr 40:62:31:11:af:6f REACHABLE
fd55:3904:d01f::1 dev enp0s5 lladdr 40:62:31:11:af:6f router REACHABLE
fe80::4262:31ff:fe11:af6f dev enp0s5 lladdr 40:62:31:11:af:6f router STALE
fd55:3904:d01f::153 dev enp0s5 FAILED
2404:c804:1637:4c00:4262:31ff:fe11:af6f dev enp0s5 lladdr 40:62:31:11:af:6f router STALE
```

Linux running in bhyve is unable to communicate with fd55:3904:d01f::153 on the host because we never respond to the NDP packets. If fd55:3904:d01f::153 is configured on the bridge rather than on the vlan_service interface, everything works normally.

Again: this is how it is supposed to work. You *must* configure IPv4 and IPv6 addresses on the bridge interface and not on the VLAN member. See FreeBSD Handbook on bridging.

Yes, the problem is indeed that the addresses should be set on the bridge interface, not the member interfaces.

It mostly works if you don't, but only mostly. Multicast is broken in that setup. That's because in bridge_input() we special-case multicast and broadcast traffic. It gets forwarded *out* all of the member interfaces and injected into the bridge interface. Member interfaces do not get to see it. The bridge interface is not subscribed to the expected multicast group (because the address is not set on it, but on a member interface) and the packet gets ignored.

Kristof, possibly we should make that paragraph more prominent in the bridging chapter? Can I clone the docs repo now and submit a pull request instead of the awkward "could some committer please look at XY" procedure of the past - now that we are using git?

(In reply to Patrick M. Hausen from comment #7)
I'm not a doc committer myself, but I'd hope that patches are welcome in just about any form. Maybe send it to bcr@? He's always helpful.

(Off topic: I don't think that we've fully embraced the GitHub PR workflow yet. It's likely easier to get those PRs in than it was in the past though.)

Spinning this a little further, shouldn't the VM's (tapXXX) IP address also be assigned to the bridge? Obviously this would not be a good idea because then the host gets the client's traffic...

So why is bridging used in the first place in this scenario?
- It seems because it is the only way for the client to be connected to an outside ("real") network. It would probably be more correct for bhyve to use an internal virtual network which is then routed (layer 3) to the external network by the host.

Is this analysis correct?

-- Martin

Of course not. A tap interface (and an epair for jails) has got two ends. One on the host side, member of the bridge, and *without* any IP address. The other end *in* the guest VM or jail with the VM's/jail's IP address.

So that's a different matter. And you can build an internal network and route if you prefer.

- create a bridge without a physical member interface
- assign suitable IP address to that bridge, this will be the default gateway in your network
- enable forwarding on the host
- make all your VM taps member of that bridge
- take care of external routing so other systems know how that network can be reached
- or use ipfw/pf and NAT

We do that all the time. Again IP addresses *of the host* go on the bridge.

(In reply to Patrick M. Hausen from comment #7)
> Kristof, possibly we should make that paragraph more prominent in the bridging
> chapter? Can I clone the docs repo now and submit a pull request instead of the
> awkward "could some committer please look at XY" procedure of the past - now that we
> are using git?

You can also attach your pull request to this bug. We can Cc: it to a doc committer for review. (Assuming that "pull request" is newspeak for "patch" and not some Git magic I'm not familiar with.)

I think this limitation on multicast should be documented in the if_bridge(4) manual page as well as in the Handbook.

Given that this setup "mostly" works (except for multicast), there's probably no point in having ifconfig complain if you assign an address to an interface that's a member in a bridge. It would also open up cans of worms about attaching interfaces that have addresses on them to bridges.
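The routed alternative listed a few comments up (a bridge with no physical member, a gateway address on the bridge, forwarding enabled, VM taps as members) could be sketched roughly as follows. This is a dry-run illustration under stated assumptions, not a tested configuration: the names `bridge1`, `tap0`, `tap1` and the address `fd00:0:0:1::1` are placeholders invented for the example, the helper functions are hypothetical, the taps are assumed to exist already, and external routing or NAT is left out.

```shell
#!/bin/sh
# Sketch only: prints the commands by default.
# Set DRY_RUN=0 to actually execute them (requires root on FreeBSD).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# routed_bridge BRIDGE GATEWAY_ADDR TAP...
# BRIDGE gets no physical member; GATEWAY_ADDR (hypothetical) becomes
# the default gateway for the guests attached through the given taps.
routed_bridge() {
    bridge=$1
    gateway=$2
    shift 2
    run ifconfig "$bridge" create
    # The host's address goes on the bridge, not on a member.
    run ifconfig "$bridge" inet6 "$gateway" prefixlen 64 up
    # Let the host route between the internal and external networks.
    run sysctl net.inet6.ip6.forwarding=1
    # Host-side tap ends join the bridge without any address.
    for tap in "$@"; do
        run ifconfig "$bridge" addm "$tap"
    done
}

routed_bridge bridge1 fd00:0:0:1::1 tap0 tap1
```

Run as-is, the script only prints the `ifconfig` and `sysctl` invocations so they can be reviewed before anything is changed on the host.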
Conceptually, this limitation makes sense: you really should put addresses on the bridge interface rather than the VLAN interface. In this setup, the VLAN interface becomes morally equivalent to a "link" rather than an "interface", much as the host side of the tap to bhyve is.

(In reply to Patrick M. Hausen from comment #11)
Thanks for setting me straight on this. It is of course the client's interface that gets the client IP address, in this case vtnet0. And the tapXXX in the host just sees the layer 2 traffic as part of the bridge.

One more question: Is the handling of the re0 and tap0 interfaces (both members of bridge0) different? - Because tap0 seems to see all the NDP traffic in both directions but re0 does not?

-- Martin

The difference is that there is an IP address on re0 and none on tap0 in the setup that is almost but not quite working. If you really want to know what that does to multicast in the kernel, I have to refer to Kristof ;-)

(In reply to Philip Paeps from comment #12)
Philip, a pull request is how almost all of open source, with the notable exception of FreeBSD, handles contributions. The workflow is:

- clone repository into my own working copy
- make, debug, commit, push changes to my heart's content
- click on "send pull request" in browser
- the upstream project receives that, whoever is in charge reviews and hopefully accepts it, and
- clicks on "merge pull request"

Two clicks instead of manually creating a diff, creating a ticket, etc. ... I was hoping one of the reasons for moving to git *was* easier contribution.

(In reply to Patrick M. Hausen from comment #3)
This is a reply to an older comment #3...

There never was such a message. I also tried to find "prevent IPv6" in head and releng/12.2, to no avail (using find ... -exec grep ...). Where should that message come from?

-- Martin

(In reply to Martin Birgmeier from comment #16)
https://svnweb.freebsd.org/base/releng/12.2/sys/net/if_bridge.c line 1207 ff.

(In reply to Patrick M. Hausen from comment #17)
Thank you. Which leaves the question as to why such a message never shows up in my setup.

-- Martin

I'd like to come back to this issue.

Basically, I am (still :-)) not assigning the IP addresses to the bridge interface. The major reason for this is that I am assembling/disassembling the bridge and its member interfaces as needed, and I do not want to always have to fiddle with reassigning IP addresses from the member interfaces to the bridge and vice versa.

Which brings me to my point: In normal networking parlance a bridge knows nothing about ISO layer 3 and therefore nothing about IP. Much less does it get an IP address assigned (let us not digress to smart managed devices).

So I believe that we have a design issue here: In FreeBSD we are talking about a "bridge" but in reality it is a kludge used to tie some interfaces together. Or at least it is not a bridge in the traditional networking sense.

How difficult would it be to redesign the bridge abstraction in FreeBSD to more closely resemble a real layer 2 bridge?

-- Martin

There are two bridge implementations in FreeBSD: the classical one you are using, and the netgraph one, ng_bridge, which is much simpler. If you have a problem with the classical one, would you mind giving ng_bridge a try? You may assign ng_eiface virtual interfaces to it, if necessary.

I'm just curious to know which part of the classical bridge is the problematic part.

Hi Lutz,

It seems I would need FreeBSD 13 for this, right? - I am still at 12.2

-- Martin

The ng_bridge(4) node type was implemented in FreeBSD 4.2. I bet it will work in 12.x, too.

Isn't https://reviews.freebsd.org/D24620 needed for this?

Oh, it might be necessary for bhyve VMs. I'm not familiar with this part. I thought about connecting "real" interfaces like eiface inside the VM.
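For setups like the reporter's, where the bridge is assembled and torn down on demand, moving the host address between a member and the bridge can be scripted rather than done by hand. The following is a minimal dry-run sketch, not something taken from the thread: it reuses the interface names and one host address mentioned in the comments (`re0`, `bridge0`, `fec0:0:0:4d42::e1`), assumes a prefix length of 64 (the thread does not state it), and the helper names `run` and `move_inet6` are hypothetical.

```shell
#!/bin/sh
# Sketch only: prints the commands by default.
# Set DRY_RUN=0 to actually execute them (requires root on FreeBSD).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# move_inet6 FROM_IF TO_IF ADDR [PREFIXLEN]
# Removes ADDR from FROM_IF and adds it to TO_IF.
# PREFIXLEN defaults to 64, which is an assumption of this sketch.
move_inet6() {
    from=$1
    to=$2
    addr=$3
    plen=${4:-64}
    run ifconfig "$from" inet6 "$addr" -alias
    run ifconfig "$to" inet6 "$addr" prefixlen "$plen" alias
}

# Bridge is assembled: the host address moves from the member to the bridge.
move_inet6 re0 bridge0 fec0:0:0:4d42::e1
# Bridge is torn down again: the address moves back to the member.
move_inet6 bridge0 re0 fec0:0:0:4d42::e1
```

Keeping the move in one function means the same call, with the interface arguments swapped, covers both directions.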