Created attachment 204676 [details]
tcpdump -nvi ix0.221

Not sure if pr_208910 is related, but this is definitely different.

Summary: Traffic through a vm -> bridge -> vlan -> ix doesn't return. tcpdump at ix0 shows both directions of the pings; tcpdump at ix0.221 shows only the outbound direction.

vlan.pcap is a tcpdump of ix0.221
port.pcap is a tcpdump of ix0

System is 12.0-p4 and here's the bridge config:

bridge2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        description: vm-lan221
        ether 02:df:71:71:45:02
        id 00:bd:a8:26:5b:00 priority 32768 hellotime 2 fwddelay 15
        maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
        root id 00:bd:a8:26:5b:00 priority 32768 ifcost 0 port 0
        member: tap0 flags=167<LEARNING,DISCOVER,STP,EDGE,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 12 priority 128 path cost 2000000 proto rstp
                role designated state forwarding
        member: ix0.221 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 7 priority 128 path cost 55
        groups: bridge vm-switch viid-c15cb@
        nd6 options=1<PERFORMNUD>
ix0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=e53bbb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
        ether a0:36:9f:17:ba:10
        inet 192.168.110.3 netmask 0xffffff00 broadcast 192.168.110.255
        inet6 fe80::a236:9fff:fe17:ba10%ix0 prefixlen 64 scopeid 0x1
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
ix0.221: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=200001<RXCSUM,RXCSUM_IPV6>
        ether a0:36:9f:17:ba:10
        inet 66.96.20.34 netmask 0xffffffe0 broadcast 66.96.20.63
        inet 66.96.20.35 netmask 0xffffffe0 broadcast 66.96.20.63
        inet6 fe80::a236:9fff:fe17:ba10%ix0.221 prefixlen 64 scopeid 0x7
        inet6 2001:1928:1::34 prefixlen 64
        inet6 2001:1928:1::35 prefixlen 64
        groups: vlan
        vlan: 221 vlanpcp: 0 parent interface: ix0
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>

tap0 is obviously the VM in question. Bridge2 does "know" the MAC addresses ... but the broadcast packets (for ARP) do make it through.

[1:4:304]root@run:~> ifconfig bridge2 addr
00:04:4b:2b:92:6c Vlan1 ix0.221 0 flags=0<>
58:9c:fc:01:d4:67 Vlan1 tap0 1200 flags=0<>
00:00:aa:ae:e1:31 Vlan1 ix0.221 1168 flags=0<>
00:c0:b7:2c:43:c5 Vlan1 ix0.221 789 flags=0<>
a0:36:9f:17:bb:0c Vlan1 ix0.221 1156 flags=0<>
f0:9f:c2:0a:dd:0c Vlan1 ix0.221 1170 flags=0<>
00:04:4b:47:9b:dc Vlan1 ix0.221 1195 flags=0<>
b4:fb:e4:80:48:0e Vlan1 ix0.221 1142 flags=0<>
00:12:3f:41:72:fd Vlan1 ix0.221 1200 flags=0<>
10:7b:44:92:e8:fd Vlan1 ix0.221 1200 flags=0<>
Created attachment 204677 [details] tcpdump -nvi ix0
If it's not already clear, the packets are coming from a "ping 192.168.221.2" in the VM. 192.168.221.2 is an external machine on vlan 221 (also FreeBSD). Interestingly, also, two DHCP servers are on the network... one on the HOST and one on an external host. The on-the-host DHCP server is bound to ix0.221 and manages to talk to the VM. The external DHCP server cannot respond to the VM (ie: the VM fails to get an IP if the local DHCP server is not running).
Tagging kevans@ into this as he just made a fix to bridge that had to do with bpf missing some packets as they traverse the bridge, so he might have a clue as to whether that affects this or not. I.e., it may be that bpf is not seeing everything it should be seeing. I have also had issues in the past with getting the HOST to see vlans when I am doing the bhyve/vm, tap, bridge, em0 setup with vlan trunking turned on. My VMs can talk to real boxes on the em0 network just fine, and the host can talk to boxes on the em0 network just fine, but my VMs cannot talk to my host.
(In reply to Rodney W. Grimes from comment #3) There seems to be something else going on here: ix0.221 should've shown the inbound traffic (with or without my fixes), since that gets tapped in ether_input_internal before entering the bridge at all. The general flow here *should* look like:

external -> ix0:ether_input_internal -> ix0:ether_demux -> vlan_input -> ix0.221:ether_input_internal -> bridge2:bridge_input -> tap0:ether_demux

You've observed the traffic getting tapped at the second step above, then there's a disconnect somewhere after that.
If you add an interface like ix0.221 to a bridge, you cannot leave IP addresses on ix0.221. Move them to the bridge interface (bridge2 in this report) and it should just work.
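
For illustration only, the suggestion above might look like the following rc.conf fragment. This is a sketch under assumptions, not the reporter's actual configuration: it assumes bridge2 is created from rc.conf (in the report the bridge is managed by the VM tooling, as its "vm-lan221" description suggests), and it reuses the addresses shown in the ifconfig output earlier in the report. The member ix0.221 is only brought up; every address lives on the bridge.

```sh
# /etc/rc.conf sketch (assumption: bridge2 is created here, not by the VM tooling)
vlans_ix0="221"                 # creates ix0.221 on top of ix0
ifconfig_ix0="inet 192.168.110.3 netmask 255.255.255.0"
ifconfig_ix0_221="up"           # bridge member: up, but no inet/inet6 addresses

cloned_interfaces="bridge2"
# Addresses formerly on ix0.221 now sit on the bridge (0xffffffe0 = 255.255.255.224)
ifconfig_bridge2="inet 66.96.20.34 netmask 255.255.255.224 addm ix0.221 up"
ifconfig_bridge2_alias0="inet 66.96.20.35 netmask 255.255.255.255"
ifconfig_bridge2_ipv6="inet6 2001:1928:1::34 prefixlen 64"
ifconfig_bridge2_alias1="inet6 2001:1928:1::35 prefixlen 64"
```

The tap interface of the VM would still be added to bridge2 by whatever starts the VM; that part is deliberately left out here.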
(In reply to Eugene Grosbein from comment #5) How does one specify the vlan 221 on bridgeX for that IP address, and what happens when I have 4 vlans with different IP addresses, do I stick them all on bridgeX?
(In reply to Rodney W. Grimes from comment #6) What you're saying is "what if I have 3 ethernet cards, with different addresses all plugged into the same switch" ... in effect. I suppose there's nothing stopping the bridge interface from having the 3 aliases. Thinking about differences, you still have 3 IP addresses on the same MAC (in the vlan case, not the three ethernet cards case). If this is indeed the problem (I'm testing now) ... this needs some explicit documentation somewhere. This might be the second time I've come around to this "documentation bug" ... but from a different direction. The first time was that the bridge straight up needed the IP not the vlan ... but without a VM to think about.
Hrm. This is a case of different ethernets behaving differently. The host also has an re0. In rc.conf, I ran 1,$s/ix0/re0/ and rebooted. With re0, dhcpd (locally) doesn't give an IP to the VM, where it did on ix0.
(In reply to Rodney W. Grimes from comment #6) Our if_bridge does not support tagged frames currently, so you'd need bridge-per-vlan, so each bridge deals with frames already stripped.
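
A bridge-per-vlan layout along those lines could be sketched in rc.conf as below. Everything specific here is a labelled assumption: VLAN 222, the bridge names bridge221/bridge222, and the documentation-range addresses are hypothetical and only show the shape of the configuration. Each vlan interface strips the tag, each bridge has exactly one vlan member, and each bridge carries the address for its own vlan.

```sh
# /etc/rc.conf sketch: one bridge per vlan (VLAN 222 and all addresses are hypothetical)
vlans_ix0="221 222"             # creates ix0.221 and ix0.222
ifconfig_ix0="up"
ifconfig_ix0_221="up"           # members carry no addresses
ifconfig_ix0_222="up"

cloned_interfaces="bridge221 bridge222"
# Each bridge only ever sees frames that its vlan interface has already untagged
ifconfig_bridge221="inet 192.0.2.10/27 addm ix0.221 up"
ifconfig_bridge222="inet 198.51.100.10/24 addm ix0.222 up"
```

A VM that belongs on vlan 221 would then have its tap interface added to bridge221 only.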
(In reply to dgilbert from comment #8) OK. My last comment sounded far out on a limb, even to me. But I re-verified it. With isc-dhcpd and dhcpd_ifaces = either "ix0.221" or "re0.221" ... and ix0.221 or re0.221 added to bridge2, the VM gets an IP from the DHCP server with ix0 and not with re0.
(In reply to dgilbert from comment #10) I tried to reproduce this bug, but with no luck. Could you please provide the additional information listed below?
1. What is the device id of your ix interface?
2. Does the log from dmesg contain anything unusual?
3. Could you provide the VM's configuration parameters?
ix0@pci0:7:0:0: class=0x020000 card=0x00008086 chip=0x15638086 rev=0x01 hdr=0x00
    vendor = 'Intel Corporation'
    device = 'Ethernet Controller 10G X550T'
    class = network
    subclass = ethernet

ix0: <Intel(R) PRO/10GbE PCI-Express Network Driver> mem 0xd0000000-0xd01fffff,0xd0200000-0xd0203fff irq 40 at device 0.0 on pci7
ix0: using 2048 tx descriptors and 2048 rx descriptors
ix0: msix_init qsets capped at 64
ix0: pxm cpus: 8 queue msgs: 63 admincnt: 1
ix0: using 8 rx queues 8 tx queues
ix0: Using MSIX interrupts with 9 vectors
ix0: allocated for 8 queues
ix0: allocated for 8 rx queues
ix0: Ethernet address: a0:36:9f:17:ba:10
ix0: PCI Express Bus: Speed 5.0GT/s Width x4
ix0: netmap queues/slots: TX 8/2048, RX 8/2048
ix0: link state changed to UP
ix0: link state changed to DOWN
ix0.1: link state changed to DOWN
bridge1: can't disable some capabilities on ix0.1: 0x400
ix0: promiscuous mode enabled
ix0.1: promiscuous mode enabled
bridge2: can't disable some capabilities on ix0.221: 0x400
ix0.221: promiscuous mode enabled
ix0: link state changed to UP
ix0.221: link state changed to UP
ix0.1: link state changed to UP

[1:13:313]root@run:~> cat /vms/FreeNAS/FreeNAS.conf
loader="bhyveload"
cpu=2
memory=8G
network0_type="virtio-net"
network0_switch="lan221"
disk0_type="virtio-blk"
disk0_name="disk0"
disk0_dev="sparse-zvol"
uuid="b2fdb1cd-b6d5-4ac0-a167-f216b52e0701"
network0_mac="58:9c:fc:01:d4:67"
disk1_name="disk1"
disk1_type="virtio-blk"
disk1_dev="sparse-zvol"
disk2_name="disk2"
disk2_type="virtio-blk"
disk2_dev="sparse-zvol"
disk3_name="disk3"
disk3_type="virtio-blk"
disk3_dev="sparse-zvol"

... so far I have only completed running "vm install FreeNAS" on this VM.
It doesn't explain the _different_ behaviour between ix0 and re0, but there is one bug I managed to nail myself. I _had_ ix0 (or re0) attached to bridge0 (picking up untagged vlan 1, which this switch refuses to tag). Then I had a few other vlans plus vlan 221 (the one we're discussing). Certainly, I have had lots of BSD machines using the raw ethernet to pick up the management vlan untagged, but I don't believe I've had a bridge there before. For now, I will use re0 to pick up the untagged vlan (sigh... feels like an engineering waste), but I do understand the complexity here. In a netgraph-like case, you can specify the ethertypes that are taken and left and whatnot; ifconfig doesn't allow us to express this.

I would very much like to be in a discussion of layer 2 semantics, should one occur. Terminology is drastically overloaded and the number of useful combinations is high ... leaving a more flexible solution a clear winner. What I'm saying is that the ability to pick off an untagged vlan 1 on the raw port is very useful with modern gear. I realize this means having a way to specify picking off ethertypes (at least for v4 and v6) and that potential confusion is high ... so accurate abstraction is key.

Anyways... far beyond the status of this bug. re0 and ix0 behave differently in this corner case, but you may need to add re0 and/or ix0 to a bridge to replicate it.
(In reply to dgilbert from comment #13) I really never understood the need for and hence the presence of untagged frames on a switch trunk port. But it's in the standard so for now we are stuck with it. If your switch is of Cisco brand, here's what we do:

switchport trunk native vlan 1001

1001 is a VLAN that is never used in our entire data centre, so everything that matters is properly tagged.

Kind regards, Patrick
(In reply to punkt.de Hosting Team from comment #14) [ on configuring my switch to avoid the problem ] That's fine on a Cisco. But for the dozens of other brands, not so much. In this case, it's a Ubiquiti (or UniFi?) ... an aggressively positioned switch that is part of an ecosystem of WiFi and WISP gear. AFAICT, changing the access vlan removes communication with the management lan altogether. As I said, so far I'm using another ethernet card to talk to this. It's not an uncommon setup, I believe. FreeBSD should be that swiss army knife that just works in all situations, not the OS that prays to one particular god of configuration or the other.
^Triage: clear stale flags.