238198 – Traffic through a vm -> bridge(4) -> vlan -> ix(4) does not return

Status:	Open

Alias:	None

Product:	Base System
Classification:	Unclassified
Component:	kern
Version:	12.0-RELEASE
Hardware:	Any Any

Importance:	--- Affects Only Me
Assignee:	freebsd-net (Nobody)

URL:
Keywords:	needs-qa

Depends on:
Blocks:

Reported:	2019-05-28 22:15 UTC by dgilbert
Modified:	2024-10-04 14:31 UTC
CC List:	5 users

See Also:	208910

Attachments
tcpdump -nvi ix0.221 (2.36 KB, application/vnd.tcpdump.pcap) 2019-05-28 22:15 UTC, dgilbert
tcpdump -nvi ix0 (2.56 KB, application/vnd.tcpdump.pcap) 2019-05-28 22:16 UTC, dgilbert

Description dgilbert 2019-05-28 22:15:43 UTC

Created attachment 204676 [details]
tcpdump -nvi ix0.221

Not sure if pr_208910 is related, but this is definitely different.

Summary:

Traffic through a vm -> bridge -> vlan -> ix doesn't return.  tcpdump on ix0 shows both directions of the pings; tcpdump on ix0.221 shows only the outbound direction.

vlan.pcap is a tcpdump of ix0.221
port.pcap is a tcpdump of ix0

System is 12.0-p4 and here's the bridge config:

bridge2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        description: vm-lan221
        ether 02:df:71:71:45:02
        id 00:bd:a8:26:5b:00 priority 32768 hellotime 2 fwddelay 15
        maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
        root id 00:bd:a8:26:5b:00 priority 32768 ifcost 0 port 0
        member: tap0 flags=167<LEARNING,DISCOVER,STP,EDGE,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 12 priority 128 path cost 2000000 proto rstp
                role designated state forwarding
        member: ix0.221 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 7 priority 128 path cost 55
        groups: bridge vm-switch viid-c15cb@
        nd6 options=1<PERFORMNUD>

ix0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=e53bbb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
        ether a0:36:9f:17:ba:10
        inet 192.168.110.3 netmask 0xffffff00 broadcast 192.168.110.255
        inet6 fe80::a236:9fff:fe17:ba10%ix0 prefixlen 64 scopeid 0x1
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>

ix0.221: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=200001<RXCSUM,RXCSUM_IPV6>
        ether a0:36:9f:17:ba:10
        inet 66.96.20.34 netmask 0xffffffe0 broadcast 66.96.20.63
        inet 66.96.20.35 netmask 0xffffffe0 broadcast 66.96.20.63
        inet6 fe80::a236:9fff:fe17:ba10%ix0.221 prefixlen 64 scopeid 0x7
        inet6 2001:1928:1::34 prefixlen 64
        inet6 2001:1928:1::35 prefixlen 64
        groups: vlan
        vlan: 221 vlanpcp: 0 parent interface: ix0
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>

tap0 is obviously the VM in question.  Bridge2 does "know" the MAC addresses ... but the broadcast packets (for ARP) do make it through.

[1:4:304]root@run:~> ifconfig bridge2 addr
00:04:4b:2b:92:6c Vlan1 ix0.221 0 flags=0<>
58:9c:fc:01:d4:67 Vlan1 tap0 1200 flags=0<>
00:00:aa:ae:e1:31 Vlan1 ix0.221 1168 flags=0<>
00:c0:b7:2c:43:c5 Vlan1 ix0.221 789 flags=0<>
a0:36:9f:17:bb:0c Vlan1 ix0.221 1156 flags=0<>
f0:9f:c2:0a:dd:0c Vlan1 ix0.221 1170 flags=0<>
00:04:4b:47:9b:dc Vlan1 ix0.221 1195 flags=0<>
b4:fb:e4:80:48:0e Vlan1 ix0.221 1142 flags=0<>
00:12:3f:41:72:fd Vlan1 ix0.221 1200 flags=0<>
10:7b:44:92:e8:fd Vlan1 ix0.221 1200 flags=0<>
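
For a longer table, the member a given MAC address was learned on can be pulled out with a small filter. This is only a sketch: `learned_on` is a hypothetical helper name, and it assumes the column layout shown in the listing above (MAC, VLAN, member, expiry, flags).

```sh
#!/bin/sh
# learned_on MAC
# Reads `ifconfig <bridge> addr` output on stdin and prints the member
# interface on which the bridge learned MAC (prints nothing if unknown).
# Assumes the member name is the third whitespace-separated column.
learned_on() {
    awk -v mac="$1" 'tolower($1) == tolower(mac) { print $3 }'
}

# Intended use on the host, with the bridge and VM MAC from this report:
#   ifconfig bridge2 addr | learned_on 58:9c:fc:01:d4:67
```

Run against the listing above, the VM's MAC (58:9c:fc:01:d4:67) resolves to tap0, and every other entry resolves to ix0.221.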

Comment 1 dgilbert 2019-05-28 22:16:13 UTC

Created attachment 204677 [details]
tcpdump -nvi ix0

Comment 2 dgilbert 2019-05-28 22:20:38 UTC

If it's not already clear, the packets are coming from a "ping 192.168.221.2" in the VM.  192.168.221.2 is an external machine on vlan 221 (also FreeBSD).

Interestingly, also, two DHCP servers are on the network... one on the HOST and one on an external host.  The on-the-host DHCP server is bound to ix0.221 and manages to talk to the VM.  The external DHCP server cannot respond to the VM (ie: the VM fails to get an IP if the local DHCP server is not running).

Comment 3 Rodney W. Grimes freebsd_committer 2019-05-29 03:33:36 UTC

Tagging kevans@ into this, as he just made a fix to bridge that had to do with bpf missing some packets as they traverse the bridge, so he might have a clue as to whether that affects this or not.  I.e., it may be that bpf is not seeing everything it should be seeing.

I have also had issues in the past with getting the HOST to see vlans when I am doing the bhyve/vm, tap, bridge, em0 with vlan trunking turned on.  My vm's can talk to real boxes on the em0 network just fine, and the host can talk to boxes on the em0 network just fine, but my vm's can not talk to my host.

Comment 4 Kyle Evans freebsd_committer 2019-05-29 13:33:25 UTC

(In reply to Rodney W. Grimes from comment #3)

There seems to be something else going on here: ix0.221 should've shown the inbound traffic (with or without my fixes) since that gets tapped in ether_input_internal before entering the bridge at all. The general flow here *should* look like:

external -> ix0:ether_input_internal -> ix0:ether_demux -> vlan_input -> ix0.221:ether_input_internal -> bridge2:bridge_input -> tap0:ether_demux

You've observed the traffic getting tapped at the second step above, then there's a disconnect somewhere after that.
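
One way to narrow down where the disconnect happens is to capture at each interface along that path. The sketch below only prints the tcpdump commands; `tap_points` is a hypothetical helper name, the interface names are the ones from this report, and the `vlan 221` filter on the parent assumes the 802.1Q tag is visible to bpf there.

```sh
#!/bin/sh
# tap_points PARENT VLAN BRIDGE TAP
# Prints one tcpdump command per interface on the receive path described
# above. Nothing is executed; run the printed commands by hand, one per
# terminal, while the VM is pinging.
tap_points() {
    parent=$1 vlan=$2 bridge=$3 tap=$4
    # On the parent interface, match on the VLAN tag as well.
    echo "tcpdump -nei $parent vlan $vlan and icmp"
    for ifn in "$parent.$vlan" "$bridge" "$tap"; do
        echo "tcpdump -nei $ifn icmp"
    done
}

tap_points ix0 221 bridge2 tap0
```

The first capture in the list that shows no echo replies marks the hop where the inbound traffic is lost.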

Comment 5 Eugene Grosbein freebsd_committer 2019-05-29 14:12:21 UTC

If you add an interface like ix0.221 to a bridge, you cannot leave IP addresses on ix0.221. Move them to the bridge0 and it should just work.
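
As a rough sketch of that suggestion applied to this report: the addresses are the ones shown on ix0.221 in the description, and the target is assumed to be bridge2, since that is the bridge ix0.221 is a member of. `move_inet` and `move_inet6` are hypothetical helper names, and the script is a dry run that only prints the ifconfig commands.

```sh
#!/bin/sh
# Dry run: prints the ifconfig commands; it does not touch any interface.
# move_inet  FROM TO ADDR NETMASK    (IPv4 address)
# move_inet6 FROM TO ADDR PREFIXLEN  (IPv6 address)
move_inet() {
    echo "ifconfig $1 inet $3 -alias"
    echo "ifconfig $2 inet $3 netmask $4 alias"
}
move_inet6() {
    echo "ifconfig $1 inet6 $3 -alias"
    echo "ifconfig $2 inet6 $3 prefixlen $4 alias"
}

# Assumption: bridge2 is the bridge that has ix0.221 as a member.
move_inet  ix0.221 bridge2 66.96.20.34 0xffffffe0
move_inet  ix0.221 bridge2 66.96.20.35 0xffffffe0
move_inet6 ix0.221 bridge2 2001:1928:1::34 64
move_inet6 ix0.221 bridge2 2001:1928:1::35 64
```

Running the printed commands over a session that itself goes through one of these addresses would drop that session, so console access is the safer way to try it.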

Comment 6 Rodney W. Grimes freebsd_committer 2019-05-29 14:15:45 UTC

(In reply to Eugene Grosbein from comment #5)
How does one specify the vlan 221 on bridgeX for that IP address, and what happens when I have 4 vlans with different IP addresses, do I stick them all on bridgeX?

Comment 7 dgilbert 2019-05-29 14:26:16 UTC

(In reply to Rodney W. Grimes from comment #6)

What you're saying is "what if I have 3 ethernet cards, with different addresses all plugged into the same switch" ... in effect.  I suppose there's nothing stopping the bridge interface from having the 3 aliases.  Thinking about differences, you still have 3 IP addresses on the same MAC (in the vlan case, not the three ethernet cards case).

If this is indeed the problem (I'm testing now) ... this needs some explicit documentation somewhere.  This might be the second time I've come around to this "documentation bug" ... but from a different direction.  The first time was that the bridge straight up needed the IP not the vlan ... but without a VM to think about.

Comment 8 dgilbert 2019-05-29 14:29:00 UTC

Hrm.  This is a case with different ethernets being different.  Host also has an re0.  In rc.conf, I did 1,$s/ix0/re0/ and rebooted.  With re0, dhcpd (locally) doesn't give an IP to the VM where it did on ix0.

Comment 9 Eugene Grosbein freebsd_committer 2019-05-29 14:29:26 UTC

(In reply to Rodney W. Grimes from comment #6)

Our if_bridge does not support tagged frames currently, so you'd need bridge-per-vlan, so each bridge deals with frames already stripped.
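
A minimal sketch of that bridge-per-vlan layout, as a dry run that only prints the commands. `bridge_for_vlan` is a hypothetical helper name; the vlan/bridge pairings (ix0.1 on bridge1, ix0.221 on bridge2) follow the names that appear elsewhere in this report, and on a host managed by a VM tool these interfaces would normally be created by the tool rather than by hand.

```sh
#!/bin/sh
# Dry run: prints the commands for one bridge per VLAN; nothing is executed.
# bridge_for_vlan PARENT VLAN BRIDGE [TAP]
bridge_for_vlan() {
    vif="$1.$2"
    echo "ifconfig $vif create"     # vlan interface: tag VLAN on PARENT
    echo "ifconfig $3 create"       # one bridge dedicated to this VLAN
    # The bridge only ever sees frames that the vlan interface has untagged.
    echo "ifconfig $3 addm $vif${4:+ addm $4} up"
    echo "ifconfig $vif up"
}

bridge_for_vlan ix0 1 bridge1
bridge_for_vlan ix0 221 bridge2 tap0
```

With four VLANs this gives four vlan interfaces and four bridges, each bridge carrying the addresses for its own VLAN.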

Comment 10 dgilbert 2019-05-29 15:59:20 UTC

(In reply to dgilbert from comment #8)

OK.  My last comment sounded far-out on a limb, even to me.  But I re-verified it.  With isc-dhcpd and dhcpd_ifaces = either "ix0.221" or "re0.221" ... and ix0.221 or re0.221 added to bridge2, the VM gets an IP from the DHCP server with ix0 and not with re0.

Comment 11 Piotr Pietruszewski 2019-05-30 13:16:19 UTC

(In reply to dgilbert from comment #10)

I tried to reproduce this bug, but with no luck. Could you please provide additional information mentioned below?

1. What is the device id of your ix interface?
2. Does log from dmesg contain anything unusual?
3. Could you provide VM's configuration parameters?

Comment 12 dgilbert 2019-05-31 00:05:52 UTC

ix0@pci0:7:0:0: class=0x020000 card=0x00008086 chip=0x15638086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Ethernet Controller 10G X550T'
    class      = network
    subclass   = ethernet

ix0: <Intel(R) PRO/10GbE PCI-Express Network Driver> mem 0xd0000000-0xd01fffff,0xd0200000-0xd0203fff irq 40 at device 0.0 on pci7
ix0: using 2048 tx descriptors and 2048 rx descriptors
ix0: msix_init qsets capped at 64
ix0: pxm cpus: 8 queue msgs: 63 admincnt: 1
ix0: using 8 rx queues 8 tx queues
ix0: Using MSIX interrupts with 9 vectors
ix0: allocated for 8 queues
ix0: allocated for 8 rx queues
ix0: Ethernet address: a0:36:9f:17:ba:10
ix0: PCI Express Bus: Speed 5.0GT/s Width x4
ix0: netmap queues/slots: TX 8/2048, RX 8/2048
ix0: link state changed to UP
ix0: link state changed to DOWN
ix0.1: link state changed to DOWN
bridge1: can't disable some capabilities on ix0.1: 0x400
ix0: promiscuous mode enabled
ix0.1: promiscuous mode enabled
bridge2: can't disable some capabilities on ix0.221: 0x400
ix0.221: promiscuous mode enabled
ix0: link state changed to UP
ix0.221: link state changed to UP
ix0.1: link state changed to UP

[1:13:313]root@run:~> cat /vms/FreeNAS/FreeNAS.conf
loader="bhyveload"
cpu=2
memory=8G
network0_type="virtio-net"
network0_switch="lan221"
disk0_type="virtio-blk"
disk0_name="disk0"
disk0_dev="sparse-zvol"
uuid="b2fdb1cd-b6d5-4ac0-a167-f216b52e0701"
network0_mac="58:9c:fc:01:d4:67"
disk1_name="disk1"
disk1_type="virtio-blk"
disk1_dev="sparse-zvol"
disk2_name="disk2"
disk2_type="virtio-blk"
disk2_dev="sparse-zvol"
disk3_name="disk3"
disk3_type="virtio-blk"
disk3_dev="sparse-zvol"

... so far I have only completed running "vm install FreeNAS" on this VM.

Comment 13 dgilbert 2019-05-31 00:13:10 UTC

It doesn't explain the _different_ behaviour between ix0 and re0, but there is one bug I managed to nail myself.

I _had_ ix0 (or re0) attached to bridge0 (picking up untagged vlan 1 --- which this switch refuses to tag).  Then I had a few other vlans plus vlan 221 (the one we're discussing).  Certainly, I have had lots of BSD machines using the raw ethernet to pick up the management vlan untagged --- but I don't believe I've had a bridge there before.

For now, I will use re0 to pick up the untagged vlan (sigh... feels like an engineering waste), but I do understand the complexity here.  In a netgraph-like case, you can specify the ethertypes that are taken and left and whatnot --- ifconfig doesn't allow us to express this.

I would very much like to be in a discussion of layer 2 semantics, should one occur.  Terminology is drastically overloaded and the number of useful combinations is high ... leaving a more flexible solution a clear winner.

What I'm saying is that the ability to pick off an untagged vlan 1 on the raw port is very useful with modern gear.  I realize this means having a way to specify picking off ethertypes (at least for v4 and v6) and that potential confusion is high ... so accurate abstraction is key.

Anyways... far beyond the status of this bug.  re0 and ix0 behave differently in this corner case, but you may need to add re0 and/or ix0 to a bridge to replicate it.

Comment 14 punkt.de Hosting Team 2019-06-03 08:39:16 UTC

(In reply to dgilbert from comment #13)
I really never understood the need for and hence the presence of untagged frames on a switch trunk port. But it's in the standard so for now we are stuck with it. If your switch is of Cisco brand, here's what we do:

switchport trunk native vlan 1001

1001 is a VLAN that is never used in our entire data centre, so everything that matters is properly tagged.

Kind regards,
Patrick

Comment 15 dgilbert 2019-06-03 10:54:20 UTC

(In reply to punkt.de Hosting Team from comment #14)

[ on configuring my switch to avoid the problem ]

That's fine on a Cisco.  But for the dozens of other brands, not-so-much.  In this case, it's a Ubiquiti (or UniFi?) ... an aggressively positioned switch as part of an ecosystem of WiFi and WISP gear.

AFAICT, changing the access vlan removes communication with the management lan altogether.  As I said, so far I'm using another ethernet card to talk to this.

It's not an uncommon setup, I believe.  FreeBSD should be that swiss army knife that just works in all situations, not the OS that prays to one particular god of configuration or the other.

Comment 16 Mark Linimon freebsd_committer 2024-10-04 14:31:17 UTC

^Triage: clear stale flags.