Bug 221122 - Attaching interface to a bridge stops all traffic on uplink NIC for few seconds
Summary: Attaching interface to a bridge stops all traffic on uplink NIC for few seconds
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 11.1-RELEASE
Hardware: amd64 Any
Importance: --- Affects Many People
Assignee: freebsd-net (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-07-31 13:57 UTC by Heinz N. Gies
Modified: 2021-12-08 09:12 UTC
10 users

See Also:


Description Heinz N. Gies 2017-07-31 13:57:20 UTC
Running the following test case causes bridge0 to become unresponsive (this includes other interfaces on the bridge):

while true
do
   PAIR=$(ifconfig epair create | sed 's/a$//')
   ifconfig bridge0 addm ${PAIR}a
   jail -i -c name=crash persist vnet=new vnet.interface=${PAIR}b exec.start="/sbin/ifconfig ${PAIR}b name net0p"
   jail -r crash
   ifconfig ${PAIR}a destroy
done


The test setup:

                               ┌───────────────────────────────────┐
                               │             BSD Box               │
                  ┌──────┐     │     ┌──────────────┬─────────────┐│
┌───────────┐     │      │     │     │      em0     │             ││
│ ping host │────▶│switch│─────┼────▶│ 192.168.1.22 │   bridge0   ││
└───────────┘     │      │     │     └──────────────┤             ││
                  └──────┘     │                    └─────────────┘│
                               └───────────────────────────────────┘

The kernel is the 11.1-RELEASE kernel with the following patch applied: https://reviews.freebsd.org/D11782

The kernel config is:

include GENERIC
ident FIFOKERNEL

nooptions       SCTP   # Stream Control Transmission Protocol
options         VIMAGE # VNET/Vimage support
options         RACCT  # Resource containers
options         RCTL   # same as above


The system is a Supermicro X9SCL/X9SCM with an Intel(R) Xeon(R) E3-1220 V2 @ 3.10GHz CPU and an Intel 1G network card.
Comment 1 Alexander Motin freebsd_committer 2017-10-16 10:16:18 UTC
I've tried to reproduce this, and all I see is the uplink interface flapping for several seconds because the bridge needs to disable/restore interface offload flags.  After the NIC reinitializes the link, operation is restored.  Am I reproducing your issue, or do you mean something different?
Comment 2 Heinz N. Gies 2017-10-16 10:54:34 UTC
Hi, first of all, thanks for looking into this! It does sound like an explanation for what I'm seeing. I sadly know little about the internals of the network stack, but the symptoms seem to fit. Adding an interface leads to a reproducible drop of connectivity/delay for a few seconds.
Comment 3 Alexander Motin freebsd_committer 2017-10-16 11:00:18 UTC
Then I tend to say that it behaves correctly, even though not very nicely.  If you wish to avoid the flaps on bridge reconfiguration, you may explicitly disable some capabilities of the uplink interface before bridge configuration, so that the bridge does not modify them later on epair interface addition/removal.
Comment 4 Heinz N. Gies 2017-10-16 11:24:23 UTC
I understand that it acts as implemented, i.e. it is not a code bug. Before we close this I'd like to make the case that it is not working as intended but rather working as accepted.

The VNET system is rather new in FreeBSD, bridges, on the other hand, exist for a lot longer.

Historically bridges were used in a rather static manner: to bridge physical interfaces (which don't change often), or to bridge between physical interfaces and tunnels or other virtual, but likewise rather static, interfaces.

This kind of use is often a one-time configuration that happens on system startup or, in the case of tunnels, on an incredibly rare basis. At those times the loss of connectivity for a few seconds either has no impact (during startup), or the impact is negligible (i.e. when adding tunnel interfaces, as no one is connected to a nonexistent interface anyway).

I suspect that when the decision was made to implement it this way, all of that was taken into consideration and (rightfully so) it wasn't worth the work of finding an alternative, as it was working well enough for its use.

VNET, and more so VNET jails, change things a bit: they make network configuration more dynamic. It becomes necessary to add interfaces to and remove them from a bridge dynamically, something that I suspect wasn't foreseen.

Features do not exist in a void, they exist in relation to their environment. The environment for bridges changed and while it was fine before it becomes problematic in this changed environment.

I agree it's not a 'bug' in the bridge driver. But we cannot look at a single component in isolation, and on a system level I'm sure that 'starting/stopping a VNET jail means all other VNET jails lose connectivity' is not intended behavior.
Comment 5 Alexander Motin freebsd_committer 2017-10-16 11:32:02 UTC
OK, we can call it whatever you like, but that does not change the facts: to be able to bridge interfaces with different hardware capabilities, some of those capabilities have to be disabled, and changing capabilities on Intel NICs ends up in a NIC reinit, which takes time and is invasive.  Before this was introduced, bridging simply did not work correctly in a number of scenarios, including VNET jails, especially for modern NICs with more offload capabilities.  If somebody sees an alternative way to handle that -- be my guest.
Comment 6 Eugene Grosbein freebsd_committer 2017-10-16 11:40:00 UTC
(In reply to Heinz N. Gies from comment #4)

Addition of first member to the bridge is quite different from addition of others. Why do you think it interferes with traffic flow every time?

Also, you did not show your actions (commands) and have not been quite specific about what ill effects those actions bring thereafter.
Comment 7 Heinz N. Gies 2017-10-16 11:55:55 UTC
(In reply to Eugene Grosbein from comment #6)

> Addition of first member to the bridge is quite different from addition of others. Why do you think it interferes with traffic flow every time?

Mostly because I could not find any documentation regarding this, so all I had to go by was what I observed, and it never occurred to me to try a second or third interface after seeing the problem with the first.

The actions/commands are in the initial bug report, along with a diagram of the setup and the hardware specifications.

The ill effect is losing network connectivity for a few seconds, which can be quite problematic for a server.

Perhaps I'm approaching this all wrong and trying to squeeze a square peg through a round hole. Are bridge/epairs the wrong tools for VNET jails? Is there a better alternative?
Comment 8 Alexander Motin freebsd_committer 2017-10-16 12:06:53 UTC
(In reply to Heinz N. Gies from comment #7)
Bridge+epair are the right tools, unless you wish to dedicate one NIC completely to specific VNET Jail.

I've already told you how to work around the problem: when configuring the uplink interface, you can explicitly disable the capabilities that the bridge would otherwise try to disable (TSO, LRO, TOE, TXCSUM, TXCSUM6).  In that case the bridge should be happy from the beginning and not modify capabilities any more.
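As a sketch, the one-off commands might look like this (flag names per ifconfig(8); which of them exist depends on the NIC, so check the output of ifconfig first; em0 here stands for the uplink from this report):

```shell
# Pre-disable the offload capabilities the bridge would otherwise turn off.
# Adjust the flag list to what your NIC actually advertises.
ifconfig em0 -tso -lro -txcsum -txcsum6
# Adding members afterwards should no longer force a reinit of em0:
ifconfig bridge0 addm epair0a
```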
Comment 9 commit-hook freebsd_committer 2017-10-16 12:33:54 UTC
A commit references this bug:

Author: mav
Date: Mon Oct 16 12:32:57 UTC 2017
New revision: 324659
URL: https://svnweb.freebsd.org/changeset/base/324659

Log:
  Update details of interface capabilities changed by bridge(4).

  PR:		221122
  MFC after:	1 week

Changes:
  head/share/man/man4/bridge.4
Comment 10 Eugene Grosbein freebsd_committer 2017-10-16 12:38:58 UTC
(In reply to Heinz N. Gies from comment #7)

Please repeat your tests being more thorough:

1. Verify whether you still have the problem when adding the second and subsequent bridge members after the uplink interface has already been added as the first bridge member.

2. Compare the output of ifconfig $uplink before and after it is added to the bridge. Then destroy the bridge and use ifconfig on the uplink to disable the features that the bridge disables automatically. Then repeat creation of the bridge and verify whether adding the uplink as the first bridge member still leads to an uplink reset.
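A sketch of that procedure (interface names taken from this report; the exact flags to pre-disable should come from the diff, not from this example):

```shell
# Capture capabilities, let the bridge change them, compare:
ifconfig em0 > /tmp/em0.before
ifconfig bridge0 addm em0
ifconfig em0 > /tmp/em0.after
diff /tmp/em0.before /tmp/em0.after   # removed options= flags are the ones to pre-disable

# Start over with those flags disabled up front:
ifconfig bridge0 destroy
ifconfig em0 -rxcsum -txcsum -tso4    # assumed flag set; use what the diff showed
ifconfig bridge0 create
ifconfig bridge0 addm em0             # verify this no longer resets the uplink
```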
Comment 11 Heinz N. Gies 2017-10-16 12:41:07 UTC
Yes, I read that, and I've been going through the man pages trying to figure out which those are. Is there a list of settings supported by epairs? I just saw the updated bridge(4) info; I think that's what I was looking for.

I was worried that the delta (RXCSUM, TXCSUM, TSO4) was not exhaustive - and it seems it wasn't.

Weeding through ifconfig(8), will LRO also be affected?

I'm not trying to be dense. I've spent quite some time building tooling around jails and am trying to understand this well enough to write up the steps for someone (like me) who doesn't know how bridges are implemented, so things can be set up in a way that works in a production environment without unpleasant surprises.
Comment 12 Eugene Grosbein freebsd_committer 2017-10-16 12:42:44 UTC
(In reply to Heinz N. Gies from comment #7)

> 2. Compare output of ifconfig $uplink before and after it added to the bridge.

... after it AND other members added to the bridge.
Comment 13 Heinz N. Gies 2017-10-16 12:49:18 UTC
(In reply to Eugene Grosbein from comment #12)


ifconfig em0 (no bridge interfaces)
em0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=4219b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_MAGIC,VLAN_HWTSO>
	ether 00:25:90:a6:3b:c7
	hwaddr 00:25:90:a6:3b:c7
	inet 192.168.1.22 netmask 0xffffff00 broadcast 192.168.1.255
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
	media: Ethernet autoselect (1000baseT <full-duplex>)
	status: active

adding first bridge interface:

64 bytes from 192.168.1.22: icmp_seq=22 ttl=64 time=1.325 ms
Request timeout for icmp_seq 23
Request timeout for icmp_seq 24
Request timeout for icmp_seq 25
Request timeout for icmp_seq 26
Request timeout for icmp_seq 27
Request timeout for icmp_seq 28
Request timeout for icmp_seq 29
64 bytes from 192.168.1.22: icmp_seq=30 ttl=64 time=1.261 ms

ifconfig em0 (after adding bridge interface) 
em0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=42098<VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC,VLAN_HWTSO>
	ether 00:25:90:a6:3b:c7
	hwaddr 00:25:90:a6:3b:c7
	inet 192.168.1.22 netmask 0xffffff00 broadcast 192.168.1.255
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
	media: Ethernet autoselect (1000baseT <full-duplex>)
	status: active


adding second interface:

64 bytes from 192.168.1.22: icmp_seq=132 ttl=64 time=1.432 ms
64 bytes from 192.168.1.22: icmp_seq=133 ttl=64 time=1.332 ms
64 bytes from 192.168.1.22: icmp_seq=134 ttl=64 time=1.146 ms

(no drops)
Comment 14 Eugene Grosbein freebsd_committer 2017-10-17 08:05:43 UTC
(In reply to Heinz N. Gies from comment #13)

Have you tried using /etc/rc.conf to disable those features of em0 that get disabled by the bridge anyway? Then create the bridge and add members to it to make sure that doing so no longer affects traffic this way.
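For example, based on the delta shown in comment 13 (RXCSUM, TXCSUM, TSO4), the rc.conf line might look like this (a sketch; match the flags to what the bridge actually disables on your system):

```shell
# /etc/rc.conf fragment (flag list taken from the comment 13 diff)
ifconfig_em0="up -rxcsum -txcsum -tso4"
```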
Comment 15 Heinz N. Gies 2017-10-17 16:17:17 UTC
(In reply to Eugene Grosbein from comment #14)

Yes, I did remove the features Alexander recommended, and that did solve the downtime issue. He also submitted a patch to document the behavior. While it isn't ideal, we don't live in a perfect world, and having it documented is probably as good as it gets.
Comment 16 Julian Elischer freebsd_committer 2017-10-18 05:49:17 UTC
The earlier comment that epair and bridge were the way to go was correct but incomplete.  You can also use netgraph to plumb the jails (this was how vimage was originally done). See the examples in /usr/share/examples.
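A rough sketch of such netgraph plumbing (node and hook names here are illustrative; the jib/jng example scripts under /usr/share/examples automate this properly):

```shell
# Load the netgraph modules this sketch assumes (see netgraph(4)):
kldload ng_ether ng_bridge ng_eiface

# Attach an ng_bridge node to the NIC's lower (wire) and upper (host) hooks:
ngctl mkpeer em0: bridge lower link0
ngctl name em0:lower ngbr0
ngctl connect em0: ngbr0: upper link1

# Hang an ng_eiface off the bridge; the resulting ngethX interface can then
# be handed to a jail via vnet.interface:
ngctl mkpeer ngbr0: eiface link2 ether
```

Because this never touches the NIC's offload capabilities, it avoids the reinit that bridge(4) membership changes can trigger.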
Comment 17 commit-hook freebsd_committer 2017-10-23 07:39:35 UTC
A commit references this bug:

Author: mav
Date: Mon Oct 23 07:39:05 UTC 2017
New revision: 324908
URL: https://svnweb.freebsd.org/changeset/base/324908

Log:
  MFC r324659: Update details of interface capabilities changed by bridge(4).

  PR:		221122

Changes:
_U  stable/11/
  stable/11/share/man/man4/bridge.4
Comment 18 Mason Loring Bliss freebsd_triage 2021-12-07 03:46:54 UTC
Re-opening as this is still an issue, and shouldn't be. Details to follow.
Comment 19 Mason Loring Bliss freebsd_triage 2021-12-07 03:52:06 UTC
I saw this issue on FreeBSD 13.0-RELEASE, and following kbowling's
recommendation, also tried the most recent 13-STABLE images. This latter
is where I've gathered data.

Same issue: Add an epair half to a bridge and things go away for several
seconds. The delay is quite possibly longer in -STABLE but I might be
imagining it. Either way, documented below. Note that on literally the
same hardware, the same operations cause no delay under Debian Bullseye:
Have a bridge, add a vnet device to it, and everything keeps flowing
without interruption, which is useful since these boxes are hypervisors
and running a variety of generally network-oriented tasks.

# freebsd-version -ku ; uname -a
13.0-STABLE
13.0-STABLE
FreeBSD amazon.int.blisses.org 13.0-STABLE FreeBSD 13.0-STABLE #0
stable/13-n248302-2cd26a286a9: Thu Dec  2 02:40:58 UTC 2021
root@releng3.nyi.freebsd.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC
amd64
# dmesg | tail
uhid0 on uhub3
uhid0: <Logitech USB Keyboard, class 0/0, rev 1.10/79.00, addr 3> on usbus1
Security policy loaded: MAC/ntpd (mac_ntpd)
epair0a: Ethernet address: 02:52:8f:32:b1:0a
epair0b: Ethernet address: 02:52:8f:32:b1:0b
epair0a: link state changed to UP
epair0b: link state changed to UP
igb0: link state changed to DOWN
epair0a: promiscuous mode enabled
igb0: link state changed to UP
# dmesg | grep igb0
igb0: <Intel(R) I350 (Copper)> mem
0xfb720000-0xfb73ffff,0xfb7c4000-0xfb7c7fff irq 40 at device 0.0 on pci3
igb0: EEPROM V0.93-0 eTrack 0x800006b2
igb0: Using 1024 TX descriptors and 1024 RX descriptors
igb0: Using 6 RX queues 6 TX queues
igb0: Using MSI-X interrupts with 7 vectors
igb0: Ethernet address: 00:25:90:a6:a5:60
igb0: netmap queues/slots: TX 6/1024, RX 6/1024
igb0: promiscuous mode enabled
igb0: link state changed to UP
igb0: link state changed to DOWN
igb0: link state changed to UP

/home/mason$ date ; ssh root@amazon ifconfig bridge0 addm epair0a ; date
Mon 06 Dec 2021 10:36:37 PM EST
Mon 06 Dec 2021 10:36:41 PM EST
/home/mason$ date ; ssh root@amazon ifconfig bridge0 deletem epair0a ; date
Mon 06 Dec 2021 10:37:00 PM EST
Mon 06 Dec 2021 10:37:05 PM EST
/home/mason$ date ; ssh root@amazon date ; date
Mon 06 Dec 2021 10:38:14 PM EST
Mon Dec  6 22:38:14 EST 2021
Mon 06 Dec 2021 10:38:14 PM EST
Comment 20 Mason Loring Bliss freebsd_triage 2021-12-08 04:11:47 UTC
The bridge is set up per:

    https://wiki.freebsd.org/MasonLoringBliss/JailsEpair

...albeit with igb0 rather than em0 in this case.

So:

cloned_interfaces="bridge0"
ifconfig_bridge0="inet 10.0.0.2 netmask 0xffffff00 addm igb0"
ifconfig_igb0="up"
Comment 21 Alexander Motin freebsd_committer 2021-12-08 04:18:25 UTC
Mason, I don't see how this can be fixed without either significantly complicating the bridge driver to handle TSO/LRO/etc. offload in software, or making Intel drivers somehow avoid a chip reset on interface capability changes (if that is even possible).

In TrueNAS we've worked around this problem by adding a UI checkbox to preemptively disable interface offload on boot.  Done early, it avoids the interface flap later.
Comment 22 Mason Loring Bliss freebsd_triage 2021-12-08 04:35:25 UTC
Linux manages the trick on this same box, so the hardware can manage it 
unless there's some critical difference I'm missing. I'd be happy to 
explore from either side to shed more light on it. And sure, I can change 
my model a bit and add a pool of epairs at boot and assign them 
programmatically instead of using per-jail numbering and dynamic spin-up as
I do today, but my interest in this started when I realized the delay was
there in FreeBSD. Seems unfortunate for FreeBSD's handling to be less 
capable than what Linux can do. That said, if I'm missing some concept that
is different and matters, I'm eager to learn about it.
Comment 23 Eugene Grosbein freebsd_committer 2021-12-08 06:06:33 UTC
(In reply to Mason Loring Bliss from comment #22)

Please provide the output of "ifconfig igb0" and "ifconfig bridge0" just after boot, when the bridge has only the single igb0 member.

Then attach a new epair to the bridge as you usually do and show the ifconfig output again for igb0, bridge0, and the epair at the host.

I'm sure the problem can be solved by replacing ifconfig_igb0="up" with a setting that also disables the offloads not supported by epair.
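For instance (a guess at the flag set for igb(4); the exact list should come from comparing the ifconfig output before and after bridge membership):

```shell
# /etc/rc.conf fragment replacing ifconfig_igb0="up" (flag names assumed)
ifconfig_igb0="up -tso -lro -txcsum -txcsum6 -vlanhwtso"
```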
Comment 24 Marek Zarychta 2021-12-08 08:20:08 UTC
At least one epair(4) can be created and added to the bridge early, to reconcile the capabilities up front:

ifconfig_oce3="up mtu 9000"
cloned_interfaces="bridge0 epair0 ..."
create_args_epair0="mtu 9000 up"
ifconfig_bridge0="addm oce3 addm epair0a ..."

The bridge will then not suffer when further epair(4) interfaces are added later.
Comment 25 Eugene Grosbein freebsd_committer 2021-12-08 09:12:19 UTC
(In reply to Marek Zarychta from comment #24)

Capabilities can be changed at bridge creation time, so the first epair won't be affected either. That's why I asked for exact ifconfig output.