Running the following test case causes bridge0 to become unresponsive (this includes other interfaces on the bridge):

while true; do
    PAIR=`ifconfig epair create | sed 's/a$//'`
    ifconfig bridge0 addm ${PAIR}a
    jail -i -c name=crash persist vnet=new vnet.interface=${PAIR}b \
        exec.start="/sbin/ifconfig ${PAIR}b name net0p"
    jail -r crash
    ifconfig ${PAIR}a destroy
done

The test setup:

┌───────────┐     ┌──────┐     ┌─ BSD Box ──────────────┐
│ ping host │────▶│switch│────▶│ em0 (192.168.1.22)     │
└───────────┘     └──────┘     │   member of bridge0    │
                               └────────────────────────┘

The kernel is the 11.1-RELEASE kernel with the following patch applied: https://reviews.freebsd.org/D11782

The kernel config is:

include GENERIC
ident FIFOKERNEL
nooptions SCTP      # Stream Control Transmission Protocol
options VIMAGE      # VNET/Vimage support
options RACCT       # Resource containers
options RCTL        # same as above

The system is a Supermicro X9SCL/X9SCM with an Intel(R) Xeon(R) CPU E3-1220 V2 @ 3.10GHz and an Intel 1G network card.
I've tried to reproduce this, and all I see is the uplink interface flapping for several seconds because the bridge needs to disable/restore the interface's offload flags. After the NIC reinitializes the link, operation is restored. Does this reproduce your issue, or do you mean something different?
Hi, first of all, thanks for looking into this! It does sound like an explanation for what I'm seeing. I sadly know little about the internals of the network stack, but the symptoms seem to fit. Adding an interface leads to a reproducible drop of connectivity/delay for a few seconds.
Then I tend to say that it behaves correctly, even if not very nicely. If you wish to avoid the flaps on bridge reconfiguration, you may explicitly disable some capabilities of the uplink interface before bridge configuration, so the bridge does not modify them later when epair interfaces are added or removed.
I understand that it acts as implemented, i.e. it is not a code bug. Before we close this I'd like to make the case that it is not working as intended but rather working as accepted.

The VNET system is rather new in FreeBSD; bridges, on the other hand, have existed for a lot longer. Historically bridges were used in a rather static manner: to bridge physical interfaces (which don't change often), or to bridge between physical interfaces and tunnels or other virtual but likewise rather static interfaces. This kind of use is usually a one-time configuration that happens at system startup or, in the case of tunnels, on an extremely rare basis. In those cases the loss of connectivity for a few seconds either has no impact (during startup), or the impact is negligible (no one is connected through a tunnel interface that doesn't exist yet). I suspect that when the decision was made to implement it this way, all of that was taken into consideration and (rightfully so) it wasn't worth the work of finding an alternative, since it worked well enough for that use.

VNET, and more so VNET jails, change things a bit: they make network configuration more dynamic. It becomes necessary to add and remove bridge members dynamically, something that I suspect wasn't foreseen. Features do not exist in a void; they exist in relation to their environment. The environment for bridges has changed, and while the behavior was fine before, it becomes problematic in this changed environment.

I agree it's not a 'bug' in the bridge driver. But we cannot look at a single component in isolation, and on a system level I'm sure that 'starting/stopping a VNET jail means all other VNET jails lose connectivity' is not intended behavior.
OK, we can call it anything you like, but that does not change the facts: to be able to bridge interfaces with different hardware capabilities, some of those capabilities have to be disabled, and changing capabilities on Intel NICs ends up in a NIC reinit, which takes time and is invasive. Before this was introduced, bridging simply did not work correctly in a number of scenarios, including VNET jails, especially for modern NICs with more offload capabilities. If somebody sees an alternative way to handle that -- be my guest.
(In reply to Heinz N. Gies from comment #4)

Addition of the first member to the bridge is quite different from addition of the others. Why do you think it interferes with traffic flow every time? Also, you did not show your actions (commands) and have not been very specific in describing what ill effects those actions bring thereafter.
(In reply to Eugene Grosbein from comment #6)

> Addition of first member to the bridge is quite different from addition of others. Why do you think it interferes with traffic flow every time?

Mostly because I could not find any documentation regarding this, so all I had to go by was what I observed, and it never occurred to me to try a second or third interface after seeing the problem with the first.

The actions/commands are in the initial bug report, along with a diagram of the setup and hardware specifications. The ill effect is losing network connectivity for a few seconds; for a server that can be quite problematic.

Perhaps I'm approaching this all wrong and trying to squeeze a square peg through a round hole. Are bridge/epairs the wrong tools for VNET jails? Is there a better alternative?
(In reply to Heinz N. Gies from comment #7)

Bridge+epair are the right tools, unless you wish to dedicate one NIC completely to a specific VNET jail. I've already told you how to work around the problem: when configuring the uplink interface, you can explicitly disable the capabilities that the bridge would otherwise disable itself (TSO, LRO, TOE, TXCSUM, TXCSUM6). In that case the bridge should be happy from the beginning and not modify capabilities any more.
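To make that workaround concrete, here is a hedged rc.conf sketch. The interface name em0 and the exact flag list are examples only; which flags apply depends on what `ifconfig` shows your particular NIC actually advertises (-lro and -toe only matter if the NIC supports them):

```
# /etc/rc.conf -- disable the offloads bridge(4) would otherwise strip,
# so that adding epair members to the bridge later never forces a NIC
# reinit. Example flag set; trim to what your NIC actually enables.
ifconfig_em0="up -tso4 -tso6 -txcsum -txcsum6 -lro -toe"
```

With the capabilities already matching what the bridge requires, subsequent addm/deletem operations should not touch the uplink at all.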
A commit references this bug:

Author: mav
Date: Mon Oct 16 12:32:57 UTC 2017
New revision: 324659
URL: https://svnweb.freebsd.org/changeset/base/324659

Log:
  Update details of interface capabilities changed by bridge(4).

  PR:		221122
  MFC after:	1 week

Changes:
  head/share/man/man4/bridge.4
(In reply to Heinz N. Gies from comment #7)

Please repeat your tests, being more thorough:

1. Verify whether you still have the problem when adding the second and subsequent bridge members, after the uplink interface has already been added as the first bridge member.

2. Compare the output of ifconfig $uplink before and after it is added to the bridge. Then destroy the bridge and use ifconfig on the uplink to disable the features that the bridge disables automatically. Then repeat creation of the bridge and verify whether adding the uplink as the first bridge member still leads to an uplink reset.
Yes, I read that, and I've been going through the man pages trying to figure out which those are. Is there a list of settings supported by epairs? (Just saw the updated bridge(4) info; I think that's what I was looking for.) I was worried that the delta (RXCSUM, TXCSUM, TSO4) was not exhaustive, and it seems it wasn't. Weeding through ifconfig(8): will LRO also be affected?

I'm not trying to be dense. I've spent quite some time building tooling around jails and am trying to understand this well enough to write up the steps for someone (like me) who doesn't know how bridges are implemented, so things can be made to work in a production environment without unpleasant surprises.
(In reply to Heinz N. Gies from comment #7)

> 2. Compare output of ifconfig $uplink before and after it added to the bridge.

... after it AND the other members are added to the bridge.
(In reply to Eugene Grosbein from comment #12)

ifconfig em0 (no bridge interfaces):

em0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=4219b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_MAGIC,VLAN_HWTSO>
	ether 00:25:90:a6:3b:c7
	hwaddr 00:25:90:a6:3b:c7
	inet 192.168.1.22 netmask 0xffffff00 broadcast 192.168.1.255
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
	media: Ethernet autoselect (1000baseT <full-duplex>)
	status: active

Adding the first bridge interface:

64 bytes from 192.168.1.22: icmp_seq=22 ttl=64 time=1.325 ms
Request timeout for icmp_seq 23
Request timeout for icmp_seq 24
Request timeout for icmp_seq 25
Request timeout for icmp_seq 26
Request timeout for icmp_seq 27
Request timeout for icmp_seq 28
Request timeout for icmp_seq 29
64 bytes from 192.168.1.22: icmp_seq=30 ttl=64 time=1.261 ms

ifconfig em0 (after adding bridge interface):

em0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=42098<VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC,VLAN_HWTSO>
	ether 00:25:90:a6:3b:c7
	hwaddr 00:25:90:a6:3b:c7
	inet 192.168.1.22 netmask 0xffffff00 broadcast 192.168.1.255
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
	media: Ethernet autoselect (1000baseT <full-duplex>)
	status: active

Adding the second interface:

64 bytes from 192.168.1.22: icmp_seq=132 ttl=64 time=1.432 ms
64 bytes from 192.168.1.22: icmp_seq=133 ttl=64 time=1.332 ms
64 bytes from 192.168.1.22: icmp_seq=134 ttl=64 time=1.146 ms

(no drops)
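For what it's worth, the capability delta between the two outputs above can be computed mechanically. A bash sketch (the two lists are copied verbatim from the ifconfig output in this comment):

```shell
#!/usr/bin/env bash
# Option lists from ifconfig em0 before and after bridge membership.
before='RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_MAGIC,VLAN_HWTSO'
after='VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC,VLAN_HWTSO'

# Capabilities the bridge stripped = names present before but not after.
stripped=$(comm -23 <(tr ',' '\n' <<<"$before" | sort) \
                    <(tr ',' '\n' <<<"$after"  | sort))
echo "$stripped"
```

This prints RXCSUM, TSO4 and TXCSUM, i.e. exactly the checksum/TSO offloads the bridge has to disable on the uplink.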
(In reply to Heinz N. Gies from comment #13)

Have you tried using /etc/rc.conf to disable those features of em0 that get disabled by the bridge anyway? Then create the bridge and try to add members to it, to make sure that it does not affect traffic this way.
(In reply to Eugene Grosbein from comment #14)

Yes, I did remove the features Alexander recommended, and that did solve the downtime issue. He also submitted a patch to document the behavior. While it isn't ideal, we don't live in a perfect world, and having it documented is probably as good as it gets.
The earlier comment that epair and bridge were the way to go was correct but incomplete. You can also use netgraph to plumb the jails (this was how vimage was originally done). See the examples in /usr/share/examples.
A commit references this bug:

Author: mav
Date: Mon Oct 23 07:39:05 UTC 2017
New revision: 324908
URL: https://svnweb.freebsd.org/changeset/base/324908

Log:
  MFC r324659:
  Update details of interface capabilities changed by bridge(4).

  PR:		221122

Changes:
_U  stable/11/
  stable/11/share/man/man4/bridge.4
Re-opening as this is still an issue, and shouldn't be. Details to follow.
I saw this issue on FreeBSD 13.0-RELEASE, and following kbowling's recommendation, also tried the most recent 13-STABLE images. The latter is where I've gathered data. Same issue: add an epair half to a bridge and things go away for several seconds. The delay is quite possibly longer in -STABLE, but I might be imagining it. Either way, documented below.

Note that on literally the same hardware, the same operations cause no delay under Debian Bullseye: have a bridge, add a vnet device to it, and everything keeps flowing without interruption, which is useful since these boxes are hypervisors running a variety of generally network-oriented tasks.

# freebsd-version -ku ; uname -a
13.0-STABLE
13.0-STABLE
FreeBSD amazon.int.blisses.org 13.0-STABLE FreeBSD 13.0-STABLE #0 stable/13-n248302-2cd26a286a9: Thu Dec  2 02:40:58 UTC 2021     root@releng3.nyi.freebsd.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64

# dmesg | tail
uhid0 on uhub3
uhid0: <Logitech USB Keyboard, class 0/0, rev 1.10/79.00, addr 3> on usbus1
Security policy loaded: MAC/ntpd (mac_ntpd)
epair0a: Ethernet address: 02:52:8f:32:b1:0a
epair0b: Ethernet address: 02:52:8f:32:b1:0b
epair0a: link state changed to UP
epair0b: link state changed to UP
igb0: link state changed to DOWN
epair0a: promiscuous mode enabled
igb0: link state changed to UP

# dmesg | grep igb0
igb0: <Intel(R) I350 (Copper)> mem 0xfb720000-0xfb73ffff,0xfb7c4000-0xfb7c7fff irq 40 at device 0.0 on pci3
igb0: EEPROM V0.93-0 eTrack 0x800006b2
igb0: Using 1024 TX descriptors and 1024 RX descriptors
igb0: Using 6 RX queues 6 TX queues
igb0: Using MSI-X interrupts with 7 vectors
igb0: Ethernet address: 00:25:90:a6:a5:60
igb0: netmap queues/slots: TX 6/1024, RX 6/1024
igb0: promiscuous mode enabled
igb0: link state changed to UP
igb0: link state changed to DOWN
igb0: link state changed to UP

/home/mason$ date ; ssh root@amazon ifconfig bridge0 addm epair0a ; date
Mon 06 Dec 2021 10:36:37 PM EST
Mon 06 Dec 2021 10:36:41 PM EST

/home/mason$ date ; ssh root@amazon ifconfig bridge0 deletem epair0a ; date
Mon 06 Dec 2021 10:37:00 PM EST
Mon 06 Dec 2021 10:37:05 PM EST

/home/mason$ date ; ssh root@amazon date ; date
Mon 06 Dec 2021 10:38:14 PM EST
Mon Dec  6 22:38:14 EST 2021
Mon 06 Dec 2021 10:38:14 PM EST
The bridge is set up per: https://wiki.freebsd.org/MasonLoringBliss/JailsEpair

...albeit with igb0 rather than em0 in this case. So:

cloned_interfaces="bridge0"
ifconfig_bridge0="inet 10.0.0.2 netmask 0xffffff00 addm igb0"
ifconfig_igb0="up"
Mason, I don't see how this can be fixed without either significantly complicating the bridge driver to handle TSO/LRO/etc. offload in software, or making Intel drivers somehow avoid a chip reset on interface capability changes (if that is even possible). In TrueNAS we've worked around this problem by adding a UI checkbox to preemptively disable interface offloads on boot. Done early, this avoids the interface flap later.
Linux manages the trick on this same box, so the hardware can manage it unless there's some critical difference I'm missing. I'd be happy to explore from either side to shed more light on it. And sure, I can change my model a bit and add a pool of epairs at boot and assign them programmatically instead of using per-jail numbering and dynamic spin-up as I do today, but my interest in this started when I realized the delay was there in FreeBSD. Seems unfortunate for FreeBSD's handling to be less capable than what Linux can do. That said, if I'm missing some concept that is different and matters, I'm eager to learn about it.
(In reply to Mason Loring Bliss from comment #22)

Please provide the output of "ifconfig igb0" and "ifconfig bridge0" just after boot, when the bridge has only the single igb0 member. Then attach a new epair to the bridge as you generally do and show the output of ifconfig again for igb0, bridge0 and the epair at the host. I'm sure the problem can be solved by replacing ifconfig_igb0="up" with a line that also disables the offloads not supported by epair.
At least one epair(4) can be created and added to the bridge early to establish consensus between capabilities:

ifconfig_oce3="up mtu 9000"
cloned_interfaces="bridge0 epair0 ..."
create_args_epair0="mtu 9000 up"
ifconfig_bridge0="addm oce3 addm epair0a ..."

The bridge will then not suffer from epair(4) interfaces added later.
(In reply to Marek Zarychta from comment #24)

Capabilities can be changed at bridge creation time, so the first epair won't be affected either. That's why I've asked for the exact ifconfig output.
I just ran into this again today, and that reminded me of the bug. This is from a different box, but the symptoms are the same:

# ifconfig
em0: flags=8963<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=4812099<RXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC,VLAN_HWFILTER,NOMAP>
	ether elided
	media: Ethernet autoselect (1000baseT <full-duplex>)
	status: active
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
	options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
	inet6 ::1 prefixlen 128
	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x2
	inet 127.0.0.1 netmask 0xff000000
	groups: lo
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
bridge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	ether elided
	inet elided.2 netmask 0xffffff00 broadcast elided.255
	id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
	maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
	root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
	member: epair43a flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
	        ifmaxaddr 0 port 4 priority 128 path cost 2000
	member: em0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
	        ifmaxaddr 0 port 1 priority 128 path cost 2000000
	groups: bridge
	nd6 options=9<PERFORMNUD,IFDISABLED>
epair43a: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=8<VLAN_MTU>
	ether elided
	groups: epair
	media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
	status: active
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
Marek, I just realized that your comment #24 applies to my situation. The first epair added incurs the penalty, but further epairs do not. I think this will be sufficient for my purposes. Thank you!
Should be fixed in stable/12 and stable/13 now.
If you could include a link to commits that'd be appreciated! Thank you.
(In reply to Eugene Grosbein from comment #28) Do you have a link to this commit?
I burned a few hours on this last night, first thinking something was amiss with iocage (a fair assumption, as it seems to be another abandoned project). Then while troubleshooting, I started running the bridge creation and interface additions by hand and noticed my prompt was hanging for a few seconds. Then I found the link flaps in the logs:

Aug 29 20:42:56 clweb5 kernel: ext0: link state changed to DOWN
Aug 29 20:43:01 clweb5 kernel: ext0: Link is up, 1 Gbps Full Duplex, Requested FEC: None, Negotiated FEC: None, Autoneg: True, Flow Control: None
Aug 29 20:43:01 clweb5 kernel: ext0: link state changed to UP
Aug 29 20:45:53 clweb5 kernel: ext0: link state changed to DOWN
Aug 29 20:45:57 clweb5 kernel: ext0: Link is up, 1 Gbps Full Duplex, Requested FEC: None, Negotiated FEC: None, Autoneg: True, Flow Control: None
Aug 29 20:45:57 clweb5 kernel: ext0: link state changed to UP
Aug 29 20:48:10 clweb5 kernel: ext0: link state changed to DOWN
Aug 29 20:48:15 clweb5 kernel: ext0: Link is up, 1 Gbps Full Duplex, Requested FEC: None, Negotiated FEC: None, Autoneg: True, Flow Control: None
Aug 29 20:48:15 clweb5 kernel: ext0: link state changed to UP

It seems to take about 5 seconds to recover, which is kind of rough on a box that will be hosting multiple jails. I understand there were workarounds posted, but I'm curious about the fix mentioned here, and under what conditions this should not happen.

NICs are ixl(4). The OS is:

13.2-RELEASE-p2 FreeBSD 13.2-RELEASE-p2 GENERIC amd64

I did dig through the manpage for if_bridge(4), and I'm sure I saw the note about matching capabilities, but it didn't really jump out as a cause. Maybe a note that specifically calls out the most common use case (bridging with epair(4) for jails, bhyve, or other virtualization methods) would be a good idea? Or even something in epair(4)'s manpage?
(In reply to spork from comment #31)

Sorry, I forgot to show my diffs for the interface options between bridged and not-bridged:

[root@clweb5 /home/spork]# diff /tmp/options-ixl-nobridge /tmp/options-ixl-bridge
1c1
< options=4e503bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6,NOMAP>
---
> options=4a500b9<RXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,NOMAP>
3c3
< options=4e503bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6,NOMAP>
---
> options=4a500b9<RXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,NOMAP>
Some additional testing here... There are two workarounds presented in this thread:

- Add "-txcsum -tso4 -tso6 -txcsum6" (or whatever your NIC requires) to the ifconfig statement for your interface(s) in rc.conf. This requires knowing what you need to disable so that your NIC and epair have equal capabilities; then, when the epair interface is added to the bridge, there's no need to reinit the NIC to make the capabilities match, and therefore no connectivity loss.

- Pre-plumb the bridge and epair interfaces by adding them to rc.conf's cloned_interfaces and adding the epair to the "addm" ifconfig line. On boot, the "addm" runs, and we don't care about the reinit of the NIC because it happens during boot. This method does not require knowing what capabilities need to be disabled on the NIC.

I'm finding that neither of these actually works as a workaround, because in 13.2 with my ixl NICs both iocage (a jail shutdown or restart) and manual ifconfig commands (removing a vtnet interface from a bridge) cause the NIC to reinit. In other words, removing an epair/vtnet interface from a bridge seems to put the offloading capabilities back in place, rendering either workaround useless.

Again, I'm not clear on what the fix was that was mentioned in comment #28, so if I'm way off base here, let me know! Example follows...
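As an aside, the "know what you need to disable" step of the first workaround can be scripted instead of hand-read. A bash sketch (the `opts` helper and the sample option strings are illustrative; the strings are shortened versions of the ixl/epair output elsewhere in this thread):

```shell
#!/usr/bin/env bash
# opts: pull the capability names out of an ifconfig "options=" line,
# one name per line, so two interfaces' option sets can be diffed.
opts() { sed -n 's/.*options=[0-9a-f]*<\([^>]*\)>.*/\1/p' | tr ',' '\n'; }

# Sample lines in the same shape ifconfig prints (abbreviated).
nic='	options=4e503bb<RXCSUM,TXCSUM,TSO4,TSO6,TXCSUM_IPV6>'
epair='	options=8<VLAN_MTU>'

# Capabilities the NIC enables that the epair lacks -- candidates to
# disable up front so bridge membership changes never force a reinit.
extra=$(comm -23 <(echo "$nic" | opts | sort) <(echo "$epair" | opts | sort))
echo "$extra"
```

In real use the two sample strings would come from `ifconfig ext0 | grep options=` and `ifconfig epair0a | grep options=` respectively.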
We have a bridge containing my external ixl interface and an epair/vtnet interface from a jail:

[root@clweb5 /home/spork]# ifconfig bridge0
bridge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	ether 58:9c:fc:10:ff:d9
	id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
	maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
	root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
	member: vnet0.10 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
	        ifmaxaddr 0 port 7 priority 128 path cost 2000
	member: ext0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
	        ifmaxaddr 0 port 1 priority 128 path cost 55
	groups: bridge
	nd6 options=9<PERFORMNUD,IFDISABLED>

The ext0 (ixl) interface was already a member of the bridge when the jail started, so there was NO NIC reinit/loss of connectivity when the jail started (good!). The ext0 options look like this while it's a member of bridge0 (i.e. txcsum and tso, for both v4 and v6, are disabled):

ext0: flags=8963<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=4a500b9<RXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,NOMAP>

Now I manually pull vnet0.10 from the above bridge:

[root@clweb5 /home/spork]# ifconfig bridge0 deletem vnet0.10

And we see connectivity drop for 5 seconds:

Aug 31 15:32:57 clweb5 kernel: vnet0.10: promiscuous mode disabled
Aug 31 15:32:57 clweb5 kernel: ext0: link state changed to DOWN
Aug 31 15:33:02 clweb5 kernel: ext0: Link is up, 1 Gbps Full Duplex, Requested FEC: None, Negotiated FEC: None, Autoneg: True, Flow Control: None
Aug 31 15:33:02 clweb5 kernel: ext0: link state changed to UP

And we see why: removing the vtnet bridge member causes something(?) to put all the flags I'd removed from ext0 back in place (txcsum, txcsum6, tso4, tso6):

[root@clweb5 /home/spork]# ifconfig ext0
ext0: flags=8963<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=4e503bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6,NOMAP>

Again, this is me manually removing the interface from the bridge, not iocage. Standard jails and iocage jails both call a "destroy" on the vtnet/epair interface, so this isn't just an iocage issue.

Sorry this is so long... anyhow, the questions again:

- Did the prior workarounds "work" and then stop working later?
- Did the behavior of bringing explicitly-removed flags back to an interface when members are removed from a bridge change at some point?
- What was the fix in comment #28?
The answer to "when did interface capabilities get restored when a member is removed" is "back in 2008". This commit altered how interface flags were dealt with:

https://cgit.freebsd.org/src/commit/sys/net/if_bridge.c?id=ec29c623005ca6a32d44fb59bc2a759a96dc75e4

You can see a variable "bif_savedcaps" was added so that the bridge now tracks what the original interface flags were. Then, when a member is removed, it looks like all of a bridge's interfaces are looped through and the original flags are restored (in bridge_delete_member()):

+	/* reneable any interface capabilities */
+	bridge_set_ifcap(sc, bif, bif->bif_savedcaps);

Not sure where, but this feels like it could be a tunable, like "net.link.bridge.restore_caps" or similar, given that a) jails will trigger this with lots of NICs, b) these days 5 seconds of downtime is actually not a minor issue in many environments, c) it need not change any defaults, but rc.d/jail and 3rd-party jail scripts could opt to set it, and d) jails are kind of a big reason people come to FreeBSD.

I'm not much of a coder, but I think I could get that sysctl about 80% of the way there after looking at the other "net.link.bridge" tunables... any takers on helping? Any thoughts on whether this makes sense?
OK, really done for now... :) I'm trying this out for a bit.

[root@clweb5 /usr/src/sys/net]# diff -u if_bridge.c.dist if_bridge.c.caps
--- if_bridge.c.dist	2023-08-31 22:47:16.758453000 -0400
+++ if_bridge.c.caps	2023-09-01 19:05:41.724323000 -0400
@@ -452,6 +452,13 @@
     CTLFLAG_RWTUN | CTLFLAG_VNET, &VNET_NAME(log_stp), 0,
     "Log STP state changes");
 
+/* restore member if capabilites */
+VNET_DEFINE_STATIC(int, restore_caps) = 1;
+#define	V_restore_caps	VNET(restore_caps)
+SYSCTL_INT(_net_link_bridge, OID_AUTO, restore_caps,
+    CTLFLAG_RWTUN | CTLFLAG_VNET, &VNET_NAME(restore_caps), 0,
+    "Restore member interface flags on reinit");
+
 /* share MAC with first bridge member */
 VNET_DEFINE_STATIC(int, bridge_inherit_mac);
 #define	V_bridge_inherit_mac	VNET(bridge_inherit_mac)
@@ -1151,7 +1158,8 @@
 #endif
 			break;
 		}
-		/* reneable any interface capabilities */
+		/* reneable any interface capabilities if restore_caps is set */
+		if (V_restore_caps)
 		bridge_set_ifcap(sc, bif, bif->bif_savedcaps);
 	}
 	bstp_destroy(&bif->bif_stp);	/* prepare to free */
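For completeness, with a patch like the above applied, the tunable would presumably be used like this. Note that net.link.bridge.restore_caps exists only in this proposed patch, not in stock FreeBSD:

```
# Hypothetical sysctl; only present with the patch above applied.
# 0 = don't restore member capabilities when a bridge member is removed.
sysctl net.link.bridge.restore_caps=0
```

Being declared with CTLFLAG_RWTUN, it could also be set from /boot/loader.conf so it takes effect before any bridge members are removed.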