I have a Dell PowerEdge m610 system with 6 bce NICs. BladeCenter is equipped with M6348 switches in fabric A and pass-through modules in fabrics B and C. bce[01] are connected to switch stack via backplane and form a lagg with LACP, bce[23] are connected via pass-through module to copper ports in the same switch stack and create another lagg with LACP. bce[45] are connected elsewhere without LAG. As far as I understand an interface can change its state in lagg when it goes down or up, thanks to if_link_state_change and do_link_state_change in sys/net/if.c. if_link_state_change is equipped with a check for "changing" into the same state. I've added following printfs to see what exactly happens: --- a/sys/dev/bce/if_bce.c +++ b/sys/dev/bce/if_bce.c @@ -6460,11 +6460,23 @@ bce_phy_intr(struct bce_softc *sc) if (new_link_state) { if (bootverbose) if_printf(sc->bce_ifp, "link UP\n"); + printf("%s: state change to UP; from 0x%x to 0x%x, calling if_link_state_change(0x%x, 0x%x)\n", + sc->bce_ifp->if_xname, + old_link_state, + new_link_state, + sc->bce_ifp->if_link_state, + LINK_STATE_UP); if_link_state_change(sc->bce_ifp, LINK_STATE_UP); } else { if (bootverbose) if_printf(sc->bce_ifp, "link DOWN\n"); + printf("%s: state change to DOWN; from 0x%x to 0x%x, calling if_link_state_change(0x%x, 0x%x)\n", + sc->bce_ifp->if_xname, + old_link_state, + new_link_state, + sc->bce_ifp->if_link_state, + LINK_STATE_DOWN); if_link_state_change(sc->bce_ifp, LINK_STATE_DOWN); } --- a/sys/net/if.c +++ b/sys/net/if.c @@ -1900,9 +1900,12 @@ void *(*vlan_cookie_p)(struct ifnet *); void if_link_state_change(struct ifnet *ifp, int link_state) { + log(LOG_NOTICE, "%s: link state changed to %s, scheduling do_link_state_change\n", ifp->if_xname, (link_state == LINK_STATE_UP) ? "UP" : "DOWN" ); /* Return if state hasn't changed. */ - if (ifp->if_link_state == link_state) + if (ifp->if_link_state == link_state) { + log(LOG_NOTICE, "%s: not really changed, skipping\n", ifp->if_xname); return; + } ifp->if_link_state = link_state; @@ -1921,6 +1924,8 @@ do_link_state_change(void *arg, int pending) if (ifp->if_vlantrunk != NULL) (*vlan_link_state_p)(ifp); + log(LOG_NOTICE, "%s: link state changed to %s, calling hooks\n", ifp->if_xname, (link_state == LINK_STATE_UP) ? "UP" : "DOWN" ); + if ((ifp->if_type == IFT_ETHER || ifp->if_type == IFT_L2VLAN) && IFP2AC(ifp)->ac_netgraph != NULL) (*ng_ether_link_state_p)(ifp, link_state); @@ -1928,8 +1933,10 @@ do_link_state_change(void *arg, int pending) (*carp_linkstate_p)(ifp); if (ifp->if_bridge) (*bridge_linkstate_p)(ifp); - if (ifp->if_lagg) + if (ifp->if_lagg) { + log(LOG_NOTICE, "%s: hook link_state\n", ifp->if_xname); (*lagg_linkstate_p)(ifp, link_state); + } if (IS_DEFAULT_VNET(curvnet)) devctl_notify("IFNET", ifp->if_xname, As howtos and manuals suggest, my rc.conf contains ifconfig_bce[01234]="up", this causes the following thing to be logged: Jul 10 17:18:55 aw19lb2 kernel: bce0: state change to UP; from 0x0 to 0x1, calling if_link_state_change(0x0, 0x2) Jul 10 17:18:55 aw19lb2 kernel: bce1: state change to UP; from 0x0 to 0x1, calling if_link_state_change(0x0, 0x2) Jul 10 17:18:55 aw19lb2 kernel: bce2: state change to UP; from 0x0 to 0x1, calling if_link_state_change(0x0, 0x2) Jul 10 17:18:55 aw19lb2 kernel: bce3: state change to UP; from 0x0 to 0x1, calling if_link_state_change(0x0, 0x2) Later on laggs get their ports added: ifconfig_lagg1_name="internal" ifconfig_internal="laggproto lacp laggport bce2 laggport bce3" This causes following lacp debug messages: Jul 10 17:18:55 aw19lb2 kernel: bce2: media changed 0x0 -> 0x22, ether = 1, fdx = 0, link = 1 Jul 10 17:18:55 aw19lb2 kernel: bce2: partner timeout changed Jul 10 17:18:55 aw19lb2 kernel: bce2: LACP_STATE_AGGREGATION==0, key is 8003 Jul 10 17:18:55 aw19lb2 kernel: bce2: -> UNSELECTED Jul 10 17:18:55 aw19lb2 kernel: bce3: media changed 0x0 -> 0x22, ether = 1, fdx = 0, link = 1 Jul 10 17:18:55 aw19lb2 kernel: bce3: partner timeout changed Jul 10 17:18:55 aw19lb2 kernel: bce3: LACP_STATE_AGGREGATION==0, key is 8004 Jul 10 17:18:55 aw19lb2 kernel: bce3: -> UNSELECTED Ports are added, but they are missing Full Duplex for some reason, so they don't participate in lagg yet. This gets fixed later on, but not for bce3: Jul 10 17:18:57 aw19lb2 kernel: bce3: state change to UP; from 0x0 to 0x1, calling if_link_state_change(0x2, 0x2) Jul 10 17:18:57 aw19lb2 kernel: bce3: link state changed to UP, scheduling do_link_state_change Jul 10 17:18:57 aw19lb2 kernel: bce3: not really changed, skipping bce3 performs change from down to up, but there was never change to down, so if_link_state_change ignores the change and lagg stuff is never called. For bce2 this looks different, there is a real change to down, and then up: Jul 10 17:18:59 aw19lb2 kernel: bce2: state change to DOWN; from 0x1 to 0x0, calling if_link_state_change(0x2, 0x1) Jul 10 17:18:59 aw19lb2 kernel: bce2: link state changed to DOWN, scheduling do_link_state_change Jul 10 17:18:59 aw19lb2 kernel: bce2: link state changed to DOWN, calling hooks Jul 10 17:18:59 aw19lb2 kernel: bce2: hook link_state Jul 10 17:18:59 aw19lb2 kernel: bce2: media changed 0x22 -> 0x100630, ether = 1, fdx = 1, link = 0 Jul 10 17:18:59 aw19lb2 kernel: bce2: LACP_STATE_AGGREGATION==0, key is 8003 Jul 10 17:18:59 aw19lb2 kernel: bce2: link state changed to DOWN # end of do_link_state_change Jul 10 17:18:59 aw19lb2 kernel: bce2: state change to UP; from 0x0 to 0x1, calling if_link_state_change(0x1, 0x2) Jul 10 17:18:59 aw19lb2 kernel: bce2: link state changed to UP, scheduling do_link_state_change Jul 10 17:18:59 aw19lb2 kernel: bce2: link state changed to UP, calling hooks Jul 10 17:18:59 aw19lb2 kernel: bce2: hook link_state Jul 10 17:18:59 aw19lb2 kernel: bce2: media changed 0x100630 -> 0x100630, ether = 1, fdx = 1, link = 1 Jul 10 17:18:59 aw19lb2 kernel: bce2: key is 20b Jul 10 17:18:59 aw19lb2 kernel: bce2: -> UNSELECTED Jul 10 17:18:59 aw19lb2 kernel: bce2: link state changed to UP Jul 10 17:18:59 aw19lb2 kernel: bce2: lacpdu receive Jul 10 17:18:59 aw19lb2 kernel: bce2: lacp_sm_rx_update_ntt: assert ntt Jul 10 17:18:59 aw19lb2 kernel: bce2: old pstate 38<SYNC,COLLECTING,DISTRIBUTING> Jul 10 17:18:59 aw19lb2 kernel: bce2: new pstate c5<ACTIVITY,AGGREGATION,DEFAULTED,EXPIRED> Jul 10 17:18:59 aw19lb2 kernel: bce2: lacpdu transmit In the meantime there are multiple messages "bce[23]: lacpdu receive", so both links are really up and are receiving packets from switch. My system is patched with patch from http://lists.freebsd.org/pipermail/freebsd-net/2013-February/034649.html, otherwise there are even more problems, as bce[01] are show to be connected to different media, even though both of them are connected directly to switch via Blade Centre's backplane. Generally so strict checking for media type forbids creating a 4-port lagg for such network configuration as I have, where some ports would use direct backplane connection and some extra copper cables, but this is yet another topic... [root@aw19lb2 ~]# ifconfig -m bce0 bce0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500 options=800bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,LINKSTATE> capabilities=800bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,LINKSTATE> ether c8:1f:66:9d:1b:88 media: Ethernet autoselect (1000baseSX <full-duplex,rxpause,txpause>) status: active supported media: media autoselect media 1000baseSX mediaopt full-duplex media 1000baseSX [root@aw19lb2 ~]# [root@aw19lb2 ~]# ifconfig -m bce1 bce1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500 options=800bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,LINKSTATE> capabilities=800bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,LINKSTATE> ether c8:1f:66:9d:1b:88 media: Ethernet autoselect (1000baseT <full-duplex,rxpause,txpause>) status: active supported media: media autoselect media 1000baseSX mediaopt full-duplex media 1000baseSX
Created attachment 144650 [details] Check for proper phy messages, never check phy status in bce_tick()
Comment on attachment 144650 [details] Check for proper phy messages, never check phy status in bce_tick() I've checked what Linux bce driver does and what the NIC really sends. This patch changes the following things: - Remove handling of PHY state from bce_tick() as this function might catch some not fully "stable" state changes (like IFM_ETHER|IFM_NONE) and set interface up before link reaches full link state (IFM_ETHER|IFM_FDX|IFM_1000_T|rxpause|txpause) making the link unusable for lagg. - sc->status_block->status_attn_bits(_ack) are not really link status. Those bits can change from down to up and then from down to up again without going from up to down. Call if_link_state_change() only on changes of sc->bce_link_up which is set only via bce_ifmedia_sts_rphy(). - Distinguish between STATUS_ATTN_BITS_LINK_STATE and STATUS_ATTN_BITS_TIMER_ABORT. Timer abort messages are in fact sent even if only link state is subscribed for. The old code would then put the link up with IFM_ETHER|IFM_NONE state making the link unusable in lagg. - Sent bce_pulse when on STATUS_ATTN_BITS_TIMER_ABORT with event_code == BCE_FW_EVT_CODE_SW_TIMER_EXPIRE_EVENT and also on normal remote PHY events. Without this a message "Warning: bootcode thinks driver is absent." was seen on system boot.
Over to maintainers.
Keyword: patch or patch-ready – in lieu of summary line prefix: [patch] * bulk change for the keyword * summary lines may be edited manually (not in bulk). Keyword descriptions and search interface: <https://bugs.freebsd.org/bugzilla/describekeywords.cgi>