Bug 145728 - [lagg] Stops working lagg between two servers.
Summary: [lagg] Stops working lagg between two servers.
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 7.2-RELEASE
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-04-15 18:00 UTC by Slad
Modified: 2018-01-03 05:13 UTC

See Also:


Description Slad 2010-04-15 18:00:12 UTC
There are two servers, each with four network cards. Two of the cards in each server are aggregated into a lagg interface.

After a few days the lagg breaks down:
Server 1
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=19b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4>
        ether 00:1b:21:3b:4d:4d
        inet 1.1.1.1 netmask 0xffffffc0 broadcast 1.1.1.255
        media: Ethernet autoselect
        status: active
        laggproto lacp
        laggport: em3 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
        laggport: em2 flags=4<ACTIVE>

ifconfig em2
em2: flags=9c43<UP,BROADCAST,RUNNING,OACTIVE,SIMPLEX,LINK0,MULTICAST> metric 0 mtu 1500
        options=19b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4>
        ether 00:1b:21:3b:4d:4d
        media: Ethernet autoselect (1000baseTX <full-duplex>)
        status: active
        lagg: laggdev lagg0


#less /var/run/dmesg.boot | grep em2
em2: <Intel(R) PRO/1000 Network Connection 6.9.6.Yandex[$Revision: 1.36.2.17 $]> port 0x3000-0x301f mem 0xd3180000-0xd319ffff,0xd3100000-0xd317ffff,0xd31a0000-0xd31a3fff irq 16 at device 0.0 on pci2
em2: Using MSIX interrupts
em2: Using TXD_LOW instead of TXDW
em2: [FILTER]
em2: [FILTER]
em2: [FILTER]
em2: Ethernet address: 00:1b:21:3b:4d:4d


em2@pci0:2:0:0: class=0x020000 card=0xa01f8086 chip=0x10d38086 rev=0x00 hdr=0x00
    vendor     = 'Intel Corporation'
    class      = network
    subclass   = ethernet

em3@pci0:4:0:0: class=0x020000 card=0xa01f8086 chip=0x10d38086 rev=0x00 hdr=0x00
    vendor     = 'Intel Corporation'
    class      = network
    subclass   = ethernet


Server 2
lagg1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=19b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4>
        ether 00:1b:21:1b:19:5d
        media: Ethernet autoselect
        status: active
        laggproto lacp
        laggport: em4 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
        laggport: em1 flags=18<COLLECTING,DISTRIBUTING>
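
The laggport flags are the quickest way to see which ports have dropped out of the aggregation: on server 1, em2 shows only ACTIVE, and on server 2, em1 shows COLLECTING and DISTRIBUTING without ACTIVE. Below is a minimal Python sketch for decoding these values. It assumes the bit values implied by the output in this report (0x04 ACTIVE, 0x08 COLLECTING, 0x10 DISTRIBUTING; the split between the last two is assumed from the order in which ifconfig prints the names). The helper names are hypothetical and are not part of any FreeBSD tool.

```python
# Bit values inferred from the ifconfig output in this report:
#   flags=4<ACTIVE>
#   flags=18<COLLECTING,DISTRIBUTING>
#   flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
# The 0x08/0x10 assignment is an assumption based on print order.
LAGG_PORT_FLAGS = {
    0x04: "ACTIVE",
    0x08: "COLLECTING",
    0x10: "DISTRIBUTING",
}


def decode_laggport_flags(value):
    """Return the names of the flag bits set in value, lowest bit first."""
    return [name for bit, name in sorted(LAGG_PORT_FLAGS.items()) if value & bit]


def port_is_healthy(value):
    """Hypothetical check: healthy only when all three bits are set."""
    return all(value & bit for bit in LAGG_PORT_FLAGS)


if __name__ == "__main__":
    # Values copied from the two lagg interfaces shown in this report.
    ports = [
        ("server 1", "em3", 0x1C),
        ("server 1", "em2", 0x04),
        ("server 2", "em4", 0x1C),
        ("server 2", "em1", 0x18),
    ]
    for host, port, flags in ports:
        names = ",".join(decode_laggport_flags(flags))
        state = "ok" if port_is_healthy(flags) else "degraded"
        print(f"{host} {port}: flags={flags:x}<{names}> {state}")
```

With the values from this report, the sketch marks em2 on server 1 and em1 on server 2 as degraded, which matches one broken link between the two hosts.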

em1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=19b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4>
        ether 00:1b:21:1b:19:5d
        media: Ethernet autoselect (1000baseTX <full-duplex>)
        status: active
        lagg: laggdev lagg1

# less /var/run/dmesg.boot |grep em1
em1: <Intel(R) PRO/1000 Network Connection 6.9.6.Yandex[$Revision: 1.36.2.17 $]> port 0x4000-0x401f mem 0xd0320000-0xd033ffff,0xd0300000-0xd031ffff irq 16 at device 0.0 on pci3
em1: Using MSI interrupt
em1: Using TXD_LOW instead of TXDW
em1: [FILTER]
em1: Ethernet address: 00:1b:21:1b:19:5d


em1@pci0:3:0:0: class=0x020000 card=0x10838086 chip=0x10b98086 rev=0x06 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82572EI PRO/1000 PT Desktop Adapter (Copper)'
    class      = network
    subclass   = ethernet
em4@pci0:5:0:0: class=0x020000 card=0xa01f8086 chip=0x10d38086 rev=0x00 hdr=0x00
    vendor     = 'Intel Corporation'
    class      = network
    subclass   = ethernet


Error log:
Apr 16 00:27:31 2 kernel: em4: link state changed to UP
Apr 16 00:27:34 2 kernel: em4: watchdog timeout -- resetting
Apr 16 00:27:34 2 kernel: em4: Excessive collisions = 0
Apr 16 00:27:34 2 kernel: em4: Sequence errors = 0
Apr 16 00:27:34 2 kernel: em4: Defer count = 0
Apr 16 00:27:34 2 kernel: em4: Missed Packets = 1217754
Apr 16 00:27:34 2 kernel: em4: Receive No Buffers = 0
Apr 16 00:27:34 2 kernel: em4: Receive Length Errors = 0
Apr 16 00:27:34 2 kernel: em4: Receive errors = 0
Apr 16 00:27:34 2 kernel: em4: Crc errors = 0
Apr 16 00:27:34 2 kernel: em4: Alignment errors = 0
Apr 16 00:27:34 2 kernel: em4: Collision/Carrier extension errors = 0
Apr 16 00:27:34 2 kernel: em4: RX overruns = 0
Apr 16 00:27:34 2 kernel: em4: watchdog timeouts = 143
Apr 16 00:27:34 2 kernel: em4: RX MSIX IRQ = 1654280804 TX MSIX IRQ = 1491971579 LINK MSIX IRQ = 1214367
Apr 16 00:27:34 2 kernel: em4: XON Rcvd = 203508246
Apr 16 00:27:34 2 kernel: em4: XON Xmtd = 3183073363
Apr 16 00:27:34 2 kernel: em4: XOFF Rcvd = 202792650
Apr 16 00:27:34 2 kernel: em4: XOFF Xmtd = 3170508497
Apr 16 00:27:34 2 kernel: em4: Good Packets Rcvd = 108209172443
Apr 16 00:27:34 2 kernel: em4: Good Packets Xmtd = 113645818564
Apr 16 00:27:34 2 kernel: em4: TSO Contexts Xmtd = 0
Apr 16 00:27:34 2 kernel: em4: TSO Contexts Failed = 0
Apr 16 00:27:34 2 kernel: em4: Adapter hardware address = 0xc52a0218
Apr 16 00:27:34 2 kernel: em4: CTRL = 0x58100248 RCTL = 0x801a
Apr 16 00:27:34 2 kernel: em4: Packet buffer = Tx=20k Rx=20k
Apr 16 00:27:34 2 kernel: em4: Flow control watermarks high = 18432 low = 16932
Apr 16 00:27:34 2 kernel: em4: tx_int_delay = 0, tx_abs_int_delay = 64
Apr 16 00:27:34 2 kernel: em4: rx_int_delay = 0, rx_abs_int_delay = 66
Apr 16 00:27:34 2 kernel: em4: fifo workaround = 0, fifo_reset_count = 0
Apr 16 00:27:34 2 kernel: em4: hw tdh = 0, hw tdt = 1
Apr 16 00:27:34 2 kernel: em4: hw rdh = 0, hw rdt = 4095, next_rx_desc_to_check = 0
Apr 16 00:27:34 2 kernel: em4: Num Tx descriptors avail = 4095
Apr 16 00:27:34 2 kernel: em4: Tx Descriptors not avail1 = 12063
Apr 16 00:27:34 2 kernel: em4: Tx Descriptors not avail2 = 0
Apr 16 00:27:34 2 kernel: em4: Std mbuf failed = 0
Apr 16 00:27:34 2 kernel: em4: Std mbuf cluster failed = 6
Apr 16 00:27:34 2 kernel: em4: Driver dropped packets = 0
Apr 16 00:27:34 2 kernel: em4: Driver tx dma failure in encap = 0
Apr 16 00:27:34 2 kernel: em4: Packets pended due to reorder = 0
Apr 16 00:27:34 2 kernel: em4: RX interrupts has been masked = 77251713
Apr 16 00:27:34 2 kernel: em4: TX interrupts has been generated = 0
Apr 16 00:27:34 2 kernel: em4: link state changed to DOWN
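
To compare such watchdog dumps over time, the single-valued counters can be pulled out of the log text. The following is a rough Python sketch, assuming the line layout shown in the log above. The function name and regular expression are illustrative only, and lines carrying several values (for example the MSIX IRQ or watermark lines) are deliberately skipped.

```python
import re

# Matches lines such as:
#   Apr 16 00:27:34 2 kernel: em4: Missed Packets = 1217754
# The counter name may not contain "=", so multi-value lines do not match.
COUNTER_RE = re.compile(r"kernel: (\w+): ([^=]+?) = (\d+)\s*$")


def parse_counters(log_text):
    """Collect 'name = integer' counters per interface from a log excerpt."""
    counters = {}
    for line in log_text.splitlines():
        match = COUNTER_RE.search(line)
        if match:
            ifname, name, value = match.groups()
            counters.setdefault(ifname, {})[name] = int(value)
    return counters


if __name__ == "__main__":
    # Lines copied from the error log in this report.
    sample = (
        "Apr 16 00:27:34 2 kernel: em4: Missed Packets = 1217754\n"
        "Apr 16 00:27:34 2 kernel: em4: watchdog timeouts = 143\n"
        "Apr 16 00:27:34 2 kernel: em4: Std mbuf cluster failed = 6\n"
        "Apr 16 00:27:34 2 kernel: em4: hw tdh = 0, hw tdt = 1\n"
    )
    for name, value in parse_counters(sample)["em4"].items():
        print(f"{name}: {value}")
```

Running the same parser on two dumps and subtracting the values would show which counters (such as Missed Packets or watchdog timeouts) grow between resets.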


tcpdump -i em4
00:47:06.511867 LACPv1, length: 110
00:47:36.997247 LACPv1, length: 110



After a reboot everything works normally again for some time.

Fix:

None so far; only a reboot helps :(
How-To-Repeat: Connect two servers directly to each other through lagg.
Comment 1 Mark Linimon freebsd_committer freebsd_triage 2010-04-19 05:57:52 UTC
Responsible Changed
From-To: freebsd-i386->freebsd-net

Over to maintainer(s).
Comment 2 slava 2010-04-29 05:41:36 UTC
Three days ago I updated one of the servers to 8.0-STABLE (*default
date=2010.04.05.00.00.00), and the situation is now somewhat different. The
watchdog timeouts are gone, but the lagg port is in the following state:


lagg1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=9b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
        ether 00:1b:21:1b:19:5d
        media: Ethernet autoselect
        status: active
        laggproto lacp
        laggport: em4 flags=18<COLLECTING,DISTRIBUTING>
        laggport: em1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>

em4: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=9b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
        ether 00:1b:21:1b:19:5d
        media: Ethernet 1000baseT (1000baseT <full-duplex>)
        status: active
I tried running
ifconfig lagg1 -laggport em4
and then
ifconfig lagg1 laggport em4
but it did not help.
Comment 3 Eitan Adler freebsd_committer freebsd_triage 2017-12-31 08:00:27 UTC
For bugs matching the following criteria:

Status: In Progress Changed: (is less than) 2014-06-01

Reset to default assignee and clear in-progress tags.

Mail being skipped