Bug 246660 - Sporadic LACP Lagg Flap
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 12.1-STABLE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: freebsd-net mailing list
Reported: 2020-05-22 14:11 UTC by nonesuch
Modified: 2020-05-23 05:14 UTC
Description nonesuch 2020-05-22 14:11:53 UTC
On FreeBSD 12-STABLE r354698 amd64, with LACP laggs on top of Solarflare sfxge NICs (7k and 8k series cards) attached to Arista MLAGs (7050s or 7150s running EOS 4.0.19 and newer), the FreeBSD side sporadically stops setting the LACP Distributing flag, which causes the switch to detect a link flap.
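
For anyone reading the captures: the flag in question is one bit of the actor state byte carried in each LACPDU. The snippet below is only a sketch for decoding that byte by hand, assuming the IEEE 802.1AX bit layout (Distributing is 0x20). The function name lacp_state_flags is made up for illustration and is not part of any FreeBSD tool.

```sh
#!/bin/sh
# Decode an LACP actor/partner state byte into flag names.
# Assumption: IEEE 802.1AX bit layout. Hypothetical helper, not a FreeBSD utility.
# Argument: the state byte, decimal or 0x-prefixed hex (e.g. 0x3d).
lacp_state_flags() {
    state=$(( $1 ))
    out=""
    for pair in 1:ACTIVITY 2:TIMEOUT 4:AGGREGATION 8:SYNC \
                16:COLLECTING 32:DISTRIBUTING 64:DEFAULTED 128:EXPIRED; do
        bit=${pair%%:*}     # numeric bit value before the colon
        name=${pair#*:}     # flag name after the colon
        if [ $(( state & bit )) -ne 0 ]; then
            out="${out:+$out,}$name"
        fi
    done
    echo "$out"
}
```

For example, `lacp_state_flags 0x1d` prints ACTIVITY,AGGREGATION,SYNC,COLLECTING, which is what a port that is still collecting but no longer distributing would look like under this layout.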


The setup is as follows: Dell R630s and R640s and Supermicro X9DRT Ivy Bridge boards, each set up with one Solarflare SFN7122F or SFN8522 running Solarflare firmware 7.1 and 7.4.4. Each server is set up as a router, with pf used to NAT and filter traffic. There is one LACP lagg made up of the two ports, which go to two upstream Arista 7050 or 7150 switches in an MLAG setup. The lagg carries anywhere from 5 to 50 VLANs at a time.

Now the complicated part. This issue happens at what appear to be random times: on routers we have set up and left to "burn in" over a weekend with little or no traffic, and on some routers that are performing moderate amounts of work.
This issue did not happen on 10.3-STABLE amd64.


Sysctls
=============================

#security.bsd.see_other_uids=0
net.inet.tcp.mssdflt=1460
net.inet.tcp.minmss=536
net.inet.tcp.rfc6675_pipe=1
net.inet.tcp.syncache.rexmtlimit=0  # (default 3)
net.inet.tcp.per_cpu_timers=1
net.inet.ip.fastforwarding=1
#kern.random.harvest.mask=65887
kern.random.harvest.mask=65537
kern.random.sys.harvest.ethernet=0
kern.random.sys.harvest.point_to_point=0
kern.random.sys.harvest.interrupt=0
hw.intr_storm_threshold=10000
kern.ipc.maxsockbuf=16777216
# socket buffers
net.inet.tcp.recvspace=4194304
net.inet.tcp.sendspace=2097152
net.inet.tcp.sendbuf_max=16777216
net.inet.tcp.recvbuf_max=16777216
net.inet.tcp.sendbuf_auto=1
net.inet.tcp.recvbuf_auto=1
net.inet.tcp.sendbuf_inc=16384
net.inet.tcp.recvbuf_inc=524288
net.inet.tcp.cc.algorithm=htcp
net.inet.ip.intr_queue_maxlen=2048
net.route.netisr_maxqlen=2048
# Do not send IP redirects (enable fastforwarding path)
net.inet.ip.redirect=0
net.inet6.ip6.redirect=0

===loader.conf===
ipmi_load="YES"
boot_multicons="YES"
boot_serial="YES"
console="comconsole,vidconsole"
net.inet.tcp.tso="0"
autoboot_delay="5"
hw.mfi.mrsas_enable="1"
hw.usb.no_pf="1"        # Disable USB packet filtering
hw.usb.no_shutdown_wait="1"
hw.vga.textmode="1"     # Text mode
machdep.hyperthreading_allowed="0"
geom_mirror_load="YES"
kern.ipc.nmbclusters="1000000"
net.isr.maxqlimit="1000000"
kern.ipc.nmbjumbop=524288
net.isr.bindthreads="0"
net.isr.maxthreads="-1"
net.link.ifqmaxlen="2048"
net.pf.source_nodes_hashsize="1048576"
net.isr.defaultqlimit="2048"
net.inet.tcp.syncache.hashsize="1024"
net.inet.tcp.syncache.bucketlimit="100"
net.inet.tcp.tcbhashsize="65536"
vm.pmap.pti=0
hw.ibrs_disable=1


===LAGG Config===

ifconfig lagg0 laggproto lacp lagghash l2,l3 laggport sfxge0 laggport sfxge1
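
To catch the moment a port drops out of distribution on a lagg configured like the one above, something along these lines could be run on the router. It is a rough sketch, not what was used to produce this report. It assumes that `ifconfig lagg0` prints one "laggport: <name> flags=...<...>" line per member port, with DISTRIBUTING listed while the port is distributing. The function name ports_not_distributing and the lagg-watch log tag are made up.

```sh
#!/bin/sh
# Read `ifconfig laggN` output on stdin and print the name of every
# laggport whose flag list does not include DISTRIBUTING.
# Assumption: member ports appear as "laggport: <name> flags=...<...>".
ports_not_distributing() {
    awk '$1 == "laggport:" && $0 !~ /DISTRIBUTING/ { print $2 }'
}

# Possible polling loop (left commented out; adjust interface and interval):
# while :; do
#     bad=$(ifconfig lagg0 | ports_not_distributing)
#     [ -n "$bad" ] && logger -t lagg-watch "not distributing: $bad"
#     sleep 1
# done
```

Polling once a second may still miss very short flaps, so the timestamps from this are best cross-checked against the switch logs.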


===sfxge tunings===
kenv hw.sfxge.${NIC0_ID}.max_rss_channels=7
kenv hw.sfxge.${NIC1_ID}.max_rss_channels=7
kenv hw.sfxge.tx_ring=2048
kenv hw.sfxge.rx_ring=4096
kenv hw.sfxge.tx_dpl_get_non_tcp_max=4096
kenv hw.sfxge.tx_dpl_put_max=2048
# This turns off AIM
sysctl dev.sfxge.${NIC0_ID}.int_mod=0
sysctl dev.sfxge.${NIC1_ID}.int_mod=0


PCAPs available on request.