I'm able to easily make if_bridge spew thousands of output errors with a really simple config and iperf3. This results in random "no buffer space" errors on the host and disruptions to services. It is only reproducible when using a bridge; if no bridge is involved there are no errors or disruptions. OS is 14.1-RELEASE-p2 on amd64.

/etc/rc.conf:

```
cloned_interfaces="bridge0"
ifconfig_bridge0="addm igb0 192.168.0.150/24 up"
ifconfig_igb0="-txcsum -txcsum6 -tso up"
```

(The reason for turning off certain flags is to prevent link-flap when the VM is disconnected from the bridge.)

netstat -i (note the Oerrs field below):

```
Name      Mtu Network        Address            Ipkts Ierrs Idrop     Opkts Oerrs  Coll
igb0     1500 <Link#1>       5c:ed:8c:e9:c2:48 50259033 0 0  53166489      0     0
igb1*    1500 <Link#2>       5c:ed:8c:e9:c2:49        0 0 0         0      0     0
igb2*    1500 <Link#3>       5c:ed:8c:e9:c2:4a        0 0 0         0      0     0
igb3*    1500 <Link#4>       5c:ed:8c:e9:c2:4b        0 0 0         0      0     0
lo0     16384 <Link#5>       lo0                  24959 0 0     24959      0     0
lo0         - localhost      localhost                0 - -         0      -     -
lo0         - fe80::%lo0/64  fe80::1%lo0              0 - -         0      -     -
lo0         - your-net       localhost            24959 - -     24959      -     -
bridge0  1500 <Link#6>       58:9c:fc:00:07:00 103262907 0 0 103141654 130543    0
bridge0     - 192.168.0.0/24 192.168.0.150       235137 - -    225757      -     -
ue0      1500 <Link#7>       72:84:d1:bf:ad:2f     3587 0 0      3588      0     0
ue0         - 16.1.15.0/30   16.1.15.2             3308 - -      3308      -     -
tap0     1500 <Link#8>       58:9c:fc:10:f3:0f 53003359 0 0  48249364      0     0
```

I can reproduce this across multiple systems of varying compute and network power (I have tested on ix* NICs with similar results). iperf3 command line used:

```
iperf3 -P4 -c 192.168.0.5 -t 60
```

The remote iperf3 system is a Windows server; the network is gigabit. Errors occur only when testing in one direction (FreeBSD bridge client -> remote system server). If I don't use if_bridge, i.e.:

```
ifconfig_igb0="-txcsum -txcsum6 -tso 192.168.0.150/24 up"
```

then no errors or dropped packets are seen at all. Is my config correct? Are there known issues with if_bridge in FreeBSD? dmesg below:

---<<BOOT>>--- Copyright (c) 1992-2023 The FreeBSD Project. 
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD is a registered trademark of The FreeBSD Foundation. FreeBSD 14.1-RELEASE releng/14.1-n267679-10e31f0946d8 GENERIC amd64 FreeBSD clang version 18.1.5 (https://github.com/llvm/llvm-project.git llvmorg-18.1.5-0-g617a15a9eac9) VT(efifb): resolution 1024x768 CPU: Intel(R) Xeon(R) E-2314 CPU @ 2.80GHz (2808.00-MHz K8-class CPU) Origin="GenuineIntel" Id=0xa0671 Family=0x6 Model=0xa7 Stepping=1 Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE> Features2=0x7ffafbff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND> AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM> AMD Features2=0x121<LAHF,ABM,Prefetch> Structured Extended Features=0xf2bf67ef<FSGSBASE,TSCADJ,SGX,BMI1,AVX2,FDPEXC,SMEP,BMI2,ERMS,INVPCID,NFPUSG,MPX,AVX512F,AVX512DQ,RDSEED,ADX,SMAP,AVX512IFMA,CLFLUSHOPT,PROCTRACE,AVX512CD,SHA,AVX512BW,AVX512VL> Structured Extended Features2=0x40405f5e<AVX512VBMI,UMIP,PKU,OSPKE,AVX512VBMI2,GFNI,VAES,VPCLMULQDQ,AVX512VNNI,AVX512BITALG,AVX512VPOPCNTDQ,RDPID,SGXLC> Structured Extended Features3=0xbc000410<FSRM,MD_CLEAR,IBPB,STIBP,L1DFL,ARCH_CAP,SSBD> XSAVE Features=0xf<XSAVEOPT,XSAVEC,XINUSE,XSAVES> IA32_ARCH_CAPS=0x2023c6b<RDCL_NO,IBRS_ALL,SKIP_L1DFL_VME,MDS_NO> VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID,VID,PostIntr TSC: P-state invariant, performance statistics real memory = 17179869184 (16384 MB) avail memory = 16503263232 (15738 MB) Event timer "LAPIC" quality 600 ACPI APIC Table: <HPE Server2 > FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs FreeBSD/SMP: 1 package(s) x 4 core(s) random: registering fast source Intel Secure Key RNG random: fast provider: "Intel Secure Key RNG" random: unblocking device. 
ioapic0 <Version 2.0> irqs 0-119 Launching APs: 2 3 1 random: entropy device external interface kbd1 at kbdmux0 efirtc0: <EFI Realtime Clock> efirtc0: registered as a time-of-day clock, resolution 1.000000s smbios0: <System Management BIOS> at iomem 0x72fc8000-0x72fc801e smbios0: Version: 3.3, BCD Revision: 3.3 aesni0: <AES-CBC,AES-CCM,AES-GCM,AES-ICM,AES-XTS,SHA1,SHA256> acpi0: <HPE Server2> acpi0: Power Button (fixed) attimer0: <AT timer> port 0x40-0x43,0x50-0x53 irq 0 on acpi0 Timecounter "i8254" frequency 1193182 Hz quality 0 Event timer "i8254" frequency 1193182 Hz quality 100 hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0 Timecounter "HPET" frequency 24000000 Hz quality 950 Event timer "HPET" frequency 24000000 Hz quality 550 Event timer "HPET1" frequency 24000000 Hz quality 440 Event timer "HPET2" frequency 24000000 Hz quality 440 Event timer "HPET3" frequency 24000000 Hz quality 440 Event timer "HPET4" frequency 24000000 Hz quality 440 Timecounter "ACPI-fast" frequency 3579545 Hz quality 900 acpi_timer0: <24-bit timer at 3.579545MHz> port 0x508-0x50b on acpi0 apei0: <ACPI Platform Error Interface> on acpi0 acpi_syscontainer0: <System Container> on acpi0 pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff numa-domain 0 on acpi0 pci0: <ACPI PCI bus> numa-domain 0 on pcib0 xhci0: <Intel Tiger Lake-H USB 3.2 controller> mem 0x4000000000-0x400000ffff at device 20.0 numa-domain 0 on pci0 xhci0: 32 bytes context size, 64-bit DMA usbus0 numa-domain 0 on xhci0 usbus0: 5.0Gbps Super Speed USB v3.0 pci0: <memory, RAM> at device 20.2 (no driver attached) pci0: <simple comms> at device 22.0 (no driver attached) pci0: <simple comms> at device 22.4 (no driver attached) ahci0: <AHCI SATA controller> port 0x3040-0x3047,0x3048-0x304b,0x3020-0x303f mem 0x80580000-0x80581fff,0x80584000-0x805840ff,0x80583000-0x805837ff at device 23.0 numa-domain 0 on pci0 ahci0: AHCI v1.31 with 4 6Gbps ports, Port Multiplier not supported ahcich2: <AHCI channel> at channel 
2 on ahci0 ahcich3: <AHCI channel> at channel 3 on ahci0 ahcich4: <AHCI channel> at channel 4 on ahci0 ahcich5: <AHCI channel> at channel 5 on ahci0 ahciem0: <AHCI enclosure management bridge> on ahci0 pcib1: <ACPI PCI-PCI bridge> at device 27.0 numa-domain 0 on pci0 pci1: <ACPI PCI bus> numa-domain 0 on pcib1 igb0: <Intel(R) I350 (Copper)> port 0x2060-0x207f mem 0x80300000-0x803fffff,0x8040c000-0x8040ffff at device 0.0 numa-domain 0 on pci1 igb0: EEPROM V1.63-0 Option ROM V1-b3310-p0 eTrack 0x8000119d igb0: Using 1024 TX descriptors and 1024 RX descriptors igb0: Using 4 RX queues 4 TX queues igb0: Using MSI-X interrupts with 5 vectors igb0: Ethernet address: 5c:ed:8c:e9:c2:48 igb0: netmap queues/slots: TX 4/1024, RX 4/1024 igb1: <Intel(R) I350 (Copper)> port 0x2040-0x205f mem 0x80200000-0x802fffff,0x80408000-0x8040bfff at device 0.1 numa-domain 0 on pci1 igb1: EEPROM V1.63-0 Option ROM V1-b3310-p0 eTrack 0x8000119d igb1: Using 1024 TX descriptors and 1024 RX descriptors igb1: Using 4 RX queues 4 TX queues igb1: Using MSI-X interrupts with 5 vectors igb1: Ethernet address: 5c:ed:8c:e9:c2:49 igb1: netmap queues/slots: TX 4/1024, RX 4/1024 igb2: <Intel(R) I350 (Copper)> port 0x2020-0x203f mem 0x80100000-0x801fffff,0x80404000-0x80407fff at device 0.2 numa-domain 0 on pci1 igb2: EEPROM V1.63-0 Option ROM V1-b3310-p0 eTrack 0x8000119d igb2: Using 1024 TX descriptors and 1024 RX descriptors igb2: Using 4 RX queues 4 TX queues igb2: Using MSI-X interrupts with 5 vectors igb2: Ethernet address: 5c:ed:8c:e9:c2:4a igb2: netmap queues/slots: TX 4/1024, RX 4/1024 igb3: <Intel(R) I350 (Copper)> port 0x2000-0x201f mem 0x80000000-0x800fffff,0x80400000-0x80403fff at device 0.3 numa-domain 0 on pci1 igb3: EEPROM V1.63-0 Option ROM V1-b3310-p0 eTrack 0x8000119d igb3: Using 1024 TX descriptors and 1024 RX descriptors igb3: Using 4 RX queues 4 TX queues igb3: Using MSI-X interrupts with 5 vectors igb3: Ethernet address: 5c:ed:8c:e9:c2:4b igb3: netmap queues/slots: TX 4/1024, RX 4/1024 
pcib2: <ACPI PCI-PCI bridge> at device 27.6 numa-domain 0 on pci0 pci2: <ACPI PCI bus> numa-domain 0 on pcib2 pcib3: <ACPI PCI-PCI bridge> at device 29.0 numa-domain 0 on pci0 pci3: <ACPI PCI bus> numa-domain 0 on pcib3 vgapci0: <VGA-compatible display> mem 0xa4000000-0xa4ffffff,0xa5b98000-0xa5b9bfff,0xa5000000-0xa57fffff at device 0.1 numa-domain 0 on pci3 vgapci0: Boot video device ehci0: <EHCI (generic) USB 2.0 controller> mem 0xa5b9c000-0xa5b9c0ff at device 0.4 numa-domain 0 on pci3 usbus1: EHCI version 1.0 usbus1 numa-domain 0 on ehci0 usbus1: 480Mbps High Speed USB v2.0 pcib4: <ACPI PCI-PCI bridge> at device 29.1 numa-domain 0 on pci0 pci4: <ACPI PCI bus> numa-domain 0 on pcib4 pcib5: <ACPI PCI-PCI bridge> at device 29.2 numa-domain 0 on pci0 pci5: <ACPI PCI bus> numa-domain 0 on pcib5 pcib6: <ACPI PCI-PCI bridge> at device 29.3 numa-domain 0 on pci0 pci6: <ACPI PCI bus> numa-domain 0 on pcib6 pcib7: <ACPI PCI-PCI bridge> at device 29.4 numa-domain 0 on pci0 pci7: <ACPI PCI bus> numa-domain 0 on pcib7 isab0: <PCI-ISA bridge> at device 31.0 numa-domain 0 on pci0 isa0: <ISA bus> numa-domain 0 on isab0 pci0: <serial bus> at device 31.5 (no driver attached) cpu0: <ACPI CPU> numa-domain 0 on acpi0 uart1: <16550 or compatible> port 0x2f8-0x2ff irq 3 on acpi0 uart0: <Non-standard ns8250 class UART with FIFOs> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0 vga0: <Generic ISA VGA> at port 0x3b0-0x3bb iomem 0xb0000-0xb7fff pnpid PNP0900 on isa0 atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0 atkbd0: <AT Keyboard> irq 1 on atkbdc0 kbd0 at atkbd0 atkbd0: [GIANT-LOCKED] atkbdc0: non-PNP ISA device will be removed from GENERIC in FreeBSD 15. 
hwpstate_intel0: <Intel Speed Shift> numa-domain 0 on cpu0 hwpstate_intel1: <Intel Speed Shift> numa-domain 0 on cpu1 hwpstate_intel2: <Intel Speed Shift> numa-domain 0 on cpu2 hwpstate_intel3: <Intel Speed Shift> numa-domain 0 on cpu3 Timecounter "TSC-low" frequency 1404004245 Hz quality 1000 Timecounters tick every 1.000 msec ZFS filesystem version: 5 ZFS storage pool version: features support (5000) ugen1.1: <(0x103c) EHCI root HUB> at usbus1 Trying to mount root from zfs:zroot/ROOT/default []... ugen0.1: <Intel XHCI root HUB> at usbus0 uhub0 numa-domain 0 on usbus0 uhub0: <Intel XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0 uhub1 numa-domain 0 on usbus1 uhub1: <(0x103c) EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus1 ada0 at ahcich2 bus 0 scbus0 target 0 lun 0 ada0: <ST4000NM002A-2HZ101 SN03> ACS-4 ATA SATA 3.x device ada0: Serial Number WS255C4V ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes) ada0: Command Queueing enabled ada0: 3815447MB (7814037168 512 byte sectors) ada1 at ahcich3 bus 0 scbus1 target 0 lun 0 ada1: <ST4000NM002A-2HZ101 SN03> ACS-4 ATA SATA 3.x device ada1: Serial Number WS255BGF ada1: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes) ada1: Command Queueing enabled ada1: 3815447MB (7814037168 512 byte sectors) ada2 at ahcich4 bus 0 scbus2 target 0 lun 0 ada2: <INTEL SSDSC2BB240G6 G2010130> ACS-2 ATA SATA 3.x device ada2: Serial Number BTWA518300MH240AGN ada2: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 512bytes) ada2: Command Queueing enabled ada2: 228936MB (468862128 512 byte sectors) ada3 at ahcich5 bus 0 scbus3 target 0 lun 0 ada3: <INTEL SSDSC2BB240G6 G2010130> ACS-2 ATA SATA 3.x device ada3: Serial Number BTWA51740BM1240AGN ada3: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 512bytes) ada3: Command Queueing enabled ada3: 228936MB (468862128 512 byte sectors) ses0 at ahciem0 bus 0 scbus4 target 0 lun 0 ses0: <AHCI SGPIO Enclosure 2.00 0001> SEMB S-E-S 2.00 device ses0: SEMB SES Device ses0: 
ada0,pass0 in 'Slot 02', SATA Slot: scbus0 target 0 ses0: ada1,pass1 in 'Slot 03', SATA Slot: scbus1 target 0 ses0: ada2,pass2 in 'Slot 04', SATA Slot: scbus2 target 0 ses0: ada3,pass3 in 'Slot 05', SATA Slot: scbus3 target 0 uhub0: 22 ports with 22 removable, self powered Root mount waiting for: usbus0 usbus1 Root mount waiting for: usbus0 usbus1 usb_msc_auto_quirk: UQ_MSC_NO_GETMAXLUN set for USB mass storage device Generic Mass Storage Device (0x14cd:0x1212) usb_msc_auto_quirk: UQ_MSC_NO_PREVENT_ALLOW set for USB mass storage device Generic Mass Storage Device (0x14cd:0x1212) usb_msc_auto_quirk: UQ_MSC_NO_SYNC_CACHE set for USB mass storage device Generic Mass Storage Device (0x14cd:0x1212) ugen0.2: <Generic Mass Storage Device> at usbus0 umass0 numa-domain 0 on uhub0 umass0: <Generic Mass Storage Device, class 0/0, rev 2.00/1.00, addr 1> on usbus0 umass0: SCSI over Bulk-Only; quirks = 0xc100 umass0:5:0: Attached to scbus5 da0 at umass-sim0 bus 0 scbus5 target 0 lun 0 da0: <Mass Storage Device 1.00> Removable Direct Access SCSI device da0: Serial Number 121220160204 da0: 40.000MB/s transfers da0: 3968MB (8126464 512 byte sectors) da0: quirks=0x2<NO_6_BYTE> uhub1: 8 ports with 8 removable, self powered Root mount waiting for: usbus0 usbus1 ugen1.2: <HPE Virtual NIC> at usbus1 uhub_explore: illegal enable change, port 4 usb_msc_auto_quirk: UQ_MSC_NO_GETMAXLUN set for USB mass storage device Generic Mass Storage Device (0x14cd:0x1212) usb_msc_auto_quirk: UQ_MSC_NO_PREVENT_ALLOW set for USB mass storage device Generic Mass Storage Device (0x14cd:0x1212) usb_msc_auto_quirk: UQ_MSC_NO_SYNC_CACHE set for USB mass storage device Generic Mass Storage Device (0x14cd:0x1212) ugen0.3: <Generic Mass Storage Device> at usbus0 umass1 numa-domain 0 on uhub0 umass1: <Generic Mass Storage Device, class 0/0, rev 2.00/1.00, addr 2> on usbus0 umass1: SCSI over Bulk-Only; quirks = 0xc100 umass1:6:1: Attached to scbus6 da1 at umass-sim1 bus 1 scbus6 target 0 lun 0 da1: <Mass Storage 
Device 1.00> Removable Direct Access SCSI device da1: Serial Number 121220160204 da1: 40.000MB/s transfers da1: 3968MB (8126464 512 byte sectors) da1: quirks=0x2<NO_6_BYTE> ugen0.4: <vendor 0x0424 product 0x2660> at usbus0 uhub2 numa-domain 0 on uhub0 uhub2: <vendor 0x0424 product 0x2660, class 9/0, rev 2.00/8.01, addr 3> on usbus0 uhub2: 2 ports with 1 removable, self powered ichsmb0: <Intel Tiger Lake SMBus controller> port 0xefa0-0xefbf mem 0x4000014000-0x40000140ff at device 31.4 numa-domain 0 on pci0 smbus0: <System Management Bus> numa-domain 0 on ichsmb0 driver bug: Unable to set devclass (class: atrtc devname: (unknown)) acpi_wmi0: <ACPI-WMI mapping> on acpi0 acpi_wmi0: cannot find EC device bridge0: Ethernet address: 58:9c:fc:00:07:00 igb0: link state changed to UP lo0: link state changed to UP igb0: link state changed to DOWN bridge0: link state changed to UP igb0: promiscuous mode enabled cdceem0 numa-domain 0 on uhub1 cdceem0: <HPE Virtual NIC, class 2/12, rev 2.00/0.01, addr 2> on usbus1 ue0: <USB Ethernet> on cdceem0 ue0: Ethernet address: 72:84:d1:bf:ad:2f igb0: link state changed to UP driver bug: Unable to set devclass (class: atrtc devname: (unknown)) tap0: Ethernet address: 58:9c:fc:10:f3:0f tap0: promiscuous mode enabled tap0: link state changed to UP driver bug: Unable to set devclass (class: atrtc devname: (unknown)) ipmi0: <IPMI System Interface> port 0xca2-0xca3 on acpi0 ipmi0: KCS mode found at io 0xca2 on acpi ipmi0: IPMI device rev. 3, firmware rev. 3.01, version 2.0, device support mask 0x9f ipmi0: Number of channels 2 ipmi0: Attached watchdog ipmi0: Establishing power cycle handler
(In reply to pascal.guitierrez from comment #0) if_bridge(4) promotes underlying errors from bridge members:

```
2108         if ((err = dst_ifp->if_transmit(dst_ifp, m))) {
2109                 int n;
2110
2111                 for (m = m0, n = 1; m != NULL; m = m0, n++) {
2112                         m0 = m->m_nextpkt;
2113                         m_freem(m);
2114                 }
2115                 if_inc_counter(sc->sc_ifp, IFCOUNTER_OERRORS, n);
2116                 break;
2117         }
```

Could you please also share the statistics for output dropped packets, i.e. `netstat -di`?
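As a toy illustration of what the kernel snippet above does (this is a Python model for explanation only, not FreeBSD code): when a member interface's if_transmit fails, the rest of the packet chain is freed and the whole count is charged to the bridge's OERRORS counter, which is why Oerrs shows up on bridge0 rather than igb0.

```python
# Toy model (not FreeBSD code) of the if_bridge error promotion quoted
# above: a failed member transmit frees the rest of the packet chain and
# charges the whole count to the bridge's OERRORS counter.

class Member:
    def __init__(self, fail_after):
        self.fail_after = fail_after  # transmits accepted before failing
        self.sent = 0

    def if_transmit(self, pkt):
        if self.sent >= self.fail_after:
            return 55  # ENOBUFS on FreeBSD
        self.sent += 1
        return 0

class Bridge:
    def __init__(self, member):
        self.member = member
        self.oerrors = 0

    def transmit_chain(self, chain):
        # Analogous to the loop in the snippet: on the first error, count
        # the failed packet plus everything still queued behind it.
        for i, pkt in enumerate(chain):
            if self.member.if_transmit(pkt) != 0:
                self.oerrors += len(chain) - i
                break

br = Bridge(Member(fail_after=3))
br.transmit_chain(["p%d" % n for n in range(5)])
print(br.oerrors)  # 2: the failed packet and the one still behind it
```

This is why the member's own counters (and hence `netstat -i` for igb0) can stay at zero while the bridge accumulates errors.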
(In reply to Zhenlei Huang from comment #1) Thanks for your response. I just ran the tests again. There are no dropped packets detected even though the Oerrs count is increasing; see below for `netstat -di` output:

```
Name      Mtu Network        Address            Ipkts Ierrs Idrop     Opkts  Oerrs Coll Drop
igb0     1500 <Link#1>       5c:ed:8c:e9:c2:48 91053412 0 0  78158209      0    0    0
igb1*    1500 <Link#2>       5c:ed:8c:e9:c2:49        0 0 0         0      0    0    0
igb2*    1500 <Link#3>       5c:ed:8c:e9:c2:4a        0 0 0         0      0    0    0
igb3*    1500 <Link#4>       5c:ed:8c:e9:c2:4b        0 0 0         0      0    0    0
lo0     16384 <Link#5>       lo0                  62786 0 0     62786      0    0    0
lo0         - localhost      localhost                0 - -         0      -    -    -
lo0         - fe80::%lo0/64  fe80::1%lo0              0 - -         0      -    -    -
lo0         - your-net       localhost            62786 - -     62786      -    -    -
bridge0  1500 <Link#6>       58:9c:fc:00:07:00 168029633 0 0 168394069 269681    0    0
bridge0     - 192.168.0.0/24 192.168.0.150       719881 - -   1307016      -    -    -
ue0      1500 <Link#7>       72:84:d1:bf:ad:2f     8907 0 0      8908      0    0    0
ue0         - 16.1.15.0/30   16.1.15.2             8208 - -      8208      -    -    -
```

Here is the relevant ifconfig:

```
igb0: flags=1008943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
        options=4a520b9<RXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,HWSTATS,MEXTPG>
        ether 5c:ed:8c:e9:c2:48
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
bridge0: flags=1008843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
        options=0
        ether 58:9c:fc:00:07:00
        inet 192.168.0.150 netmask 0xffffff00 broadcast 192.168.0.255
        id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
        maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
        root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
        member: igb0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 1 priority 128 path cost 55
        groups: bridge vm-switch viid-4c918@
        nd6 options=9<PERFORMNUD,IFDISABLED>
```
(In reply to pascal.guitierrez from comment #2) Hmm, igb(4) has been converted to use iflib(4). It seems that the iflib implementation does not report statistics for output dropped packets (IFCOUNTER_OQDROPS). Could you please have a look at the driver statistics reported via sysctl? I do not have igb(4) devices, but I do have em(4), which uses the same e1000 driver as igb(4). For em(4) the statistics for dropped packets are:
```
# sysctl dev.em.0.iflib | grep r_drops
dev.em.0.iflib.txq0.r_drops: 8402780
```
For igb(4) I guess they should be:
```
# sysctl dev.igb.0.iflib | grep r_drops
```
(In reply to Zhenlei Huang from comment #3) Here's the output from `sysctl dev.igb.0.iflib | grep r_drops`:

```
dev.igb.0.iflib.txq3.r_drops: 37
dev.igb.0.iflib.txq2.r_drops: 37
dev.igb.0.iflib.txq1.r_drops: 0
dev.igb.0.iflib.txq0.r_drops: 4184
```
I just set up the same test as best I can (albeit with a different IP address assigned to the bridge), using an igb interface (Intel I350). I cannot reproduce the problem on 14.0-RELEASE-p6. I can try again with 14.1 later this week, but I cannot take the system I used for testing offline until then. After performing the same iperf3 test with identical ifconfig parameters, connected to a bridge, with the iperf client being the FreeBSD machine AND (just for testing purposes) the other way around:

```
root@hv-kryptos:~ # sysctl dev.igb.0.iflib | grep r_drops
dev.igb.0.iflib.txq7.r_drops: 0
dev.igb.0.iflib.txq6.r_drops: 0
dev.igb.0.iflib.txq5.r_drops: 0
dev.igb.0.iflib.txq4.r_drops: 0
dev.igb.0.iflib.txq3.r_drops: 0
dev.igb.0.iflib.txq2.r_drops: 0
dev.igb.0.iflib.txq1.r_drops: 0
dev.igb.0.iflib.txq0.r_drops: 0
```

I might need to push the system harder to get some error generation, perhaps. What happens if you disable checksum offloading and/or segmentation offloading? Also, what sort of iperf stats are you getting? I am not quite reaching the full gigabit link rate on my interface, so I'd just like a reference in case this only happens when getting closer to the full link bandwidth. It also looks like our dmesg output differs a bit (in particular, the option ROMs are different):

```
igb0: <Intel(R) I350 (Copper)> port 0x6020-0x603f mem 0xc7a00000-0xc7afffff,0xc7c04000-0xc7c07fff irq 26 at device 0.0 numa-domain 0 on pci3
igb0: EEPROM V1.63-0 Option ROM V0-b385-p144 eTrack 0x80000e74
igb0: Using 1024 TX descriptors and 1024 RX descriptors
igb0: Using 8 RX queues 8 TX queues
igb0: Using MSI-X interrupts with 9 vectors
igb0: Ethernet address: 70:df:2f:ae:58:94
igb0: netmap queues/slots: TX 8/1024, RX 8/1024
```
(In reply to fatalnix from comment #5) A correction to the previous message where I asked what happens if you disable offloading: I meant to say, "What happens if you do not disable offloading?"
(In reply to fatalnix from comment #6) It's my understanding that adding the device to the bridge switches off the NIC offloading features automatically. I'm getting 942 Mbit/sec, and interestingly the dropped packets are limited to a single txq, which changes between each test run:

```
dev.igb.0.iflib.txq3.r_drops: 188
dev.igb.0.iflib.txq2.r_drops: 130
dev.igb.0.iflib.txq1.r_drops: 69   <-- only one of these increments during each test
dev.igb.0.iflib.txq0.r_drops: 4636
```
(In reply to pascal.guitierrez from comment #7) > i'm getting 942Mb/sec and interestingly the dropped packets are limited to a single txq, which changes between each test run: If you're testing a single flow (i.e., the default for iperf3) then all packets would be steered to a single queue. Otherwise you'd end up with packet reordering. If you have multiple flows, then that's more surprising.
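Mark's point about flow-to-queue steering can be sketched with a toy hash (Python; real igb hardware uses a Toeplitz hash over the 4-tuple, this just uses CRC32 to show the same idea): every packet of one TCP flow maps to the same queue index, so a single-flow test exercises exactly one txq, and even several flows can collide on one queue by chance.

```python
# Toy RSS-style steering: a stable hash of the 4-tuple modulo the queue
# count. Real NICs use a Toeplitz hash, but the steering property is the
# same: one flow -> one queue (which avoids packet reordering).
import zlib

NUM_TX_QUEUES = 4  # matches the igb0 above ("Using 4 ... TX queues")

def queue_for(src_ip, src_port, dst_ip, dst_port):
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key) % NUM_TX_QUEUES

# Every packet of a single flow lands on the same queue, so one txq
# absorbs the whole load of a single-stream test.
flow = ("192.168.0.150", 53387, "192.168.0.5", 5201)
assert len({queue_for(*flow) for _ in range(1000)}) == 1

# Four parallel iperf3 streams (-P4) get four independent hashes; they
# may spread across queues or collide, depending on the ports chosen.
flows = [("192.168.0.150", p, "192.168.0.5", 5201)
         for p in (53387, 17413, 56093, 54089)]
print(sorted(queue_for(*f) for f in flows))
```

This also explains why only one `txqN.r_drops` counter moves per run, and why the hot queue changes between runs: the ephemeral source ports differ each time, so the hash lands elsewhere.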
(In reply to Mark Johnston from comment #8) Thanks Mark. I'm using `iperf3 -s` on the server and `iperf3 -P4 -c 192.168.0.5 -t 60` on the client, which I believe does create multiple flows:

```
tcp4 0 1303796 192.168.0.150.53387 192.168.0.5.5201 ESTABLISHED
tcp4 0 1310084 192.168.0.150.17413 192.168.0.5.5201 ESTABLISHED
tcp4 0 1311532 192.168.0.150.56093 192.168.0.5.5201 ESTABLISHED
tcp4 0 1295148 192.168.0.150.54089 192.168.0.5.5201 ESTABLISHED
```

Here are the counters before the test run:

```
dev.igb.0.iflib.txq3.r_drops: 570
dev.igb.0.iflib.txq2.r_drops: 130
dev.igb.0.iflib.txq1.r_drops: 168
dev.igb.0.iflib.txq0.r_drops: 6250
```

Here are the counters after the 60-second test:

```
dev.igb.0.iflib.txq3.r_drops: 570
dev.igb.0.iflib.txq2.r_drops: 130
dev.igb.0.iflib.txq1.r_drops: 168
dev.igb.0.iflib.txq0.r_drops: 6360   <-- only this value increased
```

Is that expected behaviour?
I have upgraded to 14.1-RELEASE-p2 and I am now able to reproduce this issue using the same tests. I did a few different tests, ensuring a solid ~950 Mbit/s link during my iperf3 runs, each for 60 seconds as you did. Each of them raised errors when using the FreeBSD box as the iperf3 client. With iperf3 -P4:

```
bridge123 1500 <Link#18> 58:9c:fc:10:ff:92 624390 0 0 4886859 13 0
```

With iperf3 -P8:

```
bridge123 1500 <Link#18>       58:9c:fc:10:ff:92 1262380 0 0 9777020 187 0
bridge123    - 192.168.88.0/24 192.168.88.3      1216769 - - 9777104   -  -
```

And iperf3 -P40, which is as many threads as I have physically:

```
bridge123 1500 <Link#18>       58:9c:fc:10:ff:92 2179910 0 0 14681352 1756 0
bridge123    - 192.168.88.0/24 192.168.88.3      2134298 - - 14683001    - -
```

The Oerrs field increases significantly, but if you notice, different -P values do not seem to change the rate of errors generated either. Checking with sysctl afterwards:

```
# sysctl dev.igb.0.iflib | grep r_drops
dev.igb.0.iflib.txq7.r_drops: 94
dev.igb.0.iflib.txq6.r_drops: 291
dev.igb.0.iflib.txq5.r_drops: 52
dev.igb.0.iflib.txq4.r_drops: 356
dev.igb.0.iflib.txq3.r_drops: 232
dev.igb.0.iflib.txq2.r_drops: 252
dev.igb.0.iflib.txq1.r_drops: 311
dev.igb.0.iflib.txq0.r_drops: 157
```

These stayed at 0 yesterday when I was still running 14.0-RELEASE-p6.
(In reply to fatalnix from comment #10) Ugh. Actually, it does increase with the thread increase; I was looking at the wrong number. So this might be something that's load-related somehow.
Could you try adding the following flags?

```
-rxcsum -rxcsum6 -txcsum -txcsum6 -lro -tso -vlanhwtso -vlanhwfilter
```

I use this:

```
ifconfig_lagg0="laggproto lacp laggport igb0 laggport igb1 -rxcsum -rxcsum6 -txcsum -txcsum6 -lro -tso -vlanhwtso -vlanhwfilter"
```

I have a Supermicro with an Intel card with a hardware bug. This causes extremely erratic but consistently slow connections. It took me ages to troubleshoot; I had the same behaviour under OpenIndiana. Using the flags above I get consistent line speed.

```
igb0@pci0:0:20:0: class=0x020000 rev=0x03 hdr=0x00 vendor=0x8086 device=0x1f41 subvendor=0x15d9 subdevice=0x1f41
    vendor   = 'Intel Corporation'
    device   = 'Ethernet Connection I354'
    class    = network
    subclass = ethernet
```
(In reply to erwin.freebsd-bugzilla from comment #12) Thanks for your reply. Can you run the same iperf3 tests without and with a bridge involved? In my testing, there are no dropped packets when using the igb0 device directly; it's only when it's part of a bridge that the packet drops occur.
(In reply to pascal.guitierrez from comment #13) ...and yes, I repeated the tests with those flags and I'm still seeing packet drops.
What does `top -PSH` look like when running the test? It is possible you are running out of CPU, especially if any filtering is enabled.
(In reply to Kevin Bowling from comment #15) Thanks Kevin. No bridge filtering is enabled (sysctl net.link.bridge):

```
net.link.bridge.ipfw: 0
net.link.bridge.log_mac_flap: 1
net.link.bridge.allow_llz_overlap: 0
net.link.bridge.inherit_mac: 0
net.link.bridge.log_stp: 0
net.link.bridge.pfil_local_phys: 0
net.link.bridge.pfil_member: 0
net.link.bridge.ipfw_arp: 0
net.link.bridge.pfil_bridge: 0
net.link.bridge.pfil_onlyip: 1
```

And the system does not appear to be CPU-constrained; here's `top -PSH` during a test run using iperf3 -P4:

```
last pid: 25350;  load averages: 0.19, 0.25, 0.28    up 4+19:09:14  11:32:05
321 threads:   5 running, 301 sleeping, 15 waiting
CPU 0:  0.0% user,  0.0% nice,  5.9% system,  0.0% interrupt, 94.1% idle
CPU 1:  0.0% user,  0.0% nice,  4.3% system,  0.0% interrupt, 95.7% idle
CPU 2:  0.4% user,  0.0% nice,  7.5% system,  0.0% interrupt, 92.1% idle
CPU 3:  0.0% user,  0.0% nice,  6.7% system,  0.0% interrupt, 93.3% idle
Mem: 16M Active, 24M Inact, 1744K Laundry, 6786M Wired, 8656M Free
ARC: 5248M Total, 854M MFU, 3834M MRU, 537M Header, 22M Other
     4644M Compressed, 6394M Uncompressed, 1.38:1 Ratio

  PID USERNAME  PRI NICE  SIZE   RES STATE   C   TIME    WCPU COMMAND
   11 root      187 ki31    0B   64K CPU0    0 108.9H  93.56% idle{idle: cpu0}
   11 root      187 ki31    0B   64K CPU1    1 107.8H  93.47% idle{idle: cpu1}
   11 root      187 ki31    0B   64K CPU3    3 108.0H  91.74% idle{idle: cpu3}
   11 root      187 ki31    0B   64K RUN     2 107.7H  91.54% idle{idle: cpu2}
25350 root       24    0   20M 7800K sbwait  3   0:00   7.80% iperf3{iperf3}
25350 root       23    0   20M 7800K sbwait  0   0:00   7.31% iperf3{iperf3}
    0 root      -60    -    0B 2096K -       1   1:20   5.95% kernel{if_io_tqg_1}
    0 root      -60    -    0B 2096K -       0   3:59   5.83% kernel{if_io_tqg_0}
25350 root       21    0   20M 7800K sbwait  3   0:00   1.29% iperf3{iperf3}
25350 root       21    0   20M 7800K sbwait  3   0:00   1.18% iperf3{iperf3}
```

Drops are only seen when using if_bridge:

```
dev.igb.0.iflib.txq3.r_drops: 4594   <-- this value is increasing during the above test run
dev.igb.0.iflib.txq2.r_drops: 563
dev.igb.0.iflib.txq1.r_drops: 2551
dev.igb.0.iflib.txq0.r_drops: 2625
```

The value for "Oerrs" on the bridge0 device appears to increase in tandem with the r_drops value:

```
Name      Mtu Network        Address            Ipkts Ierrs Idrop     Opkts  Oerrs Coll Drop
igb0     1500 <Link#1>       5c:ed:8c:e9:c2:48 81252823 0 0 131980027      0    0    0
igb1*    1500 <Link#2>       5c:ed:8c:e9:c2:49        0 0 0         0      0    0    0
igb2*    1500 <Link#3>       5c:ed:8c:e9:c2:4a        0 0 0         0      0    0    0
igb3*    1500 <Link#4>       5c:ed:8c:e9:c2:4b        0 0 0         0      0    0    0
lo0     16384 <Link#5>       lo0                  62021 0 0     62021      0    0    0
lo0         - localhost      localhost                0 - -         0      -    -    -
lo0         - fe80::%lo0/64  fe80::1%lo0              0 - -         0      -    -    -
lo0         - your-net       localhost            62021 - -     62021      -    -    -
bridge0  1500 <Link#6>       58:9c:fc:00:07:00 122336886 0 0 199687836 287872    0    0
bridge0     - 192.168.0.0/24 192.168.0.150     13289878 - -  90898935      -    -    -
```
Can you show the value of sysctl dev.igb.0.fc? And try your test with dev.igb.0.fc = 0
(In reply to Aleksandr Fedorov from comment #17) It's currently at the default of dev.igb.0.fc=3; changing it has no apparent effect. However, what I notice is that re-running the test results in some runs having 0 errors and others consistent errors. See below output from iperf3 with only 2 parallel streams (the system has 4 cores). This test had no errors (see the "Retr" field):

```
iperf3 -P2 -c 192.168.0.5 -t 600
Connecting to host 192.168.0.5, port 5201
[  5] local 192.168.0.150 port 47836 connected to 192.168.0.5 port 5201
[  7] local 192.168.0.150 port 12391 connected to 192.168.0.5 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  57.8 MBytes   484 Mbits/sec    0   2.00 MBytes
[  7]   0.00-1.00   sec  57.9 MBytes   485 Mbits/sec    0   2.00 MBytes
[SUM]   0.00-1.00   sec   116 MBytes   969 Mbits/sec    0
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   1.00-2.00   sec  56.1 MBytes   471 Mbits/sec    0   2.00 MBytes
[  7]   1.00-2.00   sec  56.0 MBytes   470 Mbits/sec    0   2.00 MBytes
[SUM]   1.00-2.00   sec   112 MBytes   941 Mbits/sec    0
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   2.00-3.00   sec  56.1 MBytes   471 Mbits/sec    0   2.00 MBytes
[  7]   2.00-3.00   sec  56.2 MBytes   472 Mbits/sec    0   2.00 MBytes
[SUM]   2.00-3.00   sec   112 MBytes   943 Mbits/sec    0
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   3.00-4.00   sec  56.0 MBytes   470 Mbits/sec    0   2.00 MBytes
[  7]   3.00-4.00   sec  56.0 MBytes   470 Mbits/sec    0   2.00 MBytes
[SUM]   3.00-4.00   sec   112 MBytes   940 Mbits/sec    0
^C- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   4.00-4.55   sec  30.6 MBytes   469 Mbits/sec    0   2.00 MBytes
[  7]   4.00-4.55   sec  30.8 MBytes   471 Mbits/sec    0   2.00 MBytes
[SUM]   4.00-4.55   sec  61.4 MBytes   941 Mbits/sec    0
```

Then if run again, it may experience errors, and if it does, the errors are consistently seen for the rest of the test run:

```
iperf3 -P2 -c 192.168.0.5 -t 600
Connecting to host 192.168.0.5, port 5201
[  5] local 192.168.0.150 port 60974 connected to 192.168.0.5 port 5201
[  7] local 192.168.0.150 port 30555 connected to 192.168.0.5 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  48.5 MBytes   406 Mbits/sec    6   2.00 MBytes
[  7]   0.00-1.00   sec  67.0 MBytes   560 Mbits/sec    3   2.00 MBytes
[SUM]   0.00-1.00   sec   116 MBytes   966 Mbits/sec    9
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   1.00-2.00   sec  52.5 MBytes   441 Mbits/sec   12   1.41 MBytes
[  7]   1.00-2.00   sec  59.5 MBytes   500 Mbits/sec    4   1.44 MBytes
[SUM]   1.00-2.00   sec   112 MBytes   941 Mbits/sec   16
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   2.00-3.00   sec  58.5 MBytes   490 Mbits/sec    2   1011 KBytes
[  7]   2.00-3.00   sec  54.2 MBytes   454 Mbits/sec    3    510 KBytes
[SUM]   2.00-3.00   sec   113 MBytes   944 Mbits/sec    5
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   3.00-4.00   sec  53.0 MBytes   445 Mbits/sec    3   2.00 MBytes
[  7]   3.00-4.00   sec  59.2 MBytes   497 Mbits/sec    3    168 KBytes
[SUM]   3.00-4.00   sec   112 MBytes   942 Mbits/sec    6
^C- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   4.00-4.64   sec  42.5 MBytes   559 Mbits/sec    2    338 KBytes
[  7]   4.00-4.64   sec  29.0 MBytes   381 Mbits/sec    1   2.00 MBytes
[SUM]   4.00-4.64   sec  71.5 MBytes   941 Mbits/sec    3
```

The behaviour occurs independently of the dev.igb.0.fc value (3 or 0).
(In reply to pascal.guitierrez from comment #18) If you watch `systat -vmstat`, does the zero-drop case happen when all the igb interrupts are on one core, or two cores? If you rerun the -P2 test several times you should see both cases, as the RSS hashing differs.
(In reply to Kevin Bowling from comment #19) Thanks Kevin, yes you are right: repeating the test gets to a state where interrupts are processed on a single core and packet drops occur:

```
   13 igb0:rxq0
 7972 igb0:rxq1
   97 igb0:rxq2
    1 igb0:rxq3
```

On a good run there are zero drops and interrupts are processed on two cores:

```
 8041 igb0:rxq0
    1 igb0:rxq1
 8038 igb0:rxq2
      igb0:rxq3
```

Do you know if this is expected behaviour or is it a problem?
(In reply to pascal.guitierrez from comment #20) > Do you know if this is expected behaviour or is it a problem?

Caveat: it has been a long time since I have been line-rating large numbers of FreeBSD systems, so I may not be privy to all the developments in the network stack, but I would say it is more /known/ than /expected/. With if_bridge in play you may be more likely to see this, since that may imply TSO was masked off and you have a more expensive transmit path. I think https://reviews.freebsd.org/D4295 would be one way to fix the issue, but I am not sure it is the only or best way. In principle there isn't a problem, because dropping packets is expected behaviour of overfull queues at any point in the network, and TCP or other protocols should be dealing with it. But it is obviously suboptimal if we have enough context to do something smarter.
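The "overfull queue" behaviour described above can be sketched in a few lines (Python, illustrative only; the descriptor count mirrors the "Using 1024 TX descriptors" line from the igb probe, everything else is a simplification): a bounded transmit ring rejects packets once full instead of blocking, and each rejection is what shows up as an `r_drops` increment.

```python
# Sketch of drop-on-full transmit behaviour: once the ring holds as many
# packets as it has descriptors, enqueue fails and the drop counter is
# bumped, analogous to iflib's per-txq r_drops.
from collections import deque

class TxRing:
    def __init__(self, ndesc):
        self.ring = deque()
        self.ndesc = ndesc
        self.drops = 0

    def enqueue(self, pkt):
        if len(self.ring) >= self.ndesc:
            self.drops += 1   # counted, not queued: the r_drops case
            return False
        self.ring.append(pkt)
        return True

# 1024 descriptors; a 1500-packet burst arriving faster than the link
# drains loses the excess.
ring = TxRing(1024)
for n in range(1500):
    ring.enqueue(n)
print(ring.drops)  # 476
```

With TSO masked off on the bridge path, each 64 KB send turns into ~44 separate 1500-byte enqueues instead of one, so bursts fill the ring much faster, which fits the observation that drops only appear with if_bridge in play.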
(In reply to Kevin Bowling from comment #21) Thinking back a bit harder, one thing that might help work around this is to increase the number of transmit descriptors. Try something like this in /boot/loader.conf:

```
dev.igb.0.iflib.override_ntxqs=4096
```

and let us know how it does.
(In reply to Kevin Bowling from comment #21) (In reply to Kevin Bowling from comment #22) Interesting, thanks for that. I tried setting dev.igb.0.iflib.override_ntxqs=4096, but to no effect. What did make a difference was using the RACK TCP stack:

```
net.inet.tcp.functions_default=rack
```

Using RACK there was next to no packet loss when interrupts were scheduled onto a single core. Obviously this doesn't fix the root cause, but it is somewhat of a workaround.
(In reply to pascal.guitierrez from comment #23) Thanks for reporting back. That is very interesting. There are two possibilities that come to mind: 1) the RACK stack is correctly identifying the loss as feedback for its flow/congestion control, or 2) the RACK stack has some feature that causes it not to overflow the lower layers. I'll see if I can find someone with more recent TCP stack experience to glance at this and provide any additional insight or suggestions for the base stack.
(In reply to Kevin Bowling from comment #22) (In reply to pascal.guitierrez from comment #23) I had a similar experience last year when I was debugging an ENOBUFS error returned to TCP when using bce NICs. But I am not sure if you can find a similar solution.

<snap from 2023>
It turns out the root cause is that the default NIC send queue length is too small. The ENOBUFS error came from the _IF_QFULL check in ifq.h. However, tuning "sysctl net.link.ifqmaxlen" directly does not work; there is per-NIC setup in the driver for the device tx/rx queues. I had to increase the tx queue "ifq_maxlen" via the device sysctl "hw.bce.tx_pages". After tuning that, I could achieve a stable 1 Gbps x 100 ms delay BDP.
</snap from 2023>

Talking about review D4295, it reminds me that Linux has some related work, TCP Small Queues, at the sender side.

Talking about workarounds, you may also test the following two patches I prepared on the stable/14 branch as a workaround for improving TCP performance under congestion control:

https://reviews.freebsd.org/D47218 << apply this patch first
https://reviews.freebsd.org/D47213 << apply this patch second
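As a quick worked example of the bandwidth-delay-product figure quoted above (the MTU value is the 1500 bytes used throughout this report; the arithmetic is just illustrative): at 1 Gbps with 100 ms of delay, the path can hold far more full-size frames in flight than a short NIC send queue, so bursts hit the queue-full check long before the path itself is full.

```python
# Worked example: bandwidth-delay product at 1 Gbps x 100 ms, expressed
# in 1500-byte frames, compared against a short send queue.
RATE_BPS = 1_000_000_000   # 1 Gbps link
RTT_S = 0.100              # 100 ms delay
MTU = 1500                 # bytes per full-size frame

bdp_bytes = RATE_BPS / 8 * RTT_S
bdp_frames = bdp_bytes / MTU
print(int(bdp_bytes), int(bdp_frames))  # 12500000 bytes, ~8333 frames
```

Roughly 8,300 in-flight frames dwarfs both the default net.link.ifqmaxlen and a 1024-descriptor transmit ring, which is consistent with the experience that lengthening the device tx queue was what stabilised the connection.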
(In reply to Kevin Bowling from comment #24) Indeed, when it gets an ENOBUFS back, RACK will start pacing the connection. Search for this comment:

```
/*
 * Failures do not advance the seq counter above. For the
 * case of ENOBUFS we will fall out and retry in 1ms with
 * the hpts. Everything else will just have to retransmit
 * with the timer.
 *
 * In any case, we do not want to loop around for another
 * send without a good reason.
 */
```
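A toy contrast of the two behaviours (Python; this is not the RACK implementation, just a sketch of the retry-in-1ms idea from the comment above): on ENOBUFS the RACK stack re-schedules the send slightly later via hpts, so the packet is never counted lost, whereas the base stack's packet simply drops at the full ring and must be recovered by retransmission.

```python
# Sketch of the ENOBUFS handling described above: instead of treating a
# full ring as loss, wait a pacing interval (hpts-style) and retry.
def send_with_retry(ring_free, backoff_ms=1):
    """Return how many milliseconds of pacing delay the send needed."""
    waited = 0
    while not ring_free():
        waited += backoff_ms   # ENOBUFS: fall out, retry 1 ms later
    return waited

# Ring is full for two polls, then drains: the send succeeds after 2 ms
# of pacing with zero packets dropped.
slots = iter([False, False, True])
print(send_with_retry(lambda: next(slots)))  # 2
```

This matches the observation in comment #23: with RACK, the txq r_drops counters barely move even when interrupts land on a single core, because the sender backs off instead of overflowing the ring.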