Bug 217606

Summary: Bridge stops working after some days
Product: Base System
Reporter: Aiko Barz <aiko>
Component: kern
Assignee: freebsd-net (Nobody) <net>
Status: Closed Overcome By Events
Severity: Affects Some People
CC: ae, erj, fk, kp, meyer.sydney, mops, pi
Priority: ---
Keywords: patch
Version: 11.0-RELEASE
Hardware: amd64
OS: Any
Attachments:
Set UMA_ZONE_NOFREE and preallocate zone item(s)
ElectroBSD r323013-16c8587cb1b5 dmesg on HP Proliant
`netstat -m` output while the bridge is in wedged state
Diff of if_bridge.c for 11.0 and 11.1

Description Aiko Barz 2017-03-07 09:08:24 UTC
Hello,

we recently upgraded our bridging firewalls from 10.1-RELEASE-pxx to 11.0-RELEASE-p8, and since then they stop passing traffic after some time, in this case after ~4 days. One of them stopped yesterday evening. (We have a failover mechanism to reduce the impact.)

$ uptime
9:26AM  up 4 days, 19:22, 2 users, load averages: 0.12, 0.06, 0.01

bridge0 consists of ix0/ix1:

ix0: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 3.1.13-k> port 0xecc0-0xecdf mem 0xd9e80000-0xd9efffff,0xd9ff8000-0xd9ffbfff irq 48 at device 0.0 numa-domain 0 on pci2
ix1: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 3.1.13-k> port 0xece0-0xecff mem 0xd9f00000-0xd9f7ffff,0xd9ffc000-0xd9ffffff irq 52 at device 0.1 numa-domain 0 on pci2

When the error occurs, I see the following for IPv4. The bridge carries IPv6 as well, with the same problem.

ix0: A load balancer is asking for its default GW. No reply...

$ tcpdump -i ix0 \( arp \)
09:37:47.330361 ARP, Request who-has A.A.A.A tell B.B.B.B, length 46

ix1: The default GW actually sends a reply. I can see it on ix1.

$ tcpdump -i ix1 \( arp \)
09:38:59.328956 ARP, Request who-has A.A.A.A tell B.B.B.B, length 46
09:38:59.329374 ARP, Reply A.A.A.A is-at 00:00:0a:0b:0c:0d (oui Cisco), length 46

A tcpdump on bridge0 shows the same as ix1.

Some numbers from the currently non-working system:

$ netstat -m
82409/6901/89310 mbufs in use (current/cache/total)
38692/4094/42786/1015426 mbuf clusters in use (current/cache/total/max)
38692/4065 mbuf+clusters out of packet secondary zone in use (current/cache)
0/192/192/507713 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/150433 9k jumbo clusters in use (current/cache/total/max)
0/0/0/84618 16k jumbo clusters in use (current/cache/total/max)
97986K/10681K/108667K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0 sendfile syscalls
0 sendfile syscalls completed without I/O request
0 requests for I/O initiated by sendfile
0 pages read by sendfile as part of a request
0 pages were valid at time of a sendfile request
0 pages were requested for read ahead by applications
0 pages were read ahead by sendfile
0 times sendfile encountered an already busy page
0 requests for sfbufs denied
0 requests for sfbufs delayed

$ netstat -b -d -h -i bridge0
Name    Mtu Network       Address              Ipkts Ierrs Idrop     Ibytes    Opkts Oerrs     Obytes  Coll  Drop
ix0    1.5K <Link#1>      00:00:00:00:00:0a      12G     0     0        11T     7.9G     0       1.1T     0  335k
ix1    1.5K <Link#2>      00:00:00:00:00:0b     7.9G     0     0       1.2T      12G     0        11T     0     0
bridg  1.5K <Link#8>      00:00:00:00:00:0c      20G     0     0        12T      20G  335k        12T     0     0

What I did so far:

# Disable Ethernet Flow-Control
# https://wiki.freebsd.org/10gFreeBSD/Router
dev.ix.0.fc=0
dev.ix.1.fc=0

# Disable TSO
cloned_interfaces="bridge0"
ifconfig_bridge0="addm ix0 addm ix1 up"
ifconfig_ix0="up -tso"
ifconfig_ix1="up -tso"

I found the following bug reports:
2004: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=185633
2016: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=212749

Since this system uses PF and scrubbing, I applied this patch manually:
https://reviews.freebsd.org/D7780

But I have had no success so far.

Shutting down ix0/ix1 and bringing them back up makes bridge0 responsive again, but that only buys time. Netstat after that procedure:

$ netstat -m
33281/56284/89565 mbufs in use (current/cache/total)
33280/9756/43036/2015426 mbuf clusters in use (current/cache/total/max)
33280/9730 mbuf+clusters out of packet secondary zone in use (current/cache)
0/192/192/507713 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/150433 9k jumbo clusters in use (current/cache/total/max)
0/0/0/84618 16k jumbo clusters in use (current/cache/total/max)
74880K/34351K/109231K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0 sendfile syscalls
0 sendfile syscalls completed without I/O request
0 requests for I/O initiated by sendfile
0 pages read by sendfile as part of a request
0 pages were valid at time of a sendfile request
0 pages were requested for read ahead by applications
0 pages were read ahead by sendfile
0 times sendfile encountered an already busy page
0 requests for sfbufs denied
0 requests for sfbufs delayed

Kind regards,
Aiko
Comment 1 Kristof Provost freebsd_committer freebsd_triage 2017-03-09 03:01:00 UTC
I don't think you're running into the bridge memory leak that D7780 fixed. You'll want that patch, but I don't think that's the problem here.

I also don't think that it's a pf problem, because bridge_pfil() doesn't pass ARP packets to pf, so if the problem were there ARP packets should still pass.

Can you check if there's still free memory when you run into this problem? Being out of memory might explain it, but I don't see how bringing the interfaces down and up again would help.
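
For the record, a rough sketch of the data that would be useful to capture while it's in the wedged state (exact zone names and fields may differ on your system):

$ vmstat -z | head -n 40              # UMA zone usage; look for non-zero FAIL counts
$ netstat -m                          # mbuf/cluster usage and denied requests
$ top -b | head                       # Mem:/Swap: summary lines
$ sysctl vm.stats.vm.v_free_count     # free pages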
Comment 2 Aiko Barz 2017-03-10 08:27:39 UTC
Today the other bridge server had this issue, but I forgot to collect the memory statistics before applying the workaround. :(

My current workaround:
$ ifconfig ix0 down
$ ifconfig ix0 up
$ ifconfig ix1 down
$ ifconfig ix1 up

This is the memory situation after applying the workaround:

$ sysctl hw | egrep 'hw.(phys|user|real)'
hw.physmem: 17095405568
hw.usermem: 14351601664
hw.realmem: 17179869184

$ grep memory /var/run/dmesg.boot
real memory  = 17179869184 (16384 MB)
avail memory = 16531873792 (15766 MB)

$ freecolor -m -o
             total       used       free     shared    buffers     cached
Mem:         15866       2625      13240          0          0          0
Swap:         8192          0       8192

$ vmstat -s
1609035107 cpu context switches
3485091537 device interrupts
 11107443 software interrupts
 44940504 traps
 55770968 system calls
       25 kernel threads created
   184620  fork() calls
    35599 vfork() calls
        4 rfork() calls
        0 swap pager pageins
        0 swap pager pages paged in
        0 swap pager pageouts
        0 swap pager pages paged out
    64250 vnode pager pageins
    64954 vnode pager pages paged in
     1378 vnode pager pageouts
    21763 vnode pager pages paged out
        0 page daemon wakeups
   297604 pages examined by the page daemon
        0 pages reactivated
 11889396 copy-on-write faults
     6627 copy-on-write optimized faults
 24742122 zero fill pages zeroed
        0 zero fill pages prezeroed
      663 intransit blocking page faults
 41093137 total VM faults taken
    63984 page faults requiring I/O
        0 pages affected by kernel thread creation
  7838450 pages affected by  fork()
  1220635 pages affected by vfork()
     1524 pages affected by rfork()
        0 pages cached
 45799762 pages freed
        0 pages freed by daemon
 30066607 pages freed by exiting processes
     2339 pages active
    17759 pages inactive
        0 pages in VM cache
   669862 pages wired down
  3371746 pages free
     4096 bytes per page
 38614188 total name lookups
          cache hits (89% pos + 7% neg) system 0% per-directory
          deletions 0%, falsehits 0%, toolong 0%

Those machines only run PF. Nothing else.

Kind regards,
Aiko
Comment 3 Kristof Provost freebsd_committer freebsd_triage 2017-03-10 09:14:47 UTC
It doesn't look like it's running out of memory. It'd be good to confirm this when it's in the faulty state, just to be sure.
Comment 4 Aiko Barz 2017-03-14 08:34:45 UTC
Yesterday the other bridge stopped working. Here is the memory situation of the non-working device:

$ sysctl hw | egrep 'hw.(phys|user|real)'
hw.physmem: 17095405568
hw.usermem: 14090272768
hw.realmem: 17179869184

$ freecolor -m -o
             total       used       free     shared    buffers     cached
Mem:         15866       2869      12996          0          0          0
Swap:         8192          0       8192

$ vmstat -s
 40942340 cpu context switches
3153615407 device interrupts
 19423182 software interrupts
 74464970 traps
 92413279 system calls
       25 kernel threads created
   301461  fork() calls
    57826 vfork() calls
       50 rfork() calls
        0 swap pager pageins
        0 swap pager pages paged in
        0 swap pager pageouts
        0 swap pager pages paged out
    80151 vnode pager pageins
    80933 vnode pager pages paged in
     1378 vnode pager pageouts
    21763 vnode pager pages paged out
        0 page daemon wakeups
   457328 pages examined by the page daemon
        0 pages reactivated
 19516079 copy-on-write faults
    11506 copy-on-write optimized faults
 40735110 zero fill pages zeroed
        0 zero fill pages prezeroed
      680 intransit blocking page faults
 67500932 total VM faults taken
    79873 page faults requiring I/O
        0 pages affected by kernel thread creation
 12810470 pages affected by  fork()
  1982627 pages affected by vfork()
    19050 pages affected by rfork()
        0 pages cached
 75504128 pages freed
        0 pages freed by daemon
 49183950 pages freed by exiting processes
      838 pages active
    19799 pages inactive
        0 pages in VM cache
   733668 pages wired down
  3307401 pages free
     4096 bytes per page
 64364542 total name lookups
          cache hits (89% pos + 7% neg) system 0% per-directory
          deletions 0%, falsehits 0%, toolong 0%

So long,
Aiko
Comment 5 Kristof Provost freebsd_committer freebsd_triage 2017-03-15 02:40:17 UTC
Right, that confirms we're not dealing with a memory leak.
I've had a quick look at the ixgbe/if_ix code. It resets the hardware if you bring it down, so one possible explanation is that there's something wrong in the ix driver itself, which causes this.

I might be missing it, but I don't see anything in the bridge code that responds to bringing a member interface up/down, so right now I consider the ix driver to be the most likely suspect. I'm afraid I don't know anything about that hardware.

It might be useful to have a look in dmesg to see if there's anything obvious, as well as list the exact hardware you have.
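
Something like this should be enough to list the exact hardware (a sketch; adjust the patterns to your device names):

$ pciconf -lv | grep -A 4 -E '^(ix|igb)[0-9]'
$ grep -E '^(ix|igb)[0-9]' /var/run/dmesg.boot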
Comment 6 Aiko Barz 2017-03-15 10:04:38 UTC
Hardware: Dell PowerEdge R320
* Intel Xeon E5-2430 v2 (2.5GHz, 6C, 15MB Cache, 7.2GT/s QPI, 80W, Turbo)
* PowerEdge R320 mainboard, TPM
* Intel Ethernet I350 DP 1Gbit/s server adapter, low profile
* Intel X520 DP 10Gbit/s DA/SFP+ server adapter <--- bridge0
* 10GbE SR SFP+ transceiver, 10Gb and 1Gb compatible, for Intel and Broadcom server adapters

This is the patched kernel as mentioned above. The other machine uses the vanilla kernel:

$ dmesg
Copyright (c) 1992-2016 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 11.0-RELEASE-p8 #0: Thu Mar  2 12:13:46 CET 2017
    root@goldengate01:/usr/obj/usr/src/sys/GENERIC amd64
FreeBSD clang version 3.8.0 (tags/RELEASE_380/final 262564) (based on LLVM 3.8.0)
VT(vga): resolution 640x480
CPU: Intel(R) Xeon(R) CPU E5-2430 v2 @ 2.50GHz (2500.05-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x306e4  Family=0x6  Model=0x3e  Stepping=4
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x7fbee3ff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
  AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
  AMD Features2=0x1<LAHF>
  Structured Extended Features=0x281<FSGSBASE,SMEP,ERMS>
  XSAVE Features=0x1<XSAVEOPT>
  VT-x: (disabled in BIOS) PAT,HLT,MTF,PAUSE,EPT,UG,VPID,VID,PostIntr
  TSC: P-state invariant, performance statistics
real memory  = 17179869184 (16384 MB)
avail memory = 16531873792 (15766 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <DELL   PE_SC3  >
FreeBSD/SMP: Multiprocessor System Detected: 12 CPUs
FreeBSD/SMP: 1 package(s) x 6 core(s) x 2 hardware threads
random: unblocking device.
ioapic1: Changing APIC ID to 1
ioapic0 <Version 2.0> irqs 0-23 on motherboard
ioapic1 <Version 2.0> irqs 32-55 on motherboard
random: entropy device external interface
kbd1 at kbdmux0
netmap: loaded module
module_register_init: MOD_LOAD (vesa, 0xffffffff8101d970, 0) error 19
random: registering fast source Intel Secure Key RNG
random: fast provider: "Intel Secure Key RNG"
vtvga0: <VT VGA driver> on motherboard
cryptosoft0: <software crypto> on motherboard
acpi0: <DELL PE_SC3> on motherboard
acpi0: Power Button (fixed)
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0
cpu2: <ACPI CPU> on acpi0
cpu3: <ACPI CPU> on acpi0
cpu4: <ACPI CPU> on acpi0
cpu5: <ACPI CPU> on acpi0
cpu6: <ACPI CPU> on acpi0
cpu7: <ACPI CPU> on acpi0
cpu8: <ACPI CPU> on acpi0
cpu9: <ACPI CPU> on acpi0
cpu10: <ACPI CPU> on acpi0
cpu11: <ACPI CPU> on acpi0
atrtc0: <AT realtime clock> port 0x70-0x7f irq 8 on acpi0
Event timer "RTC" frequency 32768 Hz quality 0
attimer0: <AT timer> port 0x40-0x5f irq 0 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0
Timecounter "HPET" frequency 14318180 Hz quality 950
Event timer "HPET" frequency 14318180 Hz quality 350
Event timer "HPET1" frequency 14318180 Hz quality 340
Event timer "HPET2" frequency 14318180 Hz quality 340
Event timer "HPET3" frequency 14318180 Hz quality 340
Event timer "HPET4" frequency 14318180 Hz quality 340
Event timer "HPET5" frequency 14318180 Hz quality 340
Event timer "HPET6" frequency 14318180 Hz quality 340
Event timer "HPET7" frequency 14318180 Hz quality 340
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff numa-domain 0 on acpi0
pci0: <ACPI PCI bus> numa-domain 0 on pcib0
pcib1: <ACPI PCI-PCI bridge> irq 53 at device 1.0 numa-domain 0 on pci0
pci1: <ACPI PCI bus> numa-domain 0 on pcib1
mfi0: <Drake Skinny> port 0xfc00-0xfcff mem 0xd8ffc000-0xd8ffffff,0xd8f80000-0xd8fbffff irq 34 at device 0.0 numa-domain 0 on pci1
mfi0: Using MSI
mfi0: Megaraid SAS driver Ver 4.23 
pcib2: <ACPI PCI-PCI bridge> irq 53 at device 3.0 numa-domain 0 on pci0
pci2: <ACPI PCI bus> numa-domain 0 on pcib2
ix0: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 3.1.13-k> port 0xecc0-0xecdf mem 0xd9e80000-0xd9efffff,0xd9ff8000-0xd9ffbfff irq 48 at device 0.0 numa-domain 0 on pci2
ix0: Using MSIX interrupts with 9 vectors
ix0: Ethernet address: xx:xx:xx:xx:xx:xx
ix0: PCI Express Bus: Speed 5.0GT/s Width x8
ix0: netmap queues/slots: TX 8/2048, RX 8/2048
ix1: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 3.1.13-k> port 0xece0-0xecff mem 0xd9f00000-0xd9f7ffff,0xd9ffc000-0xd9ffffff irq 52 at device 0.1 numa-domain 0 on pci2
ix1: Using MSIX interrupts with 9 vectors
ix1: Ethernet address: xx:xx:xx:xx:xx:xx
ix1: PCI Express Bus: Speed 5.0GT/s Width x8
ix1: netmap queues/slots: TX 8/2048, RX 8/2048
pcib3: <PCI-PCI bridge> irq 16 at device 17.0 numa-domain 0 on pci0
pci3: <PCI bus> numa-domain 0 on pcib3
pci0: <simple comms> at device 22.0 (no driver attached)
pci0: <simple comms> at device 22.1 (no driver attached)
ehci0: <Intel Patsburg USB 2.0 controller> mem 0xdd0fd000-0xdd0fd3ff irq 23 at device 26.0 numa-domain 0 on pci0
usbus0: EHCI version 1.0
usbus0 numa-domain 0 on ehci0
pcib4: <ACPI PCI-PCI bridge> at device 28.0 numa-domain 0 on pci0
pci4: <ACPI PCI bus> numa-domain 0 on pcib4
igb0: <Intel(R) PRO/1000 Network Connection, Version - 2.5.3-k> mem 0xdad00000-0xdadfffff,0xdaff8000-0xdaffbfff irq 16 at device 0.0 numa-domain 0 on pci4
igb0: Using MSIX interrupts with 9 vectors
igb0: Ethernet address: xx:xx:xx:xx:xx:xx
igb0: Bound queue 0 to cpu 0
igb0: Bound queue 1 to cpu 1
igb0: Bound queue 2 to cpu 2
igb0: Bound queue 3 to cpu 3
igb0: Bound queue 4 to cpu 4
igb0: Bound queue 5 to cpu 5
igb0: Bound queue 6 to cpu 6
igb0: Bound queue 7 to cpu 7
igb0: netmap queues/slots: TX 8/1024, RX 8/1024
igb1: <Intel(R) PRO/1000 Network Connection, Version - 2.5.3-k> mem 0xdae00000-0xdaefffff,0xdaffc000-0xdaffffff irq 17 at device 0.1 numa-domain 0 on pci4
igb1: Using MSIX interrupts with 9 vectors
igb1: Ethernet address: xx:xx:xx:xx:xx:xx
igb1: Bound queue 0 to cpu 8
igb1: Bound queue 1 to cpu 9
igb1: Bound queue 2 to cpu 10
igb1: Bound queue 3 to cpu 11
igb1: Bound queue 4 to cpu 0
igb1: Bound queue 5 to cpu 1
igb1: Bound queue 6 to cpu 2
igb1: Bound queue 7 to cpu 3
igb1: netmap queues/slots: TX 8/1024, RX 8/1024
pcib5: <ACPI PCI-PCI bridge> irq 16 at device 28.4 numa-domain 0 on pci0
pci5: <ACPI PCI bus> numa-domain 0 on pcib5
bge0: <Broadcom NetXtreme Gigabit Ethernet, ASIC rev. 0x5720000> mem 0xd50a0000-0xd50affff,0xd50b0000-0xd50bffff,0xd50c0000-0xd50cffff irq 16 at device 0.0 numa-domain 0 on pci5
bge0: APE FW version: NCSI v1.2.37.0
bge0: CHIP ID 0x05720000; ASIC REV 0x5720; CHIP REV 0x57200; PCI-E
miibus0: <MII bus> numa-domain 0 on bge0
brgphy0: <BCM5720C 1000BASE-T media interface> PHY 1 on miibus0
brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
bge0: Using defaults for TSO: 65518/35/2048
bge0: Ethernet address: xx:xx:xx:xx:xx:xx
bge1: <Broadcom NetXtreme Gigabit Ethernet, ASIC rev. 0x5720000> mem 0xd50d0000-0xd50dffff,0xd50e0000-0xd50effff,0xd50f0000-0xd50fffff irq 17 at device 0.1 numa-domain 0 on pci5
bge1: APE FW version: NCSI v1.2.37.0
bge1: CHIP ID 0x05720000; ASIC REV 0x5720; CHIP REV 0x57200; PCI-E
miibus1: <MII bus> numa-domain 0 on bge1
brgphy1: <BCM5720C 1000BASE-T media interface> PHY 2 on miibus1
brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
bge1: Using defaults for TSO: 65518/35/2048
bge1: Ethernet address: xx:xx:xx:xx:xx:xx
pcib6: <ACPI PCI-PCI bridge> irq 19 at device 28.7 numa-domain 0 on pci0
pci6: <ACPI PCI bus> numa-domain 0 on pcib6
pcib7: <PCI-PCI bridge> at device 0.0 numa-domain 0 on pci6
pci7: <PCI bus> numa-domain 0 on pcib7
pcib8: <PCI-PCI bridge> at device 0.0 numa-domain 0 on pci7
pci8: <PCI bus> numa-domain 0 on pcib8
pcib9: <PCI-PCI bridge> at device 0.0 numa-domain 0 on pci8
pci9: <PCI bus> numa-domain 0 on pcib9
vgapci0: <VGA-compatible display> mem 0xd4000000-0xd4ffffff,0xdc7fc000-0xdc7fffff,0xdb800000-0xdbffffff irq 19 at device 0.0 numa-domain 0 on pci9
vgapci0: Boot video device
pcib10: <PCI-PCI bridge> at device 1.0 numa-domain 0 on pci7
pci10: <PCI bus> numa-domain 0 on pcib10
ehci1: <Intel Patsburg USB 2.0 controller> mem 0xdd0fe000-0xdd0fe3ff irq 22 at device 29.0 numa-domain 0 on pci0
usbus1: EHCI version 1.0
usbus1 numa-domain 0 on ehci1
pcib11: <PCI-PCI bridge> at device 30.0 numa-domain 0 on pci0
pci11: <PCI bus> numa-domain 0 on pcib11
isab0: <PCI-ISA bridge> at device 31.0 numa-domain 0 on pci0
isa0: <ISA bus> numa-domain 0 on isab0
ahci0: <Intel Patsburg AHCI SATA controller> port 0xdce8-0xdcef,0xdcf8-0xdcfb,0xdcf0-0xdcf7,0xdcfc-0xdcff,0xdcc0-0xdcdf mem 0xdd0ff000-0xdd0ff7ff irq 20 at device 31.2 numa-domain 0 on pci0
ahci0: AHCI v1.30 with 6 3Gbps ports, Port Multiplier not supported
ahcich0: <AHCI channel> at channel 0 on ahci0
ahcich1: <AHCI channel> at channel 1 on ahci0
ahcich2: <AHCI channel> at channel 2 on ahci0
ahcich3: <AHCI channel> at channel 3 on ahci0
ahcich4: <AHCI channel> at channel 4 on ahci0
ahciem0: <AHCI enclosure management bridge> on ahci0
pcib12: <ACPI Host-PCI bridge> numa-domain 0 on acpi0
pci12: <ACPI PCI bus> numa-domain 0 on pcib12
pcib13: <ACPI Host-PCI bridge> on acpi0
pci13: <ACPI PCI bus> on pcib13
uart1: <16550 or compatible> port 0x2f8-0x2ff irq 3 on acpi0
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
orm0: <ISA Option ROMs> at iomem 0xc0000-0xc7fff,0xec000-0xeffff on isa0
ppc0: cannot reserve I/O port range
est0: <Enhanced SpeedStep Frequency Control> on cpu0
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 1f6200001c00
device_attach: est0 attach returned 6
est1: <Enhanced SpeedStep Frequency Control> on cpu1
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 1f6200001c00
device_attach: est1 attach returned 6
est2: <Enhanced SpeedStep Frequency Control> on cpu2
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 1f6200001c00
device_attach: est2 attach returned 6
est3: <Enhanced SpeedStep Frequency Control> on cpu3
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 1f6200001c00
device_attach: est3 attach returned 6
est4: <Enhanced SpeedStep Frequency Control> on cpu4
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 1f6200001c00
device_attach: est4 attach returned 6
est5: <Enhanced SpeedStep Frequency Control> on cpu5
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 1f6200001c00
device_attach: est5 attach returned 6
est6: <Enhanced SpeedStep Frequency Control> on cpu6
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 1f6200001c00
device_attach: est6 attach returned 6
est7: <Enhanced SpeedStep Frequency Control> on cpu7
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 1f6200001c00
device_attach: est7 attach returned 6
est8: <Enhanced SpeedStep Frequency Control> on cpu8
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 1f3900001c00
device_attach: est8 attach returned 6
est9: <Enhanced SpeedStep Frequency Control> on cpu9
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 1f6200001c00
device_attach: est9 attach returned 6
est10: <Enhanced SpeedStep Frequency Control> on cpu10
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 1f6200001c00
device_attach: est10 attach returned 6
est11: <Enhanced SpeedStep Frequency Control> on cpu11
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 1f6200001c00
device_attach: est11 attach returned 6
mfi0: 1288 (541774932s/0x0020/info) - Shutdown command received from host
mfi0: 1289 (boot + 3s/0x0020/info) - Firmware initialization started (PCI ID 0073/1000/1f51/1028)
mfi0: 1290 (boot + 3s/0x0020/info) - Firmware version 2.121.14-3416
mfi0: 1291 (boot + 5s/0x0020/info) - Package version 20.13.0-0007
mfi0: 1292 (boot + 5s/0x0020/info) - Board Revision A06
mfi0: 1293 (boot + 31s/0x0004/info) - Enclosure PD 20(c None/p1) communication restored
mfi0: 1294 (boot + 31s/0x0002/info) - Inserted: Encl PD 20
ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
Timecounters tick every 1.000 msec
nvme cam probe device init
mfisyspd0 numa-domain 0 on mfi0
mfisyspd0: 286102MB (585937500 sectors) SYSPD volume (deviceid: 0)
mfisyspd0:  SYSPD volume attached
mfisyspd1 numa-domain 0 on mfi0
mfisyspd1: 286102MB (585937500 sectors) SYSPD volume (deviceid: 1)
mfisyspd1:  SYSPD volume attached
mfi0: 1295 (boot + 31s/0x0002/info) - Inserted: PD 20(c None/p1) Info: enclPd=20, scsiType=d, portMap=00, sasAddr=5d81f060f497d900,0000000000000000
mfi0: 1296 (boot + 31s/0x0002/info) - Inserted: PD 00(e0x20/s0)
mfi0: 1297 (boot + 31s/0x0002/info) - Inserted: PD 00(e0x20/s0) Info: enclPd=20, scsiType=0, portMap=00, sasAddr=500003957801561a,0000000000000000
mfi0: 1298 (boot + 31s/0x0002/info) - Inserted: PD 01(e0x20/s1)
mfi0: 1299 (boot + 31s/0x0002/info) - Inserted: PD 01(e0x20/s1) Info: enclPd=20, scsiType=0, portMap=01, sasAddr=50000395780155f6,0000000000000000
mfi0: 1300 (541774984s/0x0020/info) - Time established as 03/02/17 13:03:04; (43 seconds since power on)
mfi0: 1301 (541775071s/0x0020/info) - Host driver is loaded and operational
usbus0: 480Mbps High Speed USB v2.0
usbus1: 480Mbps High Speed USB v2.0
ugen0.1: <Intel> at usbus0
uhub0: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus0
ugen1.1: <Intel> at usbus1
uhub1: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus1
ses0 at ahciem0 bus 0 scbus5 target 0 lun 0
ses0: <AHCI SGPIO Enclosure 1.00 0001> SEMB S-E-S 2.00 device
ses0: SEMB SES Device
SMP: AP CPU #1 Launched!
SMP: AP CPU #10 Launched!
SMP: AP CPU #8 Launched!
SMP: AP CPU #11 Launched!
SMP: AP CPU #2 Launched!
SMP: AP CPU #9 Launched!
SMP: AP CPU #7 Launched!
SMP: AP CPU #3 Launched!
SMP: AP CPU #6 Launched!
SMP: AP CPU #4 Launched!
SMP: AP CPU #5 Launched!
cd0 at ahcich4 bus 0 scbus4 target 0 lun 0
cd0: <HL-DT-ST DVD-ROM DU90N D300> Removable CD-ROM SCSI device
cd0: Serial Number KMJE3BL2145
cd0: 150.000MB/s transfers (SATA 1.x, UDMA5, ATAPI 12bytes, PIO 8192bytes)
cd0: Attempt to query device size failed: NOT READY, Medium not present - tray closed
Timecounter "TSC-low" frequency 1250026658 Hz quality 1000
Trying to mount root from zfs:zroot/ROOT/default []...
Root mount waiting for: usbus1 usbus0
uhub0: 2 ports with 2 removable, self powered
uhub1: 2 ports with 2 removable, self powered
Root mount waiting for: usbus1 usbus0
ugen0.2: <vendor 0x8087> at usbus0
uhub2: <vendor 0x8087 product 0x0024, class 9/0, rev 2.00/0.00, addr 2> on usbus0
ugen1.2: <vendor 0x8087> at usbus1
uhub3: <vendor 0x8087 product 0x0024, class 9/0, rev 2.00/0.00, addr 2> on usbus1
Root mount waiting for: usbus1 usbus0
uhub2: 6 ports with 6 removable, self powered
uhub3: 8 ports with 8 removable, self powered
ugen0.3: <no manufacturer> at usbus0
uhub4: <no manufacturer Gadget USB HUB, class 9/0, rev 2.00/0.00, addr 3> on usbus0
Root mount waiting for: usbus0
uhub4: 6 ports with 6 removable, self powered
ugen0.4: <Avocent> at usbus0
ukbd0: <Keyboard> on usbus0
kbd2 at ukbd0
bridge0: Ethernet address: xx:xx:xx:xx:xx:xx
ix0: link state changed to UP
ix0: link state changed to DOWN
ix0: link state changed to UP
ix1: link state changed to UP
bge0: link state changed to DOWN
ix0: promiscuous mode enabled
bridge0: link state changed to UP
ix1: promiscuous mode enabled
ix0: link state changed to DOWN
ix1: link state changed to DOWN
ix0: link state changed to UP
ix1: link state changed to UP
ums0: <Mouse> on usbus0
ums0: 3 buttons and [Z] coordinates ID=0
ums1: <Mouse REL> on usbus0
ums1: 3 buttons and [XYZ] coordinates ID=0
pflog0: promiscuous mode enabled
bge0: link state changed to UP
bridge0: promiscuous mode enabled
bridge0: promiscuous mode disabled
bridge0: promiscuous mode enabled
bridge0: promiscuous mode disabled
bridge0: promiscuous mode enabled
bridge0: promiscuous mode disabled
bridge0: promiscuous mode enabled
bridge0: promiscuous mode disabled
bridge0: promiscuous mode enabled
bridge0: promiscuous mode disabled
mfi0: 1302 (541911600s/0x0020/WARN) - Patrol Read can't be started, as PDs are either not ONLINE, or are in a VD with an active process, or are in an excluded VD
bridge0: promiscuous mode enabled
bridge0: promiscuous mode disabled
bridge0: promiscuous mode enabled
bridge0: promiscuous mode disabled
ix1: link state changed to DOWN
ix1: link state changed to UP
ix1: link state changed to DOWN
ix1: link state changed to UP
ix0: link state changed to DOWN
ix0: link state changed to UP
ix0: link state changed to DOWN
ix0: link state changed to UP
[zone: pf frag entries] PF frag entries limit reached
mfi0: 1303 (542516400s/0x0020/WARN) - Patrol Read can't be started, as PDs are either not ONLINE, or are in a VD with an active process, or are in an excluded VD
ix0: link state changed to DOWN
ix0: link state changed to UP
ix1: link state changed to DOWN
ix1: link state changed to UP
Comment 7 Andrey V. Elsukov freebsd_committer freebsd_triage 2017-03-15 10:21:51 UTC
We have also faced a similar problem. We do not use if_bridge; in our case the cause turned out to be overheating of the card. After adding additional coolers to the server, the problem went away.
Comment 8 Aiko Barz 2017-03-15 11:58:36 UTC
(In reply to Andrey V. Elsukov from comment #7)

We did not have these issues with 10.1, but I will keep your advice in mind.

I am currently trying to upgrade all the firmware packages. Unfortunately, the Dell Lifecycle Controller is not able to upgrade the Intel X520 firmware from 15.0.28 to 17.5.10. This seems to be a known issue. I will investigate and report back later.

So long,
Aiko
Comment 9 Aiko Barz 2017-03-15 15:50:32 UTC
(In reply to Aiko Barz from comment #8)

The Lifecycle Controller method does not work at the moment, but I installed firmware version 16.5.20 from 05/12/2016, which was offered as a Red Hat shell script package.

Off-topic: I booted from a CentOS ISO via a virtually mounted CD drive through the iDRAC web console, hundreds of kilometers away from the physical machine. And it worked. Always weird and amazing.

So let's see, what happens next…

So long,
Aiko
Comment 10 Aiko Barz 2017-03-16 10:29:23 UTC
One (possibly stupid) question:

ifconfig does not list LRO; that means it is not enabled on those cards, right?
sysctl lists some LRO values like lro_flushed and lro_queued, but they are all zero.

I disabled TSO4 and TSO6 (which ifconfig did list) before opening my first report. This is how it looks now:

$ ifconfig ix0
ix0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
 options=e400bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
 ether ..:..:..:..:..:..
 nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
 media: Ethernet autoselect (10Gbase-SR <full-duplex>)
 status: active

I only ask because there were similar problems with these cards on Linux, where they stopped transmitting after a couple of days.
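
To rule LRO out completely I could also disable it explicitly; a sketch, assuming the stock -lro ifconfig flag and the rc.conf variables I already use:

$ ifconfig ix0 -lro
$ ifconfig ix1 -lro
# persistent, in /etc/rc.conf:
ifconfig_ix0="up -tso -lro"
ifconfig_ix1="up -tso -lro"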

Kind regards,
Aiko
Comment 11 Aiko Barz 2017-03-17 14:15:58 UTC
I installed the latest firmware (17.5.10) on those Intel X520 10G cards today.

Dell replied within 10 minutes(!) with a bootable ISO image, which was able to do those nasty firmware updates incrementally(!) and automatically. This is a known bug to them. I also had to do a cold reset before the newly installed firmware was detected.

The Dell technician was actually quite optimistic that my other problems might go away as well. I will tell you in a week. ;)

Have a nice weekend,
Aiko
Comment 12 Aiko Barz 2017-03-20 10:15:09 UTC
I am sorry…

We had another spontaneous failover[1] on Saturday. Since I have installed the latest firmware for every piece of hardware, what can I do now? I can't find anything in the logs. It just stops passing traffic…

So long,
Aiko

[1]: The failover is handled by a different system; the two bridges operate independently, without knowledge of each other.
Comment 13 Fabian Keil 2017-03-20 11:17:52 UTC
There's a uma-related regression in FreeBSD 11 that can result
in somewhat similar symptoms. For details see #209680.

You could check the vmstat -z output and try the patch:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=209680#c2
which reverts some of the problematic commits (only in the tcp code,
though).

Apparently some of the other uma-consumers are affected as well:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=209680#c10
and I'm wondering if if_bridge is one of them.

You could try if it makes a difference if you let if_bridge set the
UMA_ZONE_NOFREE flag and preallocate a couple of items.

From your report it isn't obvious to me that your problem is
uma-related so if you have other leads you may want to investigate
them first.
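
A quick way to keep an eye on the relevant UMA zones (a sketch; I'm assuming the bridge's routing table zone is still called bridge_rtnode):

$ vmstat -z | egrep 'ITEM|mbuf|bridge_rtnode'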
Comment 14 Aiko Barz 2017-03-27 08:28:33 UTC
Probably not what you would like to hear, but I gave the pufferfish (OpenBSD) a try.

It has been running for some days now without issues and handles the workload quite well. We cannot utilize the full 10Gbit/s at the moment anyway because of other bottlenecks.

So you may want to close this bug report; maybe it was just us, I don't know.

Nevertheless, I needed a solution right away. I'm sorry.

Kind regards,
Aiko
Comment 15 Kurt Jaeger freebsd_committer freebsd_triage 2017-08-30 10:12:38 UTC
See

https://lists.freebsd.org/pipermail/freebsd-stable/2017-August/087723.html

for another report which looks similar.
Comment 16 Fabian Keil 2017-08-30 10:57:36 UTC
Created attachment 185916 [details]
Set UMA_ZONE_NOFREE and preallocate zone item(s)

I've been using the attached workaround in ElectroBSD for a while
now and it seems to help (when combined with the other workarounds).
Comment 17 punkt.de Hosting Team 2017-09-11 08:11:09 UTC
Hi there,

we applied Fabian's patches for TCP and if_bridge (thanks!), but unfortunately the production server's jails came to a halt again last night, at what was, depending on your point of view, a fortunate time of day (2am). The customers were hardly affected, but the operator on duty might see that differently. ;-)

Uptime of the system was roughly 10 days. The colleague was too tired when the alarm woke him to think of trying things like `/etc/rc.d/netif restart` or similar, so he just rebooted the system.

Is there any way we could help to nail this and fix it?

Some suggestions on a suitable course of action at the next incident would be greatly appreciated.

Kind regards,
Patrick
Comment 18 Kristof Provost freebsd_committer freebsd_triage 2017-09-11 08:54:47 UTC
I'm afraid I don't have any ideas at this point. I'd like an ix driver expert to take a look at this so I've cced erj.
Comment 19 Eric Joyner freebsd_committer freebsd_triage 2017-09-11 18:41:07 UTC
(In reply to Kristof Provost from comment #18)

I don't really have any, either. :p

@Fabian, can you report your system configuration (network card, OS, boot dmesg, driver versions), too? I'm going to assume it doesn't exactly match Aiko's.
Comment 20 Fabian Keil 2017-09-14 10:06:14 UTC
Created attachment 186372 [details]
ElectroBSD r323013-16c8587cb1b5 dmesg on HP Proliant

TL;DR: I'm attaching a dmesg output as requested, but since it looks
like the problems are unrelated, the system differences are
probably irrelevant. I currently don't use ix devices.

As mentioned in comment #13 it's not obvious to me that
Aiko's issue is uma related and I mainly mentioned the
patches as the symptoms look somewhat similar and no
other suggestions were made at the time.

The fact that the uma patches didn't help on the punkt.de
system(s) seems to indicate that it's another issue as well
(but maybe the same as Aiko's).

On my systems the uma regression resulted in established
tcp connections getting terminated after a while. It could
take days for the problem to occur so I've mostly noticed
it with ssh connections. Terminated connections could be
reestablished without restarting any interfaces and the
problem didn't affect all connections at the same time
either.

I've created the if_bridge patches after noticing that
ssh connections on a bhyve host to the bhyve vms got
terminated a couple of times in a similar fashion.

At the time the bhyve host and vms were running ElectroBSD
based on FreeBSD stable/11 from March 2017 and already
contained the other tcp regression fixes.

The bhyve host is an HP DL120 G6 using bridge1 between
tap devices for VMs with access to the Internet and bridge0
between tap devices for VMs without access to the Internet.
pf is used to NAT on bge0 for bridge1.

After applying the if_bridge patches, the local ssh connections
to the bhyve VMs stayed up, so I haven't looked into this
any further. I also didn't verify that removing the if_bridge
patches reintroduces the problem, to confirm that there is
indeed causation (and that the problem wasn't simply fixed by
syncing with stable/11).
Comment 21 punkt.de Hosting Team 2017-09-14 10:13:58 UTC
Hi, all,

to recapitulate, our symptoms are

* no traffic reaching the jails and their epair(4) interfaces across the bridge anymore
* same for outbound traffic from the jails to the wire
* network stack inside the jails looks fine, a jail can reach its own addresses on the epair(4)
* traffic to and from the host is just fine, too
* network connections cannot be re-established
* ifconfig down/up does not help

We will put igb(4) based network adapters into our hosts on the weekend and rewire the network connections accordingly. Just a desperate try ...

Kind regards,
Patrick
Comment 22 Kristof Provost freebsd_committer freebsd_triage 2017-09-14 21:23:15 UTC
(In reply to punkt.de Hosting Team from comment #21)
That should tell us whether the problem is specific to the ix cards. If it's not, it's most likely a bridge issue.

It might be worth taking a look at what happens in the bridge code once it's in that state:

sudo dtrace -n fbt:if_bridge::

That looks like this, on my nearly idle bridge:
dtrace: description 'fbt:if_bridge::' matched 108 probes
CPU     ID                    FUNCTION:NAME
  6  63915            bridge_transmit:entry
  6  63814           bridge_broadcast:entry
  6  63824             bridge_enqueue:entry
  6  63825            bridge_enqueue:return
  6  63824             bridge_enqueue:entry
  6  63825            bridge_enqueue:return
  6  63815          bridge_broadcast:return
  6  63916           bridge_transmit:return
  3  63915            bridge_transmit:entry
...
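
If that scrolls too fast on a busy bridge, an aggregating variant that only counts calls per function is easier to read (standard DTrace aggregation; stop it with Ctrl-C to print the summary):

$ sudo dtrace -n 'fbt:if_bridge::entry { @calls[probefunc] = count(); }'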
Comment 23 punkt.de Hosting Team 2017-10-08 14:13:12 UTC
Hey guys,

it hit us again in production. Saturday night, 4am. Sorry - I missed that DTrace diagnose, but I can add two things at least:

We are using igb(4) interfaces in that machine, not ix(4), so it does not seem to be interface related. That doesn't come as a big surprise to me, because an interface-specific cause would have been a gross layer violation. Again: the host system is communicating just fine!

I did a `netstat -m` while the bridge was wedged, as suggested by gnn. It doesn't look suspicious at all to me, but then I'm not a kernel developer. Please see the attachment.

What I am going to do now, probably:

Set up an isolated test environment.
Fire up some dozen jails or so.
Generate artificial traffic through the bridge interface, a gigabit of sustained bandwidth ...
Wait for the bug to be triggered.
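
For the traffic generation I'm thinking of something simple like iperf3 (a sketch, assuming it's installed on both endpoints, one on each side of the bridge):

# on a host/jail on one side of the bridge
$ iperf3 -s
# on the other side: 8 parallel streams, sustained for 24 hours
$ iperf3 -c <server-address> -P 8 -t 86400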

I need to talk to my colleagues tomorrow, but this will probably be our course of action. I can offer ssh access to that isolated system to anyone who requests it!

Kind regards
Patrick
Comment 24 punkt.de Hosting Team 2017-10-08 14:13:59 UTC
Created attachment 187001 [details]
`netstat -m` output while the bridge is in wedged state
Comment 25 punkt.de Hosting Team 2017-10-20 13:52:13 UTC
Hey guys,

I am trying to reproduce the problem on a not-yet-production server. Un(?)fortunately, this system has been rock-stable for almost two weeks now.

I just noticed that this test system runs 11.1p1 while the problematic server runs 11.0p10 - duh!

So ... I'll attach a diff of if_bridge.c between the two releases. It looks like the changed areas cover locking, handling of fragments, and handling of mbufs for fragments.
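
For reference, the diff can also be reproduced without the attachment, assuming the release/11.0.1 and release/11.1.0 tags in the FreeBSD svn repository:

$ svn diff https://svn.freebsd.org/base/release/11.0.1/sys/net/if_bridge.c \
           https://svn.freebsd.org/base/release/11.1.0/sys/net/if_bridge.c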

Any qualified comment on what might have caused the problem in 11.0 that changed in 11.1?

Thanks!
Patrick
Comment 26 punkt.de Hosting Team 2017-10-20 13:53:05 UTC
Created attachment 187326 [details]
Diff of if_bridge.c for 11.0 and 11.1
Comment 27 Kristof Provost freebsd_committer freebsd_triage 2017-10-20 22:24:16 UTC
(In reply to punkt.de Hosting Team from comment #25)
I'd try reverting this: https://svnweb.freebsd.org/base?view=revision&revision=313050 and seeing if you can reproduce it then.
I do not understand how it'd trigger this, but the other major commit (https://svnweb.freebsd.org/base?view=revision&revision=r306289) should only matter if you've got a filtering bridge, and that's a fix for an mbuf leak. This doesn't look like an mbuf leak.

That said, considering that 11.0 is going out of support in five days I'd recommend just upgrading the box to 11.1. It'll have to be done soon anyway.
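
A rough sketch of the revert and rebuild, assuming an svn checkout in /usr/src of a branch that contains r313050 and a GENERIC kernel (adjust as needed):

$ cd /usr/src
$ svn merge -c -313050 .              # reverse-merge r313050 into the working copy
$ make -j8 buildkernel KERNCONF=GENERIC
$ make installkernel KERNCONF=GENERIC
$ shutdown -r now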
Comment 28 punkt.de Hosting Team 2023-09-27 07:33:06 UTC
I guess someone with the necessary superpowers should close this. Correct?

Kind regards,
Patrick
Comment 29 Kristof Provost freebsd_committer freebsd_triage 2023-09-27 07:37:19 UTC
Given that there's been no further reports and that all versions mentioned here are long since out of support, yeah, we should just close this.

The locking in the bridge code has also been completely changed, so even if something like this happens again it'd almost certainly be a different bug.