Bug 210901 - em0 stores packets somewhere and lets them out slowly under load.
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 10.3-RELEASE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-net mailing list
URL:
Keywords: IntelNetworking
Depends on:
Blocks:
 
Reported: 2016-07-07 19:17 UTC by dgilbert
Modified: 2019-04-22 16:14 UTC (History)
Description dgilbert 2016-07-07 19:17:07 UTC
I fully realize that this is going to be a bear to track down.  The machine in question is my home server ... so it runs a bit of everything.  The trigger for the behavior seems to be more than 1000 torrents running.  Stats on that later.

The kicker is: replace em0 (PCIe) with re0 (motherboard) and it goes away.

I have the em0 in the machine because I believe it's a better card.  Sigh.

SO WHAT HAPPENS?

When em0 is misbehaving, the "mild" symptoms are local LAN lag of 500 ms to 5000 ms.  (This is why I jokingly accuse em0 of storing the packets.)  Beyond about 5000 ms, the packets seem to be dropped.  This can be observed by pinging out from the console of the box or by pinging it from another box.

I often first notice the box is having trouble when the UPS monitor loses network connectivity with the UPS.

Salient details I can think of?  The answer to "netstat -an | grep tcp4 | grep -v LISTEN | wc" is 320.  1000 torrents configured doesn't mean that that many streams happen.  It is possible that the behavior is related to the number of torrent streams in progress... or the number of TCP streams with small transmit queues.  In among that is some mildly fast SMB service for various media devices around the house.
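
For anyone repeating the measurement above: plain "wc" prints line, word and byte counts, so the 320 is presumably the line count. A minimal sketch that prints only that number follows; the function name count_tcp4 is made up for illustration and is not part of any tool.

```shell
# Count IPv4 TCP sockets that are not in LISTEN state.
# Reads `netstat -an` style output on stdin and prints a single number.
count_tcp4() {
    # grep -vc prints the count of remaining lines that do NOT contain LISTEN
    grep tcp4 | grep -vc LISTEN
}
```

Usage on the affected host would be "netstat -an | count_tcp4".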

The em0 behavior is 100% related to the large number of torrents (running using rtorrent).  Stopping rtorrent makes the host good again, starting rtorrent hoses it.

So the machine is:

FreeBSD virtual.xxx.xxx 10.3-RELEASE-p5 FreeBSD 10.3-RELEASE-p5 #4 r301872: Mon Jun 13 14:35:24 EDT 2016     root@virtual.xxx.xxx:/usr/obj/usr/src/sys/GENERIC  amd64

CPU: AMD FX(tm)-9590 Eight-Core Processor            (4716.02-MHz K8-class CPU)
  Origin="AuthenticAMD"  Id=0x600f20  Family=0x15  Model=0x2  Stepping=0
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x3e98320b<SSE3,PCLMULQDQ,MON,SSSE3,FMA,CX16,SSE4.1,SSE4.2,POPCNT,AESNI,XSAVE,OSXSAVE,AVX,F16C>
  AMD Features=0x2e500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM>
  AMD Features2=0x1ebbfff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS,XOP,SKINIT,WDT,LWP,FMA4,TCE,NodeId,TBM,Topology,PCXC,PNXC>
  Structured Extended Features=0x8<BMI1>
  SVM: NP,NRIP,VClean,AFlush,DAssist,NAsids=65536
  TSC: P-state invariant, performance statistics
real memory  = 34359738368 (32768 MB)
avail memory = 33186353152 (31648 MB)

The ethernet cards are:

em0: <Intel(R) PRO/1000 Network Connection 7.6.1-k> port 0x8000-0x801f mem 0xfe340000-0xfe35ffff,0xfe320000-0xfe33ffff irq 20 at device 0.0 on pci9
em0: Using an MSI interrupt
em0: Ethernet address: 00:15:17:0d:04:a8

em0@pci0:9:0:0: class=0x020000 card=0x10838086 chip=0x10b98086 rev=0x06 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82572EI Gigabit Ethernet Controller (Copper)'

re0: <RealTek 8168/8111 B/C/CP/D/DP/E/F/G PCIe Gigabit Ethernet> port 0x7000-0x70ff mem 0xd2104000-0xd2104fff,0xd2100000-0xd2103fff irq 21 at device 0.0 on pci10
re0: Using 1 MSI-X message
re0: Chip rev. 0x48000000
re0: MAC rev. 0x00000000
miibus0: <MII bus> on re0
rgephy0: <RTL8169S/8110S/8211 1000BASE-T media interface> PHY 1 on miibus0
rgephy0:  none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX, 100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT-FDX, 1000baseT-FDX-master, 1000baseT-FDX-flow, 1000baseT-FDX-flow-master, auto, auto-flow
re0: Using defaults for TSO: 65518/35/2048
re0: Ethernet address: 10:c3:7b:9d:8b:6d

re0@pci0:10:0:0:        class=0x020000 card=0x85051043 chip=0x816810ec rev=0x09 hdr=0x00
    vendor     = 'Realtek Semiconductor Co., Ltd.'
    device     = 'RTL8111/8168B PCI Express Gigabit Ethernet controller'
Comment 1 Sean Bruno freebsd_committer 2016-07-12 03:33:18 UTC
A couple of things to try while we are doing an overhaul to em(4):

hw.em.rx_process_limit: -1
hw.em.txd: 4096
hw.em.rxd: 4096
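
A hedged sketch of how these three values could be applied, assuming they are boot-time tunables read from /boot/loader.conf (em(4) documents hw.em.rxd and hw.em.txd that way; treating hw.em.rx_process_limit the same is an assumption). The helper name em_loader_tunables is hypothetical.

```shell
# Print the suggested em(4) settings in loader.conf syntax.
# The values are exactly the ones suggested in this comment.
em_loader_tunables() {
    cat <<'EOF'
hw.em.rx_process_limit="-1"
hw.em.txd="4096"
hw.em.rxd="4096"
EOF
}
```

Run as root, "em_loader_tunables >> /boot/loader.conf" followed by a reboot should put them into effect; "sysctl hw.em" can then be used to confirm the values.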
Comment 2 Kevin Bowling freebsd_committer 2017-01-10 11:30:06 UTC
Can you retry with 12-CURRENT?
Comment 3 dgilbert 2017-01-10 22:22:46 UTC
This was finally shown to be due to jumbo allocation for >4k (greater-than-page-size) mbufs failing.  As such, the problem would still occur on any version where those allocations still fail.
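
One possible way to check for this condition, sketched under the assumption that "netstat -m" prints a line of the form "0/0/0 requests for jumbo clusters denied (4k/9k/16k)", as it does on FreeBSD 10.x. The function name jumbo_denied is invented for this example.

```shell
# Extract the 4k/9k/16k jumbo-cluster denial counters.
# Reads `netstat -m` output on stdin and prints e.g. "0/12/3".
jumbo_denied() {
    awk '/requests for jumbo clusters denied/ { print $1 }'
}
```

Usage would be "netstat -m | jumbo_denied". Non-zero second or third fields that keep growing while the lag is present would be consistent with the allocation failures described here.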

That said, I don't have a means to test on 12-CURRENT.