Bug 265714 - igc(4) drops link under high traffic
Summary: igc(4) drops link under high traffic
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 13.1-STABLE
Hardware: amd64 Any
Importance: --- Affects Many People
Assignee: freebsd-net (Nobody)
URL:
Keywords: IntelNetworking
Depends on:
Blocks:
 
Reported: 2022-08-08 12:40 UTC by mike
Modified: 2023-08-17 19:41 UTC

Attachments
diff of output of sysctl dev.igc before and after link drop (11.56 KB, text/plain)
2022-08-08 12:40 UTC, mike
Patch to display NVM information similar to what's already done for igb (10.05 KB, patch)
2023-08-16 05:29 UTC, john

Description mike 2022-08-08 12:40:20 UTC
Created attachment 235773
diff of output of sysctl dev.igc before and after link drop

Using a combination of iperf3 runs between two back-to-back RELENG_13 machines, the link drops at both 1G and 2.5G link speeds. Disabling flow control (FC) does seem to reduce how often the link drops, but it still happens.
Running a simple shell script to do

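#!/bin/sh
# Loop iperf3 tests against TARGET and snapshot the igc1 sysctl tree after each run.
TARGET=10.1.255.168
I=/usr/local/bin/iperf3
REPDIR=/var/tmp/reports

# make sure the snapshot directory exists before the first redirect
mkdir -p "$REPDIR"

while true
do
	d=$(date "+%s")
	# single stream, this host sending
	$I -t 60 -c $TARGET
	sleep 2
	/sbin/sysctl -a dev.igc.1 > "$REPDIR/${d}a"
	# four parallel streams, this host sending
	$I -P4 -t 60 -c $TARGET
	/sbin/sysctl -a dev.igc.1 > "$REPDIR/${d}b"
	sleep 2
	# single stream, reverse mode (this host receiving)
	$I -R -t 60 -c $TARGET
	/sbin/sysctl -a dev.igc.1 > "$REPDIR/${d}c"
	sleep 2
	# four parallel streams, reverse mode
	$I -R -P4 -t 60 -c $TARGET
	/sbin/sysctl -a dev.igc.1 > "$REPDIR/${d}d"
done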

will result in random drops of the link

igc1: link state changed to DOWN
igc1: link state changed to UP
igc1: link state changed to DOWN
igc1: link state changed to UP
igc1: link state changed to DOWN
igc1: link state changed to UP
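
To put a number on how often this happens, the DOWN events can be counted from the kernel log. What follows is only a rough sketch: the function name count_link_drops is made up for illustration, and it assumes the log lines look exactly like the ones above.

```shell
#!/bin/sh
# Count "link state changed to DOWN" events for one interface.
# Kernel log text is read from stdin; the interface name is $1.
# count_link_drops is a hypothetical helper, not part of any FreeBSD tool.
count_link_drops() {
    ifname=$1
    # grep -c prints the number of matching lines and exits non-zero
    # when that number is 0, so "|| true" keeps the zero case harmless.
    grep -c "^${ifname}: link state changed to DOWN" || true
}
```

Something like "dmesg | count_link_drops igc1" run before and after a test pass should then show how many drops that pass caused.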


It's not clear whether it's the receiver or the sender that's dropping the link, as the two machines are connected by a crossover cable. Discussion at 
https://lists.freebsd.org/archives/freebsd-stable/2022-August/000835.html
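
The sysctl snapshots that the test loop writes to $REPDIR can be compared pairwise to see which values moved around a link drop, which is how the attached diff was produced. A minimal sketch of such a comparison follows; the function name snapshot_changes is an illustrative assumption, and it simply wraps diff(1) and sed(1).

```shell
#!/bin/sh
# Print only the sysctl lines that differ between two snapshot files.
# Lines from the first file are prefixed "-", lines from the second "+".
# snapshot_changes is a hypothetical helper name used for illustration.
snapshot_changes() {
    before=$1
    after=$2
    # diff exits 1 when the files differ, which is the expected case here,
    # so its exit status is deliberately ignored.
    { diff "$before" "$after" || true; } |
        sed -n -e 's/^< /-/p' -e 's/^> /+/p'
}
```

For example, "snapshot_changes $REPDIR/${d}a $REPDIR/${d}b" would list what changed between the single-stream and the four-stream send runs of one iteration.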
Comment 1 mike 2022-09-14 15:08:15 UTC
Note: running the same test on a different pair of boxes that use the i226 version of the PHY works just fine. After about 18 hours of testing, I was not able to get the transmitting NIC to drop carrier under load.

igc0@pci0:2:0:0:        class=0x020000 rev=0x04 hdr=0x00 vendor=0x8086 device=0x125c subvendor=0x8086 subdevice=0x0000
    vendor     = 'Intel Corporation'
    class      = network
    subclass   = ethernet
    bar   [10] = type Memory, range 32, base rx80200000, size 1048576, enabled
    bar   [1c] = type Memory, range 32, base rx80300000, size 16384, enabled
    cap 01[40] = powerspec 3  supports D0 D3  current D0
    cap 05[50] = MSI supports 1 message, 64 bit, vector masks 
    cap 11[70] = MSI-X supports 5 messages, enabled
                 Table in map 0x1c[0x0], PBA in map 0x1c[0x2000]
    cap 10[a0] = PCI-Express 2 endpoint max data 256(512) FLR RO NS
                 max read 512
                 link x1(x1) speed 5.0(5.0) ASPM disabled(L1)
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 0 corrected
    ecap 0003[140] = Serial 1 7c2be1ffff131e81
    ecap 0018[1c0] = LTR 1
    ecap 001f[1f0] = Precision Time Measurement 1
    ecap 001e[1e0] = L1 PM Substates 1


---<<BOOT>>---
Copyright (c) 1992-2021 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 13.1-STABLE #0 stable/13-d4d8ce30d: Tue Sep 13 12:20:19 EDT 2022
    mdtancsa@topton2.sentex.ca:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64
FreeBSD clang version 14.0.5 (https://github.com/llvm/llvm-project.git llvmorg-14.0.5-0-gc12386ae247c)
VT(vga): resolution 640x480
CPU: Intel(R) Celeron(R) N5105 @ 2.00GHz (1996.80-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x906c0  Family=0x6  Model=0x9c  Stepping=0
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x4ff8ebbf<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,SDBG,CX16,xTPR,PDCM,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,RDRAND>
  AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
  AMD Features2=0x101<LAHF,Prefetch>
  Structured Extended Features=0x2394a2c3<FSGSBASE,TSCADJ,FDPEXC,SMEP,ERMS,NFPUSG,PQE,RDSEED,SMAP,CLFLUSHOPT,CLWB,PROCTRACE,SHA>
  Structured Extended Features2=0x18400124<UMIP,WAITPKG,GFNI,RDPID,MOVDIRI,MOVDIR64B>
  Structured Extended Features3=0xfc000400<MD_CLEAR,IBPB,STIBP,L1DFL,ARCH_CAP,CORE_CAP,SSBD>
  XSAVE Features=0xf<XSAVEOPT,XSAVEC,XINUSE,XSAVES>
  IA32_ARCH_CAPS=0x6b<RDCL_NO,IBRS_ALL,SKIP_L1DFL_VME,MDS_NO>
  VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID,VID,PostIntr
  TSC: P-state invariant, performance statistics
real memory  = 8589934592 (8192 MB)
avail memory = 8019083264 (7647 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <ALASKA A M I >
WARNING: L1 data cache covers fewer APIC IDs than a core (0 < 1)
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 1 package(s) x 4 core(s)
random: registering fast source Intel Secure Key RNG
random: fast provider: "Intel Secure Key RNG"
random: unblocking device.
ioapic0 <Version 2.0> irqs 0-119
Launching APs: 2 3 1
random: entropy device external interface
kbd0 at kbdmux0
efirtc0: <EFI Realtime Clock>
efirtc0: registered as a time-of-day clock, resolution 1.000000s
smbios0: <System Management BIOS> at iomem 0x74c7b000-0x74c7b01e
smbios0: Version: 3.3, BCD Revision: 3.3
aesni0: <AES-CBC,AES-CCM,AES-GCM,AES-ICM,AES-XTS,SHA1,SHA256>
acpi0: <ALASKA A M I >
acpi0: Power Button (fixed)
cpu0: <ACPI CPU> on acpi0
hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0
Timecounter "HPET" frequency 19200000 Hz quality 950
Event timer "HPET" frequency 19200000 Hz quality 550
Event timer "HPET1" frequency 19200000 Hz quality 440
Event timer "HPET2" frequency 19200000 Hz quality 440
Event timer "HPET3" frequency 19200000 Hz quality 440
Event timer "HPET4" frequency 19200000 Hz quality 440
attimer0: <AT timer> port 0x40-0x43,0x50-0x53 irq 0 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1808-0x180b on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
vgapci0: <VGA-compatible display> port 0x3000-0x303f mem 0x6000000000-0x6000ffffff,0x4000000000-0x400fffffff at device 2.0 on pci0
vgapci0: Boot video device
xhci0: <XHCI (generic) USB 3.0 controller> mem 0x6001100000-0x600110ffff at device 20.0 on pci0
xhci0: 32 bytes context size, 64-bit DMA
usbus0 on xhci0
usbus0: 5.0Gbps Super Speed USB v3.0
pci0: <memory, RAM> at device 20.2 (no driver attached)
sdhci_pci0: <Generic SD HCI> mem 0x600111a000-0x600111afff at device 20.5 on pci0
sdhci_pci0: 1 slot(s) allocated
pci0: <serial bus> at device 21.0 (no driver attached)
pci0: <serial bus> at device 21.2 (no driver attached)
pci0: <simple comms> at device 22.0 (no driver attached)
ahci0: <AHCI SATA controller> port 0x3090-0x3097,0x3080-0x3083,0x3060-0x307f mem 0x80500000-0x80501fff,0x80503000-0x805030ff,0x80502000-0x805027ff at device 23.0 on pci0
ahci0: AHCI v1.31 with 1 6Gbps ports, Port Multiplier not supported
ahcich1: <AHCI channel> at channel 1 on ahci0
pcib1: <ACPI PCI-PCI bridge> at device 28.0 on pci0
pci1: <ACPI PCI bus> on pcib1
nvme0: <Generic NVMe Device> mem 0x80400000-0x80403fff at device 0.0 on pci1
pcib2: <ACPI PCI-PCI bridge> at device 28.4 on pci0
pci2: <ACPI PCI bus> on pcib2
igc0: <Intel(R) Ethernet Controller I226-V> mem 0x80200000-0x802fffff,0x80300000-0x80303fff at device 0.0 on pci2
igc0: Using 1024 TX descriptors and 1024 RX descriptors
igc0: Using 4 RX queues 4 TX queues
igc0: Using MSI-X interrupts with 5 vectors
igc0: Ethernet address: 7c:2b:e1:13:1e:81
igc0: netmap queues/slots: TX 4/1024, RX 4/1024
pcib3: <ACPI PCI-PCI bridge> at device 28.5 on pci0
pci3: <ACPI PCI bus> on pcib3
igc1: <Intel(R) Ethernet Controller I226-V> mem 0x80000000-0x800fffff,0x80100000-0x80103fff at device 0.0 on pci3
igc1: Using 1024 TX descriptors and 1024 RX descriptors
igc1: Using 4 RX queues 4 TX queues
igc1: Using MSI-X interrupts with 5 vectors
igc1: Ethernet address: 7c:2b:e1:13:1e:82
igc1: netmap queues/slots: TX 4/1024, RX 4/1024
pcib4: <ACPI PCI-PCI bridge> at device 28.6 on pci0
pci4: <ACPI PCI bus> on pcib4
igc2: <Intel(R) Ethernet Controller I226-V> mem 0x7fe00000-0x7fefffff,0x7ff00000-0x7ff03fff at device 0.0 on pci4
igc2: Using 1024 TX descriptors and 1024 RX descriptors
igc2: Using 4 RX queues 4 TX queues
igc2: Using MSI-X interrupts with 5 vectors
igc2: Ethernet address: 7c:2b:e1:13:1e:83
igc2: netmap queues/slots: TX 4/1024, RX 4/1024
pcib5: <ACPI PCI-PCI bridge> at device 28.7 on pci0
pci5: <ACPI PCI bus> on pcib5
igc3: <Intel(R) Ethernet Controller I226-V> mem 0x7fc00000-0x7fcfffff,0x7fd00000-0x7fd03fff at device 0.0 on pci5
igc3: Using 1024 TX descriptors and 1024 RX descriptors
igc3: Using 4 RX queues 4 TX queues
igc3: Using MSI-X interrupts with 5 vectors
igc3: Ethernet address: 7c:2b:e1:13:1e:84
igc3: netmap queues/slots: TX 4/1024, RX 4/1024
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
hdac0: <Intel Jasper Lake HDA Controller> mem 0x6001110000-0x6001113fff,0x6001000000-0x60010fffff at device 31.3 on pci0
pci0: <serial bus> at device 31.5 (no driver attached)

dev.igc.0.%parent: pci2
dev.igc.0.%pnpinfo: vendor=0x8086 device=0x125c subvendor=0x8086 subdevice=0x0000 class=0x020000
dev.igc.0.%location: slot=0 function=0 dbsf=pci0:2:0:0 handle=\_SB_.PC00.RP05.PXSX
dev.igc.0.%driver: igc
dev.igc.0.%desc: Intel(R) Ethernet Controller I226-V
dev.igc.%parent:
Comment 2 john 2023-08-15 03:57:10 UTC
I recently upgraded a FreeBSD 13.2 stable system to newer hardware, which included an i225 v3 NIC (it's an IOCrest M.2 card).  The system is connected to a 1Gbps switch. Every now and then the system loses link; the most recent occasion was while running an installworld over NFS.  Rebooting gets things working again.
It's noted by the Intel response at:

  https://community.intel.com/t5/Ethernet-Products/Intel-Ethernet-Controller-3-I225-V-Connection-Drop/td-p/1482427

that the i225 is subject to a connection drop issue that's fixed by NVM 1.93 along with driver changes.

Some of the other Intel NIC drivers for FreeBSD currently display EEPROM / NVM information, e.g.:

  igb0: EEPROM V1.115-0 eTrack 0x87850000

It doesn't appear that the igc driver provides that information.  It also might be useful for the driver to display a warning if the i225 NIC is using an NVM prior to 1.93.
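
As a userland illustration of the version check suggested here (this is not the driver change itself), a small sh sketch could compare a reported NVM version against 1.93. It assumes the version is a plain "major.minor" string with decimal fields and no "-N" suffix; the function name nvm_older_than is hypothetical.

```shell
#!/bin/sh
# Succeed (exit 0) if NVM version $1 is older than version $2.
# Assumption: both are "major.minor" strings with decimal fields,
# such as "1.79"; a "-N" suffix is not handled.
# nvm_older_than is a hypothetical helper name.
nvm_older_than() {
    have_major=${1%%.*}; have_minor=${1#*.}
    want_major=${2%%.*}; want_minor=${2#*.}
    if [ "$have_major" -ne "$want_major" ]; then
        # majors differ, so they alone decide the result
        [ "$have_major" -lt "$want_major" ]
        return
    fi
    # same major: compare minors numerically, so 1.115 is newer than 1.93
    [ "$have_minor" -lt "$want_minor" ]
}
```

With that, something like "nvm_older_than 1.79 1.93 && echo 'NVM older than 1.93'" would flag an adapter reporting version 1.79.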
Comment 3 john 2023-08-16 05:29:59 UTC
Created attachment 244136
Patch to display NVM information similar to what's already done for igb

Attached is a patch to display the NVM version.  I simply took the code from the FreeBSD igb driver and transplanted it into the igc driver.  I sanity-checked it against the code in the DPDK igc driver and smoke-tested it on my system (my adapter apparently has NVM version 1.79).