I am running an Intel NUC (NUC8i5BEH) with an Intel I219-V (6) chip, which the em(4) driver handles. Very sporadically, the machine either loses network connectivity outright or starts responding to network traffic with significantly increased latency. It feels like the way a traffic jam starts.

The machine is a Squid proxy server for my home network, and accessing websites with many resources to fetch seems to trigger the bug more easily, though I have also triggered it with something as simple as editing a text file in nano over SSH.

When the bug happens, I sometimes see my switch, a Netgear GS324T (S350 series), change the port from 1Gbps to 100Mbps (green to orange on the LED). If I ping the device from another machine on the network, the ping is lost, the host is reported as "down", or the reply comes back upwards of a few thousand milliseconds later. Usually the only way to recover is to reboot the machine. Sometimes, after waiting several minutes, the machine eventually responds and behaves normally again.

To me this feels like a buffer being flooded too quickly. I have been using jumbo frames w/ an MTU of 9000. As a test, I have lowered the MTU to 1500 to see whether the issue remains.

It feels like this MIGHT be tied to Bug #218894, per the last comment there in 2018. If it is, lowering the MTU to 1500, or staying under 6k/pkt, might avoid the issue, as it smells like a buffer in em(4) is not sized correctly on I219 chips to handle 9k/pkt jumbo frames.

I am experiencing this issue with both the base em(4) driver (7.6.1-k) and the latest intel-em-kmod driver from ports (7.7.5).

Some technical info (IP/DNS info removed):

dmesg, showing the device going up/down a few times, including when I tried unplugging its cable from the switch:

---<<BOOT>>---
Copyright (c) 1992-2019 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD is a registered trademark of The FreeBSD Foundation. FreeBSD 12.1-RELEASE-p4 CUSTOM-12_1 amd64 FreeBSD clang version 8.0.1 (tags/RELEASE_801/final 366581) (based on LLVM 8.0.1) VT(efifb): resolution 1024x768 module zfsctrl already present! module_register: cannot register pci/em from kernel; already loaded from if_em_updated.ko Module pci/em failed to register: 17 CPU: Intel(R) Core(TM) i5-8259U CPU @ 2.30GHz (2304.11-MHz K8-class CPU) Origin="GenuineIntel" Id=0x806ea Family=0x6 Model=0x8e Stepping=10 Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE> Features2=0x7ffafbbf<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,SDBG,FMA,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND> AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM> AMD Features2=0x121<LAHF,ABM,Prefetch> Structured Extended Features=0x29c67af<FSGSBASE,TSCADJ,SGX,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,NFPUSG,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PROCTRACE> Structured Extended Features3=0x9c002400<MD_CLEAR,TSXFA,IBPB,STIBP,L1DFL,SSBD> XSAVE Features=0xf<XSAVEOPT,XSAVEC,XINUSE,XSAVES> VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID TSC: P-state invariant, performance statistics real memory = 17179869184 (16384 MB) avail memory = 16487288832 (15723 MB) CPU microcode: updated from 0xc6 to 0xca Event timer "LAPIC" quality 600 ACPI APIC Table: <INTEL NUC8i5BE> FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs FreeBSD/SMP: 1 package(s) x 4 core(s) random: unblocking device. 
ioapic0 <Version 2.0> irqs 0-119 on motherboard Launching APs: 3 2 1 Timecounter "TSC-low" frequency 1152052507 Hz quality 1000 random: entropy device external interface module_register_init: MOD_LOAD (vesa, 0xffffffff80b2e120, 0) error 19 kbd0 at kbdmux0 random: registering fast source Intel Secure Key RNG random: fast provider: "Intel Secure Key RNG" nexus0 efirtc0: <EFI Realtime Clock> on motherboard efirtc0: registered as a time-of-day clock, resolution 1.000000s cryptosoft0: <software crypto> on motherboard aesni0: <AES-CBC,AES-CCM,AES-GCM,AES-ICM,AES-XTS> on motherboard acpi0: <INTEL NUC8i5BE> on motherboard acpi0: Power Button (fixed) cpu0: <ACPI CPU> on acpi0 hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0 Timecounter "HPET" frequency 24000000 Hz quality 950 Event timer "HPET" frequency 24000000 Hz quality 550 Event timer "HPET1" frequency 24000000 Hz quality 440 Event timer "HPET2" frequency 24000000 Hz quality 440 Event timer "HPET3" frequency 24000000 Hz quality 440 Event timer "HPET4" frequency 24000000 Hz quality 440 attimer0: <AT timer> port 0x40-0x43,0x50-0x53 irq 0 on acpi0 Timecounter "i8254" frequency 1193182 Hz quality 0 Event timer "i8254" frequency 1193182 Hz quality 100 Timecounter "ACPI-fast" frequency 3579545 Hz quality 900 acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1808-0x180b on acpi0 acpi_ec0: <Embedded Controller: GPE 0x14> port 0x62,0x66 on acpi0 pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0 pci0: <ACPI PCI bus> on pcib0 vgapci0: <VGA-compatible display> port 0x3000-0x303f mem 0xbf000000-0xbfffffff,0x80000000-0x8fffffff at device 2.0 on pci0 vgapci0: Boot video device xhci0: <XHCI (generic) USB 3.0 controller> mem 0x404a000000-0x404a00ffff at device 20.0 on pci0 xhci0: 32 bytes context size, 64-bit DMA usbus0 on xhci0 usbus0: 5.0Gbps Super Speed USB v3.0 pci0: <memory, RAM> at device 20.2 (no driver attached) pci0: <simple comms> at device 22.0 (no driver attached) ahci0: <AHCI SATA controller> 
port 0x3090-0x3097,0x3080-0x3083,0x3060-0x307f mem 0xc0120000-0xc0121fff,0xc0123000-0xc01230ff,0xc0122000-0xc01227ff at device 23.0 on pci0 ahci0: AHCI v1.31 with 1 6Gbps ports, Port Multiplier not supported ahcich2: <AHCI channel> at channel 2 on ahci0 pcib1: <ACPI PCI-PCI bridge> at device 28.0 on pci0 pci1: <ACPI PCI bus> on pcib1 pcib2: <ACPI PCI-PCI bridge> at device 28.4 on pci0 pcib2: [GIANT-LOCKED] pcib3: <ACPI PCI-PCI bridge> at device 29.0 on pci0 pci2: <ACPI PCI bus> on pcib3 nvme0: <Generic NVMe Device> mem 0xc0000000-0xc0003fff at device 0.0 on pci2 isab0: <PCI-ISA bridge> at device 31.0 on pci0 isa0: <ISA bus> on isab0 pci0: <serial bus> at device 31.5 (no driver attached) em0: <Intel(R) PRO/1000 Network Connection 7.7.5> mem 0xc0100000-0xc011ffff at device 31.6 on pci0 em0: Using an MSI interrupt em0: Ethernet address: 1c:69:7a:08:74:7e acpi_button0: <Sleep Button> on acpi0 acpi_button1: <Power Button> on acpi0 acpi_tz0: <Thermal Zone> on acpi0 acpi_syscontainer0: <System Container> on acpi0 acpi_tz1: <Thermal Zone> on acpi0 acpi_tz1: _HOT value is absurd, ignored (-73.1C) atrtc0: <AT realtime clock> at port 0x70 irq 8 on isa0 atrtc0: Warning: Couldn't map I/O. atrtc0: registered as a time-of-day clock, resolution 1.000000s Event timer "RTC" frequency 32768 Hz quality 0 uart0: <Non-standard ns8250 class UART with FIFOs> at port 0x3f8 irq 4 flags 0x10 on isa0 coretemp0: <CPU On-Die Thermal Sensors> on cpu0 est0: <Enhanced SpeedStep Frequency Control> on cpu0 ZFS filesystem version: 5 ZFS storage pool version: features support (5000) Timecounters tick every 10.000 msec acpi_tz1: _TMP value is absurd, ignored (-263.1C) ugen0.1: <0x8086 XHCI root HUB> at usbus0 uhub0: <0x8086 XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0 nvd0: <Samsung SSD 970 EVO 1TB> NVMe namespace nvd0: 953869MB (1953525168 512 byte sectors) Trying to mount root from zfs:core/env/fbsd_12.1-20200422 []... 
Root mount waiting for: usbus0 Root mount waiting for: usbus0 uhub0: 18 ports with 18 removable, self powered ugen0.2: <CHESEN PS2 to USB Converter> at usbus0 ukbd0 on uhub0 ukbd0: <CHESEN PS2 to USB Converter, class 0/0, rev 1.10/0.10, addr 1> on usbus0 kbd1 at ukbd0 GEOM_ELI: Device nvd0p2.eli created. GEOM_ELI: Encryption: AES-XTS 256 GEOM_ELI: Crypto: hardware lo0: link state changed to UP em0: link state changed to UP ums0 on uhub0 ums0: <CHESEN PS2 to USB Converter, class 0/0, rev 1.10/0.10, addr 1> on usbus0 ums0: 5 buttons and [XYZ] coordinates ID=1 ipfw2 (+ipv6) initialized, divert loadable, nat loadable, default to deny, logging disabled em0: link state changed to DOWN em0: link state changed to UP em0: link state changed to DOWN em0: link state changed to UP em0: link state changed to DOWN em0: link state changed to UP em0: link state changed to DOWN em0: link state changed to UP em0: link state changed to DOWN em0: link state changed to UP em0: link state changed to DOWN em0: link state changed to UP em0: link state changed to DOWN em0: link state changed to UP em0: link state changed to DOWN em0: link state changed to UP em0: link state changed to DOWN em0: link state changed to UP em0: link state changed to DOWN em0: link state changed to UP em0: link state changed to DOWN em0: link state changed to UP em0: link state changed to DOWN em0: link state changed to UP netstat -i shows a handful of Ierrs: Name Mtu Network Address Ipkts Ierrs Idrop Opkts Oerrs Coll em0 1500 <Link#1> 1c:69:7a:xx:xx:xx 5959360 4 0 5229960 0 0 em0 - 192.168.x.0/2 xxxxxx 5896418 - - 5201176 - - em0 - fe80::%em0/64 fe80::1e69:7aff:f 0 - - 0 - - em0 - fdxx::xxxx:xx xxxxxx 28862 - - 28369 - - lo0 16384 <Link#2> lo0 1384 0 0 1384 0 0 lo0 - localhost localhost 0 - - 215 - - lo0 - fe80::%lo0/64 fe80::1%lo0 0 - - 0 - - lo0 - your-net localhost 42 - - 1384 - - ipfw0 - <Link#3> ipfw0 0 0 0 0 0 0 pciconf: em0@pci0:0:31:6: class=0x020000 card=0x20748086 chip=0x15be8086 rev=0x30 hdr=0x00 
vendor = 'Intel Corporation' device = 'Ethernet Connection (6) I219-V' class = network subclass = ethernet sysctl -a info for em0: # sysctl -a | grep "\.em\." hw.em.max_interrupt_rate: 8000 hw.em.eee_setting: 1 hw.em.rx_process_limit: -1 hw.em.sbp: 1 hw.em.smart_pwr_down: 0 hw.em.rx_abs_int_delay: 66 hw.em.tx_abs_int_delay: 66 hw.em.rx_int_delay: 0 hw.em.tx_int_delay: 66 hw.em.disable_crc_stripping: 0 dev.em.0.wake: 0 dev.em.0.interrupts.rx_overrun: 0 dev.em.0.interrupts.rx_desc_min_thresh: 0 dev.em.0.interrupts.tx_queue_min_thresh: 0 dev.em.0.interrupts.tx_queue_empty: 0 dev.em.0.interrupts.tx_abs_timer: 0 dev.em.0.interrupts.tx_pkt_timer: 0 dev.em.0.interrupts.rx_abs_timer: 0 dev.em.0.interrupts.rx_pkt_timer: 0 dev.em.0.interrupts.asserts: 4163967 dev.em.0.mac_stats.tx_frames_1024_1522: -1 dev.em.0.mac_stats.tx_frames_512_1023: -1 dev.em.0.mac_stats.tx_frames_256_511: -1 dev.em.0.mac_stats.tx_frames_128_255: -1 dev.em.0.mac_stats.tx_frames_65_127: -1 dev.em.0.mac_stats.tx_frames_64: -1 dev.em.0.mac_stats.rx_frames_1024_1522: -1 dev.em.0.mac_stats.rx_frames_512_1023: -1 dev.em.0.mac_stats.rx_frames_256_511: -1 dev.em.0.mac_stats.rx_frames_128_255: -1 dev.em.0.mac_stats.rx_frames_65_127: -1 dev.em.0.mac_stats.rx_frames_64: -1 dev.em.0.mac_stats.tso_ctx_fail: 0 dev.em.0.mac_stats.tso_txd: 0 dev.em.0.mac_stats.mcast_pkts_txd: 24 dev.em.0.mac_stats.bcast_pkts_txd: 124 dev.em.0.mac_stats.good_pkts_txd: 5231594 dev.em.0.mac_stats.total_pkts_txd: 5231594 dev.em.0.mac_stats.good_octets_txd: 6647896611 dev.em.0.mac_stats.good_octets_recvd: 6724467487 dev.em.0.mac_stats.mcast_pkts_recvd: 273 dev.em.0.mac_stats.bcast_pkts_recvd: 32687 dev.em.0.mac_stats.good_pkts_recvd: 5960900 dev.em.0.mac_stats.total_pkts_recvd: 5960904 dev.em.0.mac_stats.xoff_txd: 0 dev.em.0.mac_stats.xoff_recvd: 0 dev.em.0.mac_stats.xon_txd: 0 dev.em.0.mac_stats.xon_recvd: 0 dev.em.0.mac_stats.coll_ext_errs: 0 dev.em.0.mac_stats.alignment_errs: 0 dev.em.0.mac_stats.crc_errs: 0 
dev.em.0.mac_stats.recv_errs: 0 dev.em.0.mac_stats.recv_jabber: 0 dev.em.0.mac_stats.recv_oversize: 0 dev.em.0.mac_stats.recv_fragmented: 0 dev.em.0.mac_stats.recv_undersize: 0 dev.em.0.mac_stats.recv_no_buff: 0 dev.em.0.mac_stats.missed_packets: 4 dev.em.0.mac_stats.defer_count: 0 dev.em.0.mac_stats.sequence_errors: 0 dev.em.0.mac_stats.symbol_errors: 0 dev.em.0.mac_stats.collision_count: 0 dev.em.0.mac_stats.late_coll: 0 dev.em.0.mac_stats.multiple_coll: 0 dev.em.0.mac_stats.single_coll: 0 dev.em.0.mac_stats.excess_coll: 0 dev.em.0.queue_rx_0.rx_irq: 0 dev.em.0.queue_rx_0.rxd_tail: 1001 dev.em.0.queue_rx_0.rxd_head: 1003 dev.em.0.queue_tx_0.no_desc_avail: 0 dev.em.0.queue_tx_0.tx_irq: 0 dev.em.0.queue_tx_0.txd_tail: 537 dev.em.0.queue_tx_0.txd_head: 538 dev.em.0.fc_low_water: 20552 dev.em.0.fc_high_water: 23584 dev.em.0.rx_control: 67141634 dev.em.0.device_control: 1573440 dev.em.0.watchdog_timeouts: 0 dev.em.0.rx_overruns: 4 dev.em.0.tx_dma_fail: 0 dev.em.0.dropped: 0 dev.em.0.cluster_alloc_fail: 0 dev.em.0.mbuf_alloc_fail: 0 dev.em.0.link_irq: 0 dev.em.0.eee_control: 1 dev.em.0.rx_processing_limit: -1 dev.em.0.itr: 488 dev.em.0.tx_abs_int_delay: 66 dev.em.0.rx_abs_int_delay: 66 dev.em.0.tx_int_delay: 66 dev.em.0.rx_int_delay: 0 dev.em.0.fc: 0 dev.em.0.debug: -1 dev.em.0.nvm: -1 dev.em.0.%parent: pci0 dev.em.0.%pnpinfo: vendor=0x8086 device=0x15be subvendor=0x8086 subdevice=0x2074 class=0x020000 dev.em.0.%location: slot=31 function=6 dbsf=pci0:0:31:6 handle=\_SB_.PCI0.GLAN dev.em.0.%driver: em dev.em.0.%desc: Intel(R) PRO/1000 Network Connection 7.7.5 dev.em.%parent: ping output from another device on the network: # ping xxxxxx PING xxxxxx (192.168.x.yyy): 56 data bytes ping: sendto: Host is down ping: sendto: Host is down ping: sendto: Host is down ping: sendto: Host is down ping: sendto: Host is down ping: sendto: Host is down ping: sendto: Host is down ping: sendto: Host is down ping: sendto: Host is down ping: sendto: Host is down ping: sendto: Host is down 
ping: sendto: Host is down ping: sendto: Host is down ping: sendto: Host is down ping: sendto: Host is down ping: sendto: Host is down ping: sendto: Host is down ping: sendto: Host is down ping: sendto: Host is down ping: sendto: Host is down ping: sendto: Host is down ping: sendto: Host is down 64 bytes from 192.168.x.yyy: icmp_seq=17 ttl=64 time=4227.244 ms 64 bytes from 192.168.x.yyy: icmp_seq=18 ttl=64 time=3154.970 ms 64 bytes from 192.168.x.yyy: icmp_seq=19 ttl=64 time=2143.813 ms 64 bytes from 192.168.x.yyy: icmp_seq=20 ttl=64 time=1071.436 ms 64 bytes from 192.168.x.yyy: icmp_seq=21 ttl=64 time=1.375 ms 64 bytes from 192.168.x.yyy: icmp_seq=79 ttl=64 time=0.578 ms 64 bytes from 192.168.x.yyy: icmp_seq=284 ttl=64 time=0.958 ms 64 bytes from 192.168.x.yyy: icmp_seq=293 ttl=64 time=0.580 ms 64 bytes from 192.168.x.yyy: icmp_seq=334 ttl=64 time=0.496 ms 64 bytes from 192.168.x.yyy: icmp_seq=335 ttl=64 time=0.454 ms 64 bytes from 192.168.x.yyy: icmp_seq=336 ttl=64 time=0.473 ms 64 bytes from 192.168.x.yyy: icmp_seq=337 ttl=64 time=0.457 ms 64 bytes from 192.168.x.yyy: icmp_seq=338 ttl=64 time=0.459 ms 64 bytes from 192.168.x.yyy: icmp_seq=339 ttl=64 time=0.442 ms 64 bytes from 192.168.x.yyy: icmp_seq=340 ttl=64 time=0.447 ms 64 bytes from 192.168.x.yyy: icmp_seq=341 ttl=64 time=0.452 ms 64 bytes from 192.168.x.yyy: icmp_seq=342 ttl=64 time=0.437 ms 64 bytes from 192.168.x.yyy: icmp_seq=343 ttl=64 time=0.462 ms
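
For anyone trying to compare captures from a similar setup, the ping transcript and the link-state messages above can be summarized with a short script. This is only a sketch: the function names `summarize_ping` and `count_link_flaps` and the 1000 ms "slow" threshold are my own assumptions, and the sample strings are short excerpts in the same format as the output in this report.

```python
import re

# Matches reply lines in the style of the ping output in this report.
REPLY_RE = re.compile(r"icmp_seq=(\d+) ttl=\d+ time=([\d.]+) ms")


def summarize_ping(text, slow_ms=1000.0):
    """Return (failed_sends, slow_replies, missing_seqs) for a ping transcript.

    slow_ms is an arbitrary threshold chosen for illustration.
    """
    failed = text.count("Host is down")
    replies = [(int(seq), float(ms)) for seq, ms in REPLY_RE.findall(text)]
    slow = [(seq, ms) for seq, ms in replies if ms >= slow_ms]
    seen = {seq for seq, _ in replies}
    missing = []
    if seen:
        # Sequence numbers between the first and last reply that never came back.
        missing = [s for s in range(min(seen), max(seen) + 1) if s not in seen]
    return failed, slow, missing


def count_link_flaps(dmesg_text, ifname="em0"):
    """Count 'link state changed to DOWN' events for one interface."""
    pattern = re.compile(re.escape(ifname) + r": link state changed to DOWN")
    return len(pattern.findall(dmesg_text))


# Short excerpts in the format of the output above (not the full capture).
SAMPLE_PING = """\
ping: sendto: Host is down
ping: sendto: Host is down
64 bytes from 192.168.x.yyy: icmp_seq=17 ttl=64 time=4227.244 ms
64 bytes from 192.168.x.yyy: icmp_seq=18 ttl=64 time=3154.970 ms
64 bytes from 192.168.x.yyy: icmp_seq=19 ttl=64 time=2143.813 ms
64 bytes from 192.168.x.yyy: icmp_seq=20 ttl=64 time=1071.436 ms
64 bytes from 192.168.x.yyy: icmp_seq=21 ttl=64 time=1.375 ms
64 bytes from 192.168.x.yyy: icmp_seq=79 ttl=64 time=0.578 ms
"""

SAMPLE_DMESG = """\
lo0: link state changed to UP
em0: link state changed to UP
em0: link state changed to DOWN
em0: link state changed to UP
em0: link state changed to DOWN
em0: link state changed to UP
"""

failed, slow, missing = summarize_ping(SAMPLE_PING)
print("failed sends:", failed)
print("slow replies:", slow)
print("missing sequence numbers:", len(missing))
print("em0 link flaps:", count_link_flaps(SAMPLE_DMESG))
```

Feeding it the full ping and dmesg text instead of the excerpts should give a quick count of how often the link dropped and how many replies were lost or delayed during an episode.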
Can you test stable/12 snapshots?
(In reply to Kevin Bowling from comment #1)
> Can you test stable/12 snapshots?

Not easily. The system is currently running 12.1-RELEASE-p8. I can, however, test patches if you have a specific commit that may address the issue and works w/ 12.1-RELEASE (or mostly applies to the source).

I am currently using the out-of-tree intel-em-kmod driver on this platform, version 7.7.8 from Intel's download center. The latest version in the ports tree is 7.7.5, and that one also exhibited issues w/ jumbo frames. I ended up giving up and going back to MTU 1500, and have not attempted jumbo frames on 7.7.8 yet.
(In reply to Joshua Kinard from comment #2)

Thanks, Joshua.

To clarify: 7.7.8 from upstream does *not* exhibit the behaviour, but 7.7.5 does?
(In reply to Kubilay Kocak from comment #3) I believe this was a problem in 12.1-RELEASE-pX. The problem went away after the upgrade to 12.2-RELEASE, both on the in-tree driver (7.7.5) and the external em-7.7.8 from Intel upstream. I've since upgraded to 13.0-RELEASE (also ran several of the RCs) on the device and haven't had any issues with jumbo frames since then. I even ported em-7.7.8 to compile on 13.0-RELEASE and that works w/o issue thus far.
Per the submitter this works as intended in 12.2-RELEASE and 13.0-RELEASE. There is nothing to MFC to stable/11 because it uses a different driver.
^Triage: Correct resolution. Without identified/references/specific commits/committers, OBE is more appropriate. mfc-* not appropriate without identified commits (cancel accordingly).
^Triage: Assign to committer resolving (OBE).
Just to add a final resolution and to mark this as a hardware issue and not a bug in FreeBSD: the problem ultimately turned out to be the 1ft CAT6A Monoprice SlimRun cable I was using to connect the NUC to my 24-port switch. I learned a while ago that swapping to a different cable made the problem go away, but I didn't know why. After finally finding where my cable tester was hidden, I was able to determine that the SlimRun cable had a faulty ground connection between the two ends. That was probably creating a ground loop between the switch and the NUC, causing the switch to overreact and either disable the port or drop it to 100Mbps, which probably wasn't handled well by the em(4) driver on the NUC.
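
In case it helps someone chasing a similar cable problem, a drop from 1Gbps to 100Mbps can also be spotted from the host side by checking the negotiated speed in the `media:` line that ifconfig prints. The following is a minimal sketch, assuming the usual autoselect form of that line; the function names and the two sample strings are hypothetical and were not captured from the NUC in this report.

```python
import re

# FreeBSD ifconfig prints the negotiated media in parentheses, e.g.
#   media: Ethernet autoselect (1000baseT <full-duplex>)
# Assumption: only the plain "<number>base..." form is handled here.
MEDIA_RE = re.compile(r"media: Ethernet \S+ \((\d+)base")


def negotiated_mbps(ifconfig_text):
    """Return the negotiated speed in Mbps, or None if it cannot be found."""
    match = MEDIA_RE.search(ifconfig_text)
    return int(match.group(1)) if match else None


def is_downgraded(ifconfig_text, expected_mbps=1000):
    """True when a speed was negotiated and it is below the expected one."""
    speed = negotiated_mbps(ifconfig_text)
    return speed is not None and speed < expected_mbps


# Hypothetical sample lines for illustration only.
GOOD = "\tmedia: Ethernet autoselect (1000baseT <full-duplex>)\n\tstatus: active\n"
BAD = "\tmedia: Ethernet autoselect (100baseTX <full-duplex>)\n\tstatus: active\n"

print(negotiated_mbps(GOOD), is_downgraded(GOOD))
print(negotiated_mbps(BAD), is_downgraded(BAD))
```

Running a check like this periodically against the output of `ifconfig em0` would show whether the link renegotiated at a lower speed, which here pointed at the cable rather than the driver.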