When transferring large files over the network using the re driver, the interface repeatedly hits watchdog timeouts and the link goes down and comes back up:

Apr 7 04:06:45 upstairs kernel: re0: watchdog timeout
Apr 7 04:06:45 upstairs kernel: re0: link state changed to DOWN
Apr 7 04:06:49 upstairs kernel: re0: link state changed to UP
Apr 7 04:06:54 upstairs kernel: re0: watchdog timeout
Apr 7 04:06:54 upstairs kernel: re0: link state changed to DOWN

dmesg and pciconf -lv output below:

Copyright (c) 1992-2012 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD is a registered trademark of The FreeBSD Foundation. FreeBSD 9.0-STABLE #0 r233868: Wed Apr 4 04:46:51 PDT 2012 jack@upstairs.jack.com:/usr/obj/usr/src/sys/UPSTAIRS amd64 CPU: AMD FX(tm)-4100 Quad-Core Processor (4027.00-MHz K8-class CPU) Origin = "AuthenticAMD" Id = 0x600f12 Family = 15 Model = 1 Stepping = 2 Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT> Features2=0x1e98220b<SSE3,PCLMULQDQ,MON,SSSE3,CX16,SSE4.1,SSE4.2,POPCNT,AESNI,XSAVE,OSXSAVE,AVX> AMD Features=0x2e500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM> AMD Features2=0x1c9bfff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS,XOP,SKINIT,WDT,LWP,FMA4,NodeId,Topology,<b23>,<b24>> TSC: P-state invariant, performance statistics real memory = 17179869184 (16384 MB) avail memory = 16484421632 (15720 MB) Event timer "LAPIC" quality 400 ACPI APIC Table: <GBT GBTUACPI> FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs FreeBSD/SMP: 1 package(s) x 4 core(s) cpu0 (BSP): APIC ID: 0 cpu1 (AP): APIC ID: 1 cpu2 (AP): APIC ID: 2 cpu3 (AP): APIC ID: 3 ioapic0: Changing APIC ID to 8 ioapic0 <Version 2.1> irqs 0-23 on motherboard kbd1 at kbdmux0 acpi0: <GBT GBTUACPI> on motherboard acpi0: Power Button (fixed) acpi0: reservation of 0, a0000 (3) failed acpi0: reservation of 100000, bfca0000 (3) failed cpu0: <ACPI CPU> on acpi0 cpu1: <ACPI CPU> on acpi0 cpu2:
<ACPI CPU> on acpi0 cpu3: <ACPI CPU> on acpi0 attimer0: <AT timer> port 0x40-0x43 on acpi0 Timecounter "i8254" frequency 1193182 Hz quality 0 Event timer "i8254" frequency 1193182 Hz quality 100 hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff irq 0,8 on acpi0 Timecounter "HPET" frequency 14318180 Hz quality 950 atrtc0: <AT realtime clock> port 0x70-0x73 on acpi0 Event timer "RTC" frequency 32768 Hz quality 0 Timecounter "ACPI-safe" frequency 3579545 Hz quality 850 acpi_timer0: <32-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0 acpi_button0: <Power Button> on acpi0 pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0 pci0: <ACPI PCI bus> on pcib0 pci0: <base peripheral> at device 0.2 (no driver attached) pcib1: <ACPI PCI-PCI bridge> irq 18 at device 2.0 on pci0 pci1: <ACPI PCI bus> on pcib1 vgapci0: <VGA-compatible display> port 0xdf00-0xdf7f mem 0xfb000000-0xfbffffff,0xc0000000-0xcfffffff,0xde000000-0xdfffffff irq 18 at device 0.0 on pci1 nvidia0: <GeForce 210> on vgapci0 vgapci0: child nvidia0 requested pci_enable_io vgapci0: child nvidia0 requested pci_enable_io hdac0: <NVIDIA (0x0be3) HDA Controller> mem 0xfcffc000-0xfcffffff irq 19 at device 0.1 on pci1 pcib2: <ACPI PCI-PCI bridge> irq 16 at device 4.0 on pci0 pci2: <ACPI PCI bus> on pcib2 xhci0: <XHCI (generic) USB 3.0 controller> mem 0xfd7f8000-0xfd7fffff irq 16 at device 0.0 on pci2 xhci0: 64 byte context size. usbus0 on xhci0 pcib3: <ACPI PCI-PCI bridge> irq 17 at device 9.0 on pci0 pci3: <ACPI PCI bus> on pcib3 re0: <RealTek 8168/8111 B/C/CP/D/DP/E/F PCIe Gigabit Ethernet> port 0xee00-0xeeff mem 0xfd8ff000-0xfd8fffff,0xfd8f8000-0xfd8fbfff irq 17 at device 0.0 on pci3 re0: Using 1 MSI-X message re0: Chip rev. 0x2c800000 re0: MAC rev. 
0x00000000 miibus0: <MII bus> on re0 rgephy0: <RTL8169S/8110S/8211 1000BASE-T media interface> PHY 1 on miibus0 rgephy0: none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX, 100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, 1000baseT-FDX-flow, 1000baseT-FDX-flow-master, auto, auto-flow re0: Ethernet address: 50:e5:49:53:61:98 ahci0: <ATI IXP700 AHCI SATA controller> port 0xff00-0xff07,0xfe00-0xfe03,0xfd00-0xfd07,0xfc00-0xfc03,0xfb00-0xfb0f mem 0xfdfff000-0xfdfff3ff irq 19 at device 17.0 on pci0 ahci0: AHCI v1.20 with 6 6Gbps ports, Port Multiplier supported ahcich0: <AHCI channel> at channel 0 on ahci0 ahcich1: <AHCI channel> at channel 1 on ahci0 ahcich2: <AHCI channel> at channel 2 on ahci0 ahcich3: <AHCI channel> at channel 3 on ahci0 ahcich4: <AHCI channel> at channel 4 on ahci0 ahcich5: <AHCI channel> at channel 5 on ahci0 ohci0: <OHCI (generic) USB controller> mem 0xfdffe000-0xfdffefff irq 18 at device 18.0 on pci0 usbus1 on ohci0 ehci0: <EHCI (generic) USB 2.0 controller> mem 0xfdffd000-0xfdffd0ff irq 17 at device 18.2 on pci0 usbus2: EHCI version 1.0 usbus2 on ehci0 ohci1: <OHCI (generic) USB controller> mem 0xfdffc000-0xfdffcfff irq 18 at device 19.0 on pci0 usbus3 on ohci1 ehci1: <EHCI (generic) USB 2.0 controller> mem 0xfdffb000-0xfdffb0ff irq 17 at device 19.2 on pci0 usbus4: EHCI version 1.0 usbus4 on ehci1 pci0: <serial bus, SMBus> at device 20.0 (no driver attached) hdac1: <ATI SB600 HDA Controller> mem 0xfdff4000-0xfdff7fff irq 16 at device 20.2 on pci0 isab0: <PCI-ISA bridge> at device 20.3 on pci0 isa0: <ISA bus> on isab0 pcib4: <ACPI PCI-PCI bridge> at device 20.4 on pci0 pci4: <ACPI PCI bus> on pcib4 ohci2: <OHCI (generic) USB controller> mem 0xfdffa000-0xfdffafff irq 18 at device 20.5 on pci0 usbus5 on ohci2 pcib5: <ACPI PCI-PCI bridge> at device 21.0 on pci0 pci5: <ACPI PCI bus> on pcib5 ohci3: <OHCI (generic) USB controller> mem 0xfdff9000-0xfdff9fff irq 18 at device 22.0 on pci0 usbus6 
on ohci3 ehci2: <EHCI (generic) USB 2.0 controller> mem 0xfdff8000-0xfdff80ff irq 17 at device 22.2 on pci0 usbus7: EHCI version 1.0 usbus7 on ehci2 atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0 atkbd0: <AT Keyboard> irq 1 on atkbdc0 kbd0 at atkbd0 atkbd0: [GIANT-LOCKED] sc0: <System console> at flags 0x100 on isa0 sc0: VGA <16 virtual consoles, flags=0x300> vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0 Timecounters tick every 1.000 msec vboxdrv: fAsync=0 offMin=0x4c9 offMax=0x863 IP Filter: v4.1.28 initialized. Default = pass all, Logging = disabled ipfw2 (+ipv6) initialized, divert enabled, nat enabled, rule-based forwarding enabled, default to accept, logging disabled DUMMYNET 0 with IPv6 initialized (100409) load_dn_sched dn_sched FIFO loaded load_dn_sched dn_sched PRIO loaded load_dn_sched dn_sched QFQ loaded load_dn_sched dn_sched RR loaded load_dn_sched dn_sched WF2Q+ loaded hdacc0: <NVIDIA GT21x HDA CODEC> at cad 0 on hdac0 hdaa0: <NVIDIA GT21x HDA CODEC Audio Function Group> at nid 1 on hdacc0 pcm0: <NVIDIA GT21x HDA CODEC PCM (HDMI/DP 8ch)> at nid 5 on hdaa0 hdacc1: <NVIDIA GT21x HDA CODEC> at cad 1 on hdac0 hdaa1: <NVIDIA GT21x HDA CODEC Audio Function Group> at nid 1 on hdacc1 pcm1: <NVIDIA GT21x HDA CODEC PCM (HDMI/DP 8ch)> at nid 5 on hdaa1 hdacc2: <NVIDIA GT21x HDA CODEC> at cad 2 on hdac0 hdaa2: <NVIDIA GT21x HDA CODEC Audio Function Group> at nid 1 on hdacc2 pcm2: <NVIDIA GT21x HDA CODEC PCM (HDMI/DP 8ch)> at nid 5 on hdaa2 hdacc3: <NVIDIA GT21x HDA CODEC> at cad 3 on hdac0 hdaa3: <NVIDIA GT21x HDA CODEC Audio Function Group> at nid 1 on hdacc3 pcm3: <NVIDIA GT21x HDA CODEC PCM (HDMI/DP 8ch)> at nid 5 on hdaa3 hdacc4: <Realtek ALC889 HDA CODEC> at cad 0 on hdac1 hdaa4: <Realtek ALC889 HDA CODEC Audio Function Group> at nid 1 on hdacc4 pcm4: <Realtek ALC889 HDA CODEC PCM (Rear Analog 7.1/2.0)> at nid 20,22,21,23 and 24,26 on hdaa4 pcm5: <Realtek ALC889 HDA CODEC PCM (Front Analog)> at nid 27 and 25 
on hdaa4 pcm6: <Realtek ALC889 HDA CODEC PCM (Rear Digital)> at nid 30 on hdaa4 pcm7: <Realtek ALC889 HDA CODEC PCM (Onboard Digital)> at nid 17 on hdaa4 usbus0: 5.0Gbps Super Speed USB v3.0 usbus1: 12Mbps Full Speed USB v1.0 usbus2: 480Mbps High Speed USB v2.0 usbus3: 12Mbps Full Speed USB v1.0 usbus4: 480Mbps High Speed USB v2.0 usbus5: 12Mbps Full Speed USB v1.0 usbus6: 12Mbps Full Speed USB v1.0 usbus7: 480Mbps High Speed USB v2.0 ugen0.1: <0x1b6f> at usbus0 uhub0: <0x1b6f XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0 ugen1.1: <ATI> at usbus1 uhub1: <ATI OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus1 ugen2.1: <ATI> at usbus2 uhub2: <ATI EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus2 ugen3.1: <ATI> at usbus3 uhub3: <ATI OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus3 ugen4.1: <ATI> at usbus4 uhub4: <ATI EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus4 ugen5.1: <ATI> at usbus5 uhub5: <ATI OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus5 ugen6.1: <ATI> at usbus6 uhub6: <ATI OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus6 ugen7.1: <ATI> at usbus7 uhub7: <ATI EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus7 ada0 at ahcich0 bus 0 scbus0 target 0 lun 0 ada0: <WDC WD6401AALS-00E8B0 05.00K05> ATA-8 SATA 2.x device ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) ada0: Command Queueing enabled ada0: 610480MB (1250263728 512 byte sectors: 16H 63S/T 16383C) ada0: Previously was known as ad4 ada1 at ahcich1 bus 0 scbus1 target 0 lun 0 ada1: <WDC WD3000HLFS-01G6U4 04.04V06> ATA-8 SATA 2.x device ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) ada1: Command Queueing enabled ada1: 286168MB (586072368 512 byte sectors: 16H 63S/T 16383C) ada1: Previously was known as ad6 SMP: AP CPU #1 Launched! cd0 at ahcich3 bus 0 scbus3 target 0 lun 0 cd0: <HL-DT-ST DVD-ROM DH20N A102> Removable CD-ROM SCSI-0 device SMP: AP CPU #2 Launched! 
cd0: 150.000MB/s transfers (SATA 1.x, UDMA5, ATAPI 12bytes, PIO 8192bytes) cd0: Attempt to query device size failed: NOT READY, Medium not present - tray closed SMP: AP CPU #3 Launched! uhub5: 2 ports with 2 removable, self powered uhub6: 4 ports with 4 removable, self powered uhub1: 5 ports with 5 removable, self powered uhub3: 5 ports with 5 removable, self powered uhub0: 4 ports with 4 removable, self powered Root mount waiting for: usbus7 usbus4 usbus2 Root mount waiting for: usbus7 usbus4 usbus2 uhub7: 4 ports with 4 removable, self powered uhub2: 5 ports with 5 removable, self powered uhub4: 5 ports with 5 removable, self powered Trying to mount root from ufs:/dev/ada0p2 [rw]... ugen3.2: <Logitech> at usbus3 ums0: <Logitech USB-PS2 Optical Mouse, class 0/0, rev 2.00/27.30, addr 2> on usbus3 ums0: 8 buttons and [XYZT] coordinates ID=0 hostb0@pci0:0:0:0: class=0x060000 card=0x5a141002 chip=0x5a141002 rev=0x02 hdr=0x00 vendor = 'ATI Technologies Inc' device = 'RD890 PCI to PCI bridge (external gfx0 port B)' class = bridge subclass = HOST-PCI none0@pci0:0:0:2: class=0x080600 card=0x5a231002 chip=0x5a231002 rev=0x00 hdr=0x00 vendor = 'ATI Technologies Inc' class = base peripheral pcib1@pci0:0:2:0: class=0x060400 card=0x5a141002 chip=0x5a161002 rev=0x00 hdr=0x01 vendor = 'ATI Technologies Inc' device = 'RD890 PCI to PCI bridge (PCI express gpp port B)' class = bridge subclass = PCI-PCI pcib2@pci0:0:4:0: class=0x060400 card=0x5a141002 chip=0x5a181002 rev=0x00 hdr=0x01 vendor = 'ATI Technologies Inc' device = 'RD890 PCI to PCI bridge (PCI express gpp port D)' class = bridge subclass = PCI-PCI pcib3@pci0:0:9:0: class=0x060400 card=0x5a141002 chip=0x5a1c1002 rev=0x00 hdr=0x01 vendor = 'ATI Technologies Inc' device = 'RD890 PCI to PCI bridge (PCI express gpp port H)' class = bridge subclass = PCI-PCI ahci0@pci0:0:17:0: class=0x010601 card=0xb0021458 chip=0x43911002 rev=0x40 hdr=0x00 vendor = 'ATI Technologies Inc' device = 'SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode]' 
class = mass storage subclass = SATA ohci0@pci0:0:18:0: class=0x0c0310 card=0x50041458 chip=0x43971002 rev=0x00 hdr=0x00 vendor = 'ATI Technologies Inc' device = 'SB7x0/SB8x0/SB9x0 USB OHCI0 Controller' class = serial bus subclass = USB ehci0@pci0:0:18:2: class=0x0c0320 card=0x50041458 chip=0x43961002 rev=0x00 hdr=0x00 vendor = 'ATI Technologies Inc' device = 'SB7x0/SB8x0/SB9x0 USB EHCI Controller' class = serial bus subclass = USB ohci1@pci0:0:19:0: class=0x0c0310 card=0x50041458 chip=0x43971002 rev=0x00 hdr=0x00 vendor = 'ATI Technologies Inc' device = 'SB7x0/SB8x0/SB9x0 USB OHCI0 Controller' class = serial bus subclass = USB ehci1@pci0:0:19:2: class=0x0c0320 card=0x50041458 chip=0x43961002 rev=0x00 hdr=0x00 vendor = 'ATI Technologies Inc' device = 'SB7x0/SB8x0/SB9x0 USB EHCI Controller' class = serial bus subclass = USB none1@pci0:0:20:0: class=0x0c0500 card=0x00000000 chip=0x43851002 rev=0x42 hdr=0x00 vendor = 'ATI Technologies Inc' device = 'SBx00 SMBus Controller' class = serial bus subclass = SMBus hdac1@pci0:0:20:2: class=0x040300 card=0xa0021458 chip=0x43831002 rev=0x40 hdr=0x00 vendor = 'ATI Technologies Inc' device = 'SBx00 Azalia (Intel HDA)' class = multimedia subclass = HDA isab0@pci0:0:20:3: class=0x060100 card=0x439d1002 chip=0x439d1002 rev=0x40 hdr=0x00 vendor = 'ATI Technologies Inc' device = 'SB7x0/SB8x0/SB9x0 LPC host controller' class = bridge subclass = PCI-ISA pcib4@pci0:0:20:4: class=0x060401 card=0x00000000 chip=0x43841002 rev=0x40 hdr=0x01 vendor = 'ATI Technologies Inc' device = 'SBx00 PCI to PCI Bridge' class = bridge subclass = PCI-PCI ohci2@pci0:0:20:5: class=0x0c0310 card=0x50041458 chip=0x43991002 rev=0x00 hdr=0x00 vendor = 'ATI Technologies Inc' device = 'SB7x0/SB8x0/SB9x0 USB OHCI2 Controller' class = serial bus subclass = USB pcib5@pci0:0:21:0: class=0x060400 card=0x00001002 chip=0x43a01002 rev=0x00 hdr=0x01 vendor = 'ATI Technologies Inc' device = 'SB700/SB800 PCI to PCI bridge (PCIE port 0)' class = bridge subclass = PCI-PCI 
ohci3@pci0:0:22:0: class=0x0c0310 card=0x50041458 chip=0x43971002 rev=0x00 hdr=0x00 vendor = 'ATI Technologies Inc' device = 'SB7x0/SB8x0/SB9x0 USB OHCI0 Controller' class = serial bus subclass = USB ehci2@pci0:0:22:2: class=0x0c0320 card=0x50041458 chip=0x43961002 rev=0x00 hdr=0x00 vendor = 'ATI Technologies Inc' device = 'SB7x0/SB8x0/SB9x0 USB EHCI Controller' class = serial bus subclass = USB hostb1@pci0:0:24:0: class=0x060000 card=0x00000000 chip=0x16001022 rev=0x00 hdr=0x00 vendor = 'Advanced Micro Devices [AMD]' device = 'Family 15h Processor Function 0' class = bridge subclass = HOST-PCI hostb2@pci0:0:24:1: class=0x060000 card=0x00000000 chip=0x16011022 rev=0x00 hdr=0x00 vendor = 'Advanced Micro Devices [AMD]' device = 'Family 15h Processor Function 1' class = bridge subclass = HOST-PCI hostb3@pci0:0:24:2: class=0x060000 card=0x00000000 chip=0x16021022 rev=0x00 hdr=0x00 vendor = 'Advanced Micro Devices [AMD]' device = 'Family 15h Processor Function 2' class = bridge subclass = HOST-PCI hostb4@pci0:0:24:3: class=0x060000 card=0x00000000 chip=0x16031022 rev=0x00 hdr=0x00 vendor = 'Advanced Micro Devices [AMD]' device = 'Family 15h Processor Function 3' class = bridge subclass = HOST-PCI hostb5@pci0:0:24:4: class=0x060000 card=0x00000000 chip=0x16041022 rev=0x00 hdr=0x00 vendor = 'Advanced Micro Devices [AMD]' device = 'Family 15h Processor Function 4' class = bridge subclass = HOST-PCI hostb6@pci0:0:24:5: class=0x060000 card=0x00000000 chip=0x16051022 rev=0x00 hdr=0x00 vendor = 'Advanced Micro Devices [AMD]' device = 'Family 15h Processor Function 5' class = bridge subclass = HOST-PCI vgapci0@pci0:1:0:0: class=0x030000 card=0x515819da chip=0x0a6510de rev=0xa2 hdr=0x00 vendor = 'nVidia Corporation' device = 'GT218 [GeForce 210]' class = display subclass = VGA hdac0@pci0:1:0:1: class=0x040300 card=0x515819da chip=0x0be310de rev=0xa1 hdr=0x00 vendor = 'nVidia Corporation' device = 'High Definition Audio Controller' class = multimedia subclass = HDA 
xhci0@pci0:2:0:0: class=0x0c0330 card=0x50071458 chip=0x70231b6f rev=0x01 hdr=0x00 class = serial bus subclass = USB re0@pci0:3:0:0: class=0x020000 card=0xe0001458 chip=0x816810ec rev=0x06 hdr=0x00 vendor = 'Realtek Semiconductor Co., Ltd.' device = 'RTL8111/8168B PCI Express Gigabit Ethernet controller' class = network subclass = ethernet

How-To-Repeat: Transfer a file larger than 20 GB by FTP over a gigabit connection and let it run for a while.
Responsible Changed From-To: freebsd-bugs->freebsd-net Over to maintainer(s).
State Changed From-To: open->feedback Would you show me the output of "ifconfig re0"? Is just sending large files from your box through re0 enough to reproduce the issue?
Responsible Changed From-To: freebsd-net->yongari Take.
I have the same problem with a Realtek NIC with the same chip:

re0@pci0:2:0:0: class=0x020000 card=0x78231462 chip=0x816810ec rev=0x0c hdr=0x00

And ifconfig:

re0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=8209b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC,LINKSTATE>
	ether 44:8a:5b:9b:98:7f
	inet xxx.xxx.xxx.xxx netmask 0xffffffc0 broadcast xxx.xxx.xxx.xxx
	inet6 xxx::xxx:xxx:xxx%re0 prefixlen 64 scopeid 0x3
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
	media: Ethernet autoselect (1000baseT <full-duplex,master>)
	status: active
I can confirm this still happens in 10.1, with the following chipset:

re0@pci0:3:0:0: class=0x020000 card=0x81681849 chip=0x816810ec rev=0x11 hdr=0x00

re0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=8209b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC,LINKSTATE>
	ether XXXX
	inet6 XXXX
	inet XXXX netmask XXXX broadcast XXXX
	nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
	media: Ethernet autoselect (1000baseT <full-duplex>)
	status: active

@yongari: is there anything I can do to help debug this? PM me.
(In reply to freebsd from comment #5) pciconf(8) output for RealTek NICs is not useful for identifying the exact controller type because all RealTek PCI NICs use the same device ID. Could you show us the dmesg output and the "devinfo -rv" output? If you're able to reproduce watchdog timeouts at will, could you let us know how to trigger them?
My setup: an iMac running OS X 10.11.1 as the client, connected through a Netgear GS108T switch to a server running FreeBSD 10.1 on an ASRock N3700-ITX. To trigger the timeout, I copy a big data set (100G+) from the client to the server over NFS. It normally happens after copying 4-6 GB of data.

dmesg:

re0: <RealTek 8168/8111 B/C/CP/D/DP/E/F/G PCIe Gigabit Ethernet> port 0xe000-0xe0ff mem 0x91304000-0x91304fff,0x91300000-0x91303fff irq 18 at device 0.0 on pci3
re0: Using 1 MSI-X message
re0: Chip rev. 0x4c000000
re0: MAC rev. 0x00000000
miibus0: <MII bus> on re0
re0: Ethernet address: xxxxxx

devinfo -rv:

nexus0 apic0 I/O memory addresses: 0xfec00000-0xfec0001f ram0 I/O memory addresses: 0x0-0x9d7ff 0x100000-0x5f10efff 0x5f18a000-0x5f28dfff 0x5fa0c000-0x5fbd9fff 0x5ffc6000-0x5fffffff 0x100000000-0x27fffffff acpi0 Interrupt request lines: 0x9 I/O ports: 0x4e-0x4f 0x61 0x63 0x65 0x67 0x70 0x80-0x8f 0x92 0xb2-0xb3 0x280-0x28f 0x290-0x29f 0x2a0-0x2af 0x2b0-0x2bf 0x400-0x47f 0x500-0x5fe 0x680-0x69f I/O memory addresses: 0xe0000000-0xefffffff 0xfea00000-0xfeafffff 0xfed01000-0xfed01fff 0xfed03000-0xfed03fff 0xfed06000-0xfed06fff 0xfed08000-0xfed09fff 0xfed1c000-0xfed1cfff 0xfed80000-0xfedbffff 0xfee00000-0xfeefffff cpu0 pnpinfo _HID=none _UID=0 at handle=\_PR_.CPU0 ACPI I/O ports: 0x416 0x417 acpi_perf0 acpi_throttle0 est0 p4tcc0 coretemp0 cpufreq0 cpu1 pnpinfo _HID=none _UID=0 at handle=\_PR_.CPU1 ACPI I/O ports: 0x416 0x417 acpi_perf1 acpi_throttle1 est1 p4tcc1 coretemp1 cpufreq1 cpu2 pnpinfo _HID=none _UID=0 at handle=\_PR_.CPU2 ACPI I/O ports: 0x416 0x417 acpi_perf2 acpi_throttle2 est2 p4tcc2 coretemp2 cpufreq2 cpu3 pnpinfo _HID=none _UID=0 at handle=\_PR_.CPU3 ACPI I/O ports: 0x416 0x417 acpi_perf3 acpi_throttle3 est3 p4tcc3 coretemp3 cpufreq3 atrtc0 pnpinfo _HID=PNP0B00 _UID=0 at handle=\_SB_.RTC0 Interrupt request lines: 0x8 hpet0 pnpinfo _HID=PNP0103 _UID=0 at handle=\_SB_.HPET Interrupt request lines: 0x14 I/O memory addresses: 0xfed00000-0xfed003ff pcib0 pnpinfo _HID=PNP0A08
_UID=0 at handle=\_SB_.PCI0 I/O ports: 0xcf8-0xcff pci0 hostb0 pnpinfo vendor=0x8086 device=0x2280 subvendor=0x1849 subdevice=0x22b1 class=0x060000 at slot=0 function=0 handle=\_SB_.PCI0.NFC2 vgapci0 pnpinfo vendor=0x8086 device=0x22b1 subvendor=0x1849 subdevice=0x22b1 class=0x030000 at slot=2 function=0 handle=\_SB_.PCI0.GFX0 I/O ports: 0xf000-0xf03f I/O memory addresses: 0x80000000-0x8fffffff 0x90000000-0x90ffffff drm0 drmn0 ahci0 pnpinfo vendor=0x8086 device=0x22a3 subvendor=0x1849 subdevice=0x22a3 class=0x010601 at slot=19 function=0 handle=\_SB_.PCI0.SATA Interrupt request lines: 0x100 I/O ports: 0xf060-0xf07f I/O memory addresses: 0x91415000-0x914157ff ahcich0 at channel=0 I/O memory addresses: 0x91415100-0x9141517f ahcich1 at channel=1 I/O memory addresses: 0x91415180-0x914151ff xhci0 pnpinfo vendor=0x8086 device=0x22b5 subvendor=0x1849 subdevice=0x22b5 class=0x0c0330 at slot=20 function=0 handle=\_SB_.PCI0.XHC1 Interrupt request lines: 0x101 I/O memory addresses: 0x91400000-0x9140ffff usbus0 uhub0 uhub1 pnpinfo vendor=0x174c product=0x2074 devclass=0x09 devsubclass=0x00 sernum="" release=0x0100 mode=host intclass=0x09 intsubclass=0x00 i at bus=0 hubaddr=1 port=1 devaddr=2 interface=0 uhub2 pnpinfo vendor=0x05e3 product=0x0608 devclass=0x09 devsubclass=0x00 sernum="" release=0x8831 mode=host intclass=0x09 intsubclass=0x00 i at bus=0 hubaddr=1 port=5 devaddr=3 interface=0 uhub3 pnpinfo vendor=0x174c product=0x3074 devclass=0x09 devsubclass=0x00 sernum="" release=0x0100 mode=host intclass=0x09 intsubclass=0x00 i at bus=0 hubaddr=1 port=8 devaddr=4 interface=0 unknown pnpinfo vendor=0x8086 device=0x2298 subvendor=0x1849 subdevice=0x2298 class=0x108000 at slot=26 function=0 handle=\_SB_.PCI0.SEC0 I/O memory addresses: 0x91000000-0x910fffff 0x91100000-0x911fffff hdac0 pnpinfo vendor=0x8086 device=0x2284 subvendor=0x1849 subdevice=0xc892 class=0x040300 at slot=27 function=0 handle=\_SB_.PCI0.HDEF Interrupt request lines: 0x102 I/O memory addresses: 
0x91410000-0x91413fff hdacc0 pnpinfo vendor=0x10ec device=0x0892 revision=0x03 stepping=0x02 at cad=0 hdaa0 pnpinfo type=0x01 subsystem=0x1849c892 at nid=1 pcm0 at nid=20,22,21,24,26 pcm1 at nid=27,25 pcm2 at nid=30 hdacc1 pnpinfo vendor=0x8086 device=0x2883 revision=0x00 stepping=0x00 at cad=2 hdaa1 pnpinfo type=0x01 subsystem=0x80860101 at nid=1 pcm3 at nid=5 pcm4 at nid=6 pcm5 at nid=7 pcib1 pnpinfo vendor=0x8086 device=0x22c8 subvendor=0x1849 subdevice=0x22c8 class=0x060400 at slot=28 function=0 handle=\_SB_.PCI0.RP01 pci1 pcib2 pnpinfo vendor=0x8086 device=0x22ca subvendor=0x1849 subdevice=0x22ca class=0x060400 at slot=28 function=1 handle=\_SB_.PCI0.RP02 pci2 pcib3 pnpinfo vendor=0x8086 device=0x22cc subvendor=0x1849 subdevice=0x22cc class=0x060400 at slot=28 function=2 handle=\_SB_.PCI0.RP03 I/O ports: 0xe000-0xefff I/O memory addresses: 0x91300000-0x913fffff pci3 re0 pnpinfo vendor=0x10ec device=0x8168 subvendor=0x1849 subdevice=0x8168 class=0x020000 at slot=0 function=0 handle=\_SB_.PCI0.RP03.D01E Interrupt request lines: 0x103 pcib3 I/O port window: 0xe000-0xe0ff pcib3 memory window: 0x91300000-0x91303fff 0x91304000-0x91304fff miibus0 rgephy0 pnpinfo oui=0xe04c model=0x0 rev=0x0 at phyno=1 pcib4 pnpinfo vendor=0x8086 device=0x22ce subvendor=0x1849 subdevice=0x22ce class=0x060400 at slot=28 function=3 handle=\_SB_.PCI0.RP04 I/O ports: 0xd000-0xdfff I/O memory addresses: 0x91200000-0x912fffff pci4 ahci1 pnpinfo vendor=0x1b21 device=0x0612 subvendor=0x1849 subdevice=0x0612 class=0x010601 at slot=0 function=0 handle=\_SB_.PCI0.RP04.D024 Interrupt request lines: 0x104 pcib4 I/O port window: 0xd000-0xd01f 0xd020-0xd023 0xd030-0xd037 0xd040-0xd043 0xd050-0xd057 pcib4 memory window: 0x91200000-0x912001ff ahcich2 at channel=0 I/O memory addresses: 2434793728-2434793855 ahcich3 at channel=1 I/O memory addresses: 2434793856-2434793983 isab0 pnpinfo vendor=0x8086 device=0x229c subvendor=0x1849 subdevice=0x229c class=0x060100 at slot=31 function=0 
handle=\_SB_.PCI0.SBRG isa0 sc0 vga0 atkbdc0 I/O ports: 0x60 0x64 atkbd0 Interrupt request lines: 0x1 psm0 fdc0 ppc0 uart1 wbwd0 unknown pnpinfo vendor=0x8086 device=0x2292 subvendor=0x1849 subdevice=0x2292 class=0x0c0500 at slot=31 function=3 handle=\_SB_.PCI0.SBUS I/O ports: 0xf040-0xf05f I/O memory addresses: 0x91414000-0x9141401f unknown pnpinfo _HID=PNP0C09 _UID=1 at handle=\_SB_.PCI0.SBRG.H_EC unknown pnpinfo _HID=PNP0C0A _UID=0 at handle=\_SB_.PCI0.SBRG.H_EC.BAT0 unknown pnpinfo _HID=PNP0C0A _UID=1 at handle=\_SB_.PCI0.SBRG.H_EC.BAT1 unknown pnpinfo _HID=PNP0C0A _UID=2 at handle=\_SB_.PCI0.SBRG.H_EC.BAT2 unknown pnpinfo _HID=INT0800 _UID=0 at handle=\_SB_.PCI0.SBRG.FWHD I/O memory addresses: 0xff000000-0xffffffff unknown pnpinfo _HID=PNP0000 _UID=0 at handle=\_SB_.PCI0.SBRG.IPIC I/O ports: 0x20-0x21 0x24-0x25 0x28-0x29 0x2c-0x2d 0x30-0x31 0x34-0x35 0x38-0x39 0x3c-0x3d 0xa0-0xa1 0xa4-0xa5 0xa8-0xa9 0xac-0xad 0xb0-0xb1 0xb4-0xb5 0xb8-0xb9 0xbc-0xbd 0x4d0-0x4d1 acpi_sysresource0 pnpinfo _HID=PNP0C02 _UID=2 at handle=\_SB_.PCI0.SBRG.LDRC attimer0 pnpinfo _HID=PNP0100 _UID=0 at handle=\_SB_.PCI0.SBRG.TIMR Interrupt request lines: 0x0 I/O ports: 0x40-0x43 0x50-0x53 acpi_sysresource1 pnpinfo _HID=PNP0C02 _UID=0 at handle=\_SB_.PCI0.SBRG.SIO1 unknown pnpinfo _HID=PNP0303 _UID=0 at handle=\_SB_.PCI0.SBRG.PS2K unknown pnpinfo _HID=PNP0F03 _UID=0 at handle=\_SB_.PCI0.SBRG.PS2M unknown pnpinfo _HID=PNP0C08 _UID=0 at handle=\_SB_.PCI0.SBRG.HHMD uart0 pnpinfo _HID=PNP0501 _UID=0 at handle=\_SB_.PCI0.SBRG.UAR1 Interrupt request lines: 0x4 I/O ports: 0x3f8-0x3ff acpi_sysresource2 pnpinfo _HID=PNP0C02 _UID=1 at handle=\_SB_.PCI0.SBRG.SIO2 unknown pnpinfo _HID=PNP0501 _UID=12 at handle=\_SB_.PCI0.SBRG.UR11 unknown pnpinfo _HID=PNP0501 _UID=13 at handle=\_SB_.PCI0.SBRG.UR12 unknown pnpinfo _HID=PNP0501 _UID=14 at handle=\_SB_.PCI0.SBRG.UR13 unknown pnpinfo _HID=PNP0501 _UID=15 at handle=\_SB_.PCI0.SBRG.UR14 unknown pnpinfo _HID=80860F14 _UID=1 at handle=\_SB_.PCI0.SDHA unknown 
pnpinfo _HID=INT33BB _UID=2 at handle=\_SB_.PCI0.SDHB unknown pnpinfo _HID=BCM43241 _UID=0 at handle=\_SB_.PCI0.SDHB.BRCM unknown pnpinfo _HID=80860F14 _UID=3 at handle=\_SB_.PCI0.SDHC unknown pnpinfo _HID=INTL9C60 _UID=1 at handle=\_SB_.PCI0.GDM1 unknown pnpinfo _HID=INTL9C60 _UID=2 at handle=\_SB_.PCI0.GDM3 unknown pnpinfo _HID=80862288 _UID=1 at handle=\_SB_.PCI0.PWM1 unknown pnpinfo _HID=80862288 _UID=2 at handle=\_SB_.PCI0.PWM2 unknown pnpinfo _HID=8086228A _UID=1 at handle=\_SB_.PCI0.URT1 unknown pnpinfo _HID=BCM2E1A _UID=0 at handle=\_SB_.PCI0.URT1.BTH0 unknown pnpinfo _HID=BCM2E64 _UID=0 at handle=\_SB_.PCI0.URT1.BTH1 unknown pnpinfo _HID=8086228A _UID=2 at handle=\_SB_.PCI0.URT2 unknown pnpinfo _HID=BCM4752 _UID=0 at handle=\_SB_.PCI0.URT2.GPS0 unknown pnpinfo _HID=BCM4752 _UID=0 at handle=\_SB_.PCI0.URT2.GPS1 unknown pnpinfo _HID=8086228E _UID=1 at handle=\_SB_.PCI0.SPI1 unknown pnpinfo _HID=AUTH2750 _UID=0 at handle=\_SB_.PCI0.SPI1.FPNT unknown pnpinfo _HID=8086228E _UID=2 at handle=\_SB_.PCI0.SPI2 unknown pnpinfo _HID=8086228E _UID=3 at handle=\_SB_.PCI0.SPI3 unknown pnpinfo _HID=NXP1002 _UID=1 at handle=\_SB_.PCI0.NFC2 unknown pnpinfo _HID=808622C1 _UID=1 at handle=\_SB_.PCI0.I2C1 unknown pnpinfo _HID=SMO91D0 _UID=1 at handle=\_SB_.PCI0.I2C1.SHUB unknown pnpinfo _HID=808622C1 _UID=2 at handle=\_SB_.PCI0.I2C2 unknown pnpinfo _HID=10EC5670 _UID=1 at handle=\_SB_.PCI0.I2C2.RTEK unknown pnpinfo _HID=IMPJ0002 _UID=1 at handle=\_SB_.PCI0.I2C2.IMP2 unknown pnpinfo _HID=IMPJ0003 _UID=1 at handle=\_SB_.PCI0.I2C2.IMP3 unknown pnpinfo _HID=808622C1 _UID=3 at handle=\_SB_.PCI0.I2C3 unknown pnpinfo _HID=none _UID=0 at handle=\_SB_.PCI0.I2C3.CLK0 unknown pnpinfo _HID=INT33F7 _UID=1 at handle=\_SB_.PCI0.I2C3.CAMD unknown pnpinfo _HID=808622C1 _UID=4 at handle=\_SB_.PCI0.I2C4 unknown pnpinfo _HID=none _UID=0 at handle=\_SB_.PCI0.I2C4.CLK0 unknown pnpinfo _HID=none _UID=0 at handle=\_SB_.PCI0.I2C4.CLK1 unknown pnpinfo _HID=INTCF1A _UID=1 at handle=\_SB_.PCI0.I2C4.CAM1 
unknown pnpinfo _HID=INT33FB _UID=1 at handle=\_SB_.PCI0.I2C4.CAM2 unknown pnpinfo _HID=INTCF1C _UID=1 at handle=\_SB_.PCI0.I2C4.STRA unknown pnpinfo _HID=INT33BE _UID=1 at handle=\_SB_.PCI0.I2C4.CAM3 unknown pnpinfo _HID=INTCF1C _UID=1 at handle=\_SB_.PCI0.I2C4.STRB unknown pnpinfo _HID=808622C1 _UID=5 at handle=\_SB_.PCI0.I2C5 unknown pnpinfo _HID=MSFT0002 _UID=1 at handle=\_SB_.PCI0.I2C5.TPDC unknown pnpinfo _HID=808622C1 _UID=6 at handle=\_SB_.PCI0.I2C6 unknown pnpinfo _HID=ATML7000 _UID=0 at handle=\_SB_.PCI0.I2C6.TSC0 unknown pnpinfo _HID=ATML1000 _UID=1 at handle=\_SB_.PCI0.I2C6.TCS1 unknown pnpinfo _HID=MSFT0002 _UID=1 at handle=\_SB_.PCI0.I2C6.TPD1 unknown pnpinfo _HID=808622C1 _UID=7 at handle=\_SB_.PCI0.I2C7 unknown pnpinfo _HID=NXP7471 _UID=1 at handle=\_SB_.PCI0.I2C7.NFC1 unknown pnpinfo _HID=808622D8 _UID=0 at handle=\_SB_.PCI0.IISH unknown pnpinfo _HID=808622A8 _UID=1 at handle=\_SB_.PCI0.LPEA unknown pnpinfo _HID=ADMA22A8 _UID=1 at handle=\_SB_.PCI0.LPEA.ADMA unknown pnpinfo _HID=AMCR22A8 _UID=1 at handle=\_SB_.PCI0.AMCR unknown pnpinfo _HID=HAD022A8 _UID=1 at handle=\_SB_.PCI0.HAD0 unknown pnpinfo _HID=808622B7 _UID=0 at handle=\_SB_.PCI0.UOTG acpi_sysresource3 pnpinfo _HID=PNP0C02 _UID=3 at handle=\_SB_.PCI0.SPRC acpi_sysresource4 pnpinfo _HID=PNP0C02 _UID=1 at handle=\_SB_.PCI0.PDRC pci_link0 pnpinfo _HID=PNP0C0F _UID=1 at handle=\_SB_.LNKA pci_link1 pnpinfo _HID=PNP0C0F _UID=2 at handle=\_SB_.LNKB pci_link2 pnpinfo _HID=PNP0C0F _UID=3 at handle=\_SB_.LNKC pci_link3 pnpinfo _HID=PNP0C0F _UID=4 at handle=\_SB_.LNKD pci_link4 pnpinfo _HID=PNP0C0F _UID=5 at handle=\_SB_.LNKE pci_link5 pnpinfo _HID=PNP0C0F _UID=6 at handle=\_SB_.LNKF pci_link6 pnpinfo _HID=PNP0C0F _UID=7 at handle=\_SB_.LNKG pci_link7 pnpinfo _HID=PNP0C0F _UID=8 at handle=\_SB_.LNKH unknown pnpinfo _HID=PNP0C0D _UID=0 at handle=\_SB_.LID0 unknown pnpinfo _HID=none _UID=0 at handle=\_SB_.USBC acpi_button0 pnpinfo _HID=PNP0C0C _UID=0 at handle=\_SB_.PWRB acpi_button1 pnpinfo 
_HID=PNP0C0E _UID=0 at handle=\_SB_.SLPB unknown pnpinfo _HID=INT0002 _UID=1 at handle=\_SB_.GPED unknown pnpinfo _HID=INT33FF _UID=1 at handle=\_SB_.GPO0 unknown pnpinfo _HID=INT33FF _UID=2 at handle=\_SB_.GPO1 unknown pnpinfo _HID=INT33FF _UID=3 at handle=\_SB_.GPO2 unknown pnpinfo _HID=INT33FF _UID=4 at handle=\_SB_.GPO3 unknown pnpinfo _HID=INTCFD9 _UID=0 at handle=\_SB_.TBAD unknown pnpinfo _HID=INT33BD _UID=1 at handle=\_SB_.MBID unknown pnpinfo _HID=ACPI000C _UID=0 at handle=\_SB_.PAGD unknown pnpinfo _HID=INT3497 _UID=0 at handle=\_SB_.PIND unknown pnpinfo _HID=PNP0C31 _UID=1 at handle=\_SB_.TPM_ I/O memory addresses: 0xfed40000-0xfed44fff acpi_timer0 pnpinfo unknown at unknown ACPI I/O ports: 0x408-0x40b
(In reply to freebsd from comment #7) Thanks for the info. I see you disabled TSO. Could you also disable TX checksum offloading on re(4) and test again? It seems your controller is an RTL8168G. There are a couple of improvements for RTL8168G and later controllers in 10.2-RELEASE. Could you try the same test on 10.2-RELEASE? I guess you can boot from a USB memstick without upgrading your box. (When you boot from the memstick, make sure to cold-boot.) If neither helps, could you let me know the details of your network setup and environment? What NFS options do you use? If you don't use NFS, can you trigger the issue using other applications like scp or rsync? I'm assuming here that you don't use bridge(4), lagg(4), polling(4), netmap(4), pf(4), ipfw(4), etc. Please also let me know whether re(4) recovers from the error after you see watchdog timeouts.
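For anyone repeating the checks requested above, the following is a minimal sketch rather than anything from the re(4) driver or the maintainer. The helper names count_watchdog_timeouts and has_txcsum are hypothetical and exist only for illustration. It assumes syslog lines in the "kernel: re0: watchdog timeout" form quoted in this report and an "options=" line as printed by ifconfig.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of FreeBSD.

# Count watchdog timeouts logged for one interface.
#   $1 = interface name (e.g. re0)
#   $2 = syslog-style file (e.g. /var/log/messages)
count_watchdog_timeouts() {
    grep -c "kernel: $1: watchdog timeout" "$2"
}

# Succeed if the ifconfig output read from stdin still lists TXCSUM
# in its options=...<...> capability list.
has_txcsum() {
    grep -q 'options=.*[<,]TXCSUM[,>]'
}

# Typical use on the affected box (run as root; not executed here):
#   ifconfig re0 -txcsum
#   ifconfig re0 | has_txcsum && echo "TXCSUM is still enabled"
#   count_watchdog_timeouts re0 /var/log/messages
```

Comparing the count before and after a test transfer gives a rough answer to whether a given setting changes how often the timeout occurs.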
It does recover sometimes, but after several iterations the network connection just dies. Interestingly, it seems to be related to NFS.

Starting with a cold boot, I disabled TXCSUM and tried an NFS copy: it times out after a few gigabytes, but recovers and continues. In general it is slower than expected. I cancelled after a few gigabytes.

Tried with rsync (TXCSUM still disabled): it completes 300+ GB without problems.

Tried again with NFS and left it copying. It times out and recovers several times, until it doesn't recover.

I will try with 10.2-RELEASE next.

Disabling TXCSUM:

# ifconfig re0
re0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=8209b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC,LINKSTATE>
	ether XXXXXX
	inet6 XXXXXX%re0 prefixlen 64 scopeid 0x1
	inet XXXXXX netmask 0xffffff00 broadcast XXXXXX
	nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
	media: Ethernet autoselect (1000baseT <full-duplex>)
	status: active
[root@hp ~]# ifconfig re0 -txcsum
[root@hp ~]# ifconfig re0
re0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=82099<RXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC,LINKSTATE>
	ether XXXXXXX
	inet6 XXXXXXX%re0 prefixlen 64 scopeid 0x1
	inet XXXXXX netmask 0xffffff00 broadcast XXXXXX
	nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
	media: Ethernet autoselect (1000baseT <full-duplex>)
	status: active

Message log:

[root@hp ~]# tail -f /var/log/messages
Nov 25 22:47:42 hp kernel: Trying to mount root from zfs:zroot/ROOT/10.1-RELEASE []...
Nov 25 22:47:43 hp ntpd[748]: ntpd 4.2.4p5-a (1)
Nov 25 22:47:52 hp ntpd[749]: time reset +0.409874 s
Nov 25 22:48:17 hp kernel: re0: link state changed to DOWN
Nov 25 22:48:21 hp kernel: re0: link state changed to UP
Nov 25 22:48:21 hp devd: Executing '/etc/rc.d/dhclient quietstart re0'
Nov 25 22:48:26 hp dhclient: New IP Address (re0): XXXXXX
Nov 25 22:48:26 hp dhclient: New Subnet Mask (re0): XXXXX
Nov 25 22:48:26 hp dhclient: New Broadcast Address (re0): XXXXX
Nov 25 22:48:26 hp dhclient: New Routers (re0): XXXXX
Nov 25 22:49:32 hp rpc.statd: Invalid hostname to sm_mon: iMac.local
Nov 25 22:49:32 hp kernel: Local NSM refuses to monitor iMac.local
Nov 25 22:51:26 hp kernel: re0: watchdog timeout
Nov 25 22:51:26 hp kernel: re0: link state changed to DOWN
Nov 25 22:51:31 hp kernel: re0: link state changed to UP
Nov 25 22:51:31 hp devd: Executing '/etc/rc.d/dhclient quietstart re0'
Nov 25 22:51:31 hp dhclient: New IP Address (re0): XXXX
Nov 25 22:51:31 hp dhclient: New Subnet Mask (re0): XXXXX
Nov 25 22:51:31 hp dhclient: New Broadcast Address (re0): XXXXX
Nov 25 22:51:31 hp dhclient: New Routers (re0): XXXXX
Nov 25 23:10:55 hp kernel: re0: watchdog timeout
Nov 25 23:10:55 hp kernel: re0: link state changed to DOWN
Nov 25 23:10:59 hp kernel: re0: link state changed to UP
Nov 25 23:10:59 hp devd: Executing '/etc/rc.d/dhclient quietstart re0'
Nov 25 23:10:59 hp dhclient: New IP Address (re0): XXXX
Nov 25 23:10:59 hp dhclient: New Subnet Mask (re0): XXXX
Nov 25 23:10:59 hp dhclient: New Broadcast Address (re0): XXXX
Nov 25 23:10:59 hp dhclient: New Routers (re0): XXXX
Nov 25 23:15:46 hp pkg: pkg upgraded: 1.6.1_2 -> 1.6.2
Nov 25 23:15:52 hp pkg: rsync-3.1.1_4 installed
Nov 25 23:16:28 hp rpc.statd: Unsolicited notification from host iMac.local
Nov 25 23:58:35 hp ntpd[749]: time reset -0.372588 s
Nov 26 00:33:17 hp ntpd[749]: time reset +0.252420 s
Nov 26 09:34:23 hp kernel: re0: watchdog timeout
Nov 26 09:34:23 hp kernel: re0: link state changed to DOWN
Nov 26 09:34:27 hp kernel: re0: link state changed to UP
Nov 26 09:34:27 hp devd: Executing '/etc/rc.d/dhclient quietstart re0'
Nov 26 09:34:27 hp dhclient: New IP Address (re0): XXXX
Nov 26 09:34:27 hp dhclient: New Subnet Mask (re0): XXXX
Nov 26 09:34:27 hp dhclient: New Broadcast Address (re0): XXXX
Nov 26 09:34:27 hp dhclient: New Routers (re0): XXXX
Nov 26 09:55:08 hp kernel: re0: watchdog timeout
Nov 26 09:55:08 hp kernel: re0: link state changed to DOWN
Nov 26 09:55:09 hp ntpd[749]: sendto(185.11.138.90) (fd=23): No route to host
Nov 26 09:55:12 hp kernel: re0: link state changed to UP
Nov 26 09:55:12 hp devd: Executing '/etc/rc.d/dhclient quietstart re0'
Nov 26 09:55:13 hp dhclient: New IP Address (re0): XXXX
Nov 26 09:55:13 hp kernel: arpresolve: can't allocate llinfo for XXXX on re0
Nov 26 09:55:13 hp dhclient: New Subnet Mask (re0): XXXX
Nov 26 09:55:13 hp dhclient: New Broadcast Address (re0): XXXX
Nov 26 09:55:13 hp dhclient: New Routers (re0): XXXX
Nov 26 10:02:44 hp kernel: re0: watchdog timeout
Nov 26 10:02:44 hp kernel: re0: link state changed to DOWN
Nov 26 10:02:48 hp kernel: re0: link state changed to UP
Nov 26 10:02:48 hp devd: Executing '/etc/rc.d/dhclient quietstart re0'
Nov 26 10:02:48 hp dhclient: New IP Address (re0): XXXX
Nov 26 10:02:48 hp dhclient: New Subnet Mask (re0): XXXX
Nov 26 10:02:48 hp dhclient: New Broadcast Address (re0): XXXX
Nov 26 10:02:48 hp dhclient: New Routers (re0): XXXX
Nov 26 10:19:29 hp kernel: re0: watchdog timeout
Nov 26 10:19:29 hp kernel: re0: link state changed to DOWN
Nov 26 10:19:33 hp kernel: re0: link state changed to UP
Nov 26 10:19:33 hp devd: Executing '/etc/rc.d/dhclient quietstart re0'
Nov 26 10:19:33 hp dhclient: New IP Address (re0): XXXX
Nov 26 10:19:33 hp dhclient: New Subnet Mask (re0): XXXX
Nov 26 10:19:33 hp dhclient: New Broadcast Address (re0): XXXX
Nov 26 10:19:33 hp dhclient: New Routers (re0): XXXX
Nov 26 10:59:49 hp kernel: re0: watchdog timeout
Nov 26 10:59:49 hp kernel: re0: link state changed to DOWN
Nov 26 10:59:53 hp kernel: re0: link state changed to UP
Nov 26 10:59:53 hp devd: Executing '/etc/rc.d/dhclient quietstart re0'
Nov 26 11:00:00 hp dhclient: New IP Address (re0): XXXX
Nov 26 11:00:00 hp kernel: arpresolve: can't allocate llinfo for XXXX on re0
Nov 26 11:00:00 hp dhclient: New Subnet Mask (re0): XXXXX
Nov 26 11:00:00 hp dhclient: New Broadcast Address (re0): XXXX
Nov 26 11:00:00 hp dhclient: New Routers (re0): XXXX
packet_write_wait: Connection to XXXX: Broken pipe
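To compare runs (for example NFS versus rsync, or before and after a driver change) it may help to count the timeouts instead of reading the log by eye. A small sketch, assuming the syslog line format shown in this comment; the function name is only illustrative.

```sh
# Count "watchdog timeout" kernel messages for one interface in
# syslog-style input on stdin. The interface defaults to re0.
# count_watchdog_timeouts is a hypothetical helper name.
count_watchdog_timeouts() {
    ifname=${1:-re0}
    # grep -c prints the number of matching lines (0 if there are none).
    grep -c "kernel: ${ifname}: watchdog timeout"
}
```

Possible usage: "count_watchdog_timeouts re0 < /var/log/messages".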
Created attachment 173161: pciconf
I can confirm this issue. It is easy to reproduce: just send a few (sometimes even one) packages through the network interface (update a large number of packages or just a few big ones, download a few small files or one huge one, browse the web: a page with many small files or just one big one, etc.). It is not related to a specific card/chipset (my pciconf output is in the attachment) or to powerd, as was mentioned in https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=208205; the problem is related to re itself. It is possible to work around it temporarily (warning: terrible solution): remove re from the default kernel configuration and, when it starts to lag again (i.e. watchdog timeout), just reload it (via a shell script).
The problem still exists in version 11.1. Under low load "re" works fine, but a large bidirectional load completely hangs the network. Perhaps recompilation would help; see Marc Mach: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=208205 But the original driver from Realtek is for FreeBSD versions 7 or 8. There is also a patch, which supposedly works: https://forums.freebsd.org/threads/55861/#post-324491 On the other hand, the advice from this thread: https://forums.freebsd.org/threads/55306/ ...does not work on my computers. Is there a chance to revive the topic? zjk
I can confirm the issue too, and it's a plague for many users. When there is high traffic the network stalls for a few seconds and the timeout appears. The only way to fix it was to install the Realtek driver from their website. The following info was taken with the re driver from Realtek (the timeout message disappears because it's commented out in the code, but the issue also goes away).

# dmesg
re0: <Realtek PCIe GBE Family Controller> port 0xd000-0xd0ff mem 0xd0604000-0xd0604fff,0xd0600000-0xd0603fff irq 18 at device 0.0 on pci2
re0: Using Memory Mapping!
re0: Using 1 MSI-X message
re0: ASPM disabled
re0: version:1.93

# pciconf -lvbc
re0@pci0:3:0:0: class=0x020000 card=0xe0001458 chip=0x816810ec rev=0x0c hdr=0x00
    vendor = 'Realtek Semiconductor Co., Ltd.'
    device = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller'
    class = network
    subclass = ethernet
    bar [10] = type I/O Port, range 32, base rxd000, size 256, enabled
    bar [18] = type Memory, range 64, base rxd0604000, size 4096, enabled
    bar [20] = type Prefetchable Memory, range 64, base rxd0600000, size 16384, enabled
    cap 01[40] = powerspec 3 supports D0 D1 D2 D3 current D0
    cap 05[50] = MSI supports 1 message, 64 bit
    cap 10[70] = PCI-Express 2 endpoint MSI 1 max data 128(128) link x1(x1) speed 2.5(2.5) ASPM disabled(L0s/L1)
    cap 11[b0] = MSI-X supports 4 messages, enabled Table in map 0x20[0x0], PBA in map 0x20[0x800]
    cap 03[d0] = VPD
    ecap 0001[100] = AER 1 0 fatal 0 non-fatal 3 corrected
    ecap 0002[140] = VC 1 max VC0
    ecap 0003[160] = Serial 1 01000000684ce000
    ecap 0018[170] = LTR 1

# devinfo -rv
pcib3 pnpinfo vendor=0x8086 device=0x0f4c subvendor=0x1458 subdevice=0x1000 class=0x060400 at slot=28 function=2 dbsf=pci0:0:28:2 handle=\_SB_.PCI0.RP03
    Interrupt request lines: 0x105
    I/O ports: 0xd000-0xdfff
    I/O memory addresses: 0xd0600000-0xd06fffff
    PCI domain 0 bus numbers: 3
  pci2
      pcib3 bus numbers: 3
    re0 pnpinfo vendor=0x10ec device=0x8168 subvendor=0x1458 subdevice=0xe000 class=0x020000 at slot=0 function=0 dbsf=pci0:3:0:0 handle=\_SB_.PCI0.RP03.PXSX
        Interrupt request lines: 0x106
        pcib3 I/O port window: 0xd000-0xd0ff
        pcib3 memory window: 0xd0600000-0xd0603fff 0xd0604000-0xd0604fff

# ifconfig re0
re0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=201b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,WOL_MAGIC>
        ether xx:xx:xx:xx:xx:xx
        inet 192.168.0.1 netmask 0xffffff00 broadcast 192.168.0.255
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
Alex Dupre: "The only way to fix it was to install the Realtek driver from their website."

Oh no. It does not always work that way... Currently I have the latest driver downloaded from the Realtek site, 1.94.01, compiled into the kernel:

re0: <Realtek PCIe GBE Family Controller> port 0xe000-0xe0ff mem 0x91004000-0x91004fff, 0x91000000-0x91003fff irq 20 at device 0.0 on pci2
re0: Using Memory Mapping!
re0: Using 1 MSI-X message
re0: ASPM disabled
re0: version: 1.94.01

Unfortunately, after a while, when the load is very large, the interface stops working - but without watchdog messages! The case is confirmed by another observation. On several computers I need to use lagg. But unfortunately, after some time, the computers generate messages:

[1177339] re0: Interface stopped DISTRIBUTING, possible flapping
[1199071] re0: Interface stopped DISTRIBUTING, possible flapping

and after that, the flap counter in lagg increases:

lagg statistics:
        active ports: 2
        flapping: 4

After some time, the lagg interface freezes until the machine is rebooted. :(

lagg0: flags=28943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST,PPROMISC> metric 0 mtu 1500
        options=2018<VLAN_MTU,VLAN_HWTAGGING,WOL_MAGIC>
        ether xxxx
        inet xxxx netmask xxxx broadcast xxxx
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: active
        groups: lagg
        laggproto lacp lagghash l2,l3,l4
re0: flags=28943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST,PPROMISC> metric 0 mtu 1500
        options=2018<VLAN_MTU,VLAN_HWTAGGING,WOL_MAGIC>
        ether xxxx
        hwaddr xxxx
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet 1000baseT <full-duplex>
        status: active
re1: flags=28943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST,PPROMISC> metric 0 mtu 1500
        options=2018<VLAN_MTU,VLAN_HWTAGGING,WOL_MAGIC>
        ether xxxx
        hwaddr xxxx
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet 1000baseT <full-duplex>
        status: active
I see many differences between the 1.93 and 1.94.01 versions. Did you also try 1.93, and did you experience the issue with it as well?
No - I am using 1.94 from around September/October (the driver 1.94.01 is from August 2017). Previously I used the original 11.0-RELEASE, which, in my opinion, caused less trouble than 11.1-RELEASE. In any case, "comment 19" in this thread may be a reasonable remark: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=208205 Maybe it is not necessarily the fault of the Realtek drivers... Lagg with lacp should tolerate the temporary flaps and renew the connection. However, for me it only works for a certain time, then the connection is lost (I must reboot). But I may be wrong.
In case you want to try, you can download the 1.93 version from here: http://12244.wpc.azureedge.net/8012244/drivers/rtdrivers/cn/nic/0006-rtl_bsd_drv_v193.tgz
I just started tests with version 1.93. I will report the results in two to three days.
Four days of tests. I chose four computers, two with different mainboards (but Realtek interfaces). Every computer uses lagg / lacp.

1. Normal work. I received only one message within 4 days: "Interface stopped DISTRIBUTING, possible flapping". Unfortunately, I do not know in which situation.
2. Jperf (server, client) - several tests (20 minutes). No explicit messages about problems, but there were single complete network outages on the FreeBSD computers.
3. Jperf with Windows computers (including the same motherboard) - several tests (20 minutes). No explicit messages about problems, but there were single complete network outages on the FreeBSD computers.
4. Network access (samba, moosefs): no explicit messages about problems, but there were single complete network outages on the FreeBSD computers.

The network outages did not increase the flap counter in lagg (?). However, they caused the command line to freeze for a few seconds (during high transfers).

1.93 is certainly better than 1.94, but still difficult to accept.

zjk
This is a serious and annoying issue. Most cheap motherboards have Realtek network cards and this issue is forcing users to switch to other operating systems when even using the old drivers from Realtek website doesn't help. I haven't the know-how to debug and fix the issue, but I have no problems in contributing to a bounty for someone that is going to definitely fix the re driver in base. Is anyone interested in accepting the challenge?
Put this back on a visible bug list
As stated in https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=208205, the generic driver fails under load. Replacing the card with another Realtek card did not help. Replacing the Realtek card with an Intel card did solve all problems.
I managed to solve this issue by disabling MSI and MSI-X. Put the following lines into /boot/loader.conf:

hw.re.msi_disable="1"
hw.re.msix_disable="1"

You see, MSI/MSI-X interrupt processing supposedly eliminates the need to perform an extra read from a device register after receiving an interrupt which signals that a DMA write has finished. However, there is some kind of problem, either in the driver or in the chip itself, in the way it handles these interrupts. With MSI and MSI-X disabled, the driver switches to the older interrupt filter handler, and thus probably performs an extra read from some device register to wait for the DMA transfer to memory to be ready (according to Wikipedia, when using legacy interrupts this is the only way to ensure the DMA transfer wasn't buffered by the chipset etc.). So I would suggest that everybody watching this thread try whether disabling MSI and MSI-X helps on their system. It might not apply to all Realtek NICs, but on my machine this workaround is valid.
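For anyone who wants to script the workaround above across several machines, here is a rough sketch that sets a tunable in a loader.conf-style file without duplicating an existing entry. The function name and the temp file template are assumptions made for illustration; only the two hw.re.* tunables come from the comment above.

```sh
# Set NAME="VALUE" in a loader.conf-style file, replacing any earlier
# assignment of the same name. set_loader_tunable is a hypothetical helper.
set_loader_tunable() {
    file=$1
    name=$2
    value=$3
    tmp=$(mktemp "${TMPDIR:-/tmp}/loader.conf.XXXXXX") || return 1
    if [ -f "$file" ]; then
        # Keep every line that does not start with NAME=
        awk -v n="$name=" 'index($0, n) != 1' "$file" > "$tmp"
    fi
    printf '%s="%s"\n' "$name" "$value" >> "$tmp"
    # Write back in place so the file keeps its owner and mode.
    cat "$tmp" > "$file" && rm -f "$tmp"
}
```

Possible usage as root: "set_loader_tunable /boot/loader.conf hw.re.msi_disable 1" and the same for hw.re.msix_disable. Loader tunables are read at boot, so a reboot is needed before they apply.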
Disabling MSI/MSI-X was proposed as solution in the past. I've just tried again to be sure, it helps, but the issue doesn't disappear completely. With it I can successfully run the google (m-lab) speed test, but I still get a watchdog timeout and network reset as soon as I start the Ookla speed test. Fully reproducible.
hw.re.msi_disable
hw.re.msix_disable

I tested this solution for a few days (it can already be found elsewhere on the internet). There is no visible effect on my computers: the network goes down very quickly. But maybe it depends on the network card chipset? However, I highly recommend this analysis: https://forums.freebsd.org/threads/10-2-release-re0-watchdog-timeout.55306/#post-337045 There are some extremely important remarks. One important tip: this may be the result of overloading the processor, so in general a problem for low-performance processors. Or vice versa: of a "computationally demanding" network card chipset and, finally, a "programmatically extended" driver. That is probably why the "built-in" FreeBSD driver is so much "slimmer" than the full version from the Realtek website: it may be intended to run on less powerful processors. But I cannot fully evaluate everything in this analysis. "Watchdog timeout" messages also occur after the transmission has stopped: processor load drops to several percent, but watchdog timeout messages still appear every few seconds. In general, a reset is needed to restore normal operation of the interface. As a workaround you can use a "patch": instead of, for example, limiting the connection speed to 100 Mb, you can use, for example, dummynet for flow / bandwidth management. It is still not a solution to the problem of the driver itself.
After upgrading to 11.2-RELEASE the problem seems to have disappeared on my machine. Looking at dmesg, the only difference is that the following line is missing at boot: re0: turning off MSI enable bit.
After upgrading several machines to 11.2 and all-night tests: nothing better, still a watchdog fault. zjk
I still see a few watchdog errors in the logs, but I'm unable to trigger them voluntarily, even with very high traffic. While before it was enough to run a single speed test to drop the connection, now I can saturate the link without a watchdog timeout. The connection is quite stable now. The issue is likely not solved, but it's much harder to be triggered in my scenario.
The following configuration is very promising:
- kernel 11.2-RELEASE recompiled together with
- re driver v. 1.93 (from the Realtek site).

Effect:
- NO (absolutely none) watchdog timeouts,
- FULL speed in both directions (I will still test different situations),
- works well with lagg(!).

Now I am compiling Realtek version 1.94 with 11.2-RELEASE. I will let you know what the effects are. zjk
Of course you won't get the watchdog timeout error with the driver taken from the Realtek website: it has been commented out of the source code, so it's not a real clue. That said, with 11.0 and 11.1 I've always used the 1.93 version without issues.
I see problems with the 1.94 or 1.95 realtek driver and 11.2-RELEASE. Data transfer stops without messages after about a week of load. With 11.1 there is no problem.
I hit this a couple of times on a NFS server running 12.0-ALPHA3 while running highly parallel buildworlds with an NFS-mounted obj dir.
Created attachment 196815: System load average and usage - monitorix
A. After longer tests I must retract the previous optimistic news. We are talking about 11.2-RELEASE + the 1.93 Realtek driver:

1. Suspensions (the computer stalls) still occur. They are only shorter, though still cumbersome. See the attachment above. Generally, at the beginning the interface works quickly; after some time it slows down and shows signs of loss.
2. There are still messages about the interface suspension. Because I use lagg it looks like this:
+ [20445] re1: Interface stopped DISTRIBUTING, possible flapping
+ [48114] re0: Interface stopped DISTRIBUTING, possible flapping

B. Regarding Alex's statements. This is a real problem. Of course, the "watchdog timeout" message itself is not harmful. The important thing is that, in the function, the message is followed by a reset and re-initialisation of the interface. This unfortunately results in the loss or partial destruction of transmitted files / frames (which unfortunately I have experienced many times). Using version 1.93-1.94 is therefore an improvement in that not only does the message disappear (commented out of the function, as Alex correctly writes), but the files are also not damaged during transmission (yet to be checked!). Version 11.2-RELEASE certainly generates hundreds of "watchdog timeout" messages for me, but today I do not know whether it prevents damage or loss of transmitted data (to be checked). I see:

/* Cancel pending I/O and free all RX/TX buffers. */
re_stop(sc);
/* Put controller into known state. */
re_reset(sc);

This means that the information being transmitted is dropped and lost.

C. However, I do not agree with Alex that it is good. Perhaps it is good enough for a laptop, but not for a server. It is still terrible.

D. Test 11.2 + 1.94: I have not started yet.
I don't think I've ever said this issue is good :-) What I said is that in my environment, when I switched to 11.2-RELEASE, it was happening less frequently. With the FreeBSD driver it is easy to detect, because it prints the timeout message and resets the interface after 5 ticks, effectively interrupting any connections for a few seconds. The Realtek driver doesn't reset the interface and doesn't print the message, so a short timeout might go unnoticed. To add new info to the thread: recently I tried increasing the watchdog timeout of the FreeBSD driver, changing it from 5 to 50 ticks. The result was that the connection interruption lasted longer, so the interface really seems stuck and the reset seems to be the only solution. In the last months I've also tried Realtek drivers 1.94.01 and 1.95 (the one I'm currently running) and I'm not seeing differences from 1.93; in my scenario it seems to work well enough (== I'm not able to detect any connection drops during normal usage, which doesn't mean they are not happening at all).
Ok, ok Alex - I understand. Therefore, for doubters, I added a chart from monitorix two posts earlier. For a 24/7 server you can see how the link hangs (and this happens on a server that is not too heavily loaded...); only a reset restores a good response for a longer time. For problem solvers I must add: on most computers I use lagg. Evidently this "overlay" on the driver increases the frequency of hanging (compared to computers with re without lagg). But this is a separate problem for a separate thread.
Friends, why has such an annoying problem with such a widespread network chipset persisted for so long on such a strong, network-oriented OS as FreeBSD? Maybe we can crowdfund a fix? I can give my $10.
(In reply to George from comment #37) I too would contribute to a bug bounty to fix this. I'm using an embedded system that has this chipset and I've switched to a USB 3.0 NIC but I'd love to be able to go back to the onboard NIC.
I'd also like to add a "me too" here. I've been evaluating how well FreeBSD works with dual port NICs with the intent of using multiport 10G (Mellanox?) cards if it performs well. Years of experience with re(4) NICs have shown that they are stable performers, and inexpensive, which is why I chose them for the trial. It worked well for some 4-6 months. But we're now plagued with watchdog timeout errors, with the *only* working solution being to bounce the box(es). I'm *guessing* greater pressure on the wire(s) to be the reason for it happening now, and not earlier. Any insight (with a cure) would be *greatly* appreciated. Details follow:

11.1-STABLE r327867 amd64

watchdog timeout
watchdog timeout
watchdog timeout
watchdog timeout
watchdog timeout
watchdog timeout
watchdog timeout
watchdog timeout
watchdog timeout
watchdog timeout
watchdog timeout
watchdog timeout

rc.conf(5)
ifconfig_re0="inet AA.BBB.CC.XX netmask 255.255.255.0 rxcsum txcsum tso4"
ifconfig_re1="inet AA.BBB.CC.WW netmask 255.255.255.0 rxcsum txcsum tso4"
ifconfig_re1_alias0="inet AA.BBB.CC.ZZ netmask 255.255.255.0"

ifconfig(8)
re0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8219b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_MAGIC,LINKSTATE>
        ether 00:13:3b:0f:13:44
        hwaddr 00:13:3b:0f:13:44
        inet6 fe80::213:3bff:fe0f:1344%re0 prefixlen 64 scopeid 0x1
        inet AA.BBB.CC.XX netmask 0xffffff00 broadcast 24.113.41.255
        nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
re1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8219b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_MAGIC,LINKSTATE>
        ether 00:13:3b:0f:13:45
        hwaddr 00:13:3b:0f:13:45
        inet AA.BBB.CC.WW netmask 0xffffff00 broadcast 24.113.41.255
        inet AA.BBB.CC.ZZ netmask 0xffffff00 broadcast 24.113.41.255
        inet6 fe80::213:3bff:fe0f:1345%re1 prefixlen 64 scopeid 0x2
        nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
        inet 127.0.0.1 netmask 0xff000000
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        groups: lo

pciconf(8)
re0@pci0:5:0:0: class=0x020000 card=0x012310ec chip=0x816810ec rev=0x07 hdr=0x00
    vendor = 'Realtek Semiconductor Co., Ltd.'
    device = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller'
    class = network
    subclass = ethernet
re1@pci0:6:0:0: class=0x020000 card=0x012310ec chip=0x816810ec rev=0x07 hdr=0x00
    vendor = 'Realtek Semiconductor Co., Ltd.'
    device = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller'
    class = network
    subclass = ethernet

Thanks again!

--Chris
With the setup shown just above this comment, I was able to overcome the watchdog timeouts and related problems by bumping up the values of both kern.ipc.nmbjumbop and kern.ipc.nmbclusters. Since doing so I have had no issues at all for more than 2 weeks. IMHO the default values for these items are too small. They should either have a larger default or become dynamic, allowing them to self-adjust according to the hardware involved. HTH --Chris
My story: I also had this issue. My two home routers run 11.3-RELEASE-p5 amd64 and have Realtek NICs like this one:

re0 pnpinfo vendor=0x10ec device=0x8168 subvendor=0x1458 subdevice=0xe000 class=0x020000 at slot=0 function=0 dbsf=pci0:1:0:0 handle=\_SB_.PCI0.RP01.PXSX
    Interrupt request lines: 258
    pcib1 I/O port window: 0xe000-0xe0ff
    pcib1 memory window: 0xd0700000-0xd0703fff 0xd0704000-0xd0704fff

I switched to Realtek's driver, but still had watchdog timeouts. After some Googling I found a discussion about issues with jumbo buffers. After checking this idea I found confirmation: after some time (depending on traffic rate/amount) memory becomes fragmented and requests for 9k buffers fail. Now I use the 1.95 driver from the vendor, but with a very, very dirty hack: I replaced Jumbo_Frame_9k with the value 3072, so the re driver now uses only 4k buffers. I'm OK with an MTU of 1500 (this change limits the maximum MTU). Now it is stable, with no watchdog timeouts and no more buffer failures:

artem@gate$ vmstat -z | grep buf
mbuf_packet: 256, 2362080, 2, 1265, 9732282, 0, 0
mbuf: 256, 2362080, 514, 1789,17068586803, 0, 0
mbuf_cluster: 2048, 369076, 1267, 21, 531078, 0, 0
mbuf_jumbo_page: 4096, 184537, 513, 263,7618134369, 0, 0
mbuf_jumbo_9k: 9216, 54677, 0, 0, 0, 0, 0
mbuf_jumbo_16k: 16384, 30756, 0, 0, 0, 0, 0

I know it is a stupid trick, but at least it works. :) Hope it can help.
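To check whether a machine is hitting the same buffer allocation failures, the "vmstat -z" output above can be filtered for mbuf zones with a non-zero failure count. This is only a sketch: it assumes the columns after the zone name are SIZE, LIMIT, USED, FREE, REQ, FAIL, SLEEP in that order (check the header line on your release), and the function name is made up.

```sh
# Print "zone failures" for every mbuf* zone whose FAIL column is non-zero.
# Reads "vmstat -z" style lines on stdin.
# mbuf_alloc_failures is a hypothetical helper name.
mbuf_alloc_failures() {
    # Split on ':' and ',' so fields still separate when vmstat omits
    # the space after a comma; under that split FAIL is field 7.
    awk -F'[:,]' '
        $1 ~ /^mbuf/ {
            fail = $7
            gsub(/[ \t]/, "", fail)
            if (fail + 0 > 0) print $1, fail
        }'
}
```

Possible usage: "vmstat -z | mbuf_alloc_failures". No output means no recorded failures in the mbuf zones.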
(In reply to Chris Hutchinson from comment #40) Could you share what specific values worked for you? My default values seem pretty high on FreeBSD 12.1. kern.ipc.nmbjumbop: 496998 kern.ipc.nmbclusters: 993998
In ticket 208205 I posted a code change to if_re.c which solves the issue for me. Can somebody else try the change too? In my opinion, changing sysctls only adjusts the timing behaviour of the existing environment without solving the issue in general. In my case the patch has been working for weeks now without any performance or packet loss. Link: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=208205#c32
(In reply to Bob Smith from comment #42) The ideal number will vary by card (model/brand/...). We only ever experienced the watchdog(8) problem on any of our RealTek cards on FreeBSD-11.x. We have *not* needed to "tune" these cards for watchdog on 12, or 13. The following 2 servers are on 12 and use the same NIC but, as you can see, have 2 different values. We did NOT adjust them from their default values.

kern.ipc.nmbjumbop: 508829
kern.ipc.nmbclusters: 1017660

kern.ipc.nmbjumbop: 125487
kern.ipc.nmbclusters: 250974

HTH --Chris
Hey, just wanted to chime in: I have 12.1-RELEASE-p3 and the problem still occurs. It happens when there's a high load on the Ethernet card. I get the timeout message, and the link goes offline for about 4 seconds. Looking forward to getting the fix released; this is a public-facing server doing regular backups, so the disconnects are no good.
I'm also experiencing this on 12.1-RELEASE-p4. While networking usually works, large transfers seem to kill the driver, which can only be fixed by rebooting. When this happens, `dmesg` will be filled with:

re0: watchdog timeout
re0: link state changed to DOWN
re0: link state changed to UP

These three messages repeat forever until the machine is rebooted. Restarting the interface with ifconfig did not fix it. This typically happens when I create a large backup (I use borgmatic) to an SFTP server on my local network. The affected machine is an Odroid H2 which I use as a NAS. It's got two Realtek RTL8111G NICs, which I am not able to replace (it's a single-board computer, the NICs are soldered to the board and there are no PCI slots).
Try https://github.com/kostikbel/rere
The situation got worse today when the monthly full backup started to shut down re0 by watchdog, and PF started to overwhelm itself. My box became unreachable, when connecting a console, all I could see was "PF states limit reached" messages. Only a hard reset helped. I already have high values for the aforementioned sysctl values, out of the box: kern.ipc.nmbclusters: 4082026 kern.ipc.nmbjumbop: 2041013 I have no idea how to fix this (other than having to compile a kernel with experimental drivers on a production box?) Any help is appreciated.
(In reply to László Károlyi from comment #48) Hi László, please follow comment #43 in the thread. It's a small code change which solves the issue. It has been working for me for months. My FreeBSD server serves a Time Machine over a gigabit Ethernet connection without any problems, which is a scenario similar to yours.
(In reply to Ralf Wostrack from comment #49) Hi Ralf, thanks for your response. I've seen the corrected drivers posted in here. My problem is, getting that working involves kernel recompilation, which I can't afford on the production server with the GENERIC kernel. I need to keep the machine easily upgradeable, and patching/compiling a kernel with each upgrade is not a path I can take. I'm still waiting for the fixed driver to roll out in the distributed GENERIC kernel. Cheers, László
(In reply to László Károlyi from comment #50) I am using the GENERIC kernel as well. You only need to check out the source of your installed version. After that, patch if_re.c in /usr/src/sys/dev/re. With "make -C /usr/src/sys/modules/re && make -C /usr/src/sys/modules/re install" the module should be compiled and installed to /boot/modules. After a reboot the kernel should use the new module. I'm not entirely sure, but this should be all. Best regards, Ralf
The if_re driver is built into the GENERIC kernel, so that won't work. You need to build a new GENERIC kernel without the built-in re driver to be able to use the module version.
(In reply to László Károlyi from comment #48) Hello, FWIW One of the servers I'm running with the stock FreeBSD re driver, is currently handling pf tables totaling more than 72 million addresses. This server never experiences re related lockups or watchdog messages. It's on 12/AMD64, and has the following sysctl tunables set: kern.ipc.nmbclusters: 1017660 kern.ipc.nmbjumbop: 508829 Maybe they work for you too? HTH --Chris
(In reply to László Károlyi from comment #50) Unfortunately, you may wait forever, because the fix must be tested before it reaches the distribution and the GENERIC kernel. FreeBSD relies on user feedback and testing, and you ought to reconsider testing it in your environment. You do not have to rebuild GENERIC, though. You can just rebuild the kernel module and load it with /boot/loader.conf or /boot/nextboot.conf (one-time loading). Despite the driver's presence in GENERIC, the loaded module will be used in preference.
(In reply to Chris Hutchinson from comment #53) I think others have stated that the bigger the values are, the better. My values are way bigger than these so I don't think this helps. But thanks anyways.
(In reply to Eugene Grosbein from comment #54) Assuming it doesn't need a custom kernel compiled, I'm willing to test this on my server, in hopes of this thing picking up some speed. Can someone point me to a driver that has the potential to be added to the GENERIC branch when I can confirm it works? There are a couple links here pointing to various versions of the re0 driver.
(In reply to Eugene Grosbein from comment #54) This is new to me. I was quite sure that you cannot load a module if it's already compiled into the kernel. The Realtek driver Readme.txt in fact says that you have to build a new kernel without the re driver to be able to use their driver as a module. Since when has this been possible, if it really is? Do you have any source code reference for that?
(In reply to Alex Dupre from comment #57) I'm sure Readme.txt was created a long time ago. OTOH, this may be driver-dependent. For example, this works for Intel gigabit drivers (em and igb): loading a patched module overrides the driver built into GENERIC. I cannot supply you with a reference to source, sorry.
(In reply to Ralf Wostrack from comment #43) Can you attach your change in a form of "diff -u" for unpatched and patched versions of the source file?
(In reply to Eugene Grosbein from comment #58) If it works, then I think that it happens by accident and may be unreliable. It's not possible to add a module (in the sense of module_t) if a module with the same name is already registered. See module_register() and linker_file_register_modules().
(In reply to Andriy Gapon from comment #60) If stand-alone module is loaded by loader, which one is registered - stand-alone or built in the kernel?
(In reply to Andriy Gapon from comment #60) I've just checked it out with 11.4-RELEASE/amd64 in VirtualBox, adding if_re_load="YES" to /boot/loader.conf and booting the GENERIC kernel. First, the loader successfully loads the kernel, then if_re.ko. Then the kernel starts, and after the line "FreeBSD clang version 10.0.0 (...)" it prints:

module_register: cannot register pci/re from kernel: already loaded from if_re.ko

So, the module has priority in case of /boot/loader.conf or /boot/nextboot.conf
(In reply to Eugene Grosbein from comment #62) Glad to hear that. Now I only need a source where I can compile and load from, and I'll be gone load testing for a while on my gigabit connected bare metal server.
Just to let you know that Realtek just released a new version of the driver: v1.96.04. I haven't tried it yet, but it looks like they added WOL support, too. If loading the module from the loader overrides the one built into the kernel, then I think I could add it to the ports tree, and finally I'll be able to use freebsd-update / pkg upgrade to update my system.
Pyun YongHyeon was inactive (on this PR at least) for almost 5 years. Resetting Assignee and therefore Status.
(In reply to Ralf Wostrack from comment #43) I tested your patch and it didn't help me: the watchdog timeout still happened within a few hours after boot. I also tested increased kern.ipc.nmbjumbop and kern.ipc.nmbclusters values, no luck.

(In reply to Konstantin Belousov from comment #47) The last thing I checked is https://github.com/kostikbel/rere - now I have 14 days uptime and no timeouts. I use many vlans on re0 and I think it can affect the bug.
I've committed the vendor driver to the ports tree, with additional improvements and instructions to load it with the stock GENERIC kernel: https://svnweb.freebsd.org/ports?view=revision&revision=542324 Thanks to kib for his support.
(In reply to Alex Dupre from comment #67) I just tried to load the compiled driver with the GENERIC kernel, and it doesn't seem to work. I get no overriding message in dmesg, and when I try to load the module manually at boot time, I get the following message:

OK load if_re
elf64_obj_loadfile: can't load module before kernel
elf64_obj_loadfile: can't load module before kernel
can't load file '/boot/modules/if_re.ko': operation not permitted

Any hints? I'm on 12.1-RELEASE-p7.
(In reply to László Károlyi from comment #68) If you followed the pkg-message instructions, you should simply see the following lines in dmesg:

re0: <Realtek PCIe GbE Family Controller> port 0xe000-0xe0ff mem 0x91104000-0x91104fff,0x91100000-0x91103fff irq 20 at device 0.0 on pci2
re0: Using Memory Mapping!
re0: Using 1 MSI-X message
re0: ASPM disabled
re0: version:1.96.04

instead of:

re0: <RealTek 8168/8111 B/C/CP/D/DP/E/F/G PCIe Gigabit Ethernet> port 0xe000-0xe0ff mem 0x91104000-0x91104fff,0x91100000-0x91103fff irq 20 at device 0.0 on pci2
re0: Using 1 MSI-X message
re0: turning off MSI enable bit.
re0: ASPM disabled
re0: Chip rev. 0x54000000
re0: MAC rev. 0x00100000

In particular you should check the "version:1.96.04" line. If you want to manually load it from the loader, then you have to load the kernel first:

load /boot/kernel/kernel
load /boot/modules/if_re.ko
(In reply to Alex Dupre from comment #69) I followed the instructions but the module still won't load. I've fired up a virtual machine to test locally with the same patchlevel/kernel/compiled module, and I get various error messages. When loading it from the boot prompt, I get a "can't find kernel" error message: https://i.imgur.com/TtIAW8h.png On automatically loading from loader.conf, I have to halt the virtual machine to be able to see the message, but there's another, different error: https://i.imgur.com/TsSqp1g.png I hope you can work with this, because I can't do more to prove that the module won't load.
(In reply to Alex Dupre from comment #69) Additional information, I used the stable/12 branch to check out for the kernel sources: svn checkout https://svn.FreeBSD.org/base/stable/12 /usr/src/
(In reply to László Károlyi from comment #68)
> I get no overriding message in dmesg

Loader messages do not get to the dmesg buffer. Instead, use "kldstat -v" to verify that the module is loaded. And you cannot load the module after the kernel's built-in version is activated (registered), so use /boot/loader.conf or /boot/nextboot.conf
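The "kldstat -v" check suggested above can be scripted. This is only a sketch: it assumes each file line of the verbose output ends in "name (path)", and if_re_path is a hypothetical helper name, not a FreeBSD tool.

```sh
#!/bin/sh
# Print the on-disk path of the linker file named if_re.ko, reading
# `kldstat -v` style output on stdin.
# Assumption: file lines look like
#   " 2    1 0xffffffff82a11000    8f000 if_re.ko (/boot/modules/if_re.ko)"
# i.e. field 5 is the file name and field 6 is the path in parentheses.
if_re_path() {
    awk '$5 == "if_re.ko" { gsub(/[()]/, "", $6); print $6 }'
}

# Intended use on the affected machine:
#   kldstat -v | if_re_path
```

If nothing is printed, no separate if_re.ko file is loaded, which would suggest the driver compiled into the kernel is the one in use. A path under /boot/modules/ would point at the module installed by the port.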
(In reply to Eugene Grosbein from comment #72) Hey Eugene, that is what I did. See #71 and #70.
(In reply to László Károlyi from comment #70)
> but there's another, different error:
> https://i.imgur.com/TsSqp1g.png

You have a mix-up of incompatible kernel and module binaries.
(In reply to László Károlyi from comment #71)
> Additional information, I used the stable/12 branch to check out for the kernel sources:
> svn checkout https://svn.FreeBSD.org/base/stable/12 /usr/src/

This is the error. You have to use sources of the same branch as your GENERIC kernel to build the module. If you use RELEASE-pX, then you must check out releng/12 and not stable/12.
Sorry, it's releng/12.1 for 12.1-RELEASE-pX.
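The branch rule from the last two comments can be written down as a small helper. This is a sketch only: src_branch is a hypothetical name, and the mapping covers just the RELEASE and STABLE cases discussed here.

```sh
#!/bin/sh
# Map a `uname -r` string to the src branch matching the running kernel.
# Illustrative helper, not part of FreeBSD:
#   X.Y-RELEASE[-pN] -> releng/X.Y
#   X.Y-STABLE       -> stable/X
src_branch() {
    case "$1" in
        *-RELEASE*) echo "releng/${1%%-*}" ;;
        *-STABLE*)  ver=${1%%-*}; echo "stable/${ver%%.*}" ;;
        *)          echo "unknown"; return 1 ;;
    esac
}

src_branch "12.1-RELEASE-p7"   # prints releng/12.1
src_branch "12.1-STABLE"       # prints stable/12
```

On the machine itself this would be called as src_branch "$(uname -r)", and the result used as the branch to check out before building the module.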
(In reply to Eugene Grosbein from comment #76) Thanks. I'll make some adjustments and will touch base later.
(In reply to Eugene Grosbein from comment #76) Alright, it seems I managed to start my server with the newly compiled driver:

re0: <Realtek PCIe GbE Family Controller> port 0xd000-0xd0ff mem 0xf7204000-0xf7204fff,0xf7200000-0xf7203fff irq 35 at device 0.0 on pci8
re0: Using Memory Mapping!
re0: Using 1 MSI-X message
re0: ASPM disabled
re0: version:1.96.04

There were no "module_register" errors in the dmesg, but looking at the output, this is the new driver. I'll try to do some load tests in the upcoming days and see what gives. Thanks!
So, I have tested today, and the results are great. I hit my server hard with transferring a 72GB file to an SMB mount on the local network, and then I copied that file over from the SMB mount to backblaze B2 with rclone (--transfers 32). The system did 250Mbps at its peak. Here's a munin chart about it: https://i.imgur.com/yg6R6ii.png No watchdog timeouts whatsoever, no link drops, and the system stayed snappy the entire time, while I was observing the copying. I think we can safely say now that the new Realtek driver is stable and I'd really like to have it merged into the GENERIC kernel. My only concern about the driver is these lines in the dmesg: --- This product is covered by one or more of the following patents: US6,570,884, US6,115,776, and US6,327,625. --- I hope this doesn't mean that the FreeBSD project can't use the driver in its kernel tree.
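For anyone repeating this kind of load test, the sketch below counts watchdog timeouts in syslog output. The message format is taken from the log excerpt at the top of this report; watchdog_count is a hypothetical helper name.

```sh
#!/bin/sh
# Count "watchdog timeout" messages for one interface, reading syslog
# lines on stdin. The interface name defaults to re0.
watchdog_count() {
    ifname=${1:-re0}
    # grep -c exits non-zero when the count is 0; "|| true" keeps the
    # function's exit status clean in that (good) case.
    grep -c "kernel: ${ifname}: watchdog timeout" || true
}

# Intended use after a transfer:
#   watchdog_count re0 < /var/log/messages
```

A count of 0 after a sustained transfer is the result reported above. A non-zero count means the stall was reproduced.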
(In reply to László Károlyi from comment #78)
> There were no "module_register" errors in the dmesg

This is a message from the loader. Loader messages do not get to the kernel dmesg buffer.
(In reply to László Károlyi from comment #79) You should monitor this driver's work in the long run. AFAIK this driver uses 9KB mbufs unconditionally, no matter if the MTU is 1500 or more, so in the long run, as kernel memory gets fragmented, the driver could cause a lock-up of the system if it cannot allocate a jumbo mbuf cluster.
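The fragmentation concern comes down to how many physically contiguous pages one receive buffer needs. A quick arithmetic sketch, assuming a 4096-byte page (amd64) and that "9KB" means the 9216-byte jumbo cluster size; pages_per_buffer is a hypothetical helper name.

```sh
#!/bin/sh
# Contiguous pages needed for one buffer of the given size:
# ceil(size / page), with the page size defaulting to 4096 bytes.
pages_per_buffer() {
    size=$1
    page=${2:-4096}
    echo $(( (size + page - 1) / page ))
}

pages_per_buffer 2048   # prints 1 (standard mbuf cluster)
pages_per_buffer 4096   # prints 1 (page-sized cluster)
pages_per_buffer 9216   # prints 3 (9KB jumbo cluster)
```

Buffers that fit in a single page can always be satisfied from any free page, while a multi-page cluster needs adjacent free pages, which become harder to find as memory fragments over time.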
(In reply to Eugene Grosbein from comment #81) I have 64GB ram in the server with 54G being wired right now, so I don't think such a condition will occur at my machine.
(In reply to László Károlyi from comment #82) These numbers do not matter. On my 16G machine I got the issue in approximately 1 week of uptime. The port (not the stock vendor driver) has the tunable hw.re.max_rx_mbuf_sz that can be set e.g. to 4096 to avoid use of two pages clusters.
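A sketch of setting that tunable without leaving duplicate lines behind, assuming it is read from /boot/loader.conf like other loader tunables; set_loader_var is a hypothetical helper, not something shipped with the port.

```sh
#!/bin/sh
# Set key="value" in a loader.conf-style file, replacing any existing
# assignment of the same key instead of appending a duplicate.
set_loader_var() {
    file=$1
    key=$2
    value=$3
    tmp=$(mktemp) || return 1
    if [ -f "$file" ]; then
        # Keep every line that does not start with "key=".
        awk -v k="${key}=" 'index($0, k) != 1' "$file" > "$tmp"
    fi
    printf '%s="%s"\n' "$key" "$value" >> "$tmp"
    # Write through the existing file so its ownership and mode are kept.
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Intended use (as root), followed by a reboot:
#   set_loader_var /boot/loader.conf hw.re.max_rx_mbuf_sz 4096
```

The same helper would also work for the if_re_load and if_re_name lines used elsewhere in this thread, since repeated "echo ... >> /boot/loader.conf" runs leave duplicate entries.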
(In reply to Konstantin Belousov from comment #83) How about using 4096 by default with a notice in pkg-message for users requiring larger MTU?
(In reply to Eugene Grosbein from comment #84) I'd like to have this in the GENERIC kernel, at least as some kind of a configurable option in loader.conf, so that I won't have to recompile the ports module on each FreeBSD update ... Keeping a kernel and ports tree just for that on production servers is overkill.
(In reply to László Károlyi from comment #85) Yes. But if you use the GENERIC kernel of RELEASE-pX, then you don't need to keep source trees on each server. Just compile the module once per major FreeBSD branch, use "pkg create" to make a package, and install the package on your servers.
(In reply to Konstantin Belousov from comment #83)
> On my 16G machine I got the issue in approximately 1 week of uptime.

Would increasing kern.ipc.nmbjumbo9 and/or kern.ipc.nmbjumbo16 help?
(In reply to Konstantin Belousov from comment #83) One week has passed and it still works like a charm. In fact, for some reason I see it being faster than the GENERIC re0 driver: I see higher throughput in my munin charts. On August 1st there will be another load test (another full backup) run from cron. I think if it doesn't crash then, we can consider the driver stable.
(In reply to László Károlyi from comment #88) So, today there was another full backup to Backblaze B2. Not a hiccup, not a kernel message. The system did a stable throughput of 140-150Mbps: https://i.imgur.com/TaupA2S.png So, can we have the original Realtek driver in GENERIC replaced with this one please?
(In reply to László Károlyi from comment #89) There is no plan to replace it, the FreeBSD driver has some features that are missing in the Realtek one, but Stefan Esser is working to integrate the changes, and hopefully fix the issues. In the meantime enjoy the port.
(In reply to Alex Dupre from comment #90) Sad. At least please notify me when you have a working Realtek driver in GENERIC.
To possibly save anyone else some time, this is exactly what I did to get the driver installed on my FreeBSD 12.1 system.

svn checkout https://svn.freebsd.org/base/releng/`uname -r | cut -d'-' -f1,1` /usr/src
portsnap fetch
portsnap extract
cd /usr/ports/net/realtek-re-kmod/
make install
echo 'if_re_load="YES"' >> /boot/loader.conf
echo 'if_re_name="/boot/modules/if_re.ko"' >> /boot/loader.conf
(In reply to Bob Smith from comment #92) With FreeBSD 13.0 out, and it having changed its src and ports repository to git (https://docs.freebsd.org/en/books/handbook/mirrors/#git), the process changes to:

1. install git
2. git clone -o freebsd -b releng/$(uname -r | cut -d'-' -f1,1) https://git.FreeBSD.org/src.git /usr/src
3. git clone -o freebsd https://git.freebsd.org/ports.git /usr/ports
4. cd /usr/ports/net/realtek-re-kmod/
5. make install
6. echo 'if_re_load="YES"' >> /boot/loader.conf
7. echo 'if_re_name="/boot/modules/if_re.ko"' >> /boot/loader.conf

Just thought I'd update this since SVN is no longer the default way to check out sources and ports.
I'm just going to throw this out there for a couple of reasons...

1) several people indicated the vendor's driver solved it for them
2) I just bought a Realtek card capable of 9k jumbo frames. But the re(4) kernel module built into the kernel wouldn't do 9k jumbo frames.
3) This will work even if you already have the re(4) module built in, or from /boot/kernel/

Please try /usr/ports/net/realtek-re-kmod/. After you've either built and installed it, or pkg(8) installed it, add the following to loader.conf(5):

if_re_load="YES"
if_re_name="/boot/modules/if_re.ko"

I have zero trouble using this driver, and am also able to use the 9k jumbo frames this card is capable of managing. HTH

--Chris
Might be useful to increase awareness of the vendor driver by adding it to the man page: https://reviews.freebsd.org/D33677
Loading the Realtek driver from the package repo fixes the watchdog issue:

pkg install realtek-re-kmod

Add to /boot/loader.conf.local:

if_re_load="YES"
if_re_name="/boot/modules/if_re.ko"

The NICs are stable on FreeBSD 12.2 for me, but I can't get more than ~800Mbps out of them. There is a second, performance-impacting issue with the driver, but I will take a stable 800Mbps, which is what it can do with 1.96.04. Tests were run using iperf3 to a local machine.
Correction... To load the driver (FreeBSD 12.2) from the package repo: pkg install realtek-re-kmod
*** Bug 227979 has been marked as a duplicate of this bug. ***
*** Bug 208205 has been marked as a duplicate of this bug. ***
Summarizing bug 208205 ...

---------------------------------------
10.3-RELEASE-p5
---------------------------------------
re0@pci0:2:0:0: class=0x020000 card=0x81681849 chip=0x816810ec rev=0x11 hdr=0x00
    vendor = 'Realtek Semiconductor Co., Ltd.'
    device = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller'
    class = network
    subclass = ethernet
    bar [10] = type I/O Port, range 32, base 0xd000, size 256, enabled
    bar [18] = type Memory, range 64, base 0x91204000, size 4096, enabled
    bar [20] = type Prefetchable Memory, range 64, base 0x91200000, size 16384, enabled
    cap 01[40] = powerspec 3 supports D0 D1 D2 D3 current D0
    cap 05[50] = MSI supports 1 message, 64 bit
    cap 10[70] = PCI-Express 2 endpoint IRQ 1 max data 128(128) RO link x1(x1) speed 2.5(2.5) ASPM disabled(L0s/L1)
    cap 11[b0] = MSI-X supports 4 messages, enabled Table in map 0x20[0x0], PBA in map 0x20[0x800]
    cap 03[d0] = VPD
    ecap 0001[100] = AER 1 0 fatal 0 non-fatal 0 corrected
    ecap 0002[140] = VC 1 max VC0
    ecap 0003[160] = Serial 1 01000000684ce000
    ecap 0018[170] = LTR 1

---------------------------------------
FreeBSD 11.0-ALPHA6 #0 r302331
---------------------------------------
re0@pci0:3:0:0: class=0x020000 card=0x85051043 chip=0x816810ec rev=0x09 hdr=0x00
    vendor = 'Realtek Semiconductor Co., Ltd.'
    device = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller'
    class = network
    subclass = ethernet
    bar [10] = type I/O Port, range 32, base 0xe800, size 256, enabled
    bar [18] = type Prefetchable Memory, range 64, base 0xfdfff000, size 4096, enabled
    bar [20] = type Prefetchable Memory, range 64, base 0xfdff8000, size 16384, enabled
    cap 01[40] = powerspec 3 supports D0 D1 D2 D3 current D0
    cap 05[50] = MSI supports 1 message, 64 bit
    cap 10[70] = PCI-Express 2 endpoint MSI 1 max data 128(128) RO link x1(x1) speed 2.5(2.5) ASPM disabled(L0s/L1)
    cap 11[b0] = MSI-X supports 4 messages, enabled Table in map 0x20[0x0], PBA in map 0x20[0x800]
    cap 03[d0] = VPD
    ecap 0001[100] = AER 1 0 fatal 0 non-fatal 1 corrected
    ecap 0002[140] = VC 1 max VC0
    ecap 0003[160] = Serial 1 0000000000000000

---------------------------------------
10.3-p5
---------------------------------------
re0@pci0:2:0:0: class=0x020000 card=0x78171462 chip=0x816810ec rev=0x0c hdr=0x00
    vendor = 'Realtek Semiconductor Co., Ltd.'
    device = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller'
    class = network
    subclass = ethernet

---------------------------------------
11-STABLE
---------------------------------------
re1@pci0:3:0:0: class=0x020000 card=0x230e1565 chip=0x816810ec rev=0x07 hdr=0x00
    vendor = 'Realtek Semiconductor Co., Ltd.'
    device = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller'
    class = network
    subclass = ethernet

---------------------------------------
11.1-RELEASE (no longer reproducible after 11.2-RELEASE)
---------------------------------------
hardware1:
re0: <RealTek 8168/8111 B/C/CP/D/DP/E/F/G PCIe Gigabit Ethernet> port 0xe000-0xe0ff mem 0xf0004000-0xf0004fff,0xf0000000-0xf0003fff irq 17 at device 0.0 on pci2
re0: MSI count : 1
re0: MSI-X count : 4
re0: attempting to allocate 1 MSI-X vectors (4 supported)
re0: using IRQ 265 for MSI-X
re0: Using 1 MSI-X message
re0: turning off MSI enable bit.
re0: Chip rev. 0x2c800000
re0: MAC rev. 0x00100000
miibus0: <MII bus> on re0
re0: Using defaults for TSO: 65518/35/2048
re0: bpf attached

hardware2:
re0: <RealTek 8168/8111 B/C/CP/D/DP/E/F/G PCIe Gigabit Ethernet> port 0xe000-0xe0ff mem 0xf7c00000-0xf7c00fff,0xf0000000-0xf0003fff irq 16 at device 0.0 on pci1
re0: MSI count : 1
re0: MSI-X count : 4
re0: attempting to allocate 1 MSI-X vectors (4 supported)
re0: using IRQ 266 for MSI-X
re0: Using 1 MSI-X message
re0: Chip rev. 0x4c000000
re0: MAC rev. 0x00000000
miibus0: <MII bus> on re0
re0: Using defaults for TSO: 65518/35/2048
re0: bpf attached
re0: Ethernet address: 44:8a:5b:d4:49:6d
re0: netmap queues/slots: TX 1/256, RX 1/256
random: harvesting attach, 8 bytes (4 bits) from re0

hardware1:
re0@pci0:2:0:0: class=0x020000 card=0x78161462 chip=0x816810ec rev=0x06 hdr=0x00
    vendor = 'Realtek Semiconductor Co., Ltd.'
    device = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller'
    class = network
    subclass = ethernet

hardware2:
re0@pci0:2:0:0: class=0x020000 card=0x78231462 chip=0x816810ec rev=0x0c hdr=0x00
    vendor = 'Realtek Semiconductor Co., Ltd.'
    device = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller'
    class = network
    subclass = ethernet

---------------------------------------
12.0
---------------------------------------
Hardware Info:
re0@pci0:3:0:0: class=0x020000 card=0xe0001458 chip=0x816810ec rev=0x0c hdr=0x00
    vendor = 'Realtek Semiconductor Co., Ltd.'
    device = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller'
    class = network
    subclass = ethernet
re1@pci0:4:0:0: class=0x020000 card=0xe0001458 chip=0x816810ec rev=0x0c hdr=0x00
    vendor = 'Realtek Semiconductor Co., Ltd.'
    device = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller'
    class = network
    subclass = ethernet

Reported Workaround: bug 208205 comment 32 (commenting re_txeof in re_tick function in if_re.c.)

---------------------------------------
13.0-STABLE #0 stable/13-n248872-2c7441c86ef
---------------------------------------
re1@pci0:5:0:0: class=0x020000 rev=0x15 hdr=0x00 vendor=0x10ec device=0x8168 subvendor=0x17aa subdevice=0x5094
    vendor = 'Realtek Semiconductor Co., Ltd.'
    device = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller'
    class = network
    subclass = ethernet
    bar [10] = type I/O Port, range 32, base 0x2000, size 256, enabled
    bar [18] = type Memory, range 64, base 0xfd504000, size 4096, enabled
    bar [20] = type Memory, range 64, base 0xfd500000, size 16384, enabled
    cap 01[40] = powerspec 3 supports D0 D1 D2 D3 current D0
    cap 05[50] = MSI supports 1 message, 64 bit
    cap 10[70] = PCI-Express 2 endpoint MSI 1 max data 128(128) RO max read 4096 link x1(x1) speed 2.5(2.5) ASPM disabled(L0s/L1) ClockPM disabled
    cap 11[b0] = MSI-X supports 4 messages, enabled Table in map 0x20[0x0], PBA in map 0x20[0x800]
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 1 corrected
    ecap 0002[140] = VC 1 max VC0
    ecap 0003[160] = Serial 1 01000000684ce000
    ecap 0018[170] = LTR 1
    ecap 001e[178] = L1 PM Substates 1

---------------------------------------
FreeBSD 12.2 pfSense (after 350Mbps) - realtek-re-kmod (1.96.04) fixes
---------------------------------------
re0: <RealTek 8168/8111 B/C/CP/D/DP/E/F/G PCIe Gigabit Ethernet> port 0xe000-0xe0ff mem 0xf0104000-0xf0104fff,0xf0100000-0xf0103fff irq 19 at device 0.0 on pci3
re0: Using 1 MSI-X message
re0: turning off MSI enable bit.
re0: ASPM disabled
re0: Chip rev. 0x2c800000
re0: MAC rev. 0x00100000
Realtek

---------------------------------------
Workarounds
---------------------------------------
Reported workaround #1: bug 208205 comment 22
Reported Workaround #2: bug 208205 comment 33 (kern.ipc.nmbjumbop / kern.ipc.nmbclusters) - FreeBSD 11, not reproducible on 12

---------------------------------------
Additional References
---------------------------------------
* https://forums.freebsd.org/threads/10-2-release-re0-watchdog-timeout.55306/#post-337045
* if_re modification code: https://github.com/megabytefisher/if_re-mod
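To compare chip revisions across reports like the ones summarized above, the device name and rev field can be pulled out of pciconf -l header lines. This is a sketch only: re_revisions is a hypothetical helper, and it matches the 8168 device ID in both header formats that appear in this summary (chip=0x816810ec and device=0x8168) without checking the vendor ID separately.

```sh
#!/bin/sh
# Print "<device> <rev>" for each Realtek 8168 header line found in
# `pciconf -l` style output on stdin.
re_revisions() {
    awk '/chip=0x816810ec/ || / device=0x8168/ {
        name = $1
        sub(/@.*/, "", name)          # "re0@pci0:2:0:0:" -> "re0"
        for (i = 2; i <= NF; i++)
            if ($i ~ /^rev=/) print name, substr($i, 5)
    }'
}

# Intended use:
#   pciconf -l | re_revisions
```

The revisions listed in this summary range from 0x06 to 0x15, so including this value in a report helps tell the affected chip variants apart.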
I spent some time trying to debug this, without significant results, but I wanted to share what I learned, mostly things that didn't work.

My hardware is class=0x020000 rev=0x0c hdr=0x00 vendor=0x10ec device=0x8168 subvendor=0x1462 subdevice=0x7850; and I was able to fairly reliably trigger the condition with iperf3 -s on the re0 (in a vnet jail), and iperf3 -c --bidir on another interface. My hardware is fairly meager, an Intel G3470, using a multiport Intel em card for the other side of the iperf3 testing. Using the kernel driver (13.1-RELEASE), I would tend to see traffic stall and timeouts be reported within one minute, although sometimes within a few seconds. Using the vendor driver from net/realtek-re-kmod, I was unable to reproduce the error condition.

There are a fair number of differences between the two drivers, and I tried a bunch of things, but could never pass my test. Sending more transmit requests didn't help; the vendor driver sends it twice, but that doesn't seem to help; sending TX requests in the watchdog handler didn't help either. I noticed that the Tx queue would usually be stuck on the 2nd segment of a two or three segment packet, so I tried adding m_defrag to make all the packets a single segment. That didn't work either. There didn't seem to be a pattern of which Tx segment the NIC would get stuck on, or anything obvious about the data addresses.

I also tried messing with the reset to maybe make that more reliable. The vendor driver does reset a little bit differently, but nothing there made things reliable for me either. After a few resets, the NIC just doesn't seem to start sending again until a reboot (at least with my NIC; I've seen some reports that a power cycle is required). Reading the tx/rx registers before setting them, I saw that those are retained across reboots, so the device doesn't seem to be getting fully reset, which might explain some reporters' need to power cycle.
I suspect there's something in the mostly opaque vendor initialization code that puts the device in a better mode where it doesn't get stuck (at least with my test loads). I do notice that the vendor initialization causes the device to emit ethernet pause frames, which doesn't happen with the kernel code.

One thing I was able to make a positive difference with, though: the kernel reset doesn't clear the RX/TX buffers, although it frees the associated mbufs. Sometimes during a reset or shortly afterwards, the NIC is still using those descriptor arrays; so I would see weird packets apparently coming in on re0 via tcpdump, many of them looking like recently used mbufs on other interfaces. I also saw some bizarre packets RXed by the em NIC connected to re0, and some evidence of other NICs receiving corrupted mbufs. Using explicit_bzero during descriptor setup seemed to help, as well as turning off the CMD_OWN flag on the descriptors during re_stop. It's obviously a bit tricky if the device is acknowledging a command reset, but not actually fully resetting.

I don't have an IOMMU system, but I'd guess if you had that, you could get more information about what's going on. Running with INVARIANTS showed some use-after-free errors, which I believe are related to the device using the mbufs, although it was hard to trigger and difficult to debug.
Same problem here on 14-CURRENT @ 14/08/2022 with

re0: <RealTek 8168/8111 B/C/CP/D/DP/E/F/G PCIe Gigabit Ethernet> port 0xe000-0xe0ff mem 0xf7004000-0xf7004fff,0xf7000000-0xf7003fff irq 16 at device 0.0 on pci2
re0: Using 1 MSI-X message
re0: turning off MSI enable bit.
re0: ASPM disabled
re0: Chip rev. 0x54000000
re0: MAC rev. 0x00100000
miibus0: <MII bus> on re0
re0: Using defaults for TSO: 65518/35/2048
re0: Ethernet address: 18:66:da:3b:96:1f
re0: netmap queues/slots: TX 1/256, RX 1/256

and pciconf:

re0@pci0:2:0:0: class=0x020000 rev=0x15 hdr=0x00 vendor=0x10ec device=0x8168 subvendor=0x1028 subdevice=0x06bb
    vendor = 'Realtek Semiconductor Co., Ltd.'
    device = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller'
    class = network
    subclass = ethernet
same error with freebsd 13.1-release --------------------------------------------------------------- ---<<BOOT>>--- Copyright (c) 1992-2021 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD is a registered trademark of The FreeBSD Foundation. FreeBSD 13.1-RELEASE releng/13.1-n250148-fc952ac2212 GENERIC amd64 FreeBSD clang version 13.0.0 (git@github.com:llvm/llvm-project.git llvmorg-13.0.0-0-gd7b669b3a303) VT(efifb): resolution 1366x768 CPU: Intel(R) Core(TM) i5-3230M CPU @ 2.60GHz (2594.23-MHz K8-class CPU) Origin="GenuineIntel" Id=0x306a9 Family=0x6 Model=0x3a Stepping=9 Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE> Features2=0x7fbae3bf<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND> AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM> AMD Features2=0x1<LAHF> Structured Extended Features=0x281<FSGSBASE,SMEP,ERMS> XSAVE Features=0x1<XSAVEOPT> VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID TSC: P-state invariant, performance statistics real memory = 8589934592 (8192 MB) avail memory = 8138895360 (7761 MB) Event timer "LAPIC" quality 600 ACPI APIC Table: <LENOVO CB-01 > FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs FreeBSD/SMP: 1 package(s) x 2 core(s) x 2 hardware threads random: registering fast source Intel Secure Key RNG random: fast provider: "Intel Secure Key RNG" random: unblocking device. 
ioapic0 <Version 2.0> irqs 0-23 Launching APs: 1 2 3 random: entropy device external interface kbd1 at kbdmux0 efirtc0: <EFI Realtime Clock> efirtc0: registered as a time-of-day clock, resolution 1.000000s smbios0: <System Management BIOS> at iomem 0xbaebef98-0xbaebefb6 smbios0: Version: 2.7, BCD Revision: 2.7 aesni0: <AES-CBC,AES-CCM,AES-GCM,AES-ICM,AES-XTS> acpi0: <LENOVO CB-01> acpi0: Power Button (fixed) cpu0: <ACPI CPU> on acpi0 hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0 Timecounter "HPET" frequency 14318180 Hz quality 950 Event timer "HPET" frequency 14318180 Hz quality 550 Event timer "HPET1" frequency 14318180 Hz quality 440 Event timer "HPET2" frequency 14318180 Hz quality 440 Event timer "HPET3" frequency 14318180 Hz quality 440 Event timer "HPET4" frequency 14318180 Hz quality 440 atrtc0: <AT realtime clock> port 0x70-0x77 irq 8 on acpi0 atrtc0: Warning: Couldn't map I/O. atrtc0: registered as a time-of-day clock, resolution 1.000000s Event timer "RTC" frequency 32768 Hz quality 0 attimer0: <AT timer> port 0x40-0x43,0x50-0x53 irq 0 on acpi0 Timecounter "i8254" frequency 1193182 Hz quality 0 Event timer "i8254" frequency 1193182 Hz quality 100 Timecounter "ACPI-fast" frequency 3579545 Hz quality 900 acpi_timer0: <24-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0 acpi_ec0: <Embedded Controller: GPE 0x17> port 0x62,0x66 on acpi0 pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0 pci0: <ACPI PCI bus> on pcib0 pcib1: <ACPI PCI-PCI bridge> at device 1.0 on pci0 pci1: <ACPI PCI bus> on pcib1 vgapci0: <VGA-compatible display> port 0x3000-0x30ff mem 0xd0000000-0xd7ffffff,0xd8600000-0xd863ffff at device 0.0 on pci1 vgapci1: <VGA-compatible display> port 0x4000-0x403f mem 0xd8000000-0xd83fffff,0xc0000000-0xcfffffff at device 2.0 on pci0 vgapci1: Boot video device xhci0: <Intel Panther Point USB 3.0 controller> mem 0xd8700000-0xd870ffff at device 20.0 on pci0 xhci0: 32 bytes context size, 64-bit DMA usbus0: waiting for BIOS to 
give up control xhci0: Port routing mask set to 0xffffffff usbus0 on xhci0 usbus0: 5.0Gbps Super Speed USB v3.0 pci0: <simple comms> at device 22.0 (no driver attached) ehci0: <Intel Panther Point USB 2.0 controller> mem 0xd8719000-0xd87193ff at device 26.0 on pci0 usbus1: waiting for BIOS to give up control usbus1: timed out waiting for BIOS usbus1: EHCI version 1.0 usbus1 on ehci0 usbus1: 480Mbps High Speed USB v2.0 hdac0: <Intel Panther Point HDA Controller> mem 0xd8710000-0xd8713fff at device 27.0 on pci0 pcib2: <ACPI PCI-PCI bridge> at device 28.0 on pci0 pci2: <ACPI PCI bus> on pcib2 alc0: <Atheros AR8172 PCIe Fast Ethernet> port 0x2000-0x207f mem 0xd8500000-0xd853ffff at device 0.0 on pci2 alc0: 11776 Tx FIFO, 12032 Rx FIFO alc0: Using 1 MSIX message(s). miibus0: <MII bus> on alc0 atphy0: <Atheros F1 10/100/1000 PHY> PHY 0 on miibus0 atphy0: none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow alc0: Using defaults for TSO: 65518/35/2048 alc0: Ethernet address: 20:89:84:99:9c:c9 pcib3: <ACPI PCI-PCI bridge> at device 28.1 on pci0 pci3: <ACPI PCI bus> on pcib3 iwn0: <Intel Centrino Wireless-N 135> mem 0xd8400000-0xd8401fff at device 0.0 on pci3 ehci1: <Intel Panther Point USB 2.0 controller> mem 0xd8718000-0xd87183ff at device 29.0 on pci0 usbus2: waiting for BIOS to give up control usbus2: timed out waiting for BIOS usbus2: EHCI version 1.0 usbus2 on ehci1 usbus2: 480Mbps High Speed USB v2.0 isab0: <PCI-ISA bridge> at device 31.0 on pci0 isa0: <ISA bus> on isab0 ahci0: <Intel Panther Point AHCI SATA controller> port 0x4088-0x408f,0x4094-0x4097,0x4080-0x4087,0x4090-0x4093,0x4060-0x407f mem 0xd8717000-0xd87177ff at device 31.2 on pci0 ahci0: AHCI v1.30 with 6 6Gbps ports, Port Multiplier not supported ahcich0: <AHCI channel> at channel 0 on ahci0 ahcich2: <AHCI channel> at channel 2 on ahci0 ahciem0: <AHCI enclosure management bridge> on ahci0 acpi_lid0: <Control Method Lid Switch> on acpi0 acpi_tz0: 
<Thermal Zone> on acpi0 acpi_button0: <Power Button> on acpi0 atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0 atkbd0: <AT Keyboard> irq 1 on atkbdc0 kbd0 at atkbd0 atkbd0: [GIANT-LOCKED] psm0: <PS/2 Mouse> irq 12 on atkbdc0 psm0: [GIANT-LOCKED] WARNING: Device "psm" is Giant locked and may be deleted before FreeBSD 14.0. psm0: model Synaptics Touchpad, device ID 3 battery0: <ACPI Control Method Battery> on acpi0 acpi_acad0: <AC Adapter> on acpi0 est0: <Enhanced SpeedStep Frequency Control> on cpu0 Timecounter "TSC-low" frequency 1297053673 Hz quality 1000 Timecounters tick every 1.000 msec hdacc0: <Conexant CX20757 HDA CODEC> at cad 0 on hdac0 hdaa0: <Conexant CX20757 Audio Function Group> at nid 1 on hdacc0 pcm0: <Conexant CX20757 (Analog)> at nid 23 and 26 on hdaa0 pcm1: <Conexant CX20757 (Left Analog)> at nid 22 and 25 on hdaa0 hdacc1: <Intel Panther Point HDA CODEC> at cad 3 on hdac0 hdaa1: <Intel Panther Point Audio Function Group> at nid 1 on hdacc1 pcm2: <Intel Panther Point (HDMI/DP 8ch)> at nid 5 on hdaa1 ugen0.1: <Intel XHCI root HUB> at usbus0 ugen1.1: <Intel EHCI root HUB> at usbus1 ugen2.1: <Intel EHCI root HUB> at usbus2 uhub0 on usbus0 uhub0: <Intel XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0 Trying to mount root from ufs:/dev/ada1p3 [rw]... 
uhub1 on usbus1 uhub1: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus1 uhub2 on usbus2 uhub2: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus2 ses0 at ahciem0 bus 0 scbus2 target 0 lun 0 ses0: <AHCI SGPIO Enclosure 2.00 0001> SEMB S-E-S 2.00 device ses0: SEMB SES Device ada0 at ahcich0 bus 0 scbus0 target 0 lun 0 ada0: <TOSHIBA-TR150 SAFZ12.3> ACS-2 ATA SATA 3.x device ada0: Serial Number 568B31JWKBZU ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes) ada0: Command Queueing enabled ada0: 228936MB (468862128 512 byte sectors) ses0: pass0,ada0 in 'Slot 00', SATA Slot: scbus0 target 0 ada1 at ahcich2 bus 0 scbus1 target 0 lun 0 ada1: <HGST HTS545050A7E380 GG2ZBD90> ATA8-ACS SATA 2.x device ada1: Serial Number TMA55DZN3RBWKP ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) ada1: Command Queueing enabled ada1: 476940MB (976773168 512 byte sectors) ses0: pass1,ada1 in 'Slot 02', SATA Slot: scbus1 target 0 uhub0: 8 ports with 8 removable, self powered ugen0.2: <E-Signal USB Gaming Keyboard> at usbus0 ukbd0 on uhub0 ukbd0: <E-Signal USB Gaming Keyboard, class 0/0, rev 2.00/1.00, addr 1> on usbus0 kbd2 at ukbd0 ukbd1 on uhub0 ukbd1: <E-Signal USB Gaming Keyboard, class 0/0, rev 2.00/1.00, addr 1> on usbus0 kbd3 at ukbd1 uhub2: 2 ports with 2 removable, self powered uhub1: 2 ports with 2 removable, self powered ugen0.3: <CF0D4PCH4 Lenovo EasyCamera> at usbus0 ugen2.2: <vendor 0x8087 product 0x0024> at usbus2 uhub3 on uhub2 uhub3: <vendor 0x8087 product 0x0024, class 9/0, rev 2.00/0.00, addr 2> on usbus2 ugen1.2: <vendor 0x8087 product 0x0024> at usbus1 uhub4 on uhub1 uhub4: <vendor 0x8087 product 0x0024, class 9/0, rev 2.00/0.00, addr 2> on usbus1 Root mount waiting for: usbus0 usbus1 usbus2 ugen0.4: <ASIX AX88179A> at usbus0 uhub3: 6 ports with 6 removable, self powered uhub4: 6 ports with 6 removable, self powered ugen1.3: <PixArt USB Optical Mouse> at usbus1 ugen1.4: <vendor 0x8087 product 0x07da> at usbus1 drmn1: 
<drmn> on vgapci1 vgapci1: child drmn1 requested pci_enable_io vgapci1: child drmn1 requested pci_enable_io [drm] Unable to create a private tmpfs mount, hugepage support will be disabled(-19). [drm] Got stolen memory base rxbba00000, size 0x4000000 sysctl_warn_reuse: can't re-use a leaf (hw.dri.debug)! [drm] Initialized i915 1.6.0 20200917 for drmn1 on minor 0 VT: Replacing driver "efifb" with new "fb". start FB_INFO: type=11 height=768 width=1366 depth=32 pbase=0xc0000000 vbase=0xfffff800c0000000 name=drmn1 flags=0x0 stride=5504 bpp=32 end FB_INFO ichsmb0: <Intel Panther Point SMBus controller> port 0x4040-0x405f mem 0xd8715000-0xd87150ff at device 31.3 on pci0 smbus0: <System Management Bus> on ichsmb0 lo0: link state changed to UP uhid0 on uhub0 uhid0: <E-Signal USB Gaming Keyboard, class 0/0, rev 2.00/1.00, addr 1> on usbus0 axge0 on uhub0 axge0: <NetworkInterface> on usbus0 ums0 on uhub4 ums0: <PixArt USB Optical Mouse, class 0/0, rev 2.00/1.00, addr 3> on usbus1 ums0: 3 buttons and [XYZ] coordinates ID=0 ubt0 on uhub4 ubt0: <vendor 0x8087 product 0x07da, class 224/1, rev 2.00/78.69, addr 4> on usbus1 miibus1: <MII bus> on axge0 ukphy0: <Generic IEEE 802.3u media interface> PHY 3 on miibus1 ukphy0: none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow ue0: <USB Ethernet> on axge0 ue0: Ethernet address: f8:e4:3b:9f:ea:3c WARNING: attempt to domain_add(bluetooth) after domainfinalize() WARNING: attempt to domain_add(netgraph) after domainfinalize() ue0: link state changed to DOWN ue0: link state changed to UP Security policy loaded: MAC/ntpd (mac_ntpd) ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state 
changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP 
ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state 
changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ubt0: ubt_bulk_read_callback:1119: bulk-in transfer failed: USB_ERR_STALLED ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ugen0.4: <ASIX AX88179A> at usbus0 (disconnected) axge0: at uhub0, port 6, addr 3 (disconnected) ukphy0: detached miibus1: detached axge0: detached ugen0.4: <ASIX AX88179A> at usbus0 axge0 on uhub0 axge0: <NetworkInterface> on usbus0 miibus1: <MII bus> on axge0 
ukphy0: <Generic IEEE 802.3u media interface> PHY 3 on miibus1 ukphy0: none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow ue0: <USB Ethernet> on axge0 ue0: Ethernet address: f8:e4:3b:9f:ea:3c ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ue0: link state changed to DOWN ue0: link state changed to UP ----------------------------------------------------- 
root@ykla:/usr/home/ykla # pciconf -lv hostb0@pci0:0:0:0: class=0x060000 rev=0x09 hdr=0x00 vendor=0x8086 device=0x0154 subvendor=0x17aa subdevice=0x3977 vendor = 'Intel Corporation' device = '3rd Gen Core processor DRAM Controller' class = bridge subclass = HOST-PCI pcib1@pci0:0:1:0: class=0x060400 rev=0x09 hdr=0x01 vendor=0x8086 device=0x0151 subvendor=0x17aa subdevice=0x3977 vendor = 'Intel Corporation' device = 'Xeon E3-1200 v2/3rd Gen Core processor PCI Express Root Port' class = bridge subclass = PCI-PCI vgapci1@pci0:0:2:0: class=0x030000 rev=0x09 hdr=0x00 vendor=0x8086 device=0x0166 subvendor=0x17aa subdevice=0x3800 vendor = 'Intel Corporation' device = '3rd Gen Core processor Graphics Controller' class = display subclass = VGA xhci0@pci0:0:20:0: class=0x0c0330 rev=0x04 hdr=0x00 vendor=0x8086 device=0x1e31 subvendor=0x17aa subdevice=0x3977 vendor = 'Intel Corporation' device = '7 Series/C210 Series Chipset Family USB xHCI Host Controller' class = serial bus subclass = USB none0@pci0:0:22:0: class=0x078000 rev=0x04 hdr=0x00 vendor=0x8086 device=0x1e3a subvendor=0x17aa subdevice=0x3977 vendor = 'Intel Corporation' device = '7 Series/C216 Chipset Family MEI Controller' class = simple comms ehci0@pci0:0:26:0: class=0x0c0320 rev=0x04 hdr=0x00 vendor=0x8086 device=0x1e2d subvendor=0x17aa subdevice=0x3977 vendor = 'Intel Corporation' device = '7 Series/C216 Chipset Family USB Enhanced Host Controller' class = serial bus subclass = USB hdac0@pci0:0:27:0: class=0x040300 rev=0x04 hdr=0x00 vendor=0x8086 device=0x1e20 subvendor=0x17aa subdevice=0x3977 vendor = 'Intel Corporation' device = '7 Series/C216 Chipset Family High Definition Audio Controller' class = multimedia subclass = HDA pcib2@pci0:0:28:0: class=0x060400 rev=0xc4 hdr=0x01 vendor=0x8086 device=0x1e10 subvendor=0x17aa subdevice=0x3977 vendor = 'Intel Corporation' device = '7 Series/C216 Chipset Family PCI Express Root Port 1' class = bridge subclass = PCI-PCI pcib3@pci0:0:28:1: class=0x060400 rev=0xc4 
hdr=0x01 vendor=0x8086 device=0x1e12 subvendor=0x17aa subdevice=0x3977 vendor = 'Intel Corporation' device = '7 Series/C210 Series Chipset Family PCI Express Root Port 2' class = bridge subclass = PCI-PCI ehci1@pci0:0:29:0: class=0x0c0320 rev=0x04 hdr=0x00 vendor=0x8086 device=0x1e26 subvendor=0x17aa subdevice=0x3977 vendor = 'Intel Corporation' device = '7 Series/C216 Chipset Family USB Enhanced Host Controller' class = serial bus subclass = USB isab0@pci0:0:31:0: class=0x060100 rev=0x04 hdr=0x00 vendor=0x8086 device=0x1e59 subvendor=0x17aa subdevice=0x3977 vendor = 'Intel Corporation' device = 'HM76 Express Chipset LPC Controller' class = bridge subclass = PCI-ISA ahci0@pci0:0:31:2: class=0x010601 rev=0x04 hdr=0x00 vendor=0x8086 device=0x1e03 subvendor=0x17aa subdevice=0x3977 vendor = 'Intel Corporation' device = '7 Series Chipset Family 6-port SATA Controller [AHCI mode]' class = mass storage subclass = SATA ichsmb0@pci0:0:31:3: class=0x0c0500 rev=0x04 hdr=0x00 vendor=0x8086 device=0x1e22 subvendor=0x17aa subdevice=0x3977 vendor = 'Intel Corporation' device = '7 Series/C216 Chipset Family SMBus Controller' class = serial bus subclass = SMBus vgapci0@pci0:1:0:0: class=0x038000 rev=0x00 hdr=0x00 vendor=0x1002 device=0x6663 subvendor=0x17aa subdevice=0x3800 vendor = 'Advanced Micro Devices, Inc. [AMD/ATI]' device = 'Sun PRO [Radeon HD 8570A/8570M]' class = display alc0@pci0:2:0:0: class=0x020000 rev=0x10 hdr=0x00 vendor=0x1969 device=0x10a0 subvendor=0x17aa subdevice=0x3802 vendor = 'Qualcomm Atheros' device = 'QCA8172 Fast Ethernet' class = network subclass = ethernet iwn0@pci0:3:0:0: class=0x028000 rev=0xc4 hdr=0x00 vendor=0x8086 device=0x0893 subvendor=0x8086 subdevice=0x0262 vendor = 'Intel Corporation' device = 'Centrino Wireless-N 135' class = network root@ykla:/usr/home/ykla #
Same problem here with 13.1-RELEASE-p2, though HIGHLY intermittent (roughly once a month, and, oddly, within three minutes of booting up or else never). Also, it's a long-standing problem on this board, going back at least to FBSD 11. Using the port fixes the problem, I think. So far. The port should probably be more widely advertised. Is there a copyright problem with adopting the port into base? dmesg | grep re0: re0: <Realtek PCIe GbE Family Controller> port 0xf000-0xf0ff mem 0xfe700000-0xfe700fff,0xf0300000-0xf0303fff irq 29 at device 0.0 on pci4 re0: Using Memory Mapping! re0: Using 1 MSI-X message re0: version:1.97.00 re0: Ethernet address: b4:2e:99:47:c1:43 re0: Ethernet address: b4:2e:99:47:c1:43 re0: link state changed to UP pciconf -lv: re0@pci0:4:0:0: class=0x020000 rev=0x0c hdr=0x00 vendor=0x10ec device=0x8168 subvendor=0x1458 subdevice=0xe000 vendor = 'Realtek Semiconductor Co., Ltd.' device = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller' class = network subclass = ethernet
(In reply to George Mitchell from comment #104) FYI, I have used the driver from ports for years without any errors. Only the in-kernel driver shows this weird phenomenon. Keep using the ports (or pkg) based one and you'll be fine.
(In reply to George Mitchell from comment #104) Which port? Thanks.
Am I replying to the wrong bug? My USB NIC is not a Realtek but an ASIX AX88179A, yet I have this problem too.
The port to install is net/realtek-re-kmod, cited in comment #92 and a few others following.
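For readers who want the concrete steps, here is a minimal sketch of making the loader use the module from the port instead of the in-kernel re(4) driver. It assumes the two loader.conf lines given in the port's pkg-message (if_re_load and if_re_name) and that the module is installed as /boot/modules/if_re.ko; check the pkg-message on your own system. The helper name enable_vendor_re is hypothetical and exists only for illustration.

```shell
#!/bin/sh
# Sketch only: enable_vendor_re is a hypothetical helper name.
# It appends the loader.conf lines that make the loader pick up the
# port's if_re.ko instead of the in-kernel re(4) driver.
# Assumption: the module lives at /boot/modules/if_re.ko, as the
# port's pkg-message states; verify this on your system.
enable_vendor_re() {
    conf="${1:-/boot/loader.conf}"
    for line in 'if_re_load="YES"' 'if_re_name="/boot/modules/if_re.ko"'; do
        # -x matches the whole line, -F treats it as a fixed string;
        # lines that are already present are not added a second time.
        grep -qxF "$line" "$conf" 2>/dev/null || printf '%s\n' "$line" >> "$conf"
    done
}
```

Typical use, as root, would be `pkg install realtek-re-kmod`, then `enable_vendor_re`, then a reboot. After the reboot, `dmesg | grep re0:` should show the "Realtek PCIe GbE Family Controller" banner seen in other comments here rather than the "RealTek 8168/8111 ..." banner of the base driver.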
I'm here via <https://lists.freebsd.org/archives/freebsd-questions/2022-November/002411.html>. Whilst here, triage: * lower the priority, in accordance with <https://wiki.freebsd.org/Bugzilla/TriageTraining> * assume that a fix (if any) will not be merged to stable/12 before the end of its life; 12.4, expected four days from now, will be the final RELEASE from this branch.
Very similar problem: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=213751
see also https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=267514
This bug has been open since 2012 and a workaround has been documented here in the comments since 2020. In the four years since, there have been no complaints of any regressions or other issues caused by using the vendor-patched version of the driver or the ports net/realtek-re-kmod repackagement of it. The latest version of the vendor-provided driver is 1.98.0 and is compatible with FreeBSD 4 and up (per the source code), tested on FreeBSD 5.x and up (per the readme in the driver tarball), and is officially supported on FreeBSD 7 and up (per the product support website). The vendor-patched version of the driver continues to be distributed under the terms of the 3-clause BSD license. re(4) says it is compatible with the 8139C+/8169/816xS/811xS/8168/810xE/8111 network cards, while the vendor-patched version of the driver says it is compatible with 8169S/8169SB/8169SC/8168B/8168C/8168CP/8168D/8168DP/8168E/8168F/8168FB/8168G/818GU/8168H/8168EP/8411/8168FP/8101E/8102E/8103E/8401/8105E/8106E/8402/8125, so the two lists of supported chipsets differ. I do not know whether this is a documentation discrepancy or whether support for the older 8139x, 811xS, and 8111 adapters has actually been dropped (I would guess the former, as otherwise the set of dropped cards would look random, assuming model numbers are linearly increasing or at least linearly branching). It seems to me that we should be able to update the in-kernel version of the driver to use the patched 1.98.0 version that's been reported to be issue-free, so this can be fixed upstream. If support for those 3 models has in fact been dropped, we'd need to ship two separate versions of the re driver to keep supporting the same chipsets, but I'm *guessing* that's not the case. Any confirmation from users here who have used the net/realtek-re-kmod package with 8139x, 811xS, or 8111 network adapters would be very helpful.
(In reply to Mahmoud Al-Qudsi from comment #112) > In the four years since, there have been no complaints of any regressions or other issues caused by using the vendor-patched version of the driver or the ports net/realtek-re-kmod repackagement of it. As I reported above, the vendor driver emits Ethernet pause frames, which is undesirable on my network, and this behavior is not configurable. Of course, being unable to send or receive frames is worse, but the opaque nature of the vendor driver makes it hard to do any refinement. When I say the vendor driver is opaque, look at the function re_enable_EEE (line 8087): that is a whole lot of probably important hardware configuration, and we have no idea what it is doing. That said, perhaps the watchdog message could be adjusted to refer to this bug or the realtek-re-kmod port. Or maybe the vendor could be persuaded to release more details about their hardware?
Me too message, with a bit of info: - before 13.2-p8 I saw watchdog timeouts probably not more than once per month, and never looked for a fix - the machine is my home desktop, all others have Intel cards - I use powerd - it happens during "restic" backup (so probably with high bi-directional traffic) - I have /home over NFSv3, and nothing like this has happened in the last (10?) years - I can confirm that only a reboot fixes the missing ethernet driver - I just added the pkg (realtek-re-kmod-199.00_1 installed) # MoBo info (AMI BIOS) Manufacturer: ASUSTeK COMPUTER INC. Product Name: PRIME B250M-A Processor: Intel(R) Core(TM) i5-7600 CPU @ 3.50GHz RAM: 2x 8GB DDR4 2133 (w/out overclock) # yesterday boot / kernel module re0: <RealTek 8168/8111 B/C/CP/D/DP/E/F/G PCIe Gigabit Ethernet> port 0xe000-0xe0ff mem 0xf7004000-0xf7004fff,0xf7000000-0xf7003fff irq 19 at device 0.0 on pci1 re0: Using 1 MSI-X message re0: ASPM disabled re0: Chip rev. 0x54000000 re0: MAC rev. 0x00100000 re0: Using defaults for TSO: 65518/35/2048 re0: Ethernet address: 2c:4d:54:68:ae:62 re0: netmap queues/slots: TX 1/256, RX 1/256 re0: link state changed to DOWN re0: link state changed to UP # last boot / pkg module re0: <Realtek PCIe GbE Family Controller> port 0xe000-0xe0ff mem 0xf7004000-0xf7004fff,0xf7000000-0xf7003fff irq 19 at device 0.0 on pci1 re0: Using Memory Mapping!
re0: Using 1 MSI-X message re0: ASPM disabled re0: version:1.99.04 re0: Ethernet address: 2c:4d:54:68:ae:62 re0: Using defaults for TSO: 65518/35/2048 re0: Ethernet address: 2c:4d:54:68:ae:62 re0: link state changed to UP # ifconfig using pkg (I forgot to save the output when using the kernel module) re0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500 options=60251b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,TSO4,LRO,WOL_MAGIC,RXCSUM_IPV6,TXCSUM_IPV6> ether 2c:4d:54:68:ae:62 inet {home_lan} netmask 0xffffff00 broadcast {home_lan} media: Ethernet autoselect (1000baseT <full-duplex>) status: active nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL> re0@pci0:2:0:0: class=0x020000 rev=0x15 hdr=0x00 vendor=0x10ec device=0x8168 subvendor=0x1043 subdevice=0x8677 vendor = 'Realtek Semiconductor Co., Ltd.' device = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller' class = network subclass = ethernet bar [10] = type I/O Port, range 32, base 0xe000, size 256, enabled bar [18] = type Memory, range 64, base 0xf7004000, size 4096, enabled bar [20] = type Memory, range 64, base 0xf7000000, size 16384, enabled cap 01[40] = powerspec 3 supports D0 D1 D2 D3 current D0 cap 05[50] = MSI supports 1 message, 64 bit cap 10[70] = PCI-Express 2 endpoint MSI 1 max data 128(128) RO max read 2048 link x1(x1) speed 2.5(2.5) ASPM disabled(L0s/L1) ClockPM disabled cap 11[b0] = MSI-X supports 4 messages, enabled Table in map 0x20[0x0], PBA in map 0x20[0x800] ecap 0001[100] = AER 2 0 fatal 0 non-fatal 0 corrected ecap 0002[140] = VC 1 max VC0 ecap 0003[160] = Serial 1 01000000684ce000 ecap 0018[170] = LTR 1 ecap 001e[178] = L1 PM Substates 1 # current values from sysctl (all automatic) kern.ipc.nmbjumbop: 504907 kern.ipc.nmbclusters: 1009815 kern.ipc.nmbjumbo9: 149602 kern.ipc.nmbjumbo16: 84151 # devinfo (partial) pcib2 pnpinfo vendor=0x8086 device=0xa297 subvendor=0x1043 subdevice=0x8694 class=0x060400 at slot=28 function=7 dbsf=pci0:0:28:7 handle=\_SB_.PCI0.RP08 I/O ports: 
0xe000-0xefff I/O memory addresses: 0xf7000000-0xf70fffff PCI domain 0 bus numbers: 2 pci1 pcib2 bus numbers: 2 re0 pnpinfo vendor=0x10ec device=0x8168 subvendor=0x1043 subdevice=0x8677 class=0x020000 at slot=0 function=0 dbsf=pci0:2:0:0 handle=\_SB_.PCI0.RP08.PXSX Interrupt request lines: 0x83 pcib2 I/O port window: 0xe000-0xe0ff pcib2 memory window: 0xf7000000-0xf7003fff 0xf7004000-0xf7004fff Thanks a lot to all people working on this. And finally a big thanks to ale@ for the port ;)
I also have problems with the in-tree if_re. After upgrading 12.3 -> 14.0 it started to lose connectivity every ~10-30 minutes, for 0.5-10 minutes at a time, which is a pity. It does not depend much on network load (at least comparing small load vs. huge load). Frankly speaking, even on 12.x there were sometimes such losses of connectivity, but very rarely. There are no "watchdog timeouts", just loss of connectivity, I guess. re0@pci0:3:0:0: class=0x020000 rev=0x02 hdr=0x00 vendor=0x10ec device=0x8168 subvendor=0x1458 subdevice=0xe000 vendor = 'Realtek Semiconductor Co., Ltd.' device = 'RTL8111/8168/8211/8411 PCI Express Gigabit Ethernet Controller' class = network subclass = ethernet Here is a ping that happened to be running across a recovery: # ping 172.22.2.5 PING 172.22.2.5: 56 data bytes ... ping: sendto: Host is down ping: sendto: Host is down ping: sendto: Host is down ping: sendto: Host is down ... 64 bytes from 172.22.2.5: icmp_seq=6 ttl=64 time=13123.658 ms 64 bytes from 172.22.2.5: icmp_seq=7 ttl=64 time=12122.371 ms 64 bytes from 172.22.2.5: icmp_seq=8 ttl=64 time=11120.850 ms 64 bytes from 172.22.2.5: icmp_seq=9 ttl=64 time=10112.358 ms 64 bytes from 172.22.2.5: icmp_seq=10 ttl=64 time=9107.387 ms 64 bytes from 172.22.2.5: icmp_seq=11 ttl=64 time=8055.572 ms 64 bytes from 172.22.2.5: icmp_seq=12 ttl=64 time=7040.134 ms 64 bytes from 172.22.2.5: icmp_seq=13 ttl=64 time=6012.834 ms 64 bytes from 172.22.2.5: icmp_seq=14 ttl=64 time=5010.828 ms 64 bytes from 172.22.2.5: icmp_seq=15 ttl=64 time=4009.945 ms 64 bytes from 172.22.2.5: icmp_seq=16 ttl=64 time=3007.644 ms 64 bytes from 172.22.2.5: icmp_seq=17 ttl=64 time=2006.117 ms 64 bytes from 172.22.2.5: icmp_seq=18 ttl=64 time=1004.582 ms 64 bytes from 172.22.2.5: icmp_seq=19 ttl=64 time=2.977 ms 64 bytes from 172.22.2.5: icmp_seq=20 ttl=64 time=0.784 ms 64 bytes from 172.22.2.5: icmp_seq=21 ttl=64 time=0.464 ms It looks like the packets were buffered somewhere. I also have this in the logs after recovering: Mar 17 17:33:49 srv kernel: Limiting open port RST response from 1346 to 187 packets/sec
It looks like the packets of all the "stalled" TCP streams were delivered and got RSTs fired back. Turning on debug on the interface did not provide any new diagnostics. ifconfig re0 -tso -tso4 -tso6 -vlanhwcsum -vlanhwtso -rxcsum -txcsum did not help either (on 12.x -vlanhwtso probably helped). what was helped - switchover to the port net/realtek-re-kmod
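The "buffered somewhere" reading of that ping output can be checked with a little arithmetic. If ping sends one request per second starting with icmp_seq=0 (an assumption, but it is ping's default interval), a reply for sequence number n with round-trip time t arrives roughly n + t/1000 seconds after ping started. A minimal sketch, with ping_arrivals as a hypothetical helper name:

```shell
#!/bin/sh
# Sketch only: ping_arrivals is a hypothetical helper name.
# Reads ping output on stdin and prints, for each reply line,
# "<icmp_seq> <estimated arrival time in seconds since ping started>".
# Assumption: one request per second, first request has icmp_seq=0.
ping_arrivals() {
    awk '
    /icmp_seq=/ && /time=/ {
        seq = ""; rtt = ""
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^icmp_seq=/) seq = substr($i, 10)   # text after "icmp_seq="
            if ($i ~ /^time=/)     rtt = substr($i, 6)    # text after "time=", in ms
        }
        printf "%d %.1f\n", seq, seq + rtt / 1000
    }'
}
```

Fed the output above, icmp_seq=6 (13123.658 ms) and icmp_seq=18 (1004.582 ms) both come out at about 19 seconds, as do the sequence numbers in between. In other words, the replies did not trickle in with ever shorter delays; they all arrived in one burst, which fits a queue being released when the interface recovered.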
(In reply to vova from comment #115) Well ... with if_re from net/realtek-re-kmod I am also losing connectivity, not as often as with the in-tree driver, but with quite funny results: the system clock drifts so far that ntpd dies with a panic: Mar 20 15:17:55 srv ntpd[1583]: Clock offset exceeds panic threshold. Mar 20 15:17:55 srv ntpd[1583]: Set system clock by hand. There are also massive 'Limiting icmp unreach' messages (apparently just after connectivity is restored): Mar 20 15:28:33 srv kernel: Limiting icmp unreach response from 419 to 195 packets/sec Mar 20 15:45:02 srv kernel: Limiting icmp unreach response from 407 to 216 packets/sec Mar 20 15:51:33 srv kernel: Limiting icmp unreach response from 392 to 214 packets/sec I am going to change the network card, as it is unlikely that this one can be trusted.
I can trigger this on 14.1-p3. Reliably. To replicate: - mount an NFS share from a relatively fast machine; - extract a multigigabyte compressed archive. On 15-CURRENT, I was replicating it, but I no longer have that test. It would finish a poudriere run (if restarted enough) but it would consistently fail the last bit; I believe it's signing the packages at that point (counting up for many minutes in %). I have triggered it with "7z x .." and I have triggered it by using Plasma's file browser, right-clicking and choosing "extract here" on an NFS share. The hardware is a Lenovo Neo 30a 24 Gen 3. pciconf says (of the re0): re0@pci0:2:0:0: class=0x020000 rev=0x15 hdr=0x00 vendor=0x10ec device=0x8168 subvendor=0x17aa subdevice=0x375a vendor = 'Realtek Semiconductor Co., Ltd.' device = 'RTL8111/8168/8211/8411 PCI Express Gigabit Ethernet Controller' class = network subclass = ethernet When I trigger it, console says: re0: watchdog timeout re0: link state changed to DOWN re0: link state changed to UP When I trigger it, a running ping says: root@strike:/home/dgilbert # ping 192.168.221.1 PING 192.168.221.1 (192.168.221.1): 56 data bytes ping: sendto: No route to host ping: sendto: No route to host ping: sendto: No route to host ping: sendto: No route to host 64 bytes from 192.168.221.1: icmp_seq=22 ttl=64 time=7490.585 ms 64 bytes from 192.168.221.1: icmp_seq=23 ttl=64 time=6472.741 ms 64 bytes from 192.168.221.1: icmp_seq=24 ttl=64 time=5445.312 ms 64 bytes from 192.168.221.1: icmp_seq=25 ttl=64 time=4444.340 ms 64 bytes from 192.168.221.1: icmp_seq=30 ttl=64 time=0.173 ms 64 bytes from 192.168.221.1: icmp_seq=31 ttl=64 time=0.457 ms 64 bytes from 192.168.221.1: icmp_seq=32 ttl=64 time=0.374 ms 64 bytes from 192.168.221.1: icmp_seq=33 ttl=64 time=0.377 ms 64 bytes from 192.168.221.1: icmp_seq=34 ttl=64 time=0.321 ms The topology: NFS server: FreeBSD 14.1-p3, 140T comprised of 4*4T disks and 2T of nvme cache. 128G RAM. Threadripper. 10GE network. 
NFS client: This crappy free-to-me all-in-one. 1GE network.
This behavior sounds very very similar to Bug 213751.
Same as https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=166724#c117 when doing restic backup operation such as prune and check. re0: <RealTek 8168/8111 B/C/CP/D/DP/E/F/G PCIe Gigabit Ethernet> port 0x3000-0x30ff mem 0xb4204000-0xb4204fff,0xb4200000-0xb4203fff at device 0.0 on pci3 re0: Using 1 MSI-X message re0: ASPM disabled re0: Chip rev. 0x54000000 re0: MAC rev. 0x00100000 laptop Lenovo Legion 5 (Intel) Today my re0 turned off and lost connections. At main-n272449-6e414739fc95
(In reply to vova from comment #115) > what was helped - switchover to the port net/realtek-re-kmod I will try installing the re driver from ports and check whether the problem persists.
^Triage: clear unneeded flags. Nothing has yet been committed to be merged.
I was using the patched driver for several years without issue, but it stopped working earlier this year due to a different bug. I switched back to the regular driver and it was fine until a few weeks ago; now I'm back to watchdog timeouts every couple of days. I don't remember if I did anything that could have caused the issue to reappear. I'm using 14.1-RELEASE-p5 now. I had a look at /var/log/messages from July until now. There are a couple of isolated instances of two to three watchdog timeouts, but these weren't a problem. The first occurrence of the actual problem, where it times out every couple of seconds until I reboot the machine, was on the 8th of September. I seem to have updated from 14.1-p3 to p4 the day before, but I don't know if that could have caused it.
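For anyone who wants to do the same kind of review of /var/log/messages, here is a minimal sketch that tallies watchdog timeouts per day and per interface, so that isolated events can be told apart from a burst. It assumes the classic syslog line layout seen at the top of this report ("Apr 7 04:06:45 upstairs kernel: re0: watchdog timeout"); watchdog_summary is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: watchdog_summary is a hypothetical helper name.
# Reads syslog-style lines from the given files (or stdin) and prints
# "<month> <day> <interface> <count>" for every day/interface pair
# that logged at least one "watchdog timeout".
watchdog_summary() {
    awk '
    / watchdog timeout$/ {
        iface = $(NF - 2)          # e.g. "re0:" just before "watchdog timeout"
        sub(/:$/, "", iface)       # drop the trailing colon
        count[$1 " " $2 " " iface]++
    }
    END { for (k in count) print k, count[k] }' "$@" | sort
}
```

For example, `watchdog_summary /var/log/messages` prints one line per day and interface; a day with a count in the hundreds would correspond to the "times out every couple of seconds until reboot" state, while counts of two or three match the harmless isolated instances. Rotated logs are usually compressed and would need to be decompressed and piped in on stdin.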
Not sure if this is helpful at all, but this issue only occurs for me when reading audio files via foobar2000 (Windows) via SMB share, while simultaneously attempting to transfer large files to a different SMB share on the same machine.
I had the same 're0: watchdog timeout' bug on my Gigabyte B450 AORUS M motherboard (BIOS F67c). I simply could not log into FreeBSD (14.1 and 14.2), since the network service kept being restarted due to re0 getting stuck with the above-mentioned error. This was 100% reproducible on my triple-boot system: start Windows 10; power down; power up and start FreeBSD; result: re0: watchdog timeout. In order to get rid of the error I had to either completely unplug the power cord from the system before starting FreeBSD or boot into Linux first. Today I found that a better solution is to install net/realtek-re-kmod and disable the rx/tx offloading via ifconfig_re0="DHCP -rxcsum -txcsum -rxcsum6 -txcsum6"
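To confirm that such an rc.conf setting actually took effect, one can look at the options= line of the ifconfig output. The following is a minimal sketch that lists the offload capabilities still enabled on an interface; the capability names are the ones that appear in the ifconfig output posted earlier in this thread, and enabled_offloads is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: enabled_offloads is a hypothetical helper name.
# Reads `ifconfig <iface>` output on stdin and prints, one per line,
# the checksum/TSO/LRO capabilities still listed in the options= line.
# No output means none of these offloads is enabled.
enabled_offloads() {
    # Only the interface "options=" line is used; the "nd6 options=" line is skipped.
    sed -n 's/^[[:space:]]*options=[0-9a-f]*<\(.*\)>.*/\1/p' |
        tr ',' '\n' |
        awk '/^(RXCSUM|TXCSUM|RXCSUM_IPV6|TXCSUM_IPV6|TSO4|TSO6|LRO)$/'
}
```

Used as `ifconfig re0 | enabled_offloads`. With the rc.conf line above in effect, the four checksum entries should no longer be printed. The same flags can also be tried at runtime with `ifconfig re0 -rxcsum -txcsum -rxcsum6 -txcsum6` before making them permanent.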
I never had the issue but since I switched to 14.2 it seems to happen once a week or so. What is the accepted workaround/fix? dmesg shows: re0: <RealTek 8168/8111 B/C/CP/D/DP/E/F/G PCIe Gigabit Ethernet> port 0xe000-0xe0ff mem 0xf7104000-0xf7104fff,0xf7100000-0xf7103fff irq 19 at device 0.0 on pci1 re0: Using 1 MSI-X message re0: ASPM disabled re0: Chip rev. 0x54000000 re0: MAC rev. 0x00100000 miibus0: <MII bus> on re0 rgephy0: <RTL8251/8153 1000BASE-T media interface> PHY 1 on miibus0 rgephy0: none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX, 100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT-FDX, 1000baseT-FDX-master, 1000baseT-FDX-flow, 1000baseT-FDX-flow-master, auto, auto-flow re0: Using defaults for TSO: 65518/35/2048 re0: Ethernet address: 18:31:bf:70:12:6e
The consensus of the 124 previous comments is to use your choice of the net/realtek-re-kmod or net/realtek-re-kmod198 ports instead. The second has more of a track record and seems to work every time; the first is apparently based on Realtek's version "1100" driver and fails for some users but not for others. Your mileage may vary.
I've ended up switching to igb0: <Intel(R) PRO/1000 82576>. To tell the truth, this host is quite old (>10y); if_re worked quite fine for all these years and started to glitch only after upgrading to 14.0. The driver from ports helped a little bit, but it still got stuck from time to time. Then I switched to em0: <Intel(R) Legacy PRO/1000 MT 82540EM> and ... it also glitched. Digging with dtrace shows 'iflib_if_transmit:return Error ENOBUFS', so there are no hints about what changed in 14.x ... anyway, the hardware upgrade to igb0 helped ... Also, on another host, re0: <RealTek 8168/8111 B/C/CP/D/DP/E/F/G PCIe Gigabit Ethernet> works quite fine for me with the base system driver.
The "watchdog timeout" also bit me here last night. Host is several years old, runs 24x7, never had a problem until yesterday. re0: <RealTek 8168/8111 B/C/CP/D/DP/E/F/G PCIe Gigabit Ethernet> port 0xc000-0xc0ff mem 0xa1104000-0xa1104fff,0xa1100000-0xa1103fff at device 0.0 on pci2 re0: Using 1 MSI-X message re0: ASPM disabled re0: Chip rev. 0x54000000 re0: MAC rev. 0x00100000 miibus0: <MII bus> on re0 re0: Using defaults for TSO: 65518/35/2048 re0: Ethernet address: d8:5e:d3:xx:xx:xx re0: netmap queues/slots: TX 1/256, RX 1/256 re1: <RealTek 8168/8111 B/C/CP/D/DP/E/F/G PCIe Gigabit Ethernet> port 0xc000-0xc0ff mem 0xa1104000-0xa1104fff,0xa1100000-0xa1103fff at device 0.0 on pci3 re1: Using 1 MSI-X message re1: ASPM disabled re1: Chip rev. 0x54000000 re1: MAC rev. 0x00100000 miibus1: <MII bus> on re1 re1: Using defaults for TSO: 65518/35/2048 re1: Ethernet address: d8:5e:d3:xx:xx:xx re1: netmap queues/slots: TX 1/256, RX 1/256 Per advice here, I've switched to the vendor 1.100 driver. I immediately needed the -rxcsum -txcsum -rxcsum6 -txcsum6 flags, as advised in the pkg-message. It's been running fine for about 20 hours now.
Meant to add above that the watchdog timeout hit re1. There had been occasional timeouts in the log for a few days but no problems. Last night there was a series of timeouts in short order and re1 stopped working. No amount of interface down/up or netif restart re1 helped. Had to reboot to get it going again. All the while re1 was not working, re0 was still working fine.