I have an issue with re0 watchdog timeouts under moderate load with powerd enabled. I'm using pfSense with the WAN link on a VLAN and the native (untagged) network as LAN. Network speed is 180 Mbps down, 12 Mbps up. When I start a speed test, the test fails because the interface hits the re0 watchdog timeout. After disabling powerd the system appears to function as intended, with no re0 watchdog timeouts recorded. I'm not sure which chipset it is, but the machine is an ASRock Beebox N3000: http://www.asrock.com/nettop/Intel/Beebox%20Series/
Can you post the output of pciconf -lvbc?
Here is the output.

[2.3-BETA][root@pfSense.Home.lan]/root: pciconf -lvbc
hostb0@pci0:0:0:0: class=0x060000 card=0x22b11849 chip=0x22808086 rev=0x21 hdr=0x00
    vendor = 'Intel Corporation'
    class = bridge
    subclass = HOST-PCI
vgapci0@pci0:0:2:0: class=0x030000 card=0x22b11849 chip=0x22b18086 rev=0x21 hdr=0x00
    vendor = 'Intel Corporation'
    class = display
    subclass = VGA
    bar [10] = type Memory, range 64, base rx90000000, size 16777216, enabled
    bar [18] = type Prefetchable Memory, range 64, base rx80000000, size 268435456, enabled
    bar [20] = type I/O Port, range 32, base rxf000, size 64, enabled
    cap 01[d0] = powerspec 2 supports D0 D3 current D0
    cap 05[90] = MSI supports 1 message
    cap 09[b0] = vendor (length 7) Intel cap 0 version 1
ahci0@pci0:0:19:0: class=0x010601 card=0x22a31849 chip=0x22a38086 rev=0x21 hdr=0x00
    vendor = 'Intel Corporation'
    class = mass storage
    subclass = SATA
    bar [20] = type I/O Port, range 32, base rxf060, size 32, enabled
    bar [24] = type Memory, range 32, base rx91415000, size 2048, enabled
    cap 05[80] = MSI supports 1 message enabled with 1 message
    cap 01[70] = powerspec 3 supports D0 D3 current D0
    cap 12[a8] = SATA Index-Data Pair
xhci0@pci0:0:20:0: class=0x0c0330 card=0x22b51849 chip=0x22b58086 rev=0x21 hdr=0x00
    vendor = 'Intel Corporation'
    class = serial bus
    subclass = USB
    bar [10] = type Memory, range 64, base rx91400000, size 65536, enabled
    cap 01[70] = powerspec 2 supports D0 D3 current D0
    cap 05[80] = MSI supports 8 messages, 64 bit enabled with 1 message
none0@pci0:0:26:0: class=0x108000 card=0x22981849 chip=0x22988086 rev=0x21 hdr=0x00
    vendor = 'Intel Corporation'
    class = encrypt/decrypt
    bar [10] = type Memory, range 32, base rx91100000, size 1048576, enabled
    bar [14] = type Memory, range 32, base rx91000000, size 1048576, enabled
    cap 01[80] = powerspec 3 supports D0 D3 current D0
    cap 05[a0] = MSI supports 1 message
hdac0@pci0:0:27:0: class=0x040300 card=0x02831849 chip=0x22848086 rev=0x21 hdr=0x00
    vendor = 'Intel Corporation'
    class = multimedia
    subclass = HDA
    bar [10] = type Memory, range 64, base rx91410000, size 16384, enabled
    cap 01[50] = powerspec 2 supports D0 D3 current D0
    cap 05[60] = MSI supports 1 message, 64 bit enabled with 1 message
pcib1@pci0:0:28:0: class=0x060400 card=0x22c81849 chip=0x22c88086 rev=0x21 hdr=0x01
    vendor = 'Intel Corporation'
    class = bridge
    subclass = PCI-PCI
    cap 10[40] = PCI-Express 2 root port slot max data 128(128) link x1(x1) speed 2.5(5.0) ASPM disabled(L0s/L1)
    cap 05[80] = MSI supports 1 message
    cap 0d[90] = PCI Bridge card=0x22c81849
    cap 01[a0] = powerspec 3 supports D0 D3 current D0
    ecap 0000[100] = unknown 0
    ecap 001e[200] = unknown 1
pcib2@pci0:0:28:1: class=0x060400 card=0x22ca1849 chip=0x22ca8086 rev=0x21 hdr=0x01
    vendor = 'Intel Corporation'
    class = bridge
    subclass = PCI-PCI
    cap 10[40] = PCI-Express 2 root port slot max data 128(128) link x1(x1) speed 2.5(5.0) ASPM disabled(L0s/L1)
    cap 05[80] = MSI supports 1 message
    cap 0d[90] = PCI Bridge card=0x22ca1849
    cap 01[a0] = powerspec 3 supports D0 D3 current D0
    ecap 0000[100] = unknown 0
    ecap 001e[200] = unknown 1
isab0@pci0:0:31:0: class=0x060100 card=0x229c1849 chip=0x229c8086 rev=0x21 hdr=0x00
    vendor = 'Intel Corporation'
    class = bridge
    subclass = PCI-ISA
    cap 09[e0] = vendor (length 12) Intel cap 1 version 0
    features: 4 PCI-e x1 slots
none1@pci0:0:31:3: class=0x0c0500 card=0x22921849 chip=0x22928086 rev=0x21 hdr=0x00
    vendor = 'Intel Corporation'
    class = serial bus
    subclass = SMBus
    bar [10] = type Memory, range 32, base rx91414000, size 32, enabled
    bar [20] = type I/O Port, range 32, base rxf040, size 32, enabled
    cap 01[50] = powerspec 3 supports D0 D3 current D0
none2@pci0:1:0:0: class=0x028000 card=0x882110ec chip=0x882110ec rev=0x00 hdr=0x00
    vendor = 'Realtek Semiconductor Co., Ltd.'
    device = 'RTL8821AE 802.11ac PCIe Wireless Network Adapter'
    class = network
    bar [10] = type I/O Port, range 32, base rxe000, size 256, enabled
    bar [18] = type Memory, range 64, base rx91300000, size 16384, enabled
    cap 01[40] = powerspec 3 supports D0 D1 D2 D3 current D0
    cap 05[50] = MSI supports 1 message, 64 bit
    cap 10[70] = PCI-Express 2 endpoint max data 128(128) RO link x1(x1) speed 2.5(2.5) ASPM disabled(L0s/L1)
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 1 corrected
    ecap 0003[140] = Serial 1 00e04cfffe872b01
    ecap 0018[150] = LTR 1
    ecap 001e[158] = unknown 1
re0@pci0:2:0:0: class=0x020000 card=0x81681849 chip=0x816810ec rev=0x11 hdr=0x00
    vendor = 'Realtek Semiconductor Co., Ltd.'
    device = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller'
    class = network
    subclass = ethernet
    bar [10] = type I/O Port, range 32, base rxd000, size 256, enabled
    bar [18] = type Memory, range 64, base rx91204000, size 4096, enabled
    bar [20] = type Prefetchable Memory, range 64, base rx91200000, size 16384, enabled
    cap 01[40] = powerspec 3 supports D0 D1 D2 D3 current D0
    cap 05[50] = MSI supports 1 message, 64 bit
    cap 10[70] = PCI-Express 2 endpoint IRQ 1 max data 128(128) RO link x1(x1) speed 2.5(2.5) ASPM disabled(L0s/L1)
    cap 11[b0] = MSI-X supports 4 messages, enabled Table in map 0x20[0x0], PBA in map 0x20[0x800]
    cap 03[d0] = VPD
    ecap 0001[100] = AER 1 0 fatal 0 non-fatal 0 corrected
    ecap 0002[140] = VC 1 max VC0
    ecap 0003[160] = Serial 1 01000000684ce000
    ecap 0018[170] = LTR 1
@Nick, is this still reproducible in the latest release?
It's updated to 10.3-RELEASE-p5 and the issue is still there.
We're seeing this on one of our gateways in the FreeBSD cluster at Bytemark. The interface will not come back up unless the machine is rebooted.

FreeBSD igw0.bme.freebsd.org 11.0-ALPHA6 FreeBSD 11.0-ALPHA6 #0 r302331: Sun Jul 3 23:03:04 UTC 2016 peter@build-11.freebsd.org:/usr/obj/usr/src/sys/CLUSTER11 amd64

re0@pci0:3:0:0: class=0x020000 card=0x85051043 chip=0x816810ec rev=0x09 hdr=0x00
    vendor = 'Realtek Semiconductor Co., Ltd.'
    device = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller'
    class = network
    subclass = ethernet
    bar [10] = type I/O Port, range 32, base rxe800, size 256, enabled
    bar [18] = type Prefetchable Memory, range 64, base rxfdfff000, size 4096, enabled
    bar [20] = type Prefetchable Memory, range 64, base rxfdff8000, size 16384, enabled
    cap 01[40] = powerspec 3 supports D0 D1 D2 D3 current D0
    cap 05[50] = MSI supports 1 message, 64 bit
    cap 10[70] = PCI-Express 2 endpoint MSI 1 max data 128(128) RO link x1(x1) speed 2.5(2.5) ASPM disabled(L0s/L1)
    cap 11[b0] = MSI-X supports 4 messages, enabled Table in map 0x20[0x0], PBA in map 0x20[0x800]
    cap 03[d0] = VPD
    ecap 0001[100] = AER 1 0 fatal 0 non-fatal 1 corrected
    ecap 0002[140] = VC 1 max VC0
    ecap 0003[160] = Serial 1 0000000000000000

re0: <RealTek 8168/8111 B/C/CP/D/DP/E/F/G PCIe Gigabit Ethernet> port 0xe800-0xe8ff mem 0xfdfff000-0xfdffffff,0xfdff8000-0xfdffbfff irq 18 at device 0.0 on pci3
re0: Using 1 MSI-X message
re0: turning off MSI enable bit.
re0: Chip rev. 0x48000000
re0: MAC rev. 0x00000000
miibus0: <MII bus> on re0
rgephy0: <RTL8169S/8110S/8211 1000BASE-T media interface> PHY 1 on miibus0
rgephy0: none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX, 100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT-FDX, 1000baseT-FDX-master, 1000baseT-FDX-flow, 1000baseT-FDX-flow-master, auto, auto-flow
re0: Using defaults for TSO: 65518/35/2048
re0: Ethernet address: 08:60:6e:d7:31:d2
(In reply to Sean Bruno from comment #5) This machine does *not* run powerd, as it's a gateway host.
I'll chime in. After upgrading one of my hosts to 11-STABLE I started getting these timeouts. The host is a home router/storage box, and I'm using pf and a bridge on that interface. In my case the error is sometimes recoverable: I see a network glitch and transfers stop, but after a couple of seconds packets are flowing again. This can happen a few times in a 10-minute period before the network goes down completely.
(In reply to c.kworr from comment #7) Does this happen without pf being used?
(In reply to Sean Bruno from comment #8) Never checked without pf… I have some LOR regarding pf but hadn't thought pf could be the one to blame. To check without pf I need to rewrite all firewall rules for that one. I'm not using it directly, pf is configured for bridge that contains this interface. I'll see how I can move it out of configuration.
Just a "me too" here... The box is running 10.3p5 as a router/server. The internal interface (re0) got blocked (luckily just once so far). The outside interface (re1) was working, so I could log in remotely and reboot; ifconfig re0 down/up would not help. powerd is running in its default config (i.e. just powerd_enable="YES" in /etc/rc.conf). PF is not running, but IPFW is.

# pciconf -lv
...
re0@pci0:2:0:0: class=0x020000 card=0x78171462 chip=0x816810ec rev=0x0c hdr=0x00
    vendor = 'Realtek Semiconductor Co., Ltd.'
    device = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller'
    class = network
    subclass = ethernet
re1@pci0:3:0:0: class=0x020000 card=0x34687470 chip=0x816810ec rev=0x06 hdr=0x00
    vendor = 'Realtek Semiconductor Co., Ltd.'
    device = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller'
    class = network
    subclass = ethernet
(In reply to Sean Bruno from comment #8) Actually, you may be right. During boot I always get this LOR:

Sep 7 10:08:26 limbo kernel: lock order reversal:
Sep 7 10:08:26 limbo kernel: 1st 0xffffffff818f5b80 pf rulesets (pf rulesets) @ /usr/src/sys/modules/pf/../../netpfil/pf/pf.c:5879
Sep 7 10:08:26 limbo kernel: 2nd 0xffffffff80d4a6c8 pcbinfohash (pcbinfohash) @ /usr/src/sys/netinet/in_pcb.c:1957
Sep 7 10:08:26 limbo kernel: stack backtrace:
Sep 7 10:08:26 limbo kernel: #0 0xffffffff8040af80 at witness_debugger+0x70
Sep 7 10:08:26 limbo kernel: #1 0xffffffff8040ae74 at witness_checkorder+0xe54
Sep 7 10:08:26 limbo kernel: #2 0xffffffff803a8647 at __rw_rlock+0xa7
Sep 7 10:08:26 limbo kernel: #3 0xffffffff804d369f at in_pcblookup_hash+0x3f
Sep 7 10:08:26 limbo kernel: #4 0xffffffff818c75e5 at pf_socket_lookup+0xe5
Sep 7 10:08:26 limbo kernel: #5 0xffffffff818cdbc7 at pf_test_rule+0x1817
Sep 7 10:08:26 limbo kernel: #6 0xffffffff818c9254 at pf_test+0x18f4
Sep 7 10:08:26 limbo kernel: #7 0xffffffff818dc5dd at pf_check_out+0x1d
Sep 7 10:08:26 limbo kernel: #8 0xffffffff804b890b at pfil_run_hooks+0x8b
Sep 7 10:08:26 limbo kernel: #9 0xffffffff804d5bc5 at ip_tryforward+0x295
Sep 7 10:08:26 limbo kernel: #10 0xffffffff804d818f at ip_input+0x35f
Sep 7 10:08:26 limbo kernel: #11 0xffffffff804b77c0 at netisr_dispatch_src+0x80
Sep 7 10:08:26 limbo kernel: #12 0xffffffff804a2fea at ether_demux+0x14a
Sep 7 10:08:26 limbo kernel: #13 0xffffffff804a3de0 at ether_nh_input+0x340
Sep 7 10:08:26 limbo kernel: #14 0xffffffff804b77c0 at netisr_dispatch_src+0x80
Sep 7 10:08:26 limbo kernel: #15 0xffffffff804a3352 at ether_input+0x62
Sep 7 10:08:26 limbo kernel: #16 0xffffffff817d9f75 at re_rxeof+0x5c5
Sep 7 10:08:26 limbo kernel: #17 0xffffffff817d75ba at re_intr_msi+0xca

re0@pci0:2:0:0: class=0x020000 card=0x4b101186 chip=0x43001186 rev=0x06 hdr=0x00
    vendor = 'D-Link System Inc'
    device = 'DGE-528T Gigabit Ethernet Adapter'
    class = network
    subclass = ethernet
re1@pci0:3:0:0: class=0x020000 card=0x230e1565 chip=0x816810ec rev=0x07 hdr=0x00
    vendor = 'Realtek Semiconductor Co., Ltd.'
    device = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller'
    class = network
    subclass = ethernet

Only re1 fails.
I disabled powerd, but the problem showed up again. It only happened twice in some months, but it's still a critical problem for us.
Greetings! This issue seems not to be specific to FreeBSD 10: some other users (including me) had a discussion on the FreeBSD Forums about this annoying problem. Please see https://forums.freebsd.org/threads/55861/ for more details. In short: it seems that the built-in driver for re NICs has a bug. The workaround is to take the latest driver version from Realtek and compile and use it as an external module. I hope this helps.
(In reply to Marc Mach from comment #13) If someone could identify what is in the Realtek driver that is not in the FreeBSD base driver, I'm willing to commit it. The diffs are kind of ridiculous and may take a lot of sleuthing to figure out.
Hello. Just to say that I have a new box which is showing this behaviour: re0 (on the motherboard) locks up, while re1 (PCI-X card) keeps working.

# pciconf -lv
...
re0@pci0:1:0:0: class=0x020000 card=0x79821462 chip=0x816810ec rev=0x15 hdr=0x00
    vendor = 'Realtek Semiconductor Co., Ltd.'
    device = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller'
    class = network
    subclass = ethernet
re1@pci0:2:0:0: class=0x020000 card=0x34687470 chip=0x816810ec rev=0x06 hdr=0x00
    vendor = 'Realtek Semiconductor Co., Ltd.'
    device = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller'
    class = network
    subclass = ethernet

Those are the same cards as in my previous comment, but the box is a different one.
The problem still exists in version 11.1. Under low load "re" works fine, but heavy bidirectional traffic hangs the network completely. https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=166724 Is there a chance to revive the topic? zjk
Hosted environment with FreeBSD 11.1-RELEASE, dual-stack:

re0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=82088<VLAN_MTU,VLAN_HWCSUM,WOL_MAGIC,LINKSTATE>
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
    media: Ethernet autoselect (1000baseT <full-duplex,master>)
    status: active

Under load above 64 Mbit/s full duplex I see the problem:

kernel: re0: watchdog timeout
kernel: re0: link state changed to DOWN
kernel: re0: link state changed to UP

After this happens a few times, the card goes offline until reboot. Swapping the hardware did not help.

hardware1:
re0: <RealTek 8168/8111 B/C/CP/D/DP/E/F/G PCIe Gigabit Ethernet> port 0xe000-0xe0ff mem 0xf0004000-0xf0004fff,0xf0000000-0xf0003fff irq 17 at device 0.0 on pci2
re0: MSI count : 1
re0: MSI-X count : 4
re0: attempting to allocate 1 MSI-X vectors (4 supported)
re0: using IRQ 265 for MSI-X
re0: Using 1 MSI-X message
re0: turning off MSI enable bit.
re0: Chip rev. 0x2c800000
re0: MAC rev. 0x00100000
miibus0: <MII bus> on re0
re0: Using defaults for TSO: 65518/35/2048
re0: bpf attached

hardware2:
re0: <RealTek 8168/8111 B/C/CP/D/DP/E/F/G PCIe Gigabit Ethernet> port 0xe000-0xe0ff mem 0xf7c00000-0xf7c00fff,0xf0000000-0xf0003fff irq 16 at device 0.0 on pci1
re0: MSI count : 1
re0: MSI-X count : 4
re0: attempting to allocate 1 MSI-X vectors (4 supported)
re0: using IRQ 266 for MSI-X
re0: Using 1 MSI-X message
re0: Chip rev. 0x4c000000
re0: MAC rev. 0x00000000
miibus0: <MII bus> on re0
re0: Using defaults for TSO: 65518/35/2048
re0: bpf attached
re0: Ethernet address: 44:8a:5b:d4:49:6d
re0: netmap queues/slots: TX 1/256, RX 1/256
random: harvesting attach, 8 bytes (4 bits) from re0
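To compare reports like the one above, it can help to count how often the watchdog fired for a given interface. The following is a minimal sketch, assuming the kernel messages have the form shown in this comment ("re0: watchdog timeout"); the function name count_watchdog is made up for illustration.

```sh
#!/bin/sh
# Count "watchdog timeout" kernel messages for one interface.
# Log text is read from stdin; $1 is the interface name (e.g. re0).
# count_watchdog is a hypothetical helper, not part of FreeBSD.
count_watchdog() {
    # -F: match the string literally, -c: print the number of matching lines.
    # grep exits non-zero when nothing matches, so "|| true" keeps a
    # count of 0 from being treated as a failure.
    grep -cF -- "$1: watchdog timeout" || true
}

# Example (the log path is an assumption; adjust to your syslog setup):
#   count_watchdog re0 < /var/log/messages
```

Counting per interface makes it easier to tell whether a change (driver version, tunables, load) actually reduced the number of resets, rather than relying on impressions.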
(In reply to Dirk Meyer from comment #17) pciconf -v -l

hardware1:
re0@pci0:2:0:0: class=0x020000 card=0x78161462 chip=0x816810ec rev=0x06 hdr=0x00
    vendor = 'Realtek Semiconductor Co., Ltd.'
    device = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller'
    class = network
    subclass = ethernet

hardware2:
re0@pci0:2:0:0: class=0x020000 card=0x78231462 chip=0x816810ec rev=0x0c hdr=0x00
    vendor = 'Realtek Semiconductor Co., Ltd.'
    device = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller'
    class = network
    subclass = ethernet
(In reply to Sean Bruno from comment #14) I don't know if there is something in the Realtek driver that isn't in ours, or maybe the opposite. For instance the watchdog functionality is commented out in the Realtek driver.
I did some additional testing a while back. And it seems on this setup powerD does nothing to save power. With it on or off the computer still consumed 8 watts. And I guess at that level it's pretty efficient.
The use of powerd can make the issue more evident / easier to reproduce, but it's not the root cause.
I managed to solve this issue by disabling MSI and MSI-X. Put the following lines into /boot/loader.conf:

    hw.re.msi_disable="1"
    hw.re.msix_disable="1"

You see, MSI/MSI-X interrupt processing supposedly eliminates the need to perform an extra read from a device register after receiving an interrupt that signals a finished DMA write. However, there is some kind of problem, either in the driver or in the chip itself, in the way these interrupts are handled. With MSI and MSI-X disabled, the driver switches to the older interrupt filter handler, and thus probably performs an extra read from some device register to wait for the DMA transfer to memory to be ready (according to Wikipedia, when using legacy interrupts this is the only way to ensure the DMA transfer wasn't buffered by the chipset etc.). So I would suggest that everybody watching this thread check whether disabling MSI and MSI-X helps on their system. It might not apply to all Realtek NICs, but on my machine this workaround is valid. PS: the performance is still horrible when transferring to and from the machine, but at least now it doesn't hang sporadically.
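Before rebooting to test this workaround, it may be worth confirming that both tunables are really present in the file. The sketch below is only an illustration: tunable_value is a hypothetical helper, and it assumes simple name="value" lines with no leading whitespace.

```sh
#!/bin/sh
# Print the last value assigned to a tunable in a loader.conf-style file.
# $1 = file, $2 = tunable name. Prints an empty line if it is not set.
# tunable_value is a made-up name used for illustration only.
tunable_value() {
    awk -F= -v name="$2" '
        $1 == name {            # exact match on the part before "="
            v = $2
            gsub(/"/, "", v)    # strip the surrounding double quotes
            val = v
        }
        END { print val }
    ' "$1"
}

# Example:
#   tunable_value /boot/loader.conf hw.re.msi_disable
#   tunable_value /boot/loader.conf hw.re.msix_disable
# After a reboot, kenv(1) should show what the loader actually passed
# to the kernel, e.g.: kenv hw.re.msi_disable
```

The exact string comparison means commented-out lines (starting with "#") are ignored, which avoids mistaking a disabled setting for an active one.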
Disabling MSI/MSI-X was proposed as a solution in the past. I've just tried it again to be sure: it helps, but the issue doesn't disappear completely. With it I can successfully run the Google (M-Lab) speed test, but I still get a watchdog timeout and network reset as soon as I start the Ookla speed test. Fully reproducible.
hw.re.msi_disable
hw.re.msix_disable

I tested this solution for a few days (it is already described elsewhere on the internet). It has no visible effect on my computers: the network still goes down very quickly. But maybe it depends on the network card chipset?

However, I highly recommend this analysis: https://forums.freebsd.org/threads/10-2-release-re0-watchdog-timeout.55306/#post-337045 It contains some extremely important remarks. One important tip: the timeouts may be the result of overloading the processor, so in general this would be a problem for low-performance processors. Or vice versa: the network card chipset is "computationally demanding" and needs the "programmatically extended" driver. This is probably because the built-in FreeBSD driver is so much "slimmer" than the full version from the Realtek website; it may be intended to run on less powerful processors. I cannot fully evaluate everything in that analysis, though.

"Watchdog timeout" messages also occur after the transmission has stopped: processor load drops to a few percent, but the messages still appear every few seconds. In general, a reset is needed to restore normal operation of the interface.

As a workaround, you can "patch" around the problem: instead of, for example, limiting the link speed to 100 Mb, you can use dummynet for flow and bandwidth management. That is still not a fix for the driver itself.
After upgrading to 11.2-RELEASE the problem seems to have disappeared on my machine. Looking at dmesg, the only difference is that the following line is now missing at boot: re0: turning off MSI enable bit.
After upgrading several machines to 11.2 and all-night tests: nothing better, still a watchdog fault. zjk
I still see a few watchdog errors in the logs, but I'm unable to trigger them deliberately, even with very high traffic. Before, a single speed test was enough to drop the connection; now I can saturate the link without a watchdog timeout. The connection is quite stable now. The issue is likely not solved, but it's much harder to trigger in my scenario.
The following configuration is very promising:
- kernel 11.2-RELEASE, recompiled together with
- re driver v. 1.93 (from the Realtek site).

Effect:
- NO (absolutely none) watchdog timeouts,
- FULL speed in both directions (I will still test different situations),
- works well with lagg(!).

I am now compiling Realtek version 1.94 with 11.2-RELEASE; I will let you know the results. zjk
Of course you won't get the watchdog timeout error with the driver from the Realtek website: the watchdog has been commented out of its source code, so the missing message is not a real clue. That said, with 11.0 and 11.1 I've always used version 1.93 without issues.
A. After longer tests I must retract the previous optimistic news. We are talking about 11.2-RELEASE + the 1.93 Realtek driver:

1. Stalls (the computer stops) still occur. They are only shorter, though still a nuisance. Generally the interface works quickly at first; after some time it slows down and shows signs of loss.

2. There are still messages about the interface being suspended. Because I use lagg, they look like this:
+ [20445] re1: Interface stopped DISTRIBUTING, possible flapping
+ [48114] re0: Interface stopped DISTRIBUTING, possible flapping

B. Regarding Alex's statements: this is a real problem. Of course, the "watchdog timeout" message itself is not harmful. What matters is that, in the function, the message is followed by a reset and re-initialisation of the interface, which unfortunately results in the loss or partial corruption of transmitted files/frames (which I have unfortunately experienced many times). The improvement with versions 1.93-1.94 is therefore not only that the message disappears (it is commented out of the function, as Alex correctly writes), but that files are not damaged during transmission (yet to be checked!). 11.2-RELEASE certainly generates hundreds of "watchdog timeout" messages for me, but today I do not know whether it prevents damage to or loss of transmitted data (to be checked). I see:

    /* Cancel pending I/O and free all RX/TX buffers. */
    re_stop(sc);
    /* Put controller into known state. */
    re_reset(sc);

This means the transmitted data is dropped and lost.

C. However, I do not agree with Alex that this is good. It may be good enough for a laptop, but it is too little for a server. It is still terrible.

D. Test of 11.2 + 1.94: I have not started yet.
Hi, starting with version 12.0 I am facing this issue on my mini-ITX server. Hardware info:

re0@pci0:3:0:0: class=0x020000 card=0xe0001458 chip=0x816810ec rev=0x0c hdr=0x00
    vendor = 'Realtek Semiconductor Co., Ltd.'
    device = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller'
    class = network
    subclass = ethernet
re1@pci0:4:0:0: class=0x020000 card=0xe0001458 chip=0x816810ec rev=0x0c hdr=0x00
    vendor = 'Realtek Semiconductor Co., Ltd.'
    device = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller'
    class = network
    subclass = ethernet

I solved it by commenting out re_txeof in the re_tick function in if_re.c. In my opinion it's a timing issue in high-load situations that blocks the interrupt/DMA of the device. And of course the job is done by the watchdog function after 5 timer ticks.

static void
re_tick(void *xsc)
{
    struct rl_softc *sc;
    struct mii_data *mii;

    sc = xsc;

    RL_LOCK_ASSERT(sc);

    mii = device_get_softc(sc->rl_miibus);
    mii_tick(mii);
    if ((sc->rl_flags & RL_FLAG_LINK) == 0)
        re_miibus_statchg(sc->rl_dev);
    /*
     * Reclaim transmitted frames here. Technically it is not
     * necessary to do here but it ensures periodic reclamation
     * regardless of Tx completion interrupt which seems to be
     * lost on PCIe based controllers under certain situations.
     */
    // re_txeof(sc);
    re_watchdog(sc);
    callout_reset(&sc->rl_stat_callout, hz, re_tick, sc);
}
FWIW, we ran into this problem when we opted to become a public CPAN mirror (perl.org), which required adding 2 more ports and delegating 2 additional addresses. We used a 2-port RealTek adapter. About an hour after going live, the watchdog timeouts began spamming the logs. The solution was to bump 2 entries in sysctl.conf(5): kern.ipc.nmbjumbop (245550 by default) and kern.ipc.nmbclusters (491100 by default), as in FreeBSD 11 these numbers are too small, at least for these NICs. How high you bump them depends upon the load and traffic on your NICs. As a rule of thumb I would suggest bumping them up by a quarter of their original values until the watchdog shuts up, all the while assessing any performance changes. We're now on 12, and moving to 13 shortly. 12 did NOT exhibit this problem, because the numbers are much higher by default. HTH --Chris
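The rule of thumb above (raise each value by a quarter per step) can be sketched as a small shell helper. This is only an illustration under stated assumptions: bump_quarter is a made-up name, and the example starts from whatever value the running system reports rather than a hard-coded default, since defaults can differ between machines.

```sh
#!/bin/sh
# Print a value raised by one quarter, using integer arithmetic
# (the quarter is rounded down). bump_quarter is a hypothetical helper.
bump_quarter() {
    echo $(( $1 + $1 / 4 ))
}

# Example (run as root; check the result under your own load first):
#   cur=$(sysctl -n kern.ipc.nmbclusters)
#   sysctl kern.ipc.nmbclusters="$(bump_quarter "$cur")"
# To keep a value across reboots, add it to /etc/sysctl.conf, e.g.:
#   kern.ipc.nmbclusters=613875
```

With the defaults quoted in the comment, one step gives 306937 for kern.ipc.nmbjumbop (from 245550) and 613875 for kern.ipc.nmbclusters (from 491100).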
(In reply to Chris Hutchinson from comment #33) Aren't those values' defaults proportional to system RAM? From a quick survey of my systems I see:

RAM    FreeBSD   kern.ipc.nmbjumbop   kern.ipc.nmbclusters
4GB    11.3      123975               247952
16GB   11.3      507532               1015064
32GB   11.3      1017580              2035160
32GB   12.1      1019729              2039460

I guess the last two don't actually differ because of the FreeBSD version, but because of a small difference in available memory. Also, I've seen this problem on the third system above, where the values are 4 times the default. So increasing them a little is no solution. I've not seen the problem on the last machine (with 12.1), but that might be because re0 is a WAN port on a slow link there.
(In reply to ml from comment #34) For *me*, doubling the values from their defaults fixed it. I only mentioned it as 1) disabling MSI-X (default solution) reduces performance -- even significantly 2) It worked. So felt it worth mentioning. :) --Chris
There seems to be a port for this: https://www.freshports.org/net/realtek-re-kmod/
Might be useful to increase awareness of the vendor driver by adding it to the man page: https://reviews.freebsd.org/D33677
Still present on 13.0-STABLE #0 stable/13-n248872-2c7441c86ef:

re1@pci0:5:0:0: class=0x020000 rev=0x15 hdr=0x00 vendor=0x10ec device=0x8168 subvendor=0x17aa subdevice=0x5094
    vendor = 'Realtek Semiconductor Co., Ltd.'
    device = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller'
    class = network
    subclass = ethernet
    bar [10] = type I/O Port, range 32, base rx2000, size 256, enabled
    bar [18] = type Memory, range 64, base rxfd504000, size 4096, enabled
    bar [20] = type Memory, range 64, base rxfd500000, size 16384, enabled
    cap 01[40] = powerspec 3 supports D0 D1 D2 D3 current D0
    cap 05[50] = MSI supports 1 message, 64 bit
    cap 10[70] = PCI-Express 2 endpoint MSI 1 max data 128(128) RO max read 4096 link x1(x1) speed 2.5(2.5) ASPM disabled(L0s/L1) ClockPM disabled
    cap 11[b0] = MSI-X supports 4 messages, enabled Table in map 0x20[0x0], PBA in map 0x20[0x800]
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 1 corrected
    ecap 0002[140] = VC 1 max VC0
    ecap 0003[160] = Serial 1 01000000684ce000
    ecap 0018[170] = LTR 1
    ecap 001e[178] = L1 PM Substates 1

When the interface was going through a watchdog up/down cycle about every 2 minutes, I was able to reset it into a good working state via:

    ifconfig re1 down ; sleep 10 ; ifconfig re1 up

Not sure if it's related, but for some period of time before the watchdog timeouts the NIC performance degrades to roughly half. This could be some disk or other bottleneck I haven't identified yet.
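The down/sleep/up sequence from this comment can be wrapped in a small function so it is easy to repeat or call from a monitoring script. This is a minimal sketch: bounce_if is a made-up name, and the IFCONFIG variable is an assumption added here purely so the command can be swapped out for a dry run.

```sh
#!/bin/sh
# Take an interface down, wait, and bring it back up.
# $1 = interface name, $2 = seconds to wait (default 10, as in the comment).
# bounce_if and the IFCONFIG override are hypothetical, for illustration.
bounce_if() {
    "${IFCONFIG:-ifconfig}" "$1" down &&
    sleep "${2:-10}" &&
    "${IFCONFIG:-ifconfig}" "$1" up
}

# Example (run as root):
#   bounce_if re1
# Dry run that only prints the arguments that would be passed to ifconfig:
#   IFCONFIG=echo; bounce_if re1 0
```

Other comments in this thread report that ifconfig down/up did not help on their hardware, so this is a recovery attempt rather than a fix.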
Running the default pfSense kernel (FreeBSD 12.2) with the built-in driver produces watchdog timeouts and an unresponsive firewall when bandwidth hits ~350Mbps. dmesg of the default re0:

re0: <RealTek 8168/8111 B/C/CP/D/DP/E/F/G PCIe Gigabit Ethernet> port 0xe000-0xe0ff mem 0xf0104000-0xf0104fff,0xf0100000-0xf0103fff irq 19 at device 0.0 on pci3
re0: Using 1 MSI-X message
re0: turning off MSI enable bit.
re0: ASPM disabled
re0: Chip rev. 0x2c800000
re0: MAC rev. 0x00100000

Installing the Realtek driver with pkg add realtek-re-kmod (1.96.04) and loading it via /boot/loader.conf.local:

    kern.vty=sc
    if_re_load="YES"
    if_re_name="/boot/modules/if_re.ko"

fixes the watchdog timer issue and allows stable transfers of up to ~800Mbps. For comparison, my Intel NICs (i210) run 960Mbps without issue on the same switch. Getting more out of the Realtek 8168 is proving to be a challenge. There still appears to be a driver issue with 1.96.04, but it's certainly better than what is included in the kernel. This guy may actually have a fix for it here: https://github.com/megabytefisher/if_re-mod
Marking as a duplicate of bug 166724; I will summarize reproduction cases, workarounds, and references there. *** This bug has been marked as a duplicate of bug 166724 ***