Configuration:

Server: Dell R740XD

NIC:
    vendor = 'Intel Corporation'
    device = 'Ethernet Controller 10G X550T'

ix0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        media: Ethernet autoselect (1000baseT <full-duplex>)

Config overrides:
net.inet.tcp.hostcache.cachelimit="0"
machdep.hyperthreading_allowed="0"
net.inet.tcp.soreceive_stream="1"
net.isr.maxthreads="-1"
net.isr.bindthreads="1"
dev.ix.0.fc=0
dev.ix.0.iflib.rx_budget=65535
dev.ix.1.iflib.rx_budget=65535

After a certain amount of time (2-8 hours) under a constant network load of ~1 Gbit/s, we observe a loss of connection, and any ping/traceroute ends with a "No buffer space available" error. netstat -m / netstat -s output looks normal; all values are well within limits. No errors in syslog. The DC crew replaced hardware/cables - didn't help.

ifconfig ix0 down && ifconfig ix0 up - resolves the issue immediately.

Another workaround which appears to have resolved the issue:
ifconfig ix0 -rxcsum -rxcsum6 -txcsum -txcsum6 -lro -tso -vlanhwtso
No connection losses observed for > 24 hours after that.

Is there some command I should run the next time the issue is observed, to help identify the root cause? Thanks.
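A minimal diagnostic sketch for the next occurrence, using standard FreeBSD tools (the interface name ix0 is taken from the report above; exact sysctl counter names vary between driver/iflib versions, so the grep patterns are deliberately loose):

```shell
# Snapshot mbuf/cluster usage at the moment of failure
# ("denied" or "delayed" counts here would point at mbuf exhaustion)
netstat -m
vmstat -z | egrep -i 'mbuf|cluster'

# Interface-level input/output drop counters
netstat -idn -I ix0

# Per-driver counters; look for anything incrementing under drop/err/discard
sysctl dev.ix.0 | egrep -i 'drop|err|discard'
```

Capturing the same output once while the link is healthy gives a baseline to diff against.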
Can you describe the sequence of events around the "loss of connection" more? Does it happen with no messages, and then you get the "No buffer space available error"? Does that buffer space message happen even though ifconfig says link is active?
We are having the exact same issue on our system on pfSense 2.5.2 (FreeBSD 12.2). Our main WAN interface is a simple static IP (no VLANs, just a direct link), and every X amount of time (could be days) our WAN becomes fully unreachable ("No buffer space available" error).

For us, ifconfig ix0 down && ifconfig ix0 up - resolves the issue immediately.

Our hardware:
vendor = 'Intel Corporation'
device = 'Ethernet Connection X553 1GbE'
driver = 'Intel(R) PRO/10GbE PCI-Express Network Driver'

Has anyone else found the solution (or the bug?) for this?
Just to follow up: I don't have this server anymore, but my last reply to the DC technicians, after they had completely replaced the whole server, was:

UPD: the issue happened again right after the server/cables were replaced, so it seems to be a NIC driver problem. Putting
-rxcsum -rxcsum6 -txcsum -txcsum6 -lro -tso -vlanhwtso
into the interface definition in rc.conf seems to have resolved the issue permanently.

So Yif, try that approach, or better, move to another NIC.
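For reference, a sketch of what that rc.conf interface definition could look like (the address is a placeholder from the documentation range; adapt it and the interface name to your own setup, and check the flag names against ifconfig(8) on your release):

```shell
# /etc/rc.conf -- persist the offload-disabling workaround across reboots
# 192.0.2.10/24 is a placeholder address, not from the original report
ifconfig_ix0="inet 192.0.2.10/24 -rxcsum -rxcsum6 -txcsum -txcsum6 -lro -tso -vlanhwtso"
```

Applying the flags via rc.conf rather than a one-off ifconfig invocation means the workaround survives reboots and interface re-initialization at boot.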
Thanks for the quick reply, Robert. I'll run that command on the interface and start looking at moving to a different NIC entirely. I'm also seeing a bunch of online threads about the X500s running at far slower speeds on FreeBSD than in the Linux world, so it could really be a driver issue. Thanks again.
erj, is there a chance of the X553 problem being thermal?
Kevin, are you asking if things are overheating? This is our system in production that has the onboard X553:

$ ipmitool sensor
CPU Temp        | 54.000 | degrees C | ok
System Temp     | 61.000 | degrees C | ok
Peripheral Temp | 39.000 | degrees C | ok
DIMMA1 Temp     | 48.000 | degrees C | ok
DIMMB1 Temp     | 50.000 | degrees C | ok
(In reply to Kevin Bowling from comment #5) I don't think so, I think the thermal problem was specifically just the X552 and the external 10Gbase-T PHY.
(In reply to Eric Joyner from comment #7)
Per the datasheets, the 10GBASE-T PHY is Intel's in both the X552 and X553. Both might use the Inphi CS4227 for an SFP+ cage. IIRC the issue you hinted at was specific to the CS4227, so that would rule it out as being related to Yif's issue.
(In reply to Yif Swery from comment #2)
Could you please also provide the device ID? I think it is either 0x15E4 or 0x15E5, but just to be sure. We're trying to get a reproduction. Although, considering Robert's original report, it might be related to the speed (1000baseT) rather than to the particular HW. We will check that too.
(In reply to Krzysztof Galazka from comment #9)
This card info:

ix3@pci0:7:0:1: class=0x020000 card=0x00008086 chip=0x15e48086 rev=0x11 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Ethernet Connection X553 1GbE'
    class      = network
    subclass   = ethernet
    cap 01[40] = powerspec 3  supports D0 D3  current D0
    cap 05[50] = MSI supports 1 message, 64 bit, vector masks
    cap 11[70] = MSI-X supports 64 messages, enabled
                 Table in map 0x20[0x0], PBA in map 0x20[0x2000]
    cap 10[a0] = PCI-Express 2 endpoint max data 128(128) FLR RO
                 max read 512
                 link x1(x1) speed 2.5(2.5) ASPM disabled(L0s/L1)
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 1 corrected
    ecap 0003[140] = Serial 1 0100c9ffff000000
    ecap 000e[150] = ARI 1
    ecap 0010[160] = SR-IOV 1 IOV disabled, Memory Space disabled, ARI disabled
                     0 VFs configured out of 64 supported
                     First VF RID Offset 0x0180, VF RID Stride 0x0002
                     VF Device ID 0x15c5
                     Page Sizes: 4096 (enabled), 8192, 65536, 262144, 1048576, 4194304
    ecap 000d[1b0] = ACS 1

The system it's on is a "SuperMicro SYS-5019A-FTN4 1U Server".
What kind of traffic is used when the problem occurs? Is it TCP, UDP or both? Does it pass in a particular direction (RX- or TX-only) or is it bidirectional?
(In reply to Piotr Pietruszewski from comment #11)
It was on our production system (running pfSense 2.5.2); we have both TCP and UDP running bidirectionally, with around 300 Mbps of traffic going through the box on average. The issue happened at very inconsistent times, and I could not pinpoint any anomaly on the network that could be directly related (no spikes in any particular type of traffic before the interface stopped pushing traffic).
I wonder if this could be https://cgit.freebsd.org/src/commit/?h=stable/12&id=a5a91bd2a09ffe748b6dd7d5f996fe725c22c774, which I brought back to stable/12, so hopefully pfSense picks it up.