Bug 254147 - No buffer space available error on NIC Intel 10G X550T
Summary: No buffer space available error on NIC Intel 10G X550T
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 12.2-RELEASE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-net (Nobody)
Keywords: IntelNetworking
Reported: 2021-03-08 22:38 UTC by robert.ayrapetyan
Modified: 2021-10-04 18:06 UTC
Description robert.ayrapetyan 2021-03-08 22:38:43 UTC
Configuration:

Server: Dell R740XD
NIC:
vendor     = 'Intel Corporation'
device     = 'Ethernet Controller 10G X550T'
ix0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
media: Ethernet autoselect (1000baseT <full-duplex>)

Config overrides:
net.inet.tcp.hostcache.cachelimit="0"
machdep.hyperthreading_allowed="0"
net.inet.tcp.soreceive_stream="1"
net.isr.maxthreads="-1"
net.isr.bindthreads="1"
dev.ix.0.fc=0
dev.ix.0.iflib.rx_budget=65535
dev.ix.1.iflib.rx_budget=65535
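
To confirm that the overrides listed above are actually in effect on the running system, each one can be compared against the value reported by sysctl -n. The following is only a sketch: check_override is a hypothetical helper name, and the quoted entries are assumed to be boot-time tunables whose running value is still readable through sysctl. Note that net.isr.maxthreads set to -1 may read back as the number of CPUs rather than -1, so a mismatch on that line is not necessarily a problem.

```sh
#!/bin/sh
# Overrides exactly as listed in the report (quotes removed).
OVERRIDES='net.inet.tcp.hostcache.cachelimit=0
machdep.hyperthreading_allowed=0
net.inet.tcp.soreceive_stream=1
net.isr.maxthreads=-1
net.isr.bindthreads=1
dev.ix.0.fc=0
dev.ix.0.iflib.rx_budget=65535
dev.ix.1.iflib.rx_budget=65535'

# Compare one "name=value" pair against the running kernel.
# (check_override is a hypothetical helper, not an existing tool.)
check_override() {
    name=${1%%=*}
    want=${1#*=}
    # If the OID cannot be read, report it instead of aborting.
    have=$(sysctl -n "$name" 2>/dev/null) || have="(unavailable)"
    if [ "$have" = "$want" ]; then
        echo "ok       $name=$have"
    else
        echo "MISMATCH $name: want $want, have $have"
    fi
}

echo "$OVERRIDES" | while IFS= read -r kv; do
    check_override "$kv"
done
```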

After a certain amount of time (2-8 hours) under a constant network load of ~1 Gbit/s, the connection is lost and any ping or traceroute fails with a "No buffer space available" error.

netstat -m and netstat -s output looks normal; all values are well below their limits. No errors in syslog.

The DC crew replaced the hardware and cables; it didn't help.

ifconfig ix0 down && ifconfig ix0 up - resolves the issue immediately.
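
As a stopgap until the root cause is found, the down/up cycle above could be automated. The following is a minimal sketch, not a fix: has_enobufs and check_once are hypothetical helper names, TARGET is a placeholder address that should be replaced with a host that is normally reachable (for example the default gateway), and the match string is simply the error text quoted in this report.

```sh
#!/bin/sh
# Hypothetical watchdog sketch. IFACE and TARGET are placeholders.
IFACE="${IFACE:-ix0}"
TARGET="${TARGET:-192.0.2.1}"

# Succeeds when the given text contains the error message from this report.
has_enobufs() {
    printf '%s\n' "$1" | grep -q 'No buffer space available'
}

# Send one ping; if it fails with the ENOBUFS message, log it and
# cycle the interface. Returns 1 when a cycle was triggered.
check_once() {
    out=$(ping -c 1 "$TARGET" 2>&1)
    if has_enobufs "$out"; then
        logger -t ix-watchdog "ENOBUFS on $IFACE, cycling interface"
        ifconfig "$IFACE" down && ifconfig "$IFACE" up
        return 1
    fi
    return 0
}

# Example use (run as root), left commented out on purpose:
#   while :; do check_once; sleep 30; done
```

Cycling the interface automatically hides the failure, so it is worth capturing diagnostics before the cycle if the goal is to debug the driver.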

Another workaround that appears to have resolved the issue:
ifconfig ix0 -rxcsum -rxcsum6 -txcsum -txcsum6 -lro -tso -vlanhwtso
- no connection losses observed for > 24 hours after that.

Is there some command I should run the next time the issue is observed to help identify the root cause? Thanks.
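
One possible way to answer the question above is to snapshot interface, mbuf, interrupt, and driver state while the interface is still stuck, before cycling it. The following is a sketch under assumptions: the command selection is a suggestion rather than something the driver maintainers asked for in this thread, snap and collect are hypothetical helper names, and IFACE, UNIT, and OUT are placeholder defaults.

```sh
#!/bin/sh
# Sketch: capture state while "No buffer space available" is occurring.
# IFACE, UNIT and OUT are hypothetical defaults; adjust as needed.
IFACE="${IFACE:-ix0}"
UNIT="${UNIT:-0}"
OUT="${OUT:-/tmp/ix-enobufs-$(date +%Y%m%d-%H%M%S).log}"

# Run one command under a labelled header; skip tools that are missing.
snap() {
    echo "=== $* ==="
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1
    else
        echo "(skipped: $1 not found)"
    fi
    echo
}

collect() {
    snap ifconfig "$IFACE"        # link state and enabled offloads
    snap netstat -m               # mbuf and cluster usage
    snap netstat -idn             # per-interface errors and drops
    snap vmstat -i                # interrupt counts per queue
    snap vmstat -z                # UMA zone usage and failures
    snap sysctl "dev.ix.$UNIT"    # driver counters for this unit
}

collect > "$OUT"
echo "wrote $OUT"
```

Running it twice a few seconds apart and comparing the two files shows which counters are still moving while traffic is stalled.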
Comment 1 Eric Joyner freebsd_committer freebsd_triage 2021-05-20 17:53:14 UTC
Can you describe the sequence of events around the "loss of connection" more?

Does it happen with no messages, and then you get the "No buffer space available" error? Does that buffer space message appear even though ifconfig says the link is active?
Comment 2 Yif Swery 2021-08-23 18:04:13 UTC
We are having the exact same issue on our system running pfSense 2.5.2 (FreeBSD 12.2).

Our main WAN interface has a simple static IP (no VLANs, just a direct link), and at unpredictable intervals (it could be days) the WAN becomes completely unreachable, with the "No buffer space available" error.

for us:

ifconfig ix0 down && ifconfig ix0 up - resolves the issue immediately.


We have:

    vendor     = 'Intel Corporation'
    device     = 'Ethernet Connection X553 1GbE'
    driver     = 'Intel(R) PRO/10GbE PCI-Express Network Driver'


Has anyone else found the solution (or the bug) for this?
Comment 3 robert.ayrapetyan 2021-08-23 18:35:35 UTC
Just to follow up: I don't have this server anymore, but my last reply to the DC technicians after they had completely replaced the whole server was:

UPD: the issue happened again right after the server and cables were replaced, so it seems to be a NIC driver problem.
Putting:
-rxcsum -rxcsum6 -txcsum -txcsum6 -lro -tso -vlanhwtso
in the interface definition in rc.conf seems to have resolved the issue permanently.
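
For illustration, an interface definition carrying those flags might look like the fragment below. This is an assumed example: the address and netmask are placeholders (192.0.2.0/24 is a documentation range), and the original rc.conf line was not posted in this report.

```sh
# /etc/rc.conf fragment (hypothetical address; substitute the real one).
# The trailing flags disable checksum offload, LRO, TSO and VLAN hardware TSO.
ifconfig_ix0="inet 192.0.2.10 netmask 255.255.255.0 -rxcsum -rxcsum6 -txcsum -txcsum6 -lro -tso -vlanhwtso"
```

After the next boot (or a restart of the interface), the options line in the ifconfig ix0 output should no longer list the disabled capabilities.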

So Yif, try that approach or, better, move to another NIC.
Comment 4 Yif Swery 2021-08-23 18:40:27 UTC
Thanks for the quick reply Robert.

Will run that command on the interface and start looking at moving to a different NIC entirely.

I'm also seeing a bunch of online threads about the X500s running at far slower speeds on FreeBSD than in the Linux world. So it could really be a driver issue.

Thanks again
Comment 5 Kevin Bowling freebsd_committer freebsd_triage 2021-08-23 20:16:04 UTC
erj, is there a chance of the X553 problem being thermal?
Comment 6 Yif Swery 2021-08-23 21:16:46 UTC
Kevin, are you asking if things are overheating?

This is our system in production that has the onboard X553

$ ipmitool sensor
CPU Temp         | 54.000     | degrees C  | ok
System Temp      | 61.000     | degrees C  | ok
Peripheral Temp  | 39.000     | degrees C  | ok
DIMMA1 Temp      | 48.000     | degrees C  | ok
DIMMB1 Temp      | 50.000     | degrees C  | ok
Comment 7 Eric Joyner freebsd_committer freebsd_triage 2021-08-23 22:50:40 UTC
(In reply to Kevin Bowling from comment #5)

I don't think so; I think the thermal problem was specific to the X552 and the external 10GBASE-T PHY.
Comment 8 Kevin Bowling freebsd_committer freebsd_triage 2021-08-24 00:49:03 UTC
(In reply to Eric Joyner from comment #7)
Per the datasheets, the 10GBASE-T PHY is Intel's in both the X552 and X553. Both might use the Inphi CS4227 for an SFP+ cage. IIRC the issue you hinted at was for the CS4227, so that would rule out it being related to Yif's issue.
Comment 9 Krzysztof Galazka 2021-08-24 10:21:36 UTC
(In reply to Yif Swery from comment #2)

Could you please also provide the device ID? I think it is either 0x15E4 or 0x15E5, but just to be sure. We're trying to get a reproduction.

Although considering Robert's original report, it might be related to the speed (1000baseT), not to the particular HW. We will check that too.
Comment 10 Yif Swery 2021-08-24 14:17:44 UTC
(In reply to Krzysztof Galazka from comment #9)

This card info:

ix3@pci0:7:0:1:	class=0x020000 card=0x00008086 chip=0x15e48086 rev=0x11 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Ethernet Connection X553 1GbE'
    class      = network
    subclass   = ethernet
    cap 01[40] = powerspec 3  supports D0 D3  current D0
    cap 05[50] = MSI supports 1 message, 64 bit, vector masks
    cap 11[70] = MSI-X supports 64 messages, enabled
                 Table in map 0x20[0x0], PBA in map 0x20[0x2000]
    cap 10[a0] = PCI-Express 2 endpoint max data 128(128) FLR RO
                 max read 512
                 link x1(x1) speed 2.5(2.5) ASPM disabled(L0s/L1)
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 1 corrected
    ecap 0003[140] = Serial 1 0100c9ffff000000
    ecap 000e[150] = ARI 1
    ecap 0010[160] = SR-IOV 1 IOV disabled, Memory Space disabled, ARI disabled
                     0 VFs configured out of 64 supported
                     First VF RID Offset 0x0180, VF RID Stride 0x0002
                     VF Device ID 0x15c5
                     Page Sizes: 4096 (enabled), 8192, 65536, 262144, 1048576, 4194304
    ecap 000d[1b0] = ACS 1



The system it's on is a "SuperMicro SYS-5019A-FTN4 1U Server".
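
The device ID requested in comment 9 can be read directly from the chip= field above: the upper 16 bits are the device ID and the lower 16 bits are the vendor ID. A small sketch of that split follows (split_chip is a hypothetical helper name; output of this shape typically comes from pciconf -lvc).

```sh
#!/bin/sh
# Split a pciconf "chip=0xDDDDVVVV" value into device and vendor IDs.
# (split_chip is a hypothetical helper, not part of pciconf.)
split_chip() {
    hex=${1#0x}        # drop the 0x prefix        -> 15e48086
    dev=${hex%????}    # strip the last 4 digits   -> 15e4
    ven=${hex#????}    # strip the first 4 digits  -> 8086
    printf 'device=0x%s vendor=0x%s\n' "$dev" "$ven"
}

split_chip 0x15e48086   # prints: device=0x15e4 vendor=0x8086
```

So this card is device 0x15e4 from vendor 0x8086 (Intel), one of the two IDs suggested in comment 9.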
Comment 11 Piotr Pietruszewski 2021-08-31 12:31:40 UTC
What kind of traffic is used when the problem occurs? Is it TCP, UDP or both? Does it pass in a particular direction (RX- or TX-only) or is it bidirectional?
Comment 12 Yif Swery 2021-08-31 17:17:11 UTC
(In reply to Piotr Pietruszewski from comment #11)

It was on our production system (using pfSense 2.5.2). We have both TCP and UDP traffic running bidirectionally, with around 300 Mbps going through the box on average.

The issue happened at very inconsistent times, and I could not pinpoint any anomaly on the network that could be directly related (I could not see any spikes in a particular type of traffic before the interface stopped passing traffic).
Comment 13 Kevin Bowling freebsd_committer freebsd_triage 2021-10-04 18:06:19 UTC
I wonder if this could be https://cgit.freebsd.org/src/commit/?h=stable/12&id=a5a91bd2a09ffe748b6dd7d5f996fe725c22c774, which I brought back to stable/12 so hopefully pfsense picks it up.