Bug 270285 - Network issue with very small frames (tcp, padded)
Summary: Network issue with very small frames (tcp, padded)
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 12.3-STABLE
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: freebsd-net (Nobody)
Reported: 2023-03-17 12:31 UTC by Marcus Haarmann
Modified: 2023-03-17 17:04 UTC
Description Marcus Haarmann 2023-03-17 12:31:37 UTC
Hi experts,

we are using FreeBSD under the hood of a pfSense firewall (latest CE release, which builds on 12.3-STABLE).

Recently, we activated a bunch of servers in a new, very fast (virtualized) environment, which is connected to the internal network with 10GBit. The pfSense firewall has 1GBit NICs on the LAN and WAN side and is a physical machine.
We have now encountered an issue which probably results from an overload situation or an MTU issue (although all MTU values are 1500 everywhere):
When transmitting a medium-sized file over HTTP (Apache HTTP server running on Ubuntu on the server side), the traffic sometimes, but not always, contains a very small frame which corrupts the output.

We can see in the packet capture taken on the LAN side of the firewall that normally all data frames are 1514 bytes, but all of a sudden, a 60 byte frame (marked as PSH,ACK) arrives which contains a very small TCP segment carrying only 4 bytes of data. This frame is padded with two 0 bytes at the end, AFTER the payload.

Extract from wireshark:
Frame 136159: 60 bytes on wire (480 bits), 60 bytes captured (480 bits)
Ethernet II, Src: e2:84:72:d3:14:5c (e2:84:72:d3:14:5c), Dst: IETF-VRRP-VRID_04 (00:00:5e:00:01:04)
    Destination: IETF-VRRP-VRID_04 (00:00:5e:00:01:04)
    Source: e2:84:72:d3:14:5c (e2:84:72:d3:14:5c)
    Type: IPv4 (0x0800)
    Padding: 0000

Internet Protocol Version 4, Src: 192.168.25.41, Dst: 80.xxx.xxx.xxx
    0100 .... = Version: 4
    .... 0101 = Header Length: 20 bytes (5)
    Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT)
    Total Length: 44
    Identification: 0x594c (22860)
    Flags: 0x40, Don't fragment
    ...0 0000 0000 0000 = Fragment Offset: 0
    Time to Live: 64
    Protocol: TCP (6)
    Header Checksum: 0x37c2 [validation disabled]
    [Header checksum status: Unverified]
    Source Address: 192.168.25.41
    Destination Address: 80.xx.xx.xx
Transmission Control Protocol, Src Port: 80, Dst Port: 37710, Seq: 553466, Ack: 188, Len: 4
    Source Port: 80
    Destination Port: 37710
    [Stream index: 77]
    [Conversation completeness: Complete, WITH_DATA (47)]
    [TCP Segment Len: 4]
    Sequence Number: 553466    (relative sequence number)
    Sequence Number (raw): 2934325866
    [Next Sequence Number: 553470    (relative sequence number)]
    Acknowledgment Number: 188    (relative ack number)
    Acknowledgment number (raw): 2019009064
    0101 .... = Header Length: 20 bytes (5)
    Flags: 0x018 (PSH, ACK)
    Window: 42153
    [Calculated window size: 42153]
    [Window size scaling factor: -2 (no window scaling used)]
    Checksum: 0xc1c8 [unverified]
    [Checksum Status: Unverified]
    Urgent Pointer: 0
    [Timestamps]
    [SEQ/ACK analysis]
    TCP payload (4 bytes)
    [Reassembled PDU in frame: 137331]
    TCP segment data (4 bytes)

0000   00 00 5e 00 01 04 e2 84 72 d3 14 5c 08 00 45 00
0010   00 2c 59 4c 40 00 40 06 37 c2 c0 a8 19 29 50 XX
0020   XX XX 00 50 93 4e ae e6 42 6a 78 57 a2 28 50 18
0030   a4 a9 c1 c8 00 00 ee 86 11 a2 00 00

The real payload is only 4 bytes (0xee, 0x86, 0x11, 0xa2). After that, two bytes are appended (a sort of padding, resulting in a total of 60 bytes).
We do not know why this padding occurs, but it is copied into the forwarded frame, leading to corrupted output.


Any idea how to solve this is welcome. The padding seems to be the underlying reason of the data congestion, but we do not know whether there is a way to prevent the padding on the sending side, or whether there is simply a bug in the TCP code of FreeBSD which forwards these padding bytes as content.
Traffic between the internal machines is not affected for whatever reason (mostly Linux). 

The congestion issues do not occur on each transfer, but only sometimes. 
Most of the time, those small frames are not sent for whatever reason.
I found in the cap file that there have been TCP window full / TCP window update  messages, but these are 
We have tested always with the same file, but the frame sizes vary on each transfer.
Padding seems to be a well-known process in order to make small packets have the minimum length of 64 bytes, as I have read in some articles.
So the tcp code should be able to handle this from my point of view.
I can of course provide the .cap file with the defect flow if that makes sense.
But my C knowledge is very limited, so I am not able to debug the kernel (and as always, this happens in production, is not reproducible in testing, and we do not have an identical network setup for testing).

Best regards,

Marcus
Comment 1 Michael Tuexen freebsd_committer freebsd_triage 2023-03-17 12:58:57 UTC
The two bytes after the IPv4 packet are padding. It is required, since an Ethernet frame must be at least 64 bytes long. The last 4 bytes are the frame check sequence, so there must be 60 bytes before it. The IPv4 packet is 44 bytes; adding 6 bytes source, 6 bytes destination, and 2 bytes upper layer information makes 58 bytes. Therefore 2 bytes are appended.
See https://en.wikipedia.org/wiki/Ethernet_frame
So the packet looks correct to me.
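
The arithmetic above can be written out as a small Python sketch. The constant and function names are illustrative only and are not taken from any kernel source; the numbers are the ones given in this comment (64 byte minimum frame, 4 byte frame check sequence, 14 byte Ethernet header).

```python
# Ethernet II sizes in bytes, as described in the comment above.
ETH_HEADER_LEN = 14     # 6 dst MAC + 6 src MAC + 2 EtherType
ETH_FCS_LEN = 4         # frame check sequence (not shown in the capture)
ETH_MIN_FRAME_LEN = 64  # minimum frame size including the FCS


def ethernet_padding(ip_total_length: int) -> int:
    """Return the number of padding bytes appended after the IP packet."""
    min_without_fcs = ETH_MIN_FRAME_LEN - ETH_FCS_LEN  # 60 bytes
    unpadded = ETH_HEADER_LEN + ip_total_length
    return max(0, min_without_fcs - unpadded)


print(ethernet_padding(44))    # IP total length from the capture
print(ethernet_padding(1500))  # a full-size packet needs no padding
```

For the reported frame (IP total length 44) this yields 2 padding bytes, matching the `Padding: 0000` field shown by Wireshark.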
Comment 2 Marcus Haarmann 2023-03-17 13:18:37 UTC
Yes, the packet also looks OK to me. The question is why the traffic forwarded to the client includes these two 0 bytes in the middle of the payload.
(pfsense/freebsd reorders the traffic and as a result, we are getting different
frame sizes in output).

So some part of the code does not respect the actual length but seems to read the whole segment starting from the payload.
The whole setup is:
Server (10GBit) 
-> Switch1 
-> Switch2 
-> pfSense LAN (GBit) <--- here we can see the small packet with padding
-> haproxy 
-> pfSense WAN (GBit) <--- here we can see the 00 00 bytes in the outgoing frame
-> some internet hops
-> client   -> resulting in a defect download

We wanted to reduce this to a minimal number of components.
We were able to reproduce the error situation from local pfsense command line
(not touching the WAN interface or haproxy at all), with a "fetch http:......" call.

So even the local file produced on the firewall was defective.
This means that some code internally forwarded the 0 bytes to the logical socket which was opened by the fetch command.
This can be reproduced in 1 of ~500 requests.
And we always see the padded packet in the incoming data in case a corruption is found.
Comment 3 Richard Scheffenegger freebsd_committer freebsd_triage 2023-03-17 13:24:40 UTC
What does the packet look like after pfsense?

Is pfsense doing any kind of TCP reassembly/resegmentation, or header normalization?

As tuexen@ already stated, the packet dump you provided is a perfectly valid, properly formatted minimum size ethernet frame, and the two bytes of (ethernet) padding should not interfere with any IP or TCP processing at all.

Can you show what the packet looks like after pfsense?

But this appears not to be a TCP stack issue, IMHO.
Comment 4 Kristof Provost freebsd_committer freebsd_triage 2023-03-17 13:44:35 UTC
(In reply to Richard Scheffenegger from comment #3)
Given the presence of HAProxy (with unknown configuration), I'd say there's almost certainly some resegmentation going on.

The easiest way forward will be to take this discussion to a pfsense community support forum, when they should be told to test either pfSense plus 23.01 or a recent snapshot for pfSense 2.7. (Both plus and CE branches have migrated to freebsd current, and no one is going to be interested in debugging something on the stable/12-based branch).

If the issue can still be reproduced there the setup needs to be reduced further (i.e. does this occur without HAProxy in the path?) and fully described in a bug report for pfsense, including packet captures before and after the firewall device.
Comment 5 Marcus Haarmann 2023-03-17 13:51:47 UTC
reproduced some minutes ago, without haproxy, direct call to fetch:
The following frame was received:

0000   a0 36 9f 5f 90 42 e2 84 72 d3 14 5c 08 00 45 00
0010   00 2d cb 45 40 00 40 06 bc 08 c0 a8 19 29 c0 a8
0020   19 03 00 50 2b d2 a6 44 37 b5 d6 94 85 ed 50 18
0030   a4 cf d8 4e 00 00 b0 b9 89 d3 de 00

(padded with a single 00 byte), content was 5 bytes (b0 b9 89 d3 de).
Next packet received:
0000   a0 36 9f 5f 90 42 e2 84 72 d3 14 5c 08 00 45 00   .6._.B..r..\..E.
0010   05 dc cb 46 40 00 40 06 b6 58 c0 a8 19 29 c0 a8   ...F@.@..X...)..
0020   19 03 00 50 2b d2 a6 44 37 ba d6 94 85 ed 50 10   ...P+..D7.....P.
0030   a4 cf 22 8b 00 00 5e 2c 5b ad de 09 e6 d0 27 59   .."...^,[.....'Y
(data starts with 0x5e 0x2c ....)

Resulting defect (hex dump of defect file vs. correct file):
000eeac0  bd ec e8 40 92 5f 88 ef  ed dd 10 7c 3e 88 a3 23  |œìè@._.ïíÝ.|>.£#|
000eead0  e8 6c 67 b0 b9 89 d3 de  00 5e 2c 5b ad de 09 e6  |èlg°¹.ÓÞ.^,[­Þ.æ|
000eeae0  d0 27 59 1e f7 57 56 42  b3 db 91 18 1b 43 d2 eb  |Ð'Y.÷WVB³Û...CÒë|

000eeac0  bd ec e8 40 92 5f 88 ef  ed dd 10 7c 3e 88 a3 23  |œìè@._.ïíÝ.|>.£#|
000eead0  e8 6c 67 b0 b9 89 d3 de  5e 2c 5b ad de 09 e6 d0  |èlg°¹.ÓÞ^,[­Þ.æÐ|
000eeae0  27 59 1e f7 57 56 42 b3  db 91 18 1b 43 d2 eb 85  |'Y.÷WVB³Û...CÒë.|

The wrong byte was inserted between 0xde and 0x5e.
This is the local file constructed by a fetch http:.... command 
executed directly on the firewall.
If you say that a packet of this kind is fully OK, which is also my understanding of the padding mechanism here, then the kernel should not forward this padding byte to user space.

I am not aware that the pfsense people do some kind of mangling. In this reduced setup, only the LAN adapter is touched; no forwarding occurs. The output is saved directly to the file.
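
As a cross-check of the hex dumps in this comment, the following Python sketch locates the first differing byte between the defect and the correct file. It only uses the two 16 byte rows at offset 0x000eead0 copied from the dumps; the variable names are illustrative.

```python
ROW_OFFSET = 0x000EEAD0

# Row at 0x000eead0 from the defect file and from the correct file.
defect = bytes.fromhex("e86c67b0b989d3de005e2c5baddE09e6".lower())
correct = bytes.fromhex("e86c67b0b989d3de5e2c5bade09e6d0".rjust(32, "0")
                        if False else "e86c67b0b989d3de5e2c5badde09e6d0")

# Index of the first byte where the two rows disagree.
first_diff = next(i for i, (a, b) in enumerate(zip(defect, correct)) if a != b)

print(hex(ROW_OFFSET + first_diff))  # file offset of the first difference
print(hex(defect[first_diff]))       # the byte found there in the defect file

# Removing that single byte re-aligns the defect row with the correct row.
realigned = defect[:first_diff] + defect[first_diff + 1:]
print(realigned == correct[:len(realigned)])
```

The first difference is the 0x00 byte at file offset 0xeead8, directly after the 5 byte payload `b0 b9 89 d3 de`; with that one byte removed, the rest of the row matches the correct file.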
Comment 6 Marcus Haarmann 2023-03-17 14:33:13 UTC
Only thing I found in pf -sr output:
scrub on igb0 inet all fragment reassemble
scrub on igb0 inet6 all fragment reassemble

So yes, they are using a pf mechanism in order to reassemble packets for the filter, but this is not specific to pfsense. 
It is a normal functionality of the pf.

I will try to remove this and check if we can reproduce the situation.
Comment 7 Marcus Haarmann 2023-03-17 14:50:19 UTC
That does not help; the error still occurs with scrub disabled.
Comment 8 Michael Tuexen freebsd_committer freebsd_triage 2023-03-17 15:30:14 UTC
Please note that the padding is NOT part of the IP packet. Therefore, no node should consider it as being part of the IP packet or even the TCP payload. I guess some node is looking at the Ethernet frame length to deduce the IP packet length, instead of looking at the length field in the IP header. Network cards did this wrong in the past and failed hardware checksum checks since they took the padding into consideration where they shouldn't. This doesn't help here, but might give a hint where things can go wrong.
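
To illustrate the difference, here is a minimal Python sketch that extracts the TCP payload from the 60 byte frame posted in comment 5, once using the IP total length field and once using the Ethernet frame length. The function and its `trust_frame_length` flag are hypothetical and only model the suspected mistake; this is not code from FreeBSD, pf, or any NIC driver.

```python
import struct

# The 60 byte frame from comment 5 (Ethernet header, IPv4, TCP, 5 bytes of
# payload, 1 byte of Ethernet padding).
FRAME_HEX = """
a0 36 9f 5f 90 42 e2 84 72 d3 14 5c 08 00 45 00
00 2d cb 45 40 00 40 06 bc 08 c0 a8 19 29 c0 a8
19 03 00 50 2b d2 a6 44 37 b5 d6 94 85 ed 50 18
a4 cf d8 4e 00 00 b0 b9 89 d3 de 00
"""
frame = bytes.fromhex("".join(FRAME_HEX.split()))

ETH_HEADER_LEN = 14


def tcp_payload(frame: bytes, trust_frame_length: bool = False) -> bytes:
    """Return the TCP payload of an Ethernet II / IPv4 / TCP frame."""
    ip = frame[ETH_HEADER_LEN:]
    ip_header_len = (ip[0] & 0x0F) * 4
    (total_length,) = struct.unpack("!H", ip[2:4])
    if not trust_frame_length:
        # Correct: the IP header's total length bounds the packet,
        # so any Ethernet padding behind it is dropped.
        ip = ip[:total_length]
    tcp = ip[ip_header_len:]
    tcp_header_len = (tcp[12] >> 4) * 4
    return tcp[tcp_header_len:]


print(tcp_payload(frame).hex(" "))                           # by IP length
print(tcp_payload(frame, trust_frame_length=True).hex(" "))  # by frame length
```

With the IP total length (45) the payload is the expected 5 bytes `b0 b9 89 d3 de`. Deriving the length from the frame instead yields `b0 b9 89 d3 de 00`, which is exactly the byte sequence seen in the defect file.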
Comment 9 Marcus Haarmann 2023-03-17 17:04:10 UTC
Thanks for the hint with the old network cards, I will suggest exchanging the NIC with a new one. Maybe this helps. I am not sure how old this machine is. We have tried switching off and on the hardware 

I have now posted the same request to the pfsense team. Maybe they have modified the kernel at the TCP level.