258732 – [tcp] TCP_MAXSEG does not work

Status:	New

Alias:	None

Product:	Base System
Classification:	Unclassified
Component:	kern
Version:	12.2-STABLE
Hardware:	Any Any

Importance:	--- Affects Many People
Assignee:	freebsd-net (Nobody)

URL:
Keywords:

Depends on:
Blocks:

Reported:	2021-09-26 09:17 UTC by zhh0000zhh
Modified:	2024-02-08 12:26 UTC
CC List:	3 users

See Also:

Description zhh0000zhh 2021-09-26 09:17:21 UTC

Comment 1 zhh0000zhh 2021-09-26 09:20:58 UTC

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=144000

Currently, setting TCP_MAXSEG only succeeds when it is done after connect. At that point it can only affect the local socket, not the MSS used by the remote side, because the MSS has already been negotiated during connect. Capturing the packets with tcpdump confirms that the MSS is not renegotiated after TCP_MAXSEG is set on the local socket with setsockopt.

Comment 2 Michael Tuexen freebsd_committer 2021-09-26 14:35:58 UTC

The current man page of tcp says for the TCP_MAXSEG socket option:

By default, a sender- and receiver-TCP will negotiate among themselves to determine the maximum segment size to be used for each connection.  The TCP_MAXSEG option allows the user to determine the result of this negotiation, and to reduce it if desired.

Are you saying that the socket option does not work as described above?

Comment 3 zhh0000zhh 2021-09-27 00:16:36 UTC

(In reply to Michael Tuexen from comment #2)
It is currently not working as expected. The MSS cannot be set before connect.

Comment 4 Tom Jones freebsd_committer 2021-10-07 14:33:59 UTC

Do you have an example of open source software using this feature that we can look at?

Comment 5 Michael Tuexen freebsd_committer 2021-10-07 14:43:49 UTC

We discussed this at the transport call. This is not a bug, it is a feature request. However, we are not sure what the use case of this feature would be. So it would be good to answer comment #4.

Comment 6 zhh0000zhh 2021-10-07 14:57:24 UTC

My code:
 int mss;
 socklen_t mss_size = sizeof(mss);
 /* Read the MSS of the first socket; the result is less than 1460. */
 getsockopt(fd, IPPROTO_TCP, TCP_MAXSEG, &mss, &mss_size);
 /* The second socket needs the same MSS. */
 setsockopt(fd2, IPPROTO_TCP, TCP_MAXSEG, &mss, mss_size);

For more examples, you can search for "TCP_MAXSEG site:github.com" on Google.

RFC 793, page 19, Section 3.1, says: "This field must only be sent in the initial connection request (i.e., in segments with the SYN control bit set)."
https://datatracker.ietf.org/doc/html/rfc793#section-3.1

I think this is a bug, not a feature request.

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=144000 is not true.

Comment 7 Michael Tuexen freebsd_committer 2021-10-07 17:43:23 UTC

(In reply to zhh0000zhh from comment #6)

But why do you need to synchronise two TCP sockets? TCP provides a byte stream service. So why do you care what the MSS is?

It is clear that the MSS option is only present in SYN and SYN ACK segments. But this has no relation to the question of whether an application should have a way to influence the MSS announced to the peer.

The argument that this is not a bug is that the code behaves as documented in the man page. After connection setup, you can query the MSS used by the local endpoint in the send direction. You can even reduce it.

Linux also allows you to limit the MSS used by the peer. FreeBSD doesn't allow this right now. So you are asking for this feature to be added.

However, I don't understand why you need this feature. TCP is a byte stream protocol. So why do you want to limit the MSS used by your peer?
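
For readers following the thread, the two socket option calls under discussion can be sketched as below. This is only an illustration: the helper names get_mss and set_mss are hypothetical, and whether set_mss succeeds before connect, or changes the MSS announced to the peer, depends on the operating system, which is exactly what this report is about.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/*
 * Hypothetical helper: read the current TCP_MAXSEG value of a TCP socket.
 * Returns 0 on success and stores the value in *mss, or -1 on error
 * (errno is set by getsockopt).
 */
int get_mss(int fd, int *mss)
{
    socklen_t len = sizeof(*mss);
    return getsockopt(fd, IPPROTO_TCP, TCP_MAXSEG, mss, &len);
}

/*
 * Hypothetical helper: ask the stack to use at most `mss` bytes per segment.
 * Returns 0 on success or -1 on error. The result before connect() and the
 * effect on the MSS announced to the peer are platform dependent.
 */
int set_mss(int fd, int mss)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_MAXSEG, &mss, sizeof(mss));
}
```

Calling get_mss on a connected socket and then set_mss with a smaller value corresponds to the documented behaviour described above. Calling set_mss on a socket that is not yet connected is the case the reporter is asking for.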

Comment 8 zhh0000zhh 2021-10-08 01:11:35 UTC

For high-performance IPv6-to-IPv4 forwarding, being able to set the MSS is required (with TCP_NODELAY on),

and

RFC 879, page 2, Section 3, says:
"The MSS can be used completely independently in each
   direction of data flow.  The result may be quite different maximum
   sizes in the two directions."

https://datatracker.ietf.org/doc/html/rfc879#section-3

Comment 9 Michael Tuexen freebsd_committer 2021-10-08 09:03:44 UTC

(In reply to zhh0000zhh from comment #8)
Sure, the MSS can be different in each direction. Each node declares its maximum receive size.

However, I do not understand
* How the Nagle algorithm is involved in this discussion
* Why an application cares about the MSS. TCP is a byte stream. At the API level you have no guarantee that the data you receive has any relation to the segmentation which was used by the peer. On the sending side, you don't have to care about the segmentation process.

Comment 10 zhh0000zhh 2021-10-08 09:29:32 UTC

Turning off the Nagle algorithm means the system sends data immediately instead of waiting for buffers to fill, and any socket forwarder should have low latency.
This leads to a problem for IPv6-to-IPv4 forwarding. When the forwarder accepts a message from the IPv4 side and forwards it to the IPv6 network, the data actually sent over IPv6 is split into two packets, because the IPv4 MSS is larger than the IPv6 MSS. For example, an IPv4 packet with a 1460-byte payload is split into a 1446-byte packet and a 14-byte packet on IPv6 when Nagle is disabled. This can be verified with tcpdump.
This is a performance problem: the IPv4 network generates one packet while the IPv6 network outputs two, which has a significant impact on both packet count (which correlates directly with performance) and effective bandwidth (the extra header overhead reduces bandwidth utilization).
The inability to set the MSS does not in itself cause program errors, but in high-performance, low-latency scenarios it can waste a significant amount of resources.
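
The splitting arithmetic in this comment can be sketched as follows. This is a simplified model, not the kernel's segmentation logic: it assumes each received payload is forwarded immediately and on its own (Nagle disabled), and the function names are hypothetical.

```c
#include <stddef.h>

/*
 * Number of segments needed to carry `payload` bytes when each segment
 * holds at most `mss` bytes (ceiling division). Returns 0 if mss is 0.
 */
size_t segments_needed(size_t payload, size_t mss)
{
    if (mss == 0)
        return 0;
    return (payload + mss - 1) / mss;
}

/*
 * Size in bytes of the last segment for the same split.
 * Returns 0 if there is nothing to send or mss is 0.
 */
size_t last_segment_size(size_t payload, size_t mss)
{
    if (mss == 0 || payload == 0)
        return 0;
    size_t rem = payload % mss;
    return rem == 0 ? mss : rem;
}
```

With the numbers from the comment, segments_needed(1460, 1446) gives 2 and last_segment_size(1460, 1446) gives 14. If the IPv4 side were limited to an MSS of 1446, each received payload would fit into a single IPv6 segment.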

Comment 11 zhh0000zhh 2022-05-31 02:44:14 UTC

Has this problem been fixed? Thanks.

Comment 12 Arthur Kiyanovski 2024-02-08 12:26:52 UTC

(In reply to Michael Tuexen from comment #5)

Just adding a usage example:

A popular way to sanity test network performance is the iperf tool.
With this tool, the way to control the size of packets sent is with the -M parameter, which according to the documentation:

-M, --mss n
	      set TCP maximum segment size using TCP_MAXSEG

For example, to get a quick look at pps, what one may like to do is to set the -M parameter to some small number and run the BW test. This gives a good feel for pps.

There may be other ways to achieve this, but this is the popular quick and dirty one I know.