Setting IPV6_USE_MIN_MTU to one (1) on an IPv6 TCP socket results in fragmented IPv6 packets being sent, rather than the TCP segment size being adjusted to reflect the MTU limit on the socket.

00:56:44.177930 IP6 2001:470:1f00:820:218:f3ff:feba:9a37 > 2001:470:1f00:820:6233:4bff:fe01:7585: frag (0|1232) 5555 > 63656: Flags [.], ack 42, win 8211, options [nop,nop,TS val 2829969063 ecr 1028520077], length 1200
00:56:44.177936 IP6 2001:470:1f00:820:218:f3ff:feba:9a37 > 2001:470:1f00:820:6233:4bff:fe01:7585: frag (1232|228)
00:56:44.177953 IP6 2001:470:1f00:820:218:f3ff:feba:9a37 > 2001:470:1f00:820:6233:4bff:fe01:7585: frag (0|1232) 5555 > 63656: Flags [.], ack 42, win 8211, options [nop,nop,TS val 2829969063 ecr 1028520077], length 1200
00:56:44.177957 IP6 2001:470:1f00:820:218:f3ff:feba:9a37 > 2001:470:1f00:820:6233:4bff:fe01:7585: frag (1232|228)
00:56:44.177974 IP6 2001:470:1f00:820:218:f3ff:feba:9a37 > 2001:470:1f00:820:6233:4bff:fe01:7585: frag (0|1232) 5555 > 63656: Flags [.], ack 42, win 8211, options [nop,nop,TS val 2829969063 ecr 1028520077], length 1200

Fix:

The TCP layer should check whether ip6po_minmtu is set on the socket and adjust maxmtu accordingly. The code fragment below should do that but has not been tested.

sys/netinet/tcp_input.c:

	if (isipv6) {
		struct ip6_pktopts *opt;

		maxmtu = tcp_maxmtu6(&inp->inp_inc, mtuflags);
		/*
		 * Clamp to the IPv6 minimum MTU (1280) only for
		 * IPV6_USE_MIN_MTU=1.  Test for IP6PO_MINMTU_ALL explicitly:
		 * the default, IP6PO_MINMTU_MCASTONLY (-1), is non-zero too.
		 */
		opt = inp->inp_depend6.inp6_outputopts;
		if (opt != NULL && opt->ip6po_minmtu == IP6PO_MINMTU_ALL)
			maxmtu = min(maxmtu, IPV6_MMTU);
		tp->t_maxopd = tp->t_maxseg = V_tcp_v6mssdflt;
	} else

How-To-Repeat:

Apply the following patch to named and transfer a zone. [The intent of the patch is to avoid PMTUD issues. Too many nameservers are behind load balancers / firewalls that don't pass PTB messages.]
diff --git a/lib/isc/unix/socket.c b/lib/isc/unix/socket.c
index ffe7e02..6fb8860 100644
--- a/lib/isc/unix/socket.c
+++ b/lib/isc/unix/socket.c
@@ -2262,6 +2264,31 @@ clear_bsdcompat(void) {
 }
 #endif
 
+static void
+use_min_mtu(isc__socket_t *sock) {
+#if !defined(IPV6_USE_MIN_MTU) && !defined(IPV6_MTU)
+	UNUSED(sock);
+#endif
+#ifdef IPV6_USE_MIN_MTU
+	/* use minimum MTU */
+	if (sock->pf == AF_INET6) {
+		int on = 1;
+		(void)setsockopt(sock->fd, IPPROTO_IPV6, IPV6_USE_MIN_MTU,
+				(void *)&on, sizeof(on));
+	}
+#endif
+#if defined(IPV6_MTU)
+	/*
+	 * Use minimum MTU on IPv6 sockets.
+	 */
+	if (sock->pf == AF_INET6) {
+		int mtu = 1280;
+		(void)setsockopt(sock->fd, IPPROTO_IPV6, IPV6_MTU,
+				 &mtu, sizeof(mtu));
+	}
+#endif
+}
+
 static isc_result_t
 opensocket(isc__socketmgr_t *manager, isc__socket_t *sock,
 	   isc__socket_t *dup_socket)
@@ -2426,6 +2453,11 @@ opensocket(isc__socketmgr_t *manager, isc__socket_t *sock,
 	}
 #endif
 
+	/*
+	 * Use minimum mtu if possible.
+	 */
+	use_min_mtu(sock);
+
 #if defined(USE_CMSG) || defined(SO_RCVBUF)
 	if (sock->type == isc_sockettype_udp) {
 
@@ -2490,32 +2522,6 @@ opensocket(isc__socketmgr_t *manager, isc__socket_t *sock,
 	}
 #endif /* IPV6_RECVPKTINFO */
 #endif /* ISC_PLATFORM_HAVEIN6PKTINFO */
-#ifdef IPV6_USE_MIN_MTU	/* RFC 3542, not too common yet*/
-	/* use minimum MTU */
-	if (sock->pf == AF_INET6 &&
-	    setsockopt(sock->fd, IPPROTO_IPV6, IPV6_USE_MIN_MTU,
-		       (void *)&on, sizeof(on)) < 0) {
-		isc__strerror(errno, strbuf, sizeof(strbuf));
-		UNEXPECTED_ERROR(__FILE__, __LINE__,
-				 "setsockopt(%d, IPV6_USE_MIN_MTU) "
-				 "%s: %s", sock->fd,
-				 isc_msgcat_get(isc_msgcat,
-						ISC_MSGSET_GENERAL,
-						ISC_MSG_FAILED,
-						"failed"),
-				 strbuf);
-	}
-#endif
-#if defined(IPV6_MTU)
-	/*
-	 * Use minimum MTU on IPv6 sockets.
-	 */
-	if (sock->pf == AF_INET6) {
-		int mtu = 1280;
-		(void)setsockopt(sock->fd, IPPROTO_IPV6, IPV6_MTU,
-				 &mtu, sizeof(mtu));
-	}
-#endif
 #if defined(IPV6_MTU_DISCOVER) && defined(IPV6_PMTUDISC_DONT)
 	/*
 	 * Turn off Path MTU discovery on IPv6/UDP sockets.
@@ -3313,6 +3319,11 @@ internal_accept(isc_task_t *me, isc_event_t *ev) {
 	NEWCONNSOCK(dev)->connected = 1;
 
 	/*
+	 * Use minimum mtu if possible.
+	 */
+	use_min_mtu(NEWCONNSOCK(dev));
+
+	/*
 	 * Save away the remote address
 	 */
 	dev->address = NEWCONNSOCK(dev)->peer_address;
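A little arithmetic reproduces the fragment sizes in the trace at the top of this report. The sketch below assumes a 40-byte IPv6 header, an 8-byte Fragment header, and non-final fragments that carry a multiple of 8 bytes. It also assumes a 1500-byte link MTU on the sender. That MTU is inferred from the numbers and is not stated in the report: TCP built a 1460-byte segment (20-byte header, 12 bytes of timestamp options, 1428 data bytes), and the IPv6 layer then split it to fit 1280. The helper names are hypothetical.

```c
#include <stdio.h>

#define IP6_HDR_LEN   40  /* fixed IPv6 header */
#define FRAG_HDR_LEN   8  /* IPv6 Fragment extension header */

/* Largest payload one fragment can carry when packets are capped at `mtu`.
   Non-final fragments must carry a multiple of 8 bytes, so round down. */
static int max_frag_payload(int mtu)
{
    return (mtu - IP6_HDR_LEN - FRAG_HDR_LEN) / 8 * 8;
}

/* Print tcpdump-style "(offset|length)" pairs for a `seg_len`-byte TCP
   segment sent with packets capped at `mtu`; return the packet count. */
static int print_fragments(int seg_len, int mtu)
{
    int chunk, off, n = 0;

    if (IP6_HDR_LEN + seg_len <= mtu) {   /* fits: no Fragment header needed */
        printf("unfragmented (%d)\n", seg_len);
        return 1;
    }
    chunk = max_frag_payload(mtu);
    for (off = 0; off < seg_len; off += chunk, n++)
        printf("frag (%d|%d)\n", off,
               seg_len - off < chunk ? seg_len - off : chunk);
    return n;
}
```

`print_fragments(1460, 1280)` prints `frag (0|1232)` and `frag (1232|228)`, the same pairs as the trace. The first fragment's 1232 bytes minus the 32-byte TCP header (with timestamps) gives the "length 1200" that tcpdump reports. A segment sized to fit 1280 (at most 1240 bytes of TCP) would go out whole.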
Responsible Changed From-To: freebsd-bugs->freebsd-net Over to maintainer(s).
Responsible Changed From-To: freebsd-net->andre Take over.
If IP6PO_MINMTU_ALL is set, it should also affect MSS negotiation.
This needs to account for the IPv6 and TCP header sizes in the advertised MSS.
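To make the header accounting concrete, here is a minimal sketch. It assumes the advertised MSS is the effective MTU minus the fixed 40-byte IPv6 header and the 20-byte TCP header, with TCP options left to be subtracted at send time, as RFC 6691 describes. `tcp6_mss()` and its `use_min_mtu` flag are hypothetical and only model IPV6_USE_MIN_MTU=1.

```c
#include <stdbool.h>

#define IP6_HDR_LEN    40   /* IPv6 fixed header */
#define TCP_HDR_LEN    20   /* TCP header without options */
#define IPV6_MIN_MTU 1280   /* IPv6 minimum link MTU */

/*
 * MSS to advertise for an IPv6 connection.  `link_mtu` is the interface
 * (or path) MTU; `use_min_mtu` models IPV6_USE_MIN_MTU=1, which caps the
 * effective MTU at 1280.  The MSS excludes the fixed IP and TCP headers.
 */
static int tcp6_mss(int link_mtu, bool use_min_mtu)
{
    int mtu = link_mtu;

    if (use_min_mtu && mtu > IPV6_MIN_MTU)
        mtu = IPV6_MIN_MTU;
    return mtu - IP6_HDR_LEN - TCP_HDR_LEN;
}
```

With the option set, both a 1500-byte and a 9000-byte link yield 1220. Without the option, they yield 1440 and 8940.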
Mark, can you please check if this is still a problem? Assigning back to the pool.
(In reply to Hiren Panchasara from comment #5)

My test system died years ago, but I believe it is still a problem.

It should be trivial to check:

Create an IPv6 TCP socket.
Set IPV6_USE_MIN_MTU=1 using setsockopt().
Connect to a data sink.
Write 1400 bytes to the socket in a single operation.

Examine the packets sent with tcpdump. No fragmented packets should be sent, as TCP is supposed to take MTU information into account.

Mark
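The reproduction steps above can be sketched in C. This is a hedged sketch, not a tested reproducer. `open_min_mtu_tcp6()` and `send_probe()` are hypothetical names, and the data sink is whatever numeric IPv6 address and port you pass in. The option is set only where `<netinet/in.h>` defines IPV6_USE_MIN_MTU (RFC 3542 platforms such as FreeBSD). Elsewhere the setsockopt() call is compiled out.

```c
#include <errno.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define PROBE_LEN 1400  /* bytes written in one call, as in the steps */

/* Open an IPv6 TCP socket with IPV6_USE_MIN_MTU=1 where available.
   Returns the fd, or -1 with errno set. */
static int open_min_mtu_tcp6(void)
{
    int fd = socket(AF_INET6, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
#ifdef IPV6_USE_MIN_MTU
    {
        int on = 1;

        if (setsockopt(fd, IPPROTO_IPV6, IPV6_USE_MIN_MTU,
                       &on, sizeof(on)) < 0) {
            int saved = errno;

            close(fd);
            errno = saved;
            return -1;
        }
    }
#endif
    return fd;
}

/* Connect to the sink and write PROBE_LEN bytes in a single write().
   Returns the byte count written, or -1 on error. */
static ssize_t send_probe(const char *addr, const char *port)
{
    struct addrinfo hints, *res;
    char buf[PROBE_LEN];
    ssize_t n;
    int fd;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET6;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_NUMERICHOST;   /* no DNS lookups */
    if (getaddrinfo(addr, port, &hints, &res) != 0)
        return -1;

    fd = open_min_mtu_tcp6();
    if (fd < 0) {
        freeaddrinfo(res);
        return -1;
    }
    if (connect(fd, res->ai_addr, res->ai_addrlen) < 0) {
        freeaddrinfo(res);
        close(fd);
        return -1;
    }
    freeaddrinfo(res);

    memset(buf, 'x', sizeof(buf));
    n = write(fd, buf, sizeof(buf));   /* larger than a 1280-MTU segment */
    close(fd);
    return n;
}
```

For example, call `send_probe("2001:db8::1", "9")` while `tcpdump -n -i <interface> ip6` runs on the sender. An affected kernel shows `frag (0|...)` pairs, as in the original trace. A fixed kernel shows unfragmented packets of at most 1280 bytes.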
(In reply to marka from comment #6)
> (In reply to Hiren Panchasara from comment #5)
> My test system died years ago but I believe that it still is a problem.
> 
> It should be trivial to check.
> 
> create a IPv6 TCP socket.
> set IPV6_USE_MIN_MTU=1 using setsockopt
> connect to a data sink
> write 1400 bytes to the socket in a single operation
> 
> Examine the packets sent with tcpdump. There should be no fragmented
> packets being sent as TCP is supposed to take into account MTU
> information.

According to RFC 3542, this is what the kernel should do: perform IP fragmentation as the application requested.

https://tools.ietf.org/html/rfc3542#section-11.1

"If the packet is larger than the minimum MTU and this feature has been enabled the IP layer will fragment to the minimum MTU."
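RFC 3542, section 11.1 defines three values for IPV6_USE_MIN_MTU. The value -1 (the default) performs path MTU discovery for unicast destinations and sends multicast at the minimum MTU. The value 0 always performs path MTU discovery. The value 1 always fragments to the minimum MTU. The sketch below models only which MTU the IP layer uses per packet. The enum names are local to this sketch; FreeBSD's kernel spells the corresponding constants IP6PO_MINMTU_*.

```c
#include <stdbool.h>

#define IPV6_MIN_MTU 1280

/* IPV6_USE_MIN_MTU values from RFC 3542, section 11.1 (names hypothetical). */
enum min_mtu_policy {
    MINMTU_MCASTONLY = -1, /* default: PMTUD for unicast, 1280 for multicast */
    MINMTU_DISABLE   =  0, /* always perform path MTU discovery */
    MINMTU_ALL       =  1  /* always fragment to the minimum MTU */
};

/* MTU the IP layer uses for one outgoing packet under `policy`. */
static int ip6_send_mtu(enum min_mtu_policy policy, bool dst_is_multicast,
                        int path_mtu)
{
    bool use_min = policy == MINMTU_ALL ||
                   (policy == MINMTU_MCASTONLY && dst_is_multicast);

    return (use_min && path_mtu > IPV6_MIN_MTU) ? IPV6_MIN_MTU : path_mtu;
}
```

The RFC describes this as an IP-layer behaviour, which is Andrey's point. Whether TCP should also shrink its segments to the same 1280 is what the rest of the thread debates.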
And this is what always pisses me off. If we have a 10/50/100G link with a 9k MTU, BIND always does IPv6 fragmentation because of this option.
(In reply to Andrey V. Elsukov from comment #7)

RFC 6691:

   o  As a result, when the effective MTU of an interface varies, TCP
      SHOULD use the smallest effective MTU of the interface to calculate
      the value to advertise in the MSS option.

IPV6_USE_MIN_MTU=1 changes the effective MTU of the interface for this socket.
(In reply to Andrey V. Elsukov from comment #8)

So what! Most DNS/TCP responses are a few packets. What does it matter if it is 3 or 4 packets?

What matters is avoiding PMTUD, as it is NOT reliable. Setting the IPv6 packet size to 1280 avoids triggering PMTUD issues. Limiting the packet size avoids timeouts and retransmissions caused by PTB messages not being generated due to rate limiting, or being lost due to stupid load balancers and firewalls that drop ICMP.

Go put your validating resolvers behind an IPv6-in-IPv4 link, then come back and say this is not needed.
(In reply to marka from comment #9)
> (In reply to Andrey V. Elsukov from comment #7)
> RFC 6691
> 
>    o  As a result, when the effective MTU of an interface varies, TCP
>       SHOULD use the smallest effective MTU of the interface to calculate
>       the value to advertise in the MSS option.
> 
> IPV6_USE_MIN_MTU=1 changes the effective MTU of the interface for this
> socket.

This is a socket option. As far as I can see, it doesn't change the interface's MTU value and doesn't affect the MSS value. It just explicitly instructs the kernel to do IPv6 fragmentation, exactly as described in RFC 3542.
(In reply to marka from comment #10)
> (In reply to Andrey V. Elsukov from comment #8)
> So what! Most DNS/TCP response is a few of packets. What does it
> matter if it is the 3 or 4 packets.

Zone transfers need a lot of such packets.

> What matters is avoiding PMTUD as it is NOT reliable. Setting the
> IPv6 packet size to 1280 avoids triggering PMTUD issues. Limiting
> the packet size avoids timeout and retransmissions due to PTB not
> been generated due to rate limiting or being lost due to stupid
> load balancers and firewalls that drop ICMP.
> 
> Go put your validating resolvers behind a IPv6 in IPv4 link then
> come back and say this is not needed.

When I build the network in the DC, I know better what MTU can be used in my network. Forcing a 1280-byte packet size on a network where 9k is the default MTU is strange, to say the least, in 2017.
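The two positions can be put into numbers by counting full-size segments per transfer. This is a minimal sketch that ignores TCP options, which shave a few bytes off each segment. The 4000-byte response and the 1 MiB transfer are illustrative sizes and do not come from the thread. `segments_needed()` is a hypothetical helper.

```c
#include <stddef.h>

/* Segments needed to carry `bytes` of payload at a given MSS (ceiling). */
static size_t segments_needed(size_t bytes, size_t mss)
{
    return (bytes + mss - 1) / mss;
}
```

A 4000-byte response takes 3 segments at MSS 1440 (1500-byte MTU) and 4 segments at MSS 1220 (1280-byte MTU), which is the "3 or 4 packets" case. A 1 MiB transfer takes 118 segments at MSS 8940 (9000-byte MTU) versus 860 at MSS 1220, which is the zone-transfer concern.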
(In reply to Andrey V. Elsukov from comment #11)

Read the words "effective MTU" that I quoted. The "effective MTU" is 1280 with this option set.
I think the TCP MSS should honor the IPV6_USE_MIN_MTU option, the same way SCTP should use the minimum MTU. We should also disable path MTU discovery when the socket option is set. I will put a corresponding patch into Phabricator...
batch change of PRs untouched in 2018 marked "in progress" back to open.
A patch is under review: https://reviews.freebsd.org/D16796
A commit references this bug:

Author: tuexen
Date: Tue Aug 21 14:12:31 UTC 2018
New revision: 338138
URL: https://svnweb.freebsd.org/changeset/base/338138

Log:
  Enabling the IPPROTO_IPV6 level socket option IPV6_USE_MIN_MTU on a TCP
  socket resulted in sending fragmented IPV6 packets.
  This is fixed by reducing the MSS to the appropriate value. In addition,
  if the socket option is set before the handshake happens, announce this
  MSS to the peer. This is not strictly required, but done since TCP
  is conservative.

  PR:			173444
  Reviewed by:		bz@, rrs@
  MFC after:		1 month
  Sponsored by:		Netflix, Inc.
  Differential Revision:	https://reviews.freebsd.org/D16796

Changes:
  head/sys/netinet/in_pcb.h
  head/sys/netinet/tcp_input.c
  head/sys/netinet/tcp_subr.c
  head/sys/netinet/tcp_usrreq.c
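The r338138 log describes two effects. The MSS used for sending is always reduced. The smaller MSS is announced to the peer only if the option is set before the handshake, because the MSS option travels only in SYN segments. The sketch below is a small model of that logic and is not the committed code, which lives in tcp_input.c, tcp_subr.c and tcp_usrreq.c. `struct tcp6_conn` and `set_use_min_mtu()` are hypothetical. The 1220 figure assumes a bare 20-byte TCP header.

```c
#include <stdbool.h>

#define IPV6_MIN_MTU 1280
#define MIN_MTU_MSS  (IPV6_MIN_MTU - 40 - 20)  /* 1220: minus IPv6 and TCP headers */

/* Hypothetical per-connection state for this sketch only. */
struct tcp6_conn {
    int  send_mss;        /* largest segment payload we will send */
    int  advertised_mss;  /* value carried in our SYN's MSS option */
    bool established;     /* has the handshake already completed? */
};

/* Model of setting IPV6_USE_MIN_MTU=1 on a connection. */
static void set_use_min_mtu(struct tcp6_conn *c)
{
    /* Always clamp what we send, so the IP layer never has to fragment. */
    if (c->send_mss > MIN_MTU_MSS)
        c->send_mss = MIN_MTU_MSS;
    /* Only a connection that has not sent its SYN yet can still announce it. */
    if (!c->established && c->advertised_mss > MIN_MTU_MSS)
        c->advertised_mss = MIN_MTU_MSS;
}
```

As the log notes, announcing the smaller MSS is not strictly required. The sender-side clamp alone stops the fragmentation, and the announcement only asks the peer to be equally conservative.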
Michael, can this PR be closed? Thanks
(In reply to Oleksandr Tymoshenko from comment #18)

I think I should MFC that to stable/11... Then I'll close it. I will do that next week.
A commit references this bug:

Author: tuexen
Date: Fri Jan 25 15:25:54 UTC 2019
New revision: 343432
URL: https://svnweb.freebsd.org/changeset/base/343432

Log:
  MFC r338138:

  Enabling the IPPROTO_IPV6 level socket option IPV6_USE_MIN_MTU on a TCP
  socket resulted in sending fragmented IPV6 packets.
  This is fixed by reducing the MSS to the appropriate value. In addition,
  if the socket option is set before the handshake happens, announce this
  MSS to the peer. This is not strictly required, but done since TCP
  is conservative.

  PR:			173444
  Reviewed by:		bz@, rrs@
  Sponsored by:		Netflix, Inc.
  Differential Revision:	https://reviews.freebsd.org/D16796

Changes:
_U  stable/11/
  stable/11/sys/netinet/in_pcb.h
  stable/11/sys/netinet/tcp_input.c
  stable/11/sys/netinet/tcp_subr.c
  stable/11/sys/netinet/tcp_usrreq.c
MFCed to stable/11, but will not MFC to stable/10, since FreeBSD 10.x is EOL.