See dump from A.B.C.232:

16:42:03.520909 A.B.C.41.1816 > A.B.C.232.http: S 0:0(0) win 57344 <mss 1460,...> (DF)
16:42:03.520936 A.B.C.232.http > A.B.C.41.1816: S 0:0(0) ack 0 win 65535 <mss 1460,...> (DF)
16:42:03.565467 A.B.C.41.1816 > A.B.C.232.http: . ack 1 win 57920 <nop,...> (DF)
16:42:03.574552 A.B.C.41.1816 > A.B.C.232.http: P 1:443(442) ack 1 win 57920 <nop,...> (DF)
16:42:03.575489 A.B.C.232.http > A.B.C.41.1816: . 1:1449(1448) ack 443 win 33304 <nop,...> (DF)
16:42:03.575499 A.B.C.232.http > A.B.C.41.1816: P 1449:2108(659) ack 443 win 33304 <nop,...> (DF)
16:42:03.575953 A.B.C.225 > A.B.C.232: icmp: A.B.C.41 unreachable - need to frag (DF)
16:42:03.635175 A.B.C.41.1816 > A.B.C.232.http: . ack 1 win 57920 <nop,...> (DF)
16:42:05.920839 A.B.C.232.http > A.B.C.41.1816: . 1:1449(1448) ack 443 win 33304 <nop,...> (DF)
16:42:05.921312 A.B.C.225 > A.B.C.232: icmp: A.B.C.41 unreachable - need to frag (DF)
16:42:10.420908 A.B.C.232.http > A.B.C.41.1816: . 1:1449(1448) ack 443 win 33304 <nop,...> (DF)
16:42:10.421421 A.B.C.225 > A.B.C.232: icmp: A.B.C.41 unreachable - need to frag (DF)
...

Part of one of the ICMP packets:

16:42:03.575953 A.B.C.225 > A.B.C.232: icmp: A.B.C.41 unreachable - need to frag (DF)
0x0000   4500 0038 b25c 4000 4001 6aa0 AABB CCe1
0x0010   AABB CCe8
0x0014   0304 426e   (type, subtype, cksum)
0x0018   0000        (reserved)
0x001A   0000        (next_mtu)

When a NEEDFRAG ICMP arrives that does not contain a next_mtu proposal, the MTU is not changed. The sender keeps retransmitting the data with the unchanged (large) MTU, so communication is impossible.

Fix:

The relevant part of the code is (ip_icmp.c):

/* ------------------------------------------- */
	mtu = ntohs(icp->icmp_nextmtu);
	if (!mtu)
		mtu = ip_next_mtu(mtu, 1);

	if (mtu >= max(296, (tcp_minmss + sizeof(struct tcpiphdr))))
		tcp_hc_updatemtu(&inc, mtu);

#ifdef DEBUG_MTUDISC
	printf("MTU for %s reduced to %d\n", inet_ntoa(icmpsrc.sin_addr), mtu);
#endif
/* ------------------------------------------- */

The icmp_nextmtu is zero. ip_next_mtu() tries to find a smaller value for the MTU, but the next smaller value for zero is zero again. Zero does not pass the sanity check for the minimum value, so the MTU is not changed.

The error seems to be in the 'mtu = ip_next_mtu(mtu, 1)' line. It should not use the proposed MTU (which is already zero) as its argument, but the current MTU from the hc cache.

How-To-Repeat:

See description.
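The zero-in, zero-out behaviour described above can be reproduced outside the kernel. The following is a minimal sketch, assuming a plateau table in the style of RFC 1191 and a fixed floor of 296 for the sanity check (the real check also involves tcp_minmss). The names next_mtu_sketch() and mtu_would_be_updated() are hypothetical and are not the kernel's ip_next_mtu().

```c
#include <stddef.h>

/*
 * Plateau values in the style of RFC 1191, largest first.
 * The trailing 0 terminates the table. (Assumed table contents.)
 */
static const int mtu_plateaus[] = {
	65535, 32000, 17914, 8166, 4352, 2002, 1492, 1006, 508, 296, 68, 0
};

/*
 * Hypothetical stand-in for a "next smaller MTU" lookup:
 * return the largest plateau strictly below mtu, or 0 if there is none.
 */
int
next_mtu_sketch(int mtu)
{
	size_t i;

	for (i = 0; mtu_plateaus[i] != 0; i++) {
		if (mtu_plateaus[i] < mtu)
			return (mtu_plateaus[i]);
	}
	return (0);
}

/*
 * Simplified form of the sanity check quoted in the report:
 * a candidate MTU is only accepted if it is at least `floor`.
 */
int
mtu_would_be_updated(int mtu, int floor)
{
	return (mtu >= floor);
}
```

Starting from the zero nextmtu, next_mtu_sketch(0) has nothing smaller to return, so the candidate stays 0 and fails the floor check. Starting from a real packet length such as 1500, the same lookup yields a usable plateau.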
Responsible Changed
From-To: freebsd-bugs->freebsd-net

Over to freebsd-net mailing list for evaluation
Hi Dan,

We have a bit different code in HEAD and it seems for me it works
correctly with zero icmp_nextmtu. All we need is to ask Andre to
merge his work to RELENG_5. I'd prepare a patch if you have chance to
test.

-- 
Maxim Konovalov
Maxim Konovalov wrote:
>
> Hi Dan,
>
> We have a bit different code in HEAD and it seems for me it works
> correctly with zero icmp_nextmtu. All we need is to ask Andre to
> merge his work to RELENG_5. I'd prepare a patch if you have chance to
> test.

Duh, that's still on my roster! I'm sorry for taking this long and
I'll get this done later today.

-- 
Andre
Responsible Changed
From-To: freebsd-net->andre

Take over. Patch just needs MFC.
Maxim Konovalov wrote:
> We have a bit different code in HEAD and it seems for me it works
> correctly with zero icmp_nextmtu. All we need is to ask Andre to
> merge his work to RELENG_5. I'd prepare a patch if you have chance to
> test.

Moving the processing of NEEDFRAG from icmp_input() to
tcp_ctlinput() seems to be a good way. My patch is only a hack.

I'll wait for Andre's MFC. This PR could be closed after it.

Thanks

Dan
Dan Lukes wrote:
>
> Maxim Konovalov wrote:
> > We have a bit different code in HEAD and it seems for me it works
> > correctly with zero icmp_nextmtu. All we need is to ask Andre to
> > merge his work to RELENG_5. I'd prepare a patch if you have chance to
> > test.
>
> Moving the processing of NEEDFRAG from icmp_input() to
> tcp_ctlinput() seems to be a good way.

It is the correct way because we need to check whether the NEEDFRAG
is spoofed.

> My patch is only a hack.

The effect of your patch is not what you think it is. Normally
NEEDFRAG happens when the TCP window is already open and a couple of
too-big packets generate as many NEEDFRAG answers, stepping the
path MTU down too much.

> I'll wait for Andre's MFC. This PR could be closed after it.

-- 
Andre
>> My patch is only a hack.
>
> The effect of your patch is not what you think it is. Normally
> NEEDFRAG happens when the TCP window is already open and a couple of
> too-big packets generate as many NEEDFRAG answers, stepping the
> path MTU down too much.

True. I missed that. Let's MFC your changes.

Dan
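Andre's objection can be illustrated with a small simulation. This is a sketch under stated assumptions: the plateau table follows RFC 1191, all function names are hypothetical, and the bookkeeping is far simpler than the kernel's host cache. It compares stepping down from the cached MTU on every NEEDFRAG (the approach proposed in the original report) with deriving each candidate from the length of the packet that triggered the NEEDFRAG.

```c
#include <stddef.h>

/* RFC 1191 style plateau table, largest first, 0-terminated. (Assumed.) */
static const int plateaus[] = {
	65535, 32000, 17914, 8166, 4352, 2002, 1492, 1006, 508, 296, 68, 0
};

/* Hypothetical helper: largest plateau strictly below mtu, or 0. */
static int
next_below(int mtu)
{
	size_t i;

	for (i = 0; plateaus[i] != 0; i++) {
		if (plateaus[i] < mtu)
			return (plateaus[i]);
	}
	return (0);
}

/*
 * Variant A: every NEEDFRAG reply lowers the cached MTU by one plateau.
 * n replies for packets sent in one burst cause n reductions.
 */
int
mtu_after_replies_from_cache(int cached_mtu, int n)
{
	while (n-- > 0)
		cached_mtu = next_below(cached_mtu);
	return (cached_mtu);
}

/*
 * Variant B: every NEEDFRAG reply derives its candidate from the length
 * of the offending packet. Replies for same-sized packets all agree.
 */
int
mtu_after_replies_from_packet(int cached_mtu, int packet_len, int n)
{
	while (n-- > 0) {
		int candidate = next_below(packet_len);

		if (candidate < cached_mtu)
			cached_mtu = candidate;
	}
	return (cached_mtu);
}
```

With three 1500-byte packets in flight and three NEEDFRAG replies, variant A walks the cached value down three plateaus, while variant B settles on the first plateau below 1500 and stays there.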
Just tested on fresh 5.4-STABLE and problem still exists.
I'm not sure, but I think that new mtu value should not be based on
ip->ip_len, which is length of needfrag icmp packet?

	/*
	 * If no alternative MTU was
	 * proposed, try the next smaller
	 * one.
	 */
	if (!mtu)
		mtu = ip_next_mtu(ntohs(ip->ip_len), 1);

vita
Vitezslav Novy wrote:
>
> Just tested on fresh 5.4-STABLE and problem still exists.
> I'm not sure, but I think that new mtu value should not be based on
> ip->ip_len, which is length of needfrag icmp packet?

No, it's derived from *vip which is the IP header of the packet
that caused the ICMP packet. The code is correct.

Please provide a new tcpdump of the transaction in question.

-- 
Andre
There is a second site with a similar problem. A NEEDFRAG with a filled-in nextmtu has been seen on the wire, but it is ignored by the destination.

At the site in question the ICMP packet returns through IPNAT. Maybe the data part of the packet is translated incorrectly, so the packet is not properly recognized and assigned to the appropriate connection on the receiving side.

It may be an IPNAT bug or a side effect of a recent change in the kernel code.

I have no access to the site, so I can't monitor it myself. I have asked them to provide the dump.

Dan
Here's the dump (-vv option used).

##############################################
22:00:50.945513 192.168.35.6.hosts2-ns > 192.168.222.141.1865: . [tcp sum ok] 311:1771(1460) ack 529 win 65535 (DF) (ttl 64, id 19535, len 1500)
22:00:50.945983 10.250.35.129 > 192.168.35.6: icmp: 192.168.222.141 unreachable - need to frag (mtu 1438) for 192.168.35.6.hosts2-ns > 192.168.222.141.1865: [|tcp] (DF) (ttl 62, id 19535, len 1500) (ttl 254, id 20508, len 56)
22:00:51.985402 192.168.35.6.hosts2-ns > 192.168.222.141.1865: . [tcp sum ok] 311:1771(1460) ack 529 win 65535 (DF) (ttl 64, id 20043, len 1500)
22:00:51.985860 10.250.35.129 > 192.168.35.6: icmp: 192.168.222.141 unreachable - need to frag (mtu 1438) for 192.168.35.6.hosts2-ns > 192.168.222.141.1865: [|tcp] (DF) (ttl 62, id 20043, len 1500) (ttl 254, id 20512, len 56)
22:00:53.865203 192.168.35.6.hosts2-ns > 192.168.222.141.1865: . [tcp sum ok] 311:1771(1460) ack 529 win 65535 (DF) (ttl 64, id 21013, len 1500)
22:00:53.865609 10.250.35.129 > 192.168.35.6: icmp: 192.168.222.141 unreachable - need to frag (mtu 1438) for 192.168.35.6.hosts2-ns > 192.168.222.141.1865: [|tcp] (DF) (ttl 62, id 21013, len 1500) (ttl 254, id 20519, len 56)

the same sequence repeats until the connection is closed
##############################################

The content of the packets may also be relevant:

22:00:50.945513 192.168.35.6.hosts2-ns > 192.168.222.141.1865: . [tcp sum ok] 311:1771(1460) ack 529 win 65535 (DF) (ttl 64, id 19535, len 1500)
0x0000   4500 05dc 4c4f 4000 4006 65e8 c0a8 2306        E...LO@.@.e...#.
0x0010   c0a8 de8d 0051 0749 1820 47ac d868 88be        .....Q.I..G..h..
0x0020   5010 ffff 2b99 0000 3c68 746d 6c3e 0d0a        P...+...<html>..
... (rest of packet is unimportant)
0x05d0   203c 6120 6872 6566 3d27 6874                  .<a.href='ht

Complete ICMP NEEDFRAG reply:

22:00:50.945983 10.250.35.129 > 192.168.35.6: icmp: 192.168.222.141 unreachable - need to frag (mtu 1438) for 192.168.35.6.hosts2-ns > 192.168.222.141.1865: [|tcp] (DF) (ttl 62, id 19535, len 1500) (ttl 254, id 20508, len 56)
0x0000   4500 0038 501c 0000 fe01 5a7f 0afa 2381        E..8P.....Z...#.
0x0010   c0a8 2306 0304 8ff7 0000 059e 4500 05dc        ..#.........E...
0x0020   4c4f 4000 3e06 67e8 c0a8 2306 c0a8 de8d        LO@.>.g...#.....
0x0030   0051 0749 1820 47ac                            .Q.I..G.

Proposed nextmtu is 0x59e=1438

I checked the data as well. The header of the original packet:

4500 05dc 4c4f 4000 4006 65e8 c0a8 2306 c0a8 de8d 0051 0749 1820 47ac

The header of the packet as returned within the ICMP:

4500 05dc 4c4f 4000 3e06 67e8 c0a8 2306 c0a8 de8d 0051 0749 1820 47ac

It seems to be correct (the TTL and checksum are different, of course).

Despite this, the source continues to send segments of length 1460 until the connection is closed.

Dan
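The nextmtu value read off the hex dump above can be checked mechanically. A minimal sketch, assuming the RFC 1191 layout in which the next-hop MTU occupies bytes 6 and 7 of the ICMP header; the function name icmp_needfrag_nextmtu() is hypothetical, and the byte array is simply the first 28 bytes of the ICMP reply shown in the dump.

```c
#include <stddef.h>
#include <stdint.h>

/* First 28 bytes of the NEEDFRAG reply from the dump: IP header + ICMP header. */
static const uint8_t needfrag_reply[] = {
	0x45, 0x00, 0x00, 0x38, 0x50, 0x1c, 0x00, 0x00,
	0xfe, 0x01, 0x5a, 0x7f, 0x0a, 0xfa, 0x23, 0x81,
	0xc0, 0xa8, 0x23, 0x06, 0x03, 0x04, 0x8f, 0xf7,
	0x00, 0x00, 0x05, 0x9e
};

/*
 * Hypothetical helper: return the next-hop MTU carried in an ICMP
 * "unreachable - need to frag" message (type 3, code 4), or -1 if the
 * buffer is too short or is not such a message.
 */
int
icmp_needfrag_nextmtu(const uint8_t *pkt, size_t len)
{
	size_t ihl;

	if (len < 1)
		return (-1);
	ihl = (size_t)(pkt[0] & 0x0f) * 4;	/* IP header length in bytes */
	if (ihl < 20 || len < ihl + 8)
		return (-1);
	if (pkt[9] != 1)			/* IP protocol must be ICMP */
		return (-1);
	if (pkt[ihl] != 3 || pkt[ihl + 1] != 4)	/* type 3, code 4 */
		return (-1);
	/* The MTU is stored in network byte order (big-endian). */
	return ((pkt[ihl + 6] << 8) | pkt[ihl + 7]);
}
```

Applied to the bytes from the dump, the helper reads 0x059e, which matches the "mtu 1438" that tcpdump prints.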
Andre Oppermann wrote:
> No, it's derived from *vip which is the IP header of the packet
> that caused the ICMP packet. The code is correct.

Ok, I see it now.

> Please provide a new tcpdump of the transaction in question.

Here is the new tcpdump

vita

17:00:48.521890 IP (tos 0x0, ttl 64, id 6520, offset 0, flags [DF], length: 64) 10.0.4.232.64571 > 10.0.4.78.57486: S [tcp sum ok] 2067350785:2067350785(0) win 65535 <mss 1460,nop,nop,sackOK,nop,wscale 1,nop,nop,timestamp 318074 0>
17:00:48.565967 IP (tos 0x0, ttl 62, id 23135, offset 0, flags [DF], length: 60) 10.0.4.78.57486 > 10.0.4.232.64571: S [tcp sum ok] 2229625222:2229625222(0) ack 2067350786 win 57344 <mss 1460,nop,wscale 0,nop,nop,timestamp 2367882411 318074>
17:00:48.565990 IP (tos 0x0, ttl 64, id 6522, offset 0, flags [DF], length: 52) 10.0.4.232.64571 > 10.0.4.78.57486: . [tcp sum ok] 1:1(0) ack 1 win 33304 <nop,nop,timestamp 318079 2367882411>
17:00:48.615113 IP (tos 0x8, ttl 64, id 6524, offset 0, flags [DF], length: 1500) 10.0.4.232.64571 > 10.0.4.78.57486: . [tcp sum ok] 1:1449(1448) ack 1 win 33304 <nop,nop,timestamp 318084 2367882411>
	0x0000:  4508 05dc 197c 4000 4006 fe62 0a00 04e8
	0x0010:  0a00 044e fc3b e08e 7b39 4502 84e5 6187
	0x0020:  8010 8218 ebdf 0000 0101 080a 0004 da84
	0x0030:  8d23 04ab 7f45 4c46 0101 0109 0000 0000
	<--- skipped --->
17:00:48.615116 IP (tos 0x8, ttl 64, id 6525, offset 0, flags [DF], length: 1500) 10.0.4.232.64571 > 10.0.4.78.57486: . [tcp sum ok] 1449:2897(1448) ack 1 win 33304 <nop,nop,timestamp 318084 2367882411>
	0x0000:  4508 05dc 197d 4000 4006 fe61 0a00 04e8
	0x0010:  0a00 044e fc3b e08e 7b39 4aaa 84e5 6187
	0x0020:  8010 8218 8197 0000 0101 080a 0004 da84
	0x0030:  8d23 04ab b218 0000 7c12 0000 ff01 0000
	<--- skipped --->
17:00:48.615118 IP (tos 0x8, ttl 64, id 6526, offset 0, flags [DF], length: 1500) 10.0.4.232.64571 > 10.0.4.78.57486: . [tcp sum ok] 2897:4345(1448) ack 1 win 33304 <nop,nop,timestamp 318084 2367882411>
	0x0000:  4508 05dc 197e 4000 4006 fe60 0a00 04e8
	0x0010:  0a00 044e fc3b e08e 7b39 5052 84e5 6187
	0x0020:  8010 8218 5b9d 0000 0101 080a 0004 da84
	0x0030:  8d23 04ab fe18 0000 1114 0000 2f19 0000
	<--- skipped --->
17:00:48.615669 IP (tos 0x0, ttl 64, id 12369, offset 0, flags [DF], length: 56) 10.0.4.225 > 10.0.4.232: icmp 36: 10.0.4.78 unreachable - need to frag for IP (tos 0x8, ttl 64, id 6524, offset 0, flags [DF], length: 1500) 10.0.4.232.64571 > 10.0.4.78.57486: [|tcp]
	0x0000:  4500 0038 3051 4000 4001 ecab 0a00 04e1
	0x0010:  0a00 04e8 0304 5ff5 0000 0000 4508 05dc
	0x0020:  197c 4000 4006 fe62 0a00 04e8 0a00 044e
	0x0030:  fc3b e08e 7b39 4502
17:00:48.615749 IP (tos 0x0, ttl 64, id 12370, offset 0, flags [DF], length: 56) 10.0.4.225 > 10.0.4.232: icmp 36: 10.0.4.78 unreachable - need to frag for IP (tos 0x8, ttl 64, id 6525, offset 0, flags [DF], length: 1500) 10.0.4.232.64571 > 10.0.4.78.57486: [|tcp]
	0x0000:  4500 0038 3052 4000 4001 ecaa 0a00 04e1
	0x0010:  0a00 04e8 0304 5a4d 0000 0000 4508 05dc
	0x0020:  197d 4000 4006 fe61 0a00 04e8 0a00 044e
	0x0030:  fc3b e08e 7b39 4aaa
17:00:48.615913 IP (tos 0x0, ttl 64, id 12371, offset 0, flags [DF], length: 56) 10.0.4.225 > 10.0.4.232: icmp 36: 10.0.4.78 unreachable - need to frag for IP (tos 0x8, ttl 64, id 6526, offset 0, flags [DF], length: 1500) 10.0.4.232.64571 > 10.0.4.78.57486: [|tcp]
	0x0000:  4500 0038 3053 4000 4001 eca9 0a00 04e1
	0x0010:  0a00 04e8 0304 54a5 0000 0000 4508 05dc
	0x0020:  197e 4000 4006 fe60 0a00 04e8 0a00 044e
	0x0030:  fc3b e08e 7b39 5052
17:00:48.714074 IP (tos 0x10, ttl 64, id 6529, offset 0, flags [DF], length: 52) 10.0.4.232.61358 > 10.0.4.78.21: . [tcp sum ok] 69:69(0) ack 365 win 33304 <nop,nop,timestamp 318094 2367882415>
	0x0000:  4510 0034 1981 4000 4006 03fe 0a00 04e8
	0x0010:  0a00 044e efae 0015 a15f 429f 6803 c07c
	0x0020:  8010 8218 6ec7 0000 0101 080a 0004 da8e
	0x0030:  8d23 04af
17:00:48.994083 IP (tos 0x8, ttl 64, id 6530, offset 0, flags [DF], length: 1500) 10.0.4.232.64571 > 10.0.4.78.57486: . [tcp sum ok] 1:1449(1448) ack 1 win 33304 <nop,nop,timestamp 318122 2367882411>
	0x0000:  4508 05dc 1982 4000 4006 fe5c 0a00 04e8
	0x0010:  0a00 044e fc3b e08e 7b39 4502 84e5 6187
	0x0020:  8010 8218 ebb9 0000 0101 080a 0004 daaa
	0x0030:  8d23 04ab 7f45 4c46 0101 0109 0000 0000
	<--- skipped --->
17:00:48.994603 IP (tos 0x0, ttl 64, id 12380, offset 0, flags [DF], length: 56) 10.0.4.225 > 10.0.4.232: icmp 36: 10.0.4.78 unreachable - need to frag for IP (tos 0x8, ttl 64, id 6530, offset 0, flags [DF], length: 1500) 10.0.4.232.64571 > 10.0.4.78.57486: [|tcp]
	0x0000:  4500 0038 305c 4000 4001 eca0 0a00 04e1
	0x0010:  0a00 04e8 0304 5ff5 0000 0000 4508 05dc
	0x0020:  1982 4000 4006 fe5c 0a00 04e8 0a00 044e
	0x0030:  fc3b e08e 7b39 4502
17:00:49.554105 IP (tos 0x8, ttl 64, id 6531, offset 0, flags [DF], length: 1500) 10.0.4.232.64571 > 10.0.4.78.57486: . [tcp sum ok] 1:1449(1448) ack 1 win 33304 <nop,nop,timestamp 318178 2367882411>
	0x0000:  4508 05dc 1983 4000 4006 fe5b 0a00 04e8
	0x0010:  0a00 044e fc3b e08e 7b39 4502 84e5 6187
	0x0020:  8010 8218 eb81 0000 0101 080a 0004 dae2
	0x0030:  8d23 04ab 7f45 4c46 0101 0109 0000 0000
	<--- skipped --->
17:00:49.554649 IP (tos 0x0, ttl 64, id 12387, offset 0, flags [DF], length: 56) 10.0.4.225 > 10.0.4.232: icmp 36: 10.0.4.78 unreachable - need to frag for IP (tos 0x8, ttl 64, id 6531, offset 0, flags [DF], length: 1500) 10.0.4.232.64571 > 10.0.4.78.57486: [|tcp]
	0x0000:  4500 0038 3063 4000 4001 ec99 0a00 04e1
	0x0010:  0a00 04e8 0304 5ff5 0000 0000 4508 05dc
	0x0020:  1983 4000 4006 fe5b 0a00 04e8 0a00 044e
	0x0030:  fc3b e08e 7b39 4502
17:00:50.474110 IP (tos 0x8, ttl 64, id 6534, offset 0, flags [DF], length: 1500) 10.0.4.232.64571 > 10.0.4.78.57486: . [tcp sum ok] 1:1449(1448) ack 1 win 33304 <nop,nop,timestamp 318270 2367882411>
	0x0000:  4508 05dc 1986 4000 4006 fe58 0a00 04e8
	0x0010:  0a00 044e fc3b e08e 7b39 4502 84e5 6187
	0x0020:  8010 8218 eb25 0000 0101 080a 0004 db3e
	0x0030:  8d23 04ab 7f45 4c46 0101 0109 0000 0000
	<--- skipped --->
17:00:50.474623 IP (tos 0x0, ttl 64, id 12400, offset 0, flags [DF], length: 56) 10.0.4.225 > 10.0.4.232: icmp 36: 10.0.4.78 unreachable - need to frag for IP (tos 0x8, ttl 64, id 6534, offset 0, flags [DF], length: 1500) 10.0.4.232.64571 > 10.0.4.78.57486: [|tcp]
	0x0000:  4500 0038 3070 4000 4001 ec8c 0a00 04e1
	0x0010:  0a00 04e8 0304 5ff5 0000 0000 4508 05dc
	0x0020:  1986 4000 4006 fe58 0a00 04e8 0a00 044e
	0x0030:  fc3b e08e 7b39 4502
In tcp_ctlinput, the following code must not call ntohs(ip->ip_len), because ip->ip_len has already been swapped in icmp_input. Tested and works for me.

vita

	mtu = ntohs(icp->icmp_nextmtu);
	/*
	 * If no alternative MTU was
	 * proposed, try the next smaller
	 * one.
	 */
	if (!mtu)
		mtu = ip_next_mtu(ntohs(ip->ip_len), 1);
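The effect of the extra ntohs() can be seen with plain arithmetic. A minimal sketch, assuming a little-endian host, where ntohs() swaps the two bytes of its argument (on a big-endian host it is a no-op and the problem would not show). swap16() is a hypothetical stand-in for that swap so the example gives the same result on any machine.

```c
#include <stdint.h>

/* Hypothetical stand-in for what ntohs() does on a little-endian host. */
uint16_t
swap16(uint16_t v)
{
	return ((uint16_t)((v >> 8) | (v << 8)));
}

/*
 * The length is assumed to be in host byte order already, as described
 * for ip->ip_len after icmp_input. Swapping it again corrupts the value.
 */
uint16_t
len_after_extra_swap(uint16_t host_order_len)
{
	return (swap16(host_order_len));
}
```

For the 1500-byte packets in the dump (0x05dc), the extra swap produces 0xdc05, that is 56325, so the search for the "next smaller" MTU starts from a value far above the packet size that was actually rejected. With an RFC 1191 style plateau table the result would be 32000, which cannot shrink a 1500-byte segment.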
State Changed
From-To: open->patched

Fixed in netinet/tcp_subr.c rev. 1.233. MFC after 3 days.
State Changed
From-To: patched->closed

MFC to all affected branches is done.