Bug 207208 - ping has a problem with fragmented replies
Summary: ping has a problem with fragmented replies
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: 10.2-RELEASE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-bugs mailing list
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-02-15 11:26 UTC by Jasper Siepkes
Modified: 2016-03-03 08:58 UTC

Description Jasper Siepkes 2016-02-15 11:26:39 UTC
When using ping with a packet size larger than the interface's MTU (1500 in my example), which leads to a fragmented reply, ping always reports that it didn't receive a response. However, with tcpdump I can see the replies being received.

Single ping request with a total packet size of 1000 bytes (within the MTU of the interface):
---8<------------------
$ ping -c 1 -s 972 80.113.23.178
PING 80.113.23.178 (80.113.23.178): 972 data bytes
980 bytes from 80.113.23.178: icmp_seq=0 ttl=245 time=22.851 ms

--- 80.113.23.178 ping statistics ---
1 packets transmitted, 1 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 22.851/22.851/22.851/0.000 ms
---8<------------------

Resulting tcpdump output:
---8<------------------
# tcpdump -nvvvs 9000 -i igb0 icmp
12:13:48.708408 IP (tos 0x0, ttl 64, id 63061, offset 0, flags [none], proto ICMP (1), length 1000)
    192.168.93.148 > 80.113.23.178: ICMP echo request, id 50210, seq 0, length 980
12:13:48.731234 IP (tos 0x0, ttl 245, id 44677, offset 0, flags [none], proto ICMP (1), length 1000)
    80.113.23.178 > 192.168.93.148: ICMP echo reply, id 50210, seq 0, length 980
---8<------------------

Single ping request with a requested payload size of 2500 bytes (which results in a packet larger than the MTU of the interface):

---8<------------------
$ ping -c 1 -s 2500 80.113.23.178
PING 80.113.23.178 (80.113.23.178): 2500 data bytes

--- 80.113.23.178 ping statistics ---
1 packets transmitted, 0 packets received, 100.0% packet loss
---8<------------------

Resulting tcpdump output:
---8<------------------
# tcpdump -nvvvs 9000 -i igb0 icmp
12:14:45.394397 IP (tos 0x0, ttl 64, id 63325, offset 0, flags [+], proto ICMP (1), length 1500)
    192.168.93.148 > 80.113.23.178: ICMP echo request, id 54562, seq 0, length 1480
12:14:45.394407 IP (tos 0x0, ttl 64, id 63325, offset 1480, flags [none], proto ICMP (1), length 1048)
    192.168.93.148 > 80.113.23.178: ip-proto-1
12:14:45.418429 IP (tos 0x0, ttl 245, id 23347, offset 0, flags [+], proto ICMP (1), length 1500)
    80.113.23.178 > 192.168.93.148: ICMP echo reply, id 54562, seq 0, length 1480
12:14:45.418436 IP (tos 0x0, ttl 245, id 23347, offset 1480, flags [none], proto ICMP (1), length 1048)
    80.113.23.178 > 192.168.93.148: ip-proto-1
---8<------------------
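
The lengths and offsets in the two tcpdump traces above follow from IPv4 fragmentation arithmetic. The following is a minimal sketch, not ping's or the kernel's actual code; it assumes a 20-byte IPv4 header without options and an 8-byte ICMP header, and the function name `fragment_sizes` is made up for illustration.

```python
# Sketch: how an ICMP echo of a given payload size is split into IPv4 fragments.
# Assumptions: 20-byte IPv4 header (no options), 8-byte ICMP header.
IP_HEADER = 20
ICMP_HEADER = 8

def fragment_sizes(payload, mtu=1500):
    """Return (offset, total_length, more_fragments) for each IPv4 fragment."""
    remaining = payload + ICMP_HEADER      # bytes carried after the IP header
    # Every fragment except the last must carry a multiple of 8 data bytes.
    max_data = (mtu - IP_HEADER) // 8 * 8
    fragments = []
    offset = 0
    while remaining > 0:
        data = min(remaining, max_data)
        remaining -= data
        fragments.append((offset, data + IP_HEADER, remaining > 0))
        offset += data
    return fragments

for offset, length, more in fragment_sizes(2500):
    flags = "+" if more else "none"
    print(f"offset {offset}, flags [{flags}], length {length}")
```

With `-s 972` this yields a single 1000-byte packet, and with `-s 2500` it yields the two fragments of length 1500 and 1048 seen in the trace.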

Beware that some hosts will simply truncate the echo reply even if you send a large packet. 'www.nu.nl' does this, for example (probably to avoid being DoS'ed). The echo reply is then not fragmented and ping works fine. ping only seems to have problems with fragmented replies. I've also tested this on Linux, Mac OS X and OpenBSD, and they don't seem to have this problem.

I've observed this behavior on 10.2-RELEASE-p12.
Comment 1 Maxim Konovalov freebsd_committer 2016-02-15 17:38:19 UTC
It works for me on 10.3-PRE:

# ping -c 1 -s 2500 80.113.23.178
PING 80.113.23.178 (80.113.23.178): 2500 data bytes
2508 bytes from 80.113.23.178: icmp_seq=0 ttl=245 time=59.502 ms

--- 80.113.23.178 ping statistics ---
1 packets transmitted, 1 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 59.502/59.502/59.502/0.000 ms

It looks like you are behind NAT.  Are you sure it behaves well with icmp fragments?
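
One way a middlebox can break large pings: the receiving host only hands a reply to ping once every fragment of the datagram has arrived, and only the first fragment carries the ICMP header (tcpdump shows the second one as bare `ip-proto-1`). The sketch below is an illustration of that completeness rule, not the FreeBSD reassembly code; `is_complete` is a hypothetical helper and the tuples reuse the offsets from the trace in the description.

```python
def is_complete(fragments):
    """Check whether fragments of one IP id cover the datagram without holes.

    fragments: iterable of (offset, data_length, more_fragments) tuples,
    where data_length excludes the IP header.
    """
    expected = 0          # next byte offset we still need
    saw_last = False      # have we seen the fragment with MF cleared?
    for offset, length, more in sorted(fragments):
        if offset > expected:
            return False  # hole before this fragment
        expected = max(expected, offset + length)
        if not more:
            saw_last = True
    return saw_last

# Both fragments of the 2500-byte echo reply arrive: reassembly succeeds.
print(is_complete([(0, 1480, True), (1480, 1028, False)]))
# The second fragment is lost on the way (for example at a NAT device):
print(is_complete([(0, 1480, True)]))
```

If either fragment never reaches the host, the datagram is eventually discarded and ping reports packet loss, even though part of the reply was on the wire.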
Comment 2 Maxim Konovalov freebsd_committer 2016-02-15 17:42:27 UTC
It also makes sense to check the following sysctls for any non-standard settings:

net.inet.ip.maxfragpackets
net.inet.ip.maxfragsperpacket

and network stats for errors:

netstat -sp ip
netstat -sp icmp
Comment 3 Jasper Siepkes 2016-02-16 09:36:57 UTC
Thanks for the prompt response Maxim.

I did some checks:

# sysctl net.inet.ip.maxfragsperpacket net.inet.ip.maxfragpackets
net.inet.ip.maxfragsperpacket: 16
net.inet.ip.maxfragpackets: 8192

Those are the defaults, I believe. I also double-checked for any modifications to ICMP- and IP-related settings in loader.conf or sysctl.conf.

----8<-----------------------
# netstat -sp ip
ip:
	5136257 total packets received
	0 bad header checksums
	0 with size smaller than minimum
	0 with data size < data length
	0 with ip length > max ip packet size
	0 with header length < data size
	0 with data length < header length
	0 with bad options
	0 with incorrect version number
	0 fragments received
	0 fragments dropped (dup or out of space)
	0 fragments dropped after timeout
	0 packets reassembled ok
	254049 packets for this host
	12 packets for unknown/unsupported protocol
	0 packets forwarded (0 packets fast forwarded)
	0 packets not forwardable
	0 packets received for unknown multicast group
	0 redirects sent
	702407 packets sent from this host
	0 packets sent with fabricated ip header
	0 output packets dropped due to no bufs, etc.
	0 output packets discarded due to no route
	31 output datagrams fragmented
	62 fragments created
	22 datagrams that can't be fragmented
	0 tunneling packets that can't find gif
	0 datagrams with bad address in header
# netstat -sp icmp
icmp:
	0 calls to icmp_error
	0 errors not generated in response to an icmp message
	0 messages with bad code fields
	0 messages less than the minimum length
	0 messages with bad checksum
	0 messages with bad length
	0 multicast echo requests ignored
	0 multicast timestamp requests ignored
	Input histogram:
		echo reply: 1
		destination unreachable: 7282
		time exceeded: 1
	0 message responses generated
	0 invalid return addresses
	0 no return routes
	ICMP address mask responses are disabled
----8<-----------------------

I ran the tests again, so the single 'echo reply' received is the normal-size ping and the 'time exceeded' is the one with the larger payload.

The host I used is behind NAT (PAT), so that could indeed be the problem. However, I just ran the test on another host that isn't behind NAT (pinging another host in its own network segment), and it also had the problem.

I will install a vanilla VM today and do some tests to see if this really is an issue or I messed up somewhere else in the config.
Comment 4 Maxim Konovalov freebsd_committer 2016-02-16 10:23:13 UTC
Hello,

> # netstat -sp ip
> ip:
>         0 fragments received
[...]
>         22 datagrams that can't be fragmented
>
[...]

The above looks suspicious.  Here is what it should be:

# netstat -sz >/dev/null
# netstat -sp ip | grep frag
        0 fragments received
        0 fragments dropped (dup or out of space)
        0 fragments dropped after timeout
        0 output datagrams fragmented
        0 fragments created
        0 datagrams that can't be fragmented
# ping -qc 1 -s 2500 80.113.23.178
PING 80.113.23.178 (80.113.23.178): 2500 data bytes

--- 80.113.23.178 ping statistics ---
1 packets transmitted, 1 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 59.983/59.983/59.983/0.000 ms
# netstat -sp ip | grep frag
        2 fragments received
        0 fragments dropped (dup or out of space)
        0 fragments dropped after timeout
        1 output datagram fragmented
        2 fragments created
        0 datagrams that can't be fragmented

To test IP fragmentation without NAT you can simply run

ping -s 32000 -c1 127.0.0.1

and check stats above.

I still think that your NAT is the culprit.

-- 
Maxim Konovalov
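
The comparison made in this comment can be automated. Because `netstat -sz` zeroes the counters first, the values printed after the test ping reflect only that ping. The following is a rough sketch that extracts counters from `netstat -sp ip` text; the helper name `counter` is hypothetical, and the sample lines are copied from the outputs quoted in this report.

```python
import re

def counter(netstat_text, pattern):
    """Return the integer in front of the first line matching `pattern`, or None."""
    match = re.search(r"^\s*(\d+)\s+" + pattern, netstat_text, re.MULTILINE)
    return int(match.group(1)) if match else None

# Counter lines copied from the two outputs in this report.
reporter = """
	0 fragments received
	62 fragments created
"""
working = """
        2 fragments received
        2 fragments created
"""

for name, text in (("reporter", reporter), ("working", working)):
    received = counter(text, r"fragments? received")
    created = counter(text, r"fragments? created")
    print(f"{name}: received={received} created={created}")
```

A host that creates fragments but never counts any as received, as in the reporter's output, is the pattern flagged as suspicious above.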
Comment 5 Maxim Konovalov freebsd_committer 2016-03-03 07:57:16 UTC
Hi Jasper,

Would you mind if I close the PR until we have more information about this issue?

-- maxim
Comment 6 Jasper Siepkes 2016-03-03 08:33:23 UTC
Hi Maxim,

Sorry for the late reply; I was out with a bad case of the flu for a while so I have a bit of a backlog.

I did some tests, but I haven't been able to reproduce any of my findings in a more controlled (VM) setup. On top of that, it turned out our ISP had a core router with a broken NIC, which added some noise.

So I think it must be a network (NAT) issue, as you suggested. We can close this PR; I'll keep an eye out for problems like these and can always reopen it if I suspect something other than NAT weirdness ;-).
Comment 7 Maxim Konovalov freebsd_committer 2016-03-03 08:58:03 UTC
Jasper,

Thanks for the response, hope you are ok now.

Feel free to re-open the ticket if you get more info in future.

Best regards,

Maxim