Bug 194238 - [tcp] Ping attempted with MTU 9000 transmits fragmented packets of size 1500
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 10.0-RELEASE
Hardware: amd64 Any
Importance: --- Affects Many People
Assignee: freebsd-net (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-10-08 11:59 UTC by Praveen
Modified: 2016-02-25 04:39 UTC (History)

Description Praveen 2014-10-08 11:59:17 UTC
Setup Details:

Machine1:

OS:  FreeBSD 10.0

We are using the Emulex Skyhawk adapter below.

oce3@pci0:4:0:3:        class=0x020000 card=0xe8fe10df chip=0x072010df rev=0x10 hdr=0x00
    vendor     = 'Emulex Corporation'
    device     = 'OneConnect NIC (Skyhawk)'
    class      = network
    subclass   = ethernet
    bar   [10] = type Prefetchable Memory, range 64, base 0xfa700000, size 16384, enabled
    bar   [18] = type Prefetchable Memory, range 64, base 0xfa620000, size 131072, enabled
    bar   [20] = type Prefetchable Memory, range 64, base 0xfa600000, size 131072, enabled


Machine2:
OS:  FreeBSD 10.0

We are using the Emulex Skyhawk adapter below.
oce1@pci0:5:0:1:        class=0x020000 card=0xe80010df chip=0x072010df rev=0x10 hdr=0x00
    vendor     = 'Emulex Corporation'
    device     = 'OneConnect NIC (Skyhawk)'
    class      = network
    subclass   = ethernet
    bar   [10] = type Prefetchable Memory, range 64, base 0xfa780000, size 16384, enabled
    bar   [18] = type Prefetchable Memory, range 64, base 0xfa720000, size 131072, enabled
    bar   [20] = type Prefetchable Memory, range 64, base 0xfa700000, size 131072, enabled


Both adapters are connected directly, without a switch.

Both interfaces are configured with mtu=9000.

When we send jumbo frames from Machine1 to Machine2, the packets are fragmented into 1500-byte pieces.

To reproduce, run "ping -s 9000 <peer ipaddr>" on Machine1.

tcpdump on Machine1 shows packets of length 1500 bytes.
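
The packet sizes seen in tcpdump can be checked with a little arithmetic. The following is a minimal sketch, assuming plain IPv4 with a 20-byte header (no options) and the 8-byte ICMP echo header; the function name `fragment_sizes` is made up for illustration and is not part of any FreeBSD tool.

```python
IP_HEADER = 20    # IPv4 header without options, in bytes (assumption)
ICMP_HEADER = 8   # ICMP echo header, in bytes


def fragment_sizes(icmp_payload, mtu):
    """Return the IPv4 packet sizes needed to carry one echo request."""
    ip_payload = icmp_payload + ICMP_HEADER
    # A datagram that fits in the MTU is sent whole.
    if ip_payload + IP_HEADER <= mtu:
        return [ip_payload + IP_HEADER]
    # Otherwise every fragment except the last must carry a multiple
    # of 8 payload bytes, because fragment offsets count 8-byte units.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    sizes = []
    while ip_payload > 0:
        chunk = min(per_fragment, ip_payload)
        sizes.append(chunk + IP_HEADER)
        ip_payload -= chunk
    return sizes


print(fragment_sizes(9000, 1500))  # [1500, 1500, 1500, 1500, 1500, 1500, 148]
print(fragment_sizes(9000, 9000))  # [8996, 52]
print(fragment_sizes(8972, 9000))  # [9000]
```

With an effective MTU of 1500 the echo request goes out as six 1500-byte fragments plus a short tail, which matches the tcpdump observation. Note that "ping -s 9000" would still need two fragments at MTU 9000, because the headers add 28 bytes; under the same assumptions, 8972 is the largest payload that fits in a single 9000-byte packet.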

We observe the same behavior with other adapters, such as this Intel one:

igb0@pci0:2:0:0:        class=0x020000 card=0x152115d9 chip=0x15218086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'I350 Gigabit Network Connection'
    class      = network
    subclass   = ethernet
    bar   [10] = type Memory, range 32, base 0xfbc20000, size 131072, enabled
    bar   [18] = type I/O Port, range 32, base 0xe020, size 32, enabled
    bar   [1c] = type Memory, range 32, base 0xfbc44000, size 16384, enabled
Comment 1 Alexander V. Chernikov freebsd_committer freebsd_triage 2014-10-20 12:44:27 UTC
Can you show 'route -n get <address>' on both machines?
This is the typical problem that occurs when you first configure addresses (so on-interface routes with the default MTU get installed) and then configure the MTU.
In that case the interface MTU will be 9k, but the interface routes' MTU will still be 1500.

You can either fix this manually by issuing route modify <prefix> -mtu 9000, or
configure the MTU at startup along with the interface addresses.
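
The ordering problem described above can be pictured with a toy model. This is only a sketch of the behavior as described in this comment, not kernel code: the class `Interface` and its methods are hypothetical, and taking the minimum of the two MTUs is a simplifying assumption about how the sender picks its packet size.

```python
class Interface:
    """Toy model of an interface and its single on-interface route."""

    def __init__(self, mtu=1500):
        self.mtu = mtu          # interface MTU
        self.route_mtu = None   # MTU recorded on the on-interface route

    def add_address(self):
        # The on-interface route is installed with whatever the
        # interface MTU is at this moment.
        self.route_mtu = self.mtu

    def set_mtu(self, mtu):
        # Behavior at the time of this report: only the interface
        # changes, the existing route keeps its old MTU.
        self.mtu = mtu

    def route_set_mtu(self, mtu):
        # Stands in for manually setting -mtu on the route.
        self.route_mtu = mtu

    def effective_mtu(self):
        # Assumption: outgoing packets are limited by the smaller value.
        return min(self.mtu, self.route_mtu)


# Address first, MTU second: the route stays at 1500.
late = Interface()
late.add_address()
late.set_mtu(9000)
print(late.effective_mtu())   # 1500

# Manual fix on the route.
late.route_set_mtu(9000)
print(late.effective_mtu())   # 9000

# MTU first, address second: the route is created with 9000.
early = Interface()
early.set_mtu(9000)
early.add_address()
print(early.effective_mtu())  # 9000
```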
Comment 2 Praveen 2014-11-05 10:43:51 UTC
Thanks. Your suggestion has worked.

On Linux, we don't need to reconfigure the route when we change the MTU.
Is this a bug in FreeBSD?
Comment 3 Alexander V. Chernikov freebsd_committer freebsd_triage 2014-11-05 10:45:12 UTC
I'd rather say that it is a historical "feature".
I'm going to change this behavior to be more user-friendly soon.
Comment 4 commit-hook freebsd_committer freebsd_triage 2014-11-17 01:05:42 UTC
A commit references this bug:

Author: melifaro
Date: Mon Nov 17 01:05:32 UTC 2014
New revision: 274611
URL: https://svnweb.freebsd.org/changeset/base/274611

Log:
  Finish r274175: do control plane MTU tracking.

  Update route MTU in case of ifnet MTU change.
  Add new RTF_FIXEDMTU to track explicitly specified MTU.

  Old behavior:
  ifconfig em0 mtu 1500->9000 -> all routes traversing em0 do not change MTU.
  User has to manually update all routes.
  ifconfig em0 mtu 9000->1500 -> all routes traversing em0 do not change MTU.
  However, if ip[6]_output finds route with rt_mtu > interface mtu, rt_mtu
  gets updated.

  New behavior:
  ifconfig em0 mtu 1500->9000 -> all interface routes in all fibs gets updated
  with new MTU unless RTF_FIXEDMTU flag set on them.
  ifconfig em0 mtu 9000->1500 -> all routes in all fibs gets updated with new
  MTU unless RTF_FIXEDMTU flag set on them AND rt_mtu is less than ifp mtu.

  route add ... -mtu XXX automatically sets RTF_FIXEDMTU flag.
  route change .. -mtu 0 automatically removes RTF_FIXEDMTU flag.

  PR:		194238
  MFC after:	1 month
  CR:		D1125

Changes:
  head/sbin/route/route.c
  head/sys/net/if.c
  head/sys/net/route.c
  head/sys/net/route.h
  head/sys/netinet/ip_output.c
  head/sys/netinet6/ip6_output.c
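
The new behavior in the commit log above can be sketched as a small model. This is one reading of the log text, not the code from r274611: the names `Route`, `set_route_mtu` and `on_ifnet_mtu_change` are hypothetical, the boolean `fixed_mtu` stands in for RTF_FIXEDMTU, and the sketch does not distinguish interface routes from other routes or handle multiple fibs.

```python
from dataclasses import dataclass


@dataclass
class Route:
    prefix: str
    mtu: int
    fixed_mtu: bool = False  # stands in for the RTF_FIXEDMTU flag


def set_route_mtu(route, mtu):
    """Model of 'route add/change ... -mtu XXX'."""
    if mtu == 0:
        # '-mtu 0' removes the flag; the log does not say what the
        # stored MTU becomes, so this sketch leaves it unchanged.
        route.fixed_mtu = False
    else:
        route.mtu = mtu
        route.fixed_mtu = True


def on_ifnet_mtu_change(routes, old_mtu, new_mtu):
    """Propagate an interface MTU change to the routes using it."""
    for route in routes:
        if new_mtu > old_mtu:
            # Increase: routes follow the interface unless pinned.
            if not route.fixed_mtu:
                route.mtu = new_mtu
        else:
            # Decrease: a pinned route is left alone only if its MTU
            # is already below the new interface MTU.
            if route.fixed_mtu and route.mtu < new_mtu:
                continue
            route.mtu = new_mtu


routes = [Route("10.0.0.0/24", 1500), Route("10.0.1.0/24", 1500)]
set_route_mtu(routes[1], 1400)             # pinned below the interface MTU
on_ifnet_mtu_change(routes, 1500, 9000)
print([r.mtu for r in routes])             # [9000, 1400]

routes.append(Route("10.0.2.0/24", 9000))
set_route_mtu(routes[2], 9000)             # pinned at the jumbo MTU
on_ifnet_mtu_change(routes, 9000, 1500)
print([r.mtu for r in routes])             # [1500, 1400, 1500]
```

In this model the unpinned route tracks the interface in both directions, the route pinned at 1400 never moves, and the route pinned at 9000 is still lowered when the interface drops to 1500, since a route MTU above the interface MTU would not be usable.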
Comment 5 Glen Barber freebsd_committer freebsd_triage 2015-07-08 18:32:13 UTC
To originators/assignees of this PR:

A commit to the tree references this PR, however the PR is still in a non-closed state.

Please review this PR and close as appropriate, or if closing the PR requires a merge to stable/10, please let re@ know as soon as possible.

Thank you.

Glen