Bug 187094 - [nfs] [patch] Support DHCP option for jumbo frames on PXE/BOOTP interface
Summary: [nfs] [patch] Support DHCP option for jumbo frames on PXE/BOOTP interface
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 10.3-RELEASE
Hardware: Any Any
: Normal Affects Some People
Assignee: Ian Lepore
URL:
Keywords: patch
Depends on:
Blocks:
 
Reported: 2014-02-26 20:10 UTC by Robert Blayzor
Modified: 2017-12-17 07:14 UTC (History)
5 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Robert Blayzor 2014-02-26 20:10:00 UTC
Upon PXE booting FreeBSD kernel trying to set a higher MTU via DHCP.  

Using "option interface-mtu ...."  in ISC-DHCP, MTU never properly gets set to anything higher than 1500.

Attempting to change the MTU in rc startup scripts only appears to cosmetically change..


dev@rns0 [~] cat /etc/rc.conf | grep vmx
ifconfig_vmx0="mtu 9000"

dev@rns0 [~] ifconfig vmx0
vmx0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
	options=60039b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,TSO6,RXCSUM_IPV6,TXCSUM_IPV6>
	ether 00:50:56:a7:03:22
	inet6 fe80::250:56ff:fea7:322%vmx0 prefixlen 64 scopeid 0x1
	inet 10.0.0.20 netmask 0xffff0000 broadcast 10.0.255.255
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
	media: Ethernet autoselect
	status: active


dev@rns0 [~] sudo ping -D -c 1 -s 1600 10.0.0.200
PING 10.0.0.200 (10.0.0.200): 1600 data bytes
ping: sendto: Message too long

dev@rns0 [~] sudo ping -D -c 1 -s 1472 10.0.0.200
PING 10.0.0.200 (10.0.0.200): 1472 data bytes
1480 bytes from 10.0.0.200: icmp_seq=0 ttl=64 time=0.064 ms

--- 10.0.0.200 ping statistics ---
1 packets transmitted, 1 packets received, 0.0% packet loss

Fix: 

None.

Attempting to set MTU via rc.conf startup does not work.
How-To-Repeat: Using dhclient or PXE boot configure DHCP server to use MTU > 1500.  ie:

 option interface-mtu 9000;


System boots and gets IP address properly but MTU cannot be set.
Comment 1 Robert Blayzor 2014-02-27 01:47:52 UTC
This may not be a bug, but rather lack of functionality.

DHCP option tag 26 - Interface MTU Size


Target to fix:  sys/nfs/bootp_subr.c

Working on patch.
Comment 2 Robert Blayzor 2014-02-27 18:24:11 UTC
I would like to submit the following patch that corrects two major issues:

-  Look for and set MTU according to  DHCP option tag 26 for Interface MTU. 
   This allows booting interface to be used on a jumbo frame enabled network.

   Currently this is broken and cannot be overridden or set later.

-  Remove ancient proxy ARP setting.  Currently it is more of a problem that 
   a host is multi-homed and booting interface network may not have a router.
   (ie: default route is on another interface/network)

   The default today is to use DHCP router tag and if no router tag is
   supplied by DHCP, default route will be set to hosts self.  Not supplying
   a route is a completely valid option, especially since many hosts
   multi-homed.


Patch is against 10.0-RELEASE

Index: bootp_subr.c
===================================================================
--- bootp_subr.c	(revision 261846)
+++ bootp_subr.c	(working copy)
@@ -196,6 +196,8 @@
 #define TAG_HOSTNAME	 12  /* Client host name */
 #define TAG_ROOT	 17  /* Root path */

+#define TAG_INTF_MTU     26  /* Interface MTU Size (RFC2132) */
+
 /* DHCP specific tags */
 #define TAG_OVERLOAD	 52  /* Option Overload */
 #define TAG_MAXMSGSIZE   57  /* Maximum DHCP Message Size */
@@ -229,6 +231,8 @@
 #endif

 static char bootp_cookie[128];
+static unsigned int bootp_ifmtu = 0;
+
 static struct socket *bootp_so;
 SYSCTL_STRING(_kern, OID_AUTO, bootp_cookie, CTLFLAG_RD,
 	bootp_cookie, 0, "Cookie (T134) supplied by bootp server");
@@ -1030,7 +1034,22 @@
 		return (0);
 	}

-	printf("Adjusted interface %s\n", ifctx->ireq.ifr_name);
+	printf("Adjusted interface %s", ifctx->ireq.ifr_name);
+
+	/* Do BOOTP interface options */
+	if (bootp_ifmtu != 0) {
+	        printf(" (MTU=%d", bootp_ifmtu);
+	        if (bootp_ifmtu > 1514)
+	                printf("/JUMBO");
+	        printf(")");
+
+		ifr->ifr_mtu = bootp_ifmtu;
+		error = ifioctl(bootp_so, SIOCSIFMTU, (caddr_t) ifr, td);
+		if (error != 0)
+		        panic("%s: SIOCSIFMTU, error=%d", __func__, error);
+	}
+        printf("\n");
+
 	/*
 	 * Do enough of ifconfig(8) so that the chosen interface
 	 * can talk to the servers.  (just set the address)
@@ -1053,7 +1072,12 @@

 	/* Add new default route */

-	if (ifctx->gotgw != 0 || gctx->gotgw == 0) {
+        /*  Only set default route if we received one in the request.
+            Proxy ARP considered obsolete.  More valid to NOT set
+            a router in request as the host may be multi-homed and
+            gateway may not be on this interface.
+	*/
+	if (ifctx->gotgw != 0 || gctx->gotgw != 0) {
 		clear_sinaddr(&defdst);
 		clear_sinaddr(&defmask);
 		/* XXX MRT just table 0 */
@@ -1518,6 +1542,11 @@
 		p[i] = '\0';
 	}

+        p = bootpc_tag(&gctx->tag, &ifctx->reply, ifctx->replylen,
+		       TAG_INTF_MTU);
+        if (p != NULL) {
+	        bootp_ifmtu = (((unsigned char)p[0] << 8) + (unsigned char)p[1]);
+	}

 	printf("\n");
Comment 3 commit-hook freebsd_committer 2016-03-21 14:52:54 UTC
A commit references this bug:

Author: ian
Date: Mon Mar 21 14:51:52 UTC 2016
New revision: 297149
URL: https://svnweb.freebsd.org/changeset/base/297149

Log:
  If the dhcp server provides an interface-mtu option, parse the value and
  set that mtu on the interface.

  These changes are based on the patch submitted by Robert Blayzor in the
  PR, but I changed things around a bit, so the blame for any mistakes
  belongs to me.

  PR:		187094

Changes:
  head/sys/nfs/bootp_subr.c
Comment 4 Ian Lepore freebsd_committer 2016-03-21 15:33:14 UTC
I have committed the parts of this related to handling the interface-mtu option (and also some changes to libstand and loader(8) inspired by this PR).

I left out the part of the submitted patch related to changing the default route and the comments about proxy arp, because it's not clear to me that changing the gctx->gotgw != 0 logic to == 0 is correct, especially after researching the history of how the code in that area has evolved over time (but I'm willing to be convinced; I'm not a networking expert).
Comment 5 Robert Blayzor 2016-03-21 15:41:19 UTC
The problem with setting the gateway is on multi-homed systems where setting the default gateway to self if none is given by the DHCP server is not the desired result. The DHCP server is capable of sending a default gateway or not, both valid. What may NOT be valid is that the gateway is set self.

Case in point, the diskless boot/bootp network may be a private network with no gateway at all. The servers us a different NIC gateway on another interface.

In our scenario our diskless boot network is such where it's just DHCP, NIS and NFS private servers with no gateway. Each server has a NIC with public facing/non-private NIC which gets gateway.

The problem we run into now, is that when our servers boot, we do not send a gateway from DHCP. When this happens the servers will boot with self as a gateway. From that point on, it's not valid to "reset" the gateway via another network without some sort of hacks in the start-up script. 

Therefore my comment about "it's perfectly valid to send, or not send default gateway in DHCP". I'm not convinced where setting the gateway to self would be a common practice more so than not sending or setting a gateway at all.


BTW-- I also have a more recent patch for this that is cleaned up against 10.3 that I can submit for review.
Comment 6 Roman Bogorodskiy freebsd_committer 2016-03-21 15:46:15 UTC
Ian, while not directly related, I'm wondering if you could take a look at: https://reviews.freebsd.org/D5675 ?
Comment 7 Ian Lepore freebsd_committer 2016-03-21 15:56:08 UTC
(In reply to Robert Blayzor from comment #5)

But the logic change seems incorrect... 

The current logic says "if this interface got a gateway option, set it as default route; or if no interfaces got a gateway option, then set the default route to zero".

By changing the == to != the logic becomes "if this interface got a gateway or any other interface got a gateway, then set the default route to this interface's gateway" (which might be zero in the case where gctx->gotgw is non-zero because some other interface had a gateway).

The existing logic is strange and twisted but seems to basically result in "the default route is always set to either zero or the gateway option received for one of the interfaces", and if multiple interfaces have a gateway option, I have no idea what the default route would be left at.  The last one received maybe?
Comment 8 Robert Blayzor 2016-03-21 17:32:24 UTC
The logic today is that if the host receives no gateway from the DHCP server it installs a default route pointing to it's own IP address.

It would seem more likely that if no gateway is received from DHCP, then NO default route be added.


In my patch, I changed it to do exactly that. At least thats how we worked around the problem of the self default being added.


The only way around the problem above (without the patch) is to go in and manually delete the route and add the correct static default.

ANother work around (but extremely ugly) is to send a bogus default route back to the client. ie: A next hop that is not on it's network. Doing that the client seems to fail to install the default route, though this seems like an ugly workaround rather than a fix.
Comment 9 Robert Blayzor 2016-03-21 17:48:11 UTC
Perhaps what we need to be looking at changing is around line 1562:


   if (ifctx->gotgw == 0) {
      /* Use proxyarp */
      ifctx->gw.sin_addr.s_addr = ifctx->myaddr.sin_addr.s_addr;
   }


It would make more sense to remove that completely.

Unless an there is an argument for this. It seems unlikely. If someone wants to use "self", one could actually set that option back from DHCP. Seems far more useful to not install a route at all if one is not being sent.
Comment 10 Ian Lepore freebsd_committer 2016-03-21 22:28:58 UTC
aha, that's the clue I was missing.  I thought the gw field was getting zeroed and I missed the place where it was assigned to its own address.

I'm a bit scared to remove the proxy arp thing because there is plenty of recent advice and how-to info out there on configuring proxy arp on modern equipment; I don't want to eliminate a feature some people may be using.

What I don't understand now is what's the harm of letting it make the self-referential default route entries?  I set my dhcp server to not deliver a gateway option, added a printf to bootp_subr to verify that it was setting a default route to the self ip address, and I added defaultrouter=<ip> to my rc.conf, and the system still boots just fine, installs the static default route when running the rc scripts, and in general seems to work normally.  Basically I can't tell any difference between letting the code currently checked in run, or commenting out the whole block so that no route ever gets installed.  If I leave the defaultrouter= out of rc.conf that works as expected too: I can still access the lan, but nothing outside of the local network.

But all the hardware I have to test this on right now only has a single NIC.  Is there something about having multiple interfaces that makes it stop working?
Comment 11 Robert Blayzor 2016-03-21 23:30:53 UTC
In all of my testing and deployments whenever I leave out a gateway in DHCP (no default route) and a default route is added via this proxy-arp method, the "defaultrouter=" in rc.conf never gets added because a default entry already exists.

It very may well be because you're attempting to add a default route on another NIC other than the BOOTP interface.

I know it most certainly does not in our scenario and the only way I was able to work around it was removing the proxy-arp entry from adding that default or setting DHCP to send a bogus gw that was out of the subnet so it failed to add it. Then and only then would the "defaultrouter=" in rc.conf get added for my other network.
Comment 12 Ian Lepore freebsd_committer 2016-03-22 01:16:38 UTC
(In reply to Robert Blayzor from comment #11)

It is on the bootp NIC (the one and only NIC on the system) for me.

Could this be a version difference?  I'm testing on 11-current, are you on 10-something?
Comment 13 Robert Blayzor 2016-03-22 11:42:13 UTC
I'm comparing/using 10-STABLE since we're in a production environment. I don't think this code has changed much at all in recent years that I've seen.

I know that in 10-STABLE I still experience the same issue.

For example, I have two NICs:

vmx0 - BOOTP interface/DHCP backend network
vmx1 - public facing


vmx0 gets DHCP host information, bootp info, cookie etc for diskless boot. NO default route/gateway.

vmx1 get configured from rc.* scripts, static with "defaultrouter=".


In a stock install system will boot normally but my default route is NOT correct as it will end up set to self, regardless of what I have defaultrouter set to.


The only way around that was to use one of the previously two mentioned methods.


IMHO- It makes more sense to NOT set a default at all if one is not being sent by the server. If you want to set the default gateway to "self", couldn't one just set that option via DHCP?
Comment 14 Robert Blayzor 2016-03-22 11:57:49 UTC
Another possible idea is to change the behavior of "defaultrouter" in rc.d scripts to remove any existing default before attempting to add another?
Comment 15 Ian Lepore freebsd_committer 2016-03-27 22:17:39 UTC
I set up a box with multiple NICs and did some experimenting, and was finally able to recreate the situation where defaultrouter=<ip> in rc.conf would error out and get ignored because of the route installed by the bootp code.  It turns out that it depends in part on what order your interfaces appear in the system list versus which one is chosen to mount the rootfs.

When there are multiple interfaces it was installing the self-ip default route for each one, but of course only the first one it installed actually worked, the rest got errors (which were ignored so booting would continue).  Code in nfs_diskless resets the IP on the chosen interface.  If that happens to be the one whose self-ip default route got successfully installed, that ends up deleting the default route, then (due to a different bug [1]) the default route didn't get reinstalled by nfs_diskless, and that left no default route in the system when it got to the rc processing.  That's why I was initially not seeing an error there.  When I changed things around so that a different interface was first in the list and its self-ip became the default route I started seeing the same errors as reported in this PR.

Another bug in this area of the code is that bootpc_init() would always choose the first interface that got an ip address to use for mounting the rootfs, not the interface that received the rootpath option.  In theory, that could leave it trying to use an interface that can't reach the server providing the rootfs.  Instead of assuming that any random interface that got an IP will work, the code should assume that the interface that delivered the rootpath option is the one that can reach the server providing the data (either directly or via a router option provided for that same interface). 

[1] That different bug is that bootpc_init() is not copying any valid data into the nfsv3_diskless.mygateway field.
Comment 16 commit-hook freebsd_committer 2016-03-27 22:22:19 UTC
A commit references this bug:

Author: ian
Date: Sun Mar 27 22:21:35 UTC 2016
New revision: 297323
URL: https://svnweb.freebsd.org/changeset/base/297323

Log:
  Set ifctx->gotrootpath=1 only when the root path came from the dhcp/bootp
  server (and not when it came from a fallback method such as the ROOTDEVNAME
  option).  This makes the code in bootpc_init() choose the first interface
  that provided a rootpath name.  Previously it was choosing the first
  interface that got an IP address, which could be on a different and
  potentially unreachable subnet than the server providing the rootfs.

  If the rootpath name actually does come from a fallback source, then the
  code continues to use the first interface in the list that got configured.

  Note that this wasn't directly reported in the PR cited below, but was
  discovered while working on that PR.

  PR:		187094

Changes:
  head/sys/nfs/bootp_subr.c
Comment 17 commit-hook freebsd_committer 2016-03-27 22:59:25 UTC
A commit references this bug:

Author: ian
Date: Sun Mar 27 22:58:56 UTC 2016
New revision: 297325
URL: https://svnweb.freebsd.org/changeset/base/297325

Log:
  Stop setting the default route to the IP of the interface itself when the
  bootp/dhcp server doesn't provide a router option.  Doing so prevents
  setting defaultrouter=<ip> in rc.conf (it fails because there's already
  a bogus default route installed by bootpc_init).

  When an admin wants to use this style of proxy arp on an interface, the
  proper mechanism is to set the "use-lease-addr-for-default-route" flag
  in the dhcp server config.  That causes the lease address to be delivered
  in the routers option, and the normal handling of the routers option will
  then install the self-ip as the default route.

  PR:		187094

Changes:
  head/sys/nfs/bootp_subr.c
Comment 18 Ian Lepore freebsd_committer 2016-03-28 02:13:35 UTC
I forgot to cite the PR in the last commit related to this stuff, r297326.  It fixes the problem I mentioned in an earlier comment about trying to install a default route for every interface.  Now it only installs the default route associated with the interface it chooses to use for the rootfs mount.

At this point I think both the problems fixed in the original submitted patch have been dealt with, and this can be closed if it works okay for everyone else.
Comment 19 Robert Blayzor 2016-04-26 15:25:19 UTC
Was there a code/patch commited for the original report and use the DHCP hint for intf-MTU and set accordingly?
Comment 20 Ian Lepore freebsd_committer 2016-04-26 15:37:04 UTC
(In reply to Robert Blayzor from comment #19)

Yes, r297149 added handling for that to the kernel bootp code, and r297150 + r297151 made the equivelent changes to libstand and loader(8).

What I have not done yet is MFC'd any of the changes (the code freeze was still in effect for the 10-stable branch at the time).  I'll see if I can get that done sometime in the next few days (MFC to 10 is easy, 9-stable is harder and I may not do that unless someone asks for it).
Comment 21 Robert Blayzor 2016-04-26 19:18:34 UTC
MFC to stable/10 would be much appreciated.
Comment 22 commit-hook freebsd_committer 2016-05-31 17:02:02 UTC
A commit references this bug:

Author: ian
Date: Tue May 31 17:01:55 UTC 2016
New revision: 301056
URL: https://svnweb.freebsd.org/changeset/base/301056

Log:
  MFC r297147, r297148, r297149, r297150, r297151:

    Make both the loader and kernel use the interface-mtu option if the
    dhcp server provides it.  Made up of these (semi-)related changes...

    [kernel...] If the dhcp server provides an interface-mtu option, parse
    the value and set that mtu on the interface.

    [libstand...]

    Garbage collect the bswap routines from libstand, use sys/endian.h.

    If the dhcp server delivers an interface-mtu option, parse it and store
    the value in a new global intf_mtu for use by the application.

    [loader...]

    If the dhcp server provided an interface-mtu option, transcribe the value
    to the boot.netif.mtu env var, which will be picked up by pre-existing code
    in nfs_mountroot() and used to configure the interface accordingly.

  PR:		187094

Changes:
_U  stable/10/
  stable/10/lib/libstand/Makefile
  stable/10/lib/libstand/bootp.c
  stable/10/lib/libstand/bootp.h
  stable/10/lib/libstand/bswap.c
  stable/10/lib/libstand/globals.c
  stable/10/lib/libstand/net.h
  stable/10/lib/libstand/stand.h
  stable/10/sys/boot/common/dev_net.c
  stable/10/sys/boot/i386/libi386/pxe.c
  stable/10/sys/boot/libstand32/Makefile
  stable/10/sys/boot/userboot/libstand/Makefile
  stable/10/sys/nfs/bootp_subr.c
Comment 23 commit-hook freebsd_committer 2016-05-31 17:16:05 UTC
A commit references this bug:

Author: ian
Date: Tue May 31 17:15:57 UTC 2016
New revision: 301057
URL: https://svnweb.freebsd.org/changeset/base/301057

Log:
  MFC r297323,r297324, r297325, r297326:

    Set only one default route for nfsroot mount, the one associated with the
    interface that will be used to mount the rootfs (and never a self-ip proxy
    arp route).  Made up of the following related changes...

    Set ifctx->gotrootpath=1 only when the root path came from the dhcp/bootp
    server (and not when it came from a fallback method such as the ROOTDEVNAME
    option).  This makes the code in bootpc_init() choose the first interface
    that provided a rootpath name.  Previously it was choosing the first
    interface that got an IP address, which could be on a different and
    potentially unreachable subnet than the server providing the rootfs.

    If the rootpath name actually does come from a fallback source, then the
    code continues to use the first interface in the list that got configured.
    Note that this wasn't directly reported in the PR cited below, but was
    discovered while working on that PR.

    Switch bootpc_adjust_interface() from returning int to void.  Its one caller
    doesn't check for errors, and all the errors that can happen result in it
    calling panic anyway, except for one that's really more of a warning (and
    is going to disappear on an upcoming commit anyway).

    Stop setting the default route to the IP of the interface itself when the
    bootp/dhcp server doesn't provide a router option.  Doing so prevents
    setting defaultrouter=<ip> in rc.conf (it fails because there's already
    a bogus default route installed by bootpc_init).

    When an admin wants to use this style of proxy arp on an interface, the
    proper mechanism is to set the "use-lease-addr-for-default-route" flag
    in the dhcp server config.  That causes the lease address to be delivered
    in the routers option, and the normal handling of the routers option will
    then install the self-ip as the default route.

    Do not try to install a default route for each interface found, because
    only the first one will actually work and all the others just result in
    errors (which would get printed but otherwise ignored).

    Instead, wait until we make a choice of which interface will be used to
    mount the rootfs, and install the default route associated with it (if any).
    After doing the md_mount() call to obtain the needed info, remove the
    default route again, and transcribe the route info into the nfs_diskless
    structure.  If the system eventually chooses to mount the nfs rootfs, the
    default route will be installed again when the nfs_diskless code
    re-initializes the interface.

  PR:		187094

Changes:
_U  stable/10/
  stable/10/sys/nfs/bootp_subr.c
Comment 24 Robert Blayzor 2016-08-04 17:29:49 UTC
Were the fixes for this PR (jumbo frames) and the fix for setting the default gateway on PXE boot MFC'd and have those made it into 10-STABLE ?
Comment 25 Ian Lepore freebsd_committer 2017-12-10 19:49:48 UTC
(In reply to Robert Blayzor from comment #24)

Yes, the changes have all been MFC'd to 10.  I'm sorry I missed this question back when you first asked it.
Comment 26 vali gholami 2017-12-17 07:14:51 UTC
MARKED AS SPAM