When PXE booting a FreeBSD kernel, I am trying to set a higher MTU via DHCP. Using "option interface-mtu ...." in ISC-DHCP, the MTU never gets set to anything higher than 1500. Changing the MTU in the rc startup scripts appears to change it only cosmetically:

dev@rns0 [~] cat /etc/rc.conf | grep vmx
ifconfig_vmx0="mtu 9000"

dev@rns0 [~] ifconfig vmx0
vmx0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
        options=60039b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,TSO6,RXCSUM_IPV6,TXCSUM_IPV6>
        ether 00:50:56:a7:03:22
        inet6 fe80::250:56ff:fea7:322%vmx0 prefixlen 64 scopeid 0x1
        inet 10.0.0.20 netmask 0xffff0000 broadcast 10.0.255.255
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: active

dev@rns0 [~] sudo ping -D -c 1 -s 1600 10.0.0.200
PING 10.0.0.200 (10.0.0.200): 1600 data bytes
ping: sendto: Message too long

dev@rns0 [~] sudo ping -D -c 1 -s 1472 10.0.0.200
PING 10.0.0.200 (10.0.0.200): 1472 data bytes
1480 bytes from 10.0.0.200: icmp_seq=0 ttl=64 time=0.064 ms

--- 10.0.0.200 ping statistics ---
1 packets transmitted, 1 packets received, 0.0% packet loss

Fix: None. Attempting to set the MTU via rc.conf at startup does not work.

How-To-Repeat: Using dhclient or PXE boot, configure the DHCP server to use an MTU > 1500, e.g.:

option interface-mtu 9000;

The system boots and gets its IP address properly, but the MTU cannot be set.
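For reference, the arithmetic behind the two ping results in the report above: the -s value counts only the ICMP payload, and the IPv4 packet adds a 20-byte IP header (assuming no IP options) and an 8-byte ICMP header on top of it. A minimal sketch in C; the helper names are hypothetical and exist only for illustration:

```c
#include <assert.h>

/* Header sizes for IPv4 without options and for an ICMP echo request. */
enum { IPV4_HDR_LEN = 20, ICMP_HDR_LEN = 8 };

/*
 * Largest "ping -s" payload that fits in a single unfragmented IPv4
 * packet for a given MTU. Hypothetical helper, for illustration only.
 */
int
max_ping_payload(int mtu)
{
	assert(mtu >= IPV4_HDR_LEN + ICMP_HDR_LEN);
	return (mtu - IPV4_HDR_LEN - ICMP_HDR_LEN);
}

/* Nonzero if a payload of the given size fits within the MTU. */
int
ping_fits(int mtu, int payload)
{
	return (payload <= max_ping_payload(mtu));
}
```

With an MTU of 1500 the limit is 1472 bytes, which is exactly the size that succeeds in the report. A 1600-byte payload sent with the Don't Fragment bit set (-D) would fit if the MTU in effect were really 9000, so the "Message too long" error is consistent with the effective MTU still being 1500 even though ifconfig displays 9000.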
This may not be a bug, but rather a lack of functionality.

DHCP option tag 26 - Interface MTU Size

Target to fix: sys/nfs/bootp_subr.c

Working on a patch.
I would like to submit the following patch, which corrects two major issues:

- Look for and set the MTU according to DHCP option tag 26 (Interface MTU). This allows the booting interface to be used on a jumbo frame enabled network. Currently this is broken and cannot be overridden or set later.

- Remove the ancient proxy ARP setting. Nowadays the more common problem is that a host is multi-homed and the booting interface's network may not have a router (i.e. the default route is on another interface/network). The default today is to use the DHCP router tag, and if no router tag is supplied by DHCP, the default route is set to the host itself. Not supplying a route is a completely valid option, especially since many hosts are multi-homed.

The patch is against 10.0-RELEASE.

Index: bootp_subr.c
===================================================================
--- bootp_subr.c	(revision 261846)
+++ bootp_subr.c	(working copy)
@@ -196,6 +196,8 @@
 #define TAG_HOSTNAME	12  /* Client host name */
 #define TAG_ROOT	17  /* Root path */

+#define TAG_INTF_MTU	26  /* Interface MTU Size (RFC2132) */
+
 /* DHCP specific tags */
 #define TAG_OVERLOAD	52  /* Option Overload */
 #define TAG_MAXMSGSIZE	57  /* Maximum DHCP Message Size */
@@ -229,6 +231,8 @@
 #endif

 static char bootp_cookie[128];
+static unsigned int bootp_ifmtu = 0;
+
 static struct socket *bootp_so;
 SYSCTL_STRING(_kern, OID_AUTO, bootp_cookie, CTLFLAG_RD,
 	bootp_cookie, 0, "Cookie (T134) supplied by bootp server");
@@ -1030,7 +1034,22 @@
 		return (0);
 	}

-	printf("Adjusted interface %s\n", ifctx->ireq.ifr_name);
+	printf("Adjusted interface %s", ifctx->ireq.ifr_name);
+
+	/* Do BOOTP interface options */
+	if (bootp_ifmtu != 0) {
+		printf(" (MTU=%d", bootp_ifmtu);
+		if (bootp_ifmtu > 1514)
+			printf("/JUMBO");
+		printf(")");
+
+		ifr->ifr_mtu = bootp_ifmtu;
+		error = ifioctl(bootp_so, SIOCSIFMTU, (caddr_t) ifr, td);
+		if (error != 0)
+			panic("%s: SIOCSIFMTU, error=%d", __func__, error);
+	}
+	printf("\n");
+
 	/*
 	 * Do enough of ifconfig(8) so that the chosen interface
 	 * can talk to the servers. (just set the address)
@@ -1053,7 +1072,12 @@

 	/* Add new default route */

-	if (ifctx->gotgw != 0 || gctx->gotgw == 0) {
+	/* Only set default route if we received one in the request.
+	   Proxy ARP considered obsolete. More valid to NOT set
+	   a router in request as the host may be multi-homed and
+	   gateway may not be on this interface.
+	 */
+	if (ifctx->gotgw != 0 || gctx->gotgw != 0) {
 		clear_sinaddr(&defdst);
 		clear_sinaddr(&defmask);
 		/* XXX MRT just table 0 */
@@ -1518,6 +1542,11 @@
 		p[i] = '\0';
 	}

+	p = bootpc_tag(&gctx->tag, &ifctx->reply, ifctx->replylen,
+		       TAG_INTF_MTU);
+	if (p != NULL) {
+		bootp_ifmtu = (((unsigned char)p[0] << 8) + (unsigned char)p[1]);
+	}
 	printf("\n");
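To illustrate the wire format the patch decodes: per RFC 2132, option 26 carries a length of 2 and a 16-bit unsigned MTU in network byte order, with 68 as the minimum legal value. The following is a standalone userland sketch, not the kernel code; find_interface_mtu() is a hypothetical name, and the scan assumes a plain options area after the magic cookie and ignores option overload (tag 52).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TAG_PAD		0	/* single byte, no length field */
#define TAG_INTF_MTU	26	/* Interface MTU (RFC 2132) */
#define TAG_END		255	/* end of options */

/*
 * Scan a DHCP options area for option 26 and decode it.
 * Returns the MTU, or 0 if the option is absent or malformed.
 * Hypothetical helper for illustration; not the bootp_subr.c code.
 */
unsigned int
find_interface_mtu(const uint8_t *opts, size_t len)
{
	size_t i = 0;

	assert(opts != NULL || len == 0);
	while (i < len) {
		uint8_t tag = opts[i++];

		if (tag == TAG_PAD)
			continue;
		if (tag == TAG_END)
			break;
		if (i >= len)
			break;		/* truncated: no length byte */
		uint8_t optlen = opts[i++];
		if (optlen > len - i)
			break;		/* truncated: value runs past buffer */
		if (tag == TAG_INTF_MTU) {
			if (optlen != 2)
				return (0);
			/* 16-bit value, most significant byte first */
			unsigned int mtu =
			    ((unsigned int)opts[i] << 8) | opts[i + 1];
			return (mtu >= 68 ? mtu : 0);
		}
		i += optlen;		/* skip options we do not care about */
	}
	return (0);
}
```

The two bytes 0x23 0x28 decode to 9000, which is the value from the "option interface-mtu 9000;" setting in the original report.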
A commit references this bug:

Author: ian
Date: Mon Mar 21 14:51:52 UTC 2016
New revision: 297149
URL: https://svnweb.freebsd.org/changeset/base/297149

Log:
  If the dhcp server provides an interface-mtu option, parse the value and set that mtu on the interface.

  These changes are based on the patch submitted by Robert Blayzor in the PR, but I changed things around a bit, so the blame for any mistakes belongs to me.

  PR: 187094

Changes:
  head/sys/nfs/bootp_subr.c
I have committed the parts of this related to handling the interface-mtu option (and also some changes to libstand and loader(8) inspired by this PR). I left out the part of the submitted patch related to changing the default route and the comments about proxy arp, because it's not clear to me that changing the gctx->gotgw == 0 logic to != 0 is correct, especially after researching the history of how the code in that area has evolved over time (but I'm willing to be convinced; I'm not a networking expert).
The problem with setting the gateway is on multi-homed systems, where setting the default gateway to self when none is given by the DHCP server is not the desired result. The DHCP server may or may not send a default gateway; both are valid. What may NOT be valid is that the gateway is set to self.

Case in point: the diskless boot/bootp network may be a private network with no gateway at all, and the servers use a gateway on a different NIC. In our scenario the diskless boot network has just private DHCP, NIS and NFS servers and no gateway. Each server has a public-facing, non-private NIC, which gets the gateway.

The problem we run into now is that when our servers boot, we do not send a gateway from DHCP. When this happens the servers boot with self as the gateway. From that point on, it's not possible to "reset" the gateway via another network without some sort of hack in the start-up script.

Hence my comment that "it's perfectly valid to send, or not send, a default gateway in DHCP". I'm not convinced that setting the gateway to self would be more common practice than not sending or setting a gateway at all.

BTW-- I also have a more recent patch for this, cleaned up against 10.3, that I can submit for review.
Ian, while not directly related, I'm wondering if you could take a look at: https://reviews.freebsd.org/D5675 ?
(In reply to Robert Blayzor from comment #5) But the logic change seems incorrect... The current logic says "if this interface got a gateway option, set it as default route; or if no interfaces got a gateway option, then set the default route to zero". By changing the == to != the logic becomes "if this interface got a gateway or any other interface got a gateway, then set the default route to this interface's gateway" (which might be zero in the case where gctx->gotgw is non-zero because some other interface had a gateway). The existing logic is strange and twisted but seems to basically result in "the default route is always set to either zero or the gateway option received for one of the interfaces", and if multiple interfaces have a gateway option, I have no idea what the default route would be left at. The last one received maybe?
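The two conditions compared in this comment can be modeled as plain boolean predicates. This is a sketch with hypothetical names: if_gotgw stands for ifctx->gotgw != 0 and any_gotgw for gctx->gotgw != 0. It also assumes, based on the description above, that the global flag is set whenever any single interface received a router option.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Existing test: install a default route if this interface got a
 * router option, or if no interface got one at all.
 */
bool
install_route_existing(bool if_gotgw, bool any_gotgw)
{
	/* Assumption: a per-interface hit implies the global flag. */
	assert(!if_gotgw || any_gotgw);
	return (if_gotgw || !any_gotgw);
}

/*
 * Test from the submitted patch: install a default route if this
 * interface or any other interface got a router option.
 */
bool
install_route_patched(bool if_gotgw, bool any_gotgw)
{
	/* Assumption: a per-interface hit implies the global flag. */
	assert(!if_gotgw || any_gotgw);
	return (if_gotgw || any_gotgw);
}
```

Under that assumption the two predicates agree only when this interface itself received a router option. When no interface received one, the existing test installs a route and the patched test does not, which is the intent of the patch. When only some other interface received one, the patched test installs a route for this interface while the existing test does not, which is the case questioned here.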
The logic today is that if the host receives no gateway from the DHCP server, it installs a default route pointing to its own IP address. It would seem more sensible that if no gateway is received from DHCP, then NO default route is added. My patch changes it to do exactly that. At least that's how we worked around the problem of the self default being added. The only way around the problem above (without the patch) is to go in and manually delete the route and add the correct static default. Another workaround (but an extremely ugly one) is to send a bogus default route back to the client, i.e. a next hop that is not on its network. With that, the client seems to fail to install the default route, though this is more of a hack than a fix.
Perhaps what we need to be looking at changing is around line 1562:

	if (ifctx->gotgw == 0) {
		/* Use proxyarp */
		ifctx->gw.sin_addr.s_addr = ifctx->myaddr.sin_addr.s_addr;
	}

It would make more sense to remove that completely, unless there is an argument for keeping it, which seems unlikely. If someone wants to use "self", they could set that option in the DHCP reply. It seems far more useful not to install a route at all if one is not being sent.
aha, that's the clue I was missing. I thought the gw field was getting zeroed and I missed the place where it was assigned to its own address. I'm a bit scared to remove the proxy arp thing because there is plenty of recent advice and how-to info out there on configuring proxy arp on modern equipment; I don't want to eliminate a feature some people may be using. What I don't understand now is what's the harm of letting it make the self-referential default route entries? I set my dhcp server to not deliver a gateway option, added a printf to bootp_subr to verify that it was setting a default route to the self ip address, and I added defaultrouter=<ip> to my rc.conf, and the system still boots just fine, installs the static default route when running the rc scripts, and in general seems to work normally. Basically I can't tell any difference between letting the code currently checked in run, or commenting out the whole block so that no route ever gets installed. If I leave the defaultrouter= out of rc.conf that works as expected too: I can still access the lan, but nothing outside of the local network. But all the hardware I have to test this on right now only has a single NIC. Is there something about having multiple interfaces that makes it stop working?
In all of my testing and deployments, whenever I leave out a gateway in DHCP (no default route) and a default route is added via this proxy-arp method, the "defaultrouter=" in rc.conf never gets added because a default entry already exists. It may very well be because the default route is being added on a NIC other than the BOOTP interface. I know it most certainly does not get added in our scenario, and the only way I was able to work around it was to stop the proxy-arp code from adding that default, or to set DHCP to send a bogus gw outside the subnet so that adding it failed. Then and only then would the "defaultrouter=" in rc.conf get added for my other network.
(In reply to Robert Blayzor from comment #11) It is on the bootp NIC (the one and only NIC on the system) for me. Could this be a version difference? I'm testing on 11-current, are you on 10-something?
I'm comparing/using 10-STABLE since we're in a production environment. I don't think this code has changed much at all in recent years, from what I've seen. I know that in 10-STABLE I still experience the same issue.

For example, I have two NICs:

vmx0 - BOOTP interface/DHCP backend network
vmx1 - public facing

vmx0 gets DHCP host information, bootp info, cookie etc. for diskless boot. NO default route/gateway.

vmx1 gets configured from the rc.* scripts, static with "defaultrouter=".

In a stock install the system will boot normally, but my default route is NOT correct: it ends up set to self, regardless of what I have defaultrouter set to. The only way around that was to use one of the two previously mentioned methods.

IMHO, it makes more sense NOT to set a default at all if one is not being sent by the server. If you want to set the default gateway to "self", couldn't one just set that option via DHCP?
Another possible idea is to change the behavior of "defaultrouter" in rc.d scripts to remove any existing default before attempting to add another?
I set up a box with multiple NICs and did some experimenting, and was finally able to recreate the situation where defaultrouter=<ip> in rc.conf would error out and get ignored because of the route installed by the bootp code. It turns out that it depends in part on what order your interfaces appear in the system list versus which one is chosen to mount the rootfs. When there are multiple interfaces it was installing the self-ip default route for each one, but of course only the first one it installed actually worked, the rest got errors (which were ignored so booting would continue). Code in nfs_diskless resets the IP on the chosen interface. If that happens to be the one whose self-ip default route got successfully installed, that ends up deleting the default route, then (due to a different bug [1]) the default route didn't get reinstalled by nfs_diskless, and that left no default route in the system when it got to the rc processing. That's why I was initially not seeing an error there. When I changed things around so that a different interface was first in the list and its self-ip became the default route I started seeing the same errors as reported in this PR. Another bug in this area of the code is that bootpc_init() would always choose the first interface that got an ip address to use for mounting the rootfs, not the interface that received the rootpath option. In theory, that could leave it trying to use an interface that can't reach the server providing the rootfs. Instead of assuming that any random interface that got an IP will work, the code should assume that the interface that delivered the rootpath option is the one that can reach the server providing the data (either directly or via a router option provided for that same interface). [1] That different bug is that bootpc_init() is not copying any valid data into the nfsv3_diskless.mygateway field.
A commit references this bug:

Author: ian
Date: Sun Mar 27 22:21:35 UTC 2016
New revision: 297323
URL: https://svnweb.freebsd.org/changeset/base/297323

Log:
  Set ifctx->gotrootpath=1 only when the root path came from the dhcp/bootp server (and not when it came from a fallback method such as the ROOTDEVNAME option). This makes the code in bootpc_init() choose the first interface that provided a rootpath name. Previously it was choosing the first interface that got an IP address, which could be on a different and potentially unreachable subnet than the server providing the rootfs. If the rootpath name actually does come from a fallback source, then the code continues to use the first interface in the list that got configured.

  Note that this wasn't directly reported in the PR cited below, but was discovered while working on that PR.

  PR: 187094

Changes:
  head/sys/nfs/bootp_subr.c
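The selection rule described in this commit log can be sketched as follows. This is a simplified userland model, not the bootpc_init() code: struct if_state and choose_root_interface() are hypothetical names, and the struct keeps only the two flags relevant to the choice.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the per-interface bootp state. */
struct if_state {
	const char *name;
	int gotaddr;		/* interface was configured with an IP address */
	int gotrootpath;	/* root path came from the dhcp/bootp server */
};

/*
 * Prefer the first interface whose server reply carried a root path.
 * If none did (the root path came from a fallback source), use the
 * first interface that got configured. Returns NULL if neither exists.
 */
const struct if_state *
choose_root_interface(const struct if_state *ifs, size_t n)
{
	const struct if_state *fallback = NULL;
	size_t i;

	assert(ifs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (ifs[i].gotrootpath)
			return (&ifs[i]);
		if (fallback == NULL && ifs[i].gotaddr)
			fallback = &ifs[i];
	}
	return (fallback);
}
```

With two configured interfaces where only the second one received a root path, this model picks the second, whereas the previous behavior described in the log would have picked the first.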
A commit references this bug:

Author: ian
Date: Sun Mar 27 22:58:56 UTC 2016
New revision: 297325
URL: https://svnweb.freebsd.org/changeset/base/297325

Log:
  Stop setting the default route to the IP of the interface itself when the bootp/dhcp server doesn't provide a router option. Doing so prevents setting defaultrouter=<ip> in rc.conf (it fails because there's already a bogus default route installed by bootpc_init).

  When an admin wants to use this style of proxy arp on an interface, the proper mechanism is to set the "use-lease-addr-for-default-route" flag in the dhcp server config. That causes the lease address to be delivered in the routers option, and the normal handling of the routers option will then install the self-ip as the default route.

  PR: 187094

Changes:
  head/sys/nfs/bootp_subr.c
I forgot to cite the PR in the last commit related to this stuff, r297326. It fixes the problem I mentioned in an earlier comment about trying to install a default route for every interface. Now it only installs the default route associated with the interface it chooses to use for the rootfs mount. At this point I think both the problems fixed in the original submitted patch have been dealt with, and this can be closed if it works okay for everyone else.
Was a patch committed for the original report, to use the DHCP hint for the interface MTU and set it accordingly?
(In reply to Robert Blayzor from comment #19) Yes, r297149 added handling for that to the kernel bootp code, and r297150 + r297151 made the equivalent changes to libstand and loader(8). What I have not done yet is MFC'd any of the changes (the code freeze was still in effect for the 10-stable branch at the time). I'll see if I can get that done sometime in the next few days (MFC to 10 is easy, 9-stable is harder and I may not do that unless someone asks for it).
MFC to stable/10 would be much appreciated.
A commit references this bug:

Author: ian
Date: Tue May 31 17:01:55 UTC 2016
New revision: 301056
URL: https://svnweb.freebsd.org/changeset/base/301056

Log:
  MFC r297147, r297148, r297149, r297150, r297151:

  Make both the loader and kernel use the interface-mtu option if the dhcp server provides it. Made up of these (semi-)related changes...

  [kernel...] If the dhcp server provides an interface-mtu option, parse the value and set that mtu on the interface.

  [libstand...] Garbage collect the bswap routines from libstand, use sys/endian.h.

  If the dhcp server delivers an interface-mtu option, parse it and store the value in a new global intf_mtu for use by the application.

  [loader...] If the dhcp server provided an interface-mtu option, transcribe the value to the boot.netif.mtu env var, which will be picked up by pre-existing code in nfs_mountroot() and used to configure the interface accordingly.

  PR: 187094

Changes:
_U  stable/10/
  stable/10/lib/libstand/Makefile
  stable/10/lib/libstand/bootp.c
  stable/10/lib/libstand/bootp.h
  stable/10/lib/libstand/bswap.c
  stable/10/lib/libstand/globals.c
  stable/10/lib/libstand/net.h
  stable/10/lib/libstand/stand.h
  stable/10/sys/boot/common/dev_net.c
  stable/10/sys/boot/i386/libi386/pxe.c
  stable/10/sys/boot/libstand32/Makefile
  stable/10/sys/boot/userboot/libstand/Makefile
  stable/10/sys/nfs/bootp_subr.c
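The loader step in this log, transcribing the parsed MTU into the boot.netif.mtu variable, might look roughly like the sketch below. It is a userland approximation only: publish_mtu() is a hypothetical name, the POSIX setenv() stands in for whatever environment call the loader actually uses, and a value of 0 is assumed to mean that the server sent no interface-mtu option.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Publish a DHCP-supplied MTU as the boot.netif.mtu variable.
 * Returns 0 on success, -1 if there is nothing to publish or if
 * setenv() fails. Hypothetical helper; not the dev_net.c code.
 */
int
publish_mtu(unsigned int intf_mtu)
{
	char buf[16];
	int n;

	if (intf_mtu == 0)
		return (-1);	/* assumed to mean "option not provided" */
	n = snprintf(buf, sizeof(buf), "%u", intf_mtu);
	assert(n > 0 && (size_t)n < sizeof(buf));
	return (setenv("boot.netif.mtu", buf, 1));
}
```

Passing the value as a decimal string keeps the loader and kernel decoupled: the kernel side only has to read and parse one named variable, which according to the log is what the pre-existing code in nfs_mountroot() already does.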
A commit references this bug:

Author: ian
Date: Tue May 31 17:15:57 UTC 2016
New revision: 301057
URL: https://svnweb.freebsd.org/changeset/base/301057

Log:
  MFC r297323, r297324, r297325, r297326:

  Set only one default route for nfsroot mount, the one associated with the interface that will be used to mount the rootfs (and never a self-ip proxy arp route). Made up of the following related changes...

  Set ifctx->gotrootpath=1 only when the root path came from the dhcp/bootp server (and not when it came from a fallback method such as the ROOTDEVNAME option). This makes the code in bootpc_init() choose the first interface that provided a rootpath name. Previously it was choosing the first interface that got an IP address, which could be on a different and potentially unreachable subnet than the server providing the rootfs. If the rootpath name actually does come from a fallback source, then the code continues to use the first interface in the list that got configured.

  Note that this wasn't directly reported in the PR cited below, but was discovered while working on that PR.

  Switch bootpc_adjust_interface() from returning int to void. Its one caller doesn't check for errors, and all the errors that can happen result in it calling panic anyway, except for one that's really more of a warning (and is going to disappear on an upcoming commit anyway).

  Stop setting the default route to the IP of the interface itself when the bootp/dhcp server doesn't provide a router option. Doing so prevents setting defaultrouter=<ip> in rc.conf (it fails because there's already a bogus default route installed by bootpc_init).

  When an admin wants to use this style of proxy arp on an interface, the proper mechanism is to set the "use-lease-addr-for-default-route" flag in the dhcp server config. That causes the lease address to be delivered in the routers option, and the normal handling of the routers option will then install the self-ip as the default route.

  Do not try to install a default route for each interface found, because only the first one will actually work and all the others just result in errors (which would get printed but otherwise ignored). Instead, wait until we make a choice of which interface will be used to mount the rootfs, and install the default route associated with it (if any). After doing the md_mount() call to obtain the needed info, remove the default route again, and transcribe the route info into the nfs_diskless structure. If the system eventually chooses to mount the nfs rootfs, the default route will be installed again when the nfs_diskless code re-initializes the interface.

  PR: 187094

Changes:
_U  stable/10/
  stable/10/sys/nfs/bootp_subr.c
Were the fixes for this PR (jumbo frames) and the fix for setting the default gateway on PXE boot MFC'd, and have they made it into 10-STABLE?
(In reply to Robert Blayzor from comment #24) Yes, the changes have all been MFC'd to 10. I'm sorry I missed this question back when you first asked it.