When you configure transport mode IPSec between two FreeBSD hosts (no tunnels or if_ipsec), TCP connectivity between those hosts breaks. It happens because:

a) ESP packets are always generated with the DF flag set;
b) PMTUD does not work in IPSec transport mode because there is no interface (?);
c) when TCP segments of standard size are encapsulated into ESP packets, the resulting oversized ESP packets cannot pass through any interface with MTU=1500, nor can they be fragmented because of the DF flag, so they are just blackholed and never leave the host.

How to reproduce. Configure a simple transport mode IPSec between two FreeBSD hosts and try to scp files from one host to another. The file transfer will inevitably stall until you clear all IPSec policies. Watch with tcpdump: all ESP packets have the DF flag set, but large ESP packets will be missing.

A workaround. A host route to the peer with "-mtu 1400" can be configured as described in https://lists.freebsd.org/pipermail/freebsd-net/2019-December/054952.html but it is not scalable.

What is to be done. ESP packets should not have the DF flag set by default for things to "just work." I've checked that net.inet.ipsec.dfbit does not affect transport mode. Regardless of its value, the DF flag is always on.
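The size argument in point (c) can be checked with a little arithmetic. The sketch below estimates the size of an IPv4 ESP transport mode packet for a given transport payload. The cipher parameters are assumptions, since the report does not say which algorithms were negotiated: AES-CBC (16-byte IV, 16-byte block) with HMAC-SHA1-96 (12-byte ICV). The function name is hypothetical.

```sh
#!/bin/sh
# Sketch: size of the IPv4 ESP transport-mode packet that carries a given
# transport payload (TCP header + data).
# ASSUMED cipher parameters (not stated in the report): AES-CBC
# (16-byte IV, 16-byte block) with HMAC-SHA1-96 (12-byte ICV).
esp_packet_size() {
    payload=$1 iv=$2 block=$3 icv=$4
    # ESP trailer: padding + 1 byte pad length + 1 byte next header,
    # with the encrypted part rounded up to the cipher block size.
    padded=$(( (payload + 2 + block - 1) / block * block ))
    # 20-byte IPv4 header + 8-byte ESP header (SPI + sequence number)
    echo $(( 20 + 8 + iv + padded + icv ))
}

# A full-size packet on a 1500-byte link carries a 20-byte TCP header
# plus 1460 bytes of data, i.e. 1480 bytes of transport payload.
esp_packet_size 1480 16 16 12    # prints 1544
```

Under these assumptions the encapsulated packet is 44 bytes larger than a 1500-byte MTU, and with DF set it cannot be fragmented.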
All the above seems to be valid for IPv6 too.
Victor - are you using asymmetric keys between hosts during this phase of testing, or are you using pre-shared keys? If asymmetric, can you try with insecure keys (so the key can be transmitted within a packet)?
(In reply to dewayne from comment #2)
I don't quite understand the second part of your question: the problem is not within ISAKMP. ISAKMP works fine. The problem with TCP begins later, when all SAs are already established. But to answer your question, the test lab uses preshared keys and racoon with a vanilla configuration.
There are multiple ways to solve this problem that work just fine for FreeBSD 11 at least.

First, one can use IPSec transport mode combined with gif tunnel and mtu=1500 for the gif. Oversized IPv4 gif packets have the DF bit set to 0, as per the gif(4) manual page, so they get fragmented while being transmitted over a path whose lowest intermediate MTU is 1500 or less, and no packet drops occur.

Second, one can try sysctl net.inet.ipsec.dfbit=0 that is documented in ipsec(4) manual page for IPSec tunnel mode but maybe it works for transport mode, too. Check it out. Maybe you can switch your IPSec to tunnel mode.

Third, you can adjust TCP MSS by means of packet filters. For example, ipfw currently has an additional kernel module, ipfw_pmod.ko, and the command ipfw tcp-setmss.
(In reply to Eugene Grosbein from comment #4)
> First, one can use IPSec transport mode combined with gif tunnel and mtu=1500 for the gif.

The solution with gif or if_ipsec tunnels is not scalable if you want to create a mesh of hosts with protected traffic between them. If we are talking about not more than 2-3 hosts, then the if_ipsec solution is the most elegant.

> Second, one can try sysctl net.inet.ipsec.dfbit=0 that is documented in
> ipsec(4) manual page for IPSec tunnel mode
> but maybe it works for transport mode, too

I wrote in the initial problem description that this sysctl does not work for transport mode. You just did not pay attention.

> Third, you can adjust TCP MSS by means of packet filters.

I don't think I can if the packet in question is not received or transmitted via any interface (like locally generated ssh-client traffic intercepted by IPSec policies). Or I'll try if you provide an example of matching such a packet.

I also tried pf's "scrub out proto 50 no-df" but there was no match.

In a FreeBSD - Windows 7 combination, this kind of transport mode works transparently out of the box. I think Windows knows to adjust MSS, or something.
OTOH, RFC 2401 Appendix B https://tools.ietf.org/html/rfc2401#page-1-48 states that packets generated by IPSec transport mode must be allowed to fragment over the path. This is incompatible with the current behaviour of keeping DF=1 for TCP and may be an error in our IPSEC stack.

Adding ae@ to CC: list. Andrey, what is your opinion on the problem? Should we clear the DF bit unconditionally for outgoing IPv4 IPSec transport mode packets?
(In reply to Victor Sudakov from comment #5)
> I don't think I can if the packet in question is not received or transmitted
> via any interface (like locally generated ssh-client traffic intercepted
> by IPSec policies).

Any outgoing packet has its destination IP address and it is not changed by IPSec transport mode. It's possible to perform routing lookup for any reachable destination IP address to discover transmit MTU and deduce right MSS.
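The routing lookup mentioned here can be scripted. FreeBSD's `route -n get` prints the route metrics as a header row (containing the word `mtu`) followed by a row of values. The sketch below picks the value under the `mtu` column from that output on stdin. The function name is hypothetical and the parsing assumes this two-row layout.

```sh
#!/bin/sh
# Sketch: extract the MTU from `route -n get <addr>` output read on stdin.
# ASSUMES the FreeBSD layout in which a header row containing the field
# "mtu" is immediately followed by a row with the corresponding values.
mtu_from_route_get() {
    awk '
        hdr { print $col; exit }          # value row: print the mtu column
        {                                 # look for the header row
            for (i = 1; i <= NF; i++)
                if ($i == "mtu") { col = i; hdr = 1 }
        }
    '
}

# Intended use on a FreeBSD host (REMOTE_ADDR is the IPSec peer):
#   mtu=$(route -n get "$REMOTE_ADDR" | mtu_from_route_get)
```

The resulting MTU still has to be reduced by the IP, ESP and TCP header overhead to arrive at an MSS; that overhead depends on the negotiated algorithms.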
(In reply to Victor Sudakov from comment #5)
> In a FreeBSD - Windows 7 combination, this kind of transport mode works
> transparently out of the box. I think Windows knows to adjust MSS, or something.

Can you enable some TCP service at FreeBSD side (f.e. inetd/echo or ftpd) and check it out if Windows sets DF=1 for initial encrypted TCP SYN when you connect from Windows to FreeBSD over such IPSec transport mode configuration?
(In reply to Eugene Grosbein from comment #7)
> It's possible to perform routing lookup for any reachable destination IP address to discover transmit MTU and deduce right MSS.

Yes, this (or similar) advice was given in https://lists.freebsd.org/pipermail/freebsd-net/2019-December/054952.html
It works (I checked) but does not scale.
(In reply to Eugene Grosbein from comment #8)
> check it out if Windows sets DF=1 for initial encrypted TCP SYN

My FreeBSD - Windows7 IPSec configuration is gone with my Windows7 workstation. If it helps the cause, I can recreate it with Windows 10 or Windows 2016 server. It will take some time though, because I don't remember very well how you set up SPD on Windows; it was somewhat non-trivial.
(In reply to Victor Sudakov from comment #9) It does scale: with racoon, you can use phase1 up-script to create specific routes with -mtu 1400 automatically.
(In reply to Victor Sudakov from comment #10) Windows 7 should be fine. I don't think newer versions of Windows have a regression dealing with DF bit.
(In reply to Victor Sudakov from comment #5)
> Or I'll try if you provide an example of matching such a packet.

This works for me:

    ipfw add tcp-setmss 1418 tcp from any to 'table(1)' tcpflags syn out
    ipfw add tcp-setmss 1418 tcp from 'table(1)' to any tcpflags syn in
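For what it is worth, an MSS of 1418 is consistent with simple arithmetic for a 1500-byte MTU, although the thread does not say how the value was chosen. The sketch below computes the largest MSS whose ESP transport mode packet still fits in a given MTU. It assumes IPv4, a 20-byte TCP header, and AES-CBC (16-byte IV and block) with HMAC-SHA1-96 (12-byte ICV); the function name is hypothetical.

```sh
#!/bin/sh
# Sketch: largest TCP MSS that keeps an IPv4 ESP transport-mode packet
# within the link MTU.
# ASSUMED cipher parameters: AES-CBC (16-byte IV, 16-byte block) with
# HMAC-SHA1-96 (12-byte ICV). Other algorithms give other results.
max_mss() {
    mtu=$1 iv=$2 block=$3 icv=$4
    # Room left for the encrypted part after the IPv4 header (20),
    # the ESP header (8), the IV and the ICV.
    room=$(( mtu - 20 - 8 - iv - icv ))
    # The encrypted part must be a whole number of cipher blocks.
    room=$(( room / block * block ))
    # Subtract the 2-byte ESP trailer (pad length + next header)
    # and the 20-byte TCP header.
    echo $(( room - 2 - 20 ))
}

max_mss 1500 16 16 12    # prints 1418
```

With a different cipher suite, or with IPv6, the overhead changes and so does the usable MSS.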
(In reply to Eugene Grosbein from comment #11)
> you can use phase1 up-script to create specific routes

A clever idea. A host route to $REMOTE_ADDR via... via what? Maybe sourcing rc.conf for $defaultrouter would be sufficient in most cases.

As for your idea about ipfw: can it really match locally created packets not passing via any interface?
(In reply to Victor Sudakov from comment #14)
Routing lookup can be performed within a shell script, too:

    gw=$(route -n get "$REMOTE_ADDR" | awk '/gateway: / {print $2}')

As for ipfw: first, ipfw has never required matching on an interface name; this is optional. Second, every outgoing locally generated packet has an outgoing interface anyway, including packets addressed to the same host, which go out via lo0.
Created attachment 210122
net.inet.ipsec.trans.cleardf

For testing: the new sysctl net.inet.ipsec.trans.cleardf is zero by default. If set to 1, it forces clearing of the DF bit for outgoing encrypted transport mode packets.
(In reply to Eugene Grosbein from comment #16) Eugene, could you make "no DF bit" the default behavior? Without the DF bit, transport mode will work "out of the box" in a vanilla configuration. If someone somewhere needs the DF flag on ESP packets, no doubt for obscure and sinister purposes, they could reenable it for themselves. Remember that net.inet.ipsec.dfbit=0 by default.
(In reply to Eugene Grosbein from comment #16)
I thought that there was a convention regarding the sysctl naming format. Should net.inet.ipsec.trans.cleardf be net.inet.ipsec.trans_cleardf, or are there plans for the trans sub-branch?

To help people coming to ipsec in the future, is it possible to have crisp (clear) descriptions that distinguish the two sysctls?
net.inet.ipsec.trans.cleardf: "Clear do not fragment bit for outgoing transport mode packets."
net.inet.ipsec.dfbit: "Do not fragment bit on encap."
Suggestion for net.inet.ipsec.dfbit: "Do not fragment bit on tunnel encap."

(I'd personally prefer net.inet.ipsec.tunnel_cleardf and, in the future, obsoleting ipsec.dfbit, as it doesn't do what is currently stated. Perhaps worth consideration?)
(In reply to dewayne from comment #18)
Yes, the sysctl is somewhat misnamed, but it's for testing only and is not considered a permanent solution.

I still wait for testing results from Victor. If we get good results and agreement with other developers, we ought just clear DF unconditionally.
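(In reply to Eugene Grosbein from comment #15)
I've made a quick and dirty script which I run from the remote block. It seems that this workaround does work.

    #!/bin/sh
    # Look up the gateway towards the peer, using the matching address family.
    if echo "$REMOTE_ADDR" | grep -q ":" ; then
        gw=$(route -6 -n get "$REMOTE_ADDR" | awk '/gateway: / {print $2}')
    else
        gw=$(route -4 -n get "$REMOTE_ADDR" | awk '/gateway: / {print $2}')
    fi
    case "${1}" in
    phase1_up)
        # Pin a reduced MTU on a host route to the peer.
        # $gw is left unquoted so that an empty value adds no argument.
        route add -host "$REMOTE_ADDR" -mtu 1200 $gw
        ;;
    *)
        # Any other event: remove the host route again.
        route delete -host "$REMOTE_ADDR"
        ;;
    esac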
I have a not yet fully thought-out idea of how to fix this problem. I'll try to implement it during the coming holidays. There is still an unimplemented IPsec method, IPSEC_CTLINPUT, and an unused hdrsz field in struct inpcbpolicy. We can use them to handle inbound ICMP NEEDFRAG messages and adjust the required room for the TCP protocol.
(In reply to Eugene Grosbein from comment #8)
> Can you enable some TCP service at FreeBSD side (f.e. inetd/echo or ftpd)
> and check it out if Windows sets DF=1 for initial encrypted TCP SYN
> when you connect from Windows to FreeBSD over such IPSec transport
> mode configuration?

I've finally found time to do that. 192.168.3.80 is a Windows 2012 server, 192.168.3.1 is FreeBSD with daytime and ftpd services enabled. As you see from the packet dump, all ESP packets have the DF flag set.
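A check like this can be repeated mechanically on a capture. The sketch below counts ESP packets and how many of them carry the DF flag, reading verbose `tcpdump -n -v` output on stdin. It assumes the usual verbose IPv4 header line, which contains `flags [DF]` and `proto ESP (50)`; the function name is hypothetical and it is not how the attached dump was produced.

```sh
#!/bin/sh
# Sketch: summarize DF usage on ESP packets from `tcpdump -n -v` text
# read on stdin. Prints "<total ESP packets> <ESP packets with DF>".
# ASSUMES the verbose IPv4 header format, for example:
#   IP (tos 0x0, ttl 128, id 1, offset 0, flags [DF], proto ESP (50), length 120)
count_esp_df() {
    awk '
        index($0, "proto ESP (50)") {
            total++
            if ($0 ~ /DF\]/) df++     # matches "flags [DF]" and "flags [+, DF]"
        }
        END { printf "%d %d\n", total, df }
    '
}

# Intended use (the interface name is an example):
#   tcpdump -n -v -i em0 esp | count_esp_df
```

If both numbers are equal, every captured ESP packet had DF set, which is the situation described for both the FreeBSD and the Windows sender.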
Created attachment 210202
ESP from Windows server to FreeBSD
(In reply to Eugene Grosbein from comment #19)
> I still wait for testing results from Victor.
> If we get good results and agreement with other developers, we ought just clear DF unconditionally.

I'm beginning to feel that the solution is not as simple as clearing the DF flag unconditionally. Windows does not do that, as seen from the packet dump I attached yesterday (https://bugs.freebsd.org/bugzilla/attachment.cgi?id=210202), and still FTP from a Windows host works over an IPSec transport mode.
The more I think of it, the more I feel that the idea of removing the DF flag from ESP packets is incorrect. Because in IPv6, there is no flag to remove. If an IPv6 packet was not fragmented by the originator, there is nothing to be done in transit.
I think anyone looking into this should look at the later RFCs, not the original ones: https://tools.ietf.org/html/rfc4301 section 4.1 and appendix D.1 might be the best starting points. Searching the document for "transport" will easily yield more.
(In reply to Bjoern A. Zeeb from comment #26)
Bjoern, can you formulate in a few words of your own what behavior you deem appropriate in accordance with the later RFCs?

I can only say that what we have now is completely broken: you enable IPSec transport mode between FreeBSD hosts on your LAN (very easy and elegant with strongswan, as it turns out) and bummer! Your TCP does not work any more.
A few years ago I used multiple routing tables to have a different MTU in one table, which was used for processes that were going to use tunnels or ipsec. I can't remember the details of how I forced it, but my memory is that the tunnels went to table 1, which was 1500, and everything else went to the default table, which was 1400.
(In reply to Julian Elischer from comment #28)
> I used the multiple routing tables to have different MTU

This is one of the workarounds and we have even discussed something similar in the comments, but should not IPsec "just work" out of the box? That should be our goal.
Any update on the issue? I have 2 OPNsense (FreeBSD) machines connected via an IPSec tunnel, and packets above some size are dropped from time to time. IPSec should work "out of the box".
(In reply to Pawel Zeddi from comment #30)
The proposed IPSEC_CTLINPUT() was committed in https://reviews.freebsd.org/D30992 but I did not track which versions contain this code. Some issues related to MTU and the DF bit should probably be solved by this commit.
(In reply to Andrey V. Elsukov from comment #31)
The patch is for FreeBSD 14, which means OPNsense will update to this version in a few years. Very inconvenient (to put it mildly).
Sorry Franco for adding you to the loop; maybe you can have a look at the patch. It may be worth backporting to the OPNsense kernel :)