Bug 248474

Summary: if_ipsec: NAT broken on IPsec/VTI
Product: Base System
Reporter: Ziomalski <kokosmaps>
Component: misc
Assignee: freebsd-net (Nobody) <net>
Status: Open ---
Severity: Affects Some People
CC: ae, crest, eugen, franco, jimp, kp, m.muenz
Priority: ---    
Version: Unspecified   
Hardware: amd64   
OS: Any   
Attachments:
strongswan work-around patch (flags: none)

Description Ziomalski 2020-08-04 20:31:29 UTC
Per pfSense documentation and many forum posts going back 5 years, NAT is still not possible on routed IPsec/VTI tunnels. 

When NAT is set up, packets are correctly translated and sent out the tunnel. However, returning packets never cross back into the LAN.

Here is an example. 
https://forum.netgate.com/topic/132970/ipsec-vti-tunnels/31

Last sentence of documentation.
https://docs.netgate.com/pfsense/en/latest/vpn/ipsec/ipsec-routed.html

From everything I can find, the issue resides in the if_ipsec implementation in FreeBSD. Debian-based systems like VyOS and EdgeMax have no issues with this.
Comment 1 crest 2020-08-04 21:36:39 UTC
I can assure you that on FreeBSD 12.1 if_ipsec(4) and strongSwan can work behind NAT44, because I'm using it to keep my home lab connected to other servers in a colo. I suspect this is a configuration problem. Can you share the configuration generated by pfSense and packet traces on the egress, the ipsec* and enc0 interfaces? Some of those will contain plaintext, so please double-check them before publishing them.
Comment 2 Ziomalski 2020-08-05 06:07:19 UTC
(In reply to crest from comment #1)
The reason I posted here was because of the following pfSense Dev response:
https://forum.netgate.com/topic/155803/nat-still-broken-on-ipsec-vti/2

I am currently on pfS 2.4.5, which is still FreeBSD 11.3. I have my 192.168 LAN subnet that needs to communicate across a VTI as a single IP 10.x.y.z with NAT. Packet capture on the VTI shows correct translation in both directions, but the traffic never reaches back to my LAN. I have also noticed that the default deny rule on the WAN shows the 10.x.y.z destination as blocked. My ipsec firewall tab has an allow *all* rule.

If you are positive about 12.1, I think my best bet is to spool up the new 20.7 Opnsense and give it a go there. 

I can provide the details to my current config but I think this is a dead end with 11.3

Thanks for your help!
Comment 3 Eugene Grosbein freebsd_committer 2020-08-05 07:04:33 UTC
First, one should disregard any "5 year old" problem report that has not been verified recently, because the FreeBSD ipsec code got a major rewrite and many improvements since 11.1-RELEASE.

Next, FreeBSD has several NAT implementations. The problem may be specific to pf and not affect natd or ipfw nat.

You need to use tcpdump and carefully inspect filtering rules to see what's happening.
Comment 4 Eugene Grosbein freebsd_committer 2020-08-05 07:56:23 UTC
Created attachment 217021 [details]
strongswan work-around patch

Also, it is possible you hit an obscure problem in kernel+strongswan co-operation: strongswan unconditionally uses IPSEC_LEVEL_UNIQUE while talking to the kernel, which may be inappropriate for setups similar to yours.

Sadly, strongswan has no configuration option that lets the user switch to IPSEC_LEVEL_USE, which solves the problem. Here I attach a quick-n-dirty work-around patch for strongswan.

You should save it to /usr/ports/security/strongswan/files/patch-kernel_pfkey_ipsec.c and rebuild/reinstall strongswan. Neither strongswan nor pf reconfiguration is required.

Please try it and report back.
Comment 5 Eugene Grosbein freebsd_committer 2020-08-05 08:06:46 UTC
Also, it would be nice if you used an unpatched system that exhibits the problem to monitor the output of:

netstat -sp ipsec|fgrep "inbound packets violated"

Check whether the counter grows when you try passing traffic over the IPsec tunnel.
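
The check above can be scripted. What follows is only a minimal sketch: it assumes the matching line of the netstat output begins with the numeric counter as its first field, and the function names polvio_count and polvio_delta are made up for illustration.

```sh
# Extract the "inbound packets violated" counter from `netstat -sp ipsec`
# output supplied on stdin. Prints 0 if no such line is present.
polvio_count() {
    awk '/inbound packets violated/ { n = $1 } END { print n + 0 }'
}

# Report whether the counter grew between two samples.
# $1 = earlier value, $2 = later value.
polvio_delta() {
    if [ "$2" -gt "$1" ]; then
        echo "counter grew by $(( $2 - $1 ))"
    else
        echo "counter unchanged"
    fi
}

# Usage on the affected host (uncomment to run):
# before=$(netstat -sp ipsec | polvio_count)
# sleep 10    # pass traffic over the tunnel in the meantime
# after=$(netstat -sp ipsec | polvio_count)
# polvio_delta "$before" "$after"
```

A growing counter would point at inbound policy violations. A counter that stays flat suggests the problem lies elsewhere, for example on the outgoing path.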
Comment 6 Eugene Grosbein freebsd_committer 2020-08-05 08:19:06 UTC
CC'ing ae. Maybe Andrey has something to say on the topic.
Comment 7 Andrey V. Elsukov freebsd_committer 2020-08-05 10:07:25 UTC
if_ipsec works with ipfw's NAT; we have had many such installations for years.
I think you use PF, and it won't work in some configurations because of its design and how IPsec is handled in FreeBSD.
Comment 8 Michael Muenz 2020-08-05 13:52:38 UTC
Hi,

I just set up a test tunnel between two OPNsense 20.7 (12.1).

Site-A (10.10.12.0/24) - FW-A --- FW-B - Site-B (10.10.10.0/24)

Ping without NAT from A to B is no problem. When I set a NAT rule (pf of course) on interface ipsec4000 so that the source address is 192.168.199.1 instead of 10.10.12.0, I can see the packet correctly translated on ipsec4000 and enc0, but no ESP packet going to the WAN of FW-B, so there seems to be a missing hook somewhere.

TCPDUMP on LAN FW-A:
root@PB-FW1-KARL:~ # tcpdump -n -i vtnet1 icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vtnet1, link-type EN10MB (Ethernet), capture size 262144 bytes
15:32:22.768297 IP 10.10.12.51 > 10.10.10.1: ICMP echo request, id 18103, seq 26, length 64
15:32:23.840814 IP 10.10.12.51 > 10.10.10.1: ICMP echo request, id 18103, seq 27, length 64
15:32:24.913369 IP 10.10.12.51 > 10.10.10.1: ICMP echo request, id 18103, seq 28, length 64


TCPDUMP on ipsec4000 FW-A:
root@PB-FW1-KARL:~ # tcpdump -n icmp -i ipsec4000
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ipsec4000, link-type NULL (BSD loopback), capture size 262144 bytes
15:32:49.288388 IP 192.168.199.1 > 10.10.10.1: ICMP echo request, id 4355, seq 51, length 64
15:32:50.350974 IP 192.168.199.1 > 10.10.10.1: ICMP echo request, id 4355, seq 52, length 64
15:32:51.423561 IP 192.168.199.1 > 10.10.10.1: ICMP echo request, id 4355, seq 53, length 64


TCPDUMP on enc0 FW-A:

root@PB-FW1-KARL:~ # tcpdump -n icmp -i enc0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on enc0, link-type ENC (OpenBSD encapsulated IP), capture size 262144 bytes
15:32:38.737241 (authentic,confidential): SPI 0xc520971a: IP 192.168.199.1 > 10.10.10.1: ICMP echo request, id 4355, seq 41, length 64
15:32:39.752371 (authentic,confidential): SPI 0xc520971a: IP 192.168.199.1 > 10.10.10.1: ICMP echo request, id 4355, seq 42, length 64
15:32:40.824808 (authentic,confidential): SPI 0xc520971a: IP 192.168.199.1 > 10.10.10.1: ICMP echo request, id 4355, seq 43, length 64


Also pf seems to work correctly:
root@PB-FW1-KARL:~ # pfctl -s all | grep 10.10.10.
nat on ipsec4000 inet proto icmp from any to 10.10.10.0/24 -> 192.168.199.1 port 1024:65535
all icmp 10.10.10.1:24247 <- 10.10.12.51:24247       0:0
all icmp 192.168.199.1:25905 (10.10.12.51:24247) -> 10.10.10.1:25905       0:0

Setting nat on enc0 will just be ignored:
root@PB-FW1-KARL:~ # pfctl -s all | grep 10.10.10.
nat on enc0 inet proto icmp from any to 10.10.10.0/24 -> 192.168.199.1 port 1024:65535
all icmp 10.10.10.1:26295 <- 10.10.12.51:26295       0:0
all icmp 10.10.12.51:26295 -> 10.10.10.1:26295       0:0


It seems Andrey (Hi, we were in touch about VTI and OPNsense around 2 years ago) is right, I loaded ipfw and ipfw_nat in OPNsense and did:

ipfw nat 123 config ip 192.168.199.1
ipfw add 124 nat 123 ip4 from 10.10.12.0/24 to 10.10.10.0/24 out

After this I can see packets on Site-B coming from 192.168.199.1, but no reply; this seems to be just another missing ipfw rule.
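
One guess at the missing rule, offered only as an untested sketch: ipfw nat normally needs a second rule that hands return traffic addressed to the alias IP back to the same nat instance. The rule number 125 is arbitrary and the match below is an assumption, not something verified in this thread.

```sh
# Hypothetical companion to rule 124: pass replies addressed to the
# alias address through nat instance 123 so they are translated back
# to the original 10.10.12.0/24 source.
ipfw add 125 nat 123 ip4 from 10.10.10.0/24 to 192.168.199.1 in
```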

So for OPNsense I don't see a quick fix; maybe we can try the patch from Eugene if it really has a chance to go upstream :)

Best,
Michael
Comment 9 Andrey V. Elsukov freebsd_committer 2020-08-05 14:20:40 UTC
(In reply to Michael Muenz from comment #8)

AFAIK, pf NAT and route-to rules work as the last point in the network stack, i.e. pf doesn't reinject the packet back into the stack, so there is no way for IPsec to catch the packet and apply the IPsec transformation. If you want to make it work, you need to patch pf(4) and add IPSEC_OUTPUT()/IPSEC_FORWARD() calls at the points where pf sends to the network interface, as the IP output routines do. Some changes are probably also required in the inbound path.

I don't think the change proposed for strongswan will help.
Comment 10 Eugene Grosbein freebsd_committer 2020-08-05 14:27:15 UTC
(In reply to Andrey V. Elsukov from comment #9)

Your notes are relevant for problems with outgoing traffic. Are they relevant for incoming traffic that is decrypted before it hits pfil hooks and the pf?
Comment 11 Andrey V. Elsukov freebsd_committer 2020-08-05 14:35:40 UTC
(In reply to Eugene Grosbein from comment #10)

I have only very basic knowledge of PF's internals, but I don't think the problem is with if_ipsec. It does nothing special after decryption: PF will see the packet as decrypted and received on the if_ipsec interface, and I don't see how it can fail to handle such packets.
Comment 12 Andrey V. Elsukov freebsd_committer 2020-08-05 14:38:03 UTC
(In reply to Andrey V. Elsukov from comment #11)

You probably need to disable if_enc's pfil handling to avoid confusion.
Comment 13 Eugene Grosbein freebsd_committer 2020-08-05 14:44:13 UTC
(In reply to Andrey V. Elsukov from comment #11)

The IPsec code adds the PACKET_TAG_IPSEC_IN_DONE tag to the decrypted mbuf and then calls the pfil hooks. Bad things can happen if the mbuf loses PACKET_TAG_IPSEC_IN_DONE during pfil hook processing: ipsec_in_reject() returns error code 1 (invalid) and the packet is dropped, incrementing the ips_in_polvio counter.

Switching to IPSEC_LEVEL_USE is a bad hack, but it helps.
Comment 14 Michael Muenz 2020-08-05 14:59:34 UTC
(In reply to Eugene Grosbein from comment #4)
I added the patch and built a new pkg, but it doesn't change anything.
I can see the translated packets but no ESP packets leaving the WAN.
Comment 15 Eugene Grosbein freebsd_committer 2020-08-05 15:32:36 UTC
(In reply to Michael Muenz from comment #14)

The patch can help only if the unpatched system shows growing counters in the output of: netstat -sp ipsec|fgrep "inbound packets violated"

If you have a problem with outgoing traffic, that is a different case.
Comment 16 Eugene Grosbein freebsd_committer 2020-08-05 15:39:52 UTC
(In reply to Michael Muenz from comment #14)

Every transit packet going from LAN to WAN first passes the pfil hooks as an incoming packet, before the routing lookup for its destination; then the routing lookup is performed to determine the outgoing interface; then the packet passes the pfil hooks a second time as outgoing traffic.

If one needs to perform NAT for outgoing traffic first and IPsec processing afterwards, it must be done this way: configure the translation on the first pass, before the routing lookup, as opposed to the more traditional second pass.
Comment 17 Andrey V. Elsukov freebsd_committer 2020-08-06 12:28:50 UTC
Did you try disabling if_enc's pfil handling?

% sysctl net.enc | grep filter
net.enc.out.ipsec_filter_mask: 0
net.enc.in.ipsec_filter_mask: 0

You can also try enabling the filtertunnel variables:

% sysctl net | grep filtertunnel
net.inet.ipsec.filtertunnel: 1
net.inet6.ipsec6.filtertunnel: 1
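
For anyone reproducing this, the values shown above could be applied and verified roughly as follows. This is a sketch only: the function names apply_tunables and check_tunables are made up for illustration, and making the settings persistent (for example through /etc/sysctl.conf on plain FreeBSD, or the tunables page of a *sense GUI) is left out.

```sh
# Apply the values from the sysctl output above (run as root).
apply_tunables() {
    sysctl net.enc.in.ipsec_filter_mask=0 &&
    sysctl net.enc.out.ipsec_filter_mask=0 &&
    sysctl net.inet.ipsec.filtertunnel=1 &&
    sysctl net.inet6.ipsec6.filtertunnel=1
}

# Read "name: value" lines (as printed by sysctl) on stdin and compare
# them with the values above. Prints one line per mismatch or missing
# variable and returns non-zero if anything differs.
check_tunables() {
    awk -F': ' '
        BEGIN {
            want["net.enc.out.ipsec_filter_mask"] = 0
            want["net.enc.in.ipsec_filter_mask"] = 0
            want["net.inet.ipsec.filtertunnel"] = 1
            want["net.inet6.ipsec6.filtertunnel"] = 1
        }
        ($1 in want) {
            seen[$1] = 1
            if ($2 + 0 != want[$1]) {
                print $1 " is " $2 ", expected " want[$1]
                bad = 1
            }
        }
        END {
            for (k in want)
                if (!(k in seen)) { print k " not reported"; bad = 1 }
            exit bad + 0
        }
    '
}

# Usage (uncomment to run):
# sysctl net.enc net.inet.ipsec.filtertunnel net.inet6.ipsec6.filtertunnel | check_tunables
```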
Comment 18 Michael Muenz 2020-08-06 13:33:21 UTC
(In reply to Andrey V. Elsukov from comment #17)
Beautiful Andrey, really nice! That did it for OPNsense (pfSense for sure too). 

I'll add a note to the documentation to set a tunable.
Comment 19 Ziomalski 2020-08-07 06:21:10 UTC
(In reply to Andrey V. Elsukov from comment #17)
WOW. Thank you Andrey. I was able to get a connection after adding these changes. pfSense 2.4.5. 
I'm going to be doing more testing and hopefully get it in production soon. I'll report if there are any issues. 

This has been a huge problem for me and I was hitting dead ends for weeks. Many thanks.

Cheers!
Comment 20 Michael Muenz 2020-08-07 07:10:07 UTC
I have not tested too deeply but there *may* be strange side effects when using filtering and NAT (SPD) when using route-based and legacy policy-based IPsec tunnels on the same system.

If the *sense is just using route-based it should be OK.
Comment 21 Eugene Grosbein freebsd_committer 2020-08-07 08:01:46 UTC
(In reply to Michael Muenz from comment #20)

Route-based and legacy policy-based IPsec tunnels co-exist just fine on the same system.
Comment 22 Michael Muenz 2020-08-07 08:25:07 UTC
(In reply to Eugene Grosbein from comment #21)
Sure, they can. This is only related to *sense, so closing this one here is just fine.

Thanks for all your efforts.
Comment 23 Ziomalski 2020-08-07 19:14:39 UTC
(In reply to Michael Muenz from comment #22)
Thanks Michael for your comments/testing. 

Can you expand a bit on mixing route/policy based connections? I actually require one of each for my setup. My production is running on EdgeMax and this VTI/NAT issue was my last road-block to switching to pf/opn-sense, or so I thought.

[VTI]
LAN(192.168../16) -> filtered dest. subnets -> VTI with NAT(10.../32)

[Policy]
LAN(192.168../16) -> Remote net(60.../29) -> Tunnel with NAT(193.../32)
Local-193.../32
Remote-60.../29

Both of these VPNs are only used one way. The far end does not connect to our resources.

You have me worried with your statement and so any advice would be great. Are you a dev for one of the sense? Should I move this to a forum?

I'm a bit under-experienced compared to you guys(especially with the backend stuff) so I really appreciate the help.
Comment 24 Kubilay Kocak freebsd_committer freebsd_triage 2020-08-08 01:16:59 UTC
^Triage: Correct resolution, FIXED is for resolution by way of a change (usually a commit). If there's anything to be improved related to this issue (documentation, defaults, or otherwise), please re-open the issue with additional information and/or a change proposal
Comment 25 jimp 2020-09-29 19:01:47 UTC
The suggested corrections in this issue only solve the problem for a small number of cases. Sacrificing filtering on enc in favor of if_ipsec isn't viable if someone needs both policy-based and route-based IPsec tunnels to different peers at the same time. The number of instances with a mix of both is much larger than instances which are purely using if_ipsec.

At least with filtering on enc the firewall can filter traffic for both, just no NAT or per-interface rules. If you disable filtering on enc, if_ipsec rules would work but traffic would flow freely and unfiltered on enc for policy-based tunnels, which is a security risk.

The ideal solution would allow both to coexist peacefully rather than being forced to choose. For example, policy-based traffic would filter on enc, while route-based traffic would not be processed by pfil on enc, but would filter on each individual if_ipsec interface instead.

Should this issue be reopened, or should there be a new issue framing this as a feature request instead of a bug?
Comment 26 Ziomalski 2020-09-30 21:55:58 UTC
Re-opening bug: It is currently not possible to simultaneously have Routed IPsec with NAT and Policy IPsec.

Jimp's comment above has the best description of the issue. 

If this issue is supposed to be moved to another location, I will not re-open again. 

Thank you all
Comment 27 Eugene Grosbein freebsd_committer 2020-10-01 02:15:17 UTC
(In reply to Ziomalski from comment #26)

This is not true: "It is currently not possible to simultaneously have Routed IPsec with NAT and Policy IPsec". I have ipsec-tools/racoon running as an IKEv1 daemon with "policy ipsec" for incoming L2TP/IPsec end-user VPNs, strongswan as an IKEv2 initiator for a LAN-to-LAN "routed ipsec" (ipsec0 interface) VPN, and ipfw nat. It works like a charm.
Comment 28 Eugene Grosbein freebsd_committer 2020-10-01 02:16:11 UTC
(In reply to Eugene Grosbein from comment #27)

Forgot to note, I use FreeBSD 11.4.
Comment 29 Michael Muenz 2020-10-01 11:07:01 UTC
(In reply to Eugene Grosbein from comment #27)

Indeed, the problem description should be adjusted to say that "only" NAT via pf is affected.
Comment 30 jimp 2020-10-01 14:16:12 UTC
You can have both route-based and policy-based IPsec active at once but you cannot filter both at once in the expected manner.

It is not limited to NAT rules, it affects both NAT and firewall rules in pf (and presumably others) which attempt to filter directly on if_ipsec interfaces while filtering is also active on the enc interface.
Comment 31 Eugene Grosbein freebsd_committer 2020-10-01 14:44:50 UTC
(In reply to jimp from comment #30)

With ipfw you don't even need to filter on enc pseudo-interface.