Bug 201519

Summary: pf NAT translates ICMP type 3 packets incorrectly
Product: Base System
Reporter: Alexey Pereklad <mybox>
Component: bin
Assignee: Kristof Provost <kp>
Status: Closed FIXED
Severity: Affects Only Me
CC: admin, clbuisson, franco, kp, kredaxx, maximos, mybox, pi, tablosazi.farahan
Priority: ---
Version: 10.3-STABLE
Hardware: Any
OS: Any

Description Alexey Pereklad 2015-07-13 09:10:38 UTC
I have an issue with pf in FreeBSD 9.3. It looks like something is wrong with pf's NAT when it processes ICMP type 3 (destination unreachable) packets. Here is what I see on the LAN interface:

16:46:10.334993 IP (tos 0xc0, ttl 64, id 63254, offset 0, flags [none], proto ICMP (1), length 289)
10.12.0.198 > 84.47.xx.yy: ICMP 10.12.0.198 udp port 8293 unreachable, length 269
IP (tos 0x0, ttl 60, id 34284, offset 0, flags [none], proto UDP (17), length 261)
84.47.xx.yy.53 > 10.12.0.198.8293: 37288 2/4/4 www.jdm022.com. CNAME sbsfe-p8.geo.mf0.yahoodns.net., sbsfe-p8.geo.mf0.yahoodns.net. A 98.138.19.143 (233)

That is, a server (84.47.xx.yy) sends a UDP packet to the client (10.12.0.198, port 8293). This port is closed on the client, so the client sends an ICMP "port unreachable" packet to the server 84.47.xx.yy. This ICMP packet contains the header of the UDP packet that was sent to the client's closed port:

84.47.xx.yy.53 > 10.12.0.198.8293: 37288 2/4/4 www.jdm022.com. CNAME sbsfe-p8.geo.mf0.yahoodns.net., sbsfe-p8.geo.mf0.yahoodns.net. A 98.138.19.143 (233)

And this is what I see on the external WAN interface:

16:46:10.335012 IP (tos 0xc0, ttl 63, id 63254, offset 0, flags [none], proto ICMP (1), length 289)
10.12.0.198 > 84.47.xx.yy: ICMP 213.208.kkk.zz udp port 61534 unreachable, length 269
IP (tos 0x0, ttl 60, id 34284, offset 0, flags [none], proto UDP (17), length 261)
84.47.xx.yy.53 > 213.208.kkk.zz.61534: 37288 2/4/4 www.jdm022.com. CNAME sbsfe-p8.geo.mf0.yahoodns.net., sbsfe-p8.geo.mf0.yahoodns.net. A 98.138.19.143 (233)

As you can see, pf translated the UDP header embedded in the ICMP packet: "ICMP 213.208.kkk.zz udp port 61534 unreachable". 213.208.kkk.zz is the IP address of my external WAN interface, where NAT is applied. But pf did not change the source address of the ICMP packet itself, so an outgoing ICMP "port unreachable" packet with source address 10.12.0.198 appears ON THE EXTERNAL interface.
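
To make the expected behaviour concrete, here is a minimal sketch in C of what an outbound translation of such an ICMP error would have to rewrite. It is not pf's code: the structures, the function name nat_out_icmp_error() and the helper ip4() are hypothetical, checksums are ignored, and the addresses in the usage note are documentation placeholders because the real ones are masked in the captures above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of an ICMP error and the packet it quotes. */
struct icmp_error {
	uint32_t outer_src;    /* sender of the ICMP error (the LAN client) */
	uint32_t outer_dst;    /* host the error is addressed to (the server) */
	uint32_t inner_src;    /* quoted packet: its source (the server) */
	uint32_t inner_dst;    /* quoted packet: its destination */
	uint16_t inner_sport;  /* quoted packet: source port */
	uint16_t inner_dport;  /* quoted packet: destination port */
};

/* Hypothetical NAT mapping: LAN address/port <-> WAN address/port. */
struct nat_map {
	uint32_t lan_addr;
	uint16_t lan_port;
	uint32_t wan_addr;
	uint16_t wan_port;
};

/* Pack a dotted quad into a host-order 32-bit value. */
static uint32_t
ip4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
	return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
	    ((uint32_t)c << 8) | (uint32_t)d;
}

/*
 * Translate an ICMP error leaving through the WAN interface.
 * Returns 1 if the mapping matched and the packet was rewritten, else 0.
 */
static int
nat_out_icmp_error(struct icmp_error *e, const struct nat_map *m)
{
	assert(e != NULL && m != NULL);
	if (e->inner_dst != m->lan_addr || e->inner_dport != m->lan_port)
		return 0;
	/* Quoted header: restore the address/port the server actually sent to. */
	e->inner_dst = m->wan_addr;
	e->inner_dport = m->wan_port;
	/* Outer header: hide the private source. The captures show this step missing. */
	e->outer_src = m->wan_addr;
	return 1;
}
```

With a mapping of 10.12.0.198:8293 to a placeholder WAN address 203.0.113.5:61534, a correctly translated error would leave the WAN interface with 203.0.113.5 both as the outer source and as the destination in the quoted UDP header. In the capture above only the quoted header was rewritten.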

I also found that pf cannot block this kind of packet. A rule like:

block out quick on $wan_if proto icmp from 10.12/16 to any icmp-type 3 code 3

does not work at all. So I have to use IPFW to block those ICMP packets.

Here is my NAT rule:

nat on $wan_if from <clients> to any -> 213.208.kkk.zz

The table <clients> is defined like this:

table <clients> { 10.12/16, 10.13/16 }

I also found a mention of this issue in OpenBSD pf: http://openbsd-archive.7691.n7.nabble.com/system-6564-pf-not-nating-does-not-see-icmp4-port-unreachable-packets-from-machine-behind-pf-td187997.html
They said that this bug was fixed in 2011. Is it still not fixed in FreeBSD 9.3?

My system: FreeBSD vpn2-lesnoy.isp.local 9.3-RELEASE-p2 FreeBSD 9.3-RELEASE-p2 #0: Mon Sep 15 16:44:27 UTC 2014 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64 



I checked if I can reproduce this issue with CURRENT. Well, CURRENT has the same problem. Here is my test lab:

# uname -a
FreeBSD test-BSD-01.hyperv.local 11.0-CURRENT FreeBSD 11.0-CURRENT #1 r285351: Fri Jul 10 14:49:08 MSK 2015     root@test-BSD-01.hyperv.local:/usr/obj/usr/src/sys/GENERIC  amd64

Here is the dump on the LAN interface:

# tcpdump -npi hn1 host 172.16.129.18
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on hn1, link-type EN10MB (Ethernet), capture size 262144 bytes
11:43:25.506775 IP 172.16.129.18.29490 > 208.67.220.220.53: 9125+ A? freebsd.org. (29)
11:43:25.570851 IP 208.67.220.220.53 > 172.16.129.18.29490: 9125 1/0/0 A 8.8.178.110 (45)
11:43:25.571635 IP 172.16.129.18 > 208.67.220.220: ICMP 172.16.129.18 udp port 29490 unreachable, length 36

Dump on the external WAN interface at the same moment:

# tcpdump -npi hn0 \(udp and port 53\) or icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on hn0, link-type EN10MB (Ethernet), capture size 262144 bytes
11:43:30.741672 IP 213.208.xx.yy.55677 > 208.67.220.220.53: 1319+ A? ya.ru. (23)
11:43:30.795961 IP 208.67.220.220.53 > 213.208.xx.yy.55677: 1319 3/0/0 A 93.158.134.3, A 213.180.193.3, A 213.180.204.3 (71)
11:43:30.796700 IP 172.16.129.18 > 208.67.220.220: ICMP 213.208.xx.yy udp port 55677 unreachable, length 36

Here is my /etc/pf.conf:

nat on hn0 from 172.16.129.18 to any -> hn0
pass in all
pass out all
Comment 1 kredaxx 2016-03-05 18:39:29 UTC
I have the exact same problem on:

FreeBSD r1 10.2-RELEASE-p5 FreeBSD 10.2-RELEASE-p5 #0: Sun Oct 11 14:19:57 CEST 2015
Comment 2 Kurt Jaeger freebsd_committer freebsd_triage 2016-05-21 19:53:31 UTC
See

https://lists.freebsd.org/pipermail/freebsd-pf/2016-May/008047.html

for a patch.
Comment 3 Max 2016-05-21 22:18:34 UTC
This patch is not fully tested. It is against releng/10.3.

--- sys/netpfil/pf/pf.c.orig    2016-05-21 17:57:29.420602000 +0300
+++ sys/netpfil/pf/pf.c 2016-05-22 00:54:16.043961000 +0300
@@ -4793,8 +4793,7 @@ pf_test_state_icmp(struct pf_state **sta
                                    &nk->addr[pd2.didx], pd2.af) ||
                                    nk->port[pd2.didx] != th.th_dport)
                                        pf_change_icmp(pd2.dst, &th.th_dport,
-                                           NULL, /* XXX Inbound NAT? */
-                                           &nk->addr[pd2.didx],
+                                           saddr, &nk->addr[pd2.didx],
                                            nk->port[pd2.didx], NULL,
                                            pd2.ip_sum, icmpsum,
                                            pd->ip_sum, 0, pd2.af);
@@ -4866,8 +4865,7 @@ pf_test_state_icmp(struct pf_state **sta
                                    &nk->addr[pd2.didx], pd2.af) ||
                                    nk->port[pd2.didx] != uh.uh_dport)
                                        pf_change_icmp(pd2.dst, &uh.uh_dport,
-                                           NULL, /* XXX Inbound NAT? */
-                                           &nk->addr[pd2.didx],
+                                           saddr, &nk->addr[pd2.didx],
                                            nk->port[pd2.didx], &uh.uh_sum,
                                            pd2.ip_sum, icmpsum,
                                            pd->ip_sum, 1, pd2.af);
@@ -4934,8 +4932,7 @@ pf_test_state_icmp(struct pf_state **sta
                                    &nk->addr[pd2.didx], pd2.af) ||
                                    nk->port[pd2.didx] != iih.icmp_id)
                                        pf_change_icmp(pd2.dst, &iih.icmp_id,
-                                           NULL, /* XXX Inbound NAT? */
-                                           &nk->addr[pd2.didx],
+                                           saddr, &nk->addr[pd2.didx],
                                            nk->port[pd2.didx], NULL,
                                            pd2.ip_sum, icmpsum,
                                            pd->ip_sum, 0, AF_INET);
@@ -4987,8 +4984,7 @@ pf_test_state_icmp(struct pf_state **sta
                                    &nk->addr[pd2.didx], pd2.af) ||
                                    nk->port[pd2.didx] != iih.icmp6_id)
                                        pf_change_icmp(pd2.dst, &iih.icmp6_id,
-                                           NULL, /* XXX Inbound NAT? */
-                                           &nk->addr[pd2.didx],
+                                           saddr, &nk->addr[pd2.didx],
                                            nk->port[pd2.didx], NULL,
                                            pd2.ip_sum, icmpsum,
                                            pd->ip_sum, 0, AF_INET6);
@@ -5027,8 +5023,7 @@ pf_test_state_icmp(struct pf_state **sta

                                if (PF_ANEQ(pd2.dst,
                                    &nk->addr[pd2.didx], pd2.af))
-                                       pf_change_icmp(pd2.src, NULL,
-                                           NULL, /* XXX Inbound NAT? */
+                                       pf_change_icmp(pd2.dst, NULL, saddr,
                                            &nk->addr[pd2.didx], 0, NULL,
                                            pd2.ip_sum, icmpsum,
                                            pd->ip_sum, 0, pd2.af);
Comment 4 commit-hook freebsd_committer freebsd_triage 2016-05-23 12:41:55 UTC
A commit references this bug:

Author: kp
Date: Mon May 23 12:41:29 UTC 2016
New revision: 300501
URL: https://svnweb.freebsd.org/changeset/base/300501

Log:
  pf: Fix ICMP translation

  Fix ICMP source address rewriting in rdr scenarios.

  PR:		201519
  Submitted by:	Max <maximos@als.nnov.ru>
  MFC after:	1 week

Changes:
  head/sys/netpfil/pf/pf.c
Comment 5 Kristof Provost freebsd_committer freebsd_triage 2016-05-23 12:42:38 UTC
(In reply to Max from comment #3)
Awesome work Max! I'll try to MFC this to stable/10 next week.
Comment 6 Max 2016-05-23 13:01:45 UTC
(In reply to Kristof Provost from comment #5)
https://svnweb.freebsd.org/base/head/sys/netpfil/pf/pf.c?annotate=300501&pathrev=300501#l5017
should be "pf_change_icmp(pd2.dst, NULL, saddr,", not "pf_change_icmp(pd2.src, NULL, saddr,"
Comment 7 commit-hook freebsd_committer freebsd_triage 2016-05-23 14:00:05 UTC
A commit references this bug:

Author: kp
Date: Mon May 23 13:59:49 UTC 2016
New revision: 300508
URL: https://svnweb.freebsd.org/changeset/base/300508

Log:
  pf: Fix more ICMP mistranslation

  In the default case fix the substitution of the destination address.

  PR:		201519
  Submitted by:	Max <maximos@als.nnov.ru>
  MFC after:	1 week

Changes:
  head/sys/netpfil/pf/pf.c
Comment 8 commit-hook freebsd_committer freebsd_triage 2016-05-30 01:22:04 UTC
A commit references this bug:

Author: kp
Date: Mon May 30 01:21:44 UTC 2016
New revision: 300979
URL: https://svnweb.freebsd.org/changeset/base/300979

Log:
  MFC 300501, 300508

  pf: Fix ICMP translation

  Fix ICMP source address rewriting in rdr scenarios.

  pf: Fix more ICMP mistranslation

  In the default case fix the substitution of the destination address.

  PR:		201519
  Submitted by:	Max <maximos@als.nnov.ru>

Changes:
_U  stable/10/
  stable/10/sys/netpfil/pf/pf.c
Comment 9 clbuisson 2016-08-03 17:03:13 UTC
After upgrading my router/firewall from 9.3-STABLE svn 299225 to 10.3-STABLE svn 303269,
I found that NATed traceroutes from the internal network to an external system
displayed the IPv4 addresses/names of the final destination system instead of
the IPv4 addresses/names of the intermediate systems/routers.

I reverted 300979 and got the correct traceroute address/name display again.

So I dare think that the bug cannot be closed.
Comment 10 Kristof Provost freebsd_committer freebsd_triage 2016-08-03 21:00:42 UTC
(In reply to clbuisson from comment #9)
I'm afraid I don't understand what the problem is.

Can you add a description of your network setup, the trace route output and a network capture (please specify where in the network the capture was made)?
Comment 11 clbuisson 2016-08-03 21:40:10 UTC
There is nothing complicated in my setup!

1. An Internal network with "private" IPv4 addresses
2. A gateway/router/firewall connected to this internal network and to the
Internet (ADSL), NATing the traffic between 1 and 3
3. The Internet with any system, for example www.freebsd.org

On a system on the internal network, if I do

traceroute www.freebsd.org

I get

- first line: the internal address/name of the gateway (OK)
- a number of lines, one for each intermediate router on the Internet, but
labelled with the address/name of www.freebsd.org (!OK)
- last line: the address/name of www.freebsd.org (OK)

Details seem irrelevant (anyone can find the address of www.freebsd.org), and
the effect of outgoing NAT on UDP or ICMP (in the case of traceroute -I) is
presumably known. It is clear that the bug is in the NAT of the ICMP TIME_EXCEEDED
messages received from the Internet (the address of the responding router is wrongly replaced with the address of the traceroute target).
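
For reference, the inbound translation that traceroute depends on can be sketched as follows. This is only an illustration under simplifying assumptions, not pf's implementation: the structures, nat_in_icmp_error() and ip4() are hypothetical names, checksums are ignored, and all addresses in the usage note are placeholders.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of an ICMP error and the packet it quotes. */
struct icmp_error {
	uint32_t outer_src;    /* router that generated TIME_EXCEEDED */
	uint32_t outer_dst;    /* where the error is delivered */
	uint32_t inner_src;    /* quoted probe: its source */
	uint32_t inner_dst;    /* quoted probe: the traceroute target */
	uint16_t inner_sport;  /* quoted probe: source port */
	uint16_t inner_dport;  /* quoted probe: destination port */
};

/* Hypothetical NAT mapping: LAN address/port <-> WAN address/port. */
struct nat_map {
	uint32_t lan_addr;
	uint16_t lan_port;
	uint32_t wan_addr;
	uint16_t wan_port;
};

/* Pack a dotted quad into a host-order 32-bit value. */
static uint32_t
ip4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
	return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
	    ((uint32_t)c << 8) | (uint32_t)d;
}

/*
 * Translate an ICMP error arriving on the WAN interface for a NATed probe.
 * Returns 1 if the mapping matched and the packet was rewritten, else 0.
 */
static int
nat_in_icmp_error(struct icmp_error *e, const struct nat_map *m)
{
	assert(e != NULL && m != NULL);
	if (e->outer_dst != m->wan_addr || e->inner_src != m->wan_addr ||
	    e->inner_sport != m->wan_port)
		return 0;
	/* Deliver the error to the internal client. */
	e->outer_dst = m->lan_addr;
	/* Quoted probe: restore the client's own address and port. */
	e->inner_src = m->lan_addr;
	e->inner_sport = m->lan_port;
	/* outer_src is deliberately left alone: it identifies the hop. */
	return 1;
}
```

The point of the sketch is the last comment: the outer source address is what traceroute prints for each hop, so overwriting it with the quoted packet's destination would produce exactly the output described above, with every intermediate hop labelled as the target.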
Comment 12 Kristof Provost freebsd_committer freebsd_triage 2016-08-04 06:24:09 UTC
(In reply to clbuisson from comment #11)
I'm unable to reproduce the described behaviour on my system. Please make a network capture so we can look in detail at what's going wrong.
Comment 13 Vladyslav V. Prodan 2016-08-09 14:26:29 UTC
(In reply to clbuisson from comment #11)

Please show your network diagram (L1 and L2),
as well as the route to the external IP.

I'm on FreeBSD 10.3-STABLE r302074 and also see a bunch of miracles happening with traceroute :(
However, I also use carp and route-to with several uplinks ...
Comment 14 Kristof Provost freebsd_committer freebsd_triage 2016-08-09 14:30:05 UTC
(In reply to Vladislav V. Prodan from comment #13)
I've been talking to clbuisson@orange.fr in private, and it looks like there is indeed something wrong in 10.3, but not in 11 or 12.
Right now I have no idea why.
Comment 15 Franco Fichtner 2016-08-10 09:08:21 UTC
I can confirm that the patches break traceroute output on 10.3.  Can this be reopened?
Comment 16 Kristof Provost freebsd_committer freebsd_triage 2016-08-10 09:12:57 UTC
Yes, it's on the top of my list.
Comment 17 Kristof Provost freebsd_committer freebsd_triage 2016-08-17 13:39:58 UTC
I suspect I know what the cause is. stable/10 does not include the fix for 204005, so PF_ANEQ() doesn't work correctly.

Merging r289932 and r289940 should fix the problem.
I'm currently building a version of stable/10 with the fix; if I'm correct, this will be fixed soon.
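
This thread does not say how PF_ANEQ() misbehaves without the fix for 204005. Purely as an illustration, and as an assumption rather than a description of the actual defect, the sketch below shows why an "addresses not equal" test over a 128-bit address container has to take the address family into account: for IPv4 only the first 32-bit word is meaningful, and comparing the remaining words can report a difference between two identical IPv4 addresses. The names addr128, addr_family and addr_neq() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical address families for this sketch. */
enum addr_family { FAM_INET, FAM_INET6 };

/* Hypothetical 128-bit container shared by IPv4 and IPv6 addresses. */
struct addr128 {
	uint32_t addr32[4];
};

/*
 * Return 1 if a and b differ, 0 if they are equal.
 * For IPv4 only the first word is compared, so stale data in the
 * remaining three words cannot make equal addresses look different.
 */
static int
addr_neq(const struct addr128 *a, const struct addr128 *b,
    enum addr_family af)
{
	assert(a != NULL && b != NULL);
	if (af == FAM_INET)
		return a->addr32[0] != b->addr32[0];
	return a->addr32[0] != b->addr32[0] ||
	    a->addr32[1] != b->addr32[1] ||
	    a->addr32[2] != b->addr32[2] ||
	    a->addr32[3] != b->addr32[3];
}
```

A comparison that wrongly reports "not equal" would matter in the patched code paths because each pf_change_icmp() call in the patch from comment #3 is guarded by PF_ANEQ(), so a false positive would trigger a rewrite that should not happen.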
Comment 18 Kristof Provost freebsd_committer freebsd_triage 2016-08-17 15:21:40 UTC
This should be fixed as of r304293 in stable/10.

Can one of the affected users confirm so we can close?
Comment 19 clbuisson 2016-08-17 16:18:49 UTC
Running now with a patched kernel: first (quick) tests are positive!

Thank you for your work.
Comment 20 Franco Fichtner 2016-08-17 16:36:12 UTC
Looks good, thanks!