Bug 284944 - pf: incorrect ICMP error translation in af-to inet
Summary: pf: incorrect ICMP error translation in af-to inet
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 15.0-CURRENT
Hardware: Any Any
: --- Affects Only Me
Assignee: Kristof Provost
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2025-02-21 09:48 UTC by Lexi Winter
Modified: 2025-03-05 16:30 UTC (History)
1 user (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Lexi Winter freebsd_triage 2025-02-21 09:48:30 UTC
it's possible i'm doing something wrong here since i thought this problem should be fixed by 25dbba4fc6e152a05e091180b2e031ab495ba337, but i'm using f5aff1871d3273b3cd3621ea5d3e37cdd807e66f (with a couple of local patches, including the fix for bug 284866) and i'm still seeing the issue.

the problem is that traceroute over af-to NAT64 returns garbage once it hits pf:

# traceroute6 64:ff9b::1.1.1.1
traceroute6 to 64:ff9b::1.1.1.1 (64:ff9b::101:101) from 2a00:1098:6b:200::1, 64 hops max, 28 byte packets
 1  uk-myb-1.le-fay.org (2a00:1098:6b:100::1)  0.544 ms  0.411 ms  0.305 ms
 2  uk-aai-1.le-fay.org (2001:8b0:aab5:100::1)  6.738 ms  7.558 ms  7.520 ms
 3  64:ff9b::101:101 (64:ff9b::101:101)  12.666 ms  12.443 ms  11.981 ms
 4  64:ff9b::101:101 (64:ff9b::101:101)  12.904 ms  11.460 ms  13.006 ms
 5  64:ff9b::101:101 (64:ff9b::101:101)  14.095 ms  13.377 ms  13.012 ms
 6  64:ff9b::101:101 (64:ff9b::101:101)  12.984 ms  13.523 ms  14.175 ms
 7  64:ff9b::101:101 (64:ff9b::101:101)  13.939 ms  13.436 ms  13.025 ms

on the router's external interface, i see the correct outgoing and incoming traffic:

09:42:39.937871 IP (tos 0x0, ttl 1, id 4079, offset 0, flags [none], proto UDP (17), length 40)
    81.187.47.193.25587 > 1.1.1.1.33441: UDP, length 12

09:42:39.942958 IP (tos 0x0, ttl 64, id 13105, offset 0, flags [none], proto ICMP (1), length 56)
    90.155.53.128 > 81.187.47.193: ICMP time exceeded in-transit, length 36
        IP (tos 0x0, id 4079, offset 0, flags [none], proto UDP (17), length 40)
    81.187.47.193.25587 > 1.1.1.1.33441: UDP, length 12

but the translated outgoing error on the internal interface is wrong:

09:42:39.937819 IP6 (hlim 1, next-header UDP (17) payload length: 20) 2a00:1098:6b:200::1.25587 > 64:ff9b::101:101.33441: [udp sum ok] UDP, length 12

09:42:39.942997 IP6 (hlim 64, next-header ICMPv6 (58) payload length: 56) 64:ff9b::101:101 > 2a00:1098:6b:200::1: [icmp6 sum ok] ICMP6, time exceeded in-transit for 64:ff9b::101:101

specifically the source address is 64:ff9b::101:101, when it should be 64:ff9b::5a9b:3580 (64:ff9b::90.155.53.128).

the af-to rule is fairly straightforward:

anchor on ep.uk-aai-1 {
        pass from fe80::/10 to any
        pass in from <lf> to any
        pass in inet6 proto ipv6-icmp from <dn42> to any icmp6-type echoreq
        pass in proto tcp from <dn42> to <dn42-fuchsia> port { smtp, domain, http, https }
        pass in proto udp from <dn42> to <dn42-fuchsia> port domain
        pass in from any to 64:ff9b::/96 af-to inet from 81.187.47.193/32
}
Comment 1 Kristof Provost freebsd_committer freebsd_triage 2025-02-25 09:41:52 UTC
I can confirm that bug, and I see why it's broken. This even affects OpenBSD.

I'm testing a patch along these lines:
diff --git a/sys/netpfil/pf/pf.c b/sys/netpfil/pf/pf.c
index f3c9ea7a2fb1..ac4bab45ffda 100644
--- a/sys/netpfil/pf/pf.c
+++ b/sys/netpfil/pf/pf.c
@@ -8109,8 +8109,18 @@ pf_test_state_icmp(struct pf_kstate **state, struct pf_pdesc *pd,
                                            nk->port[didx], 1, pd->af, nk->af);
                                        m_copyback(pd2.m, pd2.off, sizeof(uh),
                                            (c_caddr_t)&uh);
-                                       PF_ACPY(&pd->nsaddr,
-                                           &nk->addr[pd2.sidx], nk->af);
+                                       if (pd->af == AF_INET) {
+                                               struct pf_addr prefix, nsaddr;
+                                               int prefixlen = in6_mask2len(
+                                                   (struct in6_addr *)&(*state)->rule->dst.addr.v.a.mask, NULL);
+                                               if (prefixlen < 32)
+                                                       prefixlen = 96;
+                                               PF_ACPY(&prefix, &nk->addr[pd2.sidx], nk->af);
+                                               PF_ACPY(&nsaddr, pd->src, pd->af);
+                                               inet_nat64(AF_INET6, pd->src, &nsaddr, &prefix,
+                                                   prefixlen);
+                                               PF_ACPY(&pd->nsaddr, &nsaddr, AF_INET6);
+                                       }
                                        PF_ACPY(&pd->ndaddr,
                                            &nk->addr[pd2.didx], nk->af);
                                        pd->naf = nk->af;

(Though that only fixes the issue for UDP ICMP payloads, and can stand some cleanup.)
Comment 2 Lexi Winter freebsd_triage 2025-02-25 09:44:38 UTC
would you like me to test this patch here?  i have a non-critical NAT64 gateway i can update to test.
Comment 3 Kristof Provost freebsd_committer freebsd_triage 2025-02-25 09:56:33 UTC
(In reply to Lexi Winter from comment #2)
It's a bit early for that, I think. It needs more cleanup and needs to also be implemented for other protocols (TCP, ICMP and SCTP).
Comment 4 Lexi Winter freebsd_triage 2025-02-25 10:13:10 UTC
i enjoy living on the edge, so i applied it anyway for shits and giggles.

seems to work for UDP traceroute:

root@rose:~ # traceroute6 64:ff9b::1.1.1.1
traceroute6 to 64:ff9b::1.1.1.1 (64:ff9b::101:101) from 2a00:1098:6b:200::2, 64 hops max, 28 byte packets
 1  uk-myb-2.le-fay.org (2a00:1098:6b:200::1)  0.093 ms  0.097 ms  0.042 ms
 2  uk-myb-1.le-fay.org (2a00:1098:6b:100::1)  0.770 ms  0.329 ms  0.273 ms
 3  64:ff9b::ac10:300 (64:ff9b::ac10:300)  0.606 ms  0.469 ms  0.467 ms
 4  64:ff9b::ac10:20b (64:ff9b::ac10:20b)  0.503 ms  0.438 ms  0.409 ms
 5  64:ff9b::5d5d:850a (64:ff9b::5d5d:850a)  0.388 ms
    64:ff9b::5d5d:850b (64:ff9b::5d5d:850b)  0.392 ms  0.396 ms
 6  64:ff9b::5d5d:8501 (64:ff9b::5d5d:8501)  0.780 ms  0.616 ms  0.636 ms
 7  64:ff9b::539:514b (64:ff9b::539:514b)  2.228 ms
    64:ff9b::c342:e1b3 (64:ff9b::c342:e1b3)  9.416 ms  13.658 ms
 8  64:ff9b::8d65:4785 (64:ff9b::8d65:4785)  1.645 ms
    64:ff9b::8d65:475d (64:ff9b::8d65:475d)  2.334 ms
    64:ff9b::8d65:476b (64:ff9b::8d65:476b)  2.420 ms
 9  64:ff9b::101:101 (64:ff9b::101:101)  1.634 ms  1.866 ms  1.609 ms

(the weird routing is expected, this ISP does L4 ECMP routing.)

however ICMP traceroute doesn't work at all, which i think is a regression:

root@rose:~ # traceroute6 -I 64:ff9b::1.1.1.1
traceroute6 to 64:ff9b::1.1.1.1 (64:ff9b::101:101) from 2a00:1098:6b:200::2, 64 hops max, 20 byte packets
 1  uk-myb-2.le-fay.org (2a00:1098:6b:200::1)  0.078 ms  0.055 ms  0.046 ms
 2  uk-myb-1.le-fay.org (2a00:1098:6b:100::1)  0.370 ms  0.266 ms  0.241 ms
 3  * * *
 4  * * *
 5  * * *
 6  *^C

ping is fine though:

root@rose:~ # ping 64:ff9b::1.1.1.1
PING(56=40+8+8 bytes) 2a00:1098:6b:200::2 --> 64:ff9b::101:101
16 bytes from 64:ff9b::101:101, icmp_seq=0 hlim=56 time=2.089 ms
16 bytes from 64:ff9b::101:101, icmp_seq=1 hlim=56 time=2.066 ms
Comment 5 Kristof Provost freebsd_committer freebsd_triage 2025-02-28 11:01:59 UTC
There's a more complete patch in https://reviews.freebsd.org/D49143 along with test cases in https://reviews.freebsd.org/D49144

The current version of the fix is based on a proposed OpenBSD patch. I'm likely to go with the OpenBSD version, if only to reduce diff between OpenBSD and FreeBSD pf.
Comment 6 Lexi Winter freebsd_triage 2025-02-28 14:03:22 UTC
as mentioned on IRC,

applied D49143 on top of 8cebb0630046a8eb10c551a856397ed230e73833 but it doesn't seem to work.

FreeBSD yarrow.eden.le-fay.org 15.0-CURRENT FreeBSD 15.0-CURRENT #6 lf/main-n269114-4eb4ddc3c02f: Fri Feb 28 12:00:07 GMT 2025     srcmastr@hemlock.eden.le-fay.org:/build/obj/freebsd/build/src/freebsd/lf/main/amd64.amd64/sys/LF-PF amd64

UDP traceroute works:

root@yarrow:~ # traceroute6 64:ff9b::1.1.1.1
traceroute6 to 64:ff9b::1.1.1.1 (64:ff9b::101:101) from 2a00:1098:6b:100::2, 64 hops max, 28 byte packets
 1  uk-myb-1.le-fay.org (2a00:1098:6b:100::1)  0.209 ms  0.123 ms  0.108 ms
 2  64:ff9b::ac10:300 (64:ff9b::ac10:300)  0.448 ms  0.343 ms  0.339 ms
 3  64:ff9b::ac10:20b (64:ff9b::ac10:20b)  0.284 ms
    64:ff9b::ac10:20a (64:ff9b::ac10:20a)  0.394 ms  0.300 ms
 4  64:ff9b::5d5d:850a (64:ff9b::5d5d:850a)  0.271 ms  0.353 ms  0.380 ms
 5  64:ff9b::5d5d:8501 (64:ff9b::5d5d:8501)  0.511 ms  0.495 ms
    64:ff9b::5d5d:8506 (64:ff9b::5d5d:8506)  0.654 ms
 6  64:ff9b::c342:e1b3 (64:ff9b::c342:e1b3)  2.757 ms  1.474 ms *
 7  64:ff9b::8d65:475d (64:ff9b::8d65:475d)  9.995 ms
    64:ff9b::8d65:473d (64:ff9b::8d65:473d)  2.095 ms
    64:ff9b::8d65:4787 (64:ff9b::8d65:4787)  12.876 ms
 8  64:ff9b::101:101 (64:ff9b::101:101)  1.492 ms  1.263 ms  1.477 ms

ICMP does not:

root@yarrow:~ # traceroute6 -I 64:ff9b::1.1.1.1
traceroute6 to 64:ff9b::1.1.1.1 (64:ff9b::101:101) from 2a00:1098:6b:100::2, 64 hops max, 20 byte packets
 1  uk-myb-1.le-fay.org (2a00:1098:6b:100::1)  0.216 ms  0.177 ms  0.128 ms
 2  * * *

even though the incoming packets seem fine:

14:01:00.708206 IP6 (hlim 1, next-header ICMPv6 (58) payload length: 20) 2a00:1098:6b:100::2 > 64:ff9b::101:101: [icmp6 sum ok] ICMP6, echo request, id 3038, seq 1
14:01:00.708270 IP6 (hlim 64, next-header ICMPv6 (58) payload length: 68) 2a00:1098:6b:100::1 > 2a00:1098:6b:100::2: [icmp6 sum ok] ICMP6, time exceeded in-transit for 64:ff9b::101:101
14:01:00.713707 IP6 (hlim 1, next-header ICMPv6 (58) payload length: 20) 2a00:1098:6b:100::2 > 64:ff9b::101:101: [icmp6 sum ok] ICMP6, echo request, id 3038, seq 2
14:01:00.713766 IP6 (hlim 64, next-header ICMPv6 (58) payload length: 68) 2a00:1098:6b:100::1 > 2a00:1098:6b:100::2: [icmp6 sum ok] ICMP6, time exceeded in-transit for 64:ff9b::101:101
14:01:00.713893 IP6 (hlim 1, next-header ICMPv6 (58) payload length: 20) 2a00:1098:6b:100::2 > 64:ff9b::101:101: [icmp6 sum ok] ICMP6, echo request, id 3038, seq 3
14:01:00.713933 IP6 (hlim 64, next-header ICMPv6 (58) payload length: 68) 2a00:1098:6b:100::1 > 2a00:1098:6b:100::2: [icmp6 sum ok] ICMP6, time exceeded in-transit for 64:ff9b::101:101
14:01:00.714056 IP6 (hlim 2, next-header ICMPv6 (58) payload length: 20) 2a00:1098:6b:100::2 > 64:ff9b::101:101: [icmp6 sum ok] ICMP6, echo request, id 3038, seq 4
14:01:00.714496 IP6 (class 0xc0, hlim 63, next-header ICMPv6 (58) payload length: 68) 64:ff9b::ac10:300 > 2a00:1098:6b:100::2: [icmp6 sum ok] ICMP6, time exceeded in-transit for 64:ff9b::101:101
14:01:05.727039 IP6 (hlim 2, next-header ICMPv6 (58) payload length: 20) 2a00:1098:6b:100::2 > 64:ff9b::101:101: [icmp6 sum ok] ICMP6, echo request, id 3038, seq 5
14:01:05.727525 IP6 (class 0xc0, hlim 63, next-header ICMPv6 (58) payload length: 68) 64:ff9b::ac10:300 > 2a00:1098:6b:100::2: [icmp6 sum ok] ICMP6, time exceeded in-transit for 64:ff9b::101:101
14:01:10.749036 IP6 (hlim 2, next-header ICMPv6 (58) payload length: 20) 2a00:1098:6b:100::2 > 64:ff9b::101:101: [icmp6 sum ok] ICMP6, echo request, id 3038, seq 6
14:01:10.749507 IP6 (class 0xc0, hlim 63, next-header ICMPv6 (58) payload length: 68) 64:ff9b::ac10:300 > 2a00:1098:6b:100::2: [icmp6 sum ok] ICMP6, time exceeded in-transit for 64:ff9b::101:101
14:01:15.784896 IP6 (hlim 3, next-header ICMPv6 (58) payload length: 20) 2a00:1098:6b:100::2 > 64:ff9b::101:101: [icmp6 sum ok] ICMP6, echo request, id 3038, seq 7
14:01:15.785237 IP6 (class 0xc0, hlim 62, next-header ICMPv6 (58) payload length: 68) 64:ff9b::ac10:20b > 2a00:1098:6b:100::2: [icmp6 sum ok] ICMP6, time exceeded in-transit for 64:ff9b::101:101

the only difference i see between hop 1 and hop 2 is the packets for hop 2 have a traffic class set, but i'm not sure why that would make any difference.
Comment 7 Kristof Provost freebsd_committer freebsd_triage 2025-03-04 17:40:51 UTC
(In reply to Lexi Winter from comment #6)
Can you apply https://reviews.freebsd.org/D49231 on top as well?

With the entire series my Windows client can now traceroute over nat64.
Comment 8 Lexi Winter freebsd_triage 2025-03-05 02:22:52 UTC
the test patch didn't apply here:

Patching file tests/sys/netpfil/pf/nat64.py using Plan A...
No such line 237 in input file, ignoring
Hunk #1 failed at 238.
Hunk #2 failed at 282.
2 out of 2 hunks failed while patching tests/sys/netpfil/pf/nat64.py

however the pf.c patch is fine.  for reference, this is the diff between my local branch and freebsd/main: https://www.le-fay.org/tmp/30d/netpfil.diff

i'm rebuilding now.
Comment 9 Lexi Winter freebsd_triage 2025-03-05 03:06:32 UTC
the test patch didn't apply because i didn't have D49144.

looks much better now, both ICMP and UDP traceroute work:

root@uk-jmp-2:~ # traceroute6 -I 64:ff9b::1.1.1.1
traceroute6 to 64:ff9b::1.1.1.1 (64:ff9b::101:101) from 2001:ba8:404a:100::1, 64 hops max, 20 byte packets
 1  uk-myb-2.le-fay.org (2a00:1098:6b:200::1)  1.547 ms  0.985 ms  1.119 ms
 2  uk-myb-1.le-fay.org (2a00:1098:6b:100::1)  1.987 ms  1.978 ms  1.704 ms
 3  64:ff9b::ac10:300 (64:ff9b::ac10:300)  2.044 ms  2.539 ms  2.114 ms
 4  64:ff9b::ac10:20b (64:ff9b::ac10:20b)  2.142 ms  1.928 ms  2.099 ms
 5  64:ff9b::5d5d:850b (64:ff9b::5d5d:850b)  1.744 ms  2.146 ms  1.447 ms
 6  64:ff9b::5d5d:8501 (64:ff9b::5d5d:8501)  1.353 ms  1.398 ms  1.358 ms
 7  64:ff9b::539:514b (64:ff9b::539:514b)  3.650 ms  2.611 ms  2.834 ms
 8  64:ff9b::8d65:4787 (64:ff9b::8d65:4787)  2.565 ms  3.320 ms  2.981 ms
 9  64:ff9b::101:101 (64:ff9b::101:101)  2.465 ms  2.113 ms  2.297 ms

root@uk-jmp-2:~ # traceroute6  64:ff9b::1.1.1.1
traceroute6 to 64:ff9b::1.1.1.1 (64:ff9b::101:101) from 2001:ba8:404a:100::1, 64 hops max, 28 byte packets
 1  uk-myb-2.le-fay.org (2a00:1098:6b:200::1)  1.003 ms  0.866 ms  0.796 ms
 2  uk-myb-1.le-fay.org (2a00:1098:6b:100::1)  1.039 ms  1.020 ms  1.127 ms
 3  64:ff9b::ac10:300 (64:ff9b::ac10:300)  1.307 ms  1.163 ms  1.146 ms
 4  64:ff9b::ac10:20a (64:ff9b::ac10:20a)  1.246 ms
    64:ff9b::ac10:20b (64:ff9b::ac10:20b)  1.329 ms  1.101 ms
 5  64:ff9b::5d5d:850a (64:ff9b::5d5d:850a)  1.042 ms  1.105 ms  1.209 ms
 6  64:ff9b::5d5d:8501 (64:ff9b::5d5d:8501)  1.313 ms  1.479 ms  1.329 ms
 7  64:ff9b::c342:e1b3 (64:ff9b::c342:e1b3)  10.437 ms  12.894 ms  2.782 ms
 8  64:ff9b::8d65:4785 (64:ff9b::8d65:4785)  2.225 ms
    64:ff9b::8d65:473d (64:ff9b::8d65:473d)  27.034 ms
    64:ff9b::8d65:4785 (64:ff9b::8d65:4785)  2.668 ms
 9  64:ff9b::101:101 (64:ff9b::101:101)  1.944 ms  2.210 ms  2.240 ms

also UDP and TCP in quick testing.  i'll leave it running for a bit and see if any other problems crop up.
Comment 10 commit-hook freebsd_committer freebsd_triage 2025-03-05 09:38:58 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=2f77491169cacafd269fb653bec11087d85af035

commit 2f77491169cacafd269fb653bec11087d85af035
Author:     Kristof Provost <kp@FreeBSD.org>
AuthorDate: 2025-02-25 13:20:18 +0000
Commit:     Kristof Provost <kp@FreeBSD.org>
CommitDate: 2025-03-05 09:37:57 +0000

    pf tests: test ICMP error translation with nat64

    Ensure that when we translate an ICMPv4 to ICMPv6 message we set the correct
    source IP address.

    PR:             284944
    Sponsored by:   Rubicon Communications, LLC ("Netgate")
    Differential Revision:  https://reviews.freebsd.org/D49144

 tests/sys/netpfil/pf/nat64.py | 94 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 92 insertions(+), 2 deletions(-)
Comment 11 commit-hook freebsd_committer freebsd_triage 2025-03-05 09:39:00 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=e9a490b10d6cf2006bc2442648f8b2de39aacc5f

commit e9a490b10d6cf2006bc2442648f8b2de39aacc5f
Author:     Kristof Provost <kp@FreeBSD.org>
AuthorDate: 2025-03-04 16:13:38 +0000
Commit:     Kristof Provost <kp@FreeBSD.org>
CommitDate: 2025-03-05 09:37:56 +0000

    pf: fix ICMP source address translation for nat64

    While handling an ICMP error related to another state (e.g. TTL expired, port
    closed, fragmentation needed, ...) we can't just use the state's source address
    as the ICMP source address. We have to translate the IPv4 address back to an
    IPv6 nat64 address.

    Failing to do so breaks things like traceroute, where the intermediate router
    generates an ICMP error message, and the traceroute tool uses the source address
    to build the path.

    Use the state's source address to figure out the prefix, and copy the IPv4 IP
    address to the last bytes to construct the mapped IPv6 address.

    PR:             284944
    Sponsored by:   Rubicon Communications, LLC ("Netgate")
    In collaboration with:  sashan <sashan@openbsd.org>, dac07517c7
    Differential Revision:  https://reviews.freebsd.org/D49143

 sys/netpfil/pf/pf.c | 72 ++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 55 insertions(+), 17 deletions(-)