Bug 268717 - [pf] [ipnat] rdr rules don't work for traffic originating at localhost
Summary: [pf] [ipnat] rdr rules don't work for traffic originating at localhost
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 13.1-RELEASE
Hardware: Any Any
: --- Affects Some People
Assignee: freebsd-pf (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-01-02 13:28 UTC by dfr
Modified: 2023-07-30 10:13 UTC (History)
4 users (show)

See Also:


Attachments
Script to setup a test. Run this, then try 'telnet 10.123.0.1 8080' (646 bytes, application/x-sh)
2023-01-02 13:28 UTC, dfr
no flags Details
second test scenario showing redirect failing if both ends are on the same bridge (943 bytes, application/x-sh)
2023-01-03 15:58 UTC, dfr
no flags Details
possible fix for redirects initiated by localhost (1.31 KB, patch)
2023-01-05 08:35 UTC, dfr
no flags Details | Diff
Better fix for the problem, covering both pf and ipfilter (4.52 KB, patch)
2023-01-30 11:57 UTC, dfr
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description dfr 2023-01-02 13:28:53 UTC
Created attachment 239212 [details]
Script to setup a test. Run this, then try 'telnet 10.123.0.1 8080'

I am using rules that look like this:

rdr log (all) inet proto tcp from any to ! 10.123.0.2 port = 8080 -> 10.123.0.2 port 80

to allow access to workloads running in jails/containers which is a common pattern for container runtimes like containerd or podman. Typically, a workload (e.g. nginx) runs in a container on its usual port and is 'published' to the host by mapping a host port to the container's address and port. In the example above, the container is in a vnet jail with address 10.123.0.2 and the intention is to publish the container's port 80 as the host's port 8080.

This type of rule works well for traffic originating on some other machine with destination set to the host's IP and port 8080 but when traffic originates on the host itself, the rule matches but traffic doesn't flow. As far as I can tell, the pf rule matches outgoing packets (e.g. SYN for new connections) as being 'received' by lo0 where the packet is rewritten to the container IP/port and travels to the container. The return packet (e.g. SYN+ACK) is not received and packet traces show the host rejecting with RST.

My hypothesis is that a state entry is created for the outgoing packet and registed with lo0 but the reply is received on a different interface (in my case a bridge) and the reverse translation is not performed.
Comment 1 dfr 2023-01-02 13:48:25 UTC
Packet trace showing the problem:

# telnet 10.123.0.1 8080
Trying 10.123.0.1...
13:03:58.531882 IP (tos 0x10, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    10.123.0.1.13733 > 10.123.0.2.http: Flags [S], cksum 0x3465 (incorrect -> 0x5ae4), seq 2886169781, win 65535, options [mss 16344,nop,wscale 6,sackOK,TS val 3565974209 ecr 0], length 0
13:03:58.531897 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    10.123.0.2.http > 10.123.0.1.13733: Flags [S.], cksum 0x9ccf (correct), seq 1244013501, ack 2886169782, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 640967696 ecr 3565974209], length 0
13:03:58.531908 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
    10.123.0.1.13733 > 10.123.0.2.http: Flags [R], cksum 0x4435 (correct), seq 2886169782, win 0, length 0
13:03:59.557528 IP (tos 0x10, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    10.123.0.1.13733 > 10.123.0.2.http: Flags [S], cksum 0x3465 (incorrect -> 0x56dd), seq 2886169781, win 65535, options [mss 16344,nop,wscale 6,sackOK,TS val 3565975240 ecr 0], length 0
13:03:59.557595 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    10.123.0.2.http > 10.123.0.1.13733: Flags [S.], cksum 0x94c1 (correct), seq 1244013501, ack 2886169782, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 640968727 ecr 3565975240], length 0
13:03:59.557655 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
    10.123.0.1.13733 > 10.123.0.2.http: Flags [R], cksum 0x4435 (correct), seq 2886169782, win 0, length 0
13:04:01.760495 IP (tos 0x10, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    10.123.0.1.13733 > 10.123.0.2.http: Flags [S], cksum 0x3465 (incorrect -> 0x4e42), seq 2886169781, win 65535, options [mss 16344,nop,wscale 6,sackOK,TS val 3565977443 ecr 0], length 0
13:04:01.760564 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    10.123.0.2.http > 10.123.0.1.13733: Flags [S.], cksum 0x838b (correct), seq 1244013501, ack 2886169782, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 640970930 ecr 3565977443], length 0
13:04:01.760625 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
    10.123.0.1.13733 > 10.123.0.2.http: Flags [R], cksum 0x4435 (correct), seq 2886169782, win 0, length 0
Comment 2 dfr 2023-01-03 15:58:23 UTC
Created attachment 239234 [details]
second test scenario showing redirect failing if both ends are on the same bridge
Comment 3 dfr 2023-01-03 15:59:07 UTC
Further testing show more strangeness. If I add a second vnet jail on the same bridge and attempt the telnet that should redirect, it also fails. The rule matches when the host receives the SYN and is re-written and directed to the first jail. The SYN+ACK reply doesn't match the state entry and is transmitted without reversing the redirect.

I've added a second test script to set this up - run the script then 'jexec testvnet2 telnet 10.123.0.1 8080'
Comment 4 dfr 2023-01-03 16:34:38 UTC
The second scenario with two vnet jails on the same bridge is possibly the bridge 'helping' by delivering the SYN+ACK reply directly, without allowing PF to re-write. Adding debug printfs to pf seems to show that it never sees the reply.
Comment 5 dfr 2023-01-03 16:45:18 UTC
Turns out I was missing 'sysctl net.link.bridge.pfil_member=1' for the two jails redirecting via a bridge scenario so ignore that part. Adding this doesn't affect the original scenario with the host originating the redirected connection.
Comment 6 dfr 2023-01-05 08:35:25 UTC
Created attachment 239274 [details]
possible fix for redirects initiated by localhost

Redirect rules are triggered on PF_IN events to allow the rule to replace the destination address+port and also on PF_OUT events to reverse the replacement for packets flowing back towards the original source address. If the source is a local address, this second event is not triggered since the return packet is delivered to the local protocol stack.

A possible fix is to simulate the PF_OUT event for packets destined for local processing, allowing the second part of the redirect to be applied. This does conflict with source address validation in 14-current which I'm disabling for testing. That could be mitigated by relaxing source address validation to allow packets with non-local source addresses pre-filtering.
Comment 7 Kristof Provost freebsd_committer freebsd_triage 2023-01-06 02:39:02 UTC
(In reply to dfr from comment #6)
I've been poking at this a bit as well (briefly, because I'm supposed to be on vacation), and I believe your diagnosis may be correct.

To summarise what I've done so far: after turning your script into a atf-sh test case I can reproduce the problem, and poke at things with dtrace.

The test script sets up a vnet jail (IP 192.0.2.1) with an rdr rule redirecting traffic for 192.0.2.1:8080 to 192.0.2.2:8080.

The following D script:

#!/usr/sbin/dtrace -s

fbt:pf:pf_state_insert:entry
{
        s = (struct pf_state_key *)arg2;

        printf("%x -> %x, %d:%d af %d proto %d\n", s->addr[0].pfa.v4.s_addr,
                s->addr[1].pfa.v4.s_addr,
                ntohs(s->port[0]), ntohs(s->port[1]), s->af, s->proto);

        s = (struct pf_state_key *)arg3;

        printf("%x -> %x, %d:%d af %d proto %d", s->addr[0].pfa.v4.s_addr,
                s->addr[1].pfa.v4.s_addr,
                ntohs(s->port[0]), ntohs(s->port[1]), s->af, s->proto);

stack();
}

fbt:pf:pf_find_state:entry
{
        s = (struct pf_state_key *)arg1;

        printf("%x -> %x, %d:%d af %d proto %d", s->addr[0].pfa.v4.s_addr,
                s->addr[1].pfa.v4.s_addr,
                ntohs(s->port[0]), ntohs(s->port[1]), s->af, s->proto);
}
fbt:pf:pf_find_state:return
{
        printf("=> %p", arg1);
}

produces 

dtrace: script '/home/kp/pf.dtrace' matched 3 probes
CPU     ID                    FUNCTION:NAME
  5  80283              pf_find_state:entry 10200c0 -> 10200c0, 8080:40477 af 2 proto 6
  5  80284             pf_find_state:return => 0
  6  80283              pf_find_state:entry 10200c0 -> 10200c0, 40477:8080 af 2 proto 6
  6  80284             pf_find_state:return => 0
  6  80537            pf_state_insert:entry 10200c0 -> 10200c0, 40477:8080 af 2 proto 6
10200c0 -> 20200c0, 40477:8080 af 2 proto 6
              pf.ko`pf_test_rule+0x282b
              pf.ko`pf_check_in+0x25
              kernel`pfil_mbuf_in+0x55
              kernel`ip_input+0x3c6
              kernel`swi_net+0x191
              kernel`ithread_loop+0x279
              kernel`fork_exit+0x80
              kernel`0xffffffff810a44ae

  6  80283              pf_find_state:entry 20200c0 -> 10200c0, 8080:40477 af 2 proto 6
  6  80284             pf_find_state:return => 0

So I think that matches your diagnosis, in that we're creating states (both the original 192.0.2.1 to 192.0.2.1 and rdr'd 192.0.2.1 to 192.0.2.2 traffic, but only on PF_IN. When the reply comes in we don't find a corresponding state, and things don't work.

I'm rather wary of inserting special case handling for this in pf_test(). I wonder if we don't need to add a pfil call in the if_lo handling somewhere, to make that traffic match the normal (i.e. non-loopback) traffic patter.
Comment 8 dfr 2023-01-06 08:13:21 UTC
I was also wary of adding complexity to pf_test although there is some precedent here - pf_route does recurse back to pf_test. The other idea I had was to add a second packet filter call from ip_input and ip6_input at the point where the packet is accepted as local but before its handed off to tcp (or udp) but that seemed a riskier option.

Perhaps the PF_OUT test for packets with local destination could be performed in pf_route which would leave pf_test alone?
Comment 9 Kristof Provost freebsd_committer freebsd_triage 2023-01-10 09:21:24 UTC
(In reply to dfr from comment #8)
We don't hit pf_route() in this scenario, so I don't think that'd help.

I'm going to have to find a bit of time to sit down and concentrate on this before I can say something more useful. I hope to do so next week.
Comment 10 dfr 2023-01-10 11:19:52 UTC
You are right - in this case, r->rt will be NULL. I'm not particularly advocating for my attempted fix but I do think this feature is useful - I originally wanted this to work for 'publishing' ports in podman/docker containers to the host which is a very common pattern in the Linux world. Along the way, I noticed that bastillebsd uses the same kind of redirect rule and is also affected.

Thanks for taking the time to look at this - its been on my TODO list for months and I'm just happy to see some progress.
Comment 11 dfr 2023-01-19 14:19:01 UTC
I was testing this again today with a back-port of my patch to stable/13 and it seems to work fairly well so far. It doesn't work if the source address is 127.0.0.1 since the redirected packet still has a 127.0.0.1 as source address but I can ignore that for now.
Comment 12 Kristof Provost freebsd_committer freebsd_triage 2023-01-21 08:52:35 UTC
(In reply to dfr from comment #11)
I've poked at it for a little bit more, but I think I've mostly succeeded in confusing myself more. I think part of the issue is that we only create the state on the inbound side (so after we've sent the packet out on lo0 and done the outbound processing), and then end up sending it out again.

In any event, this is going to need careful analysis and thought, and that's just not going to happen until my vacation is over. I'll keep this on my todo list, but expect no progress before March.
Comment 13 Kristof Provost freebsd_committer freebsd_triage 2023-01-24 08:50:31 UTC
Oh. Wait.

The setup in comment #1 tries to use 'rdr' to redirect an outbound packet. But the pf.conf man page says: "Then either the rdr rules are evaluated on an inbound
 packet or the nat rules on an outbound packet."

So I don't think that's expected to work. You should be able (and indeed, in my test script I can) make this work using 'nat' rather than 'rdr'.
Comment 14 dfr 2023-01-24 13:22:57 UTC
For my use-case, I need to be able to change both destination address and port and currently the nat rule only allows changing the address. Also, I'm not sure that nat will work here since it re-writes the source address and I need to change the destination address.

As I understand the current situation with rdr, for new local connections, a PF_IN event is triggered when the packet leaves the local network stack and this matches the rdr rule, re-writing destination address and port and setting a state to match the reply. Unfortunately for reply packets no corresponding PF_OUT event is triggered when the packet is delivered to the local network stack so the reverse re-write does not happen. This is why my suggested change works since it simulates the PF_OUT event for packets which will be processed locally.
Comment 15 Kristof Provost freebsd_committer freebsd_triage 2023-01-25 08:14:07 UTC
(In reply to dfr from comment #14)
Right, but rdr very much expects to be used on inbound traffic only.
I believe the relevant code to be in pf_get_translation(), where we only look at the RDR ruleset if direction != PF_OUT (i.e. it's PF_IN).

So I think we have three choices:
1) extend nat (or binat) to be able to change the port and destination address (rather than source address).
2) teach rdr to work on PF_OUT
3) Build on the work in https://reviews.freebsd.org/D38025 and use OpenBSD's rdr-to, where the man page at least seems to suggest it can also work out outbound traffic.
Comment 16 dfr 2023-01-25 11:07:37 UTC
As far as I understand the code (which is not very well TBH), both nat and rdr rules rely on both PF_IN and PF_OUT events but differ on which event triggers state creation: PF_IN for rdr and PF_OUT for nat with the opposite event matching the other direction to reverse the re-write.

For connections initiated locally, this symmetry is broken since we generate PF_IN events for the SYN causing the re-write and creating the state but since there is no corresponding PF_OUT for the SYN+ACK reply, the connection fails. This is why I think we need PF_OUT for packets which will be delivered to the local stack rather than routed onward.

I thought about whether it makes sense for rdr state creation to happen on PF_OUT but wouldn't that have other problems since the un-redirected destination address may direct the packet to the wrong outgoing interface?

I haven't looked closely at upstream OpenBSD pf but it seems they have changed the rule language considerably which would likely be a problem for FreeBSD unless there is a compability mode. I seem to remember that they support tables for rdr target addresses which may be useful for managing a pool of load-balanced replicas.

Anyway, for now I have a workable solution for my two (related) use-cases: 'publishing' container ports to the host for podman and mapping service VIPs to containers for kubernetes. I'm looking forward to seeing the right solution for this merged into main (and maybe stable/13).
Comment 17 Kristof Provost freebsd_committer freebsd_triage 2023-01-27 08:59:35 UTC
(In reply to dfr from comment #16)
> As far as I understand the code (which is not very well TBH), both nat and rdr rules rely on both PF_IN and PF_OUT events but differ on which event triggers state creation: PF_IN for rdr and PF_OUT for nat with the opposite event matching the other direction to reverse the re-write.

Sort of. I find it more useful to think about when the packets pass through which parts of the stack. That does imply a direction (i.e. PF_IN/PF_OUT).
In the normal case, when we're thinking of rdr, we'd be dealing with incoming packets, arriving through ip_input(), triggering a PF_IN pf_test().
For the failing case described here we're dealing with packets that are generated locally, and the first time they hit the firewall is from ip_output() (i.e. pf_test(PF_OUT)).

> For connections initiated locally, this symmetry is broken since we generate PF_IN events for the SYN causing the re-write and creating the state but since there is no corresponding PF_OUT for the SYN+ACK reply, the connection fails. This is why I think we need PF_OUT for packets which will be delivered to the local stack rather than routed onward.

For locally generated connections we're only getting the PF_IN after we've already had a PF_OUT. We can't usefully do anything on SYN+ACK packets, and if we're trying to add an artificial PF_OUT we're just papering over a problem that started earlier.

> I thought about whether it makes sense for rdr state creation to happen on PF_OUT but wouldn't that have other problems since the un-redirected destination address may direct the packet to the wrong outgoing interface?

I'm not sure I follow.

> I haven't looked closely at upstream OpenBSD pf but it seems they have changed the rule language considerably which would likely be a problem for FreeBSD unless there is a compability mode. I seem to remember that they support tables for rdr target addresses which may be useful for managing a pool of load-balanced replicas.

Kajetan intends to keep the old syntax supported, at least for a while. I still need to look at his work in detail, which I don't expect to be able to do before the end of February.

> I'm looking forward to seeing the right solution for this merged into main (and maybe stable/13).

To be clear, I'm not making any promises about the time and effort I can spend on this.
Comment 18 dfr 2023-01-27 13:16:24 UTC
(In reply to Kristof Provost from comment #17)

> Sort of. I find it more useful to think about when the packets pass through which parts of the stack. That does imply a direction (i.e. PF_IN/PF_OUT).
In the normal case, when we're thinking of rdr, we'd be dealing with incoming packets, arriving through ip_input(), triggering a PF_IN pf_test().
For the failing case described here we're dealing with packets that are generated locally, and the first time they hit the firewall is from ip_output() (i.e. pf_test(PF_OUT)).

I like this description of the situation. In this case though, locally generated packets are seen by the firewall via ip_input, causing PF_IN event and creating the state:

(kgdb) bt
#0  pf_create_state (r=0xfffffe00041c6df8, nr=0xfffff80027b0d800, a=0x0, nsn=0x0, nk=0xfffff80027b03a50, sk=0xfffff80027b03aa8, m=0xfffff800276b0200, off=20, sport=13975,
    dport=20480, rewrite=0xfffffe000378cae8, sm=0xfffffe000378cce8, tag=-1, bproto_sum=19927, bip_sum=0, hdrlen=20, pd=<optimized out>, kif=<optimized out>)
    at ../../../netpfil/pf/pf.c:4533
#1  pf_test_rule (rm=rm@entry=0xfffffe000378ccd8, sm=sm@entry=0xfffffe000378cce8, direction=direction@entry=1, kif=kif@entry=0xfffff800039b3200,
    m=m@entry=0xfffff800276b0200, off=off@entry=20, pd=0xfffffe000378cc00, am=0xfffffe000378ccb8, rsm=0xfffffe000378cca8, inp=0x0) at ../../../netpfil/pf/pf.c:4483
#2  0xffffffff80e6fc82 in pf_test (dir=dir@entry=1, pflags=65536, ifp=0xfffff800038f2800, m0=m0@entry=0xfffffe000378cdb0, inp=0x0) at ../../../netpfil/pf/pf.c:7217
#3  0xffffffff80e90f15 in pf_check_in (m=0xfffffe000378cdb0, ifp=0x0, flags=41175, ruleset=<optimized out>, inp=0x5000) at ../../../netpfil/pf/pf_ioctl.c:6463
#4  0xffffffff80d506e5 in pfil_mbuf_common (pch=<optimized out>, p=..., p@entry=..., ifp=0xfffff800038f2800, ifp@entry=0xfffffe000378cd80, flags=65536, inp=inp@entry=0x0)
    at ../../../net/pfil.c:214
#5  pfil_mbuf_in (head=0xfffff8000389de00, p=p@entry=..., ifp=ifp@entry=0xfffff800038f2800, inp=inp@entry=0x0) at ../../../net/pfil.c:226
#6  0xffffffff80dcfbb6 in ip_input (m=0xfffff800276b0200) at ../../../netinet/ip_input.c:613
#7  0xffffffff80d4d261 in netisr_process_workstream_proto (nwsp=0xffffffff825866c0, proto=1) at ../../../net/netisr.c:929
#8  swi_net (arg=0xffffffff825866c0) at ../../../net/netisr.c:976

In this case, the interface is lo0 and direction is PF_IN. The problem (at least as I understand it) is that the replies don't generate the corresponding PF_OUT and the state isn't matched. The return packets cause a PF_IN from the incoming interface (e.g. bridge0) which does nothing and then they are delivered to the local tcp stack without reversing the redirect and are immediately rejected with RST.

>> I thought about whether it makes sense for rdr state creation to happen on PF_OUT but wouldn't that have other problems since the un-redirected destination address may direct the packet to the wrong outgoing interface?
>
> I'm not sure I follow.

What I'm trying to say is that for connection attempts to some local address which would match the rdr rule, there will not be a PF_OUT event to trigger for the same reason as what I'm suggesting above for the current PF_IN triggered rule. The packet won't be re-written but instead will be delivered to the local tcp and rejected.

> To be clear, I'm not making any promises about the time and effort I can spend on this.

Also to be clear, I'm not asking for you to do the work necessarily. I am perfectly happy to do it if someone is prepared to review the change. I do have the time to work on this and the motivation since this problem is a blocker for my projects.

I am going to spend some time to see if ipfirewall is also affected. I have a feeling that it might be for similar reasons.
Comment 19 dfr 2023-01-27 15:38:49 UTC
Testing with ipfirewall's ipnat module shows the exact same pattern where the rule matches the initial connection attempt and translates the destination address and port but the reply does not reverse the translation.

Adding a call to pfil_mbuf_out() in ip_input() right before the packet is handed off to upper-layer protocols fixes the problem for both pf and ipnat.
Comment 20 Marek Zarychta 2023-01-27 15:47:18 UTC
(In reply to dfr from comment #190
It looks like a feature, not a bug. Maybe you can rework your PF rules and/or either use multiple fibs with rtables keyword or pass with route-to keyword.
Comment 21 dfr 2023-01-27 15:55:50 UTC
I don't think its a feature - if we are not supporting rdr rules where the initial source address is local, then the rule shouldn't match at all. 

As it is, for both pf and ipnat, the 'outgoing' rule matches, rewriting the destination address and port and creating state but reply packets never match that state because the firewall does not see the reply packet 'leaving' the network to be delivered to local tcp/udp/whatever.
Comment 22 Marek Zarychta 2023-01-27 16:26:06 UTC
(In reply to dfr from comment #21)
>I don't think its a feature
You are right. It's a missing feature, we lack more of them. But should one always submit a PR after finding a missing feature? Probably yes, if it's a patch with the implementation of that feature.
Comment 23 dfr 2023-01-27 16:33:37 UTC
The patch attached to this bug only covers pf. Moving the proposed fix to ip_input/ip6_input seems to fix both pf and ipnat (haven't checked ipfw yet). I will keep testing and update with a new patch
Comment 24 dfr 2023-01-30 11:57:21 UTC
Created attachment 239803 [details]
Better fix for the problem, covering both pf and ipfilter

This approach to fixing the issue moves the change to ip*_input, which works with both pf and ipfilter.
Comment 25 Kristof Provost freebsd_committer freebsd_triage 2023-02-01 08:23:43 UTC
(In reply to dfr from comment #24)
Am I misreading that, or is that now calling the out pfil hook for all incoming traffic?
Comment 26 dfr 2023-02-01 10:28:32 UTC
(In reply to Kristof Provost from comment #25)
Yes, its calling the hook for everything which matches a local address. This is partly why its conditional on a sysctl. It is similar to the previous fix which put the extra out filter in pf_test* but moving it up a level allows it to work for ipfilter as well (and possibly ipfw although I have not tested that).
Comment 27 Kristof Provost freebsd_committer freebsd_triage 2023-02-01 10:40:05 UTC
(In reply to dfr from comment #26)
While I can't articulate a scenario off the top of my head I can almost guarantee that this extra call into the firewall(s) is going to cause issues. It's going to confuse rulesets and break setups.
Comment 28 dfr 2023-02-01 10:54:21 UTC
(In reply to Kristof Provost from comment #27)
This is another reason for being conditional on the sysctl. I am open to alternative suggestions for this but I think we do need to treat the hand-off to local L4+ processing as a potential filtering event.

I think that Linux iptables make this clearer, allowing filters to register for NF_INET_LOCAL_IN or NF_INET_LOCAL_OUT specifically.
Comment 29 Mina Galić freebsd_triage 2023-02-16 18:06:42 UTC
maybe we should move this to Phabricator and/or a mailing list to get broader view?
Comment 30 Kristof Provost freebsd_committer freebsd_triage 2023-02-16 23:34:51 UTC
(In reply to Mina Galić from comment #29)
That would probably get more eyes on this. 

That said, my view is still that the latest proposed patch is the wrong approach. We should implement support for this use case in the firewall, not by adding this extra pfil hook. 

I still hope to look at this once my holiday is over, starting in about two weeks.
Comment 31 dfr 2023-05-24 13:18:50 UTC
I've finally had time to get back to this today and I've opened a review (https://reviews.freebsd.org/D40256) with my proposal which adds optional pfil hooks to capture the missing event for messages delivered locally. The only difference with the patch attached here is that its rebased and I added a unit test.

I still think this is a viable approach, especially since it fixes the issue for both pf and ipfilter (and possibly ipfw but I've not tried to test that).
Comment 32 commit-hook freebsd_committer freebsd_triage 2023-05-31 10:11:45 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=5ab151574c8a1824c6cd8eded28506cb983284bc

commit 5ab151574c8a1824c6cd8eded28506cb983284bc
Author:     Doug Rabson <dfr@FreeBSD.org>
AuthorDate: 2023-05-24 13:11:37 +0000
Commit:     Doug Rabson <dfr@FreeBSD.org>
CommitDate: 2023-05-31 10:11:05 +0000

    netinet*: Fix redirects for connections from localhost

    Redirect rules use PFIL_IN and PFIL_OUT events to allow packet filter
    rules to change the destination address and port for a connection.
    Typically, the rule triggers on an input event when a packet is received
    by a router and the destination address and/or port is changed to
    implement the redirect. When a reply packet on this connection is output
    to the network, the rule triggers again, reversing the modification.

    When the connection is initiated on the same host as the packet filter,
    it is initially output via lo0 which queues it for input processing.
    This causes an input event on the lo0 interface, allowing redirect
    processing to rewrite the destination and create state for the
    connection. However, when the reply is received, no corresponding output
    event is generated; instead, the packet is delivered to the higher level
    protocol (e.g. tcp or udp) without reversing the redirect, the reply is
    not matched to the connection and the packet is dropped (for tcp, a
    connection reset is also sent).

    This commit fixes the problem by adding a second packet filter call in
    the input path. The second call happens right before the handoff to
    higher level processing and provides the missing output event to allow
    the redirect's reply processing to perform its rewrite. This extra
    processing is disabled by default and can be enabled using pfilctl:

            pfilctl link -o pf:default-out inet-local
            pfilctl link -o pf:default-out6 inet6-local

    PR:             268717
    Reviewed-by:    kp, melifaro
    MFC-after:      2 weeks
    Differential Revision: https://reviews.freebsd.org/D40256

 sys/netinet/ip_input.c                  | 22 ++++++++-
 sys/netinet/ip_var.h                    |  4 ++
 sys/netinet6/ip6_input.c                | 19 ++++++++
 sys/netinet6/ip6_var.h                  |  4 ++
 tests/sys/netpfil/common/Makefile       |  1 +
 tests/sys/netpfil/{pf => common}/rdr.sh | 84 +++++++++++++++++++++++++++++----
 tests/sys/netpfil/common/utils.subr     |  4 ++
 tests/sys/netpfil/pf/Makefile           |  1 -
 8 files changed, 127 insertions(+), 12 deletions(-)
Comment 33 commit-hook freebsd_committer freebsd_triage 2023-06-03 10:09:28 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=4a6b92849e619c40ca9a78d38339798f2735ec56

commit 4a6b92849e619c40ca9a78d38339798f2735ec56
Author:     Doug Rabson <dfr@FreeBSD.org>
AuthorDate: 2023-06-02 15:58:33 +0000
Commit:     Doug Rabson <dfr@FreeBSD.org>
CommitDate: 2023-06-03 10:07:56 +0000

    pf: Fix tests broken by enabling inet-local filtering

    Summary:
    Three of the pf dummynet tests were using filter rules which matched
    both the intended epair interface as well as lo0 which now receives
    PFIL_OUT events for messages delivered to the local network stack (if
    enabled). This commit changes the rules to match only for the expected
    epair interface.

    PR:             268717
    Reviewed-by:    kp
    MFC-after:      2 weeks
    Differential Revision: https://reviews.freebsd.org/D40393

 tests/sys/netpfil/common/dummynet.sh | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
Comment 34 commit-hook freebsd_committer freebsd_triage 2023-06-20 14:36:08 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=3a1f834b5228986a7c14fd60da13cf2700e80996

commit 3a1f834b5228986a7c14fd60da13cf2700e80996
Author:     Doug Rabson <dfr@FreeBSD.org>
AuthorDate: 2023-06-20 13:01:58 +0000
Commit:     Doug Rabson <dfr@FreeBSD.org>
CommitDate: 2023-06-20 14:34:01 +0000

    pf: Add code to enable filtering for locally delivered packets

    This is disabled by default since it potentially changes the behavior of
    existing filter rule sets. To enable this extra filter for packets being
    delivered locally, use:

            sysctl net.pf.filter_local=1
            service pf restart

    PR:             268717
    Reviewed-by:    kp
    MFC-after:      2 weeks
    Differential Revision: https://reviews.freebsd.org/D40373

 UPDATING                                     | 12 ++++++++++++
 sys/netpfil/pf/pf_ioctl.c                    | 20 ++++++++++++++++++++
 tests/sys/netpfil/common/utils.subr          |  3 +--
 tests/sys/netpfil/pf/fragmentation_compat.sh |  3 ++-
 tests/sys/netpfil/pf/fragmentation_pass.sh   |  3 ++-
 tests/sys/netpfil/pf/killstate.sh            | 24 ++++++++++++++++--------
 tests/sys/netpfil/pf/map_e.sh                |  3 ++-
 tests/sys/netpfil/pf/pass_block.sh           |  3 ++-
 tests/sys/netpfil/pf/pfsync.sh               |  1 +
 tests/sys/netpfil/pf/route_to.sh             |  3 ++-
 tests/sys/netpfil/pf/set_skip.sh             |  2 +-
 tests/sys/netpfil/pf/table.sh                |  6 ++++--
 12 files changed, 65 insertions(+), 18 deletions(-)
Comment 35 commit-hook freebsd_committer freebsd_triage 2023-07-16 10:44:42 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=b22299c457b21d77fc5770b9f1a9043487b25ed9

commit b22299c457b21d77fc5770b9f1a9043487b25ed9
Author:     Doug Rabson <dfr@FreeBSD.org>
AuthorDate: 2023-05-24 13:11:37 +0000
Commit:     Doug Rabson <dfr@FreeBSD.org>
CommitDate: 2023-07-14 10:07:58 +0000

    netinet*: Fix redirects for connections from localhost

    Redirect rules use PFIL_IN and PFIL_OUT events to allow packet filter
    rules to change the destination address and port for a connection.
    Typically, the rule triggers on an input event when a packet is received
    by a router and the destination address and/or port is changed to
    implement the redirect. When a reply packet on this connection is output
    to the network, the rule triggers again, reversing the modification.

    When the connection is initiated on the same host as the packet filter,
    it is initially output via lo0 which queues it for input processing.
    This causes an input event on the lo0 interface, allowing redirect
    processing to rewrite the destination and create state for the
    connection. However, when the reply is received, no corresponding output
    event is generated; instead, the packet is delivered to the higher level
    protocol (e.g. tcp or udp) without reversing the redirect, the reply is
    not matched to the connection and the packet is dropped (for tcp, a
    connection reset is also sent).

    This commit fixes the problem by adding a second packet filter call in
    the input path. The second call happens right before the handoff to
    higher level processing and provides the missing output event to allow
    the redirect's reply processing to perform its rewrite. This extra
    processing is disabled by default and can be enabled using pfilctl:

            pfilctl link -o pf:default-out inet-local
            pfilctl link -o pf:default-out6 inet6-local

    PR:             268717
    Reviewed-by:    kp, melifaro
    MFC-after:      2 weeks
    Differential Revision: https://reviews.freebsd.org/D40256

    (cherry picked from commit 5ab151574c8a1824c6cd8eded28506cb983284bc)

 sys/netinet/ip_input.c                  | 22 ++++++++-
 sys/netinet/ip_var.h                    |  4 ++
 sys/netinet6/ip6_input.c                | 19 ++++++++
 sys/netinet6/ip6_var.h                  |  4 ++
 tests/sys/netpfil/common/Makefile       |  1 +
 tests/sys/netpfil/{pf => common}/rdr.sh | 84 +++++++++++++++++++++++++++++----
 tests/sys/netpfil/common/utils.subr     |  4 ++
 tests/sys/netpfil/pf/Makefile           |  1 -
 8 files changed, 127 insertions(+), 12 deletions(-)
Comment 36 commit-hook freebsd_committer freebsd_triage 2023-07-16 10:44:44 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=6dfb2c2dce0ffabd783ec24b8d4d128993363f72

commit 6dfb2c2dce0ffabd783ec24b8d4d128993363f72
Author:     Doug Rabson <dfr@FreeBSD.org>
AuthorDate: 2023-06-20 13:01:58 +0000
Commit:     Doug Rabson <dfr@FreeBSD.org>
CommitDate: 2023-07-14 10:07:58 +0000

    pf: Add code to enable filtering for locally delivered packets

    This is disabled by default since it potentially changes the behavior of
    existing filter rule sets. To enable this extra filter for packets being
    delivered locally, use:

            sysctl net.pf.filter_local=1
            service pf restart

    PR:             268717
    Reviewed-by:    kp
    MFC-after:      2 weeks
    Differential Revision: https://reviews.freebsd.org/D40373

    (cherry picked from commit 3a1f834b5228986a7c14fd60da13cf2700e80996)

 UPDATING                              | 12 ++++++++++++
 sys/netpfil/pf/pf_ioctl.c             | 20 ++++++++++++++++++++
 tests/sys/netpfil/common/utils.subr   |  3 +--
 tests/sys/netpfil/pf/fragmentation.sh |  3 ++-
 tests/sys/netpfil/pf/killstate.sh     | 24 ++++++++++++++++--------
 tests/sys/netpfil/pf/map_e.sh         |  3 ++-
 tests/sys/netpfil/pf/pass_block.sh    |  3 ++-
 tests/sys/netpfil/pf/pfsync.sh        |  1 +
 tests/sys/netpfil/pf/route_to.sh      |  3 ++-
 tests/sys/netpfil/pf/set_skip.sh      |  2 +-
 tests/sys/netpfil/pf/table.sh         |  6 ++++--
 11 files changed, 63 insertions(+), 17 deletions(-)
Comment 37 dfr 2023-07-30 10:13:33 UTC
Closing this - for 13.2 and later, redirecting from localhost mostly works as long as the target address for a connect attempt is not 127.0.0.1 or ::1.

Fixing that probably needs a more flexible binat since in this case, the kernel will typically choose a localhost source address and the rewrite needs to change that to the outgoing interface address. Since that is a feature request for pf, probably best in a new bug.