Bug 229092

Summary: [pf] [pfsync] States created by route-to rules pfsynced without interface
Product: Base System Reporter: Kajetan Staszkiewicz <vegeta>
Component: kernAssignee: freebsd-pf mailing list <pf>
Status: New ---    
Severity: Affects Some People CC: kp, vegeta
Priority: --- Keywords: patch
Version: 11.1-RELEASE   
Hardware: Any   
OS: Any   
Attachments:
Description Flags
Reconstruct rt_kif in pfsync_state_import
none
Reconstruct interface route by standard fib lookup none

Description Kajetan Staszkiewicz 2018-06-17 21:40:24 UTC
Created attachment 194342 [details]
Reconstruct rt_kif in pfsync_state_import

I use FreeBSD and pf on routers and hardware loadbalancers. Routers do normal routing and have firewalls with only block or pass rules. Loadbalancers use route-to rules with tables of target hosts. On routers pfsync works just fine while on loadbalancers it fails because states are synced without target interface.

There are 2 ways to fix it:
1. Modify struct pfsync_state to include target interface, but that would be breaking compatibility.
2. Reconstruct missing interface using rules on the second loadbalancer.

Please find attached patch solving the issue using the 2nd method. There is still the issue of source_nodes not being synced, they probably can be reconstructed in a similar fashion. I might provide a patch for that later on. This the 1st version of the patch, I am not totally sure of its stability and it is designed only to solve the issue in my particular case, that is for rules with the following syntax: "route-to (internal4027 <pool_154571_4>) round-robin"
Comment 1 Kajetan Staszkiewicz 2018-06-18 14:08:31 UTC
I came across an issue preventing this from working correctly when rebooting hardware: pfsync is started before pf (or in my case before my custom service populating pf rules. That's a problem, because for route-to interface to be correctly rebuilt, pf rules must be already present. I'm unsure if changing this order is a good idea, maybe it is like this for a good reason?
Comment 2 Kajetan Staszkiewicz 2018-06-21 12:59:21 UTC
While looking on possibility of recreating src_nodes I found that the way src nodes are created is rather sketchy. For example when a new state is created with new src_node, first a node is searched for, if none is found then it is created and inserted, each of those operation with its own locking and unlocking src_node hash. Operations within pf_map_addr operate on unlocked src_node which probably explain crashes I had when flushing nodes on heavily loaded system. Then there is the issue that each locking and unlocking operation requires computing the hash again, why not compute it once and store it within the node, this way unlocking could be a bit faster. Creation of node could return it locked as now it needs to be re-locked for any further operations. Then there is the issue that pf_state->rt_kif is copied from pf_rule->rpool.cur which might not be the same as during pf_map_addr() (there is no locking inside that function so it might be inconsistent anyway). And last but not the least is that it seems to me that pf_src_node->*kif is not used at all. And src_node itself never stores information about interface choosen for route-to targets, it is only copied to state instead.

I will prepare a patch addressing those issues first and then work on recreating redirection interface as originally this issue was about.

The proof of concept patch seems generally working, I configured my firewall service to start before pfsync and I could reboot my load balancers as I pleased and traffic was correctly forwarded.
Comment 3 Kajetan Staszkiewicz 2018-06-29 12:37:37 UTC
I found another bug: states synced during initial bulk update are considered to come from incompatible ruleset, even if ruleset *is* compatible. I also must raise a question why the initial sync is "update" and not "insert".


--- a/sys/netpfil/pf/if_pfsync.c
+++ b/sys/netpfil/pf/if_pfsync.c
@@ -874,21 +874,21 @@ pfsync_in_upd(struct pfsync_pkt *pkt, struct mbuf *m, int offset, int count)
                                printf("pfsync_input: PFSYNC_ACT_UPD: "
                                    "invalid value\n");
                        }
                        V_pfsyncstats.pfsyncs_badval++;
                        continue;
                }

                st = pf_find_state_byid(sp->id, sp->creatorid);
                if (st == NULL) {
                        /* insert the update */
-                       if (pfsync_state_import(sp, 0))
+                       if (pfsync_state_import(sp, pkt->flags))
                                V_pfsyncstats.pfsyncs_badstate++;
                        continue;
                }

                if (st->state_flags & PFSTATE_ACK) {
                        PFSYNC_LOCK(sc);
                        pfsync_undefer_state(st, 1);
                        PFSYNC_UNLOCK(sc);
                }



This bug is fixed in OpenBSD some time ago: https://github.com/openbsd/src/commit/ddb7828bc6708358e6c08caaf09e3524e8cab7b4
Comment 4 commit-hook freebsd_committer 2018-06-30 12:51:36 UTC
A commit references this bug:

Author: kp
Date: Sat Jun 30 12:51:08 UTC 2018
New revision: 335816
URL: https://svnweb.freebsd.org/changeset/base/335816

Log:
  pfsync: Fix state sync during initial bulk update

  States learned via pfsync from a peer with the same ruleset checksum were not
  getting assigned to rules like they should because pfsync_in_upd() wasn't
  passing the PFSYNC_SI_CKSUM flag along to pfsync_state_import.

  PR:		229092
  Submitted by:	Kajetan Staszkiewicz <vegeta tuxpowered.net>
  Obtained from:	OpenBSD
  MFC after:	1 week
  Sponsored by:	InnoGames GmbH

Changes:
  head/sys/netpfil/pf/if_pfsync.c
Comment 5 Kristof Provost freebsd_committer 2018-06-30 12:59:37 UTC
Thanks for that patch. I've not yet had the opportunity to look at the other patch (or remarks in any detail). I'll try to do so as soon as possible, but it may be some time.
Comment 6 commit-hook freebsd_committer 2018-07-07 14:46:59 UTC
A commit references this bug:

Author: kp
Date: Sat Jul  7 14:46:02 UTC 2018
New revision: 336064
URL: https://svnweb.freebsd.org/changeset/base/336064

Log:
  MFC r335816:

  pfsync: Fix state sync during initial bulk update

  States learned via pfsync from a peer with the same ruleset checksum were not
  getting assigned to rules like they should because pfsync_in_upd() wasn't
  passing the PFSYNC_SI_CKSUM flag along to pfsync_state_import.

  PR:		229092
  Submitted by:	Kajetan Staszkiewicz <vegeta tuxpowered.net>
  Obtained from:	OpenBSD
  Sponsored by:	InnoGames GmbH

Changes:
_U  stable/11/
  stable/11/sys/netpfil/pf/if_pfsync.c
Comment 7 Kajetan Staszkiewicz 2018-08-17 16:26:30 UTC
Do we consider breaking pfsync protocol compatibility? If we could just modify the protocol to sync redirection interface, there would be no need for reconstrucing it and for identical ruleset with identical table contents.
Comment 8 Kristof Provost freebsd_committer 2018-08-18 14:37:47 UTC
(In reply to Kajetan Staszkiewicz from comment #7)
I'd be very very hesitant to break compatibility. A common pattern with pfsync is that one gateway is upgraded while the other takes over. That'll need to keep working.

That said, it might be possible to extend the protocol by using one of the _pad fields. It'd have to work (minus newly supported/improved cases) when syncing with older code, but that might be possible.
Comment 9 Kajetan Staszkiewicz 2018-08-18 22:51:51 UTC
I see only those fields free to be used:

struct pfsync_state {
	u_int8_t	 __spare[2];
}

struct pfsync_state_peer {
	u_int8_t	pad[6];
}

None of them is enough to carry char ifname[IFNAMSIZ] information. I thought interfaces maybe have some increasing ID which would fit into those bytes but I can't find such thing. We could add such increasing ID to pfi_kif but that would still be an opportunistic solution, working correctly only if two routers have identical interfaces which were added in the same order. That might in some situations be even harder to achieve than having identical ruleset as required by the patch I proposed.
Comment 10 Kristof Provost freebsd_committer 2018-08-20 08:33:55 UTC
Good point. I don't see an immediate straightforward way of handling this.
I was going to suggest a new action type, but pfsync_input() stops handling the entire packet when it encounters an unknown action type.
We'd either have to make sure the new type is always at the end of the packet, or we'd have to teach pfsync to ignore unknown action types first, and only later (once we hope everyone has upgraded) add the new one.
Comment 11 Kajetan Staszkiewicz 2019-01-22 22:56:15 UTC
Created attachment 201346 [details]
Reconstruct interface route by standard fib lookup

I found another issue. Even if we can somehow reconstruct route interface, there is still a requirement for having identical ruleset on both routers because it is rule->rt which makes Route-to, Duplicate-to and Reply-to targets work. This information is never kept in state.

Attached patch solves this issue by copying rule->rt to state->rt (new field). Pfsync struct got this field too. Route interface is reconstructed by normal lookup in routing table in fib 0.

Warning: for "no state" rules stil rule->rt must be used and I have coded it but not tested. For stateful ruleset all seems fine for route-to target.
Comment 12 Kristof Provost freebsd_committer 2019-01-22 23:16:34 UTC
(In reply to Kajetan Staszkiewicz from comment #11)
Wouldn't the pfcksum protect us from having different rules in the first place?
Comment 13 Kajetan Staszkiewicz 2019-01-23 08:11:14 UTC
(In reply to Kristof Provost from comment #12)
pfcksum only checks if loaded rules are the same, it does not ensure rules are the same on 2 routers. There are a few ways to have different rulesets, let me give you a little list I came across while trying to make pfsync work:
- Any rule using interface IP addresses in unnamed table {} will end up being different on 2 routers unless named <table> {} is used.
- Same thing for SNAT rules, although I'm unsure if those are included in pfchecksum.
- If ruleset is dynamically generated by a script, data structure might not have explicit ordering and produce different result on each run: for me it was Python and its dictionaries and sets.
- In a dynamical environment it might happen that the ruleset is different for short periods of time when new configuration is applied as it will never be applied at exactly the same time on both routers. For me on some loadbalancers new configuration is applied tens of times a day.
Comment 14 Kajetan Staszkiewicz 2019-01-23 21:52:17 UTC
To sum it up: I don't think it is feasible to have any functionality depending on  ruleset being identical. It is really hard to achieve it and it might not be worth the effort.
Comment 15 Kristof Provost freebsd_committer 2019-01-23 22:47:43 UTC
(In reply to Kajetan Staszkiewicz from comment #13)

> - Any rule using interface IP addresses in unnamed table {} will end up being different on 2 routers unless named <table> {} is used.

Ah, because pf generates a random id for the table? I'd argue that that's something the rules sync script (if there is one) should account for, but I'd be happy to take patches to make that 'random id' predictable (and consistent across hosts).

> - Same thing for SNAT rules, although I'm unsure if those are included in pfchecksum.

I'm not sure what you mean by SNAT rules. The pf_setup_pfsync_matching() function checksums all rules, other than the scrub rules.

> - If ruleset is dynamically generated by a script, data structure might not have explicit ordering and produce different result on each run: for me it was Python and its dictionaries and sets.

I don't understand this one. It shouldn't matter how rules are generated, the kernel will calculate a checksum. Or do you mean to say pf should compensate for bugs in synchronisation scripts? 

I don't really see a way around the requirement for the ruleset to be identical on all pfsync synced hosts.
Comment 16 Kajetan Staszkiewicz 2019-01-24 01:11:20 UTC
(In reply to Kristof Provost from comment #15)

> (In reply to Kajetan Staszkiewicz from comment #13)
>
>> - Any rule using interface IP addresses in unnamed table {} will end up 
>> being different on 2 routers unless named <table> {} is used.
>
> Ah, because pf generates a random id for the table?

I think so.

> I'd argue that that's
> something the rules sync script (if there is one)

I don't "sync" rules. I "generate" on central database and upload to
loadbalancers. Generated files look identical, line by line. (+/- Python
issue, I will comment on it later).

> should account for, but I'd

Taking that into account is exactly what was needed in my case. Consider
such two rules:

1. allow in on $IFACE from { $HOST1 $HOST2 }

Table used here is unnamed, anonymous, dynamic or however it is called
in the world of pf. There is no guarantee of its name and thus even if
configuration is generated centrally, it will result in ruleset having
different checksum on each loadbalancer.

But is there even any real table used at all? I remember something about
dynamically generated table names but what I see is expansion of ruleset
during loading into separate rules. e.g. rule:

rdr on $if_public inet6 proto ipv6-icmp from any to $if_public -> <own_addrs_pub>

got expanded to 2 rules:

rdr on public inet6 proto ipv6-icmp from any tofe80::6a05:caff:fe0b:dd02 -> <own_addrs_pub> round-robin
rdr on public inet6 proto ipv6-icmp from any to 2a00:XXXXXX -> <own_addrs_pub> round-robin

(BTW, expansion to link-local addresses seems a bug to me, I will report
it separately).

2. table <table> { $HOST1 $HOST2 }
pass in on $IFACE from <table>

Here table is named. Ruleset is now consitent between loadbalancers no
matter the contents of table.

> be happy to take patches to make that 'random id' predictable (and consistent
> across hosts).

Maybe one day but for now I already forced usage of named tables everywhere.

>> - Same thing for SNAT rules, although I'm unsure if those are included in pfchecksum.
>
> I'm not sure what you mean by SNAT rules.

Sorry, of course I meant NAT rules in pf. I very much prefer nftables
terminology of SNAT and DNAT, they just make way more sense.

> The pf_setup_pfsync_matching()
> function checksums all rules, other than the scrub rules.

That just adds one more type of rules that can screw up checksum, as I
expected.

>> - If ruleset is dynamically generated by a script, data structure might not 
>> have explicit ordering and produce different result on each run: for me it
>> was Python and its dictionaries and sets.
>
> I don't understand this one. 

Data structures like sets and hashes have no explicit ordering, at least
in Python. I think I was getting consistent results with Python 2.7 but
totally random when moved to 3.5 Things put to them will be retrieved in
some random order. One database of rules will produce functionally
identical (at least as long as they are "quick" rules) firewall but with
rules in different order. Of course pf can't do anything about it and
this is expeced, see next paragraph.

> It shouldn't matter how rules are generated, the
> kernel will calculate a checksum. Or do you mean to say pf should compensate
> for bugs 

That is not a bug.

> in synchronisation scripts?

No, it definitely should not. All I'm saying that it is another trap
I've encountered while fighting with this topic and that it is very hard
to make the ruleset identical from point of view of pf and we should not
expect identical rulesets.

> I don't really see a way around the requirement for the ruleset to be identical
> on all pfsync synced hosts.

But is there really such requirement with current status of pf?

I think the whole discussion wandered away from the main topic. Let's
get back on track.

Current situation:

1. Identical pf.conf will result in different checksum in many cases due
to interface addresses, dynamic table names and/or rule expansion from
unnamed tables.
2. pfsync of normal firewall states which only pass or nat traffic don't
need identical ruleset at all.
3. pfsync of states from route/dup/reply-to rules is *fully broken*.

Let me repeat once again: none of *working* functionalities of pf seems
to require identical ruleset. Maaaaybeee label counters?

I want to focus on fixing issue 3. There are multiple aproaches:
1. Old patch which depends on ruleset being identical and reconstructing
missing information from rules.
2. New patch which sends part of missing information (state->rt) over
pfsync and discovers interface to use from normal route lookup.
3. Modify pfsync structure to io include both state->rt and state->rt_kif.

I would *love* to have 3. implemmented but for now I work with 2.
because 1. was way too unrealiable.

How should we progress?
Comment 17 Kristof Provost freebsd_committer 2019-01-24 01:26:09 UTC
Right, for 3. we come back to the compatibility issue. pfsync has to remain able to run with different versions, so while we could potentially extend the protocol to include this information we *have* to make sure doing so won't break a host that doesn't understand the new fields. And vice versa: a host which doesn't include the information must be able to send state to a host which expects the extra information.

That's probably possible, but it'll need some special care.
Comment 18 Kajetan Staszkiewicz 2019-01-24 13:59:18 UTC
My 2nd patch stores missing state->rt information in currently unused part of struct pfsync_state. That should make it compatible. A router running non-patched kernel will simply not transmit any data there when sending states and ignore all data when receiving them from a patched router. So that part should be safe.

What looks potentially unsafe is guessing of target interface. Although it is already badly broken, as packets are leaving router via route matching destination on unpatched kerel.

Is guessing of target interface done correctly? Can I use fib lookup functions just like this? No locking needed?
Comment 19 Kristof Provost freebsd_committer 2019-01-28 20:45:46 UTC
There's a typo in the KASSERT (r_dir = PF_IN).

I wonder if 'rt' can't be a flag. That'd give us more room for other extensions later.
Comment 20 Kajetan Staszkiewicz 2019-02-06 12:44:02 UTC
'rt' contains values from enum	{ PF_NOPFROUTE, PF_FASTROUTE, PF_ROUTETO, PF_DUPTO, PF_REPLYTO }. I don't see how those could be squashed into a single flag, as they dictate differenct actions to be taken against packet.
Comment 21 Kristof Provost freebsd_committer 2019-02-09 14:34:37 UTC
(In reply to Kajetan Staszkiewicz from comment #20)
You are of course correct here.

I'd like to try to write a test case for this. Do you have any suggestions on how to best reproduce (as simple a version as possible of) the problematic behaviour?

vnet lets us create arbitrary numbers of pf/pfsync instances, so it should be possible to reproduce this. See /usr/src/tests/sys/netpfil/pf/pfsync.sh if you're interested in examples.