Bug 219316 - Wildcard matching of ipfw flow tables
Summary: Wildcard matching of ipfw flow tables
Status: In Progress
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 11.0-STABLE
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: Lutz Donnerhacke
URL:
Keywords: patch
Depends on:
Blocks:
 
Reported: 2017-05-15 21:16 UTC by Lutz Donnerhacke
Modified: 2021-05-03 08:11 UTC
5 users (show)

See Also:


Description Lutz Donnerhacke freebsd_committer freebsd_triage 2017-05-15 21:16:29 UTC
For Carrier Grade NAT environments any simple NAT table selection is not usable:

1) Large Scale NAT violates the happy eyeball requirement, that a given client should always use the same external IP while communicating to a given service.

2) Mapping all customers to a single IP does not work either, because there are too many connections originating from those customers.

Consequently a deterministically selected group of clients has to share the same NAT table using a single external IP. A typical approach is to use wildcards to match the right NAT instance:

add 2100 nat 100 ipv4 from 100.64.0.0:255.192.0.63 to any xmit ext out
add 2101 nat 101 ipv4 from 100.64.0.1:255.192.0.63 to any xmit ext out
add 2102 nat 102 ipv4 from 100.64.0.2:255.192.0.63 to any xmit ext out
...

This approach is inefficient; tables could help. But tables do not support wildcard masking of lookup data. With such a wildcard mask, especially the flow tables could greatly improve performance.
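
To illustrate the addr:mask rules above, here is a minimal sketch of how a non-contiguous mask selects a NAT instance. The helper names `ip4` and `wildcard_match` are hypothetical and are not part of ipfw; the sketch only shows the bitwise comparison that such a rule implies.

```c
#include <stdint.h>

/* Build a host-order IPv4 address from four octets (hypothetical helper). */
static uint32_t
ip4(uint32_t a, uint32_t b, uint32_t c, uint32_t d)
{
	return ((a << 24) | (b << 16) | (c << 8) | d);
}

/*
 * An addr:mask pattern matches when the bits selected by the mask are
 * identical in the packet address and in the pattern address.
 */
static int
wildcard_match(uint32_t addr, uint32_t base, uint32_t mask)
{
	return ((addr & mask) == (base & mask));
}
```

With the mask 255.192.0.63, a client such as 100.77.12.129 reduces to 100.64.0.1 and therefore matches only the second rule (nat 101). Every rule has to be tested in turn, which is the inefficiency described above.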
Comment 1 Lutz Donnerhacke freebsd_committer freebsd_triage 2017-05-15 21:18:05 UTC
First of all, the ipfw command needs to be extended.

Index: sbin/ipfw/ipfw.8
===================================================================
--- sbin/ipfw/ipfw.8    (revision 314807)
+++ sbin/ipfw/ipfw.8    (working copy)
@@ -66,6 +66,8 @@
 .Nm
 .Oo Cm set Ar N Oc Cm table Ar name Cm lookup Ar addr
 .Nm
+.Oo Cm set Ar N Oc Cm table Ar name Cm setmask Ar addr
+.Nm
 .Oo Cm set Ar N Oc Cm table Ar name Cm lock
 .Nm
 .Oo Cm set Ar N Oc Cm table Ar name Cm unlock
Index: sbin/ipfw/ipfw2.h
===================================================================
--- sbin/ipfw/ipfw2.h   (revision 314807)
+++ sbin/ipfw/ipfw2.h   (working copy)
@@ -231,6 +231,7 @@
        TOK_FIB,
        TOK_SETFIB,
        TOK_LOOKUP,
+       TOK_SETMASK,
        TOK_SOCKARG,
        TOK_SETDSCP,
        TOK_FLOW,
Index: sbin/ipfw/tables.c
===================================================================
--- sbin/ipfw/tables.c  (revision 314807)
+++ sbin/ipfw/tables.c  (working copy)
@@ -49,6 +49,7 @@
 static void table_create(ipfw_obj_header *oh, int ac, char *av[]);
 static void table_modify(ipfw_obj_header *oh, int ac, char *av[]);
 static void table_lookup(ipfw_obj_header *oh, int ac, char *av[]);
+static void table_setmask(ipfw_obj_header *oh, int ac, char *av[]);
 static void table_lock(ipfw_obj_header *oh, int lock);
 static int table_swap(ipfw_obj_header *oh, char *second);
 static int table_get_info(ipfw_obj_header *oh, ipfw_xtable_info *i);
@@ -114,6 +115,7 @@
       { "atomic",      TOK_ATOMIC },
       { "lock",                TOK_LOCK },
       { "unlock",      TOK_UNLOCK },
+      { "setmask",     TOK_SETMASK },
       { NULL, 0 }
 };

@@ -142,6 +144,7 @@
  *     ipfw table NAME add [addr[/masklen] value] [addr[/masklen] value] ..
  *     ipfw table NAME delete addr[/masklen] [addr[/masklen]] ..
  *     ipfw table NAME lookup addr
+ *     ipfw table NAME setmask addr
  *     ipfw table {NAME | all} flush
  *     ipfw table {NAME | all} list
  *     ipfw table {NAME | all} info
@@ -289,6 +292,10 @@
                ac--; av++;
                table_lookup(&oh, ac, av);
                break;
+       case TOK_SETMASK:
+               ac--; av++;
+               table_setmask(&oh, ac, av);
+               break;
        }
 }

@@ -1043,8 +1050,8 @@
 }

 static int
-table_do_lookup(ipfw_obj_header *oh, char *key, ipfw_xtable_info *xi,
-    ipfw_obj_tentry *xtent)
+table_do_lookup_or_setmask(ipfw_obj_header *oh, char *key, ipfw_xtable_info *xi,
+    ipfw_obj_tentry *xtent, int opcode)
 {
        char xbuf[sizeof(ipfw_obj_header) + sizeof(ipfw_obj_tentry)];
        ipfw_obj_tentry *tent;
@@ -1064,7 +1071,7 @@
        oh->ntlv.type = type;

        sz = sizeof(xbuf);
-       if (do_get3(IP_FW_TABLE_XFIND, &oh->opheader, &sz) != 0)
+       if (do_get3(opcode, &oh->opheader, &sz) != 0)
                return (errno);

        if (sz < sizeof(xbuf))
@@ -1089,7 +1096,7 @@
        strlcpy(key, *av, sizeof(key));

        memset(&xi, 0, sizeof(xi));
-       error = table_do_lookup(oh, key, &xi, &xtent);
+       error = table_do_lookup_or_setmask(oh, key, &xi, &xtent, IP_FW_TABLE_XFIND);

        switch (error) {
        case 0:
@@ -1109,6 +1116,32 @@
 }

 static void
+table_setmask(ipfw_obj_header *oh, int ac, char *av[])
+{
+       ipfw_obj_tentry xtent;
+       ipfw_xtable_info xi;
+       char key[64];
+       int error;
+
+       if (ac == 0)
+               errx(EX_USAGE, "mask required");
+
+       strlcpy(key, *av, sizeof(key));
+
+       memset(&xi, 0, sizeof(xi));
+       error = table_do_lookup_or_setmask(oh, key, &xi, &xtent, IP_FW_TABLE_XSETMASK);
+
+       switch (error) {
+       case 0:
+               break;
+       case ESRCH:
+               errx(EX_UNAVAILABLE, "Table %s not found", oh->ntlv.name);
+       default:
+               err(EX_OSERR, "getsockopt(IP_FW_TABLE_XSETMASK)");
+       }
+}
+
+static void
 tentry_fill_key_type(char *arg, ipfw_obj_tentry *tentry, uint8_t type,
     uint8_t tflags)
 {
Comment 2 Lutz Donnerhacke freebsd_committer freebsd_triage 2017-05-15 21:20:07 UTC
Please note that the use of wildcards triggers bug 217620.
Comment 3 Lutz Donnerhacke freebsd_committer freebsd_triage 2017-05-15 21:25:31 UTC
In order to process the new ipfw configuration opcode, the kernel backend needs to be changed, too. This backend patch does not define any functionality besides parsing the options and checking whether an optional algorithm-specific function is available. If that function is missing, the call returns ENOTSUP.

Index: sys/netinet/ip_fw.h
===================================================================
--- sys/netinet/ip_fw.h (revision 314807)
+++ sys/netinet/ip_fw.h (working copy)
@@ -110,6 +110,7 @@
 #define        IP_FW_DUMP_SOPTCODES    116     /* Dump available sopts/versions */
 #define        IP_FW_DUMP_SRVOBJECTS   117     /* Dump existing named objects */

+#define        IP_FW_TABLE_XSETMASK    118     /* set a generic input mask */
 /*
  * The kernel representation of ipfw rules is made of a list of
  * 'instructions' (for all practical purposes equivalent to BPF
Index: sys/netpfil/ipfw/ip_fw_table.c
===================================================================
--- sys/netpfil/ipfw/ip_fw_table.c      (revision 314807)
+++ sys/netpfil/ipfw/ip_fw_table.c      (working copy)
@@ -1143,6 +1143,78 @@
 }

 /*
+ * Set a generic input mask for a table
+ * Data layout (v0)(current):
+ * Request: [ ipfw_obj_header ipfw_obj_tentry ]
+ * Reply: [ ipfw_obj_header ipfw_obj_tentry ]
+ *
+ * Returns 0 on success
+ */
+static int
+set_table_mask(struct ip_fw_chain *ch, ip_fw3_opheader *op3,
+    struct sockopt_data *sd)
+{
+       ipfw_obj_tentry *tent;
+       ipfw_obj_header *oh;
+       struct tid_info ti;
+       struct table_config *tc;
+       struct table_algo *ta;
+       struct table_info *kti;
+       struct namedobj_instance *ni;
+       int error;
+       size_t sz;
+
+       /* Check minimum header size */
+       sz = sizeof(*oh) + sizeof(*tent);
+       if (sd->valsize != sz)
+               return (EINVAL);
+
+       oh = (struct _ipfw_obj_header *)ipfw_get_sopt_header(sd, sz);
+       tent = (ipfw_obj_tentry *)(oh + 1);
+
+       /* Basic length checks for TLVs */
+       if (oh->ntlv.head.length != sizeof(oh->ntlv))
+               return (EINVAL);
+
+       objheader_to_ti(oh, &ti);
+       ti.type = oh->ntlv.type;
+       ti.uidx = tent->idx;
+
+       IPFW_UH_WLOCK(ch);
+       ni = CHAIN_TO_NI(ch);
+
+       /*
+        * Find existing table and check its type .
+        */
+       ta = NULL;
+       if ((tc = find_table(ni, &ti)) == NULL) {
+               IPFW_UH_WUNLOCK(ch);
+               return (ESRCH);
+       }
+
+       /* check table type */
+       if (tc->no.subtype != ti.type) {
+               IPFW_UH_WUNLOCK(ch);
+               return (EINVAL);
+       }
+
+       kti = KIDX_TO_TI(ch, tc->no.kidx);
+       ta = tc->ta;
+
+       if (ta->set_mask == NULL) {
+               IPFW_UH_WUNLOCK(ch);
+               return (ENOTSUP);
+       }
+
+       IPFW_WLOCK(ch);
+       error = ta->set_mask(tc->astate, kti, tent);
+       IPFW_WUNLOCK(ch);
+       IPFW_UH_WUNLOCK(ch);
+
+       return (error);
+}
+
+/*
  * Flushes all entries or destroys given table.
  * Data layout (v0)(current):
  * Request: [ ipfw_obj_header ]
@@ -3258,6 +3330,7 @@
        { IP_FW_TABLE_XSWAP,    0,      HDIR_SET,       swap_table },
        { IP_FW_TABLES_ALIST,   0,      HDIR_GET,       list_table_algo },
        { IP_FW_TABLE_XGETSIZE, 0,      HDIR_GET,       get_table_size },
+       { IP_FW_TABLE_XSETMASK, 0,      HDIR_SET,       set_table_mask },
 };

 static int
Index: sys/netpfil/ipfw/ip_fw_table.h
===================================================================
--- sys/netpfil/ipfw/ip_fw_table.h      (revision 314807)
+++ sys/netpfil/ipfw/ip_fw_table.h      (working copy)
@@ -108,6 +108,8 @@
     ipfw_obj_tentry *tent);
 typedef int ta_find_tentry(void *ta_state, struct table_info *ti,
     ipfw_obj_tentry *tent);
+typedef int ta_set_mask(void *ta_state, struct table_info *ti,
+    ipfw_obj_tentry *tent);
 typedef void ta_dump_tinfo(void *ta_state, struct table_info *ti,
     ipfw_ta_tinfo *tinfo);
 typedef uint32_t ta_get_count(void *ta_state, struct table_info *ti);
@@ -139,6 +141,7 @@
        ta_print_config *print_config;
        ta_dump_tinfo   *dump_tinfo;
        ta_get_count    *get_count;
+       ta_set_mask     *set_mask;
 };
 #define        TA_FLAG_DEFAULT         0x01    /* Algo is default for given type */
 #define        TA_FLAG_READONLY        0x02    /* Algo does not support modifications*/
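
The optional-callback pattern used by the backend patch can be sketched in isolation as follows. All `demo_*` names are hypothetical, simplified stand-ins for `struct table_algo` and its state; only the NULL check followed by ENOTSUP is taken from the patch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the per-table algorithm state. */
struct demo_state {
	uint32_t mask;
};

typedef int demo_set_mask_t(struct demo_state *st, uint32_t mask);

/* Hypothetical stand-in for struct table_algo with one optional hook. */
struct demo_algo {
	const char *name;
	demo_set_mask_t *set_mask;	/* optional, may be NULL */
};

static int
demo_hash_set_mask(struct demo_state *st, uint32_t mask)
{
	st->mask = mask;
	return (0);
}

/* Mirrors the check in set_table_mask(): a missing hook yields ENOTSUP. */
static int
demo_dispatch_set_mask(const struct demo_algo *ta, struct demo_state *st,
    uint32_t mask)
{
	if (ta->set_mask == NULL)
		return (ENOTSUP);
	return (ta->set_mask(st, mask));
}

/* An algorithm that implements the hook. */
static const struct demo_algo demo_flow_hash = {
	.name = "flow:hash",
	.set_mask = demo_hash_set_mask,
};

/* An algorithm that leaves the hook NULL and so rejects setmask. */
static const struct demo_algo demo_radix = {
	.name = "addr:radix",
};
```

Because the hook is optional, existing table algorithms need no changes; they simply report ENOTSUP when `setmask` is requested.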
Comment 4 Lutz Donnerhacke freebsd_committer freebsd_triage 2017-05-15 21:27:55 UTC
I only need the real functionality in the flow tables, so this patch provides only that partial implementation. It reuses the already existing flow masks.

Index: sys/netpfil/ipfw/ip_fw_table_algo.c
===================================================================
--- sys/netpfil/ipfw/ip_fw_table_algo.c (revision 314807)
+++ sys/netpfil/ipfw/ip_fw_table_algo.c (working copy)
@@ -186,6 +187,17 @@
  *    entry not found: returns ENOENT
  *
  *
+ * -set_mask: set generic input mask specified in @tent
+ *  typedef int ta_set_mask(void *ta_state, struct table_info *ti,
+ *      ipfw_obj_tentry *tent);
+ *  OPTIONAL, locked (UH+WLOCK). (M_NOWAIT). Returns 0 on success.
+ *
+ *  Parses the key in @tent and stores it as the table-wide lookup mask.
+ *  * Caller is required to do the following:
+ *    mask accepted: returns 0
+ *    key cannot be parsed or used as a mask: returns an errno value
+ *
+ *
  * -need_modify: checks if @ti has enough space to hold another @count items.
  *  typedef int (ta_need_modify)(void *ta_state, struct table_info *ti,
  *      uint32_t count, uint64_t *pflags);
@@ -3099,6 +3111,7 @@
        size_t                  items;
        struct fhashentry4      fe4;
        struct fhashentry6      fe6;
+       uint8_t                 flags;
 };

 struct ta_buf_fhash {
@@ -3274,6 +3292,7 @@
        cfg = malloc(sizeof(struct fhash_cfg), M_IPFW, M_WAITOK | M_ZERO);

        cfg->size = 512;
+       cfg->flags = tflags;

        cfg->head = malloc(sizeof(struct fhashbhead) * cfg->size, M_IPFW,
            M_WAITOK | M_ZERO);
@@ -3475,6 +3494,69 @@
        return (ENOENT);
 }

+static int
+ta_set_fhash_mask(void *ta_state, struct table_info *ti,
+    ipfw_obj_tentry *tent)
+{
+       struct fhash_cfg *cfg;
+       struct fhashentry *ent;
+       struct fhashentry6 fe6, *pm6;
+       struct fhashentry4 *pm4;
+       struct tentry_info tei;
+       int error;
+
+       cfg = (struct fhash_cfg *)ta_state;
+
+       ent = &fe6.e;
+       pm6 = &fe6;
+       pm4 = (struct fhashentry4 *) &fe6;
+
+       memset(&fe6, 0, sizeof(fe6));
+       memset(&tei, 0, sizeof(tei));
+
+       tei.paddr = &tent->k.flow;
+       tei.subtype = tent->subtype;
+
+       if ((error = tei_to_fhash_ent(&tei, ent)) != 0)
+               return (error);
+
+       /* Fill in fe masks based on @tflags */
+        switch(ent->af) {
+#ifdef INET
+       case AF_INET:
+               if (cfg->flags & IPFW_TFFLAG_SRCIP)
+                       cfg->fe4.sip = pm4->sip;
+               if (cfg->flags & IPFW_TFFLAG_DSTIP)
+                       cfg->fe4.dip = pm4->dip;
+               if (cfg->flags & IPFW_TFFLAG_SRCPORT)
+                       cfg->fe4.e.sport = ent->sport;
+               if (cfg->flags & IPFW_TFFLAG_DSTPORT)
+                       cfg->fe4.e.dport = ent->dport;
+               if (cfg->flags & IPFW_TFFLAG_PROTO)
+                       cfg->fe4.e.proto = ent->proto;
+               break;
+#endif
+#ifdef INET6
+       case AF_INET6:
+               if (cfg->flags & IPFW_TFFLAG_SRCIP)
+                       cfg->fe6.sip6 = pm6->sip6;
+               if (cfg->flags & IPFW_TFFLAG_DSTIP)
+                       cfg->fe6.dip6 = pm6->dip6;
+               if (cfg->flags & IPFW_TFFLAG_SRCPORT)
+                       cfg->fe6.e.sport = ent->sport;
+               if (cfg->flags & IPFW_TFFLAG_DSTPORT)
+                       cfg->fe6.e.dport = ent->dport;
+               if (cfg->flags & IPFW_TFFLAG_PROTO)
+                       cfg->fe6.e.proto = ent->proto;
+               break;
+#endif
+       default:
+               return (EINVAL);
+       }
+
+       return (0);
+}
+
 static void
 ta_foreach_fhash(void *ta_state, struct table_info *ti, ta_foreach_f *f,
     void *arg)
@@ -3771,6 +3853,7 @@
        .fill_mod       = ta_fill_mod_fhash,
        .modify         = ta_modify_fhash,
        .flush_mod      = ta_flush_mod_fhash,
+       .set_mask       = ta_set_fhash_mask,
 };

 /*
Comment 5 Lutz Donnerhacke freebsd_committer freebsd_triage 2017-05-15 21:28:18 UTC
This code has been running in production for several months.
Comment 6 Andrey V. Elsukov freebsd_committer freebsd_triage 2017-05-17 12:33:46 UTC
(In reply to lutz from comment #0)
> Consequently a deterministically selected group of clients has to share the
> same NAT table using a single external IP. A typical approach is to use
> wildcards to match the right NAT instance:
> 
> add 2100 nat 100 ipv4 from 100.64.0.0:255.192.0.63 to any xmit ext out
> add 2101 nat 101 ipv4 from 100.64.0.1:255.192.0.63 to any xmit ext out
> add 2102 nat 102 ipv4 from 100.64.0.2:255.192.0.63 to any xmit ext out
> ...
> 
> This approach is inefficient; tables could help. But tables do not support
> wildcard masking of lookup data. With such a wildcard mask, especially the
> flow tables could greatly improve performance.

Can you provide an example of how your patches solve this problem? Some commands/rules that you use for configuration would be good.
Comment 7 Lutz Donnerhacke freebsd_committer freebsd_triage 2017-05-17 16:27:48 UTC
# ipfw show
00100 228070727002 277397011152705 nat tablearg ip4 from any to any flow table(natin) recv ext in
00200 247814016293  35467809536790 nat tablearg ip4 from any to any flow table(natout) xmit ext out

# cat /etc/firewall.rules
nat 1 config ip a.b.c.48 same_ports
nat 2 config ip a.b.d.48 same_ports
...
nat 127 config ip x.y.z.46 same_ports
nat 128 config ip x.y.z.47 same_ports

table natin create type flow:dst-ip valtype nat
table natin setmask 255.255.255.255
table natin add a.b.c.48 1
table natin add a.b.d.48 2
...
table natin add x.y.z.46 127
table natin add x.y.z.47 128

table natout create type flow:src-ip valtype nat
table natout setmask 255.192.0.127
table natout add 100.64.0.0 1
table natout add 100.64.0.1 2
...
table natout add 100.64.0.126 127
table natout add 100.64.0.127 128


There are multiple machines doing this (with different NAT IPs).

I'm going to extend the flow in the following way in order to reuse the ports much more heavily:

table natin create type flow:src-ip,proto,src-port,dst-ip valtype nat
table natin setmask 0.0.15.0,1,3,255.255.255.255

table natout create type flow:src-ip,proto,dst-ip,dst-port valtype nat
table natout setmask 255.192.0.127,1,0.0.15.0,3

Yes, this generates 128 (NAT-IPs) * 2 (Protocol) * 16 (dest-ip) * 4 (dest-port) =  16384 NAT tables.
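
The count above can be checked by counting the varying bits in each mask field: every mask bit that differs between table entries doubles the number of distinct keys. The sketch below uses hypothetical helper names and assumes that the 255.192 part of the source mask is fixed by the 100.64.0.0/10 prefix, so only the low 7 bits (0.0.0.127) vary.

```c
#include <stdint.h>

/* Count set bits; each varying mask bit doubles the number of keys. */
static unsigned
demo_popcount32(uint32_t v)
{
	unsigned n = 0;

	while (v != 0) {
		v &= v - 1;	/* clear the lowest set bit */
		n++;
	}
	return (n);
}

/* Number of distinct masked values for one field: 2^popcount(mask). */
static uint64_t
demo_key_space(uint32_t varying_mask)
{
	return ((uint64_t)1 << demo_popcount32(varying_mask));
}
```

For the masks above this gives 2^7 = 128 for the source address bits, 2^1 = 2 for the protocol mask 1, 2^4 = 16 for 0.0.15.0 and 2^2 = 4 for the port mask 3, and 128 * 2 * 16 * 4 = 16384.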

Depending on the available RAM, I'll extend the masks further.

But I need a different NAT table selection algorithm for this approach: the current linked list needs to be replaced by a much more efficient access scheme. I'll send this patch later.
Comment 8 Julian Elischer freebsd_committer freebsd_triage 2017-05-20 15:27:26 UTC
I've read this twice and my head explodes somewhere in the middle.
I think I need to draw pictures to understand it. :-)
Comment 9 Julian Elischer freebsd_committer freebsd_triage 2017-05-20 15:52:35 UTC
(In reply to lutz from comment #0)
>For Carrier Grade NAT environments any simple NAT table selection is not usable:
>
>1) Large Scale NAT violates the happy eyeball requirement, that a given client
> should always use the same external IP while communicating to a given service.

On what timescale? Forever? If a client is idle for 5 minutes (no sessions) can
it start using a new IP?

>
>2) Mapping all customers to a single IP does not work either, because there
> are too many connections originating from those customers.

How many remote addresses are you talking to?
You can reuse the same address and port to many different remote addresses.

>
> Consequently a deterministically selected group of clients has to share the
> same NAT table using a single external IP. A typical approach is to use 
> wildcards to match the right NAT instance:

you just said that "Mapping all customers to a single IP does not work .."
and yet that is what you show here.. Am I misreading it?

How many clients are we talking about here? 10? 100? 1000? 10K? 100K? 1M?
and are these clients all on separate hardware? or are they coming from a small number of session aggregator machines?

>
> add 2100 nat 100 ipv4 from 100.64.0.0:255.192.0.63 to any xmit ext out
> add 2101 nat 101 ipv4 from 100.64.0.1:255.192.0.63 to any xmit ext out
> add 2102 nat 102 ipv4 from 100.64.0.2:255.192.0.63 to any xmit ext out
> ...
> 
> This approach is inefficient; tables could help. But tables do not support
> wildcard masking of lookup data. With such a wildcard mask, especially the
> flow tables could greatly improve performance.

I don't quite understand this bit


my memory is that you can have a table
100.64.0.0:255.192.0.63  0
100.64.0.1:255.192.0.63  1
100.64.0.2:255.192.0.63  2
... etc

followed by

nat tablearg ip from table (x) to any out xmit XX0

now getting the return packets back to the same NAT instance could be a challenge depending on what the NAT does but it should be possible if each NAT uses a different address as you suggest.

what am I missing?
Comment 10 Julian Elischer freebsd_committer freebsd_triage 2017-05-20 16:02:39 UTC
reply to self:

ah ipfw doesn't support addr:mask

I remember it working on a version that did at one stage...
so maybe the answer is to extend the table add command to support addr:mask

I think that would be simpler, especially in the radix tree lookup (assuming that is still an option). The radix tree code already supports all the discontinuous mask features.
Comment 11 Lutz Donnerhacke freebsd_committer freebsd_triage 2017-05-21 22:14:40 UTC
For flows, extending the entries to ip:mask (per entry) does not really help:
 - ports and protocol numbers are not covered
 - hashes are not radix trees; they can handle only a uniform mask

And there is already a mask in the hash. I only modify it.
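
Why a hash can only handle one uniform mask can be seen in a minimal sketch: the packet field is masked first, and the result is then hashed and compared as an exact key. Everything named `demo_*` is hypothetical; this sketch uses a fixed-size table with linear probing and a single address field, which is an assumption for brevity and not the layout of the real flow hash.

```c
#include <stddef.h>
#include <stdint.h>

#define	DEMO_BUCKETS	64	/* hypothetical table size */

struct demo_entry {
	uint32_t key;		/* stored pre-masked source address */
	uint32_t value;		/* e.g. NAT instance number */
	int used;
};

struct demo_table {
	uint32_t mask;		/* one mask for the whole table */
	struct demo_entry slot[DEMO_BUCKETS];
};

static uint32_t
demo_hash(uint32_t key)
{
	key ^= key >> 16;
	key *= 0x45d9f3bu;
	key ^= key >> 16;
	return (key % DEMO_BUCKETS);
}

/* Insert with linear probing; returns 0 on success, -1 when full. */
static int
demo_add(struct demo_table *t, uint32_t addr, uint32_t value)
{
	uint32_t key = addr & t->mask;
	uint32_t h = demo_hash(key);
	size_t i;

	for (i = 0; i < DEMO_BUCKETS; i++) {
		struct demo_entry *e = &t->slot[(h + i) % DEMO_BUCKETS];

		if (!e->used || e->key == key) {
			e->key = key;
			e->value = value;
			e->used = 1;
			return (0);
		}
	}
	return (-1);
}

/* Lookup masks the packet field first, so the hash only sees exact keys. */
static int
demo_lookup(const struct demo_table *t, uint32_t addr, uint32_t *value)
{
	uint32_t key = addr & t->mask;
	uint32_t h = demo_hash(key);
	size_t i;

	for (i = 0; i < DEMO_BUCKETS; i++) {
		const struct demo_entry *e = &t->slot[(h + i) % DEMO_BUCKETS];

		if (!e->used)
			return (-1);
		if (e->key == key) {
			*value = e->value;
			return (0);
		}
	}
	return (-1);
}
```

With the mask 255.192.0.127 (0xffc0007f) and an entry for 100.64.0.5, a packet from 100.100.7.5 reduces to the same key and finds that entry in a single probe sequence. A per-entry mask would break this, because the lookup would not know which mask to apply before hashing.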
Comment 12 Lutz Donnerhacke freebsd_committer freebsd_triage 2017-05-21 22:41:30 UTC
Ah, I missed the previous comment.

>>1) Large Scale NAT violates the happy eyeball requirement, that a given client
>> should always use the same external IP while communicating to a given service.

> On what timescale? Forever?

As long as the client has the same (CGN) IP (from 100.64.0.0/10).

> If a client is idle for 5 minutes (no sessions) can
> it start using a new IP?

No. That violates the happy-eyeball constraint. Several web services bind the session to the externally visible IP. If this IP changes, the customer has to log in again and again. We already made this mistake (using LSN).

>>2) Mapping all customers to a single IP does not work either, because there
>> are too many connections originating from those customers.

> How many remote addresses are you talking to?
> You can reuse the same address and port to many different remote addresses.

That would surprise me. Such an implementation would require dynamic memory for the NAT tables. I do not see such memory usage on my FreeBSD machines. I did see such an effect on a Cisco ASA.
See: https://lutz.donnerhacke.de/Blog/High-memory-with-extended-PAT-on-ASA

>> Consequently a deterministically selected group of clients has to share the
>> same NAT table using a single external IP. A typical approach is to use 
>> wildcards to match the right NAT instance:

> you just said that "Mapping all customers to a single IP does not work .."
> and yet that is what you show here.. Am I misreading it?

The classical NAT setting does not distinguish between the client IPs and therefore uses either a single IP or LSN.

My setup partitions the clients by their IPs and then I use a "single IP per partition" NAT.

> How many clients are we talking about here? 10? 100? 1000? 10K? 100K? 1M?
> and are these clients all on separate hardware? or are they coming from a
> small number of session aggregator machines?

Currently I have ~10k clients per machine; the setup scales horizontally. If I get more clients, I add additional machines and tell the clients via DHCP to use a different gateway (the next machine).

>> add 2100 nat 100 ipv4 from 100.64.0.0:255.192.0.63 to any xmit ext out
>> add 2101 nat 101 ipv4 from 100.64.0.1:255.192.0.63 to any xmit ext out
>> add 2102 nat 102 ipv4 from 100.64.0.2:255.192.0.63 to any xmit ext out
>>
>> This approach is inefficient; tables could help. But tables do not support
>> wildcard masking of lookup data. With such a wildcard mask, especially the
>> flow tables could greatly improve performance.

> I don't quite understand this bit
> my memory is that you can have a table
> 100.64.0.0:255.192.0.63  0
> 100.64.0.1:255.192.0.63  1
...
> nat tablearg ip from table (x) to any out xmit XX0

You are right. That's the setup I used before switching to this flow-based NAT. I only used the very early setup to demonstrate the problem. My fault.

> what am I missing?

You are missing the privacy expectations and the Law Enforcement Agencies. For privacy, we would like to use different external IPs for the same client reaching different services. That's why flows.

For LEAs we need to tell exactly which user was involved in a specific session, so we need to log some data about NAT. This is an overwhelmingly large amount of data, so we would like to reduce the necessary logs. This can be done by allocating blocks of ports to a customer instead of individual ports.

In order to assign such port ranges carefully, they need to be large (at least 300 ports per customer in order to access Google Maps without errors). That's why we need to heavily reuse ports (port ranges), and this requires multiple NAT tables per customer. The only separation method left is to include the destination address, port and protocol.

That's why we switched to flows.