Bug 269770 - libalias udp redirect_port temporary translation failure
Summary: libalias udp redirect_port temporary translation failure
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 13.1-STABLE
Hardware: amd64 Any
: --- Affects Only Me
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-02-23 03:44 UTC by Peter Much
Modified: 2024-02-22 01:53 UTC (History)
0 users

See Also:


Attachments
DTRACE cmdline (2.98 KB, text/plain)
2024-02-22 00:39 UTC, Peter Much
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Peter Much 2023-02-23 03:44:48 UTC
I occasionally observe temporary failures on my public UDP ports.

The problem seems to be with libalias when used for UDP port_redirect:

ipfw nat 3 config log same_ports unreg_only ip <public-ip> \
        redirect_port udp 192.168.xx.xx:5007 5006 \
        redirect_port tcp 192.168.xx.xx:5007 5006

02420     0       0 count log proto ip4 src-port 64000-64010
02425     0       0 nat 3 proto ip4
02430     0       0 count log proto ip4 src-port 64000-64010

Then randomly
# nc -u4p 64000 <public-ip> 5006
# nc -u4p 64001 <public-ip> 5006
# nc -u4p 64002 <public-ip> 5006

Feb 23 03:51:45 <security.info> edge kernel: [75643] ipfw-oper: 2420 Count UDP 91.12.117.156:64000 <public-ip>:5006 in via tun3
Feb 23 03:51:45 <security.info> edge kernel: [75643] ipfw-oper: 2430 Count UDP 91.12.117.156:64000 192.168.xx.xx:5007 in via tun3
Feb 23 03:52:08 <security.info> edge kernel: [75666] ipfw-oper: 2420 Count UDP 91.12.117.156:64001 <public-ip>:5006 in via tun3
Feb 23 03:52:08 <security.info> edge kernel: [75666] ipfw-oper: 2430 Count UDP 91.12.117.156:64001 192.168.xx.xx:5007 in via tun3
Feb 23 03:52:29 <security.info> edge kernel: [75687] ipfw-oper: 2420 Count UDP 91.12.117.156:64001 <public-ip>:5006 in via tun3
Feb 23 03:52:29 <security.info> edge kernel: [75687] ipfw-oper: 2430 Count UDP 91.12.117.156:64001 192.168.xx.xx:5007 in via tun3
Feb 23 03:52:36 <security.info> edge kernel: [75694] ipfw-oper: 2420 Count UDP 91.12.117.156:64001 51.158.21.23:5006 in via tun3
Feb 23 03:52:36 <security.info> edge kernel: [75694] ipfw-oper: 2430 Count UDP 91.12.117.156:64001 192.168.98.18:5007 in via tun3
Feb 23 03:52:56 <security.info> edge kernel: [75714] ipfw-oper: 2420 Count UDP 91.12.117.156:64002 <public-ip>:5006 in via tun3
Feb 23 03:52:56 <security.info> edge kernel: [75714] ipfw-oper: 2430 Count UDP 91.12.117.156:64002 192.168.xx.xx:5007 in via tun3
Feb 23 03:53:06 <security.info> edge kernel: [75724] ipfw-oper: 2420 Count UDP 91.12.117.156:64000 <public-ip>:5006 in via tun3
Feb 23 03:53:06 <security.info> edge kernel: [75724] ipfw-oper: 2430 Count UDP 91.12.117.156:64000 <public-ip>:5006 in via tun3
Feb 23 03:53:06 <security.info> edge kernel: [75724] ipfw-oper: 2685 Unreach 13 UDP 91.12.117.156:64000 <public-ip>:5006 in via tun3

In the beginning 64000 and 64001 get through.
Now only using 64001 for some time, and then
starting to use 64002, suddenly 64000 is no longer
translated. (Not always happens in that exact sequence.)

# ipfw nat show log
nat 2: icmp=32, udp=1, tcp=40, sctp=0, pptp=0, proto=0, frag_id=0 frag_ptr=0 / tot=73
nat 3: icmp=0, udp=2, tcp=72, sctp=0, pptp=0, proto=0, frag_id=0 frag_ptr=0 / tot=74

These udp= counters are changing meanwhile, but I don't get a clue
from it, and didn't find a way to log details.

Platform 13.2-BETA2 (but the problem is older)
Comment 1 Peter Much 2024-02-22 00:39:08 UTC
Created attachment 248673 [details]
DTRACE cmdline

 for general amusement and bookkeping - I mean, to observe what libalias adds and removes from it's database
Comment 2 Peter Much 2024-02-22 01:53:37 UTC
I managed to analyze this a bit further:

With an UDP service (e.g. VPN) portforwarded behind NAT there is no notion of a connection, and apparently it can happen that libalias creates TWO distinct records for inbound and outbound traffic, with the inbound record lacking the portforward (and the received packets then dumped as undeliverable on the NATing node).

The problem always resolves itself after either 12 or 14 minutes, apparently due to internal timings.

Observed with dtrace (cmdline is attached):

203.0.113.1   is my public address where NAT aliases to
198.51.100.1  is the address of the connecting client from outside
192.168.1.12  is the internal address of the VPN server (where portforward goes to)

-> At this time we have a packet client->server, two server->client and another client->server, all successful

 18  17832                 DeleteLink:entry openvpn 1708555176 icmp=5 udp=2 tcp=108
          type=17 (src_addr 192.168.1.12) (dst_addr 0.0.0.0)
          (alias_addr 0.0.0.0) (proxy_addr 0.0.0.0)
          (src_port 5007) (dst_port 0) (alias_port 5006) (proxy_port 0)

-> For whatever reason libalias tries to delete some UDP, but the next line shows that nothing was actually deleted. So this would apparently be the permanent portforward, there is no other codepath
   
 18  49947                    AddLink:entry openvpn 1708555180 icmp=5 udp=2 tcp=108

-> At this time we get a packet client->server, and it is the first failing (it gets processed by libalias, but NOT portforwarded (in the next second we also
get packets in the other direction, so without fractional seconds logged we
do not know when exactly what happened) 

 18  17832                 DeleteLink:entry openvpn 1708555186 icmp=5 udp=2 tcp=111
          type=17 (src_addr 192.168.1.12) (dst_addr 198.51.100.1)
          (alias_addr 0.0.0.0) (proxy_addr 0.0.0.0)
          (src_port 5007) (dst_port 8211) (alias_port 5006) (proxy_port 0)

-> Here the portforward got deleted (but WHY? It is continuousley used, pings
   every 10 seconds), and udp count goes to 1.

 18  49947                    AddLink:entry openvpn 1708555186 icmp=5 udp=1 tcp=111
          (src_addr 203.0.113.1) (dst_addr 91.62.14.110) (alias_addr 203.0.113.1)
          (src_port 5006)
 18  49948                   AddLink:return openvpn 1708555186 icmp=5 udp=2 tcp=111
          type=17 (src_addr 203.0.113.1) (dst_addr 198.51.100.1)
          (alias_addr 203.0.113.1) (proxy_addr 0.0.0.0)
          (src_port 5006) (dst_port 8211) (alias_port 5006) (proxy_port 0)

-> And immediately added again, but now WITHOUT the portforward!!

 19  17832                 DeleteLink:entry suricata 1708555186 icmp=5 udp=2 tcp=111
          type=17 (src_addr 192.168.1.12) (dst_addr 0.0.0.0)
          (alias_addr 0.0.0.0) (proxy_addr 0.0.0.0)
          (src_port 5007) (dst_port 0) (alias_port 5006) (proxy_port 0)

-> Apparently the housekeeping is too aggressive...

 19  49947                    AddLink:entry suricata 1708555186 icmp=5 udp=2 tcp=111
          (src_addr 192.168.1.12) (dst_addr 198.51.100.1) (alias_addr 0.0.0.0)
          (src_port 5007)
 19  49948                   AddLink:return suricata 1708555186 icmp=5 udp=3 tcp=111
          type=17 (src_addr 192.168.1.12) (dst_addr 198.51.100.1)
          (alias_addr 0.0.0.0) (proxy_addr 0.0.0.0)
          (src_port 5007) (dst_port 8211) (alias_port 5006) (proxy_port 0)

-> And now we get the portforward as a *SEPARATE* record (udp went to 3!) 

 19  17832                 DeleteLink:entry suricata 1708555186 icmp=5 udp=3 tcp=111
          type=17 (src_addr 192.168.1.12) (dst_addr 0.0.0.0)
          (alias_addr 0.0.0.0) (proxy_addr 0.0.0.0)
          (src_port 5007) (dst_port 0) (alias_port 5006) (proxy_port 0)

-> groundhog day again