ipnat on FreeBSD 10.1-10.2 stops working properly after 1-2 days of uptime: some sites stop loading entirely.
I have similar problems; they are very hard to debug, though a reboot fixes it. There are reports at https://forums.freebsd.org/threads/50432/ and https://forums.freebsd.org/threads/50618/. There is also a report of other ipnat issues that I can't replicate: https://sourceforge.net/p/ipfilter/bugs/560/
Aleks, can you post the output of ipnat -s? For me, I see "bucket max in" increasing when things are not working right.
A few questions:
1. Is this for IPv4 or IPv6?
2. Do you have fastforward enabled?
3. Is tso4 or tso6 enabled?
4. What kind of interfaces do you have? (fxp?)
5. Do the logs indicate "bad" packets?
6. Do you have rxcsum, txcsum, rxcsum6, txcsum6 enabled on any interfaces?
7. What kind of traffic traverses your firewall?
8. Can you send me copies of your ipf.conf and ipnat.conf?
9. Can you send me copies of your logs?
If you can, can you try -CURRENT, issue the following dtrace command, and send the output:
dtrace -n 'sdt:::ipf_fi_bad_* { stack(); }'
ipnat -CF -f /etc/ipnat.rules
rdr vmx2 *.*.*.*/32 port 37777 -> 192.168.2.120 port 37777
rdr vmx2 *.*.*.*/32 port 554 -> 192.168.2.120 port 554
rdr vmx2 *.*.*.*/32 port 8933 -> 192.168.2.2 port 3389

ipnat -s
0 proxy create fail in
0 proxy fail in
15774 bad nat in
15774 bad nat new in
0 bad next addr in
8039 bucket max in
0 clone nomem in
0 decap bad in
0 decap fail in
0 decap pullup in
0 divert dup in
0 divert exist in
15774 drop in
0 exhausted in
0 icmp address in
0 icmp basic in
6641 inuse in
0 icmp mbuf wrong size in
15363 icmp header unmatched in
0 icmp rebuild failures in
5 icmp short in
0 icmp packet size wrong in
0 IFP address fetch failures in
1786353797 packets untranslated in
0 NAT insert failures in
130865 NAT lookup misses in
1786289959 NAT lookup nowild in
0 new ifpaddr failed in
0 memory requests failed in
0 table max reached in
2063844 packets translated in
15774 finalised failed in
0 search wraps in
0 null translations in
0 translation exists in
0 no memory in
0% hash efficiency in
72.11% bucket usage in
0 minimal length in
-1 maximal length in
0.000 average length in
0 proxy create fail out
0 proxy fail out
0 bad nat out
0 bad nat new out
0 bad next addr out
7735 bucket max out
0 clone nomem out
0 decap bad out
0 decap fail out
0 decap pullup out
0 divert dup out
0 divert exist out
0 drop out
0 exhausted out
0 icmp address out
0 icmp basic out
6639 inuse out
0 icmp mbuf wrong size out
5055 icmp header unmatched out
0 icmp rebuild failures out
0 icmp short out
0 icmp packet size wrong out
0 IFP address fetch failures out
629077063 packets untranslated out
0 NAT insert failures out
70007 NAT lookup misses out
629021373 NAT lookup nowild out
0 new ifpaddr failed out
0 memory requests failed out
0 table max reached out
4352189 packets translated out
0 finalised failed out
0 search wraps out
0 null translations out
0 translation exists out
0 no memory out
0% hash efficiency out
72.11% bucket usage out
0 minimal length out
-1 maximal length out
0.000 average length out
0 log successes
0 log failures
6819 added in
0 added out
0 active
0 transparent adds
0 divert build
6817 expired
0 flush all
0 flush closing
0 flush queue
0 flush state
0 flush timeout
0 hostmap new
0 hostmap fails
0 hostmap add
0 hostmap NULL rule
0 log ok
0 log fail
0 orphan count
3 rule count
0 map rules
3 rdr rules
0 wilds

For comparison, ipnat -s on 9.1-RELEASE-p7 (FreeBSD 9.1-RELEASE-p7):
mapped in 2667739183 out 1702069172
added 46417036 expired 46254947
no memory 0 bad nat 20
inuse 31107
orphans 0
rules 4
wilds 0
hash efficiency 30.68%
bucket usage 95.34%
minimal length 0
maximal length 11
average length 3.260
TCP Entries per state
  0     1     2     3     4     5     6     7     8     9    10    11
  1  1420     1     0 13308  3261   508     0     2     0  1420    86
# dtrace -n 'sdt:::ipf_fi_bad_* { stack(); }'
dtrace: failed to initialize dtrace: DTrace device not available on system
# portinstall dtrace
[Updating the portsdb <format:bdb_btree> in /usr/ports ... - 25971 port entries found ......... done]
rinetd.conf:
0.0.0.0 3839 192.168.3.2 3389
0.0.0.0 3389 192.168.4.4 3389

ipfw show
00002 58 4728 deny ip from 37.57.10.103 to any
00002 2 101 deny ip from 185.81.157.165 to any
00002 0 0 deny ip from 134.147.203.115 to any
00002 2 80 deny ip from 184.105.139.102 to any
00002 1 68 deny ip from 198.23.217.2 to any
00002 26 1276 deny ip from 94.19.161.49 to any
00002 0 0 deny ip from 94.19.161.49 to any
00002 1 57 deny ip from 113.240.250.157 to any
00002 0 0 deny ip from 185.103.252.141 to any
00002 0 0 deny ip from 23.107.61.74 to any
00002 4 165 deny ip from 185.130.4.187 to any
00002 21 973 deny ip from 71.6.167.142 to any
00002 12 1384 deny ip from 77.37.208.110 to any
00002 8 480 deny ip from 208.100.26.231 to any
00002 2 103 deny ip from 159.122.221.35 to any
00002 4 250 deny ip from 46.28.53.164 to any
00002 0 0 deny ip from 72.178.16.119 to any
00002 0 0 deny ip from 149.56.106.210 to any
00002 8 480 deny ip from 208.100.26.230 to any
00002 516 33586 deny ip from 89.18.27.155 to any
00002 1 40 deny ip from 184.105.139.78 to any
00002 46 2096 deny ip from 109.87.13.248 to any
00002 50 2292 deny ip from 178.136.244.49 to any
00002 3 121 deny ip from 66.240.236.119 to any
00002 0 0 deny ip from 184.105.139.122 to any
00002 19 1996 deny ip from 118.193.27.8 to any
00002 22 2172 deny ip from 118.193.27.26 to any
00010 0 0 deny tcp from table(51) to me dst-port 21 via vmx3
00011 0 0 deny tcp from table(50) to me dst-port 22 via vmx3
00012 0 0 deny tcp from table(52) to me dst-port 25 via vmx3
00014 0 0 deny tcp from table(55) to me dst-port 80 via vmx3
00100 829048410 5928407892820 allow ip from any to any via lo0
00200 0 0 deny ip from any to 127.0.0.0/8
00300 0 0 deny ip from 127.0.0.0/8 to any
00400 0 0 deny ip from any to ::1
00500 0 0 deny ip from ::1 to any
00600 0 0 allow ipv6-icmp from :: to ff02::/16
00700 0 0 allow ipv6-icmp from fe80::/10 to fe80::/10
00800 0 0 allow ipv6-icmp from fe80::/10 to ff02::/16
00900 0 0 allow ipv6-icmp from any to any ip6 icmp6types 1
01000 0 0 allow ipv6-icmp from any to any ip6 icmp6types 2,135,136
64103 0 0 allow ip from any to any via lo0
64203 0 0 deny ip from any to 127.0.0.0/8
64303 0 0 deny ip from 127.0.0.0/8 to any
64503 61597109 24592953509 nat 2 ip from 192.168.0.0/21 to any out via vmx2
64603 140053843 88880209406 nat 2 ip from any to any in via vmx2
64703 1711865023 2203642003048 allow ip from any to any
65000 987 183839 allow ip from any to any
65535 25 5026 allow ip from any to any
10.2-STABLE FreeBSD 10.2-STABLE #1

rc.conf:
tcp_extensions="NO"
tcp_drop_synfin="YES"
icmp_drop_redirect="YES"
icmp_log_redirect="YES"
kern_securelevel_enable="YES"
kern_securelevel=2
ipfilter_enable="YES"
ipfilter_rules="/etc/ipf.rules"
ipfilter_flags="-D -T ipf_nattable_sz=10009,ipf_nattable_max=300000 -E"

ipfstat list:
0 input bad packets
0 input IPv6 packets
15777 input packets blocked
2031607163 input packets passed
2031557771 input packets not matched
0 input packets counted
5 input packets short
0 input packets logged and blocked
0 input packets logged and passed
0 input fragment state kept
0 input fragment state lost
0 input packet state kept
0 input packet state lost
0 input invalid source
0 input cache hits
0 input cache misses
0 input bad coalesces
0 input pullups succeeded
0 input pullups failed
0 input TCP checksum failures
0 input block reason packet blocked
0 input block reason log rule failure
0 input block reason pps rate exceeded
0 input block reason jumbogram
0 input block reason makefrip failed
0 input block reason cannot add state
0 input block reason IP ID update failed
0 input block reason log-or-block failed
0 input block reason decapsulate failure
0 input block reason cannot create new auth entry
0 input block reason packet queued for auth
0 input block reason buffer coalesce failure
0 input block reason buffer pullup failure
0 input block reason auth feedback
0 input block reason bad fragment
15777 input block reason IPv4 NAT failure
0 input block reason IPv6 NAT failure
0 output bad packets
0 output IPv6 packets
0 output packets blocked
711655122 output packets passed
711643028 output packets not matched
0 output packets counted
0 output packets short
0 output packets logged and blocked
0 output packets logged and passed
0 output fragment state kept
0 output fragment state lost
0 output packet state kept
0 output packet state lost
0 output invalid source
0 output cache hits
0 output cache misses
0 output bad coalesces
0 output pullups succeeded
0 output pullups failed
0 output TCP checksum failures
0 output block reason packet blocked
0 output block reason log rule failure
0 output block reason pps rate exceeded
0 output block reason jumbogram
0 output block reason makefrip failed
0 output block reason cannot add state
0 output block reason IP ID update failed
0 output block reason log-or-block failed
0 output block reason decapsulate failure
0 output block reason cannot create new auth entry
0 output block reason packet queued for auth
0 output block reason buffer coalesce failure
0 output block reason buffer pullup failure
0 output block reason auth feedback
0 output block reason bad fragment
0 output block reason IPv4 NAT failure
0 output block reason IPv6 NAT failure
0 packets logged
0 log failures
0 red-black no memory
0 red-black node maximum
0 ICMP replies sent
0 TCP RSTs sent
0 fastroute successes
0 fastroute failures
4964119 IPF Ticks
0 Packet log flags set: none
I believe I have stumbled into this problem as well. As time goes on, devices get timeouts on new connections. Once established, a connection moves at full speed. A reboot fixes it. Here are answers to some of your questions, Cy; let me know if you want the config files etc.

- FreeBSD 11.0-RELEASE-p2 (amd64, GENERIC kernel). The system is pretty bare and only provides sshd, isc-dhcpd, ipnat, and ipf as services.
- Intel NICs (igb) with the default options:
  options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
- IPv6 disabled in rc.conf (though it still shows active on lo0)

After about 8 hours of uptime:
# ipfstat | egrep 'NAT failure'
4 input block reason IPv4 NAT failure
0 input block reason IPv6 NAT failure
2080 output block reason IPv4 NAT failure   <--- climbing constantly
0 output block reason IPv6 NAT failure
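A small sketch for tracking that counter growth without eyeballing the full output. It assumes ipfstat(8) prints the counter lines exactly as quoted in this report (e.g. "2080 output block reason IPv4 NAT failure"); the cron idea and the function name are illustrative, not from this report:

```shell
#!/bin/sh
# Sketch: extract the two IPv4 NAT failure counters from ipfstat output so
# their growth can be logged over time (e.g. from cron with a timestamp).
nat_failures() {
    # Reads ipfstat-style text on stdin; prints "in=<n> out=<n>".
    awk '/input block reason IPv4 NAT failure/  { i = $1 }
         /output block reason IPv4 NAT failure/ { o = $1 }
         END { printf "in=%d out=%d\n", i, o }'
}

# On a live gateway (guarded so the sketch is harmless elsewhere):
if command -v ipfstat >/dev/null 2>&1; then
    ipfstat | nat_failures
fi
```

Logging "date + nat_failures" every few minutes makes it easy to correlate the counter jumps with the connection timeouts described above.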
Do you mind sending me a copy of your ipf.conf and ipnat.conf?
Just a quick comment that I too see this (or a similar) problem on 11.0-RELEASE-p2. Really annoying. Network interface flags:
> root@hope:/etc # ifconfig bce1
> bce1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
> options=c01bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO,LINKSTATE>
> root@hope:/etc # uname -a
> FreeBSD hope.grebo.net 11.0-RELEASE-p2 FreeBSD 11.0-RELEASE-p2 #0: Mon Oct 24 06:55:27 UTC 2016 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64
Can provide more info if needed. The machine is a dual-CPU 32GB HP server that acts as a firewall/NAT gateway at home, so I can do some experimentation on it if needed :-)
- Peter
Can you provide:
- /var/log/messages output showing ipmon messages. If you're not running ipmon, please do, and send me output covering a period of time.
- ipf.conf and ipnat.conf
- ifconfig -a output
- ipfstat -in and ipfstat -on, separately (easier to read)
- ipnat -lv output
- are you using IPv4, IPv6, or both?
Depending on what we see, I'll ask for more. You can send to my freebsd.org email address instead of posting here if you want.
Output from commands/files sent to your email. Right now the machine doesn't exhibit the problem though (since last reboot late last night - but there hasn't been much traffic through it since then). Will send a new set of output when the problem manifests itself again. I'm only using IPv4 (right now).
(In reply to Peter Eriksson from comment #12) Are you able to send me those outputs when the machine is exhibiting the problem?
I have this problem as well. ipnat periodically stops doing NAT: only one of 3-5 new connection attempts gets through, but once a connection is established it runs at full speed. A reboot fixes the problem for a short time.

I checked it simply: I started a ping from the internal network to the external network and then stopped it. If the ping did not pass (and this happens 3-5 times in a row), the number of errors increased; if the ping worked, no errors were added. Redirects from the external network to the internal network fail in the same way: only one in 3-5 requests passes. I can reproduce this problem with both i386 and amd64 kernels.

After one day of uptime:
# ipfstat | grep 'NAT failure'
1284 input block reason IPv4 NAT failure
0 input block reason IPv6 NAT failure
4150 output block reason IPv4 NAT failure
0 output block reason IPv6 NAT failure

At this time:
# uname -a
FreeBSD test 11.0-RELEASE-p10 FreeBSD 11.0-RELEASE-p10 #1 r317791: Thu May 4 16:47:26 EEST 2017 user@test:/usr/obj/usr/src/sys/TEST amd64

The kernel is GENERIC with minimal changes; I add:
maxusers 0
options IPFILTER
options IPFILTER_LOG

nfe0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=82008<VLAN_MTU,WOL_MAGIC,LINKSTATE>
em1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=4019b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,VLAN_HWTSO>

There are no messages from ipnat in /var/log/messages except the startup message when the system starts. What other information is needed?
(In reply to DYM from comment #14) Can you provide ipnat.conf and ipf.conf please? Can you list /var/log/messages, or the file where your ipmon output is sent? Do you use ippool?
Also, ifconfig -a, netstat -nr, and netstat -ni. Can you also provide tcpdump -w output? Don't post it here, send it to me directly. Thanks.
xx.xx.xx.xx -- external IP
yy.yy.yy.yy -- internal IP
zz.zz.zz.zz -- internal IP

# cat ipnat.conf
rdr em1 xx.xx.xx.xx/32 port 80 -> yy.yy.yy.yy port 80
rdr em1 xx.xx.xx.xx/32 port 88 -> zz.zz.zz.zz port 80
map em1 yy.yy.yy.0/24 -> xx.xx.xx.xx/32 portmap tcp/udp 10000:200000
map em1 yy.yy.yy.0/24 -> xx.xx.xx.xx/32
map em1 zz.zz.zz.0/24 -> xx.xx.xx.xx/32 portmap tcp/udp 10000:200000
map em1 zz.zz.zz.0/24 -> xx.xx.xx.xx/32

# cat ipf.rules
pass in all
pass out all

> Can you list /var/log/messages, or the file where your ipmon output is sent?
For a whole day there is only one record from ipf/ipnat/ipmon:
IP Filter: v5.1.2 initialized. Default = pass all, Logging = enabled

> Do you use ippool?
No.

If necessary, I can build a test stand and simulate an internal machine, a gateway server, and an external network. However, this will take some time.
Sorry, the portmap ranges in the previous comment were a typo; the actual file reads 10000:20000:

# cat ipnat.rules
rdr em1 xx.xx.xx.xx/32 port 80 -> yy.yy.yy.yy port 80
rdr em1 xx.xx.xx.xx/32 port 88 -> zz.zz.zz.zz port 80
map em1 yy.yy.yy.0/24 -> xx.xx.xx.xx/32 portmap tcp/udp 10000:20000
map em1 yy.yy.yy.0/24 -> xx.xx.xx.xx/32
map em1 zz.zz.zz.0/24 -> xx.xx.xx.xx/32 portmap tcp/udp 10000:20000
map em1 zz.zz.zz.0/24 -> xx.xx.xx.xx/32
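Since the earlier listing contained the impossible range 10000:200000, a quick sanity check for portmap bounds can catch this class of typo before loading the rules. This is an illustrative sketch, not part of ipnat itself; the file path and function name are examples:

```shell
#!/bin/sh
# Sketch: sanity-check portmap ranges in an ipnat rules file.  TCP/UDP ports
# are 16 bits, so any bound above 65535 (like the 10000:200000 typo in the
# earlier listing) can never match.
check_portmap() {
    awk 'match($0, /portmap[ \t]+[a-z\/]+[ \t]+[0-9]+:[0-9]+/) {
             range = substr($0, RSTART, RLENGTH)
             n = split(range, f, /[ \t:]+/)
             lo = f[n-1] + 0; hi = f[n] + 0
             if (hi > 65535 || lo > hi)
                 printf "line %d: bad portmap range %d:%d\n", NR, lo, hi
         }' "$1"
}

# Check the live rules file if present (guarded for portability):
if [ -r /etc/ipnat.rules ]; then
    check_portmap /etc/ipnat.rules
fi
```

Running this over a rules file prints one warning line per out-of-range or inverted portmap bound and nothing when all ranges are valid.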
If I download a file from two sources (simulating load on the gateway), some downloads start (beginning of the pcap file) and some, from the same sources, do not: "No route to host" / "Couldn't fetch it". Here is the output of the NAT errors before starting the simulation and after it ended:

root@test /home/user # ipfstat | grep 'NAT failure'
16 input block reason IPv4 NAT failure
0 input block reason IPv6 NAT failure
70 output block reason IPv4 NAT failure
0 output block reason IPv6 NAT failure
root@test /home/user # ipfstat | grep 'NAT failure'
16 input block reason IPv4 NAT failure
0 input block reason IPv6 NAT failure
80 output block reason IPv4 NAT failure
0 output block reason IPv6 NAT failure

I also noticed that this happens on the GENERIC kernel when ipnat is turned on.
Thanks DYM, I can see the failures above. I see three TCP sessions: two to downloads.sourceforge.net and a third, a sourceforge redirect to a geo-located sourceforge server.

distcache.freebsd.org failed because it failed to resolve an A (IPv4) address; the DNS lookup returned an AAAA (IPv6) address. You will see this in packets 5030 and 5031 of the pcap file you sent me. An nslookup of distcache.freebsd.org (from here, Western Canada) resolves to:

cwfw$ nslookup distcache.freebsd.org
Server: 127.0.0.1
Address: 127.0.0.1#53

Non-authoritative answer:
distcache.freebsd.org canonical name = distcache.geo.freebsd.org.
Name: distcache.geo.freebsd.org
Address: 149.20.1.201
Name: distcache.geo.freebsd.org
Address: 2001:4f8:1:11::50:1
cwfw$

Below you see your request is for an AAAA record and you receive an IPv6 address. First your computer requests a AAAA (IPv6) address for distcache.freebsd.org:

google.com.domain: 9233+ AAAA? distcache.FreeBSD.org. (39)

And you receive a reply from the google DNS of a CNAME and a AAAA record (IPv6 address):

17:03:47.618329 IP google-public-dns-a.google.com.domain > 46-245-176-99.static.mivitec.net.16379: 9233 2/0/0 CNAME distcache.geo.FreeBSD.org., AAAA 2001:41c8:112:8300::50:1 (95)

Here is an example of what a request for both should look like. (My firewall machine acts as a forwarding-only DNS to the outside world, and distcache.freebsd.org was cached elsewhere in my network using named and nscd.)

My computer requests an A (IPv4) address:
23:40:59.502376 IP cwsys.58627 > cwfw.domain: 59749+ [1au] A? distcache.geo.freebsd.org. (82)
An IPv4 (A) address is returned:
23:40:59.502701 IP cwfw.domain > cwsys.58627: 59749 1/3/7 A 149.20.1.201 (1100)
Next a AAAA request is made:
23:40:59.504976 IP cwsys.50325 > cwfw.domain: 12700+ [1au] AAAA? distcache.geo.freebsd.org. (82)
An AAAA record (IPv6) is returned:
23:40:59.505174 IP cwfw.domain > cwsys.50325: 12700 1/3/7 AAAA 2001:4f8:1:11::50:1 (1112)

(Note the IPs will be different here in Canada than in Ukraine because distcache is geo-located.)

The pcap file shows a request for the wrong address, which explains "no route to host": you cannot route IPv6 on an IPv4 network. The NAT issue was not captured in the pcap file you sent me. BTW, I ran the same test here with no NAT errors. Can you rerun the test please?
I'm having this issue on a FreeBSD 11.2 system as well.
----
# uname -a
FreeBSD <hostname> 11.2-RELEASE FreeBSD 11.2-RELEASE #0 r335510: Fri Jun 22 04:32:14 UTC 2018 root@releng2.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64

% sudo ipfstat
0 input bad packets
26965780 input IPv6 packets
79528 input packets blocked
73406775 input packets passed
37554 input packets not matched
0 input packets counted
0 input packets short
31191 input packets logged and blocked
6 input packets logged and passed
0 input fragment state kept
0 input fragment state lost
0 input packet state kept
0 input packet state lost
0 input invalid source
0 input cache hits
0 input cache misses
0 input bad coalesces
0 input pullups succeeded
7 input pullups failed
0 input TCP checksum failures
79405 input block reason packet blocked
0 input block reason log rule failure
0 input block reason pps rate exceeded
0 input block reason jumbogram
0 input block reason makefrip failed
0 input block reason cannot add state
0 input block reason IP ID update failed
0 input block reason log-or-block failed
0 input block reason decapsulate failure
0 input block reason cannot create new auth entry
0 input block reason packet queued for auth
0 input block reason buffer coalesce failure
0 input block reason buffer pullup failure
0 input block reason auth feedback
0 input block reason bad fragment
123 input block reason IPv4 NAT failure
0 input block reason IPv6 NAT failure
0 output bad packets
26954117 output IPv6 packets
48491 output packets blocked
69747622 output packets passed
22591464 output packets not matched
0 output packets counted
0 output packets short
0 output packets logged and blocked
0 output packets logged and passed
0 output fragment state kept
0 output fragment state lost
320625 output packet state kept
3333 output packet state lost
0 output invalid source
0 output cache hits
0 output cache misses
0 output bad coalesces
80 output pullups succeeded
0 output pullups failed
0 output TCP checksum failures
0 output block reason packet blocked
0 output block reason log rule failure
0 output block reason pps rate exceeded
0 output block reason jumbogram
0 output block reason makefrip failed
0 output block reason cannot add state
0 output block reason IP ID update failed
0 output block reason log-or-block failed
0 output block reason decapsulate failure
0 output block reason cannot create new auth entry
0 output block reason packet queued for auth
0 output block reason buffer coalesce failure
0 output block reason buffer pullup failure
0 output block reason auth feedback
0 output block reason bad fragment
48491 output block reason IPv4 NAT failure
0 output block reason IPv6 NAT failure
1305 packets logged
28835 log failures
0 red-black no memory
0 red-black node maximum
0 ICMP replies sent
0 TCP RSTs sent
0 fastroute successes
0 fastroute failures
1305713 IPF Ticks
0 Packet log flags set: none
----
Like the other reporter, my NAT failures grow over time:
48491 output block reason IPv4 NAT failure
Is this bug still active?
ifconfig
It's still an open bug. The alternative is to use pf instead.
(In reply to ddrinnon from comment #23) Thanks for responding. I'm migrating to IPFW right now after seeing this issue on the forums. I just thought I'd chime in to help diagnose the problem with IPFILTER/IPNAT if possible.
(In reply to Jeremy Cooper from comment #24) ifconfig ?
I think the error is related to the driver. I am using ipfw now.
(In reply to Aleks from comment #25)
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=209b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC>
        ether 00:e0:67:0d:6d:30
        hwaddr 00:e0:67:0d:6d:30
        inet6 fe80::2e0:67ff:fe0d:6d30%em0 prefixlen 64 scopeid 0x1
        inet6 yyyy:yyyy:yyyy:yyy0::2 prefixlen 64
        inet xx.xx.xx.xx netmask 0xffffff00 broadcast xx.xx.xx.255
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
em1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=209b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC>
        ether 00:e0:67:0d:6d:31
        hwaddr 00:e0:67:0d:6d:31
        inet6 fe80::2e0:67ff:fe0d:6d31%em1 prefixlen 64 scopeid 0x2
        inet6 yyyy:yyyy:yyyy:yyy1::1 prefixlen 64
        inet 192.168.1.1 netmask 0xffffff00 broadcast 192.168.1.255
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
em2: flags=8c02<BROADCAST,OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=209b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC>
        ether 00:e0:67:0d:6d:32
        hwaddr 00:e0:67:0d:6d:32
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: no carrier
em3: flags=8c02<BROADCAST,OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=209b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC>
        ether 00:e0:67:0d:6d:33
        hwaddr 00:e0:67:0d:6d:33
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: no carrier
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x5
        inet 127.0.0.1 netmask 0xff000000
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        groups: lo
gif0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> metric 0 mtu 1280
        options=80000<LINKSTATE>
        tunnel inet xx.xx.xx.xx --> zz.zz.zz.zz
        inet6 gggg:gggg:gggg:gggg::gggg prefixlen 127
        inet6 fe80::2e0:67ff:fe0d:6d30%gif0 prefixlen 64 scopeid 0x6
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        groups: gif
----
# netstat -in
Name  Mtu   Network        Address                    Ipkts Ierrs Idrop    Opkts Oerrs Coll
em0   1500  <Link#1>       00:e0:67:0d:6d:30       35448355     0     0 26049754     0    0
em0   -     fe80::%em0/64  fe80::2e0:67ff:fe             33     -     -     4014     -    -
em0   -     yyyy:yyyy:yyy  yyyy:yyyy:yyyy:yyy0::     128841     -     -    19590     -    -
em0   -     xx.xx.xx.0/24  xx.xx.xx.xx              3932344     -     -  2923714     -    -
em1   1500  <Link#2>       00:e0:67:0d:6d:31       26559500     0     0 35266695     0    0
em1   -     fe80::%em1/64  fe80::2e0:67ff:fe          52287     -     -    73187     -    -
em1   -     yyyy:yyyy:yyy  yyyy:yyyy:yyyy:yyy1::     261873     -     -   311304     -    -
em1   -     192.168.1.0/2  192.168.1.1               141091     -     -   116878     -    -
em2*  1500  <Link#3>       00:e0:67:0d:6d:32              0     0     0        0     0    0
em3*  1500  <Link#4>       00:e0:67:0d:6d:33              0     0     0        0     0    0
lo0   16384 <Link#5>       lo0                        10076     0     0    10076     0    0
lo0   -     ::1/128        ::1                          205     -     -      320     -    -
lo0   -     fe80::%lo0/64  fe80::1%lo0                    0     -     -        2     -    -
lo0   -     127.0.0.0/8    127.0.0.1                   9864     -     -     9871     -    -
gif0  1280  <Link#6>       gif0                    10613773     0     0  7874989 40607    0
gif0  -     gggg:gggg:ggg  gggg:gggg:gggg:gggg::gggg   2298     -     -    42565     -    -
gif0  -     fe80::%gif0/6  fe80::2e0:67ff:fe              0     -     -        0     -    -
Gather the maximum amount of information before closing this bug.
For completeness:
--
# cat /etc/ipnat.rules
map em0 192.168.1.0/24 -> xx.xx.xx.xx/32 portmap tcp/udp 43000:65534
#map em0 192.168.1.0/24 -> xx.xx.xx.xx/32
#
# Exact port numbers have been fudged to protect the innocent.
#
rdr em0 xx.xx.xx.xx/32 port 31999 -> 192.168.1.10 port 9999 tcp
rdr em0 xx.xx.xx.xx/32 port 32000 -> 192.168.1.11 port 22 tcp
rdr em0 xx.xx.xx.xx/32 port 32001 -> 192.168.1.10 port 8888 tcp
rdr em0 xx.xx.xx.xx/32 port 32002 -> 192.168.1.12 port 22 tcp
rdr em0 xx.xx.xx.xx/32 port 32003 -> 192.168.1.10 port 8000 tcp
rdr em0 xx.xx.xx.xx/32 port 32004 -> 192.168.1.10 port 22 tcp
rdr em0 xx.xx.xx.xx/32 port 32005 -> 192.168.1.13 port 80 tcp
rdr em0 xx.xx.xx.xx/32 port 32006 -> 192.168.1.14 port 22 tcp
rdr em0 xx.xx.xx.xx/32 port 32007 -> 192.168.1.15 port 22 tcp
rdr em0 xx.xx.xx.xx/32 port 32008 -> 192.168.1.10 port 80 tcp
rdr em0 xx.xx.xx.xx/32 port 32009 -> 192.168.1.10 port 5900 tcp
rdr em0 xx.xx.xx.xx/32 port 32010 -> 192.168.1.10 port 8889 tcp
(In reply to Aleks from comment #28) I'm trying; it takes some time to scrub private information from these reports.
(In reply to Jeremy Cooper from comment #30) If you want, you can send output directly to me. ipnat -lv should also give more info. I see you have TSO turned off; this is good. As a test, can you turn off rxcsum and txcsum? Are there any ipmon messages about bad packets? TCP window errors? What kind of client systems do you have behind your firewall? Windows, FreeBSD, other? I'd also like to sample some of your NAT failures with:

dtrace -n 'sdt:::frb_natv4in { printf("reson: %d, if: %s\n", args[0]->fin_reason, stringof(args[0]->fin_ifp)); stack(); }' \
       -n 'sdt:::frb_natv4out { printf("reson: %d, if: %s\n", args[0]->fin_reason, stringof(args[0]->fin_ifp)); stack(); }'

Please send me or attach the output from the above dtrace command.
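For the rxcsum/txcsum test suggested above, the flags can be cleared at run time. The interface name em0 is an example only; the change does not survive a reboot (persist it via ifconfig_em0 in /etc/rc.conf if the test helps):

```shell
# Example only: clear hardware checksum offload (and TSO) on the suspect
# interface for testing.  Substitute your own interface name for em0.
ifconfig em0 -rxcsum -txcsum -tso
# Confirm the offload flags are gone:
ifconfig em0 | grep options
```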
(In reply to Cy Schubert from comment #31) I've sent Cy some of the information he asked for under separate cover. But for completeness in the bug, here's one piece of information that doesn't need much scrubbing. The Dtrace output: 1 61730 none:frb_natv4out reson: 0, if: ipl.ko`ipf_check+0x70f kernel`pfil_run_hooks+0xb3 kernel`ip_tryforward+0x1fa kernel`ip_input+0x36a kernel`netisr_dispatch_src+0xa8 kernel`ether_demux+0x140 kernel`ether_nh_input+0x336 kernel`netisr_dispatch_src+0xa8 kernel`ether_input+0x26 kernel`if_input+0xa kernel`em_rxeof+0x2c0 kernel`em_handle_que+0x40 kernel`taskqueue_run_locked+0x154 kernel`taskqueue_thread_loop+0x98 kernel`fork_exit+0x83 kernel`0xffffffff80f585be 1 61730 none:frb_natv4out reson: 0, if: ipl.ko`ipf_check+0x70f kernel`pfil_run_hooks+0xb3 kernel`ip_tryforward+0x1fa kernel`ip_input+0x36a kernel`netisr_dispatch_src+0xa8 kernel`ether_demux+0x140 kernel`ether_nh_input+0x336 kernel`netisr_dispatch_src+0xa8 kernel`ether_input+0x26 kernel`if_input+0xa kernel`em_rxeof+0x2c0 kernel`em_handle_que+0x40 kernel`taskqueue_run_locked+0x154 kernel`taskqueue_thread_loop+0x98 kernel`fork_exit+0x83 kernel`0xffffffff80f585be 1 61730 none:frb_natv4out reson: 0, if: ipl.ko`ipf_check+0x70f kernel`pfil_run_hooks+0xb3 kernel`ip_tryforward+0x1fa kernel`ip_input+0x36a kernel`netisr_dispatch_src+0xa8 kernel`ether_demux+0x140 kernel`ether_nh_input+0x336 kernel`netisr_dispatch_src+0xa8 kernel`ether_input+0x26 kernel`if_input+0xa kernel`em_rxeof+0x2c0 kernel`em_handle_que+0x40 kernel`taskqueue_run_locked+0x154 kernel`taskqueue_thread_loop+0x98 kernel`fork_exit+0x83 kernel`0xffffffff80f585be 1 61730 none:frb_natv4out reson: 0, if: ipl.ko`ipf_check+0x70f kernel`pfil_run_hooks+0xb3 kernel`ip_tryforward+0x1fa kernel`ip_input+0x36a kernel`netisr_dispatch_src+0xa8 kernel`ether_demux+0x140 kernel`ether_nh_input+0x336 kernel`netisr_dispatch_src+0xa8 kernel`ether_input+0x26 kernel`if_input+0xa kernel`em_rxeof+0x2c0 kernel`em_handle_que+0x40 
kernel`taskqueue_run_locked+0x154 kernel`taskqueue_thread_loop+0x98 kernel`fork_exit+0x83 kernel`0xffffffff80f585be 1 61730 none:frb_natv4out reson: 0, if: ipl.ko`ipf_check+0x70f kernel`pfil_run_hooks+0xb3 kernel`ip_tryforward+0x1fa kernel`ip_input+0x36a kernel`netisr_dispatch_src+0xa8 kernel`ether_demux+0x140 kernel`ether_nh_input+0x336 kernel`netisr_dispatch_src+0xa8 kernel`ether_input+0x26 kernel`if_input+0xa kernel`em_rxeof+0x2c0 kernel`em_handle_que+0x40 kernel`taskqueue_run_locked+0x154 kernel`taskqueue_thread_loop+0x98 kernel`fork_exit+0x83 kernel`0xffffffff80f585be 1 61730 none:frb_natv4out reson: 0, if: ipl.ko`ipf_check+0x70f kernel`pfil_run_hooks+0xb3 kernel`ip_tryforward+0x1fa kernel`ip_input+0x36a kernel`netisr_dispatch_src+0xa8 kernel`ether_demux+0x140 kernel`ether_nh_input+0x336 kernel`netisr_dispatch_src+0xa8 kernel`ether_input+0x26 kernel`if_input+0xa kernel`em_rxeof+0x2c0 kernel`em_handle_que+0x40 kernel`taskqueue_run_locked+0x154 kernel`taskqueue_thread_loop+0x98 kernel`fork_exit+0x83 kernel`0xffffffff80f585be 1 61730 none:frb_natv4out reson: 0, if: ipl.ko`ipf_check+0x70f kernel`pfil_run_hooks+0xb3 kernel`ip_tryforward+0x1fa kernel`ip_input+0x36a kernel`netisr_dispatch_src+0xa8 kernel`ether_demux+0x140 kernel`ether_nh_input+0x336 kernel`netisr_dispatch_src+0xa8 kernel`ether_input+0x26 kernel`if_input+0xa kernel`em_rxeof+0x2c0 kernel`em_handle_que+0x40 kernel`taskqueue_run_locked+0x154 kernel`taskqueue_thread_loop+0x98 kernel`fork_exit+0x83 kernel`0xffffffff80f585be 1 61730 none:frb_natv4out reson: 0, if: ipl.ko`ipf_check+0x70f kernel`pfil_run_hooks+0xb3 kernel`ip_tryforward+0x1fa kernel`ip_input+0x36a kernel`netisr_dispatch_src+0xa8 kernel`ether_demux+0x140 kernel`ether_nh_input+0x336 kernel`netisr_dispatch_src+0xa8 kernel`ether_input+0x26 kernel`if_input+0xa kernel`em_rxeof+0x2c0 kernel`em_handle_que+0x40 kernel`taskqueue_run_locked+0x154 kernel`taskqueue_thread_loop+0x98 kernel`fork_exit+0x83 kernel`0xffffffff80f585be 1 61730 none:frb_natv4out 
reson: 0, if: ipl.ko`ipf_check+0x70f kernel`pfil_run_hooks+0xb3 kernel`ip_tryforward+0x1fa kernel`ip_input+0x36a kernel`netisr_dispatch_src+0xa8 kernel`ether_demux+0x140 kernel`ether_nh_input+0x336 kernel`netisr_dispatch_src+0xa8 kernel`ether_input+0x26 kernel`if_input+0xa kernel`em_rxeof+0x2c0 kernel`em_handle_que+0x40 kernel`taskqueue_run_locked+0x154 kernel`taskqueue_thread_loop+0x98 kernel`fork_exit+0x83 kernel`0xffffffff80f585be 1 61730 none:frb_natv4out reson: 0, if: ipl.ko`ipf_check+0x70f kernel`pfil_run_hooks+0xb3 kernel`ip_tryforward+0x1fa kernel`ip_input+0x36a kernel`netisr_dispatch_src+0xa8 kernel`ether_demux+0x140 kernel`ether_nh_input+0x336 kernel`netisr_dispatch_src+0xa8 kernel`ether_input+0x26 kernel`if_input+0xa kernel`em_rxeof+0x2c0 kernel`em_handle_que+0x40 kernel`taskqueue_run_locked+0x154 kernel`taskqueue_thread_loop+0x98 kernel`fork_exit+0x83 kernel`0xffffffff80f585be 1 61730 none:frb_natv4out reson: 0, if: ipl.ko`ipf_check+0x70f kernel`pfil_run_hooks+0xb3 kernel`ip_tryforward+0x1fa kernel`ip_input+0x36a kernel`netisr_dispatch_src+0xa8 kernel`ether_demux+0x140 kernel`ether_nh_input+0x336 kernel`netisr_dispatch_src+0xa8 kernel`ether_input+0x26 kernel`if_input+0xa kernel`em_rxeof+0x2c0 kernel`em_handle_que+0x40 kernel`taskqueue_run_locked+0x154 kernel`taskqueue_thread_loop+0x98 kernel`fork_exit+0x83 kernel`0xffffffff80f585be 1 61730 none:frb_natv4out reson: 0, if: ipl.ko`ipf_check+0x70f kernel`pfil_run_hooks+0xb3 kernel`ip_tryforward+0x1fa kernel`ip_input+0x36a kernel`netisr_dispatch_src+0xa8 kernel`ether_demux+0x140 kernel`ether_nh_input+0x336 kernel`netisr_dispatch_src+0xa8 kernel`ether_input+0x26 kernel`if_input+0xa kernel`em_rxeof+0x2c0 kernel`em_handle_que+0x40 kernel`taskqueue_run_locked+0x154 kernel`taskqueue_thread_loop+0x98 kernel`fork_exit+0x83 kernel`0xffffffff80f585be 1 61730 none:frb_natv4out reson: 0, if: ipl.ko`ipf_check+0x70f kernel`pfil_run_hooks+0xb3 kernel`ip_tryforward+0x1fa kernel`ip_input+0x36a 
kernel`netisr_dispatch_src+0xa8 kernel`ether_demux+0x140 kernel`ether_nh_input+0x336 kernel`netisr_dispatch_src+0xa8 kernel`ether_input+0x26 kernel`if_input+0xa kernel`em_rxeof+0x2c0 kernel`em_handle_que+0x40 kernel`taskqueue_run_locked+0x154 kernel`taskqueue_thread_loop+0x98 kernel`fork_exit+0x83 kernel`0xffffffff80f585be
(In reply to Jeremy Cooper from comment #32) I'm betting we need to refine that DTrace command. I can see you were aiming for a reason code and interface name, yet the reason seems to be "0" and the interface name is blank. That can't be right.
(In reply to Jeremy Cooper from comment #33) I've revised the dtrace selector to cast the void * that is fin_ifp into an ifnet *, and then fetch the if_xname from that. We get something a tiny bit better:

CPU     ID                    FUNCTION:NAME
  3  61736             none:frb_natv4in reson: 0, if: em0
              ipl.ko`ipf_check+0x1fe
              kernel`pfil_run_hooks+0xb3
              kernel`ip_input+0x3eb
              kernel`netisr_dispatch_src+0xa8
              kernel`ether_demux+0x140
              kernel`ether_nh_input+0x336
              kernel`netisr_dispatch_src+0xa8
              kernel`ether_input+0x26
              kernel`if_input+0xa
              kernel`em_rxeof+0x2c0
              kernel`em_handle_que+0x40
              kernel`taskqueue_run_locked+0x154
              kernel`taskqueue_thread_loop+0x98
              kernel`fork_exit+0x83
              kernel`0xffffffff80f585be
  1  61730            none:frb_natv4out reson: 0, if: em0
              ipl.ko`ipf_check+0x70f
              kernel`pfil_run_hooks+0xb3
              kernel`ip_tryforward+0x1fa
              kernel`ip_input+0x36a
              kernel`netisr_dispatch_src+0xa8
              kernel`ether_demux+0x140
              kernel`ether_nh_input+0x336
              kernel`netisr_dispatch_src+0xa8
              kernel`ether_input+0x26
              kernel`if_input+0xa
              kernel`em_rxeof+0x2c0
              kernel`em_handle_que+0x40
              kernel`taskqueue_run_locked+0x154
              kernel`taskqueue_thread_loop+0x98
              kernel`fork_exit+0x83
              kernel`0xffffffff80f585be

[The frb_natv4out event and stack above repeated many more times; duplicates omitted.]
Oh, the improved dtrace fragment:

dtrace -n 'sdt:::frb_natv4in {
        printf("reson: %d, if: %s\n", args[0]->fin_reason,
            ((struct ifnet *)(args[0]->fin_ifp))->if_xname);
        stack();
    }' \
    -n 'sdt:::frb_natv4out {
        printf("reson: %d, if: %s\n", args[0]->fin_reason,
            ((struct ifnet *)(args[0]->fin_ifp))->if_xname);
        stack();
    }'
Created attachment 195977 [details] Provide additional dtrace output

Thanks for getting that straight. (Very late day + a day off doing renos = couldn't wait to shower and get to bed.) Please apply this patch to tell us where NAT is failing. Currently we have a fail flag but no indication of the cause. This patch sets rval to the point in the code where the failure occurs, which should help track the issue down. After applying the patch, change the dtrace script to:

dtrace -n 'sdt:::frb_natv4in {
        printf("if: %s, rval: %d\n",
            ((struct ifnet *)(args[0]->fin_ifp))->if_xname, args[1]);
        stack();
    }' \
    -n 'sdt:::frb_natv4out {
        printf("if: %s, rval: %d\n",
            ((struct ifnet *)(args[0]->fin_ifp))->if_xname, args[1]);
        stack();
    }'

From here we can determine which direction to sleuth next.
I'm asking all the participants for help. I'm the one who started this topic, but I could not see it through to the end; I was not a great administrator back then. Please help create a fix.
(In reply to Cy Schubert from comment #36) I've rebuilt ipl.ko with the supplied patches, reloaded it, re-applied my filter and nat rules and am running DTrace now. As the bug states, it may take a couple of days before we see anything; haven't reproduced the problem yet.
The bug should show up anytime, not just after two days.
(In reply to Cy Schubert from comment #39) It's been almost two hours, no failures yet.
Well, speak of the devil: first failure just now.

dtrace: description 'sdt:::frb_natv4in ' matched 1 probe
dtrace: description 'sdt:::frb_natv4out ' matched 1 probe
CPU     ID                    FUNCTION:NAME
  1  61730            none:frb_natv4out if: em0, rval: -2
              ipl.ko`ipf_check+0x70f
              kernel`pfil_run_hooks+0xb3
              kernel`ip_tryforward+0x1fa
              kernel`ip_input+0x36a
              kernel`netisr_dispatch_src+0xa8
              kernel`ether_demux+0x140
              kernel`ether_nh_input+0x336
              kernel`netisr_dispatch_src+0xa8
              kernel`ether_input+0x26
              kernel`if_input+0xa
              kernel`em_rxeof+0x2c0
              kernel`em_handle_que+0x40
              kernel`taskqueue_run_locked+0x154
              kernel`taskqueue_thread_loop+0x98
              kernel`fork_exit+0x83
              kernel`0xffffffff80f585be
(In reply to Jeremy Cooper from comment #41) Up to ten failures now. All in natv4out, all rval: -2.
(In reply to Jeremy Cooper from comment #42) Looking at the source, rval -2 means that ipf_add_nat() returned NULL. Looking at ipf_add_nat(), one way it can return NULL is an overflow of the table:

    if (nsp->ns_active >= softn->ipf_nat_table_max) {
            NBUMPSIDED(fin->fin_out, ns_table_max);
            return NULL;
    }

But this clearly hasn't happened, because that case would have increased the "ns_table_max" statistic, which doesn't appear to have changed:

    % sudo ipnat -sl | grep table
    0 table max reached in
    0 table max reached out

So I'll keep reading.
(In reply to Jeremy Cooper from comment #43) For completeness, the current ipnat statistics:

0 proxy create fail in
0 proxy fail in
0 bad nat in
0 bad nat new in
0 bad next addr in
8 bucket max in
0 clone nomem in
0 decap bad in
0 decap fail in
0 decap pullup in
0 divert dup in
0 divert exist in
0 drop in
0 exhausted in
0 icmp address in
0 icmp basic in
59 inuse in
0 icmp mbuf wrong size in
52 icmp header unmatched in
0 icmp rebuild failures in
0 icmp short in
0 icmp packet size wrong in
0 IFP address fetch failures in
825768 packets untranslated in
0 NAT insert failures in
736047 NAT lookup misses in
92424 NAT lookup nowild in
0 new ifpaddr failed in
0 memory requests failed in
0 table max reached in
149082 packets translated in
0 finalised failed in
0 search wraps in
0 null translations in
0 translation exists in
0 no memory in
107% hash efficiency in
2.93% bucket usage in
0 minimal length in
-1 maximal length in
0.933 average length in
0 proxy create fail out
0 proxy fail out
10 bad nat out
10 bad nat new out
0 bad next addr out
2 bucket max out
0 clone nomem out
0 decap bad out
0 decap fail out
0 decap pullup out
0 divert dup out
0 divert exist out
10 drop out
0 exhausted out
0 icmp address out
0 icmp basic out
56 inuse out
0 icmp mbuf wrong size out
88 icmp header unmatched out
0 icmp rebuild failures out
0 icmp short out
0 icmp packet size wrong out
0 IFP address fetch failures out
684969 packets untranslated out
0 NAT insert failures out
531961 NAT lookup misses out
158391 NAT lookup nowild out
0 new ifpaddr failed out
0 memory requests failed out
0 table max reached out
82747 packets translated out
10 finalised failed out
0 search wraps out
0 null translations out
0 translation exists out
0 no memory out
103% hash efficiency out
2.83% bucket usage out
0 minimal length out
-1 maximal length out
0.966 average length out
0 log successes
0 log failures
2 added in
2637 added out
56 active
0 transparent adds
0 divert build
2583 expired
0 flush all
0 flush closing
0 flush queue
0 flush state
0 flush timeout
1139 hostmap new
0 hostmap fails
1508 hostmap add
0 hostmap NULL rule
0 log ok
0 log fail
0 orphan count
13 rule count
1 map rules
12 rdr rules
0 wilds
(In reply to Jeremy Cooper from comment #44) We're hitting 'ns_badnatnew':

10 bad nat out
10 bad nat new out
ipf_badnat() can return NULL when the table is exhausted. Can you list:

ipf -T list | grep nat
sysctl net.inet.ipf | grep nat

I'll take a look at this later tonight.
A commit references this bug:

Author: cy
Date: Fri Aug 10 00:04:33 UTC 2018
New revision: 337558
URL: https://svnweb.freebsd.org/changeset/base/337558

Log:
  Identify the return value (rval) that led to the IPv4 NAT failure in
  ipf_nat_checkout() and report it in the frb_natv4out and frb_natv4in
  dtrace probes. This is currently being used to diagnose NAT failures
  in PR/208566. It's rather handy so this commit makes it available for
  future diagnosis and debugging efforts.

  PR:           208566
  MFC after:    1 week

Changes:
  head/sys/contrib/ipfilter/netinet/ip_nat.c
Cy, now that I know generally where to look, I'm going to add some more DTrace probes of my own. I'm out of town for a few days, but I'll be back on Sunday and will start things back up again then. I'll share my findings and patches here.
I am able to reproduce some kind of NAT failure, not necessarily after two days but randomly, and only when I run a series of svnsync commands to svn.freebsd.org on two machines within my network while also running a series of svnsync commands on the gateway itself. I'm not sure if what I'm seeing is related to this PR or if it's a totally different problem. Most of the failures I'm seeing are ns_badnatnew. Unfortunately the badnat label is reached from multiple sources. Sleuthing continues. Rather than be inundated with a lot of information, for starters can you run the following dtrace script for me, please? I'd like to compare with what I'm seeing here.

dtrace -n 'sdt:::frb_natv4in' \
    -n 'sdt:::frb_natv4out' \
    -n 'sdt:::ns_badnatnew' \
    -n 'sdt:::ns_memfail' \
    -n 'sdt:::ns_table_max'
A commit references this bug:

Author: cy
Date: Fri Aug 17 02:46:36 UTC 2018
New revision: 337948
URL: https://svnweb.freebsd.org/changeset/base/337948

Log:
  MFC r337558, r337560

  r337558:
  Identify the return value (rval) that led to the IPv4 NAT failure in
  ipf_nat_checkout() and report it in the frb_natv4out and frb_natv4in
  dtrace probes. This is currently being used to diagnose NAT failures
  in PR/208566. It's rather handy so this commit makes it available for
  future diagnosis and debugging efforts.

  PR: 208566

  r337560:
  Correct a comment. Should have been detected by ipf_nat_in() not
  ipf_nat_out().

Changes:
  _U stable/10/
  stable/10/sys/contrib/ipfilter/netinet/ip_nat.c
  _U stable/11/
  stable/11/sys/contrib/ipfilter/netinet/ip_nat.c
Created attachment 196269 [details] After updating to r337558, apply this and use the dtrace script

Please update to r337558 (or apply this in addition to the previous patch) and use this dtrace script:

dtrace -n 'sdt:::frb_natv4in' \
    -n 'sdt:::frb_natv4out' \
    -n 'sdt:::ns_memfail' \
    -n 'sdt:::ns_table_max' \
    -n 'sdt:::ns_badnatnew {
        printf("if: %s, pf_nat_newdivert(): %s, ipf_nat_newrewrite(): %s, direction: %s, ipf_proxy_new(): %s\n",
            ((struct ifnet *)(args[0]->fin_ifp))->if_xname,
            ((args[1]->nat_redir & 0x20) == 0) ? "no" : "yes",
            ((args[1]->nat_redir & 0x08) == 0) ? "no" : "yes",
            (args[1]->nat_dir == 1) ? "ipf_nat_newmap()" : "ipf_nat_newrdr()",
            ((args[2]->in_apr != 0) && ((args[1]->nat_flags & 0x0020) == 0)) ? "failed" : "ok");
    }' \
    -n 'sdt:::ns_exhausted_1' \
    -n 'sdt:::ns_exhausted_2' \
    -n 'sdt:::ns_exhausted_3' \
    -n 'sdt:::ns_new_ifpaddr_1' \
    -n 'sdt:::ns_wrap' \
    -n 'sdt:::ns_appr_fail' \
    -n 'sdt:::ns_unfinalised'

I'm not looking at data yet but determining the execution path. I suspect either ns_appr_fail or ns_unfinalised.
I want to take a look at ipf_nat_hashtab() too. Use this script instead:

dtrace -n 'sdt:::frb_natv4in' \
    -n 'sdt:::frb_natv4out' \
    -n 'sdt:::ns_memfail' \
    -n 'sdt:::ns_table_max' \
    -n 'sdt:::ns_badnatnew {
        printf("if: %s, pf_nat_newdivert(): %s, ipf_nat_newrewrite(): %s, direction: %s, ipf_proxy_new(): %s\n",
            ((struct ifnet *)(args[0]->fin_ifp))->if_xname,
            ((args[1]->nat_redir & 0x20) == 0) ? "no" : "yes",
            ((args[1]->nat_redir & 0x08) == 0) ? "no" : "yes",
            (args[1]->nat_dir == 1) ? "ipf_nat_newmap()" : "ipf_nat_newrdr()",
            ((args[2]->in_apr != 0) && ((args[1]->nat_flags & 0x0020) == 0)) ? "failed" : "ok");
    }' \
    -n 'sdt:::ns_exhausted_1' \
    -n 'sdt:::ns_exhausted_2' \
    -n 'sdt:::ns_exhausted_3' \
    -n 'sdt:::ns_new_ifpaddr_1' \
    -n 'sdt:::ns_wrap' \
    -n 'sdt:::ns_appr_fail' \
    -n 'sdt:::ns_unfinalised' \
    -n 'sdt:::ns_bucket_max_0' \
    -n 'sdt:::ns_bucket_max_1'
Can you provide output from:

ipf -T nat_maxbucket
Created attachment 196320 [details] DTRACE output on a system with failing RDR's
I also have this problem, since FreeBSD 9.x or so. For me it is always related to RDR; I never experienced it with NAT rules. Problems start after a few hours or days (probably depending on the number of connections, but that is only a guess). The only particular thing I noticed: the maximal length in the output of ipnat -s is zero when I start, but becomes -1 after some time. I do not know if this has something to do with it.

# ipnat -s | grep length
0 minimal length in
-1 maximal length in
0.005 average length in
0 minimal length out
-1 maximal length out
0.005 average length out

I have attached a file with the requested dtrace debug output from when the problems started appearing (failing connections).

# ipf -T nat_maxbucket
nat_maxbucket   min 1   max 2147483647  current 22

Hopefully the debug output is useful for you.
(In reply to frank from comment #55) 9.X had ipf 4.1.28. If it had problems they're coincidental because starting in 10.0 ipf was updated to 5.1.2.
Created attachment 196324 [details] Patch to resolve this PR.

Please apply this patch. It replaces 196269, as it includes a fix plus the additional DTrace probes. My plan is to split the patch into two or three commits: fix the problem, extend a DTrace probe, and add new DTrace probes for future debugging. It appears that the bucket count was decremented below zero in an unsigned int, making it a very large number. This affected adding entries to the hash table.
(In reply to frank from comment #55) Also, this affected not just rdr but also map rules. I'm able to reproduce this with map rules by simultaneously doing an svnsync to svn.freebsd.org from two machines in my network and on the gateway itself. It's hit & miss and is not 100% reproducible every time. Using a browser I can occasionally reproduce this by hitting some websites and not others. Using Government of Canada websites tends to more easily reproduce this problem than others.
Before you apply the latest patch, this DTrace script clearly illustrates the problem:

dtrace -n 'sdt:::frb_natv4in' \
    -n 'sdt:::frb_natv4out' \
    -n 'sdt:::ns_memfail' \
    -n 'sdt:::ns_table_max' \
    -n 'sdt:::ns_badnatnew {
        printf("if: %s, pf_nat_newdivert(): %s, ipf_nat_newrewrite(): %s, direction: %s, ipf_proxy_new(): %s\n",
            ((struct ifnet *)(args[0]->fin_ifp))->if_xname,
            ((args[1]->nat_redir & 0x20) == 0) ? "no" : "yes",
            ((args[1]->nat_redir & 0x08) == 0) ? "no" : "yes",
            (args[1]->nat_dir == 1) ? "ipf_nat_newmap()" : "ipf_nat_newrdr()",
            ((args[2]->in_apr != 0) && ((args[1]->nat_flags & 0x0020) == 0)) ? "failed" : "ok");
    }' \
    -n 'sdt:::ns_exhausted_1' \
    -n 'sdt:::ns_exhausted_2' \
    -n 'sdt:::ns_exhausted_3' \
    -n 'sdt:::ns_new_ifpaddr_1' \
    -n 'sdt:::ns_wrap' \
    -n 'sdt:::ns_appr_fail' \
    -n 'sdt:::ns_unfinalised' \
    -n 'sdt:::ns_bucket_max_0 { printf("%d\n", args[0]); }' \
    -n 'sdt:::ns_bucket_max_1 { printf("%d\n", args[0]); }'
(In reply to Cy Schubert from comment #56) O.k., maybe I was wrong there. I remember having this problem for a long time, but maybe 9.x is an exaggeration :-)
(In reply to frank from comment #60) I think the bug started with FreeBSD 10 when ipfilter went from version 4 to 5. If there is a fix, I'll revert one of my nat gateways from pf to ipf to test it out.
(In reply to ddrinnon from comment #61) Apply the latest patch in this PR. In addition to the fix it includes additional DTrace probes that will be used by the DTrace scripts posted here.
I will update from 11.1-STABLE (FreeBSD 11.1-STABLE #0 r332428):

# freebsd-update fetch
Looking up update.FreeBSD.org mirrors... 2 mirrors found.
Fetching public key from update4.freebsd.org... failed.
Fetching public key from update5.freebsd.org... failed.
No mirrors remaining, giving up.
FreeBSD-11.2-STABLE-amd64-20180816-r337910-disc1.iso 2018-08-17 04:23 676M
(In reply to Cy Schubert from comment #57) I have been running the patch for 12 hours now and have not seen any problem (however, load is lower on the weekend). Output of the dtrace from comment #59 is also empty. I have a small nagging doubt whether this is the actual fix or whether it only hides the symptom, but I'm not very familiar with the IP Filter version 5 source code, so likely I'm wrong. ipf_nat_delete() is called to delete a NAT entry, and the hash value is calculated for that entry. With this patch, if the hash value refers to an empty list, the counter for that list is no longer decremented below zero. My doubt is: if I delete a NAT entry, should it not always be present in the list of NAT entries corresponding to its hash value?
(In reply to Aleks from comment #63) The patch hasn't even been committed to -CURRENT yet. It won't be to -STABLE for at least three days after that.
(In reply to frank from comment #65) The bucket index is subtracted by one at lines 2304 and 2314. When 0 it becomes -1, except these are unsigned integers, so they become very large numbers: 2^32-1 on i386 and 2^64-1 on amd64. They are thus always larger than the maximum bucket, so the hash table insertion fails, causing NAT to fail.
A commit references this bug:

Author: cy
Date: Sun Aug 19 13:44:56 UTC 2018
New revision: 338045
URL: https://svnweb.freebsd.org/changeset/base/338045

Log:
  Expose np (nat_t - an entry in the nat table structure) in the DTrace
  probe when nat fails (label badnat). This is useful in diagnosing
  failed NAT issues and was used in PR/208566.

  PR:           208566
  MFC after:    3 days

Changes:
  head/sys/contrib/ipfilter/netinet/ip_nat.c
A commit references this bug:

Author: cy
Date: Sun Aug 19 13:45:00 UTC 2018
New revision: 338046
URL: https://svnweb.freebsd.org/changeset/base/338046

Log:
  Add handy DTrace probes useful in diagnosing NAT issues. DTrace probes
  are situated next to error counters and/or in one instance prior to the
  -1 return from various functions. This was useful in diagnosis of
  PR/208566 and will be handy in the future diagnosing NAT failures.

  PR:           208566
  MFC after:    3 days

Changes:
  head/sys/contrib/ipfilter/netinet/ip_nat.c
A commit references this bug:

Author: cy
Date: Sun Aug 19 13:45:03 UTC 2018
New revision: 338047
URL: https://svnweb.freebsd.org/changeset/base/338047

Log:
  The bucket index is subtracted by one at lines 2304 and 2314. When 0 it
  becomes -1, except these are unsigned integers, so they become very
  large numbers. Thus are always larger than the maximum bucket; the hash
  table insertion fails causing NAT to fail.

  This commit ensures that if the index is already zero it is not reduced
  prior to insertion into the hash table.

  PR:           208566

Changes:
  head/sys/contrib/ipfilter/netinet/ip_nat.c
Tonight I received a few more outputs from dtrace, see below. I have not noticed any connection failures, so everything appears to be working fine.

dtrace: description 'sdt:::frb_natv4in' matched 1 probe
dtrace: description 'sdt:::frb_natv4out' matched 1 probe
dtrace: description 'sdt:::ns_memfail' matched 1 probe
dtrace: description 'sdt:::ns_table_max' matched 1 probe
dtrace: description 'sdt:::ns_badnatnew ' matched 1 probe
dtrace: description 'sdt:::ns_exhausted_1' matched 1 probe
dtrace: description 'sdt:::ns_exhausted_2' matched 1 probe
dtrace: description 'sdt:::ns_exhausted_3' matched 1 probe
dtrace: description 'sdt:::ns_new_ifpaddr_1' matched 1 probe
dtrace: description 'sdt:::ns_wrap' matched 1 probe
dtrace: description 'sdt:::ns_appr_fail' matched 1 probe
dtrace: description 'sdt:::ns_unfinalised' matched 1 probe
dtrace: description 'sdt:::ns_bucket_max_0 ' matched 1 probe
dtrace: description 'sdt:::ns_bucket_max_1 ' matched 1 probe
CPU     ID                    FUNCTION:NAME
  1  29624           none:ns_bucket_max_0 22
  1  29649            none:ns_unfinalised
  1  29623             none:ns_badnatnew if: vtnet0, pf_nat_newdivert(): no, ipf_nat_newrewrite(): no, direction: ipf_nat_newrdr(), ipf_proxy_new(): ok
  1  29638             none:frb_natv4in

[The group of four events above repeated many more times; duplicates omitted.]
(In reply to Cy Schubert from comment #62) I patched one of my gateways running 11.2-RELEASE-p2. I'll stream video through it and see what happens. When this bug came up and 9.3 went EOL, I had no choice but to stop using ipf, and I switched to pf. I probably can't go back to ipf permanently, but I will help test this patch.
(In reply to frank from comment #71) The DTrace output you're seeing now is a different issue. It is actually hitting maxbuckets, a tuning issue. Try this:

ipf -T nat_maxbucket=44

This increases it from 22 to 44. Or, increase it to some other number.
(In reply to ddrinnon from comment #72) stable/9 is EOL; however, I can check out that branch and apply the fix there if it needs it. Unfortunately I do not have any FreeBSD 9 machines here, so I will not be able to test it.
(In reply to ddrinnon from comment #72) There is a problem when streaming: after about a week the RTMP sound lags behind. Maybe this cannot be tied to this bug. I am using ffmpeg.
(In reply to Aleks from comment #75) 1. The patch was posted not even 3 days ago. 3 days != 1 week. Please explain. 2. What ports does your application use? What protocol (UDP/TCP/UDP-lite...)?
FreeBSD 11.0-STABLE #0

/usr/local/bin/ffmpeg -cpuflags -sse4.2+avx+sse3+sse2+sse+mmx -i "$SOURCE" \
    -deinterlace -vcodec libx264 -pix_fmt nv12 -aspect 16:9 -preset $QUAL \
    -r $FPS -g $(($FPS * 2)) -b:v $VBR -acodec libfdk_aac -ar 44100 \
    -af "volume=10dB" -threads 6 -qscale 3 -b:a 529000 -bufsize 512k \
    -metadata title="www.111.com" -metadata streamName="www.111.com" \
    -s 1024x576 -f flv "$YOUTUBE_URL"
VBR="2500k"       # output video bitrate
FPS="25"          # output video FPS
QUAL="medium"     # FFMPEG quality preset
TIMES="00:00:00"  # TIME START
TINEK="23:59:58"  # TIME STOP
SOURCE="udp://225.17.1.2:4000?fifo_size=1000000&overrun_nonfatal=1"
I registered this problem with Dan on 2016-04-06 07:24:33 UTC, but I had to switch to ipfw.
Aleks, The youtube video doesn't tickle the bug. Whatever is going on there is not documented by this bug. You should open a new PR for this.
(In reply to Aleks from comment #79) Also, when you do open the sister ticket to this one, please include output from the DTrace script, plus ipnat -lv and ipfstat -sl output. I'm not sure whether the new issue is a nat or a state issue. I suspect (from experience) incorrect rules that create a new state table entry for each new packet that arrives. I'll need to see your ipf.conf and ipnat.conf.
A commit references this bug: Author: cy Date: Wed Aug 22 01:04:52 UTC 2018 New revision: 338169 URL: https://svnweb.freebsd.org/changeset/base/338169 Log: MFC r338045: Expose np (nat_t - an entry in the nat table structure) in the DTrace probe when nat fails (label badnat). This is useful in diagnosing failed NAT issues and was used in PR/208566. PR: 208566 Changes: _U stable/10/ stable/10/sys/contrib/ipfilter/netinet/ip_nat.c _U stable/11/ stable/11/sys/contrib/ipfilter/netinet/ip_nat.c
A commit references this bug: Author: cy Date: Wed Aug 22 01:23:12 UTC 2018 New revision: 338170 URL: https://svnweb.freebsd.org/changeset/base/338170 Log: MFC r338046: Add handy DTrace probes useful in diagnosing NAT issues. DTrace probes are situated next to error counters and/or in one instance prior to the -1 return from various functions. This was useful in diagnosis of PR/208566 and will be handy in the future diagnosing NAT failures. PR: 208566 Changes: _U stable/10/ stable/10/sys/contrib/ipfilter/netinet/ip_nat.c _U stable/11/ stable/11/sys/contrib/ipfilter/netinet/ip_nat.c
A commit references this bug: Author: cy Date: Wed Aug 22 01:43:12 UTC 2018 New revision: 338171 URL: https://svnweb.freebsd.org/changeset/base/338171 Log: MFC r338047: The bucket index is subtracted by one at lines 2304 and 2314. When 0 it becomes -1, except these are unsigned integers, so they become very large numbers. Thus are always larger than the maximum bucket; the hash table insertion fails causing NAT to fail. This commit ensures that if the index is already zero it is not reduced prior to insertion into the hash table. PR: 208566 Changes: _U stable/10/ stable/10/sys/contrib/ipfilter/netinet/ip_nat.c _U stable/11/ stable/11/sys/contrib/ipfilter/netinet/ip_nat.c
(In reply to frank from comment #71) Can you let me know what maxbucket value you used?
(In reply to Cy Schubert from comment #85) I now use this:

nat_table_size  min 1  max 2147483647  current 2053
nat_maxbucket   min 1  max 2147483647  current 1000

But I still need to figure out what exactly the maxbucket limit is. Unfortunately, work does not allow me to spend any time on it.
(In reply to frank from comment #86) A bucket is a slot in the hash table, selected by the hash function. ipf -T nat_maxbucket lists your maximum number of buckets. There are two hash tables: one for input and the other for output. The dtrace probes ns_bucket_max_0 and ns_bucket_max_1 will display (arg[0]) the input and output bucket values when the maximum number of buckets is reached. Consider these dtrace probes, when activated, similar to a printf statement in the kernel. I'm not sure if this is the question you need an answer to. If it is not, please rephrase.
This PR has been fixed.
(In reply to Cy Schubert from comment #87) The thing I can't get my head around is the default values. Why is the default calculated max bucket size 22, while there are 2047 possible hash values?
(In reply to frank from comment #89) It counts the number of bits needed to represent the NAT table size, then multiplies by two.

if (softn->ipf_nat_maxbucket == 0) {
        for (i = softn->ipf_nat_table_sz; i > 0; i >>= 1)
                softn->ipf_nat_maxbucket++;
        softn->ipf_nat_maxbucket *= 2;
}

I'm not sure how Darren arrived at this. 2047 is 0x7ff; 11 bits * 2 is 22. You can try rebuilding your kernel or ipl.ko with LARGE_NAT defined in ip_nat.h. However, this alters a number of tables and changes the way NAT aging works. I've been toying with the idea of making this dynamic; however, these values are set by ipf_nat_soft_create() when ipfilter is initialized, so they would need to be set prior to the ipfilter module loading or boot. This would require a kenv (static and module) and a sysctl (for the case where ipfilter is unloaded and reloaded). Documenting this would probably be quite confusing for the end user.
(In reply to Cy Schubert from comment #90) But shouldn't maxbucket simply be the same as the hash size, since that is the number of possible buckets (i.e. 2047)? Anyway, 22 is ridiculously low. It must be very easy to create 23 TCP sessions that each map to a different bucket, and then you have a kind of DoS attack.
(In reply to frank from comment #91) Your NAT table is 2047 entries. The number of buckets in the hash table is 22 for input and another 22 for output. 22 is fine for most applications. If you have a very large network then add LARGE_NAT to ip_compat.h and rebuild your kernel or module. Be aware that this will use a lot more kernel memory. It also changes how entries age and are expired. Second year Comp Sci.
what to add where exactly?
Your kernel config. Update GENERIC, or better yet, create your own config and include GENERIC. Then: make KERNCONF=YOUR_KERNEL buildkernel installkernel. Beware though, it will use more kernel memory.
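A minimal sketch of what such a custom config looks like, assuming a hypothetical name MYKERNEL and the stock amd64 paths:

```
# /usr/src/sys/amd64/conf/MYKERNEL (hypothetical)
include GENERIC
ident   MYKERNEL
```

Build and install with: cd /usr/src && make KERNCONF=MYKERNEL buildkernel installkernel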
/sbin/ipf -T nat_maxbucket=2047
It's different from that. LARGE_NAT generates different code.
Add LARGE_NAT to ip_compat.h?
edit ip_nat.h change #undef LARGE_NAT to #define LARGE_NAT
Like this?

//#undef LARGE_NAT
#define LARGE_NAT
Yes, that is one way to do it and probably what Darren thought of when he wrote ipfilter. This ticket has been resolved. PRs are for breakage, generally. If you have usage questions you should discuss them on the mailing lists or forums. I'll even answer emails sent to me personally but polluting a ticket with other chatter will confuse others who review these tickets.
On 12.1, the problem is even worse. This is how I solved it:

/sbin/ipf -T nat_maxbucket=4095

ip_nat.h:

# ifdef LARGE_NAT
//# define      NAT_SIZE        2047
#  define       NAT_SIZE        4095
# else
//# define      NAT_SIZE        127
#  define       NAT_SIZE        4095
# endif
#endif
#ifndef RDR_SIZE
# ifdef LARGE_NAT
//# define      RDR_SIZE        2047
#  define       RDR_SIZE        4095
# else
//# define      RDR_SIZE        127
#  define       RDR_SIZE        4095
# endif
#endif
#ifndef HOSTMAP_SIZE
# ifdef LARGE_NAT
//# define      HOSTMAP_SIZE    8191
#  define       HOSTMAP_SIZE    16383
# else
//# define      HOSTMAP_SIZE    2047
#  define       HOSTMAP_SIZE    16383
# endif
#endif
#ifndef NAT_TABLE_MAX
/*
 * This is newly introduced and for the sake of "least surprise", the numbers
 * present aren't what we'd normally use for creating a proper hash table.
 */
# ifdef LARGE_NAT
//# define      NAT_TABLE_MAX   18000
#  define       NAT_TABLE_MAX   1000000
# else
//# define      NAT_TABLE_MAX   30000
#  define       NAT_TABLE_MAX   1000000
# endif
#endif
#ifndef NAT_TABLE_SZ
# ifdef LARGE_NAT
//# define      NAT_TABLE_SZ    16383
#  define       NAT_TABLE_SZ    32767
# else
//# define      NAT_TABLE_SZ    2047
#  define       NAT_TABLE_SZ    32767
# endif

cd /usr/src/sys/amd64/compile/test
make cleandepend && make depend
make depend && make && make install