Bug 222126

Summary: pf is not clearing expired states
Product: Base System
Reporter: noah.bergbauer
Component: kern
Assignee: freebsd-arm (Nobody) <freebsd-arm>
Status: Closed FIXED
Severity: Affects Some People
CC: adamw, bjornr, farrokhi, freebsd.ports, hlh, hselasky, jmg, kp, luke.hamburg, mail, maximos, mm, prox
Priority: ---
Version: 11.1-RELEASE
Hardware: amd64
OS: Any
See Also: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229644
Attachments:
  run a bunch of threads, and detect if sleep fails to wakeup (flags: none)

Description noah.bergbauer 2017-09-07 17:32:35 UTC
Ever since I updated this server from 10.3-RELEASE to 11.1-RELEASE a few weeks ago, it sometimes just stops accepting connections (existing connections are fine). The kernel complains about too many firewall states:

[zone: pf states] PF states limit reached

A quick look at those states with pfctl reveals tens of thousands of old, dead connections that should be long gone - for example, FIN_WAIT_2 states with an age of three hours. The pfctl output says "expires in 00:00:00" for all of these connections, so pf clearly agrees that they're dead but doesn't delete them for some reason.

When I first diagnosed this problem, adding "set timeout interval 1" to the pf configuration immediately cleared out the old states and the server was up and running again. However, this did not permanently fix the issue. The server keeps going down regularly and I have to manually flush the pf states to get it back online.
Comment 1 Kristof Provost freebsd_committer freebsd_triage 2017-09-07 18:43:07 UTC
I don't think I've seen that before.

Do you have a non-default kernel config? (I'm especially interested in knowing if you're running VIMAGE, but any non-default settings are interesting to know about)

Can you check if the pf purge thread is running? ('[pf purge]')

Please also give this dtrace script a quick try:

#!/usr/sbin/dtrace -s

fbt:pf:pf_purge_expired_states:entry
{

}

fbt:pf:pf_purge_expired_states:return
{

}

fbt:pf:pf_purge_unlinked_rules:entry
{

}

fbt:pf:pf_purge_unlinked_rules:return
{

}

fbt:pf:pf_purge_expired_fragments:entry
{

}

fbt:pf:pf_purge_expired_src_nodes:entry
{

}

fbt:pf:pf_purge_expired_src_nodes:return
{

}

fbt:pf:pf_purge_thread:entry
{

}

It should show us if the purge code is running or not.
Comment 2 noah.bergbauer 2017-09-07 19:00:50 UTC
The purge thread is there:

root        16    0,0  0,0       0     16  -  DL   10:57      0:00,04 - [pf purge]


The dtrace script fails:

dtrace: failed to compile script ./pf.dtrace: line 3: probe description fbt:pf:pf_purge_expired_states:entry does not match any probes


The kernel configuration is mostly just GENERIC with plenty of unneeded drivers removed, but VIMAGE is enabled and "device pf" as well.
Comment 3 Kristof Provost freebsd_committer freebsd_triage 2017-09-07 19:11:24 UTC
(In reply to noah.bergbauer from comment #2)
The function may have wound up being inlined in your kernel config, so we can't instrument it.
Just remove whatever functions dtrace can't find, we'll get something out of what's left over.

I'm building an equivalent config to try to reproduce, but that will take a bit of time.
Comment 4 noah.bergbauer 2017-09-07 19:27:14 UTC
I just realized that because pf is compiled into the kernel, the dtrace probes need to be fbt:kernel:* instead of fbt:pf:*. With that, it starts up:

dtrace: script './pf.dtrace' matched 8 probes

And then just sits there with no output.

As states were filling up again, I removed the "set timeout interval 1" from the config and re-added it. This again cleared out the dead states and dtrace started printing things for a few seconds, but now it's silent again.
Comment 5 Kristof Provost freebsd_committer freebsd_triage 2017-09-07 19:52:29 UTC
Okay, that's an interesting clue.
Let's see if we can work out where it's getting stuck. Can you do 'sudo procstat -kk <pid of pf purge>'?
Comment 6 noah.bergbauer 2017-09-07 19:58:49 UTC
# procstat -kk 16
   16 100102 pf purge            -                   mi_switch+0xe5 sleepq_timedwait+0x42 _sleep+0x235 pf_purge_thread+0x8e fork_exit+0x85 fork_trampoline+0xe
Comment 7 Kristof Provost freebsd_committer freebsd_triage 2017-09-07 20:32:36 UTC
Hmm, that looks entirely normal. Your purge thread is sleeping in the rw_sleep() call. It should be waking up every 10 seconds or so. It's either not waking up, or it's not actually doing anything when it wakes up.
I have no idea how either one could happen.
Comment 8 noah.bergbauer 2017-09-07 21:35:03 UTC
The workaround I'm using now is:

echo "set timeout interval 2" | pfctl -mf - && echo "set timeout interval 1" | pfctl -mf -

Every hour in crontab.

It causes the purge thread to wake up (https://github.com/freebsd/freebsd/blob/e0bfaddbc2f6dac2c494ca1fa0566cd4e2ee0659/sys/netpfil/pf/pf_ioctl.c#L1924) and from there it works fine for a while (very random, last time it was just a few seconds but now it's 20 minutes and still going).


So this means that it's apparently just not waking up for some reason.
Comment 9 Kristof Provost freebsd_committer freebsd_triage 2017-09-08 09:25:16 UTC
That's quite odd. It's almost as if rw_sleep() isn't doing what it's supposed to do.
That seems unlikely, but not impossible.
Is the output of 'sysctl kern.hz' sane?

If you can experiment on the box it might be worth testing to see if r320848 (PR 219251) helps. I can't quite convince myself that it would produce these symptoms, but it can't really hurt to try.
Comment 10 noah.bergbauer 2017-09-08 10:47:06 UTC
# sysctl kern.hz
kern.hz: 1000

I can't really experiment on the machine, but from what I can see, r320848 shouldn't change anything anyway because VNET isn't in use right now.
Comment 11 Max 2017-09-11 04:41:59 UTC
(In reply to noah.bergbauer from comment #10)
Are there any limits in your ruleset? And what does the "pfctl -vsi" show?
Sounds familiar... just like bug #217997.
Comment 12 noah.bergbauer 2017-09-11 10:42:39 UTC
set limit { states 100000, src-nodes 10000 }

One of my first attempts to fix this was increasing both limits 10x - it didn't help though.


# pfctl -vsi
No ALTQ support in kernel
ALTQ related functions disabled
Status: Enabled for 1 days 14:44:53           Debug: Urgent

Hostid:   0x4b1e78c2
Checksum: 0x67f2a9cbd7b0d65ce52864ecfc156ebb

State Table                          Total             Rate
  current entries                     3839               
  searches                       360179452         2582.1/s
  inserts                           594949            4.3/s
  removals                          591110            4.2/s
Source Tracking Table
  current entries                        0               
  searches                               0            0.0/s
  inserts                                0            0.0/s
  removals                               0            0.0/s
Counters
  match                             689782            4.9/s
  bad-offset                             0            0.0/s
  fragment                              16            0.0/s
  short                                  0            0.0/s
  normalize                              0            0.0/s
  memory                                 0            0.0/s
  bad-timestamp                          0            0.0/s
  congestion                             0            0.0/s
  ip-option                              0            0.0/s
  proto-cksum                          450            0.0/s
  state-mismatch                       942            0.0/s
  state-insert                           0            0.0/s
  state-limit                            0            0.0/s
  src-limit                              0            0.0/s
  synproxy                               0            0.0/s
  map-failed                             0            0.0/s
Limit Counters
  max states per rule                    0            0.0/s
  max-src-states                         0            0.0/s
  max-src-nodes                          0            0.0/s
  max-src-conn                           0            0.0/s
  max-src-conn-rate                      0            0.0/s
  overload table insertion               0            0.0/s
  overload flush states                  0            0.0/s
Comment 13 Max 2017-09-11 11:38:31 UTC
(In reply to noah.bergbauer from comment #12)
> Status: Enabled for 1 days 14:44:53
Have you had any issues during this period?
And do you know which rule produces expired states?
Comment 14 noah.bergbauer 2017-09-11 11:42:51 UTC
(In reply to Max from comment #13)

Maybe, maybe not. The point of my workaround is to get a mostly functioning machine. However, the reboot right before this period was necessary because for some reason not even that workaround works consistently.
Comment 15 Kristof Provost freebsd_committer freebsd_triage 2017-09-11 11:53:45 UTC
(In reply to noah.bergbauer from comment #14)
Given the nature of your workaround and what we've seen from Dtrace I don't think that #217997 is the problem.
I'm also pretty sure that that fix is included in 11.1.
Comment 16 Max 2017-09-11 12:06:00 UTC
(In reply to noah.bergbauer from comment #14)
I'll try to reproduce the problem. But I need some starting point. Rules, dead-connection state entries... Anything?
Comment 17 Max 2017-09-11 12:10:27 UTC
(In reply to Kristof Provost from comment #15)
You are right. It is not the problem. But it looks quite similar.
Comment 18 Henri Hennebert 2017-09-17 15:47:57 UTC
I encounter the same problem with CURRENT on arm64 - a Pine64+ 2GB:

FreeBSD norquay.restart.bel 12.0-CURRENT FreeBSD 12.0-CURRENT #0 r320599M: Sat Jul  8 18:24:19 CEST 2017     root@norquay.restart.bel:/usr/obj/usr/src/sys/NORQUAY  arm64
Comment 19 Kristof Provost freebsd_committer freebsd_triage 2017-09-21 10:12:51 UTC
(In reply to hlh from comment #18)
Can you also run the dtrace script and 'sudo procstat -kk <pid of pf purge>'?
Does the workaround also work for you?

Can you post your ruleset?
Comment 20 Henri Hennebert 2017-09-22 10:05:20 UTC
The same workaround is working for me:

echo "set timeout interval 5" | pfctl -mf -
echo "set timeout interval 10" | pfctl -mf -

but to be sure I also do:
nohup service pf restart

For more than a week (without any config change) the problem hasn't appeared.

Next time it appears I will do the dtrace and procstat.
Comment 21 Henri Hennebert 2017-09-22 10:07:55 UTC
Here is my pf.conf:

#	$FreeBSD: src/etc/pf.conf,v 1.3 2006/01/27 17:16:20 mlaier Exp $
#	$OpenBSD: pf.conf,v 1.21 2003/09/02 20:38:44 david Exp $
#
# See pf.conf(5) and /usr/share/examples/pf for syntax and examples.
# Required order: options, normalization, queueing, translation, filtering.
# Macros and tables may be defined and used anywhere.
# Note that translation rules are first match while filter rules are last match.

# Macros: define common values, so they can be referenced and changed easily.
#ext_if="ext0"	# replace with actual external interface name i.e., dc0
#int_if="int0"	# replace with actual internal interface name i.e., dc1
#internal_net="10.1.1.1/8"
#external_addr="192.168.1.1"

#--- RestartSoft ---
int_if="awg0"
int_net="192.168.24.0/24, 2001:41d0:8:bdbe:1::/80"
vdsl_if="ng0"
tunnel_if="gif0"
virtualbox_net="192.168.22.0/24, 2001:41d0:8:bdbe:2::/80"
tignes="5.135.182.190"
ftp_passive="30000:31000"
smtp1="24"
http1="8080"
squid="3128"
tcp_services="{" "ftp" $ftp_passive $smtp1 $squid "nntp" "http" $http1 "https" "}"
bittorrent="6881:6889"
donkey_tcp="{ 4662 6346 6347 6667 6881 6882 7254 21776 }"
donkey_udp="{ 4666 6346 6347 7254 21780 }"

meribel="192.168.24.8"
platon="192.168.24.192"
emule_tcp="{ 4242 4661 4662 4881 }"
emule_udp="{ 4246 4665 4672 }"

# Tables: similar to macros, but more flexible for many addresses.
#table <foo> { 10.0.0.0/8, !10.1.0.0/16, 192.168.0.0/24, 192.168.1.18 }

#--- RestartSoft ---
table <rfc1918> const { 127.0.0.0/8, 192.168.0.0/16, 172.16.0.0/12, 10.0.0.0/8 }
table <internal_nets> const { $int_net, $virtualbox_net }
table <bad_hosts> persist

# Options: tune the behavior of pf, default values are given.
#set timeout { interval 10, frag 30 }
#set timeout { tcp.first 120, tcp.opening 30, tcp.established 86400 }
#set timeout { tcp.closing 900, tcp.finwait 45, tcp.closed 90 }
#set timeout { udp.first 60, udp.single 30, udp.multiple 60 }
#set timeout { icmp.first 20, icmp.error 10 }
#set timeout { other.first 60, other.single 30, other.multiple 60 }
#set timeout { adaptive.start 0, adaptive.end 0 }
#--- RestartSoft ---
set limit { states 30000, src-nodes 20000, frags 20000 }
#set loginterface none
#set optimization normal
#set block-policy drop
#set require-order yes
#set fingerprints "/etc/pf.os"

#--- RestartSoft ---
set block-policy drop
set debug urgent
set loginterface $vdsl_if
set state-policy if-bound 

# Normalization: reassemble fragments and resolve or reduce traffic ambiguities.
#scrub in all

#--- RestartSoft ---
scrub in all

# Queueing: rule-based bandwidth control.
#altq on $ext_if bandwidth 2Mb cbq queue { dflt, developers, marketing }
#queue dflt bandwidth 5% cbq(default)
#queue developers bandwidth 80%
#queue marketing  bandwidth 15%

#--- RestartSoft ---
#- altq on $vdsl_if bandwidth 10Mb cbq queue { dflt, p2p_upload }
#- queue dflt       bandwidth 80% cbq(default)
#- queue p2p_upload bandwidth 20%

# Translation: specify how addresses are to be mapped or redirected.
# nat: packets going out through $ext_if with source address $internal_net will
# get translated as coming from the address of $ext_if, a state is created for
# such packets, and incoming packets will be redirected to the internal address.
#nat on $ext_if from $internal_net to any -> ($ext_if)

#--- RestartSoft ---
# Translate all internal networks
# Special case for ekiga (voip) on meribel
nat on $vdsl_if proto udp from $meribel   to any -> ($vdsl_if) static-port 
nat on $vdsl_if inet from <internal_nets> to any -> ($vdsl_if)

# rdr: packets coming in on $ext_if with destination $external_addr:1234 will
# be redirected to 10.1.1.1:5678. A state is created for such packets, and
# outgoing packets will be translated as coming from the external address.
#rdr on $ext_if proto tcp from any to $external_addr/32 port 1234 -> 10.1.1.1 port 5678

# rdr outgoing FTP requests to the ftp-proxy
#rdr on $int_if proto tcp from any to any port ftp -> 127.0.0.1 port 8021

# spamd-setup puts addresses to be redirected into table <spamd>.
#table <spamd> persist
#no rdr on { lo0, lo1 } from any to any
#rdr inet proto tcp from <spamd> to any port smtp -> 127.0.0.1 port 8025
#--- RestartSoft ---
# Ekiga is running on meribel (192.168.24.8)
rdr on $vdsl_if proto udp from any to ($vdsl_if) port 5000:5100 -> $meribel
rdr on $vdsl_if proto tcp from any to ($vdsl_if) port 1720      -> $meribel

# Filtering: the implicit first two rules are
#pass in all
#pass out all

# block all incoming packets but allow ssh, pass all outgoing tcp and udp
# connections and keep state, logging blocked packets.
#block in log all
#pass  in  on $ext_if proto tcp from any to $ext_if port 22 keep state
#pass  out on $ext_if proto { tcp, udp } all keep state

#--- RestartSoft ---
# Setup a default deny policy
block in  log all
block out log all

# pass incoming packets destined to the addresses given in table <foo>.
#pass in on $ext_if proto { tcp, udp } from any to <foo> port 80 keep state

# pass incoming ports for ftp-proxy
#pass in on $ext_if inet proto tcp from any to $ext_if port > 49151 keep state

# Alternate rule to pass incoming ports for ftp-proxy
# NOTE: Please see pf.conf(5) BUGS section before using user/group rules.
#pass in on $ext_if inet proto tcp from any to $ext_if user proxy keep state

#--- RestartSoft ---
# Skip filtering on loopback interfaces
set skip on lo0

# Block Spoofed Packets
# antispool don't work for bridge
block in log quick from ($int_if) to any

# Block RFC1918 addresses on VDSL
block log quick on $vdsl_if from <rfc1918> to any

# Block bad hosts trying ssh or smtp
block log quick on $vdsl_if from <bad_hosts> to any

# Block networks known to use  Windows Metafile Vulnerability ---
# see http://isc.sans.org/diary.php?storyid=997
block log quick on $vdsl_if from {69.50.160.0/19, 85.255.112.0/20 } to any

# Allow traffic between physical internal networks (bridge0)
#- pass  in  quick on $int_if0 all no state
#- pass  out quick on $int_if0 all no state
#- pass  in  quick on $int_if1 all no state
#- pass  out quick on $int_if1 all no state

# Allow traffic with physical internal networks
pass      quick on $int_if  from ($int_if:network)  to ($int_if:network)   keep state
pass  in  quick on $int_if  from ($int_if:network)  to ($vdsl_if)          keep state
pass      quick on $int_if  proto ipencap

# Allow traffic with virtualbox internal network
pass  in  quick on $int_if  from {$virtualbox_net}  to ($int_if)           keep state
pass  in  quick on $int_if  from {$virtualbox_net}  to ($vdsl_if)          keep state
pass  out quick on $int_if  from ($int_if)          to {$virtualbox_net}   keep state

# Allow physical internal networks to go to INTERNET - caveat quick keyword
#- pass  in  quick on $int_if  proto tcp from $platon  to any port $emule_tcp keep state queue p2p_upload
#- pass  in  quick on $int_if  proto udp from $platon  to any port $emule_udp keep state queue p2p_upload
pass  in  quick on $int_if  from ($int_if:network)  to any                 keep state

# Allow virtualbox internal network to go to INTERNET
pass  in  quick on $int_if  from {$virtualbox_net}  to any                 keep state

# Allow all connections that I initiate to INTERNET
pass  out quick on $vdsl_if    proto { tcp, udp, icmp } all                keep state

# Allow DNS and NTP (keep state take care of this)
#- pass  in  quick on $vdsl_if proto udp from any port domain to ($vdsl_if)
#- pass  in  quick on $vdsl_if proto udp from any port ntp    to ($vdsl_if) port ntp

# Allow resolvers on INTERNET to use our DNS
#- pass  in  quick on $vdsl_if proto udp from any             to ($vdsl_if) port domain

# Allow IPv4 PING from everywhere
pass      quick inet proto icmp all icmp-type echoreq keep state

# Allow NEEDFRAG from everywhere (see /usr/include/netinet/ip_icmp.h)
pass  in  quick inet proto icmp all icmp-type unreach code needfrag keep state

# Allow icmp6 from everywhere
pass  in  quick inet6 proto icmp6 all

# Reserved for test of mpd-4.3 with synproxy
pass  in  quick on $vdsl_if proto tcp from any to ($vdsl_if)          port 8000 flags S/SA synproxy state

# Allow everyone on INTERNET to connect to the following services
#--- NOTE --- synproxy can't work on $vdsl_if managed by mpd4.2.2 / mpd4.3
pass  in  quick on $vdsl_if   proto tcp from any to ($vdsl_if)        port ssh           flags S/SA keep state (source-track rule, max-src-conn-rate 1/30, overload <bad_hosts> flush global)
pass  in  quick on $vdsl_if   proto tcp from any to ($vdsl_if)        port smtp          flags S/SA keep state (source-track rule, max-src-conn-rate 15/10, overload <bad_hosts> flush global)
pass  in  quick on $vdsl_if   proto tcp from any to ($vdsl_if)        port $tcp_services flags S/SA keep state
# Services in IPv6 (no IPv4 routing in tunnel)
pass  in  quick on $tunnel_if proto tcp from any to any               port $tcp_services flags S/SA keep state

# Allow BitTorrent traffic with INTERNET
pass  in  quick on $vdsl_if proto tcp from any to ($vdsl_if)  port $bittorrent  flags S/SA keep state

# Allow Donkey traffic with INTERNET
pass  in  quick on $vdsl_if proto tcp from any to ($vdsl_if) port $donkey_tcp   flags S/SA keep state
pass  in  quick on $vdsl_if proto udp from any to ($vdsl_if) port $donkey_udp              keep state

# Allow ekiga traffic (on meribel - see rdr)
pass  in  quick on $vdsl_if proto udp from any to any        port 5000:5100                keep state
pass  in  quick on $vdsl_if proto tcp from any to any        port 1720                     keep state
pass  out quick on $int_if  proto udp from any to any        port 5000:5100                keep state
pass  out quick on $int_if  proto tcp from any to any        port 1720                     keep state

# Allow IPsec
pass  in  quick on $vdsl_if proto udp from any        to ($vdsl_if) port isakmp     keep state
pass  out quick on $vdsl_if proto udp from ($vdsl_if) to any        port isakmp     keep state
pass  in  quick on $vdsl_if proto udp from any        to ($vdsl_if) port 4500       keep state
pass  in  quick on $vdsl_if proto ah  from any        to ($vdsl_if)
pass  out quick on $vdsl_if proto ah  from ($vdsl_if) to any
pass  in  quick on $vdsl_if proto esp from any        to ($vdsl_if)
pass  out quick on $vdsl_if proto esp from ($vdsl_if) to any

# Allow IPv6 with tignes - tunnelling without encryption
pass      quick on $vdsl_if   inet  proto ipv6
pass      quick on $tunnel_if inet6

# Allow IPv4 with tignes - tunnelling without encryption
pass  in  quick on $vdsl_if   proto ipencap from $tignes to ($vdsl_if)
pass  out quick on $vdsl_if   proto ipencap from ($vdsl_if) to $tignes
pass      quick on $tunnel_if inet

# Allow IPV6 with internal networks
pass      quick on $int_if inet6

# assign packets to a queue.
#pass out on $ext_if from 192.168.0.0/24 to any keep state queue developers
#pass out on $ext_if from 192.168.1.0/24 to any keep state queue marketing
Comment 22 Henri Hennebert 2017-09-26 12:05:26 UTC
The problem has cropped up:

[root@norquay ~]# pfctl -si
Status: Enabled for 1 days 08:09:42           Debug: Urgent

Interface Stats for ng0               IPv4             IPv6
  Bytes In                      3355637698                0
  Bytes Out                      236586554                0
  Packets In
    Passed                         2587532                0
    Blocked                           3290                0
  Packets Out
    Passed                         2395320                0
    Blocked                            109                0

State Table                          Total             Rate
  current entries                       31               
  searches                        10992548           94.9/s
  inserts                            77585            0.7/s
  removals                           77052            0.7/s
Counters
  match                              86805            0.7/s
  bad-offset                             0            0.0/s
  fragment                               0            0.0/s
  short                                  0            0.0/s
  normalize                              0            0.0/s
  memory                                 0            0.0/s
  bad-timestamp                          0            0.0/s
  congestion                             0            0.0/s
  ip-option                              0            0.0/s
  proto-cksum                            0            0.0/s
  state-mismatch                         9            0.0/s
  state-insert                           2            0.0/s
  state-limit                            0            0.0/s
  src-limit                              8            0.0/s
  synproxy                             105            0.0/s
  map-failed                             0            0.0/s

[root@norquay ~]# pfctl -ss|wc -l
     533

[root@norquay ~]# procstat -kk 7
  PID    TID COMM                TDNAME              KSTACK                       
    7 100084 pf purge            -                   mi_switch+0x118 sleepq_timedwait+0x40 _sleep+0x268 pf_purge_thread+0xec fork_exit+0x94 

[root@norquay dtrace]# ./pf.dtrace 
dtrace: script './pf.dtrace' matched 4 probes
dtrace: buffer size lowered to 2m

after:

[root@norquay ~]# echo "set timeout interval 5" | pfctl -mf -

CPU     ID                    FUNCTION:NAME
  1   2257    pf_purge_expired_states:entry 
  1   2258   pf_purge_expired_states:return 
  1   2258   pf_purge_expired_states:return 
  1   2258   pf_purge_expired_states:return 
  1   2257    pf_purge_expired_states:entry 
  1   2258   pf_purge_expired_states:return 
  1   2258   pf_purge_expired_states:return 
  1   2257    pf_purge_expired_states:entry 
  1   2258   pf_purge_expired_states:return 
  1   2258   pf_purge_expired_states:return 
  1   2257    pf_purge_expired_states:entry 
  1   2258   pf_purge_expired_states:return 
  1   2258   pf_purge_expired_states:return 
  1   2257    pf_purge_expired_states:entry 
  1   2258   pf_purge_expired_states:return 
  1   2258   pf_purge_expired_states:return 
  1   2257    pf_purge_expired_states:entry 
  1   2258   pf_purge_expired_states:return 
  1   2258   pf_purge_expired_states:return 
  1   2257    pf_purge_expired_states:entry 
  1   2258   pf_purge_expired_states:return 
  1   2258   pf_purge_expired_states:return 
  1   2257    pf_purge_expired_states:entry 
  1   2258   pf_purge_expired_states:return 
  1   2258   pf_purge_expired_states:return 
  1   2257    pf_purge_expired_states:entry 
  1   2258   pf_purge_expired_states:return 
  1   2258   pf_purge_expired_states:return 
  1   2257    pf_purge_expired_states:entry 
  1   2258   pf_purge_expired_states:return 
  1   2258   pf_purge_expired_states:return 
  1   2257    pf_purge_expired_states:entry 
  1   2258   pf_purge_expired_states:return 
  1   2258   pf_purge_expired_states:return 
  1   2257    pf_purge_expired_states:entry 
  1   2258   pf_purge_expired_states:return 
  1   2258   pf_purge_expired_states:return 
  1   2257    pf_purge_expired_states:entry 
  1   2258   pf_purge_expired_states:return 
  1   2258   pf_purge_expired_states:return 
  3   2257    pf_purge_expired_states:entry 
  3   2258   pf_purge_expired_states:return 
  3   2258   pf_purge_expired_states:return 
  3   2257    pf_purge_expired_states:entry 
  3   2258   pf_purge_expired_states:return 
  3   2258   pf_purge_expired_states:return 
  3   2257    pf_purge_expired_states:entry 
  3   2258   pf_purge_expired_states:return 
  3   2258   pf_purge_expired_states:return 
  3   2257    pf_purge_expired_states:entry 
  3   2258   pf_purge_expired_states:return 
  3   2258   pf_purge_expired_states:return 
  3   2257    pf_purge_expired_states:entry 
  3   2258   pf_purge_expired_states:return 
  3   2258   pf_purge_expired_states:return 
  3   2258   pf_purge_expired_states:return 
  3   2257    pf_purge_expired_states:entry 
  3   2258   pf_purge_expired_states:return 
  3   2258   pf_purge_expired_states:return 
  1   2257    pf_purge_expired_states:entry 
  1   2258   pf_purge_expired_states:return 
  1   2258   pf_purge_expired_states:return 
  1   2257    pf_purge_expired_states:entry 
....
[root@norquay ~]# echo "set timeout interval 10" | pfctl -mf -

And then no new states are created!

[root@norquay ~]# nohup service pf restart

resumes normal pf operation.
Comment 23 Kristof Provost freebsd_committer freebsd_triage 2017-09-27 19:10:35 UTC
(In reply to hlh from comment #22)
Did you take that '# pfctl -si' before attempting any of the workarounds? It shows you've got 31 states; I don't see how you'd hit the state limit that way.

It might be interesting to keep dtrace running while you run the workaround. 
Everything I see so far suggests that we're stuck in the rw_sleep() (which was changed to an sx_sleep() in CURRENT), but that makes no sense. I don't see how that could happen, or how the workaround could help there.

Can you confirm you're not running low on memory? That might conceivably also trigger the 'PF states limit reached' warning (although I'd expect to see many other warnings too), but I'm not sure how that would look like a frozen purge thread.
Comment 24 Henri Hennebert 2017-09-28 06:25:01 UTC
I ran the pfctl -si after I detected the problem (with pftop) and before attempting any workaround.

Here is the typical top:

last pid: 38563;  load averages:  0.15,  0.28,  0.25                                                        up 7+14:37:46  08:20:09
67 processes:  1 running, 64 sleeping, 2 zombie
CPU:  0.0% user,  0.0% nice,  0.2% system,  0.1% interrupt, 99.7% idle
Mem: 49M Active, 84M Inact, 76M Laundry, 1728M Wired, 27M Free
ARC: 571M Total, 63M MFU, 358M MRU, 4934K Anon, 6512K Header, 138M Other
     315M Compressed, 520M Uncompressed, 1.65:1 Ratio
Swap: 4096M Total, 285M Used, 3811M Free, 6% Inuse

I don't see the 'PF states limit reached'.

I ran dtrace when I detected the problem. It showed nothing, and only started showing the trace after I ran the first workaround.
Comment 25 Kristof Provost freebsd_committer freebsd_triage 2017-09-28 06:40:12 UTC
(In reply to hlh from comment #24)
Wait? You don't see 'PF states limit reached'? What are you seeing, exactly, and how do you detect it?
Comment 26 Henri Hennebert 2017-09-28 07:59:21 UTC
The first time I detected this problem was when a computer was not allowed to connect to the internet. I checked the gateway (the Pine64 running CURRENT) and found 'PF states limit reached' in /var/log/messages. Then I ran pftop and saw that there was a huge number of states.

Rebooting the gateway solved the problem. I dug further and found the workaround.

I added
set limit { states 30000, src-nodes 20000, frags 20000 }
to /etc/pf.conf.

Then I checked regularly with pftop. For more than one week, no problem. But I kept checking and it occurred again.

I only have to check from time to time because even when the problem arises, the limit of 30000 is large enough to allow new connections to be established for some time...
Comment 27 Kristof Provost freebsd_committer freebsd_triage 2017-09-28 08:36:01 UTC
(In reply to hlh from comment #26)
I don't understand why your 'pfctl -si' would then only show 31 states.

Either way, it would be interesting to run the dtrace script constantly (saving the output), and see what's in that log when the problem occurs.
I'd expect it to stop showing updates when you end up in this state, and restart when you trigger the workaround. The interesting bit would be the last line(s) before it blocks, and the first line(s) when it resumes.
Comment 28 Henri Hennebert 2017-09-28 08:46:48 UTC
I think that `pfctl -si` shows the counters as they were when the problem started.

I'm running dtrace from now on.
Comment 29 noah.bergbauer 2017-09-28 14:27:42 UTC
>>I don't understand why your 'pfctl -si' would then only show 31 states.
>I think that the `pfctl -si` shows the counters just when the problem start.

It's the same for me. As far as I remember, this behavior is easily explained by the fact that these counters are updated regularly by the purge thread, so ... yeah. Frozen purge thread means frozen counters.
Comment 30 Kristof Provost freebsd_committer freebsd_triage 2017-09-28 14:45:00 UTC
(In reply to noah.bergbauer from comment #29)
Having checked the code, yes, I believe you're correct about that.
Comment 31 Henri Hennebert 2017-10-01 08:36:05 UTC
(In reply to Kristof Provost from comment #27)

When the problem occurs, no more lines appear in the dtrace log.
Here are the last 20 lines:

[root@norquay ~]# tail -n 20 /var/log/pf.dtrace.log-before 
  0   2257    pf_purge_expired_states:entry 
  0   2258   pf_purge_expired_states:return 
  0   2258   pf_purge_expired_states:return 
  0   2257    pf_purge_expired_states:entry 
  0   2258   pf_purge_expired_states:return 
  0   2258   pf_purge_expired_states:return 
  3   2257    pf_purge_expired_states:entry 
  3   2258   pf_purge_expired_states:return 
  3   2258   pf_purge_expired_states:return 
  3   2257    pf_purge_expired_states:entry 
  3   2258   pf_purge_expired_states:return 
  3   2258   pf_purge_expired_states:return 
  3   2257    pf_purge_expired_states:entry 
  3   2258   pf_purge_expired_states:return 
  3   2258   pf_purge_expired_states:return 
  1   2257    pf_purge_expired_states:entry 
  1   2258   pf_purge_expired_states:return 
  1   2258   pf_purge_expired_states:return 

then I run:
[root@norquay ~]# echo "set timeout interval 5" | pfctl -mf -

And the dtrace log resumes; the first 20 lines are:

[root@norquay ~]# head -n 20 /var/log/pf.dtrace.log
CPU     ID                    FUNCTION:NAME
  1   2257    pf_purge_expired_states:entry 
  1   2258   pf_purge_expired_states:return 
  1   2258   pf_purge_expired_states:return 
  0   2257    pf_purge_expired_states:entry 
  0   2258   pf_purge_expired_states:return 
  0   2258   pf_purge_expired_states:return 
  0   2257    pf_purge_expired_states:entry 
  0   2258   pf_purge_expired_states:return 
  0   2258   pf_purge_expired_states:return 
  0   2257    pf_purge_expired_states:entry 
  0   2258   pf_purge_expired_states:return 
  0   2258   pf_purge_expired_states:return 
  0   2257    pf_purge_expired_states:entry 
  0   2258   pf_purge_expired_states:return 
  0   2258   pf_purge_expired_states:return 
  0   2257    pf_purge_expired_states:entry 
  0   2258   pf_purge_expired_states:return 
  0   2258   pf_purge_expired_states:return 
  0   2258   pf_purge_expired_states:return 

PS I have the 2 log files and we can go back more than 20 lines.
Comment 32 Kristof Provost freebsd_committer freebsd_triage 2017-10-01 15:39:15 UTC
(In reply to hlh from comment #31)
It's a little odd that you're seeing double pf_purge_expired_states:return entries.
Any chance you've got two such probes in your dtrace script?

Anyway, let's stick a couple of static probes in and see what's going on:
diff --git a/sys/netpfil/pf/pf.c b/sys/netpfil/pf/pf.c
index 8613a161f0a..f8244a6ef6e 100644
--- a/sys/netpfil/pf/pf.c
+++ b/sys/netpfil/pf/pf.c
@@ -55,6 +55,7 @@ __FBSDID("$FreeBSD$");
 #include <sys/md5.h>
 #include <sys/random.h>
 #include <sys/refcount.h>
+#include <sys/sdt.h>
 #include <sys/socket.h>
 #include <sys/sysctl.h>
 #include <sys/taskqueue.h>
@@ -105,6 +106,14 @@ __FBSDID("$FreeBSD$");

 #define        DPFPRINTF(n, x) if (V_pf_status.debug >= (n)) printf x

+/* DTrace static probes */
+SDT_PROVIDER_DEFINE(pf);
+
+SDT_PROBE_DEFINE(pf, purge, thread, wakeup);
+SDT_PROBE_DEFINE2(pf, purge, , expired_states,
+               "unsigned int",
+               "int");
+
 /*
  * Global variables
  */
@@ -1434,6 +1443,7 @@ pf_purge_thread(void *unused __unused)
        sx_xlock(&pf_end_lock);
        while (pf_end_threads == 0) {
                sx_sleep(pf_purge_thread, &pf_end_lock, 0, "pftm", hz / 10);
+               SDT_PROBE0(pf, purge, thread, wakeup);

                VNET_LIST_RLOCK();
                VNET_FOREACH(vnet_iter) {
@@ -1680,6 +1690,8 @@ pf_purge_expired_states(u_int i, int maxcheck)

        V_pf_status.states = uma_zone_get_cur(V_pf_state_z);

+       SDT_PROBE2(pf, purge, , expired_states, i, maxcheck);
+
        /*
         * Go through hash and unlink states that expire now.
         */

You can trace those with:
#!/usr/sbin/dtrace -s

pf:purge:thread:wakeup
{

}

pf:purge::expired_states
{
    printf("i %d maxentry %d %d", arg0, arg1, arg2);
}

Hopefully we'll get a clue as to what's going on with this.
Comment 33 Henri Hennebert 2017-10-02 06:10:32 UTC
(In reply to Kristof Provost from comment #32)

Here is the dtrace script used:

#!/usr/sbin/dtrace -s

fbt:kernel:pf_purge_expired_states:entry
{

}

fbt:kernel:pf_purge_expired_states:return
{

}

fbt:kernel:pf_purge_expired_fragments:entry
{

}

fbt:kernel:pf_purge_thread:entry
{

}

I'm making the changes you asked for.
Comment 34 Henri Hennebert 2017-10-02 10:06:50 UTC
(In reply to Kristof Provost from comment #32)

I checked the troubling number of returns:

[root@norquay log]# grep entry pf.dtrace1.log-before |wc -l
 2026647
[root@norquay log]# grep return pf.dtrace1.log-before |wc -l
 4021723
[root@norquay log]# bc
2026647*2
4053294
4053294-4021723
31571

Really strange!
Comment 35 Henri Hennebert 2017-10-19 06:30:31 UTC
(In reply to hlh from comment #33)

The problem crops up again.

Here is the end of the output of your dtrace script from when states were no longer being expired:

[root@norquay log]# tail pf.dtrace2.log-after
  3   9078              none:expired_states i 19947 maxentry 327 0
  3   9077                    thread:wakeup 
  3   9078              none:expired_states i 20274 maxentry 327 0
  0   9077                    thread:wakeup 
  0   9078              none:expired_states i 20601 maxentry 327 0
  0   9077                    thread:wakeup 
  0   9078              none:expired_states i 20928 maxentry 327 0
  0   9077                    thread:wakeup 
  0   9078              none:expired_states i 21255 maxentry 327 0

The dtrace script gave no further output.

Then I stopped dtrace and saved the previous trace.

I restarted a new dtrace, which output nothing until I ran:

echo "set timeout interval 5" | pfctl -mf -
echo "set timeout interval 5" | pfctl -mf -
nohup service pf restart

The first lines of the dtrace are:

[root@norquay log]# head pf.dtrace2.log-after 
CPU     ID                    FUNCTION:NAME
  0   9077                    thread:wakeup 
  0   9078              none:expired_states i 23544 maxentry 655 0
  1   9077                    thread:wakeup 
  1   9078              none:expired_states i 24199 maxentry 655 0
  1   9077                    thread:wakeup 
  1   9078              none:expired_states i 24854 maxentry 655 0
  1   9077                    thread:wakeup 
  1   9078              none:expired_states i 25509 maxentry 655 0
  1   9077                    thread:wakeup
Comment 36 Henri Hennebert 2017-10-19 08:14:40 UTC
I notice that when I do:

echo "set timeout interval 10" | pfctl -mf -

while pf is running without problems, then no new states are created.
I must run `nohup service pf restart` to restore a functional system.
Comment 37 Kristof Provost freebsd_committer freebsd_triage 2017-10-22 20:05:13 UTC
I do not understand this at all. There is no reason for the purge thread to get stuck. There's also no reason for it to have apparently run 7 times between freezing and when you restarted dtrace. (That's based on the purge_idx value logged by the expired_states probe: 23544 - 21255 == 327 * 7.)

(Please just don't stop it at all while you're performing the workaround. There's no reason for it, and stopping it runs the risk of missing interesting information).

It still looks like we just don't wake up from the sx_sleep(pf_purge_thread, &pf_end_lock, ...) call, but that makes no sense. The only way for that to happen is if someone would be holding the pf_end_lock, but that lock is only taken when the pf module is unloaded (and by the purge thread, obviously).

Do you see other strange behaviour on the system? If there's something wrong with the sleep/lock infrastructure I'd expect to see other strange things.
Comment 38 Henri Hennebert 2017-10-23 06:18:16 UTC
(In reply to Kristof Provost from comment #37)

The only thing that seems strange to me is the memory used on this system:

top shows:

last pid: 37231;  load averages:  0.19,  0.20,  0.16                                                        up 3+17:01:05  08:04:01
73 processes:  1 running, 71 sleeping, 1 zombie
CPU:  0.1% user,  0.0% nice,  0.2% system,  0.0% interrupt, 99.7% idle
Mem: 69M Active, 76M Inact, 57M Laundry, 1727M Wired, 35M Free
ARC: 597M Total, 70M MFU, 195M MRU, 1868K Anon, 4274K Header, 326M Other
     55M Compressed, 212M Uncompressed, 3.84:1 Ratio
Swap: 4096M Total, 356M Used, 3740M Free, 8% Inuse

zfs-stats -M:

------------------------------------------------------------------------
ZFS Subsystem Report				Mon Oct 23 08:04:34 2017
------------------------------------------------------------------------

System Memory:

	3.53%	69.41	MiB Active,	3.87%	76.07	MiB Inact
	87.95%	1.69	GiB Wired,	0.00%	0 Cache
	1.72%	33.71	MiB Free,	2.93%	57.48	MiB Gap

	Real Installed:				2.00	GiB
	Real Available:			98.34%	1.97	GiB
	Real Managed:			97.50%	1.92	GiB

	Logical Total:				2.00	GiB
	Logical Used:			94.64%	1.89	GiB
	Logical Free:			5.36%	109.78	MiB

Kernel Memory:					147.94	MiB
	Data:				86.01%	127.24	MiB
	Text:				13.99%	20.70	MiB

Kernel Memory Map:				512.00	MiB
	Size:				53.68%	274.86	MiB
	Free:				46.32%	237.14	MiB

------------------------------------------------------------------------

zfs-stats -A

------------------------------------------------------------------------
ZFS Subsystem Report				Mon Oct 23 08:05:18 2017
------------------------------------------------------------------------

ARC Summary: (HEALTHY)
	Memory Throttle Count:			0

ARC Misc:
	Deleted:				4.39m
	Recycle Misses:				0
	Mutex Misses:				24.67m
	Evict Skips:				4.15b

ARC Size:				233.77%	598.44	MiB
	Target Size: (Adaptive)		100.00%	256.00	MiB
	Min Size (Hard Limit):		12.50%	32.00	MiB
	Max Size (High Water):		8:1	256.00	MiB

ARC Size Breakdown:
	Recently Used Cache Size:	22.91%	137.09	MiB
	Frequently Used Cache Size:	77.09%	461.35	MiB

ARC Hash Breakdown:
	Elements Max:				41.80k
	Elements Current:		38.41%	16.06k
	Collisions:				448.17k
	Chain Max:				4
	Chains:					471

------------------------------------------------------------------------

Why is the ARC at 233% and 1727MB wired?
Comment 39 Henri Hennebert 2017-10-23 07:17:23 UTC
(In reply to Kristof Provost from comment #37)
I'm really sorry: in my previous post I displayed the tail of dtrace2.log-after instead of dtrace2.log-before. I have deleted those files; shame on me!
Let's wait for the next occurrence of the problem...
Comment 40 Henri Hennebert 2017-10-25 15:58:05 UTC
(In reply to hlh from comment #33)

When I run dtrace in batch mode, it frequently ends with:

dtrace: script './pf.dtrace2' matched 2 probes
dtrace: buffer size lowered to 2m
dtrace: processing aborted: Abort due to systemic unresponsiveness

and so I have no usable trace from when the problem arises.

PS I'm now running 

FreeBSD norquay.restart.bel 12.0-CURRENT FreeBSD 12.0-CURRENT #0 r322941:324563M: Tue Oct 24 16:18:12 CEST 2017     root@norquay.restart.bel:/usr/obj/usr/src/sys/NORQUAY  arm64

And the problem crops up more frequently.
Comment 41 Kristof Provost freebsd_committer freebsd_triage 2017-10-25 19:27:56 UTC
(In reply to hlh from comment #38)
> Why ARC at 233% and 1727Mb wired?
The ARC now keeps compressed data, so you can fit more in memory. I suspect that's a result of that.
Comment 42 Kristof Provost freebsd_committer freebsd_triage 2017-10-25 19:28:57 UTC
(In reply to hlh from comment #40)
> dtrace: processing aborted: Abort due to systemic unresponsiveness
You'd have to talk to a dtrace specialist about that. I'm not sure what causes that.

I still don't see this issue on my current boxes, so I don't know what I can do right now.
Comment 43 Henri Hennebert 2017-10-26 06:13:45 UTC
(In reply to Kristof Provost from comment #42)

I upgraded the Pine64 to CURRENT r324563 and now ntpd can't keep the clock stable for long - after an hour or two, the clock runs too fast. I think that this clock drift is correlated with the pf problem.
Is pf sensitive to a clock running too fast and being reset backward by ntpd?
Comment 44 Kristof Provost freebsd_committer freebsd_triage 2017-10-26 19:17:44 UTC
(In reply to hlh from comment #43)
No, that shouldn't matter, unless the clock is completely wrong (i.e. not running, or running orders of magnitude slower than expected, or running backwards, ...)
Comment 45 freebsd.ports 2018-02-01 08:14:31 UTC
We had exactly the same problem after upgrading from 10.3 to 11.1.

Since this was the only system on an older VMware ESXi host, we upgraded the ESXi host to a newer version. Surprise, the problem went away... :-)
Comment 46 Henri Hennebert 2018-02-01 12:14:36 UTC
(In reply to Kristof Provost from comment #44)

I can't get the dtrace output you asked for (dtrace after the problem occurs).

I upgraded to r328259 and the problem still crops up whenever I get a time drift (the time jumps 3 minutes into the future).
Comment 47 Kristof Provost freebsd_committer freebsd_triage 2018-02-02 11:16:56 UTC
(In reply to hlh from comment #46)
Are you also using VMware ESXi?
Comment 48 Henri Hennebert 2018-02-02 11:54:53 UTC
(In reply to Kristof Provost from comment #47)

No. I am running on a Pine64+ 2GB

See http://docs.freebsd.org/cgi/mid.cgi?fdf7ea24-6ada-dbb1-146a-0bf6fde11d29
Comment 49 noah.bergbauer 2018-03-03 11:49:36 UTC
At least in my particular case, I eventually tracked this down to a tunable from loader.conf: kern.timecounter.smp_tsc_adjust=1

Since I removed that, this issue hasn't happened once, so I believe it might have been the reason.
Comment 50 Kristof Provost freebsd_committer freebsd_triage 2018-03-04 02:29:35 UTC
That pretty much confirms this isn't actually a pf bug, but a timekeeping issue. I'm afraid I can't help with this one.
Comment 51 John-Mark Gurney freebsd_committer freebsd_triage 2018-07-29 01:00:28 UTC
I am seeing the same thing on:
FreeBSD gate2.funkthat.com 12.0-CURRENT FreeBSD 12.0-CURRENT #0 r336134: Mon Jul  9 19:20:11 UTC 2018     root@releng3.nyi.freebsd.org:/usr/obj/usr/src/arm64.aarch64/sys/GENERIC  arm64

This is a Pine A64-LTS board.

The proposed loader.conf tunable does not apply to me, as I do not have that line present.  This is an otherwise basic configuration without anything special set.
Comment 52 Hans Petter Selasky freebsd_committer freebsd_triage 2018-07-29 09:43:45 UTC
Is it possible to get the output from "procstat -ak" when the lockup happens?

Is the Giant mutex involved here?

--HPS
Comment 53 Kristof Provost freebsd_committer freebsd_triage 2018-08-02 08:23:40 UTC
This is an aarch64 (or pine64) problem, not a pf issue.
Comment 54 John-Mark Gurney freebsd_committer freebsd_triage 2018-08-21 15:01:04 UTC
Created attachment 196421 [details]
run a bunch of threads, and detect if sleep fails to wakeup

This is a program I wrote to verify that sleep was working, and in my case it was not.  There are patches attached to bug 229644 that are believed to fix the issue.

Attaching the program so others can find it if this problem crops up again.
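
The attached program itself is not inlined here. Purely as an illustrative sketch of the approach (this is not the attached code; it assumes POSIX threads, nanosleep(2) and CLOCK_MONOTONIC), a test along these lines could look like:

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define NTHREADS  8
#define SLEEP_NS  100000000LL	/* request a 100 ms sleep */
#define SLACK_NS  500000000LL	/* complain if we oversleep by more than 500 ms */

static void *
sleeper(void *arg)
{
	int id = (int)(intptr_t)arg;
	struct timespec req = { 0, SLEEP_NS };
	struct timespec before, after;
	long long slept;

	for (;;) {
		clock_gettime(CLOCK_MONOTONIC, &before);
		nanosleep(&req, NULL);
		clock_gettime(CLOCK_MONOTONIC, &after);
		slept = (after.tv_sec - before.tv_sec) * 1000000000LL +
		    (after.tv_nsec - before.tv_nsec);
		if (slept > SLEEP_NS + SLACK_NS)
			printf("thread %d: slept %lld ns instead of ~%lld ns\n",
			    id, slept, SLEEP_NS);
	}
	return (NULL);
}

int
main(void)
{
	pthread_t threads[NTHREADS];

	for (int i = 0; i < NTHREADS; i++)
		pthread_create(&threads[i], NULL, sleeper, (void *)(intptr_t)i);
	pthread_join(threads[0], NULL);	/* the workers never exit; keep the process alive */
	return (0);
}

Built with something like 'cc -o sleeptest sleeptest.c -lpthread', each thread reports any sleep that returns far later than requested, which points at the event-timer/sleep infrastructure rather than at pf.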

Noah, you didn't include the platform of your system, so unless you comment on the bug I'm going to assume that either you resolved it or it's no longer an issue.
Comment 55 John-Mark Gurney freebsd_committer freebsd_triage 2018-08-29 00:47:01 UTC
This should be fixed now, see bug 229644 with commit:
https://svnweb.freebsd.org/changeset/base/338272
Comment 56 skillcoder 2018-09-19 09:10:42 UTC
I think I have the same issue.

uname -a
FreeBSD *** 11.2-RELEASE FreeBSD 11.2-RELEASE #0 r335510: Fri Jun 22 04:32:14 UTC 2018     root@releng2.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC  amd64

sudo procstat -kk 342
  PID    TID COMM                TDNAME              KSTACK
  342 100342 pf purge            -                   mi_switch+0xe6 sleepq_timedwait+0x2f _sleep+0x21e pf_purge_thread+0x86 fork_exit+0x83 fork_trampoline+0xe

pftop shows thousands of FIN_WAIT_2:FIN_WAIT_2 states with EXP 00:00:00 and AGE > 00:24:00

How can I fix this problem?
Comment 57 Kristof Provost freebsd_committer freebsd_triage 2018-09-19 09:13:24 UTC
(In reply to skillcoder from comment #56)
That's odd, we've only seen this on arm64 so far.

Can you run the Dtrace script from comment #1 to confirm that your purge thread really isn't waking up?
Comment 58 skillcoder 2018-09-19 09:18:21 UTC
# sudo ./pf.dtrace
dtrace: script './pf.dtrace' matched 8 probes

After waiting a few minutes:
^T
load: 0.13  cmd: dtrace 10177 [uwait] 159.76r 0.12u 0.06s 0% 24064k
Comment 59 skillcoder 2018-09-19 09:26:41 UTC
It's the default kernel on a Hetzner CX10 with 1 GiB RAM
sysctl kern.hz
kern.hz: 100

sudo pfctl -vsi
No ALTQ support in kernel
ALTQ related functions disabled
Status: Enabled for 0 days 01:18:30           Debug: Urgent

Hostid:   0x4b6112e8
Checksum: 0x6f02d61c617ad47b85858a2708d6bf50

State Table                          Total             Rate
  current entries                     1169
  searches                         1495604          317.5/s
  inserts                            40476            8.6/s
  removals                           10639            2.3/s
Source Tracking Table
  current entries                        0
  searches                               0            0.0/s
  inserts                                0            0.0/s
  removals                               0            0.0/s
Counters
  match                              47883           10.2/s
  bad-offset                             0            0.0/s
  fragment                               0            0.0/s
  short                                  0            0.0/s
  normalize                              0            0.0/s
  memory                                 0            0.0/s
  bad-timestamp                          0            0.0/s
  congestion                             0            0.0/s
  ip-option                              0            0.0/s
  proto-cksum                         5120            1.1/s
  state-mismatch                        31            0.0/s
  state-insert                           0            0.0/s
  state-limit                            0            0.0/s
  src-limit                              0            0.0/s
  synproxy                               0            0.0/s
  map-failed                             0            0.0/s
Limit Counters
  max states per rule                    0            0.0/s
  max-src-states                         0            0.0/s
  max-src-nodes                          0            0.0/s
  max-src-conn                           0            0.0/s
  max-src-conn-rate                      0            0.0/s
  overload table insertion               0            0.0/s
  overload flush states                  0            0.0/s

sudo pfctl -ss | awk '{print $6}' | sort | uniq -c | sort -nr | head
27112 FIN_WAIT_2:FIN_WAIT_2
2237 TIME_WAIT:TIME_WAIT
 224 MULTIPLE:SINGLE
 188 ESTABLISHED:ESTABLISHED
 177 FIN_WAIT_2:ESTABLISHED
  49 SYN_SENT:ESTABLISHED
   4 SYN_SENT:CLOSED
   4 SINGLE:NO_TRAFFIC
   3 MULTIPLE:MULTIPLE
   2 0:0
Comment 60 skillcoder 2018-09-19 09:35:33 UTC
After using this:
echo "set timeout interval 5" | sudo pfctl -mf -

CPU     ID                    FUNCTION:NAME
  0  65924    pf_purge_expired_states:entry
  0  65925   pf_purge_expired_states:return
  0  65924    pf_purge_expired_states:entry
  0  65925   pf_purge_expired_states:return
  0  65924    pf_purge_expired_states:entry
  0  65925   pf_purge_expired_states:return
  0  65924    pf_purge_expired_states:entry
  0  65925   pf_purge_expired_states:return
  0  65924    pf_purge_expired_states:entry
  0  65925   pf_purge_expired_states:return
  0  65924    pf_purge_expired_states:entry
  0  65925   pf_purge_expired_states:return
  0  65924    pf_purge_expired_states:entry
  0  65925   pf_purge_expired_states:return
  0  65924    pf_purge_expired_states:entry
  0  65925   pf_purge_expired_states:return
  0  65924    pf_purge_expired_states:entry
  0  65925   pf_purge_expired_states:return
  0  65924    pf_purge_expired_states:entry
  0  65925   pf_purge_expired_states:return
  0  65924    pf_purge_expired_states:entry
  0  65925   pf_purge_expired_states:return
  0  65924    pf_purge_expired_states:entry
  0  65925   pf_purge_expired_states:return
  0  65924    pf_purge_expired_states:entry
  0  65925   pf_purge_expired_states:return
  0  65924    pf_purge_expired_states:entry
  0  65925   pf_purge_expired_states:return
  0  65924    pf_purge_expired_states:entry
  0  65925   pf_purge_expired_states:return
  0  65924    pf_purge_expired_states:entry
  0  65925   pf_purge_expired_states:return
  0  65924    pf_purge_expired_states:entry
  0  65925   pf_purge_expired_states:return
  0  65924    pf_purge_expired_states:entry
  0  65925   pf_purge_expired_states:return
  0  65924    pf_purge_expired_states:entry
^C

And now 
sudo pfctl -ss | awk '{print $6}' | sort | uniq -c | sort -nr | head
 162 ESTABLISHED:ESTABLISHED
  40 FIN_WAIT_2:ESTABLISHED
  10 FIN_WAIT_2:FIN_WAIT_2
   9 TIME_WAIT:TIME_WAIT
   1 ESTABLISHED:FIN_WAIT_2
Comment 61 Kristof Provost freebsd_committer freebsd_triage 2018-10-01 13:25:56 UTC
That's very strange. The symptoms are the same, but as far as I know we've never had issues with the timer on amd64 (and other users would have complained too).

Can you give the latest update a try? From the looks of things you're running an unpatched 11.2.
Comment 62 Henri Hennebert 2018-10-01 16:38:40 UTC
(In reply to hlh from comment #48)

Since my upgrade to 12.0 r338128 and switching to a10_timer (see PR 229644), the pf problem has not reappeared since 2018-08-21.
Comment 63 Martin Matuska freebsd_committer freebsd_triage 2020-10-09 18:33:45 UTC
I have encountered this very same problem on an ASUS RS500A-E10-RS12U (ASUS KRPA-U16 mainboard) using an AMD EPYC 7502 CPU, 12.2-STABLE r366526.

The workaround is to use the HPET timer instead of TSC-low.

Live system:
sysctl kern.timecounter.hardware="HPET"

/etc/sysctl.conf:
kern.timecounter.hardware="HPET"
Comment 64 Mark Kamichoff 2021-09-21 14:17:41 UTC
I actually just hit this after a 12->13 upgrade.  No PF states were being aged out.  Changing to HPET fixed this for me, too.

FreeBSD dax.prolixium.com 13.0-STABLE FreeBSD 13.0-STABLE #0 stable/13-n247299-aac5428f48e: Wed Sep 15 22:44:48 EDT 2021     prox@dax.prolixium.com:/usr/obj/usr/src/amd64.amd64/sys/DAX  amd64

(dax:6:16:EDT)% tail -n2 /etc/sysctl.conf
# fix https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=222126
kern.timecounter.hardware=HPET

Previously this was set to TSC-low.

This host is a VM running within VirtualBox, FWIW.
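
The last two reports both work around the problem by moving the kernel timecounter from TSC-low to HPET. Before and after changing kern.timecounter.hardware it can be useful to confirm which timecounter is active and which ones the kernel offers; the relevant sysctls can be read from the shell, or programmatically via sysctlbyname(3). A minimal sketch (illustrative only, with error handling kept to a perror call):

#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdio.h>

static void
print_sysctl_string(const char *name)
{
	char buf[256];
	size_t len = sizeof(buf) - 1;

	/* Read a string-valued sysctl into buf. */
	if (sysctlbyname(name, buf, &len, NULL, 0) == -1) {
		perror(name);
		return;
	}
	buf[len] = '\0';
	printf("%s: %s\n", name, buf);
}

int
main(void)
{
	/* The timecounter currently in use, and the available choices. */
	print_sysctl_string("kern.timecounter.hardware");
	print_sysctl_string("kern.timecounter.choice");
	return (0);
}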