Bug 229241

Summary: pfctl -f /etc/pf.conf blocks loopback interface
Product: Base System
Reporter: delmo
Component: bin
Assignee: Kristof Provost <kp>
Status: Closed FIXED
Severity: Affects Only Me
CC: arved, duerrd561, egypcio, gustik, henno, kp, marek
Priority: ---
Version: 11.2-STABLE
Hardware: amd64
OS: Any
See Also: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=230588
Attachments:
Description                                                  Flags
pf.conf                                                      none
ifconfig.txt                                                 none
pftest.sh: Test script for testing pfctl set skip failure    none
pftest.sh output for 12.0-RELEASE-p1                         none
pftest.sh output for 13.0-CURRENT                            none

Description delmo 2018-06-22 22:05:25 UTC
Hi all,

there seems to be a problem with pfctl when using the -f switch.

I'm using jails on the loopback interfaces, and the problem seems to affect only lo0 and/or lo1, where my jails live.

If I use pfctl -f /etc/pf.conf, traffic on the loopback interface is blocked. If I enter the command again, the interface works correctly. It happens exactly every second time.

I have set skip on lo in the ruleset; additionally putting pass on lo1 into pf.conf seems to be a workaround.

In the blocked state, the jails on lo1 cannot be pinged from the host system, and inside the jails it's not possible to ping localhost. After entering pfctl -f /etc/pf.conf again, everything works perfectly. o.0

I'm not sure if other rules are affected. At the moment I also suspect the -k switch of sometimes locking up the lo interfaces. I have 2 servers and 1 workstation with the same problem. My IPFW hosts are working normally.

Best regards Dirk
Comment 1 Kristof Provost freebsd_committer freebsd_triage 2018-06-22 22:14:12 UTC
Can you add your pf.conf and network configuration to the bug report?
Comment 2 delmo 2018-06-22 22:19:34 UTC
Created attachment 194513
pf.conf
Comment 3 delmo 2018-06-22 22:21:16 UTC
Created attachment 194514
ifconfig.txt
Comment 4 delmo 2018-06-22 22:23:00 UTC
My config is a little bit wild, but I hope it helps anyway.
Comment 5 Kristof Provost freebsd_committer freebsd_triage 2018-06-22 22:29:53 UTC
Okay, a couple of things that might be interesting:
 - Does it still happen if you use set skip on lo0 / set skip on lo1 rather than set skip on lo?
 - When is lo1 created? Before or after the first load of pf.conf?
 - Does it happen again if you flush all rules (including the set skip, of course) and re-apply?
 - Did this happen with 11.1?
Comment 6 delmo 2018-06-22 22:57:24 UTC
I just created a ktrace, but it's 11 MB. I don't know if it's too big for an attachment. It shows some errors about directories not found in /usr/local/etc??
Also, an IPv6 socket could not be opened several times. Can I post the file?

I'm pretty sure that this came with 11.2, or at least is not very old.

lo1 is created via cloned_interfaces="" in /etc/rc.conf. pf is set up later in rc.conf.

As far as I understand, lo1 only works if lo0 is also allowed. I'm testing it now...
Comment 7 delmo 2018-06-22 22:58:55 UTC
I removed all v6 rules, but the error still happens...
Comment 8 delmo 2018-06-22 23:04:05 UTC
OK, I added
set skip on lo0 and
set skip on lo1

no problems anymore!!

very cool, thx
Comment 9 delmo 2018-06-22 23:08:22 UTC
However, if you need further information, please let me know.
Comment 10 Daniel Duerr 2018-07-25 21:59:47 UTC
Hi all,

We are noticing very similar behavior on 11.2-RELEASE after recently upgrading from 11.1-RELEASE-p11. Our pf.conf rule set is the same as it was on 11.1. Like the original poster here, we had been using "set skip on { lo }" (i.e. the interface group). Changing to "set skip on { lo0 }" doesn't really seem to change the behavior. Also, we only have one loopback interface, lo0, and no additional ones. We also are not using jails.

On boot, everything works as expected.  After some time, pf starts blocking traffic on lo0.  From there, reloading the rules has mixed effects -- sometimes it restores lo0 and sometimes it does not.  The only consistent way we seem to be able to control the behavior once it starts is using `pfctl -d` and `pfctl -e`.  In other words, if the problem is happening, disabling pf will restore traffic on lo0 immediately.  If we then re-enable pf, it will block traffic again on lo0 immediately.

Daniel
Comment 11 delmo 2018-07-26 01:53:09 UTC
Hi Daniel,

I don't know why it's working on my system. The only difference is that my system is STABLE, not RELEASE, so it is a little bit newer. Before I could solve it, I used a workaround by adding a normal rule:
pass on lo0

Maybe that helps you out for now o.0

Best regards Dirk
Comment 12 Daniel Duerr 2018-07-26 14:51:42 UTC
Hi all,

Apologies, I jumped the gun with my comment yesterday. Changing the rule from 'set skip on lo' to 'set skip on lo0' *did* fix the issue. I just needed to do a `pfctl -F all` to flush some state data after reloading it. I can confirm that it works now using the explicit interface instead of the interface group.

Thanks!
Daniel
Comment 13 Kristof Provost freebsd_committer freebsd_triage 2018-08-13 13:03:56 UTC
The first partial fix went in with r337643. I forgot to mark it as such, but it'll get MFC'd next week.
Comment 14 commit-hook freebsd_committer freebsd_triage 2018-08-22 08:15:10 UTC
A commit references this bug:

Author: kp
Date: Wed Aug 22 08:14:29 UTC 2018
New revision: 338183
URL: https://svnweb.freebsd.org/changeset/base/338183

Log:
  pfctl: Improve set skip handling for groups

  Rely on the kernel to appropriately mark group members as skipped.
  Once a group is skipped we can clear the update flag on all the members.

  PR:		229241
  Submitted by:	Andreas Longwitz <longwitz AT incore.de>
  MFC after:	1 week

Changes:
  head/sbin/pfctl/pfctl.c
  head/sbin/pfctl/pfctl_parser.h
Comment 15 commit-hook freebsd_committer freebsd_triage 2018-08-29 20:50:56 UTC
A commit references this bug:

Author: kp
Date: Wed Aug 29 20:49:57 UTC 2018
New revision: 338390
URL: https://svnweb.freebsd.org/changeset/base/338390

Log:
  MFC r338183, r338183:

  pfctl: Improve set skip handling for groups

  Rely on the kernel to appropriately mark group members as skipped.
  Once a group is skipped we can clear the update flag on all the members.

  PR:		229241
  Submitted by:	Andreas Longwitz <longwitz AT incore.de>

Changes:
_U  stable/11/
  stable/11/sbin/pfctl/pfctl.c
  stable/11/sbin/pfctl/pfctl_parser.h
Comment 16 Lars Schotte 2018-11-02 08:31:21 UTC
The problem persists and cannot simply be tested with ping6 ::1.
For me it still affects all versions >= FreeBSD 11 when testing with host google.com ::1 (we are running unbound on localhost and resolving domains over the loopback interface).
Now with
set skip on lo
it does not work (request gets blocked)
while with
set skip on lo0
it works like expected.
Comment 17 Kristof Provost freebsd_committer freebsd_triage 2018-11-02 08:32:51 UTC
(In reply to Lars Schotte from comment #16)
Please include the exact version you are testing, as well as the complete pf.conf. 

At first glance this makes no sense, as there's no difference in 'set skip' handling between ICMPv6 and UDP.
Comment 18 Lars Schotte 2018-11-02 09:08:20 UTC
(In reply to Kristof Provost from comment #17)
See, apparently there is. No idea why. We have to reproduce it. I have reproduced it on 4 different installations, from 11.2 and 11-stable to 12-BETA1 and 12-BETA2.
So there has to be SOMETHING!!! LOL!
Comment 19 Henno Schooljan 2019-01-05 11:35:26 UTC
This is still an issue. I can reproduce it with a minimal /etc/pf.conf on a fresh stable/11 starting with r338181; just do this:

- Install clean FreeBSD installation, at least stable/11 r338181.
- No jails involved, do everything on the host.
- Install /etc/pf.conf with only this content:

set skip on lo
block all

- Enable pf in /etc/rc.conf.
- Start pf: service pf start

ping localhost now works.

- Reload rules: pfctl -f /etc/pf.conf

ping localhost now fails, block rule is matched (this can be verified by using pflog + block log all if you want)

- Reload rules: pfctl -f /etc/pf.conf

ping localhost now works.

And so on.

I have also tested this on 12.0-RELEASE-p1 and even the 13-CURRENT 20190103 r342707 snapshot, same issue. FreeBSD 11.1 and earlier work fine.

If I revert my stable/11 tree by just 1 commit to r338180 and rebuild world, everything works fine, so r338181 seems to have introduced this issue (r333084 in HEAD). Link to commit: https://svnweb.freebsd.org/base?view=revision&revision=333181

Also interesting: after reproducing this issue, changing the skip rule to 'set skip on lo0' causes the next pfctl -f call to not reload the rules. After calling pfctl a second time, everything works fine and keeps working on subsequent calls. On FreeBSD 12 specifically, this causes pfctl to segfault the first time... o_O

I'm happy to provide more info if needed. I got a fresh 13-CURRENT virtual machine ready where I can reproduce this, so perhaps I can provide you with more debugging info if you need it.
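The alternating pattern in the reproduction steps above can be mimicked with a small, hypothetical C model. It assumes (this is not taken from the pfctl sources) that each `pfctl -f` snapshots the kernel's current skip flags, applies the `set skip` lines, and then clears skip on every interface left over from the snapshot, and that the kernel applies a flag set on the group `lo` to its member `lo0`. The names `kernel_skip`, `kernel_set_skip` and `reload` are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model: the interface group "lo" and its only member "lo0". */
enum { IF_LO_GROUP, IF_LO0, NIFS };

/* Skip flags as the kernel currently applies them (false at start). */
bool kernel_skip[NIFS];

/* Kernel side: a skip flag set on, or cleared from, the group is also
 * applied to every member of the group (here just lo0). */
static void kernel_set_skip(int ifidx, bool on)
{
	assert(ifidx >= 0 && ifidx < NIFS);
	kernel_skip[ifidx] = on;
	if (ifidx == IF_LO_GROUP)
		kernel_skip[IF_LO0] = on;
}

/* Simplified "pfctl -f" for a ruleset whose only option is "set skip on lo". */
void reload(void)
{
	bool stale[NIFS];

	/* 1. Snapshot the kernel: everything skipped right now will be
	 *    cleared at the end unless the new ruleset asks for it again. */
	memcpy(stale, kernel_skip, sizeof(stale));

	/* 2. Apply "set skip on lo". The kernel also skips lo0, but only
	 *    the group's own cache entry is updated. */
	kernel_set_skip(IF_LO_GROUP, true);
	stale[IF_LO_GROUP] = false;

	/* 3. Clean up: clear skip on every interface still marked stale.
	 *    If lo0 was skipped when the snapshot was taken, it is still
	 *    marked here, so this undoes what the kernel just did in step 2. */
	for (int i = 0; i < NIFS; i++)
		if (stale[i])
			kernel_set_skip(i, false);
}
```

Starting from an empty kernel state, successive `reload()` calls leave `lo0` skipped, not skipped, skipped, and so on, which matches the works/fails/works sequence reported above. In this model the group entry itself stays skipped the whole time; only the member loses its flag.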
Comment 20 Kristof Provost freebsd_committer freebsd_triage 2019-01-06 04:27:51 UTC
(In reply to Henno Schooljan from comment #19)
Can you confirm your findings on a recent 13? I've just tried, and I'm unable to reproduce this on 13.
Comment 21 Henno Schooljan 2019-01-06 10:52:38 UTC
Created attachment 200832
pftest.sh: Test script for testing pfctl set skip failure
Comment 22 Henno Schooljan 2019-01-06 10:53:43 UTC
Created attachment 200833
pftest.sh output for 12.0-RELEASE-p1
Comment 23 Henno Schooljan 2019-01-06 10:54:15 UTC
Created attachment 200834
pftest.sh output for 13.0-CURRENT
Comment 24 Henno Schooljan 2019-01-06 10:55:35 UTC
(In reply to Kristof Provost from comment #20)
Really strange, I can reproduce this reliably on fresh installations running inside VirtualBox 5.2.22 with these versions:
12.0-RELEASE-p1 amd64
13.0-CURRENT-20190103-r342707 amd64

For good measure I have created a pftest.sh script which reproduces the issue here, and also tests for the segfault I have been experiencing on FreeBSD 12 specifically.
Comment 25 Kristof Provost freebsd_committer freebsd_triage 2019-01-08 06:03:59 UTC
Thanks for the script, this is very helpful. Strange that I didn't see it in my own test.
Comment 26 Kristof Provost freebsd_committer freebsd_triage 2019-01-11 09:35:17 UTC
I *think* I know what's happening here. There's a mismatch between what pfctl thinks happened and what the kernel has actually applied, which causes pfctl to set things incorrectly.

Can you see if this fixes the problem for you?

diff --git a/sbin/pfctl/pfctl.c b/sbin/pfctl/pfctl.c
index 63298d7449c..4e00bf2462a 100644
--- a/sbin/pfctl/pfctl.c
+++ b/sbin/pfctl/pfctl.c
@@ -1977,6 +1977,7 @@ int
 pfctl_set_interface_flags(struct pfctl *pf, char *ifname, int flags, int how)
 {
        struct pfioc_iface      pi;
+       struct node_host        *h = NULL, *n = NULL;

        if ((loadopt & PFCTL_FLAG_OPTION) == 0)
                return (0);
@@ -1985,6 +1986,12 @@ pfctl_set_interface_flags(struct pfctl *pf, char *ifname, int flags, int how)

        pi.pfiio_flags = flags;

+       /* Make sure our cache matches the kernel. If we set or clear the flag
+        * for a group this applies to all members. */
+       h = ifa_grouplookup(ifname, 0);
+       for (n = h; n != NULL; n = n->next)
+               pfctl_set_interface_flags(pf, n->ifname, flags, how);
+
        if (strlcpy(pi.pfiio_name, ifname, sizeof(pi.pfiio_name)) >=
            sizeof(pi.pfiio_name))
                errx(1, "pfctl_set_interface_flags: strlcpy");

As for the crash on 12.0, could you test that on stable/12? I'm pretty sure I've already merged the relevant fixes, but they probably didn't make it into 12.0.
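The loop added to `pfctl_set_interface_flags()` in the patch above walks the members returned by `ifa_grouplookup()` and recurses into each one, so pfctl's cache also records the flag on every member the kernel has already updated. The sketch below mirrors only that cache bookkeeping, not the ioctl. The `struct iface` table, its `group` field, `find_iface()` and `mark_skip_applied()` are hypothetical stand-ins, not pfctl's actual data structures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for pfctl's cached view of the interfaces. */
struct iface {
	const char *name;
	const char *group;	/* group this interface belongs to, or NULL */
	bool skip_cached;	/* cache records that the kernel skips it */
};

struct iface ifaces[] = {
	{ "lo",  NULL, false },	/* the interface group itself */
	{ "lo0", "lo", false },
	{ "lo1", "lo", false },
	{ "em0", NULL, false },
};

#define NIFACES (sizeof(ifaces) / sizeof(ifaces[0]))

struct iface *find_iface(const char *name)
{
	for (size_t i = 0; i < NIFACES; i++)
		if (strcmp(ifaces[i].name, name) == 0)
			return &ifaces[i];
	return NULL;
}

/*
 * Record that the kernel now skips `name`. Like the patch, first recurse
 * into the members of `name` (if it is a group), because the kernel has
 * applied the group's flag to each of them. A plain interface has no
 * members, so the recursion stops there.
 */
void mark_skip_applied(const char *name)
{
	struct iface *self;

	for (size_t i = 0; i < NIFACES; i++)
		if (ifaces[i].group != NULL && strcmp(ifaces[i].group, name) == 0)
			mark_skip_applied(ifaces[i].name);

	if ((self = find_iface(name)) != NULL)
		self->skip_cached = true;
}
```

In this model, marking `lo` also marks `lo0` and `lo1`, while marking `lo0` touches only `lo0`. Once every member is recorded, a later clean-up pass has no member left that looks stale, so it no longer overrules the kernel.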
Comment 27 Henno Schooljan 2019-01-12 01:58:56 UTC
Yeah thanks a lot, that fixes the issue here.

Tested with SVN revision 342952 on both stable/12 and head branches. I do not get a crash on stable/12, and with the patch applied everything gets reloaded correctly every time on both branches.
Comment 28 commit-hook freebsd_committer freebsd_triage 2019-01-13 05:31:18 UTC
A commit references this bug:

Author: kp
Date: Sun Jan 13 05:30:26 UTC 2019
New revision: 342989
URL: https://svnweb.freebsd.org/changeset/base/342989

Log:
  pfctl: Fix 'set skip' handling for groups

  When we skip on a group the kernel will automatically skip on the member
  interfaces. We still need to update our own cache though, or we risk
  overruling the kernel afterwards.

  This manifested as 'set skip' working initially, then not working when
  the rules were reloaded.

  PR:		229241
  MFC after:	1 week

Changes:
  head/sbin/pfctl/pfctl.c
Comment 29 commit-hook freebsd_committer freebsd_triage 2019-01-13 05:32:25 UTC
A commit references this bug:

Author: kp
Date: Sun Jan 13 05:31:54 UTC 2019
New revision: 342990
URL: https://svnweb.freebsd.org/changeset/base/342990

Log:
  pf tests: Test PR 229241

  pfctl has an issue with 'set skip on <group>', which causes inconsistent
  behaviour: the set skip directive works initially, but does not take
  effect when the same rules are re-applied.

  PR:		229241
  MFC after:	1 week

Changes:
  head/tests/sys/netpfil/pf/set_skip.sh
  head/tests/sys/netpfil/pf/utils.subr
Comment 30 commit-hook freebsd_committer freebsd_triage 2019-01-20 22:02:22 UTC
A commit references this bug:

Author: kp
Date: Sun Jan 20 22:01:39 UTC 2019
New revision: 343228
URL: https://svnweb.freebsd.org/changeset/base/343228

Log:
  MFC r342989

  pfctl: Fix 'set skip' handling for groups

  When we skip on a group the kernel will automatically skip on the member
  interfaces. We still need to update our own cache though, or we risk
  overruling the kernel afterwards.

  This manifested as 'set skip' working initially, then not working when
  the rules were reloaded.

  PR:		229241

Changes:
_U  stable/12/
  stable/12/sbin/pfctl/pfctl.c
Comment 31 commit-hook freebsd_committer freebsd_triage 2019-01-20 22:02:28 UTC
A commit references this bug:

Author: kp
Date: Sun Jan 20 22:01:41 UTC 2019
New revision: 343229
URL: https://svnweb.freebsd.org/changeset/base/343229

Log:
  MFC r342989

  pfctl: Fix 'set skip' handling for groups

  When we skip on a group the kernel will automatically skip on the member
  interfaces. We still need to update our own cache though, or we risk
  overruling the kernel afterwards.

  This manifested as 'set skip' working initially, then not working when
  the rules were reloaded.

  PR:		229241

Changes:
_U  stable/11/
  stable/11/sbin/pfctl/pfctl.c
Comment 32 commit-hook freebsd_committer freebsd_triage 2019-01-20 22:04:32 UTC
A commit references this bug:

Author: kp
Date: Sun Jan 20 22:03:44 UTC 2019
New revision: 343230
URL: https://svnweb.freebsd.org/changeset/base/343230

Log:
  MFC r342990

  pf tests: Test PR 229241

  pfctl has an issue with 'set skip on <group>', which causes inconsistent
  behaviour: the set skip directive works initially, but does not take
  effect when the same rules are re-applied.

  PR:		229241

Changes:
_U  stable/12/
  stable/12/tests/sys/netpfil/pf/set_skip.sh
  stable/12/tests/sys/netpfil/pf/utils.subr