Hi all, there seems to be a problem with pfctl when using the -f switch. I'm using jails on the loopback interfaces, and the problem seems to affect only lo0 and/or lo1, where my jails live. If I run pfctl -f /etc/pf.conf, traffic on the loopback interface is blocked. If I enter the command again, the interface works correctly. It happens exactly every second time. I have 'set skip on lo' in the ruleset; also putting 'pass on lo1' into pf.conf seems to be a workaround. In the blocked state the jails on lo1 cannot be pinged from the host system, and inside the jails it is not possible to ping localhost. After entering pfctl -f /etc/pf.conf again, everything works perfectly. o.0 I'm not sure whether other rules are affected. At the moment I also suspect the -k switch of sometimes locking the lo interfaces. I have 2 servers and 1 workstation with the same problem. My IPFW hosts are working normally. Best regards Dirk
Can you add your pf.conf and network configuration to the bug report?
Created attachment 194513: pf.conf
Created attachment 194514: ifconfig.txt
My config is a little bit wild, but I hope it helps anyway.
Okay, a couple of things that might be interesting:
- Does it still happen if you use 'set skip on lo0' / 'set skip on lo1' rather than 'set skip on lo'?
- When is lo1 created? Before or after the first load of pf.conf?
- Does it happen again if you flush all rules (including the 'set skip', of course) and re-apply?
- Did this happen with 11.1?
I just created a ktrace, but it's 11 MByte. I don't know if that's too big for an attachment. It shows some errors for directories not found in /usr/local/etc (??). Also, a v6 socket could not be opened several times. Can I post the file? I'm pretty sure that this came with 11.2, or at least is not very old. lo1 is created via cloned_interfaces="" in /etc/rc.conf. pf is enabled later in rc.conf. As far as I understand, lo1 only works if lo0 is also allowed. I'm testing it now...
I removed all v6 rules, but the error still happens...
OK, I added 'set skip on lo0' and 'set skip on lo1': no problems anymore!! Very cool, thx.
However, if you need further information, please let me know.
Hi all, We are noticing very similar behavior on 11.2-RELEASE after recently upgrading from 11.1-RELEASE-p11. Our pf.conf rule set is the same as it was on 11.1. Like the original poster here, we had been using "set skip on { lo }" (i.e. the interface group). Changing to "set skip on { lo0 }" doesn't really seem to change the behavior. Also, we only have the one loopback interface, lo0, and no additional ones. We are also not using jails. On boot, everything works as expected. After some time, pf starts blocking traffic on lo0. From there, reloading the rules has mixed effects: sometimes it restores lo0 and sometimes it does not. The only consistent way we have found to control the behavior once it starts is using `pfctl -d` and `pfctl -e`. In other words, if the problem is happening, disabling pf restores traffic on lo0 immediately. If we then re-enable pf, it immediately blocks traffic on lo0 again. Daniel
Hi Daniel, I don't know why it's working on my system. The only difference is that my system is STABLE, not RELEASE, so it is a little bit newer. Before I could solve it, I used a workaround by adding a normal rule: pass on lo0. Maybe that helps you out for now o.0 Best regards Dirk
Hi all, Apologies, I jumped the gun on my comment yesterday. Changing the rule from 'set skip on lo' to 'set skip on lo0' *did* fix the issue. I just needed to do a `pfctl -F all` to flush some state data after reloading it. I can confirm that it works now using the explicit interface vs. the interface group. Thanks! Daniel
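The workaround reported in this thread (naming each loopback interface explicitly instead of using the `lo` group) could be sketched with a small helper. This is only an illustration: `skip_lines` is a hypothetical name, and the function prints pf.conf lines without touching pf itself.

```shell
#!/bin/sh
# Hypothetical helper: print one explicit "set skip on <if>" line per
# interface name given as an argument, for pasting into pf.conf.
skip_lines() {
    for ifname in "$@"; do
        printf 'set skip on %s\n' "$ifname"
    done
}

# Example for a host with lo0 and a cloned lo1, as in the original report.
skip_lines lo0 lo1
```

The interface names have to be supplied by hand here. On FreeBSD the members of a group can probably be listed with `ifconfig -g lo`, but that is an assumption to check against the local ifconfig(8). As noted above, a `pfctl -F all` after reloading was needed in one case before the change took effect.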
The first partial fix went in as r337643. I forgot to mark it as such, but it'll get MFC'd next week.
A commit references this bug:
Author: kp
Date: Wed Aug 22 08:14:29 UTC 2018
New revision: 338183
URL: https://svnweb.freebsd.org/changeset/base/338183
Log:
  pfctl: Improve set skip handling for groups
  Rely on the kernel to appropriately mark group members as skipped. Once a group is skipped we can clear the update flag on all the members.
  PR: 229241
  Submitted by: Andreas Longwitz <longwitz AT incore.de>
  MFC after: 1 week
Changes:
  head/sbin/pfctl/pfctl.c
  head/sbin/pfctl/pfctl_parser.h
A commit references this bug:
Author: kp
Date: Wed Aug 29 20:49:57 UTC 2018
New revision: 338390
URL: https://svnweb.freebsd.org/changeset/base/338390
Log:
  MFC r338183, r338183:
  pfctl: Improve set skip handling for groups
  Rely on the kernel to appropriately mark group members as skipped. Once a group is skipped we can clear the update flag on all the members.
  PR: 229241
  Submitted by: Andreas Longwitz <longwitz AT incore.de>
Changes:
  _U stable/11/
  stable/11/sbin/pfctl/pfctl.c
  stable/11/sbin/pfctl/pfctl_parser.h
The problem persists and cannot be tested simply with ping6 ::1. For me it still affects all versions >= FreeBSD 11 when testing with 'host google.com ::1' (we are running unbound on localhost and query it over the loopback interface to resolve a domain). With 'set skip on lo' it does not work (the request gets blocked), while with 'set skip on lo0' it works as expected.
(In reply to Lars Schotte from comment #16) Please include the exact version you are testing, as well as the complete pf.conf. At first glance this makes no sense, as there's no difference in 'set skip' handling between ICMPv6 and UDP.
(In reply to Kristof Provost from comment #17) See. Apparently there is. No idea. We have to reproduce it. I have it reproduced on 4 different installations, from 11.2, over 11-stable to 12-BETA1 and 12-BETA2. So there has to be SOMETHING!!! LOL!
This is still an issue. I can reproduce it with a minimal /etc/pf.conf on a fresh stable/11 starting with r333181. Just do this:
- Install a clean FreeBSD system, at least stable/11 r333181.
- No jails involved; do everything on the host.
- Install /etc/pf.conf with only this content:
    set skip on lo
    block all
- Enable pf in /etc/rc.conf.
- Start pf: service pf start
  ping localhost now works.
- Reload rules: pfctl -f /etc/pf.conf
  ping localhost now fails; the block rule is matched (this can be verified with pflog + 'block log all' if you want).
- Reload rules: pfctl -f /etc/pf.conf
  ping localhost now works.
And so on.
I have also tested this on 12.0-RELEASE-p1 and even the 13-CURRENT 20190103 r342707 snapshot: same issue. FreeBSD 11.1 and earlier work fine. If I revert my stable/11 tree by just 1 commit to r333180 and rebuild world, everything works fine, so r333181 seems to have introduced this issue (r333084 in HEAD). Link to commit: https://svnweb.freebsd.org/base?view=revision&revision=333181
Also interesting: after reproducing this issue, changing the skip rule to 'set skip on lo0' causes the next pfctl -f call to not reload the rules. After calling pfctl a second time everything works fine and keeps working on subsequent calls. On FreeBSD 12 specifically, this causes pfctl to segfault the first time... o_O
I'm happy to provide more info if needed. I have a fresh 13-CURRENT virtual machine ready where I can reproduce this, so perhaps I can provide you with more debugging info if you need it.
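The reproduction steps above could be scripted roughly as follows. This is a minimal sketch, not the pftest.sh attached later in the thread: the scratch path, the `RUN`, `CONF` and `ROUNDS` variables and the output wording are all assumptions made for illustration. It assumes pf is already enabled, and by default it only writes the test ruleset and prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the reload/ping loop from the report. Dry run by default: it
# writes the test ruleset and prints the commands. Set RUN=1 (as root, on a
# disposable test machine) to really load the ruleset and ping.
CONF="${CONF:-/tmp/pf-skip-test.conf}"   # assumed scratch path
ROUNDS="${ROUNDS:-4}"                    # assumed number of reloads
RUN="${RUN:-0}"

# Minimal ruleset from the report: skip the lo group, block everything else.
printf 'set skip on lo\nblock all\n' > "$CONF"

i=1
while [ "$i" -le "$ROUNDS" ]; do
    if [ "$RUN" = 1 ]; then
        pfctl -f "$CONF"
        # FreeBSD ping: -c 1 sends one request, -t 2 gives up after 2 seconds.
        if ping -c 1 -t 2 localhost >/dev/null 2>&1; then
            echo "reload $i: loopback ok"
        else
            echo "reload $i: loopback BLOCKED"
        fi
    else
        echo "reload $i: would run: pfctl -f $CONF && ping -c 1 -t 2 localhost"
    fi
    i=$((i + 1))
done
```

With `RUN=1` on an affected system, the report suggests the result should alternate between working and blocked on successive reloads. Loading this ruleset blocks all non-loopback traffic, so it should not be run over a remote session.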
(In reply to Henno Schooljan from comment #19) Can you confirm your findings on a recent 13? I've just tried and I'm unable to reproduce this on 13.
Created attachment 200832: pftest.sh: Test script for testing pfctl set skip failure
Created attachment 200833: pftest.sh output for 12.0-RELEASE-p1
Created attachment 200834: pftest.sh output for 13.0-CURRENT
(In reply to Kristof Provost from comment #20) Really strange, I can reproduce this reliably on fresh installations running inside VirtualBox 5.2.22 with these versions: 12.0-RELEASE-p1 amd64 and 13.0-CURRENT-20190103-r342707 amd64. For good measure I have created a pftest.sh script which reproduces the issue here and also tests for the segfault I have been experiencing on FreeBSD 12 specifically.
Thanks for the script, this is very helpful. Strange that I didn't see it in my own test.
I *think* I know what's happening here. There's a mismatch between what pfctl thinks happens and what the kernel has actually applied, which causes pfctl to set things incorrectly.
Can you see if this fixes the problem for you?
diff --git a/sbin/pfctl/pfctl.c b/sbin/pfctl/pfctl.c
index 63298d7449c..4e00bf2462a 100644
--- a/sbin/pfctl/pfctl.c
+++ b/sbin/pfctl/pfctl.c
@@ -1977,6 +1977,7 @@ int
 pfctl_set_interface_flags(struct pfctl *pf, char *ifname, int flags, int how)
 {
 	struct pfioc_iface	pi;
+	struct node_host	*h = NULL, *n = NULL;
 
 	if ((loadopt & PFCTL_FLAG_OPTION) == 0)
 		return (0);
@@ -1985,6 +1986,12 @@ pfctl_set_interface_flags(struct pfctl *pf, char *ifname, int flags, int how)
 
 	pi.pfiio_flags = flags;
 
+	/* Make sure our cache matches the kernel. If we set or clear the flag
+	 * for a group this applies to all members. */
+	h = ifa_grouplookup(ifname, 0);
+	for (n = h; n != NULL; n = n->next)
+		pfctl_set_interface_flags(pf, n->ifname, flags, how);
+
 	if (strlcpy(pi.pfiio_name, ifname, sizeof(pi.pfiio_name)) >=
 	    sizeof(pi.pfiio_name))
 		errx(1, "pfctl_set_interface_flags: strlcpy");
As for the crash on 12.0, could you test that on stable/12? I'm pretty sure I've already merged the relevant fixes, but they probably didn't make it into 12.0.
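The mismatch described above can be pictured with a toy model. It is not pfctl's real code, and the exact bookkeeping inside pfctl is not shown in this thread. In particular, the step "a group member that was skipped before the reload gets its flag cleared" is an assumption, chosen because it reproduces the reported every-second-reload symptom. The names `kernel`, `cached`, `reload` and `FIXED` are invented for this sketch.

```shell
#!/bin/sh
# Toy model only, NOT pfctl's real code. "kernel" is the set of interfaces
# the kernel currently skips; the group "lo" has the single member "lo0".
kernel=""

# has LIST NAME: succeed if NAME is one of the space-separated words in LIST.
has() { case " $1 " in *" $2 "*) return 0 ;; *) return 1 ;; esac; }

reload() {
    cached="$kernel"    # pfctl reads which interfaces are skipped right now
    # "set skip on lo": the kernel marks the group and, on its own, lo0.
    kernel="lo lo0"
    # Assumed bug: pfctl only recorded "lo" as wanted, so a member that was
    # skipped before the reload looks stale and gets its skip flag cleared.
    # FIXED=1 models a pfctl whose cache also covers the group members.
    if [ "${FIXED:-0}" != 1 ] && has "$cached" lo0; then
        kernel="lo"
    fi
    if has "$kernel" lo0; then state="skipped"; else state="filtered"; fi
}

for round in 1 2 3 4; do
    reload
    echo "reload $round: lo0 $state"
done
```

In this model the unfixed run alternates between "skipped" and "filtered", while running it with `FIXED=1` keeps lo0 skipped on every reload.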
Yeah thanks a lot, that fixes the issue here. Tested with SVN revision 342952 on both stable/12 and head branches. I do not get a crash on stable/12, and with the patch applied everything gets reloaded correctly every time on both branches.
A commit references this bug:
Author: kp
Date: Sun Jan 13 05:30:26 UTC 2019
New revision: 342989
URL: https://svnweb.freebsd.org/changeset/base/342989
Log:
  pfctl: Fix 'set skip' handling for groups
  When we skip on a group the kernel will automatically skip on the member interfaces. We still need to update our own cache though, or we risk overruling the kernel afterwards. This manifested as 'set skip' working initially, then not working when the rules were reloaded.
  PR: 229241
  MFC after: 1 week
Changes:
  head/sbin/pfctl/pfctl.c
A commit references this bug:
Author: kp
Date: Sun Jan 13 05:31:54 UTC 2019
New revision: 342990
URL: https://svnweb.freebsd.org/changeset/base/342990
Log:
  pf tests: Test PR 229241
  pfctl has an issue with 'set skip on <group>', which causes inconsistent behaviour: the set skip directive works initially, but does not take effect when the same rules are re-applied.
  PR: 229241
  MFC after: 1 week
Changes:
  head/tests/sys/netpfil/pf/set_skip.sh
  head/tests/sys/netpfil/pf/utils.subr
A commit references this bug:
Author: kp
Date: Sun Jan 20 22:01:39 UTC 2019
New revision: 343228
URL: https://svnweb.freebsd.org/changeset/base/343228
Log:
  MFC r342989
  pfctl: Fix 'set skip' handling for groups
  When we skip on a group the kernel will automatically skip on the member interfaces. We still need to update our own cache though, or we risk overruling the kernel afterwards. This manifested as 'set skip' working initially, then not working when the rules were reloaded.
  PR: 229241
Changes:
  _U stable/12/
  stable/12/sbin/pfctl/pfctl.c
A commit references this bug:
Author: kp
Date: Sun Jan 20 22:01:41 UTC 2019
New revision: 343229
URL: https://svnweb.freebsd.org/changeset/base/343229
Log:
  MFC r342989
  pfctl: Fix 'set skip' handling for groups
  When we skip on a group the kernel will automatically skip on the member interfaces. We still need to update our own cache though, or we risk overruling the kernel afterwards. This manifested as 'set skip' working initially, then not working when the rules were reloaded.
  PR: 229241
Changes:
  _U stable/11/
  stable/11/sbin/pfctl/pfctl.c
A commit references this bug:
Author: kp
Date: Sun Jan 20 22:03:44 UTC 2019
New revision: 343230
URL: https://svnweb.freebsd.org/changeset/base/343230
Log:
  MFC r342990
  pf tests: Test PR 229241
  pfctl has an issue with 'set skip on <group>', which causes inconsistent behaviour: the set skip directive works initially, but does not take effect when the same rules are re-applied.
  PR: 229241
Changes:
  _U stable/12/
  stable/12/tests/sys/netpfil/pf/set_skip.sh
  stable/12/tests/sys/netpfil/pf/utils.subr