Summary: | pfctl cowardly refuses to load rules, broken between 8c94ed992702 & f29af8618bf9 | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Product: | Base System | Reporter: | Dave Cottlehuber <dch> | ||||||||
Component: | bin | Assignee: | freebsd-net (Nobody) <net> | ||||||||
Status: | Closed Unable to Reproduce | ||||||||||
Severity: | Affects Only Me | CC: | emaste, freebsd, kp | ||||||||
Priority: | --- | ||||||||||
Version: | 15.0-CURRENT | ||||||||||
Hardware: | Any | ||||||||||
OS: | Any | ||||||||||
Attachments: |
|
Created attachment 249388 [details]
codel config
This is interesting. I can load the abridged pf.conf, but it seems to hallucinate ridentifier settings which are not in the file. That may point towards some memory corruption during the parsing, but valgrind doesn't immediately seem to detect anything. I see the above behaviour on both amd64 and arm64. A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=88f557a2a9c31aaa954841b06d38eec84a1702fc commit 88f557a2a9c31aaa954841b06d38eec84a1702fc Author: Kristof Provost <kp@FreeBSD.org> AuthorDate: 2024-03-22 03:21:50 +0000 Commit: Kristof Provost <kp@FreeBSD.org> CommitDate: 2024-03-22 08:00:05 +0000 libpfctl: fix incorrect labels copy We copied the entire parsed_labels struct, including the counter to a field that was only big enough for the labels (so not the counter). PR: 277875 MFC after: 1 week lib/libpfctl/libpfctl.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Created attachment 249438 [details]
truss log
Thanks, rebuilt with that patch included.
I reduced the failing ruleset to this minimal example:
```
# pfctl -s Running
Enabled
# pfctl -F all
Ethernet rules cleared
rules cleared
nat cleared
0 tables deleted.
0 states cleared
source tracking entries cleared
pf: statistics cleared
pf: interface flags reset
root@# echo 'pass in quick on ng0 proto tcp to port 2200' | pfctl -vgf -
No ALTQ support in kernel
ALTQ related functions disabled
pass in quick on ng0 proto tcp from any to any port = 2200 flags S/SA keep state
# echo $status
1
# pfctl -s rules
#
```
Evidently its not a ruleset parsing issue.
I swapped ng0 for lo0 and the same situation occurs.
running under truss, final lines from attached full log:
ioctl(3,DIOCSETTIMEOUT,0x621da911a368) = 0 (0x0)
ioctl(3,DIOCSETTIMEOUT,0x621da911a368) = 0 (0x0)
ioctl(3,DIOCSETDEBUG,0x621da911a368) = 0 (0x0)
sendto(5," \0\0\0\^P\0\^E\0\^A\0\0\0\0\0\0"...,32,0,NULL,0) = 32 (0x20)
recvmsg(5,{0x621da911a26c,12,[{"\M-x\0\0\0\^P\0\^E\0\^A\0\0\0\0"...,65536}],1,{},0,0},0) = 284 (0x11c)
sendto(5,"\^\\0\0\0\^Q\0\^E\0\^B\0\0\0\0\0"...,28,0,NULL,0) = 28 (0x1c)
recvmsg(5,{0x621da911a26c,12,[{"0\0\0\0\^B\0\0\0\^B\0\0\0\0\0\0"...,65536}],1,{},0,0},0) = 48 (0x30)
ioctl(3,DIOCSETHOSTID,0x621da911a368) = 0 (0x0)
ioctl(3,DIOCSETREASS,0x621da911a368) = 0 (0x0)
ioctl(3,DIOCKEEPCOUNTERS,0x621da911a310) = 0 (0x0)
ioctl(3,DIOCGETLIMIT,0x621da911a300) = 0 (0x0)
ioctl(3,DIOCSETSYNCOOKIES,0x621da911a300) = 0 (0x0)
ioctl(3,DIOCXROLLBACK,0x621da911a398) = 0 (0x0)
extl_if = "ng0"
pass in quick on ng0 proto tcp from any to any port = 2200 flags S/SA keep state
write(1,"extl_if = "ng0"\npass in quick o"...,97) = 97 (0x61)
exit(0x1)
process exit, rval = 1
trying the same ruleset on a different arm64 box with same from-source
build, it works as expected - rules loaded, and output displayed.
I'll do a full re-install into an empty BE next.
That truss output is strange. We only DIOCXROLLBACK from pfctl_rules() (in sbin/pfctl), and then only after a 'goto _error'. That must mean we've failed to load one of the options in pfctl_load_options(). All but one of those are old-style ioctls and show no errors, so that would imply that it has to be pfctl_load_logif() (i.e. pfctl_set_statusif() in libpfctl) that fails. However, that can really only fail if the log interface name is too long, and that does not appear to be the case here. That's a newly converted-to-netlink call, so at least that's somewhat plausible at a source of shiny new bugs. It's also all we have to go on right now. Can you try running `dtrace -n 'fbt::pf_handle_set_statusif:return { printf("%#x %#x", arg0, arg1); }'` and then loading the relevant pf.conf? running `pfctl -vf /etc/pf.comf` repeatedly dtrace: description 'fbt::pf_handle_set_statusif:return ' matched 1 probe CPU ID FUNCTION:NAME 4 61685 pf_handle_set_statusif:return 0xe8 0 4 61685 pf_handle_set_statusif:return 0xe8 0 1 61685 pf_handle_set_statusif:return 0xe8 0 4 61685 pf_handle_set_statusif:return 0xe8 0 1 61685 pf_handle_set_statusif:return 0xe8 0 3 61685 pf_handle_set_statusif:return 0xe8 0 too soon, this last one is after rebasing to 319a5d086b50 (upstream/main) if_bridge: use IF_MINMTU. This looks like a mismatched userland vs kernel, after dtrace reporting invalid probe specifier. After a full rebuild with empty /usr/obj and a fresh beinstall, this can't be reproduced. Not present in latest CURRENT either. thanks kp for your help! |
Created attachment 249387 [details] abridged pf.conf after 8c94ed992702, servers behind firewall are unable to ping, dns, etc. I think this is because `pfctl -vvef /etc/pf.conf` returns 1, and whines about ALTQ even though we're not using any ALTQ function. Removing dummynet config doesn't seem to address the issue. Reverting to last boot env built off f29af8618bf9 and all is well. - abridged pf.conf attached, full one available as needed. - h/w is ten64 arm64 router running 15.0-CURRENT.