Happens on 12.2-RELEASE when doing a `service pf reload` when pf.conf loads a table from a file with about 2700 entries. Never had this problem before I added the table. [1232428] Fatal trap 12: page fault while in kernel mode [1232428] cpuid = 2; apic id = 02 [1232428] fault virtual address = 0x2010 [1232428] fault code = supervisor write data, page not present [1232428] instruction pointer = 0x20:0xffffffff83595228 [1232428] stack pointer = 0x0:0xfffffe0000504540 [1232428] frame pointer = 0x0:0xfffffe00005045a0 [1232428] code segment = base rx0, limit 0xfffff, type 0x1b [1232428] = DPL 0, pres 1, long 1, def32 0, gran 1 [1232428] processor eflags = interrupt enabled, resume, IOPL = 0 [1232428] current process = 0 (if_io_tqg_2) [1232428] trap number = 12 [1232428] panic: page fault [1232428] cpuid = 2 [1232428] time = 1606389244 [1232428] KDB: stack backtrace: [1232428] db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0000504200 [1232428] vpanic() at vpanic+0x17b/frame 0xfffffe0000504250 [1232428] panic() at panic+0x43/frame 0xfffffe00005042b0 [1232428] trap_fatal() at trap_fatal+0x391/frame 0xfffffe0000504310 [1232428] trap_pfault() at trap_pfault+0x4f/frame 0xfffffe0000504360 [1232428] trap() at trap+0x286/frame 0xfffffe0000504470 [1232428] calltrap() at calltrap+0x8/frame 0xfffffe0000504470 [1232428] --- trap 0xc, rip = 0xffffffff83595228, rsp = 0xfffffe0000504540, rbp = 0xfffffe00005045a0 --- [1232428] pfr_update_stats() at pfr_update_stats+0x1a8/frame 0xfffffe00005045a0 [1232428] pf_test() at pf_test+0xebe/frame 0xfffffe0000504740 [1232428] pf_check_in() at pf_check_in+0x1d/frame 0xfffffe0000504760 [1232428] pfil_run_hooks() at pfil_run_hooks+0x87/frame 0xfffffe00005047f0 [1232428] ip_input() at ip_input+0x40e/frame 0xfffffe00005048a0 [1232428] netisr_dispatch_src() at netisr_dispatch_src+0xca/frame 0xfffffe00005048f0 [1232428] ether_demux() at ether_demux+0x138/frame 0xfffffe0000504920 [1232428] ether_nh_input() at ether_nh_input+0x33b/frame 0xfffffe0000504980 [1232428] netisr_dispatch_src() at netisr_dispatch_src+0xca/frame 0xfffffe00005049d0 [1232428] ether_input() at ether_input+0x4b/frame 0xfffffe0000504a00 [1232428] iflib_rxeof() at iflib_rxeof+0xae6/frame 0xfffffe0000504ae0 [1232428] _task_fn_rx() at _task_fn_rx+0x72/frame 0xfffffe0000504b20 [1232428] gtaskqueue_run_locked() at gtaskqueue_run_locked+0x121/frame 0xfffffe0000504b80 [1232428] gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0xb6/frame 0xfffffe0000504bb0 [1232428] fork_exit() at fork_exit+0x7e/frame 0xfffffe0000504bf0 [1232428] fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0000504bf0 [1232428] --- trap 0, rip = 0, rsp = 0, rbp = 0 --- [10502] Fatal trap 12: page fault while in kernel mode [10502] cpuid = 12; apic id = 0c [10502] fault virtual address = 0xc030 [10502] fault code = supervisor write data, page not present [10502] instruction pointer = 0x20:0xffffffff83595228 [10502] stack pointer = 0x28:0xfffffe0000536330 [10502] frame pointer = 0x28:0xfffffe0000536390 [10502] code segment = base rx0, limit 0xfffff, type 0x1b [10502] = DPL 0, pres 1, long 1, def32 0, gran 1 [10502] processor eflags = interrupt enabled, resume, IOPL = 0 [10502] current process = 0 (if_io_tqg_12) [10502] trap number = 12 [10502] panic: page fault [10502] cpuid = 12 [10502] time = 1606400775 [10502] KDB: stack backtrace: [10502] db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0000535ff0 [10502] vpanic() at vpanic+0x17b/frame 0xfffffe0000536040 [10502] panic() at panic+0x43/frame 0xfffffe00005360a0 [10502] trap_fatal() at trap_fatal+0x391/frame 0xfffffe0000536100 [10502] trap_pfault() at trap_pfault+0x4f/frame 0xfffffe0000536150 [10502] trap() at trap+0x286/frame 0xfffffe0000536260 [10502] calltrap() at calltrap+0x8/frame 0xfffffe0000536260 [10502] --- trap 0xc, rip = 0xffffffff83595228, rsp = 0xfffffe0000536330, rbp = 0xfffffe0000536390 --- [10502] pfr_update_stats() at pfr_update_stats+0x1a8/frame 0xfffffe0000536390 [10502] pf_test() at pf_test+0xebe/frame 0xfffffe0000536530 [10502] pf_check_out() at pf_check_out+0x1d/frame 0xfffffe0000536550 [10502] pfil_run_hooks() at pfil_run_hooks+0x87/frame 0xfffffe00005365e0 [10502] ip_output() at ip_output+0xaf8/frame 0xfffffe0000536730 [10502] ip_forward() at ip_forward+0x32e/frame 0xfffffe00005367f0 [10502] ip_input() at ip_input+0x7c5/frame 0xfffffe00005368a0 [10502] netisr_dispatch_src() at netisr_dispatch_src+0xca/frame 0xfffffe00005368f0 [10502] ether_demux() at ether_demux+0x138/frame 0xfffffe0000536920 [10502] ether_nh_input() at ether_nh_input+0x33b/frame 0xfffffe0000536980 [10502] netisr_dispatch_src() at netisr_dispatch_src+0xca/frame 0xfffffe00005369d0 [10502] ether_input() at ether_input+0x4b/frame 0xfffffe0000536a00 [10502] iflib_rxeof() at iflib_rxeof+0xae6/frame 0xfffffe0000536ae0 [10502] _task_fn_rx() at _task_fn_rx+0x72/frame 0xfffffe0000536b20 [10502] gtaskqueue_run_locked() at gtaskqueue_run_locked+0x121/frame 0xfffffe0000536b80 [10502] gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0xb6/frame 0xfffffe0000536bb0 [10502] fork_exit() at fork_exit+0x7e/frame 0xfffffe0000536bf0 [10502] fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0000536bf0 [10502] --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
Please include your pf.conf and the table file.
Created attachment 220012 [details] pf.conf
Created attachment 220013 [details] table
(In reply to Kristof Provost from comment #1) I must admit, even though the ruleset isn't too big, there might be A LITTLE BIT of weirdness going on there with FIBs and firewalled VMs on separate bridges and routed taps. But it was working fine before the big table. Also I already had tables in it, but with just a few entries and they very rarely changed. They weren't being reloaded automatically with the rest of the ruleset unlike that new <firehol-level1> one I tried to add.
Do you have any pf_flags or anything else set in /etc/rc.conf?
(In reply to Kristof Provost from comment #5) No pf_flags. Only pf settings are: pf_enable="YES" pflog_enable="YES" pflog_flags="-s 0"
OOOPS. Sorry, I made a mistake. I kept editing my pf.conf while trying to figure this out and got confused as to what edits I had on exactly when the crashes happen. It seems to be related to the "counters" keyword of tables. I had it on at some point and it can cause crashes later on when you reload. table <firehol-level1> persist counters file "/etc/pf/firehol-level1-blocklist.tab" If I've been reading this stack trace and the disassembly correctly, the crash must be happening right there: https://github.com/freebsd/freebsd/blob/releng/12.2/sys/netpfil/pf/pf_table.c#L2112-L2113 2c228: 65 48 83 01 01 addq $1, %gs:(%rcx)
This one is my fault I think, but I haven't been able to trigger it on head by reloading a table while toggling the "counters" keyword.
(In reply to Mark Johnston from comment #8) Looks like I found an easy way to reproduce on -CURRENT. It doesn't even need a big table loaded from a file. Looks like it's related to toggling the counters support like you said. I guess I somehow managed to keep messing with this flag while trying out rulesets. Load this ruleset and enable pf: pass all table <tab> { self } pass in log to <tab> Then switch to this ruleset: pass all table <tab> counters { self } pass in log to <tab> Then send in some traffic that should create a new state. Panics every time for me. If the table already existed it might need to be destroyed first. I think the problem is in pfr_commit_ktable(). When it transfers the addresses from the "shadow" inactive table to the active table, in some cases the existing entries will be reused and those won't have their counters if the active table wasn't created with counters support to begin with. Maybe it could end up leaking counters too if you try to disabling them in the new table? I think the table can end up with a mix of entries with and without counters.
(In reply to sigsys from comment #9) Thanks for the analysis, I believe you are right. This patch fixes the bug for me: https://reviews.freebsd.org/D27440
(In reply to Mark Johnston from comment #10) Works great here. Switching between two rulesets with and without counters for the table in a loop while killing the table in a loop while running iperf works. I didn't think of exchanging the counters like that. It's a nice solution, it avoids having to allocate new ones in a function that must not fail and it gets rid of the existing counters when they're not needed anymore.
A commit references this bug: Author: markj Date: Wed Dec 2 16:01:43 UTC 2020 New revision: 368276 URL: https://svnweb.freebsd.org/changeset/base/368276 Log: pf: Fix table entry counter toggling When updating a table, pf will keep existing table entry structures corresponding to addresses that are in both of the old and new tables. However, the update may also enable or disable per-entry counters which are allocated separately. Thus when toggling PFR_TFLAG_COUNTERS, the entries may be missing counters or may have unused counters allocated. Fix the problem by modifying pfr_ina_commit() to transfer counters from or to entries in the shadow table. PR: 251414 Reported by: sigsys@gmail.com Reviewed by: kp MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D27440 Changes: head/sys/netpfil/pf/pf_table.c
A commit references this bug: Author: kp Date: Wed Dec 2 16:33:23 UTC 2020 New revision: 368277 URL: https://svnweb.freebsd.org/changeset/base/368277 Log: pf tests: Test case for bug #251414 Changing a table from not having counters to having counters (or vice versa) may trigger panics. PR: 251414 MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D27441 Changes: head/tests/sys/netpfil/pf/table.sh
A commit references this bug: Author: markj Date: Wed Dec 9 00:56:15 UTC 2020 New revision: 368470 URL: https://svnweb.freebsd.org/changeset/base/368470 Log: MFC r368276: pf: Fix table entry counter toggling PR: 251414 Changes: _U stable/12/ stable/12/sys/netpfil/pf/pf_table.c
Thanks for the report and analysis.
A commit references this bug: Author: kp Date: Wed Dec 9 17:17:45 UTC 2020 New revision: 368488 URL: https://svnweb.freebsd.org/changeset/base/368488 Log: MFC r368277: pf tests: Test case for bug #251414 Changing a table from not having counters to having counters (or vice versa) may trigger panics. PR: 251414 Changes: _U stable/12/ stable/12/tests/sys/netpfil/pf/table.sh