Bug 208140 - panic: page fault in pf
Summary: panic: page fault in pf
Status: Closed Overcome By Events
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 10.2-STABLE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-pf mailing list
URL:
Keywords: patch
Depends on:
Blocks:
 
Reported: 2016-03-19 15:19 UTC by Roman
Modified: 2019-02-01 13:39 UTC
CC List: 1 user

See Also:


Attachments
Core.txt (229.70 KB, text/plain)
2016-03-19 15:19 UTC, Roman
Extra assertions for pf_test_udp_state (1.65 KB, patch)
2016-04-27 10:57 UTC, Kristof Provost

Description Roman 2016-03-19 15:19:03 UTC
Created attachment 168388
Core.txt

This may be related to bug #203976, but I use "scrub in all fragment reassemble".
Output from kgdb:

(kgdb) whatis pd
type = struct pf_pdesc
(kgdb) p pd
$3 = {lookup = {done = 0, uid = 0, gid = 0}, tot_len = 70, hdr = {
    tcp = 0xfffffe00003e8638, udp = 0xfffffe00003e8638,
    icmp = 0xfffffe00003e8638, icmp6 = 0xfffffe00003e8638,
    any = 0xfffffe00003e8638}, nat_rule = 0x0, src = 0xfffff8024ac3c01c,
  dst = 0xfffff8024ac3c020, sport = 0x0, dport = 0x0, pf_mtag = 0x0,
  p_len = 0, ip_sum = 0xfffff8024ac3c01a, proto_sum = 0x0, flags = 2,
  af = 2 '\002', proto = 17 '\021', tos = 0 '\0', dir = 1 '\001',
  sidx = 0 '\0', didx = 1 '\001'}
(kgdb) p pd->hdr
$4 = {tcp = 0xfffffe00003e8638, udp = 0xfffffe00003e8638,
  icmp = 0xfffffe00003e8638, icmp6 = 0xfffffe00003e8638,
  any = 0xfffffe00003e8638}
(kgdb) p pd->hdr->udp
$5 = (struct udphdr *) 0xfffffe00003e8638
(kgdb) p *(pd->hdr->udp)
$6 = {uh_sport = 20480, uh_dport = 13568, uh_ulen = 12800, uh_sum = 0}

dst = 371862716 (188.44.42.22)
src = 1832175963 (91.201.52.109)
uh_sport = 20480 (0x5000, network byte order) = port 80
uh_dport = 13568 (0x3500, network byte order) = port 53
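The values above are raw kgdb output: on little-endian amd64, header fields stored in network (big-endian) byte order print as byte-swapped integers. A minimal sketch of the conversion, written so it does not depend on the host's byte order (the helper names wire16_to_host() and wire32_to_dotted() are hypothetical; on amd64, ntohs() and inet_ntop() give the same results):

```c
#include <stdint.h>
#include <stdio.h>

/*
 * kgdb on little-endian amd64 prints big-endian (network order) header
 * fields as host integers. Swap the two bytes of a 16-bit field to get
 * the real port number or length.
 */
static uint16_t wire16_to_host(uint16_t raw)
{
    return (uint16_t)((raw >> 8) | ((raw & 0xffu) << 8));
}

/*
 * Format a 32-bit IPv4 address as printed by kgdb on amd64: the
 * lowest-order byte of the printed integer is the first octet on the wire.
 * buf should hold at least 16 bytes ("255.255.255.255" plus NUL).
 */
static void wire32_to_dotted(uint32_t raw, char *buf, size_t len)
{
    snprintf(buf, len, "%u.%u.%u.%u",
        (unsigned)(raw & 0xffu), (unsigned)((raw >> 8) & 0xffu),
        (unsigned)((raw >> 16) & 0xffu), (unsigned)(raw >> 24));
}
```

The same conversion turns uh_ulen = 12800 into 50 bytes, which is consistent with tot_len = 70 if the IP header is the usual 20 bytes without options.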


pf NAT rule for this IP:
binat on ng0 inet from 10.3.128.3 to any -> 188.44.42.22

pf rules:
scrub in all fragment reassemble
pass in on vlan2 route-to (ng0 192.168.1.1) inet from <local> to ! <local> no state
pass out on ng0 fastroute all flags S/SA keep state
block drop out log on ng0 from <private> to any
block drop in on ng0 all
pass in on ng0 from any to <local> flags S/SA keep state

Netgraph link between ng0 and ng1:
+ show ng0:
  Name: ng0             Type: iface           ID: 00000002   Num hooks: 1
  Local hook      Peer name       Peer type    Peer ID         Peer hook
  ----------      ---------       ---------    -------         ---------
  inet            ng1             iface        00000004        inet
+ show ng1:
  Name: ng1             Type: iface           ID: 00000004   Num hooks: 1
  Local hook      Peer name       Peer type    Peer ID         Peer hook
  ----------      ---------       ---------    -------         ---------
  inet            ng0             iface        00000002        inet

It acts as a 'pipe' used to do NAT for two providers.

netstat -rn | grep 188.44.42.22:
188.44.42.22       ng1                UHS         ng1

Local <-> ng0 <->NAT <-> ng1 <-> prov1/prov2
Comment 1 Roman 2016-03-21 09:18:37 UTC
Crash statistics (core.txt files from previous panics):

-rw-------  1 root  wheel  280511 Jan 20 20:03 core.txt.0
-rw-------  1 root  wheel  265955 Jan 26 13:58 core.txt.1
-rw-------  1 root  wheel  258959 Jan 26 21:03 core.txt.2
-rw-------  1 root  wheel  301676 Feb  4 19:21 core.txt.3
-rw-------  1 root  wheel  315501 Feb  8 16:02 core.txt.4
-rw-------  1 root  wheel  283602 Feb  8 20:13 core.txt.5
-rw-------  1 root  wheel  320508 Feb  9 07:51 core.txt.6
-rw-------  1 root  wheel  297694 Mar 19 17:01 core.txt.7
-rw-------  1 root  wheel  287306 Mar 20 00:25 core.txt.8
-rw-------  1 root  wheel  282799 Mar 20 19:07 core.txt.9
Comment 2 Roman 2016-03-26 19:37:15 UTC
From the latest dump:

...
(kgdb) up 8
4454                    if (PF_ANEQ(pd->src, &nk->addr[pd->sidx], pd->af) ||
Current language:  auto; currently minimal
(kgdb) p pd
Cannot access memory at address 0x0

(kgdb) up 1
#9  0xffffffff8063d47c in pf_test (dir=<value optimized out>,
    ifp=<value optimized out>, m0=<value optimized out>,
    inp=<value optimized out>) at /usr/src/sys/netpfil/pf/pf.c:5889
5889                    action = pf_test_state_udp(&s, dir, kif, m, off, h, &pd);

(kgdb) p pd
$11 = {lookup = {done = 0, uid = 0, gid = 0}, tot_len = 74, hdr = {
    tcp = 0xfffffe00003e8638, udp = 0xfffffe00003e8638,
    icmp = 0xfffffe00003e8638, icmp6 = 0xfffffe00003e8638,
    any = 0xfffffe00003e8638}, nat_rule = 0x0, src = 0xfffff801efc6401c,
  dst = 0xfffff801efc64020, sport = 0x0, dport = 0x0, pf_mtag = 0x0,
  p_len = 0, ip_sum = 0xfffff801efc6401a, proto_sum = 0x0, flags = 0,
  af = 2 '\002', proto = 17 '\021', tos = 0 '\0', dir = 1 '\001',
  sidx = 0 '\0', didx = 1 '\001'}
Comment 3 Roman 2016-03-27 07:45:43 UTC
Upgraded to "10.3-PRERELEASE FreeBSD 10.3-PRERELEASE #8 r297297": it crashes too.
Comment 4 Kristof Provost freebsd_committer 2016-04-23 13:52:46 UTC
Could you show the contents of (*state)->key[PF_SK_WIRE (0)] and (*state)->key[PF_SK_STACK (1)] at the time of the panic?

I'm more interested in the state of the pf_state, because the pf_pdesc is allocated on the stack in the calling function, so it's very unlikely to be a bad pointer here.

My current hypothesis is that you're unlucky enough to have one core in pf_test_state_udp() trying to use state->key[] while another core is in pf_state_key_attach(). 

The locking there is rather complicated, so before I dig into that it'd be nice to confirm that one of the PF_SK_WIRE or PF_SK_STACK keys is NULL. (I'd expect PF_SK_STACK to be NULL, in fact.)
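The attached patch is not reproduced in this report. As a rough illustration of the kind of check being asked for, here is a hypothetical userland model (the model_* types and check_state_keys() are invented for this sketch, not pf's API) that reports which of the two state keys is missing, using the index values PF_SK_WIRE (0) and PF_SK_STACK (1) given above:

```c
#include <stddef.h>

/* Index values as given above: PF_SK_WIRE (0), PF_SK_STACK (1). */
enum { PF_SK_WIRE = 0, PF_SK_STACK = 1, MODEL_SK_COUNT = 2 };

/* Hypothetical stand-ins for struct pf_state_key and struct pf_state. */
struct model_state_key {
    int placeholder;
};

struct model_state {
    struct model_state_key *key[MODEL_SK_COUNT];
};

/*
 * Return NULL when both keys are attached, otherwise a message naming the
 * first missing key. A kernel diagnostic would panic or KASSERT instead.
 */
static const char *check_state_keys(const struct model_state *s)
{
    if (s->key[PF_SK_WIRE] == NULL)
        return "key PF_SK_WIRE is NULL";
    if (s->key[PF_SK_STACK] == NULL)
        return "key PF_SK_STACK is NULL";
    return NULL;
}
```

In this model, a state whose wire key is attached but whose stack key is still NULL (the race window suspected above) produces "key PF_SK_STACK is NULL".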
Comment 5 Roman 2016-04-26 08:44:15 UTC
(In reply to Kristof Provost from comment #4)

In pf_test_state_udp, kgdb shows this pointer as NULL:

kgdb /boot/kernel/kernel /var/crash/vmcore.last


#8  0xffffffff806591d0 in pf_test_state_udp ()
    at /usr/src/sys/netpfil/pf/pf.c:4454
4454                    if (PF_ANEQ(pd->src, &nk->addr[pd->sidx], pd->af) ||

(kgdb) whatis state
type = struct pf_state **
(kgdb) p state
Cannot access memory at address 0x0
Comment 6 Roman 2016-04-26 08:51:28 UTC
(In reply to Kristof Provost from comment #4)

Maybe add a temporary global variable to save the "state" pointer?
I can switch the kernel to 10.3-RELENG/RELEASE (currently 10.3-PRERELEASE) and wait for a panic.
Comment 7 Kristof Provost freebsd_committer 2016-04-26 08:54:22 UTC
(In reply to Roman from comment #6)
Yeah, because I don't see how state could possibly be NULL here. We'd have panicked a good bit earlier in that case. Not to mention that pf_test_state_udp() is always called with state pointing to a stack variable, so it can't ever be NULL.

If you can wait a bit, I'll try to write you a patch with a couple of extra KASSERT()s as well, so we'll get as much information as possible out of your tests.
Comment 8 Roman 2016-04-26 10:24:39 UTC
(In reply to Kristof Provost from comment #7)

Yes, I'll wait for the patch for 10.3.
Comment 9 Kristof Provost freebsd_committer 2016-04-27 10:57:47 UTC
Created attachment 169746
Extra assertions for pf_test_udp_state

Can you run the machine with this patch? It won't fix anything, but it should give us more information if the problem happens again.
Comment 10 Roman 2016-04-27 11:20:20 UTC
(In reply to Kristof Provost from comment #9)

I've installed the new kernel and am waiting until tonight to reboot.
Comment 11 Roman 2016-05-01 03:09:24 UTC
(In reply to Kristof Provost from comment #9)

#4  0xffffffff805bc59d in pf_test_state_udp ()
    at /usr/src/sys/netpfil/pf/pf.c:4461
4461                panic("key PF_SK_STACK is NULL");

(kgdb) p *state
Cannot access memory at address 0x0

from core.txt:
=== 
panic: key PF_SK_STACK is NULL
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff80444e10 at kdb_backtrace+0x60
#1 0xffffffff8040b306 at vpanic+0x126
#2 0xffffffff8040b1d3 at panic+0x43
#3 0xffffffff805bc59d at pf_test_state_udp+0x3ad
#4 0xffffffff805b6c33 at pf_test+0x19d3
#5 0xffffffff805c5ced at pf_check_in+0x1d
#6 0xffffffff804d94d4 at pfil_run_hooks+0x84
#7 0xffffffff804f543d at ip_input+0x31d
#8 0xffffffff804d8672 at netisr_dispatch_src+0x62
#9 0xffffffff804d13a6 at ether_demux+0x126
#10 0xffffffff804d204e at ether_nh_input+0x35e
#11 0xffffffff804d8672 at netisr_dispatch_src+0x62
#12 0xffffffff804d1311 at ether_demux+0x91
#13 0xffffffff804d204e at ether_nh_input+0x35e
#14 0xffffffff804d8672 at netisr_dispatch_src+0x62
#15 0xffffffff80fd452b at nfe_int_task+0x5eb
#16 0xffffffff80455c45 at taskqueue_run_locked+0xe5
#17 0xffffffff804566d8 at taskqueue_thread_loop+0xa8
Comment 12 Roman 2016-05-02 17:57:58 UTC
New crash:

panic: page fault

---
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
panic: page fault
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff80444e10 at kdb_backtrace+0x60
#1 0xffffffff8040b306 at vpanic+0x126
#2 0xffffffff8040b1d3 at panic+0x43
#3 0xffffffff8066ddab at trap_fatal+0x36b
#4 0xffffffff8066e0ad at trap_pfault+0x2ed
#5 0xffffffff8066d72a at trap+0x47a
#6 0xffffffff80653892 at calltrap+0x8
#7 0xffffffff805b5fc6 at pf_test+0xd66
#8 0xffffffff805c5ced at pf_check_in+0x1d
#9 0xffffffff804d94d4 at pfil_run_hooks+0x84
#10 0xffffffff804f543d at ip_input+0x31d
#11 0xffffffff804d8672 at netisr_dispatch_src+0x62
#12 0xffffffff804d13a6 at ether_demux+0x126
#13 0xffffffff804d204e at ether_nh_input+0x35e
#14 0xffffffff804d8672 at netisr_dispatch_src+0x62
#15 0xffffffff804d1311 at ether_demux+0x91
#16 0xffffffff804d204e at ether_nh_input+0x35e
#17 0xffffffff804d8672 at netisr_dispatch_src+0x62

---

bt:

#0  doadump (textdump=<value optimized out>) at pcpu.h:219
#1  0xffffffff8040af62 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:486
#2  0xffffffff8040b345 in vpanic (fmt=<value optimized out>,
    ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:889
#3  0xffffffff8040b1d3 in panic (fmt=0x0)
    at /usr/src/sys/kern/kern_shutdown.c:818
#4  0xffffffff8066ddab in trap_fatal (frame=<value optimized out>,
    eva=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:858
#5  0xffffffff8066e0ad in trap_pfault (frame=0xfffffe00003cf480,
    usermode=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:681
#6  0xffffffff8066d72a in trap (frame=0xfffffe00003cf480)
    at /usr/src/sys/amd64/amd64/trap.c:447
#7  0xffffffff80653892 in calltrap ()
    at /usr/src/sys/amd64/amd64/exception.S:236
#8  0xffffffff805dbd06 in pfr_update_stats (kt=<value optimized out>, a=0x10,
    af=<value optimized out>, len=74, dir_out=0, op_pass=1, notrule=0)
    at /usr/src/sys/netpfil/pf/pf_table.c:1962
#9  0xffffffff805b5fc6 in pf_test (dir=1, ifp=<value optimized out>,
    m0=0xfffffe00003cf798, inp=<value optimized out>)
    at /usr/src/sys/netpfil/pf/pf.c:6105
#10 0xffffffff805c5ced in pf_check_in (arg=<value optimized out>,
    m=0xfffffe00003cf798, ifp=0x10, dir=<value optimized out>, inp=0x0)
    at /usr/src/sys/netpfil/pf/pf_ioctl.c:3551
#11 0xffffffff804d94d4 in pfil_run_hooks (ph=0xffffffff80b1e158,
    mp=0xfffffe00003cf820, ifp=0xfffff80006c16000, dir=1, inp=0x0)
    at /usr/src/sys/net/pfil.c:82

---
#8  0xffffffff805dbd06 in pfr_update_stats (kt=<value optimized out>, a=0x10,
    af=<value optimized out>, len=74, dir_out=0, op_pass=1, notrule=0)
    at /usr/src/sys/netpfil/pf/pf_table.c:1962
1962                    sin.sin_family = AF_INET;
(kgdb) p sin
$1 = {sin_len = 16 '\020', sin_family = 2 '\002', sin_port = 0, sin_addr = {
    s_addr = 0}, sin_zero = "\000\000\000\000\000\000\000"}

#9  0xffffffff805b5fc6 in pf_test (dir=1, ifp=<value optimized out>,
    m0=0xfffffe00003cf798, inp=<value optimized out>)
    at /usr/src/sys/netpfil/pf/pf.c:6105
(kgdb) l
6100                                &s->key[(s->direction == PF_IN)]->
6101                                    addr[(s->direction == PF_OUT)],
6102                                pd.af, pd.tot_len, dir == PF_OUT,
6103                                r->action == PF_PASS, tr->src.neg);
6104                    if (tr->dst.addr.type == PF_ADDR_TABLE)
6105                            pfr_update_stats(tr->dst.addr.p.tbl,
6106                                (s == NULL) ? pd.dst :
6107                                &s->key[(s->direction == PF_IN)]->
6108                                    addr[(s->direction == PF_IN)],
6109                                pd.af, pd.tot_len, dir == PF_OUT,
(kgdb) p tr->dst.addr.p.tbl
Cannot access memory at address 0x68
(kgdb) p tr
$4 = <value optimized out>
(kgdb) p tr->dst
Cannot access memory at address 0x39
(kgdb) p tr->dst.addr
Cannot access memory at address 0x39
(kgdb) p tr->dst.addr.p
Cannot access memory at address 0x59
(kgdb) p tr->dst.addr.p.tbl
Cannot access memory at address 0x59

... 

p *tr worked.
p tr->dst.addr.p.tbl worked only after running p *tr.
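Frame #8 above shows pfr_update_stats() called with a=0x10. A minimal consistency check, assuming (as in pf's headers of that period) that struct pf_addr is 16 bytes and addr[2] is the first member of struct pf_state_key, and assuming dir=1 means PF_IN: for an inbound state whose key[PF_SK_STACK] is NULL, the listing's destination argument &s->key[1]->addr[1] would evaluate to 0 + 16 = 0x10. The model_* names and the direction values are hypothetical, and arguments in this optimized trace are not fully reliable (frame #10 also shows ifp=0x10), so this only suggests a link to the NULL-key panics rather than proving it.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed direction values; dir=1 in frame #9 is taken to be PF_IN. */
enum { MODEL_PF_IN = 1, MODEL_PF_OUT = 2 };

/* Assumed layout: a 16-byte address union, and addr[2] first in the key. */
struct model_pf_addr {
    uint8_t addr8[16];
};

struct model_pf_state_key {
    struct model_pf_addr addr[2];
    uint16_t port[2];
};

/* Index expressions copied from the pf.c listing (lines 6100-6108). */
static int key_index(int direction)      { return direction == MODEL_PF_IN; }
static int src_addr_index(int direction) { return direction == MODEL_PF_OUT; }
static int dst_addr_index(int direction) { return direction == MODEL_PF_IN; }

/*
 * Byte offset of &key->addr[idx] from the start of the key. If the key
 * pointer is NULL, this offset is the address pfr_update_stats() receives.
 * offsetof() is used so that no NULL pointer is formed or dereferenced.
 */
static size_t addr_offset(int idx)
{
    return offsetof(struct model_pf_state_key, addr) +
        (size_t)idx * sizeof(struct model_pf_addr);
}
```

Under these assumptions, an inbound state selects key[1] (PF_SK_STACK), the source table update reads addr[0] at offset 0, and the destination table update at pf.c:6105 reads addr[1] at offset 0x10.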
Comment 13 Roman 2016-05-04 18:10:31 UTC
Switched the kernel config to

options         SCHED_4BSD

and it panicked again:

Unread portion of the kernel message buffer:
panic: key PF_SK_STACK is NULL
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff80442b40 at kdb_backtrace+0x60
#1 0xffffffff8040b2a6 at vpanic+0x126
#2 0xffffffff8040b173 at panic+0x43
#3 0xffffffff805ba2cd at pf_test_state_udp+0x3ad
#4 0xffffffff805b4963 at pf_test+0x19d3
#5 0xffffffff805c3a1d at pf_check_in+0x1d
#6 0xffffffff804d7204 at pfil_run_hooks+0x84
#7 0xffffffff804f316d at ip_input+0x31d
#8 0xffffffff804d63a2 at netisr_dispatch_src+0x62
#9 0xffffffff804cf0d6 at ether_demux+0x126
#10 0xffffffff804cfd7e at ether_nh_input+0x35e
#11 0xffffffff804d63a2 at netisr_dispatch_src+0x62
#12 0xffffffff804cf041 at ether_demux+0x91
#13 0xffffffff804cfd7e at ether_nh_input+0x35e
#14 0xffffffff804d63a2 at netisr_dispatch_src+0x62
#15 0xffffffff80fae52b at nfe_int_task+0x5eb
#16 0xffffffff80453975 at taskqueue_run_locked+0xe5
#17 0xffffffff80454408 at taskqueue_thread_loop+0xa8
Comment 14 Kristof Provost freebsd_committer 2019-02-01 13:39:55 UTC
FreeBSD 10.2 is no longer supported. If this problem is still present in 12.0 or 11.2 please re-open this bug.