Created attachment 168388 [details]
Core.txt

This may be the same as bug #203976, but I use "scrub in all fragment reassemble".

Output from kgdb:

(kgdb) whatis pd
type = struct pf_pdesc
(kgdb) p pd
$3 = {lookup = {done = 0, uid = 0, gid = 0}, tot_len = 70, hdr = {
    tcp = 0xfffffe00003e8638, udp = 0xfffffe00003e8638,
    icmp = 0xfffffe00003e8638, icmp6 = 0xfffffe00003e8638,
    any = 0xfffffe00003e8638}, nat_rule = 0x0, src = 0xfffff8024ac3c01c,
  dst = 0xfffff8024ac3c020, sport = 0x0, dport = 0x0, pf_mtag = 0x0,
  p_len = 0, ip_sum = 0xfffff8024ac3c01a, proto_sum = 0x0, flags = 2,
  af = 2 '\002', proto = 17 '\021', tos = 0 '\0', dir = 1 '\001',
  sidx = 0 '\0', didx = 1 '\001'}
(kgdb) p pd->hdr
$4 = {tcp = 0xfffffe00003e8638, udp = 0xfffffe00003e8638,
  icmp = 0xfffffe00003e8638, icmp6 = 0xfffffe00003e8638,
  any = 0xfffffe00003e8638}
(kgdb) p pd->hdr->udp
$5 = (struct udphdr *) 0xfffffe00003e8638
(kgdb) p *(pd->hdr->udp)
$6 = {uh_sport = 20480, uh_dport = 13568, uh_ulen = 12800, uh_sum = 0}

Decoded values (kgdb prints the raw network-byte-order fields):

dst = 371862716 (188.44.42.22)
src = 1832175963 (91.201.52.109)
uh_sport = 20480 = 80
uh_dport = 13568 = 53

pf NAT rule for this IP:

binat on ng0 inet from 10.3.128.3 to any -> 188.44.42.22

pf rules:

scrub in all fragment reassemble
pass in on vlan2 route-to (ng0 192.168.1.1) inet from <local> to ! <local> no state
pass out on ng0 fastroute all flags S/SA keep state
block drop out log on ng0 from <private> to any
block drop in on ng0 all
pass in on ng0 from any to <local> flags S/SA keep state

ng0-ng1:

+ show ng0:
  Name: ng0             Type: iface           ID: 00000002   Num hooks: 1
  Local hook      Peer name       Peer type    Peer ID         Peer hook
  ----------      ---------       ---------    -------         ---------
  inet            ng1             iface        00000004        inet
+ show ng1:
  Name: ng1             Type: iface           ID: 00000004   Num hooks: 1
  Local hook      Peer name       Peer type    Peer ID         Peer hook
  ----------      ---------       ---------    -------         ---------
  inet            ng0             iface        00000002        inet

ng0 and ng1 form a 'pipe' that is used to do NAT for two providers.

netstat -rn | grep 188.44.42.22:

188.44.42.22       ng1                UHS         ng1

Local <-> ng0 <-> NAT <-> ng1 <-> prov1/prov2
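The decoded values in the comment above follow from byte order: kgdb on amd64 prints the header fields as little-endian integers, while the packet stores them in network byte order. A minimal sketch of that decoding is shown here. The helper names swap16 and fmt_ipv4_le are hypothetical and are not part of pf or kgdb.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Swap the two bytes of a 16-bit value. On a little-endian host this is
 * what ntohs() does to a port number read straight out of a UDP header.
 */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}

/*
 * Format an IPv4 address that was read from memory as a little-endian
 * 32-bit integer: the lowest byte of the integer is the first octet.
 */
static void fmt_ipv4_le(uint32_t raw, char *buf, size_t len)
{
    snprintf(buf, len, "%u.%u.%u.%u",
        (unsigned)(raw & 0xff),
        (unsigned)((raw >> 8) & 0xff),
        (unsigned)((raw >> 16) & 0xff),
        (unsigned)((raw >> 24) & 0xff));
}
```

Calling swap16(20480) and swap16(13568) gives the two port numbers, and fmt_ipv4_le() applied to the raw dst and src integers gives the dotted-quad addresses quoted in the comment.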
Statistics:

-rw-------  1 root  wheel  280511 Jan 20 20:03 core.txt.0
-rw-------  1 root  wheel  265955 Jan 26 13:58 core.txt.1
-rw-------  1 root  wheel  258959 Jan 26 21:03 core.txt.2
-rw-------  1 root  wheel  301676 Feb  4 19:21 core.txt.3
-rw-------  1 root  wheel  315501 Feb  8 16:02 core.txt.4
-rw-------  1 root  wheel  283602 Feb  8 20:13 core.txt.5
-rw-------  1 root  wheel  320508 Feb  9 07:51 core.txt.6
-rw-------  1 root  wheel  297694 Mar 19 17:01 core.txt.7
-rw-------  1 root  wheel  287306 Mar 20 00:25 core.txt.8
-rw-------  1 root  wheel  282799 Mar 20 19:07 core.txt.9
On the last dump:

...
up 8
4454            if (PF_ANEQ(pd->src, &nk->addr[pd->sidx], pd->af) ||
Current language:  auto; currently minimal
(kgdb) p pd
Cannot access memory at address 0x0
(kgdb) up 1
#9  0xffffffff8063d47c in pf_test (dir=<value optimized out>,
    ifp=<value optimized out>, m0=<value optimized out>,
    inp=<value optimized out>) at /usr/src/sys/netpfil/pf/pf.c:5889
5889            action = pf_test_state_udp(&s, dir, kif, m, off, h, &pd);
(kgdb) p pd
$11 = {lookup = {done = 0, uid = 0, gid = 0}, tot_len = 74, hdr = {
    tcp = 0xfffffe00003e8638, udp = 0xfffffe00003e8638,
    icmp = 0xfffffe00003e8638, icmp6 = 0xfffffe00003e8638,
    any = 0xfffffe00003e8638}, nat_rule = 0x0, src = 0xfffff801efc6401c,
  dst = 0xfffff801efc64020, sport = 0x0, dport = 0x0, pf_mtag = 0x0,
  p_len = 0, ip_sum = 0xfffff801efc6401a, proto_sum = 0x0, flags = 0,
  af = 2 '\002', proto = 17 '\021', tos = 0 '\0', dir = 1 '\001',
  sidx = 0 '\0', didx = 1 '\001'}
Changed to "10.3-PRERELEASE FreeBSD 10.3-PRERELEASE #8 r297297": it crashed too.
Could you show the contents of (*state)->key[PF_SK_WIRE (0)] and (*state)->key[PF_SK_STACK (1)] at the time of the panic?

I'm more interested in the state of the pf_state, because the pf_pdesc is allocated on the stack in the calling function. It's very unlikely to be a bad pointer here.

My current hypothesis is that you're unlucky enough to have one core in pf_test_state_udp() trying to use state->key[] while another core is in pf_state_key_attach(). The locking there is rather complicated, so before I dig into that it'd be nice to confirm that one of the PF_SK_WIRE or PF_SK_STACK keys is NULL. (I'd expect PF_SK_STACK to be NULL, in fact.)
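The check requested above can be modelled in user space. The following is only a sketch under stated assumptions: struct model_state and struct model_state_key are hypothetical stand-ins for the kernel structures, and missing_key() is an illustrative helper, not a pf function. Only the two index values, PF_SK_WIRE (0) and PF_SK_STACK (1), are taken from the comment.

```c
#include <stddef.h>

/* Index values as given in the comment above. */
#define PF_SK_WIRE  0
#define PF_SK_STACK 1

/* Hypothetical stand-ins; the real kernel structures have many more fields. */
struct model_state_key {
    int placeholder;
};

struct model_state {
    struct model_state_key *key[2];
};

/*
 * Return the name of the first NULL pointer on the path to the keys,
 * or NULL when the state and both keys are present.
 */
static const char *missing_key(const struct model_state *s)
{
    if (s == NULL)
        return "state";
    if (s->key[PF_SK_WIRE] == NULL)
        return "PF_SK_WIRE";
    if (s->key[PF_SK_STACK] == NULL)
        return "PF_SK_STACK";
    return NULL;
}
```

In a kernel build the same conditions would more likely be expressed as assertions at the point where the keys are first dereferenced, so that the panic message names the missing key.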
(In reply to Kristof Provost from comment #4)

In pf_test_state_udp, kgdb shows this pointer as NULL:

kgdb /boot/kernel/kernel /var/crash/vmcore.last

#8  0xffffffff806591d0 in pf_test_state_udp () at /usr/src/sys/netpfil/pf/pf.c:4454
4454            if (PF_ANEQ(pd->src, &nk->addr[pd->sidx], pd->af) ||
(kgdb) whatis state
type = struct pf_state **
(kgdb) p state
Cannot access memory at address 0x0
(In reply to Kristof Provost from comment #4)

Maybe add a temporary global variable to save the "state" pointer? I can change the kernel to 10.3-RELENG/RELEASE (it is 10.3-PRERELEASE now) and wait for a panic.
(In reply to Roman from comment #6) Yeah, because I don't see how state could possibly be NULL here. We'd have panicked a good bit earlier in that case. Not to mention that pf_test_state_udp() is always called with state pointing to a stack variable, so it can't ever be NULL. If you can wait a bit, I'll try to write you a patch with a couple of extra KASSERT()s as well, so we'll get as much information as possible out of your tests.
(In reply to Kristof Provost from comment #7)

Yes, I will wait for a patch for 10.3.
Created attachment 169746 [details]
Extra assertions for pf_test_udp_state

Can you run the machine with this patch? It won't fix anything, but it should give us more information if the problem happens again.
(In reply to Kristof Provost from comment #9)

I installed the new kernel and will wait until night to reboot.
(In reply to Kristof Provost from comment #9)

#4  0xffffffff805bc59d in pf_test_state_udp () at /usr/src/sys/netpfil/pf/pf.c:4461
4461            panic("key PF_SK_STACK is NULL");

p *state
Cannot access memory at address 0x0

From core.txt:
===
panic: key PF_SK_STACK is NULL
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff80444e10 at kdb_backtrace+0x60
#1 0xffffffff8040b306 at vpanic+0x126
#2 0xffffffff8040b1d3 at panic+0x43
#3 0xffffffff805bc59d at pf_test_state_udp+0x3ad
#4 0xffffffff805b6c33 at pf_test+0x19d3
#5 0xffffffff805c5ced at pf_check_in+0x1d
#6 0xffffffff804d94d4 at pfil_run_hooks+0x84
#7 0xffffffff804f543d at ip_input+0x31d
#8 0xffffffff804d8672 at netisr_dispatch_src+0x62
#9 0xffffffff804d13a6 at ether_demux+0x126
#10 0xffffffff804d204e at ether_nh_input+0x35e
#11 0xffffffff804d8672 at netisr_dispatch_src+0x62
#12 0xffffffff804d1311 at ether_demux+0x91
#13 0xffffffff804d204e at ether_nh_input+0x35e
#14 0xffffffff804d8672 at netisr_dispatch_src+0x62
#15 0xffffffff80fd452b at nfe_int_task+0x5eb
#16 0xffffffff80455c45 at taskqueue_run_locked+0xe5
#17 0xffffffff804566d8 at taskqueue_thread_loop+0xa8
New crash:

panic: page fault
---
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
panic: page fault
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff80444e10 at kdb_backtrace+0x60
#1 0xffffffff8040b306 at vpanic+0x126
#2 0xffffffff8040b1d3 at panic+0x43
#3 0xffffffff8066ddab at trap_fatal+0x36b
#4 0xffffffff8066e0ad at trap_pfault+0x2ed
#5 0xffffffff8066d72a at trap+0x47a
#6 0xffffffff80653892 at calltrap+0x8
#7 0xffffffff805b5fc6 at pf_test+0xd66
#8 0xffffffff805c5ced at pf_check_in+0x1d
#9 0xffffffff804d94d4 at pfil_run_hooks+0x84
#10 0xffffffff804f543d at ip_input+0x31d
#11 0xffffffff804d8672 at netisr_dispatch_src+0x62
#12 0xffffffff804d13a6 at ether_demux+0x126
#13 0xffffffff804d204e at ether_nh_input+0x35e
#14 0xffffffff804d8672 at netisr_dispatch_src+0x62
#15 0xffffffff804d1311 at ether_demux+0x91
#16 0xffffffff804d204e at ether_nh_input+0x35e
#17 0xffffffff804d8672 at netisr_dispatch_src+0x62
---
bt:
#0  doadump (textdump=<value optimized out>) at pcpu.h:219
#1  0xffffffff8040af62 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:486
#2  0xffffffff8040b345 in vpanic (fmt=<value optimized out>,
    ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:889
#3  0xffffffff8040b1d3 in panic (fmt=0x0)
    at /usr/src/sys/kern/kern_shutdown.c:818
#4  0xffffffff8066ddab in trap_fatal (frame=<value optimized out>,
    eva=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:858
#5  0xffffffff8066e0ad in trap_pfault (frame=0xfffffe00003cf480,
    usermode=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:681
#6  0xffffffff8066d72a in trap (frame=0xfffffe00003cf480)
    at /usr/src/sys/amd64/amd64/trap.c:447
#7  0xffffffff80653892 in calltrap ()
    at /usr/src/sys/amd64/amd64/exception.S:236
#8  0xffffffff805dbd06 in pfr_update_stats (kt=<value optimized out>,
    a=0x10, af=<value optimized out>, len=74, dir_out=0, op_pass=1,
    notrule=0) at /usr/src/sys/netpfil/pf/pf_table.c:1962
#9  0xffffffff805b5fc6 in pf_test (dir=1, ifp=<value optimized out>,
    m0=0xfffffe00003cf798, inp=<value optimized out>)
    at /usr/src/sys/netpfil/pf/pf.c:6105
#10 0xffffffff805c5ced in pf_check_in (arg=<value optimized out>,
    m=0xfffffe00003cf798, ifp=0x10, dir=<value optimized out>, inp=0x0)
    at /usr/src/sys/netpfil/pf/pf_ioctl.c:3551
#11 0xffffffff804d94d4 in pfil_run_hooks (ph=0xffffffff80b1e158,
    mp=0xfffffe00003cf820, ifp=0xfffff80006c16000, dir=1, inp=0x0)
    at /usr/src/sys/net/pfil.c:82
---
#8  0xffffffff805dbd06 in pfr_update_stats (kt=<value optimized out>,
    a=0x10, af=<value optimized out>, len=74, dir_out=0, op_pass=1,
    notrule=0) at /usr/src/sys/netpfil/pf/pf_table.c:1962
1962            sin.sin_family = AF_INET;
(kgdb) p sin
$1 = {sin_len = 16 '\020', sin_family = 2 '\002', sin_port = 0, sin_addr = {
    s_addr = 0}, sin_zero = "\000\000\000\000\000\000\000"}

#9  0xffffffff805b5fc6 in pf_test (dir=1, ifp=<value optimized out>,
    m0=0xfffffe00003cf798, inp=<value optimized out>)
    at /usr/src/sys/netpfil/pf/pf.c:6105
(kgdb) l
6100                                &s->key[(s->direction == PF_IN)]->
6101                                    addr[(s->direction == PF_OUT)],
6102                                pd.af, pd.tot_len, dir == PF_OUT,
6103                                r->action == PF_PASS, tr->src.neg);
6104                    if (tr->dst.addr.type == PF_ADDR_TABLE)
6105                            pfr_update_stats(tr->dst.addr.p.tbl,
6106                                (s == NULL) ? pd.dst :
6107                                &s->key[(s->direction == PF_IN)]->
6108                                    addr[(s->direction == PF_IN)],
6109                                pd.af, pd.tot_len, dir == PF_OUT,

p tr->dst.addr.p.tbl
Cannot access memory at address 0x68
(kgdb) p tr
$4 = <value optimized out>
(kgdb) p tr->dst
Cannot access memory at address 0x39
(kgdb) p tr->dst.addr
Cannot access memory at address 0x39
(kgdb) p tr->dst.addr.p
Cannot access memory at address 0x59
(kgdb) p tr->dst.addr.p.tbl
Cannot access memory at address 0x59
...
p *tr - worked
p tr->dst.addr.p.tbl - worked after p *tr
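The index expressions in the pf.c listing above can be worked through in a small user-space model. This is a sketch under loudly stated assumptions, not a confirmed diagnosis: the direction values (1 for in, 2 for out), the 16-byte address size, and addr[] being the first member of the key structure are all assumed here, and every name beginning with model_ or MODEL_ is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed direction values; the thread itself only shows dir = 1 on input. */
enum { MODEL_PF_IN = 1, MODEL_PF_OUT = 2 };

/* Hypothetical layout: 16-byte addresses, addr[] assumed to be first. */
struct model_addr {
    uint8_t bytes[16];
};

struct model_state_key {
    struct model_addr addr[2];
};

/* Index into s->key[] used by both calls in the listing. */
static int key_index(int direction)
{
    return direction == MODEL_PF_IN;
}

/* Index into addr[] for the source-table call (listing lines 6100-6101). */
static int src_addr_index(int direction)
{
    return direction == MODEL_PF_OUT;
}

/* Index into addr[] for the destination-table call (lines 6107-6108). */
static int dst_addr_index(int direction)
{
    return direction == MODEL_PF_IN;
}

/* Address that &key->addr[i] would evaluate to if key were NULL. */
static size_t null_key_addr_offset(int i)
{
    return offsetof(struct model_state_key, addr) +
        (size_t)i * sizeof(struct model_addr);
}
```

Under these assumptions an inbound state selects key[1] and, for the destination table, addr[1]. With a NULL key that address would be 0x10, which is the value of the a argument shown in frame #8. That would be consistent with the earlier "key PF_SK_STACK is NULL" panic, but it remains an inference from the assumed layout.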
Changed to options SCHED_4BSD:

Unread portion of the kernel message buffer:
panic: key PF_SK_STACK is NULL
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff80442b40 at kdb_backtrace+0x60
#1 0xffffffff8040b2a6 at vpanic+0x126
#2 0xffffffff8040b173 at panic+0x43
#3 0xffffffff805ba2cd at pf_test_state_udp+0x3ad
#4 0xffffffff805b4963 at pf_test+0x19d3
#5 0xffffffff805c3a1d at pf_check_in+0x1d
#6 0xffffffff804d7204 at pfil_run_hooks+0x84
#7 0xffffffff804f316d at ip_input+0x31d
#8 0xffffffff804d63a2 at netisr_dispatch_src+0x62
#9 0xffffffff804cf0d6 at ether_demux+0x126
#10 0xffffffff804cfd7e at ether_nh_input+0x35e
#11 0xffffffff804d63a2 at netisr_dispatch_src+0x62
#12 0xffffffff804cf041 at ether_demux+0x91
#13 0xffffffff804cfd7e at ether_nh_input+0x35e
#14 0xffffffff804d63a2 at netisr_dispatch_src+0x62
#15 0xffffffff80fae52b at nfe_int_task+0x5eb
#16 0xffffffff80453975 at taskqueue_run_locked+0xe5
#17 0xffffffff80454408 at taskqueue_thread_loop+0xa8
FreeBSD 10.2 is no longer supported. If this problem is still present in 12.0 or 11.2 please re-open this bug.