Bug 221849 - Kernel panic, kqueue related NULL pointer dereference sys/kern/kern_event.c
Summary: Kernel panic, kqueue related NULL pointer dereference sys/kern/kern_event.c
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 11.1-RELEASE
Hardware: amd64 Any
: --- Affects Only Me
Assignee: Andrey V. Elsukov
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-08-27 09:37 UTC by Aragon Gouveia
Modified: 2018-05-28 20:37 UTC (History)
4 users (show)

See Also:


Attachments
Proposed patch (762 bytes, patch)
2017-08-30 19:40 UTC, Andrey V. Elsukov
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Aragon Gouveia 2017-08-27 09:37:33 UTC
I've stumbled upon a consistently reproducible panic with a GENERIC kernel on 11.1-RELEASE.  For me it happens every time I try connect my Android device to the L2TP/IPSec (racoon+mpd) server running on the system in question.  I don't know what's special about the Android VPN client, so I guess this panic might occur under other conditions too (other VPN clients don't trigger this).

kern_kqueue() in sys/kern/kern_event.c:827 gets passed a struct thread pointer, and member td_ucred->cr_ruidinfo (null pointer) gets passed to chgkqcnt() in sys/kern/kern_resource.c:1430, which eventually gets dereferenced by atomic_fetchadd_long in sys/amd64/include/atomic.h.


Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x50
fault code              = supervisor write data, page not present
instruction pointer     = 0x20:0xffffffff80a65f95
stack pointer           = 0x28:0xfffffe004f1258c0
frame pointer           = 0x28:0xfffffe004f1258d0
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 816 (sshd)
trap number             = 12
panic: page fault
cpuid = 1
KDB: stack backtrace:
#0 0xffffffff80aada97 at kdb_backtrace+0x67
#1 0xffffffff80a6bb76 at vpanic+0x186
#2 0xffffffff80a6b9e3 at panic+0x43
#3 0xffffffff80edf832 at trap_fatal+0x322
#4 0xffffffff80edf889 at trap_pfault+0x49
#5 0xffffffff80edf0c6 at trap+0x286
#6 0xffffffff80ec3641 at calltrap+0x8
#7 0xffffffff80a21fbc at kern_kqueue+0x4c
#8 0xffffffff80ee0394 at amd64_syscall+0x6c4
#9 0xffffffff80ec392b at Xfast_syscall+0xfb
Uptime: 6m56s
Dumping 122 out of 991 MB:..14%..27%..40%..53%..66%..79%..92%

Reading symbols from /boot/kernel/ng_socket.ko...Reading symbols from /usr/lib/debug//boot/kernel/ng_socket.ko.debug...done.
done.
Loaded symbols for /boot/kernel/ng_socket.ko
Reading symbols from /boot/kernel/netgraph.ko...Reading symbols from /usr/lib/debug//boot/kernel/netgraph.ko.debug...done.
done.
Loaded symbols for /boot/kernel/netgraph.ko
Reading symbols from /boot/kernel/ng_mppc.ko...Reading symbols from /usr/lib/debug//boot/kernel/ng_mppc.ko.debug...done.
done.
Loaded symbols for /boot/kernel/ng_mppc.ko
Reading symbols from /boot/kernel/rc4.ko...Reading symbols from /usr/lib/debug//boot/kernel/rc4.ko.debug...done.
done.
Loaded symbols for /boot/kernel/rc4.ko
#0  doadump (textdump=<value optimized out>) at pcpu.h:222
222             __asm("movq %%gs:%1,%0" : "=r" (td)
(kgdb) list *0xffffffff80a65f95
0xffffffff80a65f95 is in chgkqcnt (atomic.h:263).
258      */
259     static __inline u_long
260     atomic_fetchadd_long(volatile u_long *p, u_long v)
261     {
262
263             __asm __volatile(
264             "       " MPLOCKED "            "
265             "       xaddq   %0,%1 ;         "
266             "# atomic_fetchadd_long"
267             : "+r" (v),                     /* 0 */
Current language:  auto; currently minimal
(kgdb) backtrace
#0  doadump (textdump=<value optimized out>) at pcpu.h:222
#1  0xffffffff80a6b6f1 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:366
#2  0xffffffff80a6bbb0 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:759
#3  0xffffffff80a6b9e3 in panic (fmt=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:690
#4  0xffffffff80edf832 in trap_fatal (frame=0xfffffe004f125800, eva=80) at /usr/src/sys/amd64/amd64/trap.c:801
#5  0xffffffff80edf889 in trap_pfault (frame=0xfffffe004f125800, usermode=0) at pcpu.h:222
#6  0xffffffff80edf0c6 in trap (frame=0xfffffe004f125800) at /usr/src/sys/amd64/amd64/trap.c:421
#7  0xffffffff80ec3641 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:236
#8  0xffffffff80a65f95 in chgkqcnt (uip=0x0, diff=1, max=9223372036854775807) at atomic.h:263
#9  0xffffffff80a21fbc in kern_kqueue (td=0xfffff80003e6a000, flags=0, fcaps=0x0) at /usr/src/sys/kern/kern_event.c:837
#10 0xffffffff80ee0394 in amd64_syscall (td=0xfffff80003e6a000, traced=0) at subr_syscall.c:135
#11 0xffffffff80ec392b in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:396
#12 0x00000008021aae9a in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb) frame 9
#9  0xffffffff80a21fbc in kern_kqueue (td=0xfffff80003e6a000, flags=0, fcaps=0x0) at /usr/src/sys/kern/kern_event.c:837
837             if (!chgkqcnt(cred->cr_ruidinfo, 1, lim_cur(td, RLIMIT_KQUEUES)))
(kgdb) list
832             struct ucred *cred;
833             int fd, error;
834
835             fdp = td->td_proc->p_fd;
836             cred = td->td_ucred;
837             if (!chgkqcnt(cred->cr_ruidinfo, 1, lim_cur(td, RLIMIT_KQUEUES)))
838                     return (ENOMEM);
839
840             error = falloc_caps(td, &fp, &fd, flags, fcaps);
841             if (error != 0) {
(kgdb) print *cred
$1 = {cr_ref = 2168608751, cr_uid = 4294967295, cr_ruid = 21168128, cr_svuid = 0, cr_ngroups = 0, cr_rgid = 0, cr_svgid = 4, 
  cr_uidinfo = 0x0, cr_ruidinfo = 0x0, cr_prison = 0xfffff8002676b4b0, cr_loginclass = 0xfffff80003945280, cr_flags = 60052120, 
  cr_pspare2 = 0xfffff80003640448, cr_label = 0x0, cr_audit = {ai_auid = 0, ai_mask = {am_success = 0, am_failure = 2158116784}, 
    ai_termid = {at_port = 4294967295, at_type = 2158116848, at_addr = 0xfffff80003640474}, ai_asid = -1, 
    ai_flags = 18446735277673415680}, cr_groups = 0x0, cr_agroups = 0, cr_smallgroups = 0xfffff8000364049c}
Comment 1 Aragon Gouveia 2017-08-27 14:39:29 UTC
After more tests, I think the kqueue related backtrace from earlier might just be a symptom of something much stranger.  Is it possible the IPSec stack is overwriting kernel memory?

The kernel panic consistently happens shortly after the Android VPN client and racoon finish establishing ISAKMP and IPSec SAs, but before MPD sees any L2TP requests.

What is inconsistent is the contents of the backtrace.  I have rebuilt a GENERIC kernel with -O0 to try make debugging easier, and below are a few kgdb sessions of separate panics that were triggered under the same condition of an Android VPN client trying to connect.


Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x1100000094
fault code              = supervisor write data, page not present
instruction pointer     = 0x20:0xffffffff814c43b6
stack pointer           = 0x28:0xfffffe00003b3af0
frame pointer           = 0x28:0xfffffe00003b3b00
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 814 (sshd)
trap number             = 12
panic: page fault
cpuid = 1
KDB: stack backtrace:
#0 0xffffffff810e9be7 at kdb_backtrace+0xa7
#1 0xffffffff8107a129 at vpanic+0x249
#2 0xffffffff81079ee0 at vpanic+0
#3 0xffffffff817cb38a at trap_fatal+0x60a
#4 0xffffffff817cb538 at trap_pfault+0x188
#5 0xffffffff817ca6e1 at trap+0x751
#6 0xffffffff817cb9ba at trap_check+0x4a
#7 0xffffffff817a07e1 at calltrap+0x8
#8 0xffffffff814bfc42 at refcount_release+0x22
#9 0xffffffff814bfbae at key_freesp+0x2e
#10 0xffffffff814b7744 at ipsec_invalidate_cache+0xc4
#11 0xffffffff814b622a at ipsec_getpcbpolicy+0x16a
#12 0xffffffff814b6005 at ipsec_hdrsiz_inpcb+0x25
#13 0xffffffff8141e57d at tcp_output+0x9dd
#14 0xffffffff81439c80 at tcp_usr_send+0x350
#15 0xffffffff8116a051 at sosend_generic+0xeb1
#16 0xffffffff8116a31d at sosend+0x5d
#17 0xffffffff8112e7c7 at soo_write+0x87
Uptime: 6m9s
Dumping 123 out of 981 MB:..13%..26%..39%..52%..65%..78%..91%

Reading symbols from /boot/kernel/ng_socket.ko...Reading symbols from /usr/lib/debug//boot/kernel/ng_socket.ko.debug...done.
done.  
Loaded symbols for /boot/kernel/ng_socket.ko
Reading symbols from /boot/kernel/netgraph.ko...Reading symbols from /usr/lib/debug//boot/kernel/netgraph.ko.debug...done.
done.  
Loaded symbols for /boot/kernel/netgraph.ko
Reading symbols from /boot/kernel/ng_mppc.ko...Reading symbols from /usr/lib/debug//boot/kernel/ng_mppc.ko.debug...done.
done.  
Loaded symbols for /boot/kernel/ng_mppc.ko
Reading symbols from /boot/kernel/rc4.ko...Reading symbols from /usr/lib/debug//boot/kernel/rc4.ko.debug...done.
done.  
Loaded symbols for /boot/kernel/rc4.ko
#0  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:298
298             dumptid = curthread->td_tid;
(kgdb) list *0xffffffff814c43b6
0xffffffff814c43b6 is in atomic_fetchadd_int (atomic.h:245).
240      */
241     static __inline u_int
242     atomic_fetchadd_int(volatile u_int *p, u_int v)
243     {
244
245             __asm __volatile(
246             "       " MPLOCKED "            "
247             "       xaddl   %0,%1 ;         "
248             "# atomic_fetchadd_int"
249             : "+r" (v),                     /* 0 */
Current language:  auto; currently minimal
(kgdb) backtrace
#0  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:298
#1  0xffffffff81079668 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:366
#2  0xffffffff8107a17f in vpanic (fmt=0xffffffff81b40308 "%s", ap=0xfffffe00003b34a0) at /usr/src/sys/kern/kern_shutdown.c:759
#3  0xffffffff81079ee0 in panic (fmt=0xffffffff81b40308 "%s") at /usr/src/sys/kern/kern_shutdown.c:690
#4  0xffffffff817cb38a in trap_fatal (frame=0xfffffe00003b3a30, eva=73014444180) at /usr/src/sys/amd64/amd64/trap.c:801
#5  0xffffffff817cb538 in trap_pfault (frame=0xfffffe00003b3a30, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:683
#6  0xffffffff817ca6e1 in trap (frame=0xfffffe00003b3a30) at /usr/src/sys/amd64/amd64/trap.c:421
#7  0xffffffff817cb9ba in trap_check (frame=0xfffffe00003b3a30) at /usr/src/sys/amd64/amd64/trap.c:602
#8  0xffffffff817a07e1 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:236
#9  0xffffffff814c43b6 in atomic_fetchadd_int (p=0x1100000094, v=4294967295) at atomic.h:250
#10 0xffffffff814bfc42 in refcount_release (count=0x1100000094) at refcount.h:62
#11 0xffffffff814bfbae in key_freesp (spp=0xfffffe00003b3b58) at /usr/src/sys/netipsec/key.c:1076
#12 0xffffffff814b7744 in ipsec_invalidate_cache (inp=0xfffff80003f2fae0, dir=2) at /usr/src/sys/netipsec/ipsec.c:317
#13 0xffffffff814b622a in ipsec_getpcbpolicy (inp=0xfffff80003f2fae0, dir=2) at /usr/src/sys/netipsec/ipsec.c:463
#14 0xffffffff814b6005 in ipsec_hdrsiz_inpcb (inp=0xfffff80003f2fae0) at /usr/src/sys/netipsec/ipsec.c:1151
#15 0xffffffff8141e57d in tcp_output (tp=0xfffff80003e58820) at /usr/src/sys/netinet/tcp_output.c:560
#16 0xffffffff81439c80 in tcp_usr_send (so=0xfffff80013035000, flags=0, m=0xfffff80013538400, nam=0x0, control=0x0,
    td=0xfffff80003cbd000) at /usr/src/sys/netinet/tcp_usrreq.c:967
#17 0xffffffff8116a051 in sosend_generic (so=0xfffff80013035000, addr=0x0, uio=0xfffffe00003b47a8, top=0xfffff80013538400,
    control=0x0, flags=0, td=0xfffff80003cbd000) at /usr/src/sys/kern/uipc_socket.c:1360
#18 0xffffffff8116a31d in sosend (so=0xfffff80013035000, addr=0x0, uio=0xfffffe00003b47a8, top=0x0, control=0x0, flags=0,
    td=0xfffff80003cbd000) at /usr/src/sys/kern/uipc_socket.c:1405
#19 0xffffffff8112e7c7 in soo_write (fp=0xfffff80003869b90, uio=0xfffffe00003b47a8, active_cred=0xfffff80003d5a600, flags=0,
    td=0xfffff80003cbd000) at /usr/src/sys/kern/sys_socket.c:146
#20 0xffffffff81121e1a in fo_write (fp=0xfffff80003869b90, uio=0xfffffe00003b47a8, active_cred=0xfffff80003d5a600, flags=0,
    td=0xfffff80003cbd000) at file.h:307
#21 0xffffffff8111dc36 in dofilewrite (td=0xfffff80003cbd000, fd=3, fp=0xfffff80003869b90, auio=0xfffffe00003b47a8, offset=-1,
    flags=0) at /usr/src/sys/kern/sys_generic.c:592
#22 0xffffffff8111d786 in kern_writev (td=0xfffff80003cbd000, fd=3, auio=0xfffffe00003b47a8) at /usr/src/sys/kern/sys_generic.c:506
#23 0xffffffff8111d65f in sys_write (td=0xfffff80003cbd000, uap=0xfffffe00003b4a58) at /usr/src/sys/kern/sys_generic.c:420
#24 0xffffffff817cc7b1 in syscallenter (td=0xfffff80003cbd000, sa=0xfffffe00003b4a48) at subr_syscall.c:135
#25 0xffffffff817cbd0a in amd64_syscall (td=0xfffff80003cbd000, traced=0) at /usr/src/sys/amd64/amd64/trap.c:902
#26 0xffffffff817a0acb in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:396
#27 0x00000008021c34aa in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb) frame 11
#11 0xffffffff814bfbae in key_freesp (spp=0xfffffe00003b3b58) at /usr/src/sys/netipsec/key.c:1076
1076            if (SP_DELREF(sp) == 0)
(kgdb) list
1071    key_freesp(struct secpolicy **spp)
1072    {
1073            struct secpolicy *sp = *spp;
1074   
1075            IPSEC_ASSERT(sp != NULL, ("null sp"));
1076            if (SP_DELREF(sp) == 0)
1077                    return;
1078   
1079            KEYDBG(IPSEC_STAMP,
1080                printf("%s: last reference to SP(%p)\n", __func__, sp));
(kgdb) print sp
$1 = (struct secpolicy *) 0x1100000000
(kgdb) print *sp
Cannot access memory at address 0x1100000000


Below panic seemed to occur just as a tried to perform a "racoonctl show-sa ipsec", while the VPN client was busy trying to connect.

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x4c
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff814c43fe
stack pointer           = 0x28:0xfffffe0000336b70
frame pointer           = 0x28:0xfffffe0000336bd0
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 688 (racoon)
trap number             = 12
panic: page fault
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff810e9be7 at kdb_backtrace+0xa7
#1 0xffffffff8107a129 at vpanic+0x249
#2 0xffffffff81079ee0 at vpanic+0
#3 0xffffffff817cb38a at trap_fatal+0x60a
#4 0xffffffff817cb538 at trap_pfault+0x188
#5 0xffffffff817ca6e1 at trap+0x751
#6 0xffffffff817cb9ba at trap_check+0x4a
#7 0xffffffff817a07e1 at calltrap+0x8
#8 0xffffffff814d297a at key_setdumpsa+0x40a
#9 0xffffffff814cb182 at key_dump+0x412
#10 0xffffffff814c31e4 at key_parse+0xce4
#11 0xffffffff814d86ac at key_output+0x1ac
#12 0xffffffff8125af8c at raw_usend+0x8c
#13 0xffffffff814d9bb1 at key_send+0x51
#14 0xffffffff8116a051 at sosend_generic+0xeb1
#15 0xffffffff8116a31d at sosend+0x5d
#16 0xffffffff811769bc at kern_sendit+0x42c
#17 0xffffffff81176e86 at sendit+0x146
Uptime: 5m22s
Dumping 124 out of 981 MB:..13%..26%..39%..52%..65%..78%..91%

Reading symbols from /boot/kernel/ng_socket.ko...Reading symbols from /usr/lib/debug//boot/kernel/ng_socket.ko.debug...done.
done.
Loaded symbols for /boot/kernel/ng_socket.ko
Reading symbols from /boot/kernel/netgraph.ko...Reading symbols from /usr/lib/debug//boot/kernel/netgraph.ko.debug...done.
done.
Loaded symbols for /boot/kernel/netgraph.ko
Reading symbols from /boot/kernel/ng_mppc.ko...Reading symbols from /usr/lib/debug//boot/kernel/ng_mppc.ko.debug...done.
done.
Loaded symbols for /boot/kernel/ng_mppc.ko
Reading symbols from /boot/kernel/rc4.ko...Reading symbols from /usr/lib/debug//boot/kernel/rc4.ko.debug...done.
done.
Loaded symbols for /boot/kernel/rc4.ko
#0  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:298
298             dumptid = curthread->td_tid;
(kgdb) list *0xffffffff814c43fe
0xffffffff814c43fe is in key_setsadbaddr (/usr/src/sys/netipsec/key.c:3693).
3688            struct mbuf *m;
3689            struct sadb_address *p;
3690            size_t len;
3691
3692            len = PFKEY_ALIGN8(sizeof(struct sadb_address)) +
3693                PFKEY_ALIGN8(saddr->sa_len);
3694            m = m_get2(len, M_NOWAIT, MT_DATA, 0);
3695            if (m == NULL)
3696                    return (NULL);
3697            m_align(m, len);
Current language:  auto; currently minimal
(kgdb) backtrace
#0  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:298
#1  0xffffffff81079668 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:366
#2  0xffffffff8107a17f in vpanic (fmt=0xffffffff81b40308 "%s", ap=0xfffffe0000336520) at /usr/src/sys/kern/kern_shutdown.c:759
#3  0xffffffff81079ee0 in panic (fmt=0xffffffff81b40308 "%s") at /usr/src/sys/kern/kern_shutdown.c:690
#4  0xffffffff817cb38a in trap_fatal (frame=0xfffffe0000336ab0, eva=76) at /usr/src/sys/amd64/amd64/trap.c:801
#5  0xffffffff817cb538 in trap_pfault (frame=0xfffffe0000336ab0, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:683
#6  0xffffffff817ca6e1 in trap (frame=0xfffffe0000336ab0) at /usr/src/sys/amd64/amd64/trap.c:421
#7  0xffffffff817cb9ba in trap_check (frame=0xfffffe0000336ab0) at /usr/src/sys/amd64/amd64/trap.c:602
#8  0xffffffff817a07e1 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:236
#9  0xffffffff814c43fe in key_setsadbaddr (exttype=6, saddr=0x4c, prefixlen=255 '?', ul_proto=255) at /usr/src/sys/netipsec/key.c:3693
#10 0xffffffff814d297a in key_setdumpsa (sav=0xfffff8000381b700, type=10 '\n', satype=3 '\003', seq=0, pid=688)
    at /usr/src/sys/netipsec/key.c:3469
#11 0xffffffff814cb182 in key_dump (so=0xfffff800039a8360, m=0xfffff8003d4fab00, mhp=0xfffffe0000336fd8)
    at /usr/src/sys/netipsec/key.c:7509
#12 0xffffffff814c31e4 in key_parse (m=0xfffff8003d4fab00, so=0xfffff800039a8360) at /usr/src/sys/netipsec/key.c:7861
#13 0xffffffff814d86ac in key_output (m=0xfffff8003d4fab00, so=0xfffff800039a8360) at /usr/src/sys/netipsec/keysock.c:128
#14 0xffffffff8125af8c in raw_usend (so=0xfffff800039a8360, flags=0, m=0xfffff8003d4fab00, nam=0x0, control=0x0,
    td=0xfffff800039f5560) at /usr/src/sys/net/raw_usrreq.c:238
#15 0xffffffff814d9bb1 in key_send (so=0xfffff800039a8360, flags=0, m=0xfffff8003d4fab00, nam=0x0, control=0x0, td=0xfffff800039f5560)
    at /usr/src/sys/netipsec/keysock.c:492
#16 0xffffffff8116a051 in sosend_generic (so=0xfffff800039a8360, addr=0x0, uio=0xfffffe00003376a0, top=0xfffff8003d4fab00,
    control=0x0, flags=0, td=0xfffff800039f5560) at /usr/src/sys/kern/uipc_socket.c:1360
#17 0xffffffff8116a31d in sosend (so=0xfffff800039a8360, addr=0x0, uio=0xfffffe00003376a0, top=0x0, control=0x0, flags=0,
    td=0xfffff800039f5560) at /usr/src/sys/kern/uipc_socket.c:1405
#18 0xffffffff811769bc in kern_sendit (td=0xfffff800039f5560, s=12, mp=0xfffffe00003377b0, flags=0, control=0x0, segflg=UIO_USERSPACE)
    at /usr/src/sys/kern/uipc_syscalls.c:873
#19 0xffffffff81176e86 in sendit (td=0xfffff800039f5560, s=12, mp=0xfffffe00003377b0, flags=0)
    at /usr/src/sys/kern/uipc_syscalls.c:793
#20 0xffffffff81176d37 in sys_sendto (td=0xfffff800039f5560, uap=0xfffffe0000337a58) at /usr/src/sys/kern/uipc_syscalls.c:924
#21 0xffffffff817cc7b1 in syscallenter (td=0xfffff800039f5560, sa=0xfffffe0000337a48) at subr_syscall.c:135
#22 0xffffffff817cbd0a in amd64_syscall (td=0xfffff800039f5560, traced=0) at /usr/src/sys/amd64/amd64/trap.c:902
#23 0xffffffff817a0acb in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:396
#24 0x00000008013c9dfa in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb) frame 10
#10 0xffffffff814d297a in key_setdumpsa (sav=0xfffff8000381b700, type=10 '\n', satype=3 '\003', seq=0, pid=688)
    at /usr/src/sys/netipsec/key.c:3469
3469                            m = key_setsadbaddr(SADB_EXT_ADDRESS_DST,
(kgdb) list
3464                            if (!m)
3465                                    goto fail;
3466                            break;
3467
3468                    case SADB_EXT_ADDRESS_DST:
3469                            m = key_setsadbaddr(SADB_EXT_ADDRESS_DST,
3470                                &sav->sah->saidx.dst.sa,
3471                                FULLMASK, IPSEC_ULPROTO_ANY);
3472                            if (!m)
3473                                    goto fail;
(kgdb) print sav
$1 = (struct secasvar *) 0xfffff8000381b700
(kgdb) print *sav
$2 = {spi = 778989686, flags = 779777128, seq = 1819047270, pid = 1768120678, ivlen = 1663985518, sah = 0x0, key_auth = 0x0,
  key_enc = 0x0, replay = 0x0, natt = 0x0, lock = 0x0, tdb_xform = 0x0, tdb_encalgxform = 0x0, tdb_authalgxform = 0x0,
  tdb_compalgxform = 0x0, tdb_cryptoid = 0, alg_auth = 0 '\0', alg_enc = 0 '\0', alg_comp = 0 '\0', state = 0 '\0', lft_c = 0x0,
  lft_h = 0x0, lft_s = 0x0, created = 0, firstused = 0, chain = {tqe_next = 0x0, tqe_prev = 0x0}, spihash = {le_next = 0x0,
    le_prev = 0x0}, drainq = {le_next = 0x0, le_prev = 0x0}, cntr = 0, refcnt = 0}
(kgdb) print sav->sah
$3 = (struct secashead *) 0x0


Fatal trap 12: page fault while in kernel mode  
cpuid = 1; apic id = 01
fault virtual address   = 0x50
fault code              = supervisor write data, page not present
instruction pointer     = 0x20:0xffffffff8106e168
stack pointer           = 0x28:0xfffffe00002bf620
frame pointer           = 0x28:0xfffffe00002bf630
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 948 (sshd)
trap number             = 12
panic: page fault
cpuid = 1
KDB: stack backtrace:
#0 0xffffffff810e9be7 at kdb_backtrace+0xa7
#1 0xffffffff8107a129 at vpanic+0x249
#2 0xffffffff81079ee0 at vpanic+0
#3 0xffffffff817cb38a at trap_fatal+0x60a
#4 0xffffffff817cb538 at trap_pfault+0x188
#5 0xffffffff817ca6e1 at trap+0x751
#6 0xffffffff817cb9ba at trap_check+0x4a
#7 0xffffffff817a07e1 at calltrap+0x8
#8 0xffffffff8106defd at chglimit+0x3d
#9 0xffffffff8106e09b at chgkqcnt+0x3b
#10 0xffffffff80fefe55 at kern_kqueue+0x75
#11 0xffffffff80fefdd7 at sys_kqueue+0x37
#12 0xffffffff817cc7b1 at syscallenter+0x961
#13 0xffffffff817cbd0a at amd64_syscall+0x2a
#14 0xffffffff817a0acb at Xfast_syscall+0xfb
Uptime: 45m34s
Dumping 123 out of 981 MB:..13%..26%..39%..52%..65%..78%..91%

Reading symbols from /boot/kernel/ng_socket.ko...Reading symbols from /usr/lib/debug//boot/kernel/ng_socket.ko.debug...done.
done.
Loaded symbols for /boot/kernel/ng_socket.ko
Reading symbols from /boot/kernel/netgraph.ko...Reading symbols from /usr/lib/debug//boot/kernel/netgraph.ko.debug...done.
done.
Loaded symbols for /boot/kernel/netgraph.ko
Reading symbols from /boot/kernel/ng_mppc.ko...Reading symbols from /usr/lib/debug//boot/kernel/ng_mppc.ko.debug...done.
done.
Loaded symbols for /boot/kernel/ng_mppc.ko
Reading symbols from /boot/kernel/rc4.ko...Reading symbols from /usr/lib/debug//boot/kernel/rc4.ko.debug...done.
done.
Loaded symbols for /boot/kernel/rc4.ko
#0  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:298
298             dumptid = curthread->td_tid;
(kgdb) list *0xffffffff8106e168
0xffffffff8106e168 is in atomic_fetchadd_long (atomic.h:263).
258      */
259     static __inline u_long
260     atomic_fetchadd_long(volatile u_long *p, u_long v)
261     {
262
263             __asm __volatile(
264             "       " MPLOCKED "            "
265             "       xaddq   %0,%1 ;         "
266             "# atomic_fetchadd_long"
267             : "+r" (v),                     /* 0 */
Current language:  auto; currently minimal
(kgdb) backtrace
#0  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:298
#1  0xffffffff81079668 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:366
#2  0xffffffff8107a17f in vpanic (fmt=0xffffffff81b40308 "%s", ap=0xfffffe00002befd0) at /usr/src/sys/kern/kern_shutdown.c:759
#3  0xffffffff81079ee0 in panic (fmt=0xffffffff81b40308 "%s") at /usr/src/sys/kern/kern_shutdown.c:690
#4  0xffffffff817cb38a in trap_fatal (frame=0xfffffe00002bf560, eva=80) at /usr/src/sys/amd64/amd64/trap.c:801
#5  0xffffffff817cb538 in trap_pfault (frame=0xfffffe00002bf560, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:683
#6  0xffffffff817ca6e1 in trap (frame=0xfffffe00002bf560) at /usr/src/sys/amd64/amd64/trap.c:421
#7  0xffffffff817cb9ba in trap_check (frame=0xfffffe00002bf560) at /usr/src/sys/amd64/amd64/trap.c:602
#8  0xffffffff817a07e1 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:236
#9  0xffffffff8106e168 in atomic_fetchadd_long (p=0x50, v=1) at atomic.h:268
#10 0xffffffff8106defd in chglimit (uip=0x0, limit=0x50, diff=1, max=9223372036854775807, name=0xffffffff81e06486 "kqcnt")
    at /usr/src/sys/kern/kern_resource.c:1376
#11 0xffffffff8106e09b in chgkqcnt (uip=0x0, diff=1, max=9223372036854775807) at /usr/src/sys/kern/kern_resource.c:1433
#12 0xffffffff80fefe55 in kern_kqueue (td=0xfffff8000364e000, flags=0, fcaps=0x0) at /usr/src/sys/kern/kern_event.c:837
#13 0xffffffff80fefdd7 in sys_kqueue (td=0xfffff8000364e000, uap=0xfffffe00002bfa58) at /usr/src/sys/kern/kern_event.c:813
#14 0xffffffff817cc7b1 in syscallenter (td=0xfffff8000364e000, sa=0xfffffe00002bfa48) at subr_syscall.c:135
#15 0xffffffff817cbd0a in amd64_syscall (td=0xfffff8000364e000, traced=0) at /usr/src/sys/amd64/amd64/trap.c:902
#16 0xffffffff817a0acb in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:396
#17 0x00000008021aae9a in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb) frame 12
#12 0xffffffff80fefe55 in kern_kqueue (td=0xfffff8000364e000, flags=0, fcaps=0x0) at /usr/src/sys/kern/kern_event.c:837
837             if (!chgkqcnt(cred->cr_ruidinfo, 1, lim_cur(td, RLIMIT_KQUEUES)))
(kgdb) list
832             struct ucred *cred;
833             int fd, error;
834
835             fdp = td->td_proc->p_fd;
836             cred = td->td_ucred;
837             if (!chgkqcnt(cred->cr_ruidinfo, 1, lim_cur(td, RLIMIT_KQUEUES)))
838                     return (ENOMEM);
839
840             error = falloc_caps(td, &fp, &fd, flags, fcaps);
841             if (error != 0) {
(kgdb) print *td
$1 = {td_lock = 0xffffffff825e8d00, td_proc = 0xfffff8000333f000, td_plist = {tqe_next = 0x0, tqe_prev = 0xfffff8000333f010},
  td_runq = {tqe_next = 0xfffff80003e80560, tqe_prev = 0xffffffff825e8f88}, td_slpq = {tqe_next = 0x0,
    tqe_prev = 0xfffff80003323480}, td_lockq = {tqe_next = 0x0, tqe_prev = 0x0}, td_hash = {le_next = 0x0,
    le_prev = 0xfffffe0000d4c7b8}, td_cpuset = 0xfffff800032ce000, td_sel = 0xfffff800032d8680, td_sleepqueue = 0xfffff80003323480,
  td_turnstile = 0xfffff8000320f540, td_rlqe = 0xfffff80003874820, td_umtxq = 0xfffff80003642c80, td_vm_dom_policy = {seq = 0, p = {
      policy = VM_POLICY_NONE, domain = -1}}, td_tid = 100087, padding1 = 0xfffff8000364e0a0, padding2 = 0xfffff8000364e0c0,
  td_lend_user_pri = 255 '?', td_flags = 67174406, td_inhibitors = 0, td_pflags = 0, td_dupfd = 0, td_sqqueue = 0, td_wchan = 0x0,
  td_wmesg = 0x0, td_owepreempt = 0 '\0', td_tsqueue = 0 '\0', td_locks = 0, td_rw_rlocks = 0, td_lk_slocks = 0, td_stopsched = 1,
  td_blocked = 0x0, td_lockname = 0x0, td_contested = {lh_first = 0x0}, td_sleeplocks = 0x0, td_intr_nesting_level = 0,
  td_pinned = 0, td_ucred = 0xfffff8000390f700, td_limit = 0xfffff8000381b400, td_slptick = 0, td_blktick = 0,
  td_swvoltick = -2145350148, td_swinvoltick = -2145350138, td_cow = 127, td_ru = {ru_utime = {tv_sec = 0, tv_usec = 0}, ru_stime = {
      tv_sec = 0, tv_usec = 0}, ru_maxrss = 7268, ru_ixrss = 592, ru_idrss = 80, ru_isrss = 256, ru_minflt = 274, ru_majflt = 0,
    ru_nswap = 0, ru_inblock = 0, ru_oublock = 0, ru_msgsnd = 0, ru_msgrcv = 2, ru_nsignals = 0, ru_nvcsw = 1, ru_nivcsw = 1},
  td_rux = {rux_runtime = 0, rux_uticks = 0, rux_sticks = 0, rux_iticks = 0, rux_uu = 0, rux_su = 0, rux_tu = 0},
  td_incruntime = 44205853, td_runtime = 44205853, td_pticks = 1, td_sticks = 1, td_iticks = 0, td_uticks = 1, td_intrval = 0,
  td_oldsigmask = {__bits = 0xfffff8000364e254}, td_generation = 2, td_sigstk = {ss_sp = 0x0, ss_size = 0, ss_flags = 4},
  td_xsig = 0, td_profil_addr = 0, td_profil_ticks = 0, td_name = 0xfffff8000364e294 "sshd", td_fpop = 0x0, td_dbgflags = 0,
  td_dbgksi = {ksi_link = {tqe_next = 0x0, tqe_prev = 0x0}, ksi_info = {si_signo = 0, si_errno = 0, si_code = 0, si_pid = 0,
      si_uid = 0, si_status = 0, si_addr = 0x0, si_value = {sival_int = 0, sival_ptr = 0x0, sigval_int = 0, sigval_ptr = 0x0},
      _reason = {_fault = {_trapno = 0}, _timer = {_timerid = 0, _overrun = 0}, _mesgq = {_mqd = 0}, _poll = {_band = 0},
        __spare__ = {__spare1__ = 0, __spare2__ = 0xfffff8000364e2f8}}}, ksi_flags = 0, ksi_sigq = 0x0}, td_ng_outbound = 0,
  td_osd = {osd_nslots = 0, osd_slots = 0x0, osd_next = {le_next = 0x0, le_prev = 0x0}}, td_map_def_user = 0x0, td_dbg_forked = 0,
  td_vp_reserv = 0, td_no_sleeping = 0, td_dom_rr_idx = 0, td_su = 0x0, td_rtcgen = 0, td_sigmask = {__bits = 0xfffff8000364e374},
  td_rqindex = 30 '\036', td_base_pri = 120 'x', td_priority = 120 'x', td_pri_class = 3 '\003', td_user_pri = 121 'y',
  td_base_user_pri = 121 'y', td_dbg_sc_code = 0, td_dbg_sc_narg = 0, td_rb_list = 0, td_rbp_list = 0, td_rb_inact = 0,
  td_pcb = 0xfffffe00002bfb80, td_state = TDS_RUNNING, td_uretoff = {tdu_retval = 0xfffff8000364e3c0, tdu_off = 0}, td_cowgen = 1,
  td_slpcallout = {c_links = {le = {le_next = 0x0, le_prev = 0xfffff8001317d3d8}, sle = {sle_next = 0x0}, tqe = {tqe_next = 0x0,
        tqe_prev = 0xfffff8001317d3d8}}, c_time = 8284352236906, c_precision = 16106127360, c_arg = 0xfffff8000364e000,
    c_func = 0xffffffff811018e0 <sleepq_timeout>, c_lock = 0x0, c_flags = 0, c_iflags = 272, c_cpu = 0},
  td_frame = 0xfffffe00002bfac0, td_kstack_obj = 0xfffff800037b8a50, td_kstack = 18446741874689163264, td_kstack_pages = 4,
  td_critnest = 1, td_md = {md_spinlock_count = 1, md_saved_flags = 646, md_spurflt_addr = 34427155648, md_invl_gen = {gen = 0,
      link = {le_next = 0x0, le_prev = 0xffffffff826df868}}}, td_ar = 0x0, td_lprof = 0xfffff8000364e470,
  td_dtrace = 0xfffff80003e35600, td_errno = 0, td_vnet = 0x0, td_vnet_lpush = 0x0, td_intr_frame = 0x0,
  td_rfppwait_p = 0xfffff8000396d588, td_ma = 0x0, td_ma_cnt = 0, td_emuldata = 0x0, td_lastcpu = 1, td_oncpu = 1, td_sleeptimo = 0,
---Type <return> to continue, or q <return> to quit---
  td_sigqueue = {sq_signals = {__bits = 0xfffff8000364e4d8}, sq_kill = {__bits = 0xfffff8000364e4e8}, sq_ptrace = {
      __bits = 0xfffff8000364e4f8}, sq_list = {tqh_first = 0x0, tqh_last = 0xfffff8000364e508}, sq_proc = 0xfffff8000333f000,
    sq_flags = 1}}
(kgdb) print *cred
$2 = {cr_ref = 2178945375, cr_uid = 4294967295, cr_ruid = 21168128, cr_svuid = 0, cr_ngroups = 0, cr_rgid = 0, cr_svgid = 4,
  cr_uidinfo = 0x0, cr_ruidinfo = 0x0, cr_prison = 0xfffff800130f3060, cr_loginclass = 0xfffff80013032d80, cr_flags = 318975384,
  cr_pspare2 = 0xfffff8000390f748, cr_label = 0x0, cr_audit = {ai_auid = 0, ai_mask = {am_success = 0, am_failure = 2164206432},
    ai_termid = {at_port = 4294967295, at_type = 2164206608, at_addr = 0xfffff8000390f774}, ai_asid = -1,
    ai_flags = 18446735277676361472}, cr_groups = 0x0, cr_agroups = 0, cr_smallgroups = 0xfffff8000390f79c}


Thank you for looking at this!!
Comment 2 Aragon Gouveia 2017-08-27 18:37:26 UTC
I've rebuilt with IPSEC_DEBUG enabled.  The only error unique to Android clients is this:

Aug 27 18:33:08 <kern.crit> vpn kernel: esp_input: payload of 76 octets not a multiple of 16 octets,  SA 1.2.3.4/04bc89ee

Just before the panic occurs.

Any help appreciated.
Comment 3 Aragon Gouveia 2017-08-27 19:25:41 UTC
Ok, finally I have a half-solution.  Just disable sha256 from racoon's phase2 (sainfo) configuration, and Android won't crash FreeBSD with its broken IKEv1 implementation.

More info on Android's broken sha256: https://wiki.strongswan.org/issues/1472

Would be nice if FreeBSD kernel handled this gracefully.  If it behaves like this with a broken VPN client, what is possible with a malicious VPN client?  Hmmm...
Comment 4 Andrey V. Elsukov freebsd_committer freebsd_triage 2017-08-29 21:56:33 UTC
Can you provide raconn's logs to see what it did before the panic?
Comment 5 Andrey V. Elsukov freebsd_committer freebsd_triage 2017-08-29 22:30:31 UTC
Did you try to add "options INVARIANTS" to your kernel? It enables many checks and probably some message from KASSERT will shed some light on what is going on here? Also, if it is not easy to do further debugging, I would be glad to get some instructions on how to configure simple test lab using mpd+l2tp+racoon to be able reproduce this problem locally.
Comment 6 Aragon Gouveia 2017-08-30 06:18:08 UTC
Sure, happy to help more!  I'm just struggling with building in INVARIANTS support:

cc -target x86_64-unknown-freebsd11.1 --sysroot=/usr/obj/usr/src/tmp -B/usr/obj/usr/src/tmp/usr/bin -c -O0  -g -nostdinc  -I. -I/usr/sr
c/sys -I/usr/src/sys/contrib/libfdt -D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h  -fno-omit-frame-pointer -mno-omit-lea
f-frame-pointer -MD  -MF.depend.cam.o -MTcam.o -mcmodel=kernel -mno-red-zone -mno-mmx -mno-sse -msoft-float  -fno-asynchronous-unwind-t
ables -ffreestanding -fwrapv -fstack-protector -gdwarf-2 -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes  -Wmissing-protot
ypes -Wpointer-arith -Winline -Wcast-qual  -Wundef -Wno-pointer-sign -D__printf__=__freebsd_kprintf__  -Wmissing-include-dirs -fdiagnos
tics-show-option  -Wno-unknown-pragmas  -Wno-error-tautological-compare -Wno-error-empty-body  -Wno-error-parentheses-equality -Wno-err
or-unused-function  -Wno-error-pointer-sign -Wno-error-shift-negative-value -Wno-error-address-of-packed-member  -mno-aes -mno-avx  -st
d=iso9899:1999 -Werror  /usr/src/sys/cam/cam.c
In file included from /usr/src/sys/cam/cam.c:45:
In file included from /usr/src/sys/cam/cam_ccb.h:44:
In file included from /usr/src/sys/cam/nvme/nvme_all.h:32:
/usr/src/sys/dev/nvme/nvme.h:922:16: warning: taking address of packed member 'cdw10' of class or structure 'nvme_command' may result
      in an unaligned pointer value [-Waddress-of-packed-member]
        *(uint64_t *)&cmd->cdw10 = lba;
                      ^~~~~~~~~~
In file included from /usr/src/sys/cam/cam.c:45:
/usr/src/sys/cam/cam_ccb.h:1392:2: error: implicit declaration of function 'kassert_panic' is invalid in C99
      [-Werror,-Wimplicit-function-declaration]
        KASSERT((flags & CAM_DIR_MASK) == CAM_DIR_BOTH,
        ^
/usr/src/sys/sys/systm.h:88:3: note: expanded from macro 'KASSERT'
                kassert_panic msg;                                      \
                ^
In file included from /usr/src/sys/cam/cam.c:45:
/usr/src/sys/cam/cam_ccb.h:1392:2: error: this function declaration is not a prototype [-Werror,-Wstrict-prototypes]
/usr/src/sys/sys/systm.h:88:3: note: expanded from macro 'KASSERT'
                kassert_panic msg;                                      \
                ^
1 warning and 2 errors generated.
*** Error code 1

My kernel config is:

include GENERIC
ident IPSECDBG

options IPSEC_DEBUG
options INVARIANTS

In case it's easier for you to setup a test lab, the following racoon.conf should allow you to reproduce this with an Android L2TP/IPSec VPN session:

path include "/usr/local/etc/racoon";
path pre_shared_key "/usr/local/etc/racoon/psk.txt";

listen
{
	adminsock "/var/db/racoon/racoon.sock";
}

timer
{
	natt_keepalive 55 seconds;
}

remote anonymous
{
	exchange_mode main;
	passive on;
	doi ipsec_doi;
	my_identifier address 1.2.3.4;
	verify_cert off;
	lifetime time 24 hours;
	nat_traversal on;
	generate_policy on;
	dpd_delay 55;
	dpd_retry 10;
	nonce_size 64;
	proposal_check claim;

	proposal {
		encryption_algorithm aes 256;
		hash_algorithm sha256;
		authentication_method pre_shared_key;
		dh_group modp4096;
		lifetime time 24 hours;
	}
	proposal {
		encryption_algorithm aes 256;
		hash_algorithm sha256;
		authentication_method pre_shared_key;
		dh_group modp2048;
		lifetime time 24 hours;
	}
	proposal {
		encryption_algorithm aes 256;
		hash_algorithm sha256;
		authentication_method pre_shared_key;
		dh_group modp1024;
		lifetime time 24 hours;
	}
	proposal {
		encryption_algorithm aes;
		hash_algorithm sha1;
		authentication_method pre_shared_key;
		dh_group modp2048;
		lifetime time 24 hours;
	}
	proposal {
		encryption_algorithm aes;
		hash_algorithm sha1;
		authentication_method pre_shared_key;
		dh_group modp1024;
		lifetime time 24 hours;
	}
}

sainfo anonymous
{
	encryption_algorithm aes 256, aes;
	authentication_algorithm hmac_sha256, hmac_sha1;
	compression_algorithm deflate;
	lifetime time 8 hours;
}

You don't need to bother setting up MPD.  FreeBSD should crash before any L2TP packets reach userland.  Mine crashes without MPD running.
Comment 7 Eugene Grosbein freebsd_committer freebsd_triage 2017-08-30 07:29:51 UTC
(In reply to Aragon Gouveia from comment #6)

What version of Android do you use? Mine inists on Aggressive mode and racoon cannnot run Aggressive mode with your configuration due to different dh_group settings.
Comment 8 Aragon Gouveia 2017-08-30 08:23:33 UTC
(In reply to Eugene Grosbein from comment #7)

Android 6.0.1.

Maybe try enable aggressive mode in racoon.  The problem occurs after phase 2 negotiation, so maybe aggressive mode will still allow the bug to trigger.

The main thing to look out for in racoon logs is that both peers agree on sha256 for phase 2.
Comment 9 Eugene Grosbein freebsd_committer freebsd_triage 2017-08-30 08:31:14 UTC
(In reply to Aragon Gouveia from comment #8)

For unknown reason, I cannot get Android 6 to establish a connection with FreeBSD 11.1@r321371 ipsec-tools-0.8.2_1 using your configuration nor to reproduce a crash, racoon complains: "X.X.X.X give up to get IPsec-SA due to time up to wait"
Comment 10 Konstantin Belousov freebsd_committer freebsd_triage 2017-08-30 08:47:07 UTC
(In reply to Aragon Gouveia from comment #6)
You also need INVARIANTS_SUPPORT.

See https://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
Comment 11 Aragon Gouveia 2017-08-30 15:11:15 UTC
Thanks Konstantin!  Invariant kernel displayed an additional error at panic.  Here's some kgdb info:

Unread portion of the kernel message buffer:

esp_input: payload of 76 octets not a multiple of 16 octets,  SA 197.189.200.16/019052a1
panic: attempt to free non DEAD SA 0xfffff8000398db00
cpuid = 1
KDB: stack backtrace:
#0 0xffffffff81007cf7 at kdb_backtrace+0xa7
#1 0xffffffff80fa22b9 at vpanic+0x249
#2 0xffffffff80fa1e44 at kassert_panic+0xc4
#3 0xffffffff81362a0e at key_delsav+0x7e
#4 0xffffffff81362981 at key_freesav+0xb1
#5 0xffffffff8137f934 at udp_ipsec_input+0x2d4
#6 0xffffffff812f5a9b at udp_append+0x1ab
#7 0xffffffff812f5739 at udp_input+0x12a9
#8 0xffffffff811ec014 at ip_input+0xd44
#9 0xffffffff81142587 at netisr_dispatch_src+0x167
#10 0xffffffff81142b87 at netisr_dispatch+0x27
#11 0xffffffff811196a0 at ether_demux+0x2d0
#12 0xffffffff8111b369 at ether_input_internal+0x7e9
#13 0xffffffff8111ab6f at ether_nh_input+0xbf
#14 0xffffffff81142587 at netisr_dispatch_src+0x167
#15 0xffffffff81142b87 at netisr_dispatch+0x27
#16 0xffffffff81119c8f at ether_input+0x9f
#17 0xffffffff80cfad69 at vtnet_rxq_input+0x169
Uptime: 3m59s
Dumping 121 out of 982 MB:..14%..27%..40%..53%..66%..80%..93%

Reading symbols from /boot/kernel/ng_socket.ko...Reading symbols from /usr/lib/debug//boot/kernel/ng_socket.ko.debug...done.
done.
Loaded symbols for /boot/kernel/ng_socket.ko
Reading symbols from /boot/kernel/netgraph.ko...Reading symbols from /usr/lib/debug//boot/kernel/netgraph.ko.debug...done.
done.
Loaded symbols for /boot/kernel/netgraph.ko
Reading symbols from /boot/kernel/ng_mppc.ko...Reading symbols from /usr/lib/debug//boot/kernel/ng_mppc.ko.debug...done.
done.
Loaded symbols for /boot/kernel/ng_mppc.ko
Reading symbols from /boot/kernel/rc4.ko...Reading symbols from /usr/lib/debug//boot/kernel/rc4.ko.debug...done.
done.
Loaded symbols for /boot/kernel/rc4.ko
#0  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:298
298		dumptid = curthread->td_tid;
(kgdb) backtrace
#0  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:298
#1  0xffffffff80fa1908 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:366
#2  0xffffffff80fa230f in vpanic (fmt=0xffffffff81ca7401 "attempt to free non DEAD SA %p", ap=0xfffffe003c1f9d40)
    at /usr/src/sys/kern/kern_shutdown.c:759
#3  0xffffffff80fa1e44 in kassert_panic (fmt=0xffffffff81ca7401 "attempt to free non DEAD SA %p")
    at /usr/src/sys/kern/kern_shutdown.c:649
#4  0xffffffff81362a0e in key_delsav (sav=0xfffff8000398db00) at /usr/src/sys/netipsec/key.c:2969
#5  0xffffffff81362981 in key_freesav (psav=0xfffffe003c1f9e40) at /usr/src/sys/netipsec/key.c:1237
#6  0xffffffff8137f934 in udp_ipsec_input (m=0xfffff80003c04600, off=9, af=2) at /usr/src/sys/netipsec/udpencap.c:210
#7  0xffffffff812f5a9b in udp_append (inp=0xfffff800039bd3a0, ip=0xfffff80003bf401a, n=0xfffff80003c04600, off=28, 
    udp_in=0xfffffe003c1fa268) at /usr/src/sys/netinet/udp_usrreq.c:333
#8  0xffffffff812f5739 in udp_input (mp=0xfffffe003c1fa3c8, offp=0xfffffe003c1fa3a0, proto=17)
    at /usr/src/sys/netinet/udp_usrreq.c:705
#9  0xffffffff811ec014 in ip_input (m=0x0) at /usr/src/sys/netinet/ip_input.c:823
#10 0xffffffff81142587 in netisr_dispatch_src (proto=1, source=0, m=0xfffff80003c04600) at /usr/src/sys/net/netisr.c:1120
#11 0xffffffff81142b87 in netisr_dispatch (proto=1, m=0xfffff80003c04600) at /usr/src/sys/net/netisr.c:1211
#12 0xffffffff811196a0 in ether_demux (ifp=0xfffff800034f3800, m=0xfffff80003c04600) at /usr/src/sys/net/if_ethersubr.c:850
#13 0xffffffff8111b369 in ether_input_internal (ifp=0xfffff800034f3800, m=0xfffff80003c04600) at /usr/src/sys/net/if_ethersubr.c:639
#14 0xffffffff8111ab6f in ether_nh_input (m=0xfffff80003c04600) at /usr/src/sys/net/if_ethersubr.c:669
#15 0xffffffff81142587 in netisr_dispatch_src (proto=5, source=0, m=0xfffff80003c04600) at /usr/src/sys/net/netisr.c:1120
#16 0xffffffff81142b87 in netisr_dispatch (proto=5, m=0xfffff80003c04600) at /usr/src/sys/net/netisr.c:1211
#17 0xffffffff81119c8f in ether_input (ifp=0xfffff800034f3800, m=0xfffff80003c04600) at /usr/src/sys/net/if_ethersubr.c:759
#18 0xffffffff80cfad69 in vtnet_rxq_input (rxq=0xfffff8000331fd00, m=0xfffff80003c04600, hdr=0xfffffe003c1fa890)
    at /usr/src/sys/dev/virtio/network/if_vtnet.c:1745
#19 0xffffffff80cfa7ba in vtnet_rxq_eof (rxq=0xfffff8000331fd00) at /usr/src/sys/dev/virtio/network/if_vtnet.c:1831
#20 0xffffffff80cfdb18 in vtnet_rx_vq_intr (xrxq=0xfffff8000331fd00) at /usr/src/sys/dev/virtio/network/if_vtnet.c:1876
#21 0xffffffff80cf2dd9 in virtqueue_intr (vq=0xfffffe0000dde000) at /usr/src/sys/dev/virtio/virtqueue.c:439
#22 0xffffffff80cf72cd in vtpci_vq_intr (xvq=0xfffffe0000dde000) at /usr/src/sys/dev/virtio/pci/virtio_pci.c:1316
#23 0xffffffff80f440a8 in intr_event_execute_handlers (p=0xfffff800032e2588, ie=0xfffff8000331f700)
    at /usr/src/sys/kern/kern_intr.c:1262
#24 0xffffffff80f44637 in ithread_execute_handlers (p=0xfffff800032e2588, ie=0xfffff8000331f700) at /usr/src/sys/kern/kern_intr.c:1275
#25 0xffffffff80f444bc in ithread_loop (arg=0xfffff800035690e0) at /usr/src/sys/kern/kern_intr.c:1356
#26 0xffffffff80f3fcbb in fork_exit (callout=0xffffffff80f443c0 <ithread_loop>, arg=0xfffff800035690e0, frame=0xfffffe003c1faac0)
    at /usr/src/sys/kern/kern_fork.c:1042
#27 0xffffffff816129ae in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:611
---Type <return> to continue, or q <return> to quit--- 
#28 0x0000000000000000 in ?? ()
(kgdb) frame 4
#4  0xffffffff81362a0e in key_delsav (sav=0xfffff8000398db00) at /usr/src/sys/netipsec/key.c:2969
2969		IPSEC_ASSERT(sav->state == SADB_SASTATE_DEAD,
(kgdb) list
2964	 */
2965	static void
2966	key_delsav(struct secasvar *sav)
2967	{
2968		IPSEC_ASSERT(sav != NULL, ("null sav"));
2969		IPSEC_ASSERT(sav->state == SADB_SASTATE_DEAD,
2970		    ("attempt to free non DEAD SA %p", sav));
2971		IPSEC_ASSERT(sav->refcnt == 0, ("reference count %u > 0",
2972		    sav->refcnt));
2973	
(kgdb) print *sav
$1 = {spi = 2706542593, flags = 0, seq = 2437428875, pid = 831, ivlen = 16, sah = 0xfffff800032d1800, key_auth = 0xfffff8000368b8b0, 
  key_enc = 0xfffff8000368b8c0, replay = 0xfffff800032de5a0, natt = 0xfffff800037ffd80, lock = 0xfffff8000365ed60, 
  tdb_xform = 0xffffffff821b8078, tdb_encalgxform = 0xffffffff821b9440, tdb_authalgxform = 0xffffffff821b97c0, 
  tdb_compalgxform = 0x0, tdb_cryptoid = 432345564227567617, alg_auth = 5 '\005', alg_enc = 12 '\f', alg_comp = 0 '\0', 
  state = 1 '\001', lft_c = 0xfffff800322013c0, lft_h = 0xfffff8000365ea20, lft_s = 0xfffff800034ef1a0, created = 1504104802, 
  firstused = 0, chain = {tqe_next = 0x0, tqe_prev = 0xfffff800032d1888}, spihash = {le_next = 0x0, le_prev = 0xfffff80003693c78}, 
  drainq = {le_next = 0x0, le_prev = 0x0}, cntr = 0, refcnt = 0}
(kgdb) frame 6
#6  0xffffffff8137f934 in udp_ipsec_input (m=0xfffff80003c04600, off=9, af=2) at /usr/src/sys/netipsec/udpencap.c:210
210			key_freesav(&sav);
(kgdb) list
205		 * will do this anyway, so don't touch them here.
206		 */
207		ESPSTAT_INC(esps_input);
208		error = (*sav->tdb_xform->xf_input)(m, sav, hlen, off);
209		if (error != 0)
210			key_freesav(&sav);
211	
212		return (EINPROGRESS);	/* Consumed by IPsec. */
213	}
214	

The associated racoon debug log (I've just edited peer IP addresses):

https://paste.ubuntu.com/25432438/
Comment 12 Andrey V. Elsukov freebsd_committer freebsd_triage 2017-08-30 19:39:59 UTC
(In reply to Aragon Gouveia from comment #11)
> Thanks Konstantin!  Invariant kernel displayed an additional error at panic.
> Here's some kgdb info:
> 
> Unread portion of the kernel message buffer:
> 
> esp_input: payload of 76 octets not a multiple of 16 octets,  SA
> 197.189.200.16/019052a1
> panic: attempt to free non DEAD SA 0xfffff8000398db00
> cpuid = 1
> KDB: stack backtrace:
> #0 0xffffffff81007cf7 at kdb_backtrace+0xa7
> #1 0xffffffff80fa22b9 at vpanic+0x249
> #2 0xffffffff80fa1e44 at kassert_panic+0xc4
> #3 0xffffffff81362a0e at key_delsav+0x7e
> #4 0xffffffff81362981 at key_freesav+0xb1
> #5 0xffffffff8137f934 at udp_ipsec_input+0x2d4
> #6 0xffffffff812f5a9b at udp_append+0x1ab
> #7 0xffffffff812f5739 at udp_input+0x12a9
> #8 0xffffffff811ec014 at ip_input+0xd44
> #9 0xffffffff81142587 at netisr_dispatch_src+0x167
> #10 0xffffffff81142b87 at netisr_dispatch+0x27
> #11 0xffffffff811196a0 at ether_demux+0x2d0
> #12 0xffffffff8111b369 at ether_input_internal+0x7e9
> #13 0xffffffff8111ab6f at ether_nh_input+0xbf
> #14 0xffffffff81142587 at netisr_dispatch_src+0x167
> #15 0xffffffff81142b87 at netisr_dispatch+0x27
> #16 0xffffffff81119c8f at ether_input+0x9f
> #17 0xffffffff80cfad69 at vtnet_rxq_input+0x169
> (kgdb) list
> 205		 * will do this anyway, so don't touch them here.
> 206		 */
> 207		ESPSTAT_INC(esps_input);
> 208		error = (*sav->tdb_xform->xf_input)(m, sav, hlen, off);
> 209		if (error != 0)
> 210			key_freesav(&sav);
> 211	
> 212		return (EINPROGRESS);	/* Consumed by IPsec. */
> 213	}
> 214	


Thanks. The NAT-T part of code was missed in the r318734.
Comment 13 Andrey V. Elsukov freebsd_committer freebsd_triage 2017-08-30 19:40:36 UTC
Created attachment 185935 [details]
Proposed patch
Comment 14 Andrey V. Elsukov freebsd_committer freebsd_triage 2017-08-31 10:38:53 UTC
Can you confirm that this patch fixed your problem?
Comment 15 commit-hook freebsd_committer freebsd_triage 2017-09-01 11:51:49 UTC
A commit references this bug:

Author: ae
Date: Fri Sep  1 11:51:08 UTC 2017
New revision: 323086
URL: https://svnweb.freebsd.org/changeset/base/323086

Log:
  Fix possible double releasing for SA reference.

  This is missing part of r318734. When crypto subsystem returns error
  the xform code handles an error independently.

  PR:		221849
  MFC after:	5 days

Changes:
  head/sys/netipsec/udpencap.c
Comment 16 Aragon Gouveia 2017-09-02 20:58:58 UTC
(In reply to Andrey V. Elsukov from comment #14)

Sorry for the wait.  I just tested it now, and it works!  Android's broken ESP packets just get dropped now, as expected.

Thank you, Andrey, everyone.
Comment 17 commit-hook freebsd_committer freebsd_triage 2017-09-06 10:21:44 UTC
A commit references this bug:

Author: ae
Date: Wed Sep  6 10:21:28 UTC 2017
New revision: 323216
URL: https://svnweb.freebsd.org/changeset/base/323216

Log:
  MFC r323086:
    Fix possible double releasing for SA reference.

    This is missing part of r318734. When crypto subsystem returns error
    the xform code handles an error independently.

    PR:		221849

Changes:
_U  stable/11/
  stable/11/sys/netipsec/udpencap.c
Comment 18 Eitan Adler freebsd_committer freebsd_triage 2018-05-28 19:41:10 UTC
batch change:

For bugs that match the following
-  Status Is In progress 
AND
- Untouched since 2018-01-01.
AND
- Affects Base System OR Documentation

DO:

Reset to open status.


Note:
I did a quick pass but if you are getting this email it might be worthwhile to double check to see if this bug ought to be closed.
Comment 19 Eugene Grosbein freebsd_committer freebsd_triage 2018-05-28 20:37:54 UTC
Fixed 7 months ago.