| Summary: | 10.2 STABLE Crashing with IPSec Support | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Base System | Reporter: | Cassiano Peixoto <peixotocassiano> | ||||||
| Component: | kern | Assignee: | Fabien Thomas <fabient> | ||||||
| Status: | Closed Overcome By Events | ||||||||
| Severity: | Affects Some People | CC: | ae, emeric.poupon, g_amanakis, garga, gnn, jch, jim, rrs, rwatson, smh | ||||||
| Priority: | --- | Keywords: | crash, patch | ||||||
| Version: | 10.2-STABLE | Flags: | koobs:
mfc-stable10+
koobs: mfc-stable9? |
||||||
| Hardware: | Any | ||||||||
| OS: | Any | ||||||||
| Attachments: |
|
||||||||
|
Description
Cassiano Peixoto
2015-11-10 18:01:44 UTC
Assign to net@ with a Cc: to gnn. (In reply to Cassiano Peixoto from comment #0) > Fatal trap 12: page fault while in kernel mode > cpuid = 4; apic id = 32 > fault virtual address = 0x0 > fault code = supervisor read data, page not present > instruction pointer = 0x20:0xffffffff80ac9cbe ... > #6 0xffffffff80d2e8c2 at calltrap+0x8 > #7 0xffffffff80acaa7a at ipsec4_in_reject+0x2a > #8 0xffffffff80a73b3b at tcp_input+0x89b > #9 0xffffffff80a0484b at ip_input+0xab ... > (kgdb) list *0xffffffff80ac9cbe > 0xffffffff80ac9cbe is in ipsec_getpolicybysock > (/usr/src/sys/netipsec/ipsec.c:502). > 497 IPSEC_ASSERT(inp != NULL, ("null inp")); > 498 IPSEC_ASSERT(inp->inp_sp != NULL, ("null inp_sp")); > 499 IPSEC_ASSERT(inp->inp_sp->sp_out != NULL && inp->inp_sp->sp_in != NULL, > 500 ("null sp_in || sp_out")); > 501 > 502 error = ipsec_setspidx(m, &inp->inp_sp->sp_in->spidx, 1); It looks like inp->inp_sp is NULL here. Can you show output of following commands: (kgdb) f 7 (kgdb) p *inp (In reply to Andrey V. Elsukov from comment #2) Hi Andrey, Thanks for your help. Here is the output commands you asked. Let me know if you need something else. (kgdb) f 7 #7 0xffffffff80d2e8c2 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:236 236 call trap Current language: auto; currently asm (kgdb) p *inp No symbol "inp" in current context. Just to let you know it was tested with FreeBSD 10.2-STABLE #1 r290651. Looks like that's from a different crash dump as frame 7 in your original was: #7 0xffffffff80acaa7a at ipsec4_in_reject+0x2a As a guess based on your previous info try frame 8 instead? Hello,
it seems we have the very same issue here.
Here is the backtrace:
(kgdb) bt
#0 doadump (textdump=<value optimized out>) at pcpu.h:237
#1 0xffffffff8044b9d2 in kern_reboot (howto=260) at ../../../kern/kern_shutdown.c:464
#2 0xffffffff8044bf3c in panic (fmt=0x104 <Address 0x104 out of bounds>) at ../../../kern/kern_shutdown.c:745
#3 0xffffffff80656a4d in trap_fatal (frame=0xfffffe0001c194a8, eva=<value optimized out>) at ../../../amd64/amd64/trap.c:878
#4 0xffffffff80656d68 in trap_pfault (frame=0xffffff8000ec1760, usermode=0) at ../../../amd64/amd64/trap.c:794
#5 0xffffffff8065710c in trap (frame=0xffffff8000ec1760) at ../../../amd64/amd64/trap.c:456
#6 0xffffffff80640cff in calltrap () at ../../../amd64/amd64/exception.S:232
#7 0xffffffff805b48d1 in ipsec_getpolicybysock (m=0xfffffe005fd0da00, dir=1, inp=0xfffffe00c26e9320, error=0xffffff8000ec186c) at ../../../netipsec/ipsec.c:328
#8 0xffffffff805b5664 in ipsec46_in_reject (m=0xfffffe005fd0da00, inp=<value optimized out>) at ../../../netipsec/ipsec.c:1291
#9 0xffffffff805b5ba9 in ipsec4_in_reject (m=<value optimized out>, inp=<value optimized out>) at ../../../netipsec/ipsec.c:1313
#10 0xffffffff8056b4d1 in tcp_input (m=0xfffffe005fd0da00, off0=20) at ../../../netinet/tcp_input.c:944
#11 0xffffffff8055e7a2 in ip_input (m=0xfffffe005fd0da00) at ../../../netinet/ip_input.c:1103
#12 0xffffffff80519393 in swi_net (arg=<value optimized out>) at ../../../net/netisr.c:807
#13 0xffffffff8042349d in intr_event_execute_handlers (p=<value optimized out>, ie=0xfffffe005f598200) at ../../../kern/kern_intr.c:1272
#14 0xffffffff80424c8d in ithread_loop (arg=0xfffffe005f530880) at ../../../kern/kern_intr.c:1285
#15 0xffffffff8042064f in fork_exit (callout=0xffffffff80424bf0 <ithread_loop>, arg=0xfffffe005f530880, frame=0xffffff8000ec1c40) at ../../../kern/kern_fork.c:996
#16 0xffffffff8064122e in fork_trampoline () at ../../../amd64/amd64/exception.S:606
#17 0x0000000000000000 in ?? ()
(kgdb) p *inp
$1 = {
inp_hash = {
le_next = 0x0,
le_prev = 0xffffff805d4c92e0
},
inp_pcbgrouphash = {
le_next = 0x0,
le_prev = 0x0
},
inp_list = {
le_next = 0xfffffe00c29024b0,
le_prev = 0xfffffe00cb627340
},
inp_ppcb = 0x0,
inp_pcbinfo = 0xffffffff80c9a3c0,
inp_pcbgroup = 0x0,
inp_pcbgroup_wild = {
le_next = 0x0,
le_prev = 0x0
},
inp_socket = 0x0,
inp_cred = 0xfffffe00cb880100,
inp_flow = 0,
inp_flags = 75497472,
inp_flags2 = 16,
inp_vflag = 0 '\0',
inp_ip_ttl = 64 '@',
inp_ip_p = 0 '\0',
inp_ip_minttl = 0 '\0',
inp_flowid = 0,
inp_refcount = 1,
inp_pspare = {0x0, 0x0, 0x0, 0x0, 0x0},
inp_ispare = {0, 0, 0, 0, 0, 0},
inp_ro_dst = {
s_addr = 0
},
inp_inc = {
inc_flags = 0 '\0',
inc_len = 0 '\0',
inc_fibnum = 0,
inc_ie = {
ie_fport = 51153,
ie_lport = 36895,
ie_dependfaddr = {
ie46_foreign = {
ia46_pad32 = {0, 0, 0},
ia46_addr4 = {
s_addr = 536939018
}
---Type <return> to continue, or q <return> to quit---
},
ie6_foreign = {
__u6_addr = {
__u6_addr8 = '\0' <repeats 12 times>, "\n\n\001 ",
__u6_addr16 = {0, 0, 0, 0, 0, 0, 2570, 8193},
__u6_addr32 = {0, 0, 0, 536939018}
}
}
},
ie_dependladdr = {
ie46_local = {
ia46_pad32 = {0, 0, 0},
ia46_addr4 = {
s_addr = 33554559
}
},
ie6_local = {
__u6_addr = {
__u6_addr8 = '\0' <repeats 12 times>, "\177\000\000\002",
__u6_addr16 = {0, 0, 0, 0, 0, 0, 127, 512},
__u6_addr32 = {0, 0, 0, 33554559}
}
}
}
}
},
inp_label = 0x0,
inp_sp = 0x0,
inp_depend4 = {
inp4_ip_tos = 0 '\0',
inp4_options = 0x0,
inp4_moptions = 0x0
},
inp_depend6 = {
inp6_options = 0x0,
inp6_outputopts = 0x0,
inp6_moptions = 0x0,
inp6_icmp6filt = 0x0,
inp6_cksum = 0,
inp6_hops = 0
},
inp_portlist = {
le_next = 0xfffffe00c27644b0,
le_prev = 0xfffffe00cb1bd010
},
inp_phd = 0xfffffe00cb1bd000,
inp_gencnt = 560249,
inp_lle = 0x0,
inp_rt = 0x0,
---Type <return> to continue, or q <return> to quit---
inp_lock = {
lock_object = {
lo_name = 0xffffffff8071866f "tcpinp",
lo_flags = 90898432,
lo_data = 0,
lo_witness = 0x0
},
rw_lock = 18446741876286327076
}
}
(kgdb)
Looks like the inp struct has been freed (inp_flags2 = 16), but the struct is still referenced somewhere (refcnt = 1)
What do you think?
(In reply to emeric.poupon from comment #5) > Looks like the inp struct has been freed (inp_flags2 = 16), but the struct > is still referenced somewhere (refcnt = 1) > > What do you think? Also it has INP_DROPPED flag. Is there `i lo` available in the frame 10? Or maybe it is possible to dump some info about packet (tcp flags, etc) from mbuf pointer? BTW, the crash occurs on FreeBSD-9.3.
I have that:
(kgdb) f 10
#10 0xffffffff8056b4d1 in tcp_input (m=0xfffffe005fd0da00, off0=20) at ../../../netinet/tcp_input.c:944
944 in ../../../netinet/tcp_input.c
(kgdb) info locals
next_hop = <value optimized out>
th = (struct tcphdr *) 0xfffffe005fd0dacc
ip = (struct ip *) 0xfffffe005fd0dab8
inp = (struct inpcb *) 0xfffffe00c26e9320
tp = (struct tcpcb *) 0x0
so = (struct socket *) 0x0
optp = (u_char *) 0x0
optlen = 0
len = <value optimized out>
tlen = 0
off = 40
thflags = 16
rstreason = <value optimized out>
iptos = 0 '\0'
fwd_tag = (struct m_tag *) 0x0
ip6 = (struct ip6_hdr *) 0x0
isipv6 = 0
to = {
to_flags = 0,
to_tsval = 3406246400,
to_tsecr = 4294966784,
to_sacks = 0x2508040 <Address 0x2508040 out of bounds>,
to_signature = 0x0,
to_mss = 6560,
to_wscale = 236 '�',
to_nsacks = 0 '\0',
to_spare = 4294967168
}
s = <value optimized out>
ti_locked = 1
__func__ = "tcp_input"
This is some great debugging happening here, and I don't want to stifle it, but for future debug logs, can we please use attachments (with appropriate descriptions) so it's easier to follow the conversation in comments Thanks! :) Created attachment 163095 [details]
debug
Comment on attachment 163095 [details]
debug
Here is the attachment with f7 and f8.
This problem looks not related with IPSEC. Add several TCP-people to the CC list :) I found this commit : http://svnweb.freebsd.org/base?view=revision&revision=241129. The comment says the issue is located in_pcbrele_rlocked function, but I think the issue we see here may be related to the very same problem in the in_pcbrele_wlocked function. Maybe someone can explain me more about this problem? Created attachment 163350 [details]
Fix race in in_pcbrele_wlocked
We are now sure there is a race in in_pcbrele_wlocked: the returned inp may have been freed (problem spotted on a test machine)
Please try the attached patch.
Additionally, we would need to check every caller to make sure they check the inp once they unlock/relock the inp
(In reply to emeric.poupon from comment #13) Hi emeric, Did you check my debug? are you sure we are talking about the same issue? Should i test this patch? Have i just recompile kernel or buidworld too? Thanks. Hi, Yes I am talking about the same issue. Just rebuild the kernel with this patch, and please let me know the results. (In reply to emeric.poupon from comment #15) I'll try it now and let you know the results. Thanks. (In reply to emeric.poupon from comment #15) Hi Emeric, Your patch fixed the bug. Thank you very much for your help. My system is now running for 15 hours with no reboot :) Will you commit this patch ASAP? Thank you again. Hi, I am glad you confirm it fixes the problem. It is planned to be committed soon. (In reply to emeric.poupon from comment #18) Please update this PR when it's commited. So i can keep posted :) Thanks again. Assign to committer that resolved. Commit has been merged to stable/10. @Fabien, if this commit doesn't apply to stable/9, please set the mfc-stable9 flag to -, otherwise set to + when merged. (In reply to Kubilay Kocak from comment #21) Thanks guys! batch change of PRs untouched in 2018 marked "in progress" back to open. 10.x has reached EOL. Also there was related commit. The problem should be fixed. |