Bug 193724

Summary: [panic] [tcp] in tcp_discardcb (/usr/src/sys/netinet/tcp_subr.c:929)
Product: Base System
Reporter: Palle Girgensohn <girgen>
Component: kern
Assignee: freebsd-net (Nobody) <net>
Status: Closed FIXED
Severity: Affects Only Me
CC: hiren
Priority: ---
Keywords: crash
Version: 10.0-RELEASE   
Hardware: amd64   
OS: Any   
See Also: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203175

Description Palle Girgensohn freebsd_committer freebsd_triage 2014-09-17 21:58:53 UTC
Hi!

We got a spontaneous reboot on a production system. I have a crash dump; can someone perhaps make use of it to find the actual culprit? I'd rather not see it happen again.

I cannot find anything in the log files.

This is FreeBSD 10.0-RELEASE-p6

Kernel conf:
---
include GENERIC

cpu             HAMMER
ident           CAJA

# Virtual networking for jail
options         VIMAGE

# The nullFS to mount local directory
options         NULLFS
---

[girgen@caja /usr/obj/usr/src/sys/CAJA]$ kgdb kernel.debug /var/crash/vmcore.0
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:


Fatal trap 12: page fault while in kernel mode
cpuid = 22; apic id = 26
fault virtual address	= 0x20
fault code		= supervisor read data, page not present
instruction pointer	= 0x20:0xffffffff80a506e2
stack pointer	        = 0x28:0xfffffe1835e5e780
frame pointer	        = 0x28:0xfffffe1835e5e7f0
code segment		= base rx0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 84230 (httpd)
trap number		= 12
panic: page fault
cpuid = 22
KDB: stack backtrace:
#0 0xffffffff808eb720 at kdb_backtrace+0x60
#1 0xffffffff808b2de5 at panic+0x155
#2 0xffffffff80ca05b2 at trap_fatal+0x3a2
#3 0xffffffff80ca0889 at trap_pfault+0x2c9
#4 0xffffffff80ca0016 at trap+0x5e6
#5 0xffffffff80c872b2 at calltrap+0x8
#6 0xffffffff80a58d6e at tcp_usr_detach+0xde
#7 0xffffffff80923263 at sofree+0x163
#8 0xffffffff809237c2 at soclose+0x362
#9 0xffffffff808718c9 at _fdrop+0x29
#10 0xffffffff80874137 at closef+0x237
#11 0xffffffff80871b95 at closefp+0x95
#12 0xffffffff80ca0ea7 at amd64_syscall+0x357
#13 0xffffffff80c8759b at Xfast_syscall+0xfb
Uptime: 1d19h54m41s
Dumping 13667 out of 98244 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

Reading symbols from /boot/kernel/zfs.ko.symbols...done.
Loaded symbols for /boot/kernel/zfs.ko.symbols
Reading symbols from /boot/kernel/opensolaris.ko.symbols...done.
Loaded symbols for /boot/kernel/opensolaris.ko.symbols
Reading symbols from /boot/kernel/ng_bridge.ko.symbols...done.
Loaded symbols for /boot/kernel/ng_bridge.ko.symbols
Reading symbols from /boot/kernel/netgraph.ko.symbols...done.
Loaded symbols for /boot/kernel/netgraph.ko.symbols
Reading symbols from /boot/kernel/ng_eiface.ko.symbols...done.
Loaded symbols for /boot/kernel/ng_eiface.ko.symbols
Reading symbols from /boot/kernel/ng_ether.ko.symbols...done.
Loaded symbols for /boot/kernel/ng_ether.ko.symbols
Reading symbols from /boot/kernel/accf_data.ko.symbols...done.
Loaded symbols for /boot/kernel/accf_data.ko.symbols
Reading symbols from /boot/kernel/accf_http.ko.symbols...done.
Loaded symbols for /boot/kernel/accf_http.ko.symbols
Reading symbols from /boot/kernel/ums.ko.symbols...done.
Loaded symbols for /boot/kernel/ums.ko.symbols
Reading symbols from /boot/kernel/ng_socket.ko.symbols...done.
Loaded symbols for /boot/kernel/ng_socket.ko.symbols
#0  doadump (textdump=<value optimized out>) at pcpu.h:219
219		__asm("movq %%gs:%1,%0" : "=r" (td)
(kgdb) list *0xffffffff80a506e2
0xffffffff80a506e2 is in tcp_discardcb (/usr/src/sys/netinet/tcp_subr.c:929).
924		 * portion of the remainder of tcp_discardcb() to an asynchronous
925		 * context that can callout_drain() and then continue.  Some care
926		 * will be required to ensure that no further processing takes place
927		 * on the tcpcb, even though it hasn't been freed (a flag?).
928		 */
929		callout_stop(&tp->t_timers->tt_rexmt);
930		callout_stop(&tp->t_timers->tt_persist);
931		callout_stop(&tp->t_timers->tt_keep);
932		callout_stop(&tp->t_timers->tt_2msl);
933		callout_stop(&tp->t_timers->tt_delack);
Current language:  auto; currently minimal
(kgdb) backtrace
#0  doadump (textdump=<value optimized out>) at pcpu.h:219
#1  0xffffffff808b2a60 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:447
#2  0xffffffff808b2e24 in panic (fmt=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:754
#3  0xffffffff80ca05b2 in trap_fatal (frame=<value optimized out>, eva=<value optimized out>)
    at /usr/src/sys/amd64/amd64/trap.c:882
#4  0xffffffff80ca0889 in trap_pfault (frame=0xfffffe1835e5e6d0, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:699
#5  0xffffffff80ca0016 in trap (frame=0xfffffe1835e5e6d0) at /usr/src/sys/amd64/amd64/trap.c:463
#6  0xffffffff80c872b2 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:232
#7  0xffffffff80a506e2 in tcp_discardcb (tp=0x0) at /usr/src/sys/netinet/tcp_subr.c:905
#8  0xffffffff80a58d6e in tcp_usr_detach (so=<value optimized out>) at /usr/src/sys/netinet/tcp_usrreq.c:207
#9  0xffffffff80923263 in sofree (so=0xfffff80934dac570) at /usr/src/sys/kern/uipc_socket.c:735
#10 0xffffffff809237c2 in soclose (so=<value optimized out>) at /usr/src/sys/kern/uipc_socket.c:837
#11 0xffffffff808718c9 in _fdrop (fp=0xfffff8002d9b3050, td=0xfffff80060425000) at file.h:342
#12 0xffffffff80874137 in closef (fp=<value optimized out>, td=<value optimized out>)
    at /usr/src/sys/kern/kern_descrip.c:2310
#13 0xffffffff80871b95 in closefp (fdp=0xfffff802fc70b800, fd=<value optimized out>, fp=0xfffff8002d9b3050, 
    td=0xfffff80060425000, holdleaders=<value optimized out>) at /usr/src/sys/kern/kern_descrip.c:1159
#14 0xffffffff80ca0ea7 in amd64_syscall (td=0xfffff80060425000, traced=0) at subr_syscall.c:134
#15 0xffffffff80c8759b in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:391
#16 0x0000000801e6214a in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb) quit
Comment 1 Palle Girgensohn freebsd_committer freebsd_triage 2014-09-17 22:11:59 UTC
Since we use VIMAGE, which is still "experimental" (although we've used it for two years now, and it usually doesn't crash unless you take an interface down), could there be something there? It was an httpd (Apache 2.2) process, in a jail with a netgraph bridge and VIMAGE, that was active when the page fault occurred.
Comment 2 Hiren Panchasara freebsd_committer freebsd_triage 2015-12-01 08:30:43 UTC
Palle,

Isn't this the one fixed by jch@? If so, we can close this bug.
Comment 3 Palle Girgensohn freebsd_committer freebsd_triage 2015-12-01 09:42:01 UTC
(In reply to Hiren Panchasara from comment #2)

It is not entirely clear to me that this is the same problem, but it might well be. If so, this bug could be marked as a duplicate of PR 203175.