148807 – [panic] "panic: sbdrop" and "panic: sbsndptr: sockbuf _ and mbuf _ clashing" (8.1-RELEASE/10.1-STABLE/11-CURRENT)


Status:	Closed Not Enough Information

Alias:	None

Product:	Base System
Classification:	Unclassified
Component:	kern
Version:	10.1-STABLE
Hardware:	Any Any

Importance:	Normal Affects Only Me
Assignee:	freebsd-net (Nobody)

URL:
Keywords:	crash, needs-qa, patch

Duplicates (1):	147558
Depends on:
Blocks:

Reported:	2010-07-21 07:20 UTC by Mike Andrews
Modified:	2018-09-04 18:00 UTC
CC List:	20 users

See Also:	218270

Flags:	koobs: mfc-stable11? koobs: mfc-stable10? koobs: mfc-stable9?

Attachments
sockbuf_debug-20100813.diff (4.48 KB, patch) 2010-08-13 11:02 UTC, Andre Oppermann


Description Mike Andrews 2010-07-21 07:20:01 UTC

Under heavy load (i.e. enough to make an 8 GB machine start to swap) I'm
seeing multiple (identical) machines panic repeatedly with the above panic
 messages.  The panics go away once load goes down.

Also very occasionally seeing "em0: discard frame w/o packet header" before
the panics, though not very often.

Hardware is five identical Supermicro PDSMI+ systems, Q6600, 8 GB ECC memory.

The only references I'm finding to these panics on Google seem to point at
either IPv6 or em as potential issues, and we're using both.  :)  Specifically
http://groups.google.com/group/mailing.freebsd.net/browse_thread/thread/28db45413a889411
looks VERY similar.  I have not yet tried shutting off IPv6 or at least
switching some services back to IPv4.  (Our v6 usage is all internal-only.)

core.txt.* files are at http://www.bit0.com/tmp/core.txt.20100721.tar.gz

I have minidumps as well but as they may contain some proprietary data
I'd rather not post 'em online, however I can run whatever kgdb commands
are needed to help troubleshoot.  :)

Fix: 

Unknown
How-To-Repeat: See above

Comment 1 Bruce Cran freebsd_committer

2010-07-21 17:30:32 UTC

Responsible Changed
From-To: freebsd-bugs->freebsd-net

Over to maintainer(s).

Comment 2 Andre Oppermann freebsd_committer

2010-08-12 09:24:22 UTC

Responsible Changed
From-To: freebsd-net->andre

Take over.

Comment 3 Andre Oppermann freebsd_committer

2010-08-12 09:35:24 UTC

Mike,

I see that you use ZERO_COPY_SOCKETS in your kernel config, based on the
information in the crash dumps.  ZERO_COPY_SOCKETS may have some bugs
regarding the mbuf and vm page lifecycle.  Their use is not really
supported at the moment and we have highly optimized the normal send
path.  So further optimizations are not really necessary.

Please recompile your kernel without ZERO_COPY_SOCKETS and report whether
you still see sbdrop and sockbuf panics.

Debugging ZERO_COPY_SOCKETS is very difficult because of the complex
interactions between the VM, mbuf and sockbuf systems.

-- 
Andre

Comment 4 Mike Andrews 2010-08-12 17:21:16 UTC

I removed ZERO_COPY_SOCKETS and am still seeing panics.  A new set of 
core.txt files is at http://www.bit0.com/tmp/core.txt.20100812.tar.gz

Comment 5 Andre Oppermann freebsd_committer

2010-08-13 11:02:20 UTC

On 12.08.2010 18:21, Mike Andrews wrote:
> I removed ZERO_COPY_SOCKETS and am still seeing panics.  A new set of
> core.txt files is at http://www.bit0.com/tmp/core.txt.20100812.tar.gz

Please try the attached patch and compile your kernel with INVARIANTS.
It contains some debugging code to catch any corruption to the sockbuf
when it happens and may also include a few potential fixes.

We can narrow it down now.

-- 
Andre

Comment 6 Mike Andrews 2010-08-13 19:25:01 UTC

I'll try this this evening.  Would options SOCKBUF_DEBUG help any?

Comment 7 pluknet 2010-08-14 07:07:00 UTC

On 13 August 2010 14:02, Andre Oppermann <andre@freebsd.org> wrote:
> On 12.08.2010 18:21, Mike Andrews wrote:
>>
>> I removed ZERO_COPY_SOCKETS and am still seeing panics.  A new set of
>> core.txt files is at http://www.bit0.com/tmp/core.txt.20100812.tar.gz
>
> Please try the attached patch and compile your kernel with INVARIANTS.
> It contains some debugging code to catch any corruption to the sockbuf
> when it happens and may also include a few potential fixes.
>
> We can narrow it down now.

[since I was (accidentally) added to the cc: list]
So I tried this patch, and the box panicked around the start of multiuser:

My testbox has no swapspace, no debug symbols this time :(

panic: sbflush_internal: sb_cc != total mbuf length

db> bt
Tracing pid 982 tid 100111 td 0xffffff0008032440
kdb_enter() at kdb_enter+0x3d
panic() at panic+0x17b
sbflush_internal() at sbflush_internal+0x98
sbrelease_internal() at sbrelease_internal+0x1c
sofree() at sofree+0x1bb
soclose() at soclose+0x32b
_fdrop() at _fdrop+0x23
closef() at closef+0x5b
kern_close() at kern_close+0x110
syscallenter() at syscallenter+0x1aa
syscall() at syscall+0x4c
Xfast_syscall() at Xfast_syscall+0xe2
--- syscall (6, FreeBSD ELF64, close), rip = 0x800742fbc, rsp = 0x7fffffffd2f8, rbp = 0x800c07060 ---
db> show proc 982
Process 982 (ypbind) at 0xffffff0008034000:
 state: NORMAL
 uid: 0  gids: 0
 parent: pid 980 at 0xffffff00080358c0
 ABI: FreeBSD ELF64
 arguments: /usr/sbin/ypbind
 threads: 1
100111                   Run     CPU 1                       ypbind

-- 
wbr,
pluknet

Comment 8 mwisnicki 2010-08-24 21:24:34 UTC

Same thing happened to me today. I don't have ZERO_COPY_SOCKETS,
however I use IPv6.
Until today my system was completely solid (no panics for weeks/months
of uptime), and the only change I made recently was enabling ALTQ in
pf.conf a week ago, but it had been working for a couple of days with no
ill effects.

The panic happened when I resumed a Win7 client and opened a web browser.
I'm running Squid-3.1 with pf transparent redirection.

System: FreeBSD 8.1-PRERELEASE #4: Wed Jul 14 21:47:49 CEST 2010
Config: GENERIC - (everything that can go into kld) + SW_WATCHDOG +
DEVICE_POLLING (not used) + ALTQ + DDB
KLDs: kernel vesa.ko geom_journal.ko geom_label.ko if_rl.ko miibus.ko
if_vr.ko snd_via8233.ko sound.ko usb.ko ukbd.ko ums.ko umass.ko cam.ko
agp.ko uhci.ko ehci.ko kbdmux.ko geom_part_gpt.ko atapicam.ko
if_bridge.ko bridgestp.ko wlan_wep.ko wlan.ko wlan_tkip.ko wlan_ccmp.ko
wlan_xauth.ko wlan_acl.ko cpufreq.ko netgraph.ko aio.ko sem.ko acpi.ko
geom_eli.ko crypto.ko zlib.ko procfs.ko pseudofs.ko linprocfs.ko
linux.ko nullfs.ko pf.ko if_tun.ko ng_ether.ko ng_pppoe.ko ng_socket.ko
if_stf.ko nfsclient.ko nfs_common.ko krpc.ko nfsserver.ko nfssvc.ko
nfslockd.ko ng_mppc.ko rc4.ko fuse.ko accf_http.ko accf_data.ko

Dmesg:

Unread portion of the kernel message buffer:
panic: sbsndptr: sockbuf 0xc6ea65bc and mbuf 0xc6248100 clashing
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper(c0764c90,e4208910,c04fa0b9,c077f5e6,0,...) at
db_trace_self_wrapper+0x26
kdb_backtrace(c077f5e6,0,c0767e81,e420891c,0,...) at kdb_backtrace+0x29
panic(c0767e81,c0749504,c6ea65bc,c6248100,c4eb24f0,...) at panic+0x119
sbsndptr(c6ea65bc,0,4fd,e42089cc,c4f56ef8,...) at sbsndptr+0xa6
tcp_output(c4eb24f0,0,0,c3e45800,c47b0700,...) at tcp_output+0xc84
tcp_do_segment(c4eb24f0,28,0,0,2,...) at tcp_do_segment+0x1f45
tcp_input(c6079d00,14,c3e45800,1,0,...) at tcp_input+0x11d0
ip_input(c6079d00,e4208bb0,80246,24,e4208bd8,...) at ip_input+0x6e5
netisr_dispatch_src(1,0,c6079d00,e4208c00,c05abb61,...) at
netisr_dispatch_src+0x89
netisr_dispatch(1,c6079d00,0,c3e45800,c62d8808,...) at netisr_dispatch+0x20
ether_demux(c3e45800,c6079d00,3,0,3,...) at ether_demux+0x161
ether_input(c3e45800,c6079d00,c0940756,587,c3e8eaec,...) at ether_input+0x323
vr_rxeof(c3e8eaec,0,c0940756,68b,c3e8eaec,...) at vr_rxeof+0x219
vr_intr(c3e8e000,0,109,801bafb1,c8ac,...) at vr_intr+0x114
intr_event_execute_handlers(c3d407f8,c3d3d300,c075f556,52d,c3d3d370,...)
at intr_event_execute_handlers+0x14b
ithread_loop(c3e1a0e0,e4208d38,7afffffb,ddfffbff,84ff77ff,...) at
ithread_loop+0x6b
fork_exit(c04d0fd0,c3e1a0e0,e4208d38) at fork_exit+0x90
fork_trampoline() at fork_trampoline+0x8
--- trap 0, eip = 0, esp = 0xe4208d70, ebp = 0 ---
Uptime: 1d17h48m27s
Physical memory: 1007 MB

Kgdb:

(kgdb) bt
#0  doadump () at pcpu.h:230
#1  0xc04f9e57 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:416
#2  0xc04fa0f5 in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:590
#3  0xc0558126 in sbsndptr (sb=0xc6ea65bc, off=0,
len=dwarf2_read_address: Corrupted DWARF expression.
) at /usr/src/sys/kern/uipc_sockbuf.c:954
#4  0xc0636c04 in tcp_output (tp=0xc4eb24f0) at
/usr/src/sys/netinet/tcp_output.c:817
#5  0xc0633a85 in tcp_do_segment (m=0xc6079d00, th=0xc62d882a,
so=0xc6ea64d4, tp=0xc4eb24f0, drop_hdrlen=40, tlen=0, iptos=0 '\0',
ti_locked=2)
    at /usr/src/sys/netinet/tcp_input.c:2693
#6  0xc0635080 in tcp_input (m=0xc6079d00, off0=20) at
/usr/src/sys/netinet/tcp_input.c:1029
#7  0xc05cc6f5 in ip_input (m=0xc6079d00) at /usr/src/sys/netinet/ip_input.c:793
#8  0xc05aed89 in netisr_dispatch_src (proto=1, source=0,
m=0xc6079d00) at /usr/src/sys/net/netisr.c:917
#9  0xc05af050 in netisr_dispatch (proto=1, m=0xc6079d00) at
/usr/src/sys/net/netisr.c:1004
#10 0xc05abb61 in ether_demux (ifp=0xc3e45800, m=0xc6079d00) at
/usr/src/sys/net/if_ethersubr.c:901
#11 0xc05ac0c3 in ether_input (ifp=0xc3e45800, m=0xc6079d00) at
/usr/src/sys/net/if_ethersubr.c:760
#12 0xc093bea9 in vr_rxeof (sc=0xc3e8e000) at
/usr/src/sys/modules/vr/../../dev/vr/if_vr.c:1416
#13 0xc093f374 in vr_intr (arg=0xc3e8e000) at
/usr/src/sys/modules/vr/../../dev/vr/if_vr.c:1710
#14 0xc04cf94b in intr_event_execute_handlers (p=0xc3d407f8,
ie=0xc3d3d300) at /usr/src/sys/kern/kern_intr.c:1220
#15 0xc04d103b in ithread_loop (arg=0xc3e1a0e0) at
/usr/src/sys/kern/kern_intr.c:1233
#16 0xc04cd1d0 in fork_exit (callout=0xc04d0fd0 <ithread_loop>,
arg=0xc3e1a0e0, frame=0xe4208d38) at /usr/src/sys/kern/kern_fork.c:844
#17 0xc070eb94 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:273

(kgdb) frame 3
#3  0xc0558126 in sbsndptr (sb=0xc6ea65bc, off=0,
len=dwarf2_read_address: Corrupted DWARF expression.
) at /usr/src/sys/kern/uipc_sockbuf.c:954
954                     panic("%s: sockbuf %p and mbuf %p clashing",
__func__, sb, ret);

(kgdb) p *sb
$1 = {sb_sel = {si_tdlist = {tqh_first = 0x0, tqh_last = 0xc6ea65bc},
si_note = {kl_list = {slh_first = 0x0}, kl_lock = 0xc04c6910
<knlist_mtx_lock>,
      kl_unlock = 0xc04c68c0 <knlist_mtx_unlock>, kl_assert_locked =
0xc04c35e0 <knlist_mtx_assert_locked>,
      kl_assert_unlocked = 0xc04c35f0 <knlist_mtx_assert_unlocked>,
kl_lockarg = 0xc6ea65e0}, si_mtx = 0xc4490448}, sb_mtx = {lock_object
= {
      lo_name = 0xc0767ab5 "so_snd", lo_flags = 16973824, lo_data = 0,
lo_witness = 0x0}, mtx_lock = 3285714816}, sb_sx = {lock_object = {
      lo_name = 0xc0767f89 "so_snd_sx", lo_flags = 36896768, lo_data =
0, lo_witness = 0x0}, sx_lock = 1}, sb_state = 0, sb_mb = 0xc6248100,
  sb_mbtail = 0xc6292a00, sb_lastrecord = 0xc6248100, sb_sndptr = 0x0,
sb_sndptroff = 601, sb_cc = 1277, sb_hiwat = 33580, sb_mbcnt = 2048,
sb_mcnt = 8,
  sb_ccnt = 0, sb_mbmax = 262144, sb_ctl = 0, sb_lowat = 2048,
sb_timeo = 0, sb_flags = 2048, sb_upcall = 0, sb_upcallarg = 0x0}

(kgdb) p *buf
$2 = {b_bufobj = 0xc6835b20, b_bcount = 16384, b_caller1 = 0x0,
  b_data = 0xd7f90000 "<unreadable garbage>"..., b_error = 0, b_iocmd
= 1 '\001', b_ioflags = 2 '\002',
  b_iooffset = 469651013632, b_resid = 0, b_iodone = 0, b_blkno =
917287136, b_offset = 72007680, b_bobufs = {tqe_next = 0xd7e1de00,
    tqe_prev = 0xd7d52888}, b_left = 0x0, b_right = 0x0, b_vflags = 0,
b_freelist = {tqe_next = 0xd7d90f20, tqe_prev = 0xd7df1adc}, b_qindex
= 1,
  b_flags = 805306400, b_xflags = 2 '\002', b_lock = {lock_object =
{lo_name = 0xc0768b4a "bufwait", lo_flags = 91422720, lo_data = 0,
lo_witness = 0x0},
    lk_lock = 1, lk_timo = 0, lk_pri = 80}, b_bufsize = 16384,
b_runningbufspace = 0,
  b_kvabase = 0xd7f90000 "<unreadable garbage>"..., b_kvasize = 16384,
b_lblkno = 4395, b_vp = 0xc6835a78,
  b_dirtyoff = 0, b_dirtyend = 0, b_rcred = 0x0, b_wcred = 0x0,
b_saveaddr = 0xd7f90000, b_pager = {pg_reqpage = 0}, b_cluster =
{cluster_head = {
      tqh_first = 0xd7ef93a0, tqh_last = 0xd7dfd1b0}, cluster_entry =
{tqe_next = 0xd7ef93a0, tqe_prev = 0xd7dfd1b0}}, b_pages =
{0xc1b6e200, 0xc150bde8,
    0xc150be30, 0xc1ba1c08, 0x0 <repeats 28 times>}, b_npages = 4,
b_dep = {lh_first = 0x0}, b_fsprivate1 = 0x0, b_fsprivate2 = 0x0,
b_fsprivate3 = 0x0,
  b_pin_count = 0}


"<unreadable garbage>" were inserted by me in place of contents.
I wonder why gdb never fully works with anything above -O0
(variables not available, corrupted DWARF, etc.). I didn't have this
problem in the distant past (FreeBSD 5.x, I think).

Comment 9 Mike Andrews 2010-09-09 18:20:23 UTC

With the patch and with INVARIANTS it's harder to reproduce but still 
possible.  I had a few crashes that hung while attempting to write dumps.

Today I had this one on a different machine (same kernel though) that 
was NOT under significant load but shares other characteristics -- same 
hardware, same use of IPv6, etc.

http://www.bit0.com/tmp/core.beer.2010-09-09.txt.gz

Comment 10 Mike Andrews 2010-09-16 02:53:19 UTC

Got a panic when attempting to manually reboot just now.
During the shutdown:

panic: sbflush_internal: sb_cc != total mbuf length
cpuid = 1
Uptime: 1d6h22m22s
Physical memory: 8179 MB
Dumping 3250 MB: 3235 3219 3203

...and it never finished the dump, it just hung at 3203.

I'm going to try running today's 8.1-STABLE instead of -RELEASE.
We'll see if that helps.

Comment 11 pluknet 2010-09-28 09:20:23 UTC

hi there,

If my eyes do not deceive me, the source of my panic
posted above was a wrong check behind the KASSERT().
The line slen = m_length(n, &n) should use '+=', like this:

#ifdef INVARIANTS
static int
sb_cc_check(struct sockbuf *sb)
{
        struct mbuf *n = sb->sb_mb;
        int slen = 0;

        while (n) {
                slen += m_length(n, &n);
                n = n->m_nextpkt;
        }

        return (slen == sb->sb_cc ? 1 : 0);
}
#endif
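
To make the difference between '=' and '+=' concrete, here is a minimal userland sketch. It is not the kernel code or the patch itself: struct toy_mbuf and the toy_* functions are hypothetical stand-ins that only model records linked through m_nextpkt and buffers linked through m_next.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mbuf: only the fields the check needs. */
struct toy_mbuf {
	struct toy_mbuf *m_next;    /* next buffer in the same record */
	struct toy_mbuf *m_nextpkt; /* next record, set on a record's first buffer */
	int m_len;                  /* bytes of data held by this buffer */
};

/* Add up m_len over one record, following m_next. */
static int
toy_record_length(const struct toy_mbuf *m)
{
	int len = 0;

	for (; m != NULL; m = m->m_next) {
		assert(m->m_len >= 0);
		len += m->m_len;
	}
	return (len);
}

/* Variant with '=': each record overwrites the previous total. */
static int
toy_total_assign(const struct toy_mbuf *rec)
{
	int slen = 0;

	for (; rec != NULL; rec = rec->m_nextpkt)
		slen = toy_record_length(rec);
	return (slen);
}

/* Variant with '+=': the total covers every record. */
static int
toy_total_accumulate(const struct toy_mbuf *rec)
{
	int slen = 0;

	for (; rec != NULL; rec = rec->m_nextpkt)
		slen += toy_record_length(rec);
	return (slen);
}
```

With two queued records of 150 and 30 bytes, the '+=' variant yields 180 while the '=' variant yields 30. With a single record both agree, so in this model the difference only shows up when more than one record is queued.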

Comment 12 Andrey V. Elsukov freebsd_committer

2015-02-02 19:40:01 UTC

We recently hit this panic twice, just after reboot, on 9.3-STABLE, so I think high load isn't needed to reproduce it. I found that the TCP connection that triggered the panic was SSH over IPv6.

Unread portion of the kernel message buffer:
panic: sbsndptr: sockbuf 0xfffffe01505826d0 and mbuf 0xfffffe015043b000 clashing
cpuid = 30
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a/frame 0xffffff90c310d430
kdb_backtrace() at kdb_backtrace+0x37/frame 0xffffff90c310d4f0
panic() at panic+0x1ce/frame 0xffffff90c310d5f0
sbsndptr() at sbsndptr+0xe4/frame 0xffffff90c310d610
tcp_output() at tcp_output+0x16cd/frame 0xffffff90c310d7c0
tcp_usr_send() at tcp_usr_send+0x325/frame 0xffffff90c310d820
sosend_generic() at sosend_generic+0x3f6/frame 0xffffff90c310d8c0
soo_write() at soo_write+0x5e/frame 0xffffff90c310d8f0
dofilewrite() at dofilewrite+0x85/frame 0xffffff90c310d940
kern_writev() at kern_writev+0x6c/frame 0xffffff90c310d980
sys_write() at sys_write+0x64/frame 0xffffff90c310d9d0
amd64_syscall() at amd64_syscall+0x5ea/frame 0xffffff90c310daf0
Xfast_syscall() at Xfast_syscall+0xf7/frame 0xffffff90c310daf0

(kgdb) bt
#0  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:271
#1  0xffffffff80907eb4 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:454
#2  0xffffffff809083a7 in panic (fmt=0x1 <Address 0x1 out of bounds>) at /usr/src/sys/kern/kern_shutdown.c:642
#3  0xffffffff809766e4 in sbsndptr (sb=<value optimized out>, off=<value optimized out>, len=<value optimized out>, moff=<value optimized out>)
    at /usr/src/sys/kern/uipc_sockbuf.c:985
#4  0xffffffff80aaedbd in tcp_output (tp=0xfffffe0ba665a3d0) at /usr/src/sys/netinet/tcp_output.c:954
#5  0xffffffff80abc555 in tcp_usr_send (so=0xfffffe0150582550, flags=0, m=0xfffffe015043b000, nam=0x0, control=<value optimized out>, td=0xfffffe0021d91920)
    at /usr/src/sys/netinet/tcp_usrreq.c:874
#6  0xffffffff8097c1f6 in sosend_generic (so=0xfffffe0150582550, addr=0x0, uio=0xffffff90c310d990, top=0xfffffe015043b000, control=0x0, flags=<value optimized out>, 
    td=0xfffffe0021d91920) at /usr/src/sys/kern/uipc_socket.c:1376
#7  0xffffffff8095ea6e in soo_write (fp=<value optimized out>, uio=0xffffff90c310d990, active_cred=<value optimized out>, flags=<value optimized out>, 
    td=<value optimized out>) at /usr/src/sys/kern/sys_socket.c:102
#8  0xffffffff80957195 in dofilewrite (td=0xfffffe0021d91920, fd=3, fp=0xfffffe00216fedc0, auio=0xffffff90c310d990, offset=<value optimized out>, flags=0) at file.h:295
#9  0xffffffff809574cc in kern_writev (td=0xfffffe0021d91920, fd=3, auio=0xffffff90c310d990) at /usr/src/sys/kern/sys_generic.c:477
#10 0xffffffff80957554 in sys_write (td=<value optimized out>, uap=<value optimized out>) at /usr/src/sys/kern/sys_generic.c:393
#11 0xffffffff80cfea4a in amd64_syscall (td=0xfffffe0021d91920, traced=0) at subr_syscall.c:135
#12 0xffffffff80ce8ac7 in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:391
#13 0x0000000802da3bec in ?? ()

(kgdb) p *(struct sockbuf *)0xfffffe01505826d0
$1 = {sb_sel = {si_tdlist = {tqh_first = 0x0, tqh_last = 0x0}, si_note = {kl_list = {slh_first = 0x0}, kl_lock = 0xffffffff808cd0c0 <knlist_mtx_lock>, 
      kl_unlock = 0xffffffff808cd090 <knlist_mtx_unlock>, kl_assert_locked = 0xffffffff808c9a10 <knlist_mtx_assert_locked>, 
      kl_assert_unlocked = 0xffffffff808c9a20 <knlist_mtx_assert_unlocked>, kl_lockarg = 0xfffffe0150582718}, si_mtx = 0x0}, sb_mtx = {lock_object = {
      lo_name = 0xffffffff80f3e7fd "so_snd", lo_flags = 16973824, lo_data = 0, lo_witness = 0x0}, mtx_lock = 18446741875254171936}, sb_sx = {lock_object = {
      lo_name = 0xffffffff80f3ed6b "so_snd_sx", lo_flags = 36896768, lo_data = 0, lo_witness = 0x0}, sx_lock = 18446741875254171936}, sb_state = 0, 
  sb_mb = 0xfffffe015043b000, sb_mbtail = 0xfffffe015043b000, sb_lastrecord = 0xfffffe015043b000, sb_sndptr = 0x0, sb_sndptroff = 72, sb_cc = 84, sb_hiwat = 131376, 
  sb_mbcnt = 256, sb_mcnt = 1, sb_ccnt = 0, sb_mbmax = 1051008, sb_ctl = 0, sb_lowat = 2048, sb_timeo = 0, sb_flags = 2048, sb_upcall = 0, sb_upcallarg = 0x0}
(kgdb) p *(struct mbuf *)0xfffffe015043b000
$2 = {m_hdr = {mh_next = 0x0, mh_nextpkt = 0x0, mh_data = 0xfffffe015043b068 "�\223ykt?--�L\213\203)�>=�\230�\227", mh_len = 72, mh_flags = 2, mh_type = 1, 
    pad = "\000\000\000\000\000"}, M_dat = {MH = {MH_pkthdr = {rcvif = 0x0, header = 0x0, len = 0, flowid = 0, csum_flags = 0, csum_data = 0, tso_segsz = 0, PH_vt = {
          vt_vtag = 0, vt_nrecs = 0}, tags = {slh_first = 0x0}}, MH_dat = {MH_ext = {ext_buf = 0x1a17cf9646fc61e6 <Address 0x1a17cf9646fc61e6 out of bounds>, 
          ext_free = 0x401596cd1b508c86, ext_arg1 = 0x2d2d3f746b7993d6, ext_arg2 = 0x3d3ee229838b4cb5, ext_size = 2544539875, ref_cnt = 0x1040000b806022a, 
          ext_type = -11722238}, 
        MH_databuf = "�a�F\226�\027\032\206\214P\033�\226\025@�\223ykt?--�L\213\203)�>=�\230�\227\000\000\001R*\002\006�\000\000\004\001\002\"M��P�/\000\026�2\030��U?\003\203�\200\030\b\004��\000\000\001\001\b\nA\022�l\v��\b\000\000\000 ���\034R\\\002tn�b!�\202(}\177Y���\005#j�n�\024\232\224\004}�Rq��\203�\001��맠��38.137.198 via vlan802\b\210\220\215\b\210\216"}}, 
    M_databuf = '\0' <repeats 36 times>, "\033�d\237\000\000\000\000\000\000\000\000�a�F\226�\027\032\206\214P\033�\226\025@�\223ykt?--�L\213\203)�>=�\230�\227\000\000\001R*\002\006�\000\000\004\001\002\"M��P�/\000\026�2\030��U?\003\203�\200\030\b\004��\000\000\001\001\b\nA\022�l\v��\b\000\000\000 ���\034R\\\002tn�b!�\202(}\177Y���\005#j�n�\024\232\224\004}�Rq��\203�\001��맠��38.137.198 via vlan802\b\210\220\215\b\210\216"}}
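
Reading the two dumps above together: the send buffer reports sb_cc = 84 and sb_mcnt = 1, while its only mbuf has mh_len = 72 and mh_next = 0x0, so the byte counter and the chain disagree. The following is a minimal userland sketch, not the kernel's sbsndptr(), of why a lookup for an offset that the counter promises can run off the end of such a chain. struct toy_seg and the toy_* functions are hypothetical names used for illustration only, and this report does not establish whether the mismatch is the root cause or just a symptom.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one mbuf in a send-buffer chain. */
struct toy_seg {
	struct toy_seg *m_next; /* next buffer in the chain */
	unsigned m_len;         /* bytes of data held by this buffer */
};

/*
 * Walk the chain looking for the buffer that holds byte 'off'.
 * Returns NULL when 'off' lies beyond the data the chain really holds.
 */
static const struct toy_seg *
toy_find_offset(const struct toy_seg *m, unsigned off)
{
	while (m != NULL && off >= m->m_len) {
		off -= m->m_len;
		m = m->m_next;
	}
	return (m);
}

/* Return 1 when the byte counter 'cc' agrees with the chain, else 0. */
static int
toy_counter_matches(const struct toy_seg *m, unsigned cc)
{
	unsigned total = 0;

	for (; m != NULL; m = m->m_next)
		total += m->m_len;
	return (total == cc);
}

/* Values read from the dump above: one mbuf, mh_len = 72, sb_cc = 84. */
static void
toy_demo(void)
{
	struct toy_seg only = { NULL, 72 };

	assert(toy_find_offset(&only, 71) == &only); /* last byte really present */
	assert(toy_find_offset(&only, 83) == NULL);  /* last byte the counter promises */
	assert(!toy_counter_matches(&only, 84));
}
```

In this model, any request that reaches past byte 71 finds no buffer, which is the kind of inconsistency the "clashing" panic message reports.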

(kgdb) f 6
#6  0xffffffff8097c1f6 in sosend_generic (so=0xfffffe0150582550, addr=0x0, uio=0xffffff90c310d990, top=0xfffffe015043b000, control=0x0, flags=<value optimized out>, 
    td=0xfffffe0021d91920) at /usr/src/sys/kern/uipc_socket.c:1376
1376				error = (*so->so_proto->pr_usrreqs->pru_send)(so,
(kgdb) p *so
$3 = {so_count = 1, so_type = 1, so_options = 12, so_linger = 0, so_state = 258, so_qstate = 0, so_pcb = 0xfffffe0ba677aaf0, so_vnet = 0x0, 
  so_proto = 0xffffffff8143c3f0, so_head = 0x0, so_incomp = {tqh_first = 0x0, tqh_last = 0x0}, so_comp = {tqh_first = 0x0, tqh_last = 0x0}, so_list = {tqe_next = 0x0, 
    tqe_prev = 0xfffffe01505862e8}, so_qlen = 0, so_incqlen = 0, so_qlimit = 0, so_timeo = 0, so_error = 0, so_sigio = 0x0, so_oobmark = 0, so_aiojobq = {
    tqh_first = 0x0, tqh_last = 0xfffffe01505825d0}, so_rcv = {sb_sel = {si_tdlist = {tqh_first = 0x0, tqh_last = 0xfffffe01505825e0}, si_note = {kl_list = {
          slh_first = 0x0}, kl_lock = 0xffffffff808cd0c0 <knlist_mtx_lock>, kl_unlock = 0xffffffff808cd090 <knlist_mtx_unlock>, 
        kl_assert_locked = 0xffffffff808c9a10 <knlist_mtx_assert_locked>, kl_assert_unlocked = 0xffffffff808c9a20 <knlist_mtx_assert_unlocked>, 
        kl_lockarg = 0xfffffe0150582628}, si_mtx = 0xffffff800e02f2f0}, sb_mtx = {lock_object = {lo_name = 0xffffffff80f3e7f6 "so_rcv", lo_flags = 16973824, 
        lo_data = 0, lo_witness = 0x0}, mtx_lock = 4}, sb_sx = {lock_object = {lo_name = 0xffffffff80f3ed75 "so_rcv_sx", lo_flags = 36896768, lo_data = 0, 
        lo_witness = 0x0}, sx_lock = 1}, sb_state = 0, sb_mb = 0x0, sb_mbtail = 0x0, sb_lastrecord = 0x0, sb_sndptr = 0x0, sb_sndptroff = 0, sb_cc = 0, 
    sb_hiwat = 131376, sb_mbcnt = 0, sb_mcnt = 0, sb_ccnt = 0, sb_mbmax = 1051008, sb_ctl = 0, sb_lowat = 1, sb_timeo = 0, sb_flags = 2056, sb_upcall = 0, 
    sb_upcallarg = 0x0}, so_snd = {sb_sel = {si_tdlist = {tqh_first = 0x0, tqh_last = 0x0}, si_note = {kl_list = {slh_first = 0x0}, 
        kl_lock = 0xffffffff808cd0c0 <knlist_mtx_lock>, kl_unlock = 0xffffffff808cd090 <knlist_mtx_unlock>, 
        kl_assert_locked = 0xffffffff808c9a10 <knlist_mtx_assert_locked>, kl_assert_unlocked = 0xffffffff808c9a20 <knlist_mtx_assert_unlocked>, 
        kl_lockarg = 0xfffffe0150582718}, si_mtx = 0x0}, sb_mtx = {lock_object = {lo_name = 0xffffffff80f3e7fd "so_snd", lo_flags = 16973824, lo_data = 0, 
        lo_witness = 0x0}, mtx_lock = 18446741875254171936}, sb_sx = {lock_object = {lo_name = 0xffffffff80f3ed6b "so_snd_sx", lo_flags = 36896768, lo_data = 0, 
        lo_witness = 0x0}, sx_lock = 18446741875254171936}, sb_state = 0, sb_mb = 0xfffffe015043b000, sb_mbtail = 0xfffffe015043b000, 
    sb_lastrecord = 0xfffffe015043b000, sb_sndptr = 0x0, sb_sndptroff = 72, sb_cc = 84, sb_hiwat = 131376, sb_mbcnt = 256, sb_mcnt = 1, sb_ccnt = 0, sb_mbmax = 1051008, 
    sb_ctl = 0, sb_lowat = 2048, sb_timeo = 0, sb_flags = 2048, sb_upcall = 0, sb_upcallarg = 0x0}, so_cred = 0xfffffe0150293100, so_label = 0x0, so_peerlabel = 0x0, 
  so_gencnt = 172891, so_emuldata = 0x0, so_accf = 0x0, so_fibnum = 0, so_user_cookie = 0}

(kgdb) set $inp=(struct inpcb*)so->so_pcb
(kgdb) p *$inp
$4 = {inp_hash = {le_next = 0x0, le_prev = 0xfffffe0012f570b8}, inp_pcbgrouphash = {le_next = 0x0, le_prev = 0x0}, inp_list = {le_next = 0xfffffe05c1d22c80, 
    le_prev = 0xffffffff81531050}, inp_ppcb = 0xfffffe0ba665a3d0, inp_pcbinfo = 0xffffffff81531060, inp_pcbgroup = 0x0, inp_pcbgroup_wild = {le_next = 0x0, 
    le_prev = 0x0}, inp_socket = 0xfffffe0150582550, inp_cred = 0xfffffe0150293100, inp_flow = 3067088640, inp_flags = 545300480, inp_flags2 = 0, inp_vflag = 6 '\006', 
  inp_ip_ttl = 64 '@', inp_ip_p = 0 '\0', inp_ip_minttl = 0 '\0', inp_flowid = 576216547, inp_refcount = 1, inp_pspare = {0x0, 0x0, 0x0, 0x0, 0x0}, inp_ispare = {0, 0, 
    0, 0, 0, 0}, inp_inc = {inc_flags = 1 '\001', inc_len = 0 '\0', inc_fibnum = 0, inc_ie = {ie_fport = 13015, ie_lport = 5632, ie_dependfaddr = {ie46_foreign = {
          ia46_pad32 = {3087401514, 17039360, 4283245058}, ia46_addr4 = {s_addr = 801984766}}, ie6_foreign = {__u6_addr = {
            __u6_addr8 = "*\002\006�\000\000\004\001\002\"M��P�/", __u6_addr16 = {554, 47110, 0, 260, 8706, 65357, 20734, 12237}, __u6_addr32 = {3087401514, 17039360, 
              4283245058, 801984766}}}}, ie_dependladdr = {ie46_local = {ia46_pad32 = {3087401514, 917504, 0}, ia46_addr4 = {s_addr = 1375797248}}, ie6_local = {
          __u6_addr = {__u6_addr8 = "*\002\006�\000\000\016\000\000\000\000\000\000\000\001R", __u6_addr16 = {554, 47110, 0, 14, 0, 0, 0, 20993}, __u6_addr32 = {
              3087401514, 917504, 0, 1375797248}}}}, ie6_zoneid = 0}}, inp_label = 0x0, inp_sp = 0x0, inp_depend4 = {inp4_ip_tos = 0 '\0', inp4_options = 0x0, 
    inp4_moptions = 0x0}, inp_depend6 = {inp6_options = 0x0, inp6_outputopts = 0xfffffe0b8794c100, inp6_moptions = 0x0, inp6_icmp6filt = 0x0, inp6_cksum = 0, 
    inp6_hops = -1}, inp_portlist = {le_next = 0xfffffe05c1d22c80, le_prev = 0xfffffe010b8830b0}, inp_phd = 0xfffffe010b8830a0, inp_gencnt = 2008, inp_lle = 0x0, 
  inp_rt = 0x0, inp_lock = {lock_object = {lo_name = 0xffffffff80f59235 "tcpinp", lo_flags = 90898432, lo_data = 0, lo_witness = 0x0}, rw_lock = 18446741875254171936}}

(kgdb) printf "%x\n", $inp->inp_flags
2080a000

(kgdb) set $inc=$inp->inp_inc
(kgdb) p $inc
$6 = {inc_flags = 1 '\001', inc_len = 0 '\0', inc_fibnum = 0, inc_ie = {ie_fport = 13015, ie_lport = 5632, ie_dependfaddr = {ie46_foreign = {ia46_pad32 = {3087401514, 
          17039360, 4283245058}, ia46_addr4 = {s_addr = 801984766}}, ie6_foreign = {__u6_addr = {__u6_addr8 = "*\002\006�\000\000\004\001\002\"M��P�/", __u6_addr16 = {
            554, 47110, 0, 260, 8706, 65357, 20734, 12237}, __u6_addr32 = {3087401514, 17039360, 4283245058, 801984766}}}}, ie_dependladdr = {ie46_local = {
        ia46_pad32 = {3087401514, 917504, 0}, ia46_addr4 = {s_addr = 1375797248}}, ie6_local = {__u6_addr = {
          __u6_addr8 = "*\002\006�\000\000\016\000\000\000\000\000\000\000\001R", __u6_addr16 = {554, 47110, 0, 14, 0, 0, 0, 20993}, __u6_addr32 = {3087401514, 917504, 
            0, 1375797248}}}}, ie6_zoneid = 0}}

(kgdb) printf "%04x\n", $inc.inc_ie->ie_lport
1600
(kgdb) printf "%d\n", 0x0016
22
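
The two printf commands above convert ie_lport by hand. The field is kept in network byte order, so on this little-endian amd64 machine kgdb prints it as 5632 (0x1600), and swapping the two bytes gives 0x0016 = 22, the SSH port. The sketch below shows the same swap in C; toy_swap16() is a hypothetical helper written out so the result does not depend on the host's byte order, whereas real code would normally call ntohs(). The same swap applied to the first two __u6_addr16 values in the dump gives the leading groups of the IPv6 address.

```c
#include <assert.h>
#include <stdint.h>

/* Swap the two bytes of a 16-bit value, independent of host endianness. */
static uint16_t
toy_swap16(uint16_t v)
{
	return ((uint16_t)((v << 8) | (v >> 8)));
}

/* Values copied from the kgdb output above. */
static void
toy_demo(void)
{
	assert(toy_swap16(5632) == 22);      /* ie_lport: 0x1600 -> 0x0016 */
	assert(toy_swap16(554) == 0x2a02);   /* __u6_addr16[0] */
	assert(toy_swap16(47110) == 0x06b8); /* __u6_addr16[1] */
}
```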

Comment 13 Andrey V. Elsukov freebsd_committer

2015-02-02 19:44:07 UTC

Reassign to freebsd-net.

Comment 14 Andrey V. Elsukov freebsd_committer

2015-02-02 19:57:43 UTC

Second panic:

panic: sbsndptr: sockbuf 0xfffffe03e62b5c20 and mbuf 0xfffffe01d8fd3900 clashing
cpuid = 31
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a/frame 0xffffff90d4fca430
kdb_backtrace() at kdb_backtrace+0x37/frame 0xffffff90d4fca4f0
panic() at panic+0x1ce/frame 0xffffff90d4fca5f0
sbsndptr() at sbsndptr+0xe4/frame 0xffffff90d4fca610
tcp_output() at tcp_output+0x16cd/frame 0xffffff90d4fca7c0
tcp_usr_send() at tcp_usr_send+0x325/frame 0xffffff90d4fca820
sosend_generic() at sosend_generic+0x3f6/frame 0xffffff90d4fca8c0
soo_write() at soo_write+0x5e/frame 0xffffff90d4fca8f0
dofilewrite() at dofilewrite+0x85/frame 0xffffff90d4fca940
kern_writev() at kern_writev+0x6c/frame 0xffffff90d4fca980
sys_write() at sys_write+0x64/frame 0xffffff90d4fca9d0
amd64_syscall() at amd64_syscall+0x5ea/frame 0xffffff90d4fcaaf0
Xfast_syscall() at Xfast_syscall+0xf7/frame 0xffffff90d4fcaaf0
--- syscall (4, FreeBSD ELF64, sys_write), rip = 0x802da3bec, rsp = 0x7fffffffdae8, rbp = 0x7fffffffdbf0 ---
Uptime: 1m48s
Dumping 3468 out of 65475 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

Reading symbols from /boot/kernel/zfs.ko...Reading symbols from /boot/kernel/zfs.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/zfs.ko
Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from /boot/kernel/opensolaris.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/opensolaris.ko
Reading symbols from /boot/kernel/if_igb.ko...Reading symbols from /boot/kernel/if_igb.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/if_igb.ko
Reading symbols from /boot/kernel/aac.ko...Reading symbols from /boot/kernel/aac.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/aac.ko
Reading symbols from /boot/kernel/ipdivert.ko...Reading symbols from /boot/kernel/ipdivert.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/ipdivert.ko
Reading symbols from /boot/kernel/ipfw.ko...Reading symbols from /boot/kernel/ipfw.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/ipfw.ko
Reading symbols from /boot/kernel/t5fw_cfg.ko...Reading symbols from /boot/kernel/t5fw_cfg.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/t5fw_cfg.ko
Reading symbols from /boot/kernel/if_cxgbe.ko...Reading symbols from /boot/kernel/if_cxgbe.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/if_cxgbe.ko
Reading symbols from /boot/kernel/ipmi.ko...Reading symbols from /boot/kernel/ipmi.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/ipmi.ko
Reading symbols from /boot/kernel/smbus.ko...Reading symbols from /boot/kernel/smbus.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/smbus.ko
#0  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:271
271		if (textdump && textdump_pending) {
(kgdb) bt
#0  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:271
#1  0xffffffff80907eb4 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:454
#2  0xffffffff809083a7 in panic (fmt=0x1 <Address 0x1 out of bounds>) at /usr/src/sys/kern/kern_shutdown.c:642
#3  0xffffffff809766e4 in sbsndptr (sb=<value optimized out>, off=<value optimized out>, len=<value optimized out>, moff=<value optimized out>)
    at /usr/src/sys/kern/uipc_sockbuf.c:985
#4  0xffffffff80aaedbd in tcp_output (tp=0xfffffe03e675a3d0) at /usr/src/sys/netinet/tcp_output.c:954
#5  0xffffffff80abc555 in tcp_usr_send (so=0xfffffe03e62b5aa0, flags=0, m=0xfffffe01d8fd2200, nam=0x0, control=<value optimized out>, td=0xfffffe0021e90000)
    at /usr/src/sys/netinet/tcp_usrreq.c:874
#6  0xffffffff8097c1f6 in sosend_generic (so=0xfffffe03e62b5aa0, addr=0x0, uio=0xffffff90d4fca990, top=0xfffffe01d8fd2200, control=0x0, flags=<value optimized out>, 
    td=0xfffffe0021e90000) at /usr/src/sys/kern/uipc_socket.c:1376
#7  0xffffffff8095ea6e in soo_write (fp=<value optimized out>, uio=0xffffff90d4fca990, active_cred=<value optimized out>, flags=<value optimized out>, 
    td=<value optimized out>) at /usr/src/sys/kern/sys_socket.c:102
#8  0xffffffff80957195 in dofilewrite (td=0xfffffe0021e90000, fd=3, fp=0xfffffe0021cf3820, auio=0xffffff90d4fca990, offset=<value optimized out>, flags=0) at file.h:295
#9  0xffffffff809574cc in kern_writev (td=0xfffffe0021e90000, fd=3, auio=0xffffff90d4fca990) at /usr/src/sys/kern/sys_generic.c:477
#10 0xffffffff80957554 in sys_write (td=<value optimized out>, uap=<value optimized out>) at /usr/src/sys/kern/sys_generic.c:393
#11 0xffffffff80cfea4a in amd64_syscall (td=0xfffffe0021e90000, traced=0) at subr_syscall.c:135
#12 0xffffffff80ce8ac7 in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:391
#13 0x0000000802da3bec in ?? ()
Previous frame inner to this frame (corrupt stack?)

(kgdb) p *(struct sockbuf *)0xfffffe03e62b5c20
$1 = {sb_sel = {si_tdlist = {tqh_first = 0x0, tqh_last = 0x0}, si_note = {kl_list = {slh_first = 0x0}, kl_lock = 0xffffffff808cd0c0 <knlist_mtx_lock>, 
      kl_unlock = 0xffffffff808cd090 <knlist_mtx_unlock>, kl_assert_locked = 0xffffffff808c9a10 <knlist_mtx_assert_locked>, 
      kl_assert_unlocked = 0xffffffff808c9a20 <knlist_mtx_assert_unlocked>, kl_lockarg = 0xfffffe03e62b5c68}, si_mtx = 0x0}, sb_mtx = {lock_object = {
      lo_name = 0xffffffff80f3e7fd "so_snd", lo_flags = 16973824, lo_data = 0, lo_witness = 0x0}, mtx_lock = 18446741875255214080}, sb_sx = {lock_object = {
      lo_name = 0xffffffff80f3ed6b "so_snd_sx", lo_flags = 36896768, lo_data = 0, lo_witness = 0x0}, sx_lock = 18446741875255214080}, sb_state = 0, 
  sb_mb = 0xfffffe01f4069900, sb_mbtail = 0xfffffe01d8fd3900, sb_lastrecord = 0xfffffe01f4069900, sb_sndptr = 0xfffffe01d8fd3900, sb_sndptroff = 1632, sb_cc = 1716, 
  sb_hiwat = 131376, sb_mbcnt = 4864, sb_mcnt = 11, sb_ccnt = 1, sb_mbmax = 1051008, sb_ctl = 0, sb_lowat = 2048, sb_timeo = 0, sb_flags = 2048, sb_upcall = 0, 
  sb_upcallarg = 0x0}

(kgdb) p *(struct mbuf *)0xfffffe01d8fd3900
$2 = {m_hdr = {mh_next = 0x0, mh_nextpkt = 0x0, mh_data = 0xfffffe01d8fd3928 "", mh_len = 68, mh_flags = 0, mh_type = 1, pad = "\000\000\000\000\000"}, M_dat = {MH = {
      MH_pkthdr = {rcvif = 0xb1dee9e530000000, header = 0xf10fc01307aab916, len = -337628730, flowid = 2682375970, csum_flags = -966380398, csum_data = -1624117065, 
        tso_segsz = 11596, PH_vt = {vt_vtag = 31606, vt_nrecs = 31606}, tags = {slh_first = 0xa2b0a659a4311f25}}, MH_dat = {MH_ext = {
          ext_buf = 0x43772562c99aa431 <Address 0x43772562c99aa431 out of bounds>, ext_free = 0x7e1cffd9b6b13fc6, ext_arg1 = 0x731c9ab425536605, 
          ext_arg2 = 0xebc6cac44b21a941, ext_size = 520953289, ref_cnt = 0x5165381046dcad94, ext_type = 1308134978}, 
        MH_databuf = "1�\232�b%wC�?����\034~\005fS%�\232\034sA�!K�����\035\r\037Iܡq\224��F\0208eQB\216�M�P�/\000\026OS^Lq%�MY\212\200\030\b\004\021\000\000\000\001\001\b\n2�� \v��O\000\000\000 ��n�ٻ�Er\032S\201\220\220��I�\"\210\233\v\0223?=�*a|\231\001\022�6}�G�\026�\036z\n\023�<���B8�\200\000\000\000\000\000\000\002%\220���B8\001\003Ip\000\000\000"}}, 
    M_databuf = "\000\000\0000��ޱ\026��\a\023�\017��1��\"��\237\2224fƷ�1\237L-v{X�\235\214%\0371�Y���1�\232�b%wC�?����\034~\005fS%�\232\034sA�!K�����\035\r\037Iܡq\224��F\0208eQB\216�M�P�/\000\026OS^Lq%�MY\212\200\030\b\004\021\000\000\000\001\001\b\n2�� \v��O\000\000\000 ��n�ٻ�Er\032S\201\220\220��I�\"\210\233\v\0223?=�*a|\231\001\022�6}�G�\026�\036z\n\023�<���B8�\200\000\000\000\000\000\000"...}}

(kgdb) f 6
#6  0xffffffff8097c1f6 in sosend_generic (so=0xfffffe03e62b5aa0, addr=0x0, uio=0xffffff90d4fca990, top=0xfffffe01d8fd2200, control=0x0, flags=<value optimized out>, 
    td=0xfffffe0021e90000) at /usr/src/sys/kern/uipc_socket.c:1376
1376				error = (*so->so_proto->pr_usrreqs->pru_send)(so,
(kgdb) p *so
$3 = {so_count = 1, so_type = 1, so_options = 12, so_linger = 0, so_state = 258, so_qstate = 0, so_pcb = 0xfffffe03e678a640, so_vnet = 0x0, 
  so_proto = 0xffffffff8143c3f0, so_head = 0x0, so_incomp = {tqh_first = 0x0, tqh_last = 0x0}, so_comp = {tqh_first = 0x0, tqh_last = 0x0}, so_list = {tqe_next = 0x0, 
    tqe_prev = 0xfffffe01d8f96040}, so_qlen = 0, so_incqlen = 0, so_qlimit = 0, so_timeo = 0, so_error = 0, so_sigio = 0x0, so_oobmark = 0, so_aiojobq = {
    tqh_first = 0x0, tqh_last = 0xfffffe03e62b5b20}, so_rcv = {sb_sel = {si_tdlist = {tqh_first = 0x0, tqh_last = 0xfffffe03e62b5b30}, si_note = {kl_list = {
          slh_first = 0x0}, kl_lock = 0xffffffff808cd0c0 <knlist_mtx_lock>, kl_unlock = 0xffffffff808cd090 <knlist_mtx_unlock>, 
        kl_assert_locked = 0xffffffff808c9a10 <knlist_mtx_assert_locked>, kl_assert_unlocked = 0xffffffff808c9a20 <knlist_mtx_assert_unlocked>, 
        kl_lockarg = 0xfffffe03e62b5b78}, si_mtx = 0xffffff800e02f670}, sb_mtx = {lock_object = {lo_name = 0xffffffff80f3e7f6 "so_rcv", lo_flags = 16973824, 
        lo_data = 0, lo_witness = 0x0}, mtx_lock = 4}, sb_sx = {lock_object = {lo_name = 0xffffffff80f3ed75 "so_rcv_sx", lo_flags = 36896768, lo_data = 0, 
        lo_witness = 0x0}, sx_lock = 1}, sb_state = 0, sb_mb = 0x0, sb_mbtail = 0x0, sb_lastrecord = 0x0, sb_sndptr = 0x0, sb_sndptroff = 0, sb_cc = 0, 
    sb_hiwat = 131376, sb_mbcnt = 0, sb_mcnt = 0, sb_ccnt = 0, sb_mbmax = 1051008, sb_ctl = 0, sb_lowat = 1, sb_timeo = 0, sb_flags = 2056, sb_upcall = 0, 
    sb_upcallarg = 0x0}, so_snd = {sb_sel = {si_tdlist = {tqh_first = 0x0, tqh_last = 0x0}, si_note = {kl_list = {slh_first = 0x0}, 
        kl_lock = 0xffffffff808cd0c0 <knlist_mtx_lock>, kl_unlock = 0xffffffff808cd090 <knlist_mtx_unlock>, 
        kl_assert_locked = 0xffffffff808c9a10 <knlist_mtx_assert_locked>, kl_assert_unlocked = 0xffffffff808c9a20 <knlist_mtx_assert_unlocked>, 
        kl_lockarg = 0xfffffe03e62b5c68}, si_mtx = 0x0}, sb_mtx = {lock_object = {lo_name = 0xffffffff80f3e7fd "so_snd", lo_flags = 16973824, lo_data = 0, 
        lo_witness = 0x0}, mtx_lock = 18446741875255214080}, sb_sx = {lock_object = {lo_name = 0xffffffff80f3ed6b "so_snd_sx", lo_flags = 36896768, lo_data = 0, 
        lo_witness = 0x0}, sx_lock = 18446741875255214080}, sb_state = 0, sb_mb = 0xfffffe01f4069900, sb_mbtail = 0xfffffe01d8fd3900, 
    sb_lastrecord = 0xfffffe01f4069900, sb_sndptr = 0xfffffe01d8fd3900, sb_sndptroff = 1632, sb_cc = 1716, sb_hiwat = 131376, sb_mbcnt = 4864, sb_mcnt = 11, 
    sb_ccnt = 1, sb_mbmax = 1051008, sb_ctl = 0, sb_lowat = 2048, sb_timeo = 0, sb_flags = 2048, sb_upcall = 0, sb_upcallarg = 0x0}, so_cred = 0xfffffe01f48ce900, 
  so_label = 0x0, so_peerlabel = 0x0, so_gencnt = 13244, so_emuldata = 0x0, so_accf = 0x0, so_fibnum = 0, so_user_cookie = 0}

(kgdb) set $inp=(struct inpcb *)so->so_pcb
(kgdb) p *$inp
$4 = {inp_hash = {le_next = 0x0, le_prev = 0xfffffe0012f573b0}, inp_pcbgrouphash = {le_next = 0x0, le_prev = 0x0}, inp_list = {le_next = 0xfffffe03e679bc80, 
    le_prev = 0xfffffe03e6743020}, inp_ppcb = 0xfffffe03e675a3d0, inp_pcbinfo = 0xffffffff81531060, inp_pcbgroup = 0x0, inp_pcbgroup_wild = {le_next = 0x0, 
    le_prev = 0x0}, inp_socket = 0xfffffe03e62b5aa0, inp_cred = 0xfffffe01f48ce900, inp_flow = 3457486592, inp_flags = 545300480, inp_flags2 = 0, inp_vflag = 6 '\006', 
  inp_ip_ttl = 64 '@', inp_ip_p = 0 '\0', inp_ip_minttl = 0 '\0', inp_flowid = 1779132015, inp_refcount = 1, inp_pspare = {0x0, 0x0, 0x0, 0x0, 0x0}, inp_ispare = {0, 0, 
    0, 0, 0, 0}, inp_inc = {inc_flags = 1 '\001', inc_len = 0 '\0', inc_fibnum = 0, inc_ie = {ie_fport = 21327, ie_lport = 5632, ie_dependfaddr = {ie46_foreign = {
          ia46_pad32 = {3087401514, 17039360, 4283245058}, ia46_addr4 = {s_addr = 801984766}}, ie6_foreign = {__u6_addr = {
            __u6_addr8 = "*\002\006�\000\000\004\001\002\"M��P�/", __u6_addr16 = {554, 47110, 0, 260, 8706, 65357, 20734, 12237}, __u6_addr32 = {3087401514, 17039360, 
              4283245058, 801984766}}}}, ie_dependladdr = {ie46_local = {ia46_pad32 = {3087401514, 917504, 0}, ia46_addr4 = {s_addr = 1375797248}}, ie6_local = {
          __u6_addr = {__u6_addr8 = "*\002\006�\000\000\016\000\000\000\000\000\000\000\001R", __u6_addr16 = {554, 47110, 0, 14, 0, 0, 0, 20993}, __u6_addr32 = {
              3087401514, 917504, 0, 1375797248}}}}, ie6_zoneid = 0}}, inp_label = 0x0, inp_sp = 0x0, inp_depend4 = {inp4_ip_tos = 0 '\0', inp4_options = 0x0, 
    inp4_moptions = 0x0}, inp_depend6 = {inp6_options = 0x0, inp6_outputopts = 0xfffffe0013424500, inp6_moptions = 0x0, inp6_icmp6filt = 0x0, inp6_cksum = 0, 
    inp6_hops = -1}, inp_portlist = {le_next = 0xfffffe03e6d8f640, le_prev = 0xfffffe03e6743140}, inp_phd = 0xfffffe03e6dfa540, inp_gencnt = 1509, inp_lle = 0x0, 
  inp_rt = 0x0, inp_lock = {lock_object = {lo_name = 0xffffffff80f59235 "tcpinp", lo_flags = 90898432, lo_data = 0, lo_witness = 0x0}, rw_lock = 18446741875255214080}}

Comment 15 Hiren Panchasara freebsd_committer

2015-03-18 17:58:17 UTC

We saw this panic on a stable/10 build from January.

Dump header from device /dev/da0s1b
  Architecture: amd64
  Architecture Version: 2
  Dump Length: 1050050560B (1001 MB)
  Blocksize: 512
  Dumptime: Wed Mar 11 23:12:59 2015
  Hostname: xxxxxxxxxxxxxxxxxx
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 10.1-STABLE-llnw10 #0: Fri Jan 16 00:19:27 MST 2015
    xxxx@xxxxxxxxxxx:/usr/obj/usr/src/sys/SIXFOUR
  Panic String: sbsndptr: sockbuf 0xfffff802d3e79440 and mbuf 0xfffff80238089b00 clashing
  Dump Parity: 747892521
  Bounds: 0
  Dump Status: good

(kgdb) #0  doadump (textdump=1) at pcpu.h:219
#1  0xffffffff8072d397 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:452
#2  0xffffffff8072d774 in panic (fmt=<value optimized out>)
    at /usr/src/sys/kern/kern_shutdown.c:759
#3  0xffffffff807a38a0 in sbsndptr (sb=<value optimized out>, 
    off=<value optimized out>, len=<value optimized out>, 
    moff=<value optimized out>) at /usr/src/sys/kern/uipc_sockbuf.c:1011
#4  0xffffffff80895f1f in tcp_output (tp=0xfffff802d5315400)
    at /usr/src/sys/netinet/tcp_output.c:1092
#5  0xffffffff80891685 in tcp_do_segment (m=0xfffff80128e51d00, 
    th=0xfffff8020ca95022, so=0xfffff802d3e792b8, tp=0xfffff802d5315400, 
    drop_hdrlen=<value optimized out>, tlen=0, iptos=<value optimized out>, 
    ti_locked=-1) at /usr/src/sys/netinet/tcp_input.c:2729
#6  0xffffffff8088f54d in tcp_input (m=<value optimized out>, 
    off0=<value optimized out>) at /usr/src/sys/netinet/tcp_input.c:1388
#7  0xffffffff808216b7 in ip_input (m=0xfffff80128e51d00)
    at /usr/src/sys/netinet/ip_input.c:734
#8  0xffffffff807fbc92 in netisr_dispatch_src (proto=<value optimized out>, 
    source=<value optimized out>, m=0x0) at /usr/src/sys/net/netisr.c:972
#9  0xffffffff807f4656 in ether_demux (ifp=<value optimized out>, 
    m=0xfffff80128e51d00) at /usr/src/sys/net/if_ethersubr.c:851
#10 0xffffffff807f52e9 in ether_nh_input (m=<value optimized out>)
    at /usr/src/sys/net/if_ethersubr.c:646
#11 0xffffffff807fbc92 in netisr_dispatch_src (proto=<value optimized out>, 
    source=<value optimized out>, m=0x0) at /usr/src/sys/net/netisr.c:972
#12 0xffffffff8042332b in em_rxeof (count=94)
    at /usr/src/sys/dev/e1000/if_em.c:4532
#13 0xffffffff80423703 in em_msix_rx (arg=0xfffff80003ce6600)
    at /usr/src/sys/dev/e1000/if_em.c:1600
#14 0xffffffff806fe0fb in intr_event_execute_handlers (
    p=<value optimized out>, ie=0xfffff80003d04200)
    at /usr/src/sys/kern/kern_intr.c:1264
#15 0xffffffff806fea96 in ithread_loop (arg=0xfffff80003cfb3c0)
    at /usr/src/sys/kern/kern_intr.c:1277
#16 0xffffffff806fbd1a in fork_exit (
    callout=0xffffffff806fea00 <ithread_loop>, arg=0xfffff80003cfb3c0, 
    frame=0xfffffe03438d7c00) at /usr/src/sys/kern/kern_fork.c:1017
#17 0xffffffff80acdf5e in fork_trampoline ()
    at /usr/src/sys/amd64/amd64/exception.S:611
#18 0x0000000000000000 in ?? ()
Current language:  auto; currently minimal
(kgdb) 

Unfortunately I do not have a crashdump to investigate further.

Comment 16 sebastian.huber 2015-04-10 11:11:22 UTC

I also see this problem with the port of the FreeBSD 9-stable (2015-04-09) network stack to the RTEMS real-time operating system (Altera Cyclone V target).  It occurs quite frequently with multiple concurrent IPv4 TCP transfers to /dev/null and from /dev/zero.

The stack trace is:

000|panic(fmt = 0x00636E14)
001|sbsndptr(?, ?, ?, ?)
002|tcp_output(?)
003|tcp_do_segment(?, ?, so = 0x3FF3D7C0, tp = 0x3FF2F828, drop_hdrlen = 52, tlen = 0, iptos = 0, ?)
004|tcp_input(m = 0x3FF21700, ?)
005|ip_input(m = 0x3FF21700)
006|netisr_dispatch_src(?, ?, ?)
007|ether_demux(ifp = 0x006DFEE0, m = 0x3FF21700)
008|ether_nh_input(?)
009|netisr_dispatch_src(?, ?, ?)
010|dwc_rxfinish_locked(inline)
010|dwc_intr(arg = 0x006DEA20)
011|bsp_interrupt_server_task(?)
012|Thread_Handler()
---|end of frame

Comment 17 Robert Watson freebsd_committer

2015-04-10 11:33:31 UTC

sbdrop() performs invariant checks as it tears down a socket buffer on final close.  They were originally intended to validate a set of values cached by the socket buffer that could (in the presence of a socket-buffer bug) get out of sync with the chain stored there.  However, these checks have proven something of a 'canary' for many possible underlying bugs involving mbuf chains and socket buffers.  I've seen the panics most frequently in the presence of device-driver concurrency bugs: for example, a driver making changes to the mbuf chain after handing the mbuf off to the network stack via netisr, or other code improperly freeing an mbuf while it remains referenced by a socket buffer.  Others have spotted them in the presence of other classes of network-stack race conditions, most involving a failure to have a single thread or object own an mbuf.  As such, seeing this panic is a symptom of many possible underlying problems and hence not a specific 'bug' per se.

However, as a useful rule of thumb: when I spot this panic, I look first at the device driver to make sure that there is no possible use of an mbuf after it has been passed as an argument to netisr.
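
As an illustration of the kind of invariant check described above, here is a minimal sketch in C. It is not the kernel's sbdrop() code: the toy_mbuf and toy_sockbuf types and the toy_sb_* functions are hypothetical stand-ins, assumed only for this example. The idea is that the socket buffer caches totals, and a walk of the chain should agree with them; if something else has modified a buffer in the meantime, the check notices, possibly long after the actual bug.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for struct mbuf and struct sockbuf. */
struct toy_mbuf {
    struct toy_mbuf *next;  /* next buffer in the chain */
    int len;                /* bytes of data held in this buffer */
};

struct toy_sockbuf {
    struct toy_mbuf *head;  /* first buffer in the chain */
    int cached_bytes;       /* cached byte total for the whole chain */
    int cached_bufs;        /* cached number of buffers in the chain */
};

/* Append a buffer and keep the cached counters in step with the chain. */
static void toy_sb_append(struct toy_sockbuf *sb, struct toy_mbuf *m)
{
    struct toy_mbuf **pp = &sb->head;

    assert(m != NULL);
    while (*pp != NULL)
        pp = &(*pp)->next;
    m->next = NULL;
    *pp = m;
    sb->cached_bytes += m->len;
    sb->cached_bufs += 1;
}

/* Walk the chain and compare it with the cached counters.
 * Returns 1 if they agree, 0 if they have drifted apart. */
static int toy_sb_consistent(const struct toy_sockbuf *sb)
{
    int bytes = 0, bufs = 0;

    for (const struct toy_mbuf *m = sb->head; m != NULL; m = m->next) {
        bytes += m->len;
        bufs += 1;
    }
    return bytes == sb->cached_bytes && bufs == sb->cached_bufs;
}
```

For example, after appending buffers of 100 and 68 bytes, toy_sb_consistent() returns 1; if some other code then changes one buffer's len behind the socket buffer's back, it returns 0.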

Comment 18 sebastian.huber 2015-04-10 11:47:23 UTC

(In reply to Robert Watson from comment #17)

Thanks for the warning.  The driver looks all right.  I updated yesterday from FreeBSD 9.3 to 9-stable 2015-04-09 because I previously had other problems, for example a NULL pointer dereference in tcp_reass() (I guess the temporary stack element remained on the list and was overwritten in a second call) and a corruption of a UMA keg (mbuf_packet zone).

Comment 19 sebastian.huber 2015-05-22 09:48:36 UTC

(In reply to sebastian.huber from comment #18)
I found one problem with the driver.  In the RTEMS port of the network stack I don't use the BUS_DMA(9) support and instead directly use cache invalidate/flush routines (the Altera Cyclone V has no automatic cache coherency between the Ethernet module and the processors).  In the receive path it was done like this:

invalidate buffer (mcluster)
register buffer in receive DMA descriptor
...
DMA done
hand over buffer to interface input

It seems that, due to cache line prefetching, cache lines of the buffer are sometimes loaded into the cache after the invalidate but before the receive DMA has completed its transfers.

I changed the sequence like this:

invalidate buffer (mcluster)
register buffer in receive DMA descriptor
...
DMA done
invalidate buffer
hand over buffer to interface input

Now it runs very stably and I no longer observe any mbuf or socket buffer corruption.

So, as an offhand guess: if the network stack is presented with partially invalid data (most likely previously received IP and TCP headers mixed with new data), this could lead to a crash in the TCP input processing.
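
The effect of the extra invalidate can be sketched with a toy model in C. Everything here is an assumption made for illustration: the toy_rxbuf type, the one-line "cache", and the toy_* functions are hypothetical and are not RTEMS or FreeBSD APIs. The model only shows why a cache fill between the first invalidate and DMA completion leaves stale data visible unless the buffer is invalidated again.

```c
#include <assert.h>
#include <string.h>

#define TOY_LINE 8  /* hypothetical cache line size for the model */

struct toy_rxbuf {
    unsigned char mem[TOY_LINE];    /* main memory, written by DMA */
    unsigned char cache[TOY_LINE];  /* CPU's cached copy of the line */
    int cache_valid;                /* nonzero if cache[] holds the line */
};

/* Drop the cached copy so the next access goes to memory. */
static void toy_invalidate(struct toy_rxbuf *b)
{
    b->cache_valid = 0;
}

/* A speculative prefetch or a normal load: fill the line from memory. */
static void toy_cache_fill(struct toy_rxbuf *b)
{
    if (!b->cache_valid) {
        memcpy(b->cache, b->mem, TOY_LINE);
        b->cache_valid = 1;
    }
}

/* Non-coherent DMA: updates memory and leaves the cache untouched. */
static void toy_dma_write(struct toy_rxbuf *b, unsigned char v)
{
    memset(b->mem, v, TOY_LINE);
}

/* CPU read of the first byte, going through the modelled cache. */
static unsigned char toy_cpu_read(struct toy_rxbuf *b)
{
    toy_cache_fill(b);
    return b->cache[0];
}

/* Receive one frame and return the first byte the CPU ends up seeing. */
static unsigned char toy_receive(struct toy_rxbuf *b, unsigned char frame,
    int invalidate_after_dma)
{
    assert(b != NULL);
    toy_invalidate(b);          /* invalidate buffer */
    /* registering the buffer in the DMA descriptor is not modelled */
    toy_cache_fill(b);          /* prefetch lands before DMA completes */
    toy_dma_write(b, frame);    /* DMA done */
    if (invalidate_after_dma)
        toy_invalidate(b);      /* the added second invalidate */
    return toy_cpu_read(b);     /* hand over buffer to interface input */
}
```

In this model, with old contents 0xAA in memory and a new frame of 0x55, toy_receive(&b, 0x55, 0) yields the stale 0xAA, while toy_receive(&b, 0x55, 1) yields 0x55.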

Comment 20 Palle Girgensohn freebsd_committer

2015-09-30 12:06:30 UTC

Hi, 

This seems related? This is a GENERIC kernel binary installed using freebsd-update, so I have no WITNESS or KGDB, only the standard vanilla kernel.

I have core dumps (2) and a complete core.txt if anyone is interested.

# ifconfig -l 
oce0 oce1 lo0

# freebsd-version -k 
10.1-RELEASE-p19

[root@kurs-ap-01 /home/girgen]# head -n 300 /var/crash/core.txt.1 
kurs-ap-01 dumped core - see /var/crash/vmcore.1

Thu Aug 27 15:04:09 CEST 2015

FreeBSD kurs-ap-01 10.1-RELEASE-p10 FreeBSD 10.1-RELEASE-p10 #0: Wed May 13 06:54:13 UTC 2015     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64

panic: sbsndptr: sockbuf 0xfffff80312126c68 and mbuf 0xfffff800b4a36800 clashing

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
panic: sbsndptr: sockbuf 0xfffff80312126c68 and mbuf 0xfffff800b4a36800 clashing
cpuid = 1
KDB: stack backtrace:
#0 0xffffffff80963000 at kdb_backtrace+0x60
#1 0xffffffff80928125 at panic+0x155
#2 0xffffffff8099c180 at sbdroprecord_locked+0
#3 0xffffffff80ac8c9c at tcp_output+0xdbc
#4 0xffffffff80ac6a95 at tcp_do_segment+0x3045
#5 0xffffffff80ac2e04 at tcp_input+0xd04
#6 0xffffffff80a54fc7 at ip_input+0x97
#7 0xffffffff809f4f73 at swi_net+0x143
#8 0xffffffff808faf4b at intr_event_execute_handlers+0xab
#9 0xffffffff808fb396 at ithread_loop+0x96
#10 0xffffffff808f8b6a at fork_exit+0x9a
#11 0xffffffff80d0b67e at fork_trampoline+0xe
Uptime: 21d0h54m53s
Dumping 2005 out of 32709 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

Reading symbols from /boot/kernel/accf_data.ko.symbols...done.
Loaded symbols for /boot/kernel/accf_data.ko.symbols
Reading symbols from /boot/kernel/accf_http.ko.symbols...done.
Loaded symbols for /boot/kernel/accf_http.ko.symbols
Reading symbols from /boot/kernel/oce.ko.symbols...done.
Loaded symbols for /boot/kernel/oce.ko.symbols
Reading symbols from /boot/kernel/nullfs.ko.symbols...done.
Loaded symbols for /boot/kernel/nullfs.ko.symbols
Reading symbols from /boot/kernel/linprocfs.ko.symbols...done.
Loaded symbols for /boot/kernel/linprocfs.ko.symbols
Reading symbols from /boot/kernel/linux.ko.symbols...done.
Loaded symbols for /boot/kernel/linux.ko.symbols
Reading symbols from /boot/kernel/zfs.ko.symbols...done.
Loaded symbols for /boot/kernel/zfs.ko.symbols
Reading symbols from /boot/kernel/opensolaris.ko.symbols...done.
Loaded symbols for /boot/kernel/opensolaris.ko.symbols
#0  doadump (textdump=<value optimized out>) at pcpu.h:219
219	pcpu.h: No such file or directory.
	in pcpu.h
(kgdb) #0  doadump (textdump=<value optimized out>) at pcpu.h:219
#1  0xffffffff80927da2 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:452
#2  0xffffffff80928164 in panic (fmt=<value optimized out>)
    at /usr/src/sys/kern/kern_shutdown.c:759
#3  0xffffffff8099c180 in sbsndptr (sb=<value optimized out>, 
    off=<value optimized out>, len=<value optimized out>, 
    moff=<value optimized out>) at /usr/src/sys/kern/uipc_sockbuf.c:1011
#4  0xffffffff80ac8c9c in tcp_output (tp=0xfffff80312ef5800)
    at /usr/src/sys/netinet/tcp_output.c:870
#5  0xffffffff80ac6a95 in tcp_do_segment (m=<value optimized out>, 
    th=<value optimized out>, so=<value optimized out>, 
    tp=<value optimized out>, drop_hdrlen=<value optimized out>, tlen=0, 
    iptos=<value optimized out>, ti_locked=Cannot access memory at address 0x1
)
    at /usr/src/sys/netinet/tcp_input.c:3018
#6  0xffffffff80ac2e04 in tcp_input (m=<value optimized out>, 
    off0=<value optimized out>) at /usr/src/sys/netinet/tcp_input.c:1377
#7  0xffffffff80a54fc7 in ip_input (m=0xfffff800b4516600)
    at /usr/src/sys/netinet/ip_input.c:734
#8  0xffffffff809f4f73 in swi_net (arg=0xffffffff81988880)
    at /usr/src/sys/net/netisr.c:765
#9  0xffffffff808faf4b in intr_event_execute_handlers (
    p=<value optimized out>, ie=0xfffff800093ac600)
    at /usr/src/sys/kern/kern_intr.c:1263
#10 0xffffffff808fb396 in ithread_loop (arg=0xfffff80009388e40)
    at /usr/src/sys/kern/kern_intr.c:1276
#11 0xffffffff808f8b6a in fork_exit (
    callout=0xffffffff808fb300 <ithread_loop>, arg=0xfffff80009388e40, 
    frame=0xfffffe083c3e3ac0) at /usr/src/sys/kern/kern_fork.c:996
#12 0xffffffff80d0b67e in fork_trampoline ()
    at /usr/src/sys/amd64/amd64/exception.S:606
#13 0x0000000000000000 in ?? ()
Current language:  auto; currently minimal
(kgdb)

Comment 21 g_amanakis 2016-02-21 16:53:54 UTC

This one persists on 10.2-STABLE.

Comment 22 g_amanakis 2016-02-22 03:31:32 UTC

Applying the patch at "https://reviews.freebsd.org/D5330" on 10.2-STABLE with ipfw+nat results in this bug when I attempt to lower the mtu on the WAN-interface from 1500 with "route change default -mtu 1196". The mtu of the LAN-interface is set at 1500.

Comment 23 Hiren Panchasara freebsd_committer

2016-02-28 16:57:33 UTC

This bug still persists in the latest stable/10. We are seeing it mainly with em(4). Any hints on how to debug this further would be great.

Comment 24 Hiren Panchasara freebsd_committer

2016-02-28 16:58:46 UTC

D5330 is not related as far as I can tell.

Comment 25 Steve Wills freebsd_committer

2016-08-11 15:07:19 UTC

I just hit this panic on 11-CURRENT r298999. The box wasn't particularly busy, but had been up a while. I am using an em NIC. Any further info I can provide? Should I test the patch?

Comment 26 Kubilay Kocak freebsd_committer

2016-08-11 15:49:04 UTC

Update mfc flags and summary to reflect latest information.

@Hiren, is this likely to affect 11.0-R? If so, please cc re@ so they're aware and can track it.

Comment 27 Mike Andrews 2016-08-11 19:23:20 UTC

For what it's worth, the original problem I reported in this PR from 6 years ago is (for me anyway) long-ago solved... but then we're on 10.3 these days and are slowly rolling 11-beta out.  Haven't seen a panic in quite a while.

Comment 28 emz 2016-10-07 07:00:05 UTC

Got this just now after an update from 10.2-STABLE to 11.0-PRERELEASE. It persists in 11.0-RC3 and is reproducible within about 5-12 minutes; 25 minutes is the longest uptime so far.

panic: sbsndptr: sockbuf 0xfffff8003eea31b8 and mbuf 0xfffff80020a6e700 clashing

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
panic: sbsndptr: sockbuf 0xfffff8003eea31b8 and mbuf 0xfffff80020a6e700 clashing
cpuid = 1
KDB: stack backtrace:
#0 0xffffffff80b1d0c7 at kdb_backtrace+0x67
#1 0xffffffff80ad1f62 at vpanic+0x182
#2 0xffffffff80ad1dd3 at panic+0x43
#3 0xffffffff80b6a15a at sbsndptr+0xda
#4 0xffffffff80cfcbb4 at tcp_output+0xf34
#5 0xffffffff80cf9a81 at tcp_do_segment+0x2ce1
#6 0xffffffff80cf60cc at tcp_input+0xd1c
#7 0xffffffff80c66dbf at ip_input+0x15f
#8 0xffffffff80bfc295 at netisr_dispatch_src+0xa5
#9 0xffffffff80be4cea at ether_demux+0x12a
#10 0xffffffff80be5942 at ether_nh_input+0x322
#11 0xffffffff80bfc295 at netisr_dispatch_src+0xa5
#12 0xffffffff80be4f66 at ether_input+0x26
#13 0xffffffff80bed9db at vlan_input+0x1cb
#14 0xffffffff80be4c55 at ether_demux+0x95
#15 0xffffffff80be5942 at ether_nh_input+0x322
#16 0xffffffff80bfc295 at netisr_dispatch_src+0xa5
#17 0xffffffff80be4f66 at ether_input+0x26

Comment 29 Robert Watson freebsd_committer

2016-10-07 08:04:07 UTC

Just a quick comment in light of recent notes on this PR: the panic being seen is the result of a kernel self-check that occurs on socket close. It likely reports on a bug that was triggered some substantial time earlier (milliseconds, seconds, minutes, hours, days, or even weeks earlier), and it reports on a class of problems rather than detecting a specific bug. It's entirely likely that the problem reported more recently is not the same bug as those reported previously with the same panic message, but rather a similar bug detected by the same kernel self-check.

In the past, this self-check has most frequently fired as a result of either bugs in the socket-buffer code (although I think none recently), or device-driver bugs involving modifications to the mbuf chain after submitting the mbuf to the network stack (e.g., due to concurrency bugs in the device driver). It can also occur in use-after-free scenarios, as a result of protocol bugs, etc.
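
One defensive pattern against the "modified after hand-off" class of driver bugs can be sketched as follows. This is an illustration under stated assumptions, not code from any FreeBSD driver: toy_pkt, toy_stack_input() and toy_rx_handoff() are hypothetical names. The driver reads everything it needs first, clears its own reference, and only then passes the packet on, so any later access through the old slot fails visibly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical packet type standing in for an mbuf. */
struct toy_pkt {
    int len;  /* payload length in bytes */
};

/* Counts hand-offs so the example has an observable effect. */
static int toy_stack_input_calls;

/* Hypothetical stack entry point: from here on the stack owns the packet. */
static void toy_stack_input(struct toy_pkt *p)
{
    (void)p;
    toy_stack_input_calls++;
}

/* Hand the packet in *slot to the stack.
 * Returns the packet length (e.g. for driver statistics),
 * or -1 if the slot holds no packet. */
static int toy_rx_handoff(struct toy_pkt **slot)
{
    struct toy_pkt *p;
    int len;

    assert(slot != NULL);
    p = *slot;
    if (p == NULL)
        return -1;
    len = p->len;        /* read what the driver needs before hand-off */
    *slot = NULL;        /* the driver gives up its reference */
    toy_stack_input(p);  /* ownership passes to the stack */
    return len;          /* the packet itself is not touched again */
}
```

A second call on the same slot returns -1 instead of silently touching a packet the driver no longer owns.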

On the whole, my intuition is towards a device-driver bug based on past experience. Could you paste in the output of "dmesg" and "ifconfig -a" from the host to give a bit more information on its configuration?

Comment 30 emz 2016-10-07 10:09:36 UTC

Follow-up: RC3 was installed incorrectly (i.e. not installed at all). After a proper downgrade to RC3 (r305786), the server seems at least more stable: it has been running for more than an hour. On 11.0-PRE (306739) panics were happening within about 3 to 5 minutes.

I have a handful of cores, in case someone needs them.

As for the driver: the previous machine was, I believe, an HP DL160 G6, and the driver was igb(4). Now it's a Supermicro board (the tech team moved the drives to a new chassis to exclude possible hardware problems), and the driver is still igb(4). The ifconfig output and dmesg.boot follow; the dmesg is from 11-RC3:

igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
        ether 00:25:90:06:b7:9e
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
igb1: flags=8c02<BROADCAST,OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
        ether 00:25:90:06:b7:9f
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: no carrier
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
        inet 127.0.0.1 netmask 0xff000000
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        groups: lo
vlan1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=303<RXCSUM,TXCSUM,TSO4,TSO6>
        ether 00:25:90:06:b7:9e
        inet 192.168.0.248 netmask 0xffffff00 broadcast 192.168.0.255
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        vlan: 1 vlanpcp: 0 parent interface: igb0
        groups: vlan
vlan2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=303<RXCSUM,TXCSUM,TSO4,TSO6>
        ether 00:25:90:06:b7:9e
        inet 91.206.242.1 netmask 0xfffffff0 broadcast 91.206.242.15
        inet 91.206.242.5 netmask 0xfffffff0 broadcast 91.206.242.15
        inet 91.206.242.8 netmask 0xfffffff0 broadcast 91.206.242.15
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        vlan: 2 vlanpcp: 0 parent interface: igb0
        groups: vlan
vlan3: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=303<RXCSUM,TXCSUM,TSO4,TSO6>
        ether 00:25:90:06:b7:9e
        inet 10.64.0.250 netmask 0xffffff00 broadcast 10.64.0.255
        inet 10.64.0.252 netmask 0xffffff00 broadcast 10.64.0.255
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        vlan: 3 vlanpcp: 0 parent interface: igb0
        groups: vlan
vlan4: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=303<RXCSUM,TXCSUM,TSO4,TSO6>
        ether 00:25:90:06:b7:9e
        inet 89.250.210.6 netmask 0xfffffffc broadcast 89.250.210.7
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        vlan: 4 vlanpcp: 0 parent interface: igb0
        groups: vlan
vlan5: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=303<RXCSUM,TXCSUM,TSO4,TSO6>
        ether 00:25:90:06:b7:9e
        inet 77.43.142.201 netmask 0xfffffffc broadcast 77.43.142.203
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        vlan: 5 vlanpcp: 0 parent interface: igb0
        groups: vlan
vlan6: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=303<RXCSUM,TXCSUM,TSO4,TSO6>
        ether 00:25:90:06:b7:9e
        inet 172.20.142.250 netmask 0xffffff00 broadcast 172.20.142.255
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        vlan: 6 vlanpcp: 0 parent interface: igb0
        groups: vlan
vlan7: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=303<RXCSUM,TXCSUM,TSO4,TSO6>
        ether 00:25:90:06:b7:9e
        inet 172.16.240.2 netmask 0xffffff00 broadcast 172.16.240.255
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        vlan: 7 vlanpcp: 0 parent interface: igb0
        groups: vlan
vlan8: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=303<RXCSUM,TXCSUM,TSO4,TSO6>
        ether 00:25:90:06:b7:9e
        inet 86.109.196.74 netmask 0xfffffff8 broadcast 86.109.196.79
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        vlan: 8 vlanpcp: 0 parent interface: igb0
        groups: vlan
vlan9: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=303<RXCSUM,TXCSUM,TSO4,TSO6>
        ether 00:25:90:06:b7:9e
        inet 192.168.2.1 netmask 0xffffff00 broadcast 192.168.2.255
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        vlan: 9 vlanpcp: 0 parent interface: igb0
        groups: vlan
vlan10: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=303<RXCSUM,TXCSUM,TSO4,TSO6>
        ether 00:25:90:06:b7:9e
        inet 192.168.3.1 netmask 0xffffff00 broadcast 192.168.3.255
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        vlan: 10 vlanpcp: 0 parent interface: igb0
        groups: vlan
vlan11: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=303<RXCSUM,TXCSUM,TSO4,TSO6>
        ether 00:25:90:06:b7:9e
        inet 188.234.141.201 netmask 0xfffffffc broadcast 188.234.141.203
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        vlan: 11 vlanpcp: 0 parent interface: igb0
        groups: vlan
vlan12: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=303<RXCSUM,TXCSUM,TSO4,TSO6>
        ether 00:25:90:06:b7:9e
        inet 192.168.50.1 netmask 0xffffff00 broadcast 192.168.50.255
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        vlan: 12 vlanpcp: 0 parent interface: igb0
        groups: vlan
vlan13: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=303<RXCSUM,TXCSUM,TSO4,TSO6>
        ether 00:25:90:06:b7:9e
        inet 192.168.99.10 netmask 0xffffff00 broadcast 192.168.99.255
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        vlan: 13 vlanpcp: 0 parent interface: igb0
        groups: vlan


Dmesg:

Copyright (c) 1992-2016 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 11.0-RC3 #0 r305786: Wed Sep 14 02:19:25 UTC 2016
    root@releng2.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64
FreeBSD clang version 3.8.0 (tags/RELEASE_380/final 262564) (based on LLVM 3.8.0)
VT(vga): resolution 640x480
CPU: Intel(R) Xeon(R) CPU           E5620  @ 2.40GHz (2400.13-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x206c2  Family=0x6  Model=0x2c  Stepping=2
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x9ee3fd<SSE3,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,POPCNT>
  AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
  AMD Features2=0x1<LAHF>
  VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
  TSC: P-state invariant, performance statistics
real memory  = 51543801856 (49156 MB)
avail memory = 49979412480 (47664 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <080312 APIC1521>
FreeBSD/SMP: Multiprocessor System Detected: 16 CPUs
FreeBSD/SMP: 2 package(s) x 4 core(s) x 2 hardware threads
random: unblocking device.
ACPI BIOS Warning (bug): 32/64X length mismatch in FADT/Gpe0Block: 128/64 (20160527/tbfadt-650)
ioapic0: Changing APIC ID to 6
ioapic1: Changing APIC ID to 7
ioapic0 <Version 2.0> irqs 0-23 on motherboard
ioapic1 <Version 2.0> irqs 24-47 on motherboard
random: entropy device external interface
kbd1 at kbdmux0
netmap: loaded module
module_register_init: MOD_LOAD (vesa, 0xffffffff8101c950, 0) error 19
vtvga0: <VT VGA driver> on motherboard
cryptosoft0: <software crypto> on motherboard
acpi0: <SMCI > on motherboard
acpi0: Power Button (fixed)
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0
cpu2: <ACPI CPU> on acpi0
cpu3: <ACPI CPU> on acpi0
cpu4: <ACPI CPU> on acpi0
cpu5: <ACPI CPU> on acpi0
cpu6: <ACPI CPU> on acpi0
cpu7: <ACPI CPU> on acpi0
cpu8: <ACPI CPU> on acpi0
cpu9: <ACPI CPU> on acpi0
cpu10: <ACPI CPU> on acpi0
cpu11: <ACPI CPU> on acpi0
cpu12: <ACPI CPU> on acpi0
cpu13: <ACPI CPU> on acpi0
cpu14: <ACPI CPU> on acpi0
cpu15: <ACPI CPU> on acpi0
attimer0: <AT timer> port 0x40-0x43 irq 0 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
atrtc0: <AT realtime clock> port 0x70-0x71 irq 8 on acpi0
Event timer "RTC" frequency 32768 Hz quality 0
hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0
Timecounter "HPET" frequency 14318180 Hz quality 950
Event timer "HPET" frequency 14318180 Hz quality 350
Event timer "HPET1" frequency 14318180 Hz quality 340
Event timer "HPET2" frequency 14318180 Hz quality 340
Event timer "HPET3" frequency 14318180 Hz quality 340
Timecounter "ACPI-safe" frequency 3579545 Hz quality 850
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff numa-domain 0 on acpi0
pcib0: _OSC returned error 0x10
pci0: <ACPI PCI bus> numa-domain 0 on pcib0
pcib1: <ACPI PCI-PCI bridge> at device 1.0 numa-domain 0 on pci0
pci1: <ACPI PCI bus> numa-domain 0 on pcib1
igb0: <Intel(R) PRO/1000 Network Connection, Version - 2.5.3-k> port 0xec00-0xec1f mem 0xfbde0000-0xfbdfffff,0xfbdc0000-0xfbddffff,0xfbd9c000-0xfbd9ffff irq 28 at device 0.0 numa-domain 0 on pci1
igb0: Using MSIX interrupts with 9 vectors
igb0: Ethernet address: 00:25:90:06:b7:9e
igb0: Bound queue 0 to cpu 0
igb0: Bound queue 1 to cpu 1
igb0: Bound queue 2 to cpu 2
igb0: Bound queue 3 to cpu 3
igb0: Bound queue 4 to cpu 4
igb0: Bound queue 5 to cpu 5
igb0: Bound queue 6 to cpu 6
igb0: Bound queue 7 to cpu 7
igb0: netmap queues/slots: TX 8/1024, RX 8/1024
igb1: <Intel(R) PRO/1000 Network Connection, Version - 2.5.3-k> port 0xe880-0xe89f mem 0xfbd60000-0xfbd7ffff,0xfbd40000-0xfbd5ffff,0xfbd1c000-0xfbd1ffff irq 40 at device 0.1 numa-domain 0 on pci1
igb1: Using MSIX interrupts with 9 vectors
igb1: Ethernet address: 00:25:90:06:b7:9f
igb1: Bound queue 0 to cpu 8
igb1: Bound queue 1 to cpu 9
igb1: Bound queue 2 to cpu 10
igb1: Bound queue 3 to cpu 11
igb1: Bound queue 4 to cpu 12
igb1: Bound queue 5 to cpu 13
igb1: Bound queue 6 to cpu 14
igb1: Bound queue 7 to cpu 15
igb1: netmap queues/slots: TX 8/1024, RX 8/1024
pcib2: <ACPI PCI-PCI bridge> at device 3.0 numa-domain 0 on pci0
pci2: <ACPI PCI bus> numa-domain 0 on pcib2
pcib3: <ACPI PCI-PCI bridge> at device 5.0 numa-domain 0 on pci0
pci3: <ACPI PCI bus> numa-domain 0 on pcib3
pcib4: <ACPI PCI-PCI bridge> at device 7.0 numa-domain 0 on pci0
pci4: <ACPI PCI bus> numa-domain 0 on pcib4
pcib5: <ACPI PCI-PCI bridge> at device 9.0 numa-domain 0 on pci0
pci5: <ACPI PCI bus> numa-domain 0 on pcib5
pci0: <base peripheral, interrupt controller> at device 20.0 (no driver attached)
pci0: <base peripheral, interrupt controller> at device 20.1 (no driver attached)
pci0: <base peripheral, interrupt controller> at device 20.2 (no driver attached)
pci0: <base peripheral, interrupt controller> at device 20.3 (no driver attached)
uhci0: <Intel 82801JI (ICH10) USB controller USB-D> port 0xdc00-0xdc1f irq 16 at device 26.0 numa-domain 0 on pci0
uhci0: LegSup = 0x2f00
usbus0 numa-domain 0 on uhci0
uhci1: <Intel 82801JI (ICH10) USB controller USB-E> port 0xd880-0xd89f irq 21 at device 26.1 numa-domain 0 on pci0
uhci1: LegSup = 0x2f00
usbus1 numa-domain 0 on uhci1
uhci2: <Intel 82801JI (ICH10) USB controller USB-F> port 0xd800-0xd81f irq 19 at device 26.2 numa-domain 0 on pci0
uhci2: LegSup = 0x2f00
usbus2 numa-domain 0 on uhci2
ehci0: <Intel 82801JI (ICH10) USB 2.0 controller USB-B> mem 0xfbeda000-0xfbeda3ff irq 18 at device 26.7 numa-domain 0 on pci0
usbus3: EHCI version 1.0
usbus3 numa-domain 0 on ehci0
uhci3: <Intel 82801JI (ICH10) USB controller USB-A> port 0xd480-0xd49f irq 23 at device 29.0 numa-domain 0 on pci0
uhci3: LegSup = 0x2f00
usbus4 numa-domain 0 on uhci3
uhci4: <Intel 82801JI (ICH10) USB controller USB-B> port 0xd400-0xd41f irq 19 at device 29.1 numa-domain 0 on pci0
uhci4: LegSup = 0x2f00
usbus5 numa-domain 0 on uhci4
uhci5: <Intel 82801JI (ICH10) USB controller USB-C> port 0xd080-0xd09f irq 18 at device 29.2 numa-domain 0 on pci0
uhci5: LegSup = 0x2f00
usbus6 numa-domain 0 on uhci5
ehci1: <Intel 82801JI (ICH10) USB 2.0 controller USB-A> mem 0xfbed8000-0xfbed83ff irq 23 at device 29.7 numa-domain 0 on pci0
usbus7: EHCI version 1.0
usbus7 numa-domain 0 on ehci1
pcib6: <ACPI PCI-PCI bridge> at device 30.0 numa-domain 0 on pci0
pci6: <ACPI PCI bus> numa-domain 0 on pcib6
vgapci0: <VGA-compatible display> mem 0xf9000000-0xf9ffffff,0xfaffc000-0xfaffffff,0xfb000000-0xfb7fffff irq 18 at device 1.0 numa-domain 0 on pci6
vgapci0: Boot video device
isab0: <PCI-ISA bridge> at device 31.0 numa-domain 0 on pci0
isa0: <ISA bus> numa-domain 0 on isab0
atapci0: <Intel ICH10 SATA300 controller> port 0xd000-0xd007,0xcc00-0xcc03,0xc880-0xc887,0xc800-0xc803,0xc480-0xc48f,0xc400-0xc40f irq 19 at device 31.2 numa-domain 0 on pci0
ata2: <ATA channel> at channel 0 on atapci0
ata3: <ATA channel> at channel 1 on atapci0
atapci1: <Intel ICH10 SATA300 controller> port 0xc000-0xc007,0xbc00-0xbc03,0xb880-0xb887,0xb800-0xb803,0xb480-0xb48f,0xb400-0xb40f irq 19 at device 31.5 numa-domain 0 on pci0
ata4: <ATA channel> at channel 0 on atapci1
ata5: <ATA channel> at channel 1 on atapci1
acpi_button0: <Power Button> on acpi0
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: model IntelliMouse Explorer, device ID 4
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
uart1: <16550 or compatible> port 0x2f8-0x2ff irq 3 on acpi0
qpi0: <QPI system bus> on motherboard
pcib7: <QPI Host-PCI bridge> pcibus 255 on qpi0
pci7: <PCI bus> on pcib7
pcib8: <QPI Host-PCI bridge> pcibus 254 on qpi0
pci8: <PCI bus> on pcib8
orm0: <ISA Option ROMs> at iomem 0xc0000-0xc7fff,0xc8000-0xc8fff on isa0
ppc0: cannot reserve I/O port range
est0: <Enhanced SpeedStep Frequency Control> on cpu0
est1: <Enhanced SpeedStep Frequency Control> on cpu1
est2: <Enhanced SpeedStep Frequency Control> on cpu2
est3: <Enhanced SpeedStep Frequency Control> on cpu3
est4: <Enhanced SpeedStep Frequency Control> on cpu4
est5: <Enhanced SpeedStep Frequency Control> on cpu5
est6: <Enhanced SpeedStep Frequency Control> on cpu6
est7: <Enhanced SpeedStep Frequency Control> on cpu7
est8: <Enhanced SpeedStep Frequency Control> on cpu8
est9: <Enhanced SpeedStep Frequency Control> on cpu9
est10: <Enhanced SpeedStep Frequency Control> on cpu10
est11: <Enhanced SpeedStep Frequency Control> on cpu11
est12: <Enhanced SpeedStep Frequency Control> on cpu12
est13: <Enhanced SpeedStep Frequency Control> on cpu13
est14: <Enhanced SpeedStep Frequency Control> on cpu14
est15: <Enhanced SpeedStep Frequency Control> on cpu15
ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
Timecounters tick every 1.000 msec
nvme cam probe device init
usbus0: 12Mbps Full Speed USB v1.0
usbus1: 12Mbps Full Speed USB v1.0
usbus2: 12Mbps Full Speed USB v1.0
ugen0.1: <Intel> at usbus0
uhub0: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0
ugen1.1: <Intel> at usbus1
uhub1: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus1
ugen2.1: <Intel> at usbus2
uhub2: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus2
usbus3: 480Mbps High Speed USB v2.0
usbus4: 12Mbps Full Speed USB v1.0
usbus5: 12Mbps Full Speed USB v1.0
ugen3.1: <Intel> at usbus3
uhub3: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus3
ugen4.1: <Intel> at usbus4
uhub4: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus4
ugen5.1: <Intel> at usbus5
uhub5: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus5
usbus6: 12Mbps Full Speed USB v1.0
usbus7: 480Mbps High Speed USB v2.0
ugen6.1: <Intel> at usbus6
uhub6: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus6
ugen7.1: <Intel> at usbus7
uhub7: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus7
uhub2: 2 ports with 2 removable, self powered
uhub1: 2 ports with 2 removable, self powered
uhub0: 2 ports with 2 removable, self powered
uhub6: 2 ports with 2 removable, self powered
uhub4: 2 ports with 2 removable, self powered
uhub5: 2 ports with 2 removable, self powered
ada0 at ata2 bus 0 scbus0 target 0 lun 0
ada0: <GB0500EAFYL HPG1> ATA-7 SATA 2.x device
ada0: Serial Number WCASY6743897
ada0: 300.000MB/s transfers (SATA 2.x, UDMA5, PIO 8192bytes)
ada0: 476940MB (976773168 512 byte sectors)
ada1 at ata2 bus 0 scbus0 target 1 lun 0
ada1: <ST500DM002-1BD142 KC48> ATA8-ACS SATA 3.x device
ada1: Serial Number Z6EMAENR
ada1: 300.000MB/s transfers (SATA 2.x, UDMA5, PIO 8192bytes)
ada1: 476940MB (976773168 512 byte sectors)
ada1: quirks=0x1<4K>
ada2 at ata3 bus 0 scbus1 target 0 lun 0
ada2: <GB0500EAFYL HPG1> ATA-7 SATA 2.x device
ada2: Serial Number WCASY6752687
ada2: 300.000MB/s transfers (SATA 2.x, UDMA5, PIO 8192bytes)
ada2: 476940MB (976773168 512 byte sectors)
ada3 at ata3 bus 0 scbus1 target 1 lun 0
ada3: <ST500DM002-1BD142 KC48> ATA8-ACS SATA 3.x device
ada3: Serial Number Z6EM8QHK
ada3: 300.000MB/s transfers (SATA 2.x, UDMA5, PIO 8192bytes)
ada3: 476940MB (976773168 512 byte sectors)
ada3: quirks=0x1<4K>
SMP: AP CPU #1 Launched!
SMP: AP CPU #15 Launched!
SMP: AP CPU #4 Launched!
SMP: AP CPU #10 Launched!
SMP: AP CPU #6 Launched!
SMP: AP CPU #11 Launched!
SMP: AP CPU #2 Launched!
SMP: AP CPU #8 Launched!
SMP: AP CPU #3 Launched!
SMP: AP CPU #12 Launched!
SMP: AP CPU #7 Launched!
SMP: AP CPU #13 Launched!
SMP: AP CPU #5 Launched!
SMP: AP CPU #9 Launched!
SMP: AP CPU #14 Launched!
Timecounter "TSC-low" frequency 1200065624 Hz quality 1000
Trying to mount root from zfs:zfsroot []...
GEOM_MIRROR: Device mirror/swap launched (2/2).
Root mount waiting for: usbus7 usbus3
Root mount waiting for: usbus7 usbus3
uhub7: 6 ports with 6 removable, self powered
uhub3: 6 ports with 6 removable, self powered
igb0: link state changed to UP
vlan1: link state changed to UP
vlan2: link state changed to UP
vlan3: link state changed to UP
vlan4: link state changed to UP
vlan5: link state changed to UP
vlan6: link state changed to UP
vlan7: link state changed to UP
vlan8: link state changed to UP
vlan9: link state changed to UP
vlan10: link state changed to UP
vlan11: link state changed to UP
vlan12: link state changed to UP
vlan13: link state changed to UP

Comment 31 Hiren Panchasara freebsd_committer

2016-10-13 04:51:24 UTC

(In reply to Robert Watson from comment #29)

Robert,

Thanks for your response.

On a slightly modified (nothing in driver space) stable/11, I am seeing repeated panics in sbsndptr() with igb while the box is pretty much idle or handling very little traffic.

(kgdb) bt
#0  __curthread () at ./machine/pcpu.h:221
#1  doadump (textdump=-2121667464) at /d2/hiren/freebsd/sys/kern/kern_shutdown.c:298
#2  0xffffffff80389f86 in db_fncall_generic (nargs=0, addr=<optimized out>, rv=<optimized out>, 
    args=<optimized out>) at /d2/hiren/freebsd/sys/ddb/db_command.c:568
#3  db_fncall (dummy1=<optimized out>, dummy2=<optimized out>, dummy3=<optimized out>, dummy4=<optimized out>)
    at /d2/hiren/freebsd/sys/ddb/db_command.c:616
#4  0xffffffff80389a29 in db_command (last_cmdp=<optimized out>, cmd_table=<optimized out>, 
    dopager=<optimized out>) at /d2/hiren/freebsd/sys/ddb/db_command.c:440
#5  0xffffffff80389784 in db_command_loop () at /d2/hiren/freebsd/sys/ddb/db_command.c:493
#6  0xffffffff8038c76b in db_trap (type=<optimized out>, code=<optimized out>)
    at /d2/hiren/freebsd/sys/ddb/db_main.c:251
#7  0xffffffff809a6f33 in kdb_trap (type=<optimized out>, code=<optimized out>, tf=<optimized out>)
    at /d2/hiren/freebsd/sys/kern/subr_kdb.c:654
#8  0xffffffff80d93521 in trap_fatal (frame=0xfffffe1f2bb38210, eva=24)
    at /d2/hiren/freebsd/sys/amd64/amd64/trap.c:836
#9  0xffffffff80d93753 in trap_pfault (frame=0xfffffe1f2bb38210, usermode=0)
    at /d2/hiren/freebsd/sys/amd64/amd64/trap.c:691
#10 0xffffffff80d92cdc in trap (frame=0xfffffe1f2bb38210) at /d2/hiren/freebsd/sys/amd64/amd64/trap.c:442
#11 <signal handler called>
#12 sbsndptr (sb=0xfffff8060f8a5518, off=0, len=4294967287, moff=0xfffffe1f2bb38420)
    at /d2/hiren/freebsd/sys/kern/uipc_sockbuf.c:1191
#13 0xffffffff80ab9382 in tcp_output (tp=<optimized out>) at /d2/hiren/freebsd/sys/netinet/tcp_output.c:1099
#14 0xffffffff80ab6105 in tcp_do_segment (m=<optimized out>, th=<optimized out>, so=0xfffff8060f8a5360, 
    tp=<optimized out>, drop_hdrlen=60, tlen=<optimized out>, iptos=<optimized out>, 
    ti_locked=<error reading variable: Cannot access memory at address 0x1>)
    at /d2/hiren/freebsd/sys/netinet/tcp_input.c:3182
#15 0xffffffff80ab2803 in tcp_input (mp=<optimized out>, offp=<optimized out>, proto=<optimized out>)
    at /d2/hiren/freebsd/sys/netinet/tcp_input.c:1444
#16 0xffffffff80aa6bc5 in ip_input (m=<error reading variable: Cannot access memory at address 0x0>)
    at /d2/hiren/freebsd/sys/netinet/ip_input.c:809
#17 0xffffffff80a82b35 in netisr_dispatch_src (proto=1, source=<optimized out>, m=0x0)
    at /d2/hiren/freebsd/sys/net/netisr.c:1120
#18 0xffffffff80a6c2ca in ether_demux (ifp=<optimized out>, m=0x0) at /d2/hiren/freebsd/sys/net/if_ethersubr.c:850
#19 0xffffffff80a6cf22 in ether_input_internal (ifp=<optimized out>, m=0x0)
    at /d2/hiren/freebsd/sys/net/if_ethersubr.c:639
#20 ether_nh_input (m=<optimized out>) at /d2/hiren/freebsd/sys/net/if_ethersubr.c:669
#21 0xffffffff80a82b35 in netisr_dispatch_src (proto=5, source=<optimized out>, m=0x0)
    at /d2/hiren/freebsd/sys/net/netisr.c:1120
#22 0xffffffff80a6c546 in ether_input (ifp=<optimized out>, m=0x0) at /d2/hiren/freebsd/sys/net/if_ethersubr.c:759
#23 0xffffffff804e2b3c in igb_rx_input (rxr=<optimized out>, ifp=0xfffff80115614800, m=0xfffff8014eee7600, 
    ptype=<optimized out>) at /d2/hiren/freebsd/sys/dev/e1000/if_igb.c:4957
#24 igb_rxeof (que=<optimized out>, count=358700136, done=<optimized out>)
    at /d2/hiren/freebsd/sys/dev/e1000/if_igb.c:5185
#25 0xffffffff804e1daf in igb_msix_que (arg=<optimized out>) at /d2/hiren/freebsd/sys/dev/e1000/if_igb.c:1612
#26 0xffffffff8091425f in intr_event_execute_handlers (p=<optimized out>, ie=<optimized out>)
    at /d2/hiren/freebsd/sys/kern/kern_intr.c:1262
#27 0xffffffff80914876 in ithread_execute_handlers (ie=<optimized out>, p=<optimized out>)
    at /d2/hiren/freebsd/sys/kern/kern_intr.c:1275
#28 ithread_loop (arg=<optimized out>) at /d2/hiren/freebsd/sys/kern/kern_intr.c:1356
#29 0xffffffff80910ea5 in fork_exit (callout=0xffffffff809147b0 <ithread_loop>, arg=0xfffff8011561a0e0, 
    frame=0xfffffe1f2bb38ac0) at /d2/hiren/freebsd/sys/kern/kern_fork.c:1040
#30 <signal handler called>

----------------------------------------------------------------

Most interesting frames are these 2:

#22 0xffffffff80a6c546 in ether_input (ifp=<optimized out>, m=0x0) at /d2/hiren/freebsd/sys/net/if_ethersubr.c:759
#23 0xffffffff804e2b3c in igb_rx_input (rxr=<optimized out>, ifp=0xfffff80115614800, m=0xfffff8014eee7600, 
    ptype=<optimized out>) at /d2/hiren/freebsd/sys/dev/e1000/if_igb.c:4957

#23 has an mbuf while #22 has it null.

Does this point to your hunch of
"device-driver bugs involving modifications to the mbuf chain after submitting the mbuf to the network stack (e.g., due to concurrency bugs in the device driver)" ?

OR something else is going on?

Comment 32 Daniel Bilik 2016-10-13 08:37:54 UTC

(In reply to Robert Watson from comment #29)

> On the whole, my intuition is towards a device-driver bug based
> on past experience.

We've also been struggling with this in recent weeks, and I can confirm Robert's intuition.

In our case, the bug affects two hosts running recent 10-STABLE, connected to each other via igb(4) through a dedicated 100Mb switch. When transferring a directory structure holding several gigabytes of data over the rsync protocol, either the sender or the receiver panics in less than a minute with:

Panic String: sbsndptr: sockbuf 0xfffff8000ccc76f8 and mbuf 0xfffff802a0145800 clashing

Interestingly, scp(1)ing data between the hosts doesn't seem to trigger this panic as easily, but sometimes it does, mostly when copying larger (>1GB) files.

We fixed this just yesterday by limiting the number of igb(4) txrx queues, i.e. by adding this to loader.conf:

hw.igb.num_queues=1

Now the hosts run stable, periodically rsyncing data in both directions.

Comment 33 slw 2016-10-13 09:12:40 UTC

(In reply to Hiren Panchasara from comment #31)

> Most interesting frames are these 2:
> 
> #22 0xffffffff80a6c546 in ether_input (ifp=<optimized out>, m=0x0) at /d2/hiren/freebsd/sys/net/if_ethersubr.c:759
> #23 0xffffffff804e2b3c in igb_rx_input (rxr=<optimized out>, ifp=0xfffff80115614800, m=0xfffff8014eee7600, 
>    ptype=<optimized out>) at /d2/hiren/freebsd/sys/dev/e1000/if_igb.c:4957
>
> #23 has an mbuf while #22 has it null.

> Does this point to your hunch of
> "device-driver bugs involving modifications to the mbuf chain after submitting the mbuf to the network stack (e.g., due to concurrency bugs in the device driver)" ?

This is just a result of compiler optimisation and stack decoding.
The compiler uses the same register for m that it was passed in at call time, and does

while (m) {  
 mn = m->m_nextpkt;
[...]
 m = mn;
}

As a result, m (as a decoded argument) will be displayed incorrectly.
Actually this is just the last loop iteration, with the last mbuf in the chain.
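
A minimal sketch of this effect, assuming a stripped-down stand-in for struct mbuf (struct pkt, nextpkt and walk_chain are hypothetical names for illustration, not kernel code). The loop overwrites its own parameter, so by the time it exits, the variable a debugger would read back no longer holds what the caller passed:

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct mbuf. */
struct pkt {
	struct pkt *nextpkt;	/* next packet in the chain, like m_nextpkt */
	int id;
};

/*
 * Walk a packet chain the same way as the quoted loop. The parameter m
 * is reassigned on every iteration, so after the last packet it is NULL
 * even though the caller passed a valid pointer.
 */
struct pkt *
walk_chain(struct pkt *m, int *visited)
{
	struct pkt *mn;

	while (m != NULL) {
		mn = m->nextpkt;	/* remember the successor first */
		(*visited)++;		/* stand-in for processing the packet */
		m = mn;			/* m now differs from the argument */
	}
	return (m);			/* NULL once the chain is exhausted */
}
```

Calling walk_chain() on a two-packet chain visits both packets and returns NULL. If the compiler keeps m in the register it arrived in, a backtrace taken late in the loop would show the overwritten value rather than the original argument.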

Comment 34 Hiren Panchasara freebsd_committer

2016-10-13 16:40:39 UTC

(In reply to slw from comment #33)
Thanks, but I am a little confused.

Which value of 'm' should I trust? Is it null in frame #22 or not? It seems to be null in the frames above it as well.

Comment 35 slw 2016-10-13 17:55:40 UTC

(In reply to Hiren Panchasara from comment #34)
> which value of 'm' should I trust? is it null in frame #22 or not? it seems like null in the frames above it also.

Partially. ether_input is called with m set to 0xfffff8014eee7600 (and this is the first m for the subsequent function invocations), does one iteration (or more, with a different m; answering that needs kgdb access to the vmcore to analyse 0xfffff8014eee7600) and calls netisr_dispatch with m passed as the second argument (in the %rsi register). None of the subsequent invocations has to preserve %rsi (or %rdx where m is passed as the third argument), so the backtrace can decode the call arguments incorrectly.

Just a reality check: frame #19, ether_input_internal (ifp=<optimized out>, m=0x0), line 483:

        if (m->m_len < ETHER_HDR_LEN) {

would have to cause a kernel panic right there if m were really NULL.

This is just incorrect decoding of arguments.

Comment 36 Greg Lund-Chaix 2016-11-21 20:49:14 UTC

(In reply to Daniel Bilik from comment #32)

I was seeing identical issues - panic/reboot every few hours when under network load (rsync and zfs replication in this case) on 11.0-RELEASE.  Daniel's suggested workaround setting "hw.igb.num_queues=1" in loader.conf has stabilized the systems.

Comment 37 Hiren Panchasara freebsd_committer

2017-01-07 22:56:56 UTC

*** Bug 147558 has been marked as a duplicate of this bug. ***

Comment 38 Ivan Klymenko 2017-02-15 08:01:04 UTC

I have the same problem: http://docs.freebsd.org/cgi/mid.cgi?20170215093609.78a77ead
Head maillist: http://docs.freebsd.org/cgi/mid.cgi?20161021220413.1d130f5c

Comment 39 Robert Watson freebsd_committer

2017-02-15 09:56:03 UTC

FYI, I feel that this bug report ought to be closed, and a new and more specific report opened on the IGB issue. The "sbdrop" panic is a generic debugging invariant that catches a range of different bugs, and this report has likely therefore encapsulated quite a few different bugs over the years, rather than reflecting a single bug.

Can I recommend the creation of a new ticket tracking the presumed IGB multi-queue issue specifically?

Comment 40 Ivan Klymenko 2017-02-15 12:59:42 UTC

(In reply to Robert Watson from comment #39)
I am afraid I cannot add anything to a new tracking ticket beyond what has already been described in the mailing list thread above.
In the same thread, I explained why I could not get more information about the cause of the panic.

Sorry.

Comment 41 emz 2017-02-16 06:14:44 UTC

Maybe it would be better just to change the subject, rather than lose all the decent work here in the comments? Closing such a bug would be really discouraging, even if its subject is misleading.

Comment 42 Hiren Panchasara freebsd_committer

2017-02-16 07:01:01 UTC

I've seen this happen on both em(4), which has only 1 queue, and igb(4), which has 2.
It'd be interesting to see whether the new iflib'd em(4) driver in -HEAD (which covers both em and igb) helps in this regard.

Comment 43 Kubilay Kocak freebsd_committer

2017-03-31 02:57:05 UTC

@All For issues where the 'described problem' and the 'thing(s) that need to be changed' are not one and the same, to retain all history/contextual information and not cause confusion (renaming titles, etc), the correct process is as follows:

1) Leave the 'original description/report of symptoms' issues 'as is'.
2) Create a blocking (depends on) issue for each area/component within FreeBSD that requires a change/fix (em, igb, etc). Alternatively, if multiple areas require fixing/changes but these will be resolved by a single person or maintainer (the assignee), then a single blocking issue is fine.

Notes:

The assignee of the blocked (parent) issue (i.e. 'the original report') can be changed to the person(s) responsible for (the assignee of) the blocking (sub) issue.

Alternatively, the assignee of the parent issue can also be a person who wants to/will 'see it through', coordinating and updating issue metadata until all sub-tasks are closed, including any relevant MFC's and documentation tasks.

Comment 44 Robert Watson freebsd_committer

2017-03-31 09:37:56 UTC

It's probably best to be clear: the bugs being described by various reporters over the years are *different* (and likely entirely *independent*) bugs that happen to have the same visible symptom as a result of being detected by the same check.  Putting these different problems in the same open bug report, and keeping it open, is very much like putting all kernel panics in the same bug report because they share the word "panic" and involve a crash (which is pretty much what is happening here). As a result, there's a bug report that describes many different problems fixed in many different ways at many different times, and is never closed. Not only that, but prior debugging may shed no light on more recent problems, and could in fact obscure understanding, since it's not progress being made on one bug, but observations of several likely independent problems.

Comment 45 Mike Andrews 2017-03-31 17:25:00 UTC

As the guy that filed the original bug report almost 7 years ago, I'd tend to agree with Robert Watson here.  I'll see if Bugzilla will let me close it.

Comment 46 Robert Watson freebsd_committer

2017-03-31 18:14:50 UTC

I've opened a new #218270 for the sbsndptr() crash.

Comment 47 Kubilay Kocak freebsd_committer

2017-07-05 11:55:37 UTC

Use more appropriate resolution (FIXED is for those issues resolved by a change/commit)
