Under heavy load (i.e. enough to make an 8 GB machine start to swap), I'm seeing multiple identical machines panic repeatedly with the above panic messages. The panics go away once the load goes down. Very occasionally I also see "em0: discard frame w/o packet header" before the panics.

Hardware is five identical Supermicro PDSMI+ systems, each with a Q6600 and 8 GB of ECC memory.

The only references to these panics that I can find on Google point at either IPv6 or em as potential issues, and we're using both. :) Specifically, http://groups.google.com/group/mailing.freebsd.net/browse_thread/thread/28db45413a889411 looks VERY similar. I have not yet tried shutting off IPv6, or at least switching some services back to IPv4. (Our v6 usage is all internal-only.)

core.txt.* files are at http://www.bit0.com/tmp/core.txt.20100721.tar.gz

I have minidumps as well, but as they may contain some proprietary data I'd rather not post them online. However, I can run whatever kgdb commands are needed to help troubleshoot. :)

Fix: Unknown
How-To-Repeat: See above
Responsible Changed From-To: freebsd-bugs->freebsd-net Over to maintainer(s).
Responsible Changed From-To: freebsd-net->andre Take over.
Mike,

I see from the information in the crash dumps that you use ZERO_COPY_SOCKETS in your kernel configuration. ZERO_COPY_SOCKETS may have some bugs regarding the mbuf and VM page lifecycle. Its use is not really supported at the moment, and we have highly optimized the normal send path, so further optimizations are not really necessary.

Please recompile your kernel without ZERO_COPY_SOCKETS and report whether you still see the sbdrop and sockbuf panics. Debugging ZERO_COPY_SOCKETS is very difficult because of the complex interactions between the VM, mbuf and sockbuf systems.

-- Andre
I removed ZERO_COPY_SOCKETS and am still seeing panics. A new set of core.txt files is at http://www.bit0.com/tmp/core.txt.20100812.tar.gz
On 12.08.2010 18:21, Mike Andrews wrote:
> I removed ZERO_COPY_SOCKETS and am still seeing panics. A new set of
> core.txt files is at http://www.bit0.com/tmp/core.txt.20100812.tar.gz

Please try the attached patch and compile your kernel with INVARIANTS. It contains some debugging code to catch any corruption of the sockbuf when it happens, and may also contain a few potential fixes.

We can narrow it down now.

-- Andre
I'll try this this evening. Would options SOCKBUF_DEBUG help any?
On 13 August 2010 14:02, Andre Oppermann <andre@freebsd.org> wrote:
> On 12.08.2010 18:21, Mike Andrews wrote:
>>
>> I removed ZERO_COPY_SOCKETS and am still seeing panics. A new set of
>> core.txt files is at http://www.bit0.com/tmp/core.txt.20100812.tar.gz
>
> Please try the attached patch and compile your kernel with INVARIANTS.
> It contains some debugging code to catch any corruption to the sockbuf
> when it happens and may also contain a few potential fixes.
>
> We can narrow it down now.

[as I was (occasionally) added to the cc: list]

So, I tried this patch, and the box panicked shortly after starting multiuser. My test box has no swap space and no debug symbols this time :(

panic: sbflush_internal: sb_cc != total mbuf length
db> bt
Tracing pid 982 tid 100111 td 0xffffff0008032440
kdb_enter() at kdb_enter+0x3d
panic() at panic+0x17b
sbflush_internal() at sbflush_internal+0x98
sbrelease_internal() at sbrelease_internal+0x1c
sofree() at sofree+0x1bb
soclose() at soclose+0x32b
_fdrop() at _fdrop+0x23
closef() at closef+0x5b
kern_close() at kern_close+0x110
syscallenter() at syscallenter+0x1aa
syscall() at syscall+0x4c
Xfast_syscall() at Xfast_syscall+0xe2
--- syscall (6, FreeBSD ELF64, close), rip = 0x800742fbc, rsp = 0x7fffffffd2f8, rbp = 0x800c07060 ---
db> show proc 982
Process 982 (ypbind) at 0xffffff0008034000:
 state: NORMAL
 uid: 0 gids: 0
 parent: pid 980 at 0xffffff00080358c0
 ABI: FreeBSD ELF64
 arguments: /usr/sbin/ypbind
 threads: 1
100111 Run CPU 1 ypbind

--
wbr,
pluknet
The same thing happened to me today. I don't have ZERO_COPY_SOCKETS, but I do use IPv6. Until today my system was completely solid (no panics for weeks/months of uptime), and the only change I made recently was enabling ALTQ in pf.conf a week ago, but it had been working for a couple of days with no ill effects. The panic happened when I resumed a Win7 client and opened a web browser. I'm running Squid-3.1 with pf transparent redirection.

System: FreeBSD 8.1-PRERELEASE #4: Wed Jul 14 21:47:49 CEST 2010

Config: GENERIC - (everything that can go into kld) + SW_WATCHDOG + DEVICE_POLLING (not used) + ALTQ + DDB

KLDs: kernel vesa.ko geom_journal.ko geom_label.ko if_rl.ko miibus.ko if_vr.ko snd_via8233.ko sound.ko usb.ko ukbd.ko ums.ko umass.ko cam.ko agp.ko uhci.ko ehci.ko kbdmux.ko geom_part_gpt.ko atapicam.ko if_bridge.ko bridgestp.ko wlan_wep.ko wlan.ko wlan_tkip.ko wlan_ccmp.ko wlan_xauth.ko wlan_acl.ko cpufreq.ko netgraph.ko aio.ko sem.ko acpi.ko geom_eli.ko crypto.ko zlib.ko procfs.ko pseudofs.ko linprocfs.ko linux.ko nullfs.ko pf.ko if_tun.ko ng_ether.ko ng_pppoe.ko ng_socket.ko if_stf.ko nfsclient.ko nfs_common.ko krpc.ko nfsserver.ko nfssvc.ko nfslockd.ko ng_mppc.ko rc4.ko fuse.ko accf_http.ko accf_data.ko

Dmesg:
Unread portion of the kernel message buffer:
panic: sbsndptr: sockbuf 0xc6ea65bc and mbuf 0xc6248100 clashing
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper(c0764c90,e4208910,c04fa0b9,c077f5e6,0,...) at db_trace_self_wrapper+0x26 kdb_backtrace(c077f5e6,0,c0767e81,e420891c,0,...) at kdb_backtrace+0x29 panic(c0767e81,c0749504,c6ea65bc,c6248100,c4eb24f0,...) at panic+0x119 sbsndptr(c6ea65bc,0,4fd,e42089cc,c4f56ef8,...) at sbsndptr+0xa6 tcp_output(c4eb24f0,0,0,c3e45800,c47b0700,...) at tcp_output+0xc84 tcp_do_segment(c4eb24f0,28,0,0,2,...) at tcp_do_segment+0x1f45 tcp_input(c6079d00,14,c3e45800,1,0,...) at tcp_input+0x11d0 ip_input(c6079d00,e4208bb0,80246,24,e4208bd8,...) at ip_input+0x6e5 netisr_dispatch_src(1,0,c6079d00,e4208c00,c05abb61,...)
at netisr_dispatch_src+0x89 netisr_dispatch(1,c6079d00,0,c3e45800,c62d8808,...) at netisr_dispatch+0x20 ether_demux(c3e45800,c6079d00,3,0,3,...) at ether_demux+0x161 ether_input(c3e45800,c6079d00,c0940756,587,c3e8eaec,...) at ether_input+0x323 vr_rxeof(c3e8eaec,0,c0940756,68b,c3e8eaec,...) at vr_rxeof+0x219 vr_intr(c3e8e000,0,109,801bafb1,c8ac,...) at vr_intr+0x114 intr_event_execute_handlers(c3d407f8,c3d3d300,c075f556,52d,c3d3d370,...) at intr_event_execute_handlers+0x14b ithread_loop(c3e1a0e0,e4208d38,7afffffb,ddfffbff,84ff77ff,...) at ithread_loop+0x6b fork_exit(c04d0fd0,c3e1a0e0,e4208d38) at fork_exit+0x90 fork_trampoline() at fork_trampoline+0x8 --- trap 0, eip = 0, esp = 0xe4208d70, ebp = 0 --- Uptime: 1d17h48m27s Physical memory: 1007 MB Kgdb: (kgdb) bt #0 doadump () at pcpu.h:230 #1 0xc04f9e57 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:416 #2 0xc04fa0f5 in panic (fmt=Variable "fmt" is not available. ) at /usr/src/sys/kern/kern_shutdown.c:590 #3 0xc0558126 in sbsndptr (sb=0xc6ea65bc, off=0, len=dwarf2_read_address: Corrupted DWARF expression. 
) at /usr/src/sys/kern/uipc_sockbuf.c:954 #4 0xc0636c04 in tcp_output (tp=0xc4eb24f0) at /usr/src/sys/netinet/tcp_output.c:817 #5 0xc0633a85 in tcp_do_segment (m=0xc6079d00, th=0xc62d882a, so=0xc6ea64d4, tp=0xc4eb24f0, drop_hdrlen=40, tlen=0, iptos=0 '\0', ti_locked=2) at /usr/src/sys/netinet/tcp_input.c:2693 #6 0xc0635080 in tcp_input (m=0xc6079d00, off0=20) at /usr/src/sys/netinet/tcp_input.c:1029 #7 0xc05cc6f5 in ip_input (m=0xc6079d00) at /usr/src/sys/netinet/ip_input.c:793 #8 0xc05aed89 in netisr_dispatch_src (proto=1, source=0, m=0xc6079d00) at /usr/src/sys/net/netisr.c:917 #9 0xc05af050 in netisr_dispatch (proto=1, m=0xc6079d00) at /usr/src/sys/net/netisr.c:1004 #10 0xc05abb61 in ether_demux (ifp=0xc3e45800, m=0xc6079d00) at /usr/src/sys/net/if_ethersubr.c:901 #11 0xc05ac0c3 in ether_input (ifp=0xc3e45800, m=0xc6079d00) at /usr/src/sys/net/if_ethersubr.c:760 #12 0xc093bea9 in vr_rxeof (sc=0xc3e8e000) at /usr/src/sys/modules/vr/../../dev/vr/if_vr.c:1416 #13 0xc093f374 in vr_intr (arg=0xc3e8e000) at /usr/src/sys/modules/vr/../../dev/vr/if_vr.c:1710 #14 0xc04cf94b in intr_event_execute_handlers (p=0xc3d407f8, ie=0xc3d3d300) at /usr/src/sys/kern/kern_intr.c:1220 #15 0xc04d103b in ithread_loop (arg=0xc3e1a0e0) at /usr/src/sys/kern/kern_intr.c:1233 #16 0xc04cd1d0 in fork_exit (callout=0xc04d0fd0 <ithread_loop>, arg=0xc3e1a0e0, frame=0xe4208d38) at /usr/src/sys/kern/kern_fork.c:844 #17 0xc070eb94 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:273 (kgdb) frame 3 #3 0xc0558126 in sbsndptr (sb=0xc6ea65bc, off=0, len=dwarf2_read_address: Corrupted DWARF expression. 
) at /usr/src/sys/kern/uipc_sockbuf.c:954 954 panic("%s: sockbuf %p and mbuf %p clashing", __func__, sb, ret); (kgdb) p *sb $1 = {sb_sel = {si_tdlist = {tqh_first = 0x0, tqh_last = 0xc6ea65bc}, si_note = {kl_list = {slh_first = 0x0}, kl_lock = 0xc04c6910 <knlist_mtx_lock>, kl_unlock = 0xc04c68c0 <knlist_mtx_unlock>, kl_assert_locked = 0xc04c35e0 <knlist_mtx_assert_locked>, kl_assert_unlocked = 0xc04c35f0 <knlist_mtx_assert_unlocked>, kl_lockarg = 0xc6ea65e0}, si_mtx = 0xc4490448}, sb_mtx = {lock_object = { lo_name = 0xc0767ab5 "so_snd", lo_flags = 16973824, lo_data = 0, lo_witness = 0x0}, mtx_lock = 3285714816}, sb_sx = {lock_object = { lo_name = 0xc0767f89 "so_snd_sx", lo_flags = 36896768, lo_data = 0, lo_witness = 0x0}, sx_lock = 1}, sb_state = 0, sb_mb = 0xc6248100, sb_mbtail = 0xc6292a00, sb_lastrecord = 0xc6248100, sb_sndptr = 0x0, sb_sndptroff = 601, sb_cc = 1277, sb_hiwat = 33580, sb_mbcnt = 2048, sb_mcnt = 8, sb_ccnt = 0, sb_mbmax = 262144, sb_ctl = 0, sb_lowat = 2048, sb_timeo = 0, sb_flags = 2048, sb_upcall = 0, sb_upcallarg = 0x0} (kgdb) p *buf $2 = {b_bufobj = 0xc6835b20, b_bcount = 16384, b_caller1 = 0x0, b_data = 0xd7f90000 "<unreadable garbage>"..., b_error = 0, b_iocmd = 1 '\001', b_ioflags = 2 '\002', b_iooffset = 469651013632, b_resid = 0, b_iodone = 0, b_blkno = 917287136, b_offset = 72007680, b_bobufs = {tqe_next = 0xd7e1de00, tqe_prev = 0xd7d52888}, b_left = 0x0, b_right = 0x0, b_vflags = 0, b_freelist = {tqe_next = 0xd7d90f20, tqe_prev = 0xd7df1adc}, b_qindex = 1, b_flags = 805306400, b_xflags = 2 '\002', b_lock = {lock_object = {lo_name = 0xc0768b4a "bufwait", lo_flags = 91422720, lo_data = 0, lo_witness = 0x0}, lk_lock = 1, lk_timo = 0, lk_pri = 80}, b_bufsize = 16384, b_runningbufspace = 0, b_kvabase = 0xd7f90000 "<unreadable garbage>"..., b_kvasize = 16384, b_lblkno = 4395, b_vp = 0xc6835a78, b_dirtyoff = 0, b_dirtyend = 0, b_rcred = 0x0, b_wcred = 0x0, b_saveaddr = 0xd7f90000, b_pager = {pg_reqpage = 0}, b_cluster = {cluster_head = { 
tqh_first = 0xd7ef93a0, tqh_last = 0xd7dfd1b0}, cluster_entry = {tqe_next = 0xd7ef93a0, tqe_prev = 0xd7dfd1b0}}, b_pages = {0xc1b6e200, 0xc150bde8, 0xc150be30, 0xc1ba1c08, 0x0 <repeats 28 times>}, b_npages = 4, b_dep = {lh_first = 0x0}, b_fsprivate1 = 0x0, b_fsprivate2 = 0x0, b_fsprivate3 = 0x0, b_pin_count = 0}

The "<unreadable garbage>" strings were inserted by me in place of the actual contents. I wonder why gdb never fully works with anything above -O0 (variables not available, corrupted DWARF, etc.). I didn't have this problem in the distant past (FreeBSD 5.x, I think).
With the patch and with INVARIANTS it's harder to reproduce but still possible. I had a few crashes that hung while attempting to write dumps. Today I had this one on a different machine (same kernel though) that was NOT under significant load but shares other characteristics -- same hardware, same use of IPv6, etc. http://www.bit0.com/tmp/core.beer.2010-09-09.txt.gz
Got a panic when attempting to manually reboot just now. During the shutdown: panic: sbflush_internal: sb_cc != total mbuf length cpuid = 1 Uptime: 1d6h22m22s Physical memory: 8179 MB Dumping 3250 MB: 3235 3219 3203 ...and it never finished the dump, it just hung at 3203. I'm going to try running today's 8.1-STABLE instead of -RELEASE. We'll see if that helps.
Hi there,

If my eyes do not deceive me, the source of the panic I posted above was the wrong KASSERT() check. The assignment in slen = m_length(n, &n) should be '+=', like this:

#ifdef INVARIANTS
static int
sb_cc_check(struct sockbuf *sb)
{
	struct mbuf *n = sb->sb_mb;
	int slen = 0;

	while (n) {
		slen += m_length(n, &n);
		n = n->m_nextpkt;
	}
	return (slen == sb->sb_cc ? 1 : 0);
}
#endif
Recently we hit this panic twice, each time just after a reboot, on 9.3-STABLE. So I think high load isn't needed to reproduce it. I found that the TCP connection that triggered the panic was SSH over IPv6.

Unread portion of the kernel message buffer: panic: sbsndptr: sockbuf 0xfffffe01505826d0 and mbuf 0xfffffe015043b000 clashing cpuid = 30 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2a/frame 0xffffff90c310d430 kdb_backtrace() at kdb_backtrace+0x37/frame 0xffffff90c310d4f0 panic() at panic+0x1ce/frame 0xffffff90c310d5f0 sbsndptr() at sbsndptr+0xe4/frame 0xffffff90c310d610 tcp_output() at tcp_output+0x16cd/frame 0xffffff90c310d7c0 tcp_usr_send() at tcp_usr_send+0x325/frame 0xffffff90c310d820 sosend_generic() at sosend_generic+0x3f6/frame 0xffffff90c310d8c0 soo_write() at soo_write+0x5e/frame 0xffffff90c310d8f0 dofilewrite() at dofilewrite+0x85/frame 0xffffff90c310d940 kern_writev() at kern_writev+0x6c/frame 0xffffff90c310d980 sys_write() at sys_write+0x64/frame 0xffffff90c310d9d0 amd64_syscall() at amd64_syscall+0x5ea/frame 0xffffff90c310daf0 Xfast_syscall() at Xfast_syscall+0xf7/frame 0xffffff90c310daf0 (kgdb) bt #0 doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:271 #1 0xffffffff80907eb4 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:454 #2 0xffffffff809083a7 in panic (fmt=0x1 <Address 0x1 out of bounds>) at /usr/src/sys/kern/kern_shutdown.c:642 #3 0xffffffff809766e4 in sbsndptr (sb=<value optimized out>, off=<value optimized out>, len=<value optimized out>, moff=<value optimized out>) at /usr/src/sys/kern/uipc_sockbuf.c:985 #4 0xffffffff80aaedbd in tcp_output (tp=0xfffffe0ba665a3d0) at /usr/src/sys/netinet/tcp_output.c:954 #5 0xffffffff80abc555 in tcp_usr_send (so=0xfffffe0150582550, flags=0, m=0xfffffe015043b000, nam=0x0, control=<value optimized out>, td=0xfffffe0021d91920) at /usr/src/sys/netinet/tcp_usrreq.c:874 #6 0xffffffff8097c1f6 in sosend_generic (so=0xfffffe0150582550, addr=0x0, uio=0xffffff90c310d990,
top=0xfffffe015043b000, control=0x0, flags=<value optimized out>, td=0xfffffe0021d91920) at /usr/src/sys/kern/uipc_socket.c:1376 #7 0xffffffff8095ea6e in soo_write (fp=<value optimized out>, uio=0xffffff90c310d990, active_cred=<value optimized out>, flags=<value optimized out>, td=<value optimized out>) at /usr/src/sys/kern/sys_socket.c:102 #8 0xffffffff80957195 in dofilewrite (td=0xfffffe0021d91920, fd=3, fp=0xfffffe00216fedc0, auio=0xffffff90c310d990, offset=<value optimized out>, flags=0) at file.h:295 #9 0xffffffff809574cc in kern_writev (td=0xfffffe0021d91920, fd=3, auio=0xffffff90c310d990) at /usr/src/sys/kern/sys_generic.c:477 #10 0xffffffff80957554 in sys_write (td=<value optimized out>, uap=<value optimized out>) at /usr/src/sys/kern/sys_generic.c:393 #11 0xffffffff80cfea4a in amd64_syscall (td=0xfffffe0021d91920, traced=0) at subr_syscall.c:135 #12 0xffffffff80ce8ac7 in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:391 #13 0x0000000802da3bec in ?? () (kgdb) p *(struct sockbuf *)0xfffffe01505826d0 $1 = {sb_sel = {si_tdlist = {tqh_first = 0x0, tqh_last = 0x0}, si_note = {kl_list = {slh_first = 0x0}, kl_lock = 0xffffffff808cd0c0 <knlist_mtx_lock>, kl_unlock = 0xffffffff808cd090 <knlist_mtx_unlock>, kl_assert_locked = 0xffffffff808c9a10 <knlist_mtx_assert_locked>, kl_assert_unlocked = 0xffffffff808c9a20 <knlist_mtx_assert_unlocked>, kl_lockarg = 0xfffffe0150582718}, si_mtx = 0x0}, sb_mtx = {lock_object = { lo_name = 0xffffffff80f3e7fd "so_snd", lo_flags = 16973824, lo_data = 0, lo_witness = 0x0}, mtx_lock = 18446741875254171936}, sb_sx = {lock_object = { lo_name = 0xffffffff80f3ed6b "so_snd_sx", lo_flags = 36896768, lo_data = 0, lo_witness = 0x0}, sx_lock = 18446741875254171936}, sb_state = 0, sb_mb = 0xfffffe015043b000, sb_mbtail = 0xfffffe015043b000, sb_lastrecord = 0xfffffe015043b000, sb_sndptr = 0x0, sb_sndptroff = 72, sb_cc = 84, sb_hiwat = 131376, sb_mbcnt = 256, sb_mcnt = 1, sb_ccnt = 0, sb_mbmax = 1051008, sb_ctl = 0, sb_lowat = 2048, 
sb_timeo = 0, sb_flags = 2048, sb_upcall = 0, sb_upcallarg = 0x0} (kgdb) p *(struct mbuf *)0xfffffe015043b000 $2 = {m_hdr = {mh_next = 0x0, mh_nextpkt = 0x0, mh_data = 0xfffffe015043b068 "�\223ykt?--�L\213\203)�>=�\230�\227", mh_len = 72, mh_flags = 2, mh_type = 1, pad = "\000\000\000\000\000"}, M_dat = {MH = {MH_pkthdr = {rcvif = 0x0, header = 0x0, len = 0, flowid = 0, csum_flags = 0, csum_data = 0, tso_segsz = 0, PH_vt = { vt_vtag = 0, vt_nrecs = 0}, tags = {slh_first = 0x0}}, MH_dat = {MH_ext = {ext_buf = 0x1a17cf9646fc61e6 <Address 0x1a17cf9646fc61e6 out of bounds>, ext_free = 0x401596cd1b508c86, ext_arg1 = 0x2d2d3f746b7993d6, ext_arg2 = 0x3d3ee229838b4cb5, ext_size = 2544539875, ref_cnt = 0x1040000b806022a, ext_type = -11722238}, MH_databuf = "�a�F\226�\027\032\206\214P\033�\226\025@�\223ykt?--�L\213\203)�>=�\230�\227\000\000\001R*\002\006�\000\000\004\001\002\"M��P�/\000\026�2\030��U?\003\203�\200\030\b\004��\000\000\001\001\b\nA\022�l\v��\b\000\000\000 ���\034R\\\002tn�b!�\202(}\177Y���\005#j�n�\024\232\224\004}�Rq��\203�\001��맠��38.137.198 via vlan802\b\210\220\215\b\210\216"}}, M_databuf = '\0' <repeats 36 times>, "\033�d\237\000\000\000\000\000\000\000\000�a�F\226�\027\032\206\214P\033�\226\025@�\223ykt?--�L\213\203)�>=�\230�\227\000\000\001R*\002\006�\000\000\004\001\002\"M��P�/\000\026�2\030��U?\003\203�\200\030\b\004��\000\000\001\001\b\nA\022�l\v��\b\000\000\000 ���\034R\\\002tn�b!�\202(}\177Y���\005#j�n�\024\232\224\004}�Rq��\203�\001��맠��38.137.198 via vlan802\b\210\220\215\b\210\216"}} (kgdb) f 6 #6 0xffffffff8097c1f6 in sosend_generic (so=0xfffffe0150582550, addr=0x0, uio=0xffffff90c310d990, top=0xfffffe015043b000, control=0x0, flags=<value optimized out>, td=0xfffffe0021d91920) at /usr/src/sys/kern/uipc_socket.c:1376 1376 error = (*so->so_proto->pr_usrreqs->pru_send)(so, (kgdb) p *so $3 = {so_count = 1, so_type = 1, so_options = 12, so_linger = 0, so_state = 258, so_qstate = 0, so_pcb = 0xfffffe0ba677aaf0, so_vnet = 0x0, so_proto = 
0xffffffff8143c3f0, so_head = 0x0, so_incomp = {tqh_first = 0x0, tqh_last = 0x0}, so_comp = {tqh_first = 0x0, tqh_last = 0x0}, so_list = {tqe_next = 0x0, tqe_prev = 0xfffffe01505862e8}, so_qlen = 0, so_incqlen = 0, so_qlimit = 0, so_timeo = 0, so_error = 0, so_sigio = 0x0, so_oobmark = 0, so_aiojobq = { tqh_first = 0x0, tqh_last = 0xfffffe01505825d0}, so_rcv = {sb_sel = {si_tdlist = {tqh_first = 0x0, tqh_last = 0xfffffe01505825e0}, si_note = {kl_list = { slh_first = 0x0}, kl_lock = 0xffffffff808cd0c0 <knlist_mtx_lock>, kl_unlock = 0xffffffff808cd090 <knlist_mtx_unlock>, kl_assert_locked = 0xffffffff808c9a10 <knlist_mtx_assert_locked>, kl_assert_unlocked = 0xffffffff808c9a20 <knlist_mtx_assert_unlocked>, kl_lockarg = 0xfffffe0150582628}, si_mtx = 0xffffff800e02f2f0}, sb_mtx = {lock_object = {lo_name = 0xffffffff80f3e7f6 "so_rcv", lo_flags = 16973824, lo_data = 0, lo_witness = 0x0}, mtx_lock = 4}, sb_sx = {lock_object = {lo_name = 0xffffffff80f3ed75 "so_rcv_sx", lo_flags = 36896768, lo_data = 0, lo_witness = 0x0}, sx_lock = 1}, sb_state = 0, sb_mb = 0x0, sb_mbtail = 0x0, sb_lastrecord = 0x0, sb_sndptr = 0x0, sb_sndptroff = 0, sb_cc = 0, sb_hiwat = 131376, sb_mbcnt = 0, sb_mcnt = 0, sb_ccnt = 0, sb_mbmax = 1051008, sb_ctl = 0, sb_lowat = 1, sb_timeo = 0, sb_flags = 2056, sb_upcall = 0, sb_upcallarg = 0x0}, so_snd = {sb_sel = {si_tdlist = {tqh_first = 0x0, tqh_last = 0x0}, si_note = {kl_list = {slh_first = 0x0}, kl_lock = 0xffffffff808cd0c0 <knlist_mtx_lock>, kl_unlock = 0xffffffff808cd090 <knlist_mtx_unlock>, kl_assert_locked = 0xffffffff808c9a10 <knlist_mtx_assert_locked>, kl_assert_unlocked = 0xffffffff808c9a20 <knlist_mtx_assert_unlocked>, kl_lockarg = 0xfffffe0150582718}, si_mtx = 0x0}, sb_mtx = {lock_object = {lo_name = 0xffffffff80f3e7fd "so_snd", lo_flags = 16973824, lo_data = 0, lo_witness = 0x0}, mtx_lock = 18446741875254171936}, sb_sx = {lock_object = {lo_name = 0xffffffff80f3ed6b "so_snd_sx", lo_flags = 36896768, lo_data = 0, lo_witness = 0x0}, sx_lock = 
18446741875254171936}, sb_state = 0, sb_mb = 0xfffffe015043b000, sb_mbtail = 0xfffffe015043b000, sb_lastrecord = 0xfffffe015043b000, sb_sndptr = 0x0, sb_sndptroff = 72, sb_cc = 84, sb_hiwat = 131376, sb_mbcnt = 256, sb_mcnt = 1, sb_ccnt = 0, sb_mbmax = 1051008, sb_ctl = 0, sb_lowat = 2048, sb_timeo = 0, sb_flags = 2048, sb_upcall = 0, sb_upcallarg = 0x0}, so_cred = 0xfffffe0150293100, so_label = 0x0, so_peerlabel = 0x0, so_gencnt = 172891, so_emuldata = 0x0, so_accf = 0x0, so_fibnum = 0, so_user_cookie = 0} (kgdb) set $inp=(struct inpcb*)so->so_pcb (kgdb) p *$inp $4 = {inp_hash = {le_next = 0x0, le_prev = 0xfffffe0012f570b8}, inp_pcbgrouphash = {le_next = 0x0, le_prev = 0x0}, inp_list = {le_next = 0xfffffe05c1d22c80, le_prev = 0xffffffff81531050}, inp_ppcb = 0xfffffe0ba665a3d0, inp_pcbinfo = 0xffffffff81531060, inp_pcbgroup = 0x0, inp_pcbgroup_wild = {le_next = 0x0, le_prev = 0x0}, inp_socket = 0xfffffe0150582550, inp_cred = 0xfffffe0150293100, inp_flow = 3067088640, inp_flags = 545300480, inp_flags2 = 0, inp_vflag = 6 '\006', inp_ip_ttl = 64 '@', inp_ip_p = 0 '\0', inp_ip_minttl = 0 '\0', inp_flowid = 576216547, inp_refcount = 1, inp_pspare = {0x0, 0x0, 0x0, 0x0, 0x0}, inp_ispare = {0, 0, 0, 0, 0, 0}, inp_inc = {inc_flags = 1 '\001', inc_len = 0 '\0', inc_fibnum = 0, inc_ie = {ie_fport = 13015, ie_lport = 5632, ie_dependfaddr = {ie46_foreign = { ia46_pad32 = {3087401514, 17039360, 4283245058}, ia46_addr4 = {s_addr = 801984766}}, ie6_foreign = {__u6_addr = { __u6_addr8 = "*\002\006�\000\000\004\001\002\"M��P�/", __u6_addr16 = {554, 47110, 0, 260, 8706, 65357, 20734, 12237}, __u6_addr32 = {3087401514, 17039360, 4283245058, 801984766}}}}, ie_dependladdr = {ie46_local = {ia46_pad32 = {3087401514, 917504, 0}, ia46_addr4 = {s_addr = 1375797248}}, ie6_local = { __u6_addr = {__u6_addr8 = "*\002\006�\000\000\016\000\000\000\000\000\000\000\001R", __u6_addr16 = {554, 47110, 0, 14, 0, 0, 0, 20993}, __u6_addr32 = { 3087401514, 917504, 0, 1375797248}}}}, ie6_zoneid = 0}}, 
inp_label = 0x0, inp_sp = 0x0, inp_depend4 = {inp4_ip_tos = 0 '\0', inp4_options = 0x0, inp4_moptions = 0x0}, inp_depend6 = {inp6_options = 0x0, inp6_outputopts = 0xfffffe0b8794c100, inp6_moptions = 0x0, inp6_icmp6filt = 0x0, inp6_cksum = 0, inp6_hops = -1}, inp_portlist = {le_next = 0xfffffe05c1d22c80, le_prev = 0xfffffe010b8830b0}, inp_phd = 0xfffffe010b8830a0, inp_gencnt = 2008, inp_lle = 0x0, inp_rt = 0x0, inp_lock = {lock_object = {lo_name = 0xffffffff80f59235 "tcpinp", lo_flags = 90898432, lo_data = 0, lo_witness = 0x0}, rw_lock = 18446741875254171936}} (kgdb) printf "%x\n", $inp->inp_flags 2080a000 (kgdb) set $inc=$inp->inp_inc (kgdb) p $inc $6 = {inc_flags = 1 '\001', inc_len = 0 '\0', inc_fibnum = 0, inc_ie = {ie_fport = 13015, ie_lport = 5632, ie_dependfaddr = {ie46_foreign = {ia46_pad32 = {3087401514, 17039360, 4283245058}, ia46_addr4 = {s_addr = 801984766}}, ie6_foreign = {__u6_addr = {__u6_addr8 = "*\002\006�\000\000\004\001\002\"M��P�/", __u6_addr16 = { 554, 47110, 0, 260, 8706, 65357, 20734, 12237}, __u6_addr32 = {3087401514, 17039360, 4283245058, 801984766}}}}, ie_dependladdr = {ie46_local = { ia46_pad32 = {3087401514, 917504, 0}, ia46_addr4 = {s_addr = 1375797248}}, ie6_local = {__u6_addr = { __u6_addr8 = "*\002\006�\000\000\016\000\000\000\000\000\000\000\001R", __u6_addr16 = {554, 47110, 0, 14, 0, 0, 0, 20993}, __u6_addr32 = {3087401514, 917504, 0, 1375797248}}}}, ie6_zoneid = 0}} (kgdb) printf "%04x\n", $inc.inc_ie->ie_lport 1600 (kgdb) printf "%d\n", 0x0016 22
Reassign to freebsd-net.
Second panic: panic: sbsndptr: sockbuf 0xfffffe03e62b5c20 and mbuf 0xfffffe01d8fd3900 clashing cpuid = 31 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2a/frame 0xffffff90d4fca430 kdb_backtrace() at kdb_backtrace+0x37/frame 0xffffff90d4fca4f0 panic() at panic+0x1ce/frame 0xffffff90d4fca5f0 sbsndptr() at sbsndptr+0xe4/frame 0xffffff90d4fca610 tcp_output() at tcp_output+0x16cd/frame 0xffffff90d4fca7c0 tcp_usr_send() at tcp_usr_send+0x325/frame 0xffffff90d4fca820 sosend_generic() at sosend_generic+0x3f6/frame 0xffffff90d4fca8c0 soo_write() at soo_write+0x5e/frame 0xffffff90d4fca8f0 dofilewrite() at dofilewrite+0x85/frame 0xffffff90d4fca940 kern_writev() at kern_writev+0x6c/frame 0xffffff90d4fca980 sys_write() at sys_write+0x64/frame 0xffffff90d4fca9d0 amd64_syscall() at amd64_syscall+0x5ea/frame 0xffffff90d4fcaaf0 Xfast_syscall() at Xfast_syscall+0xf7/frame 0xffffff90d4fcaaf0 --- syscall (4, FreeBSD ELF64, sys_write), rip = 0x802da3bec, rsp = 0x7fffffffdae8, rbp = 0x7fffffffdbf0 --- Uptime: 1m48s Dumping 3468 out of 65475 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% Reading symbols from /boot/kernel/zfs.ko...Reading symbols from /boot/kernel/zfs.ko.symbols...done. done. Loaded symbols for /boot/kernel/zfs.ko Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from /boot/kernel/opensolaris.ko.symbols...done. done. Loaded symbols for /boot/kernel/opensolaris.ko Reading symbols from /boot/kernel/if_igb.ko...Reading symbols from /boot/kernel/if_igb.ko.symbols...done. done. Loaded symbols for /boot/kernel/if_igb.ko Reading symbols from /boot/kernel/aac.ko...Reading symbols from /boot/kernel/aac.ko.symbols...done. done. Loaded symbols for /boot/kernel/aac.ko Reading symbols from /boot/kernel/ipdivert.ko...Reading symbols from /boot/kernel/ipdivert.ko.symbols...done. done. Loaded symbols for /boot/kernel/ipdivert.ko Reading symbols from /boot/kernel/ipfw.ko...Reading symbols from /boot/kernel/ipfw.ko.symbols...done. done. 
Loaded symbols for /boot/kernel/ipfw.ko Reading symbols from /boot/kernel/t5fw_cfg.ko...Reading symbols from /boot/kernel/t5fw_cfg.ko.symbols...done. done. Loaded symbols for /boot/kernel/t5fw_cfg.ko Reading symbols from /boot/kernel/if_cxgbe.ko...Reading symbols from /boot/kernel/if_cxgbe.ko.symbols...done. done. Loaded symbols for /boot/kernel/if_cxgbe.ko Reading symbols from /boot/kernel/ipmi.ko...Reading symbols from /boot/kernel/ipmi.ko.symbols...done. done. Loaded symbols for /boot/kernel/ipmi.ko Reading symbols from /boot/kernel/smbus.ko...Reading symbols from /boot/kernel/smbus.ko.symbols...done. done. Loaded symbols for /boot/kernel/smbus.ko #0 doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:271 271 if (textdump && textdump_pending) { (kgdb) bt #0 doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:271 #1 0xffffffff80907eb4 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:454 #2 0xffffffff809083a7 in panic (fmt=0x1 <Address 0x1 out of bounds>) at /usr/src/sys/kern/kern_shutdown.c:642 #3 0xffffffff809766e4 in sbsndptr (sb=<value optimized out>, off=<value optimized out>, len=<value optimized out>, moff=<value optimized out>) at /usr/src/sys/kern/uipc_sockbuf.c:985 #4 0xffffffff80aaedbd in tcp_output (tp=0xfffffe03e675a3d0) at /usr/src/sys/netinet/tcp_output.c:954 #5 0xffffffff80abc555 in tcp_usr_send (so=0xfffffe03e62b5aa0, flags=0, m=0xfffffe01d8fd2200, nam=0x0, control=<value optimized out>, td=0xfffffe0021e90000) at /usr/src/sys/netinet/tcp_usrreq.c:874 #6 0xffffffff8097c1f6 in sosend_generic (so=0xfffffe03e62b5aa0, addr=0x0, uio=0xffffff90d4fca990, top=0xfffffe01d8fd2200, control=0x0, flags=<value optimized out>, td=0xfffffe0021e90000) at /usr/src/sys/kern/uipc_socket.c:1376 #7 0xffffffff8095ea6e in soo_write (fp=<value optimized out>, uio=0xffffff90d4fca990, active_cred=<value optimized out>, flags=<value optimized out>, td=<value optimized out>) at /usr/src/sys/kern/sys_socket.c:102 #8 0xffffffff80957195 in dofilewrite 
(td=0xfffffe0021e90000, fd=3, fp=0xfffffe0021cf3820, auio=0xffffff90d4fca990, offset=<value optimized out>, flags=0) at file.h:295 #9 0xffffffff809574cc in kern_writev (td=0xfffffe0021e90000, fd=3, auio=0xffffff90d4fca990) at /usr/src/sys/kern/sys_generic.c:477 #10 0xffffffff80957554 in sys_write (td=<value optimized out>, uap=<value optimized out>) at /usr/src/sys/kern/sys_generic.c:393 #11 0xffffffff80cfea4a in amd64_syscall (td=0xfffffe0021e90000, traced=0) at subr_syscall.c:135 #12 0xffffffff80ce8ac7 in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:391 #13 0x0000000802da3bec in ?? () Previous frame inner to this frame (corrupt stack?) (kgdb) p *(struct sockbuf *)0xfffffe03e62b5c20 $1 = {sb_sel = {si_tdlist = {tqh_first = 0x0, tqh_last = 0x0}, si_note = {kl_list = {slh_first = 0x0}, kl_lock = 0xffffffff808cd0c0 <knlist_mtx_lock>, kl_unlock = 0xffffffff808cd090 <knlist_mtx_unlock>, kl_assert_locked = 0xffffffff808c9a10 <knlist_mtx_assert_locked>, kl_assert_unlocked = 0xffffffff808c9a20 <knlist_mtx_assert_unlocked>, kl_lockarg = 0xfffffe03e62b5c68}, si_mtx = 0x0}, sb_mtx = {lock_object = { lo_name = 0xffffffff80f3e7fd "so_snd", lo_flags = 16973824, lo_data = 0, lo_witness = 0x0}, mtx_lock = 18446741875255214080}, sb_sx = {lock_object = { lo_name = 0xffffffff80f3ed6b "so_snd_sx", lo_flags = 36896768, lo_data = 0, lo_witness = 0x0}, sx_lock = 18446741875255214080}, sb_state = 0, sb_mb = 0xfffffe01f4069900, sb_mbtail = 0xfffffe01d8fd3900, sb_lastrecord = 0xfffffe01f4069900, sb_sndptr = 0xfffffe01d8fd3900, sb_sndptroff = 1632, sb_cc = 1716, sb_hiwat = 131376, sb_mbcnt = 4864, sb_mcnt = 11, sb_ccnt = 1, sb_mbmax = 1051008, sb_ctl = 0, sb_lowat = 2048, sb_timeo = 0, sb_flags = 2048, sb_upcall = 0, sb_upcallarg = 0x0} (kgdb) p *(struct mbuf *)0xfffffe01d8fd3900 $2 = {m_hdr = {mh_next = 0x0, mh_nextpkt = 0x0, mh_data = 0xfffffe01d8fd3928 "", mh_len = 68, mh_flags = 0, mh_type = 1, pad = "\000\000\000\000\000"}, M_dat = {MH = { MH_pkthdr = {rcvif = 
0xb1dee9e530000000, header = 0xf10fc01307aab916, len = -337628730, flowid = 2682375970, csum_flags = -966380398, csum_data = -1624117065, tso_segsz = 11596, PH_vt = {vt_vtag = 31606, vt_nrecs = 31606}, tags = {slh_first = 0xa2b0a659a4311f25}}, MH_dat = {MH_ext = { ext_buf = 0x43772562c99aa431 <Address 0x43772562c99aa431 out of bounds>, ext_free = 0x7e1cffd9b6b13fc6, ext_arg1 = 0x731c9ab425536605, ext_arg2 = 0xebc6cac44b21a941, ext_size = 520953289, ref_cnt = 0x5165381046dcad94, ext_type = 1308134978}, MH_databuf = "1�\232�b%wC�?����\034~\005fS%�\232\034sA�!K�����\035\r\037Iܡq\224��F\0208eQB\216�M�P�/\000\026OS^Lq%�MY\212\200\030\b\004\021\000\000\000\001\001\b\n2�� \v��O\000\000\000 ��n�ٻ�Er\032S\201\220\220��I�\"\210\233\v\0223?=�*a|\231\001\022�6}�G�\026�\036z\n\023�<���B8�\200\000\000\000\000\000\000\002%\220���B8\001\003Ip\000\000\000"}}, M_databuf = "\000\000\0000��ޱ\026��\a\023�\017��1��\"��\237\2224fƷ�1\237L-v{X�\235\214%\0371�Y���1�\232�b%wC�?����\034~\005fS%�\232\034sA�!K�����\035\r\037Iܡq\224��F\0208eQB\216�M�P�/\000\026OS^Lq%�MY\212\200\030\b\004\021\000\000\000\001\001\b\n2�� \v��O\000\000\000 ��n�ٻ�Er\032S\201\220\220��I�\"\210\233\v\0223?=�*a|\231\001\022�6}�G�\026�\036z\n\023�<���B8�\200\000\000\000\000\000\000"...}} (kgdb) f 6 #6 0xffffffff8097c1f6 in sosend_generic (so=0xfffffe03e62b5aa0, addr=0x0, uio=0xffffff90d4fca990, top=0xfffffe01d8fd2200, control=0x0, flags=<value optimized out>, td=0xfffffe0021e90000) at /usr/src/sys/kern/uipc_socket.c:1376 1376 error = (*so->so_proto->pr_usrreqs->pru_send)(so, (kgdb) p *so $3 = {so_count = 1, so_type = 1, so_options = 12, so_linger = 0, so_state = 258, so_qstate = 0, so_pcb = 0xfffffe03e678a640, so_vnet = 0x0, so_proto = 0xffffffff8143c3f0, so_head = 0x0, so_incomp = {tqh_first = 0x0, tqh_last = 0x0}, so_comp = {tqh_first = 0x0, tqh_last = 0x0}, so_list = {tqe_next = 0x0, tqe_prev = 0xfffffe01d8f96040}, so_qlen = 0, so_incqlen = 0, so_qlimit = 0, so_timeo = 0, so_error = 0, so_sigio = 0x0, so_oobmark = 0, 
so_aiojobq = { tqh_first = 0x0, tqh_last = 0xfffffe03e62b5b20}, so_rcv = {sb_sel = {si_tdlist = {tqh_first = 0x0, tqh_last = 0xfffffe03e62b5b30}, si_note = {kl_list = { slh_first = 0x0}, kl_lock = 0xffffffff808cd0c0 <knlist_mtx_lock>, kl_unlock = 0xffffffff808cd090 <knlist_mtx_unlock>, kl_assert_locked = 0xffffffff808c9a10 <knlist_mtx_assert_locked>, kl_assert_unlocked = 0xffffffff808c9a20 <knlist_mtx_assert_unlocked>, kl_lockarg = 0xfffffe03e62b5b78}, si_mtx = 0xffffff800e02f670}, sb_mtx = {lock_object = {lo_name = 0xffffffff80f3e7f6 "so_rcv", lo_flags = 16973824, lo_data = 0, lo_witness = 0x0}, mtx_lock = 4}, sb_sx = {lock_object = {lo_name = 0xffffffff80f3ed75 "so_rcv_sx", lo_flags = 36896768, lo_data = 0, lo_witness = 0x0}, sx_lock = 1}, sb_state = 0, sb_mb = 0x0, sb_mbtail = 0x0, sb_lastrecord = 0x0, sb_sndptr = 0x0, sb_sndptroff = 0, sb_cc = 0, sb_hiwat = 131376, sb_mbcnt = 0, sb_mcnt = 0, sb_ccnt = 0, sb_mbmax = 1051008, sb_ctl = 0, sb_lowat = 1, sb_timeo = 0, sb_flags = 2056, sb_upcall = 0, sb_upcallarg = 0x0}, so_snd = {sb_sel = {si_tdlist = {tqh_first = 0x0, tqh_last = 0x0}, si_note = {kl_list = {slh_first = 0x0}, kl_lock = 0xffffffff808cd0c0 <knlist_mtx_lock>, kl_unlock = 0xffffffff808cd090 <knlist_mtx_unlock>, kl_assert_locked = 0xffffffff808c9a10 <knlist_mtx_assert_locked>, kl_assert_unlocked = 0xffffffff808c9a20 <knlist_mtx_assert_unlocked>, kl_lockarg = 0xfffffe03e62b5c68}, si_mtx = 0x0}, sb_mtx = {lock_object = {lo_name = 0xffffffff80f3e7fd "so_snd", lo_flags = 16973824, lo_data = 0, lo_witness = 0x0}, mtx_lock = 18446741875255214080}, sb_sx = {lock_object = {lo_name = 0xffffffff80f3ed6b "so_snd_sx", lo_flags = 36896768, lo_data = 0, lo_witness = 0x0}, sx_lock = 18446741875255214080}, sb_state = 0, sb_mb = 0xfffffe01f4069900, sb_mbtail = 0xfffffe01d8fd3900, sb_lastrecord = 0xfffffe01f4069900, sb_sndptr = 0xfffffe01d8fd3900, sb_sndptroff = 1632, sb_cc = 1716, sb_hiwat = 131376, sb_mbcnt = 4864, sb_mcnt = 11, sb_ccnt = 1, sb_mbmax = 1051008, sb_ctl = 
0, sb_lowat = 2048, sb_timeo = 0, sb_flags = 2048, sb_upcall = 0, sb_upcallarg = 0x0}, so_cred = 0xfffffe01f48ce900, so_label = 0x0, so_peerlabel = 0x0, so_gencnt = 13244, so_emuldata = 0x0, so_accf = 0x0, so_fibnum = 0, so_user_cookie = 0} (kgdb) set $inp=(struct inpcb *)so->so_pcb (kgdb) p *$inp $4 = {inp_hash = {le_next = 0x0, le_prev = 0xfffffe0012f573b0}, inp_pcbgrouphash = {le_next = 0x0, le_prev = 0x0}, inp_list = {le_next = 0xfffffe03e679bc80, le_prev = 0xfffffe03e6743020}, inp_ppcb = 0xfffffe03e675a3d0, inp_pcbinfo = 0xffffffff81531060, inp_pcbgroup = 0x0, inp_pcbgroup_wild = {le_next = 0x0, le_prev = 0x0}, inp_socket = 0xfffffe03e62b5aa0, inp_cred = 0xfffffe01f48ce900, inp_flow = 3457486592, inp_flags = 545300480, inp_flags2 = 0, inp_vflag = 6 '\006', inp_ip_ttl = 64 '@', inp_ip_p = 0 '\0', inp_ip_minttl = 0 '\0', inp_flowid = 1779132015, inp_refcount = 1, inp_pspare = {0x0, 0x0, 0x0, 0x0, 0x0}, inp_ispare = {0, 0, 0, 0, 0, 0}, inp_inc = {inc_flags = 1 '\001', inc_len = 0 '\0', inc_fibnum = 0, inc_ie = {ie_fport = 21327, ie_lport = 5632, ie_dependfaddr = {ie46_foreign = { ia46_pad32 = {3087401514, 17039360, 4283245058}, ia46_addr4 = {s_addr = 801984766}}, ie6_foreign = {__u6_addr = { __u6_addr8 = "*\002\006\270\000\000\004\001\002\"M\377\376P\315/", __u6_addr16 = {554, 47110, 0, 260, 8706, 65357, 20734, 12237}, __u6_addr32 = {3087401514, 17039360, 4283245058, 801984766}}}}, ie_dependladdr = {ie46_local = {ia46_pad32 = {3087401514, 917504, 0}, ia46_addr4 = {s_addr = 1375797248}}, ie6_local = { __u6_addr = {__u6_addr8 = "*\002\006\270\000\000\016\000\000\000\000\000\000\000\001R", __u6_addr16 = {554, 47110, 0, 14, 0, 0, 0, 20993}, __u6_addr32 = { 3087401514, 917504, 0, 1375797248}}}}, ie6_zoneid = 0}}, inp_label = 0x0, inp_sp = 0x0, inp_depend4 = {inp4_ip_tos = 0 '\0', inp4_options = 0x0, inp4_moptions = 0x0}, inp_depend6 = {inp6_options = 0x0, inp6_outputopts = 0xfffffe0013424500, inp6_moptions = 0x0, inp6_icmp6filt = 0x0, inp6_cksum = 0, inp6_hops = -1}, inp_portlist = 
{le_next = 0xfffffe03e6d8f640, le_prev = 0xfffffe03e6743140}, inp_phd = 0xfffffe03e6dfa540, inp_gencnt = 1509, inp_lle = 0x0, inp_rt = 0x0, inp_lock = {lock_object = {lo_name = 0xffffffff80f59235 "tcpinp", lo_flags = 90898432, lo_data = 0, lo_witness = 0x0}, rw_lock = 18446741875255214080}}
We saw this panic on stable10 from Jan. Dump header from device /dev/da0s1b Architecture: amd64 Architecture Version: 2 Dump Length: 1050050560B (1001 MB) Blocksize: 512 Dumptime: Wed Mar 11 23:12:59 2015 Hostname: xxxxxxxxxxxxxxxxxx Magic: FreeBSD Kernel Dump Version String: FreeBSD 10.1-STABLE-llnw10 #0: Fri Jan 16 00:19:27 MST 2015 xxxx@xxxxxxxxxxx:/usr/obj/usr/src/sys/SIXFOUR Panic String: sbsndptr: sockbuf 0xfffff802d3e79440 and mbuf 0xfffff80238089b00 clashing Dump Parity: 747892521 Bounds: 0 Dump Status: good (kgdb) #0 doadump (textdump=1) at pcpu.h:219 #1 0xffffffff8072d397 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:452 #2 0xffffffff8072d774 in panic (fmt=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:759 #3 0xffffffff807a38a0 in sbsndptr (sb=<value optimized out>, off=<value optimized out>, len=<value optimized out>, moff=<value optimized out>) at /usr/src/sys/kern/uipc_sockbuf.c:1011 #4 0xffffffff80895f1f in tcp_output (tp=0xfffff802d5315400) at /usr/src/sys/netinet/tcp_output.c:1092 #5 0xffffffff80891685 in tcp_do_segment (m=0xfffff80128e51d00, th=0xfffff8020ca95022, so=0xfffff802d3e792b8, tp=0xfffff802d5315400, drop_hdrlen=<value optimized out>, tlen=0, iptos=<value optimized out>, ti_locked=-1) at /usr/src/sys/netinet/tcp_input.c:2729 #6 0xffffffff8088f54d in tcp_input (m=<value optimized out>, off0=<value optimized out>) at /usr/src/sys/netinet/tcp_input.c:1388 #7 0xffffffff808216b7 in ip_input (m=0xfffff80128e51d00) at /usr/src/sys/netinet/ip_input.c:734 #8 0xffffffff807fbc92 in netisr_dispatch_src (proto=<value optimized out>, source=<value optimized out>, m=0x0) at /usr/src/sys/net/netisr.c:972 #9 0xffffffff807f4656 in ether_demux (ifp=<value optimized out>, m=0xfffff80128e51d00) at /usr/src/sys/net/if_ethersubr.c:851 #10 0xffffffff807f52e9 in ether_nh_input (m=<value optimized out>) at /usr/src/sys/net/if_ethersubr.c:646 #11 0xffffffff807fbc92 in netisr_dispatch_src (proto=<value optimized out>, source=<value 
optimized out>, m=0x0) at /usr/src/sys/net/netisr.c:972 #12 0xffffffff8042332b in em_rxeof (count=94) at /usr/src/sys/dev/e1000/if_em.c:4532 #13 0xffffffff80423703 in em_msix_rx (arg=0xfffff80003ce6600) at /usr/src/sys/dev/e1000/if_em.c:1600 #14 0xffffffff806fe0fb in intr_event_execute_handlers ( p=<value optimized out>, ie=0xfffff80003d04200) at /usr/src/sys/kern/kern_intr.c:1264 #15 0xffffffff806fea96 in ithread_loop (arg=0xfffff80003cfb3c0) at /usr/src/sys/kern/kern_intr.c:1277 #16 0xffffffff806fbd1a in fork_exit ( callout=0xffffffff806fea00 <ithread_loop>, arg=0xfffff80003cfb3c0, frame=0xfffffe03438d7c00) at /usr/src/sys/kern/kern_fork.c:1017 #17 0xffffffff80acdf5e in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:611 #18 0x0000000000000000 in ?? () Current language: auto; currently minimal (kgdb) Unfortunately I do not have a crashdump to investigate further.
I see this problem also with the port of the FreeBSD 9-stable (2015-04-09) network stack to the RTEMS real-time operating system (Altera Cyclone V target). It occurs quite frequently with multiple concurrent IPv4 TCP transfers to /dev/null and from /dev/zero. The stack trace is: 000|panic(fmt = 0x00636E14) 001|sbsndptr(?, ?, ?, ?) 002|tcp_output(?) 003|tcp_do_segment(?, ?, so = 0x3FF3D7C0, tp = 0x3FF2F828, drop_hdrlen = 52, tlen = 0, iptos = 0, ?) 004|tcp_input(m = 0x3FF21700, ?) 005|ip_input(m = 0x3FF21700) 006|netisr_dispatch_src(?, ?, ?) 007|ether_demux(ifp = 0x006DFEE0, m = 0x3FF21700) 008|ether_nh_input(?) 009|netisr_dispatch_src(?, ?, ?) 010|dwc_rxfinish_locked(inline) 010|dwc_intr(arg = 0x006DEA20) 011|bsp_interrupt_server_task(?) 012|Thread_Handler() ---|end of frame
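For readers who have not looked at the check behind the "sockbuf ... and mbuf ... clashing" message: as far as I understand it, sbsndptr() walks the send buffer's mbuf chain to find the mbuf that holds a given byte offset, and panics when the chain ends before the offset is reached even though the cached byte count says the data should be there. The following is only a simplified user-space model of that idea, not the kernel code; the names toy_mbuf, toy_sockbuf and toy_locate are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct mbuf and struct sockbuf. */
struct toy_mbuf {
    struct toy_mbuf *next;  /* next buffer in the chain */
    unsigned len;           /* bytes of data in this buffer */
};

struct toy_sockbuf {
    struct toy_mbuf *head;  /* first buffer in the chain */
    unsigned cc;            /* cached byte count, in the spirit of sb_cc */
};

enum toy_status {
    TOY_OK,     /* offset found in the chain */
    TOY_RANGE,  /* caller asked for a byte beyond the cached count */
    TOY_CLASH   /* cached count promises data the chain does not hold */
};

/*
 * Locate the buffer holding byte 'off'. On success *mp is the buffer
 * and *moff the offset inside it. TOY_CLASH models the situation in
 * which the real kernel would panic.
 */
static enum toy_status
toy_locate(const struct toy_sockbuf *sb, unsigned off,
    const struct toy_mbuf **mp, unsigned *moff)
{
    const struct toy_mbuf *m;

    assert(sb != NULL && mp != NULL && moff != NULL);
    if (off >= sb->cc)
        return (TOY_RANGE);
    for (m = sb->head; m != NULL; m = m->next) {
        if (off < m->len) {
            *mp = m;
            *moff = off;
            return (TOY_OK);
        }
        off -= m->len;
    }
    return (TOY_CLASH);
}
```

The point of the model is that the lookup itself is innocent: it only notices that the chain and the cached count disagree, so whatever shortened or corrupted the chain happened earlier.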
sbdrop() performs invariant checks as it tears down a socket buffer on final close -- originally intended to validate a set of values cached by the socket buffer that could (in the presence of a socket-buffer bug) get out of sync with the chain stored there. However, these checks have proven something of a 'canary' for many possible underlying bugs involving mbuf chains and socket buffers. I've seen the panics most frequently in the presence of device-driver concurrency bugs -- e.g., in which a driver makes changes to the mbuf chain after handing the mbuf off to the network stack via netisr, for example, or involving improper freeing of an mbuf by other code while it remains referenced by a socket buffer. Others have spotted them in the presence of other classes of network-stack race conditions -- most involving a failure to have a single thread or object own an mbuf. As such, seeing this panic is a symptom of many possible underlying problems and hence not a specific 'bug' per se. However, as a useful rule of thumb: when I spot this panic, I look first at the device driver to make sure that there is no possible use of mbuf after it is passed as an argument to netisr.
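To make the "cached values versus chain" idea concrete, here is a minimal user-space sketch of such a consistency check. It is an illustration under my own assumptions, not the kernel's implementation: chk_mbuf, chk_sockbuf and chk_consistent are hypothetical names, and only a byte count and a buffer count are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced versions of the kernel structures. */
struct chk_mbuf {
    struct chk_mbuf *next;  /* next buffer in the chain */
    unsigned len;           /* bytes of data in this buffer */
};

struct chk_sockbuf {
    struct chk_mbuf *head;  /* first buffer in the chain */
    unsigned cc;            /* cached total byte count */
    unsigned mcnt;          /* cached number of buffers */
};

/*
 * Recompute the totals from the chain and compare them with the cached
 * values. A mismatch means something changed the chain behind the
 * socket buffer's back.
 */
static bool
chk_consistent(const struct chk_sockbuf *sb)
{
    const struct chk_mbuf *m;
    unsigned bytes = 0, bufs = 0;

    assert(sb != NULL);
    for (m = sb->head; m != NULL; m = m->next) {
        bytes += m->len;
        bufs++;
    }
    return (bytes == sb->cc && bufs == sb->mcnt);
}
```

In this model, a driver that rewrites a buffer's length after the hand-off makes chk_consistent() return false the next time it runs, which may be long after the offending write.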
(In reply to Robert Watson from comment #17) Thanks for the warning. The driver looks all right. I updated yesterday from FreeBSD 9.3 to 9-stable 2015-04-09 because I previously had other problems, for example a NULL pointer dereference in tcp_reass() (I guess the temporary stack element remained on the list and was overwritten in a second call) and a corruption of a UMA keg (mbuf_packet zone).
(In reply to sebastian.huber from comment #18) I found one problem with the driver. In the RTEMS port of the network stack I don't use the BUS_DMA(9) support and instead directly use cache invalidate/flush routines (the Altera Cyclone V has no automatic cache coherency between the Ethernet module and the processors). In the receive path it was done like this:

  invalidate buffer (mcluster)
  register buffer in receive DMA descriptor
  ...
  DMA done
  hand over buffer to interface input

It seems that, due to cache line prefetching, cache lines of the buffer are sometimes loaded into the cache after the invalidate but before the receive DMA has completed its transfers. I changed the sequence like this:

  invalidate buffer (mcluster)
  register buffer in receive DMA descriptor
  ...
  DMA done
  invalidate buffer
  hand over buffer to interface input

Now it works very stably and I no longer observe any mbuf or socket buffer corruption. So as an offhand guess: if the network stack is presented with partially invalid data (most likely previously received IP and TCP headers mixed with new data), this could lead to a crash in the TCP input processing.
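The effect of the extra invalidate can be shown with a small user-space simulation. This is only a toy model under stated assumptions: one buffer, one "cache" that is either wholly valid or wholly invalid, and a prefetch that always lands between descriptor setup and DMA completion. None of the rx_* names come from the RTEMS or FreeBSD sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define RX_BUF_SIZE 64

/* Toy model of a non-coherent system: RAM plus one cached copy. */
struct rx_model {
    unsigned char ram[RX_BUF_SIZE];   /* what the DMA engine writes */
    unsigned char line[RX_BUF_SIZE];  /* CPU cache copy of the buffer */
    bool valid;                       /* is the cached copy present? */
};

static void
rx_invalidate(struct rx_model *c)
{
    c->valid = false;
}

/* Any CPU access, including a prefetch, fills the cache from RAM. */
static void
rx_cpu_touch(struct rx_model *c)
{
    if (!c->valid) {
        memcpy(c->line, c->ram, RX_BUF_SIZE);
        c->valid = true;
    }
}

/* DMA writes reach RAM only; the cache is not updated. */
static void
rx_dma_write(struct rx_model *c, unsigned char fill)
{
    memset(c->ram, fill, RX_BUF_SIZE);
}

static unsigned char
rx_cpu_read(struct rx_model *c, size_t i)
{
    assert(i < RX_BUF_SIZE);
    rx_cpu_touch(c);
    return (c->line[i]);
}

/* Receive one frame; second_invalidate selects the changed sequence. */
static unsigned char
rx_receive(struct rx_model *c, unsigned char fill, bool second_invalidate)
{
    rx_invalidate(c);           /* invalidate buffer (mcluster) */
    /* register buffer in receive DMA descriptor would happen here */
    rx_cpu_touch(c);            /* assumed prefetch before DMA is done */
    rx_dma_write(c, fill);      /* DMA done */
    if (second_invalidate)
        rx_invalidate(c);       /* invalidate buffer again */
    return (rx_cpu_read(c, 0)); /* what interface input would see */
}
```

With second_invalidate set to false the function returns the old buffer contents, which corresponds to the stale header data described above; with true it returns the freshly written byte.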
Hi, This seems related? A GENERIC kernel binary installed using freebsd-update, so I have no WITNESS or KGDB, only standard vanilla kernel. I have core dumps (2) and a complete core.txt if anyone is interested. # ifconfig -l oce0 oce1 lo0 # freebsd-version -k 10.1-RELEASE-p19 [root@kurs-ap-01 /home/girgen]# head -n 300 /var/crash/core.txt.1 kurs-ap-01 dumped core - see /var/crash/vmcore.1 Thu Aug 27 15:04:09 CEST 2015 FreeBSD kurs-ap-01 10.1-RELEASE-p10 FreeBSD 10.1-RELEASE-p10 #0: Wed May 13 06:54:13 UTC 2015 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64 panic: sbsndptr: sockbuf 0xfffff80312126c68 and mbuf 0xfffff800b4a36800 clashing GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "amd64-marcel-freebsd"... Unread portion of the kernel message buffer: panic: sbsndptr: sockbuf 0xfffff80312126c68 and mbuf 0xfffff800b4a36800 clashing cpuid = 1 KDB: stack backtrace: #0 0xffffffff80963000 at kdb_backtrace+0x60 #1 0xffffffff80928125 at panic+0x155 #2 0xffffffff8099c180 at sbdroprecord_locked+0 #3 0xffffffff80ac8c9c at tcp_output+0xdbc #4 0xffffffff80ac6a95 at tcp_do_segment+0x3045 #5 0xffffffff80ac2e04 at tcp_input+0xd04 #6 0xffffffff80a54fc7 at ip_input+0x97 #7 0xffffffff809f4f73 at swi_net+0x143 #8 0xffffffff808faf4b at intr_event_execute_handlers+0xab #9 0xffffffff808fb396 at ithread_loop+0x96 #10 0xffffffff808f8b6a at fork_exit+0x9a #11 0xffffffff80d0b67e at fork_trampoline+0xe Uptime: 21d0h54m53s Dumping 2005 out of 32709 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% Reading symbols from /boot/kernel/accf_data.ko.symbols...done. Loaded symbols for /boot/kernel/accf_data.ko.symbols Reading symbols from /boot/kernel/accf_http.ko.symbols...done. 
Loaded symbols for /boot/kernel/accf_http.ko.symbols Reading symbols from /boot/kernel/oce.ko.symbols...done. Loaded symbols for /boot/kernel/oce.ko.symbols Reading symbols from /boot/kernel/nullfs.ko.symbols...done. Loaded symbols for /boot/kernel/nullfs.ko.symbols Reading symbols from /boot/kernel/linprocfs.ko.symbols...done. Loaded symbols for /boot/kernel/linprocfs.ko.symbols Reading symbols from /boot/kernel/linux.ko.symbols...done. Loaded symbols for /boot/kernel/linux.ko.symbols Reading symbols from /boot/kernel/zfs.ko.symbols...done. Loaded symbols for /boot/kernel/zfs.ko.symbols Reading symbols from /boot/kernel/opensolaris.ko.symbols...done. Loaded symbols for /boot/kernel/opensolaris.ko.symbols #0 doadump (textdump=<value optimized out>) at pcpu.h:219 219 pcpu.h: No such file or directory. in pcpu.h (kgdb) #0 doadump (textdump=<value optimized out>) at pcpu.h:219 #1 0xffffffff80927da2 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:452 #2 0xffffffff80928164 in panic (fmt=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:759 #3 0xffffffff8099c180 in sbsndptr (sb=<value optimized out>, off=<value optimized out>, len=<value optimized out>, moff=<value optimized out>) at /usr/src/sys/kern/uipc_sockbuf.c:1011 #4 0xffffffff80ac8c9c in tcp_output (tp=0xfffff80312ef5800) at /usr/src/sys/netinet/tcp_output.c:870 #5 0xffffffff80ac6a95 in tcp_do_segment (m=<value optimized out>, th=<value optimized out>, so=<value optimized out>, tp=<value optimized out>, drop_hdrlen=<value optimized out>, tlen=0, iptos=<value optimized out>, ti_locked=Cannot access memory at address 0x1 ) at /usr/src/sys/netinet/tcp_input.c:3018 #6 0xffffffff80ac2e04 in tcp_input (m=<value optimized out>, off0=<value optimized out>) at /usr/src/sys/netinet/tcp_input.c:1377 #7 0xffffffff80a54fc7 in ip_input (m=0xfffff800b4516600) at /usr/src/sys/netinet/ip_input.c:734 #8 0xffffffff809f4f73 in swi_net (arg=0xffffffff81988880) at /usr/src/sys/net/netisr.c:765 #9 
0xffffffff808faf4b in intr_event_execute_handlers ( p=<value optimized out>, ie=0xfffff800093ac600) at /usr/src/sys/kern/kern_intr.c:1263 #10 0xffffffff808fb396 in ithread_loop (arg=0xfffff80009388e40) at /usr/src/sys/kern/kern_intr.c:1276 #11 0xffffffff808f8b6a in fork_exit ( callout=0xffffffff808fb300 <ithread_loop>, arg=0xfffff80009388e40, frame=0xfffffe083c3e3ac0) at /usr/src/sys/kern/kern_fork.c:996 #12 0xffffffff80d0b67e in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:606 #13 0x0000000000000000 in ?? () Current language: auto; currently minimal (kgdb)
This one persists on 10.2-STABLE.
Applying the patch at "https://reviews.freebsd.org/D5330" on 10.2-STABLE with ipfw+nat results in this bug when I attempt to lower the mtu on the WAN-interface from 1500 with "route change default -mtu 1196". The mtu of the LAN-interface is set at 1500.
This bug still persists in the latest stable/10. We are seeing it mainly with em(4). Any hints on how to debug this further would be great.
D5330 is not related as far as I can tell.
I just hit this panic on 11-CURRENT r298999. The box wasn't particularly busy, but had been up a while. I am using an em NIC. Is there any further info I can provide? Should I test the patch?
Update mfc flags and summary to reflect latest information. @Hiren, is this likely to affect 11.0-R ? If so, please cc re@ so they're aware and can track
For what it's worth, the original problem I reported in this PR from 6 years ago is (for me anyway) long-ago solved... but then we're on 10.3 these days and are slowly rolling 11-beta out. Haven't seen a panic in quite a while.
Got this right now on an update from 10.2-STABLE to 11.0-PRERELEASE. Persistent in 11.0-RC3. Repeatable in like 5-12 minutes. 25 minutes is an absolute record. panic: sbsndptr: sockbuf 0xfffff8003eea31b8 and mbuf 0xfffff80020a6e700 clashing GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "amd64-marcel-freebsd"... Unread portion of the kernel message buffer: panic: sbsndptr: sockbuf 0xfffff8003eea31b8 and mbuf 0xfffff80020a6e700 clashing cpuid = 1 KDB: stack backtrace: #0 0xffffffff80b1d0c7 at kdb_backtrace+0x67 #1 0xffffffff80ad1f62 at vpanic+0x182 #2 0xffffffff80ad1dd3 at panic+0x43 #3 0xffffffff80b6a15a at sbsndptr+0xda #4 0xffffffff80cfcbb4 at tcp_output+0xf34 #5 0xffffffff80cf9a81 at tcp_do_segment+0x2ce1 #6 0xffffffff80cf60cc at tcp_input+0xd1c #7 0xffffffff80c66dbf at ip_input+0x15f #8 0xffffffff80bfc295 at netisr_dispatch_src+0xa5 #9 0xffffffff80be4cea at ether_demux+0x12a #10 0xffffffff80be5942 at ether_nh_input+0x322 #11 0xffffffff80bfc295 at netisr_dispatch_src+0xa5 #12 0xffffffff80be4f66 at ether_input+0x26 #13 0xffffffff80bed9db at vlan_input+0x1cb #14 0xffffffff80be4c55 at ether_demux+0x95 #15 0xffffffff80be5942 at ether_nh_input+0x322 #16 0xffffffff80bfc295 at netisr_dispatch_src+0xa5 #17 0xffffffff80be4f66 at ether_input+0x26
Just a quick comment in light of recent notes on this PR: the panic being seen is as a result of a kernel self-check that occurs on socket close, and likely reports on a bug that triggered some substantial time earlier (milliseconds, seconds, minutes, hours, days, or even weeks earlier), and reports on a class of problems rather than detecting a specific bug. It's entirely likely that the problem reported more recently is not the same bug as those reported previously with the same panic message -- rather, a similar bug with the same kernel self-check detecting it. In the past, this self-check has most frequently fired as a result of either bugs in the socket-buffer code (although I think none recently), or device-driver bugs involving modifications to the mbuf chain after submitting the mbuf to the network stack (e.g., due to concurrency bugs in the device driver). It can also occur in use-after-free scenarios, as a result of protocol bugs, etc. On the whole, my intuition is towards a device-driver bug based on past experience. Could you paste in the output of "dmesg" and "ifconfig -a" from the host to give a bit more information on its configuration?
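As a reminder of what "no modifications after submitting the mbuf to the network stack" looks like in a receive path, here is a hedged user-space sketch of the ownership hand-off. It is not taken from any real driver: own_buf, own_ring, own_rx_complete and own_stack_input are hypothetical names, and calloc() stands in for mbuf allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical receive buffer and a one-slot receive ring. */
struct own_buf {
    unsigned len;
};

struct own_ring {
    struct own_buf *slot;   /* buffer currently owned by the driver */
};

/* Hypothetical stack entry point; the callee owns 'b' from here on. */
typedef void (*own_input_fn)(struct own_buf *b);

/* Stand-in consumer that just remembers what it was given. */
static struct own_buf *own_last;

static void
own_stack_input(struct own_buf *b)
{
    own_last = b;
}

/*
 * Complete reception of one frame. The ring slot is refilled before
 * the old buffer is passed upward, and the driver never touches the
 * old buffer after the call to input().
 */
static int
own_rx_complete(struct own_ring *r, unsigned len, own_input_fn input)
{
    struct own_buf *fresh, *done;

    assert(r != NULL && r->slot != NULL && input != NULL);
    fresh = calloc(1, sizeof(*fresh));
    if (fresh == NULL)
        return (-1);        /* keep the old buffer, drop the frame */
    done = r->slot;
    done->len = len;        /* last write by the driver */
    r->slot = fresh;        /* ring no longer references 'done' */
    input(done);            /* ownership moves to the stack here */
    return (0);
}
```

When auditing a driver for this class of bug, the thing to look for is any read or write of the handed-off buffer, or any second path that can still reach it (for example a ring slot that was not replaced), after the equivalent of the input() call.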
Follow-up: RC3 was installed incorrectly (i.e. not installed at all). After a proper downgrade to RC3 (r305786) the server seems at least more stable: it has been running for more than an hour. On 11.0-PRE (306739) panics were happening within 3 to 5 minutes. I have a handful of cores, in case someone needs them. As for the driver: this was an HP DL160 g6, I guess, and the driver was igb(4). Now it's the Supermicro board (the tech team moved the drives to a new chassis to rule out possible hardware problems). The ifconfig and dmesg.boot output follows (the driver is still igb(4)); the dmesg is from 11-RC3: igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500 options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6> ether 00:25:90:06:b7:9e nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL> media: Ethernet autoselect (1000baseT <full-duplex>) status: active igb1: flags=8c02<BROADCAST,OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1500 options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6> ether 00:25:90:06:b7:9f nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL> media: Ethernet autoselect status: no carrier lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384 options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6> inet6 ::1 prefixlen 128 inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3 inet 127.0.0.1 netmask 0xff000000 nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL> groups: lo vlan1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500 options=303<RXCSUM,TXCSUM,TSO4,TSO6> ether 00:25:90:06:b7:9e inet 192.168.0.248 netmask 0xffffff00 broadcast 192.168.0.255 nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL> media: Ethernet autoselect (1000baseT <full-duplex>) status: active vlan: 1 vlanpcp: 0 parent interface: igb0 groups: vlan vlan2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500 
options=303<RXCSUM,TXCSUM,TSO4,TSO6> ether 00:25:90:06:b7:9e inet 91.206.242.1 netmask 0xfffffff0 broadcast 91.206.242.15 inet 91.206.242.5 netmask 0xfffffff0 broadcast 91.206.242.15 inet 91.206.242.8 netmask 0xfffffff0 broadcast 91.206.242.15 nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL> media: Ethernet autoselect (1000baseT <full-duplex>) status: active vlan: 2 vlanpcp: 0 parent interface: igb0 groups: vlan vlan3: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500 options=303<RXCSUM,TXCSUM,TSO4,TSO6> ether 00:25:90:06:b7:9e inet 10.64.0.250 netmask 0xffffff00 broadcast 10.64.0.255 inet 10.64.0.252 netmask 0xffffff00 broadcast 10.64.0.255 nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL> media: Ethernet autoselect (1000baseT <full-duplex>) status: active vlan: 3 vlanpcp: 0 parent interface: igb0 groups: vlan vlan4: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500 options=303<RXCSUM,TXCSUM,TSO4,TSO6> ether 00:25:90:06:b7:9e inet 89.250.210.6 netmask 0xfffffffc broadcast 89.250.210.7 nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL> media: Ethernet autoselect (1000baseT <full-duplex>) status: active vlan: 4 vlanpcp: 0 parent interface: igb0 groups: vlan vlan5: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500 options=303<RXCSUM,TXCSUM,TSO4,TSO6> ether 00:25:90:06:b7:9e inet 77.43.142.201 netmask 0xfffffffc broadcast 77.43.142.203 nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL> media: Ethernet autoselect (1000baseT <full-duplex>) status: active vlan: 5 vlanpcp: 0 parent interface: igb0 groups: vlan vlan6: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500 options=303<RXCSUM,TXCSUM,TSO4,TSO6> ether 00:25:90:06:b7:9e inet 172.20.142.250 netmask 0xffffff00 broadcast 172.20.142.255 nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL> media: Ethernet autoselect (1000baseT <full-duplex>) status: active vlan: 6 vlanpcp: 0 parent interface: igb0 groups: vlan vlan7: 
flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500 options=303<RXCSUM,TXCSUM,TSO4,TSO6> ether 00:25:90:06:b7:9e inet 172.16.240.2 netmask 0xffffff00 broadcast 172.16.240.255 nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL> media: Ethernet autoselect (1000baseT <full-duplex>) status: active vlan: 7 vlanpcp: 0 parent interface: igb0 groups: vlan vlan8: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500 options=303<RXCSUM,TXCSUM,TSO4,TSO6> ether 00:25:90:06:b7:9e inet 86.109.196.74 netmask 0xfffffff8 broadcast 86.109.196.79 nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL> media: Ethernet autoselect (1000baseT <full-duplex>) status: active vlan: 8 vlanpcp: 0 parent interface: igb0 groups: vlan vlan9: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500 options=303<RXCSUM,TXCSUM,TSO4,TSO6> ether 00:25:90:06:b7:9e inet 192.168.2.1 netmask 0xffffff00 broadcast 192.168.2.255 nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL> media: Ethernet autoselect (1000baseT <full-duplex>) status: active vlan: 9 vlanpcp: 0 parent interface: igb0 groups: vlan vlan10: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500 options=303<RXCSUM,TXCSUM,TSO4,TSO6> ether 00:25:90:06:b7:9e inet 192.168.3.1 netmask 0xffffff00 broadcast 192.168.3.255 nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL> media: Ethernet autoselect (1000baseT <full-duplex>) status: active vlan: 10 vlanpcp: 0 parent interface: igb0 groups: vlan vlan11: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500 options=303<RXCSUM,TXCSUM,TSO4,TSO6> ether 00:25:90:06:b7:9e inet 188.234.141.201 netmask 0xfffffffc broadcast 188.234.141.203 nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL> media: Ethernet autoselect (1000baseT <full-duplex>) status: active vlan: 11 vlanpcp: 0 parent interface: igb0 groups: vlan vlan12: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500 options=303<RXCSUM,TXCSUM,TSO4,TSO6> ether 
00:25:90:06:b7:9e inet 192.168.50.1 netmask 0xffffff00 broadcast 192.168.50.255 nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL> media: Ethernet autoselect (1000baseT <full-duplex>) status: active vlan: 12 vlanpcp: 0 parent interface: igb0 groups: vlan vlan13: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500 options=303<RXCSUM,TXCSUM,TSO4,TSO6> ether 00:25:90:06:b7:9e inet 192.168.99.10 netmask 0xffffff00 broadcast 192.168.99.255 nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL> media: Ethernet autoselect (1000baseT <full-duplex>) status: active vlan: 13 vlanpcp: 0 parent interface: igb0 groups: vlan Dmesg: Copyright (c) 1992-2016 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD is a registered trademark of The FreeBSD Foundation. FreeBSD 11.0-RC3 #0 r305786: Wed Sep 14 02:19:25 UTC 2016 root@releng2.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64 FreeBSD clang version 3.8.0 (tags/RELEASE_380/final 262564) (based on LLVM 3.8.0) VT(vga): resolution 640x480 CPU: Intel(R) Xeon(R) CPU E5620 @ 2.40GHz (2400.13-MHz K8-class CPU) Origin="GenuineIntel" Id=0x206c2 Family=0x6 Model=0x2c Stepping=2 Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE> Features2=0x9ee3fd<SSE3,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,POPCNT> AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM> AMD Features2=0x1<LAHF> VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID TSC: P-state invariant, performance statistics real memory = 51543801856 (49156 MB) avail memory = 49979412480 (47664 MB) Event timer "LAPIC" quality 600 ACPI APIC Table: <080312 APIC1521> FreeBSD/SMP: Multiprocessor System Detected: 16 CPUs FreeBSD/SMP: 2 package(s) x 4 core(s) x 2 hardware threads random: unblocking device. 
ACPI BIOS Warning (bug): 32/64X length mismatch in FADT/Gpe0Block: 128/64 (20160527/tbfadt-650) ioapic0: Changing APIC ID to 6 ioapic1: Changing APIC ID to 7 ioapic0 <Version 2.0> irqs 0-23 on motherboard ioapic1 <Version 2.0> irqs 24-47 on motherboard random: entropy device external interface kbd1 at kbdmux0 netmap: loaded module module_register_init: MOD_LOAD (vesa, 0xffffffff8101c950, 0) error 19 vtvga0: <VT VGA driver> on motherboard cryptosoft0: <software crypto> on motherboard acpi0: <SMCI > on motherboard acpi0: Power Button (fixed) cpu0: <ACPI CPU> on acpi0 cpu1: <ACPI CPU> on acpi0 cpu2: <ACPI CPU> on acpi0 cpu3: <ACPI CPU> on acpi0 cpu4: <ACPI CPU> on acpi0 cpu5: <ACPI CPU> on acpi0 cpu6: <ACPI CPU> on acpi0 cpu7: <ACPI CPU> on acpi0 cpu8: <ACPI CPU> on acpi0 cpu9: <ACPI CPU> on acpi0 cpu10: <ACPI CPU> on acpi0 cpu11: <ACPI CPU> on acpi0 cpu12: <ACPI CPU> on acpi0 cpu13: <ACPI CPU> on acpi0 cpu14: <ACPI CPU> on acpi0 cpu15: <ACPI CPU> on acpi0 attimer0: <AT timer> port 0x40-0x43 irq 0 on acpi0 Timecounter "i8254" frequency 1193182 Hz quality 0 Event timer "i8254" frequency 1193182 Hz quality 100 atrtc0: <AT realtime clock> port 0x70-0x71 irq 8 on acpi0 Event timer "RTC" frequency 32768 Hz quality 0 hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0 Timecounter "HPET" frequency 14318180 Hz quality 950 Event timer "HPET" frequency 14318180 Hz quality 350 Event timer "HPET1" frequency 14318180 Hz quality 340 Event timer "HPET2" frequency 14318180 Hz quality 340 Event timer "HPET3" frequency 14318180 Hz quality 340 Timecounter "ACPI-safe" frequency 3579545 Hz quality 850 acpi_timer0: <24-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0 pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff numa-domain 0 on acpi0 pcib0: _OSC returned error 0x10 pci0: <ACPI PCI bus> numa-domain 0 on pcib0 pcib1: <ACPI PCI-PCI bridge> at device 1.0 numa-domain 0 on pci0 pci1: <ACPI PCI bus> numa-domain 0 on pcib1 igb0: <Intel(R) PRO/1000 Network Connection, 
Version - 2.5.3-k> port 0xec00-0xec1f mem 0xfbde0000-0xfbdfffff,0xfbdc0000-0xfbddffff,0xfbd9c000-0xfbd9ffff irq 28 at device 0.0 numa-domain 0 on pci1 igb0: Using MSIX interrupts with 9 vectors igb0: Ethernet address: 00:25:90:06:b7:9e igb0: Bound queue 0 to cpu 0 igb0: Bound queue 1 to cpu 1 igb0: Bound queue 2 to cpu 2 igb0: Bound queue 3 to cpu 3 igb0: Bound queue 4 to cpu 4 igb0: Bound queue 5 to cpu 5 igb0: Bound queue 6 to cpu 6 igb0: Bound queue 7 to cpu 7 igb0: netmap queues/slots: TX 8/1024, RX 8/1024 igb1: <Intel(R) PRO/1000 Network Connection, Version - 2.5.3-k> port 0xe880-0xe89f mem 0xfbd60000-0xfbd7ffff,0xfbd40000-0xfbd5ffff,0xfbd1c000-0xfbd1ffff irq 40 at device 0.1 numa-domain 0 on pci1 igb1: Using MSIX interrupts with 9 vectors igb1: Ethernet address: 00:25:90:06:b7:9f igb1: Bound queue 0 to cpu 8 igb1: Bound queue 1 to cpu 9 igb1: Bound queue 2 to cpu 10 igb1: Bound queue 3 to cpu 11 igb1: Bound queue 4 to cpu 12 igb1: Bound queue 5 to cpu 13 igb1: Bound queue 6 to cpu 14 igb1: Bound queue 7 to cpu 15 igb1: netmap queues/slots: TX 8/1024, RX 8/1024 pcib2: <ACPI PCI-PCI bridge> at device 3.0 numa-domain 0 on pci0 pci2: <ACPI PCI bus> numa-domain 0 on pcib2 pcib3: <ACPI PCI-PCI bridge> at device 5.0 numa-domain 0 on pci0 pci3: <ACPI PCI bus> numa-domain 0 on pcib3 pcib4: <ACPI PCI-PCI bridge> at device 7.0 numa-domain 0 on pci0 pci4: <ACPI PCI bus> numa-domain 0 on pcib4 pcib5: <ACPI PCI-PCI bridge> at device 9.0 numa-domain 0 on pci0 pci5: <ACPI PCI bus> numa-domain 0 on pcib5 pci0: <base peripheral, interrupt controller> at device 20.0 (no driver attached) pci0: <base peripheral, interrupt controller> at device 20.1 (no driver attached) pci0: <base peripheral, interrupt controller> at device 20.2 (no driver attached) pci0: <base peripheral, interrupt controller> at device 20.3 (no driver attached) uhci0: <Intel 82801JI (ICH10) USB controller USB-D> port 0xdc00-0xdc1f irq 16 at device 26.0 numa-domain 0 on pci0 uhci0: LegSup = 0x2f00 usbus0 
numa-domain 0 on uhci0 uhci1: <Intel 82801JI (ICH10) USB controller USB-E> port 0xd880-0xd89f irq 21 at device 26.1 numa-domain 0 on pci0 uhci1: LegSup = 0x2f00 usbus1 numa-domain 0 on uhci1 uhci2: <Intel 82801JI (ICH10) USB controller USB-F> port 0xd800-0xd81f irq 19 at device 26.2 numa-domain 0 on pci0 uhci2: LegSup = 0x2f00 usbus2 numa-domain 0 on uhci2 ehci0: <Intel 82801JI (ICH10) USB 2.0 controller USB-B> mem 0xfbeda000-0xfbeda3ff irq 18 at device 26.7 numa-domain 0 on pci0 usbus3: EHCI version 1.0 usbus3 numa-domain 0 on ehci0 uhci3: <Intel 82801JI (ICH10) USB controller USB-A> port 0xd480-0xd49f irq 23 at device 29.0 numa-domain 0 on pci0 uhci3: LegSup = 0x2f00 usbus4 numa-domain 0 on uhci3 uhci4: <Intel 82801JI (ICH10) USB controller USB-B> port 0xd400-0xd41f irq 19 at device 29.1 numa-domain 0 on pci0 uhci4: LegSup = 0x2f00 usbus5 numa-domain 0 on uhci4 uhci5: <Intel 82801JI (ICH10) USB controller USB-C> port 0xd080-0xd09f irq 18 at device 29.2 numa-domain 0 on pci0 uhci5: LegSup = 0x2f00 usbus6 numa-domain 0 on uhci5 ehci1: <Intel 82801JI (ICH10) USB 2.0 controller USB-A> mem 0xfbed8000-0xfbed83ff irq 23 at device 29.7 numa-domain 0 on pci0 usbus7: EHCI version 1.0 usbus7 numa-domain 0 on ehci1 pcib6: <ACPI PCI-PCI bridge> at device 30.0 numa-domain 0 on pci0 pci6: <ACPI PCI bus> numa-domain 0 on pcib6 vgapci0: <VGA-compatible display> mem 0xf9000000-0xf9ffffff,0xfaffc000-0xfaffffff,0xfb000000-0xfb7fffff irq 18 at device 1.0 numa-domain 0 on pci6 vgapci0: Boot video device isab0: <PCI-ISA bridge> at device 31.0 numa-domain 0 on pci0 isa0: <ISA bus> numa-domain 0 on isab0 atapci0: <Intel ICH10 SATA300 controller> port 0xd000-0xd007,0xcc00-0xcc03,0xc880-0xc887,0xc800-0xc803,0xc480-0xc48f,0xc400-0xc40f irq 19 at device 31.2 numa-domain 0 on pci0 ata2: <ATA channel> at channel 0 on atapci0 ata3: <ATA channel> at channel 1 on atapci0 atapci1: <Intel ICH10 SATA300 controller> port 
0xc000-0xc007,0xbc00-0xbc03,0xb880-0xb887,0xb800-0xb803,0xb480-0xb48f,0xb400-0xb40f irq 19 at device 31.5 numa-domain 0 on pci0 ata4: <ATA channel> at channel 0 on atapci1 ata5: <ATA channel> at channel 1 on atapci1 acpi_button0: <Power Button> on acpi0 atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0 atkbd0: <AT Keyboard> irq 1 on atkbdc0 kbd0 at atkbd0 atkbd0: [GIANT-LOCKED] psm0: <PS/2 Mouse> irq 12 on atkbdc0 psm0: [GIANT-LOCKED] psm0: model IntelliMouse Explorer, device ID 4 uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0 uart1: <16550 or compatible> port 0x2f8-0x2ff irq 3 on acpi0 qpi0: <QPI system bus> on motherboard pcib7: <QPI Host-PCI bridge> pcibus 255 on qpi0 pci7: <PCI bus> on pcib7 pcib8: <QPI Host-PCI bridge> pcibus 254 on qpi0 pci8: <PCI bus> on pcib8 orm0: <ISA Option ROMs> at iomem 0xc0000-0xc7fff,0xc8000-0xc8fff on isa0 ppc0: cannot reserve I/O port range est0: <Enhanced SpeedStep Frequency Control> on cpu0 est1: <Enhanced SpeedStep Frequency Control> on cpu1 est2: <Enhanced SpeedStep Frequency Control> on cpu2 est3: <Enhanced SpeedStep Frequency Control> on cpu3 est4: <Enhanced SpeedStep Frequency Control> on cpu4 est5: <Enhanced SpeedStep Frequency Control> on cpu5 est6: <Enhanced SpeedStep Frequency Control> on cpu6 est7: <Enhanced SpeedStep Frequency Control> on cpu7 est8: <Enhanced SpeedStep Frequency Control> on cpu8 est9: <Enhanced SpeedStep Frequency Control> on cpu9 est10: <Enhanced SpeedStep Frequency Control> on cpu10 est11: <Enhanced SpeedStep Frequency Control> on cpu11 est12: <Enhanced SpeedStep Frequency Control> on cpu12 est13: <Enhanced SpeedStep Frequency Control> on cpu13 est14: <Enhanced SpeedStep Frequency Control> on cpu14 est15: <Enhanced SpeedStep Frequency Control> on cpu15 ZFS filesystem version: 5 ZFS storage pool version: features support (5000) Timecounters tick every 1.000 msec nvme cam probe device init usbus0: 12Mbps Full Speed USB v1.0 usbus1: 12Mbps Full Speed USB v1.0 
usbus2: 12Mbps Full Speed USB v1.0 ugen0.1: <Intel> at usbus0 uhub0: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0 ugen1.1: <Intel> at usbus1 uhub1: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus1 ugen2.1: <Intel> at usbus2 uhub2: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus2 usbus3: 480Mbps High Speed USB v2.0 usbus4: 12Mbps Full Speed USB v1.0 usbus5: 12Mbps Full Speed USB v1.0 ugen3.1: <Intel> at usbus3 uhub3: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus3 ugen4.1: <Intel> at usbus4 uhub4: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus4 ugen5.1: <Intel> at usbus5 uhub5: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus5 usbus6: 12Mbps Full Speed USB v1.0 usbus7: 480Mbps High Speed USB v2.0 ugen6.1: <Intel> at usbus6 uhub6: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus6 ugen7.1: <Intel> at usbus7 uhub7: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus7 uhub2: 2 ports with 2 removable, self powered uhub1: 2 ports with 2 removable, self powered uhub0: 2 ports with 2 removable, self powered uhub6: 2 ports with 2 removable, self powered uhub4: 2 ports with 2 removable, self powered uhub5: 2 ports with 2 removable, self powered ada0 at ata2 bus 0 scbus0 target 0 lun 0 ada0: <GB0500EAFYL HPG1> ATA-7 SATA 2.x device ada0: Serial Number WCASY6743897 ada0: 300.000MB/s transfers (SATA 2.x, UDMA5, PIO 8192bytes) ada0: 476940MB (976773168 512 byte sectors) ada1 at ata2 bus 0 scbus0 target 1 lun 0 ada1: <ST500DM002-1BD142 KC48> ATA8-ACS SATA 3.x device ada1: Serial Number Z6EMAENR ada1: 300.000MB/s transfers (SATA 2.x, UDMA5, PIO 8192bytes) ada1: 476940MB (976773168 512 byte sectors) ada1: quirks=0x1<4K> ada2 at ata3 bus 0 scbus1 target 0 lun 0 ada2: <GB0500EAFYL HPG1> ATA-7 SATA 2.x device ada2: Serial Number WCASY6752687 ada2: 300.000MB/s transfers (SATA 2.x, UDMA5, PIO 8192bytes) ada2: 476940MB (976773168 512 byte sectors) ada3 at 
ata3 bus 0 scbus1 target 1 lun 0 ada3: <ST500DM002-1BD142 KC48> ATA8-ACS SATA 3.x device ada3: Serial Number Z6EM8QHK ada3: 300.000MB/s transfers (SATA 2.x, UDMA5, PIO 8192bytes) ada3: 476940MB (976773168 512 byte sectors) ada3: quirks=0x1<4K> SMP: AP CPU #1 Launched! SMP: AP CPU #15 Launched! SMP: AP CPU #4 Launched! SMP: AP CPU #10 Launched! SMP: AP CPU #6 Launched! SMP: AP CPU #11 Launched! SMP: AP CPU #2 Launched! SMP: AP CPU #8 Launched! SMP: AP CPU #3 Launched! SMP: AP CPU #12 Launched! SMP: AP CPU #7 Launched! SMP: AP CPU #13 Launched! SMP: AP CPU #5 Launched! SMP: AP CPU #9 Launched! SMP: AP CPU #14 Launched! Timecounter "TSC-low" frequency 1200065624 Hz quality 1000 Trying to mount root from zfs:zfsroot []... GEOM_MIRROR: Device mirror/swap launched (2/2). Root mount waiting for: usbus7 usbus3 Root mount waiting for: usbus7 usbus3 uhub7: 6 ports with 6 removable, self powered uhub3: 6 ports with 6 removable, self powered igb0: link state changed to UP vlan1: link state changed to UP vlan2: link state changed to UP vlan3: link state changed to UP vlan4: link state changed to UP vlan5: link state changed to UP vlan6: link state changed to UP vlan7: link state changed to UP vlan8: link state changed to UP vlan9: link state changed to UP vlan10: link state changed to UP vlan11: link state changed to UP vlan12: link state changed to UP vlan13: link state changed to UP
(In reply to Robert Watson from comment #29) Robert, Thanks for your response. On a slightly modified (nothing in driver space) stable/11, I am seeing repeated panic in sbsndptr() with igb while box is pretty much idle or doing very low traffic. (kgdb) bt #0 __curthread () at ./machine/pcpu.h:221 #1 doadump (textdump=-2121667464) at /d2/hiren/freebsd/sys/kern/kern_shutdown.c:298 #2 0xffffffff80389f86 in db_fncall_generic (nargs=0, addr=<optimized out>, rv=<optimized out>, args=<optimized out>) at /d2/hiren/freebsd/sys/ddb/db_command.c:568 #3 db_fncall (dummy1=<optimized out>, dummy2=<optimized out>, dummy3=<optimized out>, dummy4=<optimized out>) at /d2/hiren/freebsd/sys/ddb/db_command.c:616 #4 0xffffffff80389a29 in db_command (last_cmdp=<optimized out>, cmd_table=<optimized out>, dopager=<optimized out>) at /d2/hiren/freebsd/sys/ddb/db_command.c:440 #5 0xffffffff80389784 in db_command_loop () at /d2/hiren/freebsd/sys/ddb/db_command.c:493 #6 0xffffffff8038c76b in db_trap (type=<optimized out>, code=<optimized out>) at /d2/hiren/freebsd/sys/ddb/db_main.c:251 #7 0xffffffff809a6f33 in kdb_trap (type=<optimized out>, code=<optimized out>, tf=<optimized out>) at /d2/hiren/freebsd/sys/kern/subr_kdb.c:654 #8 0xffffffff80d93521 in trap_fatal (frame=0xfffffe1f2bb38210, eva=24) at /d2/hiren/freebsd/sys/amd64/amd64/trap.c:836 #9 0xffffffff80d93753 in trap_pfault (frame=0xfffffe1f2bb38210, usermode=0) at /d2/hiren/freebsd/sys/amd64/amd64/trap.c:691 #10 0xffffffff80d92cdc in trap (frame=0xfffffe1f2bb38210) at /d2/hiren/freebsd/sys/amd64/amd64/trap.c:442 #11 <signal handler called> #12 sbsndptr (sb=0xfffff8060f8a5518, off=0, len=4294967287, moff=0xfffffe1f2bb38420) at /d2/hiren/freebsd/sys/kern/uipc_sockbuf.c:1191 #13 0xffffffff80ab9382 in tcp_output (tp=<optimized out>) at /d2/hiren/freebsd/sys/netinet/tcp_output.c:1099 #14 0xffffffff80ab6105 in tcp_do_segment (m=<optimized out>, th=<optimized out>, so=0xfffff8060f8a5360, tp=<optimized out>, drop_hdrlen=60, tlen=<optimized 
out>, iptos=<optimized out>, ti_locked=<error reading variable: Cannot access memory at address 0x1>) at /d2/hiren/freebsd/sys/netinet/tcp_input.c:3182 #15 0xffffffff80ab2803 in tcp_input (mp=<optimized out>, offp=<optimized out>, proto=<optimized out>) at /d2/hiren/freebsd/sys/netinet/tcp_input.c:1444 #16 0xffffffff80aa6bc5 in ip_input (m=<error reading variable: Cannot access memory at address 0x0>) at /d2/hiren/freebsd/sys/netinet/ip_input.c:809 #17 0xffffffff80a82b35 in netisr_dispatch_src (proto=1, source=<optimized out>, m=0x0) at /d2/hiren/freebsd/sys/net/netisr.c:1120 #18 0xffffffff80a6c2ca in ether_demux (ifp=<optimized out>, m=0x0) at /d2/hiren/freebsd/sys/net/if_ethersubr.c:850 #19 0xffffffff80a6cf22 in ether_input_internal (ifp=<optimized out>, m=0x0) at /d2/hiren/freebsd/sys/net/if_ethersubr.c:639 #20 ether_nh_input (m=<optimized out>) at /d2/hiren/freebsd/sys/net/if_ethersubr.c:669 #21 0xffffffff80a82b35 in netisr_dispatch_src (proto=5, source=<optimized out>, m=0x0) at /d2/hiren/freebsd/sys/net/netisr.c:1120 #22 0xffffffff80a6c546 in ether_input (ifp=<optimized out>, m=0x0) at /d2/hiren/freebsd/sys/net/if_ethersubr.c:759 #23 0xffffffff804e2b3c in igb_rx_input (rxr=<optimized out>, ifp=0xfffff80115614800, m=0xfffff8014eee7600, ptype=<optimized out>) at /d2/hiren/freebsd/sys/dev/e1000/if_igb.c:4957 #24 igb_rxeof (que=<optimized out>, count=358700136, done=<optimized out>) at /d2/hiren/freebsd/sys/dev/e1000/if_igb.c:5185 #25 0xffffffff804e1daf in igb_msix_que (arg=<optimized out>) at /d2/hiren/freebsd/sys/dev/e1000/if_igb.c:1612 #26 0xffffffff8091425f in intr_event_execute_handlers (p=<optimized out>, ie=<optimized out>) at /d2/hiren/freebsd/sys/kern/kern_intr.c:1262 #27 0xffffffff80914876 in ithread_execute_handlers (ie=<optimized out>, p=<optimized out>) at /d2/hiren/freebsd/sys/kern/kern_intr.c:1275 #28 ithread_loop (arg=<optimized out>) at /d2/hiren/freebsd/sys/kern/kern_intr.c:1356 #29 0xffffffff80910ea5 in fork_exit (callout=0xffffffff809147b0 
<ithread_loop>, arg=0xfffff8011561a0e0, frame=0xfffffe1f2bb38ac0) at /d2/hiren/freebsd/sys/kern/kern_fork.c:1040 #30 <signal handler called> ---------------------------------------------------------------- Most interesting frames are these 2: #22 0xffffffff80a6c546 in ether_input (ifp=<optimized out>, m=0x0) at /d2/hiren/freebsd/sys/net/if_ethersubr.c:759 #23 0xffffffff804e2b3c in igb_rx_input (rxr=<optimized out>, ifp=0xfffff80115614800, m=0xfffff8014eee7600, ptype=<optimized out>) at /d2/hiren/freebsd/sys/dev/e1000/if_igb.c:4957 #23 has an mbuf while #22 has it null. Does this point to your hunch of "device-driver bugs involving modifications to the mbuf chain after submitting the mbuf to the network stack (e.g., due to concurrency bugs in the device driver)" ? OR something else is going on?
(In reply to Robert Watson from comment #29) > On the whole, my intuition is towards a device-driver bug based > on past experience. We've also been struggling with this in the past weeks, and I can confirm Robert's intuition. In our case, the bug affects two hosts running recent 10-STABLE, connected to each other via igb(4) using a dedicated 100Mb switch. When trying to transfer a directory structure holding several gigabytes of data over the rsync protocol, either the sender or the receiver panics in less than a minute with: Panic String: sbsndptr: sockbuf 0xfffff8000ccc76f8 and mbuf 0xfffff802a0145800 clashing Interestingly, scp(1)ing data between the hosts doesn't seem to trigger this panic as easily, but sometimes it does, mostly when copying larger (>1GB) files. We fixed this just yesterday by limiting the number of igb(4) tx/rx queues, i.e. adding this to loader.conf: hw.igb.num_queues=1 Now the hosts run stably, periodically rsyncing data in both directions.
(In reply to Hiren Panchasara from comment #31) > Most interesting frames are these 2: > > #22 0xffffffff80a6c546 in ether_input (ifp=<optimized out>, m=0x0) at /d2/hiren/freebsd/sys/net/if_ethersubr.c:759 > #23 0xffffffff804e2b3c in igb_rx_input (rxr=<optimized out>, ifp=0xfffff80115614800, m=0xfffff8014eee7600, > ptype=<optimized out>) at /d2/hiren/freebsd/sys/dev/e1000/if_igb.c:4957 > > #23 has an mbuf while #22 has it null. > Does this point to your hunch of > "device-driver bugs involving modifications to the mbuf chain after submitting the mbuf to the network stack (e.g., due to concurrency bugs in the device driver)" ? This is just a result of compiler optimisation and stack decoding. The compiler uses for m the same register that was passed at call time and does while (m) { mn = m->m_nextpkt; [...] m = mn; } As a result, m (as a decoded argument) will be displayed incorrectly. Actually, this is just the last loop iteration with the last mbuf in the chain.
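To illustrate the loop described above, here is a minimal user-space sketch in C. It is not the kernel code: struct pkt, handle_one() and walk_chain() are hypothetical stand-ins for struct mbuf, the per-packet dispatch call and the quoted while loop. It does not reproduce the compiler's register allocation. It only shows that during the last iteration the packet being handled is valid while the saved next pointer is already NULL, which is the kind of value a debugger may pick up if it decodes the wrong storage for m.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mbuf; only the packet link is modeled. */
struct pkt {
    struct pkt *nextpkt;   /* plays the role of m_nextpkt */
    int len;               /* plays the role of m_len */
};

/* Recorded by handle_one() so the state of the last call can be inspected. */
static const struct pkt *last_handled;
static const struct pkt *next_at_last_call;

/* Hypothetical per-packet handler; it is never called with a NULL packet. */
static void handle_one(const struct pkt *p, const struct pkt *next)
{
    assert(p != NULL);
    last_handled = p;
    next_at_last_call = next;
}

/* Walks a packet chain in the style of the quoted loop; returns packets handled. */
static int walk_chain(struct pkt *m)
{
    struct pkt *mn;
    int handled = 0;

    while (m != NULL) {
        mn = m->nextpkt;      /* save the successor before handing m off */
        m->nextpkt = NULL;    /* detach the packet from the chain */
        handle_one(m, mn);
        handled++;
        m = mn;               /* becomes NULL after the last packet */
    }
    return handled;
}
```

With a two-packet chain, the handler's last call sees the second packet with a NULL successor, so a NULL shown for m in a frame above the handler does not by itself mean a NULL packet was passed down.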
(In reply to slw from comment #33) Thanks, but I am a little confused. Which value of 'm' should I trust? Is it null in frame #22 or not? It seems to be null in the frames above it as well.
(In reply to Hiren Panchasara from comment #34) > which value of 'm' should I trust? is it null in frame #22 or not? it seems like null in the frames above it also. Partially. ether_input is called with m set to 0xfffff8014eee7600 (and this is the first m for the next invocations of further functions), does one iteration (or more, with different m; answering that needs access to the vmcore with kgdb and analysis of 0xfffff8014eee7600) and calls netisr_dispatch with m passed as the second argument (in the %rsi register). None of the subsequent invocations has to preserve %rsi (or %rdx when m is passed as the 3rd argument), so the backtrace can decode call arguments incorrectly. Just a reality check: frame #19, ether_input_internal (ifp=<optimized out>, m=0x0), line 483: if (m->m_len < ETHER_HDR_LEN) { A kernel panic MUST occur here if m were really NULL. This is just incorrect decoding of arguments.
(In reply to Daniel Bilik from comment #32) I was seeing identical issues - panic/reboot every few hours when under network load (rsync and zfs replication in this case) on 11.0-RELEASE. Daniel's suggested workaround setting "hw.igb.num_queues=1" in loader.conf has stabilized the systems.
*** Bug 147558 has been marked as a duplicate of this bug. ***
I have the same problem: http://docs.freebsd.org/cgi/mid.cgi?20170215093609.78a77ead Start of the mailing list thread: http://docs.freebsd.org/cgi/mid.cgi?20161021220413.1d130f5c
FYI, I feel that this bug report ought to be closed, and a new and more specific report opened on the IGB issue. The "sbdrop" panic is a generic debugging invariant that catches a range of different bugs, and this report has likely therefore encapsulated quite a few different bugs over the years, rather than reflecting a single bug. Can I recommend the creation of a new ticket tracking the presumed IGB multi-queue issue specifically?
(In reply to Robert Watson from comment #39) I am afraid I cannot add anything to a new tracking ticket beyond what has already been described in the mailing list thread above. In the same thread, I explained why I could not get more information about the cause of the panic. Sorry.
Maybe it would be better just to change the subject, rather than lose all the decent work here in the comments? Closing such a bug would be really discouraging, even if its subject is misleading.
I've seen this happen on both em(4) which has only 1 queue and igb(4) which has 2. It'd be interesting to see if new iflib'd em(4) (which has both em/igb) drivers in -HEAD help in this regard.
@All For issues where the 'described problem' and the 'thing(s) that need to be changed' are not one and the same, to retain all history/contextual information and not cause confusion (renaming titles, etc), the correct process is as follows: 1) Leave the 'original description/report of symptoms' issues 'as is'. 2) Create a blocking (depends on) issue for each area/component within FreeBSD that requires a change/fix (em, igb, etc). Alternatively, if multiple areas require fixing/changes but these will be resolved by a single person or maintainer (the assignee), then a single blocking issue is fine. Notes: The assignee of the blocked (parent) issue (ie 'the original report') can be changed to the person(s) responsible (assignee) to the blocking (sub) issue. Alternatively, the assignee of the parent issue can also be a person who wants to/will 'see it through', coordinating and updating issue metadata until all sub-tasks are closed, including any relevant MFC's and documentation tasks.
It's probably best to be clear: the bugs being described by various reporters over the years are *different* (and likely entirely *independent*) bugs that happen to have the same visible symptom as a result of being detected by the same check. Putting these different problems in the same open bug report, and keeping it open, is very much like putting all kernel panics in the same bug report because they share the word "panic" and involve a crash (which is pretty much what is happening here). As a result, there's a bug report that describes many different problems fixed in many different ways at many different times, and is never closed. Not only that, but prior debugging may shed no light on more recent problems, and could in fact obscure understanding, since it's not progress being made on one bug, but observations of several likely independent problems.
As the guy that filed the original bug report almost 7 years ago, I'd tend to agree with Robert Watson here. I'll see if Bugzilla will let me close it.
I've opened a new bug, #218270, for the sbsndptr() crash.
Use more appropriate resolution (FIXED is for those issues resolved by a change/commit)