Bug 218270 - panic: sbsndptr: sockbuf (...) and mbuf (...) clashing
Summary: panic: sbsndptr: sockbuf (...) and mbuf (...) clashing
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 11.0-RELEASE
Hardware: Any Any
Importance: --- Affects Some People
Assignee: freebsd-net (Nobody)
Reported: 2017-03-31 18:14 UTC by Robert Watson
Modified: 2018-09-04 18:00 UTC
CC List: 6 users

Description Robert Watson freebsd_committer 2017-03-31 18:14:21 UTC
Multiple users have reported crashes with "panic: sbsndptr: sockbuf (...) and mbuf (...) clashing" on systems using the em/igb drivers, along the following lines. Some text from a prior, unrelated report (#148807) is included here:


From emz@norma.perm.ru:

Got this just now after updating from 10.2-STABLE to 11.0-PRERELEASE. It persists in 11.0-RC3 and is reproducible within roughly 5 to 12 minutes; 25 minutes of uptime is the record so far.

panic: sbsndptr: sockbuf 0xfffff8003eea31b8 and mbuf 0xfffff80020a6e700 clashing

Unread portion of the kernel message buffer:
panic: sbsndptr: sockbuf 0xfffff8003eea31b8 and mbuf 0xfffff80020a6e700 clashing
cpuid = 1
KDB: stack backtrace:
#0 0xffffffff80b1d0c7 at kdb_backtrace+0x67
#1 0xffffffff80ad1f62 at vpanic+0x182
#2 0xffffffff80ad1dd3 at panic+0x43
#3 0xffffffff80b6a15a at sbsndptr+0xda
#4 0xffffffff80cfcbb4 at tcp_output+0xf34
#5 0xffffffff80cf9a81 at tcp_do_segment+0x2ce1
#6 0xffffffff80cf60cc at tcp_input+0xd1c
#7 0xffffffff80c66dbf at ip_input+0x15f
#8 0xffffffff80bfc295 at netisr_dispatch_src+0xa5
#9 0xffffffff80be4cea at ether_demux+0x12a
#10 0xffffffff80be5942 at ether_nh_input+0x322
#11 0xffffffff80bfc295 at netisr_dispatch_src+0xa5
#12 0xffffffff80be4f66 at ether_input+0x26
#13 0xffffffff80bed9db at vlan_input+0x1cb
#14 0xffffffff80be4c55 at ether_demux+0x95
#15 0xffffffff80be5942 at ether_nh_input+0x322
#16 0xffffffff80bfc295 at netisr_dispatch_src+0xa5
#17 0xffffffff80be4f66 at ether_input+0x26

Follow-up: RC3 had been installed incorrectly (i.e., not installed at all). After a proper downgrade to RC3 (r305786), the server seems at least more stable: it has been running for more than an hour. On 11.0-PRERELEASE (r306739), panics were happening within 3 to 5 minutes.

I have a handful of core dumps, in case someone needs them.

As for the driver: this was an HP DL160 G6, I believe, and the driver was igb(4). Now it is a Supermicro board (the tech team moved the drives to a new chassis to rule out hardware problems). The driver is still igb(4). The ifconfig output and dmesg.boot follow; the dmesg is from 11.0-RC3:

igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
        ether 00:25:90:06:b7:9e
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
igb1: flags=8c02<BROADCAST,OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
        ether 00:25:90:06:b7:9f
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: no carrier
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
        inet 127.0.0.1 netmask 0xff000000
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        groups: lo
vlan1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=303<RXCSUM,TXCSUM,TSO4,TSO6>
        ether 00:25:90:06:b7:9e
        inet 192.168.0.248 netmask 0xffffff00 broadcast 192.168.0.255
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        vlan: 1 vlanpcp: 0 parent interface: igb0
        groups: vlan
vlan2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=303<RXCSUM,TXCSUM,TSO4,TSO6>
        ether 00:25:90:06:b7:9e
        inet 91.206.242.1 netmask 0xfffffff0 broadcast 91.206.242.15
        inet 91.206.242.5 netmask 0xfffffff0 broadcast 91.206.242.15
        inet 91.206.242.8 netmask 0xfffffff0 broadcast 91.206.242.15
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        vlan: 2 vlanpcp: 0 parent interface: igb0
        groups: vlan
vlan3: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=303<RXCSUM,TXCSUM,TSO4,TSO6>
        ether 00:25:90:06:b7:9e
        inet 10.64.0.250 netmask 0xffffff00 broadcast 10.64.0.255
        inet 10.64.0.252 netmask 0xffffff00 broadcast 10.64.0.255
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        vlan: 3 vlanpcp: 0 parent interface: igb0
        groups: vlan
vlan4: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=303<RXCSUM,TXCSUM,TSO4,TSO6>
        ether 00:25:90:06:b7:9e
        inet 89.250.210.6 netmask 0xfffffffc broadcast 89.250.210.7
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        vlan: 4 vlanpcp: 0 parent interface: igb0
        groups: vlan
vlan5: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=303<RXCSUM,TXCSUM,TSO4,TSO6>
        ether 00:25:90:06:b7:9e
        inet 77.43.142.201 netmask 0xfffffffc broadcast 77.43.142.203
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        vlan: 5 vlanpcp: 0 parent interface: igb0
        groups: vlan
vlan6: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=303<RXCSUM,TXCSUM,TSO4,TSO6>
        ether 00:25:90:06:b7:9e
        inet 172.20.142.250 netmask 0xffffff00 broadcast 172.20.142.255
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        vlan: 6 vlanpcp: 0 parent interface: igb0
        groups: vlan
vlan7: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=303<RXCSUM,TXCSUM,TSO4,TSO6>
        ether 00:25:90:06:b7:9e
        inet 172.16.240.2 netmask 0xffffff00 broadcast 172.16.240.255
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        vlan: 7 vlanpcp: 0 parent interface: igb0
        groups: vlan
vlan8: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=303<RXCSUM,TXCSUM,TSO4,TSO6>
        ether 00:25:90:06:b7:9e
        inet 86.109.196.74 netmask 0xfffffff8 broadcast 86.109.196.79
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        vlan: 8 vlanpcp: 0 parent interface: igb0
        groups: vlan
vlan9: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=303<RXCSUM,TXCSUM,TSO4,TSO6>
        ether 00:25:90:06:b7:9e
        inet 192.168.2.1 netmask 0xffffff00 broadcast 192.168.2.255
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        vlan: 9 vlanpcp: 0 parent interface: igb0
        groups: vlan
vlan10: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=303<RXCSUM,TXCSUM,TSO4,TSO6>
        ether 00:25:90:06:b7:9e
        inet 192.168.3.1 netmask 0xffffff00 broadcast 192.168.3.255
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        vlan: 10 vlanpcp: 0 parent interface: igb0
        groups: vlan
vlan11: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=303<RXCSUM,TXCSUM,TSO4,TSO6>
        ether 00:25:90:06:b7:9e
        inet 188.234.141.201 netmask 0xfffffffc broadcast 188.234.141.203
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        vlan: 11 vlanpcp: 0 parent interface: igb0
        groups: vlan
vlan12: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=303<RXCSUM,TXCSUM,TSO4,TSO6>
        ether 00:25:90:06:b7:9e
        inet 192.168.50.1 netmask 0xffffff00 broadcast 192.168.50.255
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        vlan: 12 vlanpcp: 0 parent interface: igb0
        groups: vlan
vlan13: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=303<RXCSUM,TXCSUM,TSO4,TSO6>
        ether 00:25:90:06:b7:9e
        inet 192.168.99.10 netmask 0xffffff00 broadcast 192.168.99.255
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        vlan: 13 vlanpcp: 0 parent interface: igb0
        groups: vlan


Dmesg:

Copyright (c) 1992-2016 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 11.0-RC3 #0 r305786: Wed Sep 14 02:19:25 UTC 2016
    root@releng2.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64
FreeBSD clang version 3.8.0 (tags/RELEASE_380/final 262564) (based on LLVM 3.8.0)
VT(vga): resolution 640x480
CPU: Intel(R) Xeon(R) CPU           E5620  @ 2.40GHz (2400.13-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x206c2  Family=0x6  Model=0x2c  Stepping=2
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x9ee3fd<SSE3,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,POPCNT>
  AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
  AMD Features2=0x1<LAHF>
  VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
  TSC: P-state invariant, performance statistics
real memory  = 51543801856 (49156 MB)
avail memory = 49979412480 (47664 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <080312 APIC1521>
FreeBSD/SMP: Multiprocessor System Detected: 16 CPUs
FreeBSD/SMP: 2 package(s) x 4 core(s) x 2 hardware threads
random: unblocking device.
ACPI BIOS Warning (bug): 32/64X length mismatch in FADT/Gpe0Block: 128/64 (20160527/tbfadt-650)
ioapic0: Changing APIC ID to 6
ioapic1: Changing APIC ID to 7
ioapic0 <Version 2.0> irqs 0-23 on motherboard
ioapic1 <Version 2.0> irqs 24-47 on motherboard
random: entropy device external interface
kbd1 at kbdmux0
netmap: loaded module
module_register_init: MOD_LOAD (vesa, 0xffffffff8101c950, 0) error 19
vtvga0: <VT VGA driver> on motherboard
cryptosoft0: <software crypto> on motherboard
acpi0: <SMCI > on motherboard
acpi0: Power Button (fixed)
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0
cpu2: <ACPI CPU> on acpi0
cpu3: <ACPI CPU> on acpi0
cpu4: <ACPI CPU> on acpi0
cpu5: <ACPI CPU> on acpi0
cpu6: <ACPI CPU> on acpi0
cpu7: <ACPI CPU> on acpi0
cpu8: <ACPI CPU> on acpi0
cpu9: <ACPI CPU> on acpi0
cpu10: <ACPI CPU> on acpi0
cpu11: <ACPI CPU> on acpi0
cpu12: <ACPI CPU> on acpi0
cpu13: <ACPI CPU> on acpi0
cpu14: <ACPI CPU> on acpi0
cpu15: <ACPI CPU> on acpi0
attimer0: <AT timer> port 0x40-0x43 irq 0 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
atrtc0: <AT realtime clock> port 0x70-0x71 irq 8 on acpi0
Event timer "RTC" frequency 32768 Hz quality 0
hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0
Timecounter "HPET" frequency 14318180 Hz quality 950
Event timer "HPET" frequency 14318180 Hz quality 350
Event timer "HPET1" frequency 14318180 Hz quality 340
Event timer "HPET2" frequency 14318180 Hz quality 340
Event timer "HPET3" frequency 14318180 Hz quality 340
Timecounter "ACPI-safe" frequency 3579545 Hz quality 850
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff numa-domain 0 on acpi0
pcib0: _OSC returned error 0x10
pci0: <ACPI PCI bus> numa-domain 0 on pcib0
pcib1: <ACPI PCI-PCI bridge> at device 1.0 numa-domain 0 on pci0
pci1: <ACPI PCI bus> numa-domain 0 on pcib1
igb0: <Intel(R) PRO/1000 Network Connection, Version - 2.5.3-k> port 0xec00-0xec1f mem 0xfbde0000-0xfbdfffff,0xfbdc0000-0xfbddffff,0xfbd9c000-0xfbd9ffff irq 28 at device 0.0 numa-domain 0 on pci1
igb0: Using MSIX interrupts with 9 vectors
igb0: Ethernet address: 00:25:90:06:b7:9e
igb0: Bound queue 0 to cpu 0
igb0: Bound queue 1 to cpu 1
igb0: Bound queue 2 to cpu 2
igb0: Bound queue 3 to cpu 3
igb0: Bound queue 4 to cpu 4
igb0: Bound queue 5 to cpu 5
igb0: Bound queue 6 to cpu 6
igb0: Bound queue 7 to cpu 7
igb0: netmap queues/slots: TX 8/1024, RX 8/1024
igb1: <Intel(R) PRO/1000 Network Connection, Version - 2.5.3-k> port 0xe880-0xe89f mem 0xfbd60000-0xfbd7ffff,0xfbd40000-0xfbd5ffff,0xfbd1c000-0xfbd1ffff irq 40 at device 0.1 numa-domain 0 on pci1
igb1: Using MSIX interrupts with 9 vectors
igb1: Ethernet address: 00:25:90:06:b7:9f
igb1: Bound queue 0 to cpu 8
igb1: Bound queue 1 to cpu 9
igb1: Bound queue 2 to cpu 10
igb1: Bound queue 3 to cpu 11
igb1: Bound queue 4 to cpu 12
igb1: Bound queue 5 to cpu 13
igb1: Bound queue 6 to cpu 14
igb1: Bound queue 7 to cpu 15
igb1: netmap queues/slots: TX 8/1024, RX 8/1024
pcib2: <ACPI PCI-PCI bridge> at device 3.0 numa-domain 0 on pci0
pci2: <ACPI PCI bus> numa-domain 0 on pcib2
pcib3: <ACPI PCI-PCI bridge> at device 5.0 numa-domain 0 on pci0
pci3: <ACPI PCI bus> numa-domain 0 on pcib3
pcib4: <ACPI PCI-PCI bridge> at device 7.0 numa-domain 0 on pci0
pci4: <ACPI PCI bus> numa-domain 0 on pcib4
pcib5: <ACPI PCI-PCI bridge> at device 9.0 numa-domain 0 on pci0
pci5: <ACPI PCI bus> numa-domain 0 on pcib5
pci0: <base peripheral, interrupt controller> at device 20.0 (no driver attached)
pci0: <base peripheral, interrupt controller> at device 20.1 (no driver attached)
pci0: <base peripheral, interrupt controller> at device 20.2 (no driver attached)
pci0: <base peripheral, interrupt controller> at device 20.3 (no driver attached)
uhci0: <Intel 82801JI (ICH10) USB controller USB-D> port 0xdc00-0xdc1f irq 16 at device 26.0 numa-domain 0 on pci0
uhci0: LegSup = 0x2f00
usbus0 numa-domain 0 on uhci0
uhci1: <Intel 82801JI (ICH10) USB controller USB-E> port 0xd880-0xd89f irq 21 at device 26.1 numa-domain 0 on pci0
uhci1: LegSup = 0x2f00
usbus1 numa-domain 0 on uhci1
uhci2: <Intel 82801JI (ICH10) USB controller USB-F> port 0xd800-0xd81f irq 19 at device 26.2 numa-domain 0 on pci0
uhci2: LegSup = 0x2f00
usbus2 numa-domain 0 on uhci2
ehci0: <Intel 82801JI (ICH10) USB 2.0 controller USB-B> mem 0xfbeda000-0xfbeda3ff irq 18 at device 26.7 numa-domain 0 on pci0
usbus3: EHCI version 1.0
usbus3 numa-domain 0 on ehci0
uhci3: <Intel 82801JI (ICH10) USB controller USB-A> port 0xd480-0xd49f irq 23 at device 29.0 numa-domain 0 on pci0
uhci3: LegSup = 0x2f00
usbus4 numa-domain 0 on uhci3
uhci4: <Intel 82801JI (ICH10) USB controller USB-B> port 0xd400-0xd41f irq 19 at device 29.1 numa-domain 0 on pci0
uhci4: LegSup = 0x2f00
usbus5 numa-domain 0 on uhci4
uhci5: <Intel 82801JI (ICH10) USB controller USB-C> port 0xd080-0xd09f irq 18 at device 29.2 numa-domain 0 on pci0
uhci5: LegSup = 0x2f00
usbus6 numa-domain 0 on uhci5
ehci1: <Intel 82801JI (ICH10) USB 2.0 controller USB-A> mem 0xfbed8000-0xfbed83ff irq 23 at device 29.7 numa-domain 0 on pci0
usbus7: EHCI version 1.0
usbus7 numa-domain 0 on ehci1
pcib6: <ACPI PCI-PCI bridge> at device 30.0 numa-domain 0 on pci0
pci6: <ACPI PCI bus> numa-domain 0 on pcib6
vgapci0: <VGA-compatible display> mem 0xf9000000-0xf9ffffff,0xfaffc000-0xfaffffff,0xfb000000-0xfb7fffff irq 18 at device 1.0 numa-domain 0 on pci6
vgapci0: Boot video device
isab0: <PCI-ISA bridge> at device 31.0 numa-domain 0 on pci0
isa0: <ISA bus> numa-domain 0 on isab0
atapci0: <Intel ICH10 SATA300 controller> port 0xd000-0xd007,0xcc00-0xcc03,0xc880-0xc887,0xc800-0xc803,0xc480-0xc48f,0xc400-0xc40f irq 19 at device 31.2 numa-domain 0 on pci0
ata2: <ATA channel> at channel 0 on atapci0
ata3: <ATA channel> at channel 1 on atapci0
atapci1: <Intel ICH10 SATA300 controller> port 0xc000-0xc007,0xbc00-0xbc03,0xb880-0xb887,0xb800-0xb803,0xb480-0xb48f,0xb400-0xb40f irq 19 at device 31.5 numa-domain 0 on pci0
ata4: <ATA channel> at channel 0 on atapci1
ata5: <ATA channel> at channel 1 on atapci1
acpi_button0: <Power Button> on acpi0
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: model IntelliMouse Explorer, device ID 4
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
uart1: <16550 or compatible> port 0x2f8-0x2ff irq 3 on acpi0
qpi0: <QPI system bus> on motherboard
pcib7: <QPI Host-PCI bridge> pcibus 255 on qpi0
pci7: <PCI bus> on pcib7
pcib8: <QPI Host-PCI bridge> pcibus 254 on qpi0
pci8: <PCI bus> on pcib8
orm0: <ISA Option ROMs> at iomem 0xc0000-0xc7fff,0xc8000-0xc8fff on isa0
ppc0: cannot reserve I/O port range
est0: <Enhanced SpeedStep Frequency Control> on cpu0
est1: <Enhanced SpeedStep Frequency Control> on cpu1
est2: <Enhanced SpeedStep Frequency Control> on cpu2
est3: <Enhanced SpeedStep Frequency Control> on cpu3
est4: <Enhanced SpeedStep Frequency Control> on cpu4
est5: <Enhanced SpeedStep Frequency Control> on cpu5
est6: <Enhanced SpeedStep Frequency Control> on cpu6
est7: <Enhanced SpeedStep Frequency Control> on cpu7
est8: <Enhanced SpeedStep Frequency Control> on cpu8
est9: <Enhanced SpeedStep Frequency Control> on cpu9
est10: <Enhanced SpeedStep Frequency Control> on cpu10
est11: <Enhanced SpeedStep Frequency Control> on cpu11
est12: <Enhanced SpeedStep Frequency Control> on cpu12
est13: <Enhanced SpeedStep Frequency Control> on cpu13
est14: <Enhanced SpeedStep Frequency Control> on cpu14
est15: <Enhanced SpeedStep Frequency Control> on cpu15
ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
Timecounters tick every 1.000 msec
nvme cam probe device init
usbus0: 12Mbps Full Speed USB v1.0
usbus1: 12Mbps Full Speed USB v1.0
usbus2: 12Mbps Full Speed USB v1.0
ugen0.1: <Intel> at usbus0
uhub0: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0
ugen1.1: <Intel> at usbus1
uhub1: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus1
ugen2.1: <Intel> at usbus2
uhub2: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus2
usbus3: 480Mbps High Speed USB v2.0
usbus4: 12Mbps Full Speed USB v1.0
usbus5: 12Mbps Full Speed USB v1.0
ugen3.1: <Intel> at usbus3
uhub3: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus3
ugen4.1: <Intel> at usbus4
uhub4: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus4
ugen5.1: <Intel> at usbus5
uhub5: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus5
usbus6: 12Mbps Full Speed USB v1.0
usbus7: 480Mbps High Speed USB v2.0
ugen6.1: <Intel> at usbus6
uhub6: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus6
ugen7.1: <Intel> at usbus7
uhub7: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus7
uhub2: 2 ports with 2 removable, self powered
uhub1: 2 ports with 2 removable, self powered
uhub0: 2 ports with 2 removable, self powered
uhub6: 2 ports with 2 removable, self powered
uhub4: 2 ports with 2 removable, self powered
uhub5: 2 ports with 2 removable, self powered
ada0 at ata2 bus 0 scbus0 target 0 lun 0
ada0: <GB0500EAFYL HPG1> ATA-7 SATA 2.x device
ada0: Serial Number WCASY6743897
ada0: 300.000MB/s transfers (SATA 2.x, UDMA5, PIO 8192bytes)
ada0: 476940MB (976773168 512 byte sectors)
ada1 at ata2 bus 0 scbus0 target 1 lun 0
ada1: <ST500DM002-1BD142 KC48> ATA8-ACS SATA 3.x device
ada1: Serial Number Z6EMAENR
ada1: 300.000MB/s transfers (SATA 2.x, UDMA5, PIO 8192bytes)
ada1: 476940MB (976773168 512 byte sectors)
ada1: quirks=0x1<4K>
ada2 at ata3 bus 0 scbus1 target 0 lun 0
ada2: <GB0500EAFYL HPG1> ATA-7 SATA 2.x device
ada2: Serial Number WCASY6752687
ada2: 300.000MB/s transfers (SATA 2.x, UDMA5, PIO 8192bytes)
ada2: 476940MB (976773168 512 byte sectors)
ada3 at ata3 bus 0 scbus1 target 1 lun 0
ada3: <ST500DM002-1BD142 KC48> ATA8-ACS SATA 3.x device
ada3: Serial Number Z6EM8QHK
ada3: 300.000MB/s transfers (SATA 2.x, UDMA5, PIO 8192bytes)
ada3: 476940MB (976773168 512 byte sectors)
ada3: quirks=0x1<4K>
SMP: AP CPU #1 Launched!
SMP: AP CPU #15 Launched!
SMP: AP CPU #4 Launched!
SMP: AP CPU #10 Launched!
SMP: AP CPU #6 Launched!
SMP: AP CPU #11 Launched!
SMP: AP CPU #2 Launched!
SMP: AP CPU #8 Launched!
SMP: AP CPU #3 Launched!
SMP: AP CPU #12 Launched!
SMP: AP CPU #7 Launched!
SMP: AP CPU #13 Launched!
SMP: AP CPU #5 Launched!
SMP: AP CPU #9 Launched!
SMP: AP CPU #14 Launched!
Timecounter "TSC-low" frequency 1200065624 Hz quality 1000
Trying to mount root from zfs:zfsroot []...
GEOM_MIRROR: Device mirror/swap launched (2/2).
Root mount waiting for: usbus7 usbus3
Root mount waiting for: usbus7 usbus3
uhub7: 6 ports with 6 removable, self powered
uhub3: 6 ports with 6 removable, self powered
igb0: link state changed to UP
vlan1: link state changed to UP
vlan2: link state changed to UP
vlan3: link state changed to UP
vlan4: link state changed to UP
vlan5: link state changed to UP
vlan6: link state changed to UP
vlan7: link state changed to UP
vlan8: link state changed to UP
vlan9: link state changed to UP
vlan10: link state changed to UP
vlan11: link state changed to UP
vlan12: link state changed to UP
vlan13: link state changed to UP

From Hiren Panchasara:

(In reply to Robert Watson from comment #29)

Robert,

Thanks for your response.

On a slightly modified stable/11 (no changes in driver code), I am seeing repeated panics in sbsndptr() with igb(4) while the box is mostly idle or carrying very little traffic.

(kgdb) bt
#0  __curthread () at ./machine/pcpu.h:221
#1  doadump (textdump=-2121667464) at /d2/hiren/freebsd/sys/kern/kern_shutdown.c:298
#2  0xffffffff80389f86 in db_fncall_generic (nargs=0, addr=<optimized out>, rv=<optimized out>, 
    args=<optimized out>) at /d2/hiren/freebsd/sys/ddb/db_command.c:568
#3  db_fncall (dummy1=<optimized out>, dummy2=<optimized out>, dummy3=<optimized out>, dummy4=<optimized out>)
    at /d2/hiren/freebsd/sys/ddb/db_command.c:616
#4  0xffffffff80389a29 in db_command (last_cmdp=<optimized out>, cmd_table=<optimized out>, 
    dopager=<optimized out>) at /d2/hiren/freebsd/sys/ddb/db_command.c:440
#5  0xffffffff80389784 in db_command_loop () at /d2/hiren/freebsd/sys/ddb/db_command.c:493
#6  0xffffffff8038c76b in db_trap (type=<optimized out>, code=<optimized out>)
    at /d2/hiren/freebsd/sys/ddb/db_main.c:251
#7  0xffffffff809a6f33 in kdb_trap (type=<optimized out>, code=<optimized out>, tf=<optimized out>)
    at /d2/hiren/freebsd/sys/kern/subr_kdb.c:654
#8  0xffffffff80d93521 in trap_fatal (frame=0xfffffe1f2bb38210, eva=24)
    at /d2/hiren/freebsd/sys/amd64/amd64/trap.c:836
#9  0xffffffff80d93753 in trap_pfault (frame=0xfffffe1f2bb38210, usermode=0)
    at /d2/hiren/freebsd/sys/amd64/amd64/trap.c:691
#10 0xffffffff80d92cdc in trap (frame=0xfffffe1f2bb38210) at /d2/hiren/freebsd/sys/amd64/amd64/trap.c:442
#11 <signal handler called>
#12 sbsndptr (sb=0xfffff8060f8a5518, off=0, len=4294967287, moff=0xfffffe1f2bb38420)
    at /d2/hiren/freebsd/sys/kern/uipc_sockbuf.c:1191
#13 0xffffffff80ab9382 in tcp_output (tp=<optimized out>) at /d2/hiren/freebsd/sys/netinet/tcp_output.c:1099
#14 0xffffffff80ab6105 in tcp_do_segment (m=<optimized out>, th=<optimized out>, so=0xfffff8060f8a5360, 
    tp=<optimized out>, drop_hdrlen=60, tlen=<optimized out>, iptos=<optimized out>, 
    ti_locked=<error reading variable: Cannot access memory at address 0x1>)
    at /d2/hiren/freebsd/sys/netinet/tcp_input.c:3182
#15 0xffffffff80ab2803 in tcp_input (mp=<optimized out>, offp=<optimized out>, proto=<optimized out>)
    at /d2/hiren/freebsd/sys/netinet/tcp_input.c:1444
#16 0xffffffff80aa6bc5 in ip_input (m=<error reading variable: Cannot access memory at address 0x0>)
    at /d2/hiren/freebsd/sys/netinet/ip_input.c:809
#17 0xffffffff80a82b35 in netisr_dispatch_src (proto=1, source=<optimized out>, m=0x0)
    at /d2/hiren/freebsd/sys/net/netisr.c:1120
#18 0xffffffff80a6c2ca in ether_demux (ifp=<optimized out>, m=0x0) at /d2/hiren/freebsd/sys/net/if_ethersubr.c:850
#19 0xffffffff80a6cf22 in ether_input_internal (ifp=<optimized out>, m=0x0)
    at /d2/hiren/freebsd/sys/net/if_ethersubr.c:639
#20 ether_nh_input (m=<optimized out>) at /d2/hiren/freebsd/sys/net/if_ethersubr.c:669
#21 0xffffffff80a82b35 in netisr_dispatch_src (proto=5, source=<optimized out>, m=0x0)
    at /d2/hiren/freebsd/sys/net/netisr.c:1120
#22 0xffffffff80a6c546 in ether_input (ifp=<optimized out>, m=0x0) at /d2/hiren/freebsd/sys/net/if_ethersubr.c:759
#23 0xffffffff804e2b3c in igb_rx_input (rxr=<optimized out>, ifp=0xfffff80115614800, m=0xfffff8014eee7600, 
    ptype=<optimized out>) at /d2/hiren/freebsd/sys/dev/e1000/if_igb.c:4957
#24 igb_rxeof (que=<optimized out>, count=358700136, done=<optimized out>)
    at /d2/hiren/freebsd/sys/dev/e1000/if_igb.c:5185
#25 0xffffffff804e1daf in igb_msix_que (arg=<optimized out>) at /d2/hiren/freebsd/sys/dev/e1000/if_igb.c:1612
#26 0xffffffff8091425f in intr_event_execute_handlers (p=<optimized out>, ie=<optimized out>)
    at /d2/hiren/freebsd/sys/kern/kern_intr.c:1262
#27 0xffffffff80914876 in ithread_execute_handlers (ie=<optimized out>, p=<optimized out>)
    at /d2/hiren/freebsd/sys/kern/kern_intr.c:1275
#28 ithread_loop (arg=<optimized out>) at /d2/hiren/freebsd/sys/kern/kern_intr.c:1356
#29 0xffffffff80910ea5 in fork_exit (callout=0xffffffff809147b0 <ithread_loop>, arg=0xfffff8011561a0e0, 
    frame=0xfffffe1f2bb38ac0) at /d2/hiren/freebsd/sys/kern/kern_fork.c:1040
#30 <signal handler called>
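Frame #12 shows sbsndptr() entered with len=4294967287. Assuming the 11.x prototype, len is a 32-bit unsigned int (u_int). In that case the value is 2^32 - 9, which is the bit pattern of -9. This hints, but does not by itself prove, that the caller computed a length that had gone negative. The minimal C sketch below checks the arithmetic. The helper name as_signed32() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* -9 converted to a 32-bit unsigned value wraps to 2^32 - 9. */
static_assert((uint32_t)-9 == 4294967287u, "-9 as u_int is 4294967287");

/*
 * Hypothetical helper: reinterpret a 32-bit unsigned value (as kgdb
 * prints a u_int argument) as a two's-complement signed value, without
 * relying on implementation-defined narrowing conversions.
 */
int32_t
as_signed32(uint32_t v)
{
	if (v <= INT32_MAX)
		return (int32_t)v;
	/* v > INT32_MAX: negate its distance below 2^32. */
	return -(int32_t)(UINT32_MAX - v) - 1;
}
```

For example, as_signed32(4294967287u) yields -9.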

----------------------------------------------------------------

The most interesting frames are these two:

#22 0xffffffff80a6c546 in ether_input (ifp=<optimized out>, m=0x0) at /d2/hiren/freebsd/sys/net/if_ethersubr.c:759
#23 0xffffffff804e2b3c in igb_rx_input (rxr=<optimized out>, ifp=0xfffff80115614800, m=0xfffff8014eee7600, 
    ptype=<optimized out>) at /d2/hiren/freebsd/sys/dev/e1000/if_igb.c:4957

Frame #23 has an mbuf, while in #22 it is NULL.

Does this support your hunch about
"device-driver bugs involving modifications to the mbuf chain after submitting the mbuf to the network stack (e.g., due to concurrency bugs in the device driver)"?

Or is something else going on?
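The hypothesis quoted above can be illustrated with a toy model. The sketch below is not the kernel's sbsndptr(). It uses a hypothetical, stripped-down toy_mbuf type with only m_next and m_len, and models only the core idea: walk the chain to cover the bytes the socket buffer believes it holds, and report a "clash" if the chain ends first. As I read the 11.x source, this is roughly the condition under which sbsndptr() panics, but treat that as an assumption. In the demo, an mbuf's length is changed by hand to simulate a modification made behind the stack's back.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for struct mbuf with only the two fields the
 * walk needs; the real structure has many more (m_data, m_flags, ...).
 */
struct toy_mbuf {
	struct toy_mbuf *m_next;	/* next buffer in the chain */
	unsigned int	 m_len;		/* bytes of data in this buffer */
};

/* Total bytes actually present in the chain. */
unsigned int
chain_bytes(const struct toy_mbuf *m)
{
	unsigned int n = 0;

	for (; m != NULL; m = m->m_next)
		n += m->m_len;
	return (n);
}

/*
 * Check that the chain covers bytes [off, off + len). Returns 0 if it
 * does, or -1 if the chain ends while bytes are still owed: the toy
 * analogue of "sockbuf ... and mbuf ... clashing".
 */
int
toy_sndptr_walk(const struct toy_mbuf *head, unsigned int off,
    unsigned int len)
{
	unsigned int need = off + len;	/* bytes expected from the head */
	const struct toy_mbuf *m;

	for (m = head; m != NULL && need > m->m_len; m = m->m_next)
		need -= m->m_len;
	return ((m == NULL && need > 0) ? -1 : 0);
}

/* Usage: a 100 + 50 byte chain that the socket buffer counts as 150. */
void
toy_demo(void)
{
	struct toy_mbuf second = { NULL, 50 };
	struct toy_mbuf first = { &second, 100 };
	unsigned int sb_count = 150;	/* hypothetical sockbuf byte count */

	assert(toy_sndptr_walk(&first, 0, sb_count) == 0);	/* consistent */
	second.m_len = 40;	/* simulated modification behind the stack's back */
	assert(chain_bytes(&first) == 140);
	assert(toy_sndptr_walk(&first, 0, sb_count) == -1);	/* "clash" */
}
```

The model deliberately ignores the cached send pointer (sb_sndptr/sb_sndptroff) that the real function keeps to avoid rescanning the chain. It only shows why a chain whose lengths no longer add up to the socket buffer's count trips the check.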

From Daniel Bilik:

We have also been struggling with this in the past few weeks, and I can confirm Robert's intuition.

In our case, the bug affects two hosts running recent 10-STABLE, connected to each other via igb(4) through a dedicated 100 Mbit/s switch. When transferring a directory tree holding several gigabytes of data over the rsync protocol, either the sender or the receiver panics in less than a minute with:

Panic String: sbsndptr: sockbuf 0xfffff8000ccc76f8 and mbuf 0xfffff802a0145800 clashing

Interestingly, copying data between the hosts with scp(1) does not seem to trigger this panic as easily, though it sometimes does, mostly when copying larger (>1 GB) files.

We fixed this just yesterday by limiting the number of igb(4) TX/RX queues, i.e., by adding this to loader.conf:

hw.igb.num_queues=1

The hosts now run stably, periodically rsyncing data in both directions.