Bug 208957 - Kernel panic (page fault) on 10.3-STABLE with VIMAGE & Infiniband modules
Summary: Kernel panic (page fault) on 10.3-STABLE with VIMAGE & Infiniband modules
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 10.3-RELEASE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: Hans Petter Selasky
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-04-21 14:47 UTC by Justin Clift
Modified: 2016-04-29 11:31 UTC
CC List: 3 users

See Also:


Attachments
ipoib_ib_completion() (206.95 KB, image/png)
2016-04-21 17:49 UTC, Justin Clift
ipoib_cm_handle_rx_wc() (205.85 KB, image/png)
2016-04-21 17:50 UTC, Justin Clift
netisr_dispatch_src() (207.84 KB, image/png)
2016-04-21 17:51 UTC, Justin Clift
ip_input() (203.35 KB, image/png)
2016-04-21 17:52 UTC, Justin Clift
calltrap() (208.03 KB, image/png)
2016-04-21 17:52 UTC, Justin Clift
ipoib_cm_handle_rx_wc() - print *dev (204.46 KB, image/png)
2016-04-21 18:13 UTC, Justin Clift
VIMAGE + ipoib fix (1.04 KB, patch)
2016-04-21 18:22 UTC, Hans Petter Selasky

Description Justin Clift 2016-04-21 14:47:22 UTC
The VIMAGE option is causing a kernel panic (page fault) when compiled along with the Infiniband options on 10.3-STABLE.  It's 100% reproducible, and easily triggered. ;)

Note - I've compiled this multiple times over the last few days, on several systems, to make sure it isn't caused by bad hardware in any one system.  It panics reliably every time, on all of them.  Definitely a software bug of some sort.

Note - Anecdotal evidence suggests that the recurring problems with VIMAGE + Infiniband are a large part of the reason Infiniband isn't supported on FreeNAS.  The NAS4Free project also has difficulties with Infiniband, very likely for the same reason. :(

    https://bugs.freenas.org/issues/2014#note-18

Anyway, backtrace info below in case it helps:
(commands taken from https://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug-gdb.html)

***********************************************************************************

root@cluster1:/usr/obj/usr/src/sys/CONNECTX # kgdb kernel.debug /var/crash/vmcore.0
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
code segment		= base rx0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 12 (irq271: mlx4_core0)
trap number		= 12
panic: page fault
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff807263d0 at kdb_backtrace+0x60
#1 0xffffffff806e8c76 at vpanic+0x126
#2 0xffffffff806e8b43 at panic+0x43
#3 0xffffffff80b8bf3b at trap_fatal+0x36b
#4 0xffffffff80b8c23d at trap_pfault+0x2ed
#5 0xffffffff80b8b8ba at trap+0x47a
#6 0xffffffff80b71892 at calltrap+0x8
#7 0xffffffff807be1a2 at netisr_dispatch_src+0x62
#8 0xffffffff808f89fa at ipoib_cm_handle_rx_wc+0x22a
#9 0xffffffff808fcc98 at ipoib_ib_completion+0x78
#10 0xffffffff80930c43 at mlx4_cq_completion+0x63
#11 0xffffffff80933d43 at mlx4_eq_int+0x2c3
#12 0xffffffff80932fac at mlx4_msi_x_interrupt+0xc
#13 0xffffffff806b35cb at intr_event_execute_handlers+0xab
#14 0xffffffff806b3a16 at ithread_loop+0x96
#15 0xffffffff806b104a at fork_exit+0x9a
#16 0xffffffff80b71dce at fork_trampoline+0xe
Uptime: 3m47s
Dumping 485 out of 7857 MB:..4%..14%..24%..33%..43%..53%..63%..73%..83%..93%

Reading symbols from /boot/kernel/ums.ko.symbols...done.
Loaded symbols for /boot/kernel/ums.ko.symbols
#0  doadump (textdump=<value optimized out>) at pcpu.h:219
219		__asm("movq %%gs:%1,%0" : "=r" (td)
(kgdb) list *0xffffffff808f89fa
0xffffffff808f89fa is in ipoib_cm_handle_rx_wc (/usr/src/sys/ofed/drivers/infiniband/ulp/ipoib/ipoib_cm.c:565).
560		mb->m_pkthdr.rcvif = dev;
561		proto = *mtod(mb, uint16_t *);
562		m_adj(mb, IPOIB_ENCAP_LEN);
563	
564		IPOIB_MTAP_PROTO(dev, mb, proto);
565		ipoib_demux(dev, mb, ntohs(proto));
566	
567	repost:
568		if (has_srq) {
569			if (unlikely(ipoib_cm_post_receive_srq(priv, wr_id)))
Current language:  auto; currently minimal
(kgdb) list *0xffffffff807be1a2
0xffffffff807be1a2 is in netisr_dispatch_src (/usr/src/sys/net/netisr.c:976).
971		if (dispatch_policy == NETISR_DISPATCH_DIRECT) {
972			nwsp = DPCPU_PTR(nws);
973			npwp = &nwsp->nws_work[proto];
974			npwp->nw_dispatched++;
975			npwp->nw_handled++;
976			netisr_proto[proto].np_handler(m);
977			error = 0;
978			goto out_unlock;
979		}
980	
(kgdb) list *0xffffffff80b71892
0xffffffff80b71892 is at /usr/src/sys/amd64/amd64/exception.S:238.
233		.type	calltrap,@function
234	calltrap:
235		movq	%rsp,%rdi
236		call	trap
237		MEXITCOUNT
238		jmp	doreti			/* Handle any pending ASTs */
239	
240		/*
241		 * alltraps_noen entry point.  Unlike alltraps above, we want to
242		 * leave the interrupts disabled.  This corresponds to
(kgdb) list *0xffffffff80b8b8ba
0xffffffff80b8b8ba is in trap (/usr/src/sys/amd64/amd64/trap.c:447).
442	
443			KASSERT(cold || td->td_ucred != NULL,
444			    ("kernel trap doesn't have ucred"));
445			switch (type) {
446			case T_PAGEFLT:			/* page fault */
447				(void) trap_pfault(frame, FALSE);
448				goto out;
449	
450			case T_DNA:
451				KASSERT(!PCB_USER_FPU(td->td_pcb),
(kgdb)

***********************************************************************************
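Why does the page fault surface inside netisr_dispatch_src()? The listing at netisr.c:976 shows the packet being handed directly to the protocol handler (ip_input(), judging by the attached screenshot). In a VIMAGE kernel, per-network-stack variables (the V_ macros from net/vnet.h) are not plain globals: each access is resolved relative to curvnet, the current thread's vnet pointer. The thread running this receive path is the mlx4 interrupt thread ("irq271: mlx4_core0" above), which has no vnet set, so the first such access dereferences a NULL pointer. Without VIMAGE the V_ macros collapse to ordinary globals and curvnet is never consulted, which is consistent with the panic only appearing once VIMAGE is added. The sketch below is a small userland model of that lookup under stated assumptions: all model_* names are hypothetical stand-ins, the per-vnet data layout is simplified, and where the kernel would fault the model returns -1 so the failure can be observed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a vnet: only the base of its private data area. */
struct model_vnet {
	char *vnet_data_base;
};

/* Simplified layout of the per-vnet data area (one illustrative variable). */
struct model_vnet_data {
	int ipforwarding;
};

/*
 * Stand-in for curvnet.  In the kernel it is per-thread (td_vnet) and is
 * NULL in an interrupt thread such as "irq271: mlx4_core0" unless set.
 */
static struct model_vnet *model_curvnet = NULL;

/* Resolve a per-vnet int by offset, the way a V_ variable access would. */
static int *
model_vnet_int(const struct model_vnet *vnet, size_t offset)
{
	assert(vnet != NULL);	/* callers must already have a vnet set */
	return (int *)(vnet->vnet_data_base + offset);
}

/*
 * Stand-in for a protocol handler such as ip_input() reading a V_
 * variable through the current vnet.  The kernel dereferences curvnet
 * unconditionally and page-faults when it is NULL; the model reports -1.
 */
static int
model_ip_input(void)
{
	if (model_curvnet == NULL)
		return -1;	/* kernel: page fault at this point */
	return *model_vnet_int(model_curvnet,
	    offsetof(struct model_vnet_data, ipforwarding));
}
```

With model_curvnet left NULL the handler returns -1; pointing it at a vnet whose data area holds ipforwarding = 1 makes it return 1.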

Kernel configuration used:

---

#
# GENERIC -- Generic kernel configuration file for FreeBSD/amd64
#
# For more information on this file, please read the config(5) manual page,
# and/or the handbook section on Kernel Configuration Files:
#
#    http://www.FreeBSD.org/doc/en_US.ISO8859-1/books/handbook/kernelconfig-config.html
#
# The handbook is also available locally in /usr/share/doc/handbook
# if you've installed the doc distribution, otherwise always see the
# FreeBSD World Wide Web server (http://www.FreeBSD.org/) for the
# latest information.
#
# An exhaustive list of options and more detailed explanations of the
# device lines is also present in the ../../conf/NOTES and NOTES files.
# If you are in doubt as to the purpose or necessity of a line, check first
# in NOTES.
#
# $FreeBSD: stable/10/sys/amd64/conf/GENERIC 286132 2015-07-31 15:25:07Z gjb $

cpu		HAMMER
ident		CONNECTX2

makeoptions	DEBUG=-g		# Build kernel with gdb(1) debug symbols
makeoptions	WITH_CTF=1		# Run ctfconvert(1) for DTrace support

#####################################################################
# NETWORKING OPTIONS

#
# DEVICE_POLLING adds support for mixed interrupt-polling handling
# of network device drivers, which has significant benefits in terms
# of robustness to overloads and responsivity, as well as permitting
# accurate scheduling of the CPU time between kernel network processing
# and other activities.  The drawback is a moderate (up to 1/HZ seconds)
# potential increase in response times.
# It is strongly recommended to use HZ=1000 or 2000 with DEVICE_POLLING
# to achieve smoother behaviour.
# Additionally, you can enable/disable polling at runtime with help of
# the ifconfig(8) utility, and select the CPU fraction reserved to
# userland with the sysctl variable kern.polling.user_frac
# (default 50, range 0..100).
#
# Not all device drivers support this mode of operation at the time of
# this writing.  See polling(4) for more details.

options         DEVICE_POLLING

# BPF_JITTER adds support for BPF just-in-time compiler.

options         BPF_JITTER

# OpenFabrics Enterprise Distribution (Infiniband).
options         OFED
options         OFED_DEBUG_INIT

# Sockets Direct Protocol
options         SDP
options         SDP_DEBUG
 
# IP over Infiniband
options         IPOIB
options         IPOIB_DEBUG
options         IPOIB_CM
#####################################################################

options 	SCHED_ULE		# ULE scheduler
options 	PREEMPTION		# Enable kernel thread preemption
options 	INET			# InterNETworking
options 	INET6			# IPv6 communications protocols
options 	TCP_OFFLOAD		# TCP offload
options 	SCTP			# Stream Control Transmission Protocol
options 	FFS			# Berkeley Fast Filesystem
options 	SOFTUPDATES		# Enable FFS soft updates support
options 	UFS_ACL			# Support for access control lists
options 	UFS_DIRHASH		# Improve performance on big directories
options 	UFS_GJOURNAL		# Enable gjournal-based UFS journaling
options 	QUOTA			# Enable disk quotas for UFS
options 	MD_ROOT			# MD is a potential root device
options 	NFSCL			# New Network Filesystem Client
options 	NFSD			# New Network Filesystem Server
options 	NFSLOCKD		# Network Lock Manager
options 	NFS_ROOT		# NFS usable as /, requires NFSCL
options 	MSDOSFS			# MSDOS Filesystem
options 	CD9660			# ISO 9660 Filesystem
options 	PROCFS			# Process filesystem (requires PSEUDOFS)
options 	PSEUDOFS		# Pseudo-filesystem framework
options 	GEOM_PART_GPT		# GUID Partition Tables.
options 	GEOM_RAID		# Soft RAID functionality.
options 	GEOM_LABEL		# Provides labelization
options 	COMPAT_FREEBSD32	# Compatible with i386 binaries
options 	COMPAT_FREEBSD4		# Compatible with FreeBSD4
options 	COMPAT_FREEBSD5		# Compatible with FreeBSD5
options 	COMPAT_FREEBSD6		# Compatible with FreeBSD6
options 	COMPAT_FREEBSD7		# Compatible with FreeBSD7
options 	SCSI_DELAY=5000		# Delay (in ms) before probing SCSI
options 	KTRACE			# ktrace(1) support
options 	STACK			# stack(9) support
options 	SYSVSHM			# SYSV-style shared memory
options 	SYSVMSG			# SYSV-style message queues
options 	SYSVSEM			# SYSV-style semaphores
options 	_KPOSIX_PRIORITY_SCHEDULING # POSIX P1003_1B real-time extensions
options 	PRINTF_BUFR_SIZE=128	# Prevent printf output being interspersed.
options 	KBD_INSTALL_CDEV	# install a CDEV entry in /dev
options 	HWPMC_HOOKS		# Necessary kernel hooks for hwpmc(4)
options 	AUDIT			# Security event auditing
options 	CAPABILITY_MODE		# Capsicum capability mode
options 	CAPABILITIES		# Capsicum capabilities
options 	PROCDESC		# Support for process descriptors
options 	MAC			# TrustedBSD MAC Framework
options 	KDTRACE_FRAME		# Ensure frames are compiled in
options 	KDTRACE_HOOKS		# Kernel DTrace hooks
options 	DDB_CTF			# Kernel ELF linker loads CTF data
options 	INCLUDE_CONFIG_FILE	# Include this file in kernel
options 	RACCT			# Resource accounting framework
options 	RACCT_DEFAULT_TO_DISABLED # Set kern.racct.enable=0 by default
options 	RCTL			# Resource limits

# Debugging support.  Always need this:
options 	KDB			# Enable kernel debugger support.
options 	KDB_TRACE		# Print a stack trace for a panic.

# Make an SMP-capable kernel by default
options 	SMP			# Symmetric MultiProcessor Kernel

# CPU frequency control
device		cpufreq

# Bus support.
device		acpi
options 	ACPI_DMAR
device		pci

# Floppy drives
#device		fdc

# ATA controllers
device		ahci			# AHCI-compatible SATA controllers
device		ata			# Legacy ATA/SATA controllers
options 	ATA_STATIC_ID		# Static device numbering
device		mvs			# Marvell 88SX50XX/88SX60XX/88SX70XX/SoC SATA
device		siis			# SiliconImage SiI3124/SiI3132/SiI3531 SATA

# SCSI Controllers
device		ahc			# AHA2940 and onboard AIC7xxx devices
options 	AHC_REG_PRETTY_PRINT	# Print register bitfields in debug
					# output.  Adds ~128k to driver.
device		ahd			# AHA39320/29320 and onboard AIC79xx devices
options 	AHD_REG_PRETTY_PRINT	# Print register bitfields in debug
					# output.  Adds ~215k to driver.
device		esp			# AMD Am53C974 (Tekram DC-390(T))
device		hptiop			# Highpoint RocketRaid 3xxx series
device		isp			# Qlogic family
device		ispfw			# Firmware for QLogic HBAs- normally a module
device		mpt			# LSI-Logic MPT-Fusion
device		mps			# LSI-Logic MPT-Fusion 2
device		mpr			# LSI-Logic MPT-Fusion 3
device		ncr			# NCR/Symbios Logic
device		sym			# NCR/Symbios Logic (newer chipsets + those of `ncr')
device		trm			# Tekram DC395U/UW/F DC315U adapters

device		adv			# Advansys SCSI adapters
device		adw			# Advansys wide SCSI adapters
device		aic			# Adaptec 15[012]x SCSI adapters, AIC-6[23]60.
device		bt			# Buslogic/Mylex MultiMaster SCSI adapters
device		isci			# Intel C600 SAS controller

# ATA/SCSI peripherals
device		scbus			# SCSI bus (required for ATA/SCSI)
device		ch			# SCSI media changers
device		da			# Direct Access (disks)
device		sa			# Sequential Access (tape etc)
device		cd			# CD
device		pass			# Passthrough device (direct ATA/SCSI access)
device		ses			# Enclosure Services (SES and SAF-TE)
#device		ctl			# CAM Target Layer

# RAID controllers interfaced to the SCSI subsystem
device		amr			# AMI MegaRAID
device		arcmsr			# Areca SATA II RAID
#XXX it is not 64-bit clean, -scottl
#device		asr			# DPT SmartRAID V, VI and Adaptec SCSI RAID
device		ciss			# Compaq Smart RAID 5*
device		dpt			# DPT Smartcache III, IV - See NOTES for options
device		hptmv			# Highpoint RocketRAID 182x
device		hptnr			# Highpoint DC7280, R750
device		hptrr			# Highpoint RocketRAID 17xx, 22xx, 23xx, 25xx
device		hpt27xx			# Highpoint RocketRAID 27xx
device		iir			# Intel Integrated RAID
device		ips			# IBM (Adaptec) ServeRAID
device		mly			# Mylex AcceleRAID/eXtremeRAID
device		twa			# 3ware 9000 series PATA/SATA RAID
device		tws			# LSI 3ware 9750 SATA+SAS 6Gb/s RAID controller

# RAID controllers
#device		aac			# Adaptec FSA RAID
#device		aacp			# SCSI passthrough for aac (requires CAM)
#device		aacraid			# Adaptec by PMC RAID
#device		ida			# Compaq Smart RAID
#device		mfi			# LSI MegaRAID SAS
#device		mlx			# Mylex DAC960 family
#device		mrsas			# LSI/Avago MegaRAID SAS/SATA, 6Gb/s and 12Gb/s
#XXX PCI ID conflicts with ahd(4) and mvs(4)
#device		pmspcv			# PMC-Sierra SAS/SATA Controller driver
#XXX pointer/int warnings
#device		pst			# Promise Supertrak SX6000
#device		twe			# 3ware ATA RAID

# NVM Express (NVMe) support
device		nvme			# base NVMe driver
device		nvd			# expose NVMe namespaces as disks, depends on nvme

# atkbdc0 controls both the keyboard and the PS/2 mouse
device		atkbdc			# AT keyboard controller
device		atkbd			# AT keyboard
device		psm			# PS/2 mouse

device		kbdmux			# keyboard multiplexer

device		vga			# VGA video card driver
options 	VESA			# Add support for VESA BIOS Extensions (VBE)

device		splash			# Splash screen and screen saver support

# syscons is the default console driver, resembling an SCO console
device		sc
options 	SC_PIXEL_MODE		# add support for the raster text mode

# vt is the new video console driver
device		vt
device		vt_vga
device		vt_efifb

device		agp			# support several AGP chipsets

# PCCARD (PCMCIA) support
# PCMCIA and cardbus bridge support
device		cbb			# cardbus (yenta) bridge
device		pccard			# PC Card (16-bit) bus
device		cardbus			# CardBus (32-bit) bus

# Serial (COM) ports
device		uart			# Generic UART driver

# Parallel port
device		ppc
device		ppbus			# Parallel port bus (required)
device		lpt			# Printer
device		ppi			# Parallel port interface device
device		vpo			# Requires scbus and da

device		puc			# Multi I/O cards and multi-channel UARTs

# PCI Ethernet NICs.
#device		bxe			# Broadcom NetXtreme II BCM5771X/BCM578XX 10GbE
#device		de			# DEC/Intel DC21x4x (``Tulip'')
device		em			# Intel PRO/1000 Gigabit Ethernet Family
#device		igb			# Intel PRO/1000 PCIE Server Gigabit Family
#device		ix			# Intel PRO/10GbE PCIE PF Ethernet
#device		ixv			# Intel PRO/10GbE PCIE VF Ethernet
#device		ixl			# Intel XL710 40Gbe PCIE Ethernet
#device		ixlv			# Intel XL710 40Gbe VF PCIE Ethernet
device          mlx4ib          # Mellanox ConnectX HCA InfiniBand
device          mlxen           # Mellanox ConnectX HCA Ethernet
device          mthca           # Mellanox HCA InfiniBand
#device		le			# AMD Am7900 LANCE and Am79C9xx PCnet
#device		ti			# Alteon Networks Tigon I/II gigabit Ethernet
#device		txp			# 3Com 3cR990 (``Typhoon'')
#device		vx			# 3Com 3c590, 3c595 (``Vortex'')

# PCI Ethernet NICs that use the common MII bus controller code.
# NOTE: Be sure to keep the 'device miibus' line in order to use these NICs!
device		miibus			# MII bus support
#device		ae			# Attansic/Atheros L2 FastEthernet
#device		age			# Attansic/Atheros L1 Gigabit Ethernet
#device		alc			# Atheros AR8131/AR8132 Ethernet
#device		ale			# Atheros AR8121/AR8113/AR8114 Ethernet
#device		bce			# Broadcom BCM5706/BCM5708 Gigabit Ethernet
#device		bfe			# Broadcom BCM440x 10/100 Ethernet
#device		bge			# Broadcom BCM570xx Gigabit Ethernet
#device		cas			# Sun Cassini/Cassini+ and NS DP83065 Saturn
#device		dc			# DEC/Intel 21143 and various workalikes
#device		et			# Agere ET1310 10/100/Gigabit Ethernet
#device		fxp			# Intel EtherExpress PRO/100B (82557, 82558)
#device		gem			# Sun GEM/Sun ERI/Apple GMAC
#device		hme			# Sun HME (Happy Meal Ethernet)
#device		jme			# JMicron JMC250 Gigabit/JMC260 Fast Ethernet
#device		lge			# Level 1 LXT1001 gigabit Ethernet
#device		msk			# Marvell/SysKonnect Yukon II Gigabit Ethernet
#device		nfe			# nVidia nForce MCP on-board Ethernet
#device		nge			# NatSemi DP83820 gigabit Ethernet
#device		nve			# nVidia nForce MCP on-board Ethernet Networking
#device		pcn			# AMD Am79C97x PCI 10/100 (precedence over 'le')
device		re			# RealTek 8139C+/8169/8169S/8110S
#device		rl			# RealTek 8129/8139
#device		sf			# Adaptec AIC-6915 (``Starfire'')
#device		sge			# Silicon Integrated Systems SiS190/191
#device		sis			# Silicon Integrated Systems SiS 900/SiS 7016
#device		sk			# SysKonnect SK-984x & SK-982x gigabit Ethernet
#device		ste			# Sundance ST201 (D-Link DFE-550TX)
#device		stge			# Sundance/Tamarack TC9021 gigabit Ethernet
#device		tl			# Texas Instruments ThunderLAN
#device		tx			# SMC EtherPower II (83c170 ``EPIC'')
#device		vge			# VIA VT612x gigabit Ethernet
#device		vr			# VIA Rhine, Rhine II
#device		wb			# Winbond W89C840F
#device		xl			# 3Com 3c90x (``Boomerang'', ``Cyclone'')

# ISA Ethernet NICs.  pccard NICs included.
#device		cs			# Crystal Semiconductor CS89x0 NIC
# 'device ed' requires 'device miibus'
#device		ed			# NE[12]000, SMC Ultra, 3c503, DS8390 cards
#device		ex			# Intel EtherExpress Pro/10 and Pro/10+
#device		ep			# Etherlink III based cards
#device		fe			# Fujitsu MB8696x based cards
#device		sn			# SMC's 9000 series of Ethernet chips
#device		xe			# Xircom pccard Ethernet

# Wireless NIC cards
#device		wlan			# 802.11 support
#options 	IEEE80211_DEBUG		# enable debug msgs
#options 	IEEE80211_AMPDU_AGE	# age frames in AMPDU reorder q's
#options 	IEEE80211_SUPPORT_MESH	# enable 802.11s draft support
#device		wlan_wep		# 802.11 WEP support
#device		wlan_ccmp		# 802.11 CCMP support
#device		wlan_tkip		# 802.11 TKIP support
#device		wlan_amrr		# AMRR transmit rate control algorithm
#device		an			# Aironet 4500/4800 802.11 wireless NICs.
#device		ath			# Atheros NICs
#device		ath_pci			# Atheros pci/cardbus glue
#device		ath_hal			# pci/cardbus chip support
#options 	AH_SUPPORT_AR5416	# enable AR5416 tx/rx descriptors
#options 	AH_AR5416_INTERRUPT_MITIGATION # AR5416 interrupt mitigation
#options 	ATH_ENABLE_11N		# Enable 802.11n support for AR5416 and later
#device		ath_rate_sample		# SampleRate tx rate control for ath
#device		bwi			# Broadcom BCM430x/BCM431x wireless NICs.
#device		bwn			# Broadcom BCM43xx wireless NICs.
#device		ipw			# Intel 2100 wireless NICs.
#device		iwi			# Intel 2200BG/2225BG/2915ABG wireless NICs.
#device		iwn			# Intel 4965/1000/5000/6000 wireless NICs.
#device		malo			# Marvell Libertas wireless NICs.
#device		mwl			# Marvell 88W8363 802.11n wireless NICs.
#device		ral			# Ralink Technology RT2500 wireless NICs.
#device		wi			# WaveLAN/Intersil/Symbol 802.11 wireless NICs.
#device		wpi			# Intel 3945ABG wireless NICs.

# Pseudo devices.
device		loop			# Network loopback
device		random			# Entropy device
device		padlock_rng		# VIA Padlock RNG
device		rdrand_rng		# Intel Bull Mountain RNG
device		ether			# Ethernet support
device		vlan			# 802.1Q VLAN support
device		tun			# Packet tunnel.
device		md			# Memory "disks"
device		gif			# IPv6 and IPv4 tunneling
device		faith			# IPv6-to-IPv4 relaying (translation)
device		firmware		# firmware assist module

# The `bpf' device enables the Berkeley Packet Filter.
# Be aware of the administrative consequences of enabling this!
# Note that 'bpf' is required for DHCP.
device		bpf			# Berkeley packet filter

# USB support
options 	USB_DEBUG		# enable debug msgs
device		uhci			# UHCI PCI->USB interface
device		ohci			# OHCI PCI->USB interface
device		ehci			# EHCI PCI->USB interface (USB 2.0)
device		xhci			# XHCI PCI->USB interface (USB 3.0)
device		usb			# USB Bus (required)
device		ukbd			# Keyboard
device		umass			# Disks/Mass storage - Requires scbus and da

# Sound support
device		sound			# Generic sound driver (required)
#device		snd_cmi			# CMedia CMI8338/CMI8738
#device		snd_csa			# Crystal Semiconductor CS461x/428x
#device		snd_emu10kx		# Creative SoundBlaster Live! and Audigy
#device		snd_es137x		# Ensoniq AudioPCI ES137x
device		snd_hda			# Intel High Definition Audio
device		snd_ich			# Intel, NVidia and other ICH AC'97 Audio
device		snd_via8233		# VIA VT8233x Audio

# MMC/SD
device		mmc			# MMC/SD bus
device		mmcsd			# MMC/SD memory card
device		sdhci			# Generic PCI SD Host Controller

# VirtIO support
device		virtio			# Generic VirtIO bus (required)
device		virtio_pci		# VirtIO PCI device
device		vtnet			# VirtIO Ethernet device
device		virtio_blk		# VirtIO Block device
device		virtio_scsi		# VirtIO SCSI device
device		virtio_balloon		# VirtIO Memory Balloon device

# HyperV drivers and enchancement support
# NOTE: HYPERV depends on hyperv.  They must be added or removed together.
options 	HYPERV			# Hyper-V kernel infrastructure
device		hyperv			# HyperV drivers 

# Xen HVM Guest Optimizations
# NOTE: XENHVM depends on xenpci.  They must be added or removed together.
options 	XENHVM			# Xen HVM kernel infrastructure
device		xenpci			# Xen HVM Hypervisor services driver

# VMware support
device		vmx			# VMware VMXNET3 Ethernet

# 2016-04-21 JC Added VIMAGE just to verify it's the crash causer
options		VIMAGE

---
Comment 1 Justin Clift 2016-04-21 15:06:10 UTC
Oops, forgot to include the trigger.  It's 100% reproducible, and very easy.

All that's needed is an interface (e.g. ib0) in IPoIB mode with an IP address assigned (e.g. 10.10.100.1).

When that interface is pinged from another host (e.g. "ping 10.10.100.1"), the
kernel panics immediately.  Every time. ;)
Comment 2 Justin Clift 2016-04-21 17:48:37 UTC
Hmmm, ddd seems to be showing more/better information about the crash.

Attaching screenshots, as they seem to show the exact code at the crash point. ;)
Comment 3 Justin Clift 2016-04-21 17:49:12 UTC
Created attachment 169532 [details]
ipoib_ib_completion()
Comment 4 Justin Clift 2016-04-21 17:50:49 UTC
Created attachment 169533 [details]
ipoib_cm_handle_rx_wc()
Comment 5 Justin Clift 2016-04-21 17:51:51 UTC
Created attachment 169534 [details]
netisr_dispatch_src()
Comment 6 Justin Clift 2016-04-21 17:52:23 UTC
Created attachment 169535 [details]
ip_input()
Comment 7 Justin Clift 2016-04-21 17:52:45 UTC
Created attachment 169536 [details]
calltrap()
Comment 8 Hans Petter Selasky freebsd_committer 2016-04-21 18:03:19 UTC
In this context, can you type:

print *dev

Or:

print dev->if_vnet


(kgdb) list *0xffffffff808f89fa
0xffffffff808f89fa is in ipoib_cm_handle_rx_wc (/usr/src/sys/ofed/drivers/infiniband/ulp/ipoib/ipoib_cm.c:565).
560		mb->m_pkthdr.rcvif = dev;
561		proto = *mtod(mb, uint16_t *);
562		m_adj(mb, IPOIB_ENCAP_LEN);
563	
564		IPOIB_MTAP_PROTO(dev, mb, proto);
565		ipoib_demux(dev, mb, ntohs(proto));
566	
567	repost:

--HPS
Comment 9 Justin Clift 2016-04-21 18:13:10 UTC
Created attachment 169537 [details]
ipoib_cm_handle_rx_wc() - print *dev
Comment 10 Justin Clift 2016-04-21 18:14:41 UTC
Ahhh, cut-n-paste works too.  That'll be easier. :)

(gdb) print dev->if_vnet
$3 = (struct vnet *) 0xfffff8000318aa80
(gdb)
Comment 11 Hans Petter Selasky freebsd_committer 2016-04-21 18:22:22 UTC
Created attachment 169538 [details]
VIMAGE + ipoib fix

Can you try this patch?
Comment 12 Justin Clift 2016-04-21 18:42:13 UTC
Awesome Hans, that seems to have fixed it. :)

Instead of the kernel panic, the console is now showing:

ib0: REQ arrived
ib0: REP received.
ib0 cm rep handler
[etc]
Comment 13 Hans Petter Selasky freebsd_committer 2016-04-21 18:45:06 UTC
Hi,

Can you test it a bit?  Then I'll get it upstream, and into 10-stable in a week or so.

--HPS
Comment 14 Justin Clift 2016-04-21 18:48:01 UTC
Yep, no problem.  I should be able to spin an initial IB-enabled image of FreeNAS 9.10 in a few hours, then (hopefully) hammer on that for a bit.  Assuming nothing blows up. :D

(For anyone following along that also wants to try stuff out, look here: https://github.com/justinclift/freenas-infiniband)
Comment 15 commit-hook freebsd_committer 2016-04-22 06:33:41 UTC
A commit references this bug:

Author: hselasky
Date: Fri Apr 22 06:33:06 UTC 2016
New revision: 298458
URL: https://svnweb.freebsd.org/changeset/base/298458

Log:
  Add missing set of the current VNET when inputting IP packets in IPoIB.

  This fixes a kernel panic when using IPoIB with VIMAGE and infiniband.

  PR:		208957
  Sponsored by:	Mellanox Technologies
  Tested by:	Justin Clift <justin@postgresql.org>
  MFC after:	1 week

Changes:
  head/sys/ofed/drivers/infiniband/ulp/ipoib/ipoib_cm.c
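The fix in r298458 is described as adding the missing set of the current VNET on the IPoIB input path. FreeBSD's usual idiom for this is the CURVNET_SET()/CURVNET_RESTORE() macro pair from net/vnet.h: save the thread's current vnet, switch to the receiving interface's if_vnet (the pointer Comment 10 shows is valid here), run the stack, then restore the saved value. The attached patch is not reproduced in this report, so the following is only a userland model of that pattern, not the committed change. The model_* names are hypothetical, and in the real driver the hand-off being wrapped is presumably the ipoib_demux() call at ipoib_cm.c:565.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct vnet and struct ifnet. */
struct model_vnet {
	int id;
};

struct model_ifnet {
	struct model_vnet *if_vnet;	/* vnet the interface belongs to */
};

/* Stand-in for curvnet (per-thread td_vnet in the kernel). */
static struct model_vnet *model_curvnet = NULL;

/* Like CURVNET_SET(arg): remember the old vnet, switch to the new one. */
#define MODEL_CURVNET_SET(arg)					\
	struct model_vnet *saved_vnet = model_curvnet;		\
	model_curvnet = (arg)

/* Like CURVNET_RESTORE(): put the previously saved vnet back. */
#define MODEL_CURVNET_RESTORE()					\
	model_curvnet = saved_vnet

/* Records which vnet the stand-in protocol handler ran in. */
static struct model_vnet *model_seen_vnet = NULL;

/* Stand-in for ip_input(): requires a current vnet. */
static void
model_ip_input(void)
{
	assert(model_curvnet != NULL);	/* kernel would page-fault instead */
	model_seen_vnet = model_curvnet;
}

/* Receive path with the fix applied: enter the interface's vnet first. */
static void
model_ipoib_rx(struct model_ifnet *dev)
{
	MODEL_CURVNET_SET(dev->if_vnet);
	model_ip_input();
	MODEL_CURVNET_RESTORE();
}
```

Restoring the saved pointer, rather than simply clearing it, keeps the pattern correct when the caller was already running inside some other vnet.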
Comment 16 commit-hook freebsd_committer 2016-04-29 11:30:51 UTC
A commit references this bug:

Author: hselasky
Date: Fri Apr 29 11:29:53 UTC 2016
New revision: 298779
URL: https://svnweb.freebsd.org/changeset/base/298779

Log:
  MFC r298458:
  Add missing set of the current VNET when inputting IP packets in IPoIB.

  This fixes a kernel panic when using IPoIB with VIMAGE and infiniband.

  PR:		208957
  Sponsored by:	Mellanox Technologies
  Tested by:	Justin Clift <justin@postgresql.org>

Changes:
_U  stable/10/
  stable/10/sys/ofed/drivers/infiniband/ulp/ipoib/ipoib_cm.c
Comment 17 commit-hook freebsd_committer 2016-04-29 11:31:53 UTC
A commit references this bug:

Author: hselasky
Date: Fri Apr 29 11:31:28 UTC 2016
New revision: 298780
URL: https://svnweb.freebsd.org/changeset/base/298780

Log:
  MFC r298458:
  Add missing set of the current VNET when inputting IP packets in IPoIB.

  This fixes a kernel panic when using IPoIB with VIMAGE and infiniband.

  PR:		208957
  Sponsored by:	Mellanox Technologies
  Tested by:	Justin Clift <justin@postgresql.org>

Changes:
_U  stable/9/sys/
  stable/9/sys/ofed/drivers/infiniband/ulp/ipoib/ipoib_cm.c