Bug 191468 - [vimage] options VIMAGE + Infiniband - kernel panic, crashes during system boot
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 9.2-RELEASE
Hardware: Any Any
Importance: --- Affects Some People
Assignee: freebsd-net (Nobody)
URL: http://forums.nas4free.org/viewtopic....
Keywords:
Depends on:
Blocks:
 
Reported: 2014-06-28 22:41 UTC by dreamcat4
Modified: 2015-01-06 08:03 UTC
CC List: 4 users

See Also:


Attachments
(Photo) Kernel Panic (414.65 KB, image/jpeg)
2014-06-28 22:41 UTC, dreamcat4
Kernel config (2.28 KB, text/plain)
2014-06-28 22:43 UTC, dreamcat4

Description dreamcat4 2014-06-28 22:41:15 UTC
Created attachment 144234
(Photo) Kernel Panic

Hello.
This week we hit a bad combination of kernel options on FreeBSD 9.2: the kernel panics during system boot.

Added:

+options VIMAGE
+device epair

Boot gets at least this far in DMESG:

http://i.imgur.com/f954gLW.jpg

Then the panic occurs here:

http://i.imgur.com/6nx6AiX.jpg

If the VIMAGE and epair entries are subsequently removed from the kernel config file, the rebuilt kernel boots fine.

This forum thread mentions some of the suspected conflicts:

http://forums.nas4free.org/viewtopic.php?f=56&t=5917

More generally, given VIMAGE's past history, we may also suspect:

PF, IPFW, CARP, ALTQ, LAGG, IPFILTER or IPNAT.

Kernel Config file to be attached soon.
Comment 1 dreamcat4 2014-06-28 22:43:09 UTC
Created attachment 144235
Kernel config
Comment 2 dreamcat4 2014-06-29 19:56:10 UTC
This crash occurred on:

FreeBSD 9.2 p9
svn revision r268004

Kernel config 

include         GENERIC
ident		NAS4FREE-x64

#####################################################################
# NAS4FREE
#####################################################################
device		speaker

# for ZFS tuning
options		VM_KMEM_SIZE_SCALE=2

# Networking
#options		DEVICE_POLLING
#options		HZ=1000
options		VIMAGE		# Vnet jails
device			epair		# Vnet jails

# GEOM classes
options 	GEOM_ELI		# Disk encryption.
options 	GEOM_UZIP		# Read-only compressed disks

device		lagg			# Link aggregation interface.
#device	vlan			# 802.1Q VLAN support
device		if_bridge		# Bridge interface.

# ATA/SCSI peripherals
device		ctl		# CAM Target Layer

# 10GbE adapters
device		cxgb		# Chelsio T3 10 Gigabit Ethernet
device		cxgb_t3fw	# Chelsio T3 10 Gigabit Ethernet firmware
device		cxgbe		# Chelsio T4 10GbE PCIe adapter
device		ixgb		# Intel Pro/10Gbe PCI-X Ethernet
device		mxge		# Myricom Myri-10G 10GbE NIC
device		nxge		# Neterion Xframe 10GbE Server/Storage Adapter
device		qlxgb		# QLogic 3200 and 8200 10GbE/CNA Adapter
device		vxge		# Exar/Neterion XFrame 3100 10GbE
device		oce		# Emulex 10Gbe Ethernet
device		sfxge		# Solarflare 10Gb Ethernet Adapters

# InfiniBand support
options	OFED		# InfiniBand support
options	SDP		# SDP protocol
options	IPOIB		# IPoIB
options	IPOIB_CM	# IPoIB connected mode

# InfiniBand Adapters
device		mlx4ib
device		mlxen
device		mthca

# Hardware crypto acceleration
device		crypto		# core crypto support
device		cryptodev	# /dev/crypto for access to h/w

# Temperature sensors:
#
# coretemp: on-die sensor on Intel Core and newer CPUs
#
device		coretemp
# amdtemp: on-die digital thermal sensor for AMD K8, K10 and K11
device		amdtemp
# cpuctl: cpuctl pseudo device
device		cpuctl

# IP firewall
options		IPFIREWALL
options		IPFIREWALL_VERBOSE
options		IPFIREWALL_VERBOSE_LIMIT=5
options		IPFIREWALL_DEFAULT_TO_ACCEPT

# Disk quotas are supported when this option is enabled.
#options 	QUOTA			#enable disk quotas

# use module
#nooption 	NFSCLIENT		# Network File System client
#nooption 	NFSSERVER		# Network File System server
#nooption 	NFSLOCKD		# Network Lock Manager
#nooption 	NFS_ROOT		# NFS usable as /, requires NFSCL
#nooption 	NFSCL			# New Network Filesystem Client
#nooption 	NFSD			# New Network Filesystem Server
#nodevice	xhci			# XHCI PCI->USB interface (USB 3.0)
Comment 3 Kurt Jaeger freebsd_committer freebsd_triage 2014-07-12 06:17:35 UTC
Can you test if this still happens with the upcoming 9.3-release ?
Comment 4 dreamcat4 2014-07-12 08:15:08 UTC
(In reply to Kurt Jaeger from comment #3)
> Can you test if this still happens with the upcoming 9.3-release ?

Sorry. Unfortunately I don't happen to have any suitable FreeBSD-GENERIC build lying around. Best I can do is put it on my list (sorry, very busy), and try to get round to it eventually. Maybe in a few weeks time.

In an ideal circumstance, we would:

* Rebuild for this kernel config in a 9.2-GENERIC VM.
* Reproduce panic. [Y/N]
* Then freebsd-update to 9.3 RELEASE (since 9.3 will be released by that time). 
* Rebuild for this kernel config in a 9.3-GENERIC VM.
* Reproduce panic. [Y/N]
* Then freebsd-update to 10.0 RELEASE. 
* Rebuild for this kernel config in a 10.0-GENERIC VM.
* Reproduce panic. [Y/N]

However, the situation is complicated by the fact that we don't know exactly which other modules VIMAGE is conflicting with. So in reality, we would also need to:

* Rebuild for this kernel config in a 9.2-GENERIC VM.
* Reproduce panic. [Y/N]
* <edit kernel config>
 * add or remove a suspected module
* Reproduce panic. [Y/N]
* <edit kernel config>
 * add or remove a suspected module
* Reproduce panic. [Y/N]
* <edit kernel config>
 * add or remove a suspected module
Etc. etc.

Which takes a lot of time.
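
The add-or-remove loop above can be shortened by bisecting the list of suspects instead of toggling them one at a time. The following is only a sketch in C under strong assumptions: exactly one suspected entry conflicts with VIMAGE, a build panics whenever that entry is present, and any subset of entries can be built on its own (real kernel options have dependencies, so this may not hold). `bisect_culprit`, `panics_fn`, `fake_panics` and `fake_culprit` are hypothetical names made up for illustration. In practice the predicate would be "rebuild the kernel with these entries, boot it, and see whether it panics".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical predicate: returns 1 if a kernel built with VIMAGE plus
 * the given entries panics at boot, 0 otherwise. */
typedef int (*panics_fn)(const char *const *opts, size_t n);

/* Stand-in for a real rebuild-and-boot cycle. The culprit name is
 * arbitrary test data and is not a finding from this bug report. */
static const char *fake_culprit = "";

static int fake_panics(const char *const *opts, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(opts[i], fake_culprit) == 0)
            return 1;
    return 0;
}

/* Binary search for the single conflicting entry in opts[0..n).
 * Needs about log2(n) rebuilds instead of up to n. */
static size_t bisect_culprit(const char *const *opts, size_t n, panics_fn panics)
{
    assert(n > 0);
    size_t lo = 0, hi = n;              /* the culprit lies in [lo, hi) */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (panics(opts + lo, mid - lo))
            hi = mid;                   /* lower half alone still panics */
        else
            lo = mid;                   /* lower half is clean */
    }
    return lo;
}
```

With the seven InfiniBand entries from the kernel config as suspects, this would take about three rebuilds rather than up to seven, provided the single-culprit assumption holds.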
Comment 5 dreamcat4 2014-07-21 19:50:55 UTC
OK. Good news. We have some new information from jandegr, who has recently been investigating this same issue. Here are his comments:

In short: VIMAGE or InfiniBand alone on 9.3-RELEASE is no problem, but both together in the same config give a page fault at boot.
Having both in the same config on 10-stable is no problem.
It can be reproduced starting from a GENERIC config with the addition of the InfiniBand entries from the NAS4Free config.
I did not determine which of the InfiniBand entries causes it. I treated them as one block for the purpose of building a VIMAGE kernel, and I had already spent a lot of time finding the InfiniBand connection.


Forum Link:

http://www.forums.nas4free.org/viewtopic.php?f=69&t=5365&sid=f51d324fd807e82208b3cbf793d0d55d&start=100
Comment 6 commit-hook freebsd_committer freebsd_triage 2014-12-08 07:26:13 UTC
A commit references this bug:

Author: rodrigc
Date: Mon Dec  8 07:26:01 UTC 2014
New revision: 275599
URL: https://svnweb.freebsd.org/changeset/base/275599

Log:
  Use CURVNET macros inside inet_get_local_port_range() function.
  Without this fix, a kernel with VIMAGE + Infiniband will panic on bootup.

  Certain necessary #include statements require LIST_HEAD.
  Add these includes to ofed/include/linux/list.h, because
  LIST_HEAD is specifically overridden in this file.

  PR: 191468
  Differential Revision: D1279
  Reviewed by: hselasky

Changes:
  head/sys/ofed/include/linux/list.h
  head/sys/ofed/include/net/ip.h
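
To illustrate why the missing CURVNET macros could lead to a panic, here is a small userland model in C. It is not the committed change. It assumes (my reading of how VIMAGE works, not something stated in the commit log) that per-vnet variables are reached through a "current vnet" pointer, and that this pointer may be unset when the OFED code runs during boot. `struct vnet`, `vnet0`, `curvnet` as a plain global, `CURVNET_SET_MODEL`, `CURVNET_RESTORE_MODEL`, `get_range_unsafe` and `get_range_fixed` are all simplified stand-ins invented for this sketch, and the port numbers are placeholder values.

```c
#include <assert.h>
#include <stddef.h>

/* Userland model only: none of these definitions are the kernel's. */
struct vnet {
    int ipport_firstauto;   /* per-vnet lower bound of the local port range */
    int ipport_lastauto;    /* per-vnet upper bound of the local port range */
};

static struct vnet vnet0 = { 10000, 65535 };  /* placeholder values */
static struct vnet *curvnet = NULL;           /* no context set, as at early boot */

/* Simplified stand-ins for a set/restore macro pair. */
#define CURVNET_SET_MODEL(v) \
    struct vnet *saved_vnet = curvnet; curvnet = (v)
#define CURVNET_RESTORE_MODEL() \
    curvnet = saved_vnet

/* Pre-fix shape: reads per-vnet data without establishing a context.
 * Returns -1 where a real kernel would dereference an invalid pointer. */
static int get_range_unsafe(int *low, int *high)
{
    if (curvnet == NULL)
        return -1;
    *low = curvnet->ipport_firstauto;
    *high = curvnet->ipport_lastauto;
    return 0;
}

/* Post-fix shape: set a context, read the values, restore the old context. */
static void get_range_fixed(int *low, int *high)
{
    CURVNET_SET_MODEL(&vnet0);
    assert(curvnet != NULL);
    *low = curvnet->ipport_firstauto;
    *high = curvnet->ipport_lastauto;
    CURVNET_RESTORE_MODEL();
}
```

In this model the unsafe variant fails whenever no context has been set, while the fixed variant works regardless of the caller's state and leaves `curvnet` as it found it. That matches the general set-then-restore pattern the commit log describes.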
Comment 7 commit-hook freebsd_committer freebsd_triage 2015-01-06 08:00:25 UTC
A commit references this bug:

Author: rodrigc
Date: Tue Jan  6 07:59:51 UTC 2015
New revision: 276744
URL: https://svnweb.freebsd.org/changeset/base/276744

Log:
  Merge r275599:
  Use CURVNET macros inside inet_get_local_port_range() function.
  Without this fix, a kernel with VIMAGE + Infiniband will panic on bootup.

  Certain necessary #include statements require LIST_HEAD.
  Add these includes to ofed/include/linux/list.h, because
  LIST_HEAD is specifically overridden in this file.

  PR: 191468
  Differential Revision: D1279
  Reviewed by: hselasky

Changes:
_U  stable/10/
  stable/10/sys/ofed/include/linux/list.h
  stable/10/sys/ofed/include/net/ip.h
Comment 8 commit-hook freebsd_committer freebsd_triage 2015-01-06 08:03:28 UTC
A commit references this bug:

Author: rodrigc
Date: Tue Jan  6 08:03:03 UTC 2015
New revision: 276745
URL: https://svnweb.freebsd.org/changeset/base/276745

Log:
  Merge r276744:
  Use CURVNET macros inside inet_get_local_port_range() function.
  Without this fix, a kernel with VIMAGE + Infiniband will panic on bootup.

  Certain necessary #include statements require LIST_HEAD.
  Add these includes to ofed/include/linux/list.h, because
  LIST_HEAD is specifically overridden in this file.

  PR: 191468
  Differential Revision: D1279
  Reviewed by: hselasky

Changes:
_U  stable/9/sys/
  stable/9/sys/ofed/include/linux/list.h
  stable/9/sys/ofed/include/net/ip.h