Bug 229454 - System enabled for FC target mode using QLOGIC HBA panics with “fault code = supervisor write data, page not present - Fatal trap 12: page fault while in kernel mode”
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 11.1-RELEASE
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: freebsd-bugs (Nobody)
Reported: 2018-07-01 15:56 UTC by Setsquare
Modified: 2018-08-27 08:28 UTC

Description Setsquare 2018-07-01 15:56:45 UTC
Hi,

Pulling my hair out with this one.

System overview:

Intel(R) Xeon(R) CPU E3-1245 v6 @ 3.70GHz
32GB ECC RAM
Quad port QLOGIC PCIe adaptor listed as “Qlogic ISP 2432 PCI FC-AL Adapter” (1 of the ports is not connected, but 3 are)
No fibre channel switches are used; the FC HBA on the FreeBSD storage server is connected directly to 3x ESXi hosts.
CTL backing devices: tried both physical block devices and ZVOLs on ZFS; the problem occurs with both.
The underlying disks are Samsung SSDs (EVO 840/850) and Crucial CT240M500SSD1.
The problem does not seem to occur with iSCSI using the same CTL backing devices; I have only seen the panic when using FC on the same server hardware.
3 LUNs were presented to the ESXi hosts.

Kernel compiled with the following in the config file:

include GENERIC
ident FCTARGETMODE

device          ispfw
options         ISP_TARGET_MODE
options         ISP_DEFAULT_ROLES=1

Problem description:

As an FC target, the FreeBSD storage system seems to function without issue (no panics, as far as I can tell) with a single ESXi host as the initiator. Once additional ESXi hosts are powered on and the system becomes more active (higher load, storage vMotions), the system panics with the output below. I have only ever seen the panics when multiple systems are accessing the target concurrently.

Under FreeBSD 10.4-RELEASE-p8 (seems to have more info in the crash textdump):

Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address	= 0x0
fault code		= supervisor write data, page not present
instruction pointer	= 0x20:0xffffffff8057a028
stack pointer	        = 0x28:0xfffffe07c69fb230
frame pointer	        = 0x28:0xfffffe07c69fb2b0
code segment		= base rx0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 12 (irq270: isp3:0)
trap number		= 12
panic: page fault
cpuid = 2
KDB: stack backtrace:
#0 0xffffffff809d0340 at kdb_backtrace+0x60
#1 0xffffffff80990c16 at vpanic+0x126
#2 0xffffffff80990ae3 at panic+0x43
#3 0xffffffff80db03cd at trap_fatal+0x35d
#4 0xffffffff80db06e8 at trap_pfault+0x308
#5 0xffffffff80dafd4a at trap+0x47a
#6 0xffffffff80d9551c at calltrap+0x8
#7 0xffffffff80578702 at isp_async+0x13b2
#8 0xffffffff8058c9af at isp_target_notify+0xf9f
#9 0xffffffff80570fd4 at isp_intr_atioq+0xa4
#10 0xffffffff805896bb at isp_pci_run_isr_2400+0xab
#11 0xffffffff8057ab23 at isp_platform_intr+0x53
#12 0xffffffff8095a0a9 at intr_event_execute_handlers+0xb9
#13 0xffffffff8095a516 at ithread_loop+0x96
#14 0xffffffff8095796a at fork_exit+0x9a
#15 0xffffffff80d95a5e at fork_trampoline+0xe
Uptime: 20h32m44s
Dumping 1447 out of 32633 MB:..2%..12%..21%..31%..41%..51%..61%..71%..81%..91%

Reading symbols from /boot/kernel/ctl.ko.symbols...done.
Loaded symbols for /boot/kernel/ctl.ko.symbols
Reading symbols from /boot/kernel/iscsi.ko.symbols...done.
Loaded symbols for /boot/kernel/iscsi.ko.symbols
#0  doadump (textdump=<value optimized out>) at pcpu.h:219
219	pcpu.h: No such file or directory.
	in pcpu.h
(kgdb) #0  doadump (textdump=<value optimized out>) at pcpu.h:219
#1  0xffffffff80990833 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:486
#2  0xffffffff80990c55 in vpanic (fmt=<value optimized out>, 
    ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:889
#3  0xffffffff80990ae3 in panic (fmt=0x0)
    at /usr/src/sys/kern/kern_shutdown.c:818
#4  0xffffffff80db03cd in trap_fatal (frame=<value optimized out>, 
    eva=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:858
#5  0xffffffff80db06e8 in trap_pfault (frame=0xfffffe07c69fb180, 
    usermode=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:681
#6  0xffffffff80dafd4a in trap (frame=0xfffffe07c69fb180)
    at /usr/src/sys/amd64/amd64/trap.c:447
#7  0xffffffff80d9551c in calltrap ()
    at /usr/src/sys/amd64/amd64/exception.S:238
#8  0xffffffff8057a028 in isp_handle_platform_atio7 (
    isp=<value optimized out>, aep=0xfffffe07c69fb5c0)
    at /usr/src/sys/dev/isp/isp_freebsd.c:1931
#9  0xffffffff80578702 in isp_async (isp=0xfffff80009723800, 
    cmd=<value optimized out>) at /usr/src/sys/dev/isp/isp_freebsd.c:3941
#10 0xffffffff8058c9af in isp_target_notify (isp=<value optimized out>, 
    vptr=<value optimized out>, optrp=<value optimized out>)
    at /usr/src/sys/dev/isp/isp_target.c:735
#11 0xffffffff80570fd4 in isp_intr_atioq (isp=0xfffff80009723800)
    at /usr/src/sys/dev/isp/isp.c:4943
#12 0xffffffff805896bb in isp_pci_run_isr_2400 (isp=0xfffff80009723800)
    at /usr/src/sys/dev/isp/isp_pci.c:1167
#13 0xffffffff8057ab23 in isp_platform_intr (arg=0xfffff80009723800)
    at /usr/src/sys/dev/isp/isp_freebsd.c:4155
#14 0xffffffff8095a0a9 in intr_event_execute_handlers (
    p=<value optimized out>, ie=0xfffff80009738700)
    at /usr/src/sys/kern/kern_intr.c:1264
#15 0xffffffff8095a516 in ithread_loop (arg=0xfffff80009750240)
    at /usr/src/sys/kern/kern_intr.c:1277
#16 0xffffffff8095796a in fork_exit (
    callout=0xffffffff8095a480 <ithread_loop>, arg=0xfffff80009750240, 
    frame=0xfffffe07c69fb9c0) at /usr/src/sys/kern/kern_fork.c:1032
#17 0xffffffff80d95a5e in fork_trampoline ()
    at /usr/src/sys/amd64/amd64/exception.S:613
#18 0x0000000000000000 in ?? ()
Current language:  auto; currently minimal
(kgdb)
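
For anyone triaging from the kgdb output above, the frame of interest is the one directly below calltrap, since that is the code that was executing when the trap fired. A minimal Python sketch of pulling that frame out of a pasted kgdb backtrace follows. The helper names (parse_kgdb_frames, faulting_frame) are made up for illustration, and the sample is only an excerpt of the trace above.

```python
import re

# Excerpt of the kgdb backtrace from the 10.4-RELEASE-p8 crash above.
KGDB_SAMPLE = """\
#6  0xffffffff80dafd4a in trap (frame=0xfffffe07c69fb180)
    at /usr/src/sys/amd64/amd64/trap.c:447
#7  0xffffffff80d9551c in calltrap ()
    at /usr/src/sys/amd64/amd64/exception.S:238
#8  0xffffffff8057a028 in isp_handle_platform_atio7 (
    isp=<value optimized out>, aep=0xfffffe07c69fb5c0)
    at /usr/src/sys/dev/isp/isp_freebsd.c:1931
#9  0xffffffff80578702 in isp_async (isp=0xfffff80009723800,
    cmd=<value optimized out>) at /usr/src/sys/dev/isp/isp_freebsd.c:3941
"""

# One frame after its continuation lines have been joined onto a single line.
FRAME_RE = re.compile(
    r"#(?P<num>\d+)\s+(?:0x[0-9a-f]+ in )?(?P<func>\w+) \(.*\)"
    r"\s+at (?P<file>\S+):(?P<line>\d+)"
)


def parse_kgdb_frames(text):
    """Return a list of (frame number, function, source file, line) tuples."""
    joined, current = [], None
    for line in text.splitlines():
        if line.startswith("#"):
            # A new frame starts; flush the previous one.
            if current is not None:
                joined.append(current)
            current = line.strip()
        elif current is not None:
            # Continuation line (arguments or "at file:line").
            current += " " + line.strip()
    if current is not None:
        joined.append(current)

    frames = []
    for raw in joined:
        m = FRAME_RE.match(raw)
        if m:
            frames.append((int(m["num"]), m["func"], m["file"], int(m["line"])))
    return frames


def faulting_frame(frames):
    """Return the frame listed right after calltrap, or None if absent."""
    for i, frame in enumerate(frames):
        if frame[1] == "calltrap" and i + 1 < len(frames):
            return frames[i + 1]
    return None


if __name__ == "__main__":
    print(faulting_frame(parse_kgdb_frames(KGDB_SAMPLE)))
```

On the excerpt this points at isp_handle_platform_atio7 in isp_freebsd.c line 1931, which matches the instruction pointer reported in the trap (0xffffffff8057a028).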

Under FreeBSD 11.1-RELEASE-p10:

Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address    = 0x0
fault code               = supervisor write data, page not present
instruction pointer      = 0x20:0xffffffff805b3da0
stack pointer            = 0x28:0xfffffe07c5d4e470
frame pointer            = 0x28:0xfffffe07c5d4e4e0
code segment             = base rx0, limit 0xfffff, type 0x1b
                         = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags         = interrupt enabled, resume, IOPL = 0
current process          = 12 (irq270: isp3:0)
trap number              = 12
panic: page fault
cpuid = 2
KDB: stack backtrace:
#0 0xffffffff80ab97b7 at kdb_backtrace+0x67
#1 0xffffffff80a77896 at vpanic+0x186
#2 0xffffffff80a77703 at panic+0x43
#3 0xffffffff80eef192 at trap_fatal+0x322
#4 0xffffffff80eef1eb at trap_pfault+0x4b
#5 0xffffffff80eee948 at trap+0x2a8
#6 0xffffffff80ecf950 at calltrap+0x8
#7 0xffffffff805b26bb at isp_async+0x156b
#8 0xffffffff805c7ebe at isp_target_notify+0x186e
#9 0xffffffff805aa733 at isp_intr_atioq+0xa3
#10 0xffffffff805c61f4 at isp_pci_run_isr_2400+0x84
#11 0xffffffff805b4a01 at isp_platform_intr+0x41
#12 0xffffffff80a3dedc at intr_event_execute_handlers+0xec
#13 0xffffffff80a3e1c6 at ithread_loop+0xd6
#14 0xffffffff80a3b535 at fork_exit+0x85
#15 0xffffffff80ed081e at fork_trampoline+0xe
Uptime: 3h29m17s
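
The two KDB backtraces appear to go through the same call chain on both releases, with only the addresses and offsets differing. A rough Python sketch of checking that is below. It assumes the "#N 0xADDR at symbol+0xOFF" line format shown above, uses only frames #6 to #11 from each trace, and the names (KDB_LINE, symbols) are made up for illustration.

```python
import re

# Matches one KDB backtrace line, e.g. "#7 0xffffffff80578702 at isp_async+0x13b2".
KDB_LINE = re.compile(r"#(\d+) (0x[0-9a-f]+) at (\w+)\+(0x[0-9a-f]+)")

# Frames #6 to #11 from the 10.4-RELEASE-p8 panic.
TRACE_10_4 = """\
#6 0xffffffff80d9551c at calltrap+0x8
#7 0xffffffff80578702 at isp_async+0x13b2
#8 0xffffffff8058c9af at isp_target_notify+0xf9f
#9 0xffffffff80570fd4 at isp_intr_atioq+0xa4
#10 0xffffffff805896bb at isp_pci_run_isr_2400+0xab
#11 0xffffffff8057ab23 at isp_platform_intr+0x53
"""

# Frames #6 to #11 from the 11.1-RELEASE-p10 panic.
TRACE_11_1 = """\
#6 0xffffffff80ecf950 at calltrap+0x8
#7 0xffffffff805b26bb at isp_async+0x156b
#8 0xffffffff805c7ebe at isp_target_notify+0x186e
#9 0xffffffff805aa733 at isp_intr_atioq+0xa3
#10 0xffffffff805c61f4 at isp_pci_run_isr_2400+0x84
#11 0xffffffff805b4a01 at isp_platform_intr+0x41
"""


def symbols(trace):
    """Return the symbol names of a KDB backtrace, innermost frame first."""
    return [m.group(3) for m in KDB_LINE.finditer(trace)]


if __name__ == "__main__":
    same_path = symbols(TRACE_10_4) == symbols(TRACE_11_1)
    print("same call path on both releases:", same_path)
    print(" <- ".join(symbols(TRACE_11_1)))
```

Comparing symbol names rather than addresses is deliberate: the kernels are different builds, so the addresses and offsets are expected to differ even when the code path is the same.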

What I have tried so far that hasn’t resolved the issue:

Changed the FC ports used on the HBA in the FreeBSD server
Set DataMover.HardwareAcceleratedMove, DataMover.HardwareAcceleratedInit and VMFS3.HardwareAcceleratedLocking on the ESXi hosts to 0 (disabled)
Changed the PCIe slot the quad port HBA uses in the FreeBSD server
Disabled Hyper-Threading on the FreeBSD server
Ran Memtest86+ on the FreeBSD server memory (no errors found)
Used ZVOLs instead of direct block devices for ctld

Going forward:

Happy to help troubleshoot this as much as I can. I can provide assisted remote access to the system that shows the issue if required, and I can also get dumps; just let me know.

Noticed this on the FreeNAS issue list; not sure if it’s related at all: https://redmine.ixsystems.com/issues/32370
Comment 1 Setsquare 2018-07-01 16:18:19 UTC
In addition to the above, I have sometimes noticed that the FC target LUNs show as dead from the initiator even though the FreeBSD server has not panicked. There doesn't seem to be anything I can do on the FreeBSD FC target to bring them back up; the only option I have found is to restart the FreeBSD server.
Comment 2 Setsquare 2018-08-27 08:28:29 UTC
Hi,

Update: I swapped out the quad port card for 2x dual port cards and I am no longer experiencing the issue.

Not sure if the "Chip Revision 0x2" had anything to do with it, as that's the only difference I can see in the hardware identification.

Quad port card with issues:
Board Type 2422, Chip Revision 0x2, loaded F/W Revision 8.7.0

Working dual port cards:
Board Type 2422, Chip Revision 0x3, loaded F/W Revision 8.7.0
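
To double check that the chip revision really is the only field that differs between the two identification lines above, here is a small Python sketch that parses both and reports the differences. The regex is an assumption based on the exact wording of these two lines, and the function names are made up for illustration.

```python
import re

# Assumed layout, taken from the two identification lines quoted above.
ID_RE = re.compile(
    r"Board Type (?P<board>\w+), "
    r"Chip Revision (?P<chip_rev>0x[0-9a-fA-F]+), "
    r"loaded F/W Revision (?P<fw>[\d.]+)"
)

QUAD_PORT = "Board Type 2422, Chip Revision 0x2, loaded F/W Revision 8.7.0"
DUAL_PORT = "Board Type 2422, Chip Revision 0x3, loaded F/W Revision 8.7.0"


def parse_id(line):
    """Split one identification line into board, chip_rev and fw fields."""
    m = ID_RE.search(line)
    if m is None:
        raise ValueError("unrecognised identification line: %r" % line)
    return m.groupdict()


def differing_fields(a, b):
    """Return {field: (value_a, value_b)} for every field that differs."""
    return {key: (a[key], b[key]) for key in a if a[key] != b[key]}


if __name__ == "__main__":
    print(differing_fields(parse_id(QUAD_PORT), parse_id(DUAL_PORT)))
```

This only compares what the driver prints at attach time; it says nothing about whether the chip revision is actually the cause.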

Kind Regards