Bug 203746 - Panic in NVME driver
Summary: Panic in NVME driver
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 10.2-RELEASE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: Jim Harris
Depends on:
Reported: 2015-10-13 18:09 UTC by mgoroff
Modified: 2016-01-15 05:03 UTC
CC: 2 users

See Also:

Crashinfo file for NVME kernel panic (196.41 KB, text/plain)
2015-10-13 18:09 UTC, mgoroff
Crash info file (197.28 KB, text/plain)
2015-10-23 16:57 UTC, mgoroff
Crash info file (199.31 KB, text/plain)
2015-10-23 16:58 UTC, mgoroff

Description mgoroff 2015-10-13 18:09:20 UTC
Created attachment 161991 [details]
Crashinfo file for NVME kernel panic

We are running a ZFS file server on 10.2-RELEASE. We recently encountered a kernel panic in the NVMe driver under heavy load, with an Intel DC P3700 NVMe SSD serving as both ZIL and L2ARC. The core.txt file from the crash dump is attached. All NVMe sysctl parameters were left at their defaults. The stack trace from the crash dump is:

(kgdb) #0  doadump (textdump=<value optimized out>) at pcpu.h:219
#1  0xffffffff80948642 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:451
#2  0xffffffff80948a25 in vpanic (fmt=<value optimized out>, 
    ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:758
#3  0xffffffff809488b3 in panic (fmt=0x0)
    at /usr/src/sys/kern/kern_shutdown.c:687
#4  0xffffffff80d4aadb in trap_fatal (frame=<value optimized out>, 
    eva=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:851
#5  0xffffffff80d4addd in trap_pfault (frame=0xfffffe3fd0568870, 
    usermode=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:674
#6  0xffffffff80d4a47a in trap (frame=0xfffffe3fd0568870)
    at /usr/src/sys/amd64/amd64/trap.c:440
#7  0xffffffff80d307f2 in calltrap ()
    at /usr/src/sys/amd64/amd64/exception.S:236
#8  0xfffff8345ad7aaa8 in ?? ()
#9  0xffffffff80dedc03 in nvme_bio_child_done (arg=<value optimized out>, 
    cpl=<value optimized out>) at /usr/src/sys/dev/nvme/nvme_ns.c:235
#10 0xffffffff80dee55e in nvme_qpair_complete_tracker (
    qpair=0xfffff801268b9900, tr=0xfffff8012698a200, 
    cpl=<value optimized out>, print_on_error=<value optimized out>)
    at /usr/src/sys/dev/nvme/nvme_qpair.c:330
#11 0xffffffff80dee370 in nvme_qpair_process_completions (
    qpair=0xfffff801268b9900) at /usr/src/sys/dev/nvme/nvme_qpair.c:433
#12 0xffffffff8091482b in intr_event_execute_handlers (
    p=<value optimized out>, ie=0xfffff801268bf900)
    at /usr/src/sys/kern/kern_intr.c:1264
#13 0xffffffff80914c76 in ithread_loop (arg=0xfffff801268d0380)
    at /usr/src/sys/kern/kern_intr.c:1277
#14 0xffffffff8091244a in fork_exit (
    callout=0xffffffff80914be0 <ithread_loop>, arg=0xfffff801268d0380, 
    frame=0xfffffe3fd0568ac0) at /usr/src/sys/kern/kern_fork.c:1018
#15 0xffffffff80d30d2e in fork_trampoline ()
    at /usr/src/sys/amd64/amd64/exception.S:611
Comment 1 John Baldwin freebsd_committer freebsd_triage 2015-10-20 18:27:44 UTC
CC'ing Jim who works on nvme(4)
Comment 2 Jim Harris freebsd_committer 2015-10-23 15:36:15 UTC
I've received a similar report from a second party offline.  I am actively debugging this.  In the e-mail to -current, it was suggested that this filer was using NVMe on 10.1 without any issues, and that this issue was only observed after upgrading to 10.2.  Can you confirm?
Comment 3 mgoroff 2015-10-23 15:45:57 UTC
The NVMe boards were installed only shortly before the upgrade to 10.2, so I don't think there was enough run time to say this wasn't an issue in 10.1.

BTW, we have two more crash dumps from panics in the driver if they would help.
Comment 4 Jim Harris freebsd_committer 2015-10-23 15:53:38 UTC
Thanks for the quick reply.  Yes - please post the additional crashinfo files.
Comment 5 mgoroff 2015-10-23 16:57:27 UTC
Created attachment 162398 [details]
Crash info file
Comment 6 mgoroff 2015-10-23 16:58:06 UTC
Created attachment 162399 [details]
Crash info file

Attached two more crash info files.
Comment 7 commit-hook freebsd_committer 2015-10-30 16:07:14 UTC
A commit references this bug:

Author: jimharris
Date: Fri Oct 30 16:06:34 UTC 2015
New revision: 290198
URL: https://svnweb.freebsd.org/changeset/base/290198

  nvme: fix race condition in split bio completion path

  Fixes race condition observed under following circumstances:

  1) I/O split on 128KB boundary with Intel NVMe controller.
     Current Intel controllers produce better latency when
     I/Os do not span a 128KB boundary - even if the I/O size
     itself is less than 128KB.
  2) Per-CPU I/O queues are enabled.
  3) Child I/Os are submitted on different submission queues.
  4) Interrupts for child I/O completions occur almost
     simultaneously.
  5) ithread for child I/O A increments bio_inbed, then
     immediately is preempted (rendezvous IPI, higher priority
     interrupt).
  6) ithread for child I/O B increments bio_inbed, then completes
     parent bio since all children are now completed.
  7) parent bio is freed, and immediately reallocated for a VFS
     or gpart bio (including setting bio_children to 1 and
     clearing bio_driver1).
  8) ithread for child I/O A resumes processing.  bio_children
     for what it thinks is the parent bio is set to 1, so it
     thinks it needs to complete the parent bio.

  Result is either calling a NULL callback function, or double freeing
  the bio to its uma zone.

  PR:		203746
  Reported by:	Drew Gallatin <gallatin@netflix.com>,
  		Marc Goroff <mgoroff@quorum.net>
  Tested by:	Drew Gallatin <gallatin@netflix.com>
  MFC after:	3 days
  Sponsored by:	Intel
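Condition 2 above also suggests a possible stopgap for systems that cannot take the fix immediately: disabling per-CPU I/O queues via the nvme(4) loader tunable, so split children share one submission queue. This is a sketch of an assumed mitigation; the tunable name is from nvme(4), and whether it fully avoids the race was not confirmed in this thread.

```
# /boot/loader.conf -- possible stopgap (untested in this report):
# fall back to a single I/O queue pair so child I/Os are not
# submitted on different submission queues (condition 2 above)
hw.nvme.per_cpu_io_queues=0
```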

Comment 8 Jim Harris freebsd_committer 2016-01-15 04:27:09 UTC
Issue should be fixed with SVN r290198 - waiting for confirmation from submitter.
Comment 9 Jim Harris freebsd_committer 2016-01-15 05:03:37 UTC
Marc confirmed offline that this bug can be closed.