Bug 204298

Summary: xpt_release_device() panic: mutex CAM queue lock not owned
Product: Base System Reporter: Scott M. Ferris <smferris>
Component: kernAssignee: freebsd-scsi (Nobody) <scsi>
Status: New ---    
Severity: Affects Some People CC: benno, jeff, mav, ngie, scottl
Priority: --- Keywords: patch
Version: CURRENT   
Hardware: Any   
OS: Any   
Attachments:
Description Flags
cam_xpt.c device callout patch
none
Alternative patch
none
revised patch with a LOR fix and mav's cleanups none

Description Scott M. Ferris 2015-11-05 00:34:15 UTC
Created attachment 162793 [details]
cam_xpt.c device callout patch

xpt_release_device() calls callout_stop() without holding the callout mutex (devq->send_mtx).

When testing large-scale repeated FC device removal, we ran into panics where xpt_release_device() was reaching refcount 0 while the callout was pending (or at least active).

The attached patch will give the device callout a ref on the device while the device callout is active, to ensure the device can't be destroyed until the callout is either stopped or finishes executing.

Sponsored by: EMC / Isilon Storage Division
Comment 1 Enji Cooper freebsd_committer freebsd_triage 2015-11-05 00:42:09 UTC
(In reply to Scott M. Ferris from comment #0)

The comment on line 4755 is probably unnecessary.
Comment 2 Alexander Motin freebsd_committer freebsd_triage 2015-11-05 14:24:20 UTC
Created attachment 162814 [details]
Alternative patch

Alternative version of patch, that I like more. Unfortunately still not viable.
Comment 3 Alexander Motin freebsd_committer freebsd_triage 2015-11-05 14:28:39 UTC
I like your idea of giving timeout a reference. I've even reworked your patch in IMO cleaner way (see new attachment). Unfortunately after doing that I hit the problem affecting both mine and your patches: device reference counters are protected by bus lock, that can not be acquired while holding device queue lock. I am not ready to say this approach is hopeless, but requires some more thinking.
Comment 4 Scott Long freebsd_committer freebsd_triage 2015-11-05 16:34:33 UTC
The bus/topology locks for device reference counting exist only because RCU wasn't available when I did the locking work.  I'd really prefer that they went away soon.  Would someone be willing to investigate ck_epoch from ConcurrencyKit?  We're going to import this library into FreeBSD soon, and ck_epoch provides lockless object lifecycle semantics.
Comment 5 Scott M. Ferris 2015-11-05 19:19:54 UTC
(In reply to Alexander Motin from comment #3)

Yes, for the patch to work we have to use the lock ordering:
  devq->send_mtx before bus->eb_mtx
  device->device_mtx before bus->eb_mtx

It's a counter-intuitive ordering but seems to work given how eb_mtx is used today. Where would we violate that ordering?  I don't see anyplace acquiring another lock while holding bus->eb_mtx.

No, wait, I missed xpt_alloc_device() wanting the send_mtx in order to cam_devq_resize().  Can we solve that by moving the cam_devq_resize() from xpt_alloc_device() up into xpt_compile_path()?  That appears to be the only caller of xport->alloc_device() and only the transports call xpt_alloc_device().  I'll see if I can make that work.
Comment 6 Scott M. Ferris 2015-11-06 07:59:14 UTC
Created attachment 162840 [details]
revised patch with a LOR fix and mav's cleanups

I attached a revised patch that tries to fix the LOR and incorporate mav's cleanups.

The acquire side is handled by taking a device ref for the callout before we know if we need another ref, and releasing the extra ref if we discover we don't need it.

The release side is handled using CALLOUT_RETURNUNLOCKED so that the callout can just unlock devq->send_mtx and then xpt_release_device().
Comment 7 Alexander Motin freebsd_committer freebsd_triage 2015-11-06 11:04:11 UTC
Looks good to me. I am just not sure that KASSERTs duplication at xpt_destroy_device() after already done at xpt_release_device() is really useful.
Comment 8 Scott M. Ferris 2015-11-09 18:02:33 UTC
Latest patch doesn't work.  SCSI_STATUS_BUSY does RELSIM_RELEASE_AFTER_TIMEOUT | RELSIM_RELEASE_AFTER_CMDCMPLT, and expects a single freeze to be undone by either the timeout or a command completion, whichever happens first. xpt_done_process() for a command completion needs to be able to stop the device callout and xpt_release_device the refcount the callout had, or make the callout do nothing.  I'll see if I can come up with a fix.