Created attachment 162793 [details]
cam_xpt.c device callout patch
xpt_release_device() calls callout_stop() without holding the callout mutex (devq->send_mtx).
When testing large-scale repeated FC device removal, we ran into panics where xpt_release_device() was reaching refcount 0 while the callout was pending (or at least active).
The attached patch will give the device callout a ref on the device while the device callout is active, to ensure the device can't be destroyed until the callout is either stopped or finishes executing.
Sponsored by: EMC / Isilon Storage Division
(In reply to Scott M. Ferris from comment #0)
The comment on line 4755 is probably unnecessary.
Created attachment 162814 [details]
Alternative version of patch, that I like more. Unfortunately still not viable.
I like your idea of giving timeout a reference. I've even reworked your patch in IMO cleaner way (see new attachment). Unfortunately after doing that I hit the problem affecting both mine and your patches: device reference counters are protected by bus lock, that can not be acquired while holding device queue lock. I am not ready to say this approach is hopeless, but requires some more thinking.
The bus/topology locks for device reference counting exist only because RCU wasn't available when I did the locking work. I'd really prefer that they went away soon. Would someone be willing to investigate ck_epoch from ConcurrencyKit? We're going to import this library into FreeBSD soon, and ck_epoch provides lockless object lifecycle semantics.
(In reply to Alexander Motin from comment #3)
Yes, for the patch to work we have to use the lock ordering:
devq->send_mtx before bus->eb_mtx
device->device_mtx before bus->eb_mtx
It's a counter-intuitive ordering but seems to work given how eb_mtx is used today. Where would we violate that ordering? I don't see anyplace acquiring another lock while holding bus->eb_mtx.
No, wait, I missed xpt_alloc_device() wanting the send_mtx in order to cam_devq_resize(). Can we solve that by moving the cam_devq_resize() from xpt_alloc_device() up into xpt_compile_path()? That appears to be the only caller of xport->alloc_device() and only the transports call xpt_alloc_device(). I'll see if I can make that work.
Created attachment 162840 [details]
revised patch with a LOR fix and mav's cleanups
I attached a revised patch that tries to fix the LOR and incorporate mav's cleanups.
The acquire side is handled by taking a device ref for the callout before we know if we need another ref, and releasing the extra ref if we discover we don't need it.
The release side is handled using CALLOUT_RETURNUNLOCKED so that the callout can just unlock devq->send_mtx and then xpt_release_device().
Looks good to me. I am just not sure that KASSERTs duplication at xpt_destroy_device() after already done at xpt_release_device() is really useful.
Latest patch doesn't work. SCSI_STATUS_BUSY does RELSIM_RELEASE_AFTER_TIMEOUT | RELSIM_RELEASE_AFTER_CMDCMPLT, and expects a single freeze to be undone by either the timeout or a command completion, whichever happens first. xpt_done_process() for a command completion needs to be able to stop the device callout and xpt_release_device the refcount the callout had, or make the callout do nothing. I'll see if I can come up with a fix.