Created attachment 162793 [details] cam_xpt.c device callout patch xpt_release_device() calls callout_stop() without holding the callout mutex (devq->send_mtx). When testing large-scale repeated FC device removal, we ran into panics where xpt_release_device() was reaching refcount 0 while the callout was pending (or at least active). The attached patch will give the device callout a ref on the device while the device callout is active, to ensure the device can't be destroyed until the callout is either stopped or finishes executing. Sponsored by: EMC / Isilon Storage Division
(In reply to Scott M. Ferris from comment #0) The comment on line 4755 is probably unnecessary.
Created attachment 162814 [details] Alternative patch Alternative version of patch, that I like more. Unfortunately still not viable.
I like your idea of giving timeout a reference. I've even reworked your patch in IMO cleaner way (see new attachment). Unfortunately after doing that I hit the problem affecting both mine and your patches: device reference counters are protected by bus lock, that can not be acquired while holding device queue lock. I am not ready to say this approach is hopeless, but requires some more thinking.
The bus/topology locks for device reference counting exist only because RCU wasn't available when I did the locking work. I'd really prefer that they went away soon. Would someone be willing to investigate ck_epoch from ConcurrencyKit? We're going to import this library into FreeBSD soon, and ck_epoch provides lockless object lifecycle semantics.
(In reply to Alexander Motin from comment #3) Yes, for the patch to work we have to use the lock ordering: devq->send_mtx before bus->eb_mtx device->device_mtx before bus->eb_mtx It's a counter-intuitive ordering but seems to work given how eb_mtx is used today. Where would we violate that ordering? I don't see anyplace acquiring another lock while holding bus->eb_mtx. No, wait, I missed xpt_alloc_device() wanting the send_mtx in order to cam_devq_resize(). Can we solve that by moving the cam_devq_resize() from xpt_alloc_device() up into xpt_compile_path()? That appears to be the only caller of xport->alloc_device() and only the transports call xpt_alloc_device(). I'll see if I can make that work.
Created attachment 162840 [details] revised patch with a LOR fix and mav's cleanups I attached a revised patch that tries to fix the LOR and incorporate mav's cleanups. The acquire side is handled by taking a device ref for the callout before we know if we need another ref, and releasing the extra ref if we discover we don't need it. The release side is handled using CALLOUT_RETURNUNLOCKED so that the callout can just unlock devq->send_mtx and then xpt_release_device().
Looks good to me. I am just not sure that KASSERTs duplication at xpt_destroy_device() after already done at xpt_release_device() is really useful.
Latest patch doesn't work. SCSI_STATUS_BUSY does RELSIM_RELEASE_AFTER_TIMEOUT | RELSIM_RELEASE_AFTER_CMDCMPLT, and expects a single freeze to be undone by either the timeout or a command completion, whichever happens first. xpt_done_process() for a command completion needs to be able to stop the device callout and xpt_release_device the refcount the callout had, or make the callout do nothing. I'll see if I can come up with a fix.
Hey Mav/Scott F... Finally read through all the back and forth... Did you ever come up with a patch that worked? While I don't have FC I have several machines I have a large number of device removals in my fleet and I've not seen this panic.... But reading the code today strongly suggests that something akin to this is still needed. We still mess with callouts in the same way, and it looks somewhat unsafe if one is pending when the device departs. Maybe I'll add a couple more KASSERTS to see if $WORK hits those problems. Scott Long's suggestion of a rework with ck_epoch is interesting and likely would be the best way to squash this in the long run, but would take a fair amount of time and fussing to get into good shape... at least that's what I fear given the network stack's experiences.
It was too long ago, but as I can see the callout is now always manipulated under the send_mtx lock. It was fixed 4 years ago in 03caca368a2ad. So I guess this report might be no longer applicable.
I agree with mav. The callout is always done under the send_mtx lock, so I'm going to close this. Should there be similar bugs or symptoms, please file a new bug.