It seems that camperiphfree() has some unsafe code that can lead to a crash when a peripheral is freed at the same time as an unrelated peripheral driver is added. p_drv is a pointer into periph_drivers. The pointer is used across xpt_unlock_buses + xpt_lock_buses. While the buses lock is dropped periph_drivers could be pointed to a new memory location and the old memory location would be freed. See periphdriver_register() for details. Thus, p_drv can end up pointing into the freed memory. That would lead to a crash or memory corruption while dereferencing the pointer. I have actually experienced such a crash (happened during the kernel boot): #3 0xffffffff80651f33 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:594 #4 0xffffffff8085a190 in trap_fatal (frame=0xfffffe050536b430, eva=0) at /usr/src/sys/amd64/amd64/trap.c:802 #5 0xffffffff80859737 in trap (frame=0xfffffe050536b430) at /usr/src/sys/amd64/amd64/trap.c:198 #6 0xffffffff8085a4ba in trap_check (frame=0xfffffe050536b430) at /usr/src/sys/amd64/amd64/trap.c:603 #7 <signal handler called> #8 0xffffffff802a9cd8 in camperiphfree (periph=0xfffff800224f9300) at /usr/src/sys/cam/cam_periph.c:712 #9 0xffffffff802a9bb2 in cam_periph_release_locked_buses (periph=0xfffff800224f9300) at /usr/src/sys/cam/cam_periph.c:441 #10 0xffffffff802a9e7b in cam_periph_release_locked (periph=0xfffff800224f9300) at /usr/src/sys/cam/cam_periph.c:452 #11 0xffffffff802bfeac in probedone (periph=<optimized out>, done_ccb=0xfffff8000bfe1000) at /usr/src/sys/cam/scsi/scsi_xpt.c:1198 #12 0xffffffff802b13c3 in xpt_done_process (ccb_h=0xfffff8000bfe1000) at /usr/src/sys/cam/cam_xpt.c:5452 #13 0xffffffff802b2965 in xpt_done_td (arg=0xffffffff80cf3880 <cam_doneqs>) at /usr/src/sys/cam/cam_xpt.c:5479 (kgdb) fr 8 #8 0xffffffff802a9cd8 in camperiphfree (periph=0xfffff800224f9300) at /usr/src/sys/cam/cam_periph.c:712 712 TAILQ_REMOVE(&(*p_drv)->units, periph, unit_links); (kgdb) p p_drv $1 = (struct periph_driver **) 0xfffff80007330030 (kgdb) p *p_drv $2 = (struct periph_driver *) 0xdeadc0dedeadc0d (kgdb) p periph_drivers $9 = (struct periph_driver **) 0xfffff800a81f4880 I think that we should cache the actual pointer to the peripheral driver rather than the pointer into the array of pointers.
Created attachment 183542 [details] Proposed patch to cache the periph_driver pointer. Proposed patch to cache the periph_driver pointer.
Can you try out the patch and see if it works for you?
(In reply to Kenneth D. Merry from comment #2) I never had a way to reproduce the problem, so I can not say with 100% certainty that the problem is fixed. The code looks like it does fix the problem. I do not see any regressions from the patch. And I have not seen any recurrence of the problem. Thank you!
A commit references this bug: Author: ken Date: Tue Jun 27 19:26:03 UTC 2017 New revision: 320421 URL: https://svnweb.freebsd.org/changeset/base/320421 Log: Fix a panic in camperiphfree(). If a peripheral driver (e.g. da, sa, cd) is added or removed from the peripheral driver list while an unrelated peripheral driver instance (e.g. da0, sa5, cd2) is going away and is inside camperiphfree(), we could dereference an invalid pointer. When peripheral drivers are added or removed (see periphdriver_register() and periphdriver_unregister()), the peripheral driver array is resized and existing entries are moved. Although we hold the topology lock while we traverse the peripheral driver list, we retain a pointer to the location of the peripheral driver pointer and then drop the topology lock. So we are still vulnerable to the list getting moved around while the lock is dropped. To solve the problem, cache a copy of the peripheral driver pointer. If its storage location in the list changes while we have the lock dropped, it won't have any effect. This doesn't solve the issue that peripheral drivers ("da", "cd", as opposed to individual instances like "da0", "cd0") are not generally part of a reference counting scheme to guard against deregistering them while there are instances active. The caller (generally the person unloading a module) has to be aware of active drivers and not unload something that is in use. sys/cam/cam_periph.c: In camperiphfree(), cache a pointer to the peripheral driver instance to avoid holding a pointer to an invalid memory location in the event that the peripheral driver list changes while we have the topology lock dropped. PR: kern/219701 Submitted by: avg MFC after: 3 days Sponsored by: Spectra Logic Changes: head/sys/cam/cam_periph.c
A commit references this bug: Author: ken Date: Mon Jul 3 15:34:20 UTC 2017 New revision: 320601 URL: https://svnweb.freebsd.org/changeset/base/320601 Log: MFC r320421: ------------------------------------------------------------------------ r320421 | ken | 2017-06-27 13:26:02 -0600 (Tue, 27 Jun 2017) | 37 lines Fix a panic in camperiphfree(). If a peripheral driver (e.g. da, sa, cd) is added or removed from the peripheral driver list while an unrelated peripheral driver instance (e.g. da0, sa5, cd2) is going away and is inside camperiphfree(), we could dereference an invalid pointer. When peripheral drivers are added or removed (see periphdriver_register() and periphdriver_unregister()), the peripheral driver array is resized and existing entries are moved. Although we hold the topology lock while we traverse the peripheral driver list, we retain a pointer to the location of the peripheral driver pointer and then drop the topology lock. So we are still vulnerable to the list getting moved around while the lock is dropped. To solve the problem, cache a copy of the peripheral driver pointer. If its storage location in the list changes while we have the lock dropped, it won't have any effect. This doesn't solve the issue that peripheral drivers ("da", "cd", as opposed to individual instances like "da0", "cd0") are not generally part of a reference counting scheme to guard against deregistering them while there are instances active. The caller (generally the person unloading a module) has to be aware of active drivers and not unload something that is in use. sys/cam/cam_periph.c: In camperiphfree(), cache a pointer to the peripheral driver instance to avoid holding a pointer to an invalid memory location in the event that the peripheral driver list changes while we have the topology lock dropped. PR: kern/219701 Submitted by: avg Sponsored by: Spectra Logic ------------------------------------------------------------------------ PR: kern/219701 Sponsored by: Spectra Logic Changes: _U stable/10/ stable/10/sys/cam/cam_periph.c
A commit references this bug: Author: ken Date: Mon Jul 3 15:34:21 UTC 2017 New revision: 320602 URL: https://svnweb.freebsd.org/changeset/base/320602 Log: MFC r320421: ------------------------------------------------------------------------ r320421 | ken | 2017-06-27 13:26:02 -0600 (Tue, 27 Jun 2017) | 37 lines Fix a panic in camperiphfree(). If a peripheral driver (e.g. da, sa, cd) is added or removed from the peripheral driver list while an unrelated peripheral driver instance (e.g. da0, sa5, cd2) is going away and is inside camperiphfree(), we could dereference an invalid pointer. When peripheral drivers are added or removed (see periphdriver_register() and periphdriver_unregister()), the peripheral driver array is resized and existing entries are moved. Although we hold the topology lock while we traverse the peripheral driver list, we retain a pointer to the location of the peripheral driver pointer and then drop the topology lock. So we are still vulnerable to the list getting moved around while the lock is dropped. To solve the problem, cache a copy of the peripheral driver pointer. If its storage location in the list changes while we have the lock dropped, it won't have any effect. This doesn't solve the issue that peripheral drivers ("da", "cd", as opposed to individual instances like "da0", "cd0") are not generally part of a reference counting scheme to guard against deregistering them while there are instances active. The caller (generally the person unloading a module) has to be aware of active drivers and not unload something that is in use. sys/cam/cam_periph.c: In camperiphfree(), cache a pointer to the peripheral driver instance to avoid holding a pointer to an invalid memory location in the event that the peripheral driver list changes while we have the topology lock dropped. PR: kern/219701 Submitted by: avg Sponsored by: Spectra Logic ------------------------------------------------------------------------ PR: kern/219701 Sponsored by: Spectra Logic Changes: _U stable/11/ stable/11/sys/cam/cam_periph.c
A commit references this bug: Author: ken Date: Mon Jul 3 18:20:45 UTC 2017 New revision: 320608 URL: https://svnweb.freebsd.org/changeset/base/320608 Log: Merge r320602 from stable/11 into releng/11.1: ------------------------------------------------------------------------ r320602 | ken | 2017-07-03 09:34:21 -0600 (Mon, 03 Jul 2017) | 45 lines MFC r320421: ------------------------------------------------------------------------ r320421 | ken | 2017-06-27 13:26:02 -0600 (Tue, 27 Jun 2017) | 37 lines Fix a panic in camperiphfree(). If a peripheral driver (e.g. da, sa, cd) is added or removed from the peripheral driver list while an unrelated peripheral driver instance (e.g. da0, sa5, cd2) is going away and is inside camperiphfree(), we could dereference an invalid pointer. When peripheral drivers are added or removed (see periphdriver_register() and periphdriver_unregister()), the peripheral driver array is resized and existing entries are moved. Although we hold the topology lock while we traverse the peripheral driver list, we retain a pointer to the location of the peripheral driver pointer and then drop the topology lock. So we are still vulnerable to the list getting moved around while the lock is dropped. To solve the problem, cache a copy of the peripheral driver pointer. If its storage location in the list changes while we have the lock dropped, it won't have any effect. This doesn't solve the issue that peripheral drivers ("da", "cd", as opposed to individual instances like "da0", "cd0") are not generally part of a reference counting scheme to guard against deregistering them while there are instances active. The caller (generally the person unloading a module) has to be aware of active drivers and not unload something that is in use. sys/cam/cam_periph.c: In camperiphfree(), cache a pointer to the peripheral driver instance to avoid holding a pointer to an invalid memory location in the event that the peripheral driver list changes while we have the topology lock dropped. PR: kern/219701 Submitted by: avg Sponsored by: Spectra Logic ------------------------------------------------------------------------ ------------------------------------------------------------------------ Approved by: re (gjb) Changes: _U releng/11.1/ releng/11.1/sys/cam/cam_periph.c
Fixed in head, releng/11.1, stable/11, and stable/10. Thanks for the bug report!