The CAM xpt_done_td is marked THREAD_NO_SLEEPING. This is problematic since the AC_FOUND_DEVICE async events still call functions such as disk_alloc() and devstat_alloc() that malloc with M_WAITOK, so they could sleep, which panics the
You can spot the problem more easily by adding an ASSERT in malloc
that checks for M_WAITOK and THREAD_CAN_SLEEP, and then removing and
re-adding a device at run-time. At least with mps, the initial device
creation works since it runs from dainit() in an intr config hook.
I'll attach a patch with the assertion that highlights the problem.
Created attachment 147387
malloc M_WAITOK patch to assert THREAD_CAN_SLEEP
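For readers without the attachment, the check described above can be modeled in userland roughly as follows. This is only a sketch and not the attached patch: the `model_` and `MODEL_` names are hypothetical stand-ins for the kernel's per-thread no-sleeping count, THREAD_CAN_SLEEP() and the malloc(9) flags.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the malloc(9) flags; values chosen for the model. */
#define MODEL_M_NOWAIT 0x0001
#define MODEL_M_WAITOK 0x0002

/* Hypothetical model of the per-thread "no sleeping" nesting count. */
struct model_thread {
    int td_no_sleeping;
};

/* Models entering a no-sleeping section, as xpt_done_td is described as doing. */
static void model_thread_no_sleeping(struct model_thread *td)
{
    td->td_no_sleeping++;
}

/* Models leaving a no-sleeping section. */
static void model_thread_sleeping_ok(struct model_thread *td)
{
    td->td_no_sleeping--;
}

/* Models THREAD_CAN_SLEEP(): sleeping is allowed only when the count is zero. */
static bool model_thread_can_sleep(const struct model_thread *td)
{
    return td->td_no_sleeping == 0;
}

/*
 * The condition the assertion would enforce: an M_WAITOK allocation is
 * acceptable only if the calling thread is allowed to sleep.
 * Returns true when the request passes the check.
 */
static bool model_malloc_flags_ok(const struct model_thread *td, int flags)
{
    if ((flags & MODEL_M_WAITOK) != 0 && !model_thread_can_sleep(td))
        return false;
    return true;
}
```

In this model an M_WAITOK request made between model_thread_no_sleeping() and model_thread_sleeping_ok() fails the check, while an M_NOWAIT request in the same window passes.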
I'm testing out the patch in an 11-CURRENT VM to make sure it doesn't break the VM
case at least. I'll send it out for review to alc/kib.
Actually, I'm going to put the bug back in Needs Triage state because the patch
above makes the issue apparent -- the larger issue Scott brought up needs to be fixed.
Kib had some feedback on the assert:
1. We should also add it (and the interrupt check) to uma_zalloc_arg(), via a single shared inline function.
2. The interrupt assert may be wrong since it is not OK to malloc(9) in an interrupt, regardless of the flags.
Isilon's internal discussion was that we should add a debug stack output rather than an assert until all major cases are fixed.
The patch Scott provided looked OK (it doesn't panic on boot in simple cases in a VM), but I haven't had an opportunity to test it more extensively.
I haven't yet tried a patch that incorporates the feedback noted in comment #4.
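To make that feedback concrete, here is a rough userland sketch of a single inline check shared by two allocator entry points, which warns instead of panicking. It is an illustration under assumptions and not the code under review: every `model_` name is hypothetical, the global counter stands in for the current thread's no-sleeping state, and the kernel version would print a stack trace where this prints one line.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the malloc(9) flags; values chosen for the model. */
#define MODEL_M_NOWAIT 0x0001
#define MODEL_M_WAITOK 0x0002

static int model_no_sleeping; /* nonzero models a THREAD_NO_SLEEPING section */
static int model_warnings;    /* number of violations reported so far */

/*
 * One shared check for both entry points. It only warns, so a system
 * with unfixed callers keeps running while the offenders are reported.
 */
static inline void model_alloc_check(const char *caller, int flags)
{
    if ((flags & MODEL_M_WAITOK) != 0 && model_no_sleeping != 0) {
        model_warnings++;
        fprintf(stderr, "%s(M_WAITOK) in no_sleeping context\n", caller);
    }
}

/* Models malloc(9): run the shared check, then allocate. */
static void *model_malloc(size_t size, int flags)
{
    model_alloc_check("malloc", flags);
    return malloc(size);
}

/* Models uma_zalloc_arg(9): same shared check, zeroed allocation. */
static void *model_zalloc(size_t size, int flags)
{
    model_alloc_check("uma_zalloc_arg", flags);
    return calloc(1, size);
}
```

Keeping the check in one inline function means the two allocators cannot drift apart, and switching from a warning to a hard assertion later is a one-line change.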
Trace from xpt_done_td from pulling a device out of the system:
KASSERT failed: malloc(M_WAITOK) in no_sleeping context
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe349829a340
kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe349829a3f0
_kassert_panic() at _kassert_panic+0xd7/frame 0xfffffe349829a470
malloc() at malloc+0x2e4/frame 0xfffffe349829a4c0
g_post_event_x() at g_post_event_x+0x84/frame 0xfffffe349829a510
g_post_event() at g_post_event+0x5d/frame 0xfffffe349829a580
adacleanup() at adacleanup+0x62/frame 0xfffffe349829a5a0
cam_periph_release_locked_buses() at cam_periph_release_locked_buses+0xde/frame 0xfffffe349829aaa0
cam_periph_release_locked() at cam_periph_release_locked+0x1b/frame 0xfffffe349829aac0
adadone() at adadone+0x26e/frame 0xfffffe349829ab20
xpt_done_process() at xpt_done_process+0x3a4/frame 0xfffffe349829ab60
xpt_done_td() at xpt_done_td+0x136/frame 0xfffffe349829abb0
fork_exit() at fork_exit+0x84/frame 0xfffffe349829abf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe349829abf0
https://reviews.freebsd.org/D829 - KASSERT_WARN
https://reviews.freebsd.org/D830 - Use KASSERT_WARN in malloc(9) and uma_zalloc_arg(9)