Bug 193696 - CAM AC_FOUND_DEVICE calls malloc(M_WAITOK) from THREAD_NO_SLEEPING() context
Summary: CAM AC_FOUND_DEVICE calls malloc(M_WAITOK) from THREAD_NO_SLEEPING() context
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: CURRENT
Hardware: Any Any
: --- Affects Some People
Assignee: freebsd-scsi (Nobody)
Depends on:
Reported: 2014-09-16 21:50 UTC by Scott M. Ferris
Modified: 2014-10-01 15:57 UTC (History)
5 users (show)

See Also:

malloc M_WAITOK patch to assert THREAD_CAN_SLEEP (549 bytes, patch)
2014-09-16 21:51 UTC, Scott M. Ferris
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Scott M. Ferris 2014-09-16 21:50:19 UTC
The CAM xpt_done_td is marked THREAD_NO_SLEEPING.  This is problematic since the AC_FOUND_DEVICE async events still call functions such as disk_alloc() and devstat_alloc() that malloc with M_WAITOK, so they could sleep, which panics the

You can spot the problem more easily by adding an ASSERT in malloc
that checks for M_WAITOK and THREAD_CAN_SLEEP, and then removing and
re-adding a device at run-time.  At least with mps, the initial device
creation works since it runs from dainit() in an intr config hook.

I'll attach a patch with the assertion that highlights the problem.
Comment 1 Scott M. Ferris 2014-09-16 21:51:15 UTC
Created attachment 147387 [details]
malloc M_WAITOK patch to assert THREAD_CAN_SLEEP
Comment 2 Enji Cooper freebsd_committer 2014-09-16 21:58:38 UTC
I'm testing out the patch in an 11-CURRENT VM to make sure it doesn't break the VM
case at least. I'll send it out for review to alc/kib.
Comment 3 Enji Cooper freebsd_committer 2014-09-16 21:59:36 UTC
Actually, I'm going to put the bug back in Needs Triage state because the patch
above makes the issue apparent -- the larger issue Scott brought up needs to be fixed.
Comment 4 Bryan Drewery freebsd_committer 2014-09-19 17:05:56 UTC
Kib had some feedback on the assert:

1. We should also add it (and the interrupt check) to uma_zalloc_arg() (through 1 inline function)
2. The interrupt assert may be wrong since it is not OK to malloc(9) in an interrupt, regardless of the flags.

Isilon's internal discussion was that we should add a debug stack output rather than an assert until all major cases are fixed.
Comment 5 Enji Cooper freebsd_committer 2014-09-22 23:10:49 UTC
The patch Scott provided looked ok (doesn't panic on boot with simple cases with a VM), but I didn't get an opportunity to test it out more extensively.

I didn't try out a patch that incorporates the feedback noted in comment # 4 yet.
Comment 6 Bryan Drewery freebsd_committer 2014-09-25 00:58:32 UTC
Trace from xpt_done_td from pulling a device out of the system:

KASSERT failed: malloc(M_WAITOK) in no_sleeping context
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe349829a340
kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe349829a3f0
_kassert_panic() at _kassert_panic+0xd7/frame 0xfffffe349829a470
malloc() at malloc+0x2e4/frame 0xfffffe349829a4c0
g_post_event_x() at g_post_event_x+0x84/frame 0xfffffe349829a510
g_post_event() at g_post_event+0x5d/frame 0xfffffe349829a580
adacleanup() at adacleanup+0x62/frame 0xfffffe349829a5a0
cam_periph_release_locked_buses() at cam_periph_release_locked_buses+0xde/frame 0xfffffe349829aaa0
cam_periph_release_locked() at cam_periph_release_locked+0x1b/frame 0xfffffe349829aac0
adadone() at adadone+0x26e/frame 0xfffffe349829ab20
xpt_done_process() at xpt_done_process+0x3a4/frame 0xfffffe349829ab60
xpt_done_td() at xpt_done_td+0x136/frame 0xfffffe349829abb0
fork_exit() at fork_exit+0x84/frame 0xfffffe349829abf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe349829abf0
Comment 7 Bryan Drewery freebsd_committer 2014-09-25 01:09:22 UTC
https://reviews.freebsd.org/D829 - KASSERT_WARN
https://reviews.freebsd.org/D830 - Use KASSERT_WARN in malloc(9) and uma_zalloc_arg(9)