Bug 226578 - panic: _mtx_lock_sleep: recursed on non-recursive mutex CAM device lock @ /usr/home/trasz/svn-ssh/head/sys/cam/scsi/scsi_da.c:2042
Summary: panic: _mtx_lock_sleep: recursed on non-recursive mutex CAM device lock @ /us...
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: CURRENT
Hardware: Any Any
: --- Affects Some People
Assignee: Robert Wing
URL: https://reviews.freebsd.org/D27735
Keywords: crash
Depends on:
Blocks:
 
Reported: 2018-03-13 16:55 UTC by Edward Tomasz Napierala
Modified: 2022-03-08 07:16 UTC (History)
3 users (show)

See Also:
koobs: maintainer-feedback? (imp)
rew: maintainer-feedback+
koobs: mfc-stable13+
rew: mfc-stable12+


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Edward Tomasz Napierala freebsd_committer freebsd_triage 2018-03-13 16:55:19 UTC
The current -CURRENT panics like this:

Unread portion of the kernel message buffer:
WARNING: 127.0.0.1 (iqn.2012-06.com.example:target0): connection error; reconnecting
panic: _mtx_lock_sleep: recursed on non-recursive mutex CAM device lock @ /usr/home/trasz/svn-ssh/head/sys/cam/scsi/scsi_da.c:2042

cpuid = 3
time = 1520959743
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00265b85e0
vpanic() at vpanic+0x18d/frame 0xfffffe00265b8640
vpanic() at vpanic/frame 0xfffffe00265b86c0
__mtx_lock_sleep() at __mtx_lock_sleep+0x420/frame 0xfffffe00265b8750
__mtx_lock_flags() at __mtx_lock_flags+0xf9/frame 0xfffffe00265b87a0
daasync() at daasync+0x117/frame 0xfffffe00265b87f0
xpt_async_process_dev() at xpt_async_process_dev+0x162/frame 0xfffffe00265b8840
xptdevicetraverse() at xptdevicetraverse+0xa4/frame 0xfffffe00265b8890
xpttargettraverse() at xpttargettraverse+0x6b/frame 0xfffffe00265b88d0
xpt_async_process() at xpt_async_process+0x114/frame 0xfffffe00265b89e0
xpt_done_process() at xpt_done_process+0x390/frame 0xfffffe00265b8a20
xpt_done_td() at xpt_done_td+0xf6/frame 0xfffffe00265b8a70
fork_exit() at fork_exit+0x84/frame 0xfffffe00265b8ab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00265b8ab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic

__curthread () at ./machine/pcpu.h:230
230             __asm("movq %%gs:%1,%0" : "=r" (td)
(kgdb) #0  __curthread () at ./machine/pcpu.h:230
#1  doadump (textdump=0)
    at /usr/home/trasz/svn-ssh/head/sys/kern/kern_shutdown.c:347
#2  0xffffffff8040acfb in db_dump (dummy=<optimized out>, 
    dummy2=<unavailable>, dummy3=<unavailable>, dummy4=<unavailable>)
    at /usr/home/trasz/svn-ssh/head/sys/ddb/db_command.c:574
#3  0xffffffff8040aac9 in db_command (last_cmdp=<optimized out>, 
    cmd_table=<optimized out>, dopager=<optimized out>)
    at /usr/home/trasz/svn-ssh/head/sys/ddb/db_command.c:481
#4  0xffffffff8040a844 in db_command_loop ()
    at /usr/home/trasz/svn-ssh/head/sys/ddb/db_command.c:534
#5  0xffffffff8040da6f in db_trap (type=<optimized out>, code=<optimized out>)
    at /usr/home/trasz/svn-ssh/head/sys/ddb/db_main.c:250
#6  0xffffffff80b192d3 in kdb_trap (type=3, code=-61456, tf=<optimized out>)
    at /usr/home/trasz/svn-ssh/head/sys/kern/subr_kdb.c:697
#7  0xffffffff80f8b868 in trap (frame=0xfffffe00265b8510)
    at /usr/home/trasz/svn-ssh/head/sys/amd64/amd64/trap.c:547
#8  <signal handler called>
#9  kdb_enter (why=0xffffffff811fcef3 "panic", msg=<optimized out>)
    at /usr/home/trasz/svn-ssh/head/sys/kern/subr_kdb.c:479
#10 0xffffffff80ad409a in vpanic (fmt=<optimized out>, ap=0xfffffe00265b8680)
    at /usr/home/trasz/svn-ssh/head/sys/kern/kern_shutdown.c:801
#11 0xffffffff80ad3ef0 in kassert_panic (
    fmt=0xffffffff81146b66 "_mtx_lock_sleep: recursed on non-recursive mutex %s @ %s:%d\n") at /usr/home/trasz/svn-ssh/head/sys/kern/kern_shutdown.c:698
#12 0xffffffff80ab3400 in __mtx_lock_sleep (c=0xfffff800054134e8, 
    v=18446735277674057728, opts=<optimized out>, file=<optimized out>, 
    line=32) at /usr/home/trasz/svn-ssh/head/sys/kern/kern_mutex.c:509
#13 0xffffffff80ab2f49 in __mtx_lock_flags (c=0xfffff800054134e8, 
    opts=<optimized out>, file=<optimized out>, line=<optimized out>)
    at /usr/home/trasz/svn-ssh/head/sys/kern/kern_mutex.c:255
#14 0xffffffff80366cf7 in daasync (callback_arg=0xfffff80005c76200, 
    code=16384, path=<optimized out>, arg=0xfffff800037f2800)
    at /usr/home/trasz/svn-ssh/head/sys/cam/scsi/scsi_da.c:2042
#15 0xffffffff80342462 in xpt_async_bcast (async_head=<optimized out>, 
    async_code=<optimized out>, path=<optimized out>, 
    async_arg=0xfffff800037f2800)
    at /usr/home/trasz/svn-ssh/head/sys/cam/cam_xpt.c:4408
#16 xpt_async_process_dev (device=<optimized out>, arg=<optimized out>)
    at /usr/home/trasz/svn-ssh/head/sys/cam/cam_xpt.c:4315
#17 0xffffffff80341304 in xptdevicetraverse (target=0xfffff80005961300, 
    start_device=<optimized out>, tr_func=0xfffff80005a89000, 
    arg=<optimized out>)
    at /usr/home/trasz/svn-ssh/head/sys/cam/cam_xpt.c:2339
#18 0xffffffff8034108b in xpttargettraverse (bus=0xfffff80005a05300, 
    start_target=<optimized out>, 
    tr_func=0xffffffff803422a0 <xpt_async_process_tgt>, 
    arg=0xfffff80005a89000)
    at /usr/home/trasz/svn-ssh/head/sys/cam/cam_xpt.c:2300
#19 0xffffffff8033dd44 in xpt_async_process (periph=<optimized out>, 
    ccb=0xfffff80005a89000)
    at /usr/home/trasz/svn-ssh/head/sys/cam/cam_xpt.c:4364
#20 0xffffffff8033e700 in xpt_done_process (ccb_h=0xfffff80005a89000)
    at /usr/home/trasz/svn-ssh/head/sys/cam/cam_xpt.c:5495
#21 0xffffffff80340786 in xpt_done_td (arg=0xffffffff81995000 <cam_doneqs>)
    at /usr/home/trasz/svn-ssh/head/sys/cam/cam_xpt.c:5522
#22 0xffffffff80a943d4 in fork_exit (
    callout=0xffffffff80340690 <xpt_done_td>, 
    arg=0xffffffff81995000 <cam_doneqs>, frame=0xfffffe00265b8ac0)
    at /usr/home/trasz/svn-ssh/head/sys/kern/kern_fork.c:1039
#23 <signal handler called>
(kgdb) 

The way to reproduce it is to set up an iSCSI session, disconnect it in a loop ("while :; do ctladm isterminate -a; sleep 1; done"), while having "while :; do diskinfo -t /dev/da0; done" running in another session.
Comment 1 Warner Losh freebsd_committer freebsd_triage 2018-03-13 17:38:16 UTC
OK. XPT is locking the periph lock in xpt_async_process_dev sometimes, but not other times. The following only takes out the lock when we're likely to change the flags. It's an OK workaround for the moment, if it works, so we can validate the funky locking done in xpt_async_process_dev() and stylize the test a bit better...



diff --git a/sys/cam/scsi/scsi_da.c b/sys/cam/scsi/scsi_da.c
index f5be34f96e0..90045ca3e76 100644
--- a/sys/cam/scsi/scsi_da.c
+++ b/sys/cam/scsi/scsi_da.c
@@ -2039,10 +2039,10 @@ daasync(void *callback_arg, u_int32_t code,
                 * Handle all UNIT ATTENTIONs except our own,
                 * as they will be handled by daerror().
                 */
-               cam_periph_lock(periph);
                if (xpt_path_periph(ccb->ccb_h.path) != periph &&
                    scsi_extract_sense_ccb(ccb,
                     &error_code, &sense_key, &asc, &ascq)) {
+                       cam_periph_lock(periph);
                        if (asc == 0x2A && ascq == 0x09) {
                                xpt_print(ccb->ccb_h.path,
                                    "Capacity data has changed\n");
@@ -2057,8 +2057,8 @@ daasync(void *callback_arg, u_int32_t code,
                                softc->flags &= ~DA_FLAG_PROBED;
                                dareprobe(periph);
                        }
+                       cam_periph_unlock(periph);
                }
-               cam_periph_unlock(periph);
                break;
        }
        case AC_SCSI_AEN:
Comment 2 Edward Tomasz Napierala freebsd_committer freebsd_triage 2018-03-14 18:31:56 UTC
Seems to work, thanks!
Comment 3 Warner Losh freebsd_committer freebsd_triage 2018-11-04 18:20:20 UTC
Seems to work for lots of people too... Closing.
Comment 4 commit-hook freebsd_committer freebsd_triage 2022-01-04 02:02:37 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=bb8441184bab60cd8a07c2b94bd6c4ae8b56ec25

commit bb8441184bab60cd8a07c2b94bd6c4ae8b56ec25
Author:     Robert Wing <rew@FreeBSD.org>
AuthorDate: 2022-01-04 01:21:58 +0000
Commit:     Robert Wing <rew@FreeBSD.org>
CommitDate: 2022-01-04 01:56:48 +0000

    cam: don't lock while handling an AC_UNIT_ATTENTION

    Don't take the device_mtx lock in daasync() when handling an
    AC_UNIT_ATTENTION. Instead, assert the lock is held before modifying the
    periph's softc flags.

    The device_mtx lock is taken in xptdevicetraverse() before daasync()
    is eventually called in xpt_async_bcast().

    PR:             240917, 226510, 226578
    Reviewed by:    imp
    MFC after:      3 weeks
    Differential Revision: https://reviews.freebsd.org/D27735

 sys/cam/scsi/scsi_da.c | 15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)
Comment 5 commit-hook freebsd_committer freebsd_triage 2022-02-10 19:43:52 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=583480174ee7f4af92f0c5302884a7eece5b12f3

commit 583480174ee7f4af92f0c5302884a7eece5b12f3
Author:     Robert Wing <rew@FreeBSD.org>
AuthorDate: 2022-01-04 01:21:58 +0000
Commit:     Robert Wing <rew@FreeBSD.org>
CommitDate: 2022-02-10 19:43:18 +0000

    cam: don't lock while handling an AC_UNIT_ATTENTION

    Don't take the device_mtx lock in daasync() when handling an
    AC_UNIT_ATTENTION. Instead, assert the lock is held before modifying the
    periph's softc flags.

    The device_mtx lock is taken in xptdevicetraverse() before daasync()
    is eventually called in xpt_async_bcast().

    PR:             240917, 226510, 226578
    Reviewed by:    imp
    MFC after:      3 weeks
    Differential Revision: https://reviews.freebsd.org/D27735

    (cherry picked from commit bb8441184bab60cd8a07c2b94bd6c4ae8b56ec25)

 sys/cam/scsi/scsi_da.c | 15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)
Comment 6 Kubilay Kocak freebsd_committer freebsd_triage 2022-02-10 23:14:42 UTC
^Triage: Assign to committer that resolves and track stable/* merge.

Does this need to go to stable/12? Unclear from OP what 'CURRENT' was at the time.
Comment 7 Kubilay Kocak freebsd_committer freebsd_triage 2022-02-10 23:21:06 UTC
^Triage: Re-open pending stable/12 merge query.

Other bugs (bug 226510 bug 240917) are/were reported againt current (12.x at the time), will query merge to stable/12 on those too
Comment 8 commit-hook freebsd_committer freebsd_triage 2022-03-08 07:12:34 UTC
A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=1987ff8abca2c9bdff7f385ea2fd1c60cf5b3aeb

commit 1987ff8abca2c9bdff7f385ea2fd1c60cf5b3aeb
Author:     Robert Wing <rew@FreeBSD.org>
AuthorDate: 2022-01-04 01:21:58 +0000
Commit:     Robert Wing <rew@FreeBSD.org>
CommitDate: 2022-03-08 07:07:46 +0000

    cam: don't lock while handling an AC_UNIT_ATTENTION

    Don't take the device_mtx lock in daasync() when handling an
    AC_UNIT_ATTENTION. Instead, assert the lock is held before modifying the
    periph's softc flags.

    The device_mtx lock is taken in xptdevicetraverse() before daasync()
    is eventually called in xpt_async_bcast().

    PR:             240917, 226510, 226578
    Reviewed by:    imp
    Differential Revision: https://reviews.freebsd.org/D27735

    (cherry picked from commit bb8441184bab60cd8a07c2b94bd6c4ae8b56ec25)

 sys/cam/scsi/scsi_da.c | 15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)