The current -CURRENT panics like this: Unread portion of the kernel message buffer: WARNING: 127.0.0.1 (iqn.2012-06.com.example:target0): connection error; reconnecting panic: _mtx_lock_sleep: recursed on non-recursive mutex CAM device lock @ /usr/home/trasz/svn-ssh/head/sys/cam/scsi/scsi_da.c:2042 cpuid = 3 time = 1520959743 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00265b85e0 vpanic() at vpanic+0x18d/frame 0xfffffe00265b8640 vpanic() at vpanic/frame 0xfffffe00265b86c0 __mtx_lock_sleep() at __mtx_lock_sleep+0x420/frame 0xfffffe00265b8750 __mtx_lock_flags() at __mtx_lock_flags+0xf9/frame 0xfffffe00265b87a0 daasync() at daasync+0x117/frame 0xfffffe00265b87f0 xpt_async_process_dev() at xpt_async_process_dev+0x162/frame 0xfffffe00265b8840 xptdevicetraverse() at xptdevicetraverse+0xa4/frame 0xfffffe00265b8890 xpttargettraverse() at xpttargettraverse+0x6b/frame 0xfffffe00265b88d0 xpt_async_process() at xpt_async_process+0x114/frame 0xfffffe00265b89e0 xpt_done_process() at xpt_done_process+0x390/frame 0xfffffe00265b8a20 xpt_done_td() at xpt_done_td+0xf6/frame 0xfffffe00265b8a70 fork_exit() at fork_exit+0x84/frame 0xfffffe00265b8ab0 fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00265b8ab0 --- trap 0, rip = 0, rsp = 0, rbp = 0 --- KDB: enter: panic __curthread () at ./machine/pcpu.h:230 230 __asm("movq %%gs:%1,%0" : "=r" (td) (kgdb) #0 __curthread () at ./machine/pcpu.h:230 #1 doadump (textdump=0) at /usr/home/trasz/svn-ssh/head/sys/kern/kern_shutdown.c:347 #2 0xffffffff8040acfb in db_dump (dummy=<optimized out>, dummy2=<unavailable>, dummy3=<unavailable>, dummy4=<unavailable>) at /usr/home/trasz/svn-ssh/head/sys/ddb/db_command.c:574 #3 0xffffffff8040aac9 in db_command (last_cmdp=<optimized out>, cmd_table=<optimized out>, dopager=<optimized out>) at /usr/home/trasz/svn-ssh/head/sys/ddb/db_command.c:481 #4 0xffffffff8040a844 in db_command_loop () at /usr/home/trasz/svn-ssh/head/sys/ddb/db_command.c:534 #5 0xffffffff8040da6f in db_trap (type=<optimized out>, code=<optimized out>) at /usr/home/trasz/svn-ssh/head/sys/ddb/db_main.c:250 #6 0xffffffff80b192d3 in kdb_trap (type=3, code=-61456, tf=<optimized out>) at /usr/home/trasz/svn-ssh/head/sys/kern/subr_kdb.c:697 #7 0xffffffff80f8b868 in trap (frame=0xfffffe00265b8510) at /usr/home/trasz/svn-ssh/head/sys/amd64/amd64/trap.c:547 #8 <signal handler called> #9 kdb_enter (why=0xffffffff811fcef3 "panic", msg=<optimized out>) at /usr/home/trasz/svn-ssh/head/sys/kern/subr_kdb.c:479 #10 0xffffffff80ad409a in vpanic (fmt=<optimized out>, ap=0xfffffe00265b8680) at /usr/home/trasz/svn-ssh/head/sys/kern/kern_shutdown.c:801 #11 0xffffffff80ad3ef0 in kassert_panic ( fmt=0xffffffff81146b66 "_mtx_lock_sleep: recursed on non-recursive mutex %s @ %s:%d\n") at /usr/home/trasz/svn-ssh/head/sys/kern/kern_shutdown.c:698 #12 0xffffffff80ab3400 in __mtx_lock_sleep (c=0xfffff800054134e8, v=18446735277674057728, opts=<optimized out>, file=<optimized out>, line=32) at /usr/home/trasz/svn-ssh/head/sys/kern/kern_mutex.c:509 #13 0xffffffff80ab2f49 in __mtx_lock_flags (c=0xfffff800054134e8, opts=<optimized out>, file=<optimized out>, line=<optimized out>) at /usr/home/trasz/svn-ssh/head/sys/kern/kern_mutex.c:255 #14 0xffffffff80366cf7 in daasync (callback_arg=0xfffff80005c76200, code=16384, path=<optimized out>, arg=0xfffff800037f2800) at /usr/home/trasz/svn-ssh/head/sys/cam/scsi/scsi_da.c:2042 #15 0xffffffff80342462 in xpt_async_bcast (async_head=<optimized out>, async_code=<optimized out>, path=<optimized out>, async_arg=0xfffff800037f2800) at /usr/home/trasz/svn-ssh/head/sys/cam/cam_xpt.c:4408 #16 xpt_async_process_dev (device=<optimized out>, arg=<optimized out>) at /usr/home/trasz/svn-ssh/head/sys/cam/cam_xpt.c:4315 #17 0xffffffff80341304 in xptdevicetraverse (target=0xfffff80005961300, start_device=<optimized out>, tr_func=0xfffff80005a89000, arg=<optimized out>) at /usr/home/trasz/svn-ssh/head/sys/cam/cam_xpt.c:2339 #18 0xffffffff8034108b in xpttargettraverse (bus=0xfffff80005a05300, start_target=<optimized out>, tr_func=0xffffffff803422a0 <xpt_async_process_tgt>, arg=0xfffff80005a89000) at /usr/home/trasz/svn-ssh/head/sys/cam/cam_xpt.c:2300 #19 0xffffffff8033dd44 in xpt_async_process (periph=<optimized out>, ccb=0xfffff80005a89000) at /usr/home/trasz/svn-ssh/head/sys/cam/cam_xpt.c:4364 #20 0xffffffff8033e700 in xpt_done_process (ccb_h=0xfffff80005a89000) at /usr/home/trasz/svn-ssh/head/sys/cam/cam_xpt.c:5495 #21 0xffffffff80340786 in xpt_done_td (arg=0xffffffff81995000 <cam_doneqs>) at /usr/home/trasz/svn-ssh/head/sys/cam/cam_xpt.c:5522 #22 0xffffffff80a943d4 in fork_exit ( callout=0xffffffff80340690 <xpt_done_td>, arg=0xffffffff81995000 <cam_doneqs>, frame=0xfffffe00265b8ac0) at /usr/home/trasz/svn-ssh/head/sys/kern/kern_fork.c:1039 #23 <signal handler called> (kgdb) The way to reproduce it is to set up an iSCSI session, disconnect it in a loop ("while :; do ctladm isterminate -a; sleep 1; done"), while having "while :; do diskinfo -t /dev/da0; done" running in another session.
OK. XPT is locking the periph lock in xpt_async_process_dev sometimes, but not other times. The following only takes out the lock when we're likely to change the flags. It's an OK workaround for the moment, if it works, so we can validate the funky locking done in xpt_async_process_dev() and stylize the test a bit better... diff --git a/sys/cam/scsi/scsi_da.c b/sys/cam/scsi/scsi_da.c index f5be34f96e0..90045ca3e76 100644 --- a/sys/cam/scsi/scsi_da.c +++ b/sys/cam/scsi/scsi_da.c @@ -2039,10 +2039,10 @@ daasync(void *callback_arg, u_int32_t code, * Handle all UNIT ATTENTIONs except our own, * as they will be handled by daerror(). */ - cam_periph_lock(periph); if (xpt_path_periph(ccb->ccb_h.path) != periph && scsi_extract_sense_ccb(ccb, &error_code, &sense_key, &asc, &ascq)) { + cam_periph_lock(periph); if (asc == 0x2A && ascq == 0x09) { xpt_print(ccb->ccb_h.path, "Capacity data has changed\n"); @@ -2057,8 +2057,8 @@ daasync(void *callback_arg, u_int32_t code, softc->flags &= ~DA_FLAG_PROBED; dareprobe(periph); } + cam_periph_unlock(periph); } - cam_periph_unlock(periph); break; } case AC_SCSI_AEN:
Seems to work, thanks!
Seems to work for lots of people too... Closing.
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=bb8441184bab60cd8a07c2b94bd6c4ae8b56ec25 commit bb8441184bab60cd8a07c2b94bd6c4ae8b56ec25 Author: Robert Wing <rew@FreeBSD.org> AuthorDate: 2022-01-04 01:21:58 +0000 Commit: Robert Wing <rew@FreeBSD.org> CommitDate: 2022-01-04 01:56:48 +0000 cam: don't lock while handling an AC_UNIT_ATTENTION Don't take the device_mtx lock in daasync() when handling an AC_UNIT_ATTENTION. Instead, assert the lock is held before modifying the periph's softc flags. The device_mtx lock is taken in xptdevicetraverse() before daasync() is eventually called in xpt_async_bcast(). PR: 240917, 226510, 226578 Reviewed by: imp MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D27735 sys/cam/scsi/scsi_da.c | 15 +++++---------- 1 file changed, 5 insertions(+), 10 deletions(-)
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=583480174ee7f4af92f0c5302884a7eece5b12f3 commit 583480174ee7f4af92f0c5302884a7eece5b12f3 Author: Robert Wing <rew@FreeBSD.org> AuthorDate: 2022-01-04 01:21:58 +0000 Commit: Robert Wing <rew@FreeBSD.org> CommitDate: 2022-02-10 19:43:18 +0000 cam: don't lock while handling an AC_UNIT_ATTENTION Don't take the device_mtx lock in daasync() when handling an AC_UNIT_ATTENTION. Instead, assert the lock is held before modifying the periph's softc flags. The device_mtx lock is taken in xptdevicetraverse() before daasync() is eventually called in xpt_async_bcast(). PR: 240917, 226510, 226578 Reviewed by: imp MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D27735 (cherry picked from commit bb8441184bab60cd8a07c2b94bd6c4ae8b56ec25) sys/cam/scsi/scsi_da.c | 15 +++++---------- 1 file changed, 5 insertions(+), 10 deletions(-)
^Triage: Assign to committer that resolves and track stable/* merge. Does this need to go to stable/12? Unclear from OP what 'CURRENT' was at the time.
^Triage: Re-open pending stable/12 merge query. Other bugs (bug 226510 bug 240917) are/were reported againt current (12.x at the time), will query merge to stable/12 on those too
A commit in branch stable/12 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=1987ff8abca2c9bdff7f385ea2fd1c60cf5b3aeb commit 1987ff8abca2c9bdff7f385ea2fd1c60cf5b3aeb Author: Robert Wing <rew@FreeBSD.org> AuthorDate: 2022-01-04 01:21:58 +0000 Commit: Robert Wing <rew@FreeBSD.org> CommitDate: 2022-03-08 07:07:46 +0000 cam: don't lock while handling an AC_UNIT_ATTENTION Don't take the device_mtx lock in daasync() when handling an AC_UNIT_ATTENTION. Instead, assert the lock is held before modifying the periph's softc flags. The device_mtx lock is taken in xptdevicetraverse() before daasync() is eventually called in xpt_async_bcast(). PR: 240917, 226510, 226578 Reviewed by: imp Differential Revision: https://reviews.freebsd.org/D27735 (cherry picked from commit bb8441184bab60cd8a07c2b94bd6c4ae8b56ec25) sys/cam/scsi/scsi_da.c | 15 +++++---------- 1 file changed, 5 insertions(+), 10 deletions(-)