Bug 226510 - panic: Re-refing for reason 5, cnt = 1
Summary: panic: Re-refing for reason 5, cnt = 1
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: Robert Wing
URL:
Keywords: crash
Depends on:
Blocks:
 
Reported: 2018-03-11 08:18 UTC by Roman Bogorodskiy
Modified: 2022-03-08 07:20 UTC
3 users

See Also:
koobs: mfc-stable13+
rew: mfc-stable12+


Description Roman Bogorodskiy freebsd_committer freebsd_triage 2018-03-11 08:18:47 UTC
From time to time, my desktop system running a fairly fresh CURRENT panics with the following trace:

(kgdb) bt
#0  __curthread () at ./machine/pcpu.h:230
#1  doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:347
#2  0xffffffff8040a96b in db_dump (dummy=<optimized out>, dummy2=<unavailable>, dummy3=<unavailable>, dummy4=<unavailable>) at /usr/src/sys/ddb/db_command.c:574
#3  0xffffffff8040a739 in db_command (last_cmdp=<optimized out>, cmd_table=<optimized out>, dopager=<optimized out>) at /usr/src/sys/ddb/db_command.c:481
#4  0xffffffff8040a4b4 in db_command_loop () at /usr/src/sys/ddb/db_command.c:534
#5  0xffffffff8040d6df in db_trap (type=<optimized out>, code=<optimized out>) at /usr/src/sys/ddb/db_main.c:250
#6  0xffffffff80b18ff3 in kdb_trap (type=3, code=-61456, tf=<optimized out>) at /usr/src/sys/kern/subr_kdb.c:697
#7  0xffffffff80f8b868 in trap (frame=0xfffffe005b975750) at /usr/src/sys/amd64/amd64/trap.c:547
#8  <signal handler called>
#9  kdb_enter (why=0xffffffff811f9f80 "panic", msg=<optimized out>) at /usr/src/sys/kern/subr_kdb.c:479
#10 0xffffffff80ad3dca in vpanic (fmt=<optimized out>, ap=0xfffffe005b9758c0) at /usr/src/sys/kern/kern_shutdown.c:801
#11 0xffffffff80ad3e53 in panic (fmt=0xffffffff81be33f8 <cnputs_mtx> "\266;\034\201\377\377\377\377") at /usr/src/sys/kern/kern_shutdown.c:739
#12 0xffffffff80369c57 in da_periph_acquire (periph=<optimized out>, token=DA_REF_TUR) at /usr/src/sys/cam/scsi/scsi_da.c:1574
#13 damediapoll (arg=0xfffff80004bf2300) at /usr/src/sys/cam/scsi/scsi_da.c:5668
#14 0xffffffff80aebd00 in softclock_call_cc (c=0xfffffe0069c15a68, cc=0xffffffff81df2ac0 <cc_cpu>, direct=<optimized out>) at /usr/src/sys/kern/kern_timeout.c:731
#15 0xffffffff80aec0cc in softclock (arg=0xffffffff81df2ac0 <cc_cpu>) at /usr/src/sys/kern/kern_timeout.c:869
#16 0xffffffff80a96cc9 in intr_event_execute_handlers (p=<optimized out>, ie=0xfffff80003826200) at /usr/src/sys/kern/kern_intr.c:1338
#17 0xffffffff80a973b7 in ithread_execute_handlers (ie=<optimized out>, p=<optimized out>) at /usr/src/sys/kern/kern_intr.c:1351
#18 ithread_loop (arg=0xfffff8000382f120) at /usr/src/sys/kern/kern_intr.c:1432
#19 0xffffffff80a94104 in fork_exit (callout=0xffffffff80a97300 <ithread_loop>, arg=0xfffff8000382f120, frame=0xfffffe005b975ac0) at /usr/src/sys/kern/kern_fork.c:1039
#20 <signal handler called>
(kgdb)


Sources version:

Last Changed Rev: 330676
Last Changed Date: 2018-03-09 04:50:40 +0400 (Fri, 09 Mar 2018)

The only da(4) device I had when it panicked was:
da0 at umass-sim0 bus 0 scbus4 target 0 lun 0
da0: <Multi Flash Reader 1.00> Removable Direct Access SCSI device
da0: Serial Number 058F0O1111B1
da0: 40.000MB/s transfers
da0: Attempt to query device size failed: NOT READY, Medium not present
da0: quirks=0x2<NO_6_BYTE>

This is a flash card reader. It usually stays connected with no flash card inserted; in other words, I didn't do anything with it that could have triggered the panic.
Comment 1 Andriy Gapon freebsd_committer freebsd_triage 2018-03-11 14:40:08 UTC
What's the panic message?
It would make sense to put it into the bug title.
Comment 2 Roman Bogorodskiy freebsd_committer freebsd_triage 2018-03-11 14:43:02 UTC
(In reply to Andriy Gapon from comment #1)

I've updated the title.

Unread portion of the kernel message buffer:
panic: Re-refing for reason 5, cnt = 1
cpuid = 0
time = 1520754073
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe005b975820
vpanic() at vpanic+0x18d/frame 0xfffffe005b975880
panic() at panic+0x43/frame 0xfffffe005b9758e0
damediapoll() at damediapoll+0xa7/frame 0xfffffe005b975900
softclock_call_cc() at softclock_call_cc+0x150/frame 0xfffffe005b9759b0
softclock() at softclock+0x7c/frame 0xfffffe005b9759e0
intr_event_execute_handlers() at intr_event_execute_handlers+0x99/frame 0xfffffe005b975a20
ithread_loop() at ithread_loop+0xb7/frame 0xfffffe005b975a70
fork_exit() at fork_exit+0x84/frame 0xfffffe005b975ab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe005b975ab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
Comment 3 Andriy Gapon freebsd_committer freebsd_triage 2018-03-11 15:14:30 UTC
I don't know the design behind da_periph_acquire, but I suspect there could be a collision between the periodic damediapoll() call and, say, an AC_SCSI_AEN notification coming from the device.
Comment 4 Warner Losh freebsd_committer freebsd_triage 2018-03-12 14:38:42 UTC
There have been other reports.

https://reviews.freebsd.org/D14456

has the locking fix for them. This bug is almost certainly related.

The panic means that, for some reason, we're taking out a second reference because we want to do a TEST UNIT READY (TUR). We should only ever have one reference.
I've updated that review since I think this error shows I missed a spot.
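The invariant Warner describes (at most one reference per reason) can be sketched as a small user-space model. It is a hypothetical illustration in the spirit of da_periph_acquire()/da_periph_release(), not the scsi_da.c code: the names are invented, and the token numbering is an assumption chosen so that REF_TUR is 5, matching "reason 5" in the panic string. Where the kernel panics, the model prints the same message and returns -1.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical model of per-reason reference tracking. The numbering is an
 * assumption chosen so that REF_TUR == 5, matching "reason 5" in the panic.
 */
enum ref_token {
	REF_OPEN = 1,
	REF_OPEN_HOLD,
	REF_CLOSE_HOLD,
	REF_PROBE_HOLD,
	REF_TUR,		/* == 5 */
	REF_MAX
};

struct model_softc {
	int total_refs;			/* stands in for the periph refcount */
	int ref_flags[REF_MAX];		/* one counter per reason */
};

/*
 * Take a reference for one reason. Each reason may hold at most one
 * reference at a time; a second acquire is the "Re-refing" condition.
 * The kernel panics there; this model prints the same message and
 * returns -1 without taking the extra reference.
 */
static int
model_acquire(struct model_softc *sc, enum ref_token token)
{
	int cnt = sc->ref_flags[token];

	if (cnt != 0) {
		printf("Re-refing for reason %d, cnt = %d\n", (int)token, cnt);
		return (-1);
	}
	sc->ref_flags[token] = 1;
	sc->total_refs++;
	return (0);
}

/* Drop the reference held for one reason. */
static void
model_release(struct model_softc *sc, enum ref_token token)
{
	/* Releasing a reason that holds no reference is also a bug. */
	assert(sc->ref_flags[token] == 1);
	sc->ref_flags[token] = 0;
	sc->total_refs--;
}
```

Acquiring REF_TUR twice without an intervening release prints "Re-refing for reason 5, cnt = 1", the same text as the panic string.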
Comment 5 commit-hook freebsd_committer freebsd_triage 2018-03-12 15:18:06 UTC
A commit references this bug:

Author: imp
Date: Mon Mar 12 15:17:16 UTC 2018
New revision: 330796
URL: https://svnweb.freebsd.org/changeset/base/330796

Log:
  Tighten up periph lock to avoid some races

  Make sure the periph lock is held around rmw access to softc data,
  especially flags, including work flags in iosched.
  Add asserts for the periph lock where it should be held.

  PR: 226510
  Sponsored by: Netflix
  Differential Review: https://reviews.freebsd.org/D14456

Changes:
  head/sys/cam/scsi/scsi_da.c
Comment 6 Roman Bogorodskiy freebsd_committer freebsd_triage 2018-03-12 16:18:09 UTC
(In reply to Warner Losh from comment #4)

Thanks, I'll try to update tomorrow and see if these panics show up again.
Comment 7 Roman Bogorodskiy freebsd_committer freebsd_triage 2018-03-15 05:49:19 UTC
Hm, the new kernel fails to boot with:

https://people.freebsd.org/~novel/misc/panic_mar14.jpg
Comment 8 Warner Losh freebsd_committer freebsd_triage 2018-03-15 14:20:38 UTC
What's the exact version you tested? I fixed a similar panic.
Comment 9 Roman Bogorodskiy freebsd_committer freebsd_triage 2018-03-15 16:03:42 UTC
(In reply to Warner Losh from comment #8)

Unfortunately, I didn't record the exact version, that was from Tuesday or Wednesday. I've updated to r330969, will boot into it tomorrow.
Comment 10 Roman Bogorodskiy freebsd_committer freebsd_triage 2018-03-16 07:21:04 UTC
Still panics on boot:

12.0-CURRENT #7 r330969: Thu Mar 15 11:42:57 +04 2018

It boots fine with the 'da' device detached, though.
Comment 11 Warner Losh freebsd_committer freebsd_triage 2018-03-16 18:27:23 UTC
Does this help?
diff --git a/sys/cam/scsi/scsi_da.c b/sys/cam/scsi/scsi_da.c
index c6941990a8df..4bcddbb8dff9 100644
--- a/sys/cam/scsi/scsi_da.c
+++ b/sys/cam/scsi/scsi_da.c
@@ -2039,26 +2039,30 @@ daasync(void *callback_arg, u_int32_t code,
                 * Handle all UNIT ATTENTIONs except our own,
                 * as they will be handled by daerror().
                 */
-               cam_periph_lock(periph);
                if (xpt_path_periph(ccb->ccb_h.path) != periph &&
                    scsi_extract_sense_ccb(ccb,
                     &error_code, &sense_key, &asc, &ascq)) {
                        if (asc == 0x2A && ascq == 0x09) {
                                xpt_print(ccb->ccb_h.path,
                                    "Capacity data has changed\n");
+                               cam_periph_lock(periph);
                                softc->flags &= ~DA_FLAG_PROBED;
+                               cam_periph_unlock(periph);
                                dareprobe(periph);
                        } else if (asc == 0x28 && ascq == 0x00) {
+                               cam_periph_lock(periph);
                                softc->flags &= ~DA_FLAG_PROBED;
+                               cam_periph_unlock(periph);
                                disk_media_changed(softc->disk, M_NOWAIT);
                        } else if (asc == 0x3F && ascq == 0x03) {
                                xpt_print(ccb->ccb_h.path,
                                    "INQUIRY data has changed\n");
+                               cam_periph_lock(periph);
                                softc->flags &= ~DA_FLAG_PROBED;
+                               cam_periph_unlock(periph);
                                dareprobe(periph);
                        }
                }
-               cam_periph_unlock(periph);
                break;
        }
        case AC_SCSI_AEN:
Comment 12 commit-hook freebsd_committer freebsd_triage 2018-03-17 16:04:21 UTC
A commit references this bug:

Author: imp
Date: Sat Mar 17 16:04:06 UTC 2018
New revision: 331097
URL: https://svnweb.freebsd.org/changeset/base/331097

Log:
  Only take out the periph lock when we're modifying the flags of the
  softc for an async unit attention. CAM locks, sometimes, the periph
  lock and other times does not. We were taking the lock always and
  running into lock recursion issues on a non-recursive lock. Now we
  take it selectively. It's not clear why xpt takes the lock selectively
  before calling us, though, and that's still under investigation.

  Reported by:	avg
  PR:		226510 (same panic, different circumstances)
  Sponsored by:	Netflix

Changes:
  head/sys/cam/scsi/scsi_da.c
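The lock recursion this commit describes can be sketched with a hypothetical owner-tracking mutex. This is a simplified model, not the kernel's mtx(9) implementation: threads are integer IDs, nothing blocks, and model_lock() returns false where the kernel would panic on recursion. handler_always_locks() stands in for the unconditional cam_periph_lock() that the diff in comment 11 removed; it works only when its caller has not already taken the lock.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of a non-recursive mutex that remembers its owner.
 * Threads are plain integer IDs and nothing ever blocks; the point is only
 * to show how re-locking by the owner is detected.
 */
struct model_mtx {
	int owner;		/* 0 means unowned */
};

/* Returns false where a kernel mutex would panic on recursion. */
static bool
model_lock(struct model_mtx *m, int tid)
{
	if (m->owner == tid)
		return (false);		/* recursed on a non-recursive lock */
	assert(m->owner == 0);		/* contention is outside this model */
	m->owner = tid;
	return (true);
}

static void
model_unlock(struct model_mtx *m, int tid)
{
	assert(m->owner == tid);
	m->owner = 0;
}

/*
 * Handler that always takes the lock before touching the flags. It only
 * succeeds when the caller did not already hold the lock.
 */
static bool
handler_always_locks(struct model_mtx *m, int tid, unsigned *flags,
    unsigned bit)
{
	if (!model_lock(m, tid))
		return (false);
	*flags &= ~bit;
	model_unlock(m, tid);
	return (true);
}
```

Called from an unlocked context the handler clears the bit; called while the same thread already owns the lock, it fails, which is the situation the commit log attributes to CAM locking "sometimes" before calling daasync().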
Comment 13 Roman Bogorodskiy freebsd_committer freebsd_triage 2018-03-17 16:32:22 UTC
(In reply to Warner Losh from comment #11)

Sorry for the delay; I see it's already committed. I have a kernel with this fix installed, but I'm running a long poudriere build that I don't want to interrupt. I'll reboot once it's done and let you know whether this works.
Comment 14 Roman Bogorodskiy freebsd_committer freebsd_triage 2018-03-19 05:44:54 UTC
FreeBSD 12.0-CURRENT #9 r331097 boots fine with the device attached. I haven't yet checked whether the device operates properly, though.
Comment 15 Roman Bogorodskiy freebsd_committer freebsd_triage 2018-03-20 08:22:46 UTC
(In reply to Roman Bogorodskiy from comment #14)

One more panic:

Tue Mar 20 12:18:17 +04 2018

FreeBSD kloomba 12.0-CURRENT FreeBSD 12.0-CURRENT #9 r331097: Sat Mar 17 22:30:10 +04 2018     root@romashka:/usr/obj/usr/src/amd64.amd64/sys/GENERIC  amd64

panic: Re-refing for reason 5, cnt = 1

GNU gdb (GDB) 8.0.1 [GDB v8.0.1 for FreeBSD]
Reading symbols from /boot/kernel/kernel...Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug...done.
done.

Unread portion of the kernel message buffer:
panic: Re-refing for reason 5, cnt = 1
cpuid = 1
time = 1521533405
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe005b97f820
vpanic() at vpanic+0x18d/frame 0xfffffe005b97f880
panic() at panic+0x43/frame 0xfffffe005b97f8e0
damediapoll() at damediapoll+0xa7/frame 0xfffffe005b97f900
softclock_call_cc() at softclock_call_cc+0x150/frame 0xfffffe005b97f9b0
softclock() at softclock+0x7c/frame 0xfffffe005b97f9e0
intr_event_execute_handlers() at intr_event_execute_handlers+0x99/frame 0xfffffe005b97fa20
ithread_loop() at ithread_loop+0xb7/frame 0xfffffe005b97fa70
fork_exit() at fork_exit+0x84/frame 0xfffffe005b97fab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe005b97fab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic

__curthread () at ./machine/pcpu.h:230
230     ./machine/pcpu.h: No such file or directory.
(kgdb) #0  __curthread () at ./machine/pcpu.h:230
#1  doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:347
#2  0xffffffff8041d36b in db_dump (dummy=<optimized out>, 
    dummy2=<unavailable>, dummy3=<unavailable>, dummy4=<unavailable>)
    at /usr/src/sys/ddb/db_command.c:574
#3  0xffffffff8041d139 in db_command (last_cmdp=<optimized out>, 
    cmd_table=<optimized out>, dopager=<optimized out>)
    at /usr/src/sys/ddb/db_command.c:481
#4  0xffffffff8041ceb4 in db_command_loop ()
    at /usr/src/sys/ddb/db_command.c:534
#5  0xffffffff804200df in db_trap (type=<optimized out>, code=<optimized out>)
    at /usr/src/sys/ddb/db_main.c:250
#6  0xffffffff80b2c063 in kdb_trap (type=3, code=-61456, tf=<optimized out>)
    at /usr/src/sys/kern/subr_kdb.c:697
#7  0xffffffff80f9e868 in trap (frame=0xfffffe005b97f750)
    at /usr/src/sys/amd64/amd64/trap.c:547
#8  <signal handler called>
#9  kdb_enter (why=0xffffffff8120cb7e "panic", msg=<optimized out>)
    at /usr/src/sys/kern/subr_kdb.c:479
#10 0xffffffff80ae6e2a in vpanic (fmt=<optimized out>, ap=0xfffffe005b97f8c0)
    at /usr/src/sys/kern/kern_shutdown.c:801
#11 0xffffffff80ae6eb3 in panic (
    fmt=0xffffffff81be33f8 <cnputs_mtx> "|f\035\201\377\377\377\377")
    at /usr/src/sys/kern/kern_shutdown.c:739
#12 0xffffffff8036a187 in da_periph_acquire (periph=<optimized out>, 
    token=DA_REF_TUR) at /usr/src/sys/cam/scsi/scsi_da.c:1574
#13 damediapoll (arg=0xfffff80004c68c00)
    at /usr/src/sys/cam/scsi/scsi_da.c:5704
#14 0xffffffff80afed60 in softclock_call_cc (c=0xfffffe0069ea9a68, 
    cc=0xffffffff81df2b00 <cc_cpu>, direct=<optimized out>)
    at /usr/src/sys/kern/kern_timeout.c:731
#15 0xffffffff80aff12c in softclock (arg=0xffffffff81df2b00 <cc_cpu>)
    at /usr/src/sys/kern/kern_timeout.c:869
#16 0xffffffff80aa9d29 in intr_event_execute_handlers (p=<optimized out>, 
    ie=0xfffff80003805b00) at /usr/src/sys/kern/kern_intr.c:1338
#17 0xffffffff80aaa417 in ithread_execute_handlers (ie=<optimized out>, 
    p=<optimized out>) at /usr/src/sys/kern/kern_intr.c:1351
#18 ithread_loop (arg=0xfffff800038200e0)
    at /usr/src/sys/kern/kern_intr.c:1432
#19 0xffffffff80aa7164 in fork_exit (
    callout=0xffffffff80aaa360 <ithread_loop>, arg=0xfffff800038200e0, 
    frame=0xfffffe005b97fac0) at /usr/src/sys/kern/kern_fork.c:1039
#20 <signal handler called>
(kgdb)
Comment 16 Warner Losh freebsd_committer freebsd_triage 2018-03-20 20:23:36 UTC
It looks like we're not releasing the DA_REF_TUR reference. When there's no bio, we clear the DA_WORK_TUR work flag but don't release the reference; elsewhere we check whether the flag is set, clear it, and release. This isn't going to work out too well, so always release when we clear the work flag.


diff --git a/sys/cam/scsi/scsi_da.c b/sys/cam/scsi/scsi_da.c
index 4bcddbb8dff9..beeda8d90f79 100644
--- a/sys/cam/scsi/scsi_da.c
+++ b/sys/cam/scsi/scsi_da.c
@@ -3114,6 +3114,7 @@ dastart(struct cam_periph *periph, union ccb *start_ccb)
                if (bp == NULL) {
                        if (cam_iosched_has_work_flags(softc->cam_iosched, DA_WORK_TUR)) {
                                cam_iosched_clr_work_flags(softc->cam_iosched, DA_WORK_TUR);
+                               da_periph_release_locked(periph, DA_REF_TUR);
                                scsi_test_unit_ready(&start_ccb->csio,
                                     /*retries*/ da_retry_count,
                                     dadone,
@@ -3139,11 +3140,6 @@ dastart(struct cam_periph *periph, union ccb *start_ccb)
                        }
                }

-               if (cam_iosched_has_work_flags(softc->cam_iosched, DA_WORK_TUR)) {
-                       cam_iosched_clr_work_flags(softc->cam_iosched, DA_WORK_TUR);
-                       da_periph_release_locked(periph, DA_REF_TUR);
-               }
-
                if ((bp->bio_flags & BIO_ORDERED) != 0 ||
                    (softc->flags & DA_FLAG_NEED_OTAG) != 0) {
                        softc->flags &= ~DA_FLAG_NEED_OTAG;
Comment 17 commit-hook freebsd_committer freebsd_triage 2018-03-20 22:08:39 UTC
A commit references this bug:

Author: imp
Date: Tue Mar 20 22:07:45 UTC 2018
New revision: 331273
URL: https://svnweb.freebsd.org/changeset/base/331273

Log:
  Release the "TUR" reference when clearing the TUR work flag. We mostly
  do this right, except when there's no BP and we do a TUR by request.
  In that case, we clear the flag, but don't release the reference,
  leaking the reference on rare occasion.

  PR: 226510
  Sponsored by: Netflix

Changes:
  head/sys/cam/scsi/scsi_da.c
Comment 18 Roman Bogorodskiy freebsd_committer freebsd_triage 2018-03-21 09:57:24 UTC
(In reply to commit-hook from comment #17)

This panics on boot with the device attached: https://people.freebsd.org/~novel/misc/panic_mar21.jpg

FreeBSD 12.0-CURRENT #10 r331284
Comment 19 commit-hook freebsd_committer freebsd_triage 2018-03-23 16:23:54 UTC
A commit references this bug:

Author: imp
Date: Fri Mar 23 16:23:15 UTC 2018
New revision: 331435
URL: https://svnweb.freebsd.org/changeset/base/331435

Log:
  Flag when we have a pending TUR. Don't schedule another one when we
  have one pending. Otherwise, we can race and send two, which is
  wasteful in close proximity. It can also cause the acquire/release
  count for TUR to be > 1, which is unexpected.

  PR: 226510
  Differential Review: https://reviews.freebsd.org/D14792

Changes:
  head/sys/cam/scsi/scsi_da.c
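The "pending TUR" idea in this commit can be sketched as a guard flag checked before scheduling. This is a hypothetical single-threaded model: the names are invented, and it assumes (this report does not confirm it) that both the pending flag and the TUR reference are dropped when the TUR completes. Two back-to-back polls schedule only one TUR, so the TUR reference count never exceeds one.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model: remember that a TUR is already scheduled or in
 * flight, and skip scheduling another until it completes.
 */
struct poll_model {
	bool tur_pending;	/* a TUR is scheduled and not yet completed */
	int tur_refs;		/* references held for the TUR reason */
	int turs_scheduled;	/* how many TURs were scheduled in total */
};

/* Periodic media-poll callout. */
static void
media_poll(struct poll_model *m)
{
	if (m->tur_pending)
		return;			/* one outstanding TUR is enough */
	m->tur_pending = true;
	m->tur_refs++;
	assert(m->tur_refs == 1);	/* the invariant the panic enforces */
	m->turs_scheduled++;
}

/* Assumed completion step: drop the reference and allow the next poll. */
static void
tur_done(struct poll_model *m)
{
	assert(m->tur_pending && m->tur_refs == 1);
	m->tur_refs--;
	m->tur_pending = false;
}
```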
Comment 20 Oleksandr Tymoshenko freebsd_committer freebsd_triage 2019-01-21 21:58:35 UTC
There is a commit referencing this PR, but it's still not closed and has been inactive for some time. Closing the PR as fixed but feel free to re-open it if the issue hasn't been completely resolved.

Thanks
Comment 21 Eric van Gyzen freebsd_committer freebsd_triage 2019-04-16 17:04:46 UTC
For the record, here are the commits that seem to be related to this PR, including follow-up commits to fix regressions. I'm recording them here just because I had to find them for myself.

commit 02d268dd2672cab6c99d55edc230623ab60acf3f
Author: imp <imp@FreeBSD.org>
Date:   Mon Mar 12 15:17:16 2018 +0000

    Tighten up periph lock to avoid some races
    
    Make sure the periph lock is held around rmw access to softc data,
    especially flags, including work flags in iosched.
    Add asserts for the periph lock where it should be held.
    
    PR: 226510
    Sponsored by: Netflix
    Differential Review: https://reviews.freebsd.org/D14456

Notes (freebsd):
    svn path=/head/; revision=330796

commit bf523f13ef5a3a6d06e76be0df100ac13b0d1d11
Author: imp <imp@FreeBSD.org>
Date:   Sat Mar 17 16:04:06 2018 +0000

    Only take out the periph lock when we're modifying the flags of the
    softc for an async unit attention. CAM locks, sometimes, the periph
    lock and other times does not. We were taking the lock always and
    running into lock recursion issues on a non-recursive lock. Now we
    take it selectively. It's not clear why xpt takes the lock selectively
    before calling us, though, and that's still under investigation.

    Reported by:    avg
    PR:             226510 (same panic, different circumstances)
    Sponsored by:   Netflix

Notes (freebsd):
    svn path=/head/; revision=331097

commit c2ed5522d0e7837332d4dfcad73179f6f0df45c2
Author: imp <imp@FreeBSD.org>
Date:   Tue Mar 20 22:07:45 2018 +0000

    Release the "TUR" reference when clearing the TUR work flag. We mostly
    do this right, except when there's no BP and we do a TUR by request.
    In that case, we clear the flag, but don't release the reference,
    leaking the reference on rare occasion.

    PR: 226510
    Sponsored by: Netflix

Notes (freebsd):
    svn path=/head/; revision=331273

commit 20eb8298f5923ea3ab2734cd24f8ee0f12cf8b98
Author: imp <imp@FreeBSD.org>
Date:   Wed Mar 21 12:55:59 2018 +0000

    Revert r331273: "Release the "TUR" reference when clearing the TUR work flag. We mostly"

    It exposes other issues, so revert to the previous state of known issues.

Notes (freebsd):
    svn path=/head/; revision=331291

commit 685a9276f2ecb16a977f044eda1490e1f243a043
Author: imp <imp@FreeBSD.org>
Date:   Fri Mar 23 16:23:15 2018 +0000

    Flag when we have a pending TUR. Don't schedule another one when we
    have one pending. Otherwise, we can race and send two, which is
    wasteful in close proximity. It can also cause the acquire/release
    count for TUR to be > 1, which is unexpected.

    PR: 226510
    Differential Review: https://reviews.freebsd.org/D14792

Notes (freebsd):
    svn path=/head/; revision=331435

commit 896df23a52b2a955b338a931ea514c44aec48cba
Author: ken <ken@FreeBSD.org>
Date:   Thu Jun 14 17:08:44 2018 +0000

    Fix da(4) locking when probing SMR drives.

    Probing host aware and host managed SMR drives got broken in revision
    330796.

    The added cam_periph_lock() calls were in areas in dadone() where
    the peripheral lock was already held.

    Since then, dadone() has been split into separate functions that are
    dedicated to each probe state.

    The result is that when probing a host aware drive, I ran into a recursive
    lock acquisition in dadone_probeatalogdir(). I would have run into the
    same problem in dadone_probeataiddir(), and in dadone_probeatasup() and
    dadone_probeatazone() in the error paths had the probe continued.

    The solution is to take out all of the extra cam_periph_lock() calls. I
    also added cam_periph_assert(periph, MA_OWNED) near the top of each of
    the dadone_* calls. These make it clear to anyone coming along in
    the future that the lock is held in the probe done functions.

    Also add a locking assert in daprobedone(), to make it clear that it must
    be called with the periph lock held.

    Sponsored by:   Spectra Logic
    Differential Revision:  https://reviews.freebsd.org/D15764

Notes (freebsd):
    svn path=/head/; revision=335154

commit 92253110610c28fc34b45c0c6894294395f480bd
Author: imp <imp@FreeBSD.org>
Date:   Mon Nov 5 18:47:29 2018 +0000

    Only assert locked for many async events.
    
    Many async events that we see are called for this specific path. When
    calling an async callback for a targeted device, XPT will lock that
    specific device's path lock (same as what cam_periph_lock does). For
    those AC_ events, assert we have the lock rather than trying to
    recursively take it (which causes panics since it's not recursive).
    
    Add annotations about this and about the fact that AC_SCSI_AEN events
    are generated now only in the ata stack (which cannot have a scsi_da
    attachment). Leave it in place in case I've overlooked something as
    the code is harmless.
    
    This is fallout from my attempts to "fix" locking for softc->flags in
    r330796 that's not been triggered often enough to get my attention
    until now.
    
    Sponsored by: Netflix
    MFC After: 3 days
    Differential Revision: https://reviews.freebsd.org/D17837

Notes (freebsd):
    svn path=/head/; revision=340155
Comment 22 commit-hook freebsd_committer freebsd_triage 2022-01-04 02:02:36 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=bb8441184bab60cd8a07c2b94bd6c4ae8b56ec25

commit bb8441184bab60cd8a07c2b94bd6c4ae8b56ec25
Author:     Robert Wing <rew@FreeBSD.org>
AuthorDate: 2022-01-04 01:21:58 +0000
Commit:     Robert Wing <rew@FreeBSD.org>
CommitDate: 2022-01-04 01:56:48 +0000

    cam: don't lock while handling an AC_UNIT_ATTENTION

    Don't take the device_mtx lock in daasync() when handling an
    AC_UNIT_ATTENTION. Instead, assert the lock is held before modifying the
    periph's softc flags.

    The device_mtx lock is taken in xptdevicetraverse() before daasync()
    is eventually called in xpt_async_bcast().

    PR:             240917, 226510, 226578
    Reviewed by:    imp
    MFC after:      3 weeks
    Differential Revision: https://reviews.freebsd.org/D27735

 sys/cam/scsi/scsi_da.c | 15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)
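The locking contract described in this commit can be sketched as a simplified single-threaded model: the traversal (xptdevicetraverse() in the kernel, modeled by the hypothetical traverse_and_notify()) takes device_mtx before running the callback, so the daasync()-like callback asserts ownership, much as cam_periph_assert(periph, MA_OWNED) does, instead of locking again. The lock is just an owner ID, and every name here is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-device state: a lock owner ID and the softc flags. */
struct dev_model {
	int mtx_owner;		/* 0 = unowned; stands in for device_mtx */
	unsigned flags;		/* stands in for softc->flags */
};

#define MODEL_FLAG_PROBED 0x1u

/* Callback body: modify flags only under a lock the caller already holds. */
static void
async_unit_attention(struct dev_model *d, int tid)
{
	/* Plays the role of cam_periph_assert(periph, MA_OWNED). */
	assert(d->mtx_owner == tid);
	d->flags &= ~MODEL_FLAG_PROBED;
}

/* Traversal: take the lock, run the callback, drop the lock. */
static void
traverse_and_notify(struct dev_model *d, int tid,
    void (*cb)(struct dev_model *, int))
{
	assert(d->mtx_owner == 0);
	d->mtx_owner = tid;
	cb(d, tid);
	d->mtx_owner = 0;
}
```

Because the callback never locks, there is no recursion to hit; because it asserts, a future caller that forgets to take the lock is caught immediately rather than racing on the flags.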
Comment 23 commit-hook freebsd_committer freebsd_triage 2022-02-10 19:43:51 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=583480174ee7f4af92f0c5302884a7eece5b12f3

commit 583480174ee7f4af92f0c5302884a7eece5b12f3
Author:     Robert Wing <rew@FreeBSD.org>
AuthorDate: 2022-01-04 01:21:58 +0000
Commit:     Robert Wing <rew@FreeBSD.org>
CommitDate: 2022-02-10 19:43:18 +0000

    cam: don't lock while handling an AC_UNIT_ATTENTION

    Don't take the device_mtx lock in daasync() when handling an
    AC_UNIT_ATTENTION. Instead, assert the lock is held before modifying the
    periph's softc flags.

    The device_mtx lock is taken in xptdevicetraverse() before daasync()
    is eventually called in xpt_async_bcast().

    PR:             240917, 226510, 226578
    Reviewed by:    imp
    MFC after:      3 weeks
    Differential Revision: https://reviews.freebsd.org/D27735

    (cherry picked from commit bb8441184bab60cd8a07c2b94bd6c4ae8b56ec25)

 sys/cam/scsi/scsi_da.c | 15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)
Comment 24 Kubilay Kocak freebsd_committer freebsd_triage 2022-02-10 23:24:54 UTC
^Triage: Assign to committer that resolved (last reference) and track stable/* merge (so far).

Does this need to go to stable/12? This issue was a report against 12.0 (CURRENT). 

Will leave this issue closed, but if/when merged, please set the mfc-stable12 flag to + and reference this issue in the merge commit log so the merge is tracked in all issues.
Comment 25 commit-hook freebsd_committer freebsd_triage 2022-03-08 07:12:37 UTC
A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=1987ff8abca2c9bdff7f385ea2fd1c60cf5b3aeb

commit 1987ff8abca2c9bdff7f385ea2fd1c60cf5b3aeb
Author:     Robert Wing <rew@FreeBSD.org>
AuthorDate: 2022-01-04 01:21:58 +0000
Commit:     Robert Wing <rew@FreeBSD.org>
CommitDate: 2022-03-08 07:07:46 +0000

    cam: don't lock while handling an AC_UNIT_ATTENTION

    Don't take the device_mtx lock in daasync() when handling an
    AC_UNIT_ATTENTION. Instead, assert the lock is held before modifying the
    periph's softc flags.

    The device_mtx lock is taken in xptdevicetraverse() before daasync()
    is eventually called in xpt_async_bcast().

    PR:             240917, 226510, 226578
    Reviewed by:    imp
    Differential Revision: https://reviews.freebsd.org/D27735

    (cherry picked from commit bb8441184bab60cd8a07c2b94bd6c4ae8b56ec25)

 sys/cam/scsi/scsi_da.c | 15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)