Bug 219857

Summary: panic in scsi_cd code
Product: Base System Reporter: Andriy Gapon <avg>
Component: kernAssignee: Kenneth D. Merry <ken>
Status: Closed FIXED    
Severity: Affects Only Me CC: avos, bugzilla.freebsd, decui, ken, leandro.lupori, mmance, scsi, weh, whu
Priority: ---    
Version: CURRENT   
Hardware: Any   
OS: Any   
Attachments:
Description Flags
Proposed patch to make the media check async
none
The good log
none
The badlog
none
Fix system hang on boot none

Description Andriy Gapon freebsd_committer freebsd_triage 2017-06-08 08:53:58 UTC
The following panic happened right after cdrecord quit with an error while writing to a CD-RW disk.

(cd0:ahcich10:0:0:0): READ(10). CDB: 28 00 00 00 00 00 00 00 01 00 
(cd0:ahcich10:0:0:0): CAM status: SCSI Status Error
(cd0:ahcich10:0:0:0): SCSI status: Check Condition
(cd0:ahcich10:0:0:0): SCSI sense: ILLEGAL REQUEST asc:21,0 (Logical block address out of range)
(cd0:ahcich10:0:0:0): Error 22, Unretryable error
(cd0:ahcich10:0:0:0): cddone: got error 0x16 back

panic: sleepq_add: td 0xfffff800060bf000 to sleep on wchan 0xfffff801b3f1a048 with sleeping prohibited
cpuid = 4
time = 1496908339
KDB: stack backtrace:
db_trace_self_wrapper() at 0xffffffff80439f4b = db_trace_self_wrapper+0x2b/frame 0xfffffe01ef5da770
kdb_backtrace() at 0xffffffff8068b369 = kdb_backtrace+0x39/frame 0xfffffe01ef5da820
vpanic() at 0xffffffff806521f2 = vpanic+0x162/frame 0xfffffe01ef5da860
panic() at 0xffffffff80651f23 = panic+0x43/frame 0xfffffe01ef5da8c0
sleepq_add() at 0xffffffff806979ff = sleepq_add+0x30f/frame 0xfffffe01ef5da910
_sleep() at 0xffffffff8065c0de = _sleep+0x2ae/frame 0xfffffe01ef5da9b0
cam_periph_ccbwait() at 0xffffffff802aac9d = cam_periph_ccbwait+0x5d/frame 0xfffffe01ef5da9e0
cam_periph_runccb() at 0xffffffff802aaac8 = cam_periph_runccb+0xb8/frame 0xfffffe01ef5daa40
cdrunccb() at 0xffffffff802cab8c = cdrunccb+0x3c/frame 0xfffffe01ef5daa60
cdprevent() at 0xffffffff802ca48b = cdprevent+0x7b/frame 0xfffffe01ef5daaa0
cdcheckmedia() at 0xffffffff802ca132 = cdcheckmedia+0x22/frame 0xfffffe01ef5daaf0
cdstrategy() at 0xffffffff802c8356 = cdstrategy+0x56/frame 0xfffffe01ef5dab20
g_disk_start() at 0xffffffff805c38f0 = g_disk_start+0x180/frame 0xfffffe01ef5dab70
g_io_schedule_down() at 0xffffffff805c760b = g_io_schedule_down+0x1eb/frame 0xfffffe01ef5dab90
g_down_procbody() at 0xffffffff805c7fed = g_down_procbody+0x6d/frame 0xfffffe01ef5daba0
fork_exit() at 0xffffffff80618680 = fork_exit+0xd0/frame 0xfffffe01ef5dabf0
fork_trampoline() at 0xffffffff8084304e = fork_trampoline+0xe/frame 0xfffffe01ef5dabf0
Comment 1 Kenneth D. Merry freebsd_committer freebsd_triage 2017-06-13 21:32:50 UTC
I can see why this happened.  It is a little strange that I don't recall seeing this before, since the media check code was added in 2003, and it appears that sleeping has been prohibited for a similar amount of time.

The problem is:

1.  In g_io_schedule_down(), THREAD_NO_SLEEPING() is called before the start routine is called.

2.  In cdstrategy(), if the CD_FLAG_VALID_MEDIA flag isn't set, the cd(4) driver probes to see if there is valid media in the drive.

3.  In cdcheckmedia(), the cd(4) driver first prevents CD removal by calling cdprevent().

4.  In cdprevent(), the cd(4) driver calls cdrunccb(), which calls cam_periph_runccb(), which sleeps and runs into the panic.

In most cases, this won't happen because the cd(4) driver checks for media on open(), and prevents removal when it finds media.  The driver does allow the open() when no media is present, so that the user can call the CDIOCCLOSE and CDIOCEJECT ioctls.

One possible solution is just failing the I/O in cdstrategy() if the CD_FLAG_VALID_MEDIA flag is not set.  We would need to add code that checks for media if the user does a CDIOCCLOSE ioctl.

The challenge there, however, is that it would not catch the case where the user physically closes the drive while they have the device open and then sends a read.

In order to handle that case, and not sleep underneath cdstrategy(), we would need to do an asynchronous media check.

The way I would probably do it is to add a series of states into cdstart() and cddone() that mirror the way that cdcheckmedia() operates.  So, prevent, read capacity, read TOC, and so on.  We would still call xpt_schedule() from cdstrategy(), but cdstart() would go into a media probe state machine to detect the media and once it has validated the media or not, satisfy the I/O or send it back with an error.

As a bonus, we would re-implement the existing code so that it can work either asynchronously or synchronously.  (To avoid duplicate code paths.)
Comment 2 Andriy Gapon freebsd_committer freebsd_triage 2018-03-28 10:09:07 UTC
*** Bug 226870 has been marked as a duplicate of this bug. ***
Comment 3 Dexuan Cui 2018-06-04 04:14:42 UTC
Our test team sees a similar call-trace:

panic: sleepq_add: td 0xfffff8010581a000 to sleep on wchan 0xfffff8010f924048 with sleeping prohibited
cpuid = 1
time = 1527752111
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00005815e0
vpanic() at vpanic+0x1a3/frame 0xfffffe0000581640
doadump() at doadump/frame 0xfffffe00005816c0
sleepq_add() at sleepq_add+0x378/frame 0xfffffe0000581710
_sleep() at _sleep+0x2c3/frame 0xfffffe00005817b0
cam_periph_runccb() at cam_periph_runccb+0x1ad/frame 0xfffffe0000581900
cdcheckmedia() at cdcheckmedia+0x8a/frame 0xfffffe0000581980
cdstrategy() at cdstrategy+0x55/frame 0xfffffe00005819b0
g_disk_start() at g_disk_start+0x48e/frame 0xfffffe0000581a20
g_io_schedule_down() at g_io_schedule_down+0x105/frame 0xfffffe0000581a60
g_down_procbody() at g_down_procbody+0x6d/frame 0xfffffe0000581a70
fork_exit() at fork_exit+0x84/frame 0xfffffe0000581ab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0000581ab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
[ thread pid 13 tid 100113 ]
Stopped at      kdb_enter+0x3b: movq    $0,kdb_why
db> 


But our test team is just running some fio test with RAID (we run "kldload geom_stripe). I'm not sure why our test is triggering cdstrategy() and cdcheckmedia(), which seems to be related to CD? Our test team says the panic happens to FreeBSD 10.3 and the latest HEAD code, meaning this should be a longstanding issue...

Anyway, do we have a plan to fix the panic originally reported?
Comment 4 Andriy Gapon freebsd_committer freebsd_triage 2018-06-04 06:23:56 UTC
(In reply to Dexuan Cui from comment #3)
>  I'm not sure why our test is triggering cdstrategy() and cdcheckmedia(), which seems to be related to CD?

It's probably because the new geom class wants to read media to check for on-disk metadata and it ends up trying to read from the empty CD drive too.
Comment 5 Dexuan Cui 2018-06-14 21:20:09 UTC
@ken: Is anyone working on this bug recently?
Comment 6 Kenneth D. Merry freebsd_committer freebsd_triage 2018-06-15 01:57:30 UTC
I have not been working on it.  Feel free to work on the solution.  My comments from last year give a roadmap for how to do it.
Comment 7 Dexuan Cui 2018-06-15 05:25:46 UTC
(In reply to Kenneth D. Merry from comment #6)
I wish I could help, but unluckily the code here is quite unfamiliar to me. :-(
Comment 8 Kenneth D. Merry freebsd_committer freebsd_triage 2018-06-15 20:47:40 UTC
(In reply to Dexuan Cui from comment #7)

Ok, I'll see if I can come up with a fix.  No promises on when/if it'll get done, of course. :)
Comment 9 Dexuan Cui 2018-06-15 21:04:24 UTC
(In reply to Kenneth D. Merry from comment #8)
Understood. :-)  Thanks!
Comment 10 Kenneth D. Merry freebsd_committer freebsd_triage 2018-07-11 15:37:18 UTC
Created attachment 195055 [details]
Proposed patch to make the media check async

Here is a proposed patch against FreeBSD/head as of revision 335477 on June 21st, 2018.

This makes the media check process asynchronous, so we no longer block in cdstrategy() to check for media.

This still needs some additional testing in various situations.  I think there may still be one panic (GEOM assertion I believe) I saw with an audio CD in a drive, and I have not tested this with a SCSI CD/DVD drive.

I have only tested so far with a SATA drive running ATAPI.

In any case, give this a try and let me know if it improves things for you.
Comment 11 Dexuan Cui 2018-07-11 18:34:58 UTC
(In reply to Kenneth D. Merry from comment #10)
Thanks very much! We'll test it and report back.
Comment 12 Dexuan Cui 2018-07-17 21:48:36 UTC
(In reply to Dexuan Cui from comment #11)
Hi Kenneth, we tested your patch with 12-CURRENT, and we believe the panic we saw is fixed by the patch! Thank you very much!
Comment 13 Kenneth D. Merry freebsd_committer freebsd_triage 2018-07-18 15:04:36 UTC
(In reply to Dexuan Cui from comment #12)

Great, I'm glad it is working!  I'll test it in a few more scenarios before I commit it.
Comment 14 Dexuan Cui 2018-07-20 23:19:02 UTC
(In reply to Kenneth D. Merry from comment #13)
Hi Ken, we found a regression caused by the patch: when we tested the patch with recent 12-CURRENT (July 11) in a *UP* FreeBSD Virtual Machine running on Windows Server 2012 R2, the VM hangs after printing 

Trying to mount root from ufs:/dev/da0s1a [rw]...

If I use a SMP VM (e.g. 2 CPUs or more), the issue will go away.
If I use a UP VM, and insert a .iso file into the virtual DVD drive, the issue will go away as well.

However, the same UP VM image can run fine on a Windows Server 2016 host with 1 CPU, and no .iso file in the virtual DVD drive.

So I can't say this patch has a bug. My gut feeling is that: maybe the patch somehow triggers a host issue, e.g. maybe the host's emulation of DVD drive returns some invalid value to the VM and the VM hangs? When the issue happens, the VM's CPU utilization is 0%, so I'm sure the VM is not in a busy loop. Probably the VM is waiting for some condition?

I'm attaching the good log (I insert a .iso file into the virtual DVD) and the bad log (the virtual DVD is empty). I got the log with the same VM running on Windows Server 2012 R2. 


The good one is:
...
cd0 at ata1 bus 0 scbus1 target 0 lun 0
cd0: <Msft Virtual CD/ROM 1.0> Removable CD-ROM SPC-3 SCSI device
cd0: 16.700MB/s transfers (WDMA2, ATAPI 12bytes, PIO 65534bytes)
cd0: 669MB (342562 2048 byte sectors)
...
Trying to mount root from ufs:/dev/da0s1a [rw]...
GEOM: new disk da0
GEOM: new disk cd0
...
start_init: trying /sbin/init
...
login:


The bad one is:
cd0 at ata1 bus 0 scbus1 target 0 lun 0
cd0: <Msft Virtual CD/ROM 1.0> Removable CD-ROM SPC-3 SCSI device
cd0: 16.700MB/s transfers (WDMA2, ATAPI 12bytes, PIO 65534bytes)
cd0: Attempt to query device size failed: NOT READY, Medium not present
...
Trying to mount root from ufs:/dev/da0s1a [rw]...
GEOM: new disk cd0
GEOM: new disk da0
(Dexuan: it hangs here for ever)


How can I know where it hangs? Any insight or suggestion?  Thanks!
Comment 15 Dexuan Cui 2018-07-21 00:22:49 UTC
Created attachment 195314 [details]
The good log
Comment 16 Dexuan Cui 2018-07-21 00:23:08 UTC
Created attachment 195315 [details]
The badlog
Comment 17 Dexuan Cui 2018-07-21 00:29:04 UTC
(In reply to Dexuan Cui from comment #14)
BTW, the issue can't reproduce if we don't apply the patch.
Comment 18 Kenneth D. Merry freebsd_committer freebsd_triage 2018-07-21 18:39:07 UTC
Ok, there is likely a bug in the patch, we'll just have to track it down.

Here are a few things to try that may help get to the bottom of it.

First, make sure you're running with WITNESS and INVARIANTS enabled, since those may catch locking or other issues that are causing problems.

1.  In the situation that hangs (Windows Server 2012, UP, no ISO in the drive), when the OS hangs, break into the debugger and run "ps" and "alltrace" and send the output. You can break into the debugger normally by sending a "break", but you can also configure ALT_BREAK_TO_DEBUGGER in the kernel config file.  Here is what sys/conf/NOTES says about triggering the debugger:

# Solaris implements a new BREAK which is initiated by a character
# sequence CR ~ ^b which is similar to a familiar pattern used on
# Sun servers by the Remote Console.  There are FreeBSD extensions:
# CR ~ ^p requests force panic and CR ~ ^r requests a clean reboot.

2.  Run the VM under Windows Server 2012 in SMP mode with no ISO in the drive, run these commands and send the output:

camcontrol cmd cd0 -v -c "1e 0 0 0 1 0"

camcontrol readcap cd0 -v

camcontrol cmd cd0 -v -t 60 -c "28 0 0 0 0 0 0 0 1 0" -i 2048 - | hd

The first is Prevent Media Removal, the second is Read Capacity, the third is a standard 10 byte read of one sector only.

My assumption is that there should be no difference in the behavior of the emulated CD device between SMP and UP mode, and we're just avoiding the problem in SMP mode due to timing.

I have tested in UP and SMP, and haven't seen this particular problem, but it could be that the emulated CD driver on Windows Server 2012 is responding in a way I didn't expect or the timing is just right in UP mode to trigger the bug.

Let me know what you figure out.
Comment 19 Dexuan Cui 2018-07-21 18:50:01 UTC
(In reply to Kenneth D. Merry from comment #18)
I'll try to debug the issue following the detailed instructions. Thanks so much!
Comment 20 Kenneth D. Merry freebsd_committer freebsd_triage 2018-10-08 00:54:01 UTC
*** Bug 232044 has been marked as a duplicate of this bug. ***
Comment 21 Andriy Gapon freebsd_committer freebsd_triage 2018-12-07 12:21:36 UTC
I am starting (at last!) to test the patch and here is a warning that I got during a build:
> /usr/devel/svn/head/sys/cam/scsi/scsi_cd.c:2862:1: warning: unused function 'cdsize' [-Wunused-function]
Comment 22 Harald Schmalzbauer 2018-12-07 12:31:22 UTC
Just to merge sideband communication:
I wasn't able to reproduce the UP problems described in comment#14.
The patch prevents my 12-stable production machines from panicing for a while; but I'm not extensively using ODDs over all...

Thanks,

-harry
Comment 23 Andriy Voskoboinyk freebsd_committer freebsd_triage 2019-01-21 09:38:14 UTC
*** Bug 235081 has been marked as a duplicate of this bug. ***
Comment 24 Harald Schmalzbauer 2019-06-15 08:11:00 UTC
Kenneth's patch successfully prevented my machines from panicing for almost one year. I don't use ODD regularly, so the fact that I don't see any other side effect nor regression isn't counting much, but I remember staring at console while hypervisor reboots after ejecting (removing USB-) OD(D)...
Unfortunately it seems that 11.3 again ships with this trip-wire/trip-mine.

I'd vote for commiting it as soon as possible.

Thanks,
-harry
Comment 25 commit-hook freebsd_committer freebsd_triage 2019-08-29 07:51:56 UTC
A commit references this bug:

Author: avg
Date: Thu Aug 29 07:51:12 UTC 2019
New revision: 351599
URL: https://svnweb.freebsd.org/changeset/base/351599

Log:
  scsi_cd: make the media check asynchronous

  This makes the media check process asynchronous, so we no longer block
  in cdstrategy() to check for media.

  PR:		219857
  Obtained from:	ken
  MFC after:	3 weeks

Changes:
  head/sys/cam/scsi/scsi_cd.c
  head/sys/cam/scsi/scsi_cd.h
Comment 26 commit-hook freebsd_committer freebsd_triage 2019-09-25 19:47:02 UTC
A commit references this bug:

Author: avg
Date: Wed Sep 25 19:46:18 UTC 2019
New revision: 352714
URL: https://svnweb.freebsd.org/changeset/base/352714

Log:
  MFC r351599,r351600: scsi_cd: make the media check asynchronous

  PR:		219857

Changes:
_U  stable/12/
  stable/12/sys/cam/scsi/scsi_cd.c
  stable/12/sys/cam/scsi/scsi_cd.h
Comment 27 commit-hook freebsd_committer freebsd_triage 2019-09-25 19:47:04 UTC
A commit references this bug:

Author: avg
Date: Wed Sep 25 19:46:53 UTC 2019
New revision: 352715
URL: https://svnweb.freebsd.org/changeset/base/352715

Log:
  MFC r351599,r351600: scsi_cd: make the media check asynchronous

  PR:		219857

Changes:
_U  stable/11/
  stable/11/sys/cam/scsi/scsi_cd.c
  stable/11/sys/cam/scsi/scsi_cd.h
Comment 28 Leandro Lupori 2019-11-28 14:32:01 UTC
I was able to reproduce the problem described in comment#14.

Actually, I wasn't trying to reproduce it, I just started seeing it a few weeks ago, in CURRENT.

I can reproduce it consistently, by using QEMU to run a PowerPC64 VM with a single core/thread (-smp 1).

It happens only when there is no media in the emulated CD-ROM, a device that QEMU adds by default, unless -nodefaults is specified in command line.

I've debugged it and this is what I've found:

1- After the CD probe is successful, GEOM will try to open the device, which will end up calling cdcheckmedia(), that sets CD state to CD_STATE_MEDIA_PREVENT.
2- Next, scsi_prevent() is executed and succeeds, the CD_FLAG_DISC_LOCKED flag is set and CD state moves to CD_STATE_MEDIA_SIZE.
3- Next, scsi_read_capacity() is executed and fails, state is set to CD_STATE_MEDIA_ALLOW, cdmediaprobedone() is called and wakes up cdcheckmedia().
4- Then, when cdstart() is invoked to process CD_STATE_MEDIA_ALLOW, it first checks if CD_FLAG_DISC_LOCKED is set, and if so skips directly to CD_STATE_MEDIA_SIZE state. This will repeat the steps of bullet 3, entering an infinite MEDIA_SIZE command loop.

When there is a least another core/thread, the GEOM thread that performed the initial cdopen() will get scheduled again, closing the CD device, that will call cdprevent(PR_ALLOW) that clears the CD_FLAG_DISC_LOCKED flag and breaks the loop.

So, apparently, the problem is CD_STATE_MEDIA_ALLOW being skipped when CD_FLAG_DISC_LOCKED is set. If I understand correctly, in this case, the state should be advanced to CD_STATE_MEDIA size only when the current state is CD_STATE_MEDIA_PREVENT.
Comment 29 Leandro Lupori 2019-11-28 14:37:35 UTC
Created attachment 209512 [details]
Fix system hang on boot

This patch fixes the system hanging on boot.
The previous comment presents a full explanation on why.
Comment 30 Andriy Gapon freebsd_committer freebsd_triage 2019-12-02 13:43:09 UTC
(In reply to Leandro Lupori from comment #28)
Your analysis sounds correct to me.
Thank you for it and the patch!

I would like to hear from Ken if he agrees.
Comment 31 Kenneth D. Merry freebsd_committer freebsd_triage 2019-12-02 18:52:37 UTC
(In reply to Andriy Gapon from comment #30)
I agree, Leandro's analysis is correct!

Thank you for debugging this, and for the patch!
Comment 32 commit-hook freebsd_committer freebsd_triage 2019-12-02 19:58:37 UTC
A commit references this bug:

Author: ken
Date: Mon Dec  2 19:57:40 UTC 2019
New revision: 355299
URL: https://svnweb.freebsd.org/changeset/base/355299

Log:
  Fix a hang introduced in r351599.

  My changes in 351599 (kindly committed by avg) made the cd(4) media check
  asynchronous to avoid a sleep while holding a mutex.

  There was a difficult to reproduce bug with those changes that caused a
  hang on boot on some single processor machines/VMs.  Leandro Lupori
  managed to reproduce the bug, diagnose it, and supplied a patch!  Here is
  his analysis, from the PR:

  ======
  I was able to reproduce the problem described in comment#14.

  Actually, I wasn't trying to reproduce it, I just started seeing it a few
  weeks ago, in CURRENT.

  I can reproduce it consistently, by using QEMU to run a PowerPC64 VM with a
  single core/thread (-smp 1).

  It happens only when there is no media in the emulated CD-ROM, a device
  that QEMU adds by default, unless -nodefaults is specified in command line.

  I've debugged it and this is what I've found:

  1- After the CD probe is successful, GEOM will try to open the device,
  which will end up calling cdcheckmedia(), that sets CD state to
  CD_STATE_MEDIA_PREVENT.
  2- Next, scsi_prevent() is executed and succeeds, the CD_FLAG_DISC_LOCKED
  flag is set and CD state moves to CD_STATE_MEDIA_SIZE.
  3- Next, scsi_read_capacity() is executed and fails, state is set to
  CD_STATE_MEDIA_ALLOW, cdmediaprobedone() is called and wakes up
  cdcheckmedia().
  4- Then, when cdstart() is invoked to process CD_STATE_MEDIA_ALLOW, it
  first checks if CD_FLAG_DISC_LOCKED is set, and if so skips directly to
  CD_STATE_MEDIA_SIZE state. This will repeat the steps of bullet 3, entering
  an infinite MEDIA_SIZE command loop.

  When there is a least another core/thread, the GEOM thread that performed
  the initial cdopen() will get scheduled again, closing the CD device, that
  will call cdprevent(PR_ALLOW) that clears the CD_FLAG_DISC_LOCKED flag and
  breaks the loop.

  So, apparently, the problem is CD_STATE_MEDIA_ALLOW being skipped when
  CD_FLAG_DISC_LOCKED is set. If I understand correctly, in this case, the
  state should be advanced to CD_STATE_MEDIA size only when the current state
  is CD_STATE_MEDIA_PREVENT.
  =====

  PR:		kern/219857
  Submitted by:	Leandro Lupori <leandro.lupori@gmail.com>
  MFC after:	1 week

Changes:
  head/sys/cam/scsi/scsi_cd.c
Comment 33 commit-hook freebsd_committer freebsd_triage 2019-12-17 20:29:57 UTC
A commit references this bug:

Author: ken
Date: Tue Dec 17 20:29:47 UTC 2019
New revision: 355862
URL: https://svnweb.freebsd.org/changeset/base/355862

Log:
  MFC r355299:

    ------------------------------------------------------------------------
    r355299 | ken | 2019-12-02 14:57:39 -0500 (Mon, 02 Dec 2019) | 52 lines

    Fix a hang introduced in r351599.

    My changes in 351599 (kindly committed by avg) made the cd(4) media check
    asynchronous to avoid a sleep while holding a mutex.

    There was a difficult to reproduce bug with those changes that caused a
    hang on boot on some single processor machines/VMs.  Leandro Lupori
    managed to reproduce the bug, diagnose it, and supplied a patch!  Here is
    his analysis, from the PR:

    ======
    I was able to reproduce the problem described in comment#14.

    Actually, I wasn't trying to reproduce it, I just started seeing it a few
    weeks ago, in CURRENT.

    I can reproduce it consistently, by using QEMU to run a PowerPC64 VM with a
    single core/thread (-smp 1).

    It happens only when there is no media in the emulated CD-ROM, a device
    that QEMU adds by default, unless -nodefaults is specified in command line.

    I've debugged it and this is what I've found:

    1- After the CD probe is successful, GEOM will try to open the device,
    which will end up calling cdcheckmedia(), that sets CD state to
    CD_STATE_MEDIA_PREVENT.
    2- Next, scsi_prevent() is executed and succeeds, the CD_FLAG_DISC_LOCKED
    flag is set and CD state moves to CD_STATE_MEDIA_SIZE.
    3- Next, scsi_read_capacity() is executed and fails, state is set to
    CD_STATE_MEDIA_ALLOW, cdmediaprobedone() is called and wakes up
    cdcheckmedia().
    4- Then, when cdstart() is invoked to process CD_STATE_MEDIA_ALLOW, it
    first checks if CD_FLAG_DISC_LOCKED is set, and if so skips directly to
    CD_STATE_MEDIA_SIZE state. This will repeat the steps of bullet 3, entering
    an infinite MEDIA_SIZE command loop.

    When there is a least another core/thread, the GEOM thread that performed
    the initial cdopen() will get scheduled again, closing the CD device, that
    will call cdprevent(PR_ALLOW) that clears the CD_FLAG_DISC_LOCKED flag and
    breaks the loop.

    So, apparently, the problem is CD_STATE_MEDIA_ALLOW being skipped when
    CD_FLAG_DISC_LOCKED is set. If I understand correctly, in this case, the
    state should be advanced to CD_STATE_MEDIA size only when the current state
    is CD_STATE_MEDIA_PREVENT.
    =====
    ------------------------------------------------------------------------

  PR:		kern/219857
  Submitted by:	Leandro Lupori <leandro.lupori@gmail.com>

Changes:
_U  stable/12/
  stable/12/sys/cam/scsi/scsi_cd.c
Comment 34 commit-hook freebsd_committer freebsd_triage 2019-12-17 20:31:00 UTC
A commit references this bug:

Author: ken
Date: Tue Dec 17 20:30:33 UTC 2019
New revision: 355863
URL: https://svnweb.freebsd.org/changeset/base/355863

Log:
  MFC r355299:

    ------------------------------------------------------------------------
    r355299 | ken | 2019-12-02 14:57:39 -0500 (Mon, 02 Dec 2019) | 52 lines

    Fix a hang introduced in r351599.

    My changes in 351599 (kindly committed by avg) made the cd(4) media check
    asynchronous to avoid a sleep while holding a mutex.

    There was a difficult to reproduce bug with those changes that caused a
    hang on boot on some single processor machines/VMs.  Leandro Lupori
    managed to reproduce the bug, diagnose it, and supplied a patch!  Here is
    his analysis, from the PR:

    ======
    I was able to reproduce the problem described in comment#14.

    Actually, I wasn't trying to reproduce it, I just started seeing it a few
    weeks ago, in CURRENT.

    I can reproduce it consistently, by using QEMU to run a PowerPC64 VM with a
    single core/thread (-smp 1).

    It happens only when there is no media in the emulated CD-ROM, a device
    that QEMU adds by default, unless -nodefaults is specified in command line.

    I've debugged it and this is what I've found:

    1- After the CD probe is successful, GEOM will try to open the device,
    which will end up calling cdcheckmedia(), that sets CD state to
    CD_STATE_MEDIA_PREVENT.
    2- Next, scsi_prevent() is executed and succeeds, the CD_FLAG_DISC_LOCKED
    flag is set and CD state moves to CD_STATE_MEDIA_SIZE.
    3- Next, scsi_read_capacity() is executed and fails, state is set to
    CD_STATE_MEDIA_ALLOW, cdmediaprobedone() is called and wakes up
    cdcheckmedia().
    4- Then, when cdstart() is invoked to process CD_STATE_MEDIA_ALLOW, it
    first checks if CD_FLAG_DISC_LOCKED is set, and if so skips directly to
    CD_STATE_MEDIA_SIZE state. This will repeat the steps of bullet 3, entering
    an infinite MEDIA_SIZE command loop.

    When there is a least another core/thread, the GEOM thread that performed
    the initial cdopen() will get scheduled again, closing the CD device, that
    will call cdprevent(PR_ALLOW) that clears the CD_FLAG_DISC_LOCKED flag and
    breaks the loop.

    So, apparently, the problem is CD_STATE_MEDIA_ALLOW being skipped when
    CD_FLAG_DISC_LOCKED is set. If I understand correctly, in this case, the
    state should be advanced to CD_STATE_MEDIA size only when the current state
    is CD_STATE_MEDIA_PREVENT.
    =====
    ------------------------------------------------------------------------

  PR:		kern/219857
  Submitted by:	Leandro Lupori <leandro.lupori@gmail.com>

Changes:
_U  stable/11/
  stable/11/sys/cam/scsi/scsi_cd.c
Comment 35 Kenneth D. Merry freebsd_committer freebsd_triage 2019-12-17 20:34:01 UTC
Patch committed to head, stable/11 and stable/12.  Thank you!
Comment 36 Marc Mance 2022-04-02 21:18:24 UTC
I am on FreeBSD 13.  I am very new to BSD.  I just tried to use Brasero to burn a blu-ray and it was not recognizing I had a blank disc in.

When I looked at dmesg:

ugen3.2: <JMicron JMS579> at usbus3
umass0 on uhub0
umass0: <JMicron JMS579, class 0/0, rev 3.00/4.14, addr 1> on usbus3
umass0:  SCSI over Bulk-Only; quirks = 0x0100
umass0:4:0: Attached to scbus4
cd0 at umass-sim0 bus 0 scbus4 target 0 lun 0
cd0: <ASUS BW-16D1HT 3.10> Removable CD-ROM SCSI device
cd0: 400.000MB/s transfers
cd0: Attempt to query device size failed: NOT READY, Medium not present - tray closed
cd0: quirks=0x10<10_BYTE_ONLY>
(cd0:umass-sim0:0:0:0): READ(10). CDB: 28 00 00 00 00 00 00 00 01 00 
(cd0:umass-sim0:0:0:0): CAM status: SCSI Status Error
(cd0:umass-sim0:0:0:0): SCSI status: Check Condition
(cd0:umass-sim0:0:0:0): SCSI sense: ILLEGAL REQUEST asc:21,0 (Logical block address out of range)
(cd0:umass-sim0:0:0:0): Error 22, Unretryable error
(cd0:umass-sim0:0:0:0): cddone: got error 0x16 back
(cd0:umass-sim0:0:0:0): READ(10). CDB: 28 00 00 00 00 00 00 00 01 00 
(cd0:umass-sim0:0:0:0): CAM status: SCSI Status Error
(cd0:umass-sim0:0:0:0): SCSI status: Check Condition
(cd0:umass-sim0:0:0:0): SCSI sense: ILLEGAL REQUEST asc:21,0 (Logical block address out of range)
(cd0:umass-sim0:0:0:0): Error 22, Unretryable error
(cd0:umass-sim0:0:0:0): cddone: got error 0x16 back
(cd0:umass-sim0:0:0:0): READ(10). CDB: 28 00 00 00 00 00 00 00 01 00 
(cd0:umass-sim0:0:0:0): CAM status: SCSI Status Error
(cd0:umass-sim0:0:0:0): SCSI status: Check Condition
(cd0:umass-sim0:0:0:0): SCSI sense: ILLEGAL REQUEST asc:21,0 (Logical block address out of range)
(cd0:umass-sim0:0:0:0): Error 22, Unretryable error
(cd0:umass-sim0:0:0:0): cddone: got error 0x16 back
(cd0:umass-sim0:0:0:0): READ(10). CDB: 28 00 00 00 00 00 00 00 01 00 
(cd0:umass-sim0:0:0:0): CAM status: SCSI Status Error
(cd0:umass-sim0:0:0:0): SCSI status: Check Condition
(cd0:umass-sim0:0:0:0): SCSI sense: ILLEGAL REQUEST asc:21,0 (Logical block address out of range)
(cd0:umass-sim0:0:0:0): Error 22, Unretryable error
(cd0:umass-sim0:0:0:0): cddone: got error 0x16 back
(cd0:umass-sim0:0:0:0): READ(10). CDB: 28 00 00 00 00 00 00 00 01 00 
(cd0:umass-sim0:0:0:0): CAM status: SCSI Status Error
(cd0:umass-sim0:0:0:0): SCSI status: Check Condition
(cd0:umass-sim0:0:0:0): SCSI sense: ILLEGAL REQUEST asc:21,0 (Logical block address out of range)
(cd0:umass-sim0:0:0:0): Error 22, Unretryable error
(cd0:umass-sim0:0:0:0): cddone: got error 0x16 back
(cd0:umass-sim0:0:0:0): READ(10). CDB: 28 00 00 00 00 00 00 00 01 00 
(cd0:umass-sim0:0:0:0): CAM status: SCSI Status Error
(cd0:umass-sim0:0:0:0): SCSI status: Check Condition
(cd0:umass-sim0:0:0:0): SCSI sense: ILLEGAL REQUEST asc:21,0 (Logical block address out of range)
(cd0:umass-sim0:0:0:0): Error 22, Unretryable error
(cd0:umass-sim0:0:0:0): cddone: got error 0x16 back
(cd0:umass-sim0:0:0:0): READ(10). CDB: 28 00 00 00 00 00 00 00 01 00 
(cd0:umass-sim0:0:0:0): CAM status: SCSI Status Error
(cd0:umass-sim0:0:0:0): SCSI status: Check Condition
(cd0:umass-sim0:0:0:0): SCSI sense: ILLEGAL REQUEST asc:21,0 (Logical block address out of range)
(cd0:umass-sim0:0:0:0): Error 22, Unretryable error
(cd0:umass-sim0:0:0:0): cddone: got error 0x16 back
(cd0:umass-sim0:0:0:0): READ(10). CDB: 28 00 00 00 00 00 00 00 01 00 
(cd0:umass-sim0:0:0:0): CAM status: SCSI Status Error
(cd0:umass-sim0:0:0:0): SCSI status: Check Condition
(cd0:umass-sim0:0:0:0): SCSI sense: ILLEGAL REQUEST asc:21,0 (Logical block address out of range)
(cd0:umass-sim0:0:0:0): Error 22, Unretryable error
(cd0:umass-sim0:0:0:0): cddone: got error 0x16 back
(cd0:umass-sim0:0:0:0): READ(10). CDB: 28 00 00 00 00 00 00 00 01 00 
(cd0:umass-sim0:0:0:0): CAM status: SCSI Status Error
(cd0:umass-sim0:0:0:0): SCSI status: Check Condition
(cd0:umass-sim0:0:0:0): SCSI sense: ILLEGAL REQUEST asc:21,0 (Logical block address out of range)
(cd0:umass-sim0:0:0:0): Error 22, Unretryable error
(cd0:umass-sim0:0:0:0): cddone: got error 0x16 back


I did not get a panic.  But this was the only place I have found the same error.  It only happens when I have a blank Blu-Ray in the drive.  This is my first optical I have used with BSD, I was trying to play a DVD movie to test the drive in BSD, but I have to learn how still.

The error above the panic was the same as what I got, but I am on a new BSD.  I don't know if it is related.

--------

[mmance@freeMance /mnt]$ sudo cdrecord -minfo
Cdrecord-ProDVD-ProBD-Clone 3.02a10 2021/07/23 (amd64-unknown-freebsd13.0) Copyright (C) 1995-2019 Joerg Schilling
Using libscg version 'schily-0.9'.
No target specified, trying to find one...
Using dev=4,0,0.
Device type    : Removable CD-ROM
Version        : 0
Response Format: 2
Capabilities   : 
Vendor_info    : 'ASUS    '
Identifikation : 'BW-16D1HT       '
Revision       : '3.10'
Device seems to be: Generic mmc2 DVD-R/DVD-RW/DVD-RAM.
Using generic SCSI-3/mmc-3 BD-R driver (mmc_bdr).
Driver flags   : NO-CD BD MMC-3 BURNFREE 
Supported modes: PACKET SAO LAYER_JUMP
cdrecord: Warning: Cannot read drive buffer.
cdrecord: Warning: The DMA speed test has been skipped.
WARNING: Phys disk size 4452354 differs from rzone size 12219392! Prerecorded disk?
WARNING: Phys start: 25088 Phys end 4477441
Mounted media class:      BD
Mounted media type:       BD-R sequential recording
Disk Is not erasable
data type:                standard
disk status:              empty
session status:           empty
BG format status:         none
first track:              1
number of sessions:       1
first track in last sess: 1
last track in last sess:  1
Disk Is unrestricted
Disk type: DVD, HD-DVD or BD

Track  Sess Type   Start Addr End Addr   Size
==============================================
    1     1 Blank  0          12219391   12219392

Next writable address:              0
Remaining writable size:            12219392
Comment 37 Marc Mance 2022-04-02 22:00:19 UTC
Brasero still does not see a disc, however;

I could use cdrecord to write to blu-ray and then use cmp to verify it.

So the error message I saw can probably be safely ignored.  It would probably matter more if I also had the panic error.