Bug 99017

Summary: [ata] [patch] FreeBSD versions above 5.3 panic if atapi drives become unresponsive
Product: Base System
Reporter: Fabian Keil <freebsd-listen>
Component: kern
Assignee: freebsd-bugs (Nobody) <bugs>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: Unspecified   
Hardware: Any   
OS: Any   

Description Fabian Keil 2006-06-16 11:10:24 UTC
As reported several times on the mailing lists,
FreeBSD versions above 5.3 panic if atapi drives
become unresponsive.

Quoting an older mailing list message:

I own a Plextor PlexWriter Premium. The drive has buggy firmware
that crashes if you try to burn multi-session in SAO mode.

On FreeBSD 5.2 and 5.3 the drive "just" disappears from the bus: 

root@datenspeicher.localhost atacontrol list
ATA channel 0:
    Master:  ad0 <WDC WD205AA/05.05B05> ATA/ATAPI rev 4
    Slave:   ad1 <MAXTOR 4K040H2/A08.1500> ATA/ATAPI rev 5
ATA channel 1:
    Master:      no device present
    Slave:  acd0 <PLEXTOR CD-R PREMIUM/1.05> ATA/ATAPI rev 0
root@datenspeicher.localhost cdrecord dev=1,1,0 -dao -multi -dummy tsize=100000s /dev/random
Cdrecord 2.00.3 (i386-unknown-freebsd5.2) Copyright (C) 1995-2002 Jörg Schilling
scsidev: '1,1,0'
scsibus: 1 target: 1 lun: 0
Using libscg version 'schily-0.7'
Device type    : Removable CD-ROM
Version        : 0
Response Format: 1
Vendor_info    : 'PLEXTOR '
Identifikation : 'CD-R   PREMIUM  '
Revision       : '1.05'
Device seems to be: Generic mmc CD-RW.
Using generic SCSI-3/mmc CD-R driver (mmc_cdr).
Driver flags   : MMC-3 SWABAUDIO BURNFREE VARIREC 
Supported modes: TAO PACKET SAO SAO/R96P SAO/R96R RAW/R16 RAW/R96P RAW/R96R
Starting to write CD/DVD at speed 52 in dummy SAO mode for multi session.
Last chance to quit, starting dummy write    0 seconds. Operation starts.
[nothing happens for a long time]
^C[after running atacontrol reinit 1 in another shell:] cdrecord: Caught interrupt.
root@datenspeicher.localhost atacontrol list
ATA channel 0:
    Master:  ad0 <WDC WD205AA/05.05B05> ATA/ATAPI rev 4
    Slave:   ad1 <MAXTOR 4K040H2/A08.1500> ATA/ATAPI rev 5
ATA channel 1:
    Master:      no device present
    Slave:       no device present


On FreeBSD 6.0-BETA2 the same operation causes a panic:

fk@africanqueen ~ $atacontrol list
ATA channel 0:
    Master: acd0 <LITE-ON DVDRW SOHW-1693S/KS0A> ATA/ATAPI revision 5
    Slave:   ad1 <WDC WD800BB-00CAA1/17.07W17> ATA/ATAPI revision 5
ATA channel 1:
    Master: acd1 <LITE-ON LTR-48125W/VS0D> ATA/ATAPI revision 0
    Slave:  acd2 <PLEXTOR CD-R PREMIUM/1.05> ATA/ATAPI revision 0
ATA channel 2:
    Master:  ad4 <SAMSUNG SV1204H/RK100-15> ATA/ATAPI revision 6
    Slave:       no device present
ATA channel 3:
    Master:      no device present
    Slave:       no device present

fk@africanqueen ~ $cdrecord dev=1,1,0 -sao -multi -dummy tsize=1s -v /dev/random 
Cdrecord-Clone 2.01.01a03 (i386-unknown-freebsd6.0) Copyright (C) 1995-2005 Jörg Schilling
TOC Type: 3 = CD-ROM XA mode 2
cdrecord: Operation not permitted. WARNING: Cannot do mlockall(2).
cdrecord: WARNING: This causes a high risk for buffer underruns.
scsidev: '1,1,0'
scsibus: 1 target: 1 lun: 0
Using libscg version 'schily-0.8'.
SCSI buffer size: 64512
atapi: 0
Device type    : Removable CD-ROM
Version        : 0
Response Format: 1
Vendor_info    : 'PLEXTOR '
Identifikation : 'CD-R   PREMIUM  '
Revision       : '1.05'
Device seems to be: Generic mmc CD-RW.
Current: 0x0009
Profile: 0x0008 
Profile: 0x0009 (current)
Profile: 0x000A 
Using generic SCSI-3/mmc   CD-R/CD-RW driver (mmc_cdr).
Driver flags   : MMC-3 SWABAUDIO BURNFREE VARIREC GIGAREC FORCESPEED SPEEDREAD SINGLESESSION HIDECDR 
Supported modes: TAO PACKET SAO SAO/R96P SAO/R96R RAW/R16 RAW/R96P RAW/R96R
Drive buf size : 4802784 = 4690 KB
Drive DMA Speed: 27687 kB/s 157x CD 19x DVD
FIFO size      : 4194304 = 4096 KB
Track 01: data     0 MB         padsize:  598 KB
Total size:        0 MB (00:04.00) = 300 sectors
Lout start:        1 MB (00:06/00) = 300 sectors
Current Secsize: 2048
ATIP info from disk:
  Indicated writing power: 4
  Is not unrestricted
  Is not erasable
  Disk sub type: Medium Type A, low Beta category (A-) (2)
  ATIP start of lead in:  -12508 (97:15/17)
  ATIP start of lead out: 359845 (79:59/70)
Disk type:    Short strategy type (Phthalocyanine or similar)
Manuf. index: 22
Manufacturer: Ritek Co.
Single session is OFF.
Hide CDR is OFF.
Speed-Read is OFF.
GigaRec is off.
Blocks total: 359845 Blocks current: 359845 Blocks remaining: 359545
Forcespeed is OFF.
Power-Rec is ON.
Power-Rec write speed:     52x (recommended)
Starting to write CD/DVD at speed 52 in dummy SAO mode for multi session.
Last chance to quit, starting dummy write    0 seconds. Operation starts.
Waiting for reader process to fill input buffer ... input buffer ready.
BURN-Free is OFF.
Sending CUE sheet...
Writing pregap for track 1 at -150
[Panic after a few minutes]

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x3b0
fault code              = supervisor read, page not present
instruction pointer     = 0x20:0xc046f132
stack pointer           = 0x28:0xd44b0cc8
frame pointer           = 0x28:0xd44b0cd8
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 2 (g_event)
panic: from debugger
cpuid = 0
Uptime: 22m47s
Dumping 511 MB (2 chunks)
  chunk 0: 1MB (158 pages) ... ok
  chunk 1: 511MB (130800 pages) 495 479 463 447 431 415 399 383 367 351 335 319 303 287 271 255 239 223 207 191 175 159 143 127 111 95 79 63 47 31 15

#0  doadump () at pcpu.h:165
165     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) where 
#0  doadump () at pcpu.h:165
#1  0xc04f4154 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:397
#2  0xc04f4469 in panic (fmt=0xc066ff84 "from debugger") at /usr/src/sys/kern/kern_shutdown.c:553
#3  0xc044e3d9 in db_panic (addr=-1069092558, have_addr=0, count=-1, modif=0xd44b0aec "")
    at /usr/src/sys/ddb/db_command.c:435
#4  0xc044e370 in db_command (last_cmdp=0xc06d97e4, cmd_table=0x0, aux_cmd_tablep=0xc06a37cc, 
    aux_cmd_tablep_end=0xc06a37d0) at /usr/src/sys/ddb/db_command.c:349
#5  0xc044e438 in db_command_loop () at /usr/src/sys/ddb/db_command.c:455
#6  0xc044ffd9 in db_trap (type=12, code=0) at /usr/src/sys/ddb/db_main.c:221
#7  0xc050c160 in kdb_trap (type=12, code=0, tf=0xd44b0c88) at /usr/src/sys/kern/subr_kdb.c:473
#8  0xc064b8d8 in trap_fatal (frame=0xd44b0c88, eva=944) at /usr/src/sys/i386/i386/trap.c:832
#9  0xc064b61f in trap_pfault (frame=0xd44b0c88, usermode=0, eva=944)
    at /usr/src/sys/i386/i386/trap.c:752
#10 0xc064b289 in trap (frame=
      {tf_fs = 8, tf_es = 40, tf_ds = 40, tf_edi = 0, tf_esi = 0, tf_ebp = -733279016, tf_isp = -733279052,
      tf_ebx = -1044868864, tf_edx = 4, tf_ecx = 1, tf_eax = 0, tf_trapno = 12, tf_err = 0, tf_eip = -1069092558,
      tf_cs = 32, tf_eflags = 590470, tf_esp = 6, tf_ss = -1044868864})
    at /usr/src/sys/i386/i386/trap.c:442
#11 0xc063992a in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#12 0x00000008 in ?? ()
#13 0x00000028 in ?? ()
#14 0x00000028 in ?? ()
#15 0x00000000 in ?? ()
#16 0x00000000 in ?? ()
#17 0xd44b0cd8 in ?? ()
#18 0xd44b0cb4 in ?? ()
#19 0xc1b89100 in ?? ()
#20 0x00000004 in ?? ()
#21 0x00000001 in ?? ()
#22 0x00000000 in ?? ()
#23 0x0000000c in ?? ()
#24 0x00000000 in ?? ()
#25 0xc046f132 in acd_geom_detach (arg=0xc1b89100, flag=0) at /usr/src/sys/dev/ata/atapi-cd.c:199
#26 0xc04bf9ef in one_event () at /usr/src/sys/geom/geom_event.c:198
#27 0xc04bfa79 in g_run_events () at /usr/src/sys/geom/geom_event.c:218
#28 0xc04c10dd in g_event_procbody () at /usr/src/sys/geom/geom_kern.c:141
#29 0xc04e1098 in fork_exit (callout=0xc04c1070 <g_event_procbody>, arg=0x0, frame=0xd44b0d38)
    at /usr/src/sys/kern/kern_fork.c:789
#30 0xc063998c in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:208

The problem is still present in today's RELENG_6.

Fix: 

More of a workaround than a real fix, but skipping the call to
g_wither_geom(cdp->gp, ENXIO) in atapi-cd.c when cdp is
already NULL seems to be enough to keep the system stable.

http://www.fabiankeil.de/sourcecode/freebsd/atapi-cd.c.patch
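
The workaround described above can be sketched in userland C. This is
not the linked patch and not the kernel source: only g_wither_geom,
cdp->gp and ENXIO come from this report. The stub types, the call
counter, and every name ending in _stub or _sketch are hypothetical
stand-ins so that the NULL guard can be compiled and exercised outside
the kernel.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct geom_stub {
    int withered;   /* set once the geom has been asked to go away */
    int error;      /* error code handed to the wither call */
};

struct acd_softc_stub {
    struct geom_stub *gp;
};

/* Counts how often the wither stub ran, for demonstration only. */
static int wither_calls = 0;

/* Stub standing in for the kernel's g_wither_geom(). */
static void
g_wither_geom_stub(struct geom_stub *gp, int error)
{
    gp->withered = 1;
    gp->error = error;
    wither_calls++;
}

/*
 * Sketch of the guarded detach: if the softc pointer is already NULL
 * (the drive has vanished), skip the wither call instead of
 * dereferencing cdp. Returns 1 if the geom was withered, 0 if skipped.
 */
static int
acd_geom_detach_sketch(struct acd_softc_stub *cdp)
{
    if (cdp == NULL)
        return 0;
    g_wither_geom_stub(cdp->gp, ENXIO);
    return 1;
}
```

Calling acd_geom_detach_sketch(NULL) returns 0 without touching any
geom, while calling it with a valid softc marks its geom as withered
with ENXIO. As noted above, this only avoids the dereference; it does
not explain why cdp ends up NULL in the first place.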
How-To-Repeat: Let the firmware of an atapi drive crash.

Similar panics have been reported with suspend/resume
cycles on some laptops, but I never saw these myself and
don't know if the patch will help there.
Comment 1 Mark Linimon freebsd_committer freebsd_triage 2006-06-16 16:30:53 UTC
Responsible Changed
From-To: freebsd-bugs->sos

Over to maintainer for evaluation.
Comment 2 Fabian Keil 2008-01-12 12:51:15 UTC
I no longer use FreeBSD 5.3 or 6, but at least for FreeBSD 7
and later this seems to be fixed. I can no longer reproduce
the panic since about this commit:

> sos         2007-11-19 21:11:26 UTC
> 
>   FreeBSD src repository
> 
>   Modified files:
>     sys/dev/ata          atapi-cd.c atapi-fd.c atapi-tape.c 
>   Log:
>   Dont fumble the ivars on reinit, avoids panic on suspend/resume om some systems that looses thier devices.
>   
>   Patch by: jhb@
>   
>   Revision  Changes    Path
>   1.196     +0 -3      src/sys/dev/ata/atapi-cd.c
>   1.111     +0 -3      src/sys/dev/ata/atapi-fd.c
>   1.104     +0 -3      src/sys/dev/ata/atapi-tape.c

I haven't tried to trigger the panic for at least half a
year, though, so I can't rule out the possibility that
something else fixed it.

Fabian
Comment 3 Mark Linimon freebsd_committer freebsd_triage 2009-05-12 05:38:27 UTC
Responsible Changed
From-To: sos->freebsd-bugs

sos@ is not actively working on ATA-related PRs.
Comment 4 Jaakko Heinonen freebsd_committer freebsd_triage 2011-02-21 15:27:53 UTC
State Changed
From-To: open->closed

Apparently this has already been fixed.