Summary: | System crash after SCSI DAT tape access. | ||
---|---|---|---|
Product: | Base System | Reporter: | unger <unger> |
Component: | kern | Assignee: | Justin T. Gibbs <gibbs> |
Status: | Closed FIXED | ||
Severity: | Affects Only Me | ||
Priority: | Normal | ||
Version: | 2.2.1-RELEASE | ||
Hardware: | Any | ||
OS: | Any |
Description
unger
1997-08-18 18:20:03 UTC
Responsible Changed From-To: freebsd-bugs->gibbs I'll probably end up fixing this. I'm seeing the same problem using FreeBSD 2.2.5. Here's the portion of my dmesg where the drives are detected. Note that I'm using both the wide and narrow SCSI chain (which shouldn't matter). bt0 <Buslogic 946 SCSI host adapter> rev 6 int a irq 15 on pci0:9 bt0: Bt956C/ 0-(32bit) bus bt0: reading board settings, busmastering, int=15 bt0: version 4.28A, async only, no parity, 32 mbxs, 32 ccbs bt0: targ 0 async bt0: targ 1 async bt0: targ 6 async bt0: Using Strict Round robin scheme bt0 waiting for scsi devices to settle (bt0:0:0): "SEAGATE ST31230W 0510" type 0 fixed SCSI 2 sd0(bt0:0:0): Direct-Access 1010MB (2069860 512 byte sectors) (bt0:1:0): "HP HP35480A A" type 1 removable SCSI 2 st0(bt0:1:0): Sequential-Access density code 0x13, drive empty (bt0:6:0): "SEAGATE ST34371N 0280" type 0 fixed SCSI 2 sd1(bt0:6:0): Direct-Access 4148MB (8496960 512 byte sectors) Exactly the same fault - the controller locks (SCSI chain dead, get neato console messages like "bt0: Abort Operation has timed out" and "bt0: not responding to commands") and then have to reboot the machine. Running processes that don't need to access the disk keep working. Because disk access is locked it's not logging these errors - all I see in the message log is "bt0: Try to abort". I did not have this problem when using an Adaptec 2940 and a DAT drive, it's only manifested since I went to the Buslogic card. Are there any solutions besides "try playing with the timeouts in the st driver" or going to a different OS? Thanks, Bryan I've spent a little time working on this problem myself. I got the kernel sources and went through st.c and increased command timeouts by a factor of 4. Then I enabled debugging. Here is a trace from a 'dump 0fB /dev/rst0 9999999 /' command. Since the computer usually crashes I'm running these test in single use mode with all file systems mounted read only. [ TIMEOUTFACTOR = 4, scsi_prevent not changed.] ... ------------------------------ 000: 6f 72 74 09 31 39 39 39 2f 75 64 70 20 20 20 23 016: 63 69 73 63 6f 20 69 64 65 6e 74 69 66 69 63 61 032: 74 69 6f 6e 20 70 6f 72 74 0a 63 61 6c 6c 62 6f 048: 6f 6b 09 32 30 30 30 2f 74 63 70 0a 63 61 6c 6c ------------------------------ st0(bt0:5:0): calling private start() st0(bt0:5:0): ststart st0(bt0:5:0): ststrategy st0(bt0:5:0): 10240 bytes @ blk54400 st0(bt0:5:0): ststart st0(bt0:5:0): scsi_cmd st0(bt0:5:0): scsi_done st0(bt0:5:0): command: a,0,0,28,0,0-[10240 bytes] ------------------------------ 000: 05 00 00 00 b1 05 ce 34 00 00 00 00 01 00 00 00 016: 40 6a 00 00 ff 0e 00 00 6c ea 00 00 03 e6 04 6c 032: a4 81 01 00 00 00 00 00 9d ef 00 00 00 00 00 00 048: 80 03 ce 34 00 00 00 00 2b a1 8b 34 00 00 00 00 ------------------------------ st0(bt0:5:0): calling private start() st0(bt0:5:0): ststart st0(bt0:5:0): stclose: Closing device st0(bt0:5:0): scsi_cmd st0(bt0:5:0): scsi_done st0(bt0:5:0): command: 10,0,0,0,1,0-[0 bytes] st0(bt0:5:0): calling private start() st0(bt0:5:0): ststart st0(bt0:5:0): scsi_cmd ccb:f07a4e00 op:0 cmdlen:6 senlen:32 datlen:0 hstat:0 tstat:0 flags:1 bt0: Try to abort ccb:f07a4e00 op:0 cmdlen:6 senlen:32 datlen:0 hstat:0 tstat:0 flags:2 bt0: Abort Operation has timed out st0(bt0:5:0): scsi_done st0(bt0:5:0): command: 1e,0,0,0,0,0-[0 bytes] st0(bt0:5:0): calling private start() st0(bt0:5:0): ststart st0(bt0:5:0): unmounting st0(bt0:5:0): scsi_cmd st0(bt0:5:0): scsi_done st0(bt0:5:0): command: 10,0,0,0,1,0-[0 bytes] st0(bt0:5:0): calling private start() st0(bt0:5:0): ststart st0(bt0:5:0): scsi_cmd st0(bt0:5:0): scsi_done st0(bt0:5:0): command: 1,0,0,0,0,0-[0 bytes] st0(bt0:5:0): calling private start() st0(bt0:5:0): ststart st0(bt0:5:0): scsi_cmd st0(bt0:5:0): scsi_done st0(bt0:5:0): command: 0,0,0,0,0,0-[0 bytes] st0(bt0:5:0): calling private start() st0(bt0:5:0): ststart st0(bt0:5:0): scsi_done st0(bt0:5:0): command: 0,0,0,0,0,0-[0 bytes] st0(bt0:5:0): scsi_cmd st0(bt0:5:0): scsi_done st0(bt0:5:0): command: 0,0,0,0,0,0-[0 bytes] st0(bt0:5:0): calling private start() st0(bt0:5:0): ststart st0(bt0:5:0): mounting st0(bt0:5:0): scsi_cmd st0(bt0:5:0): scsi_done st0(bt0:5:0): command: 1b,0,0,0,1,0-[0 bytes] st0(bt0:5:0): calling private start() st0(bt0:5:0): ststart st0(bt0:5:0): scsi_cmd st0(bt0:5:0): scsi_done st0(bt0:5:0): command: 0,0,0,0,0,0-[0 bytes] st0(bt0:5:0): calling private start() st0(bt0:5:0): ststart st0(bt0:5:0): scsi_cmd st0(bt0:5:0): scsi_done st0(bt0:5:0): command: 5,0,0,0,0,0-[6 bytes] ------------------------------ 000: 00 ff ff ff 00 01 ------------------------------ st0(bt0:5:0): calling private start() st0(bt0:5:0): ststart st0(bt0:5:0): scsi_cmd st0(bt0:5:0): scsi_done st0(bt0:5:0): command: 1a,0,0,0,c,0-[12 bytes] ------------------------------ 000: 0b 00 10 08 13 00 00 00 00 00 00 00 ------------------------------ st0(bt0:5:0): calling private start() st0(bt0:5:0): ststart st0(bt0:5:0): starting block mode decision st0(bt0:5:0): scsi_cmd st0(bt0:5:0): scsi_done st0(bt0:5:0): command: 15,0,0,0,c,0-[12 bytes] ------------------------------ 000: 00 00 10 08 13 00 00 00 00 00 00 00 ------------------------------ st0(bt0:5:0): calling private start() st0(bt0:5:0): ststart st0(bt0:5:0): scsi_cmd st0(bt0:5:0): scsi_done st0(bt0:5:0): command: 1e,0,0,0,1,0-[0 bytes] st0(bt0:5:0): calling private start() st0(bt0:5:0): ststart st0(bt0:5:0): Open complete st0(bt0:5:0): stopen: dev=0xe00 (unit 0) result 0 st0(bt0:5:0): stclose: Closing device st0(bt0:5:0): scsi_cmd st0(bt0:5:0): scsi_done st0(bt0:5:0): command: 1e,0,0,0,0,0-[0 bytes] st0(bt0:5:0): calling private start() st0(bt0:5:0): ststart st0(bt0:5:0): unmounting st0(bt0:5:0): scsi_cmd st0(bt0:5:0): scsi_done st0(bt0:5:0): command: 1,0,0,0,0,0-[0 bytes] st0(bt0:5:0): calling private start() st0(bt0:5:0): ststart Note that as the system closes the device there is a timeout. I believe this is the error that I've been seeing. Reading kern/1467 I found the suggestion that the timeout in st_prevent was too low. I increased it from 5000 to 300000 and ran the dump again. Success!. So I tried to do 2 dumps onto the same tape using /dev/nrst0. On the second dump I ran into problems. On the first test the scsi device seemed to stop responding to commands. The system ended up in a loop where dump was receiving an error, trying a new operation, receiving an erro, trying a new op... so the second test I aborted dump at the first sign of trouble. Here is the log: 09 016: 31 35 30 33 2f 74 63 70 20 20 20 20 23 44 61 74 032: 61 62 65 61 6d 0a 69 6d 74 63 2d 6d 63 73 09 31 048: 35 30 33 2f 75 64 70 20 20 20 20 23 44 61 74 61 ------------------------------ st0(bt0:5:0): calling private start() st0(bt0:5:0): ststart st0(bt0:5:0): ststrategy st0(bt0:5:0): 10240 bytes @ blk54420 st0(bt0:5:0): ststart st0(bt0:5:0): scsi_cmd st0(bt0:5:0): scsi_done st0(bt0:5:0): command: a,0,0,28,0,0-[10240 bytes] ------------------------------ 000: 20 20 23 56 41 54 20 64 65 66 61 75 6c 74 20 63 016: 6f 6e 74 72 6f 6c 0a 75 64 74 5f 6f 73 09 09 33 032: 39 30 30 2f 74 63 70 20 20 20 23 55 6e 69 64 61 048: 74 61 20 55 44 54 20 4f 53 0a 75 64 74 5f 6f 73 ------------------------------ st0(bt0:5:0): calling private start() st0(bt0:5:0): ststart st0(bt0:5:0): stclose: Closing device st0(bt0:5:0): scsi_cmd st0(bt0:5:0): scsi_done st0(bt0:5:0): command: 10,0,0,0,1,0-[0 bytes] st0(bt0:5:0): calling private start() st0(bt0:5:0): ststart st0(bt0:5:0): scsi_cmd st0(bt0:5:0): scsi_done st0(bt0:5:0): command: 1e,0,0,0,0,0-[0 bytes] st0(bt0:5:0): calling private start() st0(bt0:5:0): ststart st0(bt0:5:0): scsi_cmd st0(bt0:5:0): scsi_done st0(bt0:5:0): command: 0,0,0,0,0,0-[0 bytes] st0(bt0:5:0): calling private start() st0(bt0:5:0): ststart st0(bt0:5:0): scsi_cmd st0(bt0:5:0): scsi_done st0(bt0:5:0): command: 0,0,0,0,0,0-[0 bytes] st0(bt0:5:0): calling private start() st0(bt0:5:0): ststart st0(bt0:5:0): scsi_cmd st0(bt0:5:0): scsi_done st0(bt0:5:0): command: 1e,0,0,0,1,0-[0 bytes] st0(bt0:5:0): calling private start() st0(bt0:5:0): ststart st0(bt0:5:0): Open complete st0(bt0:5:0): stopen: dev=0xe01 (unit 0) result 0 st0(bt0:5:0): stclose: Closing device st0(bt0:5:0): scsi_cmd st0(bt0:5:0): scsi_done st0(bt0:5:0): command: 1e,0,0,0,0,0-[0 bytes] st0(bt0:5:0): calling private start() st0(bt0:5:0): ststart st0(bt0:5:0): scsi_cmd st0(bt0:5:0): scsi_done st0(bt0:5:0): command: 0,0,0,0,0,0-[0 bytes] st0(bt0:5:0): calling private start() st0(bt0:5:0): ststart st0(bt0:5:0): scsi_cmd st0(bt0:5:0): scsi_done st0(bt0:5:0): command: 0,0,0,0,0,0-[0 bytes] st0(bt0:5:0): calling private start() st0(bt0:5:0): ststart st0(bt0:5:0): scsi_cmd st0(bt0:5:0): scsi_done st0(bt0:5:0): command: 1e,0,0,0,1,0-[0 bytes] st0(bt0:5:0): calling private start() st0(bt0:5:0): ststart st0(bt0:5:0): Open complete st0(bt0:5:0): stopen: dev=0xe01 (unit 0) result 0 st0(bt0:5:0): ststrategy st0(bt0:5:0): 10240 bytes @ blk0 st0(bt0:5:0): ststart st0(bt0:5:0): scsi_cmd st0(bt0:5:0): scsi_done st0(bt0:5:0): command: a,0,0,28,0,0-[10240 bytes] ------------------------------ 000: 01 00 00 00 2b 14 ce 34 00 00 00 00 01 00 00 00 016: 00 00 00 00 00 00 00 00 6c ea 00 00 d3 5c 33 4c 032: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 048: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ------------------------------ st0(bt0:5:0): calling private start() st0(bt0:5:0): ststart st0(bt0:5:0): ststrategy st0(bt0:5:0): 10240 bytes @ blk20 st0(bt0:5:0): ststart st0(bt0:5:0): scsi_cmd st0(bt0:5:0): scsi_done st0(bt0:5:0): command: a,0,0,28,0,0-[10240 bytes] ------------------------------ 000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 016: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 032: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 048: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ------------------------------ st0(bt0:5:0): calling private start() st0(bt0:5:0): ststart st0(bt0:5:0): ststrategy st0(bt0:5:0): 10240 bytes @ blk40 st0(bt0:5:0): ststart st0(bt0:5:0): scsi_cmd st0(bt0:5:0): scsi_done st0(bt0:5:0): command: a,0,0,28,0,0-[10240 bytes] ------------------------------ 000: 02 00 00 00 2b 14 ce 34 00 00 00 00 01 00 00 00 016: 14 00 00 00 08 00 00 00 6c ea 00 00 4b e7 44 7d 032: e8 41 02 00 00 00 00 00 00 02 00 00 00 00 00 00 048: 4c b1 cd 34 00 00 00 00 be 71 8b 34 00 00 00 00 ------------------------------ st0(bt0:5:0): calling private start() st0(bt0:5:0): ststart st0(bt0:5:0): ststrategy st0(bt0:5:0): 10240 bytes @ blk60 st0(bt0:5:0): ststart st0(bt0:5:0): scsi_cmd st0(bt0:5:0): scsi_done st0(bt0:5:0): command: a,0,0,28,0,0-[10240 bytes] ------------------------------ 000: 0d 00 00 00 0c 00 04 01 2e 00 00 00 02 00 00 00 016: 0c 00 04 02 2e 2e 00 00 0e 00 00 00 18 00 08 0f 032: 6c 6f 63 61 74 65 2e 64 61 74 61 62 61 73 65 00 048: 3d 00 00 00 14 00 08 0a 6d 6f 75 6e 74 64 62 74 ------------------------------ st0(bt0:5:0): calling private start() st0(bt0:5:0): ststart st0(bt0:5:0): ststrategy st0(bt0:5:0): 10240 bytes @ blk80 st0(bt0:5:0): ststart st0(bt0:5:0): scsi_cmd st0(bt0:5:0): scsi_done st0(bt0:5:0): command: a,0,0,28,0,0-[10240 bytes] ------------------------------ 000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 016: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 032: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 048: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ------------------------------ st0(bt0:5:0): calling private start() st0(bt0:5:0): ststart st0(bt0:5:0): ststrategy st0(bt0:5:0): 10240 bytes @ blk100 st0(bt0:5:0): ststart st0(bt0:5:0): scsi_cmd st0(bt0:5:0): scsi_done st0(bt0:5:0): command: a,0,0,28,0,0-[10240 bytes] ------------------------------ 000: 02 00 00 00 2b 14 ce 34 00 00 00 00 01 00 00 00 016: 32 00 00 00 1c 00 00 00 6c ea 00 00 15 63 7b 7d 032: ed 41 0a 00 00 00 00 00 00 02 00 00 00 00 00 00 048: 3d ee cd 34 00 00 00 00 8e b8 4c 34 00 00 00 00 ------------------------------ st0(bt0:5:0): calling private start() st0(bt0:5:0): ststart st0(bt0:5:0): ststrategy st0(bt0:5:0): 10240 bytes @ blk120 st0(bt0:5:0): ststart st0(bt0:5:0): scsi_cmd st0(bt0:5:0): scsi_done st0(bt0:5:0): command: a,0,0,28,0,0-[10240 bytes] ------------------------------ 000: 1f 00 00 00 0c 00 04 01 2e 00 00 00 1c 00 00 00 016: f4 01 04 02 2e 2e 00 00 da 02 00 00 14 00 08 0a 032: 71 66 49 41 41 30 30 33 31 34 00 f2 db 02 00 00 048: d4 01 08 0a 78 66 49 41 41 30 30 33 31 34 00 f2 ------------------------------ st0(bt0:5:0): calling private start() st0(bt0:5:0): ststart st0(bt0:5:0): ststrategy st0(bt0:5:0): 10240 bytes @ blk140 st0(bt0:5:0): ststart st0(bt0:5:0): scsi_cmd st0(bt0:5:0): scsi_done st0(bt0:5:0): command: a,0,0,28,0,0-[10240 bytes] ------------------------------ 000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 016: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 032: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 048: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ------------------------------ st0(bt0:5:0): calling private start() st0(bt0:5:0): ststart st0(bt0:5:0): ststrategy st0(bt0:5:0): 10240 bytes @ blk160 st0(bt0:5:0): ststart st0(bt0:5:0): scsi_cmd ccb:f07a7600 op:2 cmdlen:10 senlen:32 datlen:24 hstat:0 tstat:0 flags:1 ccb:f0773e00 op:2 cmdlen:6 senlen:32 datlen:16 hstat:0 tstat:0 flags:1 bt0: Try to abort ccb:f07a7600 op:2 cmdlen:10 senlen:32 datlen:24 hstat:2a tstat:0 flags:1 ccb:f0773e00 op:2 cmdlen:6 senlen:32 datlen:16 hstat:0 tstat:0 flags:1 bt0: Try to abort [ it is somewhere in here that I interupt 'dump' ] st0(bt0:5:0): scsi_done st0(bt0:5:0): command: a,0,0,28,0,0-[10240 bytes] ------------------------------ 000: 02 00 00 00 2b 14 ce 34 00 00 00 00 01 00 00 00 016: 50 00 00 00 26 00 00 00 6c ea 00 00 01 42 83 7d 032: fd 41 02 00 00 00 00 00 00 02 00 00 00 00 00 00 048: 81 0e ce 34 00 00 00 00 8e b8 4c 34 00 00 00 00 ------------------------------ st0(bt0:5:0): calling private start() st0(bt0:5:0): ststart st0(bt0:5:0): stclose: Closing device st0(bt0:5:0): scsi_cmd st0(bt0:5:0): scsi_done st0(bt0:5:0): command: 10,0,0,0,1,0-[0 bytes] code70 valid0 seg0 key4 ili0 eom0 fmark0 info: 0 0 0 0 followed by 10 extra bytes extra: 0 0 0 0 0 0 0 0 0 0 st0(bt0:5:0): calling private err_handler() st0(bt0:5:0): private err_handler() returned -1 st0(bt0:5:0): HARDWARE FAILURE asc:44,0 Internal target failure st0(bt0:5:0): calling private start() st0(bt0:5:0): ststart st0(bt0:5:0): scsi_cmd st0(bt0:5:0): scsi_done st0(bt0:5:0): command: 1e,0,0,0,0,0-[0 bytes] code70 valid0 seg0 key4 ili0 eom0 fmark0 info: 0 0 0 0 followed by 10 extra bytes extra: 0 0 0 0 0 0 0 0 0 0 st0(bt0:5:0): calling private err_handler() st0(bt0:5:0): private err_handler() returned -1 st0(bt0:5:0): calling private start() st0(bt0:5:0): ststart So, I don't know. I don't know much about scsi and to my untrained eye it is not at all clear what is going on. I'll keep working on this but I sure could use some suggestions. Thanks, Tom Unger unger@raindrop.seaslug.org Seems a similar problem is described for exabyte streamers: http://www.exabyte.com/suppserv/techsupp/8mm/misc/in0181.html (it's a page for using those streamers under linux, but I use freebsd, and the instructions seems quite compitable :-) /Peter State Changed From-To: open->closed Both the tape and Buslogic MultiMaster driver are new works in 3.0R. Please retest these scenarios on this equipment and file new bug reports if your problems persist. |