Bug 24469

Summary: system hangs on scsi disk access error
Product: Base System
Reporter: Brett G. Lemoine <bl>
Component: i386
Assignee: freebsd-bugs (Nobody) <bugs>
Status: Closed FIXED
Severity: Affects Only Me
CC: bl
Priority: Normal
Version: 4.2-RELEASE
Hardware: Any
OS: Any

Description Brett G. Lemoine 2001-01-19 23:40:01 UTC
	Sporadically (5 times in the last two weeks, including 3 times
	on one day), I get the errors below on one of my two disks.

(da1:ahc0:0:1:0): SCB 0x1d - timed out while idle, SEQADDR == 0x5
STACK == 0x13, 0x174, 0x15e, 0x174
SXFRCTL0 == 0x80
SCB count = 110
QINFIFO entries: 34 18 46 1 19 31 52 20 33 9 3 67 57 45 0 30 54 22 50 40 23 8 36 2 32 44 35 5 17 11 28 10 101 15 51 26 6
Waiting Queue entries: 11:66
Disconnected Queue entries: 17:39 27:29
QOUTFIFO entries:
Sequencer Free SCB List: 20 2 0 28 14 10 29 31 15 24 7 19 6 23 18 21 12 26 13 22 4 30 9 3 16 8 25 1 5
Pending list: 6 26 51 15 101 10 28 11 17 5 35 44 32 2 36 8 23 40 50 22 54 30 0 45 57 67 3 9 33 20 52 31 19 1 46 18 34 66 39 29
Kernel Free SCB list: 24 58 25 47 59 55 27 42 4 49 3 8 37 43 21 41 53 48 16 12 69 56 68 13 83 14 82 81 80 99 98 97 96 95 94 93 92 91 90 109 108 107 106 105 104 103 102 65 84 85 86 87 88 89 70 71 72 73 74 75 76 77 78 79 60 61 62 63 64 100
sg[0] - Addr 0x1a608800 : Length 1024
(da1:ahc0:0:1:0): SCB 29: Immediate reset.  Flags = 0x4040
(da1:ahc0:0:1:0): no longer in timeout, status = 34b
ahc0: Issued Channel A Bus Reset. 40 SCBs aborted
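
	For triage, the timeout header line of a dump like the one above
	can be split into its fields with a short script.  This is only a
	minimal sketch that assumes the line format shown in this report;
	the name parse_timeout and the regular expression are illustrative
	and are not part of the ahc driver.

```python
import re

# Matches header lines such as:
#   (da1:ahc0:0:1:0): SCB 0x1d - timed out while idle, SEQADDR == 0x5
# The field names (periph, sim, bus, target, lun) are labels chosen here.
TIMEOUT_RE = re.compile(
    r"\((?P<periph>\w+):(?P<sim>\w+):(?P<bus>\d+):(?P<target>\d+):(?P<lun>\d+)\): "
    r"SCB (?P<scb>0x[0-9a-fA-F]+) - timed out (?P<phase>.+?), "
    r"SEQADDR == (?P<seqaddr>0x[0-9a-fA-F]+)"
)


def parse_timeout(line):
    """Return the fields of a timeout header line as a dict, or None if no match."""
    m = TIMEOUT_RE.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    # Path components are decimal; SCB and SEQADDR are printed in hex.
    for key in ("bus", "target", "lun"):
        fields[key] = int(fields[key])
    for key in ("scb", "seqaddr"):
        fields[key] = int(fields[key], 16)
    return fields
```

	Applied to the da1 header line, the scb field comes out as 29
	(0x1d), which matches the "SCB 29: Immediate reset" line later in
	the same dump.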

	After looking for similar problems in the GNATS database, I saw
	suggestions to disable tagged queueing, which I then did on
	both disks (using camcontrol).
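
	The report does not record the exact camcontrol invocation.  One
	possibility, sketched here as an assumption, is the "negotiate"
	subcommand with "-T disable" as described in camcontrol(8); check
	the man page on the installed release before relying on it.  The
	helper below only builds and prints the command lines and does not
	run them; its name is made up for this example.

```python
def disable_tagged_queueing_cmd(unit, periph="da"):
    """Build (but do not run) a camcontrol command line for one device.

    Assumed flags, per camcontrol(8):
      -n / -u  select the peripheral name and unit number
      -T disable  turns tagged queueing off
      -a  asks for the setting to take effect immediately
    """
    return ["camcontrol", "negotiate",
            "-n", periph, "-u", str(unit),
            "-T", "disable", "-a"]


# The report mentions two disks, da0 and da1.
for unit in (0, 1):
    print(" ".join(disable_tagged_queueing_cmd(unit)))
```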

	I then didn't see the problem for a while, so I thought that it
	had been taken care of, but today I got the following:


(da0:ahc0:0:0:0): SCB 0x8 - timed out while idle, SEQADDR == 0x3e
STACK == 0x1, 0x1, 0x1, 0x1
SXFRCTL0 == 0x80
SCB count = 20
QINFIFO entries: 8 14
Waiting Queue entries:
Disconnected Queue entrties:
QOUTFIFO entries:
Sequencer Free SCB List: 1 0 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
Pending list: 14 8
Kernel Free SCB list: 15 16 17 18 18 0 1 2 3 4 5 6 7 13 12 11 10
Untagged Q(0): 8
Untagged Q(1): 14
sg[0] - Addr 0x3c381000 : Length 4096
sg[1] - Addr 0x35ce2000 : Length 2048
(da0:ahc0:0:0:0): SCB 8: Immediate reset.  Flags = 0x6040
(da0:ahc0:0:0:0): no longer in timeout, status = 34b
ahc0: Issued Channel A Bus Reset. 2 SCBs aborted

(da0:ahc0:0:0:0): SCB 0x9 - timed out while idle, SEQADDR == 0x3e
STACK == 0x1, 0x1, 0x1, 0x1
SXFRCTL0 == 0x80
SCB count = 20
QINFIFO entries: 9 14
Waiting Queue entries:
Disconnected Queue entrties:
QOUTFIFO entries:
Sequencer Free SCB List: 1 0 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
Pending list: 14 9
Kernel Free SCB list: 15 16 17 18 18 0 1 2 3 4 5 6 7 13 12 11 10
Untagged Q(0): 9
Untagged Q(1): 14
sg[0] - Addr 0x3c381000 : Length 4096
sg[1] - Addr 0x35ce2000 : Length 2048
(da0:ahc0:0:0:0): SCB 8: Immediate reset.  Flags = 0x6040
(da0:ahc0:0:0:0): no longer in timeout, status = 34b
ahc0: Issued Channel A Bus Reset. 2 SCBs aborted
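
	One consistency check that can be run on dumps like these is
	whether every SCB on the pending list also appears in one of the
	controller queues.  The sketch below assumes that the second number
	in each "a:b" queue entry is the SCB tag; the report does not say
	so, although the da1 dump is consistent with that reading.  The
	function names are illustrative.

```python
def tags(entries):
    """Turn the text after a dump label into a list of SCB tags.

    Plain numbers are taken as tags; for 'a:b' pairs the part after the
    colon is used (an assumption about the dump format).
    """
    return [int(tok.split(":")[-1]) for tok in entries.split()]


def unaccounted(pending, *queues):
    """Return pending SCB tags that appear in none of the queue listings."""
    seen = set()
    for q in queues:
        seen.update(tags(q))
    return [t for t in tags(pending) if t not in seen]


# Second dump in this report: pending list, QINFIFO, waiting, disconnected.
print(unaccounted("14 8", "8 14", "", ""))
```

	For both da0 dumps the result is an empty list: the two pending
	SCBs are the same two that are still queued in the QINFIFO, and
	their count matches the "2 SCBs aborted" line.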

	I'm somewhat new to PC-type hardware, so this may be nothing,
	but are the two channels on the ahc's _supposed_ to have the
	same IRQ?  I couldn't find a way to alter either ahc's IRQ
	from either the system or SCSI BIOS, so I'm assuming they're
	set up correctly.  Given that there was no activity on the
	other bus (nothing in either the cd-writer or zip drive) at
	the time of the problems, I don't believe it's likely to be
	simply an IRQ issue.

How-To-Repeat: 
	The problems seem to occur most frequently when there's heavy
	disk activity, but I can't seem to reproduce it on demand.
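
	A simple way to generate sustained disk activity while watching
	for the timeout is to write a large file, force it to disk, and
	read it back.  The sketch below is only an illustration and is not
	known to trigger the hang; the function name, sizes, and any path
	passed to it are assumptions.

```python
import os
import tempfile


def churn(path, total_bytes, chunk_bytes=1 << 20):
    """Write total_bytes to path in chunks, fsync, then read the file back.

    Returns (bytes_written, bytes_read).
    """
    chunk = os.urandom(chunk_bytes)
    written = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(chunk_bytes, total_bytes - written)
            f.write(chunk[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())  # push the data out to the device

    read = 0
    with open(path, "rb") as f:
        while True:
            buf = f.read(chunk_bytes)
            if not buf:
                break
            read += len(buf)
    return written, read
```

	The target path should be on the disk under suspicion, and the
	size should be well above the amount of RAM; otherwise the
	read-back may be served from the buffer cache rather than from the
	disk.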
Comment 1 Vallo Kallaste 2001-01-23 06:37:52 UTC
On Fri, Jan 19, 2001 at 03:36:27PM -0800, "Brett G. Lemoine" <bl@incyte.com> wrote:

> >Category:       i386
> >Synopsis:       system hangs on scsi disk access error
> >Confidential:   no
> 
> 	TYAN Thunderbolt S1837 motherboard w/ onboard Adaptec
> 	AIC-7896 dual channel Ultra2 LVD SCSI
> 
> 	I'm somewhat new to PC-type hardware, so this may be nothing,
> 	but are the two channels on the ahc's _supposed_ to have the
> 	same IRQ?  I couldn't find a way to alter either ahc's IRQ
> 	from either the system or scsi bios, so I'm assuming they're
> 	setup correctly.  Given that there was no activity on the
> 	other bus (nothing in either the cd-writer or zip drive) at
> 	the time of the problems, I don't believe it's likely to be
> 	simply an IRQ issue.

I have the same mobo and yes, both (ahc) sit on IRQ 10. Two SE disks
and no problems. I'm running -current, though.
-- 

Vallo Kallaste
vallo@matti.ee
Comment 2 Justin T. Gibbs 2001-02-05 23:34:59 UTC
Have you tried to reproduce this with 4.2-stable?  I believe that
this problem has been resolved there.

--
Justin
Comment 3 Matt Jacob freebsd_committer freebsd_triage 2001-10-02 03:05:51 UTC
State Changed
From-To: open->feedback

Is this still a problem?
Comment 4 wilko freebsd_committer freebsd_triage 2001-11-24 11:43:36 UTC
State Changed
From-To: feedback->closed

Timeout polling for feedback. Orig PR dates from Jan 2001, 
mjacob re-polled on Oct 1.