Bug 19579

Summary: ahc/aic7892: SCSI timeouts with heavy writes
Product: Base System
Reporter: thz <thz>
Component: kern
Assignee: freebsd-bugs (Nobody) <bugs>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: 4.0-STABLE   
Hardware: Any   
OS: Any   

Description thz 2000-06-29 15:40:05 UTC
	During heavy writes, timeout messages appear from the ahc driver (see below).
	The driver never recovers and falls from one timeout to the next
	until the pack gets invalidated.

	Swapping the drives for known good ones does NOT help
	(even ones of a different type/vendor)

	Swapping the cables/terminators for known good ones does NOT help

	Drive and cables work fine on an ASUS P2B-S with aic7890 controller.

	Disabling writeback cache in the drive does NOT help

	The problem shows up ONLY for fast massive writes. The disk
	can be read with dd if=/dev/da0s1 of=/dev/null without
	problems.  Between the last write on the SCSI bus and
	the timeout, the disk seems to stay selected: the activity
	LED on the drive stays on until it gets the BDR from the ahc
	driver.

	The timeouts disappear when:
	- disabling tagged queuing
	- disabling processor caching or changing it to writethru in BIOS
	  (in effect, this just slows down the processor)

	For comparison, see also:  PR misc/18786, PR i386/19226

Driver messages:

(da0:ahc0:0:0:0): SCB 0x2c - timed out while idle, SEQADDR == 0xa
(da0:ahc0:0:0:0): Queuing a BDR SCB
(da0:ahc0:0:0:0): SCB 0x2c - timed out in Data-out phase, SEQADDR == 0x167
(da0:ahc0:0:0:0): no longer in timeout, status = 34b
ahc0: Issued Channel A Bus Reset. 49 SCBs aborted
(da0:ahc0:0:0:0): SCB 0x2c - timed out while idle, SEQADDR == 0x9
(da0:ahc0:0:0:0): Queuing a BDR SCB
(da0:ahc0:0:0:0): Bus Device Reset Message Sent
(da0:ahc0:0:0:0): no longer in timeout, status = 34b
ahc0: Bus Device Reset on A:0. 48 SCBs aborted
ahc0:A:0: ahc_intr - referenced scb not valid during seqint 0x71 scb(14)
ahc0: WARNING no command for scb 14 (cmdcmplt)
QOUTPOS = 113
(da0:ahc0:0:0:0): SCB 0x2c - timed out while idle, SEQADDR == 0xa
(da0:ahc0:0:0:0): Queuing a BDR SCB
(da0:ahc0:0:0:0): Bus Device Reset Message Sent
(da0:ahc0:0:0:0): no longer in timeout, status = 34b
ahc0: Bus Device Reset on A:0. 15 SCBs aborted
ahc0:A:0: ahc_intr - referenced scb not valid during seqint 0x71 scb(14)
ahc0: WARNING no command for scb 14 (cmdcmplt)
QOUTPOS = 188
...
Invalidating pack
...
devstat_end_transaction: HELP!! busy_count for da0 if < 0
...
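
The messages above can be tallied by the state the driver reports for each timeout ("while idle", "in Data-out phase"). This is only a sketch: the function name summarize_timeouts is made up for illustration, and it assumes the driver messages were saved to a plain text file in exactly the format shown above.

```shell
#!/bin/sh
# Hypothetical helper: count ahc timeout messages per reported state.
# $1 is a text file holding driver messages in the format shown above.
summarize_timeouts() {
    awk '
        / - timed out / {
            sub(/.* - timed out /, "")   # drop everything up to the state
            sub(/, SEQADDR.*/, "")       # drop the sequencer address
            n[$0]++
        }
        END { for (p in n) print n[p], p }
    ' "$1" | sort
}
```

It would be called as summarize_timeouts followed by the name of the saved log. Each output line is a count followed by the state, which makes it easy to see whether the timeouts cluster in one phase.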

Fix: 

Workaround:
	Disabling tagged queuing eliminates the problem, at the cost of
	reduced write performance.
How-To-Repeat: 
	- iozone 300
	- dd if=/dev/zero of=testfile count=10000b
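
A parameterized variant of the dd write load, as a sketch only. The function name write_load and its two arguments are assumptions made for illustration; the block size of 512 bytes is written out explicitly so that the amount of data equals the block count times 512. If dd treats the trailing b in count=10000b as a multiplier of 512, as its manual describes for numeric arguments, the original command corresponds to 5120000 such blocks.

```shell
#!/bin/sh
# Hypothetical helper: sequential zero-fill write of a given size.
# $1: target file on the disk under test
# $2: number of 512-byte blocks to write
write_load() {
    dd if=/dev/zero of="$1" bs=512 count="$2" 2>/dev/null
}
```

A small count is useful to check the setup first, before running the full-size write that triggers the timeouts.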
Comment 1 thz 2000-08-29 09:22:48 UTC
This problem was caused by the type of motherboard (Supermicro PIIIDM3).
Other boards of the same model show the same problem (and not only
this one; ECC problems as well). So we changed the manufacturer and now use
the Adaptec 29160 controller, which has the same chip (aic7892) and uses the
same driver. The Seagate disk is unchanged, and it has worked like a charm for
over a month now. Conclusion: I think the PR can be closed.
Comment 2 Sheldon Hearn freebsd_committer freebsd_triage 2000-08-29 10:01:39 UTC
State Changed
From-To: open->closed

Thanks for your feedback!