Bug 23580

Summary: Second SCSI Adapter (Adaptec 39160) still hanging system
Product: Base System
Component: kern
Reporter: James F. Hranicky <jfh>
Assignee: freebsd-bugs (Nobody) <bugs>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: 4.2-STABLE   
Hardware: Any   
OS: Any   

Description James F. Hranicky 2000-12-16 05:10:01 UTC
	[ Note: I sent a message to freebsd-questions earlier, but have since done
	  more work, and feel a PR is necessary --jfh ]

	Installing a second SCSI adapter (Adaptec 39160) hangs the system. Here's
	a short synopsis of what I've tried:

	Initial installation:

	  - After installing the new card and booting, I found that the system
	    hung partway through boot: after the probe of the SCSI adapters, but
	    before the probe of the disks. On further investigation, I found that
	    the first adapter was sharing IRQ 11 between both of its channels, and
	    the second adapter was taking IRQs 5 and 11 for its channels, so at
	    first I suspected an IRQ conflict. The boot process halts right after
	    the probe of the parallel port:

plip0: <PLIP network interface> on ppbus0
lpt0: <Printer> on ppbus0
lpt0: Interrupt-driven port
ppi0: <Parallel I/O> on ppbus0

	    At this point, scroll lock and serial break have no effect, but
	    <Ctrl><Alt><Del> (after plugging the keyboard back in) does reboot the machine.

	  - After searching the archives, I discovered that FreeBSD can use IRQs
	    higher than 15 if APIC_IO is enabled (which appears to require SMP support),
	    so I compiled an SMP kernel for the machine. Here are the boot messages
	    for that kernel (slightly edited and annotated):

-----------------------------------------------------------------

FreeBSD 4.2-STABLE #0: Fri Dec 15 13:29:02 EST 2000
    root@palm.cise.ufl.edu:/private/freebsd-src/src/sys/compile/CISEKERN.SMP
Timecounter "i8254"  frequency 1193182 Hz
CPU: Pentium III/Pentium III Xeon/Celeron (646.67-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x683  Stepping = 3
  Features=0x387fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,PN,MMX,FXSR,SSE>
real memory  = 268369920 (262080K bytes)
avail memory = 256520192 (250508K bytes)
Programming 24 pins in IOAPIC #0
IOAPIC #0 intpin 2 -> irq 0
FreeBSD/SMP: Multiprocessor motherboard
 cpu0 (BSP): apic id:  1, version: 0x00040011, at 0xfee00000
 io0 (APIC): apic id:  0, version: 0x00170011, at 0xfec00000
Preloaded elf kernel "kernel.smp" at 0xc0452000.
Pentium Pro MTRR support enabled
md0: Malloc disk
npx0: <math processor> on motherboard
npx0: INT 16 interface
pcib0: <Intel 82443GX host to PCI bridge> on motherboard
pci0: <PCI bus> on pcib0
pcib2: <Intel 82443GX (440 GX) PCI-PCI (AGP) bridge> at device 1.0 on pci0
pci1: <PCI bus> on pcib2
pcib3: <PCI to PCI bridge (vendor=1011 device=0023)> at device 15.0 on pci1
pci2: <PCI bus> on pcib3

[ Here are the lines for the new SCSI card. When it is installed, it takes
  ahc[0-1]; normally, the internal card has ahc[0-1]. ]

ahc0: <Adaptec 3960D Ultra160 SCSI adapter> port 0x2000-0x20ff mem 0xf4100000-0xf4100fff irq 18 at device 11.0 on pci0
aic7899: Wide Channel A, SCSI Id=7, 32/255 SCBs
ahc1: <Adaptec 3960D Ultra160 SCSI adapter> port 0x2400-0x24ff mem 0xf4101000-0xf4101fff irq 23 at device 11.1 on pci0
aic7899: Wide Channel B, SCSI Id=7, 32/255 SCBs
ahc2: <Adaptec aic7896/97 Ultra2 SCSI adapter> port 0x2800-0x28ff mem 0xf4102000-0xf4102fff irq 19 at device 12.0 on pci0
aic7896/97: Wide Channel A, SCSI Id=7, 32/255 SCBs
ahc3: <Adaptec aic7896/97 Ultra2 SCSI adapter> port 0x2c00-0x2cff mem 0xf4103000-0xf4103fff irq 19 at device 12.1 on pci0
aic7896/97: Wide Channel B, SCSI Id=7, 32/255 SCBs
fxp0: <Intel Pro 10/100B/100+ Ethernet> port 0x3000-0x303f mem 0xf4000000-0xf40fffff,0xf4104000-0xf4104fff irq 21 at device 14.0 on pci0
fxp0: Ethernet address 00:d0:b7:89:0e:24
isab0: <Intel 82371AB PCI to ISA bridge> at device 18.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel PIIX4 ATA33 controller> port 0x3060-0x306f at device 18.1 on pci0
ata0: at 0x1f0 irq 14 on atapci0
ata1: at 0x170 irq 15 on atapci0
pci0: <Intel 82371AB/EB (PIIX4) USB controller> at 18.2 irq 21
Timecounter "PIIX"  frequency 3579545 Hz
chip1: <Intel 82371AB Power management controller> port 0x1040-0x104f at device 18.3 on pci0
pci0: <Cirrus Logic GD5480 SVGA controller> at 20.0
pcib1: <Intel 82443GX host to AGP bridge> on motherboard
pci3: <PCI bus> on pcib1
eisa0: <EISA bus> on motherboard
eisa0: unknown card @@@0000 (0x00000000) at slot 2
fdc0: <NEC 72065B or clone> at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
sc0: <System console> on isa0
sc0: VGA <16 virtual consoles, flags=0x0>
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 16550A, console
sio1 at port 0x2f8-0x2ff irq 3 on isa0
sio1: type 16550A
ppc0: <Parallel port> at port 0x378-0x37f irq 7 on isa0
ppc0: Generic chipset (ECP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/8 bytes threshold
plip0: <PLIP network interface> on ppbus0
lpt0: <Printer> on ppbus0
lpt0: Interrupt-driven port
ppi0: <Parallel I/O> on ppbus0
APIC_IO: Testing 8254 interrupt delivery
APIC_IO: routing 8254 via IOAPIC #0 intpin 2
IPsec: Initialized Security Association Processing.
ata0-slave: ata_command: timeout waiting for intr
ata0-slave: identify failed
acd0: CDROM <CD-540E> at ata0-master using PIO4
Waiting 5 seconds for SCSI devices to settle

[ The system hangs here for 60-90 seconds, then the following messages
  show up ]

(probe45:ahc3:0:0:0): SCB 0x9 - timed out while idle, SEQADDR == 0x3e

[ N.B.: this is the B channel of the internal SCSI card, which works fine
  when the second SCSI card is pulled out again ]

STACK == 0x0, 0x0, 0x0, 0x1
SXFRCTL0 == 0x80
SCB count = 20
QINFIFO entries: 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 
Waiting Queue entries: 
Disconnected Queue entries: 
QOUTFIFO entries: 
Sequencer Free SCB List: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 
Pending list: 15 16 17 18 19 0 1 2 3 4 5 6 7 8 9 
Kernel Free SCB list: 13 12 11 10 
Untagged Q(0): 9 
Untagged Q(1): 8 
Untagged Q(2): 7 
Untagged Q(3): 6 
Untagged Q(4): 18 
Untagged Q(5): 17 
Untagged Q(6): 5 
Untagged Q(8): 4 
Untagged Q(9): 16 
Untagged Q(10): 3 
Untagged Q(11): 2 
Untagged Q(12): 15 
Untagged Q(13): 1 
Untagged Q(14): 0 
Untagged Q(15): 19 
sg[0] - Addr 0xd9b0684 : Length 36
(probe45:ahc3:0:0:0): SCB 9: Immediate reset.  Flags = 0x6040
(probe45:ahc3:0:0:0): no longer in timeout, status = 34b
ahc3: Issued Channel A Bus Reset. 15 SCBs aborted
(probe45:ahc3:0:0:0): SCB 0xe - timed out while idle, SEQADDR == 0x3e
STACK == 0x0, 0x0, 0x1, 0x1
SXFRCTL0 == 0x80
SCB count = 20
QINFIFO entries: 14 15 16 17 18 19 0 1 2 3 4 5 6 7 8 
Waiting Queue entries: 
Disconnected Queue entries: 
QOUTFIFO entries: 
Sequencer Free SCB List: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 
Pending list: 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 
Kernel Free SCB list: 13 12 11 10 
Untagged Q(0): 14 
Untagged Q(1): 15 
Untagged Q(2): 16 
Untagged Q(3): 17 
Untagged Q(4): 5 
Untagged Q(5): 6 
Untagged Q(6): 18 
Untagged Q(8): 19 
Untagged Q(9): 7 
Untagged Q(10): 0 
Untagged Q(11): 1 
Untagged Q(12): 8 
Untagged Q(13): 2 
Untagged Q(14): 3 
Untagged Q(15): 4 
sg[0] - Addr 0xd9b0684 : Length 36
(probe45:ahc3:0:0:0): SCB 14: Immediate reset.  Flags = 0x6040
(probe45:ahc3:0:0:0): no longer in timeout, status = 34b
ahc3: Issued Channel A Bus Reset. 15 SCBs aborted
		
-----------------------------------------------------------------

	  - From here on, these probe/timeout messages keep repeating. Interestingly,
	    scroll lock works at this point, but serial break doesn't.

	  - As another data point, I disabled the parallel port in the BIOS, compiled
	    a new (non-SMP) kernel without parallel port support, used the L440GX+
	    CD-ROM to set the internal SCSI card's IRQ to 7, and then booted. The system
	    hung after the probes of the serial ports. This may well be the same place
	    it was hanging before, since the parallel port probes were now absent.

	  - What convinced me that this is a FreeBSD problem was booting from a Linux
	    emergency floppy and watching it probe the SCSI cards and then the drives
	    themselves (which FreeBSD never got to with the second card installed). I was
	    both relieved and disturbed at the same time.
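
	For reference, an SMP kernel like the one described above is enabled
	with the two options below. This is a minimal sketch that assumes the
	stock FreeBSD 4.x option names (SMP and APIC_IO, as listed in LINT);
	the actual CISEKERN.SMP config isn't included in this PR.

-----------------------------------------------------------------
# Hypothetical excerpt from an SMP kernel config (FreeBSD 4.x)
options 	SMP		# Symmetric MultiProcessor kernel
options 	APIC_IO		# Symmetric (APIC) I/O: routes interrupts
				# through the IOAPIC, which is what allows
				# the irq 18/19/21/23 assignments seen above
-----------------------------------------------------------------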

Fix: 

	I may try debugging the kernel, though I've never done it before and
	don't really know what I'm looking for. Getting this working is a pretty
	big deal, and at this point I'm willing to try just about anything.
How-To-Repeat: 
	I'm not completely sure, but getting the same hardware setup and trying
	4.2-STABLE might work :->
Comment 1 nouser 2000-12-19 17:40:47 UTC
I've discovered a few things:

	- if I disable either of the internal SCSI busses, the machine
	  will boot normally. Is there something interesting about 4 SCSI
	  busses in one machine?

	- booting from -CURRENT (20001218) floppies gets through the boot
	  process until just after the "Waiting 15 seconds for SCSI devices
	  to settle" line, then gives the same timeout errors as the 4.2-STABLE
	  SMP kernel. In other words, it doesn't hang after probing the parallel
	  port the way the non-SMP 4.2 kernel does.

	- enabling CAMDEBUG in the 4.2-STABLE SMP kernel has given me the
	  following messages:


-----------------------------------------------------------------------

[ trimmed ]

(probe14:ahc3:0:15:0): INQUIRY. CDB: 12 0 0 0 24 0 
(probe14:ahc3:0:15:0): ahc_action
(probe0:ahc3:0:0:0): ahc_action
(probe1:ahc3:0:1:0): SCB 0x9 - timed out while idle, SEQADDR == 0x3e
STACK == 0x0, 0x0, 0x0, 0x1
SXFRCTL0 == 0x80
SCB count = 20
QINFIFO entries: 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 
Waiting Queue entries: 
Disconnected Queue entries: 
QOUTFIFO entries: 
Sequencer Free SCB List: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 
Pending list: 15 16 17 18 19 0 1 2 3 4 5 6 7 8 9 
Kernel Free SCB list: 13 12 11 10 
Untagged Q(0): 15 
Untagged Q(1): 9 
Untagged Q(2): 8 
Untagged Q(3): 7 
Untagged Q(4): 6 
Untagged Q(5): 5 
Untagged Q(6): 4 
Untagged Q(8): 3 
Untagged Q(9): 2 
Untagged Q(10): 1 
Untagged Q(11): 0 
Untagged Q(12): 19 
Untagged Q(13): 18 
Untagged Q(14): 17 
Untagged Q(15): 16 
sg[0] - Addr 0xd6eb284 : Length 36
(probe1:ahc3:0:1:0): SCB 9: Immediate reset.  Flags = 0x6040
(probe1:ahc3:0:1:0): ahc_done - scb 9
(probe1:ahc3:0:1:0): no longer in timeout, status = 34b
(probe1:ahc3:0:1:0): xpt_done
(probe2:ahc3:0:2:0): ahc_done - scb 8
(probe2:ahc3:0:2:0): xpt_done
-----------------------------------------------------------------------

Sometimes the timeout shows up after xpt_release_path, not just after ahc_action:

-------
[...]
(xpt0:ahc3:0:1:0): xpt_release_path
(probe2:ahc3:0:2:0): SCB 0xe - timed out while idle, SEQADDR == 0x3e
[...]
-------

	- I tried #defining AHC_DEBUG in /sys/dev/aic7xxx/aic7xxx_freebsd.c,
	  but got several compiler errors. A couple I could fix (with an
	  extern int declaration), and some I couldn't:

	  o in ahc_calc_residual() in /sys/dev/aic7xxx/aic7xxx.c, there is
	    a reference to an "ahc" variable inside the #ifdef AHC_DEBUG block
	    that isn't defined in the function itself

At this point, I do have a workaround: disable channel A on the internal
SCSI card and hook the boot drive to the new card. This should also help
prevent the renumbering of my boot disk as I add drives to the external
buses (I suppose this works because the new card gets probed first,
though I don't know why). For a while, though (probably until the second
week of January), I do have time to help someone debug this problem if
anyone's interested.
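
	Another way to keep the boot disk from being renumbered as drives are
	added elsewhere is to wire the SCSI buses and the disk down in the
	kernel config. A sketch, assuming the wiring syntax documented in
	scsi(4) for FreeBSD 4.x; the mapping below (boot disk at target 0,
	LUN 0 on channel A of the new 39160, i.e. ahc0) is hypothetical:

-----------------------------------------------------------------
# Hypothetical wiring: pin scbus and da numbering to specific adapters
device		scbus0 at ahc0			# new 39160, channel A
device		scbus1 at ahc1			# new 39160, channel B
device		da0 at scbus0 target 0 unit 0	# boot disk (assumed target 0, LUN 0)
-----------------------------------------------------------------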
Comment 2 Justin T. Gibbs 2001-01-06 05:29:58 UTC
>>Synopsis:       Second SCSI Adapter (Adaptec 39160) still hanging system

...

>eisa0: <EISA bus> on motherboard
>eisa0: unknown card @@@0000 (0x00000000) at slot 2

Disable eisa support in your kernel config and your hang will
likely go away.  The problem with EISA probes will likely be fixed
in -current soon.  I'm not sure if those changes will be ported
back to -stable as they may be somewhat invasive.

--
Justin
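
	The suggested fix amounts to dropping the EISA bus from the kernel
	config. A minimal sketch, assuming the config was derived from the
	stock 4.x GENERIC, where EISA support is a single "device eisa" line
	(the actual CISEKERN config isn't shown in this PR):

-----------------------------------------------------------------
# EISA bus support commented out: the bogus "unknown card @@@0000"
# probe in slot 2 is the suspected cause of the hang
#device		eisa
-----------------------------------------------------------------

	The kernel is then rebuilt and installed the usual way: config(8) on
	the config file, then make depend, make and make install in the
	resulting compile directory.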
Comment 3 nouser 2001-01-25 22:48:22 UTC
Disabling eisa support appears to have fixed
the problem. This PR can be closed out.

Thanks very much for the help.

Jim
Comment 4 dwmalone freebsd_committer freebsd_triage 2001-01-26 10:05:38 UTC
State Changed
From-To: open->closed

Problem fixed by disabling EISA support, so closed at the
submitter's request.