| Summary: | Unpredictable enabling of SCSI Tagged Queueing | ||
|---|---|---|---|
| Product: | Base System | Reporter: | klh <klh> |
| Component: | kern | Assignee: | Kenneth D. Merry <ken> |
| Status: | Closed FIXED | ||
| Severity: | Affects Only Me | ||
| Priority: | Normal | ||
| Version: | 3.1-RELEASE | ||
| Hardware: | Any | ||
| OS: | Any | ||
On Sun, Dec 12, 1999 at 06:18:55PM -0800, klh@netcom.com wrote:
>
> >Number: 15446
> >Category: kern
> >Synopsis: Unpredictable enabling of SCSI Tagged Queueing

[ ... ]

> FreeBSD <hostname> 3.1-RELEASE FreeBSD 3.1-RELEASE #<n>: <buildstring> i386
> >Description:
> Whether or not a device capable of tagged queueing is actually flagged
> as such appears to be semi-random. It can be different for two identical
> drives, and can change from one boot to the next of the same kernel.
>
> For example, on one recent boot, two identical drives give this:
> da0: 10.0MB/s transfers (10.0MHz, offset 15), Tagged Queueing Enabled
> ...
> da2: 10.0MB/s transfers (10.0MHz, offset 15)
>
> In conjunction with another problem which I am filing separately (the
> possibility that the Seagate ST32550WC is broken with respect to
> tagged queueing) this causes my system to sometimes work and sometimes
> fail depending on the level of disk activity. I did not notice this
> earlier because until recently I have not significantly stressed the
> disk I/O. Now that it's a problem I've reviewed the system logs for
> the past few months and found that sometimes tagged queueing was
> enabled and sometimes not.
>
> FWIW the version of cam_xpt.c in my source tree is 1.42. There are
> a few too many things involved in the enable/disable code for me to be
> certain what is going on or whether the flags are even being initialized
> properly.

There are several ways to enable or disable tagged queueing:

- If the DQue bit in mode page 10 is set, tagged queueing will be disabled
  for the drive in question. If you want to view/edit mode page 10, see
  the camcontrol(8) man page for details.

- If the drive is quirked in cam_xpt.c to have 0 tags. The only Seagate
  drives quirked in cam_xpt.c are the Seagate Medalist Pro drives.
  Seagate's web site says that the 32550 is a 2 gig Barracuda. I've never
  seen tagged queuing problems with Barracudas.

- If tagged queueing is enabled/disabled from userland via camcontrol(8).
  Since this camcontrol option appeared in FreeBSD 3.2, that isn't an
  option in your situation.

My guess is that somehow the tagged queueing bit is being enabled and
disabled in the drive firmware. None of the other ways of tweaking the
tagged queueing settings would explain the behavior you're seeing.

Check the settings in mode page 10 with camcontrol and see whether the
drive says tagged queueing is enabled or disabled. If the DQue bit is set,
the drive should not be reported as a tagged queueing drive in the dmesg.

If the DQue bit is set and then cleared somehow between boots on your
system, that points fairly strongly to some sort of problem with the drive.

Ken
--
Kenneth Merry
ken@kdm.org

> On Sun, Dec 12, 1999 at 06:18:55PM -0800, klh@netcom.com wrote:
> >
> > >Number: 15446
> > >Category: kern
> > >Synopsis: Unpredictable enabling of SCSI Tagged Queueing
>
> [ ... ]
>
> > FreeBSD <hostname> 3.1-RELEASE FreeBSD 3.1-RELEASE #<n>: <buildstring> i386
> > >Description:
> > Whether or not a device capable of tagged queueing is actually flagged
> > as such appears to be semi-random. It can be different for two identical
> > drives, and can change from one boot to the next of the same kernel.
> >
> > For example, on one recent boot, two identical drives give this:
> > da0: 10.0MB/s transfers (10.0MHz, offset 15), Tagged Queueing Enabled
> > ...
> > da2: 10.0MB/s transfers (10.0MHz, offset 15)
> >
> > In conjunction with another problem which I am filing separately (the
> > possibility that the Seagate ST32550WC is broken with respect to
> > tagged queueing) this causes my system to sometimes work and sometimes
> > fail depending on the level of disk activity. I did not notice this
> > earlier because until recently I have not significantly stressed the
> > disk I/O. Now that it's a problem I've reviewed the system logs for
> > the past few months and found that sometimes tagged queueing was
> > enabled and sometimes not.
> >
> > FWIW the version of cam_xpt.c in my source tree is 1.42. There are
> > a few too many things involved in the enable/disable code for me to be
> > certain what is going on or whether the flags are even being initialized
> > properly.
>
> There are several ways to enable or disable tagged queueing:
>
> - If the DQue bit in mode page 10 is set, tagged queueing will be disabled
> for the drive in question. If you want to view/edit mode page 10, see
> the camcontrol(8) man page for details.
>
> - If the drive is quirked in cam_xpt.c to have 0 tags. The only Seagate
> drives quirked in cam_xpt.c are the Seagate Medalist Pro drives.
> Seagate's web site says that the 32550 is a 2 gig Barracuda. I've never
> seen tagged queuing problems with Barracudas.
>
> - If tagged queueing is enabled/disabled from userland via camcontrol(8).
> Since this camcontrol option appeared in FreeBSD 3.2, that isn't an
> option in your situation.
>
> My guess is that somehow the tagged queueing bit is being enabled and
> disabled in the drive firmware. None of the other ways of tweaking the
> tagged queueing settings would explain the behavior you're seeing.
>
> Check the settings in mode page 10 with camcontrol and see whether the
> drive says tagged queueing is enabled or disabled. If the DQue bit is set,
> the drive should not be reported as a tagged queueing drive in the dmesg.
>
> If the DQue bit is set and then cleared somehow between boots on your
> system, that points fairly strongly to some sort of problem with the drive.
Examining mode page 10 as you suggest reveals no changes between
boots, although the system's idea of the tagged queueing status
continues to vary.
More interestingly, a "camcontrol inquiry" shows all 4 of the drives
as having Tagged Queueing. This information also does not change
between boots.
My own theory is more simplistic. I wonder if the data structures
responsible for noting the tagged-queueing status are simply not being
initialized properly. I can't be absolutely certain, but so far it seems
to me that if I continue to reboot using the same kernel and boot
flags, I keep getting the same drives enabled (or disabled); if I vary
either the kernel version or the presence of -v, the selection will
change. Leftover memory values? Buffer junk?
I am going to include 3 chunks of stuff here:
(1) the results of "camcontrol inquiry" for all drives (constant)
(2) mode page 10 for all drives (constant)
(3) dmesg output for a sample boot (varies)
Note that for this particular boot, only the da0 Seagate is
marked as having Tagged Queueing enabled. On other boots, both
Seagates (da0 & da2) are enabled; on still others, just da1 is
enabled! The da3 Fujitsu has also turned up in past logs,
although it's quite rare.
--Ken
--------------------
bohica:/home/klh# camcontrol inquiry -u 0 -DR
<SEAGATE ST32550W SUN2.1G 0418> Fixed Direct Access SCSI-2 device
10.0MB/s transfers (10.0MHz, offset 15), Tagged Queueing Enabled
bohica:/home/klh# camcontrol inquiry -u 1 -DR
<COMPAQPC DPES-30540 S31K> Fixed Direct Access SCSI-2 device
10.0MB/s transfers (10.0MHz, offset 15), Tagged Queueing Enabled
bohica:/home/klh# camcontrol inquiry -u 2 -DR
<SEAGATE ST32550W SUN2.1G 0418> Fixed Direct Access SCSI-2 device
10.0MB/s transfers (10.0MHz, offset 15), Tagged Queueing Enabled
bohica:/home/klh# camcontrol inquiry -u 3 -DR
<FUJITSU M2952ESP SUN2.1G 2545> Fixed Direct Access SCSI-2 device
10.0MB/s transfers (10.0MHz, offset 15), Tagged Queueing Enabled
bohica:/home/klh#
--------------------
bohica:/home/klh# camcontrol modepage -u 0 -m 10
RLEC: 0
Queue Algorithm Modifier: 0
QErr: 0
DQue: 0
EECA: 0
RAENP: 0
UAAENP: 0
EAENP: 0
Ready AEN Holdoff Period: 0
bohica:/home/klh# camcontrol modepage -u 1 -m 10
RLEC: 0
Queue Algorithm Modifier: 1
QErr: 0
DQue: 0
EECA: 0
RAENP: 0
UAAENP: 0
EAENP: 0
Ready AEN Holdoff Period: 0
bohica:/home/klh# camcontrol modepage -u 2 -m 10
RLEC: 0
Queue Algorithm Modifier: 0
QErr: 0
DQue: 0
EECA: 0
RAENP: 0
UAAENP: 0
EAENP: 0
Ready AEN Holdoff Period: 0
bohica:/home/klh# camcontrol modepage -u 3 -m 10
RLEC: 1
Queue Algorithm Modifier: 1
QErr: 0
DQue: 0
EECA: 0
RAENP: 0
UAAENP: 0
EAENP: 0
Ready AEN Holdoff Period: 0
bohica:/home/klh#
--------------------
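Editorial aside: the DQue check on the mode page 10 dumps above can be done mechanically. The following is a minimal Python sketch that only parses text in the "Name: value" layout shown in this thread; it does not run camcontrol itself. The function names `parse_modepage` and `drive_allows_tagged_queueing` and the constant `SAMPLE_DA0` are hypothetical, introduced purely for illustration.

```python
def parse_modepage(text):
    """Parse 'Name: value' lines (as pasted in this thread) into a dict.

    Lines whose value is not a plain integer, such as shell prompts,
    are skipped.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.rpartition(":")
        if not sep:
            continue  # no colon on this line
        value = value.strip()
        if value.isdigit():
            fields[name.strip()] = int(value)
    return fields


def drive_allows_tagged_queueing(fields):
    """Return True when the DQue (disable queueing) bit is present and clear."""
    return fields.get("DQue") == 0


# Mode page 10 for unit 0, copied from the output above.
SAMPLE_DA0 = """\
bohica:/home/klh# camcontrol modepage -u 0 -m 10
RLEC: 0
Queue Algorithm Modifier: 0
QErr: 0
DQue: 0
EECA: 0
RAENP: 0
UAAENP: 0
EAENP: 0
Ready AEN Holdoff Period: 0
"""

if __name__ == "__main__":
    fields = parse_modepage(SAMPLE_DA0)
    print("DQue clear:", drive_allows_tagged_queueing(fields))
```

Saving one such dump per boot and comparing the parsed dictionaries is one way to confirm that the drive-side setting stays constant, as reported here.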
DMESG OUTPUT:
Copyright (c) 1992-1999 FreeBSD Inc.
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California. All rights reserved.
FreeBSD 3.1-RELEASE #16: Sun Dec 12 00:56:35 PST 1999
klh@pcklh:/usr/src/sys/compile/JUMPGATE
Timecounter "i8254" frequency 1193182 Hz
Timecounter "TSC" frequency 66668362 Hz
CPU: Pentium/P5 (66.67-MHz 586-class CPU)
Origin = "GenuineIntel" Id = 0x517 Stepping=7
Features=0x1bf<FPU,VME,DE,PSE,TSC,MSR,MCE,CX8>
real memory = 33554432 (32768K bytes)
avail memory = 29855744 (29156K bytes)
Bad BIOS32 Service Directory!
Preloaded elf kernel "kernel" at 0xf02e8000.
eisa0: <CPQ521 (System Board)>
Probing for devices on the EISA bus
Probing for devices on PCI bus 0:
chip0: <Intel 82434LX (Mercury) PCI cache memory controller> rev 0x03 on pci0.0.0
lnc1: <PCNet/PCI Ethernet adapter> rev 0x02 int b irq 10 on pci0.11.0
lnc1: PCnet-32 VL-Bus address 00:80:5f:e4:96:18
amd0: <Tekram DC390(T)/AMD53c974 SCSI Adapter Driver v1.05 01-01-1999 CAM ver. > rev 0x02 int a irq 11 on pci0.12.0
vga0: <Matrox model 0d10 graphics accelerator> rev 0x00 int a irq 255 on pci0.13.0
chip1: <PCI to EISA bridge (vendor=0e11 device=0001)> rev 0x03 on pci0.15.0
Probing for devices on the ISA bus:
sc0 on isa
sc0: VGA color <16 virtual consoles, flags=0x0>
ed0 at 0x310-0x31f irq 9 maddr 0xdc000 msize 8192 on isa
ed0: address 02:60:8c:3e:56:b0, type 3c503 (8 bit)
atkbdc0 at 0x60-0x6f on motherboard
atkbd0 irq 1 on isa
psm0 irq 12 on isa
psm0: model Generic PS/2 mouse, device ID 0
sio0 at 0x3f8-0x3ff irq 4 flags 0x10 on isa
sio0: type 16550A
sio1 at 0x2f8-0x2ff irq 3 on isa
sio1: type 16550A
fdc0 at 0x3f0-0x3f7 irq 6 drq 2 on isa
fdc0: FIFO enabled, 8 bytes threshold
fd0: 1.44MB 3.5in
ppc0 at 0x378 irq 7 on isa
ppc0: Generic chipset (NIBBLE-only) in COMPATIBLE mode
nlpt0: <generic printer> on ppbus 0
nlpt0: Interrupt-driven port
ppi0: <generic parallel i/o> on ppbus 0
plip0: <PLIP network interface> on ppbus 0
2 3C5x9 board(s) on ISA found at 0x300 0x320
ep0 at 0x300-0x30f irq 5 on isa
ep0: aui/utp[*AUI*] address 00:20:af:34:5f:af
ep1 at 0x320-0x32f irq 15 on isa
ep1: aui/bnc[*AUI*] address 00:60:97:09:40:7c
vga0 at 0x3b0-0x3df maddr 0xa0000 msize 131072 on isa
npx0 on motherboard
npx0: INT 16 interface
Intel Pentium detected, installing workaround for F00F bug
IP packet filtering initialized, divert enabled, rule-based forwarding enabled, logging limited to 1000 packets/entry
Waiting 5 seconds for SCSI devices to settle
changing root device to da0s1a
da1 at amd0 bus 0 target 1 lun 0
da1: <COMPAQPC DPES-30540 S31K> Fixed Direct Access SCSI-2 device
da1: 10.0MB/s transfers (10.0MHz, offset 15)
da1: 511MB (1046532 512 byte sectors: 64H 32S/T 511C)
da0 at amd0 bus 0 target 0 lun 0
da0: <SEAGATE ST32550W SUN2.1G 0418> Fixed Direct Access SCSI-2 device
da0: 10.0MB/s transfers (10.0MHz, offset 15), Tagged Queueing Enabled
da0: 2048MB (4194995 512 byte sectors: 255H 63S/T 261C)
da3 at amd0 bus 0 target 3 lun 0
da3: <FUJITSU M2952ESP SUN2.1G 2545> Fixed Direct Access SCSI-2 device
da3: 10.0MB/s transfers (10.0MHz, offset 15)
da3: 2029MB (4157201 512 byte sectors: 255H 63S/T 258C)
da2 at amd0 bus 0 target 2 lun 0
da2: <SEAGATE ST32550W SUN2.1G 0418> Fixed Direct Access SCSI-2 device
da2: 10.0MB/s transfers (10.0MHz, offset 15)
da2: 2048MB (4194995 512 byte sectors: 255H 63S/T 261C)
--------------------
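Editorial aside: the mismatch described in this message (every drive reports Tagged Queueing via "camcontrol inquiry", but only some are flagged at boot) can be tracked across boots with a small script. This is a rough Python sketch that assumes the dmesg transfer lines look exactly like the ones pasted above; `tagged_status_from_dmesg`, `mismatches`, and `SAMPLE_DMESG` are hypothetical names used only for illustration.

```python
import re

# Matches lines such as:
#   da0: 10.0MB/s transfers (10.0MHz, offset 15), Tagged Queueing Enabled
TRANSFER_LINE = re.compile(
    r"^(da\d+): \d+(?:\.\d+)?MB/s transfers \(.*?\)(.*)$"
)


def tagged_status_from_dmesg(text):
    """Map each daN device to True if its transfer line reports tagged queueing."""
    status = {}
    for line in text.splitlines():
        match = TRANSFER_LINE.match(line.strip())
        if match:
            status[match.group(1)] = "Tagged Queueing Enabled" in match.group(2)
    return status


def mismatches(dmesg_status, inquiry_capable):
    """List devices that inquiry reports as capable but dmesg did not flag."""
    return sorted(
        dev for dev in inquiry_capable if not dmesg_status.get(dev, False)
    )


# Transfer lines copied from the dmesg output above.
SAMPLE_DMESG = """\
da1: 10.0MB/s transfers (10.0MHz, offset 15)
da0: 10.0MB/s transfers (10.0MHz, offset 15), Tagged Queueing Enabled
da3: 10.0MB/s transfers (10.0MHz, offset 15)
da2: 10.0MB/s transfers (10.0MHz, offset 15)
"""

if __name__ == "__main__":
    status = tagged_status_from_dmesg(SAMPLE_DMESG)
    # All four drives reported Tagged Queueing via "camcontrol inquiry".
    print(mismatches(status, ["da0", "da1", "da2", "da3"]))
```

Running this over the saved dmesg of each boot would show which drives were left unflagged on that boot, which is the variability this report is about.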
On Tue, Dec 14, 1999 at 02:29:52 -0800, Ken Harrenstien wrote:
> >
> > My guess is that somehow the tagged queueing bit is being enabled and
> > disabled in the drive firmware. None of the other ways of tweaking the
> > tagged queueing settings would explain the behavior you're seeing.
> >
> > Check the settings in mode page 10 with camcontrol and see whether the
> > drive says tagged queueing is enabled or disabled. If the DQue bit is set,
> > the drive should not be reported as a tagged queueing drive in the dmesg.
> >
> > If the DQue bit is set and then cleared somehow between boots on your
> > system, that points fairly strongly to some sort of problem with the drive.
>
> Examining mode page 10 as you suggest reveals no changes between
> boots, although the system's idea of the tagged queueing status
> continues to vary.
>
> More interestingly, a "camcontrol inquiry" shows all 4 of the drives
> as having Tagged Queueing. This information also does not change
> between boots.

That is very interesting. It points to a driver problem I think.

> lnc1: <PCNet/PCI Ethernet adapter> rev 0x02 int b irq 10 on pci0.11.0
> lnc1: PCnet-32 VL-Bus address 00:80:5f:e4:96:18
> amd0: <Tekram DC390(T)/AMD53c974 SCSI Adapter Driver v1.05 01-01-1999 CAM ver. > rev 0x02 int a irq 11 on pci0.12.0

And, as Justin already pointed out, the driver is most likely your
problem. Tekram's default amd driver has a few problems, and you'll
probably want Justin's reworked version of that driver. I believe it went
in just before 3.3. It may not be in GENERIC in 3.3, however. It will be
turned on by default for 3.4, and for any -stable snapshot.

You have several options as far as drivers go:

- Grab the -stable driver and put it on your system, and recompile your
  kernel. You should just need src/sys/pci/amd.{c,h}. You'll also need
  to change sys/conf/files so that the amd driver points to amd.c instead
  of tek390.c.

  You can get the -stable driver here:

  ftp://ftp.FreeBSD.ORG/pub/FreeBSD/branches/3.0-stable/src/sys/pci

- Upgrade to 3.3, -stable or 3.4, which should be out shortly. If you
  don't want to wait, stable snapshots are located here:

  ftp://current.FreeBSD.ORG/pub/FreeBSD/snapshots/i386

- Upgrade to -current. Snapshots are available at the same place as the
  stable snapshots. This isn't recommended for most folks, but if you're
  ready to deal with the requirements, it might be fun:

  http://www.freebsd.org/handbook/cutting-edge.html#CURRENT

Anyway, let me know whether an updated driver fixes your problem.

Ken
--
Kenneth Merry
ken@kdm.org

State Changed
From-To: open->closed

Submitter reports that upgrading to the driver in 3.3 fixed his problem.

Responsible Changed
From-To: freebsd-bugs->ken

I'm handling this.

> Anyway, let me know whether an updated driver fixes your problem.
I'm happy to report that the 3.3-RELEASE version of the driver does
indeed fix this problem. I had one glitch on the very first test
after reboot (ungzip barfed about an incorrect checksum) which caused
me to hold off on reporting this fixed. However, after further
testing over the past few weeks I've been unable to reproduce this,
and everything else seems OK as long as I don't use the associated
ethernet interface, so let's close out this one.
Thanks!
--Ken
Whether or not a device capable of tagged queueing is actually flagged
as such appears to be semi-random. It can be different for two identical
drives, and can change from one boot to the next of the same kernel.

For example, on one recent boot, two identical drives give this:

da0: 10.0MB/s transfers (10.0MHz, offset 15), Tagged Queueing Enabled
...
da2: 10.0MB/s transfers (10.0MHz, offset 15)

In conjunction with another problem which I am filing separately (the
possibility that the Seagate ST32550WC is broken with respect to
tagged queueing) this causes my system to sometimes work and sometimes
fail depending on the level of disk activity. I did not notice this
earlier because until recently I have not significantly stressed the
disk I/O. Now that it's a problem I've reviewed the system logs for
the past few months and found that sometimes tagged queueing was
enabled and sometimes not.

FWIW the version of cam_xpt.c in my source tree is 1.42. There are
a few too many things involved in the enable/disable code for me to be
certain what is going on or whether the flags are even being initialized
properly. I looked at the CVS log for this file but could not be sure
whether the problem had been noticed or addressed in later revs. Ditto
GNATS.

It seems unlikely that a problem like this could have escaped notice
until now, so this bug report is more of a preliminary check to find out
whether it's worth sending additional info.

How-To-Repeat:

Unknown, but I can reproduce the variability quite reliably and am
willing to test some number of changes to 3.1-RELEASE, as well as send
additional info like system log output for the last few months (too long
for this form).