Bug 21915

Summary: Machine dies sig 12 in ahc driver (Freebsd4.1.1)
Product: Base System Reporter: systems <systems>
Component: kernAssignee: Greg Lehey <grog>
Status: Closed FIXED    
Severity: Affects Only Me CC: gibbs
Priority: Normal    
Version: Unspecified   
Hardware: Any   
OS: Any   

Description Justin T. Gibbs 2000-10-11 15:51:23 UTC
>Had another go today, incorporated Justin's MFC of last night.
>The problem still exists though.

Is the stack trace identical?

>My suspicion is that there's a race condition somewhere which is
>causing the bus to crash - somewhere between the ahc and the fxp
>drivers.  I've not investigated deeper though, not having any
>familiarity with the device driver code.

If the buffer pointer is still null in this latest instance, one
thing to try would be to instrument both dastrategy and dastart
in sys/cam/scsi/scsi_da.c.  dastrategy is the entry point to
the CAM system for buffers headed for a disk.  If bp->b_data is
NULL at that point, we've pretty much exonerated the da driver,
the CAM subsystem and ahc.  If you don't hit that panic, then
dastart is the transfer point from the DA driver to the XPT and
then ahc driver.  It would be interesting to know if the buffer
pointer has been zero'ed by that point.

--
Justin
Comment 1 systems 2000-10-11 17:40:00 UTC
	Rsyncing a remote machine's data onto this machine's
	vinum drive causes the machine to bus error.

	This is repeatable, and occurs on two exactly similar machines.


Fatal trace 12: page fault while in kernel mode
fault virtual address = 0xbfc00000
fault code = supervisor read, page not present
intruction pointer = 0x8:0xc0210222
stack pointer = 0x10:0xc025e55c
frame pointer = 0x10:0xc025e708
code segment = base rx0, limit 0xfffff, type 0x1b
        = DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = Idle
interrupt mask =
kernel: type 12 trap, code=0

stopped at: bus_dmamap_load+0x1ca: movl PTmap(,%eax,4),%edx

db>trace
bus_dmamap_load(c1e5cdc0,c0290700,0,0,c013af88,c1e5e9cc,0) at bus_dmamap_load+0x
1ca
ahc_setup_data(c1e4d200,c1e5cc40,c21c9000,c1e5e9cc,c1e62244) at ahc_setup_data+0
x17e
ahc_action(c1e5cc40,c21c9000,1,c1f36a90,6c0000) at ahc_action+0x2ec
xpt_run_dev_sendq(c1e5cc00) at xpt_run_dev_sendq+0x1cb
xpt_action(c21c9000,680000) at xpt_action+0x23f
dastart(c1f2b500,c21c9000,c21c9000,c1f36a90,1,680000,c1e62230,1) at dastart+0x1c
c
xpt_run_dev_allocq(c1e5cc00,c21c9000,c1f3e000,c21c9000,680000) at xpt_run_dev_al
locq+0x80
xpt_release_ccb(c21c9000,c21c9000,c1e62200,680000,680000) at xpt_release_ccb+0xe
b
dadone(c1f2b500,c21c9000,0,0,ffffffff) at dadone+0x459
camisr(c028a630,0,c021309f,0,400010) at camisr+0x1eb
swi_cambio(0,400010,c0250010,c0120010,ffffffff) at swi_cambio+0xd
doreti_swi() at doreti_swi+0xf

Fix: 

None.
How-To-Repeat: 
	# cd /data
	# rsync -vaW -e ssh remote.machine:/ localcopy/
	[wait for file list to be compiled - then _bang_]
Comment 2 joe 2000-10-12 13:27:35 UTC
Had another go today, incorporated Justin's MFC of last night.
The problem still exists though.

I had originally reported the problems as happening as soon as
rsync's filelist build had completed, but upon reflection we get
20-30 seconds of file transfer happening first before the bus error.

I've also set up the same thing with an rsync from an internal disk
to the vinum partition:

# rsync -vaW /usr /data/usr-copy

This appears to work, i.e. it ran for several minutes before I
stopped it with a ^c.

My suspicion is that there's a race condition somewhere which is
causing the bus to crash - somewhere between the ahc and the fxp
drivers.  I've not investigated deeper though, not having any
familiarity with the device driver code.

Joe




On Wed, Oct 11, 2000 at 05:39:31PM +0100, systems@pavilion.net wrote:
> 
> >Number:         21915
> >Category:       kern
> >Synopsis:       Machine dies sig 12 in ahc driver (Freebsd4.1.1)
> >Confidential:   no
> >Severity:       critical
> >Priority:       high
> >Responsible:    freebsd-bugs
> >State:          open
> >Quarter:        
> >Keywords:       
> >Date-Required:
> >Class:          sw-bug
> >Submitter-Id:   current-users
> >Arrival-Date:   Wed Oct 11 09:40:00 PDT 2000
> >Closed-Date:
> >Last-Modified:
> >Originator:     Joe Karthauser
> >Release:        FreeBSD-4.1.1
> >Organization:
> Pavilion Internet plc
> >Environment:
> 
> Copyright (c) 1992-2000 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
> 	The Regents of the University of California. All rights reserved.
> FreeBSD 4.1.1-STABLE #0: Wed Oct 11 15:19:45 BST 2000 root@:/usr/obj/usr/src/sys/BLASTER
> Timecounter "i8254"  frequency 1193182 Hz
> Timecounter "TSC"  frequency 801823410 Hz
> CPU: Pentium III/Pentium III Xeon/Celeron (801.82-MHz 686-class CPU)
>   Origin = "GenuineIntel"  Id = 0x683  Stepping = 3
>   Features=0x383f9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
> real memory  = 805306368 (786432K bytes)
> avail memory = 780689408 (762392K bytes)
> Preloaded elf kernel "kernel" at 0xc02e4000.
> Pentium Pro MTRR support enabled
> md0: Malloc disk
> npx0: <math processor> on motherboard
> npx0: INT 16 interface
> pcib0: <Host to PCI bridge> on motherboard
> pci0: <PCI bus> on pcib0
> pcib2: <VIA 82C598MVP (Apollo MVP3) PCI-PCI (AGP) bridge> at device 1.0 on pci0
> pci1: <PCI bus> on pcib2
> isab0: <VIA 82C596B PCI-ISA bridge> at device 7.0 on pci0
> isa0: <ISA bus> on isab0
> atapci0: <VIA 82C596 ATA66 controller> port 0xe000-0xe00f at device 7.1 on pci0
> ata0: at 0x1f0 irq 14 on atapci0
> ata1: at 0x170 irq 15 on atapci0
> pci0: <VIA 83C572 USB controller> at 7.2 irq 11
> fxp0: <Intel Pro 10/100B/100+ Ethernet> port 0xe800-0xe83f mem 0xea000000-0xea0fffff,0xea101000-0xea101fff irq 10 at device 9.0 on pci0
> fxp0: Ethernet address 00:02:b3:03:75:62
> ahc0: <Adaptec 29160N Ultra160 SCSI adapter> port 0xec00-0xecff mem 0xea100000-0xea100fff irq 5 at device 10.0 on pci0
> aic7892: Wide Channel A, SCSI Id=7, 32/255 SCBs
> pci0: <Matrox MGA Millennium II 2164W graphics accelerator> at 11.0 irq 9
> pcib1: <Host to PCI bridge> on motherboard
> pci2: <PCI bus> on pcib1
> fdc0: <NEC 72065B or clone> at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0
> fdc0: FIFO enabled, 8 bytes threshold
> fd0: <1440-KB 3.5" drive> on fdc0 drive 0
> atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
> atkbd0: <AT Keyboard> flags 0x1 irq 1 on atkbdc0
> kbd0 at atkbd0
> psm0: <PS/2 Mouse> irq 12 on atkbdc0
> psm0: model Generic PS/2 mouse, device ID 0
> vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
> sc0: <System console> at flags 0x100 on isa0
> sc0: VGA <16 virtual consoles, flags=0x300>
> sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
> sio0: type 16550A
> sio1 at port 0x2f8-0x2ff irq 3 on isa0
> sio1: type 16550A
> ppc0: <Parallel port> at port 0x378-0x37f irq 7 on isa0
> ppc0: Generic chipset (NIBBLE-only) in COMPATIBLE mode
> lpt0: <Printer> on ppbus0
> lpt0: Interrupt-driven port
> plip0: <PLIP network interface> on ppbus0
> ppi0: <Parallel I/O> on ppbus0
> acd0: CDROM <ATAPI 52X CDROM> at ata0-master using PIO4
> Waiting 3 seconds for SCSI devices to settle
> Mounting root from ufs:/dev/da0s1a
> da0 at ahc0 bus 0 target 0 lun 0
> da0: <COMPAQ AB009322B4 A019> Fixed Direct Access SCSI-2 device 
> da0: 80.000MB/s transfers (40.000MHz, offset 63, 16bit), Tagged Queueing Enabled
> da0: 8678MB (17773500 512 byte sectors: 255H 63S/T 1106C)
> da3 at ahc0 bus 0 target 3 lun 0
> da3: <FUJITSU MAJ3364MC 0114> Fixed Direct Access SCSI-4 device 
> da3: 160.000MB/s transfers (80.000MHz, offset 127, 16bit), Tagged Queueing Enabled
> da3: 34858MB (71390320 512 byte sectors: 255H 63S/T 4443C)
> da2 at ahc0 bus 0 target 2 lun 0
> da2: <FUJITSU MAJ3364MC 0114> Fixed Direct Access SCSI-4 device 
> da2: 160.000MB/s transfers (80.000MHz, offset 127, 16bit), Tagged Queueing Enabled
> da2: 34858MB (71390320 512 byte sectors: 255H 63S/T 4443C)
> da1 at ahc0 bus 0 target 1 lun 0
> da1: <FUJITSU MAJ3364MC 0114> Fixed Direct Access SCSI-4 device 
> da1: 160.000MB/s transfers (80.000MHz, offset 127, 16bit), Tagged Queueing Enabled
> da1: 34858MB (71390320 512 byte sectors: 255H 63S/T 4443C)
> WARNING: / was not properly dismounted
> vinum: loaded
> vinum: reading configuration from /dev/da1s1d
> vinum: updating configuration from /dev/da2s1d
> vinum: updating configuration from /dev/da3s1d
> 
> 
> 
> # cat /etc/fstab
> 
> # Device		Mountpoint	FStype	Options		Dump	Pass#
> /dev/da0s1b		none		swap	sw		0	0
> /dev/da0s1a		/		ufs	rw		1	1
> /dev/da0s1h		/home		ufs	rw		2	2
> /dev/da0s1g		/tmp		ufs	rw		2	2
> /dev/da0s1f		/usr		ufs	rw		2	2
> /dev/da0s1e		/var		ufs	rw		2	2
> /dev/acd0c		/cdrom	cd9660	ro,noauto	0	0
> /dev/vinum/dataraid5	/data		ufs	rw,noauto	2	2
> proc			/proc		procfs	rw		0	0
> 
> 
> 
> # vinum dumpconfig
> 
> Drive raid5drive0:	Device /dev/da1d
> 		Created on temp126.staff.pavilion.net at Tue Oct 10 16:25:38 2000
> 		Config last updated Wed Oct 11 18:29:58 2000
> 		Size:      36544886784 bytes (34851 MB)
> volume dataraid5 state up
> plex name dataraid5.p0 state up org raid5 512s vol dataraid5 
> sd name dataraid5.p0.s0 drive raid5drive0 plex dataraid5.p0 state up len 71376384s driveoffset 265s plexoffset 0s
> sd name dataraid5.p0.s1 drive raid5drive1 plex dataraid5.p0 state up len 71376384s driveoffset 265s plexoffset 512s
> sd name dataraid5.p0.s2 drive raid5drive2 plex dataraid5.p0 state up len 71376384s driveoffset 265s plexoffset 1024s
> 
> Drive /dev/da1d: 34 GB (36544886784 bytes)
> Drive raid5drive1:	Device /dev/da2d
> 		Created on temp126.staff.pavilion.net at Tue Oct 10 16:25:59 2000
> 		Config last updated Wed Oct 11 18:29:58 2000
> 		Size:      36544886784 bytes (34851 MB)
> volume dataraid5 state up
> plex name dataraid5.p0 state up org raid5 512s vol dataraid5 
> sd name dataraid5.p0.s0 drive raid5drive0 plex dataraid5.p0 state up len 71376384s driveoffset 265s plexoffset 0s
> sd name dataraid5.p0.s1 drive raid5drive1 plex dataraid5.p0 state up len 71376384s driveoffset 265s plexoffset 512s
> sd name dataraid5.p0.s2 drive raid5drive2 plex dataraid5.p0 state up len 71376384s driveoffset 265s plexoffset 1024s
> 
> Drive /dev/da2d: 34 GB (36544886784 bytes)
> Drive raid5drive2:	Device /dev/da3d
> 		Created on temp126.staff.pavilion.net at Tue Oct 10 16:26:00 2000
> 		Config last updated Wed Oct 11 18:29:58 2000
> 		Size:      36544886784 bytes (34851 MB)
> volume dataraid5 state up
> plex name dataraid5.p0 state up org raid5 512s vol dataraid5 
> sd name dataraid5.p0.s0 drive raid5drive0 plex dataraid5.p0 state up len 71376384s driveoffset 265s plexoffset 0s
> sd name dataraid5.p0.s1 drive raid5drive1 plex dataraid5.p0 state up len 71376384s driveoffset 265s plexoffset 512s
> sd name dataraid5.p0.s2 drive raid5drive2 plex dataraid5.p0 state up len 71376384s driveoffset 265s plexoffset 1024s
> 
> Drive /dev/da3d: 34 GB (36544886784 bytes)
> 
> 
> 
> 
> 	
> 
> >Description:
> 
> 	Rsyncing a remote machine's data onto this machine's
> 	vinum drive causes the machine to bus error.
> 
> 	This is repeatable, and occurs on two exactly similar machines.
> 
> 
> Fatal trace 12: page fault while in kernel mode
> fault virtual address = 0xbfc00000
> fault code = supervisor read, page not present
> intruction pointer = 0x8:0xc0210222
> stack pointer = 0x10:0xc025e55c
> frame pointer = 0x10:0xc025e708
> code segment = base rx0, limit 0xfffff, type 0x1b
>         = DPL 0, pres 1, def32 1, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = Idle
> interrupt mask =
> kernel: type 12 trap, code=0
> 
> stopped at: bus_dmamap_load+0x1ca: movl PTmap(,%eax,4),%edx
> 
> db>trace
> bus_dmamap_load(c1e5cdc0,c0290700,0,0,c013af88,c1e5e9cc,0) at bus_dmamap_load+0x
> 1ca
> ahc_setup_data(c1e4d200,c1e5cc40,c21c9000,c1e5e9cc,c1e62244) at ahc_setup_data+0
> x17e
> ahc_action(c1e5cc40,c21c9000,1,c1f36a90,6c0000) at ahc_action+0x2ec
> xpt_run_dev_sendq(c1e5cc00) at xpt_run_dev_sendq+0x1cb
> xpt_action(c21c9000,680000) at xpt_action+0x23f
> dastart(c1f2b500,c21c9000,c21c9000,c1f36a90,1,680000,c1e62230,1) at dastart+0x1c
> c
> xpt_run_dev_allocq(c1e5cc00,c21c9000,c1f3e000,c21c9000,680000) at xpt_run_dev_al
> locq+0x80
> xpt_release_ccb(c21c9000,c21c9000,c1e62200,680000,680000) at xpt_release_ccb+0xe
> b
> dadone(c1f2b500,c21c9000,0,0,ffffffff) at dadone+0x459
> camisr(c028a630,0,c021309f,0,400010) at camisr+0x1eb
> swi_cambio(0,400010,c0250010,c0120010,ffffffff) at swi_cambio+0xd
> doreti_swi() at doreti_swi+0xf
> 
> 
> 	
> >How-To-Repeat:
> 
> 	# cd /data
> 	# rsync -vaW -e ssh remote.machine:/ localcopy/
> 	[wait for file list to be compiled - then _bang_]
> 
> >Fix:
> 
> 	None.
> 
> 
> >Release-Note:
> >Audit-Trail:
> >Unformatted:
> 
> 
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-bugs" in the body of the message

-- 
Josef Karthauser	FreeBSD: How many times have you booted today?
Technical Manager	Viagra for your server (http://www.uk.freebsd.org)
Pavilion Internet plc.  [joe@pavilion.net, joe@uk.freebsd.org, joe@tao.org.uk]
Comment 3 Johan Karlsson freebsd_committer freebsd_triage 2000-10-14 15:45:23 UTC
Responsible Changed
From-To: freebsd-bugs->gibbs

Over to ahc(4) maintainer.
Comment 4 Justin T. Gibbs freebsd_committer freebsd_triage 2001-02-24 19:56:37 UTC
Responsible Changed
From-To: gibbs->gorg

The bug is in the code supplying the buffer to the SCSI layer, not in the ahc 
driver.
Comment 5 Kenneth D. Merry freebsd_committer freebsd_triage 2001-03-12 23:29:24 UTC
Responsible Changed
From-To: gorg->grog

Justin meant to assign this to 'grog' not 'gorg'.
Comment 6 Greg Lehey freebsd_committer freebsd_triage 2001-05-03 09:26:48 UTC
State Changed
From-To: open->closed

Problem fixed in revisions 1.20 and 1.18.2.2 of vinumlock.c.