The machine crashed repeatedly after a vinum raid5 was set up and used heavily. Hardware: Dell Poweredge 6100/200 4xPPro SMP machine, with 3 Adaptec SCSI controllers and one Promise Fasttrack ATA100 IDE controller... see dmesg: dmesg output: Copyright (c) 1992-2000 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD 4.1-STABLE #0: Fri Sep 8 10:24:40 CEST 2000 root@atleo2.leo.org:/usr/obj/usr/src/sys/ATLEO4 Timecounter "i8254" frequency 1193182 Hz CPU: Pentium Pro (198.95-MHz 686-class CPU) Origin = "GenuineIntel" Id = 0x619 Stepping = 9 Features=0xfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV> real memory = 536870912 (524288K bytes) avail memory = 518316032 (506168K bytes) Programming 16 pins in IOAPIC #0 IOAPIC #0 intpin 2 -> irq 0 FreeBSD/SMP: Multiprocessor motherboard cpu0 (BSP): apic id: 0, version: 0x00040011, at 0xfec08000 cpu1 (AP): apic id: 4, version: 0x00040011, at 0xfec08000 cpu2 (AP): apic id: 1, version: 0x00040011, at 0xfec08000 cpu3 (AP): apic id: 2, version: 0x00040011, at 0xfec08000 io0 (APIC): apic id: 14, version: 0x000f0011, at 0xfec00000 Preloaded elf kernel "kernel" at 0xc0401000. 
Pentium Pro MTRR support enabled md0: Malloc disk npx0: <math processor> on motherboard npx0: INT 16 interface pcib0: <Intel 82454KX/GX (Orion) host to PCI bridge> on motherboard pci0: <PCI bus> on pcib0 fxp0: <Intel Pro 10/100B/100+ Ethernet> port 0xff80-0xff9f mem 0xfe900000-0xfe9fffff,0xfe2ff000-0xfe2fffff irq 10 at device 11.0 on pci0 fxp0: Ethernet address 00:a0:c9:99:47:2c ahc0: <Adaptec 2940 Ultra SCSI adapter> port 0xfc00-0xfcff mem 0xfeaff000-0xfeafffff irq 11 at device 12.0 on pci0 ahc0: aic7880 Wide Channel A, SCSI Id=7, 16/255 SCBs isab0: <Intel 82375EB PCI-EISA bridge> at device 14.0 on pci0 eisa0: <EISA bus> on isab0 mainboard0: <INT31c0 (System Board)> on eisa0 slot 0 isa0: <ISA bus> on isab0 chip0: <> mem 0xfffffc00-0xffffffff,0xfffffc00-0xffffffff,0xfffffc00-0xffffffff,0xfffffc00-0xffffffff,0xfffffc00-0xffffffff,0xfec01000-0xfec013ff at device 15.0 on pci0 chip1: <Intel 82453KX/GX (Orion) PCI memory controller> at device 20.0 on pci0 pcib1: <Intel 82454KX/GX (Orion) host to PCI bridge> on motherboard pci1: <PCI bus> on pcib1 ahc1: <Adaptec aic7880 Ultra SCSI adapter> port 0xec00-0xecff mem 0xfe1ff000-0xfe1fffff irq 5 at device 11.0 on pci1 ahc1: Using left over BIOS settings ahc1: aic7880 Wide Channel A, SCSI Id=7, 16/255 SCBs ahc2: <Adaptec aic7880 Ultra SCSI adapter> port 0xe800-0xe8ff mem 0xfe1fe000-0xfe1fefff irq 5 at device 12.0 on pci1 ahc2: aic7880 Wide Channel A, SCSI Id=7, 16/255 SCBs ahc2: Host Adapter Bios disabled. 
Using default SCSI device parameters atapci0: <Promise ATA100 controller> port 0xe480-0xe4bf,0xe4f0-0xe4f3,0xe4e8-0xe4ef,0xe4f4-0xe4f7,0xe4f8-0xe4ff mem 0xfe1a0000-0xfe1bffff irq 9 at device 13.0 on pci1 ata2: at 0xe4f8 on atapci0 ata3: at 0xe4e8 on atapci0 fdc0: <NEC 72065B or clone> at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0 fdc0: FIFO enabled, 8 bytes threshold fd0: <1440-KB 3.5" drive> on fdc0 drive 0 atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0 atkbd0: <AT Keyboard> flags 0x1 irq 1 on atkbdc0 kbd0 at atkbd0 psm0: <PS/2 Mouse> irq 12 on atkbdc0 psm0: model Generic PS/2 mouse, device ID 0 vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0 sc0: <System console> at flags 0x100 on isa0 sc0: VGA <12 virtual consoles, flags=0x100> sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0 sio0: type 16550A, console sio1 at port 0x2f8-0x2ff irq 3 on isa0 sio1: type 16550A ppc0: parallel port not found. APIC_IO: routing 8254 via IOAPIC #0 intpin 2 IP packet filtering initialized, divert enabled, rule-based forwarding enabled, default to accept, logging limited to 100 packets/entry by default IPv6 packet filtering initialized, default to accept, logging limited to 100 packets/entry IPsec: Initialized Security Association Processing. IP Filter: v3.4.8 initialized. Default = pass all, Logging = enabled SMP: AP CPU #1 Launched! SMP: AP CPU #2 Launched! SMP: AP CPU #3 Launched! 
ad0: 73308MB <IBM-DTLA-307075> [148945/16/63] at ata2-master using UDMA100 ad1: 73308MB <IBM-DTLA-307075> [148945/16/63] at ata2-slave using UDMA100 ad2: 73308MB <IBM-DTLA-307075> [148945/16/63] at ata3-master using UDMA100 ad3: 73308MB <IBM-DTLA-307075> [148945/16/63] at ata3-slave using UDMA100 Waiting 3 seconds for SCSI devices to settle pt0 at ahc1 bus 0 target 6 lun 0 pt0: <DELL 6UW BACKPLANE 7> Fixed Processor SCSI-2 device pt0: 3.300MB/s transfers sa0 at ahc2 bus 0 target 6 lun 0 sa0: <ARCHIVE Python 29987-XXX 5.AM> Removable Sequential Access SCSI-2 device sa0: 4.545MB/s transfers (4.545MHz, offset 15) ses0 at ahc1 bus 0 target 6 lun 0 ses0: <DELL 6UW BACKPLANE 7> Fixed Processor SCSI-2 device ses0: 3.300MB/s transfers ses0: SAF-TE Compliant Device da2 at ahc1 bus 0 target 2 lun 0 da2: <SEAGATE ST19171W 2224> Fixed Direct Access SCSI-2 device da2: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing Enabled da2: 8683MB (17783112 512 byte sectors: 64H 32S/T 8683C) da3 at ahc1 bus 0 target 3 lun 0 da3: <SEAGATE ST19171W 2224> Fixed Direct Access SCSI-2 device da3: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing Enabled da3: 8683MB (17783112 512 byte sectors: 64H 32S/T 8683C) da0 at ahc1 bus 0 target 0 lun 0 da0: <SEAGATE ST34572WC 0784> Fixed Direct Access SCSI-2 device da0: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing Enabled da0: 4095MB (8388315 512 byte sectors: 64H 32S/T 4095C) da1 at ahc1 bus 0 target 1 lun 0 da1: <SEAGATE ST34572WC 0784> Fixed Direct Access SCSI-2 device da1: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing Enabled da1: 4095MB (8388315 512 byte sectors: 64H 32S/T 4095C) ch0 at ahc2 bus 0 target 6 lun 1 ch0: <ARCHIVE Python 29987-XXX 5.AM> Removable Changer SCSI-2 device ch0: 4.545MB/s transfers (4.545MHz, offset 15) ch0: 0 slots, 1 drive, 1 picker, 0 portals Mounting root from ufs:/dev/da0s1a WARNING: / was not properly dismounted vinum: loaded vinum: reading 
configuration from /dev/ad3s1e vinum: updating configuration from /dev/ad2s1e vinum: updating configuration from /dev/ad1s1e vinum: updating configuration from /dev/ad0s1e cd0 at ahc2 bus 0 target 5 lun 0 cd0: <NEC CD-ROM DRIVE:464 1.05> Removable CD-ROM SCSI-2 device cd0: 20.000MB/s transfers (20.000MHz, offset 15) cd0: Attempt to query device size failed: NOT READY, Medium not present Kernel Config file: machine i386 #cpu I386_CPU #cpu I486_CPU #cpu I586_CPU cpu I686_CPU ident ATLEO4 maxusers 256 makeoptions DEBUG=-g #Build kernel with gdb(1) debug symbols options INET #InterNETworking options INET6 #IPv6 communications protocols options IPSEC #IP security options IPSEC_ESP #IP security (crypto; define w/ IPSEC) options IPSEC_DEBUG #debug for IP security options MROUTING options IPFIREWALL #firewall options IPFIREWALL_VERBOSE #print information about # dropped packets options IPFIREWALL_FORWARD #enable transparent proxy support options IPFIREWALL_VERBOSE_LIMIT=100 #limit verbosity options IPFIREWALL_DEFAULT_TO_ACCEPT #allow everything by default options IPV6FIREWALL #firewall for IPv6 options IPV6FIREWALL_VERBOSE options IPV6FIREWALL_VERBOSE_LIMIT=100 options IPV6FIREWALL_DEFAULT_TO_ACCEPT options IPDIVERT #divert sockets options IPFILTER #ipfilter support options IPFILTER_LOG #ipfilter logging options IPSTEALTH #support for stealth forwarding options TCPDEBUG #options TCP_DROP_SYNFIN #drop TCP packets with SYN+FIN options TCP_RESTRICT_RST #restrict emission of TCP RST options NETATALK #Appletalk protocol options FFS #Berkeley Fast Filesystem options FFS_ROOT #FFS usable as root device [keep this!] options SOFTUPDATES #Enable FFS soft updates support options MFS #Memory Filesystem options MD_ROOT #MD is a potential root device options NFS #Network Filesystem options NFS_ROOT #NFS usable as root device, NFS required options COMPAT_43 #Compatible with BSD 4.3 [KEEP THIS!] 
options SCSI_DELAY=3000 #Delay (in ms) before probing SCSI options UCONSOLE #Allow users to grab the console options USERCONFIG #boot -c editor options VISUAL_USERCONFIG #visual boot -c editor options KTRACE #ktrace(1) support options SYSVSHM #SYSV-style shared memory options SYSVMSG #SYSV-style message queues options SYSVSEM #SYSV-style semaphores options P1003_1B #Posix P1003_1B real-time extensions options _KPOSIX_PRIORITY_SCHEDULING options ICMP_BANDLIM #Rate limit bad replies options KBD_INSTALL_CDEV # install a CDEV entry in /dev options NETGRAPH # To make an SMP kernel, the next two are needed options SMP # Symmetric MultiProcessor Kernel options APIC_IO # Symmetric (APIC) I/O # Optionally these may need tweaked, (defaults shown): options NCPU=4 # number of CPUs options NBUS=3 # number of busses options NAPIC=1 # number of IO APICs options NINTR=24 # number of INTs device isa device eisa device pci # Floppy drives device fdc0 at isa? port IO_FD1 irq 6 drq 2 device fd0 at fdc0 drive 0 device fd1 at fdc0 drive 1 # ATA and ATAPI devices #device ata0 at isa? port IO_WD1 irq 14 #device ata1 at isa? 
port IO_WD2 irq 15 device ata device atadisk # ATA disk drives device atapicd # ATAPI CDROM drives device atapifd # ATAPI floppy drives device atapist # ATAPI tape drives #options ATA_STATIC_ID #Static device numbering options ATA_ENABLE_ATAPI_DMA #Enable DMA on ATAPI devices # SCSI Controllers #device ahb # EISA AHA1742 family device ahc0 # AHA2940 and onboard AIC7xxx devices device ahc1 # AHA2940 and onboard AIC7xxx devices device ahc2 # AHA2940 and onboard AIC7xxx devices # SCSI peripherals device scbus # SCSI bus (required) device da # Direct Access (disks) device sa # Sequential Access (tape etc) device ch # SCSI media changers device cd # CD device pass # Passthrough device (direct SCSI access) device pt # SCSI processor type device ses # SCSI SES/SAF-TE driver # disks # the first ahc0 ist the external controller, which we use as last bus # the first internal ahc1 is the first we use with the SCA disks # the second internal ahc2 has the CD-ROM and the Archive Python device scbus0 at ahc1 device scbus1 at ahc2 device scbus2 at ahc0 device da0 at scbus0 target 0 device da1 at scbus0 target 1 device da2 at scbus0 target 2 device da3 at scbus0 target 3 # atkbdc0 controls both the keyboard and the PS/2 mouse device atkbdc0 at isa? port IO_KBD device atkbd0 at atkbdc? irq 1 flags 0x1 device psm0 at atkbdc? irq 12 device vga0 at isa? # splash screen/screen saver pseudo-device splash # syscons is the default console driver, resembling an SCO console device sc0 at isa? flags 0x100 options MAXCONS=12 # number of virtual consoles options SC_NORM_ATTR="(FG_LIGHTGREY|BG_BLACK)" options SC_NORM_REV_ATTR="(FG_YELLOW|BG_GREEN)" options SC_KERNEL_CONS_ATTR="(FG_WHITE|BG_BLUE)" options SC_KERNEL_CONS_REV_ATTR="(FG_BLACK|BG_RED)" # Floating point support - do not disable. device npx0 at nexus? port IO_NPX irq 13 # Power management support (see LINT for more options) device apm0 at nexus? 
disable flags 0x20 # Advanced Power Management # PCCARD (PCMCIA) support # Serial (COM) ports device sio0 at isa? port IO_COM1 flags 0x10 irq 4 device sio1 at isa? port IO_COM2 irq 3 device sio2 at isa? disable port IO_COM3 irq 5 device sio3 at isa? disable port IO_COM4 irq 9 # Parallel port device ppc0 at isa? irq 7 device ppbus # Parallel port bus (required) device lpt # Printer device plip # TCP/IP over parallel device ppi # Parallel port interface device #device vpo # Requires scbus and da # PCI Ethernet NICs. device de # DEC/Intel DC21x4x (``Tulip'') device fxp # Intel EtherExpress PRO/100B (82557, 82558) device tx # SMC 9432TX (83c170 ``EPIC'') device vx # 3Com 3c590, 3c595 (``Vortex'') device wx # Intel Gigabit Ethernet Card (``Wiseman'') # PCI Ethernet NICs that use the common MII bus controller code. device miibus # MII bus support device dc # DEC/Intel 21143 and various workalikes device rl # RealTek 8129/8139 device sf # Adaptec AIC-6915 (``Starfire'') device sis # Silicon Integrated Systems SiS 900/SiS 7016 device ste # Sundance ST201 (D-Link DFE-550TX) device tl # Texas Instruments ThunderLAN device vr # VIA Rhine, Rhine II device wb # Winbond W89C840F device xl # 3Com 3c90x (``Boomerang'', ``Cyclone'') # ISA Ethernet NICs. # Pseudo devices - the number indicates how many units to allocated. pseudo-device loop # Network loopback pseudo-device ether # Ethernet support pseudo-device sl 1 # Kernel SLIP pseudo-device ppp 1 # Kernel PPP pseudo-device tun # Packet tunnel. pseudo-device pty 256 # Pseudo-ttys (telnet etc) pseudo-device md # Memory "disks" pseudo-device gif 4 # IPv6 and IPv4 tunneling pseudo-device faith 1 # IPv6-to-IPv4 relaying (translation) pseudo-device vn pseudo-device snp 4 # The `bpf' pseudo-device enables the Berkeley Packet Filter. # Be aware of the administrative consequences of enabling this! 
pseudo-device bpf #Berkeley packet filter # USB support device uhci # UHCI PCI->USB interface device ohci # OHCI PCI->USB interface device usb # USB Bus (required) device ugen # Generic device uhid # "Human Interface Devices" device ukbd # Keyboard device ulpt # Printer device umass # Disks/Mass storage - Requires scbus and da device ums # Mouse # USB Ethernet, requires mii device aue # ADMtek USB ethernet device cue # CATC USB ethernet device kue # Kawasaki LSI USB ethernet VINUM statements according to instructions on www.vinumvm.org: Problem: Subsequent crashes (kernel panics) during heavy disk-access on a vinum device. FreeBSD: 4.1-STABLE, no changes to the sources Vinum list: one raid5 volume from 4 ATA drives atleo4:/usr/src#vinum list 4 drives: D d1 State: up Device /dev/ad0s1e Avail: 0/73304 MB (0%) D d2 State: up Device /dev/ad1s1e Avail: 0/73304 MB (0%) D d3 State: up Device /dev/ad2s1e Avail: 0/73304 MB (0%) D d4 State: up Device /dev/ad3s1e Avail: 0/73304 MB (0%) 1 volumes: V leoata State: up Plexes: 1 Size: 214 GB 1 plexes: P leoata.p0 R5 State: up Subdisks: 4 Size: 214 GB 4 subdisks: S leoata.p0.s0 State: up PO: 0 B Size: 71 GB S leoata.p0.s1 State: up PO: 512 kB Size: 71 GB S leoata.p0.s2 State: up PO: 1024 kB Size: 71 GB S leoata.p0.s3 State: up PO: 1536 kB Size: 71 GB The history file reflects the creation of the volume which didn't cause any problems: History file in: /var/log/vinum_history (not /var/tmp !): [..] 6 Sep 2000 17:41:13.473942 *** vinum started *** 6 Sep 2000 17:41:13.475950 create -v vinum.init.leoata drive d1 device /dev/ad0e drive d2 device /dev/ad1e drive d3 device /dev/ad2e drive d4 device /dev/ad3e volume leoata plex org raid5 512k sd length 150127097s drive d1 sd length 150127097s drive d2 sd length 150127097s drive d3 sd length 150127097s drive d4 6 Sep 2000 17:41:13.491734 *** Created devices *** [..] 6 Sep 2000 17:50:55.914542 *** vinum started *** 6 Sep 2000 17:50:55.916405 init -w leoata.p0 [..] 
/var/log/messages from the same period:
[..]
Sep 6 17:41:13 atleo4 /kernel: vinum: drive d1 is up
Sep 6 17:41:13 atleo4 /kernel: vinum: drive d2 is up
Sep 6 17:41:13 atleo4 /kernel: vinum: drive d3 is up
Sep 6 17:41:13 atleo4 /kernel: vinum: drive d4 is up
Sep 6 17:41:13 atleo4 /kernel: vinum: removing 1515 blocks of partial stripe at the end of leoata.p0
Sep 6 17:50:55 atleo4 /kernel: vinum: leoata.p0.s2 is initializing by force
Sep 6 17:50:55 atleo4 /kernel: vinum: leoata.p0 is initializing
Sep 6 17:50:55 atleo4 /kernel: vinum: leoata.p0.s0 is initializing by force
Sep 6 17:50:56 atleo4 /kernel: vinum: leoata.p0.s1 is initializing by force
Sep 6 17:50:56 atleo4 /kernel: vinum: leoata.p0.s3 is initializing by force
[..]
Sep 6 21:08:09 atleo4 /kernel: vinum: leoata.p0.s0 is initialized by force
Sep 6 21:08:10 atleo4 /kernel: vinum: leoata.p0.s0 is initialized
Sep 6 21:08:10 atleo4 /kernel: vinum: leoata.p0.s1 is initialized by force
Sep 6 21:08:10 atleo4 /kernel: vinum: leoata.p0.s1 is initialized
Sep 6 21:08:32 atleo4 /kernel: vinum: leoata.p0.s2 is initialized by force
Sep 6 21:08:32 atleo4 /kernel: vinum: leoata.p0.s2 is initialized
Sep 6 21:08:32 atleo4 /kernel: vinum: leoata.p0.s3 is initialized by force
Sep 6 21:08:32 atleo4 /kernel: vinum: leoata.p0.s0 is up
Sep 6 21:08:32 atleo4 /kernel: vinum: leoata.p0.s1 is up
Sep 6 21:08:32 atleo4 /kernel: vinum: leoata.p0.s2 is up
Sep 6 21:08:32 atleo4 /kernel: vinum: leoata.p0.s3 is up
Sep 6 21:08:32 atleo4 /kernel: vinum: leoata.p0 is up
Sep 6 21:08:32 atleo4 /kernel: vinum: leoata is up
Sep 6 21:08:32 atleo4 /kernel: vinum: leoata.p0.s3 is up
[..]

newfs, mount, etc worked.

Crash analysis:

4 crashes total within two days!! The machine did not crash before vinum was used on it. I'm pretty sure that the modules and kernel are compiled with debugging symbols, that is, configured with -g (CONFIGARGS= -g), and makeoptions DEBUG=-g in the kernel config.
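The sizes shown by vinum list and the "removing 1515 blocks of partial stripe" kernel message above can be cross-checked with a little arithmetic. The following Python sketch is only an illustration: it assumes 512-byte sectors (as reported in the dmesg output), that each subdisk is cut down to a whole number of 512 kB stripes, and that one subdisk's worth of space in the four-subdisk RAID-5 plex holds parity. The function name is made up for this example.

```python
SECTOR_BYTES = 512  # sector size as reported in the dmesg output


def raid5_geometry(sd_sectors, stripe_kb, n_subdisks):
    """Return (trimmed_blocks, usable_bytes) for a RAID-5 plex.

    Assumption: every subdisk is trimmed to a whole number of stripes,
    and the equivalent of one subdisk is used for parity.
    """
    stripe_sectors = stripe_kb * 1024 // SECTOR_BYTES
    partial = sd_sectors % stripe_sectors        # leftover sectors per subdisk
    data_disks = n_subdisks - 1                  # one subdisk's worth is parity
    trimmed_blocks = partial * data_disks        # counted in data blocks
    usable_sectors = (sd_sectors - partial) * data_disks
    return trimmed_blocks, usable_sectors * SECTOR_BYTES


# Values from the create statements: sd length 150127097s, raid5 512k, 4 drives
trimmed, usable = raid5_geometry(150127097, 512, 4)
print(trimmed)                                   # 1515
print(usable // 2**30)                           # 214
print(150127097 * SECTOR_BYTES // 2**30)         # 71
```

Under these assumptions the numbers agree with the report: 1515 trimmed blocks, a 214 GB volume, and 71 GB per subdisk, so the volume geometry itself looks consistent.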
atleo4:/var/crash#file /modules/vinum.ko
/modules/vinum.ko: ELF 32-bit LSB shared object, Intel 80386, version 1 (FreeBSD), not stripped
atleo4:/var/crash#file kernel.1
kernel.1: ELF 32-bit LSB executable, Intel 80386, version 1 (FreeBSD), dynamically linked, not stripped
atleo4:/var/crash#file kernel.2
kernel.2: ELF 32-bit LSB executable, Intel 80386, version 1 (FreeBSD), dynamically linked, not stripped
atleo4:/var/crash#file kernel.3
kernel.3: ELF 32-bit LSB executable, Intel 80386, version 1 (FreeBSD), dynamically linked, not stripped
atleo4:/var/crash#file kernel.4
kernel.4: ELF 32-bit LSB executable, Intel 80386, version 1 (FreeBSD), dynamically linked, not stripped

But I don't seem to get a proper analysis with your .gdbinit.* files, and gdb says: no debugging symbols found ??? Maybe there is something I missed, but what ??? However...

Crash 1:

atleo4:/var/crash#gdb -k kernel.1 vmcore.1
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-unknown-freebsd"...
(no debugging symbols found)...
SMP 4 cpus IdlePTD 4284416 initial pcb at 3608e0 panicstr: page fault panic messages: --- Fatal trap 12: page fault while in kernel mode mp_lock = 00000002; cpuid = 0; lapic.id = 00000000 fault virtual address = 0x0 fault code = supervisor read, page not present instruction pointer = 0x8:0xc23266ca stack pointer = 0x10:0xff806f00 frame pointer = 0x10:0xff806f1c code segment = base rx0, limit 0xfffff, type 0x1b = DPL 0, pres 1, def32 1, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = Idle interrupt mask = bio <- SMP: XXX trap number = 12 panic: page fault mp_lock = 00000002; cpuid = 0; lapic.id = 00000000 boot() called on cpu#0 syncing disks... Fatal trap 12: page fault while in kernel mode mp_lock = 00000003; cpuid = 0; lapic.id = 00000000 fault virtual address = 0x30 fault code = supervisor read, page not present instruction pointer = 0x8:0xc0273971 stack pointer = 0x10:0xff806d20 frame pointer = 0x10:0xff806d24 code segment = base rx0, limit 0xfffff, type 0x1b = DPL 0, pres 1, def32 1, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = Idle interrupt mask = bio <- SMP: XXX trap number = 12 panic: page fault mp_lock = 00000003; cpuid = 0; lapic.id = 00000000 boot() called on cpu#0 Uptime: 1h18m17s dumping to dev #da/0x20001, offset 1048576 dump 512 ... --- #0 0xc016b6b8 in boot () .gdbinit:4: Error in sourced command file: Attempt to extract a component of a value that is not a structure. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ This may be because of missing debugging symbols ?? 
Stacktrace: (kgdb) bt #0 0xc016b6b8 in boot () #1 0xc016ba70 in poweroff_wait () #2 0xc02d9baf in trap_fatal () #3 0xc02d9845 in trap_pfault () #4 0xc02d93df in trap () #5 0xc0273971 in acquire_lock () #6 0xc0277660 in softdep_update_inodeblock () #7 0xc0272c5d in ffs_update () #8 0xc027a931 in ffs_sync () #9 0xc01993f3 in sync () #10 0xc016b48b in boot () #11 0xc016ba70 in poweroff_wait () #12 0xc02d9baf in trap_fatal () #13 0xc02d9845 in trap_pfault () #14 0xc02d93df in trap () #15 0xc23266ca in ?? () #16 0xc019136b in biodone () #17 0xc02af030 in ad_interrupt () #18 0xc02ab3e6 in ata_intr () #19 0xc02e202d in intr_mux () Crash 2: [..] SMP 4 cpus IdlePTD 4284416 initial pcb at 3608e0 panicstr: page fault panic messages: --- Fatal trap 12: page fault while in kernel mode mp_lock = 00000002; cpuid = 0; lapic.id = 00000000 fault virtual address = 0xc3608010 fault code = supervisor read, page not present instruction pointer = 0x8:0xc232a112 stack pointer = 0x10:0xff806ee8 frame pointer = 0x10:0xff806ef0 code segment = base rx0, limit 0xfffff, type 0x1b = DPL 0, pres 1, def32 1, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = Idle interrupt mask = bio <- SMP: XXX trap number = 12 panic: page fault mp_lock = 00000002; cpuid = 0; lapic.id = 00000000 boot() called on cpu#0 syncing disks... Fatal trap 12: page fault while in kernel mode mp_lock = 00000003; cpuid = 0; lapic.id = 00000000 fault virtual address = 0x30 fault code = supervisor read, page not present instruction pointer = 0x8:0xc0273971 stack pointer = 0x10:0xff806d08 frame pointer = 0x10:0xff806d0c code segment = base rx0, limit 0xfffff, type 0x1b = DPL 0, pres 1, def32 1, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = Idle interrupt mask = bio <- SMP: XXX trap number = 12 panic: page fault mp_lock = 00000003; cpuid = 0; lapic.id = 00000000 boot() called on cpu#0 Uptime: 14h5m17s [..] 
#0 0xc016b6b8 in boot () #1 0xc016ba70 in poweroff_wait () #2 0xc02d9baf in trap_fatal () #3 0xc02d9845 in trap_pfault () #4 0xc02d93df in trap () #5 0xc0273971 in acquire_lock () #6 0xc0277660 in softdep_update_inodeblock () #7 0xc0272c5d in ffs_update () #8 0xc027a931 in ffs_sync () #9 0xc01993f3 in sync () #10 0xc016b48b in boot () #11 0xc016ba70 in poweroff_wait () #12 0xc02d9baf in trap_fatal () #13 0xc02d9845 in trap_pfault () #14 0xc02d93df in trap () #15 0xc232a112 in ?? () #16 0xc2326bfc in ?? () #17 0xc019136b in biodone () #18 0xc02af030 in ad_interrupt () #19 0xc02ab3e6 in ata_intr () #20 0xc02e202d in intr_mux () Crash 3: This one is different ... SMP 4 cpus IdlePTD 4272128 initial pcb at 360920 panicstr: ffs_valloc: dup alloc panic messages: --- panic: ffs_valloc: dup alloc mp_lock = 00000001; cpuid = 0; lapic.id = 00000000 boot() called on cpu#0 syncing disks... 166 38 19 5 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 giving up on 2 buffers Uptime: 11h42m59s [..] #0 0xc016b6bc in boot () #1 0xc016ba74 in poweroff_wait () #2 0xc0270030 in ffs_valloc () #3 0xc02817ca in ufs_mkdir () #4 0xc02827d5 in ufs_vnoperate () #5 0xc019c28a in mkdir () #6 0xc02d9f09 in syscall2 () #7 0xc02c845b in Xint0x80_syscall () #8 0x804efc7 in ?? () #9 0x80494fd in ?? () [..] Crash 4: SMP 4 cpus IdlePTD 4272128 initial pcb at 360920 panicstr: page fault panic messages: --- Fatal trap 12: page fault while in kernel mode mp_lock = 03000002; cpuid = 3; lapic.id = 02000000 fault virtual address = 0xc32c9010 fault code = supervisor read, page not present instruction pointer = 0x8:0xc232a112 stack pointer = 0x10:0xff81bee8 frame pointer = 0x10:0xff81bef0 code segment = base rx0, limit 0xfffff, type 0x1b = DPL 0, pres 1, def32 1, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = Idle interrupt mask = bio <- SMP: XXX trap number = 12 panic: page fault mp_lock = 03000002; cpuid = 3; lapic.id = 02000000 boot() called on cpu#3 syncing disks... 
Fatal trap 12: page fault while in kernel mode
mp_lock = 03000003; cpuid = 3; lapic.id = 02000000
fault virtual address = 0x30
fault code = supervisor read, page not present
instruction pointer = 0x8:0xc027397d
stack pointer = 0x10:0xff81bd00
frame pointer = 0x10:0xff81bd04
code segment = base rx0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = Idle
interrupt mask = bio <- SMP: XXX
trap number = 12
panic: page fault
mp_lock = 03000003; cpuid = 3; lapic.id = 02000000
boot() called on cpu#3
Uptime: 3h23m29s
[..]
(kgdb) bt
#0 0xc016b6bc in boot ()
#1 0xc016ba74 in poweroff_wait ()
#2 0xc02d9bdf in trap_fatal ()
#3 0xc02d9875 in trap_pfault ()
#4 0xc02d940f in trap ()
#5 0xc027397d in acquire_lock ()
#6 0xc0277b52 in softdep_fsync_mountdev ()
#7 0xc027bc9a in ffs_fsync ()
#8 0xc027a9c6 in ffs_sync ()
#9 0xc01993e7 in sync ()
#10 0xc016b48f in boot ()
#11 0xc016ba74 in poweroff_wait ()
#12 0xc02d9bdf in trap_fatal ()
#13 0xc02d9875 in trap_pfault ()
#14 0xc02d940f in trap ()
#15 0xc232a112 in ?? ()
#16 0xc2326bfc in ?? ()
#17 0xc019135f in biodone ()
#18 0xc02af068 in ad_interrupt ()
#19 0xc02ab41e in ata_intr ()
#20 0xc02e205d in intr_mux ()
[..]

Of course this could be an ATA problem, but I already had two crashes in a previous configuration while trying to set up a stripe with two SCSI disks. A detailed description of these previous problems was sent to Greg Lehey <grog@lemis.com> on August 16, 2000.

Fix: Nope.

How-To-Repeat: Tricky, since this is a somewhat unique hardware configuration. On this configuration it seems to be sufficient to transfer huge amounts of data to the vinum device (around 100GB have been transferred in total, interrupted by the crashes; the largest portion within a single uptime may be around 50GB). The data was transferred via NFS.
The filesystem uses SOFTUPDATES; the first crash corrupted it in a severe way, so that fsck had to be run manually (producing lots of 'unexpected softupdates inconsistency' errors). But I guess that's just a side-effect.
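Comparing the four dumps by hand is tedious, so a small script can help pull out the frame where the first fault happened. The sketch below is Python, is not part of gdb or of the vinum debugging macros, and assumes the plain `#N 0xADDR in func ()` frame format of the symbol-less backtraces above; the helper names are invented for illustration. Because the panic during "syncing disks" nests a second trap on top of the first one, the original fault is taken to be the frame that follows the last listed `trap ()` frame.

```python
import re

# Matches frames such as "#15 0xc23266ca in ?? ()"
FRAME_RE = re.compile(r"#(\d+)\s+(0x[0-9a-f]+) in (\S+) \(\)")


def parse_frames(bt_text):
    """Return a list of (frame_number, address, function) tuples."""
    return [(int(n), int(addr, 16), func)
            for n, addr, func in FRAME_RE.findall(bt_text)]


def original_fault(frames):
    """Return the frame following the last `trap` frame, or None."""
    fault = None
    for i, (_, _, func) in enumerate(frames[:-1]):
        if func == "trap":
            fault = frames[i + 1]
    return fault


# Excerpt of the Crash 1 backtrace from this report
crash1 = """#12 0xc02d9baf in trap_fatal () #13 0xc02d9845 in trap_pfault ()
#14 0xc02d93df in trap () #15 0xc23266ca in ?? () #16 0xc019136b in biodone ()
#17 0xc02af030 in ad_interrupt ()"""

number, addr, func = original_fault(parse_frames(crash1))
print(number, hex(addr), func)   # 15 0xc23266ca ??
```

For Crash 1 this yields frame #15 at 0xc23266ca, the same address as the instruction pointer in the first panic message; the `??` simply means gdb had no symbol for that address.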
State Changed From-To: open->feedback Submitter did not supply the required information.
Responsible Changed From-To: freebsd-bugs->grog grog supports Vinum.
Ok, some more information; unfortunately it's not a backtrace with "vinum debug" in the calling frame. I will try to build vinum statically into the kernel, maybe this could help...

So, I can now reproduce panics in a deterministic way. The machine repeatedly crashed during periodic daily, and I could track it down to a simple:

find /leo/.mntpts/2 -xdev -type f \( -perm -u+x -or -perm -g+x -or -perm -o+x \) \( -perm -u+s -or -perm -g+s \) -print0

(with /leo/.mntpts/2 being the mountpoint of the vinum volume). It also works by just executing find /leo/.mntpts/2 -xdev -type f -print.

That may not help much, but it's something more...

Daniel
--
IRCnet: Mr-Spock - ceterum censeo Microsoftinem esse delendam -
*Daniel Lang * dl@leo.org * +49 89 289 25735 * http://www.leo.org/~dl/*
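When reproducing the panic it can help to see which file was visited last. The following Python sketch walks a tree roughly the way `find <dir> -xdev -type f -print` does, printing and flushing every path so that a serial console or remote log shows how far the traversal got. It is an illustration only: the function name is made up, and the demo at the end runs on a temporary directory rather than on the real mount point.

```python
import os
import stat
import tempfile


def walk_regular_files(root):
    """Yield regular files under root without crossing mount points."""
    root_dev = os.lstat(root).st_dev
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune directories that live on another device (like -xdev).
        dirnames[:] = [d for d in dirnames
                       if os.lstat(os.path.join(dirpath, d)).st_dev == root_dev]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if stat.S_ISREG(os.lstat(path).st_mode):   # like -type f
                yield path


# Small demo on a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    open(os.path.join(tmp, "sub", "a.txt"), "w").close()
    for path in walk_regular_files(tmp):
        print(path, flush=True)
```

To use it against the volume from this report, the directory passed to `walk_regular_files` would be /leo/.mntpts/2 instead of the temporary directory.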
Hi,

to further trace the problem, as the crash dumps did not seem to produce any usable stack traces, I hooked the box up to a remote debugging session with DDB/GDB. (No problem to panic the box, as described.)

As far as I can tell, the crash did not happen inside the vinum module. This may be the reason why your .gdbinit scripts don't seem to apply, I guess. The crash happened inside the ata driver, but it seems that a formerly valid pointer is overwritten somehow, so that it contains garbage, which leads to the crash (I assume that the 'struct ata_softc' is corrupted). This probably comes from a buffer overrun somewhere else that writes into already allocated memory. Unfortunately such errors are very difficult to trace (at least in my experience).

Since the error only appears on the system with a vinum RAID-5, and only, and reproducibly, while accessing this filesystem, I (possibly naively) assume it must be a problem with vinum.

Unfortunately I guess we are stuck here, since I am not able to produce more data that could help with the problem. However, I would grant access to the machine and the debugger if someone would like to inspect the situation personally.

Best regards,
Daniel Lang
--
IRCnet: Mr-Spock - Truth lies in the eye of the beholder -
*Daniel Lang * dl@leo.org * +49 89 289 25735 * http://www.leo.org/~dl/*
I think I'm suffering this too. I'm getting reliable panics on a vinum RAID 5 config over SCSI drives. Striped or mirror configs are fine. At first I was attributing this to problems with an Adaptec 29160 but it doesn't appear to be the culprit. A little inspection of the crash dump (backtrace etc... follows) shows problems with the request structure in complete_rqe(). I have two dumps, both show different problems but both associated with request completion. BTW, I tried the gdb macros from the module source directory but they issue error messages. Hopefully the crash dump appears sane (I've done lots of kernel work on other systems but not FreeBSD); following pointers makes it appear so.

Script started on Wed Sep 20 13:57:39 2000
bash-2.04# gdb -k /sys/compile/yoda/kernel.debug vmcore.1
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-unknown-freebsd"...
IdlePTD 4296704
initial pcb at 2e7a40
panicstr: page fault
panic messages:
---
Fatal trap 12: page fault while in kernel mode
fault virtual address = 0x4
fault code = supervisor read, page not present
instruction pointer = 0x8:0xc014ab14
stack pointer = 0x10:0xc02c0e94
frame pointer = 0x10:0xc02c0eb0
code segment = base rx0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = Idle
interrupt mask = bio
trap number = 12
panic: page fault

syncing disks...
Fatal trap 12: page fault while in kernel mode fault virtual address = 0x30 fault code = supervisor read, page not present instruction pointer = 0x8:0xc0229aa0 stack pointer = 0x10:0xc02c0cc8 frame pointer = 0x10:0xc02c0ccc code segment = base rx0, limit 0xfffff, type 0x1b = DPL 0, pres 1, def32 1, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = Idle interrupt mask = bio trap number = 12 panic: page fault Uptime: 2h12m16s Fatal trap 12: page fault while in kernel mode fault virtual address = 0x0 fault code = supervisor read, page not present instruction pointer = 0x8:0xc014ab1a stack pointer = 0x10:0xc02c05a8 frame pointer = 0x10:0xc02c05c4 code segment = base rx0, limit 0xfffff, type 0x1b = DPL 0, pres 1, def32 1, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = Idle interrupt mask = bio cam trap number = 12 panic: page fault Uptime: 2h12m16s Fatal trap 12: page fault while in kernel mode fault virtual address = 0x4 fault code = supervisor read, page not present instruction pointer = 0x8:0xc014ab14 stack pointer = 0x10:0xc02bfe88 frame pointer = 0x10:0xc02bfea4 code segment = base rx0, limit 0xfffff, type 0x1b = DPL 0, pres 1, def32 1, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = Idle interrupt mask = bio cam trap number = 12 panic: page fault Uptime: 2h12m16s dumping to dev #da/0x20001, offset 1048704 dump 511 510 509 508 507 506 505 504 503 502 501 500 499 498 497 496 495 494 493 492 491 490 489 488 487 486 485 484 483 482 481 480 479 478 477 476 475 474 473 472 471 470 469 468 467 466 465 464 463 462 461 460 459 458 457 456 455 454 453 452 451 450 449 448 447 446 445 444 443 442 441 440 439 438 437 436 435 434 433 432 431 430 429 428 427 426 425 424 423 422 421 420 419 418 417 416 415 414 413 412 411 410 409 408 407 406 405 404 403 402 401 400 399 398 397 396 395 394 393 392 391 390 389 388 387 386 385 384 383 382 381 380 379 378 377 376 375 374 373 372 371 370 369 
368 367 366 365 364 363 362 361 360 359 358 357 356 355 354 353 352 351 350 349 348 347 346 345 344 343 342 341 340 339 338 337 336 335 334 333 332 331 330 329 328 327 326 325 324 323 322 321 320 319 318 317 316 315 314 313 312 311 310 309 308 307 306 305 304 303 302 301 300 299 298 297 296 295 294 293 292 291 290 289 288 287 286 285 284 283 282 281 280 279 278 277 276 275 274 273 272 271 270 269 268 267 266 265 264 263 262 261 260 259 258 257 256 255 254 253 252 251 250 249 248 247 246 245 244 243 242 241 240 239 238 237 236 235 234 233 232 231 230 229 228 227 226 225 224 223 222 221 220 219 218 217 216 215 214 213 212 211 210 209 208 207 206 205 204 203 202 201 200 199 198 197 196 195 194 193 192 191 190 189 188 187 186 185 184 183 182 181 180 179 178 177 176 175 174 173 172 171 170 169 168 167 166 165 164 163 162 161 160 159 158 157 156 155 154 153 152 151 150 149 148 147 146 145 144 143 142 141 140 139 138 137 136 135 134 133 132 131 130 129 128 127 126 125 124 123 122 121 120 119 118 117 116 115 114 113 112 111 110 109 108 107 106 105 104 103 102 101 100 99 98 97 96 95 94 93 92 91 90 89 88 87 86 85 84 83 82 81 80 79 78 77 76 75 74 73 72 71 70 69 68 67 66 65 64 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 --- #0 boot (howto=260) at ../../kern/kern_shutdown.c:302 302 dumppcb.pcb_cr3 = rcr3(); (kgdb) where #0 boot (howto=260) at ../../kern/kern_shutdown.c:302 #1 0xc01689c4 in poweroff_wait (junk=0xc02ba0cf, howto=0) at ../../kern/kern_shutdown.c:552 #2 0xc027a78d in trap_fatal (frame=0xc02bfe48, eva=4) at ../../i386/i386/trap.c:951 #3 0xc027a465 in trap_pfault (frame=0xc02bfe48, usermode=0, eva=4) at ../../i386/i386/trap.c:844 #4 0xc027a04b in trap (frame={tf_fs = -1070923760, tf_es = -1072562160, tf_ds = -1072562160, tf_edi = -1017374328, tf_esi = -1043326976, tf_ebp = -1070858588, tf_isp = -1070858636, tf_ebx = -1017374328, 
tf_edx = 6865984, tf_ecx = -1043326816, tf_eax = 0, tf_trapno = 12, tf_err = 0, tf_eip = -1072387308, tf_cs = 8, tf_eflags = 66178, tf_esp = -1017374328, tf_ss = -1043326976}) at ../../i386/i386/trap.c:443 #5 0xc014ab14 in complete_rqe (bp=0xc35c1988) at ../../dev/vinum/vinuminterrupt.c:72 #6 0xc018d82f in biodone (bp=0xc35c1988) at ../../kern/vfs_bio.c:2637 #7 0xc012b7fd in dadone (periph=0xc1cec200, done_ccb=0xc1cfb800) at ../../cam/scsi/scsi_da.c:1262 #8 0xc012771f in camisr (queue=0xc02e5830) at ../../cam/cam_xpt.c:6323 #9 0xc0127531 in swi_cambio () at ../../cam/cam_xpt.c:6226 #10 0xc0124930 in xpt_polled_action (start_ccb=0xc02c0238) at ../../cam/cam_xpt.c:3393 #11 0xc012bcc5 in dashutdown (arg=0x0, howto=260) at ../../cam/scsi/scsi_da.c:1554 #12 0xc0168610 in boot (howto=260) at ../../kern/kern_shutdown.c:297 #13 0xc01689c4 in poweroff_wait (junk=0xc02ba0cf, howto=0) at ../../kern/kern_shutdown.c:552 #14 0xc027a78d in trap_fatal (frame=0xc02c0568, eva=0) at ../../i386/i386/trap.c:951 #15 0xc027a465 in trap_pfault (frame=0xc02c0568, usermode=0, eva=0) at ../../i386/i386/trap.c:844 #16 0xc027a04b in trap (frame={tf_fs = -1070858224, tf_es = -1072562160, tf_ds = -1072562160, tf_edi = -1017372280, tf_esi = -1043326976, tf_ebp = -1070856764, tf_isp = -1070856812, tf_ebx = -1017372280, tf_edx = 0, tf_ecx = -1043326816, tf_eax = -1017372672, tf_trapno = 12, tf_err = 0, tf_eip = -1072387302, tf_cs = 8, tf_eflags = 66182, tf_esp = -1017372280, tf_ss = -1043326976}) at ../../i386/i386/trap.c:443 #17 0xc014ab1a in complete_rqe (bp=0xc35c2188) at ../../dev/vinum/vinuminterrupt.c:73 #18 0xc018d82f in biodone (bp=0xc35c2188) at ../../kern/vfs_bio.c:2637 #19 0xc012b7fd in dadone (periph=0xc1cec200, done_ccb=0xc1eb1400) at ../../cam/scsi/scsi_da.c:1262 #20 0xc012771f in camisr (queue=0xc02e5830) at ../../cam/cam_xpt.c:6323 #21 0xc0127531 in swi_cambio () at ../../cam/cam_xpt.c:6226 #22 0xc0124930 in xpt_polled_action (start_ccb=0xc02c0958) at ../../cam/cam_xpt.c:3393 #23 
0xc012bcc5 in dashutdown (arg=0x0, howto=260) at ../../cam/scsi/scsi_da.c:1554 #24 0xc0168610 in boot (howto=260) at ../../kern/kern_shutdown.c:297 #25 0xc01689c4 in poweroff_wait (junk=0xc02ba0cf, howto=0) at ../../kern/kern_shutdown.c:552 #26 0xc027a78d in trap_fatal (frame=0xc02c0c88, eva=48) at ../../i386/i386/trap.c:951 #27 0xc027a465 in trap_pfault (frame=0xc02c0c88, usermode=0, eva=48) at ../../i386/i386/trap.c:844 #28 0xc027a04b in trap (frame={tf_fs = 16, tf_es = 16, tf_ds = -1072300016, tf_edi = 0, tf_esi = -1069900608, tf_ebp = -1070854964, tf_isp = -1070854988, tf_ebx = -1070752356, tf_edx = 6865984, tf_ecx = 12, tf_eax = 0, tf_trapno = 12, tf_err = 0, tf_eip = -1071474016, tf_cs = 8, tf_eflags = 66050, tf_esp = 0, tf_ss = -1070854936}) at ../../i386/i386/trap.c:443 #29 0xc0229aa0 in acquire_lock (lk=0xc02d9d9c) at ../../ufs/ffs/ffs_softdep.c:265 #30 0xc022dc62 in softdep_fsync_mountdev (vp=0xd4b08c00) at ../../ufs/ffs/ffs_softdep.c:3788 #31 0xc0231d16 in ffs_fsync (ap=0xc02c0d40) at ../../ufs/ffs/ffs_vnops.c:134 #32 0xc0230a36 in ffs_sync (mp=0xc1cfee00, waitfor=2, cred=0xc1441680, p=0xc03a9cc0) at vnode_if.h:537 #33 0xc01953f7 in sync (p=0xc03a9cc0, uap=0x0) at ../../kern/vfs_syscalls.c:544 #34 0xc0168413 in boot (howto=256) at ../../kern/kern_shutdown.c:224 #35 0xc01689c4 in poweroff_wait (junk=0xc02ba0cf, howto=0) at ../../kern/kern_shutdown.c:552 #36 0xc027a78d in trap_fatal (frame=0xc02c0e54, eva=4) at ../../i386/i386/trap.c:951 #37 0xc027a465 in trap_pfault (frame=0xc02c0e54, usermode=0, eva=4) at ../../i386/i386/trap.c:844 #38 0xc027a04b in trap (frame={tf_fs = -1070858224, tf_es = -1072562160, tf_ds = -1072562160, tf_edi = -1008691832, tf_esi = -1043326976, tf_ebp = -1070854480, tf_isp = -1070854528, tf_ebx = -1008691832, tf_edx = 6865984, tf_ecx = -1043326816, tf_eax = 0, tf_trapno = 12, tf_err = 0, tf_eip = -1072387308, tf_cs = 8, tf_eflags = 66182, tf_esp = -1008691832, tf_ss = -1043326976}) at ../../i386/i386/trap.c:443 #39 0xc014ab14 in 
complete_rqe (bp=0xc3e09588) at ../../dev/vinum/vinuminterrupt.c:72 #40 0xc018d82f in biodone (bp=0xc3e09588) at ../../kern/vfs_bio.c:2637 #41 0xc012b7fd in dadone (periph=0xc1cec200, done_ccb=0xc1e9b800) at ../../cam/scsi/scsi_da.c:1262 #42 0xc012771f in camisr (queue=0xc02e5830) at ../../cam/cam_xpt.c:6323 #43 0xc0127531 in swi_cambio () at ../../cam/cam_xpt.c:6226 #44 0xc0270db0 in splz_swi () (kgdb) up #1 0xc01689c4 in poweroff_wait (junk=0xc02ba0cf, howto=0) at ../../kern/kern_shutdown.c:552 552 boot(bootopt); (kgdb) #2 0xc027a78d in trap_fatal (frame=0xc02bfe48, eva=4) at ../../i386/i386/trap.c:951 951 panic(trap_msg[type]); (kgdb) #3 0xc027a465 in trap_pfault (frame=0xc02bfe48, usermode=0, eva=4) at ../../i386/i386/trap.c:844 844 trap_fatal(frame, eva); (kgdb) #4 0xc027a04b in trap (frame={tf_fs = -1070923760, tf_es = -1072562160, tf_ds = -1072562160, tf_edi = -1017374328, tf_esi = -1043326976, tf_ebp = -1070858588, tf_isp = -1070858636, tf_ebx = -1017374328, tf_edx = 6865984, tf_ecx = -1043326816, tf_eax = 0, tf_trapno = 12, tf_err = 0, tf_eip = -1072387308, tf_cs = 8, tf_eflags = 66178, tf_esp = -1017374328, tf_ss = -1043326976}) at ../../i386/i386/trap.c:443 443 (void) trap_pfault(&frame, FALSE, eva); (kgdb) #5 0xc014ab14 in complete_rqe (bp=0xc35c1988) at ../../dev/vinum/vinuminterrupt.c:72 72 rqg = rqe->rqg; /* and the request group */ (kgdb) list 65,75 65 struct rqelement *rqe; 66 struct request *rq; 67 struct rqgroup *rqg; 68 struct buf *ubp; /* user buffer */ 69 struct drive *drive; 70 71 rqe = (struct rqelement *) bp; /* point to the element element that completed */ 72 rqg = rqe->rqg; /* and the request group */ 73 rq = rqg->rq; /* and the complete request */ 74 ubp = rq->bp; /* user buffer */ 75 (kgdb) set print pretty (kgdb) p *rqe $1 = { b = { b_hash = { le_next = 0x0, le_prev = 0x0 }, b_vnbufs = { tqe_next = 0x0, tqe_prev = 0x0 }, b_freelist = { tqe_next = 0x0, tqe_prev = 0x0 }, b_act = { tqe_next = 0xc35c0020, tqe_prev = 0x0 }, b_flags = 516, 
b_qindex = 0, b_xflags = 0 '\000', b_lock = { lk_interlock = { lock_data = 0 }, lk_flags = 1024, lk_sharecount = 0, lk_waitcount = 0, lk_exclusivecount = 1, lk_prio = 20, lk_wmesg = 0xc029c064 "bufwait", lk_timo = 0, lk_lockholder = 5 }, b_error = 0, b_bufsize = 0, b_bcount = 16384, b_resid = 0, b_dev = 0x0, b_data = 0xccb1e000 "íA\002", b_kvabase = 0x0, b_kvasize = 0, b_lblkno = 0, b_blkno = 15051625, b_offset = 0, b_iodone = 0xc014aafc <complete_rqe>, b_iodone_chain = 0x0, b_vp = 0x0, b_dirtyoff = 0, b_dirtyend = 0, b_rcred = 0xffffffff, b_wcred = 0xffffffff, b_pblkno = 0, b_saveaddr = 0x0, b_driver1 = 0x0, b_driver2 = 0x0, b_caller1 = 0x0, b_caller2 = 0x0, b_pager = { pg_spc = 0x0, pg_reqpage = 0 }, b_cluster = { cluster_head = { tqh_first = 0x0, tqh_last = 0x0 }, cluster_entry = { tqe_next = 0x0, tqe_prev = 0x0 } }, b_pages = {0x0 <repeats 32 times>}, b_npages = 0, b_dep = { lh_first = 0x0 }, b_chain = { parent = 0x0, count = 0 } }, rqg = 0x0, sdoffset = 15051360, useroffset = 0, dataoffset = 0, groupoffset = 0, datalen = 32, grouplen = 0, buflen = 0, flags = 0, sdno = 1, driveno = 1 } (kgdb) quit bash-2.04# gdb -k /sys/compile/yoda/kernel.debug vmcore.0 GNU gdb 4.18 Copyright 1998 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "i386-unknown-freebsd"... 
IdlePTD 4296704 initial pcb at 2e7a40 panicstr: page fault panic messages: --- Fatal trap 12: page fault while in kernel mode fault virtual address = 0x54 fault code = supervisor write, page not present instruction pointer = 0x8:0xc014b0b7 stack pointer = 0x10:0xc02c0e94 frame pointer = 0x10:0xc02c0eb0 code segment = base rx0, limit 0xfffff, type 0x1b = DPL 0, pres 1, def32 1, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = Idle interrupt mask = bio trap number = 12 panic: page fault syncing disks... Fatal trap 12: page fault while in kernel mode fault virtual address = 0x30 fault code = supervisor read, page not present instruction pointer = 0x8:0xc0229aa0 stack pointer = 0x10:0xc02c0cc8 frame pointer = 0x10:0xc02c0ccc code segment = base rx0, limit 0xfffff, type 0x1b = DPL 0, pres 1, def32 1, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = Idle interrupt mask = bio trap number = 12 panic: page fault Uptime: 3h14m14s Fatal trap 12: page fault while in kernel mode fault virtual address = 0x54 fault code = supervisor write, page not present instruction pointer = 0x8:0xc014b0b7 stack pointer = 0x10:0xc02c05a8 frame pointer = 0x10:0xc02c05c4 code segment = base rx0, limit 0xfffff, type 0x1b = DPL 0, pres 1, def32 1, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = Idle interrupt mask = bio cam trap number = 12 panic: page fault Uptime: 3h14m14s Fatal trap 12: page fault while in kernel mode fault virtual address = 0x54 fault code = supervisor write, page not present instruction pointer = 0x8:0xc014b0b7 stack pointer = 0x10:0xc02bfe88 frame pointer = 0x10:0xc02bfea4 code segment = base rx0, limit 0xfffff, type 0x1b = DPL 0, pres 1, def32 1, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = Idle interrupt mask = bio cam trap number = 12 panic: page fault Uptime: 3h14m14s Fatal trap 12: page fault while in kernel mode fault virtual address = 0x54 
fault code = supervisor write, page not present instruction pointer = 0x8:0xc014b0b7 stack pointer = 0x10:0xc02bf768 frame pointer = 0x10:0xc02bf784 code segment = base rx0, limit 0xfffff, type 0x1b = DPL 0, pres 1, def32 1, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = Idle interrupt mask = bio cam trap number = 12 panic: page fault Uptime: 3h14m15s Fatal trap 12: page fault while in kernel mode fault virtual address = 0x54 fault code = supervisor write, page not present instruction pointer = 0x8:0xc014b0b7 stack pointer = 0x10:0xc02bf048 frame pointer = 0x10:0xc02bf064 code segment = base rx0, limit 0xfffff, type 0x1b = DPL 0, pres 1, def32 1, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = Idle interrupt mask = bio cam trap number = 12 panic: page fault Uptime: 3h14m15s dumping to dev #da/0x20001, offset 1048704 dump 511 510 509 508 507 506 505 504 503 502 501 500 499 498 497 496 495 494 493 492 491 490 489 488 487 486 485 484 483 482 481 480 479 478 477 476 475 474 473 472 471 470 469 468 467 466 465 464 463 462 461 460 459 458 457 456 455 454 453 452 451 450 449 448 447 446 445 444 443 442 441 440 439 438 437 436 435 434 433 432 431 430 429 428 427 426 425 424 423 422 421 420 419 418 417 416 415 414 413 412 411 410 409 408 407 406 405 404 403 402 401 400 399 398 397 396 395 394 393 392 391 390 389 388 387 386 385 384 383 382 381 380 379 378 377 376 375 374 373 372 371 370 369 368 367 366 365 364 363 362 361 360 359 358 357 356 355 354 353 352 351 350 349 348 347 346 345 344 343 342 341 340 339 338 337 336 335 334 333 332 331 330 329 328 327 326 325 324 323 322 321 320 319 318 317 316 315 314 313 312 311 310 309 308 307 306 305 304 303 302 301 300 299 298 297 296 295 294 293 292 291 290 289 288 287 286 285 284 283 282 281 280 279 278 277 276 275 274 273 272 271 270 269 268 267 266 265 264 263 262 261 260 259 258 257 256 255 254 253 252 251 250 249 248 247 246 245 244 243 242 241 240 239 238 237 
236 235 234 233 232 231 230 229 228 227 226 225 224 223 222 221 220 219 218 217 216 215 214 213 212 211 210 209 208 207 206 205 204 203 202 201 200 199 198 197 196 195 194 193 192 191 190 189 188 187 186 185 184 183 182 181 180 179 178 177 176 175 174 173 172 171 170 169 168 167 166 165 164 163 162 161 160 159 158 157 156 155 154 153 152 151 150 149 148 147 146 145 144 143 142 141 140 139 138 137 136 135 134 133 132 131 130 129 128 127 126 125 124 123 122 121 120 119 118 117 116 115 114 113 112 111 110 109 108 107 106 105 104 103 102 101 100 99 98 97 96 95 94 93 92 91 90 89 88 87 86 85 84 83 82 81 80 79 78 77 76 75 74 73 72 71 70 69 68 67 66 65 64 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 --- #0 boot (howto=260) at ../../kern/kern_shutdown.c:302 302 dumppcb.pcb_cr3 = rcr3(); (kgdb) where #0 boot (howto=260) at ../../kern/kern_shutdown.c:302 #1 0xc01689c4 in poweroff_wait (junk=0xc02ba0cf, howto=0) at ../../kern/kern_shutdown.c:552 #2 0xc027a78d in trap_fatal (frame=0xc02bf008, eva=84) at ../../i386/i386/trap.c:951 #3 0xc027a465 in trap_pfault (frame=0xc02bf008, usermode=0, eva=84) at ../../i386/i386/trap.c:844 #4 0xc027a04b in trap (frame={tf_fs = -1070923760, tf_es = -1072300016, tf_ds = 7077904, tf_edi = -1038983800, tf_esi = -1038984192, tf_ebp = -1070862236, tf_isp = -1070862284, tf_ebx = -1018077120, tf_edx = 0, tf_ecx = 199687681, tf_eax = -7128129, tf_trapno = 12, tf_err = 2, tf_eip = -1072385865, tf_cs = 8, tf_eflags = 66118, tf_esp = -1038983800, tf_ss = -1043326976}) at ../../i386/i386/trap.c:443 #5 0xc014b0b7 in complete_rqe (bp=0xc2125d88) at ../../dev/vinum/vinuminterrupt.c:192 #6 0xc018d82f in biodone (bp=0xc2125d88) at ../../kern/vfs_bio.c:2637 #7 0xc012b7fd in dadone (periph=0xc1cec200, done_ccb=0xc1e1d000) at ../../cam/scsi/scsi_da.c:1262 #8 0xc012771f in camisr (queue=0xc02e5830) at ../../cam/cam_xpt.c:6323 #9 
0xc0127531 in swi_cambio () at ../../cam/cam_xpt.c:6226 #10 0xc0124930 in xpt_polled_action (start_ccb=0xc02bf3f8) at ../../cam/cam_xpt.c:3393 #11 0xc012bcc5 in dashutdown (arg=0x0, howto=260) at ../../cam/scsi/scsi_da.c:1554 #12 0xc0168610 in boot (howto=260) at ../../kern/kern_shutdown.c:297 #13 0xc01689c4 in poweroff_wait (junk=0xc02ba0cf, howto=0) at ../../kern/kern_shutdown.c:552 #14 0xc027a78d in trap_fatal (frame=0xc02bf728, eva=84) at ../../i386/i386/trap.c:951 #15 0xc027a465 in trap_pfault (frame=0xc02bf728, usermode=0, eva=84) at ../../i386/i386/trap.c:844 #16 0xc027a04b in trap (frame={tf_fs = -1070923760, tf_es = -1072300016, tf_ds = 7077904, tf_edi = -1038249592, tf_esi = -1038249984, tf_ebp = -1070860412, tf_isp = -1070860460, tf_ebx = -1018076544, tf_edx = 0, tf_ecx = 199097857, tf_eax = -7128129, tf_trapno = 12, tf_err = 2, tf_eip = -1072385865, tf_cs = 8, tf_eflags = 66118, tf_esp = -1038249592, tf_ss = -1043326976}) at ../../i386/i386/trap.c:443 #17 0xc014b0b7 in complete_rqe (bp=0xc21d9188) at ../../dev/vinum/vinuminterrupt.c:192 #18 0xc018d82f in biodone (bp=0xc21d9188) at ../../kern/vfs_bio.c:2637 #19 0xc012b7fd in dadone (periph=0xc1cec200, done_ccb=0xc1ec5400) at ../../cam/scsi/scsi_da.c:1262 #20 0xc012771f in camisr (queue=0xc02e5830) at ../../cam/cam_xpt.c:6323 #21 0xc0127531 in swi_cambio () at ../../cam/cam_xpt.c:6226 #22 0xc0124930 in xpt_polled_action (start_ccb=0xc02bfb18) at ../../cam/cam_xpt.c:3393 #23 0xc012bcc5 in dashutdown (arg=0x0, howto=260) at ../../cam/scsi/scsi_da.c:1554 #24 0xc0168610 in boot (howto=260) at ../../kern/kern_shutdown.c:297 #25 0xc01689c4 in poweroff_wait (junk=0xc02ba0cf, howto=0) at ../../kern/kern_shutdown.c:552 #26 0xc027a78d in trap_fatal (frame=0xc02bfe48, eva=84) at ../../i386/i386/trap.c:951 #27 0xc027a465 in trap_pfault (frame=0xc02bfe48, usermode=0, eva=84) at ../../i386/i386/trap.c:844 #28 0xc027a04b in trap (frame={tf_fs = -1070923760, tf_es = -1072300016, tf_ds = 7077904, tf_edi = -1040079480, 
tf_esi = -1040079872, tf_ebp = -1070858588, tf_isp = -1070858636, tf_ebx = -1018076352, tf_edx = 0, tf_ecx = 198901249, tf_eax = -7128129, tf_trapno = 12, tf_err = 2, tf_eip = -1072385865, tf_cs = 8, tf_eflags = 66118, tf_esp = -1040079480, tf_ss = -1043326976}) at ../../i386/i386/trap.c:443 #29 0xc014b0b7 in complete_rqe (bp=0xc201a588) at ../../dev/vinum/vinuminterrupt.c:192 #30 0xc018d82f in biodone (bp=0xc201a588) at ../../kern/vfs_bio.c:2637 #31 0xc012b7fd in dadone (periph=0xc1cec200, done_ccb=0xc1ec5c00) at ../../cam/scsi/scsi_da.c:1262 #32 0xc012771f in camisr (queue=0xc02e5830) at ../../cam/cam_xpt.c:6323 #33 0xc0127531 in swi_cambio () at ../../cam/cam_xpt.c:6226 #34 0xc0124930 in xpt_polled_action (start_ccb=0xc02c0238) at ../../cam/cam_xpt.c:3393 #35 0xc012bcc5 in dashutdown (arg=0x0, howto=260) at ../../cam/scsi/scsi_da.c:1554 #36 0xc0168610 in boot (howto=260) at ../../kern/kern_shutdown.c:297 #37 0xc01689c4 in poweroff_wait (junk=0xc02ba0cf, howto=0) at ../../kern/kern_shutdown.c:552 #38 0xc027a78d in trap_fatal (frame=0xc02c0568, eva=84) at ../../i386/i386/trap.c:951 #39 0xc027a465 in trap_pfault (frame=0xc02c0568, usermode=0, eva=84) at ../../i386/i386/trap.c:844 #40 0xc027a04b in trap (frame={tf_fs = -1070858224, tf_es = -1072300016, tf_ds = 7077904, tf_edi = -1040436856, tf_esi = -1040437248, tf_ebp = -1070856764, tf_isp = -1070856812, tf_ebx = -1018076928, tf_edx = 0, tf_ecx = 199491073, tf_eax = -7128129, tf_trapno = 12, tf_err = 2, tf_eip = -1072385865, tf_cs = 8, tf_eflags = 66118, tf_esp = -1040436856, tf_ss = -1043326976}) at ../../i386/i386/trap.c:443 #41 0xc014b0b7 in complete_rqe (bp=0xc1fc3188) at ../../dev/vinum/vinuminterrupt.c:192 #42 0xc018d82f in biodone (bp=0xc1fc3188) at ../../kern/vfs_bio.c:2637 #43 0xc012b7fd in dadone (periph=0xc1cec200, done_ccb=0xc212dc00) at ../../cam/scsi/scsi_da.c:1262 #44 0xc012771f in camisr (queue=0xc02e5830) at ../../cam/cam_xpt.c:6323 #45 0xc0127531 in swi_cambio () at ../../cam/cam_xpt.c:6226 #46 
0xc0124930 in xpt_polled_action (start_ccb=0xc02c0958) at ../../cam/cam_xpt.c:3393 #47 0xc012bcc5 in dashutdown (arg=0x0, howto=260) at ../../cam/scsi/scsi_da.c:1554 #48 0xc0168610 in boot (howto=260) at ../../kern/kern_shutdown.c:297 #49 0xc01689c4 in poweroff_wait (junk=0xc02ba0cf, howto=0) at ../../kern/kern_shutdown.c:552 #50 0xc027a78d in trap_fatal (frame=0xc02c0c88, eva=48) at ../../i386/i386/trap.c:951 #51 0xc027a465 in trap_pfault (frame=0xc02c0c88, usermode=0, eva=48) at ../../i386/i386/trap.c:844 #52 0xc027a04b in trap (frame={tf_fs = 16, tf_es = 16, tf_ds = -1072300016, tf_edi = 0, tf_esi = -1069900608, tf_ebp = -1070854964, tf_isp = -1070854988, tf_ebx = -1070752356, tf_edx = 6865984, tf_ecx = 12, tf_eax = 0, tf_trapno = 12, tf_err = 0, tf_eip = -1071474016, tf_cs = 8, tf_eflags = 66050, tf_esp = 0, tf_ss = -1070854936}) at ../../i386/i386/trap.c:443 #53 0xc0229aa0 in acquire_lock (lk=0xc02d9d9c) at ../../ufs/ffs/ffs_softdep.c:265 #54 0xc022dc62 in softdep_fsync_mountdev (vp=0xd4b08540) at ../../ufs/ffs/ffs_softdep.c:3788 #55 0xc0231d16 in ffs_fsync (ap=0xc02c0d40) at ../../ufs/ffs/ffs_vnops.c:134 #56 0xc0230a36 in ffs_sync (mp=0xc1cfe200, waitfor=2, cred=0xc1441680, p=0xc03a9cc0) at vnode_if.h:537 #57 0xc01953f7 in sync (p=0xc03a9cc0, uap=0x0) at ../../kern/vfs_syscalls.c:544 #58 0xc0168413 in boot (howto=256) at ../../kern/kern_shutdown.c:224 #59 0xc01689c4 in poweroff_wait (junk=0xc02ba0cf, howto=0) at ../../kern/kern_shutdown.c:552 #60 0xc027a78d in trap_fatal (frame=0xc02c0e54, eva=84) at ../../i386/i386/trap.c:951 #61 0xc027a465 in trap_pfault (frame=0xc02c0e54, usermode=0, eva=84) at ../../i386/i386/trap.c:844 #62 0xc027a04b in trap (frame={tf_fs = -1070858224, tf_es = -1072300016, tf_ds = 6815760, tf_edi = -1038613112, tf_esi = -1038613504, tf_ebp = -1070854480, tf_isp = -1070854528, tf_ebx = -1018076736, tf_edx = 0, tf_ecx = 199294465, tf_eax = -6865985, tf_trapno = 12, tf_err = 2, tf_eip = -1072385865, tf_cs = 8, tf_eflags = 66118, tf_esp = 
-1038613112, tf_ss = -1043326976}) at ../../i386/i386/trap.c:443 #63 0xc014b0b7 in complete_rqe (bp=0xc2180588) at ../../dev/vinum/vinuminterrupt.c:192 #64 0xc018d82f in biodone (bp=0xc2180588) at ../../kern/vfs_bio.c:2637 #65 0xc012b7fd in dadone (periph=0xc1cec200, done_ccb=0xc2028800) at ../../cam/scsi/scsi_da.c:1262 #66 0xc012771f in camisr (queue=0xc02e5830) at ../../cam/cam_xpt.c:6323 #67 0xc0127531 in swi_cambio () at ../../cam/cam_xpt.c:6226 #68 0xc0270db0 in splz_swi () (kgdb) up #1 0xc01689c4 in poweroff_wait (junk=0xc02ba0cf, howto=0) at ../../kern/kern_shutdown.c:552 552 boot(bootopt); (kgdb) #2 0xc027a78d in trap_fatal (frame=0xc02bf008, eva=84) at ../../i386/i386/trap.c:951 951 panic(trap_msg[type]); (kgdb) #3 0xc027a465 in trap_pfault (frame=0xc02bf008, usermode=0, eva=84) at ../../i386/i386/trap.c:844 844 trap_fatal(frame, eva); (kgdb) #4 0xc027a04b in trap (frame={tf_fs = -1070923760, tf_es = -1072300016, tf_ds = 7077904, tf_edi = -1038983800, tf_esi = -1038984192, tf_ebp = -1070862236, tf_isp = -1070862284, tf_ebx = -1018077120, tf_edx = 0, tf_ecx = 199687681, tf_eax = -7128129, tf_trapno = 12, tf_err = 2, tf_eip = -1072385865, tf_cs = 8, tf_eflags = 66118, tf_esp = -1038983800, tf_ss = -1043326976}) at ../../i386/i386/trap.c:443 443 (void) trap_pfault(&frame, FALSE, eva); (kgdb) #5 0xc014b0b7 in complete_rqe (bp=0xc2125d88) at ../../dev/vinum/vinuminterrupt.c:192 192 ubp->b_resid = 0; /* completed our transfer */ (kgdb) list 185,195 185 if (rq->error) { /* did we have an error? 
*/
186         if (rq->isplex) {                       /* plex operation, */
187             ubp->b_flags |= B_ERROR;            /* yes, propagate to user */
188             ubp->b_error = rq->error;
189         } else                                  /* try to recover */
190             queue_daemon_request(daemonrq_ioerror, (union daemoninfo) rq); /* let the daemon complete */
191     } else {
192         ubp->b_resid = 0;                       /* completed our transfer */
193         if (rq->isplex == 0)                    /* volume request, */
194             VOL[rq->volplex.volno].active--;    /* another request finished */
195         biodone(ubp);                           /* top level buffer completed */
(kgdb) p ubp
$1 = (struct buf *) 0x0
(kgdb) list 160,185
160     } else if ((rqg->flags & (XFR_NORMAL_WRITE | XFR_DEGRADED_WRITE)) /* RAID 4/5 group write operation */
161         &&(rqg->active == 1))                   /* and this is the last active request */
162         complete_raid5_write(rqe);
163     /*
164      * This is the earliest place where we can be
165      * sure that the request has really finished,
166      * since complete_raid5_write can issue new
167      * requests.
168      */
169     rqg->active--;                              /* this request now finished */
170     if (rqg->active == 0) {                     /* request group finished, */
171         rq->active--;                           /* one less */
172         if (rqg->lock) {                        /* got a lock? */
173             unlockrange(rqg->plexno, rqg->lock); /* yes, free it */
174             rqg->lock = 0;
175         }
176     }
177     if (rq->active == 0) {                      /* request finished, */
178 #if VINUMDEBUG
179         if (debug & DEBUG_RESID) {
180             if (ubp->b_resid != 0)              /* still something to transfer? */
181                 Debugger("resid");
182         }
183 #endif
184
185         if (rq->error) {                        /* did we have an error? */
(kgdb) p rqg
$2 = (struct rqgroup *) 0xc2125c00
(kgdb) set print pretty
(kgdb) p *rqg
$4 = {
  next = 0x0,
  rq = 0xc3516040,
  count = 2,
  active = 0,
  plexno = 0,
  badsdno = 0,
  flags = 0,
  lock = 0x0,
  lockbase = 199687680,
  rqe = 0xc2125c20
}
(kgdb) p *rq
$5 = {
  bp = 0x0,
  flags = 0,
  volplex = {
    volno = 0,
    plexno = 0
  },
  error = 0,
  sdno = 0,
  isplex = 0,
  active = 0,
  rqg = 0x0,
  lrqg = 0xc2125c00,
  next = 0x0
}
(kgdb) quit
bash-2.04# exit

Script done on Wed Sep 20 13:59:48 2000
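A note on reading the trap frames in the transcript: gdb prints the saved registers as signed 32-bit integers, so tf_eip = -1072387308 is the same value as the instruction pointer 0xc014ab14 shown in the panic message. The sketch below (Python, not part of the original report) does that conversion, and also illustrates why an access through a NULL structure pointer faults at a small address equal to the member's offset. The ctypes layouts are assumptions: they model only the leading pointer fields that appear in the `p *rqg` and `p *rq` output, with 4-byte pointers as on i386.

```python
import ctypes


def to_u32(value: int) -> int:
    """Reinterpret a signed 32-bit trap-frame value as an unsigned address."""
    return value & 0xFFFFFFFF


# Assumed i386 layout: only the leading fields printed by gdb are modelled,
# and pointers are represented as 4-byte integers.
class RqGroup(ctypes.Structure):
    _fields_ = [
        ("next", ctypes.c_uint32),  # struct rqgroup *next
        ("rq", ctypes.c_uint32),    # struct request *rq
    ]


class Request(ctypes.Structure):
    _fields_ = [
        ("bp", ctypes.c_uint32),    # struct buf *bp (user buffer)
    ]


def null_deref_address(struct_type, field: str) -> int:
    """Address touched when `field` is accessed through a NULL pointer."""
    return getattr(struct_type, field).offset


if __name__ == "__main__":
    # tf_eip values copied from the trap frames in the transcript.
    for eip in (-1072387308, -1072385865, -1071474016):
        print(hex(to_u32(eip)))
    print(hex(null_deref_address(RqGroup, "rq")))
    print(hex(null_deref_address(Request, "bp")))
```

Under these assumptions a NULL rqg faults at 0x4 on rqg->rq and a NULL rq faults at 0x0 on rq->bp, which would match the eva=4 and eva=0 frames in the first dump. The write fault at 0x54 in the second dump is consistent with the store to ubp->b_resid at line 192 while `p ubp` shows 0x0.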
State Changed From-To: feedback->closed No feedback from submitter.
Dear Greg, Andy, Roman,

grog@FreeBSD.org wrote on Mon, Jan 01, 2001 at 11:41:19PM +0000:
> Synopsis: multiple crashes while using vinum
[..]
> State-Changed-Why:
> No feedback from submitter.
>
> http://www.freebsd.org/cgi/query-pr.cgi?pr=21148

Well, I've sent you stack traces, with (and alas also without) debugging symbols. I am perfectly aware of your instruction page about debugging vinum, and I am not an ignorant moron who complains without reading. Unfortunately you don't seem to trust me or other people in this matter.

If you look at my stack traces again you will notice that no stack frame is part of the vinum module, so your .gdb debugging scripts cannot apply.

The reason is that _some code_ writes into unallocated memory, in my case overwriting a data structure of an ata request with a few zero bytes, causing the panic. The stack trace allows me to trace the problem back to this point, but no further. I later experienced a similar problem on a SCSI-only system.

The reason I filed this PR under 'vinum' is that it only occurred on boxes using vinum, and was perfectly reproducible via simple operations like a 'find /vinum/file/system -print' on a larger and moderately filled vinum filesystem. Perfectly reproducible means: each night, periodic daily caused the panic (traceable to the find call in /etc/security, which looks for files with setuid bits).

As far as I know, the only way to trace this writing into unallocated or otherwise allocated memory (or buffer overrun) would be to set a watchpoint on the overwritten data structure within the kernel debugger. My stack traces showed that this memory region stays the same on the same machine with the same kernel (although I can't tell how reliable this is). My experience with kernel code and kernel debugging with ddb is very limited. So is my time (I know this applies to everyone). Therefore I stopped spending time on setting up remote gdb sessions and sending you stack traces in an attempt to be helpful, since you obviously didn't seem to be interested.

I further decided not to use vinum any more. We spent some cash on a few hardware RAIDs, and the boxes have run smoothly since.

I am just writing this to state:

a) I did respond to your requests, trying to be as helpful as I could. You could blame me for not knowing, or not being willing to learn, how to set up a ddb/gdb session using watchpoints and wait for the next crash in an environment that should be productive (and now is).

b) I still believe that there is a problem somewhere in the vinum code (probably within the raid5 routines, since a mirror setup worked fine).

And in fact, I wouldn't have bothered if there weren't other people, like Roman Shterenzon and Andy Newman, who seem to have the same problems.

Best regards,
Daniel Lang

P.S.: I don't use vinum anymore, nor can I take my boxes out of production. The debugging kernels and crash dumps are no longer present, sorry.
--
IRCnet: Mr-Spock - Der Schatten von Hasenfuss ist ziemlich dunkel -
*Daniel Lang * dl@leo.org * +49 89 289 25735 * http://www.leo.org/~dl/*
I had the same problems with IDE and SCSI configurations on 4.0, 4.1 and 4.1.1. I spent cash on two 40 GB WD IDE disks and put them in a mirror. That was a few months ago.

I also had crashes with bridging, but Bosko Milekic and Thomas fixed the code with a little help, and bridging works superbly in 4.2-STABLE. I was still interested in RAID5 and very curious about Vinum in 4.2, so I decided to test it with the same disks (old SCSI disks, but good), and believe it or not, it also works under heavy load. It is strange, because I didn't change anything: the controller was the same Adaptec, and the cables and disks were the same too. The system was 4.2-STABLE. Please try it on 4.2-STABLE and report.

I must admit that I am not a FreeBSD hacker, I don't know much about debugging, and I was not much help to Grog. But he is very sure that Vinum works, and I don't like his way of thinking about that. I also set up an account for him on that machine so that he could check, debug and repair the problems there, but he wasn't interested in solving such problems. He blamed my configuration or some other mistake of mine.

Vinum with RAID5 under 4.1 is not stable. That's all.

Cuk
On Wednesday, 3 January 2001 at 14:52:35 +0000, Daniel Lang wrote: > Dear Greg, Andy, Roman, > > grog@FreeBSD.org wrote on Mon, Jan 01, 2001 at 11:41:19PM +0000: >> Synopsis: multiple crashes while using vinum > [..] >> State-Changed-Why: >> No feedback from submitter. >> >> http://www.freebsd.org/cgi/query-pr.cgi?pr=21148 > > Well, I've sent you stack-traces, with (and alas as well without) > debugging symbols, I am perfectly aware of your instruction page > about debugging vinum, and not an ignorant moron, who complains > without reading. Unfortunately you don't seem to trust me > or other people in this matter. As my closing message says, the reason I closed the PR was: >> No feedback from submitter. I sent you a message on 10 September 2000 asking for additional information. I received none. There's no reason to get all upset now, or make claims about my intentions. This was just a dead PR, and you've made it clear, both before and now, that you have no intention of following up on it. This is not a question of "ignorant morons" or "trust". > The reason is, that _some code_ writes into unallocated memory, in > my case overwriting a data-structure of an ata-request with a few > zero bytes, causing the panic. The stack trace allows me to trace > the problem back to this point, but not further. I later experienced > a similar problem on a scsi-only system. Yes, this looks very much like the other issues. But you must understand that there's nothing I can do without further information. > The reason, why I filed this pr unter 'vinum' is, that it only > occured on boxes using vinum, and perfectly reproducable via simple > operations like a 'find /vinum/file/system -print' on a larger and > moderately filled vinum-filesystem. Perfectly reproducable means: > each night, periodic daily caused the panic (traceable to the find > call in /etc/security, finding files with setuid bits). 
>
> As far as I know, the only way to trace this writing into
> unallocated/otherallocated memory resp. buffer overrun
> would be to set a watchpoint to the overwritten data-structure
> within the kernel-debugger.

The trouble with that is that this only happens when the system is
very active, and there are thousands of potential buffer headers which
could be trashed. I do have a trace facility within Vinum, but even
with that it's difficult to figure out what's going on.

> My stack-traces showed that this memory region stays the same on the
> same machine with the same kernel (although I can't tell how
> reliable this is).

If you mean that the same part of the buffer header gets smashed every
time, yes, this is reliably reproducible (well, in other words, when
it happens (at random), it happens in the same place every time). It
may mean that Vinum is doing it, but as far as I can tell it's always
6 words being zeroed out, and I don't do that anywhere in Vinum. The
other possibility, which I consider most likely, is that the data
structures accidentally get freed and used by some other driver (or,
possibly, that some other driver freed them first and then continued
using them). This would explain the observed correlation with the fxp
driver.

> My experiences with kernel code and kernel-debugging with
> ddb are very limited. So is my time (I know this applies
> to anyone). Therefore I ceased spending time to set up
> remote-gdb sessions and sending you stack traces trying to be
> helpful, since you obviously didn't seem to be interested.
>
> I further decided not to use vinum any more. We spent some
> cash on a few hardware RAIDs, and the boxes run smooth now,
> since.
>
> I am just writing this to state:
> a) I did respond to your requests, trying to be as helpful as
>    I could.

Well, I sent you a message on 10 September 2000, asking for additional
information. You didn't send it to me.
>    You could blame me for not knowing or willing to learn how to
>    set up a ddb/gdb session using watchpoints and waiting for the
>    next crash in an environment that should be productive (and now
>    is).

No, I wouldn't do that.

> b) I still believe, that there is a problem somewhere in the
>    vinum code (probably within raid5 routines, since a mirror
>    setup worked fine).

Correct. I have no doubt about it. But some bugs are difficult to
find, and I need help.

Greg
--
Finger grog@lemis.com for PGP public key
See complete headers for address and phone numbers
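
The failure mode suspected in the message above (a buffer header is freed while its original owner keeps using it, and the next consumer to receive it zeroes part of it) can be illustrated with a small user-space model. This is only a sketch under stated assumptions: the pool, the header size, and the names `HeaderPool`, `alloc`, `free`, `view` and `demo` are hypothetical and do not correspond to Vinum or FreeBSD kernel code. The only detail taken from the thread is that 6 words end up zeroed.

```python
WORD = 4            # bytes per word (assumption: 32-bit machine)
HDR_WORDS = 16      # hypothetical header size, chosen for illustration
ZEROED_WORDS = 6    # the thread reports 6 words being zeroed


class HeaderPool:
    """Toy fixed-size allocator; freed slots are reused in LIFO order."""

    def __init__(self, nslots):
        self.mem = bytearray(nslots * HDR_WORDS * WORD)
        self.free_list = list(range(nslots - 1, -1, -1))

    def alloc(self):
        return self.free_list.pop()

    def free(self, slot):
        self.free_list.append(slot)

    def view(self, slot):
        # Writable window onto the bytes backing one header slot.
        start = slot * HDR_WORDS * WORD
        return memoryview(self.mem)[start:start + HDR_WORDS * WORD]


def demo():
    pool = HeaderPool(4)

    # Owner A allocates a header and fills it with a recognizable pattern.
    a = pool.alloc()
    pool.view(a)[:] = b"\xAA" * (HDR_WORDS * WORD)

    # Bug being modelled: the header is freed while A still refers to it.
    pool.free(a)

    # Owner B receives the same slot and clears its first 6 words.
    b = pool.alloc()
    pool.view(b)[:ZEROED_WORDS * WORD] = bytes(ZEROED_WORDS * WORD)

    # A reads "its" header again and finds the start of it zeroed.
    stale = bytes(pool.view(a))
    return a == b, stale
```

Calling `demo()` returns a flag saying whether both owners ended up with the same slot, together with owner A's view of the header after owner B has written to it. In this model the damaged header is simply whichever slot was recycled, which is one way to picture why choosing a single address to watch in advance is difficult.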
On Wednesday, 3 January 2001 at 21:51:52 +0100, Marko Cuk wrote:
> I had the same problems on IDE and SCSI configuration on 4.0, 4.1, 4.1.1.
> I spent cash on buying 2 x 40Gig IDE Wd's and put them into mirror.
> That was a few months ago.
>
> I had crashes also with bridging, but Bosko Milekic and Thomas fixed the
> code with a little help and bridge works superb in 4.2 STABLE.
>
> So I was still interested in RAID5 and also very curious about Vinum in 4.2
> so I decided to test that thing with same disks (old SCSI disks, but good)
> and believe it or not, it works also under heavy load.
> It is strange, but I didn't change anything. Controller was the same
> Adaptec, cables were the same, disks also. Bsd was 4.2 STABLE.
>
> Please try it on 4.2 STABLE and report.
>
> I must admit, that I am not a FreeBSD hacker, I don't know much
> about debugging and I was very unhelpful to Grog, but he is very
> sure, that Vinum works and I don't like his way of thinking about
> that.

I'm not sure what you're saying here. I've made it clear (even in the
man pages) that there are some problems with RAID-5. What else (apart
from fix the problem :-) do you want me to do?

> I also set him account on that machine, to check and debug problems
> on that machine and repair them, but he wasn't interested in
> solving such problems. He blamed me for config or some other
> mistake.

I don't have any record of this. Did you use some other name? The
last exchange we had (27 November 2000), you didn't want to submit the
information I asked for, I said I wouldn't be able to help you much
without it, and you said you would go back and get the information. I
don't see anything about being offered an account on the machine,
which indeed would have been of assistance. But basically, all I
needed at that point was a (preferably unmutilated) copy of the
information I asked for. Based on what you supplied, I don't even
know if you had a panic or just a freeze.
> Vinum and RAID5 under 4.1 is not stable. That's all.

Ah, but there's the problem. There are no changes between 4.1 and
4.2. In some cases, we run into these problems, but for the most part
it just works.

Greg
--
Finger grog@lemis.com for PGP public key
See complete headers for address and phone numbers