Bug 121770 - [panic] ZFS on i386, large file or heavy I/O leads to kernel panic or reboot
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 7.0-RELEASE
Hardware: Any Any
Importance: Normal, Affects Only Me
Assignee: freebsd-bugs (Nobody)
Reported: 2008-03-17 00:00 UTC by Solra Bizna
Modified: 2011-09-08 16:51 UTC

Description Solra Bizna 2008-03-17 00:00:03 UTC
I have a SPARC64 with 1.25GB of RAM running 7.0-RC1, and I'm migrating
filesystems from it to an i386 with 1.5GB of RAM running 7.0-RELEASE.
One of the filesystems was around 3GB in size. I used zfs send to write
a snapshot stream of each filesystem to a file, and transferred the files
to a temporary ZFS filesystem using rsync. At random points (but
consistently partway through the 3GB file), the i386 panics with "page
fault in kernel mode" on an access to a very low address (always 0x1
when I was at the console to see it).

Just now, the temporary filesystem in question was corrupted, causing
ZFS to freeze. In my opinion this is probably the same error manifesting
in a different way. The problem also shows up as a sudden, unexplained
reboot.

I had not followed the instructions in the ZFSTuningGuide at first, but
the problem is still reproducible after doing so.

How-To-Repeat:
$ dd if=/dev/zero of=/somepool/test bs=2048 count=1048576
If it completed, this would create a 2GB file filled with zeroes.
Sometimes a 1GB file is enough to trigger the problem.
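For reference, the reproduction step can be sketched in Python. This is a minimal, hypothetical helper (dd_zero is not part of FreeBSD or of the original report) that writes the same pattern of 2048-byte zero blocks using only the standard library. Like dd without a sync option, it does not fsync before returning, so some data may still be cached when it finishes.

```python
def dd_zero(path, bs=2048, count=1048576):
    """Write `count` blocks of `bs` zero bytes to `path`.

    Roughly mirrors `dd if=/dev/zero of=PATH bs=BS count=COUNT`
    and returns the total number of bytes written.
    """
    block = bytes(bs)  # bs zero bytes, standing in for reads from /dev/zero
    written = 0
    with open(path, "wb") as out:
        for _ in range(count):
            written += out.write(block)
    return written


# Size the report's command would produce: 2048 * 1048576 = 2**31 bytes (2 GiB).
EXPECTED_BYTES = 2048 * 1048576
```

Calling dd_zero("/somepool/test") corresponds to the full 2GB run; passing count=524288 gives the 1GB case.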
Comment 1 Solra Bizna 2008-03-17 08:09:49 UTC
After much hardware and software testing: the failure occurs under any
heavy I/O load, not just ZFS. In addition, it only fails when all
three DIMM slots are filled. None of the individual slots or DIMMs are
faulty, and the BIOS memory test passes even with all three slots
filled. The machine is currently running with 2 slots filled, at a
reduced RAM capacity of 1GB.
Signs point to a hardware problem, but I'd like to think it's a fixable
software bug, because otherwise I'm sitting on a 512MB PC133 DIMM I
can't use. :/
-:sigma.SB
Comment 2 Volker Werth freebsd_committer freebsd_triage 2008-03-18 21:40:29 UTC
State Changed
From-To: open->feedback


Please give us the kernel dump!
For the other issues, you need to carefully tune your system for ZFS when running on i386.
As you haven't sent a dmesg or any other useful information, we can only guess at your problem, but I'm wondering whether you're using a SiI3112 or a similar controller.
You can't be sure your memory works just because it passes POST. Please check your RAM first using something like memtest86 (assuming you're running x86 hardware; your report doesn't make clear whether you're seeing this on SPARC or on x86).
Also, please check whether the error changes when you swap RAM modules.
Comment 3 Solra Bizna 2008-03-19 09:09:30 UTC
On Tue, Mar 18, 2008 at 3:52 PM,  <vwe@freebsd.org> wrote:
>  Please give us the kernel dump!
It made several dumps, but I can't locate them. (I've historically
been a Linux person, so I have no idea where they go.) I haven't had a
recurrence since removing a DIMM.

>  As you haven't send a dmesg or any other useful information,
Copyright (c) 1992-2008 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 7.0-RELEASE #0: Sun Feb 24 19:59:52 UTC 2008
    root@logan.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Pentium(R) 4 CPU 1.60GHz (1595.30-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0xf12  Stepping = 2
  Features=0x3febfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM>
real memory  = 1073741824 (1024 MB)
avail memory = 1036939264 (988 MB)
ACPI APIC Table: <COMPAQ BROOKDA >
ioapic0: Changing APIC ID to 8
ioapic0 <Version 2.0> irqs 0-23 on motherboard
kbd1 at kbdmux0
ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
hptrr: HPT RocketRAID controller driver v1.1 (Feb 24 2008 19:59:27)
acpi0: <COMPAQ CPQ003E> on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
acpi0: reservation of 0, a0000 (3) failed
acpi0: reservation of 100000, 40000000 (3) failed
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0xf808-0xf80b on acpi0
cpu0: <ACPI CPU> on acpi0
p4tcc0: <CPU Frequency Thermal Control> on cpu0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
agp0: <Intel 82845 host to AGP bridge> on hostb0
pcib1: <PCI-PCI bridge> at device 1.0 on pci0
pci1: <PCI bus> on pcib1
vgapci0: <VGA-compatible display> mem 0xfc000000-0xfcffffff,0xf0000000-0xf7ffffff irq 18 at device 0.0 on pci1
pcib2: <ACPI PCI-PCI bridge> at device 30.0 on pci0
pci2: <ACPI PCI bus> on pcib2
fxp0: <Intel 82801BA/CAM (ICH2/3) Pro/100 Ethernet> port 0x1000-0x103f mem 0xfd200000-0xfd200fff irq 20 at device 8.0 on pci2
miibus0: <MII bus> on fxp0
inphy0: <i82562EM 10/100 media interface> PHY 1 on miibus0
inphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp0: Ethernet address: 00:02:a5:f2:7a:a2
fxp0: [ITHREAD]
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel ICH2 UDMA100 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x2480-0x248f at device 31.1 on pci0
ata0: <ATA channel 0> on atapci0
ata0: [ITHREAD]
ata1: <ATA channel 1> on atapci0
ata1: [ITHREAD]
uhci0: <Intel 82801BA/BAM (ICH2) USB controller USB-A> port 0x2440-0x245f irq 19 at device 31.2 on pci0
uhci0: [GIANT-LOCKED]
uhci0: [ITHREAD]
usb0: <Intel 82801BA/BAM (ICH2) USB controller USB-A> on uhci0
usb0: USB revision 1.0
uhub0: <Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb0
uhub0: 2 ports with 2 removable, self powered
uhci1: <Intel 82801BA/BAM (ICH2) USB controller USB-B> port 0x2460-0x247f irq 23 at device 31.4 on pci0
uhci1: [GIANT-LOCKED]
uhci1: [ITHREAD]
usb1: <Intel 82801BA/BAM (ICH2) USB controller USB-B> on uhci1
usb1: USB revision 1.0
uhub1: <Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb1
uhub1: 2 ports with 2 removable, self powered
pci0: <multimedia, audio> at device 31.5 (no driver attached)
acpi_button0: <Power Button> on acpi0
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
atkbd0: [ITHREAD]
sio0: <Standard PC COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
sio0: [FILTER]
sio1: <Standard PC COM port> port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
sio1: [FILTER]
fdc0: <floppy drive controller> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: [FILTER]
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
pmtimer0 on isa0
orm0: <ISA Option ROM> at iomem 0xc0000-0xcc7ff pnpid ORM0000 on isa0
ppc0: <Parallel port> at port 0x378-0x37f irq 7 on isa0
ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/13 bytes threshold
ppbus0: <Parallel port bus> on ppc0
ppbus0: [ITHREAD]
plip0: <PLIP network interface> on ppbus0
lpt0: <Printer> on ppbus0
lpt0: Interrupt-driven port
ppi0: <Parallel I/O> on ppbus0
ppc0: [GIANT-LOCKED]
ppc0: [ITHREAD]
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounter "TSC" frequency 1595300680 Hz quality 800
Timecounters tick every 1.000 msec
hptrr: no controller detected.
ad0: 19092MB <WDC WD200EB-11CSF0 04.01B04> at ata0-master UDMA100
ad2: 117246MB <Maxtor 6Y120L0 YAR41BW0> at ata1-master UDMA100
Trying to mount root from ufs:/dev/ad0s1a
WARNING: / was not properly dismounted
WARNING: /tmp was not properly dismounted
WARNING: /usr was not properly dismounted
/usr: mount pending error: blocks 5804 files 0
WARNING: /var was not properly dismounted
/var: mount pending error: blocks 4 files 1
WARNING: ZFS is considered to be an experimental feature in FreeBSD.
ZFS filesystem version 6
ZFS storage pool version 6

^-- That is my latest dmesg. (The mount-related messages at the end
are because of a power failure.)

>  I'm wondering if you're using a Sil3112 or the like?
I have two IDE devices on two IDE buses on an ICH2.

>  You can't be sure to have working memory just by successfully passing the POST phase. Please check your RAM using something like memtest86, first (assuming you're running x86 hardware as your report is not clear if you're seeing this on sparc or on x86).
You're correct. Unfortunately, unless it's on the boot floppies, I
can't run memtest86. (There is a gaping double-height
5.25"-form-factor hole in this computer's case where optical drives
should be.)

>  Also please check if the error appearance changes when swapping RAM modules.
Changing the arrangement of the three DIMMs in the three slots didn't
change anything about the problem. However, any arrangement with only
two DIMMs (leaving a slot open) seems to work perfectly.
-:sigma.SB
Comment 4 Mark Linimon freebsd_committer freebsd_triage 2009-05-18 03:55:50 UTC
State Changed
From-To: feedback->open

Note that feedback was received, and assign properly. 


Comment 5 Mark Linimon freebsd_committer freebsd_triage 2009-05-18 03:55:50 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-fs
Comment 6 vladislav V. Prodan 2009-09-09 20:02:27 UTC
	
Try adding the following to /boot/loader.conf:

## Write-speed tuning
vfs.zfs.arc_max = "448M"
vm.kmem_size_max = "999M"
vm.kmem_size = "999M"
# KVA_PAGES = 512  (kernel config option, not a loader tunable)
vfs.zfs.zil_disable = "1"

Then reboot and try to reproduce the panic again.
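A small sketch can sanity-check values like these before rebooting. The helpers below (parse_size, parse_loader_conf, check_zfs_memory) are hypothetical and only understand simple name = "value" lines. The rule that vfs.zfs.arc_max should sit below vm.kmem_size is an assumption based on the ARC being allocated from kernel memory; the check does not account for the i386 kernel address space limit.

```python
SIZE_UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(value):
    """Convert a size like "448M" to bytes (K/M/G taken as powers of 1024)."""
    value = value.strip().upper()
    if value and value[-1] in SIZE_UNITS:
        return int(value[:-1]) * SIZE_UNITS[value[-1]]
    return int(value)


def parse_loader_conf(text):
    """Parse simple `name = "value"` lines; blank lines and # comments are skipped."""
    settings = {}
    for line in text.splitlines():
        # Drop comments such as "## -> Speed to write" or "# KVA_PAGES = 512".
        line = line.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip()] = value.strip().strip('"')
    return settings


def check_zfs_memory(settings):
    """Return a list of problems with the ARC/kmem sizes (empty if consistent).

    Assumption: the ARC is carved out of kmem, so arc_max must stay below
    vm.kmem_size, which in turn must not exceed vm.kmem_size_max.
    """
    keys = ("vfs.zfs.arc_max", "vm.kmem_size", "vm.kmem_size_max")
    missing = [k for k in keys if k not in settings]
    if missing:
        return ["missing: " + ", ".join(missing)]
    arc_max, kmem, kmem_max = (parse_size(settings[k]) for k in keys)
    problems = []
    if kmem > kmem_max:
        problems.append("vm.kmem_size exceeds vm.kmem_size_max")
    if arc_max >= kmem:
        problems.append("vfs.zfs.arc_max is not below vm.kmem_size")
    return problems
```

With the values suggested above (448M against 999M), check_zfs_memory returns an empty list.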
Comment 7 Jaakko Heinonen freebsd_committer freebsd_triage 2011-09-08 16:51:00 UTC
State Changed
From-To: open->closed

Probably a hardware problem.
Comment 8 Pawel Jakub Dawidek freebsd_committer freebsd_triage 2014-06-01 06:45:49 UTC
Responsible Changed
From-To: freebsd-fs->freebsd-bugs

Looks like a hardware failure, not a ZFS problem.