Kernel panic. No dump to disk is made. Moreover, despite having KDB turned on, the system did not drop to a db> prompt.

login: Memory modified after free 0xfffff80019f97000(2048) val=dead0003 @ 0xfffff80019f97000
Memory modified after free 0xfffff8000569f000(2048) val=dead0003 @ 0xfffff8000569f000
Memory modified after free 0xfffff80005686800(2048) val=dead0003 @ 0xfffff80005686800
Memory modified after free 0xfffff800056dd800(2048) val=dead0003 @ 0xfffff800056dd800
Memory modified after free 0xfffff800054ba800(2048) val=dead0003 @ 0xfffff800054ba800
Memory modified after free 0xfffff8000565b000(2048) val=dead0003 @ 0xfffff8000565b000
Memory modified after free 0xfffff80005609800(2048) val=dead0003 @ 0xfffff80005609800
Memory modified after free 0xfffff80005608000(2048) val=dead0003 @ 0xfffff80005608000
Memory modified after free 0xfffff80005695800(2048) val=dead0003 @ 0xfffff80005695800
Memory modified after free 0xfffff8000563e800(2048) val=dead0003 @ 0xfffff8000563e800
Memory modified after free 0xfffff800055c2000(2048) val=dead0003 @ 0xfffff800055c2000
Memory modified after free 0xfffff80019f77800(2048) val=dead0003 @ 0xfffff80019f77800
Memory modified after free 0xfffff8001920b000(2048) val=dead0003 @ 0xfffff8001920b000
Memory modified after free 0xfffff80019fae000(2048) val=dead0003 @ 0xfffff80019fae000
Memory modified after free 0xfffff800055a6800(2048) val=dead0003 @ 0xfffff800055a6800
Memory modified after free 0xfffff8000565e000(2048) val=dead0003 @ 0xfffff8000565e000
Memory modified after free 0xfffff80005641800(2048) val=dead0003 @ 0xfffff80005641800
Memory modified after free 0xfffff80005675000(2048) val=dead0003 @ 0xfffff80005675000
Memory modified after free 0xfffff8000564c800(2048) val=dead0003 @ 0xfffff8000564c800
panic: pcib: PCI bus B error AFAR 0 AFSR 0 PCI CSR 0x10730b2aff IOMMU 0x3060003 STATUS 0x2a0
cpuid = 1

On pcib bus B I seem to have the following devices:

pcib0: <Sun Host-PCI bridge> mem 0x4000ff00000-0x4000ff0afff,0x4000fc10000-0x4000fc1701f,0x7f600000000-0x7f6000000ff,0x4000ff80000-0x4000ff8ffff irq 2035,2032,2033,2036,2019 on nexus0
pcib0: Tomatillo, version 4, IGN 0x1f, bus B, 66MHz
pcib0: DVMA map: 0xc0000000 to 0xdfffffff 65536 entries
pci0: <OFW PCI bus> on pcib0
pci0: <OFW PCI bus> on pcib0
bge0: <Broadcom BCM5704 A3, ASIC rev. 0x002003> mem 0x200000-0x20ffff,0x110000-0x11ffff at device 2.0 on pci0
bge1: <Broadcom BCM5704 A3, ASIC rev. 0x002003> mem 0x400000-0x40ffff,0x120000-0x12ffff at device 2.1 on pci0
atapci0: <AcerLabs M5229 UDMA100 controller> port 0x900-0x907,0x918-0x91b,0x910-0x917,0x908-0x90b,0x920-0x92f at device 13.0 on pci1
atapci0: [ITHREAD]
atapci0: using PIO transfers above 137GB as workaround for 48bit DMA access bug, expect reduced performance

There's only a DVD drive attached to atapci0, and the driver for that is not loaded.

pcib3: <Sun Host-PCI bridge> mem 0x4000ef00000-0x4000ef0afff,0x4000ec10000-0x4000ec1701f,0x7c600000000-0x7c6000000ff,0x4000ef80000-0x4000ef8ffff irq 1907,1904,1905,1908,1893 on nexus0
pcib3: Tomatillo, version 4, IGN 0x1d, bus B, 66MHz
pcib3: DVMA map: 0xc0000000 to 0xdfffffff 65536 entries
pci3: <OFW PCI bus> on pcib3
bge2: <Broadcom BCM5704 A3, ASIC rev. 0x002003> mem 0x200000-0x20ffff,0x110000-0x11ffff at device 2.0 on pci3
bge3: <Broadcom BCM5704 A3, ASIC rev. 0x002003> mem 0x400000-0x40ffff,0x120000-0x12ffff at device 2.1 on pci3
atapci1: <Marvell 88SX6081 SATA300 controller> port 0x300-0x3ff mem 0x600000-0x6fffff,0x800000-0xbfffff at device 1.0 on pci3
ata8: <ATA channel 4> on atapci1
ata9: <ATA channel 5> on atapci1
ata10: <ATA channel 6> on atapci1
ata11: <ATA channel 7> on atapci1
ad0: 715404MB <WDC WD7500AADS-00L5B1 01.01A01> at ata8-master UDMA100 SATA 3Gb/s
ad1: 715404MB <WDC WD7500AADS-00L5B1 01.01A01> at ata9-master UDMA100 SATA 3Gb/s
ad2: 715404MB <WDC WD7500AADS-00L5B1 01.01A01> at ata10-master UDMA100 SATA 3Gb/s
ad3: 715404MB <WDC WD7500AADS-00L5B1 01.01A01> at ata11-master UDMA100 SATA 3Gb/s

These four disks form a RAIDZ.

Kernel configuration options that seem relevant:

options SMP
options KDB
options INVARIANTS
options INVARIANT_SUPPORT
options WITNESS
options WITNESS_SKIPSPIN
device ata
device atadisk
nodevice atapicd
nodevice atapifd
nodevice atapist
device atamarvell

What more would be useful to know?

How-To-Repeat:
Unknown; the crash has happened twice so far, once with a kernel from January after weeks of uptime and once with a kernel from yesterday after only a few hours. The system routinely survives multiple zfs scrubs of the four disks hanging off of pci3, so if it's an ATA bug it's a funny one.
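A rough way to summarize the "Memory modified after free" lines from the console output above is sketched below. It is only an illustration, not part of the original report: the names `parse_reports` and `UMA_JUNK` are hypothetical, the sample holds just three of the reported lines, and the fill pattern 0xdeadc0de is an assumption about what INVARIANTS kernels write into freed UMA items, so please check it against sys/vm/uma_dbg.c before relying on it.

```python
import re

# Three of the console lines from the report, copied verbatim.
LOG = """\
Memory modified after free 0xfffff80019f97000(2048) val=dead0003 @ 0xfffff80019f97000
Memory modified after free 0xfffff8000569f000(2048) val=dead0003 @ 0xfffff8000569f000
Memory modified after free 0xfffff8000564c800(2048) val=dead0003 @ 0xfffff8000564c800
"""

# Item address, item size in bytes, value found, and the address where it was found.
PATTERN = re.compile(
    r"Memory modified after free (0x[0-9a-f]+)\((\d+)\) val=([0-9a-f]+) @ (0x[0-9a-f]+)"
)

# Assumption: the 32-bit pattern written into freed items on INVARIANTS kernels.
UMA_JUNK = 0xDEADC0DE


def parse_reports(text):
    """Return one dict per 'Memory modified after free' line found in text."""
    reports = []
    for match in PATTERN.finditer(text):
        item, size, val, where = match.groups()
        reports.append({
            "item": int(item, 16),
            "size": int(size),
            "val": int(val, 16),
            # Byte offset of the modified word within the freed item.
            "offset": int(where, 16) - int(item, 16),
        })
    return reports


reports = parse_reports(LOG)
sizes = {r["size"] for r in reports}
values = {r["val"] for r in reports}
offsets = {r["offset"] for r in reports}
# Bits that differ from the assumed fill pattern.
changed_bits = {r["val"] ^ UMA_JUNK for r in reports}

print(len(reports), sizes, [hex(v) for v in values], offsets)
print([hex(b) for b in changed_bits])
```

If the assumed pattern is right, every sampled report shows the same 2048-byte item size, the same value, and a modification at the very start of the item, with only the low 16 bits of the first word changed.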
Responsible Changed
From-To: freebsd-bugs->freebsd-sparc64

Might be specific to sparc64.
It occurs to me to add that at least the second crash was correlated with a burst of traffic on bge2, which usually sits idle. FWIW, bge0 and bge3 are typically busy, and bge1 is not connected. Is it possible that this is a bge bug? I'll be recreating the busy-bge2 scenario to test other things anyway and will report should it trigger a panic again.

While I'm recovering from filing an underinformative bug report, I'll note that the machine is a Sun Fire V210 (with 2 GB of RAM and two 1 GHz CPUs). Anything else that would help?

--nwf;
On Wed, Mar 31, 2010 at 06:50:12PM +0000, Nathaniel W Filardo wrote:
> The following reply was made to PR sparc64/145211; it has been noted by GNATS.
>
> From: Nathaniel W Filardo <nwf@cs.jhu.edu>
> To: bug-followup@freebsd.org
> Cc:
> Subject: Re: kern/145211: Memory modified after free
> Date: Wed, 31 Mar 2010 14:49:40 -0400
>
> It occurs to me to add that at least the second crash was correlated with a
> burst of traffic on bge2, which usually sits idle. FWIW, bge0 and bge3 are
> typically busy, and bge1 is not connected. Is it possible that this is a
> bge bug? I'll be recreating the busy-bge2 scenario to test other things
> anyway and will report should it trigger a panic again.

FWIW I've had this twice on ia64 -current. It also seems to follow bge activity, but not sure about the "bursts":

http://seis.bris.ac.uk/~mexas/freebsd/ia64/rx2600/tzav/messages

--
Anton Shterenlikht
Room 2.6, Queen's Building
Mech Eng Dept
Bristol University
University Walk, Bristol BS8 1TR, UK
Tel: +44 (0)117 331 5944
Fax: +44 (0)117 929 4423
>
> Memory modified after free 0xfffff80005675000(2048) val=dead0003 @ 0xfffff80005675000
> Memory modified after free 0xfffff8000564c800(2048) val=dead0003 @ 0xfffff8000564c800
> panic: pcib: PCI bus B error AFAR 0 AFSR 0 PCI CSR 0x10730b2aff IOMMU 0x3060003 STATUS 0x2a0

This is the IOMMU reporting an error, as STX_PCI_CTRL_MMU_ERR is set in the PCI CSR and TOM_PCI_IOMMU_ERR is set in the IOMMU CSR. Moreover, the TOM_PCI_IOMMU_INVALID_ERR set in the latter suggests that a DMA buffer was used after it had been unloaded (and thus the TTE invalidated). So it's quite likely that both the UMA and the IOMMU complaints are caused by the same problem. Unfortunately, neither allows us to identify the culprit. It would help if you could move the traffic in question from bge2 to bge1 and use either r206020 or the following patch, which should allow us to identify at least the driver involved, i.e. ata(4) or bge(4), by additionally indicating whether pcib0 or pcib3 triggered the panic.

http://people.freebsd.org/~marius/psycho_schizo_device_get_nameunit.diff

Which version of if_bge.c were you running when the panic occurred?

Marius
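For anyone following along, the decoding described above can be reproduced roughly as in the sketch below. This is not the driver code: `decode` is a hypothetical helper, and the three mask values are assumptions recalled from sys/sparc64/pci/schizoreg.h that should be verified against the header. They are at least consistent with the register values printed in the panic message. The invalid-error check is written as a plain bit test, which may be simpler than what the driver actually does.

```python
# Assumed mask values (to be verified against sys/sparc64/pci/schizoreg.h).
STX_PCI_CTRL_MMU_ERR = 0x0000001000000000       # PCI CSR: IOMMU error seen
TOM_PCI_IOMMU_ERR = 0x0000000001000000          # IOMMU CSR: error logged
TOM_PCI_IOMMU_INVALID_ERR = 0x0000000002000000  # IOMMU CSR: invalid TTE used


def decode(pci_csr, iommu_csr):
    """Return the names of the assumed error flags set in the two registers."""
    flags = []
    if pci_csr & STX_PCI_CTRL_MMU_ERR:
        flags.append("STX_PCI_CTRL_MMU_ERR")
    if iommu_csr & TOM_PCI_IOMMU_ERR:
        flags.append("TOM_PCI_IOMMU_ERR")
    if iommu_csr & TOM_PCI_IOMMU_INVALID_ERR:
        flags.append("TOM_PCI_IOMMU_INVALID_ERR")
    return flags


# Register values taken from the panic message in the report.
for name in decode(0x10730B2AFF, 0x3060003):
    print(name)
```

With the values from the panic line, all three flags come out set under these assumed masks, which matches the reading given in the reply.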
On Thu, Apr 01, 2010 at 01:23:59PM +0200, Marius Strobl wrote:
> This is the IOMMU reporting an error, as STX_PCI_CTRL_MMU_ERR is set in
> the PCI CSR and TOM_PCI_IOMMU_ERR is set in the IOMMU CSR. Moreover, the
> TOM_PCI_IOMMU_INVALID_ERR set in the latter suggests that a DMA buffer
> was used after it had been unloaded (and thus the TTE invalidated). So
> it's quite likely that both the UMA and the IOMMU complaints are caused
> by the same problem. Unfortunately, neither allows us to identify the

Thank you for decoding that for me.

> culprit. It would help if you could move the traffic in question from
> bge2 to bge1 and use either r206020 or the following patch, which should
> allow us to identify at least the driver involved, i.e. ata(4) or bge(4),
> by additionally indicating whether pcib0 or pcib3 triggered the panic.
> http://people.freebsd.org/~marius/psycho_schizo_device_get_nameunit.diff

Just csup'd and am now rebuilding; will let you know.

> Which version of if_bge.c were you running when the panic occurred?

$FreeBSD: src/sys/dev/bge/if_bge.c,v 1.284 2010/03/25 17:17:35 yongari Exp $
State Changed
From-To: open->feedback

This is expected to be fixed by r208862 (r208995 in stable/7, r208993 in stable/8), especially if a high number of if_iqdrops were seen. Could you please re-test with that revision in place?
State Changed
From-To: feedback->closed

Close due to feedback timeout.