Bug 20031

Summary: kernel randomly panics with ffs_clusteralloc: map mismatch
Product: Base System
Reporter: chris <chris>
Component: misc
Assignee: freebsd-bugs (Nobody) <bugs>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: 3.4-RELEASE   
Hardware: Any   
OS: Any   

Description chris 2000-07-19 10:30:04 UTC
Every couple of days, around 3:30 in the morning, when periodic daily is generating heavy disk activity, the kernel panics with "ffs_clusteralloc: map mismatch" and the machine reboots.  There seems to be no consistent cause, but some research shows that other people have reported similar panics and suspect heavy disk activity as the trigger.

Possibly related: in each crash, the /var/log/messages file has a 4-5 hour gap preceding the crash, during which syslogd appears not to be writing out its activity.

This report is similar to PR#16740, which looks to be unresolved.

Fix: 

I've tried rebuilding the kernel, running fsck several times over, and checking the BIOS settings, and can't get anywhere.  Sorry.
How-To-Repeat: Unknown, although it seems to involve having a particular hardware setup and then generating enough disk writes to cause the panic.
Comment 1 Joakim Henriksson 2000-07-19 12:01:25 UTC
> Possibly related: in each crash, the /var/log/messages file has a large 4-5 hour gap in it preceding the crash, where it appears that syslogd isn't writing out its activity.
> 
> This report is similar to PR#16740, which looks to be unresolved.
> >How-To-Repeat:
> Unknown, although it seems to involve having a particular hardware setup and then generating enough disk writes to cause the panic.
> >Fix:
> I've tried rebuilding the kernel, running fsck several times over, checking the bios settings, and can't get anywhere.  Sorry.

I've been getting consistent crashes while accessing an msdos drive since I
filed the 16740 PR. Do you perchance have a VIA MVP3 based motherboard?
Perhaps you could send the chipset related info from a "boot -v"?

Here are some interesting items from the crash dump.

IdlePTD 3600384
initial pcb at 2e9340
panicstr: page fault
panic messages:
---
Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0xe0c58ffc
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc018250b
stack pointer           = 0x10:0xc8fbed2c
frame pointer           = 0x10:0xc8fbed3c
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 19 (cp)
interrupt mask          = none
trap number             = 12
panic: page fault
---
(kgdb) where
#0  boot (howto=256) at ../../kern/kern_shutdown.c:302
#1  0xc014a328 in poweroff_wait (junk=0xc02a1b4f, howto=-933211072)
    at ../../kern/kern_shutdown.c:552
#2  0xc02668c5 in trap_fatal (frame=0xc8fbecec, eva=3771043836)
    at ../../i386/i386/trap.c:927
#3  0xc026659d in trap_pfault (frame=0xc8fbecec, usermode=0, eva=3771043836)
    at ../../i386/i386/trap.c:820
#4  0xc026616f in trap (frame={tf_fs = 16, tf_es = 16, tf_ds = 16, 
      tf_edi = 6984, tf_esi = -1060794368, tf_ebp = -923013828, 
      tf_isp = -923013864, tf_ebx = -1060798464, tf_edx = 134217727, 
      tf_ecx = 31, tf_eax = -2147483648, tf_trapno = 12, tf_err = 0, 
      tf_eip = -1072159477, tf_cs = 8, tf_eflags = 68246, tf_esp = 268435455, 
      tf_ss = 268435455}) at ../../i386/i386/trap.c:426
#5  0xc018250b in updatefats (pmp=0xc0c58000, bp=0xc3551760, fatbn=6984)
    at ../../msdosfs/msdosfs_fat.c:353
#6  0xc01829dc in fatchain (pmp=0xc0c58000, start=890233, count=0, 
    fillwith=4294967295) at ../../msdosfs/msdosfs_fat.c:674
#7  0xc0182afd in chainalloc (pmp=0xc0c58000, start=890233, count=8, 
    fillwith=4294967295, retcluster=0xc8fbee00, got=0xc8fbedfc)
    at ../../msdosfs/msdosfs_fat.c:748
#8  0xc0182cfa in clusteralloc (pmp=0xc0c58000, start=0, count=8, 
    fillwith=4294967295, retcluster=0xc8fbee00, got=0xc8fbedfc)
    at ../../msdosfs/msdosfs_fat.c:842
#9  0xc0183075 in extendfile (dep=0xc0c95300, count=8, bpp=0x0, ncp=0x0, 
    flags=0) at ../../msdosfs/msdosfs_fat.c:1034
#10 0xc01860b9 in msdosfs_write (ap=0xc8fbee90)
    at ../../msdosfs/msdosfs_vnops.c:725
#11 0xc017a8d0 in vn_write (fp=0xc0c525c0, uio=0xc8fbeedc, cred=0xc0741900, 
    flags=0, p=0xc8605440) at vnode_if.h:363
#12 0xc0157362 in dofilewrite (p=0xc8605440, fp=0xc0c525c0, fd=4, 
    buf=0x8058100, nbyte=65536, offset=-1, flags=0) at ../../sys/file.h:159
#13 0xc0157267 in write (p=0xc8605440, uap=0xc8fbef80)
    at ../../kern/sys_generic.c:298
#14 0xc0266b71 in syscall2 (frame={tf_fs = 47, tf_es = 47, tf_ds = 47, 
      tf_edi = 65536, tf_esi = 134578432, tf_ebp = -1077937648, 
      tf_isp = -923013164, tf_ebx = 65536, tf_edx = 4, tf_ecx = 0, tf_eax = 4, 
      tf_trapno = 12, tf_err = 2, tf_eip = 134561480, tf_cs = 31, 
      tf_eflags = 663, tf_esp = -1077937804, tf_ss = 47})
    at ../../i386/i386/trap.c:1126
#15 0xc025a7c5 in Xint0x80_syscall ()
#16 0x8048989 in ?? ()
#17 0x804851a in ?? ()
#18 0x8048139 in ?? ()
(kgdb) up 5
#5  0xc018250b in updatefats (pmp=0xc0c58000, bp=0xc3551760, fatbn=6984)
    at ../../msdosfs/msdosfs_fat.c:353
353                     if (pmp->pm_freeclustercount
(kgdb) print pmp
$1 = (struct msdosfsmount *) 0x0
(kgdb) list
348              * If we have an FSInfo block, update it.
349              */
350             if (pmp->pm_fsinfo) {
351                     u_long cn = pmp->pm_nxtfree;
352
353                     if (pmp->pm_freeclustercount
354                         && (pmp->pm_inusemap[cn / N_INUSEBITS]
355                             & (1 << (cn % N_INUSEBITS)))) {
356                             /*
357                              * The cluster indicated in FSInfo isn't free
(kgdb) up 1
#6  0xc01829dc in fatchain (pmp=0xc0c58000, start=890233, count=0, 
    fillwith=4294967295) at ../../msdosfs/msdosfs_fat.c:674
674                     updatefats(pmp, bp, bn);
(kgdb) print pmp->pm_nxtfree
$6 = 4294967295
(kgdb) 

This must be a nonsense value (0xffffffff)

(kgdb) print pmp->pm_inusemap[0xffffffff / 32]
Cannot access memory at address 0xe0c58ffc.

The fault address.
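
The arithmetic behind that fault address can be checked with a small sketch. It assumes the in-use map base was 0xc0c59000 (the tf_esi value in the trap frame above, -1060794368 as unsigned) and that each map element is 4 bytes on this i386 kernel. The helper names inusemap_access_addr and nxtfree_is_usable, and the maxcluster parameter, are made up for illustration and are not taken from msdosfs_fat.c; the guard is only one possible sanity check, not the committed fix.

```c
#include <stdint.h>

/* Bits per in-use map word; 32 matches the divisor used in the kgdb session. */
#define N_INUSEBITS 32u

/*
 * Byte address touched by pm_inusemap[cn / N_INUSEBITS] on a 32-bit
 * kernel, given the base address of the map. Each element is assumed
 * to be 4 bytes wide.
 */
static uint32_t
inusemap_access_addr(uint32_t map_base, uint32_t cn)
{
	return map_base + (cn / N_INUSEBITS) * (uint32_t)sizeof(uint32_t);
}

/*
 * Hypothetical guard: only trust the next-free hint if it names a
 * cluster that can exist on the volume. maxcluster is an assumed
 * upper bound supplied by the caller.
 */
static int
nxtfree_is_usable(uint32_t nxtfree, uint32_t maxcluster)
{
	return nxtfree <= maxcluster;
}
```

With map_base = 0xc0c59000 and cn = 0xffffffff, the index is 0x07ffffff (134217727, the tf_edx value in the trap frame) and the computed address is 0xe0c58ffc, which matches the reported fault virtual address. A check along the lines of nxtfree_is_usable would reject 0xffffffff before the map is indexed.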

As usual, crash dumps are available if someone wants to take a closer look this time.

-- 
regards/ Joakim
Comment 2 Sheldon Hearn 2000-07-19 12:51:55 UTC
On Wed, 19 Jul 2000 04:10:04 MST, Joakim Henriksson wrote:

> I've been getting consistent crashes while accessing an msdos drive since I
> filed the 16740 PR. Do you perchance have a VIA MVP3 based motherboard?
> Perhaps you could send the chipset related info from a "boot -v"?

Msdos, you say?  I saw a fix go into msdosfs_vnops.c recently, which
is presumed to have fixed this panic:

	panic: vrele: negative ref cnt

However, I seem to remember that you're using 3.4-RELEASE.  The fix has
only been applied to the development branch and to RELENG_4.  You can
try to come up with a patch for yourself based on:

	http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/msdosfs/msdosfs_vnops.c.diff?r1=1.95&r2=1.95.2.1

Ciao,
Sheldon.
Comment 3 Joakim Henriksson 2000-07-19 13:14:05 UTC
> > I've been getting consistent crashes while accessing an msdos drive since I
> > filed the 16740 PR. Do you perchance have a VIA MVP3 based motherboard?
> > Perhaps you could send the chipset related info from a "boot -v"?
> 
> Msdos, you say?  I saw a fix go into msdosfs_vnops.c recently, which
> is presumed to have fixed this panic:
> 
> 	panic: vrele: negative ref cnt

I will try it. But the fact remains that the ffs_clusteralloc bug still
exists...

> However, I seem to remember that you're using 3.4-RELEASE.  The fix has
> only been applied to the development branch and to RELENG_4.  You can
> try to come up with a patch for yourself based on:

I'm running a 4.1-RC that was built two days too early to include this patch.

> 	http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/msdosfs/msdosfs_vnops.c.diff?r1=1.95&r2=1.95.2.1



-- 
regards/ Joakim
Comment 4 Sheldon Hearn 2000-07-19 14:00:59 UTC
State Changed
From-To: open->closed

This appears to be a duplicate of kern/16740, which provides more 
detail and which applies to a more recent version of FreeBSD. 
Please watch that PR, but be aware that the 3.x line is near 
the end of its development cycle, as the focus has shifted to 
4.x and the development branch (to be 5.x).