Bug 15063

Summary: "indirect pointer mismatch" in softdep
Product: Base System Reporter: assar <assar>
Component: kernAssignee: Kirk McKusick <mckusick>
Status: Closed FIXED    
Severity: Affects Only Me CC: assar
Priority: Normal    
Version: 4.0-CURRENT   
Hardware: Any   
OS: Any   

Description assar 1999-11-23 14:10:00 UTC
IdlePTD 3809280
initial pcb at 31dbc0
panicstr: softdep_lock: locking against myself
panic messages:
---
panic: softdep_write_inodeblock: indirect pointer #0 mismatch 0 != 903440

syncing disks... panic: softdep_lock: locking against myself

dumping to dev #wd/0x20001, offset 272800
dump 127 126 125 124 123 122 121 120 119 118 117 116 115 114 113 112 111 110 109 108 107 106 105 104 103 102 101 100 99 98 97 96 95 94 93 92 91 90 89 88 87 86 85 84 83 82 81 80 79 78 77 76 75 74 73 72 71 70 69 68 67 66 65 64 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 
---
#0  boot (howto=260) at ../../kern/kern_shutdown.c:273
(kgdb) bt
#0  boot (howto=260) at ../../kern/kern_shutdown.c:273
#1  0xc016c699 in panic (fmt=0xc02d5780 "softdep_lock: locking against myself")
    at ../../kern/kern_shutdown.c:523
#2  0xc0235129 in acquire_lock (lk=0xc030c634)
    at ../../ufs/ffs/ffs_softdep.c:266
#3  0xc0238b80 in softdep_update_inodeblock (ip=0xc0c49200, bp=0xc3444ef8, 
    waitfor=0) at ../../ufs/ffs/ffs_softdep.c:3487
#4  0xc02343ad in ffs_update (vp=0xc8e19060, waitfor=0)
    at ../../ufs/ffs/ffs_inode.c:105
#5  0xc023bcd4 in ffs_sync (mp=0xc0b09400, waitfor=2, cred=0xc073b900, 
    p=0xc0333e60) at ../../ufs/ffs/ffs_vfsops.c:986
#6  0xc0197163 in sync (p=0xc0333e60, uap=0x0) at ../../kern/vfs_syscalls.c:545
#7  0xc016c0af in boot (howto=256) at ../../kern/kern_shutdown.c:197
#8  0xc016c699 in panic (
    fmt=0xc02d64a0 "%s: indirect pointer #%ld mismatch %d != %d")
    at ../../kern/kern_shutdown.c:523
#9  0xc0237fc8 in initiate_write_inodeblock (inodedep=0xc0d15400, 
    bp=0xc34457f0) at ../../ufs/ffs/ffs_softdep.c:2831
#10 0xc0237cdb in softdep_disk_io_initiation (bp=0xc34457f0)
    at ../../ufs/ffs/ffs_softdep.c:2677
#11 0xc01a14a6 in spec_strategy (ap=0xc840deb0)
    at ../../miscfs/specfs/spec_vnops.c:413
#12 0xc01a0fbd in spec_vnoperate (ap=0xc840deb0)
    at ../../miscfs/specfs/spec_vnops.c:121
#13 0xc0243fd1 in ufs_vnoperatespec (ap=0xc840deb0)
    at ../../ufs/ufs/ufs_vnops.c:2319
#14 0xc018d119 in bwrite (bp=0xc34457f0) at vnode_if.h:940
#15 0xc019247a in vop_stdbwrite (ap=0xc840df20) at ../../kern/vfs_default.c:314
#16 0xc01922d9 in vop_defaultop (ap=0xc840df20) at ../../kern/vfs_default.c:134
#17 0xc01a0fbd in spec_vnoperate (ap=0xc840df20)
    at ../../miscfs/specfs/spec_vnops.c:121
#18 0xc0243fd1 in ufs_vnoperatespec (ap=0xc840df20)
    at ../../ufs/ufs/ufs_vnops.c:2319
#19 0xc018dfee in vfs_bio_awrite (bp=0xc34457f0) at vnode_if.h:1189
#20 0xc023d239 in ffs_fsync (ap=0xc840df88) at ../../ufs/ffs/ffs_vnops.c:210
#21 0xc0194bd8 in sched_sync () at vnode_if.h:533
#22 0xc026ed60 in fork_trampoline ()
Cannot access memory at address 0x2000.
(kgdb) f 8
#8  0xc016c699 in panic (
    fmt=0xc02d64a0 "%s: indirect pointer #%ld mismatch %d != %d")
    at ../../kern/kern_shutdown.c:523
(kgdb) up
#9  0xc0237fc8 in initiate_write_inodeblock (inodedep=0xc0d15400, 
    bp=0xc34457f0) at ../../ufs/ffs/ffs_softdep.c:2831
(kgdb) print *adp
$1 = {ad_list = {wk_list = {le_next = 0x0, le_prev = 0xc344fd2c}, wk_type = 4, 
    wk_state = 32777}, ad_next = {tqe_next = 0x0, tqe_prev = 0xc0d15444}, 
  ad_lbn = 12, ad_newblkno = 903440, ad_oldblkno = 0, ad_newsize = 8192, 
  ad_oldsize = 0, ad_deps = {le_next = 0xc0ba6300, le_prev = 0xc0d1acb0}, 
  ad_buf = 0x0, ad_inodedep = 0xc0d15400, ad_freefrag = 0x0}
(kgdb) p dp->di_ib[adp->ad_lbn - NADDR]
No symbol "NADDR" in current context.
(kgdb) p dp->di_ib[adp->ad_lbn - 12]
$2 = 0
(kgdb)

Fix: 

none known :-(
How-To-Repeat: 
stressing the file system.  haven't figured out exactly what sequence
of operations seems to cause this.
Comment 1 sprice 1999-11-23 14:52:50 UTC
---------- Forwarded message ----------
Date: 23 Nov 1999 15:39:25 +0100
From: assar@stacken.kth.se
To: gnats-admin@FreeBSD.org
Cc: freebsd-bugs@FreeBSD.org
Subject: Re: kern/15063: "indirect pointer mismatch" in softdep

some additional information:

the file system being written to was / and as far as I can tell it
wasn't full at the moment.  a little bit of cut-n-paste from gdb:

(kgdb) f 20
#20 0xc023d239 in ffs_fsync (ap=0xc840df88) at ../../ufs/ffs/ffs_vnops.c:210
(kgdb) p *ap
$3 = {a_desc = 0xc02f5920, a_vp = 0xc83f9e80, a_cred = 0xc073b900, 
  a_waitfor = 3, a_p = 0xc83ff800}
(kgdb) p a_vp
No symbol "a_vp" in current context.
(kgdb) p ap->a_vp
$4 = (struct vnode *) 0xc83f9e80
(kgdb) p *$
$5 = {v_flag = 2105344, v_usecount = 1736, v_writecount = 0, v_holdcnt = 110, 
  v_id = 60, v_mount = 0xc0b09400, v_op = 0xc0b2cc00, v_freelist = {
    tqe_next = 0xc83f9d40, tqe_prev = 0xc83f9fdc}, v_mntvnodes = {
    le_next = 0xc83f9f20, le_prev = 0xc83f9d64}, v_cleanblkhd = {
    tqh_first = 0xc34679c0, tqh_last = 0xc34457f8}, v_dirtyblkhd = {
    tqh_first = 0xc3468548, tqh_last = 0xc34a28c8}, v_synclist = {
    le_next = 0x0, le_prev = 0xc0b18664}, v_numoutput = 1, v_type = VBLK, 
  v_un = {vu_mountedhere = 0xc0b68b80, vu_socket = 0xc0b68b80, vu_spec = {
      vu_specinfo = 0xc0b68b80, vu_specnext = {sle_next = 0x0}}, 
    vu_fifoinfo = 0xc0b68b80}, v_lease = 0x0, v_lastw = 0, v_cstart = 0, 
  v_lasta = 0, v_clen = 0, v_object = 0xc03291c0, v_interlock = {
    lock_data = 0}, v_vnlock = 0x0, v_tag = VT_UFS, v_data = 0xc0b66f00, 
  v_cache_src = {lh_first = 0x0}, v_cache_dst = {tqh_first = 0xc0b2bac0, 
    tqh_last = 0xc0b2bad0}, v_dd = 0xc83f9e80, v_ddid = 0, v_pollinfo = {
    vpi_lock = {lock_data = 0}, vpi_selinfo = {si_pid = 0, si_flags = 0}, 
    vpi_events = 0, vpi_revents = 0}}
(kgdb) p $.v_mount
$6 = (struct mount *) 0xc0b09400
(kgdb) p *$
$7 = {mnt_list = {tqe_next = 0xc0b5ca00, tqe_prev = 0xc0302964}, 
  mnt_op = 0xc030cbe0, mnt_vfc = 0xc030cc20, mnt_vnodecovered = 0x0, 
  mnt_syncer = 0xc83f9840, mnt_vnodelist = {lh_first = 0xc8e47600}, 
  mnt_lock = {lk_interlock = {lock_data = 0}, lk_flags = 17825792, 
    lk_sharecount = 1, lk_waitcount = 0, lk_exclusivecount = 0, lk_prio = 20, 
    lk_wmesg = 0xc02c46c2 "vfslock", lk_timo = 0, lk_lockholder = -1}, 
  mnt_flag = 2117632, mnt_kern_flag = 0, mnt_maxsymlinklen = 60, mnt_stat = {
    f_spare2 = 0, f_bsize = 1024, f_iosize = 8192, f_blocks = 39647, 
    f_bfree = 12503, f_bavail = 9332, f_files = 10110, f_ffree = 9006, 
    f_fsid = {val = {938346654, 1041415783}}, f_owner = 0, f_type = 2, 
    f_flags = 2117632, f_syncwrites = 5, f_asyncwrites = 138, 
    f_fstypename = "ufs", '\000' <repeats 12 times>, 
    f_mntonname = "/", '\000' <repeats 88 times>, 
    f_mntfromname = "/dev/wd0s1a", '\000' <repeats 78 times>}, 
  mnt_data = 0xc0b5cc00, mnt_time = 943360985, mnt_iosize_max = 126976}
(kgdb)
Comment 2 Poul-Henning Kamp freebsd_committer freebsd_triage 1999-12-22 16:39:46 UTC
Responsible Changed
From-To: freebsd-bugs->mckusick

Kirks department. 
Comment 3 Kirk McKusick freebsd_committer freebsd_triage 2000-01-11 21:10:01 UTC
State Changed
From-To: open->closed

A fix was applied to sys/ufs/ffs/ffs_balloc.c. The checkin comment was: 

revision 1.25 
date: 2000/01/11 08:27:00;  author: mckusick;  state: Exp;  lines: +14 -4 
The only known cause of this panic is running out of disk space. 
The problem occurs when an indirect block and a data block are 
being allocated at the same time. For example when the 13th block 
of the file is written, the filesystem needs to allocate the first 
indirect block and a data block. If the indirect block allocation 
succeeds, but the data block allocation fails, the error code 
dellocates the indirect block as it has nothing at which to point. 
Unfortunately, it does not deallocate the indirect block's associated 
dependencies which then fail when they find the block unexpectedly 
gone (ptr == 0 instead of its expected value). The fix is to fsync 
the file before doing the block rollback, as the fsync will flush 
out all of the dependencies. Once the rollback is done the file 
must be fsync'ed again so that the soft updates code does not find 
unexpected changes. This approach is much slower than writing the 
code to back out the extraneous dependencies, but running out of 
disk space is not expected to be a common occurence, so just getting 
it right is the main criterion. 

PR:             kern/15063 
Submitted by:   Assar Westerlund <assar@stacken.kth.se>