Bug 93942 - [vfs] [patch] panic: ufs_dirbad: bad dir (patch from DragonFly)
Summary: [vfs] [patch] panic: ufs_dirbad: bad dir (patch from DragonFly)
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 6.1-PRERELEASE
Hardware: Any Any
Importance: Normal
Assignee: freebsd-fs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-02-28 15:40 UTC by Yarema
Modified: 2018-05-28 21:37 UTC
CC List: 5 users

See Also:


Attachments
file.diff (866 bytes, patch)
2006-02-28 15:40 UTC, Yarema
Ensure that pager is resized before UFS directory blocks are allocated. (1.77 KB, patch)
2016-07-19 10:08 UTC, Konstantin Belousov

Description Yarema 2006-02-28 15:40:06 UTC
This is at least the third file system which got hosed for me by the
ufs_dirbad bug on three different hard drives since 5.3 STABLE.
I suspect this is related to the following PRs:
http://www.FreeBSD.org/cgi/query-pr.cgi?pr=49079
http://www.FreeBSD.org/cgi/query-pr.cgi?pr=51001

In every case a process would lock up making the whole system
unresponsive.  A reboot, fsck -y in single user mode and another
reboot would produce the following during the mount of the corrupt
fs in rw mode:

bad dir ino 2 at  offset 16384: mangled entry
panic: ufs_dirbad: bad dir
cpuid = 0

Another reboot, fsck -y in single user mode and reboot produces the
same results repeatedly.  Previously I had recovered by mounting the
corrupt fs in ro mode, backup, newfs, restore.

Recently I noticed Matthew Dillon commit the following to the
DragonFly src repository:

http://leaf.DragonFlyBSD.org/mailarchive/commits/2006-02/msg00057.html

dillon      2006/02/21 10:46:56 PST

DragonFly src repository

  Modified files:
    sys/kern             vfs_cluster.c 
  Log:
  bioops.io_start() was being called in a situation where the buffer could
  be brelse()'d afterwards instead of I/O being initiated.  When this occurs,
  the buffer may contain softupdates-modified data which is never reverted,
  resulting in serious filesystem corruption.  When io_start is called on a
  buffer, I/O MUST be initiated and terminated with a biodone() or the buffer's
  data may not be properly reverted.
  
  Solve the problem by moving the io_start() call a little further on in the
  code, after the potential brelse().
  
  There is a possibility that this bug is responsible for the 'dirbad' panics
  often reported in DragonFly and FreeBSD circles.
  
  Revision  Changes    Path
  1.16      +7 -6      src/sys/kern/vfs_cluster.c

http://www.DragonFlyBSD.org/cvsweb/src/sys/kern/vfs_cluster.c.diff?r1=1.15&r2=1.16&f=u
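The ordering rule described in the DragonFly commit log can be sketched as a toy model in C. Everything here (struct toybuf, toy_io_start(), toy_biodone(), toy_brelse(), cluster_add_buggy(), cluster_add_fixed()) is hypothetical and only mimics the shape of the real buffer cache. io_start() applies a softupdates-style rollback, biodone() reverts it, and a brelse() with no I/O leaves the rollback in place.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Toy model (hypothetical, not kernel code) of the rule in the DragonFly
 * commit: once io_start() has rolled back softupdates-dependent data in a
 * buffer, I/O must be issued and finished with biodone(), which restores
 * the in-memory contents.
 */
struct toybuf {
	char data[8];     /* in-memory contents */
	char saved[8];    /* contents stashed by toy_io_start() */
	bool rolled_back; /* toy_io_start() ran, toy_biodone() has not */
	bool released;    /* buffer was handed back with toy_brelse() */
};

/* Roll back: replace the contents with an older "safe" image (zeros here). */
static void toy_io_start(struct toybuf *bp)
{
	memcpy(bp->saved, bp->data, sizeof bp->data);
	memset(bp->data, 0, sizeof bp->data);
	bp->rolled_back = true;
}

/* I/O completion: undo the rollback so memory holds the new data again. */
static void toy_biodone(struct toybuf *bp)
{
	if (bp->rolled_back) {
		memcpy(bp->data, bp->saved, sizeof bp->data);
		bp->rolled_back = false;
	}
}

/* Release without I/O: nothing reverts a pending rollback. */
static void toy_brelse(struct toybuf *bp)
{
	bp->released = true;
}

/* Old order: io_start() first, even though the buffer may be released. */
static void cluster_add_buggy(struct toybuf *bp, bool can_cluster)
{
	toy_io_start(bp);
	if (!can_cluster) {
		toy_brelse(bp);   /* rollback is never reverted */
		return;
	}
	toy_biodone(bp);      /* stands in for issuing and completing I/O */
}

/* Fixed order: io_start() only once I/O is certain to be issued. */
static void cluster_add_fixed(struct toybuf *bp, bool can_cluster)
{
	if (!can_cluster) {
		toy_brelse(bp);
		return;
	}
	toy_io_start(bp);
	toy_biodone(bp);
}
```

With can_cluster false, the old order releases the buffer with its rollback still applied. The fixed order releases an untouched buffer.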

The attached patch (file.diff) is the equivalent change against the FreeBSD
RELENG_6 branch of src/sys/kern/vfs_cluster.c.

Hope this helps track down the problem.

How-To-Repeat: 	mount <corrupt ufs>
Comment 1 Kris Kennaway 2006-02-28 19:53:43 UTC
On Tue, Feb 28, 2006 at 10:35:36AM -0500, Yarema wrote:
> Hope this helps track down the problem.

Does it work for you? :)

Kris
Comment 2 David Rhodus 2006-03-01 20:10:38 UTC
I have been working with the bad dir problem for several months and I
have not had corruption which fsck would not correct.


-DR
Comment 3 Kris Kennaway 2006-03-01 20:19:37 UTC
> I have been working with the bad dir problem for several months and I
> have not had corruption which fsck would not correct.

Me either, but that's surely small comfort to Yarema :-)

Kris
Comment 4 Dmitry Pryanishnikov 2006-03-02 08:37:34 UTC
Hello!

On Wed, 1 Mar 2006, Kris Kennaway wrote:
>> I have been working with the bad dir problem for several months and I
>> have not had corruption which fsck would not correct.
>
> Me either, but that's surely small comfort to Yarema :-)

  I think it would be great if the originator of this PR tried to mount the
damaged fs ro, found the broken directory (I think ino 2 is always the root
directory, isn't it?), dumped it and analyzed its contents in order to find
out what the corruption looks like. Then we could at least recreate the result
of the corruption on a test filesystem (by binary editing the media) and
teach fsck how to cure such corruptions.

Sincerely, Dmitry
-- 
Atlantis ISP, System Administrator
e-mail:  dmitry@atlantis.dp.ua
nic-hdl: LYNX-RIPE
Comment 5 Yarema 2006-03-03 03:04:32 UTC
Update:

Attempting to mount my corrupt /home slice produces the following:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address	= 0xdab2d004
fault code		= supervisor read, page not present
instruction pointer	= 0x20:0xc06ff7d7
stack pointer		= 0x28:0xe9c56514
frame pointer		= 0x28:0xe9c56570
code segment		= base rx0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, def32 1, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 73 (mount)
trap number		= 12
panic: page fault
cpuid = 0
Uptime 23s
Dumping 1023 MB (2 chunks)
  chunk 0: 1MB (159 pages) ... ok
  chunk 1: 1023MB (261748 pages) ... ok

However, I am able to mount this same corrupt /home partition with the
6.1-BETA2 kernel without error.  After tweaking, building and testing my
custom KERNCONF, the problem seems to be with:

options	UFS_EXTATTR
options	UFS_EXTATTR_AUTOSTART

which according to src/sys/ufs/ufs/README.* should only affect UFS1.  I
only use UFS2, so technically I do not need these options.

To sum up: my computer became unresponsive.  Reboot and fsck produced a 
"panic: ufs_dirbad: bad dir".  Many fsck -f runs later 'mount -r /home' 
started causing the "Fatal trap 12: page fault while in kernel mode" panic. 
Booting a kernel without options UFS_EXTATTR & UFS_EXTATTR_AUTOSTART does 
not cause this mount proc kernel panic.  I have a vmcore dump if anyone 
cares to look at it.

Question: Why does fsck mark a UFS2 file system clean, while a kernel with
options UFS_EXTATTR & UFS_EXTATTR_AUTOSTART still panics when mounting that
one UFS2 slice but none of the other UFS2 slices?

Regarding the src/sys/kern/vfs_cluster.c patch.  All of the testing 
described above was performed with the patch applied.  But the initial 
corruption occurred before the patch.  It would be nice if someone who 
understands that code looked at it, blessed it and got it committed.  We 
will only find out if the patch indeed fixes the ufs_dirbad problem if all 
those who've been bitten by this bug no longer run into this sort of 
corruption over time.

-- 
Yarema
Comment 6 Robert Watson freebsd_committer freebsd_triage 2006-03-03 03:51:45 UTC
On Thu, 2 Mar 2006, Yarema wrote:

> options	UFS_EXTATTR
> options	UFS_EXTATTR_AUTOSTART

If you disable just UFS_EXTATTR_AUTOSTART, does the panic go away?  The 
autostart routine relies on reading directory data (or at least, performing 
lookups) during the mount process.  While it shouldn't be running on UFS2, it 
could be that it is, and if something has changed in the mount process so that 
reading directories that early is no longer functional, it could be that this 
causes an incorrect reporting of on-disk corruption (i.e., it could be a data 
structure initialization problem or the like).

Robert N M Watson
Comment 7 Yarema 2006-03-03 04:39:45 UTC
--On March 3, 2006 3:51:45 AM +0000 Robert Watson <rwatson@FreeBSD.org> 
wrote:

> If you disable just UFS_EXTATTR_AUTOSTART, does the panic go away?

Damn, I just reformatted the corrupt partition so I can no longer try this. 
But, as per Dmitry's suggestion, before newfs wiped it all I did:

mount -r /home
cat /home > /tmp/ar0s1e.ino2
umount /home
mount -r /dev/twed0s1e /home
cat /home > /tmp/twed0s1e.ino2
umount /home

Here ar0s1e is the corrupt slice; the dir is 17408 bytes in size, and
there seems to be a huge chunk of mostly null data following the entry for
lost+found.  twed0s1e is where I backed up the /home fs, and the dir there is a
more reasonable 1024 bytes in size.  Both files can be found at
<http://yds.CoolRat.org/freebsd/> along with an archive of my KERNCONF 
files containing a short README explaining how I manage my KERNCONFs.
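A dump like /tmp/ar0s1e.ino2 can be checked offline. The sketch below assumes the UFS2 struct direct layout from <ufs/ufs/dir.h> (an 8-byte header before d_name, 512-byte DIRBLKSIZ) and a dump taken on a host with the same byte order. The checks are a simplified approximation of the kernel's lookup-time "mangled entry" test, not a copy of it, and find_mangled() is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Mirrors struct direct from <ufs/ufs/dir.h> (UFS2): u_int32_t d_ino,
 * u_int16_t d_reclen, u_int8_t d_type, u_int8_t d_namlen, then the name.
 * Assumes the dump was taken on a host with the same byte order.
 */
#define TOY_DIRBLKSIZ  512   /* DIRBLKSIZ is DEV_BSIZE on UFS */
#define TOY_DIRENT_HDR 8     /* bytes before d_name */

/* Smallest legal record for a name of namlen bytes: header, name, NUL, 4-byte aligned. */
static size_t toy_dirsiz(unsigned namlen)
{
	return (TOY_DIRENT_HDR + namlen + 1 + 3) & ~(size_t)3;
}

/*
 * Hypothetical helper: return the byte offset of the first entry that
 * looks mangled, or -1 if the whole image parses cleanly.  A simplified
 * approximation of the kernel's checks, not a copy of them.
 */
static long find_mangled(const unsigned char *buf, size_t len)
{
	size_t off = 0;

	while (off + TOY_DIRENT_HDR <= len) {
		uint32_t ino;
		uint16_t reclen;
		unsigned namlen = buf[off + 7];
		size_t left_in_blk = TOY_DIRBLKSIZ - (off % TOY_DIRBLKSIZ);

		memcpy(&ino, buf + off, sizeof ino);
		memcpy(&reclen, buf + off + 4, sizeof reclen);

		/* Zeroed or stale data usually fails here (e.g. reclen == 0). */
		if (reclen == 0 || (reclen & 3) != 0 || reclen > left_in_blk ||
		    reclen < toy_dirsiz(namlen) || off + reclen > len)
			return (long)off;

		/* A live entry's name must have no embedded NUL and end in one. */
		if (ino != 0) {
			const unsigned char *name = buf + off + TOY_DIRENT_HDR;
			if (memchr(name, '\0', namlen) != NULL || name[namlen] != '\0')
				return (long)off;
		}
		off += reclen;
	}
	return -1;
}
```

Once the dump has been read into a buffer, a run of null data like the one after lost+found would likely be reported at its first entry. A zero d_reclen can never be valid.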

-- 
Yarema
Comment 8 Jörg Pernfuß 2006-03-07 09:51:22 UTC
On Thu, 02 Mar 2006 23:39:45 -0500
Yarema <yds@CoolRat.org> wrote:

> Damn, I just reformatted the corrupt partition so I can no longer try
> this.


But I can: I kept my bad partition all this time (it was just
/usr/ports/distfiles, lucky me).

With normal, stock RELENG_6 sources from yesterday:

Kernel with UFS_EXTATTR and UFS_EXTATTR_AUTOSTART -> panic on mount
Kernel with UFS_EXTATTR without UFS_EXTATTR_AUTOSTART -> works

I recently upgraded my RAM, so I have too little swap to dump, but if
a dump needs to be generated, I'll remove a few of the memory modules and
we'll have one.

	Joerg

Comment 9 Matthew Dillon 2006-05-04 20:33:28 UTC
    I've found three additional issues which might be related to ufs_dirbad
    panics.  Again, unfortunately, no smoking gun.

    First, if B_NOCACHE gets set on a B_DIRTY buffer, the buffer can be
    lost without the data being written under certain conditions due
    to brelse() mechanics.  B_NOCACHE is typically set by softupdates 
    related code but can be set by other things as well (in particular,
    if a buffer is resized, and certain write/read combinations).  One
    might think that calling bwrite() after setting B_NOCACHE would be
    safe, but that is not necessarily true.  If a buffer is redirtied
    (B_DIRTY set) during the write, something which softupdates does all
    the time, B_NOCACHE almost certainly has to be cleared.  Of the three
    issues I found, this is the most likely cause.
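The first scenario can be modeled in a few lines. The flag names (TOY_B_DIRTY, TOY_B_NOCACHE) and the simplified release logic are assumptions for illustration only, not the DragonFly or FreeBSD definitions. What the model shows is the sequence: B_NOCACHE is set, the write starts, the buffer is redirtied during the write, and then it is released.

```c
#include <stdbool.h>

/* Hypothetical flag values for a toy model; not the real definitions. */
#define TOY_B_DIRTY   0x1u   /* holds data that must reach the disk */
#define TOY_B_NOCACHE 0x2u   /* discard the buffer when it is released */

struct toybuf {
	unsigned flags;
	bool data_lost;   /* dirty data dropped without being written */
};

/* Redirty, optionally applying the safeguard from Dillon's patch. */
static void toy_bdirty(struct toybuf *bp, bool clear_nocache)
{
	if (clear_nocache)
		bp->flags &= ~TOY_B_NOCACHE;
	bp->flags |= TOY_B_DIRTY;
}

/* Simplified release: a B_NOCACHE buffer is thrown away, dirty or not. */
static void toy_brelse(struct toybuf *bp)
{
	if (bp->flags & TOY_B_NOCACHE) {
		if (bp->flags & TOY_B_DIRTY)
			bp->data_lost = true;
		bp->flags = 0;
	}
}

/* B_NOCACHE set, write starts, buffer redirtied mid-write, then released. */
static bool toy_write_redirty_release(bool clear_nocache)
{
	struct toybuf b = { .flags = TOY_B_DIRTY, .data_lost = false };

	b.flags |= TOY_B_NOCACHE;       /* e.g. set by softupdates or a resize */
	b.flags &= ~TOY_B_DIRTY;        /* bwrite() begins: buffer is clean */
	toy_bdirty(&b, clear_nocache);  /* softupdates redirties it */
	toy_brelse(&b);                 /* write completes, buffer released */
	return b.data_lost;
}
```

Without the safeguard, the redirtied data is dropped on release. With B_NOCACHE cleared in bdirty(), the buffer stays cached and dirty, so it can be written later.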

    Second, vnode_pager_setsize() is being called too late in 
    ufs/ufs/ufs_lookup.c (line 733 in FreeBSD-current).  It is
    being called after the buffer has been instantiated.  This could
    create problems with the VMIO backing store for the buffer created
    by the UFS_BALLOC call.

    Third, vnode_pager_setsize() is being called too late in
    ufs/ufs/ufs_vnops.c (line 1557 in FreeBSD-current).  It is 
    being called after the buffer has been instantiated by UFS_BALLOC()
    in ufs_mkdir(), which could create problems with the buffer's VMIO
    backing store.
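The ordering problem in the second and third items can be sketched as follows. toy_balloc() and toy_pager_setsize() are hypothetical stand-ins for UFS_BALLOC() and vnode_pager_setsize(). The rule that a buffer only gets proper backing when it lies inside the object's current size is a simplifying assumption, not a description of the real VMIO code.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_DIRBLKSIZ 512

/* Hypothetical stand-ins for a directory vnode's VM object and a buffer. */
struct toyobj {
	uint64_t size;          /* what vnode_pager_setsize() would set */
};
struct toybuf {
	uint64_t off;
	bool backed;            /* pages lie inside the object at allocation time */
};

static void toy_pager_setsize(struct toyobj *obj, uint64_t newsize)
{
	obj->size = newsize;
}

/*
 * Stand-in for UFS_BALLOC().  Simplifying assumption: the buffer only gets
 * proper VMIO backing if its range is inside the object's current size.
 */
static struct toybuf toy_balloc(const struct toyobj *obj, uint64_t off, uint64_t len)
{
	struct toybuf bp = { .off = off, .backed = off + len <= obj->size };
	return bp;
}

/* Old order: allocate the new directory block, then grow the object. */
static struct toybuf append_dirblk_old(struct toyobj *obj, uint64_t i_offset)
{
	struct toybuf bp = toy_balloc(obj, i_offset, TOY_DIRBLKSIZ);
	toy_pager_setsize(obj, i_offset + TOY_DIRBLKSIZ);
	return bp;
}

/* New order (DragonFly patch, FreeBSD r303090): grow first, then allocate. */
static struct toybuf append_dirblk_new(struct toyobj *obj, uint64_t i_offset)
{
	toy_pager_setsize(obj, i_offset + TOY_DIRBLKSIZ);
	return toy_balloc(obj, i_offset, TOY_DIRBLKSIZ);  /* error path omitted */
}
```

Later in this PR, Konstantin Belousov noted that the old order is unlikely to cause trouble on FreeBSD while the exclusive vnode lock is held across the allocation and the resize. The reordering in r303090 was made for formally correct KPI use.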

    --

    The M.O. of this corruption, after examining over a dozen kernel cores,
    makes me now believe that the corruption is occurring when the kernel
    attempts to append a full block to a directory.  The bitmaps are all
    good... it is as though the directory block never got written and
    the data we are seeing is data that existed in that block before the
    directory allocated it.  But, likewise, the issue has occurred with
    different disk drivers so I think we can rule out a disk driver failure.
    The issue also seems to occur most often with large, 'busy' buffers
    (lots of directory operations going on).  Since no similar corruption
    has ever been reported for heavily used files, this supports the idea
    that it is *not* the disk driver.

    I believe that the data is getting written to the filesystem buffer
    representing the new block, but the buffer or its backing store
    is somehow getting thrown away without being written, or getting thrown
    away and then reinstantiated without being read.   The areas I 
    indicate in the above list are areas where data can potentially get
    thrown away or lost prior to a write.

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>


(Patch against DragonFly, will not apply to FreeBSD directly, included for
reference only):

Index: kern/vfs_bio.c
===================================================================
RCS file: /cvs/src/sys/kern/vfs_bio.c,v
retrieving revision 1.53.2.1
diff -u -r1.53.2.1 vfs_bio.c
--- kern/vfs_bio.c	18 Apr 2006 17:12:25 -0000	1.53.2.1
+++ kern/vfs_bio.c	24 Apr 2006 19:22:04 -0000
@@ -972,6 +972,13 @@
 bdirty(struct buf *bp)
 {
 	KASSERT(bp->b_qindex == BQUEUE_NONE, ("bdirty: buffer %p still on queue %d", bp, bp->b_qindex));
+	if (bp->b_flags & B_NOCACHE) {
+		printf("bdirty: clearing B_NOCACHE on buf %p\n", bp);
+		bp->b_flags &= ~B_NOCACHE;
+	}
+	if (bp->b_flags & B_INVAL) {
+		printf("bdirty: warning, dirtying invalid buffer %p\n", bp);
+	}
 	bp->b_flags &= ~(B_READ|B_RELBUF);
 
 	if ((bp->b_flags & B_DELWRI) == 0) {
@@ -1096,6 +1103,11 @@
 
 	crit_enter();
 
+	if ((bp->b_flags & (B_NOCACHE|B_DIRTY)) == (B_NOCACHE|B_DIRTY)) {
+		printf("warning: buf %p marked dirty & B_NOCACHE, clearing B_NOCACHE\n", bp);
+		bp->b_flags &= ~B_NOCACHE;
+	}
+
 	if (bp->b_flags & B_LOCKED)
 		bp->b_flags &= ~B_ERROR;
 
Index: vfs/ufs/ufs_lookup.c
===================================================================
RCS file: /cvs/src/sys/vfs/ufs/ufs_lookup.c,v
retrieving revision 1.18
diff -u -r1.18 ufs_lookup.c
--- vfs/ufs/ufs_lookup.c	14 Sep 2005 01:13:48 -0000	1.18
+++ vfs/ufs/ufs_lookup.c	24 Apr 2006 19:22:23 -0000
@@ -716,6 +716,7 @@
 		 */
 		if (dp->i_offset & (DIRBLKSIZ - 1))
 			panic("ufs_direnter: newblk");
+		vnode_pager_setsize(dvp, dp->i_offset + DIRBLKSIZ);
 		flags = B_CLRBUF;
 		if (!DOINGSOFTDEP(dvp) && !DOINGASYNC(dvp))
 			flags |= B_SYNC;
@@ -727,7 +728,6 @@
 		}
 		dp->i_size = dp->i_offset + DIRBLKSIZ;
 		dp->i_flag |= IN_CHANGE | IN_UPDATE;
-		vnode_pager_setsize(dvp, (u_long)dp->i_size);
 		dirp->d_reclen = DIRBLKSIZ;
 		blkoff = dp->i_offset &
 		    (VFSTOUFS(dvp->v_mount)->um_mountp->mnt_stat.f_iosize - 1);
Index: vfs/ufs/ufs_vnops.c
===================================================================
RCS file: /cvs/src/sys/vfs/ufs/ufs_vnops.c,v
retrieving revision 1.32
diff -u -r1.32 ufs_vnops.c
--- vfs/ufs/ufs_vnops.c	17 Sep 2005 07:43:12 -0000	1.32
+++ vfs/ufs/ufs_vnops.c	24 Apr 2006 19:22:42 -0000
@@ -1420,12 +1420,12 @@
 	dirtemplate = *dtp;
 	dirtemplate.dot_ino = ip->i_number;
 	dirtemplate.dotdot_ino = dp->i_number;
+	vnode_pager_setsize(tvp, DIRBLKSIZ);
 	if ((error = VOP_BALLOC(tvp, (off_t)0, DIRBLKSIZ, cnp->cn_cred,
 	    B_CLRBUF, &bp)) != 0)
 		goto bad;
 	ip->i_size = DIRBLKSIZ;
 	ip->i_flag |= IN_CHANGE | IN_UPDATE;
-	vnode_pager_setsize(tvp, (u_long)ip->i_size);
 	bcopy((caddr_t)&dirtemplate, (caddr_t)bp->b_data, sizeof dirtemplate);
 	if (DOINGSOFTDEP(tvp)) {
 		/*
Comment 10 Gavin Atkinson freebsd_committer freebsd_triage 2008-06-20 21:01:34 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-fs

Over to freebsd-fs, where hopefully somebody can assess this PR
Comment 11 rdarbha 2016-07-18 16:58:02 UTC
Does the DragonFly BSD patch work? I'm hitting a similar issue and would like to consider porting it to FreeBSD.
Comment 12 Konstantin Belousov freebsd_committer freebsd_triage 2016-07-19 10:06:34 UTC
(In reply to rdarbha from comment #11)
Do you understand that your question contains an intrinsic contradiction?

Anyway, I looked at Matt's patches.  The vfs_cluster changes seem to be irrelevant: we start I/O (and perform SU-related rollbacks) in ffs_geom_strategy(), which is executed after the cluster is fully constructed and validated.  Similarly, we assert that there are no dangling dependencies when a B_NOCACHE buffer is thrown away in brelse().  So I think that these bits are not (directly) relevant to us.

The interesting part is the vnode vm_object size handling for directories.  This is the right thing to do, but I doubt that we would have issues with the present order as long as the vnode is not unlocked between buffer allocation and pager resizing.  Still, it is better to do it right; a patch is attached.

If you have dirbad panics, I would first check your hardware and verify the integrity of other files on the same volume.  If you have a canonical copy of the data, say the system distribution disk which was used to install, compare the checksums of regular files.
Comment 13 Konstantin Belousov freebsd_committer freebsd_triage 2016-07-19 10:08:19 UTC
Created attachment 172710 [details]
Ensure that pager is resized before UFS directory blocks are allocated.
Comment 14 Kirk McKusick freebsd_committer freebsd_triage 2016-07-19 11:01:32 UTC
I concur with the analysis and proposed changes by Konstantin Belousov <kib@freebsd.org>.
Comment 15 rdarbha 2016-07-19 16:17:19 UTC
I meant working in DragonFly, as I am hitting it in FreeBSD and would like to port it. :) Anyway, thanks for the quick response and the patch!

~Ravi
Comment 16 commit-hook freebsd_committer freebsd_triage 2016-07-20 14:41:07 UTC
A commit references this bug:

Author: kib
Date: Wed Jul 20 14:40:57 UTC 2016
New revision: 303090
URL: https://svnweb.freebsd.org/changeset/base/303090

Log:
  Ensure that the UFS directory vnode's vm_object is properly sized
  before UFS_BALLOC() is called.  I do not believe that this caused any
  real issue on FreeBSD because the exclusive vnode lock is held over
  the balloc/resize, the change is to make formally correct KPI use.

  Based on:	Matthew Dillon's patch from DragonFly BSD
  PR:	93942
  Reviewed by:	mckusick
  Tested by:	pho
  Sponsored by:	The FreeBSD Foundation
  MFC after:	1 week

Changes:
  head/sys/ufs/ufs/ufs_lookup.c
  head/sys/ufs/ufs/ufs_vnops.c
Comment 17 Eitan Adler freebsd_committer freebsd_triage 2018-05-28 19:45:59 UTC
batch change:

For bugs that match the following
-  Status Is In progress 
AND
- Untouched since 2018-01-01.
AND
- Affects Base System OR Documentation

DO:

Reset to open status.


Note:
I did a quick pass but if you are getting this email it might be worthwhile to double check to see if this bug ought to be closed.
Comment 18 Kirk McKusick freebsd_committer freebsd_triage 2018-05-28 21:37:08 UTC
This report should have been marked as closed once the change referenced in comment #16 was committed.