This is at least the third file system the ufs_dirbad bug has hosed for me, on three different hard drives, since 5.3-STABLE. I suspect this is related to the following PRs:

http://www.FreeBSD.org/cgi/query-pr.cgi?pr=49079
http://www.FreeBSD.org/cgi/query-pr.cgi?pr=51001

In every case a process would lock up, making the whole system unresponsive. A reboot, fsck -y in single-user mode, and another reboot would produce the following while mounting the corrupt fs read-write:

bad dir ino 2 at offset 16384: mangled entry
panic: ufs_dirbad: bad dir
cpuid = 0

Another reboot, fsck -y in single-user mode, and another reboot produce the same results repeatedly. Previously I had recovered by mounting the corrupt fs read-only, then backup, newfs, restore.

Recently I noticed Matthew Dillon commit the following to the DragonFly src repository:

http://leaf.DragonFlyBSD.org/mailarchive/commits/2006-02/msg00057.html

dillon      2006/02/21 10:46:56 PST

  DragonFly src repository

  Modified files:
    sys/kern             vfs_cluster.c
  Log:
  bioops.io_start() was being called in a situation where the buffer could
  be brelse()'d afterwords instead of I/O being initiated.  When this
  occurs, the buffer may contain softupdates-modified data which is never
  reverted, resulting in serious filesystem corruption.  When io_start is
  called on a buffer, I/O MUST be initiated and terminated with a biodone()
  or the buffer's data may not be properly reverted.

  Solve the problem by moving the io_start() call a little further on in
  the code, after the potential brelse().

  There is a possibility that this bug is responsible for the 'dirbad'
  panics often reported in DragonFly and FreeBSD circles.

  Revision  Changes    Path
  1.16      +7 -6      src/sys/kern/vfs_cluster.c

http://www.DragonFlyBSD.org/cvsweb/src/sys/kern/vfs_cluster.c.diff?r1=1.15&r2=1.16&f=u

Below is the equivalent patch against the FreeBSD RELENG_6 branch of src/sys/kern/vfs_cluster.c. Hope this helps track down the problem.

How-To-Repeat: mount <corrupt ufs>
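For anyone who wants to inspect a dumped directory block by hand, the "mangled entry" text comes from sanity checks UFS applies while walking a directory block. Below is a rough userland sketch of those checks, not kernel code: the field layout assumes the modern UFS directory entry format in host byte order, and `scan_dirblock` is a made-up name, not a kernel function.

```c
#include <stdint.h>
#include <string.h>

#define DIRBLKSIZ 512   /* UFS directory block size */
#define MAXNAMLEN 255

/* On-disk header of a UFS directory entry, host byte order (assumed). */
struct direct_hdr {
    uint32_t d_ino;
    uint16_t d_reclen;
    uint8_t  d_type;
    uint8_t  d_namlen;
};

/* Return the offset of the first mangled entry in one directory block,
 * or -1 if the block walks cleanly.  These are roughly the sanity
 * checks behind the "mangled entry" message. */
static int
scan_dirblock(const unsigned char *blk)
{
    int off = 0;

    while (off < DIRBLKSIZ) {
        struct direct_hdr h;

        if (off + (int)sizeof(h) > DIRBLKSIZ)
            return off;                     /* truncated header */
        memcpy(&h, blk + off, sizeof(h));
        /* minimum record size: header + name + NUL, rounded up to 4 */
        int minsize = ((int)sizeof(h) + h.d_namlen + 1 + 3) & ~3;
        if (h.d_reclen % 4 != 0 || h.d_reclen < minsize ||
            off + h.d_reclen > DIRBLKSIZ || h.d_namlen > MAXNAMLEN)
            return off;                     /* mangled entry */
        off += h.d_reclen;
    }
    return -1;
}
```

A block of zeroes (like the null data reported later in this thread) fails immediately at offset 0, since a zero d_reclen can never cover even the entry header.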
On Tue, Feb 28, 2006 at 10:35:36AM -0500, Yarema wrote:

> Hope this helps track down the problem.

Does it work for you? :)

Kris
I have been working with the bad dir problem for several months and I have not had corruption which fsck would not correct. -DR
> I have been working with the bad dir problem for several months and I
> have not had corruption which fsck would not correct.

Me either, but that's surely small comfort to Yarema :-)

Kris
Hello!

On Wed, 1 Mar 2006, Kris Kennaway wrote:

>> I have been working with the bad dir problem for several months and I
>> have not had corruption which fsck would not correct.
>
> Me either, but that's surely small comfort to Yarema :-)

I think it would be great if the originator of this PR tried to mount the damaged fs read-only, found the broken directory (ino 2 is always the root directory, isn't it?), and dumped and analyzed its contents to find out what the corruption looks like. Then we could at least recreate the result of the corruption on a test filesystem (by binary-editing the media) and teach fsck how to cure such corruption.

Sincerely,
Dmitry
--
Atlantis ISP, System Administrator
e-mail: dmitry@atlantis.dp.ua
nic-hdl: LYNX-RIPE
Update: Attempting to mount my corrupt /home slice produces the following:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0xdab2d004
fault code              = supervisor read, page not present
instruction pointer     = 0x20:0xc06ff7d7
stack pointer           = 0x28:0xe9c56514
frame pointer           = 0x28:0xe9c56570
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 73 (mount)
trap number             = 12
panic: page fault
cpuid = 0
Uptime 23s
Dumping 1023 MB (2 chunks)
  chunk 0: 1MB (159 pages) ... ok
  chunk 1: 1023MB (261748 pages) ... ok

However, I am able to mount this same corrupt /home partition with the 6.1-BETA2 kernel without error. After tweaking, building, and testing my custom KERNCONF, the problem seems to be with:

options UFS_EXTATTR
options UFS_EXTATTR_AUTOSTART

which, according to src/sys/ufs/ufs/README.*, should only affect UFS1. I only use UFS2, so technically I do not need these options.

To sum up: my computer became unresponsive. Reboot and fsck produced a "panic: ufs_dirbad: bad dir". Many fsck -f runs later, 'mount -r /home' started causing the "Fatal trap 12: page fault while in kernel mode" panic. Booting a kernel without options UFS_EXTATTR & UFS_EXTATTR_AUTOSTART does not cause this mount-time kernel panic. I have a vmcore dump if anyone cares to look at it.

Question: why does fsck mark the UFS2 filesystem clean, yet a kernel with options UFS_EXTATTR & UFS_EXTATTR_AUTOSTART still panics on that one UFS2 slice but on none of the other UFS2 slices?

Regarding the src/sys/kern/vfs_cluster.c patch: all of the testing described above was performed with the patch applied, but the initial corruption occurred before the patch. It would be nice if someone who understands that code looked at it, blessed it, and got it committed.
We will only find out whether the patch indeed fixes the ufs_dirbad problem if all those who've been bitten by this bug no longer run into this sort of corruption over time.

--
Yarema
On Thu, 2 Mar 2006, Yarema wrote:

> options UFS_EXTATTR
> options UFS_EXTATTR_AUTOSTART

If you disable just UFS_EXTATTR_AUTOSTART, does the panic go away? The autostart routine relies on reading directory data (or at least, performing lookups) during the mount process. While it shouldn't be running on UFS2, it could be that it is, and if something has changed in the mount process so that reading directories that early is no longer functional, this could cause an incorrect report of on-disk corruption (i.e., it could be a data-structure initialization problem or the like).

Robert N M Watson
--On March 3, 2006 3:51:45 AM +0000 Robert Watson <rwatson@FreeBSD.org> wrote:

>> options UFS_EXTATTR
>> options UFS_EXTATTR_AUTOSTART
>
> If you disable just UFS_EXTATTR_AUTOSTART, does the panic go away?
> [...]

Damn, I just reformatted the corrupt partition so I can no longer try this. But as per Dmitry's suggestion, before newfs wiped it all I did:

mount -r /home
cat /home > /tmp/ar0s1e.ino2
umount /home
mount -r /dev/twed0s1e /home
cat /home > /tmp/twed0s1e.ino2
umount /home

where ar0s1e is the corrupt slice; its root dir is 17408 bytes in size, and there seems to be a huge chunk of mostly null data following the entry for lost+found. twed0s1e is where I backed up the /home fs; its root dir is a more reasonable 1024 bytes in size.

Both files can be found at <http://yds.CoolRat.org/freebsd/> along with an archive of my KERNCONF files containing a short README explaining how I manage my KERNCONFs.

--
Yarema
On Thu, 02 Mar 2006 23:39:45 -0500 Yarema <yds@CoolRat.org> wrote:

> > If you disable just UFS_EXTATTR_AUTOSTART, does the panic go away?
> > [...]
>
> Damn, I just reformatted the corrupt partition so I can no longer try
> this.

But I do; I kept my bad partition all this time (it was just /usr/ports/distfiles, lucky me). With normal, stock RELENG_6 sources from yesterday:

Kernel with UFS_EXTATTR and UFS_EXTATTR_AUTOSTART    -> panic on mount
Kernel with UFS_EXTATTR without UFS_EXTATTR_AUTOSTART -> works

I recently upgraded my RAM, so I now have too little swap to dump, but if a dump needs to be generated I'll remove a few of the modules and we'll have one.

Joerg
I've found three additional issues which might be related to ufs_dirbad panics. Again, unfortunately, no smoking gun.

First, if B_NOCACHE gets set on a B_DIRTY buffer, the buffer can be lost without the data being written under certain conditions, due to brelse() mechanics. B_NOCACHE is typically set by softupdates-related code but can be set by other things as well (in particular, if a buffer is resized, and by certain write/read combinations). One might think that calling bwrite() after setting B_NOCACHE would be safe, but that is not necessarily true. If a buffer is redirtied (B_DIRTY set) during the write, something which softupdates does all the time, B_NOCACHE almost certainly has to be cleared. Of the three issues I found, this is the most likely cause.

Second, vnode_pager_setsize() is being called too late in ufs/ufs/ufs_lookup.c (line 733 in FreeBSD-current). It is being called after the buffer has been instantiated. This could create problems with the VMIO backing store for the buffer created by the UFS_BALLOC call.

Third, vnode_pager_setsize() is being called too late in ufs/ufs/ufs_vnops.c (line 1557 in FreeBSD-current). It is being called after the buffer has been instantiated by UFS_BALLOC() in ufs_mkdir(), which could create problems with the buffer's VMIO backing store.

--

The M.O. of this corruption, after examining over a dozen kernel cores, makes me now believe that the corruption occurs when the kernel attempts to append a full block to a directory. The bitmaps are all good... it is as if the directory block never got written, and the data we are seeing is the data that existed in that block before the directory allocated it. But, likewise, the issue has occurred with different disk drivers, so I think we can rule out a disk driver failure. The issue also seems to occur most often with large, 'busy' buffers (lots of directory operations going on).
Since no similar corruption has ever been reported for heavily used files, this supports the idea that it is *not* the disk driver. I believe that the data is getting written to the filesystem buffer representing the new block, but the buffer or its backing store is somehow getting thrown away without being written, or getting thrown away and then reinstantiated without being read. The areas I indicate in the list above are areas where data can potentially get thrown away or lost prior to a write.

-Matt
Matthew Dillon <dillon@backplane.com>

(Patch against DragonFly; it will not apply to FreeBSD directly and is included for reference only):

Index: kern/vfs_bio.c
===================================================================
RCS file: /cvs/src/sys/kern/vfs_bio.c,v
retrieving revision 1.53.2.1
diff -u -r1.53.2.1 vfs_bio.c
--- kern/vfs_bio.c	18 Apr 2006 17:12:25 -0000	1.53.2.1
+++ kern/vfs_bio.c	24 Apr 2006 19:22:04 -0000
@@ -972,6 +972,13 @@
 bdirty(struct buf *bp)
 {
 	KASSERT(bp->b_qindex == BQUEUE_NONE,
 		("bdirty: buffer %p still on queue %d", bp, bp->b_qindex));
+	if (bp->b_flags & B_NOCACHE) {
+		printf("bdirty: clearing B_NOCACHE on buf %p\n", bp);
+		bp->b_flags &= ~B_NOCACHE;
+	}
+	if (bp->b_flags & B_INVAL) {
+		printf("bdirty: warning, dirtying invalid buffer %p\n", bp);
+	}
 	bp->b_flags &= ~(B_READ|B_RELBUF);
 	if ((bp->b_flags & B_DELWRI) == 0) {
@@ -1096,6 +1103,11 @@
 
 	crit_enter();
 
+	if ((bp->b_flags & (B_NOCACHE|B_DIRTY)) == (B_NOCACHE|B_DIRTY)) {
+		printf("warning: buf %p marked dirty & B_NOCACHE, clearing B_NOCACHE\n", bp);
+		bp->b_flags &= ~B_NOCACHE;
+	}
+
 	if (bp->b_flags & B_LOCKED)
 		bp->b_flags &= ~B_ERROR;

Index: vfs/ufs/ufs_lookup.c
===================================================================
RCS file: /cvs/src/sys/vfs/ufs/ufs_lookup.c,v
retrieving revision 1.18
diff -u -r1.18 ufs_lookup.c
--- vfs/ufs/ufs_lookup.c	14 Sep 2005 01:13:48 -0000	1.18
+++ vfs/ufs/ufs_lookup.c	24 Apr 2006 19:22:23 -0000
@@ -716,6 +716,7 @@
 	 */
 	if (dp->i_offset & (DIRBLKSIZ - 1))
 		panic("ufs_direnter: newblk");
+	vnode_pager_setsize(dvp, dp->i_offset + DIRBLKSIZ);
 	flags = B_CLRBUF;
 	if (!DOINGSOFTDEP(dvp) && !DOINGASYNC(dvp))
 		flags |= B_SYNC;
@@ -727,7 +728,6 @@
 	}
 	dp->i_size = dp->i_offset + DIRBLKSIZ;
 	dp->i_flag |= IN_CHANGE | IN_UPDATE;
-	vnode_pager_setsize(dvp, (u_long)dp->i_size);
 	dirp->d_reclen = DIRBLKSIZ;
 	blkoff = dp->i_offset &
 	    (VFSTOUFS(dvp->v_mount)->um_mountp->mnt_stat.f_iosize - 1);

Index: vfs/ufs/ufs_vnops.c
===================================================================
RCS file: /cvs/src/sys/vfs/ufs/ufs_vnops.c,v
retrieving revision 1.32
diff -u -r1.32 ufs_vnops.c
--- vfs/ufs/ufs_vnops.c	17 Sep 2005 07:43:12 -0000	1.32
+++ vfs/ufs/ufs_vnops.c	24 Apr 2006 19:22:42 -0000
@@ -1420,12 +1420,12 @@
 	dirtemplate = *dtp;
 	dirtemplate.dot_ino = ip->i_number;
 	dirtemplate.dotdot_ino = dp->i_number;
+	vnode_pager_setsize(tvp, DIRBLKSIZ);
 	if ((error = VOP_BALLOC(tvp, (off_t)0, DIRBLKSIZ, cnp->cn_cred,
 	    B_CLRBUF, &bp)) != 0)
 		goto bad;
 	ip->i_size = DIRBLKSIZ;
 	ip->i_flag |= IN_CHANGE | IN_UPDATE;
-	vnode_pager_setsize(tvp, (u_long)ip->i_size);
 	bcopy((caddr_t)&dirtemplate, (caddr_t)bp->b_data, sizeof dirtemplate);
 	if (DOINGSOFTDEP(tvp)) {
 		/*
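The first issue described above, B_NOCACHE set on a dirty buffer, can be illustrated with a userland toy model. The flag names are borrowed from the kernel (B_DELWRI is what DragonFly's patch calls B_DIRTY); everything else here is a deliberate simplification, not kernel code. The point it demonstrates: redirtying must clear B_NOCACHE, or the subsequent release discards the dirty data instead of scheduling the delayed write.

```c
#include <stdint.h>

#define B_DELWRI  0x01  /* dirty data awaiting a delayed write */
#define B_NOCACHE 0x02  /* discard the buffer's contents on release */

struct buf {
    uint32_t flags;
    int      data_written;  /* did the dirty data ever reach "disk"? */
};

/* Redirty a buffer, e.g. softupdates redirtying during a write.
 * Clearing B_NOCACHE here is the point of the bdirty() hunk in the
 * patch: dirty data must stay cacheable or it will be thrown away. */
static void
bdirty(struct buf *bp)
{
    bp->flags &= ~B_NOCACHE;
    bp->flags |= B_DELWRI;
}

/* Release a buffer: B_NOCACHE discards contents outright; otherwise
 * B_DELWRI data gets its delayed write (modeled as a simple flag). */
static void
brelse(struct buf *bp)
{
    if (bp->flags & B_NOCACHE) {
        bp->flags = 0;          /* contents gone; any pending write is lost */
        return;
    }
    if (bp->flags & B_DELWRI) {
        bp->data_written = 1;
        bp->flags &= ~B_DELWRI;
    }
}
```

Without the `~B_NOCACHE` line in bdirty(), a buffer that was marked no-cache and then redirtied would take the discard path on release, which is exactly the silent loss of a freshly appended directory block that the panics suggest.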
Responsible Changed From-To: freebsd-bugs->freebsd-fs

Over to freebsd-fs, where hopefully somebody can assess this PR.
Does the DragonFly BSD patch work? I'm hitting a similar issue and would like to consider porting it to FreeBSD.
(In reply to rdarbha from comment #11)

Do you understand that your question contains an intrinsic contradiction?

Anyway, I looked at Matt's patches. The vfs_cluster changes seem to be irrelevant; we start I/O (and perform SU-related rollbacks) in ffs_geom_strategy(), which is executed after the cluster is fully constructed and validated. Similarly, we assert that there are no dangling dependencies when a B_NOCACHE buffer is thrown away in brelse(). So I think that these bits are not (directly) relevant to us.

The interesting stuff is the vnode vm_object size handling for directories. This is the right thing to do, but I doubt that we would have issues with the present order, as long as the vnode is not unlocked between buffer allocation and pager resizing. Still, it is better to do it right; a patch is attached.

If you have dirbad panics, I would first check your hardware and verify the integrity of other files on the same volume. If you have a canonical copy of the data, say the system distribution disk which was used to install, compare the checksums of the regular files.
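The checksum comparison suggested above can be sketched in userland C. FNV-1a here is only a stand-in checksum for illustration, and the file names in the test are made up; in practice mtree(8) or sha256(1)/md5(1) are the right tools.

```c
#include <stdint.h>
#include <stdio.h>

/* Incremental FNV-1a: a simple stand-in checksum for illustration only. */
static uint64_t
fnv1a(uint64_t h, const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        h ^= buf[i];
        h *= 1099511628211ULL;          /* FNV prime */
    }
    return h;
}

/* Checksum an entire file; returns 0 on success and fills *out. */
static int
file_checksum(const char *path, uint64_t *out)
{
    unsigned char buf[1 << 16];
    uint64_t h = 14695981039346656037ULL;   /* FNV offset basis */
    size_t n;
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return -1;
    while ((n = fread(buf, 1, sizeof(buf), f)) > 0)
        h = fnv1a(h, buf, n);
    fclose(f);
    *out = h;
    return 0;
}

/* 1 if suspect differs from canonical, 0 if identical, -1 on error. */
static int
files_differ(const char *canonical, const char *suspect)
{
    uint64_t hc, hs;

    if (file_checksum(canonical, &hc) != 0 || file_checksum(suspect, &hs) != 0)
        return -1;
    return hc != hs;
}
```

For a whole tree with real digests, something like `mtree -c -K sha256digest -p /canonical | mtree -p /suspect` does the same comparison in one shot.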
Created attachment 172710 [details] Ensure that pager is resized before UFS directory blocks are allocated.
I concur with the analysis and proposed changes by Konstantin Belousov <kib@freebsd.org>.
I meant whether it works in DragonFly, as I am hitting this in FreeBSD and would like to port it. :) Anyway, thanks for the quick response and the patch!

~Ravi
A commit references this bug:

Author: kib
Date: Wed Jul 20 14:40:57 UTC 2016
New revision: 303090
URL: https://svnweb.freebsd.org/changeset/base/303090

Log:
  Ensure that the UFS directory vnode' vm_object is properly sized before
  UFS_BALLOC() is called.  I do not believe that this caused any real
  issue on FreeBSD because the exclusive vnode lock is held over the
  balloc/resize, the change is to make formally correct KPI use.

  Based on:	the Matthew Dillon' patch from DragonFly BSD
  PR:		93942
  Reviewed by:	mckusick
  Tested by:	pho
  Sponsored by:	The FreeBSD Foundation
  MFC after:	1 week

Changes:
  head/sys/ufs/ufs/ufs_lookup.c
  head/sys/ufs/ufs/ufs_vnops.c
batch change:

For bugs that match the following
- Status Is In progress AND
- Untouched since 2018-01-01. AND
- Affects Base System OR Documentation

DO: Reset to open status.

Note: I did a quick pass but if you are getting this email it might be worthwhile to double check to see if this bug ought to be closed.
This report should have been marked as closed after the change made in comment #16 was made.