211013 – Write error to UFS filesystem with softupdates panics machine

Bug 211013 - Write error to UFS filesystem with softupdates panics machine

Summary: Write error to UFS filesystem with softupdates panics machine

Status:	Closed FIXED

Alias:	None

Product:	Base System
Classification:	Unclassified
Component:	kern (show other bugs)
Version:	11.0-BETA1
Hardware:	Any Any

Importance:	--- Affects Many People
Assignee:	freebsd-fs (Nobody)

URL:
Keywords:

Depends on:
Blocks:

Reported:	2016-07-11 17:56 UTC by karl
Modified:	2016-10-17 22:43 UTC (History)
CC List:	4 users (show)

See Also:

Flags:	op: mfc-stable10?

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description karl 2016-07-11 17:56:05 UTC

The machine in question had mounted a UFS filesystem mounted that had softupdates enabled (on an SD card; I was updating a system that runs FreeBSD on a Raspberry Pi2 by plugging the card into a different machine) and the I/O card took an unrecoverable write error.

The result was a kernel panic; this is apparently considered expected behavior at present if softupdates are turned on for the filesystem because it's possible that the filesystem has now been corrupted and there is no way to be sure with the machine running.  Thus the choice to panic() when this situation occurs.

But it appears that the choice to panic() is too broad and unnecessary in that in many cases a less-severe action is effective while not exposing the system to the risk of unknown filesystem corruption.

Yes, if there are working-set pages on that volume and it is corrupt, the system is no longer stable (this is especially true if the system is *running* from that volume.)  It is also true that in the case of a solid-state device of some kind the impact of a write error may cross a filesystem boundary, so it's insufficient to invalidate the filesystem (on a SSD or similar device the read/erase/write cycle for a data re-write may involve many megabytes of data, and that can possibly not be entirely local to the filesystem mounted if there is more than one on the physical volume.)

HOWEVER, forcibly-detaching the volume in question instead of calling panic() *should* be effective in preventing the possibility of propagating a corrupted filesystem.  While this will lead to a panic in the event that executing RSS (or consumed page file space) is present on that volume, in the case where the device holds only data the detach will *not* panic the machine.

This appears to be a situation where a less-severe "remedy" for a failed I/O is certainly called for.

The following backtrace was captured from the panic itself:

root@Dbms2:/var/crash # kgdb /boot/kernel/kernel vmcore.0
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
panic: initiate_write_inodeblock_ufs2: already started
cpuid = 14
KDB: stack backtrace:
#0 0xffffffff80b1f357 at kdb_backtrace+0x67
#1 0xffffffff80ad6ec2 at vpanic+0x182
#2 0xffffffff80ad6d33 at panic+0x43
#3 0xffffffff80dc16ad at softdep_disk_io_initiation+0x159d
#4 0xffffffff80de61eb at ffs_geom_strategy+0x13b
#5 0xffffffff80b872f7 at bufwrite+0x267
#6 0xffffffff80b8ac6a at vfs_bio_awrite+0x3ca
#7 0xffffffff80b96b77 at vop_stdfsync+0x277
#8 0xffffffff80983766 at devfs_fsync+0x26
#9 0xffffffff81101f7d at VOP_FSYNC_APV+0x8d
#10 0xffffffff80baf1ae at sched_sync+0x3be
#11 0xffffffff80a8dcb5 at fork_exit+0x85
#12 0xffffffff80f7f85e at fork_trampoline+0xe
Uptime: 27m9s


(kgdb) where
#0  doadump (textdump=<value optimized out>) at pcpu.h:221
#1  0xffffffff80ad6949 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:366
#2  0xffffffff80ad6efb in vpanic (fmt=<value optimized out>,
    ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:759
#3  0xffffffff80ad6d33 in panic (fmt=0x0)
    at /usr/src/sys/kern/kern_shutdown.c:690
#4  0xffffffff80dc16ad in softdep_disk_io_initiation (bp=<value optimized out>)
    at /usr/src/sys/ufs/ffs/ffs_softdep.c:10301
#5  0xffffffff80de61eb in ffs_geom_strategy (bo=<value optimized out>,
    bp=<value optimized out>) at buf.h:412
#6  0xffffffff80b872f7 in bufwrite (bp=0xfffffe02e8629b30) at buf.h:405
#7  0xffffffff80b8ac6a in vfs_bio_awrite (bp=<value optimized out>)
    at buf.h:393
#8  0xffffffff80b96b77 in vop_stdfsync (ap=0xfffffe034f481b68)
    at /usr/src/sys/kern/vfs_default.c:692
#9  0xffffffff80983766 in devfs_fsync (ap=0xfffffe034f481b68)
    at /usr/src/sys/fs/devfs/devfs_vnops.c:702
#10 0xffffffff81101f7d in VOP_FSYNC_APV (vop=<value optimized out>,
    a=<value optimized out>) at vnode_if.c:1331
#11 0xffffffff80baf1ae in sched_sync () at vnode_if.h:549
#12 0xffffffff80a8dcb5 in fork_exit (callout=0xffffffff80baedf0 <sched_sync>,
    arg=0x0, frame=0xfffffe034f481c00) at /usr/src/sys/kern/kern_fork.c:1038
#13 0xffffffff80f7f85e in fork_trampoline ()
    at /usr/src/sys/amd64/amd64/exception.S:611
#14 0x0000000000000000 in ?? ()
(kgdb)

FreeBSD 11.0-BETA1 #0 r302439: Fri Jul  8 14:37:27 CDT 2016     karl@Dbms2.denninger.net:/usr/obj/usr/src/sys/GENERIC

The offending code line:

static void
initiate_write_inodeblock_ufs2(inodedep, bp)
        struct inodedep *inodedep;
        struct buf *bp;                 /* The inode block */
{
        struct allocdirect *adp, *lastadp;
        struct ufs2_dinode *dp;
        struct ufs2_dinode *sip;
        struct inoref *inoref;
        struct ufsmount *ump;
        struct fs *fs;
        ufs_lbn_t i;
#ifdef INVARIANTS
        ufs_lbn_t prevlbn = 0;
#endif
        int deplist;

        if (inodedep->id_state & IOSTARTED)
                panic("initiate_write_inodeblock_ufs2: already started");
        inodedep->id_state |= IOSTARTED;



-- End capture

Comment 1 commit-hook freebsd_committer

2016-08-16 21:02:34 UTC

A commit references this bug:

Author: mckusick
Date: Tue Aug 16 21:02:30 UTC 2016
New revision: 304239
URL: https://svnweb.freebsd.org/changeset/base/304239

Log:
  Bug 211013 reports that a write error to a UFS filesystem running
  with softupdates panics the kernel. The problem that has been pointed
  out is that when there is a transient write error on certain metadata
  blocks, specifically directory blocks (PAGEDEP), inode blocks
  (INODEDEP), indirect pointer blocks (INDIRDEPS), and cylinder group
  (BMSAFEMAP, but only when journaling is enabled), we get a panic
  in one of the routines called by softdep_disk_io_initiation that
  the I/O is "already started" when we retry the write.

  These dependency types potentially need to do roll-backs when called
  by softdep_disk_io_initiation before doing a write and then a
  roll-forward when called by softdep_disk_write_complete after the
  I/O completes.  The panic happens when there is a transient error.
  At the top of softdep_disk_write_complete we check to see if the
  write had an error and if an error occurred we just return.  This
  return is correct most of the time because the main role of the routines
  called by softdep_disk_write_complete is to process the now-completed
  dependencies so that the next I/O steps can happen.

  But for the four types listed above, they do not get to do their
  rollback operations. This causes the panic when softdep_disk_io_initiation
  gets called on the second attempt to do the write and the roll-back
  routines find that the roll-backs have already been done. As an
  aside I note that there is also the problem that the buffer will
  have been unlocked and thus made visible to the filesystem and to
  user applications with the roll-backs in place.

  The way to resolve the problem is to add a flag to the routines called
  by softdep_disk_write_complete for the four dependency types noted
  that indicates whether the write was successful (WRITESUCCEEDED).
  If the write does not succeed, they do just the roll-backs and then
  return. If the write was successful they also do their usual
  processing of the now-completed dependencies.

  The fix was tested by selectively injecting write errors for buffers
  holding dependencies of each of the four types noted above and then
  verifying that the kernel no longer paniced and that following the
  successful retry of the write that the filesystem could be unmounted
  and successfully checked cleanly.

  PR: 211013
  Reviewed by: kib

Changes:
  head/sys/ufs/ffs/ffs_softdep.c
  head/sys/ufs/ffs/softdep.h

Comment 2 Kirk McKusick freebsd_committer

2016-08-16 21:25:24 UTC

Patch has been applied. If no further problems are reported, bug will be closed.

Comment 3 karl 2016-08-17 02:01:07 UTC

I trashed the card that caused this, but will see if I can reproduce and will update in any event.

Comment 4 karl 2016-08-17 02:03:31 UTC

(In reply to karl from comment #3)

Is this expected to be MFC'd back against 11.0-PRE (and should it apply cleanly?)

Comment 5 Kirk McKusick freebsd_committer

2016-08-17 05:39:00 UTC

Though I did not specify an MFC, it should apply easily to 11.0. I do plan to do an MFC to 11.0 once it has been released. Since it is not a common bug I don't want to slow the process of getting 11.0 out the door hence the pause to MFC.

Comment 6 Bryan Drewery freebsd_committer

2016-08-17 10:55:35 UTC

Depending on how long you wanted to let this be tested in head, it may very well make the timeline to MFC into releng/11.0.  Even if an unrare case, it seems worth it to me
to merge this if it fits into the timeline.

How long were you wanting to let this bake in head?

Comment 7 Kirk McKusick freebsd_committer

2016-08-17 16:13:29 UTC

I would like a week or two in head just to be sure that it does not break any existing code usage. Note that 304230 has to be merged at the same time as
this one (304239) since this one uses the new LIST_CONCAT added to queue.h in 304230.

Comment 8 commit-hook freebsd_committer

2016-10-17 21:45:24 UTC

A commit references this bug:

Author: mckusick
Date: Mon Oct 17 21:44:41 UTC 2016
New revision: 307533
URL: https://svnweb.freebsd.org/changeset/base/307533

Log:
  MFC r304230:
  Add two new macros, SLIST_CONCAT and LIST_CONCAT.

  MFC r304239:
  Bug 211013 reports that a write error to a UFS filesystem running
  with softupdates panics the kernel.

  PR: 211013

Changes:
_U  stable/11/
  stable/11/share/man/man3/queue.3
  stable/11/sys/sys/queue.h
  stable/11/sys/ufs/ffs/ffs_softdep.c
  stable/11/sys/ufs/ffs/softdep.h

Comment 9 commit-hook freebsd_committer

2016-10-17 21:50:26 UTC

A commit references this bug:

Author: mckusick
Date: Mon Oct 17 21:49:55 UTC 2016
New revision: 307534
URL: https://svnweb.freebsd.org/changeset/base/307534

Log:
  MFC r304230:
  Add two new macros, SLIST_CONCAT and LIST_CONCAT.

  MFC r304239:
  Bug 211013 reports that a write error to a UFS filesystem running
  with softupdates panics the kernel.

  PR: 211013

Changes:
_U  stable/10/
  stable/10/share/man/man3/queue.3
  stable/10/sys/sys/queue.h
  stable/10/sys/ufs/ffs/ffs_softdep.c
  stable/10/sys/ufs/ffs/softdep.h

Comment 10 Kirk McKusick freebsd_committer

2016-10-17 22:43:55 UTC

With MFC to stable-10 and stable-11 this bug report can be closed.