The machine in question had mounted a UFS filesystem mounted that had softupdates enabled (on an SD card; I was updating a system that runs FreeBSD on a Raspberry Pi2 by plugging the card into a different machine) and the I/O card took an unrecoverable write error. The result was a kernel panic; this is apparently considered expected behavior at present if softupdates are turned on for the filesystem because it's possible that the filesystem has now been corrupted and there is no way to be sure with the machine running. Thus the choice to panic() when this situation occurs. But it appears that the choice to panic() is too broad and unnecessary in that in many cases a less-severe action is effective while not exposing the system to the risk of unknown filesystem corruption. Yes, if there are working-set pages on that volume and it is corrupt, the system is no longer stable (this is especially true if the system is *running* from that volume.) It is also true that in the case of a solid-state device of some kind the impact of a write error may cross a filesystem boundary, so it's insufficient to invalidate the filesystem (on a SSD or similar device the read/erase/write cycle for a data re-write may involve many megabytes of data, and that can possibly not be entirely local to the filesystem mounted if there is more than one on the physical volume.) HOWEVER, forcibly-detaching the volume in question instead of calling panic() *should* be effective in preventing the possibility of propagating a corrupted filesystem. While this will lead to a panic in the event that executing RSS (or consumed page file space) is present on that volume, in the case where the device holds only data the detach will *not* panic the machine. This appears to be a situation where a less-severe "remedy" for a failed I/O is certainly called for. The following backtrace was captured from the panic itself: root@Dbms2:/var/crash # kgdb /boot/kernel/kernel vmcore.0 GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "amd64-marcel-freebsd"... Unread portion of the kernel message buffer: panic: initiate_write_inodeblock_ufs2: already started cpuid = 14 KDB: stack backtrace: #0 0xffffffff80b1f357 at kdb_backtrace+0x67 #1 0xffffffff80ad6ec2 at vpanic+0x182 #2 0xffffffff80ad6d33 at panic+0x43 #3 0xffffffff80dc16ad at softdep_disk_io_initiation+0x159d #4 0xffffffff80de61eb at ffs_geom_strategy+0x13b #5 0xffffffff80b872f7 at bufwrite+0x267 #6 0xffffffff80b8ac6a at vfs_bio_awrite+0x3ca #7 0xffffffff80b96b77 at vop_stdfsync+0x277 #8 0xffffffff80983766 at devfs_fsync+0x26 #9 0xffffffff81101f7d at VOP_FSYNC_APV+0x8d #10 0xffffffff80baf1ae at sched_sync+0x3be #11 0xffffffff80a8dcb5 at fork_exit+0x85 #12 0xffffffff80f7f85e at fork_trampoline+0xe Uptime: 27m9s (kgdb) where #0 doadump (textdump=<value optimized out>) at pcpu.h:221 #1 0xffffffff80ad6949 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:366 #2 0xffffffff80ad6efb in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:759 #3 0xffffffff80ad6d33 in panic (fmt=0x0) at /usr/src/sys/kern/kern_shutdown.c:690 #4 0xffffffff80dc16ad in softdep_disk_io_initiation (bp=<value optimized out>) at /usr/src/sys/ufs/ffs/ffs_softdep.c:10301 #5 0xffffffff80de61eb in ffs_geom_strategy (bo=<value optimized out>, bp=<value optimized out>) at buf.h:412 #6 0xffffffff80b872f7 in bufwrite (bp=0xfffffe02e8629b30) at buf.h:405 #7 0xffffffff80b8ac6a in vfs_bio_awrite (bp=<value optimized out>) at buf.h:393 #8 0xffffffff80b96b77 in vop_stdfsync (ap=0xfffffe034f481b68) at /usr/src/sys/kern/vfs_default.c:692 #9 0xffffffff80983766 in devfs_fsync (ap=0xfffffe034f481b68) at /usr/src/sys/fs/devfs/devfs_vnops.c:702 #10 0xffffffff81101f7d in VOP_FSYNC_APV (vop=<value optimized out>, a=<value optimized out>) at vnode_if.c:1331 #11 0xffffffff80baf1ae in sched_sync () at vnode_if.h:549 #12 0xffffffff80a8dcb5 in fork_exit (callout=0xffffffff80baedf0 <sched_sync>, arg=0x0, frame=0xfffffe034f481c00) at /usr/src/sys/kern/kern_fork.c:1038 #13 0xffffffff80f7f85e in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:611 #14 0x0000000000000000 in ?? () (kgdb) FreeBSD 11.0-BETA1 #0 r302439: Fri Jul 8 14:37:27 CDT 2016 karl@Dbms2.denninger.net:/usr/obj/usr/src/sys/GENERIC The offending code line: static void initiate_write_inodeblock_ufs2(inodedep, bp) struct inodedep *inodedep; struct buf *bp; /* The inode block */ { struct allocdirect *adp, *lastadp; struct ufs2_dinode *dp; struct ufs2_dinode *sip; struct inoref *inoref; struct ufsmount *ump; struct fs *fs; ufs_lbn_t i; #ifdef INVARIANTS ufs_lbn_t prevlbn = 0; #endif int deplist; if (inodedep->id_state & IOSTARTED) panic("initiate_write_inodeblock_ufs2: already started"); inodedep->id_state |= IOSTARTED; -- End capture
A commit references this bug: Author: mckusick Date: Tue Aug 16 21:02:30 UTC 2016 New revision: 304239 URL: https://svnweb.freebsd.org/changeset/base/304239 Log: Bug 211013 reports that a write error to a UFS filesystem running with softupdates panics the kernel. The problem that has been pointed out is that when there is a transient write error on certain metadata blocks, specifically directory blocks (PAGEDEP), inode blocks (INODEDEP), indirect pointer blocks (INDIRDEPS), and cylinder group (BMSAFEMAP, but only when journaling is enabled), we get a panic in one of the routines called by softdep_disk_io_initiation that the I/O is "already started" when we retry the write. These dependency types potentially need to do roll-backs when called by softdep_disk_io_initiation before doing a write and then a roll-forward when called by softdep_disk_write_complete after the I/O completes. The panic happens when there is a transient error. At the top of softdep_disk_write_complete we check to see if the write had an error and if an error occurred we just return. This return is correct most of the time because the main role of the routines called by softdep_disk_write_complete is to process the now-completed dependencies so that the next I/O steps can happen. But for the four types listed above, they do not get to do their rollback operations. This causes the panic when softdep_disk_io_initiation gets called on the second attempt to do the write and the roll-back routines find that the roll-backs have already been done. As an aside I note that there is also the problem that the buffer will have been unlocked and thus made visible to the filesystem and to user applications with the roll-backs in place. The way to resolve the problem is to add a flag to the routines called by softdep_disk_write_complete for the four dependency types noted that indicates whether the write was successful (WRITESUCCEEDED). If the write does not succeed, they do just the roll-backs and then return. If the write was successful they also do their usual processing of the now-completed dependencies. The fix was tested by selectively injecting write errors for buffers holding dependencies of each of the four types noted above and then verifying that the kernel no longer paniced and that following the successful retry of the write that the filesystem could be unmounted and successfully checked cleanly. PR: 211013 Reviewed by: kib Changes: head/sys/ufs/ffs/ffs_softdep.c head/sys/ufs/ffs/softdep.h
Patch has been applied. If no further problems are reported, bug will be closed.
I trashed the card that caused this, but will see if I can reproduce and will update in any event.
(In reply to karl from comment #3) Is this expected to be MFC'd back against 11.0-PRE (and should it apply cleanly?)
Though I did not specify an MFC, it should apply easily to 11.0. I do plan to do an MFC to 11.0 once it has been released. Since it is not a common bug I don't want to slow the process of getting 11.0 out the door hence the pause to MFC.
Depending on how long you wanted to let this be tested in head, it may very well make the timeline to MFC into releng/11.0. Even if an unrare case, it seems worth it to me to merge this if it fits into the timeline. How long were you wanting to let this bake in head?
I would like a week or two in head just to be sure that it does not break any existing code usage. Note that 304230 has to be merged at the same time as this one (304239) since this one uses the new LIST_CONCAT added to queue.h in 304230.
A commit references this bug: Author: mckusick Date: Mon Oct 17 21:44:41 UTC 2016 New revision: 307533 URL: https://svnweb.freebsd.org/changeset/base/307533 Log: MFC r304230: Add two new macros, SLIST_CONCAT and LIST_CONCAT. MFC r304239: Bug 211013 reports that a write error to a UFS filesystem running with softupdates panics the kernel. PR: 211013 Changes: _U stable/11/ stable/11/share/man/man3/queue.3 stable/11/sys/sys/queue.h stable/11/sys/ufs/ffs/ffs_softdep.c stable/11/sys/ufs/ffs/softdep.h
A commit references this bug: Author: mckusick Date: Mon Oct 17 21:49:55 UTC 2016 New revision: 307534 URL: https://svnweb.freebsd.org/changeset/base/307534 Log: MFC r304230: Add two new macros, SLIST_CONCAT and LIST_CONCAT. MFC r304239: Bug 211013 reports that a write error to a UFS filesystem running with softupdates panics the kernel. PR: 211013 Changes: _U stable/10/ stable/10/share/man/man3/queue.3 stable/10/sys/sys/queue.h stable/10/sys/ufs/ffs/ffs_softdep.c stable/10/sys/ufs/ffs/softdep.h
With MFC to stable-10 and stable-11 this bug report can be closed.