Summary: | Write error to UFS filesystem with softupdates panics machine | ||
---|---|---|---|
Product: | Base System | Reporter: | karl |
Component: | kern | Assignee: | freebsd-fs (Nobody) <fs> |
Status: | Closed FIXED | ||
Severity: | Affects Many People | CC: | bdrewery, kbowling, mckusick, op |
Priority: | --- | Flags: | op: mfc-stable10? |
Version: | 11.0-BETA1 | ||
Hardware: | Any | ||
OS: | Any |
Description
karl
2016-07-11 17:56:05 UTC
A commit references this bug:

Author: mckusick
Date: Tue Aug 16 21:02:30 UTC 2016
New revision: 304239
URL: https://svnweb.freebsd.org/changeset/base/304239

Log:
Bug 211013 reports that a write error to a UFS filesystem running with softupdates panics the kernel.

The problem that has been pointed out is that when there is a transient write error on certain metadata blocks, specifically directory blocks (PAGEDEP), inode blocks (INODEDEP), indirect pointer blocks (INDIRDEP), and cylinder groups (BMSAFEMAP, but only when journaling is enabled), we get a panic in one of the routines called by softdep_disk_io_initiation reporting that the I/O is "already started" when we retry the write.

These dependency types potentially need to do roll-backs when called by softdep_disk_io_initiation before doing a write, and then a roll-forward when called by softdep_disk_write_complete after the I/O completes. The panic happens when there is a transient error. At the top of softdep_disk_write_complete we check to see if the write had an error, and if an error occurred we just return. This return is correct most of the time because the main role of the routines called by softdep_disk_write_complete is to process the now-completed dependencies so that the next I/O steps can happen. But for the four types listed above, this means they never get to do their roll-forward operations. This causes the panic when softdep_disk_io_initiation gets called on the second attempt to do the write and the roll-back routines find that the roll-backs have already been done. As an aside, I note that there is also the problem that the buffer will have been unlocked, and thus made visible to the filesystem and to user applications, with the roll-backs in place.

The way to resolve the problem is to add a flag (WRITESUCCEEDED) to the routines called by softdep_disk_write_complete for the four dependency types noted, indicating whether the write was successful.
If the write does not succeed, they do just the roll-forwards and then return. If the write was successful, they also do their usual processing of the now-completed dependencies.

The fix was tested by selectively injecting write errors for buffers holding dependencies of each of the four types noted above, then verifying that the kernel no longer panicked and that, following the successful retry of the write, the filesystem could be unmounted and checked cleanly.

PR: 211013
Reviewed by: kib
Changes:
  head/sys/ufs/ffs/ffs_softdep.c
  head/sys/ufs/ffs/softdep.h

Patch has been applied. If no further problems are reported, the bug will be closed.

I trashed the card that caused this, but will see if I can reproduce it and will update in any event.

(In reply to karl from comment #3)
Is this expected to be MFC'd back to 11.0-PRE (and should it apply cleanly)?

Though I did not specify an MFC, it should apply easily to 11.0. I do plan to do an MFC to 11.0 once it has been released. Since it is not a common bug, I don't want to slow the process of getting 11.0 out the door, hence the delay before the MFC.

Depending on how long you want to let this be tested in head, it may very well make the timeline to MFC into releng/11.0. Even if it is a rare case, it seems worth it to me to merge this if it fits into the timeline. How long do you want to let this bake in head?

I would like a week or two in head just to be sure that it does not break any existing code usage. Note that r304230 has to be merged at the same time as this one (r304239), since this one uses the new LIST_CONCAT macro added to queue.h in r304230.

A commit references this bug:

Author: mckusick
Date: Mon Oct 17 21:44:41 UTC 2016
New revision: 307533
URL: https://svnweb.freebsd.org/changeset/base/307533

Log:
MFC r304230: Add two new macros, SLIST_CONCAT and LIST_CONCAT.
MFC r304239: Bug 211013 reports that a write error to a UFS filesystem running with softupdates panics the kernel.
PR: 211013
Changes:
_U  stable/11/
  stable/11/share/man/man3/queue.3
  stable/11/sys/sys/queue.h
  stable/11/sys/ufs/ffs/ffs_softdep.c
  stable/11/sys/ufs/ffs/softdep.h

A commit references this bug:

Author: mckusick
Date: Mon Oct 17 21:49:55 UTC 2016
New revision: 307534
URL: https://svnweb.freebsd.org/changeset/base/307534

Log:
MFC r304230: Add two new macros, SLIST_CONCAT and LIST_CONCAT.
MFC r304239: Bug 211013 reports that a write error to a UFS filesystem running with softupdates panics the kernel.
PR: 211013
Changes:
_U  stable/10/
  stable/10/share/man/man3/queue.3
  stable/10/sys/sys/queue.h
  stable/10/sys/ufs/ffs/ffs_softdep.c
  stable/10/sys/ufs/ffs/softdep.h

With the MFC to stable-10 and stable-11, this bug report can be closed.