Summary: | [md] md getting stuck in wdrain state | ||
---|---|---|---|
Product: | Base System | Reporter: | Carl <k0802647> |
Component: | kern | Assignee: | freebsd-fs (Nobody) <fs> |
Status: | Closed FIXED | ||
Severity: | Affects Only Me | CC: | kib |
Priority: | Normal | ||
Version: | 8.1-RELEASE | ||
Hardware: | Any | ||
OS: | Any |
Description
Carl
2011-01-22 23:00:21 UTC
State Changed From-To: open->feedback

Apparently, the suspension of the filesystem failed to finish, causing all writers on the filesystem to block. To diagnose the cause, we need the information specified at http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html

Responsible Changed From-To: freebsd-amd64->freebsd-fs

UFS issue.

Now I owe a friend a beer. His assertion was that in submitting this bug report I would incur a request to use a debugger myself, and this despite me being an end user reporting a problem on a production system in a remote location which other people depend on. While it would be an interesting and educational distraction to rebuild the kernel and deadlock a production system a few more times, I trust it's understood why that can't happen. As such, I thought it would be helpful to provide the above script so FreeBSD developers with more systems at their disposal might try to reproduce the problem. Any chance of that happening?

Carl / K0802647

Author: kib
Date: Wed Jan 26 10:34:21 2011
New Revision: 217880
URL: http://svn.freebsd.org/changeset/base/217880

Log:
Treat async buffer writes from the gjournal switcher thread the same as from syncer. We shall not sleep on running buffer space when suspending.
Reproduced and tested by: pho
PR: kern/154228
MFC after: 1 week

Modified:
head/sys/geom/journal/g_journal.c

Modified: head/sys/geom/journal/g_journal.c

    ==============================================================================
    --- head/sys/geom/journal/g_journal.c	Wed Jan 26 10:08:37 2011	(r217879)
    +++ head/sys/geom/journal/g_journal.c	Wed Jan 26 10:34:21 2011	(r217880)
    @@ -3033,6 +3033,7 @@ g_journal_switcher(void *arg)
     	int error;
     
     	mp = arg;
    +	curthread->td_pflags |= TDP_NORUNNINGBUF;
     	for (;;) {
     		g_journal_switcher_wokenup = 0;
     		error = tsleep(&g_journal_switcher_state, PRIBIO, "jsw:wait",

For whatever reason I was not copied on the patch message, despite being the bug reporter. The explanation for that patch is more than a little obscure. In simpler terms, what have you uncovered? Does that patch implement a complete fix, a partial fix, a workaround, or what? Is it recommended I try it? Did someone manage to reproduce my problem scenario?

Yesterday I ran into the same bug. Similar but different exercise. Again on a remote production system. I had no choice but to try again, so I repeated the procedure, only using a non-sparse file instead. It hung yet again, so that should rule out sparse files as part of the problem.

I noticed in the mdconfig(8) man page this description for the "-o [no]async" option: 'For vnode backed devices: avoid IO_SYNC for increased performance but at the risk of deadlocking the entire kernel.' It seems to me the default would be "-o noasync" and that this is supposed to avoid that particular risk of deadlock, but what command can I use to verify whether a particular enabled memory disk is actually using IO_SYNC or not?

Carl / K0802647

For the sake of end users suffering from this problem, please elaborate on the patch.
Carl / K0802647

I applied the patch to the FreeBSD-8.1-RELEASE-amd64 system for which I'd filed the bug report. It solved the problem I reported for the scenario in question. Thanks.

Carl / K0802647

State Changed From-To: feedback->patched

Fixed in head (r217880) and stable/8 (r218188).
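
For readers who found the r217880 log message obscure, the idea it describes can be sketched in user space. This is a simplified model and not the kernel code: `model_thread`, `model_would_sleep`, the byte counts, and the flag value are hypothetical names and numbers chosen for illustration. Only the flag name TDP_NORUNNINGBUF and the "wdrain" state come from this report. The model assumes that an async writer normally sleeps while the amount of write I/O in flight exceeds a limit, and that a thread carrying the flag skips that sleep, which is what the log message says the gjournal switcher should do while suspending.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's TDP_NORUNNINGBUF; the value is arbitrary. */
#define MODEL_TDP_NORUNNINGBUF 0x1u

/* Hypothetical, minimal model of a thread: a name and its private flags. */
struct model_thread {
	const char *name;
	unsigned int pflags;
};

/*
 * Model of the decision made before an async buffer write is issued.
 * Returns true if the thread would sleep (the "wdrain" state in this
 * report) because too much write I/O is already in flight.
 * A thread with MODEL_TDP_NORUNNINGBUF set never sleeps here.
 */
bool
model_would_sleep(const struct model_thread *td, long running_bytes,
    long limit_bytes)
{
	assert(td != NULL);
	if ((td->pflags & MODEL_TDP_NORUNNINGBUF) != 0)
		return false;
	return running_bytes > limit_bytes;
}
```

In this model, with 2048 bytes in flight against a limit of 1024, a thread whose `pflags` is 0 would sleep, while a thread that has done `pflags |= MODEL_TDP_NORUNNINGBUF` (as the patch does for the switcher with the real flag) would not. Below the limit, neither thread sleeps.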