Bug 171865 - [geom] [patch] g_wither_washer() keeping a core busy
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: unspecified
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: freebsd-geom
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-09-22 09:40 UTC by Fabian Keil
Modified: 2013-04-01 12:21 UTC

See Also:


Attachments
file.txt (1.37 KB, text/plain)
2012-09-22 09:40 UTC, Fabian Keil

Description Fabian Keil 2012-09-22 09:40:07 UTC
In http://lists.freebsd.org/pipermail/freebsd-fs/2011-June/011855.html
I reported that g_wither_washer() was being called more than
400000 times per second after a device got lost, keeping a CPU busy:

fk@r500 ~ $ sudo dtrace -n 'fbt:kernel:g_*:entry { @[probefunc, stack()] = count(); } tick-1sec { trunc(@, 3); printa(@); trunc(@)}'
dtrace: description 'fbt:kernel:g_*:entry ' matched 359 probes
CPU     ID                    FUNCTION:NAME
  0  32988                       :tick-1sec 
  g_wither_washer                                   
              kernel`g_run_events+0x3b5
              kernel`0xffffffff8084967e
           446626

  0  32988                       :tick-1sec 
  g_trace                                           
              kernel`g_io_request+0x4d
              kernel`g_io_schedule_down+0x25f
              kernel`g_down_procbody+0x6d
              kernel`fork_exit+0x9a
              kernel`0xffffffff8084967e
              230
  g_trace                                           
              kernel`g_io_deliver+0x7a
              kernel`g_up_procbody+0x6d
              kernel`fork_exit+0x9a
              kernel`0xffffffff8084967e
              230
[...]

I recently found a way to reproduce the problem without using
ZFS or writing to the device.

Fix: I don't have a fix, but the attached patch can be used as a workaround.

After kern.geom.debugflags has been set to 256, it can be set to 0 again,
but the problem will be back after the next geom "event".

The patch was attached with the submission (file.txt).

How-To-Repeat:
geli onetime /dev/md0
geom sched insert -a rr /dev/md0.eli
geli detach /dev/md0.eli.sched.
Comment 1 Jaakko Heinonen freebsd_committer 2012-09-25 15:06:54 UTC
On 2012-09-22, Fabian Keil wrote:
> I recently found a way to reproduce the problem without using
> ZFS or writing to the device.
> >How-To-Repeat:
> geli onetime /dev/md0
> geom sched insert -a rr /dev/md0.eli
> geli detach /dev/md0.eli.sched.

It seems that if you "insert" a sched geom and do "geli detach" on it,
the geli geom can't be destroyed.

After your commands "md0.eli" still exists:

# geli list
Geom name: md0.eli
Providers:
1. Name: md0.eli
   Mediasize: 10485760 (10M)
   Sectorsize: 512
   Mode: r0w0e0
# geli detach md0.eli
geli: No such device: md0.eli.

I didn't find a way to destroy it. I suspect a geom_sched bug. luigi@
cc'd.

-- 
Jaakko
Comment 2 Fabian Keil 2012-09-26 16:41:16 UTC
Jaakko Heinonen <jh@FreeBSD.org> wrote:

> On 2012-09-22, Fabian Keil wrote:
> > I recently found a way to reproduce the problem without using
> > ZFS or writing to the device.
> > >How-To-Repeat:
> > geli onetime /dev/md0
> > geom sched insert -a rr /dev/md0.eli
> > geli detach /dev/md0.eli.sched.
> 
> It seems that if you "insert" a sched geom and do "geli detach" on it,
> the geli geom can't be destroyed.
> 
> After your commands "md0.eli" still exists:

 
> I didn't find a way to destroy it. I suspect a geom_sched bug. luigi@
> cc'd.


While I can't rule out a geom_sched bug, I usually run into the
problem while only using glabel+geli+ZFS on a USB device that
disappears, as described in the initial report at:
http://lists.freebsd.org/pipermail/freebsd-fs/2011-June/011855.html

That setup is just less convenient to reproduce: it requires more
steps, and the disappearance can also lead to panics like these:
http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/162010
http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/162036

Fabian
Comment 3 Mark Linimon freebsd_committer 2012-10-06 04:25:16 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-geom

Over to maintainer(s).
Comment 4 Alexander Motin freebsd_committer 2013-04-01 12:19:17 UTC
State Changed
From-To: open->closed

r248674 fixed the problem: g_wither_washer() is now rerun only after
further changes to the GEOM topology.