Bug 222288 - g_bio leak after zfs ABD commit
Summary: g_bio leak after zfs ABD commit
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 11.1-STABLE
Hardware: Any Any
Importance: --- Affects Many People
Assignee: Andriy Gapon
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-09-13 14:57 UTC by Dan Nelson
Modified: 2017-10-01 15:09 UTC
9 users

See Also:


Attachments
work in progress (1.40 KB, patch)
2017-09-15 14:00 UTC, Andriy Gapon
alternative patch (1.44 KB, patch)
2017-09-18 14:26 UTC, Andriy Gapon

Description Dan Nelson 2017-09-13 14:57:00 UTC
It looks like the ABD commit may have introduced a leak of geom bio structs.  It's a slow leak so it's not obvious, but after a few weeks of uptime, one of my 8GB-RAM systems had accumulated a couple million g_bio objects according to vmstat (over 2GB of RAM) and was starting to swap.  I rebooted it, and after 11 hours, I'm up to 310182:

# uptime
 9:55AM  up 11:08, 11 users, load averages: 0.50, 0.54, 0.55
# vmstat -z | egrep 'ITEM|g_bio' 
ITEM                   SIZE   LIMIT     USED     FREE       REQ   FAIL SLEEP
g_bio:                  376,      0,  310182,     818, 38182778,     0,   0

Looking at base r321610 itself, it looks like there's a g_destroy_bio() call that was relocated from vdev_geom_io_intr() to vdev_geom_io_done(); maybe there are cases where vdev_geom_io_intr() is called but vdev_geom_io_done() isn't?  I don't know enough about ZFS internals to get any further than this.

Rolling the kernel back to r321609 makes the leak stop, and updating to r321610 makes it appear again.
Comment 1 Andriy Gapon freebsd_committer 2017-09-13 16:18:04 UTC
(In reply to Dan Nelson from comment #0)
Thank you very much for the report!
As soon as I read it I noticed that I have the same kind of issue.  I dug through the bio zone to examine the leaked bios, and they all seem to have bio_cmd = 5 and bio_flags = 8, so they appear to be BIO_FLUSH bios.  Their zios must have been ZIO_TYPE_IOCTL.

Now, those zio-s use ZIO_IOCTL_PIPELINE and it is defined as:
#define ZIO_IOCTL_PIPELINE                      \
        (ZIO_INTERLOCK_STAGES |                 \
        ZIO_STAGE_VDEV_IO_START |               \
        ZIO_STAGE_VDEV_IO_ASSESS)

So, ZIO_STAGE_VDEV_IO_START is in the pipeline, but ZIO_STAGE_VDEV_IO_DONE is not as you have correctly theorised.
The normal I/O pipelines always include ZIO_VDEV_IO_STAGES
#define ZIO_VDEV_IO_STAGES                      \
        (ZIO_STAGE_VDEV_IO_START |              \
        ZIO_STAGE_VDEV_IO_DONE |                \
        ZIO_STAGE_VDEV_IO_ASSESS)
so the problem does not affect them.

I will double-check why the ioctl pipeline omitted ZIO_STAGE_VDEV_IO_DONE and will test adding that stage.
Comment 2 Andriy Gapon freebsd_committer 2017-09-15 14:00:05 UTC
Created attachment 186415 [details]
work in progress

This is a patch that I am testing right now.
So far so good.
Comment 3 Fabian Keil 2017-09-17 04:16:43 UTC
Thanks a lot for the report Dan.

I noticed that something was leaking but hadn't had time to track it
down yet. Thanks to the report I didn't have to.

To work around the issue in ElectroBSD I've reverted r321610/a0dddc24c9050
after reverting the follow-up commits that would otherwise cause revert conflicts.

Your patch looks good to me, Andriy.
I've imported it and will test it in the next couple of days.
Thanks.

It occurred to me that this issue could easily be detected
automatically if there were a way to specify a time limit between the
uma_zalloc() and uma_zfree() calls for a given zone (or item from
the zone).

Obviously this only works if an upper limit makes sense (i.e., items
are expected to be freed eventually), but in the case of g_bio I believe
that it does, and there are a number of other zones where enforcing
allocation time limits should work.

I wouldn't be surprised if there were a bunch of other zone item
leaks that haven't been detected yet because they don't occur
frequently enough to have a big impact.
Comment 4 Andriy Gapon freebsd_committer 2017-09-18 14:26:34 UTC
Created attachment 186510 [details]
alternative patch

Here is an alternative patch that does not require the modification of ZIO_IOCTL_PIPELINE.
Comment 5 Borja Marcos 2017-09-19 10:05:08 UTC
I've applied the second patch. It seems to be working fine; no leaks.

Only one thing has me slightly puzzled: a very high used count for UMA slabs.

ITEM                   SIZE  LIMIT     USED     FREE      REQ FAIL SLEEP

UMA Kegs:               384,      0,     242,       8,     242,   0,   0
UMA Zones:             4736,      0,     259,       0,     259,   0,   0
UMA Slabs:               80,      0,  542261,      39,  553319,   0,   0
UMA Hash:               256,      0,      60,      60,     109,   0,   0

The other server, which runs Elasticsearch, is still on 11.1-RELEASE and
also has 8 GB of memory; it shows this. Note the high number of free UMA slabs.


ITEM                   SIZE  LIMIT     USED     FREE      REQ FAIL SLEEP

UMA Kegs:               384,      0,     240,       0,     240,   0,   0
UMA Zones:             2176,      0,     257,       0,     257,   0,   0
UMA Slabs:               80,      0,  207526,  243674, 2339176,   0,   0


I'll let it run for several days anyway. As I said it's not critical.

Thanks!
Comment 6 Borja Marcos 2017-09-19 14:20:10 UTC
(In reply to Borja Marcos from comment #5)

Ignore the UMA slab comment, it's irrelevant. Sorry.
Comment 7 commit-hook freebsd_committer 2017-09-20 08:28:22 UTC
A commit references this bug:

Author: avg
Date: Wed Sep 20 08:27:21 UTC 2017
New revision: 323796
URL: https://svnweb.freebsd.org/changeset/base/323796

Log:
  fix memory leak in g_bio zone introduced in r320452, another ABD fallout

  I overlooked the fact that ZIO_IOCTL_PIPELINE does not include the
  ZIO_STAGE_VDEV_IO_DONE stage.  We do allocate a struct bio for an ioctl
  zio (a disk cache flush), but we never freed it.

  This change splits bio handling into two groups: one for normal
  read/write i/o that passes data around and, thus, needs the abd data
  transform; the other group is for "data-less" i/o such as trim and cache
  flush.

  PR:		222288
  Reported by:	Dan Nelson <dnelson@allantgroup.com>
  Tested by:	Borja Marcos <borjam@sarenet.es>
  MFC after:	10 days

Changes:
  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
Comment 8 Borja Marcos 2017-09-20 11:11:31 UTC
For the record, I have applied the patch to 11-STABLE from today, and so far it's running fine on my worst-case server (running Elasticsearch). No leaks.
Comment 9 Dan Nelson 2017-09-25 14:31:11 UTC
I also applied the patch to 11-STABLE and have been running for a few days with it.  No leaks here either; everything looks back to normal.
Comment 10 commit-hook freebsd_committer 2017-10-01 15:04:37 UTC
A commit references this bug:

Author: avg
Date: Sun Oct  1 15:03:44 UTC 2017
New revision: 324161
URL: https://svnweb.freebsd.org/changeset/base/324161

Log:
  MFV r323796: fix memory leak in g_bio zone introduced in r320452

  I overlooked the fact that ZIO_IOCTL_PIPELINE does not include the
  ZIO_STAGE_VDEV_IO_DONE stage.  We do allocate a struct bio for an ioctl
  zio (a disk cache flush), but we never freed it.

  This change splits bio handling into two groups: one for normal
  read/write i/o that passes data around and, thus, needs the abd data
  transform; the other group is for "data-less" i/o such as trim and cache
  flush.

  PR:		222288

Changes:
_U  stable/11/
  stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c