Bug 194078 - 10.1-BETA2 kernel memory leak in routing table upon PF reload
Status: Closed FIXED
Product: Base System
Classification: Unclassified
Component: kern
Version: 10.0-STABLE
Hardware: Any Any
Importance: --- Affects Many People
Assignee: Gleb Smirnoff
Reported: 2014-10-01 18:48 UTC by Rumen Telbizov
Modified: 2016-03-24 03:09 UTC
Description Rumen Telbizov 2014-10-01 18:48:43 UTC
Corresponding discussion over email is available at: http://lists.freebsd.org/pipermail/freebsd-stable/2014-September/080296.html


We discovered that our newly upgraded 10.1-BETA2 (r271983) system is leaking kernel memory: the *wired* portion has been growing steadily over the last few days, and the *routetbl* malloc type keeps getting larger over time.


# while true; do date; vmstat -m | grep routetbl; sleep 60; done

         Type InUse   MemUse HighUse Requests  Size(s)
Mon Sep 29 18:27:55 UTC 2014
     routetbl 5988792 2888491K       - 14285826  32,64,128,256,512,2048
Mon Sep 29 18:28:55 UTC 2014
     routetbl 5990120 2889131K       - 14288972  32,64,128,256,512,2048
Mon Sep 29 18:29:55 UTC 2014
     routetbl 5991448 2889771K       - 14292352  32,64,128,256,512,2048
Mon Sep 29 18:30:55 UTC 2014
     routetbl 5992776 2890411K       - 14295464  32,64,128,256,512,2048
Mon Sep 29 18:31:55 UTC 2014
     routetbl 5994104 2891051K       - 14298576  32,64,128,256,512,2048
Mon Sep 29 18:32:55 UTC 2014
     routetbl 5995432 2891691K       - 14301904  32,64,128,256,512,2048
Mon Sep 29 18:33:55 UTC 2014
     routetbl 5996096 2892011K       - 14303624  32,64,128,256,512,2048
Mon Sep 29 18:34:55 UTC 2014
     routetbl 5997422 2892650K       - 14306980  32,64,128,256,512,2048
Mon Sep 29 18:35:55 UTC 2014
     routetbl 5998750 2893290K       - 14310092  32,64,128,256,512,2048
Mon Sep 29 18:36:55 UTC 2014
     routetbl 6000078 2893930K       - 14313204  32,64,128,256,512,2048
Mon Sep 29 18:37:55 UTC 2014
     routetbl 6001406 2894570K       - 14316532  32,64,128,256,512,2048
Mon Sep 29 18:38:55 UTC 2014
     routetbl 6002734 2895210K       - 14319644  32,64,128,256,512,2048
Mon Sep 29 18:39:55 UTC 2014
     routetbl 6004062 2895850K       - 14323024  32,64,128,256,512,2048
Mon Sep 29 18:40:56 UTC 2014
     routetbl 6004726 2896170K       - 14324745  32,64,128,256,512,2048
Mon Sep 29 18:41:56 UTC 2014
     routetbl 6006054 2896810K       - 14327857  32,64,128,256,512,2048
Mon Sep 29 18:42:56 UTC 2014
     routetbl 6007382 2897450K       - 14331185  32,64,128,256,512,2048
Mon Sep 29 18:43:56 UTC 2014
     routetbl 6008710 2898090K       - 14334297  32,64,128,256,512,2048
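
As a rough way to quantify the growth in the samples above, the following sketch compares the first and last lines for one malloc type. The function name malloc_growth is hypothetical (it is not part of any FreeBSD tool); it assumes only the `vmstat -m` column layout shown above, with InUse in the second column and MemUse (in K) in the third.

```shell
#!/bin/sh
# Hypothetical helper (name chosen for illustration only).
# Reads `vmstat -m` samples on stdin and reports how much the given
# malloc type grew between its first and last sample.
malloc_growth() {
    awk -v type="$1" '
        $1 == type {
            # "+ 0" strips the trailing K from the MemUse column.
            if (n == 0) { first_inuse = $2; first_kb = $3 + 0 }
            last_inuse = $2; last_kb = $3 + 0
            n++
        }
        END {
            if (n < 2) { print "need at least two samples"; exit 1 }
            printf "samples=%d inuse_delta=%d memuse_delta_kb=%d\n",
                n, last_inuse - first_inuse, last_kb - first_kb
        }'
}
```

Fed only the first (18:27:55) and last (18:43:56) routetbl lines from the output above, it prints `samples=2 inuse_delta=19918 memuse_delta_kb=9599`, i.e. about 9.4 MiB of growth in 16 minutes.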


After some investigation we found that the leak is triggered by reloading the PF rule set. A minimal test case with one rule and one table is enough to reproduce it: every time 'pfctl -f /etc/firewall/pf.conf' (or your own pf.conf file) is run, the routetbl usage shown above increases. Between reloads, memory usage stays stable.
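
The reproduction described above could be automated roughly as follows. This is only a sketch: the function name reload_and_watch and the PFCTL/VMSTAT override variables are assumptions added for illustration, the pf.conf path has to be adjusted to the local setup, and the reload itself needs root.

```shell
#!/bin/sh
# Hypothetical helper: reload a PF rule set COUNT times and print the
# routetbl line from `vmstat -m` after each reload.
# PFCTL and VMSTAT are illustrative overrides (e.g. for a dry run);
# they default to the real pfctl and vmstat commands.
reload_and_watch() {
    conf=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        # Stop at the first failed reload instead of looping on errors.
        ${PFCTL:-pfctl} -f "$conf" || return 1
        ${VMSTAT:-vmstat} -m | grep routetbl
        i=$((i + 1))
    done
}
```

For example, `reload_and_watch /etc/firewall/pf.conf 5` would be expected to print five routetbl lines; on an affected kernel the InUse and MemUse columns should increase from one line to the next.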

The following DTrace script can be used to observe the same behavior:

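#!/usr/sbin/dtrace -s
#pragma D option quiet

/* Usage: ./script2.d <malloc type>, e.g. routetbl; $1 selects the dtmalloc probes. */

BEGIN {
    printf("%-20s %20s %20s %20s\n", "FUNCTION", "ALLOCATED", "FREE", "TOTAL");
}

/* args[3] is the allocation size in bytes; probefunc is the malloc type name. */
dtmalloc::$1:malloc {
    @malloc[probefunc] = sum(args[3]);
    @total[probefunc] = sum(args[3]);
}

/* Frees are added to @free and subtracted from the running net total. */
dtmalloc::$1:free {
    @free[probefunc] = sum(args[3]);
    @total[probefunc] = sum(-args[3]);
}

/* Print the cumulative totals once per second. */
tick-1s {
    printa("%-20.20s %20@d %20@d %20@d\n", @malloc, @free, @total);
}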


Output:
# ./script2.d routetbl
FUNCTION  ALLOCATED    FREE   TOTAL
routetbl        512     512       0
...
routetbl      46592   46592       0
routetbl     808960  444416  364544
...
routetbl     861184  496640  364544
routetbl    1623552  894464  729088
...
routetbl    1641984  912896  729088
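
To make the per-reload step in the TOTAL column easier to see, the output above can be filtered down to the lines where the net total changes. The sketch below is an assumption-laden illustration: the function name total_steps is hypothetical, and it expects the four-column layout printed by the script above.

```shell
#!/bin/sh
# Hypothetical helper: read the DTrace script's output on stdin and
# print one line each time the TOTAL (net allocated) column changes.
total_steps() {
    awk '
        # Skip the header line and the "..." elision markers.
        $1 == "FUNCTION" || $1 == "..." { next }
        NF == 4 {
            total = $4 + 0
            if (seen && total != prev)
                printf "%s total %d -> %d (delta %d)\n", $1, prev, total, total - prev
            prev = total
            seen = 1
        }'
}
```

Applied to the output above, both reported steps have the same delta of 364544 bytes (0 to 364544, then 364544 to 729088), which is consistent with a fixed amount of memory being leaked per reload.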



Further diagnostics:

#!/usr/sbin/dtrace -s
fbt:kernel:rt_msg2:entry {
    @rt_msg2[stack()] = count();
}
fbt:kernel:rn_addroute:entry {
    @rn_addroute[stack()] = count();
}

Output:
              kernel`sysctl_rtsock+0x274
              kernel`sysctl_root+0x214
              kernel`userland_sysctl+0x1d8
              kernel`sys___sysctl+0x74
              kernel`amd64_syscall+0x334
              kernel`0xffffffff80900e0b
             2340

              kernel`sysctl_rtsock+0x64c
              kernel`sysctl_root+0x214
              kernel`userland_sysctl+0x1d8
              kernel`sys___sysctl+0x74
              kernel`amd64_syscall+0x334
              kernel`0xffffffff80900e0b
             4680

The problem was reported to Gleb Smirnoff, who was able to reproduce and confirm it.
Comment 1 commit-hook freebsd_committer freebsd_triage 2014-10-01 21:25:45 UTC
A commit references this bug:

Author: melifaro
Date: Wed Oct  1 21:24:59 UTC 2014
New revision: 272385
URL: https://svnweb.freebsd.org/changeset/base/272385

Log:
  Free radix mask entries on main radix destroy.
  This is temporary commit to be merged to 10.
  Other approach (like hash table) should be used
  to store different masks.

  PR:		194078
  Submitted by:	Rumen Telbizov
  MFC after:	3 days

Changes:
  head/sys/net/radix.c
Comment 2 commit-hook freebsd_committer freebsd_triage 2014-10-16 20:46:23 UTC
A commit references this bug:

Author: glebius
Date: Thu Oct 16 20:46:03 UTC 2014
New revision: 273185
URL: https://svnweb.freebsd.org/changeset/base/273185

Log:
  Merge r272385 by melifaro from head:
    Free radix mask entries on main radix destroy.
    This is temporary commit to be merged to 10.
    Other approach (like hash table) should be used
    to store different masks.

  PR:             194078

Changes:
_U  stable/10/
  stable/10/sys/net/radix.c
Comment 3 commit-hook freebsd_committer freebsd_triage 2014-10-16 23:03:36 UTC
A commit references this bug:

Author: glebius
Date: Thu Oct 16 23:03:05 UTC 2014
New revision: 273196
URL: https://svnweb.freebsd.org/changeset/base/273196

Log:
  Merge r273184, r273185 from stable/10:
    - Use rn_detachhead() instead of direct free(9) for radix tables.
    - Free radix mask entries on main radix destroy.

  PR:		194078
  Approved by:	re (gjb)

Changes:
_U  releng/10.1/
  releng/10.1/sys/net/radix.c
  releng/10.1/sys/netpfil/pf/pf_table.c
Comment 4 commit-hook freebsd_committer freebsd_triage 2016-03-24 03:09:38 UTC
A commit references this bug:

Author: bdrewery
Date: Thu Mar 24 03:08:39 UTC 2016
New revision: 297222
URL: https://svnweb.freebsd.org/changeset/base/297222

Log:
  Fix M_RTABLE memory leak from r274118 (11/2014).

  Replace free(M_RTABLE) with rn_detachhead() to match rn_inithead().

  This would trigger when reloading NFS exports and was similar to
  problems with pf reload [1].

  PR:		194078 [1]
  Sponsored by:	EMC / Isilon Storage Division

Changes:
  head/sys/kern/vfs_export.c