Bug 188911 - [kern] sysctl(KERN_PROC_VMMAP) takes too long
Summary: [kern] sysctl(KERN_PROC_VMMAP) takes too long
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: Unspecified
Hardware: Any Any
Importance: Normal, Affects Only Me
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-04-23 10:40 UTC by Ivan Kosarev
Modified: 2014-07-25 19:05 UTC
CC List: 4 users

See Also:


Attachments
suggested change (62.47 KB, patch)
2014-07-04 09:55 UTC, Konstantin Belousov
corrected patch, unrelated changes removed (3.63 KB, patch)
2014-07-04 13:30 UTC, Konstantin Belousov

Description Ivan Kosarev 2014-04-23 10:40:00 UTC
With a mmap() call with a large 'size' parameter a subsequent
sysctl(KERN_PROC_VMMAP) call takes too long to perform.

Fix: According to kib@, this is because we compute the RSS accurately, which means we have to visit every page
in the range. I don't think you need the RSS. To confirm this theory, can you try this patch?

This should disable the RSS computation.

Index: sys/kern/kern_proc.c
===================================================================
--- sys/kern/kern_proc.c        (revision 265931)
+++ sys/kern/kern_proc.c        (working copy)
@@ -2182,7 +2182,7 @@ kern_proc_vmmap_out(struct proc *p, struct sbuf *s
                }
                kve->kve_resident = 0;
                addr = entry->start;
-               while (addr < entry->end) {
+               while (0 && addr < entry->end) {
                        locked_pa = 0;
                        mincoreinfo = pmap_mincore(map->pmap, addr, &locked_pa);
                        if (locked_pa != 0)
How-To-Repeat: 
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/sysctl.h>
#include <sys/user.h>

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    int mib[4];
    size_t size;
    int err;
    void *p;

    printf("#1\n");

    /*
     * Create an 8 TB + 4 KB anonymous mapping at a fixed address.
     * No page in it is ever touched.
     */
    p = mmap((void *)0x3ffffffff000, 0x80000001000,
        PROT_READ | PROT_WRITE,
        MAP_PRIVATE | MAP_ANON | MAP_FIXED | MAP_NORESERVE,
        -1, 0);
    assert(p != MAP_FAILED);

    printf("#2\n");

    /* kern.proc.vmmap.<pid>: describe this process's memory map. */
    mib[0] = CTL_KERN;
    mib[1] = KERN_PROC;
    mib[2] = KERN_PROC_VMMAP;
    mib[3] = getpid();

    size = 0;
    err = sysctl(mib, 4, NULL, &size, NULL, 0);  /* takes about 40 seconds */
    assert(err == 0);

    printf("#3\n");

    return EXIT_SUCCESS;
}
Comment 1 Ed Maste freebsd_committer freebsd_triage 2014-05-13 03:33:41 UTC
Yes, this is the cause of the slowdown; it takes a long time to
iterate over 8 TB, 4 KB at a time. I confirmed this by commenting out the loop,
as in the proof-of-concept patch.
Comment 2 ik 2014-05-15 14:10:46 UTC
Yes, for our purposes we don't need the resident page count, so if
we could somehow avoid executing the loop, that would solve the problem.
Thanks.
Comment 3 David Chisnall freebsd_committer freebsd_triage 2014-07-01 12:43:41 UTC
To provide some context, this is currently a blocker in getting the clang sanitizers working on FreeBSD.
Comment 4 Konstantin Belousov freebsd_committer freebsd_triage 2014-07-04 09:55:23 UTC
Created attachment 144398 [details]
suggested change

The existing calculation of the resident page count in kern_proc_vmmap_out() does not make sense. It counts the number of installed PTEs in the specified range, which can be less than the number of resident pages if some pages have not been faulted on yet (i.e. the soft-fault case).

The patch does two things:
1. it adds a tunable, the sysctl kern.proc_vmmap_skip_resident_count, that disables the calculation of the resident count entirely;
2. it changes the algorithm to count the pages that are resident for a read fault;
as a consequence, COW copy allocations are counted as resident, although they
really are not.

I am on the fence about disabling the calculation by default; the patch does disable it.

One interesting consequence of the new algorithm is that the provided test case completes in zero time even with the residency count calculation enabled. The reason is that a mapping which was never faulted on has no backing object. As a result, the loop is not executed at all.

If I change the test case to touch at least one page in the mmap()ed range before
calling sysctl, the runtime is around 30 seconds on my i7 2600K.
Comment 5 Konstantin Belousov freebsd_committer freebsd_triage 2014-07-04 13:30:40 UTC
Created attachment 144404 [details]
corrected patch, unrelated changes removed
Comment 6 Ivan Kosarev 2014-07-09 16:33:25 UTC
I confirm that current/11.0 with the kernel patch applied solves the issue for both the isolated test case above and LLVM's AddressSanitizer tests, although the tests take a bit longer to pass than on stable/9.2 with the workaround patch that reads /dev/kmem.
Comment 7 commit-hook freebsd_committer freebsd_triage 2014-07-09 19:12:03 UTC
A commit references this bug:

Author: kib
Date: Wed Jul  9 19:11:57 UTC 2014
New revision: 268466
URL: http://svnweb.freebsd.org/changeset/base/268466

Log:
  Current code in sysctl proc.vmmap, whose intent is to calculate the
  amount of resident pages, in fact calculates the amount of installed
  pte entries in the region.  Resident pages which were not soft-faulted
  yet are not counted.

  Calculate the amount of resident pages by looking in the objects chain
  backing the region.

  Add a knob to disable the residency calculation at all.  For large
  sparse regions, either the previous or the updated algorithm runs for too
  long, while several introspection tools do not need the (advisory) RSS
  value at all.

  PR:	kern/188911
  Sponsored by:	The FreeBSD Foundation
  MFC after:	1 week

Changes:
  head/sys/kern/kern_proc.c