Bug 281471

Summary: ASLR: jemalloc RES memory keeps on increasing until process cores
Product: Base System Reporter: Rupesh Pilania <rupeshpilania>
Component: kern Assignee: freebsd-bugs (Nobody) <bugs>
Status: Open ---    
Severity: Affects Only Me CC: brooks, emaste, grahamperrin
Priority: ---    
Version: 13.3-RELEASE   
Hardware: arm64   
OS: Any   

Description Rupesh Pilania 2024-09-13 05:37:24 UTC
Hi Team,

I noticed that jemalloc does not bring RES memory back down to its original level after free() is called, even with these options set:
MALLOC_CONF="xmalloc:true,dirty_decay_ms:0,retain:false"
The only way to make it work is to disable ASLR at the kernel level, in addition to setting MALLOC_CONF="xmalloc:true,dirty_decay_ms:0,retain:false".

Test Results:

ASLR Enabled:

cat /etc/sysctl.conf | grep aslr
kern.elf32.aslr.enable=1
kern.elf32.aslr.pie_enable=1
kern.elf64.aslr.enable=1
kern.elf64.aslr.pie_enable=1
13.2-RELEASE-p12 FreeBSD 13.2-RELEASE-p12 MESSAGING_GATEWAY amd64


C600V-DUT018:rtestuser 16] ./mem-fragment
Hello!  This program will fragment its process heap.  Run top -p 5364 to follow along!
Press Enter to continue...

 PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
 5364 root          1  20    0    10M  2280K ttyin    6   0:00   0.00% mem-frag


500k 5KB chunks were just provisioned
Press Enter to continue...


  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
 5364 root          1  52    0  4898M  3034M ttyin    6   0:02   0.00% mem-frag

500k 5KB chunks were just provisioned
Press Enter to continue...

 PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
 5364 root          1  48    0  9816M  6064M ttyin    6   0:04   0.00% mem-frag

The first allocations were just free()'d.
Press Enter to continue...

  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
 5364 root          1  27    0  7266M  3541M ttyin    6   0:05   0.00% mem-frag

The 2nd allocations were just free()'d.
Press Enter to continue...

  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
 5364 root          1  28    0  4716M  1018M ttyin    6   0:05   0.00% mem-frag




ASLR Disabled:
cat /etc/sysctl.conf | grep aslr
kern.elf32.aslr.enable=0
kern.elf32.aslr.pie_enable=0
kern.elf64.aslr.enable=0
kern.elf64.aslr.pie_enable=0

C600V-DUT018:rtestuser 8] setenv MALLOC_CONF "xmalloc:true,dirty_decay_ms:0,retain:false"

C600V-DUT018:rtestuser 10] ./mem-fragment
Hello!  This program will fragment its process heap.  Run top -p 3884 to follow along!
Press Enter to continue...
 PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
 3884 root          1  20    0    16M  2152K ttyin    5   0:00   0.00% mem-frag
500k 5KB chunks were just provisioned
Press Enter to continue...
  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COM
 3884 root          1  44    0  2592M  2072M ttyin    5   0:02   0.00% mem
500k 5KB chunks were just provisioned
Press Enter to continue...
 PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COM
 3884 root          1  45    0  5168M  4142M ttyin    5   0:03   0.00% mem
The first allocations were just free()'d.
Press Enter to continue...
 PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COM
 3884 root          1  25    0  2618M  2097M ttyin    5   0:04   0.00% mem
The 2nd allocations were just free()'d.
Press Enter to continue...
 PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COM
 3884 root          1  26    0    68M    52M ttyin    5   0:04   0.00% mem
Comment 1 Rupesh Pilania 2024-09-13 05:41:44 UTC
The mem-fragment program is based on the example from https://engineering.linkedin.com/blog/2021/taming-memory-fragmentation-in-venice-with-jemalloc#:~:text=Jemalloc

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>
#include <stdlib.h>
 
 
// Define constants to make sure strings are not allocated at the top of the heap
#define HIT_ENTER  "Press Enter to continue...\n"
#define ALLOCATED  "500k 5KB chunks were just provisioned\n"
#define FREED      "The first allocations were just free()'d.\n"
#define FREED_NEXT "The 2nd allocations were just free()'d.\n"
 
 
void press_enter_to_continue(void) {
    printf(HIT_ENTER);
    getchar();
    return;
}
 
int main() {
    printf("Hello!  This program will fragment its process heap.  Run top -p %d to follow along!\n", getpid());
    press_enter_to_continue();
 
    int i;
    // Arbitrary value
    int ARRAY_SIZE = 5*1024*102;
 
    // These pointer arrays are variable-length arrays with automatic storage (roughly 4 MB each with 8-byte pointers),
    // so they live on the stack rather than the heap.  The pointers stored here refer to the small heap allocations
    // made below.  We keep the mapping so we can free them later.
    char *p1[ARRAY_SIZE];
    char *p2[ARRAY_SIZE];
 
    int mallocSize = 5 * 1024;
 
    for(i=0; i < ARRAY_SIZE; i++){
        // malloc in small chunks such that we are always below the mmap threshold for these allocations.
        p1[i] = malloc(mallocSize);
        // Write something to make sure the page is backed by physical RAM
        *p1[i] = 'a';
    }
 
    printf(ALLOCATED);
    press_enter_to_continue();
 
    for(i=0; i < ARRAY_SIZE; i++){
        // Again, malloc in small chunks such that we are always below the mmap threshold for these allocations
        p2[i] = malloc(mallocSize);
        *p2[i] = 'a';
    }
 
    printf(ALLOCATED);
    press_enter_to_continue();
 
    // Free the allocations
    for(i=0; i < ARRAY_SIZE; i++){
        free(p1[i]);
    }
 
    printf(FREED);
    press_enter_to_continue();
 
    // Free the allocations
    for(i=0; i < ARRAY_SIZE; i++){
        free(p2[i]);
    }
 
    printf(FREED_NEXT);
    press_enter_to_continue();
 
    return 0;
}
Comment 2 Rupesh Pilania 2024-09-13 05:43:01 UTC
The issue should exist on 13.3 and 13.4 as well. I will check and update the OS tag.
Comment 3 Brooks Davis freebsd_committer freebsd_triage 2024-09-13 07:51:22 UTC
There was a commit in June to reduce jemalloc-induced fragmentation, which describes a somewhat different scenario.  It has not been merged to any non-main branch, but might be relevant?

commit 268f19aacc6af8f64c438e8515213023a2e66ed7
Author: Alan Cox <alc@FreeBSD.org>
Date:   Sun Jun 9 11:58:27 2024 -0500

    vm: Reduce address space fragmentation
    
    jemalloc performs two types of virtual memory allocations: (1) large
    chunks of virtual memory, where the chunk size is a multiple of a
    superpage and explicitly aligned, and (2) small allocations, mostly
    128KB, where no alignment is requested.  Typically, it starts with a
    small allocation, and over time it makes both types of allocation.
    
    With anon_loc being updated on every allocation, we wind up with a
    repeating pattern of a small allocation, a large gap, and a large,
    aligned allocation.  (As an aside, we wind up allocating a reservation
    for these small allocations, but it will never fill because the next
    large, aligned allocation updates anon_loc, leaving a gap that will
    never be filled with other small allocations.)
    
    With this change, anon_loc isn't updated on every allocation.  So, the
    small allocations will be clustered together, the large allocations will
    be clustered together, and there will be fewer gaps between the
    anonymous memory allocations.  In addition, I see a small reduction in
    reservations allocated (e.g., 1.6% during buildworld), fewer partially
    populated reservations, and a small increase in 64KB page promotions on
    arm64.
    
    Reviewed by:    kib
    MFC after:      1 week
    Differential Revision:  https://reviews.freebsd.org/D39845
Comment 4 Rupesh Pilania 2024-09-13 11:14:57 UTC
(In reply to Brooks Davis from comment #3)
Thank you for your reply. I will try again after applying this patch.

Tested with FreeBSD 14.1; the issue is not seen there.

sysctl -a | grep aslr
kern.elf32.aslr.shared_page: 0
kern.elf32.aslr.stack: 0
kern.elf32.aslr.honor_sbrk: 0
kern.elf32.aslr.pie_enable: 0
kern.elf32.aslr.enable: 0
kern.elf64.aslr.shared_page: 1
kern.elf64.aslr.stack: 1
kern.elf64.aslr.honor_sbrk: 0
kern.elf64.aslr.pie_enable: 1
kern.elf64.aslr.enable: 1
vm.aslr_restarts: 0

rpilania@build-server-FBSD14:~ % ./mem-fragment
Hello!  This program will fragment its process heap.  Run top -p 52296 to follow along!
Press Enter to continue...
PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
52296 rpilania      1  34    0    10M  1792K ttyin    3   0:00   0.00% 
500k 5KB chunks were just provisioned
Press Enter to continue...
PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
52296 rpilania      1  58    0  2592M  2072M ttyin    3   0:02   0.00% 
500k 5KB chunks were just provisioned
Press Enter to continue...
PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
52296 rpilania      1  68    0  5168M  4142M ttyin    3   0:04   0.00% 
The first allocations were just free()'d.
Press Enter to continue...
PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
52296 rpilania      1  32    0  2618M  2097M ttyin    3   0:05   0.00% 
The 2nd allocations were just free()'d.
Press Enter to continue...
PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
52296 rpilania      1  31    0    68M    52M ttyin    3   0:05   0.00%
Comment 5 Rupesh Pilania 2024-09-13 13:16:30 UTC
(In reply to Brooks Davis from comment #3)
I applied the patch you suggested, but saw no difference.
https://github.com/freebsd/freebsd-src/commit/268f19aacc6af8f64c438e8515213023a2e66ed7
Comment 6 Rupesh Pilania 2024-09-13 14:37:39 UTC
This patch: https://github.com/freebsd/freebsd-src/commit/d8e6f4946cec0b84a6997d62e791b8cf993741b2

brings RES memory down to almost the same level as on FreeBSD 14.1 and FreeBSD 10.4. It seems to resolve the problem.

However, we run some applications with compat/10 libraries, and these dump core with out-of-memory errors on FreeBSD 13.2/13.0 systems.
Comment 7 Rupesh Pilania 2024-09-13 15:32:02 UTC
This patch: https://github.com/freebsd/freebsd-src/commit/d8e6f4946cec0b84a6997d62e791b8cf993741b2

brings RES memory down to almost the same level as on FreeBSD 14.1 and FreeBSD 10.4. It seems to resolve the problem.

Applying this patch resolves the RES memory issue, but it breaks compatibility with FreeBSD 10.4 libraries.
We run some applications with compat/10 libraries, and these dump core with out-of-memory errors on FreeBSD 13.2/13.0 after the patch is applied.
Comment 8 Rupesh Pilania 2024-09-13 18:19:03 UTC
The issue exists on 13.3 as well.
I will test on 13.4 and update.

FreeBSD rpilania 13.3-RELEASE FreeBSD 13.3-RELEASE releng/13.3-n257428-80d2b634ddf0 GENERIC amd64
root@rpilania:/home/rpilania # setenv MALLOC_CONF "xmalloc:true,dirty_decay_ms:0,retain:false"
root@rpilania:/home/rpilania # ./mem-frag
Hello!  This program will fragment its process heap.  Run top -p 810 to follow along!
Press Enter to continue...
  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
  810 root          1  20    0    10M  2128K ttyin    2   0:00   0.00% mem-frag


500k 5KB chunks were just provisioned
Press Enter to continue...
  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
  810 root          1  27    0  4898M  3033M ttyin    1   0:01   0.00% mem-frag

500k 5KB chunks were just provisioned
Press Enter to continue...

  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
  810 root          1  25    0  9816M  5260M ttyin    3   0:02   0.00% mem-frag

The first allocations were just free()'d.
Press Enter to continue...


  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
  810 root          1  31    0  7266M  3540M ttyin    3   0:02   0.00% mem-frag

The 2nd allocations were just free()'d.
Press Enter to continue...


  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
  810 root          1  23    0  4716M  1018M ttyin    3   0:02   0.00% mem-frag
Comment 9 Rupesh Pilania 2024-09-14 09:07:56 UTC
Issue exists in 13.4 as well.

FreeBSD rupesh2 13.4-RC3 FreeBSD 13.4-RC3 releng/13.4-n258255-087b246271b6 GENERIC amd64
root@rupesh2:/home/rpilania # setenv MALLOC_CONF "xmalloc:true,dirty_decay_ms:0,retain:false"
root@rupesh2:/home/rpilania # ./mem-fragment
Hello!  This program will fragment its process heap.  Run top -p 1027 to follow along!
Press Enter to continue...


  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WC
 1027 root          1  20    0    10M  2100K ttyin    2   0:00   0.0


500k 5KB chunks were just provisioned
Press Enter to continue...


  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WC
 1027 root          1  28    0  4898M  3034M ttyin    2   0:01   0.0

500k 5KB chunks were just provisioned
Press Enter to continue...

 PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WC
 1027 root          1  27    0  9816M  5191M ttyin    2   0:02   0.0


The first allocations were just free()'d.
Press Enter to continue...

  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WC
 1027 root          1  30    0  7266M  3541M ttyin    3   0:03   0.0

The 2nd allocations were just free()'d.
Press Enter to continue...


  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WC
 1027 root          1  23    0  4716M  1018M ttyin    3   0:03   0.0
Comment 10 Mark Linimon freebsd_committer freebsd_triage 2024-09-15 21:15:31 UTC
^Triage: brooks@ has commented, so move this out of "new".
Comment 11 Rupesh Pilania 2024-09-18 08:26:57 UTC
(In reply to Rupesh Pilania from comment #7)
We found that MALLOC_OPTIONS was set to X in our environment, which caused the cores with the above ASLR patch for the compat/10 libraries.
Setting MALLOC_OPTIONS="XDm" resolves the core issue with the compat/10 libraries.
However, I am not sure how that patch affected compat/10.