Bug 59912

Summary: mremap() implementation lacking
Product: Base System
Reporter: Zachary Amsden <zach>
Component: kern
Assignee: freebsd-bugs (Nobody) <bugs>
Status: Open
Severity: Affects Only Me
CC: doctorwhoguy, shawn.webb, thmu7, trasz
Priority: Normal
Version: 4.2-RELEASE   
Hardware: Any   
OS: Any   
Bug Depends on:    
Bug Blocks: 247219    

Description Zachary Amsden 2003-12-03 00:00:17 UTC
The kernel has no mremap() implementation, which may cause problems for
Linux emulation.

Fix: Differences ...


--- sys/kern/syscalls.master
+++ sys/kern/syscalls.master
@@ -522,3 +522,5 @@
                            struct kevent *eventlist, int nevents, \
                            const struct timespec *timeout); }
 364    STD     BSD     { int settaskgroup (int group); }
+365    STD     BSD     { caddr_t mremap(void *old_address, size_t old_size, size_t new_size, \
+                           unsigned long flags); }

--- sys/vm/vm_map.c
+++ sys/vm/vm_map.c
@@ -1011,6 +1011,219 @@
 } 

 /*
+ *     vm_map_extend:
+ *
+ *     Attempt to extend a specified address range
+ *
+ */
+int
+vm_map_extend(map, start, end, newend, flags)
+       vm_map_t map;
+       vm_offset_t *start; /* IN/OUT */
+       vm_offset_t end;
+       vm_offset_t newend;
+       int flags;
+{
+       vm_map_entry_t new_entry;
+       vm_map_entry_t prev_entry;
+       vm_ooffset_t offset;
+       vm_offset_t addr;
+       vm_size_t len;
+       int result;
+       int cow;
+       vm_object_t object;
+
+       if (map == kmem_map || map == mb_map)
+               return (KERN_INVALID_ARGUMENT);
+
+       vm_map_lock(map);
+       addr = *start;
+
+       /*
+        * Check that the start and end points are not bogus.
+        */
+
+       if ((addr < map->min_offset) || (newend > map->max_offset) ||
+           (addr >= end) || (end > newend)) {
+               result = KERN_INVALID_ADDRESS;
+               goto err;
+       }
+
+       /*
+        * Find the entry based on the start address
+        */
+       if (!vm_map_lookup_entry(map, addr, &prev_entry))
+               prev_entry = prev_entry->next;
+
+       /*
+        * Ensure that the start and end occurs in the entry
+        */
+       if ((prev_entry == &map->header) || (prev_entry->end < end) ||
+           (prev_entry->start > addr)) {
+               result = KERN_INVALID_ADDRESS;
+               goto err;
+       }
+       object = prev_entry->object.vm_object;
+       
+  
+       /*
+        * Assert that the next entry doesn't overlap the new end point,
+        * and that the current entry ends at the specified region.
+        */
+       if (((prev_entry->next != &map->header) &&
+            (prev_entry->next->start < newend)) ||
+           (prev_entry->end > end)) {
+               /*
+                * If we are not allowed to move the range, fail
+                */
+               if ((flags & MREMAP_MAYMOVE) == 0) {
+                       result = KERN_NO_SPACE;
+                       goto err;
+               }
+
+               /*
+                * Reverse the eflags to COW arguments.  Ugh.
+                */
+               cow = 0;
+               if ((prev_entry->eflags & MAP_ENTRY_COW) &&
+                   (prev_entry->eflags & MAP_ENTRY_NEEDS_COPY))
+                       cow |= MAP_COPY_ON_WRITE;
+               if (prev_entry->eflags & MAP_ENTRY_NOFAULT)
+                       cow |= MAP_NOFAULT;
+               if (prev_entry->eflags & MAP_ENTRY_NOSYNC)
+                       cow |= MAP_DISABLE_SYNCER;
+               if (prev_entry->eflags & MAP_ENTRY_NOCOREDUMP)
+                       cow |= MAP_DISABLE_COREDUMP;
+       
+               /*
+                * Search for a new range using the old address as a
+                * hint.  Return address in start.
+                */
+               len = newend - addr;
+               *start = pmap_addr_hint(object, addr, len);
+               if (vm_map_findspace(map, *start, len, start)) {
+                       result = KERN_NO_SPACE;
+                       goto err;
+               }
+               result = vm_map_insert(map, object, prev_entry->offset,
+                                    *start, *start + len, prev_entry->protection,
+                                    prev_entry->max_protection, cow);
+               if (result == 0) {
+                       vm_map_lookup_entry(map, *start + len, &new_entry);
+                       if (!new_entry) {
+                               /* Impossible */
+                               vm_map_remove(map, *start, *start + len);
+                               result = KERN_INVALID_ADDRESS;
+                               goto err;
+                       }
+                       if (object)
+                               vm_object_reference(object);
+                       /*
+                        * Found a new region to place this block.  Copy
+                        * the page map or fault the pages into place.
+                        * We do this ourselves, since we don't want to
+                        * trigger COW protection on the page - we are just
+                        * relocating prev_entry.  Deallocating the old map
+                        * also must be done by hand.
+                        *
+                        * First, clip the old region out of the possible
+                        * coalesced entry.
+                        */
+                       vm_map_clip_start(map, prev_entry, addr);
+                       vm_map_clip_end(map, prev_entry, end);
+                       if (prev_entry->wired_count == 0)
+                               pmap_copy(map->pmap, map->pmap, new_entry->start,
+                                         len, prev_entry->start);
+                       else {
+                               vm_fault_copy_entry(map, map, new_entry, prev_entry);
+                               vm_map_entry_unwire(map, prev_entry);
+                       }
+                       if ((object != kernel_object) &&
+                           (object != kmem_object))
+                               pmap_remove(map->pmap, prev_entry->start, prev_entry->end);
+                       vm_map_entry_delete(map, prev_entry);
+                       vm_map_simplify_entry(map, new_entry);
+                       result = KERN_SUCCESS;
+                       goto err;
+               } else {
+                       result = KERN_NO_SPACE;
+               }
+               goto err;
+       }
+
+       offset = prev_entry->offset;
+       if ((prev_entry->wired_count == 0) &&
+           ((object == NULL) ||
+            vm_object_coalesce(object,
+                               OFF_TO_IDX(prev_entry->offset),
+                               (vm_size_t)(prev_entry->end - prev_entry->start),
+                               (vm_size_t)(newend - prev_entry->end)))) {
+               /*
+                * We were able to extend the object.  Determine if we
+                * can extend the previous map entry to include the
+                * new range as well.
+                */
+               if (prev_entry->inheritance == VM_INHERIT_DEFAULT) {
+                       map->size += (newend - prev_entry->end);
+                       prev_entry->end = newend;
+                       result = KERN_SUCCESS;
+                       goto err;
+               } 
+               offset = prev_entry->offset +
+                       (prev_entry->end - prev_entry->start);
+       }
+               
+       /*
+        * If we couldn't extend the object or map for any reason,
+        * we are going to reuse the vm_object from the previous map
+        * entry, so refcount it.
+        */
+       if (object) {
+               vm_object_reference(object);
+               vm_object_clear_flag(object, OBJ_ONEMAPPING);
+       }
+                       
+       /*
+        * Create a new map entry
+        */
+                               
+       new_entry = vm_map_entry_create(map);
+       new_entry->start = end;
+       new_entry->end = newend;   
+                               
+       new_entry->eflags = prev_entry->eflags;
+       new_entry->object.vm_object = prev_entry->object.vm_object;
+       new_entry->offset = offset;
+       new_entry->avail_ssize = 0;
+         
+        new_entry->inheritance = VM_INHERIT_DEFAULT;
+        new_entry->protection = prev_entry->protection;
+        new_entry->max_protection = prev_entry->max_protection;
+        new_entry->wired_count = 0;
+
+        /*
+         * Insert the new entry into the list
+         */
+                       
+        vm_map_entry_link(map, prev_entry, new_entry);
+        map->size += new_entry->end - new_entry->start;
+                       
+        /*
+         * Update the free space hint
+         */
+        if ((map->first_free == prev_entry) &&
+            (prev_entry->end >= new_entry->start)) {
+                map->first_free = new_entry;
+        }
+       result = KERN_SUCCESS;
+                       
+err:
+       vm_map_unlock(map);
+                 
+       return (result);
+}
+        
+/*
  *     vm_map_madvise:
  *
  *     This routine traverses a processes map handling the madvise

--- sys/vm/vm_mmap.c
+++ sys/vm/vm_mmap.c
@@ -152,6 +152,93 @@
 }
 #endif                         /* COMPAT_43 || COMPAT_SUNOS */
                
+/*
+ * Memory remap (mremap) system call.  Old address must be page  
+ * aligned.  If the MREMAP_MAYMOVE flag is specified, the pages
+ * may be automatically moved to a new location.
+ */
+#ifndef _SYS_SYSPROTO_H_
+struct mremap_args {
+       void *old_address;
+       size_t old_size;
+       size_t new_size;
+       unsigned long flags;
+};
+#endif
+                               
+int
+mremap(p, uap)
+       struct proc *p;
+       register struct mremap_args *uap;
+{
+       vm_offset_t addr;
+       vm_size_t osize, nsize;
+       vm_map_t map;
+       int error;
+        
+       addr = (vm_offset_t) uap->old_address;
+       /*
+        * Must be page aligned
+        */
+       if (trunc_page(addr) != addr)
+               return (EINVAL);
+           
+       if (uap->flags & ~MREMAP_MAYMOVE)
+               return (EINVAL);
+       
+       osize = round_page((vm_offset_t)uap->old_size);
+       nsize = round_page((vm_offset_t)uap->new_size);
+       if (osize == 0)
+               return (EINVAL);
+
+       /*
+        * Check for illegal addresses.  Watch out for address wrap... Note
+        * that VM_*_ADDRESS are not constants due to casts (argh).
+        */
+       if (VM_MAXUSER_ADDRESS > 0 && addr + nsize > VM_MAXUSER_ADDRESS)
+               return (EINVAL);
+#ifndef i386
+       if (VM_MIN_ADDRESS > 0 && addr < VM_MIN_ADDRESS)
+               return (EINVAL);
+#endif
+
+       /*
+        * nothing to do
+        */
+       if (nsize == osize)
+               return (0);
+
+       map = &p->p_vmspace->vm_map;
+
+       /*
+        * Shrink case
+        */
+       if (nsize < osize) {
+               /*
+                * Make sure entire range is allocated.
+                */
+               if (!vm_map_check_protection(map, addr, addr + osize, VM_PROT_NONE))
+                       return (EINVAL);
+               /* returns nothing but KERN_SUCCESS anyway */
+               (void) vm_map_remove(map, addr + nsize, addr + osize);
+               p->p_retval[0] = nsize ? (register_t) addr : 0;
+               return (0);
+       }
+
+       error = vm_map_extend(map, &addr, addr + osize, addr + nsize, uap->flags);
+       switch (error) {
+       case KERN_SUCCESS:
+               p->p_retval[0] = addr;
+               return (0);
+       case KERN_NO_SPACE:
+               return (ENOMEM);
+       case KERN_PROTECTION_FAILURE:
+               return (EACCES);
+       case KERN_INVALID_ADDRESS:
+       default:  
+               return (EINVAL);
+       }
+}
         
 /*
  * Memory Map (mmap) system call.  Note that the file offset

--- sys/vm/vm_map.h
+++ sys/vm/vm_map.h
@@ -356,6 +356,7 @@
 int vm_map_inherit __P((vm_map_t, vm_offset_t, vm_offset_t, vm_inherit_t));
 void vm_map_init __P((struct vm_map *, vm_offset_t, vm_offset_t));   
 int vm_map_insert __P((vm_map_t, vm_object_t, vm_ooffset_t, vm_offset_t, vm_offset_t, vm_prot_t, vm_prot_t, int));
+int vm_map_extend __P((vm_map_t, vm_offset_t *, vm_offset_t, vm_offset_t, int));
 int vm_map_lookup __P((vm_map_t *, vm_offset_t, vm_prot_t, vm_map_entry_t *, vm_object_t *,
     vm_pindex_t *, vm_prot_t *, boolean_t *));
 void vm_map_lookup_done __P((vm_map_t, vm_map_entry_t));

--- sys/sys/mman.h        Thu Mar 20 15:34:49 2003
+++ sys/sys/mman.h   Mon Dec 16 17:29:04 2002
@@ -123,6 +123,11 @@
 #define        MINCORE_REFERENCED_OTHER 0x8 /* Page has been referenced */
 #define        MINCORE_MODIFIED_OTHER  0x10 /* Page has been modified */
 
+/*
+ * Flags for mremap
+ */
+#define MREMAP_MAYMOVE         0x100 /* Region may be moved in memory */
+
 #ifndef _KERNEL
 
 #include <sys/cdefs.h>
@@ -148,6 +153,7 @@
 int    mincore __P((const void *, size_t, char *));
 int    minherit __P((void *, size_t, int));
 #endif
+caddr_t mremap __P((void *, size_t, size_t, unsigned long));
 __END_DECLS
 
 #endif /* !_KERNEL */
Comment 1 Bruce M Simpson freebsd_committer freebsd_triage 2003-12-03 05:57:49 UTC
Responsible Changed
From-To: freebsd-bugs->bms

I'll look at this.
Comment 2 Bruce M Simpson freebsd_committer freebsd_triage 2006-08-02 15:19:02 UTC
Responsible Changed
From-To: bms->freebsd-bugs

Back to the free pool
Comment 3 K. Macy freebsd_committer freebsd_triage 2007-11-16 00:32:04 UTC
Responsible Changed
From-To: freebsd-bugs->alc


alc's area 
alc - please reassign to me if you don't think you can get to it
Comment 4 Alexander Best freebsd_committer freebsd_triage 2010-07-27 22:16:03 UTC
i'm forwarding this conversation i had with alc@, because it reveals some 
details about the issues described in this PR which haven't been mentioned 
beforehand. this should help any developer deciding to work on this PR.

cheers.
alex

----- Forwarded message from Alan Cox <alc@cs.rice.edu> -----

Date: Mon, 26 Jul 2010 17:34:47 -0500
From: Alan Cox <alc@cs.rice.edu>
To: Alexander Best <arundel@freebsd.org>

I wasn't aware of the Google linker using mremap().  I would suggest 
that you add most of this e-mail to the PR for posterity.

[snip]
Alexander Best wrote:
>On Sun Jul 25 10, Alan Cox wrote:
>  
>>Alexander Best wrote:
>>    
>>>On Wed, Jul 21, 2010 at 04:20:41PM -0500, Alan Cox wrote:

[snip]

>>>i also stumbled over kern/59912 which provides
>>>patches for a mremap() syscall.
>>>
>>>in 2000 such a syscall was discussed and the result was that freebsd
>>>doesn't need mremap(), because doing munmap() and mmap() again is
>>>sufficient. [1]
>>>
>>>matthew dillon wrote:
>>>
>>>   "There are a thousand ways to do it, which is why linux's mremap() 
>>>   syscall is stupid.
>>>
>>>   * simply mmap() a larger block in the first place.  For example,
>>>     if you have a 16K file mmap() 1MB.  You will seg fault on pages
>>>     that are beyond the file EOF, but those pages will become valid
>>>     the moment the file is extended into them without having to lift
>>>     a finger.
>>>
>>>   * mmap() the tail end of the newly extended file without removing or
>>>     overwriting the previous mmap, by specifying an absolute address.
>>>
>>>   * munmap() and re-mmap() the file.
>>>
>>>   * Don't depend on a single monolithic mmap(), it won't work for files
>>>     larger than 2-3GB anyway (on intel architecture), instead mmap the
>>>     file in chunks on an as-needed basis."
>>>
>>>my question is: should this pr remain open or is there a consensus that
>>>freebsd doesn't need mremap()? it looks like linux mremap() can be
>>>perfectly simulated in the linuxulator with munmap()/mmap(). so the point
>>>that freebsd having mremap() would improve linux compatibility isn't
>>>really valid.
>>>
>>> 
>>>      
>>First, do a web search for the e-mail thread on the FreeBSD lists with 
>>the subject "perl malloc slow?".
>>    
>
>thanks a lot. i wasn't aware of the perl performance issue back then and
>that the root cause for it may have been an excessive use of
>munmap()/mmap() whereas linux was getting better performance due to
>mremap().
>
>  

[snip]

>>What if there is no file, that is, what if you want to relocate anonymous
>>memory?  As I understand mremap(), it can relocate anonymous memory for
>>which there is no backing file.
>>    
>
>indeed matt didn't comment on the situation you just described. it seems
>this would be a scenario where mremap() could really pose a huge benefit
>to any OS which has implemented it.
>
>  
>>It's not clear to me that mremap() is entirely without merit.  However,
>>it's never really been a priority for anyone with the necessary skill
>>set.  I haven't looked at this patch in ages, but I seem to recall that
>>there were corner cases that it didn't handle.
>>    
>
>the only bsd which has implemented mremap() is netbsd [1].  however the
>semantics are quite different to the ones of the linux implementation of
>mremap().  also the manual mentions that netbsd's mremap() is based upon
>mremap() from their linuxulator code.
>
>i'm not sure if that means that they implemented it properly or simply made
>their mremap() to wrap calls into munmap() and mmap().
>
>  
>>I'm content to leave it idle.  I don't see a compelling reason to close it.
>>    
>
>yeah maybe somebody wants to port an existing implementation of mremap() to
>freebsd. the one that's in linux seems to exist since 1995 and thus should
>be quite solid. however there don't seem to exist a lot of benchmark
>results for this scenario. i found [2], but that's still based on the old
>phkmalloc.
>
>it would be nice e.g. to see the linux and netbsd implementations of
>mremap() compete against each other. maybe some other OSes also implement
>it and could be included in the benchmarks.
>
>this is the one from MacOS X 10.5.7 btw:
>
>int
>mremap(void)
>{
>    /* Not yet implemented */
>    return (ENOTSUP);
>}
>
>*hehehe*
>
>i also saw that the GNU binutils include a very rough implementation of
>mremap() [3]. this only occurs in versions of binutils which feature the
>GOLD linker created by google. since the GOLD linker relies upon mremap(),
>the binutils have to make sure that the new linker can be used even on
>systems that don't have mremap().
>
>thanks for clearing things up.
>
>cheers.
>alex
>
>[1] http://www.freebsd.org/cgi/man.cgi?query=mremap&apropos=0&sektion=0&manpath=NetBSD+5.0&format=html
>[2] http://www.dent.med.uni-muenchen.de/~wmglo/malloc-slides.html
>[3] http://grok.x12.su/source/xref/dragonfly/contrib/binutils-2.20/gold/mremap.c
>
>  
>>Regards,
>>Alan
>>
>>    


----- End forwarded message -----
Comment 5 Alexander Best freebsd_committer freebsd_triage 2010-07-30 00:19:49 UTC
State Changed
From-To: open->suspended

Alan considers mremap() to be a reasonable addition to FreeBSD. However, neither
he nor any other developer is working on this issue at the moment, so the PR is
set to the suspended state. Developers willing to work on this PR should
contact Alan.
Comment 6 Alexander Best freebsd_committer freebsd_triage 2011-11-10 20:29:52 UTC
Responsible Changed
From-To: alc->freebsd-bugs

Alan isn't working on this PR anymore -- assign it back into the pool.
Comment 7 Eitan Adler freebsd_committer freebsd_triage 2018-05-20 23:50:38 UTC
For bugs matching the following conditions:
- Status == In Progress
- Assignee == "bugs@FreeBSD.org"
- Last Modified Year <= 2017

Do
- Set Status to "Open"
Comment 8 Edward Tomasz Napierala freebsd_committer freebsd_triage 2021-01-14 12:58:13 UTC
FWIW, this is what breaks apt(8); it fails with:

E: Dynamic MMap ran out of room. Please increase the size of APT::Cache-Start.

The workaround is to:

echo "APT::Cache-Start 251658240;" >> "/compat/ubuntu/etc/apt/apt.conf.d/00freebsd"

I'm considering adding this workaround to sysutils/debootstrap; apt(8) seems to be the only thing I've seen that's affected by the mremap problem.
Comment 9 Thomas Mueller 2021-11-19 12:56:24 UTC
I'm observing linux_mremap() calls issued by RPM 4.16 failing with -12 (ENOMEM).
Comment 10 Shawn Webb 2022-01-08 01:00:41 UTC
(In reply to Thomas Mueller from comment #9)
FreeBSD's linuxulator's mremap doesn't support extending mappings. It only supports shrinking.
Comment 11 Shawn Webb 2022-01-08 01:01:55 UTC
The Cross-DSO CFI implementation in llvm favors mremap, though it does contain a fallback in the case mremap doesn't exist. If FreeBSD were to provide an official mremap syscall implementation, both the linuxulator and llvm's Cross-DSO CFI implementation would benefit.