Bug 22441

Summary: pmap_growkernel() is not effective at kernel vm limit of 0xffc00000
Product: Base System
Reporter: carp <carp>
Component: i386
Assignee: freebsd-bugs (Nobody) <bugs>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: 4.1.1-RELEASE   
Hardware: Any   
OS: Any   

Description carp 2000-10-31 14:10:01 UTC
When called with a kernel vm limit of 0xffc00000, pmap_growkernel()
does not set up the page mapping hardware because of an overflow.

In this line:

  addr = (addr + PAGE_SIZE * NPTEPG) & ~(PAGE_SIZE * NPTEPG - 1);
  ( Line 1403 in $FreeBSD: src/sys/i386/i386/pmap.c,v 1.250.2.5 2000/08/05 00:39:08 peter Exp $)

addr + PAGE_SIZE * NPTEPG overflows to zero when addr is 0xffc00000.

When addr is zero, the conditional in the following while loop is
never satisfied and the hardware page mapping data structures are
never initialized to cover the extended kernel vm.
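The overflow and its effect on the loop can be reproduced in userland. This is a hypothetical model, not kernel code: it assumes a 32-bit vm_offset_t (modelled as uint32_t), uses the PAGE_SIZE and NPTEPG values quoted in this report under made-up SIM_* names, and reduces the loop to its `while (kernel_vm_end < addr)` test. The helpers old_round() and pde_iterations() are illustrative names only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the i386 values quoted in the report:
 * PAGE_SIZE = 4096 and NPTEPG = 1024, so one page directory entry
 * maps PAGE_SIZE * NPTEPG = 4 Mbytes (0x400000). */
#define SIM_PAGE_SIZE	4096u
#define SIM_NPTEPG	1024u
#define SIM_PDE_SPAN	(SIM_PAGE_SIZE * SIM_NPTEPG)

/* Assumption: vm_offset_t is 32 bits wide, as on non-PAE i386, so
 * unsigned arithmetic wraps modulo 2^32 just as in the kernel. */
typedef uint32_t sim_vm_offset_t;

/* The pre-patch rounding expression from pmap_growkernel(). */
static sim_vm_offset_t
old_round(sim_vm_offset_t addr)
{
	return ((addr + SIM_PDE_SPAN) & ~(SIM_PDE_SPAN - 1));
}

/* Count the passes of a loop shaped like
 * `while (kernel_vm_end < addr)`, where each pass adds one 4 Mbyte
 * page directory entry. kernel_vm_end must be 4 Mbyte aligned and
 * addr must be at most 0xffc00000 so the increment cannot wrap. */
static unsigned
pde_iterations(sim_vm_offset_t kernel_vm_end, sim_vm_offset_t addr)
{
	unsigned n = 0;

	assert(kernel_vm_end % SIM_PDE_SPAN == 0);
	while (kernel_vm_end < addr) {
		kernel_vm_end += SIM_PDE_SPAN;
		n++;
	}
	return (n);
}
```

Under these assumptions old_round(0xffc00000) yields 0, so pde_iterations(start, 0) returns 0 for any start value and no page directory entries are added, while an unaligned limit such as 0xff801000 still rounds to 0xffc00000 and is handled.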

I would also like to suggest that the test of kernel_vm_end in
vm_map_findspace() (src/sys/vm/vm_map.c) is a minor software
layering violation. kernel_vm_end is properly a pmap layer variable,
and the test against kernel_vm_end should occur in pmap_growkernel().
I realize that making this change might appear to impede performance,
but pmap_growkernel() is only called during kernel initialization,
when the kernel is setting up sub-maps of the kernel map (kernel_map).

Fix: 

In line 1403, I used a standard rounding macro and all appears well.

For instance, I have compiled and tested the kernel with:

  addr = roundup(addr, (PAGE_SIZE*NPTEPG));
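A minimal sketch of the proposed rounding, assuming 32-bit addresses. The two macros are written in the same form as roundup() and roundup2() in the BSD <sys/param.h>, but under hypothetical SIM_ names so the example compiles on its own; fixed_round() and fixed_round2() are illustrative wrappers, not functions from pmap.c.

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as roundup()/roundup2() in BSD <sys/param.h>, renamed so the
 * example stands alone. roundup() works for any y; roundup2() requires y
 * to be a power of two. Both leave x unchanged if it is already a
 * multiple of y. */
#define SIM_ROUNDUP(x, y)	((((x) + ((y) - 1)) / (y)) * (y))
#define SIM_ROUNDUP2(x, y)	(((x) + ((y) - 1)) & (~((y) - 1)))

#define SIM_PDE_SPAN	(4096u * 1024u)		/* PAGE_SIZE * NPTEPG */

/* Division-based rounding, as in the reporter's tested fix. */
static uint32_t
fixed_round(uint32_t addr)
{
	/* 0xffc00000 + 0x3fffff = 0xffffffff still fits in 32 bits, so the
	 * limit from this report no longer wraps to zero. */
	return (SIM_ROUNDUP(addr, SIM_PDE_SPAN));
}

/* Mask-based rounding; valid only because 4 Mbytes is a power of two. */
static uint32_t
fixed_round2(uint32_t addr)
{
	assert((SIM_PDE_SPAN & (SIM_PDE_SPAN - 1)) == 0);
	return (SIM_ROUNDUP2(addr, SIM_PDE_SPAN));
}
```

Both variants return 0xffc00000 for a limit of 0xffc00000 and for an unaligned limit such as 0xffb01000; the mask form avoids a division but depends on the power-of-two span.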

Now that I have typed all of this, it appears that the kernel
is being extended by an extra 4 Mbytes whenever the limit submitted
to pmap_growkernel() falls on a 4 Mbyte (i386 page directory) boundary.
The kernel correctly rounds its vm limit up to the next page, so in
most cases the limit submitted to pmap_growkernel() falls on a page
boundary rather than on a 4 Mbyte boundary.
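The extra 4 Mbytes can be checked by counting how many page directory entries each rounding asks for when the limit is already 4 Mbyte aligned. This is a hypothetical userland model assuming 32-bit addresses; the starting kernel_vm_end of 0xc1000000 used below is made up, and old_round(), new_round() and pdes_needed() are illustrative helpers rather than anything in pmap.c.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_PDE_SPAN	(4096u * 1024u)		/* PAGE_SIZE * NPTEPG */

/* Pre-patch expression: always adds a full 4 Mbytes before masking. */
static uint32_t
old_round(uint32_t a)
{
	return ((a + SIM_PDE_SPAN) & ~(SIM_PDE_SPAN - 1));
}

/* Round-up that leaves an already aligned value alone. */
static uint32_t
new_round(uint32_t a)
{
	return ((a + SIM_PDE_SPAN - 1) & ~(SIM_PDE_SPAN - 1));
}

/* Page directory entries (and so page table pages) needed to move a
 * 4 Mbyte aligned kernel_vm_end up to an already rounded limit. */
static uint32_t
pdes_needed(uint32_t kernel_vm_end, uint32_t rounded_limit)
{
	assert(kernel_vm_end % SIM_PDE_SPAN == 0);
	if (rounded_limit <= kernel_vm_end)
		return (0);
	return ((rounded_limit - kernel_vm_end) / SIM_PDE_SPAN);
}
```

For a kernel_vm_end of 0xc1000000 and an aligned limit of 0xc1400000, pdes_needed() reports 2 with the old expression and 1 with the fixed one; for an unaligned limit such as 0xc1401000 both expressions round to 0xc1800000.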
How-To-Repeat: 
I am writing a driver that creates kernel vm up against the limit
of kernel vm and therefore requests that the kernel vm grow to 
0xffc00000.  I could submit a sample of the driver, but paper and
pencil should convince the reader that when PAGE_SIZE is 4096 and 
NPTEPG is 1024, 0xffc00000 + (4096 * 1024) overflows to zero.
Comment 1 David Greenman 2000-10-31 14:23:19 UTC
>When called with a kernel vm limit of 0xffc00000, pmap_growkernel()
>does not set up the page mapping hardware because of an overflow.
>
>In this line:
>
>  addr = (addr + PAGE_SIZE * NPTEPG) & ~(PAGE_SIZE * NPTEPG - 1);
>  ( Line 1403 in $FreeBSD: src/sys/i386/i386/pmap.c,v 1.250.2.5 2000/08/05 00:39:08 peter Exp $)
>
>addr + PAGE_SIZE * NPTEPG overflows to zero when addr is 0xffc00000.

   I might not be understanding what you're doing exactly, but I should point
out that the alternate page table map (APTmap) starts at 0xffc00000, so I
don't see how you could ever use that area of virtual memory without serious
problems. What am I missing?

-DG

David Greenman
Co-founder, The FreeBSD Project - http://www.freebsd.org
President, TeraSolutions, Inc. - http://www.terasolutions.com
Pave the road of life with opportunities.
Comment 2 carp 2000-10-31 15:20:20 UTC
   cc: freebsd-gnats-submit@FreeBSD.ORG
   From: David Greenman <dg@root.com>
   Reply-To: dg@root.com
   Date: Tue, 31 Oct 2000 06:23:19 -0800
   Sender: dg@implode.root.com

      I might not be understanding what you're doing exactly, 

I am not sure that what I am doing is strictly necessary, but
I can successfully allocate kernel vm this way in BSDI.

   but I
   should point out that the alternate page table map (APTmap) starts
   at 0xffc00000, so I don't see how you could ever use that area of
   virtual memory without serious problems. What am I missing?

That's good to know.

The problem is that 0xffc00000 is passed as a limit to pmap_growkernel().
I am not using 0xffc00000 itself; I am using memory up to 0xffc00000,
which is the advertised limit of kernel vm (kernel_map->header.end).

(kgdb) print /x * kernel_map
$221 = {header = {prev = 0xc02c1d60, next = 0xc02c1f70, start = 0xbfeff000, 
    end = 0xffc00000, avail_ssize = 0x0, object = {vm_object = 0x0, 
      sub_map = 0x0}, offset = 0x0, eflags = 0x0, protection = 0x0, 
    max_protection = 0x0, inheritance = 0x0, wired_count = 0x0, lastr = 0x0}, 
  lock = {lk_interlock = {lock_data = 0x0}, lk_flags = 0x1000000, 
    lk_sharecount = 0x0, lk_waitcount = 0x0, lk_exclusivecount = 0x0, 
    lk_prio = 0x4, lk_wmesg = 0xc0277da0, lk_timo = 0x0, 
    lk_lockholder = 0xffffffff}, nentries = 0x9, size = 0x1e531000, 
  system_map = 0x1, hint = 0xc02c1d60, timestamp = 0x21, 
  first_free = 0xc02c1d00, pmap = 0xc02cf8c0}

Pmap_growkernel()'s job is to set up the physical page mappings to
cover the increased kernel vm. In order to do this on the i386
architecture, a page directory entry must be made for every
4 Mbytes of kernel virtual address space. The kernel vm layer rounds
vm up to a page boundary. The physical map layer must round up to a
4 Mbyte boundary because of the two-level i386 page mapping hierarchy.
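The 4 Mbyte granularity follows from how a 32-bit i386 virtual address is split between the two levels. A small sketch, assuming non-PAE i386 paging with 4 Kbyte pages; the SIM_* constants and helper names are hypothetical (the shift of 22 corresponds to what the i386 headers call PDRSHIFT).

```c
#include <assert.h>
#include <stdint.h>

/* Non-PAE i386 split of a 32-bit virtual address:
 *   bits 31..22  page directory index (1024 PDEs)
 *   bits 21..12  page table index (NPTEPG = 1024 PTEs per table)
 *   bits 11..0   offset within a 4096-byte page
 * so each PDE maps 1024 * 4096 bytes = 4 Mbytes. */
#define SIM_PAGE_SHIFT	12u
#define SIM_PDR_SHIFT	22u			/* log2(4 Mbytes) */
#define SIM_PDE_SPAN	(1u << SIM_PDR_SHIFT)

static unsigned
pde_index(uint32_t va)
{
	return (va >> SIM_PDR_SHIFT);
}

static unsigned
pte_index(uint32_t va)
{
	return ((va >> SIM_PAGE_SHIFT) & 0x3ffu);
}

/* First virtual address mapped by the PDE that covers va. */
static uint32_t
pde_first(uint32_t va)
{
	return (va & ~(SIM_PDE_SPAN - 1));
}

/* Last virtual address mapped by the PDE that covers va. */
static uint32_t
pde_last(uint32_t va)
{
	/* Cannot wrap: pde_first() is at most 0xffc00000. */
	return (pde_first(va) + (SIM_PDE_SPAN - 1));
}
```

For example, 0xff801000 has PDE index 1022 and PTE index 1, and the PDE that maps it covers 0xff800000 through 0xffbfffff; 0xffc00000 is the first address of PDE 1023, which, per David Greenman's comment, is where the APTmap begins.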

If the vm layer's kernel memory limit falls on a page boundary,
but not on a page directory boundary, say 0xff801000,
then as far as the physical map layer is concerned, a page directory
entry must be created to cover vm from 0xff800000 to 0xffbfffff.
If the vm layer limit falls exactly on a page directory boundary,
say 0xff800000, this means that the kernel requested vm in the range
0xff7ff000 to 0xff7fffff. The vm layer rounds this address up to
0xff800000 and declares that this is the limit for pmap_growkernel().
In the case where the limit is 0xff800000, pmap_growkernel() actually
makes a page directory entry for addresses >= 0xff800000, a harmless
sort of overflow which merely uses up an extra page of memory for an
unneeded page table page.
In the case where the kernel vm grows into the range
[ 0xffbff001, 0xffc00000-1 ] (the page below 0xffc00000, in this
case kernel vm requested explicitly by me), the vm layer passes a
limit of 0xffc00000, and pmap_growkernel() overflows by rounding
0xffc00000 up to the next 4 Mbyte boundary, which on 32-bit machines is 0.
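The claim that only the very last page triggers the wraparound can be checked exhaustively for the 4 Mbyte region just below the kernel vm limit. Another hypothetical userland sketch: it assumes 32-bit addresses, treats 0xffc00000 (kernel_map->header.end in the kgdb output above) as the limit, and compares the old expression with a mask-based round-up; count_wraps() and the SIM_* names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_PAGE_SIZE	4096u
#define SIM_PDE_SPAN	(SIM_PAGE_SIZE * 1024u)	/* PAGE_SIZE * NPTEPG */
#define SIM_KVA_LIMIT	0xffc00000u	/* kernel_map->header.end */

static uint32_t
old_round(uint32_t a)
{
	return ((a + SIM_PDE_SPAN) & ~(SIM_PDE_SPAN - 1));
}

static uint32_t
new_round(uint32_t a)
{
	return ((a + SIM_PDE_SPAN - 1) & ~(SIM_PDE_SPAN - 1));
}

/* Try every page-aligned limit the vm layer could hand to
 * pmap_growkernel() for the last 4 Mbyte region below the kernel vm
 * limit, i.e. 0xff801000 .. 0xffc00000, and count the limits that a
 * rounding function maps to a smaller value (a wraparound). */
static unsigned
count_wraps(uint32_t (*round_fn)(uint32_t))
{
	unsigned wraps = 0;
	uint32_t limit;

	for (limit = SIM_KVA_LIMIT - SIM_PDE_SPAN + SIM_PAGE_SIZE;
	    limit <= SIM_KVA_LIMIT; limit += SIM_PAGE_SIZE) {
		if (round_fn(limit) < limit)
			wraps++;
	}
	return (wraps);
}
```

Under these assumptions count_wraps(old_round) is 1 (only the limit 0xffc00000 itself wraps, to 0) and count_wraps(new_round) is 0.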

I hope this helps explain what I think is a problem.

Bill

-- 

Bill Carpenter			           carp@world.std.com
Comment 3 David Greenman 2000-10-31 21:03:29 UTC
>I hope this helps explain what I think is a problem.

   Your analysis appears to be correct, and using roundup() seems like a fine
solution. All three rounding expressions in that function need to be changed
to completely fix the problem.
   Thanks for the bug report!

-DG

Comment 4 iedowse 2001-11-17 22:07:42 UTC
I think the change proposed by this PR corresponds to the following
patch; pmap_growkernel's `addr' argument is one past the last valid
offset, so we don't want to change it if it is already at the start
of a 4M region. The roundup2() macro does the right thing in this
case, whereas the existing code may increase kernel_vm_end 4M too
far, so it can potentially wrap around to 0.

(See the PR for further details; despite what it says, I think
only the following rounding should be changed.)
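Ian's point that `addr' is one past the last valid offset can be turned into a reference definition: the growth target should be the end of the 4 Mbyte region that contains addr - 1. The sketch below, again a hypothetical userland model assuming 32-bit addresses, computes that reference in 64 bits and compares both rounding expressions against it for non-zero addr up to 0xffc00000; reference_end() and the *_matches() helpers are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_PDE_SPAN	(4096u * 1024u)		/* PAGE_SIZE * NPTEPG */
#define SIM_ROUNDUP2(x, y)	(((x) + ((y) - 1)) & (~((y) - 1)))

/* addr is one past the last valid offset, so the PDEs that must exist
 * are those covering addr - 1 and below. The growth target is therefore
 * the end of the 4 Mbyte region that holds addr - 1. This reference is
 * computed in 64 bits so that it cannot wrap itself. */
static uint64_t
reference_end(uint32_t addr)
{
	uint64_t last = (uint64_t)addr - 1;

	return ((last / SIM_PDE_SPAN + 1) * SIM_PDE_SPAN);
}

/* 1 if roundup2(addr, 4M) matches the reference; 0 < addr <= 0xffc00000. */
static int
roundup2_matches(uint32_t addr)
{
	assert(addr != 0 && addr <= 0xffc00000u);
	return ((uint64_t)SIM_ROUNDUP2(addr, SIM_PDE_SPAN) ==
	    reference_end(addr));
}

/* 1 if the pre-patch expression matches the reference. */
static int
old_expr_matches(uint32_t addr)
{
	assert(addr != 0 && addr <= 0xffc00000u);
	return ((uint64_t)((addr + SIM_PDE_SPAN) & ~(SIM_PDE_SPAN - 1)) ==
	    reference_end(addr));
}
```

Under these assumptions roundup2() agrees with the reference at 0xff800000, 0xff801000 and 0xffc00000, while the old expression agrees only at 0xff801000: it overshoots by 4 Mbytes at the aligned 0xff800000 and wraps to 0 at 0xffc00000.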

Ian

Index: pmap.c
===================================================================
RCS file: /dump/FreeBSD-CVS/src/sys/i386/i386/pmap.c,v
retrieving revision 1.297
diff -u -r1.297 pmap.c
--- pmap.c	17 Nov 2001 01:56:04 -0000	1.297
+++ pmap.c	17 Nov 2001 21:38:25 -0000
@@ -1587,7 +1587,7 @@
 			nkpt++;
 		}
 	}
-	addr = (addr + PAGE_SIZE * NPTEPG) & ~(PAGE_SIZE * NPTEPG - 1);
+	addr = roundup2(addr, PAGE_SIZE * NPTEPG);
 	while (kernel_vm_end < addr) {
 		if (pdir_pde(PTD, kernel_vm_end)) {
 			kernel_vm_end = (kernel_vm_end + PAGE_SIZE * NPTEPG) & ~(PAGE_SIZE * NPTEPG - 1);
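To see the patched path end to end, here is a heavily simplified model of the hunk above: round the limit with roundup2(), then advance kernel_vm_end using the same expression as the unchanged loop body. It is a hypothetical sketch assuming 32-bit addresses and a 4 Mbyte aligned starting kernel_vm_end; it ignores the pdir_pde() check and the page table page allocation, and grow_model() is an illustrative name.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_PDE_SPAN	(4096u * 1024u)		/* PAGE_SIZE * NPTEPG */
#define SIM_ROUNDUP2(x, y)	(((x) + ((y) - 1)) & (~((y) - 1)))

/* Hypothetical, heavily simplified model of the patched hunk: round the
 * requested limit with roundup2(), then advance kernel_vm_end one PDE at
 * a time with the same expression as the unchanged loop body. The
 * pdir_pde() test and the page table page allocation are not modelled.
 * Returns the final kernel_vm_end. */
static uint32_t
grow_model(uint32_t kernel_vm_end, uint32_t addr)
{
	assert(kernel_vm_end % SIM_PDE_SPAN == 0);
	assert(addr <= 0xffc00000u);	/* kernel_map->header.end */

	addr = SIM_ROUNDUP2(addr, SIM_PDE_SPAN);
	while (kernel_vm_end < addr)
		kernel_vm_end = (kernel_vm_end + SIM_PDE_SPAN) &
		    ~(SIM_PDE_SPAN - 1);
	return (kernel_vm_end);
}
```

Starting from a hypothetical kernel_vm_end of 0xff000000, a limit of 0xffc00000 now stops at 0xffc00000 after three steps instead of skipping the loop, and an aligned limit of 0xff800000 stops at 0xff800000 rather than 4 Mbytes past it.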
Comment 5 iedowse freebsd_committer freebsd_triage 2002-08-12 11:35:49 UTC
State Changed
From-To: open->closed


Fixed in revision 1.359 of i386/i386/pmap.c, thanks!