204121 – numa(4) is broken: "vm_page_alloc: missing page" panic

Status:	Closed FIXED

Alias:	None

Product:	Base System
Classification:	Unclassified
Component:	kern
Version:	CURRENT
Hardware:	Any Any

Importance:	--- Affects Only Me
Assignee:	freebsd-bugs (Nobody)

URL:
Keywords:	crash, needs-qa

Depends on:
Blocks:

Reported:	2015-10-29 13:16 UTC by Peter Holm
Modified:	2018-07-24 13:58 UTC
CC List:	3 users

See Also:

Flags:	koobs: mfc-stable10? koobs: mfc-stable9?

Description Peter Holm freebsd_committer

2015-10-29 13:16:00 UTC

Not all memory can be allocated from a two-domain configuration.
This results in a "vm_page_alloc: missing page" panic.

Details @ https://people.freebsd.org/~pho/stress/log/maxmemdom.txt

Comment 1 Conrad Meyer freebsd_committer

2015-11-02 17:20:36 UTC

Is this related to https://lists.freebsd.org/pipermail/freebsd-arch/2015-April/017138.html ?

Comment 2 Adrian Chadd freebsd_committer

2015-11-03 21:21:34 UTC

This is a long-standing VM issue that earlier first-touch page allocation (in freebsd-8?) would also hit.

I had a local modification in my NUMA branch that handled the case of the page allocation failing.

diff --git a/sys/vm/vm_page.c b/sys/vm/vm_page.c
index 3b58fb7..9bd4adc 100644
--- a/sys/vm/vm_page.c
+++ b/sys/vm/vm_page.c
@@ -1625,6 +1625,7 @@ vm_page_alloc(vm_object_t object, vm_pindex_t pindex, int req)
         * vm_page_cache().
         */
        mtx_lock_flags(&vm_page_queue_free_mtx, MTX_RECURSE);
+       m = NULL;
        if (vm_cnt.v_free_count + vm_cnt.v_cache_count > vm_cnt.v_free_reserved ||
            (req_class == VM_ALLOC_SYSTEM &&
            vm_cnt.v_free_count + vm_cnt.v_cache_count > vm_cnt.v_interrupt_free_min) ||
@@ -1669,7 +1670,19 @@ vm_page_alloc(vm_object_t object, vm_pindex_t pindex, int req)
                        }
 #endif
                }
-       } else {
+       }
+
+       /*
+        * Can't allocate or attempted to and couldn't allocate a page
+        * given the current VM policy.  Give up.
+        *
+        * Note - yes, this is one of the current shortcomings of the
+        * VM domain design - there's a global set of vm_cnt counters,
+        * and it's quite possible things will get unhappy with this.
+        * However without it'll kernel panic below - the code didn't
+        * check m == NULL here and would continue.
+        */
+       if (m == NULL) {
                /*
                 * Not allocatable, give up.
                 */
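For readers skimming the diff, the control-flow change can be sketched outside the kernel roughly as follows. This is a simplified illustration, not the committed code: struct page, domain_alloc() and alloc_checked() are hypothetical stand-ins, and the real vm_page_alloc() also deals with locking, reservations and request classes.

```c
#include <stddef.h>

/* Hypothetical stand-in for illustration; not the kernel's vm_page. */
struct page { int domain; };

/* Simulated per-domain allocator: returns NULL when that domain is empty. */
static struct page *
domain_alloc(int *domain_free, struct page *storage)
{
	if (*domain_free == 0)
		return (NULL);
	(*domain_free)--;
	return (storage);
}

/*
 * Shape of the patched logic: start with m = NULL, attempt the allocation
 * only when the global counters permit it, then handle failure in one place.
 * Returns 0 on success and -1 when the caller must give up.
 */
static int
alloc_checked(int global_free, int global_reserved, int *domain_free,
    struct page *storage, struct page **out)
{
	struct page *m = NULL;

	if (global_free > global_reserved)
		m = domain_alloc(domain_free, storage);

	/* Covers both "not allowed to try" and "tried and got nothing". */
	if (m == NULL) {
		*out = NULL;
		return (-1);
	}
	*out = m;
	return (0);
}
```

With the original "} else {" structure, the give-up path ran only when the global check failed, so an attempt that passed the check but came back empty fell through with no page.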

Comment 3 Peter Holm freebsd_committer

2015-11-05 06:55:17 UTC

I have tested the patch and this indeed removes the panic.

Out of curiosity I added a "failed allocation" counter:

+       if (m == NULL) {
   /*
    * Not allocatable, give up.
    */
   mtx_unlock(&vm_page_queue_free_mtx);
+  atomic_add_int(&pho, 1);
   atomic_add_int(&vm_pageout_deficit,

which reached
pho = 100479984
during this test scenario.

Comment 4 Adrian Chadd freebsd_committer

2015-11-05 07:03:41 UTC

Right. So this was in here way before my numa stuff let you configure things.

The problem is that the memory allocation isn't being perfectly balanced between numa domains and the VM thresholds are global. So, the VM thresholds say "there's pages", but when you go to allocate, there aren't any from the given domain.

Now, the odd situation here is that the page allocation should be "first touch round robin" so it should be falling back to allocating from another domain and only returning NULL if it couldn't find anything.
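A toy model of that situation, assuming a two-domain machine as in the report. The names vm_model, global_free() and alloc_with_fallback() are made up for illustration and do not correspond to kernel functions; the sketch only shows how a global count can look healthy while the preferred domain is empty, and what falling back to another domain would look like.

```c
#include <stddef.h>

#define NDOMAINS 2	/* assumed two-domain machine, as in the report */

/* Hypothetical model: free page counts per domain. */
struct vm_model {
	int free_pages[NDOMAINS];
};

/* Global view: the sum over all domains, which is all a global threshold sees. */
static int
global_free(const struct vm_model *vm)
{
	int i, total = 0;

	for (i = 0; i < NDOMAINS; i++)
		total += vm->free_pages[i];
	return (total);
}

/*
 * Try the preferred (first-touch) domain, then every other domain in order.
 * Returns the domain that supplied the page, or -1 if all are empty.
 */
static int
alloc_with_fallback(struct vm_model *vm, int preferred)
{
	int i, d;

	for (i = 0; i < NDOMAINS; i++) {
		d = (preferred + i) % NDOMAINS;
		if (vm->free_pages[d] > 0) {
			vm->free_pages[d]--;
			return (d);
		}
	}
	return (-1);
}
```

With free_pages = {0, 3}, global_free() reports 3 pages although domain 0 has none; a request preferring domain 0 is served from domain 1, and -1 is returned only once both domains are empty.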

Can you try this on stable/10 with MAXMEMDOM set to something in your kernel config? I'd like to see if you hit the same issue.

Comment 5 Peter Holm freebsd_committer

2015-11-05 10:32:16 UTC

No problems on stable/10:

$ uname -a
FreeBSD t1.osted.lan 10.2-STABLE FreeBSD 10.2-STABLE #0 r290387: Thu Nov  5 11:03:23 CET 2015     pho@t1.osted.lan:/usr/src/sys/amd64/compile/MAXMEMDOM  amd64
$ /usr/bin/time -h ./maxmemdom.sh 
        8m8,92s real            0,12s user              5m10,15s sys
$ /usr/bin/time -h ./maxmemdom.sh 
        9m19,95s real           0,23s user              5m13,58s sys
$ sysctl vm.ndomains
vm.ndomains: 2
$

Comment 6 Adrian Chadd freebsd_committer

2015-11-05 18:49:47 UTC

ok. Well, there have been VM changes too between 10 and head.

What's the output of "sysctl vm.default_policy" ?

You can set it to "rr" (for round-robin) and retry. At that point it should be mirroring the existing default behaviour in stable/10 (which with NUMA enabled is round-robin).

Comment 7 Adrian Chadd freebsd_committer

2015-11-10 19:07:46 UTC

ok, I bet my first-touch iterator is biting me. It doesn't skip over the first-touch domain, so it's possible that you've hit a situation where the domain 'n' fails allocation, and the per-thread round-robin domain value is also 'n'.

We'll just have to fix the round-robin iterator routine to take a domain to 'skip' over (and have it ensure that there isn't just a single domain (0), so that it doesn't get stuck skipping over that). :-)
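A minimal sketch of what such an iterator might look like, purely as an illustration of the idea and not the patch that was eventually committed. rr_next_domain() and its parameters are hypothetical; it assumes a non-negative cursor and ndomains >= 1.

```c
/*
 * Hypothetical round-robin helper: return the next domain and advance
 * *cursor past it, skipping 'skip' when another domain exists. Pass
 * skip = -1 to skip nothing. With a single domain, that domain is
 * returned even if it equals 'skip', so the iterator cannot get stuck.
 */
static int
rr_next_domain(int *cursor, int ndomains, int skip)
{
	int d;

	d = *cursor % ndomains;
	if (d == skip && ndomains > 1)
		d = (d + 1) % ndomains;
	*cursor = (d + 1) % ndomains;
	return (d);
}
```

The caller would pass the first-touch domain that just failed as 'skip', so the round-robin fallback never hands back the same empty domain.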

I don't have any NUMA boxes handy atm but I'll try to come up with a patch to test.

Thanks!


-a

Comment 8 Kubilay Kocak freebsd_committer

2016-01-11 18:22:05 UTC

Can't be In Progress without an Assignee.

Can we get current/proposed patches added as attachments, please?

Comment 9 Peter Holm freebsd_committer

2016-06-03 20:57:37 UTC

I tested the patch committed as r293640. The panic reported is no longer seen.

Comment 10 Peter Holm freebsd_committer

2018-07-24 13:58:43 UTC

Cannot reproduce the problem any more.