Bug 204121 - numa(4) is broken: "vm_page_alloc: missing page" panic
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: freebsd-bugs mailing list
URL:
Keywords: crash, needs-qa
Depends on:
Blocks:
 
Reported: 2015-10-29 13:16 UTC by Peter Holm
Modified: 2018-07-24 13:58 UTC

See Also:
koobs: mfc-stable9?
koobs: mfc-stable10?


Description Peter Holm freebsd_committer 2015-10-29 13:16:00 UTC
Not all memory can be allocated from a two-domain configuration.
This results in a "vm_page_alloc: missing page" panic.

Details @ https://people.freebsd.org/~pho/stress/log/maxmemdom.txt
Comment 1 Conrad Meyer freebsd_committer 2015-11-02 17:20:36 UTC
Is this related to https://lists.freebsd.org/pipermail/freebsd-arch/2015-April/017138.html ?
Comment 2 Adrian Chadd freebsd_committer 2015-11-03 21:21:34 UTC
This is a long-standing VM issue that earlier first-touch page allocation (in freebsd-8?) would also hit.

I had a local modification in my NUMA branch that handled the case of the page allocation failing.

diff --git a/sys/vm/vm_page.c b/sys/vm/vm_page.c
index 3b58fb7..9bd4adc 100644
--- a/sys/vm/vm_page.c
+++ b/sys/vm/vm_page.c
@@ -1625,6 +1625,7 @@ vm_page_alloc(vm_object_t object, vm_pindex_t pindex, int req)
         * vm_page_cache().
         */
        mtx_lock_flags(&vm_page_queue_free_mtx, MTX_RECURSE);
+       m = NULL;
        if (vm_cnt.v_free_count + vm_cnt.v_cache_count > vm_cnt.v_free_reserved ||
            (req_class == VM_ALLOC_SYSTEM &&
            vm_cnt.v_free_count + vm_cnt.v_cache_count > vm_cnt.v_interrupt_free_min) ||
@@ -1669,7 +1670,19 @@ vm_page_alloc(vm_object_t object, vm_pindex_t pindex, int req)
                        }
 #endif
                }
-       } else {
+       }
+
+       /*
+        * Can't allocate or attempted to and couldn't allocate a page
+        * given the current VM policy.  Give up.
+        *
+        * Note - yes, this is one of the current shortcomings of the
+        * VM domain design - there's a global set of vm_cnt counters,
+        * and it's quite possible things will get unhappy with this.
+        * However without it'll kernel panic below - the code didn't
+        * check m == NULL here and would continue.
+        */
+       if (m == NULL) {
                /*
                 * Not allocatable, give up.
                 */
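The control-flow change in the patch can be illustrated outside the kernel. The following is a minimal userland sketch, not the actual vm_page_alloc() code: toy_page, toy_try_alloc(), toy_page_alloc(), threshold_ok and deficit are all hypothetical names invented for this illustration. It shows why replacing the "else" branch with an explicit "m == NULL" check covers both the case where no allocation was attempted and the case where the attempt failed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a page; not the kernel's struct vm_page. */
struct toy_page {
	int id;
};

/* Hypothetical per-domain allocator that can fail when the pool is empty. */
static struct toy_page *
toy_try_alloc(struct toy_page *pool, int *nfree)
{
	assert(pool != NULL && nfree != NULL);
	if (*nfree == 0)
		return (NULL);
	(*nfree)--;
	return (&pool[*nfree]);
}

/*
 * Returns a page, or NULL if none could be allocated.  *deficit counts
 * failed requests (an assumption made for this sketch only).
 */
static struct toy_page *
toy_page_alloc(struct toy_page *pool, int *nfree, int threshold_ok,
    int *deficit)
{
	struct toy_page *m;

	m = NULL;			/* initialised up front, as in the patch */
	if (threshold_ok)
		m = toy_try_alloc(pool, nfree);	/* may still return NULL */

	/* One check covers "did not try" and "tried and failed". */
	if (m == NULL) {
		(*deficit)++;
		return (NULL);
	}
	return (m);
}
```

With a plain "else" on the threshold test, the second case (threshold passed, allocation failed) would fall through with m still NULL, which is the situation the panic reports.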
Comment 3 Peter Holm freebsd_committer 2015-11-05 06:55:17 UTC
I have tested the patch and this indeed removes the panic.

Out of curiosity I added a "failed allocation" counter:

+       if (m == NULL) {
                /*
                 * Not allocatable, give up.
                 */
                mtx_unlock(&vm_page_queue_free_mtx);
+               atomic_add_int(&pho, 1);
                atomic_add_int(&vm_pageout_deficit,

which reached
pho = 100479984
during this test scenario.
Comment 4 Adrian Chadd freebsd_committer 2015-11-05 07:03:41 UTC
Right. So this was in here way before my numa stuff let you configure things.

The problem is that memory allocation isn't perfectly balanced between NUMA domains, while the VM thresholds are global. So the VM thresholds say "there are pages", but when you go to allocate, there aren't any in the given domain.

Now, the odd situation here is that the page allocation should be "first touch round robin", so it should be falling back to allocating from another domain and only returning NULL if it couldn't find anything.

Can you try this on stable/10 with MAXMEMDOM set to something in your kernel config? I'd like to see if you hit the same issue.
Comment 5 Peter Holm freebsd_committer 2015-11-05 10:32:16 UTC
No problems on stable/10:

$ uname -a
FreeBSD t1.osted.lan 10.2-STABLE FreeBSD 10.2-STABLE #0 r290387: Thu Nov  5 11:03:23 CET 2015     pho@t1.osted.lan:/usr/src/sys/amd64/compile/MAXMEMDOM  amd64
$ /usr/bin/time -h ./maxmemdom.sh 
        8m8,92s real            0,12s user              5m10,15s sys
$ /usr/bin/time -h ./maxmemdom.sh 
        9m19,95s real           0,23s user              5m13,58s sys
$ sysctl vm.ndomains
vm.ndomains: 2
$
Comment 6 Adrian Chadd freebsd_committer 2015-11-05 18:49:47 UTC
ok. Well, there have been VM changes too between 10 and head.

What's the output of "sysctl vm.default_policy" ?

You can set it to "rr" (for round-robin) and retry. At that point it should be mirroring the existing default behaviour in stable/10 (which with NUMA enabled is round-robin).
Comment 7 Adrian Chadd freebsd_committer 2015-11-10 19:07:46 UTC
ok, I bet my first-touch iterator is biting me. It doesn't skip over the first-touch domain, so it's possible that you've hit a situation where the domain 'n' fails allocation, and the per-thread round-robin domain value is also 'n'.

We'll just have to fix the round-robin iterator routine to take a domain to 'skip' over (and have it ensure that there isn't just a single domain (0), in which case it would get stuck skipping over that). :-)
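One way such an iterator could look is sketched below. This is a hedged illustration of the idea only, not the change that was eventually committed: rr_next_skip(), the cursor argument and the convention of passing -1 to skip nothing are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Advance a round-robin cursor and return the next domain to try.  The
 * domain `skip` is never returned unless it is the only domain, which
 * avoids getting stuck on a single-domain system.  Pass skip = -1 to skip
 * nothing.
 */
static int
rr_next_skip(int *cursor, int ndomains, int skip)
{
	int d;

	assert(cursor != NULL && ndomains > 0);
	assert(*cursor >= 0 && *cursor < ndomains);

	d = *cursor;
	if (d == skip && ndomains > 1)
		d = (d + 1) % ndomains;	/* step past the failed domain */
	*cursor = (d + 1) % ndomains;	/* remember where to resume */
	return (d);
}
```

In the failing scenario (two domains, first-touch domain 1 exhausted, cursor also at 1), calling rr_next_skip(&cursor, 2, 1) yields domain 0 instead of retrying domain 1.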

I don't have any NUMA boxes handy atm but I'll try to come up with a patch to test.

Thanks!


-a
Comment 8 Kubilay Kocak freebsd_committer freebsd_triage 2016-01-11 18:22:05 UTC
Can't be In Progress without an Assignee

Can we get current/proposed patches added as attachments, please?
Comment 9 Peter Holm freebsd_committer 2016-06-03 20:57:37 UTC
I tested the patch committed as r293640. The panic reported is no longer seen.
Comment 10 Peter Holm freebsd_committer 2018-07-24 13:58:43 UTC
Cannot reproduce the problem any more.