Bug 14042

Summary: mbuf shortage and allocation with M_WAIT results in a panic
Product: Base System
Reporter: bmilekic <bmilekic>
Component: kern
Assignee: Brian Feldman <green>
Status: Closed FIXED
Severity: Affects Only Me
Priority: Normal
Version: 3.3-STABLE
Hardware: Any
OS: Any
Attachments:
file.diff (flags: none)

Description bmilekic@dsuper.net 1999-09-30 04:10:00 UTC
Upon starvation of the mb_map arena, we rely solely on freed (recycled) mbufs
and mbuf clusters, since nothing is ever freed back to the map.
When no free mbuf is available and we are called with M_DONTWAIT, a null is
returned (this is expected behaviour). However, if we are called with M_WAIT
and nothing is available, the protocol drain routines are invoked, and if we
still have nothing available after that point, we panic().
When no free mbuf cluster is available, M_DONTWAIT and M_WAIT make no
difference once mb_map is already starved: a failure is reported immediately.

Fix: The diffs provided below add functions that sleep for a predefined amount
of time waiting for an mbuf (or mbuf cluster) to be freed. If the allocation
still fails even after the sleep, the callers of the allocation routines
should check for a null being returned. These diffs make sure that the check
is performed and, if the failure is detected, the calling routines themselves
normally return ENOBUFS. The userland applications generating the system
calls that eventually end up calling the allocation routines (e.g. sosend())
should check for the possibility of ENOBUFS and handle the situation
appropriately.

These diffs are for -STABLE, but if the decision is made to commit them, I
will gladly make sure that they work and do not interfere with any other work
that may be in progress on this part of the code in -CURRENT.

The first diff applies to sys/kern, whereas the second applies to sys/sys/mbuf.h.

--8<--[mbuf.patch]--8<--

--8<--[end]--8<--


--8<--[mbuf2.patch]--8<--



--8<--[end]--8<--

A few M_WAIT calls to the allocation routines and macros remain in the
network module code (net/, netinet/, etc.). Most of these calls already check
for the possibility of error; a few still do not. From what I have been able
to see, what remains is not much and can be remedied fairly quickly (I just
never got around to it). Again, if interest arises, I will gladly go over
that as well -- these patches, however, take care of the main problem.

--Bosko M.

Attachment: file.diff
diff -ruN /usr/src/sys/kern.old/uipc_mbuf.c /usr/src/sys/kern/uipc_mbuf.c
--- /usr/src/sys/kern.old/uipc_mbuf.c   Wed Sep  8 20:45:50 1999
+++ /usr/src/sys/kern/uipc_mbuf.c       Wed Sep 29 19:31:37 1999
@@ -47,6 +47,10 @@
 #include <vm/vm_kern.h>
 #include <vm/vm_extern.h>
 
+#ifdef INVARIANTS
+#include <machine/cpu.h>
+#endif
+
 static void mbinit __P((void *));
 SYSINIT(mbuf, SI_SUB_MBUF, SI_ORDER_FIRST, mbinit, NULL)
 
@@ -60,6 +64,8 @@
 int    max_hdr;
 int    max_datalen;
 
+static u_int m_mballoc_wid = 0, m_clalloc_wid = 0;
+
 SYSCTL_INT(_kern_ipc, KIPC_MAX_LINKHDR, max_linkhdr, CTLFLAG_RW,
           &max_linkhdr, 0, "");
 SYSCTL_INT(_kern_ipc, KIPC_MAX_PROTOHDR, max_protohdr, CTLFLAG_RW,
@@ -73,9 +79,12 @@
 
 /* "number of clusters of pages" */
 #define NCL_INIT       1
-
 #define NMB_INIT       16
 
+/* This sleep time definition/hz will determine the duration of the
+   mbuf and mbuf cluster "wait" sleeps if exhaustion occurs. */
+#define M_SLEEP_TIME   32
+
 /* ARGSUSED*/
 static void
 mbinit(dummy)
@@ -153,6 +162,50 @@
        return (1);
 }
 
+/*
+ * For callers with M_WAIT (who prefer -- and can afford -- to wait
+ * a little longer as opposed to immediately return null). 
+ */
+struct mbuf *
+m_mballoc_wait(caller, type)
+       int caller;
+       u_short type;
+{
+       struct mbuf *p;
+
+       /* Sleep until something's available or until we expire. */
+       m_mballoc_wid++;
+       if ( (tsleep(&m_mballoc_wid, PVM, "mballc",
+           M_SLEEP_TIME)) == EWOULDBLOCK)
+               m_mballoc_wid--;
+       
+       /*
+        * Now that we (think) we've got something, we will redo an
+        * MGET, but avoid getting into another instance of m_mballoc_wait()
+        */
+#define m_mballoc_wait(caller,type)    (struct mbuf *)0 
+       if (caller == MGET_C) {
+               MGET(p, M_WAIT, type);
+       } else {
+               MGETHDR(p, M_WAIT, type);
+       }
+#undef m_mballoc_wait
+ 
+       return (p);
+}
+
+/*
+ * Function used to wakeup sleepers waiting for mbufs...
+ */
+void
+m_mballoc_wakeup(void)
+{
+       if (m_mballoc_wid) {
+               m_mballoc_wid = 0;
+               wakeup(&m_mballoc_wid);
+       }
+}
+
 #if MCLBYTES > PAGE_SIZE
 static int i_want_my_mcl;
 
@@ -242,6 +295,46 @@
 }
 
 /*
+ * For callers with M_WAIT (who prefer -- and can afford -- to wait
+ * a little longer as opposed to immediately return null).
+ */
+caddr_t
+m_clalloc_wait(void)
+{
+       caddr_t p;
+
+       KASSERT(intr_nesting_level == 0,("CLALLOC: CANNOT WAIT IN INTERRUPT"));
+
+       /* Sleep until something's available or until we expire. */
+       m_clalloc_wid++;
+       if ( (tsleep(&m_clalloc_wid, PVM, "mclalc",
+           M_SLEEP_TIME)) == EWOULDBLOCK)
+               m_clalloc_wid--;
+       
+       /*
+        * Now that we (think) we've got something, we will redo the
+        * MCLALLOC, but avoid getting into another instance of m_clalloc_wait()
+        */
+#define m_clalloc_wait()       (caddr_t)0
+       MCLALLOC(p,M_WAIT);
+#undef m_clalloc_wait
+
+       return (p);
+}
+
+/*
+ * Function used to wakeup sleepers waiting for mbuf clusters...
+ */
+void
+m_clalloc_wakeup(void)
+{
+       if (m_clalloc_wid) {
+               m_clalloc_wid = 0;
+               wakeup(&m_clalloc_wid);
+       }
+}
+
+/*
  * When MGET fails, ask protocols to free space when short of memory,
  * then re-attempt to allocate an mbuf.
  */
@@ -254,19 +347,27 @@
        /*
         * Must only do the reclaim if not in an interrupt context.
         */
-       if (i == M_WAIT)
+       if (i == M_WAIT) {
+               KASSERT(intr_nesting_level == 0,
+                   ("MBALLOC: CANNOT WAIT IN INTERRUPT"));
                m_reclaim();
+       }
+
+       /*
+        * XXX Be paranoid and define both as null. We don't want to wait
+        * from here.
+        */
+#define m_mballoc_wait(caller,type)    (struct mbuf *)0
 #define m_retry(i, t)  (struct mbuf *)0
        MGET(m, i, t);
 #undef m_retry
+#undef m_mballoc_wait
+
        if (m != NULL) {
                mbstat.m_wait++;
-       } else {
-               if (i == M_DONTWAIT)
-                       mbstat.m_drops++;
-               else
-                       panic("Out of mbuf clusters");
-       }
+       } else 
+               mbstat.m_drops++;
+
        return (m);
 }
 
@@ -284,17 +385,16 @@
         */
        if (i == M_WAIT)
                m_reclaim();
+#define m_mballoc_wait(caller,type) (struct mbuf *)0
 #define m_retryhdr(i, t) (struct mbuf *)0
        MGETHDR(m, i, t);
 #undef m_retryhdr
+#undef m_mballoc_wait
        if (m != NULL) {
                mbstat.m_wait++;
-       } else {
-               if (i == M_DONTWAIT)
-                       mbstat.m_drops++;
-               else
-                       panic("Out of mbuf clusters");
-       }
+       } else 
+               mbstat.m_drops++;
+
        return (m);
 }
 
diff -ruN /usr/src/sys/kern.old/uipc_socket.c /usr/src/sys/kern/uipc_socket.c
--- /usr/src/sys/kern.old/uipc_socket.c Wed Sep  8 20:45:50 1999
+++ /usr/src/sys/kern/uipc_socket.c     Sun Sep 26 19:33:42 1999
@@ -486,15 +486,21 @@
                    } else do {
                        if (top == 0) {
                                MGETHDR(m, M_WAIT, MT_DATA);
+                               if (m == NULL)
+                                       return ENOBUFS;
                                mlen = MHLEN;
                                m->m_pkthdr.len = 0;
                                m->m_pkthdr.rcvif = (struct ifnet *)0;
                        } else {
                                MGET(m, M_WAIT, MT_DATA);
+                               if (m == NULL)
+                                       return ENOBUFS;
                                mlen = MLEN;
                        }
                        if (resid >= MINCLSIZE) {
                                MCLGET(m, M_WAIT);
+                               if (m == NULL)
+                                       return ENOBUFS;
                                if ((m->m_flags & M_EXT) == 0)
                                        goto nopages;
                                mlen = MCLBYTES;
@@ -606,6 +612,8 @@
                flags = 0;
        if (flags & MSG_OOB) {
                m = m_get(M_WAIT, MT_DATA);
+               if (m == NULL)
+                       return ENOBUFS;
                error = (*pr->pr_usrreqs->pru_rcvoob)(so, m, flags & MSG_PEEK);
                if (error)
                        goto bad;
diff -ruN /usr/src/sys/kern.old/uipc_socket2.c /usr/src/sys/kern/uipc_socket2.c
--- /usr/src/sys/kern.old/uipc_socket2.c        Wed Sep  8 20:45:50 1999
+++ /usr/src/sys/kern/uipc_socket2.c    Sun Sep 26 16:17:56 1999
@@ -407,7 +407,15 @@
 sbrelease(sb)
        struct sockbuf *sb;
 {
-
+       /* 
+        * XXX If sockbuf is locked and we call sbflush, we will
+        * panic for sure, avoid that for now by just unlocking the
+        * sockbuf, so that sbflush can flush it, because if this is
+        * occurring, chances are we want to kill the process holding
+        * the sockbuf space anyway...
+        */
+       if (sb->sb_flags & SB_LOCK) 
+               sb->sb_flags &= ~SB_LOCK;
        sbflush(sb);
        sb->sb_hiwat = sb->sb_mbmax = 0;
 }
diff -ruN /usr/src/sys/kern.old/uipc_syscalls.c /usr/src/sys/kern/uipc_syscalls.c
--- /usr/src/sys/kern.old/uipc_syscalls.c       Wed Sep  8 20:45:50 1999
+++ /usr/src/sys/kern/uipc_syscalls.c   Sun Sep 26 19:34:29 1999
@@ -1620,6 +1620,8 @@
                 * Get an mbuf header and set it up as having external storage.
                 */
                MGETHDR(m, M_WAIT, MT_DATA);
+               if (m == NULL)
+                       return ENOBUFS;
                m->m_ext.ext_free = sf_buf_free;
                m->m_ext.ext_ref = sf_buf_ref;
                m->m_ext.ext_buf = (void *)sf->kva;
How-To-Repeat: 
There are several ways to reproduce this problem. An obvious [local] way is
to use setsockopt(2) to request large socket buffer space. Eventually, we
will run out of mbufs and panic.
Comment 1 bmilekic@dsuper.net 1999-12-06 01:49:47 UTC
 (Apologies for the double post... I don't believe the first post ended up
 in the correct place.)


  I'd like to point out that the above patches for -STABLE are way
  outdated, and that the most recent version of these diffs for -CURRENT is
  now available at:

   http://pages.infinit.net/bmilekic/mbuf.patch or mbuf.diff (for `diff -c')

  It is presently in a state where it does what it was intended to do and
  is awaiting decent review/approval from someone in the position to do so.

  -Bosko.  

--
  Bosko Milekic <bmilekic@technokratis.com>
Comment 2 bill fumerola freebsd_committer freebsd_triage 1999-12-13 21:21:10 UTC
State Changed
From-To: open->closed

Committed to -CURRENT. 


Comment 3 bill fumerola freebsd_committer freebsd_triage 1999-12-13 21:21:10 UTC
Responsible Changed
From-To: freebsd-bugs->green

Brian committed the fixes.