| Summary: | mbuf shortage and allocation with M_WAIT results in a panic |
|---|---|
| Product: | Base System |
| Reporter: | bmilekic <bmilekic> |
| Component: | kern |
| Assignee: | Brian Feldman <green> |
| Status: | Closed FIXED |
| Severity: | Affects Only Me |
| Priority: | Normal |
| Version: | 3.3-STABLE |
| Hardware: | Any |
| OS: | Any |
| Attachments: | |
(Apologies for the double post... I don't believe the first post ended up in the correct place.)

I'd like to point out that the patches above for -STABLE are badly outdated, and that the most recent version of these diffs, against -CURRENT, is now available at http://pages.infinit.net/bmilekic/mbuf.patch (or mbuf.diff, for `diff -c` output). It is presently in a state where it does what it was intended to do and is awaiting decent review/approval from someone in a position to give it.

-Bosko.
--
Bosko Milekic <bmilekic@technokratis.com>

State Changed From-To: open->closed
Committed to -CURRENT.

Responsible Changed From-To: freebsd-bugs->green
Brian committed the fixes.
Upon starvation of the mb_map arena, we rely solely on freed (recycled) mbufs and mbuf clusters, since nothing is ever freed back to the map. When no free mbuf is available and we are called with M_DONTWAIT, a null pointer is returned (this is expected behaviour). However, if we are called with M_WAIT and nothing is available, a call to the protocol drain routines is made, and if after that point we still have nothing available, we panic(). When no free mbuf cluster is available, M_DONTWAIT versus M_WAIT makes no difference once mb_map is already starved: a failure is reported immediately.

Fix: The diffs provided below add functions that sleep for a predefined amount of time, waiting for an mbuf (or mbuf cluster) to be freed. If the allocation fails even after the sleep, the callers of the allocation routines must check for a null return. These diffs make sure that the check is performed and, when the failure is detected, the calling routines themselves would normally return ENOBUFS. Userland applications generating the system calls that eventually end up calling the allocation routines (e.g. through sosend()) should check for the possibility of ENOBUFS and handle the situation appropriately.

These diffs are for -STABLE, but if the decision is made to commit them, I will gladly make sure that they function and do not interfere with other work that may be in progress on this part of the code in -CURRENT. The first diff applies to sys/kern, whereas the second applies to sys/sys/mbuf.h.

--8<--[mbuf.patch]--8<--
--8<--[end]--8<--
--8<--[mbuf2.patch]--8<--
--8<--[end]--8<--

There are still a few calls to the allocation routines and macros in some of the net module code (net/, netinet/, etc.) made with M_WAIT. Most of these calls check for the possibility of error; a few still do not. From what I've been able to see, what remains is not much and can be remedied fairly quickly (I just never got around to it).
Again, if interest arises, I will gladly go over that as well -- these patches, however, take care of the main problem.

--Bosko M.

```diff
diff -ruN /usr/src/sys/kern.old/uipc_mbuf.c /usr/src/sys/kern/uipc_mbuf.c
--- /usr/src/sys/kern.old/uipc_mbuf.c	Wed Sep 8 20:45:50 1999
+++ /usr/src/sys/kern/uipc_mbuf.c	Wed Sep 29 19:31:37 1999
@@ -47,6 +47,10 @@
 #include <vm/vm_kern.h>
 #include <vm/vm_extern.h>
 
+#ifdef INVARIANTS
+#include <machine/cpu.h>
+#endif
+
 static void mbinit __P((void *));
 SYSINIT(mbuf, SI_SUB_MBUF, SI_ORDER_FIRST, mbinit, NULL)
@@ -60,6 +64,8 @@
 int max_hdr;
 int max_datalen;
 
+static u_int m_mballoc_wid = 0, m_clalloc_wid = 0;
+
 SYSCTL_INT(_kern_ipc, KIPC_MAX_LINKHDR, max_linkhdr, CTLFLAG_RW,
 	&max_linkhdr, 0, "");
 SYSCTL_INT(_kern_ipc, KIPC_MAX_PROTOHDR, max_protohdr, CTLFLAG_RW,
@@ -73,9 +79,12 @@
 /* "number of clusters of pages" */
 #define NCL_INIT	1
-
 #define NMB_INIT	16
+/* This sleep time definition/hz will determine the duration of the
+   mbuf and mbuf cluster "wait" sleeps if exhaustion occurs. */
+#define M_SLEEP_TIME	32
+
 /* ARGSUSED*/
 static void
 mbinit(dummy)
@@ -153,6 +162,50 @@
 	return (1);
 }
 
+/*
+ * For callers with M_WAIT (who prefer -- and can afford -- to wait
+ * a little longer as opposed to immediately return null).
+ */
+struct mbuf *
+m_mballoc_wait(caller, type)
+	int caller;
+	u_short type;
+{
+	struct mbuf *p;
+
+	/* Sleep until something's available or until we expire. */
+	m_mballoc_wid++;
+	if ((tsleep(&m_mballoc_wid, PVM, "mballc",
+	    M_SLEEP_TIME)) == EWOULDBLOCK)
+		m_mballoc_wid--;
+
+	/*
+	 * Now that we (think) that we've got something, we will redo an
+	 * MGET, but avoid getting into another instance of m_mballoc_wait()
+	 */
+#define m_mballoc_wait(caller,type) (struct mbuf *)0
+	if (caller == MGET_C) {
+		MGET(p, M_WAIT, type);
+	} else {
+		MGETHDR(p, M_WAIT, type);
+	}
+#undef m_mballoc_wait
+
+	return (p);
+}
+
+/*
+ * Function used to wakeup sleepers waiting for mbufs...
+ */
+void
+m_mballoc_wakeup(void)
+{
+	if (m_mballoc_wid) {
+		m_mballoc_wid = 0;
+		wakeup(&m_mballoc_wid);
+	}
+}
+
 #if MCLBYTES > PAGE_SIZE
 static int i_want_my_mcl;
@@ -242,6 +295,46 @@
 }
 
 /*
+ * For callers with M_WAIT (who prefer -- and can afford -- to wait
+ * a little longer as opposed to immediately return null).
+ */
+caddr_t
+m_clalloc_wait(void)
+{
+	caddr_t p;
+
+	KASSERT(intr_nesting_level == 0, ("CLALLOC: CANNOT WAIT IN INTERRUPT"));
+
+	/* Sleep until something's available or until we expire. */
+	m_clalloc_wid++;
+	if ((tsleep(&m_clalloc_wid, PVM, "mclalc",
+	    M_SLEEP_TIME)) == EWOULDBLOCK)
+		m_clalloc_wid--;
+
+	/*
+	 * Now that we (think) that we've got something, we will redo and
+	 * MGET, but avoid getting into another instance of m_clalloc_wait()
+	 */
+#define m_clalloc_wait() (caddr_t)0
+	MCLALLOC(p, M_WAIT);
+#undef m_clalloc_wait
+
+	return (p);
+}
+
+/*
+ * Function used to wakeup sleepers waiting for mbuf clusters...
+ */
+void
+m_clalloc_wakeup(void)
+{
+	if (m_clalloc_wid) {
+		m_clalloc_wid = 0;
+		wakeup(&m_clalloc_wid);
+	}
+}
+
+/*
  * When MGET fails, ask protocols to free space when short of memory,
  * then re-attempt to allocate an mbuf.
  */
@@ -254,19 +347,27 @@
 	/*
 	 * Must only do the reclaim if not in an interrupt context.
 	 */
-	if (i == M_WAIT)
+	if (i == M_WAIT) {
+		KASSERT(intr_nesting_level == 0,
+		    ("MBALLOC: CANNOT WAIT IN INTERRUPT"));
 		m_reclaim();
+	}
+
+	/*
+	 * XXX Be paranoid and define both as null.  We don't want to wait
+	 * from here.
+	 */
+#define m_mballoc_wait(caller,type) (struct mbuf *)0
 #define m_retry(i, t)	(struct mbuf *)0
 	MGET(m, i, t);
 #undef m_retry
+#undef m_mballoc_wait
+
 	if (m != NULL) {
 		mbstat.m_wait++;
-	} else {
-		if (i == M_DONTWAIT)
-			mbstat.m_drops++;
-		else
-			panic("Out of mbuf clusters");
-	}
+	} else
+		mbstat.m_drops++;
+
 	return (m);
 }
@@ -284,17 +385,16 @@
 	 */
 	if (i == M_WAIT)
 		m_reclaim();
+#define m_mballoc_wait(caller,type) (struct mbuf *)0
 #define m_retryhdr(i, t)	(struct mbuf *)0
 	MGETHDR(m, i, t);
 #undef m_retryhdr
+#undef m_mballoc_wait
 	if (m != NULL) {
 		mbstat.m_wait++;
-	} else {
-		if (i == M_DONTWAIT)
-			mbstat.m_drops++;
-		else
-			panic("Out of mbuf clusters");
-	}
+	} else
+		mbstat.m_drops++;
+
 	return (m);
 }
diff -ruN /usr/src/sys/kern.old/uipc_socket.c /usr/src/sys/kern/uipc_socket.c
--- /usr/src/sys/kern.old/uipc_socket.c	Wed Sep 8 20:45:50 1999
+++ /usr/src/sys/kern/uipc_socket.c	Sun Sep 26 19:33:42 1999
@@ -486,15 +486,21 @@
 	} else do {
 		if (top == 0) {
 			MGETHDR(m, M_WAIT, MT_DATA);
+			if (m == NULL)
+				return ENOBUFS;
 			mlen = MHLEN;
 			m->m_pkthdr.len = 0;
 			m->m_pkthdr.rcvif = (struct ifnet *)0;
 		} else {
 			MGET(m, M_WAIT, MT_DATA);
+			if (m == NULL)
+				return ENOBUFS;
 			mlen = MLEN;
 		}
 		if (resid >= MINCLSIZE) {
 			MCLGET(m, M_WAIT);
+			if (m == NULL)
+				return ENOBUFS;
 			if ((m->m_flags & M_EXT) == 0)
 				goto nopages;
 			mlen = MCLBYTES;
@@ -606,6 +612,8 @@
 		flags = 0;
 	if (flags & MSG_OOB) {
 		m = m_get(M_WAIT, MT_DATA);
+		if (m == NULL)
+			return ENOBUFS;
 		error = (*pr->pr_usrreqs->pru_rcvoob)(so, m, flags & MSG_PEEK);
 		if (error)
 			goto bad;
diff -ruN /usr/src/sys/kern.old/uipc_socket2.c /usr/src/sys/kern/uipc_socket2.c
--- /usr/src/sys/kern.old/uipc_socket2.c	Wed Sep 8 20:45:50 1999
+++ /usr/src/sys/kern/uipc_socket2.c	Sun Sep 26 16:17:56 1999
@@ -407,7 +407,15 @@
 sbrelease(sb)
 	struct sockbuf *sb;
 {
-
+	/*
+	 * XXX If sockbuf is locked and we call sbflush, we will
+	 * panic for sure, avoid that for now by just unlocking the
+	 * sockbuf, so that sbflush can flush it, because if this is
+	 * occuring, chances are we want to kill the process holding
+	 * the sockbuf space anyway...
+	 */
+	if (sb->sb_flags & SB_LOCK)
+		sb->sb_flags &= ~SB_LOCK;
 	sbflush(sb);
 	sb->sb_hiwat = sb->sb_mbmax = 0;
 }
diff -ruN /usr/src/sys/kern.old/uipc_syscalls.c /usr/src/sys/kern/uipc_syscalls.c
--- /usr/src/sys/kern.old/uipc_syscalls.c	Wed Sep 8 20:45:50 1999
+++ /usr/src/sys/kern/uipc_syscalls.c	Sun Sep 26 19:34:29 1999
@@ -1620,6 +1620,8 @@
 	 * Get an mbuf header and set it up as having external storage.
 	 */
 	MGETHDR(m, M_WAIT, MT_DATA);
+	if (m == NULL)
+		return ENOBUFS;
 	m->m_ext.ext_free = sf_buf_free;
 	m->m_ext.ext_ref = sf_buf_ref;
 	m->m_ext.ext_buf = (void *)sf->kva;
```

How-To-Repeat: There are several ways to reproduce this problem. An obvious [local] way to do this would be to sockopt large socket buffer space. Eventually, we will run out of mbufs and panic.