Summary: | [patch] [panic] Kernel panic in udp_input() | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | Base System | Reporter: | Rick <vrwmiller> | ||||||
Component: | kern | Assignee: | freebsd-bugs (Nobody) <bugs> | ||||||
Status: | Closed FIXED | ||||||||
Severity: | Affects Only Me | ||||||||
Priority: | Normal | ||||||||
Version: | 8.3-RELEASE | ||||||||
Hardware: | Any | ||||||||
OS: | Any | ||||||||
Attachments: |
|
Description
Rick
2012-10-22 18:40:00 UTC
Some of our developers spent a significant amount of time troubleshooting this. The race condition has been identified. The relevant information is below: It is a race condition between: - udp_input() - udp_detach() - udp_pcblist() How to reproduce: - Call sysctl(" net.inet.udp.pcblist") continuously with: $ while true; do sysctl -x net.inet.udp.pcblist > /dev/null; done; - Open a bunch of UDP sockets (to slow down udp_pcblist() call): $ for port in $(jot - 20000 24000 1); do socat -u -T 1 UDP4-LISTEN:$port,reuseaddr GOPEN:/dev/null & done - Launch a UDP server that close() after each request it received: $ while true; do socat -u -T 0.0001 UDP4-LISTEN:12345,reuseaddr GOPEN:/dev/null; done - Bombard this server with UDP request from another machine (could be also on the same machine): $ while true; do socat -u EXEC:'/bin/echo' UDP4:10.51.33.40:12345 & sleep 0.0001; done - Check these messages in /var/log/messages after 5/10 minutes: Oct 24 09:05:21 flumpe4-qa2 kernel: udp_input(): Using freed inp 0xffffff0df43d7930 inp->inp_refcount 1 inp->inp_ppcb 0 Oct 24 09:05:21 flumpe4-qa2 kernel: udp_pcblist(): Using freed inp 0xffffff0df43d7930 inp->inp_refcount 1 inp->inp_ppcb 0 (inp pointer being the same for udp_input() _and_ udp_pcblist(), _and_ inp->inp_ppcb being NULL) Which is the proof that: udp_input() retrieves an inp pointer 0xffffff0df43d7930 _and_ this inp has been released in_pcbrele() but not deleted because someone had still a reference on it (flag INP_FREED introduced by debug patch) _and_ udp_detach() has been called on this inp (inp->inp_ppcb being NULL) _and_ this is udp_pcblist() that holds this reference on this inp. Without the patch the crash will occur in udp_input(): /* Check inp state */ if ((inp->inp_flags2 & INP_FREED) && (inp->inp_socket == NULL)) { log(LOG_INFO, "udp_input(): Using freed inp %p inp->inp_refcount %d inp->inp_ppcb %p\n", inp, inp->inp_refcount, inp->inp_ppcb); INP_RUNLOCK(inp); goto badunlocked; } up = intoudpcb(inp); // intoudpcb(ip) being ((struct udpcb *)(ip)->inp_ppcb) and inp_ppcb being NULL... if (up->u_tun_func == NULL) { // Panic here in default kernel Below the patch used with previous instructions that highlights and logs this race condition: Index: sys/netinet/in_pcb.c =================================================================== --- sys/netinet/in_pcb.c (revision 32) +++ sys/netinet/in_pcb.c (working copy) @@ -1055,8 +1055,10 @@ INP_WLOCK_ASSERT(inp); inp->inp_refcount--; - if (inp->inp_refcount > 0) + if (inp->inp_refcount > 0) { + inp->inp_flags2 |= INP_FREED; return (0); + } in_pcbfree_internal(inp); return (1); } Index: sys/netinet/in_pcb.h =================================================================== --- sys/netinet/in_pcb.h (revision 32) +++ sys/netinet/in_pcb.h (working copy) @@ -443,6 +443,7 @@ */ #define INP_LLE_VALID 0x00000001 /* cached lle is valid */ #define INP_RT_VALID 0x00000002 /* cached rtentry is valid */ +#define INP_FREED 0x00000004 /* inp no more valid */ #define INPLOOKUP_WILDCARD 1 #define sotoinpcb(so) ((struct inpcb *)(so)->so_pcb) Index: sys/netinet/udp_usrreq.c =================================================================== --- sys/netinet/udp_usrreq.c (revision 32) +++ sys/netinet/udp_usrreq.c (working copy) @@ -624,6 +624,13 @@ INP_RUNLOCK(inp); goto badunlocked; } + /* Check inp state */ + if ((inp->inp_flags2 & INP_FREED) && (inp->inp_socket == NULL)) { + log(LOG_INFO, "udp_input(): Using freed inp %p inp->inp_refcount %d\n", + inp, inp->inp_refcount); + INP_RUNLOCK(inp); + goto badunlocked; + } up = intoudpcb(inp); if (up->u_tun_func == NULL) { udp_append(inp, ip, m, iphlen + sizeof(struct udphdr), &udp_in); @@ -797,6 +804,10 @@ for (i = 0; i < n; i++) { inp = inp_list[i]; INP_WLOCK(inp); + if ((inp->inp_flags2 & INP_FREED) && (inp->inp_socket == NULL)) { + log(LOG_INFO, "udp_pcblist(): Using freed inp %p inp->inp_refcount %d\n", + inp, inp->inp_refcount); + } if (!in_pcbrele(inp)) INP_WUNLOCK(inp); } @@ -1443,6 +1454,7 @@ inp = sotoinpcb(so); inp->inp_vflag |= INP_IPV4; inp->inp_ip_ttl = V_ip_defttl; + inp->inp_flags2 = 0; error = udp_newudpcb(inp); if (error) { I confirm that this race condition is also present in IPv6 UDP code (Not a surprise as FreeBSD UDP v4 and v6 codes are pretty symmetric), and below the stack trace: Fatal trap 12: page fault while in kernel mode cpuid = 7; apic id = 22 fault virtual address = 0x7 fault code = supervisor read data, page not present instruction pointer = 0x20:0xffffffff807b60be stack pointer = 0x28:0xffffffa41c83e510 frame pointer = 0x28:0xffffffa41c83e5a0 code segment = base rx0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 12 (irq291: ix1:que 7) trap number = 12 panic: page fault cpuid = 7 KDB: stack backtrace: #0 0xffffffff80642b3e at kdb_backtrace+0x5e #1 0xffffffff8060fd57 at panic+0x187 #2 0xffffffff80905990 at trap_fatal+0x290 #3 0xffffffff80905ce1 at trap_pfault+0x201 #4 0xffffffff8090619f at trap+0x3df #5 0xffffffff808ed674 at calltrap+0x8 #6 0xffffffff807b6986 at ip6_savecontrol+0x36 #7 0xffffffff807cd5c0 at udp6_append+0x60 #8 0xffffffff807ce99d at udp6_input+0x63d #9 0xffffffff807b76bf at ip6_input+0xb4f #10 0xffffffff806cb23e at netisr_dispatch_src+0x7e #11 0xffffffff806c12dd at ether_demux+0x14d #12 0xffffffff806c16e7 at ether_input+0x197 #13 0xffffffff806c11ff at ether_demux+0x6f #14 0xffffffff806c16e7 at ether_input+0x197 #15 0xffffffff803e3d8b at ixgbe_rxeof+0x1eb #16 0xffffffff803e4578 at ixgbe_msix_que+0xa8 #17 0xffffffff805e7794 at intr_event_execute_handlers+0x104 -- Julien I confirm this issue is still reproducible in FreeBSD 8.4-BETA1. Joined a smaller patch wrote my Marc to fix it. -- Julien Thanks -- I'll try to look at this tonight. Definitely want to fix this =
before 8.4 ships, if we can -- apologies for the delay :-(.
Robert
On 9 Apr 2013, at 15:51, Charbon, Julien wrote:
> <udp_input_panic_minimal.patch>
Author: rwatson Date: Sun Apr 14 16:25:37 2013 New Revision: 249478 URL: http://svnweb.freebsd.org/changeset/base/249478 Log: FreeBSD 8.0 introduced inpcb reference counting, and FreeBSD 8.1 began using that reference count to protect inpcb stability in udp_pcblist() and other monitoring functions, preventing the inpcb from being garbage collected across potentially sleeping copyout() operations despite the inpcb zone becoming shrinkable. However, this introduced a race condition in which inp->inp_socket() might become NULL as a result of the socket being freed, but before the inpcb we removed from the global list of connections, allowing it to be exposed to a third thread invoking udp_input() or udp6_input() which would try to indirect through inp_socket without testing it for NULL. This might occur with particular regularity on systems that frequently run netstat, or which use SNMP for connection monitoring. Later FreeBSD releases use a different reference/destruction model, but stable/8 remained affected in FreeBSD 8.2 and 8.3; the problem could be spotted on very high-load UDP services, such as top-level name servers. An Errata Note for 8.x branches under continuing support might be appropriate. Regardless, this fix should be merged to releng/8.4 prior to 8.4-RELEASE. PR: 172963 Submitted by: Vincent Miller <vmiller@verisign.com> Submitted by: Julien Charbon <jcharbon@verisign.com> Submitted by: Marc De La Gueronniere <mdelagueronniere@verisign.com> Modified: stable/8/sys/netinet/udp_usrreq.c stable/8/sys/netinet6/udp6_usrreq.c Modified: stable/8/sys/netinet/udp_usrreq.c ============================================================================== --- stable/8/sys/netinet/udp_usrreq.c Sun Apr 14 16:20:25 2013 (r249477) +++ stable/8/sys/netinet/udp_usrreq.c Sun Apr 14 16:25:37 2013 (r249478) @@ -495,6 +495,15 @@ udp_input(struct mbuf *m, int off) INP_RLOCK(inp); /* + * Detached PCBs can linger in the list if someone + * holds a reference. (e.g. udp_pcblist) + */ + if (inp->inp_socket == NULL) { + INP_RUNLOCK(inp); + continue; + } + + /* * Handle socket delivery policy for any-source * and source-specific multicast. [RFC3678] */ @@ -620,6 +629,15 @@ udp_input(struct mbuf *m, int off) */ INP_RLOCK(inp); INP_INFO_RUNLOCK(&V_udbinfo); + + /* + * Detached PCBs can linger in the hash table if someone holds a + * reference. (e.g. udp_pcblist) + */ + if (inp->inp_socket == NULL) { + INP_RUNLOCK(inp); + goto badunlocked; + } if (inp->inp_ip_minttl && inp->inp_ip_minttl > ip->ip_ttl) { INP_RUNLOCK(inp); goto badunlocked; Modified: stable/8/sys/netinet6/udp6_usrreq.c ============================================================================== --- stable/8/sys/netinet6/udp6_usrreq.c Sun Apr 14 16:20:25 2013 (r249477) +++ stable/8/sys/netinet6/udp6_usrreq.c Sun Apr 14 16:25:37 2013 (r249478) @@ -273,6 +273,13 @@ udp6_input(struct mbuf **mp, int *offp, } /* + * Detached PCBs can linger in the list if someone + * holds a reference. (e.g. udp_pcblist) + */ + if (inp->inp_socket == NULL) + continue; + + /* * Handle socket delivery policy for any-source * and source-specific multicast. [RFC3678] */ @@ -396,6 +403,15 @@ udp6_input(struct mbuf **mp, int *offp, } INP_RLOCK(inp); INP_INFO_RUNLOCK(&V_udbinfo); + + /* + * Detached PCBs can linger in the hash table if someone holds a + * reference. (e.g. udp_pcblist) + */ + if (inp->inp_socket == NULL) { + INP_RUNLOCK(inp); + goto badunlocked; + } up = intoudpcb(inp); if (up->u_tun_func == NULL) { udp6_append(inp, m, off, &fromsa); _______________________________________________ svn-src-all@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-all To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org" Hi Julien:
I've committed the patch, with minor style tweaks, to stable/8. I also =
communicated, a few days ago, to re@ that we need to get this fix into =
the next 8.4 release candidate, and they are willing to hold the release =
candidate for the merge. I'll ping re@ and find out how quickly they =
want me to do the releng/8.4 merge. Regardless, it sounds like it will =
successfully make 8.4
Thanks again for your patience over the last few months, and also for =
the assiduous debugging and patch generation!
Robert
On 9 Apr 2013, at 15:51, Charbon, Julien wrote:
>=20
> I confirm this issue is still reproducible in FreeBSD 8.4-BETA1.
> Joined a smaller patch wrote my Marc to fix it.
>=20
> --
> Julien
> <udp_input_panic_minimal.patch>
Author: rwatson Date: Fri Apr 19 21:08:56 2013 New Revision: 249660 URL: http://svnweb.freebsd.org/changeset/base/249660 Log: Merge r249478 from stable/8 to releng/8.4: FreeBSD 8.0 introduced inpcb reference counting, and FreeBSD 8.1 began using that reference count to protect inpcb stability in udp_pcblist() and other monitoring functions, preventing the inpcb from being garbage collected across potentially sleeping copyout() operations despite the inpcb zone becoming shrinkable. However, this introduced a race condition in which inp->inp_socket() might become NULL as a result of the socket being freed, but before the inpcb we removed from the global list of connections, allowing it to be exposed to a third thread invoking udp_input() or udp6_input() which would try to indirect through inp_socket without testing it for NULL. This might occur with particular regularity on systems that frequently run netstat, or which use SNMP for connection monitoring. Later FreeBSD releases use a different reference/destruction model, but stable/8 remained affected in FreeBSD 8.2 and 8.3; the problem could be spotted on very high-load UDP services, such as top-level name servers. An Errata Note for 8.x branches under continuing support might be appropriate. Regardless, this fix should be merged to releng/8.4 prior to 8.4-RELEASE. PR: 172963 Submitted by: Vincent Miller <vmiller@verisign.com> Submitted by: Julien Charbon <jcharbon@verisign.com> Submitted by: Marc De La Gueronniere <mdelagueronniere@verisign.com> Approved by: re (rodrigc) Modified: releng/8.4/sys/netinet/udp_usrreq.c releng/8.4/sys/netinet6/udp6_usrreq.c Directory Properties: releng/8.4/sys/ (props changed) releng/8.4/sys/netinet/ (props changed) releng/8.4/sys/netinet6/ (props changed) Modified: releng/8.4/sys/netinet/udp_usrreq.c ============================================================================== --- releng/8.4/sys/netinet/udp_usrreq.c Fri Apr 19 21:08:21 2013 (r249659) +++ releng/8.4/sys/netinet/udp_usrreq.c Fri Apr 19 21:08:56 2013 (r249660) @@ -495,6 +495,15 @@ udp_input(struct mbuf *m, int off) INP_RLOCK(inp); /* + * Detached PCBs can linger in the list if someone + * holds a reference. (e.g. udp_pcblist) + */ + if (inp->inp_socket == NULL) { + INP_RUNLOCK(inp); + continue; + } + + /* * Handle socket delivery policy for any-source * and source-specific multicast. [RFC3678] */ @@ -620,6 +629,15 @@ udp_input(struct mbuf *m, int off) */ INP_RLOCK(inp); INP_INFO_RUNLOCK(&V_udbinfo); + + /* + * Detached PCBs can linger in the hash table if someone holds a + * reference. (e.g. udp_pcblist) + */ + if (inp->inp_socket == NULL) { + INP_RUNLOCK(inp); + goto badunlocked; + } if (inp->inp_ip_minttl && inp->inp_ip_minttl > ip->ip_ttl) { INP_RUNLOCK(inp); goto badunlocked; Modified: releng/8.4/sys/netinet6/udp6_usrreq.c ============================================================================== --- releng/8.4/sys/netinet6/udp6_usrreq.c Fri Apr 19 21:08:21 2013 (r249659) +++ releng/8.4/sys/netinet6/udp6_usrreq.c Fri Apr 19 21:08:56 2013 (r249660) @@ -273,6 +273,13 @@ udp6_input(struct mbuf **mp, int *offp, } /* + * Detached PCBs can linger in the list if someone + * holds a reference. (e.g. udp_pcblist) + */ + if (inp->inp_socket == NULL) + continue; + + /* * Handle socket delivery policy for any-source * and source-specific multicast. [RFC3678] */ @@ -396,6 +403,15 @@ udp6_input(struct mbuf **mp, int *offp, } INP_RLOCK(inp); INP_INFO_RUNLOCK(&V_udbinfo); + + /* + * Detached PCBs can linger in the hash table if someone holds a + * reference. (e.g. udp_pcblist) + */ + if (inp->inp_socket == NULL) { + INP_RUNLOCK(inp); + goto badunlocked; + } up = intoudpcb(inp); if (up->u_tun_func == NULL) { udp6_append(inp, m, off, &fromsa); _______________________________________________ svn-src-all@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-all To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org" ---------- Forwarded message ---------- Date: Fri, 19 Apr 2013 21:08:56 +0000 (UTC) From: Robert Watson <rwatson@FreeBSD.org> To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-releng@freebsd.org Subject: svn commit: r249660 - in releng/8.4/sys: netinet netinet6 Author: rwatson Date: Fri Apr 19 21:08:56 2013 New Revision: 249660 URL: http://svnweb.freebsd.org/changeset/base/249660 Log: Merge r249478 from stable/8 to releng/8.4: FreeBSD 8.0 introduced inpcb reference counting, and FreeBSD 8.1 began using that reference count to protect inpcb stability in udp_pcblist() and other monitoring functions, preventing the inpcb from being garbage collected across potentially sleeping copyout() operations despite the inpcb zone becoming shrinkable. However, this introduced a race condition in which inp->inp_socket() might become NULL as a result of the socket being freed, but before the inpcb we removed from the global list of connections, allowing it to be exposed to a third thread invoking udp_input() or udp6_input() which would try to indirect through inp_socket without testing it for NULL. This might occur with particular regularity on systems that frequently run netstat, or which use SNMP for connection monitoring. Later FreeBSD releases use a different reference/destruction model, but stable/8 remained affected in FreeBSD 8.2 and 8.3; the problem could be spotted on very high-load UDP services, such as top-level name servers. An Errata Note for 8.x branches under continuing support might be appropriate. Regardless, this fix should be merged to releng/8.4 prior to 8.4-RELEASE. PR: 172963 Submitted by: Vincent Miller <vmiller@verisign.com> Submitted by: Julien Charbon <jcharbon@verisign.com> Submitted by: Marc De La Gueronniere <mdelagueronniere@verisign.com> Approved by: re (rodrigc) Modified: releng/8.4/sys/netinet/udp_usrreq.c releng/8.4/sys/netinet6/udp6_usrreq.c Directory Properties: releng/8.4/sys/ (props changed) releng/8.4/sys/netinet/ (props changed) releng/8.4/sys/netinet6/ (props changed) Modified: releng/8.4/sys/netinet/udp_usrreq.c ============================================================================== --- releng/8.4/sys/netinet/udp_usrreq.c Fri Apr 19 21:08:21 2013 (r249659) +++ releng/8.4/sys/netinet/udp_usrreq.c Fri Apr 19 21:08:56 2013 (r249660) @@ -495,6 +495,15 @@ udp_input(struct mbuf *m, int off) INP_RLOCK(inp); /* + * Detached PCBs can linger in the list if someone + * holds a reference. (e.g. udp_pcblist) + */ + if (inp->inp_socket == NULL) { + INP_RUNLOCK(inp); + continue; + } + + /* * Handle socket delivery policy for any-source * and source-specific multicast. [RFC3678] */ @@ -620,6 +629,15 @@ udp_input(struct mbuf *m, int off) */ INP_RLOCK(inp); INP_INFO_RUNLOCK(&V_udbinfo); + + /* + * Detached PCBs can linger in the hash table if someone holds a + * reference. (e.g. udp_pcblist) + */ + if (inp->inp_socket == NULL) { + INP_RUNLOCK(inp); + goto badunlocked; + } if (inp->inp_ip_minttl && inp->inp_ip_minttl > ip->ip_ttl) { INP_RUNLOCK(inp); goto badunlocked; Modified: releng/8.4/sys/netinet6/udp6_usrreq.c ============================================================================== --- releng/8.4/sys/netinet6/udp6_usrreq.c Fri Apr 19 21:08:21 2013 (r249659) +++ releng/8.4/sys/netinet6/udp6_usrreq.c Fri Apr 19 21:08:56 2013 (r249660) @@ -273,6 +273,13 @@ udp6_input(struct mbuf **mp, int *offp, } /* + * Detached PCBs can linger in the list if someone + * holds a reference. (e.g. udp_pcblist) + */ + if (inp->inp_socket == NULL) + continue; + + /* * Handle socket delivery policy for any-source * and source-specific multicast. [RFC3678] */ @@ -396,6 +403,15 @@ udp6_input(struct mbuf **mp, int *offp, } INP_RLOCK(inp); INP_INFO_RUNLOCK(&V_udbinfo); + + /* + * Detached PCBs can linger in the hash table if someone holds a + * reference. (e.g. udp_pcblist) + */ + if (inp->inp_socket == NULL) { + INP_RUNLOCK(inp); + goto badunlocked; + } up = intoudpcb(inp); if (up->u_tun_func == NULL) { udp6_append(inp, m, off, &fromsa); State Changed From-To: open->closed Close PR -- patch now merged to releng/8 and in particular the forthcoming 8.x release. Please let me know if you encounter any further issues -- and thanks for submitting the patch! |