Summary: | [tcp] TCP stack lock contention with short-lived connections | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | Base System | Reporter: | jcharbon | ||||||
Component: | kern | Assignee: | Julien Charbon <jch> | ||||||
Status: | Closed FIXED | ||||||||
Severity: | Affects Some People | CC: | chris, jch, jhb | ||||||
Priority: | Normal | Flags: | jch:
mfc-stable11+
jch: mfc-stable10+ |
||||||
Version: | 8.4-RELEASE | ||||||||
Hardware: | Any | ||||||||
OS: | Any | ||||||||
Attachments: |
|
Description
jcharbon
2013-11-04 13:30:02 UTC
Responsible Changed From-To: freebsd-bugs->freebsd-net Over to networking group Joined a first patch that removes INP_INFO lock from tcp_usr_accept(): This changes simply follows the advice made in corresponding code comment: "A better fix would prevent the socket from being placed in the listen queue until all fields are fully initialized." For more technical details, check the comment in related change below: http://svnweb.freebsd.org/base?view=revision&revision=175612 With this patch applied we see no regressions and a performance improvement of ~5% i.e with 9.2 vanilla kernel: 52k TCP Queries Per Second, with 9.2 + joined patch: 55k TCP QPS. -- Julien Author: gnn Date: Tue Jan 28 20:28:32 2014 New Revision: 261242 URL: http://svnweb.freebsd.org/changeset/base/261242 Log: Decrease lock contention within the TCP accept case by removing the INP_INFO lock from tcp_usr_accept. As the PR/patch states this was following the advice already in the code. See the PR below for a full disucssion of this change and its measured effects. PR: 183659 Submitted by: Julian Charbon Reviewed by: jhb Modified: head/sys/netinet/tcp_syncache.c head/sys/netinet/tcp_usrreq.c Modified: head/sys/netinet/tcp_syncache.c ============================================================================== --- head/sys/netinet/tcp_syncache.c Tue Jan 28 19:12:31 2014 (r261241) +++ head/sys/netinet/tcp_syncache.c Tue Jan 28 20:28:32 2014 (r261242) @@ -682,7 +682,7 @@ syncache_socket(struct syncache *sc, str * connection when the SYN arrived. If we can't create * the connection, abort it. */ - so = sonewconn(lso, SS_ISCONNECTED); + so = sonewconn(lso, 0); if (so == NULL) { /* * Drop the connection; we will either send a RST or @@ -922,6 +922,8 @@ syncache_socket(struct syncache *sc, str INP_WUNLOCK(inp); + soisconnected(so); + TCPSTAT_INC(tcps_accepts); return (so); Modified: head/sys/netinet/tcp_usrreq.c ============================================================================== --- head/sys/netinet/tcp_usrreq.c Tue Jan 28 19:12:31 2014 (r261241) +++ head/sys/netinet/tcp_usrreq.c Tue Jan 28 20:28:32 2014 (r261242) @@ -610,13 +610,6 @@ out: /* * Accept a connection. Essentially all the work is done at higher levels; * just return the address of the peer, storing through addr. - * - * The rationale for acquiring the tcbinfo lock here is somewhat complicated, - * and is described in detail in the commit log entry for r175612. Acquiring - * it delays an accept(2) racing with sonewconn(), which inserts the socket - * before the inpcb address/port fields are initialized. A better fix would - * prevent the socket from being placed in the listen queue until all fields - * are fully initialized. */ static int tcp_usr_accept(struct socket *so, struct sockaddr **nam) @@ -633,7 +626,6 @@ tcp_usr_accept(struct socket *so, struct inp = sotoinpcb(so); KASSERT(inp != NULL, ("tcp_usr_accept: inp == NULL")); - INP_INFO_RLOCK(&V_tcbinfo); INP_WLOCK(inp); if (inp->inp_flags & (INP_TIMEWAIT | INP_DROPPED)) { error = ECONNABORTED; @@ -653,7 +645,6 @@ tcp_usr_accept(struct socket *so, struct out: TCPDEBUG2(PRU_ACCEPT); INP_WUNLOCK(inp); - INP_INFO_RUNLOCK(&V_tcbinfo); if (error == 0) *nam = in_sockaddr(port, &addr); return error; _______________________________________________ svn-src-all@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-all To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org" State Changed From-To: open->patched Patched with commit 261242 Just a follow-up that updates lock profiling results with short-lived TCP connection traffic on FreeBSD-10.0 RELEASE: (Previous results were made on FreeBSD-9.2 RELEASE) o FreeBSD-10 RELEASE: # sysctl debug.lock.prof.stats | head -2; sysctl debug.lock.prof.stats | sort -n -k 4 -r | head -5 debug.lock.prof.stats: max wait_max total wait_total count avg wait_avg cnt_hold cnt_lock name 37 321900 3049892 13033648 610019 4 21 0 588013 sys/netinet/tcp_input.c:778 (rw:tcp) tcp_input() (SYN|FIN|RST) 51 115462 3240265 12270984 553157 5 22 0 545293 sys/netinet/tcp_input.c:1013 (rw:tcp) tcp_input() (state != ESTABLISHED) 29 62577 1170617 8754815 305885 3 28 0 296845 sys/netinet/tcp_usrreq.c:728 (rw:tcp) tcp_usr_close() 6 62645 146544 8548857 292058 0 29 0 283587 sys/netinet/tcp_usrreq.c:984 (rw:tcp) tcp_usr_shutdown() 11 62595 198811 6525067 309009 0 21 0 304522 sys/netinet/tcp_usrreq.c:635 (rw:tcp) tcp_usr_accept() - If lock contention spots moved a little between 9.2 and 10.0, nothing major as the top 5 still belongs to (rw:tcp) lock (a.k.a. TCP INP_INFO). o FreeBSD-10 RELEASE + PCBGROUP kernel option (by popular demand): # sysctl debug.lock.prof.stats | head -2; sysctl debug.lock.prof.stats | sort -n -k 4 -r | head -5 debug.lock.prof.stats: max wait_max total wait_total count avg wait_avg cnt_hold cnt_lock name 58 84250 2970633 13154832 622401 4 21 0 598964 sys/netinet/tcp_input.c:778 (rw:tcp) tcp_input() (SYN|FIN|RST) 47 224326 3375328 12945466 562451 6 23 0 554567 sys/netinet/tcp_input.c:1013 (rw:tcp) tcp_input() (state != ESTABLISHED) 22 84332 1193078 9693951 311555 3 31 0 302420 sys/netinet/tcp_usrreq.c:728 (rw:tcp) tcp_usr_close() 6 84307 151411 9137383 298120 0 30 0 289496 sys/netinet/tcp_usrreq.c:984 (rw:tcp) tcp_usr_shutdown() 15 84351 201705 6504520 314353 0 20 0 310270 sys/netinet/tcp_usrreq.c:635 (rw:tcp) tcp_usr_accept() - No changes at all in first ranks by using PCBGROUP option on FreeBSD-10 RELEASE. I have indeed checked that PCBGROUP was in use as at #36 rank there is the specific pcbgroup lock: 11 9 289817 4815 1505626 0 0 0 16054 sys/netinet/in_pcb.c:1530 (sleep mutex:pcbgroup) o FreeBSD-10 RELEASE + current lock mitigation patches [1][2]: # sysctl debug.lock.prof.stats | head -2; sysctl debug.lock.prof.stats | sort -n -k 4 -r | head -20 debug.lock.prof.stats: max wait_max total wait_total count avg wait_avg cnt_hold cnt_lock name 29 297 3781629 13476466 734686 5 18 0 715214 sys/netinet/tcp_input.c:778 (rw:tcp) tcp_input() (SYN|FIN|RST) 35 287 3817278 12301410 672907 5 18 0 669324 sys/netinet/tcp_input.c:1013 (rw:tcp) tcp_input() (state != ESTABLISHED) 18 170 1392058 2494823 367131 3 6 0 357888 sys/netinet/tcp_usrreq.c:719 (rw:tcp) tcp_usr_shutdown() 7 141 182209 2433120 350488 0 6 0 344878 sys/netinet/tcp_usrreq.c:975 (rw:tcp) tcp_usr_close() 10 259 26786 933073 38101 0 24 0 37624 sys/netinet/tcp_timer.c:493 (rw:tcp) tcp_timer_rexmt() - No more tcp_usr_accept() (expected) o Global results: Maximum short-lived TCP connection rate without dropping a single packet: - FreeBSD 10.0 RELEASE: 40.0k - FreeBSD 10.0 RELEASE + PCBGROUP: 40.0k - FreeBSD 10.0 RELEASE + patches: 56.8k [1] Decrease lock contention within the TCP accept case by removing the INP_INFO lock from tcp_usr_accept. http://svnweb.freebsd.org/base?view=revision&revision=261242 [2] tw-clock-v2.patch attached in: http://lists.freebsd.org/pipermail/freebsd-net/2014-March/038124.html -- Julien Just a follow-up that updates lock profiling results with short-lived TCP connection traffic on FreeBSD-10.0 RELEASE: (Previous results were made on FreeBSD-9.2 RELEASE) o FreeBSD-10 RELEASE: # sysctl debug.lock.prof.stats | head -2; sysctl debug.lock.prof.stats | sort -n -k 4 -r | head -5 debug.lock.prof.stats: max wait_max total wait_total count avg wait_avg cnt_hold cnt_lock name 37 321900 3049892 13033648 610019 4 21 0 588013 sys/netinet/tcp_input.c:778 (rw:tcp) tcp_input() (SYN|FIN|RST) 51 115462 3240265 12270984 553157 5 22 0 545293 sys/netinet/tcp_input.c:1013 (rw:tcp) tcp_input() (state != ESTABLISHED) 29 62577 1170617 8754815 305885 3 28 0 296845 sys/netinet/tcp_usrreq.c:728 (rw:tcp) tcp_usr_close() 6 62645 146544 8548857 292058 0 29 0 283587 sys/netinet/tcp_usrreq.c:984 (rw:tcp) tcp_usr_shutdown() 11 62595 198811 6525067 309009 0 21 0 304522 sys/netinet/tcp_usrreq.c:635 (rw:tcp) tcp_usr_accept() - If lock contention spots moved a little between 9.2 and 10.0, nothing major as the top 5 still belongs to (rw:tcp) lock (a.k.a. TCP INP_INFO). o FreeBSD-10 RELEASE + PCBGROUP kernel option (by popular demand): # sysctl debug.lock.prof.stats | head -2; sysctl debug.lock.prof.stats | sort -n -k 4 -r | head -5 debug.lock.prof.stats: max wait_max total wait_total count avg wait_avg cnt_hold cnt_lock name 58 84250 2970633 13154832 622401 4 21 0 598964 sys/netinet/tcp_input.c:778 (rw:tcp) tcp_input() (SYN|FIN|RST) 47 224326 3375328 12945466 562451 6 23 0 554567 sys/netinet/tcp_input.c:1013 (rw:tcp) tcp_input() (state != ESTABLISHED) 22 84332 1193078 9693951 311555 3 31 0 302420 sys/netinet/tcp_usrreq.c:728 (rw:tcp) tcp_usr_close() 6 84307 151411 9137383 298120 0 30 0 289496 sys/netinet/tcp_usrreq.c:984 (rw:tcp) tcp_usr_shutdown() 15 84351 201705 6504520 314353 0 20 0 310270 sys/netinet/tcp_usrreq.c:635 (rw:tcp) tcp_usr_accept() - No changes at all in first ranks by using PCBGROUP option on FreeBSD-10 RELEASE. I have indeed checked that PCBGROUP was in use as at #36 rank there is the specific pcbgroup lock: 11 9 289817 4815 1505626 0 0 0 16054 sys/netinet/in_pcb.c:1530 (sleep mutex:pcbgroup) o FreeBSD-10 RELEASE + current lock mitigation patches [1][2]: # sysctl debug.lock.prof.stats | head -2; sysctl debug.lock.prof.stats | sort -n -k 4 -r | head -20 debug.lock.prof.stats: max wait_max total wait_total count avg wait_avg cnt_hold cnt_lock name 29 297 3781629 13476466 734686 5 18 0 715214 sys/netinet/tcp_input.c:778 (rw:tcp) tcp_input() (SYN|FIN|RST) 35 287 3817278 12301410 672907 5 18 0 669324 sys/netinet/tcp_input.c:1013 (rw:tcp) tcp_input() (state != ESTABLISHED) 18 170 1392058 2494823 367131 3 6 0 357888 sys/netinet/tcp_usrreq.c:719 (rw:tcp) tcp_usr_shutdown() 7 141 182209 2433120 350488 0 6 0 344878 sys/netinet/tcp_usrreq.c:975 (rw:tcp) tcp_usr_close() 10 259 26786 933073 38101 0 24 0 37624 sys/netinet/tcp_timer.c:493 (rw:tcp) tcp_timer_rexmt() - No more tcp_usr_accept() (expected) o Global results: Maximum short-lived TCP connection rate without dropping a single packet: - FreeBSD 10.0 RELEASE: 40.0k - FreeBSD 10.0 RELEASE + PCBGROUP: 40.0k - FreeBSD 10.0 RELEASE + patches: 56.8k [1] Decrease lock contention within the TCP accept case by removing the INP_INFO lock from tcp_usr_accept. http://svnweb.freebsd.org/base?view=revision&revision=261242 [2] tw-clock-v2.patch attached in: http://lists.freebsd.org/pipermail/freebsd-net/2014-March/038124.html -- Julien Further refinements to the original patch are currently in review. Those will need to go in before this is merged to 10. A commit references this bug: Author: jch Date: Mon Aug 3 12:13:58 UTC 2015 New revision: 286227 URL: https://svnweb.freebsd.org/changeset/base/286227 Log: Decompose TCP INP_INFO lock to increase short-lived TCP connections scalability: - The existing TCP INP_INFO lock continues to protect the global inpcb list stability during full list traversal (e.g. tcp_pcblist()). - A new INP_LIST lock protects inpcb list actual modifications (inp allocation and free) and inpcb global counters. It allows to use TCP INP_INFO_RLOCK lock in critical paths (e.g. tcp_input()) and INP_INFO_WLOCK only in occasional operations that walk all connections. PR: 183659 Differential Revision: https://reviews.freebsd.org/D2599 Reviewed by: jhb, adrian Tested by: adrian, nitroboost-gmail.com Sponsored by: Verisign, Inc. Changes: head/sys/dev/cxgb/ulp/tom/cxgb_cpl_io.c head/sys/dev/cxgb/ulp/tom/cxgb_listen.c head/sys/dev/cxgbe/tom/t4_connect.c head/sys/dev/cxgbe/tom/t4_cpl_io.c head/sys/dev/cxgbe/tom/t4_listen.c head/sys/netinet/in_pcb.c head/sys/netinet/in_pcb.h head/sys/netinet/tcp_input.c head/sys/netinet/tcp_subr.c head/sys/netinet/tcp_syncache.c head/sys/netinet/tcp_timer.c head/sys/netinet/tcp_timewait.c head/sys/netinet/tcp_usrreq.c head/sys/netinet/toecore.c head/sys/netinet6/in6_pcb.c Fixed in 11.0-CURRENT A commit references this bug: Author: jch Date: Mon Jul 18 08:20:32 UTC 2016 New revision: 302995 URL: https://svnweb.freebsd.org/changeset/base/302995 Log: MFC r261242: Decrease lock contention within the TCP accept case by removing the INP_INFO lock from tcp_usr_accept. As the PR/patch states this was following the advice already in the code. See the PR below for a full discussion of this change and its measured effects. PR: 183659 Submitted by: Julien Charbon Reviewed by: jhb Changes: _U stable/10/ stable/10/sys/netinet/tcp_syncache.c stable/10/sys/netinet/tcp_usrreq.c A commit references this bug: Author: jch Date: Thu Nov 24 14:48:47 UTC 2016 New revision: 309108 URL: https://svnweb.freebsd.org/changeset/base/309108 Log: MFC r286227, r286443: r286227: Decompose TCP INP_INFO lock to increase short-lived TCP connections scalability: - The existing TCP INP_INFO lock continues to protect the global inpcb list stability during full list traversal (e.g. tcp_pcblist()). - A new INP_LIST lock protects inpcb list actual modifications (inp allocation and free) and inpcb global counters. It allows to use TCP INP_INFO_RLOCK lock in critical paths (e.g. tcp_input()) and INP_INFO_WLOCK only in occasional operations that walk all connections. PR: 183659 Differential Revision: https://reviews.freebsd.org/D2599 Reviewed by: jhb, adrian Tested by: adrian, nitroboost-gmail.com Sponsored by: Verisign, Inc. r286443: Fix a kernel assertion issue introduced with r286227: Avoid too strict INP_INFO_RLOCK_ASSERT checks due to tcp_notify() being called from in6_pcbnotify(). Reported by: Larry Rosenman <ler@lerctr.org> Submitted by: markj, jch Changes: stable/10/sys/dev/cxgb/ulp/tom/cxgb_cpl_io.c stable/10/sys/dev/cxgb/ulp/tom/cxgb_listen.c stable/10/sys/dev/cxgbe/tom/t4_connect.c stable/10/sys/dev/cxgbe/tom/t4_cpl_io.c stable/10/sys/dev/cxgbe/tom/t4_listen.c stable/10/sys/netinet/in_pcb.c stable/10/sys/netinet/in_pcb.h stable/10/sys/netinet/tcp_input.c stable/10/sys/netinet/tcp_subr.c stable/10/sys/netinet/tcp_syncache.c stable/10/sys/netinet/tcp_timer.c stable/10/sys/netinet/tcp_timewait.c stable/10/sys/netinet/tcp_usrreq.c stable/10/sys/netinet/toecore.c stable/10/sys/netinet6/in6_pcb.c |