Bug 227760 - Race condition in syncache_lookup and syncache_insert in TCP Handshake
Summary: Race condition in syncache_lookup and syncache_insert in TCP Handshake
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: CURRENT
Hardware: amd64 Any
: --- Affects Some People
Assignee: Navdeep Parhar
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-04-25 10:32 UTC by Harsh Jain
Modified: 2019-04-30 19:01 UTC (History)
6 users (show)

See Also:
np: mfc-stable12+
np: mfc-stable11+
np: mfc-stable10-


Attachments
server_100.sh (2.33 KB, application/x-shellscript)
2018-04-25 10:32 UTC, Harsh Jain
no flags Details
bash_client.sh (2.71 KB, application/x-shellscript)
2018-04-25 10:34 UTC, Harsh Jain
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Harsh Jain 2018-04-25 10:32:26 UTC
Created attachment 192796 [details]
server_100.sh

Some of TCP connection Resets when we try to establish multiple connections with Chelsio TOE Driver.

Test Setup Details:

2 Machines connected Back 2 Back with Chelsio T6 Adapter.

OS : FreeBSD-current.

Application Used: openssl s_time/s_server. Each s_server can handle single client(s_time) connection.


Steps to Re-produce the issue.

Server:

1) kldload if_cxgbe
2) kldload t4_tom
3) ifconfig cc0 1.0.0.9/24 toe up
4) sh server_100.sh (file attached). It will start ~200 s_server connection.


Client:

1) kldload if_cxgbe
2) kldload t4_tom
3) ifconfig cc0 1.0.0.35 toe up
4) ./bash_client_100.sh(file attached). It will submit http GET request for File transfer and repeats the same for given time interval.


Problem:

After some time some of connections will reset.
On debugging We observed that syncache_lookup() in syncache_expand() fails while handling ack(3rd hadshake message) from client. On failure following code snippet in syncache_expand sends reset

if (!V_tcp_syncookiesonly &&
                    sch->sch_last_overflow < time_uptime - SYNCOOKIE_LIFETIME) {
                        SCH_UNLOCK(sch);
                        if ((s = tcp_log_addrs(inc, th, NULL, NULL)))
                                log(LOG_DEBUG, "%s; %s: Spurious ACK, "
                                    "segment rejected (no syncache entry)\n",
                                    s, __func__);
                        goto failed;
                }

Detailed steps:

Client sends SYNC to server ==> 1 server in syncache_add calls syncache_respond.
                                2 syncache_respond will send SYNC+ACK to client
                            <== SYNC+ACK 
                                3 thread handling SYNC will context switch                       
                                  because of that syncache_insert()will not get
                                  a chance to add entry. 
                        ACK ==> another thread in server will start handling
                                ack message. During syncache_expand lookup will
                                fail and RESET will be sent to client.

Following workarounds tried avoid this 

1) Setting sysctl net.inet.tcp.syncookies_only=1 avoids this issue.
2) Following hack to make sure insert happens before sending SYNC+ACK also works.

@@ -1558,17 +1575,23 @@ syncache_add(struct in_conninfo *inc, struct tcpopt *to,
 struct tcphdr *th,
--More--(byte 11894)        /*
         * Do a standard 3-way handshake.
         */
+       if (V_tcp_syncookies && V_tcp_syncookiesonly && sc != &scs) {
+               // compilation error
+               rv = rv +1;
+               rv = rv - 1;
+       }else if (sc != &scs)
+               syncache_insert(sc, sch);   /* locks and unlocks sch */
+
        if (syncache_respond(sc, sch, 0, m) == 0) {
                if (V_tcp_syncookies && V_tcp_syncookiesonly && sc != &scs)
                        syncache_free(sc);
-               else if (sc != &scs)
-                       syncache_insert(sc, sch);   /* locks and unlocks sch */
-               TCPSTAT_INC(tcps_sndacks);
-               TCPSTAT_INC(tcps_sndtotal);
        } else {
-               if (sc != &scs)
-                       syncache_free(sc);
-               TCPSTAT_INC(tcps_sc_dropped);
+                if (sc != &scs) {
+                       SCH_LOCK(sch);
+                       syncache_drop(sc, sch);   /* locks and unlocks sch */
+                       SCH_UNLOCK(sch);
+                       TCPSTAT_INC(tcps_sc_dropped);
+               }
        }


netstat -sp tcp at the time of issue

42824 syncache entries added
                12 retransmitted
                0 dupsyn
                0 dropped
                42820 completed
                0 bucket overflow
                0 cache overflow
                0 reset
                4 stale
                0 aborted
                0 badack
                0 unreach
                0 zone failures

dmesg log with "sysctl net.inet.tcp.log_debug=1"

TCP: [1.0.0.35]:46690 to [1.0.0.9]:1021 tcpflags 0x10<ACK>; syncache_expand: Spurious ACK, segment rejected (no syncache ent     ry)
TCP: [1.0.0.35]:46690 to [1.0.0.9]:1021; syncache_timer: Response timeout, retransmitting (1) SYN|ACK
TCP: [1.0.0.35]:46690 to [1.0.0.9]:1021; syncache_timer: Response timeout, retransmitting (2) SYN|ACK
TCP: [1.0.0.35]:46690 to [1.0.0.9]:1021; syncache_timer: Response timeout, retransmitting (3) SYN|ACK
Comment 1 Harsh Jain 2018-04-25 10:34:18 UTC
Created attachment 192797 [details]
bash_client.sh
Comment 2 commit-hook freebsd_committer freebsd_triage 2018-12-19 01:37:23 UTC
A commit references this bug:

Author: np
Date: Wed Dec 19 01:37:01 UTC 2018
New revision: 342208
URL: https://svnweb.freebsd.org/changeset/base/342208

Log:
  cxgbe/t4_tom: fixes for issues on the passive open side.

  - Fix PR 227760 by getting the TOE to respond to the SYN after the call
    to toe_syncache_add, not during it.  The kernel syncache code calls
    syncache_respond just before syncache_insert.  If the ACK to the
    syncache_respond is processed in another thread it may run before the
    syncache_insert and won't find the entry.  Note that this affects only
    t4_tom because it's the only driver trying to insert and expand
    syncache entries from different threads.

  - Do not leak resources if an embryonic connection terminates at
    SYN_RCVD because of L2 lookup failures.

  - Retire lctx->synq and associated code because there is never a need to
    walk the list of embryonic connections associated with a listener.
    The per-tid state is still called a synq entry in the driver even
    though the synq itself is now gone.

  PR:		227760
  MFC after:	2 weeks
  Sponsored by:	Chelsio Communications

Changes:
  head/sys/dev/cxgbe/tom/t4_connect.c
  head/sys/dev/cxgbe/tom/t4_cpl_io.c
  head/sys/dev/cxgbe/tom/t4_listen.c
  head/sys/dev/cxgbe/tom/t4_tom.c
  head/sys/dev/cxgbe/tom/t4_tom.h
Comment 3 commit-hook freebsd_committer freebsd_triage 2019-03-27 21:30:44 UTC
A commit references this bug:

Author: np
Date: Wed Mar 27 21:29:46 UTC 2019
New revision: 345605
URL: https://svnweb.freebsd.org/changeset/base/345605

Log:
  MFC r342208:

  cxgbe/t4_tom: fixes for issues on the passive open side.

  - Fix PR 227760 by getting the TOE to respond to the SYN after the call
    to toe_syncache_add, not during it.  The kernel syncache code calls
    syncache_respond just before syncache_insert.  If the ACK to the
    syncache_respond is processed in another thread it may run before the
    syncache_insert and won't find the entry.  Note that this affects only
    t4_tom because it's the only driver trying to insert and expand
    syncache entries from different threads.

  - Do not leak resources if an embryonic connection terminates at
    SYN_RCVD because of L2 lookup failures.

  - Retire lctx->synq and associated code because there is never a need to
    walk the list of embryonic connections associated with a listener.
    The per-tid state is still called a synq entry in the driver even
    though the synq itself is now gone.

  PR:		227760
  Sponsored by:	Chelsio Communications

Changes:
_U  stable/12/
  stable/12/sys/dev/cxgbe/tom/t4_connect.c
  stable/12/sys/dev/cxgbe/tom/t4_cpl_io.c
  stable/12/sys/dev/cxgbe/tom/t4_listen.c
  stable/12/sys/dev/cxgbe/tom/t4_tom.c
  stable/12/sys/dev/cxgbe/tom/t4_tom.h
Comment 4 commit-hook freebsd_committer freebsd_triage 2019-04-30 18:03:27 UTC
A commit references this bug:

Author: np
Date: Tue Apr 30 18:03:18 UTC 2019
New revision: 346970
URL: https://svnweb.freebsd.org/changeset/base/346970

Log:
  MFC r342208:

  cxgbe/t4_tom: fixes for issues on the passive open side.

  - Fix PR 227760 by getting the TOE to respond to the SYN after the call
    to toe_syncache_add, not during it.  The kernel syncache code calls
    syncache_respond just before syncache_insert.  If the ACK to the
    syncache_respond is processed in another thread it may run before the
    syncache_insert and won't find the entry.  Note that this affects only
    t4_tom because it's the only driver trying to insert and expand
    syncache entries from different threads.

  - Do not leak resources if an embryonic connection terminates at
    SYN_RCVD because of L2 lookup failures.

  - Retire lctx->synq and associated code because there is never a need to
    walk the list of embryonic connections associated with a listener.
    The per-tid state is still called a synq entry in the driver even
    though the synq itself is now gone.

  PR:		227760
  Sponsored by:	Chelsio Communications

Changes:
_U  stable/11/
  stable/11/sys/dev/cxgbe/tom/t4_connect.c
  stable/11/sys/dev/cxgbe/tom/t4_cpl_io.c
  stable/11/sys/dev/cxgbe/tom/t4_listen.c
  stable/11/sys/dev/cxgbe/tom/t4_tom.c
  stable/11/sys/dev/cxgbe/tom/t4_tom.h