Created attachment 192796 [details] server_100.sh Some of TCP connection Resets when we try to establish multiple connections with Chelsio TOE Driver. Test Setup Details: 2 Machines connected Back 2 Back with Chelsio T6 Adapter. OS : FreeBSD-current. Application Used: openssl s_time/s_server. Each s_server can handle single client(s_time) connection. Steps to Re-produce the issue. Server: 1) kldload if_cxgbe 2) kldload t4_tom 3) ifconfig cc0 1.0.0.9/24 toe up 4) sh server_100.sh (file attached). It will start ~200 s_server connection. Client: 1) kldload if_cxgbe 2) kldload t4_tom 3) ifconfig cc0 1.0.0.35 toe up 4) ./bash_client_100.sh(file attached). It will submit http GET request for File transfer and repeats the same for given time interval. Problem: After some time some of connections will reset. On debugging We observed that syncache_lookup() in syncache_expand() fails while handling ack(3rd hadshake message) from client. On failure following code snippet in syncache_expand sends reset if (!V_tcp_syncookiesonly && sch->sch_last_overflow < time_uptime - SYNCOOKIE_LIFETIME) { SCH_UNLOCK(sch); if ((s = tcp_log_addrs(inc, th, NULL, NULL))) log(LOG_DEBUG, "%s; %s: Spurious ACK, " "segment rejected (no syncache entry)\n", s, __func__); goto failed; } Detailed steps: Client sends SYNC to server ==> 1 server in syncache_add calls syncache_respond. 2 syncache_respond will send SYNC+ACK to client <== SYNC+ACK 3 thread handling SYNC will context switch because of that syncache_insert()will not get a chance to add entry. ACK ==> another thread in server will start handling ack message. During syncache_expand lookup will fail and RESET will be sent to client. Following workarounds tried avoid this 1) Setting sysctl net.inet.tcp.syncookies_only=1 avoids this issue. 2) Following hack to make sure insert happens before sending SYNC+ACK also works. @@ -1558,17 +1575,23 @@ syncache_add(struct in_conninfo *inc, struct tcpopt *to, struct tcphdr *th, --More--(byte 11894) /* * Do a standard 3-way handshake. */ + if (V_tcp_syncookies && V_tcp_syncookiesonly && sc != &scs) { + // compilation error + rv = rv +1; + rv = rv - 1; + }else if (sc != &scs) + syncache_insert(sc, sch); /* locks and unlocks sch */ + if (syncache_respond(sc, sch, 0, m) == 0) { if (V_tcp_syncookies && V_tcp_syncookiesonly && sc != &scs) syncache_free(sc); - else if (sc != &scs) - syncache_insert(sc, sch); /* locks and unlocks sch */ - TCPSTAT_INC(tcps_sndacks); - TCPSTAT_INC(tcps_sndtotal); } else { - if (sc != &scs) - syncache_free(sc); - TCPSTAT_INC(tcps_sc_dropped); + if (sc != &scs) { + SCH_LOCK(sch); + syncache_drop(sc, sch); /* locks and unlocks sch */ + SCH_UNLOCK(sch); + TCPSTAT_INC(tcps_sc_dropped); + } } netstat -sp tcp at the time of issue 42824 syncache entries added 12 retransmitted 0 dupsyn 0 dropped 42820 completed 0 bucket overflow 0 cache overflow 0 reset 4 stale 0 aborted 0 badack 0 unreach 0 zone failures dmesg log with "sysctl net.inet.tcp.log_debug=1" TCP: [1.0.0.35]:46690 to [1.0.0.9]:1021 tcpflags 0x10<ACK>; syncache_expand: Spurious ACK, segment rejected (no syncache ent ry) TCP: [1.0.0.35]:46690 to [1.0.0.9]:1021; syncache_timer: Response timeout, retransmitting (1) SYN|ACK TCP: [1.0.0.35]:46690 to [1.0.0.9]:1021; syncache_timer: Response timeout, retransmitting (2) SYN|ACK TCP: [1.0.0.35]:46690 to [1.0.0.9]:1021; syncache_timer: Response timeout, retransmitting (3) SYN|ACK
Created attachment 192797 [details] bash_client.sh
A commit references this bug: Author: np Date: Wed Dec 19 01:37:01 UTC 2018 New revision: 342208 URL: https://svnweb.freebsd.org/changeset/base/342208 Log: cxgbe/t4_tom: fixes for issues on the passive open side. - Fix PR 227760 by getting the TOE to respond to the SYN after the call to toe_syncache_add, not during it. The kernel syncache code calls syncache_respond just before syncache_insert. If the ACK to the syncache_respond is processed in another thread it may run before the syncache_insert and won't find the entry. Note that this affects only t4_tom because it's the only driver trying to insert and expand syncache entries from different threads. - Do not leak resources if an embryonic connection terminates at SYN_RCVD because of L2 lookup failures. - Retire lctx->synq and associated code because there is never a need to walk the list of embryonic connections associated with a listener. The per-tid state is still called a synq entry in the driver even though the synq itself is now gone. PR: 227760 MFC after: 2 weeks Sponsored by: Chelsio Communications Changes: head/sys/dev/cxgbe/tom/t4_connect.c head/sys/dev/cxgbe/tom/t4_cpl_io.c head/sys/dev/cxgbe/tom/t4_listen.c head/sys/dev/cxgbe/tom/t4_tom.c head/sys/dev/cxgbe/tom/t4_tom.h
A commit references this bug: Author: np Date: Wed Mar 27 21:29:46 UTC 2019 New revision: 345605 URL: https://svnweb.freebsd.org/changeset/base/345605 Log: MFC r342208: cxgbe/t4_tom: fixes for issues on the passive open side. - Fix PR 227760 by getting the TOE to respond to the SYN after the call to toe_syncache_add, not during it. The kernel syncache code calls syncache_respond just before syncache_insert. If the ACK to the syncache_respond is processed in another thread it may run before the syncache_insert and won't find the entry. Note that this affects only t4_tom because it's the only driver trying to insert and expand syncache entries from different threads. - Do not leak resources if an embryonic connection terminates at SYN_RCVD because of L2 lookup failures. - Retire lctx->synq and associated code because there is never a need to walk the list of embryonic connections associated with a listener. The per-tid state is still called a synq entry in the driver even though the synq itself is now gone. PR: 227760 Sponsored by: Chelsio Communications Changes: _U stable/12/ stable/12/sys/dev/cxgbe/tom/t4_connect.c stable/12/sys/dev/cxgbe/tom/t4_cpl_io.c stable/12/sys/dev/cxgbe/tom/t4_listen.c stable/12/sys/dev/cxgbe/tom/t4_tom.c stable/12/sys/dev/cxgbe/tom/t4_tom.h
A commit references this bug: Author: np Date: Tue Apr 30 18:03:18 UTC 2019 New revision: 346970 URL: https://svnweb.freebsd.org/changeset/base/346970 Log: MFC r342208: cxgbe/t4_tom: fixes for issues on the passive open side. - Fix PR 227760 by getting the TOE to respond to the SYN after the call to toe_syncache_add, not during it. The kernel syncache code calls syncache_respond just before syncache_insert. If the ACK to the syncache_respond is processed in another thread it may run before the syncache_insert and won't find the entry. Note that this affects only t4_tom because it's the only driver trying to insert and expand syncache entries from different threads. - Do not leak resources if an embryonic connection terminates at SYN_RCVD because of L2 lookup failures. - Retire lctx->synq and associated code because there is never a need to walk the list of embryonic connections associated with a listener. The per-tid state is still called a synq entry in the driver even though the synq itself is now gone. PR: 227760 Sponsored by: Chelsio Communications Changes: _U stable/11/ stable/11/sys/dev/cxgbe/tom/t4_connect.c stable/11/sys/dev/cxgbe/tom/t4_cpl_io.c stable/11/sys/dev/cxgbe/tom/t4_listen.c stable/11/sys/dev/cxgbe/tom/t4_tom.c stable/11/sys/dev/cxgbe/tom/t4_tom.h