Created attachment 196357 [details] Patch for libalias to solve a natd problem If natd must send out a retransmitted "227 Entering Passive Mode" message then sometimes the sequncenumber of the generated packet is wrong breaking the tcp connection. Details are explained in https://lists.freebsd.org/pipermail/freebsd-net/2018-August/051290.html The attached patch for FreeBSD 10 solves the problem for me and works also for newer FreeBSD versions.
The patch given in this bug report waits for commit. I like to describe some technical background for the problem. Relevant are the functions AddSeq() and GetDeltaSeqOut() in the source alias_db.c. AddSeq() start with the following comment: When a TCP packet has been altered in length, save this information in a circular list. If enough packets have been altered, then this list will begin to overwrite itself. The important word here is "circular". The used list has N_LINK_TCP_DATA (=3) entries and AddSeq() writes to the list circual from top to bottom and holds the actual index of the list in the variable lnk->data.tcp->state.index. The function GetDeltaSeqOut() starts with the following comment: Find out how much the sequence number has been altered for an outgoing TCP packet. To do this, a circular list of ACK numbers where the TCP packet size was altered is searched. The circular list read by GetDeltaSeqOut() is the same list filled by AddSeq() in a circular manner before. But the function GetDeltaSeqOut() does not read the list in a circular way but sequential from top to bottom. Especially GetDeltaSeqOut() does not use the variable lnk->data.tcp->state.index in any way. That is the leak ! Example: If a "227 Entering Passive Mode" packet must be retransmittet and the length changes from 51 to 50, then we have three cases for the list: case 1: index 0: original packet 51 index 1: retransmitted packet 50 index 2: not relevant case 2: index 0: not relevant index 1: original packet 51 index 2: retransmitted packet 50 case 3: index 0: retransmitted packet 50 index 1: not relevant index 2: original packet 51 Only in case 3 the function GetDeltaSeqOut() finds the correct length 50, in case 1 and 2 GetDeltaSeqOut() finds length 51 and TCP fails. Exactly this problem is solved by the patch given in this bug report for GetDeltaSeqOut() and (identical constellation) GetDeltaAckIn(). The function GetDeltaSeqOut() with the patch included searches the list from bottom to top and starts with the correct index lnk->data.tcp->state.index.
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=9febbc4541903bb8e6b0f1c84988c98b2f7c96ef commit 9febbc4541903bb8e6b0f1c84988c98b2f7c96ef Author: Hans Petter Selasky <hselasky@FreeBSD.org> AuthorDate: 2021-02-22 10:58:46 +0000 Commit: Hans Petter Selasky <hselasky@FreeBSD.org> CommitDate: 2021-02-22 16:13:58 +0000 Fix for natd(8) sending wrong sequence number after TCP retransmission, terminating a TCP connection. If a TCP packet must be retransmitted and the data length has changed in the retransmitted packet, due to the internal workings of TCP, typically when ACK packets are lost, then there is a 30% chance that the logic in GetDeltaSeqOut() will find the correct length, which is the last length received. This can be explained as follows: If a "227 Entering Passive Mode" packet must be retransmittet and the length changes from 51 to 50 bytes, for example, then we have three cases for the list scan in GetDeltaSeqOut(), depending on how many prior packets were received modulus N_LINK_TCP_DATA=3: case 1: index 0: original packet 51 index 1: retransmitted packet 50 index 2: not relevant case 2: index 0: not relevant index 1: original packet 51 index 2: retransmitted packet 50 case 3: index 0: retransmitted packet 50 index 1: not relevant index 2: original packet 51 This patch simply changes the searching order for TCP packets, always starting at the last received packet instead of any received packet, in GetDeltaAckIn() and GetDeltaSeqOut(). Else no functional changes. Discussed with: rscheff@ Submitted by: Andreas Longwitz <longwitz@incore.de> PR: 230755 MFC after: 1 week Sponsored by: Mellanox Technologies // NVIDIA Networking sys/netinet/libalias/alias_db.c | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-)
It appears Linux has had a similar issue too, just for the record: https://bugzilla.redhat.com/show_bug.cgi?id=642388 Will be MFC'ed later. Thank you!
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=17cc6689eb16302f897023c1b227c30de1e373a4 commit 17cc6689eb16302f897023c1b227c30de1e373a4 Author: Hans Petter Selasky <hselasky@FreeBSD.org> AuthorDate: 2021-02-22 10:58:46 +0000 Commit: Hans Petter Selasky <hselasky@FreeBSD.org> CommitDate: 2021-03-03 09:47:44 +0000 MFC 9febbc454190: Fix for natd(8) sending wrong sequence number after TCP retransmission, terminating a TCP connection. If a TCP packet must be retransmitted and the data length has changed in the retransmitted packet, due to the internal workings of TCP, typically when ACK packets are lost, then there is a 30% chance that the logic in GetDeltaSeqOut() will find the correct length, which is the last length received. This can be explained as follows: If a "227 Entering Passive Mode" packet must be retransmittet and the length changes from 51 to 50 bytes, for example, then we have three cases for the list scan in GetDeltaSeqOut(), depending on how many prior packets were received modulus N_LINK_TCP_DATA=3: case 1: index 0: original packet 51 index 1: retransmitted packet 50 index 2: not relevant case 2: index 0: not relevant index 1: original packet 51 index 2: retransmitted packet 50 case 3: index 0: retransmitted packet 50 index 1: not relevant index 2: original packet 51 This patch simply changes the searching order for TCP packets, always starting at the last received packet instead of any received packet, in GetDeltaAckIn() and GetDeltaSeqOut(). Else no functional changes. Discussed with: rscheff@ Submitted by: Andreas Longwitz <longwitz@incore.de> PR: 230755 Sponsored by: Mellanox Technologies // NVIDIA Networking (cherry picked from commit 9febbc4541903bb8e6b0f1c84988c98b2f7c96ef) sys/netinet/libalias/alias_db.c | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-)
A commit in branch stable/12 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=25858ea74b07383145d91773635107ceddf7ca33 commit 25858ea74b07383145d91773635107ceddf7ca33 Author: Hans Petter Selasky <hselasky@FreeBSD.org> AuthorDate: 2021-02-22 10:58:46 +0000 Commit: Hans Petter Selasky <hselasky@FreeBSD.org> CommitDate: 2021-03-03 09:50:11 +0000 MFC 9febbc454190: Fix for natd(8) sending wrong sequence number after TCP retransmission, terminating a TCP connection. If a TCP packet must be retransmitted and the data length has changed in the retransmitted packet, due to the internal workings of TCP, typically when ACK packets are lost, then there is a 30% chance that the logic in GetDeltaSeqOut() will find the correct length, which is the last length received. This can be explained as follows: If a "227 Entering Passive Mode" packet must be retransmittet and the length changes from 51 to 50 bytes, for example, then we have three cases for the list scan in GetDeltaSeqOut(), depending on how many prior packets were received modulus N_LINK_TCP_DATA=3: case 1: index 0: original packet 51 index 1: retransmitted packet 50 index 2: not relevant case 2: index 0: not relevant index 1: original packet 51 index 2: retransmitted packet 50 case 3: index 0: retransmitted packet 50 index 1: not relevant index 2: original packet 51 This patch simply changes the searching order for TCP packets, always starting at the last received packet instead of any received packet, in GetDeltaAckIn() and GetDeltaSeqOut(). Else no functional changes. Discussed with: rscheff@ Submitted by: Andreas Longwitz <longwitz@incore.de> PR: 230755 Sponsored by: Mellanox Technologies // NVIDIA Networking (cherry picked from commit 9febbc4541903bb8e6b0f1c84988c98b2f7c96ef) sys/netinet/libalias/alias_db.c | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-)
A commit in branch stable/11 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=e3d9b9cc02b6e00858526d302c82bfa1dbeb87ee commit e3d9b9cc02b6e00858526d302c82bfa1dbeb87ee Author: Hans Petter Selasky <hselasky@FreeBSD.org> AuthorDate: 2021-02-22 10:58:46 +0000 Commit: Hans Petter Selasky <hselasky@FreeBSD.org> CommitDate: 2021-03-03 09:51:48 +0000 MFC 9febbc454190: Fix for natd(8) sending wrong sequence number after TCP retransmission, terminating a TCP connection. If a TCP packet must be retransmitted and the data length has changed in the retransmitted packet, due to the internal workings of TCP, typically when ACK packets are lost, then there is a 30% chance that the logic in GetDeltaSeqOut() will find the correct length, which is the last length received. This can be explained as follows: If a "227 Entering Passive Mode" packet must be retransmittet and the length changes from 51 to 50 bytes, for example, then we have three cases for the list scan in GetDeltaSeqOut(), depending on how many prior packets were received modulus N_LINK_TCP_DATA=3: case 1: index 0: original packet 51 index 1: retransmitted packet 50 index 2: not relevant case 2: index 0: not relevant index 1: original packet 51 index 2: retransmitted packet 50 case 3: index 0: retransmitted packet 50 index 1: not relevant index 2: original packet 51 This patch simply changes the searching order for TCP packets, always starting at the last received packet instead of any received packet, in GetDeltaAckIn() and GetDeltaSeqOut(). Else no functional changes. Discussed with: rscheff@ Submitted by: Andreas Longwitz <longwitz@incore.de> PR: 230755 Sponsored by: Mellanox Technologies // NVIDIA Networking (cherry picked from commit 9febbc4541903bb8e6b0f1c84988c98b2f7c96ef) sys/netinet/libalias/alias_db.c | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-)
A commit in branch stable/10 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=0a62368c6a53054fcc255583cf6308f85a1533d2 commit 0a62368c6a53054fcc255583cf6308f85a1533d2 Author: Hans Petter Selasky <hselasky@FreeBSD.org> AuthorDate: 2021-02-22 10:58:46 +0000 Commit: Hans Petter Selasky <hselasky@FreeBSD.org> CommitDate: 2021-03-03 09:57:29 +0000 MFC 9febbc454190: Fix for natd(8) sending wrong sequence number after TCP retransmission, terminating a TCP connection. If a TCP packet must be retransmitted and the data length has changed in the retransmitted packet, due to the internal workings of TCP, typically when ACK packets are lost, then there is a 30% chance that the logic in GetDeltaSeqOut() will find the correct length, which is the last length received. This can be explained as follows: If a "227 Entering Passive Mode" packet must be retransmittet and the length changes from 51 to 50 bytes, for example, then we have three cases for the list scan in GetDeltaSeqOut(), depending on how many prior packets were received modulus N_LINK_TCP_DATA=3: case 1: index 0: original packet 51 index 1: retransmitted packet 50 index 2: not relevant case 2: index 0: not relevant index 1: original packet 51 index 2: retransmitted packet 50 case 3: index 0: retransmitted packet 50 index 1: not relevant index 2: original packet 51 This patch simply changes the searching order for TCP packets, always starting at the last received packet instead of any received packet, in GetDeltaAckIn() and GetDeltaSeqOut(). Else no functional changes. Discussed with: rscheff@ Submitted by: Andreas Longwitz <longwitz@incore.de> PR: 230755 Sponsored by: Mellanox Technologies // NVIDIA Networking (cherry picked from commit 9febbc4541903bb8e6b0f1c84988c98b2f7c96ef) sys/netinet/libalias/alias_db.c | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-)