166501 – [pf] FreeBSD 9.0 generates incorrect SEC/ACK numbers under load

Summary: [pf] FreeBSD 9.0 generates incorrect SEC/ACK numbers under load

Status:	Closed Overcome By Events

Alias:	None

Product:	Base System
Classification:	Unclassified
Component:	kern
Version:	9.0-RELEASE
Hardware:	Any Any

Importance:	Normal Affects Only Me
Assignee:	freebsd-bugs (Nobody)

URL:
Keywords:

Depends on:
Blocks:

Reported:	2012-03-29 21:50 UTC by Sergey Smitienko
Modified:	2019-02-01 13:49 UTC
CC List:	1 user

See Also:

Description Sergey Smitienko 2012-03-29 21:50:09 UTC

I've run into a problem with a web server running FreeBSD 9.0/amd64. What
I believe is happening is that the server loses track of the correct SEQ/ACK
numbers on some connections. Here is an example:

15:20:00.347514 IP (tos 0x68, ttl 123, id 1181, offset 0, flags [DF],
proto TCP (6), length 52)
    93.72.14.220.49239 > 193.178.147.113.80: Flags [S], cksum 0x6995
(correct), seq 3881466934, win 8192, options [mss 1460,nop,wscale
2,nop,nop,sackOK], length 0
15:20:00.347526 IP (tos 0x10, ttl 254, id 28065, offset 0, flags [DF],
proto TCP (6), length 44)
    193.178.147.113.80 > 93.72.14.220.49239: Flags [S.], cksum 0x79fa
(correct), seq 2151790680, ack 3881466935, win 0, options [mss 1460],
length 0
15:20:00.361812 IP (tos 0x68, ttl 123, id 1183, offset 0, flags [DF],
proto TCP (6), length 40)
    93.72.14.220.49239 > 193.178.147.113.80: Flags [.], cksum 0x96c6
(correct), seq 3881466935, ack 2151790681, win 64240, length 0
15:20:00.361869 IP (tos 0x10, ttl 254, id 31305, offset 0, flags [DF],
proto TCP (6), length 40)
    193.178.147.113.80 > 93.72.14.220.49239: Flags [.], cksum 0x71b7
(correct), seq 2151790681, ack 3881466935, win 8192, length 0

The client sends a "GET" request:
15:20:48.236181 IP (tos 0x68, ttl 123, id 1353, offset 0, flags [DF],
proto TCP (6), length 626)
    93.72.14.220.49239 > 193.178.147.113.80: Flags [P.], cksum 0x7fc9
(correct), seq 3881466935:3881467521, ack 2151790681, win 64240, length 586

and then the "ping-pong" starts:

15:20:48.236198 IP (tos 0x0, ttl 254, id 63530, offset 0, flags [DF],
proto TCP (6), length 40)
    193.178.147.113.80 > 93.72.14.220.49239: Flags [.], cksum 0x8a97
(correct), seq 2991748588, ack 1985077892, win 8760, length 0
15:20:48.255998 IP (tos 0x68, ttl 123, id 1357, offset 0, flags [DF],
proto TCP (6), length 40)
    93.72.14.220.49239 > 193.178.147.113.80: Flags [.], cksum 0x947c
(correct), seq 3881467521, ack 2151790681, win 64240, length 0
15:20:48.256015 IP (tos 0x0, ttl 254, id 53518, offset 0, flags [DF],
proto TCP (6), length 40)
    193.178.147.113.80 > 93.72.14.220.49239: Flags [.], cksum 0x8a97
(correct), seq 2991748588, ack 1985077892, win 8760, length 0
15:20:48.276084 IP (tos 0x68, ttl 123, id 1360, offset 0, flags [DF],
proto TCP (6), length 40)
    93.72.14.220.49239 > 193.178.147.113.80: Flags [.], cksum 0x947c
(correct), seq 3881467521, ack 2151790681, win 64240, length 0
15:20:48.276099 IP (tos 0x0, ttl 254, id 42983, offset 0, flags [DF],
proto TCP (6), length 40)
    193.178.147.113.80 > 93.72.14.220.49239: Flags [.], cksum 0x8a97
(correct), seq 2991748588, ack 1985077892, win 8760, length 0
15:20:48.290914 IP (tos 0x68, ttl 123, id 1361, offset 0, flags [DF],
proto TCP (6), length 40)
    93.72.14.220.49239 > 193.178.147.113.80: Flags [.], cksum 0x947c
(correct), seq 3881467521, ack 2151790681, win 64240, length 0

This happens on about 0.01% of connections. This tcpdump was recorded on
193.178.147.113 itself, before the traffic hits the wire, so it is not a
NIC fault. The server is running nginx and serving static content at
200-500 requests per second.
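
To make the mismatch in the trace concrete, here is a small Python sketch that computes how far the server's "ping-pong" packets are from the values the handshake established. The numbers are copied from the trace above; the helper name `seq_delta` is only illustrative. Differences are taken modulo 2^32 because TCP sequence numbers are 32-bit and wrap around.

```python
MOD = 2**32  # TCP sequence numbers are 32-bit and wrap around


def seq_delta(observed: int, expected: int) -> int:
    """Return (observed - expected) modulo 2**32."""
    return (observed - expected) % MOD


# Values copied from the trace above.
client_isn = 3881466934   # seq of the client's SYN
server_isn = 2151790680   # seq of the SYN/ACK sent to the client
get_len = 586             # payload length of the GET segment

# A SYN consumes one sequence number.
expected_server_seq = (server_isn + 1) % MOD
# After the GET, the server should acknowledge the end of that segment.
expected_server_ack = (client_isn + 1 + get_len) % MOD

# What the server actually sent in the "ping-pong" packets.
observed_server_seq = 2991748588
observed_server_ack = 1985077892

seq_off = seq_delta(observed_server_seq, expected_server_seq)
ack_off = seq_delta(observed_server_ack, expected_server_ack)

print("expected seq/ack:", expected_server_seq, expected_server_ack)
print("observed seq/ack:", observed_server_seq, observed_server_ack)
print("offsets mod 2^32:", seq_off, ack_off)
```

Both offsets are far larger than any window advertised in the trace, which is why the client keeps re-sending the same ACK. The sketch only measures the gap; it does not say where the wrong numbers come from.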

Fix: n/a

How-To-Repeat:

make.conf:

CPUTYPE?=nocona
CFLAGS=-O2 -pipe -fno-strict-aliasing
COPTFLAGS=-O2 -pipe -funroll-loops -ffast-math -fno-strict-aliasing
KERNCONF=IPSEC
OPTIMIZED_CFLAGS=YES
WITHOUT_X11=YES
BUILD_OPTIMIZED=YES
WITH_CPUFLAGS=YES
WITH_OPTIMIZED_CFLAGS=YES

Kernel is GENERIC with:
options   IPSEC        #IP security
device    crypto

/boot/loader.conf:
net.inet.tcp.tcbhashsize=8192
net.inet.tcp.syncache.hashsize=1024
net.inet.tcp.syncache.bucketlimit=100

/etc/sysctl.conf:

kern.maxvnodes=100000

net.inet.ip.random_id=1
net.inet.ip.portrange.first=10240
net.inet.ip.portrange.last=65535
net.inet.ip.ttl=254

net.inet.tcp.maxtcptw=102400
net.inet.tcp.syncookies=1 
net.inet.tcp.mssdflt=1024

net.inet.icmp.drop_redirect=1
net.inet.icmp.icmplim=100
net.inet.icmp.log_redirect=0
net.inet.icmp.maskrepl=0
net.inet.icmp.icmplim_output=0
net.inet.ip.accept_sourceroute=0

kern.ipc.somaxconn=4096
kern.maxfiles=524288
kern.maxfilesperproc=524288
kern.ipc.maxsockets=524288
kern.ipc.nmbclusters=204800
net.inet.tcp.recvspace=8192
net.inet.tcp.recvbuf_auto=0
net.inet.tcp.sendspace=16384
net.inet.tcp.sendbuf_max=65536
net.inet.tcp.sendbuf_inc=8192
net.inet.tcp.sendbuf_auto=1
kern.ipc.nmbjumbop=192000
kern.ipc.shmall=1048576

Comment 1 Sergey Smitienko 2012-03-30 23:07:40 UTC

I have pf running on the server. It has a basic ruleset.
I have a table <trusted> with 4k+ networks of our web site's usual visitors.
The pf rules look like this:

pass in quick from <trusted> to <me> port 80 keep state
pass in quick from any to <me> port 80 synproxy state

In the tcpdump in the report you can see a SYN/ACK packet with a window
size of 0, and then one more packet with an 8K TCP window.

This behavior comes from pf synproxy. pf answers the SYN packet with a
SYN/ACK without knowing the window size, then passes the connection to
the kernel TCP stack and generates a "window open" ACK packet.

On the other hand, I have 20 GB of tcpdump files with 10^8 packets recorded.
I wrote a simple parser that detects sessions with incorrect seq/ack
numbers. Then I checked all IP addresses with failed TCP sessions, and
none of them was in the <trusted> set.
So 100% of the failed sessions were coming through the pf synproxy state.
Synproxy state includes the modulate state function, which is basically the
addition of a strong random number to the seq/ack numbers.
So I think there is a case where TCP traffic coming from the kernel is not
properly modulated/demodulated by pf, and this causes the generation of
incorrect seq/ack numbers.

-- 
Sergey Smitienko
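
The parser mentioned in the comment above was not attached to this report. As a rough illustration of the kind of check it describes, here is a minimal Python sketch that flags packets whose seq or ack is far from what the flow has already established. Everything in it is an assumption made for the example: the `Pkt` record, the "client"/"server" labels, the `WINDOW` tolerance, and the function names. It expects packets that are already decoded and grouped per connection.

```python
from dataclasses import dataclass
from typing import List, Optional

MOD = 2**32
WINDOW = 1 << 20  # hypothetical tolerance, far above any window in the trace


@dataclass
class Pkt:
    src: str             # "client" or "server" (labels assumed for this sketch)
    seq: int
    ack: Optional[int]   # None when the ACK flag is not set
    length: int          # payload bytes
    syn: bool = False
    fin: bool = False


def near(a: int, b: int, tol: int = WINDOW) -> bool:
    """True if a and b are within tol of each other modulo 2**32."""
    d = (a - b) % MOD
    return d <= tol or d >= MOD - tol


def find_bad_packets(pkts: List[Pkt]) -> List[int]:
    """Return indexes of packets whose seq/ack do not fit the flow so far."""
    next_seq = {}  # side -> next sequence number that side should send
    bad = []
    for i, p in enumerate(pkts):
        peer = "server" if p.src == "client" else "client"
        ok = True
        if p.src in next_seq and not near(p.seq, next_seq[p.src]):
            ok = False
        if p.ack is not None and peer in next_seq and not near(p.ack, next_seq[peer]):
            ok = False
        if not ok:
            bad.append(i)
            continue  # a bad packet must not move the expected values
        # SYN and FIN each consume one sequence number.
        end = (p.seq + p.length + int(p.syn) + int(p.fin)) % MOD
        # Only move forward, so retransmissions do not rewind the state.
        if p.src not in next_seq or (end - next_seq[p.src]) % MOD < MOD // 2:
            next_seq[p.src] = end
    return bad


# The start of the session from the original report.
trace = [
    Pkt("client", 3881466934, None, 0, syn=True),
    Pkt("server", 2151790680, 3881466935, 0, syn=True),
    Pkt("client", 3881466935, 2151790681, 0),
    Pkt("server", 2151790681, 3881466935, 0),
    Pkt("client", 3881466935, 2151790681, 586),
    Pkt("server", 2991748588, 1985077892, 0),
    Pkt("client", 3881467521, 2151790681, 0),
    Pkt("server", 2991748588, 1985077892, 0),
]
bad_indexes = find_bad_packets(trace)
print(bad_indexes)
```

On this trace the two server packets from the "ping-pong" phase are the ones flagged. A real tool would also need to read the capture files and handle SACK, reordering, and connections whose start is missing from the capture.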

Comment 2 Mark Linimon freebsd_committer

2012-04-02 07:13:22 UTC

Responsible Changed
From-To: freebsd-bugs->freebsd-net

Comment 3 Andre Oppermann freebsd_committer

2012-04-03 14:46:57 UTC

State Changed
From-To: open->analyzed

The problem was found to be an issue with pf state modulation, 
not FreeBSD's TCP implementation.
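
For readers unfamiliar with the term, the idea of state modulation as described in comment 1 (adding a random number to the sequence numbers) can be sketched in a few lines of Python. This is a toy model, not pf's actual code; the class and method names are made up for illustration. The point is that the offset added in one direction has to be subtracted from the acknowledgments coming back, and any packet that misses one of the two steps ends up with numbers the peer does not recognize.

```python
import secrets

MOD = 2**32


class ModulatedState:
    """Toy model of sequence-number modulation for one connection."""

    def __init__(self, offset=None):
        # A random 32-bit offset chosen once per connection.
        self.offset = secrets.randbits(32) if offset is None else offset

    def outbound(self, seq, ack):
        """Packet from the protected host: shift its seq by the offset."""
        return (seq + self.offset) % MOD, ack

    def inbound(self, seq, ack):
        """Packet from the peer: shift its ack back by the same offset."""
        return seq, (ack - self.offset) % MOD


# Fixed offset so the example is repeatable; values are arbitrary.
st = ModulatedState(offset=4000000000)
wire_seq, wire_ack = st.outbound(500000000, 7)

# The peer acknowledges one byte past what it saw on the wire.
host_seq, host_ack = st.inbound(7, (wire_seq + 1) % MOD)
print(wire_seq, host_ack)
```

If `outbound` were skipped for a single packet, the peer would see the host's raw sequence number, which differs from the established one by the full offset.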

Comment 4 Andre Oppermann freebsd_committer

2012-04-03 14:46:57 UTC

Responsible Changed
From-To: freebsd-net->andre

Take over.

Comment 5 Eitan Adler freebsd_committer

2018-01-08 04:14:27 UTC

For the following conditions
Product: Base System, Documentation Status: New, Open, In Progress, UNCONFIRMED 
Assignee: Former FreeBSD committer 

Reset to default assignee. Reset status to "Open".

Comment 6 Kristof Provost freebsd_committer

2019-02-01 13:49:04 UTC

FreeBSD 9.0 is no longer supported. Please re-open this bug if the problem can be reproduced on 12.0 or 11.2, ideally along with a reproduction script.