Bug 223817 - mbuf leaking with tcpmd5
Summary: mbuf leaking with tcpmd5
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 11.1-RELEASE
Hardware: i386 Any
: --- Affects Only Me
Assignee: Andrey V. Elsukov
URL:
Keywords: regression
Depends on:
Blocks:
 
Reported: 2017-11-23 13:06 UTC by Dirk Meyer
Modified: 2017-12-21 14:11 UTC (History)
2 users (show)

See Also:


Attachments
Proposed patch (345 bytes, patch)
2017-11-27 13:23 UTC, Andrey V. Elsukov
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Dirk Meyer freebsd_committer freebsd_triage 2017-11-23 13:06:41 UTC
after Upgrading from FreeBSD 10.3 to FreeBSD 11.1
The router freezes after a few hours with mbuf exhaustion.

Custom Kernel, added:
options         TCP_SIGNATURE           #include support for RFC 2385

if md5 sessions with bird and bird6 are disabled, the router is stable.

stable Router:

# netstat -m
291/984/1275 mbufs in use (current/cache/total)
290/470/760/2326 mbuf clusters in use (current/cache/total/max)
290/469 mbuf+clusters out of packet secondary zone in use (current/cache)
0/2/2/1162 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/344 9k jumbo clusters in use (current/cache/total/max)
0/0/0/193 16k jumbo clusters in use (current/cache/total/max)
652K/1194K/1846K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/5/1392 sfbufs in use (current/peak/max)
0 sendfile syscalls
0 sendfile syscalls completed without I/O request
0 requests for I/O initiated by sendfile
0 pages read by sendfile as part of a request
0 pages were valid at time of a sendfile request
0 pages were requested for read ahead by applications
0 pages were read ahead by sendfile
0 times sendfile encountered an already busy page
0 requests for sfbufs denied
0 requests for sfbufs delayed


router with md5 protected sessions:
# netstat -m
5589/486/6075 mbufs in use (current/cache/total)
192/318/510/5012 mbuf clusters in use (current/cache/total/max)
192/314 mbuf+clusters out of packet secondary zone in use (current/cache)
0/2/2/2505 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/742 9k jumbo clusters in use (current/cache/total/max)
0/0/0/417 16k jumbo clusters in use (current/cache/total/max)
1781K/765K/2546K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/4/2416 sfbufs in use (current/peak/max)
0 sendfile syscalls
0 sendfile syscalls completed without I/O request
0 requests for I/O initiated by sendfile
0 pages read by sendfile as part of a request
0 pages were valid at time of a sendfile request
0 pages were requested for read ahead by applications
0 pages were read ahead by sendfile
0 times sendfile encountered an already busy page
0 requests for sfbufs denied
0 requests for sfbufs delayed

console log:
[zone: mbuf] kern.ipc.nmbufs limit reached
sis2.2: unable to prepend VLAN header
[zone: mbuf] kern.ipc.nmbufs limit reached
Comment 1 Andrey V. Elsukov freebsd_committer freebsd_triage 2017-11-23 15:30:17 UTC
Can you show the output of `netstat -sp tcp` command?
Comment 2 Dirk Meyer freebsd_committer freebsd_triage 2017-11-25 07:23:24 UTC
uptime 3 min

tcp:
        13140 packets sent
                208 data packets (15004 bytes)
                0 data packets (0 bytes) retransmitted
                0 data packets unnecessarily retransmitted
                0 resends initiated by MTU discovery
                8423 ack-only packets (457 delayed)
                0 URG only packets
                0 window probe packets
                4485 window update packets
                24 control packets
        18601 packets received
                163 acks (for 15011 bytes)
                35 duplicate acks
                0 acks for unsent data
                17810 packets (21631400 bytes) received in-sequence
                35 completely duplicate packets (26221 bytes)
                0 old duplicate packets
                9 packets with some dup. data (11927 bytes duped)
                464 out-of-order packets (625887 bytes)
                0 packets (0 bytes) of data after window
                0 window probes
                0 window update packets
                0 packets received after close
                0 discarded for bad checksums
                0 discarded for bad header offset fields
                0 discarded because packet too short
                0 discarded due to memory problems
        18 connection requests
        1 connection accept
        0 bad connection attempts
        0 listen queue overflows
        0 ignored RSTs in the windows
        11 connections established (including accepts)
                0 times used RTT from hostcache
                0 times used RTT variance from hostcache
                0 times used slow-start threshold from hostcache
        17 connections closed (including 0 drops)
                1 connection updated cached RTT on close
                1 connection updated cached RTT variance on close
                0 connections updated cached ssthresh on close
        6 embryonic connections dropped
        163 segments updated rtt (of 170 attempts)
        62 retransmit timeouts
                0 connections dropped by rexmit timeout
        0 persist timeouts
                0 connections dropped by persist timeout
        0 Connections (fin_wait_2) dropped because of timeout
        6 keepalive timeouts
                0 keepalive probes sent
                6 connections dropped by keepalive
        92 correct ACK header predictions
        17774 correct data packet header predictions
        1 syncache entry added
                0 retransmitted
                0 dupsyn
                0 dropped
                1 completed
                0 bucket overflow
                0 cache overflow
                0 reset
                0 stale
                0 aborted
                0 badack
                0 unreach
                0 zone failures
        1 cookie sent
        0 cookies received
        1 hostcache entry added
                0 bucket overflow
        0 SACK recovery episodes
        0 segment rexmits in SACK recovery episodes
        0 byte rexmits in SACK recovery episodes
        0 SACK options (SACK blocks) received
        0 SACK options (SACK blocks) sent
        0 SACK scoreboard overflow
        0 packets with ECN CE bit set
        0 packets with ECN ECT(0) bit set
        0 packets with ECN ECT(1) bit set
        0 successful ECN handshakes
        0 times ECN reduced the congestion window
        0 packets with matching signature received
        0 packets with bad signature received
        143 times failed to make signature due to no SA
        0 times unexpected signature received
        10 times no signature provided by segment
TCP connection count by state:
        0 connections in CLOSED state
        4 connections in LISTEN state
        2 connections in SYN_SENT state
        0 connections in SYN_RCVD state
        5 connections in ESTABLISHED state
        0 connections in CLOSE_WAIT state
        0 connections in FIN_WAIT_1 state
        0 connections in CLOSING state
        0 connections in LAST_ACK state
        0 connections in FIN_WAIT_2 state
        0 connections in TIME_WAIT state
Comment 3 Dirk Meyer freebsd_committer freebsd_triage 2017-11-25 07:24:18 UTC
uptime 30 min

tcp:
        19129 packets sent
                673 data packets (32758 bytes)
                0 data packets (0 bytes) retransmitted
                0 data packets unnecessarily retransmitted
                0 resends initiated by MTU discovery
                13871 ack-only packets (5040 delayed)
                0 URG only packets
                0 window probe packets
                4485 window update packets
                100 control packets
        25085 packets received
                572 acks (for 32765 bytes)
                35 duplicate acks
                0 acks for unsent data
                23055 packets (22835499 bytes) received in-sequence
                35 completely duplicate packets (26221 bytes)
                0 old duplicate packets
                9 packets with some dup. data (11927 bytes duped)
                464 out-of-order packets (625887 bytes)
                0 packets (0 bytes) of data after window
                0 window probes
                0 window update packets
                0 packets received after close
                0 discarded for bad checksums
                0 discarded for bad header offset fields
                0 discarded because packet too short
                0 discarded due to memory problems
        94 connection requests
        1 connection accept
        0 bad connection attempts
        0 listen queue overflows
        1 ignored RSTs in the window
        11 connections established (including accepts)
                0 times used RTT from hostcache
                0 times used RTT variance from hostcache
                0 times used RTT from hostcache
                0 times used RTT variance from hostcache
                0 times used slow-start threshold from hostcache
        93 connections closed (including 0 drops)
                1 connection updated cached RTT on close
                1 connection updated cached RTT variance on close
                0 connections updated cached ssthresh on close
        82 embryonic connections dropped
        572 segments updated rtt (of 655 attempts)
        669 retransmit timeouts
                0 connections dropped by rexmit timeout
        0 persist timeouts
                0 connections dropped by persist timeout
        0 Connections (fin_wait_2) dropped because of timeout
        82 keepalive timeouts
                0 keepalive probes sent
                82 connections dropped by keepalive
        455 correct ACK header predictions
        22992 correct data packet header predictions
        1 syncache entry added
                0 retransmitted
                0 dupsyn
                0 dropped
                1 completed
                0 bucket overflow
                0 cache overflow
                0 reset
                0 stale
                0 aborted
                0 badack
                0 unreach
                0 zone failures
        1 cookie sent
        0 cookies received
        1 hostcache entry added
                0 bucket overflow
        0 SACK recovery episodes
        0 segment rexmits in SACK recovery episodes
        0 byte rexmits in SACK recovery episodes
        0 SACK options (SACK blocks) received
        0 SACK options (SACK blocks) sent
        0 SACK scoreboard overflow
        0 packets with ECN CE bit set
        0 packets with ECN ECT(0) bit set
        0 packets with ECN ECT(1) bit set
        0 successful ECN handshakes
        0 times ECN reduced the congestion window
        0 packets with matching signature received
        0 packets with bad signature received
        1209 times failed to make signature due to no SA
        0 times unexpected signature received
        10 times no signature provided by segment
TCP connection count by state:
        0 connections in CLOSED state
        4 connections in LISTEN state
        2 connections in SYN_SENT state
        0 connections in SYN_RCVD state
        5 connections in ESTABLISHED state
        0 connections in CLOSE_WAIT state
        0 connections in FIN_WAIT_1 state
        0 connections in CLOSING state
        0 connections in LAST_ACK state
        0 connections in FIN_WAIT_2 state
        0 connections in TIME_WAIT state
Comment 4 Dirk Meyer freebsd_committer freebsd_triage 2017-11-25 07:26:41 UTC
The same command on FreeeBSD 10.3
shows data for tcpmd5.

uptime 2 days:

root@hbw2:~# netstat -sp tcp
tcp:
        404820 packets sent
                16566 data packets (809618 bytes)
                9 data packets (171 bytes) retransmitted
                0 data packets unnecessarily retransmitted
                0 resends initiated by MTU discovery
                381530 ack-only packets (351256 delayed)
                0 URG only packets
                0 window probe packets
                6703 window update packets
                12 control packets
        453693 packets received
                15429 acks (for 809625 bytes)
                6189 duplicate acks
                0 acks for unsent data
                420555 packets (130396892 bytes) received in-sequence
                42 completely duplicate packets (32133 bytes)
                0 old duplicate packets
                9 packets with some dup. data (9812 bytes duped)
                222 out-of-order packets (240797 bytes)
                0 packets (0 bytes) of data after window
                0 window probes
                25 window update packets
                0 packets received after close
                3 discarded for bad checksums
                0 discarded for bad header offset fields
                0 discarded because packet too short
                0 discarded due to memory problems
        8 connection requests
        5 connection accepts
        0 bad connection attempts
        0 listen queue overflows
        0 ignored RSTs in the windows
        11 connections established (including accepts)
        11 connections closed (including 0 drops)
                4 connections updated cached RTT on close
                4 connections updated cached RTT variance on close
                0 connections updated cached ssthresh on close
        0 embryonic connections dropped
        15426 segments updated rtt (of 15426 attempts)
        7 retransmit timeouts
                0 connections dropped by rexmit timeout
        0 persist timeouts
                0 connections dropped by persist timeout
        0 Connections (fin_wait_2) dropped because of timeout
        8 keepalive timeouts
                8 keepalive probes sent
                0 connections dropped by keepalive
        13389 correct ACK header predictions
        419108 correct data packet header predictions
        10 syncache entries added
                15 retransmitted
                0 dupsyn
                0 dropped
                5 completed
                0 bucket overflow
                0 cache overflow
                0 reset
                5 stale
                0 aborted
                0 badack
                0 unreach
                0 zone failures
        10 cookies sent
        0 cookies received
        3 hostcache entries added
                0 bucket overflow
        0 SACK recovery episodes
        0 segment rexmits in SACK recovery episodes
        0 byte rexmits in SACK recovery episodes
        0 SACK options (SACK blocks) received
        0 SACK options (SACK blocks) sent
        0 SACK scoreboard overflow
        0 packets with ECN CE bit set
        0 packets with ECN ECT(0) bit set
        0 packets with ECN ECT(1) bit set
        0 successful ECN handshakes
        0 times ECN reduced the congestion window
        252573 packets with valid tcp-md5 signature received
        2 packets with invalid tcp-md5 signature received
        0 packets with tcp-md5 signature mismatch
        0 packets with unexpected tcp-md5 signature received
        2 packets without expected tcp-md5 signature received
Comment 5 Andrey V. Elsukov freebsd_committer freebsd_triage 2017-11-25 08:29:42 UTC
Can you try to see what will show the following dtrace script?

# kldload dtraceall
# dtrace -n 'fbt::tcp_ipsec_output:return {printf("%d", arg1);}'

Also, it seems your MD5-signed connections doesn't work. Probably you need to check that SAs for both directions are created.
Comment 6 Andrey V. Elsukov freebsd_committer freebsd_triage 2017-11-27 13:23:12 UTC
Created attachment 188313 [details]
Proposed patch

If dtrace script will show 13 return code, then probably my assumption is correct and you can try this patch.
Comment 7 Dirk Meyer freebsd_committer freebsd_triage 2017-11-28 22:47:11 UTC
I rebuild the kernel with dtrace.

# kldload dtraceall
# dtrace -n 'fbt::tcp_ipsec_output:return {printf("%d", arg1);}'
dtrace: description 'fbt::tcp_ipsec_output:return ' matched 1 probe

no further output for 50 mins.

Indeed the connection does not come up.
# setkey -D

output looks good.
setkey shows 4 SA, 2 IPv4 for each direction, 2 IPv6 for each direction.
Comment 8 Dirk Meyer freebsd_committer freebsd_triage 2017-12-03 11:10:11 UTC
The 11.1 kernel compiled with the patch runs now with stable mbufs.

The memory leak I saw is fixed.

but MD5-signed connections doesn't work with 11.1

I have in my kernel:

options         IPSEC
decive          crypto
options         TCP_SIGNATURE 

MD5-signed connections doesn't work

I will try to rebuild the kernel with option 
options         IPSEC_SUPPORT
Comment 9 Dirk Meyer freebsd_committer freebsd_triage 2017-12-11 08:12:05 UTC
adding IPSEC_SUPPORT does not have any effect on the issue.

Has the TCP_MD5 API changed in 11.x?
Comment 10 commit-hook freebsd_committer freebsd_triage 2017-12-14 12:55:13 UTC
A commit references this bug:

Author: ae
Date: Thu Dec 14 12:54:21 UTC 2017
New revision: 326847
URL: https://svnweb.freebsd.org/changeset/base/326847

Log:
  Fix mbuf leak when TCPMD5_OUTPUT() method returns error.

  PR:		223817
  MFC after:	1 week

Changes:
  head/sys/netinet/tcp_output.c
Comment 11 Andrey V. Elsukov freebsd_committer freebsd_triage 2017-12-14 13:06:50 UTC
(In reply to Dirk Meyer from comment #9)
> adding IPSEC_SUPPORT does not have any effect on the issue.
> 
> Has the TCP_MD5 API changed in 11.x?

The IPSEC_SUPPORT option just allows use tcpmd5.ko and ipsec.ko kernel modules.

The main change in 11.1+ is that now security association for both directions are required to initiate MD5 signed TCP connection. Also, now TCP-MD5 implementation fully conforms to RFC and it is impossible to send unsigned TCP segments or segments with empty MD5 signature.

If you have both SAs and connection still does not created, look at the netstat -sp tcp statistics. Probably you need to create SAs by hand (via setkey or ipsec.conf) using endpoint's IP addresses. I think source addresses in SAs is incorrect due to bird has bound socket to any IP address and then uses this address to create SA. Also there were several reports that disabling RX/TX checksum offloading on NIC can help.
Comment 12 commit-hook freebsd_committer freebsd_triage 2017-12-21 14:10:05 UTC
A commit references this bug:

Author: ae
Date: Thu Dec 21 14:09:06 UTC 2017
New revision: 327061
URL: https://svnweb.freebsd.org/changeset/base/327061

Log:
  MFC r326847:
    Fix mbuf leak when TCPMD5_OUTPUT() method returns error.

    PR:		223817

Changes:
_U  stable/11/
  stable/11/sys/netinet/tcp_output.c
Comment 13 Andrey V. Elsukov freebsd_committer freebsd_triage 2017-12-21 14:11:04 UTC
Fixed in head/ and stable/11. Thanks!