After upgrade from 11.0-STABLE r318137 to 11.1-PRERELEASE TCP MD5 signatures cannot be verified, so bird session cannot be established. Neither ISP, nor our side changed the configuration. Bird-1.6.3_1 was recompiled from port, but it doesn't fix the trouble. # cat /etc/ipsec.conf flush ; add x.x.x.y x.x.x.x tcp 0x1000 -A tcp-md5 "Password1234" ; add x.x.x.x x.x.x.y tcp 0x1001 -A tcp-md5 "Password1234" ; # setkey -D x.x.x.x x.x.x.y tcp mode=any spi=4097(0x00001001) reqid=0(0x00000000) A: tcp-md5 3647334d 72483753 4c4d5733 seq=0x00000000 replay=0 flags=0x00000040 state=mature created: May 22 12:25:03 2017 current: May 22 12:35:06 2017 diff: 603(s) hard: 0(s) soft: 0(s) last: May 22 12:25:09 2017 hard: 0(s) soft: 0(s) current: 6016(bytes) hard: 0(bytes) soft: 0(bytes) allocated: 94 hard: 0 soft: 0 sadb_seq=1 pid=37398 refcnt=1 x.x.x.y x.x.x.x tcp mode=any spi=4096(0x00001000) reqid=0(0x00000000) A: tcp-md5 3647334d 72483753 4c4d5733 seq=0x00000000 replay=0 flags=0x00000040 state=mature created: May 22 12:25:03 2017 current: May 22 12:35:06 2017 diff: 603(s) hard: 0(s) soft: 0(s) last: May 22 12:25:08 2017 hard: 0(s) soft: 0(s) current: 5680(bytes) hard: 0(bytes) soft: 0(bytes) allocated: 71 hard: 0 soft: 0 sadb_seq=0 pid=37398 refcnt=1 # netstat -sp tcp | grep signature 0 packets with matching signature received 4601 packets with bad signature received 42 times failed to make signature due to no SA 0 times unexpected signature received 30 times no signature provided by segment
This also affects quagga. AFAICT, we're not sending MD5, either.
Oh-and ... this all worked fine in 11.0-RELEASE (save the ipv6 bug ... which was supposedly patched in 11.0-STABLE (nice if that stays "fixed" too)).
FYI something was broken between 11.0-STABLE r318137 and 11.1-PRERELEASE r318590. In our setup, TCP MD5 signed packets originate from vlan interface created on lagg interface, so the issue could be VLAN or LAGG specific. I don't know if freshest PRERELEASE works fine. For sure r318137 recompiled from sources works correctly, although just after the bootup, some packets with bad signatures occurred: # netstat -sp tcp | grep signature 8794 packets with matching signature received 18 packets with bad signature received 0 times failed to make signature due to no SA 1 time unexpected signature received 1 time no signature provided by segment
(In reply to Marek Zarychta from comment #0) > After upgrade from 11.0-STABLE r318137 to 11.1-PRERELEASE TCP MD5 signatures > cannot be verified, so bird session cannot be established. > Neither ISP, nor our side changed the configuration. Bird-1.6.3_1 was > recompiled from port, but it doesn't fix the trouble. > # netstat -sp tcp | grep signature > 0 packets with matching signature received > 4601 packets with bad signature received > 42 times failed to make signature due to no SA > 0 times unexpected signature received > 30 times no signature provided by segment There were no changes in stable/11 in TCP-MD5 code. So if it worked in r318137, it should work. Do you use bird's "password" option to set SAs or are they set via setkey(8)? There is patch for bird in https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=218907 I sent it to bird's developer and port maintainer, but seems it is not committed.
(In reply to Andrey V. Elsukov from comment #4) SAs are they set via setkey(8)from ipsec.conf config file.
(In reply to Andrey V. Elsukov from comment #4) My previous reply was mistaken, so let me explain once again. SAs are set from /etc/ipsec.conf and since the new IPSEC code was committed to STABLE I am following a branch. A couple of successful world updates has been provided. tcpmd5.ko module is loaded from /boot/loader.conf as well as some other modules: # kldstat Id Refs Address Size Name 1 38 0xffffffff80200000 1f32ac8 kernel 2 1 0xffffffff82134000 313338 zfs.ko 3 2 0xffffffff82448000 cb38 opensolaris.ko 4 1 0xffffffff82455000 12530 carp.ko 5 1 0xffffffff82468000 161c8 if_lagg.ko 6 1 0xffffffff8247f000 18d0 accf_dns.ko 7 1 0xffffffff82481000 66f0 ichwd.ko 8 1 0xffffffff82488000 40c0 tcpmd5.ko 9 1 0xffffffff8248d000 2af28 ipsec.ko 10 1 0xffffffff82621000 106c5 geom_eli.ko 11 1 0xffffffff82632000 58de fdescfs.ko 12 1 0xffffffff82638000 2839 pflog.ko 13 1 0xffffffff8263b000 34c2c pf.ko 14 1 0xffffffff82670000 a0d green_saver.ko 15 1 0xffffffff82671000 837a autofs.ko Kernel config is quite simple: include GENERIC ident MAXDATA options EM_MULTIQUEUE options IPSEC_SUPPORT nooptions IPSEC This is a production machine and updates will be stalled, at least till 11.1 will be released.
What is the branch you are following? I'm a little stuck here. 11-0-RELEASE is broken because TCP_MD5 on IPv6 panics... so I can't go back to that (misplaced patch that fixes it). 11-1-PRERELEASE is also broken. At least it doesn't panic. I also use TCP-MD5 over vlans.
(In reply to dgilbert from comment #7) I am using TCP-MD5 signatures only for IPv4 with vlans created atop of lagg interface following 11.0-STABLE branch since march, just after r315514 when ae@ MFCed new IPSEC code. The kernels had been always built with IPSEC_SUPPORT to allow load TCP-MD5 as a module.
(In reply to Andrey V. Elsukov from comment #4) I have tested on the spare machine, that r318329 commit broke TCP-MD5 for vlan interfaces created atop of lagg. Vlan interfaces created on raw network adapters are not affected.
Can somebody hint me how TCP MD5 supposed to coexist with TSO and LRO? Because it is the only way I can guess how my changes may affect TCP MD% signatures, calculated in my understanding completely in software. Marek, don't you have ifconfig outputs for working and not working cases? Are the interface options/capabilities you see there different?
Created attachment 182877 [details] ifconfig output
Only one of adapters (msk0) belonging to lagg0 is TSO capable, but it is brought up with "-tso" in both machines. Lagg interfaces are brought up with "-lacp_strict" option. The promiscuous mode in working is caused by other child VLAN interface running this mode. I have attached output of ifconfig -v command limited to considered interfaces only, IP addresses are masked.
I see one problem with my change: when it tries to disable RXCSUM for em0, since msk0 does not support it, em0 also disables TXCSUM, which lagg does not notice. But that should not affect vlan operation, since due to lack of VLAN_HWCSUM on msk0 none of those flags propagated to vlan0 either before my change or after. I'd appreciate some comments here from people knowing how TCP MD5 is working. Meanwhile, Marek, could you try to manually synchronize the msk0 and em0 interface capabilities (disable rxcsuma and txcsum) to see whether it change anything.
(In reply to Alexander Motin from comment #13) I have manually synchronized (disabled) TX/RXcsum options after lagg0 creation. It has no impact on TCP-MD5 packet signing on affected VLAN interface.
A commit references this bug: Author: mav Date: Fri May 26 20:15:33 UTC 2017 New revision: 318966 URL: https://svnweb.freebsd.org/changeset/base/318966 Log: Improve applying unified capabilities to the lagg ports. Some NICs have some capabilities dependent, so that disabling one require disabling some other (TXCSUM/RXCSUM on em). This code tries to reach the consensus more insistently. PR: 219453 MFC after: 1 week Changes: head/sys/net/if_lagg.c
(In reply to commit-hook from comment #15) Thank you for MFCing this to STABLE. Now options/capabilities of LAGG interface is a subset of options/capabilities of interfaces belonging to this LAGG. But this change looks like irrelevant to TCP-MD5 packet signing on child VLAN interface which still seems to be broken: # netstat -sp tcp | grep signature 0 packets with matching signature received 239 packets with bad signature received 0 times failed to make signature due to no SA 0 times unexpected signature received 0 times no signature provided by segment
Over to committer.
The TCP-MD5 signatures received directly on IPv4 enabled LAGG interface are also incorrect. That makes the issue strictly LAGG related. Curiously the packets originating from the affected interface are reported by the receiving (unaffected) peer as correct. So the problem is in checking TCP-MD5 signatures of incoming packets on the affected interface, not packet signing originating from this interface, if it at all could be considered separately. Any chance to make it work again before 11.1 is released?
(In reply to Marek Zarychta from comment #18) > The TCP-MD5 signatures received directly on IPv4 enabled LAGG interface are > also incorrect. That makes the issue strictly LAGG related. > > Curiously the packets originating from the affected interface are reported > by the receiving (unaffected) peer as correct. So the problem is in checking > TCP-MD5 signatures of incoming packets on the affected interface, not packet > signing originating from this interface, if it at all could be considered > separately. > > Any chance to make it work again before 11.1 is released? I don't know how lagg or vlan can affect TCP-MD5 calculation. There are several counters that could be changed during processing inbound signed packet: 1. "no signature provided by segment" is incremented when signed TCP segment received, but socket is not configured to receive them. 2. "failed to make signature due to no SA" is incremented when received signed TCP segment, but there is no SA for given addresses. 3. "packets with bad signature received" is incremented when signed TCP segment received, SA is found, but calculated signature for given addresses and password from SA doesn't match to expected value. There are many users who have tested the TCP-MD5 code and it works for both IPv4 and IPv6. According to your last `netstat` output I can suggest to check that password is the same on both hosts in all SAs. Use tcpdump with -M flag on both hosts to check that sent and received packets are correct.
(In reply to Andrey V. Elsukov from comment #19) Received packets are correct. One more notice here: the router with LAGG interface was always the initiator of BGP session, even before disastrous 318329 commit the router never responded to BGP session initiated by the peer, but attempted to initiate session itself.
It is not bug, it is a feature. May be it should better documented, that TCP-MD5 signatures work with interfaces and VLAN subinterfaces created on them if the interface supports TX/RX checksums generation/checking in hardware. This trait applies not only to LAGG interfaces. Thank all, especially ae@ for help in this weird case.