Bug 266598 - if_ovpn(4) DCO module not supporting correctly IPv6 Traffic Class for tunneled packets
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-bugs (Nobody)
URL:
Keywords: ipv6
Depends on:
Blocks:
 
Reported: 2022-09-25 11:33 UTC by Marek Zarychta
Modified: 2022-09-26 12:05 UTC

See Also:


Attachments
Traffic sniffed at DCO side (2.20 KB, text/plain)
2022-09-25 11:33 UTC, Marek Zarychta
Traffic sniffed on the server (7.75 KB, text/plain)
2022-09-25 11:35 UTC, Marek Zarychta

Description Marek Zarychta 2022-09-25 11:33:44 UTC
Created attachment 236804 [details]
Traffic sniffed at DCO side

First, let me thank and express my sincere appreciation to everyone involved in creating if_ovpn.ko and updating security/openvpn-devel for testing DCO support, especially kp@, cron2 and mandree@.

I have spent some time this weekend testing this and found two flaws in tunneling IPv6 traffic when DCO is used. After reverting to standard tun(4), everything works as expected.

1. I am not able to establish an ssh session using IPv6 over the tunnel. It looks like a problem with large TCP segments, similar to what is seen when path MTU discovery fails.
 
2. Sniffing traffic with tcpdump(1) on the tunnel interface at the DCO endpoint shows only packets arriving from the tunnel, not the ones sent over the tunnel.

In the attached files, the IPv6 address 2001:db8:1:c0:2::1 belongs to a FreeBSD client with DCO enabled.
Comment 1 Marek Zarychta 2022-09-25 11:35:13 UTC
Created attachment 236805 [details]
Traffic sniffed on the server
Comment 2 Marek Zarychta 2022-09-25 11:43:53 UTC
Tests were run on:

FreeBSD 14.0-CURRENT #21 main-n258191-57338837aef: Sun Sep 25 11:42:29 CEST 2022

OpenVPN 2.6_git [git:734de8f9aa2df56bcb45ebab7cfa799a23f36403] amd64-portbld-freebsd14.0 [SSL (OpenSSL)] [LZO] [LZ4] [MH/RECVDA] [AEAD] [DCO] built on Sep 25 2022
Comment 3 Gert Doering 2022-09-26 06:38:41 UTC
I cannot reproduce the tcpdump issue, but I can reproduce the SSH stall.

Setup:
 - OpenVPN Client with Linux-DCO, Server with FreeBSD-DCO.
 - SSH client on the Linux side, SSH server on the FreeBSD side
 - tcpdump running on both tunnel sides
 - initial SSH handshake passes, then

08:32:25.353401 IP6 fd00:abcd:114:2::1.22 > fd00:abcd:114:2::1001.54728: Flags [P.], seq 2627:2671, ack 2086, win 1042, options [nop,nop,TS val 3092511785 ecr 3727979942], length 44
08:32:25.353420 IP6 fd00:abcd:114:2::1001.54728 > fd00:abcd:114:2::1.22: Flags [.], ack 2671, win 501, options [nop,nop,TS val 3727979943 ecr 3092511785], length 0
08:32:25.354086 IP6 fd00:abcd:114:2::1001.54728 > fd00:abcd:114:2::1.22: Flags [P.], seq 2086:2642, ack 2671, win 501, options [nop,nop,TS val 3727979944 ecr 3092511785], length 556
08:32:25.564213 IP6 fd00:abcd:114:2::1001.54728 > fd00:abcd:114:2::1.22: Flags [P.], seq 2086:2642, ack 2671, win 501, options [nop,nop,TS val 3727980154 ecr 3092511785], length 556
08:32:25.776229 IP6 fd00:abcd:114:2::1001.54728 > fd00:abcd:114:2::1.22: Flags [P.], seq 2086:2642, ack 2671, win 501, options [nop,nop,TS val 3727980366 ecr 3092511785], length 556
...
08:34:15.540211 IP6 fd00:abcd:114:2::1001.54728 > fd00:abcd:114:2::1.22: Flags [P.], seq 2086:2642, ack 2671, win 501, options [nop,nop,TS val 3728090130 ecr 3092511785], length 556

A 556-byte packet gets "stuck": it is seen in the client-side tcpdump, but never shows up in the server-side tcpdump.

The initial handshake up to the "length 44" and "length 0" byte packets ARE seen on the server side tcpdump, so generally, tcpdump is working fine:

08:32:25.352754 IP6 fd00:abcd:114:2::1.22 > fd00:abcd:114:2::1001.54728: Flags [P.], seq 2627:2671, ack 2086, win 1042, options [nop,nop,TS val 3092511785 ecr 3727979942], length 44
08:32:25.353596 IP6 fd00:abcd:114:2::1001.54728 > fd00:abcd:114:2::1.22: Flags [.], ack 2671, win 501, options [nop,nop,TS val 3727979943 ecr 3092511785], length 0

This does not look related to MTU/MSS (FreeBSD DCO seems to do mssfix just fine, I see packets coming out with mss 1360 - which is fine) - especially as the packet that is eaten is small anyway.
Comment 4 Gert Doering 2022-09-26 06:52:01 UTC
OK, so this seems to be TOS related - the "packet not getting through" is the first one to show "class 0x10".

08:42:29.940180 IP6 (flowlabel 0x8e0f2, hlim 64, next-header TCP (6) payload length: 76) fd00:abcd:114:2::1.22 > fd00:abcd:114:2::1001.54734: Flags [P.], cksum 0x3223 (correct), seq 2627:2671, ack 2086, win 1042, options [nop,nop,TS val 1553137522 ecr 3728584529], length 44
08:42:29.940212 IP6 (flowlabel 0x4302d, hlim 64, next-header TCP (6) payload length: 32) fd00:abcd:114:2::1001.54734 > fd00:abcd:114:2::1.22: Flags [.], cksum 0x9f30 (correct), ack 2671, win 501, options [nop,nop,TS val 3728584530 ecr 1553137522], length 0
08:42:29.940420 IP6 (class 0x10, flowlabel 0x4302d, hlim 64, next-header TCP (6) payload length: 588) fd00:abcd:114:2::1001.54734 > fd00:abcd:114:2::1.22: Flags [P.], cksum 0xe2c7 (correct), seq 2086:2642, ack 2671, win 501, options [nop,nop,TS val 3728584530 ecr 1553137522], length 556
08:42:30.152235 IP6 (class 0x10, flowlabel 0x4302d, hlim 64, next-header TCP (6) payload length: 588) fd00:abcd:114:2::1001.54734 > fd00:abcd:114:2::1.22: Flags [P.], cksum 0xe1f3 (correct), seq 2086:2642, ack 2671, win 501, options [nop,nop,TS val 3728584742 ecr 1553137522], length 556
08:42:30.364255 IP6 (class 0x10, flowlabel 0x03eee, hlim 64, next-header TCP (6) payload length: 588) fd00:abcd:114:2::1001.54734 > fd00:abcd:114:2::1.22: Flags [P.], cksum 0xe11f (correct), seq 2086:2642, ack 2671, win 501, options [nop,nop,TS val 3728584954 ecr 1553137522], length 556


... but that might be a red herring... calling "ssh -o 'ipqos 0' ..." will result in the same stall, but no more "class 0x10".  Mmmh.
Comment 5 Marek Zarychta 2022-09-26 06:59:28 UTC
It is definitely ToS that makes packets invisible to tcpdump(1) on the tun(4) link, and this is reproducible with netcat. To reproduce, run:

server:
nc -6 -l 52555
client with DCO:
nc -T lowdelay -6 server 52555
Comment 6 Marek Zarychta 2022-09-26 07:12:11 UTC
(In reply to Marek Zarychta from comment #5)
To clarify, when sniffed with tcpdump(1), the packets with IPv6 Traffic Class 0x10 are visible at the tun(4) endpoint, but are not visible at the DCO-accelerated ovpn(4) endpoint.
Comment 7 Kristof Provost freebsd_committer freebsd_triage 2022-09-26 07:32:41 UTC
(In reply to Marek Zarychta from comment #6)
Thank you both for the report. It's very nice when people narrow down the reproduction scenario to this extent.

I've been able to reproduce the problem with a simple `ping6 -z 16` (i.e. set lowdelay TOS). I currently have no idea why, but with a simple reproduction case like this that's just a matter of time.
Comment 8 Marek Zarychta 2022-09-26 07:50:07 UTC
It looks like packets with the traffic class set are invisible to tcpdump(1) but are still sent by the ovpn(4) interface. When the DCO side acts as the receiver, such packets are not picked up by ovpn(4), and thus the ssh(1) session breaks on entering interactive mode, once ToS is also applied on the ssh server side.
Comment 9 Gert Doering 2022-09-26 09:04:29 UTC
(In reply to Kristof Provost from comment #7)

I have now tested a bit more.  It's... interesting.

Sending a packet with ToS 0x10 (fping6 -O 0x10) into an ovpn(4) interface will properly transmit the encapsulated packet to the remote host *but* "tcpdump -n -i tun7" will not show the packet.

Receiving a packet with ToS 0x10 on an ovpn(4) OpenVPN peer (encrypted packet coming in from LAN) will neither show the packet on tcpdump, nor receive it into "FreeBSD stack".

Based on this I have now built a t_client test that will exercise ping tests with ToS 0x10, so I can automatically test whether it works or breaks.  Thanks for the challenge :-)
Comment 10 Kristof Provost freebsd_committer freebsd_triage 2022-09-26 09:27:18 UTC
(In reply to Gert Doering from comment #9)
It's even more bizarre than that, in that a ToS of say 15 does work. It only breaks as soon as the first bit in the second byte is set.

My current debugging does agree that it's a receive side issue, but I'm not yet clear on where the packet gets lost.
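
For reference, the 8-bit Traffic Class straddles the first two bytes of the IPv6 header (RFC 8200): its upper four bits share the first byte with the 4-bit version field, and its lower four bits open the second byte. The following is only a sketch with hypothetical helper names (nothing here comes from if_ovpn.c); it shows why a value of 15 leaves the first byte at 0x60 while 16 turns it into 0x61.

```c
#include <stdint.h>

/*
 * Hypothetical helpers for illustration only, not taken from if_ovpn.c.
 *
 * First IPv6 header byte: version (4 bits, always 6) followed by the
 * upper 4 bits of the Traffic Class.
 */
uint8_t
first_header_byte(uint8_t traffic_class)
{
	return (uint8_t)((6u << 4) | (traffic_class >> 4));
}

/*
 * Second IPv6 header byte: lower 4 bits of the Traffic Class followed
 * by the upper 4 bits of the 20-bit flow label.
 */
uint8_t
second_header_byte(uint8_t traffic_class, uint32_t flow_label)
{
	return (uint8_t)(((traffic_class & 0x0fu) << 4) |
	    ((flow_label >> 16) & 0x0fu));
}
```

With these helpers, first_header_byte(15) is 0x60 and first_header_byte(16) is 0x61, which matches the observation that only traffic class values of 16 and above trigger the problem.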
Comment 11 Kristof Provost freebsd_committer freebsd_triage 2022-09-26 09:55:43 UTC
(In reply to Kristof Provost from comment #10)
I suspect I found it:

diff --git a/sys/net/if_ovpn.c b/sys/net/if_ovpn.c
index 286125fb42d5..0577fcee8618 100644
--- a/sys/net/if_ovpn.c
+++ b/sys/net/if_ovpn.c
@@ -1572,7 +1581,7 @@ ovpn_get_af(struct mbuf *m)
                return (AF_INET);

        ip6 = mtod(m, struct ip6_hdr *);
-       if (ip6->ip6_vfc == IPV6_VERSION)
+       if ((ip6->ip6_vfc & IPV6_VERSION_MASK) == IPV6_VERSION)
                return (AF_INET6);

        return (0);

The check for 'what IP version is this packet?' didn't account for the ToS field sharing bits with the IP version field. We didn't see the outgoing packet in tcpdump, because the BPF capture point (for outbound traffic) is conditional on the address family (in part to avoid capturing control packets, in part because the capture point needs to know).
On the receive side the packet gets decrypted, but not passed to the IP stack, because we don't know where to send it (i.e. v4 or v6).

Small fix for a bigger issue.

I'll also extend the FreeBSD if_ovpn tests to include packets with the ToS bits set.
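
The difference between the two checks can be sketched in isolation. This is an illustration only, not the kernel code: the EX_ constants are local stand-ins assumed to carry the same values as IPV6_VERSION (0x60) and IPV6_VERSION_MASK (0xf0) in netinet/ip6.h, and the function names are hypothetical.

```c
#include <stdint.h>

/* Local stand-ins (assumed values) for IPV6_VERSION / IPV6_VERSION_MASK. */
#define EX_IPV6_VERSION		0x60
#define EX_IPV6_VERSION_MASK	0xf0

/* Old behaviour: compares the whole first byte, so any traffic class
 * bits stored in the low nibble make the comparison fail. */
int
is_ipv6_unmasked(uint8_t vfc)
{
	return (vfc == EX_IPV6_VERSION);
}

/* Fixed behaviour: only the version nibble is compared. */
int
is_ipv6_masked(uint8_t vfc)
{
	return ((vfc & EX_IPV6_VERSION_MASK) == EX_IPV6_VERSION);
}
```

For a first byte of 0x61 (version 6, traffic class 0x10), is_ipv6_unmasked() returns 0 while is_ipv6_masked() returns 1; both return 1 for 0x60 and 0 for an IPv4 first byte such as 0x45.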
Comment 12 Marek Zarychta 2022-09-26 10:19:38 UTC
(In reply to Kristof Provost from comment #11)

I can confirm that the proposed patch solves the issue in my case.
Thanks for the expedited fix!
Comment 13 commit-hook freebsd_committer freebsd_triage 2022-09-26 11:55:58 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=76e1c9c671043e08bdd951ae6c768b541fdede19

commit 76e1c9c671043e08bdd951ae6c768b541fdede19
Author:     Kristof Provost <kp@FreeBSD.org>
AuthorDate: 2022-09-26 09:58:51 +0000
Commit:     Kristof Provost <kp@FreeBSD.org>
CommitDate: 2022-09-26 11:54:20 +0000

    if_ovpn: fix address family check when traffic class bits are set

    When the tunneled (IPv6) traffic had traffic class bits set (but only >=
    16) the packet got lost on the receive side.

    This happened because the address family check in ovpn_get_af() failed
    to mask correctly, so the version check didn't match, causing us to drop
    the packet.

    While here also extend the existing 6-in-6 test case to trigger this
    issue.

    PR:             266598
    Sponsored by:   Rubicon Communications, LLC ("Netgate")

 sys/net/if_ovpn.c                | 2 +-
 tests/sys/net/if_ovpn/if_ovpn.sh | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)