Summary: | Data corruption over if_ovpn (OpenVPN DCO) observed | ||
---|---|---|---|
Product: | Base System | Reporter: | ss3bsd <3226388001> |
Component: | kern | Assignee: | freebsd-net (Nobody) <net> |
Status: | Closed FIXED | ||
Severity: | Affects Some People | CC: | ae, franco, kib, kp, markj, mike, zarychtam, zlei |
Priority: | --- | ||
Version: | 14.1-RELEASE | ||
Hardware: | Any | ||
OS: | Any |
Description
ss3bsd
2024-06-28 12:31:50 UTC
Did you run the same test with --disable-dco option? (In reply to Andrey V. Elsukov from comment #1) With a different config at a physical machine, yes, and the issue did not occur if disable-dco is set. # openvpn --config CONFIG --disable-dco * Perhaps the sample configs I showed in the reproduce step need to be modified to use the disable-dco option since the option is not compatible with dev ovpn0. Has anyone else reproduced this? Since I have already decided not to use if_ovpn, this additional comment below will be the last one unless someone can reproduce the issue. -- The issue has been confirmed using Hyper-V (10.0.22621 [SP3]) too. Also, another issue (server-side file corrption during FTP GET over if_oven) is observed there. [procedure #2] Test condition is almost same as the already reported one. Smaller test file is enough this time, but GET multiple times instead. ** prepare test data # dd if=/dev/random bs=1M count=4 of=randomfile ** transfer data over if_ovpn (put and get back) ftp> put randomfile local: randomfile remote: randomfile 229 Entering Extended Passive Mode (|||56921|) 150 Opening BINARY mode data connection for 'randomfile'. 100% |********************************************************************************************************| 4096 KiB 107.39 MiB/s 00:00 ETA 226 Transfer complete. 4194304 bytes sent in 00:00 (104.75 MiB/s) ftp> get randomfile randomfile.returned1 local: randomfile.returned1 remote: randomfile 229 Entering Extended Passive Mode (|||52890|) 150 Opening BINARY mode data connection for 'randomfile' (4194304 bytes). 100% |********************************************************************************************************| 4096 KiB 106.43 MiB/s 00:00 ETA 226 Transfer complete. 4194304 bytes received in 00:00 (106.16 MiB/s) ftp> get randomfile randomfile.returned2 local: randomfile.returned2 remote: randomfile 229 Entering Extended Passive Mode (|||51626|) 150 Opening BINARY mode data connection for 'randomfile' (4194304 bytes). 100% |********************************************************************************************************| 4096 KiB 112.00 MiB/s 00:00 ETA 226 Transfer complete. 4194304 bytes received in 00:00 (111.72 MiB/s) ftp> get randomfile randomfile.returned3 local: randomfile.returned3 remote: randomfile 229 Entering Extended Passive Mode (|||50186|) 150 Opening BINARY mode data connection for 'randomfile' (4194304 bytes). 100% |********************************************************************************************************| 4096 KiB 116.54 MiB/s 00:00 ETA 226 Transfer complete. 4194304 bytes received in 00:00 (116.23 MiB/s) ftp> get randomfile randomfile.returned4 local: randomfile.returned4 remote: randomfile 229 Entering Extended Passive Mode (|||56341|) 150 Opening BINARY mode data connection for 'randomfile' (4194304 bytes). 100% |********************************************************************************************************| 4096 KiB 96.91 MiB/s 00:00 ETA 226 Transfer complete. 4194304 bytes received in 00:00 (96.68 MiB/s) ftp> bye 221 Goodbye. # md5sum randomfile* 76b3ea410e4d56b9b684116a2a8319cb randomfile 76b3ea410e4d56b9b684116a2a8319cb randomfile.returned1 -- file is small, so corruption at the network did not occur this time 13f1d329379862cd1e157eeb55c6a436 randomfile.returned2 -- the server-side file has been destroyed by the previous GET (*) 4d3846965ad68016d545ae4222a33ed5 randomfile.returned3 -- same as above ebfc8ff8ee13841599562a8a211c5b31 randomfile.returned4 -- same as above (*) example of the file corruption by GET command. -- At the ftp server side -- * before "get randomfile randomfile.returned4" # md5sum randomfile ebfc8ff8ee13841599562a8a211c5b31 randomfile -- matches to randomfile.returned4 at the client side later * after "get randomfile randomfile.returned4" # md5sum randomfile 5a49c61306c82a4c57768876b95c4ae4 randomfile -- destroyed (In reply to ss3bsd from comment #3) I guess a small file ( 1M bytes ) is enough to reproduce. Maybe require multiple rounds though. > # md5sum randomfile* This is good to check the corruption, but less useful to figure out how many bytes got corrupted. May you please also share the binary diff ? You can do that by: ``` # diff <( hexdump -x randomfile ) <( hexdump -x randomfile.return ) ``` Setting "tun-mtu 1400" on the OpenVPN access server should work as a workaround. Please see bug 276838 for more details. FWIW: there is another mild flaw with the FreeBSD implementation of DCO: the ovpn(4) module doesn't support "multihome" option (more details in bug 273664). Please let me apologise since I hastily connected this bug with another in comment #5 misleading the audience. These bugs seem to be unrelated to each other. To not hijack this PR but to give more insight into the problem let me add some details: The bug described here is reproducible, but to narrow the scope I observed that only data sent directly from OpenVPN DCO accelerated FreeBSD-driven access server to the FreeBSD client was corrupted. The corruption doesn't affect network packets transferred (routed) via OpenVPN DCO accelerated FreeBSD access server. (In reply to Zhenlei Huang from comment #4) I guess the file is entirely corrupted. --- # md5sum randomfile* 4d1b051f8bdea93ddca5d8831a0a829c randomfile 4d1b051f8bdea93ddca5d8831a0a829c randomfile.returned1 dd8e4c1a9839976c9e812456ecf71b3a randomfile.returned2 de2d31f956a53a0a86e64058886eb1e9 randomfile.returned3 9432cd3c1ec5540e5251f657e18b93a8 randomfile.returned4 # hexdump -x randomfile > randomfile.hex # hexdump -x randomfile.returned2 > randomfile.returned2.hex # head randomfile.hex 0000000 0595 a291 c861 c7ee c67a 59ab 73ca 66e3 0000010 29ca 9d14 d874 fe71 2fe2 76ae 93b0 cb1a 0000020 df5b a72f 2256 1e88 337d d9a8 4a12 0923 0000030 5a72 56f6 64f8 647a 7a2c a33b 5f0b 87ec 0000040 e86e fe3e 62a3 e1f4 992a 3fc1 7535 c273 0000050 40f1 ce42 6895 cf20 e272 74b5 485a 8f9b 0000060 584e 3431 78bd f48c efd4 cfc8 cd19 8ca7 0000070 cf09 78b2 a189 3d08 4c4e 3acb 057d 3a20 0000080 06d9 2c22 a667 8b38 0aaa 7176 c86f 71b3 0000090 9dd8 6128 4a08 2475 674d f801 daf7 4fe9 # head randomfile.returned2.hex 0000000 9ae9 866b 3498 82ee aef9 abeb af8a 6ae4 0000010 ce04 3671 301d af4e fab0 f01d 890a a143 0000020 8635 7c19 982f 0dcf 34e3 c09e 9992 54ad 0000030 1a97 2a3b 78c4 b3a9 3230 3a67 3649 5f97 0000040 5944 7980 adde 5f02 1e35 73f0 4b7d 32ee 0000050 bd6d 514a 4065 9817 fe0d 6d74 7736 626e 0000060 ef27 d331 7f8b 2ebb 5568 9b92 7264 7846 0000070 606a 0132 d1c9 1a3e 230e 4061 3897 c873 0000080 78ea e2f7 6f54 38c2 ceb9 413d 557f 20ea 0000090 3ac1 3367 7f3b 01bf fc96 75bb 549a 5b3e # diff randomfile.hex randomfile.returned2.hex > randomfile.hex.diff # wc *.hex* 262145 2359297 18874376 randomfile.hex 524290 5242882 38797334 randomfile.hex.diff 262145 2359297 18874376 randomfile.returned2.hex 1048580 9961476 76546086 total (In reply to Marek Zarychta from comment #7) Hi. I had checked your PR before reporting this issue. Reducing MTU did not solve this, so I agree that your PR and mine are separate issues. Thanks. While investigating the bug, I found that when ftpd(8) is serving a file from the OpenVPN access server itself (from the ZFS filesystem in my case), the file silently (apparently!) gets corrupted on that SERVER! To reproduce, connect from another machine to the FTP server (running on OpenVPN access server) offering read access to the file on the ZFS dataset and download the file a few times, checking if the checksum of the file served remains the same (see the experiment below). In my case, the checksum of the file on the server changed with each file access via ftpd(8), but only if it was served from the OpenVPN access server and was accessed via the OpenVPN tunnel. When the file was accessed by OpenVPN tunnel but, terminated on another access server (also running FreeBSD and using OpenVPN DCO module!), everything was fine. Moreover, the file never got corrupted when the connection was using a Wireguard tunnel terminated on the same machine or the connection was established without any tunnelling. 1. CLient PC: [homepc] /tmp% lftp ftp://somehost lftp somehost:~> set xfer:clobber yes lftp somehost:~> get file 81362184 bajty przesłane w ciągu 3 sekund (22.60 MiB/s) lftp somehost:/> get file 81362184 bajty przesłane w ciągu 3 sekund (23.17 MiB/s) lftp somehost:/> get file 81362184 bajty przesłane w ciągu 3 sekund (23.17 MiB/s) lftp somehost:/> (...) 2. FreeBSD OpenVPN access server and ftpd(8) server: [somehost] ~ftp# stat file ; sha1sum file 0:13:54 2019" 131072 152537 0x800 file 589ca9781368ffd66f35394f3b7bf43ab9d98b33 file [somehost] ~ftp# stat file ; sha1sum file 11075423362389754006 262491 -rw-rw-r-- 2 root wheel 0 81362184 "sty 1 01:00:00 1970" "lip 8 10:13:54 2019" "sie 31 00:40:13 2024" "lip 8 1 0:13:54 2019" 131072 152537 0x800 file 05ba9b9a60df861887dcc690c1237998c1844180 file [somehost] ~ftp# stat file ; sha1sum file 11075423362389754006 262491 -rw-rw-r-- 2 root wheel 0 81362184 "sty 1 01:00:00 1970" "lip 8 10:13:54 2019" "sie 31 00:40:13 2024" "lip 8 1 0:13:54 2019" 131072 152537 0x800 file d259dbd577c17b5ad7da0d1d0fa67f5564a68e72 file [somehost] ~ftp# zpool scrub zroot [somehost] ~ftp# zpool status pool: zroot state: ONLINE scan: scrub repaired 0B in 00:02:32 with 0 errors on Tue Sep 3 12:37:03 2024 config: NAME STATE READ WRITE CKSUM zroot ONLINE 0 0 0 gpt/ssdrootfs ONLINE 0 0 0 errors: No known data errors [somehost] ~ftp# stat file ; sha1sum file 11075423362389754006 262491 -rw-rw-r-- 2 root wheel 0 81362184 "sty 1 01:00:00 1970" "lip 8 10:13:54 2019" "sie 31 00:40:13 2024" "lip 8 1 0:13:54 2019" 131072 152537 0x800 file fdf68d23439f5a269850e71e1807ea8889a8ffd0 file [somehost] ~ftp# stat file ; sha1sum file 11075423362389754006 262491 -rw-rw-r-- 2 root wheel 0 81362184 "sty 1 01:00:00 1970" "lip 8 10:13:54 2019" "sie 31 00:40:13 2024" "lip 8 1 0:13:54 2019" 131072 152537 0x800 file 7674bcd2cbfdff3c5455e287c609b3364c311914 file (In reply to Marek Zarychta from comment #10) ftpd uses sendfile(2) AFAIR, and the symptoms of the file corruption mean that M_RDONLY mbuf, of type M_EXTPG, modified by the network stack. I still can't reproduce this, but kib's remark makes a lot of sense. This should fix the problem: https://reviews.freebsd.org/D46529 (In reply to Kristof Provost from comment #12) This patch seems to work. Thank you. (In reply to Kristof Provost from comment #12) With the patch from the review D46529 applied, I can no longer reproduce the original issue reported here by ss3bsd and the issue described by me in comment #10. Thank you for the fix, please push it. (In reply to Kristof Provost from comment #12) >I still can't reproduce this A few questions arise: Could the issue described in comment #10 be reproducible when sendfile(2) uses UFS as a backend? Is sendfile(2) on ZFS reliable and safe? AFAIR there were issues with sendfile(8) and ZFS in the past. FWIW: the file I have experimented with seems to be untouched, at least the checksums of the file and those taken from various historical ZFS snapshots are the same. A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=5644e2c6d47c6113a61ab7fc0776b7227677656a commit 5644e2c6d47c6113a61ab7fc0776b7227677656a Author: Kristof Provost <kp@FreeBSD.org> AuthorDate: 2024-09-04 12:54:23 +0000 Commit: Kristof Provost <kp@FreeBSD.org> CommitDate: 2024-09-05 08:59:31 +0000 if_ovpn: ensure it's safe to modify the mbuf PR: 280036 Reviewed by: ae MFC after: 1 week Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D46529 sys/net/if_ovpn.c | 12 ++++++++++++ 1 file changed, 12 insertions(+) A commit in branch stable/14 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=8f251937850142748cdf67a46630342934ff9f91 commit 8f251937850142748cdf67a46630342934ff9f91 Author: Kristof Provost <kp@FreeBSD.org> AuthorDate: 2024-09-04 12:54:23 +0000 Commit: Kristof Provost <kp@FreeBSD.org> CommitDate: 2024-09-12 07:57:04 +0000 if_ovpn: ensure it's safe to modify the mbuf PR: 280036 Reviewed by: ae MFC after: 1 week Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D46529 (cherry picked from commit 5644e2c6d47c6113a61ab7fc0776b7227677656a) sys/net/if_ovpn.c | 12 ++++++++++++ 1 file changed, 12 insertions(+) |