Tested on 12.1-RELEASE, 12.1-RELEASE-p1, and with the TSO patch from https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=236999 applied.

All VMs are running on the same host with VMware ESXi 6.7.0, build 15160138:
VM 1 - FreeBSD 12.1
VM 2 - any OS (tested with several; which OS the second VM runs is irrelevant)

After some network load, packets in the interface queue begin to get stuck. Reproduced in different environments with different kinds of network load. The easiest way to reproduce it is to copy a large file from the second VM to the FreeBSD VM using Samba or FTP.

Some output examples (172.31.255.2 - FreeBSD, 172.31.255.3 - second VM).

Ping:
8008 bytes from 172.31.255.3: icmp_seq=321 ttl=128 time=0.375 ms
8008 bytes from 172.31.255.3: icmp_seq=322 ttl=128 time=0.229 ms
8008 bytes from 172.31.255.3: icmp_seq=323 ttl=128 time=0.179 ms
8008 bytes from 172.31.255.3: icmp_seq=324 ttl=128 time=0.203 ms
8008 bytes from 172.31.255.3: icmp_seq=325 ttl=128 time=0.278 ms
8008 bytes from 172.31.255.3: icmp_seq=326 ttl=128 time=0.178 ms
8008 bytes from 172.31.255.3: icmp_seq=327 ttl=128 time=2.204 ms
8008 bytes from 172.31.255.3: icmp_seq=328 ttl=128 time=0.226 ms
8008 bytes from 172.31.255.3: icmp_seq=329 ttl=128 time=0.221 ms
8008 bytes from 172.31.255.3: icmp_seq=330 ttl=128 time=0.466 ms
8008 bytes from 172.31.255.3: icmp_seq=331 ttl=128 time=0.253 ms
8008 bytes from 172.31.255.3: icmp_seq=350 ttl=128 time=1020.494 ms  <-- start of file copy
8008 bytes from 172.31.255.3: icmp_seq=362 ttl=128 time=1037.839 ms
8008 bytes from 172.31.255.3: icmp_seq=381 ttl=128 time=4162.085 ms
8008 bytes from 172.31.255.3: icmp_seq=382 ttl=128 time=5202.940 ms
8008 bytes from 172.31.255.3: icmp_seq=383 ttl=128 time=6200.121 ms
8008 bytes from 172.31.255.3: icmp_seq=385 ttl=128 time=7261.939 ms
8008 bytes from 172.31.255.3: icmp_seq=386 ttl=128 time=7200.745 ms
8008 bytes from 172.31.255.3: icmp_seq=390 ttl=128 time=4100.259 ms

tcpdump after the problem appears:
# tcpdump -i vmx1 icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vmx1, link-type EN10MB (Ethernet), capture size 262144 bytes
16:09:41.700833 IP 172.31.255.2 > 172.31.255.3: ICMP echo request, id 23149, seq 0, length 8008
16:09:42.727307 IP 172.31.255.2 > 172.31.255.3: ICMP echo request, id 23149, seq 1, length 8008
16:09:43.761866 IP 172.31.255.2 > 172.31.255.3: ICMP echo request, id 23149, seq 2, length 8008
16:09:43.762117 IP 172.31.255.3 > 172.31.255.2: ICMP echo reply, id 23149, seq 0, length 8008
16:09:44.771901 IP 172.31.255.2 > 172.31.255.3: ICMP echo request, id 23149, seq 3, length 8008
16:09:44.978178 IP truncated-ip - 7810 bytes missing! 172.31.255.3 > 172.31.255.2: ICMP echo reply, id 23149, seq 1, length 8008
16:09:45.788996 IP 172.31.255.2 > 172.31.255.3: ICMP echo request, id 23149, seq 4, length 8008
16:09:46.039983 IP truncated-ip - 7928 bytes missing! 172.31.255.3 > 172.31.255.2: ICMP echo reply, id 23149, seq 2, length 8008
16:09:46.290583 IP truncated-ip - 7928 bytes missing! 172.31.255.3 > 172.31.255.2: ICMP echo reply, id 23149, seq 3, length 8008
16:09:46.801911 IP 172.31.255.2 > 172.31.255.3: ICMP echo request, id 23149, seq 5, length 8008
16:09:47.861870 IP 172.31.255.2 > 172.31.255.3: ICMP echo request, id 23149, seq 6, length 8008
16:09:48.071886 IP truncated-ip - 7822 bytes missing! 172.31.255.3 > 172.31.255.2: ICMP echo reply, id 23149, seq 4, length 8008
16:09:48.871905 IP 172.31.255.2 > 172.31.255.3: ICMP echo request, id 23149, seq 7, length 8008
16:09:49.802033 IP truncated-ip - 7928 bytes missing! 172.31.255.3 > 172.31.255.2: ICMP echo reply, id 23149, seq 5, length 8008
16:09:49.891036 IP 172.31.255.2 > 172.31.255.3: ICMP echo request, id 23149, seq 8, length 8008
16:09:50.802001 IP truncated-ip - 7928 bytes missing! 172.31.255.3 > 172.31.255.2: ICMP echo reply, id 23149, seq 6, length 8008
16:09:50.901923 IP 172.31.255.2 > 172.31.255.3: ICMP echo request, id 23149, seq 9, length 8008
16:09:51.961848 IP 172.31.255.2 > 172.31.255.3: ICMP echo request, id 23149, seq 10, length 8008

It looks like newly arrived packets push old packets out of the queue. After some time, the gap between sending a packet and receiving the reply can exceed 40 seconds. tcpdump running at the same time on the second VM shows the response being sent immediately after the request is received. Disabling/enabling any driver features (TSO, checksum offload, etc.) or changing the MTU makes no difference to the problem.
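For anyone trying to reproduce this, the observations above boil down to roughly the following recipe (a sketch; the addresses and the interface name are the ones from this report, and any bulk transfer can serve as the load generator):

```shell
# On the FreeBSD guest (172.31.255.2): watch ICMP latency on the vmx
# interface while the load is running.
ping -s 8000 172.31.255.3 &
tcpdump -i vmx1 icmp

# On the second VM (172.31.255.3): generate sustained receive load on the
# FreeBSD guest, e.g. push a large file over Samba or FTP.
```

Once the problem triggers, the ping round-trip times jump from sub-millisecond to multiple seconds and tcpdump starts reporting truncated-ip replies, as shown above.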
I am seeing a somewhat similar problem. When scp-ing a large file from a hardware host to a VM with a vmxnet3 interface, I see that the copy starts very fast, but then it stalls for many seconds, and then suddenly it becomes fast again and the copy completes. If the file is very large, the stall-and-recovery cycle can happen a few times before the copy is complete.
Have you tried running an image with this patch in place: https://svnweb.freebsd.org/base?view=revision&revision=356703 ?

If you do not have that patch, you should check whether you can reproduce your issue without TSO enabled, specifically without a TSO packet ever being attempted (for example, by ensuring TSO is disabled for that interface in /etc/rc.conf). Without the above patch in place and with TSO enabled, it may be possible (that is, it has not been ruled out) for the state of the virtual device to become corrupted by the sending of TSO packets, so disabling TSO on the interface later might not make a difference.
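Disabling TSO persistently in /etc/rc.conf might look like the following. This is only a sketch: the interface name, address, and MTU match this report, but the exact option list for a given setup may differ.

```shell
# /etc/rc.conf (excerpt) - hypothetical example for the vmx1 interface.
# -tso4/-tso6 disable TCP segmentation offload for IPv4/IPv6, and
# -vlanhwtso disables hardware TSO on VLANs, so that no TSO packet
# is ever attempted on this interface from boot onward.
ifconfig_vmx1="inet 172.31.255.2 netmask 255.255.255.0 mtu 9000 -tso4 -tso6 -vlanhwtso"
```

Setting this in rc.conf (rather than with a live ifconfig command) matters here because, per the comment above, device state corrupted by an earlier TSO packet might not be repaired by disabling TSO later.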
(In reply to Patrick Kelsey from comment #2) Ah, sorry, just saw that you stated you are running with that patch in place.
I did some more tests; hope this provides some more information.

The first test is with a kernel recompiled with the TSO patch. The network load was a file being copied to the server over Samba.

# ifconfig vmx1
vmx1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 9000
        options=e403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
        ether 00:50:56:be:f0:13
        inet 172.31.255.2 netmask 0xffffff00 broadcast 172.31.255.255
        media: Ethernet autoselect
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
# tcpdump -i vmx1 icmp &
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vmx1, link-type EN10MB (Ethernet), capture size 262144 bytes
# ping -s 8000 172.31.255.3
PING 172.31.255.3 (172.31.255.3): 8000 data bytes
11:59:07.108253 IP 172.31.255.2 > 172.31.255.3: ICMP echo request, id 30548, seq 0, length 8008
11:59:07.108425 IP 172.31.255.3 > 172.31.255.2: ICMP echo reply, id 30548, seq 0, length 8008
8008 bytes from 172.31.255.3: icmp_seq=0 ttl=128 time=0.226 ms
11:59:08.126583 IP 172.31.255.2 > 172.31.255.3: ICMP echo request, id 30548, seq 1, length 8008
11:59:08.126754 IP 172.31.255.3 > 172.31.255.2: ICMP echo reply, id 30548, seq 1, length 8008
8008 bytes from 172.31.255.3: icmp_seq=1 ttl=128 time=0.213 ms
--- skipped some lines ---
12:00:20.401492 IP 172.31.255.2 > 172.31.255.3: ICMP echo request, id 30548, seq 71, length 8008
8008 bytes from 172.31.255.3: icmp_seq=71 ttl=128 time=0.550 ms
12:00:20.402010 IP 172.31.255.3 > 172.31.255.2: ICMP echo reply, id 30548, seq 71, length 8008
12:00:21.408758 IP 172.31.255.2 > 172.31.255.3: ICMP echo request, id 30548, seq 72, length 8008
8008 bytes from 172.31.255.3: icmp_seq=72 ttl=128 time=2.303 ms
12:00:21.410995 IP 172.31.255.3 > 172.31.255.2: ICMP echo reply, id 30548, seq 72, length 8008
12:00:24.527165 IP 172.31.255.2 > 172.31.255.3: ICMP echo request, id 30548, seq 73, length 8008
8008 bytes from 172.31.255.3: icmp_seq=73 ttl=128 time=133.291 ms
12:00:24.592341 IP 172.31.255.3 > 172.31.255.2: ICMP echo reply, id 30548, seq 73, length 8008
12:00:25.569300 IP 172.31.255.2 > 172.31.255.3: ICMP echo request, id 30548, seq 74, length 8008
12:00:25.662953 IP 172.31.255.3 > 172.31.255.2: ICMP echo reply, id 30548, seq 74, length 8008
--- after seq 73, replies are received by the kernel and seen with tcpdump, but not delivered to the ping process ---
--- skipped some lines ---
12:01:27.114142 IP 172.31.255.2 > 172.31.255.3: ICMP echo request, id 30548, seq 134, length 8008
12:01:27.160943 IP 172.31.255.3 > 172.31.255.2: ICMP echo reply, id 30548, seq 134, length 8008
12:01:28.125972 IP 172.31.255.2 > 172.31.255.3: ICMP echo request, id 30548, seq 135, length 8008
12:01:28.126346 IP truncated-ip - 7982 bytes missing! 172.31.255.3 > 172.31.255.2: ICMP echo reply, id 30548, seq 135, length 8008
--- malformed L2 frames received from seq 135 onward ---
12:01:29.198552 IP 172.31.255.2 > 172.31.255.3: ICMP echo request, id 30548, seq 136, length 8008
12:01:29.223302 IP truncated-ip - 7810 bytes missing! 172.31.255.3 > 172.31.255.2: ICMP echo reply, id 30548, seq 136, length 8008
12:01:30.214849 IP 172.31.255.2 > 172.31.255.3: ICMP echo request, id 30548, seq 137, length 8008
12:01:30.221687 IP truncated-ip - 7822 bytes missing! 172.31.255.3 > 172.31.255.2: ICMP echo reply, id 30548, seq 137, length 8008
12:01:31.246460 IP 172.31.255.2 > 172.31.255.3: ICMP echo request, id 30548, seq 138, length 8008
--- skipped some lines ---
12:01:37.514942 IP 172.31.255.2 > 172.31.255.3: ICMP echo request, id 30548, seq 144, length 8008
12:01:37.517865 IP truncated-ip - 7808 bytes missing! 172.31.255.3 > 172.31.255.2: ICMP echo reply, id 30548, seq 144, length 8008
12:01:38.579626 IP 172.31.255.2 > 172.31.255.3: ICMP echo request, id 30548, seq 145, length 8008
12:01:38.615120 IP truncated-ip - 7928 bytes missing! 172.31.255.3 > 172.31.255.2: ICMP echo reply, id 30548, seq 145, length 8008
12:01:39.603253 IP 172.31.255.2 > 172.31.255.3: ICMP echo request, id 30548, seq 146, length 8008
12:01:40.614996 IP 172.31.255.2 > 172.31.255.3: ICMP echo request, id 30548, seq 147, length 8008
12:01:40.615183 IP 172.31.255.3 > 172.31.255.2: ICMP echo reply, id 30548, seq 146, length 8008
--- about 1 second between sending request 146 and receiving its reply ---
12:01:40.615201 IP truncated-ip - 7928 bytes missing! 172.31.255.3 > 172.31.255.2: ICMP echo reply, id 30548, seq 147, length 8008
8008 bytes from 172.31.255.3: icmp_seq=146 ttl=128 time=1011.985 ms
12:01:41.657600 IP 172.31.255.2 > 172.31.255.3: ICMP echo request, id 30548, seq 148, length 8008
12:01:42.701072 IP 172.31.255.2 > 172.31.255.3: ICMP echo request, id 30548, seq 149, length 8008
12:01:42.701321 IP 172.31.255.3 > 172.31.255.2: ICMP echo reply, id 30548, seq 148, length 8008
8008 bytes from 172.31.255.3: icmp_seq=148 ttl=128 time=1043.763 ms
12:01:43.615120 IP truncated-ip - 7928 bytes missing! 172.31.255.3 > 172.31.255.2: ICMP echo reply, id 30548, seq 149, length 8008
12:01:43.714982 IP 172.31.255.2 > 172.31.255.3: ICMP echo request, id 30548, seq 150, length 8008
12:01:43.988367 IP truncated-ip - 7808 bytes missing! 172.31.255.3 > 172.31.255.2: ICMP echo reply, id 30548, seq 150, length 8008
12:01:44.787457 IP 172.31.255.2 > 172.31.255.3: ICMP echo request, id 30548, seq 151, length 8008
12:01:44.788966 IP truncated-ip - 7782 bytes missing! 172.31.255.3 > 172.31.255.2: ICMP echo reply, id 30548, seq 151, length 8008
12:01:45.815011 IP 172.31.255.2 > 172.31.255.3: ICMP echo request, id 30548, seq 152, length 8008
12:01:45.970727 IP truncated-ip - 7976 bytes missing! 172.31.255.3 > 172.31.255.2: ICMP echo reply, id 30548, seq 152, length 8008
12:01:46.834089 IP 172.31.255.2 > 172.31.255.3: ICMP echo request, id 30548, seq 153, length 8008
12:01:47.615212 IP truncated-ip - 7928 bytes missing! 172.31.255.3 > 172.31.255.2: ICMP echo reply, id 30548, seq 153, length 8008
12:01:47.897600 IP 172.31.255.2 > 172.31.255.3: ICMP echo request, id 30548, seq 154, length 8008
12:01:48.914981 IP 172.31.255.2 > 172.31.255.3: ICMP echo request, id 30548, seq 155, length 8008
12:01:48.915192 IP 172.31.255.3 > 172.31.255.2: ICMP echo reply, id 30548, seq 154, length 8008
8008 bytes from 172.31.255.3: icmp_seq=154 ttl=128 time=1017.638 ms
--- some packets received undamaged, but delayed by about 1 second ---

When I tried using iperf as the network load source, in most cases the result was a kernel panic:

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 02
fault virtual address   = 0x0
fault code              = supervisor write data, page not present
instruction pointer     = 0x20:0xffffffff80cef252
stack pointer           = 0x28:0xfffffe00753547c0
frame pointer           = 0x28:0xfffffe00753548a0
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (if_io_tqg_1)
trap number             = 12
panic: page fault
cpuid = 1
time = 1579255990
KDB: stack backtrace:
#0 0xffffffff80c1d297 at kdb_backtrace+0x67
#1 0xffffffff80bd05cd at vpanic+0x19d
#2 0xffffffff80bd0423 at panic+0x43
#3 0xffffffff810a7d2c at trap_fatal+0x39c
#4 0xffffffff810a7d79 at trap_pfault+0x49
#5 0xffffffff810a736f at trap+0x29f
#6 0xffffffff81081a0c at calltrap+0x8
#7 0xffffffff80ce9be5 at _task_fn_rx+0x75
#8 0xffffffff80c1bb54 at gtaskqueue_run_locked+0x144
#9 0xffffffff80c1b7b8 at gtaskqueue_thread_loop+0x98
#10 0xffffffff80b90c23 at fork_exit+0x83
#11 0xffffffff81082a4e at fork_trampoline+0xe
Uptime: 16m42s

Then I reverted to the default kernel, disabled TSO, and rebooted:

# uname -a
FreeBSD ******************* 12.1-RELEASE-p1 FreeBSD 12.1-RELEASE-p1 GENERIC amd64
# ifconfig vmx1
vmx1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
        options=e400bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
        ether 00:50:56:be:f0:13
        inet 172.31.255.2 netmask 0xffffff00 broadcast 172.31.255.255
        media: Ethernet autoselect
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
# iperf3 -c 172.31.255.2 -p 1234
Connecting to host 172.31.255.2, port 1234
[  5] local 172.31.255.5 port 32466 connected to 172.31.255.2 port 1234
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.03   sec   497 MBytes  4.05 Gbits/sec   11   8.74 KBytes
[  5]   1.03-2.07   sec  0.00 Bytes   0.00 bits/sec     3   8.74 KBytes
[  5]   2.07-3.06   sec  0.00 Bytes   0.00 bits/sec     1   8.74 KBytes
[  5]   3.06-4.02   sec  0.00 Bytes   0.00 bits/sec     1   8.74 KBytes
[  5]   4.02-5.01   sec  0.00 Bytes   0.00 bits/sec     0   8.74 KBytes
[  5]   5.01-6.03   sec  0.00 Bytes   0.00 bits/sec     1   8.74 KBytes
[  5]   6.03-7.06   sec  0.00 Bytes   0.00 bits/sec     0   8.74 KBytes
[  5]   7.06-8.04   sec  0.00 Bytes   0.00 bits/sec     0   8.74 KBytes
[  5]   8.04-9.07   sec  0.00 Bytes   0.00 bits/sec     0   8.74 KBytes
[  5]   9.07-10.01  sec  0.00 Bytes   0.00 bits/sec     1   8.74 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.01  sec   497 MBytes   416 Mbits/sec   18          sender
[  5]   0.00-10.60  sec   496 MBytes   393 Mbits/sec              receiver
# ping -s 8000 172.31.255.5
PING 172.31.255.5 (172.31.255.5): 8000 data bytes
8008 bytes from 172.31.255.5: icmp_seq=0 ttl=64 time=0.322 ms
12:22:09.903151 IP 172.31.255.2 > 172.31.255.5: ICMP echo request, id 10122, seq 0, length 8008
12:22:09.903253 IP 172.31.255.5 > 172.31.255.2: ICMP echo reply, id 10122, seq 0, length 8008
12:22:10.922205 IP 172.31.255.2 > 172.31.255.5: ICMP echo request, id 10122, seq 1, length 8008
12:22:10.922300 IP 172.31.255.5 > 172.31.255.2: ICMP echo reply, id 10122, seq 1, length 8008
8008 bytes from 172.31.255.5: icmp_seq=1 ttl=64 time=0.147 ms
12:22:11.969930 IP 172.31.255.2 > 172.31.255.5: ICMP echo request, id 10122, seq 2, length 8008
12:22:11.970035 IP 172.31.255.5 > 172.31.255.2: ICMP echo reply, id 10122, seq 2, length 8008
8008 bytes from 172.31.255.5: icmp_seq=2 ttl=64 time=0.159 ms
12:22:12.997254 IP 172.31.255.2 > 172.31.255.5: ICMP echo request, id 10122, seq 3, length 8008
12:22:12.997386 IP 172.31.255.5 > 172.31.255.2: ICMP echo reply, id 10122, seq 3, length 8008
8008 bytes from 172.31.255.5: icmp_seq=3 ttl=64 time=0.175 ms
12:22:14.029823 IP 172.31.255.2 > 172.31.255.5: ICMP echo request, id 10122, seq 4, length 8008
12:22:14.030017 IP 172.31.255.5 > 172.31.255.2: ICMP echo reply, id 10122, seq 4, length 8008
8008 bytes from 172.31.255.5: icmp_seq=4 ttl=64 time=0.237 ms
12:22:15.058570 IP 172.31.255.2 > 172.31.255.5: ICMP echo request, id 10122, seq 5, length 8008
12:22:15.058769 IP 172.31.255.5 > 172.31.255.2: ICMP echo reply, id 10122, seq 5, length 8008
8008 bytes from 172.31.255.5: icmp_seq=5 ttl=64 time=0.241 ms
12:22:16.096803 IP 172.31.255.2 > 172.31.255.5: ICMP echo request, id 10122, seq 6, length 8008
12:22:16.096896 IP 172.31.255.5 > 172.31.255.2: ICMP echo reply, id 10122, seq 6, length 8008
8008 bytes from 172.31.255.5: icmp_seq=6 ttl=64 time=0.139 ms
12:22:17.136966 IP 172.31.255.2 > 172.31.255.5: ICMP echo request, id 10122, seq 7, length 8008
12:22:17.137224 IP 172.31.255.5 > 172.31.255.2: ICMP echo reply, id 10122, seq 7, length 8008
12:22:18.164014 IP 172.31.255.2 > 172.31.255.5: ICMP echo request, id 10122, seq 8, length 8008
12:22:18.164194 IP 172.31.255.5 > 172.31.255.2: ICMP echo reply, id 10122, seq 8, length 8008
--- replies stop being delivered to the ping process ---
--- skipped some lines ---
--- but after some time replies are delivered to the ping process again ---
12:26:15.636917 IP 172.31.255.2 > 172.31.255.5: ICMP echo request, id 10122, seq 238, length 8008
12:26:15.637147 IP 172.31.255.5 > 172.31.255.2: ICMP echo reply, id 10122, seq 238, length 8008
12:26:16.696907 IP 172.31.255.2 > 172.31.255.5: ICMP echo request, id 10122, seq 239, length 8008
12:26:16.697100 IP 172.31.255.5 > 172.31.255.2: ICMP echo reply, id 10122, seq 239, length 8008
8008 bytes from 172.31.255.5: icmp_seq=239 ttl=64 time=0.256 ms
12:26:17.756044 IP 172.31.255.2 > 172.31.255.5: ICMP echo request, id 10122, seq 240, length 8008
12:26:17.756178 IP 172.31.255.5 > 172.31.255.2: ICMP echo reply, id 10122, seq 240, length 8008
8008 bytes from 172.31.255.5: icmp_seq=240 ttl=64 time=0.190 ms
12:26:18.796861 IP 172.31.255.2 > 172.31.255.5: ICMP echo request, id 10122, seq 241, length 8008
12:26:18.796982 IP 172.31.255.5 > 172.31.255.2: ICMP echo reply, id 10122, seq 241, length 8008
8008 bytes from 172.31.255.5: icmp_seq=241 ttl=64 time=0.176 ms
12:26:19.836847 IP 172.31.255.2 > 172.31.255.5: ICMP echo request, id 10122, seq 242, length 8008
12:26:19.836981 IP 172.31.255.5 > 172.31.255.2: ICMP echo reply, id 10122, seq 242, length 8008
8008 bytes from 172.31.255.5: icmp_seq=242 ttl=64 time=0.192 ms
^Triage: Correct driver (component) name
I probably should open a new bug for my side of things, as my problem may be different from the original problem reported here. But for the moment, here is a new bit of information.

I tried to gather a bit more information with tcpdump and got a very surprising result. If I run tcpdump on the vmx side, the copy speed never degrades. If I start copying before starting tcpdump, the speed can degrade, but as soon as I start tcpdump (on the vmx side) the speed immediately recovers. I have tested this many times with the same result. It seems that enabling bpf changes the packet processing flow and works around whatever the underlying problem is. Does anyone have an idea what that change could be and where to dig further? Thanks!
When I run tcpdump on the other end, there is no effect on the speed. There are some potentially interesting snippets in the capture. Here is one, where 10.25.200.199 is the vmx side and 10.180.0.51 is the other host that sends data to the vmx host:

115 02:16:28.787281 IP 10.180.0.51.17739 > 10.25.200.199.22: Flags [.], seq 81774:83160, ack 1, win 1026, options [nop,nop,TS val 3595053000 ecr 1184721692], length 1386
116 02:16:28.787284 IP 10.180.0.51.17739 > 10.25.200.199.22: Flags [.], seq 83160:84546, ack 1, win 1026, options [nop,nop,TS val 3595053000 ecr 1184721692], length 1386
117 02:16:28.787304 IP 10.180.0.51.17739 > 10.25.200.199.22: Flags [.], seq 84546:85932, ack 1, win 1026, options [nop,nop,TS val 3595053000 ecr 1184721692], length 1386
118 02:16:28.787307 IP 10.180.0.51.17739 > 10.25.200.199.22: Flags [.], seq 85932:87318, ack 1, win 1026, options [nop,nop,TS val 3595053000 ecr 1184721692], length 1386
119 02:16:28.883545 IP 10.25.200.199.22 > 10.180.0.51.17739: Flags [.], ack 83160, win 2192, options [nop,nop,TS val 1184721792 ecr 3595053000], length 0
120 02:16:28.883576 IP 10.180.0.51.17739 > 10.25.200.199.22: Flags [.], seq 87318:88704, ack 1, win 1026, options [nop,nop,TS val 3595053097 ecr 1184721792], length 1386
121 02:16:28.887035 IP 10.25.200.199.22 > 10.180.0.51.17739: Flags [.], ack 83160, win 2192, options [nop,nop,TS val 1184721792 ecr 3595053000,nop,nop,sack 1 {87318:88704}], length 0
122 02:16:29.336381 IP 10.180.0.51.17739 > 10.25.200.199.22: Flags [.], seq 83160:84546, ack 1, win 1026, options [nop,nop,TS val 3595053549 ecr 1184721792], length 1386

My reading of the above:
Packets 115-118: data packets.
Packet 119: ack for packet 115.
Packet 120: another data packet.
Packet 121: appears to be an exact duplicate of packet 119 (the ack).
Packet 122: looks like a retransmission of packet 116 after a noticeable delay (0.44s).

Well, on another look, packets 119 and 121 are not exactly the same: 121 has "sack 1 {87318:88704}", which 119 does not have. And it seems like packet 121 is what "confuses" the sending side, given the pause and retransmission that follow. I am not a TCP expert, so I don't know whether anything is wrong here.
(In reply to Andriy Gapon from comment #7)
It seems like packets 116 (seq 84546) and 117 (seq 85932) were never ack-ed and were thus lost (presumably on the vmx side)? Packet 115 was ack-ed; packets 118 and 120 were sack-ed.
A few additional data points. Running tcpdump with -p (no promiscuous mode) does not prevent the speed degradation. tcpdump shows that many incoming TCP packets are received out of order. Disabling LRO works around the problem.
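For reference, the LRO workaround can be applied at runtime and persisted across reboots roughly as follows. This is a sketch: the interface name vmx0 is an assumption (the comments above only say "the vmx side"), and the address in the rc.conf line is a placeholder.

```shell
# Disable large receive offload on the running interface (assumed vmx0).
ifconfig vmx0 -lro

# To persist the workaround, add -lro to the interface line in /etc/rc.conf,
# e.g. (address is a placeholder for your own configuration):
#   ifconfig_vmx0="inet 10.25.200.199 netmask 255.255.255.0 -lro"
```

Note that this only works around the receive-side symptom described here; it does not address the underlying driver bug.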
Disabling all possible options except jumbo frames does not work around the problem. I edited rc.conf and rebooted. After receiving approximately 300-400 MBytes, the network becomes unavailable.

# ifconfig vmx1
vmx1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
        options=800020<JUMBO_MTU>
        ether 00:50:56:be:f0:13
        inet 172.31.255.2 netmask 0xffffff00 broadcast 172.31.255.255
        media: Ethernet autoselect
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
# iperf3 -s -p 1234
-----------------------------------------------------------
Server listening on 1234
-----------------------------------------------------------
Accepted connection from 172.31.255.5, port 52124
[  5] local 172.31.255.2 port 1234 connected to 172.31.255.5 port 51022
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.02   sec   320 MBytes  2.64 Gbits/sec
[  5]   1.02-2.02   sec  0.00 Bytes   0.00 bits/sec
[  5]   2.02-3.02   sec  0.00 Bytes   0.00 bits/sec
[  5]   3.02-4.01   sec  0.00 Bytes   0.00 bits/sec
[  5]   4.01-5.02   sec  0.00 Bytes   0.00 bits/sec
[  5]   5.02-6.02   sec  0.00 Bytes   0.00 bits/sec
[  5]   6.02-7.01   sec  0.00 Bytes   0.00 bits/sec
[  5]   7.01-8.02   sec  0.00 Bytes   0.00 bits/sec
[  5]   8.02-9.01   sec  0.00 Bytes   0.00 bits/sec
[  5]   9.01-10.01  sec  0.00 Bytes   0.00 bits/sec
[  5]  10.01-10.19  sec  0.00 Bytes   0.00 bits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.19  sec   320 MBytes   263 Mbits/sec          receiver
-----------------------------------------------------------
Server listening on 1234
-----------------------------------------------------------
Additionally: ifconfig down & up temporarily fixes the problem, but after again receiving approximately 300-500 MBytes in a short period, the network becomes unavailable again, or the kernel panics (about 50/50).

# ifconfig vmx1 down && ifconfig vmx1 up && iperf3 -s -p 1234
-----------------------------------------------------------
Server listening on 1234
-----------------------------------------------------------
Accepted connection from 172.31.255.5, port 39861
[  5] local 172.31.255.2 port 1234 connected to 172.31.255.5 port 39030

--- kernel panic ---
Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 02
fault virtual address   = 0x0
fault code              = supervisor write data, page not present
instruction pointer     = 0x20:0xffffffff80cef252
stack pointer           = 0x28:0xfffffe00753547c0
frame pointer           = 0x28:0xfffffe00753548a0
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (if_io_tqg_1)
trap number             = 12
panic: page fault
cpuid = 1
time = 1580287973
KDB: stack backtrace:
#0 0xffffffff80c1d297 at kdb_backtrace+0x67
#1 0xffffffff80bd05cd at vpanic+0x19d
#2 0xffffffff80bd0423 at panic+0x43
#3 0xffffffff810a7dcc at trap_fatal+0x39c
#4 0xffffffff810a7e19 at trap_pfault+0x49
#5 0xffffffff810a740f at trap+0x29f
#6 0xffffffff81081a0c at calltrap+0x8
#7 0xffffffff80ce9be5 at _task_fn_rx+0x75
#8 0xffffffff80c1bb54 at gtaskqueue_run_locked+0x144
#9 0xffffffff80c1b7b8 at gtaskqueue_thread_loop+0x98
#10 0xffffffff80b90c23 at fork_exit+0x83
#11 0xffffffff81082a4e at fork_trampoline+0xe
Uptime: 11m47s

And from the sending side:

root@bsd_template:/usr/home/ao # iperf3 -c 172.31.255.2 -p 1234
Connecting to host 172.31.255.2, port 1234
[  5] local 172.31.255.5 port 39030 connected to 172.31.255.2 port 1234
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.01   sec   614 MBytes  5.07 Gbits/sec    1   8.74 KBytes
[  5]   1.01-2.03   sec  0.00 Bytes   0.00 bits/sec     3   8.74 KBytes
[  5]   2.03-3.02   sec  0.00 Bytes   0.00 bits/sec     1   8.74 KBytes
[  5]   3.02-4.03   sec  0.00 Bytes   0.00 bits/sec     1   8.74 KBytes
[  5]   4.03-5.03   sec  0.00 Bytes   0.00 bits/sec     0   8.74 KBytes
[  5]   5.03-6.02   sec  0.00 Bytes   0.00 bits/sec     1   8.74 KBytes
[  5]   6.02-7.03   sec  0.00 Bytes   0.00 bits/sec     0   8.74 KBytes
[  5]   7.03-8.03   sec  0.00 Bytes   0.00 bits/sec     0   8.74 KBytes
[  5]   8.03-9.03   sec  0.00 Bytes   0.00 bits/sec     0   8.74 KBytes
A commit references this bug:

Author: pkelsey
Date: Sat Mar 14 19:43:44 UTC 2020
New revision: 358995
URL: https://svnweb.freebsd.org/changeset/base/358995

Log:
  Fix iflib freelist state corruption

  This fixes a bug in iflib freelist management that breaks the required
  correspondence between freelist indexes and driver ring slots.

  PR:           243126, 243392, 240628
  Reported by:  avg, alexandr.oleynikov@gmail.com, Harald Schmalzbauer
  Reviewed by:  avg, gallatin
  MFC after:    1 week
  Differential Revision:  https://reviews.freebsd.org/D23943

Changes:
  head/sys/net/iflib.c
A commit references this bug:

Author: pkelsey
Date: Sat Mar 14 19:55:06 UTC 2020
New revision: 358997
URL: https://svnweb.freebsd.org/changeset/base/358997

Log:
  Remove freelist contiguous-indexes assertion from rxd_frag_to_sd()

  The vmx driver is an example of an iflib driver that might report
  packets using non-contiguous descriptors (with unused descriptors
  either between received packets or between the fragments of a received
  packet), so this assertion needs to be removed.

  For such drivers, the freelist producer and consumer indexes don't
  relate directly to driver ring slots (the driver deals directly with
  freelist buffer indexes supplied by iflib during refill, and reports
  them with each fragment during packet reception), but do continue to
  be used by iflib for accounting, such as determining the number of
  ring slots that are refillable.

  PR:           243126, 243392, 240628
  Reported by:  avg, alexandr.oleynikov@gmail.com, Harald Schmalzbauer
  Reviewed by:  gallatin
  MFC after:    1 week
  Differential Revision:  https://reviews.freebsd.org/D23946

Changes:
  head/sys/net/iflib.c
A commit references this bug:

Author: pkelsey
Date: Sat Mar 14 20:08:04 UTC 2020
New revision: 359000
URL: https://svnweb.freebsd.org/changeset/base/359000

Log:
  Fix if_vmx receive checksum offload bug and harden against the device
  skipping receive descriptors

  This fixes a bug where the checksum offload status of received packets
  was being taken from the first descriptor instead of the last, which
  affected LRO packets.

  The driver has been hardened against the device skipping receive
  descriptors, although it is not believed that this can occur given the
  way this implementation configures the receive rings. Additionally,
  for packets received with the error indicator set, the driver now
  forces the length of all fragments in that packet to zero prior to
  passing it to iflib. Such packets should wind up being discarded at
  some point in the stack anyway, but this removes any questions by
  killing them in the driver.

  Counters have been added (and exposed via sysctls) for skipped receive
  descriptors, zero-length packets received, and packets received with
  the error indicator set so that these conditions can be easily
  observed in the field.

  PR:           243126, 243392, 240628
  Reported by:  avg, alexandr.oleynikov@gmail.com, Harald Schmalzbauer
  Reviewed by:  gallatin
  MFC after:    1 week
  Differential Revision:  https://reviews.freebsd.org/D23949

Changes:
  head/sys/dev/vmware/vmxnet3/if_vmx.c
  head/sys/dev/vmware/vmxnet3/if_vmxvar.h
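Once a kernel with r359000 is running, the new counters should appear in the device's sysctl tree. The commit message does not give the exact node names, so a generic listing can be used to locate them; the unit number (vmx.0) and the grep pattern below are assumptions, not the driver's documented names.

```shell
# List all sysctls for the first vmx device and filter for the counters
# added in r359000 (skipped descriptors, zero-length packets, rx errors).
# Exact node names may differ; inspect the unfiltered output if unsure.
sysctl dev.vmx.0 | grep -Ei 'skip|zero|error'
```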
^Triage: Assign to committer resolving, pending MFC. If this doesn't need to go to stable/11, set mfc-stable11 to - Thanks Patrick!
It has been more than a month. Is there any way to speed up the MFC?
Thanks for the fix, but when is the MFC to stable/12 planned? According to the comments, the stated MFC period is already over.
A commit references this bug:

Author: mav
Date: Mon Jul 13 17:20:21 UTC 2020
New revision: 363163
URL: https://svnweb.freebsd.org/changeset/base/363163

Log:
  MFC r358995 (by pkelsey):
  Fix iflib freelist state corruption

  This fixes a bug in iflib freelist management that breaks the required
  correspondence between freelist indexes and driver ring slots.

  PR:  243126, 243392, 240628

Changes:
  _U  stable/12/
  stable/12/sys/net/iflib.c