Bug 165059 - vtnet(4): Networking breaks with a router using virtio net driver on KVM host
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 9.0-RELEASE
Hardware: Any Any
Importance: --- Affects Some People
Assignee: freebsd-bugs (Nobody)
URL:
Keywords: needs-qa
Depends on:
Blocks:
 
Reported: 2012-02-12 21:20 UTC by David Talkington
Modified: 2024-10-11 21:25 UTC
CC List: 37 users

See Also:
koobs: maintainer-feedback? (bryanv)


Description David Talkington 2012-02-12 21:20:11 UTC
When the router for a FreeBSD guest on KVM is itself a FreeBSD guest on
the same KVM host and uses the virtio network driver from virtio_kmod,
ping works between guests on different subnets, but no userland network
daemon responds. If I switch the router to the e1000 driver and change
nothing else, everything works correctly.

Fix: Unknown.

How-To-Repeat:
I created three FreeBSD guests on one Linux KVM host. I am using bridged
networking on the KVM host, as br0 and br1. One of the guests has two
network interfaces and acts as a router between two subnets, as follows:

router1: br0, 192.168.1.1; br1, 192.168.2.1
client1: br0, 192.168.1.100; default route 192.168.1.1
client2: br1, 192.168.2.100; default route 192.168.2.1

I configured virtio network interfaces on all three hosts. I enabled
forwarding on router1, but no packet filtering. No NAT is in use.

Result:

    * client1 can ping client2, and vice versa.
    * ssh works from router1 to client1 and vice versa, and from router1
      to client2 and vice versa.
    * ssh from client1 to client2 will fail (and vice versa); the client
      simply hangs indefinitely while trying to connect. 
    * tcpdump on client2 will show that the SYN is arriving at client2
      port 22, but client2 never replies, nor generates any debug or log
      output that suggests it ever saw the connection attempt.
    * any other userland network service I try (both tcp and udp) will
      show the same thing -- packets arrive at client2 from client1, but
      the daemon seems to never see them. Since ping works, I know the
      kernel is getting them.
    * If I switch back to the e1000 driver on router1, but make no other
      changes, and make no changes at all to client1 and client2, then
      ssh will work properly from client1 to client2 and the problem is resolved.
    * If I let router1 continue to use virtio interfaces, but move router1
      onto a different KVM host -- so that the traffic from client1 to client2
      must leave the KVM host via the bridged interface and then return on a
      different interface -- then ssh will work properly from client1 to
      client2 and the problem is resolved.

KVM guests: FreeBSD 9
virtio-kmod: 0.228301
KVM host: Ubuntu 11.10
qemu-kvm: 0.14.1
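
For anyone trying to reproduce this, the router1 side of the topology above might look roughly like the following /etc/rc.conf fragment. This is only a sketch: the addresses come from the report, but the interface names (vtnet0, vtnet1) and the /24 netmasks are assumptions, since the report does not state them.

```shell
# /etc/rc.conf fragment for router1 (illustrative only).
# Assumptions: interface names vtnet0/vtnet1 and /24 netmasks;
# only the IP addresses are taken from the report.
gateway_enable="YES"    # enable IPv4 forwarding at boot, no packet filter
ifconfig_vtnet0="inet 192.168.1.1 netmask 255.255.255.0"   # br0 side
ifconfig_vtnet1="inet 192.168.2.1 netmask 255.255.255.0"   # br1 side
```

client1 and client2 would then each need only their own address and a defaultrouter pointing at the matching router1 address.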
Comment 1 Mark Linimon freebsd_committer freebsd_triage 2012-04-05 02:43:24 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-ports-bugs

reclassify.
Comment 2 Edwin Groothuis freebsd_committer freebsd_triage 2012-05-28 03:17:30 UTC
Responsible Changed
From-To: freebsd-ports-bugs->kuriyama

Over to maintainer (via the GNATS Auto Assign Tool)
Comment 3 Chris Rees freebsd_committer freebsd_triage 2013-08-07 20:33:18 UTC
Responsible Changed
From-To: kuriyama->bryanv

Hi Bryan, 

Is this something you may be acquainted with, as the virtio maintainer? 
Do you have any recommendations? 

Please accept my apologies if this isn't something for you... 

Chris
Comment 4 jmealo 2013-08-07 20:46:18 UTC
I wanted to confirm that this bug is present in FreeBSD 8.3-RELEASE-p9
running on SmartOS KVM (a different implementation from the Linux KVM).
Comment 5 Chris Rees freebsd_committer freebsd_triage 2013-08-07 21:22:58 UTC
State Changed
From-To: open->feedback

Fantastic, thanks for your quick response. 

Jeffrey, does disabling checksum offloading work for you?
Comment 6 jmealo 2013-08-13 01:30:53 UTC
This did not resolve my issue.

Thanks,
Jeff
Comment 7 Phil Regnauld 2013-10-22 00:22:30 UTC
Env:
	Host OS: Debian 7.1 3.2.0-4-amd64 #1 SMP Debian 3.2.46-1+deb7u1 x86_64 GNU/Linux
	KVMM: QEMU emulator version 1.1.2 (qemu-kvm-1.1.2+dfsg-6, Debian)
	Guest: FreeBSD 9.2-R amd64

Disabling checksum offload with ifconfig vtnetX -rxcsum -txcsum on both
interfaces (this is a router) solves the issue, but performance becomes
terrible (150 KB/sec uses 100% CPU on host).

vtnet interfaces are, Host side, bridged to VLANs.

Problem does not appear if the traffic is to/from the router itself. Only
forwarded traffic is a problem.

Can provide more info/feedback if needed.
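
A minimal sketch of applying the workaround above to several interfaces at once. The helper name csum_off_cmds is hypothetical, and it only prints the ifconfig commands (with the same -rxcsum -txcsum flags quoted above) rather than running them, so the output can be reviewed first.

```shell
#!/bin/sh
# csum_off_cmds (hypothetical helper): print, but do not run, the
# ifconfig command that disables RX/TX checksum offload on each
# interface named on the command line.
csum_off_cmds() {
    for ifc in "$@"; do
        printf 'ifconfig %s -rxcsum -txcsum\n' "$ifc"
    done
}

# Both router interfaces, as in the report above.
csum_off_cmds vtnet0 vtnet1
```

Piping the output to sh as root would apply the change to the running system; it does not persist across reboots unless the same flags are also added to the ifconfig_vtnetX lines in /etc/rc.conf.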
Comment 8 Phil Regnauld 2014-01-07 14:54:53 UTC
Phil Regnauld (regnauld) writes:
> Env:
> 	Host OS: Debian 7.1 3.2.0-4-amd64 #1 SMP Debian 3.2.46-1+deb7u1 x86_64 GNU/Linux
> 	KVMM: QEMU emulator version 1.1.2 (qemu-kvm-1.1.2+dfsg-6, Debian)
> 	Guest: FreeBSD 9.2-R amd64
> 
> Disabling checksum offload with ifconfig vtnetX -rxcsum -txcsum on both
> interfaces (this is a router) solves the issue, but performance becomes
> terrible (150 KB/sec uses 100% CPU on host).
> 
> vtnet interfaces are, Host side, bridged to VLANs.
> 
> Problem does not appear if the traffic is to/from the router itself. Only
> forwarded traffic is a problem.
> 
> Can provide more info/feedback if needed.

	Same problem has been observed with 10.0-RC4.

	kern/166645 may be related.

	This is causing FreeBSD (and pfSense) to be unusable as a network
	appliance / router on KVM platforms.

	Phil
Comment 9 elico 2015-08-27 21:06:53 UTC
Still present on 10.1.
Environment:
Ubuntu 14.04 KVM hypervisor
A FreeBSD 10.1 gateway between the outside world and three networks.
The FreeBSD gateway's default gateway is a VyOS router, which NATs its traffic.
When I disable txcsum and rxcsum on the interface, the packets no longer get corrupted.

The VyOS router blocks INVALID packets. ICMP packets are not malformed, while TCP packets are.

I had the same issue with OpenBSD 5.7, and it was fixed in -current (5.8).

Examples of the setup rc.conf and pf rules at:
http://wiki.squid-cache.org/ConfigExamples/Intercept/PfPolicyRoute#rc.conf_example_for_a_router
Comment 10 elico 2015-08-28 15:13:29 UTC
(In reply to elico from comment #9)
My testing topology:
http://ngtech.co.il/squidblocker/topology1.png
Comment 11 amvandemore 2015-09-13 14:25:29 UTC
I've seen the same issue on a Linux KVM guest (with a FreeBSD virtio router guest with TSO), worked around by:

pre-up /sbin/ethtool --offload eth0 tx off

so I am curious as to how this is identified as a FreeBSD bug.  It seems more like something within the KVM stack.

Ubuntu 14.04.3 LTS
qemu-kvm                            2.0.0+dfsg-2ubuntu1.16

Seems to also be the same issue here:

https://forum.pfsense.org/index.php?topic=88467.0

However it's not a PF issue as ipfw kernel nat also did the same.
Comment 12 elico 2015-09-13 14:31:02 UTC
(In reply to amvandemore from comment #11)
My assumption is that if it works with Linux and with many versions of OpenBSD, then it's not a hypervisor issue.
I do not know exactly what changed or how, but the OpenBSD virtio changes may help to understand it.

Notice that it affects only routing/gateway mode and not regular traffic, so it's something specific: it's not related to the firewall at all, but to the gateway/routing code.
Comment 13 elico 2015-09-21 01:13:30 UTC
I do not "need" this since my systems work fine with e1000 and with Linux hosts, but I was wondering whether there has been any progress on it.
Comment 14 Sydney Meyer 2016-01-24 00:59:12 UTC
(In reply to elico from comment #12)

I also doubt that this is a KVM issue, as there seem to be similar problems (forwarding TCP packets) on Xen.

E.g: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=188261
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=202199
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=197344
Comment 15 elico 2017-06-14 23:59:05 UTC
(In reply to Sydney Meyer from comment #14)
I had a chance to verify this with a special setup that includes Debian, and found that the issue occurs only in a specific scenario:
the KVM hypervisor hosts two VMs that share the same interface, such as a bridge.
The bug is that the hypervisor virtio driver doesn't write a checksum for packets which are directed towards an internal interface.
Either the hypervisor should write the checksum or the VM should not check it.
The issue lies partly in the driver and partly in the hypervisor.
I am now running both CentOS and Ubuntu KVM hypervisors, and both show the same issue.
The solution in my case was to use the iptables checksum-fill option on the gateway machine.
The first step is to let DHCP traffic pass between VMs, so that the DHCP client (ISC) does not drop the packets, using:
iptables -A POSTROUTING -t mangle -p udp --dport 68 -j CHECKSUM --checksum-fill

I will try to test with FreeBSD 11, since with OpenBSD 5.X it didn't work but with 5.Y (tip) it was working fine.
Comment 16 Bryan Venteicher freebsd_committer freebsd_triage 2017-06-15 14:51:43 UTC
This has been a long-standing and unfortunate issue. My memory is somewhat fuzzy, but generally speaking the host doesn't need to compute a checksum because it is basically just a memory copy into the guest, but FreeBSD doesn't have a flag (at least it didn't at the time I was originally working on the VirtIO drivers) to denote "recompute this checksum if forwarding the packet".
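
To make concrete why a packet forwarded with an unfinished checksum gets dropped silently, here is a small sketch of the Internet checksum arithmetic (RFC 1071) in plain sh. The function name inet_checksum is hypothetical, and the input is a well-known sample IPv4 header rather than traffic from this report; the offloaded TCP/UDP checksum uses the same one's-complement arithmetic over the pseudo-header and segment. Treating the unfilled field as zero is an assumption made purely for illustration.

```shell
#!/bin/sh
# inet_checksum (hypothetical helper): RFC 1071 Internet checksum over
# 16-bit words given as hex arguments. Prints the complemented sum.
inet_checksum() {
    sum=0
    for word in "$@"; do
        sum=$(( sum + 0x$word ))
    done
    # Fold any carry above bit 15 back into the low 16 bits.
    while [ $(( sum >> 16 )) -ne 0 ]; do
        sum=$(( (sum & 0xFFFF) + (sum >> 16) ))
    done
    printf '%04x\n' $(( ~sum & 0xFFFF ))
}

# Sender side: checksum field (6th word) zeroed before computing.
inet_checksum 4500 0073 0000 4000 4011 0000 c0a8 0001 c0a8 00c7   # prints b861

# Receiver side: with the field filled in, the result is 0000 (valid).
inet_checksum 4500 0073 0000 4000 4011 b861 c0a8 0001 c0a8 00c7   # prints 0000
```

A receiver that runs the second computation on data whose field was never filled in gets a non-zero result and discards the packet, which would be consistent with the SYN being visible in tcpdump but never reaching the daemon.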
Comment 17 elico 2017-06-15 15:40:26 UTC
(In reply to Bryan Venteicher from comment #16)
OK, so the test environment should be as follows:
- 1 KVM hypervisor (CentOS 7)
- 1 VYOS EDGE GW(eth0=192.168.89.200/24, eth1=192.168.7.254/24, GW=192.168.89.1)
- 1 FreeBSD (10.3+11.0) GW for 2 networks(eth0=192.168.7.1/24,eth1=192.168.6.254/24,eth2=192.168.7.254/24, GW=192.168.7.254/23)
- Windows+Linux+FreeBSD clients on networks 192.168.6.0/24+192.168.7.0/24 with GW 192.168.6.254 or 192.168.7.254

The expected result should be a working connection from the end user(Win\Lin\BSD) to the local networks and Internet.
Comment 18 Eitan Adler freebsd_committer freebsd_triage 2018-05-28 19:40:45 UTC
batch change:

For bugs that match the following
-  Status Is In progress 
AND
- Untouched since 2018-01-01.
AND
- Affects Base System OR Documentation

DO:

Reset to open status.


Note:
I did a quick pass but if you are getting this email it might be worthwhile to double check to see if this bug ought to be closed.
Comment 19 elico 2018-05-28 20:04:21 UTC
(In reply to Eitan Adler from comment #18)
With what version of FreeBSD? Latest stable?
Comment 20 andcycle-bugs.freebsd.org 2019-02-28 09:34:18 UTC
I have probably hit a similar issue with a Gentoo Linux KVM host and FreeBSD 9~12 guests with the virtio network adapter.

my environment is 
Linux bacztwo 4.14.101-gentoo
libvirt-4.9.0
qemu-3.1.0

Under the default settings:
with the e1000 series, the guest sometimes raises an interrupt storm;
with virtio, it won't receive any packets and is unable to communicate over the network.


The following qemu option, which simply disables all checksum offload for virtio-net, works on my FreeBSD guest:

-device virtio-net-pci,csum=off,guest_csum=off
Comment 21 Jakub Chromy 2019-06-09 07:02:26 UTC
(In reply to andcycle-bugs.freebsd.org from comment #20)

here are the options for "virsh edit" on the KVM host:

    <interface type='network'>
      <source network='private'/>
      <model type='virtio'/>
      <driver>
        <host csum='off'/>
        <guest csum='off'/>
      </driver>
    </interface>

E.g. run "virsh edit freebsd-host" and put this into the config.

Jakub
Comment 22 zain david 2019-07-23 04:33:30 UTC
MARKED AS SPAM
Comment 23 elico 2020-01-21 11:33:16 UTC
(In reply to zain david from comment #22)
Why Spam?
Comment 24 Greg A. 2020-02-14 21:33:18 UTC
Hello FreeBSD maintainers,

This bug is still present on 11.3-RELEASE and 12.0-RELEASE.
Maybe it would be a good idea to update the version in the ticket description.

Maybe somebody will consider it :)

It's so sad that FreeBSD runs better on VMware than on Linux KVM :/


Thanks in advance for the good job on FreeBSD
BR,
Grégory
Comment 25 Charlie Root 2020-04-17 16:25:26 UTC
(In reply to Greg A. from comment #24)

FreeBSD 12.1 workaround

# Gateway Host

Ifconfig for WAN interface vtnet0 (vtnet uplink 10g)

----
ifconfig_vtnet0="inet WAN_IP netmask WAN_MASK vlanmtu vlanhwtag vlanhwfilter vlanhwcsum vlanhwtso -rxcsum -txcsum -rxcsum6 -txcsum6 tso6 tso4 lro"
----

disable only -rxcsum -txcsum -rxcsum6 -txcsum6


Ifconfig for LAN interface vtnet1 (vtnet uplink 10g)

---
ifconfig_vtnet1="inet 192.168.1.1 netmask 255.255.255.0 vlanmtu vlanhwtag vlanhwfilter vlanhwcsum vlanhwtso rxcsum -txcsum rxcsum6 -txcsum6 tso6 tso4 lro"
---

disable only -txcsum -txcsum6


pf.conf - simple NAT rule for LAN (scrub rules not needed)

---
nat on vtnet0 from 192.168.1.0/24 to any -> WAN_IP
---


######################################################################

# LAN Client Host (vtnet uplink 10g)

Ifconfig for LAN interface vtnet0

---
ifconfig_vtnet0="inet 10.66.1.2 netmask 255.255.255.0 vlanmtu vlanhwtag vlanhwfilter vlanhwcsum vlanhwtso rxcsum -txcsum rxcsum6 -txcsum6 tso6 tso4 lro"
---

disable only -txcsum -txcsum6


######################################################################

All traffic tests passed normally.

1) Gateway LAN -> Client LAN - iperf3 result 14Gbit/s

2) Client LAN -> Gateway LAN - iperf result 14Gbit/s

3) Client LAN download via Gateway LAN (NAT) from an external source - result: max download speed

4) External iperf client -> Gateway WAN (iperf port) -> redirect to LAN client iperf server - result: max external iperf client speed

Please check this workaround.
Comment 26 Kubilay Kocak freebsd_committer freebsd_triage 2022-11-11 20:55:22 UTC
^Triage: 

This issue needs a reproduction on currently supported FreeBSD versions and steps to reproduce (minimum test case).

Ideally reproduction confirmation against CURRENT, 13.1 and 12.4

(re)confirmation that disabling RX/TX checksumming works around the issue, or not, would also be great.
Comment 27 elico 2022-11-13 19:07:42 UTC
(In reply to Kubilay Kocak from comment #26)
I will try to test later on.
Since Ubuntu 14.04 is not supported anymore I will try to reproduce on later versions of Ubuntu 20.04/22.04 and Oracle Linux 8.
I will try to verify with 12.3 and 12.4
Comment 28 elico 2022-11-13 22:13:14 UTC
(In reply to elico from comment #27)
OK, just to mention: the NAT-related documentation is at:
http://draft.scyphus.co.jp/freebsd/nat.html

On top of an Oracle Enterprise Linux 8 KVM host, the issue exists on 12.3.
The setup is very simple:
* Alpine 3.16 with ip 192.168.111.1/24 gw 192.168.111.254 DNS 8.8.8.8
* FreeBSD 12.3 with two interfaces: vtnet0 192.168.110.1/24 GW 192.168.110.254
 vtnet1 192.168.111.254/24
 pf rules to nat on $ext_inf
* VyOS 1.3.2 with two interfaces: eth0 192.168.122.183/24 GW 192.168.122.1
  eth1: 192.168.110.254/24

ping from Alpine to 8.8.8.8 via FreeBSD (NAT) -> VyOS (NAT) = works (ICMP)
wget from Alpine to 8.8.8.8 via FreeBSD (NAT) -> VyOS (NAT) = doesn't work (TCP)


When I run the following on the vtnet0 and vtnet1 interfaces, TCP works:
ifconfig vtnet0 -rxcsum
ifconfig vtnet1 -rxcsum

It was resolved long ago in OpenBSD, so now all that should be needed is a fix and a test.
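
When confirming the workaround, it helps to check that the offload flags are really gone. The sketch below assumes the usual FreeBSD `ifconfig` output, where enabled capabilities are listed as `options=...<RXCSUM,TXCSUM,...>`. The function name has_csum_offload is hypothetical, and the sample lines are made up for illustration, not captured from the systems in this report.

```shell
#!/bin/sh
# has_csum_offload (hypothetical helper): read `ifconfig <if>` output on
# stdin and succeed if RXCSUM or TXCSUM is listed in the options=<...>
# capability list. RXCSUM_IPV6/TXCSUM_IPV6 are deliberately not matched.
has_csum_offload() {
    grep -E 'options=.*<([^>]*,)?(RXCSUM|TXCSUM)(,[^>]*)?>' >/dev/null
}

# Made-up sample lines standing in for real ifconfig output.
sample_on='options=6c07bb<RXCSUM,TXCSUM,VLAN_MTU,LRO>'
sample_off='options=6c04b8<VLAN_MTU,LRO>'

for line in "$sample_on" "$sample_off"; do
    if printf '%s\n' "$line" | has_csum_offload; then
        echo "checksum offload still enabled"
    else
        echo "checksum offload disabled"
    fi
done
```

On a live system the same check would be `ifconfig vtnet0 | has_csum_offload`, run once per interface.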
Comment 29 Tristin Stagg 2023-02-23 20:37:39 UTC
(In reply to elico from comment #28)

Hello,

I created a FreeBSD 13.1 STABLE virtual router in a Proxmox 7.1 testing environment and was encountering this issue with my FreeBSD VM. I can confirm that running the following helped me to resolve the issue:

ifconfig vtnet0 -rxcsum
ifconfig vtnet1 -rxcsum

Thank you for helping with my virtual router's NAT not working. 

Previous to resolution, I was able to ping things, which tells me that ICMP was working fine (I could also see this in my pflogging) - however, I could not say the same for TCP
Comment 30 José Zadir 2023-04-27 01:01:01 UTC
Hello,

I created a FreeBSD 12.3 STABLE virtual router in a Proxmox 7.1 environment and was encountering this issue with my FreeBSD VM. 
I can confirm that running the following helped to work around the issue, but the performance is terrible.

ifconfig vtnet0 -rxcsum
ifconfig vtnet1 -rxcsum
Comment 31 Karel Krýda 2023-12-09 14:49:31 UTC
Hello,
I came across this bug report while troubleshooting an OPNsense throughput issue when using a VirtIO network card on Proxmox 8. Unfortunately the only way to achieve 10Gbps speed is to enable the HW Checksum Offloading option. Unfortunately, after activating it, access to servers on the same Proxmox server stops working and not only to them, but also for example to the TrueNAS Core server (NOT on the same Proxmox server).
I see that this issue has not been resolved since 2012. Is there ever a plan to fix this? This behavior is still present on OPNsense 23.7.9 and therefore FreeBSD 13.2.
Thanks
Comment 32 Igor Raschetov 2024-01-29 11:49:41 UTC
Hello
Adding parameters to /boot/loader.conf

hw.vtnet.X.tso_disable="1"
hw.vtnet.tso_disable="1"
hw.vtnet.lro_disable="1"
hw.vtnet.X.lro_disable="1"
hw.vtnet.csum_disable="1"
hw.vtnet.X.csum_disable="1"

Solved the problem
Comment 33 Mark Linimon freebsd_committer freebsd_triage 2024-10-04 14:36:23 UTC
^Triage: clear unneeded flags.  Nothing has yet been committed to be merged.