Bug 242023 - bhyve pci_vtnet_rx broken after r354552
Summary: bhyve pci_vtnet_rx broken after r354552
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: bhyve (show other bugs)
Version: CURRENT
Hardware: Any Any
: --- Affects Only Me
Assignee: Vincenzo Maffione
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-11-17 08:18 UTC by Ben Woods
Modified: 2020-01-19 07:34 UTC (History)
8 users (show)

See Also:


Attachments
Patch to print iov lengths (589 bytes, patch)
2019-11-17 09:08 UTC, Vincenzo Maffione
no flags Details | Diff
Netgraph image showing bhyve connected via tap0 and link17 to ix0bridge (65.94 KB, image/png)
2019-11-17 22:18 UTC, Ben Woods
no flags Details
Patch to find out about features (787 bytes, patch)
2019-11-17 22:38 UTC, Vincenzo Maffione
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Ben Woods freebsd_committer freebsd_triage 2019-11-17 08:18:03 UTC
I am running FreeBSD 13-CURRENT r354563, and when I boot my Windows 10 VM guest it is crashing after ~5 seconds with the following error printed to the console:

Assertion failed: (iov[0].iov_len >= sizeof(*hdr)), function pci_vtnet_rx, file /usr/src/usr.sbin/bhyve/pci_virtio_net.c, line 272.

This appears that it could be a result of the recent change r354552 "bhyve: add support for virtio-net mergeable rx buffers", as it added the line of code referenced in the above error:
https://svnweb.freebsd.org/base?view=revision&revision=354552

The other thing that could be relevant is that this VM was recently upgraded to Windows 10 version 1909. However, given the code that is crashing was only added to head 8 days ago, that is my first guess at where the issue is.
Comment 1 Ben Woods freebsd_committer freebsd_triage 2019-11-17 08:36:40 UTC
I can confirm that rebooting to an older boot environment running r354376 has worked around the issue. Therefore the problem is not the updated Windows 10 version 1909 VM guest, but a regression in the bhyve code between r354377 and r354563.
Comment 2 Vincenzo Maffione freebsd_committer freebsd_triage 2019-11-17 09:08:56 UTC
Created attachment 209202 [details]
Patch to print iov lengths
Comment 3 Vincenzo Maffione freebsd_committer freebsd_triage 2019-11-17 09:10:44 UTC
Thanks for reporting.
Yes, the issue is due to r354552, although it apparently only affects Windows (for now).
Can you please try the attached (non-fix) patch to help me understand what it is actually happening in your scenario? I would like to see the output of bhyve just before the assert() triggers.
Comment 4 Vincenzo Maffione freebsd_committer freebsd_triage 2019-11-17 14:42:58 UTC
And, by the way, since you enter that code it means that you are using VALE as a net backend rather than if_tap. Can you please confirm?
Comment 5 Ben Woods freebsd_committer freebsd_triage 2019-11-17 22:18:05 UTC
Created attachment 209215 [details]
Netgraph image showing bhyve connected via tap0 and link17 to ix0bridge

Hi Vincenzo, Thanks for the reply.

No, I am not using vale, so it looks like the code is being incorrectly called. I am using a combination of tap(4) and ng_bridge(4).

For my jails, I use the jng script at the link below to automatically create the vnet interfaces and join them through ng_bridge(4):
https://svnweb.freebsd.org/base/head/share/examples/jails/jng?revision=354563&view=markup

For bhyve, I manually run the command below to hook into this same ng_bridge infrastructure:
# ngctl connect ix0bridge: tap0: link17 lower
# ngctl msg tap0: setpromisc 1
# ngctl msg tap0: setautosrc 0
# ifconfig tap0 up
# bhyve \
        -c 2 \
        -m 12G \
        -s 0,hostbridge \
        -s 3,ahci-hd,/dev/ada0 \
        -s 10,virtio-net,tap0 \
        -s 31,lpc \
        -l bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI.fd \
        -H -w \
        -s 16,fbuf,tcp=127.0.0.1:5900,w=1920,h=1080 \
        -s 17,xhci,tablet win10

This results in a netgraph setup as shown in the attached graph output.

With your debug patch attached above, I get the following output:
frag #0: len 0
frag #1: len 1514
frag #2: len 10
frag #3: len 1514
frag #4: len 10
frag #5: len 1514
frag #6: len 10
frag #7: len 1514
frag #8: len 10
frag #9: len 1514
frag #10: len 10
frag #11: len 1514
frag #12: len 10
frag #13: len 1514
frag #14: len 10
frag #15: len 1514
frag #16: len 10
frag #17: len 1514
frag #18: len 10
frag #19: len 1514
frag #20: len 10
frag #21: len 1514
frag #22: len 10
frag #23: len 1514
frag #24: len 10
frag #25: len 1514
frag #26: len 10
frag #27: len 1514
frag #28: len 10
frag #29: len 1514
frag #30: len 10
frag #31: len 1514
frag #32: len 10
frag #33: len 1514
frag #34: len 10
frag #35: len 1514
frag #36: len 10
frag #37: len 1514
frag #38: len 10
frag #39: len 1514
frag #40: len 10
frag #41: len 1514
frag #42: len 10
frag #43: len 1514
Assertion failed: (iov[0].iov_len >= sizeof(*hdr)), function pci_vtnet_rx, file /usr/src/usr.sbin/bhyve/pci_virtio_net.c, line 279.
Abort trap
Comment 6 Vincenzo Maffione freebsd_committer freebsd_triage 2019-11-17 22:38:21 UTC
Created attachment 209216 [details]
Patch  to find out about features
Comment 7 Vincenzo Maffione freebsd_committer freebsd_triage 2019-11-17 22:41:07 UTC
I see. I don't think netgraph is relevant here, but you are right about the code being incorrectly called. With the tap backend (see your bhyve invocation), mergeable buffers should be disabled.
It looks more like an issue related to virtio feature negotiation. I uploaded a second patch to see what is going on.. can you please run a test and show me the output?
Comment 8 Ben Woods freebsd_committer freebsd_triage 2019-11-17 23:07:17 UTC
Output from features patch:
My features: 11010020
negotiated features = 10020
Assertion failed: (iov[0].iov_len >= sizeof(*hdr)), function pci_vtnet_rx, file /usr/src/usr.sbin/bhyve/pci_virtio_net.c, line 272.
Comment 9 Aleksandr Fedorov freebsd_committer freebsd_triage 2019-11-18 09:06:13 UTC
Negotiated futures looks fine for me. VIRTIO_NET_F_MRG_RXBUF flag is zero.

But I'm worried about how we use sc->rx_merge flag.

During initialization, we set it to 1: https://svnweb.freebsd.org/base/head/usr.sbin/bhyve/pci_virtio_net.c?revision=354552&view=markup#l515

Than on features negotiation, we set it to 0: https://svnweb.freebsd.org/base/head/usr.sbin/bhyve/pci_virtio_net.c?revision=354552&view=markup#l575

But during the reset, we set it to 1: https://svnweb.freebsd.org/base/head/usr.sbin/bhyve/pci_virtio_net.c?revision=354552&view=markup#l161

And it seems that there is a race between pci_vtnet_rx() and pci_vtnet_reset().

If pci_vtnet_rx() is called after pci_vtnet_reset() immediately, than the wrong code path is called (RX MRG BUF enabled).

I have not yet installed windows to test this hypothesis, but using FreyBSD as a guest system I discovered another problem.

Start vm: sh vmrun.sh -c 2 -m 4096M -t tap1001 -d freebsd-12-0.img freebsd-0
Configure tap1001: ifconfig tap1001 192.168.1.5/24 up
Ping VM: ping 192.168.1.2
PING 192.168.1.2 (192.168.1.2): 56 data bytes
64 bytes from 192.168.1.2: icmp_seq=0 ttl=64 time=0.544 ms

Inside VM put vtnet0 interface down.

root@vm0:~ # ifconfig vtnet0 down
pci_vtnet_rx
pci_vtnet_reset
pci_vtnet_rx <<-- pci_vtnet_neg_features() is not called before RX.
vtnet: ndesc (1040) out of range, driver confused?
Assertion failed: (n >= 1 && cur_iov_len + n <= VTNET_MAXSEGS), function pci_vtnet_rx, file /afedorov/freebsd-head-clean/usr.sbin/bhyve/pci_virtio_net.c, line 239.
Abort trap
Comment 10 Aleksandr Fedorov freebsd_committer freebsd_triage 2019-11-18 11:53:49 UTC
I created a guest system with windows 10. It seems my theory is correct.

I started ping -f on tap interface before starting the VM.

root@q1u001:/afedorov/vm # /usr/obj/afedorov/freebsd-head-clean/amd64.amd64/usr.sbin/bhyve/bhyve -c 2 -m 4G -s 0,hostbridge -s 3,ahci-hd,./Windows-2k19-vstack.raw -s 4,ahci-cd,./seed.iso -s 10,virtio-net,tap1001 -s 31,lpc -l bootrom,/usr
/local/share/uefi-firmware/BHYVE_UEFI.fd -H -w -s 16,fbuf,tcp=10.78.67.11:5900,w=1920,h=1080 -s 17,xhci,tablet win10
fbuf frame buffer base: 0x941e00000 [sz 16777216]
pci_vtnet_reset
pci_vtnet_reset
pci_vtnet_neg_features
pci_vtnet_rx
pci_vtnet_rx
pci_vtnet_rx
Unhandled ps2 keyboard command 0x02
pci_vtnet_rx
pci_vtnet_rx
pci_vtnet_rx
pci_vtnet_rx
Unhandled ps2 keyboard command 0x02
pci_vtnet_rx
pci_vtnet_rx
pci_vtnet_rx
pci_vtnet_reset << Set sc->rx_merge to 1.
pci_vtnet_rx <<<-- RX called before features was negotiated, so sc->rx_merge == 1
Assertion failed: (iov[0].iov_len >= sizeof(*hdr)), function pci_vtnet_rx, file /afedorov/freebsd-head-clean/usr.sbin/bhyve/pci_virtio_net.c, line 278.
Abort


As far as I understand, RX should be disabled until features are negotiated.

Guest starts successfully if you apply the next patch:

Index: usr.sbin/bhyve/pci_virtio_net.c
===================================================================
--- usr.sbin/bhyve/pci_virtio_net.c     (revision 354802)
+++ usr.sbin/bhyve/pci_virtio_net.c     (working copy)
@@ -149,6 +149,8 @@
        /* Acquire the RX lock to block RX processing. */
        pthread_mutex_lock(&sc->rx_mtx);

+       netbe_rx_disable(sc->vsc_be);
+
        /* Set sc->resetting and give a chance to the TX thread to stop. */
        pthread_mutex_lock(&sc->tx_mtx);
        sc->resetting = 1;
Comment 11 Ben Woods freebsd_committer freebsd_triage 2019-11-18 14:01:01 UTC
(In reply to Aleksandr Fedorov from comment #10)
Indeed, that patch fixes it for me.
Comment 12 Vincenzo Maffione freebsd_committer freebsd_triage 2019-11-18 21:44:29 UTC
Thank you for the debugging. That's the issue indeed.
I extended the patch a little bit to add comments and clean up the code.
https://reviews.freebsd.org/D22440
Comment 13 commit-hook freebsd_committer freebsd_triage 2019-11-19 21:11:04 UTC
A commit references this bug:

Author: vmaffione
Date: Tue Nov 19 21:10:44 UTC 2019
New revision: 354864
URL: https://svnweb.freebsd.org/changeset/base/354864

Log:
  bhyve: virtio-net: disable receive until features are negotiated

  This patch fixes a race condition where the receive callback is called
  while the device is being reset. Since the rx_merge variable may change
  during reset, the receive callback may operate inconsistently with what
  the guest expects.
  Also, get rid of the unused rx_vhdrlen variable.

  PR:	242023
  Reported by:	aleksandr.fedorov@itglobal.com
  Reviewed by:	markj, jhb
  MFC with:	r354552
  Differential Revision:	https://reviews.freebsd.org/D22440

Changes:
  head/usr.sbin/bhyve/pci_virtio_net.c
Comment 14 Shirkdog 2019-11-21 23:12:53 UTC
After discovering this issue with my FreeNAS Bhyve VM, I updated to the latest 13-CURRENT source code with this patch for bhyve and it resolved my issue.
Comment 15 Ben Woods freebsd_committer freebsd_triage 2020-01-19 07:34:20 UTC
Marking this bug as fixed with commit r354864.

Thanks!