I am running FreeBSD 13-CURRENT r354563, and when I boot my Windows 10 VM guest it is crashing after ~5 seconds with the following error printed to the console: Assertion failed: (iov[0].iov_len >= sizeof(*hdr)), function pci_vtnet_rx, file /usr/src/usr.sbin/bhyve/pci_virtio_net.c, line 272. This appears that it could be a result of the recent change r354552 "bhyve: add support for virtio-net mergeable rx buffers", as it added the line of code referenced in the above error: https://svnweb.freebsd.org/base?view=revision&revision=354552 The other thing that could be relevant is that this VM was recently upgraded to Windows 10 version 1909. However, given the code that is crashing was only added to head 8 days ago, that is my first guess at where the issue is.
I can confirm that rebooting to an older boot environment running r354376 has worked around the issue. Therefore the problem is not the updated Windows 10 version 1909 VM guest, but a regression in the bhyve code between r354377 and r354563.
Created attachment 209202 [details] Patch to print iov lengths
Thanks for reporting. Yes, the issue is due to r354552, although it apparently only affects Windows (for now). Can you please try the attached (non-fix) patch to help me understand what it is actually happening in your scenario? I would like to see the output of bhyve just before the assert() triggers.
And, by the way, since you enter that code it means that you are using VALE as a net backend rather than if_tap. Can you please confirm?
Created attachment 209215 [details] Netgraph image showing bhyve connected via tap0 and link17 to ix0bridge Hi Vincenzo, Thanks for the reply. No, I am not using vale, so it looks like the code is being incorrectly called. I am using a combination of tap(4) and ng_bridge(4). For my jails, I use the jng script at the link below to automatically create the vnet interfaces and join them through ng_bridge(4): https://svnweb.freebsd.org/base/head/share/examples/jails/jng?revision=354563&view=markup For bhyve, I manually run the command below to hook into this same ng_bridge infrastructure: # ngctl connect ix0bridge: tap0: link17 lower # ngctl msg tap0: setpromisc 1 # ngctl msg tap0: setautosrc 0 # ifconfig tap0 up # bhyve \ -c 2 \ -m 12G \ -s 0,hostbridge \ -s 3,ahci-hd,/dev/ada0 \ -s 10,virtio-net,tap0 \ -s 31,lpc \ -l bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI.fd \ -H -w \ -s 16,fbuf,tcp=127.0.0.1:5900,w=1920,h=1080 \ -s 17,xhci,tablet win10 This results in a netgraph setup as shown in the attached graph output. With your debug patch attached above, I get the following output: frag #0: len 0 frag #1: len 1514 frag #2: len 10 frag #3: len 1514 frag #4: len 10 frag #5: len 1514 frag #6: len 10 frag #7: len 1514 frag #8: len 10 frag #9: len 1514 frag #10: len 10 frag #11: len 1514 frag #12: len 10 frag #13: len 1514 frag #14: len 10 frag #15: len 1514 frag #16: len 10 frag #17: len 1514 frag #18: len 10 frag #19: len 1514 frag #20: len 10 frag #21: len 1514 frag #22: len 10 frag #23: len 1514 frag #24: len 10 frag #25: len 1514 frag #26: len 10 frag #27: len 1514 frag #28: len 10 frag #29: len 1514 frag #30: len 10 frag #31: len 1514 frag #32: len 10 frag #33: len 1514 frag #34: len 10 frag #35: len 1514 frag #36: len 10 frag #37: len 1514 frag #38: len 10 frag #39: len 1514 frag #40: len 10 frag #41: len 1514 frag #42: len 10 frag #43: len 1514 Assertion failed: (iov[0].iov_len >= sizeof(*hdr)), function pci_vtnet_rx, file /usr/src/usr.sbin/bhyve/pci_virtio_net.c, line 279. Abort trap
Created attachment 209216 [details] Patch to find out about features
I see. I don't think netgraph is relevant here, but you are right about the code being incorrectly called. With the tap backend (see your bhyve invocation), mergeable buffers should be disabled. It looks more like an issue related to virtio feature negotiation. I uploaded a second patch to see what is going on.. can you please run a test and show me the output?
Output from features patch: My features: 11010020 negotiated features = 10020 Assertion failed: (iov[0].iov_len >= sizeof(*hdr)), function pci_vtnet_rx, file /usr/src/usr.sbin/bhyve/pci_virtio_net.c, line 272.
Negotiated futures looks fine for me. VIRTIO_NET_F_MRG_RXBUF flag is zero. But I'm worried about how we use sc->rx_merge flag. During initialization, we set it to 1: https://svnweb.freebsd.org/base/head/usr.sbin/bhyve/pci_virtio_net.c?revision=354552&view=markup#l515 Than on features negotiation, we set it to 0: https://svnweb.freebsd.org/base/head/usr.sbin/bhyve/pci_virtio_net.c?revision=354552&view=markup#l575 But during the reset, we set it to 1: https://svnweb.freebsd.org/base/head/usr.sbin/bhyve/pci_virtio_net.c?revision=354552&view=markup#l161 And it seems that there is a race between pci_vtnet_rx() and pci_vtnet_reset(). If pci_vtnet_rx() is called after pci_vtnet_reset() immediately, than the wrong code path is called (RX MRG BUF enabled). I have not yet installed windows to test this hypothesis, but using FreyBSD as a guest system I discovered another problem. Start vm: sh vmrun.sh -c 2 -m 4096M -t tap1001 -d freebsd-12-0.img freebsd-0 Configure tap1001: ifconfig tap1001 192.168.1.5/24 up Ping VM: ping 192.168.1.2 PING 192.168.1.2 (192.168.1.2): 56 data bytes 64 bytes from 192.168.1.2: icmp_seq=0 ttl=64 time=0.544 ms Inside VM put vtnet0 interface down. root@vm0:~ # ifconfig vtnet0 down pci_vtnet_rx pci_vtnet_reset pci_vtnet_rx <<-- pci_vtnet_neg_features() is not called before RX. vtnet: ndesc (1040) out of range, driver confused? Assertion failed: (n >= 1 && cur_iov_len + n <= VTNET_MAXSEGS), function pci_vtnet_rx, file /afedorov/freebsd-head-clean/usr.sbin/bhyve/pci_virtio_net.c, line 239. Abort trap
I created a guest system with windows 10. It seems my theory is correct. I started ping -f on tap interface before starting the VM. root@q1u001:/afedorov/vm # /usr/obj/afedorov/freebsd-head-clean/amd64.amd64/usr.sbin/bhyve/bhyve -c 2 -m 4G -s 0,hostbridge -s 3,ahci-hd,./Windows-2k19-vstack.raw -s 4,ahci-cd,./seed.iso -s 10,virtio-net,tap1001 -s 31,lpc -l bootrom,/usr /local/share/uefi-firmware/BHYVE_UEFI.fd -H -w -s 16,fbuf,tcp=10.78.67.11:5900,w=1920,h=1080 -s 17,xhci,tablet win10 fbuf frame buffer base: 0x941e00000 [sz 16777216] pci_vtnet_reset pci_vtnet_reset pci_vtnet_neg_features pci_vtnet_rx pci_vtnet_rx pci_vtnet_rx Unhandled ps2 keyboard command 0x02 pci_vtnet_rx pci_vtnet_rx pci_vtnet_rx pci_vtnet_rx Unhandled ps2 keyboard command 0x02 pci_vtnet_rx pci_vtnet_rx pci_vtnet_rx pci_vtnet_reset << Set sc->rx_merge to 1. pci_vtnet_rx <<<-- RX called before features was negotiated, so sc->rx_merge == 1 Assertion failed: (iov[0].iov_len >= sizeof(*hdr)), function pci_vtnet_rx, file /afedorov/freebsd-head-clean/usr.sbin/bhyve/pci_virtio_net.c, line 278. Abort As far as I understand, RX should be disabled until features are negotiated. Guest starts successfully if you apply the next patch: Index: usr.sbin/bhyve/pci_virtio_net.c =================================================================== --- usr.sbin/bhyve/pci_virtio_net.c (revision 354802) +++ usr.sbin/bhyve/pci_virtio_net.c (working copy) @@ -149,6 +149,8 @@ /* Acquire the RX lock to block RX processing. */ pthread_mutex_lock(&sc->rx_mtx); + netbe_rx_disable(sc->vsc_be); + /* Set sc->resetting and give a chance to the TX thread to stop. */ pthread_mutex_lock(&sc->tx_mtx); sc->resetting = 1;
(In reply to Aleksandr Fedorov from comment #10) Indeed, that patch fixes it for me.
Thank you for the debugging. That's the issue indeed. I extended the patch a little bit to add comments and clean up the code. https://reviews.freebsd.org/D22440
A commit references this bug: Author: vmaffione Date: Tue Nov 19 21:10:44 UTC 2019 New revision: 354864 URL: https://svnweb.freebsd.org/changeset/base/354864 Log: bhyve: virtio-net: disable receive until features are negotiated This patch fixes a race condition where the receive callback is called while the device is being reset. Since the rx_merge variable may change during reset, the receive callback may operate inconsistently with what the guest expects. Also, get rid of the unused rx_vhdrlen variable. PR: 242023 Reported by: aleksandr.fedorov@itglobal.com Reviewed by: markj, jhb MFC with: r354552 Differential Revision: https://reviews.freebsd.org/D22440 Changes: head/usr.sbin/bhyve/pci_virtio_net.c
After discovering this issue with my FreeNAS Bhyve VM, I updated to the latest 13-CURRENT source code with this patch for bhyve and it resolved my issue.
Marking this bug as fixed with commit r354864. Thanks!