Created attachment 223748 [details]
Kernel Panic

I am currently seeing a panic on a Hyper-V based virtual machine when mounting an NFS share. The -CURRENT build is from today (1st of April). I would think that this panic is Hyper-V or amd64 related. I have the same share mounted on an RPi4B, and with a build from today the share is accessible and a stress test via a buildworld was successful. The system hangs in an endless loop, so I can currently only provide a screenshot of the panic.
BTW, it looks more like an error in the TCP code; this might be related: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=254725
(In reply to Li-Wen Hsu from comment #1)
It's mainly the call chain from vmbus_chan_task() -> hn_chan_callback() which leads me towards a bug in the Hyper-V implementation, since hn0 is the default network interface on Hyper-V. I got an old kernel from around March 19th booted and will examine the crash dump as soon as I find some free time.
The KERNCONF is the following
---------------------------------------------------
include GENERIC

options RATELIMIT
options TCPHPTS
options KERN_TLS
options ROUTE_MPATH
options RANDOM_FENESTRASX
---------------------------------------------------

The dump information is
---------------------------------------------------
Dump header from device: /dev/da0p3
  Architecture: amd64
  Architecture Version: 2
  Dump Length: 282898432
  Blocksize: 512
  Compression: none
  Dumptime: 2021-04-01 13:05:24 +0200
  Hostname: fbsd-dev.0xfce3.net
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 14.0-CURRENT #25 main-n245771-529a2a0f2765: Thu Apr  1 11:36:01 CEST 2021
    root@lion.0xfce3.net:/boiler/nfs/obj/boiler/nfs/src/amd64.amd64/sys/GENERIC-TCP
  Panic String: Assertion in_epoch(net_epoch_preempt) failed at /boiler/nfs/src/sys/netinet/tcp_lro.c:915
  Dump Parity: 4289937466
  Bounds: 3
  Dump Status: good
---------------------------------------------------

src.conf is
---------------------------------------------------
WITH_EXTRA_TCP_STACKS=1
WITH_BEARSSL=1
WITH_PIE=1
WITH_RETPOLINE=1
WITH_INIT_ALL_ZERO=1
---------------------------------------------------

I'll upload the dump in a minute.
The crash dump is too large to upload. The stack trace is the following:

KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0051761840
vpanic() at vpanic+0x181/frame 0xfffffe0051761890
panic() at panic+0x43/frame 0xfffffe00517618f0
tcp_lro_lookup() at tcp_lro_lookup+0xef/frame 0xfffffe0051761920
tcp_lro_rx2() at tcp_lro_rx2+0x7da/frame 0xfffffe00517619f0
hn_chan_callback() at hn_chan_callback+0x1eb/frame 0xfffffe0051761ad0
vmbus_chan_task() at vmbus_chan_task+0x2f/frame 0xfffffe0051761b00
taskqueue_run_locked() at taskqueue_run_locked+0xaa/frame 0xfffffe0051761b80
taskqueue_thread_loop() at taskqueue_thread_loop+0x94/frame 0xfffffe0051761bb0
fork_exit() at fork_exit+0x80/frame 0xfffffe0051761bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0051761bf0

If I can provide more information, just let me know.
Backtrace

#0  __curthread () at /boiler/nfs/src/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=textdump@entry=1) at /boiler/nfs/src/sys/kern/kern_shutdown.c:399
#2  0xffffffff80c132f0 in kern_reboot (howto=260) at /boiler/nfs/src/sys/kern/kern_shutdown.c:486
#3  0xffffffff80c13750 in vpanic (fmt=<optimized out>, ap=<optimized out>) at /boiler/nfs/src/sys/kern/kern_shutdown.c:919
#4  0xffffffff80c134a3 in panic (fmt=<unavailable>) at /boiler/nfs/src/sys/kern/kern_shutdown.c:843
#5  0xffffffff80ded83f in tcp_lro_lookup (lc=0xfffffe0057299ad0, le=0xfffffe00573a3b48) at /boiler/nfs/src/sys/netinet/tcp_lro.c:915
#6  0xffffffff80dee79a in tcp_lro_rx2 (lc=<optimized out>, lc@entry=0xfffffe0057299ad0, m=<optimized out>, m@entry=0xfffff80003956d00, csum=<optimized out>, csum@entry=0, use_hash=<optimized out>, use_hash@entry=1) at /boiler/nfs/src/sys/netinet/tcp_lro.c:1946
#7  0xffffffff80def51f in tcp_lro_rx (lc=<unavailable>, lc@entry=0xfffffe0057299ad0, m=<unavailable>, m@entry=0xfffff80003956d00, csum=<unavailable>, csum@entry=0) at /boiler/nfs/src/sys/netinet/tcp_lro.c:2060
#8  0xffffffff8103822b in hn_lro_rx (lc=0xfffffe0057299ad0, m=0xfffff80003956d00) at /boiler/nfs/src/sys/dev/hyperv/netvsc/if_hn.c:3421
#9  hn_rxpkt (rxr=0xfffffe0057298000) at /boiler/nfs/src/sys/dev/hyperv/netvsc/if_hn.c:3722
#10 hn_rndis_rx_data (rxr=<optimized out>, data=<optimized out>, dlen=<optimized out>) at /boiler/nfs/src/sys/dev/hyperv/netvsc/if_hn.c:7392
#11 hn_rndis_rxpkt (rxr=<optimized out>, data=<optimized out>, dlen=<optimized out>) at /boiler/nfs/src/sys/dev/hyperv/netvsc/if_hn.c:7413
#12 hn_nvs_handle_rxbuf (rxr=<optimized out>, chan=0xfffff800039f7400, pkthdr=<optimized out>) at /boiler/nfs/src/sys/dev/hyperv/netvsc/if_hn.c:7512
#13 hn_chan_callback (chan=chan@entry=0xfffff800039f7400, xrxr=<optimized out>, xrxr@entry=0xfffffe0057298000) at /boiler/nfs/src/sys/dev/hyperv/netvsc/if_hn.c:7604
#14 0xffffffff810454df in vmbus_chan_task (xchan=0xfffff800039f7400, pending=<optimized out>) at /boiler/nfs/src/sys/dev/hyperv/vmbus/vmbus_chan.c:1381
#15 0xffffffff80c752fa in taskqueue_run_locked (queue=queue@entry=0xfffff800038b9300) at /boiler/nfs/src/sys/kern/subr_taskqueue.c:476
#16 0xffffffff80c76384 in taskqueue_thread_loop (arg=arg@entry=0xfffffe000428f090) at /boiler/nfs/src/sys/kern/subr_taskqueue.c:793
#17 0xffffffff80bcd540 in fork_exit (callout=0xffffffff80c762f0 <taskqueue_thread_loop>, arg=0xfffffe000428f090, frame=0xfffffe0051761c00) at /boiler/nfs/src/sys/kern/kern_fork.c:1077
#18 <signal handler called>
It hit this assert in tcp_lro.c:

911 tcp_lro_lookup(struct lro_ctrl *lc, struct lro_entry *le)
912 {
913         struct inpcb *inp = NULL;
914
915         NET_EPOCH_ASSERT();    <--- panic here
916         switch (le->eh_type) {

How often does it occur? I am not familiar with the LRO and epoch code. The Hyper-V hn driver has had a couple of commits since March 12; the commits are about RSC support for packets from the same host. Is the NFS server VM running on the same Hyper-V host? If it is easy for you to reproduce on the current build, can you try any build before March 12 to see if it is reproducible?
(In reply to Wei Hu from comment #6)
This panic occurs right at boot time when an NFS share is mounted. I have a 12-STABLE system, also virtualized via Hyper-V on the same Windows 10 system, which acts as the NFS server.

I can try to bisect this, based on the timeframe from March 12 to April 1, but this will take some time.
(In reply to Gordon Bergling from comment #7)
I tried to reproduce this in my test environment without luck. I created two FreeBSD guests on the same Windows Server 2019 host, one being the NFS server and the other the test client with the latest commit. I tried two builds on the client:

1. a091c353235e0ee97d2531e80d9d64e1648350f4 on April 11
2. b6fd00791f2b9690b0a5d8670fc03f74eda96da2 on March 22

Buildworld and buildkernel over the NFS share both succeeded on the client. I may not have the same environment as you do. Here is how my test environment looks:

- Hyper-V Version: 10.0.17763 [SP1]
- The NFS share is not mounted at boot time. I tried automounting during boot, but it kept complaining that the file system is not clean, couldn't find fsck_nfs to check it, and kicked me into single-user mode. I think that's something with my configuration, though, and irrelevant to this bug.
- Both NFS server and client are running DEBUG builds.
- Both NFS server and client are Gen-1 VMs.

It would be really helpful if you could bisect in your environment.
(In reply to Wei Hu from comment #8)
Thanks for the investigation. I was able to boot a kernel from today (15 April) on this machine. I tracked the issue down to tcp_bbr or cc_htcp. I built the system with WITH_EXTRA_TCP_STACKS=1 and have

tcp_bbr_load="YES"
cc_htcp_load="YES"

in /boot/loader.conf and

net.inet.tcp.cc.algorithm=htcp
net.inet.tcp.functions_default=bbr

in /etc/sysctl.conf.

I first disabled the sysctl.conf settings and the panic still happened, so it is enough to load the modules at boot time. If I disable both modules, the system starts as usual. If one of these modules is loaded at boot time, the system panics. Maybe it is something locking-related.

Hope that helps to track down that issue.
(In reply to Gordon Bergling from comment #9)
With the tcp_bbr module loaded, have you ever successfully booted the system with NFS before? The BBR code was introduced in commit 35c7bb340788f0ce9347b7066619d8afb31e2123 on Sept 24, 2019. I wonder if this problem has existed since day one of that commit and it has only recently been tested on Hyper-V by you.
(In reply to Wei Hu from comment #10)
Sorry for the late reply. I had been running TCP BBR on Hyper-V since January 2020 without seeing any problems. Around mid-March 2021 a commit must have hit the tree which leads to the panic. I tried to bisect the problem, but compiling an older version of -CURRENT on a recent one leads to obscure clang error messages.

I tried a fresh build from today's sources without loading any additional TCP stack at boot time, and the panic is still present for both TCP BBR and TCP RACK, triggered by a simple 'kldload $module'.
PR 257067 could be related, since the panic message is the same. https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=257067
(In reply to Gordon Bergling from comment #11)
So you boot the system up using the default TCP stack. After logging in, the system panics when you kldload tcp_rack?
(In reply to Michael Tuexen from comment #13)
That's correct. If the stack is loaded via /boot/loader.conf, the system panics when the first real network traffic happens, in my case an attempt to mount an NFS share. When booting with the default network stack, the system panics when loading the tcp_rack or tcp_bbr kernel module. The panic, including the backtrace, is the same.
(In reply to Gordon Bergling from comment #14) What if you boot with the default stack and load some kernel module other than RACK or BBR? I'm trying to figure out if this problem is actually related to RACK or BBR or not...
(In reply to Michael Tuexen from comment #15)
I just loaded cc_cubic, cc_vegas and a few netgraph modules without any panic. So I would think that it's an interaction between the Hyper-V hn interface and the BBR and RACK stacks.
(In reply to Gordon Bergling from comment #16)
Can you update to the latest sources? The LRO code has changed recently... I think the problem is related to which part is entering the network epoch and which part is expecting that this has already been done... I can't test myself, since I don't have a Windows system and Hyper-V requires one. Or am I wrong?
(In reply to Michael Tuexen from comment #17)
The build I tested recently is from the 11th of July, so I would think I already have the most relevant changes included. Hyper-V is indeed Windows only. It would be possible to test this on Azure if you have an account and available resources.
(In reply to Gordon Bergling from comment #18) Can you then provide a backtrace like in #5, but based on the latest code?
Created attachment 226456 [details]
Patch for testing
Created attachment 226457 [details]
Patch for testing
Can you test the patch I uploaded?
(In reply to Michael Tuexen from comment #19)
I just applied your patch and have a build running. The backtrace from a kernel build from today:

__curthread () at /boiler/nfs/src/sys/amd64/include/pcpu_aux.h:55
55              __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) bt
#0  __curthread () at /boiler/nfs/src/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=textdump@entry=1) at /boiler/nfs/src/sys/kern/kern_shutdown.c:399
#2  0xffffffff80c20da0 in kern_reboot (howto=260) at /boiler/nfs/src/sys/kern/kern_shutdown.c:486
#3  0xffffffff80c21206 in vpanic (fmt=0xffffffff81202daa "Assertion %s failed at %s:%d", ap=<optimized out>) at /boiler/nfs/src/sys/kern/kern_shutdown.c:919
#4  0xffffffff80c20f53 in panic (fmt=<unavailable>) at /boiler/nfs/src/sys/kern/kern_shutdown.c:843
#5  0xffffffff80e01f6d in tcp_lro_lookup (ifp=0xfffff80003784000, pa=0xfffffe00573d9748) at /boiler/nfs/src/sys/netinet/tcp_lro.c:1155
#6  tcp_lro_flush_tcphpts (lc=0xfffffe00572cdad0, le=0xfffffe00573d9710) at /boiler/nfs/src/sys/netinet/tcp_lro.c:1268
#7  tcp_lro_flush (lc=lc@entry=0xfffffe00572cdad0, le=le@entry=0xfffffe00573d9710) at /boiler/nfs/src/sys/netinet/tcp_lro.c:1348
#8  0xffffffff80e02303 in tcp_lro_rx_done (lc=<optimized out>) at /boiler/nfs/src/sys/netinet/tcp_lro.c:565
#9  tcp_lro_flush_all (lc=lc@entry=0xfffffe00572cdad0) at /boiler/nfs/src/sys/netinet/tcp_lro.c:1509
#10 0xffffffff8105487c in hn_chan_rollup (rxr=<optimized out>, txr=0xfffff80003b2e400) at /boiler/nfs/src/sys/dev/hyperv/netvsc/if_hn.c:2886
#11 hn_chan_callback (chan=chan@entry=0xfffff80003ac7c00, xrxr=<optimized out>, xrxr@entry=0xfffffe00572cc000) at /boiler/nfs/src/sys/dev/hyperv/netvsc/if_hn.c:7617
#12 0xffffffff81060aff in vmbus_chan_task (xchan=0xfffff80003ac7c00, pending=<optimized out>) at /boiler/nfs/src/sys/dev/hyperv/vmbus/vmbus_chan.c:1381
#13 0xffffffff80c82e8a in taskqueue_run_locked (queue=queue@entry=0xfffff800035dc400) at /boiler/nfs/src/sys/kern/subr_taskqueue.c:476
#14 0xffffffff80c83f12 in taskqueue_thread_loop (arg=arg@entry=0xfffffe0051a1b090) at /boiler/nfs/src/sys/kern/subr_taskqueue.c:793
#15 0xffffffff80bda300 in fork_exit (callout=0xffffffff80c83e50 <taskqueue_thread_loop>, arg=0xfffffe0051a1b090, frame=0xfffffe0051757c00) at /boiler/nfs/src/sys/kern/kern_fork.c:1083
(In reply to Michael Tuexen from comment #22)
With your patch applied the panic is gone. I tested the following scenarios:

a) booting the default network stack and manually loading tcp_bbr and tcp_rack
b) loading the modules at boot time via /boot/loader.conf, but leaving the default network stack active
c) loading the modules at boot time via /boot/loader.conf and activating the stack via /etc/sysctl.conf

   net.inet.tcp.cc.algorithm=htcp
   net.inet.tcp.functions_default=bbr

Thanks for the patch!
(In reply to Gordon Bergling from comment #24) Thanks for testing. We'll get the patch in tree soon...
Michael: Please use __predict_false() for those epoch checks. Typically it is the caller which is responsible for applying the EPOCH, not the LRO code.

--HPS
(In reply to Hans Petter Selasky from comment #26)
OK. So does the Hyper-V driver (and possibly others) have a bug?
I think your patch is fine, Michael. We do the same in ether_input() for devices that don't support EPOCH. Just add the __predict_false() before pushing, as done in ether_input().

--HPS
Rather than adding additional epoch enters to the critical path, I'd strongly prefer the hyperv driver be fixed to respect the network epoch.

Most network drivers pass packets into the network stack as part of their receive interrupt processing. Code has been added to automatically enter the network epoch in the FreeBSD ithread code, so that the epoch is held for the duration of the interrupt handler's execution on each interrupt delivered.

For drivers which use special mechanisms (like taskqueues), it would be best if they marked themselves with IFF_KNOWSEPOCH and then called NET_EPOCH_ENTER() around calls into the network stack. Since entering an epoch uses atomic operations, it's best to take and release the epoch as infrequently as possible, e.g., around the loop that processes packets and passes them to LRO or if_input(). See iflib for an example; a rough sketch of the pattern follows below.
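To make that concrete, here is a minimal, hypothetical sketch of such a taskqueue-driven RX path that enters the net epoch once around the whole packet batch. This is not the actual if_hn.c code; the hypo_* names and the rx-ring layout are made up for illustration, and the driver would set IFF_KNOWSEPOCH on the interface at attach time.

/*
 * Needs <sys/epoch.h> and <netinet/tcp_lro.h>, among the usual driver headers.
 *
 * At attach time the driver advertises that it handles the epoch itself:
 *     ifp->if_flags |= IFF_KNOWSEPOCH;
 */
static void
hypo_rxeof(struct hypo_rx_ring *rxr)
{
	struct ifnet *ifp = rxr->hr_ifp;
	struct epoch_tracker et;
	struct mbuf *m;

	NET_EPOCH_ENTER(et);			/* one enter for the whole batch */
	while ((m = hypo_next_rx_mbuf(rxr)) != NULL) {
		if (rxr->hr_lro_enabled &&
		    tcp_lro_rx(&rxr->hr_lro, m, 0) == 0)
			continue;		/* mbuf consumed by LRO */
		(*ifp->if_input)(ifp, m);	/* hand the packet to the stack */
	}
	tcp_lro_flush_all(&rxr->hr_lro);	/* flush pending LRO entries */
	NET_EPOCH_EXIT(et);			/* one exit after the batch */
}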
(In reply to Michael Tuexen from comment #27)
Yes, I think so. See comment 29 for my opinion.
(In reply to Andrew Gallatin from comment #30)
I'm fine with that. If we go down that path we might want to call `NET_EPOCH_ASSERT()` at all relevant entry points to the LRO code, not inside an internal routine. That way it would be clear that the caller is expected to have entered the net epoch. A minimal sketch of that contract is below.
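For illustration only, a sketch of that contract, assuming the externally visible entry points carry the assertion while internal helpers simply rely on it (the body shown is elided and is not the real tcp_lro.c code):

void
tcp_lro_flush_all(struct lro_ctrl *lc)
{
	/* Contract: the caller has already entered the net epoch. */
	NET_EPOCH_ASSERT();

	/* ... existing flush logic, unchanged ... */
}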
(In reply to Andrew Gallatin from comment #30)
One more comment: it was review D28374 which changed the LRO code from entering the network epoch itself to asserting that this is done by the caller. I guess that might be the reason why not all callers currently follow the contract you are suggesting.
With a -CURRENT from today the line numbers have changed, so here is the new panic output and backtrace.

panic output:
----------------------------------------------------------------
panic: Assertion in_epoch(net_epoch_preempt) failed at /boiler/nfs/src/sys/netinet/tcp_lro.c:1180
cpuid = 0
time = 1629461898
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe005174d850
vpanic() at vpanic+0x187/frame 0xfffffe005174d8b0
panic() at panic+0x43/frame 0xfffffe005174d910
tcp_lro_flush() at tcp_lro_flush+0x171d/frame 0xfffffe005174d9a0
tcp_lro_flush_all() at tcp_lro_flush_all+0x1a3/frame 0xfffffe005174d9f0
hn_chan_callback() at hn_chan_callback+0x112c/frame 0xfffffe005174dad0
vmbus_chan_task() at vmbus_chan_task+0x2f/frame 0xfffffe005174db00
taskqueue_run_locked() at taskqueue_run_locked+0xaa/frame 0xfffffe005174db80
taskqueue_thread_loop() at taskqueue_thread_loop+0xc2/frame 0xfffffe005174dbb0
fork_exit() at fork_exit+0x80/frame 0xfffffe005174dbf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe005174dbf0
----------------------------------------------------------------

backtrace:
----------------------------------------------------------------
#0  __curthread () at /boiler/nfs/src/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=textdump@entry=1) at /boiler/nfs/src/sys/kern/kern_shutdown.c:399
#2  0xffffffff80c245c0 in kern_reboot (howto=260) at /boiler/nfs/src/sys/kern/kern_shutdown.c:486
#3  0xffffffff80c24a26 in vpanic (fmt=0xffffffff8120a5e4 "Assertion %s failed at %s:%d", ap=<optimized out>) at /boiler/nfs/src/sys/kern/kern_shutdown.c:919
#4  0xffffffff80c24773 in panic (fmt=<unavailable>) at /boiler/nfs/src/sys/kern/kern_shutdown.c:843
#5  0xffffffff80e08c6d in tcp_lro_lookup (ifp=0xfffff80008f42800, pa=0xfffffe00573dd748) at /boiler/nfs/src/sys/netinet/tcp_lro.c:1180
#6  tcp_lro_flush_tcphpts (lc=0xfffffe00572d1ad0, le=0xfffffe00573dd710) at /boiler/nfs/src/sys/netinet/tcp_lro.c:1293
#7  tcp_lro_flush (lc=lc@entry=0xfffffe00572d1ad0, le=le@entry=0xfffffe00573dd710) at /boiler/nfs/src/sys/netinet/tcp_lro.c:1373
#8  0xffffffff80e09003 in tcp_lro_rx_done (lc=<optimized out>) at /boiler/nfs/src/sys/netinet/tcp_lro.c:590
#9  tcp_lro_flush_all (lc=lc@entry=0xfffffe00572d1ad0) at /boiler/nfs/src/sys/netinet/tcp_lro.c:1534
#10 0xffffffff8105bd2c in hn_chan_rollup (rxr=<optimized out>, txr=0xfffff80009026c00) at /boiler/nfs/src/sys/dev/hyperv/netvsc/if_hn.c:2886
#11 hn_chan_callback (chan=chan@entry=0xfffff80003ab6400, xrxr=<optimized out>, xrxr@entry=0xfffffe00572d0000) at /boiler/nfs/src/sys/dev/hyperv/netvsc/if_hn.c:7617
#12 0xffffffff81067faf in vmbus_chan_task (xchan=0xfffff80003ab6400, pending=<optimized out>) at /boiler/nfs/src/sys/dev/hyperv/vmbus/vmbus_chan.c:1381
#13 0xffffffff80c8792a in taskqueue_run_locked (queue=queue@entry=0xfffff80003550100) at /boiler/nfs/src/sys/kern/subr_taskqueue.c:476
#14 0xffffffff80c889b2 in taskqueue_thread_loop (arg=arg@entry=0xfffffe0051a1f090) at /boiler/nfs/src/sys/kern/subr_taskqueue.c:793
#15 0xffffffff80bdd830 in fork_exit (callout=0xffffffff80c888f0 <taskqueue_thread_loop>, arg=0xfffffe0051a1f090, frame=0xfffffe005174dc00) at /boiler/nfs/src/sys/kern/kern_fork.c:1087
#16 <signal handler called>
----------------------------------------------------------------
I think I have found a way to enable TCP BBR and RACK on Hyper-V again with the following patch, which adds NET_EPOCH_{ENTER,EXIT} calls around the tcp_lro_flush_all() call in hn_chan_rollup(), in which the panic is happening.

diff --git a/sys/dev/hyperv/netvsc/if_hn.c b/sys/dev/hyperv/netvsc/if_hn.c
index cd0b5a5fa8b9..fa141adad9f6 100644
--- a/sys/dev/hyperv/netvsc/if_hn.c
+++ b/sys/dev/hyperv/netvsc/if_hn.c
@@ -83,6 +83,7 @@ __FBSDID("$FreeBSD$");
 #include <sys/taskqueue.h>
 #include <sys/buf_ring.h>
 #include <sys/eventhandler.h>
+#include <sys/epoch.h>

 #include <machine/atomic.h>
 #include <machine/in_cksum.h>
@@ -2883,7 +2884,10 @@ static void
 hn_chan_rollup(struct hn_rx_ring *rxr, struct hn_tx_ring *txr)
 {
 #if defined(INET) || defined(INET6)
+	struct epoch_tracker et;
+	NET_EPOCH_ENTER(et);
 	tcp_lro_flush_all(&rxr->hn_lro);
+	NET_EPOCH_EXIT(et);
 #endif
 /*

I have been running with this change for about three days and haven't had a panic since then. One question remains before opening a differential for it: should I place something around these NET_EPOCH_* calls, like HAVE_BBR or something? The panic within tcp_lro_flush_all() happens only if TCP BBR or RACK are being used.
We need something along the lines of your proposed fix. However, the number of enters/exits of the epoch should be kept minimal, so we might want the author of the driver to have a look. But first we need to get the contracts of the TCP LRO code nailed down, especially in the case where HPTS is used. Let me come up with a patch for that first to get the contract fixed. Then we can propose a functional, though maybe not performance-optimal, fix for the MS-specific driver. Hopefully some people with expertise on the driver can help out.
See review D31648 for a step towards making the LRO contracts with regard to net epoch handling clear.
I put up a minimalistic patch in review D31679. Gordon: Would you be so kind and test it? It contains the code change you are already testing. I'll only be able to do a compile check, since I don't have access to a system running Windows.
The patch in review D31679 has been updated. Gordon: Please test the updated version.
(In reply to Michael Tuexen from comment #38)
I tested D31679 and the system is running stable. I did some iperf3 testing and the performance is somewhat strange when enabling TCP BBR.

With the default stack I got

[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   268 MBytes  2.24 Gbits/sec  4646   78.4 KBytes
[  5]   1.00-2.00   sec   304 MBytes  2.55 Gbits/sec  5520   85.5 KBytes
[  5]   2.00-3.00   sec   305 MBytes  2.56 Gbits/sec  5474    106 KBytes
[  5]   3.00-4.00   sec   311 MBytes  2.61 Gbits/sec  5705   1.41 KBytes
[  5]   4.00-5.00   sec   303 MBytes  2.54 Gbits/sec  5565   71.3 KBytes
[  5]   5.00-6.00   sec   305 MBytes  2.56 Gbits/sec  5474   88.4 KBytes
[  5]   6.00-7.00   sec   302 MBytes  2.53 Gbits/sec  5566   78.4 KBytes
[  5]   7.00-8.00   sec   287 MBytes  2.41 Gbits/sec  5198   65.6 KBytes
[  5]   8.00-9.00   sec   309 MBytes  2.60 Gbits/sec  5566   67.1 KBytes
[  5]   9.00-10.00  sec   297 MBytes  2.49 Gbits/sec  5520   65.6 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  2.92 GBytes  2.51 Gbits/sec  54234         sender
[  5]   0.00-10.00  sec  2.92 GBytes  2.51 Gbits/sec                receiver

and with TCP BBR

[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.02   sec  22.4 MBytes   183 Mbits/sec   806   17.0 KBytes
[  5]   1.02-2.00   sec  8.48 KBytes  71.1 Kbits/sec     4   17.0 KBytes
[  5]   2.00-3.01   sec  0.00 Bytes   0.00 bits/sec      2   17.0 KBytes
[  5]   3.01-4.02   sec  2.83 KBytes  23.1 Kbits/sec     1   17.0 KBytes
[  5]   4.02-5.05   sec  2.83 KBytes  22.5 Kbits/sec     2   17.0 KBytes
[  5]   5.05-6.05   sec  0.00 Bytes   0.00 bits/sec      0   17.0 KBytes
[  5]   6.05-7.05   sec  2.83 KBytes  23.2 Kbits/sec     1   17.0 KBytes
[  5]   7.05-8.05   sec  0.00 Bytes   0.00 bits/sec      0   17.0 KBytes
[  5]   8.05-9.02   sec  0.00 Bytes   0.00 bits/sec      1   17.0 KBytes
[  5]   9.02-10.04  sec  0.00 Bytes   0.00 bits/sec      0   17.0 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.04  sec  22.4 MBytes  18.7 Mbits/sec   817          sender
[  5]   0.00-10.04  sec  22.1 MBytes  18.4 Mbits/sec                receiver

Tested from VM to VM on the same host. But at least the panic is gone.
But only when running iperf3 as a client. When running as a server I got

[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.01   sec   228 KBytes  1.85 Mbits/sec    0   25.5 KBytes
[  5]   1.01-2.00   sec   340 KBytes  2.79 Mbits/sec    0   39.8 KBytes
[  5]   2.00-3.00   sec   491 KBytes  4.02 Mbits/sec    0   54.1 KBytes
[  5]   3.00-4.00   sec   268 MBytes  2.25 Gbits/sec    0    882 KBytes
[  5]   4.00-5.00   sec   865 MBytes  7.26 Gbits/sec    0   1.40 MBytes
[  5]   5.00-6.00   sec   910 MBytes  7.63 Gbits/sec    0   1.61 MBytes
[  5]   6.00-7.00   sec   909 MBytes  7.63 Gbits/sec    0   1.61 MBytes
[  5]   7.00-8.00   sec   906 MBytes  7.60 Gbits/sec    0   1.61 MBytes
[  5]   8.00-9.00   sec   901 MBytes  7.56 Gbits/sec    0   1.61 MBytes
[  5]   9.00-10.00  sec   913 MBytes  7.66 Gbits/sec    0   1.61 MBytes

The test was run from a 12-STABLE client to the patched 14-CURRENT as the server.
Thank you very much for testing! I think the bug reported (the panic) is fixed.

Regarding the performance: I would suggest that you open a new bug regarding unexpected performance results with the BBR stack. Please note that the BBR stack is experimental and has not been tested in a VM environment. Right now I have no idea if there is a problem in the BBR stack or in the hyperv driver, but I would like to get some insights... I'm also not sure what the difference is between the measurements in comment #39 and comment #40.
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=26d79d40a74fc804c76acd88a1f8f10f9827a2b3

commit 26d79d40a74fc804c76acd88a1f8f10f9827a2b3
Author:     Michael Tuexen <tuexen@FreeBSD.org>
AuthorDate: 2021-08-26 17:27:04 +0000
Commit:     Michael Tuexen <tuexen@FreeBSD.org>
CommitDate: 2021-08-26 17:32:00 +0000

    Hyper-V: hn: Enter network epoch when required

    PR: 254695

 sys/dev/hyperv/netvsc/if_hn.c | 8 ++++++++
 1 file changed, 8 insertions(+)
(In reply to commit-hook from comment #42) @tuexen: could you MFC this bugfix to stable/13? I think this bug can then be closed.
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=42cb69e147e30000bd35c97d4212da4540b862dd

commit 42cb69e147e30000bd35c97d4212da4540b862dd
Author:     Michael Tuexen <tuexen@FreeBSD.org>
AuthorDate: 2021-08-26 17:27:04 +0000
Commit:     Michael Tuexen <tuexen@FreeBSD.org>
CommitDate: 2021-12-10 10:50:01 +0000

    Hyper-V: hn: Enter network epoch when required

    PR: 254695

    (cherry picked from commit 26d79d40a74fc804c76acd88a1f8f10f9827a2b3)

 sys/dev/hyperv/netvsc/if_hn.c | 8 ++++++++
 1 file changed, 8 insertions(+)