Observed this on FreeBSD 12.1-RELEASE-p6 but not FreeBSD 12.1-RELEASE. Environment: - VMware Fusion 11.5.3 - FreeBSD 12.1-RELEASE-p6 amd64 VM with vmxnet3 interface Steps to reproduce: 1. Boot VM 2. tcpdump -ni netmap:vmx0 This is repeatable.
Sorry but it can be also reproduced on FreeBSD 12.1-RELEASE. It looks if there's no traffic on interface vmx0, the problem cannot be reproduced following the above steps. Need one more step to reproduced. Steps to reproduce: 1. Boot VM 2. tcpdump -ni netmap:vmx0 3. ctrl + c to stop tcpdump, then repeat step 2
Created attachment 216051 [details] crash dump
Ack. This is also easily reproducible on stable/12, within a QEMU/KVM VM: # ifconfig vmx0 up # pkt-gen -i vmx0 [CRASH] It's a bug related to the DMA code within iflib (on the receive side). I'll try to investigate.
Created attachment 216076 [details] Patch to fix the bug
The patch above fixes the panic for me. Can you please test it on your environment?
(In reply to Vincenzo Maffione from comment #5) Glad to test it :)
(In reply to Vincenzo Maffione from comment #5) It works like a charm after applying the patch. Cheers :)
Thanks for double checking. I will commit the fix soon (https://reviews.freebsd.org/D25541).
A commit references this bug: Author: vmaffione Date: Mon Jul 20 21:08:57 UTC 2020 New revision: 363378 URL: https://svnweb.freebsd.org/changeset/base/363378 Log: iflib: initialize netmap with the correct number of descriptors In case the network device has a RX or TX control queue, the correct number of TX/RX descriptors is contained in the second entry of the isc_ntxd (or isc_nrxd) array, rather than in the first entry. This case is correctly handled by iflib_device_register() and iflib_pseudo_register(), but not by iflib_netmap_attach(). If the first entry is larger than the second, this can result in a panic. This change fixes the bug by introducing two helper functions that also lead to some code simplification. PR: 247647 MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D25541 Changes: head/sys/net/iflib.c
Glad to see the fix was merged into head master :) I'm not familiar with FreeBSD's releasing policy, are there any est or schedule releasing a new patched version?
Once a commit hits master, we wait for a settling period (e.g. 2 weeks) to give other developers and users the chance to hit possible regressions and problems. If nothing shows up, the commit can be merged to the stable/12 branch (and it will be in this case). Once merged to stable/12, you will be able to get it if you download the FreeBSD-12-STABLE iso. Finally, the commit will be automatically included in the next 12.x release. In this case (https://www.freebsd.org/releng/), this is going to be FreeBSD 12.2
(In reply to Vincenzo Maffione from comment #11) Thx :)
^Triage: Assign to committer resolving, set URL to related review, track pending merge (stable/12) flag Thank you Vincenzo!
Created attachment 217017 [details] icmp DUP icmp echo/replies
Created attachment 217018 [details] pcap: tcp dup ACKs pcap: tcp dup ACKs
We tested this patch against 12/STABLE (677f5a8efda). I can also confirm that patch fixes the kernel crash. PINGs are also ok. However, if you try to do secure copy, you start getting DUP packets; and connection stalls. Please see attached PCAP file and the pic. Environment: root@OPNsense:~ # uname -a FreeBSD OPNsense.localdomain 12.1-STABLE FreeBSD 12.1-STABLE #2 677f5a8efda(stable/12)-dirty: Tue Aug 4 14:55:58 PDT 2020 root@client_bsd12:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64
(In reply to Murat from comment #16) > I can also confirm that patch fixes the kernel crash. PINGs are also ok. > However, if you try to do secure copy, you start getting DUP packets; and connection stalls. For > PINGs are also ok , do you mean after the fix there's no duplicated ICMP replies ? For the duplicated TCP acks, there's a known regression https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=236999 . Can you please disable TSO ('ifconfig vmx0 -tso') and test again ?
(In reply to Zhenlei Huang from comment #17) Hi Zhenlei, No, if you just do a ping, all is OK. But, whenever you start an scp transfer (while pinging), you start getting duplicate packets for both ping and scp. See attached files. We disable all offloadings during the tests: ifconfig vmx0 -vlanhwtso -vlanhwfilter -vlanhwtag -vlanhwcsum -txcsum -rxcsum -tso4 -tso6 -lro -txcsum6 -rxcsum6
I must add: This happens while running below command on vmx0, bridging host+hw rings ./bridge -i netmap:vmx0 -i netmap:vmx0^
Hi Murat, Sorry I'm not familiar with such bridge configuration. It seems to be a different issue that masked by this one. CC Vincenzo Maffione.
I'm pretty sure what you see is the effect of a separate bug. It's a regression introduced by porting netmap to iflib. It shows up with those devices that use multiple "free lists" on the receive side. (e.g. isc_nfl > 1). vmx is one of those devices. I still have to find the time to look into that (e.g., figure out how these free lists actually work). Please feel free to open a separate bug, since this one is about the kernel crash, and it's fixed.
Vincenzo, thanks. I'll create another ticket for that. As for the kernel crash, any possibility that this bug was also affecting ix(4). We saw a similar crash with that one too.
Follow-up ticket: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=248494
(In reply to Murat from comment #22) It is possible that the same bug caused a crash also on ix(4). Have you tested if the patch makes the bug go away?
(In reply to Vincenzo Maffione from comment #24) Thanks for the update. Yes, we've confirmed that this fix resolved kernel crash with vmx. I will try to test with ix(4) soon.
A commit references this bug: Author: vmaffione Date: Mon Aug 10 17:53:10 UTC 2020 New revision: 364085 URL: https://svnweb.freebsd.org/changeset/base/364085 Log: MFC r363378 iflib: initialize netmap with the correct number of descriptors In case the network device has a RX or TX control queue, the correct number of TX/RX descriptors is contained in the second entry of the isc_ntxd (or isc_nrxd) array, rather than in the first entry. This case is correctly handled by iflib_device_register() and iflib_pseudo_register(), but not by iflib_netmap_attach(). If the first entry is larger than the second, this can result in a panic. This change fixes the bug by introducing two helper functions that also lead to some code simplification. PR: 247647 Differential Revision: https://reviews.freebsd.org/D25541 Changes: _U stable/12/ stable/12/sys/net/iflib.c
Repeating on FreeBSD 12.2-RELEASE, this bug can not be reproduced. Close as fixed.