Created attachment 221304 [details]
core.txt

After an upgrade to the most recent 12.2-STABLE, a system installed on ESXi panicked with:

[9] vmci0: <VMware Virtual Machine Communication Interface> port 0x1080-0x10bf irq 16 at device 7.7 on pci0
<3>[9] vmci: Could not map: BAR1
<3>[9] vmci: Failed to map PCI BARs.
<4>[9] vmci: Failed to unsubscribe to event (type=0) with subscriber (ID=0xffffffff).
[9] device_attach: vmci0 attach returned 6
[9] vmci0: <VMware Virtual Machine Communication Interface> port 0x1080-0x10bf irq 16 at device 7.7 on pci0
<3>[9] vmci: Could not map: BAR1
<3>[9] vmci: Failed to map PCI BARs.
[9]
[9]
[9] Fatal trap 12: page fault while in kernel mode
[9] cpuid = 3; apic id = 03
[9] fault virtual address = 0x410
[9] fault code = supervisor read data, page not present
[9] instruction pointer = 0x20:0xffffffff80b822a6
[9] stack pointer = 0x28:0xfffffe0035170600
[9] frame pointer = 0x28:0xfffffe0035170680
[9] code segment = base rx0, limit 0xfffff, type 0x1b
[9] = DPL 0, pres 1, long 1, def32 0, gran 1
[9] processor eflags = interrupt enabled, resume, IOPL = 0
[9] current process = 198 (devctl)
[9] trap number = 12
[9] panic: page fault

The kernel from 12.2-STABLE r367922, built in November 2020, works fine.
After reverting the suspicious commit 6338833c50a7566d006b722c791a6a92071309b8, everything works fine again.

https://cgit.freebsd.org/src/commit/sys/dev/vmware/vmci/vmci.c?h=stable/12&id=6338833c50a7566d006b722c791a6a92071309b8

I am not able to report the revision, but it's still the latest 12.2-STABLE.
^Triage: assign, but notify reviewer of DR as well.
Could you please show the backtrace? Presumably that commit is triggering the problem because it's causing a driver to be loaded when it was not being loaded before, so it's just exposing a driver bug.
(In reply to Mark Johnston from comment #3)
Doesn't the attached core.txt include a backtrace? It's the standard postmortem analysis done by the /etc/rc.d/savecore script. If it's not enough, please let me know and I will try to examine the core file with lldb.
Created attachment 221331 [details]
Untested patch for vmci_qp_guest_endpoints_exit

Based on the core.txt stack trace, try the (untested) attached patch. Note: there may be secondary cleanup issues past this one. Also, ESXi 4.1.0 has been EOL for some time, so debugging the actual PCI mapping issue is likely not as useful.
(In reply to Mark Peek from comment #5)
Thanks for the patch, but I am not able to build the kernel with it applied:

Building /usr/obj/usr/src/amd64.amd64/sys/VBSD/modules/usr/src/sys/modules/vmware/vmci/vmci_queue_pair.o
/usr/src/sys/dev/vmware/vmci/vmci_queue_pair.c:341:6: error: invalid argument type 'vmci_mutex' (aka 'struct mtx') to unary expression
        if (!qp_guest_endpoints.mutex)
            ^~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
*** Error code 1

Stop.
make[5]: stopped in /usr/src/sys/modules/vmware/vmci
(In reply to Marek Zarychta from comment #6)
> if (!qp_guest_endpoints.mutex)

Maybe:
if (qp_guest_endpoints.mutex == NULL)
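A note on the compile error above: the compiler reports that 'vmci_mutex' is a 'struct mtx' held by value, so neither `!` nor a comparison against NULL is a valid test on it in C. In the FreeBSD kernel, mtx_initialized(9) is the usual way to ask whether a mutex has been set up. The following is only a userland sketch of that guard pattern, not the driver's code: every `demo_*` name is hypothetical, and a plain boolean flag stands in for the kernel mutex state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a kernel mutex: a struct stored by value. */
struct demo_mutex {
	bool initialized;	/* true between init and destroy */
};

/* Hypothetical stand-in for a driver-global endpoint list. */
struct demo_endpoints {
	struct demo_mutex mutex;	/* embedded struct: `!ep.mutex` would not compile */
	int entries;
};

static void
demo_mutex_init(struct demo_mutex *m)
{
	m->initialized = true;
}

static void
demo_mutex_destroy(struct demo_mutex *m)
{
	/* Destroying a lock that was never set up is the bug being guarded against. */
	assert(m->initialized);
	m->initialized = false;
}

/* Assumed analogue of mtx_initialized(9): query the state through a pointer. */
static bool
demo_mutex_initialized(const struct demo_mutex *m)
{
	return (m->initialized);
}

/* Tear down the list only if setup got far enough to create the lock. */
static bool
demo_endpoints_exit(struct demo_endpoints *ep)
{
	if (!demo_mutex_initialized(&ep->mutex))
		return (false);		/* nothing was set up, nothing to free */
	ep->entries = 0;
	demo_mutex_destroy(&ep->mutex);
	return (true);
}
```

Calling `demo_endpoints_exit()` on a zero-initialized `struct demo_endpoints` returns false without touching the lock, which is the behaviour wanted when attach fails before the endpoints were ever initialized.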
Created attachment 221369 [details]
Proposed patch for handling vmci pci errors

Loaded up stable/12, simulated the error, and worked through the error cases. Give this latest patch a try.
(In reply to Mark Peek from comment #8)
Thank you for the patch. It solves the issue; the panics are gone. Here is the relevant excerpt of the dmesg:

[10] VMware memory control driver initialized
[10] intsmb0: <Intel PIIX4 SMBUS Interface> port 0x1040-0x104f at device 7.3 on pci0
[10] intsmb0: intr SMI disabled revision 0
[10] smbus0: <System Management Bus> on intsmb0
[10] vmci0: <VMware Virtual Machine Communication Interface> port 0x1080-0x10bf irq 16 at device 7.7 on pci0
[10] vmci: Could not map: BAR1
[10] vmci: Failed to map PCI BARs.
[10] vmci: Failed to unsubscribe to event (type=0) with subscriber (ID=0xffffffff).
[10] device_attach: vmci0 attach returned 6
[10] vmci0: <VMware Virtual Machine Communication Interface> port 0x1080-0x10bf irq 16 at device 7.7 on pci0
[10] vmci: Could not map: BAR1
[10] vmci: Failed to map PCI BARs.
[10] vmci: Failed to unsubscribe to event (type=0) with subscriber (ID=0xffffffff).
[10] device_attach: vmci0 attach returned 6
So far the patch seems to be a reliable solution. Is there anything preventing it from being committed to HEAD? Should we expect it to be MFCed to stable/12 or committed directly to that branch, or should we cope with it on our own? Anyway, thanks again for providing this patch, which solves the issue perfectly.
(In reply to Mark Peek from comment #8) Mark, is there any reason not to commit this? 13.0 is going to be branched in the next day or so, so it'd be nice to get this in.
I encountered this in 12.2-STABLE on ESXi 6.5.0. Patch applied and resolved the issue.

FreeBSD xxx@yyy.com 12.2-STABLE FreeBSD 12.2-STABLE #6 r369256M: Fri Feb 12 17:17:35 CST 2021 root@yyy.com:/usr/obj/usr/src/i386.i386/sys/GENERIC i386
If the patch is not applied, a quick workaround is to do some cleanup after make installkernel:

rm /boot/kernel/vmci.ko

Adding "WITHOUT_MODULES = vmci" to /etc/make.conf doesn't help much.
Confirmed this patch worked for me on stable/12 on ESXi...so old I don't really want to tell you (hint: it's still called ESX). Thanks for running this down.
I won't be able to help with this case anymore, since I moved the affected machine from the EOLed ESXi to bhyve. After the upgrade to stable/13, even more issues arose, affecting both the stability and the reliability of the VM (mostly due to problems with timecounter keeping and the PF state table overflowing because of it). The workaround for this bug is very simple, but the bug persists, so keeping the PR open might still be useful. I leave the decision about closing it to the Committers/Triagers team.
Bumped into this problem after upgrading to 13.0, which is quite unpleasant. Any progress on this?
Bumping the PR again. What prevents getting it in?
(In reply to Gleb Popov from comment #17) This is my fault. It dropped off my radar. I have an updated patch for -current (really for INVARIANTS turned on) that I need to validate on a new build and then put it out for a quick review. I'll get that prioritized in the next day or two.
I'm having the same issue when upgrading from FreeBSD 12.2 to 13.0 on VMware ESXi 6.0.0. My workaround was:

# mv /boot/kernel/vmci.ko /boot/kernel/vmci.koNOTUSED

Upgrading from FreeBSD 12.2 to 13.0 on VMware ESXi 6.5.0 has not been a problem.
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=0f14bcbe384091c729464cb770372aeb79061070

commit 0f14bcbe384091c729464cb770372aeb79061070
Author:     Mark Peek <mp@FreeBSD.org>
AuthorDate: 2021-10-09 21:21:16 +0000
Commit:     Mark Peek <mp@FreeBSD.org>
CommitDate: 2021-10-09 21:21:16 +0000

    vmci: fix panic due to freeing unallocated resources

    Summary:
    An error mapping PCI resources results in a panic due to unallocated
    resources being freed up. This change puts the appropriate checks in
    place to prevent the panic.

    PR:             252445
    Reported by:    Marek Zarychta <zarychtam@plan-b.pwste.edu.pl>
    Tested by:      marcus
    MFC after:      1 week
    Sponsored by:   VMware

    Test Plan:
    Along with user testing, also simulated error by inserting a ENXIO
    return in vmci_map_bars().

    Reviewed by:    marcus
    Subscribers:    imp
    Differential Revision: https://reviews.freebsd.org/D32016

 sys/dev/vmware/vmci/vmci.c            |  9 ++++---
 sys/dev/vmware/vmci/vmci_event.c      |  3 +++
 sys/dev/vmware/vmci/vmci_kernel_if.c  | 48 ++++++++++++++++++++++++++++++++++-
 sys/dev/vmware/vmci/vmci_kernel_if.h  |  2 ++
 sys/dev/vmware/vmci/vmci_queue_pair.c |  3 +++
 5 files changed, 61 insertions(+), 4 deletions(-)
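For readers following along, the failure mode named in this commit message (cleanup code releasing resources that attach never allocated) and the test plan (forcing an ENXIO from the BAR mapping step) can be sketched in userland C. This is an illustration under stated assumptions, not the committed change: all `demo_*` names are hypothetical, `malloc()` stands in for PCI resource allocation, and the `demo_fail_bar1` switch stands in for the injected ENXIO. The "attach returned 6" seen in the dmesg output earlier in this PR is errno 6, which is ENXIO.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical softc: every resource stays NULL/false until really acquired. */
struct demo_softc {
	void *bar0;
	void *bar1;
	bool events_subscribed;
	int released;		/* counts actual releases, for inspection only */
};

/* Fault-injection switch: stand-in for a forced ENXIO in the mapping step. */
static bool demo_fail_bar1 = false;

static int
demo_map_bars(struct demo_softc *sc)
{
	sc->bar0 = malloc(16);
	if (sc->bar0 == NULL)
		return (ENOMEM);
	if (demo_fail_bar1)
		return (ENXIO);		/* BAR1 could not be mapped */
	sc->bar1 = malloc(16);
	if (sc->bar1 == NULL)
		return (ENOMEM);
	return (0);
}

/* Release only what was acquired; safe after a partial attach, safe to repeat. */
static void
demo_detach(struct demo_softc *sc)
{
	if (sc->events_subscribed) {
		sc->events_subscribed = false;
		sc->released++;
	}
	if (sc->bar1 != NULL) {
		free(sc->bar1);
		sc->bar1 = NULL;
		sc->released++;
	}
	if (sc->bar0 != NULL) {
		free(sc->bar0);
		sc->bar0 = NULL;
		sc->released++;
	}
}

static int
demo_attach(struct demo_softc *sc)
{
	int error;

	assert(sc != NULL);
	error = demo_map_bars(sc);
	if (error != 0) {
		demo_detach(sc);	/* unwinds only the pieces that exist */
		return (error);
	}
	sc->events_subscribed = true;
	return (0);
}
```

With `demo_fail_bar1` set, `demo_attach()` on a zeroed softc returns ENXIO and releases only `bar0`; a second `demo_detach()` afterwards is a no-op. Without the per-resource checks, the same path would try to release `bar1` and the event subscription, neither of which was ever set up.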
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=4e5c1be4202a141b7a15c505848abcbea535912f

commit 4e5c1be4202a141b7a15c505848abcbea535912f
Author:     Mark Peek <mp@FreeBSD.org>
AuthorDate: 2021-10-09 21:21:16 +0000
Commit:     Mark Peek <mp@FreeBSD.org>
CommitDate: 2021-10-16 18:22:43 +0000

    vmci: fix panic due to freeing unallocated resources

    Summary:
    An error mapping PCI resources results in a panic due to unallocated
    resources being freed up. This change puts the appropriate checks in
    place to prevent the panic.

    PR:             252445
    Reported by:    Marek Zarychta <zarychtam@plan-b.pwste.edu.pl>
    Tested by:      marcus
    MFC after:      1 week
    Sponsored by:   VMware

    Test Plan:
    Along with user testing, also simulated error by inserting a ENXIO
    return in vmci_map_bars().

    Reviewed by:    marcus
    Subscribers:    imp
    Differential Revision: https://reviews.freebsd.org/D32016

    (cherry picked from commit 0f14bcbe384091c729464cb770372aeb79061070)

 sys/dev/vmware/vmci/vmci.c            |  9 ++++---
 sys/dev/vmware/vmci/vmci_event.c      |  3 +++
 sys/dev/vmware/vmci/vmci_kernel_if.c  | 48 ++++++++++++++++++++++++++++++++++-
 sys/dev/vmware/vmci/vmci_kernel_if.h  |  2 ++
 sys/dev/vmware/vmci/vmci_queue_pair.c |  3 +++
 5 files changed, 61 insertions(+), 4 deletions(-)
Will this change end up in a security update, so that `freebsd-update upgrade -r 13.0-RELEASE` would work?
A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=b5d236785dc352a65bc29d97c8a89b40387eb7a0

commit b5d236785dc352a65bc29d97c8a89b40387eb7a0
Author:     Mark Peek <mp@FreeBSD.org>
AuthorDate: 2021-10-09 21:21:16 +0000
Commit:     Mark Peek <mp@FreeBSD.org>
CommitDate: 2021-10-17 15:31:53 +0000

    vmci: fix panic due to freeing unallocated resources

    Summary:
    An error mapping PCI resources results in a panic due to unallocated
    resources being freed up. This change puts the appropriate checks in
    place to prevent the panic.

    PR:             252445
    Reported by:    Marek Zarychta <zarychtam@plan-b.pwste.edu.pl>
    Tested by:      marcus
    MFC after:      1 week
    Sponsored by:   VMware

    Test Plan:
    Along with user testing, also simulated error by inserting a ENXIO
    return in vmci_map_bars().

    Reviewed by:    marcus
    Subscribers:    imp
    Differential Revision: https://reviews.freebsd.org/D32016

    (cherry picked from commit 0f14bcbe384091c729464cb770372aeb79061070)

 sys/dev/vmware/vmci/vmci.c            |  9 ++++---
 sys/dev/vmware/vmci/vmci_event.c      |  3 +++
 sys/dev/vmware/vmci/vmci_kernel_if.c  | 48 ++++++++++++++++++++++++++++++++++-
 sys/dev/vmware/vmci/vmci_kernel_if.h  |  2 ++
 sys/dev/vmware/vmci/vmci_queue_pair.c |  3 +++
 5 files changed, 61 insertions(+), 4 deletions(-)
(In reply to Gleb Popov from comment #22) Could you drop a note to secteam@ requesting this? Ideally a copy of https://www.freebsd.org/security/errata-template.txt would be filled out with some details of the problem but a pointer to this PR is probably sufficient.
Finally, it even found its way to errata notices! Thank you again for fixing this nasty bug. I am closing as completely resolved.