Bug 222937 - [bhyve] Severe RAM corruption after PciPassThrough-guest shutdown
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 11.1-STABLE
Hardware: Any Any
Importance: --- Affects Some People
Assignee: John Baldwin
URL:
Keywords: patch
Depends on:
Blocks:
 
Reported: 2017-10-11 19:27 UTC by Harald Schmalzbauer
Modified: 2017-11-16 18:28 UTC

See Also:
jhb: mfc-stable11?


Attachments
/etc/rc.d/pciptdetach: Work around RAM corruption at guest shutdown (1.39 KB, application/x-shellscript)
2017-10-11 19:27 UTC, Harald Schmalzbauer

Description Harald Schmalzbauer 2017-10-11 19:27:07 UTC
Created attachment 187086
/etc/rc.d/pciptdetach: Work around RAM corruption at guest shutdown

Shutting down a bhyve guest that uses PCIe passthrough NICs leads to a variety of host panics if certain conditions are met.

This can completely destroy zpools, as I found out after a few dozen less severe crashes...

Quoting jhb@:
I suspect what is happening is that the PCI devices are still issuing DMAs
after the guest has been shutdown which end up trashing other parts of host
memory.  This may somewhat be my fault as I made a change which moves the
device back into the host domain after FLR during guest shutdown.  I should
perhaps leave the device disabled in the DMAR table instead if the FLR
doesn't succeed.  (We could also add some other forms of reset for devices
not supporting FLR.)
</quote>
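
Illustration (not part of the original report): the decision jhb@ describes can be sketched in a few lines of C. This is a minimal userspace model, not the ppt(4) code. The only hardware fact it relies on is that bit 28 of the PCIe Device Capabilities register advertises Function Level Reset support; the macro, enum, and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 28 of the PCIe Device Capabilities register: FLR capability.
 * The macro name is illustrative, not taken from the FreeBSD headers. */
#define DEVCAP_FLR_CAPABLE (UINT32_C(1) << 28)

/* Hypothetical outcomes for a pass-through device after guest shutdown. */
enum post_shutdown_plan {
	PLAN_RETURN_TO_HOST,	/* device was reset, so it is quiescent */
	PLAN_KEEP_DISABLED	/* leave it blocked in the IOMMU/DMAR table */
};

/* True if the device advertises Function Level Reset. */
static bool
supports_flr(uint32_t device_cap)
{
	return ((device_cap & DEVCAP_FLR_CAPABLE) != 0);
}

/*
 * Only treat the device as safe to hand back when an FLR was possible and
 * actually succeeded; otherwise assume it may still be issuing DMA.
 */
static enum post_shutdown_plan
plan_after_shutdown(uint32_t device_cap, bool flr_succeeded)
{
	if (supports_flr(device_cap) && flr_succeeded)
		return (PLAN_RETURN_TO_HOST);
	return (PLAN_KEEP_DISABLED);
}
```

For example, plan_after_shutdown(0, false) yields PLAN_KEEP_DISABLED, which models a NIC without FLR whose DMA engine was never stopped by the guest.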

Since I don't have the skills to help fix the root cause, I wrote a small workaround in the form of an rc(8) script (to be copied to /etc/rc.d). It should protect against accidental crashes and data loss by bringing the PciPassThrough devices down before shutdown, which prevents DMA writes from the card after it is moved back into the host domain.

-harry
Comment 1 Harald Schmalzbauer 2017-10-13 13:43:02 UTC
Quick update:
jhb@ seems to have fixed the root cause.
I'm currently testing the fix and have already reported to him that I can no longer reproduce the issue.  I'm confident that the fix will be committed very soon, at which point the attached script becomes obsolete.
Comment 2 commit-hook freebsd_committer freebsd_triage 2017-10-27 14:57:42 UTC
A commit references this bug:

Author: jhb
Date: Fri Oct 27 14:57:15 UTC 2017
New revision: 325039
URL: https://svnweb.freebsd.org/changeset/base/325039

Log:
  Rework pass through changes in r305485 to be safer.

  Specifically, devices that do not support PCI-e FLR and were not
  gracefully shutdown by the guest OS could continue to issue DMA
  requests after the VM was terminated.  The changes in r305485 meant
  that those DMA requests were completed against the host's memory which
  could result in random memory corruption.  Instead, leave ppt devices
  that are not attached to a VM disabled in the IOMMU and only restore
  the devices to the host domain if the ppt(4) driver is detached from a
  device.

  As an added safety belt, disable busmastering for a pass-through device
  when before adding it to the host domain during ppt(4) detach.

  PR:		222937
  Tested by:	Harry Schmalzbauer <freebsd@omnilan.de>
  Reviewed by:	grehan
  MFC after:	1 week
  Differential Revision:	https://reviews.freebsd.org/D12661

Changes:
  head/sys/amd64/vmm/io/iommu.c
  head/sys/amd64/vmm/io/ppt.c
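
Illustration (not part of the commit): the policy in the log above can be modelled as a small state machine. This is a userspace sketch under stated assumptions, not the code from r325039; all type and function names are hypothetical. The one hardware fact used is that Bus Master Enable is bit 2 (0x0004) of the PCI command register.

```c
#include <stdint.h>

/* Bus Master Enable: bit 2 of the PCI command register. */
#define PCI_CMD_BUS_MASTER UINT16_C(0x0004)

/* Hypothetical model of where the IOMMU lets a device's DMA land. */
enum iommu_domain {
	DOMAIN_NONE,	/* disabled in the IOMMU: DMA is blocked */
	DOMAIN_GUEST,	/* DMA is translated into a VM's memory */
	DOMAIN_HOST	/* DMA reaches host memory */
};

struct ppt_model {
	enum iommu_domain domain;
	uint16_t command;	/* copy of the PCI command register */
};

/* Assigning the device to a VM moves it into the guest domain. */
static void
model_vm_assign(struct ppt_model *d)
{
	d->domain = DOMAIN_GUEST;
}

/* VM teardown: the device stays disabled instead of returning to the host. */
static void
model_vm_unassign(struct ppt_model *d)
{
	d->domain = DOMAIN_NONE;
}

/* Driver detach: clear bus mastering first, then restore the host domain. */
static void
model_ppt_detach(struct ppt_model *d)
{
	d->command &= (uint16_t)~PCI_CMD_BUS_MASTER;
	d->domain = DOMAIN_HOST;
}

/* Stray DMA can hit host memory only in the host domain with bus mastering on. */
static int
model_dma_reaches_host(const struct ppt_model *d)
{
	return (d->domain == DOMAIN_HOST &&
	    (d->command & PCI_CMD_BUS_MASTER) != 0);
}
```

In this model, a device whose guest exited without quiescing it never satisfies model_dma_reaches_host(): it is either blocked (DOMAIN_NONE) or has had bus mastering cleared before entering DOMAIN_HOST.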
Comment 3 Harald Schmalzbauer 2017-10-27 15:11:48 UTC
Thanks a lot John!
Since all my tests were done on stable/11, I strongly vote for an MFC, which the commit log already schedules with a one-week timeframe; this is just to respond to the last comment...  As reported, brave bhyve users have their data in danger ;-)

-harry
Comment 4 Matthew Macy 2017-11-10 23:49:55 UTC
Harald: could you please test the following change to confirm that it doesn't introduce any regressions?
https://github.com/mattmacy/networking/commit/5a12497038b11b55986efc81ffb2f211dec6077a

-M
Comment 5 commit-hook freebsd_committer freebsd_triage 2017-11-16 18:22:18 UTC
A commit references this bug:

Author: jhb
Date: Thu Nov 16 18:22:03 UTC 2017
New revision: 325900
URL: https://svnweb.freebsd.org/changeset/base/325900

Log:
  MFC 325039: Rework pass through changes in r305485 to be safer.

  Specifically, devices that do not support PCI-e FLR and were not
  gracefully shutdown by the guest OS could continue to issue DMA
  requests after the VM was terminated.  The changes in r305485 meant
  that those DMA requests were completed against the host's memory which
  could result in random memory corruption.  Instead, leave ppt devices
  that are not attached to a VM disabled in the IOMMU and only restore
  the devices to the host domain if the ppt(4) driver is detached from a
  device.

  As an added safety belt, disable busmastering for a pass-through device
  when before adding it to the host domain during ppt(4) detach.

  PR:		222937

Changes:
_U  stable/10/
  stable/10/sys/amd64/vmm/io/iommu.c
  stable/10/sys/amd64/vmm/io/ppt.c
_U  stable/11/
  stable/11/sys/amd64/vmm/io/iommu.c
  stable/11/sys/amd64/vmm/io/ppt.c