Bug 235010 - bhyve: Linux guest crash due to unhandled MSR
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 12.0-RELEASE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: freebsd-virtualization mailing list
Keywords: bhyve
Depends on:
Reported: 2019-01-16 20:46 UTC by Rys Sommefeldt
Modified: 2019-01-20 10:23 UTC
CC List: 4 users


Bad boot due to unhandled Spectre v2 MSR (160 bytes, text/plain)
2019-01-16 20:46 UTC, Rys Sommefeldt
Good boot log after using -w switch (47.90 KB, text/plain)
2019-01-16 20:47 UTC, Rys Sommefeldt

Description Rys Sommefeldt 2019-01-16 20:46:17 UTC
Created attachment 201198
Bad boot due to unhandled Spectre v2 MSR

bhyve in FreeBSD 12.0-RELEASE can't run certain Linux guests on certain AMD processors due to an unhandled MSR related to the Spectre v2 mitigation present in modern Linux kernels.

Use of the -w switch lets at least certain guests run properly.

The MSR is 0x49, and the bhyve log shows the guest attempting to set it to 0x1. I've attached logs from a single vCPU configuration attempting to run Debian 9.6.0.
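
For anyone decoding the log, here is a minimal sketch. It assumes the register numbering published in the Intel and AMD architecture manuals, where 0x48 is IA32_SPEC_CTRL and 0x49 is IA32_PRED_CMD, and where bit 0 of IA32_PRED_CMD requests an Indirect Branch Prediction Barrier (IBPB). The helper name describe_wrmsr is hypothetical and is not part of bhyve.

```python
# Speculation-control MSR numbers as given in the vendor manuals.
# The table and helper below are illustrative only, not bhyve code.
MSR_NAMES = {
    0x48: "IA32_SPEC_CTRL",
    0x49: "IA32_PRED_CMD",
}

# Bit 0 of IA32_PRED_CMD requests an Indirect Branch Prediction Barrier.
PRED_CMD_IBPB = 1 << 0


def describe_wrmsr(msr: int, value: int) -> str:
    """Render a guest MSR write in human-readable form."""
    name = MSR_NAMES.get(msr, "unknown")
    text = f"wrmsr {msr:#x} ({name}) <- {value:#x}"
    if msr == 0x49 and value & PRED_CMD_IBPB:
        text += " [IBPB]"
    return text


print(describe_wrmsr(0x49, 0x1))
```

Read this way, the write of 0x1 to 0x49 in the attached log looks like the guest issuing an IBPB as part of its Spectre v2 mitigation.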

debian9-1vCPU.log is from a run without -w
debian9-1vCPU-ignore-unimplemented-msr.log is from a run with -w

This LKML thread has more information on how it relates to Spectre v2 mitigation:


I don't understand the interaction between the Linux kernel and the CPU as presented to the guest by bhyve, with regards to the microcode. It's possible my host system needs a microcode update delivered by the BIOS or FreeBSD somehow, but I'm out of my depth there.

I'm able to get guests that support the Linux kernel's spectre_v2_user=off kernel boot param to boot OK, without needing -w.

Let me know if I can do more testing. I'm running an AMD Ryzen Threadripper 2990WX with AGESA firmware as part of the BIOS.
Comment 1 Rys Sommefeldt 2019-01-16 20:47:28 UTC
Created attachment 201199
Good boot log after using -w switch
Comment 2 Conrad Meyer freebsd_committer 2019-01-16 23:35:37 UTC
This seems to be a good summary:


We might be passing through CPUID bits that we should not be to the guest, at least not without adding that MSR to our emulation list.

I'm not sure how we handle spectre/meltdown representations to guests on Intel.

I don't think guests should be able to set these MSRs, and they probably shouldn't do software mitigation -- it's up to the host to mitigate correctly. So maybe we should set whatever bit claims immunity to spectre/meltdown in guest cpuid.
Comment 3 commit-hook freebsd_committer 2019-01-17 19:45:46 UTC
A commit references this bug:

Author: cem
Date: Thu Jan 17 19:44:48 UTC 2019
New revision: 343120
URL: https://svnweb.freebsd.org/changeset/base/343120

  Add definitions for AMD Spectre/Meltdown CPUID information

  No functional change, aside from printing recognized bits in CPU

  The bits are documented in 111006-B "Indirect Branch Control Extension"[1] and
  124441 "Speculative Store Bypass Disable."[2]

  Notably missing (left as future work):
    * Integration with hw.spec_store_bypass_disable and hw_ssb_active flag,
      which are currently Intel-specific
    * Integration with hw_ibrs_active global flag, which are currently
      Intel-specific
    * SSB_NO integration in hw_ssb_recalculate()
    * Bhyve integration (PR 235010)



  PR:		235010 (related, but does not fix)
  MFC after:	a week

Comment 4 Rodney W. Grimes freebsd_committer 2019-01-18 15:06:40 UTC
(In reply to Conrad Meyer from comment #2)
You're correct that bhyve does very little to hide the host's CPUID bits from the guest, and this has caused us these types of problems. This only gets worse with the addition of mitigation bits.

What we need is a general mechanism to deal with this that would allow both masking and setting of any of the CPUID bits, something along the lines of (HOST & mask) | force for each of the cpuid values.
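
A minimal sketch of that (HOST & mask) | force idea, applied to a single 32-bit CPUID register. The function name and the example values are hypothetical and only illustrate the arithmetic; this is not an existing bhyve or vmm(4) interface.

```python
def apply_cpuid_override(host: int, mask: int, force: int) -> int:
    """Compute the guest-visible value of one 32-bit CPUID register.

    mask:  bits set to 1 are passed through from the host, 0 hides them.
    force: bits set to 1 are always reported to the guest.
    """
    return ((host & mask) | force) & 0xFFFFFFFF


# Hypothetical 4-bit example, chosen only to show both operations.
host = 0b1011   # what the physical CPU reports
mask = 0b0111   # hide bit 3 from the guest
force = 0b0100  # always report bit 2 to the guest

guest = apply_cpuid_override(host, mask, force)
print(f"host={host:#06b} guest={guest:#06b}")
```

Applying force after mask means a forced bit wins even if the mask would have hidden it, which is what an administrator override would usually need.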

This would even give us the ability to change processor model and type.

IIRC this is how VMware implements the "create least common denominator" CPUID across a cluster of servers so that you can do live migration.
Comment 5 commit-hook freebsd_committer 2019-01-18 23:55:38 UTC
A commit references this bug:

Author: cem
Date: Fri Jan 18 23:54:51 UTC 2019
New revision: 343166
URL: https://svnweb.freebsd.org/changeset/base/343166

  vmm(4): Mask Spectre feature bits on AMD hosts

  For parity with Intel hosts, which already mask out the CPUID feature
  bits that indicate the presence of the SPEC_CTRL MSR, do the same on
  AMD.

  Eventually we may want to have a better support story for guests, but
  for now, limit the damage of incorrectly indicating an MSR we do not yet
  support.

  Eventually, we may want a generic CPUID override system for
  administrators, or for minimum supported feature set in heterogenous
  environments with failover.  That is a much larger scope effort than
  this bug fix.

  PR:		235010
  Reported by:	Rys Sommefeldt <rys AT sommefeldt.com>
  Sponsored by:	Dell EMC Isilon

Comment 6 Conrad Meyer freebsd_committer 2019-01-18 23:56:47 UTC
Thanks for the report and MSR access log, Rys! I think I found the reason this works on Intel (the CPUID feature bits where SPEC_CTRL would be indicated are cleared) and not AMD (on AMD, we pass through those feature bits from the host). Both Intel and AMD share the same SPEC_CTRL MSR and we do not implement it on either platform. Please try the committed change; I believe it should fix the issue.
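
A rough sketch of the effect described above. It assumes the AMD bit positions in CPUID leaf 0x80000008 EBX (IBPB = bit 12, IBRS = bit 14, STIBP = bit 15, SSBD = bit 24). The function names and the sample host value are hypothetical, and this is not the code from r343166.

```python
# AMD speculation-control feature bits in CPUID 0x80000008, register EBX.
# Bit positions are taken from AMD's public documentation; the names
# below are illustrative and do not match the FreeBSD headers.
AMD_IBPB = 1 << 12
AMD_IBRS = 1 << 14
AMD_STIBP = 1 << 15
AMD_SSBD = 1 << 24
SPEC_BITS = AMD_IBPB | AMD_IBRS | AMD_STIBP | AMD_SSBD


def mask_spectre_bits(host_ebx: int) -> int:
    """Hide the speculation-control feature bits from the guest."""
    return host_ebx & ~SPEC_BITS & 0xFFFFFFFF


def advertises_spec_msrs(ebx: int) -> bool:
    """True if a guest reading this value may try to use the MSRs."""
    return bool(ebx & SPEC_BITS)


# Hypothetical host value: all four bits above set, plus bit 0.
host_ebx = 0x0100D001
guest_ebx = mask_spectre_bits(host_ebx)

print(advertises_spec_msrs(host_ebx), advertises_spec_msrs(guest_ebx))
```

With the bits hidden, a guest kernel has no indication that the MSRs exist, so it should not attempt the write that previously killed the VM.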
Comment 7 Rys Sommefeldt 2019-01-20 10:23:53 UTC
I patched my 12.0-RELEASE kernel with 343120 and 343166, and now modern Linux guests boot without needing -w in bhyve (ignore_bad_msr="yes" in vm-bhyve). This works in both uniprocessor and multiprocessor vCPU guest configs on my platform.

Thanks, Conrad! I'll file a new bug if anything else shows up, and reference this one if needed.