Created attachment 201198
Bad boot due to unhandled Spectre v2 MSR
bhyve in FreeBSD 12.0-RELEASE can't run certain Linux guests on certain AMD processors due to an unhandled MSR related to the Spectre v2 mitigation present in modern Linux kernels.
Use of the -w switch lets at least certain guests run properly.
The MSR is 0x49, and the bhyve log shows the guest attempting to set it to 0x1. I've attached logs from a single vCPU configuration attempting to run Debian 9.6.0.
debian9-1vCPU.log is from a run without -w
debian9-1vCPU-ignore-unimplemented-msr.log is from a run with -w
This LKML thread has more information on how it relates to Spectre v2 mitigation:
I don't understand the interaction between the Linux kernel and the CPU as presented to the guest by bhyve, with regard to the microcode. It's possible my host system needs a microcode update delivered by the BIOS or FreeBSD somehow, but I'm out of my depth there.
I'm able to get guests that support the Linux kernel's spectre_v2_user=off kernel boot param to boot OK, without needing -w.
Let me know if I can do more testing. I'm running an AMD Ryzen Threadripper 2990WX with AGESA firmware 220.127.116.11a as part of the BIOS.
Created attachment 201199
Good boot log after using -w switch
This seems to be a good summary:
We might be passing through CPUID bits that we should not be to the guest, at least not without adding that MSR to our emulation list.
I'm not sure how we handle spectre/meltdown representations to guests on Intel.
I don't think guests should be able to set these MSRs, and they probably shouldn't do software mitigation; it's up to the host to mitigate correctly. So maybe we should set whatever bit claims immunity to spectre/meltdown in guest cpuid.
A commit references this bug:
Date: Thu Jan 17 19:44:48 UTC 2019
New revision: 343120
Add definitions for AMD Spectre/Meltdown CPUID information
No functional change, aside from printing recognized bits in CPU
The bits are documented in 111006-B "Indirect Branch Control Extension" and
124441 "Speculative Store Bypass Disable."
Notably missing (left as future work):
* Integration with hw.spec_store_bypass_disable and hw_ssb_active flag,
which are currently Intel-specific
* Integration with hw_ibrs_active global flag, which are currently
* SSB_NO integration in hw_ssb_recalculate()
* Bhyve integration (PR 235010)
PR: 235010 (related, but does not fix)
MFC after: a week
(In reply to Conrad Meyer from comment #2)
You're correct that bhyve does very little to hide the host's CPUID bits from the guest, and this has caused us these types of problems. This only gets worse with the addition of mitigation bits.
What we need is a general mechanism to deal with this that would allow both masking and setting of any of the CPUID bits, something along the lines of (HOST & mask) | force for each of the cpuid values.
This would even give us the ability to change processor model and type.
IIRC this is how VMware implements the "create least common denominator" CPUID across a cluster of servers so that you can do live migration.
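For illustration, the (HOST & mask) | force idea could be sketched in C roughly as below. This is only a sketch: struct cpuid_override and cpuid_apply_override() are hypothetical names made up for this example and are not existing bhyve or vmm(4) interfaces.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical per-leaf override (not an existing bhyve/vmm structure).
 * Index order is eax, ebx, ecx, edx.
 */
struct cpuid_override {
	uint32_t mask[4];	/* host bits that may pass through to the guest */
	uint32_t force[4];	/* bits always set in the guest's view */
};

/* Apply (HOST & mask) | force to each of the four CPUID output registers. */
static inline void
cpuid_apply_override(const struct cpuid_override *ov,
    const uint32_t host[4], uint32_t guest[4])
{
	for (int i = 0; i < 4; i++)
		guest[i] = (host[i] & ov->mask[i]) | ov->force[i];
}
```

A mask of all ones with a force of zero passes the host value through unchanged, and a mask of zero with a non-zero force replaces the register outright, which is what would make overriding processor model and type possible.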
A commit references this bug:
Date: Fri Jan 18 23:54:51 UTC 2019
New revision: 343166
vmm(4): Mask Spectre feature bits on AMD hosts
For parity with Intel hosts, which already mask out the CPUID feature
bits that indicate the presence of the SPEC_CTRL MSR, do the same on
Eventually we may want to have a better support story for guests, but
for now, limit the damage of incorrectly indicating an MSR we do not yet
Eventually, we may want a generic CPUID override system for
administrators, or for minimum supported feature set in heterogenous
environments with failover. That is a much larger scope effort than
this bug fix.
Reported by: Rys Sommefeldt <rys AT sommefeldt.com>
Sponsored by: Dell EMC Isilon
Thanks for the report and MSR access log, Rys! I think I found the reason this works on Intel (the CPUID feature bits where SPEC_CTRL would be indicated are cleared) and not AMD (on AMD, we pass through those feature bits from the host). Both Intel and AMD share the same SPEC_CTRL MSR and we do not implement it on either platform. Please try the committed change; I believe it should fix the issue.
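As a rough illustration of what clearing those feature bits means on AMD, here is a minimal C sketch. It is not the committed code. The macro and function names are hypothetical, and the bit positions (CPUID leaf 0x80000008, EBX bits 12, 14, 15 and 24 for IBPB, IBRS, STIBP and SSBD) are my reading of the AMD documents cited in r343120, so please check them against those documents.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed bit positions in CPUID 0x80000008 EBX on AMD.
 * The macro names are hypothetical, chosen for this example only.
 */
#define	EX_AMD_IBPB	(1u << 12)	/* indirect branch prediction barrier */
#define	EX_AMD_IBRS	(1u << 14)	/* indirect branch restricted speculation */
#define	EX_AMD_STIBP	(1u << 15)	/* single thread indirect branch predictors */
#define	EX_AMD_SSBD	(1u << 24)	/* speculative store bypass disable */

#define	EX_AMD_SPEC_BITS \
	(EX_AMD_IBPB | EX_AMD_IBRS | EX_AMD_STIBP | EX_AMD_SSBD)

/*
 * Return the EBX value a guest would see: the host value with the
 * speculation-control feature bits cleared, so the guest does not
 * try to use MSRs that the hypervisor does not emulate.
 */
static inline uint32_t
ex_guest_amd_ext_ebx(uint32_t host_ebx)
{
	return (host_ebx & ~EX_AMD_SPEC_BITS);
}
```

With the bits hidden, a Linux guest should conclude the mitigation MSRs are absent and skip the write seen in the attached log.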
I patched my 12.0-RELEASE kernel with 343120 and 343166, and now modern Linux guests boot without needing -w in bhyve (ignore_bad_msr="yes" in vm-bhyve) in both uniprocessor and multiprocessor vCPU guest configs on my platform.
Thanks, Conrad! I'll file a new bug if anything else shows up, and reference this one if needed.