Bug 208238 - [Hyper-V] TSC frequency is not correctly detected: "calcru: runtime went backwards"
Summary: [Hyper-V] TSC frequency is not correctly detected: "calcru: runtime went back...
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: CURRENT
Hardware: Any Any
: --- Affects Some People
Assignee: freebsd-virtualization (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-03-23 12:54 UTC by Dexuan Cui
Modified: 2016-06-30 08:58 UTC (History)
3 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Dexuan Cui 2016-03-23 12:54:10 UTC
I can easily get a large number of messages in the system log similar to this:

# dmesg | grep backwards
calcru: runtime went backwards from 5188 usec to 2727 usec for pid 770 (cron)
calcru: runtime went backwards from 2944 usec to 1488 usec for pid 766 (sendmail)
calcru: runtime went backwards from 2497 usec to 1262 usec for pid 766 (sendmail)


It looks this is caused by the wrongly-detected TSC frequency and I made this patch to address this:

https://github.com/dcui/freebsd/commit/07acbd8a3771adba9ac0fca807ec629ff05a807c

"
use hyper-v time counter (rather than i8254) to calibrate TSC
The i8254 PIT counter emulated by Hyper-V is not reliable.

In probe_tsc_freq() -> DELAY -> init_ops.early_delay() -> i8254_delay(),
getit() can return these values when it is invoked 10 times:
  1 pit count0 = 228
  2 pit count0 = 131
  3 pit count0 = 34
  4 pit count0 = 65473 <-- this is a normal wrap-around.
  5 pit count0 = 65375
  6 pit count0 = 65278
  7 pit count0 = 65180
  8 pit count0 = 65388 <-- this is bad!
  9 pit count0 = 65290
 10 pit count0 = 65193

For the 8th time, the 'delta' in i8254_delay() is < 0 while
it shouldn't, so the later "delta += i8254_max_count;" and
"ticks_left -= delta" will cause i8254_delay() to wait shorter than
expected, and finally probe_tsc_freq() get a smaller 'tsc_freq'.

A smaller 'tsc_freq' can cause time inaccuracy in dtrace.
It can also cause warnings like

calcru: runtime went backwards from 50 usec to 25 usec for pid 0 (kernel)
calcru: runtime went backwards from 1471 usec to 743 usec for pid 0 (kernel)
calcru: runtime went backwards from 40 usec to 20 usec for pid 0 (kernel)
calcru: runtime went backwards from 18 usec to 9 usec for pid 0 (kernel)
calcru: runtime went backwards from 46204978 usec to 23362331 usec for pid 0 (kernel)

We use Hyper-V time counter, which is much more reliable than i8254,
to calibrate TSC.
"
Comment 1 Dexuan Cui 2016-04-06 02:20:12 UTC
Fixed in 11-CURRENT by:
https://github.com/freebsd/freebsd/commit/cff47489671a6ec6470f706f530df99c158511b0
Comment 2 Dexuan Cui 2016-04-12 08:34:12 UTC
The tsc-calibration fix was temporarily taken back last Friday because we found an interesting thing: with some kind of config, Xen can pretend to be Hyper-V by mimicking the hypervisor CPUID signature (this is usually to make Windows VM run better on Xen), but Xen doesn’t support Hyper-V timecounter, so the fix causes an unexpected panic when FreeBSD VM runs on Xen in this case.

We're making another patch to better detect if the underlying hypervisor supports Hyper-V timecounter or not:
https://lists.freebsd.org/pipermail/freebsd-current/2016-April/060519.html