I am trying to use the latest FreeBSD 13 Release AMIs on EC2 and by default they start up with: kern.timecounter.hardware: TSC-low which results in a very slow clock. I did these rudimentary tests, hopefully demonstrating the problem. (all date commands were executed one minute apart after switching the timecounter. root@buildbot_master: # sysctl kern.timecounter.hardware=i8254 kern.timecounter.hardware: TSC-low -> i8254 root@buildbot_master: # date Tue Jun 22 18:22:45 UTC 2021 root@buildbot_master: # date Tue Jun 22 18:23:46 UTC 2021 root@buildbot_master: # sysctl kern.timecounter.hardware=ACPI-fast kern.timecounter.hardware: i8254 -> ACPI-fast root@buildbot_master: # date Tue Jun 22 18:24:46 UTC 2021 root@buildbot_master: # date Tue Jun 22 18:25:45 UTC 2021 root@buildbot_master: # sysctl kern.timecounter.hardware=HPET kern.timecounter.hardware: ACPI-fast -> HPET root@buildbot_master: # date Tue Jun 22 18:26:45 UTC 2021 root@buildbot_master: # date Tue Jun 22 18:27:45 UTC 2021 root@buildbot_master:/var/service-dir # sysctl kern.timecounter.hardware=TSC-low kern.timecounter.hardware: HPET -> TSC-low root@buildbot_master: # date Tue Jun 22 18:28:16 UTC 2021 root@buildbot_master: # date Tue Jun 22 18:28:43 UTC 2021 This is the uname of the machine and if you need any additional info I'd be happy to provide it: root@buildbot_master: # uname -a FreeBSD buildbot_master 13.0-RELEASE-p1 FreeBSD 13.0-RELEASE-p1 #0: Wed May 26 22:15:09 UTC 2021 root@amd64-builder.daemonology.net:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64
^Triage: affects AWS instances, so assign.
@Reporter Could you include /var/run/dmesg.boot as an attachment please
Exactly which instance type is this?
Created attachment 226069 [details] dmesg.boot
(In reply to Colin Percival from comment #3) I get this behavior with t3 and t3a instances.
It's possible that this is a problem with "stolen" CPU time. Martin: Is the EC2 "CPU credit balance" positive on that instance?
Created attachment 226072 [details] cpu utilization and credit balance graphs
(In reply to Colin Percival from comment #6) Yes, it is positive, the instance isn't very loaded, and also the same thing happens even if I just let it sit after it starts up. By the time I am able to ssh in when the instance starts the clock is already too far behind to fetch SSM parameters for example... On the other hand if I use configinit to change the timecounter to any of the other options, the time is accurate enough.
Hmm, very strange. I can't reproduce this. Which EC2 region are you in? Have you tried other regions? It's possible that AWS is rolling out a change which breaks things.
I am using the us-east-1 region.
Ok, that's where I was testing too... have you tried all the different AZs?
I think all my tests were in us-east-1a, I don't think I have tested other AZs
Can you try other AZs, and also tell me if running something CPU intensive (say, 'dd if=/dev/random | sha256') changes how fast the clock runs?
Also, what does `sysctl kern.timecounter` say?
Created attachment 226092 [details] terraform testcase
(In reply to Colin Percival from comment #13) I created a terraform file that launches an instance with this behavior in any region (provided you update the region and the AMI). It seems that if you choose standard cpu_credits mode when launching a t3/t3a instance, TSC-low is 50% slower, while if you choose unlimited - everything works fine (at least in my tests...).
(In reply to Colin Percival from comment #14) ec2-user@freebsd:~ $ sysctl kern.timecounter kern.timecounter.tsc_shift: 1 kern.timecounter.smp_tsc_adjust: 0 kern.timecounter.smp_tsc: 1 kern.timecounter.invariant_tsc: 1 kern.timecounter.fast_gettime: 1 kern.timecounter.tick: 1 kern.timecounter.choice: i8254(0) ACPI-fast(900) HPET(950) TSC-low(1000) dummy(-1000000) kern.timecounter.hardware: TSC-low kern.timecounter.alloweddeviation: 5 kern.timecounter.timehands_count: 2 kern.timecounter.stepwarnings: 0 kern.timecounter.tc.i8254.quality: 0 kern.timecounter.tc.i8254.frequency: 1193182 kern.timecounter.tc.i8254.counter: 5840 kern.timecounter.tc.i8254.mask: 65535 kern.timecounter.tc.ACPI-fast.quality: 900 kern.timecounter.tc.ACPI-fast.frequency: 3579545 kern.timecounter.tc.ACPI-fast.counter: 15311745 kern.timecounter.tc.ACPI-fast.mask: 16777215 kern.timecounter.tc.HPET.quality: 950 kern.timecounter.tc.HPET.frequency: 62500000 kern.timecounter.tc.HPET.counter: 1237447075 kern.timecounter.tc.HPET.mask: 4294967295 kern.timecounter.tc.TSC-low.quality: 1000 kern.timecounter.tc.TSC-low.frequency: 1426718697 kern.timecounter.tc.TSC-low.counter: 3076039010 kern.timecounter.tc.TSC-low.mask: 4294967295
Thanks for all the details. After some investigation, it looks like there's a specific issue with TSC on T3/T3A instances which affects the clock calibration. In the long term we should switch over to using the KVM pvclock (which won't have this problem) but that code isn't in the tree yet; for 13.0 I don't think there's any good solution.
Can you try to set kern.timecounter.smp_tsc_adjust=1 when using TSC-low and see if that makes a difference? So this system reports invariant TSC, but the TSCs are not synchronized across CPUs?
(In reply to Roger Pau Monné from comment #19) > So this system reports invariant TSC, but the TSCs are not synchronized across CPUs? My understanding is that it's even worse than that: The TSC is not synchronized across a *single* CPU, since we're a VM guest and what we see as a single CPU might move between host CPUs which aren't synchronized.
(In reply to Colin Percival from comment #20) That sounds like a bug in their implementation? As the hypervisor should keep the TSC of the virtual CPU synchronised across physical CPU movements. Have you tried to report to AWS?
They're the people who told me that we should use pvclock instead. ;-)
(In reply to Colin Percival from comment #22) Oh OK. Maybe they don't do the vCPU TSC adjustments in order to avoid some of the overhead of vCPU scheduling. Not sure exactly what are the requirement/expectations of OSes running on Nitro instances.
(In reply to Roger Pau Monné from comment #19) That seems to help, although one has to edit /boot/loader.conf and reboot the instance to apply it... Also my only test is just running date one minute apart, so if you have any ideas how to test more, I'd be happy to... root@freebsd:~ # sysctl -a | grep kern.timecounter.smp_tsc_adjust kern.timecounter.smp_tsc_adjust: 1 root@freebsd:~ # date Wed Jun 30 19:00:59 UTC 2021 root@freebsd:~ # date Wed Jun 30 19:01:59 UTC 2021
Martin: Can you try the latest 14-CURRENT AMI, and confirm that setting kern.timecounter.hardware=kvmclock fixes it?
(In reply to Colin Percival from comment #25) You mean ami-0c69868f578f6cc8a with this description - "FreeBSD/amd64 main@58a7bf124cc"?(In reply to Colin Percival from comment #25)
Yep, that's the one.
Created attachment 227814 [details] dmesg.boot - 14.0-CURRENT
(In reply to Colin Percival from comment #27) Setting kvmclock seems to work as expected: root@freebsd:~ # sysctl kern.timecounter.hardware=kvmclock kern.timecounter.hardware: TSC-low -> kvmclock root@freebsd:~ # date Fri Sep 10 18:07:12 UTC 2021 root@freebsd:~ # sysctl kern.timecounter kern.timecounter.tsc_shift: 1 kern.timecounter.smp_tsc_adjust: 0 kern.timecounter.smp_tsc: 1 kern.timecounter.invariant_tsc: 1 kern.timecounter.fast_gettime: 1 kern.timecounter.tick: 1 kern.timecounter.choice: i8254(0) ACPI-fast(900) HPET(950) kvmclock(975) TSC-low(1000) dummy(-1000000) kern.timecounter.hardware: kvmclock kern.timecounter.alloweddeviation: 5 kern.timecounter.timehands_count: 2 kern.timecounter.stepwarnings: 0 kern.timecounter.tc.i8254.quality: 0 kern.timecounter.tc.i8254.frequency: 1193182 kern.timecounter.tc.i8254.counter: 25411 kern.timecounter.tc.i8254.mask: 65535 kern.timecounter.tc.ACPI-fast.quality: 900 kern.timecounter.tc.ACPI-fast.frequency: 3579545 kern.timecounter.tc.ACPI-fast.counter: 5485062 kern.timecounter.tc.ACPI-fast.mask: 16777215 kern.timecounter.tc.HPET.quality: 950 kern.timecounter.tc.HPET.frequency: 62500000 kern.timecounter.tc.HPET.counter: 3218406757 kern.timecounter.tc.HPET.mask: 4294967295 kern.timecounter.tc.kvmclock.quality: 975 kern.timecounter.tc.kvmclock.frequency: 1000000000 kern.timecounter.tc.kvmclock.counter: 2954620152 kern.timecounter.tc.kvmclock.mask: 4294967295 kern.timecounter.tc.TSC-low.quality: 1000 kern.timecounter.tc.TSC-low.frequency: 1100021373 kern.timecounter.tc.TSC-low.counter: 2772787131 kern.timecounter.tc.TSC-low.mask: 4294967295 root@freebsd:~ # date Fri Sep 10 18:08:06 UTC 2021 However I am not able to reproduce the original issue with this instance - TSC-low seems to be working fine as well... Please let me know if you need anything else or if I can run more tests...
(In reply to martin.stoyanov from comment #29) Thanks for testing -- indeed, reproducing the original issue seems difficult (I was never able to reproduce it myself). But my contacts at Amazon say that we should definitely use kvmclock where it's available, so I'm inclined to trust them in the absence of evidence to the contrary. Do you have any existing instances where you're seeing the issue which you might be able to upgrade to HEAD to test this?
(In reply to Colin Percival from comment #30) With the terraform test case I attached, I am able to reproduce this fairly reliably, but only with the 13.0-RELEASE AMIs. As a test, I started one 13.0-RELEASE instance that had the problem and updated it to 14.0-CURRENT and now it works fine with both TSC-low and kvmclock.
(In reply to Colin Percival from comment #30) I guess I spoke too soon - after a reboot of this instance, this is what I get (with TSC-low): root@freebsd:~ # uname -a FreeBSD freebsd 14.0-CURRENT FreeBSD 14.0-CURRENT #0 main-n249319-3a0976dea38: Sun Sep 12 14:01:34 UTC 2021 root@freebsd:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64 root@freebsd:~ # sysctl kern.timecounter kern.timecounter.tsc_shift: 1 kern.timecounter.smp_tsc_adjust: 0 kern.timecounter.smp_tsc: 1 kern.timecounter.invariant_tsc: 1 kern.timecounter.fast_gettime: 1 kern.timecounter.tick: 1 kern.timecounter.choice: i8254(0) ACPI-fast(900) HPET(950) kvmclock(975) TSC-low(1000) dummy(-1000000) kern.timecounter.hardware: TSC-low kern.timecounter.alloweddeviation: 5 kern.timecounter.timehands_count: 2 kern.timecounter.stepwarnings: 0 kern.timecounter.tc.i8254.quality: 0 kern.timecounter.tc.i8254.frequency: 1193182 kern.timecounter.tc.i8254.counter: 24651 kern.timecounter.tc.i8254.mask: 65535 kern.timecounter.tc.ACPI-fast.quality: 900 kern.timecounter.tc.ACPI-fast.frequency: 3579545 kern.timecounter.tc.ACPI-fast.counter: 3169760 kern.timecounter.tc.ACPI-fast.mask: 16777215 kern.timecounter.tc.HPET.quality: 950 kern.timecounter.tc.HPET.frequency: 62500000 kern.timecounter.tc.HPET.counter: 163922095 kern.timecounter.tc.HPET.mask: 4294967295 kern.timecounter.tc.kvmclock.quality: 975 kern.timecounter.tc.kvmclock.frequency: 1000000000 kern.timecounter.tc.kvmclock.counter: 236796286 kern.timecounter.tc.kvmclock.mask: 4294967295 kern.timecounter.tc.TSC-low.quality: 1000 kern.timecounter.tc.TSC-low.frequency: 1244790063 kern.timecounter.tc.TSC-low.counter: 1383859311 kern.timecounter.tc.TSC-low.mask: 4294967295 root@freebsd:~ # date ; sntp pool.ntp.org ; sleep 60 ; date ; sntp pool.ntp.org Mon Sep 13 15:45:25 UTC 2021 ntpd 4.2.8p15-a (1) kod_init_kod_db(): Cannot open KoD db file /var/db/ntp-kod: No such file or directory 2021-09-13 15:45:25.093618 (+0000) +194.309232 +/- 129.554879 pool.ntp.org 23.131.160.7 s2 no-leap Mon Sep 13 15:46:25 UTC 2021 ntpd 4.2.8p15-a (1) kod_init_kod_db(): Cannot open KoD db file /var/db/ntp-kod: No such file or directory 2021-09-13 15:46:25.245602 (+0000) +270.24435 +/- 180.186988 pool.ntp.org 83.68.137.76 s2 no-leap and with KVMCLOCK: root@freebsd:~ # sysctl kern.timecounter kern.timecounter.tsc_shift: 1 kern.timecounter.smp_tsc_adjust: 0 kern.timecounter.smp_tsc: 1 kern.timecounter.invariant_tsc: 1 kern.timecounter.fast_gettime: 1 kern.timecounter.tick: 1 kern.timecounter.choice: i8254(0) ACPI-fast(900) HPET(950) kvmclock(975) TSC-low(1000) dummy(-1000000) kern.timecounter.hardware: kvmclock kern.timecounter.alloweddeviation: 5 kern.timecounter.timehands_count: 2 kern.timecounter.stepwarnings: 0 kern.timecounter.tc.i8254.quality: 0 kern.timecounter.tc.i8254.frequency: 1193182 kern.timecounter.tc.i8254.counter: 33596 kern.timecounter.tc.i8254.mask: 65535 kern.timecounter.tc.ACPI-fast.quality: 900 kern.timecounter.tc.ACPI-fast.frequency: 3579545 kern.timecounter.tc.ACPI-fast.counter: 10497046 kern.timecounter.tc.ACPI-fast.mask: 16777215 kern.timecounter.tc.HPET.quality: 950 kern.timecounter.tc.HPET.frequency: 62500000 kern.timecounter.tc.HPET.counter: 2364450388 kern.timecounter.tc.HPET.mask: 4294967295 kern.timecounter.tc.kvmclock.quality: 975 kern.timecounter.tc.kvmclock.frequency: 1000000000 kern.timecounter.tc.kvmclock.counter: 1123025369 kern.timecounter.tc.kvmclock.mask: 4294967295 kern.timecounter.tc.TSC-low.quality: 1000 kern.timecounter.tc.TSC-low.frequency: 1244790063 kern.timecounter.tc.TSC-low.counter: 3561554254 kern.timecounter.tc.TSC-low.mask: 4294967295 root@freebsd:~ # date ; sntp pool.ntp.org ; sleep 60 ; date ; sntp pool.ntp.org Mon Sep 13 16:06:29 UTC 2021 ntpd 4.2.8p15-a (1) kod_init_kod_db(): Cannot open KoD db file /var/db/ntp-kod: No such file or directory 2021-09-13 16:06:29.878712 (+0000) -0.420875 +/- 0.322316 pool.ntp.org 72.30.35.88 s2 no-leap Mon Sep 13 16:07:29 UTC 2021 ntpd 4.2.8p15-a (1) kod_init_kod_db(): Cannot open KoD db file /var/db/ntp-kod: No such file or directory 2021-09-13 16:07:29.984987 (+0000) -0.422455 +/- 0.334305 pool.ntp.org 168.235.86.33 s2 no-leap I can keep that instance up and give you access if you want...
(In reply to martin.stoyanov from comment #32) Just to confirm: It looks like that instance is having the problem by default, but the problem goes away when you tell it to use kvmclock?
(In reply to Colin Percival from comment #33) Yep, that's exactly right.
(In reply to martin.stoyanov from comment #34) Ok, great. In that case no need to keep the instance around; we'll get the default timecounter changed and the issue should be gone next release.
(In reply to Colin Percival from comment #35) Works for me; thanks a lot for your time and effort!
The TSC shouldn't vary; I suspect what we're seeing here is an issue with the i8253 PIT making the calibration go wrong. Do you have a current way to switch back from kvmclock to TSC and test, that I can experiment with and then include in regression tests please?
(In reply to dwmw2 from comment #37) You should be able to switch back to TSC at runtime with: # sysctl kern.timecounter.hardware=TSC You can also set it in loader.conf to select at boot time. I think older versions of FreeBSD name this TSC-low, so check kern.timecounter.choice for the option name.
Sorry for the long radio silence on this issue. We think this was due to a bug on Amazon's side which has since been fixed, so I'm going to close the bug. If anyone can still reproduce this, please let me know!