Bug 216759 - Memory speed with small blocks (1K) up to 35 times slower than host system (under QEMU emulation, but not only)
Status: Closed Feedback Timeout
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 13.0-STABLE
Hardware: amd64 Any
Importance: --- Affects Many People
Assignee: Konstantin Belousov
URL: https://www.reddit.com/r/freebsd/comm...
Keywords: performance
Reported: 2017-02-03 19:49 UTC by andrew
Modified: 2024-06-14 03:55 UTC

Attachments
tsc.c patch to allow KVM hypervisor with host cpu to pass through the good TSC (1.08 KB, patch)
2017-03-22 11:28 UTC, andrew
andrew: maintainer-approval+

Description andrew 2017-02-03 19:49:53 UTC
FreeBSD 11.0-RELEASE and 10.3-RELEASE guests appear to have much lower memory throughput than bare metal, according to the sysbench benchmark:

Bare Metal run:
# uname -a
FreeBSD backup 10.3-RELEASE-p11 FreeBSD 10.3-RELEASE-p11 #0: Mon Oct 24 18:49:24 UTC 2016     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64
# sysbench --num-threads=1 --test=memory --memory-total-size=1G run
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Doing memory operations speed test
Memory block size: 1K

Memory transfer size: 1024M

Memory operations type: write
Memory scope type: global
Threads started!
Done.

Operations performed: 1048576 (2183178.34 ops/sec)

1024.00 MB transferred (2132.01 MB/sec)


Test execution summary:
    total time:                          0.4803s
    total number of events:              1048576
    total time taken by event execution: 0.3527
    per-request statistics:
         min:                                  0.00ms
         avg:                                  0.00ms
         max:                                  7.56ms
         approx.  95 percentile:               0.00ms

Threads fairness:
    events (avg/stddev):           1048576.0000/0.00
    execution time (avg/stddev):   0.3527/0.00


QEMU KVM emulation:
# uname -a
FreeBSD dev 11.0-RELEASE-p2 FreeBSD 11.0-RELEASE-p2 #0: Mon Oct 24 06:55:27 UTC 2016     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64
# sysbench --num-threads=1 --test=memory --memory-total-size=1G run
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Doing memory operations speed test
Memory block size: 1K

Memory transfer size: 1024M

Memory operations type: write
Memory scope type: global
Threads started!
Done.

Operations performed: 1048576 (69497.13 ops/sec)

1024.00 MB transferred (67.87 MB/sec)


Test execution summary:
    total time:                          15.0880s
    total number of events:              1048576
    total time taken by event execution: 11.1440
    per-request statistics:
         min:                                  0.01ms
         avg:                                  0.01ms
         max:                                  7.32ms
         approx.  95 percentile:               0.00ms

Threads fairness:
    events (avg/stddev):           1048576.0000/0.00
    execution time (avg/stddev):   11.1440/0.00

For comparison
VMWARE:

# uname -a
FreeBSD ns3 10.2-RELEASE-p7 FreeBSD 10.2-RELEASE-p7 #0: Mon Nov  2 14:19:39 UTC 2015     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64
# sysbench --num-threads=1 --test=memory --memory-total-size=1G run
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Doing memory operations speed test
Memory block size: 1K

Memory transfer size: 1024M

Memory operations type: write
Memory scope type: global
Threads started!
Done.

Operations performed: 1048576 (2234641.77 ops/sec)

1024.00 MB transferred (2182.27 MB/sec)


Test execution summary:
    total time:                          0.4692s
    total number of events:              1048576
    total time taken by event execution: 0.3437
    per-request statistics:
         min:                                  0.00ms
         avg:                                  0.00ms
         max:                                  0.09ms
         approx.  95 percentile:               0.00ms

Threads fairness:
    events (avg/stddev):           1048576.0000/0.00
    execution time (avg/stddev):   0.3437/0.00



This is not an 11-only problem. The VPSs I tested with 10.3 have the same problem.

I haven't found any information about this on the net, possibly because nobody benchmarks RAM.
Sysbench itself starts a thread and runs the allocation code, but I couldn't trace the thread.
Maybe the cause is sysbench's old code.



Additional information and reports:
https://www.reddit.com/r/freebsd/comments/5rtf05/abysmal_memory_perfomance_witch_freebsd_under/
Comment 1 andrew 2017-02-04 14:12:26 UTC
Writes to memory disks are perfectly fine.

KVM:

# mdconfig -a -t swap -s 128m -u 1
# dd if=/dev/urandom of=/root/test bs=1M count=128
# dd if=/root/test of=/dev/md1 bs=1M
128+0 records in
128+0 records out
134217728 bytes transferred in 0.089190 secs (1504858419 bytes/sec)
# dd if=/root/test of=/dev/md1 bs=1M
128+0 records in
128+0 records out
134217728 bytes transferred in 0.056503 secs (2375411572 bytes/sec)

So it seems that some specific sysbench call ruins everything. Should we add [benchmarks/sysbench]?
Comment 2 andrew 2017-02-04 18:13:24 UTC
(In reply to andrew from comment #1)
# dd if=/root/test of=/dev/md1 bs=512
262144+0 records in
262144+0 records out
134217728 bytes transferred in 4.829553 secs (27790924 bytes/sec)

# dd if=/root/test of=/dev/md1 bs=1K
131072+0 records in
131072+0 records out
134217728 bytes transferred in 2.478277 secs (54157674 bytes/sec)

Apparently the bug is only visible with block sizes smaller than 1M.
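Dividing each elapsed time by its record count gives a rough per-write cost for the dd runs in comments 1 and 2. The sketch below does only that arithmetic. The function names are made up for illustration, and reading the result as a fixed per-call overhead is an assumption, not something dd reports.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: average microseconds spent per dd record. */
double
usec_per_record(double elapsed_secs, unsigned long records)
{
	assert(records > 0);
	return (elapsed_secs * 1e6 / (double)records);
}

/* Print the per-record cost for the three KVM runs quoted above. */
void
print_dd_costs(void)
{
	/* bs=512: 262144 records in 4.829553 s */
	printf("bs=512: %.1f us/record\n", usec_per_record(4.829553, 262144));
	/* bs=1K: 131072 records in 2.478277 s */
	printf("bs=1K:  %.1f us/record\n", usec_per_record(2.478277, 131072));
	/* bs=1M (second run): 128 records in 0.056503 s */
	printf("bs=1M:  %.1f us/record\n", usec_per_record(0.056503, 128));
}
```

For the runs above this works out to roughly 18.4 us per record at bs=512, 18.9 us at bs=1K, and about 441 us at bs=1M. The near-identical small-block figures would be consistent with a fixed cost per write rather than a bandwidth limit.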
Comment 3 andrew 2017-02-05 17:30:01 UTC
Without KVM extensions on QEMU the speed goes up a bit, to 94 MB/s instead of the earlier 67 MB/s.
Comment 4 andrew 2017-02-20 07:19:47 UTC
Changed the Importance, because this is not a random bug: it reproduces 100% of the time on any QEMU platform.
Comment 5 Bartek Rutkowski freebsd_committer freebsd_triage 2017-02-22 09:05:27 UTC
Here are some results from tests on a 12-CURRENT VM under XenServer:

VM: FreeBSD poudriere 12.0-CURRENT FreeBSD 12.0-CURRENT #3 r314028: Tue Feb 21 08:07:02 CET 2017     root@pd.valinor.palantiri.org:/usr/obj/usr/src/sys/POUDRIERE  amd64

XenServer: Linux xenserver 3.10.0+10 #1 SMP Thu Sep 22 12:31:44 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux, Xen version: 4.6.1-xs133690

dd on an 8G memory disk:

# dd if=/dev/zero of=/dev/md1 bs=1M
dd: /dev/md1: end of device
8193+0 records in
8192+0 records out
8589934592 bytes transferred in 15.565112 secs (551871047 bytes/sec)

# dd if=/dev/zero of=/dev/md1 bs=1K
dd: /dev/md1: end of device
8388609+0 records in
8388608+0 records out
8589934592 bytes transferred in 232.354641 secs (36969068 bytes/sec)

Sysbench:

# sysbench --num-threads=1 --test=memory --memory-total-size=8G run
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Doing memory operations speed test
Memory block size: 1K

Memory transfer size: 8192M

Memory operations type: write
Memory scope type: global
Threads started!
Done.

Operations performed: 8388608 (1169942.57 ops/sec)

8192.00 MB transferred (1142.52 MB/sec)


Test execution summary:
    total time:                          7.1701s
    total number of events:              8388608
    total time taken by event execution: 5.2679
    per-request statistics:
         min:                                  0.00ms
         avg:                                  0.00ms
         max:                                  3.95ms
         approx.  95 percentile:               0.00ms

Threads fairness:
    events (avg/stddev):           8388608.0000/0.00
    execution time (avg/stddev):   5.2679/0.00



# sysbench --num-threads=1 --test=memory --memory-total-size=8G --memory-block-size=1M run
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Doing memory operations speed test
Memory block size: 1024K

Memory transfer size: 8192M

Memory operations type: write
Memory scope type: global
Threads started!
Done.

Operations performed: 8192 (46849.56 ops/sec)

8192.00 MB transferred (46849.56 MB/sec)


Test execution summary:
    total time:                          0.1749s
    total number of events:              8192
    total time taken by event execution: 0.1727
    per-request statistics:
         min:                                  0.02ms
         avg:                                  0.02ms
         max:                                  0.08ms
         approx.  95 percentile:               0.03ms

Threads fairness:
    events (avg/stddev):           8192.0000/0.00
    execution time (avg/stddev):   0.1727/0.00
Comment 6 Roger Pau Monné freebsd_committer freebsd_triage 2017-02-22 10:23:00 UTC
These are the results of the tests on bare metal using FreeBSD 12 (less than 1 month old):

# sysbench --num-threads=1 --test=memory --memory-total-size=4G run
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Doing memory operations speed test
Memory block size: 1K

Memory transfer size: 4096M

Memory operations type: write
Memory scope type: global
Threads started!
Done.

Operations performed: 4194304 (1122695.18 ops/sec)

4096.00 MB transferred (1096.38 MB/sec)


Test execution summary:
    total time:                          3.7359s
    total number of events:              4194304
    total time taken by event execution: 2.6089
    per-request statistics:
         min:                                  0.00ms
         avg:                                  0.00ms
         max:                                  0.22ms
         approx.  95 percentile:               0.00ms

Threads fairness:
    events (avg/stddev):           4194304.0000/0.00
    execution time (avg/stddev):   2.6089/0.00

# sysbench --num-threads=1 --test=memory --memory-total-size=4G --memory-block-size=1M run
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Doing memory operations speed test
Memory block size: 1024K

Memory transfer size: 4096M

Memory operations type: write
Memory scope type: global
Threads started!
Done.

Operations performed: 4096 (10462.26 ops/sec)

4096.00 MB transferred (10462.26 MB/sec)


Test execution summary:
    total time:                          0.3915s
    total number of events:              4096
    total time taken by event execution: 0.3898
    per-request statistics:
         min:                                  0.06ms
         avg:                                  0.10ms
         max:                                  0.17ms
         approx.  95 percentile:               0.10ms

Threads fairness:
    events (avg/stddev):           4096.0000/0.00
    execution time (avg/stddev):   0.3898/0.00
Comment 7 andrew 2017-02-23 20:13:29 UTC
Same config VPS on QEMU with CentOS 7

# sysbench --num-threads=1 --test=memory --memory-total-size=1G run
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Doing memory operations speed test
Memory block size: 1K

Memory transfer size: 1024M

Memory operations type: write
Memory scope type: global
Threads started!
Done.

Operations performed: 1048576 (1491466.64 ops/sec)

1024.00 MB transferred (1456.51 MB/sec)


Test execution summary:
    total time:                          0.7031s
    total number of events:              1048576
    total time taken by event execution: 0.5486
    per-request statistics:
         min:                                  0.00ms
         avg:                                  0.00ms
         max:                                 18.49ms
         approx.  95 percentile:               0.00ms

Threads fairness:
    events (avg/stddev):           1048576.0000/0.00
    execution time (avg/stddev):   0.5486/0.00


The node running it:

# sysbench --num-threads=1 --test=memory --memory-total-size=1G run
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Doing memory operations speed test
Memory block size: 1K

Memory transfer size: 1024M

Memory operations type: write
Memory scope type: global
Threads started!
Done.

Operations performed: 1048576 (3862725.85 ops/sec)

1024.00 MB transferred (3772.19 MB/sec)


Test execution summary:
    total time:                          0.2715s
    total number of events:              1048576
    total time taken by event execution: 0.2213
    per-request statistics:
         min:                                  0.00ms
         avg:                                  0.00ms
         max:                                  0.04ms
         approx.  95 percentile:               0.00ms

Threads fairness:
    events (avg/stddev):           1048576.0000/0.00
    execution time (avg/stddev):   0.2213/0.00
Comment 8 andrew 2017-02-23 20:17:19 UTC
In my previous comment the memory performance is fine. The guest is about twice as slow only because the VPS is CPU-limited to 1.7 GHz while the host is not.
Comment 9 kvanbiesen 2017-02-23 20:28:06 UTC
Strange. I installed CentOS 7 and QEMU on my server and I had the same lousy performance as I had with Proxmox and BSD.
Comment 10 andrew 2017-02-28 15:55:16 UTC
About Linux: kvm-clock seems to be broken somehow. I switched the clock source to TSC and got the host's 3700 MB/s...


The same seems to apply to FreeBSD, except that BSD forces HPET or ACPI on any virtualised platform. The only way around it is to force TSC-low manually, which brings us back to 3700 MB/s.

It seems the kernel timecounter code was written when invariant_tsc had only just appeared, and it knows nothing about the CPU flags constant_tsc, nonstop_tsc, tsc_deadline_timer, etc.

The code is really simple. If the TSC is invariant or SMP-safe, its quality is boosted, and that's it:

https://github.com/freebsd/freebsd/blob/21c11d113415f2c87107b6735407b147fae0b851/sys/x86/x86/tsc.c
Comment 11 andrew 2017-02-28 19:46:21 UTC
Here are the latest results supporting my previous post:

root@debian8-test:~# sysbench --num-threads=1 --test=memory --memory-total-size=512M --memory-block-size=1K --debug run
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1
Debug mode enabled.


Doing memory operations speed test
Memory block size: 1K

Memory transfer size: 512M

Memory operations type: write
Memory scope type: global
Threads started!
DEBUG: Runner thread started (0)!
Done.

Operations performed: 524288 (3877252.90 ops/sec)

512.00 MB transferred (3786.38 MB/sec)


Test execution summary:
    total time:                          0.1352s
    total number of events:              524288
    total time taken by event execution: 0.1093
    per-request statistics:
         min:                                  0.00ms
         avg:                                  0.00ms
         max:                                  0.14ms
         approx.  95 percentile:               0.00ms

Threads fairness:
    events (avg/stddev):           524288.0000/0.00
    execution time (avg/stddev):   0.1093/0.00

DEBUG: Verbose per-thread statistics:

DEBUG:     thread #  0: min: 0.0000s  avg: 0.0000s  max: 0.0001s  events: 524288
DEBUG:                  total time taken by even execution: 0.1093s

root@debian8-test:~# cat /sys/bus/clocksource/devices/clocksource0/current_clocksource
tsc
root@debian8-test:~# cat /sys/bus/clocksource/devices/clocksource0/available_clocksource
tsc hpet acpi_pm




root@dev:~ # dmesg | grep -i "TSC"
Calibrating TSC clock ... TSC clock: 3400129027 Hz
  Features=0xf83fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE,SSE2,SS>
  Features2=0xfffa3203<SSE3,PCLMULQDQ,SSSE3,FMA,CX16,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND,HV>
  AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
TSC timecounter discards lower 1 bit(s)
Timecounter "TSC-low" frequency 1700064513 Hz quality -100
root@dev:~ # sysbench --num-threads=1 --test=memory --memory-total-size=512M --memory-block-size=1K --debug run
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1
Debug mode enabled.


Doing memory operations speed test
Memory block size: 1K

Memory transfer size: 512M

Memory operations type: write
Memory scope type: global
Threads started!
DEBUG: Runner thread started (0)!
Done.

Operations performed: 524288 (73267.52 ops/sec)

512.00 MB transferred (71.55 MB/sec)


Test execution summary:
    total time:                          7.1558s
    total number of events:              524288
    total time taken by event execution: 5.2780
    per-request statistics:
         min:                                  0.01ms
         avg:                                  0.01ms
         max:                                  0.72ms
         approx.  95 percentile:               0.00ms

Threads fairness:
    events (avg/stddev):           524288.0000/0.00
    execution time (avg/stddev):   5.2780/0.00

DEBUG: Verbose per-thread statistics:

DEBUG:     thread #  0: min: 0.0000s  avg: 0.0000s  max: 0.0007s  events: 524288
DEBUG:                  total time taken by even execution: 5.2780s

root@dev:~ # sysctl kern.timecounter.hardware=TSC-low
kern.timecounter.hardware: HPET -> TSC-low
root@dev:~ # sysbench --num-threads=1 --test=memory --memory-total-size=512M --memory-block-size=1K --debug run
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1
Debug mode enabled.


Doing memory operations speed test
Memory block size: 1K

Memory transfer size: 512M

Memory operations type: write
Memory scope type: global
Threads started!
DEBUG: Runner thread started (0)!
Done.

Operations performed: 524288 (3408336.84 ops/sec)

512.00 MB transferred (3328.45 MB/sec)


Test execution summary:
    total time:                          0.1538s
    total number of events:              524288
    total time taken by event execution: 0.1102
    per-request statistics:
         min:                                  0.00ms
         avg:                                  0.00ms
         max:                                  0.24ms
         approx.  95 percentile:               0.00ms

Threads fairness:
    events (avg/stddev):           524288.0000/0.00
    execution time (avg/stddev):   0.1102/0.00

DEBUG: Verbose per-thread statistics:

DEBUG:     thread #  0: min: 0.0000s  avg: 0.0000s  max: 0.0002s  events: 524288
DEBUG:                  total time taken by even execution: 0.1102s




So it's even worse than memory lag: it is gettimeofday() lag, and basically all software relies on that function one way or another.
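To check the gettimeofday() cost directly, without sysbench in the way, a loop like the one below can be timed under each timecounter. This is a minimal sketch, not code from sysbench or from the kernel; the function name gettimeofday_cost_ns is hypothetical, and the loop uses gettimeofday() both as the thing measured and as the stopwatch.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/*
 * Call gettimeofday() `calls` times and return the average cost of one
 * call in nanoseconds, as measured by gettimeofday() itself.
 */
double
gettimeofday_cost_ns(unsigned long calls)
{
	struct timeval start, now;
	unsigned long i;
	double elapsed_us;

	assert(calls > 0);
	gettimeofday(&start, NULL);
	for (i = 0; i < calls; i++)
		gettimeofday(&now, NULL);
	/* `now` holds the timestamp taken by the last call in the loop. */
	elapsed_us = (double)(now.tv_sec - start.tv_sec) * 1e6 +
	    (double)(now.tv_usec - start.tv_usec);
	return (elapsed_us * 1000.0 / (double)calls);
}
```

Running it in the guest before and after `sysctl kern.timecounter.hardware=TSC-low` should show whether the per-call cost changes in the same proportion as the sysbench figures. No numbers are claimed here.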
Comment 12 andrew 2017-02-28 20:22:30 UTC
OK, I tried removing the "hypervisor" feature from the CPU, and the VPS then got 1390 MB/s with TSC-low selected by default (the same as with kvm-clock, I must note). So it seems FreeBSD does something special when it sees the "hypervisor" flag, and that behaviour is turned off when you disable the flag.
Networking also stopped working without that flag for some reason; routing stopped working.

So in the end the only fix I currently see is to force TSC-low manually in FreeBSD until the code is fixed to stop penalizing TSC on all new platforms.

Linux is fixed by disabling kvmclock (the Debian 7 backport from 8 does not seem to be the latest, so that might be the trouble).
Comment 13 bob.cauthen@gmail.com 2017-03-03 15:40:55 UTC
As an interested party to this bug I have to raise an issue with this potential workaround.

First though... thanks everybody who discovered and tested this... BUT

According to timecounters(4):

     kern.timecounter.tc.X.quality is an integral value, defining the quality
     of this time counter compared to others.  A negative value means this
     time counter is broken and should not be used.


Andrew's test output showed this line:

Timecounter "TSC-low" frequency 1700064513 Hz quality -100

If the workaround forces the use of TSC-low, and its kern.timecounter.tc.X.quality is negative, are we not advocating a workaround with a timecounter that the OS itself measures as broken?

If the answer is yes (to my rhetorical question) possible follow-up questions might then be:

- Should we trust the negative "quality" measurement? (if not, maybe it's easier to mod the timecounter measurement code??)

- Has anyone done any longer-term testing with the TSC-low timer in this configuration to see if using that time counter affects anything else in a running system?

Sorry to be the opposing voice here (especially because this bug affects me too).
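To make the quality rule concrete, the sketch below picks a timecounter from a string in the format printed by `sysctl kern.timecounter.choice`. It assumes the highest non-negative quality wins, which matches the outputs in this thread but is not taken from the kernel source. The function name best_timecounter is hypothetical, and the kernel itself does not parse strings to make this choice.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Scan "NAME(quality) NAME(quality) ..." and copy the name with the
 * highest non-negative quality into `best`. Returns that quality, or -1
 * if no usable entry was found (in which case `best` is left untouched).
 */
int
best_timecounter(const char *choice, char *best, size_t bestlen)
{
	char name[32];
	int quality, consumed, best_quality = -1;

	for (;;) {
		consumed = 0;
		/* %n is only stored once the closing ')' has matched. */
		if (sscanf(choice, " %31[^( ](%d)%n", name, &quality,
		    &consumed) != 2 || consumed == 0)
			break;
		/* Negative quality means "broken, do not pick automatically". */
		if (quality >= 0 && quality > best_quality) {
			best_quality = quality;
			snprintf(best, bestlen, "%s", name);
		}
		choice += consumed;
	}
	return (best_quality);
}
```

Under this rule TSC-low at -100 can never be chosen automatically, however fast it is, so on such a guest it is only ever used when the administrator sets kern.timecounter.hardware by hand.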
Comment 14 andrew 2017-03-03 18:42:36 UTC
Bob, this issue is a major one. I had to delve into almost academic works on all the available timers and their relationship with virtualization.

It seems the only "correct" work is done by VMware, which applies corrections to the broken TSC. I say "correct" because it still requires a special reliable_tsc flag on the virtual CPU before the OS believes it. The reason: virtualization software does not provide direct access, so there may be pauses (say, while the host is busy with something) that skew the time or even make the timer go backwards.

Here's the VMware "fix":

static void
tsc_freq_vmware(void)
{
	u_int regs[4];

	if (hv_high >= 0x40000010) {
		do_cpuid(0x40000010, regs);
		tsc_freq = regs[0] * 1000;
	} else {
		vmware_hvcall(VMW_HVCMD_GETHZ, regs);
		if (regs[1] != UINT_MAX)
			tsc_freq = regs[0] | ((uint64_t)regs[1] << 32);
	}
	tsc_is_invariant = 1;
}

static void
probe_tsc_freq(void)
{

...

	if (vm_guest == VM_GUEST_VMWARE) {
		tsc_freq_vmware();
		return;
	}
...

So this issue arrived on KVM. With easy migration it also became apparent that non-standardized timers lead to migration and suspend/resume FAILURE.

KVM created kvmclock, which is a monstrosity, is still half the speed of native TSC, and works only on Linux.

FreeBSD has no support for kvmclock. It has native support for VMware in the code, but for some reason it does not recognize the constant_tsc flag. As a result, the OS under virtualization uses HPET or ACPI-PM, which are slower on some systems and are serial (why make them faster when you have TSC?).

Here's the code that kills the quality of TSC under KVM in FreeBSD:

static int
test_tsc(void)
{
	uint64_t *data, *tsc;
	u_int i, size, adj;

	if ((!smp_tsc && !tsc_is_invariant) || vm_guest)
		return (-100);

...

As you can see, KVM has no way around this. What sets the vm_guest variable? It's the 'hypervisor' flag on the CPU.

Disabling that flag opens a new can of worms (virtio drivers stop working). And I think it then hits the SMP test, which never passes (KVM gives 1 core).
Comment 15 deJong 2017-03-15 08:43:48 UTC
I managed to find a patch which adds KVMCLOCK support; however, it's pretty old and I have not got around to testing it.

I believe it has not been implemented in the -CURRENT branch yet.

Also, I read here:
http://markmail.org/message/pjtay3dghuqpv4hg

that using TSC-low is not entirely foolproof (whatever that may mean, I am more of a sysadmin) so there are probably some issues with using it.

Mailing list:
https://lists.freebsd.org/pipermail/freebsd-arch/2015-January/016587.html

Patch:
https://reviews.freebsd.org/D1435#inline-51273

Maybe somebody could take a look at it and test it to see if it resolves this issue.
Comment 16 deJong 2017-03-21 13:55:21 UTC
I have tested the patch set and unfortunately it made no difference.

root@testbsd11:~ # sysctl kern.timecounter.choice
kern.timecounter.choice: TSC-low(-100) i8254(0) ACPI-fast(900) HPET(950) dummy(-1000000)

root@testbsd11:~ # sysbench --num-threads=1 --test=memory --memory-total-size=1G --memory-block-size=1K run
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Doing memory operations speed test
Memory block size: 1K

Memory transfer size: 1024M

Memory operations type: write
Memory scope type: global
Threads started!
Done.

Operations performed: 1048576 (34976.80 ops/sec)

1024.00 MB transferred (34.16 MB/sec)


Test execution summary:
    total time:                          29.9792s
    total number of events:              1048576
    total time taken by event execution: 21.9685
    per-request statistics:
         min:                                  0.02ms
         avg:                                  0.02ms
         max:                                 22.21ms
         approx.  95 percentile:               0.01ms

Threads fairness:
    events (avg/stddev):           1048576.0000/0.00
    execution time (avg/stddev):   21.9685/0.00

So either the problem is somewhere else or these patches do not apply to this situation. It would be good if someone could confirm or deny this.
Comment 17 andrew 2017-03-21 19:04:40 UTC
(In reply to deJong from comment #16)
It doesn't seem like the patchset is applied at all.

> kern.timecounter.choice: TSC-low(-100) i8254(0) ACPI-fast(900) HPET(950) dummy(-1000000)

It should add kvmclock to that list, and it is obviously not there.

Forcing TSC-low on a KVM VM when the host trusts it more than HPET and ACPI-PM is foolproof, IMO.

This choice was evidently designed a long time ago, when TSC really sucked, and so every hypervisor other than VMware, which has its own loophole in the code, is penalised (I don't know exactly which ones, because Xen requires more workarounds in the kernel and I don't really know how bhyve handles it).

So obviously this part of the code should be reworked:
if ((!smp_tsc && !tsc_is_invariant) || vm_guest)
	return (-100);

X86_FEATURE_CONSTANT_TSC is not checked at all (that name is from Linux, but BSD does not check the equivalent anywhere in its TSC code, unless I am misreading the magic tsc_is_invariant variable). Most of all, the vm_guest check should go away. It's a terrible hack nowadays, as this bug shows.

I have not tried this yet, but maybe something like this will help:

if ((!smp_tsc && !tsc_is_invariant) || (vm_guest && vm_guest != VM_GUEST_KVM))
	return (-100);
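The difference between the two conditions can be checked in isolation. The sketch below rewrites them as plain functions. The enum and function names are stand-ins invented for illustration and do not match the kernel's vm_guest values; only the boolean logic is taken from the snippets quoted above.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the kernel's guest types. */
enum guest_kind { GUEST_NONE = 0, GUEST_OTHER, GUEST_KVM };

/* Condition quoted from test_tsc(): true means TSC gets quality -100. */
bool
tsc_penalised_original(bool smp_tsc, bool tsc_is_invariant,
    enum guest_kind guest)
{
	return ((!smp_tsc && !tsc_is_invariant) || guest != GUEST_NONE);
}

/* Proposed variant: being a KVM guest no longer forces the penalty. */
bool
tsc_penalised_proposed(bool smp_tsc, bool tsc_is_invariant,
    enum guest_kind guest)
{
	return ((!smp_tsc && !tsc_is_invariant) ||
	    (guest != GUEST_NONE && guest != GUEST_KVM));
}
```

With an invariant TSC on a KVM guest the original condition penalises and the proposed one does not. If the guest reports neither smp_tsc nor an invariant TSC, both still return the penalty, so the proposed change would only help where the hypervisor exposes an invariant TSC to the guest.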
Comment 18 andrew 2017-03-22 11:28:15 UTC
Created attachment 181048 [details]
tsc.c patch to allow KVM hypervisor with host cpu to pass through the good TSC

This patch is not "correct" because it just whitelists generic virtualization (VM_GUEST_VM); however, I don't see how it can be done correctly given the current state of the virtualization handling in tsc.c.

I guess a CPU_VENDOR_VIRTUAL should be considered as an option, so that a host pass-through CPU is treated as a real host CPU, while virtual CPUs like QEMU_X86_64 are handled separately, with all the ifs and switches tested for CPU TSC stability (that seems to be the Linux approach).
Comment 19 Dave Cottlehuber freebsd_committer freebsd_triage 2021-04-02 15:56:16 UTC
Still present on 13.0-RC4 and earlier. This is the default setting on many cloud providers (e.g. Digital Ocean), so it's a really nasty, widespread default.
Comment 20 me+freebsd 2021-04-12 22:01:19 UTC
I hit that issue while using zfs under kvm and created https://reviews.freebsd.org/D29531 about it but Adam published a much better version at https://reviews.freebsd.org/D29733

So hopefully we shall see some progress about this issue.
Comment 21 Allan Jude freebsd_committer freebsd_triage 2021-04-24 13:30:01 UTC
(In reply to me+freebsd from comment #20)

We are working on a kmod port that will add the KVMClock driver, so people will be able to test the fix without having to patch their kernel, just install the kmod port, load it, and set the sysctl.
Comment 22 Dimitry Andric freebsd_committer freebsd_triage 2021-04-28 18:56:30 UTC
With https://reviews.freebsd.org/D29733 applied on top of 13.0-RELEASE, running a FreeBSD guest on Ubuntu 18.04.3 LTS, with qemu-kvm 1:2.11+dfsg-1ubuntu7.19, I see the following difference in sysbench results:

--- sysbench-acpi-fast.txt	2021-04-28 20:40:05.529460000 +0200
+++ sysbench-kvmclock.txt	2021-04-28 20:33:52.083953000 +0200
@@ -21,28 +21,28 @@

 Done.

-Total operations: 524288 (66548.67 per second)
+Total operations: 524288 (2142666.15 per second)

-512.00 MiB transferred (64.99 MiB/sec)
+512.00 MiB transferred (2092.45 MiB/sec)


 General statistics:
-    total time:                          7.8730s
+    total time:                          0.2397s
     total number of events:              524288

 Latency (ms):
          min:                                    0.00
-         avg:                                    0.01
-         max:                                    0.85
-         95th percentile:                        0.01
-         sum:                                 2644.91
+         avg:                                    0.00
+         max:                                    0.03
+         95th percentile:                        0.00
+         sum:                                  101.93

 Threads fairness:
     events (avg/stddev):           524288.0000/0.00
-    execution time (avg/stddev):   2.6449/0.00
+    execution time (avg/stddev):   0.1019/0.00

 DEBUG: Verbose per-thread statistics:

-DEBUG:     thread #  0: min: 0.0000s  avg: 0.0000s  max: 0.0008s  events: 524288
-DEBUG:                  total time taken by event execution: 2.6449s
+DEBUG:     thread #  0: min: 0.0000s  avg: 0.0000s  max: 0.0000s  events: 524288
+DEBUG:                  total time taken by event execution: 0.1019s
Comment 23 commit-hook freebsd_committer freebsd_triage 2021-05-30 13:37:55 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=563d5929d5f28631999a32d2aab4e72e1bf2c323

commit 563d5929d5f28631999a32d2aab4e72e1bf2c323
Author:     Dave Cottlehuber <dch@FreeBSD.org>
AuthorDate: 2021-05-30 13:35:19 +0000
Commit:     Dave Cottlehuber <dch@FreeBSD.org>
CommitDate: 2021-05-30 13:35:19 +0000

    sysutils/kvmclock-kmod: new port - call for testing

    Improved performance on KVM paravirtualised systems. Testing
    welcomed; please report successes and issues to:

    https://reviews.freebsd.org/D29733

    kvmclock-kmod is experimental and currently under development. This port
    provides an easy and quick method for users to test this code for early
    testing, feedback and bug reports.

    This driver enables FreeBSD to use a more efficient paravirtualized
    hardware clock, instead of emulating one, or abusing hypervisor
    interrupts, when running as a virtualized OS under Linux KVM
    (Kernel-based Virtual Machine).

    Reviewed by:    allanjude
    Sponsored by:   Klara Inc.
    Sponsored by:   SkunkWerks, GmbH
    Differential Revision:  https://reviews.freebsd.org/D30459
    PR:                     216759

 sysutils/Makefile                        |  1 +
 sysutils/kvmclock-kmod/Makefile (new)    | 25 +++++++++++++++++++++++++
 sysutils/kvmclock-kmod/distinfo (new)    |  3 +++
 sysutils/kvmclock-kmod/pkg-descr (new)   | 11 +++++++++++
 sysutils/kvmclock-kmod/pkg-message (new) | 24 ++++++++++++++++++++++++
 5 files changed, 64 insertions(+)
Comment 24 rainer 2021-06-04 14:59:19 UTC
I still have very slow I/O in FreeBSD 13.0 amd64 on KVM.

Even with this kvmclock driver (compiled from ports as of today).

Host:

ii  qemu-system-x86                      1:2.11+dfsg-1ubuntu7.23                amd64        QEMU full system emulation binaries (x86)
Linux ewos1-com17-prod 5.8.0-48-generic #54~20.04.1-Ubuntu SMP Sat Mar 20 13:40:25 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux


Guest:

(freebsd <tests>) 0 # uname -a
FreeBSD freebsd 13.0-RELEASE-p1 FreeBSD 13.0-RELEASE-p1 #0: Wed May 26 22:15:09 UTC 2021     root@amd64-builder.daemonology.net:/usr/obj/usr/src/amd64.amd64/sys/GENERIC  amd64


(freebsd </root>) 1 # kldload kvmclock 
(freebsd </root>) 0 # sysctl kern.timecounter
kern.timecounter.tsc_shift: 1
kern.timecounter.smp_tsc_adjust: 0
kern.timecounter.smp_tsc: 0
kern.timecounter.invariant_tsc: 0
kern.timecounter.fast_gettime: 1
kern.timecounter.tick: 1
kern.timecounter.choice: kvmclock(975) i8254(0) ACPI-fast(900) TSC-low(-100) dummy(-1000000)
kern.timecounter.hardware: kvmclock
kern.timecounter.alloweddeviation: 5
kern.timecounter.timehands_count: 2
kern.timecounter.stepwarnings: 0
kern.timecounter.tc.kvmclock.quality: 975
kern.timecounter.tc.kvmclock.frequency: 1000000000
kern.timecounter.tc.kvmclock.counter: 2592740499
kern.timecounter.tc.kvmclock.mask: 4294967295
kern.timecounter.tc.i8254.quality: 0
kern.timecounter.tc.i8254.frequency: 1193182
kern.timecounter.tc.i8254.counter: 56633
kern.timecounter.tc.i8254.mask: 65535
kern.timecounter.tc.ACPI-fast.quality: 900
kern.timecounter.tc.ACPI-fast.frequency: 3579545
kern.timecounter.tc.ACPI-fast.counter: 15142699
kern.timecounter.tc.ACPI-fast.mask: 16777215
kern.timecounter.tc.TSC-low.quality: -100
kern.timecounter.tc.TSC-low.frequency: 1300025233
kern.timecounter.tc.TSC-low.counter: 3482345276
kern.timecounter.tc.TSC-low.mask: 4294967295
(freebsd </root>) 0 # time dc3dd wipe=/dev/vtbd1

dc3dd 7.2.646 started at 2021-06-04 14:54:43 +0200
compiled options:
command line: dc3dd wipe=/dev/vtbd1
device size: 209715200 sectors (probed),   107,374,182,400 bytes
sector size: 512 bytes (probed)
  2559508480 bytes ( 2.4 G ) copied (  2% ),  145 s, 17 M/s                     

input results for pattern `00':
   4999040 sectors in

output results for device `/dev/vtbd1':
   4999040 sectors out

dc3dd aborted at 2021-06-04 14:57:07 +0200

dc3dd wipe=/dev/vtbd1  0.79s user 1.55s system 1% cpu 2:24.62 total


(freebsd </root>) 1 # sysctl kern.timecounter.hardware=TSC-low
kern.timecounter.hardware: kvmclock -> TSC-low
(freebsd </root>) 0 # time dc3dd wipe=/dev/vtbd1              

dc3dd 7.2.646 started at 2021-06-04 14:59:07 +0200
compiled options:
command line: dc3dd wipe=/dev/vtbd1
device size: 209715200 sectors (probed),   107,374,182,400 bytes
sector size: 512 bytes (probed)
  2588770304 bytes ( 2.4 G ) copied (  2% ),  150 s, 16 M/s                     

input results for pattern `00':
   5056192 sectors in

output results for device `/dev/vtbd1':
   5056192 sectors out

dc3dd aborted at 2021-06-04 15:01:37 +0200

dc3dd wipe=/dev/vtbd1  0.99s user 1.53s system 1% cpu 2:29.96 total


This volume should be good for 5000 IOPS and 250 MB/s.
Comment 25 Graham Perrin freebsd_committer freebsd_triage 2022-12-30 19:49:16 UTC
<https://github.com/freebsd/freebsd-src/commit/6fa88a627d5e9d290022b6f463effc99f3df8ee2> (2021-10-12)

* kvm_clock: KVM paravirtual clock support 
* on branches releng/12.3 + releng/12.4 + stable/12
Comment 26 Mark Linimon freebsd_committer freebsd_triage 2024-01-10 03:16:17 UTC
^Triage: assign to committer that resolved.  It is now in 13 and 14.

To submitter: did the freebsd-src commit Graham cited fix the problem?
Comment 27 Mark Linimon freebsd_committer freebsd_triage 2024-06-14 03:55:32 UTC
^Triage: believe fixed.  Feedback timeout > 5 months.