Bug 169379 - [dtrace] [patch] dtrace's timestamp variable is unreliable when the cpu goes into C3 state
Summary: [dtrace] [patch] dtrace's timestamp variable is unreliable when the cpu goes ...
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: Unspecified
Hardware: Any Any
: Normal Affects Only Me
Assignee: freebsd-bugs (Nobody)
URL:
Keywords: patch
Depends on:
Blocks:
 
Reported: 2012-06-24 18:40 UTC by Fabian Keil
Modified: 2022-10-17 12:38 UTC (History)
0 users

See Also:


Attachments
file.txt (6.68 KB, text/plain)
2012-06-24 18:40 UTC, Fabian Keil
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Fabian Keil 2012-06-24 18:40:06 UTC
On my amd64 system dtrace's timestamp variable is unreliable
if the cpu is allowed to go into C3 state and the best
kern.timecounter.hardware available (HPET) is used.

The timestamp counter seems to tick about ten times slower than
expected, causing things that are timed with timestamp deltas to
appear ten times faster than they are.

This does not happen if TSC is used as timecounter,
but it's rated as -1000 and using it is undesirable.

Looking at the code, I suspect the same issue exists on i386
but I didn't test this.

Fix: The attached patch replaces the kernel timestamp code on
amd64 and i386 with the much simpler code already used on
mips.

This fixes the issue for me on amd64, I now reliably
get an average of 1007 ms which seems reasonable to me.

Patch attached with submission follows:
How-To-Repeat: Set hw.acpi.cpu.cx_lowest to C3 and execute the dtrace
script below while calling date in a loop with:

while date; do sleep 1; done

I get averages around 90 ms instead of 1000 ms + overhead as expected.

-------

#!/usr/sbin/dtrace -s

#pragma D option quiet

dtrace:::BEGIN
{
        cached_timestamp = (int64_t)0;
}

syscall::exec*:return
/(execname == "date") && cached_timestamp/
{
        self->elapsed = (timestamp - cached_timestamp) / 1000000;
        printf("ms since the last date execution: %d\n", self->elapsed);
        @elapsed_time["elapsed"] = lquantize(self->elapsed, 0, 1200, 10);
        @elapsed_avg["elapsed avg"] = avg(self->elapsed);
        self->elapsed = 0;
}

syscall::exec*:return
/execname == "date"/
{
        cached_timestamp = timestamp;
        printf("Date execution detected at %d\n", cached_timestamp);
}

tick-60sec
{
        exit(1);
}
Comment 1 Fabian Keil 2012-06-24 22:04:25 UTC
Unfortunately the patch has the same issue as the one I proposed in:
http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/159612

In this case it causes the system to panic when using timestamp
inside probes for "fbt::nanouptime:".

Fabian
Comment 2 Mark Linimon freebsd_committer freebsd_triage 2012-06-25 00:51:09 UTC
Responsible Changed
From-To: freebsd-amd64->gnn

possibly another one for gnn to mull over.
Comment 3 George V. Neville-Neil freebsd_committer freebsd_triage 2012-07-26 17:19:37 UTC
State Changed
From-To: open->patched

This should be fixed by: 

http://svn.freebsd.org/changeset/base/238537
Comment 4 George V. Neville-Neil freebsd_committer freebsd_triage 2012-07-26 19:15:01 UTC
State Changed
From-To: patched->open

Updated the wrong PR, this is still an issue.
Comment 5 Davide Italiano freebsd_committer freebsd_triage 2013-05-29 12:11:48 UTC
Hi you both.
I'm not able to reproduce this as per r251026.
root@dirichlet:/home/davide # sysctl hw.acpi.cpu.cx_lowest
hw.acpi.cpu.cx_lowest: C3

root@dirichlet:/home/davide/debug_dtrace # ./timestamp.d
Probe fired at 2013 May 28 17:19:01, 999 ms after the previous one.
Probe fired at 2013 May 28 17:19:02, 999 ms after the previous one.
Probe fired at 2013 May 28 17:19:03, 999 ms after the previous one.
Probe fired at 2013 May 28 17:19:04, 999 ms after the previous one.
Probe fired at 2013 May 28 17:19:05, 999 ms after the previous one.
Probe fired at 2013 May 28 17:19:06, 999 ms after the previous one.
Probe fired at 2013 May 28 17:19:07, 999 ms after the previous one.
Probe fired at 2013 May 28 17:19:08, 999 ms after the previous one.
Probe fired at 2013 May 28 17:19:09, 999 ms after the previous one.
Probe fired at 2013 May 28 17:19:10, 999 ms after the previous one.
Probe fired at 2013 May 28 17:19:11, 999 ms after the previous one.
Probe fired at 2013 May 28 17:19:12, 999 ms after the previous one.
^C
Probe fired at 2013 May 28 17:19:13, 999 ms after the previous one.

  elapsed
           value  ------------- Distribution ------------- count
             980 |                                         0
             990 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 13
            1000 |                                         0

  elapsed avg                                                     999

root@dirichlet:/home/davide/debug_dtrace # ./timestamp2.d
Date execution detected at 582088745137
ms since the last date execution: 1012
Date execution detected at 583101069460
ms since the last date execution: 1065
Date execution detected at 584166933763
ms since the last date execution: 1034
Date execution detected at 585201455739
ms since the last date execution: 1010
Date execution detected at 586212097290
ms since the last date execution: 1020
Date execution detected at 587233056523
ms since the last date execution: 1003
Date execution detected at 588237041013
ms since the last date execution: 1043
Date execution detected at 589280876859
ms since the last date execution: 1008
Date execution detected at 590289153541
ms since the last date execution: 1065
Date execution detected at 591355033619
ms since the last date execution: 1003
Date execution detected at 592359053134

Can you please see if you can reproduce it on a recent HEAD?
Comment 6 Fabian Keil 2013-05-29 18:27:05 UTC
Davide Italiano <davide@freebsd.org> wrote:

> I'm not able to reproduce this as per r251026.


Were you ever able to reproduce it with your current hardware?

My understanding is that whether or not the various sleep states
affect the TSC depends on the hardware.

This is already reflected in the TSC timecounter quality which I
assume isn't -1000 on your system, but DTrace currently doesn't
leverage this information.

> Can you please see if you can reproduce it on a recent HEAD?


I can still reproduce it with a kernel based on r251099:

fk@r500 ~ $sudo scripts/timestamp-test.d 
Probe fired at 2013 May 29 19:09:42,  28 ms after the previous one.
Probe fired at 2013 May 29 19:09:43,  33 ms after the previous one.
Probe fired at 2013 May 29 19:09:44, 320 ms after the previous one.
[...]
Probe fired at 2013 May 29 19:10:37, 126 ms after the previous one.
Probe fired at 2013 May 29 19:10:38, 149 ms after the previous one.
Probe fired at 2013 May 29 19:10:39, 204 ms after the previous one.
Probe fired at 2013 May 29 19:10:40, 136 ms after the previous one.

  elapsed                                           
           value  ------------- Distribution ------------- count    
              10 |                                         0        
              20 |@                                        2        
              30 |@@@@@                                    7        
              40 |@                                        2        
              50 |                                         0        
              60 |                                         0        
              70 |@@@                                      4        
              80 |@@                                       3        
              90 |@                                        2        
[...]
             680 |                                         0        
             690 |@                                        1        
             700 |                                         0        

  elapsed avg                                                     177

Fabian
Comment 7 Davide Italiano freebsd_committer freebsd_triage 2013-05-29 18:36:25 UTC
On Wed, May 29, 2013 at 7:27 PM, Fabian Keil <fk@fabiankeil.de> wrote:
> Davide Italiano <davide@freebsd.org> wrote:
>
>> I'm not able to reproduce this as per r251026.
>
> Were you ever able to reproduce it with your current hardware?
>
> My understanding is that whether or not the various sleep states
> affect the TSC depends on the hardware.
>
> This is already reflected in the TSC timecounter quality which I
> assume isn't -1000 on your system, but DTrace currently doesn't
> leverage this information.
>
>> Can you please see if you can reproduce it on a recent HEAD?
>
> I can still reproduce it with a kernel based on r251099:
>
> fk@r500 ~ $sudo scripts/timestamp-test.d
> Probe fired at 2013 May 29 19:09:42,  28 ms after the previous one.
> Probe fired at 2013 May 29 19:09:43,  33 ms after the previous one.
> Probe fired at 2013 May 29 19:09:44, 320 ms after the previous one.
> [...]
> Probe fired at 2013 May 29 19:10:37, 126 ms after the previous one.
> Probe fired at 2013 May 29 19:10:38, 149 ms after the previous one.
> Probe fired at 2013 May 29 19:10:39, 204 ms after the previous one.
> Probe fired at 2013 May 29 19:10:40, 136 ms after the previous one.
>
>   elapsed
>            value  ------------- Distribution ------------- count
>               10 |                                         0
>               20 |@                                        2
>               30 |@@@@@                                    7
>               40 |@                                        2
>               50 |                                         0
>               60 |                                         0
>               70 |@@@                                      4
>               80 |@@                                       3
>               90 |@                                        2
> [...]
>              680 |                                         0
>              690 |@                                        1
>              700 |                                         0
>
>   elapsed avg                                                     177
>
> Fabian

hmm, OK, I trust you.
Do you mind to give me the specific hardware configuration you're
running on (e.g. CPU, the values of sysctl related to timecounter)? In
particular, this problem is visible only if you rely on TSC or also
with other hardware counters (e.g. HPET)?

-- 
Davide

"There are no solved problems; there are only problems that are more
or less solved" -- Henri Poincare
Comment 8 Fabian Keil 2013-05-29 20:05:53 UTC
Davide Italiano <davide@freebsd.org> wrote:

> On Wed, May 29, 2013 at 7:27 PM, Fabian Keil <fk@fabiankeil.de> wrote:
> > Davide Italiano <davide@freebsd.org> wrote:


> >> Can you please see if you can reproduce it on a recent HEAD?
> >
> > I can still reproduce it with a kernel based on r251099:
> >
> > fk@r500 ~ $sudo scripts/timestamp-test.d

[...]

> hmm, OK, I trust you.


While you can trust me that I can still reproduce the problem with a kernel
based on r251099, what I wrote about timecounters should probably be run
by jkim@, mav@ or someone else with actual experience in this area.

I was mainly parroting what I remembered from threads like
"Time keeping Issues with the low-resolution TSC timecounter":
http://lists.freebsd.org/pipermail/freebsd-current/2011-June/thread.html#25175

> Do you mind to give me the specific hardware configuration you're
> running on (e.g. CPU, the values of sysctl related to timecounter)? In
> particular, this problem is visible only if you rely on TSC or also
> with other hardware counters (e.g. HPET)?


I'm using a Lenove ThinkPad R500 2714-9VG:
http://www.nycbug.org/?action=dmesgd&dmesgid=2449

fk@r500 ~ $sysctl -a | grep counter
kern.timecounter.tc.i8254.mask: 65535
kern.timecounter.tc.i8254.counter: 28801
kern.timecounter.tc.i8254.frequency: 1193182
kern.timecounter.tc.i8254.quality: 0
kern.timecounter.tc.HPET.mask: 4294967295
kern.timecounter.tc.HPET.counter: 1207831391
kern.timecounter.tc.HPET.frequency: 14318180
kern.timecounter.tc.HPET.quality: 950
kern.timecounter.tc.ACPI-fast.mask: 16777215
kern.timecounter.tc.ACPI-fast.counter: 4714057
kern.timecounter.tc.ACPI-fast.frequency: 3579545
kern.timecounter.tc.ACPI-fast.quality: 900
kern.timecounter.tc.TSC.mask: 4294967295
kern.timecounter.tc.TSC.counter: 106194638
kern.timecounter.tc.TSC.frequency: 1995043070
kern.timecounter.tc.TSC.quality: -1000
kern.timecounter.stepwarnings: 0
kern.timecounter.alloweddeviation: 5
kern.timecounter.hardware: HPET
kern.timecounter.choice: TSC(-1000) ACPI-fast(900) HPET(950) i8254(0) dummy(-1000000)
kern.timecounter.tick: 1
kern.timecounter.fast_gettime: 1
kern.timecounter.invariant_tsc: 1
kern.timecounter.smp_tsc: 0
kern.timecounter.smp_tsc_adjust: 0
kern.timecounter.tsc_shift: 1

With hw.acpi.cpu.cx_lowest set to C3, the DTrace timestamps are unreliable
if kern.timecounter.hardware is set to ACPI-fast, HPET or i8254.

With kern.timecounter.hardware=TSC the DTrace timestamp values are
reliable but due to the poor rating using TSC seems undesirable to me:

fk@r500 ~ $sudo scripts/timestamp-test.d 
Probe fired at 2013 May 29 20:20:13, 999 ms after the previous one.
[...]
Probe fired at 2013 May 29 20:21:11, 999 ms after the previous one.

  elapsed                                           
           value  ------------- Distribution ------------- count    
             980 |                                         0        
             990 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 59       
            1000 |                                         0        

  elapsed avg                                                     999

Fabian
Comment 9 George V. Neville-Neil freebsd_committer freebsd_triage 2014-04-21 18:14:39 UTC
Responsible Changed
From-To: gnn->markj

Hand over to the person doing most of the DTrace work.
Comment 10 Eitan Adler freebsd_committer freebsd_triage 2017-12-31 07:59:41 UTC
For bugs matching the following criteria:

Status: In Progress Changed: (is less than) 2014-06-01

Reset to default assignee and clear in-progress tags.

Mail being skipped
Comment 11 Graham Perrin freebsd_committer freebsd_triage 2022-10-17 12:38:22 UTC
Keyword: 

    patch
or  patch-ready

– in lieu of summary line prefix: 

    [patch]

* bulk change for the keyword
* summary lines may be edited manually (not in bulk). 

Keyword descriptions and search interface: 

    <https://bugs.freebsd.org/bugzilla/describekeywords.cgi>