Bug 262273 - Subsuquent Calls to clock_gettime(CLOCK_THREAD_CPUTIME_ID,... ) return time in the past
Summary: Subsuquent Calls to clock_gettime(CLOCK_THREAD_CPUTIME_ID,... ) return time i...
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: Unspecified
Hardware: amd64 Any
: --- Affects Some People
Assignee: Mark Johnston
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-03-01 13:59 UTC by Donald Sharp
Modified: 2022-03-24 15:27 UTC (History)
3 users (show)

See Also:


Attachments
attempt to fix the problem (945 bytes, patch)
2022-03-01 15:47 UTC, firk
no flags Details | Diff
patch to fix the problem (971 bytes, patch)
2022-03-02 14:54 UTC, firk
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Donald Sharp 2022-03-01 13:59:11 UTC
When I run:
```
#include <stdio.h>
#include <time.h>

void main(void)
{
        struct timespec before, after;

        while (1) {
                clock_gettime(CLOCK_THREAD_CPUTIME_ID, &before);
                clock_gettime(CLOCK_THREAD_CPUTIME_ID, &after);

                printf("before: %lu:%lu after %lu:%lu\n",
                        before.tv_sec, before.tv_nsec,
                        after.tv_sec, after.tv_nsec);

                if (after.tv_nsec < before.tv_nsec)
                        exit(-1);
        }
}
```

I get this output:

before: 0:18195000 after 0:18198000
before: 0:26005000 after 0:18258000

The after time is prior to the before time.

I am running on 12.1-Release in a vm on a ryzen 3900x.  I also asked a friend to run it on bare metal and got the same issue:
before: 0:444288000 after 0:444288000
before: 0:446097000 after 0:444290000

They are running on FreeBSD 12.3-STABLE stable/12-n163-259bedb8f

It is my expectation that there should not be jumps backwards in time.  Is my assumption incorrect?
Comment 1 firk 2022-03-01 14:09:29 UTC
That could happen when CPU frequency still not reached its maximum value. The cpu_time is calculated as cpu_time = cpu_ticks / max_cpu_tickrate, and when max_cpu_tickrate increasing, cpu_time decreasing.

Did you tested this on not just-powered-up system? Try to run the test multiple times.
Comment 2 Donald Sharp 2022-03-01 14:15:23 UTC
I am able to reproduce this issue at will on my system.  Just have to run this test program and it happens
Comment 3 Joe Marcus Clarke freebsd_committer freebsd_triage 2022-03-01 14:16:57 UTC
My understanding is that the thread_cputime clock is monotonic for a given resolution (e.g., 1.0000000000000002e-06).  On the 12.3-STABLE system below, this i5 machine has been up for nearly three days.
Comment 4 firk 2022-03-01 14:45:08 UTC
I can't reproduce this (but that's not on Ryzen)

Try this


#include <stdio.h>
#include <sys/time.h>

int main(void)
{
        struct timespec before, after;

        clockid_t cid;
        if(clock_getcpuclockid2(0, CPUCLOCK_WHICH_TID, &cid)<0) return -1;

        while (1) {
                clock_gettime(cid, &before);
                clock_gettime(cid, &after);

                printf("before: %lu:%lu after %lu:%lu\n",
                        before.tv_sec, before.tv_nsec,
                        after.tv_sec, after.tv_nsec);

                if (after.tv_nsec < before.tv_nsec)
                        return -1;
        }
}


same problem?
Comment 5 Joe Marcus Clarke freebsd_committer freebsd_triage 2022-03-01 14:51:14 UTC
This has run for over 15 seconds without fail on my i5.  Though still not sure why the other clock is not monotonic.  I think the goal is to have somewhat portable code between Linux and *BSD.
Comment 6 firk 2022-03-01 14:57:58 UTC
Please run both tests multiple times to ensure that the first fails from time to time and the second not. If this is true, it is surely a bug that should be fixed.

The difference between these two timers is that CLOCK_THREAD_CPUTIME_ID is designed to be fully realtime, while clock_getcpuclockid2(0,CPUCLOCK_WHICH_TID) may return slightly outdated value.
Comment 7 Joe Marcus Clarke freebsd_committer freebsd_triage 2022-03-01 14:59:19 UTC
Yeah, I've been running both and I get 100% failure on Donald's original code (at different times), but thus far your code has not failed minutes in.
Comment 8 Donald Sharp 2022-03-01 15:36:42 UTC
When I run my test program it fails within 2 seconds every time I have ever run it.

When I run firk@cantconnect.ru 's test, I ran it for over 5 minutes without it failing.
Comment 9 firk 2022-03-01 15:47:03 UTC
Created attachment 232188 [details]
attempt to fix the problem

Please try this patch to kernel.

Looks like it is a conflict between switchtime/runtime reading in kern_thread_cputime() and switchtime/runtime updating in statclock(). Fixing statclock will be to expensive, so just make workaround in kern_thread_cputime().
Comment 10 firk 2022-03-02 14:54:42 UTC
Created attachment 232206 [details]
patch to fix the problem

Ok, I reproduced the bug, now there is proper patch that fixes it.
Comment 11 Joe Marcus Clarke freebsd_committer freebsd_triage 2022-03-02 15:16:01 UTC
This patch looks good.  I couldn't patch my i5 at the moment as it's my prod router.  But I patched my 13-STABLE VM, fixed Donald's code to allow for a .0 nsec, and it's been running solid for minutes.
Comment 12 firk 2022-03-15 21:04:18 UTC
created differential revision https://reviews.freebsd.org/D34559 hoping it'll help to approve this patch
Comment 13 commit-hook freebsd_committer freebsd_triage 2022-03-17 19:40:47 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=28d08dc7d051a4e058cc0004cf4dd884f87037a2

commit 28d08dc7d051a4e058cc0004cf4dd884f87037a2
Author:     firk <firk@cantconnect.ru>
AuthorDate: 2022-03-15 22:22:21 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2022-03-17 19:39:00 +0000

    clock_gettime: Fix CLOCK_THREAD_CPUTIME_ID race

    Use a spinlock section instead of a critical section to synchronize with
    statclock().  Otherwise the CLOCK_THREAD_CPUTIME_ID clock can appear to
    go backwards.

    PR:             262273
    Reviewed by:    markj
    MFC after:      1 week
    Differential Revision:  https://reviews.freebsd.org/D34568

 sys/kern/kern_time.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
Comment 14 commit-hook freebsd_committer freebsd_triage 2022-03-24 15:23:54 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=e7867aaa35d0970baa33c2aaa959829d84034558

commit e7867aaa35d0970baa33c2aaa959829d84034558
Author:     firk <firk@cantconnect.ru>
AuthorDate: 2022-03-15 22:22:21 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2022-03-24 14:28:27 +0000

    clock_gettime: Fix CLOCK_THREAD_CPUTIME_ID race

    Use a spinlock section instead of a critical section to synchronize with
    statclock().  Otherwise the CLOCK_THREAD_CPUTIME_ID clock can appear to
    go backwards.

    PR:             262273
    Reviewed by:    markj

    (cherry picked from commit 28d08dc7d051a4e058cc0004cf4dd884f87037a2)

 sys/kern/kern_time.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)