Bug 121660 - [hwpmc] [patch] hwpmc(4) incorrectly handles PMC sampling events from AMD
Summary: [hwpmc] [patch] hwpmc(4) incorrectly handles PMC sampling events from AMD
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 8.0-CURRENT
Hardware: Any Any
: Normal Affects Only Me
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-03-13 04:00 UTC by Adrian Chadd
Modified: 2019-01-19 19:36 UTC (History)
1 user (show)

See Also:


Attachments
file.diff (744 bytes, patch)
2008-03-13 04:00 UTC, Adrian Chadd
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Adrian Chadd freebsd_committer freebsd_triage 2008-03-13 04:00:07 UTC
hwpmc returns nothing from a user-mode sample (pmcstat -P instructions -O sample.out -t <pid>)

The hwpmc registers are 48 bit on at least the Athlon XP platform
(and the code sets the width value in the class for all AMD CPUs to
48, not 64). The sampling is done by counting upwards until the
counter loops and generates an interrupt. The code uses a 2s compliment
trick to turn the sample period counter into a counter number
useful for generating an NMI. It does this on a 64 bit value but
as the counters are 48 bit, it will read back a 48 bit value with
the high 16 bits set to 0 and the virtual PMC stuff quickly loses
track.

My email to -current has more info:

Between my Athlon XP box giving me no useful pmc stats and my new Core
2 duo box not even working with pmc, I decided to poke at the Athlon
XP support a bit to see if I could figure out what was going on.

It seems that at least my revision of the Athlon XP has 48 bit
performance counters (AMD Athlon Processor x86 Code Optimisation
Guide, page 235 (Performance-Monitoring Counters: Overview) and the
top 16 bits read back 0x0000.

Since the code is taking the 2's compliment of the stored PMC value
(which is so the value is incremented to 0xffffffffffffffff and wraps
over, generating an NMI - mentioned on page 240), negating the value
gives humerous results:

(Note: some of these are my own debugging information.)

Mar  9 16:09:43 jacinta kernel: hwpmc: TSC/1/0x20<REA>
K7/4/0x1ff<INT,USR,SYS,EDG,THR,REA,WRI,INV,QUA>
Mar  9 16:10:02 jacinta kernel: MDP:SWO:1: pc=0xc5814180 pp=0 enable-msr=0
Mar  9 16:10:02 jacinta kernel: local initial: ri 1: 65536
Mar  9 16:10:02 jacinta kernel: MDP:SWO:1: pc=0xc5814180 pp=0xc576a780
enable-msr=0
Mar  9 16:10:02 jacinta kernel: csw_in: ri 1; pmcval 65536
Mar  9 16:10:02 jacinta kernel: MDP:WRI:1: amd-write cpu=0 ri=1
v=ffffffffffff0000
Mar  9 16:10:02 jacinta kernel: MDP:SWI:1: pc=0xc5814180 pp=0xc576a780
enable-msr=0
Mar  9 16:10:02 jacinta kernel: MDP:REA:1: amd-read id=1 class=1
Mar  9 16:10:02 jacinta kernel: MDP:REA:2: amd-read id=1 -> ffff00000000ff01
Mar  9 16:10:02 jacinta kernel: read: ffff00000000ff01; saved 10000;
diff -281474976710911
Mar  9 16:10:02 jacinta kernel: csw_out: ri 1: pp_pmcval 65536..
Mar  9 16:10:02 jacinta kernel: csw_out: ... ri 1: pp_pmcval now
281474976710911..
Mar  9 16:10:02 jacinta kernel: MDP:SWO:1: pc=0xc5814180 pp=0xc576a780
enable-msr=0
Mar  9 16:10:02 jacinta kernel: csw_in: ri 1; pmcval 281474976710911
Mar  9 16:10:02 jacinta kernel: MDP:WRI:1: amd-write cpu=0 ri=1
v=fffeffffffffff01
Mar  9 16:10:02 jacinta kernel: MDP:SWI:1: pc=0xc5814180 pp=0xc576a780
enable-msr=0
Mar  9 16:10:02 jacinta kernel: MDP:REA:1: amd-read id=1 class=1
Mar  9 16:10:02 jacinta kernel: MDP:REA:2: amd-read id=1 -> ffff00000000f47f
Mar  9 16:10:02 jacinta kernel: read: ffff00000000f47f; saved
10000000000ff; diff -562949953358976
Mar  9 16:10:02 jacinta kernel: csw_out: ri 1: pp_pmcval 281474976710911..
Mar  9 16:10:02 jacinta kernel: csw_out: ... ri 1: pp_pmcval now
844424930004351..

Fix: This attempts to "pretend" to be the expected value - and it begins
recording sample events in the above test - but I don't believe its
correct. If the value rolls over somehow then we'll be OR'ing in
high bits inappropriately.

I think it should be a sign-extend rather than my OR operation.
How-To-Repeat: 
pmcstat -P instructions -O sample.out -t pid
Comment 1 Robert Watson freebsd_committer freebsd_triage 2008-03-13 12:41:28 UTC
Responsible Changed
From-To: freebsd-bugs->jkoshy

Over to maintainer.
Comment 2 Adrian Chadd freebsd_committer freebsd_triage 2008-03-16 05:07:58 UTC
This patch sign extends the 48 bit to 64 bits before applying 2s
compliement to restore the original sampling value. It seems to work
for me.

Index: hwpmc_amd.c
===================================================================
RCS file: /share/FreeBSD/cvsrepo/src/sys/dev/hwpmc/hwpmc_amd.c,v
retrieving revision 1.14
diff -u -r1.14 hwpmc_amd.c
--- hwpmc_amd.c 7 Dec 2007 08:20:15 -0000       1.14
+++ hwpmc_amd.c 16 Mar 2008 05:02:38 -0000
@@ -302,12 +302,22 @@
 #endif

        tmp = rdmsr(pd->pm_perfctr); /* RDMSR serializes */
-       if (PMC_IS_SAMPLING_MODE(mode))
-               *v = AMD_PERFCTR_VALUE_TO_RELOAD_COUNT(tmp);
-       else
-               *v = tmp;
+       PMCDBG(MDP,REA,2,"amd-read (pre-munge) id=%d -> %jd", ri, tmp);
+       if (PMC_IS_SAMPLING_MODE(mode)) {
+               /*
+                * The counters are 48 bit but we expect them to be 64 bit for
+                * 2s compliment "hack" done to allow the "count up" to overflow
+                * which triggers the sampling mode NMI.
+                * This code sign-extends the 48 bit number in case the returned
+                * value requires it.
+                */
+               if (tmp & 0x0000800000000000)
+                       tmp |= 0xffff000000000000;
+               tmp = AMD_PERFCTR_VALUE_TO_RELOAD_COUNT(tmp);
+       }
+       *v = tmp;

-       PMCDBG(MDP,REA,2,"amd-read id=%d -> %jd", ri, *v);
+       PMCDBG(MDP,REA,2,"amd-read (post-munge) id=%d -> %jd", ri, *v);

        return 0;
 }
@@ -683,7 +693,7 @@
        KASSERT(cpu >= 0 && cpu < mp_ncpus,
            ("[amd,%d] out of range CPU %d", __LINE__, cpu));

-       PMCDBG(MDP,INT,1, "cpu=%d tf=0x%p um=%d", cpu, (void *) tf,
+       PMCDBG(MDP,INT,1, "cpu=%d tf=%p um=%d", cpu, (void *) tf,
            TRAPF_USERMODE(tf));

        retval = 0;
Comment 3 Joseph Koshy freebsd_committer freebsd_triage 2008-03-17 05:58:27 UTC
>  This patch sign extends the 48 bit to 64 bits before applying 2s
>  compliement to restore the original sampling value. It seems to work
>  for me.

That is great news.

I have a simpler patch to offer, though.  This replaces the test and
branch with two shifts.

Index: hwpmc_amd.c
===================================================================
RCS file: /cvs/FreeBSD/src/sys/dev/hwpmc/hwpmc_amd.c,v
retrieving revision 1.9.2.1
diff -u -r1.9.2.1 hwpmc_amd.c
--- hwpmc_amd.c	15 Sep 2005 15:48:16 -0000	1.9.2.1
+++ hwpmc_amd.c	16 Mar 2008 16:48:27 -0000
@@ -298,9 +298,11 @@
 #endif
 
 	tmp = rdmsr(pd->pm_perfctr); /* RDMSR serializes */
-	if (PMC_IS_SAMPLING_MODE(mode))
+	if (PMC_IS_SAMPLING_MODE(mode)) {
+		/* Sign extend 48 bit value to 64 bits. */
+		tmp = (pmc_value_t) (((int64_t) tmp << 16) >> 16);
 		*v = AMD_PERFCTR_VALUE_TO_RELOAD_COUNT(tmp);
-	else
+	} else
 		*v = tmp;
 
 	PMCDBG(MDP,REA,2,"amd-read id=%d -> %jd", ri, *v);

The use of 2's complement isn't a 'hack'; these counters only count
up, so if you want sample every 'n' events, you have to program the
counters with an initial value that is 'n' less than the overflow
limit.  This is documented in the manual, IIRC.

The additional debug messages you've added are fine; if you feel
they will be to be useful to someone in the future, feel free to
add them to CVS.

Regards,
Koshy
Comment 4 dfilter service freebsd_committer freebsd_triage 2008-03-18 08:39:18 UTC
adrian      2008-03-18 08:39:12 UTC

  FreeBSD src repository

  Modified files:
    sys/dev/hwpmc        hwpmc_amd.c 
  Log:
  Sign-extend the 48-bit AMD PMC counter before treating it to a 64-bit
  2's compliment.
  
  The 2's compliment transform is done so a "count down" sampling interval
  can be converted into a "count up" PMC value. a 2's complimented 'count down'
  value is written to the PMC counter; then the read-back counter is reverted
  via another 2's compliment.
  
  PR: kern/121660
  Reviewed by: jkoshy
  Approved by: jkoshy
  MFC after: 1 week
  
  Revision  Changes    Path
  1.16      +8 -5      src/sys/dev/hwpmc/hwpmc_amd.c
_______________________________________________
cvs-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/cvs-all
To unsubscribe, send any mail to "cvs-all-unsubscribe@freebsd.org"
Comment 5 Joseph Koshy freebsd_committer freebsd_triage 2008-03-19 05:15:42 UTC
jk> I have a simpler patch to offer, though.  This replaces the test and
jk> branch with two shifts.

And here's an even simpler patch that keeps the flow of the code
in hwpmc_amd.c intact.  It is cleaner to keep the conversion to and
from the perf counter's representation and the MI representation
within these macros instead of partially inside and partially outside
them.

Index: hwpmc_amd.h
===================================================================
RCS file: /cvs/FreeBSD/src/sys/dev/hwpmc/hwpmc_amd.h,v
retrieving revision 1.1
diff -u -r1.1 hwpmc_amd.h
--- hwpmc_amd.h	9 Jun 2005 19:45:07 -0000	1.1
+++ hwpmc_amd.h	18 Mar 2008 16:22:03 -0000
@@ -79,8 +79,9 @@
 #define AMD_PMC_IS_STOPPED(evsel) ((rdmsr((evsel)) & AMD_PMC_ENABLE) == 0)
 #define AMD_PMC_HAS_OVERFLOWED(pmc) ((rdpmc(pmc) & (1ULL << 47)) == 0)
 
-#define	AMD_RELOAD_COUNT_TO_PERFCTR_VALUE(V)	(-(V))
-#define	AMD_PERFCTR_VALUE_TO_RELOAD_COUNT(P)	(-(P))
+#define	AMD_PERFCTR_LIMIT			0x0000FFFFFFFFFFFFULL
+#define	AMD_RELOAD_COUNT_TO_PERFCTR_VALUE(V)	(AMD_PERFCTR_LIMIT - (V) + 1)
+#define	AMD_PERFCTR_VALUE_TO_RELOAD_COUNT(P)	(AMD_PERFCTR_LIMIT - (P) + 1)
 
 struct pmc_md_amd_op_pmcallocate {
 	uint32_t	pm_amd_config;


Could you let me know if this change works ok?

Koshy
Comment 6 Joseph Koshy freebsd_committer freebsd_triage 2008-04-10 09:58:34 UTC
State Changed
From-To: open->feedback

Awaiting feedback on revised patch.
Comment 7 Hiren Panchasara freebsd_committer freebsd_triage 2013-05-06 21:41:43 UTC
Responsible Changed
From-To: jkoshy->adrian

Assigning to Adrian as it's waiting for his feedback.
Comment 8 Eitan Adler freebsd_committer freebsd_triage 2017-12-31 08:00:20 UTC
For bugs matching the following criteria:

Status: In Progress Changed: (is less than) 2014-06-01

Reset to default assignee and clear in-progress tags.

Mail being skipped
Comment 9 Oleksandr Tymoshenko freebsd_committer freebsd_triage 2019-01-19 19:36:07 UTC
There was a commit referencing this bug, but it's still not closed and has been inactive for some time. Closing as "fixed". Please re-open it if issue hasn't been completely resolved.