Bug 207068 - hwpmc wrap around/sign extension
Summary: hwpmc wrap around/sign extension
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 10.2-STABLE
Hardware: amd64 Any
: --- Affects Some People
Assignee: Konstantin Belousov
Keywords: hwpmc
Depends on:
Reported: 2016-02-10 00:26 UTC by joss.upton
Modified: 2016-04-04 07:32 UTC (History)
4 users (show)

See Also:

pmc sign extension diff (4.06 KB, patch)
2016-02-10 15:41 UTC, joss.upton
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description joss.upton 2016-02-10 00:26:24 UTC
hwpmc_intel writes the old value of a counter back to the counter value register during a context switch in.  However, those registers ignore the upper 32 bits (EDX) and instead sign extended to the register width based on EAX[31].  This is the behavior on all machines not supporting "full width writes" (indicated by CPUID2_PDCM being set && and IA32_PERF_CAPABILITIES(0x345) & FW_WRITE(bit 13) being set).

If full width writes are supported, full-width aliases for IA32_PMCn exist 0x400 MSRs higher (e.g. IA32_PMC0 == 0xC1 and IA32_A_PMC0 = 0x4C1).

This affects processes monitoring CPU_CLK for processes running more than a second or two.  Using this counter quickly runs into a "negative increment" assertion in hwpmc_mod.c because of the sign extension.  Attached is the least invasive patch I could come up with and it's against 10.2-R.

It also fixes a few debugging PMCDBGs and adjusts the size of PMC_DEFAULT_DEBUG_FLAGS to match the size of the structure.
Comment 1 Konstantin Belousov freebsd_committer 2016-02-10 11:54:14 UTC
There is no patch attached.
Comment 2 joss.upton 2016-02-10 15:41:15 UTC
Created attachment 166832 [details]
pmc sign extension diff
Comment 3 joss.upton 2016-02-10 17:04:20 UTC
(In reply to Konstantin Belousov from comment #1)
My apologies... attached now.
Comment 4 Konstantin Belousov freebsd_committer 2016-02-10 20:03:15 UTC
(In reply to joss.upton from comment #3)
Could you please provide exact recipe to reproduce the overflow panic ?

I tried e.g. pmcstat -P CPU_CLK_UNHALTED_CORE <some looping binary> on sandybridge and was unable to panic stock HEAD kernel.
Comment 5 joss.upton 2016-02-10 21:47:03 UTC
(In reply to Konstantin Belousov from comment #4)
I don't know if Sandy Bridge supports the aliases or not.  I do know that vmware fusion on my Broadwell machine does NOT support the IA32_A_PMCn registers and thus will panic.

Note you may also need other processes using the PMC to be scheduled/unscheduled to reproduce... You have to catch a case where it writes with EAX[31]=1 to IA32_PMCn.  This results in an automatic sign extension (as described in the Intel arch manual, Full-Width Writes to Performance Counter Registers) and ignores EDX completely.
Comment 6 joss.upton 2016-02-10 21:57:56 UTC
(In reply to joss.upton from comment #5)
Actually, the best test might be to do this:

uint64_t num;

num = rdmsr(0xc1);
printf("init: %lx\n", num);
wrmsr(0xc1, 0x80000000);
num = rdmsr(0xc1);
printf("after e31: %lx\n", num);

When I do this:

init: 0
after e31: ffff80000000

This behavior is correct according to the Intel manual, but causes issues with negative increments if you race it long enough.
Comment 7 Konstantin Belousov freebsd_committer 2016-02-11 13:57:35 UTC
(In reply to joss.upton from comment #6)
I believe that the behaviour of reading MSR 0xc1 is architectural ?  Anyway, on sandybridge:

sandy% sudo cpucontrol -m 0xc1=0x80000000 /dev/cpuctl0
sandy% sudo cpucontrol -m 0xc1 /dev/cpuctl0
MSR 0xc1: 0x0000ffff 0x80000000

SB does have full-width PMC write capability:
sandy% sudo cpucontrol -m 0x345 /dev/cpuctl0
MSR 0x345: 0x00000000 0x000031c3
bit 13 (FW_WRITE) is set

But my concern is that I cannot reproduce the issue with the following script:
for x in $(jot 8); do pmcstat -P CPU_CLK_UNHALTED_CORE perl ./loop.pl 2>/dev/null &; done

Unpatched kernel must exhibit the problem because it does not write to PMC through aliases and all writes are clipped.  I want to be able to reproduce this before commit, at least I need to validate the change and to understand that it is complete (I agree that this is the right approach, at least).
Comment 8 joss.upton 2016-02-12 03:25:10 UTC
(In reply to Konstantin Belousov from comment #7)
Not sure why you can't reproduce, but I'm doing the equivalent of the shell script below.  Note the ",usr" and the lowercase p.

  On a 4 core machine:
   while true; do
    for a in 1 2 3 4 5 6 7 8; do
    pmcstat -p CPU_CLK_UNHALTED_CORE,usr perl loop.pl &
    sleep 10
    killall perl

This ensures that the PMCs are csw in/out very often (every hz).
Comment 9 commit-hook freebsd_committer 2016-02-12 07:20:29 UTC
A commit references this bug:

Author: kib
Date: Fri Feb 12 07:20:00 UTC 2016
New revision: 295558
URL: https://svnweb.freebsd.org/changeset/base/295558

  Remove tautological cast.

  PR:	207068
  Submitted by:	joss.upton@yahoo.com
  MFC after:	2 weeks

Comment 10 commit-hook freebsd_committer 2016-02-12 07:21:32 UTC
A commit references this bug:

Author: kib
Date: Fri Feb 12 07:20:27 UTC 2016
New revision: 295559
URL: https://svnweb.freebsd.org/changeset/base/295559

  Adjust the size of PMC_DEFAULT_DEBUG_FLAGS to match the size of the

  PR:	207068
  Submitted by:	joss.upton@yahoo.com
  MFC after:	2 weeks

Comment 11 commit-hook freebsd_committer 2016-02-12 07:27:34 UTC
A commit references this bug:

Author: kib
Date: Fri Feb 12 07:27:24 UTC 2016
New revision: 295560
URL: https://svnweb.freebsd.org/changeset/base/295560

  If full width writes to the performance monitoring counters are
  supported, use full-width aliases MSRs for writes.  This fixes the
  "[pmc,X] negative increment" assertion on the context switch when
  clipped counter value is sign-extended.

  Add definitions for the MSR IA32_PERF_CAPABILITIES needed to detect
  the feature.

  PR:	207068
  Submitted by:	joss.upton@yahoo.com
  MFC after:	2 weeks

Comment 12 Konstantin Belousov freebsd_committer 2016-02-12 07:29:19 UTC
(In reply to Konstantin Belousov from comment #7)
I still not able to reproduce the problem, not on sandy machine, not on the haswell box.  Even more strange, sandy box does not report any counter increment for ,usr.

Anyway, I do believe that your patch is the right thing to do, I committed it split into main part and two tweaks.
Comment 13 joss.upton 2016-02-13 03:12:05 UTC
(In reply to Konstantin Belousov from comment #12)
Thank you.  I will integrate this back into my tree and retest (diffs look correct to me).  I think the bug can be closed.
Comment 14 Mark Linimon freebsd_committer freebsd_triage 2016-04-04 07:32:59 UTC
Already committed by kib.