hwpmc_intel writes the old value of a counter back to the counter value register during a context switch in. However, those registers ignore the upper 32 bits (EDX) and instead sign extended to the register width based on EAX[31]. This is the behavior on all machines not supporting "full width writes" (indicated by CPUID2_PDCM being set && and IA32_PERF_CAPABILITIES(0x345) & FW_WRITE(bit 13) being set). If full width writes are supported, full-width aliases for IA32_PMCn exist 0x400 MSRs higher (e.g. IA32_PMC0 == 0xC1 and IA32_A_PMC0 = 0x4C1). This affects processes monitoring CPU_CLK for processes running more than a second or two. Using this counter quickly runs into a "negative increment" assertion in hwpmc_mod.c because of the sign extension. Attached is the least invasive patch I could come up with and it's against 10.2-R. It also fixes a few debugging PMCDBGs and adjusts the size of PMC_DEFAULT_DEBUG_FLAGS to match the size of the structure.
There is no patch attached.
Created attachment 166832 [details] pmc sign extension diff
(In reply to Konstantin Belousov from comment #1) My apologies... attached now.
(In reply to joss.upton from comment #3) Could you please provide exact recipe to reproduce the overflow panic ? I tried e.g. pmcstat -P CPU_CLK_UNHALTED_CORE <some looping binary> on sandybridge and was unable to panic stock HEAD kernel.
(In reply to Konstantin Belousov from comment #4) I don't know if Sandy Bridge supports the aliases or not. I do know that vmware fusion on my Broadwell machine does NOT support the IA32_A_PMCn registers and thus will panic. Note you may also need other processes using the PMC to be scheduled/unscheduled to reproduce... You have to catch a case where it writes with EAX[31]=1 to IA32_PMCn. This results in an automatic sign extension (as described in the Intel arch manual, 18.2.2.3 Full-Width Writes to Performance Counter Registers) and ignores EDX completely.
(In reply to joss.upton from comment #5) Actually, the best test might be to do this: uint64_t num; num = rdmsr(0xc1); printf("init: %lx\n", num); wrmsr(0xc1, 0x80000000); num = rdmsr(0xc1); printf("after e31: %lx\n", num); When I do this: init: 0 after e31: ffff80000000 This behavior is correct according to the Intel manual, but causes issues with negative increments if you race it long enough.
(In reply to joss.upton from comment #6) I believe that the behaviour of reading MSR 0xc1 is architectural ? Anyway, on sandybridge: sandy% sudo cpucontrol -m 0xc1=0x80000000 /dev/cpuctl0 sandy% sudo cpucontrol -m 0xc1 /dev/cpuctl0 MSR 0xc1: 0x0000ffff 0x80000000 SB does have full-width PMC write capability: sandy% sudo cpucontrol -m 0x345 /dev/cpuctl0 MSR 0x345: 0x00000000 0x000031c3 bit 13 (FW_WRITE) is set But my concern is that I cannot reproduce the issue with the following script: for x in $(jot 8); do pmcstat -P CPU_CLK_UNHALTED_CORE perl ./loop.pl 2>/dev/null &; done Unpatched kernel must exhibit the problem because it does not write to PMC through aliases and all writes are clipped. I want to be able to reproduce this before commit, at least I need to validate the change and to understand that it is complete (I agree that this is the right approach, at least).
(In reply to Konstantin Belousov from comment #7) Not sure why you can't reproduce, but I'm doing the equivalent of the shell script below. Note the ",usr" and the lowercase p. On a 4 core machine: while true; do for a in 1 2 3 4 5 6 7 8; do pmcstat -p CPU_CLK_UNHALTED_CORE,usr perl loop.pl & done sleep 10 killall perl done This ensures that the PMCs are csw in/out very often (every hz).
A commit references this bug: Author: kib Date: Fri Feb 12 07:20:00 UTC 2016 New revision: 295558 URL: https://svnweb.freebsd.org/changeset/base/295558 Log: Remove tautological cast. PR: 207068 Submitted by: joss.upton@yahoo.com MFC after: 2 weeks Changes: head/sys/dev/hwpmc/hwpmc_mod.c
A commit references this bug: Author: kib Date: Fri Feb 12 07:20:27 UTC 2016 New revision: 295559 URL: https://svnweb.freebsd.org/changeset/base/295559 Log: Adjust the size of PMC_DEFAULT_DEBUG_FLAGS to match the size of the structure. PR: 207068 Submitted by: joss.upton@yahoo.com MFC after: 2 weeks Changes: head/sys/sys/pmc.h
A commit references this bug: Author: kib Date: Fri Feb 12 07:27:24 UTC 2016 New revision: 295560 URL: https://svnweb.freebsd.org/changeset/base/295560 Log: If full width writes to the performance monitoring counters are supported, use full-width aliases MSRs for writes. This fixes the "[pmc,X] negative increment" assertion on the context switch when clipped counter value is sign-extended. Add definitions for the MSR IA32_PERF_CAPABILITIES needed to detect the feature. PR: 207068 Submitted by: joss.upton@yahoo.com MFC after: 2 weeks Changes: head/sys/dev/hwpmc/hwpmc_core.c head/sys/dev/hwpmc/hwpmc_core.h
(In reply to Konstantin Belousov from comment #7) I still not able to reproduce the problem, not on sandy machine, not on the haswell box. Even more strange, sandy box does not report any counter increment for ,usr. Anyway, I do believe that your patch is the right thing to do, I committed it split into main part and two tweaks.
(In reply to Konstantin Belousov from comment #12) Thank you. I will integrate this back into my tree and retest (diffs look correct to me). I think the bug can be closed.
Already committed by kib.