Bug 198149 - [hwpmc] pmcstat -P -t (top mode, process sampling) stops after a while
Summary: [hwpmc] pmcstat -P -t (top mode, process sampling) stops after a while
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: Any
OS: Any
Importance: --- Affects Only Me
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-03-02 01:49 UTC by Adrian Chadd
Modified: 2015-06-02 13:32 UTC
CC List: 3 users

See Also:


Attachments
threadspin.c (2.20 KB, patch)
2015-05-01 15:35 UTC, John Baldwin
first_hack.patch (541 bytes, patch)
2015-05-07 19:59 UTC, John Baldwin
second_hack.patch (571 bytes, patch)
2015-05-08 00:20 UTC, John Baldwin

Description Adrian Chadd 2015-03-02 01:49:45 UTC
Hi,

Using pmcstat in top mode with system-level sampling works fine, but process-level sampling stops reporting events after a while. "A while" is typically a couple of seconds if you're lucky.

This is reproducible on -10 and -HEAD.

To reproduce:

* run something that's very busy - say, nginx with multiple worker processes
* run pmcstat -P CPU_CLK_UNHALTED_CORE -T -w 1 -t <nginx pid>
* wait until it stops providing events.
Comment 1 ganbold 2015-04-17 05:31:14 UTC
Maybe it is related to running in a VM. When I run pmcstat (top mode) in a VM it behaves the same as Adrian observed.

However, when I tried pmcstat on real hardware, sampling kept working and providing events.
I tried both commands:

pmcstat -P CPU_CLK_UNHALTED_CORE -T -w 1 -t 1030
pmcstat -P INSTR_RETIRED_ANY -T -w 1 -t 1030

where 1030 is nginx worker process pid.

I'm running FreeBSD 11.0-CURRENT #4 r281075M: Sat Apr  4 22:27:29 ULAST 2015.
Comment 2 Adrian Chadd 2015-04-17 05:36:25 UTC
Hm, how many CPUs are in the real box?
Comment 3 ganbold 2015-04-17 05:38:43 UTC
% sysctl -a | grep ncpu
hw.ncpu: 2
% sysctl -a | grep smp
kern.smp.forward_signal_enabled: 1
kern.smp.topology: 0
kern.smp.cpus: 2
kern.smp.disabled: 0
kern.smp.active: 1
kern.smp.maxcpus: 32
kern.smp.maxid: 1
Comment 4 Adrian Chadd 2015-04-17 05:40:11 UTC
Do you have any boxes with >2 CPUs?
Comment 5 ganbold 2015-04-17 05:54:53 UTC
Works fine here on a box where:

FreeBSD 10.1-STABLE #0 r281519: Tue Apr 14 18:34:17 ULAT 2015
    tsgan@usec1:/usr/obj/usr/src/sys/GENERIC amd64
FreeBSD clang version 3.4.1 (tags/RELEASE_34/dot1-final 208032) 20140512
CPU: Intel(R) Xeon(R) CPU           X5670  @ 2.93GHz (2926.06-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x206c2  Family=0x6  Model=0x2c  Stepping=2
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x29ee3ff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,POPCNT,AESNI>
  AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
  AMD Features2=0x1<LAHF>
  VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
  TSC: P-state invariant, performance statistics
real memory  = 8589934592 (8192 MB)
avail memory = 8257982464 (7875 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <DELL   PE_SC3  >
FreeBSD/SMP: Multiprocessor System Detected: 12 CPUs
FreeBSD/SMP: 1 package(s) x 6 core(s) x 2 SMT threads
 cpu0 (BSP): APIC ID: 32
 cpu1 (AP): APIC ID: 33
 cpu2 (AP): APIC ID: 34
 cpu3 (AP): APIC ID: 35
 cpu4 (AP): APIC ID: 36
 cpu5 (AP): APIC ID: 37
 cpu6 (AP): APIC ID: 48
 cpu7 (AP): APIC ID: 49
 cpu8 (AP): APIC ID: 50
 cpu9 (AP): APIC ID: 51
 cpu10 (AP): APIC ID: 52
 cpu11 (AP): APIC ID: 53
Comment 6 Adrian Chadd 2015-04-21 01:52:06 UTC
OK, try something multi-threaded that's doing system calls. I just tried with nginx and it's okay - it's all one thread per process.
Comment 7 ganbold 2015-04-21 03:05:08 UTC
I tried to reproduce it with asterisk on the same multicore machine and put some load on it:

last pid:  1296;  load averages:  7.44,  5.22,  3.17                                                                                                   up 0+00:28:00  12:01:12
51 processes:  2 running, 49 sleeping
CPU: 31.9% user,  0.0% nice, 21.8% system,  1.8% interrupt, 44.5% idle
Mem: 300M Active, 443M Inact, 252M Wired, 90M Buf, 6933M Free
Swap: 16G Total, 16G Free

  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
 1275 root       1059  20    0  1472M   584M select  3  14:30 217.48% asterisk
 1280 root          1  24    0 33100K 12692K CPU3    3   0:06   1.27% pmcstat
  721 tsgan         1  20    0 86488K  7060K select  7   0:03   0.00% sshd

And pmcstat still shows output in top mode:

PMC: [CPU_CLK_UNHALTED_CORE] Samples: 24904 (100.0%) , 2906 unresolved

%SAMP IMAGE      FUNCTION             CALLERS
  7.0 kernel     _mtx_lock_spin_cooki pmclog_reserve:6.4 sched_add:0.6
  3.4 kernel     __mtx_lock_sleep     umtxq_sleep:1.3 __umtx_op_wake2_umutex:1.2 do_lock_umutex:0.6
  3.3 libgsm.so. 0xad72               Gsm_Short_Term_Synthesis_Filt
  2.4 kernel     cpu_search_lowest    cpu_search_lowest
  2.3 libgsm.so. 0xad37               Gsm_Short_Term_Synthesis_Filt
  1.8 libgsm.so. 0xad19               Gsm_Short_Term_Synthesis_Filt
  1.6 libgsm.so. 0xad30               Gsm_Short_Term_Synthesis_Filt
  1.6 libgsm.so. 0xad69               Gsm_Short_Term_Synthesis_Filt
  1.6 libgsm.so. 0xace7               Gsm_Short_Term_Synthesis_Filt
  1.4 libgsm.so. Gsm_Decoder          gsm_decode
  1.3 libgsm.so. 0xad53               Gsm_Short_Term_Synthesis_Filt
  1.3 libgsm.so. Gsm_Long_Term_Synthe Gsm_Decoder
  1.2 hwpmc.ko   pmc_hook_handler     sched_switch
  1.1 libgsm.so. 0xad4b               Gsm_Short_Term_Synthesis_Filt
  1.1 codec_ulaw lintoulaw_framein
  1.0 libgsm.so. Gsm_Short_Term_Synth Gsm_Decoder
  1.0 kernel     Xrendezvous
  1.0 libgsm.so. 0xad33               Gsm_Short_Term_Synthesis_Filt
  1.0 libgsm.so. 0xad3f               Gsm_Short_Term_Synthesis_Filt
  1.0 libgsm.so. 0xace0               Gsm_Short_Term_Synthesis_Filt
  1.0 libgsm.so. 0xacb0               Gsm_Short_Term_Synthesis_Filt
  0.9 libgsm.so. 0xacca               Gsm_Short_Term_Synthesis_Filt
  0.9 kernel     do_lock_umutex       __umtx_op_wait_umutex
  0.9 libgsm.so. 0xacef               Gsm_Short_Term_Synthesis_Filt
  0.9 libgsm.so. 0xad11               Gsm_Short_Term_Synthesis_Filt
  0.8 libgsm.so. 0xad00               Gsm_Short_Term_Synthesis_Filt
  0.8 kernel     amd64_syscall
  0.8 kernel     in_pcbconnect_setup  udp_send
  0.8 libgsm.so. 0xacfa               Gsm_Short_Term_Synthesis_Filt
  0.8 libgsm.so. 0xad06               Gsm_Short_Term_Synthesis_Filt
  0.7 libgsm.so. Gsm_RPE_Decoding     Gsm_Decoder
  0.7 kernel     thread_lock_flags_
  0.7 libc.so.7  memcpy
  0.6 kernel     __umtx_op_wake2_umut amd64_syscall
  0.6 kernel     __rw_rlock
  0.6 libgsm.so. 0xad43               Gsm_Short_Term_Synthesis_Filt
...
Comment 8 John Baldwin 2015-05-01 15:34:46 UTC
I can reproduce this trivially on real hardware. I use a dummy test app (threadspin.c). On my x220 laptop, this usually breaks in a few seconds. Same on a server (fox2 in the netperf lab). If I run it in a reduced cpuset to limit the number of CPUs, pmcstat runs longer before it stops working. Pinning the threads to specific CPUs does not fix the issue either. I looked via kgdb at the software state of hwpmc when pmcstat stops reporting samples, and it thinks the PMC is enabled and should be working. However, the PMC is apparently no longer generating interrupts.

I've ported all of the PMC debug stuff over to KTR and added a hack so that KTR auto-disables once a polling sample request fails. When run with all CPUs there is simply too much noise in the ktrdump and I can't find the last sample in my dump (it seems that the dump only contains events logged after interrupts stopped working). I tried restricting to a smaller set of CPUs, but then the bug failed to trigger. My plan is to find the smallest set of CPUs that still breaks while KTR is active and then see if I can tease anything out of the ktrdump.

The changes to pmc to use KTR can be found at https://github.com/freebsd/freebsd/compare/master...bsdjhb:pr_198149
Comment 9 John Baldwin 2015-05-01 15:35:31 UTC
Created attachment 156213
threadspin.c
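
(The attachment body is not reproduced in this record. As a rough stand-in only, not the actual threadspin.c, a test program of the kind comments 6 and 8 describe, with several threads spinning and making system calls, might look like this:)

/*
 * Hypothetical stand-in for threadspin.c: spin in userland, then make a
 * cheap system call so both user and kernel samples accumulate.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define NTHREADS 8

static void *
spin(void *arg)
{
    (void)arg;
    for (;;) {
        /* Burn CPU for a while... */
        for (volatile int i = 0; i < 1000000; i++)
            ;
        /* ...then make a cheap system call. */
        (void)getpid();
    }
}

int
main(void)
{
    pthread_t td[NTHREADS];
    int i;

    for (i = 0; i < NTHREADS; i++)
        if (pthread_create(&td[i], NULL, spin, NULL) != 0) {
            perror("pthread_create");
            exit(1);
        }
    printf("pid %d: %d threads spinning\n", (int)getpid(), NTHREADS);
    for (i = 0; i < NTHREADS; i++)
        pthread_join(td[i], NULL);
    return (0);
}

(Build with "cc -o threadspin threadspin.c -lpthread" and point pmcstat -t at the printed pid.)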
Comment 10 John Baldwin 2015-05-07 18:10:21 UTC
So I think I somewhat understand what is wrong. I'm not yet sure how to fix it. What seems to happen is that on a context switch out, the read_pmc operation returns a very large value. As a result, the PMC gets set to a value large enough that it won't expire during the next slice. This error compounds on every switch out/in, and the PMC stops firing as a result. Some snippets of KTR traces show the error in action:

238280   1  268934513552654 MDP:REA:1: iaf-read cpu=1 ri=2 msr=0x40000002 -> v=5dd31
238271   1  268934513429846 MDP:WRI:1: iaf-write cpu=1 ri=2 msr=0x40000002 v=4fffffff855b6 iafctrl=0 pmc=fffffff855b6
238262   1  268934513342102 MDP:REA:1: iaf-read cpu=1 ri=2 msr=0x40000002 -> v=2f2c
238247   1  268934510388202 MDP:WRI:1: iaf-write cpu=1 ri=2 msr=0x40000002 v=2fffffffc9345 iafctrl=0 pmc=fffffffc9345
238238   1  268934510294742 MDP:REA:1: iaf-read cpu=1 ri=2 msr=0x40000002 -> v=4125
238229   1  268934510220562 MDP:WRI:1: iaf-write cpu=1 ri=2 msr=0x40000002 v=1fffffffea910 iafctrl=0 pmc=fffffffea910
238220   1  268934510132922 MDP:REA:1: iaf-read cpu=1 ri=2 msr=0x40000002 -> v=fffffffffffe
238211   1  268934510048862 MDP:WRI:1: iaf-write cpu=1 ri=2 msr=0x40000002 v=ffffffffd489 iafctrl=0 pmc=ffffffffd489
238202   1  268934509967030 MDP:REA:1: iaf-read cpu=1 ri=2 msr=0x40000002 -> v=b34d
238193   1  268934509880238 MDP:WRI:1: iaf-write cpu=1 ri=2 msr=0x40000002 v=ffffffff109e iafctrl=0 pmc=ffffffff109e
238184   1  268934509789534 MDP:REA:1: iaf-read cpu=1 ri=2 msr=0x40000002 -> v=e848
238175   1  268934509749902 MDP:WRI:1: iaf-write cpu=1 ri=2 msr=0x40000002 v=ffffffff942b iafctrl=0 pmc=ffffffff942b
238166   1  268934509673986 MDP:REA:1: iaf-read cpu=1 ri=2 msr=0x40000002 -> v=62dd
238157   1  268934508267090 MDP:WRI:1: iaf-write cpu=1 ri=2 msr=0x40000002 v=ffffffff18a7 iafctrl=0 pmc=ffffffff18a7
238148   1  268934508103386 MDP:REA:1: iaf-read cpu=1 ri=2 msr=0x40000002 -> v=6d25

The error occurs at event 238220 when "-2" is converted to a large unsigned value.  After this point, the PMC is programmed with progressively larger and larger values on each switch in and never fires again.

By the end of the trace, when I killed my test program, it was quite far off:

116541   1  268945752955406 MDP:WRI:1: iaf-write cpu=1 ri=2 msr=0x40000002 v=ffed2e3cfd33 iafctrl=0 pmc=ffed2e3cfd33
116448   1  268945715324794 MDP:WRI:1: iaf-write cpu=1 ri=2 msr=0x40000002 v=ffef3dd2e030 iafctrl=0 pmc=ffef3dd2e030
116337   1  268945421271906 MDP:WRI:1: iaf-write cpu=1 ri=2 msr=0x40000002 v=fff2cff21be4 iafctrl=0 pmc=fff2cff21be4
116321   1  268945168850926 MDP:WRI:1: iaf-write cpu=1 ri=2 msr=0x40000002 v=fff2defc2fec iafctrl=0 pmc=fff2defc2fec
116276   1  268944964260070 MDP:WRI:1: iaf-write cpu=1 ri=2 msr=0x40000002 v=fff3210cd42b iafctrl=0 pmc=fff3210cd42b
116241   1  268944442945530 MDP:WRI:1: iaf-write cpu=1 ri=2 msr=0x40000002 v=fff353fadde0 iafctrl=0 pmc=fff353fadde0
116207   1  268944442823210 MDP:WRI:1: iaf-write cpu=1 ri=2 msr=0x40000002 v=fff3fa4fc2e3 iafctrl=0 pmc=fff3fa4fc2e3
...

I'm not really sure where the error is. I think it might be that iaf_perfctr_value_to_reload_count needs to sign-extend its return value so it can return -2 as the value of the PMC in this case instead of a large unsigned number. Note that this seems specific to hwpmc_core.c; hwpmc_amd.c uses a different approach. It sign-extends the value it reads from the PMC first and then negates it (which would have returned -2 in this case).
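
(A minimal sketch of the conversion problem described above, with illustrative constants and names rather than the actual hwpmc_core.c code: a sampling PMC is programmed with 2^48 minus the reload count and should interrupt when it wraps past 2^48. If it is read on switch-out after wrapping but before the interrupt is handled, the raw value is tiny, e.g. 2, and a purely unsigned conversion yields 2^48 - 2, the 0xfffffffffffe seen at event 238220, while a sign-extended conversion yields -2:)

#include <stdint.h>
#include <stdio.h>

#define PMC_WIDTH 48

/* Unsigned conversion: events remaining until the counter overflows. */
static uint64_t
reload_count_naive(uint64_t raw)
{
    return ((1ULL << PMC_WIDTH) - raw);
}

/* Sign-extended conversion: a wrapped counter reads as a negative count. */
static int64_t
reload_count_signed(uint64_t raw)
{
    uint64_t count = (1ULL << PMC_WIDTH) - raw;

    /* Shift bit 47 up to bit 63, then arithmetic-shift back down. */
    return ((int64_t)(count << (64 - PMC_WIDTH)) >> (64 - PMC_WIDTH));
}

int
main(void)
{
    uint64_t raw = 2;  /* counter wrapped and counted 2 more events */

    printf("naive:  0x%jx\n", (uintmax_t)reload_count_naive(raw));
    printf("signed: %jd\n", (intmax_t)reload_count_signed(raw));
    return (0);
}

(Compiled and run, this prints "naive: 0xfffffffffffe" and "signed: -2".)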
Comment 11 John Baldwin 2015-05-07 19:59:05 UTC
Created attachment 156477
first_hack.patch

I tried this change which would cause iaf_read_pmc() to return -2 in this case instead of a large number.  I added some additional debugging so I could verify this was being triggered in my testing.  It did seem to make things last longer, but in at least one of my test runs I still saw pmcstat eventually stop getting samples.
Comment 12 John Baldwin 2015-05-07 20:34:54 UTC
Finally snagged the hang with the first hack in place.  By the end of the trace all the PMCs have super high thresholds again, but here is where it went wrong:

211497   1  305651558868602 MDP:WRI:1: iaf-write cpu=1 ri=2 msr=0x40000002 v=ffffffdaa2a0 iafctrl=0 pmc=ffffffdaa2a0
211490   1  305651558791430 MDP:REA:1: iaf-read cpu=1 ri=2 msr=0x40000002 -> v=25ec32
211481   1  305651558772962 MDP:WRI:1: iaf-write cpu=1 ri=2 msr=0x40000002 v=ffffffd9db37 iafctrl=0 pmc=ffffffd9db37
211474   1  305651558684430 MDP:REA:1: iaf-read cpu=1 ri=2 msr=0x40000002 -> v=ffffffffffd8db37
211465   1  305651556116174 MDP:WRI:1: iaf-write cpu=1 ri=2 msr=0x40000002 v=1000000000000 iafctrl=0 pmc=0
211458   1  305651555926610 MDP:REA:1: iaf-read cpu=1 ri=2 msr=0x40000002 -> v=c220
211449   1  305651555915314 MDP:WRI:1: iaf-write cpu=1 ri=2 msr=0x40000002 v=ffffffff1ef0 iafctrl=0 pmc=ffffffff1ef0
211442   1  305651555647630 MDP:REA:1: iaf-read cpu=1 ri=2 msr=0x40000002 -> v=97d8
211433   1  305651555563074 MDP:WRI:1: iaf-write cpu=1 ri=2 msr=0x40000002 v=ffffffff438c iafctrl=0 pmc=ffffffff438c

In particular, after writing 0 to the PMC in event 211465, things went downhill. I'm still looking at the debug logs, but one thing is clear: we definitely can't put a value of 0 into a PMC. We should possibly force it to -1 in this case so that the interrupt triggers on the next event?
Comment 13 John Baldwin 2015-05-08 00:20:36 UTC
Created attachment 156483
second_hack.patch

This prevents the PMC value from being saved as zero.  When it hits zero during a switch out, this will now wrap back up to ask for a full batch of events before the next interrupt.
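
(A minimal sketch of the idea behind this patch, with illustrative names; the real change is in the attachment and was later committed to hwpmc_mod.c. On switch-out the saved per-process count is reduced by the events consumed during the slice, and a result of exactly zero wraps back up by a full reload interval so that zero is never written to the PMC:)

#include <stdint.h>
#include <stdio.h>

/*
 * saved:    reload count remaining when the thread switched in
 * consumed: events counted during this slice
 * reload:   the full sampling reload interval
 */
static uint64_t
adjust_saved_count(uint64_t saved, uint64_t consumed, uint64_t reload)
{
    int64_t v = (int64_t)(saved - consumed);

    /* "<= 0", not "< 0": a result of exactly zero must wrap up too. */
    if (v <= 0)
        v += reload;
    return ((uint64_t)v);
}

int
main(void)
{
    /* The slice consumed exactly the remaining count: wrap to a full reload. */
    printf("%ju\n", (uintmax_t)adjust_saved_count(100, 100, 65536));
    /* Normal case: 40 events remain for the next slice. */
    printf("%ju\n", (uintmax_t)adjust_saved_count(100, 60, 65536));
    return (0);
}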
Comment 14 John Baldwin 2015-05-08 10:58:46 UTC
This survived an overnight run, with pmcstat still getting samples this morning. I added debugging to print a message each time one of these fixes was applied (and in the case of the first patch, I output the raw value of the PMC). Both of these conditions fired fairly consistently during the test (once every few seconds or so). In addition, when I had run with just the first patch, I had seen raw PMC counter values in my debug messages that could be a bit large, for example:

CPU 1: counter overflowed: 87516
CPU 1: counter overflowed: 22
CPU 12: counter overflowed: 2
CPU 1: counter overflowed: 2
CPU 1: counter overflowed: 13629
CPU 1: counter overflowed: 2
CPU 1: counter overflowed: 2

With both patches applied I do not see "large" values, only small ones:

CPU 5: fixing zero PMC
CPU 1: counter overflowed: 20
CPU 1: fixing zero PMC
CPU 1: counter overflowed: 2
CPU 1: fixing zero PMC
CPU 1: fixing zero PMC
CPU 15: fixing zero PMC
CPU 1: fixing zero PMC
CPU 1: counter overflowed: 4
CPU 1: counter overflowed: 2
CPU 1: counter overflowed: 2
CPU 1: fixing zero PMC
CPU 1: counter overflowed: 2
CPU 1: fixing zero PMC
CPU 1: counter overflowed: 2
CPU 8: fixing zero PMC
CPU 1: counter overflowed: 2
CPU 1: counter overflowed: 2
CPU 1: counter overflowed: 2
CPU 2: counter overflowed: 2
CPU 9: counter overflowed: 2
CPU 3: fixing zero PMC
CPU 8: fixing zero PMC
CPU 1: counter overflowed: 22
CPU 1: fixing zero PMC
CPU 1: fixing zero PMC
CPU 1: counter overflowed: 2
CPU 1: counter overflowed: 27
CPU 1: counter overflowed: 5
CPU 1: fixing zero PMC
.....

I had wondered if the second bug (writing a PMC value of zero) could have been the source of the first bug (you can see how it would easily trigger it: if you write a PMC value of zero and the event happens 2 times before your next context switch, you would have a raw value of "2" when you switched out). However, while it seems to have fixed some of the overflows (the "large" ones), it does not seem to have fixed all of them. I definitely think the second fix is legit (and the bug has been present since sampling was added to PMC). I think the first change is also technically correct, but I'm not sure why we are seeing those values.
Comment 15 John Baldwin 2015-05-08 19:34:29 UTC
So I don't have a good explanation of why the counters overflow without generating an interrupt. It does seem like the interrupt is lost, or perhaps there is a race where the counter can be disabled right about the time that it should fire but still counts a few events after it has been disabled. In particular, I captured this sequence of events (filtered to show only CPU 1) where I got an overflow with a value of 2:

 72890   1  563885152232313 MDP:REA:2: iaf-read counter overflowed: 2
  2264   1  563885043095469 MDP:STO:1: iafctrl=0(0) globalctrl=0(0)
  2263   1  563885043094005 MDP:STO:1: iaf-stop iafctrl=0
  2262   1  563885043093777 MDP:STO:1: iaf-stop cpu=1 ri=2
  2256   1  563885043091809 CSW:SWO:1: cpu=1 proc=0xfffff8005698c538 (5697, threadspin) pp=0xfffff80052fea600
  2188   1  563885043029837 MDP:INT:1: cpu=1 fixedctrl=b00 globalctrl=400000000 status=0 ovf=0
  2187   1  563885043028445 MDP:INT:1: cpu=1 intrenable=400000000
  2186   1  563885043028165 MDP:INT:1: iaf-intr cpu=1 error=0 v=ffffffff0000(ffffffff0000)
  2185   1  563885043026045 MDP:INT:1: cpu=1 intrstatus=400000000
  2184   1  563885043025621 MDP:INT:1: cpu=1 tf=0x0xfffffe083a551f30 um=1
  2113   1  563885042960229 MDP:INT:1: cpu=1 fixedctrl=b00 globalctrl=400000000 status=0 ovf=0
  2112   1  563885042958865 MDP:INT:1: cpu=1 intrenable=400000000
  2111   1  563885042958665 MDP:INT:1: iaf-intr cpu=1 error=0 v=ffffffff0000(ffffffff0000)
  2110   1  563885042957205 MDP:INT:1: cpu=1 intrstatus=400000000
  2109   1  563885042956765 MDP:INT:1: cpu=1 tf=0x0xfffffe083a551f30 um=0
  2108   1  563885042955049 MDP:SWI:1: cr4=0x406e0
  2107   1  563885042954861 MDP:SWI:1: pc=0xfffff8014d0e5300 pp=0xfffff80052fea600 enable-msr=0
  2106   1  563885042954569 MDP:STA:1: iafctrl=b00(b00) globalctrl=400000000(400000000)
  2103   1  563885042952493 MDP:STA:1: iaf-start cpu=1 ri=2
  2102   1  563885042952109 MDP:WRI:1: iaf-write cpu=1 ri=2 msr=0x40000002 v=fffffffff7d2 iafctrl=0 pmc=fffffffff7d2
  2094   1  563885042949821 CSW:SWI:1: cpu=1 ri=23 new=2094
  2087   1  563885042947433 CSW:SWI:1: cpu=1 proc=0xfffff8005698c538 (5697, threadspin) pp=0xfffff80052fea600

Here you can see that the interrupts are occurring about every 75 KTR events or so, so the next one would have fired at around event 2250. That is when the switch-out begins. Note that there is a long gap between the call to iaf_stop() from pmc_process_csw_out() and the later call to iaf_read(), not just in KTR events but also in clock cycles (109136844 cycles on a 3.3 GHz CPU, so ~33 milliseconds assuming my math is correct). All of the intervening PMC events are PMC interrupts on other CPUs during that time. I'm actually not sure what CPU 1 could have been spinning on for that duration aside from the KTR index itself.

I don't see anything that should be blocking the PMC interrupt (an NMI) either, so I wonder if it's possible for the PMC to increment a few times after it has been disabled?  I do find that if I change my command to only monitor user events, then this goes away, so presumably the counter was still "running" and counting clock cycles in the PMC code itself when it overflowed.  Given that, I'm just inclined to apply the first hack patch as-is (though also fix the similar function for IAP).
Comment 16 commit-hook 2015-05-19 19:16:04 UTC
A commit references this bug:

Author: jhb
Date: Tue May 19 19:15:20 UTC 2015
New revision: 283123
URL: https://svnweb.freebsd.org/changeset/base/283123

Log:
  Fix two bugs that could result in PMC sampling effectively stopping.
  In both cases, the effect of the bug was that a very small positive
  number was written to the counter. This means that a large number of
  events needed to occur before the next sampling interrupt would trigger.
  Even with very frequently occurring events like clock cycles, wrapping all
  the way around could take a long time. Both bugs occurred when updating
  the saved reload count for an outgoing thread on a context switch.

  First, the counter-independent code compares the current reload count
  against the count set when the thread switched in and generates a delta
  to apply to the saved count. If this delta causes the reload counter
  to go negative, it would add a full reload interval to wrap it around to
  a positive value. The fix is to add the full reload interval if the
  resulting counter is zero.

  Second, occasionally the raw counter value read during a context switch
  has actually wrapped, but an interrupt has not yet triggered. In this
  case the existing logic would return a very large reload count (e.g.
  2^48 - 2 if the counter had overflowed by a count of 2). This was seen
  both for fixed-function and programmable counters on an E5-2643.
  Workaround this case by returning a reload count of zero.

  PR:		198149
  Differential Revision:	https://reviews.freebsd.org/D2557
  Reviewed by:	emaste
  MFC after:	1 week
  Sponsored by:	Norse Corp, Inc.

Changes:
  head/sys/dev/hwpmc/hwpmc_core.c
  head/sys/dev/hwpmc/hwpmc_mod.c
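
(A minimal sketch patterned on the second part of this commit message, illustrative rather than the committed diff: a raw 48-bit counter value that has already wrapped is detected and reported as a reload count of zero instead of a huge value. While a sampling counter is still counting up toward 2^48 its top bit is set, since sampling reload counts are far smaller than 2^47, so a clear top bit means it wrapped before the interrupt was handled:)

#include <stdint.h>
#include <stdio.h>

#define PMC_WIDTH 48

static uint64_t
value_to_reload_count(uint64_t raw)
{
    /*
     * A counter still counting up toward 2^48 has bit 47 set; a clear
     * bit 47 means it already wrapped, so report zero and let the next
     * reload arm the counter normally.
     */
    if ((raw & (1ULL << (PMC_WIDTH - 1))) == 0)
        return (0);
    raw &= (1ULL << PMC_WIDTH) - 1;
    return ((1ULL << PMC_WIDTH) - raw);
}

int
main(void)
{
    /* Wrapped by 2 events: report 0 instead of 2^48 - 2. */
    printf("%ju\n", (uintmax_t)value_to_reload_count(2));
    /* Still 256 events away from wrapping. */
    printf("%ju\n", (uintmax_t)value_to_reload_count(0xffffffffff00ULL));
    return (0);
}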
Comment 17 commit-hook 2015-06-01 18:09:23 UTC
A commit references this bug:

Author: jhb
Date: Mon Jun  1 18:08:58 UTC 2015
New revision: 283886
URL: https://svnweb.freebsd.org/changeset/base/283886

Log:
  MFC 283123:
  Fix two bugs that could result in PMC sampling effectively stopping.
  In both cases, the effect of the bug was that a very small positive
  number was written to the counter. This means that a large number of
  events needed to occur before the next sampling interrupt would trigger.
  Even with very frequently occurring events like clock cycles, wrapping all
  the way around could take a long time. Both bugs occurred when updating
  the saved reload count for an outgoing thread on a context switch.

  First, the counter-independent code compares the current reload count
  against the count set when the thread switched in and generates a delta
  to apply to the saved count. If this delta causes the reload counter
  to go negative, it would add a full reload interval to wrap it around to
  a positive value. The fix is to add the full reload interval if the
  resulting counter is zero.

  Second, occasionally the raw counter value read during a context switch
  has actually wrapped, but an interrupt has not yet triggered. In this
  case the existing logic would return a very large reload count (e.g.
  2^48 - 2 if the counter had overflowed by a count of 2). This was seen
  both for fixed-function and programmable counters on an E5-2643.
  Workaround this case by returning a reload count of zero.

  PR:		198149
  Sponsored by:	Norse Corp, Inc.

Changes:
_U  stable/10/
  stable/10/sys/dev/hwpmc/hwpmc_core.c
  stable/10/sys/dev/hwpmc/hwpmc_mod.c
_U  stable/9/sys/
_U  stable/9/sys/dev/
  stable/9/sys/dev/hwpmc/hwpmc_core.c
  stable/9/sys/dev/hwpmc/hwpmc_mod.c