Bug 234325

Summary: pmcstat seems to be broken in sampling mode (at least on amd hardware)
Product: Base System Reporter: shamaz.mazum
Component: bin Assignee: freebsd-bugs (Nobody) <bugs>
Status: New ---    
Severity: Affects Some People CC: cem, luporl
Priority: --- Keywords: regression
Version: 12.0-RELEASE   
Hardware: amd64   
OS: Any   

Description shamaz.mazum 2018-12-24 07:41:50 UTC
Hello. I use FreeBSD 12.0-RELEASE on two machines (one with Ryzen 5 1600X processor and the other with FX-6300).

Recently (presumably after upgrading to 12.0-RELEASE) pmcstat stopped working in sampling mode.

I run it as:
`pmcstat -P instructions -O test.out -n 65536 ./noisecpu` on FX-6300
or
`pmcstat -P ex_ret_instr -O test.out -n 65536 ./noisecpu` on Ryzen

and then I run `pmcstat -R test.out -g`

In both cases it creates directories (e.g. ex_ret_instr) which are either empty or contain only kernel.gmon.

`noisecpu` is a computation-heavy program that calculates value noise on a large grid and runs for ~8 seconds.

The conversion statistics usually look like the following:
#exec/elf 1
#samples/total 169
#samples/unclaimed 165
#callchain/dubious-frames 165
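
To compare runs at a glance, the counters above can be reduced to a single figure. The following is only a sketch: `dubious_pct` is a hypothetical helper (not part of pmcstat) that reads the conversion statistics text on stdin and prints `#callchain/dubious-frames` as a percentage of `#samples/total`, assuming the counters are printed one per line as `name value`, as shown in this report.

```sh
# Hypothetical helper, not part of pmcstat: reads "CONVERSION STATISTICS"
# text on stdin and prints dubious-frames as a percentage of total samples.
dubious_pct() {
    awk '
        $1 == "#samples/total"            { total = $2 }
        $1 == "#callchain/dubious-frames" { dubious = $2 }
        END {
            if (total > 0)
                printf "%.1f\n", 100 * dubious / total
            else
                print "n/a"   # no #samples/total line was seen
        }
    '
}

# Counters copied from the statistics quoted above.
dubious_pct <<'EOF'
#exec/elf 1
#samples/total 169
#samples/unclaimed 165
#callchain/dubious-frames 165
EOF
```

For the figures quoted above this prints 97.6, i.e. nearly every sample ended up with a dubious callchain.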

In counting mode (with the -p argument) everything seems to work. Can anybody confirm this?
Comment 1 Leandro Lupori freebsd_committer freebsd_triage 2020-08-19 17:26:24 UTC
Hello. I'm using FreeBSD 13.0-CURRENT (r364197) on an amd64 VM (qemu with kvm) and I'm seeing similar issues. The host has a Core i7-6700K CPU.

pmcstat -S/-s/-p seems to work fine, but I've noticed several issues with pmcstat -P.

I started testing with sysutils/stress, which, with the --cpu flag, basically calls sqrt(rand()) in a tight loop. The only way to get it working under pmcstat -P was to build it statically with debug symbols and to modify it so that it doesn't fork (I called this modified version mystress).
With that version, the following commands produce correct results:

pmcstat -n 500000 -d -P inst_retired.any -O sample.out /tmp/mystress.static
pmcstat -R sample.out -G sample.graph
CONVERSION STATISTICS:
 #exec/elf                                1
 #samples/total                           52068

The resulting sample.graph correctly shows hogcpu(), rand() and random_r() as the functions that consumed most of the time.

However, with a static build of stress that has debug symbols but still forks, pmcstat fails to resolve the userspace callchains:

# case 1
pmcstat -n 500000 -d -P inst_retired.any -O sample.out ./stress.static -c 1 -t 3
pmcstat -R sample.out -G sample.graph
CONVERSION STATISTICS:
 #exec/elf                                1
 #samples/total                           49771
 #callchain/dubious-frames                49742

From a brief look at the pmcstat code, it seems that a PROCFORK event, which would tell it about the forked child, is missing.

Another case in which pmcstat -P fails to resolve most symbols is a dynamically linked binary (mystress in this case, which doesn't fork):

# case 2
pmcstat -n 500000 -d -P inst_retired.any -O sample.out /tmp/mystress
pmcstat -R sample.out -G sample.graph
CONVERSION STATISTICS:
 #exec/elf                                1
 #samples/total                           52558
 #samples/unknown-function                5973
 #callchain/dubious-frames                38383

The main binary's symbols are resolved correctly (e.g. hogcpu), but none of the shared libraries' symbols are (e.g. rand).

Yet another case in which pmcstat -P fails to resolve symbols is when the target process is specified with -t instead of being started from the pmcstat command line:

# case 3
./mystress.static &
sleep 1
pmcstat -n 500000 -d -P inst_retired.any -O sample.out -t mystress
pmcstat -R sample.out -G sample.graph
CONVERSION STATISTICS:
 #samples/total                           4195
 #callchain/dubious-frames                4193

Sometimes, in this case, no samples are collected at all.

All the issues above also occur on a PowerPC64 machine.