Bug 257641 - hwpmc/libpmc needs to gain a notion of big.LITTLE
Summary: hwpmc/libpmc needs to gain a notion of big.LITTLE
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: Unspecified
Hardware: Any
OS: Any
Importance: --- Affects Only Me
Assignee: freebsd-bugs (Nobody)
Reported: 2021-08-05 17:35 UTC by Mitchell Horne
Modified: 2021-08-06 09:26 UTC

Description: Mitchell Horne, 2021-08-05 17:35:35 UTC
Some systems that FreeBSD supports contain a heterogeneous collection of CPUs. This is the case for ARM's big.LITTLE designs, found on boards such as the rockpro64, and it will be a feature of some next-generation x86 chips as well [1][2]. The PMC stack was written before such heterogeneous systems existed, so the assumption that every core in the system has the same performance monitoring capabilities is ingrained; this is stated explicitly in the hwpmc(4) man page under IMPLEMENTATION NOTES.

The rockpro64's RK3399, for example, contains four Cortex-A53 cores and two larger Cortex-A72 cores. There is some overlap in the performance events supported by the two core types, but some events are unique to each. This poses problems that hwpmc is not currently equipped to deal with.

The first problem to solve is CPU reporting. This is communicated from the kernel to libpmc in two ways: via the kern.hwpmc.cpuid sysctl and via the PMC_OP_GETCPUINFO operation of the hwpmc syscall. Neither method distinguishes between the different CPUs in the system, so the value received by userspace essentially depends on which CPU initialized the hwpmc module. This needs to become a per-CPU value in order to properly detect which events are supported on a given core.
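
For reference, a minimal sketch of how userspace sees these two paths today, using the pmc(3) API as documented (pmc_init(), pmc_cpuinfo() and struct pmc_cpuinfo); error handling is abbreviated:

/*
 * Sketch: both reporting paths return a single, machine-wide answer today,
 * regardless of which core the caller runs on.
 */
#include <sys/types.h>
#include <sys/sysctl.h>

#include <pmc.h>
#include <stdio.h>

int
main(void)
{
        char cpuid[128];
        size_t len = sizeof(cpuid);
        const struct pmc_cpuinfo *ci;

        /* kern.hwpmc.cpuid: one string, set by whichever CPU happened to
         * initialize the hwpmc(4) module. */
        if (sysctlbyname("kern.hwpmc.cpuid", cpuid, &len, NULL, 0) == 0)
                printf("kern.hwpmc.cpuid: %s\n", cpuid);

        /* PMC_OP_GETCPUINFO, wrapped by pmc_cpuinfo(3): a single
         * pm_cputype is reported for every core in the system. */
        if (pmc_init() != 0)
                return (1);
        if (pmc_cpuinfo(&ci) == 0)
                printf("%s: %u CPUs, %u PMCs per CPU\n",
                    pmc_name_of_cputype(ci->pm_cputype),
                    ci->pm_ncpu, ci->pm_npmc);

        return (0);
}

(Build with -lpmc.)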

Assuming this is solved, the basic high-level behaviour will depend on the type of PMC being allocated:

System-scope PMCs:
Allocating a system-scope counter with e.g. pmcstat -s <event> will attempt to allocate the event on every CPU in the system. If the allocation fails for any CPU, the command will not proceed with any measurement. This behaviour remains reasonable on a heterogeneous system: the user either needs to pick an event that is compatible with all CPUs, or use the -c flag to qualify the selected CPUs.
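
Roughly, the allocation loop implied above looks like the following sketch. The pmc_allocate() prototype shown here (with the trailing initial-count argument) follows recent libpmc and may differ between branches, so treat it as illustrative rather than exact:

#include <pmc.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Sketch of system-scope allocation in the spirit of pmcstat -s <event>:
 * one counting PMC per CPU, and a failure on any CPU aborts the whole
 * measurement.
 */
static void
allocate_syswide(const char *event, int ncpu, pmc_id_t *pmcids)
{
        int cpu;

        for (cpu = 0; cpu < ncpu; cpu++) {
                if (pmc_allocate(event, PMC_MODE_SC, 0, cpu,
                    &pmcids[cpu], 0) < 0) {
                        fprintf(stderr, "event \"%s\" not available on "
                            "CPU %d; use -c to restrict the CPU set\n",
                            event, cpu);
                        exit(1);
                }
        }
}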

Process-scope PMCs:
Allocating a process-scope counter is slightly more problematic. Suppose a PMC is allocated on CPU A, where the target process is running and the requested event is supported. If the process is then migrated to a different CPU B, attempting to resume the hardware counter there could start measuring an entirely different event, if the programmed value is valid on that CPU at all.

I see two possible ways to solve this: don't allow PMC-enabled processes (curproc->p_flag & P_HWPMC) to migrate outside of their PMC-compatible cluster, OR, have libpmc use the cpuset(2) interfaces to bind the process to compatible CPUs for the duration of the measurement. I have not thought through either approach in detail, but both require building some list of "PMC-compatible" CPU groups/clusters in the kernel.
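
A sketch of the second option follows. The pmc_compatible_cpus() helper is made up here, standing in for whatever per-CPU capability reporting ends up being added; cpuset_setaffinity(2) is the existing interface for binding a process to a set of CPUs:

#include <sys/param.h>
#include <sys/cpuset.h>

#include <errno.h>

/* Hypothetical helper: fill 'mask' with the CPUs that support 'event'. */
int     pmc_compatible_cpus(const char *event, cpuset_t *mask);

static int
bind_target_to_compatible_cpus(pid_t pid, const char *event)
{
        cpuset_t mask;

        CPU_ZERO(&mask);
        if (pmc_compatible_cpus(event, &mask) != 0)
                return (-1);
        if (CPU_EMPTY(&mask)) {
                errno = EOPNOTSUPP;
                return (-1);
        }

        /* Restrict the target process to the compatible cluster for the
         * duration of the measurement. */
        return (cpuset_setaffinity(CPU_LEVEL_WHICH, CPU_WHICH_PID, pid,
            sizeof(mask), &mask));
}

The process's original affinity would of course have to be saved and restored when the measurement ends.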



[1] https://www.cnx-software.com/2021/07/10/intel-alder-lake-hybrid-mobile-processor-family-to-range-from-5w-to-55w-tdp/
[2] https://www.tomshardware.com/news/amd-patent-hybrid-cpu-rival-intel-raptor-lake-cpu
Comment 1: Stefan Eßer, 2021-08-06 09:26:31 UTC
Maybe the best solution is to simply perform a per-core check of the requested PMC's validity and verify that it is available on *all* CPUs in the applicable cpuset. Both system-scope and process-scope PMCs could then be dealt with in exactly the same way.

This will obviously require attaching a per-architecture list of supported PMC register number ranges to the per-core data. By formalizing the format (e.g. a list of start,end ranges), the check itself could become MI.
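
Purely as a sketch of what that might look like (none of these structures exist in hwpmc today; the names are invented):

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/cpuset.h>
#include <sys/smp.h>

/* An MI description of the event numbers a core supports. */
struct pmc_event_range {
        uint32_t        start;  /* first supported event number */
        uint32_t        end;    /* last supported event number */
};

struct pmc_cpu_caps {
        int                             nranges;
        const struct pmc_event_range    *ranges;  /* filled in by MD code */
};

extern struct pmc_cpu_caps pmc_cpu_caps[MAXCPU];  /* hypothetical */

/* MI check: is 'event' supported on every CPU in 'cpus'? */
static bool
pmc_event_supported_on_all(uint32_t event, const cpuset_t *cpus)
{
        const struct pmc_event_range *r;
        int cpu, i;
        bool ok;

        CPU_FOREACH(cpu) {
                if (!CPU_ISSET(cpu, cpus))
                        continue;
                ok = false;
                for (i = 0; i < pmc_cpu_caps[cpu].nranges; i++) {
                        r = &pmc_cpu_caps[cpu].ranges[i];
                        if (event >= r->start && event <= r->end) {
                                ok = true;
                                break;
                        }
                }
                if (!ok)
                        return (false);
        }
        return (true);
}

The same check would then serve the system-scope and process-scope cases alike.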

For process-scope PMCs the user has to explicitly specify a cpuset that excludes the cores that do not support the PMC, giving him full control over the measured setup. If the process is not bound only to cores that support the PMC, the request must be rejected. 

An implicit cpuset() to limit the process-scope PMC to use just those cores that support some particular PMC might give surprising results, since the user might compare different runs with different PMCs without being aware that some of them were measured on a limited set of cores and the others on all cores.

In the case of system-scope PMCs there may be cases where one wants to request one PMC on the cores that support it and another, less optimal PMC on the cores that don't. To support such a use case, the selection of cores to use for the measurement should again be explicit and based on core numbers (i.e., not just implicitly based on whether a core supports the requested PMC).

In either case I'd reject the request if not all selected cores (cpuset of the process to monitor, currently active cpuset, or cores selected by the -c option of pmcstat) support it.

It might be a good idea to report the supported PMCs to userland by means of dev.cpu sysctl variables. These could either identify the core architecture or just provide a list of supported PMC register numbers as a string (e.g. in the style of "1,5-10", or perhaps as a list of register names). That would make it possible, for example, to list the core numbers on which some specific measurement can be made, without the user having to remember all the details of the CPU.
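
As a sketch of the consumer side, assuming some invented per-core OID such as dev.cpu.N.pmc_events carrying a string in the "1,5-10" style (no such variable exists today):

#include <sys/types.h>
#include <sys/sysctl.h>

#include <stdio.h>

/* Print the (hypothetical) per-core list of supported PMC numbers. */
static void
print_per_cpu_events(int ncpu)
{
        char oid[64], events[1024];
        size_t len;
        int cpu;

        for (cpu = 0; cpu < ncpu; cpu++) {
                snprintf(oid, sizeof(oid), "dev.cpu.%d.pmc_events", cpu);
                len = sizeof(events);
                if (sysctlbyname(oid, events, &len, NULL, 0) == 0)
                        printf("CPU %d: %s\n", cpu, events);  /* e.g. 1,5-10 */
        }
}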