The EAR Daemon (EARD) is a per-node process that provides privileged metrics of each node as well as a periodic power monitoring service. These periodic power metrics can be sent to EAR's database directly, through the EAR Database Daemon (EARDBD), or by using one of the provided report plug-ins.
See the EARDBD section and the configuration page for more information about the EAR Database Manager and how to configure the EARD to send its collected data to it.
The node daemon is the component in charge of providing any service that requires privileged capabilities. The current version is conceived as an external process executed with root privileges.
The EARD provides several services, each one handled by a dedicated thread.
If the EAR Database is used as the storage target, EARD connects to the EARDBD service, which has to be up before starting the node daemon; otherwise, values reported by EARD to be stored in the database will be lost.
The EAR Daemon is configured through the $(EAR_ETC)/ear/ear.conf file. It can be dynamically reconfigured by reloading the service.
Please visit the EAR configuration file page for more information about the options of EARD and other components.
To execute this component, these systemctl command examples are provided:

- sudo systemctl start eard to start the EARD service.
- sudo systemctl stop eard to stop the EARD service.
- sudo systemctl reload eard to force reloading the configuration of the EARD service.

Log messages are generated during execution. Use the journalctl command to see eard messages:

sudo journalctl -u eard -f
After executing a systemctl reload eard command, not all EARD options will be dynamically updated. The options that are updated are:
To reconfigure other options, such as the EARD connection port or the coefficients, the service must be stopped and started again. Visit the EAR configuration file page for more information about the options of EARD and other components.
The EAR Database Daemon (EARDBD) acts as an intermediate layer between any EAR component that inserts data and EAR's Database, preventing the database server from being overrun with connections and insert queries.
The Database Manager caches records generated by the EAR Library and the EARD and reports them to the centralized database. It is recommended to run several EARDBDs on large clusters in order to reduce the number of inserts and connections to the database.
The EARDBD also accumulates data over a period of time to decrease the total number of insertions in the database, improving the performance of big queries. Currently, only energy metrics are accumulated, in a metric called energy aggregation. The EARDBD uses the periodic power metrics sent by the EARD (the per-node daemon), which include job identification details (Job ID and Step ID when executed in a SLURM system).
The EAR Database Daemon is configured through the $(EAR_ETC)/ear/ear.conf file. It can be dynamically reconfigured by reloading the service.
Please visit the EAR configuration file page for more information about the options of EARDBD and other components.
To execute this component, these systemctl command examples are provided:

- sudo systemctl start eardbd to start the EARDBD service.
- sudo systemctl stop eardbd to stop the EARDBD service.
- sudo systemctl reload eardbd to force reloading the configuration of the EARDBD service.

The EAR Global Manager Daemon (EARGMD) is a cluster-wide component offering cluster energy monitoring and capping. EARGM can work in two modes: manual and automatic. In manual mode, EARGM monitors the total energy consumption, evaluates the percentage of energy consumed over the energy limit set by the administrator, and reports the cluster status to the database. In automatic mode, besides evaluating the energy consumption percentage, it sends the evaluation to the computing nodes. EARDs pass these messages to EARL, which re-applies the energy policy with the new settings.
Apart from sending messages and reporting the energy consumption to the database, EARGM offers additional features to notify about the energy consumption: commands can be executed automatically, and mails can be sent automatically as well. Both the command to be executed and the mail address can be defined in ear.conf, where the energy limits, the monitoring period, etc., can also be specified.
EARGM uses periodic aggregated power metrics to efficiently compute the cluster energy consumption. Aggregated metrics are computed by EARDBD based on power metrics reported by EARD, the per-node daemon.
Note: if you have multiple EARGMs running, only one should be used for energy management. To turn off energy management for a given EARGM, simply set its energy value to 0.
EARGM also includes an optional power capping system. Power capping can work in two different ways:
Furthermore, when using fine-grained power cap control, it is possible to have multiple EARGMs, each controlling a part of the cluster, with (or without) meta-EARGMs redistributing the power allocation of each EARGM depending on the current needs of each part of the cluster. If no meta-EARGMs are specified, the power value of each EARGM will be static.
Meta-EARGMs are NOT compatible with the unlimited cluster powercap mode.
The EAR Global Manager is configured through the $(EAR_ETC)/ear/ear.conf file. It can be dynamically reconfigured by reloading the service.
Please visit the EAR configuration file page for more information about the options of EARGM and other components.
Additionally, two EARGMs can be used on the same host by declaring the environment variable EARGMID to specify which EARGM configuration each one should use. If the variable is not declared, all EARGMs on the same host will read the first entry.
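One illustrative way to set this up (purely a sketch: EAR does not necessarily ship template units, and the unit and file names here are assumptions) is a systemd template drop-in that derives EARGMID from the instance name:

```
# /etc/systemd/system/eargmd@.service.d/eargmid.conf  (hypothetical)
[Service]
# %i expands to the instance name: eargmd@0 runs with EARGMID=0,
# eargmd@1 with EARGMID=1, and so on.
Environment=EARGMID=%i
```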
To execute this component, these systemctl command examples are provided:

- sudo systemctl start eargmd to start the EARGM service.
- sudo systemctl stop eargmd to stop the EARGM service.
- sudo systemctl reload eargmd to force reloading the configuration of the EARGM service.

The EAR Library (EARL) is the core of the EAR package. The Library offers a lightweight and simple solution to select the optimal frequency for applications at runtime, with multiple power policies, each taking a different approach to finding that frequency.
EARL uses the Daemon to read performance metrics and to send application data to the EAR Database.
EARL is dynamically loaded next to the running application by the EAR Loader. The Loader detects whether the application is MPI or not. If it is MPI, it also detects whether the implementation is Intel MPI or OpenMPI, intercepts the MPI symbols through the PMPI interface, and saves the next symbols in the chain in order to remain compatible with MPI and other profiling tools. The Library is divided into several stages, summarized in the following picture:
The loop signature is used to classify the application activity into different phases. The current EAR version supports the following phases: IO bound, CPU computation with GPU idle, CPU busy waiting with GPU computing, CPU-GPU computation, and CPU computation (for CPU-only nodes). For phases including CPU computation, the optimization policy is applied. For the other phases, the EAR Library applies predefined CPU/memory/GPU frequency settings.
Some specific configurations are modified when jobs share nodes with other jobs; for example, the memory frequency optimization is disabled. See the environment variables page for more information on how to tune the EAR Library optimization using environment variables.
The Library is configured through the $(EAR_ETC)/ear.conf file. Please visit the EAR configuration file page for more information about the options of EARL and other components.
EARL receives its specific settings through shared memory regions initialized by the EARD.
For information on how to run applications with EARL, read the User guide. The next section contains more information regarding EAR's optimisation policies.
EAR offers three energy policy plugins: min_energy, min_time and monitoring. The last one is not a power policy; it is used just for application monitoring, where the CPU frequency is not modified (nor the memory or GPU frequency). For application analysis, monitoring can be used with specific CPU, memory and/or GPU frequencies.
The energy policy is selected by setting the --ear-policy=policy option when submitting a SLURM job. A policy parameter, which is a particular value or threshold depending on the policy, can be set using the flag --ear-policy-th=value. Its default value is defined in the configuration file; check the configuration page for more information.
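For example, a SLURM job step could be submitted with an explicit policy and threshold (the flags are the ones described in this document; the application name is illustrative):

```shell
srun --ear=on --ear-policy=min_time --ear-policy-th=0.75 ./my_app
```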
min_energy
The goal of this policy is to minimise the energy consumed, with a limit on the performance degradation. This limit is set in the SLURM --ear-policy-th option or in the configuration file. The min_energy policy will select the optimal frequency that minimizes energy while enforcing (performance degradation <= parameter). When executing with this policy, applications start at the default frequency (specified in ear.conf).
min_time
The goal of this policy is to improve the execution time while guaranteeing a minimum ratio between performance benefit and frequency increment that justifies the increased energy consumption. The policy uses the SLURM parameter option mentioned above as a minimum efficiency threshold.
Example: if --ear-policy-th=0.75, EAR will prevent scaling to higher frequencies if the ratio between performance gain and frequency gain does not improve by at least 75% (PerfGain >= FreqGain * threshold).
When launched with the min_time policy, applications start at a default frequency (defined in ear.conf). Check the configuration page for more information.
Example: given a system with a nominal frequency of 2.3GHz and default P_STATE set to 3, an application executed with min_time will start at frequency F[i]=2.0GHz (3 P_STATEs below nominal). When application metrics are computed, the library will compute the performance projection for F[i+1] and the performance_gain as shown in Figure 1. If the performance gain is greater than or equal to the threshold, the policy will check the next performance projection, F[i+2]. If the computed performance gain is less than the threshold, the policy will select the last frequency where the performance gain was sufficient, preventing the waste of energy.
Figure 1: min_time uses the threshold value as the minimum value for the performance gain between F[i] and F[i+1].
EAR offers a user API for applications. The current EAR version offers functions to connect to and disconnect from the daemon, to read the accumulated energy and time, to compute the difference between two measurements, and to set CPU and GPU frequencies:
int ear_connect()
int ear_energy(unsigned long *energy_mj, unsigned long *time_ms)
void ear_energy_diff(unsigned long ebegin, unsigned long eend, unsigned long *ediff, unsigned long tbegin, unsigned long tend, unsigned long *tdiff)
int ear_set_cpufreq(cpu_set_t *mask, unsigned long cpufreq)
int ear_set_gpufreq(int gpu_id, unsigned long gpufreq)
int ear_set_gpufreq_list(int num_gpus, unsigned long *gpufreqlist)
void ear_disconnect()
EAR's header file and library can be found at $EAR_INSTALL_PATH/include/ear.h and $EAR_INSTALL_PATH/lib/libEAR_api.so, respectively. The following example reports the energy, time, and average power during that time for a simple loop including a sleep(5).
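A minimal sketch of such a program, using only the prototypes listed above (error handling reduced to the connect call; building it requires an EAR installation to link against libEAR_api.so):

```c
#include <stdio.h>
#include <unistd.h>
#include "ear.h"   /* $EAR_INSTALL_PATH/include/ear.h */

int main(void)
{
    unsigned long e0, t0, e1, t1, ediff, tdiff;

    if (ear_connect() < 0) {
        fprintf(stderr, "cannot connect with EARD\n");
        return 1;
    }
    ear_energy(&e0, &t0);   /* energy in mJ, time in ms */
    sleep(5);               /* region to measure */
    ear_energy(&e1, &t1);
    ear_energy_diff(e0, e1, &ediff, t0, t1, &tdiff);
    /* mJ / ms == J / s == W, so average power is the plain ratio */
    printf("energy %lu mJ, time %lu ms, avg power %.2f W\n",
           ediff, tdiff, (double)ediff / (double)tdiff);
    ear_disconnect();
    return 0;
}
```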
The EAR Loader is responsible for loading the EAR Library. It is a small and lightweight library loaded by the EAR SLURM Plugin (through the LD_PRELOAD environment variable) that identifies the user application and loads the corresponding EAR Library distribution.
The Loader detects the underlying application, identifying the MPI version (if used) and other minor details. With this information, the loader opens the suitable EAR Library version.
As can be read in the EARL page, the MPI types can differ depending on the MPI vendor, preventing compatibility between distributions. For example, if the MPI distribution is OpenMPI, the EAR Loader will load the EAR Library compiled with the OpenMPI includes.
You can read the installation guide for more information about compiling and installing different EARL versions.
The EAR SLURM plugin allows dynamically loading and configuring the EAR Library for SLURM jobs (and steps) if the flag --ear=on is set or if it is enabled by default. Additionally, it reports any jobs that start or end to the nodes' EARDs for accounting and monitoring purposes.
Visit the SLURM SPANK plugin section on the configuration page to properly set up the SLURM /etc/slurm/plugstack.conf file.
You can find the complete list of parameters accepted by the EAR SLURM plugin in the user guide.