The Energy Aware Runtime (EAR) package provides an energy management framework for supercomputers. EAR contains different components which together provide three main services:
1) An easy-to-use and lightweight optimization service that automatically selects the optimal CPU frequency according to the application and the node characteristics. This service is provided by two components: the EAR library (EARL) and the EAR daemon (EARD). EARL is a smart component loaded next to the application that intercepts MPI calls and selects the CPU frequency on the fly based on the application behaviour. The library is loaded automatically through the EAR Loader (EARLO) and the SLURM plugin (EARPLUG, earplug.so).
2) A complete energy and performance accounting and monitoring system based on an SQL database (MariaDB and PostgreSQL are supported). The energy accounting system is configurable in terms of application detail and update frequency. The EAR database daemon (EARDBD) caches those metrics prior to DB insertion.
3) A global energy management service to monitor and control the energy consumed in the system through the EAR global manager daemon (EARGMD). This control is configurable: it can dynamically adapt policy settings based on global energy limits, or just offer global cluster monitoring.
Visit the architecture section for a detailed description of each of these components of EAR.
EAR is open source software, licensed under both the BSD-3 license for individual/non-commercial use and the EPL-1.0 license for commercial use. The full text of both licenses can be found in the COPYING.BSD and COPYING.EPL files.
Contact: ear-support@bsc.es
With EAR's SLURM plugin, running an application with EAR is as easy as submitting a job with either srun, sbatch or mpirun. There are multiple configuration settings that can be set to customize EAR's behaviour, which are explained below along with examples of how to run applications with each method.
The following EAR options can be specified when running srun and/or sbatch, and are supported with srun/sbatch/salloc:
Options | Description |
---|---|
--ear=on/off(**) | Enables/disables EAR library. |
--ear-policy=policy | Selects an energy policy for EAR. See the Policies page for more info |
--ear-cpufreq=frequency(*) | Specifies the starting frequency to be used by the chosen EAR policy (in KHz). |
--ear-policy-th=value(*) | Specifies the ear_threshold to be used by the chosen EAR policy {value=[0...1] }. |
--ear-user-db=file | Specifies the files where the user applications' metrics summary will be stored {'file.nodename.csv'}. If not defined, these files will not be created. |
--ear-verbose=value | Specifies the level of verbosity {value=[0...2]}; the default is 0. |
--ear-tag=tag | Selects an energy tag. |
--ear-learning=p_state(*) | Enables the learning phase for a given P_STATE {p_state=[1...n] }. |
For more information, consult the srun --help output or see the configuration options sections for a more detailed description.
(*) Option requires EAR privileges to be used. (**) Does not require EAR privileges, but values might be limited by the EAR configuration.
When using sbatch/srun or salloc, Intel MPI and OpenMPI are 100% supported. When using MPI commands to start applications (mpirun, mpiexec.hydra, etc.), there are minor differences, explained in the following examples.
srun examples
The EAR plugin reads srun options and contacts EARD. Invalid options are filtered to default values, so the behaviour depends on the system configuration.
srun -J test -N 1 -n 24 --tasks-per-node=24 application
srun --ear-verbose=1 -J test -N 1 -n 24 --tasks-per-node=24 application
srun --ear-cpufreq=2000000 --ear-policy=monitoring --ear-verbose=1 -J test -N 1 -n 24 --tasks-per-node=24 application
srun --ear-tag=memory-intensive --ear-verbose=1 -J test -N 1 -n 24 --tasks-per-node=24 application
sbatch examples
When using sbatch, EAR options can be specified in the same way. If more than one srun is included in the job submission, EAR options are inherited from sbatch by the different sruns unless specifically modified in an individual srun.
The following example sets the EAR verbose mode to 1 for all the job steps. The first job step will be executed with default settings and the second one with monitoring as the policy.
#!/bin/bash
#SBATCH -N 1
#SBATCH -e test.%j.err
#SBATCH -o test.%j.out
#SBATCH --ntasks=24
#SBATCH --tasks-per-node=24
#SBATCH --cpus-per-task=1
#SBATCH --ear-verbose=1
srun application
srun --ear-policy=monitoring application
mpirun examples (on SLURM systems)
When running EAR with mpirun rather than srun, we have to specify srun as the bootstrap. Otherwise jobs will not go through the SLURM plugin and EAR options will not be recognised.
The mechanism depends on the Intel MPI version. Versions 2018 and older use two mpirun arguments to specify the bootstrap and the extra SLURM flags (to be passed to SLURM).
The following example will run application with min_time_to_solution policy:
mpirun -n 10 -bootstrap slurm -bootstrap-exec-args="--ear-policy=min_time" application
Versions 2019 and newer offer two environment variables instead of mpirun arguments.
export I_MPI_HYDRA_BOOTSTRAP=slurm
export I_MPI_HYDRA_BOOTSTRAP_EXEC_EXTRA_ARGS="--ear-policy=monitoring --ear-verbose=1"
mpiexec.hydra -n 10 application
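For convenience, the two exports and the launch can be wrapped in a small helper. This is only a sketch: the wrapper echoes the final command instead of executing it (so it runs without Intel MPI installed), and the application name and EAR flags are placeholders.

```shell
#!/bin/bash
# Sketch: wrap an Intel MPI >= 2019 launch so it bootstraps through srun and
# passes EAR flags via the SLURM plugin. Echoes instead of executing so the
# snippet runs anywhere; the application name and flags are placeholders.
launch_with_ear() {
    local ear_args=$1; shift
    export I_MPI_HYDRA_BOOTSTRAP=slurm
    export I_MPI_HYDRA_BOOTSTRAP_EXEC_EXTRA_ARGS="$ear_args"
    echo "mpiexec.hydra $* [bootstrap=$I_MPI_HYDRA_BOOTSTRAP extra=$I_MPI_HYDRA_BOOTSTRAP_EXEC_EXTRA_ARGS]"
}

launch_with_ear "--ear-policy=monitoring --ear-verbose=1" -n 10 ./application
```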
Bootstrap is an Intel® MPI option but not an OpenMPI option. For OpenMPI, srun must be used for automatic EAR support, or use the erun program explained below.
ERUN is a program that simulates the whole SLURM and EAR SLURM plugin pipeline. It comes with the EAR package and is compiled automatically; you can find it in the bin folder of your installation path. It must be used when a set of nodes does not have SLURM installed, or when using OpenMPI's mpirun, which does not contact SLURM. You launch ERUN instead of launching your application directly. For example:
mpirun -n 4 /path/to/erun --program="hostname --alias"
In this example, MPIRUN runs 4 ERUN processes. Then, ERUN launches the application hostname with its alias parameter. You can use as many parameters as you want, but the quotes have to enclose the program name and all its parameters whenever there is more than just the program name. ERUN simulates both the local and remote pipelines for all created processes in the remote node. It has an internal system to avoid repeating functions that are executed just once per job or node, as SLURM does with its plugins.
> erun --help
This is the list of ERUN parameters:
Usage: ./erun [OPTIONS]
Options:
--job-id=<arg> Set the JOB_ID.
--nodes=<arg> Sets the number of nodes.
--program=<arg> Sets the program to run.
--clean Removes the internal files.
SLURM options:
...
The --job-id and --nodes parameters create the environment variables that SLURM would have created automatically, since your application may make use of them. The --clean option removes the temporary files created to synchronize all ERUN processes.
You also have to load the EAR environment module or define its environment variables in your environment or script:
Variable | Parameter |
---|---|
EAR_INSTALL_PATH=\<path> | prefix=\<path> |
EAR_TMP=\<path> | localstatedir=\<path> |
EAR_ETC=\<path> | sysconfdir=\<path> |
EAR_DEFAULT=\<on/off> | default=\<on/off> |
Lastly, the typical SLURM parameters can be passed to ERUN in the same way they would be written for SRUN or SBATCH. For example:
mpirun -n 4 /path/to/erun --program="myapp" --ear-policy=monitoring --ear-verbose=2
The only command available to users is eacct. With eacct, a user can see their previously executed jobs with the information that EAR monitors (time, average power, number of nodes and average frequency, among others) and a number of options to manipulate said output. Some data will not be available if a job was not executed with EARL.
Note that users can only see their own applications/jobs unless they are privileged users specified as such in the ear.conf configuration file.
For more information, check the Commands section.
EAR offers a user API for applications. Besides the connect and disconnect calls, the current EAR version offers two functions: one to read the accumulated energy and time, and another to compute the difference between two such measurements.
int ear_connect()
int ear_energy(unsigned long *energy_mj, unsigned long *time_ms)
void ear_energy_diff(unsigned long ebegin, unsigned long eend, unsigned long *ediff, unsigned long tbegin, unsigned long tend, unsigned long *tdiff)
void ear_disconnect()
EAR's header file and library can be found at $EAR_INSTALL_PATH/include/ear.h and $EAR_INSTALL_PATH/lib/libEAR_api.so respectively. The following example reports the energy, time, and average power during that time for a simple loop including a sleep(5).
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <ear.h>
int main(int argc, char *argv[])
{
    unsigned long e_mj = 0, t_ms = 0;
    unsigned long e_mj_init, t_ms_init, e_mj_end, t_ms_end = 0;
    time_t ts, ts_e;
    int i = 0;
    struct tm *tstamp, *tstamp2;
    char s[128], s2[128];
    /* Connecting with EARD */
    if (ear_connect() != EAR_SUCCESS)
    {
        printf("error connecting eard\n");
        exit(1);
    }
    /* Reading energy */
    if (ear_energy(&e_mj_init, &t_ms_init) != EAR_SUCCESS)
    {
        printf("Error in ear_energy\n");
    }
    while (i < 5)
    {
        sleep(5);
        /* Reading energy */
        if (ear_energy(&e_mj_end, &t_ms_end) != EAR_SUCCESS)
        {
            printf("Error in ear_energy\n");
        }
        else
        {
            /* Timestamps for the interval boundaries (ms to s) */
            ts = (time_t)(t_ms_init / 1000);
            ts_e = (time_t)(t_ms_end / 1000);
            tstamp = localtime(&ts);
            strftime(s, sizeof(s), "%c", tstamp);
            tstamp2 = localtime(&ts_e);
            strftime(s2, sizeof(s2), "%c", tstamp2);
            printf("Start time %s End time %s\n", s, s2);
            ear_energy_diff(e_mj_init, e_mj_end, &e_mj, t_ms_init, t_ms_end, &t_ms);
            /* mJ divided by ms yields watts directly */
            printf("Time consumed %lu (ms), energy consumed %lu (mJ), Avg power %lf (W)\n",
                   t_ms, e_mj, (double)e_mj / (double)t_ms);
            e_mj_init = e_mj_end;
            t_ms_init = t_ms_end;
        }
        i++;
    }
    ear_disconnect();
    return 0;
}
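The arithmetic behind the printed average power is worth noting: ear_energy reports millijoules and milliseconds, and mJ/ms is directly watts. A standalone sketch of the difference-and-power computation (plain subtraction is assumed here; the real ear_energy_diff may additionally handle counter wrap-around):

```shell
# Energy in mJ and time in ms, so mJ/ms yields watts directly.
# Plain subtraction assumed; the real ear_energy_diff may handle wrap-around.
energy_report() {
    local e_begin=$1 e_end=$2 t_begin=$3 t_end=$4
    local e_diff=$(( e_end - e_begin ))   # mJ
    local t_diff=$(( t_end - t_begin ))   # ms
    awk -v e="$e_diff" -v t="$t_diff" \
        'BEGIN { printf "time=%d ms energy=%d mJ power=%.2f W\n", t, e, e / t }'
}

energy_report 1000000 2250000 0 5000   # 1250 J over 5 s -> 250.00 W
```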
EAR offers three energy policy plugins: min_energy, min_time and monitoring. The last one is not a power policy; it is used just for application monitoring and does not modify the CPU frequency.
The energy policy is selected by setting the --ear-policy=policy option when submitting a SLURM job. A policy parameter, which is a particular value or threshold depending on the policy, can be set using the flag --ear-policy-th=value. Its default value is defined in the configuration file, so please check the configuration page for more information.
min_energy
The goal of this policy is to minimize the energy consumed with a limit on the performance degradation. This limit is set in the SLURM option or in the configuration file. The min_energy policy will select the optimal frequency that minimizes energy while enforcing (performance degradation <= parameter). When executing with this policy, applications start at the nominal frequency.
PerfDegr = (CurrTime - PrevTime) / (PrevTime)
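The acceptance test implied by this formula can be sketched as a small predicate. The times and the 0.1 threshold below are illustrative values, not EARL's internal metrics:

```shell
# min_energy keeps a lower frequency only while the measured performance
# degradation stays within the policy threshold. Values are illustrative.
perf_degr_ok() {
    local prev=$1 curr=$2 th=$3
    awk -v p="$prev" -v c="$curr" -v th="$th" \
        'BEGIN { exit !((c - p) / p <= th) }'
}

perf_degr_ok 100 108 0.1 && echo "8% degradation: accept"
perf_degr_ok 100 115 0.1 || echo "15% degradation: reject"
```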
min_time
The goal of this policy is to improve the execution time while guaranteeing a minimum ratio between performance benefit and frequency increment that justifies the increased energy consumption from said frequency increment. The policy uses the parameter option as a minimum efficiency threshold.
Example: if --ear-policy-th=0.75, EAR will prevent scaling to higher frequencies if the ratio between performance gain and frequency gain does not improve by at least 75% (PerfGain >= FreqGain * threshold).
PerfGain=(PrevTime-CurrTime)/PrevTime
FreqGain=(CurFreq-PrevFreq)/PrevFreq
When executed with the min_time policy, applications start at a default predefined frequency lower than the nominal one (defined in ear.conf; check the configuration page for more information).
Example: given a system with a nominal frequency of 2.3GHz and the default frequency set to P_STATE 3, an application executed with min_time will start at frequency F[i]=2.0GHz (3 P_STATEs below nominal).
When application metrics are computed, the library will compute the performance projection for F[i+1] and the performance gain as shown in Figure 1. If the performance gain is greater than or equal to the threshold, the policy will check the next performance projection, F[i+2]. If the computed performance gain is less than the threshold, the policy will select the last frequency where the performance gain was sufficient, preventing the waste of energy.
Figure 1: min_time uses the threshold as the minimum value for the performance gain between F[i] and F[i+1].
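The selection loop can be sketched as follows. The frequency list (in kHz) and the projected times are illustrative assumptions; the real library works on its own performance projections:

```shell
# Sketch of the min_time search: starting below nominal, step up one P_STATE
# at a time while PerfGain >= FreqGain * threshold still holds.
# Frequencies (kHz) and projected times (s) are illustrative.
select_min_time_freq() {
    local th=$1; shift
    local -a freqs=(2000000 2100000 2200000 2300000)
    local -a times=("$@")      # projected execution time at each frequency
    local best=0 i ok
    for (( i = 1; i < ${#freqs[@]}; i++ )); do
        ok=$(awk -v pt="${times[i-1]}" -v ct="${times[i]}" \
                 -v pf="${freqs[i-1]}" -v cf="${freqs[i]}" -v th="$th" \
                 'BEGIN { pg = (pt - ct) / pt; fg = (cf - pf) / pf;
                          if (pg >= fg * th) print 1; else print 0 }')
        if [ "$ok" = 1 ]; then best=$i; else break; fi
    done
    echo "${freqs[best]}"
}

# threshold 0.75: 2.0->2.1 GHz gains 5% vs a 3.75% requirement (accepted);
# 2.1->2.2 GHz gains ~2.1% vs ~3.6% required (rejected) -> 2100000 kHz wins.
select_min_time_freq 0.75 100 95 93 92
```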
EAR offers the following commands:
All these commands read the EAR configuration file (ear.conf) to determine whether the user is authorized. Root is a special case; it does not need to be included in the list of authorized users. Some options are disabled when the user is not authorized.
The eacct command shows accounting information stored in the EAR DB for job (and step) IDs. The command uses EAR's configuration file to determine whether the user running it is privileged, as non-privileged users can only access their own information. It provides the following options:
Usage: eacct [Optional parameters]
Optional parameters:
-h displays this message
-v verbose mode for debugging purposes
-u specifies the user whose applications will be retrieved. Only available to privileged users. [default: all users]
-j specifies the job id and step id to retrieve with the format [jobid.stepid] or the format [jobid1,jobid2,...,jobid_n].
A user can only retrieve its own jobs unless said user is privileged. [default: all jobs]
-c specifies the file where the output will be stored in CSV format. [default: no file]
-t specifies the energy_tag of the jobs that will be retrieved. [default: all tags].
-l shows the information for each node for each job instead of the global statistics for said job.
-x shows the last EAR events. Nodes, job ids, and step ids can be specified as when showing job information.
-r shows the EAR loop signatures. Nodes, job ids, and step ids can be specified as when showing job information.
-n specifies the number of jobs to be shown, starting from the most recent one. [default: 20][to get all jobs use -n all]
-f specifies the file where the user-database can be found. If this option is used, the information will be read from the file and not the database.
Job 31191 corresponds to the execution of the bqcd application with 6 job steps. When executing eacct -j 31191 we get the following output:
[user@host EAR]$ eacct -j 31191
JOB-STEP USER APPLICATION POLICY NODES# FREQ(GHz) TIME(s) POWER(Watts) GBS CPI ENERGY(J) GFLOPS/WATT MAX POWER(W)
31191-5 user bqcd_cpu ME 49 2.24 404.57 217.25 4.19 1.02 4306747.97 0.12 252.56
31191-4 user bqcd_cpu ME 50 2.27 398.38 229.09 4.26 1.00 4563306.92 0.12 251.49
31191-3 user bqcd_cpu ME 50 2.28 394.89 230.84 4.30 0.98 4557703.38 0.12 277.92
Columns shown are: jobid.stepid, username, application name, policy (NP means the EAR library was not loaded), number of nodes, average frequency, execution time, average power, GB/s, cycles per instruction (CPI), energy, GFlops/Watt and maximum power.
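A quick consistency check on this output: total energy should roughly equal average power × execution time × number of nodes. A sketch that verifies one of the sample rows with awk (field numbers are assumed from the whitespace-split layout shown above):

```shell
# ENERGY(J) ~= POWER(W) x TIME(s) x NODES for a job step; verify a sample row.
# Field numbers assume the whitespace-split layout shown above.
check_row() {
    awk '{ est = $8 * $7 * $5;
           printf "%s estimated=%.0f J reported=%.0f J (%.3f%% off)\n",
                  $1, est, $11, 100 * (est - $11) / $11 }'
}

echo "31191-5 user bqcd_cpu ME 49 2.24 404.57 217.25 4.19 1.02 4306747.97 0.12 252.56" | check_row
```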
The ereport command creates reports from the energy accounting data from nodes stored in the EAR DB. It is intended to use for energy consumption analysis over a set period of time, with some additional (optional) criteria such as node name or username.
Usage: ereport [options]
Options are as follows:
-s start_time : indicates the start of the period from which the energy consumed will be computed. Format: YYYY-MM-DD. Default 1970-01-01.
-e end_time: indicates the end of the period from which the energy consumed will be computed. Format: YYYY-MM-DD. Default: current time.
-n node_name |all : indicates from which node the energy will be computed. Default: none (all nodes computed) 'all' option shows all users individually, not aggregated.
-u user_name |all : requests the energy consumed by a user in the selected period of time. Default: none (all users computed). 'all' option shows all users individually, not aggregated.
-t energy_tag|all : requests the energy consumed by energy tag in the selected period of time. Default: none (all tags computed). 'all' option shows all tags individually, not aggregated.
-i eardbd_name|all : indicates from which eardbd (island) the energy will be computed. Default: none (all islands computed) 'all' option shows all eardbds individually, not aggregated.
-g : shows the contents of EAR's database Global_energy table. The default option will show the records for the two previous T2 periods of EARGM. This option can only be modified with -s, not -e
-x : shows the daemon events from -s to -e. If no time frame is specified, it shows the last 20 events.
-h : shows this message.
The following example uses the 'all' nodes option to display information for each node, as well as a start_time, so it gives the accumulated energy from that point until the current time.
[user@host EAR]$ ereport -n all -s 2018-09-18
Energy (J) Node Avg. Power (W)
20668697 node1 146
20305667 node2 144
20435720 node3 145
20050422 node4 142
20384664 node5 144
20432626 node6 145
18029624 node7 128
This example filters by EARDBD host (one per island typically) instead:
[user@host EAR]$ ereport -s 2019-05-19 -i all
Energy (J) Node
9356791387 island1
30475201705 island2
37814151095 island3
28573716711 island4
29700149501 island5
26342209716 island6
And to see the state of the cluster's energy budget (set by the sysadmin) you can use the following:
[user@host EAR]$ ereport -g
Energy% Warning lvl Timestamp INC th p_state ENERGY T1 ENERGY T2 TIME T1 TIME T2 LIMIT POLICY
111.486 100 2019-05-22 10:31:34 0 100 893 1011400 907200 600 604800 EnergyBudget
111.492 100 2019-05-22 10:21:34 0 100 859 1011456 907200 600 604800 EnergyBudget
111.501 100 2019-05-22 10:11:34 0 100 862 1011533 907200 600 604800 EnergyBudget
111.514 100 2019-05-22 10:01:34 0 100 842 1011658 907200 600 604800 EnergyBudget
111.532 100 2019-05-22 09:51:34 0 100 828 1011817 907200 600 604800 EnergyBudget
111.554 0 2019-05-22 09:41:34 0 0 837 1012019 907200 600 604800 EnergyBudget
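The Energy% column appears to be ENERGY T2 expressed as a percentage of the T2 LIMIT; this is an observation from the sample above, not a documented formula. Reproducing the first row:

```shell
# Energy% looks like 100 * ENERGY_T2 / LIMIT_T2 (inferred from the sample,
# not documented); values taken from the first row above.
awk 'BEGIN { printf "%.3f%%\n", 100 * 1011400 / 907200 }'   # 111.486%
```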
The econtrol command temporarily modifies cluster settings related to power policy settings. These options are sent to all the nodes in the cluster.
NOTE: Any changes done with econtrol
will not be reflected in ear.conf
and thus will be lost when reloading the system.
Usage: econtrol [options]
--set-freq newfreq ->sets the frequency of all nodes to the requested one
--set-def-freq newfreq policy_name ->sets the default frequency for the selected policy
--set-max-freq newfreq ->sets the maximum frequency
--inc-th new_th policy_name ->increases the threshold for all nodes
--set-th new_th policy_name ->sets the threshold for all nodes
--red-def-freq n_pstates ->reduces the default and max frequency by n pstates
--restore-conf ->restores the configuration to all nodes
--status ->requests the current status for all nodes. The ones responding show the current
power, IP address and policy configuration. A list with the ones not
responding is provided with their hostnames and IP address.
--status=node_name retrieves the status of that node individually.
--ping ->pings all nodes to check whether the nodes are up or not. Additionally,
--ping=node_name pings that node individually.
--help ->displays this message.
Creates the EAR DB used for accounting and for the global energy control. Requires root access to the MySQL server. It reads the ear.conf to get connection details (server IP and port), DB name (which may or may not have been previously created) and EAR's default users (which will be created or altered to have the necessary privileges on EAR's database).
Usage:edb_create [options]
-p Specify the password for MySQL's root user.
-o Outputs the commands that would run.
-r Runs the program. If '-o' is given, this option will be overridden.
-h Shows this message.
Cleans periodic metrics from the database. Useful to reduce the size of EAR's database. For more information, check the FAQs regarding MySQL size management.
EAR is composed of five main components:
The following image shows the main interactions between components:
This section provides a summarized, step-by-step installation and execution guide for EAR. For a more in-depth explanation of the necessary steps, see the Installation from source or Installation from RPM pages, followed by the Configuration and Execution guides, or contact us at ear-support@bsc.es.
A template is provided at $EAR_ETC/ear/ear.conf.template. Copy it to $EAR_ETC/ear/ear.conf and update it with the desired configuration; go to our ear.conf page to see how to do it. The ear.conf file is used by all the services.
An EAR module is provided at $EAR_ETC/module. When it is not in standard paths, you can add the ear module by doing module use $EAR_ETC/module and then module load ear.
Create the EAR database with edb_create. The edb_create -p command will ask you for the DB root password. If you get any problem here, first check that the node where you are running the command can connect to the DB server. If problems persist, execute edb_create -o to report the specific SQL queries generated. In case of trouble, contact ear-support@bsc.es.
EAR service files are provided at $EAR_ETC/systemd and they can usually be placed in $(ETC)/systemd.
Check the nodes with econtrol --status (note that the daemons will take around a minute to correctly report energy and to stop showing up as errors in econtrol). Each EARD creates a per-node text file with the values reported to the EARDBD. In case there are problems when running econtrol, you can also find this file at $EAR_TMP/nodename.pm_periodic_data.txt.
Check the energy accounting with ereport (ereport -n all should report the total energy sent by each daemon since the setup).
Check the global energy control with ereport -g. (Note that EARGM will take a period of time, set by the admin in ear.conf through the GlobalManagerPeriodT1 option, to report for the first time.)
Check the application accounting with eacct. (Note that only privileged users can check other users' applications.)
Run an application with --ear=on and check that the report by eacct now includes the library metrics. The EAR library depends on the MPI version: Intel, OpenMPI, etc. By default, libear.so is used. Different names for different versions can be selected automatically by adding the EAR version name in the corresponding MPI module. For instance, for the libear.openmpi.4.0.0.so library, define the SLURM_EAR_MPI_VERSION environment variable as openmpi.4.0.0. When EAR has been installed from sources, this name is the same as specified in MPI_VERSION during the configure step. When installed from RPM, look at $EAR_INSTALL_PATH/lib to see the available versions.
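The name resolution described above can be sketched as a tiny helper. The naming rule (libear.<SLURM_EAR_MPI_VERSION>.so when the variable is set, plain libear.so otherwise) is inferred from this section; EARL's actual loader logic may differ:

```shell
# libear.<SLURM_EAR_MPI_VERSION>.so when the variable is set, libear.so
# otherwise; naming rule inferred from the documentation text.
earl_library_name() {
    if [ -n "$SLURM_EAR_MPI_VERSION" ]; then
        echo "libear.${SLURM_EAR_MPI_VERSION}.so"
    else
        echo "libear.so"
    fi
}

(export SLURM_EAR_MPI_VERSION=openmpi.4.0.0; earl_library_name)  # libear.openmpi.4.0.0.so
unset SLURM_EAR_MPI_VERSION; earl_library_name                   # libear.so
```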
Use default=on in plugstack.conf to specify that the EAR library will be loaded with all the applications by default. If default is set to off, the EAR library can be explicitly loaded with --ear=on when submitting a job.
EAR requires some third-party libraries and headers to compile and run, in addition to basic requirements such as the compiler and Autoconf. This is the list of these libraries, their minimum tested versions and their references:
Library | Minimum version | References |
---|---|---|
PAPI | 5.4.0 | Website |
GSL | 1.4 | Website |
SLURM | 17.02.6 | Website |
MPI | - | - |
MySQL* | 15.1 | MySQL or MariaDB |
PostgreSQL* | 9.2 | PostgreSQL |
CUDA** | 7.5 | Website |
Autoconf | 2.69 | Website |
* Just one of them required.
** Required if you want to monitor GPU data.
Also, some drivers have to be present and loaded in the system:
Driver | File | Kernel version | References |
---|---|---|---|
CPUFreq | kernel/drivers/cpufreq/acpi-cpufreq.ko | 3.10 | Information |
Open IPMI | kernel/drivers/char/ipmi/*.ko | 3.10 | Information |
Lastly, the compilers: EAR uses a C compiler and a Fortran compiler. It has been tested with both Intel and GNU compilers.
Compiler | Comment | Minimum version | References |
---|---|---|---|
GNU Compiler Collection (GCC) | For the library and daemon | 4.8.5 | Website |
Intel C Compiler (ICC) | For the library and daemon | 17.0.1 | Website |
$(EAR_ETC) in this guide for simplicity).
Generate the configure program by typing autoreconf -i.
Configure EAR with the desired configure parameters, then run ./configure ..., make and make install in the root directory. Run make etc.install to install the content of $(EAR_ETC). It is the configuration content; that configuration is expanded in the next section. You have a link at the bottom of this page.
The configure script is based on shell variables whose initial values can be given by setting variables on the command line or in the environment. Take a look at the table with the most popular variables:
Variable | Description |
---|---|
MPICC | MPI compiler. |
CC | C compiler command. |
MPICC_FLAGS | MPI compiler flags. |
CFLAGS | C compiler flags. |
CC_FLAGS | Also C compiler flags. |
LDFLAGS | Linker flags. E.g. ‘-L\<lib dir>’. |
LIBS | Libraries to pass to the linker. E.g. ‘-l\<library>’. |
EAR_TMP | Defines the node local storage as 'var', 'tmp' or other tempfs file system (default: /var/ear) (you can also use --localstatedir=DIR). |
EAR_ETC | Defines the read-only single-machine data as 'etc' (default: EPREFIX/etc) (you can also use --sharedstatedir=DIR). |
MAN | Defines the manual directory (default: PREFIX/man) (you can use also --mandir=DIR). |
DOC | Defines the documentation directory (default: PREFIX/doc) (you can use also --docdir=DIR). |
MPI_VERSION | Adds a suffix to the compiled EAR library name. Read on for more information. |
USER | Owner user of the installed files. |
GROUP | Owned group of the installed files |
Example of overwriting the CC, CFLAGS and EAR_ETC variables:
./configure CC=icc CFLAGS=-g EAR_ETC=/hpc/opt/etc
You can choose the root folder by typing ./configure --prefix=<path>. There are other options, shown in the following table:
Definition | Default directory | Content / description |
---|---|---|
\<PREFIX> | /usr/local | Installation path |
\<EAR_ETC> | \<PREFIX>/etc | Configuration files. |
\<EAR_TMP> | /var/ear | Pipes and temporal files. |
You can get more information about installation options by typing ./configure --help. If you want to change the value of any of these options after the configuration process, you can edit the root Makefile. All the options are at the top of the file and their names are self-explanatory.
The configure script is capable of finding libraries located in custom locations if a module is loaded in the environment or their paths are included in LD_LIBRARY_PATH. If not, you can help configure find PAPI, SLURM, or other required libraries installed in custom locations. It is necessary to add their root paths so that the compiler sees the include headers and the linker the libraries. You can do this by adding the following arguments:
Argument | Description |
---|---|
--with-papi=\<path> | Specifies the path to PAPI installation. |
--with-gsl=\<path> | Specifies the path to GSL installation. |
--with-slurm=\<path> | Specifies the path to SLURM installation. |
--with-cuda=\<path> | Specifies the path to CUDA installation. |
--with-mysql=\<path> | Specifies the path to MySQL installation. |
--with-pgsql=\<path> | Specifies the path to PostgreSQL installation. |
--with-fortran | Adds Fortran symbols to the binaries. Required for some MPI distributions. |
./configure --with-papi=/path/to/PAPI
If unusual procedures must be done to compile the package, please try to figure out how configure could check whether to do them, and contact the team so it can be considered for the next release. In the meantime, you can overwrite shell variables or export their paths to the environment (e.g. LD_LIBRARY_PATH).
Also, there are additional flags to help administrators increase the compatibility of EAR on the nodes.
Argument | Description |
---|---|
--disable-rpath | Disables the RPATH included in binaries to specify some dependencies location. |
--disable-avx512 | Replaces the AVX-512 function calls by AVX-2. |
--disable-gpus | The GPU monitoring data is not allocated nor inserted in the database. |
Some EAR characteristics can be modified by changing the value of the constants defined in src/common/config/config_def.h. You can open it with an editor and modify those pre-processor variables to alter EAR's behaviour.
Also, you can quickly switch the user/group of your installation files by modifying the CHOWN_USR/CHOWN_GRP variables in the root Makefile.
As mentioned in the overview, the EAR library is loaded next to the user MPI application by the EAR Loader. The library uses MPI symbols, so it is compiled using the includes provided by your MPI distribution. The selection of the library version is automatic at runtime, but in the compiling and installation process it is not. Each compiled library has its own file name, which has to be defined through the MPI_VERSION variable during ./configure or by editing the root Makefile. The name list per distribution is shown in the following table:
Distribution | Name | MPI_VERSION variable |
---|---|---|
Intel MPI | libear.so (default) | it is not required |
MVAPICH | libear.so (default) | it is not required |
OpenMPI | libear.ompi.so | ompi |
If different MPI distributions share the same library name, their symbols are compatible, so compiling and installing the library once will be enough. However, if you provide different MPI distributions to the users, you will have to compile and install the library multiple times.
Before compiling new libraries, you have to install with make install. Then you can run ./configure again, changing the MPICC, MPICC_FLAGS and MPI_VERSION variables, or just open the root Makefile and edit the same variables plus MPI_BASE, which sets the MPI installation root path. Now type make full to perform a clean compilation, and make earl.install to install only the new version of the library.
If your MPI version is not fully compatible, please contact ear-support@bsc.es. We will add compatibility to EAR and give you a solution in the meantime.
You can install individual components by doing: make eard.install to install the EAR daemon, make earl.install to install the EAR library, make eardbd.install to install the EAR database manager, make eargmd.install to install the EAR global manager, and make commands.install to install the EAR command binaries.
This is the list of the inner installation folders and their content:
Root | Directory | Content / description |
---|---|---|
\<PREFIX> | /lib | Libraries. |
\<PREFIX> | /lib/plugins | Plugins. |
\<PREFIX> | /bin | EAR commands. |
\<PREFIX> | /bin/tools | EAR tools for coefficients. |
\<PREFIX> | /sbin | Privileged components. |
\<PREFIX> | /man | Documentation. |
\<EAR_ETC> | /ear | Configuration file. |
\<EAR_ETC> | /ear/coeffs | Coefficient files store. |
\<EAR_ETC> | /module | EAR module. |
\<EAR_ETC> | /slurm | ear.plugstack.conf. |
\<EAR_ETC> | /systemd | EAR service files. |
For a better overview of the installation process, return to our Quick installation guide. To continue the installation, visit the configuration page to properly set up the EAR configuration file and SLURM's plugin stack file.
EAR uses some third-party libraries. The EAR RPM will not ask for them when installing, but they must be available in LD_LIBRARY_PATH when running. Depending on the RPM, different versions of these libraries may be required:
Library | Minimum version | References |
---|---|---|
PAPI | 5.4.0 | Website |
GSL | 1.4 | Website |
SLURM | 17.02.6 | Website |
MPI | - | - |
MySQL* | 15.1 | MySQL or MariaDB |
PostgreSQL* | 9.2 | PostgreSQL |
Autoconf | 2.69 | Website |
Also, some drivers have to be present and loaded in the system when starting EAR:
Driver | File | Kernel version | References |
---|---|---|---|
CPUFreq | kernel/drivers/cpufreq/acpi-cpufreq.ko | 3.10 | Information |
Open IPMI | kernel/drivers/char/ipmi/*.ko | 3.10 | Information |
$(EAR_TMP) in this guide for simplicity).
Install the RPM by typing rpm -ivh --relocate /usr=/new_install_path --relocate /etc=/new_etc_path ear.version.rpm. You can also use the --nodeps option if your dependency test fails.
The *.in files are compiled into their ready-to-use versions, replacing tags with the correct paths. You will have more information about those files in the following pages; check the next section for more information.
Uninstall the RPM by typing rpm -e ear.version.
.Directory | Content / description |
---|---|
/usr/lib | Libraries |
/usr/lib/plugins | Plugins |
/usr/bin | EAR commands |
/usr/bin/tools | EAR tools for coefficients |
/usr/sbin | Privileged components: EARD,EARDBD,EARGMD |
/etc/ear | Configuration file templates |
/etc/ear/coeffs | Folder to store coefficient files. |
/etc/module | EAR module. |
/etc/slurm | ear.plugstack.conf |
/etc/systemd | EAR service files |
The `*.in` configuration files are compiled into `etc/ear/ear.conf.template` and `etc/ear/ear.full.conf.template`, `etc/module/ear`, `etc/slurm/ear.plugstack.conf` and various `etc/systemd/ear*.service` files. You can find more information in the next configuration section.
The following requirements must be met for EAR to work properly:
- The `tmp_ear_path` must be created by the admin. For instance: `mkdir /var/ear; chmod ugo+rwx /var/ear`.
- The EAR configuration (`ear.conf` and coefficients) must be installed. Coefficients can be installed in a different path, specified at configure time with the COEFFS flag. Both `ear.conf` and the coefficients must be readable on all the nodes (compute and "service" nodes).
- `ear.conf`: an ASCII file setting default values and cluster descriptions. An `ear.conf` is automatically generated based on the `ear.conf.in` template. However, the sysadmin must include installation details such as hostnames for EAR services, ports, default values, and the list of nodes. For more details, check the EAR configuration file section below.
- The database must be created with the `edb_create` command provided (the MySQL/PostgreSQL server must be running and root access to the DB is needed).

ear.conf
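A quick per-node sanity check of the temporal path can be sketched as follows (`/var/ear` is just the example path from above):

```shell
# Verify that an EAR temporal directory exists and is writable on this node.
check_tmp() {
    if [ -d "$1" ] && [ -w "$1" ]; then
        echo "ok: $1 exists and is writable"
    else
        echo "fix: mkdir -p $1 && chmod ugo+rwx $1"
    fi
}

check_tmp /var/ear   # example path; use your actual tmp_ear_path
```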
is a text file describing the EAR package behaviour in the cluster. It must be readable by all compute nodes and by nodes where commands are executed.
Usually the first word in the configuration file expresses the component related to the option. Lines starting with #
are comments.
A test for the `ear.conf` file can be found in `src/test/functionals/ear_conf`.
# The IP of the node where the MariaDB (MySQL) or PostgreSQL server process is running. Current version uses same names for both DB servers
DBIp=172.30.2.101
# Port in which the server accepts the connections.
DBPort=3306
# MariaDB user that the services will use. Needs INSERT/SELECT privileges. Used by EARDBD
DBUser=eardbd_user
# Password for the previous user. If left blank or commented it will assume the user has no password.
DBPassw=eardbd_pass
# Database user that the commands (eacct, ereport) will use. Only uses SELECT privileges.
DBCommandsUser=ear_commands
# Password for the previous user. If left blank or commented it will assume the user has no password.
DBCommandsPassw=commandspass
# Name of EAR's database in the server.
DBDatabase=EAR
# Maximum number of connections of the commands user, to prevent server saturation or malicious use. Applies to DBCommandsUser
DBMaxConnections=20
# The following specify the granularity of data reported to database.
# Extended node information reported to database (added: temperature, avg_freq, DRAM and PCK energy in power monitoring).
DBReportNodeDetail=1
# Extended signature hardware counters reported to database.
DBReportSigDetail=1
# Set to 1 if you want Loop signatures to be reported to database.
DBReportLoops=0
# The port where the EARD will be listening.
NodeDaemonPort=50001
# Frequency used by power monitoring service, in seconds.
NodeDaemonPowermonFreq=60
# Maximum supported frequency (1 means nominal, no turbo).
NodeDaemonMaxPstate=1
# Enable (1) or disable (0) the turbo frequency.
NodeDaemonTurbo=0
# Enables the use of the database.
NodeUseDB=1
# Inserts data to MySQL by sending that data to the EARDBD (1) or directly (0).
NodeUseEARDBD=1
# '1' means EAR is controlling frequencies at all times (targeted to production systems) and 0 means EAR will not change the frequencies when users are not using EAR library (targeted to benchmarking systems).
NodeDaemonForceFrequencies=1
# The verbosity level [0..4]
NodeDaemonVerbose=1
# When set to 1, the output is saved in '$EAR_TMP'/eard.log (common configuration) as a log file. Otherwise, stderr is used.
NodeUseLog=1
# Minimum time between two energy readings for performance accuracy
MinTimePerformanceAccuracy=10000000
# Port where the EARDBD server is listening
DBDaemonPortTCP=50002
# Port where the EARDBD mirror is listening
DBDaemonPortSecTCP=50003
# Port is used to synchronize the server and mirror
DBDaemonSyncPort=50004
# In seconds, interval of time of accumulating data to generate an energy aggregation
DBDaemonAggregationTime=60
# In seconds, time between inserts of the buffered data
DBDaemonInsertionTime=30
# Memory allocated per process. This allocation is used for buffering the data sent to the database by EARD or other components. If there is a server and a mirror in a node, double that value will be allocated. It is expressed in MegaBytes.
DBDaemonMemorySize=120
# The percentage of the memory buffer used by the previous field, by each type. These types are: mpi, non-mpi and learning applications, loops, energy metrics and aggregations and events, in that order. If a type gets 0% of space, this metric is discarded and not saved into the database.
DBDaemonMemorySizePerType=40,20,5,24,5,1,5
# When set to 1, eardbd uses a '$EAR_TMP'/eardbd.log file as a log file
DBDaemonUseLog=1
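As an illustration of how the two buffer settings interact, the example values above split the 120 MB buffer roughly as follows (integer MB; the daemon's exact rounding may differ):

```shell
# Split DBDaemonMemorySize according to the DBDaemonMemorySizePerType
# percentages, in the order listed above (mpi, non-mpi, learning, loops,
# energy metrics, aggregations, events).
SIZE=120                          # DBDaemonMemorySize, in MB
i=0
for pct in 40 20 5 24 5 1 5; do   # DBDaemonMemorySizePerType
    echo "type $i: ${pct}% -> $((SIZE * pct / 100)) MB"
    i=$((i + 1))
done
```

For example, the first type gets 40% of 120 MB, i.e. 48 MB, and a type with 0% would be discarded entirely.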
# Path where coefficients are installed, usually $EAR_ETC/ear/coeffs
CoefficientsDir=/path/to/coeffs
# Number of levels used by DynAIS algorithm.
DynAISLevels=10
# Windows size used by DynAIS, the higher the size the higher the overhead.
DynAISWindowSize=200
# Maximum time in seconds that EAR will wait until a signature is computed. After this value, if no signature is computed, EAR will go to periodic mode.
DynaisTimeout=15
# Time in seconds to compute every application signature when the EAR goes to periodic mode.
LibraryPeriod=10
# Number of MPI calls after which EAR evaluates whether it must go to periodic mode or not.
CheckEARModeEvery=1000
# The IP or hostname of the node where the EARGMD daemon is running.
EARGMHost=hostname
# Port where EARGMD will be listening.
EARGMPort=50000
# Whether to use ('1') or not ('0') aggregated metrics to compute the total energy.
EARGMUseAggregated=1
# Period T1 and period T2 are specified in seconds. T1 must be less than T2. The global manager updates the information every T1 seconds and uses the energy/power in the T2 period to estimate energy/power constraints.
EARGMPeriodT1=90
EARGMPeriodT2=259200
# Units field. Can be '-' (Joules), 'K' (KiloJoules) or 'M' (MegaJoules).
EARGMUnits=K
# This limit means the maximum energy allowed in 259200 seconds is 550000 KJoules.
EARGMEnergyLimit=550000
#
# Global manager modes. Two modes are supported: '0' (manual) or '1' (automatic). Manual means the Global Manager is only monitoring energy & power and reporting to the DB. Automatic means it takes actions to guarantee energy limits.
EARGMMode=0
# A mail can be sent reporting the warning level (and the action taken in automatic mode). 'nomail' means no mail is sent. This option is independent of the node.
EARGMMail=nomail
# Percentage of accumulated energy to start the warning DEFCON level L4, L3 and L2.
EARGMWarningsPerc=85,90,95
# Number of "grace" T1 periods before doing a new re-evaluation. After a warning, EARGM will wait T1 x EARGMGracePeriods seconds until it raises a new warning.
EARGMGracePeriods=6
# Verbose level
EARGMVerbose=1
# When set to 1, the output is saved in '$EAR_TMP'/eargmd.log (common configuration) as a log file.
EARGMUseLog=1
# Format for the action is: command_name energy_T1 energy_T2 energy_limit T2 T1 units
# This action is automatically executed at each warning level (only once per grace periods)
EARGMEnergyAction=no_action
# Network extension (using another network instead of the local one). If compute nodes must be accessed from login nodes with a network different than the default one, and can be accessed using an extension, uncomment the next line and define 'netext' accordingly.
# NetworkExtension=netext
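Putting the example EARGM values above together, the absolute warning thresholds and the minimum delay between warnings work out as in this sketch:

```shell
# Derive the absolute DEFCON warning thresholds and the grace delay from
# the example ear.conf values above (EARGMUnits=K, so figures are KJoules).
LIMIT=550000   # EARGMEnergyLimit
T1=90          # EARGMPeriodT1, in seconds
GRACE=6        # EARGMGracePeriods
for pct in 85 90 95; do
    echo "warning level at ${pct}%: $((LIMIT * pct / 100)) KJ"
done
echo "minimum delay between warnings: $((T1 * GRACE)) s"
```

With these values the warning levels fire at 467500, 495000 and 522500 KJ of accumulated energy, and after a warning EARGM waits 540 seconds before raising a new one.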
# Default verbose level
Verbose=0
# Path used for communication files, shared memory, etc. It must be PRIVATE per compute node and with read/write permissions. $EAR_TMP
TmpDir=/tmp/ear
# Path where coefficients and configuration are stored. It must be readable in all compute nodes. $EAR_ETC
EtcDir=/path/to/etc
InstDir=/path/to/inst
# Path where metrics are generated in text files when no database is installed. A suffix is included.
DataBasePathName=/etc/ear/dbs/dbs.
# Energy reading plugin (without the extension). Allows using different system components to read the energy of the node. In this case, this plugin reads the energy of the system using the Intel Node Manager.
# look at /path/to/inst/lib/plugins/energy folder to see the list of installed energy plugins
Energy_plugin=energy_nm.so
# Power model plugin (without the extension). The power model plugin is used to predict the power and energy consumption of the next iteration of the executing application.
Energy_model=avx512_model.so
Users allowed to change policies, thresholds and frequencies are supposed to be administrators. A list of users, Linux groups, and/or SLURM accounts can be provided to allow normal users to perform those actions. Only authorized users can execute the learning phase.
AuthorizedUsers=user1,user2
AuthorizedAccounts=acc1,acc2,acc3
AuthorizedGroups=xx,yy
Energy tags are pre-defined configurations for some applications (the EAR library is not loaded). These energy tags accept lists of user ids, groups and SLURM accounts of the users allowed to use that tag.
# General energy tag
EnergyTag=cpu-intensive pstate=1
# Energy tag with limited users
EnergyTag=memory-intensive pstate=4 users=user1,user2 groups=group1,group2 accounts=acc1,acc2
Tags are used for architectural descriptions. Max. AVX frequencies are used in predictor models and are SKU-specific. At least a default tag must be included for a cluster to work properly.
The `min_power`, `max_power` and `error_power` are threshold values that determine whether the metrics read might be invalid; a warning message will be reported to syslog if the values are outside those thresholds. `error_power` is a more extreme value: if a metric surpasses it, that metric will not be reported to the database.
A special energy plugin or energy model can be specified in a tag that will override the global values previously defined in all nodes that have this tag associated with them.
Tag=6148 default=yes max_avx512=2.2 max_avx2=2.6 max_power=500 min_power=50 error_power=600 coeffs=coeffs.default
Tag=6126 max_avx512=2.3 max_avx2=2.9 coeffs=coeffs.6126.default max_power=600 error_power=700
#---------------------------------------------------------------------------------------------------
# Power policies
#---------------------------------------------------------------------------------------------------
#
# Policy names must exactly match the file names of the policies installed in the system
DefaultPowerPolicy=min_time
Policy=monitoring Settings=0 DefaultFreq=2.4 Privileged=0
Policy=min_time Settings=0.7 DefaultFreq=2.0 Privileged=0
Policy=min_energy Settings=0.1 DefaultFreq=2.4 Privileged=1
# For homogeneous systems, default frequencies can easily be specified using freqs; for heterogeneous systems it is preferred to use pstates
# Example with pstates (lower pstates correspond to higher frequencies). Pstate=1 is nominal and 0 is turbo
#Policy=monitoring Settings=0 DefaultPstate=1 Privileged=0
#Policy=min_time Settings=0.7 DefaultPstate=4 Privileged=0
#Policy=min_energy Settings=0.1 DefaultPstate=1 Privileged=1
This section is mandatory since it is used for the cluster description. Normally nodes are grouped in islands that share the same hardware characteristics as well as their database managers (EARDBDs). Each line describes an island, and every node must be in an island.
Remember that there are two kinds of database daemons: one called 'server' and the other called 'mirror'. Both perform the metrics buffering process, but just one performs the insert. The mirror will do that insert in case the 'server' process crashes or its node fails.
It is recommended for all islands to have symmetry. For example, if islands I0 and I1 have the server N0 and the mirror N1, the next island would have to point to the same N0 and N1 or to new ones N2 and N3.
Multiple EARDBDs are supported in the same island, so more than one line per island may be required, but the condition of symmetry has to be met.
It is recommended that, for an island, the server and the mirror run on different nodes. However, the EARDBD program can be both server and mirror at the same time. This means that islands I0 and I1 could have the N0 server and the N2 mirror, and islands I2 and I3 the N2 server and N0 mirror, fulfilling the symmetry requirements.
A tag can be specified that will apply to all the nodes in that line. If no tag is defined, the default one will be used as the hardware definition.
Island=0 Nodes=nodename_list DBIP=EARDB_server_hostname DBSECIP=EARDB_mirror_hostname
#This second island uses a tag that is not the default one
Island=1 Nodes=nodename_list DBIP=EARDB_server_hostname DBSECIP=EARDB_mirror_hostname Tag=6126
Detailed island accepted values:
- node1,node2,node3
- node[1-3]
- node[1,2,3]
- node1,node2,node3
- node[1-3],node[4,5]
- node[1,2],node3
- node[1-3],node4
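The bracket shorthand is just a compact way of writing consecutive node names; a hypothetical helper (not part of EAR) that expands `node[1-3]` into an explicit node list:

```shell
# Expand a prefix plus a numeric range into an explicit node list,
# mimicking the node[first-last] shorthand shown above.
expand_range() {   # usage: expand_range <prefix> <first> <last>
    prefix=$1; i=$2; last=$3; out=""
    while [ "$i" -le "$last" ]; do
        out="$out$prefix$i "
        i=$((i + 1))
    done
    echo "${out% }"
}

expand_range node 1 3
```

`expand_range node 1 3` prints `node1 node2 node3`, i.e. the same set of nodes that `node[1-3]` denotes.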
SLURM loads the plugin through a file called `plugstack.conf`, which is composed of a list of plugins. In the file `etc/slurm/ear.plugstack.conf`, there is an example entry with the paths already set to the plugin, temporal and configuration paths.
Example:
required ear_install_path/lib/earplug.so prefix=ear_install_path sysconfdir=etc_ear_path localstatedir=tmp_ear_path earlib_default=off
The argument `prefix` points to the EAR installation path and is used to load the library via the `LD_PRELOAD` mechanism. The `localstatedir` is used to contact the EARD; by default it points to the path you set during `./configure` with the `--localstatedir` or `EAR_TMP` arguments. Next to these fields there is `earlib_default=off`, which means that by default EARL is not loaded, and `eargmd_host` and `eargmd_port`, to be set if you plan to connect with the EARGMD component (you can leave these empty).
WARNING: If any EAR component is running in the same machine as the MySQL server, some connection problems might occur. This will not happen with PostgreSQL. To solve those issues, input into MySQL's CLI client the `CREATE USER` and `GRANT PRIVILEGES` queries from `edb_create -o`, changing the portion `'user_name'@'%'` to `'user_name'@'localhost'` so that EAR's users have access to the server from the local machine.

There are two ways to configure a database server for EAR's usage:
- Run `edb_create -r`, located in `$EAR_INSTALLATION_PATH/sbin`, from a node with root access to the MySQL server. This requires the MySQL/PostgreSQL section of `ear.conf` to be correctly written. For more info run `edb_create -h`.
- `edb_create -o` will output the queries that would be run by the program, which contain all that is needed for EAR to function properly.

For more information about how each `ear.conf` flag changes the database creation, see our Database section.
Visit the execution page to run EAR's different components.
The best way to execute all EAR daemon components (EARD, EARDBD, EARGM) is via their unit services.
NOTE: EAR uses a MariaDB/MySQL server. The server must be started before the EAR services are executed.
The unit services for the EAR Daemon, EAR Global Manager Daemon and EAR Database Daemon are generated and installed in `$(EAR_ETC)/systemd`. You have to copy those unit service files to your systemd operating system folder and then use the `systemctl` command to run the daemons.
Check the EARD, EARDBD, EARGMD pages to find the precise execution commands.
Finally, when using `systemctl` commands, you can check messages reported to stderr using `journalctl`. For instance: `journalctl -u eard -f`. Note that if `NodeUseLog` is set to 1 in `ear.conf`, the messages will not be printed to stderr but to `$EAR_TMP/eard.log` instead. The `DBDaemonUseLog` and `GlobalmanagerUseLog` options in `ear.conf` specify the output for EARDBD and EARGM respectively.
Additionally, services can be started, stopped or reloaded in parallel using parallel commands such as pdsh. As an example:
sudo pdsh -w nodelist systemctl start eard
The EAR package includes two types of tests. The check tests are prepared to be executed after the `make` by typing `make check`, with and without privileges, so you will probably have to run them with `sudo` as well.
When running a check test, three types of messages are written in the output: 1) Error: the hardware, software or libraries are incompatible with the library. 2) Warning: it is possible that some components have to be loaded prior to the execution (the library will try to do it). 3) Ok: your system is fully compatible with the library.
Name | Checking |
---|---|
cpu_examinable | If the CPU is examinable by the library. |
cpu_aperf | If the CPU APERF counter is available. |
cpu_uncores | If there are CPU uncore counters available. |
cpu_uncores_all | If there are all CPU uncore counters available. |
papi_version | If the PAPI version is greater than or equal to the reference. |
papi_init | If PAPI initializes correctly. |
papi_comp_available | If PAPI perf counters events are available. |
papi_comp_enabled | If PAPI perf counters events are enabled. |
papi_comp_available | If PAPI perf uncore counters events are available. |
papi_comp_enabled | If PAPI perf uncore counters events are enabled. |
papi_comp_available | If PAPI libmsr events are available. |
papi_comp_enabled | If PAPI libmsr events are enabled. |
papi_comp_available | If PAPI rapl events are available. |
papi_comp_enabled | If PAPI rapl events are enabled. |
gsl_version | If the GSL version is greater than or equal to the reference. |
slurm_version | If the SLURM version is greater than or equal to the reference. |
module_ipmi_devintf | If the ipmi_devintf (IPMI) driver is running. |
module_acpi_cpufreq | If the acpi-cpufreq (CPUFreq) driver is running. |
Name | Description | Basic arguments |
---|---|---|
coeffs_compute | Computes the learning coefficients | <save.path> <min.frequency> <node.name> |
coeffs_default | Computes a default coefficients file | |
coeffs_null | Creates a dummy coefficients file to be used by EARD | coeff_path, max.freq min.freq |
coeffs_show | Shows the computed coefficients file in text format | <file.path> |
Use `--help` to expand the application information and list the admitted flags.
Example for node `node1001`, in which the minimum frequency set during the learning phase was 1900000 KHz:
compute_coeffs /etc/coeffs 1900000 node1001
This is a necessary phase prior to the normal EAR utilization, and is a kind of hardware characterization of the nodes. During this phase a matrix of coefficients is calculated and stored. These coefficients will be used to predict the energy consumption and performance of each application.
Please, visit the learning phase wiki page to read the manual and the repository to get the scripts and the kernels.
Some of the core EAR functionality can be dynamically loaded through a plugin mechanism, making EAR more extensible and dynamic than previous versions, since there is no need to reinstall the system to add, for instance, a new policy or a new power model. It is only needed to copy the file into the $EAR_INSTALL_PATH/lib/plugins folder and restart some components. The four parts that can be loaded as plugins are: the node energy reading library, the power policy, the power model, and the tracing.
Plugin | Description |
---|---|
Power model | Used to predict the energy consumption given a target frequency and the current state metrics. |
Power policies | Define the behaviour of EAR when switching between frequencies given energy readings and predictions. |
Energy readings | Used to read the energy of the node. |
Tracing | EAR library data and internal state changes are exported to the tracing library, in case one is specified. |
EAR is composed of six main components:
The following image shows the main interactions between components:
EAR's daemon is a per-node process that provides privileged metrics of each node as well as a periodic power monitoring service. Said periodic power metrics are sent to EAR's database either directly or via the database daemon (see configuration page).
For more information, see EARD.
The database daemon acts as an intermediate layer between any EAR component that inserts data and EAR's database to prevent the database server from collapsing due to getting overrun with connections and insert queries.
For more information, see EARDBD.
EAR's Global Manager Daemon (EARGMD) is a cluster wide component that controls the percentage of the maximum energy consumed.
For more information, see EARGM.
The EAR library is the core of the EAR package. The EARL offers a lightweight and simple solution to select the optimal frequency for MPI applications at runtime, with multiple power policies each with a different approach to find said frequency. EARL uses the daemon to read performance metrics and to send application data to EAR's database.
For more information, see EARL.
The EAR Loader is responsible for loading the EAR Library. It is a small and lightweight library loaded by the SLURM Plugin that identifies the user application and loads its corresponding EAR Library distribution.
For more information, see EARLO.
The EAR SLURM plugin allows dynamically loading and configuring the EAR library for SLURM jobs, if the enabling argument is set or it is enabled by default. Additionally, it reports any jobs that start or end to the nodes' EARDs for accounting and monitoring purposes.
For more information, see SLURM Plugin.
The node daemon is the component in charge of providing any kind of service that requires privileged capabilities. The current version is conceived as an external process executed with root privileges.
The EARD provides these services, each one covered by one thread:
When executed in production environments, EARD connects with the EARDBD service, which has to be up before starting the node daemon; otherwise, values reported by EARD to be stored in the database will be lost.
The EAR Daemon uses the $(EAR_ETC)/ear/ear.conf
file to be configured. It can be dynamically configured by reloading the service.
Please visit the EAR configuration file page for more information about the options of EARD and other components.
To execute this component, these `systemctl` command examples are provided:
- `sudo systemctl start eard` to start the EARD service.
- `sudo systemctl stop eard` to stop the EARD service.
- `sudo systemctl reload eard` to force reloading the configuration of the EARD service.

Log messages are generated during the execution. Use the `journalctl` command to see eard messages:
sudo journalctl -u eard -f
After executing a `systemctl reload eard` command, not all the EARD options are dynamically updated. The list of updated variables is:
DefaultPstates
NodeDaemonMaxPstate
NodeDaemonVerbose
NodeDaemonPowermonFreq
SupportedPolicies
MinTimePerformanceAccuracy
To reconfigure other options, such as the EARD connection port or coefficients, the service must be stopped and started again.
EARDBD caches the records generated by the EARL and EARD in the system and reports them to the centralized database. It is recommended to run several EARDBDs if the cluster is big enough, to reduce the number of inserts and connections to the database.
Also, EARDBD accumulates data during a period of time to decrease the total number of insertions in the database, helping the performance of big queries. For now, just the energy metrics can be accumulated, in the new metric called energy aggregation. EARDBD uses periodic power metrics sent by EARD, the per-node daemon, including job identification details (job id and step id when executed in a SLURM system).
The EAR Database Daemon uses the $(EAR_ETC)/ear/ear.conf
file to be configured. It can be dynamically configured by reloading the service.
Please visit the EAR configuration file page for more information about the options of EARDBD and other components.
To execute this component, these `systemctl` command examples are provided:
- `sudo systemctl start eardbd` to start the EARDBD service.
- `sudo systemctl stop eardbd` to stop the EARDBD service.
- `sudo systemctl reload eardbd` to force reloading the configuration of the EARDBD service.

EARGM is a cluster-wide component offering cluster energy monitoring and capping. EARGM can work in two modes: manual and automatic. When running in manual mode, EARGM monitors the total energy consumption, evaluates the percentage of energy consumption over the energy limit set by the admin, and reports the cluster status to the DB. When running in automatic mode, apart from evaluating the energy consumption percentage, it sends the evaluation to the computing nodes. EARDs pass these messages to EARL, which re-applies the energy policy with the new settings. Apart from sending messages and reporting the energy consumption to the DB, EARGM offers additional features to notify about the energy consumption: automatic execution of commands is supported, and mails can also be sent automatically. Both the command to be executed and the mail address can be defined in `ear.conf`, where the energy limits, the monitoring period, etc. can also be specified. EARGM uses periodic aggregated power metrics to efficiently compute the cluster energy consumption. Aggregated metrics are computed by EARDBD based on power metrics reported by EARD, the per-node daemon.
The EAR Global Manager uses the $(EAR_ETC)/ear/ear.conf
file to be configured. It can be dynamically configured by reloading the service.
Please visit the EAR configuration file page for more information about the options of EARGM and other components.
To execute this component, these `systemctl` command examples are provided:
- `sudo systemctl start eargmd` to start the EARGM service.
- `sudo systemctl stop eargmd` to stop the EARGM service.
- `sudo systemctl reload eargmd` to force reloading the configuration of the EARGM service.

The EAR library is the core of the EAR package. The EARL offers a lightweight and simple solution to select the optimal frequency for MPI applications at runtime.
EARL is dynamically loaded next to the running applications by the EAR Loader. The Loader intercepts the MPI calls through the PMPI interface and then calls the respective PMPI function included in the library, which handles the operations. At runtime, EARL goes through the following phases:
Automatic detection of application outer loops. This is done by intercepting MPI calls and invoking the Dynamic Application Iterative Structure detector algorithm. DynAIS is highly optimized for new Intel architectures, reporting low overhead.
Computation of the application signature. Once DynAIS starts reporting iterations for the outer loop, EAR starts to compute the application signature. This signature includes: iteration time, DC power consumption, bandwidth, cycles, instructions, etc. Since the DC power measurements error highly depends on the hardware, EAR automatically detects the hardware characteristics and sets a minimum time to compute the signature in order to minimize the average error.
The EAR Library uses the `$(EAR_ETC)/ear/ear.conf` file to be configured. Please visit the EAR configuration file page for more information about the options of EARL and other components.
The library receives its specific settings through shared memory regions initialized by EARD.
For information on how to run applications alongside EARL, see the corresponding section of our User guide, as well as the Policies page.
The EAR SLURM plugin allows dynamically loading the EAR Loader for SLURM jobs (and setpid), if the enabling argument is set or if it is enabled by default. The Loader will be executed in each job step, intercepting all MPI calls and passing this information to the EAR Library.
Visit the configuration page to set up properly the SLURM /etc/slurm/plugstack.conf
file.
You can find the complete list of EAR SLURM Plugin parameters in the user guide.
EAR's database consists of the following tables:
- Per-application and per-node metric tables, configurable via `ear.conf`.
- Aggregations, which speed up the `ereport` command and EARGM, as well as reducing database size (Periodic_metrics of older periods, where precision at node level is not needed, can be deleted and the aggregations used instead).
- Cluster energy status records: one record every T1 period (defined in `ear.conf`) is reported.
When running edb_create
some tables might not be created, or may have some quirks, depending on some ear.conf
settings. The settings and alterations are as follows:
`DBReportNodeDetail`: if set to 1, `edb_create` will create two additional columns in the Periodic_metrics table for Temperature (in Celsius) and Frequency (in Hz) accounting.
`DBReportSigDetail`: if set to 1, Signatures will have additional fields for cycles, instructions, and FLOPS1-8 counters (number of instructions by type).
`DBMaxConnections`: restricts the maximum number of simultaneous command connections.
If any of the settings is set to 0, the corresponding table will have fewer details, but the table's records will be smaller in stored size.
Any table with missing columns can be later altered by the admin to include said columns. For a full detail of each table's columns, run edb_create -o
with the desired ear.conf
settings.
ear.conf
There are various settings in ear.conf
that restrict the data reported to database, and some errors might occur if the database configuration is different from EARDB's.
`DBReportNodeDetail`: if set to 1, the node managers will report temperature, average frequency, and DRAM and PCK energy to the database manager, which will try to insert them into Periodic_metrics. If Periodic_metrics does not have the columns for these metrics, an error will occur and nothing will be inserted. To solve the error, set `DBReportNodeDetail` to 0 or manually update Periodic_metrics to have the necessary columns.
`DBReportSigDetail`: similarly to `DBReportNodeDetail`, an error will occur if the configuration differs from the one used when creating the database.
DBReportLoops
: if set to 1, EARL detected application loops will be reported to database, each with its corresponding Signature. Set to 0 to disable this feature. Regardless of the setting, no error should occur.
If Signatures and/or Periodic_metrics have the additional columns but their respective settings are set to 0, a NULL will be set in said additional columns, which will make those rows smaller in size (but bigger than if the columns did not exist).
1) How to see EAR configuration and metrics at runtime: use `--ear-verbose=1`.
2) User authorization "issues": the following EAR flags are only allowed to authorized users (`ear.conf`): `ear-cpufreq`, `ear-tag`, `ear-learning`, `ear-policy-th`.
Action: Check ear option and user authorization (ear.conf)
AuthorizedUsers=user1,user2
AuthorizedAccounts=acc1,acc2,acc3
AuthorizedGroups=xx,yy
If the user is not authorized, this is the expected result.
3) A specific energy policy was selected but a different one is applied (validated with `--ear-verbose=1`): energy policies can be configured to be enabled for all users or not.
Action: Check policy configuration (ear.conf) and user authorization (ear.conf)
#Enabled to all users
Policy=monitoring Settings=0 DefaultFreq=2.4 Privileged=0
#Enabled to authorized users
Policy=monitoring Settings=0 DefaultFreq=2.4 Privileged=1
If the policy is not enabled or the user is not authorized, this is the expected result.
4) How to disable the EAR library explicitly: use `--ear=off`.
5) How to apply EAR settings to all the srun/mpirun calls inside a job: set the options in `#SBATCH` headers:
#!/bin/bash
#SBATCH -N 1
#SBATCH --ear-policy=min_time
# applications 1 and 2 will run with min_time
srun application1
srun application2
6) How to apply different EAR settings to different srun/mpirun steps inside a job: set the options per step.
srun --ear-policy=min_time application
srun --ear-policy=min_energy application
7) How to see which energy policies are installed: srun --help
Comment: srun --help lists all installed policies; the user may not be allowed to run some of them.
8) How to set EAR flags with mpirun (Intel MPI)? It depends on the Intel MPI version. Before version 2019, mpirun had two parameters to specify SLURM options:
mpirun -bootstrap=slurm -bootstrap-exec-args="--ear-verbose=1"
Since version 2019, SLURM options must be specified using environment variables:
export I_MPI_HYDRA_BOOTSTRAP=slurm
export I_MPI_HYDRA_BOOTSTRAP_EXEC_EXTRA_ARGS="--ear-verbose=1"
9) How to set EAR flags with mpirun (OpenMPI)? OpenMPI needs extra support when srun is not used: the erun command must be used.
mpirun erun --ear-policy=min_energy --program=application
10) The application uses OpenMPI and blocks when running with EARL and mpirun: use erun.
11) The application works without EAR (--ear=off) but fails with EARL, reporting errors related to dynamic libraries.
Action: Check that the application uses the right EAR MPI version. If the environment variable is set in the MPI modules this should be automatic; otherwise, validate that --ear-mpi-dist is present when needed.
12) How to collect more detailed metrics than those available in the DB: use the --ear-user-db flag to generate CSV files with all the metrics collected by EARL.
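A sketch of the flag in use, with a hypothetical application name and file prefix (the exact CSV naming scheme depends on the EAR version):

```
srun --ear-user-db=metrics_myapp ./my_app
```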
13) How to collect Paraver traces: use environment variables to enable trace collection and to specify the output path.
export SLURM_EAR_TRACE_PLUGIN=$EAR_INSTALL_PATH/lib/plugins/tracer/tracer_paraver.so
export SLURM_EAR_TRACE_PATH=TRACES_PARAVER/
14) The user asks for application metrics with eacct and NO-EARL appears in some of the output columns: EARL was not loaded with the application, or the application failed before MPI_Finalize and therefore did not report application data.
Action: Check that the application was executed with EARL and did not fail.
15) After some time, the user asks for application metrics with eacct and the application is not reported.
Action: Try again after a few minutes (applications are not inserted into the database immediately).
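For reference, a typical query once the data has reached the database (the job ID is a placeholder; the available eacct options may vary with the EAR version):

```
eacct -j <jobid>
```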