EAR  4.2.1
EAR Reference Manual
User-guide

EAR was designed from the start to be usable 100% transparently by users, which means that you can run your applications with EAR enabled, disabled or tuned with minimal changes to your workflow, e.g., submission scripts. This is achieved by providing integrations (e.g., plug-ins, hooks) with system batch schedulers, which do all the work to set up EAR at job submission. Currently, SLURM is the batch scheduler fully supported by EAR thanks to EAR's SLURM SPANK plug-in.

With EAR's SLURM plug-in, running an application with EAR is as easy as submitting a job with either srun, sbatch or mpirun. The EAR Library (EARL) is automatically loaded with some applications when EAR is enabled by default.

Check with the ear-info command whether EARL is on or off by default. If it is off, use the --ear=on option offered by the EAR SLURM plug-in to enable it. For other schedulers, a simple prolog/epilog command can be created to provide transparent job submission with EAR and its default configuration.
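For example, on a SLURM system the check and an explicit opt-in could look like the following sketch (the module name and my_app are placeholders; the exact module name depends on your site):

# Load the EAR module (the name may differ on your system)
module load ear
# Show whether EARL is loaded by default
ear-info
# Explicitly enable EARL for this run if it is off by default
srun --ear=on ./my_app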

[[TOC]]

Use cases

MPI applications

EARL is automatically loaded with MPI applications when EAR is enabled by default (check ear-info). EAR supports the use of both the mpirun/mpiexec and srun commands.

When using sbatch/srun or salloc, Intel MPI and OpenMPI are fully supported. When using MPI flavour-specific commands to start applications (e.g., mpirun, mpiexec.hydra), there are some key points you must take into account. See the next sections for examples and more details.

Hybrid MPI + (OpenMP, CUDA, MKL) applications

EARL automatically supports this use case. mpirun/mpiexec and srun are supported in the same manner as explained above.

Python MPI applications

EARL cannot automatically detect MPI symbols when Python is used. In that case, an environment variable is provided to specify which MPI flavour is being used.

Export the SLURM_EAR_LOAD_MPI_VERSION environment variable with either intel or open mpi as its value, e.g., export SLURM_EAR_LOAD_MPI_VERSION="open mpi"; these are the two MPI implementations fully supported by EAR.
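A minimal sketch for an MPI Python run (the script name app.py is a placeholder):

# Tell the EAR Loader which MPI flavour the Python application uses
export SLURM_EAR_LOAD_MPI_VERSION="open mpi"
srun -N 1 -n 24 python app.py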

Running MPI applications on SLURM systems

Using srun command

Running MPI applications with EARL on SLURM systems using the srun command is the most straightforward way to start using EAR. All jobs are monitored by EAR and the Library is loaded by default depending on the cluster configuration. To run a job with srun and EARL there is no need to load the EAR module.

Even though it is automatic, there are a few flags that can be selected at job submission. They are provided by EAR's SLURM SPANK plug-in. When using SLURM commands for job submission, both Intel MPI and OpenMPI implementations are supported.

Using mpirun/mpiexec command

For the EAR Library to be loaded automatically, the only requirement on the MPI library is that it is coordinated with the scheduler.

Intel MPI

Recent versions of Intel MPI offer two environment variables that can be used to guarantee the correct scheduler integration; a minimal sketch is shown below.

You can read the Intel MPI environment variables guide for more details.
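The two variables are the same ones used in the mpirun example later in this guide (the EAR flags passed as extra arguments are just an example):

# Use SLURM (srun) as Hydra's bootstrap server so EAR's SLURM plug-in takes effect
export I_MPI_HYDRA_BOOTSTRAP=slurm
# Extra arguments passed to the bootstrap command; EAR submission flags can go here
export I_MPI_HYDRA_BOOTSTRAP_EXEC_EXTRA_ARGS="--ear-policy=monitoring --ear-verbose=1"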

OpenMPI

To combine OpenMPI and EAR it is highly recommended to use SLURM's srun command. When using mpirun, as OpenMPI is not fully coordinated with the scheduler, EARL is not automatically loaded on all nodes. In that case EARL will be disabled and only basic energy metrics will be reported. To support this workflow, EAR provides the erun command. Read the corresponding examples section for more information about how to use it.

MPI4PY

When MPI is used from Python applications, the EAR Loader cannot automatically detect MPI symbols to classify the application as Intel MPI or OpenMPI. To specify it, the user has to define the SLURM_EAR_LOAD_MPI_VERSION environment variable with the value intel or open mpi. It is recommended to set this variable in the site's Python modules to make it transparent for end users.

Non-MPI applications

Python

Since version 4.1, EAR automatically loads the Library with Python applications, so no action is needed. You must run the application with the srun command so it passes through EAR's SLURM SPANK plug-in, which lets you enable/disable/tune EAR. See the EAR submission flags provided by the EAR SLURM integration.
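A minimal sketch (the script name is a placeholder; EARL is loaded automatically since EAR 4.1):

srun -N 1 -n 1 --ear-verbose=1 python my_script.py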

OpenMP, CUDA and Intel MKL

To load EARL automatically with non-MPI applications, the application must be compiled with dynamic symbols and it must be executed with the srun command. For example, for CUDA applications the --cudart=shared option must be used at compile time. EARL is loaded for the OpenMP, MKL and CUDA programming models when their symbols are detected dynamically.
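A sketch for a CUDA application (file names are placeholders):

# Link against the shared CUDA runtime so EARL can detect CUDA symbols
nvcc --cudart=shared -o my_cuda_app my_cuda_app.cu
# Launch through srun so EAR's SLURM plug-in takes effect
srun -N 1 -n 1 ./my_cuda_app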

Other application types or frameworks

For other programming models or sequential applications not supported by default, EARL can be forced to load by setting the SLURM_EAR_LOADER_APPLICATION environment variable, which must be defined with the application name. For example:

#!/bin/bash
export SLURM_EAR_LOADER_APPLICATION=my_app
srun my_app

Retrieving EAR data

As a job accounting and monitoring tool, EARL collects metrics that you can retrieve to understand your application's behaviour. The Library comes with several modules and options that provide different kinds of information.

As a very simple hint of your application's workload, you can enable EARL verbosity to get loop data at runtime. The information is shown on stderr by default. Read how to set up verbosity at [submission time](ear-job-submission-flags) and the verbosity environment variables provided for a more advanced tuning of this EAR feature.

To get offline job data, EAR provides eacct, a tool that retrieves the monitored job data stored in the Database. You can request information in different ways, so you can read aggregated job data, per-node or per-loop information, among other things. See the eacct usage examples for a better overview of the kind of data eacct provides.

There is also a way to get runtime and aggregated data while the job is running, without the need to call eacct after job completion. EAR implements a reporting mechanism which lets developers add new report plug-ins, so there is a virtually unlimited set of ways to report the data EAR collects.

EAR releases come with a fully supported report plug-in (called csv_ts) which provides the same runtime and aggregated data reported to the Database as CSV files, written directly while the job is running. You can load this plug-in in two ways:

  1. By setting the --ear-user-db flag at submission time.
  2. Loading the report plug-in directly through an environment variable: export SLURM_EAR_REPORT_ADD=csv_ts.so (see the sketch below).
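A minimal sketch using the environment-variable approach (the application name is a placeholder):

# Load the csv_ts report plug-in in addition to the default reporting
export SLURM_EAR_REPORT_ADD=csv_ts.so
srun -N 1 -n 24 application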

Contact ear-support@bsc.es for more information about report plug-ins.

You can also request EAR to report events to the [Database](EAR-Databse). Events show more details about EARL's internal state and can be retrieved with the eacct command. See how to enable EAR events reporting and which kinds of events EAR reports.

If it applies to your application (i.e., it uses MPI), you can request EAR to report a summary of its MPI behaviour at the end of the execution. The information is provided in two files and contains the aggregated data of each process of the application.

Finally, EARL can provide runtime data in the Paraver trace format. Paraver is a flexible performance analysis tool maintained by the Barcelona Supercomputing Center's tools team. It provides an easy way to visualize runtime data, compute derived metrics and produce histograms for a better understanding of your application's behaviour. See the environment variables page to learn how to generate Paraver traces.

Contact ear-support@bsc.es if you want more details about how to work with EAR data in Paraver.

EAR job submission flags

The following EAR options can be specified when submitting jobs and are supported with srun/sbatch/salloc:

| Option | Description |
| --- | --- |
| **--ear**=[on\|off] | Enables/disables EAR Library loading for this job. |
| **--ear-user-db**=_<filename>_ | Asks the EAR Library to generate a set of CSV files with EARL metrics. |
| **--ear-verbose**=[0\|1] | Specifies the verbosity level; the default is 0. |

When using the --ear-user-db flag, one file per node is generated with the average node metrics (the node signature) and one file with multiple lines per node is generated with the metrics collected at runtime (the loop node signatures). Read eacct's section on the commands page to know which metrics are reported, since the data generated by this flag is the same as that reported to the Database (and retrieved later by the command).

Verbose messages go to stderr by default. For jobs with multiple nodes, the --ear-verbose option can result in lots of messages being mixed in stderr, so we recommend splitting SLURM's output (or error) file per node, as shown in the sketch below. You can read SLURM's filename pattern specification for more information.
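A sketch using srun's filename patterns, where %j expands to the job ID and %N to the node's short hostname, so one output/error file per node is generated:

srun --output=job.%j.%N.out --error=job.%j.%N.err --ear-verbose=1 -N 2 -n 48 application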

If you still need to keep job output and EAR output separated, you can set the SLURM_EARL_VERBOSE_PATH environment variable, and one file per node will be generated containing only EAR output. The environment variable must be set to the path (a directory) where you want the output files to be generated; it will be created automatically if needed.

You can always check the available EAR submission flags provided by EAR's SLURM SPANK plug-in by typing srun --help.

CPU frequency selection

The EAR configuration file supports the specification of EAR-authorized users, who can ask for more privileged submission options. The most relevant ones are the possibility to request a specific optimisation policy and a specific CPU frequency. Contact your sysadmin or helpdesk team to become an authorized user.
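For example, an authorized user could request the monitoring policy together with a fixed CPU frequency (the same values used in the srun examples later in this guide; the available policies and frequencies depend on your site configuration):

srun --ear-policy=monitoring --ear-cpufreq=2000000 -J test -N 1 -n 24 application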

GPU frequency selection

EAR version 3.4 and upwards supports GPU monitoring for NVIDIA devices, from both the application and the node monitoring points of view. GPU frequency optimization is not yet supported. Authorized users can ask for a specific GPU frequency by setting the SLURM_EAR_GPU_DEF_FREQ environment variable with the desired GPU frequency expressed in kHz. Only one frequency for all GPUs is currently supported. Contact your sysadmin or helpdesk team to become an authorized user.
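A sketch for authorized users (the value is just an example expressed in kHz; pick one of the frequencies listed by the nvidia-smi command below):

# Request one default frequency for all GPUs used by the job (in kHz)
export SLURM_EAR_GPU_DEF_FREQ=1380000
srun -N 1 -n 1 application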

To see the list of frequencies available on the GPUs you will work with, you can type the following command:

nvidia-smi -q -d SUPPORTED_CLOCKS

Examples

`srun` examples

For an MPI application asking for one node and 24 tasks, the following is a simple case of job submission. If EARL is turned on by default, no extra options are needed to load it. To check whether it is on by default, load the EAR module and execute the ear-info command. EAR verbosity is set to 0 by default, i.e., no EAR messages.

srun -J test -N 1 -n 24 --tasks-per-node=24 application

The following executes the application showing EAR messages, including EAR configuration and node signature in stderr.

srun --ear-verbose=1 -J test -N 1 -n 24 --tasks-per-node=24 application

EARL verbose messages are written to standard error. For jobs using more than 2 or 3 nodes, messages can be interleaved or overwritten. If the user wants to have the EARL messages in files, the SLURM_EARL_VERBOSE_PATH environment variable must be set to a folder name. One file per node will be generated with the EARL messages.

export SLURM_EARL_VERBOSE_PATH=logs
srun --ear-verbose=1 -J test -N 1 -n 24 --tasks-per-node=24 application

The following asks for EARL metrics to be stored in CSV files after the application execution. Two files per node will be generated: one with the average/global signature and another with the loop signatures. The format of the output files is _<filename>.<nodename>_.time.csv for the global signature and _<filename>.<nodename>_.time.loops.csv for the loop signatures.

srun -J test -N 1 -n 24 --tasks-per-node=24 --ear-user-db=filename application

For EAR authorized users, the following executes the application with a CPU frequency of 2.0GHz:

srun --ear-cpufreq=2000000 --ear-policy=monitoring --ear-verbose=1 -J test -N 1 -n 24 --tasks-per-node=24 application

For --ear-cpufreq to have any effect, you must specify the --ear-policy option even if you want to run your application with the default policy.

`sbatch` + EARL + srun

When using sbatch, EAR options can be specified in the same way. If more than one srun command is included in the job submission, EAR options are inherited from sbatch by the different srun instances, or they can be specifically overridden on each individual srun.

The following example will execute the application twice. Both instances will have the verbosity set to 1. We have set the SLURM_EARL_VERBOSE_PATH environment variable to the ear_logs folder, so EARL messages are written to one file per node. Moreover, the second step will create a set of CSV files placed in the ear_metrics folder. The node name, job ID and step ID are part of the file names for easier identification.

#!/bin/bash
#SBATCH -N 1
#SBATCH -e test.%j.err
#SBATCH -o test.%j.out
#SBATCH --ntasks=24
#SBATCH --tasks-per-node=24
#SBATCH --cpus-per-task=1
#SBATCH --ear-verbose=1
export SLURM_EARL_VERBOSE_PATH=ear_logs
srun application
mkdir ear_metrics
srun --ear-user-db=ear_metrics/app_metrics application

EARL + `mpirun`

Intel MPI

When running EAR with mpirun rather than srun, we have to specify the use of srun as the bootstrap server. Intel MPI 2019 and newer offers two environment variables for specifying the bootstrap server and its extra arguments.

export I_MPI_HYDRA_BOOTSTRAP=slurm
export I_MPI_HYDRA_BOOTSTRAP_EXEC_EXTRA_ARGS="--ear-policy=monitoring --ear-verbose=1"
mpiexec.hydra -n 10 application

OpenMPI

Bootstrap is an Intel(R) MPI option but not an OpenMPI one. For OpenMPI, srun must be used for automatic EAR support. In case OpenMPI with mpirun is needed, EAR offers the erun command, a program that simulates the whole SLURM and EAR SLURM plug-in pipeline. You can launch erun with the --program option to specify the application name and its arguments.

mpirun -n 4 /path/to/erun --program="hostname --alias"

In this example, mpirun runs 4 erun processes. Then, erun launches the application hostname with its --alias parameter. You can pass as many parameters as you want, but the quotes have to enclose all of them whenever there is more than just the program name.

On the remote nodes, erun simulates both the local and remote plug-in pipelines for all created processes. It has an internal mechanism to avoid repeating functions that must be executed only once per job or node, as SLURM does with its plug-ins.

IMPORTANT NOTE: If you are going to launch n applications with the erun command within a single sbatch job, you must set the environment variable SLURM_STEP_ID to values from 0 to n-1 before each mpirun call, as shown in the sketch below. This way erun informs the EARD of the correct step ID, which is then stored in the Database.
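A minimal sketch of an sbatch script launching two applications with mpirun and erun (paths and application names are placeholders):

#!/bin/bash
#SBATCH -N 2
# First application: report it as step 0
export SLURM_STEP_ID=0
mpirun -n 8 /path/to/erun --program="app_one"
# Second application: report it as step 1
export SLURM_STEP_ID=1
mpirun -n 8 /path/to/erun --program="app_two"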

EAR job Accounting (`eacct`)

The eacct command shows accounting information stored in the EAR DB for job (and step) IDs. The command uses EAR's configuration file to determine whether the user running it is privileged, as non-privileged users can only access their own information. The most common options are shown in the usage examples below.

Usage examples

The basic usage of eacct retrieves the last 20 applications (by default) of the user executing it. If a user is privileged, they may see all users' applications. The default behaviour shows data for each job-step, aggregating the values from every node involved in that job-step. If SLURM is used as the job manager, an sb (sbatch) job-step is created with the data from the entire execution. A specific job may be specified with the -j option.

[user@host EAR]$ eacct -j 175966
JOB-STEP USER APPLICATION POLICY NODES AVG/DEF/IMC(GHz) TIME(s) POWER(W) GBS CPI ENERGY(J) GFLOPS/W IO(MBs) MPI% G-POW (T/U) G-FREQ G-UTIL(G/MEM)
175966-sb user afid NP 2 2.97/3.00/--- 3660.00 381.51 --- --- 2792619 --- --- --- --- --- ---
175966-2 user afid MO 2 2.97/3.00/2.39 1205.26 413.02 146.21 1.04 995590 0.1164 0.0 21.0 --- --- ---
175966-1 user afid MT 2 2.62/2.60/2.37 1234.41 369.90 142.63 1.02 913221 0.1265 0.0 19.7 --- --- ---
175966-0 user afid ME 2 2.71/3.00/2.19 1203.33 364.60 146.23 1.07 877479 0.1310 0.0 17.9 --- --- ---

The command shows a pre-selected set of columns; see eacct's section on the [EAR commands page](EAR-commands) for details.

For node-specific information, the -l (long) option provides a detailed accounting of each individual node. In this mode eacct shows an additional column, VPI(%) (see the example below). The VPI is the percentage of AVX-512 instructions over the total number of instructions.

[user@host EAR]$ eacct -j 175966 -l
JOB-STEP NODE ID USER ID APPLICATION AVG-F/IMC-F TIME(s) POWER(W) GBS CPI ENERGY(J) IO(MBS) MPI% VPI(%) G-POW(T/U) G-FREQ G-UTIL(G/M)
175966-sb cmp2506 user afid 2.97/--- 3660.00 388.79 --- --- 1422970 --- --- --- --- --- ---
175966-sb cmp2507 user afid 2.97/--- 3660.00 374.22 --- --- 1369649 --- --- --- --- --- ---
175966-2 cmp2506 user afid 2.97/2.39 1205.27 423.81 146.06 1.03 510807 0.0 21.2 0.23 --- --- ---
175966-2 cmp2507 user afid 2.97/2.39 1205.26 402.22 146.35 1.05 484783 0.0 20.7 0.01 --- --- ---
175966-1 cmp2506 user afid 2.58/2.38 1234.46 374.14 142.51 1.02 461859 0.0 19.4 0.00 --- --- ---
175966-1 cmp2507 user afid 2.67/2.37 1234.35 365.67 142.75 1.03 451362 0.0 20.0 0.01 --- --- ---
175966-0 cmp2506 user afid 2.71/2.19 1203.32 371.76 146.25 1.08 447351 0.0 17.9 0.01 --- --- ---
175966-0 cmp2507 user afid 2.71/2.19 1203.35 357.44 146.21 1.05 430128 0.0 17.9 0.01 --- --- ---

If EARL was loaded during an application's execution, runtime data (i.e., EAR loops) may be retrieved by using the -r flag. You can still filter the output by job (and step) ID.

Finally, to easily transfer eacct's output, the -c option saves the requested data in CSV format. Both aggregated and detailed accounting are available, as well as filtering. When used along with the -l or -r options, all metrics stored in the EAR Database are given. Please read the [commands section page](EAR-commands) to see which of them are available.

[user@host EAR]$ eacct -j 175966.1 -r
JOB-STEP NODE ID ITER. POWER(W) GBS CPI GFLOPS/W TIME(s) AVG_F IMC_F IO(MBS) MPI% G-POWER(T/U) G-FREQ G-UTIL(G/MEM)
175966-1 cmp2506 21 360.6 115.8 0.838 0.086 1.001 2.58 2.30 0.0 11.6 0.0 / 0.0 0.00 0%/0%
175966-1 cmp2507 21 333.7 118.4 0.849 0.081 1.001 2.58 2.32 0.0 12.0 0.0 / 0.0 0.00 0%/0%
175966-1 cmp2506 31 388.6 142.3 1.010 0.121 1.113 2.58 2.38 0.0 19.7 0.0 / 0.0 0.00 0%/0%
175966-1 cmp2507 31 362.8 142.8 1.035 0.130 1.113 2.59 2.37 0.0 19.5 0.0 / 0.0 0.00 0%/0%
175966-1 cmp2506 41 383.3 143.2 1.034 0.124 1.114 2.58 2.38 0.0 19.6 0.0 / 0.0 0.00 0%/0%
[user@host EAR]$ eacct -j 175966 -c test.csv
Successfully written applications to csv. Only applications with EARL will have its information properly written.
[user@host EAR]$ eacct -j 175966.1 -c -l test.csv
Successfully written applications to csv. Only applications with EARL will have its information properly written.

Job energy optimization: EARL policies

The core component of EAR at the user's job level is the EAR Library (EARL). The Library deals with job monitoring and is the component which implements and applies optimization policies based on monitored workload.

We highly recommend reading the [EARL](EARL) documentation and also how energy policies work, in order to better understand what the Library does internally. This will let you easily explore all the features EAR offers to the end user (e.g., tuning variables, collecting data) and gain more insight into how many resources your application consumes and how that correlates with its computational characteristics.