EAR was designed from the start to be usable 100% transparently by users, meaning you can run your applications with EAR enabled, disabled or tuned with minimal changes to your workflow, e.g., submission scripts. This is achieved by providing integrations (e.g., plug-ins, hooks) with system batch schedulers, which handle the EAR set-up at job submission. Currently, SLURM is the batch scheduler fully compatible with EAR thanks to EAR's SLURM SPANK plug-in.
With EAR's SLURM plug-in, running an application with EAR is as easy as submitting a job with either `srun`, `sbatch` or `mpirun`. The EAR Library (EARL) is automatically loaded with some applications when EAR is enabled by default. Check with the `ear-info` command whether EARL is on or off by default. If it is off, use the `--ear=on` option offered by the EAR SLURM plug-in to enable it. For other schedulers, a simple prolog/epilog command can be created to provide transparent job submission with EAR and its default configuration. The EAR development team has also worked with the OAR and PBSPro batch schedulers, but there is currently no official stable or supported integration for them.
EARL is automatically loaded with MPI applications when EAR is enabled by default (check with `ear-info`). EAR supports the use of both the `mpirun`/`mpiexec` and `srun` commands.
When using `sbatch`/`srun` or `salloc`, Intel MPI and OpenMPI are fully supported. When using specific MPI flavour commands to start applications (e.g., `mpirun`, `mpiexec.hydra`), there are some key points you must take into account. See the next sections for examples and more details.
EARL automatically supports this use case. `mpirun`/`mpiexec` and `srun` are supported in the same manner as explained above.
EARL cannot automatically detect MPI symbols when Python is used. In that case, an environment variable is provided to specify which MPI flavour is in use. Export the `SLURM_EAR_LOAD_MPI_VERSION` environment variable with either `intel` or `open mpi` (the two MPI implementations 100% supported by EAR), e.g., `export SLURM_EAR_LOAD_MPI_VERSION="open mpi"`.
Running MPI applications with EARL on SLURM systems using the `srun` command is the most straightforward way to start using EAR. All jobs are monitored by EAR, and the Library is loaded by default depending on the cluster configuration. To run a job with `srun` and EARL there is no need to load the EAR module.
Even though loading is automatic, there are a few flags that can be selected at job submission. They are provided by EAR's SLURM SPANK plug-in. When using SLURM commands for job submission, both Intel MPI and OpenMPI implementations are supported.
To provide automatic loading of the EAR Library, the only requirement is that the MPI library be coordinated with the scheduler.
Recent versions of Intel MPI offer two environment variables that can be used to guarantee the correct scheduler integration:

- `I_MPI_HYDRA_BOOTSTRAP` sets the bootstrap server. It must be set to `slurm`.
- `I_MPI_HYDRA_BOOTSTRAP_EXEC_EXTRA_ARGS` sets additional arguments for the bootstrap server. These arguments are passed to SLURM, and they can be any of the options provided by EAR's SPANK plug-in.

You can read the Intel environment variables guide for more details.
For combining OpenMPI and EAR it is highly recommended to use SLURM's `srun` command. When using `mpirun`, since OpenMPI is not fully coordinated with the scheduler, EARL is not automatically loaded on all nodes. In that case EARL will be disabled and only basic energy metrics will be reported. To support this workflow, EAR provides the `erun` command. Read the corresponding examples section for more information about how to use it.
To use MPI with Python applications, the EAR Loader cannot automatically detect symbols to classify the application as Intel MPI or OpenMPI. To specify it, the user has to define the `SLURM_LOAD_MPI_VERSION` environment variable with the value `intel` or `open mpi`. It is recommended to add this definition to Python modules to make it transparent for end users.
Since version 4.1, EAR automatically executes the Library with Python applications, so no action is needed. You must run the application with the `srun` command so it passes through EAR's SLURM SPANK plug-in, which lets you enable/disable/tune EAR. See the EAR submission flags provided by the EAR SLURM integration.
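For instance, a minimal sketch (the script name and node/task counts are placeholders):

```bash
# Only needed on EAR versions prior to 4.1, where the MPI flavour must be stated explicitly
export SLURM_LOAD_MPI_VERSION="open mpi"

# Launch through srun so EAR's SLURM SPANK plug-in applies the requested EAR options
srun --ear=on -N 2 -n 8 python mpi_app.py
```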
To load EARL automatically with non-MPI applications, the application must be compiled with dynamic symbols and must be executed with the `srun` command. For example, for CUDA applications the `--cudart=shared` option must be used at compile time. EARL is loaded for the OpenMP, MKL and CUDA programming models when symbols are dynamically detected.
For other programming models or sequential apps not supported by default, EARL can be forced to load by setting the `SLURM_EAR_LOADER_APPLICATION` environment variable, which must be defined with the application name. For example:
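The following sketch assumes a sequential binary named `my_app` (a placeholder):

```bash
# Force EARL to be loaded for an application it does not detect automatically
export SLURM_EAR_LOADER_APPLICATION=my_app
srun -N 1 -n 1 ./my_app
```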
Apptainer (formerly Singularity) is an open source containerization technology. It is widely used in HPC contexts because the level of virtualization it offers enables access to local services. It allows for greater reproducibility, making programs less dependent on the environment they are run on.
An example singularity command could look something like this:
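```bash
# IMAGE points to the container image; program is the executable inside the image (both placeholders)
singularity exec $IMAGE program
```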
where `IMAGE` is an environment variable that contains the path of the Singularity container, and `program` is the executable to be run in the image.
In order to be able to use EAR inside the container two actions are needed:
To bind folders there are two options: (1) using the environment variable `SINGULARITY_BIND`/`APPTAINER_BIND`, or (2) using the `-B` flag when running the container. Option (1) takes a comma-separated string of pairs of paths `[path_1][[:path_2][:perms]]`, such that path_1 on the local system will be mapped to path_2 in the image with the permissions set in perms, which can be `r` or `rw`. Specifying path_2 and perms is optional; if they are not specified, path_1 will be bound at the same location.
To make EAR work, the following paths should be added to the binding configuration:
$EAR_INSTALL_PATH,$EAR_INSTALL_PATH/bin,$EAR_INSTALL_PATH/lib,$EAR_TMP
You should have an EAR module available that defines the above environment variables. Contact your system administrator for more information.
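For instance, using the environment variable approach (the variable values come from the EAR module):

```bash
# Bind the EAR installation and temporary directories into the container
export APPTAINER_BIND="$EAR_INSTALL_PATH,$EAR_INSTALL_PATH/bin,$EAR_INSTALL_PATH/lib,$EAR_TMP"
```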
Once the paths are bound, to execute (for example) an OpenMPI application inside a Singularity/Apptainer container with the EAR Library enabled, the following is all that is needed:
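```bash
# Sketch: launch the containerized application through srun so EARL is loaded (names are placeholders)
srun --ear=on singularity exec $IMAGE program
```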
A more complete example would look something like this:
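The sketch below assumes one node with 24 tasks and the paths exposed by the EAR module; adapt it to your system.

```bash
#!/bin/bash
#SBATCH -N 1
#SBATCH --ntasks-per-node=24

# Bind EAR paths into the container (values provided by the EAR module)
export APPTAINER_BIND="$EAR_INSTALL_PATH,$EAR_INSTALL_PATH/bin,$EAR_INSTALL_PATH/lib,$EAR_TMP"

# Pass EAR_REPORT_ADD into the container to load the sysfs report plug-in
# (plug-in file name assumed to follow the csv_ts.so naming pattern)
export APPTAINERENV_EAR_REPORT_ADD=sysfs.so

srun --ear=on --ear-verbose=1 singularity exec $IMAGE program
```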
Note that the example exports `APPTAINERENV_EAR_REPORT_ADD` to set the environment variable `EAR_REPORT_ADD` to load the `sysfs` report plug-in.
As a job accounting and monitoring tool, EAR collects metrics that you can retrieve to understand your applications' behaviour. The Library is equipped with several modules and options to provide different kinds of information.
As a very simple hint of your application's workload, you can enable EARL verbosity to get loop data at runtime. The information is shown on stderr by default. Read how to set up verbosity at submission time and the verbosity environment variables provided for more advanced tuning of this EAR feature.
To get job data offline, EAR provides eacct, a tool that reports the monitored job data stored in the Database. You can request information in different ways: aggregated job data, per-node or per-loop information, among other things. See the eacct usage examples for a better overview of the kind of data `eacct` provides.
There is another way to get runtime and aggregated data during execution, without the need to call `eacct` after job completion. EAR implements a reporting mechanism which lets developers add new report plug-ins, so there is a virtually unlimited set of ways to report EAR-collected data. EAR releases come with a fully supported report plug-in (called csv_ts) which provides the same runtime and aggregated data reported to the Database in CSV files, directly while the job is running. You can load this plug-in in two ways; one of them is exporting `SLURM_EAR_REPORT_ADD=csv_ts.so`. Contact ear-support@bsc.es for more information about report plug-ins.
You can also request EAR to report events to the Database. Events show more details about EARL's internal state and can be retrieved with the `eacct` command. See how to enable EAR events reporting and which kinds of events EAR reports.
If it applies to your application, you can request EAR to report, at the end of the execution, a summary of its MPI behaviour. The information is provided in two files and contains the aggregated data of each process of the application.
Finally, EARL can provide runtime data in the Paraver trace format. Paraver is a flexible performance analysis tool maintained by the Barcelona Supercomputing Center's tools team. It provides an easy way to visualize runtime data, compute derived metrics and generate histograms for a better understanding of your application's behaviour. See the environment variables page to learn how to generate Paraver traces.
Contact ear-support@bsc.es if you want more details about how to analyse EAR data with Paraver.
The following EAR options can be specified when running `srun` and/or `sbatch`, and are supported with `srun`/`sbatch`/`salloc`:
| Option | Description |
|---|---|
| `--ear=[on\|off]` | Enables/disables EAR Library loading with this job. |
| `--ear-user-db=<filename>` | Asks the EAR Library to generate a set of CSV files with EARL metrics. |
| `--ear-verbose=[0\|1]` | Specifies the verbosity level; the default is 0. |
When using the `--ear-user-db` flag, one file per node is generated with the average node metrics (node signature), and one file with multiple lines per node is generated with the runtime-collected metrics (loop node signatures). Read eacct's section on the commands page to see which metrics are reported; the data generated by this flag is the same as the data reported to the Database (and retrieved later by the command).
Verbose messages are written to stderr by default. For jobs with multiple nodes, the `--ear-verbose` option can result in lots of messages mixed in stderr. We recommend splitting SLURM's output (or error) file per node. You can read SLURM's filename pattern specification for more information.
If you still need job output and EAR output to be separated, you can set the `SLURM_EARL_VERBOSE_PATH` environment variable and one file per node will be generated containing only EAR output. The environment variable must be set to the path (a directory) where you want the output files to be generated; it will be created automatically if needed.
You can always check the available EAR submission flags provided by EAR's SLURM SPANK plug-in by typing `srun --help`.
The EAR configuration file supports the specification of EAR authorized users, who can ask for more privileged submission options. The most relevant ones are the possibility to request a specific optimisation policy and a specific CPU frequency.
Contact your sysadmin or helpdesk team to become an authorized user.
- The `--ear-policy=policy_name` flag asks for the policy_name policy. Type `srun --help` to see the policies currently installed on your system.
- `--ear-cpufreq=value` (value must be given in kHz) asks for a specific CPU frequency.

EAR version 3.4 and upwards supports GPU monitoring for NVIDIA devices from the point of view of application and node monitoring. GPU frequency optimization is not yet supported. Authorized users can ask for a specific GPU frequency by setting the `SLURM_EAR_GPU_DEF_FREQ` environment variable, giving the desired GPU frequency expressed in kHz. Only one frequency for all GPUs is currently supported.
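For instance (the frequency value is illustrative and must be one supported by your GPUs):

```bash
# Ask for ~1.31 GHz on all GPUs (value in kHz); only honoured for authorized users
export SLURM_EAR_GPU_DEF_FREQ=1312000
srun -N 1 -n 24 ./my_gpu_app
```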
Contact your sysadmin or helpdesk team to become an authorized user.
To see the list of available frequencies of the GPU you will work on, you can type the following command:
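```bash
# e.g., query the supported clocks directly from the NVIDIA driver
nvidia-smi -q -d SUPPORTED_CLOCKS
```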
For an MPI application asking for one node and 24 tasks, the following is a simple case of job submission. If EARL is turned on by default, no extra options are needed to load it. To check whether it is on by default, load the EAR module and execute the `ear-info` command. EAR verbosity is set to 0 by default, i.e., no EAR messages.
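A sketch of such a submission (`my_mpi_app` is a placeholder):

```bash
srun -N 1 -n 24 ./my_mpi_app
```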
The following executes the application showing EAR messages, including the EAR configuration and node signature, on stderr.
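```bash
srun --ear-verbose=1 -N 1 -n 24 ./my_mpi_app
```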
EARL verbose messages are generated on standard error. For jobs using more than 2 or 3 nodes, messages can be overwritten. If you want EARL messages to be stored in files, the `SLURM_EARL_VERBOSE_PATH` environment variable must be set to a folder name. One file per node will be generated with the EARL messages.
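For instance (the folder name is arbitrary):

```bash
export SLURM_EARL_VERBOSE_PATH=ear_logs
srun --ear-verbose=1 -N 4 -n 96 ./my_mpi_app
```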
The following asks for EARL metrics to be stored in CSV files after the application execution. Two files per node will be generated: one with the average/global signature and another with the loop signatures. The output files are named <filename>.<nodename>.time.csv for the global signature and <filename>.<nodename>.time.loops.csv for the loop signatures.
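```bash
# CSV files will be generated with the my_app_metrics prefix (a placeholder)
srun --ear-user-db=my_app_metrics -N 1 -n 24 ./my_mpi_app
```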
For EAR authorized users, the following executes the application with a CPU frequency of 2.0GHz:
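```bash
# 2.0 GHz expressed in kHz; a policy must be requested for --ear-cpufreq to take effect
# (the policy name is illustrative; use one listed by srun --help)
srun --ear-policy=monitoring --ear-cpufreq=2000000 -N 1 -n 24 ./my_mpi_app
```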
For `--ear-cpufreq` to have any effect, you must specify the `--ear-policy` option, even if you want to run your application with the default policy.
When using `sbatch`, EAR options can be specified in the same way. If more than one `srun` is included in the job submission, EAR options can be inherited from `sbatch` by the different `srun` instances, or they can be modified for each individual `srun`.
The following example executes the application twice. Both instances will have verbosity set to 1. As the job asks for 10 nodes, we set the `SLURM_EARL_VERBOSE_PATH` environment variable to the ear_log folder. Moreover, the second step will create a set of CSV files placed in the ear_metrics folder. The node name, Job ID and Step ID are part of the file names for easier identification.
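A sketch of such a batch script (task counts and file names are illustrative):

```bash
#!/bin/bash
#SBATCH -N 10
#SBATCH --ear-verbose=1

# One EARL verbose file per node will be written under ear_log
export SLURM_EARL_VERBOSE_PATH=ear_log
mkdir -p ear_metrics

# First step: inherits the EAR options given to sbatch
srun ./my_mpi_app

# Second step: additionally stores EARL metrics as CSV files under ear_metrics
srun --ear-user-db=ear_metrics/my_app ./my_mpi_app
```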
When running EAR with `mpirun` rather than `srun`, we have to specify the use of `srun` as the bootstrap mechanism. Intel MPI version 2019 and newer offers two environment variables for bootstrap server specification and arguments.
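A sketch using those variables (the extra arguments and task count are illustrative):

```bash
# Use SLURM as the bootstrap server so EAR's SPANK plug-in is applied
export I_MPI_HYDRA_BOOTSTRAP=slurm
# Extra arguments passed to SLURM; any EAR SPANK plug-in flag can go here
export I_MPI_HYDRA_BOOTSTRAP_EXEC_EXTRA_ARGS="--ear-verbose=1"
mpirun -n 24 ./my_mpi_app
```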
Bootstrap is an Intel(R) MPI option but not an OpenMPI option. For OpenMPI, `srun` must be used for automatic EAR support. In case OpenMPI with `mpirun` is needed, EAR offers the `erun` command, a program that simulates the whole SLURM and EAR SLURM plug-in pipeline. You can launch `erun` with the `--program` option to specify the application name and arguments.
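For example:

```bash
mpirun -n 4 erun --program="hostname --alias"
```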
In this example, `mpirun` runs 4 `erun` processes. Then, `erun` launches the application `hostname` with its alias parameter. You can use as many parameters as you want, but the quotes have to cover all of them in case there is more than just the program name.
`erun` simulates, on the remote node, both the local and remote pipelines for all created processes. It has an internal system to avoid repeating functions that are executed only once per job or node, as SLURM does with its plug-ins.
IMPORTANT NOTE: If you are going to launch `n` applications with the `erun` command through an sbatch job, you must set the environment variable `SLURM_STEP_ID` to values from `0` to `n-1` before each `mpirun` call. This way `erun` will inform the EARD of the correct step ID to be stored in the Database.
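A sketch for two applications (`app1` and `app2` are placeholders):

```bash
# First step
export SLURM_STEP_ID=0
mpirun -n 4 erun --program="./app1"

# Second step
export SLURM_STEP_ID=1
mpirun -n 4 erun --program="./app2"
```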
The eacct command shows accounting information stored in the EAR DB for job (and step) IDs. The command uses EAR's configuration file to determine whether the user running it is privileged, since non-privileged users can only access their own information. It provides the following options.
The basic usage of `eacct` retrieves the last 20 applications (by default) of the user executing it. If the user is privileged, they may see all users' applications. The default behaviour shows data from each job-step, aggregating the values from each node in that job-step. If using SLURM as the job manager, an sb (sbatch) job-step is created with the data from the entire execution. A specific job may be specified with the `-j` option.
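For instance (the job ID is illustrative):

```bash
# Last 20 applications of the current user
eacct

# Accounting for a specific job
eacct -j 123456
```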
The command shows a pre-selected set of columns; read `eacct`'s section on the EAR commands page for the full list.
For node-specific information, the `-l` (i.e., long) option provides detailed accounting of each individual node. In addition, `eacct` shows an extra column, `VPI(%)` (see the example below). The VPI is the percentage of AVX512 instructions over the total number of instructions.
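```bash
# Per-node (long) accounting for a job, including the VPI(%) column
eacct -j 123456 -l
```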
If EARL was loaded during an application execution, runtime data (i.e., EAR loops) may be retrieved by using the `-r` flag. You can still filter the output by Job (and Step) ID.
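```bash
# Runtime (loop) data for a job; the output can also be filtered by step
eacct -j 123456 -r
```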
Finally, to easily transfer `eacct`'s output, the `-c` option saves the requested data in CSV format. Both aggregated and detailed accounting are available, as well as filtering. When used along with the `-l` or `-r` options, all metrics stored in the EAR Database are given. Please read the commands section page to see which of them are available.
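For instance, assuming `-c` takes the output file name (the file name is a placeholder):

```bash
# Save detailed per-node accounting for a job in CSV format
eacct -j 123456 -l -c my_job_metrics.csv
```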
The core component of EAR at the user's job level is the EAR Library (EARL). The Library deals with job monitoring and is the component that implements and applies optimization policies based on the monitored workload.
We highly recommend reading the EARL documentation and also how energy policies work, in order to better understand what the Library is doing internally. That way you can easily explore all the features EAR offers to the end user (e.g., tuning variables, collecting data) and gain more insight into how many resources your application consumes and how this correlates with its computational characteristics.