EAR offers the following commands:

- Commands to examine the data collected by EAR: `eacct` and `ereport`.
- A command to temporarily modify cluster settings: `econtrol`.
- Commands to manage the EAR database: `edb_create`, `edb_clean_pm` and `edb_clean_apps`.
- A command to launch applications through `mpirun` instead of `srun` on SLURM systems: `erun`.

Commands belonging to the first three categories read the EAR configuration file (`ear.conf`) to determine whether the user is authorized, as some of them have features (or the whole command) available only to that set of users. Root is a special case: it does not need to be included in the list of authorized users. Some options are disabled when the user is not authorized.
NOTE The EAR module must be loaded in your environment in order to use EAR commands.
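On systems using environment modules, loading EAR typically looks like the following (the exact module name is site-dependent; check `module avail`):

```
# Load the EAR environment module (module name may differ on your site)
module load ear
```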
The `eacct` command shows accounting information stored in the EAR DB for job (and step) IDs. The command uses EAR's configuration file to determine whether the user running it is privileged, as non-privileged users can only access their own information. It provides the following options.
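A partial sketch of the options discussed in this section (the full and authoritative list is printed by eacct's own help, which may differ between EAR versions):

```
-j [JobID[.StepID]]   Filter by job (and optionally step) ID.
-l                    Long format: detailed per-node accounting.
-r                    Runtime (EAR loop) metrics.
-c <file.csv>         Store the output in CSV format.
```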
The basic usage of `eacct` retrieves the last 20 applications (by default) of the user executing it. If the user is privileged, they may see all users' applications. The default behaviour shows data from each job-step, aggregating the values from each node in that job-step. If using SLURM as a job manager, an *sb* (sbatch) job-step is created with the data from the entire execution. A specific job may be specified with the `-j` option.
The table below shows some examples of `eacct` usage.
Command line | Description |
---|---|
eacct | Shows last 20 jobs executed by the user. |
eacct -j <JobID> | Shows data of the job <JobID>, one row for each step of the job. |
eacct -j <JobID>.<StepID> | Shows data of the step <StepID> of job <JobID>. |
eacct -j <JobIDx>,<JobIDy>,<JobIDz> | Shows data of jobs (one row per step) <JobIDx>,<JobIDy> and <JobIDz>. |
The command shows a pre-selected set of columns:
Column field | Description |
---|---|
JOB-STEP | JobID and StepID reported. JobID-*sb* is shown for the sbatch step in SLURM systems. |
USER | The username of the user who executed the job. |
APPLICATION | Job’s name or executable name if job name is not provided. |
POLICY | Energy optimization policy name. MO stands for monitoring, ME for min_energy, MT for min_time, and NP means the job ran without EARL. |
NODES | Number of nodes involved in the job run. |
AVG/DEF/IMC(GHz) | Average CPU frequency, default frequency and average uncore frequency. Includes all the nodes for the step. In GHz. |
TIME(s) | Average step execution time across all nodes, in seconds. |
POWER(W) | Average node power across all the nodes, in Watts. |
GBS | CPU main memory bandwidth (GB/second). Hint for CPU/Memory bound classification. |
CPI | CPU Cycles per Instruction. Hint for CPU/Memory bound classification. |
ENERGY(J) | Accumulated node energy. Includes all the nodes. In Joules. |
GFLOPS/W | CPU GFlops per Watt. Hint for energy efficiency. The metric uses the number of operations, not instructions. |
IO(MBS) | I/O (read and write) Mega Bytes per second. |
MPI% | Percentage of MPI time over the total execution time. It is the average across all processes and nodes. |
If EAR supports GPU monitoring/optimisation, the following columns are added:
Column field | Description |
---|---|
G-POW (T/U) | Average GPU power, accumulated per node and averaged across the involved nodes. T stands for the total GPU power consumed (even if the job is not using some or any of the GPUs in a node). U stands for the power of only the GPUs used on each node. |
G-FREQ | Average GPU frequency, per node and averaged across all the nodes. |
G-UTIL(G/MEM) | GPU utilization and GPU memory utilization. |
For node-specific information, the `-l` (long) option provides detailed accounting for each individual node. In addition, `eacct` shows an additional column, `VPI(%)`: the percentage of AVX512 instructions over the total number of instructions.
Runtime data (EAR loops) may be retrieved with the `-r` option; both job and step ID filtering work. To easily transfer the command's output, the `-c` option saves it in CSV format. Both aggregated and detailed accountings are available, as well as filtering:
Command line | Description |
---|---|
eacct -j <JobID> -c test.csv | Appends to the file test.csv all the metrics shown above for each step of the job <JobID>. |
eacct -j <JobID>.<StepID> -l -c test.csv | Appends to the file test.csv all metrics in the EAR DB for each node involved in step <StepID> of job <JobID>. |
eacct -j <JobID>.<StepID> -r -c test.csv | Appends to the file test.csv all metrics in EAR DB for each loop of each node involved in step <StepID> of job <JobID>. |
When requesting the long format (the `-l` option) or runtime metrics (the `-r` option) to be stored in a CSV file (the `-c` option), the header names differ from the output shown when CSV format is not requested. The table below shows the header names of a CSV file storing long (per-node) information about jobs:
Field name | Description |
---|---|
NODENAME | The node name the row information belongs to. |
JOBID | The JobID. |
STEPID | The StepID. For the sbatch step, SLURM_BATCH_SCRIPT value is printed. |
USERID | The username of the user who executed the job. |
GROUPID | The group name of the user who executed the job. |
JOBNAME | Job’s name or executable name if job name is not provided. |
USER_ACC | The account name of the user who executed the job. |
ENERGY_TAG | The energy tag used, if the user set one for their job step. |
POLICY | Energy optimization policy name. MO stands for monitoring, ME for min_energy, MT for min_time, and NP means the job ran without EARL. |
POLICY_TH | The policy threshold used by the optimization policy set with the job. |
AVG_CPUFREQ_KHZ | Average CPU frequency of the job step executed in the node, expressed in kHz. |
AVG_IMCFREQ_KHZ | Average uncore frequency of the job step executed in the node, expressed in kHz. On AMD sockets, the default data fabric frequency is reported. |
DEF_FREQ_KHZ | Default CPU frequency of the job step executed in the node, expressed in kHz. |
TIME_SEC | Execution time (in seconds) of the application in the node. As this is computed by EARL, the sbatch step does not contain this information. |
CPI | CPU Cycles per Instruction. Hint for CPU/Memory bound classification. |
TPI | Memory transactions per Instruction. Hint for CPU/Memory bound classification. |
MEM_GBS | CPU main memory bandwidth (GB/second). Hint for CPU/Memory bound classification. |
IO_MBS | I/O (read and write) Mega Bytes per second. |
PERC_MPI | Percentage of MPI time over the total execution time. |
DC_NODE_POWER_W | Average node power, in Watts. |
DRAM_POWER_W | Average DRAM power, in Watts. Not available on AMD sockets. |
PCK_POWER_W | Average RAPL package power, in Watts. |
CYCLES | Total number of cycles. |
INSTRUCTIONS | Total number of instructions. |
CPU-GFLOPS | CPU GFlops. Hint for energy efficiency. The metric uses the number of operations, not instructions. |
L1_MISSES | Total number of L1 cache misses. |
L2_MISSES | Total number of L2 cache misses. |
L3_MISSES | Total number of L3/LLC cache misses. |
SPOPS_SINGLE | Total number of single precision 64 bit floating point operations. |
SPOPS_128 | Total number of single precision 128 bit floating point operations. |
SPOPS_256 | Total number of single precision 256 bit floating point operations. |
SPOPS_512 | Total number of single precision 512 bit floating point operations. |
DPOPS_SINGLE | Total number of double precision 64 bit floating point operations. |
DPOPS_128 | Total number of double precision 128 bit floating point operations. |
DPOPS_256 | Total number of double precision 256 bit floating point operations. |
DPOPS_512 | Total number of double precision 512 bit floating point operations. |
If EAR supports GPU monitoring/optimisation, the following columns are added:
Field name | Description |
---|---|
GPU*x*_POWER_W | Average GPU*x* power, in Watts. |
GPU*x*_FREQ_KHZ | Average GPU*x* frequency, in kHz. |
GPU*x*_MEM_FREQ_KHZ | Average GPU*x* memory frequency, in kHz. |
GPU*x*_UTIL_PERC | Average percentage of GPU*x* utilization. |
GPU*x*_MEM_UTIL_PERC | Average percentage of GPU*x* memory utilization. |
For runtime metrics (the `-r` option), USERID, GROUPID, JOBNAME, USER_ACC, ENERGY_TAG (as energy tags disable EARL), POLICY and POLICY_TH are not stored in the CSV file. However, the iteration time (in seconds) is present for each loop as ITER_TIME_SEC, as well as a timestamp (TIMESTAMP) with the elapsed time in seconds since the Epoch.
The `ereport` command creates reports from the per-node energy accounting data stored in the EAR DB. It is intended for energy consumption analysis over a set period of time, with some additional (optional) filtering criteria such as node name or username.
The following example uses the 'all' nodes option to display information for each node, as well as a start_time, so it gives the accumulated energy from that moment until the current time.
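A sketch of such a query; the `-n` (node selection) and `-s` (start time) flag names are assumptions and should be checked against ereport's help on your installation:

```
# Accumulated energy per node from March 1st 2024 until now (flag names assumed)
ereport -n all -s 2024-03-01
```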
This example filters by EARDBD host (typically one per island) instead:
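A possible invocation, assuming `-i` selects the EARDBD host (an assumption; verify the exact flag with ereport's help, the host name below is a placeholder):

```
# Energy reported through a single EARDBD (island) since March 1st 2024 (flag names assumed)
ereport -i island1-eardbd -s 2024-03-01
```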
And to see the state of the cluster's energy budget (set by the sysadmin) you can use the following:
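Assuming the global energy budget report is exposed through the `-g` flag (an assumption; check ereport's help), the call would look like:

```
# Show the state of the cluster-wide energy budget (flag assumed)
ereport -g
```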
The `econtrol` command temporarily modifies cluster settings related to power policies. These options are sent to all the nodes in the cluster.
NOTE Any changes done with `econtrol` will not be reflected in `ear.conf` and thus will be lost when reloading the system.
`econtrol`'s status option is a useful tool to monitor the nodes of a cluster. The most basic usage is the hardware status (the default type), which shows basic information about all the nodes.
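A minimal sketch, assuming the status is requested with a `--status` flag (check econtrol's help for the exact syntax on your version):

```
# Hardware status (default type) of all nodes in the cluster
econtrol --status
```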
The application status type can be used to retrieve all jobs currently running in the cluster: `app_master` gives a summary of all the running applications, while `app_node` gives detailed information for each node currently running a job.
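For illustration, assuming the status type is selected with a `--type` flag (an assumption; the exact option name may differ between EAR versions):

```
# Summary of all running applications
econtrol --status --type=app_master
# Per-node detail of nodes currently running a job
econtrol --status --type=app_node
```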
The `edb_create` command creates the EAR DB used for accounting and for the global energy control. It requires root access to the MySQL server. It reads `ear.conf` to get the connection details (server IP and port), the DB name (which may or may not have been previously created) and EAR's default users (which will be created or altered to have the necessary privileges on EAR's database).
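A typical invocation, assuming `-p` prompts for the MySQL root password (an assumption; check the command's help):

```
# Create the EAR database and users defined in ear.conf; prompts for the MySQL root password
edb_create -p
```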
The `edb_clean_pm` command cleans periodic metrics from the database. It is used to reduce the size of EAR's database, and it removes every Periodic_metrics entry older than num_days:
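A sketch of the call; how num_days is passed (`-d` here) and the password prompt (`-p`) are assumptions to be verified against the command's help:

```
# Remove Periodic_metrics entries older than 365 days (flag names assumed)
edb_clean_pm -p -d 365
```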
The `edb_clean_apps` command removes applications from the database. It is intended to remove old applications to speed up queries and free up space, and it can also be used to remove specific applications. It removes ALL the information related to those jobs (the following tables will be modified for each job: Loops, if they exist; GPU_signatures, if they exist; Signatures, if they exist; Power signatures, Applications, and Jobs). It is recommended to run the command with the `-o` option first to ensure that the queries that will be executed are correct.
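For illustration, assuming applications older than a number of days are selected with `-d` (an assumed flag) and that `-o` prints the queries instead of executing them, as described above:

```
# Dry run: print the queries that would be executed (per the -o option above)
edb_clean_apps -d 365 -o
# Once the queries look correct, run again without -o to actually delete the data
edb_clean_apps -d 365
```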
`erun` is a program that simulates the whole SLURM and EAR SLURM plug-in pipeline. It was designed to provide compatibility with MPI implementations that are not fully compatible with the SLURM SPANK plug-in mechanism (e.g., OpenMPI), which is used to set up EAR at job submission. You launch `erun` with the `--program` option to specify the application name and its arguments. See the usage below:
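A sketch of the usage, limited to the options mentioned in this section (the real help output may list more options and differ in wording):

```
erun [OPTIONS]
  --program="app [args]"   Application (and its arguments) to launch.
  --job-id=<id>            Sets the job ID environment as SLURM would.
  --nodes=<n>              Sets the number of nodes.
  --clean                  Removes the temporary synchronization files.
```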
The syntax to run an MPI application with `erun` has the form `mpirun -n <X> erun --program='my_app arg1 arg2 .. argN'`. Therefore, `mpirun` will run *X* `erun` processes, and each `erun` will launch the application `my_app` with the arguments passed, if specified. You can use as many arguments as you want, but the quotes have to enclose all of them (together with the program name) whenever there is more than just the program name.
`erun` will simulate, on the remote node, both the local and remote pipelines for all created processes. It has an internal system to avoid repeating functions that are executed just once per job or node, as SLURM does with its plug-ins.
IMPORTANT NOTE If you are going to launch `n` applications with the `erun` command through an sbatch job, you must set the environment variable `SLURM_STEP_ID` to values from `0` to `n-1` before each `mpirun` call. This way, `erun` will report to the EARD the correct step ID to be stored in the database.
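An illustrative sbatch script launching two applications this way (application names, process counts and arguments are placeholders):

```
#!/bin/bash
#SBATCH -N 2

# First application: step 0
export SLURM_STEP_ID=0
mpirun -n 96 erun --program="my_app_1 arg1 arg2"

# Second application: step 1
export SLURM_STEP_ID=1
mpirun -n 96 erun --program="my_app_2 arg1 arg2"
```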
The `--job-id` and `--nodes` parameters create the environment variables that SLURM would have created automatically, since your application may make use of them. The `--clean` option removes the temporary files created to synchronize all the `erun` processes.
Also, you have to load the EAR environment module or define its environment variables in your environment or script (see the sketch after the table below):
Environment variable | Corresponding installation parameter |
---|---|
EAR_INSTALL_PATH=<path> | prefix=<path> |
EAR_TMP=<path> | localstatedir=<path> |
EAR_ETC=<path> | sysconfdir=<path> |
EAR_DEFAULT=<on/off> | default=<on/off> |
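If the module is not available, the variables from the table above can be exported manually; the paths below are placeholders for your installation:

```
export EAR_INSTALL_PATH=/path/to/ear/installation
export EAR_TMP=/path/to/ear/tmp
export EAR_ETC=/path/to/ear/etc
export EAR_DEFAULT=on
```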
`ear-info` is a tool created to quickly view useful information about the current EAR installation on the system. It shows relevant details for both users and administrators, such as configuration defaults, installation paths, etc. The tool prints this information without requiring any argument: it shows a summary of the EAR parameters set at compile time, as well as some installation-dependent configuration.
EAR was designed to be installed on heterogeneous systems, so some configuration parameters are applied to sets of nodes identified by different tags. The `--node-conf` flag can be used to request additional information about a specific node. The configuration related to EAR's power capping sub-system, the default optimization policies configuration and other parameters associated with the requested node are retrieved. You can read the EAR configuration section for more details about how EAR uses tags to identify and configure different kinds of nodes on a given heterogeneous system.
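For example (the node name is a placeholder, and whether the value is passed with `=` or as a separate argument may depend on the EAR version):

```
# Print the general installation summary
ear-info
# Ask for the configuration applied to a specific node
ear-info --node-conf=node101
```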
Contact ear-support@bsc.es for more information about the nomenclature used in `ear-info`'s output.