EAR-commands

EAR offers several commands, described in the sections below.

The first three commands (eacct, ereport and econtrol) read the EAR configuration file (ear.conf) to determine whether the user is authorized, since some of their features (or even the whole command) are only available to that set of users. Root is a special case: it does not need to be included in the list of authorized users. Some options are disabled when the user is not authorized.

NOTE: The EAR module must be loaded in your environment in order to use the EAR commands.

[[TOC]]

EAR Job Accounting (eacct)

The eacct command shows accounting information stored in the EAR DB for job (and step) IDs. The command uses EAR's configuration file to determine whether the user running it is privileged, since non-privileged users can only access their own information. It provides the following options.

Usage: eacct [Optional parameters]
Optional parameters:
-h displays this message
-v displays current EAR version
-b verbose mode for debugging purposes
-u specifies the user whose applications will be retrieved. Only available to privileged users. [default: all users]
-j specifies the job id and step id to retrieve with the format [jobid.stepid] or the format [jobid1,jobid2,...,jobid_n].
A user can only retrieve their own jobs unless said user is privileged. [default: all jobs]
-a specifies the application names that will be retrieved. [default: all app_ids]
-c specifies the file where the output will be stored in CSV format. If the argument is "no_file" the output will be printed to STDOUT [default: off]
-t specifies the energy_tag of the jobs that will be retrieved. [default: all tags].
-s specifies the minimum start time of the jobs that will be retrieved in YYYY-MM-DD. [default: no filter].
-e specifies the maximum end time of the jobs that will be retrieved in YYYY-MM-DD. [default: no filter].
-l shows the information for each node for each job instead of the global statistics for said job.
-x shows the last EAR events. Nodes, job ids, and step ids can be specified as if it were showing job information.
-m prints power signatures regardless of whether mpi signatures are available or not.
-r shows the EAR loop signatures. Nodes, job ids, and step ids can be specified as if it were showing job information.
-o modifies the -r option to also show the corresponding jobs. Should be used with -j.
-n specifies the number of jobs to be shown, starting from the most recent one. [default: 20][to get all jobs use -n all]
-f specifies the file where the user-database can be found. If this option is used, the information will be read from the file and not the database.

The basic usage of eacct retrieves the last 20 applications (by default) of the user executing it. If the user is privileged, they may see all users' applications. The default behaviour shows data from each job-step, aggregating the values from each node in said job-step. If using SLURM as the job manager, an sb (sbatch) job-step is created with the data from the entire execution. A specific job may be specified with the -j option.

The table below shows some examples of eacct usage.

| Command line | Description |
| --- | --- |
| `eacct` | Shows the last 20 jobs executed by the user. |
| `eacct -j <JobID>` | Shows data of job `<JobID>`, one row for each step of the job. |
| `eacct -j <JobID>.<StepID>` | Shows data of step `<StepID>` of job `<JobID>`. |
| `eacct -j <JobIDx>,<JobIDy>,<JobIDz>` | Shows data of jobs `<JobIDx>`, `<JobIDy>` and `<JobIDz>` (one row per step). |

The command shows a pre-selected set of columns:

| Column field | Description |
| --- | --- |
| JOB-STEP | JobID and StepID reported. JobID-*sb* is shown for the sbatch step on SLURM systems. |
| USER | Username of the user who executed the job. |
| APPLICATION | Job's name, or the executable name if no job name is provided. |
| POLICY | Energy optimization policy name. MO stands for monitoring, ME for min_energy, MT for min_time, and NP means the job ran without EARL. |
| NODES | Number of nodes involved in the job run. |
| AVG/DEF/IMC(GHz) | Average CPU frequency, default frequency and average uncore frequency across all the nodes of the step, in GHz. |
| TIME(s) | Average step execution time across all nodes, in seconds. |
| POWER(W) | Average node power across all the nodes, in Watts. |
| GBS | CPU main memory bandwidth (GB/second). Hint for CPU/memory bound classification. |
| CPI | CPU cycles per instruction. Hint for CPU/memory bound classification. |
| ENERGY(J) | Accumulated node energy across all the nodes, in Joules. |
| GFLOPS/W | CPU GFlops per Watt. Hint for energy efficiency. The metric uses the number of operations, not instructions. |
| IO(MBS) | I/O (read and write) in MegaBytes per second. |
| MPI% | Percentage of MPI time over the total execution time, averaged over all processes and nodes. |

If EAR supports GPU monitoring/optimisation, the following columns are added:

| Column field | Description |
| --- | --- |
| G-POW (T/U) | Average GPU power, accumulated per node and averaged across the involved nodes. T stands for the total GPU power consumed (even if the job is not using some or any of the GPUs in a node); U stands for the power of only the GPUs used on each node. |
| G-FREQ | Average GPU frequency, per node and averaged across all the nodes. |
| G-UTIL(G/MEM) | GPU utilization and GPU memory utilization. |

For node-specific information, the -l (i.e., long) option provides detailed accounting for each individual node. In this mode, eacct shows an additional column, VPI(%): the percentage of AVX512 instructions over the total number of instructions.

Runtime data (EAR loops) can be retrieved with the -r option; both job and step ID filtering work. To easily transfer the command's output, the -c option saves it in CSV format. Both aggregated and detailed accounting are available, as well as filtering:

| Command line | Description |
| --- | --- |
| `eacct -j <JobID> -c test.csv` | Appends to the file test.csv all the metrics shown above for each step of job `<JobID>`. |
| `eacct -j <JobID>.<StepID> -l -c test.csv` | Appends to the file test.csv all the metrics in the EAR DB for each node involved in step `<StepID>` of job `<JobID>`. |
| `eacct -j <JobID>.<StepID> -r -c test.csv` | Appends to the file test.csv all the metrics in the EAR DB for each loop of each node involved in step `<StepID>` of job `<JobID>`. |

When requesting the long format (i.e., the -l option) or runtime metrics (i.e., the -r option) to be stored in a CSV file (i.e., the -c option), the header names differ from those of the standard output. The table below shows the header names of a CSV file storing long information about jobs:

| Field name | Description |
| --- | --- |
| NODENAME | Node name the row information belongs to. |
| JOBID | The JobID. |
| STEPID | The StepID. For the sbatch step, the value SLURM_BATCH_SCRIPT is printed. |
| USERID | Username of the user who executed the job. |
| GROUPID | Group name of the user who executed the job. |
| JOBNAME | Job's name, or the executable name if no job name is provided. |
| USER_ACC | Account name of the user who executed the job. |
| ENERGY_TAG | The energy tag used, if the user set one for the job step. |
| POLICY | Energy optimization policy name. MO stands for monitoring, ME for min_energy, MT for min_time, and NP means the job ran without EARL. |
| POLICY_TH | Policy threshold used by the optimization policy set with the job. |
| AVG_CPUFREQ_KHZ | Average CPU frequency of the job step executed in the node, expressed in kHz. |
| AVG_IMCFREQ_KHZ | Average uncore frequency of the job step executed in the node, expressed in kHz. Default data fabric frequency on AMD sockets. |
| DEF_FREQ_KHZ | Default frequency of the job step executed in the node, expressed in kHz. |
| TIME_SEC | Execution time (in seconds) of the application in the node. As this is computed by EARL, the sbatch step does not contain this info. |
| CPI | CPU cycles per instruction. Hint for CPU/memory bound classification. |
| TPI | Memory transactions per instruction. Hint for CPU/memory bound classification. |
| MEM_GBS | CPU main memory bandwidth (GB/second). Hint for CPU/memory bound classification. |
| IO_MBS | I/O (read and write) in MegaBytes per second. |
| PERC_MPI | Percentage of MPI time over the total execution time. |
| DC_NODE_POWER_W | Average node power, in Watts. |
| DRAM_POWER_W | Average DRAM power, in Watts. Not available on AMD sockets. |
| PCK_POWER_W | Average RAPL package power, in Watts. |
| CYCLES | Total number of cycles. |
| INSTRUCTIONS | Total number of instructions. |
| CPU-GFLOPS | CPU GFlops per Watt. Hint for energy efficiency. The metric uses the number of operations, not instructions. |
| L1_MISSES | Total number of L1 cache misses. |
| L2_MISSES | Total number of L2 cache misses. |
| L3_MISSES | Total number of L3/LLC cache misses. |
| SPOPS_SINGLE | Total number of scalar (non-vectorized) single precision floating point operations. |
| SPOPS_128 | Total number of single precision 128 bit floating point operations. |
| SPOPS_256 | Total number of single precision 256 bit floating point operations. |
| SPOPS_512 | Total number of single precision 512 bit floating point operations. |
| DPOPS_SINGLE | Total number of scalar (non-vectorized) double precision floating point operations. |
| DPOPS_128 | Total number of double precision 128 bit floating point operations. |
| DPOPS_256 | Total number of double precision 256 bit floating point operations. |
| DPOPS_512 | Total number of double precision 512 bit floating point operations. |

If EAR supports GPU monitoring/optimisation, the following columns are added:

| Field name | Description |
| --- | --- |
| GPU*x*_POWER_W | Average GPU*x* power, in Watts. |
| GPU*x*_FREQ_KHZ | Average GPU*x* frequency, in kHz. |
| GPU*x*_MEM_FREQ_KHZ | Average GPU*x* memory frequency, in kHz. |
| GPU*x*_UTIL_PERC | Average percentage of GPU*x* utilization. |
| GPU*x*_MEM_UTIL_PERC | Average percentage of GPU*x* memory utilization. |

For runtime metrics (i.e., the -r option), USERID, GROUPID, JOBNAME, USER_ACC, ENERGY_TAG (as energy tags disable EARL), POLICY and POLICY_TH are not stored in the CSV file. However, the iteration time (in seconds) is present for each loop as ITER_TIME_SEC, as well as a timestamp (i.e., TIMESTAMP) with the elapsed time in seconds since the Epoch.
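
As a quick sketch of how such a loop CSV might be post-processed (the job/step ID and file name are illustrative, and the awk one-liner assumes the first row is the CSV header):

eacct -j 1234.0 -r -c loops.csv
# Locate the ITER_TIME_SEC column from the header, then average it over all loop rows:
awk -F, 'NR==1 { for (i=1; i<=NF; i++) if ($i=="ITER_TIME_SEC") c=i }
         NR>1 && c { sum+=$c; n++ }
         END { if (n) printf "avg ITER_TIME_SEC: %f\n", sum/n }' loops.csv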

EAR System Energy Report (ereport)

The ereport command creates reports from the energy accounting data of the nodes stored in the EAR DB. It is intended for energy consumption analysis over a set period of time, with some additional (optional) criteria such as node name or username.

Usage: ereport [options]
Options are as follows:
-s start_time indicates the start of the period from which the energy consumed will be computed. Format: YYYY-MM-DD. Default: end_time minus insertion time*2.
-e end_time indicates the end of the period from which the energy consumed will be computed. Format: YYYY-MM-DD. Default: current time.
-n node_name|all indicates from which node the energy will be computed. Default: none (all nodes computed).
'all' option shows all nodes individually, not aggregated.
-u user_name|all requests the energy consumed by a user in the selected period of time. Default: none (all users computed).
'all' option shows all users individually, not aggregated.
-t energy_tag|all requests the energy consumed by energy tag in the selected period of time. Default: none (all tags computed).
'all' option shows all tags individually, not aggregated.
-i eardbd_name|all indicates from which eardbd (island) the energy will be computed. Default: none (all islands computed).
'all' option shows all eardbds individually, not aggregated.
-g shows the contents of the Global_energy table in EAR's database. By default it shows the records for the two previous EARGM T2 periods.
This option can only be modified with -s, not -e.
-x shows the daemon events from -s to -e. If no time frame is specified, it shows the last 20 events.
-v shows current EAR version.
-h shows this message.

Examples

The following example uses the 'all' nodes option to display information for each node individually, as well as a start_time, so it reports the accumulated energy from that moment until the current time.

[user@host EAR]$ ereport -n all -s 2018-09-18
Energy (J) Node Avg. Power (W)
20668697 node1 146
20305667 node2 144
20435720 node3 145
20050422 node4 142
20384664 node5 144
20432626 node6 145
18029624 node7 128

This example filters by EARDBD host (typically one per island) instead:

[user@host EAR]$ ereport -s 2019-05-19 -i all
Energy (J) Node
9356791387 island1
30475201705 island2
37814151095 island3
28573716711 island4
29700149501 island5
26342209716 island6

And to see the state of the cluster's energy budget (set by the sysadmin) you can use the following:

[user@host EAR]$ ereport -g
Energy% Warning lvl Timestamp INC th p_state ENERGY T1 ENERGY T2 TIME T1 TIME T2 LIMIT POLICY
111.486 100 2019-05-22 10:31:34 0 100 893 1011400 907200 600 604800 EnergyBudget
111.492 100 2019-05-22 10:21:34 0 100 859 1011456 907200 600 604800 EnergyBudget
111.501 100 2019-05-22 10:11:34 0 100 862 1011533 907200 600 604800 EnergyBudget
111.514 100 2019-05-22 10:01:34 0 100 842 1011658 907200 600 604800 EnergyBudget
111.532 100 2019-05-22 09:51:34 0 100 828 1011817 907200 600 604800 EnergyBudget
111.554 0 2019-05-22 09:41:34 0 0 837 1012019 907200 600 604800 EnergyBudget

EAR Control (econtrol)

The econtrol command temporarily modifies cluster settings related to power policy configuration. These options are sent to all the nodes in the cluster.

NOTE: Any changes done with econtrol will not be reflected in ear.conf and thus will be lost when reloading the system.

Usage: econtrol [options]
--status ->requests the current status for all nodes. Nodes that respond show their current
power, IP address and policy configuration. A list of the nodes not responding
is provided with their hostnames and IP addresses.
--status=node_name retrieves the status of that node individually.
--type [status_type] ->specifies what type of status will be requested: hardware,
policy, full (hardware+policy), app_node, app_master, eardbd, eargm or power. [default:hardware]
--power ->requests the current power for the cluster.
--power=node_name retrieves the current power of that node individually.
--set-freq [newfreq] ->sets the frequency of all nodes to the requested one
--set-def-freq [newfreq] [pol_name] ->sets the default frequency for the selected policy
--set-max-freq [newfreq] ->sets the maximum frequency
--set-powercap [new_cap] ->sets the powercap of all nodes to the given value. A node can be specified
after the value to only target said node.
--hosts [hostlist] ->sends the command only to the specified hosts. Only works with --status,
--power and --set-powercap
--restore-conf ->restores the configuration for all nodes
--active-only ->suppresses inactive nodes from the output in hardware status.
--health-check ->checks all EARDs and EARDBDs for errors and prints all that are unresponsive.
--mail [address] ->sends the output of the program to address.
--ping ->pings all nodes to check whether the nodes are up or not. Additionally,
--ping=node_name pings that node individually.
--version ->displays current EAR version.
--help ->displays this message.
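
For instance, some of these options can be combined; a quick sketch (node names and the powercap value are illustrative, and powercap units depend on your site's configuration):

econtrol --status --hosts node2,node3    # hardware status of two specific nodes only
econtrol --set-powercap 400 node4        # temporarily powercap a single node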

econtrol's status is a useful tool to monitor the nodes in a cluster. The most basic usage is the hardware status (the default type), which shows basic information about all the nodes.

[user@login]$ econtrol --status
hostname power temp freq job_id stepid
node2 278 66C 2.59 6878 0
node3 274 57C 2.59 6878 0
node4 52 31C 1.69 0 0
INACTIVE NODES
node1 192.0.0.1

The application status type can be used to retrieve all the jobs currently running in the cluster. app_master gives a summary of all the running applications, while app_node gives detailed information for each node currently running a job.

[user@login]$ econtrol --status --type=app_master
Job-Step Nodes DC power CPI GBS Gflops Time Avg Freq
6878-0 2 280.13 0.37 24.39 137.57 54.00 2.59
[user@login]$ econtrol --status --type=app_node
Node id Job-Step M-Rank DC power CPI GBS Gflops Time Avg Freq
node2 6878-0 0 280.13 0.37 24.39 137.57 56.00 2.59
node3 6878-0 1 245.44 0.37 24.29 136.40 56.00 2.59

Database commands

edb_create

Creates the EAR DB used for accounting and for global energy control. Requires root access to the MySQL server. It reads ear.conf to get the connection details (server IP and port), the DB name (which may or may not have been previously created) and EAR's default users (which will be created, or altered to have the necessary privileges on EAR's database).

Usage: edb_create [options]
-p Specify the password for MySQL's root user.
-o Outputs the commands that would run.
-r Runs the program. If '-o' is set, this option will be overridden.
-h Shows this message.
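
A cautious first run might preview the SQL statements before executing them; a sketch (how the root password is supplied may vary on your system, see -h):

edb_create -o                             # print the statements that would run
edb_create -p <mysql_root_password> -r    # create the DB, tables and EAR users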

edb_clean_pm

Cleans periodic metrics from the database. Used to reduce the size of EAR's database, it removes every Periodic_metrics entry older than num_days:

Usage: edb_clean_pm [options]
-d num_days REQUIRED: Specify how many days will be kept in the database. (default: 0 days).
-p Specify the password for MySQL's root user.
-o Print the query instead of running it (default: off).
-r Execute the query (default: on).
-h Display this message.
-v Show current EAR version.
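
For example, a sketch that keeps one year of Periodic_metrics (the 365-day retention period is illustrative):

edb_clean_pm -d 365 -o    # preview the DELETE query instead of running it
edb_clean_pm -d 365 -r    # execute it, removing entries older than 365 days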

edb_clean_apps

Removes applications from the database. It is intended to remove old applications to speed up queries and free up space; it can also be used to remove specific applications from the database. It removes ALL the information related to those jobs (the following tables will be modified for each job: Loops, if they exist; GPU_signatures, if they exist; Signatures, if they exist; Power_signatures; Applications; and Jobs).

It is recommended to run the application with the -o option first to ensure that the queries that will be executed are correct.

Usage: edb_clean_apps [-j/-d] [options]
-p The program will request the database user's password.
-u user Database user to execute the operation (it needs DELETE privileges). [default: root]
-j jobid.stepid Job id and step id to delete. If no step_id is given, every step within the job will be deleted.
-d ndays Days to preserve. It will delete any jobs older than ndays.
-o Prints out the queries that would be executed. Exclusive with -r. [default: on]
-r Runs the queries that would be executed. Exclusive with -o. [default: off]
-l Deletes Loops and their Signatures. [default: off]
-a Deletes Applications and related tables. [default: off]
-h Displays this message.
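
Following the recommendation above, a sketch of a typical session (the job ID is illustrative):

edb_clean_apps -j 1234 -o    # print the queries that would delete job 1234 (every step)
edb_clean_apps -j 1234 -r    # actually run those queries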

erun

erun is a program that simulates the whole SLURM and EAR SLURM plug-in pipeline. It was designed to provide compatibility with MPI implementations that are not fully compatible with the SLURM SPANK plug-in mechanism (e.g., OpenMPI), which is used to set up EAR at job submission. You can launch erun with the --program option to specify the application name and arguments. See the usage below:

> erun --help
This is the list of ERUN parameters:
Usage: ./erun [OPTIONS]
Options:
--job-id=<arg> Set the JOB_ID.
--nodes=<arg> Sets the number of nodes.
--program=<arg> Sets the program to run.
--clean Removes the internal files.
SLURM options:
...

The syntax to run an MPI application with erun has the form `mpirun -n <X> erun --program='my_app arg1 arg2 ... argN'`. Therefore, mpirun will run *X* erun processes; each erun will then launch the application my_app with the arguments passed, if specified. You can use as many arguments as you want, but the quotes have to enclose all of them (along with the program name) whenever there is more than just the program name.

erun simulates, on the remote node, both the local and remote pipelines for all created processes. It has an internal system to avoid repeating functions that are executed just once per job or node, as SLURM does with its plug-ins.

IMPORTANT NOTE: If you are going to launch n applications with the erun command through an sbatch job, you must set the environment variable SLURM_STEP_ID to values from 0 to n-1 before each mpirun call. This way erun will report the correct step ID to the EARD, which then stores it in the database.
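
A minimal sketch of such an sbatch script (the module name, node/process counts and application names are placeholders):

#!/bin/bash
#SBATCH --nodes=2

module load ear    # or export the EAR_* variables listed below

export SLURM_STEP_ID=0    # first application -> step 0
mpirun -n 2 erun --program='my_app_a input1'

export SLURM_STEP_ID=1    # second application -> step 1
mpirun -n 2 erun --program='my_app_b input2'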

The --job-id and --nodes parameters create the environment variables that SLURM would have created automatically, since your application may make use of them. The --clean option removes the temporary files created to synchronize all ERUN processes.

You also have to load the EAR environment module or define its environment variables in your environment or script:

| Variable | Parameter |
| --- | --- |
| `EAR_INSTALL_PATH=<path>` | `prefix=<path>` |
| `EAR_TMP=<path>` | `localstatedir=<path>` |
| `EAR_ETC=<path>` | `sysconfdir=<path>` |
| `EAR_DEFAULT=<on/off>` | `default=<on/off>` |
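
For instance, if EAR is not packaged as a module on your system, a sketch of the equivalent manual exports (all paths are placeholders for your actual installation):

export EAR_INSTALL_PATH=/opt/ear    # prefix
export EAR_TMP=/var/ear             # localstatedir
export EAR_ETC=/opt/ear/etc         # sysconfdir
export EAR_DEFAULT=on               # default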