EAR 4.3
Reference Manual
Admin guide

EAR Components

EAR is composed of five main components:

  • Node Manager (EARD). The Node Manager must have root access to the node where it will be running.
  • Database Manager (EARDBD). The Database Manager requires access to the DB server (MariaDB and PostgreSQL are supported). Documentation for PostgreSQL is still under development.
  • Global Manager (EARGM). The Global Manager needs access to all Node Managers in the cluster as well as to the database.
  • Library (EARL)
  • SLURM plugin

The following image shows the main interactions between components:

[Figure: EAR components diagram]

For more detailed information about EAR components, visit the Architecture page.

Quick Installation Guide

This section provides a condensed, step-by-step installation and execution guide for EAR. For a more in-depth explanation of the necessary steps, see the Installation from source page or the Installing from RPM section, followed by the Configuration guide, or contact us at ear-support@bsc.es.

EAR Requirements

Requirements to compile EAR are:

  • C compiler.
  • MPI compiler.
  • CUDA installation path if NVIDIA is used.
  • Likwid path if Likwid is used.
  • Freeipmi path if freeipmi is used.
  • GSL is needed for coefficient computations.

When EAR is installed from the RPM (binaries only), the package declares none of these dependencies except mysqlclient. However, they are still needed when running EAR.

SLURM must also be present if the SLURM plugin is going to be used. Since the current EAR version only supports automatic execution of applications with the EAR Library through the SLURM plugin, SLURM must be running whenever the EAR Library is used (it is not needed for node monitoring).

Last but not least:

  • The drivers for CPU frequency management (acpi-cpufreq) and Open IPMI must be present and loaded in compute nodes.
  • msr kernel module must be loaded in compute nodes.
  • A MariaDB or PostgreSQL server must be up and running.
  • Hardware counters must be accessible to normal users. Set /proc/sys/kernel/perf_event_paranoid to 2 (or less), for example by typing sudo sh -c "echo 2 > /proc/sys/kernel/perf_event_paranoid" on compute nodes.
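The node-level requirements in the list above can be checked before starting any EAR service. The following is a minimal sketch, not part of EAR: the function names are hypothetical, and it assumes that loaded modules are listed in /proc/modules (where acpi-cpufreq appears as acpi_cpufreq). A driver built into the kernel will not show up there, so a failed module check is a hint rather than proof.

```shell
#!/bin/sh
# Pre-flight sketch for a compute node. Function names and messages are
# illustrative assumptions, not part of EAR.

# Succeeds when the perf_event_paranoid value is 2 or less.
# $1: file to read (defaults to the real kernel setting).
check_perf_paranoid() {
    file="${1:-/proc/sys/kernel/perf_event_paranoid}"
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 2; }
    value=$(cat "$file")
    if [ "$value" -le 2 ]; then
        echo "perf_event_paranoid=$value: OK"
    else
        echo "perf_event_paranoid=$value: must be 2 or less" >&2
        return 1
    fi
}

# Succeeds when a kernel module is listed as loaded.
# $1: module name (e.g. msr, acpi_cpufreq); $2: module list file
#     (defaults to /proc/modules).
check_module_loaded() {
    grep -q "^$1 " "${2:-/proc/modules}"
}
```

On a compute node this could be used as check_perf_paranoid && check_module_loaded msr && check_module_loaded acpi_cpufreq.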

Run ./configure --help to see all the flags and options.

Compiling and installing EAR

Once you have downloaded the code from the repository, execute:

autoreconf -i
./configure --prefix=ear-install-path \
    EAR_TMP=ear-tmp-path \
    EAR_ETC=ear-etc-path \
    CC=c-compiler-path \
    MPICC=mpi-compiler-path \
    CC_FLAGS=c-flags-compiler \
    MPICC_FLAGS=mpi-flags \
    --with-cuda=path-to-cuda \
    MAKE_NAME=make_extension

In addition to generating the Makefile, MAKE_NAME makes configure save a copy of it as Makefile.make_extension. This makes it easier to keep multiple configurations (one for each library version needed). Other relevant options are:

  • The option --disable-mpi must be set to generate a configuration for non-MPI version of the library.
  • Use MPI_VERSION=ompi for OpenMPI compatible version.

Before running make, review the Makefile and the configuration log to validate that all the requirements of your installation have been detected automatically, in particular if you need a specific library such as Likwid, FreeIPMI or CUDA. If a CUDA path is specified, EAR will be compiled with GPU support. Also check that the MySQL or PostgreSQL paths have been detected. You can use the USER and GROUP options if you want to install EAR with a specific user/group.

The following shows how to configure EAR to be compiled with Intel MPI:

autoreconf -i
./configure --prefix=/opt/ear CC=icc MPICC=mpiicc MAKE_NAME=impi
make -f Makefile.impi
make -f Makefile.impi install
make -f Makefile.impi doc.install
make -f Makefile.impi etc.install

At this point the EAR binaries are installed, including one version of the EAR Library for MPI (default), the EAR documentation, the service files for the EAR daemons, and templates for the ear.conf file and the SLURM plugin. The configure tool tries to detect the paths to MySQL and/or PostgreSQL, the scheduler sources, etc. automatically. Detecting the scheduler path is mandatory; SLURM is assumed by default. After running configure, check in the Makefile that all the options have been detected.

After make install, you should have the following folders in ear-install-path: bin, sbin, etc, lib, include and man. The bin directory contains commands and tools, sbin contains the EAR services, lib contains all the libraries and plugins, and etc contains templates and examples for the EAR service files, the ear.conf file, the EAR module, etc.
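The folder check described above is easy to script. This is a minimal sketch, assuming all six folders live directly under the install prefix as stated; the function name check_ear_layout is hypothetical and not an EAR tool.

```shell
#!/bin/sh
# Report which of the expected top-level folders are missing under an
# EAR install prefix. The function name is an illustrative assumption.
# $1: installation prefix (the value passed to --prefix).
check_ear_layout() {
    prefix="$1"
    missing=0
    for dir in bin sbin etc lib include man; do
        if [ ! -d "$prefix/$dir" ]; then
            echo "missing: $prefix/$dir" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For the Intel MPI example above, check_ear_layout /opt/ear returns 0 only when every folder is present.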

Deployment and validation

Monitoring: Compute node and DB

Prepare the configuration

Whether installed from sources or from the RPM, EAR installs two templates for the ear.conf file: $EAR_ETC/ear/ear.conf.template and $EAR_ETC/ear/ear.conf.full.template. The full version includes all fields. Copy only one of them to $EAR_ETC/ear/ear.conf and update it with the desired configuration; see the configuration section for details. The ear.conf file is used by all the services, so it is recommended to keep it in a shared folder to simplify configuration changes.
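As a minimal sketch of the copy step, the helper below creates ear.conf from the short template and refuses to overwrite an existing file. The function name init_ear_conf is hypothetical; the paths follow the ones given above.

```shell
#!/bin/sh
# Create ear.conf from the short template unless one already exists.
# The function name is an illustrative assumption, not an EAR tool.
# $1: the EAR etc path (this guide's $EAR_ETC).
init_ear_conf() {
    conf_dir="$1/ear"
    if [ -e "$conf_dir/ear.conf" ]; then
        echo "$conf_dir/ear.conf already exists, leaving it untouched"
        return 0
    fi
    cp "$conf_dir/ear.conf.template" "$conf_dir/ear.conf"
}
```

Typical use would be init_ear_conf "$EAR_ETC", followed by editing the resulting file by hand.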

EAR module

Install and load the EAR module to enable the EAR commands. It can be found at $EAR_ETC/module. If the module is not in a standard path, add it with module use $EAR_ETC/module and then run module load ear.

EAR Database

Create the EAR database with edb_create, installed at $EAR_INSTALL_PATH/sbin. The edb_create -p command will ask you for the DB root password. If you run into problems here, first check whether the node where you are running the command can connect to the DB server. If problems persist, execute edb_create -o to print the specific SQL queries generated. If you still have trouble, contact ear-support@bsc.es or open an issue.

Energy models

EAR uses a power and performance model based on system signatures. These system signatures are stored in coefficient files.

Before starting EARD, and just for testing, you need to create a dummy coefficient file and copy it to the coefficients path, by default $EAR_ETC/coeffs. Use the coeffs_null application from the tools section.

EAR version 4.1 does not require null coefficients.

EAR services

Create soft links to the EAR service files in the system services folder, or copy them there, so that services can be started and stopped with system commands such as systemctl. EAR service files are generated at $EAR_ETC/systemd and can usually be placed in $(ETC)/systemd.

  • EARD must be started on compute nodes.
  • EARDBD must be started on service nodes (can be any node with DB access).

Enable and start the EARDs and EARDBDs via services (e.g., sudo systemctl start eard, sudo systemctl start eardbd). EARDBD and EARD outputs can be found at $EAR_TMP/eardbd.server.log and $EAR_TMP/eard.log, respectively, when the DBDaemonUseLog and NodeUseLog options are set to 1 in the ear.conf file. Otherwise, their output goes to stderr and can be viewed with the journalctl command (e.g., journalctl -u eard).
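When it is unclear where a daemon is writing, the rule above can be applied mechanically. The sketch below assumes that ear.conf holds one Key=Value pair per line with no trailing text after the value; both function names are hypothetical helpers, not EAR commands.

```shell
#!/bin/sh
# Print the value of a Key=Value option from an ear.conf-style file.
# Assumes one "Key=Value" pair per line; the last match wins.
# $1: configuration file; $2: option name.
ear_conf_get() {
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" | tail -n 1
}

# Print where EARD output is expected: the log file when NodeUseLog is 1,
# otherwise the journalctl command to run.
# $1: configuration file; $2: the EAR tmp path (this guide's $EAR_TMP).
eard_log_target() {
    if [ "$(ear_conf_get "$1" NodeUseLog)" = "1" ]; then
        echo "$2/eard.log"
    else
        echo "journalctl -u eard"
    fi
}
```

For example, eard_log_target "$EAR_ETC/ear/ear.conf" "$EAR_TMP" prints either the log file path or the journalctl command to use.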

By default, a certain level of verbosity is set. It is not recommended to modify it but you can change it by modifying the value of constants in file src/common/output/output_conf.h.

Quick validation

Check that EARDs are up and running correctly with econtrol --status (note that daemons will take around a minute to correctly report energy and not show up as an error in econtrol). EARDs create a per-node text file with values reported to the EARDBD (local to compute nodes). In case there are problems when running econtrol, you can also find this file at $EAR_TMP/nodename.pm_periodic_data.txt.

Check that EARDs are reporting metrics to database with ereport. ereport -n all should report the total energy sent by each daemon since the setup.

Monitoring: EAR plugin

  • Set up EAR's SLURM plugin (see the configuration section for more information).

    It is recommended to create a soft link to the $EAR_ETC/slurm/ear.plugstack.conf file in the /etc/slurm/plugstack.conf.d directory to simplify the EAR plugin management.

    For a first test it is recommended to set default=off in ear.plugstack.conf (to disable the automatic loading of the EAR Library).
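The two recommendations above (linking the plugstack file and switching the default) can be sketched as follows. The function names are hypothetical, and the sketch assumes the file contains a literal default=on or default=off token, as the text implies; the file is edited in place through a temporary copy so that a soft link keeps pointing at the same target.

```shell
#!/bin/sh
# Link the EAR plugstack file into SLURM's plugstack.conf.d directory.
# $1: the EAR etc path ($EAR_ETC); $2: target directory
#     (defaults to /etc/slurm/plugstack.conf.d).
link_ear_plugstack() {
    ln -sf "$1/slurm/ear.plugstack.conf" \
        "${2:-/etc/slurm/plugstack.conf.d}/ear.plugstack.conf"
}

# Rewrite every default=on / default=off token in a plugstack file.
# $1: file to edit; $2: "on" or "off".
set_ear_default() {
    case "$2" in
        on|off) ;;
        *) echo "usage: set_ear_default FILE on|off" >&2; return 2 ;;
    esac
    tmp_file=$(mktemp) || return 1
    sed -e "s/default=on/default=$2/g" -e "s/default=off/default=$2/g" \
        "$1" > "$tmp_file" && cat "$tmp_file" > "$1"
    status=$?
    rm -f "$tmp_file"
    return "$status"
}
```

For a first test this would be link_ear_plugstack "$EAR_ETC" followed by set_ear_default "$EAR_ETC/slurm/ear.plugstack.conf" off, both run with sufficient privileges.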

EAR plugin validation

At this point you should be able to see the EAR options when running, for example, srun --help. The output should include something like the following. The EAR plugin must be enabled on login and compute nodes.

[user@hostname ~]$ srun --help
Usage: srun [OPTIONS(0)... [executable(0) [args(0)...]]] [ : [OPTIONS(N)...]] executable(N) [args(N)...]
Parallel run options:
  ...
Constraint options:
  ...
Consumable resources related options:
  ...
Affinity/Multi-core options: (when the task/affinity plugin is enabled)
  ...
Options provided by plugins:
      --ear=on|off            Enables/disables Energy Aware Runtime Library
      --ear-policy=type       Selects an energy policy for EAR
                              {type=default,gpu_monitoring,monitoring,
                              min_energy,min_time,gpu_min_energy,gpu_min_time}
      --ear-cpufreq=frequency Specifies the start frequency to be used by EAR
                              policy (in KHz)
      --ear-policy-th=value   Specifies the threshold to be used by EAR policy
                              (max 2 decimals) {value=[0..1]}
      --ear-user-db=file      Specifies the file to save the user applications
                              metrics summary 'file.nodename.csv' file will be
                              created per node. If not defined, these files
                              won't be generated.
      --ear-verbose=value     Specifies the level of the
                              verbosity{value=[0..1]}; default is 0
      --ear-learning=value    Enables the learning phase for a given P_STATE
                              {value=[1..n]}
      --ear-tag=tag           Sets an energy tag (max 32 chars)
  ...
Help options:
  -h, --help                  show this help message
      --usage                 display brief usage message
Other options:
  -V, --version               output version information and exit
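Rather than reading the help output by eye, the presence of the EAR options can be tested with a fixed-string search. This is a small sketch with a hypothetical function name; it only looks for the --ear=on|off line shown above and reads the help text from standard input.

```shell
#!/bin/sh
# Succeeds when the text on standard input advertises the EAR plugin
# options. The function name is an illustrative assumption.
has_ear_options() {
    grep -qF -e '--ear=on|off'
}
```

On a login or compute node it could be used as srun --help | has_ear_options && echo "EAR plugin visible".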
  • Submit one application via SLURM and check that it is correctly reported to the database with the eacct command.

    Note that only privileged users can check other users’ applications.

  • Submit one MPI application (matching the library version you have compiled) with --ear=on and check that the output of eacct now includes the Library metrics.
  • Set default=on in ear.plugstack.conf to load the EAR Library by default. If default is set to off, EARL can be loaded explicitly by passing the flag --ear=on at job submission.

At this point, you can use EAR for monitoring and accounting purposes, but you cannot yet use the power policies offered by EARL. To enable them, first perform a learning phase and compute the node coefficients; see the EAR learning phase wiki page. For the coefficients to become active, restart the daemons.

Important: Reloading daemons will NOT make them load coefficients; restarting the service is the only way.

EAR Library versions: MPI vs. Non-MPI

As mentioned in the overview, the EAR Library is loaded next to the user MPI application by the EAR Loader. The Library uses MPI symbols, so it is compiled using the includes provided by your MPI distribution. The library version is selected automatically at runtime, but not during compilation and installation: each compiled library version has its own file name, which has to be defined through the MPI_VERSION variable when running ./configure or by editing the root Makefile.

The following table lists the library name for each distribution:

Distribution    Name                   MPI_VERSION value
Intel MPI       libear.so (default)    not required
MVAPICH         libear.so (default)    not required
OpenMPI         libear.ompi.so         ompi
If different MPI distributions share the same library name, it means their symbols are compatible between them, so compiling and installing the library one time will be enough. However, if you provide different MPI distributions to users, you will have to compile and install the library multiple times.

EAR makefiles include a specific target for each EAR component, supporting full or partial updates:

Command                                            Description
make -f Makefile.make_extension install            Reinstall all the files except etc and doc.
make -f Makefile.make_extension earl.install       Reinstall only the EARL.
make -f Makefile.make_extension eard.install       Reinstall only the EARD.
make -f Makefile.make_extension earplug.install    Reinstall only the EAR SLURM plugin.
make -f Makefile.make_extension eardbd.install     Reinstall only the EARDBD.
make -f Makefile.make_extension eargmd.install     Reinstall only the EARGMD.
make -f Makefile.make_extension reports.install    Reinstall only report plugins.

Before compiling a new library version, install the current one by typing make install. Then either run ./configure again with different MPICC, MPICC_FLAGS and MPI_VERSION values, or open the root Makefile and edit those same variables together with MPI_BASE, which sets the MPI installation root path. Finally, type make full to perform a clean compilation and make earl.install to install only the new version of the library.
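The sequence above can be written down once and reviewed before running it. The sketch below only prints the commands; nothing is executed. It assumes the MPI_VERSION value is also reused as the MAKE_NAME extension, which is a convention of this example rather than an EAR requirement, and the function name is hypothetical. Any other options used in the first configuration (CC, flags, EAR_TMP, EAR_ETC and so on) would have to be repeated by hand.

```shell
#!/bin/sh
# Print the command sequence for building an additional EARL version.
# Nothing is executed here; review the output before running it.
# $1: install prefix; $2: MPI compiler wrapper; $3: MPI_VERSION value,
#     reused as the MAKE_NAME extension (an assumption of this sketch).
earl_rebuild_commands() {
    cat <<EOF
./configure --prefix=$1 MPICC=$2 MPI_VERSION=$3 MAKE_NAME=$3
make -f Makefile.$3 full
make -f Makefile.$3 earl.install
EOF
}
```

For instance, earl_rebuild_commands /opt/ear mpicc ompi prints a configure line with MPI_VERSION=ompi MAKE_NAME=ompi followed by make -f Makefile.ompi full and make -f Makefile.ompi earl.install.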

If your MPI version is not fully compatible, please contact ear-support@bsc.es.

See the User guide to check the use cases supported and how to submit jobs with EAR.

Installing from RPM

EAR includes the specification files to create an RPM from an already existing installation. The spec file is placed at etc/rpms. A valid installation from source is needed to create the RPM. The RPM can be part of the system image. Visit the Requirements page for a quick overview of the requirements.

Execute the rpmbuild.sh script to create the EAR RPM file. Once created, it can be included in the compute node images. This is recommended only when no more changes to the installation are expected. Once you have the RPM file, execute the following steps:

  • Before the installation, make sure the installation path is accessible from all the compute nodes. Do the same for the folder where you want to store the temporary files (it is called $(EAR_TMP) in this guide for simplicity).
  • Default paths are /usr and /etc.
  • Run rpm -ivh --relocate /usr=/new/install/path --relocate /etc=/new/etc/path ear.version.rpm.

    You can also use the --nodeps option if the dependency test fails.

  • During the installation, the *.in configuration files are compiled into their ready-to-use versions, replacing tags with the correct paths. The next section describes those files in more detail.
  • Type rpm -e ear.version to uninstall.

Installation content

The *.in configuration files are compiled into etc/ear/ear.conf.template and etc/ear/ear.conf.full.template, etc/module/ear, etc/slurm/ear.plugstack.conf and various etc/systemd/ear*.service files. You can find more information in the configuration page. The table below describes the complete hierarchy of the EAR installation:

Directory           Content / description
/usr/lib            Libraries and the scheduler plugin.
/usr/lib/plugins    EAR plugins.
/usr/bin            EAR commands.
/usr/bin/tools      EAR tools for coefficients computation.
/usr/sbin           Privileged components: EARD, EARDBD, EARGMD.
/etc/ear            Configuration files templates.
/etc/ear/coeffs     Folder to store coefficient files.
/etc/module         EAR module.
/etc/slurm          EAR SLURM plugin configuration file.
/etc/systemd        EAR service files.

RPM requirements

EAR uses some third-party libraries. The EAR RPM will not ask for them when installing, but they must be available in LD_LIBRARY_PATH when you run an application with EAR. Depending on the RPM, different versions of these libraries may be required:

Library        Minimum version    References
MPI            -                  -
MySQL*         15.1               MySQL or MariaDB
PostgreSQL*    9.2                PostgreSQL
Autoconf       2.69               Website
GSL            1.4                Website

* Only one of them is required.

These libraries are not required, but can be used to get additional functionality or metrics:

Library         Minimum version    References
SLURM           17.02.6            Website
PBS**           2021               PBSPro or OpenPBS
CUDA/NVML       7.5                CUDA
CUPTI**         7.5                CUDA
Likwid          5.2.1              Likwid
FreeIPMI        1.6.8              FreeIPMI
OneAPI/L0**     1.7.9              OneAPI
LibRedFish**    1.3.6              LibRedFish

** These will be available in next release.

Also, some drivers have to be present and loaded in the system when starting EAR:

Driver       File                                      Kernel version    References
CPUFreq      kernel/drivers/cpufreq/acpi-cpufreq.ko    3.10              Information
Open IPMI    kernel/drivers/char/ipmi/*.ko             3.10              Information

Starting Services

The recommended way to run all EAR daemon components (EARD, EARDBD, EARGM) is through unit services.

NOTE: EAR uses a MariaDB/MySQL server. The server must be started before the EAR services are executed.

The unit service files for the EAR Daemon, EAR Global Manager Daemon and EAR Database Daemon are generated and installed in $(EAR_ETC)/systemd. Copy those unit service files to your operating system's systemd folder and then use the systemctl command to run the daemons. Check the EARD, EARDBD and EARGMD pages for the precise execution commands.

When using systemctl commands, you can check the messages reported to stderr with journalctl, for instance journalctl -u eard -f. Note that if NodeUseLog is set to 1 in ear.conf, the messages are not printed to stderr but to $EAR_TMP/eard.log instead. The DBDaemonUseLog and GlobalmanagerUseLog options in ear.conf specify the output for EARDBD and EARGM, respectively.

Additionally, services can be started, stopped or reloaded in parallel using parallel commands such as pdsh. As an example: sudo pdsh -w nodelist systemctl start eard.

Updating EAR with a new installation

In some cases, it might be a good idea to create a new install instead of updating your current one, for example when trying new configurations or when a big update is released.

The steps to do so are:

  • Install EAR in the new folder
  • Replicate the old etc folder (including ear.conf and coefficients) in the new one and update ear.conf with the new ETC path and whatever other changes may be needed.
  • Update the EAR services in the /etc/systemd/system folder (or equivalent, depending on your OS). Service files include the ETC path and the absolute path to the binaries.
  • Update /etc/slurm/plugstack.conf with the new paths.
  • Create a new EAR module with the updated paths.

Once all that is done, you should have two complete EAR installs that can be switched by changing the binaries executed by the services and changing the path in plugstack.conf.
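Since service files and the plugstack file embed absolute paths, a copy with the prefix swapped is a convenient starting point for the second install. The following is a minimal sketch with a hypothetical function name. It assumes that neither prefix contains the "|" or "&" characters, and it treats the old prefix as a sed pattern (so a "." matches any character); review the result before installing it.

```shell
#!/bin/sh
# Write a copy of a file with every occurrence of one install prefix
# replaced by another. The function name is an illustrative assumption.
# $1: source file; $2: old prefix; $3: new prefix; $4: output file.
retarget_paths() {
    sed "s|$2|$3|g" "$1" > "$4"
}
```

For example, retarget_paths eard.service /opt/ear-old /opt/ear-new eard.service.new (hypothetical paths) produces a candidate service file for the new installation without touching the original.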

Next steps

For a better overview of the installation process, return to the installation guide. To continue the installation, visit the configuration page to properly set up the EAR configuration file and the EAR SLURM plugin stack file.