Tables
Application information
The following tables contain information directly related to applications executed on the system while EAR was monitoring. The main key is the JOBID.STEPID combination generated by the scheduler.
- Jobs: job information (app_id, user_id, job_id, step_id, etc). One record per JOBID.STEPID is created in the DB.
- Applications: this table's records serve as a link between Jobs and Signatures, providing an application signature (from EARL) for each node of a job. One record per JOBID.STEPID.NODENAME is created in the DB.
- Loops: similar to Applications, but stores a Signature for each application loop detected by EARL, instead of one per each application. This table provides internal details of running applications and could significantly increase the DB size.
- Signatures: EARL computed signature and metrics. One record per JOBID.STEPID.NODENAME is created in the DB when the application is executed with EARL.
- GPU_signatures: EARL computed GPU signatures. This information belongs to a loop or application signature. If the signature is from a node with 4 GPUs there will be 4 records.
- Power_signatures: Basic time and power metrics that can be obtained without EARL. Reported for all applications. One record per JOBID.STEPID.NODENAME is created in the DB.
System monitoring
This tables contain periodic information gathered from the nodes. There is a single-node information table and an aggregated one to increase the speed of queries to get cluster-wide information.
- Periodic_metrics: node metrics reported every N seconds (N is defined in
ear.conf
).
- Periodic_aggregations: sum of all Periodic_metrics in a time period to ease accounting in
ereport
command and EARGM, as well as reducing database size (Periodic_metrics of older periods where precision at node level is not needed can be deleted and the aggregations can be used instead).
Events
- Events: EAR events report. There are several types of events, depending on their source: EARL, EARD-powercap, EARD-runtime and EARGM. For more information, see the table's fields and its header file (src/common/types/event_type.h). For EARL-specific events, also see this.
EARGM reports
- Global_energy: contains reports of cluster-wide energy accounting set by EARGM using the parameters in
ear.conf
. One record every T1 period (defined at ear.conf) is reported.
Learning phase
This tables are the same as their non-learning counterparts, but are specifically used to store the applications executed during a learning phase.
- Learning_applications: same as Applications, restricted to learning phase applications.
- Learning_jobs: same as Jobs, restricted to learning phase jobs.
- Learning_signatures: same as Signatures, restricted to learning phase job metrics.
NOTE In order to have GPU_signatures table created and Periodic_metrics containing GPU data, the databasease must be created (if you follow the edb_create
approach, see the section down below) with GPUs enabled at the compilation time. See how to update from previous versions if you are updating EAR from a release not having GPU metrics.
Creation and maintenance
To create the database a command (edb_create
) is provided by EAR, which can either create the database directly or provide the queries for the database creation so the administrator can use them or modify them at their discretion (any changes may alter the correct function of EAR's accounting).
Since a lot of data is reported by EAR to the database, EAR provides two commands to remove old data and free up space. These are intended to be used with a cron
job or a similar tool, but they can also be run manually without any issues. The two tools are edb_clean_pm
to remove periodic data accounting from nodes, and edb_clean_apps
to remove all the data related to old jobs.
For more information on this commands, check the commands' page on the wiki.
Database creation and ear.conf
When running edb_create
some tables might not be created, or may have some quirks, depending on some ear.conf
settings. The settings and alterations are as follows:
DBReportNodeDetail
: if set to 1, edb_create
will create two additional columns in the Periodic_metrics table for Temperature (in Celsius) and Frequency (in Hz) accounting.
DBReportSigDetail
: if set to 1, Signatures will have additional fields for cycles, instructions, and FLOPS1-8 counters (number of instruction by type).
DBMaxConnections
: this will restrict the number of maximum simultaneous commands connections.
If any of the settings is set to 0, the table will have fewer details but the table's records will be smaller in stored size.
Any table with missing columns can be later altered by the admin to include said columns. For a full detail of each table's columns, run edb_create -o
with the desired ear.conf
settings.
Information reported and ear.conf
There are various settings in ear.conf
that restrict data reported to the database and some errors might occur if the database configuration is different from EARDB's.
DBReportNodeDetail
: if set to 1, node managers will report temperature, average frequency, DRAM and PCK energy to the database manager, which will try to insert it to Periodic_metrics. If Periodic_metrics does not have the columns for both metrics, an error will occur and nothing will be inserted. To solve the error, set ReportNodeDetail
to 0 or manually update Periodic_metrics in order to have the necessary columns.
DBReportSigDetail
: similarly to ReportNodeDetail
, an error will occur if the configuration differs from the one used when creating the database.
DBReportLoops
: if set to 1, EARL detected application loops will be reported to the database, each with its corresponding Signature. Set to 0 to disable this feature. Regardless of the setting, no error should occur.
If Signatures and/or Periodic_metrics have additional columns but their respective settings are set to 0, a NULL will be set in those additional columns, which will make those rows smaller in size (but bigger than if the columns did not exist).
Additionally, if EAR was compiled in a system with GPUs (or with the GPU flag manually enabled), another table to store GPU data will be created.
NOTE the nomenclature is modified from MySQL's type. Any type starting with u
is unsigned. bigint
corresponds to an integer of 64 bits, int
is 32 and smallint
is 16.
For a detailed description of each field in any of the database's tables, see [here](EAR-database-table-descriptions).
Updating from previous versions
From EAR 4.2 to 4.3
To add support for workflows, a new field was added to several tables to allow their accounting:
ALTER TABLE Jobs ADD COLUMN local_id INT UNSIGNED NOT NULL AFTER step_id;
ALTER TABLE Jobs DROP PRIMARY KEY, ADD PRIMARY KEY (job_id, step_id, local_id);
ALTER TABLE Applications ADD COLUMN local_id INT UNSIGNED NOT NULL AFTER step_id;
ALTER TABLE Applications DROP PRIMARY KEY, ADD PRIMARY KEY (job_id, step_id, local_id);
ALTER TABLE Loops ADD COLUMN local_id INT UNSIGNED NOT NULL AFTER step_id;
Three new fields corresponding to L1, L2 and L3 cache misses have been added to the signatures.
NOTE This change only applies to the databases that have been created with the extended application signature (i.e. they have the FLOPS, instructions and cycles counters in their signatures).
ALTER TABLE Signatures
ADD COLUMN L1_misses BIGINT UNSIGNED AFTER perc_MPI,
ADD COLUMN L2_misses BIGINT UNSIGNED AFTER L1_misses,
ADD COLUMN L3_misses BIGINT UNSIGNED AFTER L2_misses;
ALTER TABLE Learning_signatures
ADD COLUMN L1_misses BIGINT UNSIGNED AFTER perc_MPI,
ADD COLUMN L2_misses BIGINT UNSIGNED AFTER L1_misses,
ADD COLUMN L3_misses BIGINT UNSIGNED AFTER L2_misses;
From EAR 4.1 to 4.2
A field in the Events table had its name changed to be more generic. One can do that with EITHER of the following commands:
ALTER TABLE Events RENAME COLUMN freq TO value;
ALTER TABLE Events CHANGE freq value INT unsigned;
Furthermore, some errors on big servers have been found due to the ids of a few fields being too small. To correct this, please run the following commands:
ALTER TABLE Learning_signatures MODIFY COLUMN id BIGINT unsigned AUTO_INCREMENT;
ALTER TABLE Signatures MODIFY COLUMN id BIGINT unsigned AUTO_INCREMENT;
ALTER TABLE Applications MODIFY COLUMN signature_id BIGINT unsigned;
ALTER TABLE Loops MODIFY COLUMN signature_id BIGINT unsigned;
If GPUs are being used, also run:
ALTER TABLE GPU_signatures MODIFY COLUMN id BIGINT unsigned AUTO_INCREMENT;
ALTER TABLE Learning_signatures MODIFY COLUMN min_gpu_sig_id BIGINT unsigned;
ALTER TABLE Learning_signatures MODIFY COLUMN max_gpu_sig_id BIGINT unsigned;
ALTER TABLE Signatures MODIFY COLUMN min_gpu_sig_id BIGINT unsigned;
ALTER TABLE Signatures MODIFY COLUMN max_gpu_sig_id BIGINT unsigned;
From EAR 3.4 to 4.0
Several fields have to be added in this update. To do so, run the following commands to the database's CLI client:
ALTER TABLE Signatures ADD COLUMN avg_imc_f INT unsigned AFTER avg_f;
ALTER TABLE Signatures ADD COLUMN perc_MPI FLOAT AFTER time;
ALTER TABLE Signatures ADD COLUMN IO_MBS FLOAT AFTER GBS;
ALTER TABLE Learning_signatures ADD COLUMN avg_imc_f INT unsigned AFTER avg_f;
ALTER TABLE Learning_signatures ADD COLUMN perc_MPI FLOAT AFTER time;
ALTER TABLE Learning_signatures ADD COLUMN IO_MBS FLOAT AFTER GBS;
From EAR 3.3 to 3.4
If no GPUs were used and they will not be used there are no changes necessary.
If GPUs were being used, type the following commands to the database's CLI client:
ALTER TABLE Signatures ADD COLUMN min_GPU_sig_id BIGINT unsigned, ADD COLUMN max_GPU_sig_id BIGINT unsigned;
ALTER TABLE Learning_signatures ADD COLUMN min_GPU_sig_id BIGINT unsigned, ADD COLUMN max_GPU_sig_id BIGINT unsigned;
CREATE TABLE IF NOT EXISTS GPU_signatures ( id BIGINT unsigned NOT NULL AUTO_INCREMENT, GPU_power FLOAT NOT NULL, GPU_freq INT unsigned NOT NULL, GPU_mem_freq INT unsigned NOT NULL, GPU_util INT unsigned NOT NULL, GPU_mem_util INT unsigned NOT NULL, PRIMARY KEY (id));
If no GPUs were being used but now are present, use the previous query plus the following one:
ALTER TABLE Periodic_metrics ADD COLUMN GPU_energy INT;
Database tables description
EAR's database contains several tables, as described here. Each table contains different information, as described here:
Jobs
- id: Job id given by the scheduler (for example SLURM_JOBID).
- step_id: step id given by the scheduler.
- user_id: the linux username that executed the job.
- app_id: the application/job name as given by the scheduler (not necessarily the executable’s name)
- start_time: timestamp of the job’s[.step] start
- end_time: timestamp of the job’s[.step] end
- start_mpi_time: timestamp of the beginning of application region managed by the EARL. Named MPI for historical reasons. For MPI applications timestamp of the MPI_Init execution.
- end_mpi_time: timestamp of the end of application region managed by the EARL. Named MPI for historical reasons. For MPI applications timestamp of the MPI_Finalize execution.
- policy: EAR policy name in action for the job. Can be “No Policy” if the job runs without EAR.
- threshold: threshold used by the policy to configure it’s behavior. For example, the maximum performance penalty in min_energy.
- job_type:
- def_f: default CPU frequency requested by the user/job manager.
- user_acc: the account the user_id belongs to.
- user_group: the linux group name the user_id belongs to.
- e_tag: energy tag. The user can specify an energy tag to apply pre-defined CPU frequency settings.
Applications
- job_id: job id given by the scheduler. Used as a foreign key for Jobs.
- step_id: step id given by the scheduler. Used as a foreign key for Jobs.
- node_id: the nodename in which the application ran. The names of the nodes are trimmed at any “.”, i.e., node1.at.cluster becomes node1.
- signature_id: the id (index) of the computed signature for the job on this node. If the job runs without EAR library the field will be NULL.
- power_signature_id: the id (index) of the power signature for the job on this node.
Signatures
All the metrics in this table refer to the period of time where the Signature is computed. Typically is 10 sec. Signatures are only reported when the application uses the EAR library.
- id: unique id generated by the database engine to be used in JOIN queries.
- DC_power: average DC node power (in Watts)
- DRAM_power: average DRAM power, including the 2 sockets (in Watts)
- PCK_power: Average CPU power, including the 2 sockets (in Watts)
- EDP: Energy Delay Product computed as (time x time x DC_power)
- GBS: Main memory bandwidth (GB/sec)
- TPI: Main memory transactions per instruction
- CPI: Cycles per instructions.
- Gflops: Giga Floating point operations, per second, generated by the application processes in the node. GFlops/sec.
- time: total execution time (in seconds)
- perc_MPI: average percentage of MPI time vs computational time in the node. Includes all the application processes in the node.
- L1_misses: L1 cache misses counter.
- L2_misses: L2 cache misses counter.
- L3_misses: L3 cache misses counter.
- FLOPS1: Floating point operations Single precision 64 bits consumed by application processes in the node.
- FLOPS2: Floating point operations Single precision 128 bits consumed by application processes in the node.
- FLOPS3 Floating point operations Single precision 256 bits consumed by application processes in the node.
- FLOPS4: Floating point operations Single precision 512 bits consumed by application processes in the node.
- FLOPS5: Floating point operations Double precision 64 bits consumed by application processes in the node.
- FLOPS6: Floating point operations Double precision 128 bits consumed by application processes in the node.
- FLOPS7: Floating point operations Double precision 256 bits consumed by application processes in the node.
- FLOPS8: Floating point operations Double precision 512 bits consumed by application processes in the node.
- instructions: total instructions executed by the application processes in the node
- cycles: total cycles consumed by the application processes in the node
- avg_f: average CPU frequency (includes all the cores used by the application on the node) in KHz
- avg_imc_f: average memory frequency (includes the two sockets) in KHz
- def_f: default CPU frequency used at the beginning of the application in KHz
- min_GPU_sig_id: start of the range containing the GPU_signature’s ids, used for JOIN queries. If an application doesn’t have GPUs it will be NULL
- max_GPU_sig_id: end of the range containing the GPU_signature’s ids, used for JOIN queries. If an application doesn’t have GPUs it will be NULL
1. Each signature corresponds to either a Loop or an Application. When it’s an application it is the average values for its entire runtime. For a loop, the values are the average of only the period comprised by the loop’s start and end.
- Signatures are only reported when an application is running with EARL.
- The GPU signature values are inclusive, i.e. if a signature has a min_id = 1 and max_id = 3, the GPU_signatures with ids 1,2,3 will be from this application.
Power_signatures
Power signatures are measured and reported by the EARD and reported for all the jobs/steps/nodes. It’s independent of the EAR library utilization.
- id: unique id generated by the database engine to be used in JOIN queries.
- DC_power: average DC node power (in Watts)
- DRAM_power: average DRAM power, including the 2 sockets (in Watts)
- PCK_power: Average CPU power, including the 2 sockets (in Watts)
- EDP: Energy Delay Product computed as (time x time x DC_power)
- max_DC_power: maximum DC node power registered by the EAR daemon during the application’s execution (in Watts)
- min_DC_power: minimum DC node power registered by the EAR daemon during the application’s execution (in Watts)
- time: total execution time (in seconds)
- avg_f: average CPU frequency (includes all the cores of the node) in KHz
- def_f: default CPU frequency used at the beginning of the application in KHz
GPU_signatures
- id: unique id generated by the database engine to be used in JOIN queries.
- GPU_power: average GPU power for a single GPU (in Watts)
- GPU_freq: average GPU frequency for a single GPU (in KHz)
- GPU_mem_freq: average GPU memory frequency for a single GPU (in KHz)
- GPU_util: average GPU utilisation for the reported period for a single GPU. (percentage)
- GPU_mem_util: average GPU memory utilisation for the reported period for a single GPU.(percentage)
If an application has more than 1 GPU there will be a signature for each of them.
Loops
Loops are only reported when the EAR library is used.
- event: loop type identificatory. It’s for internal use of the EAR library. Together with size and level is used internally.
- size: loop’s size as computed by DynAIS.
- level: loop’s level of depth (indicative of loops inside of loops)
- job_id: job id given by the job manager. Used as a foreign key for Jobs.
- step_id: step id given by the job manager. Used as a foreign key for Jobs.
- node_id: the nodema,e in which the application ran. The names of the nodes are trimmed at any “.”, i.e., node1.at.cluster becomes node1.
- total_iterations: timestamp at which the loop signature has been reported. It is named total_iterations for historical reasons.
- signature_id: the id of the computed signature for the job on this node.
1. the combination even-size-level forms the Primary Key for the table loops.
- Loops will always have a signature because they are only reported when EAR is used
- When a loop is inserted, the corresponding Job is probably not in the database yet, because Jobs are inserted only when an application finishes. JOIN queries with Jobs can only be done once an application has finished (only the current step id needs to finish, not the entire job).
Events
- id: unique id generated by the database engine to use as primary key.
- timestamp: registered timestamp of when the event happened (NOT when it was inserted)
- event_type: a numerical id for the type of EAR event
- job_id: job id given by the job manager. Used as a foreign key for Jobs.
- step_id: step id given by the job manager. Used as a foreign key for Jobs.
- value: value for the event. The units and semantic depend on the type of event. node_id: the node in which the application ran. The names of the nodes are trimmed at any “.”, i.e., node1.at.cluster becomes node1.
The origins of an event are indicated by its cardinality:
- EARL events’ type is always < 100
- EARD init events’ type is always >=100 <=200
- EARD runtime events’ type is always >=300 and <=400
- EARD powercap events’ type is always >=500 and <=600
- EARGM events’ type is always >=600 and <=700
Certain events do not require a value, so it is set to 0 by default on those cases.
Global_energy
This table is used by the EARGM.
- energy_percent: percentage of consumed energy from the current budget.
- warning_level: current level of closeness to the current energy budget. Higher level means closer to the current budget.
- time: timestamp of the energy event
- inc_th: threshold increment sent to the EARDs to be applied to policies
- p_state: p_state variation sent to the EARDs
- GlobEnergyConsumedT1: current energy consumed within the last period T1
- GlobEnergyConsumedT2: current energy consumed within the last period T2
- GlobEnergyLimit: current energy budget/limit
- GlobEnergyPeriodT1: duration of the current period T1
- GlobEnergyPeriodT2: duration of the current period T2
- GlobEnergyPolicy: current energy policy used by the EARGM
The warning level also indicates which inc_th and p_states are being sent to the EARDs
Periodic_metrics
- id: unique id generated by the database engine to use as primary key.
- start_time: timestamp of the start of the period
- end_time: timestamp of the end of the period
- DC_energy: total energy consumed by the node during the period in Joules
- node_id: the nodename in which the application period was registered. The names of the nodes are trimmed at any “.”, i.e., node1.at.cluster becomes node1.
- job_id: job id given by the scheduler. Used as a foreign key for Jobs. If no job is running in the node during the period it will be 0.
- step_id: step id given by the scheduler. Used as a foreign key for Jobs. If no job is running in the node during the period it will be 0.
- avg_f: average CPU frequency (includes all the cores of the node) in Khz during the period.
- temp: average temperature reported by the node during the period.
- DRAM_energy: total energy consumed by the DRAM (includes 2 sockets) during the period, in Joules
- PCK_energy: total energy consumed by the CPU (includes 2 sockets) during the period, in Joules
- GPU_energy: total energy consumed by the GPU (includes all GPUs) during the period, in Joules
Periodic_aggregations
- id: unique id generated by the database engine to use as primary key
- start_time: timestamp of the start of the period
- end_time: timestamp of the end of the period
- DC_energy: accumulated energy consumed by the period
- eardbd_host: hostname of the eardbd reporting the data to database. The hostnames of the nodes are trimmed at any “.”, i.e., service1.at.cluster becomes service1.