This is a new report plugin that writes the data collected by EAR into files. A single file is generated per metric, per jobID and stepID, per node, per island, per cluster. Only the most recently collected metrics are stored in the files: every time the report runs, it saves the current values by overwriting the previous data.
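The overwrite behaviour described above can be pictured with a minimal Python sketch. This is not the plugin's code; `write_metric`, `read_metric`, and the temporary directory are hypothetical names used only for illustration.

```python
import os
import tempfile

def write_metric(path, value):
    """Store only the latest value: the file is truncated on every write."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:  # mode "w" truncates, so the previous data is discarded
        f.write(f"{value}\n")

def read_metric(path):
    """Return the single value currently stored in a metric file."""
    with open(path) as f:
        return f.read().strip()

# Throwaway directory standing in for the root directory (assumption).
demo_root = tempfile.mkdtemp()
demo_file = os.path.join(demo_root, "dc_power_watt")

write_metric(demo_file, 310)  # first report run
write_metric(demo_file, 295)  # second run overwrites the first value
latest = read_metric(demo_file)
```

After the two writes, the file holds only the value from the second run; no history is kept.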
Namespace Format
The metric files are created following the schema below:
/root_directory/cluster/island/nodename/avg/metricFile
/root_directory/cluster/island/nodename/current/metricFile
/root_directory/cluster/island/jobs/jobID/stepID/nodename/avg/metricFile
/root_directory/cluster/island/jobs/jobID/stepID/nodename/current/metricFile
The root_directory is the default path where all the created metric files are generated.
The cluster, island and nodename will be replaced by the cluster name, island number, and node information, respectively.
metricFile will be replaced by the name of the metric collected by EAR.
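As an illustration of the schema, the sketch below assembles the four path layouts from their components. The function `metric_path` and the sample values (`mycluster`, island `0`, `node01`, job `1234`, step `0`) are invented for this example and are not part of EAR.

```python
import posixpath

def metric_path(root, cluster, island, nodename, kind, metric_file,
                job_id=None, step_id=None):
    """Build a metric file path following the schema above.

    kind is "avg" or "current". When job_id and step_id are both given,
    the job-level layout (.../jobs/jobID/stepID/nodename/...) is used.
    """
    if kind not in ("avg", "current"):
        raise ValueError("kind must be 'avg' or 'current'")
    if (job_id is None) != (step_id is None):
        raise ValueError("job_id and step_id must be given together")
    parts = [root, str(cluster), str(island)]
    if job_id is not None:
        parts += ["jobs", str(job_id), str(step_id)]
    parts += [nodename, kind, metric_file]
    return posixpath.join(*parts)

# Node-level file (hypothetical cluster, island and node names).
node_path = metric_path("/root_directory", "mycluster", 0, "node01",
                        "current", "dc_power_watt")

# Job-level file for a hypothetical job 1234, step 0.
job_path = metric_path("/root_directory", "mycluster", 0, "node01",
                       "current", "app_sig_mem_gbs", job_id=1234, step_id=0)
```

Here `node_path` follows the node-level layout and `job_path` the job-level one, with the `jobs/jobID/stepID` components placed before the node name.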
Metric File Naming Format
The naming format of the metric files follows the standard sysfs interface format. The most commonly used file naming schema is:
<type>_<component>_<metric-name>_<unit>
A number is appended to the metric file name when the component has more than one instance, as with FLOPS counters or GPU data.
Examples of some generated metric files:
- dc_power_watt
- app_sig_pck_power_watt
- app_sig_mem_gbs
- app_sig_flops_6
- avg_imc_freq_KHz
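The naming schema can be sketched as a small helper that joins the available fields with underscores. `metric_file_name` is a hypothetical function, and the way the example names are split into type, component, metric name, and unit below is an assumption made for illustration, not something stated by EAR.

```python
def metric_file_name(type_, component=None, metric_name=None, unit=None,
                     instance=None):
    """Join the non-empty fields as <type>_<component>_<metric-name>_<unit>.

    If instance is given, its number is appended as a trailing field,
    mirroring the numbering used for components with several instances.
    """
    fields = [f for f in (type_, component, metric_name, unit) if f]
    if instance is not None:
        fields.append(str(instance))
    return "_".join(fields)

# Assumed field split: type "avg", component "imc", metric "freq", unit "KHz".
imc_name = metric_file_name("avg", "imc", "freq", "KHz")

# Assumed field split for a numbered FLOPS counter (no component or unit).
flops_name = metric_file_name("app_sig", metric_name="flops", instance=6)
```

Not every generated name carries all four fields, which is why the helper skips the empty ones.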
Metrics reported
The following are the reported values for each type of metric recorded by EAR:
- report_periodic_metrics
  - Average values
    - The frequency and temperature values are calculated by summing the values of all periods since the report was loaded up to the current period and dividing the sum by the total number of periods.
    - The energy value is the accumulated value of all periods since the report was loaded up to the current one.
    - The path to those metric files is built as: /root_directory/cluster/island/nodename/avg/metricFile
  - Current values
    - Represent the current collected EAR metric per period.
    - The path to those metric files is built as: /root_directory/cluster/island/nodename/current/metricFile
- report_loops
  - Current values
    - Represent the current collected EAR metric per loop.
    - The path to those metric files is built as: /root_directory/cluster/island/jobs/jobID/stepID/nodename/current/metricFile
- report_applications
  - Current values
    - Represent the current collected EAR metric per application.
    - The path to those metric files is built as: /root_directory/cluster/island/jobs/jobID/stepID/nodename/avg/metricFile
- report_events
  - Current values
    - Represent the current collected EAR metric per event.
    - The path to those metric files is built as: /root_directory/cluster/island/jobs/jobID/stepID/nodename/current/metricFile
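The average values described for report_periodic_metrics can be sketched as a running aggregate. This is only an illustration of the arithmetic, not the plugin's implementation; the class `PeriodicAggregate`, its field names, and the units in the sample values are assumptions.

```python
class PeriodicAggregate:
    """Running totals kept since the report was loaded (illustrative only)."""

    def __init__(self):
        self.periods = 0
        self.freq_sum = 0.0
        self.temp_sum = 0.0
        self.energy_total = 0.0

    def add_period(self, freq, temp, energy):
        """Fold the values of one reporting period into the totals."""
        self.periods += 1
        self.freq_sum += freq
        self.temp_sum += temp
        self.energy_total += energy  # energy is accumulated, not averaged

    def averages(self):
        """Return the values that would be written to the avg files."""
        if self.periods == 0:
            raise ValueError("no periods recorded yet")
        return {
            "avg_freq": self.freq_sum / self.periods,
            "avg_temp": self.temp_sum / self.periods,
            "energy": self.energy_total,
        }

agg = PeriodicAggregate()
agg.add_period(freq=2000000, temp=50, energy=100)  # period 1 (made-up values)
agg.add_period(freq=2400000, temp=60, energy=150)  # period 2 (made-up values)
result = agg.averages()
```

With these two made-up periods, the frequency and temperature are averaged over the two periods, while the energy is the sum of both.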
```
Note: If the cluster contains GPUs, both report_loops and report_applications will also generate schema files per GPU, which contain all the data collected for each GPU, under the paths below:
    ◦ /root_directory/cluster/island/jobs/jobID/stepID/nodename/current/GPU-ID/metricFile
    ◦ /root_directory/cluster/island/jobs/jobID/stepID/nodename/avg/GPU-ID/metricFile