EAR 4.3
Reference Manual
EAR Powercap

EAR provides powercap at different levels:

  • Node powercap, where a node cannot exceed its given power limit.
  • Cluster powercap, where the target power is for the entire cluster. It uses the node powercap to achieve its target.

Node powercap

Node powercap is enforced by the EARD. The initial values for each node's powercap are set in the tags section of ear.conf (see Tags for more information). They include the power limit, the CPU/PKG powercap plugin and, if needed, the GPU powercap plugin. The power limit can be changed at runtime via econtrol or by an active EARGM that has the node under its control.

The EARD enforces the powercap through its plugins, each of which ensures that the domain it controls (CPU or GPU) does not exceed its power allocation.

The main goal of node powercap is, first and foremost, to enforce the power limit; the secondary goal is to maximize performance while staying under that limit. The EARD uses its current power limit as a budget, which it distributes among the domains (controlled by the plugins) according to the node's current needs.
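The manual does not describe how the EARD divides its budget among domains. The sketch below shows one plausible policy, assuming each domain reports a requested power and a minimum it should not go below: every domain first gets its minimum, and the rest of the budget is shared in proportion to unmet demand, never handing out more than is requested. `Domain`, `requested_w`, `min_w` and `split_node_budget` are hypothetical names, not part of EAR or its plugin interface.

```python
from dataclasses import dataclass


@dataclass
class Domain:
    # Hypothetical per-domain view; EAR's plugins expose their own interfaces.
    name: str
    requested_w: float  # power the domain would like to use
    min_w: float        # floor the domain should not go below


def split_node_budget(budget_w: float, domains: list[Domain]) -> dict[str, float]:
    """Distribute a node power budget among domains (illustrative policy only)."""
    floor_total = sum(d.min_w for d in domains)
    if floor_total > budget_w:
        raise ValueError("budget is below the sum of domain minimums")
    # Start every domain at its minimum.
    alloc = {d.name: d.min_w for d in domains}
    remaining = budget_w - floor_total
    # Demand above the minimum that is still unmet.
    extra = {d.name: max(d.requested_w - d.min_w, 0.0) for d in domains}
    extra_total = sum(extra.values())
    if extra_total == 0:
        return alloc
    # Hand out at most what is requested, so the total never exceeds the budget.
    share = min(remaining, extra_total)
    for name, want in extra.items():
        alloc[name] += share * want / extra_total
    return alloc
```

For example, a 250W node budget with a CPU asking for 180W (minimum 50W) and a GPU asking for 150W (minimum 30W) yields roughly 138.4W for the CPU and 111.6W for the GPU, which together use the full 250W.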

Node powercap can be applied without cluster powercap by defining only the node powercap in the EAR configuration file.

Cluster powercap

Cluster powercap is managed by one or more EARGMs and enforced at node level by the EARD. Each EARGM has its own power limit and monitoring frequency, set in its definition (see EARGM for more details). Each EARGM asks the nodes under its control (as indicated in the nodes' definition) for their power consumption and distributes the budget accordingly. There are two main ways in which the cluster powercap can be enforced: soft and hard cluster powercap.

Soft cluster powercap

This type of powercap is targeted at systems where exceeding the power limit is not a hardware constraint but a rule that needs to be enforced for other reasons. In this scenario, the compute nodes run as if no limit were applied until the total power consumption of the cluster reaches a percentage threshold (defined as the suspend threshold in ear.conf). At that point the EARGM sends a power limit to all the nodes to prevent the global power from going above the actual limit. Additionally, a script can be attached to the activation of the powercap, in which the admin can perform whichever actions they consider appropriate. Once the cluster power drops below another percentage threshold (defined as the resume threshold in ear.conf), the EARGM sends a message to all the nodes to go back to unlimited power usage and calls the deactivation script set by the admin (if one is specified).
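The suspend/resume behaviour is a hysteresis loop: the cap switches on at one threshold and off at a lower one, which keeps small fluctuations around a single value from toggling it every period. The sketch below models one EARGM check per monitoring period. It assumes both thresholds are percentages of the EARGM power limit; `SoftPowercap`, `suspend_pct` and `resume_pct` are hypothetical names, and the admin's activation and deactivation scripts are reduced to optional callbacks.

```python
from typing import Callable, Optional


class SoftPowercap:
    """Illustrative model of soft cluster powercap (not EAR's implementation)."""

    def __init__(self, limit_w: float, suspend_pct: float, resume_pct: float,
                 on_activate: Optional[Callable[[], None]] = None,
                 on_deactivate: Optional[Callable[[], None]] = None):
        if resume_pct >= suspend_pct:
            raise ValueError("resume threshold must be below suspend threshold")
        # Assumption: thresholds are percentages of the EARGM power limit.
        self.suspend_w = limit_w * suspend_pct / 100.0
        self.resume_w = limit_w * resume_pct / 100.0
        self.on_activate = on_activate
        self.on_deactivate = on_deactivate
        self.active = False  # nodes start with unlimited power

    def check(self, cluster_power_w: float) -> bool:
        """Called once per monitoring period; returns whether the cap is active."""
        if not self.active and cluster_power_w >= self.suspend_w:
            self.active = True   # the EARGM would now send limits to all nodes
            if self.on_activate:
                self.on_activate()
        elif self.active and cluster_power_w < self.resume_w:
            self.active = False  # the EARGM would now restore unlimited power
            if self.on_deactivate:
                self.on_deactivate()
        return self.active
```

With a 1000W limit, a 90% suspend threshold and an 80% resume threshold, readings of 700, 910, 850, 790 and 850W give the states False, True, True, False, False: the cap stays on at 850W because that is still above the resume threshold.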

In terms of configuration, EARGMPowerCapMode must be set to 2 (soft powercap) and all nodes need to have a max_powercap set in their tag. The value of max_powercap is the power allocation for the nodes that have that tag. If a node has a max_powercap value of 1, 0 or -1, it ignores powercap messages from an EARGM in soft cluster powercap mode.
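A minimal sketch of how max_powercap values translate into per-node limits once soft powercap triggers. `soft_cap_allocation` and the node names are hypothetical; the set of ignored values {1, 0, -1} is taken from the paragraph above.

```python
# Values that make a node ignore powercap messages in soft cluster powercap mode.
IGNORED_MAX_POWERCAP = {1, 0, -1}


def soft_cap_allocation(max_powercaps: dict[str, int]) -> dict[str, int]:
    """Return the limit (W) each node applies when soft powercap triggers.

    Nodes whose max_powercap is 1, 0 or -1 are left out: they ignore the message.
    """
    return {
        node: cap
        for node, cap in max_powercaps.items()
        if cap not in IGNORED_MAX_POWERCAP
    }
```

Summing the returned values gives the power the capped nodes may draw once the cap is active; nodes left out keep running unlimited, which is why max_powercap must be set on every node for the cluster limit to hold.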

Hard cluster powercap

Hard powercap is used when the system must not, under any circumstance, go above the power limit. For this reason, the compute nodes always have a powercap set. The job of the EARGM is to periodically monitor the state of the nodes, which request more or less power depending on their current workload, and to redistribute the power according to the needs of all nodes.
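The manual does not specify the reallocation algorithm the EARGM uses in hard mode. As a stand-in, the sketch below uses max-min fair sharing (water-filling), a common technique for dividing a fixed budget: no node receives more than it requests, and power left over by modest requests is split equally among the nodes that still want more. This is not EAR's algorithm; `max_min_fair` and the per-node requests are assumptions for illustration.

```python
def max_min_fair(budget_w: float, requests_w: dict[str, float]) -> dict[str, float]:
    """Water-filling: no node gets more than it asks for, and spare power is
    split equally among nodes whose requests are not yet satisfied."""
    alloc = {node: 0.0 for node in requests_w}
    pending = dict(requests_w)
    remaining = budget_w
    while pending and remaining > 1e-9:
        share = remaining / len(pending)
        satisfied = {n: r for n, r in pending.items() if r <= share}
        if not satisfied:
            # Nobody can be fully served: give each an equal share and stop.
            for n in pending:
                alloc[n] += share
            remaining = 0.0
            break
        # Fully serve the small requests and recompute the share for the rest.
        for n, r in satisfied.items():
            alloc[n] += r
            remaining -= r
            del pending[n]
    return alloc
```

Because the allocations never add up to more than the budget, a hard limit holds by construction. With a 1000W budget and requests of 150, 300, 400 and 400W, the first node gets its 150W and the other three get about 283.3W each.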

Possible powercap values

There are two ways to set the powercap for an entire cluster: with a specific value or with a calculated one. With a specific value, the powercap value in the EARGM definition must be a number > 0, and it is the power budget that the EARGM distributes among the nodes it controls. If instead powercap=-1, the total power budget is calculated automatically as the sum of the powercap values set in the tags of the nodes it controls.
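A minimal sketch of how the EARGM budget follows from its powercap value, assuming the rules in this section; `cluster_budget` is a hypothetical helper. Rejecting tag values of 1 and -1 under powercap=-1 follows the error rows of the configuration table in this section; rejecting 0 as well is an assumption of this sketch.

```python
from typing import Optional


def cluster_budget(eargm_powercap: int, tag_powercaps: list[int]) -> Optional[int]:
    """Return the EARGM power budget in watts, or None if cluster powercap is off."""
    if eargm_powercap == 0:
        # 0 disables cluster powercap: there is no budget to distribute.
        return None
    if eargm_powercap > 0:
        # Specific value: used directly as the budget.
        return eargm_powercap
    if eargm_powercap == -1:
        # Calculated: sum of the tag powercaps of the controlled nodes.
        # Unlimited (1), disabled (0) or auto (-1) tags cannot be summed.
        if any(p <= 1 for p in tag_powercaps):
            raise ValueError("powercap=-1 needs a tag powercap > 1 on every node")
        return sum(tag_powercaps)
    raise ValueError(f"unsupported EARGM powercap value: {eargm_powercap}")
```

For four nodes tagged with powercap=250, powercap=-1 yields a 1000W budget, while four nodes tagged with powercap=1 make the calculation fail.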

For an EARD, the valid values of powercap in its tag are 1 and N > 1. When set to 1, the daemon runs with no power limit until it receives one. If the powercap is a higher number, that value is used as the power limit until a different value is set via econtrol or by EARGM reallocations.

If either powercap or EARGMPowercapMode is set to 0 in the configuration file, the thread that controls the power limits will not be started and the feature will be disabled.

If the initial powercap value for a node is set to 0, powercap is disabled for that node and any attempt to set it to a certain value is ignored. Set it to 1 instead if you may want to set a powercap later.
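The per-node rules above can be summarised as a small state holder. This is a hypothetical sketch (`NodePowercap` and `set_limit` are not EARD code) covering only tag values 0, 1 and N > 1: 0 disables powercap and makes later requests be ignored, 1 starts with no limit, and N > 1 is the initial limit in watts. Requests from econtrol or an EARGM are modelled as calls to `set_limit`.

```python
from typing import Optional


class NodePowercap:
    """Hypothetical model of how a node's tag powercap value is interpreted."""

    def __init__(self, tag_powercap: int):
        # 0: powercap disabled on this node until the daemon is restarted.
        self.disabled = tag_powercap == 0
        # 1: no limit yet (None); N > 1: initial limit in watts.
        self.limit_w: Optional[int] = tag_powercap if tag_powercap > 1 else None

    def set_limit(self, new_limit_w: int) -> bool:
        """Apply a limit from econtrol or an EARGM; returns False if ignored."""
        if self.disabled:
            return False
        self.limit_w = new_limit_w
        return True
```

A node tagged with powercap=0 keeps rejecting `set_limit`, whereas one tagged with powercap=1 accepts its first limit whenever it arrives.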

Example configurations

The following is an example of hard powercap on 4 nodes, with a starting powercap of 225W each and a total power budget of 1000W. For clarity, a few fields in the tags section have been omitted.

# Wait period between power checks
EARGMPowerPeriod=120
# Activate powercap
EARGMPowercapMode=1
# Set up at least 1 EARGM
EARGMId=1 energy=XXX power=1000 node=node1
# Set up the nodes
Tag=tag1 default=yes max_power=500 min_power=50 error_power=600 powercap=225 powercap_plugin=dvfs.so gpu_powercap_plugin=gpu.so
Island=1 nodes=node[1-4] EARGMId=1
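For this example, the starting node caps do not use the whole EARGM budget:

```latex
4 \times 225\,\mathrm{W} = 900\,\mathrm{W} \le 1000\,\mathrm{W},
\qquad 1000\,\mathrm{W} - 900\,\mathrm{W} = 100\,\mathrm{W}
```

The remaining 100W is not assigned to any node at start, so it is available for the EARGM to hand out when it redistributes power among nodes that request more.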

This example is similar to the previous one, but the global powercap is calculated by the EARGM as the sum of the nodes' powercaps. In this case, the nodes start with a default powercap of 250W, so the total budget for the cluster is again 1000W.

# Wait period between power checks
EARGMPowerPeriod=120
# Activate powercap
EARGMPowercapMode=1
# Set up at least 1 EARGM
EARGMId=1 energy=XXX power=-1 node=node1
# Set up the nodes
Tag=tag1 default=yes max_power=500 min_power=50 error_power=600 powercap=250 powercap_plugin=dvfs.so gpu_powercap_plugin=gpu.so
Island=1 nodes=node[1-4] EARGMId=1

The following is a soft powercap example with a power budget of 1000W. The nodes will start without a set powercap but will be ready to activate it.

# Wait period between power checks
EARGMPowerPeriod=120
# Activate powercap as soft powercap
EARGMPowercapMode=2
# Set up at least 1 EARGM
EARGMId=1 energy=XXX power=1000 node=node1
# Set up the nodes
Tag=tag1 default=yes max_power=500 min_power=50 error_power=600 powercap=1 powercap_plugin=dvfs.so gpu_powercap_plugin=gpu.so
Island=1 nodes=node[1-4] EARGMId=1

The following example has ONLY node powercap, with the nodes having a limit of 250W. There will be no reallocation:

# Wait period between power checks
EARGMPowerPeriod=120
# Activate powercap
EARGMPowercapMode=1
# Set up at least 1 EARGM
EARGMId=1 energy=XXX power=0 node=node1
# Set up the nodes
Tag=tag1 default=yes max_power=500 min_power=50 error_power=600 powercap=250 powercap_plugin=dvfs.so gpu_powercap_plugin=gpu.so
Island=1 nodes=node[1-4] EARGMId=1

This is the same configuration, but with cluster powercap deactivated by setting the mode to 0:

# Wait period between power checks
EARGMPowerPeriod=120
# Deactivate cluster powercap (monitoring only)
EARGMPowercapMode=0
# Set up at least 1 EARGM
EARGMId=1 energy=XXX power=1000 node=node1
# Set up the nodes
Tag=tag1 default=yes max_power=500 min_power=50 error_power=600 powercap=250 powercap_plugin=dvfs.so gpu_powercap_plugin=gpu.so
Island=1 nodes=node[1-4] EARGMId=1

This is an erroneous way to set it up, because the nodes' powercap capabilities will not be active:

# Wait period between power checks
EARGMPowerPeriod=120
# Activate powercap
EARGMPowercapMode=1
# Set up at least 1 EARGM
EARGMId=1 energy=XXX power=1000 node=node1
# Set up the nodes
Tag=tag1 default=yes max_power=500 min_power=50 error_power=600 powercap=0 powercap_plugin=dvfs.so gpu_powercap_plugin=gpu.so
Island=1 nodes=node[1-4] EARGMId=1

Similarly, the following example does not work because the EARGM cannot calculate a valid powercap when the nodes are set to unlimited:

# Wait period between power checks
EARGMPowerPeriod=120
# Activate powercap
EARGMPowercapMode=1
# Set up at least 1 EARGM
EARGMId=1 energy=XXX power=-1 node=node1
# Set up the nodes
Tag=tag1 default=yes max_power=500 min_power=50 error_power=600 powercap=1 powercap_plugin=dvfs.so gpu_powercap_plugin=gpu.so
Island=1 nodes=node[1-4] EARGMId=1

Valid configurations

There are three special values for powercap configuration: 1 (unlimited, only for Tags/Node), 0 (disabled) and -1 (auto-configure).

Furthermore, there are three cluster powercap modes for EARGM: 0 (monitoring-only), 1 (hard cluster powercap) and 2 (soft cluster powercap).

EARGM powercap mode | EARGM powercap value | Tag powercap value | Result
ANY | 0 | 1 | Cluster powercap disabled, node powercap unlimited (but can be set with econtrol)
ANY | 0 | 0 | All powercap types disabled; cannot be modified without restarting
ANY | 0 | N | Cluster powercap disabled, node powercap set to N
HARD | -1 | N | Cluster powercap set to the sum of the nodes' powercaps. Node powercap set to N
HARD | N | -1 | Cluster powercap set to N. Node powercap set to N/number of nodes controlled by the EARGM
HARD | N | M | Cluster powercap set to N. Node powercap set to M
SOFT | N | 1 | Cluster powercap set to N, node powercap unlimited. If triggered, node powercap is set to the node's max_powercap value
SOFT | N | M | *ERROR*
HARD/SOFT | N | 0 | ERROR
HARD/SOFT | -1 | -1 | ERROR
HARD/SOFT | 0 | -1 | *ERROR*
HARD/SOFT | 1 | -1 | *ERROR*
HARD/SOFT | -1 | 1 | ERROR
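To check a configuration against the table, its rows can be transcribed into a lookup. This is a hypothetical helper (`classify`, `RULES` and `symbol` are not EAR tools): it treats N and M both as "any value > 1", does not model the asterisks on some ERROR entries, and reports combinations the table does not list as not covered rather than guessing.

```python
# Mode numbers as listed above: 0 monitoring-only, 1 hard, 2 soft.
MODE_NAMES = {0: "MONITOR", 1: "HARD", 2: "SOFT"}

# (mode, EARGM value, Tag value) -> result, transcribed from the table.
# "N" stands for any value > 1 (the table's N and M).
RULES = {
    ("HARD", "-1", "N"): "cluster = sum of node powercaps; node = tag value",
    ("HARD", "N", "-1"): "cluster = N; node = N / number of nodes",
    ("HARD", "N", "N"): "cluster = N; node = tag value",
    ("SOFT", "N", "1"): "cluster = N; node unlimited until triggered, then max_powercap",
    ("SOFT", "N", "N"): "ERROR",
}
# Rows marked HARD/SOFT that are errors in both modes.
for _mode in ("HARD", "SOFT"):
    for _eargm, _tag in (("N", "0"), ("-1", "-1"), ("0", "-1"), ("1", "-1"), ("-1", "1")):
        RULES[(_mode, _eargm, _tag)] = "ERROR"
# Rows marked ANY: an EARGM value of 0 disables cluster powercap in every mode.
for _mode in MODE_NAMES.values():
    RULES[(_mode, "0", "1")] = "cluster disabled; node unlimited (settable with econtrol)"
    RULES[(_mode, "0", "0")] = "all powercap disabled until restart"
    RULES[(_mode, "0", "N")] = "cluster disabled; node = tag value"


def symbol(value: int) -> str:
    """Map a numeric powercap value onto the table's symbols."""
    if value in (-1, 0, 1):
        return str(value)
    if value > 1:
        return "N"
    raise ValueError(f"unsupported powercap value: {value}")


def classify(mode: int, eargm_value: int, tag_value: int) -> str:
    """Look up the table row for a configuration."""
    key = (MODE_NAMES[mode], symbol(eargm_value), symbol(tag_value))
    return RULES.get(key, "not covered by the table")
```

Applied to the erroneous examples earlier in this section, `classify(1, 1000, 0)` and `classify(1, -1, 1)` both return "ERROR", while the first hard powercap example, `classify(1, 1000, 225)`, maps to the HARD N M row.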

NOTE: When using soft cluster powercap, the max_powercap value must be properly set in the tags for the powercap to work.