EAR provides powercap at different levels:
Node powercap is enforced by the EARD. The initial values for each node's powercap are set in the tags section of ear.conf (see Tags for more information). These include the power limit, the CPU/PKG powercap plugin and the GPU powercap plugin (if needed). The power limit can be changed at runtime via econtrol or by an active EARGM that has the node under its control.
The EARD enforces the powercap via its plugins, each of which ensures that the domain it controls (CPU or GPU) does not exceed its power allocation.
The main goal of the node powercap is, first and foremost, to enforce the power limit. The secondary goal is to maximize performance while staying under that limit. The EARD uses its current power limit as a budget, which it distributes among the domains (controlled by the plugins) according to the node's current needs.
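The budget split described above can be pictured with a minimal sketch. The proportional policy, the function name `split_node_budget` and the domain names are assumptions made for illustration; the text does not specify how the EARD actually weighs the domains.

```python
def split_node_budget(node_limit_w, demand_w):
    """Split a node power limit among domains in proportion to demand.

    demand_w maps a domain name (e.g. "cpu", "gpu") to the power it
    currently wants, in watts. Proportional sharing is an assumption of
    this sketch, not the documented EARD policy.
    """
    total_demand = sum(demand_w.values())
    if total_demand <= node_limit_w:
        # Everything fits under the limit: grant each domain its demand.
        return dict(demand_w)
    # Over the limit: scale every domain down by the same factor so the
    # allocations add up to exactly the node limit.
    scale = node_limit_w / total_demand
    return {domain: watts * scale for domain, watts in demand_w.items()}


allocation = split_node_budget(250, {"cpu": 200, "gpu": 300})
print(allocation)  # {'cpu': 100.0, 'gpu': 150.0}
```

Whatever the real policy is, the property that matters is the one the sketch preserves: the sum of the domain allocations never exceeds the node limit.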
Node powercap can be applied without cluster powercap by defining only the node powercap in the EAR configuration file.
Cluster powercap is managed by one or more EARGMs and enforced at node level by the EARD. Each EARGM has an individual power limit and a monitoring frequency set in its definition (see EARGM for more details). Each EARGM periodically asks the nodes under its control (as indicated in the nodes' definition) for their power consumption and distributes the budget accordingly. There are two main ways in which the cluster powercap can be enforced: soft and hard cluster powercap.
Soft powercap is targeted at systems where exceeding the power limit is not a hardware constraint but a rule that needs enforcement for a different reason. In this scenario, the compute nodes run as if no limit were applied until the total power consumption of the cluster reaches a percentage threshold (defined as the suspend threshold in ear.conf), at which point the EARGM sends a power limit to all the nodes to prevent the global power from going above the actual limit. Additionally, a script can be attached to the activation of the powercap, in which the admin can set whichever actions they deem appropriate. Once the cluster power goes below another percentage threshold (defined as the resume threshold in ear.conf), the EARGM sends a message to all the nodes to go back to unlimited power usage and calls the deactivation script set by the admin (if one is specified).
In terms of configuration, EARGMPowerCapMode must be set to 2 (soft powercap) and all nodes need to have a max_powercap set in their tag. The value of max_powercap will be the power allocation of the nodes that have that tag. A node with a max_powercap value of 1, 0 or -1 will ignore powercap messages from an EARGM in soft cluster powercap mode.
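The activation and deactivation logic of soft powercap amounts to a hysteresis between two thresholds. The sketch below illustrates that idea only; the class and function names, and the 90%/70% values in the usage example, are hypothetical and not taken from EAR.

```python
class SoftPowercap:
    """Hysteresis between the suspend and resume thresholds (percentages)."""

    def __init__(self, cluster_limit_w, suspend_pct, resume_pct):
        # Convert the percentage thresholds into absolute watts.
        self.suspend_w = cluster_limit_w * suspend_pct / 100
        self.resume_w = cluster_limit_w * resume_pct / 100
        self.active = False

    def update(self, cluster_power_w):
        """Return "activate", "deactivate" or None for one monitoring period."""
        if not self.active and cluster_power_w >= self.suspend_w:
            self.active = True
            return "activate"    # send limits to nodes, run activation script
        if self.active and cluster_power_w < self.resume_w:
            self.active = False
            return "deactivate"  # back to unlimited, run deactivation script
        return None


def node_limit_on_activation(max_powercap_w):
    """Limit a node applies when soft powercap triggers, or None if it ignores it."""
    if max_powercap_w in (1, 0, -1):
        return None  # such nodes ignore soft powercap messages
    return max_powercap_w


cap = SoftPowercap(1000, suspend_pct=90, resume_pct=70)
events = [cap.update(power) for power in (800, 920, 850, 690)]
print(events)  # [None, 'activate', None, 'deactivate']
```

Keeping the resume threshold below the suspend threshold avoids toggling the powercap on and off when the cluster power hovers around a single value.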
Hard powercap is used when the system must not, under any circumstance, go above the power limit. This starts by always having a set powercap in the compute nodes. The job of the EARGM is to periodically monitor the state of the nodes, which will request more or less power depending on their current workload, and redistribute the power according to the needs of all nodes.
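One way to picture a reallocation round is sketched below. The policy (release first, then share the spare power evenly among nodes asking for more) and all names are assumptions chosen for illustration; the text does not describe the EARGM's actual algorithm. The invariant it keeps is the one hard powercap requires: the allocations never add up to more than the budget.

```python
def reallocate(budget_w, current_w, requested_w):
    """One hypothetical reallocation round for hard cluster powercap.

    current_w and requested_w map node names to watts and must share keys.
    """
    # Step 1: nodes asking for less than they hold release the difference.
    new_alloc = {n: min(current_w[n], requested_w[n]) for n in current_w}
    spare = budget_w - sum(new_alloc.values())

    # Step 2: share the spare power among nodes that want more, serving the
    # smallest shortfall first so leftovers roll over to the larger ones.
    hungry = sorted(
        (n for n in new_alloc if requested_w[n] > new_alloc[n]),
        key=lambda n: requested_w[n] - new_alloc[n],
    )
    remaining = len(hungry)
    for n in hungry:
        share = spare / remaining
        grant = min(share, requested_w[n] - new_alloc[n])
        new_alloc[n] += grant
        spare -= grant
        remaining -= 1
    return new_alloc


current = {"n1": 225, "n2": 225, "n3": 225, "n4": 225}
requested = {"n1": 150, "n2": 225, "n3": 300, "n4": 260}
result = reallocate(1000, current, requested)
print(result, sum(result.values()) <= 1000)
```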
The powercap for an entire cluster can be set in two ways: with a specific value or calculated automatically. With a specific value, the powercap value in the EARGM definition must be a number > 0, and that will be the power budget for the EARGM to distribute among the nodes it controls. If instead powercap=-1, the total power budget is calculated automatically as the sum of the powercap values set in the tags of the nodes it controls.
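The two ways of obtaining the cluster budget can be summarized in a few lines. This is a sketch of the rule as stated here, not EAR code; `cluster_budget` is a hypothetical name, and rejecting node values of 1 or lower in the automatic case reflects the fact that a sum cannot be computed from unlimited or disabled nodes.

```python
def cluster_budget(eargm_powercap, node_tag_powercaps):
    """Return the EARGM power budget in watts, or None if powercap is disabled."""
    if eargm_powercap == 0:
        return None  # cluster powercap disabled
    if eargm_powercap == -1:
        # Calculated mode: every node needs an explicit limit (N > 1),
        # otherwise there is nothing meaningful to add up.
        if any(p <= 1 for p in node_tag_powercaps):
            raise ValueError("cannot calculate a budget from unlimited/disabled nodes")
        return sum(node_tag_powercaps)
    if eargm_powercap > 0:
        return eargm_powercap  # specific value
    raise ValueError(f"invalid EARGM powercap: {eargm_powercap}")


print(cluster_budget(1000, [225, 225, 225, 225]))  # 1000
print(cluster_budget(-1, [250, 250, 250, 250]))    # 1000
```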
For an EARD, the valid values of powercap in its tag are 1 and N > 1. When set to 1, the daemon runs with no power limit until it receives one. If the powercap is a higher number, that value is used as the power limit until a different one is set via econtrol or by EARGM reallocations.
If either powercap or EARGMPowerCapMode is set to 0 in the configuration file, the thread that controls the power limits will not be started and the feature will be disabled.
If the initial powercap value for a node is set to 0, the powercap is disabled for that node and any later attempt to set a limit is ignored. Use 1 instead if a limit may need to be applied at some point.
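The difference between an initial value of 0 and 1 is easy to miss, so here is a small model of it. The class `NodePowercap` is purely illustrative and only covers the tag values 0, 1 and N > 1.

```python
class NodePowercap:
    """Toy model of how a node treats its initial tag powercap value."""

    def __init__(self, tag_powercap):
        if tag_powercap < 0:
            raise ValueError("this sketch only covers 0, 1 and N > 1")
        # 0 disables the feature for good; anything else keeps it available.
        self.enabled = tag_powercap != 0
        # 1 means "no limit yet"; N > 1 is an actual limit in watts.
        self.limit_w = tag_powercap if tag_powercap > 1 else None

    def set_limit(self, watts):
        """Apply a new limit; returns False if the request is ignored."""
        if not self.enabled:
            return False
        self.limit_w = watts
        return True


disabled = NodePowercap(0)
unlimited = NodePowercap(1)
print(disabled.set_limit(200), disabled.limit_w)    # False None
print(unlimited.set_limit(200), unlimited.limit_w)  # True 200
```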
The following is an example for hard powercap on 4 nodes, with a starting powercap of 225W each and a total power budget of 1000W. For clarity a few fields in the tags section have been skipped.
This example is similar to the previous one, but the global powercap is calculated by the EARGM as the sum of the nodes. In this case, the nodes start with a default powercap of 250W and the total budget for the cluster remains 1000W.
The following is a soft powercap example with a power budget of 1000W. The nodes will start without a set powercap but will be ready to activate it.
Finally, this example has ONLY node powercap, with the nodes having a limit of 250W. There will be no reallocation:
This is the same, but deactivating the powercap by setting the mode to 0:
This is an erroneous way to set it up, because the nodes' powercap capabilities will not be active:
Similarly, this following example does not work because the EARGM cannot calculate a valid powercap when the nodes are set to unlimited:
There are three special values for powercap configuration: 1 (unlimited, only for Tags/Node), 0 (disabled) and -1 (auto-configure).
Furthermore, there are three cluster powercap modes for EARGM: 0 (monitoring-only), 1 (hard cluster powercap) and 2 (soft cluster powercap).
EARGM powercap mode | EARGM powercap value | Tag powercap value | Result |
---|---|---|---|
ANY | 0 | 1 | Cluster powercap disabled, node powercap unlimited (but can be set with econtrol) |
ANY | 0 | 0 | All powercap types disabled, and cannot be modified without restarting |
ANY | 0 | N | Cluster powercap disabled, node powercap set to N |
HARD | -1 | N | Cluster powercap set to the sum of the nodes' powercap. Node powercap set to N |
HARD | N | -1 | Cluster powercap set to N. Node powercap set to N/number of nodes controlled by EARGM |
HARD | N | M | Cluster powercap set to N. Node powercap set to M |
SOFT | N | 1 | Cluster powercap set to N, node powercap unlimited. If triggered, node powercap will be set to their max_powercap value |
SOFT | N | M | ERROR |
HARD/SOFT | N | 0 | ERROR |
HARD/SOFT | -1 | -1 | ERROR |
HARD/SOFT | 0 | -1 | ERROR |
HARD/SOFT | 1 | -1 | ERROR |
HARD/SOFT | -1 | 1 | ERROR |
NOTE: When using soft cluster powercap, the max_powercap value must be properly set in the nodes' tag for the powercap to work.
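For checking a planned configuration against the table, the rows can be encoded as a lookup. This is a reader's aid rather than an EAR tool: the function names and the coarse result strings are made up for the sketch, and any combination the table does not list is reported as such instead of being guessed.

```python
def classify(value):
    """Map a powercap value to the symbols used in the table (-1, 0, 1, N)."""
    if value in (-1, 0, 1):
        return str(value)
    if value > 1:
        return "N"
    raise ValueError(f"invalid powercap value: {value}")


# (modes, EARGM powercap, tag powercap, coarse result), one entry per table row.
ROWS = [
    ("ANY", "0", "1", "cluster powercap disabled"),
    ("ANY", "0", "0", "all powercap disabled"),
    ("ANY", "0", "N", "cluster powercap disabled"),
    ("HARD", "-1", "N", "cluster powercap enabled"),
    ("HARD", "N", "-1", "cluster powercap enabled"),
    ("HARD", "N", "N", "cluster powercap enabled"),
    ("SOFT", "N", "1", "cluster powercap enabled"),
    ("SOFT", "N", "N", "ERROR"),
    ("HARD/SOFT", "N", "0", "ERROR"),
    ("HARD/SOFT", "-1", "-1", "ERROR"),
    ("HARD/SOFT", "0", "-1", "ERROR"),
    ("HARD/SOFT", "1", "-1", "ERROR"),
    ("HARD/SOFT", "-1", "1", "ERROR"),
]


def lookup(mode, eargm_powercap, tag_powercap):
    """Return the table's outcome for mode "HARD" or "SOFT"."""
    key = (classify(eargm_powercap), classify(tag_powercap))
    for modes, row_eargm, row_tag, result in ROWS:
        if (modes == "ANY" or mode in modes.split("/")) and (row_eargm, row_tag) == key:
            return result
    return "not listed in the table"


print(lookup("HARD", 1000, 225))  # cluster powercap enabled
print(lookup("SOFT", 1000, 250))  # ERROR
print(lookup("SOFT", 1000, 1))    # cluster powercap enabled
```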