The HPA is a k8s built-in resource[^1] that is also a controller[^2]. It automates horizontal scaling of other resources, unlike the VerticalPodAutoscaler, which scales them vertically. It uses an algorithm (see Algorithm) to determine the scaling ratio based on resource metrics derived from the metrics API[^3] or an alternative[^4].
## Design Considerations

### Metric Source Types

The HPA operates on the following metric source types:
- `ContainerResource`: Tracks resource usage of individual containers within a group of Pods, selected by the `scaleTargetRef` field. For example, if you have a web app with a web-server container and a logging sidecar, you can configure the HPA to consider metrics only for the web-server container across all replicas. This way, the HPA scales when there is load on a specific subset of containers, unlike with the `Resource` type, which could pollute the `usageRatio` (see Algorithm) since it tracks all containers of all selected Pods. The per-Pod metrics are still averaged, just with the unspecified containers excluded.
- `Resource`: Scales targets based on per-Pod resource metrics (e.g., cpu and memory, defined within the metrics API). The controller compares the average of all matching metrics of all containers across all Pods of a `scaleTargetRef` to the specified target. See Algorithm for more information.
- `Pods`: Scales based on a single custom metric describing a Pod. This type is similar to the `Resource` type, except that it supports only `AverageValue` (see Target Types), as metrics are reported per Pod and must be averaged across replicas.
- `Object`: Acts on metrics describing an object other than a Pod. For example, you can track a custom metric describing[^5] an Ingress object:
```yaml
type: Object
object:
  metric:
    name: requests-per-second
  describedObject:
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    name: main-route
  target:
    type: Value
    value: 2k
```

In this example, the HPA extracts the current value of the metric named requests-per-second from the custom metrics API and compares it directly to the specified target. However (as this type supports only `Value` and `AverageValue`), if the target were `AverageValue`, the HPA would have divided the current metric value by the number of Pods selected by the `scaleTargetRef`.
- `External`: This type is for metrics with no direct relationship to any k8s object.
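For comparison with the `Object` example above, a hypothetical `ContainerResource` source that tracks only one container might look like this (the container name and target value here are illustrative, not taken from the text above):

```yaml
type: ContainerResource
containerResource:
  name: cpu
  container: web-server   # only this container's usage is averaged across Pods
  target:
    type: Utilization
    averageUtilization: 60
```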
Note

The `ContainerResource` and `Resource` types use built-in metrics (e.g., cpu and memory) from the `metrics.k8s.io` API. `Object`, `Pods`, and `External` are for custom metrics, as these can differ from cluster to cluster. They require a more sophisticated monitoring setup, as well as knowledge of its internals.
### Target Types

Each metric source type maps to a corresponding field under the `metrics` array and defines how the HPA retrieves and interprets metric data for the referenced scale target. Each also has a target, which specifies the value of the given metric toward which the HPA should work. The target can be of three types:

- `value`
- `averageValue`
- `averageUtilization`
These three predispose the HPA to different calculation behaviour, depending on the metric source type context. And although every source type nominally supports all three, some combinations are redundant or entirely unused (as in not actually handled as a case within the codebase), because neither the codebase nor the documentation covers every combination cleanly.
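As a rough illustration of where each target type naturally fits (the metric names and values below are made up):

```yaml
metrics:
- type: Resource
  resource:
    name: cpu
    target:
      type: Utilization          # averageUtilization
      averageUtilization: 60
- type: Pods
  pods:
    metric:
      name: packets-per-second
    target:
      type: AverageValue         # averageValue (the only one Pods supports)
      averageValue: 1k
- type: Object
  object:
    metric:
      name: requests-per-second
    describedObject:
      apiVersion: networking.k8s.io/v1
      kind: Ingress
      name: main-route
    target:
      type: Value                # value (direct comparison, no averaging)
      value: 2k
```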
### Algorithm

This algorithm covers only a subset of the metric source contexts above; one of them is `Resource`. Like every other type, `Resource` supports all three target types, but only `averageValue` and `averageUtilization` are covered; `value` isn't used at all. And although the two targets might imply different behaviour, both calculate their replica count proposals using the same algorithm. The process goes like this:
- The HPA categorises the Pods into the following groups: ignored, missing, and unready. The grouping criteria are as follows (see `groupPods()`):
  - Pods with a `deletionTimestamp`, or whose phase is `Failed`, are ignored.
  - Pods with a `Pending` phase, a `Ready` condition set to "False" after the initial readiness delay has passed, or a missing `.status.startTime` are considered unready.
Note

If a Pod is within the cpuInitialisationPeriod, the function checks whether it has a missing metric window since its last state transition, or whether its `Ready` condition is missing or "False". If neither is true, the Pod is not considered unready.
  - Pods with missing metrics are considered missing.
  - Every Pod that survives the above checks is considered ready.
- Then, the HPA calculates the average value from all supplied metrics and divides it by the target usage, achieving the usage ratio: `usageRatio = metricsAverage / targetUsage`.
Note

The above step is valid when the target type is `averageValue`. When using `averageUtilization`, the HPA takes the extra step of converting the metricsAverage to a percentage of the resource request before calculating the ratio.
- If there aren't any unready or missing pods, the function directly returns the new replica count: `newReplicaCount = ceil(usageRatio * readyPodCount)`.
- Otherwise, the HPA removes any existing metrics associated with the ignored and unready pods and, depending on whether the scale target will be scaled down or up, does the following:
  - On scale-down, it treats missing pods as using exactly the target amount.
  - On scale-up, it treats missing pods as using 0% of the resource request.
- Then, it recalculates the usage ratio with the new numbers; if the change is insignificant or the new ratio falls within the Tolerance, it returns the current replica count.
- Finally, the HPA recalculates the new replica count using the following formula: `newReplicaCount = ceil(newUsageRatio * podCount)`, where podCount includes the pods that were assigned assumed metric values.
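The steps above can be sketched as follows. This is a simplified illustration of the `Resource`/`averageValue` path, not the actual controller code; the function and parameter names are made up:

```python
import math

def desired_replicas(ready_metrics, missing_count, current_replicas,
                     target, tolerance=0.1):
    """Sketch of the replica-count proposal described above.

    ready_metrics:  per-Pod metric values for Pods that reported metrics
    missing_count:  number of Pods whose metrics are missing
    """
    usage_ratio = (sum(ready_metrics) / len(ready_metrics)) / target

    # No missing Pods: scale directly on the ratio,
    # unless it falls within the tolerance.
    if missing_count == 0:
        if abs(usage_ratio - 1.0) <= tolerance:
            return current_replicas
        return math.ceil(usage_ratio * len(ready_metrics))

    # Conservative assumptions for missing Pods: on scale-down they are
    # assumed to use exactly the target; on scale-up, nothing at all.
    assumed = target if usage_ratio < 1.0 else 0.0
    all_metrics = ready_metrics + [assumed] * missing_count
    new_ratio = (sum(all_metrics) / len(all_metrics)) / target

    if abs(new_ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(new_ratio * len(all_metrics))
```

For example, three Pods averaging 200m against a 100m target propose 6 replicas, while a missing Pod dampens a scale-up because it is assumed to use nothing.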
For comparison, see issue #79272 which explains in detail how the HPA calculates replica counts for the Object and External types.
## Configuring scaling behaviour

The v2 of the HPA allows you to configure separate scale-up and scale-down policies under the `behavior` field. You can also prevent flapping with a stabilisation window that smooths replica counts, and a tolerance that ignores metric fluctuations below a specified threshold.
### Scaling Policies
You can specify the rate at which the HPA scales up or down a target.
```yaml
behavior:
  scaleDown:
    policies:
    - type: Pods
      value: 4
      periodSeconds: 60
    - type: Percent
      value: 10
      periodSeconds: 60
```

The first policy (Pods) limits the maximum number of replicas the HPA can terminate in 60 seconds during a scale-down. The second (Percent) limits the percentage of the workload that can be scaled down in 60 seconds.
Note

When there are multiple policies, the one that would cause the greatest change at the moment the scaling decision is made is the active one. After every scaling, the HPA recalculates which policy would make the biggest impact.
Using the above snippet, a scale target with 72 replicas would scale down based on the Percent policy first, because 10% of the current replicas is 7.2, which the HPA rounds up to 8. The controller would continue to use the Percent policy for each subsequent scale-down (assuming it hasn't yet reached its target value) until the replica count falls below 40; from then on (e.g., 10% of 30 is 3), the Pods policy permits the bigger change, terminating up to 4 replicas per period.
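The policy-selection rule from the note can be sketched as a toy model of just this snippet's two policies (`max_scale_down` is an illustrative name, not a real k8s function):

```python
import math

def max_scale_down(current_replicas, pods_limit=4, percent_limit=10):
    """Replicas the HPA may remove in one period under the snippet above:
    the policy allowing the greatest change wins."""
    by_pods = pods_limit
    by_percent = math.ceil(current_replicas * percent_limit / 100)
    return max(by_pods, by_percent)
```

At 72 replicas the Percent policy wins (8 > 4); at or below 40 replicas, the Pods policy's flat 4 becomes the larger allowance.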
### Stabilisation

A stabilisation window prevents a phenomenon called flapping (aka thrashing), which occurs when replica counts change frequently due to fluctuations in their metrics. You can avoid it with a stabilisation window, which makes the controller keep the previously computed desired states during a time window (5m in the example below) and compare the current replica count proposal with the highest of the proposals from the last 5m. If the new one isn't larger, it keeps the highest one from the window.
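A rough sketch of this scale-down window logic, assuming the controller records timestamped proposals (the names and data layout here are illustrative):

```python
def stabilised_scale_down(proposals, now, window_seconds=300):
    """proposals: list of (timestamp, proposed_replica_count) pairs.
    On scale-down, act on the highest proposal seen within the window,
    so short dips in the metrics don't shrink the workload."""
    recent = [count for (ts, count) in proposals if now - ts <= window_seconds]
    return max(recent)
```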
```yaml
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300
```

### Tolerance
You can prevent the autoscaler from scaling workloads on small metric variations through the `tolerance` field:
```yaml
behavior:
  scaleUp:
    tolerance: 0.05 # 5% tolerance for scale up
```

In this case, the HPA will consider scaling up only if the current usage is above 105% of the target (100% target usage + 5% tolerance).
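The check reduces to a one-line comparison; a sketch (the names are illustrative):

```python
def should_scale_up(current_usage, target_usage, tolerance=0.05):
    # Scale up only when usage exceeds the target by more than the
    # tolerance, e.g. above 105% of target with a 0.05 tolerance.
    return current_usage / target_usage > 1.0 + tolerance
```

Usage at 104% of the target is ignored; 106% triggers a scale-up.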
## Sources
- https://kubernetes.io/docs/concepts/workloads/autoscaling/horizontal-pod-autoscale/
- https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/horizontal-pod-autoscaler-v2/#HorizontalPodAutoscalerSpec
- https://github.com/kubernetes/kubernetes/issues/79272
## Footnotes

[^1]: See k8s api resources.
[^2]: Unlike other k8s controllers, the HPA doesn't run constantly; instead it runs intermittently. The interval is set by the `--horizontal-pod-autoscaler-sync-period` parameter of the kube-controller-manager (the default interval is 15 seconds).
[^3]: This is the API generally provided by the Metrics Server, a plugin that must be enabled.
[^4]: Alternatives are provided through the aggregation layer of the k8s api.
[^5]: Describing means that the metric doesn't necessarily come directly from the object itself; the monitoring system monitors the actual networking solution (e.g., nginx controller) and exposes the metric through the relevant API[^4].