In k8s, Deployments are part of a group of objects1 referred to as workload managers2—they manage workloads (Pods) on your behalf. Deployments are meant for stateless workloads because, unlike StatefulSets, the Deployment controller3 doesn’t protect the identity of the workloads throughout their lifetimes.

Deployments have their own spec with several key fields, some of which directly represent Kubernetes primitives:

  • Labels and Label Selectors in k8s. The Deployment controller creates a ReplicaSet that uses label selectors to determine which pods to manage.
  • ReplicaSets. Deployments use ReplicaSets to manage the replication of a given workload. ReplicaSets are named <deployment-name>-<hash>, where the hash is produced by hashing the pod template. Pods, in turn, derive their names from the ReplicaSet. This detail is important for understanding how Deployments implement revision control and differentiate between resources.
  • Pod Templates.
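
To make the relationship between these primitives concrete, here is a minimal Deployment manifest; the name, labels, and image are illustrative:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                 # illustrative name
spec:
  replicas: 4
  selector:
    matchLabels:
      app: web              # must match the pod template's labels
  template:                 # pod template; hashing it produces the RS name suffix
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.25   # illustrative image
```

Applying this yields a ReplicaSet named web-<hash> and pods named web-<hash>-<suffix>.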

Revision control

Deployments can roll back a rollout. They retain revisions of their pod template (up to revisionHistoryLimit4). Updating anything but the pod template won’t result in a new revision. When a new revision is created, the pod-template-hash label is assigned to the managing ReplicaSet, and its value also becomes the foundation for naming the Deployment’s dependents.
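
In the manifest, the retention knob is a single field (the value shown is the default):

```yaml
spec:
  revisionHistoryLimit: 10   # default; ReplicaSets beyond this are garbage-collected
```

Retained revisions can then be inspected with kubectl rollout history and reverted with kubectl rollout undo.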

Rollout

Upon changing the pod template, the controller starts a rollout of the new pods.

Note

Rollouts and revisions are interconnected: a revision is created on a rollout, which in turn is only triggered by a pod template change.

By default, the controller uses the “RollingUpdate” rollout strategy, which gradually scales down the old ReplicaSet while scaling up the new one. Under the strategy.rollingUpdate field, you can specify the following sub-fields:
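
In the manifest, both sub-fields live under spec.strategy; the values shown below are the defaults:

```yaml
spec:
  strategy:
    type: RollingUpdate      # default; the alternative is Recreate
    rollingUpdate:
      maxSurge: 25%          # percentage or an absolute pod count
      maxUnavailable: 25%    # percentage or an absolute pod count
```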

maxSurge

When rolling out, the Deployment controller first creates new pods (scaling up the new RS), waits for a sufficient number of them to be running, and only then starts deleting old ones (scaling down the old RS). The same applies in reverse: it won’t create more new pods until a sufficient number of old ones have been removed.

For example, during the rollout of a Deployment with 4 replicas, the number of desired pods may be exceeded by at most 25%5 (125% in total, or 5 pods running). Once that surge limit is reached, the controller starts deleting old pods and the total number of running pods returns to 4 (the old RS is at 3). This process of scaling up and down continues until the new RS has been scaled to the desired number of replicas.
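
The arithmetic above can be sketched as follows. This is a simplified model, not the controller’s code; it relies on the documented rounding behaviour that maxSurge rounds up to an absolute pod count while maxUnavailable rounds down:

```python
import math

def rollout_bounds(replicas: int, max_surge_pct: int = 25, max_unavailable_pct: int = 25):
    """Resolve percentage fields to the pod-count bounds a rolling update obeys.

    maxSurge rounds UP, maxUnavailable rounds DOWN (per the Deployment docs).
    Returns (max total pods during the rollout, min available pods).
    """
    surge = math.ceil(replicas * max_surge_pct / 100)
    unavailable = math.floor(replicas * max_unavailable_pct / 100)
    return replicas + surge, replicas - unavailable

print(rollout_bounds(4))   # (5, 3): at most 5 pods running, at least 3 available
```

With 4 replicas and the 25% defaults, both values resolve to a single pod, which matches the walkthrough above.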

maxUnavailable

You might also encounter a case where a new rollout is unsuccessful. The default value of this field is 25% (like maxSurge), so when a 4-replica Deployment has a failing rollout, the following happens:

Events:
  Type    Reason             Age    From                   Message
  ----    ------             ----   ----                   -------
  Normal  ScalingReplicaSet  9m19s  deployment-controller  Scaled up replica set test-6bc6b589d7 from 0 to 4
  Normal  ScalingReplicaSet  7m41s  deployment-controller  Scaled up replica set test-698c65d78f from 0 to 1
  Normal  ScalingReplicaSet  7m41s  deployment-controller  Scaled down replica set test-6bc6b589d7 from 4 to 3
  Normal  ScalingReplicaSet  7m41s  deployment-controller  Scaled up replica set test-698c65d78f from 1 to 2
  1. The initial rollout (6bc6b589d7) has been scaled successfully to 4 pods.
  2. An update to the pod template triggers a rollout. A new RS is created (698c65d78f).
  3. The new RS scales to 1 pod.
  4. Then, the old one scales down to 3.
  5. Finally, the new RS scales up to 2. At this point, the controller has stopped the bad rollout since the maxUnavailable field is set to 25% (by default) which, in this case, is equal to a single pod. The newly created pods don’t count toward availability because they are failing readiness checks.

Rollover

When you update the pod template during an in-progress rollout, the controller immediately starts deploying the new pods and cuts the previous RS’s scaling short. It doesn’t wait for that rollout to finish before proceeding with the new one. This is called a “rollover”.

Proportional scaling

During a rollout, a Deployment can run two revisions simultaneously. Proportional scaling reduces the risk when either you or an HPA6 scales the Deployment in the middle of a faulty rollout.

Continuing the example from maxUnavailable: if a new scaling request arrives at the moment when only 3/4 pods are ready, with 2 failing up-to-date ones also running, the controller distributes the new replicas proportionally, favouring the ReplicaSet with the most replicas; any remainder is assigned to that same RS.
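
A simplified sketch of that distribution, assuming integer proportional shares with the remainder going to the largest ReplicaSet (the function name and tie behaviour are illustrative, not the controller’s exact algorithm):

```python
def proportional_scale(rs_sizes: dict, new_total: int) -> dict:
    """Distribute new_total replicas across ReplicaSets in proportion to
    their current sizes; any rounding remainder goes to the largest RS."""
    current_total = sum(rs_sizes.values())
    shares = {name: size * new_total // current_total for name, size in rs_sizes.items()}
    remainder = new_total - sum(shares.values())
    largest = max(rs_sizes, key=rs_sizes.get)
    shares[largest] += remainder
    return shares

# Old RS has 3 ready pods, new RS has 2 failing ones (5 running in total);
# scaling the Deployment from 5 running pods to 8:
print(proportional_scale({"old": 3, "new": 2}, 8))   # {'old': 5, 'new': 3}
```

The larger (old, healthy) ReplicaSet absorbs both its proportional share and the remainder, so a faulty new revision isn’t blindly scaled up.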

Pausing/Resuming rollouts

If you think you’ll need to make multiple changes over uncertain time intervals, you can pause the rollouts. After you finish all of your changes, you can resume. This prevents unnecessary rollouts from being triggered.
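
With kubectl, the pause/resume cycle looks like this (the deployment name and the queued changes are illustrative):

```shell
kubectl rollout pause deployment/web        # changes now accumulate without triggering rollouts
kubectl set image deployment/web web=nginx:1.26
kubectl set resources deployment/web --limits=cpu=200m
kubectl rollout resume deployment/web       # a single rollout picks up all queued changes
```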

Footnotes

  1. See k8s objects. Don’t confuse them with the low-level kind categories (Types (Kinds) Categories), i.e. there is no workload manager category. Regardless, they still qualify as Objects.

  2. Other managers are StatefulSets, DaemonSets, Jobs, CronJobs.

  3. See k8s controllers.

  4. By default 10.

  5. This is the default value for the field, ensuring that a rollout doesn’t exceed 25% surge.

  6. See HorizontalPodAutoscaler.