Zero-Downtime Deploys With Rolling Updates

"Zero-downtime deploy" is one of those phrases everyone claims and few actually achieve. The deploy completes, the dashboard turns green, and somewhere a handful of users got a 502 during the rollout that nobody noticed because the error rate blip lasted four seconds. Real zero-downtime is not a checkbox you enable. It is the result of several settings cooperating correctly.

Kubernetes gives you the machinery in the form of the rolling update strategy, but the defaults do not guarantee a clean rollout on their own. You need readiness probes that tell the truth, graceful shutdown that drains connections, and surge settings that match your capacity. Let us walk through how a rolling update actually works and what it takes to make one invisible.

How a rolling update works

A Deployment with the RollingUpdate strategy replaces old pods with new ones gradually rather than all at once. Kubernetes creates new pods, waits for them to become ready, then terminates old pods, repeating until the whole set is replaced. Two fields control the pace:

spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 0
  replicas: 8

maxSurge is how many pods above the desired count may exist during the rollout, so 25 percent of 8 lets two extra pods spin up. maxUnavailable is how many pods below the desired count may be missing. Setting maxUnavailable: 0 is the key to zero-downtime: it forces Kubernetes to bring up a new ready pod before taking down an old one, so your serving capacity never dips below the target. The cost is that you temporarily run extra pods, which is almost always worth it. The Kubernetes documentation on Deployments covers the full strategy reference.
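
Here is a small Python sketch that turns those two fields into pod counts. It follows the rounding rules in the Deployment documentation: a percentage maxSurge rounds up and a percentage maxUnavailable rounds down. The names resolve and rollout_bounds are hypothetical helpers, not part of any Kubernetes client.

```python
def resolve(value, replicas, round_up):
    """Convert an absolute count (int) or a percentage string like "25%" into pods."""
    if isinstance(value, str) and value.endswith("%"):
        scaled = int(value[:-1]) * replicas
        # Integer ceil/floor division avoids float rounding surprises.
        return -(-scaled // 100) if round_up else scaled // 100
    return int(value)


def rollout_bounds(replicas, max_surge, max_unavailable):
    """Return (max total pods, min available pods) during a rolling update."""
    surge = resolve(max_surge, replicas, round_up=True)               # maxSurge rounds up
    unavailable = resolve(max_unavailable, replicas, round_up=False)  # maxUnavailable rounds down
    if surge == 0 and unavailable == 0:
        # Neither adding nor removing a pod would be allowed, so nothing could progress.
        raise ValueError("maxSurge and maxUnavailable cannot both be zero")
    return replicas + surge, replicas - unavailable


print(rollout_bounds(8, "25%", 0))       # (10, 8): two extra pods, capacity never dips
print(rollout_bounds(10, "25%", "25%"))  # (13, 8): 2.5 rounds up to 3 and down to 2
```

The second call shows the trade-off. Once maxUnavailable is above zero, serving capacity can dip to 8 of 10 pods during the rollout, which is exactly what maxUnavailable: 0 rules out.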

Readiness is what makes it safe

The entire rolling update relies on Kubernetes knowing when a new pod can actually serve traffic. That signal is the readiness probe. Without one, Kubernetes considers a pod ready the instant its container process starts, which is almost never when the application is actually ready to handle requests. It might still be establishing database connections, warming caches, or loading configuration.

readinessProbe:
  httpGet:
    path: /readyz
    port: 8080
  periodSeconds: 5
  failureThreshold: 3

If you skip this, the rollout proceeds at full speed, routing traffic to pods that immediately return errors, and you ship an outage on purpose. The readiness probe is the gate that makes maxUnavailable: 0 meaningful. A new pod only counts toward capacity once it reports ready, so the old pod it replaces stays in service until the new one can genuinely take over.

A rolling update is only as safe as your readiness probe is honest. If /readyz returns 200 before the app can actually serve, Kubernetes will tear down old pods on a false signal and you will drop traffic during every deploy. Make readiness reflect true serving capability, including any dependencies the first request needs.
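
An honest /readyz depends on your stack. As one illustration, here is a minimal Python sketch that uses only the standard library. The endpoint answers 503 until a hypothetical warm_up() step sets a readiness flag, and 200 afterward. warm_up() stands in for opening database pools, warming caches, and loading configuration. Port 8080 and the /readyz path match the probe above, and everything else is assumed.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Set once the app can genuinely serve; until then the readiness probe fails.
ready = threading.Event()


def readyz_status():
    """HTTP status for the readiness probe: 503 keeps the pod out of Service endpoints."""
    return 200 if ready.is_set() else 503


def warm_up():
    # Hypothetical stand-in for connecting to the database, priming caches, etc.
    time.sleep(2)
    ready.set()


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/readyz":
            status = readyz_status()
            body = b"ready" if status == 200 else b"warming up"
        else:
            status, body = 200, b"hello"
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep probe traffic out of the logs


def serve(port=8080):
    # Listen right away; the probe reports 503 until warm_up() finishes.
    threading.Thread(target=warm_up, daemon=True).start()
    ThreadingHTTPServer(("", port), Handler).serve_forever()
```

Calling serve() starts the listener. Kubernetes sees 503 until initialization completes and marks the pod ready on the next successful probe.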

The shutdown side everyone forgets

Bringing up new pods cleanly is only half the rollout. The other half is taking old pods down without dropping their in-flight requests, and this is where most "zero-downtime" deploys quietly fail. When Kubernetes terminates a pod, two things happen nearly simultaneously: the pod is removed from Service endpoints, and the container receives SIGTERM. The problem is that endpoint removal propagates asynchronously across the cluster, so for a brief window, load balancers may still send new requests to a pod that has already started shutting down.

The fix is a preStop hook that delays shutdown long enough for endpoint removal to propagate, combined with an application that handles SIGTERM by finishing in-flight work before exiting:

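template:
  spec:
    terminationGracePeriodSeconds: 30   # pod level: budget for the whole shutdown
    containers:
      - name: api
        lifecycle:                      # container level
          preStop:
            exec:
              command: ["sh", "-c", "sleep 10"]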

Kubernetes runs the preStop hook before it sends SIGTERM. The sleep 10 therefore keeps the container serving normally for ten seconds while endpoint removal propagates and load balancers stop routing to it. terminationGracePeriodSeconds is the budget for the entire shutdown, preStop hook included. After the 10-second sleep, the application has roughly 20 seconds to drain existing connections before Kubernetes force-kills it with SIGKILL. The exec form assumes the image includes sh and sleep, and minimal images such as distroless may not. Your application code must also catch SIGTERM, stop accepting new connections, and let current requests complete. Miss any of these and you drop requests on every pod that gets replaced, which during a rollout is all of them.
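
The application half of this can be small. Below is a minimal Python sketch built on the standard library's ThreadingHTTPServer. It assumes the Python process receives SIGTERM directly, meaning it is the container's entrypoint rather than a child of a shell that might not forward signals. On SIGTERM it stops the accept loop, closes the listening socket, and then waits for in-flight request threads to finish. Handler and run_until_sigterm are illustrative names.

```python
import signal
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run_until_sigterm(server):
    """Serve until SIGTERM, then stop accepting and drain in-flight requests.

    Must be called from the main thread, since signal.signal requires it.
    """

    def on_sigterm(signum, frame):
        # shutdown() waits for serve_forever() to return, and serve_forever() is
        # running in this same main thread, so call it from a helper thread.
        threading.Thread(target=server.shutdown).start()

    signal.signal(signal.SIGTERM, on_sigterm)
    # With block_on_close (already the default for threading servers),
    # server_close() joins request-handler threads instead of abandoning them.
    server.block_on_close = True
    server.serve_forever()  # returns after shutdown(): no new connections accepted
    server.server_close()   # close the listener, then wait for in-flight requests
```

In a container entrypoint you would call run_until_sigterm(ThreadingHTTPServer(("", 8080), Handler)). The drain has no timeout of its own. terminationGracePeriodSeconds is what ultimately bounds it, via SIGKILL.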

Watching and controlling a rollout

Once configured, you trigger a rollout by changing the pod template, usually the image tag, and Kubernetes does the rest. You should watch it rather than assume it worked:

kubectl set image deployment/api api=registry.example.com/api:1.8.0
kubectl rollout status deployment/api

rollout status blocks until the rollout completes or exceeds its progress deadline, which makes it ideal as a gate in a CD pipeline. If the new pods never become ready, the command never returns success, and your pipeline fails loudly instead of declaring victory over a broken deploy. If something does go wrong, rollback is one command because Deployments keep revision history:

kubectl rollout undo deployment/api

This rolls back to the previous revision using the same rolling mechanism, so the rollback is itself zero-downtime. Tune progressDeadlineSeconds on the Deployment (the default is 600 seconds) so a stuck rollout is reported as failed within a bound that suits your pipeline. At that point rollout status exits with an error instead of waiting.
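
If your pipeline is scripted, the gate and the rollback can be wired together. Here is a minimal Python sketch that shells out to the two kubectl commands above, passing kubectl's --timeout flag as an extra bound on how long the pipeline waits. Rolling back automatically on failure is a design choice of this sketch, not something kubectl does for you, and rollout_gate is a hypothetical name.

```python
import subprocess


def rollout_gate(deployment, timeout="10m", run=subprocess.run):
    """Wait for a rollout; if it fails, roll back and report failure.

    `run` is injectable so the logic can be tested without a cluster.
    """
    target = f"deployment/{deployment}"
    # Exits non-zero if the rollout exceeds its progress deadline or the timeout.
    status = run(["kubectl", "rollout", "status", target, f"--timeout={timeout}"])
    if status.returncode == 0:
        return True
    # Roll back to the previous revision using the same rolling mechanism.
    run(["kubectl", "rollout", "undo", target], check=True)
    return False
```

A pipeline step might call rollout_gate("api") right after kubectl set image and fail the job when it returns False. Some teams prefer to stop at the failure and roll back by hand, in which case you would drop the undo call.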

When rolling updates are not enough

Rolling updates work beautifully when new and old versions can run side by side, which is most of the time. They fall apart when a release includes a backward-incompatible database migration, because during the rollout both versions are live at once and they must agree on the schema. The standard answer is the expand-and-contract pattern: deploy a schema change that both versions tolerate, roll out the new code, then remove the old columns in a later release. The rolling update mechanism is fine; the discipline has to come from how you sequence migrations.
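
The expand step is easiest to see with a toy schema. The Python sketch below uses an in-memory SQLite database and a hypothetical users table whose name column is being replaced by full_name. After the expand migration, old (v1) and new (v2) code can both write to the table during the rollout. The contract step appears only as a comment because it belongs to a later release.

```python
import sqlite3

# Hypothetical schema: users.name is being replaced by users.full_name.


def expand(db):
    # Release N: additive, nullable column that v1 code simply ignores.
    db.execute("ALTER TABLE users ADD COLUMN full_name TEXT")


def insert_user_v1(db, name):
    db.execute("INSERT INTO users (name) VALUES (?)", (name,))


def insert_user_v2(db, name):
    # New code dual-writes so pods still running v1 keep seeing valid rows.
    db.execute("INSERT INTO users (name, full_name) VALUES (?, ?)", (name, name))


def read_names_v2(db):
    # Tolerates rows written by v1 pods during the rollout, before the backfill.
    rows = db.execute("SELECT COALESCE(full_name, name) FROM users ORDER BY id")
    return [row[0] for row in rows]


def backfill(db):
    db.execute("UPDATE users SET full_name = name WHERE full_name IS NULL")


# Contract, in a later release once no running code reads or writes users.name:
# drop the old column in its own migration.

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
expand(db)                   # schema change both versions tolerate
insert_user_v1(db, "ada")    # an old pod still serving during the rollout
insert_user_v2(db, "grace")  # a new pod
print(read_names_v2(db))     # ['ada', 'grace']
backfill(db)
```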

For releases where you want to validate the new version on real traffic before committing, rolling updates also do not give you a clean cutover or instant rollback the way blue-green or canary strategies do. Those are worth reaching for when the blast radius justifies the extra machinery, but for the vast majority of deploys, a correctly configured rolling update with honest readiness probes and graceful shutdown is all you need to make deploys boring.

Takeaways

Set maxUnavailable: 0 with a positive maxSurge so new ready pods come up before old ones are removed.
A rolling update is only as safe as its readiness probe; make /readyz reflect true serving capability.
Add a preStop hook so endpoint removal propagates before your app receives SIGTERM and starts shutting down.
Handle SIGTERM in your app to drain connections, and size terminationGracePeriodSeconds to cover both the preStop delay and the drain.
Gate CD on kubectl rollout status and keep rollout undo ready for a zero-downtime rollback.
Sequence backward-incompatible migrations with expand-and-contract, since both versions run during the rollout.
