Zero-Downtime Deploys With Rolling Updates
"Zero-downtime deploy" is one of those phrases everyone claims and few actually achieve. The deploy completes, the dashboard turns green, and somewhere a handful of users got a 502 during the rollout that nobody noticed because the error rate blip lasted four seconds. Real zero-downtime is not a checkbox you enable. It is the result of several settings cooperating correctly.
Kubernetes gives you the machinery in the form of the rolling update strategy, but the defaults do not guarantee a clean rollout on their own. You need readiness probes that tell the truth, graceful shutdown that drains connections, and surge settings that match your capacity. Let us walk through how a rolling update actually works and what it takes to make one invisible.
How a rolling update works
A Deployment with the RollingUpdate strategy replaces old pods with new ones gradually rather than all at once. Kubernetes creates new pods, waits for them to become ready, then terminates old pods, repeating until the whole set is replaced. Two fields control the pace:

    spec:
      strategy:
        type: RollingUpdate
        rollingUpdate:
          maxSurge: 25%
          maxUnavailable: 0
      replicas: 8

maxSurge is how many pods above the desired count may exist during the rollout, so 25 percent of 8 lets two extra pods spin up. maxUnavailable is how many pods below the desired count may be missing. Setting maxUnavailable: 0 is the key to zero-downtime: it forces Kubernetes to bring up a new ready pod before taking down an old one, so your serving capacity never dips below the target. The cost is that you temporarily run extra pods, which is almost always worth it. The Kubernetes documentation on Deployments covers the full strategy reference.
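To make the arithmetic concrete, here is a small sketch that turns the two settings into the pod-count bounds Kubernetes enforces during a rollout. The function names are illustrative and not part of any Kubernetes client. The rounding follows the documented behavior: a percentage maxSurge rounds up, and a percentage maxUnavailable rounds down.

```python
def resolve(value, replicas, round_up):
    """Turn an int or a percentage string such as "25%" into a pod count."""
    if isinstance(value, str) and value.endswith("%"):
        scaled = replicas * int(value[:-1])
        # Integer arithmetic avoids float rounding surprises.
        return -(-scaled // 100) if round_up else scaled // 100
    return int(value)


def rollout_bounds(replicas, max_surge, max_unavailable):
    """Return the most pods that may exist and the fewest that must be available."""
    surge = resolve(max_surge, replicas, round_up=True)
    unavailable = resolve(max_unavailable, replicas, round_up=False)
    return {
        "max_total_pods": replicas + surge,
        "min_available_pods": replicas - unavailable,
    }


print(rollout_bounds(8, "25%", 0))
# {'max_total_pods': 10, 'min_available_pods': 8}
```

With the settings shown earlier, the rollout may run up to 10 pods but never fewer than 8 available ones, which is why capacity holds steady.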
Readiness is what makes it safe
The entire rolling update relies on Kubernetes knowing when a new pod can actually serve traffic. That signal is the readiness probe. Without one, Kubernetes considers a pod ready the instant its container process starts, which is almost never when the application is actually ready to handle requests. It might still be establishing database connections, warming caches, or loading configuration.

    readinessProbe:
      httpGet:
        path: /readyz
        port: 8080
      periodSeconds: 5
      failureThreshold: 3

If you skip this, the rollout proceeds at full speed, routing traffic to pods that immediately return errors, and you ship an outage on purpose. The readiness probe is the gate that makes maxUnavailable: 0 meaningful. A new pod only counts toward capacity once it reports ready, so the old pod it replaces stays in service until the new one can genuinely take over.
A rolling update is only as safe as your readiness probe is honest. If /readyz returns 200 before the app can actually serve, Kubernetes will tear down old pods on a false signal and you will drop traffic during every deploy. Make readiness reflect true serving capability, including any dependencies the first request needs.
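One way an honest /readyz might look on the application side is sketched below using only the Python standard library. It is an illustration, not a prescribed implementation. The CHECKS registry and the check names are hypothetical; the idea is that the endpoint returns 503 until every dependency the first request needs reports usable.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical registry: name -> callable returning True once that
# dependency (database pool, cache, config) is usable.
CHECKS = {}


def readiness(checks):
    """Return (status_code, failing_names) for a mapping of checks."""
    failing = []
    for name, check in checks.items():
        try:
            ok = bool(check())
        except Exception:
            ok = False  # a check that raises counts as not ready
        if not ok:
            failing.append(name)
    return (200 if not failing else 503), failing


class ReadyzHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/readyz":
            self.send_error(404)
            return
        status, failing = readiness(CHECKS)
        text = "ok" if status == 200 else "not ready: " + ", ".join(failing)
        body = text.encode()
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Serve /readyz on the port the probe targets (blocks forever)."""
    HTTPServer(("", port), ReadyzHandler).serve_forever()
```

An application would register something like CHECKS["db"] = pool_is_connected during startup (a hypothetical function) and call serve(8080) in a background thread. Keep the checks cheap, since the probe runs every few seconds.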
The shutdown side everyone forgets
Bringing up new pods cleanly is only half the rollout. The other half is taking old pods down without dropping their in-flight requests, and this is where most "zero-downtime" deploys quietly fail. When Kubernetes terminates a pod, two things happen nearly simultaneously: the pod is removed from Service endpoints, and the container receives SIGTERM. The problem is that endpoint removal propagates asynchronously across the cluster, so for a brief window, load balancers may still send new requests to a pod that has already started shutting down.
The fix is a preStop hook that delays shutdown long enough for endpoint removal to propagate, combined with an application that handles SIGTERM by finishing in-flight work before exiting:

    # container-level field
    lifecycle:
      preStop:
        exec:
          command: ["sh", "-c", "sleep 10"]
    # pod-level field, a sibling of containers
    terminationGracePeriodSeconds: 30

The sleep 10 runs before the container receives SIGTERM, so the process keeps serving while endpoint removal propagates and the load balancer stops routing to it before shutdown begins. terminationGracePeriodSeconds caps the whole sequence: the 30 seconds start counting when termination begins and include the preStop sleep, which leaves the application roughly 20 seconds to drain existing connections before Kubernetes force-kills it with SIGKILL. Your application code must also catch SIGTERM and stop accepting new connections while letting current requests complete. Miss any of these and you drop requests on every single pod that gets replaced, which during a rollout is all of them.
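The application half of this can be sketched in a few lines. The GracefulShutdown class below is a hypothetical helper, not a library API: on SIGTERM it stops admitting new requests, then waits for in-flight ones to finish within a deadline that should sit comfortably inside the grace period.

```python
import signal
import threading


class GracefulShutdown:
    """Track in-flight requests and refuse new ones after SIGTERM."""

    def __init__(self):
        self.stopping = False
        self._in_flight = 0
        self._lock = threading.Lock()
        self._idle = threading.Condition(self._lock)

    def install(self):
        # Must be called from the main thread.
        signal.signal(signal.SIGTERM, self._on_sigterm)

    def _on_sigterm(self, signum, frame):
        # Only flip a flag here; taking locks in a signal handler risks deadlock.
        self.stopping = True

    def try_begin_request(self):
        """Return True if the request may proceed, False once shutting down."""
        with self._lock:
            if self.stopping:
                return False
            self._in_flight += 1
            return True

    def end_request(self):
        with self._idle:
            self._in_flight -= 1
            if self._in_flight == 0:
                self._idle.notify_all()

    def wait_for_drain(self, timeout):
        """Block until no requests are in flight; False if the timeout hits."""
        with self._idle:
            return self._idle.wait_for(lambda: self._in_flight == 0, timeout)
```

A server loop would call try_begin_request() before handling each request and end_request() afterwards. After SIGTERM it would close its listening socket and call wait_for_drain(15), a value chosen here only to fit inside the remaining grace period from the example above.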
Watching and controlling a rollout
Once configured, you trigger a rollout by changing the pod template, usually the image tag, and Kubernetes does the rest. You should watch it rather than assume it worked:

    kubectl set image deployment/api api=registry.example.com/api:1.8.0
    kubectl rollout status deployment/api

rollout status blocks until the rollout completes or stalls, which makes it ideal as a gate in a CD pipeline: if the new pods never become ready, the command never returns success and your pipeline fails loudly instead of declaring victory over a broken deploy. If something does go wrong, rollback is one command because Deployments keep revision history:

    kubectl rollout undo deployment/api

This reverts to the previous revision using the same rolling mechanism, so the rollback is itself zero-downtime. Pair this with a progressDeadlineSeconds on the Deployment, which defaults to 600 seconds, tuned to how long your pods realistically take to become ready, so a stuck rollout is reported as failed within a bounded time you chose instead of holding the pipeline for ten minutes.
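A pipeline gate built from those two commands might look like the sketch below. The gate_rollout function and its injectable runner are illustrative assumptions, not part of any tool. The kubectl subcommands and the --timeout flag on rollout status are real, and the default timeout value is arbitrary.

```python
import subprocess


def run(cmd):
    """Run a command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def gate_rollout(deployment, timeout="300s", runner=run):
    """Wait for a rollout; undo it and return False if it does not succeed."""
    target = f"deployment/{deployment}"
    status = runner(["kubectl", "rollout", "status", target, f"--timeout={timeout}"])
    if status == 0:
        return True
    # The rollout failed or timed out: go back to the previous revision.
    runner(["kubectl", "rollout", "undo", target])
    return False
```

A CD step could call gate_rollout("api") and exit non-zero when it returns False. Passing a fake runner makes the logic testable without a cluster.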
When rolling updates are not enough
Rolling updates work beautifully when new and old versions can run side by side, which is most of the time. They fall apart when a release includes a backward-incompatible database migration, because during the rollout both versions are live at once and they must agree on the schema. The standard answer is the expand-and-contract pattern: deploy a schema change that both versions tolerate, roll out the new code, then remove the old columns in a later release. The rolling update mechanism is fine; the discipline has to come from how you sequence migrations.
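As a toy illustration of the expand phase, the sketch below uses an in-memory SQLite database. The users table and the name and display_name columns are hypothetical. The point is that after the additive change, inserts from the old code still succeed while the new code dual-writes and reads with a fallback, so both versions can run during the rollout.

```python
import sqlite3


def expand(conn):
    """Additive, nullable change that old code can safely ignore."""
    conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")


def backfill(conn):
    """Copy existing values so the old column can be dropped in a later release."""
    conn.execute("UPDATE users SET display_name = name WHERE display_name IS NULL")


def old_insert(conn, name):
    # Old version: knows nothing about display_name.
    conn.execute("INSERT INTO users (name) VALUES (?)", (name,))


def new_insert(conn, name):
    # New version: writes both columns while old pods are still live.
    conn.execute(
        "INSERT INTO users (name, display_name) VALUES (?, ?)", (name, name)
    )


def new_read(conn, user_id):
    # New version: prefers the new column, falls back to the old one.
    row = conn.execute(
        "SELECT COALESCE(display_name, name) FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return row[0]
```

The contract step, dropping the old column, belongs in a later release once no running version reads or writes it.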
For releases where you want to validate the new version on real traffic before committing, rolling updates also do not give you a clean cutover or instant rollback the way blue-green or canary strategies do. Those are worth reaching for when the blast radius justifies the extra machinery, but for the vast majority of deploys, a correctly configured rolling update with honest readiness probes and graceful shutdown is all you need to make deploys boring.
Takeaways
- Set maxUnavailable: 0 with a positive maxSurge so new ready pods come up before old ones are removed.
- A rolling update is only as safe as its readiness probe; make /readyz reflect true serving capability.
- Add a preStop hook so endpoint removal propagates before the process exits, preventing dropped in-flight requests.
- Handle SIGTERM in your app to drain connections, and size terminationGracePeriodSeconds to match.
- Gate CD on kubectl rollout status and keep rollout undo ready for a zero-downtime rollback.
- Sequence backward-incompatible migrations with expand-and-contract, since both versions run during the rollout.

