
Configure health checks

5 min read · Updated April 2026

Health checks are Kubernetes’ mechanism for keeping your application reliable without operator intervention. StackBlaze configures two probe types from your dashboard: a readiness probe that controls whether a pod receives traffic, and a liveness probe that controls whether a pod is restarted.

Getting these right is one of the highest-impact steps toward a production-grade service. Well-configured health checks let deploys roll out without dropping traffic and let hung processes recover automatically, so fewer failures reach your users.

Readiness vs liveness: what's the difference?

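| Probe     | What happens on failure                                        | Use for                                                     |
|-----------|----------------------------------------------------------------|-------------------------------------------------------------|
| Readiness | Pod removed from load balancer endpoints, no traffic sent      | App not yet warm, DB migration running, cache loading       |
| Liveness  | Container is killed and restarted with a fresh process         | Deadlock, infinite loop, frozen event loop, process left unresponsive by memory pressure |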

Express.js, /health endpoint

src/health.ts

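import express from 'express'
import db from './db'

// Created here so the snippet is self-contained; reuse your existing app instance if you have one
const app = express()

// Returns 200 when the database answers, 503 otherwise
app.get('/health', async (_req, res) => {
  try {
    // Verify the DB is reachable; fail fast if not
    await db.raw('SELECT 1')
    res.json({ status: 'ok', uptime: process.uptime() })
  } catch {
    res.status(503).json({ status: 'error' })
  }
})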

FastAPI, /health endpoint

main.py

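from fastapi import FastAPI, Response, status
from sqlalchemy import text
from sqlalchemy.exc import SQLAlchemyError

from .database import engine

app = FastAPI()

@app.get("/health")
def health(response: Response):
    # Plain def (not async): FastAPI runs it in a threadpool, so the
    # blocking SQLAlchemy call does not stall the event loop.
    try:
        with engine.connect() as conn:
            conn.execute(text("SELECT 1"))
    except SQLAlchemyError:
        # Report an explicit 503 instead of an unhandled 500
        response.status_code = status.HTTP_503_SERVICE_UNAVAILABLE
        return {"status": "error"}
    return {"status": "ok"}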

Traffic routing with health checks

[Diagram: the load balancer routes to passing pods only.]

  • Pod 1, ready: GET /health → 200, receiving traffic
  • Pod 2, ready: GET /health → 200, receiving traffic
  • Pod 3, restarting: GET /health → 503, no traffic sent

Under the hood

StackBlaze injects both probes into the pod spec using the values you configure in the Health tab:

  • readinessProbe: an httpGet to your configured path, with failureThreshold: 3 and periodSeconds: 10 by default. After failureThreshold consecutive failures the pod is removed from Service endpoints; it is re-added once the probe passes again.
  • livenessProbe: the same httpGet mechanism, but with a longer initial delay (initialDelaySeconds: 30) so the pod has time to start. After failureThreshold consecutive failures, the kubelet kills the container and restarts it.
  • Rolling deploys wait for readiness: during a rolling update, a new pod counts as available only once it passes its readiness probe and stays ready for minReadySeconds, and old pods are terminated only as new ones become available. This is what makes zero-downtime deploys possible.
  • startupProbe (advanced): for slow-starting apps (e.g. JVM warm-up), enable the startup probe in advanced settings. Kubernetes holds off liveness and readiness checks until the startup probe passes, preventing premature restarts during initialisation.
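
To make the probe settings above concrete, here is a minimal Python sketch that builds dictionaries shaped like Kubernetes HTTP probes. The field names (httpGet, periodSeconds, failureThreshold, timeoutSeconds, initialDelaySeconds) are standard Kubernetes probe fields and the defaults come from this page; the helper name http_probe and port 8080 are assumptions for illustration, and the exact spec StackBlaze generates may differ.

```python
import json


def http_probe(path, port, *, period=10, failure_threshold=3,
               timeout=5, initial_delay=10):
    """Return a dict shaped like a Kubernetes HTTP probe (illustrative helper)."""
    return {
        "httpGet": {"path": path, "port": port},
        "initialDelaySeconds": initial_delay,
        "periodSeconds": period,
        "timeoutSeconds": timeout,
        "failureThreshold": failure_threshold,
    }


# Port 8080 is an assumption; use whatever port your service listens on.
container_probes = {
    "readinessProbe": http_probe("/health", 8080),
    # Liveness uses the longer initial delay described above.
    "livenessProbe": http_probe("/health", 8080, initial_delay=30),
}

print(json.dumps(container_probes, indent=2))
```

Keeping both probes behind one helper makes it obvious that they differ only in the initial delay unless you override other fields.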

Step by step

01. Add a /health endpoint to your app

Create a lightweight HTTP endpoint that returns 200 OK when your service is ready to accept traffic. The endpoint should verify that critical dependencies (database connections, caches) are reachable. Kubernetes treats any status code from 200 to 399 as a pass; any other status (including 4xx and 5xx) or a timeout marks the probe as failed.
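
A health endpoint that hangs on a slow dependency turns into a probe timeout, so it can help to bound every dependency check. The following framework-agnostic sketch shows one way to do that with the standard library; run_health_checks and the check names are hypothetical, and the 2 second budget is an assumption chosen to stay under the 5s probe timeout.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def run_health_checks(checks, timeout_s=2.0):
    """Run each check callable in parallel under one shared deadline.

    Returns (http_status, body): 200 if every check finished without
    raising before the deadline, 503 otherwise.
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(checks)))
    futures = {name: pool.submit(fn) for name, fn in checks.items()}
    deadline = time.monotonic() + timeout_s
    results = {}
    for name, future in futures.items():
        remaining = max(0.0, deadline - time.monotonic())
        try:
            future.result(timeout=remaining)
            results[name] = "ok"
        except FutureTimeout:
            results[name] = "timeout"
        except Exception:
            results[name] = "error"
    # Do not block the response on checks that are still running.
    pool.shutdown(wait=False)
    healthy = all(state == "ok" for state in results.values())
    return (200 if healthy else 503), {
        "status": "ok" if healthy else "error",
        "checks": results,
    }


def check_database():
    pass  # placeholder: run something like SELECT 1 here


status_code, body = run_health_checks({"db": check_database})
print(status_code, body)
```

Your route handler would return the status code and body from this function, whatever web framework you use.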

02. Configure the health check in the dashboard

Go to your service → Health tab. Set the path (e.g. /health), the check interval (default: 10s), and the failure threshold (default: 3 consecutive failures before action is taken). You can configure readiness and liveness probes independently with different paths and intervals.
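
The interval and failure threshold together decide how quickly a problem is acted on. As a rough estimate (ignoring scheduling jitter, and assuming each failing probe runs until it times out), the worst-case delay between a pod breaking and Kubernetes acting is about period × threshold + timeout. The function name below is hypothetical; the default values are the ones from this page.

```python
def worst_case_detection_seconds(period_s=10, failure_threshold=3, timeout_s=5):
    """Approximate upper bound on time from failure to action.

    Worst case: the pod breaks just after a successful probe, so the next
    `failure_threshold` probes (one per period) must all fail, and the last
    one takes the full timeout before it is counted.
    """
    return period_s * failure_threshold + timeout_s


print(worst_case_detection_seconds())                 # 35
print(worst_case_detection_seconds(period_s=5))       # 20
```

Shortening the period reacts faster at the cost of more probe traffic; lowering the threshold reacts faster at the cost of more false positives from transient blips.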

03. StackBlaze routes traffic only to passing pods

Kubernetes removes pods that fail their readiness probe from the Service's endpoint list, and traffic is automatically redistributed to the healthy pods. New requests never reach the failing pod, so end users generally see no errors; only requests already in flight to that pod may be affected. During a rolling deploy, new pods must pass their readiness probe before old pods are terminated.
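
The routing behaviour described here can be sketched as a small simulation. This is a simplified model, not how kube-proxy is implemented: it treats a pod's latest /health status as its readiness (skipping the failure threshold) and uses plain round-robin. The Pod class and route function are hypothetical names.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import List


@dataclass
class Pod:
    name: str
    health_status: int  # last HTTP status returned by GET /health


def is_ready(pod: Pod) -> bool:
    # HTTP probes count any status from 200 to 399 as success.
    return 200 <= pod.health_status < 400


def route(pods: List[Pod], n_requests: int) -> List[str]:
    """Distribute requests round-robin across ready pods only."""
    ready = [p for p in pods if is_ready(p)]
    if not ready:
        return []  # no endpoints: nothing can be served
    rotation = cycle(ready)
    return [next(rotation).name for _ in range(n_requests)]


pods = [Pod("pod-1", 200), Pod("pod-2", 200), Pod("pod-3", 503)]
print(route(pods, 4))  # ['pod-1', 'pod-2', 'pod-1', 'pod-2']
```

Note that pod-3 never appears in the output: it stays running, but it is invisible to traffic until its probe passes again.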

04. Failed liveness probe triggers automatic restart

While the readiness probe gates traffic, the liveness probe gates container survival. If the liveness probe fails on consecutive checks until the configured failure threshold is reached, the kubelet kills the container and restarts it. This self-heals frozen or deadlocked processes without any manual intervention.
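
The word "consecutive" matters: a single successful probe resets the failure count. The sketch below models that counting rule on a list of probe outcomes; restart_points is a hypothetical helper, and real kubelet behaviour also involves restart back-off, which is left out here.

```python
from typing import Iterable, List


def restart_points(probe_results: Iterable[bool],
                   failure_threshold: int = 3) -> List[int]:
    """Return the probe indexes at which a restart would be triggered."""
    restarts = []
    consecutive_failures = 0
    for index, passed in enumerate(probe_results):
        if passed:
            consecutive_failures = 0  # any success resets the streak
            continue
        consecutive_failures += 1
        if consecutive_failures >= failure_threshold:
            restarts.append(index)
            consecutive_failures = 0  # the fresh container starts clean
    return restarts


# Two failures, a recovery, then three failures in a row.
history = [True, False, False, True, False, False, False, True]
print(restart_points(history))  # [6]
```

The first two failures never cause a restart because the success at index 3 breaks the streak; only the run of three failures ending at index 6 does.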

Dashboard configuration reference

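| Field             | Default | Description                                          |
|-------------------|---------|------------------------------------------------------|
| Path              | /health | HTTP GET path for both probes                        |
| Period            | 10s     | How often the probe runs                             |
| Failure threshold | 3       | Consecutive failures before action is taken          |
| Timeout           | 5s      | How long to wait for a response                      |
| Initial delay     | 10s     | Seconds after container start before probing begins  |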