StackBlaze manages a Kubernetes HorizontalPodAutoscaler (HPA) for every service with autoscaling enabled. The HPA continuously monitors CPU and memory metrics from the cluster's Metrics Server and adjusts the Deployment's replica count to match actual demand.
Scale-up is fast: new pods typically start within 30 seconds. Scale-down is intentionally slow: the HPA waits for 5 minutes of below-threshold utilization before removing pods, which prevents thrashing during bursty traffic.
Autoscaling architecture
Autoscaling configuration
Minimum replicas: 2. Always running, even at zero traffic.
Maximum replicas: 10. Hard ceiling on scale-out.
Target CPU threshold: 60%. Scale up when average CPU exceeds this.
Monthly spend cap: $150 / mo. Pause scaling when limit reached.
Scale-up event log
HPA event log, my-web-service
14:02:31 HPA metrics collected: CPU avg 42% across 2 replicas
14:07:14 HPA metrics collected: CPU avg 58% across 2 replicas
14:08:01 HPA metrics collected: CPU avg 78% across 2 replicas
14:08:48 HPA: all 4 replicas healthy, load balancer updated
14:09:02 HPA metrics collected: CPU avg 41% across 4 replicas
Under the hood
HorizontalPodAutoscaler: a Kubernetes-native HPA resource targets your Deployment. It polls the Metrics Server every 15 seconds and uses a proportional algorithm to calculate the ideal replica count: ceil(currentReplicas * currentMetric / desiredMetric).
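As a rough illustration of the proportional rule, the sketch below computes a replica count from the current count and the observed and target metric values. The function name and its parameters are hypothetical; the defaults of 2 and 10 replicas are taken from the example configuration on this page, and the 10% tolerance is the upstream Kubernetes default, assumed here to be unchanged by StackBlaze.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 2,
                     max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Proportional rule: ceil(currentReplicas * currentMetric / desiredMetric)."""
    ratio = current_metric / target_metric
    # Upstream Kubernetes skips scaling when the ratio is close to 1.0
    # (assumption: StackBlaze keeps the default tolerance of 0.1).
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    wanted = math.ceil(current_replicas * ratio)
    # Clamp the result to the configured minimum and maximum replicas.
    return max(min_replicas, min(max_replicas, wanted))
```

With a 60% target, 2 replicas averaging 90% CPU give a ratio of 1.5, so the rule asks for 3 replicas. A result above the configured maximum is clamped to that ceiling.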
Metrics Server: a lightweight aggregator that collects CPU and memory usage from each node's kubelet. StackBlaze keeps it running on every cluster. Custom metrics (requests/second, queue depth) are available on enterprise plans via the Prometheus adapter.
Scale-down stabilisation: the HPA uses a 5-minute stabilisation window before removing pods. This prevents flapping when traffic is bursty. Scale-up has no delay: it reacts immediately to protect user experience.
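One way to picture the stabilisation window: the autoscaler remembers every recommendation from the last 5 minutes and only scales down to the highest of them. The sketch below models that behaviour under the assumption that StackBlaze uses the standard Kubernetes logic; the class name and method are hypothetical, and timestamps are plain seconds supplied by the caller.

```python
from collections import deque


class ScaleDownStabilizer:
    """Delay scale-down until every recent recommendation agrees."""

    def __init__(self, window_seconds: float = 300.0):
        self.window_seconds = window_seconds
        self._history = deque()  # (timestamp, recommended replicas)

    def recommend(self, now: float, desired: int, current: int) -> int:
        self._history.append((now, desired))
        # Forget recommendations that are older than the window.
        while self._history[0][0] < now - self.window_seconds:
            self._history.popleft()
        if desired >= current:
            # Scale-up (or no change) is applied immediately.
            return desired
        # Scale-down: never go below the highest recent recommendation.
        highest = max(replicas for _, replicas in self._history)
        return min(current, highest)
```

A single traffic spike therefore keeps the extra replicas alive for the full window, even if the very next reading is back under the threshold.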
Pod Disruption Budget: StackBlaze automatically creates a PDB ensuring at least 50% of replicas remain available during node drains and cluster upgrades. Your service stays up even during maintenance windows.
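To see what a 50% budget means for a given replica count, here is a small sketch. It assumes the PDB is expressed as a percentage of minimum available pods, which this page does not confirm, and relies on the Kubernetes behaviour of rounding a percentage up to a whole pod. The function name is hypothetical.

```python
import math


def pdb_budget(healthy_replicas: int, min_available_percent: int = 50) -> dict:
    """Return how many pods must stay up and how many may be evicted."""
    # Kubernetes rounds a percentage-based minimum up to a whole pod.
    min_available = math.ceil(healthy_replicas * min_available_percent / 100)
    allowed = max(0, healthy_replicas - min_available)
    return {"min_available": min_available, "allowed_disruptions": allowed}
```

Under these assumptions a single-replica service allows no voluntary evictions at all, which is one more reason to run at least 2 replicas in production.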
Step by step
01
Set minimum and maximum replicas
In the StackBlaze dashboard, open your service and go to the "Scaling" tab. Set a minimum replica count (we recommend at least 2 for production to survive a node failure) and a maximum that fits your budget and expected load.
02
Set a CPU or memory threshold
Choose the metric that best predicts load for your service type. CPU works well for compute-bound services (API servers, workers). Memory is better for data-heavy workloads. StackBlaze defaults to 60% CPU utilization: a scale-up triggers when average usage across all pods exceeds the threshold.
03
Optionally set a spend cap
Enable the monthly spend cap to prevent runaway scaling costs. When your service hits the cap, scaling pauses and you receive an alert. From the dashboard you can raise the cap or investigate the traffic spike, so you never face a surprise bill.
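When choosing a cap, it can help to estimate how many replicas the cap could sustain for a whole month. The sketch below is a back-of-the-envelope calculation only: the per-replica hourly price is a hypothetical input (this page does not state StackBlaze pricing), 730 hours is an approximate month, and the pause check simply mirrors the rule described above rather than the platform's actual billing logic.

```python
HOURS_PER_MONTH = 730  # approximate length of an average month


def max_affordable_replicas(spend_cap: float, price_per_replica_hour: float) -> int:
    """Replicas that could run all month without exceeding the cap."""
    if price_per_replica_hour <= 0:
        raise ValueError("price_per_replica_hour must be positive")
    monthly_cost_per_replica = price_per_replica_hour * HOURS_PER_MONTH
    return int(spend_cap // monthly_cost_per_replica)


def scaling_paused(spend_so_far: float, spend_cap: float) -> bool:
    """Mirror of the documented rule: scaling pauses once the cap is hit."""
    return spend_so_far >= spend_cap
```

For example, with a $150 cap and a made-up price of $0.03 per replica-hour, 6 replicas could run continuously. If your maximum replica count is higher than that, sustained peak load would reach the cap before the month ends.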
04
Test with a load generator
Use hey, k6, or Locust to send sustained load to your service. Watch the "Scaling" tab in real time as the HPA adds replicas. Check the event log to see exactly when scale-up and scale-down events fired and what metric triggered them.
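If none of those tools is at hand, a few lines of standard-library Python can produce enough concurrent traffic for a first test. This is a minimal sketch, not a replacement for a real load tool: the function names are hypothetical, and you supply your own service URL.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def http_get(url: str, timeout: float = 5.0) -> int:
    """Send one GET request and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def run_load(send, total_requests: int, concurrency: int) -> dict:
    """Call send() total_requests times across concurrency worker threads."""

    def one_call(_):
        try:
            send()
            return True
        except Exception:
            # Timeouts, connection errors and HTTP errors count as failures.
            return False

    started = time.monotonic()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(one_call, range(total_requests)))
    elapsed = time.monotonic() - started
    ok = sum(results)
    return {"ok": ok, "failed": len(results) - ok, "seconds": elapsed}
```

Calling run_load(lambda: http_get(url), total_requests=2000, concurrency=50) with your own service URL sends 2000 requests from 50 threads. Run it in a loop for several minutes so average CPU stays above the threshold long enough for the HPA to react.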