Kubernetes Cost Optimization: The Essential Guide to Cutting Cloud Spend in 2026

Kubernetes cost optimization is the problem most teams tackle too late, after the AWS invoice lands at twice the expected amount and the CFO is asking questions that engineering cannot answer cleanly. The instinct at that point is to look for a single expensive resource to cut. The reality is almost always different: 32-40% of cloud budget is wasted on idle resources and over-provisioned pods, distributed invisibly across dozens of workloads, accumulating quarter after quarter without any single obvious culprit.
This guide covers Kubernetes cost optimization with the specificity that generic FinOps guides skip: why Kubernetes makes cost visibility structurally difficult, the exact commands and configuration to rightsize pods, the Spot instance strategy that saves 60-90% on appropriate workloads, the tools (Kubecost, CAST AI, Infracost) that provide visibility at the namespace and service level, the namespace scheduling pattern that eliminates 70% of non-production waste, and the AI workload cost model that is making bills spike in 2026 for teams running ML pipelines on Kubernetes.

Why Kubernetes Makes Cost Visibility Hard by Default

Standard cloud billing tools (AWS Cost Explorer, GCP Cost Management, Azure Cost Analysis) show cluster cost as a single line item. They tell you that your EKS cluster cost €18,000 last month. They do not tell you which namespace, which deployment, or which team generated that cost.
This is the structural problem that makes Kubernetes cost optimization harder than EC2 cost optimization. On EC2, each instance has a predictable cost attributable to a specific workload. On Kubernetes, a node running 20 pods from four different teams generates a single invoice line.
Allocating that cost to specific services requires knowing the resource requests of each pod relative to the total node capacity, a calculation that no cloud billing tool performs by default.
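
A rough version of that allocation is possible with kubectl and jq alone. The sketch below sums CPU requests per namespace; dividing each namespace's total by cluster capacity approximates its slice of the node bill. It is CPU-only and request-based, so treat it as a first-pass signal, not an invoice:

# Sum CPU requests per namespace (millicores) - a first-pass allocation signal
kubectl get pods --all-namespaces -o json | jq -r '
  def millicores: if . == null then 0
    elif (type == "string") and endswith("m") then (rtrimstr("m") | tonumber)
    else (tonumber * 1000) end;
  [.items[] | {ns: .metadata.namespace,
               cpu: ([.spec.containers[].resources.requests.cpu // "0" | millicores] | add)}]
  | group_by(.ns)
  | map({namespace: .[0].ns, requested_millicores: (map(.cpu) | add)})
  | sort_by(-.requested_millicores)[]
  | "\(.namespace)\t\(.requested_millicores)"'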

The consequence: teams running Kubernetes without dedicated cost visibility tooling are making optimization decisions based on guesswork. They do not know whether the expensive service is Payments or the Data Pipeline; they only know the total cluster cost. Without allocation, there is nothing to optimize.

The three cost drivers to instrument first:

CPU and memory requests are what Kubernetes uses to schedule pods onto nodes, and they are what you pay for, not actual usage. A pod requesting 2 CPU and 4 GB RAM reserves that capacity on a node whether it uses 10% of it or 100%. The delta between requests and actual usage is waste.

Node utilization is the cluster-level signal. A cluster running at 40% average node utilization means you are paying for 60% of your compute to sit idle. The correct target for production clusters is 60-70% utilization, leaving headroom for traffic spikes without significant idle waste (a one-command check follows this list).

Idle namespaces are non-production environments (dev, staging, QA) that run 24/7 but are only used during business hours. A staging cluster running 168 hours per week but used 50 hours per week wastes 70% of its compute cost. This is the single highest-impact fix in Kubernetes cost optimization and the one that requires the least technical complexity.
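
For the node utilization signal, metrics-server gives a one-command readout; nodes sitting consistently far below the 60-70% band are paid-for idle capacity:

# Per-node CPU and memory utilization (requires metrics-server)
kubectl top nodes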

Kubernetes Cost Optimization: Rightsizing Pods

Over-provisioned resource requests are the largest source of Kubernetes waste at most startups. An engineer sets requests.cpu: 1000m and requests.memory: 2Gi for a service that is actually running at 150m CPU and 400Mi memory because it felt safe at deployment time and nobody revisited it.

Multiply this across 30 services and the over-provisioning compounds to thousands of euros per month in reserved-but-unused capacity.

Step 1: Measure actual usage against requests:

# Current requests across all pods
kubectl get pods --all-namespaces -o json | jq '
  .items[] |
  {
    namespace: .metadata.namespace,
    name: .metadata.name,
    requests: .spec.containers[].resources.requests
  }
'

# Actual usage vs requests (requires metrics-server)
kubectl top pods --all-namespaces --sort-by=cpu

# Rank pods by CPU usage, highest first - compare against the requests
# output above to spot pods using a small fraction of what they reserve
kubectl top pods --all-namespaces --no-headers | awk '{print $1, $2, $3}' | sort -k3 -rn
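
Joining the requests output with the usage output by hand is tedious. If you use krew, the kube-capacity plugin does that join directly; this is an optional convenience, and the exact flags may differ across plugin versions:

# Optional: requests vs live utilization side by side (krew plugin)
kubectl krew install resource-capacity
kubectl resource-capacity --pods --util --sort cpu.util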

Step 2: Set requests based on observed usage, not guesses:

The production rule for Kubernetes cost optimization: set CPU requests at p95 actual usage plus a 20% buffer, and memory requests at p99 actual usage plus a 30% buffer. Memory spikes are more dangerous than CPU spikes: CPU throttling is recoverable, an OOMKill is not.
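
Where do the p95/p99 numbers come from? If you have a Prometheus install (Kubecost, covered below, bundles one), a subquery approximates the p95 over a 30-day window. The service name, port, and pod pattern here are illustrative and will differ in your cluster:

# Approximate 30-day p95 CPU for a workload via the Prometheus HTTP API
kubectl port-forward -n kubecost svc/kubecost-prometheus-server 9091:80 &
curl -G http://localhost:9091/api/v1/query --data-urlencode \
  'query=quantile_over_time(0.95, rate(container_cpu_usage_seconds_total{namespace="production",pod=~"payment-api-.*"}[5m])[30d:5m])'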

# Before: provisioned by intuition
resources:
  requests:
    cpu: "1000m"
    memory: "2Gi"
  limits:
    cpu: "2000m"
    memory: "4Gi"

# After: provisioned from 30 days of actual usage data
# Actual p95 CPU: 120m, Actual p99 memory: 380Mi
resources:
  requests:
    cpu: "150m"      # p95 + 20% buffer
    memory: "500Mi"  # p99 + 30% buffer
  limits:
    cpu: "500m"      # 3x request - allows burst without starving neighbors
    memory: "750Mi"  # 50% above request - memory limit tighter than CPU

Step 3: Vertical Pod Autoscaler (VPA) for continuous rightsizing:

Manual rightsizing is a point-in-time fix. VPA continuously monitors actual usage and either recommends or automatically adjusts resource requests based on observed patterns.

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: payment-api-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payment-api
  updatePolicy:
    updateMode: "Off"  # Start with 'Off' - recommendations only, no auto-apply
    # Move to 'Auto' once you trust the recommendations
  resourcePolicy:
    containerPolicies:
    - containerName: payment-api
      minAllowed:
        cpu: 50m
        memory: 100Mi
      maxAllowed:
        cpu: 2000m
        memory: 4Gi

# Install VPA from the kubernetes/autoscaler repo
# (there is no single upstream manifest to kubectl apply)
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh

# Check VPA recommendations
kubectl get vpa payment-api-vpa -n production -o json | \
  jq '.status.recommendation.containerRecommendations'

Start VPA in recommendation mode (Off). Review the recommendations for two to four weeks before switching to automatic mode. The recommendations are usually accurate, but there are edge cases (batch jobs with irregular usage patterns, services with traffic spikes that the historical window does not capture) where automatic adjustment needs human review.

Kubernetes Cost Optimization: Namespace Scheduling

Non-production environments running 24/7 are the highest-impact Kubernetes cost optimization fix with the lowest implementation complexity. A staging cluster used 10 hours/day, 5 days/week has 118 hours/week of idle compute that you pay for at the same rate as production.

Automated scale-down with CronJobs:

# Scale down all deployments in non-production namespaces at 8pm weekdays
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-down-non-prod
  namespace: kube-system
spec:
  schedule: "0 20 * * 1-5"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: namespace-scaler
          restartPolicy: OnFailure
          containers:
          - name: kubectl
            image: bitnami/kubectl:latest
            command:
            - /bin/sh
            - -c
            - |
              for ns in staging dev qa preview; do
                kubectl scale deployment --all --replicas=0 -n $ns
                kubectl scale statefulset --all --replicas=0 -n $ns
              done
              echo "Scale-down complete at $(date)"
---
# Scale back up at 7am weekdays
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-up-non-prod
  namespace: kube-system
spec:
  schedule: "0 7 * * 1-5"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: namespace-scaler
          restartPolicy: OnFailure
          containers:
          - name: kubectl
            image: bitnami/kubectl:latest
            command:
            - /bin/sh
            - -c
            - |
              for ns in staging dev qa preview; do
                kubectl scale deployment --all --replicas=1 -n $ns
                kubectl scale statefulset --all --replicas=1 -n $ns
              done
              # Note: this restores one replica per workload; HPA or your
              # GitOps sync is responsible for restoring real replica counts
              echo "Scale-up complete at $(date)"
---
# RBAC for the namespace scaler CronJobs
apiVersion: v1
kind: ServiceAccount
metadata:
  name: namespace-scaler
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: namespace-scaler
rules:
- apiGroups: ["apps"]
  resources: ["deployments", "statefulsets"]
  verbs: ["get", "list"]
# kubectl scale patches the scale subresource, which needs its own grant
- apiGroups: ["apps"]
  resources: ["deployments/scale", "statefulsets/scale"]
  verbs: ["get", "patch", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: namespace-scaler
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: namespace-scaler
subjects:
- kind: ServiceAccount
  name: namespace-scaler
  namespace: kube-system
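
Before relying on the schedule, verify the service account actually holds the permissions the CronJobs need; kubectl can answer that directly:

# Verify the scaler's permissions without waiting for the 8pm run
kubectl auth can-i patch deployments/scale -n staging \
  --as=system:serviceaccount:kube-system:namespace-scaler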

For RDS instances in non-production environments, the same schedule applies: AWS lets you stop and start instances, and the AWS Instance Scheduler solution (or an EventBridge rule) automates this on a tag-defined schedule. A dev database running 50 hours/week instead of 168 costs roughly 70% less in instance-hours. Two caveats: storage is still billed while stopped, and AWS automatically restarts a stopped RDS instance after seven days, so the scheduler must re-stop it.
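
For a single instance, the stop/start calls are plain AWS CLI; the instance identifier below is illustrative, and a scheduler (Instance Scheduler, EventBridge, or even the CronJobs above with an awscli image) wraps them in a calendar:

# Stop a dev database at night, start it in the morning (identifier is illustrative)
aws rds stop-db-instance --db-instance-identifier dev-postgres
aws rds start-db-instance --db-instance-identifier dev-postgres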

Kubernetes Cost Optimization with Kubecost and CAST AI

Two tools dominate the Kubernetes cost optimization space in 2026. They solve different problems and are complementary rather than competing.

Kubecost: Cost Visibility and Allocation

Kubecost provides the visibility layer that cloud billing tools cannot: cost breakdown by namespace, deployment, label, and team. It answers the question “which service is expensive”, which is the prerequisite for any optimization decision.

# Install Kubecost
helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm repo update

helm install kubecost kubecost/cost-analyzer \
  --namespace kubecost \
  --create-namespace \
  --set kubecostToken="your-token" \
  --set prometheus.nodeExporter.enabled=true \
  --set kubecostProductConfigs.clusterName="production-cluster"

# Access the dashboard
kubectl port-forward -n kubecost svc/kubecost-cost-analyzer 9090:9090
# Open http://localhost:9090

The key Kubecost view for Kubernetes cost optimization is the efficiency score per namespace: the ratio of actual usage to requested resources. A namespace with a 35% efficiency score means 65% of its reserved compute is idle. That is the first target for rightsizing.

Kubecost allocation query via API:

# Cost breakdown by namespace for the last 30 days
curl -G http://localhost:9090/model/allocation \
  --data-urlencode "window=30d" \
  --data-urlencode "aggregate=namespace" \
  --data-urlencode "accumulate=true" | \
  jq '.data[0] | to_entries | sort_by(-.value.totalCost) | .[:10] | .[] | {namespace: .key, cost: .value.totalCost, efficiency: .value.totalEfficiency}'

The output ranks namespaces by cost and shows efficiency per namespace. Any namespace with efficiency below 0.5 (50%) is a rightsizing candidate.

CAST AI: Automated Kubernetes Cost Optimization

Teams using CAST AI report average savings of 50-65% on Kubernetes costs, and the tool charges a percentage of the savings it achieves, so you only pay when it saves you money.

CAST AI goes beyond visibility into automated remediation. It connects to your EKS, GKE, or AKS cluster and continuously optimizes: rightsizing pods based on actual usage, scaling nodes in and out based on real demand, and replacing On-Demand nodes with Spot nodes for workloads that tolerate interruption.

The CAST AI model is particularly effective for Kubernetes cost optimization because it handles the complexity of Spot instance management automatically, detecting interruptions, draining workloads before termination, and replacing capacity without manual intervention.

For startups without dedicated FinOps headcount, CAST AI provides managed optimization that would otherwise require a senior infrastructure engineer’s continuous attention. The trade-off is that the tool takes a percentage of savings achieved, typically 10-20%, which means it is only economically rational when the savings it delivers are significantly larger than what you would achieve manually.

When to use CAST AI vs manual optimization:

Use CAST AI when: your cluster spend exceeds €5,000/month, you lack dedicated infrastructure headcount, and your workloads are primarily stateless. The managed automation justifies its cost at that scale.

Use manual optimization (Kubecost + VPA + CronJobs) when: you are below €5,000/month cluster spend, you have the infrastructure expertise in-house, or your workload mix has enough stateful components that automated Spot selection requires careful supervision.

Spot Instances: The Highest-Impact Kubernetes Cost Optimization Lever

Spot instances (AWS), preemptible VMs (GCP), and spot VMs (Azure) offer 60-90% discounts over On-Demand pricing. For fault-tolerant Kubernetes workloads, they are the single biggest cost lever available.

The interruption constraint means Spot is appropriate for: stateless web servers, CI/CD runners, batch processing, ML training jobs (with checkpoint support), and non-production environments. It is not appropriate for: databases, stateful services, single-replica critical workloads, or anything where a 2-minute termination notice would cause a production incident.

Mixed node pool strategy on EKS:

# eksctl cluster config - On-Demand for critical, Spot for stateless
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: production-cluster
  region: us-east-1

managedNodeGroups:
  # Baseline On-Demand - stateful and critical workloads
  - name: on-demand-baseline
    instanceType: m6i.xlarge
    minSize: 2
    maxSize: 6
    labels:
      node-lifecycle: on-demand
    taints:
      - key: node-lifecycle
        value: on-demand
        effect: NoSchedule

  # Spot pool - stateless workloads, 60-90% cheaper
  - name: spot-stateless
    instanceTypes:
      - m6i.large
      - m6i.xlarge
      - m5.large
      - m5.xlarge
      - m5a.large    # Multiple instance types reduces interruption risk
    spot: true
    minSize: 0
    maxSize: 30
    labels:
      node-lifecycle: spot

Scheduling stateless workloads to Spot nodes:

spec:
  template:
    spec:
      tolerations:
        - key: "node-lifecycle"
          operator: "Equal"
          value: "spot"
          effect: "NoSchedule"
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 80
              preference:
                matchExpressions:
                  - key: node-lifecycle
                    operator: In
                    values: ["spot"]
      # Ensure graceful shutdown on Spot interruption
      terminationGracePeriodSeconds: 90
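
A PodDisruptionBudget pairs naturally with this: when a Spot node drains, it stops the eviction from taking out every replica of a service at once. A minimal sketch, with an illustrative name and app label:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-frontend-pdb
spec:
  minAvailable: 2        # keep at least 2 replicas up during node drains
  selector:
    matchLabels:
      app: web-frontend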

AWS Node Termination Handler (mandatory for Spot):

helm repo add eks https://aws.github.io/eks-charts
helm install aws-node-termination-handler eks/aws-node-termination-handler \
  --namespace kube-system \
  --set enableSpotInterruptionDraining=true \
  --set enableScheduledEventDraining=true \
  --set enableRebalanceMonitoring=true

The Node Termination Handler intercepts the 2-minute Spot interruption notice and gracefully drains the node before AWS terminates it: cordoning the node, evicting pods, and allowing them to reschedule to healthy nodes. Without it, Spot interruptions cause abrupt pod termination.

Using multiple instance types in the Spot pool significantly reduces interruption frequency. When one instance type is reclaimed, the cluster can use capacity from the others. A pool with five instance type options has substantially lower interruption rates than a pool with one.
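
You can sanity-check an instance mix before committing to it: the EC2 Spot placement score API rates how likely a given set of instance types is to get, and keep, Spot capacity. A sketch for the pool above:

# Score Spot capacity odds for the instance mix in the Spot pool
aws ec2 get-spot-placement-scores \
  --instance-types m6i.large m6i.xlarge m5.large m5.xlarge m5a.large \
  --target-capacity 10 \
  --region-names us-east-1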

Kubernetes Cost Optimization: The AI Workload Problem

AI workloads are the new cost driver that makes Kubernetes cost optimization more urgent and more complex in 2026. ML training and inference workloads running on GPU nodes are fundamentally different from standard compute workloads and the default Kubernetes resource model handles them poorly.

The GPU cost model:

GPU nodes are expensive. An NVIDIA A10G instance (g5.xlarge on AWS) costs approximately €1.10/hour On-Demand. A cluster with four GPU nodes running 24/7 costs €3,168/month. If those GPU nodes are training models for 8 hours/day and sitting idle the remaining 16 hours, 67% of that cost, over €2,100/month, delivers nothing.

GPU quota enforcement:

# Prevent GPU over-provisioning by namespace
apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-quota
  namespace: ml-workloads
spec:
  hard:
    requests.nvidia.com/gpu: "4"   # Maximum 4 GPUs for this namespace
    limits.nvidia.com/gpu: "4"
---
apiVersion: v1
kind: LimitRange
metadata:
  name: gpu-limit-range
  namespace: ml-workloads
spec:
  limits:
  - type: Container
    max:
      nvidia.com/gpu: "2"    # No single container can claim more than 2 GPUs
    default:
      nvidia.com/gpu: "1"
    defaultRequest:
      nvidia.com/gpu: "1"

GPU node scale-to-zero from training workloads:

Training jobs are batch workloads: they run, complete, and the GPU is no longer needed. Use Kubernetes Job resources (not Deployments) for training, and configure the Cluster Autoscaler with aggressive scale-down for GPU node pools:

# Aggressive scale-down for GPU capacity - Cluster Autoscaler flags
# (set on the cluster-autoscaler deployment; they apply cluster-wide by
# default, and per-node-group overrides are provider-specific)
--scale-down-delay-after-add=10m
--scale-down-unneeded-time=10m
--scale-down-utilization-threshold=0.3

When the training Job completes, the GPU node becomes unneeded and scales down within 10 minutes, eliminating the idle cost entirely. The next training job triggers a scale-up, which takes 3-5 minutes for a new GPU node to join the cluster.
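
A minimal training Job sketch ties the pieces together; the name, image, and namespace are illustrative. Because it is a Job, it terminates on completion instead of holding the GPU the way a Deployment replica would:

apiVersion: batch/v1
kind: Job
metadata:
  name: train-recsys-model
  namespace: ml-workloads
spec:
  backoffLimit: 1
  ttlSecondsAfterFinished: 600   # delete the finished Job so the node can drain
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: trainer
        image: registry.example.com/ml/trainer:latest   # illustrative image
        resources:
          limits:
            nvidia.com/gpu: "1"   # extended resources are requested via limits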

Infracost: Catching Kubernetes Cost Changes in CI

The most effective point for Kubernetes cost optimization is before a resource is created: at the infrastructure code review stage.

Infracost integrates into your CI/CD pipeline and shows the cost impact of Terraform changes before they are applied. A pull request that adds a new RDS instance or changes an EC2 instance type will show the estimated monthly cost delta right in the PR comment.

For Kubernetes, Infracost analyses Terraform changes to EKS node groups, RDS instances, and other AWS resources that underpin the cluster, not the Kubernetes YAML itself. The combination of Infracost (for infrastructure cost changes) and Kubecost (for runtime allocation) provides full-cycle cost visibility.

# .github/workflows/infracost.yml
name: Infracost

on: [pull_request]

jobs:
  infracost:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v4

      - name: Setup Infracost
        uses: infracost/actions/setup@v3
        with:
          api-key: ${{ secrets.INFRACOST_API_KEY }}

      - name: Generate cost diff
        run: |
          infracost diff \
            --path=. \
            --format=json \
            --out-file=/tmp/infracost.json

      - name: Post PR comment
        uses: infracost/actions/comment@v3
        with:
          path: /tmp/infracost.json
          behavior: update

An engineer adding a new EKS node group or changing an existing node group's instance type sees the monthly cost impact in the PR comment before the change is applied. A node group change from m6i.xlarge to m6i.2xlarge across 5 nodes is roughly €800/month of additional spend, visible at review time, not on next month's invoice.

Kubernetes Cost Optimization Checklist

Use this checklist for a quarterly Kubernetes cost optimization audit:

RIGHTSIZING
[ ] Every pod has resource requests AND limits defined
[ ] Resource requests are based on observed p95/p99 usage, not estimates
[ ] VPA installed and generating recommendations
[ ] No namespace has efficiency score below 50% (check Kubecost)

SCHEDULING
[ ] Non-production namespaces scale to zero outside business hours
[ ] CI/CD runners run on Spot nodes
[ ] Stateless web services tolerate Spot and have node affinity configured
[ ] Node Termination Handler installed for Spot pools

AUTOSCALING
[ ] Cluster Autoscaler configured with appropriate scale-down thresholds
[ ] HPA configured for all variable-traffic services
[ ] GPU node pools have aggressive scale-down (training jobs use Job, not Deployment)

VISIBILITY
[ ] Kubecost or equivalent installed - cost visible by namespace and service
[ ] All resources tagged with team, environment, and service
[ ] Monthly cost review in engineering calendar
[ ] Infracost in CI pipeline for infrastructure changes

COMMITMENTS
[ ] Baseline steady-state compute covered by Savings Plans (after 90 days of data)
[ ] Savings Plan coverage reviewed quarterly against actual usage (see the query sketch below)
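
For the commitments check, the Cost Explorer API reports Savings Plans utilization and coverage directly; a quarterly query looks like this (dates are illustrative):

# Savings Plans utilization for the last quarter
aws ce get-savings-plans-utilization \
  --time-period Start=2026-01-01,End=2026-04-01

# Coverage: how much eligible spend the plans actually covered
aws ce get-savings-plans-coverage \
  --time-period Start=2026-01-01,End=2026-04-01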

Conclusion

Kubernetes cost optimization is not a one-time project. It is an operational discipline with the same cadence as performance tuning and security hardening: quarterly audits, continuous monitoring, and tooling that makes waste visible before it compounds.

The 32-40% waste rate that most teams run at is not a sign of careless engineering. It is the predictable result of building quickly without the visibility layer that makes cost-aware decisions possible. Kubecost provides that visibility. VPA provides continuous rightsizing. Spot instances provide the single largest cost lever for appropriate workloads. Infracost prevents expensive decisions from reaching production unnoticed.

At The Good Shell we design and operate Kubernetes infrastructure for startups managing growing cloud spend. See our DevOps and infrastructure services or our case studies to see what a production Kubernetes cost optimization engagement looks like.

For the Kubernetes-native reference on resource management and autoscaling, the official Kubernetes documentation on resource management covers the scheduling and limits model that every optimization practice in this guide builds on.