Implement Kubernetes workload rightsizing with VPA recommendations and cost analysis

Advanced, 45 min, Apr 26, 2026
Ubuntu 24.04, Debian 12, AlmaLinux 9, Rocky Linux 9

Set up Vertical Pod Autoscaler to automatically optimize resource requests and limits for your Kubernetes workloads. Create cost analysis dashboards to track resource utilization and identify opportunities for rightsizing containers in production clusters.

What this solves

Kubernetes workloads often run with poorly configured resource requests and limits, leading to wasted CPU and memory or application performance issues. The Vertical Pod Autoscaler (VPA) analyzes historical resource usage and provides recommendations for optimal resource allocation. This tutorial shows you how to deploy VPA, configure it for your workloads, and build cost analysis dashboards to track resource efficiency across your cluster.

Prerequisites

You need a running Kubernetes cluster with metrics-server installed and at least 4 GB of available memory. Your cluster should have Helm 3 installed for package management. You'll also need cluster-admin permissions to deploy VPA components and configure RBAC policies.

Step-by-step installation

Install metrics-server for resource collection

The VPA requires metrics-server to collect resource usage data from your pods. Install it using the official manifest if not already present.

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

Verify metrics-server is running and collecting data:

kubectl get pods -n kube-system | grep metrics-server
kubectl top nodes
kubectl top pods --all-namespaces
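
Right after installation, metrics-server usually needs a short while before kubectl top returns data, so the first check can fail. The sketch below is one way to poll until a command succeeds. The retry helper is a hypothetical name introduced here, not part of kubectl or metrics-server.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it exits 0 or ATTEMPTS runs have failed.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep only between attempts, not after the last one
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example (requires cluster access): poll for up to 5 minutes
# retry 30 10 kubectl top nodes
```

The attempt count and delay are illustrative; adjust them to how quickly your cluster starts reporting metrics.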

Deploy Vertical Pod Autoscaler

Clone the VPA repository and deploy the components using the provided installation script.

git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh

Verify all VPA components are running:

kubectl get pods -n kube-system | grep vpa
kubectl get crd | grep verticalpodautoscaler
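
A VPA installation normally runs three components: the recommender, the updater, and the admission controller. As a rough sketch, the helper below reports which of them has no matching pod name. The function name missing_vpa_components is hypothetical, and the check looks only at pod names, not at whether the pods are Running.

```shell
# missing_vpa_components: reads pod names on stdin (one per line) and prints
# each expected VPA component that has no matching pod name.
missing_vpa_components() {
  pods=$(cat)
  for component in vpa-recommender vpa-updater vpa-admission-controller; do
    case "$pods" in
      *"$component"*) ;;            # a pod name contains the component name
      *) echo "$component" ;;       # nothing matched: report it as missing
    esac
  done
}

# Example (requires cluster access):
# kubectl get pods -n kube-system -o name | missing_vpa_components
```

Empty output means every component has at least one pod; follow up with kubectl get pods to confirm the pods are actually ready.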

Install Prometheus for metrics collection

Deploy Prometheus using Helm to collect detailed resource metrics for cost analysis. Add the Prometheus community Helm repository:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

Create a values file named prometheus-values.yaml for the Prometheus configuration:

server:
  retention: "15d"
  resources:
    requests:
      cpu: 200m
      memory: 512Mi
    limits:
      cpu: 500m
      memory: 1Gi
  persistentVolume:
    size: 20Gi

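# Sub-chart toggles. Recent releases of the prometheus chart use the
# sub-chart names below as keys; older releases used nodeExporter,
# kubeStateMetrics, and pushgateway. Check the values of your chart version.
prometheus-node-exporter:
  enabled: true

kube-state-metrics:
  enabled: true

alertmanager:
  enabled: false

prometheus-pushgateway:
  enabled: false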

Install Prometheus in the monitoring namespace:

kubectl create namespace monitoring
helm install prometheus prometheus-community/prometheus \
  --namespace monitoring \
  --values prometheus-values.yaml

Deploy Grafana for visualization

Install Grafana to create cost analysis dashboards using the official Helm chart:

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

Create a values file named grafana-values.yaml for the Grafana configuration:

adminPassword: "AdminPassword123!"  # example only; replace before deploying

persistence:
  enabled: true
  size: 10Gi

service:
  type: ClusterIP
  port: 80

datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
    - name: Prometheus
      type: prometheus
      url: http://prometheus-server.monitoring.svc.cluster.local
      access: proxy
      isDefault: true

resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi

Install Grafana:

helm install grafana grafana/grafana \
  --namespace monitoring \
  --values grafana-values.yaml

Configure VPA for workload analysis

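Create a VPA resource to analyze a deployment. Save the following as vpa-recommendation.yaml; it defines a VerticalPodAutoscaler in recommendation-only mode together with the sample webapp deployment it targets: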

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: webapp-vpa
  namespace: default
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp
  updatePolicy:
    updateMode: "Off"  # Only provide recommendations
  resourcePolicy:
    containerPolicies:
    - containerName: webapp
      minAllowed:
        cpu: 10m
        memory: 64Mi
      maxAllowed:
        cpu: 1000m
        memory: 2Gi
      controlledResources: ["cpu", "memory"]
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: webapp
  template:
    metadata:
      labels:
        app: webapp
    spec:
      containers:
      - name: webapp
        image: nginx:1.24
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 200m
            memory: 256Mi
        ports:
        - containerPort: 80

Apply the configuration:

kubectl apply -f vpa-recommendation.yaml
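
The minAllowed and maxAllowed values in the resource policy bound the target that VPA reports; the unbounded estimate is published separately in the uncappedTarget field of the VPA status. The sketch below illustrates that bounding for the CPU limits of webapp-vpa, expressed in millicores. The clamp function is a hypothetical helper written for this illustration, not VPA code.

```shell
# clamp VALUE MIN MAX: prints VALUE limited to the range [MIN, MAX].
clamp() {
  value=$1
  min=$2
  max=$3
  if [ "$value" -lt "$min" ]; then
    echo "$min"
  elif [ "$value" -gt "$max" ]; then
    echo "$max"
  else
    echo "$value"
  fi
}

# CPU bounds from webapp-vpa in millicores: minAllowed 10m, maxAllowed 1000m.
clamp 4 10 1000      # estimate below the floor  -> 10
clamp 250 10 1000    # estimate inside the range -> 250
clamp 1800 10 1000   # estimate above the ceiling -> 1000
```

If the reported target sits exactly on one of the bounds for a long time, that is a hint the policy range may be too narrow for the workload.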

Create cost analysis dashboard

Create a Grafana dashboard configuration for resource utilization and cost tracking:

{
  "dashboard": {
    "id": null,
    "title": "Kubernetes Cost Analysis",
    "tags": ["kubernetes", "cost", "vpa"],
    "timezone": "browser",
    "panels": [
      {
        "id": 1,
        "title": "CPU Usage (cores) by Namespace",
        "type": "stat",
        "targets": [
          {
            "expr": "sum(rate(container_cpu_usage_seconds_total{container!=\"\"}[5m])) by (namespace)",
            "legendFormat": "{{namespace}}"
          }
        ],
        "fieldConfig": {
          "defaults": {
            "unit": "none"
          }
        },
        "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0}
      },
      {
        "id": 2,
        "title": "Memory Usage by Namespace",
        "type": "stat",
        "targets": [
          {
            "expr": "sum(container_memory_usage_bytes{container!=\"\"}) by (namespace)",
            "legendFormat": "{{namespace}}"
          }
        ],
        "fieldConfig": {
          "defaults": {
            "unit": "bytes"
          }
        },
        "gridPos": {"h": 8, "w": 12, "x": 12, "y": 0}
      },
      {
        "id": 3,
        "title": "Resource Requests vs Usage",
        "type": "timeseries",
        "targets": [
          {
            "expr": "sum(kube_pod_container_resource_requests{resource=\"cpu\"}) by (namespace)",
            "legendFormat": "CPU Requests - {{namespace}}"
          },
          {
            "expr": "sum(rate(container_cpu_usage_seconds_total{container!=\"\"}[5m])) by (namespace)",
            "legendFormat": "CPU Usage - {{namespace}}"
          }
        ],
        "gridPos": {"h": 8, "w": 24, "x": 0, "y": 8}
      }
    ],
    "time": {
      "from": "now-1h",
      "to": "now"
    },
    "refresh": "30s"
  }
}

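Save the JSON to a file and import it through the Grafana UI or the HTTP API after setting up port forwarding (see "Access your monitoring setup"). The outer "dashboard" key matches the request body of Grafana's dashboard API (POST /api/dashboards/db); a UI import generally expects only the object inside it.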
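
For the API route, a minimal sketch could look like the following. It assumes the JSON was saved as cost-dashboard.json (a file name chosen here, not given in the tutorial), that Grafana is reachable on http://localhost:3000 through the port forward, and that the password is supplied in a GRAFANA_PASSWORD environment variable (also a name invented for this sketch). With DRY_RUN=1 the function only prints what it would send.

```shell
# import_dashboard FILE [GRAFANA_URL]
# Posts FILE to Grafana's dashboard API endpoint /api/dashboards/db.
# GRAFANA_USER and GRAFANA_PASSWORD are hypothetical variable names.
import_dashboard() {
  file=$1
  url=${2:-http://localhost:3000}
  endpoint="$url/api/dashboards/db"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Show the request instead of sending it
    echo "POST $endpoint < $file"
    return 0
  fi
  curl -sS -X POST "$endpoint" \
    -u "${GRAFANA_USER:-admin}:${GRAFANA_PASSWORD:?set GRAFANA_PASSWORD}" \
    -H "Content-Type: application/json" \
    --data-binary "@$file"
}

# Example (requires the port forward to be running):
# GRAFANA_PASSWORD='...' import_dashboard cost-dashboard.json
```

Reading the password from the environment keeps it out of the script itself and out of shell history when it is exported beforehand.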

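Configure VPA in recommendation-only mode

Set up VPA to provide recommendations without automatically updating pods (updateMode "Off"), which is the safer choice for production workloads. Save the following as vpa-recommendonly.yaml; it assumes a Deployment named production-app with a container named app in the production namespace: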

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: production-app-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: production-app
  updatePolicy:
    updateMode: "Off"
  resourcePolicy:
    containerPolicies:
    - containerName: app
      minAllowed:
        cpu: 50m
        memory: 128Mi
      maxAllowed:
        cpu: 2000m
        memory: 4Gi
      controlledResources: ["cpu", "memory"]
      controlledValues: "RequestsAndLimits"

Apply the VPA configuration:

kubectl apply -f vpa-recommendonly.yaml

Set up cost monitoring alerts

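Create Prometheus alerting rules for resource waste detection and save them as cost-alerts.yaml. The manifest uses the PrometheusRule kind, a custom resource provided by the Prometheus Operator (installed, for example, by the kube-prometheus-stack chart). The standalone prometheus chart installed earlier does not ship that CRD, so on such a cluster kubectl apply will reject the manifest; in that case, place the same rule groups under serverFiles in prometheus-values.yaml instead. With alertmanager disabled as configured above, firing alerts appear only in the Prometheus UI.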

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cost-optimization-alerts
  namespace: monitoring
spec:
  groups:
  - name: resource-waste
    rules:
    - alert: HighCPURequest
      expr: |
        (
          sum(kube_pod_container_resource_requests{resource="cpu"}) by (namespace, pod, container) -
          sum(rate(container_cpu_usage_seconds_total[1h])) by (namespace, pod, container)
        ) > 0.5
      for: 30m
      labels:
        severity: warning
      annotations:
        summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} has excessive CPU requests"
        description: "Container {{ $labels.container }} is requesting {{ $value }} more CPU than it uses"
    - alert: HighMemoryRequest
      expr: |
        (
          sum(kube_pod_container_resource_requests{resource="memory"}) by (namespace, pod, container) -
          sum(container_memory_usage_bytes) by (namespace, pod, container)
        ) / 1024^3 > 0.5
      for: 30m
      labels:
        severity: warning
      annotations:
        summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} has excessive memory requests"
        description: "Container {{ $labels.container }} is requesting {{ $value }}GiB more memory than it uses"

Apply the alerting rules:

kubectl apply -f cost-alerts.yaml
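
The PrometheusRule kind is served only when the Prometheus Operator CRDs are installed. On a cluster that runs the standalone prometheus-community/prometheus chart, one alternative is to pass the rules as Helm values. The sketch below writes such a fragment for the CPU rule. The file name alert-values.yaml and the function name are made up for this example, and the serverFiles key layout is an assumption about the chart that should be checked against the values of your chart version.

```shell
# write_alert_values PATH
# Writes a Helm values fragment carrying the HighCPURequest rule under
# serverFiles."alerting_rules.yml" (assumed chart layout; verify first).
write_alert_values() {
  # Quoted delimiter: the shell leaves the PromQL braces and $ untouched
  cat > "$1" <<'EOF'
serverFiles:
  alerting_rules.yml:
    groups:
    - name: resource-waste
      rules:
      - alert: HighCPURequest
        expr: |
          (
            sum(kube_pod_container_resource_requests{resource="cpu"}) by (namespace, pod, container) -
            sum(rate(container_cpu_usage_seconds_total[1h])) by (namespace, pod, container)
          ) > 0.5
        for: 30m
        labels:
          severity: warning
EOF
}

write_alert_values alert-values.yaml
```

The fragment could then be layered on top of the existing values with helm upgrade prometheus prometheus-community/prometheus --namespace monitoring --values prometheus-values.yaml --values alert-values.yaml.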

Create automated rightsizing script

Build a script that extracts VPA recommendations and prints them next to the current resource requests so you can review rightsizing candidates. Save it as rightsize-workloads.sh:

#!/bin/bash
# Get VPA recommendations for all namespaces
# Assumes every VPA targets a Deployment.

echo "Fetching VPA recommendations..."

# Read "namespace name" pairs line by line so each pair stays together
kubectl get vpa --all-namespaces \
  -o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.metadata.name}{"\n"}{end}' |
while read -r namespace vpa_name; do
  [ -n "$vpa_name" ] || continue
  printf '\n=== VPA Recommendations for %s/%s ===\n' "$namespace" "$vpa_name"

  # Extract target deployment
  target_ref=$(kubectl get vpa "$vpa_name" -n "$namespace" -o jsonpath='{.spec.targetRef.name}')

  # Get current resource requests
  echo "Current resource requests:"
  kubectl get deployment "$target_ref" -n "$namespace" \
    -o jsonpath='{range .spec.template.spec.containers[*]}{.name}: CPU={.resources.requests.cpu}, Memory={.resources.requests.memory}{"\n"}{end}'

  # Get VPA recommendations
  printf '\nVPA recommendations:\n'
  kubectl get vpa "$vpa_name" -n "$namespace" \
    -o jsonpath='{range .status.recommendation.containerRecommendations[*]}{.containerName}: CPU={.target.cpu}, Memory={.target.memory}{"\n"}{end}'

  # Show the recommendation status conditions
  printf '\nRecommendation status:\n'
  kubectl get vpa "$vpa_name" -n "$namespace" -o jsonpath='{.status.conditions[*].message}'
  echo
done

Make the script executable and run it:

chmod +x rightsize-workloads.sh
./rightsize-workloads.sh
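
The script prints raw Kubernetes quantities such as 100m or 262144k, which are awkward to compare by eye. The helpers below are a small sketch for normalizing CPU to millicores and memory to MiB and for expressing the difference as a percentage. All three function names are hypothetical, only the common suffixes are handled, and the 25m target in the example is an invented value, not real VPA output.

```shell
# cpu_to_millicores QUANTITY: "250m" -> 250, "1" -> 1000, "0.5" -> 500
cpu_to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *)  awk -v v="$1" 'BEGIN { printf "%d\n", v * 1000 + 0.5 }' ;;
  esac
}

# mem_to_mib QUANTITY: converts Ki/Mi/Gi, k/M/G, or plain bytes to whole MiB
mem_to_mib() {
  awk -v q="$1" 'BEGIN {
    n = q + 0                              # leading number, e.g. 262144
    unit = q; sub(/^[0-9.]+/, "", unit)    # remaining suffix, e.g. "k"
    if      (unit == "Ki") bytes = n * 1024
    else if (unit == "Mi") bytes = n * 1024 ^ 2
    else if (unit == "Gi") bytes = n * 1024 ^ 3
    else if (unit == "k")  bytes = n * 1000
    else if (unit == "M")  bytes = n * 1000 ^ 2
    else if (unit == "G")  bytes = n * 1000 ^ 3
    else                   bytes = n
    printf "%d\n", bytes / 1024 ^ 2 + 0.5
  }'
}

# savings_percent CURRENT RECOMMENDED: share of CURRENT that would be freed
# (negative when the recommendation is higher than the current request)
savings_percent() {
  awk -v c="$1" -v r="$2" 'BEGIN {
    if (c + 0 == 0) { print "n/a"; exit }
    printf "%.1f\n", (c - r) * 100 / c
  }'
}

# webapp requests 100m CPU; 25m is a made-up recommendation for illustration
current_cpu=$(cpu_to_millicores 100m)
target_cpu=$(cpu_to_millicores 25m)
echo "CPU request reduction: $(savings_percent "$current_cpu" "$target_cpu")%"
```

With these inputs the last line prints "CPU request reduction: 75.0%". Multiplying the freed amount by your own per-core or per-GiB price turns the percentage into a cost estimate.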

Access your monitoring setup

Set up port forwarding to access Grafana and view your cost analysis dashboards:

kubectl port-forward -n monitoring service/grafana 3000:80

Access Grafana at http://localhost:3000 with username admin and the password you configured. Import the cost dashboard JSON to start analyzing resource utilization patterns.

Check VPA recommendations for your workloads:

kubectl describe vpa webapp-vpa
kubectl get vpa --all-namespaces -o wide

Configure workload optimization

For production workloads, consider these optimization strategies based on VPA recommendations:

Scenario             VPA Mode  Update Strategy
Development/Testing  Auto      Immediate pod restart
Staging              Auto      Rolling update during maintenance
Production           Off       Manual review and deployment

Note: VPA requires pods to be restarted to apply new resource requests. Plan updates during maintenance windows for production workloads.
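
For the manual path used in production, a reviewed recommendation still has to be written back to the workload. One option is kubectl set resources; the sketch below only prints the command so it can be inspected or pasted into a change request. The function name and the 25m / 262144k values are illustrative, not recommendations from a real cluster.

```shell
# set_resources_cmd NAMESPACE DEPLOYMENT CONTAINER CPU MEMORY
# Prints (does not run) the kubectl command that would set new requests.
set_resources_cmd() {
  printf 'kubectl -n %s set resources deployment %s --containers=%s --requests=cpu=%s,memory=%s\n' \
    "$1" "$2" "$3" "$4" "$5"
}

set_resources_cmd default webapp webapp 25m 262144k
```

Running the printed command changes the pod template and therefore triggers a rolling update, so schedule it accordingly. If the workload is deployed from Helm or Git, update the source manifests as well, otherwise the next deployment reverts the change.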

For more advanced cluster management, you might want to explore cluster autoscaling with mixed instance types to complement your workload rightsizing efforts.

Verify your setup

Confirm all components are functioning correctly:

# Check VPA components
kubectl get pods -n kube-system | grep vpa

# Verify metrics collection
kubectl top pods --all-namespaces

# Check VPA recommendations
kubectl get vpa --all-namespaces

# Test Prometheus metrics (give the port forward a moment to start)
kubectl port-forward -n monitoring service/prometheus-server 9090:80 &
PF_PID=$!
sleep 3
curl -s "http://localhost:9090/api/v1/query?query=up" | jq '.data.result[] | select(.metric.__name__ == "up") | .value[1]'
kill "$PF_PID"

# Verify Grafana access (prints the admin password)
kubectl get secret -n monitoring grafana -o jsonpath="{.data.admin-password}" | base64 --decode; echo

Common issues

Symptom | Cause | Fix
VPA pods not starting | Insufficient cluster resources | Ensure the cluster has at least 4 GB of available memory and check node capacity
No VPA recommendations | Insufficient metrics data | Verify metrics-server is working, then wait 24-48 hours for VPA to collect usage patterns
Grafana dashboard empty | Prometheus data source misconfigured | Verify the Prometheus service URL and test the data source connection in Grafana
VPA recommendations too aggressive | Default safety margins too low | Adjust minAllowed and maxAllowed in the VPA resource policy
Metrics-server certificate errors | Self-signed certificates | Add the --kubelet-insecure-tls flag to the metrics-server deployment
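
For the last row, one way to add the flag without editing the manifest by hand is a JSON Patch against the Deployment. The sketch below prints the patch document; insecure_tls_patch is a hypothetical helper name, and the patch assumes metrics-server is the first container in the pod and already has an args list, as in the upstream components.yaml.

```shell
# insecure_tls_patch: prints a JSON Patch that appends the flag to the args
# of the first container in the metrics-server Deployment.
insecure_tls_patch() {
  printf '%s\n' '[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-insecure-tls"}]'
}

# To apply it (requires cluster access):
# kubectl patch deployment metrics-server -n kube-system --type=json -p "$(insecure_tls_patch)"
```

The flag disables verification of kubelet serving certificates, so it is better suited to test clusters; on production clusters, issuing properly signed kubelet certificates is the safer fix.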
