Monitor Varnish 7 performance with Prometheus and Grafana dashboards

Intermediate 25 min May 23, 2026
Ubuntu 24.04 Debian 12 AlmaLinux 9 Rocky Linux 9

Set up comprehensive Varnish monitoring using prometheus-varnish-exporter, custom Grafana dashboards, and performance alerting rules for production cache optimization.

Prerequisites

  • Varnish 7 installed and running
  • Prometheus server configured
  • Grafana dashboard access
  • Basic systemd knowledge

What this solves

Varnish cache performance monitoring helps you track hit rates, response times, backend health, and memory usage in real time. This tutorial sets up prometheus-varnish-exporter to collect metrics, configures Prometheus scraping, creates Grafana dashboards, and establishes alerting rules for low cache hit rates, high latency, and backend failures.

Step-by-step configuration

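Verify Varnish statistics

First verify that Varnish is running and that its counters are readable. varnishstat is a command-line tool that reads counters from Varnish's shared memory segment, so there is no separate statistics daemon to install or enable.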

sudo systemctl status varnish
varnishstat -1
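As a quick sanity check before any exporter is involved, the lifetime hit ratio can be computed directly from these counters. The sketch below assumes the usual `varnishstat -1` layout (counter name in the first column, cumulative value in the second). `hit_ratio` is a hypothetical helper name, and the sample numbers are made up for illustration.

```shell
# hit_ratio: read `varnishstat -1` style lines on stdin and print the
# lifetime hit ratio as a percentage (hypothetical helper, not part of Varnish).
hit_ratio() {
  awk '
    $1 == "MAIN.cache_hit"  { hit = $2 }
    $1 == "MAIN.cache_miss" { miss = $2 }
    END {
      total = hit + miss
      if (total == 0) { print "n/a"; exit }
      printf "%.1f\n", 100 * hit / total
    }'
}

# Demo with made-up sample lines in the same column layout.
printf '%s\n' \
  'MAIN.cache_hit   900   1.50 Cache hits' \
  'MAIN.cache_miss  100   0.17 Cache misses' | hit_ratio

# On a live host (assumes your user can read the Varnish counters):
#   varnishstat -1 -f MAIN.cache_hit -f MAIN.cache_miss | hit_ratio
```

This is a since-startup figure, so it will differ from the windowed rate that Prometheus computes later in this tutorial.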

Install prometheus-varnish-exporter

Download and install prometheus_varnish_exporter, a community-maintained exporter (not an official Prometheus project) that converts varnishstat output into Prometheus metrics.

# Ubuntu / Debian: refresh package metadata
sudo apt update

# AlmaLinux / Rocky Linux: update packages
sudo dnf update -y

# All distributions: download and install the exporter binary
wget https://github.com/jonnenauha/prometheus_varnish_exporter/releases/download/1.6.1/prometheus_varnish_exporter-1.6.1.linux-amd64.tar.gz
tar xzf prometheus_varnish_exporter-1.6.1.linux-amd64.tar.gz
sudo cp prometheus_varnish_exporter-1.6.1.linux-amd64/prometheus_varnish_exporter /usr/local/bin/
sudo chmod +x /usr/local/bin/prometheus_varnish_exporter

Create varnish-exporter user

Create a dedicated system user that runs the exporter without shell access, and add it to the varnish group so it can read Varnish's counters.

sudo useradd --system --no-create-home --shell /bin/false varnish-exporter
sudo usermod -a -G varnish varnish-exporter

Create systemd service

Configure the exporter to run as a systemd service with security hardening and automatic restart. Save the following unit as /etc/systemd/system/varnish-exporter.service, which matches the varnish-exporter service name used in the next step.

[Unit]
Description=Prometheus Varnish Exporter
After=network.target varnish.service
Requires=varnish.service

[Service]
Type=simple
User=varnish-exporter
Group=varnish-exporter
ExecStart=/usr/local/bin/prometheus_varnish_exporter \
    -web.listen-address :9131 \
    -web.telemetry-path /metrics
Restart=always
RestartSec=10
KillMode=process

# Security settings
NoNewPrivileges=yes
PrivateTmp=yes
ProtectSystem=strict
ProtectHome=yes
ReadOnlyPaths=/

[Install]
WantedBy=multi-user.target

Start and enable varnish-exporter

Enable the service to start on boot and verify it's collecting metrics properly.

sudo systemctl daemon-reload
sudo systemctl enable --now varnish-exporter
sudo systemctl status varnish-exporter

Test metrics endpoint

Verify the exporter is serving Varnish metrics in Prometheus format on port 9131.

curl http://localhost:9131/metrics | grep varnish_main_cache_hit
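If you want a single value rather than a grep match, for example in a health-check script, the text exposition format is simple enough to parse with awk. The sketch below is a minimal illustration that only handles metrics without labels. `metric_value` is a hypothetical helper, and the sample output is made up.

```shell
# metric_value NAME: print the value of an unlabelled metric from
# Prometheus text-format input on stdin; exit non-zero if it is absent.
metric_value() {
  awk -v name="$1" '
    /^#/ { next }                        # skip HELP/TYPE comment lines
    $1 == name { print $2; found = 1 }
    END { exit !found }'
}

# Demo with made-up exporter output.
sample='# HELP varnish_main_cache_hit Cache hits
# TYPE varnish_main_cache_hit counter
varnish_main_cache_hit 42
varnish_main_cache_miss 7'

printf '%s\n' "$sample" | metric_value varnish_main_cache_hit

# Against the live endpoint from this step:
#   curl -s http://localhost:9131/metrics | metric_value varnish_main_cache_hit
```

The non-zero exit status on a missing metric makes the helper usable in an `if` condition or a monitoring probe.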

Configure firewall

Allow Prometheus server access to the exporter port while restricting external access.

# Ubuntu / Debian (ufw)
sudo ufw allow from 203.0.113.10 to any port 9131
sudo ufw reload

# AlmaLinux / Rocky Linux (firewalld)
sudo firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="203.0.113.10" port protocol="tcp" port="9131" accept'
sudo firewall-cmd --reload

Add Prometheus scrape configuration

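Add the following job under scrape_configs in your Prometheus configuration (typically /etc/prometheus/prometheus.yml) to scrape Varnish metrics every 15 seconds. The localhost target assumes Prometheus runs on the Varnish host; if it runs elsewhere, as the firewall rule above implies, replace localhost with the Varnish host's address.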

  - job_name: 'varnish'
    static_configs:
      - targets: ['localhost:9131']
    scrape_interval: 15s
    metrics_path: /metrics
    scrape_timeout: 10s
    honor_labels: true

Restart Prometheus

Apply the new configuration and verify Varnish targets are being scraped successfully.

sudo systemctl restart prometheus
sudo systemctl status prometheus
curl http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | select(.job=="varnish")'

Create Grafana dashboard

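Save the following dashboard definition as /tmp/varnish-dashboard.json; the next step imports it. It contains panels for cache hit rate, request rate, backend response time, and storage memory. Check each metric name against your exporter's /metrics output before relying on a panel: the exporter derives its metrics from varnishstat counters, which do not include latency histograms, so the varnish_backend_req_duration_seconds_bucket series used by the Backend Response Time panel may need to come from a different source.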

{
  "dashboard": {
    "id": null,
    "title": "Varnish Cache Performance",
    "tags": ["varnish", "cache", "performance"],
    "timezone": "browser",
    "panels": [
      {
        "id": 1,
        "title": "Cache Hit Rate",
        "type": "stat",
        "targets": [
          {
            "expr": "100 * (rate(varnish_main_cache_hit[5m]) / (rate(varnish_main_cache_hit[5m]) + rate(varnish_main_cache_miss[5m])))",
            "refId": "A"
          }
        ],
        "fieldConfig": {
          "defaults": {
            "unit": "percent",
            "min": 0,
            "max": 100,
            "thresholds": {
              "steps": [
                {"color": "red", "value": 0},
                {"color": "yellow", "value": 80},
                {"color": "green", "value": 95}
              ]
            }
          }
        },
        "gridPos": {"h": 4, "w": 6, "x": 0, "y": 0}
      },
      {
        "id": 2,
        "title": "Requests per Second",
        "type": "stat",
        "targets": [
          {
            "expr": "rate(varnish_main_client_req[5m])",
            "refId": "A"
          }
        ],
        "fieldConfig": {
          "defaults": {
            "unit": "reqps"
          }
        },
        "gridPos": {"h": 4, "w": 6, "x": 6, "y": 0}
      },
      {
        "id": 3,
        "title": "Backend Response Time",
        "type": "timeseries",
        "targets": [
          {
            "expr": "histogram_quantile(0.95, rate(varnish_backend_req_duration_seconds_bucket[5m]))",
            "refId": "A",
            "legendFormat": "95th percentile"
          },
          {
            "expr": "histogram_quantile(0.50, rate(varnish_backend_req_duration_seconds_bucket[5m]))",
            "refId": "B",
            "legendFormat": "50th percentile"
          }
        ],
        "fieldConfig": {
          "defaults": {
            "unit": "s"
          }
        },
        "gridPos": {"h": 8, "w": 12, "x": 0, "y": 4}
      },
      {
        "id": 4,
        "title": "Memory Usage",
        "type": "timeseries",
        "targets": [
          {
            "expr": "varnish_sma_g_bytes{type=\"s0\"}",
            "refId": "A",
            "legendFormat": "Used"
          },
          {
            "expr": "varnish_sma_g_space{type=\"s0\"}",
            "refId": "B",
            "legendFormat": "Available"
          }
        ],
        "fieldConfig": {
          "defaults": {
            "unit": "bytes"
          }
        },
        "gridPos": {"h": 8, "w": 12, "x": 12, "y": 4}
      }
    ],
    "time": {
      "from": "now-1h",
      "to": "now"
    },
    "refresh": "5s"
  }
}
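The Cache Hit Rate expression divides the per-second hit rate by the combined hit and miss rate over the same window, so the window length cancels out and only the counter increases matter. The worked example below reproduces that arithmetic from two counter samples. `hit_rate_pct` is a hypothetical helper and the sample values are made up.

```shell
# hit_rate_pct HIT_T0 HIT_T1 MISS_T0 MISS_T1
# Windowed hit rate in percent from two samples of each counter.
hit_rate_pct() {
  awk -v h0="$1" -v h1="$2" -v m0="$3" -v m1="$4" 'BEGIN {
    dh = h1 - h0                 # hits during the window
    dm = m1 - m0                 # misses during the window
    if (dh + dm <= 0) { print "n/a"; exit }
    printf "%.1f\n", 100 * dh / (dh + dm)
  }'
}

# 900 new hits and 100 new misses between the two samples.
hit_rate_pct 1000 1900 200 300
```

With no traffic in the window both rates are zero and the PromQL division has no meaningful result, which is the case the helper reports as n/a.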

Import dashboard via API

Use the Grafana API to import the dashboard. The admin:admin credentials are Grafana's defaults; substitute your own credentials or an API token.

curl -X POST \
  http://admin:admin@localhost:3000/api/dashboards/db \
  -H 'Content-Type: application/json' \
  -d @/tmp/varnish-dashboard.json

Create alerting rules

Set up Prometheus alerting rules for low cache hit rates, high response times, and backend failures. Save them as /etc/prometheus/rules/varnish.yml, the path referenced in the next step.

groups:
  - name: varnish_alerts
    rules:
      - alert: VarnishLowCacheHitRate
        expr: 100 * (rate(varnish_main_cache_hit[5m]) / (rate(varnish_main_cache_hit[5m]) + rate(varnish_main_cache_miss[5m]))) < 80
        for: 2m
        labels:
          severity: warning
          service: varnish
        annotations:
          summary: "Varnish cache hit rate is below 80%"
          description: "Cache hit rate is {{ $value }}% for the last 5 minutes"

      - alert: VarnishHighBackendResponseTime
        expr: histogram_quantile(0.95, rate(varnish_backend_req_duration_seconds_bucket[5m])) > 1
        for: 2m
        labels:
          severity: critical
          service: varnish
        annotations:
          summary: "Varnish backend response time is high"
          description: "95th percentile backend response time is {{ $value }}s"

      - alert: VarnishBackendDown
        expr: varnish_backend_up == 0
        for: 30s
        labels:
          severity: critical
          service: varnish
        annotations:
          summary: "Varnish backend is down"
          description: "Backend {{ $labels.backend }} is not responding"

      - alert: VarnishHighMemoryUsage
        # g_space is the free space left in the storage, so usage is used / (used + free)
        expr: 100 * varnish_sma_g_bytes{type="s0"} / (varnish_sma_g_bytes{type="s0"} + varnish_sma_g_space{type="s0"}) > 90
        for: 5m
        labels:
          severity: warning
          service: varnish
        annotations:
          summary: "Varnish memory usage is high"
          description: "Memory usage is {{ $value }}% of allocated space"

      - alert: VarnishClientErrors
        # Confirm this series exists in your exporter output first; the stock
        # varnishstat counters may not include a 4xx response count.
        expr: rate(varnish_main_client_resp_4xx[5m]) > 10
        for: 2m
        labels:
          severity: warning
          service: varnish
        annotations:
          summary: "High rate of 4xx errors from Varnish"
          description: "4xx error rate is {{ $value }} per second"
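The memory alert depends on how the two storage gauges relate: g_bytes is the space currently allocated and g_space is the space still free, so the configured storage size is their sum and utilization is g_bytes / (g_bytes + g_space). The worked example below checks that arithmetic. `sma_used_pct` is a hypothetical helper and the byte values are made up.

```shell
# sma_used_pct USED_BYTES FREE_BYTES
# Storage utilization in percent from the g_bytes and g_space gauges.
sma_used_pct() {
  awk -v used="$1" -v free="$2" 'BEGIN {
    total = used + free          # configured storage size
    if (total <= 0) { print "n/a"; exit }
    printf "%.1f\n", 100 * used / total
  }'
}

# 900 MiB allocated, 100 MiB free.
sma_used_pct 943718400 104857600
```

Dividing used by free instead would give 900% for the same sample, which is why a threshold of 90 only makes sense against the sum.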

Update Prometheus configuration

Add the new alerting rules file to Prometheus configuration and restart the service.

rule_files:
  - "/etc/prometheus/rules/varnish.yml"

sudo systemctl restart prometheus
sudo systemctl status prometheus

Configure advanced dashboard panels

Add further panels for thread usage, object counts, and purge operations by appending the entries below to the dashboard's panels array.

{
  "panels": [
    {
      "id": 5,
      "title": "Thread Pool Usage",
      "type": "timeseries",
      "targets": [
        {
          "expr": "varnish_main_threads",
          "refId": "A",
          "legendFormat": "Active Threads"
        },
        {
          "expr": "varnish_main_thread_queue_len",
          "refId": "B",
          "legendFormat": "Queued Requests"
        }
      ],
      "gridPos": {"h": 6, "w": 8, "x": 0, "y": 12}
    },
    {
      "id": 6,
      "title": "Object Lifecycle",
      "type": "timeseries",
      "targets": [
        {
          "expr": "varnish_main_n_object",
          "refId": "A",
          "legendFormat": "Objects in cache"
        },
        {
          "expr": "varnish_main_n_objecthead",
          "refId": "B",
          "legendFormat": "Object heads"
        }
      ],
      "gridPos": {"h": 6, "w": 8, "x": 8, "y": 12}
    },
    {
      "id": 7,
      "title": "Purge Operations",
      "type": "timeseries",
      "targets": [
        {
          "expr": "rate(varnish_main_n_purges[5m])",
          "refId": "A",
          "legendFormat": "Purges per second"
        },
        {
          "expr": "rate(varnish_main_n_obj_purged[5m])",
          "refId": "B",
          "legendFormat": "Objects purged"
        }
      ],
      "gridPos": {"h": 6, "w": 8, "x": 16, "y": 12}
    }
  ]
}

Set up Grafana notifications

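Configure a notification channel for critical Varnish alerts so they reach your existing monitoring workflow. This call uses Grafana's legacy alerting API (/api/alert-notifications), which is only available on Grafana versions that still ship legacy alerting; on versions with unified alerting, create a contact point instead.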

curl -X POST \
  http://admin:admin@localhost:3000/api/alert-notifications \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "varnish-alerts",
    "type": "webhook",
    "settings": {
      "url": "https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK",
      "httpMethod": "POST"
    }
  }'
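On Grafana versions that use unified alerting, the rough equivalent is a contact point created through the alerting provisioning API. The sketch below only builds the JSON body. The endpoint path /api/v1/provisioning/contact-points and the slack type with a url setting are assumptions to verify against the documentation for your Grafana version, and `contact_point_payload` is a hypothetical helper.

```shell
# contact_point_payload NAME WEBHOOK_URL
# Print a JSON body for a Slack contact point (assumed schema; verify for
# your Grafana version). Neither argument may contain double quotes.
contact_point_payload() {
  printf '{"name":"%s","type":"slack","settings":{"url":"%s"}}\n' "$1" "$2"
}

contact_point_payload varnish-alerts https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK

# To send it (assumed endpoint, default credentials as used elsewhere in this tutorial):
#   contact_point_payload varnish-alerts "$SLACK_URL" |
#     curl -X POST -H 'Content-Type: application/json' -d @- \
#       http://admin:admin@localhost:3000/api/v1/provisioning/contact-points
```

Building the payload separately from the request makes it easy to inspect the JSON before anything is sent to Grafana.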

Verify your setup

Test that all components are working together and collecting metrics properly.

# Check exporter is running and serving metrics
sudo systemctl status varnish-exporter
curl -s http://localhost:9131/metrics | grep -E "varnish_main_(cache_hit|cache_miss|client_req)"

# Verify Prometheus is scraping Varnish metrics
curl -s -G http://localhost:9090/api/v1/query --data-urlencode 'query=up{job="varnish"}'

# Confirm the dashboard was imported
curl -s 'http://admin:admin@localhost:3000/api/search?query=varnish'

# Generate some cache activity to see metrics
# (adjust the .vtc path to wherever your package installs its examples)
varnishtest -v /usr/share/doc/varnish/examples/test01.vtc

# Check alert rules are loaded
curl -s http://localhost:9090/api/v1/rules | jq '.data.groups[] | select(.name=="varnish_alerts")'

Common issues

| Symptom | Cause | Fix |
|---|---|---|
| Exporter returns "permission denied" | User not in varnish group | sudo usermod -a -G varnish varnish-exporter |
| No metrics in Prometheus | Firewall blocking scrape | Check firewall rules and Prometheus config |
| Dashboard shows "No data" | Wrong metric names or time range | Verify metric names with curl localhost:9131/metrics |
| Alerts not firing | Rules syntax error or thresholds too high | promtool check rules /etc/prometheus/rules/varnish.yml |
| High memory usage alerts | Varnish cache size misconfigured | Adjust malloc size in varnish systemd config |

Note: For production environments, consider implementing advanced Grafana dashboards and alerting with more sophisticated notification routing and escalation policies.
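Because a wrong metric name is the most common reason for an empty panel or a silent alert, it can help to check every name your dashboard and rules use against the exporter output in one pass. The sketch below is one way to do that. `missing_metrics` is a hypothetical helper and the sample output is made up.

```shell
# missing_metrics NAME...: read exporter output on stdin and report each
# listed metric name that has no sample line; returns 1 if any is missing.
missing_metrics() {
  input=$(cat)
  status=0
  for name in "$@"; do
    # A sample line starts with the name followed by "{" (labels) or a space.
    if ! printf '%s\n' "$input" | grep -q -E "^${name}(\{| )"; then
      echo "missing: $name"
      status=1
    fi
  done
  return "$status"
}

# Demo with made-up exporter output.
sample='varnish_main_cache_hit 42
varnish_sma_g_bytes{type="s0"} 1024'

printf '%s\n' "$sample" |
  missing_metrics varnish_main_cache_hit varnish_sma_g_bytes varnish_backend_up || true

# Against the live endpoint:
#   curl -s http://localhost:9131/metrics | missing_metrics varnish_main_cache_hit varnish_main_cache_miss
```

Run it with the full list of names from your panels and alert rules whenever you upgrade Varnish or the exporter, since counter names can change between versions.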
