Monitor Varnish 7 with Prometheus and Grafana

Set up comprehensive Varnish monitoring using prometheus-varnish-exporter, custom Grafana dashboards, and performance alerting rules for production cache optimization.

Prerequisites

Varnish 7 installed and running
Prometheus server configured
Grafana dashboard access
Basic systemd knowledge

What this solves

Varnish cache performance monitoring helps you track hit rates, response times, backend health, and memory usage in real-time. This tutorial sets up prometheus-varnish-exporter to collect metrics, configures Prometheus scraping, creates comprehensive Grafana dashboards, and establishes alerting rules for cache misses, high latency, and backend failures.

Step-by-step configuration

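Verify Varnish statistics

First verify that Varnish is running and that varnishstat can read its counters. varnishstat is a command-line tool that reads counters from Varnish's shared memory, not a separate daemon, so there is nothing extra to install or enable.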

sudo systemctl status varnish
varnishstat -1
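
As a quick sanity check before wiring up Prometheus, the hit ratio can be computed directly from the varnishstat output. The sketch below is only an illustration: hit_ratio is a hypothetical helper name, and it assumes the usual `varnishstat -1` layout in which the counter name is the first column and its value the second.

```shell
# Compute the cache hit ratio (percent) from `varnishstat -1` style output on stdin.
# hit_ratio is a hypothetical helper, not part of Varnish.
hit_ratio() {
    awk '
        $1 == "MAIN.cache_hit"  { hit = $2 }
        $1 == "MAIN.cache_miss" { miss = $2 }
        END {
            total = hit + miss
            # Avoid dividing by zero on a freshly started cache.
            if (total == 0) { print "n/a"; exit }
            printf "%.2f\n", 100 * hit / total
        }'
}

# Usage against a live instance (needs read access to Varnish's shared memory):
#   varnishstat -1 -f MAIN.cache_hit -f MAIN.cache_miss | hit_ratio
```

These are lifetime counters, so the result is the ratio since Varnish started, not the 5-minute rate that the Prometheus queries later in this tutorial compute.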

Install prometheus-varnish-exporter

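Download the prometheus_varnish_exporter release binary, a community-maintained exporter, and install it to /usr/local/bin. The download steps are identical on every distribution; only the package manager command differs.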

# Ubuntu / Debian
sudo apt update
wget https://github.com/jonnenauha/prometheus_varnish_exporter/releases/download/1.6.1/prometheus_varnish_exporter-1.6.1.linux-amd64.tar.gz
tar xzf prometheus_varnish_exporter-1.6.1.linux-amd64.tar.gz
sudo cp prometheus_varnish_exporter-1.6.1.linux-amd64/prometheus_varnish_exporter /usr/local/bin/
sudo chmod +x /usr/local/bin/prometheus_varnish_exporter

# AlmaLinux / Rocky Linux / RHEL / Fedora
sudo dnf update -y
wget https://github.com/jonnenauha/prometheus_varnish_exporter/releases/download/1.6.1/prometheus_varnish_exporter-1.6.1.linux-amd64.tar.gz
tar xzf prometheus_varnish_exporter-1.6.1.linux-amd64.tar.gz
sudo cp prometheus_varnish_exporter-1.6.1.linux-amd64/prometheus_varnish_exporter /usr/local/bin/
sudo chmod +x /usr/local/bin/prometheus_varnish_exporter
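
Because the binary is downloaded over the network and later run as a service, it may be worth comparing the archive against a checksum before installing it. The following is a minimal sketch: verify_sha256 is a hypothetical helper, and the expected checksum has to come from a source you trust, since this tutorial does not supply one.

```shell
# Compare a file's SHA-256 digest with an expected value.
# verify_sha256 is a hypothetical helper; it relies on sha256sum from coreutils.
verify_sha256() (
    file="$1"
    expected="$2"
    actual=$(sha256sum "$file" | awk '{ print $1 }')
    if [ "$actual" != "$expected" ]; then
        echo "checksum mismatch for $file" >&2
        exit 1
    fi
)

# Usage (EXPECTED_SHA256 is a placeholder you must fill in yourself):
#   verify_sha256 prometheus_varnish_exporter-1.6.1.linux-amd64.tar.gz "$EXPECTED_SHA256" \
#       && tar xzf prometheus_varnish_exporter-1.6.1.linux-amd64.tar.gz
```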

Create varnish-exporter user

Create a dedicated system user for running the exporter securely without shell access.

sudo useradd --system --no-create-home --shell /bin/false varnish-exporter
sudo usermod -a -G varnish varnish-exporter
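
The exporter can only read Varnish counters if its user really is in the varnish group, so a quick membership check can save debugging later. This is a small sketch; in_group is a hypothetical helper built on the standard id command.

```shell
# Succeed if user $1 is a member of group $2.
# in_group is a hypothetical helper name.
in_group() {
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qxF "$2"
}

# Usage:
#   in_group varnish-exporter varnish && echo "group membership OK"
```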

Create systemd service

Create /etc/systemd/system/varnish-exporter.service so the exporter runs as a systemd service with basic sandboxing and automatic restart.

[Unit]
Description=Prometheus Varnish Exporter
After=network.target varnish.service
Requires=varnish.service

[Service]
Type=simple
User=varnish-exporter
Group=varnish-exporter
ExecStart=/usr/local/bin/prometheus_varnish_exporter \
    -web.listen-address :9131 \
    -web.telemetry-path /metrics
Restart=always
RestartSec=10
KillMode=process

# Security settings
NoNewPrivileges=yes
PrivateTmp=yes
ProtectSystem=strict
ProtectHome=yes
ReadOnlyPaths=/

[Install]
WantedBy=multi-user.target

Start and enable varnish-exporter

Enable the service to start on boot and verify it's collecting metrics properly.

sudo systemctl daemon-reload
sudo systemctl enable --now varnish-exporter
sudo systemctl status varnish-exporter

Test metrics endpoint

Verify the exporter is serving Varnish metrics in Prometheus format on port 9131.

curl http://localhost:9131/metrics | grep varnish_main_cache_hit
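
When scripting this check, it can be handy to pull out a single value rather than grep whole lines. The sketch below assumes the Prometheus text format, where an unlabelled sample is a metric name followed by its value; metric_value is a hypothetical helper name.

```shell
# Print the value of an unlabelled metric from Prometheus text format on stdin.
# Returns non-zero if the metric is absent. metric_value is a hypothetical helper.
metric_value() {
    awk -v name="$1" '
        $1 == name { print $2; found = 1; exit }
        END { if (!found) exit 1 }'
}

# Usage:
#   curl -s http://localhost:9131/metrics | metric_value varnish_main_cache_hit
```

Comment lines such as `# TYPE varnish_main_cache_hit counter` are skipped automatically because their first field is `#`, not the metric name.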

Configure firewall

Allow only the Prometheus server (203.0.113.10 in this example) to reach the exporter port. Use the ufw commands on Ubuntu/Debian or the firewall-cmd commands on RHEL-family systems.

sudo ufw allow from 203.0.113.10 to any port 9131
sudo ufw reload

sudo firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="203.0.113.10" port protocol="tcp" port="9131" accept'
sudo firewall-cmd --reload
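
If the Prometheus address comes from a variable or a script argument, validating it before it reaches the firewall command avoids creating a rule for a mistyped value. This is a sketch under that assumption; is_ipv4 is a hypothetical helper and only handles dotted-quad IPv4 addresses.

```shell
# Succeed only for a dotted-quad IPv4 address with every octet in 0-255.
# is_ipv4 is a hypothetical helper; the subshell body keeps IFS changes local.
is_ipv4() (
    printf '%s\n' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$' || exit 1
    IFS=.
    set -- $1
    for octet in "$@"; do
        [ "$octet" -le 255 ] || exit 1
    done
)

# Usage:
#   PROMETHEUS_IP=203.0.113.10
#   is_ipv4 "$PROMETHEUS_IP" && sudo ufw allow from "$PROMETHEUS_IP" to any port 9131 proto tcp
```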

Add Prometheus scrape configuration

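Add this job under scrape_configs in your Prometheus configuration so Varnish metrics are scraped every 15 seconds. If Prometheus runs on a different host from Varnish, replace localhost with the Varnish host's address.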

  - job_name: 'varnish'
    static_configs:
      - targets: ['localhost:9131']
    scrape_interval: 15s
    metrics_path: /metrics
    scrape_timeout: 10s
    honor_labels: true
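
With more than one Varnish host, the same job simply lists several targets. The sketch below generates such a stanza; varnish_scrape_job is a hypothetical helper, and the host names in the usage line are placeholders.

```shell
# Print a scrape job for the given hosts, each scraped on port 9131.
# varnish_scrape_job is a hypothetical helper name.
varnish_scrape_job() (
    targets=""
    for host in "$@"; do
        # Build a comma-separated list of quoted host:port entries.
        targets="${targets:+$targets, }'$host:9131'"
    done
    printf "  - job_name: 'varnish'\n"
    printf "    scrape_interval: 15s\n"
    printf "    static_configs:\n"
    printf "      - targets: [%s]\n" "$targets"
)

# Usage (cache1 and cache2 are placeholder host names):
#   varnish_scrape_job cache1 cache2
```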

Restart Prometheus

Apply the new configuration and verify Varnish targets are being scraped successfully.

sudo systemctl restart prometheus
sudo systemctl status prometheus
curl http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | select(.job=="varnish")'

Create Grafana dashboard

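Save the following dashboard definition as /tmp/varnish-dashboard.json, the path used by the import step. It covers hit rate, request rate, backend response time, and memory usage. The backend response time panel queries a histogram named varnish_backend_req_duration_seconds_bucket; varnishstat exposes counters and gauges rather than timing histograms, so confirm that this metric appears in your exporter's /metrics output before relying on the panel.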

{
  "dashboard": {
    "id": null,
    "title": "Varnish Cache Performance",
    "tags": ["varnish", "cache", "performance"],
    "timezone": "browser",
    "panels": [
      {
        "id": 1,
        "title": "Cache Hit Rate",
        "type": "stat",
        "targets": [
          {
            "expr": "100 * (rate(varnish_main_cache_hit[5m]) / (rate(varnish_main_cache_hit[5m]) + rate(varnish_main_cache_miss[5m])))",
            "refId": "A"
          }
        ],
        "fieldConfig": {
          "defaults": {
            "unit": "percent",
            "min": 0,
            "max": 100,
            "thresholds": {
              "steps": [
                {"color": "red", "value": 0},
                {"color": "yellow", "value": 80},
                {"color": "green", "value": 95}
              ]
            }
          }
        },
        "gridPos": {"h": 4, "w": 6, "x": 0, "y": 0}
      },
      {
        "id": 2,
        "title": "Requests per Second",
        "type": "stat",
        "targets": [
          {
            "expr": "rate(varnish_main_client_req[5m])",
            "refId": "A"
          }
        ],
        "fieldConfig": {
          "defaults": {
            "unit": "reqps"
          }
        },
        "gridPos": {"h": 4, "w": 6, "x": 6, "y": 0}
      },
      {
        "id": 3,
        "title": "Backend Response Time",
        "type": "timeseries",
        "targets": [
          {
            "expr": "histogram_quantile(0.95, rate(varnish_backend_req_duration_seconds_bucket[5m]))",
            "refId": "A",
            "legendFormat": "95th percentile"
          },
          {
            "expr": "histogram_quantile(0.50, rate(varnish_backend_req_duration_seconds_bucket[5m]))",
            "refId": "B",
            "legendFormat": "50th percentile"
          }
        ],
        "fieldConfig": {
          "defaults": {
            "unit": "s"
          }
        },
        "gridPos": {"h": 8, "w": 12, "x": 0, "y": 4}
      },
      {
        "id": 4,
        "title": "Memory Usage",
        "type": "timeseries",
        "targets": [
          {
            "expr": "varnish_sma_g_bytes{type=\"s0\"}",
            "refId": "A",
            "legendFormat": "Used"
          },
          {
            "expr": "varnish_sma_g_space{type=\"s0\"}",
            "refId": "B",
            "legendFormat": "Available"
          }
        ],
        "fieldConfig": {
          "defaults": {
            "unit": "bytes"
          }
        },
        "gridPos": {"h": 8, "w": 12, "x": 12, "y": 4}
      }
    ],
    "time": {
      "from": "now-1h",
      "to": "now"
    },
    "refresh": "5s"
  }
}

Import dashboard via API

Use the Grafana HTTP API to import the dashboard. The example uses Grafana's default admin:admin credentials; substitute your own.

curl -X POST \
  http://admin:admin@localhost:3000/api/dashboards/db \
  -H 'Content-Type: application/json' \
  -d @/tmp/varnish-dashboard.json
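
Putting a password in the URL leaves it in shell history, so a token passed through the environment may be preferable. The sketch below assumes you have already created a Grafana API or service account token; grafana_import, GRAFANA_URL, and GRAFANA_TOKEN are hypothetical names chosen for this example.

```shell
# Import a dashboard file using a bearer token instead of basic auth.
# grafana_import, GRAFANA_URL and GRAFANA_TOKEN are names assumed for this sketch.
grafana_import() (
    dashboard_file="$1"
    grafana_url="${GRAFANA_URL:-http://localhost:3000}"
    if [ -z "${GRAFANA_TOKEN:-}" ]; then
        echo "GRAFANA_TOKEN is not set" >&2
        exit 2
    fi
    if [ ! -r "$dashboard_file" ]; then
        echo "cannot read $dashboard_file" >&2
        exit 3
    fi
    # -f makes curl return non-zero on HTTP errors such as 401 or 412.
    curl -fsS -X POST "$grafana_url/api/dashboards/db" \
        -H "Authorization: Bearer $GRAFANA_TOKEN" \
        -H 'Content-Type: application/json' \
        -d @"$dashboard_file"
)

# Usage:
#   GRAFANA_TOKEN=... grafana_import /tmp/varnish-dashboard.json
```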

Create alerting rules

Save the following Prometheus alerting rules as /etc/prometheus/rules/varnish.yml. They cover low cache hit rates, high response times, backend failures, memory pressure, and client errors.

groups:
- name: varnish_alerts
  rules:
  - alert: VarnishLowCacheHitRate
    expr: 100 * (rate(varnish_main_cache_hit[5m]) / (rate(varnish_main_cache_hit[5m]) + rate(varnish_main_cache_miss[5m]))) < 80
    for: 2m
    labels:
      severity: warning
      service: varnish
    annotations:
      summary: "Varnish cache hit rate is below 80%"
      description: "Cache hit rate is {{ $value }}% for the last 5 minutes"

  - alert: VarnishHighBackendResponseTime
    expr: histogram_quantile(0.95, rate(varnish_backend_req_duration_seconds_bucket[5m])) > 1
    for: 2m
    labels:
      severity: critical
      service: varnish
    annotations:
      summary: "Varnish backend response time is high"
      description: "95th percentile backend response time is {{ $value }}s"

  - alert: VarnishBackendDown
    expr: varnish_backend_up == 0
    for: 30s
    labels:
      severity: critical
      service: varnish
    annotations:
      summary: "Varnish backend is down"
      description: "Backend {{ $labels.backend }} is not responding"

  - alert: VarnishHighMemoryUsage
    expr: varnish_sma_g_bytes{type="s0"} / (varnish_sma_g_bytes{type="s0"} + varnish_sma_g_space{type="s0"}) * 100 > 90
    for: 5m
    labels:
      severity: warning
      service: varnish
    annotations:
      summary: "Varnish memory usage is high"
      description: "Memory usage is {{ $value }}% of allocated space"

  - alert: VarnishClientErrors
    expr: rate(varnish_main_client_resp_4xx[5m]) > 10
    for: 2m
    labels:
      severity: warning
      service: varnish
    annotations:
      summary: "High rate of 4xx errors from Varnish"
      description: "4xx error rate is {{ $value }} per second"
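
The memory alert depends on how the two storage counters relate: g_bytes is the number of bytes currently allocated and g_space the number still free, so the storage size is their sum and dividing by g_space alone would overstate usage. The arithmetic can be sketched as follows; sma_usage_percent is a hypothetical helper and the figures in the usage line are made up.

```shell
# Percentage of malloc storage in use, given allocated and free bytes.
# sma_usage_percent is a hypothetical helper name.
sma_usage_percent() {
    awk -v used="$1" -v free="$2" 'BEGIN {
        total = used + free
        if (total <= 0) { print "n/a"; exit }
        printf "%.1f\n", 100 * used / total
    }'
}

# Usage (illustrative numbers: 900 MB allocated, 100 MB free):
#   sma_usage_percent 900000000 100000000
```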

Update Prometheus configuration

Add the new alerting rules file to Prometheus configuration and restart the service.

rule_files:
  - "/etc/prometheus/rules/varnish.yml"

sudo systemctl restart prometheus
sudo systemctl status prometheus
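
A syntax error in the rules file can stop Prometheus from starting, so it is worth running promtool before the restart. The wrapper below is a sketch: check_varnish_rules and the PROMTOOL override variable are hypothetical names, while `promtool check rules` is the same command listed under Common issues.

```shell
# Validate the rules file with promtool before restarting Prometheus.
# check_varnish_rules and the PROMTOOL variable are names assumed for this sketch.
check_varnish_rules() (
    rules_file="${1:-/etc/prometheus/rules/varnish.yml}"
    promtool_bin="${PROMTOOL:-promtool}"
    if ! command -v "$promtool_bin" >/dev/null 2>&1; then
        echo "promtool not found; skipping validation" >&2
        exit 2
    fi
    "$promtool_bin" check rules "$rules_file"
)

# Usage:
#   check_varnish_rules && sudo systemctl restart prometheus
```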

Configure advanced dashboard panels

Create additional monitoring panels for thread usage, object counts, and purge operations. Append them to the panels array of the dashboard created earlier.

{
  "panels": [
    {
      "id": 5,
      "title": "Thread Pool Usage",
      "type": "timeseries",
      "targets": [
        {
          "expr": "varnish_main_threads",
          "refId": "A",
          "legendFormat": "Active Threads"
        },
        {
          "expr": "varnish_main_thread_queue_len",
          "refId": "B",
          "legendFormat": "Queued Requests"
        }
      ],
      "gridPos": {"h": 6, "w": 8, "x": 0, "y": 12}
    },
    {
      "id": 6,
      "title": "Object Lifecycle",
      "type": "timeseries",
      "targets": [
        {
          "expr": "varnish_main_n_object",
          "refId": "A",
          "legendFormat": "Objects in cache"
        },
        {
          "expr": "varnish_main_n_objecthead",
          "refId": "B",
          "legendFormat": "Object heads"
        }
      ],
      "gridPos": {"h": 6, "w": 8, "x": 8, "y": 12}
    },
    {
      "id": 7,
      "title": "Purge Operations",
      "type": "timeseries",
      "targets": [
        {
          "expr": "rate(varnish_main_n_purges[5m])",
          "refId": "A",
          "legendFormat": "Purges per second"
        },
        {
          "expr": "rate(varnish_main_n_obj_purged[5m])",
          "refId": "B",
          "legendFormat": "Objects purged"
        }
      ],
      "gridPos": {"h": 6, "w": 8, "x": 16, "y": 12}
    }
  ]
}

Set up Grafana notifications

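Configure a notification channel so critical Varnish alerts reach your existing monitoring workflow. This call uses Grafana's legacy alert notification API; newer Grafana releases replace notification channels with contact points under unified alerting, so check which one your version supports. Note that the Prometheus rules defined earlier are delivered through Alertmanager, not through Grafana channels.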

curl -X POST \
  http://admin:admin@localhost:3000/api/alert-notifications \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "varnish-alerts",
    "type": "webhook",
    "settings": {
      "url": "https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK",
      "httpMethod": "POST"
    }
  }'

Verify your setup

Test that all components are working together and collecting metrics properly.

# Check exporter is running and serving metrics
sudo systemctl status varnish-exporter
curl -s http://localhost:9131/metrics | grep -E "varnish_main_(cache_hit|cache_miss|client_req)"

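# Verify Prometheus is scraping Varnish metrics
curl -s -G http://localhost:9090/api/v1/query --data-urlencode 'query=up{job="varnish"}'

# Check that the Varnish dashboard exists in Grafana
curl -s 'http://admin:admin@localhost:3000/api/search?query=varnish'

# Run a Varnish test case (the example .vtc path varies by distribution).
# Note: varnishtest starts its own temporary varnishd, so counters on the
# running instance change only when real client requests pass through it.
varnishtest -v /usr/share/doc/varnish/examples/test01.vtc

# Check alert rules are loaded
curl -s http://localhost:9090/api/v1/rules | jq '.data.groups[] | select(.name=="varnish_alerts")'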
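
For unattended checks, the query response can be turned into an exit status. The sketch below assumes the compact JSON that the Prometheus query API returns, in which a healthy target shows a sample value of "1"; target_is_up is a hypothetical helper and deliberately avoids a jq dependency.

```shell
# Succeed if a Prometheus instant-query response on stdin contains a sample with value "1".
# target_is_up is a hypothetical helper; it assumes compact (unindented) JSON.
target_is_up() {
    grep -q '"value":\[[0-9.]*,"1"\]'
}

# Usage:
#   curl -s -G http://localhost:9090/api/v1/query \
#       --data-urlencode 'query=up{job="varnish"}' | target_is_up \
#       && echo "varnish target is up"
```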

Common issues

Symptom	Cause	Fix
Exporter returns "permission denied"	User not in varnish group	`sudo usermod -a -G varnish varnish-exporter`
No metrics in Prometheus	Firewall blocking scrape	Check firewall rules and Prometheus config
Dashboard shows "No data"	Wrong metric names or time range	Verify metric names with `curl localhost:9131/metrics`
Alerts not firing	Rules syntax error or thresholds too high	`promtool check rules /etc/prometheus/rules/varnish.yml`
High memory usage alerts	Varnish cache size misconfigured	Adjust `malloc` size in varnish systemd config

Note: For production environments, consider implementing advanced Grafana dashboards and alerting with more sophisticated notification routing and escalation policies.

Next steps

Set up Prometheus and Grafana monitoring stack with Docker Compose for a complete monitoring solution
Install and configure Varnish Cache 7 with NGINX backend to optimize your web acceleration setup
Configure Varnish cache warming strategies to improve cache hit rates
Set up Varnish cluster with load balancing for high availability deployments
Configure advanced alerting rules for comprehensive Varnish performance monitoring


Automated install script

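Run this to automate the exporter installation, systemd service, and firewall rule. Prometheus and Grafana configuration remain manual steps.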

install.sh

#!/usr/bin/env bash
set -euo pipefail

# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'

# Configuration
EXPORTER_VERSION="1.6.1"
EXPORTER_URL="https://github.com/jonnenauha/prometheus_varnish_exporter/releases/download/${EXPORTER_VERSION}/prometheus_varnish_exporter-${EXPORTER_VERSION}.linux-amd64.tar.gz"
PROMETHEUS_SERVER_IP="${1:-127.0.0.1}"

# Usage function
usage() {
    echo "Usage: $0 [prometheus_server_ip]"
    echo "Example: $0 192.168.1.100"
    exit 1
}

# Error handling
cleanup_on_error() {
    echo -e "${RED}[ERROR] Installation failed. Cleaning up...${NC}"
    systemctl stop varnish-exporter 2>/dev/null || true
    systemctl disable varnish-exporter 2>/dev/null || true
    rm -f /etc/systemd/system/varnish-exporter.service
    userdel -f varnish-exporter 2>/dev/null || true
    rm -f /usr/local/bin/prometheus_varnish_exporter
    rm -rf /tmp/varnish_exporter_install
    exit 1
}
trap cleanup_on_error ERR

# Check if running as root
if [[ $EUID -ne 0 ]]; then
    echo -e "${RED}This script must be run as root${NC}"
    exit 1
fi

# Detect distribution
if [ -f /etc/os-release ]; then
    . /etc/os-release
    case "$ID" in
        ubuntu|debian) 
            PKG_MGR="apt"
            PKG_INSTALL="apt install -y"
            PKG_UPDATE="apt update"
            FIREWALL_CMD="ufw"
            ;;
        almalinux|rocky|centos|rhel|ol|fedora) 
            PKG_MGR="dnf"
            PKG_INSTALL="dnf install -y"
            PKG_UPDATE="dnf -y makecache"
            FIREWALL_CMD="firewalld"
            ;;
        amzn) 
            PKG_MGR="yum"
            PKG_INSTALL="yum install -y"
            PKG_UPDATE="yum -y makecache"
            FIREWALL_CMD="firewalld"
            ;;
        *) 
            echo -e "${RED}Unsupported distribution: $ID${NC}"
            exit 1
            ;;
    esac
else
    echo -e "${RED}/etc/os-release not found. Cannot detect distribution.${NC}"
    exit 1
fi

echo -e "${GREEN}Installing Varnish Prometheus Exporter on $PRETTY_NAME${NC}"

# Step 1: Verify prerequisites
echo -e "${YELLOW}[1/9] Checking prerequisites...${NC}"
if ! systemctl is-active varnish >/dev/null 2>&1; then
    echo -e "${RED}Varnish service is not running. Please install and start Varnish first.${NC}"
    exit 1
fi

if ! command -v varnishstat >/dev/null 2>&1; then
    echo -e "${RED}varnishstat command not found. Please install Varnish tools.${NC}"
    exit 1
fi

# Step 2: Update packages
echo -e "${YELLOW}[2/9] Updating package repositories...${NC}"
$PKG_UPDATE >/dev/null

# Step 3: Install required packages
echo -e "${YELLOW}[3/9] Installing required packages...${NC}"
$PKG_INSTALL wget tar curl >/dev/null

# Step 4: Download and install exporter
echo -e "${YELLOW}[4/9] Downloading Varnish Prometheus Exporter...${NC}"
cd /tmp
mkdir -p varnish_exporter_install
cd varnish_exporter_install

wget -q "$EXPORTER_URL"
tar xzf "prometheus_varnish_exporter-${EXPORTER_VERSION}.linux-amd64.tar.gz"
cp "prometheus_varnish_exporter-${EXPORTER_VERSION}.linux-amd64/prometheus_varnish_exporter" /usr/local/bin/
chmod 755 /usr/local/bin/prometheus_varnish_exporter
chown root:root /usr/local/bin/prometheus_varnish_exporter

# Step 5: Create system user
echo -e "${YELLOW}[5/9] Creating varnish-exporter user...${NC}"
if ! id varnish-exporter >/dev/null 2>&1; then
    useradd --no-create-home --shell /bin/false --system varnish-exporter
fi
usermod -a -G varnish varnish-exporter

# Step 6: Create systemd service
echo -e "${YELLOW}[6/9] Creating systemd service...${NC}"
cat > /etc/systemd/system/varnish-exporter.service << 'EOF'
[Unit]
Description=Prometheus Varnish Exporter
After=network.target varnish.service
Requires=varnish.service

[Service]
Type=simple
User=varnish-exporter
Group=varnish-exporter
ExecStart=/usr/local/bin/prometheus_varnish_exporter \
    -web.listen-address :9131 \
    -web.telemetry-path /metrics
Restart=always
RestartSec=10
KillMode=process

# Security settings
NoNewPrivileges=yes
PrivateTmp=yes
ProtectSystem=strict
ProtectHome=yes
ReadOnlyPaths=/

[Install]
WantedBy=multi-user.target
EOF

chmod 644 /etc/systemd/system/varnish-exporter.service

# Step 7: Start and enable service
echo -e "${YELLOW}[7/9] Starting varnish-exporter service...${NC}"
systemctl daemon-reload
systemctl enable varnish-exporter
systemctl start varnish-exporter

# Wait for service to start
sleep 3

# Step 8: Configure firewall
echo -e "${YELLOW}[8/9] Configuring firewall...${NC}"
if [[ "$FIREWALL_CMD" == "ufw" ]]; then
    if systemctl is-active ufw >/dev/null 2>&1; then
        ufw allow from "$PROMETHEUS_SERVER_IP" to any port 9131 >/dev/null
        ufw --force reload >/dev/null
    fi
elif [[ "$FIREWALL_CMD" == "firewalld" ]]; then
    if systemctl is-active firewalld >/dev/null 2>&1; then
        firewall-cmd --permanent --add-rich-rule="rule family=\"ipv4\" source address=\"$PROMETHEUS_SERVER_IP\" port protocol=\"tcp\" port=\"9131\" accept" >/dev/null
        firewall-cmd --reload >/dev/null
    fi
fi

# Step 9: Verification
echo -e "${YELLOW}[9/9] Verifying installation...${NC}"

# Check service status
if ! systemctl is-active varnish-exporter >/dev/null; then
    echo -e "${RED}varnish-exporter service is not running${NC}"
    systemctl status varnish-exporter --no-pager || true
    exit 1
fi

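# Check metrics endpoint. Capture the output first: with pipefail, grep -q
# closing the pipe early can make curl fail and the check misfire.
metrics_output="$(curl -s http://localhost:9131/metrics || true)"
if ! grep -q varnish_main_cache_hit <<< "$metrics_output"; then
    echo -e "${RED}Metrics endpoint is not responding correctly${NC}"
    exit 1
fi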

# Cleanup temp files
rm -rf /tmp/varnish_exporter_install

echo -e "${GREEN}✓ Varnish Prometheus Exporter installed successfully!${NC}"
echo ""
echo "Service status:"
systemctl status varnish-exporter --no-pager -l
echo ""
echo "Metrics available at: http://localhost:9131/metrics"
echo "Firewall allows scrapes from: $PROMETHEUS_SERVER_IP"
echo ""
echo "Add this to your Prometheus configuration:"
echo "  - job_name: 'varnish'"
echo "    static_configs:"
echo "      - targets: ['$(hostname -I | awk '{print $1}'):9131']"
echo "    scrape_interval: 15s"

Review the script before running. It must run as root. Execute with: sudo bash install.sh [prometheus_server_ip]

#varnish #prometheus #grafana #monitoring #performance
