Implement Varnish cache warming with automated content preloading for high-performance websites

Intermediate 35 min Apr 05, 2026
Ubuntu 24.04 Debian 12 AlmaLinux 9 Rocky Linux 9

Set up automated Varnish cache warming with priority URL preloading, systemd timers for scheduled content refreshing, and comprehensive monitoring to optimize cache hit rates and reduce backend server load for high-traffic websites.

Prerequisites

  • Existing Varnish installation
  • Root or sudo access
  • Python 3.6 or higher
  • Basic understanding of HTTP caching

What this solves

Varnish cache warming prevents cold cache performance issues by automatically preloading critical content before users request it. This eliminates first-request latency spikes and maintains consistently fast response times. Cache warming is essential for high-traffic websites that need predictable performance after deployments, server restarts, or cache invalidations.

Step-by-step configuration

Update system packages

Start by updating your package manager to ensure you get the latest versions of required tools.

# Debian/Ubuntu
sudo apt update && sudo apt upgrade -y

# AlmaLinux/Rocky Linux
sudo dnf update -y

Install cache warming dependencies

Install curl for making HTTP requests, jq for JSON parsing, and Python for advanced warming scripts.

# Debian/Ubuntu
sudo apt install -y curl jq python3 python3-pip python3-venv bc

# AlmaLinux/Rocky Linux
sudo dnf install -y curl jq python3 python3-pip bc

Create cache warming directory structure

Set up organized directories for warming scripts, URL lists, logs, and configuration files.

sudo mkdir -p /opt/varnish-warming/{scripts,config,logs,urls}
sudo mkdir -p /var/log/varnish-warming

Create the main cache warming script

Save this script as /opt/varnish-warming/scripts/warm-cache.sh. It handles URL loading with priority levels, parallel processing, and detailed logging.

#!/bin/bash
# Varnish Cache Warming Script
# Usage: ./warm-cache.sh [priority_level] [max_parallel]

set -euo pipefail

# Configuration
CONFIG_DIR="/opt/varnish-warming/config"
LOG_DIR="/var/log/varnish-warming"
URL_DIR="/opt/varnish-warming/urls"
VARNISH_HOST="127.0.0.1"
VARNISH_PORT="80"
DEFAULT_TIMEOUT="30"
DEFAULT_PARALLEL="10"
USER_AGENT="VarnishWarming/1.0"

# Load configuration if it exists
if [[ -f "$CONFIG_DIR/warming.conf" ]]; then
    source "$CONFIG_DIR/warming.conf"
fi

# Parameters
PRIORITY_LEVEL=${1:-"high"}
MAX_PARALLEL=${2:-$DEFAULT_PARALLEL}
LOG_FILE="$LOG_DIR/warming-$(date +%Y%m%d-%H%M%S).log"
STATS_FILE="$LOG_DIR/warming-stats.json"

# Logging function
log() {
    local level=$1
    shift
    echo "[$(date +'%Y-%m-%d %H:%M:%S')] [$level] $*" | tee -a "$LOG_FILE"
}

# Warm a single URL
warm_url() {
    local url=$1
    local priority=${2:-"medium"}
    local start_time=$(date +%s.%N)

    # First request with cache-busting headers
    local response=$(curl -s -w "%{http_code},%{time_total},%{size_download}" \
        -H "Cache-Control: no-cache" \
        -H "Pragma: no-cache" \
        -H "User-Agent: $USER_AGENT" \
        -H "X-Cache-Warming: true" \
        --max-time "$DEFAULT_TIMEOUT" \
        -o /dev/null \
        "$url" 2>/dev/null || echo "000,0,0")

    local http_code=$(echo "$response" | cut -d',' -f1)
    local response_time=$(echo "$response" | cut -d',' -f2)
    local content_size=$(echo "$response" | cut -d',' -f3)

    # Second request to populate the cache
    if [[ "$http_code" == "200" ]]; then
        curl -s \
            -H "User-Agent: $USER_AGENT" \
            --max-time "$DEFAULT_TIMEOUT" \
            -o /dev/null \
            "$url" 2>/dev/null || true
    fi

    local end_time=$(date +%s.%N)
    local total_time=$(echo "$end_time - $start_time" | bc)

    # Log result and emit one CSV record per URL
    if [[ "$http_code" == "200" ]]; then
        log "INFO" "SUCCESS: $url ($priority) - ${response_time}s, ${content_size} bytes"
        echo "success,$url,$priority,$http_code,$response_time,$content_size,$total_time"
    else
        log "WARN" "FAILED: $url ($priority) - HTTP $http_code"
        echo "failed,$url,$priority,$http_code,$response_time,$content_size,$total_time"
    fi
}
export -f warm_url
export -f log
export VARNISH_HOST VARNISH_PORT DEFAULT_TIMEOUT USER_AGENT LOG_FILE

# Warm all URLs for one priority level
warm_cache() {
    local priority=$1
    local url_file="$URL_DIR/urls-$priority.txt"

    if [[ ! -f "$url_file" ]]; then
        log "WARN" "URL file not found: $url_file"
        return 1
    fi

    # Count only real URLs, skipping comments and blank lines
    local url_count=$(grep -cv -e '^#' -e '^$' "$url_file" || true)
    log "INFO" "Starting cache warming for $priority priority ($url_count URLs)"

    # Process URLs in parallel
    grep -v '^#' "$url_file" | grep -v '^$' | \
        parallel -j "$MAX_PARALLEL" --will-cite \
        "warm_url {} $priority" > "/tmp/warming-results-$priority.csv"

    # Calculate statistics
    local success_count=$(grep -c '^success,' "/tmp/warming-results-$priority.csv" || true)
    local failed_count=$(grep -c '^failed,' "/tmp/warming-results-$priority.csv" || true)
    local success_rate=0
    if [[ $url_count -gt 0 ]]; then
        success_rate=$(echo "scale=2; $success_count * 100 / $url_count" | bc)
    fi

    log "INFO" "Completed $priority priority: $success_count/$url_count successful (${success_rate}%)"

    # Append a stats record (these fields are read by the monitoring script)
    cat <<EOF >> "$STATS_FILE"
{"timestamp": "$(date -Iseconds)", "priority": "$priority", "successful": $success_count, "failed": $failed_count, "success_rate": $success_rate}
EOF

    # Cleanup
    rm -f "/tmp/warming-results-$priority.csv"
}

# Install GNU parallel if not available
if ! command -v parallel &> /dev/null; then
    log "INFO" "Installing GNU parallel for concurrent processing"
    if command -v apt &> /dev/null; then
        sudo apt install -y parallel
    elif command -v dnf &> /dev/null; then
        sudo dnf install -y parallel
    fi
fi

# Main execution
log "INFO" "Starting Varnish cache warming (priority: $PRIORITY_LEVEL, parallel: $MAX_PARALLEL)"

case "$PRIORITY_LEVEL" in
    "high")   warm_cache "high" ;;
    "medium") warm_cache "medium" ;;
    "low")    warm_cache "low" ;;
    "all")
        warm_cache "high"
        warm_cache "medium"
        warm_cache "low"
        ;;
    *)
        log "ERROR" "Invalid priority level: $PRIORITY_LEVEL (use: high, medium, low, all)"
        exit 1
        ;;
esac

log "INFO" "Cache warming completed"
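Each warm_url call emits one CSV record, and warm_cache aggregates them into a success rate. The aggregation step can be checked standalone with made-up sample records (the URLs and values below are illustrative only):

```shell
# Made-up sample of the records warm_url emits: status,url,priority,code,time,size,total
cat > /tmp/results.csv <<'EOF'
success,http://example.com/,high,200,0.12,5120,0.30
failed,http://example.com/missing,high,404,0.05,0,0.10
success,http://example.com/about,high,200,0.09,2048,0.21
EOF

# Aggregate the way warm_cache does: count successes, derive a percentage
awk -F',' '{n++} $1=="success"{s++} END{printf "%d/%d (%.2f%%)\n", s, n, s*100/n}' /tmp/results.csv
# → 2/3 (66.67%)
```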

Create configuration file

Set up the main configuration file with customizable parameters for your environment. Create it as /opt/varnish-warming/config/warming.conf, which the warming script sources on startup.

# Varnish Cache Warming Configuration

# Varnish connection settings
VARNISH_HOST="127.0.0.1"
VARNISH_PORT="80"

# Request settings
DEFAULT_TIMEOUT="30"
DEFAULT_PARALLEL="15"
USER_AGENT="VarnishWarming/1.0"

# Retry settings
MAX_RETRIES="2"
RETRY_DELAY="1"

# Logging
LOG_RETENTION_DAYS="7"
DEBUG_MODE="false"

# Performance settings
RATE_LIMIT_DELAY="0.1"     # Seconds between requests to the same host
MAX_CONCURRENT_HOSTS="5"   # Max parallel hosts to warm simultaneously
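Because the warming script sources this file after defining its own defaults, any value in warming.conf wins. A quick standalone sketch of that mechanism, using a throwaway file in /tmp:

```shell
# Default hard-coded in the script
DEFAULT_PARALLEL="10"

# A minimal throwaway config overriding it
cat > /tmp/warming.conf <<'EOF'
DEFAULT_PARALLEL="15"
EOF

# Same mechanism the script uses: source the config after setting defaults
source /tmp/warming.conf
echo "$DEFAULT_PARALLEL"
# → 15
```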

Create priority URL lists

Set up URL lists organized by priority level for efficient cache warming scheduling. Save the high priority list as /opt/varnish-warming/urls/urls-high.txt; lines starting with # are comments and are skipped by the warming script.

# High priority URLs - critical pages that need immediate caching

# Homepage and main landing pages
http://example.com/
http://example.com/index.html
http://example.com/about
http://example.com/contact

# Critical API endpoints
http://example.com/api/health
http://example.com/api/config

# Essential assets
http://example.com/css/main.css
http://example.com/js/app.js
http://example.com/images/logo.png

Create medium and low priority URL lists

Add lists for medium and low priority content as /opt/varnish-warming/urls/urls-medium.txt and /opt/varnish-warming/urls/urls-low.txt to broaden cache warming coverage.

# Medium priority URLs - important but not critical

# Product categories and popular content
http://example.com/products
http://example.com/services
http://example.com/blog
http://example.com/news

# User-facing pages
http://example.com/login
http://example.com/register
http://example.com/dashboard

# Common resources
http://example.com/css/theme.css
http://example.com/js/utils.js

# Low priority URLs - nice to have cached

# Secondary pages and content
http://example.com/privacy
http://example.com/terms
http://example.com/help
http://example.com/faq

# Archive and historical content
http://example.com/archive
http://example.com/old-blog-posts

# Non-critical assets
http://example.com/images/background.jpg
http://example.com/fonts/custom.woff2
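The warming script strips comment and blank lines from these lists before handing URLs to parallel. A standalone sketch of that filter, with a throwaway demo file:

```shell
# Throwaway list mixing real URLs, a comment, and a blank line
cat > /tmp/urls-demo.txt <<'EOF'
# High priority URLs
http://example.com/

http://example.com/about
EOF

# The same filter the warming script applies before processing
grep -v '^#' /tmp/urls-demo.txt | grep -v '^$'
# → http://example.com/
# → http://example.com/about
```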

Create URL discovery script

Save this script as /opt/varnish-warming/scripts/discover-urls.py. It automatically discovers URLs from sitemaps and access logs to keep the URL lists current.

#!/usr/bin/env python3

import requests
import xml.etree.ElementTree as ET
import re
import argparse
from urllib.parse import urljoin
from collections import defaultdict
import logging

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)


class URLDiscovery:
    def __init__(self, base_url, output_dir='/opt/varnish-warming/urls'):
        self.base_url = base_url.rstrip('/')
        self.output_dir = output_dir
        self.discovered_urls = defaultdict(list)

    def discover_from_sitemap(self, sitemap_url=None):
        """Discover URLs from an XML sitemap."""
        if not sitemap_url:
            sitemap_url = urljoin(self.base_url, '/sitemap.xml')
        try:
            response = requests.get(sitemap_url, timeout=30)
            response.raise_for_status()
            root = ET.fromstring(response.content)

            # Handle sitemap index files by recursing into each child sitemap
            if 'sitemapindex' in root.tag:
                for sitemap in root.findall('.//{http://www.sitemaps.org/schemas/sitemap/0.9}sitemap'):
                    loc = sitemap.find('{http://www.sitemaps.org/schemas/sitemap/0.9}loc')
                    if loc is not None:
                        self.discover_from_sitemap(loc.text)
            else:
                # Handle a URL set
                for url in root.findall('.//{http://www.sitemaps.org/schemas/sitemap/0.9}url'):
                    loc = url.find('{http://www.sitemaps.org/schemas/sitemap/0.9}loc')
                    priority = url.find('{http://www.sitemaps.org/schemas/sitemap/0.9}priority')
                    if loc is not None:
                        url_text = loc.text
                        priority_value = float(priority.text) if priority is not None else 0.5
                        # Categorize by sitemap priority
                        if priority_value >= 0.8:
                            self.discovered_urls['high'].append(url_text)
                        elif priority_value >= 0.5:
                            self.discovered_urls['medium'].append(url_text)
                        else:
                            self.discovered_urls['low'].append(url_text)

            logger.info(f"Discovered {sum(len(urls) for urls in self.discovered_urls.values())} URLs from sitemap")
        except Exception as e:
            logger.error(f"Error parsing sitemap {sitemap_url}: {e}")

    def discover_from_access_log(self, log_file, min_requests=5):
        """Discover popular URLs from access logs."""
        url_counts = defaultdict(int)
        try:
            with open(log_file, 'r') as f:
                for line in f:
                    # Parse common log format: "GET /path HTTP/1.1"
                    match = re.search(r'"[A-Z]+ (\S+)', line)
                    if match:
                        path = match.group(1)
                        if not path.startswith('/'):
                            continue
                        # Skip common non-cacheable paths
                        if any(skip in path for skip in ['.php', '.cgi', '/admin/', '/api/auth']):
                            continue
                        url = urljoin(self.base_url, path)
                        url_counts[url] += 1

            # Categorize by popularity
            sorted_urls = sorted(url_counts.items(), key=lambda x: x[1], reverse=True)
            for url, count in sorted_urls:
                if count >= min_requests * 5:
                    self.discovered_urls['high'].append(url)
                elif count >= min_requests * 2:
                    self.discovered_urls['medium'].append(url)
                elif count >= min_requests:
                    self.discovered_urls['low'].append(url)

            logger.info(f"Discovered {len(sorted_urls)} URLs from access logs")
        except Exception as e:
            logger.error(f"Error parsing access log {log_file}: {e}")

    def save_url_lists(self):
        """Save discovered URLs to priority files."""
        for priority, urls in self.discovered_urls.items():
            # Remove duplicates and sort
            unique_urls = sorted(set(urls))
            filename = f"{self.output_dir}/urls-{priority}.txt"
            try:
                with open(filename, 'w') as f:
                    f.write(f"# {priority.title()} priority URLs - Auto-generated\n")
                    f.write(f"# Total URLs: {len(unique_urls)}\n\n")
                    for url in unique_urls:
                        f.write(f"{url}\n")
                logger.info(f"Saved {len(unique_urls)} {priority} priority URLs to {filename}")
            except Exception as e:
                logger.error(f"Error saving URL list {filename}: {e}")


def main():
    parser = argparse.ArgumentParser(description='Discover URLs for Varnish cache warming')
    parser.add_argument('base_url', help='Base URL of the website')
    parser.add_argument('--sitemap', help='Sitemap URL (default: /sitemap.xml)')
    parser.add_argument('--access-log', help='Path to access log file')
    parser.add_argument('--output-dir', default='/opt/varnish-warming/urls', help='Output directory')
    parser.add_argument('--min-requests', type=int, default=5,
                        help='Minimum requests for log-based discovery')
    args = parser.parse_args()

    discovery = URLDiscovery(args.base_url, args.output_dir)

    # Discover from sitemap
    discovery.discover_from_sitemap(args.sitemap)

    # Discover from access logs if provided
    if args.access_log:
        discovery.discover_from_access_log(args.access_log, args.min_requests)

    # Save results
    discovery.save_url_lists()

    total_urls = sum(len(urls) for urls in discovery.discovered_urls.values())
    logger.info(f"URL discovery completed. Total URLs: {total_urls}")


if __name__ == '__main__':
    main()
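The sitemap branch buckets URLs by the sitemap's priority value (>= 0.8 high, >= 0.5 medium, else low). The same thresholds can be checked from the shell with awk, which handles the float comparison:

```shell
# Same thresholds as URLDiscovery.discover_from_sitemap, as a standalone sketch
bucket() {
  awk -v p="$1" 'BEGIN { if (p >= 0.8) print "high"; else if (p >= 0.5) print "medium"; else print "low" }'
}
bucket 1.0   # → high
bucket 0.6   # → medium
bucket 0.3   # → low
```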

Set proper permissions

Configure ownership and permissions for the warming scripts and directories. The scripts need execute permissions while config files should be readable by the service user.

sudo chmod +x /opt/varnish-warming/scripts/warm-cache.sh
sudo chmod +x /opt/varnish-warming/scripts/discover-urls.py
sudo chown -R www-data:www-data /opt/varnish-warming
sudo chmod 755 /opt/varnish-warming/scripts/
sudo chmod 644 /opt/varnish-warming/config/*
sudo chmod 644 /opt/varnish-warming/urls/*
sudo chown -R www-data:www-data /var/log/varnish-warming
sudo chmod 755 /var/log/varnish-warming

Note: Directories get 755 (read/execute for everyone, write for owner) and files get 644 (read for everyone, write for owner). The www-data user owns both trees because the systemd service runs the warming and discovery scripts as www-data, which must be able to write URL lists, logs, and stats files.
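To confirm what a given octal mode means, GNU stat can print both forms side by side. A quick sketch with a throwaway file:

```shell
# Throwaway file to illustrate mode 644 (owner read/write, everyone else read-only)
touch /tmp/permdemo
chmod 644 /tmp/permdemo
stat -c '%a %A' /tmp/permdemo
# → 644 -rw-r--r--
```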

Create systemd service for cache warming

Set up a systemd service to manage cache warming operations with proper resource limits and logging. Create it as /etc/systemd/system/varnish-warming.service.

[Unit]
Description=Varnish Cache Warming Service
After=network.target varnish.service
Requires=varnish.service
Wants=network-online.target

[Service]
Type=oneshot
User=www-data
Group=www-data
ExecStart=/opt/varnish-warming/scripts/warm-cache.sh all 15
WorkingDirectory=/opt/varnish-warming
Environment=PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
StandardOutput=journal
StandardError=journal

# Resource limits
MemoryMax=512M
CPUQuota=50%
TasksMax=50

# Security settings
NoNewPrivileges=yes
PrivateTmp=yes
ProtectSystem=strict
ProtectHome=yes
ReadWritePaths=/var/log/varnish-warming /tmp

# Timeout settings
TimeoutStartSec=300
TimeoutStopSec=30

[Install]
WantedBy=multi-user.target

Create systemd timer for scheduled warming

Configure automatic cache warming with a systemd timer for consistent performance optimization. Create it as /etc/systemd/system/varnish-warming.timer.

[Unit]
Description=Varnish Cache Warming Timer
Requires=varnish-warming.service

[Timer]
# Run warming every 30 minutes
OnCalendar=*:0/30
RandomizedDelaySec=60
Persistent=true

[Install]
WantedBy=timers.target

Create additional timer for full warming

Set up a separate service and timer pair for complete cache warming, including URL rediscovery, during off-peak hours. Create /etc/systemd/system/varnish-warming-full.service:

[Unit]
Description=Varnish Full Cache Warming Service
After=network.target varnish.service
Requires=varnish.service

[Service]
Type=oneshot
User=www-data
Group=www-data
ExecStartPre=/opt/varnish-warming/scripts/discover-urls.py http://example.com
ExecStart=/opt/varnish-warming/scripts/warm-cache.sh all 20
WorkingDirectory=/opt/varnish-warming
StandardOutput=journal
StandardError=journal

# Resource limits for full warming
MemoryMax=1G
CPUQuota=75%
TasksMax=100

# Security settings
NoNewPrivileges=yes
PrivateTmp=yes
ProtectSystem=strict
ProtectHome=yes
ReadWritePaths=/var/log/varnish-warming /tmp /opt/varnish-warming/urls

# Timeout settings
TimeoutStartSec=1800
TimeoutStopSec=60

Then create the matching timer as /etc/systemd/system/varnish-warming-full.timer:

[Unit]
Description=Varnish Full Cache Warming Timer
Requires=varnish-warming-full.service

[Timer]
# Run full warming twice daily during off-peak hours
OnCalendar=02:00
OnCalendar=14:00
RandomizedDelaySec=300
Persistent=true

[Install]
WantedBy=timers.target

Enable and start the services

Reload systemd configuration and enable the cache warming services and timers.

sudo systemctl daemon-reload
sudo systemctl enable varnish-warming.timer
sudo systemctl enable varnish-warming-full.timer
sudo systemctl start varnish-warming.timer
sudo systemctl start varnish-warming-full.timer

Create monitoring and metrics script

Set up comprehensive monitoring for cache warming performance and hit rate analysis. Save this script as /opt/varnish-warming/scripts/monitor-warming.sh.

#!/bin/bash
# Varnish Cache Warming Monitor Script

set -euo pipefail

LOG_DIR="/var/log/varnish-warming"
STATS_FILE="$LOG_DIR/warming-stats.json"
METRICS_FILE="$LOG_DIR/warming-metrics.json"

# Get Varnish statistics as JSON
get_varnish_stats() {
    if command -v varnishstat &> /dev/null; then
        varnishstat -1 -j 2>/dev/null || echo '{}'
    else
        echo '{}'
    fi
}

# Calculate the cache hit rate from varnishstat counters
calculate_hit_rate() {
    local varnish_stats=$1
    local cache_hits=$(echo "$varnish_stats" | jq -r '.MAIN.cache_hit.value // 0')
    local cache_misses=$(echo "$varnish_stats" | jq -r '.MAIN.cache_miss.value // 0')
    local total_requests=$((cache_hits + cache_misses))
    if [[ $total_requests -gt 0 ]]; then
        echo "scale=4; $cache_hits * 100 / $total_requests" | bc
    else
        echo "0"
    fi
}

# Summarize the last 24 hours of warming statistics
get_warming_stats() {
    if [[ -f "$STATS_FILE" ]]; then
        local cutoff_time=$(date -d '24 hours ago' -Iseconds)
        jq -s --arg cutoff "$cutoff_time" '
            map(select(.timestamp >= $cutoff)) |
            {
                total_warming_sessions: length,
                total_urls_warmed: (map(.successful) | add // 0),
                total_failed_urls: (map(.failed) | add // 0),
                average_success_rate: (if length > 0 then (map(.success_rate) | add / length) else 0 end),
                by_priority: (group_by(.priority) | map({
                    priority: .[0].priority,
                    sessions: length,
                    urls_warmed: (map(.successful) | add // 0),
                    avg_success_rate: (map(.success_rate) | add / length)
                }))
            }' "$STATS_FILE" 2>/dev/null || echo '{}'
    else
        echo '{}'
    fi
}

# Analyze warming log files from the last 24 hours
analyze_logs() {
    local recent_logs=$(find "$LOG_DIR" -name "warming-*.log" -mtime -1 2>/dev/null || true)
    if [[ -n "$recent_logs" ]]; then
        local total_successes=$( (grep -h "SUCCESS:" $recent_logs 2>/dev/null || true) | wc -l)
        local total_failures=$( (grep -h "FAILED:" $recent_logs 2>/dev/null || true) | wc -l)
        local top_errors=$( (grep -h "FAILED:" $recent_logs 2>/dev/null || true) | \
            awk '{print $NF}' | sort | uniq -c | sort -nr | head -5 | tr '\n' ';')
        jq -n --arg s "$total_successes" --arg f "$total_failures" --arg e "$top_errors" '
            {log_analysis: {period: "24h", total_successes: ($s | tonumber),
             total_failures: ($f | tonumber), top_errors: $e}}'
    else
        echo '{"log_analysis": {"period": "24h", "status": "no_recent_logs"}}'
    fi
}

# Main monitoring function
main() {
    local timestamp=$(date -Iseconds)
    local varnish_stats=$(get_varnish_stats)
    local hit_rate=$(calculate_hit_rate "$varnish_stats")
    local warming_stats=$(get_warming_stats)
    local log_analysis=$(analyze_logs)

    # Combine all metrics into one JSON record
    local combined_metrics=$(jq -n \
        --arg timestamp "$timestamp" \
        --argjson varnish_stats "$varnish_stats" \
        --arg hit_rate "$hit_rate" \
        --argjson warming_stats "$warming_stats" \
        --argjson log_analysis "$log_analysis" '
        {
            timestamp: $timestamp,
            cache_hit_rate: ($hit_rate | tonumber),
            varnish_stats: {
                cache_hits: ($varnish_stats.MAIN.cache_hit.value // 0),
                cache_misses: ($varnish_stats.MAIN.cache_miss.value // 0),
                backend_requests: ($varnish_stats.MAIN.backend_req.value // 0),
                objects_cached: ($varnish_stats.MAIN.n_object.value // 0)
            },
            warming_performance: $warming_stats,
            log_analysis: $log_analysis.log_analysis
        }')
    echo "$combined_metrics" >> "$METRICS_FILE"

    # Keep only the last 7 days of metrics
    local cutoff_time=$(date -d '7 days ago' -Iseconds)
    local temp_file="/tmp/warming-metrics-temp.json"
    if [[ -f "$METRICS_FILE" ]]; then
        jq -c --arg cutoff "$cutoff_time" 'select(.timestamp >= $cutoff)' "$METRICS_FILE" > "$temp_file" 2>/dev/null || true
        if [[ -s "$temp_file" ]]; then
            mv "$temp_file" "$METRICS_FILE"
        fi
    fi

    # Output current status
    echo "Cache Warming Status Report - $(date)"
    echo "=========================================="
    echo "Cache Hit Rate: ${hit_rate}%"
    echo "Warming Stats: $(echo "$warming_stats" | jq -c .)"
    echo "Full metrics saved to: $METRICS_FILE"
}

# Run monitoring
main "$@"

Set up log rotation

Configure logrotate to manage cache warming log files and prevent disk space issues. Create the policy as /etc/logrotate.d/varnish-warming.

/var/log/varnish-warming/*.log {
    daily
    missingok
    rotate 7
    compress
    delaycompress
    notifempty
    create 644 www-data www-data
    postrotate
        systemctl reload rsyslog > /dev/null 2>&1 || true
    endscript
}

/var/log/varnish-warming/*.json {
    weekly
    missingok
    rotate 4
    compress
    delaycompress
    notifempty
    create 644 www-data www-data
}

Make monitoring script executable

Set proper permissions for the monitoring script and test the setup.

sudo chmod +x /opt/varnish-warming/scripts/monitor-warming.sh
sudo /opt/varnish-warming/scripts/monitor-warming.sh

Verify your setup

Test the cache warming system and verify all components are working correctly.

# Check systemd services and timers
sudo systemctl status varnish-warming.timer
sudo systemctl status varnish-warming-full.timer

List next scheduled runs

sudo systemctl list-timers | grep warming

Test manual cache warming

sudo -u www-data /opt/varnish-warming/scripts/warm-cache.sh high 5

Check warming logs

sudo tail -f /var/log/varnish-warming/warming-*.log

Verify URL discovery

/opt/varnish-warming/scripts/discover-urls.py http://example.com --output-dir /tmp

Check cache statistics

varnishstat -1 | grep -E '(cache_hit|cache_miss)'

Monitor warming performance

/opt/varnish-warming/scripts/monitor-warming.sh

For enhanced monitoring integration, you can connect cache warming metrics to existing monitoring systems like Prometheus and cAdvisor or set up performance analysis similar to NGINX cache optimization.

Common issues

Symptom | Cause | Fix
--- | --- | ---
Warming script fails with permission denied | Incorrect file permissions or ownership | sudo chown www-data:www-data /opt/varnish-warming -R and verify execute permissions
URLs return 404 during warming | Outdated URL lists or incorrect base URL | Run the URL discovery script and update configuration with the correct domain
High memory usage during warming | Too many parallel workers or large responses | Reduce MAX_PARALLEL in the config and add response size limits to curl
Systemd timer not running | Service file errors or timer not enabled | sudo systemctl daemon-reload && sudo systemctl enable --now varnish-warming.timer
Cache hit rate not improving | Cache invalidation or TTL too low | Check the Varnish VCL configuration and ensure proper cache headers
Warming takes too long | Sequential processing or network timeouts | Install GNU parallel and adjust timeout values in the config
Log files growing too large | Missing logrotate configuration | Apply the logrotate config and run sudo logrotate -f /etc/logrotate.d/varnish-warming
