Implement Grafana high availability clustering with PostgreSQL backend and load balancing

Advanced 45 min Apr 23, 2026
Ubuntu 24.04 Debian 12 AlmaLinux 9 Rocky Linux 9

Set up a production-ready Grafana high availability cluster with PostgreSQL shared database backend and HAProxy load balancing for enterprise monitoring infrastructure with automatic failover.

Prerequisites

  • 3+ servers for Grafana cluster
  • 2 servers for PostgreSQL primary/standby
  • 1 server for HAProxy load balancer
  • SSL certificates
  • Network connectivity between all nodes

What this solves

Grafana high availability clustering eliminates single points of failure in your monitoring infrastructure by running multiple Grafana instances sharing a PostgreSQL database. This setup ensures continuous dashboard access during server maintenance, hardware failures, or traffic spikes through automated load balancing and failover mechanisms.

Step-by-step configuration

Update system packages

Start by updating all servers that will host your Grafana cluster components.

# Ubuntu / Debian
sudo apt update && sudo apt upgrade -y

# AlmaLinux / Rocky Linux
sudo dnf update -y

Install PostgreSQL cluster for shared database

Set up PostgreSQL with streaming replication for high availability. Install PostgreSQL on two servers designated as primary and standby database nodes.

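# Ubuntu / Debian
sudo apt install -y postgresql-16 postgresql-16-repmgr
sudo systemctl enable --now postgresql

# AlmaLinux / Rocky Linux (the /usr/pgsql-16 layout comes from the PGDG repository;
# package names can differ between repositories, so confirm with: dnf search repmgr)
sudo dnf install -y postgresql16-server postgresql16-repmgr
sudo /usr/pgsql-16/bin/postgresql-16-setup initdb
sudo systemctl enable --now postgresql-16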

Configure PostgreSQL primary server

Configure the primary PostgreSQL server for streaming replication. Create the Grafana database and user with appropriate permissions.

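sudo -u postgres psql -c "CREATE USER grafana WITH PASSWORD 'SecureGrafanaPass2024';"
# Make grafana the database owner. Since PostgreSQL 15, a database-level GRANT alone
# does not allow a role to create tables in the public schema, which Grafana's migrations need.
sudo -u postgres createdb -O grafana grafana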
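
The password in the commands above is a fixed example string. A minimal sketch of generating a random one instead, assuming a Linux host with /dev/urandom and the coreutils od and tr tools; gen_secret is a hypothetical helper name, not part of PostgreSQL or Grafana:

```shell
#!/usr/bin/env bash
# gen_secret [LENGTH]: print a random hex string of LENGTH characters (default 32).
# Hypothetical helper for illustration only.
gen_secret() {
    local len="${1:-32}" hex
    # Read LENGTH random bytes and render them as hex (2 characters per byte).
    hex="$(od -An -N"$len" -tx1 /dev/urandom | tr -d ' \n')"
    # Keep only the first LENGTH characters.
    printf '%s\n' "${hex:0:len}"
}

# Example use (commented out so the snippet has no side effects):
# GRAFANA_DB_PASS="$(gen_secret 32)"
# sudo -u postgres psql -c "ALTER USER grafana WITH PASSWORD '${GRAFANA_DB_PASS}';"
```

The output is hex-only, so it needs no escaping inside SQL string literals or grafana.ini.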

Configure PostgreSQL for replication

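Edit postgresql.conf to enable streaming replication and remote connections, then add the Grafana and replication rules to pg_hba.conf. On Ubuntu and Debian both files are in /etc/postgresql/16/main/; on AlmaLinux and Rocky Linux they are in /var/lib/pgsql/16/data/.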

# postgresql.conf
listen_addresses = '*'
wal_level = replica
max_wal_senders = 10
max_replication_slots = 10
archive_mode = on
archive_command = '/bin/true'
log_replication_commands = on

# pg_hba.conf: allow Grafana connections
host    grafana         grafana         203.0.113.0/24          md5
# pg_hba.conf: allow replication
host    replication     postgres        203.0.113.0/24          md5

# Restart to apply the changes (the unit is postgresql-16 on AlmaLinux / Rocky Linux)
sudo systemctl restart postgresql
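
Before restarting, it can help to confirm that the file really contains the replication settings. A rough sketch, assuming simple "name = value" lines like the ones shown above; check_pg_setting is a hypothetical helper, and the path in the usage line is the Ubuntu/Debian default:

```shell
#!/usr/bin/env bash
# check_pg_setting FILE NAME EXPECTED
# Succeeds if the last uncommented "NAME = VALUE" line in FILE has VALUE equal
# to EXPECTED. Optional single quotes and trailing comments are ignored.
check_pg_setting() {
    local file="$1" name="$2" expected="$3" actual
    actual="$(sed -n "s/^[[:space:]]*${name}[[:space:]]*=[[:space:]]*'\{0,1\}\([^'#[:space:]]*\).*/\1/p" "$file" | tail -n 1)"
    [ "$actual" = "$expected" ]
}

# Example use (commented out; adjust the path for your distribution):
# conf=/etc/postgresql/16/main/postgresql.conf
# check_pg_setting "$conf" wal_level replica || echo "wal_level is not replica"
```

On a running server, `sudo -u postgres psql -c "SHOW wal_level;"` reports the value that is actually in effect.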

Set up PostgreSQL standby server

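Configure the standby PostgreSQL server as a streaming replica, using pg_basebackup for the initial synchronization. Streaming replication on its own does not promote the standby; automatic failover also requires repmgrd (from the repmgr package installed earlier) or a similar tool to be configured on both nodes.

sudo systemctl stop postgresql
sudo -u postgres rm -rf /var/lib/postgresql/16/main/*
# -R creates standby.signal and writes primary_conninfo to postgresql.auto.conf
sudo -u postgres pg_basebackup -h 203.0.113.10 -D /var/lib/postgresql/16/main -U postgres -v -P -R

# Standby settings in postgresql.conf. Do not set standby_mode: it was removed in
# PostgreSQL 12, and the standby.signal file replaces it.
primary_conninfo = 'host=203.0.113.10 port=5432 user=postgres'
restore_command = '/bin/true'

sudo systemctl start postgresql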
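
Once the standby is running, `SELECT pg_is_in_recovery();` tells you which role a node currently has. A small sketch that turns the t/f flag into a readable role name; pg_role is a hypothetical helper, not a PostgreSQL tool:

```shell
#!/usr/bin/env bash
# pg_role FLAG: map the result of pg_is_in_recovery() to a role name.
#   t -> standby (node is replaying WAL), f -> primary
pg_role() {
    case "$1" in
        t) echo "standby" ;;
        f) echo "primary" ;;
        *) echo "unknown"; return 1 ;;
    esac
}

# Example use on a database node (commented out; requires a running server):
# pg_role "$(sudo -u postgres psql -Atc 'SELECT pg_is_in_recovery();')"
```

The -A and -t psql options print just the bare value, which keeps the output easy to match.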

Install Grafana on cluster nodes

Install Grafana Enterprise on all nodes that will serve dashboard requests. Download and install the latest Grafana package.

# Ubuntu / Debian (apt-key is deprecated, so the key goes into a dedicated keyring)
sudo apt install -y apt-transport-https wget gnupg
sudo mkdir -p /etc/apt/keyrings
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt update
sudo apt install -y grafana-enterprise

# AlmaLinux / Rocky Linux
sudo dnf install -y wget
wget https://rpm.grafana.com/gpg.key
sudo rpm --import gpg.key
echo '[grafana]
name=grafana
baseurl=https://rpm.grafana.com
repo_gpgcheck=1
enabled=1
gpgcheck=1
gpgkey=https://rpm.grafana.com/gpg.key' | sudo tee /etc/yum.repos.d/grafana.repo
sudo dnf install -y grafana-enterprise

Configure Grafana for high availability

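Configure each Grafana instance in /etc/grafana/grafana.ini to use the shared PostgreSQL database. Use identical configuration across all nodes; in particular, secret_key must match everywhere because it encrypts secrets stored in the shared database.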

[database]
type = postgres
host = 203.0.113.10:5432
name = grafana
user = grafana
password = SecureGrafanaPass2024
ssl_mode = require
max_idle_conn = 25
max_open_conn = 300
conn_max_lifetime = 14400

[server]
http_port = 3000
domain = grafana.example.com
root_url = https://grafana.example.com/
serve_from_sub_path = false

[security]
admin_user = admin
admin_password = SecureAdminPass2024
secret_key = SecureSecretKey2024ChangeThis
disable_gravatar = true

[users]
allow_sign_up = false
allow_org_create = false
auto_assign_org = true
auto_assign_org_role = Viewer

[auth]
disable_login_form = false
disable_signout_menu = false

[auth.anonymous]
enabled = false

# No [session] section is needed: Grafana has not used it since version 6.
# Login sessions are stored as auth tokens in the shared database, so every
# node can validate them.

# No legacy [alerting] section: it cannot be enabled together with
# [unified_alerting], and legacy alerting was removed in Grafana 11.

[unified_alerting]
enabled = true
disabled_orgs = 
min_interval = 10s
max_attempts = 3

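[log]
mode = file
level = info

# Rotation settings belong to the file logger. Log files are written to the
# directory set by [paths] logs (/var/log/grafana for package installs).
[log.file]
format = text
log_rotate = true
max_lines = 1000000
max_size_shift = 28
daily_rotate = true
max_days = 7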
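
Because every node should run the same configuration, a quick checksum comparison can catch drift between nodes. A minimal sketch, assuming sha256sum from coreutils; same_config is a hypothetical helper and the host name grafana2 in the usage lines is a placeholder:

```shell
#!/usr/bin/env bash
# same_config FILE_A FILE_B
# Returns 0 if both files have identical content, 1 if they differ,
# and 2 if either file cannot be read.
same_config() {
    [ -r "$1" ] && [ -r "$2" ] || return 2
    [ "$(sha256sum < "$1")" = "$(sha256sum < "$2")" ]
}

# Example use (commented out; grafana2 is a placeholder host name):
# ssh grafana2 'sudo cat /etc/grafana/grafana.ini' > /tmp/grafana2.ini
# sudo cat /etc/grafana/grafana.ini > /tmp/grafana1.ini
# same_config /tmp/grafana1.ini /tmp/grafana2.ini || echo "configuration drift detected"
```

Delete the temporary copies afterwards, since they contain the database password and secret_key.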

Set up SSL certificates

Generate SSL certificates for secure HTTPS access. Create a self-signed certificate for testing or use Let's Encrypt for production.

sudo mkdir -p /etc/grafana/ssl
sudo openssl req -x509 -nodes -days 365 -newkey rsa:4096 \
  -keyout /etc/grafana/ssl/grafana.key \
  -out /etc/grafana/ssl/grafana.crt \
  -subj "/C=US/ST=State/L=City/O=Organization/CN=grafana.example.com"
sudo chown -R grafana:grafana /etc/grafana/ssl
sudo chmod 600 /etc/grafana/ssl/grafana.key
sudo chmod 644 /etc/grafana/ssl/grafana.crt
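
The certificate above expires after 365 days, so it is worth checking the remaining validity from time to time. A sketch using the -checkend option of openssl x509; cert_valid_for_days is a hypothetical wrapper and the 30-day threshold is an arbitrary example:

```shell
#!/usr/bin/env bash
# cert_valid_for_days CERT [DAYS]
# Returns 0 if CERT is still valid DAYS days from now (default 30),
# 1 if it expires sooner, and 2 if the file cannot be read.
cert_valid_for_days() {
    local cert="$1" days="${2:-30}"
    [ -r "$cert" ] || return 2
    # -checkend takes seconds and exits non-zero if the certificate expires within that window.
    openssl x509 -in "$cert" -noout -checkend "$((days * 86400))" > /dev/null
}

# Example use (commented out):
# cert_valid_for_days /etc/grafana/ssl/grafana.crt 30 || echo "certificate expires within 30 days"
```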

Configure Grafana SSL settings

Update Grafana configuration to use SSL certificates and secure protocol settings.

[server]
protocol = https
cert_file = /etc/grafana/ssl/grafana.crt
cert_key = /etc/grafana/ssl/grafana.key
tls_min_version = TLS1.2
http_port = 3000

Install and configure HAProxy load balancer

Install HAProxy on a dedicated server to distribute traffic across Grafana instances with health checks and SSL termination.

# Ubuntu / Debian
sudo apt install -y haproxy

# AlmaLinux / Rocky Linux
sudo dnf install -y haproxy

Configure HAProxy for Grafana clustering

Configure HAProxy with SSL termination, health checks, and load balancing across Grafana instances using round-robin algorithm with sticky sessions.

global
    log stdout local0
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin
    stats timeout 30s
    user haproxy
    group haproxy
    daemon
    ssl-default-bind-options ssl-min-ver TLSv1.2
    ssl-default-bind-ciphers ECDHE+AESGCM:ECDHE+CHACHA20:RSA+AESGCM:RSA+AES:!aNULL:!MD5:!DSS

defaults
    mode http
    log global
    option httplog
    option dontlognull
    option http-server-close
    option redispatch
    retries 3
    timeout http-request 10s
    timeout queue 1m
    timeout connect 10s
    timeout client 1m
    timeout server 1m
    timeout http-keep-alive 10s
    timeout check 10s
    maxconn 3000

frontend grafana_frontend
    bind *:443 ssl crt /etc/haproxy/ssl/grafana.pem
    bind *:80
    redirect scheme https code 301 if !{ ssl_fc }
    default_backend grafana_backend
    option httplog
    capture request header X-Forwarded-For len 64
    capture request header User-Agent len 64

backend grafana_backend
    balance roundrobin
    cookie SERVERID insert indirect nocache
    option httpchk GET /api/health
    http-check expect status 200
    server grafana1 203.0.113.20:3000 check cookie grafana1 ssl verify none
    server grafana2 203.0.113.21:3000 check cookie grafana2 ssl verify none
    server grafana3 203.0.113.22:3000 check cookie grafana3 ssl verify none

frontend stats
    bind *:8404
    stats enable
    stats uri /stats
    stats refresh 30s
    stats admin if TRUE
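
When the cluster grows, the server lines in grafana_backend all follow the same pattern. A small sketch that generates them from a list of addresses, assuming port 3000 and the grafanaN naming used above; gen_backend_servers is a hypothetical helper:

```shell
#!/usr/bin/env bash
# gen_backend_servers ADDR...
# Print one HAProxy "server" line per address, numbered grafana1, grafana2, ...
gen_backend_servers() {
    local i=1 addr
    for addr in "$@"; do
        printf '    server grafana%d %s:3000 check cookie grafana%d ssl verify none\n' "$i" "$addr" "$i"
        i=$((i + 1))
    done
}

# Example use:
# gen_backend_servers 203.0.113.20 203.0.113.21 203.0.113.22
```

After editing the file, `sudo haproxy -c -f /etc/haproxy/haproxy.cfg` checks the configuration syntax without starting the proxy.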

Create HAProxy SSL certificate

Create a combined SSL certificate file for HAProxy SSL termination. HAProxy requires certificate and key in a single PEM file.

sudo mkdir -p /etc/haproxy/ssl
sudo openssl req -x509 -nodes -days 365 -newkey rsa:4096 \
  -keyout /tmp/haproxy.key \
  -out /tmp/haproxy.crt \
  -subj "/C=US/ST=State/L=City/O=Organization/CN=grafana.example.com"
sudo cat /tmp/haproxy.crt /tmp/haproxy.key | sudo tee /etc/haproxy/ssl/grafana.pem > /dev/null
sudo chmod 600 /etc/haproxy/ssl/grafana.pem
sudo chown haproxy:haproxy /etc/haproxy/ssl/grafana.pem
sudo rm /tmp/haproxy.key /tmp/haproxy.crt
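
Since HAProxy needs both the certificate and the key in the same file, a quick structural check can confirm that the combined PEM contains both parts. A minimal sketch that only looks at the PEM markers and does not validate the contents; pem_has_cert_and_key is a hypothetical helper:

```shell
#!/usr/bin/env bash
# pem_has_cert_and_key FILE
# Succeeds if FILE contains a certificate block and a private key block
# (plain, RSA, EC, or encrypted key headers).
pem_has_cert_and_key() {
    local pem="$1"
    grep -q -- '-----BEGIN CERTIFICATE-----' "$pem" &&
        grep -Eq -- '-----BEGIN ([A-Z]+ )?PRIVATE KEY-----' "$pem"
}

# Example use (commented out; the file is readable only by root and haproxy):
# sudo bash -c "$(declare -f pem_has_cert_and_key); pem_has_cert_and_key /etc/haproxy/ssl/grafana.pem" \
#     && echo "PEM looks complete"
```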

Configure firewall rules

Open required ports for Grafana cluster communication, PostgreSQL replication, and HAProxy load balancer access.

# Ubuntu / Debian (ufw)
# If you manage the servers over SSH, allow it before enabling ufw: sudo ufw allow OpenSSH

# HAProxy server
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw allow 8404/tcp

# Grafana servers
sudo ufw allow 3000/tcp

# PostgreSQL servers
sudo ufw allow 5432/tcp

# On every server, after adding its rules
sudo ufw enable

# AlmaLinux / Rocky Linux (firewalld)

# HAProxy server
sudo firewall-cmd --permanent --add-port=80/tcp
sudo firewall-cmd --permanent --add-port=443/tcp
sudo firewall-cmd --permanent --add-port=8404/tcp

# Grafana servers
sudo firewall-cmd --permanent --add-port=3000/tcp

# PostgreSQL servers
sudo firewall-cmd --permanent --add-port=5432/tcp

# On every server, after adding its rules
sudo firewall-cmd --reload
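
After the rules are in place, you can confirm from another node that each port is reachable. A sketch that relies on bash's built-in /dev/tcp redirection and the coreutils timeout command, so no extra tools are needed; port_open is a hypothetical helper and the 3-second limit is an arbitrary choice:

```shell
#!/usr/bin/env bash
# port_open HOST PORT
# Succeeds if a TCP connection to HOST:PORT can be opened within 3 seconds.
port_open() {
    local host="$1" port="$2"
    timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Example use from the HAProxy server (commented out):
# for host in 203.0.113.20 203.0.113.21 203.0.113.22; do
#     port_open "$host" 3000 && echo "$host:3000 reachable" || echo "$host:3000 blocked"
# done
```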

Start and enable all services

Start Grafana instances and HAProxy load balancer. Enable automatic startup on system boot for high availability.

# On all Grafana servers
sudo systemctl enable --now grafana-server

# On HAProxy server
sudo systemctl enable --now haproxy

# Verify all services
sudo systemctl status grafana-server
sudo systemctl status haproxy

Configure health monitoring

Set up monitoring for cluster health using Grafana's built-in metrics and HAProxy statistics. This setup integrates with existing Prometheus monitoring infrastructure.

[metrics]
enabled = true
basic_auth_username = metrics
basic_auth_password = SecureMetricsPass2024
interval_seconds = 10
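
With metrics enabled, each node serves Prometheus text format at /metrics. A rough sketch for pulling a single unlabelled metric out of that output on the command line; metric_value is a hypothetical helper, and go_goroutines in the usage line is an assumed example of a metric the endpoint exposes:

```shell
#!/usr/bin/env bash
# metric_value NAME
# Read Prometheus text format on stdin and print the value of the first
# sample whose name is exactly NAME (samples with labels are not matched).
metric_value() {
    awk -v m="$1" '$1 == m { print $2; exit }'
}

# Example use against one node (commented out; credentials are the ones from [metrics]):
# curl -k -s -u metrics:SecureMetricsPass2024 https://203.0.113.20:3000/metrics | metric_value go_goroutines
```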

Verify your setup

Test the high availability cluster by accessing Grafana through the load balancer and verifying failover capabilities.

# Check PostgreSQL replication status (run on the primary)
sudo -u postgres psql -c "SELECT * FROM pg_stat_replication;"

# Test Grafana cluster connectivity
curl -k https://grafana.example.com/api/health

# Check HAProxy statistics
curl http://203.0.113.30:8404/stats

# Verify load balancer distribution: the SERVERID cookie names the backend that answered
for i in {1..10}; do curl -k -s -o /dev/null -D - https://grafana.example.com/api/health | grep -i '^set-cookie: SERVERID'; done

# Check each PostgreSQL node's role before and after a failover test (f = primary, t = standby)
sudo -u postgres psql -c "SELECT pg_is_in_recovery();"

Note: The health endpoint should return HTTP 200 with database status "ok" when the cluster is healthy. Requests sent without a session cookie should rotate across the Grafana instances.
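
For scripted checks, the database status can be extracted from the health response. A minimal sketch that uses sed rather than a JSON parser and tolerates both compact and pretty-printed output; health_db_status is a hypothetical helper:

```shell
#!/usr/bin/env bash
# health_db_status
# Read a /api/health JSON response on stdin and print the "database" value.
health_db_status() {
    # tr joins a pretty-printed response onto one line before matching.
    tr -d '\n' | sed -n 's/.*"database"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Example use (commented out):
# status="$(curl -k -s https://grafana.example.com/api/health | health_db_status)"
# [ "$status" = "ok" ] || echo "Grafana reports database status: ${status:-unknown}"
```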

Common issues

| Symptom | Cause | Fix |
|---|---|---|
| Grafana won't start | Database connection failed | Check PostgreSQL connectivity: telnet 203.0.113.10 5432 |
| Session not sticky | HAProxy cookie configuration | Verify cookie settings in /etc/haproxy/haproxy.cfg |
| SSL certificate errors | Certificate format or permissions | Check PEM format: openssl x509 -in /etc/haproxy/ssl/grafana.pem -text |
| Health check failures | Grafana API endpoint blocked | Verify /api/health returns 200: curl -k https://203.0.113.20:3000/api/health |
| Database replication lag | Network or disk I/O issues | Monitor replication status: SELECT * FROM pg_stat_replication; |
| Load balancer 503 errors | All Grafana backends down | Check backend server status in HAProxy stats at port 8404 |

Never use chmod 777. It gives every user on the system full access to your files. SSL certificates should use chmod 600 for private keys and proper ownership with chown.
