Set up a production-ready Grafana high availability cluster with PostgreSQL shared database backend and HAProxy load balancing for enterprise monitoring infrastructure with automatic failover.
Prerequisites
- 3+ servers for Grafana cluster
- 2 servers for PostgreSQL primary/standby
- 1 server for HAProxy load balancer
- SSL certificates
- Network connectivity between all nodes
What this solves
Grafana high availability clustering eliminates single points of failure in your monitoring infrastructure by running multiple Grafana instances sharing a PostgreSQL database. This setup ensures continuous dashboard access during server maintenance, hardware failures, or traffic spikes through automated load balancing and failover mechanisms.
Step-by-step configuration
Update system packages
Start by updating all servers that will host your Grafana cluster components.
sudo apt update && sudo apt upgrade -y
Install PostgreSQL cluster for shared database
Set up PostgreSQL with streaming replication for high availability. Install PostgreSQL on two servers designated as primary and standby database nodes.
sudo apt install -y postgresql-16 postgresql-16-repmgr
sudo systemctl enable --now postgresql
Configure PostgreSQL primary server
Configure the primary PostgreSQL server for streaming replication. Create the Grafana database and user with appropriate permissions.
sudo -u postgres createdb grafana
sudo -u postgres psql -c "CREATE USER grafana WITH PASSWORD 'SecureGrafanaPass2024';"
sudo -u postgres psql -c "GRANT ALL PRIVILEGES ON DATABASE grafana TO grafana;"
Configure PostgreSQL for replication
Edit PostgreSQL configuration to enable streaming replication and remote connections for Grafana instances.
listen_addresses = '*'
wal_level = replica
max_wal_senders = 10
max_replication_slots = 10
archive_mode = on
archive_command = '/bin/true'
log_replication_commands = on
# Allow Grafana connections
host grafana grafana 203.0.113.0/24 md5
Allow replication
host replication postgres 203.0.113.0/24 md5
sudo systemctl restart postgresql
Set up PostgreSQL standby server
Configure the standby PostgreSQL server for automatic failover using pg_basebackup for initial synchronization.
sudo systemctl stop postgresql
sudo -u postgres rm -rf /var/lib/postgresql/16/main/*
sudo -u postgres pg_basebackup -h 203.0.113.10 -D /var/lib/postgresql/16/main -U postgres -v -P -R
primary_conninfo = 'host=203.0.113.10 port=5432 user=postgres'
restore_command = '/bin/true'
standby_mode = on
sudo systemctl start postgresql
Install Grafana on cluster nodes
Install Grafana Enterprise on all nodes that will serve dashboard requests. Download and install the latest Grafana package.
sudo apt install -y software-properties-common wget
wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -
echo "deb https://packages.grafana.com/enterprise/deb stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt update
sudo apt install -y grafana-enterprise
Configure Grafana for high availability
Configure each Grafana instance to use the shared PostgreSQL database and enable clustering features. Use identical configuration across all nodes.
[database]
type = postgres
host = 203.0.113.10:5432
name = grafana
user = grafana
password = SecureGrafanaPass2024
ssl_mode = require
max_idle_conn = 25
max_open_conn = 300
conn_max_lifetime = 14400
[server]
http_port = 3000
domain = grafana.example.com
root_url = https://grafana.example.com/
serve_from_sub_path = false
[security]
admin_user = admin
admin_password = SecureAdminPass2024
secret_key = SecureSecretKey2024ChangeThis
disable_gravatar = true
[users]
allow_sign_up = false
allow_org_create = false
auto_assign_org = true
auto_assign_org_role = Viewer
[auth]
disable_login_form = false
disable_signout_menu = false
[auth.anonymous]
enabled = false
[session]
provider = postgres
provider_config = host=203.0.113.10 port=5432 user=grafana password=SecureGrafanaPass2024 dbname=grafana sslmode=require
session_life_time = 86400
[alerting]
enabled = true
execute_alerts = true
[unified_alerting]
enabled = true
disabled_orgs =
min_interval = 10s
max_attempts = 3
[log]
mode = file
level = info
format = text
file_path = /var/log/grafana/grafana.log
max_lines = 1000000
max_size_shift = 28
daily_rotate = true
max_days = 7
Set up SSL certificates
Generate SSL certificates for secure HTTPS access. Create a self-signed certificate for testing or use Let's Encrypt for production.
sudo mkdir -p /etc/grafana/ssl
sudo openssl req -x509 -nodes -days 365 -newkey rsa:4096 \
-keyout /etc/grafana/ssl/grafana.key \
-out /etc/grafana/ssl/grafana.crt \
-subj "/C=US/ST=State/L=City/O=Organization/CN=grafana.example.com"
sudo chown -R grafana:grafana /etc/grafana/ssl
sudo chmod 600 /etc/grafana/ssl/grafana.key
sudo chmod 644 /etc/grafana/ssl/grafana.crt
Configure Grafana SSL settings
Update Grafana configuration to use SSL certificates and secure protocol settings.
[server]
protocol = https
cert_file = /etc/grafana/ssl/grafana.crt
cert_key = /etc/grafana/ssl/grafana.key
tls_min_version = TLS1.2
http_port = 3000
Install and configure HAProxy load balancer
Install HAProxy on a dedicated server to distribute traffic across Grafana instances with health checks and SSL termination.
sudo apt install -y haproxy
Configure HAProxy for Grafana clustering
Configure HAProxy with SSL termination, health checks, and load balancing across Grafana instances using round-robin algorithm with sticky sessions.
global
log stdout local0
chroot /var/lib/haproxy
stats socket /run/haproxy/admin.sock mode 660 level admin
stats timeout 30s
user haproxy
group haproxy
daemon
ssl-default-bind-options ssl-min-ver TLSv1.2
ssl-default-bind-ciphers ECDHE+AESGCM:ECDHE+CHACHA20:RSA+AESGCM:RSA+AES:!aNULL:!MD5:!DSS
defaults
mode http
log global
option httplog
option dontlognull
option http-server-close
option redispatch
retries 3
timeout http-request 10s
timeout queue 1m
timeout connect 10s
timeout client 1m
timeout server 1m
timeout http-keep-alive 10s
timeout check 10s
maxconn 3000
frontend grafana_frontend
bind *:443 ssl crt /etc/haproxy/ssl/grafana.pem
bind *:80
redirect scheme https code 301 if !{ ssl_fc }
default_backend grafana_backend
option httplog
capture request header X-Forwarded-For len 64
capture request header User-Agent len 64
backend grafana_backend
balance roundrobin
cookie SERVERID insert indirect nocache
option httpchk GET /api/health
http-check expect status 200
server grafana1 203.0.113.20:3000 check cookie grafana1 ssl verify none
server grafana2 203.0.113.21:3000 check cookie grafana2 ssl verify none
server grafana3 203.0.113.22:3000 check cookie grafana3 ssl verify none
frontend stats
bind *:8404
stats enable
stats uri /stats
stats refresh 30s
stats admin if TRUE
Create HAProxy SSL certificate
Create a combined SSL certificate file for HAProxy SSL termination. HAProxy requires certificate and key in a single PEM file.
sudo mkdir -p /etc/haproxy/ssl
sudo openssl req -x509 -nodes -days 365 -newkey rsa:4096 \
-keyout /tmp/haproxy.key \
-out /tmp/haproxy.crt \
-subj "/C=US/ST=State/L=City/O=Organization/CN=grafana.example.com"
sudo cat /tmp/haproxy.crt /tmp/haproxy.key | sudo tee /etc/haproxy/ssl/grafana.pem
sudo chmod 600 /etc/haproxy/ssl/grafana.pem
sudo chown haproxy:haproxy /etc/haproxy/ssl/grafana.pem
sudo rm /tmp/haproxy.key /tmp/haproxy.crt
Configure firewall rules
Open required ports for Grafana cluster communication, PostgreSQL replication, and HAProxy load balancer access.
# HAProxy server
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw allow 8404/tcp
Grafana servers
sudo ufw allow 3000/tcp
PostgreSQL servers
sudo ufw allow 5432/tcp
sudo ufw enable
Start and enable all services
Start Grafana instances and HAProxy load balancer. Enable automatic startup on system boot for high availability.
# On all Grafana servers
sudo systemctl enable --now grafana-server
On HAProxy server
sudo systemctl enable --now haproxy
Verify all services
sudo systemctl status grafana-server
sudo systemctl status haproxy
Configure health monitoring
Set up monitoring for cluster health using Grafana's built-in metrics and HAProxy statistics. This setup integrates with existing Prometheus monitoring infrastructure.
[metrics]
enabled = true
basic_auth_username = metrics
basic_auth_password = SecureMetricsPass2024
interval_seconds = 10
Verify your setup
Test the high availability cluster by accessing Grafana through the load balancer and verifying failover capabilities.
# Check PostgreSQL replication status
sudo -u postgres psql -c "SELECT * FROM pg_stat_replication;"
Test Grafana cluster connectivity
curl -k https://grafana.example.com/api/health
Check HAProxy statistics
curl http://203.0.113.30:8404/stats
Verify load balancer distribution
for i in {1..10}; do curl -k -s https://grafana.example.com/api/health | grep -o '"database":"[^"]*"'; done
Test database failover
sudo -u postgres psql -c "SELECT pg_is_in_recovery();"
Common issues
| Symptom | Cause | Fix |
|---|---|---|
| Grafana won't start | Database connection failed | Check PostgreSQL connectivity: telnet 203.0.113.10 5432 |
| Session not sticky | HAProxy cookie configuration | Verify cookie settings in /etc/haproxy/haproxy.cfg |
| SSL certificate errors | Certificate format or permissions | Check PEM format: openssl x509 -in /etc/haproxy/ssl/grafana.pem -text |
| Health check failures | Grafana API endpoint blocked | Verify /api/health returns 200: curl -k https://203.0.113.20:3000/api/health |
| Database replication lag | Network or disk I/O issues | Monitor replication status: SELECT * FROM pg_stat_replication; |
| Load balancer 503 errors | All Grafana backends down | Check backend server status in HAProxy stats at port 8404 |
Next steps
- Configure advanced Grafana dashboards with TimescaleDB
- Set up LDAP authentication for enterprise users
- Integrate Prometheus monitoring with GitOps workflows
- Implement automated backup strategies for Grafana Enterprise
- Configure multi-tenant organization management
Running this in production?
Automated install script
Run this to automate the entire setup
#!/usr/bin/env bash
set -euo pipefail
# Grafana High Availability Cluster Installation Script
# Supports: Ubuntu, Debian, AlmaLinux, Rocky Linux, CentOS, RHEL
# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'
# Global variables
SCRIPT_NAME=$(basename "$0")
GRAFANA_DB_PASS=""
GRAFANA_ADMIN_PASS=""
PG_PRIMARY_IP=""
NODE_TYPE=""
TOTAL_STEPS=8
print_usage() {
cat << EOF
Usage: $SCRIPT_NAME [OPTIONS]
Install Grafana HA cluster with PostgreSQL backend
Options:
-t, --type TYPE Node type: primary-db, standby-db, or grafana
-p, --primary IP PostgreSQL primary server IP
-d, --db-pass PASS Grafana database password
-a, --admin-pass PASS Grafana admin password
-h, --help Show this help
Example:
$SCRIPT_NAME -t primary-db -d SecurePass123 -a AdminPass123
$SCRIPT_NAME -t standby-db -p 10.0.1.10 -d SecurePass123
$SCRIPT_NAME -t grafana -p 10.0.1.10 -d SecurePass123 -a AdminPass123
EOF
}
log() {
echo -e "${GREEN}[INFO]${NC} $1"
}
warn() {
echo -e "${YELLOW}[WARN]${NC} $1"
}
error() {
echo -e "${RED}[ERROR]${NC} $1" >&2
}
cleanup() {
error "Installation failed. Check logs for details."
exit 1
}
trap cleanup ERR
# Parse command line arguments
while [[ $# -gt 0 ]]; do
case $1 in
-t|--type)
NODE_TYPE="$2"
shift 2
;;
-p|--primary)
PG_PRIMARY_IP="$2"
shift 2
;;
-d|--db-pass)
GRAFANA_DB_PASS="$2"
shift 2
;;
-a|--admin-pass)
GRAFANA_ADMIN_PASS="$2"
shift 2
;;
-h|--help)
print_usage
exit 0
;;
*)
error "Unknown option: $1"
print_usage
exit 1
;;
esac
done
# Validate required parameters
if [[ -z "$NODE_TYPE" || ! "$NODE_TYPE" =~ ^(primary-db|standby-db|grafana)$ ]]; then
error "Node type (-t) is required: primary-db, standby-db, or grafana"
print_usage
exit 1
fi
if [[ "$NODE_TYPE" =~ ^(standby-db|grafana)$ && -z "$PG_PRIMARY_IP" ]]; then
error "Primary IP (-p) is required for standby-db and grafana nodes"
print_usage
exit 1
fi
if [[ -z "$GRAFANA_DB_PASS" ]]; then
error "Database password (-d) is required"
print_usage
exit 1
fi
if [[ "$NODE_TYPE" =~ ^(primary-db|grafana)$ && -z "$GRAFANA_ADMIN_PASS" ]]; then
error "Admin password (-a) is required for primary-db and grafana nodes"
print_usage
exit 1
fi
# Check if running as root
if [[ $EUID -ne 0 ]]; then
error "This script must be run as root"
exit 1
fi
# Detect OS and package manager
if [[ -f /etc/os-release ]]; then
. /etc/os-release
case "$ID" in
ubuntu|debian)
PKG_MGR="apt"
PKG_UPDATE="apt update && apt upgrade -y"
PKG_INSTALL="apt install -y"
PG_VERSION="16"
PG_CONFIG_DIR="/etc/postgresql/16/main"
PG_DATA_DIR="/var/lib/postgresql/16/main"
PG_SERVICE="postgresql"
;;
almalinux|rocky|centos|rhel|ol)
PKG_MGR="dnf"
PKG_UPDATE="dnf update -y"
PKG_INSTALL="dnf install -y"
PG_VERSION="16"
PG_CONFIG_DIR="/var/lib/pgsql/16/data"
PG_DATA_DIR="/var/lib/pgsql/16/data"
PG_SERVICE="postgresql-16"
;;
fedora)
PKG_MGR="dnf"
PKG_UPDATE="dnf update -y"
PKG_INSTALL="dnf install -y"
PG_VERSION="16"
PG_CONFIG_DIR="/var/lib/pgsql/data"
PG_DATA_DIR="/var/lib/pgsql/data"
PG_SERVICE="postgresql"
;;
amzn)
PKG_MGR="yum"
PKG_UPDATE="yum update -y"
PKG_INSTALL="yum install -y"
PG_VERSION="16"
PG_CONFIG_DIR="/var/lib/pgsql/data"
PG_DATA_DIR="/var/lib/pgsql/data"
PG_SERVICE="postgresql"
;;
*)
error "Unsupported distribution: $ID"
exit 1
;;
esac
else
error "Cannot detect OS distribution"
exit 1
fi
log "Starting Grafana HA cluster installation on $PRETTY_NAME"
log "Node type: $NODE_TYPE"
# Step 1: Update system packages
echo -e "${GREEN}[1/$TOTAL_STEPS]${NC} Updating system packages..."
$PKG_UPDATE
# Step 2: Install PostgreSQL (for DB nodes)
if [[ "$NODE_TYPE" =~ ^(primary-db|standby-db)$ ]]; then
echo -e "${GREEN}[2/$TOTAL_STEPS]${NC} Installing PostgreSQL..."
if [[ "$PKG_MGR" == "apt" ]]; then
$PKG_INSTALL postgresql-16 postgresql-contrib-16
systemctl enable --now postgresql
else
$PKG_INSTALL postgresql16-server postgresql16-contrib
if [[ ! -f "$PG_DATA_DIR/PG_VERSION" ]]; then
/usr/pgsql-16/bin/postgresql-16-setup initdb
fi
systemctl enable --now postgresql-16
fi
# Configure PostgreSQL for primary node
if [[ "$NODE_TYPE" == "primary-db" ]]; then
echo -e "${GREEN}[3/$TOTAL_STEPS]${NC} Configuring PostgreSQL primary server..."
# Create Grafana database and user
sudo -u postgres createdb grafana || true
sudo -u postgres psql -c "CREATE USER grafana WITH PASSWORD '$GRAFANA_DB_PASS';" || true
sudo -u postgres psql -c "GRANT ALL PRIVILEGES ON DATABASE grafana TO grafana;"
# Configure PostgreSQL for replication
cat >> "$PG_CONFIG_DIR/postgresql.conf" << EOF
# Grafana HA Configuration
listen_addresses = '*'
wal_level = replica
max_wal_senders = 10
max_replication_slots = 10
archive_mode = on
archive_command = '/bin/true'
log_replication_commands = on
EOF
cat >> "$PG_CONFIG_DIR/pg_hba.conf" << EOF
# Grafana connections
host grafana grafana 0.0.0.0/0 md5
# Replication connections
host replication postgres 0.0.0.0/0 md5
EOF
systemctl restart $PG_SERVICE
# Configure firewall
if command -v firewall-cmd &> /dev/null; then
firewall-cmd --permanent --add-port=5432/tcp
firewall-cmd --reload
elif command -v ufw &> /dev/null; then
ufw allow 5432/tcp
fi
fi
# Configure standby server
if [[ "$NODE_TYPE" == "standby-db" ]]; then
echo -e "${GREEN}[3/$TOTAL_STEPS]${NC} Configuring PostgreSQL standby server..."
systemctl stop $PG_SERVICE
rm -rf "$PG_DATA_DIR"/*
sudo -u postgres pg_basebackup -h "$PG_PRIMARY_IP" -D "$PG_DATA_DIR" -U postgres -v -P -R
chown -R postgres:postgres "$PG_DATA_DIR"
chmod 750 "$PG_DATA_DIR"
systemctl start $PG_SERVICE
fi
else
echo -e "${GREEN}[2/$TOTAL_STEPS]${NC} Skipping PostgreSQL installation (Grafana node)..."
echo -e "${GREEN}[3/$TOTAL_STEPS]${NC} Skipping PostgreSQL configuration (Grafana node)..."
fi
# Step 4: Install Grafana (for Grafana nodes)
if [[ "$NODE_TYPE" == "grafana" ]]; then
echo -e "${GREEN}[4/$TOTAL_STEPS]${NC} Installing Grafana Enterprise..."
if [[ "$PKG_MGR" == "apt" ]]; then
$PKG_INSTALL software-properties-common wget gnupg
wget -q -O - https://packages.grafana.com/gpg.key | apt-key add -
echo "deb https://packages.grafana.com/enterprise/deb stable main" > /etc/apt/sources.list.d/grafana.list
apt update
$PKG_INSTALL grafana-enterprise
else
$PKG_INSTALL wget
rpm --import https://packages.grafana.com/gpg.key
cat > /etc/yum.repos.d/grafana.repo << EOF
[grafana]
name=grafana
baseurl=https://packages.grafana.com/enterprise/rpm
repo_gpgcheck=1
enabled=1
gpgcheck=1
gpgkey=https://packages.grafana.com/gpg.key
EOF
$PKG_INSTALL grafana-enterprise
fi
echo -e "${GREEN}[5/$TOTAL_STEPS]${NC} Configuring Grafana for high availability..."
# Generate secret key
SECRET_KEY=$(openssl rand -base64 32)
# Configure Grafana
cat > /etc/grafana/grafana.ini << EOF
[database]
type = postgres
host = $PG_PRIMARY_IP:5432
name = grafana
user = grafana
password = $GRAFANA_DB_PASS
ssl_mode = require
max_idle_conn = 25
max_open_conn = 300
conn_max_lifetime = 14400
[server]
http_port = 3000
domain = $(hostname -f)
root_url = http://$(hostname -f):3000/
[security]
admin_user = admin
admin_password = $GRAFANA_ADMIN_PASS
secret_key = $SECRET_KEY
disable_gravatar = true
[users]
allow_sign_up = false
allow_org_create = false
[auth.anonymous]
enabled = false
[log]
mode = console file
level = info
EOF
chown root:grafana /etc/grafana/grafana.ini
chmod 640 /etc/grafana/grafana.ini
systemctl enable --now grafana-server
# Configure firewall for Grafana
if command -v firewall-cmd &> /dev/null; then
firewall-cmd --permanent --add-port=3000/tcp
firewall-cmd --reload
elif command -v ufw &> /dev/null; then
ufw allow 3000/tcp
fi
else
echo -e "${GREEN}[4/$TOTAL_STEPS]${NC} Skipping Grafana installation (Database node)..."
echo -e "${GREEN}[5/$TOTAL_STEPS]${NC} Skipping Grafana configuration (Database node)..."
fi
# Step 6: Verify installation
echo -e "${GREEN}[6/$TOTAL_STEPS]${NC} Verifying installation..."
if [[ "$NODE_TYPE" =~ ^(primary-db|standby-db)$ ]]; then
if systemctl is-active --quiet $PG_SERVICE; then
log "PostgreSQL is running successfully"
else
error "PostgreSQL failed to start"
exit 1
fi
fi
if [[ "$NODE_TYPE" == "grafana" ]]; then
if systemctl is-active --quiet grafana-server; then
log "Grafana is running successfully"
else
error "Grafana failed to start"
exit 1
fi
# Test database connection
if sudo -u grafana grafana-cli admin reset-admin-password "$GRAFANA_ADMIN_PASS" &>/dev/null; then
log "Database connection verified"
else
warn "Database connection could not be verified"
fi
fi
# Step 7: Display summary
echo -e "${GREEN}[7/$TOTAL_STEPS]${NC} Installation summary..."
cat << EOF
${GREEN}Grafana HA Cluster Node Installation Complete!${NC}
Node Type: $NODE_TYPE
EOF
if [[ "$NODE_TYPE" == "grafana" ]]; then
cat << EOF
Grafana URL: http://$(hostname -f):3000
Admin User: admin
Admin Password: $GRAFANA_ADMIN_PASS
Database: PostgreSQL at $PG_PRIMARY_IP
EOF
fi
if [[ "$NODE_TYPE" =~ ^(primary-db|standby-db)$ ]]; then
cat << EOF
PostgreSQL Status: Running
Service: $PG_SERVICE
Data Directory: $PG_DATA_DIR
EOF
fi
# Step 8: Next steps
echo -e "${GREEN}[8/$TOTAL_STEPS]${NC} Next steps..."
if [[ "$NODE_TYPE" == "primary-db" ]]; then
log "Set up standby database server with: $SCRIPT_NAME -t standby-db -p $(hostname -I | awk '{print $1}') -d $GRAFANA_DB_PASS"
elif [[ "$NODE_TYPE" == "standby-db" ]]; then
log "Install Grafana nodes with: $SCRIPT_NAME -t grafana -p $PG_PRIMARY_IP -d $GRAFANA_DB_PASS -a <admin_password>"
elif [[ "$NODE_TYPE" == "grafana" ]]; then
log "Configure load balancer to distribute traffic across Grafana instances"
log "Access Grafana at http://$(hostname -f):3000"
fi
log "Installation completed successfully!"
Review the script before running. Execute with: bash install.sh