Configure Elasticsearch 8 cross-cluster replication for disaster recovery

Advanced 45 min Apr 25, 2026
Ubuntu 24.04 Debian 12 AlmaLinux 9 Rocky Linux 9

Set up cross-cluster replication between Elasticsearch 8 clusters to ensure data resilience and business continuity. This advanced configuration creates automatic data synchronization across geographically distributed clusters for disaster recovery scenarios.

Prerequisites

  • Two Elasticsearch 8.x clusters with platinum/enterprise license
  • Network connectivity between clusters on ports 9200 and 9300
  • SSL certificates configured on both clusters
  • Administrative access to both clusters

What this solves

Cross-cluster replication (CCR) in Elasticsearch 8 creates real-time data synchronization between clusters in different locations, ensuring your data survives datacenter failures, network partitions, or regional disasters. This setup maintains read-only replicas of critical indices that can be promoted during emergencies.

Prerequisites and cluster preparation

Verify cluster requirements

Both source and destination clusters need Elasticsearch 8.x with the same major version and a platinum or enterprise license.

curl -X GET "localhost:9200/_cluster/health?pretty"
curl -X GET "localhost:9200/_license?pretty"
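Because CCR silently stays unavailable on a basic license, it helps to check the license tier programmatically. A minimal sketch, using a hardcoded sample response in place of live /_license output (an assumption for illustration; in practice pipe the curl response into the same parsing):

```shell
#!/bin/sh
# Sample /_license response standing in for a live cluster (illustrative only).
resp='{"license":{"status":"active","type":"platinum","expiry_date_in_millis":1790000000000}}'

# Extract the license type with sed (avoids a jq dependency).
type=$(printf '%s' "$resp" | sed -n 's/.*"type"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p')

# CCR requires a platinum, enterprise, or active trial license.
case "$type" in
  platinum|enterprise|trial) result="license OK: $type" ;;
  *) result="CCR unavailable: license type '$type' is insufficient" ;;
esac
echo "$result"
```

Run the same case statement against both clusters; both sides must satisfy the license requirement.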

Configure cluster names and network settings

Each cluster needs a unique name and network configuration that allows inter-cluster communication.

cluster.name: primary-cluster
node.name: primary-node-1
network.host: 0.0.0.0
http.port: 9200
transport.port: 9300

Enable security features

xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.client_authentication: required
xpack.security.transport.ssl.keystore.path: certs/elastic-certificates.p12
xpack.security.transport.ssl.truststore.path: certs/elastic-certificates.p12

HTTP SSL configuration

xpack.security.http.ssl.enabled: true
xpack.security.http.ssl.keystore.path: certs/elastic-certificates.p12

Generate SSL certificates for secure communication

Create certificates that both clusters will use for secure inter-cluster communication.

sudo /usr/share/elasticsearch/bin/elasticsearch-certutil ca --out /etc/elasticsearch/certs/elastic-stack-ca.p12 --pass ""
sudo /usr/share/elasticsearch/bin/elasticsearch-certutil cert --ca /etc/elasticsearch/certs/elastic-stack-ca.p12 --ca-pass "" --out /etc/elasticsearch/certs/elastic-certificates.p12 --pass ""
sudo chown elasticsearch:elasticsearch /etc/elasticsearch/certs/*.p12
sudo chmod 640 /etc/elasticsearch/certs/*.p12

Configure remote cluster connections

Set up the destination cluster to recognize the source cluster as a remote cluster for replication.

curl -X PUT "https://localhost:9200/_cluster/settings" -k -u elastic:password -H "Content-Type: application/json" -d'
{
  "persistent": {
    "cluster": {
      "remote": {
        "primary-cluster": {
          "seeds": ["primary-node-1:9300", "primary-node-2:9300"],
          "transport.ping_schedule": "30s",
          "transport.compress": true
        }
      }
    }
  }
}'
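After registering the remote, confirm the connection with GET /_remote/info. A minimal sketch of checking the response, using a hardcoded sample in place of live output (an assumption for illustration; in practice substitute the curl response):

```shell
#!/bin/sh
# Sample /_remote/info response (illustrative; in practice:
# curl -s "https://localhost:9200/_remote/info" -k -u elastic:password).
info='{"primary-cluster":{"connected":true,"mode":"sniff","seeds":["primary-node-1:9300","primary-node-2:9300"],"num_nodes_connected":2}}'

# Check the connected flag of the remote we just registered.
if printf '%s' "$info" | grep -q '"primary-cluster":{"connected":true'; then
  status="remote primary-cluster reachable"
else
  status="remote primary-cluster NOT connected - check port 9300 and certificates"
fi
echo "$status"
```

A false connected flag at this stage almost always means a transport-layer problem (firewall on 9300 or certificate mismatch), not a CCR misconfiguration.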

Configure source cluster for replication

Create replication user and role

Set up dedicated users with minimal permissions for cross-cluster replication operations.

curl -X POST "https://localhost:9200/_security/role/ccr_remote_role" -k -u elastic:password -H "Content-Type: application/json" -d'
{
  "cluster": ["read_ccr"],
  "indices": [
    {
      "names": ["logs-", "metrics-", "critical-data-*"],
      "privileges": ["read", "read_cross_cluster"]
    }
  ]
}'

Create the replication user

Create a dedicated user account for cross-cluster replication with the appropriate role.

curl -X POST "https://localhost:9200/_security/user/ccr_user" -k -u elastic:password -H "Content-Type: application/json" -d'
{
  "password": "CCR_SecurePass123!",
  "roles": ["ccr_remote_role"],
  "full_name": "Cross Cluster Replication User"
}'

Configure index templates for replication

Set up index templates with settings optimized for replication and disaster recovery scenarios.

curl -X PUT "https://localhost:9200/_index_template/logs-replication" -k -u elastic:password -H "Content-Type: application/json" -d'
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "number_of_shards": 2,
      "number_of_replicas": 1,
      "index.soft_deletes.enabled": true,
      "index.soft_deletes.retention_lease.period": "12h",
      "index.refresh_interval": "30s"
    },
    "mappings": {
      "properties": {
        "@timestamp": {"type": "date"},
        "level": {"type": "keyword"},
        "message": {"type": "text"},
        "service": {"type": "keyword"}
      }
    }
  },
  "priority": 200
}'

Adjust soft delete retention on existing indices

Cross-cluster replication relies on soft deletes to track document changes and deletions. Soft deletes are enabled by default on every index created in Elasticsearch 7.0 or later, and index.soft_deletes.enabled is a static setting that cannot be changed on an existing index. The retention lease period, however, can be updated dynamically:

curl -X PUT "https://localhost:9200/logs-prod/_settings" -k -u elastic:password -H "Content-Type: application/json" -d'
{
  "index.soft_deletes.retention_lease.period": "12h"
}'
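The 12h retention lease bounds how long a follower can be disconnected before the leader trims the operation history it needs. A quick arithmetic sketch of that trade-off (the 8-hour outage is a hypothetical value for illustration):

```shell
#!/bin/sh
# The retention lease period bounds follower downtime: within the window the
# follower resyncs incrementally from operation history; beyond it, the
# history may be trimmed and a full file copy is needed.
lease_hours=12
outage_hours=8   # hypothetical follower outage, for illustration

if [ "$outage_hours" -le "$lease_hours" ]; then
  verdict="incremental resync from operation history"
else
  verdict="full re-bootstrap required (history trimmed)"
fi
echo "outage ${outage_hours}h vs lease ${lease_hours}h: $verdict"
```

Size the lease to comfortably exceed your worst plausible inter-site outage; the cost of a longer lease is extra disk retained on the leader.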

Set up destination cluster and follower indices

Configure destination cluster settings

Update the destination cluster configuration to enable cross-cluster replication and set resource limits.

cluster.name: disaster-recovery-cluster
node.name: dr-node-1
network.host: 0.0.0.0

Cross-cluster replication settings

xpack.ccr.enabled: true

Resource limits for replication

indices.recovery.max_bytes_per_sec: 100mb
ccr.auto_follow.wait_for_metadata_timeout: 60s
ccr.auto_follow.wait_for_timeout: 60s

Restart destination cluster

Apply the configuration changes by restarting the Elasticsearch service.

sudo systemctl restart elasticsearch
sudo systemctl status elasticsearch

Create manual follower indices

Set up follower indices on the destination cluster that will replicate specific indices from the source cluster.

curl -X PUT "https://dr-cluster:9200/logs-prod-follower/_ccr/follow" -k -u elastic:password -H "Content-Type: application/json" -d'
{
  "remote_cluster": "primary-cluster",
  "leader_index": "logs-prod",
  "settings": {
    "index.number_of_replicas": 0,
    "index.ccr.following_index": true
  },
  "max_read_request_operation_count": 5120,
  "max_read_request_size": "32mb",
  "max_outstanding_read_requests": 12,
  "max_write_request_operation_count": 5120,
  "max_write_request_size": "9mb",
  "max_outstanding_write_requests": 9,
  "max_write_buffer_count": 2147483647,
  "max_write_buffer_size": "512mb",
  "max_retry_delay": "500ms",
  "read_poll_timeout": "1m"
}'
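The read/write request parameters above bound the memory and bandwidth each follower can consume. The worst-case volume of leader data in flight is the product of the outstanding-request count and the per-request size; a quick sketch with the values from the request above:

```shell
#!/bin/sh
# Worst-case data in flight from leader to follower, per these settings:
# 12 outstanding read requests x 32 MiB each.
max_outstanding_read_requests=12
max_read_request_size_mb=32

in_flight_mb=$((max_outstanding_read_requests * max_read_request_size_mb))
echo "up to ${in_flight_mb} MiB of leader data may be in flight per follower index"
```

Multiply by the number of follower indices to estimate peak cross-site bandwidth demand, and scale these two parameters down if the WAN link is the bottleneck.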

Configure auto-follow patterns

Set up automatic replication for indices matching specific patterns, reducing manual configuration overhead.

curl -X PUT "https://dr-cluster:9200/_ccr/auto_follow/disaster_recovery_pattern" -k -u elastic:password -H "Content-Type: application/json" -d'
{
  "remote_cluster": "primary-cluster",
  "leader_index_patterns": ["logs-", "metrics-", "critical-data-*"],
  "follow_index_pattern": "{{leader_index}}-replica",
  "settings": {
    "index.number_of_replicas": 0
  },
  "max_read_request_operation_count": 5120,
  "max_read_request_size": "32mb",
  "max_outstanding_read_requests": 12,
  "max_write_request_operation_count": 5120,
  "max_write_request_size": "9mb",
  "max_outstanding_write_requests": 9,
  "max_write_buffer_count": 2147483647,
  "max_write_buffer_size": "512mb",
  "max_retry_delay": "500ms",
  "read_poll_timeout": "1m"
}'
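The follow_index_pattern is a template: {{leader_index}} is replaced with each matched leader index name. A small sketch of the expansion (the leader index name is a hypothetical example; this mirrors the template semantics, it is not Elasticsearch code):

```shell
#!/bin/sh
# How the auto-follow naming template expands: {{leader_index}} is
# substituted with the matched leader index name.
leader_index="logs-2026.04"
follow_index_pattern="{{leader_index}}-replica"

# Plain text substitution mirroring the template behavior.
follower=$(printf '%s' "$follow_index_pattern" | sed "s/{{leader_index}}/$leader_index/")
echo "$follower"
```

Keep the suffix distinct from your leader naming scheme so follower indices never collide with leader patterns after a failback.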

Verify replication setup

Check that the follower indices are properly created and actively replicating data from the source cluster.

curl -X GET "https://dr-cluster:9200/_ccr/stats" -k -u elastic:password
curl -X GET "https://dr-cluster:9200/logs-prod-follower/_ccr/stats" -k -u elastic:password

Monitor and manage cross-cluster replication

Set up monitoring dashboards

Configure monitoring to track replication lag, throughput, and error rates across clusters.

curl -X GET "https://dr-cluster:9200/_ccr/stats?pretty" -k -u elastic:password
curl -X GET "https://dr-cluster:9200/logs-prod-follower/_ccr/stats?pretty" -k -u elastic:password
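The key lag indicator in the follower shard stats is time_since_last_read_millis. A minimal sketch of evaluating it against a five-minute threshold, using a hardcoded stats fragment in place of a live response (an assumption for illustration; a real response nests this under the indices and shards arrays):

```shell
#!/bin/sh
# Sample follower shard stats fragment (illustrative only).
stats='{"time_since_last_read_millis":420000}'
threshold_ms=300000   # 5 minutes

# Pull out the lag value with sed (avoids a jq dependency).
lag_ms=$(printf '%s' "$stats" | sed -n 's/.*"time_since_last_read_millis"[[:space:]]*:[[:space:]]*\([0-9]*\).*/\1/p')

if [ "$lag_ms" -gt "$threshold_ms" ]; then
  alert="LAG: ${lag_ms}ms since last read (threshold ${threshold_ms}ms)"
else
  alert="ok: ${lag_ms}ms"
fi
echo "$alert"
```

Note that time_since_last_read_millis grows naturally on an idle leader, so pair this check with write activity before paging anyone.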

Configure replication alerting

Set up alerts for replication failures, high lag, or connectivity issues between clusters using Elasticsearch Watcher.

curl -X PUT "https://dr-cluster:9200/_watcher/watch/ccr_lag_alert" -k -u elastic:password -H "Content-Type: application/json" -d'
{
  "trigger": {
    "schedule": {
      "interval": "1m"
    }
  },
  "input": {
    "http": {
      "request": {
        "scheme": "https",
        "host": "localhost",
        "port": 9200,
        "method": "get",
        "path": "/_ccr/follow_stats",
        "auth": {
          "basic": {
            "username": "elastic",
            "password": "password"
          }
        }
      }
    }
  },
  "condition": {
    "compare": {
      "ctx.payload.indices.logs-prod-follower.shards.0.time_since_last_read_millis": {
        "gt": 300000
      }
    }
  },
  "actions": {
    "send_email": {
      "email": {
        "to": ["admin@example.com"],
        "subject": "CCR Replication Lag Alert",
        "body": "Cross-cluster replication lag detected for logs-prod-follower index. Time since last read: {{ctx.payload.indices.logs-prod-follower.shards.0.time_since_last_read_millis}}ms"
      }
    }
  }
}'

Implement disaster recovery procedures

Create runbooks for promoting follower indices to leaders during a disaster scenario.

#!/bin/bash
# Emergency promotion script for disaster recovery

FOLLOWER_INDEX="$1"
CLUSTER_URL="https://dr-cluster:9200"

if [ -z "$FOLLOWER_INDEX" ]; then
  echo "Usage: $0 <follower-index>"
  exit 1
fi

echo "Promoting follower index $FOLLOWER_INDEX to leader..."

# Stop following: unfollow requires the index to be paused and closed first
curl -X POST "$CLUSTER_URL/$FOLLOWER_INDEX/_ccr/pause_follow" -k -u elastic:password
curl -X POST "$CLUSTER_URL/$FOLLOWER_INDEX/_close" -k -u elastic:password
curl -X POST "$CLUSTER_URL/$FOLLOWER_INDEX/_ccr/unfollow" -k -u elastic:password
curl -X POST "$CLUSTER_URL/$FOLLOWER_INDEX/_open" -k -u elastic:password

# Make index writable
curl -X PUT "$CLUSTER_URL/$FOLLOWER_INDEX/_settings" -k -u elastic:password \
  -H "Content-Type: application/json" -d'{ "index.blocks.write": false }'

echo "Index $FOLLOWER_INDEX is now writable and ready for production traffic"

Make the script executable and secure

Set appropriate permissions for the disaster recovery script.

sudo chmod 750 /usr/local/bin/promote-follower.sh
sudo chown elasticsearch:elasticsearch /usr/local/bin/promote-follower.sh

Create monitoring and maintenance cron jobs

Set up regular health checks and maintenance tasks for the replication setup.

sudo crontab -u elasticsearch -e

# Check CCR status every 5 minutes
*/5 * * * * curl -s "https://localhost:9200/_ccr/stats" -k -u elastic:password | jq '.follow_stats.indices[].index' > /var/log/elasticsearch/ccr-health.log

# Weekly replication lag report (Mondays at 09:00)
0 9 * * 1 curl -s "https://localhost:9200/_ccr/stats" -k -u elastic:password | jq '.follow_stats.indices' > /var/log/elasticsearch/weekly-ccr-report.json
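Cron schedules are easy to mangle when copying between tools, so a quick sanity check that an entry's schedule part has exactly five fields can catch the common mistakes (a sketch over a sample entry; crontab itself also rejects malformed lines on save):

```shell
#!/bin/sh
# Sanity-check a crontab entry: the schedule portion must be exactly 5 fields
# (minute, hour, day-of-month, month, day-of-week) before the command.
entry='*/5 * * * * curl -s https://localhost:9200/_ccr/stats'

schedule=$(printf '%s\n' "$entry" | awk '{print $1, $2, $3, $4, $5}')
fields=$(printf '%s\n' "$schedule" | wc -w)
echo "schedule: $schedule ($fields fields)"
```

If you paste schedules from rendered web pages, watch for stripped asterisks; "*/5 * * * *" losing its asterisks is a classic copy artifact.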

Verify your setup

# Check cluster connectivity
curl -X GET "https://dr-cluster:9200/_remote/info" -k -u elastic:password

Verify follower index status

curl -X GET "https://dr-cluster:9200/logs-prod-follower/_ccr/stats" -k -u elastic:password

Check auto-follow patterns

curl -X GET "https://dr-cluster:9200/_ccr/auto_follow" -k -u elastic:password

Test data replication by creating a document on leader

curl -X POST "https://primary-cluster:9200/logs-prod/_doc" -k -u elastic:password -H "Content-Type: application/json" -d'
{
  "@timestamp": "2024-01-15T10:00:00Z",
  "level": "info",
  "message": "Test replication message",
  "service": "web-app"
}'

Verify document appears on follower

sleep 10
curl -X GET "https://dr-cluster:9200/logs-prod-follower/_search?q=message:replication" -k -u elastic:password
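A fixed sleep is fragile for verification; a retry loop that polls until the check passes or attempts run out is more robust. A sketch with a stubbed check standing in for the follower search (the stub succeeds on its third call, purely for illustration):

```shell
#!/bin/sh
# Polling sketch: retry a replication check until it succeeds or attempts
# run out. check_replicated stubs the follower search above; here it
# succeeds on the third call, purely for illustration.
attempt=0
check_replicated() {
  attempt=$((attempt + 1))
  [ "$attempt" -ge 3 ]
}

tries=0
max_tries=5
until check_replicated; do
  tries=$((tries + 1))
  if [ "$tries" -ge "$max_tries" ]; then
    echo "replication not observed after $max_tries checks"
    exit 1
  fi
  sleep 1   # use a longer interval against a real cluster
done
echo "document visible on follower after $attempt checks"
```

Against a live setup, replace the stub body with the follower search and test the hit count in the response.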

Common issues

Symptom | Cause | Fix
Follower creation fails with authentication error | Incorrect CCR user permissions | Verify the user has the read_ccr cluster privilege and the read_cross_cluster index privilege
High replication lag | Network latency or resource constraints | Adjust max_read_request_size and max_outstanding_read_requests
Remote cluster connection timeout | Network connectivity or SSL certificate issues | Check firewall rules for port 9300 and verify SSL certificates match
Auto-follow not creating new indices | Pattern mismatch or metadata timeout | Check pattern syntax and increase ccr.auto_follow.wait_for_metadata_timeout
Follower index becomes read-only unexpectedly | Leader index deleted or network interruption | Check leader index status and network connectivity
Important: During disaster recovery, promoted follower indices cannot automatically resume following. You'll need to set up new replication from the recovered primary cluster.

Next steps

Running this in production?

Running multi-cluster Elasticsearch at scale adds complexity: capacity planning across regions, automated failover testing, certificate rotation, and 24/7 monitoring. Our managed platform handles the operational complexity so your team can focus on using the data, not managing the infrastructure.

Need help?

We handle high availability infrastructure for businesses that depend on uptime, from initial setup to ongoing operations.