Configure Elasticsearch 8 cross-cluster replication for disaster recovery

Advanced 45 min Apr 25, 2026
Ubuntu 24.04 Debian 12 AlmaLinux 9 Rocky Linux 9

Set up cross-cluster replication between Elasticsearch 8 clusters to ensure data resilience and business continuity. This advanced configuration creates automatic data synchronization across geographically distributed clusters for disaster recovery scenarios.

Prerequisites

  • Two Elasticsearch 8.x clusters with platinum/enterprise license
  • Network connectivity between clusters on ports 9200 and 9300
  • SSL certificates configured on both clusters
  • Administrative access to both clusters

What this solves

Cross-cluster replication (CCR) in Elasticsearch 8 creates real-time data synchronization between clusters in different locations, ensuring your data survives datacenter failures, network partitions, or regional disasters. This setup maintains read-only replicas of critical indices that can be promoted during emergencies.

Prerequisites and cluster preparation

Verify cluster requirements

Both source and destination clusters need Elasticsearch 8.x with the same major version and a platinum or enterprise license.

curl -X GET "localhost:9200/_cluster/health?pretty"
curl -X GET "localhost:9200/_license?pretty"
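Because CCR silently stays unavailable on a basic license, it helps to check the license tier programmatically. A minimal sketch, using a hardcoded sample response in place of live /_license output (an assumption for illustration; in practice pipe the curl response into the same parsing):

```shell
#!/bin/sh
# Sample /_license response standing in for a live cluster (illustrative only).
resp='{"license":{"status":"active","type":"platinum","expiry_date_in_millis":1790000000000}}'

# Extract the license type with sed (avoids a jq dependency).
type=$(printf '%s' "$resp" | sed -n 's/.*"type"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p')

# CCR requires a platinum, enterprise, or active trial license.
case "$type" in
  platinum|enterprise|trial) result="license OK: $type" ;;
  *) result="CCR unavailable: license type '$type' is insufficient" ;;
esac
echo "$result"
```

Run the same case statement against both clusters; both sides must satisfy the license requirement.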

Configure cluster names and network settings

Each cluster needs a unique name and network configuration that allows inter-cluster communication.

cluster.name: primary-cluster
node.name: primary-node-1
network.host: 0.0.0.0
http.port: 9200
transport.port: 9300

Enable security features

xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.client_authentication: required
xpack.security.transport.ssl.keystore.path: certs/elastic-certificates.p12
xpack.security.transport.ssl.truststore.path: certs/elastic-certificates.p12

HTTP SSL configuration

xpack.security.http.ssl.enabled: true
xpack.security.http.ssl.keystore.path: certs/elastic-certificates.p12

Generate SSL certificates for secure communication

Create certificates that both clusters will use for secure inter-cluster communication.

sudo /usr/share/elasticsearch/bin/elasticsearch-certutil ca --out /etc/elasticsearch/certs/elastic-stack-ca.p12 --pass ""
sudo /usr/share/elasticsearch/bin/elasticsearch-certutil cert --ca /etc/elasticsearch/certs/elastic-stack-ca.p12 --ca-pass "" --out /etc/elasticsearch/certs/elastic-certificates.p12 --pass ""
sudo chown elasticsearch:elasticsearch /etc/elasticsearch/certs/*.p12
sudo chmod 640 /etc/elasticsearch/certs/*.p12

Configure remote cluster connections

Set up the destination cluster to recognize the source cluster as a remote cluster for replication.

curl -X PUT "https://localhost:9200/_cluster/settings" -k -u elastic:password -H "Content-Type: application/json" -d'
{
  "persistent": {
    "cluster": {
      "remote": {
        "primary-cluster": {
          "seeds": ["primary-node-1:9300", "primary-node-2:9300"],
          "transport.ping_schedule": "30s",
          "transport.compress": true
        }
      }
    }
  }
}'
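After registering the remote, confirm the connection with GET /_remote/info. A minimal sketch of checking the response, using a hardcoded sample in place of live output (an assumption for illustration; in practice substitute the curl response):

```shell
#!/bin/sh
# Sample /_remote/info response (illustrative; in practice:
# curl -s "https://localhost:9200/_remote/info" -k -u elastic:password).
info='{"primary-cluster":{"connected":true,"mode":"sniff","seeds":["primary-node-1:9300","primary-node-2:9300"],"num_nodes_connected":2}}'

# Check the connected flag of the remote we just registered.
if printf '%s' "$info" | grep -q '"primary-cluster":{"connected":true'; then
  status="remote primary-cluster reachable"
else
  status="remote primary-cluster NOT connected - check port 9300 and certificates"
fi
echo "$status"
```

A false connected flag at this stage almost always means a transport-layer problem (firewall on 9300 or certificate mismatch), not a CCR misconfiguration.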

Configure source cluster for replication

Create replication user and role

Set up dedicated users with minimal permissions for cross-cluster replication operations.

curl -X POST "https://localhost:9200/_security/role/ccr_remote_role" -k -u elastic:password -H "Content-Type: application/json" -d'
{
  "cluster": ["read_ccr"],
  "indices": [
    {
      "names": ["logs-", "metrics-", "critical-data-*"],
      "privileges": ["read", "read_cross_cluster"]
    }
  ]
}'

Create the replication user

Create a dedicated user account for cross-cluster replication with the appropriate role.

curl -X POST "https://localhost:9200/_security/user/ccr_user" -k -u elastic:password -H "Content-Type: application/json" -d'
{
  "password": "CCR_SecurePass123!",
  "roles": ["ccr_remote_role"],
  "full_name": "Cross Cluster Replication User"
}'

Configure index templates for replication

Set up index templates with settings optimized for replication and disaster recovery scenarios.

curl -X PUT "https://localhost:9200/_index_template/logs-replication" -k -u elastic:password -H "Content-Type: application/json" -d'
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "number_of_shards": 2,
      "number_of_replicas": 1,
      "index.soft_deletes.enabled": true,
      "index.soft_deletes.retention_lease.period": "12h",
      "index.refresh_interval": "30s"
    },
    "mappings": {
      "properties": {
        "@timestamp": {"type": "date"},
        "level": {"type": "keyword"},
        "message": {"type": "text"},
        "service": {"type": "keyword"}
      }
    }
  },
  "priority": 200
}'

Adjust soft delete retention on existing indices

Cross-cluster replication relies on soft deletes to track document changes and deletions. Soft deletes are enabled by default on every index created in Elasticsearch 7.0 or later, and index.soft_deletes.enabled is a static setting that cannot be changed on an existing index. The retention lease period, however, can be updated dynamically:

curl -X PUT "https://localhost:9200/logs-prod/_settings" -k -u elastic:password -H "Content-Type: application/json" -d'
{
  "index.soft_deletes.retention_lease.period": "12h"
}'
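The 12h retention lease bounds how long a follower can be disconnected before the leader trims the operation history it needs. A quick arithmetic sketch of that trade-off (the 8-hour outage is a hypothetical value for illustration):

```shell
#!/bin/sh
# The retention lease period bounds follower downtime: within the window the
# follower resyncs incrementally from operation history; beyond it, the
# history may be trimmed and a full file copy is needed.
lease_hours=12
outage_hours=8   # hypothetical follower outage, for illustration

if [ "$outage_hours" -le "$lease_hours" ]; then
  verdict="incremental resync from operation history"
else
  verdict="full re-bootstrap required (history trimmed)"
fi
echo "outage ${outage_hours}h vs lease ${lease_hours}h: $verdict"
```

Size the lease to comfortably exceed your worst plausible inter-site outage; the cost of a longer lease is extra disk retained on the leader.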

Set up destination cluster and follower indices

Configure destination cluster settings

Update the destination cluster configuration to enable cross-cluster replication and set resource limits.

cluster.name: disaster-recovery-cluster
node.name: dr-node-1
network.host: 0.0.0.0

Cross-cluster replication settings

xpack.ccr.enabled: true

Resource limits for replication

indices.recovery.max_bytes_per_sec: 100mb
ccr.auto_follow.wait_for_metadata_timeout: 60s
ccr.auto_follow.wait_for_timeout: 60s

Restart destination cluster

Apply the configuration changes by restarting the Elasticsearch service.

sudo systemctl restart elasticsearch
sudo systemctl status elasticsearch

Create manual follower indices

Set up follower indices on the destination cluster that will replicate specific indices from the source cluster.

curl -X PUT "https://dr-cluster:9200/logs-prod-follower/_ccr/follow" -k -u elastic:password -H "Content-Type: application/json" -d'
{
  "remote_cluster": "primary-cluster",
  "leader_index": "logs-prod",
  "settings": {
    "index.number_of_replicas": 0,
    "index.ccr.following_index": true
  },
  "max_read_request_operation_count": 5120,
  "max_read_request_size": "32mb",
  "max_outstanding_read_requests": 12,
  "max_write_request_operation_count": 5120,
  "max_write_request_size": "9mb",
  "max_outstanding_write_requests": 9,
  "max_write_buffer_count": 2147483647,
  "max_write_buffer_size": "512mb",
  "max_retry_delay": "500ms",
  "read_poll_timeout": "1m"
}'
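The read/write request parameters above bound the memory and bandwidth each follower can consume. The worst-case volume of leader data in flight is the product of the outstanding-request count and the per-request size; a quick sketch with the values from the request above:

```shell
#!/bin/sh
# Worst-case data in flight from leader to follower, per these settings:
# 12 outstanding read requests x 32 MiB each.
max_outstanding_read_requests=12
max_read_request_size_mb=32

in_flight_mb=$((max_outstanding_read_requests * max_read_request_size_mb))
echo "up to ${in_flight_mb} MiB of leader data may be in flight per follower index"
```

Multiply by the number of follower indices to estimate peak cross-site bandwidth demand, and scale these two parameters down if the WAN link is the bottleneck.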

Configure auto-follow patterns

Set up automatic replication for indices matching specific patterns, reducing manual configuration overhead.

curl -X PUT "https://dr-cluster:9200/_ccr/auto_follow/disaster_recovery_pattern" -k -u elastic:password -H "Content-Type: application/json" -d'
{
  "remote_cluster": "primary-cluster",
  "leader_index_patterns": ["logs-", "metrics-", "critical-data-*"],
  "follow_index_pattern": "{{leader_index}}-replica",
  "settings": {
    "index.number_of_replicas": 0
  },
  "max_read_request_operation_count": 5120,
  "max_read_request_size": "32mb",
  "max_outstanding_read_requests": 12,
  "max_write_request_operation_count": 5120,
  "max_write_request_size": "9mb",
  "max_outstanding_write_requests": 9,
  "max_write_buffer_count": 2147483647,
  "max_write_buffer_size": "512mb",
  "max_retry_delay": "500ms",
  "read_poll_timeout": "1m"
}'
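The follow_index_pattern is a template: {{leader_index}} is replaced with each matched leader index name. A small sketch of the expansion (the leader index name is a hypothetical example; this mirrors the template semantics, it is not Elasticsearch code):

```shell
#!/bin/sh
# How the auto-follow naming template expands: {{leader_index}} is
# substituted with the matched leader index name.
leader_index="logs-2026.04"
follow_index_pattern="{{leader_index}}-replica"

# Plain text substitution mirroring the template behavior.
follower=$(printf '%s' "$follow_index_pattern" | sed "s/{{leader_index}}/$leader_index/")
echo "$follower"
```

Keep the suffix distinct from your leader naming scheme so follower indices never collide with leader patterns after a failback.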

Verify replication setup

Check that the follower indices are properly created and actively replicating data from the source cluster.

curl -X GET "https://dr-cluster:9200/_ccr/stats" -k -u elastic:password
curl -X GET "https://dr-cluster:9200/logs-prod-follower/_ccr/stats" -k -u elastic:password

Monitor and manage cross-cluster replication

Set up monitoring dashboards

Configure monitoring to track replication lag, throughput, and error rates across clusters.

curl -X GET "https://dr-cluster:9200/_ccr/stats?pretty" -k -u elastic:password
curl -X GET "https://dr-cluster:9200/logs-prod-follower/_ccr/stats?pretty" -k -u elastic:password
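The key lag indicator in the follower shard stats is time_since_last_read_millis. A minimal sketch of evaluating it against a five-minute threshold, using a hardcoded stats fragment in place of a live response (an assumption for illustration; a real response nests this under the indices and shards arrays):

```shell
#!/bin/sh
# Sample follower shard stats fragment (illustrative only).
stats='{"time_since_last_read_millis":420000}'
threshold_ms=300000   # 5 minutes

# Pull out the lag value with sed (avoids a jq dependency).
lag_ms=$(printf '%s' "$stats" | sed -n 's/.*"time_since_last_read_millis"[[:space:]]*:[[:space:]]*\([0-9]*\).*/\1/p')

if [ "$lag_ms" -gt "$threshold_ms" ]; then
  alert="LAG: ${lag_ms}ms since last read (threshold ${threshold_ms}ms)"
else
  alert="ok: ${lag_ms}ms"
fi
echo "$alert"
```

Note that time_since_last_read_millis grows naturally on an idle leader, so pair this check with write activity before paging anyone.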

Configure replication alerting

Set up alerts for replication failures, high lag, or connectivity issues between clusters using Elasticsearch Watcher.

curl -X PUT "https://dr-cluster:9200/_watcher/watch/ccr_lag_alert" -k -u elastic:password -H "Content-Type: application/json" -d'
{
  "trigger": {
    "schedule": {
      "interval": "1m"
    }
  },
  "input": {
    "http": {
      "request": {
        "scheme": "https",
        "host": "localhost",
        "port": 9200,
        "method": "get",
        "path": "/_ccr/follow_stats",
        "auth": {
          "basic": {
            "username": "elastic",
            "password": "password"
          }
        }
      }
    }
  },
  "condition": {
    "compare": {
      "ctx.payload.indices.logs-prod-follower.shards.0.time_since_last_read_millis": {
        "gt": 300000
      }
    }
  },
  "actions": {
    "send_email": {
      "email": {
        "to": ["admin@example.com"],
        "subject": "CCR Replication Lag Alert",
        "body": "Cross-cluster replication lag detected for logs-prod-follower index. Time since last read: {{ctx.payload.indices.logs-prod-follower.shards.0.time_since_last_read_millis}}ms"
      }
    }
  }
}'

Implement disaster recovery procedures

Create runbooks for promoting follower indices to leaders during a disaster scenario.

#!/bin/bash
# Emergency promotion script for disaster recovery

FOLLOWER_INDEX="$1"
CLUSTER_URL="https://dr-cluster:9200"

if [ -z "$FOLLOWER_INDEX" ]; then
  echo "Usage: $0 <follower-index>"
  exit 1
fi

echo "Promoting follower index $FOLLOWER_INDEX to leader..."

# Stop following: unfollow requires the index to be paused and closed first
curl -X POST "$CLUSTER_URL/$FOLLOWER_INDEX/_ccr/pause_follow" -k -u elastic:password
curl -X POST "$CLUSTER_URL/$FOLLOWER_INDEX/_close" -k -u elastic:password
curl -X POST "$CLUSTER_URL/$FOLLOWER_INDEX/_ccr/unfollow" -k -u elastic:password
curl -X POST "$CLUSTER_URL/$FOLLOWER_INDEX/_open" -k -u elastic:password

# Make index writable
curl -X PUT "$CLUSTER_URL/$FOLLOWER_INDEX/_settings" -k -u elastic:password \
  -H "Content-Type: application/json" -d'{ "index.blocks.write": false }'

echo "Index $FOLLOWER_INDEX is now writable and ready for production traffic"

Make the script executable and secure

Set appropriate permissions for the disaster recovery script.

sudo chmod 750 /usr/local/bin/promote-follower.sh
sudo chown elasticsearch:elasticsearch /usr/local/bin/promote-follower.sh

Create monitoring and maintenance cron jobs

Set up regular health checks and maintenance tasks for the replication setup.

sudo crontab -u elasticsearch -e

# Check CCR status every 5 minutes
*/5 * * * * curl -s "https://localhost:9200/_ccr/stats" -k -u elastic:password | jq '.follow_stats.indices[].index' > /var/log/elasticsearch/ccr-health.log

# Weekly replication lag report (Mondays at 09:00)
0 9 * * 1 curl -s "https://localhost:9200/_ccr/stats" -k -u elastic:password | jq '.follow_stats.indices' > /var/log/elasticsearch/weekly-ccr-report.json
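Cron schedules are easy to mangle when copying between tools, so a quick sanity check that an entry's schedule part has exactly five fields can catch the common mistakes (a sketch over a sample entry; crontab itself also rejects malformed lines on save):

```shell
#!/bin/sh
# Sanity-check a crontab entry: the schedule portion must be exactly 5 fields
# (minute, hour, day-of-month, month, day-of-week) before the command.
entry='*/5 * * * * curl -s https://localhost:9200/_ccr/stats'

schedule=$(printf '%s\n' "$entry" | awk '{print $1, $2, $3, $4, $5}')
fields=$(printf '%s\n' "$schedule" | wc -w)
echo "schedule: $schedule ($fields fields)"
```

If you paste schedules from rendered web pages, watch for stripped asterisks; "*/5 * * * *" losing its asterisks is a classic copy artifact.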

Verify your setup

# Check cluster connectivity
curl -X GET "https://dr-cluster:9200/_remote/info" -k -u elastic:password

Verify follower index status

curl -X GET "https://dr-cluster:9200/logs-prod-follower/_ccr/stats" -k -u elastic:password

Check auto-follow patterns

curl -X GET "https://dr-cluster:9200/_ccr/auto_follow" -k -u elastic:password

Test data replication by creating a document on leader

curl -X POST "https://primary-cluster:9200/logs-prod/_doc" -k -u elastic:password -H "Content-Type: application/json" -d'
{
  "@timestamp": "2024-01-15T10:00:00Z",
  "level": "info",
  "message": "Test replication message",
  "service": "web-app"
}'

Verify document appears on follower

sleep 10
curl -X GET "https://dr-cluster:9200/logs-prod-follower/_search?q=message:replication" -k -u elastic:password
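A fixed sleep is fragile for verification; a retry loop that polls until the check passes or attempts run out is more robust. A sketch with a stubbed check standing in for the follower search (the stub succeeds on its third call, purely for illustration):

```shell
#!/bin/sh
# Polling sketch: retry a replication check until it succeeds or attempts
# run out. check_replicated stubs the follower search above; here it
# succeeds on the third call, purely for illustration.
attempt=0
check_replicated() {
  attempt=$((attempt + 1))
  [ "$attempt" -ge 3 ]
}

tries=0
max_tries=5
until check_replicated; do
  tries=$((tries + 1))
  if [ "$tries" -ge "$max_tries" ]; then
    echo "replication not observed after $max_tries checks"
    exit 1
  fi
  sleep 1   # use a longer interval against a real cluster
done
echo "document visible on follower after $attempt checks"
```

Against a live setup, replace the stub body with the follower search and test the hit count in the response.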

Common issues

Symptom | Cause | Fix
Follower creation fails with authentication error | Incorrect CCR user permissions | Verify the user has the read_ccr cluster privilege and the read_cross_cluster index privilege
High replication lag | Network latency or resource constraints | Adjust max_read_request_size and max_outstanding_read_requests
Remote cluster connection timeout | Network connectivity or SSL certificate issues | Check firewall rules for port 9300 and verify SSL certificates match
Auto-follow not creating new indices | Pattern mismatch or metadata timeout | Check pattern syntax and increase ccr.auto_follow.wait_for_metadata_timeout
Follower index becomes read-only unexpectedly | Leader index deleted or network interruption | Check leader index status and network connectivity
Important: During disaster recovery, promoted follower indices cannot automatically resume following. You'll need to set up new replication from the recovered primary cluster.

Next steps

Running this in production?

Running multi-cluster Elasticsearch at scale adds complexity: capacity planning across regions, automated failover testing, certificate rotation, and 24/7 monitoring. Our managed platform handles the operational complexity so your team can focus on using the data, not managing the infrastructure.

Need help?

We handle high availability infrastructure for businesses that depend on uptime, from initial setup to ongoing operations.