Set up automated Elasticsearch 8 backups using snapshot lifecycle management policies with S3 repository storage. Configure retention policies, scheduling, and monitoring for production backup strategies.
Prerequisites
- Elasticsearch 8.x installed and running
- AWS account with S3 access
- Root or sudo access
- Basic familiarity with AWS IAM
What this solves
Elasticsearch snapshot lifecycle management (SLM) automates the creation, retention, and deletion of cluster snapshots to ensure data protection without manual intervention. This tutorial sets up automated backups to Amazon S3 storage with configurable retention policies and monitoring.
Step-by-step configuration
Install and configure AWS CLI
Install the AWS command line interface to manage S3 credentials and test connectivity before configuring Elasticsearch.
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install
aws --version
Create S3 bucket for snapshots
Create a dedicated S3 bucket for Elasticsearch snapshots with versioning enabled for additional data protection.
aws configure
aws s3 mb s3://elasticsearch-snapshots-prod-2024
aws s3api put-bucket-versioning --bucket elasticsearch-snapshots-prod-2024 --versioning-configuration Status=Enabled
Create IAM policy for Elasticsearch S3 access
Create an IAM policy with minimal permissions required for Elasticsearch to read, write, and delete snapshots in the designated bucket.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:ListBucketMultipartUploads",
        "s3:ListBucketVersions"
      ],
      "Resource": "arn:aws:s3:::elasticsearch-snapshots-prod-2024"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts"
      ],
      "Resource": "arn:aws:s3:::elasticsearch-snapshots-prod-2024/*"
    }
  ]
}
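The create-policy command in the next step reads this document from a local file. One way to write it out (the filename matches what that command expects):

```shell
# Save the IAM policy document to the file referenced by `aws iam create-policy`
cat > elasticsearch-s3-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:ListBucketMultipartUploads",
        "s3:ListBucketVersions"
      ],
      "Resource": "arn:aws:s3:::elasticsearch-snapshots-prod-2024"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts"
      ],
      "Resource": "arn:aws:s3:::elasticsearch-snapshots-prod-2024/*"
    }
  ]
}
EOF
```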
Create IAM user and attach policy
Create a dedicated IAM user for Elasticsearch snapshot operations and attach the policy created above.
aws iam create-policy --policy-name ElasticsearchS3SnapshotPolicy --policy-document file://elasticsearch-s3-policy.json
aws iam create-user --user-name elasticsearch-snapshot-user
aws iam attach-user-policy --user-name elasticsearch-snapshot-user --policy-arn arn:aws:iam::YOUR_ACCOUNT_ID:policy/ElasticsearchS3SnapshotPolicy
aws iam create-access-key --user-name elasticsearch-snapshot-user
Configure Elasticsearch keystore with S3 credentials
Add AWS credentials to Elasticsearch's secure keystore to authenticate with S3 without storing credentials in configuration files.
sudo -u elasticsearch /usr/share/elasticsearch/bin/elasticsearch-keystore add s3.client.default.access_key
sudo -u elasticsearch /usr/share/elasticsearch/bin/elasticsearch-keystore add s3.client.default.secret_key
sudo systemctl restart elasticsearch
sudo systemctl status elasticsearch
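As an alternative to a full restart, the S3 client credentials are reloadable secure settings, so you can ask every node to re-read the keystore in place (the restart above also works and is what the rest of this tutorial assumes):

```shell
# Re-read keystore contents on all nodes without restarting them
curl -X POST "localhost:9200/_nodes/reload_secure_settings"
```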
Verify S3 repository support
In Elasticsearch 8.x the S3 repository type ships as a built-in module, so there is nothing to install — the standalone repository-s3 plugin applies only to 7.x and earlier, and running elasticsearch-plugin install repository-s3 on 8.x fails. Confirm the module is present:
curl -s "localhost:9200/_nodes/plugins?pretty" | grep repository-s3
Create S3 snapshot repository
Register the S3 bucket as a snapshot repository in Elasticsearch with appropriate settings for chunk size and compression.
curl -X PUT "localhost:9200/_snapshot/s3_repository" -H 'Content-Type: application/json' -d'
{
  "type": "s3",
  "settings": {
    "bucket": "elasticsearch-snapshots-prod-2024",
    "compress": true,
    "chunk_size": "1gb",
    "max_restore_bytes_per_sec": "40mb",
    "max_snapshot_bytes_per_sec": "40mb"
  }
}'
Note that region is not a valid repository setting in Elasticsearch 8.x: the bundled AWS SDK resolves the bucket region automatically. If it cannot, configure the client in elasticsearch.yml (for example via s3.client.default.endpoint).
Verify repository configuration
Test the S3 repository configuration to ensure Elasticsearch can successfully connect and write to the bucket.
curl -X POST "localhost:9200/_snapshot/s3_repository/_verify"
curl -X GET "localhost:9200/_snapshot/s3_repository"
Create snapshot lifecycle management policy
Define an SLM policy that creates daily snapshots with retention rules to automatically delete old snapshots after a specified period. Elasticsearch cron expressions use six or seven fields (seconds, minutes, hours, day-of-month, month, day-of-week, optional year); the schedule below runs every day at 02:00, and the name field uses date math to stamp each snapshot.
curl -X PUT "localhost:9200/_slm/policy/daily-snapshots" -H 'Content-Type: application/json' -d'
{
  "schedule": "0 0 2 * * ?",
  "name": "<daily-snap-{now/d}>",
  "repository": "s3_repository",
  "config": {
    "indices": "*",
    "ignore_unavailable": false,
    "include_global_state": true,
    "metadata": {
      "taken_by": "snapshot-lifecycle-management",
      "taken_because": "daily automated backup"
    }
  },
  "retention": {
    "expire_after": "30d",
    "min_count": 7,
    "max_count": 50
  }
}'
Create weekly long-term retention policy
Configure a second SLM policy that takes weekly snapshots (Sundays at 03:00) with longer retention for disaster recovery scenarios.
curl -X PUT "localhost:9200/_slm/policy/weekly-snapshots" -H 'Content-Type: application/json' -d'
{
  "schedule": "0 0 3 ? * SUN",
  "name": "<weekly-snap-{now/d}>",
  "repository": "s3_repository",
  "config": {
    "indices": "*",
    "ignore_unavailable": false,
    "include_global_state": true,
    "metadata": {
      "taken_by": "snapshot-lifecycle-management",
      "taken_because": "weekly long-term backup"
    }
  },
  "retention": {
    "expire_after": "180d",
    "min_count": 4,
    "max_count": 26
  }
}'
Execute manual snapshot for testing
Trigger a manual snapshot using the SLM policy to verify the configuration works correctly before waiting for the scheduled execution, then check the policy's last success and failure details.
curl -X POST "localhost:9200/_slm/policy/daily-snapshots/_execute"
curl -X GET "localhost:9200/_slm/policy/daily-snapshots?human"
Configure SLM policy monitoring
Enable detailed logging for snapshot lifecycle management operations to track policy execution and failures.
# Add these lines to the existing log4j2.properties (/etc/elasticsearch/log4j2.properties on package installs), then restart Elasticsearch
logger.slm.name = org.elasticsearch.xpack.slm
logger.slm.level = info
logger.slm.appenderRef.console.ref = console
logger.slm.appenderRef.rolling.ref = rolling
logger.snapshot.name = org.elasticsearch.snapshots
logger.snapshot.level = info
logger.snapshot.appenderRef.console.ref = console
logger.snapshot.appenderRef.rolling.ref = rolling
Set up index lifecycle management integration
Configure index lifecycle management to work with snapshot policies for comprehensive data tiering and backup strategies.
curl -X PUT "localhost:9200/_ilm/policy/logs-policy" -H 'Content-Type: application/json' -d'
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "50gb",
            "max_age": "1d"
          }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "allocate": {
            "number_of_replicas": 0
          }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}'
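An ILM policy only takes effect once indices reference it. As a sketch (the template name logs-template, index pattern logs-*, and rollover alias logs are assumptions, not part of the setup above), a composable index template can attach the policy to newly created indices:

```shell
# Hypothetical template applying logs-policy to indices matching logs-*
curl -X PUT "localhost:9200/_index_template/logs-template" -H 'Content-Type: application/json' -d'
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "index.lifecycle.name": "logs-policy",
      "index.lifecycle.rollover_alias": "logs"
    }
  }
}'
```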
Monitor snapshot lifecycle management
Check SLM policy status
Monitor the execution status and statistics of your snapshot lifecycle management policies.
curl -X GET "localhost:9200/_slm/policy"
curl -X GET "localhost:9200/_slm/stats"
Monitor snapshot progress
Check the status of ongoing and completed snapshots to verify successful backup operations.
curl -X GET "localhost:9200/_snapshot/s3_repository/_all"
curl -X GET "localhost:9200/_snapshot/_status"
Set up alerting for snapshot failures
Create a watcher to alert on snapshot failures and SLM policy execution problems. Note that Watcher is a licensed feature (available on a trial license) and the email action requires an account configured under xpack.notification.email.account in elasticsearch.yml.
curl -X PUT "localhost:9200/_watcher/watch/snapshot_failure_alert" -H 'Content-Type: application/json' -d'
{
  "trigger": {
    "schedule": {
      "interval": "5m"
    }
  },
  "input": {
    "search": {
      "request": {
        "search_type": "query_then_fetch",
        "indices": ["_all"],
        "body": {
          "query": {
            "bool": {
              "must": [
                {
                  "range": {
                    "@timestamp": {
                      "gte": "now-10m"
                    }
                  }
                },
                {
                  "match": {
                    "message": "snapshot failed"
                  }
                }
              ]
            }
          }
        }
      }
    }
  },
  "condition": {
    "compare": {
      "ctx.payload.hits.total": {
        "gt": 0
      }
    }
  },
  "actions": {
    "send_email": {
      "email": {
        "to": ["ops@example.com"],
        "subject": "Elasticsearch Snapshot Failed",
        "body": "Snapshot operation failed. Check cluster logs for details."
      }
    }
  }
}'
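To confirm the watch is wired up without waiting for a real failure, the Watcher execute API can run it on demand; ignore_condition forces the actions to evaluate and simulate mode logs the email instead of sending it:

```shell
# Dry-run the watch: bypass the condition and simulate the email action
curl -X POST "localhost:9200/_watcher/watch/snapshot_failure_alert/_execute" -H 'Content-Type: application/json' -d'
{
  "ignore_condition": true,
  "action_modes": { "send_email": "simulate" }
}'
```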
Verify your setup
# Check SLM policies are active
curl -X GET "localhost:9200/_slm/policy"
Verify S3 repository connectivity
curl -X POST "localhost:9200/_snapshot/s3_repository/_verify"
Check recent snapshots
curl -X GET "localhost:9200/_snapshot/s3_repository/_all?pretty"
Monitor SLM execution stats
curl -X GET "localhost:9200/_slm/stats?pretty"
Check AWS S3 bucket contents
aws s3 ls s3://elasticsearch-snapshots-prod-2024/
Common issues
| Symptom | Cause | Fix |
|---|---|---|
| Repository verification fails | AWS credentials not configured | Check keystore with elasticsearch-keystore list |
| Snapshots not appearing in S3 | Insufficient IAM permissions | Verify IAM policy allows s3:PutObject and s3:ListBucket |
| SLM policy not executing | Invalid cron schedule | Use a six-field Elasticsearch cron (e.g. 0 0 2 * * ?) and test with /_slm/policy/POLICY_NAME/_execute |
| Large snapshots never finish | Upload rate throttled | Raise max_snapshot_bytes_per_sec in the repository settings |
| S3 access denied errors | Client region or endpoint mismatch | Ensure the S3 client resolves the bucket's actual region (set s3.client.default.endpoint if needed) |
| Snapshot retention not working | Min/max count conflicts | Adjust min_count and max_count in retention policy |
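For the first row of the table, a quick way to confirm that both S3 credential entries made it into the keystore (paths assume a package install):

```shell
# Both s3.client.default.access_key and s3.client.default.secret_key should be listed
sudo -u elasticsearch /usr/share/elasticsearch/bin/elasticsearch-keystore list | grep s3.client
```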
Troubleshoot automated snapshots
Debug SLM policy execution
Check detailed logs and execution history to identify issues with snapshot lifecycle management.
# Check SLM execution history
curl -X GET "localhost:9200/_slm/policy/daily-snapshots?human"
View detailed policy statistics
curl -X GET "localhost:9200/_slm/stats?pretty"
Check Elasticsearch logs for SLM errors
sudo tail -f /var/log/elasticsearch/elasticsearch.log | grep -i slm
Test manual snapshot operations
Verify repository configuration by performing manual snapshot and restore operations.
# Create manual test snapshot
curl -X PUT "localhost:9200/_snapshot/s3_repository/test_snapshot_$(date +%Y%m%d_%H%M%S)?wait_for_completion=true"
List all snapshots in repository
curl -X GET "localhost:9200/_snapshot/s3_repository/_all?pretty"
Delete test snapshot
curl -X DELETE "localhost:9200/_snapshot/s3_repository/test_snapshot_*"
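A backup strategy is only proven once a restore succeeds. A minimal restore sketch — SNAPSHOT_NAME and my-index are placeholders; renaming on restore avoids colliding with live indices:

```shell
# Restore one index from a snapshot under a new name (restored_<original>)
curl -X POST "localhost:9200/_snapshot/s3_repository/SNAPSHOT_NAME/_restore" -H 'Content-Type: application/json' -d'
{
  "indices": "my-index",
  "rename_pattern": "(.+)",
  "rename_replacement": "restored_$1"
}'
```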
Monitor S3 storage costs
Track S3 storage usage and costs for snapshot retention optimization.
# Get bucket size and object count
aws s3api list-objects-v2 --bucket elasticsearch-snapshots-prod-2024 --query "Contents[].{Key:Key,Size:Size,LastModified:LastModified}" --output table
Calculate total bucket size
aws s3 ls s3://elasticsearch-snapshots-prod-2024/ --recursive --human-readable --summarize
Next steps
- Set up cross-cluster replication for additional disaster recovery protection
- Configure advanced alerting for snapshot monitoring
- Enable SSL encryption and security hardening for Elasticsearch
- Implement automated index curation and archival policies
Running this in production?
Automated install script
Run this to automate the entire setup
#!/usr/bin/env bash
set -euo pipefail
# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color
# Default values
S3_BUCKET=""
AWS_REGION="us-east-1"
ES_USER="elasticsearch"
ES_HOME="/usr/share/elasticsearch"
# Usage function
usage() {
echo "Usage: $0 -b <s3-bucket-name> [-r <aws-region>]"
echo " -b: S3 bucket name for Elasticsearch snapshots (required)"
echo " -r: AWS region (default: us-east-1)"
echo " -h: Show this help message"
exit 1
}
# Parse command line arguments
while getopts "b:r:h" opt; do
case $opt in
b) S3_BUCKET="$OPTARG" ;;
r) AWS_REGION="$OPTARG" ;;
h) usage ;;
*) usage ;;
esac
done
if [[ -z "$S3_BUCKET" ]]; then
echo -e "${RED}Error: S3 bucket name is required${NC}"
usage
fi
# Cleanup function for rollback
cleanup() {
echo -e "${RED}Installation failed. Performing cleanup...${NC}"
systemctl restart elasticsearch 2>/dev/null || true
}
trap cleanup ERR
# Check prerequisites
if [[ $EUID -ne 0 ]]; then
echo -e "${RED}This script must be run as root${NC}"
exit 1
fi
# Auto-detect distribution
if [ -f /etc/os-release ]; then
. /etc/os-release
case "$ID" in
ubuntu|debian)
PKG_MGR="apt"
PKG_INSTALL="apt install -y"
PKG_UPDATE="apt update"
;;
almalinux|rocky|centos|rhel|ol|fedora)
PKG_MGR="dnf"
PKG_INSTALL="dnf install -y"
PKG_UPDATE="dnf update -y"
;;
amzn)
PKG_MGR="yum"
PKG_INSTALL="yum install -y"
PKG_UPDATE="yum update -y"
;;
*)
echo -e "${RED}Unsupported distro: $ID${NC}"
exit 1
;;
esac
else
echo -e "${RED}Cannot detect OS distribution${NC}"
exit 1
fi
echo -e "${GREEN}[1/10] Installing AWS CLI...${NC}"
if ! command -v aws &> /dev/null; then
if [[ "$PKG_MGR" == "dnf" || "$PKG_MGR" == "yum" ]]; then
$PKG_INSTALL unzip curl
else
$PKG_UPDATE
$PKG_INSTALL unzip curl
fi
cd /tmp
curl -s "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip -q awscliv2.zip
./aws/install
rm -rf awscliv2.zip aws/
if ! aws --version; then
echo -e "${RED}AWS CLI installation failed${NC}"
exit 1
fi
else
echo -e "${YELLOW}AWS CLI already installed${NC}"
fi
echo -e "${GREEN}[2/10] Checking Elasticsearch installation...${NC}"
if ! systemctl is-active --quiet elasticsearch; then
echo -e "${RED}Elasticsearch is not running. Please install and start Elasticsearch first.${NC}"
exit 1
fi
echo -e "${GREEN}[3/10] Verifying S3 repository support...${NC}"
# repository-s3 ships as a bundled module in Elasticsearch 8.x; no plugin install is needed
if curl -s "localhost:9200/_nodes/plugins" | grep -q repository-s3; then
echo -e "${GREEN}repository-s3 module available${NC}"
else
echo -e "${RED}repository-s3 module not found; check your Elasticsearch version/installation${NC}"
exit 1
fi
echo -e "${GREEN}[4/10] Checking AWS credentials...${NC}"
if ! aws sts get-caller-identity &>/dev/null; then
echo -e "${YELLOW}AWS credentials not configured. Please run 'aws configure' first.${NC}"
echo "Press Enter after configuring AWS credentials..."
read -r
fi
echo -e "${GREEN}[5/10] Creating S3 bucket...${NC}"
if ! aws s3api head-bucket --bucket "$S3_BUCKET" 2>/dev/null; then
aws s3 mb "s3://$S3_BUCKET" --region "$AWS_REGION"
aws s3api put-bucket-versioning --bucket "$S3_BUCKET" --versioning-configuration Status=Enabled
echo -e "${GREEN}S3 bucket created successfully${NC}"
else
echo -e "${YELLOW}S3 bucket already exists${NC}"
fi
echo -e "${GREEN}[6/10] Creating IAM policy...${NC}"
POLICY_DOC=$(cat <<EOF
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:ListBucket",
"s3:GetBucketLocation",
"s3:ListBucketMultipartUploads",
"s3:ListBucketVersions"
],
"Resource": "arn:aws:s3:::$S3_BUCKET"
},
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:PutObject",
"s3:DeleteObject",
"s3:AbortMultipartUpload",
"s3:ListMultipartUploadParts"
],
"Resource": "arn:aws:s3:::$S3_BUCKET/*"
}
]
}
EOF
)
POLICY_NAME="ElasticsearchS3SnapshotPolicy-$(date +%s)"
POLICY_ARN=$(aws iam create-policy --policy-name "$POLICY_NAME" --policy-document "$POLICY_DOC" --query 'Policy.Arn' --output text)
echo -e "${GREEN}[7/10] Creating IAM user...${NC}"
IAM_USER="elasticsearch-snapshot-user-$(date +%s)"
aws iam create-user --user-name "$IAM_USER" >/dev/null
aws iam attach-user-policy --user-name "$IAM_USER" --policy-arn "$POLICY_ARN"
read -r ACCESS_KEY SECRET_KEY < <(aws iam create-access-key --user-name "$IAM_USER" --query '[AccessKey.AccessKeyId,AccessKey.SecretAccessKey]' --output text)
echo -e "${GREEN}[8/10] Configuring Elasticsearch keystore...${NC}"
echo "$ACCESS_KEY" | sudo -u $ES_USER $ES_HOME/bin/elasticsearch-keystore add -x s3.client.default.access_key
echo "$SECRET_KEY" | sudo -u $ES_USER $ES_HOME/bin/elasticsearch-keystore add -x s3.client.default.secret_key
systemctl restart elasticsearch
sleep 10
if ! systemctl is-active --quiet elasticsearch; then
echo -e "${RED}Elasticsearch failed to start after keystore configuration${NC}"
exit 1
fi
echo -e "${GREEN}[9/10] Creating S3 snapshot repository...${NC}"
sleep 5 # Wait for Elasticsearch to be fully ready
curl -s -X PUT "localhost:9200/_snapshot/s3_repository" -H 'Content-Type: application/json' -d"{
\"type\": \"s3\",
\"settings\": {
\"bucket\": \"$S3_BUCKET\",
\"compress\": true,
\"chunk_size\": \"1gb\",
\"max_restore_bytes_per_sec\": \"40mb\",
\"max_snapshot_bytes_per_sec\": \"40mb\"
}
}" >/dev/null
echo -e "${GREEN}[10/10] Verifying repository configuration...${NC}"
VERIFY_RESULT=$(curl -s -X POST "localhost:9200/_snapshot/s3_repository/_verify")
if echo "$VERIFY_RESULT" | grep -q '"nodes"'; then
echo -e "${GREEN}S3 repository verification successful${NC}"
else
echo -e "${RED}S3 repository verification failed${NC}"
echo "$VERIFY_RESULT"
exit 1
fi
echo -e "${GREEN}Installation completed successfully!${NC}"
echo -e "${YELLOW}Repository details:${NC}"
curl -s -X GET "localhost:9200/_snapshot/s3_repository" | python3 -m json.tool 2>/dev/null || echo "Repository configured"
echo -e "${YELLOW}Next steps:${NC}"
echo "1. Create a snapshot lifecycle policy"
echo "2. Test creating a manual snapshot"
echo "3. Monitor snapshot creation and retention"
Review the script before running. Execute with: bash install.sh -b your-bucket-name [-r aws-region]