Set up a production-ready ZooKeeper ensemble to enable ClickHouse replication across multiple nodes. This tutorial covers ZooKeeper cluster configuration, ClickHouse integration, security hardening, and failover testing.
Prerequisites
- Minimum 3 servers for ZooKeeper ensemble
- At least 4GB RAM per ZooKeeper node
- Network connectivity between all nodes
- Root or sudo access
- Basic understanding of distributed systems
What this solves
ClickHouse requires ZooKeeper for coordinating data replication across cluster nodes, managing distributed table metadata, and ensuring data consistency. This tutorial sets up a 3-node ZooKeeper ensemble with ClickHouse integration, enabling high availability and automatic failover for your analytics workloads.
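The three-node minimum comes from quorum arithmetic: ZooKeeper stays writable only while a strict majority of the ensemble is up. A quick sketch of the math:

```shell
# ZooKeeper needs a strict majority (floor(N/2) + 1) of nodes to commit
# writes, so an ensemble of N tolerates floor((N-1)/2) node failures.
quorum_size()        { echo $(( $1 / 2 + 1 )); }
failures_tolerated() { echo $(( ($1 - 1) / 2 )); }

for n in 3 5 7; do
  echo "$n nodes: quorum $(quorum_size "$n"), tolerates $(failures_tolerated "$n") failure(s)"
done
```

This is why ensembles use odd sizes: going from 3 to 4 nodes raises the quorum to 3 without increasing the number of failures tolerated.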
Step-by-step configuration
Update system packages
Start by updating system packages on all nodes to ensure you install current versions.
sudo apt update && sudo apt upgrade -y
Install Java Runtime Environment
ZooKeeper requires Java 8 or later. Install OpenJDK which provides the necessary runtime environment.
sudo apt install -y openjdk-17-jre-headless
Create ZooKeeper user and directories
Create a dedicated user for ZooKeeper and set up the required directory structure with proper ownership.
sudo useradd -r -s /bin/false zookeeper
sudo mkdir -p /opt/zookeeper /var/lib/zookeeper /var/log/zookeeper
sudo chown -R zookeeper:zookeeper /opt/zookeeper /var/lib/zookeeper /var/log/zookeeper
Download and install ZooKeeper
Download the latest stable ZooKeeper release and extract it to the installation directory.
cd /tmp
wget https://archive.apache.org/dist/zookeeper/zookeeper-3.8.4/apache-zookeeper-3.8.4-bin.tar.gz
tar -xzf apache-zookeeper-3.8.4-bin.tar.gz
sudo mv apache-zookeeper-3.8.4-bin/* /opt/zookeeper/
sudo chown -R zookeeper:zookeeper /opt/zookeeper
Configure ZooKeeper ensemble
Create the main ZooKeeper configuration file, /opt/zookeeper/conf/zoo.cfg, with the cluster settings below. Replace the IP addresses with your actual server IPs.
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
maxClientCnxns=60
autopurge.snapRetainCount=3
autopurge.purgeInterval=24
admin.enableServer=true
admin.serverPort=8080
4lw.commands.whitelist=stat,ruok,mntr,srvr
# Cluster configuration
server.1=203.0.113.10:2888:3888
server.2=203.0.113.11:2888:3888
server.3=203.0.113.12:2888:3888
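When scripting the rollout, the server.N lines can be generated from an ordered IP list. `emit_server_lines` below is a hypothetical helper, not part of ZooKeeper:

```shell
# Emit zoo.cfg server lines from an ordered list of ensemble IPs.
# The position in the list becomes the server ID.
emit_server_lines() {
  local i=1 ip
  for ip in "$@"; do
    printf 'server.%d=%s:2888:3888\n' "$i" "$ip"
    i=$((i + 1))
  done
}

emit_server_lines 203.0.113.10 203.0.113.11 203.0.113.12
```

Keeping the list in one place avoids the classic mistake of the server list differing between nodes.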
Set unique server IDs
Each ZooKeeper node needs a unique ID. Create the myid file on each server with its corresponding number, running exactly one of these commands on the matching server:
# On server 1 (203.0.113.10):
echo "1" | sudo tee /var/lib/zookeeper/myid
# On server 2 (203.0.113.11):
echo "2" | sudo tee /var/lib/zookeeper/myid
# On server 3 (203.0.113.12):
echo "3" | sudo tee /var/lib/zookeeper/myid
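Instead of hand-editing each node, the ID can be derived from the node's position in the ensemble list. `myid_for` is an illustrative helper; in practice `my_ip` would come from something like `hostname -I | awk '{print $1}'`:

```shell
# Look up this host's position (1-based) in the ordered ensemble list;
# that position is the value to write to /var/lib/zookeeper/myid.
myid_for() {
  local my_ip=$1 i=1 ip
  shift
  for ip in "$@"; do
    if [ "$ip" = "$my_ip" ]; then
      echo "$i"
      return 0
    fi
    i=$((i + 1))
  done
  return 1   # host not in the ensemble list
}

myid_for 203.0.113.11 203.0.113.10 203.0.113.11 203.0.113.12   # → 2
# The result would then be written with:
#   myid_for "$my_ip" "${ips[@]}" | sudo tee /var/lib/zookeeper/myid
```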
Configure ZooKeeper logging
Set up logging to help with troubleshooting and monitoring. ZooKeeper 3.8 uses Logback; edit /opt/zookeeper/conf/logback.xml so it writes size-capped, rotated log files:
<configuration>
  <property name="zookeeper.log.dir" value="/var/log/zookeeper" />
  <property name="zookeeper.log.file" value="zookeeper.log" />
  <property name="zookeeper.log.maxfilesize" value="256MB" />
  <property name="zookeeper.log.maxbackupindex" value="20" />
  <appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%d{ISO8601} [myid:%X{myid}] - %-5p [%t:%C{1}@%L] - %m%n</pattern>
    </encoder>
  </appender>
  <appender name="ROLLINGFILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>${zookeeper.log.dir}/${zookeeper.log.file}</file>
    <encoder>
      <pattern>%d{ISO8601} [myid:%X{myid}] - %-5p [%t:%C{1}@%L] - %m%n</pattern>
    </encoder>
    <rollingPolicy class="ch.qos.logback.core.rolling.FixedWindowRollingPolicy">
      <maxIndex>${zookeeper.log.maxbackupindex}</maxIndex>
      <FileNamePattern>${zookeeper.log.dir}/${zookeeper.log.file}.%i</FileNamePattern>
    </rollingPolicy>
    <triggeringPolicy class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">
      <maxFileSize>${zookeeper.log.maxfilesize}</maxFileSize>
    </triggeringPolicy>
  </appender>
  <root level="INFO">
    <appender-ref ref="CONSOLE" />
    <appender-ref ref="ROLLINGFILE" />
  </root>
</configuration>
Create ZooKeeper systemd service
Set up a systemd service file to manage ZooKeeper as a system service with automatic restart capabilities.
[Unit]
Description=Apache ZooKeeper
Documentation=http://zookeeper.apache.org/
Requires=network.target remote-fs.target
After=network.target remote-fs.target
[Service]
Type=forking
User=zookeeper
Group=zookeeper
ExecStart=/opt/zookeeper/bin/zkServer.sh start
ExecStop=/opt/zookeeper/bin/zkServer.sh stop
ExecReload=/opt/zookeeper/bin/zkServer.sh restart
WorkingDirectory=/var/lib/zookeeper
Restart=always
RestartSec=10
SuccessExitStatus=143
[Install]
WantedBy=multi-user.target
Configure firewall rules
Open the necessary ports for ZooKeeper cluster communication and client connections. In production, consider restricting ports 2888, 3888, and 8080 to the ensemble nodes' own IP addresses.
sudo ufw allow 2181/tcp comment "ZooKeeper client"
sudo ufw allow 2888/tcp comment "ZooKeeper follower"
sudo ufw allow 3888/tcp comment "ZooKeeper election"
sudo ufw allow 8080/tcp comment "ZooKeeper admin"
Start ZooKeeper cluster
Enable and start the ZooKeeper service on all three nodes. Start them sequentially to allow proper leader election.
sudo systemctl daemon-reload
sudo systemctl enable zookeeper
sudo systemctl start zookeeper
sudo systemctl status zookeeper
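Right after `systemctl start`, a node can take a few seconds before it answers probes. A generic retry loop sketches the wait; here the probe is stubbed with `true`, while a real probe would be `echo ruok | nc <host> 2181 | grep -q imok`:

```shell
# Retry a probe command up to N times before giving up.
# Usage: wait_healthy <attempts> <probe-command...>
wait_healthy() {
  local attempts=$1 i
  shift
  for ((i = 1; i <= attempts; i++)); do
    if "$@" >/dev/null 2>&1; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    sleep 1   # back off between probes
  done
  echo "gave up after $attempts attempts"
  return 1
}

wait_healthy 5 true   # → healthy after 1 attempt(s)
```

The same loop works for waiting on leader election across the whole ensemble before moving on to the ClickHouse steps.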
Install ClickHouse if not present
Install ClickHouse server on your database nodes. This builds upon our ClickHouse clustering tutorial.
sudo apt install -y apt-transport-https ca-certificates curl gnupg
curl -fsSL https://packages.clickhouse.com/CLICKHOUSE-KEY.GPG | sudo gpg --dearmor -o /usr/share/keyrings/clickhouse-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/clickhouse-keyring.gpg] https://packages.clickhouse.com/deb stable main" | sudo tee /etc/apt/sources.list.d/clickhouse.list
sudo apt update
sudo apt install -y clickhouse-server clickhouse-client
Configure ClickHouse ZooKeeper integration
Configure ClickHouse to use your ZooKeeper ensemble for replication coordination. Create /etc/clickhouse-server/config.d/zookeeper.xml:
<clickhouse>
    <zookeeper>
        <node>
            <host>203.0.113.10</host>
            <port>2181</port>
        </node>
        <node>
            <host>203.0.113.11</host>
            <port>2181</port>
        </node>
        <node>
            <host>203.0.113.12</host>
            <port>2181</port>
        </node>
        <session_timeout_ms>30000</session_timeout_ms>
        <operation_timeout_ms>10000</operation_timeout_ms>
        <root>/clickhouse</root>
        <identity>user:password</identity>
    </zookeeper>
    <distributed_ddl>
        <path>/clickhouse/task_queue/ddl</path>
    </distributed_ddl>
    <macros>
        <cluster>production_cluster</cluster>
        <shard>01</shard>
        <replica>replica_01</replica>
    </macros>
</clickhouse>
Give each ClickHouse node its own <replica> value (and the appropriate <shard>), since these macros determine the node's identity in ZooKeeper.
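The `{shard}` and `{replica}` placeholders used later in table definitions are filled in from each node's macros. A sketch of the substitution (`expand_macros` is illustrative, not a ClickHouse API):

```shell
# Substitute {shard} and {replica} macro placeholders into a
# ReplicatedMergeTree ZooKeeper path template.
expand_macros() {
  local template=$1 shard=$2 replica=$3
  local out=${template//\{shard\}/$shard}
  echo "${out//\{replica\}/$replica}"
}

expand_macros '/clickhouse/tables/{shard}/events' 01 replica_01
# → /clickhouse/tables/01/events
```

Because the path depends only on the shard, all replicas of one shard share the same ZooKeeper node and replicate each other, while different shards get distinct paths.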
Configure ClickHouse cluster settings
Define the cluster topology and replication settings for your ClickHouse nodes in /etc/clickhouse-server/config.d/cluster.xml. The layout below assumes two shards with two replicas each; adjust it to match your deployment:
<clickhouse>
    <remote_servers>
        <production_cluster>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>203.0.113.20</host>
                    <port>9000</port>
                    <user>default</user>
                </replica>
                <replica>
                    <host>203.0.113.21</host>
                    <port>9000</port>
                    <user>default</user>
                </replica>
            </shard>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>203.0.113.22</host>
                    <port>9000</port>
                    <user>default</user>
                </replica>
                <replica>
                    <host>203.0.113.23</host>
                    <port>9000</port>
                    <user>default</user>
                </replica>
            </shard>
        </production_cluster>
    </remote_servers>
</clickhouse>
Configure ZooKeeper authentication
Set up SASL (digest) authentication for secure ZooKeeper communication. Create /opt/zookeeper/conf/zookeeper_jaas.conf (the passwords below are samples; replace them for production):
Server {
    org.apache.zookeeper.server.auth.DigestLoginModule required
    user_clickhouse="ch_password123"
    user_admin="admin_password456";
};
Then point the JVM at the JAAS file by adding this line to /opt/zookeeper/conf/java.env:
export JVMFLAGS="-Djava.security.auth.login.config=/opt/zookeeper/conf/zookeeper_jaas.conf"
Enable ZooKeeper ACLs
Configure access control lists to restrict ZooKeeper path access to authorized clients only. Append these settings to /opt/zookeeper/conf/zoo.cfg on every node:
# Security settings
requireClientAuthScheme=sasl
jaasLoginRenew=3600000
kerberos.removeHostFromPrincipal=true
kerberos.removeRealmFromPrincipal=true
Restart services with new configuration
Restart ZooKeeper and ClickHouse services to apply the authentication and clustering configuration.
sudo systemctl restart zookeeper
sudo systemctl restart clickhouse-server
sudo systemctl status zookeeper clickhouse-server
Create replicated database and tables
Test the replication setup by creating a replicated database and table structure.
clickhouse-client --multiquery --query="
CREATE DATABASE IF NOT EXISTS test_replication ON CLUSTER production_cluster;
CREATE TABLE test_replication.events ON CLUSTER production_cluster (
timestamp DateTime,
user_id UInt64,
event_type String,
properties String
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events', '{replica}')
ORDER BY (timestamp, user_id)
PARTITION BY toYYYYMM(timestamp);
CREATE TABLE test_replication.events_distributed ON CLUSTER production_cluster (
timestamp DateTime,
user_id UInt64,
event_type String,
properties String
) ENGINE = Distributed(production_cluster, test_replication, events, rand());
"
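The `rand()` argument to the Distributed engine is the sharding key: each row is routed to shard number key mod total shard weight (with equal weights, key mod shard count), so `rand()` spreads rows evenly. Illustratively:

```shell
# Route a sharding-key value to a shard index, assuming equal shard
# weights (key % shard_count). rand() as the key gives a uniform spread.
shard_for() { echo $(( $1 % $2 )); }

for key in 7 12 33 40; do
  echo "key $key -> shard $(shard_for "$key" 2)"
done
```

A deterministic key such as `user_id` would instead pin each user's rows to one shard, which helps co-locate data for per-user queries.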
Verify your setup
Check that ZooKeeper ensemble is healthy and ClickHouse replication is working properly.
# Check ZooKeeper cluster status
echo stat | nc 203.0.113.10 2181
echo mntr | nc 203.0.113.11 2181
# Verify ZooKeeper leader election
/opt/zookeeper/bin/zkServer.sh status
# Check ClickHouse cluster connectivity
clickhouse-client --query="SELECT * FROM system.clusters WHERE cluster='production_cluster'"
# Test replication by inserting data
clickhouse-client --query="INSERT INTO test_replication.events_distributed VALUES (now(), 1001, 'login', '{\"ip\": \"192.168.1.100\"}')"
# Verify data appears on all replicas
clickhouse-client --query="SELECT count() FROM test_replication.events"
# Check replication queue
clickhouse-client --query="SELECT * FROM system.replication_queue"
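The `mntr` output is tab-separated key/value pairs, which makes it easy to script. `zk_field` below is a small illustrative helper; a live check would pipe from `echo mntr | nc 203.0.113.10 2181` instead of the canned sample:

```shell
# Extract one field from mntr-style "key<TAB>value" output.
zk_field() { awk -F'\t' -v k="$1" '$1 == k {print $2}'; }

# Canned sample of mntr output for demonstration:
sample=$'zk_version\t3.8.4\nzk_server_state\tleader\nzk_followers\t2'

echo "$sample" | zk_field zk_server_state   # → leader
echo "$sample" | zk_field zk_followers      # → 2
```

Checking that exactly one node reports `leader` and the rest report `follower` is a quick scripted health test for the ensemble.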
Configure monitoring and alerting
Set up ZooKeeper monitoring
Configure monitoring endpoints and health checks for your ZooKeeper ensemble. This integrates well with Prometheus monitoring setups. Save the following script as /opt/zookeeper/bin/zk-health-check.sh:
#!/bin/bash
# ZooKeeper health check script
ZK_HOSTS="203.0.113.10:2181,203.0.113.11:2181,203.0.113.12:2181"
HEALTHY=0
for host in $(echo $ZK_HOSTS | tr "," "\n"); do
if echo ruok | nc $(echo $host | cut -d: -f1) $(echo $host | cut -d: -f2) | grep -q imok; then
echo "$host is healthy"
HEALTHY=$((HEALTHY + 1))
else
echo "$host is unhealthy"
fi
done
if [ $HEALTHY -ge 2 ]; then
echo "ZooKeeper ensemble is healthy ($HEALTHY/3 nodes)"
exit 0
else
echo "ZooKeeper ensemble is unhealthy ($HEALTHY/3 nodes)"
exit 1
fi
sudo chmod 755 /opt/zookeeper/bin/zk-health-check.sh
sudo /opt/zookeeper/bin/zk-health-check.sh
Configure log rotation
Set up log rotation to prevent disk space issues with ZooKeeper logs (ClickHouse rotates its own logs via the size and count settings in its config.xml). Create /etc/logrotate.d/zookeeper:
/var/log/zookeeper/*.log {
    daily
    rotate 30
    compress
    delaycompress
    missingok
    notifempty
    copytruncate
}
With copytruncate the log file is rotated in place, so no create directive or postrotate script is needed; a postrotate "reload" would actually trigger a full ZooKeeper restart (the unit's ExecReload runs zkServer.sh restart) and needlessly disturb the ensemble.
Test failover scenarios
Verify that your setup handles node failures gracefully by testing various failure scenarios.
Test ZooKeeper node failure
Simulate a ZooKeeper node failure and verify the ensemble remains operational.
# Stop one ZooKeeper node
sudo systemctl stop zookeeper
# From another node, verify cluster still works
echo stat | nc 203.0.113.11 2181
# Test ClickHouse operations still work
clickhouse-client --query="INSERT INTO test_replication.events_distributed VALUES (now(), 1002, 'test_failover', '{\"status\": \"during_zk_failure\"}')"
# Restart the failed node
sudo systemctl start zookeeper
# Verify it rejoins the ensemble
/opt/zookeeper/bin/zkServer.sh status
Test ClickHouse replica failure
Test ClickHouse replica failover and data consistency during node outages.
# Stop one ClickHouse replica
sudo systemctl stop clickhouse-server
# Insert data on remaining nodes
clickhouse-client --host 203.0.113.21 --query="INSERT INTO test_replication.events_distributed VALUES (now(), 1003, 'replica_test', '{\"during\": \"replica_failure\"}')"
# Restart the failed replica
sudo systemctl start clickhouse-server
# Verify data synchronization
sleep 10
clickhouse-client --query="SELECT count(), max(timestamp) FROM test_replication.events"
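To confirm convergence across all replicas rather than a single node, gather `SELECT count()` from each host and check the numbers match. `counts_agree` is an illustrative helper fed with sample numbers; in practice each argument would come from `clickhouse-client --host <ip> --query 'SELECT count() FROM test_replication.events'`:

```shell
# Report whether a set of per-replica row counts has converged.
counts_agree() {
  local first=$1 c
  for c in "$@"; do
    if [ "$c" != "$first" ]; then
      echo "out of sync"
      return 1
    fi
  done
  echo "in sync"
}

counts_agree 42 42 42 42   # → in sync
```

Transient mismatches right after a replica restart are normal; they should disappear once the replication queue drains.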
Security hardening
Configure SSL/TLS for ZooKeeper
Enable SSL encryption for ZooKeeper client and inter-server communication by appending these settings to /opt/zookeeper/conf/zoo.cfg (replace the "changeit" passwords for production):
# SSL Configuration
secureClientPort=2182
ssl.keyStore.location=/opt/zookeeper/conf/keystore.jks
ssl.keyStore.password=changeit
ssl.trustStore.location=/opt/zookeeper/conf/truststore.jks
ssl.trustStore.password=changeit
ssl.clientAuth=need
# Server-to-server SSL
sslQuorum=true
serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory
Create SSL certificates
Generate SSL certificates for secure ZooKeeper communication.
# Generate keystore
sudo keytool -genkeypair -alias zookeeper -keyalg RSA -keysize 2048 -keystore /opt/zookeeper/conf/keystore.jks -dname "CN=zk.example.com,O=Example,C=US" -storepass changeit -keypass changeit
# Export certificate
sudo keytool -exportcert -alias zookeeper -keystore /opt/zookeeper/conf/keystore.jks -file /tmp/zookeeper.crt -storepass changeit
# Create truststore
sudo keytool -importcert -alias zookeeper -keystore /opt/zookeeper/conf/truststore.jks -file /tmp/zookeeper.crt -storepass changeit -noprompt
# Set proper permissions
sudo chown zookeeper:zookeeper /opt/zookeeper/conf/*.jks
sudo chmod 600 /opt/zookeeper/conf/*.jks
Common issues
| Symptom | Cause | Fix |
|---|---|---|
| ZooKeeper won't start | Invalid myid or missing Java | Check /var/lib/zookeeper/myid and verify Java installation |
| ClickHouse can't connect to ZooKeeper | Firewall blocking ports or wrong ZK config | Verify ports 2181, 2888, 3888 are open and check ZK ensemble status |
| Replication queue growing | Network issues or ZooKeeper lag | Check system.replication_queue and ZooKeeper latency with echo mntr |
| Leader election fails | Clock drift or network partitions | Sync clocks with NTP and check network connectivity between all nodes |
| SSL handshake failures | Certificate mismatch or expired certs | Verify certificate validity and hostname matching with keytool -list |
| Authentication failures | Wrong JAAS config or password | Check /opt/zookeeper/conf/zookeeper_jaas.conf and restart ZooKeeper |
Next steps
Automated install script
Run this to automate the entire setup
#!/usr/bin/env bash
set -euo pipefail
# Colors for output
readonly RED='\033[0;31m'
readonly GREEN='\033[0;32m'
readonly YELLOW='\033[1;33m'
readonly NC='\033[0m'
# Configuration
readonly ZOOKEEPER_VERSION="3.8.4"
readonly ZOOKEEPER_URL="https://archive.apache.org/dist/zookeeper/zookeeper-${ZOOKEEPER_VERSION}/apache-zookeeper-${ZOOKEEPER_VERSION}-bin.tar.gz"
# Usage function
usage() {
echo "Usage: $0 <node-id> <node1-ip> <node2-ip> <node3-ip>"
echo "Example: $0 1 10.0.0.10 10.0.0.11 10.0.0.12"
echo "Node ID should be 1, 2, or 3"
exit 1
}
# Logging functions
log_info() { echo -e "${GREEN}$1${NC}"; }
log_warn() { echo -e "${YELLOW}$1${NC}"; }
log_error() { echo -e "${RED}$1${NC}"; }
# Cleanup function
cleanup() {
if [[ $? -ne 0 ]]; then
log_error "Installation failed. Cleaning up..."
systemctl stop zookeeper 2>/dev/null || true
systemctl disable zookeeper 2>/dev/null || true
rm -f /etc/systemd/system/zookeeper.service
rm -rf /opt/zookeeper /var/lib/zookeeper /var/log/zookeeper
userdel -r zookeeper 2>/dev/null || true
fi
}
trap cleanup ERR
# Validate arguments
if [[ $# -ne 4 ]]; then
usage
fi
NODE_ID=$1
NODE1_IP=$2
NODE2_IP=$3
NODE3_IP=$4
if [[ ! "$NODE_ID" =~ ^[1-3]$ ]]; then
log_error "Node ID must be 1, 2, or 3"
exit 1
fi
# Validate IP addresses
for ip in "$NODE1_IP" "$NODE2_IP" "$NODE3_IP"; do
if ! [[ "$ip" =~ ^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$ ]]; then
log_error "Invalid IP address: $ip"
exit 1
fi
done
# Check if running as root
if [[ $EUID -ne 0 ]]; then
log_error "This script must be run as root"
exit 1
fi
# Detect distribution
if [[ -f /etc/os-release ]]; then
. /etc/os-release
case "$ID" in
ubuntu|debian)
PKG_MGR="apt"
PKG_INSTALL="apt install -y"
PKG_UPDATE="apt update && apt upgrade -y"
FIREWALL_CMD="ufw"
;;
almalinux|rocky|centos|rhel|ol|fedora)
PKG_MGR="dnf"
PKG_INSTALL="dnf install -y"
PKG_UPDATE="dnf update -y"
FIREWALL_CMD="firewall-cmd"
;;
amzn)
PKG_MGR="yum"
PKG_INSTALL="yum install -y"
PKG_UPDATE="yum update -y"
FIREWALL_CMD="firewall-cmd"
;;
*)
log_error "Unsupported distribution: $ID"
exit 1
;;
esac
else
log_error "Cannot detect distribution"
exit 1
fi
log_info "[1/9] Updating system packages..."
eval "$PKG_UPDATE"  # eval so the '&&' in the apt variant runs as a shell operator, not a literal argument
log_info "[2/9] Installing Java Runtime Environment..."
if [[ "$PKG_MGR" == "apt" ]]; then
$PKG_INSTALL openjdk-17-jre-headless wget
else
$PKG_INSTALL java-17-openjdk-headless wget
fi
log_info "[3/9] Creating ZooKeeper user and directories..."
useradd -r -s /bin/false zookeeper || true
mkdir -p /opt/zookeeper /var/lib/zookeeper /var/log/zookeeper
chown -R zookeeper:zookeeper /opt/zookeeper /var/lib/zookeeper /var/log/zookeeper
chmod 755 /opt/zookeeper /var/lib/zookeeper /var/log/zookeeper
log_info "[4/9] Downloading and installing ZooKeeper..."
cd /tmp
wget -q "$ZOOKEEPER_URL" -O apache-zookeeper-${ZOOKEEPER_VERSION}-bin.tar.gz
tar -xzf apache-zookeeper-${ZOOKEEPER_VERSION}-bin.tar.gz
mv apache-zookeeper-${ZOOKEEPER_VERSION}-bin/* /opt/zookeeper/
chown -R zookeeper:zookeeper /opt/zookeeper
chmod 755 /opt/zookeeper/bin/*
log_info "[5/9] Configuring ZooKeeper..."
cat > /opt/zookeeper/conf/zoo.cfg << EOF
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
maxClientCnxns=60
autopurge.snapRetainCount=3
autopurge.purgeInterval=24
admin.enableServer=true
admin.serverPort=8080
4lw.commands.whitelist=stat,ruok,mntr,srvr
server.1=${NODE1_IP}:2888:3888
server.2=${NODE2_IP}:2888:3888
server.3=${NODE3_IP}:2888:3888
EOF
chown zookeeper:zookeeper /opt/zookeeper/conf/zoo.cfg
chmod 644 /opt/zookeeper/conf/zoo.cfg
log_info "[6/9] Setting server ID..."
echo "$NODE_ID" > /var/lib/zookeeper/myid
chown zookeeper:zookeeper /var/lib/zookeeper/myid
chmod 644 /var/lib/zookeeper/myid
log_info "[7/9] Creating systemd service..."
cat > /etc/systemd/system/zookeeper.service << 'EOF'
[Unit]
Description=Apache ZooKeeper
Documentation=http://zookeeper.apache.org/
Requires=network.target remote-fs.target
After=network.target remote-fs.target
[Service]
Type=forking
User=zookeeper
Group=zookeeper
# zkServer.sh locates java via PATH; uncomment and adjust if your JVM lives elsewhere
# Environment=JAVA_HOME=/usr/lib/jvm/java-17-openjdk
ExecStart=/opt/zookeeper/bin/zkServer.sh start
ExecStop=/opt/zookeeper/bin/zkServer.sh stop
ExecReload=/opt/zookeeper/bin/zkServer.sh restart
WorkingDirectory=/var/lib/zookeeper
Restart=always
RestartSec=10
SuccessExitStatus=143
[Install]
WantedBy=multi-user.target
EOF
chmod 644 /etc/systemd/system/zookeeper.service
systemctl daemon-reload
log_info "[8/9] Configuring firewall..."
if [[ "$FIREWALL_CMD" == "ufw" ]]; then
# Allow SSH first so enabling the firewall cannot lock us out
ufw allow OpenSSH 2>/dev/null || true
ufw --force enable 2>/dev/null || true
ufw allow 2181/tcp comment "ZooKeeper client"
ufw allow 2888/tcp comment "ZooKeeper follower"
ufw allow 3888/tcp comment "ZooKeeper election"
ufw allow 8080/tcp comment "ZooKeeper admin"
else
systemctl enable firewalld 2>/dev/null || true
systemctl start firewalld 2>/dev/null || true
firewall-cmd --permanent --add-port=2181/tcp 2>/dev/null || true
firewall-cmd --permanent --add-port=2888/tcp 2>/dev/null || true
firewall-cmd --permanent --add-port=3888/tcp 2>/dev/null || true
firewall-cmd --permanent --add-port=8080/tcp 2>/dev/null || true
firewall-cmd --reload 2>/dev/null || true
fi
log_info "[9/9] Starting ZooKeeper service..."
systemctl enable zookeeper
systemctl start zookeeper
# Verification
sleep 5
log_info "Verifying ZooKeeper installation..."
if systemctl is-active --quiet zookeeper; then
log_info "✓ ZooKeeper service is running"
else
log_error "✗ ZooKeeper service is not running"
exit 1
fi
if netstat -tlnp 2>/dev/null | grep -q ":2181.*java" || ss -tlnp 2>/dev/null | grep -q ":2181.*java"; then
log_info "✓ ZooKeeper is listening on port 2181"
else
log_error "✗ ZooKeeper is not listening on port 2181"
exit 1
fi
log_info "ZooKeeper installation completed successfully!"
log_info "Node ID: $NODE_ID"
log_info "To check cluster status, run: /opt/zookeeper/bin/zkServer.sh status"
log_info "To connect with CLI, run: /opt/zookeeper/bin/zkCli.sh"
Review the script before running. Save it as install.sh and run it on each node with its own node ID, for example: sudo bash install.sh 1 203.0.113.10 203.0.113.11 203.0.113.12 (use node IDs 2 and 3 on the other servers).