Configure ZooKeeper for ClickHouse replication with multi-node cluster setup

Advanced · 45 min · Apr 01, 2026
Ubuntu 24.04 Debian 12 AlmaLinux 9 Rocky Linux 9

Set up a production-ready ZooKeeper ensemble to enable ClickHouse replication across multiple nodes. This tutorial covers ZooKeeper cluster configuration, ClickHouse integration, security hardening, and failover testing.

Prerequisites

  • Minimum 3 servers for ZooKeeper ensemble
  • At least 4GB RAM per ZooKeeper node
  • Network connectivity between all nodes
  • Root or sudo access
  • Basic understanding of distributed systems

What this solves

ClickHouse requires ZooKeeper for coordinating data replication across cluster nodes, managing distributed table metadata, and ensuring data consistency. This tutorial sets up a 3-node ZooKeeper ensemble with ClickHouse integration, enabling high availability and automatic failover for your analytics workloads.
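The three-server minimum is not arbitrary: ZooKeeper commits writes only while a majority (quorum) of the ensemble is reachable, so a 2-node ensemble tolerates zero failures while 3 nodes tolerate one. A quick illustration of how quorum size scales:

```shell
# Majority quorum: an N-node ensemble needs floor(N/2)+1 nodes up.
for n in 1 2 3 5; do
    q=$(( n / 2 + 1 ))
    echo "$n nodes -> quorum $q, tolerates $(( n - q )) failure(s)"
done
```

This is also why ensembles use odd sizes: going from 3 to 4 nodes raises the quorum to 3 without improving fault tolerance.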

Step-by-step configuration

Update system packages

Start by updating your package manager on all nodes to ensure you get the latest versions.

# Ubuntu 24.04 / Debian 12
sudo apt update && sudo apt upgrade -y

# AlmaLinux 9 / Rocky Linux 9
sudo dnf update -y

Install Java Runtime Environment

ZooKeeper requires Java 8 or later. Install OpenJDK which provides the necessary runtime environment.

# Ubuntu / Debian
sudo apt install -y openjdk-17-jre-headless

# AlmaLinux / Rocky
sudo dnf install -y java-17-openjdk-headless

Create ZooKeeper user and directories

Create a dedicated user for ZooKeeper and set up the required directory structure with proper ownership.

sudo useradd -r -s /bin/false zookeeper
sudo mkdir -p /opt/zookeeper /var/lib/zookeeper /var/log/zookeeper
sudo chown -R zookeeper:zookeeper /opt/zookeeper /var/lib/zookeeper /var/log/zookeeper

Download and install ZooKeeper

Download the latest stable ZooKeeper release and extract it to the installation directory.

cd /tmp
wget https://archive.apache.org/dist/zookeeper/zookeeper-3.8.4/apache-zookeeper-3.8.4-bin.tar.gz
tar -xzf apache-zookeeper-3.8.4-bin.tar.gz
sudo mv apache-zookeeper-3.8.4-bin/* /opt/zookeeper/
sudo chown -R zookeeper:zookeeper /opt/zookeeper
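Before extracting a release in production, it is worth verifying its checksum; Apache publishes a `.sha512` file alongside each tarball in the same archive directory. The snippet below demonstrates the `sha512sum -c` workflow on a stand-in file rather than the real download:

```shell
# Demonstration of checksum verification on a stand-in file; for the real
# tarball, download apache-zookeeper-3.8.4-bin.tar.gz.sha512 and run the
# same `sha512sum -c` against it before extracting.
printf 'demo' > /tmp/zk-demo.bin
sha512sum /tmp/zk-demo.bin > /tmp/zk-demo.bin.sha512
sha512sum -c /tmp/zk-demo.bin.sha512
```

A non-zero exit status from `sha512sum -c` means the file is corrupt or tampered with and should not be installed.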

Configure ZooKeeper ensemble

Create the main ZooKeeper configuration file at /opt/zookeeper/conf/zoo.cfg with cluster settings. Replace the IP addresses with your actual server IPs.

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
maxClientCnxns=60
autopurge.snapRetainCount=3
autopurge.purgeInterval=24
admin.enableServer=true
admin.serverPort=8080
4lw.commands.whitelist=stat, ruok, mntr, srvr

Cluster configuration

Append the ensemble member list to the same zoo.cfg on every node:

server.1=203.0.113.10:2888:3888
server.2=203.0.113.11:2888:3888
server.3=203.0.113.12:2888:3888
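Each server.N entry packs three fields: the host, the quorum (follower-to-leader) port, and the leader-election port. A shell sketch of how one entry decomposes:

```shell
# Decompose a server.N entry into host, quorum port (2888),
# and leader-election port (3888) using parameter expansion.
SPEC="203.0.113.10:2888:3888"
HOST=${SPEC%%:*}
REST=${SPEC#*:}
PEER_PORT=${REST%%:*}
ELECTION_PORT=${REST#*:}
echo "host=$HOST peer=$PEER_PORT election=$ELECTION_PORT"
```

Client traffic uses the separate clientPort (2181); 2888 and 3888 are only used between ensemble members.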

Set unique server IDs

Each ZooKeeper node needs a unique ID. Create the myid file on each server with its corresponding number.

echo "1" | sudo tee /var/lib/zookeeper/myid   # on server.1 only
echo "2" | sudo tee /var/lib/zookeeper/myid   # on server.2 only
echo "3" | sudo tee /var/lib/zookeeper/myid   # on server.3 only
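If you provision all nodes from one script, the myid can be derived from the node's position in the ensemble list instead of being typed by hand. A hypothetical helper, with MY_IP hard-coded for illustration (a real script would detect it, e.g. with `hostname -I`):

```shell
# Hypothetical helper: derive this node's myid from its position in the
# ensemble list, so one provisioning script works on every server.
ZK_SERVERS="203.0.113.10 203.0.113.11 203.0.113.12"
MY_IP="203.0.113.11"   # hard-coded for the example
ID=0
i=1
for ip in $ZK_SERVERS; do
    if [ "$ip" = "$MY_IP" ]; then
        ID=$i
    fi
    i=$(( i + 1 ))
done
echo "$ID"
# On a real node you would then run:
#   echo "$ID" | sudo tee /var/lib/zookeeper/myid
```

The ID must match the server.N index in zoo.cfg, or the node will fail to join the ensemble.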

Configure ZooKeeper logging

ZooKeeper 3.8 logs via logback. Edit /opt/zookeeper/conf/logback.xml so logs land in /var/log/zookeeper with size-based rotation. The configuration below follows the layout of the stock file shipped with 3.8:

<configuration>

    <property name="zookeeper.console.threshold" value="INFO" />
    <property name="zookeeper.log.dir" value="/var/log/zookeeper" />
    <property name="zookeeper.log.file" value="zookeeper.log" />
    <property name="zookeeper.log.threshold" value="INFO" />
    <property name="zookeeper.log.maxfilesize" value="256MB" />
    <property name="zookeeper.log.maxbackupindex" value="20" />

    <appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
        <encoder>
            <pattern>%d{ISO8601} [myid:%X{myid}] - %-5p [%t:%C{1}@%L] - %m%n</pattern>
        </encoder>
        <filter class="ch.qos.logback.classic.filter.ThresholdFilter">
            <level>${zookeeper.console.threshold}</level>
        </filter>
    </appender>

    <appender name="ROLLINGFILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
        <File>${zookeeper.log.dir}/${zookeeper.log.file}</File>
        <encoder>
            <pattern>%d{ISO8601} [myid:%X{myid}] - %-5p [%t:%C{1}@%L] - %m%n</pattern>
        </encoder>
        <rollingPolicy class="ch.qos.logback.core.rolling.FixedWindowRollingPolicy">
            <maxIndex>${zookeeper.log.maxbackupindex}</maxIndex>
            <FileNamePattern>${zookeeper.log.dir}/${zookeeper.log.file}.%i</FileNamePattern>
        </rollingPolicy>
        <triggeringPolicy class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">
            <MaxFileSize>${zookeeper.log.maxfilesize}</MaxFileSize>
        </triggeringPolicy>
    </appender>

    <root level="INFO">
        <appender-ref ref="CONSOLE" />
        <appender-ref ref="ROLLINGFILE" />
    </root>

</configuration>

Create ZooKeeper systemd service

Create /etc/systemd/system/zookeeper.service to manage ZooKeeper as a system service with automatic restart capabilities.

[Unit]
Description=Apache ZooKeeper
Documentation=https://zookeeper.apache.org/
Requires=network.target remote-fs.target
After=network.target remote-fs.target

[Service]
Type=forking
User=zookeeper
Group=zookeeper
Environment=ZOO_LOG_DIR=/var/log/zookeeper
ExecStart=/opt/zookeeper/bin/zkServer.sh start
ExecStop=/opt/zookeeper/bin/zkServer.sh stop
ExecReload=/opt/zookeeper/bin/zkServer.sh restart
WorkingDirectory=/var/lib/zookeeper
Restart=always
RestartSec=10
SuccessExitStatus=143

[Install]
WantedBy=multi-user.target

Configure firewall rules

Open the necessary ports for ZooKeeper cluster communication and client connections.

# Ubuntu / Debian (ufw)
sudo ufw allow 2181/tcp comment "ZooKeeper client"
sudo ufw allow 2888/tcp comment "ZooKeeper follower"
sudo ufw allow 3888/tcp comment "ZooKeeper election"
sudo ufw allow 8080/tcp comment "ZooKeeper admin"

# AlmaLinux / Rocky (firewalld has no per-port comment option)
sudo firewall-cmd --permanent --add-port=2181/tcp
sudo firewall-cmd --permanent --add-port=2888/tcp
sudo firewall-cmd --permanent --add-port=3888/tcp
sudo firewall-cmd --permanent --add-port=8080/tcp
sudo firewall-cmd --reload

Start ZooKeeper cluster

Enable and start the ZooKeeper service on all three nodes. Start them sequentially to allow proper leader election.

sudo systemctl daemon-reload
sudo systemctl enable zookeeper
sudo systemctl start zookeeper
sudo systemctl status zookeeper

Install ClickHouse if not present

Install ClickHouse server on your database nodes. This builds upon our ClickHouse clustering tutorial. Note that apt-key has been removed on Ubuntu 24.04, so the repository key goes into a dedicated keyring instead.

# Ubuntu / Debian
sudo gpg --no-default-keyring --keyring /usr/share/keyrings/clickhouse-keyring.gpg --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 8919F6BD2B48D754
echo "deb [signed-by=/usr/share/keyrings/clickhouse-keyring.gpg] https://packages.clickhouse.com/deb stable main" | sudo tee /etc/apt/sources.list.d/clickhouse.list
sudo apt update
sudo apt install -y clickhouse-server clickhouse-client

# AlmaLinux / Rocky
sudo dnf install -y dnf-plugins-core
sudo dnf config-manager --add-repo https://packages.clickhouse.com/rpm/clickhouse.repo
sudo dnf install -y clickhouse-server clickhouse-client

Configure ClickHouse ZooKeeper integration

Configure ClickHouse to use your ZooKeeper ensemble for replication coordination. Place the following in /etc/clickhouse-server/config.d/zookeeper.xml. The macros values must be unique per node; the ones below are for the first replica of the first shard.

<clickhouse>
    <zookeeper>
        <node>
            <host>203.0.113.10</host>
            <port>2181</port>
        </node>
        <node>
            <host>203.0.113.11</host>
            <port>2181</port>
        </node>
        <node>
            <host>203.0.113.12</host>
            <port>2181</port>
        </node>
        <session_timeout_ms>30000</session_timeout_ms>
        <operation_timeout_ms>10000</operation_timeout_ms>
        <root>/clickhouse</root>
        <identity>user:password</identity>
    </zookeeper>

    <distributed_ddl>
        <path>/clickhouse/task_queue/ddl</path>
    </distributed_ddl>

    <macros>
        <cluster>production_cluster</cluster>
        <shard>01</shard>
        <replica>replica_01</replica>
    </macros>
</clickhouse>
Configure ClickHouse cluster settings

Define the cluster topology and replication settings for your ClickHouse nodes, e.g. in /etc/clickhouse-server/config.d/cluster.xml. The layout below describes two shards with two replicas each; internal_replication lets ReplicatedMergeTree handle replication instead of the Distributed engine writing to every replica.

<clickhouse>
    <remote_servers>
        <production_cluster>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>203.0.113.20</host>
                    <port>9000</port>
                    <user>default</user>
                    <password></password>
                </replica>
                <replica>
                    <host>203.0.113.21</host>
                    <port>9000</port>
                    <user>default</user>
                    <password></password>
                </replica>
            </shard>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>203.0.113.22</host>
                    <port>9000</port>
                    <user>default</user>
                    <password></password>
                </replica>
                <replica>
                    <host>203.0.113.23</host>
                    <port>9000</port>
                    <user>default</user>
                    <password></password>
                </replica>
            </shard>
        </production_cluster>
    </remote_servers>
</clickhouse>

Configure ZooKeeper authentication

Set up SASL authentication for secure ZooKeeper communication. Create /opt/zookeeper/conf/zookeeper_jaas.conf with the server-side credentials:

Server {
    org.apache.zookeeper.server.auth.DigestLoginModule required
    user_clickhouse="ch_password123"
    user_admin="admin_password456";
};

Then point the JVM at it by adding this line to /opt/zookeeper/conf/java.env:

export JVMFLAGS="-Djava.security.auth.login.config=/opt/zookeeper/conf/zookeeper_jaas.conf"

Enable ZooKeeper ACLs

Require SASL authentication so only authorized clients can access ZooKeeper paths. Append these settings to /opt/zookeeper/conf/zoo.cfg:

# Security settings
authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
requireClientAuthScheme=sasl
jaasLoginRenew=3600000
kerberos.removeHostFromPrincipal=true
kerberos.removeRealmFromPrincipal=true

Restart services with new configuration

Restart ZooKeeper and ClickHouse services to apply the authentication and clustering configuration.

sudo systemctl restart zookeeper
sudo systemctl restart clickhouse-server
sudo systemctl status zookeeper clickhouse-server

Create replicated database and tables

Test the replication setup by creating a replicated database and table structure.

clickhouse-client --multiquery --query="
CREATE DATABASE IF NOT EXISTS test_replication ON CLUSTER production_cluster;

CREATE TABLE test_replication.events ON CLUSTER production_cluster (
    timestamp DateTime,
    user_id UInt64,
    event_type String,
    properties String
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events', '{replica}')
ORDER BY (timestamp, user_id)
PARTITION BY toYYYYMM(timestamp);

CREATE TABLE test_replication.events_distributed ON CLUSTER production_cluster (
    timestamp DateTime,
    user_id UInt64,
    event_type String,
    properties String
) ENGINE = Distributed(production_cluster, test_replication, events, rand());
"
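The {shard} and {replica} placeholders in the ReplicatedMergeTree path are filled from each node's <macros> section, giving every shard a unique ZooKeeper znode path and every replica a unique name under it. A shell illustration of the substitution:

```shell
# Illustrative macro expansion: ClickHouse substitutes {shard} from its
# <macros> config, yielding one coordination path per shard.
SHARD="01"
REPLICA="replica_01"
TEMPLATE='/clickhouse/tables/{shard}/events'
ZK_PATH=$(printf '%s' "$TEMPLATE" | sed "s/{shard}/$SHARD/")
echo "znode path: $ZK_PATH, replica name: $REPLICA"
```

Because each node carries different macro values, the same ON CLUSTER statement creates correctly wired replicas everywhere.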

Verify your setup

Check that ZooKeeper ensemble is healthy and ClickHouse replication is working properly.

# Check ZooKeeper cluster status
echo stat | nc 203.0.113.10 2181
echo mntr | nc 203.0.113.11 2181
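The mntr output is a list of key/value metrics, and a few of them (notably zk_server_state) are worth extracting in scripts. A sketch using a hard-coded, abridged sample; in production you would pipe `echo mntr | nc 203.0.113.10 2181` instead:

```shell
# Extract zk_server_state from mntr-style output. MNTR_OUTPUT is a
# hard-coded sample for illustration.
MNTR_OUTPUT='zk_version 3.8.4
zk_server_state leader
zk_outstanding_requests 0
zk_znode_count 512'
STATE=$(printf '%s\n' "$MNTR_OUTPUT" | awk '$1 == "zk_server_state" { print $2 }')
echo "server state: $STATE"
```

In a healthy 3-node ensemble, exactly one node reports leader and the other two report follower.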

Verify ZooKeeper leader election

/opt/zookeeper/bin/zkServer.sh status

Check ClickHouse cluster connectivity

clickhouse-client --query="SELECT * FROM system.clusters WHERE cluster='production_cluster'"

Test replication by inserting data

clickhouse-client --query="INSERT INTO test_replication.events_distributed VALUES (now(), 1001, 'login', '{\"ip\": \"192.168.1.100\"}')"

Verify data appears on all replicas

clickhouse-client --query="SELECT count() FROM test_replication.events"

Check replication queue

clickhouse-client --query="SELECT * FROM system.replication_queue"

Configure monitoring and alerting

Set up ZooKeeper monitoring

Configure monitoring endpoints and health checks for your ZooKeeper ensemble. This integrates well with Prometheus monitoring setups.

#!/bin/bash
# ZooKeeper health check script — save as /opt/zookeeper/bin/zk-health-check.sh

ZK_HOSTS="203.0.113.10:2181,203.0.113.11:2181,203.0.113.12:2181"
HEALTHY=0

for host in $(echo $ZK_HOSTS | tr "," "\n"); do
    if echo ruok | nc $(echo $host | cut -d: -f1) $(echo $host | cut -d: -f2) | grep -q imok; then
        echo "$host is healthy"
        HEALTHY=$((HEALTHY + 1))
    else
        echo "$host is unhealthy"
    fi
done

if [ $HEALTHY -ge 2 ]; then
    echo "ZooKeeper ensemble is healthy ($HEALTHY/3 nodes)"
    exit 0
else
    echo "ZooKeeper ensemble is unhealthy ($HEALTHY/3 nodes)"
    exit 1
fi

Make the script executable and run it:

sudo chmod 755 /opt/zookeeper/bin/zk-health-check.sh
sudo /opt/zookeeper/bin/zk-health-check.sh

Configure log rotation

Set up log rotation to prevent disk space issues with ZooKeeper logs. Create /etc/logrotate.d/zookeeper:

/var/log/zookeeper/*.log {
    daily
    rotate 30
    compress
    delaycompress
    missingok
    notifempty
    copytruncate
}

The copytruncate directive lets logrotate rotate the file in place without signaling ZooKeeper, so no postrotate script is needed.

Test failover scenarios

Verify that your setup handles node failures gracefully by testing various failure scenarios.

Test ZooKeeper node failure

Simulate a ZooKeeper node failure and verify the ensemble remains operational.

# Stop one ZooKeeper node
sudo systemctl stop zookeeper

From another node, verify cluster still works

echo stat | nc 203.0.113.11 2181

Test ClickHouse operations still work

clickhouse-client --query="INSERT INTO test_replication.events_distributed VALUES (now(), 1002, 'test_failover', '{\"status\": \"during_zk_failure\"}')"

Restart the failed node

sudo systemctl start zookeeper

Verify it rejoins the ensemble

/opt/zookeeper/bin/zkServer.sh status

Test ClickHouse replica failure

Test ClickHouse replica failover and data consistency during node outages.

# Stop one ClickHouse replica
sudo systemctl stop clickhouse-server

Insert data on remaining nodes

clickhouse-client --host 203.0.113.21 --query="INSERT INTO test_replication.events_distributed VALUES (now(), 1003, 'replica_test', '{\"during\": \"replica_failure\"}')"

Restart the failed replica

sudo systemctl start clickhouse-server

Verify data synchronization

sleep 10
clickhouse-client --query="SELECT count(), max(timestamp) FROM test_replication.events"
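A fixed sleep can race against slow replication; polling until the replicas agree is more robust. A sketch with a stubbed query function (the real count_on would run clickhouse-client against each host):

```shell
# Poll until two replicas report the same row count instead of sleeping
# a fixed time. count_on() is a stub; the real version would run:
#   clickhouse-client --host "$1" --query "SELECT count() FROM test_replication.events"
count_on() { echo 42; }

for attempt in 1 2 3 4 5; do
    a=$(count_on 203.0.113.20)
    b=$(count_on 203.0.113.21)
    if [ "$a" = "$b" ]; then
        echo "replicas in sync at $a rows"
        break
    fi
    sleep 2
done
```

If the counts never converge, inspect system.replication_queue on the lagging replica.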

Security hardening

Configure SSL/TLS for ZooKeeper

Enable SSL encryption for ZooKeeper client and inter-server communication by appending these settings to /opt/zookeeper/conf/zoo.cfg.

# SSL Configuration
secureClientPort=2182
ssl.keyStore.location=/opt/zookeeper/conf/keystore.jks
ssl.keyStore.password=changeit
ssl.trustStore.location=/opt/zookeeper/conf/truststore.jks
ssl.trustStore.password=changeit
ssl.clientAuth=need

Server-to-server SSL

sslQuorum=true
serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory

Create SSL certificates

Generate SSL certificates for secure ZooKeeper communication.

# Generate keystore
sudo keytool -genkeypair -alias zookeeper -keyalg RSA -keysize 2048 -keystore /opt/zookeeper/conf/keystore.jks -dname "CN=zk.example.com,O=Example,C=US" -storepass changeit -keypass changeit

Export certificate

sudo keytool -exportcert -alias zookeeper -keystore /opt/zookeeper/conf/keystore.jks -file /tmp/zookeeper.crt -storepass changeit

Create truststore

sudo keytool -importcert -alias zookeeper -keystore /opt/zookeeper/conf/truststore.jks -file /tmp/zookeeper.crt -storepass changeit -noprompt

Set proper permissions

sudo chown zookeeper:zookeeper /opt/zookeeper/conf/*.jks
sudo chmod 600 /opt/zookeeper/conf/*.jks
Never use chmod 777. It gives every user on the system full access to your files. Instead, fix ownership with chown and use minimal permissions like 600 for SSL certificates.

Common issues

Symptom | Cause | Fix
ZooKeeper won't start | Invalid myid or missing Java | Check /var/lib/zookeeper/myid and verify the Java installation
ClickHouse can't connect to ZooKeeper | Firewall blocking ports or wrong ZK config | Verify ports 2181, 2888, and 3888 are open and check ZK ensemble status
Replication queue growing | Network issues or ZooKeeper lag | Check system.replication_queue and ZooKeeper latency with echo mntr
Leader election fails | Clock drift or network partitions | Sync clocks with NTP and check network connectivity between all nodes
SSL handshake failures | Certificate mismatch or expired certs | Verify certificate validity and hostname matching with keytool -list
Authentication failures | Wrong JAAS config or password | Check /opt/zookeeper/conf/zookeeper_jaas.conf and restart ZooKeeper

Next steps


#zookeeper #clickhouse #replication #clustering #high-availability

Need help?

Don't want to manage this yourself?

We handle infrastructure for businesses that depend on uptime. From initial setup to ongoing operations.

Talk to an engineer