Set up a production-ready Apache Kafka cluster with SSL security, ZooKeeper ensemble, and comprehensive monitoring using JMX and Prometheus for high-throughput message streaming.
Prerequisites
- At least 3 servers with 4GB RAM each
- Root or sudo access
- Network connectivity between cluster nodes
- Basic understanding of distributed systems
What this solves
Apache Kafka is a distributed streaming platform that handles high-throughput, fault-tolerant message streaming for modern applications. This tutorial sets up a production-ready Kafka cluster with multiple brokers, ZooKeeper ensemble coordination, SSL/SASL security, and monitoring integration for enterprise workloads.
Step-by-step installation
Update system packages and install Java
Kafka needs a modern JDK to run; this guide uses OpenJDK 17. Update your system and install it along with a few utilities (netcat is used later to query ZooKeeper).
sudo apt update && sudo apt upgrade -y
sudo apt install -y openjdk-17-jdk wget curl net-tools netcat-openbsd
Verify Java installation and set JAVA_HOME:
java -version
export JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64
echo 'export JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64' >> ~/.bashrc
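Hard-coding the JVM path breaks across distributions and JDK upgrades. As a sketch, you can derive JAVA_HOME from the resolved `java` binary instead (the helper name `java_home_from_bin` is illustrative, not a standard tool):

```shell
#!/usr/bin/env bash
# Derive JAVA_HOME from the path of a `java` binary instead of
# hard-coding the distro-specific directory. Helper name is illustrative.
java_home_from_bin() {
  # /usr/lib/jvm/java-17-openjdk-amd64/bin/java -> /usr/lib/jvm/java-17-openjdk-amd64
  echo "${1%/bin/java}"
}

# In practice, resolve symlinks first:
#   export JAVA_HOME="$(java_home_from_bin "$(readlink -f "$(command -v java)")")"
java_home_from_bin /usr/lib/jvm/java-17-openjdk-amd64/bin/java
```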
Create Kafka user and directory structure
Create a dedicated user for Kafka services and set up the directory structure with proper permissions.
sudo useradd -r -m -s /bin/bash kafka
sudo mkdir -p /opt/kafka /var/log/kafka /var/lib/kafka/data /var/lib/zookeeper
sudo chown -R kafka:kafka /opt/kafka /var/log/kafka /var/lib/kafka /var/lib/zookeeper
Download and install Kafka
Download the Kafka 2.8.2 binary distribution and extract it to the installation directory. Apache moves older releases off the main download mirror, so fall back to archive.apache.org if the first URL returns 404.
cd /tmp
wget https://downloads.apache.org/kafka/2.8.2/kafka_2.13-2.8.2.tgz || wget https://archive.apache.org/dist/kafka/2.8.2/kafka_2.13-2.8.2.tgz
tar -xzf kafka_2.13-2.8.2.tgz
sudo mv kafka_2.13-2.8.2/* /opt/kafka/
sudo chown -R kafka:kafka /opt/kafka
sudo chmod -R 755 /opt/kafka
Configure ZooKeeper ensemble
Set up ZooKeeper configuration for a three-node ensemble. This provides fault tolerance and prevents split-brain scenarios.
dataDir=/var/lib/zookeeper
clientPort=2181
maxClientCnxns=0
tickTime=2000
initLimit=10
syncLimit=5
server.1=kafka-node1.example.com:2888:3888
server.2=kafka-node2.example.com:2888:3888
server.3=kafka-node3.example.com:2888:3888
4lw.commands.whitelist=*
autopurge.snapRetainCount=3
autopurge.purgeInterval=24
Create myid file for each ZooKeeper node (use 1 for first node, 2 for second, 3 for third):
echo "1" | sudo tee /var/lib/zookeeper/myid
sudo chown kafka:kafka /var/lib/zookeeper/myid
sudo chmod 644 /var/lib/zookeeper/myid
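If your hosts follow the kafka-nodeN naming convention used throughout this guide, the myid value can be derived from the hostname, so one provisioning step works on every node. A sketch (the helper name is my own, and it assumes the node number is the only digit run in the short hostname):

```shell
#!/usr/bin/env bash
# Derive the ZooKeeper myid from a kafka-nodeN style hostname.
# Assumes this guide's naming convention; helper name is illustrative.
myid_from_hostname() {
  local short="${1%%.*}"        # kafka-node3.example.com -> kafka-node3
  echo "${short//[!0-9]/}"      # keep only the digits    -> 3
}

myid_from_hostname "kafka-node3.example.com"
# On each node you would then run:
#   myid_from_hostname "$(hostname -f)" | sudo tee /var/lib/zookeeper/myid
```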
Configure Kafka broker settings
Configure the primary Kafka server properties for clustering, performance, and reliability.
broker.id=1
listeners=PLAINTEXT://0.0.0.0:9092,SSL://0.0.0.0:9093
advertised.listeners=PLAINTEXT://kafka-node1.example.com:9092,SSL://kafka-node1.example.com:9093
log.dirs=/var/lib/kafka/data
num.network.threads=8
num.io.threads=16
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
num.partitions=3
num.recovery.threads.per.data.dir=2
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=3
transaction.state.log.min.isr=2
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
zookeeper.connect=kafka-node1.example.com:2181,kafka-node2.example.com:2181,kafka-node3.example.com:2181
zookeeper.connection.timeout.ms=18000
group.initial.rebalance.delay.ms=0
min.insync.replicas=2
default.replication.factor=3
auto.create.topics.enable=false
delete.topic.enable=true
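Note that broker.id and the advertised hostnames differ on every node while the rest of the file is identical. One way to keep the three configs in sync is a shared template with placeholder tokens; the `__NODE_ID__`/`__NODE_HOST__` markers and paths below are my own convention, not anything Kafka defines:

```shell
#!/usr/bin/env bash
set -euo pipefail
# Stamp per-node values into a shared server.properties template.
# Placeholder tokens and file paths are illustrative.
NODE_ID=2
NODE_HOST="kafka-node2.example.com"

# Abbreviated template; in practice it would hold the full config above.
cat > /tmp/server.properties.template <<'EOF'
broker.id=__NODE_ID__
listeners=PLAINTEXT://0.0.0.0:9092,SSL://0.0.0.0:9093
advertised.listeners=PLAINTEXT://__NODE_HOST__:9092,SSL://__NODE_HOST__:9093
EOF

sed -e "s/__NODE_ID__/${NODE_ID}/" \
    -e "s/__NODE_HOST__/${NODE_HOST}/g" \
    /tmp/server.properties.template > /tmp/server.properties

cat /tmp/server.properties
```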
Configure SSL security
Set up SSL certificates for encrypted client-broker and inter-broker communication. The changeme passwords below are placeholders; use strong, unique secrets in production.
sudo mkdir -p /opt/kafka/ssl
cd /opt/kafka/ssl
sudo keytool -keystore kafka.server.keystore.jks -alias kafka-server -validity 3650 -genkey -keyalg RSA -storepass changeme -keypass changeme -dname "CN=kafka-node1.example.com,OU=Engineering,O=Example,L=City,S=State,C=US"
Create the certificate authority and truststore, then sign the broker certificate with the CA. Without the signing step the broker presents an unsigned certificate that clients trusting only the CA will reject during the handshake:
sudo openssl req -new -x509 -keyout ca-key -out ca-cert -days 3650 -passout pass:changeme -subj "/CN=Kafka-CA"
sudo keytool -keystore kafka.server.truststore.jks -alias CARoot -import -file ca-cert -storepass changeme -noprompt
sudo keytool -keystore kafka.server.keystore.jks -alias kafka-server -certreq -file server-csr -storepass changeme -keypass changeme
sudo openssl x509 -req -CA ca-cert -CAkey ca-key -in server-csr -out server-cert-signed -days 3650 -CAcreateserial -passin pass:changeme
sudo keytool -keystore kafka.server.keystore.jks -alias CARoot -import -file ca-cert -storepass changeme -noprompt
sudo keytool -keystore kafka.server.keystore.jks -alias kafka-server -import -file server-cert-signed -storepass changeme -keypass changeme -noprompt
Add SSL configuration to server.properties:
ssl.keystore.location=/opt/kafka/ssl/kafka.server.keystore.jks
ssl.keystore.password=changeme
ssl.key.password=changeme
ssl.truststore.location=/opt/kafka/ssl/kafka.server.truststore.jks
ssl.truststore.password=changeme
# require client certificates (mutual TLS)
ssl.client.auth=required
# empty value disables hostname verification; set to HTTPS once certificates carry correct hostnames or SANs
ssl.endpoint.identification.algorithm=
security.inter.broker.protocol=SSL
sudo chown -R kafka:kafka /opt/kafka/ssl
sudo chmod 640 /opt/kafka/ssl/*.jks
Create systemd service files
Set up systemd services for ZooKeeper and Kafka so both start automatically and are properly supervised. Create /etc/systemd/system/zookeeper.service with the following content:
[Unit]
Description=Apache ZooKeeper
Requires=network.target remote-fs.target
After=network.target remote-fs.target
[Service]
Type=forking
User=kafka
Group=kafka
Environment=JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64
ExecStart=/opt/kafka/bin/zookeeper-server-start.sh -daemon /opt/kafka/config/zookeeper.properties
ExecStop=/opt/kafka/bin/zookeeper-server-stop.sh
Restart=on-failure
RestartSec=5
StandardOutput=journal
StandardError=journal
SyslogIdentifier=zookeeper
[Install]
WantedBy=multi-user.target
Create /etc/systemd/system/kafka.service with the following content:
[Unit]
Description=Apache Kafka
Requires=zookeeper.service
After=zookeeper.service
[Service]
Type=forking
User=kafka
Group=kafka
Environment=JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64
Environment=JMX_PORT=9999
ExecStart=/opt/kafka/bin/kafka-server-start.sh -daemon /opt/kafka/config/server.properties
ExecStop=/opt/kafka/bin/kafka-server-stop.sh
Restart=on-failure
RestartSec=10
StandardOutput=journal
StandardError=journal
SyslogIdentifier=kafka
[Install]
WantedBy=multi-user.target
Configure firewall rules
Open the necessary ports for Kafka cluster communication and client access.
sudo ufw allow 2181/tcp comment 'ZooKeeper client'
sudo ufw allow 2888/tcp comment 'ZooKeeper peer'
sudo ufw allow 3888/tcp comment 'ZooKeeper leader election'
sudo ufw allow 9092/tcp comment 'Kafka PLAINTEXT'
sudo ufw allow 9093/tcp comment 'Kafka SSL'
sudo ufw allow 9999/tcp comment 'Kafka JMX'
sudo ufw allow 8080/tcp comment 'JMX Prometheus exporter'
Start and enable services
Start ZooKeeper first, then Kafka, and enable both services for automatic startup.
sudo systemctl daemon-reload
sudo systemctl enable zookeeper kafka
sudo systemctl start zookeeper
sleep 10
sudo systemctl start kafka
sudo systemctl status zookeeper kafka
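The fixed `sleep 10` is a guess at how long ZooKeeper takes to come up. A more reliable sketch is to poll the client port until it accepts connections; the function below uses bash's built-in /dev/tcp redirection (the function name is my own):

```shell
#!/usr/bin/env bash
# Poll a TCP port until it accepts connections, with a timeout in seconds.
# Uses bash's /dev/tcp, so no extra tools are required.
wait_for_port() {
  local host="$1" port="$2" timeout="${3:-30}" elapsed=0
  # the subshell opens (and implicitly closes) a throwaway connection
  until (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; do
    if (( elapsed >= timeout )); then
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
}

# Example: start Kafka only once ZooKeeper actually answers
#   sudo systemctl start zookeeper
#   wait_for_port localhost 2181 30 && sudo systemctl start kafka
```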
Configure JMX monitoring
Enable JMX metrics collection for monitoring Kafka performance and health. Create /opt/kafka/config/jmx-exporter.yml (the path the javaagent flag below points at) with the following rules:
rules:
  - pattern: kafka.server<type=(.+), name=(.+), clientId=(.+)><>Value
    name: kafka_server_$1_$2
    labels:
      clientId: "$3"
  - pattern: kafka.server<type=(.+), name=(.+)><>Value
    name: kafka_server_$1_$2
  - pattern: kafka.network<type=(.+), name=(.+)><>Value
    name: kafka_network_$1_$2
  - pattern: kafka.log<type=(.+), name=(.+)><>Value
    name: kafka_log_$1_$2
Download JMX Prometheus exporter:
cd /opt/kafka
sudo wget https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.19.0/jmx_prometheus_javaagent-0.19.0.jar
sudo chown kafka:kafka jmx_prometheus_javaagent-0.19.0.jar
Enable Prometheus metrics export
Configure Kafka to export metrics in Prometheus format for monitoring integration.
# In /opt/kafka/bin/kafka-run-class.sh, where KAFKA_JVM_PERFORMANCE_OPTS is defined, append the javaagent flag:
KAFKA_JVM_PERFORMANCE_OPTS="-server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -XX:MaxInlineLevel=15 -Djava.awt.headless=true -javaagent:/opt/kafka/jmx_prometheus_javaagent-0.19.0.jar=8080:/opt/kafka/config/jmx-exporter.yml"
Restart Kafka to apply JMX configuration:
sudo systemctl restart kafka
sudo systemctl status kafka
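Prometheus still needs a scrape job pointing at the exporter port (8080) on each broker. A minimal sketch follows; the job name, output path, and the assumption that you merge this into your prometheus.yml are mine:

```shell
#!/usr/bin/env bash
# Generate a minimal Prometheus scrape job for the JMX exporter endpoints.
# Hostnames, job name, and output path are illustrative.
cat > /tmp/kafka-scrape.yml <<'EOF'
scrape_configs:
  - job_name: kafka
    scrape_interval: 15s
    static_configs:
      - targets:
          - kafka-node1.example.com:8080
          - kafka-node2.example.com:8080
          - kafka-node3.example.com:8080
EOF

# Merge this into prometheus.yml (or drop it into a config directory
# if your Prometheus setup loads one).
grep -c ':8080' /tmp/kafka-scrape.yml
```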
Performance tuning configuration
Apply production-ready performance tuning for high-throughput scenarios.
# Network and I/O tuning
num.network.threads=16
num.io.threads=32
queued.max.requests=16000
fetch.purgatory.purge.interval.requests=1000
producer.purgatory.purge.interval.requests=1000
# Log and memory settings
log.flush.interval.messages=10000
log.flush.interval.ms=1000
replica.socket.receive.buffer.bytes=65536
# Compression and batching
compression.type=snappy
log.cleanup.policy=delete
log.cleaner.enable=true
Configure JVM heap settings:
# In /opt/kafka/bin/kafka-server-start.sh, set KAFKA_HEAP_OPTS:
export KAFKA_HEAP_OPTS="-Xmx4G -Xms4G"
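Kafka leans heavily on the OS page cache for log segments, so oversizing the heap usually hurts throughput. As a rough heuristic (my assumption, not official Kafka guidance): give the broker about half of system RAM, capped around 6 GiB, and leave the rest to the page cache:

```shell
#!/usr/bin/env bash
# Rough heap-sizing heuristic: half of system RAM, capped at 6144 MiB.
# The heuristic and the helper name are assumptions, not official guidance.
heap_mb_for_ram() {
  local ram_mb="$1"
  local heap=$(( ram_mb / 2 ))
  if (( heap > 6144 )); then
    heap=6144
  fi
  echo "$heap"
}

heap_mb_for_ram 8192     # 8 GiB machine
# Then, for example:
#   export KAFKA_HEAP_OPTS="-Xmx$(heap_mb_for_ram 8192)M -Xms$(heap_mb_for_ram 8192)M"
```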
Verify your setup
Test the Kafka cluster functionality and verify all components are working correctly.
# Check service status
sudo systemctl status zookeeper kafka
# Verify ZooKeeper ensemble
echo stat | nc localhost 2181
# Test topic creation
/opt/kafka/bin/kafka-topics.sh --create --topic test-topic --bootstrap-server localhost:9092 --partitions 3 --replication-factor 3
# List topics
/opt/kafka/bin/kafka-topics.sh --list --bootstrap-server localhost:9092
# Check JMX metrics endpoint
curl http://localhost:8080/metrics | head -20
# Test SSL connectivity
/opt/kafka/bin/kafka-topics.sh --list --bootstrap-server localhost:9093 --command-config /opt/kafka/config/ssl-client.properties
Create an SSL client configuration at /opt/kafka/config/ssl-client.properties for testing. Reusing the server keystore as the client identity works here only because it is signed by the same CA; real clients should get their own certificates:
security.protocol=SSL
ssl.truststore.location=/opt/kafka/ssl/kafka.server.truststore.jks
ssl.truststore.password=changeme
ssl.keystore.location=/opt/kafka/ssl/kafka.server.keystore.jks
ssl.keystore.password=changeme
ssl.key.password=changeme
Common issues
| Symptom | Cause | Fix |
|---|---|---|
| ZooKeeper won't start | Port 2181 already in use | Run sudo netstat -tulpn \| grep 2181 and kill the conflicting process |
| Kafka broker fails to start | Insufficient heap memory | Increase KAFKA_HEAP_OPTS in startup script |
| SSL handshake failures | Certificate hostname mismatch | Update certificate CN to match server hostname |
| Topic replication fails | Not enough brokers available | Ensure all 3 brokers are running and connected |
| High consumer lag | Insufficient partitions | Increase partitions: kafka-topics.sh --alter --topic <topic> --partitions 6 --bootstrap-server localhost:9092 |
| JMX metrics not available | JMX port conflict | Change JMX_PORT in service file to unused port |
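For the consumer-lag row above, `kafka-consumer-groups.sh --describe --group <group> --bootstrap-server localhost:9092` prints per-partition lag; summing the LAG column reduces that to a single number you can alert on. The sample output inlined below is hypothetical:

```shell
#!/usr/bin/env bash
# Sum the LAG column of `kafka-consumer-groups.sh --describe` output.
# sample_output stands in for the real command; its values are made up.
sample_output() {
cat <<'EOF'
GROUP  TOPIC       PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG
app    test-topic  0          100             150             50
app    test-topic  1          200             210             10
app    test-topic  2          300             300             0
EOF
}

# NR>1 skips the header row; $6 is the LAG column
total_lag=$(sample_output | awk 'NR>1 {sum += $6} END {print sum}')
echo "total lag: $total_lag"
```

Against a live cluster you would pipe the real describe command into the same awk one-liner.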
Next steps
- Install and configure Grafana with Prometheus for system monitoring to visualize Kafka metrics
- Install and configure NGINX with HTTP/3 and modern security headers for reverse proxy setup
- Configure Kafka Connect for database integration to stream data between Kafka and databases
- Setup Kafka Schema Registry with Avro for schema evolution and data governance
- Implement Kafka Streams processing applications for real-time data processing
Automated install script
Run this on each node to automate the setup. It takes the node ID, the node's hostname, and optionally the full ZooKeeper connection string as arguments.
#!/usr/bin/env bash
set -euo pipefail
# Production Kafka Cluster Installation Script
# Supports Ubuntu/Debian and RHEL/Fedora-based distributions
# Color definitions
readonly RED='\033[0;31m'
readonly GREEN='\033[0;32m'
readonly YELLOW='\033[1;33m'
readonly NC='\033[0m'
# Configuration
readonly KAFKA_VERSION="2.8.2"
readonly SCALA_VERSION="2.13"
readonly KAFKA_USER="kafka"
readonly KAFKA_HOME="/opt/kafka"
readonly LOG_DIR="/var/log/kafka"
readonly DATA_DIR="/var/lib/kafka/data"
readonly ZK_DATA_DIR="/var/lib/zookeeper"
# Usage message
usage() {
echo "Usage: $0 <node_id> <hostname> [zk_cluster_string]"
echo "Example: $0 1 kafka-node1.example.com kafka-node1:2181,kafka-node2:2181,kafka-node3:2181"
exit 1
}
# Logging functions
log_info() { echo -e "${GREEN}[INFO]${NC} $*"; }
log_warn() { echo -e "${YELLOW}[WARN]${NC} $*"; }
log_error() { echo -e "${RED}[ERROR]${NC} $*"; }
log_step() { echo -e "${GREEN}[$1]${NC} $2"; }
# Cleanup function for error handling
cleanup() {
local exit_code=$?
if [[ $exit_code -ne 0 ]]; then
log_error "Installation failed. Cleaning up..."
systemctl stop kafka 2>/dev/null || true
systemctl stop zookeeper 2>/dev/null || true
userdel -r $KAFKA_USER 2>/dev/null || true
rm -rf $KAFKA_HOME $LOG_DIR $DATA_DIR $ZK_DATA_DIR 2>/dev/null || true
fi
exit $exit_code
}
trap cleanup ERR
# Parse arguments
[[ $# -lt 2 ]] && usage
readonly NODE_ID="$1"
readonly HOSTNAME="$2"
readonly ZK_CLUSTER="${3:-$HOSTNAME:2181}"
# Validate node ID
[[ ! "$NODE_ID" =~ ^[1-9][0-9]*$ ]] && { log_error "Node ID must be a positive integer"; exit 1; }
# Check prerequisites
log_step "1/12" "Checking prerequisites..."
[[ $EUID -ne 0 ]] && { log_error "This script must be run as root"; exit 1; }
# Detect distribution
if [[ -f /etc/os-release ]]; then
. /etc/os-release
case "$ID" in
ubuntu|debian)
PKG_MGR="apt"
PKG_UPDATE="apt update && apt upgrade -y"
PKG_INSTALL="apt install -y"
JAVA_HOME="/usr/lib/jvm/java-17-openjdk-amd64"
;;
almalinux|rocky|centos|rhel|ol|fedora)
PKG_MGR="dnf"
PKG_UPDATE="dnf update -y"
PKG_INSTALL="dnf install -y"
JAVA_HOME="/usr/lib/jvm/java-17-openjdk"
;;
amzn)
PKG_MGR="yum"
PKG_UPDATE="yum update -y"
PKG_INSTALL="yum install -y"
JAVA_HOME="/usr/lib/jvm/java-17-openjdk"
;;
*)
log_error "Unsupported distribution: $ID"
exit 1
;;
esac
else
log_error "Cannot detect distribution"
exit 1
fi
log_info "Detected distribution: $ID using $PKG_MGR"
# Update system and install dependencies
log_step "2/12" "Updating system packages..."
eval $PKG_UPDATE
log_step "3/12" "Installing Java 17 and dependencies..."
if [[ "$PKG_MGR" == "apt" ]]; then
$PKG_INSTALL openjdk-17-jdk wget curl net-tools
else
$PKG_INSTALL java-17-openjdk-devel wget curl net-tools
fi
# Verify Java installation
java -version || { log_error "Java installation failed"; exit 1; }
export JAVA_HOME
echo "export JAVA_HOME=$JAVA_HOME" >> /etc/environment
# Create Kafka user and directories
log_step "4/12" "Creating Kafka user and directory structure..."
if ! id "$KAFKA_USER" &>/dev/null; then
useradd -r -m -s /bin/bash $KAFKA_USER
fi
mkdir -p $KAFKA_HOME $LOG_DIR $DATA_DIR $ZK_DATA_DIR
chown -R $KAFKA_USER:$KAFKA_USER $KAFKA_HOME $LOG_DIR $DATA_DIR $ZK_DATA_DIR
chmod -R 755 $KAFKA_HOME $LOG_DIR $DATA_DIR $ZK_DATA_DIR
# Download and install Kafka
log_step "5/12" "Downloading and installing Kafka..."
cd /tmp
wget -q "https://downloads.apache.org/kafka/${KAFKA_VERSION}/kafka_${SCALA_VERSION}-${KAFKA_VERSION}.tgz" || \
    wget -q "https://archive.apache.org/dist/kafka/${KAFKA_VERSION}/kafka_${SCALA_VERSION}-${KAFKA_VERSION}.tgz"
tar -xzf "kafka_${SCALA_VERSION}-${KAFKA_VERSION}.tgz"
mv "kafka_${SCALA_VERSION}-${KAFKA_VERSION}"/* $KAFKA_HOME/
chown -R $KAFKA_USER:$KAFKA_USER $KAFKA_HOME
chmod -R 755 $KAFKA_HOME
# Configure ZooKeeper
log_step "6/12" "Configuring ZooKeeper..."
cat > $KAFKA_HOME/config/zookeeper.properties << EOF
dataDir=$ZK_DATA_DIR
clientPort=2181
maxClientCnxns=0
tickTime=2000
initLimit=10
syncLimit=5
# NOTE: for a multi-node ensemble, append one server.N=host:2888:3888
# line per node here, matching the myid written below.
4lw.commands.whitelist=*
autopurge.snapRetainCount=3
autopurge.purgeInterval=24
EOF
echo "$NODE_ID" > $ZK_DATA_DIR/myid
chown $KAFKA_USER:$KAFKA_USER $ZK_DATA_DIR/myid
chmod 644 $ZK_DATA_DIR/myid
# Configure Kafka broker
log_step "7/12" "Configuring Kafka broker..."
cat > $KAFKA_HOME/config/server.properties << EOF
broker.id=$NODE_ID
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://$HOSTNAME:9092
log.dirs=$DATA_DIR
num.network.threads=8
num.io.threads=16
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
num.partitions=3
num.recovery.threads.per.data.dir=2
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=3
transaction.state.log.min.isr=2
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
zookeeper.connect=$ZK_CLUSTER
zookeeper.connection.timeout.ms=18000
group.initial.rebalance.delay.ms=0
min.insync.replicas=2
default.replication.factor=3
auto.create.topics.enable=false
delete.topic.enable=true
EOF
chown $KAFKA_USER:$KAFKA_USER $KAFKA_HOME/config/server.properties
chmod 644 $KAFKA_HOME/config/server.properties
# Create ZooKeeper systemd service
log_step "8/12" "Creating ZooKeeper service..."
cat > /etc/systemd/system/zookeeper.service << EOF
[Unit]
Description=Apache ZooKeeper server
Documentation=http://zookeeper.apache.org
Requires=network.target remote-fs.target
After=network.target remote-fs.target
[Service]
Type=simple
User=$KAFKA_USER
Group=$KAFKA_USER
Environment=JAVA_HOME=$JAVA_HOME
ExecStart=$KAFKA_HOME/bin/zookeeper-server-start.sh $KAFKA_HOME/config/zookeeper.properties
ExecStop=$KAFKA_HOME/bin/zookeeper-server-stop.sh
Restart=on-abnormal
[Install]
WantedBy=multi-user.target
EOF
# Create Kafka systemd service
log_step "9/12" "Creating Kafka service..."
cat > /etc/systemd/system/kafka.service << EOF
[Unit]
Description=Apache Kafka server
Documentation=http://kafka.apache.org/documentation.html
Requires=zookeeper.service
After=zookeeper.service
[Service]
Type=simple
User=$KAFKA_USER
Group=$KAFKA_USER
Environment=JAVA_HOME=$JAVA_HOME
ExecStart=$KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server.properties
ExecStop=$KAFKA_HOME/bin/kafka-server-stop.sh
Restart=on-abnormal
[Install]
WantedBy=multi-user.target
EOF
# Configure firewall
log_step "10/12" "Configuring firewall..."
if command -v ufw &> /dev/null; then
ufw allow 2181/tcp
ufw allow 2888/tcp
ufw allow 3888/tcp
ufw allow 9092/tcp
elif command -v firewall-cmd &> /dev/null; then
firewall-cmd --permanent --add-port=2181/tcp
firewall-cmd --permanent --add-port=2888/tcp
firewall-cmd --permanent --add-port=3888/tcp
firewall-cmd --permanent --add-port=9092/tcp
firewall-cmd --reload
fi
# Start services
log_step "11/12" "Starting services..."
systemctl daemon-reload
systemctl enable zookeeper kafka
systemctl start zookeeper
sleep 10
systemctl start kafka
# Verify installation
log_step "12/12" "Verifying installation..."
sleep 5
if systemctl is-active --quiet zookeeper && systemctl is-active --quiet kafka; then
log_info "ZooKeeper and Kafka services are running successfully"
# Test basic functionality
sudo -u $KAFKA_USER $KAFKA_HOME/bin/kafka-topics.sh --bootstrap-server localhost:9092 --list &>/dev/null && \
log_info "Kafka broker is responding to requests" || log_warn "Kafka broker may not be fully ready yet"
log_info "Installation completed successfully!"
log_info "Kafka Home: $KAFKA_HOME"
log_info "Data Directory: $DATA_DIR"
log_info "Log Directory: $LOG_DIR"
log_info "Node ID: $NODE_ID"
log_info "Services: systemctl {start|stop|status} {zookeeper|kafka}"
else
log_error "Service verification failed. Check logs with: journalctl -u kafka -u zookeeper"
exit 1
fi
Review the script before running. Execute on each node with its ID and hostname, for example: sudo bash install.sh 1 kafka-node1.example.com kafka-node1.example.com:2181,kafka-node2.example.com:2181,kafka-node3.example.com:2181
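When rolling the script out to all three nodes, it helps to build the ZooKeeper connection string once from a node list so every invocation stays consistent. A sketch (hostnames are the examples used throughout this guide):

```shell
#!/usr/bin/env bash
# Build the zk_cluster_string argument from a node list so all three
# install.sh invocations agree. Hostnames are examples.
nodes=(kafka-node1.example.com kafka-node2.example.com kafka-node3.example.com)

# append ":2181," to every entry, then strip the trailing comma
zk_cluster=$(printf '%s:2181,' "${nodes[@]}")
zk_cluster="${zk_cluster%,}"
echo "$zk_cluster"

# On node N you would then run (node 1 shown):
#   sudo bash install.sh 1 "${nodes[0]}" "$zk_cluster"
```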