Configure comprehensive ClickHouse monitoring using Prometheus for metrics collection and Grafana for visualization. Set up system metrics, query performance monitoring, and alerting rules for production ClickHouse deployments.
Prerequisites
- ClickHouse server installed and running
- Root or sudo access
- At least 4GB RAM available
- Network connectivity for package installation
What this solves
Production ClickHouse deployments need monitoring that tracks query performance, resource utilization, and system health. This tutorial sets up Prometheus to collect ClickHouse metrics, configures Grafana dashboards to visualize them, and defines alerting rules for proactive issue detection.
Step-by-step configuration
Install and configure Prometheus
Install Prometheus server to collect metrics from ClickHouse instances.
sudo apt update
sudo apt install -y prometheus
sudo systemctl enable prometheus
Configure ClickHouse metrics endpoint
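Enable the Prometheus metrics endpoint in the ClickHouse configuration to expose internal metrics, for example in /etc/clickhouse-server/config.d/prometheus_metrics.xml:
<clickhouse>
    <prometheus>
        <endpoint>/metrics</endpoint>
        <port>9363</port>
        <metrics>true</metrics>
        <events>true</events>
        <asynchronous_metrics>true</asynchronous_metrics>
    </prometheus>
</clickhouse>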
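To confirm that all three metric groups enabled above are actually exposed, a small helper can count the exported samples by prefix. This is a minimal sketch: the function name `count_clickhouse_metrics` is hypothetical, and it assumes the endpoint uses the `ClickHouseMetrics_`, `ClickHouseProfileEvents_`, and `ClickHouseAsyncMetrics_` prefixes that appear in the dashboard queries later in this guide.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads Prometheus text-format metrics on stdin and
# prints how many sample lines carry each ClickHouse prefix.
count_clickhouse_metrics() {
  awk '
    /^#/ { next }                          # skip HELP/TYPE comment lines
    /^ClickHouseMetrics_/       { m++ }
    /^ClickHouseProfileEvents_/ { e++ }
    /^ClickHouseAsyncMetrics_/  { a++ }
    END { printf "metrics=%d events=%d async=%d\n", m, e, a }
  '
}
```

After restarting ClickHouse, it could be used as `curl -s http://localhost:9363/metrics | count_clickhouse_metrics`. A zero in any column suggests that the corresponding group is not enabled.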
Configure Prometheus scrape configuration
Add ClickHouse targets to the Prometheus configuration in /etc/prometheus/prometheus.yml for automatic metrics collection. The clickhouse-system job queries system.metrics over the HTTP interface; the Prometheus output format requires the columns to be named name and value, so the query aliases metric accordingly.
global:
  scrape_interval: 15s
  evaluation_interval: 15s
rule_files:
  - "/etc/prometheus/rules/*.yml"
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'clickhouse'
    static_configs:
      - targets: ['localhost:9363']
    scrape_interval: 10s
    metrics_path: /metrics
    params:
      format: ['prometheus']
  - job_name: 'clickhouse-system'
    static_configs:
      - targets: ['localhost:8123']
    scrape_interval: 30s
    metrics_path: /
    params:
      query: ['SELECT metric AS name, value FROM system.metrics FORMAT Prometheus']
    basic_auth:
      username: monitoring
      password: secure_password_123
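When more than one ClickHouse instance needs to be scraped, the target list can be generated rather than typed by hand. The following is a sketch only: `clickhouse_scrape_job` is a hypothetical helper, the host names are placeholders, and it assumes every host exposes metrics on port 9363 at /metrics as configured earlier.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints a scrape job covering one or more ClickHouse hosts.
# Assumes each host serves metrics on port 9363 at /metrics.
clickhouse_scrape_job() {
  local port=9363 targets="" host
  for host in "$@"; do
    # Append the quoted host:port pair, comma-separated after the first entry.
    targets="${targets:+$targets, }'${host}:${port}'"
  done
  cat <<EOF
  - job_name: 'clickhouse'
    scrape_interval: 10s
    metrics_path: /metrics
    static_configs:
      - targets: [${targets}]
EOF
}
```

For example, `clickhouse_scrape_job ch-1 ch-2` prints a job whose targets list contains both hosts. After pasting the output under scrape_configs, `promtool check config /etc/prometheus/prometheus.yml` can be used to validate the file before restarting Prometheus.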
Create ClickHouse monitoring user
Create a dedicated user for Prometheus to query ClickHouse system tables securely.
clickhouse-client --query "CREATE USER monitoring IDENTIFIED BY 'secure_password_123'"
clickhouse-client --query "GRANT SELECT ON system.* TO monitoring"
clickhouse-client --query "GRANT SELECT ON INFORMATION_SCHEMA.* TO monitoring"
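The sample password above is a placeholder and should not be reused. One possible approach, sketched here under assumptions, is to generate a random password and build the CREATE USER statement from it. The helper name `monitoring_user_sql` is hypothetical; it escapes backslashes and single quotes so that an arbitrary password cannot break out of the SQL string literal.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints a CREATE USER statement for the given password.
# Backslashes and single quotes are escaped so the SQL literal stays intact.
monitoring_user_sql() {
  local escaped
  escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e "s/'/\\\\'/g")
  printf "CREATE USER IF NOT EXISTS monitoring IDENTIFIED BY '%s'\n" "$escaped"
}
```

A possible invocation is `clickhouse-client --query "$(monitoring_user_sql "$(openssl rand -base64 32)")"`. The same generated password then has to be placed in the basic_auth section of the Prometheus scrape configuration.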
Install and configure Grafana
Install Grafana for creating dashboards and visualization of ClickHouse metrics.
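sudo mkdir -p /etc/apt/keyrings
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list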
sudo apt update
sudo apt install -y grafana
sudo systemctl enable --now grafana-server
Configure Grafana data source
Add Prometheus as a data source in Grafana for ClickHouse metrics visualization, for example in /etc/grafana/provisioning/datasources/prometheus.yml:
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
    editable: true
    jsonData:
      timeInterval: "10s"
      queryTimeout: "60s"
Create ClickHouse system metrics dashboard
Configure a comprehensive dashboard for ClickHouse system monitoring and performance metrics.
{
"dashboard": {
"id": null,
"title": "ClickHouse System Metrics",
"tags": ["clickhouse", "database"],
"timezone": "browser",
"panels": [
{
"id": 1,
"title": "Query Rate",
"type": "stat",
"targets": [
{
"expr": "rate(ClickHouseProfileEvents_Query[5m])",
"refId": "A"
}
],
"gridPos": {"h": 8, "w": 12, "x": 0, "y": 0}
},
{
"id": 2,
"title": "Memory Usage",
"type": "graph",
"targets": [
{
"expr": "ClickHouseMetrics_MemoryTracking",
"refId": "A"
}
],
"gridPos": {"h": 8, "w": 12, "x": 12, "y": 0}
},
{
"id": 3,
"title": "Active Connections",
"type": "graph",
"targets": [
{
"expr": "ClickHouseMetrics_HTTPConnection + ClickHouseMetrics_TCPConnection",
"refId": "A"
}
],
"gridPos": {"h": 8, "w": 12, "x": 0, "y": 8}
},
{
"id": 4,
"title": "Disk Usage",
"type": "graph",
"targets": [
{
"expr": "ClickHouseAsyncMetrics_DiskTotal_default - ClickHouseAsyncMetrics_DiskAvailable_default",
"refId": "A"
}
],
"gridPos": {"h": 8, "w": 12, "x": 12, "y": 8}
}
],
"time": {
"from": "now-1h",
"to": "now"
},
"refresh": "10s"
}
}
Configure query performance monitoring
Set up monitoring for ClickHouse query performance and slow query detection.
secure_password_123
::/0
readonly
default
system
1
1
1
Create alerting rules
Configure Prometheus alerting rules for ClickHouse health and performance monitoring. Save them in a .yml file under /etc/prometheus/rules/ so that the rule_files pattern in prometheus.yml loads them. Note that the SlowRead event counts slow file reads rather than slow queries, and that ReplicasMaxQueueSize is an asynchronous metric.
groups:
  - name: clickhouse
    rules:
      - alert: ClickHouseDown
        expr: up{job="clickhouse"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "ClickHouse instance is down"
          description: "ClickHouse instance {{ $labels.instance }} has been down for more than 1 minute."
      - alert: ClickHouseHighMemoryUsage
        expr: ClickHouseMetrics_MemoryTracking > 8589934592 # 8 GiB
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "ClickHouse high memory usage"
          description: "ClickHouse instance {{ $labels.instance }} is using {{ humanize $value }} bytes of memory."
      - alert: ClickHouseSlowReads
        expr: rate(ClickHouseProfileEvents_SlowRead[5m]) > 0.1
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "ClickHouse slow reads detected"
          description: "ClickHouse instance {{ $labels.instance }} has {{ $value }} slow file reads per second, which may indicate system overload."
      - alert: ClickHouseHighDiskUsage
        expr: ((ClickHouseAsyncMetrics_DiskTotal_default - ClickHouseAsyncMetrics_DiskAvailable_default) / ClickHouseAsyncMetrics_DiskTotal_default) * 100 > 85
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "ClickHouse high disk usage"
          description: "ClickHouse instance {{ $labels.instance }} disk usage is above 85%."
      - alert: ClickHouseReplicationLag
        expr: ClickHouseAsyncMetrics_ReplicasMaxQueueSize > 100
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "ClickHouse replication lag"
          description: "ClickHouse replica {{ $labels.instance }} has {{ $value }} items in replication queue."
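The memory and disk thresholds in these rules are plain arithmetic, so they are easy to recompute for a different server size. The helpers below are a hypothetical sketch (the names `gib_to_bytes` and `disk_used_percent` do not come from any tool) showing how the 8 GiB byte value and the disk usage percentage are derived.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for deriving alert thresholds.

# Convert GiB to bytes (1 GiB = 1024^3 bytes).
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Integer disk usage percentage from total and available bytes,
# mirroring (total - available) / total * 100 in the disk usage rule.
disk_used_percent() {
  echo $(( ($1 - $2) * 100 / $1 ))
}
```

For instance, `gib_to_bytes 8` gives 8589934592, the value used in the memory rule, and `disk_used_percent 1000 100` gives 90, which would exceed the 85% threshold.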
Install and configure Alertmanager
Set up Alertmanager for handling alerts from Prometheus rules.
sudo apt install -y prometheus-alertmanager
sudo systemctl enable --now prometheus-alertmanager
Configure Alertmanager notifications
Set up email notifications for ClickHouse alerts with proper routing and templates.
global:
  smtp_smarthost: 'mail.example.com:587'
  smtp_from: 'alerts@example.com'
  smtp_auth_username: 'alerts@example.com'
  smtp_auth_password: 'email_password_123'
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'web.hook'
  routes:
    - match:
        severity: critical
      receiver: 'critical-alerts'
    - match:
        severity: warning
      receiver: 'warning-alerts'
receivers:
  - name: 'web.hook'
    email_configs:
      - to: 'admin@example.com'
        headers:
          Subject: 'ClickHouse Alert: {{ .GroupLabels.alertname }}'
        text: |
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          {{ end }}
  - name: 'critical-alerts'
    email_configs:
      - to: 'critical@example.com'
        headers:
          Subject: 'CRITICAL: ClickHouse Alert'
        text: |
          {{ range .Alerts }}
          CRITICAL ALERT: {{ .Annotations.summary }}
          {{ .Annotations.description }}
          {{ end }}
  - name: 'warning-alerts'
    email_configs:
      - to: 'warnings@example.com'
        headers:
          Subject: 'WARNING: ClickHouse Alert'
        text: |
          {{ range .Alerts }}
          Warning: {{ .Annotations.summary }}
          {{ .Annotations.description }}
          {{ end }}
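The routing in this configuration can be summarized as a lookup on the severity label, with the top-level receiver as the fallback. The function below is a deliberately simplified model of that behavior, not how Alertmanager is implemented; the name `receiver_for_severity` is hypothetical, and it ignores grouping, timing, and every label other than severity.

```shell
#!/usr/bin/env bash
# Simplified, hypothetical model of the route tree above:
# the first matching child route wins, otherwise the default receiver is used.
receiver_for_severity() {
  case "$1" in
    critical) echo "critical-alerts" ;;
    warning)  echo "warning-alerts" ;;
    *)        echo "web.hook" ;;
  esac
}
```

With this model, the ClickHouseDown alert (severity critical) goes to critical-alerts, the disk usage alert (severity warning) goes to warning-alerts, and an alert carrying any other severity falls back to web.hook.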
Restart and enable services
Start all monitoring services and enable them to start on system boot.
sudo systemctl restart clickhouse-server
sudo systemctl restart prometheus
sudo systemctl restart grafana-server
sudo systemctl restart prometheus-alertmanager
sudo systemctl enable clickhouse-server prometheus grafana-server prometheus-alertmanager
Configure firewall rules
Open necessary ports for monitoring services while maintaining security.
sudo ufw allow 3000/tcp # Grafana
sudo ufw allow 9090/tcp # Prometheus
sudo ufw allow 9363/tcp # ClickHouse metrics
sudo ufw allow 9093/tcp # Alertmanager
sudo ufw reload
Verify your setup
Check that all monitoring components are running correctly and collecting metrics.
sudo systemctl status prometheus grafana-server clickhouse-server prometheus-alertmanager
Test ClickHouse metrics endpoint
curl -s http://localhost:9363/metrics | head -10
Check Prometheus targets
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | select(.labels.job=="clickhouse") | .health'
Test ClickHouse monitoring user
clickhouse-client --user monitoring --password secure_password_123 --query "SELECT count() FROM system.metrics"
Verify Grafana is accessible
curl -s http://localhost:3000/api/health
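Services can take a few seconds to come up after a restart, so a single check may fail even when the setup is correct. A small retry wrapper is one way to handle that. This is a sketch, and `retry` is a hypothetical helper rather than a standard command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry <attempts> <delay_seconds> <command...>
# Runs the command until it succeeds or the attempts are exhausted.
retry() {
  local attempts=$1 delay=$2 i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    i=$((i + 1))
    # Sleep only if another attempt follows.
    [ "$i" -le "$attempts" ] && sleep "$delay"
  done
  return 1
}
```

For example, `retry 10 2 curl -sf http://localhost:3000/api/health` would poll the Grafana health endpoint up to ten times, two seconds apart, and return a non-zero status if it never responds successfully.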
Configure advanced query monitoring
Set up query log analysis
Configure detailed query logging and analysis for performance monitoring.
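<clickhouse>
    <query_log>
        <database>system</database>
        <table>query_log</table>
        <partition_by>toYYYYMM(event_date)</partition_by>
        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
    </query_log>
    <query_thread_log>
        <database>system</database>
        <table>query_thread_log</table>
        <partition_by>toYYYYMM(event_date)</partition_by>
        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
    </query_thread_log>
</clickhouse>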
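Once query logging is active, the slowest recent queries can be read directly from system.query_log. The helper below is a sketch: `slow_queries_sql` is a hypothetical name, and the default threshold of 1000 ms and limit of 10 rows are arbitrary assumptions to adjust for your workload.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints a query listing today's slowest finished queries.
# Arguments: minimum duration in ms (default 1000), row limit (default 10).
slow_queries_sql() {
  local min_ms=${1:-1000} limit=${2:-10}
  cat <<EOF
SELECT event_time, query_duration_ms, read_rows, memory_usage, query
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_date = today()
  AND query_duration_ms >= ${min_ms}
ORDER BY query_duration_ms DESC
LIMIT ${limit}
EOF
}
```

It could be run with the monitoring user created earlier, for example `clickhouse-client --user monitoring --password secure_password_123 --query "$(slow_queries_sql 2000)"`.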
Create query performance dashboard
Add dashboard panels for detailed query performance analysis and slow query identification.
{
"dashboard": {
"id": null,
"title": "ClickHouse Query Performance",
"tags": ["clickhouse", "queries"],
"panels": [
{
"id": 1,
"title": "Average Query Duration (microseconds)",
"type": "graph",
"targets": [
{
"expr": "rate(ClickHouseProfileEvents_QueryTimeMicroseconds[5m]) / rate(ClickHouseProfileEvents_Query[5m])",
"refId": "A",
"legendFormat": "average"
}
]
},
{
"id": 2,
"title": "Failed Queries Rate",
"type": "graph",
"targets": [
{
"expr": "rate(ClickHouseProfileEvents_FailedQuery[5m])",
"refId": "A"
}
]
}
]
}
}
Common issues
| Symptom | Cause | Fix |
|---|---|---|
| Prometheus cannot scrape ClickHouse metrics | Metrics endpoint not enabled or firewall blocking | Check prometheus.xml config and firewall rules |
| Authentication failed for monitoring user | User not created or wrong password | Recreate monitoring user with correct permissions |
| Grafana dashboards show no data | Prometheus data source misconfigured | Verify Prometheus URL in Grafana data source settings |
| Alerts not firing | Alertmanager not connected to Prometheus | Check alertmanager configuration in prometheus.yml |
| High memory usage alerts | ClickHouse memory settings too high | Adjust max_memory_usage settings in ClickHouse config |
| Replication lag alerts | Network issues between replicas | Check network connectivity and replica status |
Next steps
- Configure ClickHouse users and RBAC for production environments
- Implement automated ClickHouse backups with S3 storage
- Set up ClickHouse and Kafka real-time data pipeline
- Configure ClickHouse high availability clustering with replication
- Optimize ClickHouse performance for large datasets with partitioning
Automated install script
Run this to automate the entire setup
#!/usr/bin/env bash
set -euo pipefail
# ClickHouse Monitoring Setup with Prometheus and Grafana
# Usage: ./clickhouse-monitoring-setup.sh [clickhouse_host] [monitoring_password]
# Color codes for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'
# Configuration variables
CLICKHOUSE_HOST="${1:-localhost}"
MONITORING_PASSWORD="${2:-$(openssl rand -base64 32)}"
PROMETHEUS_PORT="9090"
GRAFANA_PORT="3000"
CLICKHOUSE_METRICS_PORT="9363"
CLICKHOUSE_HTTP_PORT="8123"
usage() {
echo "Usage: $0 [clickhouse_host] [monitoring_password]"
echo " clickhouse_host: ClickHouse server hostname/IP (default: localhost)"
echo " monitoring_password: Password for monitoring user (default: auto-generated)"
exit 1
}
log() {
echo -e "${GREEN}[$(date +'%Y-%m-%d %H:%M:%S')] $1${NC}"
}
warn() {
echo -e "${YELLOW}[WARNING] $1${NC}"
}
error() {
echo -e "${RED}[ERROR] $1${NC}"
exit 1
}
cleanup() {
warn "Script failed. Cleaning up..."
systemctl stop prometheus grafana-server 2>/dev/null || true
}
trap cleanup ERR
# Detect OS and set package manager
detect_os() {
if [ ! -f /etc/os-release ]; then
error "Cannot detect OS. /etc/os-release not found."
fi
. /etc/os-release
case "$ID" in
ubuntu|debian)
PKG_MGR="apt"
PKG_UPDATE="apt update"
PKG_INSTALL="apt install -y"
PROMETHEUS_CONFIG="/etc/prometheus/prometheus.yml"
GRAFANA_CONFIG="/etc/grafana/grafana.ini"
;;
almalinux|rocky|centos|rhel|ol|fedora)
PKG_MGR="dnf"
PKG_UPDATE="dnf makecache"
PKG_INSTALL="dnf install -y"
PROMETHEUS_CONFIG="/etc/prometheus/prometheus.yml"
GRAFANA_CONFIG="/etc/grafana/grafana.ini"
;;
amzn)
PKG_MGR="yum"
PKG_UPDATE="yum makecache"
PKG_INSTALL="yum install -y"
PROMETHEUS_CONFIG="/etc/prometheus/prometheus.yml"
GRAFANA_CONFIG="/etc/grafana/grafana.ini"
;;
*)
error "Unsupported distribution: $ID"
;;
esac
log "Detected OS: $ID using $PKG_MGR"
}
check_prerequisites() {
log "[1/8] Checking prerequisites..."
if [ "$EUID" -ne 0 ]; then
error "Please run as root or with sudo"
fi
command -v systemctl >/dev/null 2>&1 || error "systemd is required"
command -v openssl >/dev/null 2>&1 || error "openssl is required"
# Check if ClickHouse is accessible
if ! command -v clickhouse-client >/dev/null 2>&1; then
warn "clickhouse-client not found. Ensure ClickHouse is installed."
fi
}
install_prometheus() {
log "[2/8] Installing and configuring Prometheus..."
$PKG_UPDATE
case "$PKG_MGR" in
apt)
$PKG_INSTALL prometheus
;;
dnf|yum)
$PKG_INSTALL prometheus2
# Create symlink for consistency
[ -L /usr/bin/prometheus ] || ln -s /usr/bin/prometheus2 /usr/bin/prometheus 2>/dev/null || true
;;
esac
systemctl enable prometheus
}
configure_clickhouse_metrics() {
log "[3/8] Configuring ClickHouse metrics endpoint..."
CLICKHOUSE_CONFIG_DIR="/etc/clickhouse-server/config.d"
mkdir -p "$CLICKHOUSE_CONFIG_DIR"
cat > "$CLICKHOUSE_CONFIG_DIR/prometheus_metrics.xml" << EOF
<clickhouse>
<prometheus>
<endpoint>/metrics</endpoint>
<port>$CLICKHOUSE_METRICS_PORT</port>
<metrics>true</metrics>
<events>true</events>
<asynchronous_metrics>true</asynchronous_metrics>
</prometheus>
</clickhouse>
EOF
chown clickhouse:clickhouse "$CLICKHOUSE_CONFIG_DIR/prometheus_metrics.xml" 2>/dev/null || true
chmod 644 "$CLICKHOUSE_CONFIG_DIR/prometheus_metrics.xml"
# Restart ClickHouse if running
if systemctl is-active --quiet clickhouse-server; then
systemctl restart clickhouse-server
sleep 5
fi
}
configure_prometheus() {
log "[4/8] Configuring Prometheus scrape targets..."
mkdir -p /etc/prometheus/rules
chown prometheus:prometheus /etc/prometheus/rules 2>/dev/null || true
chmod 755 /etc/prometheus/rules
cat > "$PROMETHEUS_CONFIG" << EOF
global:
  scrape_interval: 15s
  evaluation_interval: 15s
rule_files:
  - "/etc/prometheus/rules/*.yml"
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:$PROMETHEUS_PORT']
  - job_name: 'clickhouse-metrics'
    static_configs:
      - targets: ['$CLICKHOUSE_HOST:$CLICKHOUSE_METRICS_PORT']
    scrape_interval: 10s
    metrics_path: /metrics
  - job_name: 'clickhouse-system'
    static_configs:
      - targets: ['$CLICKHOUSE_HOST:$CLICKHOUSE_HTTP_PORT']
    scrape_interval: 30s
    metrics_path: /
    params:
      query: ['SELECT metric AS name, value FROM system.metrics FORMAT Prometheus']
    basic_auth:
      username: monitoring
      password: '$MONITORING_PASSWORD'
EOF
chown prometheus:prometheus "$PROMETHEUS_CONFIG" 2>/dev/null || true
chmod 644 "$PROMETHEUS_CONFIG"
}
create_monitoring_user() {
log "[5/8] Creating ClickHouse monitoring user..."
if command -v clickhouse-client >/dev/null 2>&1; then
clickhouse-client --host "$CLICKHOUSE_HOST" --query "CREATE USER IF NOT EXISTS monitoring IDENTIFIED BY '$MONITORING_PASSWORD'" || warn "Failed to create monitoring user"
clickhouse-client --host "$CLICKHOUSE_HOST" --query "GRANT SELECT ON system.* TO monitoring" || warn "Failed to grant system permissions"
clickhouse-client --host "$CLICKHOUSE_HOST" --query "GRANT SELECT ON INFORMATION_SCHEMA.* TO monitoring" || warn "Failed to grant information schema permissions"
else
warn "ClickHouse client not available. Please manually create monitoring user."
fi
}
install_grafana() {
log "[6/8] Installing Grafana..."
case "$PKG_MGR" in
apt)
$PKG_INSTALL apt-transport-https software-properties-common wget
# Store the ASCII-armored key with an .asc extension so apt recognizes its format
wget -q -O /usr/share/keyrings/grafana.asc https://apt.grafana.com/gpg.key
echo "deb [signed-by=/usr/share/keyrings/grafana.asc] https://apt.grafana.com stable main" | tee /etc/apt/sources.list.d/grafana.list
apt update
$PKG_INSTALL grafana
;;
dnf)
cat > /etc/yum.repos.d/grafana.repo << 'EOF'
[grafana]
name=grafana
baseurl=https://rpm.grafana.com
repo_gpgcheck=1
enabled=1
gpgcheck=1
gpgkey=https://rpm.grafana.com/gpg.key
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt
EOF
$PKG_INSTALL grafana
;;
yum)
cat > /etc/yum.repos.d/grafana.repo << 'EOF'
[grafana]
name=grafana
baseurl=https://rpm.grafana.com
repo_gpgcheck=1
enabled=1
gpgcheck=1
gpgkey=https://rpm.grafana.com/gpg.key
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt
EOF
$PKG_INSTALL grafana
;;
esac
systemctl enable grafana-server
}
configure_grafana() {
log "[7/8] Configuring Grafana data source..."
mkdir -p /etc/grafana/provisioning/datasources
mkdir -p /etc/grafana/provisioning/dashboards
cat > /etc/grafana/provisioning/datasources/prometheus.yml << EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:$PROMETHEUS_PORT
    isDefault: true
    editable: true
    jsonData:
      timeInterval: "10s"
      queryTimeout: "60s"
EOF
chown -R grafana:grafana /etc/grafana/provisioning 2>/dev/null || true
chmod -R 644 /etc/grafana/provisioning/datasources/*.yml
chmod 755 /etc/grafana/provisioning/datasources /etc/grafana/provisioning/dashboards
}
start_services() {
log "[8/8] Starting services..."
systemctl start prometheus
systemctl start grafana-server
# Wait for services to start
sleep 10
# Configure firewall if available
if command -v ufw >/dev/null 2>&1 && ufw status | grep -q "Status: active"; then
ufw allow $PROMETHEUS_PORT/tcp
ufw allow $GRAFANA_PORT/tcp
ufw allow $CLICKHOUSE_METRICS_PORT/tcp
elif command -v firewall-cmd >/dev/null 2>&1 && systemctl is-active --quiet firewalld; then
firewall-cmd --permanent --add-port=$PROMETHEUS_PORT/tcp
firewall-cmd --permanent --add-port=$GRAFANA_PORT/tcp
firewall-cmd --permanent --add-port=$CLICKHOUSE_METRICS_PORT/tcp
firewall-cmd --reload
fi
}
verify_installation() {
log "Verifying installation..."
# Check if services are running
if systemctl is-active --quiet prometheus; then
log "✓ Prometheus is running"
else
error "✗ Prometheus is not running"
fi
if systemctl is-active --quiet grafana-server; then
log "✓ Grafana is running"
else
error "✗ Grafana is not running"
fi
# Test endpoints
if curl -s "http://localhost:$PROMETHEUS_PORT/api/v1/query?query=up" >/dev/null; then
log "✓ Prometheus API is responding"
else
warn "✗ Prometheus API is not responding"
fi
log "Installation completed successfully!"
log "Access URLs:"
log " Prometheus: http://localhost:$PROMETHEUS_PORT"
log " Grafana: http://localhost:$GRAFANA_PORT (admin/admin)"
log "ClickHouse monitoring user password: $MONITORING_PASSWORD"
}
main() {
detect_os
check_prerequisites
install_prometheus
configure_clickhouse_metrics
configure_prometheus
create_monitoring_user
install_grafana
configure_grafana
start_services
verify_installation
}
main "$@"
Review the script before running. Save it as clickhouse-monitoring-setup.sh and run it as root: sudo bash clickhouse-monitoring-setup.sh