Set up ClickHouse monitoring with Prometheus and Grafana dashboards

Level: Intermediate · Time: 45 min · Published: Apr 12, 2026
Applies to: Ubuntu 24.04, Debian 12, AlmaLinux 9, Rocky Linux 9

Configure comprehensive ClickHouse monitoring using Prometheus for metrics collection and Grafana for visualization. Set up system metrics, query performance monitoring, and alerting rules for production ClickHouse deployments.

Prerequisites

  • ClickHouse server installed and running
  • Root or sudo access
  • At least 4GB RAM available
  • Network connectivity for package installation

What this solves

In production, ClickHouse needs monitoring that tracks query performance, resource utilization, and system health. This tutorial sets up Prometheus to collect ClickHouse metrics, Grafana dashboards to visualize them, and alerting rules that flag problems before they turn into outages.

Step-by-step configuration

Install and configure Prometheus

Install Prometheus server to collect metrics from ClickHouse instances.

# Ubuntu / Debian
sudo apt update
sudo apt install -y prometheus
sudo systemctl enable prometheus

# AlmaLinux / Rocky Linux
sudo dnf install -y prometheus
sudo systemctl enable prometheus

Configure ClickHouse metrics endpoint

Enable the Prometheus metrics endpoint in ClickHouse configuration to expose internal metrics.



    
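<clickhouse>
    <!-- Assumed location: a drop-in file such as /etc/clickhouse-server/config.d/prometheus.xml -->
    <prometheus>
        <endpoint>/metrics</endpoint>
        <port>9363</port>
        <metrics>true</metrics>
        <events>true</events>
        <asynchronous_metrics>true</asynchronous_metrics>
    </prometheus>
</clickhouse>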
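
Once enabled, the endpoint serves plain-text samples in the Prometheus exposition format: one `name value` pair per line, with `#` lines carrying HELP and TYPE metadata. As a rough illustration of what a scraper reads, the Python sketch below parses such text. The sample payload and the function name `parse_metrics` are made up for this example, and labelled samples are not handled.

```python
def parse_metrics(text: str) -> dict[str, float]:
    """Parse unlabelled Prometheus exposition lines into {name: value}."""
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" metadata lines.
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        try:
            samples[parts[0]] = float(parts[1])
        except ValueError:
            # Ignore lines whose value is not numeric.
            continue
    return samples


# Hypothetical payload shaped like the output of the /metrics endpoint.
SAMPLE = """\
# HELP ClickHouseMetrics_Query Number of executing queries
# TYPE ClickHouseMetrics_Query gauge
ClickHouseMetrics_Query 3
# TYPE ClickHouseProfileEvents_Query counter
ClickHouseProfileEvents_Query 1520
"""

metrics = parse_metrics(SAMPLE)
print(metrics["ClickHouseMetrics_Query"])
```

The metric prefixes show which of the three switches produced a sample: `ClickHouseMetrics_` for current metrics, `ClickHouseProfileEvents_` for event counters, and `ClickHouseAsyncMetrics_` for asynchronous metrics.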
    

Configure Prometheus scrape configuration

Add ClickHouse targets to Prometheus configuration for automatic metrics collection.

global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "/etc/prometheus/rules/*.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - localhost:9093

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'clickhouse'
    static_configs:
      - targets: ['localhost:9363']
    scrape_interval: 10s
    metrics_path: /metrics
    params:
      format: ['prometheus']

  - job_name: 'clickhouse-system'
    static_configs:
      - targets: ['localhost:8123']
    scrape_interval: 30s
    metrics_path: /
    params:
      query: ['SELECT metric AS name, value FROM system.metrics FORMAT Prometheus']
    basic_auth:
      username: monitoring
      password: secure_password_123
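
To make the two ClickHouse jobs more concrete, the Python sketch below builds the URLs that a scraper would end up requesting from `targets`, `metrics_path`, and `params`. The helper name `scrape_url` is hypothetical, and the SQL text aliases `metric` to `name` because ClickHouse's Prometheus output format expects columns called `name` and `value`.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def scrape_url(target, metrics_path="/metrics", params=None):
    """Build the URL a scraper would request for one static target."""
    query = urlencode(params or {}, doseq=True)
    url = f"http://{target}{metrics_path}"
    return f"{url}?{query}" if query else url


# Illustrative query for the HTTP interface on port 8123.
SQL = "SELECT metric AS name, value FROM system.metrics FORMAT Prometheus"

endpoint_url = scrape_url("localhost:9363")
http_url = scrape_url("localhost:8123", "/", {"query": [SQL]})

print(endpoint_url)
print(http_url)

# Decoding the query string gives back the original SQL text.
decoded = parse_qs(urlsplit(http_url).query)["query"][0]
print(decoded == SQL)
```

The second URL is the one that needs `basic_auth`, since it goes through the regular ClickHouse HTTP interface rather than the dedicated metrics port.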

Create ClickHouse monitoring user

Create a dedicated user for Prometheus to query ClickHouse system tables securely.

clickhouse-client --query "CREATE USER monitoring IDENTIFIED BY 'secure_password_123'"
clickhouse-client --query "GRANT SELECT ON system.* TO monitoring"
clickhouse-client --query "GRANT SELECT ON INFORMATION_SCHEMA.* TO monitoring"

Install and configure Grafana

Install Grafana for creating dashboards and visualization of ClickHouse metrics.

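# Ubuntu / Debian (apt-key is deprecated, so the key goes into a dedicated keyring)
sudo mkdir -p /etc/apt/keyrings
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt update
sudo apt install -y grafana
sudo systemctl enable --now grafana-server

# AlmaLinux / Rocky Linux
sudo dnf install -y https://dl.grafana.com/oss/release/grafana-10.1.0-1.x86_64.rpm
sudo systemctl enable --now grafana-server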

Configure Grafana data source

Add Prometheus as a data source in Grafana for ClickHouse metrics visualization.

apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
    editable: true
    jsonData:
      timeInterval: "10s"
      queryTimeout: "60s"

Create ClickHouse system metrics dashboard

Configure a comprehensive dashboard for ClickHouse system monitoring and performance metrics.

{
  "dashboard": {
    "id": null,
    "title": "ClickHouse System Metrics",
    "tags": ["clickhouse", "database"],
    "timezone": "browser",
    "panels": [
      {
        "id": 1,
        "title": "Query Rate",
        "type": "stat",
        "targets": [
          {
            "expr": "rate(ClickHouseProfileEvents_Query[5m])",
            "refId": "A"
          }
        ],
        "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0}
      },
      {
        "id": 2,
        "title": "Memory Usage",
        "type": "graph",
        "targets": [
          {
            "expr": "ClickHouseMetrics_MemoryTracking",
            "refId": "A"
          }
        ],
        "gridPos": {"h": 8, "w": 12, "x": 12, "y": 0}
      },
      {
        "id": 3,
        "title": "Active Connections",
        "type": "graph",
        "targets": [
          {
            "expr": "ClickHouseMetrics_HTTPConnection + ClickHouseMetrics_TCPConnection",
            "refId": "A"
          }
        ],
        "gridPos": {"h": 8, "w": 12, "x": 0, "y": 8}
      },
      {
        "id": 4,
        "title": "Disk Usage",
        "type": "graph",
        "targets": [
          {
            "expr": "ClickHouseAsyncMetrics_DiskTotal_default - ClickHouseAsyncMetrics_DiskAvailable_default",
            "refId": "A"
          }
        ],
        "gridPos": {"h": 8, "w": 12, "x": 12, "y": 8}
      }
    ],
    "time": {
      "from": "now-1h",
      "to": "now"
    },
    "refresh": "10s"
  }
}
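
The `gridPos` values above place four half-width panels on Grafana's 24-column grid, two per row. If more panels are added later, the positions can be computed instead of typed by hand. The Python sketch below is one way to do that; `grid_positions` and `build_panel` are hypothetical helpers, not part of any Grafana tooling.

```python
import json

GRID_COLUMNS = 24  # Grafana dashboards are laid out on a 24-column grid


def grid_positions(count, width=12, height=8):
    """Return gridPos dicts that fill rows left to right, top to bottom."""
    per_row = GRID_COLUMNS // width
    return [
        {
            "h": height,
            "w": width,
            "x": (i % per_row) * width,
            "y": (i // per_row) * height,
        }
        for i in range(count)
    ]


def build_panel(panel_id, title, expr, grid_pos, panel_type="graph"):
    """Assemble one panel in the same shape as the dashboard JSON."""
    return {
        "id": panel_id,
        "title": title,
        "type": panel_type,
        "targets": [{"expr": expr, "refId": "A"}],
        "gridPos": grid_pos,
    }


positions = grid_positions(4)
panel = build_panel(2, "Memory Usage", "ClickHouseMetrics_MemoryTracking", positions[1])
print(json.dumps(panel, indent=2))
```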

Configure query performance monitoring

Set up monitoring for ClickHouse query performance and slow query detection.



    
        
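<clickhouse>
    <users>
        <monitoring>
            <password>secure_password_123</password>
            <networks>
                <ip>::/0</ip>
            </networks>
            <profile>readonly</profile>
            <quota>default</quota>
            <allow_databases>
                <database>system</database>
            </allow_databases>
        </monitoring>
    </users>
    <profiles>
        <!-- The profile name and the names of the three settings enabled here (each set to 1) are missing from this listing. -->
    </profiles>
</clickhouse>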
        
    

Create alerting rules

Configure Prometheus alerting rules for ClickHouse health and performance monitoring.

groups:
  - name: clickhouse
    rules:
      - alert: ClickHouseDown
        expr: up{job="clickhouse"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "ClickHouse instance is down"
          description: "ClickHouse instance {{ $labels.instance }} has been down for more than 1 minute."

      - alert: ClickHouseHighMemoryUsage
        expr: ClickHouseMetrics_MemoryTracking > 8589934592  # 8GB
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "ClickHouse high memory usage"
          description: "ClickHouse instance {{ $labels.instance }} is using {{ humanize $value }} bytes of memory."

      - alert: ClickHouseSlowReads
        expr: rate(ClickHouseProfileEvents_SlowRead[5m]) > 0.1
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "ClickHouse slow reads detected"
          description: "ClickHouse instance {{ $labels.instance }} is reporting {{ $value }} slow file reads per second, which usually indicates an overloaded system."

      - alert: ClickHouseHighDiskUsage
        expr: ((ClickHouseAsyncMetrics_DiskTotal_default - ClickHouseAsyncMetrics_DiskAvailable_default) / ClickHouseAsyncMetrics_DiskTotal_default) * 100 > 85
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "ClickHouse high disk usage"
          description: "ClickHouse instance {{ $labels.instance }} disk usage is above 85%."

      - alert: ClickHouseReplicationLag
        expr: ClickHouseAsyncMetrics_ReplicasMaxQueueSize > 100
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "ClickHouse replication lag"
          description: "ClickHouse replica {{ $labels.instance }} has {{ $value }} items in replication queue."
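
The memory and disk thresholds in these rules are plain arithmetic, which is easy to check before changing them. The Python sketch below reproduces the 8 GB byte count and the disk-usage percentage; the function name and the sample byte counts are illustrative only.

```python
GIB = 1024 ** 3

# The memory alert compares MemoryTracking against 8 GiB expressed in bytes.
MEMORY_THRESHOLD_BYTES = 8 * GIB


def disk_usage_percent(total_bytes, available_bytes):
    """Same formula as the ClickHouseHighDiskUsage expression."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return (total_bytes - available_bytes) / total_bytes * 100


# Hypothetical disk: 500 GiB in total with 50 GiB still available.
usage = disk_usage_percent(500 * GIB, 50 * GIB)

print(MEMORY_THRESHOLD_BYTES)
print(usage, usage > 85)
```

For a different memory limit, multiply the number of GiB by `1024 ** 3` and put the result into the rule expression.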

Install and configure Alertmanager

Set up Alertmanager for handling alerts from Prometheus rules.

# Ubuntu / Debian
sudo apt install -y prometheus-alertmanager
sudo systemctl enable --now prometheus-alertmanager

# AlmaLinux / Rocky Linux
sudo dnf install -y alertmanager
sudo systemctl enable --now alertmanager

Configure Alertmanager notifications

Set up email notifications for ClickHouse alerts with proper routing and templates.

global:
  smtp_smarthost: 'mail.example.com:587'
  smtp_from: 'alerts@example.com'
  smtp_auth_username: 'alerts@example.com'
  smtp_auth_password: 'email_password_123'

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'web.hook'
  routes:
  - match:
      severity: critical
    receiver: 'critical-alerts'
  - match:
      severity: warning
    receiver: 'warning-alerts'

receivers:
  - name: 'web.hook'
    email_configs:
      - to: 'admin@example.com'
        headers:
          Subject: 'ClickHouse Alert: {{ .GroupLabels.alertname }}'
        text: |
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          {{ end }}
  - name: 'critical-alerts'
    email_configs:
      - to: 'critical@example.com'
        headers:
          Subject: 'CRITICAL: ClickHouse Alert'
        text: |
          {{ range .Alerts }}
          CRITICAL ALERT: {{ .Annotations.summary }}
          {{ .Annotations.description }}
          {{ end }}
  - name: 'warning-alerts'
    email_configs:
      - to: 'warnings@example.com'
        headers:
          Subject: 'WARNING: ClickHouse Alert'
        text: |
          {{ range .Alerts }}
          Warning: {{ .Annotations.summary }}
          {{ .Annotations.description }}
          {{ end }}
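
The routing tree decides which receiver gets an alert: Alertmanager checks the child routes in order, stops at the first one whose matchers all fit (unless `continue` is set), and falls back to the top-level receiver when nothing matches. The Python sketch below models only that selection logic for the routes defined here; it is a simplification and does not cover grouping, timing, or nested routes.

```python
DEFAULT_RECEIVER = "web.hook"

# (matchers, receiver) pairs in the same order as the `routes` list.
ROUTES = [
    ({"severity": "critical"}, "critical-alerts"),
    ({"severity": "warning"}, "warning-alerts"),
]


def pick_receiver(labels):
    """Return the receiver of the first route whose matchers all equal the alert labels."""
    for matchers, receiver in ROUTES:
        if all(labels.get(key) == value for key, value in matchers.items()):
            return receiver
    return DEFAULT_RECEIVER


print(pick_receiver({"alertname": "ClickHouseDown", "severity": "critical"}))
print(pick_receiver({"alertname": "ClickHouseHighDiskUsage", "severity": "warning"}))
print(pick_receiver({"alertname": "Custom", "severity": "info"}))
```

Since every rule in this tutorial carries either `severity: critical` or `severity: warning`, the default receiver only sees alerts that come from rules added later without one of those labels.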

Restart and enable services

Start all monitoring services and enable them to start on system boot.

sudo systemctl restart clickhouse-server
sudo systemctl restart prometheus
sudo systemctl restart grafana-server
sudo systemctl restart prometheus-alertmanager  # the unit is named alertmanager on AlmaLinux / Rocky Linux

sudo systemctl enable clickhouse-server prometheus grafana-server prometheus-alertmanager

Configure firewall rules

Open necessary ports for monitoring services while maintaining security.

# Ubuntu / Debian (ufw)
sudo ufw allow 3000/tcp  # Grafana
sudo ufw allow 9090/tcp  # Prometheus
sudo ufw allow 9363/tcp  # ClickHouse metrics
sudo ufw allow 9093/tcp  # Alertmanager
sudo ufw reload

# AlmaLinux / Rocky Linux (firewalld)
sudo firewall-cmd --permanent --add-port=3000/tcp  # Grafana
sudo firewall-cmd --permanent --add-port=9090/tcp  # Prometheus
sudo firewall-cmd --permanent --add-port=9363/tcp  # ClickHouse metrics
sudo firewall-cmd --permanent --add-port=9093/tcp  # Alertmanager
sudo firewall-cmd --reload

Verify your setup

Check that all monitoring components are running correctly and collecting metrics.

sudo systemctl status prometheus grafana-server clickhouse-server prometheus-alertmanager

Test ClickHouse metrics endpoint

curl -s http://localhost:9363/metrics | head -10

Check Prometheus targets

curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | select(.labels.job=="clickhouse") | .health'
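
The same check can be scripted. The Python sketch below filters a targets response by job name, reading the job from each target's `labels` object, and returns the health values. The payload is a trimmed, hypothetical example of what `/api/v1/targets` returns, and `target_health` is an illustrative helper name.

```python
import json


def target_health(payload, job):
    """Return the health string of every active target belonging to `job`."""
    targets = payload.get("data", {}).get("activeTargets", [])
    return [
        target.get("health", "unknown")
        for target in targets
        if target.get("labels", {}).get("job") == job
    ]


# Trimmed, made-up response body for illustration.
RESPONSE = json.loads("""
{
  "status": "success",
  "data": {
    "activeTargets": [
      {"labels": {"instance": "localhost:9090", "job": "prometheus"}, "health": "up"},
      {"labels": {"instance": "localhost:9363", "job": "clickhouse"}, "health": "up"},
      {"labels": {"instance": "localhost:8123", "job": "clickhouse-system"}, "health": "down"}
    ]
  }
}
""")

print(target_health(RESPONSE, "clickhouse"))
print(target_health(RESPONSE, "clickhouse-system"))
```

A target reported as `down` usually comes with a `lastError` field in the real response, which is the first place to look when a scrape fails.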

Test ClickHouse monitoring user

clickhouse-client --user monitoring --password secure_password_123 --query "SELECT count() FROM system.metrics"

Verify Grafana is accessible

curl -s http://localhost:3000/api/health
Note: Access Grafana at http://your-server:3000 with default credentials admin/admin. Change the password immediately after first login.

Configure advanced query monitoring

Set up query log analysis

Configure detailed query logging and analysis for performance monitoring.



    
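<clickhouse>
    <query_log>
        <database>system</database>
        <table>query_log</table>
        <partition_by>toYYYYMM(event_date)</partition_by>
        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
    </query_log>
    <query_thread_log>
        <database>system</database>
        <table>query_thread_log</table>
        <partition_by>toYYYYMM(event_date)</partition_by>
        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
    </query_thread_log>
</clickhouse>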
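
With query logging in place, slow queries can be listed straight from `system.query_log`. The Python sketch below only builds the SQL text, which could then be passed to `clickhouse-client --query`; the helper name `slow_query_sql` and the 1000 ms threshold are assumptions made for the example.

```python
def slow_query_sql(min_duration_ms, limit=20):
    """Build a query listing the slowest finished queries above a duration threshold."""
    if min_duration_ms < 0 or limit <= 0:
        raise ValueError("min_duration_ms must be >= 0 and limit must be > 0")
    return (
        "SELECT event_time, query_duration_ms, query "
        "FROM system.query_log "
        "WHERE type = 'QueryFinish' "
        f"AND query_duration_ms >= {int(min_duration_ms)} "
        "ORDER BY query_duration_ms DESC "
        f"LIMIT {int(limit)}"
    )


sql = slow_query_sql(1000, limit=10)
print(sql)
```

Because log entries are flushed every 7500 ms, a query that has just finished can take a few seconds to show up in the result.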

Create query performance dashboard

Add dashboard panels for detailed query performance analysis and slow query identification.

{
  "dashboard": {
    "id": null,
    "title": "ClickHouse Query Performance",
    "tags": ["clickhouse", "queries"],
    "panels": [
      {
        "id": 1,
        "title": "Average Query Duration",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(ClickHouseProfileEvents_QueryTimeMicroseconds[5m]) / rate(ClickHouseProfileEvents_Query[5m])",
            "refId": "A",
            "legendFormat": "mean duration (µs)"
          }
        ]
      },
      {
        "id": 2,
        "title": "Failed Queries Rate",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(ClickHouseProfileEvents_FailedQuery[5m])",
            "refId": "A"
          }
        ]
      }
    ]
  }
}
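
`ClickHouseProfileEvents_QueryTimeMicroseconds` is a counter that accumulates the total time spent on queries, so it has no buckets from which a percentile could be read. A mean duration can still be derived by dividing the growth of that counter by the growth of the query counter over the same window. The Python sketch below works through that arithmetic with made-up sample values; the function name is hypothetical.

```python
def mean_query_duration_ms(time_us_start, time_us_end, queries_start, queries_end):
    """Mean query duration in milliseconds between two counter samples.

    Returns None when no query was counted in the window.
    """
    finished = queries_end - queries_start
    if finished <= 0:
        return None
    return (time_us_end - time_us_start) / finished / 1000.0


# Hypothetical samples taken five minutes apart:
# total query time grew by 3,000,000 µs while 60 queries were counted.
mean_ms = mean_query_duration_ms(1_000_000, 4_000_000, 100, 160)
print(mean_ms)
```

In PromQL the same ratio is `rate(ClickHouseProfileEvents_QueryTimeMicroseconds[5m]) / rate(ClickHouseProfileEvents_Query[5m])`, which yields microseconds per query.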

Common issues

Symptom | Cause | Fix
Prometheus cannot scrape ClickHouse metrics | Metrics endpoint not enabled or firewall blocking | Check prometheus.xml config and firewall rules
Authentication failed for monitoring user | User not created or wrong password | Recreate monitoring user with correct permissions
Grafana dashboards show no data | Prometheus data source misconfigured | Verify Prometheus URL in Grafana data source settings
Alerts not firing | Alertmanager not connected to Prometheus | Check alertmanager configuration in prometheus.yml
High memory usage alerts | ClickHouse memory settings too high | Adjust max_memory_usage settings in ClickHouse config
Replication lag alerts | Network issues between replicas | Check network connectivity and replica status
