Set up Thanos Receiver for remote write scalability with Prometheus integration

Level: Advanced | Time: 45 min | Published: Apr 08, 2026
Platforms: Ubuntu 24.04, Debian 12, AlmaLinux 9, Rocky Linux 9

Configure Thanos Receiver to handle high-volume remote write traffic from multiple Prometheus instances. This tutorial covers installation, multi-tenancy setup, and performance optimization for large-scale metrics ingestion.

Prerequisites

  • Prometheus server running
  • At least 4GB RAM available
  • 50GB+ disk space for metrics storage
  • Network access to object storage (optional)
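The RAM and disk requirements above can be checked from a shell before you start. The following is a minimal sketch, assuming a Linux host with /proc/meminfo and a POSIX df; the helper names are hypothetical and the thresholds are taken from the list above.

```shell
# Print total memory in MiB, read from a meminfo-style file (default: /proc/meminfo).
mem_total_mib() {
    local meminfo="${1:-/proc/meminfo}"
    awk '/^MemTotal:/ { print int($2 / 1024) }' "$meminfo"
}

# Print free disk space in GiB for the filesystem that holds the given path.
disk_free_gib() {
    local path="${1:-/}"
    df -Pk "$path" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Return 0 when a measured value meets the required minimum.
meets_minimum() {
    local have="$1" need="$2"
    [ "$have" -ge "$need" ]
}

# Example (not run here):
# meets_minimum "$(mem_total_mib)" 3800 || echo "less than 4 GB RAM available"
# meets_minimum "$(disk_free_gib /opt)" 50 || echo "less than 50 GB free disk space"
```

MemTotal excludes memory the kernel reserves for itself, so a 4 GB machine reports slightly under 4096 MiB; the example uses a lower threshold for that reason.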

What this solves

Thanos Receiver provides scalable remote write capabilities for Prometheus deployments, allowing you to centralize metrics from multiple Prometheus instances. It solves the problem of limited storage capacity and write throughput when dealing with high-volume metrics ingestion across distributed environments.

Step-by-step installation

Update system packages

Update the installed packages using the command for your distribution.

# Ubuntu / Debian
sudo apt update && sudo apt upgrade -y

# AlmaLinux / Rocky Linux
sudo dnf update -y
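If you script this step for several hosts, the distribution can be detected from /etc/os-release instead of picking the command by hand. This is a sketch only: pkg_update_cmd is a hypothetical helper, and its optional argument exists so the function can be exercised against a sample file.

```shell
# Print the update command that matches the distribution described in an os-release file.
pkg_update_cmd() {
    local os_release="${1:-/etc/os-release}"
    local id id_like
    # Source the file in subshells so its variables do not leak into the caller.
    id=$(. "$os_release" && echo "${ID:-}")
    id_like=$(. "$os_release" && echo "${ID_LIKE:-}")
    case " $id $id_like " in
        *" debian "*|*" ubuntu "*)
            echo "sudo apt update && sudo apt upgrade -y" ;;
        *" rhel "*|*" fedora "*|*" almalinux "*|*" rocky "*)
            echo "sudo dnf update -y" ;;
        *)
            echo "unsupported distribution: $id" >&2
            return 1 ;;
    esac
}

# Example (not run here): print the command for the current host.
# pkg_update_cmd
```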

Create Thanos user and directories

Create a dedicated system user for running Thanos components and set up necessary directories.

sudo useradd --system --no-create-home --shell /bin/false thanos
sudo mkdir -p /opt/thanos/{bin,data,config}
sudo mkdir -p /var/log/thanos
sudo chown -R thanos:thanos /opt/thanos /var/log/thanos

Download and install Thanos

Download the Thanos release binary and install it under /opt/thanos/bin. This example pins v0.34.1; substitute a newer release number if one is available.

cd /tmp
wget https://github.com/thanos-io/thanos/releases/download/v0.34.1/thanos-0.34.1.linux-amd64.tar.gz
tar -xzf thanos-0.34.1.linux-amd64.tar.gz
sudo cp thanos-0.34.1.linux-amd64/thanos /opt/thanos/bin/
sudo chown thanos:thanos /opt/thanos/bin/thanos
sudo chmod +x /opt/thanos/bin/thanos
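Before copying a downloaded binary into place, it is worth comparing the archive against a known SHA-256 digest. The sketch below assumes GNU coreutils (sha256sum); verify_sha256 is a hypothetical helper, and the expected digest is something you obtain yourself from the release page.

```shell
# Return 0 when the file's SHA-256 digest equals the expected hex string.
verify_sha256() {
    local file="$1" expected="$2" actual
    actual=$(sha256sum "$file" | awk '{ print $1 }')
    [ -n "$actual" ] && [ "$actual" = "$expected" ]
}

# Example (not run here); replace <expected-digest> with the published value.
# verify_sha256 /tmp/thanos-0.34.1.linux-amd64.tar.gz "<expected-digest>" \
#     || echo "checksum mismatch, do not install"
```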

Create Thanos Receiver configuration

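Define the multi-tenancy settings and ingestion limits for the receiver. Thanos Receiver reads its runtime settings from command-line flags rather than from a single configuration file, so treat this file as a reference for the intended values; the systemd unit later in this tutorial passes the tenant header, default tenant, TSDB path, and retention as flags.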

remote_write:
  # Enable multi-tenancy
  tenant_header: "X-Thanos-Tenant"
  default_tenant_id: "default"
  
  # Resource limits
  request_logging_config:
    decision:
      log_start: false
      log_end: false
  
  # Ingestion limits
  limits:
    write_timeout: 30s
    request_samples: 10000
    request_series: 1000
    tenant_samples_per_second: 10000
    tenant_series_per_second: 1000
    
  # Storage configuration
  storage:
    tsdb:
      path: "/opt/thanos/data/receive"
      retention: "15d"
      min_block_duration: "2h"
      max_block_duration: "2h"
      

# Object storage configuration (optional)
object_store:
  type: S3
  config:
    bucket: "thanos-storage"
    endpoint: "s3.amazonaws.com"
    region: "us-east-1"
    access_key: "your-access-key"
    secret_key: "your-secret-key"

Configure object storage for long-term retention

Create the object storage configuration at /opt/thanos/config/bucket.yml, the path that the systemd unit passes to --objstore.config-file.

type: S3
config:
  bucket: "thanos-metrics-storage"
  endpoint: "s3.amazonaws.com"
  region: "us-east-1"
  access_key: "AKIA..."
  secret_key: "your-secret-key"
  insecure: false
  signature_version2: false
  encrypt_sse: false
  put_user_metadata: {}
  http_config:
    idle_conn_timeout: 90s
    response_header_timeout: 2m
    insecure_skip_verify: false
  trace:
    enable: false
  part_size: 67108864

Set correct file permissions

Apply secure permissions to configuration files and data directories.

sudo chown thanos:thanos /opt/thanos/config/*.yml
sudo chmod 640 /opt/thanos/config/*.yml
sudo chmod 750 /opt/thanos/data

Note: never use chmod 777. It gives every user on the system full access to your files. Instead, fix ownership with chown and use minimal permissions.
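To confirm that the permissions were applied as intended, the mode bits can be read back with stat. This is a small sketch assuming GNU stat (as shipped on the distributions listed above); has_mode is a hypothetical helper.

```shell
# Return 0 when a path's octal permission bits equal the expected mode.
has_mode() {
    local path="$1" expected="$2"
    [ "$(stat -c '%a' "$path")" = "$expected" ]
}

# Example (not run here):
# has_mode /opt/thanos/data 750 || echo "unexpected mode on /opt/thanos/data"
```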

Create systemd service file

Create /etc/systemd/system/thanos-receiver.service to manage the Thanos Receiver process.

[Unit]
Description=Thanos Receiver
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
User=thanos
Group=thanos
ExecStart=/opt/thanos/bin/thanos receive \
    --grpc-address=0.0.0.0:10907 \
    --http-address=0.0.0.0:10909 \
    --remote-write.address=0.0.0.0:19291 \
    --tsdb.path=/opt/thanos/data/receive \
    --tsdb.retention=15d \
    --objstore.config-file=/opt/thanos/config/bucket.yml \
    --label=receive="true" \
    --label=replica="receiver-1" \
    --receive.replication-factor=1 \
    --receive.hashrings-file=/opt/thanos/config/hashring.json \
    --receive.local-endpoint=127.0.0.1:10907 \
    --receive.tenant-header=X-Thanos-Tenant \
    --receive.default-tenant-id=default \
    --log.level=info \
    --log.format=logfmt

Restart=on-failure
RestartSec=5
TimeoutStopSec=30
SyslogIdentifier=thanos-receiver
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target

Create hashring configuration for multi-tenancy

Create /opt/thanos/config/hashring.json, the file referenced by --receive.hashrings-file. It maps tenants to the receiver endpoints that store their data.

[
  {
    "hashring": "default",
    "tenants": ["default", "production", "tenant-a", "tenant-b"],
    "endpoints": [
      "127.0.0.1:10907"
    ],
    "algorithm": "ketama"
  }
]
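A hashring that lists tenants explicitly only accepts writes for those tenants, so every tenant ID sent in the X-Thanos-Tenant header has to appear in the list. The sketch below checks a file for a given tenant; tenant_in_hashring is a hypothetical helper that uses plain text matching rather than a JSON parser, so it is a quick sanity check and not a validator.

```shell
# Return 0 when the tenant ID appears inside a "tenants": [...] array of the hashring file.
tenant_in_hashring() {
    local file="$1" tenant="$2"
    # Flatten the file to one line, isolate each tenants array, then look for the quoted ID.
    tr -d '\n' < "$file" \
        | grep -o '"tenants"[[:space:]]*:[[:space:]]*\[[^]]*]' \
        | grep -qF "\"$tenant\""
}

# Example (not run here):
# tenant_in_hashring /opt/thanos/config/hashring.json production \
#     || echo "tenant 'production' is not listed in any hashring"
```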

Configure firewall rules

Open necessary ports for Thanos Receiver communication.

# Ubuntu / Debian (ufw)
sudo ufw allow 10907/tcp comment "Thanos Receiver gRPC"
sudo ufw allow 10909/tcp comment "Thanos Receiver HTTP"
sudo ufw allow 19291/tcp comment "Thanos Receiver Remote Write"
sudo ufw reload

# AlmaLinux / Rocky Linux (firewalld)
sudo firewall-cmd --permanent --add-port=10907/tcp
sudo firewall-cmd --permanent --add-port=10909/tcp
sudo firewall-cmd --permanent --add-port=19291/tcp
sudo firewall-cmd --reload

Enable and start Thanos Receiver

Start the Thanos Receiver service and enable it to start on boot.

sudo systemctl daemon-reload
sudo systemctl enable thanos-receiver
sudo systemctl start thanos-receiver
sudo systemctl status thanos-receiver
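The service can take a few seconds to become ready after systemctl start returns. The following sketch retries a command until it succeeds; wait_until is a hypothetical helper, and the example assumes the readiness endpoint that Thanos components serve on their HTTP port (10909 in this setup).

```shell
# Retry a command until it succeeds or the attempts run out.
# Usage: wait_until <attempts> <delay-seconds> <command...>
wait_until() {
    local attempts="$1" delay="$2" i=1
    shift 2
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0
        sleep "$delay"
        i=$((i + 1))
    done
    return 1
}

# Example (not run here): poll the receiver for up to about a minute.
# wait_until 30 2 curl -fsS -o /dev/null http://localhost:10909/-/ready \
#     || echo "receiver did not become ready"
```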

Configure Prometheus remote write integration

Configure Prometheus for remote write

Update your Prometheus configuration to send metrics to Thanos Receiver with tenant identification.

global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: 'production'
    replica: 'prometheus-1'

remote_write:
  - url: "http://localhost:19291/api/v1/receive"
    headers:
      X-Thanos-Tenant: "production"
    queue_config:
      capacity: 10000
      max_shards: 200
      min_shards: 1
      max_samples_per_send: 5000
      batch_send_deadline: 5s
      min_backoff: 30ms
      max_backoff: 100ms
      retry_on_http_429: true
    write_relabel_configs:
      - source_labels: [__name__]
        regex: 'up|process_.*|go_.*'
        action: drop
        
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
        
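  - job_name: 'node-exporter'
    static_configs:
      - targets: ['localhost:9100']

  # Scrape the receiver's own metrics (HTTP port) so the alert rules and dashboard have data
  - job_name: 'thanos-receiver'
    static_configs:
      - targets: ['localhost:10909']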
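Prometheus anchors relabel regexes at both ends, so a drop rule only matches when the pattern covers the entire metric name; a prefix such as go_ needs a trailing .* to match go_goroutines. The sketch below approximates that behaviour with grep -x, which also requires a whole-line match. would_drop is a hypothetical helper, and grep's extended regex syntax is only an approximation of the RE2 syntax Prometheus uses.

```shell
# Return 0 when the metric name matches the pattern as a whole string.
would_drop() {
    local pattern="$1" metric="$2"
    printf '%s\n' "$metric" | grep -Eqx "$pattern"
}

# Example (not run here): list which sample names a pattern would drop.
# for m in up go_goroutines process_cpu_seconds_total node_load1; do
#     would_drop 'up|process_.*|go_.*' "$m" && echo "dropped: $m"
# done
```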

Reload Prometheus to apply configuration

Reload Prometheus configuration to start sending metrics to Thanos Receiver.

sudo systemctl reload prometheus
sudo systemctl status prometheus

Set up monitoring and alerting

Configure Grafana dashboard for Thanos Receiver

Create a monitoring dashboard to track receiver performance and ingestion rates.

{
  "dashboard": {
    "title": "Thanos Receiver Monitoring",
    "panels": [
      {
        "title": "Ingestion Rate",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(prometheus_remote_storage_samples_total[5m])",
            "legendFormat": "Samples/sec"
          }
        ]
      },
      {
        "title": "Active Tenants",
        "type": "stat",
        "targets": [
          {
            "expr": "sum(thanos_receive_hashring_tenants)",
            "legendFormat": "Tenants"
          }
        ]
      }
    ]
  }
}

Create alerting rules for receiver health

Set up Prometheus alerting rules to monitor receiver performance and availability.

groups:
  - name: thanos.receiver
    rules:
      - alert: ThanosReceiverDown
        expr: up{job="thanos-receiver"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Thanos Receiver is down"
          description: "Thanos Receiver has been down for more than 5 minutes."
          
      - alert: ThanosReceiverHighIngestionRate
        expr: rate(prometheus_remote_storage_samples_total[5m]) > 50000
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High ingestion rate detected"
          description: "Thanos Receiver is ingesting {{ $value }} samples/sec, which is above the threshold."
          
      - alert: ThanosReceiverDiskSpaceHigh
        expr: (1 - (node_filesystem_free_bytes{mountpoint="/opt/thanos/data"} / node_filesystem_size_bytes{mountpoint="/opt/thanos/data"})) * 100 > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Thanos Receiver disk space usage high"
          description: "Disk space usage is above 85% on Thanos Receiver data directory."
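The disk space rule computes used space as a percentage of the filesystem size and fires above 85. The arithmetic can be reproduced in a shell as a sanity check; this is a sketch with a hypothetical helper name, taking the free and total sizes in any common unit.

```shell
# Print the used percentage of a filesystem: (1 - free / size) * 100.
used_pct() {
    local free="$1" size="$2"
    LC_ALL=C awk -v f="$free" -v s="$size" 'BEGIN { printf "%.1f\n", (1 - f / s) * 100 }'
}

# Example (not run here): a 50 GB volume with 6 GB free is 88% used, above the 85% threshold.
# used_pct 6 50
```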

Optimize performance and troubleshooting

Configure resource limits and optimization

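Fine-tune Thanos Receiver for high-throughput environments by adjusting resource limits. Thanos loads limits from the file passed with --receive.limits-config-file, so check the key names below against the documentation for your Thanos version before relying on them.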

# Receiver-specific limits
receive:
  limits:
    # Per-tenant limits
    tenant_limits:
      default:
        request_rate: 10000  # requests per second
        request_burst: 20000  # burst capacity
        ingestion_rate: 50000  # samples per second
        ingestion_burst: 100000  # burst samples
        
      production:
        request_rate: 20000
        request_burst: 40000
        ingestion_rate: 100000
        ingestion_burst: 200000
        
    # Global limits
    global:
      max_concurrent_requests: 1000
      timeout: 30s
      

# TSDB optimization
tsdb:
  wal_compression: true
  head_chunks_write_buffer_size: 4194304
  max_exemplars: 100000

Set up log rotation

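The systemd unit above sends receiver output to the journal, which journald rotates on its own. If you redirect logs to files under /var/log/thanos instead, configure logrotate so they do not fill the disk.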

/var/log/thanos/*.log {
    daily
    missingok
    rotate 30
    compress
    delaycompress
    notifempty
    # The unit defines no ExecReload, so truncate the file in place instead of reloading the service
    copytruncate
}

Verify your setup

Check that Thanos Receiver is running correctly and ingesting metrics from Prometheus.

# Check service status
sudo systemctl status thanos-receiver

# Verify receiver is listening on ports
sudo ss -tlnp | grep -E '(10907|10909|19291)'

# Check receiver metrics endpoint
curl -s http://localhost:10909/metrics | grep thanos_receive

# Verify remote write is working (sent and failed sample counters on the Prometheus side)
curl -s http://localhost:9090/metrics | grep prometheus_remote_storage_samples

# Check logs for any errors
sudo journalctl -u thanos-receiver -f --since "10 minutes ago"
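The port check above can be turned into a pass/fail test that names the ports with no listener. This sketch reads ss -tln output from standard input; missing_ports is a hypothetical helper, and the port numbers are the ones used throughout this tutorial.

```shell
# Print each expected port that has no listening socket in `ss -tln` output read from stdin.
missing_ports() {
    local listing port
    listing=$(cat)
    for port in "$@"; do
        printf '%s\n' "$listing" | grep -Eq ":${port}[[:space:]]" || echo "$port"
    done
}

# Example (not run here): prints nothing when all three ports are listening.
# ss -tln | missing_ports 10907 10909 19291
```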

Common issues

Symptom | Cause | Fix
Receiver fails to start | Configuration syntax errors | Check config with /opt/thanos/bin/thanos receive --help and validate YAML syntax
Remote write timeout errors | High ingestion load or network issues | Increase write_timeout in receiver config and optimize Prometheus queue_config
Disk space fills up quickly | Retention policy too long or high ingestion rate | Reduce tsdb.retention or configure object storage for long-term storage
Multi-tenancy not working | Missing tenant header configuration | Verify X-Thanos-Tenant header is set in Prometheus remote_write config
High memory usage | Large number of active series | Implement series limiting in receiver config and optimize Prometheus scraping
Connection refused errors | Firewall blocking ports | Verify firewall rules allow ports 10907, 10909, and 19291
