Set up Telegraf custom plugins for application monitoring with Prometheus and InfluxDB integration

Intermediate 45 min May 23, 2026
Ubuntu 24.04 Debian 12 AlmaLinux 9 Rocky Linux 9

Learn to build custom Telegraf input plugins for application metrics collection, configure dual output to Prometheus and InfluxDB backends, and create comprehensive monitoring dashboards with Grafana for production observability.

Prerequisites

  • Root or sudo access
  • Python 3.6+ installed
  • At least 2GB RAM
  • Network access for package downloads

What this solves

Telegraf's built-in plugins don't always capture the specific metrics your applications generate. This tutorial shows you how to collect application-specific metrics with a custom script that Telegraf runs through its exec input plugin, send those metrics to both Prometheus and InfluxDB, and build Grafana dashboards on top of both backends.

Step-by-step installation

Install Telegraf agent

Start by installing Telegraf from the official InfluxData repository to ensure you get the latest features and security updates.

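# Ubuntu / Debian (apt-key is deprecated, so the key goes into a dedicated keyring file)
wget -qO- https://repos.influxdata.com/influxdata-archive_compat.key | gpg --dearmor | sudo tee /etc/apt/trusted.gpg.d/influxdata-archive_compat.gpg > /dev/null
echo "deb [signed-by=/etc/apt/trusted.gpg.d/influxdata-archive_compat.gpg] https://repos.influxdata.com/debian stable main" | sudo tee /etc/apt/sources.list.d/influxdata.list
sudo apt update
sudo apt install -y telegraf

# AlmaLinux / Rocky Linux
sudo tee /etc/yum.repos.d/influxdata.repo <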

Create custom input plugin directory

Set up a dedicated directory for your custom plugins and ensure proper permissions for the telegraf user to execute them.

sudo mkdir -p /etc/telegraf/scripts
sudo chown telegraf:telegraf /etc/telegraf/scripts
sudo chmod 755 /etc/telegraf/scripts

Build custom application metrics script

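Create a script that gathers application-specific metrics and save it as /etc/telegraf/scripts/app_metrics.py. This example measures a web application's response time, its database connection pool usage, and a few business metrics, then prints them in InfluxDB line protocol. Replace the localhost:8080 endpoints and the myapp process name with your application's own.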

#!/usr/bin/env python3
import time
import requests
import psutil

def collect_app_metrics():
    metrics = {}
    timestamp = int(time.time() * 1000000000)  # nanoseconds for InfluxDB
    
    # Application response time check
    try:
        start_time = time.time()
        response = requests.get('http://localhost:8080/health', timeout=5)
        response_time = (time.time() - start_time) * 1000
        metrics['app_response_time'] = response_time
        metrics['app_status_code'] = response.status_code
        metrics['app_available'] = 1 if response.status_code == 200 else 0
    except Exception as e:
        metrics['app_response_time'] = 0
        metrics['app_status_code'] = 0
        metrics['app_available'] = 0
    
    # Database connection pool metrics
    try:
        db_response = requests.get('http://localhost:8080/metrics/db', timeout=2)
        if db_response.status_code == 200:
            db_data = db_response.json()
            metrics['db_active_connections'] = db_data.get('active_connections', 0)
            metrics['db_idle_connections'] = db_data.get('idle_connections', 0)
            metrics['db_max_connections'] = db_data.get('max_connections', 0)
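    except (requests.RequestException, ValueError):
        pass  # endpoint unavailable or not JSON: skip these optional metrics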
    
    # Custom business metrics
    try:
        business_response = requests.get('http://localhost:8080/metrics/business', timeout=2)
        if business_response.status_code == 200:
            business_data = business_response.json()
            metrics['active_users'] = business_data.get('active_users', 0)
            metrics['orders_per_minute'] = business_data.get('orders_per_minute', 0)
            metrics['revenue_last_hour'] = business_data.get('revenue_last_hour', 0)
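    except (requests.RequestException, ValueError):
        pass  # endpoint unavailable or not JSON: skip these optional metrics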
    
    # System resource usage for the application
    for proc in psutil.process_iter(['pid', 'name', 'cpu_percent', 'memory_info']):
        try:
            if 'myapp' in proc.info['name']:
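                # The first non-blocking cpu_percent() call in a fresh process always reports 0.0,
                # so sample over a short 100 ms window instead
                metrics['app_cpu_percent'] = proc.cpu_percent(interval=0.1)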
                metrics['app_memory_mb'] = proc.info['memory_info'].rss / 1024 / 1024
                break
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            continue
    
    # Output in InfluxDB line protocol format
    tags = f"host={psutil.os.uname().nodename},app=myapp"
    fields = []
    for key, value in metrics.items():
        if isinstance(value, (int, float)):
            fields.append(f"{key}={value}")
        else:
            fields.append(f'{key}="{value}"')
    
    line = f"custom_app_metrics,{tags} {','.join(fields)} {timestamp}"
    print(line)

if __name__ == "__main__":
    collect_app_metrics()
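The script above builds its output line by plain string interpolation, which is fine as long as tag values contain no spaces, commas, or equals signs. The following is a minimal sketch of a stricter formatter, not part of the original script; the helper names escape_key, format_field, and to_line are hypothetical.

```python
def escape_key(value):
    """Escape the separators line protocol reserves in tag keys, tag values and field keys."""
    text = str(value)
    for char in (",", "=", " "):
        text = text.replace(char, "\\" + char)
    return text


def format_field(value):
    """Render one field value with an explicit line protocol type."""
    if isinstance(value, bool):          # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"               # trailing "i" marks an integer field
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'                # everything else becomes a quoted string


def to_line(measurement, tags, fields, timestamp_ns):
    """Build a single line: measurement[,tags] fields timestamp."""
    tag_part = ",".join(f"{escape_key(k)}={escape_key(v)}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{escape_key(k)}={format_field(v)}" for k, v in fields.items())
    head = f"{measurement},{tag_part}" if tag_part else measurement
    return f"{head} {field_part} {timestamp_ns}"
```

One design point to keep in mind: InfluxDB fixes a field's type on first write, so a field should always be emitted with the same Python type. Switching an existing float field to the integer form (for example app_status_code=200 to app_status_code=200i) would likely be rejected as a type conflict.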

Make the script executable

Set proper permissions and ownership for the custom metrics script.

sudo chmod 755 /etc/telegraf/scripts/app_metrics.py
sudo chown telegraf:telegraf /etc/telegraf/scripts/app_metrics.py

Install Python dependencies

Install the requests and psutil packages that the script imports. The distribution packages are sufficient; a system-wide pip3 install is unnecessary, and on Ubuntu 24.04 and Debian 12 pip refuses to install outside a virtual environment.

# Ubuntu / Debian
sudo apt install -y python3-requests python3-psutil

# AlmaLinux / Rocky Linux
sudo dnf install -y python3-requests python3-psutil
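With the dependencies in place, the script can be exercised in roughly the way a collector process would run it: start the command, enforce a timeout, and read standard output line by line. The sketch below only mimics that behaviour for testing purposes and is not Telegraf's implementation; run_collector is a hypothetical helper name.

```python
import subprocess


def run_collector(command, timeout_seconds=10.0):
    """Run a collector command and return its non-empty stdout lines.

    Raises subprocess.TimeoutExpired if the command runs too long and
    subprocess.CalledProcessError if it exits with a non-zero status.
    """
    completed = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text (works on Python 3.6+)
        timeout=timeout_seconds,
        check=True,
    )
    return [line for line in completed.stdout.splitlines() if line.strip()]


# Example call (run it as the telegraf user so permissions match):
#   run_collector(["/etc/telegraf/scripts/app_metrics.py"], timeout_seconds=10)
```

A non-zero exit status or a timeout here usually points to the same permission, path, or import problems that would stop Telegraf from collecting the metrics.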

Configure Telegraf with custom plugin and dual outputs

Replace the contents of /etc/telegraf/telegraf.conf with a configuration that runs your custom script through the exec input and writes every metric to both Prometheus and InfluxDB.

# Global agent configuration
[agent]
  interval = "30s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = ""
  hostname = ""
  omit_hostname = false

# Custom application metrics input plugin
[[inputs.exec]]
  commands = ["/etc/telegraf/scripts/app_metrics.py"]
  timeout = "10s"
  data_format = "influx"
  interval = "30s"

# System metrics for context
[[inputs.cpu]]
  percpu = true
  totalcpu = true
  collect_cpu_time = false
  report_active = false

[[inputs.disk]]
  ignore_fs = ["tmpfs", "devtmpfs", "devfs", "iso9660", "overlay", "aufs", "squashfs"]

[[inputs.diskio]]

[[inputs.kernel]]

[[inputs.mem]]

[[inputs.processes]]

[[inputs.swap]]

[[inputs.system]]

# Network interface monitoring
[[inputs.net]]

# HTTP response time monitoring
[[inputs.http_response]]
  urls = ["http://localhost:8080/health"]
  response_timeout = "5s"
  method = "GET"
  follow_redirects = false

# InfluxDB output
[[outputs.influxdb]]
  urls = ["http://localhost:8086"]
  database = "telegraf"
  retention_policy = ""
  write_consistency = "any"
  timeout = "5s"
  username = "telegraf_user"
  password = "your_influxdb_password"

# Prometheus output
[[outputs.prometheus_client]]
  listen = ":9273"
  metric_version = 2
  collectors_exclude = ["gocollector", "process"]
  string_as_label = false
  export_timestamp = false
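The agent settings interval and metric_buffer_limit together determine how long an output can be unreachable before Telegraf starts dropping the oldest metrics. A rough estimate can be sketched as follows, assuming each output keeps its own buffer of up to metric_buffer_limit metrics and that the host produces about 60 metrics per collection interval (a made-up figure; count the lines from telegraf --test to get a real one).

```python
def buffer_coverage_seconds(buffer_limit, metrics_per_interval, interval_seconds):
    """Estimate how many seconds of output outage the buffer can absorb.

    buffer_limit         -- metric_buffer_limit from the [agent] section
    metrics_per_interval -- metrics gathered per collection interval (assumed)
    interval_seconds     -- the agent interval in seconds
    """
    if metrics_per_interval <= 0:
        raise ValueError("metrics_per_interval must be positive")
    full_intervals = buffer_limit // metrics_per_interval
    return full_intervals * interval_seconds


# With the values from this configuration and the assumed 60 metrics per interval:
coverage = buffer_coverage_seconds(buffer_limit=10000, metrics_per_interval=60, interval_seconds=30)
print(f"{coverage} s, about {coverage // 60} min")
```

If the estimate is shorter than the outages you expect, raising metric_buffer_limit trades memory for resilience.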

Install and configure InfluxDB

Set up InfluxDB as one of your time-series backends for storing metrics data.

# Ubuntu / Debian
sudo apt install -y influxdb
sudo systemctl enable --now influxdb

# AlmaLinux / Rocky Linux
sudo dnf install -y influxdb
sudo systemctl enable --now influxdb

Create InfluxDB database and user

Set up the database and user credentials for Telegraf to write metrics data.

influx
CREATE DATABASE telegraf
CREATE USER "telegraf_user" WITH PASSWORD 'your_influxdb_password'
GRANT ALL ON "telegraf" TO "telegraf_user"
EXIT
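The same database can also be queried over the InfluxDB 1.x HTTP API, which is handy for scripted checks. The sketch below only builds the request URL for the /query endpoint and extracts measurement names from a response body; it assumes the 1.x API that the InfluxQL statements above target, and the helper names are hypothetical.

```python
from urllib.parse import urlencode


def influx_query_url(base_url, database, query, username=None, password=None):
    """Build a URL for the InfluxDB 1.x /query endpoint."""
    params = {"db": database, "q": query}
    if username is not None:
        # Credentials in the URL can end up in logs; prefer this only for local checks.
        params["u"] = username
        params["p"] = password or ""
    return f"{base_url.rstrip('/')}/query?{urlencode(params)}"


def measurement_names(response_json):
    """Collect the first column of every row in a SHOW MEASUREMENTS response."""
    names = []
    for result in response_json.get("results", []):
        for series in result.get("series", []):
            names.extend(row[0] for row in series.get("values", []))
    return names
```

Fetching the URL with any HTTP client and passing the decoded JSON to measurement_names should list custom_app_metrics once Telegraf has written its first batch.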

Install Prometheus

Set up Prometheus to scrape metrics from Telegraf's Prometheus output endpoint.

sudo useradd --no-create-home --shell /bin/false prometheus
sudo mkdir /etc/prometheus /var/lib/prometheus
sudo chown prometheus:prometheus /etc/prometheus /var/lib/prometheus
cd /tmp
wget https://github.com/prometheus/prometheus/releases/download/v2.45.0/prometheus-2.45.0.linux-amd64.tar.gz
tar xzf prometheus-2.45.0.linux-amd64.tar.gz
sudo cp prometheus-2.45.0.linux-amd64/prometheus /usr/local/bin/
sudo cp prometheus-2.45.0.linux-amd64/promtool /usr/local/bin/
sudo chown prometheus:prometheus /usr/local/bin/prometheus /usr/local/bin/promtool

Configure Prometheus to scrape Telegraf

Save the following as /etc/prometheus/prometheus.yml so that Prometheus scrapes Telegraf's endpoint on port 9273 every 30 seconds.

global:
  scrape_interval: 30s
  evaluation_interval: 30s

scrape_configs:
  - job_name: 'telegraf'
    static_configs:
      - targets: ['localhost:9273']
    scrape_interval: 30s
    metrics_path: /metrics
    
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093
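Judging by the metric names used in this tutorial's dashboard and alert rules, each field of the custom_app_metrics measurement shows up on the scrape endpoint as custom_app_metrics_<field>. A quick way to confirm which of those series are actually exposed is to scan the text returned by localhost:9273/metrics. The following is a simplified sketch, assuming one sample per line and ignoring label contents; metric_names_with_prefix is a hypothetical helper.

```python
def metric_names_with_prefix(exposition_text, prefix):
    """Return the sorted, de-duplicated metric names that start with prefix."""
    names = set()
    for raw_line in exposition_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):   # skip blanks, HELP and TYPE lines
            continue
        # The metric name ends at the first "{" (labels) or space (value).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        if name.startswith(prefix):
            names.add(name)
    return sorted(names)
```

Feeding it the body of curl localhost:9273/metrics with the prefix custom_app_metrics_ gives the list of names you can use in PromQL expressions.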

Create Prometheus systemd service

Save the following unit file as /etc/systemd/system/prometheus.service so that systemd starts Prometheus at boot and supervises it.

[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
    --config.file /etc/prometheus/prometheus.yml \
    --storage.tsdb.path /var/lib/prometheus/ \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries \
    --web.listen-address=0.0.0.0:9090 \
    --web.enable-lifecycle

[Install]
WantedBy=multi-user.target

Install and configure Grafana

Set up Grafana for creating dashboards that visualize data from both Prometheus and InfluxDB.

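# Ubuntu / Debian (apt-key is deprecated, so the key goes into a dedicated keyring file)
sudo mkdir -p /etc/apt/keyrings
wget -q -O - https://packages.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://packages.grafana.com/oss/deb stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt update
sudo apt install -y grafana

# AlmaLinux / Rocky Linux
sudo tee /etc/yum.repos.d/grafana.repo <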

Start all services

Enable and start all the monitoring stack components.

# Make systemd pick up the newly created prometheus.service unit
sudo systemctl daemon-reload
sudo systemctl enable --now telegraf
sudo systemctl enable --now prometheus
sudo systemctl enable --now grafana-server
sudo systemctl enable --now influxdb

Create Grafana dashboard configuration

Set up a comprehensive dashboard that displays metrics from both data sources.

{
  "dashboard": {
    "id": null,
    "title": "Custom Application Monitoring",
    "tags": ["telegraf", "custom", "application"],
    "timezone": "browser",
    "panels": [
      {
        "id": 1,
        "title": "Application Response Time",
        "type": "stat",
        "targets": [
          {
            "expr": "custom_app_metrics_app_response_time",
            "legendFormat": "Response Time (ms)",
            "refId": "A",
            "datasource": "Prometheus"
          }
        ],
        "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0},
        "fieldConfig": {
          "defaults": {
            "unit": "ms",
            "thresholds": {
              "steps": [
                {"color": "green", "value": null},
                {"color": "yellow", "value": 100},
                {"color": "red", "value": 500}
              ]
            }
          }
        }
      },
      {
        "id": 2,
        "title": "Database Connections",
        "type": "timeseries",
        "targets": [
          {
            "query": "SELECT mean(\"db_active_connections\") FROM \"custom_app_metrics\" WHERE time >= now() - 1h GROUP BY time(1m) fill(null)",
            "refId": "A",
            "datasource": "InfluxDB"
          }
        ],
        "gridPos": {"h": 8, "w": 12, "x": 12, "y": 0}
      },
      {
        "id": 3,
        "title": "Business Metrics",
        "type": "table",
        "targets": [
          {
            "expr": "custom_app_metrics_active_users",
            "legendFormat": "Active Users",
            "refId": "A",
            "datasource": "Prometheus"
          },
          {
            "expr": "custom_app_metrics_orders_per_minute",
            "legendFormat": "Orders/min",
            "refId": "B",
            "datasource": "Prometheus"
          }
        ],
        "gridPos": {"h": 8, "w": 24, "x": 0, "y": 8}
      }
    ],
    "time": {"from": "now-1h", "to": "now"},
    "refresh": "30s"
  }
}
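A dashboard document in this shape, with the dashboard wrapped in a top-level "dashboard" key, matches the body that Grafana's HTTP API accepts at /api/dashboards/db. Before sending it, a few sanity checks can save a round trip. The sketch below assumes targets reference data sources by name, as in the JSON above; both helper names are hypothetical, and the actual POST (with authentication) is left out.

```python
def referenced_datasources(document):
    """List the data source names that the dashboard's targets refer to."""
    names = set()
    for panel in document["dashboard"].get("panels", []):
        for target in panel.get("targets", []):
            datasource = target.get("datasource")
            if isinstance(datasource, str):   # name-based references only
                names.add(datasource)
    return sorted(names)


def prepare_dashboard_payload(document, overwrite=False):
    """Validate panel ids and build the request body for /api/dashboards/db."""
    dashboard = document["dashboard"]
    panel_ids = [panel["id"] for panel in dashboard.get("panels", [])]
    if len(panel_ids) != len(set(panel_ids)):
        raise ValueError("panel ids must be unique within a dashboard")
    return {"dashboard": dashboard, "overwrite": overwrite}
```

Every name returned by referenced_datasources has to exist as a data source in Grafana, otherwise the corresponding panels stay empty.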

Configure advanced alerting rules

Create Prometheus alerting rules

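Save the following rules as /etc/prometheus/app_alerts.yml so that Prometheus raises alerts on your custom metrics. Delivering notifications additionally requires an Alertmanager target, which is still commented out in the Prometheus configuration above.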

groups:
  - name: custom_app_alerts
    rules:
      - alert: ApplicationDown
        expr: custom_app_metrics_app_available == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Application is down"
          description: "Application has been down for more than 2 minutes"

      - alert: HighResponseTime
        expr: custom_app_metrics_app_response_time > 1000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High application response time"
          description: "Application response time is {{ $value }}ms"

      - alert: DatabaseConnectionsHigh
        expr: custom_app_metrics_db_active_connections > 80
        for: 3m
        labels:
          severity: warning
        annotations:
          summary: "High database connection usage"
          description: "Database has {{ $value }} active connections"
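The for clause in each rule means the condition has to hold continuously for that long before the alert fires; a single bad sample only puts it into a pending state. The following simplified simulation illustrates the idea on a list of samples. It is a sketch of the concept, not Prometheus's evaluation code, and alert_fires is a hypothetical helper.

```python
def alert_fires(samples, condition, for_seconds):
    """Return True if condition has held continuously for at least for_seconds
    up to and including the last sample.

    samples   -- list of (timestamp_seconds, value) tuples in time order
    condition -- function taking a value and returning True when it is "bad"
    """
    pending_since = None
    firing = False
    for timestamp, value in samples:
        if condition(value):
            if pending_since is None:
                pending_since = timestamp      # condition just became true
            firing = timestamp - pending_since >= for_seconds
        else:
            pending_since = None               # any good sample resets the clock
            firing = False
    return firing


def is_down(value):
    """Condition of the ApplicationDown rule: app_available == 0."""
    return value == 0


# ApplicationDown with "for: 2m", evaluated every 30 seconds
outage = [(0, 0), (30, 0), (60, 0), (90, 0), (120, 0)]
blip = [(0, 0), (30, 0), (60, 1), (90, 0), (120, 0)]
print(alert_fires(outage, is_down, 120), alert_fires(blip, is_down, 120))
```

The second series never fires because the recovery at 60 seconds resets the pending period, which is how the for clause filters out short blips.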

Update Prometheus configuration for alerts

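Reference the alert rules file in /etc/prometheus/prometheus.yml by adding the entry - "/etc/prometheus/app_alerts.yml" under rule_files, then set the file's ownership and restart Prometheus.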

sudo chown prometheus:prometheus /etc/prometheus/app_alerts.yml
sudo systemctl restart prometheus

Verify your setup

# Check all services are running
sudo systemctl status telegraf prometheus grafana-server influxdb

# Test custom script execution
sudo -u telegraf /etc/telegraf/scripts/app_metrics.py

# Check Telegraf is collecting metrics
sudo journalctl -u telegraf -f

# Verify Prometheus is scraping Telegraf
curl http://localhost:9090/api/v1/targets

# Check InfluxDB has data
influx -execute 'SHOW MEASUREMENTS ON telegraf'

# Test Grafana access
curl http://localhost:3000/api/health
Note: Access Grafana at http://your-server:3000 with default credentials admin/admin. Configure data sources for both Prometheus (http://localhost:9090) and InfluxDB (http://localhost:8086).
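The JSON returned by Prometheus's /api/v1/targets endpoint is verbose, so it can help to reduce it to the targets that are not healthy. A minimal sketch, assuming the response has already been decoded into a dictionary; unhealthy_targets is a hypothetical helper.

```python
def unhealthy_targets(targets_response):
    """Return (job, scrape_url, last_error) for every active target that is not up."""
    problems = []
    active = targets_response.get("data", {}).get("activeTargets", [])
    for target in active:
        if target.get("health") != "up":
            problems.append((
                target.get("labels", {}).get("job", "?"),
                target.get("scrapeUrl", "?"),
                target.get("lastError", ""),
            ))
    return problems
```

An empty list means both the telegraf and prometheus jobs are being scraped successfully; anything else includes the last scrape error reported for that target.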

Common issues

Symptom | Cause | Fix
Custom script not executing | Permission or path issues | Run sudo -u telegraf /etc/telegraf/scripts/app_metrics.py to test manually
No metrics in Prometheus | Telegraf Prometheus output not running | Check curl localhost:9273/metrics and verify port 9273 is open
InfluxDB connection failed | Database or user doesn't exist | Recreate the database and user with proper permissions
Python import errors | Missing dependencies | Install python3-requests and python3-psutil from the distribution repositories, or run pip3 install requests psutil inside a virtual environment
High memory usage | Collection interval too short | Increase interval in telegraf.conf from 30s to 60s or higher
Grafana dashboards empty | Data source not configured | Add both Prometheus and InfluxDB as data sources in Grafana settings
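When the script runs but Telegraf still reports parse errors, it is worth checking that every output line looks like line protocol rather than, say, a Python traceback. The check below is a deliberately simplified sketch: it assumes no escaped spaces in tags and no string fields containing spaces or commas, which holds for the numeric fields emitted by app_metrics.py. check_line is a hypothetical helper.

```python
import re

LINE_PATTERN = re.compile(
    r"^(?P<measurement>[^,\s]+)"                                  # measurement name
    r"(?P<tags>(?:,[^,=\s]+=[^,=\s]+)*)"                          # optional ,key=value tags
    r" (?P<fields>[^,=\s]+=[^,\s]+(?:,[^,=\s]+=[^,\s]+)*)"        # key=value fields
    r"(?: (?P<timestamp>\d+))?$"                                  # optional integer timestamp
)


def check_line(line):
    """Return a small summary of a line protocol line, or None if it does not match."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None
    timestamp = match.group("timestamp")
    return {
        "measurement": match.group("measurement"),
        "tag_count": match.group("tags").count(","),
        "field_count": match.group("fields").count(",") + 1,
        "timestamp": int(timestamp) if timestamp is not None else None,
    }
```

A None result for any line of the script's output is a strong hint that the script printed an error or debug text to standard output.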
