Configure Linux performance monitoring with collectd and InfluxDB 1.8 for real-time metrics collection

Intermediate 25 min Apr 01, 2026 15 views
Ubuntu 24.04 Debian 12 AlmaLinux 9 Rocky Linux 9

Set up comprehensive system monitoring using collectd to collect performance metrics and InfluxDB 1.8 as a time-series database backend. This tutorial covers installation, configuration, and retention policies for production monitoring environments.

Prerequisites

  • Root or sudo access
  • At least 2GB RAM
  • 10GB available disk space

What this solves

Modern infrastructure requires continuous monitoring of system performance metrics like CPU usage, memory consumption, disk I/O, and network statistics. Collectd provides a lightweight daemon that collects these metrics with minimal system overhead, while InfluxDB 1.8 offers a robust time-series database for storing and querying performance data.

This monitoring stack lets you track system trends, identify performance bottlenecks, and feed alerting on critical thresholds. In this guide collectd samples metrics every 10 seconds, and InfluxDB retention policies combined with continuous queries downsample older data so that long-term storage stays compact.

Step-by-step installation

Update system packages

Update the installed packages so that you get the latest versions and security patches. Run the command for your distribution.

# Ubuntu / Debian
sudo apt update && sudo apt upgrade -y

# AlmaLinux / Rocky Linux
sudo dnf update -y

Install InfluxDB 1.8

Install InfluxDB 1.8 from the official repository. This guide uses the 1.8 release for its mature feature set and stability, and because it includes the built-in collectd listener configured below.

# Ubuntu / Debian (on Debian, replace "ubuntu" in the repository URL with "debian")
wget -qO- https://repos.influxdata.com/influxdb.key | gpg --dearmor | sudo tee /etc/apt/trusted.gpg.d/influxdb.gpg > /dev/null
echo "deb [signed-by=/etc/apt/trusted.gpg.d/influxdb.gpg] https://repos.influxdata.com/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/influxdb.list
sudo apt update
sudo apt install -y influxdb

# AlmaLinux / Rocky Linux
# The quoted 'EOF' keeps $releasever and $basearch literal so that dnf expands them.
sudo tee /etc/yum.repos.d/influxdb.repo > /dev/null << 'EOF'
[influxdb]
name = InfluxDB Repository - RHEL
baseurl = https://repos.influxdata.com/rhel/$releasever/$basearch/stable
enabled = 1
gpgcheck = 1
gpgkey = https://repos.influxdata.com/influxdb.key
EOF
sudo dnf install -y influxdb

Configure InfluxDB

Edit /etc/influxdb/influxdb.conf to tune InfluxDB for metrics storage and to enable its built-in collectd listener on UDP port 25826.

[meta]
  dir = "/var/lib/influxdb/meta"

[data]
  dir = "/var/lib/influxdb/data"
  wal-dir = "/var/lib/influxdb/wal"
  series-id-set-cache-size = 100

[coordinator]
  write-timeout = "10s"
  max-concurrent-queries = 0
  query-timeout = "0s"
  log-queries-after = "0s"
  max-select-point = 0
  max-select-series = 0
  max-select-buckets = 0

[retention]
  enabled = true
  check-interval = "30m"

[shard-precreation]
  enabled = true
  check-interval = "10m"
  advance-period = "30m"

[monitor]
  store-enabled = true
  store-database = "_internal"
  store-interval = "10s"

[subscriber]
  enabled = true
  http-timeout = "30s"

[http]
  enabled = true
  bind-address = ":8086"
  auth-enabled = false
  log-enabled = true
  write-tracing = false
  pprof-enabled = true
  https-enabled = false
  max-row-limit = 0
  max-connection-limit = 0
  shared-secret = ""
  realm = "InfluxDB"

[[collectd]]
  enabled = true
  bind-address = ":25826"
  database = "collectd"
  retention-policy = ""
  batch-size = 5000
  batch-pending = 10
  batch-timeout = "10s"
  read-buffer = 0
  typesdb = "/usr/share/collectd/types.db"
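Before restarting InfluxDB, it can help to confirm which address the collectd listener will bind to. The following is a minimal sketch rather than part of the setup itself: `collectd_listener_address` is a hypothetical helper name, and it assumes section headers start at the beginning of a line, as in the configuration above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an InfluxDB tool): print the bind-address configured
# in the [[collectd]] section of an InfluxDB 1.x TOML configuration file.
collectd_listener_address() {
    awk '
        # A line starting with "[" opens a new section; remember whether it is [[collectd]].
        /^\[/ { in_collectd = ($1 == "[[collectd]]") }
        # Inside that section, print the value of bind-address without its quotes.
        in_collectd && $1 == "bind-address" {
            gsub(/"/, "", $3)
            print $3
        }
    ' "$1"
}
```

Assuming the file lives at the usual package location, `collectd_listener_address /etc/influxdb/influxdb.conf` should print `:25826` for the configuration shown here.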

Start and enable InfluxDB

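Enable InfluxDB at boot and start the service now. The collectd listener reads the typesdb file configured above, which is installed by the collectd packages; if the service fails to start with an error about types.db, install collectd first and then restart InfluxDB.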

sudo systemctl enable --now influxdb
sudo systemctl status influxdb
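
Once the service is running, the HTTP API can also be probed directly. This is a small sketch, assuming InfluxDB listens on localhost:8086 as configured above and that curl is installed; `influx_ping_status` and `ping_status_ok` are illustrative helper names. The /ping endpoint of InfluxDB 1.x answers with HTTP 204 when the server is up.

```shell
#!/usr/bin/env bash
# Illustrative helpers; the default URL assumes bind-address ":8086" on this host.

# Print the HTTP status code returned by /ping (curl reports 000 if the server is unreachable).
influx_ping_status() {
    curl -s -o /dev/null -w '%{http_code}' --max-time 5 "${1:-http://localhost:8086}/ping"
}

# Succeed only for the status code that /ping returns from a healthy server.
ping_status_ok() {
    [ "$1" = "204" ]
}
```

A possible use is `ping_status_ok "$(influx_ping_status)" && echo "InfluxDB is up"`.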

Create InfluxDB database and retention policies

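Create a dedicated database for collectd metrics with three retention policies: rp_1h keeps data for 7 days, rp_1d for 30 days, and rp_1w for 365 days. rp_1w is marked DEFAULT, and because the collectd listener is configured with an empty retention-policy, the raw metrics it receives are written to that default policy.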

influx -execute "CREATE DATABASE collectd"
influx -execute "CREATE RETENTION POLICY \"rp_1h\" ON \"collectd\" DURATION 7d REPLICATION 1"
influx -execute "CREATE RETENTION POLICY \"rp_1d\" ON \"collectd\" DURATION 30d REPLICATION 1"
influx -execute "CREATE RETENTION POLICY \"rp_1w\" ON \"collectd\" DURATION 365d REPLICATION 1 DEFAULT"
influx -execute "SHOW RETENTION POLICIES ON collectd"
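
To get a feel for what each policy retains, the number of points kept per series is the retention duration divided by the sampling interval. The sketch below does that arithmetic; `points_per_series` is a hypothetical helper, and the intervals (10 s raw samples, 1 h and 1 d aggregates) are taken from the collectd Interval setting and the continuous queries used later in this guide.

```shell
#!/usr/bin/env bash
# Hypothetical helper: points retained per series = retention in seconds / interval in seconds.
points_per_series() {
    local days=$1 interval_seconds=$2
    echo $(( days * 86400 / interval_seconds ))
}

points_per_series 7 3600     # rp_1h: 7 days of hourly aggregates  -> 168
points_per_series 30 86400   # rp_1d: 30 days of daily aggregates  -> 30
points_per_series 365 10     # rp_1w: 365 days of raw 10 s samples -> 3153600
```

The default policy holds by far the most points per series, so its duration is the first setting to revisit if disk usage grows faster than expected.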

Install collectd

Install collectd daemon and common plugins for system monitoring.

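# Ubuntu / Debian
sudo apt install -y collectd collectd-utils

# AlmaLinux / Rocky Linux (enable EPEL first, since it provides the collectd packages)
sudo dnf install -y epel-release
sudo dnf install -y collectd collectd-utils
sudo dnf install -y collectd-network collectd-write_http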

Configure collectd main settings

Create the main collectd configuration in /etc/collectd/collectd.conf (EPEL packages typically use /etc/collectd.conf instead). It enables the essential system monitoring plugins and sends metrics to the InfluxDB collectd listener through the network plugin.

# Leave Hostname unset so collectd detects it; the config file does not expand shell syntax such as $(hostname)
#Hostname "your-hostname"
FQDNLookup true
BaseDir "/var/lib/collectd"
# On AlmaLinux / Rocky Linux the plugin directory is usually /usr/lib64/collectd
PluginDir "/usr/lib/collectd"
TypesDB "/usr/share/collectd/types.db"

AutoLoadPlugin false
CollectInternalStats false

Interval 10
MaxReadInterval 86400
Timeout 2
ReadThreads 5
WriteThreads 5

# Logging
LoadPlugin syslog
<Plugin syslog>
    LogLevel info
</Plugin>

# System monitoring plugins
LoadPlugin cpu
LoadPlugin df
LoadPlugin disk
LoadPlugin interface
LoadPlugin load
LoadPlugin memory
LoadPlugin processes
LoadPlugin swap
LoadPlugin uptime
LoadPlugin users

# CPU plugin configuration
<Plugin cpu>
    ReportByCpu true
    ReportByState true
    ValuesPercentage true
</Plugin>

# Disk space monitoring
<Plugin df>
    MountPoint "/"
    MountPoint "/var"
    MountPoint "/tmp"
    FSType "ext4"
    FSType "xfs"
    IgnoreSelected false
    ReportByDevice false
    ReportReserved true
    ReportInodes true
    ValuesAbsolute true
    ValuesPercentage true
</Plugin>

# Disk I/O monitoring
<Plugin disk>
    Disk "/^[hsv]d[a-z]/"
    IgnoreSelected false
    UdevNameAttr "DEVNAME"
</Plugin>

# Network interface monitoring
<Plugin interface>
    Interface "eth0"
    Interface "ens3"
    Interface "lo"
    IgnoreSelected false
</Plugin>

# Memory detailed monitoring
<Plugin memory>
    ValuesAbsolute true
    ValuesPercentage true
</Plugin>

# Process monitoring (the InfluxDB server process is named influxd)
<Plugin processes>
    Process "collectd"
    Process "influxd"
    Process "sshd"
    Process "nginx"
    Process "apache2"
    ProcessMatch "java" "java.*"
</Plugin>

# Network plugin for sending to InfluxDB (the listener configured on port 25826)
LoadPlugin network
<Plugin network>
    <Server "127.0.0.1" "25826">
        SecurityLevel None
        Interface "lo"
    </Server>
    TimeToLive 128
    MaxPacketSize 1452
    Forward false
    CacheFlush 1800
    ReportStats true
</Plugin>
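
A quick way to review which plugins a configuration file enables is to list its uncommented LoadPlugin lines. This is a minimal sketch with a hypothetical helper name, `list_loaded_plugins`; it only recognizes the one-per-line `LoadPlugin name` form, not `<LoadPlugin>` blocks.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the plugin name from each uncommented LoadPlugin line.
list_loaded_plugins() {
    # awk ignores leading whitespace, so indented lines match too;
    # commented lines start with "#" and therefore never have LoadPlugin as the first field.
    awk '$1 == "LoadPlugin" { gsub(/"/, "", $2); print $2 }' "$1"
}
```

For example, `list_loaded_plugins /etc/collectd/collectd.conf` prints one plugin name per line, which is easy to compare against the measurements that later appear in InfluxDB.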

Configure collectd additional plugins

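Enable additional monitoring plugins for more comprehensive system visibility. Save them in /etc/collectd/collectd.conf.d/additional.conf and make sure the main configuration includes that directory (for example with an Include block, as the Debian default configuration does); otherwise collectd never reads the file.

# TCP connection monitoring
LoadPlugin tcpconns
<Plugin tcpconns>
    ListeningPorts true
    AllPortsSummary true
    LocalPort "22"
    LocalPort "80"
    LocalPort "443"
    LocalPort "8086"
</Plugin>

# Context switch monitoring
LoadPlugin contextswitch

# Entropy monitoring
LoadPlugin entropy

# IRQ monitoring
LoadPlugin irq
<Plugin irq>
    Irq 0
    Irq 1
    Irq 8
    IgnoreSelected false
</Plugin>

# Thermal monitoring
LoadPlugin thermal
<Plugin thermal>
    ForceUseProcfs false
    Device "thermal_zone0"
    Device "thermal_zone1"
    IgnoreSelected false
</Plugin>

# System statistics
LoadPlugin vmem
<Plugin vmem>
    Verbose true
</Plugin>

# File descriptor monitoring
LoadPlugin filecount
<Plugin filecount>
    # The directory to scan is not specified in this guide; replace the placeholder path
    <Directory "/path/to/directory">
        Instance "proc-fd"
        Name "fd"
        Recursive true
    </Directory>
</Plugin>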

Set proper file permissions

Configure correct ownership and permissions for collectd directories and files. The collectd daemon needs read access to system files and write access to its data directory.

Never use chmod 777. It gives every user on the system full access to your files. Instead, fix ownership with chown and use minimal permissions.

# The chown applies only if collectd runs as a dedicated collectd user;
# stock packages often run the daemon as root and may not create that account.
sudo chown -R collectd:collectd /var/lib/collectd
sudo chmod 755 /var/lib/collectd
sudo chmod 644 /etc/collectd/collectd.conf
sudo chmod 644 /etc/collectd/collectd.conf.d/additional.conf

Enable and start collectd

Start the collectd service and enable it to start automatically on system boot.

sudo systemctl enable --now collectd
sudo systemctl status collectd

Configure firewall rules

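Open the InfluxDB HTTP API port and allow collectd traffic to reach the InfluxDB listener. Because auth-enabled is false in the configuration above, expose port 8086 only to trusted networks.

# Ubuntu / Debian (ufw)
sudo ufw allow 8086/tcp comment "InfluxDB HTTP API"
sudo ufw allow from 127.0.0.1 to any port 25826 proto udp comment "collectd to InfluxDB"
sudo ufw reload
sudo ufw status

# AlmaLinux / Rocky Linux (firewalld)
# collectd reaches InfluxDB over the loopback interface, which firewalld does not
# filter by default, so only the HTTP API port needs a rule here.
sudo firewall-cmd --permanent --add-port=8086/tcp
sudo firewall-cmd --reload
sudo firewall-cmd --list-all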

Verify your setup

Test that collectd is successfully sending metrics to InfluxDB and that data is being stored correctly.

# Check service status
sudo systemctl status influxdb collectd

# Verify InfluxDB is receiving data
influx -execute "SHOW DATABASES"
influx -database collectd -execute "SHOW MEASUREMENTS LIMIT 10"

# Check recent CPU metrics
influx -database collectd -execute "SELECT mean(value) FROM cpu_value WHERE time > now() - 5m GROUP BY time(1m), host, instance"

# Monitor collectd logs
sudo journalctl -u collectd -f --lines=20

# Check collectd network statistics
# (collectdctl talks to the unixsock plugin, which the configuration above does not load)
sudo collectdctl listval | grep network
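
Beyond checking that measurements exist, it is useful to confirm that the newest point is recent. The sketch below is an assumption-laden illustration: `is_fresh` is a hypothetical helper, and the threshold of three collection intervals is an arbitrary choice, not a collectd or InfluxDB default.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the newest point is at most 3 collection intervals old.
# Arguments: timestamp of the last point (epoch seconds), current time (epoch seconds),
# collection interval (seconds).
is_fresh() {
    local last_ts=$1 now_ts=$2 interval=$3
    [ $(( now_ts - last_ts )) -le $(( interval * 3 )) ]
}
```

One way to obtain the timestamp, assuming the CSV output of the influx CLI carries the time in its second column, is `last_ts=$(influx -database collectd -precision s -format csv -execute "SELECT last(value) FROM cpu_value" | tail -n 1 | cut -d, -f2)`, followed by `is_fresh "$last_ts" "$(date +%s)" 10 || echo "metrics are stale"`.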

Configure retention and continuous queries

Set up data downsampling

Create continuous queries to automatically downsample high-resolution data into lower-resolution aggregates for long-term storage.

# Create continuous query for hourly aggregates
influx -execute "CREATE CONTINUOUS QUERY cq_1h ON collectd BEGIN SELECT mean(value) AS value INTO collectd.rp_1h.:MEASUREMENT FROM /.*/ GROUP BY time(1h), * END"

# Create continuous query for daily aggregates
influx -execute "CREATE CONTINUOUS QUERY cq_1d ON collectd BEGIN SELECT mean(value) AS value INTO collectd.rp_1d.:MEASUREMENT FROM collectd.rp_1h./.*/ GROUP BY time(1d), * END"

# List continuous queries
influx -execute "SHOW CONTINUOUS QUERIES"
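
Conceptually, each continuous query groups points into fixed time buckets aligned to the epoch and stores the mean of each bucket. The awk sketch below mimics that for a single series given as "epoch_seconds value" lines; it illustrates the bucketing only and is not how InfluxDB implements it. `downsample_mean` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: average values per time bucket.
# $1 = bucket width in seconds; stdin = lines of "epoch_seconds value".
downsample_mean() {
    awk -v w="$1" '
        {
            b = int($1 / w) * w      # start of the bucket this point falls into
            sum[b] += $2
            n[b]++
        }
        END {
            for (b in sum) printf "%s %.2f\n", b, sum[b] / n[b]
        }
    ' | sort -n
}

# Four samples that fall into two hourly buckets.
printf '0 10\n10 20\n3600 30\n3610 50\n' | downsample_mean 3600
# prints:
# 0 15.00
# 3600 40.00
```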

Monitor data retention

Set up monitoring for retention policy enforcement and data cleanup.

# Check shard information
influx -execute "SHOW SHARDS"

# Monitor database size
influx -execute "SELECT sum(diskBytes) FROM _internal..tsm1_filestore WHERE time > now() - 1h GROUP BY time(10m)"

# Verify retention policy application
influx -execute "SHOW RETENTION POLICIES ON collectd"

Performance optimization

Tune InfluxDB for metrics workload

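Tune InfluxDB for high-throughput metrics collection. Merge these keys into the existing [data], [coordinator], and [http] sections of the configuration file rather than appending new sections, since TOML does not allow a table to be defined twice, then restart InfluxDB.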

# Add these optimizations to the existing config
[data]
  # Increase cache sizes for better write performance
  cache-max-memory-size = "1g"
  cache-snapshot-memory-size = "25m"
  cache-snapshot-write-cold-duration = "10m"
  
  # Optimize compaction
  compact-full-write-cold-duration = "4h"
  compact-throughput = "48m"
  compact-throughput-burst = "48m"
  
  # TSM engine optimizations
  tsm-use-madv-willneed = true
  
[coordinator]
  # Optimize write performance
  write-timeout = "30s"
  max-concurrent-queries = 0
  
[http]
  # Increase connection limits
  max-connection-limit = 0
  max-enqueued-write-limit = 0
  enqueued-write-timeout = "30s"

Optimize collectd performance

Fine-tune collectd for minimal system impact while maintaining comprehensive monitoring.

# Performance optimizations
WriteThreads 8
WriteQueueLimitHigh 1000000
WriteQueueLimitLow 800000

# Network plugin optimizations
<Plugin network>
    <Server "127.0.0.1" "25826">
        SecurityLevel None
        Interface "lo"
    </Server>
    TimeToLive 128
    MaxPacketSize 1452
    Forward false
    CacheFlush 1800
    ReportStats false
</Plugin>

# Reduce disk plugin overhead
<Plugin disk>
    Disk "/^[hsv]d[a-z]/"
    IgnoreSelected false
    UdevNameAttr "DEVNAME"
    UseBSDName false
</Plugin>
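
The two write queue limits work together: according to the collectd documentation, once the queue length is between WriteQueueLimitLow and WriteQueueLimitHigh, new values are dropped with a probability that rises linearly until it reaches 100 percent. The sketch below computes that percentage for a given queue length under this reading of the documentation; `drop_percent` is a hypothetical helper and uses integer arithmetic.

```shell
#!/usr/bin/env bash
# Hypothetical helper: approximate drop probability (in percent) for a write queue length.
# Arguments: current queue length, WriteQueueLimitLow, WriteQueueLimitHigh.
drop_percent() {
    local queue_len=$1 low=$2 high=$3
    if [ "$queue_len" -le "$low" ]; then
        echo 0                       # below the low mark nothing is dropped
    elif [ "$queue_len" -ge "$high" ]; then
        echo 100                     # at or above the high mark everything is dropped
    else
        echo $(( (queue_len - low) * 100 / (high - low) ))
    fi
}

drop_percent 900000 800000 1000000   # halfway between the limits -> 50
```

With the values above, collectd starts shedding load at 800000 queued values instead of growing memory without bound when InfluxDB is unreachable.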

Common issues

Symptom | Cause | Fix
--- | --- | ---
collectd fails to start | Configuration syntax error | Run sudo collectd -t -C /etc/collectd/collectd.conf to test the config
No data in InfluxDB | Network plugin not configured | Check the network plugin config and port 25826 connectivity
InfluxDB connection refused | Service not running or firewall blocking | Run sudo systemctl status influxdb and check firewall rules
High CPU usage from collectd | Too many plugins or short interval | Increase the interval to 30s and disable unused plugins
Permission denied errors | Wrong file ownership | sudo chown -R collectd:collectd /var/lib/collectd
Missing measurements in InfluxDB | types.db file missing | Install the collectd-core package or verify the TypesDB path
Data not being retained properly | Retention policies not applied | Check that retention policies exist and continuous queries are running

