Linux System Resource Monitoring with Alerts

Set up comprehensive Linux system monitoring with Prometheus, Node Exporter, and Alertmanager to track CPU, memory, and disk usage with automated alerts and response scripts for proactive system management.

Prerequisites

Root or sudo access
At least 2GB RAM
Basic understanding of systemd services
Email server access for notifications (optional)

What this solves

System resource monitoring prevents downtime by alerting you when CPU, memory, disk, or network usage reaches critical thresholds. This tutorial sets up Prometheus with Node Exporter for metrics collection, Alertmanager for notifications, and custom scripts for automated responses like service restarts or resource cleanup.

Step-by-step installation

Update system packages

Start by updating your package manager to ensure you get the latest versions of all dependencies.

# Debian/Ubuntu
sudo apt update && sudo apt upgrade -y

# RHEL, Rocky Linux, AlmaLinux, Fedora
sudo dnf update -y

Create system users for monitoring services

Create dedicated users for Prometheus, Node Exporter, and Alertmanager to run services securely without root privileges.

sudo useradd --no-create-home --shell /bin/false prometheus
sudo useradd --no-create-home --shell /bin/false node_exporter
sudo useradd --no-create-home --shell /bin/false alertmanager

Create directory structure

Set up the directory structure for configuration files, data storage, and binaries with correct permissions.

sudo mkdir -p /etc/prometheus /var/lib/prometheus /opt/prometheus
sudo mkdir -p /etc/alertmanager /var/lib/alertmanager
sudo mkdir -p /opt/node_exporter
sudo mkdir -p /opt/monitoring-scripts
sudo chown prometheus:prometheus /etc/prometheus /var/lib/prometheus
sudo chown alertmanager:alertmanager /etc/alertmanager /var/lib/alertmanager

Download and install Prometheus

Download Prometheus 2.48.0, the release pinned in this tutorial, and copy its binaries to /usr/local/bin. Check the project's releases page if you want a newer version.

cd /tmp
wget https://github.com/prometheus/prometheus/releases/download/v2.48.0/prometheus-2.48.0.linux-amd64.tar.gz
tar xvf prometheus-2.48.0.linux-amd64.tar.gz
sudo cp prometheus-2.48.0.linux-amd64/prometheus /usr/local/bin/
sudo cp prometheus-2.48.0.linux-amd64/promtool /usr/local/bin/
sudo chown prometheus:prometheus /usr/local/bin/prometheus /usr/local/bin/promtool
sudo chmod 755 /usr/local/bin/prometheus /usr/local/bin/promtool
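
Before copying binaries into place, it can be worth comparing the downloaded archive with the SHA-256 value published alongside the release (the release assets include a sha256sums.txt file). The following is a minimal sketch under that assumption; the helper names `sha256_of` and `matches_published` are hypothetical and not part of the original steps.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published_hex):
    """Compare a file's digest with a published value (case-insensitive)."""
    return sha256_of(path) == published_hex.strip().lower()
```

A call such as `matches_published("/tmp/prometheus-2.48.0.linux-amd64.tar.gz", value_from_sha256sums)` would return True only when the archive is intact; the second argument is whatever value you copied from the release page.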

Download and install Node Exporter

Install Node Exporter to collect system metrics like CPU, memory, disk, and network statistics.

cd /tmp
wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz
tar xvf node_exporter-1.7.0.linux-amd64.tar.gz
sudo cp node_exporter-1.7.0.linux-amd64/node_exporter /usr/local/bin/
sudo chown node_exporter:node_exporter /usr/local/bin/node_exporter
sudo chmod 755 /usr/local/bin/node_exporter

Download and install Alertmanager

Install Alertmanager to handle alerts from Prometheus and send notifications via email, Slack, or other channels.

cd /tmp
wget https://github.com/prometheus/alertmanager/releases/download/v0.26.0/alertmanager-0.26.0.linux-amd64.tar.gz
tar xvf alertmanager-0.26.0.linux-amd64.tar.gz
sudo cp alertmanager-0.26.0.linux-amd64/alertmanager /usr/local/bin/
sudo cp alertmanager-0.26.0.linux-amd64/amtool /usr/local/bin/
sudo chown alertmanager:alertmanager /usr/local/bin/alertmanager /usr/local/bin/amtool
sudo chmod 755 /usr/local/bin/alertmanager /usr/local/bin/amtool

Configure Prometheus

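Create the main Prometheus configuration file, /etc/prometheus/prometheus.yml, with scrape targets for Prometheus itself and Node Exporter, the Alertmanager address, and a reference to the alerting rules file.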

global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "alert_rules.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - localhost:9093

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node_exporter'
    static_configs:
      - targets: ['localhost:9100']
    scrape_interval: 5s

Create alerting rules

Define alert rules in /etc/prometheus/alert_rules.yml for critical system resources, including CPU, memory, disk usage, and system load.

groups:
- name: system_alerts
  rules:
  - alert: HighCPUUsage
    expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 85
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "High CPU usage on {{ $labels.instance }}"
      description: "CPU usage is above 85% for more than 2 minutes. Current value: {{ $value }}%"

  - alert: HighMemoryUsage
    expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 90
    for: 2m
    labels:
      severity: critical
    annotations:
      summary: "High memory usage on {{ $labels.instance }}"
      description: "Memory usage is above 90%. Current value: {{ $value }}%"

  - alert: HighDiskUsage
    expr: (1 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"})) * 100 > 85
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "High disk usage on {{ $labels.instance }}"
      description: "Root filesystem usage is above 85%. Current value: {{ $value }}%"

  - alert: HighSystemLoad
    expr: node_load15 > 2
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High system load on {{ $labels.instance }}"
      description: "15-minute load average is above 2. Current value: {{ $value }}"

  - alert: SystemDown
    expr: up == 0
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Instance {{ $labels.instance }} is down"
      description: "{{ $labels.instance }} has been down for more than 1 minute"
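
The `for:` field in each rule means the condition must hold continuously before the alert fires; until then the alert is only pending. The sketch below simulates that behaviour over a series of evaluations. It is an illustrative model, not Prometheus code, and the function name `alert_states` and its parameters are assumptions made for this example.

```python
def alert_states(samples, threshold, for_seconds, interval_seconds):
    """Return 'inactive', 'pending' or 'firing' for each rule evaluation.

    samples holds one metric value per evaluation. The alert fires once
    the condition has held for at least for_seconds without interruption.
    """
    states = []
    active_since = None  # index of the first evaluation in the current breach
    for i, value in enumerate(samples):
        if value > threshold:
            if active_since is None:
                active_since = i
            held = (i - active_since) * interval_seconds
            states.append("firing" if held >= for_seconds else "pending")
        else:
            # Any evaluation below the threshold resets the timer.
            active_since = None
            states.append("inactive")
    return states
```

With the 15s evaluation interval configured earlier and `for: 2m`, a breach has to persist for eight further evaluations after it is first seen before a notification is sent.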

Configure Alertmanager

Create /etc/alertmanager/alertmanager.yml with email notifications and webhook integration for automated responses. Replace the SMTP host, addresses, and password with your own values.

global:
  smtp_smarthost: 'localhost:587'
  smtp_from: 'alerts@example.com'
  smtp_auth_username: 'alerts@example.com'
  smtp_auth_password: 'your-smtp-password'

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'default-receiver'
  routes:
  - match:
      severity: critical
    receiver: 'critical-alerts'
  - match:
      severity: warning
    receiver: 'warning-alerts'

receivers:
- name: 'default-receiver'
  email_configs:
  - to: 'admin@example.com'
    headers:
      Subject: 'Alert: {{ .GroupLabels.alertname }}'
    text: |
      {{ range .Alerts }}
      Alert: {{ .Annotations.summary }}
      Description: {{ .Annotations.description }}
      Instance: {{ .Labels.instance }}
      {{ end }}

- name: 'critical-alerts'
  email_configs:
  - to: 'admin@example.com'
    headers:
      Subject: 'CRITICAL: {{ .GroupLabels.alertname }}'
    text: |
      {{ range .Alerts }}
      CRITICAL ALERT: {{ .Annotations.summary }}
      Description: {{ .Annotations.description }}
      Instance: {{ .Labels.instance }}
      {{ end }}
  webhook_configs:
  - url: 'http://localhost:8080/webhook/critical'
    send_resolved: true

- name: 'warning-alerts'
  email_configs:
  - to: 'admin@example.com'
    headers:
      Subject: 'WARNING: {{ .GroupLabels.alertname }}'
    text: |
      {{ range .Alerts }}
      Warning: {{ .Annotations.summary }}
      Description: {{ .Annotations.description }}
      Instance: {{ .Labels.instance }}
      {{ end }}
  webhook_configs:
  - url: 'http://localhost:8080/webhook/warning'
    send_resolved: true
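
The routing tree above sends each alert to the first child route whose label match succeeds and falls back to the default receiver otherwise. A small model of that decision is sketched here; it is a simplification that ignores nested routes, regex matchers, and the `continue` option, and the names `ROUTES` and `pick_receiver` are hypothetical.

```python
# Child routes in the order they appear in the configuration.
ROUTES = [
    ({"severity": "critical"}, "critical-alerts"),
    ({"severity": "warning"}, "warning-alerts"),
]
DEFAULT_RECEIVER = "default-receiver"


def pick_receiver(labels):
    """Return the receiver of the first route whose matchers all equal the alert's labels."""
    for matchers, receiver in ROUTES:
        if all(labels.get(key) == value for key, value in matchers.items()):
            return receiver
    return DEFAULT_RECEIVER
```

For example, `pick_receiver({"alertname": "HighDiskUsage", "severity": "warning"})` selects the warning receiver, which is why the severity label on each rule decides which webhook path is called.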

Create automated response scripts

Create scripts that respond to alerts automatically by freeing disk space, restarting services, or trimming logs. Save the disk cleanup script as /opt/monitoring-scripts/cleanup-disk.sh.

#!/bin/bash
# Automated disk cleanup script

LOG_FILE="/var/log/monitoring-cleanup.log"
echo "$(date): Starting automated disk cleanup" >> "$LOG_FILE"

# Clear temporary files not accessed in the last 7 days
find /tmp -type f -atime +7 -delete 2>/dev/null
echo "$(date): Cleared old temporary files" >> "$LOG_FILE"

# Remove archived journal entries older than 30 days
journalctl --vacuum-time=30d
echo "$(date): Cleaned journal logs" >> "$LOG_FILE"

# Clear package cache (apt on Debian/Ubuntu, dnf on RHEL-family systems)
apt clean 2>/dev/null || dnf clean all 2>/dev/null
echo "$(date): Cleared package cache" >> "$LOG_FILE"

# Check disk usage after cleanup
DISK_USAGE=$(df / | awk 'NR==2 {print $5}' | sed 's/%//')
echo "$(date): Disk usage after cleanup: ${DISK_USAGE}%" >> "$LOG_FILE"

if [ "$DISK_USAGE" -lt 80 ]; then
    echo "$(date): Disk cleanup successful" >> "$LOG_FILE"
else
    echo "$(date): WARNING: Disk usage still high after cleanup" >> "$LOG_FILE"
fi
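
The final check in the cleanup script parses `df` output. The same figure can be read without text parsing, as in this minimal sketch. It assumes the percentage is defined as used / (used + available), and the helper names `disk_usage_percent` and `needs_cleanup` are hypothetical.

```python
import shutil


def disk_usage_percent(path="/"):
    """Percentage of space in use, computed as used / (used + free)."""
    usage = shutil.disk_usage(path)
    denominator = usage.used + usage.free
    if denominator == 0:
        return 0.0  # pseudo-filesystems can report no capacity
    return 100.0 * usage.used / denominator


def needs_cleanup(percent, threshold=80.0):
    """Mirror the script's check: below the threshold counts as healthy."""
    return percent >= threshold
```

The 80 percent default matches the threshold the shell script uses after cleanup.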

Create service restart script

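Create /opt/monitoring-scripts/restart-services.sh to restart services when memory usage is critical and to terminate runaway processes when CPU usage is high. Adjust the service names to match what actually runs on your host.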

#!/bin/bash
# Automated service restart script

LOG_FILE="/var/log/monitoring-restarts.log"
ALERT_TYPE=$1

echo "$(date): Received $ALERT_TYPE alert, checking system status" >> "$LOG_FILE"

# Get current system metrics
CPU_USAGE=$(top -bn1 | grep "Cpu(s)" | awk '{print $2}' | sed 's/%us,//')
MEM_USAGE=$(free | grep Mem | awk '{printf("%.2f", $3/$2 * 100.0)}')
LOAD_AVG=$(uptime | awk -F'load average:' '{print $2}' | awk '{print $1}' | sed 's/,//')

echo "$(date): CPU: ${CPU_USAGE}%, Memory: ${MEM_USAGE}%, Load: ${LOAD_AVG}" >> "$LOG_FILE"

# Restart high-memory services if memory is critical
if (( $(echo "$MEM_USAGE > 90" | bc -l) )); then
    echo "$(date): Critical memory usage, restarting services" >> "$LOG_FILE"
    systemctl restart apache2 2>/dev/null || systemctl restart nginx 2>/dev/null
    systemctl restart mysql 2>/dev/null || systemctl restart mariadb 2>/dev/null
    echo "$(date): Services restarted" >> "$LOG_FILE"
fi

# Terminate processes consuming excessive CPU
if (( $(echo "$CPU_USAGE > 90" | bc -l) )); then
    echo "$(date): High CPU usage detected, checking for runaway processes" >> "$LOG_FILE"
    # Send SIGTERM to up to three non-root processes using more than 50% CPU
    ps aux --sort=-%cpu | awk 'NR>1 && $3>50 && $1!="root" {print $2}' | head -3 | xargs -r kill -TERM
    echo "$(date): Terminated high CPU processes" >> "$LOG_FILE"
fi
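
The restart script derives memory usage from the used column of `free`, which can differ from the MemAvailable-based figure that the HighMemoryUsage rule evaluates. The sketch below computes the rule's quantity from /proc/meminfo-style text so the two can be compared. The function names are hypothetical and the code is an illustration, not a replacement for the script.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value in kB}."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def memory_used_percent(fields):
    """(1 - MemAvailable / MemTotal) * 100, the expression used by the alert rule."""
    return (1 - fields["MemAvailable"] / fields["MemTotal"]) * 100
```

On a live system the input would come from `open("/proc/meminfo").read()`.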

Create webhook server for automated responses

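Set up a simple webhook server, saved as /opt/monitoring-scripts/webhook-server.py, that receives alerts from Alertmanager and triggers the response scripts. Note that with the rules defined earlier only HighMemoryUsage and SystemDown carry severity critical, so HighDiskUsage and HighCPUUsage are delivered to /webhook/warning, which only logs them. Raise their severity or extend the warning branch if you want those alerts to trigger a script.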

#!/usr/bin/env python3
import json
import subprocess
from http.server import HTTPServer, BaseHTTPRequestHandler
import logging

logging.basicConfig(filename='/var/log/monitoring-webhook.log', level=logging.INFO,
                   format='%(asctime)s - %(levelname)s - %(message)s')

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path == '/webhook/critical':
            content_length = int(self.headers['Content-Length'])
            post_data = self.rfile.read(content_length)
            
            try:
                alert_data = json.loads(post_data.decode('utf-8'))
                logging.info(f"Received critical alert: {alert_data}")
                
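                for alert in alert_data.get('alerts', []):
                    # Skip resolved notifications (send_resolved is enabled in Alertmanager)
                    if alert.get('status') != 'firing':
                        continue
                    alert_name = alert.get('labels', {}).get('alertname', '')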
                    
                    if alert_name == 'HighDiskUsage':
                        subprocess.run(['/opt/monitoring-scripts/cleanup-disk.sh'], 
                                     check=False)
                        logging.info("Triggered disk cleanup for HighDiskUsage alert")
                    
                    elif alert_name in ['HighMemoryUsage', 'HighCPUUsage']:
                        subprocess.run(['/opt/monitoring-scripts/restart-services.sh', 'critical'], 
                                     check=False)
                        logging.info(f"Triggered service restart for {alert_name} alert")
                
                self.send_response(200)
                self.end_headers()
                self.wfile.write(b'Alert processed')
                
            except Exception as e:
                logging.error(f"Error processing alert: {e}")
                self.send_response(500)
                self.end_headers()
        
        elif self.path == '/webhook/warning':
            content_length = int(self.headers['Content-Length'])
            post_data = self.rfile.read(content_length)
            
            try:
                alert_data = json.loads(post_data.decode('utf-8'))
                logging.info(f"Received warning alert: {alert_data}")
                
                self.send_response(200)
                self.end_headers()
                self.wfile.write(b'Warning logged')
                
            except Exception as e:
                logging.error(f"Error processing warning: {e}")
                self.send_response(500)
                self.end_headers()
        
        else:
            self.send_response(404)
            self.end_headers()

if __name__ == '__main__':
    server = HTTPServer(('localhost', 8080), WebhookHandler)
    logging.info("Webhook server starting on port 8080")
    server.serve_forever()
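
To exercise the webhook server without waiting for a real alert, a test client along the following lines could be used. This is a hedged sketch: the payload contains only a reduced subset of the fields Alertmanager sends, and the helper names `build_test_payload` and `post_payload` are hypothetical.

```python
import json
import urllib.request


def build_test_payload(alertname, severity, instance="localhost:9100"):
    """Build a reduced Alertmanager-style webhook body for manual testing."""
    return {
        "version": "4",
        "status": "firing",
        "alerts": [
            {
                "status": "firing",
                "labels": {
                    "alertname": alertname,
                    "severity": severity,
                    "instance": instance,
                },
                "annotations": {"summary": f"Test alert {alertname}"},
            }
        ],
    }


def post_payload(payload, url="http://localhost:8080/webhook/warning"):
    """POST the payload as JSON and return the HTTP status code."""
    body = json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

Posting to the warning path only writes a log entry. Posting a HighMemoryUsage payload to /webhook/critical runs the restart script for real, so try that only on a machine where restarting services is acceptable.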

Set permissions for monitoring scripts

Make the monitoring scripts executable and set proper ownership for security.

sudo chmod 755 /opt/monitoring-scripts/*.sh
sudo chmod 755 /opt/monitoring-scripts/webhook-server.py
sudo chown root:root /opt/monitoring-scripts/*

Never use chmod 777. It gives every user on the system full access to your files. Mode 755 lets only the owner (root) modify the scripts, while other users can read and execute them.
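
To audit the script directory for overly permissive modes, a check along these lines could be used. It is a minimal sketch; the helper name `is_world_writable` is hypothetical and not part of the original setup.

```python
import os
import stat


def is_world_writable(path):
    """True if users other than the owner and group may write to the path."""
    return bool(os.stat(path).st_mode & stat.S_IWOTH)
```

A file with mode 755 yields False, while one with mode 777 yields True.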

Install Python dependencies for webhook server

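Install Python 3 for the webhook server and bc, which the restart script uses for floating-point comparisons. The webhook server shown above imports only standard-library modules, so no additional pip packages are required.

# Debian/Ubuntu
sudo apt install -y python3 bc

# RHEL, Rocky Linux, AlmaLinux, Fedora
sudo dnf install -y python3 bc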

Create systemd service files

Create a systemd unit file for each component: Prometheus, Node Exporter, Alertmanager, and the webhook server. Start with Prometheus in /etc/systemd/system/prometheus.service.

[Unit]
Description=Prometheus Server
Documentation=https://prometheus.io/docs/
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus \
  --web.console.templates=/etc/prometheus/consoles \
  --web.console.libraries=/etc/prometheus/console_libraries \
  --web.listen-address=0.0.0.0:9090 \
  --web.enable-lifecycle
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target

Create Node Exporter service

Create /etc/systemd/system/node_exporter.service so Node Exporter runs as a service and collects system metrics.

[Unit]
Description=Node Exporter
After=network.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter \
  --web.listen-address=:9100 \
  --collector.systemd \
  --collector.processes
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target

Create Alertmanager service

Create /etc/systemd/system/alertmanager.service so Alertmanager runs as a service and handles alert notifications.

[Unit]
Description=Alertmanager
After=network-online.target

[Service]
User=alertmanager
Group=alertmanager
Type=simple
ExecStart=/usr/local/bin/alertmanager \
  --config.file=/etc/alertmanager/alertmanager.yml \
  --storage.path=/var/lib/alertmanager \
  --web.listen-address=:9093
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target

Create webhook server service

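Create /etc/systemd/system/monitoring-webhook.service for the automated response webhook server. It runs as root because the response scripts restart services and delete files.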

[Unit]
Description=Monitoring Webhook Server
After=network.target

[Service]
Type=simple
User=root
ExecStart=/usr/bin/python3 /opt/monitoring-scripts/webhook-server.py
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target

Set correct file ownership

Ensure all configuration files have the correct ownership for their respective services.

sudo chown prometheus:prometheus /etc/prometheus/prometheus.yml
sudo chown prometheus:prometheus /etc/prometheus/alert_rules.yml
sudo chown alertmanager:alertmanager /etc/alertmanager/alertmanager.yml

Configure firewall rules

Open the ports for the Prometheus (9090) and Alertmanager (9093) web interfaces. Port 9100 is only needed if a Prometheus server on another host scrapes this machine's Node Exporter. The webhook server listens on localhost only, so port 8080 does not need a firewall rule.

# Debian/Ubuntu (ufw)
sudo ufw allow 9090/tcp comment "Prometheus"
sudo ufw allow 9100/tcp comment "Node Exporter"
sudo ufw allow 9093/tcp comment "Alertmanager"

# RHEL, Rocky Linux, AlmaLinux, Fedora (firewalld)
sudo firewall-cmd --permanent --add-port=9090/tcp --add-port=9100/tcp --add-port=9093/tcp
sudo firewall-cmd --reload

Enable and start services

Reload systemd, enable all monitoring services to start on boot, and start them immediately.

sudo systemctl daemon-reload
sudo systemctl enable --now prometheus
sudo systemctl enable --now node_exporter
sudo systemctl enable --now alertmanager
sudo systemctl enable --now monitoring-webhook
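
Once the services are started, a quick way to confirm that each one is listening is to attempt a TCP connection to its port. The sketch below does that for the ports used in this tutorial. The names `PORTS`, `port_open`, and `check_services` are hypothetical, and a successful connection only shows that something is listening, not that the service is healthy.

```python
import socket

# Ports as configured in this tutorial.
PORTS = {
    "prometheus": 9090,
    "node_exporter": 9100,
    "alertmanager": 9093,
    "monitoring-webhook": 8080,
}


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host="localhost", ports=PORTS):
    """Map each service name to whether its port accepts connections."""
    return {name: port_open(host, port) for name, port in ports.items()}
```

Calling `check_services()` on the monitored host returns a dictionary keyed by service name.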

Create monitoring dashboards and notifications

Install and configure Grafana

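Add the Grafana repository and install Grafana to build visual dashboards from your monitoring data. The Debian/Ubuntu commands store the signing key in a keyring file because apt-key is deprecated.

# Debian/Ubuntu
sudo mkdir -p /etc/apt/keyrings
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt update
sudo apt install -y grafana

# RHEL, Rocky Linux, AlmaLinux, Fedora
sudo dnf install -y https://dl.grafana.com/oss/release/grafana-10.2.2-1.x86_64.rpm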

Configure Grafana data source

Create a provisioning file under /etc/grafana/provisioning/datasources/ (for example prometheus.yml) so Grafana connects to your Prometheus instance automatically.

apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
    editable: true

Enable and start Grafana

Start Grafana service and configure it to start automatically on boot.

sudo systemctl enable --now grafana-server
sudo systemctl status grafana-server

Verify your setup

Check that all services are running and accessible through their web interfaces.

# Check service status
sudo systemctl status prometheus node_exporter alertmanager monitoring-webhook

# Verify Prometheus is collecting metrics
curl -s 'http://localhost:9090/api/v1/query?query=up'

# Check Node Exporter metrics
curl -s http://localhost:9100/metrics | head -20

# Test Alertmanager
curl -s http://localhost:9093/api/v2/status

# Check webhook server
curl -s http://localhost:8080/webhook/warning -X POST -d '{}'

# View recent logs
sudo journalctl -u prometheus --since "10 minutes ago" --no-pager
sudo journalctl -u alertmanager --since "10 minutes ago" --no-pager
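
The `up` query returns JSON with one series per scrape target, where a value of 1 means the last scrape succeeded. A small helper like the one below could pick out failed targets from that response. It is a sketch that assumes the standard instant-query response shape, and the function name `down_targets` is hypothetical.

```python
def down_targets(response):
    """Return (job, instance) pairs whose `up` sample is 0."""
    if response.get("status") != "success":
        raise ValueError("query did not succeed")
    down = []
    for series in response["data"]["result"]:
        _timestamp, value = series["value"]  # the value arrives as a string
        if float(value) == 0:
            metric = series["metric"]
            down.append((metric.get("job"), metric.get("instance")))
    return down
```

The input is the parsed JSON body, for instance the result of `json.loads()` applied to the curl output above.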

Access the web interfaces:

Prometheus: http://your-server:9090
Alertmanager: http://your-server:9093
Grafana: http://your-server:3000 (admin/admin)

Common issues

Symptom	Cause	Fix
Prometheus won't start	Configuration syntax error	`sudo /usr/local/bin/promtool check config /etc/prometheus/prometheus.yml`
Node Exporter not collecting metrics	Permission issues or wrong user	`sudo chown node_exporter:node_exporter /usr/local/bin/node_exporter`
Alerts not firing	Alert rules not loaded	`sudo /usr/local/bin/promtool check rules /etc/prometheus/alert_rules.yml`
Email notifications not working	SMTP configuration error	Check SMTP settings in `/etc/alertmanager/alertmanager.yml`
Webhook server not responding	Service not running or port 8080 already in use	Check `sudo journalctl -u monitoring-webhook --no-pager` and restart the service
Can't access web interfaces	Firewall blocking ports	Verify firewall rules: `sudo ufw status` or `sudo firewall-cmd --list-ports`

Next steps

Install and configure Grafana with Prometheus for system monitoring to create advanced dashboards
Monitor container performance with Prometheus and cAdvisor for comprehensive metrics collection to extend monitoring to containers
Configure network monitoring with SNMP and Grafana dashboards for infrastructure visibility to monitor network devices
Setup centralized log aggregation with Elasticsearch 8, Logstash 8, and Kibana 8 (ELK Stack) for comprehensive log analysis
Configure Prometheus Alertmanager with Slack integration for team notifications

Automated install script

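Run this to automate the Prometheus, Node Exporter, and Alertmanager setup. The script does not install the response scripts, the webhook server, or Grafana, and it writes a reduced set of alert rules.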

install.sh

#!/usr/bin/env bash

set -euo pipefail

# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'

# Versions
PROMETHEUS_VERSION="2.48.0"
NODE_EXPORTER_VERSION="1.7.0"
ALERTMANAGER_VERSION="0.26.0"

# Function to print colored output
print_status() {
    echo -e "${GREEN}[INFO]${NC} $1"
}

print_warning() {
    echo -e "${YELLOW}[WARNING]${NC} $1"
}

print_error() {
    echo -e "${RED}[ERROR]${NC} $1"
}

# Cleanup function
cleanup() {
    if [[ $? -ne 0 ]]; then
        print_error "Installation failed. Cleaning up..."
        systemctl stop prometheus node_exporter alertmanager 2>/dev/null || true
        systemctl disable prometheus node_exporter alertmanager 2>/dev/null || true
        # userdel accepts one login name per call
        for u in prometheus node_exporter alertmanager; do userdel "$u" 2>/dev/null || true; done
        rm -rf /etc/prometheus /var/lib/prometheus /etc/alertmanager /var/lib/alertmanager /opt/monitoring-scripts
        rm -f /usr/local/bin/{prometheus,promtool,node_exporter,alertmanager,amtool}
        rm -f /etc/systemd/system/{prometheus,node_exporter,alertmanager}.service
    fi
}

trap cleanup ERR

# Check if running as root or with sudo
if [[ $EUID -ne 0 ]]; then
    print_error "This script must be run as root or with sudo"
    exit 1
fi

# Auto-detect distribution
if [ -f /etc/os-release ]; then
    . /etc/os-release
    case "$ID" in
        ubuntu|debian) 
            PKG_MGR="apt"
            PKG_UPDATE="apt update -y"
            PKG_INSTALL="apt install -y"
            PKG_UPGRADE="apt upgrade -y"
            ;;
        almalinux|rocky|centos|rhel|ol|fedora) 
            PKG_MGR="dnf"
            PKG_UPDATE="dnf update -y"
            PKG_INSTALL="dnf install -y"
            PKG_UPGRADE="dnf upgrade -y"
            ;;
        amzn) 
            PKG_MGR="yum"
            PKG_UPDATE="yum update -y"
            PKG_INSTALL="yum install -y"
            PKG_UPGRADE="yum upgrade -y"
            ;;
        *) 
            print_error "Unsupported distribution: $ID"
            exit 1 
            ;;
    esac
else
    print_error "Cannot detect distribution. /etc/os-release not found."
    exit 1
fi

print_status "Starting Prometheus monitoring stack installation on $PRETTY_NAME"

# Step 1: Update system
echo "[1/12] Updating system packages..."
$PKG_UPDATE
$PKG_UPGRADE
$PKG_INSTALL wget tar

# Step 2: Create system users
echo "[2/12] Creating system users..."
useradd --no-create-home --shell /bin/false prometheus 2>/dev/null || true
useradd --no-create-home --shell /bin/false node_exporter 2>/dev/null || true
useradd --no-create-home --shell /bin/false alertmanager 2>/dev/null || true

# Step 3: Create directory structure
echo "[3/12] Creating directory structure..."
mkdir -p /etc/prometheus /var/lib/prometheus /opt/prometheus
mkdir -p /etc/alertmanager /var/lib/alertmanager
mkdir -p /opt/node_exporter /opt/monitoring-scripts
chown prometheus:prometheus /etc/prometheus /var/lib/prometheus
chown alertmanager:alertmanager /etc/alertmanager /var/lib/alertmanager

# Step 4: Download and install Prometheus
echo "[4/12] Installing Prometheus..."
cd /tmp
wget -q "https://github.com/prometheus/prometheus/releases/download/v${PROMETHEUS_VERSION}/prometheus-${PROMETHEUS_VERSION}.linux-amd64.tar.gz"
tar xzf "prometheus-${PROMETHEUS_VERSION}.linux-amd64.tar.gz"
cp "prometheus-${PROMETHEUS_VERSION}.linux-amd64/prometheus" /usr/local/bin/
cp "prometheus-${PROMETHEUS_VERSION}.linux-amd64/promtool" /usr/local/bin/
chown prometheus:prometheus /usr/local/bin/prometheus /usr/local/bin/promtool
chmod 755 /usr/local/bin/prometheus /usr/local/bin/promtool

# Step 5: Download and install Node Exporter
echo "[5/12] Installing Node Exporter..."
wget -q "https://github.com/prometheus/node_exporter/releases/download/v${NODE_EXPORTER_VERSION}/node_exporter-${NODE_EXPORTER_VERSION}.linux-amd64.tar.gz"
tar xzf "node_exporter-${NODE_EXPORTER_VERSION}.linux-amd64.tar.gz"
cp "node_exporter-${NODE_EXPORTER_VERSION}.linux-amd64/node_exporter" /usr/local/bin/
chown node_exporter:node_exporter /usr/local/bin/node_exporter
chmod 755 /usr/local/bin/node_exporter

# Step 6: Download and install Alertmanager
echo "[6/12] Installing Alertmanager..."
wget -q "https://github.com/prometheus/alertmanager/releases/download/v${ALERTMANAGER_VERSION}/alertmanager-${ALERTMANAGER_VERSION}.linux-amd64.tar.gz"
tar xzf "alertmanager-${ALERTMANAGER_VERSION}.linux-amd64.tar.gz"
cp "alertmanager-${ALERTMANAGER_VERSION}.linux-amd64/alertmanager" /usr/local/bin/
cp "alertmanager-${ALERTMANAGER_VERSION}.linux-amd64/amtool" /usr/local/bin/
chown alertmanager:alertmanager /usr/local/bin/alertmanager /usr/local/bin/amtool
chmod 755 /usr/local/bin/alertmanager /usr/local/bin/amtool

# Step 7: Configure Prometheus
echo "[7/12] Configuring Prometheus..."
cat > /etc/prometheus/prometheus.yml << 'EOF'
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "alert_rules.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - localhost:9093

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node_exporter'
    static_configs:
      - targets: ['localhost:9100']
    scrape_interval: 5s
EOF

# Step 8: Create alerting rules
echo "[8/12] Creating alerting rules..."
cat > /etc/prometheus/alert_rules.yml << 'EOF'
groups:
- name: system_alerts
  rules:
  - alert: HighCPUUsage
    expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 85
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "High CPU usage on {{ $labels.instance }}"
      
  - alert: HighMemoryUsage
    expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 90
    for: 2m
    labels:
      severity: critical
    annotations:
      summary: "High memory usage on {{ $labels.instance }}"
      
  - alert: DiskSpaceUsage
    expr: (1 - (node_filesystem_avail_bytes{fstype!="tmpfs"} / node_filesystem_size_bytes{fstype!="tmpfs"})) * 100 > 85
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "High disk usage on {{ $labels.instance }}"
EOF

# Step 9: Configure Alertmanager
echo "[9/12] Configuring Alertmanager..."
cat > /etc/alertmanager/alertmanager.yml << 'EOF'
global:
  smtp_smarthost: 'localhost:587'

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'console-notifications'

receivers:
- name: 'console-notifications'
  webhook_configs:
  - url: 'http://localhost:8080/webhook'
EOF

chown prometheus:prometheus /etc/prometheus/prometheus.yml /etc/prometheus/alert_rules.yml
chown alertmanager:alertmanager /etc/alertmanager/alertmanager.yml
chmod 644 /etc/prometheus/prometheus.yml /etc/prometheus/alert_rules.yml /etc/alertmanager/alertmanager.yml

# Step 10: Create systemd services
echo "[10/12] Creating systemd services..."
cat > /etc/systemd/system/prometheus.service << 'EOF'
[Unit]
Description=Prometheus
After=network.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus --config.file=/etc/prometheus/prometheus.yml --storage.tsdb.path=/var/lib/prometheus/ --web.console.templates=/etc/prometheus/consoles --web.console.libraries=/etc/prometheus/console_libraries --web.listen-address=0.0.0.0:9090
Restart=always

[Install]
WantedBy=multi-user.target
EOF

cat > /etc/systemd/system/node_exporter.service << 'EOF'
[Unit]
Description=Node Exporter
After=network.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter --web.listen-address=0.0.0.0:9100
Restart=always

[Install]
WantedBy=multi-user.target
EOF

cat > /etc/systemd/system/alertmanager.service << 'EOF'
[Unit]
Description=Alertmanager
After=network.target

[Service]
User=alertmanager
Group=alertmanager
Type=simple
ExecStart=/usr/local/bin/alertmanager --config.file=/etc/alertmanager/alertmanager.yml --storage.path=/var/lib/alertmanager/ --web.listen-address=0.0.0.0:9093
Restart=always

[Install]
WantedBy=multi-user.target
EOF

# Step 11: Start and enable services
echo "[11/12] Starting services..."
systemctl daemon-reload
systemctl enable prometheus node_exporter alertmanager
systemctl start prometheus node_exporter alertmanager

# Step 12: Configure firewall
echo "[12/12] Configuring firewall..."
if command -v ufw >/dev/null 2>&1; then
    ufw allow 9090/tcp
    ufw allow 9100/tcp
    ufw allow 9093/tcp
elif command -v firewall-cmd >/dev/null 2>&1; then
    firewall-cmd --permanent --add-port=9090/tcp
    firewall-cmd --permanent --add-port=9100/tcp
    firewall-cmd --permanent --add-port=9093/tcp
    firewall-cmd --reload
fi

# Verification
echo "Verifying installation..."
sleep 10

if systemctl is-active --quiet prometheus && systemctl is-active --quiet node_exporter && systemctl is-active --quiet alertmanager; then
    print_status "All services are running successfully!"
    print_status "Prometheus: http://localhost:9090"
    print_status "Node Exporter: http://localhost:9100"
    print_status "Alertmanager: http://localhost:9093"
    print_warning "Remember to configure proper notification channels in /etc/alertmanager/alertmanager.yml"
else
    print_error "Some services failed to start. Check logs with 'journalctl -u prometheus -u node_exporter -u alertmanager'"
    exit 1
fi

# Cleanup temp files
rm -rf /tmp/prometheus-* /tmp/node_exporter-* /tmp/alertmanager-*

print_status "Prometheus monitoring stack installation completed successfully!"

Review the script before running. It must run as root, so execute it with: sudo bash install.sh
