Implement Grafana advanced alerting with webhooks and notification channels

Intermediate · 45 min · Apr 25, 2026
Ubuntu 24.04 Debian 12 AlmaLinux 9 Rocky Linux 9

Set up comprehensive Grafana alerting with webhook endpoints, Slack and Teams notifications, and advanced alert conditions. Configure data sources, create alert rules, and implement custom notification channels for production monitoring.

Prerequisites

  • Grafana 9.0 or higher
  • Prometheus data source
  • SMTP server access for email notifications
  • Webhook endpoints for external integrations
  • Administrative access to configure notification channels

What this solves

Grafana's unified alerting system lets you create sophisticated alert rules that can notify multiple channels when your infrastructure needs attention. This tutorial shows you how to set up webhook endpoints for external integrations, configure notification channels for Slack and Microsoft Teams, and create advanced alert conditions with custom templating for production environments.

Step-by-step configuration

Update system packages

Start by updating your package manager to ensure you have the latest security patches.

# Debian/Ubuntu
sudo apt update && sudo apt upgrade -y

# RHEL-based (AlmaLinux, Rocky Linux)
sudo dnf update -y

Install Grafana if not already present

Install Grafana from the official repository if you don't have it running yet.

# Debian/Ubuntu (apt-key is deprecated; use a signed-by keyring instead)
sudo mkdir -p /etc/apt/keyrings
wget -q -O - https://packages.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://packages.grafana.com/oss/deb stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt update
sudo apt install -y grafana

# RHEL-based (AlmaLinux, Rocky Linux)
sudo dnf install -y https://dl.grafana.com/oss/release/grafana-10.2.3-1.x86_64.rpm

Configure Grafana for unified alerting

Enable unified alerting in the main configuration file (/etc/grafana/grafana.ini) and set retention policies.

[unified_alerting]
enabled = true

[alerting]
enabled = false

# Alert rule evaluation interval
[unified_alerting.evaluation]
max_attempts = 3
min_interval = 10s

# Data retention for alert instances
[unified_alerting.state_history]
enabled = true
max_age = 168h

# Screenshot settings for alert notifications
[unified_alerting.screenshots]
capture = true
max_concurrent_screenshots = 5

Start and enable Grafana

Start the Grafana service and enable it to run on boot.

sudo systemctl start grafana-server
sudo systemctl enable grafana-server
sudo systemctl status grafana-server

Configure Prometheus data source

Add a Prometheus data source through the Grafana web interface at http://your-server:3000. Navigate to Configuration > Data Sources > Add data source > Prometheus.

Name: Prometheus
URL: http://localhost:9090
Access: Server (default)
Scrape interval: 15s
Query timeout: 60s
HTTP Method: POST

Click "Save & Test" to verify the connection works.

Create contact points for notifications

Navigate to Alerting > Contact points > Add contact point. Create separate contact points for each notification method.

Slack Contact Point:

Name: slack-alerts
Integration: Slack
Webhook URL: https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK
Channel: #alerts
Username: Grafana
Title: {{ template "slack.default.title" . }}
Text: {{ template "slack.default.text" . }}

Microsoft Teams Contact Point:

Name: teams-alerts
Integration: Microsoft Teams
Webhook URL: https://your-tenant.webhook.office.com/webhookb2/YOUR-WEBHOOK-URL
Title: {{ template "teams.default.title" . }}
Summary: {{ template "teams.default.summary" . }}

Email Contact Point:

Name: email-alerts
Integration: Email
Addresses: ops-team@example.com, alerts@example.com
Subject: [GRAFANA] {{ .GroupLabels.alertname }}
Message: {{ range .Alerts }}{{ .Annotations.summary }}{{ end }}
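Contact points can likewise be scripted through the alerting provisioning API (`POST /api/v1/provisioning/contact-points`). A sketch for the Slack contact point above; the token and webhook URL are placeholders, and the `settings` keys are assumed to mirror the Slack form fields:

```python
import json
from urllib import request

GRAFANA_URL = "http://localhost:3000"  # placeholder: your Grafana server
API_TOKEN = "YOUR-API-TOKEN"           # placeholder: service account token

def slack_contact_point() -> dict:
    """Body for POST /api/v1/provisioning/contact-points."""
    return {
        "name": "slack-alerts",
        "type": "slack",
        "settings": {
            "url": "https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK",
            "recipient": "#alerts",
            "username": "Grafana",
        },
    }

def create_contact_point(payload: dict) -> dict:
    """POST a contact point definition and return the API response."""
    req = request.Request(
        f"{GRAFANA_URL}/api/v1/provisioning/contact-points",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {API_TOKEN}"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```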

Configure SMTP for email notifications

Update Grafana's SMTP settings to enable email notifications.

[smtp]
enabled = true
host = smtp.gmail.com:587
user = your-email@gmail.com
password = your-app-password
cert_file =
key_file =
skip_verify = false
from_address = grafana@example.com
from_name = Grafana Alerts
ehlo_identity = example.com
startTLS_policy = MandatoryStartTLS

Restart Grafana after updating SMTP settings:

sudo systemctl restart grafana-server
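If email delivery fails, it helps to verify the SMTP credentials independently of Grafana first. A small standard-library sketch, assuming the same host, user, and password as in grafana.ini (all placeholders):

```python
import smtplib
import ssl

def split_host(hostport: str):
    """Split grafana.ini's 'host = server:port' value into (server, port)."""
    server, _, port = hostport.rpartition(":")
    return server, int(port)

def check_smtp(hostport: str = "smtp.gmail.com:587",
               user: str = "your-email@gmail.com",
               password: str = "your-app-password") -> bool:
    """Attempt a STARTTLS login with the grafana.ini credentials.
    Raises smtplib.SMTPAuthenticationError on bad credentials, or a
    socket error if port 587 is blocked by a firewall."""
    server, port = split_host(hostport)
    with smtplib.SMTP(server, port, timeout=10) as conn:
        conn.ehlo()
        conn.starttls(context=ssl.create_default_context())
        conn.login(user, password)
    return True
```

If `check_smtp()` succeeds but Grafana still cannot send, the problem lies in the `[smtp]` section rather than the mail server.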

Create custom webhook endpoint

Set up a webhook contact point for external integrations like PagerDuty or custom applications.

Name: custom-webhook
Integration: Webhook
URL: https://api.example.com/v1/alerts
HTTP Method: POST
Authorization Header: Bearer YOUR-API-TOKEN
Content-Type: application/json
Body:
{
  "alert_name": "{{ .GroupLabels.alertname }}",
  "status": "{{ .Status }}",
  "severity": "{{ .GroupLabels.severity }}",
  "summary": "{{ range .Alerts }}{{ .Annotations.summary }}{{ end }}",
  "timestamp": "{{ .Alerts.Firing | len }}",
  "firing_alerts": {{ .Alerts.Firing | len }},
  "resolved_alerts": {{ .Alerts.Resolved | len }}
}
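For reference, here is one way the templated body above might render for a single firing alert (the field values are invented for illustration). Parsing the rendered text with `json.loads` is a quick check that the template produces valid JSON, since a malformed body will be rejected by most receivers:

```python
import json

# Hypothetical rendering of the webhook body for one firing alert;
# values are illustrative, not real Grafana output.
sample_rendered = '''{
  "alert_name": "HighCPUUsage",
  "status": "firing",
  "severity": "critical",
  "summary": "CPU above 90% on web-1",
  "timestamp": "2026-04-25T10:00:00Z",
  "firing_alerts": 1,
  "resolved_alerts": 0
}'''

payload = json.loads(sample_rendered)  # raises ValueError if the template broke the JSON
```

Note that the unquoted count fields (`firing_alerts`, `resolved_alerts`) render as JSON numbers, while everything wrapped in quotes in the template renders as strings.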

Create notification policies

Navigate to Alerting > Notification policies. Configure routing based on alert labels and severity.

# Root policy (catches all alerts)
Default contact point: email-alerts
Group by: alertname, instance
Group wait: 10s
Group interval: 5m
Repeat interval: 12h

High severity alerts

Matchers:
  • severity = critical
Contact point: slack-alerts, teams-alerts, custom-webhook
Group wait: 0s
Repeat interval: 5m
Override grouping: true

Warning alerts

Matchers:
  • severity = warning
Contact point: slack-alerts
Group interval: 10m
Repeat interval: 4h

Create advanced alert rules

Navigate to Alerting > Alert rules > New rule. Create comprehensive alert rules with multiple conditions.

CPU Usage Alert Rule:

Rule name: High CPU Usage
Folder: Infrastructure
Group: System Metrics

Query A - Current CPU usage

Query: 100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
Alias: cpu_usage

Query B - CPU usage trend (15min average)

Query: avg_over_time((100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100))[15m:])
Alias: cpu_trend

Condition

Expression: $cpu_usage > 80 AND $cpu_trend > 70
Evaluation: Last value, IS ABOVE, 0

Evaluation behavior

Evaluate every: 1m
For: 5m

Labels

severity: warning
team: infrastructure
service: system

Annotations

summary: High CPU usage detected on {{ $labels.instance }}
description: CPU usage is {{ $values.cpu_usage | humanize }}% (15min avg: {{ $values.cpu_trend | humanize }}%)
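The two-query condition is plain boolean logic: both the instantaneous reading and the 15-minute trend must cross their thresholds. A toy sketch of the evaluation (thresholds mirror the rule above):

```python
def cpu_alert_should_fire(cpu_usage: float, cpu_trend: float,
                          usage_threshold: float = 80.0,
                          trend_threshold: float = 70.0) -> bool:
    """Mirror of the rule's condition: the current CPU usage AND the
    15-minute average must both exceed their thresholds before firing."""
    return cpu_usage > usage_threshold and cpu_trend > trend_threshold

# A brief spike alone does not fire; sustained load does.
assert not cpu_alert_should_fire(95.0, 60.0)  # spike, low trend -> no alert
assert cpu_alert_should_fire(85.0, 75.0)      # sustained load -> alert
```

Requiring the trend as well as the instant value is what suppresses one-off spikes without delaying alerts on genuinely sustained load.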

Create memory usage alert with templating

Create a more complex alert rule for memory usage with custom templating.

Rule name: High Memory Usage
Folder: Infrastructure
Group: System Metrics

Query A - Available memory percentage

Query: (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100
Alias: memory_available

Query B - Memory usage percentage

Query: 100 - ((node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100)
Alias: memory_usage

Condition

Expression: $memory_available < 15
Evaluation: Last value, IS BELOW, 15

Evaluation behavior

Evaluate every: 30s
For: 2m

Labels

severity: critical
team: infrastructure
service: system
runbook_url: https://wiki.example.com/runbooks/memory-alerts

Annotations with advanced templating

summary: Memory usage critical on {{ $labels.instance }}
description: |
  Memory usage is {{ $values.memory_usage | printf "%.1f" }}%
  Available memory: {{ $values.memory_available | printf "%.1f" }}%
  Instance: {{ $labels.instance }}
  Job: {{ $labels.job }}
  Runbook: {{ $labels.runbook_url }}

Create application-specific alert rules

Set up alerts for application metrics like HTTP response times and error rates.

Rule name: High HTTP Error Rate
Folder: Applications
Group: Web Services

Query A - Error rate calculation

Query: rate(http_requests_total{status=~"5.."}[5m]) / rate(http_requests_total[5m]) * 100
Alias: error_rate

Query B - Request volume

Query: rate(http_requests_total[5m])
Alias: request_rate

Condition

Expression: $error_rate > 5 AND $request_rate > 1
Evaluation: Last value, IS ABOVE, 0

Evaluation behavior

Evaluate every: 30s
For: 3m

Labels

severity: critical
team: backend
service: {{ $labels.service }}
environment: {{ $labels.environment }}

Annotations

summary: High error rate for {{ $labels.service }}
description: |
  Error rate: {{ $values.error_rate | printf "%.2f" }}%
  Request rate: {{ $values.request_rate | printf "%.1f" }} req/s
  Service: {{ $labels.service }}
  Environment: {{ $labels.environment }}
  Instance: {{ $labels.instance }}

Configure alert rule groups and folders

Organize alerts into logical groups for better management. Navigate to Alerting > Alert rules and create folders.

Folders:
├── Infrastructure
│   ├── System Metrics (CPU, Memory, Disk)
│   ├── Network (Connectivity, Bandwidth)
│   └── Database (MySQL, PostgreSQL, Redis)
├── Applications
│   ├── Web Services (HTTP errors, latency)
│   ├── Background Jobs (Queue length, failures)
│   └── API Endpoints (Rate limits, timeouts)
└── Business Metrics
    ├── User Activity (Logins, signups)
    ├── Revenue (Transactions, conversions)
    └── Performance (Page load, API response)

Create silences and maintenance windows

Set up alert silences for planned maintenance. Navigate to Alerting > Silences > New silence.

Matchers:
  • service = "web-frontend"
  • environment = "production"
Start: 2024-01-15 02:00 UTC
End: 2024-01-15 04:00 UTC
Timezone: UTC
Created by: ops-team
Comment: Scheduled maintenance - database migration

Or create regex-based silences

Matchers:
  • alertname =~ "High.*Usage"
  • instance =~ "web-[0-9]+\.example\.com"
Duration: 2h
Comment: Load testing in progress
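Silences can also be created programmatically against Grafana's built-in Alertmanager (`POST /api/alertmanager/grafana/api/v2/silences`), which is handy for scripting maintenance windows. A sketch of the request body, mirroring the maintenance window above (timestamps and the server URL are placeholders):

```python
import json

def silence_payload(starts_at: str, ends_at: str) -> dict:
    """Build the Alertmanager v2 silence body for the maintenance
    window described above. Matchers use the v2 API shape
    (name / value / isRegex)."""
    return {
        "matchers": [
            {"name": "service", "value": "web-frontend", "isRegex": False},
            {"name": "environment", "value": "production", "isRegex": False},
        ],
        "startsAt": starts_at,
        "endsAt": ends_at,
        "createdBy": "ops-team",
        "comment": "Scheduled maintenance - database migration",
    }

body = silence_payload("2024-01-15T02:00:00Z", "2024-01-15T04:00:00Z")
print(json.dumps(body, indent=2))
```

POST this body with your service account token to create the silence; set `isRegex` to `True` to reproduce the regex-based example.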

Configure alert templates

Create custom notification templates for better alert messages. Navigate to Alerting > Contact points > Message templates.

Name: detailed-alert-template
Content:
{{ define "alert.title" }}
[{{ .Status | toUpper }}{{ if eq .Status "firing" }} x{{ .Alerts.Firing | len }}{{ end }}] {{ .GroupLabels.alertname }}
{{ end }}

{{ define "alert.summary" }}
{{ if gt (len .Alerts.Firing) 0 }}
Firing Alerts:
{{ range .Alerts.Firing }}
• {{ .Annotations.summary }}
  Labels: {{ range .Labels.SortedPairs }}{{ .Name }}={{ .Value }} {{ end }}
  Started: {{ .StartsAt.Format "2006-01-02 15:04:05 UTC" }}
{{ end }}
{{ end }}

{{ if gt (len .Alerts.Resolved) 0 }}
Resolved Alerts:
{{ range .Alerts.Resolved }}
• {{ .Annotations.summary }}
  Resolved: {{ .EndsAt.Format "2006-01-02 15:04:05 UTC" }}
{{ end }}
{{ end }}

Dashboard: {{ .ExternalURL }}
{{ end }}

Configure webhook security

Secure webhook endpoints

Add authentication and validation to your webhook endpoints to prevent unauthorized access.

# In webhook contact point configuration
HTTP Headers:
X-Grafana-Source: alerting
User-Agent: Grafana/10.2.3
Authorization: Bearer YOUR-SECRET-TOKEN

Custom headers for verification

Note that webhook headers in Grafana are static strings: the alerting template engine does not expose hashing functions such as sha256, so request signatures cannot be computed inside a template. Use a fixed shared-secret header instead and validate it on the receiver:

X-Grafana-Source: alerting
X-Webhook-Secret: YOUR-SHARED-SECRET
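On the receiving side, compare the shared secret in constant time so the check does not leak information through response timing. A minimal sketch (the token value is a placeholder and must match what you configured in Grafana):

```python
import hmac

SECRET_TOKEN = "your-secret-token"  # placeholder: must match the Grafana header

def token_valid(auth_header: str) -> bool:
    """Validate an 'Authorization: Bearer <token>' header using a
    constant-time comparison (hmac.compare_digest) instead of '!=',
    which can leak the token prefix via timing differences."""
    if not auth_header.startswith("Bearer "):
        return False
    return hmac.compare_digest(auth_header[7:], SECRET_TOKEN)

assert token_valid("Bearer your-secret-token")
assert not token_valid("Bearer wrong-token")
assert not token_valid("Basic abc123")
```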

Validate webhook payload

Create a simple webhook receiver to test your configuration.

#!/usr/bin/env python3
import json
import hmac
from http.server import HTTPServer, BaseHTTPRequestHandler

SECRET_TOKEN = "your-secret-token"

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        content_length = int(self.headers['Content-Length'])
        post_data = self.rfile.read(content_length)
        
        # Verify authorization
        auth_header = self.headers.get('Authorization', '')
        if not auth_header.startswith('Bearer '):
            self.send_response(401)
            self.end_headers()
            return
            
        token = auth_header[7:]  # Strip the 'Bearer ' prefix
        if not hmac.compare_digest(token, SECRET_TOKEN):
            self.send_response(401)
            self.end_headers()
            return
        
        try:
            alert_data = json.loads(post_data.decode('utf-8'))
            print(f"Received alert: {alert_data['alert_name']}")
            print(f"Status: {alert_data['status']}")
            print(f"Summary: {alert_data['summary']}")
            
            self.send_response(200)
            self.send_header('Content-type', 'application/json')
            self.end_headers()
            self.wfile.write(json.dumps({"status": "received"}).encode())
            
        except json.JSONDecodeError:
            self.send_response(400)
            self.end_headers()

if __name__ == '__main__':
    server = HTTPServer(('localhost', 8080), WebhookHandler)
    print("Webhook test server running on http://localhost:8080")
    server.serve_forever()

Save the script as /tmp/webhook-test.py, then run the test webhook server:

chmod +x /tmp/webhook-test.py
python3 /tmp/webhook-test.py
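With the server running, you can exercise it by posting a sample payload that mimics the custom webhook body defined earlier (the field values here are invented for illustration, and the token must match the server's SECRET_TOKEN):

```python
import json
from urllib import request

# Illustrative payload in the shape of the custom webhook body above.
sample = {
    "alert_name": "HighCPUUsage",
    "status": "firing",
    "severity": "warning",
    "summary": "CPU above 80% on web-1",
}

def send_test_alert(url: str = "http://localhost:8080") -> dict:
    """POST the sample alert with the bearer token the test server expects;
    returns the server's parsed JSON reply."""
    req = request.Request(
        url,
        data=json.dumps(sample).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": "Bearer your-secret-token"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.load(resp)  # the test server replies {"status": "received"}
```

If the call returns `{"status": "received"}` and the server logs the alert fields, the receiver end of your webhook path is working.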

Advanced alerting features

Create multi-condition alerts

Build complex alerts that require multiple conditions to be met simultaneously.

Rule name: Service Degradation
Folder: Applications
Group: Service Health

Query A - High response time

Query: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) * 1000 Alias: response_time_p95

Query B - Error rate

Query: rate(http_requests_total{status=~"5.."}[5m]) / rate(http_requests_total[5m]) * 100 Alias: error_rate

Query C - Request volume

Query: rate(http_requests_total[5m]) Alias: request_volume

Complex condition with AND/OR logic

Expression: ($response_time_p95 > 500 AND $request_volume > 10) OR ($error_rate > 2 AND $request_volume > 5)
Evaluation: Last value, IS ABOVE, 0
Evaluate every: 30s
For: 5m
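The combined expression is ordinary boolean logic: fire when the service is slow under real load, or erroring under at least moderate load. A toy sketch with the same thresholds:

```python
def service_degraded(p95_ms: float, error_rate: float, volume: float) -> bool:
    """Mirror of the rule's expression: slow responses under real traffic,
    OR elevated errors under at least some traffic."""
    return (p95_ms > 500 and volume > 10) or (error_rate > 2 and volume > 5)

assert service_degraded(800, 0.1, 50)      # slow at high traffic
assert service_degraded(200, 5.0, 8)       # erroring at moderate traffic
assert not service_degraded(800, 5.0, 1)   # too little traffic to matter
```

The volume guard in both branches prevents the alert from firing on a handful of requests, where percentiles and ratios are statistically meaningless.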

Configure alert dependencies

Set up alert routing based on dependencies to reduce noise during outages.

# Parent policy - Database connectivity
Matchers:
  • alertname = "Database Connection Failed"
  • service = "postgresql"
Contact point: critical-alerts
Group wait: 0s
Repeat interval: 5m

Child policy - Application errors (only if DB is healthy)

Matchers:
  • alertname = "High Application Error Rate"
  • database_status != "down"
Contact point: app-team-alerts
Group interval: 10m
Repeat interval: 30m

Silence application alerts when database is down

Matchers:
  • alertname = "High Application Error Rate"
  • database_status = "down"
Contact point: none (Grafana has no built-in null receiver; route these to a contact point with no integrations, or attach a mute timing)

Create alert annotations with links

Add useful links and context to alert notifications for faster troubleshooting.

# In alert rule annotations
summary: High CPU usage on {{ $labels.instance }}
description: |
  CPU usage: {{ $values.cpu_usage | printf "%.1f" }}%
  Instance: {{ $labels.instance }}
  Environment: {{ $labels.environment }}
  
  🔍 Troubleshooting links:
  • System dashboard
  • CPU details
  • Server logs (filtered to {{ $labels.instance }})
  • Runbook

runbook_url: https://wiki.example.com/runbooks/high-cpu-usage
dashboard_url: http://grafana.example.com/d/system/system-overview?var-instance={{ $labels.instance }}

Verify your setup

Test your alerting configuration to ensure everything works correctly.

# Check Grafana status
sudo systemctl status grafana-server

Test contact points from Grafana UI

Navigate to Alerting > Contact points > Test

View alert rule evaluation

curl -H "Authorization: Bearer YOUR-API-KEY" \ "http://localhost:3000/api/v1/eval" \ -X POST \ -H "Content-Type: application/json" \ -d '{ "queries": [{ "refId": "A", "queryType": "", "model": { "expr": "up{job=\"prometheus\"}", "intervalMs": 1000, "maxDataPoints": 43200 } }] }'

Check notification policies

curl -H "Authorization: Bearer YOUR-API-KEY" \
  "http://localhost:3000/api/v1/provisioning/policies"

View active alerts

curl -H "Authorization: Bearer YOUR-API-KEY" \
  "http://localhost:3000/api/prometheus/grafana/api/v1/alerts"
Note: Replace YOUR-API-KEY with a service account token created in Grafana under Administration > Service accounts.

Common issues

Symptom                       | Cause                                        | Fix
Alerts not firing             | Incorrect query or evaluation settings       | Check query syntax in the Explore tab, verify the evaluation interval
Notifications not sent        | Contact point configuration error            | Test the contact point, check SMTP/webhook settings
Too many alert notifications  | Incorrect grouping or repeat interval        | Adjust notification policy grouping and intervals
Webhook returns 401/403       | Authentication headers missing or incorrect  | Verify the Authorization header and webhook endpoint security
Email notifications fail      | SMTP configuration incorrect                 | Test SMTP settings, check firewall rules for port 587
Alert templates not rendering | Template syntax errors or missing variables  | Validate template syntax, test with sample alert data
Silences not working          | Label matchers don't match alert labels      | Check exact label names and values in alert rules

Next steps

Running this in production?

Want comprehensive alerting handled for you? Setting this up once is straightforward. Keeping it tuned, managing alert fatigue, and ensuring 24/7 reliability across environments is the harder part. See how we run infrastructure like this for European SaaS and e-commerce teams.

Need help?

Don't want to manage this yourself?

We handle managed devops services for businesses that depend on uptime. From initial setup to ongoing operations.