Implement NGINX Plus active health checks for enterprise environments

Advanced 45 min May 25, 2026
Ubuntu 24.04 Debian 12 AlmaLinux 9 Rocky Linux 9

Configure NGINX Plus active health checks to automatically detect and remove unhealthy upstream servers, ensuring high availability and optimal load balancing for enterprise applications.

Prerequisites

  • Valid NGINX Plus license
  • Multiple upstream servers
  • Root access to load balancer
  • Basic NGINX configuration knowledge

What this solves

NGINX Plus active health checks automatically monitor upstream server health and remove unresponsive backends from the load balancer rotation. This prevents failed requests from reaching unhealthy servers and improves application availability by ensuring traffic only reaches functional backends.

Step-by-step configuration

Install NGINX Plus

Download and install NGINX Plus using the official repository. You'll need your license certificate and key files.

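# All distributions: place the license certificate and key where the repository configuration expects them
sudo mkdir -p /etc/ssl/nginx
sudo cp nginx-repo.crt nginx-repo.key /etc/ssl/nginx/

# Ubuntu / Debian (on Debian, replace "ubuntu" with "debian" in the repository URL)
wget -qO - https://nginx.org/keys/nginx_signing.key | gpg --dearmor | sudo tee /usr/share/keyrings/nginx-archive-keyring.gpg > /dev/null
echo "deb [signed-by=/usr/share/keyrings/nginx-archive-keyring.gpg] https://plus-pkgs.nginx.com/ubuntu $(lsb_release -cs) nginx-plus" | sudo tee /etc/apt/sources.list.d/nginx-plus.list
sudo apt update
sudo apt install -y nginx-plus

# AlmaLinux / Rocky Linux
sudo rpm --import https://nginx.org/keys/nginx_signing.key
echo "[nginx-plus]
name=nginx-plus repo
baseurl=https://plus-pkgs.nginx.com/centos/$(rpm -E '%{rhel}')/\$basearch/
sslclientcert=/etc/ssl/nginx/nginx-repo.crt
sslclientkey=/etc/ssl/nginx/nginx-repo.key
gpgcheck=1
enabled=1" | sudo tee /etc/yum.repos.d/nginx-plus.repo
sudo dnf install -y nginx-plus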

Configure upstream server block

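Define your upstream servers in a shared memory zone. The zone directive is required for active health checks because it stores the health status that all worker processes share. The max_fails and fail_timeout parameters configure passive checks, which mark a server as failed based on real client traffic and work alongside the active probes.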

upstream backend {
    zone backend 64k;
    server 192.168.1.10:80 max_fails=3 fail_timeout=30s;
    server 192.168.1.11:80 max_fails=3 fail_timeout=30s;
    server 192.168.1.12:80 max_fails=3 fail_timeout=30s;
}

Configure active health checks

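Add the health_check directive to the location block that proxies to the upstream. NGINX Plus then probes each server on its own schedule, independently of client requests. In this example it requests /health every 5 seconds, marks a server unhealthy after 3 consecutive failures, and marks it healthy again after 2 consecutive passes.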

server {
    listen 80;
    server_name example.com;
    
    location / {
        proxy_pass http://backend;
        health_check interval=5s fails=3 passes=2 uri=/health match=server_ok;
        
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
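As a rough way to reason about these parameters, the sketch below multiplies the probe interval by the fails and passes thresholds. It assumes probes fail or pass quickly; slow timeouts add to the totals. The function names are made up for this illustration.

```shell
#!/usr/bin/env bash
# Rough timing estimates for an active health check.
# Assumption: one probe per interval, and each probe completes quickly.

# Seconds of consecutive failed probes needed to mark a server unhealthy.
detection_window() {
    local interval=$1 fails=$2
    echo $(( interval * fails ))
}

# Seconds of consecutive passed probes needed to mark it healthy again.
recovery_window() {
    local interval=$1 passes=$2
    echo $(( interval * passes ))
}

# Values from the example above: interval=5s fails=3 passes=2
echo "unhealthy after ~$(detection_window 5 3)s, healthy again after ~$(recovery_window 5 2)s"
```

With the example values this prints an estimate of about 15 seconds to remove a server and about 10 seconds to restore it.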

Define health check match conditions

Define match blocks in the http context to validate server responses. A probe passes only if every condition in the block holds, so you can require specific headers and body content rather than just an HTTP 200 status code.

match server_ok {
    status 200;
    header Content-Type ~ "application/json";
    body ~ '"status":\s*"ok"';
}

match api_healthy {
    status 200-299;
    header X-Health-Status = "healthy";
    body !~ "error|maintenance";
}
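Because the body condition is a regular expression, whitespace in the JSON matters. The sketch below is a local approximation for trying candidate patterns against a captured response before putting them in a match block. It is not NGINX's own matcher, response_matches is a hypothetical helper, and it uses POSIX [[:space:]] where an NGINX pattern would typically use \s.

```shell
#!/usr/bin/env bash
# Approximate a match block locally: status 200, JSON content type, body regex.

# Usage: response_matches STATUS CONTENT_TYPE BODY BODY_REGEX
response_matches() {
    local status=$1 content_type=$2 body=$3 body_regex=$4
    [ "$status" = "200" ] || return 1
    printf '%s' "$content_type" | grep -q 'application/json' || return 1
    printf '%s' "$body" | grep -Eq -- "$body_regex"
}

compact='{"status":"ok"}'
spaced='{ "status": "ok" }'
strict='"status":"ok"'
tolerant='"status":[[:space:]]*"ok"'

response_matches 200 application/json "$compact" "$strict"   && echo "compact body, strict pattern: match"
response_matches 200 application/json "$spaced"  "$strict"   || echo "spaced body, strict pattern: no match"
response_matches 200 application/json "$spaced"  "$tolerant" && echo "spaced body, tolerant pattern: match"
```

A pattern that tolerates optional whitespace keeps the check passing whether the backend emits compact or pretty-printed JSON.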

Configure advanced health check parameters

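Tune probe intervals and failure thresholds to each application's requirements. For HTTP upstreams, probe timeouts follow the proxy_connect_timeout and proxy_read_timeout values of the location, and slow_start ramps traffic up gradually after a server recovers.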

upstream api_servers {
    zone api_backend 128k;
    server 192.168.1.20:8080 slow_start=30s;
    server 192.168.1.21:8080 slow_start=30s;
    server 192.168.1.22:8080 slow_start=30s;
}

server {
    listen 443 ssl;
    server_name api.example.com;
    
    ssl_certificate /etc/ssl/certs/api.example.com.pem;
    ssl_certificate_key /etc/ssl/private/api.example.com.key;
    
    location /api/ {
        proxy_pass http://api_servers;
        health_check interval=10s fails=2 passes=3 uri=/api/health match=api_healthy;
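        # Probe timeouts come from the proxy_*_timeout directives below; health_check_timeout is a stream-module directive and is not valid here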
        
        proxy_connect_timeout 3s;
        proxy_send_timeout 10s;
        proxy_read_timeout 10s;
    }
}

Enable NGINX Plus dashboard

Configure the NGINX Plus API and dashboard to monitor health check status in real-time.

server {
    listen 8080;
    server_name localhost;
    
    # Restrict access to dashboard
    allow 127.0.0.1;
    allow 192.168.1.0/24;
    deny all;
    
    location /api {
        api write=on;
        access_log off;
    }
    
    location = /dashboard.html {
        root /usr/share/nginx/html;
    }
    
    location /swagger-ui {
        root /usr/share/nginx/html;
    }
}

Test and reload configuration

Validate your NGINX configuration syntax and reload the service to apply health check settings.

sudo nginx -t
sudo systemctl reload nginx
sudo systemctl status nginx

Configure upstream server health endpoints

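Create health check endpoints on your backend servers that return what the match block expects: an HTTP 200 response with a Content-Type of application/json and a body whose "status" field is "ok".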

# Example health endpoint response
curl -X GET http://192.168.1.10/health

Expected JSON response:

{
  "status": "ok",
  "timestamp": "2024-01-15T10:30:00Z",
  "version": "1.2.3",
  "checks": {
    "database": "healthy",
    "cache": "healthy",
    "disk_space": "healthy"
  }
}
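One way to produce such a response is a small CGI-style handler. The sketch below is hypothetical: render_health and its three arguments are invented for illustration, and a real endpoint would run actual dependency checks instead of receiving their results as arguments.

```shell
#!/usr/bin/env bash
# Hypothetical CGI-style handler: prints a header block and a JSON body.

# Usage: render_health DATABASE_STATE CACHE_STATE DISK_STATE
render_health() {
    local db=$1 cache=$2 disk=$3 overall="ok" state
    # Any dependency that is not "healthy" downgrades the overall status.
    for state in "$db" "$cache" "$disk"; do
        [ "$state" = "healthy" ] || overall="degraded"
    done
    printf 'Content-Type: application/json\n\n'
    printf '{"status":"%s","checks":{"database":"%s","cache":"%s","disk_space":"%s"}}\n' \
        "$overall" "$db" "$cache" "$disk"
}

render_health healthy healthy healthy
```

When any dependency is not healthy, the body reports "degraded", so a match block that requires "ok" fails the probe even though the HTTP status is still 200.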

Monitor health check status

Use NGINX Plus API for monitoring

Query the NGINX Plus API to get real-time health check status and upstream server states.

# Get upstream server status (health check counters appear under peers[].health_checks)
curl http://localhost:8080/api/6/http/upstreams/backend

# List the configured servers and their IDs
curl http://localhost:8080/api/6/http/upstreams/backend/servers

# Mark server 0 as down (send {"down": false} to re-enable it)
curl -X PATCH -d '{"down": true}' http://localhost:8080/api/6/http/upstreams/backend/servers/0
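For quick checks on hosts without jq, the upstream status can be summarized with standard tools. The sketch below counts peers in a given state from JSON read on standard input. The sample document is illustrative and trimmed to the fields used here, and count_peers_in_state is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Count peers in a given state from upstream JSON on stdin, without jq.
# Assumption: every peer object carries a "state" field.

count_peers_in_state() {
    local state=$1
    # grep -o prints one line per match; wc -l counts them.
    grep -oE "\"state\"[[:space:]]*:[[:space:]]*\"${state}\"" | wc -l | tr -d '[:space:]'
}

# Illustrative sample, not captured from a real server.
sample='{"peers":[{"id":0,"server":"192.168.1.10:80","state":"up"},{"id":1,"server":"192.168.1.11:80","state":"unhealthy"},{"id":2,"server":"192.168.1.12:80","state":"up"}]}'

echo "up: $(printf '%s' "$sample" | count_peers_in_state up)"
echo "unhealthy: $(printf '%s' "$sample" | count_peers_in_state unhealthy)"
```

Against a live instance, the same function can read from curl -s http://localhost:8080/api/6/http/upstreams/backend instead of the sample.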

Set up Prometheus monitoring

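Expose a metrics endpoint that Prometheus can scrape for centralized monitoring and alerting. The /metrics location below is a static placeholder that only reports that NGINX is answering. To export real upstream and health check metrics, run the NGINX Prometheus Exporter against the NGINX Plus API configured earlier.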

server {
    listen 9113;
    server_name localhost;
    
    allow 127.0.0.1;
    allow 192.168.1.0/24;
    deny all;
    
    location /metrics {
        access_log off;
        # return uses default_type for the Content-Type header; add_header would send a second one
        default_type text/plain;
        return 200 "# HELP nginx_up NGINX status\n# TYPE nginx_up gauge\nnginx_up 1\n";
    }
    
    location /nginx_status {
        stub_status;
        access_log off;
    }
}

Configure log monitoring

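Set up structured logging to track server failures and recovery patterns. Active health check probes are not written to the access log (failed probes are reported in the error log), so this format records the upstream address, status, and response time of client requests instead.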

log_format health_check '$remote_addr - $remote_user [$time_local] '
                       '"$request" $status $body_bytes_sent '
                       '"$http_referer" "$http_user_agent" '
                       'upstream_addr="$upstream_addr" '
                       'upstream_status="$upstream_status" '
                       'upstream_response_time="$upstream_response_time"';

access_log /var/log/nginx/health_check.log health_check;
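Once requests are logged in this format, per-server error counts can be pulled out with standard tools. The sketch below counts 5xx responses per upstream address. The sample lines are invented for illustration, and count_upstream_5xx is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Count 5xx responses per upstream server from lines in the health_check log format.
# Limitation: lines where NGINX retried on several upstreams (comma-separated
# addresses and statuses) are skipped by this simple pattern.

count_upstream_5xx() {
    sed -nE 's/.*upstream_addr="([^", ]+)" upstream_status="5[0-9][0-9]".*/\1/p' |
        sort | uniq -c | awk '{ print $2, $1 }'
}

# Illustrative log lines, not real traffic.
sample_log() {
    cat <<'EOF'
10.0.0.5 - - [15/Jan/2024:10:30:00 +0000] "GET / HTTP/1.1" 502 157 "-" "curl/8.5.0" upstream_addr="192.168.1.10:80" upstream_status="502" upstream_response_time="0.001"
10.0.0.5 - - [15/Jan/2024:10:30:01 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/8.5.0" upstream_addr="192.168.1.11:80" upstream_status="200" upstream_response_time="0.004"
10.0.0.6 - - [15/Jan/2024:10:30:02 +0000] "GET / HTTP/1.1" 502 157 "-" "curl/8.5.0" upstream_addr="192.168.1.10:80" upstream_status="502" upstream_response_time="0.001"
10.0.0.6 - - [15/Jan/2024:10:30:03 +0000] "GET / HTTP/1.1" 504 167 "-" "curl/8.5.0" upstream_addr="192.168.1.11:80" upstream_status="504" upstream_response_time="10.001"
EOF
}

sample_log | count_upstream_5xx
```

In practice the function would read /var/log/nginx/health_check.log rather than the sample, printing one line per upstream address followed by its 5xx count.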

Configure alerting

Set up automated alerts

Configure alerting when upstream servers fail health checks or become unavailable. This ensures rapid response to infrastructure issues.

#!/bin/bash
# Monitor NGINX Plus upstream health and send alerts
# Save as /usr/local/bin/nginx-health-monitor.sh

UPSTREAM_NAME="backend"
API_URL="http://localhost:8080/api/6/http/upstreams/$UPSTREAM_NAME"
ALERT_EMAIL="ops@example.com"

# Get upstream status
STATUS=$(curl -s "$API_URL" | jq -r '.peers[] | select(.state == "unhealthy") | .server')

if [[ -n "$STATUS" ]]; then
    echo "ALERT: Unhealthy servers detected: $STATUS" | \
        mail -s "NGINX Plus Health Check Alert" "$ALERT_EMAIL"
fi

Schedule health monitoring

Add the monitoring script to cron for regular health check status validation.

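sudo chmod +x /usr/local/bin/nginx-health-monitor.sh
# Append to root's crontab instead of overwriting it; the script runs every 2 minutes
(sudo crontab -l 2>/dev/null; echo "*/2 * * * * /usr/local/bin/nginx-health-monitor.sh") | sudo crontab -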
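Installing a cron entry by piping into crontab replaces the whole table, so re-running a setup step can wipe or duplicate entries. The sketch below shows one way to make the step idempotent. merge_cron_entry is a hypothetical helper that reads the existing crontab on standard input and prints it with the entry added only if it is missing.

```shell
#!/usr/bin/env bash
# Read an existing crontab on stdin and print it with ENTRY appended at most once.

merge_cron_entry() {
    local entry=$1 existing
    existing=$(cat)
    [ -n "$existing" ] && printf '%s\n' "$existing"
    # grep -F: fixed string, -x: whole-line match, so the entry is never duplicated.
    printf '%s\n' "$existing" | grep -Fxq -- "$entry" || printf '%s\n' "$entry"
}

entry='*/2 * * * * /usr/local/bin/nginx-health-monitor.sh'

# Demo with an illustrative existing entry.
printf '0 3 * * * /usr/bin/true\n' | merge_cron_entry "$entry"
```

On a real host the pipeline would be sudo crontab -l 2>/dev/null | merge_cron_entry "$entry" | sudo crontab -, which is safe to run repeatedly.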

Verify your setup

Test your active health checks by simulating server failures and monitoring the response.

# Check NGINX Plus status (the service unit is named nginx)
sudo systemctl status nginx

# Verify health check configuration
sudo nginx -T | grep -A5 health_check

# Test that the dashboard is reachable
curl -I http://localhost:8080/dashboard.html

# Monitor upstream status
curl -s http://localhost:8080/api/6/http/upstreams/backend | jq '.peers[] | {server: .server, state: .state, health_checks: .health_checks}'

# Simulate server failure
sudo iptables -A OUTPUT -d 192.168.1.10 -j DROP

# Watch server removal from load balancer (wait 15-30 seconds)
watch 'curl -s http://localhost:8080/api/6/http/upstreams/backend | jq ".peers[] | {server: .server, state: .state}"'

# Restore connectivity when finished
sudo iptables -D OUTPUT -d 192.168.1.10 -j DROP

Note: Detection takes roughly the probe interval multiplied by the fails threshold (about 15 seconds with interval=5s fails=3), plus any probe timeout, so allow 15-30 seconds. Monitor the dashboard or API during testing to observe state changes.
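Instead of watching the output by hand, a failure drill can poll until the expected state appears or a deadline passes. The sketch below is one way to do that. wait_for_output and peer_state are hypothetical helpers, and peer_state assumes the API location and upstream name used in this tutorial.

```shell
#!/usr/bin/env bash
# Poll a command until it prints the expected value or the timeout expires.

# Usage: wait_for_output EXPECTED TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
wait_for_output() {
    local expected=$1 timeout=$2 interval=$3 elapsed=0
    shift 3
    while [ "$elapsed" -le "$timeout" ]; do
        [ "$("$@")" = "$expected" ] && return 0
        sleep "$interval"
        elapsed=$(( elapsed + interval ))
    done
    return 1
}

# Print the state of one peer, e.g. peer_state 192.168.1.10:80
# (defined for use against a live instance; not called in this demo).
peer_state() {
    curl -s "http://localhost:8080/api/6/http/upstreams/backend" |
        jq -r --arg s "$1" '.peers[] | select(.server == $s) | .state'
}

# Self-contained demo: the command already prints the expected value.
wait_for_output ready 2 1 echo ready && echo "state reached"
```

During a drill, wait_for_output unhealthy 30 5 peer_state 192.168.1.10:80 returns success once the blocked server has been marked unhealthy and fails if that does not happen within about 30 seconds.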

Common issues

Symptom | Cause | Fix
Health checks not working | Missing zone directive in upstream | Add zone backend 64k; to upstream block
Match condition fails | Backend returns different content | Update match block or fix backend health endpoint
Servers marked unhealthy immediately | Health endpoint unreachable | Check firewall rules and backend server status
Dashboard shows 403 Forbidden | IP not in allow list | Add your IP to allow directives in dashboard config
SSL health checks fail | Certificate validation issues | Use proxy_ssl_verify off; or configure proper certificates

Advanced configuration patterns

Multi-tier health checks

Configure different health check strategies for different application tiers with varying sensitivity levels.

# Database tier - strict health checks
# PostgreSQL speaks TCP rather than HTTP, so this upstream belongs in the stream context
upstream database {
    zone db_backend 64k;
    server 192.168.2.10:5432 max_fails=1 fail_timeout=60s;
    server 192.168.2.11:5432 max_fails=1 fail_timeout=60s;
}

# API tier - moderate health checks
upstream api {
    zone api_backend 64k;
    server 192.168.1.20:8080 max_fails=2 fail_timeout=30s;
    server 192.168.1.21:8080 max_fails=2 fail_timeout=30s;
}

# Frontend tier - lenient health checks
upstream frontend {
    zone frontend_backend 64k;
    server 192.168.1.30:80 max_fails=3 fail_timeout=15s;
    server 192.168.1.31:80 max_fails=3 fail_timeout=15s;
}

server {
    listen 443 ssl;
    server_name app.example.com;
    # ssl_certificate and ssl_certificate_key omitted for brevity

    location /api/ {
        proxy_pass http://api;
        health_check interval=5s fails=2 passes=2 uri=/api/health;
    }

    location / {
        proxy_pass http://frontend;
        health_check interval=10s fails=3 passes=1 uri=/status;
    }
}

Integration with external monitoring

You can integrate NGINX Plus health checks with external monitoring systems. The Prometheus Blackbox Exporter provides additional endpoint monitoring capabilities that complement NGINX Plus active health checks.

For comprehensive alerting workflows, consider setting up Prometheus AlertManager to handle complex notification routing based on health check status and other infrastructure metrics.
