NGINX Plus Active Health Checks Enterprise Setup

Configure NGINX Plus active health checks to automatically detect and remove unhealthy upstream servers, ensuring high availability and optimal load balancing for enterprise applications.

Prerequisites

Valid NGINX Plus license
Multiple upstream servers
Root access to load balancer
Basic NGINX configuration knowledge

What this solves

NGINX Plus active health checks automatically monitor upstream server health and remove unresponsive backends from the load balancer rotation. This prevents failed requests from reaching unhealthy servers and improves application availability by ensuring traffic only reaches functional backends.

Step-by-step configuration

Install NGINX Plus

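Download and install NGINX Plus from the official repository. You will need your license certificate and key files, which the repository uses to authenticate the package manager. Run the first set of commands on Ubuntu and the second on RHEL-family systems.

# Ubuntu
wget https://nginx.org/keys/nginx_signing.key
# apt-key is deprecated, so store the signing key in a dedicated keyring
gpg --dearmor < nginx_signing.key | sudo tee /usr/share/keyrings/nginx-archive-keyring.gpg > /dev/null
echo "deb [signed-by=/usr/share/keyrings/nginx-archive-keyring.gpg] https://plus-pkgs.nginx.com/ubuntu $(lsb_release -cs) nginx-plus" | sudo tee /etc/apt/sources.list.d/nginx-plus.list
sudo apt update
sudo apt install -y nginx-plus

# RHEL / AlmaLinux / Rocky Linux
sudo rpm --import https://nginx.org/keys/nginx_signing.key
echo "[nginx-plus]
name=nginx-plus repo
baseurl=https://plus-pkgs.nginx.com/centos/$(rpm -E '%{rhel}')/\$basearch/
gpgcheck=1
enabled=1" | sudo tee /etc/yum.repos.d/nginx-plus.repo
sudo dnf install -y nginx-plus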

Configure upstream server block

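Define your upstream servers in a shared memory zone. The zone directive is required for active health checks because it holds the health status shared by all worker processes. The max_fails and fail_timeout parameters configure passive checks, which react to failures seen on real client requests and work alongside the active probes.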

upstream backend {
    zone backend 64k;
    server 192.168.1.10:80 max_fails=3 fail_timeout=30s;
    server 192.168.1.11:80 max_fails=3 fail_timeout=30s;
    server 192.168.1.12:80 max_fails=3 fail_timeout=30s;
}

Configure active health checks

Add the health_check directive to your location block. This configures active probes to monitor server health independently of client requests.

server {
    listen 80;
    server_name example.com;
    
    location / {
        proxy_pass http://backend;
        health_check interval=5s fails=3 passes=2 uri=/health match=server_ok;
        
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
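
To make the fails and passes thresholds concrete, the bash sketch below replays a sequence of probe results through the counting rule: a server becomes unhealthy after `fails` consecutive failed probes and healthy again after `passes` consecutive successful ones. The function name `simulate_health`, the P/F input encoding, and the assumption that the server starts out healthy are all made up for illustration; this is a model of the rule, not NGINX Plus internals.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replay probe results through fails/passes thresholds.
# Usage: simulate_health <fails> <passes> <results>
#   <results> is a string of P (probe passed) and F (probe failed), oldest first.
# Prints the server state after each probe, space-separated.
# Assumption: the server starts in the healthy state.
simulate_health() {
    local fails=$1 passes=$2 results=$3
    local state="healthy" fail_run=0 pass_run=0 out="" i r
    for (( i = 0; i < ${#results}; i++ )); do
        r=${results:i:1}
        if [[ $r == "F" ]]; then
            # A failure extends the failure run and resets the success run
            fail_run=$(( fail_run + 1 ))
            pass_run=0
            if (( fail_run >= fails )); then state="unhealthy"; fi
        else
            # A success extends the success run and resets the failure run
            pass_run=$(( pass_run + 1 ))
            fail_run=0
            if (( pass_run >= passes )); then state="healthy"; fi
        fi
        out+="${state} "
    done
    echo "${out% }"
}

# With fails=3 passes=2: three failures in a row take the server out,
# two successes in a row bring it back.
simulate_health 3 2 PFFFPP
```

Because the counters reset on every opposite result, an isolated failed probe between successes never removes a server, which is the reason to prefer fails greater than 1 on noisy networks.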

Define health check match conditions

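Create custom match conditions to validate server responses. This ensures servers return the expected content, not just an HTTP 200 status code. Define match blocks in the http context, next to your upstream blocks. The body pattern below allows optional whitespace after the colon so that it matches both compact and pretty-printed JSON.

match server_ok {
    status 200;
    header Content-Type ~ "application/json";
    body ~ '"status":\s*"ok"';
}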

match api_healthy {
    status 200-299;
    header X-Health-Status = "healthy";
    body !~ "error|maintenance";
}
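
A body pattern is sensitive to the exact whitespace in the JSON a backend emits. The bash sketch below approximates the body test locally to show that a pattern with no whitespace allowance matches compact JSON but not pretty-printed JSON. The helper name `body_matches` and the sample bodies are illustrative assumptions, and bash uses POSIX extended regular expressions where NGINX uses PCRE, so `[[:space:]]*` stands in for `\s*`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when <body> matches the regular expression <pattern>.
# This only approximates a match block's "body ~" test (ERE here, PCRE in NGINX).
body_matches() {
    local body=$1 pattern=$2
    [[ $body =~ $pattern ]]
}

strict='"status":"ok"'                 # no whitespace allowed after the colon
tolerant='"status":[[:space:]]*"ok"'   # optional whitespace after the colon

compact='{"status":"ok"}'
pretty='{ "status": "ok" }'

# Print a small truth table: body, pattern, result
for body in "$compact" "$pretty"; do
    for pattern in "$strict" "$tolerant"; do
        if body_matches "$body" "$pattern"; then result="match"; else result="no match"; fi
        printf '%-22s %-30s %s\n' "$body" "$pattern" "$result"
    done
done
```

Running a check like this against the real output of your health endpoint before deploying a match block helps avoid servers being marked unhealthy because of formatting alone.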

Configure advanced health check parameters

Set up comprehensive health checks with custom intervals, timeouts, and failure thresholds for different application requirements.

upstream api_servers {
    zone api_backend 128k;
    server 192.168.1.20:8080 slow_start=30s;
    server 192.168.1.21:8080 slow_start=30s;
    server 192.168.1.22:8080 slow_start=30s;
}

server {
    listen 443 ssl;
    server_name api.example.com;
    
    ssl_certificate /etc/ssl/certs/api.example.com.pem;
    ssl_certificate_key /etc/ssl/private/api.example.com.key;
    
    location /api/ {
        proxy_pass http://api_servers;
        health_check interval=10s fails=2 passes=3 uri=/api/health match=api_healthy;
        # health_check_timeout exists only in the stream module; HTTP probes
        # are bounded by the proxy_*_timeout values below
        
        proxy_connect_timeout 3s;
        proxy_send_timeout 10s;
        proxy_read_timeout 10s;
    }
}

Enable NGINX Plus dashboard

Configure the NGINX Plus API and dashboard to monitor health check status in real-time.

server {
    listen 8080;
    server_name localhost;
    
    # Restrict access to dashboard
    allow 127.0.0.1;
    allow 192.168.1.0/24;
    deny all;
    
    location /api {
        api write=on;
        access_log off;
    }
    
    location = /dashboard.html {
        root /usr/share/nginx/html;
    }
    
    location /swagger-ui {
        root /usr/share/nginx/html;
    }
}

Test and reload configuration

Validate your NGINX configuration syntax and reload the service to apply health check settings.

sudo nginx -t
sudo systemctl reload nginx
sudo systemctl status nginx

Configure upstream server health endpoints

Create health check endpoints on your backend servers that return appropriate status information.

# Example health endpoint response
curl -X GET http://192.168.1.10/health

Expected JSON response:
{
    "status": "ok",
    "timestamp": "2024-01-15T10:30:00Z",
    "version": "1.2.3",
    "checks": {
        "database": "healthy",
        "cache": "healthy",
        "disk_space": "healthy"
    }
}

Monitor health check status

Use NGINX Plus API for monitoring

Query the NGINX Plus API to get real-time health check status and upstream server states.

# Get upstream server status
curl http://localhost:8080/api/6/http/upstreams/backend

# List the configured servers in the upstream group
curl http://localhost:8080/api/6/http/upstreams/backend/servers

# Mark a specific server as down (requires api write=on)
curl -X PATCH -d '{"down": true}' http://localhost:8080/api/6/http/upstreams/backend/servers/0

Set up Prometheus monitoring

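Expose metrics to Prometheus so that health data feeds into your existing monitoring and alerting stack. The configuration below is only a placeholder: /metrics returns a static nginx_up gauge and /nginx_status exposes basic connection counters. Per-upstream health metrics require an exporter that reads the NGINX Plus API, such as the NGINX Prometheus Exporter.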

server {
    listen 9113;
    server_name localhost;
    
    allow 127.0.0.1;
    allow 192.168.1.0/24;
    deny all;
    
    location /metrics {
        access_log off;
        # Set the type with default_type; add_header would emit a second Content-Type header
        default_type text/plain;
        return 200 "# HELP nginx_up NGINX status\n# TYPE nginx_up gauge\nnginx_up 1\n";
    }
    
    location /nginx_status {
        stub_status;
        access_log off;
    }
}

Configure log monitoring

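Set up a structured access log that records which upstream server handled each request, its status, and its response time, so you can correlate client-facing errors with server failures and recovery. Active health check probes are not written to the access log; probe failures appear in the error log.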

log_format health_check '$remote_addr - $remote_user [$time_local] '
                       '"$request" $status $body_bytes_sent '
                       '"$http_referer" "$http_user_agent" '
                       'upstream_addr="$upstream_addr" '
                       'upstream_status="$upstream_status" '
                       'upstream_response_time="$upstream_response_time"';

access_log /var/log/nginx/health_check.log health_check;

Configure alerting

Set up automated alerts

Configure alerting when upstream servers fail health checks or become unavailable. This ensures rapid response to infrastructure issues.

#!/bin/bash

# Monitor NGINX Plus upstream health and send alerts
UPSTREAM_NAME="backend"
API_URL="http://localhost:8080/api/6/http/upstreams/$UPSTREAM_NAME"
ALERT_EMAIL="ops@example.com"

# Collect the addresses of all peers currently marked unhealthy
STATUS=$(curl -s "$API_URL" | jq -r '.peers[] | select(.state == "unhealthy") | .server')

if [[ -n "$STATUS" ]]; then
    echo "ALERT: Unhealthy servers detected: $STATUS" | \
    mail -s "NGINX Plus Health Check Alert" "$ALERT_EMAIL"
fi
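
When a script like this runs on a schedule, it sends the same alert on every run for as long as a server stays unhealthy. One possible way to reduce the noise, sketched below, is to remember the previous list of unhealthy servers and alert only when it changes. The function name `unhealthy_changed` and the state file path in the usage comment are hypothetical and not part of NGINX Plus.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit 0 only when the unhealthy-server list differs from
# the list recorded in <state_file>, then record the new list for the next run.
# Usage: unhealthy_changed "<current list>" <state_file>
unhealthy_changed() {
    local current=$1 state_file=$2 previous=""
    if [[ -f $state_file ]]; then
        previous=$(<"$state_file")
    fi
    printf '%s' "$current" > "$state_file"
    [[ $current != "$previous" ]]
}

# Possible use inside the alert script (the path is an assumption):
#   if unhealthy_changed "$STATUS" /var/tmp/nginx-unhealthy.state && [[ -n "$STATUS" ]]; then
#       echo "ALERT: Unhealthy servers detected: $STATUS" | mail -s "NGINX Plus Health Check Alert" "$ALERT_EMAIL"
#   fi
```

A missing state file is treated as an empty previous list, so the first run alerts only if something is already unhealthy.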

Schedule health monitoring

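Save the monitoring script as /usr/local/bin/nginx-health-monitor.sh and add it to root's crontab so that it runs every two minutes. The command below appends to the existing crontab instead of replacing it.

sudo chmod +x /usr/local/bin/nginx-health-monitor.sh
(sudo crontab -l 2>/dev/null; echo "*/2 * * * * /usr/local/bin/nginx-health-monitor.sh") | sudo crontab -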
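
Re-running a crontab installation command can leave duplicate entries behind. The bash sketch below shows one way to make the step idempotent: a filter that reads the current crontab on standard input and adds the entry only if an identical line is not already there. The function name `merge_cron_entry` is an illustrative assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read an existing crontab on stdin and print it with
# <entry> appended, unless an identical line is already present.
# Usage: merge_cron_entry "<entry>"
merge_cron_entry() {
    local entry=$1 line found=0
    # The "|| [[ -n $line ]]" part keeps a final line that lacks a trailing newline
    while IFS= read -r line || [[ -n $line ]]; do
        printf '%s\n' "$line"
        if [[ $line == "$entry" ]]; then found=1; fi
    done
    if (( found == 0 )); then printf '%s\n' "$entry"; fi
}

# Possible use:
#   sudo crontab -l 2>/dev/null \
#     | merge_cron_entry "*/2 * * * * /usr/local/bin/nginx-health-monitor.sh" \
#     | sudo crontab -
```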

Verify your setup

Test your active health checks by simulating server failures and monitoring the response.

# Check NGINX Plus status (the service is named nginx)
sudo systemctl status nginx

# Verify health check configuration
sudo nginx -T | grep -A5 health_check

# Confirm the dashboard is reachable
curl -I http://localhost:8080/dashboard.html

# Monitor upstream status
curl -s http://localhost:8080/api/6/http/upstreams/backend | jq '.peers[] | {server: .server, state: .state, health_checks: .health_checks}'

# Simulate server failure by dropping traffic to one backend
sudo iptables -A OUTPUT -d 192.168.1.10 -j DROP

# Watch server removal from load balancer (wait 15-30 seconds)
watch 'curl -s http://localhost:8080/api/6/http/upstreams/backend | jq ".peers[] | {server: .server, state: .state}"'

# Remove the test rule when you are finished
sudo iptables -D OUTPUT -d 192.168.1.10 -j DROP

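Note: Detection time follows from your settings. It is roughly the interval multiplied by the fails threshold, which is about 15 seconds for the backend group above and up to 30 seconds with slower settings. A DROP rule makes probes hang until the proxy connect timeout expires instead of failing immediately, so detection can take longer unless proxy_connect_timeout is set low. Monitor the dashboard or API during testing to observe state changes.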
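
The 15-30 second range can be derived from the health_check parameters used earlier. The bash sketch below multiplies the probe interval by the consecutive-result threshold to get a nominal window for each state change. The function name `nominal_window` is an illustrative assumption, and the estimate ignores the time each probe itself takes, so a probe that hangs adds its timeout on top.

```shell
#!/usr/bin/env bash
# Hypothetical helper: nominal seconds until a server changes state,
# i.e. probe interval multiplied by the consecutive-result threshold.
# Usage: nominal_window <interval_seconds> <threshold>
nominal_window() {
    local interval_s=$1 threshold=$2
    echo $(( interval_s * threshold ))
}

# backend:     interval=5s  fails=3 passes=2
# api_servers: interval=10s fails=2 passes=3
echo "backend:     unhealthy after ~$(nominal_window 5 3)s, healthy again after ~$(nominal_window 5 2)s"
echo "api_servers: unhealthy after ~$(nominal_window 10 2)s, healthy again after ~$(nominal_window 10 3)s"
```

Lowering the interval shortens both windows at the cost of more probe traffic, while raising fails trades slower detection for fewer false removals.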

Common issues

Symptom	Cause	Fix
Health checks not working	Missing zone directive in upstream	Add `zone backend 64k;` to upstream block
Match condition fails	Backend returns different content	Update match block or fix backend health endpoint
Servers marked unhealthy immediately	Health endpoint unreachable	Check firewall rules and backend server status
Dashboard shows 403 Forbidden	IP not in allow list	Add your IP to `allow` directives in dashboard config
SSL health checks fail	Certificate validation issues	Use `proxy_ssl_verify off;` or configure proper certificates

Advanced configuration patterns

Multi-tier health checks

Configure different health check strategies for different application tiers with varying sensitivity levels.

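# Database tier - strict passive checks only (no health_check is attached to this group below).
# PostgreSQL does not speak HTTP, so in practice this group belongs in a stream {} block.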
upstream database {
    zone db_backend 64k;
    server 192.168.2.10:5432 max_fails=1 fail_timeout=60s;
    server 192.168.2.11:5432 max_fails=1 fail_timeout=60s;
}

# API tier - moderate health checks
upstream api {
    zone api_backend 64k;
    server 192.168.1.20:8080 max_fails=2 fail_timeout=30s;
    server 192.168.1.21:8080 max_fails=2 fail_timeout=30s;
}

# Frontend tier - lenient health checks
upstream frontend {
    zone frontend_backend 64k;
    server 192.168.1.30:80 max_fails=3 fail_timeout=15s;
    server 192.168.1.31:80 max_fails=3 fail_timeout=15s;
}

server {
    listen 443 ssl;
    server_name app.example.com;
    
    location /api/ {
        proxy_pass http://api;
        health_check interval=5s fails=2 passes=2 uri=/api/health;
    }
    
    location / {
        proxy_pass http://frontend;
        health_check interval=10s fails=3 passes=1 uri=/status;
    }
}

Integration with external monitoring

You can integrate NGINX Plus health checks with external monitoring systems. The Prometheus Blackbox Exporter provides additional endpoint monitoring capabilities that complement NGINX Plus active health checks.

For comprehensive alerting workflows, consider setting up Prometheus AlertManager to handle complex notification routing based on health check status and other infrastructure metrics.

Automated install script

Run this to automate the entire setup

install.sh

#!/usr/bin/env bash
set -euo pipefail

# NGINX Plus Active Health Checks Installation Script
# Supports Ubuntu, Debian, AlmaLinux, Rocky Linux, CentOS, RHEL

# Colors
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'

# Variables
DOMAIN=${1:-"example.com"}
API_DOMAIN=${2:-"api.example.com"}
BACKEND_IPS=${3:-"192.168.1.10,192.168.1.11,192.168.1.12"}

usage() {
    echo "Usage: $0 [domain] [api_domain] [backend_ips]"
    echo "Example: $0 mysite.com api.mysite.com 10.0.1.10,10.0.1.11,10.0.1.12"
    exit 1
}

error() {
    echo -e "${RED}[ERROR]${NC} $1" >&2
}

warning() {
    echo -e "${YELLOW}[WARNING]${NC} $1"
}

success() {
    echo -e "${GREEN}[SUCCESS]${NC} $1"
}

cleanup() {
    if [[ ${BACKUP_DIR:-} ]]; then
        warning "Failure detected. Backup files available in $BACKUP_DIR"
    fi
}

trap cleanup ERR

# Validate arguments
if [[ $# -gt 3 ]]; then
    usage
fi

# Check root privileges
if [[ $EUID -ne 0 ]]; then
    error "This script must be run as root"
    exit 1
fi

echo "[1/8] Detecting operating system..."

# Detect distribution
if [ -f /etc/os-release ]; then
    . /etc/os-release
    case "$ID" in
        ubuntu|debian) 
            PKG_MGR="apt"
            PKG_INSTALL="apt install -y"
            PKG_UPDATE="apt update"
            NGINX_CONFIG_DIR="/etc/nginx"
            # The nginx-plus package only includes conf.d/*.conf; it does not
            # create sites-available or sites-enabled
            SITES_DIR="/etc/nginx/conf.d"
            USE_SITES=false
            ;;
        almalinux|rocky|centos|rhel|ol)
            PKG_MGR="dnf"
            PKG_INSTALL="dnf install -y"
            # "|| true" inside a variable is passed to dnf as arguments; makecache exits 0 on success
            PKG_UPDATE="dnf makecache"
            NGINX_CONFIG_DIR="/etc/nginx"
            SITES_DIR="/etc/nginx/conf.d"
            USE_SITES=false
            ;;
        fedora)
            PKG_MGR="dnf"
            PKG_INSTALL="dnf install -y"
            PKG_UPDATE="dnf makecache"
            NGINX_CONFIG_DIR="/etc/nginx"
            SITES_DIR="/etc/nginx/conf.d"
            USE_SITES=false
            ;;
        amzn)
            PKG_MGR="yum"
            PKG_INSTALL="yum install -y"
            PKG_UPDATE="yum makecache"
            NGINX_CONFIG_DIR="/etc/nginx"
            SITES_DIR="/etc/nginx/conf.d"
            USE_SITES=false
            ;;
        *)
            error "Unsupported distribution: $ID"
            exit 1
            ;;
    esac
else
    error "Cannot detect operating system"
    exit 1
fi

success "Detected $PRETTY_NAME"

echo "[2/8] Installing dependencies..."
$PKG_UPDATE
$PKG_INSTALL wget curl gnupg2

echo "[3/8] Setting up NGINX Plus repository..."

# Create backup directory
BACKUP_DIR="/tmp/nginx-plus-backup-$(date +%Y%m%d-%H%M%S)"
mkdir -p "$BACKUP_DIR"

# Backup existing nginx config if present
if [[ -d "$NGINX_CONFIG_DIR" ]]; then
    cp -r "$NGINX_CONFIG_DIR" "$BACKUP_DIR/"
fi

# Download NGINX signing key
wget -qO /tmp/nginx_signing.key https://nginx.org/keys/nginx_signing.key

if [[ "$PKG_MGR" == "apt" ]]; then
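    # apt-key is deprecated; keep the signing key in a dedicated keyring instead
    gpg --dearmor < /tmp/nginx_signing.key > /usr/share/keyrings/nginx-archive-keyring.gpg
    # VERSION_CODENAME comes from /etc/os-release, so lsb_release is not required
    echo "deb [signed-by=/usr/share/keyrings/nginx-archive-keyring.gpg] https://plus-pkgs.nginx.com/$ID ${VERSION_CODENAME} nginx-plus" > /etc/apt/sources.list.d/nginx-plus.list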
    $PKG_UPDATE
else
    rpm --import /tmp/nginx_signing.key
    cat > /etc/yum.repos.d/nginx-plus.repo << EOF
[nginx-plus]
name=nginx-plus repo
baseurl=https://plus-pkgs.nginx.com/centos/\$releasever/\$basearch/
gpgcheck=1
enabled=1
EOF
fi

echo "[4/8] Installing NGINX Plus..."
$PKG_INSTALL nginx-plus

echo "[5/8] Configuring upstream servers..."

# Create upstream configuration
cat > "${NGINX_CONFIG_DIR}/conf.d/upstream.conf" << EOF
upstream backend {
    zone backend 64k;
EOF

IFS=',' read -ra IPS <<< "$BACKEND_IPS"
for ip in "${IPS[@]}"; do
    echo "    server ${ip}:80 max_fails=3 fail_timeout=30s;" >> "${NGINX_CONFIG_DIR}/conf.d/upstream.conf"
done

cat >> "${NGINX_CONFIG_DIR}/conf.d/upstream.conf" << EOF
}

upstream api_servers {
    zone api_backend 128k;
EOF

for ip in "${IPS[@]}"; do
    echo "    server ${ip}:8080 slow_start=30s;" >> "${NGINX_CONFIG_DIR}/conf.d/upstream.conf"
done

cat >> "${NGINX_CONFIG_DIR}/conf.d/upstream.conf" << EOF
}

match server_ok {
    status 200;
    header Content-Type ~ "application/json";
    body ~ '"status":\s*"ok"';
}

match api_healthy {
    status 200-299;
    header X-Health-Status = "healthy";
    body !~ "error|maintenance";
}
EOF

echo "[6/8] Configuring virtual hosts..."

# Main site configuration
if [[ "$USE_SITES" == true ]]; then
    MAIN_CONFIG="${SITES_DIR}/${DOMAIN}"
else
    MAIN_CONFIG="${SITES_DIR}/${DOMAIN}.conf"
fi

cat > "$MAIN_CONFIG" << EOF
server {
    listen 80;
    server_name $DOMAIN;
    
    location / {
        proxy_pass http://backend;
        health_check interval=5s fails=3 passes=2 uri=/health match=server_ok;
        
        proxy_set_header Host \$host;
        proxy_set_header X-Real-IP \$remote_addr;
        proxy_set_header X-Forwarded-For \$proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto \$scheme;
        
        proxy_connect_timeout 3s;
        proxy_send_timeout 10s;
        proxy_read_timeout 10s;
    }
}
EOF

# API site configuration
if [[ "$USE_SITES" == true ]]; then
    API_CONFIG="${SITES_DIR}/${API_DOMAIN}"
else
    API_CONFIG="${SITES_DIR}/${API_DOMAIN}.conf"
fi

cat > "$API_CONFIG" << EOF
server {
    listen 80;
    server_name $API_DOMAIN;
    
    location /api/ {
        proxy_pass http://api_servers;
        health_check interval=10s fails=2 passes=3 uri=/api/health match=api_healthy;
        # health_check_timeout is stream-only; HTTP probes use the proxy timeouts below
        
        proxy_set_header Host \$host;
        proxy_set_header X-Real-IP \$remote_addr;
        proxy_set_header X-Forwarded-For \$proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto \$scheme;
        
        proxy_connect_timeout 3s;
        proxy_send_timeout 10s;
        proxy_read_timeout 10s;
    }
}
EOF

# Dashboard configuration
cat > "${NGINX_CONFIG_DIR}/conf.d/dashboard.conf" << EOF
server {
    listen 8080;
    server_name localhost;
    
    # Restrict access to dashboard
    allow 127.0.0.1;
    allow 10.0.0.0/8;
    allow 172.16.0.0/12;
    allow 192.168.0.0/16;
    deny all;
    
    location /api {
        api write=on;
        access_log off;
    }
    
    location = /dashboard.html {
        root /usr/share/nginx/html;
    }
    
    location /swagger-ui {
        root /usr/share/nginx/html;
    }
}
EOF

# Enable sites if using sites-available
if [[ "$USE_SITES" == true ]]; then
    ln -sf "${SITES_DIR}/${DOMAIN}" "${SITES_ENABLED}/"
    ln -sf "${SITES_DIR}/${API_DOMAIN}" "${SITES_ENABLED}/"
    # Remove default site
    rm -f "${SITES_ENABLED}/default"
fi

echo "[7/8] Setting permissions and enabling service..."

# Set proper permissions
chown -R root:root "$NGINX_CONFIG_DIR"
find "$NGINX_CONFIG_DIR" -type d -exec chmod 755 {} \;
find "$NGINX_CONFIG_DIR" -type f -exec chmod 644 {} \;

# Test configuration
nginx -t

# Configure firewall
if command -v firewall-cmd >/dev/null 2>&1; then
    firewall-cmd --permanent --add-service=http
    firewall-cmd --permanent --add-port=8080/tcp
    firewall-cmd --reload
elif command -v ufw >/dev/null 2>&1; then
    ufw allow 80/tcp
    ufw allow 8080/tcp
fi

# Enable and start NGINX Plus
systemctl enable nginx
systemctl restart nginx

echo "[8/8] Verifying installation..."

# Wait for service to start
sleep 2

if systemctl is-active --quiet nginx; then
    success "NGINX Plus is running"
else
    error "NGINX Plus failed to start"
    exit 1
fi

# Test configuration endpoints
if curl -fs http://localhost:8080/api/6/nginx >/dev/null; then
    success "NGINX Plus API is accessible"
else
    warning "NGINX Plus API may not be accessible"
fi

success "NGINX Plus with active health checks installed successfully!"
echo
echo "Configuration details:"
echo "- Main site: http://$DOMAIN"
echo "- API site: http://$API_DOMAIN"
echo "- Dashboard: http://localhost:8080/dashboard.html"
echo "- API endpoint: http://localhost:8080/api"
echo
echo "Backend servers configured:"
for ip in "${IPS[@]}"; do
    echo "  - ${ip}:80 (web)"
    echo "  - ${ip}:8080 (api)"
done
echo
echo "Health check endpoints expected:"
echo "  - GET /health (returns JSON with status:ok)"
echo "  - GET /api/health (returns X-Health-Status: healthy header)"

Review the script before running. It must run as root, for example: sudo bash install.sh mysite.com api.mysite.com 10.0.1.10,10.0.1.11,10.0.1.12
