Learn to deploy Apache Airflow in production with PostgreSQL backend, systemd services, SSL/TLS encryption, and security hardening for enterprise workflow orchestration.
Prerequisites
- Root or sudo access
- 4GB RAM minimum
- 20GB disk space
- Domain name for SSL certificate
- Basic Python knowledge
What this solves
Apache Airflow is a powerful platform for developing, scheduling, and monitoring workflows as directed acyclic graphs (DAGs). This tutorial shows you how to deploy Airflow in production with PostgreSQL as the backend database, proper user management, systemd services, and security hardening. You'll have a scalable workflow orchestration platform ready for enterprise data pipelines.
Step-by-step installation
Update system packages
Start by updating your package manager to ensure you get the latest versions of all dependencies.
sudo apt update && sudo apt upgrade -y
Install system dependencies
Install Python, PostgreSQL client libraries, and other required packages for Airflow compilation and operation.
sudo apt install -y python3 python3-pip python3-venv python3-dev \
build-essential libpq-dev libffi-dev libssl-dev \
postgresql-client redis-tools git curl
Create dedicated airflow user
Create a dedicated system user for running Airflow services. This follows security best practices by avoiding running services as root.
sudo useradd -r -m -s /bin/bash airflow
Install and configure PostgreSQL
Install PostgreSQL server and create the database and user for Airflow. We'll use a dedicated database for better isolation and performance.
sudo apt install -y postgresql postgresql-contrib
sudo systemctl enable --now postgresql
Create Airflow database and user
Set up the PostgreSQL database, user, and permissions for Airflow. Use a strong password in production environments.
sudo -u postgres createuser airflow
sudo -u postgres createdb airflow_db --owner airflow
sudo -u postgres psql -c "ALTER USER airflow PASSWORD 'AirflowDB2024!Strong';"
sudo -u postgres psql -c "GRANT ALL PRIVILEGES ON DATABASE airflow_db TO airflow;"
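Before moving on, confirm the role and database work end to end by connecting over TCP exactly as Airflow will (the password is the example one from this step):

```shell
# A single row containing "1" means the credentials and grants are correct.
PGPASSWORD='AirflowDB2024!Strong' psql -h 127.0.0.1 -U airflow -d airflow_db -c 'SELECT 1;'
```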
Install Redis for task queue
Install Redis to handle Airflow's task queue and caching. Redis provides better performance than database-based queuing.
sudo apt install -y redis-server
sudo systemctl enable --now redis-server
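A quick sanity check that Redis is responding and will survive a reboot:

```shell
redis-cli ping                      # expect: PONG
systemctl is-enabled redis-server   # expect: enabled
```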
Create Python virtual environment
Switch to the airflow user and create a virtual environment to isolate Airflow dependencies from system packages.
sudo -u airflow bash
cd /home/airflow
python3 -m venv airflow-env
source airflow-env/bin/activate
pip install --upgrade pip setuptools wheel
Install Apache Airflow
Install Airflow with PostgreSQL and Redis providers. Set environment variables for proper dependency resolution.
export AIRFLOW_HOME=/home/airflow/airflow
export AIRFLOW_VERSION=2.8.1
export PYTHON_VERSION="$(python3 -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')"
export CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
pip install "apache-airflow[postgres,redis,celery,crypto,ssh]==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
Configure Airflow settings
Create the Airflow configuration file at /home/airflow/airflow/airflow.cfg with the PostgreSQL backend, Redis broker, and production-ready settings shown below.
mkdir -p /home/airflow/airflow
cd /home/airflow/airflow
[core]
dags_folder = /home/airflow/airflow/dags
hostname_callable = airflow.utils.net.get_host_ip_address
default_timezone = utc
executor = CeleryExecutor
load_examples = False
max_active_runs_per_dag = 16
max_active_tasks_per_dag = 16
[database]
sql_alchemy_conn = postgresql://airflow:AirflowDB2024!Strong@localhost:5432/airflow_db
sql_alchemy_pool_size = 10
sql_alchemy_max_overflow = 20
sql_alchemy_pool_recycle = 3600
sql_alchemy_pool_pre_ping = True
[webserver]
base_url = https://airflow.example.com
web_server_host = 127.0.0.1
web_server_port = 8080
workers = 4
worker_timeout = 120
worker_refresh_batch_size = 1
worker_refresh_interval = 6000
# airflow.cfg is not processed by a shell, so paste a literal random value here,
# e.g. the output of: python3 -c 'import secrets; print(secrets.token_urlsafe(32))'
secret_key = CHANGE_ME
expose_config = False
[celery]
broker_url = redis://localhost:6379/0
result_backend = db+postgresql://airflow:AirflowDB2024!Strong@localhost:5432/airflow_db
worker_concurrency = 16
worker_log_server_port = 8793
[scheduler]
max_tis_per_query = 512
scheduler_heartbeat_sec = 5
num_runs = -1
scheduler_idle_sleep_time = 1
min_file_process_interval = 30
dag_dir_list_interval = 300
max_dagruns_to_create_per_loop = 10
max_dagruns_per_loop_to_schedule = 20
[logging]
logging_level = INFO
fab_logging_level = WARN
logging_config_class = airflow.config_templates.airflow_local_settings.DEFAULT_LOGGING_CONFIG
remote_logging = False
[metrics]
statsd_on = False
# Optional: use HashiCorp Vault as a secrets backend. This requires the
# apache-airflow-providers-hashicorp package and a reachable Vault server.
# [secrets]
# backend = airflow.providers.hashicorp.secrets.vault.VaultBackend
# backend_kwargs = {"url": "https://vault.example.com:8200", "connections_path": "connections", "variables_path": "variables"}
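The `secret_key` in the [webserver] section must be a literal string: airflow.cfg is never shell-expanded. The same goes for `fernet_key`, which the config above does not set; without it, connection passwords are stored unencrypted in the metadata database. Both values can be generated with the Python standard library alone. A Fernet key is simply 32 random bytes, URL-safe base64-encoded, the same format `cryptography.fernet.Fernet.generate_key()` produces:

```python
import base64
import os
import secrets

# secret_key: any URL-safe random string; used to sign web sessions.
secret_key = secrets.token_urlsafe(32)

# fernet_key: exactly 32 random bytes, url-safe base64-encoded.
fernet_key = base64.urlsafe_b64encode(os.urandom(32)).decode()

print(f"secret_key = {secret_key}")
print(f"fernet_key = {fernet_key}")
```

Paste the printed values into airflow.cfg (`fernet_key` goes under [core]).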
Initialize Airflow database
Initialize the Airflow metadata database and create an admin user for the web interface.
source /home/airflow/airflow-env/bin/activate
export AIRFLOW_HOME=/home/airflow/airflow
airflow db migrate  # 'airflow db init' still works in 2.8 but is deprecated
airflow users create \
--username admin \
--firstname Admin \
--lastname User \
--role Admin \
--email admin@example.com \
--password AdminAirflow2024!
Create DAGs directory
Create the directory structure for DAGs and set proper permissions. DAGs are the workflow definitions that Airflow will execute.
mkdir -p /home/airflow/airflow/dags
mkdir -p /home/airflow/airflow/logs
mkdir -p /home/airflow/airflow/plugins
chmod 755 /home/airflow/airflow/dags
chmod 755 /home/airflow/airflow/logs
chmod 755 /home/airflow/airflow/plugins
Exit airflow user session
Return to your regular user session to create systemd service files and configure the reverse proxy.
exit
Create systemd service for Airflow webserver
Create a systemd unit at /etc/systemd/system/airflow-webserver.service to manage the Airflow webserver process. This ensures automatic startup and proper process management.
[Unit]
Description=Airflow webserver daemon
After=network.target postgresql.service redis.service
Wants=postgresql.service redis.service
[Service]
User=airflow
Group=airflow
Type=simple
PrivateTmp=true
Environment=PATH=/home/airflow/airflow-env/bin:/usr/local/bin:/usr/bin:/bin
Environment=AIRFLOW_HOME=/home/airflow/airflow
ExecStart=/home/airflow/airflow-env/bin/airflow webserver --pid /run/airflow/webserver.pid
ExecReload=/bin/kill -s HUP $MAINPID
KillMode=mixed
TimeoutStopSec=60
Restart=on-failure
RestartSec=30
RuntimeDirectory=airflow
RuntimeDirectoryMode=0755
[Install]
WantedBy=multi-user.target
Create systemd service for Airflow scheduler
Create /etc/systemd/system/airflow-scheduler.service for the scheduler component, which handles DAG parsing and task scheduling.
[Unit]
Description=Airflow scheduler daemon
After=network.target postgresql.service redis.service
Wants=postgresql.service redis.service
[Service]
User=airflow
Group=airflow
Type=simple
PrivateTmp=true
Environment=PATH=/home/airflow/airflow-env/bin:/usr/local/bin:/usr/bin:/bin
Environment=AIRFLOW_HOME=/home/airflow/airflow
ExecStart=/home/airflow/airflow-env/bin/airflow scheduler
Restart=on-failure
RestartSec=30
KillMode=mixed
TimeoutStopSec=60
[Install]
WantedBy=multi-user.target
Create systemd service for Celery worker
Create /etc/systemd/system/airflow-worker.service for the Celery workers that execute the actual tasks defined in your DAGs.
[Unit]
Description=Airflow celery worker daemon
After=network.target postgresql.service redis.service
Wants=postgresql.service redis.service
[Service]
User=airflow
Group=airflow
Type=simple
PrivateTmp=true
Environment=PATH=/home/airflow/airflow-env/bin:/usr/local/bin:/usr/bin:/bin
Environment=AIRFLOW_HOME=/home/airflow/airflow
ExecStart=/home/airflow/airflow-env/bin/airflow celery worker
Restart=on-failure
RestartSec=30
KillMode=mixed
TimeoutStopSec=600
[Install]
WantedBy=multi-user.target
Install and configure NGINX reverse proxy
Install NGINX to provide SSL termination and reverse proxy capabilities for secure access to Airflow.
sudo apt install -y nginx certbot python3-certbot-nginx
Configure NGINX for Airflow
Create /etc/nginx/sites-available/airflow with a virtual host configuration that proxies requests to the Airflow webserver with proper headers and security settings.
upstream airflow {
server 127.0.0.1:8080 fail_timeout=0;
}
server {
listen 80;
server_name airflow.example.com;
return 301 https://$server_name$request_uri;
}
server {
listen 443 ssl http2;
server_name airflow.example.com;
# SSL configuration will be added by certbot
# Security headers
add_header X-Frame-Options DENY;
add_header X-Content-Type-Options nosniff;
add_header X-XSS-Protection "1; mode=block";
add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
# Increase client max body size for file uploads
client_max_body_size 100M;
location / {
proxy_pass http://airflow;
proxy_set_header Host $http_host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_redirect off;
proxy_buffering off;
proxy_request_buffering off;
# WebSocket support
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
# Timeout settings
proxy_connect_timeout 60s;
proxy_send_timeout 60s;
proxy_read_timeout 300s;
}
# Static files (nginx does not expand wildcards in alias paths;
# substitute your actual Python version, e.g. python3.10)
location /static/ {
alias /home/airflow/airflow-env/lib/python3.10/site-packages/airflow/www/static/;
expires 1y;
add_header Cache-Control "public, immutable";
}
}
Enable NGINX site and obtain SSL certificate
Enable the NGINX configuration and use Certbot to obtain a free SSL certificate from Let's Encrypt.
sudo ln -s /etc/nginx/sites-available/airflow /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl enable --now nginx
sudo certbot --nginx -d airflow.example.com
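Let's Encrypt certificates expire after 90 days; certbot installs an automatic renewal timer, which is worth verifying now rather than at expiry:

```shell
# Simulates a full renewal without modifying anything on disk.
sudo certbot renew --dry-run
# The systemd timer that performs real renewals should appear here.
systemctl list-timers | grep certbot
```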
Configure firewall rules
Open only the ports needed for SSH and web access. PostgreSQL and Redis are bound to localhost and must never be exposed through the firewall.
sudo ufw allow OpenSSH
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw --force enable
Start and enable all Airflow services
Start all the Airflow services and enable them to start automatically on system boot.
sudo systemctl daemon-reload
sudo systemctl enable --now airflow-webserver
sudo systemctl enable --now airflow-scheduler
sudo systemctl enable --now airflow-worker
Create a sample DAG
Create a simple DAG to test your Airflow installation and verify that scheduling works correctly. Save the following as /home/airflow/airflow/dags/sample_dag.py:
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator
def print_hello():
return "Hello from Airflow!"
default_args = {
'owner': 'airflow',
'depends_on_past': False,
'start_date': datetime(2024, 1, 1),
'email_on_failure': False,
'email_on_retry': False,
'retries': 1,
'retry_delay': timedelta(minutes=5)
}
dag = DAG(
'sample_dag',
default_args=default_args,
description='A simple sample DAG',
schedule=timedelta(hours=1),
catchup=False,
tags=['sample']
)
hello_task = PythonOperator(
task_id='hello_task',
python_callable=print_hello,
dag=dag
)
date_task = BashOperator(
task_id='date_task',
bash_command='date',
dag=dag
)
hello_task >> date_task
sudo chown airflow:airflow /home/airflow/airflow/dags/sample_dag.py
sudo chmod 644 /home/airflow/airflow/dags/sample_dag.py
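You can exercise the DAG immediately instead of waiting for the scheduler: `airflow dags test` performs a complete in-process DAG run, surfacing import errors and task failures on the console:

```shell
sudo -u airflow bash -c '
source /home/airflow/airflow-env/bin/activate
export AIRFLOW_HOME=/home/airflow/airflow
# Importing the file catches Python syntax errors before Airflow sees them.
python /home/airflow/airflow/dags/sample_dag.py
# Runs hello_task and date_task once, logging results locally.
airflow dags test sample_dag 2024-01-01
'
```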
Production security hardening
Configure PostgreSQL security
Harden PostgreSQL by restricting the airflow role to local connections only. Edit pg_hba.conf (on Debian/Ubuntu: /etc/postgresql/<version>/main/pg_hba.conf) and add these entries above the broader default rules, since entries are matched top to bottom:
local airflow_db airflow md5
host airflow_db airflow 127.0.0.1/32 md5
host airflow_db airflow ::1/128 md5
sudo systemctl reload postgresql
Configure Redis security
Secure Redis by enabling authentication and binding to localhost only to prevent unauthorized access. Edit /etc/redis/redis.conf and set:
bind 127.0.0.1 ::1
requirepass RedisAirflow2024!Strong
maxmemory 256mb
maxmemory-policy allkeys-lru
save 900 1
save 300 10
save 60 10000
sudo systemctl restart redis-server
Update Airflow configuration for Redis auth
Update the [celery] section of airflow.cfg so the broker URL carries the Redis password:
[celery]
broker_url = redis://:RedisAirflow2024!Strong@localhost:6379/0
result_backend = db+postgresql://airflow:AirflowDB2024!Strong@localhost:5432/airflow_db
Set up log rotation
Configure log rotation to prevent disk space issues in production environments. Create /etc/logrotate.d/airflow. Using copytruncate lets Airflow keep writing to the same file descriptors, so no service reload is needed after rotation:
/home/airflow/airflow/logs/*.log /home/airflow/airflow/logs/*/*.log {
daily
missingok
rotate 14
compress
delaycompress
notifempty
create 644 airflow airflow
copytruncate
}
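logrotate stanzas are easy to get subtly wrong; a debug run reports what would happen without rotating anything (assuming you saved the stanza as /etc/logrotate.d/airflow):

```shell
# -d = dry run: parse the config and print planned actions only.
sudo logrotate -d /etc/logrotate.d/airflow
```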
Restart services with new configuration
Restart all Airflow services to apply the security configuration changes.
sudo systemctl restart airflow-webserver
sudo systemctl restart airflow-scheduler
sudo systemctl restart airflow-worker
Verify your setup
# Check all services are running
sudo systemctl status airflow-webserver
sudo systemctl status airflow-scheduler
sudo systemctl status airflow-worker
sudo systemctl status postgresql
sudo systemctl status redis-server
sudo systemctl status nginx
Test database connectivity
sudo -u airflow bash -c 'source /home/airflow/airflow-env/bin/activate && export AIRFLOW_HOME=/home/airflow/airflow && airflow db check'
Check web interface accessibility
curl -I https://airflow.example.com
List DAGs
sudo -u airflow bash -c 'source /home/airflow/airflow-env/bin/activate && export AIRFLOW_HOME=/home/airflow/airflow && airflow dags list'
Access the Airflow web interface at https://airflow.example.com and log in with username admin and password AdminAirflow2024!. You should see the sample DAG in the interface.
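Beyond the UI, Airflow 2.x also exposes a /health endpoint that reports the status of the metadata database and the scheduler, which is handy for monitoring probes. A minimal parser, assuming the documented 2.x payload shape:

```python
import json


def parse_health(payload: str) -> bool:
    """True when both the metadatabase and scheduler report healthy."""
    data = json.loads(payload)
    return all(
        data.get(component, {}).get("status") == "healthy"
        for component in ("metadatabase", "scheduler")
    )


# Example probe against the domain used in this guide:
#   curl -s https://airflow.example.com/health
# then feed the response body to parse_health().
```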
Performance tuning
| Component | Setting | Recommended Value |
|---|---|---|
| PostgreSQL | max_connections | 200 |
| PostgreSQL | shared_buffers | 25% of RAM |
| Airflow | worker_concurrency | CPU cores × 4 |
| Airflow | sql_alchemy_pool_size | 10-20 |
| Redis | maxmemory | 10% of total RAM |
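The PostgreSQL values in the table can be applied with ALTER SYSTEM rather than editing postgresql.conf by hand. shared_buffers is shown here for a 16 GB host; scale it to roughly 25% of your RAM:

```shell
sudo -u postgres psql -c "ALTER SYSTEM SET max_connections = 200;"
sudo -u postgres psql -c "ALTER SYSTEM SET shared_buffers = '4GB';"
# Both parameters take effect only after a full restart, not a reload.
sudo systemctl restart postgresql
```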
Common issues
| Symptom | Cause | Fix |
|---|---|---|
| Webserver won't start | Database connection failed | Check PostgreSQL status and credentials in airflow.cfg |
| Tasks stuck in queued state | Celery worker not running | sudo systemctl restart airflow-worker |
| DAGs not appearing | File permissions or syntax errors | Check chown airflow:airflow and run python dag_file.py |
| SSL certificate errors | Certbot failed or expired | Run sudo certbot renew and restart nginx |
| High memory usage | Too many concurrent workers | Reduce worker_concurrency in airflow.cfg |
| Database locks | Too many connections | Tune sql_alchemy_pool_size and PostgreSQL max_connections |
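For any of the symptoms above, start with the service journals:

```shell
# Last 50 log lines for each Airflow service.
sudo journalctl -u airflow-webserver -n 50 --no-pager
sudo journalctl -u airflow-scheduler -n 50 --no-pager
sudo journalctl -u airflow-worker -n 50 --no-pager
# Follow the scheduler live while reproducing a problem.
sudo journalctl -u airflow-scheduler -f
```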
Next steps
Automated install script
Run this to automate the entire setup
#!/usr/bin/env bash
set -euo pipefail
# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'
# Variables
AIRFLOW_USER="airflow"
AIRFLOW_HOME="/home/airflow/airflow"
AIRFLOW_VERSION="2.8.1"
DB_NAME="airflow_db"
DB_PASSWORD="AirflowDB2024!Strong"
# Usage message
usage() {
echo "Usage: $0"
echo "Install and configure Apache Airflow with PostgreSQL backend"
exit 1
}
# Logging functions
log_info() { echo -e "${GREEN}[INFO]${NC} $1"; }
log_warn() { echo -e "${YELLOW}[WARN]${NC} $1"; }
log_error() { echo -e "${RED}[ERROR]${NC} $1"; }
# Cleanup function for rollback
cleanup() {
log_error "Installation failed. Cleaning up..."
systemctl stop airflow-webserver airflow-scheduler 2>/dev/null || true
systemctl disable airflow-webserver airflow-scheduler 2>/dev/null || true
rm -f /etc/systemd/system/airflow-*.service
userdel -r airflow 2>/dev/null || true
sudo -u postgres dropdb airflow_db 2>/dev/null || true
sudo -u postgres dropuser airflow 2>/dev/null || true
exit 1
}
trap cleanup ERR
# Check if running as root
if [[ $EUID -ne 0 ]]; then
log_error "This script must be run as root"
exit 1
fi
# Auto-detect distribution
if [ -f /etc/os-release ]; then
. /etc/os-release
case "$ID" in
ubuntu|debian)
PKG_MGR="apt"
PKG_INSTALL="apt install -y"
PKG_UPDATE="apt update && apt upgrade -y"
POSTGRES_SERVICE="postgresql"
REDIS_SERVICE="redis-server"
REDIS_PKG="redis-server"
POSTGRES_PKG="postgresql postgresql-contrib"
PYTHON_DEV="python3-dev"
;;
almalinux|rocky|centos|rhel|ol|fedora)
PKG_MGR="dnf"
PKG_INSTALL="dnf install -y"
PKG_UPDATE="dnf update -y"
POSTGRES_SERVICE="postgresql"
REDIS_SERVICE="redis"
REDIS_PKG="redis"
POSTGRES_PKG="postgresql-server postgresql-contrib"
PYTHON_DEV="python3-devel"
;;
amzn)
PKG_MGR="yum"
PKG_INSTALL="yum install -y"
PKG_UPDATE="yum update -y"
POSTGRES_SERVICE="postgresql"
REDIS_SERVICE="redis"
REDIS_PKG="redis"
POSTGRES_PKG="postgresql-server postgresql-contrib"
PYTHON_DEV="python3-devel"
;;
*)
log_error "Unsupported distribution: $ID"
exit 1
;;
esac
else
log_error "Cannot detect distribution"
exit 1
fi
log_info "Detected distribution: $ID"
echo "[1/12] Updating system packages..."
$PKG_UPDATE
echo "[2/12] Installing system dependencies..."
if [[ "$ID" == "ubuntu" || "$ID" == "debian" ]]; then
$PKG_INSTALL python3 python3-pip python3-venv $PYTHON_DEV build-essential libpq-dev libffi-dev libssl-dev postgresql-client redis-tools git curl
else
$PKG_INSTALL python3 python3-pip gcc gcc-c++ make postgresql-devel libffi-devel openssl-devel postgresql redis git curl
fi
echo "[3/12] Creating dedicated airflow user..."
useradd -r -m -s /bin/bash $AIRFLOW_USER 2>/dev/null || log_warn "User $AIRFLOW_USER already exists"
echo "[4/12] Installing and configuring PostgreSQL..."
$PKG_INSTALL $POSTGRES_PKG
# Initialize PostgreSQL for RHEL-based systems
if [[ "$ID" != "ubuntu" && "$ID" != "debian" ]]; then
if [ ! -f /var/lib/pgsql/data/postgresql.conf ]; then
postgresql-setup --initdb
fi
fi
systemctl enable --now $POSTGRES_SERVICE
echo "[5/12] Creating Airflow database and user..."
sleep 3
sudo -u postgres createuser $AIRFLOW_USER 2>/dev/null || log_warn "PostgreSQL user already exists"
sudo -u postgres createdb $DB_NAME --owner $AIRFLOW_USER 2>/dev/null || log_warn "Database already exists"
sudo -u postgres psql -c "ALTER USER $AIRFLOW_USER PASSWORD '$DB_PASSWORD';"
sudo -u postgres psql -c "GRANT ALL PRIVILEGES ON DATABASE $DB_NAME TO $AIRFLOW_USER;"
echo "[6/12] Installing and configuring Redis..."
$PKG_INSTALL $REDIS_PKG
systemctl enable --now $REDIS_SERVICE
echo "[7/12] Creating Python virtual environment..."
sudo -u $AIRFLOW_USER bash -c "
cd /home/$AIRFLOW_USER
python3 -m venv airflow-env
source airflow-env/bin/activate
pip install --upgrade pip setuptools wheel
"
echo "[8/12] Installing Apache Airflow..."
sudo -u $AIRFLOW_USER bash -c "
export AIRFLOW_HOME=$AIRFLOW_HOME
export AIRFLOW_VERSION=$AIRFLOW_VERSION
export PYTHON_VERSION=\$(python3 -c 'import sys; print(f\"{sys.version_info.major}.{sys.version_info.minor}\")')
export CONSTRAINT_URL=\"https://raw.githubusercontent.com/apache/airflow/constraints-\${AIRFLOW_VERSION}/constraints-\${PYTHON_VERSION}.txt\"
cd /home/$AIRFLOW_USER
source airflow-env/bin/activate
pip install \"apache-airflow[postgres,redis,celery,crypto,ssh]==\${AIRFLOW_VERSION}\" --constraint \"\${CONSTRAINT_URL}\"
"
echo "[9/12] Configuring Airflow..."
sudo -u $AIRFLOW_USER mkdir -p $AIRFLOW_HOME/{dags,logs,plugins}
sudo -u $AIRFLOW_USER tee $AIRFLOW_HOME/airflow.cfg > /dev/null << EOF
[core]
dags_folder = $AIRFLOW_HOME/dags
hostname_callable = airflow.utils.net.get_host_ip_address
default_timezone = utc
executor = CeleryExecutor
load_examples = False
max_active_runs_per_dag = 16
max_active_tasks_per_dag = 16
[database]
sql_alchemy_conn = postgresql://$AIRFLOW_USER:$DB_PASSWORD@localhost:5432/$DB_NAME
[webserver]
web_server_port = 8080
base_url = http://localhost:8080
expose_config = False
[api]
auth_backends = airflow.api.auth.backend.basic_auth
[celery]
broker_url = redis://localhost:6379/0
result_backend = db+postgresql://$AIRFLOW_USER:$DB_PASSWORD@localhost:5432/$DB_NAME
[scheduler]
catchup_by_default = False
parsing_processes = 2
[logging]
base_log_folder = $AIRFLOW_HOME/logs
remote_logging = False
EOF
chown -R $AIRFLOW_USER:$AIRFLOW_USER /home/$AIRFLOW_USER
chmod -R 750 /home/$AIRFLOW_USER
echo "[10/12] Initializing Airflow database..."
sudo -u $AIRFLOW_USER bash -c "
export AIRFLOW_HOME=$AIRFLOW_HOME
cd /home/$AIRFLOW_USER
source airflow-env/bin/activate
airflow db migrate
airflow users create --username admin --firstname Admin --lastname User --role Admin --email admin@example.com --password admin123
"
echo "[11/12] Creating systemd services..."
tee /etc/systemd/system/airflow-webserver.service > /dev/null << EOF
[Unit]
Description=Airflow webserver daemon
After=network.target postgresql.service redis.service
Wants=postgresql.service redis.service
[Service]
Environment=AIRFLOW_HOME=$AIRFLOW_HOME
Environment=PATH=/home/$AIRFLOW_USER/airflow-env/bin:/usr/local/bin:/usr/bin:/bin
User=$AIRFLOW_USER
Group=$AIRFLOW_USER
Type=simple
ExecStart=/home/$AIRFLOW_USER/airflow-env/bin/airflow webserver
Restart=on-failure
RestartSec=5s
[Install]
WantedBy=multi-user.target
EOF
tee /etc/systemd/system/airflow-scheduler.service > /dev/null << EOF
[Unit]
Description=Airflow scheduler daemon
After=network.target postgresql.service redis.service
Wants=postgresql.service redis.service
[Service]
Environment=AIRFLOW_HOME=$AIRFLOW_HOME
Environment=PATH=/home/$AIRFLOW_USER/airflow-env/bin:/usr/local/bin:/usr/bin:/bin
User=$AIRFLOW_USER
Group=$AIRFLOW_USER
Type=simple
ExecStart=/home/$AIRFLOW_USER/airflow-env/bin/airflow scheduler
Restart=on-failure
RestartSec=5s
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable airflow-webserver airflow-scheduler
systemctl start airflow-webserver airflow-scheduler
echo "[12/12] Verifying installation..."
sleep 5
# Verify services are running
if systemctl is-active --quiet airflow-webserver && systemctl is-active --quiet airflow-scheduler; then
log_info "Airflow services are running successfully"
else
log_error "Some Airflow services failed to start"
exit 1
fi
# Verify database connection
if sudo -u $AIRFLOW_USER bash -c "cd /home/$AIRFLOW_USER && source airflow-env/bin/activate && export AIRFLOW_HOME=$AIRFLOW_HOME && airflow db check" > /dev/null 2>&1; then
log_info "Database connection verified"
else
log_error "Database connection failed"
exit 1
fi
log_info "Apache Airflow installation completed successfully!"
log_info "Web interface: http://localhost:8080"
log_info "Default credentials: admin / admin123"
log_info "Configuration directory: $AIRFLOW_HOME"
log_warn "Please change the default password and update security settings for production use"
Review the script before running. Save it as install.sh and execute as root: sudo bash install.sh