Install and configure Apache Airflow with PostgreSQL backend and secure production deployment

Intermediate · 45 min · Apr 01, 2026

Ubuntu 24.04 · Ubuntu 22.04 · Debian 12 · AlmaLinux 9 · Rocky Linux 9 · Fedora 41

Learn to deploy Apache Airflow in production with PostgreSQL backend, systemd services, SSL/TLS encryption, and security hardening for enterprise workflow orchestration.

Prerequisites

  • Root or sudo access
  • 4GB RAM minimum
  • 20GB disk space
  • Domain name for SSL certificate
  • Basic Python knowledge

What this solves

Apache Airflow is a powerful platform for developing, scheduling, and monitoring workflows as directed acyclic graphs (DAGs). This tutorial shows you how to deploy Airflow in production with PostgreSQL as the backend database, proper user management, systemd services, and security hardening. You'll have a scalable workflow orchestration platform ready for enterprise data pipelines.

Step-by-step installation

Update system packages

Start by updating your package manager to ensure you get the latest versions of all dependencies.

# Debian/Ubuntu
sudo apt update && sudo apt upgrade -y

# RHEL-based (AlmaLinux, Rocky, Fedora)
sudo dnf update -y

Install system dependencies

Install Python, PostgreSQL client libraries, and other required packages for Airflow compilation and operation.

# Debian/Ubuntu
sudo apt install -y python3 python3-pip python3-venv python3-dev \
  build-essential libpq-dev libffi-dev libssl-dev \
  postgresql-client redis-tools git curl

# RHEL-based
sudo dnf install -y python3 python3-pip python3-devel \
  gcc gcc-c++ make postgresql-devel libffi-devel openssl-devel \
  postgresql redis git curl

Create dedicated airflow user

Create a dedicated system user for running Airflow services. This follows security best practices by avoiding running services as root.

sudo useradd -r -m -s /bin/bash airflow

Install and configure PostgreSQL

Install PostgreSQL server and create the database and user for Airflow. We'll use a dedicated database for better isolation and performance.

# Debian/Ubuntu
sudo apt install -y postgresql postgresql-contrib
sudo systemctl enable --now postgresql

# RHEL-based
sudo dnf install -y postgresql-server postgresql-contrib
sudo postgresql-setup --initdb
sudo systemctl enable --now postgresql

Create Airflow database and user

Set up the PostgreSQL database, user, and permissions for Airflow. Use a strong password in production environments.

sudo -u postgres createuser airflow
sudo -u postgres createdb airflow_db --owner airflow
sudo -u postgres psql -c "ALTER USER airflow PASSWORD 'AirflowDB2024!Strong';"
sudo -u postgres psql -c "GRANT ALL PRIVILEGES ON DATABASE airflow_db TO airflow;"
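One caveat before this password is reused later in airflow.cfg: passwords containing URL-reserved characters must be percent-encoded before they are embedded in a connection URI. A minimal stdlib sketch, using the example password from this guide:

```python
from urllib.parse import quote

# Percent-encode the password so reserved characters survive in a URI.
password = quote("AirflowDB2024!Strong", safe="")
uri = f"postgresql://airflow:{password}@localhost:5432/airflow_db"
print(uri)  # postgresql://airflow:AirflowDB2024%21Strong@localhost:5432/airflow_db
```

The example password only contains `!`, which most drivers tolerate unencoded, but encoding is the safe default for characters like `@`, `:`, or `/`.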

Install Redis for task queue

Install Redis to handle Airflow's task queue and caching. Redis provides better performance than database-based queuing.

# Debian/Ubuntu
sudo apt install -y redis-server
sudo systemctl enable --now redis-server

# RHEL-based
sudo dnf install -y redis
sudo systemctl enable --now redis

Create Python virtual environment

Switch to the airflow user and create a virtual environment to isolate Airflow dependencies from system packages.

sudo -u airflow bash
cd /home/airflow
python3 -m venv airflow-env
source airflow-env/bin/activate
pip install --upgrade pip setuptools wheel
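To confirm the interpreter you are about to install into really is the virtualenv's Python rather than the system one, a quick stdlib check:

```python
import sys

# In a venv, sys.prefix points at the venv directory while
# sys.base_prefix still points at the system interpreter.
in_venv = sys.prefix != sys.base_prefix
print("virtualenv active:", in_venv)
```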

Install Apache Airflow

Install Airflow with PostgreSQL and Redis providers. Set environment variables for proper dependency resolution.

export AIRFLOW_HOME=/home/airflow/airflow
export AIRFLOW_VERSION=2.8.1
export PYTHON_VERSION="$(python3 -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')"
export CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"

pip install "apache-airflow[postgres,redis,celery,crypto,ssh]==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
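The constraint file pins every transitive dependency to versions the given Airflow release was tested with; the URL is derived from the Airflow version plus the interpreter's major.minor version. The shell logic above, sketched in Python:

```python
import sys

# Mirror of the CONSTRAINT_URL computation done in the shell above.
AIRFLOW_VERSION = "2.8.1"
python_version = f"{sys.version_info.major}.{sys.version_info.minor}"
constraint_url = (
    "https://raw.githubusercontent.com/apache/airflow/"
    f"constraints-{AIRFLOW_VERSION}/constraints-{python_version}.txt"
)
print(constraint_url)
```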

Configure Airflow settings

Create the Airflow configuration file with PostgreSQL backend, Redis broker, and production-ready settings.

mkdir -p /home/airflow/airflow

Create /home/airflow/airflow/airflow.cfg with the following contents:
[core]
dags_folder = /home/airflow/airflow/dags
hostname_callable = airflow.utils.net.get_host_ip_address
default_timezone = utc
executor = CeleryExecutor
load_examples = False
max_active_runs_per_dag = 16
max_active_tasks_per_dag = 16

[database]
sql_alchemy_conn = postgresql+psycopg2://airflow:AirflowDB2024!Strong@localhost:5432/airflow_db
sql_alchemy_pool_size = 10
sql_alchemy_max_overflow = 20
sql_alchemy_pool_recycle = 3600
sql_alchemy_pool_pre_ping = True

[webserver]
base_url = https://airflow.example.com
web_server_host = 127.0.0.1
web_server_port = 8080
workers = 4
worker_timeout = 120
worker_refresh_batch_size = 1
worker_refresh_interval = 6000
# Generate with: python3 -c 'import secrets; print(secrets.token_urlsafe(32))'
secret_key = PASTE_GENERATED_VALUE_HERE
expose_config = False

[celery]
broker_url = redis://localhost:6379/0
result_backend = db+postgresql://airflow:AirflowDB2024!Strong@localhost:5432/airflow_db
worker_concurrency = 16
worker_log_server_port = 8793

[scheduler]
max_tis_per_query = 512
scheduler_heartbeat_sec = 5
num_runs = -1
processor_poll_interval = 1
min_file_process_interval = 30
dag_dir_list_interval = 300
max_dagruns_to_create_per_loop = 10
max_dagruns_per_loop_to_schedule = 20

[logging]
logging_level = INFO
fab_logging_level = WARN
logging_config_class = airflow.config_templates.airflow_local_settings.DEFAULT_LOGGING_CONFIG
remote_logging = False

[metrics]
statsd_on = False

# Optional: requires the apache-airflow-providers-hashicorp package and a
# running Vault instance; omit this section if you do not use Vault.
[secrets]
backend = airflow.providers.hashicorp.secrets.vault.VaultBackend
backend_kwargs = {"connections_path": "connections", "variables_path": "variables"}
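Config files are read literally, so a value like secret_key must be generated up front and pasted in rather than written as a shell substitution. A stdlib one-liner for that:

```python
import secrets

# Generate a value to paste into [webserver] secret_key in airflow.cfg.
# token_urlsafe(32) yields 32 random bytes as a 43-character URL-safe string.
key = secrets.token_urlsafe(32)
print("secret_key =", key)
```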

Initialize Airflow database

Initialize the Airflow metadata database and create an admin user for the web interface.

source /home/airflow/airflow-env/bin/activate
export AIRFLOW_HOME=/home/airflow/airflow

airflow db migrate
airflow users create \
    --username admin \
    --firstname Admin \
    --lastname User \
    --role Admin \
    --email admin@example.com \
    --password AdminAirflow2024!

Create DAGs directory

Create the directory structure for DAGs and set proper permissions. DAGs are the workflow definitions that Airflow will execute.

mkdir -p /home/airflow/airflow/dags
mkdir -p /home/airflow/airflow/logs
mkdir -p /home/airflow/airflow/plugins
chmod 755 /home/airflow/airflow/dags
chmod 755 /home/airflow/airflow/logs
chmod 755 /home/airflow/airflow/plugins
Never use chmod 777. It gives every user on the system full access to your files. The airflow user owns these directories, so 755 provides the correct access levels.

Exit airflow user session

Return to your regular user session to create systemd service files and configure the reverse proxy.

exit

Create systemd service for Airflow webserver

Create /etc/systemd/system/airflow-webserver.service to manage the Airflow webserver process. This ensures automatic startup and proper process management.

[Unit]
Description=Airflow webserver daemon
After=network.target postgresql.service redis.service
Wants=postgresql.service redis.service

[Service]
User=airflow
Group=airflow
Type=simple
PrivateTmp=true
Environment=PATH=/home/airflow/airflow-env/bin:/usr/local/bin:/usr/bin:/bin
Environment=AIRFLOW_HOME=/home/airflow/airflow
ExecStart=/home/airflow/airflow-env/bin/airflow webserver --pid /run/airflow/webserver.pid
ExecReload=/bin/kill -s HUP $MAINPID
KillMode=mixed
TimeoutStopSec=60
Restart=on-failure
RestartSec=30
RuntimeDirectory=airflow
RuntimeDirectoryMode=0755

[Install]
WantedBy=multi-user.target

Create systemd service for Airflow scheduler

Create /etc/systemd/system/airflow-scheduler.service for the scheduler component, which handles DAG parsing and task scheduling.

[Unit]
Description=Airflow scheduler daemon
After=network.target postgresql.service redis.service
Wants=postgresql.service redis.service

[Service]
User=airflow
Group=airflow
Type=simple
PrivateTmp=true
Environment=PATH=/home/airflow/airflow-env/bin:/usr/local/bin:/usr/bin:/bin
Environment=AIRFLOW_HOME=/home/airflow/airflow
ExecStart=/home/airflow/airflow-env/bin/airflow scheduler
Restart=on-failure
RestartSec=30
KillMode=mixed
TimeoutStopSec=60

[Install]
WantedBy=multi-user.target

Create systemd service for Celery worker

Create /etc/systemd/system/airflow-worker.service for the Celery workers that execute the actual tasks defined in your DAGs.

[Unit]
Description=Airflow celery worker daemon
After=network.target postgresql.service redis.service
Wants=postgresql.service redis.service

[Service]
User=airflow
Group=airflow
Type=simple
PrivateTmp=true
Environment=PATH=/home/airflow/airflow-env/bin:/usr/local/bin:/usr/bin:/bin
Environment=AIRFLOW_HOME=/home/airflow/airflow
ExecStart=/home/airflow/airflow-env/bin/airflow celery worker
Restart=on-failure
RestartSec=30
KillMode=mixed
TimeoutStopSec=600

[Install]
WantedBy=multi-user.target

Install and configure NGINX reverse proxy

Install NGINX to provide SSL termination and reverse proxy capabilities for secure access to Airflow.

# Debian/Ubuntu
sudo apt install -y nginx certbot python3-certbot-nginx

# RHEL-based
sudo dnf install -y nginx certbot python3-certbot-nginx

Configure NGINX for Airflow

Create an NGINX virtual host configuration (/etc/nginx/sites-available/airflow on Debian/Ubuntu, /etc/nginx/conf.d/airflow.conf on RHEL-based systems) that proxies requests to the Airflow webserver with proper headers and security settings.

upstream airflow {
    server 127.0.0.1:8080 fail_timeout=0;
}

server {
    listen 80;
    server_name airflow.example.com;
    return 301 https://$server_name$request_uri;
}

server {
    listen 443 ssl http2;
    server_name airflow.example.com;

    # SSL configuration will be added by certbot
    
    # Security headers
    add_header X-Frame-Options DENY;
    add_header X-Content-Type-Options nosniff;
    add_header X-XSS-Protection "1; mode=block";
    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
    
    # Increase client max body size for file uploads
    client_max_body_size 100M;
    
    location / {
        proxy_pass http://airflow;
        proxy_set_header Host $http_host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_redirect off;
        proxy_buffering off;
        proxy_request_buffering off;
        
        # WebSocket support
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        
        # Timeout settings
        proxy_connect_timeout 60s;
        proxy_send_timeout 60s;
        proxy_read_timeout 300s;
    }
    
    # Static files (optional). NGINX does not expand globs in alias paths,
    # so substitute the actual Python version of your virtualenv,
    # e.g. python3.12 on Ubuntu 24.04
    location /static/ {
        alias /home/airflow/airflow-env/lib/python3.12/site-packages/airflow/www/static/;
        expires 1y;
        add_header Cache-Control "public, immutable";
    }
}

Enable NGINX site and obtain SSL certificate

Enable the NGINX configuration and use Certbot to obtain a free SSL certificate from Let's Encrypt.

# Debian/Ubuntu
sudo ln -s /etc/nginx/sites-available/airflow /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl enable --now nginx

# RHEL-based (config lives directly in /etc/nginx/conf.d/airflow.conf)
sudo nginx -t
sudo systemctl enable --now nginx
sudo certbot --nginx -d airflow.example.com

Configure firewall rules

Open only the web ports in your firewall. PostgreSQL and Redis are bound to localhost in this setup, so do not expose 5432 or 6379 publicly.

# Debian/Ubuntu (ufw)
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw --force enable

# RHEL-based (firewalld)
sudo firewall-cmd --permanent --add-service=http
sudo firewall-cmd --permanent --add-service=https
sudo firewall-cmd --reload

Start and enable all Airflow services

Start all the Airflow services and enable them to start automatically on system boot.

sudo systemctl daemon-reload
sudo systemctl enable --now airflow-webserver
sudo systemctl enable --now airflow-scheduler
sudo systemctl enable --now airflow-worker

Create a sample DAG

Create a simple DAG to test your Airflow installation and verify that scheduling works correctly. Save the following as /home/airflow/airflow/dags/sample_dag.py.

from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator

def print_hello():
    return "Hello from Airflow!"

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2024, 1, 1),
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5)
}

dag = DAG(
    'sample_dag',
    default_args=default_args,
    description='A simple sample DAG',
    schedule=timedelta(hours=1),
    catchup=False,
    tags=['sample']
)

hello_task = PythonOperator(
    task_id='hello_task',
    python_callable=print_hello,
    dag=dag
)

date_task = BashOperator(
    task_id='date_task',
    bash_command='date',
    dag=dag
)

hello_task >> date_task
Then set ownership and permissions:

sudo chown airflow:airflow /home/airflow/airflow/dags/sample_dag.py
sudo chmod 644 /home/airflow/airflow/dags/sample_dag.py
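Because task callables are plain Python functions, their logic can be unit-tested without a running Airflow instance. A minimal sketch that duplicates the callable from sample_dag.py:

```python
# Mirror of the print_hello callable defined in sample_dag.py;
# no Airflow runtime is needed to test the business logic.
def print_hello():
    return "Hello from Airflow!"

assert print_hello() == "Hello from Airflow!"
print("callable OK")
```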

Production security hardening

Configure PostgreSQL security

Harden PostgreSQL by configuring authentication methods and connection limits. This provides better security for the Airflow database.

# Add these lines to pg_hba.conf to restrict the airflow user to local
# connections only (typically /etc/postgresql/<version>/main/pg_hba.conf on
# Debian/Ubuntu, /var/lib/pgsql/data/pg_hba.conf on RHEL-based systems)
local   airflow_db      airflow                                 scram-sha-256
host    airflow_db      airflow         127.0.0.1/32            scram-sha-256
host    airflow_db      airflow         ::1/128                 scram-sha-256

sudo systemctl reload postgresql

Configure Redis security

Secure Redis by enabling authentication and binding to localhost only to prevent unauthorized access. Add the following to the Redis configuration file (/etc/redis/redis.conf on Debian/Ubuntu; /etc/redis/redis.conf or /etc/redis.conf on RHEL-based systems):

bind 127.0.0.1 ::1
requirepass RedisAirflow2024!Strong
maxmemory 256mb
maxmemory-policy allkeys-lru
save 900 1
save 300 10
save 60 10000

# Debian/Ubuntu
sudo systemctl restart redis-server
# RHEL-based
sudo systemctl restart redis

Update Airflow configuration for Redis auth

Update the Airflow configuration to use Redis authentication and optimize performance settings.

[celery]
broker_url = redis://:RedisAirflow2024!Strong@localhost:6379/0
result_backend = db+postgresql://airflow:AirflowDB2024!Strong@localhost:5432/airflow_db

Set up log rotation

Create /etc/logrotate.d/airflow to prevent disk space issues in production environments. Airflow writes task logs into nested directories, and the services do not reliably reopen log files on reload, so copytruncate is safer than a postrotate reload.

/home/airflow/airflow/logs/*.log /home/airflow/airflow/logs/*/*.log {
    daily
    missingok
    rotate 14
    compress
    delaycompress
    notifempty
    copytruncate
}

Restart services with new configuration

Restart all Airflow services to apply the security configuration changes.

sudo systemctl restart airflow-webserver
sudo systemctl restart airflow-scheduler
sudo systemctl restart airflow-worker

Verify your setup

# Check all services are running
sudo systemctl status airflow-webserver
sudo systemctl status airflow-scheduler
sudo systemctl status airflow-worker
sudo systemctl status postgresql
sudo systemctl status redis      # redis-server on Debian/Ubuntu
sudo systemctl status nginx

Test database connectivity

sudo -u airflow bash -c 'source /home/airflow/airflow-env/bin/activate && export AIRFLOW_HOME=/home/airflow/airflow && airflow db check'

Check web interface accessibility

curl -I https://airflow.example.com

List DAGs

sudo -u airflow bash -c 'source /home/airflow/airflow-env/bin/activate && export AIRFLOW_HOME=/home/airflow/airflow && airflow dags list'

Access the Airflow web interface at https://airflow.example.com and log in with username admin and password AdminAirflow2024!. You should see the sample DAG in the interface.

Performance tuning

Component     Setting                  Recommended value
PostgreSQL    max_connections          200
PostgreSQL    shared_buffers           25% of RAM
Airflow       worker_concurrency       CPU cores × 4
Airflow       sql_alchemy_pool_size    10-20
Redis         maxmemory                10% of total RAM
Note: For monitoring your Airflow deployment, consider integrating with Prometheus and Grafana for comprehensive observability.
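The worker_concurrency recommendation above (CPU cores × 4) can be computed on the host itself; a small stdlib sketch:

```python
import os

# os.cpu_count() can return None in unusual environments; fall back to 1.
cores = os.cpu_count() or 1
print("worker_concurrency =", cores * 4)
```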

Common issues

Symptom                       Cause                               Fix
Webserver won't start         Database connection failed          Check PostgreSQL status and credentials in airflow.cfg
Tasks stuck in queued state   Celery worker not running           sudo systemctl restart airflow-worker
DAGs not appearing            File permissions or syntax errors   Check chown airflow:airflow and run python dag_file.py
SSL certificate errors        Certbot failed or expired           Run sudo certbot renew and restart nginx
High memory usage             Too many concurrent workers         Reduce worker_concurrency in airflow.cfg
Database locks                Too many connections                Tune sql_alchemy_pool_size and PostgreSQL max_connections

Next steps

Consider enabling remote logging ([logging] remote_logging), turning on StatsD metrics ([metrics] statsd_on), and integrating Prometheus and Grafana for full observability.
