Configure Apache Airflow with LDAP authentication against Active Directory, implement role-based access control (RBAC), and set up secure user group management for enterprise environments.
Prerequisites
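- Apache Airflow 2.x installed in /opt/airflow (virtualenv at /opt/airflow/venv) with a PostgreSQL metadata database
- Active Directory domain controller access
- TLS certificates for the Airflow web server, and a domain controller certificate for LDAPS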
- Administrative access to create AD service accounts and groups
What this solves
Apache Airflow's default authentication uses local user accounts, which becomes unmanageable in enterprise environments. This tutorial configures LDAP authentication with Active Directory, enabling centralized user management and role-based access control (RBAC) for secure, scalable access management.
Step-by-step configuration
Install LDAP dependencies
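Install the system headers needed to build python-ldap, then install Airflow's ldap extra into the virtualenv that runs Airflow (the same /opt/airflow/venv the systemd unit below uses). Quote the extra so the shell does not treat the brackets as a glob, and pin it to your installed Airflow version (apache-airflow[ldap]==<your version>) so pip does not upgrade Airflow itself.
sudo apt update
sudo apt install -y libldap2-dev libsasl2-dev libssl-dev python3-dev
sudo -u airflow /opt/airflow/venv/bin/pip install "apache-airflow[ldap]"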
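To confirm the extra pulled in the client libraries, a short check like the one below can be run with the interpreter Airflow uses (/opt/airflow/venv/bin/python). It assumes the ldap extra provides the ldap (python-ldap) and ldap3 modules, which is the case for recent Airflow 2.x releases; compare against your version's extras list. The script name is up to you.

```python
import importlib.util

# Module names assumed to be provided by apache-airflow[ldap]
REQUIRED_MODULES = ("ldap", "ldap3")


def missing_ldap_modules(modules=REQUIRED_MODULES):
    """Return the subset of `modules` this interpreter cannot find."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_ldap_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("LDAP client libraries are importable")
```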
Create LDAP service account
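Create a dedicated service account in Active Directory for Airflow's LDAP queries. By default, authenticated domain accounts can read user and group objects, and new accounts are already members of Domain Users (their primary group), so no extra group membership is needed. Read-Host prompts for the password so it does not end up in your PowerShell history.
New-ADUser -Name "airflow-ldap" -SamAccountName "airflow-ldap" -UserPrincipalName "airflow-ldap@example.com" -AccountPassword (Read-Host -AsSecureString "Password for airflow-ldap") -Enabled $true -PasswordNeverExpires $true -CannotChangePassword $true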
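The bind DN used in the configuration below follows from the account name, the container New-ADUser places it in (CN=Users by default) and the domain name. The helper below is a hypothetical convenience that mirrors the DC= conversion the install script at the end of this tutorial performs. It does not escape special characters in the CN, so it only suits simple names such as airflow-ldap.

```python
def domain_to_base_dn(domain: str) -> str:
    """Turn 'example.com' into 'DC=example,DC=com'."""
    parts = [p for p in domain.strip(".").split(".") if p]
    if not parts:
        raise ValueError(f"not a domain name: {domain!r}")
    return ",".join(f"DC={p}" for p in parts)


def service_account_dn(account: str, domain: str, container: str = "CN=Users") -> str:
    """Build the DN of an account created in `container` (no RDN escaping)."""
    return f"CN={account},{container},{domain_to_base_dn(domain)}"


if __name__ == "__main__":
    # prints CN=airflow-ldap,CN=Users,DC=example,DC=com
    print(service_account_dn("airflow-ldap", "example.com"))
```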
Configure Airflow webserver authentication
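Create /opt/airflow/webserver_config.py (Airflow loads webserver_config.py from AIRFLOW_HOME) with the LDAP authentication and role-mapping settings. RBAC is always enabled in Airflow 2.x, so no separate switch is needed.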
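import os
from flask_appbuilder.security.manager import AUTH_LDAP
# Airflow 2.8 import path; with the FAB provider (Airflow 2.9+) import it from
# airflow.providers.fab.auth_manager.security_manager.override instead
from airflow.auth.managers.fab.security_manager.override import FabAirflowSecurityManagerOverride
# Enable LDAP authentication
AUTH_TYPE = AUTH_LDAP
# Use LDAPS (port 636) so the bind and user passwords are encrypted in transit
AUTH_LDAP_SERVER = "ldaps://dc01.example.com:636"
# StartTLS is not needed when the URI already uses ldaps://
AUTH_LDAP_USE_TLS = False
# Verify the domain controller certificate against the system CA bundle
AUTH_LDAP_ALLOW_SELF_SIGNED = False
AUTH_LDAP_TLS_DEMAND = True
AUTH_LDAP_TLS_CACERTDIR = "/etc/ssl/certs"
AUTH_LDAP_TLS_CACERTFILE = "/etc/ssl/certs/ca-certificates.crt"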
# LDAP bind configuration
AUTH_LDAP_BIND_USER = "CN=airflow-ldap,CN=Users,DC=example,DC=com"
AUTH_LDAP_BIND_PASSWORD = "SecureP@ssw0rd123!"
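# User search configuration
AUTH_LDAP_SEARCH = "DC=example,DC=com"
AUTH_LDAP_UID_FIELD = "sAMAccountName"
# objectCategory=person excludes computer accounts, which also have objectClass=user in AD
AUTH_LDAP_SEARCH_FILTER = "(&(objectCategory=person)(objectClass=user))"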
# User information mapping
AUTH_LDAP_FIRSTNAME_FIELD = "givenName"
AUTH_LDAP_LASTNAME_FIELD = "sn"
AUTH_LDAP_EMAIL_FIELD = "mail"
# Group membership: Flask-AppBuilder has no separate group-search settings; it maps
# the group DNs in this attribute (AD's memberOf lists direct memberships only)
AUTH_LDAP_GROUP_FIELD = "memberOf"
# Role mapping from AD groups to Airflow roles
AUTH_ROLES_MAPPING = {
    "CN=Airflow-Admins,OU=Security Groups,DC=example,DC=com": ["Admin"],
    "CN=Airflow-Ops,OU=Security Groups,DC=example,DC=com": ["Op"],
    "CN=Airflow-Users,OU=Security Groups,DC=example,DC=com": ["User"],
    "CN=Airflow-Viewers,OU=Security Groups,DC=example,DC=com": ["Viewer"],
}
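# Sync roles at login so AD group changes take effect on the next login
AUTH_ROLES_SYNC_AT_LOGIN = True
AUTH_USER_REGISTRATION = True
# Role added for every authenticated LDAP user; "Public" has no permissions,
# so only members of the mapped AD groups get access
AUTH_USER_REGISTRATION_ROLE = "Public"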
# Security settings
WTF_CSRF_ENABLED = True
# Fail at startup instead of falling back to a guessable key; all webservers must share it
SECRET_KEY = os.environ["AIRFLOW_SECRET_KEY"]
# Custom security manager that logs LDAP authentication results
import logging
log = logging.getLogger(__name__)
class CustomSecurityManager(FabAirflowSecurityManagerOverride):
    def auth_user_ldap(self, username, password):
        """Authenticate against LDAP and log the outcome."""
        try:
            user = super().auth_user_ldap(username, password)
        except Exception:
            # Log the traceback, never the password
            log.exception("LDAP authentication error for user %s", username)
            return None
        if user:
            log.info("LDAP authentication succeeded for user %s", username)
        else:
            log.warning("LDAP authentication failed for user %s", username)
        return user
SECURITY_MANAGER_CLASS = CustomSecurityManager
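How the role mapping plays out for a particular user can be checked on paper with a small model. The function below is a simplified sketch, not Flask-AppBuilder's code: it assumes the resulting role set is the union of the roles mapped from each memberOf DN plus AUTH_USER_REGISTRATION_ROLE when registration is enabled, and it compares DNs as exact strings, so copy the DNs exactly as ldapsearch prints them.

```python
from __future__ import annotations

from typing import Iterable, Mapping


def resolve_roles(
    member_of: Iterable[str],
    roles_mapping: Mapping[str, list[str]],
    registration_role: str | None = None,
) -> set[str]:
    """Approximate the Airflow roles a user ends up with (simplified model)."""
    roles: set[str] = set()
    for group_dn in member_of:
        # Exact string match: the DN must look exactly like the mapping key
        roles.update(roles_mapping.get(group_dn, []))
    if registration_role:
        roles.add(registration_role)
    return roles


MAPPING = {
    "CN=Airflow-Admins,OU=Security Groups,DC=example,DC=com": ["Admin"],
    "CN=Airflow-Viewers,OU=Security Groups,DC=example,DC=com": ["Viewer"],
}

if __name__ == "__main__":
    user_groups = ["CN=Airflow-Viewers,OU=Security Groups,DC=example,DC=com"]
    # prints ['Public', 'Viewer']
    print(sorted(resolve_roles(user_groups, MAPPING, registration_role="Public")))
```

Because AD's memberOf attribute lists only direct memberships, a user who belongs to Airflow-Admins only through a nested group would get no mapped role in this model.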
Set environment variables
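Create /opt/airflow/.env with the secret key and web server TLS settings. systemd's EnvironmentFile= does not run shell commands, so generate the key once and write the literal value into the file (the unquoted heredoc below expands $(...) when the file is written). RBAC is always on in Airflow 2.x, and the Airflow 1.10 options AIRFLOW__WEBSERVER__RBAC, AIRFLOW__WEBSERVER__AUTHENTICATE and AIRFLOW__WEBSERVER__AUTH_BACKEND (airflow.contrib.auth.backends.ldap_auth) no longer exist, so they are omitted. The airflow user must be able to read the TLS key; on Debian and Ubuntu /etc/ssl/private is only accessible to root and the ssl-cert group.
sudo -u airflow tee /opt/airflow/.env > /dev/null << EOF
AIRFLOW_SECRET_KEY=$(openssl rand -base64 32)
AIRFLOW__WEBSERVER__WEB_SERVER_SSL_CERT=/etc/ssl/certs/airflow.crt
AIRFLOW__WEBSERVER__WEB_SERVER_SSL_KEY=/etc/ssl/private/airflow.key
EOF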
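Because systemd reads EnvironmentFile= without a shell, a value such as $(openssl rand -base64 32) would be stored literally. A hypothetical pre-flight check like the one below can catch that before the service starts. It is a sketch that only understands simple KEY=value lines and skips blank lines and lines starting with # or ;.

```python
from __future__ import annotations

import re

# Command substitution that a shell would run but systemd passes through verbatim
SHELL_SUBSTITUTION = re.compile(r"\$\(|`")


def unexpanded_keys(env_text: str) -> list[str]:
    """Return keys whose values still contain $(...) or backticks."""
    flagged = []
    for raw in env_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if SHELL_SUBSTITUTION.search(value):
            flagged.append(key.strip())
    return flagged


if __name__ == "__main__":
    with open("/opt/airflow/.env", encoding="utf-8") as fh:
        problems = unexpanded_keys(fh.read())
    print("OK" if not problems else "Unexpanded values for: " + ", ".join(problems))
```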
Create Active Directory security groups
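Create AD security groups that correspond to the Airflow roles. The OU=Security Groups organizational unit must exist first; New-ADOrganizationalUnit -Name "Security Groups" -Path "DC=example,DC=com" creates it.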
New-ADGroup -Name "Airflow-Admins" -GroupScope Global -GroupCategory Security -Path "OU=Security Groups,DC=example,DC=com" -Description "Airflow Administrators"
New-ADGroup -Name "Airflow-Ops" -GroupScope Global -GroupCategory Security -Path "OU=Security Groups,DC=example,DC=com" -Description "Airflow Operators"
New-ADGroup -Name "Airflow-Users" -GroupScope Global -GroupCategory Security -Path "OU=Security Groups,DC=example,DC=com" -Description "Airflow Users"
New-ADGroup -Name "Airflow-Viewers" -GroupScope Global -GroupCategory Security -Path "OU=Security Groups,DC=example,DC=com" -Description "Airflow Viewers"
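The group DNs in AUTH_ROLES_MAPPING must match the groups created here exactly. One way to keep the two in step is to derive the mapping from a single table, as in this sketch; GROUP_ROLES and build_roles_mapping are illustrative names, not part of Airflow.

```python
from __future__ import annotations

# Illustrative table: AD group name -> Airflow role(s)
GROUP_ROLES = {
    "Airflow-Admins": ["Admin"],
    "Airflow-Ops": ["Op"],
    "Airflow-Users": ["User"],
    "Airflow-Viewers": ["Viewer"],
}


def build_roles_mapping(base_dn: str, ou: str = "OU=Security Groups") -> dict[str, list[str]]:
    """Return an AUTH_ROLES_MAPPING dict for groups stored in `ou` under `base_dn`."""
    return {f"CN={name},{ou},{base_dn}": list(roles) for name, roles in GROUP_ROLES.items()}


if __name__ == "__main__":
    # Print entries ready to paste into webserver_config.py
    for dn, roles in build_roles_mapping("DC=example,DC=com").items():
        print(f"    {dn!r}: {roles!r},")
```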
Configure Airflow core settings
Update /opt/airflow/airflow.cfg. In Airflow 2.x RBAC is always enabled and the web UI reads its LDAP settings only from webserver_config.py, so the Airflow 1.10 options rbac, authenticate and auth_backend and the [ldap] section are left out. security = kerberos is omitted because it enables Kerberos, not LDAP, and remote_logging is omitted because it also requires a remote log destination in [logging]. airflow.cfg does not run shell commands, so generate a Fernet key first and paste the printed value into fernet_key. If airflow.cfg already has a fernet_key, keep it: changing the key makes existing encrypted connections and variables unreadable.
sudo -u airflow /opt/airflow/venv/bin/python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
[webserver]
web_server_port = 8080
web_server_host = 0.0.0.0
base_url = https://airflow.example.com
enable_proxy_fix = True
web_server_ssl_cert = /etc/ssl/certs/airflow.crt
web_server_ssl_key = /etc/ssl/private/airflow.key
[core]
fernet_key = <paste the generated key here>
[logging]
logging_level = INFO
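Leftovers from Airflow 1.10 guides are easy to miss. The sketch below scans airflow.cfg with configparser, flags the rbac, authenticate and auth_backend options in [webserver] and any [ldap] section, and checks that fernet_key is a literal 44-character URL-safe base64 value decoding to 32 bytes (the shape of a Fernet key). The OBSOLETE set is an assumption covering only those options; interpolation is disabled because airflow.cfg contains % in log format strings.

```python
from __future__ import annotations

import base64
import binascii
import configparser
import re

# Assumed list: 1.10-era webserver options that Airflow 2.x no longer reads
OBSOLETE = {("webserver", "rbac"), ("webserver", "authenticate"), ("webserver", "auth_backend")}
FERNET_RE = re.compile(r"^[A-Za-z0-9_-]{43}=$")


def check_airflow_cfg(text: str) -> list[str]:
    """Return human-readable problems found in airflow.cfg contents."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string(text)
    problems = [f"[{s}] {o} is obsolete" for s, o in sorted(OBSOLETE) if cfg.has_option(s, o)]
    if cfg.has_section("ldap"):
        problems.append("[ldap] section is not used; LDAP settings belong in webserver_config.py")
    key = cfg.get("core", "fernet_key", fallback="").strip()
    if key:
        try:
            valid = bool(FERNET_RE.match(key)) and len(base64.urlsafe_b64decode(key)) == 32
        except (binascii.Error, ValueError):
            valid = False
        if not valid:
            problems.append("[core] fernet_key is not a literal 32-byte url-safe base64 key")
    return problems


if __name__ == "__main__":
    with open("/opt/airflow/airflow.cfg", encoding="utf-8") as fh:
        print("\n".join(check_airflow_cfg(fh.read())) or "No problems found")
```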
Set proper file permissions
Configure secure permissions for Airflow configuration files containing sensitive LDAP credentials.
sudo chown airflow:airflow /opt/airflow/webserver_config.py /opt/airflow/.env /opt/airflow/airflow.cfg
sudo chmod 640 /opt/airflow/webserver_config.py /opt/airflow/.env /opt/airflow/airflow.cfg
sudo chmod 755 /opt/airflow
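To confirm the modes took effect, the following sketch reports any of the listed files that grant permissions to other users, or that are missing. The file list matches the paths used in this step; adjust it if your AIRFLOW_HOME differs.

```python
from __future__ import annotations

import os
import stat

SENSITIVE_FILES = (
    "/opt/airflow/webserver_config.py",
    "/opt/airflow/.env",
    "/opt/airflow/airflow.cfg",
)


def world_accessible(paths=SENSITIVE_FILES) -> list[str]:
    """Return paths that grant any permission bit to 'other', or do not exist."""
    flagged = []
    for path in paths:
        try:
            mode = stat.S_IMODE(os.stat(path).st_mode)
        except FileNotFoundError:
            flagged.append(f"{path} (missing)")
            continue
        if mode & stat.S_IRWXO:
            flagged.append(f"{path} ({oct(mode)})")
    return flagged


if __name__ == "__main__":
    issues = world_accessible()
    print("All files restricted" if not issues else "\n".join(issues))
```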
Initialize Airflow database with RBAC
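Initialize the Airflow metadata database. Export AIRFLOW_HOME and the .env variables first: source alone does not export plain KEY=value lines to the airflow command, so set -a is used. On Airflow 2.7 and later, airflow db migrate replaces the deprecated airflow db init. A local admin created with airflow users create is not useful here: with AUTH_LDAP, passwords are checked against Active Directory, and AUTH_ROLES_SYNC_AT_LOGIN resets roles from AD groups at every login, so grant admin access by adding users to Airflow-Admins.
sudo -u airflow bash -c 'cd /opt/airflow && export AIRFLOW_HOME=/opt/airflow && set -a && source .env && set +a && /opt/airflow/venv/bin/airflow db migrate'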
Configure custom role permissions
Save the following as /opt/airflow/setup_rbac.py. It creates a DataEngineer role through the webserver's security manager, using the permission names from airflow.security.permissions. Airflow 2.x has no can_clear action; clearing task instances requires can_edit on Task Instances. Check the import paths against your Airflow version.
#!/usr/bin/env python3
"""Create a custom DataEngineer role (Airflow 2.x, FAB-based UI)."""
from airflow.security import permissions
from airflow.www.app import cached_app
# (action, resource) pairs for the Data Engineer role
DATA_ENGINEER_PERMS = [
    (permissions.ACTION_CAN_READ, permissions.RESOURCE_WEBSITE),
    (permissions.ACTION_CAN_READ, permissions.RESOURCE_DAG),
    (permissions.ACTION_CAN_EDIT, permissions.RESOURCE_DAG),
    (permissions.ACTION_CAN_CREATE, permissions.RESOURCE_DAG_RUN),
    (permissions.ACTION_CAN_READ, permissions.RESOURCE_TASK_INSTANCE),
    # Needed to clear task instances
    (permissions.ACTION_CAN_EDIT, permissions.RESOURCE_TASK_INSTANCE),
    (permissions.ACTION_CAN_READ, permissions.RESOURCE_TASK_LOG),
    (permissions.ACTION_CAN_READ, permissions.RESOURCE_IMPORT_ERROR),
    (permissions.ACTION_CAN_READ, permissions.RESOURCE_XCOM),
]
def setup_custom_roles():
    """Create the DataEngineer role and attach its permissions."""
    # The security manager needs the Flask app, so build it instead of instantiating the class
    app = cached_app()
    security_manager = app.appbuilder.sm
    with app.app_context():
        role = security_manager.find_role("DataEngineer") or security_manager.add_role("DataEngineer")
        for action, resource in DATA_ENGINEER_PERMS:
            perm = security_manager.get_permission(action, resource)
            if perm is None:
                print(f"Skipping unknown permission: {action} on {resource}")
                continue
            # Persists the change; already-granted permissions are left alone
            security_manager.add_permission_to_role(role, perm)
    print("Custom RBAC roles configured successfully")
if __name__ == "__main__":
    setup_custom_roles()
Run RBAC setup script
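Run the script with the same environment as the webserver. The new role has no effect until an AD group is mapped to it: add an entry whose value is ["DataEngineer"] to AUTH_ROLES_MAPPING in webserver_config.py and restart the webserver.
sudo -u airflow bash -c 'cd /opt/airflow && export AIRFLOW_HOME=/opt/airflow && set -a && source .env && set +a && /opt/airflow/venv/bin/python setup_rbac.py'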
Configure systemd service with LDAP settings
Create or update /etc/systemd/system/airflow-webserver.service so the webserver runs with AIRFLOW_HOME set and loads the .env file.
[Unit]
Description=Airflow webserver daemon
After=network.target postgresql.service
Wants=postgresql.service
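[Service]
Environment=AIRFLOW_HOME=/opt/airflow
EnvironmentFile=/opt/airflow/.env
User=airflow
Group=airflow
# Airflow's example units use Type=simple; the webserver does not signal readiness to systemd
Type=simple
# There is no --config flag; webserver_config.py is read from AIRFLOW_HOME
ExecStart=/opt/airflow/venv/bin/airflow webserver --pid /run/airflow/webserver.pid
ExecReload=/bin/kill -s HUP $MAINPID
KillMode=mixed
TimeoutStopSec=600
KillSignal=SIGINT
Restart=on-failure
RestartSec=42s
# Creates /run/airflow for the PID file, owned by the service user
RuntimeDirectory=airflow
WorkingDirectory=/opt/airflow
# Security settings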
PrivateTmp=true
ProtectSystem=strict
ReadWritePaths=/opt/airflow /run/airflow /var/log/airflow
NoNewPrivileges=true
ProtectHome=true
ProtectControlGroups=true
ProtectKernelModules=true
ProtectKernelTunables=true
RestrictRealtime=true
[Install]
WantedBy=multi-user.target
Enable and start Airflow services
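Enable and start the Airflow webserver and scheduler. This assumes an airflow-scheduler.service unit already exists with the same environment settings (AIRFLOW_HOME=/opt/airflow and EnvironmentFile=/opt/airflow/.env).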
sudo systemctl daemon-reload
sudo systemctl enable airflow-webserver airflow-scheduler
sudo systemctl start airflow-webserver airflow-scheduler
sudo systemctl status airflow-webserver
Configure firewall for LDAP traffic
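Allow inbound access to the Airflow web UI. UFW allows outgoing traffic by default, so the outbound LDAP and LDAPS rules only matter if your policy denies outgoing connections.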
sudo ufw allow 8080/tcp comment "Airflow Web UI"
sudo ufw allow out 389/tcp comment "LDAP"
sudo ufw allow out 636/tcp comment "LDAPS"
sudo ufw reload
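If telnet or nc is not available on the host, TCP reachability of the domain controller can be checked from Python with socket.create_connection. This only shows that the port accepts connections; it does not test TLS or the bind. The hostname is the example domain controller used throughout this guide.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures
        return False


if __name__ == "__main__":
    for port in (389, 636):
        state = "reachable" if port_open("dc01.example.com", port) else "unreachable"
        print(f"dc01.example.com:{port} {state}")
```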
Verify your setup
Test LDAP authentication and RBAC functionality to ensure proper integration.
# Check Airflow services status
sudo systemctl status airflow-webserver airflow-scheduler
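# Test the LDAP bind with the service account (python-ldap comes with the ldap extra)
sudo -u airflow /opt/airflow/venv/bin/python - << 'EOF'
import ldap
# Trust the system CA bundle for the domain controller's LDAPS certificate
ldap.set_option(ldap.OPT_X_TLS_CACERTFILE, "/etc/ssl/certs/ca-certificates.crt")
try:
    conn = ldap.initialize("ldaps://dc01.example.com:636")
    conn.set_option(ldap.OPT_REFERRALS, 0)
    conn.simple_bind_s("CN=airflow-ldap,CN=Users,DC=example,DC=com", "SecureP@ssw0rd123!")
    print("LDAP connection successful")
    conn.unbind_s()
except ldap.LDAPError as e:
    print(f"LDAP connection failed: {e}")
EOF
# Follow the webserver logs for LDAP authentication messages
sudo journalctl -u airflow-webserver -f --lines=50
# Verify RBAC roles in the database (ab_role has id and name columns)
sudo -u airflow psql -d airflow -c "SELECT id, name FROM ab_role ORDER BY name;"
# Test web interface access; --resolve keeps the certificate hostname valid
curl -I --resolve airflow.example.com:8080:127.0.0.1 https://airflow.example.com:8080/login/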
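When checking a specific user's memberOf values with ldapsearch, the account name must be escaped per RFC 4515 before it goes into a filter. The sketch below builds a filter that combines a base filter with an sAMAccountName match; user_filter is a hypothetical helper, and the exact filter Flask-AppBuilder sends may differ between versions, so pass your own AUTH_LDAP_SEARCH_FILTER as base_filter.

```python
# RFC 4515 requires these characters to be hex-escaped inside filter values
_ESCAPES = {"\\": r"\5c", "*": r"\2a", "(": r"\28", ")": r"\29", "\x00": r"\00"}


def escape_filter_value(value: str) -> str:
    """Escape a string for use as an LDAP filter assertion value."""
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


def user_filter(username: str, uid_field: str = "sAMAccountName",
                base_filter: str = "(&(objectCategory=person)(objectClass=user))") -> str:
    """Combine a base filter with an escaped uid match, e.g. for ldapsearch."""
    return f"(&{base_filter}({uid_field}={escape_filter_value(username)}))"


if __name__ == "__main__":
    # prints (&(&(objectCategory=person)(objectClass=user))(sAMAccountName=jdoe))
    print(user_filter("jdoe"))
```

The printed filter can be passed to ldapsearch with memberOf as the requested attribute to see exactly which group DNs the role mapping will receive.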
Common issues
| Symptom | Cause | Fix |
|---|---|---|
| LDAP bind failed error | Incorrect service account credentials | Verify bind user DN and password in webserver_config.py |
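| User authentication works but no roles assigned | AD group DN mismatch in role mapping, or membership only through a nested group (memberOf lists direct memberships) | Compare the mapping keys with the exact group DNs from ldapsearch -x -H ldaps://dc01.example.com -D "CN=airflow-ldap,CN=Users,DC=example,DC=com" -W -b "DC=example,DC=com" "(objectClass=group)" dn (-W prompts for the password instead of exposing it on the command line) |
| SSL certificate verification failed | Domain controller certificate not signed by a trusted CA | Add the AD CA certificate to the system trust store (update-ca-certificates on Debian/Ubuntu) so AUTH_LDAP_TLS_CACERTFILE covers it; set AUTH_LDAP_ALLOW_SELF_SIGNED = True only for testing |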
| Permission denied on config files | Incorrect file ownership | sudo chown airflow:airflow /opt/airflow/webserver_config.py && sudo chmod 640 /opt/airflow/webserver_config.py |
| Users can log in but see an empty DAG list | Missing DAG-level permissions | Add users to appropriate AD groups or grant DAG permissions via the web interface |
| Connection timeout to domain controller | Network/firewall blocking LDAP ports | Test connectivity with telnet dc01.example.com 389 and configure firewall rules |
Automated install script
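This script automates the LDAP dependency installation, the webserver_config.py and .env generation, and the service restart. It does not create the AD account or groups, the systemd units or the database; complete those steps first.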
#!/usr/bin/env bash
set -euo pipefail
# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m'
# Default values
LDAP_SERVER=""
DOMAIN=""
BIND_USER=""
BIND_PASS=""
AIRFLOW_USER="airflow"
AIRFLOW_HOME="/opt/airflow"
# Usage message
usage() {
echo "Usage: $0 -s LDAP_SERVER -d DOMAIN -u BIND_USER -p BIND_PASS"
echo " -s: LDAP server (e.g., dc01.example.com)"
echo " -d: Domain (e.g., example.com)"
echo " -u: LDAP bind user (e.g., airflow-ldap)"
echo " -p: LDAP bind password"
echo " -h: AIRFLOW_HOME (default: /opt/airflow)"
exit 1
}
# Parse arguments
while getopts "s:d:u:p:h:" opt; do
case $opt in
s) LDAP_SERVER="$OPTARG" ;;
d) DOMAIN="$OPTARG" ;;
u) BIND_USER="$OPTARG" ;;
p) BIND_PASS="$OPTARG" ;;
h) AIRFLOW_HOME="$OPTARG" ;;
*) usage ;;
esac
done
if [[ -z "$LDAP_SERVER" || -z "$DOMAIN" || -z "$BIND_USER" || -z "$BIND_PASS" ]]; then
usage
fi
# Cleanup function
cleanup() {
echo -e "${RED}[ERROR]${NC} Installation failed. Rolling back changes..."
systemctl stop airflow-webserver 2>/dev/null || true
systemctl stop airflow-scheduler 2>/dev/null || true
[[ -f "${AIRFLOW_HOME}/webserver_config.py.backup" ]] && mv "${AIRFLOW_HOME}/webserver_config.py.backup" "${AIRFLOW_HOME}/webserver_config.py" 2>/dev/null || true
}
trap cleanup ERR
# Check prerequisites
echo -e "${BLUE}[1/8]${NC} Checking prerequisites..."
if [[ $EUID -ne 0 ]]; then
echo -e "${RED}Error:${NC} This script must be run as root"
exit 1
fi
if ! command -v python3 &> /dev/null; then
echo -e "${RED}Error:${NC} Python 3 is required but not installed"
exit 1
fi
if ! id "$AIRFLOW_USER" &>/dev/null; then
echo -e "${RED}Error:${NC} User $AIRFLOW_USER does not exist"
exit 1
fi
# Auto-detect distribution
echo -e "${BLUE}[2/8]${NC} Detecting operating system..."
if [[ ! -f /etc/os-release ]]; then
echo -e "${RED}Error:${NC} Cannot detect OS distribution"
exit 1
fi
. /etc/os-release
case "$ID" in
ubuntu|debian)
PKG_MGR="apt"
PKG_UPDATE="apt update"
PKG_INSTALL="apt install -y"
LDAP_DEV_PKGS="libldap2-dev libsasl2-dev libssl-dev python3-dev python3-pip"
;;
almalinux|rocky|centos|rhel|ol|fedora)
PKG_MGR="dnf"
# Refresh metadata only; "dnf update -y" would upgrade every installed package
PKG_UPDATE="dnf makecache"
PKG_INSTALL="dnf install -y"
LDAP_DEV_PKGS="openldap-devel cyrus-sasl-devel openssl-devel python3-devel python3-pip"
;;
amzn)
PKG_MGR="yum"
PKG_UPDATE="yum makecache"
PKG_INSTALL="yum install -y"
LDAP_DEV_PKGS="openldap-devel cyrus-sasl-devel openssl-devel python3-devel python3-pip"
;;
*)
echo -e "${RED}Error:${NC} Unsupported distribution: $ID"
exit 1
;;
esac
echo -e "${GREEN}Detected:${NC} $PRETTY_NAME ($PKG_MGR)"
# Update package repositories
echo -e "${BLUE}[3/8]${NC} Updating package repositories..."
$PKG_UPDATE
# Install LDAP dependencies
echo -e "${BLUE}[4/8]${NC} Installing LDAP development libraries..."
$PKG_INSTALL $LDAP_DEV_PKGS
# Install Python LDAP libraries
echo -e "${BLUE}[5/8]${NC} Installing Python LDAP libraries..."
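# Assumes Airflow runs from a virtualenv at ${AIRFLOW_HOME}/venv, as in the systemd unit
sudo -u "$AIRFLOW_USER" "${AIRFLOW_HOME}/venv/bin/pip" install "apache-airflow[ldap]"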
# Generate domain components for LDAP
IFS='.' read -ra DOMAIN_PARTS <<< "$DOMAIN"
DC_STRING=""
for part in "${DOMAIN_PARTS[@]}"; do
if [[ -n "$DC_STRING" ]]; then
DC_STRING="${DC_STRING},DC=${part}"
else
DC_STRING="DC=${part}"
fi
done
# Backup existing webserver config
echo -e "${BLUE}[6/8]${NC} Creating webserver configuration..."
[[ -f "${AIRFLOW_HOME}/webserver_config.py" ]] && cp "${AIRFLOW_HOME}/webserver_config.py" "${AIRFLOW_HOME}/webserver_config.py.backup"
# Generate secret key
SECRET_KEY=$(openssl rand -base64 32)
# Create webserver configuration
cat > "${AIRFLOW_HOME}/webserver_config.py" << EOF
import os
from flask_appbuilder.security.manager import AUTH_LDAP
from airflow.auth.managers.fab.security_manager.override import FabAirflowSecurityManagerOverride
# Enable LDAP authentication
AUTH_TYPE = AUTH_LDAP
# Use LDAPS so bind and user passwords are encrypted in transit
AUTH_LDAP_SERVER = "ldaps://${LDAP_SERVER}:636"
AUTH_LDAP_USE_TLS = False
AUTH_LDAP_ALLOW_SELF_SIGNED = False
AUTH_LDAP_TLS_DEMAND = True
AUTH_LDAP_TLS_CACERTDIR = "/etc/ssl/certs"
AUTH_LDAP_TLS_CACERTFILE = "/etc/ssl/certs/ca-certificates.crt"
# LDAP bind configuration
AUTH_LDAP_BIND_USER = "CN=${BIND_USER},CN=Users,${DC_STRING}"
AUTH_LDAP_BIND_PASSWORD = "${BIND_PASS}"
# User search configuration
AUTH_LDAP_SEARCH = "${DC_STRING}"
AUTH_LDAP_UID_FIELD = "sAMAccountName"
# objectCategory=person excludes computer accounts
AUTH_LDAP_SEARCH_FILTER = "(&(objectCategory=person)(objectClass=user))"
# User information mapping
AUTH_LDAP_FIRSTNAME_FIELD = "givenName"
AUTH_LDAP_LASTNAME_FIELD = "sn"
AUTH_LDAP_EMAIL_FIELD = "mail"
# Group membership is read from memberOf (direct memberships only)
AUTH_LDAP_GROUP_FIELD = "memberOf"
# Role mapping from AD groups to Airflow roles
AUTH_ROLES_MAPPING = {
    "CN=Airflow-Admins,OU=Security Groups,${DC_STRING}": ["Admin"],
    "CN=Airflow-Ops,OU=Security Groups,${DC_STRING}": ["Op"],
    "CN=Airflow-Users,OU=Security Groups,${DC_STRING}": ["User"],
    "CN=Airflow-Viewers,OU=Security Groups,${DC_STRING}": ["Viewer"],
}
# Sync roles at login
AUTH_ROLES_SYNC_AT_LOGIN = True
AUTH_USER_REGISTRATION = True
# "Public" has no permissions, so only members of the mapped AD groups get access
AUTH_USER_REGISTRATION_ROLE = "Public"
# Security settings
WTF_CSRF_ENABLED = True
SECRET_KEY = "${SECRET_KEY}"
# Custom security manager for additional LDAP features
import logging
log = logging.getLogger(__name__)
class CustomSecurityManager(FabAirflowSecurityManagerOverride):
    def auth_user_ldap(self, username, password):
        """Authenticate against LDAP and log the outcome."""
        try:
            user = super().auth_user_ldap(username, password)
        except Exception:
            # Log the traceback, never the password
            log.exception("LDAP authentication error for user %s", username)
            return None
        if user:
            log.info("LDAP authentication succeeded for user %s", username)
        else:
            log.warning("LDAP authentication failed for user %s", username)
        return user
SECURITY_MANAGER_CLASS = CustomSecurityManager
EOF
# Set proper permissions
chown "$AIRFLOW_USER:$AIRFLOW_USER" "${AIRFLOW_HOME}/webserver_config.py"
chmod 640 "${AIRFLOW_HOME}/webserver_config.py"
# Create environment file for additional configuration
echo -e "${BLUE}[7/8]${NC} Setting environment variables..."
cat > "${AIRFLOW_HOME}/.env" << EOF
AIRFLOW_SECRET_KEY="${SECRET_KEY}"
EOF
chown "$AIRFLOW_USER:$AIRFLOW_USER" "${AIRFLOW_HOME}/.env"
chmod 600 "${AIRFLOW_HOME}/.env"
# Restart Airflow services
echo -e "${BLUE}[8/8]${NC} Restarting Airflow services..."
systemctl daemon-reload
if systemctl is-active --quiet airflow-webserver; then
systemctl restart airflow-webserver
else
systemctl start airflow-webserver
fi
if systemctl is-active --quiet airflow-scheduler; then
systemctl restart airflow-scheduler
else
systemctl start airflow-scheduler
fi
# Verification checks
echo -e "${BLUE}Verifying installation...${NC}"
sleep 5
if systemctl is-active --quiet airflow-webserver; then
echo -e "${GREEN}✓${NC} Airflow webserver is running"
else
echo -e "${RED}✗${NC} Airflow webserver is not running"
exit 1
fi
if systemctl is-active --quiet airflow-scheduler; then
echo -e "${GREEN}✓${NC} Airflow scheduler is running"
else
echo -e "${RED}✗${NC} Airflow scheduler is not running"
exit 1
fi
if sudo -u "$AIRFLOW_USER" "${AIRFLOW_HOME}/venv/bin/python" -c "import ldap, ldap3" 2>/dev/null; then
echo -e "${GREEN}✓${NC} LDAP libraries installed successfully"
else
echo -e "${RED}✗${NC} LDAP libraries not properly installed"
exit 1
fi
if [[ -f "${AIRFLOW_HOME}/webserver_config.py" ]]; then
echo -e "${GREEN}✓${NC} Webserver configuration created"
else
echo -e "${RED}✗${NC} Webserver configuration missing"
exit 1
fi
echo -e "${GREEN}[SUCCESS]${NC} Apache Airflow LDAP authentication configuration completed!"
echo -e "${YELLOW}Note:${NC} Ensure the following Active Directory groups exist:"
echo " - CN=Airflow-Admins,OU=Security Groups,${DC_STRING}"
echo " - CN=Airflow-Ops,OU=Security Groups,${DC_STRING}"
echo " - CN=Airflow-Users,OU=Security Groups,${DC_STRING}"
echo " - CN=Airflow-Viewers,OU=Security Groups,${DC_STRING}"
echo -e "${YELLOW}Configuration file:${NC} ${AIRFLOW_HOME}/webserver_config.py"
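Review the script before running. Run it as root with the domain controller, domain, service account and password, for example: sudo bash install.sh -s dc01.example.com -d example.com -u airflow-ldap -p 'SecureP@ssw0rd123!'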