Friday, May 9, 2025

 AKS - Airflow setup and use with SSO

Here’s a step-by-step guide to deploying and using Apache Airflow on Azure Kubernetes Service (AKS) [1]:

Step 1: Set Up Your AKS Cluster

If you don’t already have an AKS cluster, create one using infrastructure as code (IaC); in most environments this will already have been done for you. Log in to the AKS cluster with:

az aks get-credentials --resource-group <resourcegroupname> --name <clustername>

kubelogin convert-kubeconfig -l azurecli

Step 2: Install Helm

Ensure Helm is installed on your local machine:

curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

Step 3: Create a Namespace for Airflow

kubectl create namespace airflow

Step 4: Configure Workload Identity (Optional but Recommended)

This step allows Airflow to securely access Azure resources like Key Vault:

1. Create a service account:

kubectl apply -f - <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: airflow
  namespace: airflow
EOF

2. Annotate the service account with your Azure identity:

kubectl annotate serviceaccount airflow \
  azure.workload.identity/client-id=<CLIENT_ID> \
  azure.workload.identity/tenant-id=<TENANT_ID> \
  -n airflow

Step 5: Install External Secrets Operator (Optional for Key Vault Integration)

helm repo add external-secrets https://charts.external-secrets.io

helm repo update

helm install external-secrets external-secrets/external-secrets \
  --namespace airflow \
  --create-namespace \
  --set installCRDs=true \
  --wait

Step 6: Add the Apache Airflow Helm Chart

helm repo add apache-airflow https://airflow.apache.org

helm repo update

Step 7: Install Airflow

A kustomization is preferred. Alternatively, install directly with Helm:

helm install airflow apache-airflow/airflow \
  --namespace airflow \
  --set executor=CeleryExecutor \
  --set defaultAirflowTag=2.8.1 \
  --set airflowVersion=2.8.1 \
  --set webserver.defaultUser.enabled=true \
  --set webserver.defaultUser.username=admin \
  --set webserver.defaultUser.password=admin

You can generate a Fernet key with

python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"

The chart keeps the Fernet key it uses in a secret, which you can read back with

kubectl get secret --namespace airflow airflow-fernet-key -o jsonpath="{.data.fernet-key}" | base64 --decode
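Before a key goes into a secret, it can help to confirm it has the shape Fernet expects: 32 random bytes, URL-safe base64 encoded, which comes out as 44 characters. The sketch below uses only the standard library. The function names are hypothetical, and it is a format check rather than a replacement for Fernet.generate_key().

```python
import base64
import os


def generate_fernet_style_key() -> str:
    """Return 32 random bytes as URL-safe base64 text, the format Fernet keys use."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode()


def looks_like_fernet_key(key: str) -> bool:
    """Check that the key decodes from URL-safe base64 to exactly 32 bytes."""
    try:
        raw = base64.urlsafe_b64decode(key.encode())
    except (ValueError, TypeError):
        # binascii.Error (bad padding) is a subclass of ValueError.
        return False
    return len(raw) == 32


if __name__ == "__main__":
    candidate = generate_fernet_style_key()
    print(candidate, looks_like_fernet_key(candidate))
```

A key that fails this check would also be rejected by Airflow at startup, so catching it early avoids a crash-looping pod.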

I prefer creating a HelmRelease that uses the official Airflow chart, with the release file listed in the references. The values file (also in the references) and the webserver_config.py discussed a few steps below go into ConfigMaps, which lets you specify SSO during setup.

Step 8: Access the Airflow Web UI

Port-forward the web server service:

kubectl port-forward svc/airflow-webserver 8080:8080 -n airflow

Then open your browser and go to: http://localhost:8080
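While the port-forward is running, the webserver can also be checked from a script. The sketch below assumes the webserver exposes its /health endpoint on the forwarded port and that the response carries a status for the metadatabase and scheduler components. The function names are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_health(base_url: str = "http://localhost:8080", timeout: float = 5.0) -> dict:
    """GET <base_url>/health and return the parsed JSON body."""
    with urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return json.load(resp)


def is_healthy(payload: dict) -> bool:
    """True when both the metadata database and the scheduler report 'healthy'."""
    return all(
        payload.get(component, {}).get("status") == "healthy"
        for component in ("metadatabase", "scheduler")
    )


if __name__ == "__main__":
    # Requires the kubectl port-forward from this step to be active.
    print(is_healthy(fetch_health()))
```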

For a complete setup, it could look like this: PIBI NonProd Airflow

Step 9: Setup the app registration for SSO

Add the following:

1. Set the redirect URI to https://<your-airflow-domain>/oauth-authorized/azure or https://<your-airflow-domain>/oauth2/callback.

2. Assign API permissions (openid, email, profile) for authentication.

3. Navigate to the token configuration page of the Azure AD application. For both the ID and access tokens, add optional claims for:

   1. email

   2. preferred_username

   3. given_name

   4. family_name

   5. UPN

4. Edit the groups claim to include sAMAccountName for both ID and access tokens, but leave out SAML.

5. Specify federated identity entries for use with your GitHub repository.

6. Optionally, create app roles on your Azure AD application, such as airflow_nonprod_admin, airflow_nonprod_dev and airflow_nonprod_viewer.

7. Make sure you have the right client ID, client secret and tenant ID for the next steps.
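One way to sanity-check the registration before touching Airflow is to build the authorization URL by hand and open it in a browser: a wrong client ID or an unregistered redirect URI shows up as an Azure AD error page straight away. The sketch below is a minimal illustration using the v2.0 authorize endpoint and the scopes from this step. The function name and the state value are hypothetical.

```python
from urllib.parse import urlencode


def build_authorize_url(tenant_id: str, client_id: str, redirect_uri: str,
                        state: str = "manual-check") -> str:
    """Build an OAuth 2.0 authorization-code request URL for Azure AD (v2.0 endpoint)."""
    params = {
        "client_id": client_id,
        "response_type": "code",
        "redirect_uri": redirect_uri,  # must match the app registration exactly
        "scope": "openid email profile",
        "state": state,
    }
    base = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/authorize"
    return f"{base}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_authorize_url(
        "<TENANT_ID>", "<CLIENT_ID>",
        "https://<your-airflow-domain>/oauth-authorized/azure",
    ))
```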

Step 10: Create a secret from the app registration in the previous step

kubectl create secret generic airflow-ad-secret \
  --namespace airflow \
  --from-literal=client-id=<your-azure-client-id> \
  --from-literal=client-secret=<your-azure-client-secret> \
  --from-literal=tenant-id=<your-azure-tenant-id>
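If you keep manifests in a repository rather than running kubectl by hand, the same secret can be expressed declaratively. The sketch below shows roughly what the command above produces: an Opaque Secret whose data values are base64 encoded. The function name is hypothetical, and the airflow namespace is an assumption carried over from Step 3.

```python
import base64
import json


def build_secret_manifest(name: str, namespace: str, values: dict[str, str]) -> dict:
    """Return a Kubernetes Secret manifest with each value base64 encoded under 'data'."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": {
            key: base64.b64encode(value.encode()).decode()
            for key, value in values.items()
        },
    }


if __name__ == "__main__":
    manifest = build_secret_manifest(
        "airflow-ad-secret",
        "airflow",
        {
            "client-id": "<your-azure-client-id>",
            "client-secret": "<your-azure-client-secret>",
            "tenant-id": "<your-azure-tenant-id>",
        },
    )
    # kubectl accepts JSON as well as YAML, so this output can be piped to `kubectl apply -f -`.
    print(json.dumps(manifest, indent=2))
```

Base64 is an encoding, not encryption, so a manifest like this should not be committed with real values in it.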

Step 11: Configure the airflow web server for SSO

Use the values file attached in the references, or bring your own values file with the following modifications:

# SSO configuration
webserver:
  defaultUser:
    enabled: false
  authBackend: "airflow.providers.microsoft.azure.auth.backend.azure_auth"
  extraEnv:
    - name: AIRFLOW__WEBSERVER__RBAC
      value: "True"
    - name: AIRFLOW__API__AUTH_BACKENDS
      value: "airflow.api.auth.backend.deny_all"
    - name: AIRFLOW__WEBSERVER__AUTH_BACKEND
      value: "airflow.providers.microsoft.azure.auth.backend.azure_auth"
    - name: AIRFLOW__MICROSOFT__CLIENT_ID
      value: "<your-client-id>"
    - name: AIRFLOW__MICROSOFT__CLIENT_SECRET
      value: "<your-client-secret>"
    - name: AIRFLOW__MICROSOFT__TENANT_ID
      value: "<your-tenant-id>"
    - name: AIRFLOW__MICROSOFT__REDIRECT_URI
      value: "https://<your-airflow-domain>/oauth2/callback"

Then create a ConfigMap named airflow-values from the values file.
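The environment variable names in the values above follow Airflow's AIRFLOW__{SECTION}__{KEY} convention, where a double underscore separates the prefix, the config section and the key. The sketch below splits such a name back into its section and key, which can be handy when checking a values file for typos. The function name is hypothetical.

```python
from typing import Optional, Tuple


def parse_airflow_env_var(name: str) -> Optional[Tuple[str, str]]:
    """Split AIRFLOW__SECTION__KEY into (section, key); return None for other names."""
    prefix = "AIRFLOW__"
    if not name.startswith(prefix):
        return None
    # Only the first double underscore after the prefix separates section from key.
    section, sep, key = name[len(prefix):].partition("__")
    if not sep or not section or not key:
        return None
    return section.lower(), key.lower()


if __name__ == "__main__":
    for var in ("AIRFLOW__API__AUTH_BACKENDS", "AIRFLOW__WEBSERVER__RBAC", "PATH"):
        print(var, "->", parse_airflow_env_var(var))
```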

Step 12: Upgrade the airflow deployment

Apply the values file above by running the following command:

helm upgrade airflow apache-airflow/airflow \
  --namespace <your-namespace> \
  -f values.yaml

Step 13: Using webserver_config.py in Airflow to enable OAuth authentication

Apply the updated /opt/airflow/webserver_config.py shown below to the airflow container.

# webserver_config.py
from airflow.www.fab_security.manager import AUTH_OAUTH

AUTH_TYPE = AUTH_OAUTH

OAUTH_PROVIDERS = [{
    # FAB uses this name in the callback path (/oauth-authorized/<name>),
    # so it must match the redirect URI registered in Step 9.
    'name': 'azure',
    'token_key': 'access_token',
    'remote_app': {
        'api_base_url': "https://login.microsoftonline.com/{TENANT_ID}",
        'access_token_url': "https://login.microsoftonline.com/{TENANT_ID}/oauth2/v2.0/token",
        'authorize_url': "https://login.microsoftonline.com/{TENANT_ID}/oauth2/v2.0/authorize",
        'client_id': "{CLIENT_ID}",
        'client_secret': "{CLIENT_SECRET}",
        'jwks_uri': "https://login.microsoftonline.com/common/discovery/v2.0/keys"
    }
}]

Alternatively, create a ConfigMap named airflow-webserver-config from the webserver_config.py file attached in the references and pass it to your instance.

Restart the airflow webserver to apply changes.

Step 14: Configure the ingress for https and redirect URI

Create the following YAML:

# Specify the callback host
rules:
  - host: <your-airflow-domain>
    http:
      paths:
        - path: /
          pathType: Prefix
          backend:
            service:
              name: airflow-webserver
              port:
                number: 8080

Step 15: Test the SSO

• Navigate to https://<your-airflow-domain>.

• You should be redirected to Azure AD login.

• Upon successful login, you’ll be redirected back to Airflow.

webserver_config.py:

from __future__ import annotations

import os

from airflow.www.fab_security.manager import AUTH_OAUTH
from airflow.www.security import AirflowSecurityManager
from airflow.utils.log.logging_mixin import LoggingMixin

basedir = os.path.abspath(os.path.dirname(__file__))

# Flask-WTF flag for CSRF
WTF_CSRF_ENABLED = True
WTF_CSRF_TIME_LIMIT = None

AUTH_TYPE = AUTH_OAUTH

OAUTH_PROVIDERS = [{
    # FAB uses this name in the callback path (/oauth-authorized/<name>),
    # so it must match the redirect URI registered in Step 9.
    'name': 'azure',
    'token_key': 'access_token',
    'icon': 'fa-windows',
    'remote_app': {
        'api_base_url': "https://login.microsoftonline.com/{}".format(os.getenv("AAD_TENANT_ID")),
        'request_token_url': None,
        'request_token_params': {
            'scope': 'openid email profile'
        },
        'access_token_url': "https://login.microsoftonline.com/{}/oauth2/v2.0/token".format(os.getenv("AAD_TENANT_ID")),
        'access_token_params': {
            'scope': 'openid email profile'
        },
        'authorize_url': "https://login.microsoftonline.com/{}/oauth2/v2.0/authorize".format(os.getenv("AAD_TENANT_ID")),
        'authorize_params': {
            'scope': 'openid email profile'
        },
        'client_id': os.getenv("AAD_CLIENT_ID"),
        'client_secret': os.getenv("AAD_CLIENT_SECRET"),
        'jwks_uri': 'https://login.microsoftonline.com/common/discovery/v2.0/keys'
    }
}]

# New users start with the Public role; roles are re-synced from the token at every login.
AUTH_USER_REGISTRATION_ROLE = "Public"
AUTH_USER_REGISTRATION = True
AUTH_ROLES_SYNC_AT_LOGIN = True

# Map Azure AD app roles to Airflow roles.
AUTH_ROLES_MAPPING = {
    "airflow_prod_admin": ["Admin"],
    "airflow_prod_user": ["Op"],
    "airflow_prod_viewer": ["Viewer"]
}


class AzureCustomSecurity(AirflowSecurityManager, LoggingMixin):
    def get_oauth_user_info(self, provider, response=None):
        # Read the user's claims from the ID token returned by Azure AD.
        me = self._azure_jwt_token_parse(response["id_token"])
        return {
            "name": me["name"],
            "email": me["email"],
            "first_name": me["given_name"],
            "last_name": me["family_name"],
            "id": me["oid"],
            "username": me["preferred_username"],
            # Users without an assigned app role have no "roles" claim.
            "role_keys": me.get("roles", [])
        }


# The first of these two appears to work with older Airflow versions, the latter with newer ones.
FAB_SECURITY_MANAGER_CLASS = 'webserver_config.AzureCustomSecurity'
SECURITY_MANAGER_CLASS = AzureCustomSecurity
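To see what the custom security manager is working with, it can help to decode an ID token by hand and run its roles claim through the mapping. The sketch below is an illustration only, not Flask-AppBuilder's implementation: it reads the payload segment of a JWT without verifying the signature, so it is suitable for debugging and nothing else. The function names are hypothetical, and the mapping is copied from the config above.

```python
import base64
import json

AUTH_ROLES_MAPPING = {
    "airflow_prod_admin": ["Admin"],
    "airflow_prod_user": ["Op"],
    "airflow_prod_viewer": ["Viewer"],
}


def decode_jwt_payload(token: str) -> dict:
    """Decode the payload (middle) segment of a JWT WITHOUT verifying its signature."""
    payload_b64 = token.split(".")[1]
    # JWT segments drop base64 padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def map_roles(role_keys: list, mapping: dict) -> list:
    """Translate Azure AD app-role names into Airflow role names via the mapping."""
    roles = set()
    for key in role_keys:
        roles.update(mapping.get(key, []))
    return sorted(roles)


if __name__ == "__main__":
    id_token = input("Paste an ID token: ").strip()
    claims = decode_jwt_payload(id_token)
    print(map_roles(claims.get("roles", []), AUTH_ROLES_MAPPING))
```

If a user lands in Airflow with only the Public role, an empty result here usually means the app role was never assigned to that user in Azure AD.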

Airflow-Repo:

Airflow-release:

Airflow-values.yaml:

