If you work in data engineering, you’ve probably heard of dbt and Apache Airflow. Both are powerful tools, but many teams wonder: Can I run dbt pipelines using Airflow? The answer is yes! In this guide, we’ll explore several ways to do this, from the simplest to the most advanced, with easy-to-follow code examples.

What is dbt?

dbt (data build tool) is an open-source tool that helps you transform data in your warehouse by writing SQL SELECT statements. dbt handles running these transformations in the right order, testing your data, and generating documentation.

Key features:

  • Write modular SQL models.
  • Run models in dependency order.
  • Test and document your data.

What is Apache Airflow?

Apache Airflow is an open-source platform to programmatically author, schedule, and monitor workflows. You define workflows as Directed Acyclic Graphs (DAGs) of tasks.

Key features:

  • Schedule and orchestrate complex workflows.
  • Monitor and retry failed tasks.
  • Integrate with many data tools.

Why combine dbt and Airflow?

  • Orchestration: Airflow handles the scheduling and dependencies between dbt runs.
  • Monitoring: Airflow provides a central place to monitor all your data pipelines.
  • Scalability: Airflow can handle large numbers of dbt runs.
  • Cost-effectiveness: Airflow and dbt Core are both open source, so you can schedule dbt runs without paying for dbt Cloud's scheduler.
  • Automation: Combine dbt with other tasks (e.g., data ingestion, notifications).

Method 1: The BashOperator Approach

The easiest way to run dbt from Airflow is to use the BashOperator. This operator lets you run any shell command, including dbt CLI commands.

Step by Step Guide

  1. Install dbt in your Airflow environment:
pip install dbt-core dbt-postgres  # or your adapter
  2. Add your dbt project and profiles to your Airflow repo:
  • Place your dbt project in a subfolder (e.g., dags/dbt_project).
  • Add your profiles.yml (can use environment variables for secrets).
  3. Create an Airflow DAG using BashOperator:
from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime

default_args = {
    'owner': 'airflow',
    'start_date': datetime(2024, 1, 1),
}

with DAG('dbt_simple_pipeline',
         default_args=default_args,
         schedule_interval='@daily',
         catchup=False) as dag:

    dbt_seed = BashOperator(
        task_id='dbt_seed',
        bash_command='dbt seed --project-dir /path/to/dbt_project --profiles-dir /path/to/profiles'
    )

    dbt_run = BashOperator(
        task_id='dbt_run',
        bash_command='dbt run --project-dir /path/to/dbt_project --profiles-dir /path/to/profiles'
    )

    dbt_test = BashOperator(
        task_id='dbt_test',
        bash_command='dbt test --project-dir /path/to/dbt_project --profiles-dir /path/to/profiles'
    )

    dbt_seed >> dbt_run >> dbt_test

Replace /path/to/dbt_project and /path/to/profiles with your actual paths.
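
If you template profiles.yml with dbt's env_var() function, you can keep credentials out of the repo and hand them to dbt per task through BashOperator's env parameter. A minimal sketch that would sit inside the same with DAG block, assuming Airflow 2.2+ (for the conn template variable); the connection ID and variable name are placeholders:

    dbt_run_with_secrets = BashOperator(
        task_id='dbt_run_with_secrets',
        bash_command='dbt run --project-dir /path/to/dbt_project --profiles-dir /path/to/profiles',
        env={
            # profiles.yml would read this as "{{ env_var('DBT_PASSWORD') }}"
            'DBT_PASSWORD': '{{ conn.my_postgres_conn.password }}',
        },
        append_env=True,  # keep the rest of the worker's environment visible to dbt
    )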

Pros and Cons

Pros:

  • Simple to set up.
  • No extra packages needed.

Cons:

  • All dbt models run as a single Airflow task, so there is no visibility into individual model status.
  • If one model fails, you must rerun the whole pipeline.
  • Potential dependency conflicts between dbt and Airflow in the same environment.
  • You need to commit profiles.yml (though you can keep secrets in environment variables).

Method 2: Using Airflow PodOperator

If you use Kubernetes, you can run dbt in isolated pods using Airflow's KubernetesPodOperator. This is a good fit for large projects, or whenever you want to keep dbt's dependencies fully separate from Airflow's.

Step by Step Guide

  1. Create a Dockerfile
# Use an official Python image as the base
FROM python:3.10-slim

# Set environment variables
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1

# Install system dependencies
RUN apt-get update && apt-get install -y \
    build-essential \
    git \
    && rm -rf /var/lib/apt/lists/*

# Install dbt (replace dbt-postgres with your adapter if needed)
RUN pip install --upgrade pip
RUN pip install dbt-core dbt-postgres

# Set work directory
WORKDIR /dbt

# (Optional) Copy your dbt project into the image
# COPY . /dbt

# Default command (can be overridden)
CMD ["dbt", "--help"]
  2. Build the Docker image. Replace myrepo/dbt:latest with your desired image name and tag.
docker build -t myrepo/dbt:latest .
  3. Log in to your container registry. I'm using AWS ECR in this example, but you can use any container registry of your choice.
aws ecr get-login-password --region <region> | docker login --username AWS --password-stdin <aws_account_id>.dkr.ecr.<region>.amazonaws.com
  4. Tag the image:
docker tag myrepo/dbt:latest <aws_account_id>.dkr.ecr.<region>.amazonaws.com/dbt:latest
  5. Push the image:
docker push <aws_account_id>.dkr.ecr.<region>.amazonaws.com/dbt:latest
  6. Create a DAG using KubernetesPodOperator:
from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator  # older provider versions use ...operators.kubernetes_pod
from datetime import datetime

default_args = {
    'owner': 'airflow',
    'start_date': datetime(2024, 1, 1),
}

with DAG('dbt_k8s_pipeline',
         default_args=default_args,
         schedule_interval='@daily',
         catchup=False) as dag:

    dbt_run = KubernetesPodOperator(
        namespace='default',
        image='myrepo/dbt:latest',  # your dbt image, e.g. the ECR URI pushed above
        cmds=['dbt', 'run'],
        arguments=[
            '--project-dir', '/dbt_project',
            '--profiles-dir', '/dbt_profiles'
        ],
        name='dbt-run',
        task_id='dbt_run',
        get_logs=True,
        is_delete_operator_pod=True,
        in_cluster=True,
        do_xcom_push=False,
        env_vars={
            # Placeholder only; inject real secrets via a Kubernetes Secret (see the sketch below)
            'DBT_ENV_SECRET': 'mysecret'
        },
        volume_mounts=[],  # Add if you need to mount volumes
        volumes=[],
    )
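
Rather than hardcoding values in env_vars as above, credentials would normally come from a Kubernetes Secret. A minimal sketch using the provider's Secret helper; the secret and key names are placeholders and assume a Secret already exists in the cluster:

from airflow.providers.cncf.kubernetes.secret import Secret

# Expose key "password" of the Kubernetes Secret "dbt-credentials" to the pod
# as the environment variable DBT_PASSWORD.
dbt_password = Secret(
    deploy_type='env',
    deploy_target='DBT_PASSWORD',
    secret='dbt-credentials',
    key='password',
)

# Inside the same "with DAG(...)" block as above:
dbt_run_secure = KubernetesPodOperator(
    namespace='default',
    image='myrepo/dbt:latest',
    cmds=['dbt', 'run'],
    arguments=['--project-dir', '/dbt_project', '--profiles-dir', '/dbt_profiles'],
    name='dbt-run-secure',
    task_id='dbt_run_secure',
    secrets=[dbt_password],  # injected by Kubernetes, never stored in the DAG file
    get_logs=True,
)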

Pros and Cons

Pros:

  • Full isolation, so no dependency conflicts.
  • Scales well for large projects.
  • Works great in Kubernetes-native environments.

Cons:

  • Requires Kubernetes and Docker knowledge.
  • More setup (building and pushing images, configuring volumes).

Method 3: Using Astronomer Cosmos

Astronomer Cosmos is an open-source Airflow provider that lets you run dbt Core projects as Airflow DAGs and Task Groups. It gives you more control and visibility by turning each dbt model into its own Airflow task.

Key benefits:

  • Use Airflow connections instead of dbt profiles.
  • Install dbt in a separate virtual environment.
  • Each dbt model becomes a separate Airflow task.
  • Simulate dbt build (run + test per model).
  • Use Airflow’s sensors and features.

Step by Step Guide

  1. Install Astronomer Cosmos
pip install astronomer-cosmos
  2. Install dbt in a separate environment (optional but recommended):
python -m venv /path/to/dbt_venv
source /path/to/dbt_venv/bin/activate
pip install dbt-core dbt-postgres
  3. Set up your dbt project and Airflow connection:
  • Place your dbt project in a known location.
  • Configure Airflow connection for your warehouse (e.g., Snowflake, Postgres).
  4. Create a DAG using Cosmos:
from datetime import datetime

# Cosmos 1.x API: DbtDag is itself an Airflow DAG, so no separate DAG object is needed.
from cosmos import DbtDag, ProjectConfig, ProfileConfig, ExecutionConfig
from cosmos.profiles import SnowflakeUserPasswordProfileMapping

profile_config = ProfileConfig(
    profile_name='my_profile',
    target_name='dev',
    # Use an Airflow connection instead of profiles.yml
    profile_mapping=SnowflakeUserPasswordProfileMapping(
        conn_id='my_snowflake_conn',
        profile_args={'database': 'my_db', 'schema': 'my_schema'},
    ),
)

dbt_cosmos_dag = DbtDag(
    dag_id='dbt_cosmos_pipeline',
    project_config=ProjectConfig('/path/to/dbt_project'),
    profile_config=profile_config,
    execution_config=ExecutionConfig(
        dbt_executable_path='/path/to/dbt_venv/bin/dbt',
    ),
    schedule_interval='@daily',
    start_date=datetime(2024, 1, 1),
    catchup=False,
    default_args={'owner': 'airflow'},
)
This will create a task for each dbt model and test!
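
If you only want part of the project orchestrated by Airflow, Cosmos can filter what it renders. A small sketch using RenderConfig with a dbt selector; the tag name is a placeholder:

from cosmos import RenderConfig

# Render only models tagged "daily" (and their tests) as Airflow tasks.
render_config = RenderConfig(
    select=['tag:daily'],
)

# Passed to DbtDag (or DbtTaskGroup) alongside the other configs:
# DbtDag(..., render_config=render_config, ...)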

Pros and Cons

Pros:

  • Each dbt model is a separate Airflow task, giving better visibility and control.
  • Can use Airflow connections for credentials.
  • Avoids dependency conflicts by running dbt in a separate environment.
  • Supports advanced Airflow features (sensors, retries, etc.).

Cons:

  • More complex setup.
  • More Airflow tasks (could be hundreds if you have many models).
  • Learning curve for Cosmos.

Bonus: Integrating with dbt Cloud

dbt Cloud is a managed service for dbt, offering scheduling, logging, and a web UI. You can trigger dbt Cloud jobs from Airflow using the official provider.

Step by Step Guide

  1. Install the dbt Cloud Airflow provider
pip install apache-airflow-providers-dbt-cloud[http]
  2. Create a dbt Cloud service token: in dbt Cloud, go to Settings → Account → Service Tokens.
  3. Set up an Airflow connection by storing your dbt Cloud credentials, for example as an environment variable:
export AIRFLOW_CONN_DBT_CLOUD_DEFAULT='dbt-cloud://account_id:secret_token@my-url.getdbt.com'
  4. Create a DAG to trigger dbt Cloud jobs:
from airflow import DAG
from airflow.providers.dbt.cloud.operators.dbt import DbtCloudRunJobOperator
from datetime import datetime

default_args = {
    'owner': 'airflow',
    'start_date': datetime(2024, 1, 1),
}

with DAG('dbt_cloud_pipeline',
         default_args=default_args,
         schedule_interval='@daily',
         catchup=False) as dag:

    run_dbt_job = DbtCloudRunJobOperator(
        task_id='run_dbt_cloud_job',
        job_id=12345,  # Replace with your dbt Cloud job ID
        account_id=67890,  # Replace with your dbt Cloud account ID
        wait_for_termination=True,  # block until the job finishes
    )
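
If you would rather not hold an Airflow worker slot while the job runs, you can trigger the job without waiting and poll its status with the provider's run sensor. A sketch with the same placeholder IDs:

from airflow.providers.dbt.cloud.sensors.dbt import DbtCloudJobRunSensor

# Inside the same "with DAG(...)" block as above:
trigger_dbt_job = DbtCloudRunJobOperator(
    task_id='trigger_dbt_cloud_job',
    job_id=12345,  # same placeholder job ID as above
    wait_for_termination=False,  # return immediately instead of blocking
)

wait_for_dbt_job = DbtCloudJobRunSensor(
    task_id='wait_for_dbt_cloud_job',
    run_id=trigger_dbt_job.output,  # run ID pushed to XCom by the trigger task
    timeout=3600,  # give up after an hour
)

trigger_dbt_job >> wait_for_dbt_job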

Pros and Cons

Pros:

  • Leverage dbt Cloud's managed features.
  • Simple Airflow integration.
  • No need to manage dbt environments.

Cons:

  • Requires a dbt Cloud subscription.
  • Less control over individual model tasks in Airflow.

Best Practices and Tips

  • Use environment variables for secrets, never hardcode credentials.
  • Monitor your Airflow tasks and set up alerts for failures.
  • Group models using Task Groups or split very large projects into multiple DAGs (see the DbtTaskGroup sketch after this list).
  • Test locally before deploying to production.
  • Document your pipeline for future maintainers.
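
For the Task Group approach, Cosmos also offers DbtTaskGroup, which renders the dbt project as a group inside a larger DAG so you can wire it between other tasks. A minimal sketch; the connection ID, schema, and ingestion task are placeholders:

from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator
from cosmos import DbtTaskGroup, ProjectConfig, ProfileConfig
from cosmos.profiles import PostgresUserPasswordProfileMapping

profile_config = ProfileConfig(
    profile_name='my_profile',
    target_name='dev',
    profile_mapping=PostgresUserPasswordProfileMapping(
        conn_id='my_postgres_conn',  # placeholder Airflow connection
        profile_args={'schema': 'public'},
    ),
)

with DAG('ingest_then_dbt',
         start_date=datetime(2024, 1, 1),
         schedule_interval='@daily',
         catchup=False) as dag:

    ingest_data = EmptyOperator(task_id='ingest_data')  # stand-in for a real ingestion task

    dbt_models = DbtTaskGroup(
        group_id='dbt_models',
        project_config=ProjectConfig('/path/to/dbt_project'),
        profile_config=profile_config,
    )

    ingest_data >> dbt_models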

Running dbt pipelines with Apache Airflow is not only possible—it’s a best practice for many data teams. Whether you choose the simple BashOperator, the powerful Astronomer Cosmos, the managed dbt Cloud, or a Kubernetes-native approach, you can orchestrate robust, reliable data pipelines.

Start simple, then scale up as your needs grow. Happy data engineering!

If you found this guide helpful, please share it with your colleagues and subscribe for more data engineering tips!

FAQs

Should I use one big DAG or split into multiple DAGs?

For small projects, one DAG is fine. For large projects (hundreds of models), consider splitting by domain or using Task Groups.

Can I use Airflow to run only specific dbt models?

Yes! Use dbt's --select flag (the newer name for --models) in your dbt command, or configure Cosmos's RenderConfig (shown in Method 3) to render only specific models.

What if I have dependency conflicts between dbt and Airflow?

Use a separate virtual environment for dbt (as with Cosmos) or run dbt in a Docker container.
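
Outside of Cosmos, a lightweight option is to install dbt into its own virtualenv and call that environment's dbt binary from BashOperator, so the two sets of Python dependencies never meet. A sketch, with placeholder paths, that drops into a DAG like the Method 1 example:

    dbt_run_isolated = BashOperator(
        task_id='dbt_run_isolated',
        # Call dbt from a dedicated virtualenv rather than the Airflow environment.
        bash_command=(
            '/path/to/dbt_venv/bin/dbt run '
            '--project-dir /path/to/dbt_project '
            '--profiles-dir /path/to/profiles'
        ),
    )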

Is it safe to commit profiles.yml?

Yes, if you use environment variables for secrets. Never commit plain-text credentials.