If you work in data engineering, you’ve probably heard of dbt and Apache Airflow. Both are powerful tools, but many teams wonder: Can I run dbt pipelines using Airflow? The answer is yes! In this guide, we’ll explore several ways to do this, from the simplest to the most advanced, with easy-to-follow code examples.

What is dbt?

dbt (data build tool) is an open-source tool that helps you transform data in your warehouse by writing SQL SELECT statements. dbt handles running these transformations in the right order, testing your data, and generating documentation.

Key features:

  • Write modular SQL models.
  • Run models in dependency order.
  • Test and document your data.

What is Apache Airflow?

Apache Airflow is an open-source platform to programmatically author, schedule, and monitor workflows. You define workflows as Directed Acyclic Graphs (DAGs) of tasks.

Key features:

  • Schedule and orchestrate complex workflows.
  • Monitor and retry failed tasks.
  • Integrate with many data tools.

Why combine dbt and Airflow?

  • Orchestration: Airflow handles the scheduling and dependencies between dbt runs.
  • Monitoring: Airflow provides a central place to monitor all your data pipelines.
  • Scalability: Airflow can handle large numbers of dbt runs.
  • Cost-effectiveness: Airflow and dbt Core are both open source, so you can schedule dbt runs without paying for dbt Cloud's scheduler.
  • Automation: Combine dbt with other tasks (e.g., data ingestion, notifications).

Method 1: The BashOperator Approach

The easiest way to run dbt from Airflow is to use the BashOperator. This operator lets you run any shell command, including dbt CLI commands.

Step by Step Guide

  1. Install dbt in your Airflow environment:
pip install dbt-core dbt-postgres  # or your adapter
  2. Add your dbt project and profiles to your Airflow repo:
  • Place your dbt project in a subfolder (e.g., dags/dbt_project).
  • Add your profiles.yml (can use environment variables for secrets).
  3. Create an Airflow DAG using BashOperator:
from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime

default_args = {
    'owner': 'airflow',
    'start_date': datetime(2024, 1, 1),
}

with DAG('dbt_simple_pipeline',
         default_args=default_args,
         schedule_interval='@daily',
         catchup=False) as dag:

    dbt_seed = BashOperator(
        task_id='dbt_seed',
        bash_command='dbt seed --project-dir /path/to/dbt_project --profiles-dir /path/to/profiles'
    )

    dbt_run = BashOperator(
        task_id='dbt_run',
        bash_command='dbt run --project-dir /path/to/dbt_project --profiles-dir /path/to/profiles'
    )

    dbt_test = BashOperator(
        task_id='dbt_test',
        bash_command='dbt test --project-dir /path/to/dbt_project --profiles-dir /path/to/profiles'
    )

    dbt_seed >> dbt_run >> dbt_test

Replace /path/to/dbt_project and /path/to/profiles with your actual paths.
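
If you template profiles.yml with dbt's env_var() function, you can keep credentials out of the repo and hand them to dbt per task through BashOperator's env parameter. A minimal sketch that would sit inside the same with DAG block, assuming Airflow 2.2+ (for the conn template variable); the connection ID and variable name are placeholders:

    dbt_run_with_secrets = BashOperator(
        task_id='dbt_run_with_secrets',
        bash_command='dbt run --project-dir /path/to/dbt_project --profiles-dir /path/to/profiles',
        env={
            # profiles.yml would read this as "{{ env_var('DBT_PASSWORD') }}"
            'DBT_PASSWORD': '{{ conn.my_postgres_conn.password }}',
        },
        append_env=True,  # keep the rest of the worker's environment visible to dbt
    )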

Pros and Cons

Pros:

  • Simple to set up.
  • No extra packages needed.

Cons:

  • All dbt models run as a single Airflow task, so there is no visibility into individual model status.
  • If one model fails, you must rerun the whole pipeline.
  • Potential dependency conflicts between dbt and Airflow in the same environment.
  • You need to commit profiles.yml (though you can keep secrets in environment variables).

Method 2: Using Airflow PodOperator

If you use Kubernetes, you can run dbt in isolated pods using Airflow's KubernetesPodOperator. This is a good fit for large projects, or whenever you want to keep dbt's dependencies fully separate from Airflow's.

Step by Step Guide

  1. Create a Dockerfile
# Use an official Python image as the base
FROM python:3.10-slim

# Set environment variables
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1

# Install system dependencies
RUN apt-get update && apt-get install -y \
    build-essential \
    git \
    && rm -rf /var/lib/apt/lists/*

# Install dbt (replace dbt-postgres with your adapter if needed)
RUN pip install --upgrade pip
RUN pip install dbt-core dbt-postgres

# Set work directory
WORKDIR /dbt

# (Optional) Copy your dbt project into the image
# COPY . /dbt

# Default command (can be overridden)
CMD ["dbt", "--help"]
  2. Build the Docker image. Replace myrepo/dbt:latest with your desired image name and tag.
docker build -t myrepo/dbt:latest .
  3. Log in to your container registry. I'm using AWS ECR in this example, but you can use any container registry of your choice.
aws ecr get-login-password --region <region> | docker login --username AWS --password-stdin <aws_account_id>.dkr.ecr.<region>.amazonaws.com
  4. Tag the image:
docker tag myrepo/dbt:latest <aws_account_id>.dkr.ecr.<region>.amazonaws.com/dbt:latest
  5. Push the image:
docker push <aws_account_id>.dkr.ecr.<region>.amazonaws.com/dbt:latest
  6. Create a DAG using KubernetesPodOperator:
from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator  # older provider versions use ...operators.kubernetes_pod
from datetime import datetime

default_args = {
    'owner': 'airflow',
    'start_date': datetime(2024, 1, 1),
}

with DAG('dbt_k8s_pipeline',
         default_args=default_args,
         schedule_interval='@daily',
         catchup=False) as dag:

    dbt_run = KubernetesPodOperator(
        namespace='default',
        image='myrepo/dbt:latest',  # your dbt image, e.g. the ECR URI pushed above
        cmds=['dbt', 'run'],
        arguments=[
            '--project-dir', '/dbt_project',
            '--profiles-dir', '/dbt_profiles'
        ],
        name='dbt-run',
        task_id='dbt_run',
        get_logs=True,
        is_delete_operator_pod=True,
        in_cluster=True,
        do_xcom_push=False,
        env_vars={
            # Placeholder only; inject real secrets via a Kubernetes Secret (see the sketch below)
            'DBT_ENV_SECRET': 'mysecret'
        },
        volume_mounts=[],  # Add if you need to mount volumes
        volumes=[],
    )
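
Rather than hardcoding values in env_vars as above, credentials would normally come from a Kubernetes Secret. A minimal sketch using the provider's Secret helper; the secret and key names are placeholders and assume a Secret already exists in the cluster:

from airflow.providers.cncf.kubernetes.secret import Secret

# Expose key "password" of the Kubernetes Secret "dbt-credentials" to the pod
# as the environment variable DBT_PASSWORD.
dbt_password = Secret(
    deploy_type='env',
    deploy_target='DBT_PASSWORD',
    secret='dbt-credentials',
    key='password',
)

# Inside the same "with DAG(...)" block as above:
dbt_run_secure = KubernetesPodOperator(
    namespace='default',
    image='myrepo/dbt:latest',
    cmds=['dbt', 'run'],
    arguments=['--project-dir', '/dbt_project', '--profiles-dir', '/dbt_profiles'],
    name='dbt-run-secure',
    task_id='dbt_run_secure',
    secrets=[dbt_password],  # injected by Kubernetes, never stored in the DAG file
    get_logs=True,
)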

Pros and Cons

Pros:

  • Full isolation, so no dependency conflicts.
  • Scales well for large projects.
  • Works great in Kubernetes-native environments.

Cons:

  • Requires Kubernetes and Docker knowledge.
  • More setup (building and pushing images, configuring volumes).

Method 3: Using Astronomer Cosmos

Astronomer Cosmos is an open-source Airflow provider that lets you run dbt Core projects as Airflow DAGs and Task Groups. It gives you more control and visibility by turning each dbt model into its own Airflow task.

Key benefits:

  • Use Airflow connections instead of dbt profiles.
  • Install dbt in a separate virtual environment.
  • Each dbt model becomes a separate Airflow task.
  • Simulate dbt build (run + test per model).
  • Use Airflow’s sensors and features.

Step by Step Guide

  1. Install Astronomer Cosmos
pip install astronomer-cosmos
  2. Install dbt in a separate environment (optional but recommended):
python -m venv /path/to/dbt_venv
source /path/to/dbt_venv/bin/activate
pip install dbt-core dbt-postgres
  3. Set up your dbt project and Airflow connection:
  • Place your dbt project in a known location.
  • Configure Airflow connection for your warehouse (e.g., Snowflake, Postgres).
  4. Create a DAG using Cosmos:
from datetime import datetime

# Cosmos 1.x API: DbtDag is itself an Airflow DAG, so no separate DAG object is needed.
from cosmos import DbtDag, ProjectConfig, ProfileConfig, ExecutionConfig
from cosmos.profiles import SnowflakeUserPasswordProfileMapping

profile_config = ProfileConfig(
    profile_name='my_profile',
    target_name='dev',
    # Use an Airflow connection instead of profiles.yml
    profile_mapping=SnowflakeUserPasswordProfileMapping(
        conn_id='my_snowflake_conn',
        profile_args={'database': 'my_db', 'schema': 'my_schema'},
    ),
)

dbt_cosmos_dag = DbtDag(
    dag_id='dbt_cosmos_pipeline',
    project_config=ProjectConfig('/path/to/dbt_project'),
    profile_config=profile_config,
    execution_config=ExecutionConfig(
        dbt_executable_path='/path/to/dbt_venv/bin/dbt',
    ),
    schedule_interval='@daily',
    start_date=datetime(2024, 1, 1),
    catchup=False,
    default_args={'owner': 'airflow'},
)
This will create a task for each dbt model and test!
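
If you only want part of the project orchestrated by Airflow, Cosmos can filter what it renders. A small sketch using RenderConfig with a dbt selector; the tag name is a placeholder:

from cosmos import RenderConfig

# Render only models tagged "daily" (and their tests) as Airflow tasks.
render_config = RenderConfig(
    select=['tag:daily'],
)

# Passed to DbtDag (or DbtTaskGroup) alongside the other configs:
# DbtDag(..., render_config=render_config, ...)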

Pros and Cons

Pros:

  • Each dbt model is a separate Airflow task, giving better visibility and control.
  • Can use Airflow connections for credentials.
  • Avoids dependency conflicts by running dbt in a separate environment.
  • Supports advanced Airflow features (sensors, retries, etc.).

Cons:

  • More complex setup.
  • More Airflow tasks (could be hundreds if you have many models).
  • Learning curve for Cosmos.

Bonus: Integrating with dbt Cloud

dbt Cloud is a managed service for dbt, offering scheduling, logging, and a web UI. You can trigger dbt Cloud jobs from Airflow using the official provider.

Step by Step Guide

  1. Install the dbt Cloud Airflow provider
pip install apache-airflow-providers-dbt-cloud[http]
  2. Create a dbt Cloud service token: in dbt Cloud, go to Settings → Account → Service Tokens.
  3. Set up an Airflow connection by storing your dbt Cloud credentials, for example as an environment variable:
export AIRFLOW_CONN_DBT_CLOUD_DEFAULT='dbt-cloud://account_id:secret_token@my-url.getdbt.com'
  4. Create a DAG to trigger dbt Cloud jobs:
from airflow import DAG
from airflow.providers.dbt.cloud.operators.dbt import DbtCloudRunJobOperator
from datetime import datetime

default_args = {
    'owner': 'airflow',
    'start_date': datetime(2024, 1, 1),
}

with DAG('dbt_cloud_pipeline',
         default_args=default_args,
         schedule_interval='@daily',
         catchup=False) as dag:

    run_dbt_job = DbtCloudRunJobOperator(
        task_id='run_dbt_cloud_job',
        job_id=12345,  # Replace with your dbt Cloud job ID
        account_id=67890,  # Replace with your dbt Cloud account ID
        wait_for_termination=True,  # block until the job finishes
    )
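
If you would rather not hold an Airflow worker slot while the job runs, you can trigger the job without waiting and poll its status with the provider's run sensor. A sketch with the same placeholder IDs:

from airflow.providers.dbt.cloud.sensors.dbt import DbtCloudJobRunSensor

# Inside the same "with DAG(...)" block as above:
trigger_dbt_job = DbtCloudRunJobOperator(
    task_id='trigger_dbt_cloud_job',
    job_id=12345,  # same placeholder job ID as above
    wait_for_termination=False,  # return immediately instead of blocking
)

wait_for_dbt_job = DbtCloudJobRunSensor(
    task_id='wait_for_dbt_cloud_job',
    run_id=trigger_dbt_job.output,  # run ID pushed to XCom by the trigger task
    timeout=3600,  # give up after an hour
)

trigger_dbt_job >> wait_for_dbt_job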

Pros and Cons

Pros:

  • Leverage dbt Cloud's managed features.
  • Simple Airflow integration.
  • No need to manage dbt environments.

Cons:

  • Requires a dbt Cloud subscription.
  • Less control over individual model tasks in Airflow.

Best Practices and Tips

  • Use environment variables for secrets, never hardcode credentials.
  • Monitor your Airflow tasks and set up alerts for failures.
  • Group models using Task Groups or split very large projects into multiple DAGs (see the DbtTaskGroup sketch after this list).
  • Test locally before deploying to production.
  • Document your pipeline for future maintainers.
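
For the Task Group approach, Cosmos also offers DbtTaskGroup, which renders the dbt project as a group inside a larger DAG so you can wire it between other tasks. A minimal sketch; the connection ID, schema, and ingestion task are placeholders:

from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator
from cosmos import DbtTaskGroup, ProjectConfig, ProfileConfig
from cosmos.profiles import PostgresUserPasswordProfileMapping

profile_config = ProfileConfig(
    profile_name='my_profile',
    target_name='dev',
    profile_mapping=PostgresUserPasswordProfileMapping(
        conn_id='my_postgres_conn',  # placeholder Airflow connection
        profile_args={'schema': 'public'},
    ),
)

with DAG('ingest_then_dbt',
         start_date=datetime(2024, 1, 1),
         schedule_interval='@daily',
         catchup=False) as dag:

    ingest_data = EmptyOperator(task_id='ingest_data')  # stand-in for a real ingestion task

    dbt_models = DbtTaskGroup(
        group_id='dbt_models',
        project_config=ProjectConfig('/path/to/dbt_project'),
        profile_config=profile_config,
    )

    ingest_data >> dbt_models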

Running dbt pipelines with Apache Airflow is not only possible—it’s a best practice for many data teams. Whether you choose the simple BashOperator, the powerful Astronomer Cosmos, the managed dbt Cloud, or a Kubernetes-native approach, you can orchestrate robust, reliable data pipelines.

Start simple, then scale up as your needs grow. Happy data engineering!

If you found this guide helpful, please share it with your colleagues and subscribe for more data engineering tips!

FAQs

Should I use one big DAG or split into multiple DAGs?

For small projects, one DAG is fine. For large projects (hundreds of models), consider splitting by domain or using Task Groups.

Can I use Airflow to run only specific dbt models?

Yes! Use dbt's --select flag (the newer name for --models) in your dbt command, or configure Cosmos's RenderConfig (shown in Method 3) to render only specific models.

What if I have dependency conflicts between dbt and Airflow?

Use a separate virtual environment for dbt (as with Cosmos) or run dbt in a Docker container.
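
Outside of Cosmos, a lightweight option is to install dbt into its own virtualenv and call that environment's dbt binary from BashOperator, so the two sets of Python dependencies never meet. A sketch, with placeholder paths, that drops into a DAG like the Method 1 example:

    dbt_run_isolated = BashOperator(
        task_id='dbt_run_isolated',
        # Call dbt from a dedicated virtualenv rather than the Airflow environment.
        bash_command=(
            '/path/to/dbt_venv/bin/dbt run '
            '--project-dir /path/to/dbt_project '
            '--profiles-dir /path/to/profiles'
        ),
    )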

Is it safe to commit profiles.yml?

Yes, if you use environment variables for secrets. Never commit plain-text credentials.