If you work in data engineering, you’ve probably heard of dbt and Apache Airflow. Both are powerful tools, but many teams wonder: Can I run dbt pipelines using Airflow? The answer is yes! In this guide, we’ll explore several ways to do this, from the simplest to the most advanced, with easy-to-follow code examples.
Video tutorial + Code Repo inside! Exclusively for our readers, I've added the code repo that you can simply clone and use to do local development on Airflow and dbt, along with a video on how to set it up on your local machine. Don't forget to check the video and repo link at the end of this article.
What is dbt?
dbt (data build tool) is an open-source tool that helps you transform data in your warehouse by writing SQL SELECT statements. dbt handles running these transformations in the right order, testing your data, and generating documentation.
Key features:
Write modular SQL models.
Run models in dependency order.
Test and document your data.
What is Apache Airflow?
Apache Airflow is an open-source platform to programmatically author, schedule, and monitor workflows. You define workflows as Directed Acyclic Graphs (DAGs) of tasks.
Key features:
Schedule and orchestrate complex workflows.
Monitor and retry failed tasks.
Integrate with many data tools.
Why combine dbt and Airflow?
Orchestration: Airflow handles the scheduling and dependencies between dbt runs.
Monitoring: Airflow provides a central place to monitor all your data pipelines.
Scalability: Airflow can handle large numbers of dbt runs.
Cost-effectiveness: Airflow and dbt Core are both open source, so you can schedule dbt runs without paying for dbt Cloud.
Automation: Combine dbt with other tasks (e.g., data ingestion, notifications).
Method 1: The BashOperator Approach
The easiest way to run dbt from Airflow is to use the BashOperator. This operator lets you run any shell command, including dbt CLI commands.
Step by Step Guide
Install dbt in your Airflow environment:
pip install dbt-core dbt-postgres # or your adapter
Add your dbt project and profiles to your Airflow repo:
Place your dbt project in a subfolder (e.g., dags/dbt_project).
Add your profiles.yml (can use environment variables for secrets).
Create an Airflow DAG using BashOperator:
from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime

default_args = {
    'owner': 'airflow',
    'start_date': datetime(2024, 1, 1),
}

with DAG('dbt_simple_pipeline',
         default_args=default_args,
         schedule_interval='@daily',
         catchup=False) as dag:

    dbt_seed = BashOperator(
        task_id='dbt_seed',
        bash_command='dbt seed --project-dir /path/to/dbt_project --profiles-dir /path/to/profiles'
    )

    dbt_run = BashOperator(
        task_id='dbt_run',
        bash_command='dbt run --project-dir /path/to/dbt_project --profiles-dir /path/to/profiles'
    )

    dbt_test = BashOperator(
        task_id='dbt_test',
        bash_command='dbt test --project-dir /path/to/dbt_project --profiles-dir /path/to/profiles'
    )

    dbt_seed >> dbt_run >> dbt_test
Replace /path/to/dbt_project and /path/to/profiles with your actual paths.
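If you'd rather not repeat the paths and flags in every task, one option is to build the commands from environment variables (or Airflow Variables). Here's a minimal sketch, assuming hypothetical DBT_PROJECT_DIR, DBT_PROFILES_DIR, and DBT_PASSWORD environment variables are set on your Airflow workers:

import os
from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime

# Hypothetical locations; point these at wherever your project and profiles actually live.
DBT_PROJECT_DIR = os.environ.get('DBT_PROJECT_DIR', '/opt/airflow/dags/dbt_project')
DBT_PROFILES_DIR = os.environ.get('DBT_PROFILES_DIR', '/opt/airflow/dags/dbt_project')

# Shared flags so every task points at the same project and profiles.
DBT_FLAGS = f'--project-dir {DBT_PROJECT_DIR} --profiles-dir {DBT_PROFILES_DIR}'

with DAG('dbt_simple_pipeline_env',
         start_date=datetime(2024, 1, 1),
         schedule_interval='@daily',
         catchup=False) as dag:

    dbt_run = BashOperator(
        task_id='dbt_run',
        bash_command=f'dbt run {DBT_FLAGS}',
        # Secrets such as the warehouse password can be passed as env vars
        # and referenced from profiles.yml with env_var().
        env={'DBT_PASSWORD': os.environ.get('DBT_PASSWORD', '')},
        append_env=True,  # keep the rest of the worker environment available to dbt
    )

This keeps credentials out of the DAG file and out of version control, in line with the profiles.yml note above.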
Pros and Cons
Pros:
Simple to set up.
No extra packages needed.
Cons:
All dbt models run as a single task—no visibility into individual model status.
If one model fails, you must rerun the whole pipeline.
Potential dependency conflicts between dbt and Airflow.
You need to commit profiles.yml (but can use env vars for secrets).
Method 2: Using Airflow PodOperator
If you use Kubernetes, you can run dbt in isolated pods using Airflow's KubernetesPodOperator. This is a great fit for large projects or when you want to avoid dependency conflicts between dbt and Airflow.
Step by Step Guide
Create a Dockerfile
# Use an official Python image as the base
FROM python:3.10-slim
# Set environment variables
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
# Install system dependencies
RUN apt-get update && apt-get install -y \
build-essential \
git \
&& rm -rf /var/lib/apt/lists/*
# Install dbt (replace dbt-postgres with your adapter if needed)
RUN pip install --upgrade pip
RUN pip install dbt-core dbt-postgres
# Set work directory
WORKDIR /dbt
# (Optional) Copy your dbt project into the image
# COPY . /dbt
# Default command (can be overridden)
CMD ["dbt", "--help"]
Build the Docker Image. Replace myrepo/dbt:latest with your desired image name and tag.
docker build -t myrepo/dbt:latest .
Log in to your container registry and push the image. I'm using AWS ECR in this example, but you can use any container registry of your choice. Then create an Airflow DAG that runs dbt via the KubernetesPodOperator:
from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import KubernetesPodOperator
from datetime import datetime

default_args = {
    'owner': 'airflow',
    'start_date': datetime(2024, 1, 1),
}

with DAG('dbt_k8s_pipeline',
         default_args=default_args,
         schedule_interval='@daily',
         catchup=False) as dag:

    dbt_run = KubernetesPodOperator(
        namespace='default',
        image='myrepo/dbt:latest',  # Your dbt Docker image
        cmds=['dbt', 'run'],
        arguments=[
            '--project-dir', '/dbt_project',
            '--profiles-dir', '/dbt_profiles'
        ],
        name='dbt-run',
        task_id='dbt_run',
        get_logs=True,
        is_delete_operator_pod=True,
        in_cluster=True,
        do_xcom_push=False,
        env_vars={
            'DBT_ENV_SECRET': 'mysecret'
        },
        volume_mounts=[],  # Add if you need to mount volumes
        volumes=[],
    )
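The empty volumes and volume_mounts lists above are where you would attach your dbt project and profiles if they aren't baked into the image. Here's a rough sketch using the Kubernetes client models, assuming a PersistentVolumeClaim named dbt-project-pvc and a ConfigMap named dbt-profiles already exist in the cluster (both names are made up for illustration; older versions of the cncf.kubernetes provider used its own Volume helper classes instead):

from kubernetes.client import models as k8s

# Mount the dbt project from a PVC and profiles.yml from a ConfigMap.
dbt_volumes = [
    k8s.V1Volume(
        name='dbt-project',
        persistent_volume_claim=k8s.V1PersistentVolumeClaimVolumeSource(
            claim_name='dbt-project-pvc'
        ),
    ),
    k8s.V1Volume(
        name='dbt-profiles',
        config_map=k8s.V1ConfigMapVolumeSource(name='dbt-profiles'),
    ),
]

dbt_volume_mounts = [
    k8s.V1VolumeMount(name='dbt-project', mount_path='/dbt_project'),
    k8s.V1VolumeMount(name='dbt-profiles', mount_path='/dbt_profiles'),
]

# Then pass these to the operator instead of the empty lists:
#   volumes=dbt_volumes,
#   volume_mounts=dbt_volume_mounts,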
Pros and Cons
Pros:
Full isolation—no dependency conflicts.
Scales well for large projects.
Works great in Kubernetes-native environments.
Cons:
Requires Kubernetes and Docker knowledge.
More setup (build/push images, configure volumes).
Method 3 (Recommended): Using Astronomer Cosmos
Astronomer Cosmos is an open-source Airflow provider that lets you run dbt Core projects as Airflow DAGs and Task Groups. It gives you more control and visibility by turning each dbt model into its own Airflow task.
Key benefits:
Use Airflow connections instead of dbt profiles.
Install dbt in a separate virtual environment.
Each dbt model becomes a separate Airflow task.
Simulate dbt build (run + test per model).
Use Airflow’s sensors and features.
Step by Step Guide
Install Astronomer Cosmos
pip install astronomer-cosmos
Install dbt in a separate virtual environment (optional but recommended) so its dependencies don't conflict with Airflow's.
Configure an Airflow connection for your warehouse (e.g., Snowflake, Postgres).
Create a DAG using Cosmos:
from cosmos import DbtDag, ProjectConfig, ProfileConfig, ExecutionConfig
from cosmos.profiles import SnowflakeUserPasswordProfileMapping
from datetime import datetime

profile_config = ProfileConfig(
    profile_name='my_profile',
    target_name='dev',
    # Use an Airflow connection instead of profiles.yml
    profile_mapping=SnowflakeUserPasswordProfileMapping(
        conn_id='my_snowflake_conn',
        profile_args={'database': 'my_db', 'schema': 'my_schema'},  # anything not stored on the connection
    ),
)

dbt_cosmos_pipeline = DbtDag(
    dag_id='dbt_cosmos_pipeline',
    project_config=ProjectConfig('/path/to/dbt_project'),
    profile_config=profile_config,
    execution_config=ExecutionConfig(
        dbt_executable_path='/path/to/dbt_venv/bin/dbt'
    ),
    schedule_interval='@daily',
    start_date=datetime(2024, 1, 1),
    catchup=False,
    default_args={'owner': 'airflow'},
)
This will create a task for each dbt model and test!
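If you'd rather embed the dbt project inside an existing DAG (for example, right after an ingestion task) instead of generating a standalone DAG, Cosmos also ships a DbtTaskGroup. A rough sketch, reusing the profile_config defined above and a placeholder extract_data task:

from airflow import DAG
from airflow.operators.empty import EmptyOperator
from cosmos import DbtTaskGroup, ProjectConfig
from datetime import datetime

with DAG('ingest_then_transform',
         start_date=datetime(2024, 1, 1),
         schedule_interval='@daily',
         catchup=False) as dag:

    extract_data = EmptyOperator(task_id='extract_data')  # stand-in for your ingestion task

    dbt_models = DbtTaskGroup(
        group_id='dbt_models',
        project_config=ProjectConfig('/path/to/dbt_project'),
        profile_config=profile_config,  # same ProfileConfig as in the DbtDag example
    )

    extract_data >> dbt_models

The dbt models then show up as a collapsible Task Group in the Airflow UI, which also covers the "group models" tip in the best practices further down.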
Pros and Cons
Pros:
Each dbt model is a separate Airflow task—better visibility and control.
Can use Airflow connections for credentials.
Avoids dependency conflicts by using a separate dbt environment.
Supports advanced features (sensors, retries, etc.).
Cons:
More complex setup.
More Airflow tasks (could be hundreds if you have many models).
Learning curve for Cosmos.
Bonus: Integrating with dbt Cloud
dbt Cloud is a managed service for dbt, offering scheduling, logging, and a web UI. You can trigger dbt Cloud jobs from Airflow using the official provider.
from airflow import DAG
from airflow.providers.dbt.cloud.operators.dbt import DbtCloudRunJobOperator
from datetime import datetime

default_args = {
    'owner': 'airflow',
    'start_date': datetime(2024, 1, 1),
}

with DAG('dbt_cloud_pipeline',
         default_args=default_args,
         schedule_interval='@daily',
         catchup=False) as dag:

    run_dbt_job = DbtCloudRunJobOperator(
        task_id='run_dbt_cloud_job',
        job_id=12345,  # Replace with your dbt Cloud job ID
        account_id=67890,  # Replace with your dbt Cloud account ID
        wait_for_termination=True,
    )
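If you'd rather not hold an Airflow worker slot while the dbt Cloud job runs, one common pattern is to trigger the job without waiting and let a sensor poll for completion. A sketch assuming the same job and a dbt Cloud connection configured in Airflow (the provider defaults to a connection ID of dbt_cloud_default):

from airflow import DAG
from airflow.providers.dbt.cloud.operators.dbt import DbtCloudRunJobOperator
from airflow.providers.dbt.cloud.sensors.dbt import DbtCloudJobRunSensor
from datetime import datetime

with DAG('dbt_cloud_async_pipeline',
         start_date=datetime(2024, 1, 1),
         schedule_interval='@daily',
         catchup=False) as dag:

    trigger_job = DbtCloudRunJobOperator(
        task_id='trigger_dbt_cloud_job',
        job_id=12345,  # Replace with your dbt Cloud job ID
        wait_for_termination=False,  # return immediately with the run ID
    )

    wait_for_job = DbtCloudJobRunSensor(
        task_id='wait_for_dbt_cloud_job',
        run_id=trigger_job.output,  # run ID pushed to XCom by the trigger task
        timeout=3600,  # give up after an hour
    )

    trigger_job >> wait_for_job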
Pros and Cons
Pros:
Leverage dbt Cloud's managed features.
Simple Airflow integration.
No need to manage dbt environments.
Cons:
Requires a dbt Cloud subscription.
Less control over individual model tasks in Airflow.
Best Practices and Tips
Use environment variables for secrets, never hardcode credentials.
Monitor your Airflow tasks and set up alerts for failures (see the sketch after this list).
Group models using Task Groups or split into multiple DAGs for very large projects.
Test locally before deploying to production.
Document your pipeline for future maintainers.
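For the alerting tip above, a lightweight option is a failure callback in default_args, so any failed dbt task triggers a notification. A minimal sketch with a hypothetical notify_failure function (swap in Slack, email, or whatever your team uses):

from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime


def notify_failure(context):
    # Hypothetical alert hook: log the failure and call your alerting tool of choice.
    task_id = context['task_instance'].task_id
    dag_id = context['dag'].dag_id
    print(f'ALERT: task {task_id} in DAG {dag_id} failed')


default_args = {
    'owner': 'airflow',
    'retries': 2,  # retry transient dbt failures before alerting
    'on_failure_callback': notify_failure,  # runs once the task has finally failed
}

with DAG('dbt_pipeline_with_alerts',
         default_args=default_args,
         start_date=datetime(2024, 1, 1),
         schedule_interval='@daily',
         catchup=False) as dag:

    dbt_run = BashOperator(
        task_id='dbt_run',
        bash_command='dbt run --project-dir /path/to/dbt_project --profiles-dir /path/to/profiles',
    )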
Running dbt pipelines with Apache Airflow is not only possible—it’s a best practice for many data teams. Whether you choose the simple BashOperator, the powerful Astronomer Cosmos, the managed dbt Cloud, or a Kubernetes-native approach, you can orchestrate robust, reliable data pipelines.
Start simple, then scale up as your needs grow. Happy data engineering!
If you found this guide helpful, please share it with your colleagues and subscribe for more data engineering tips!
FAQs
Should I use one big DAG or split into multiple DAGs?
For small projects, one DAG is fine. For large projects (hundreds of models), consider splitting by domain or using Task Groups.
Can I use Airflow to run only specific dbt models?
Yes! Use dbt's --select flag (--models in older dbt versions) in your dbt command, or configure Cosmos to select specific models, as in the sketch below.
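With the BashOperator approach you can pass dbt's node selection syntax straight through, and with Cosmos the equivalent lives in RenderConfig. A small sketch (the nightly tag is just an example):

from airflow import DAG
from airflow.operators.bash import BashOperator
from cosmos import RenderConfig
from datetime import datetime

with DAG('dbt_selected_models',
         start_date=datetime(2024, 1, 1),
         schedule_interval='@daily',
         catchup=False) as dag:

    # Run only models tagged 'nightly' plus their upstream parents.
    dbt_run_nightly = BashOperator(
        task_id='dbt_run_nightly',
        bash_command='dbt run --select +tag:nightly '
                     '--project-dir /path/to/dbt_project --profiles-dir /path/to/profiles',
    )

# With Cosmos, limit which models are rendered into tasks instead:
render_config = RenderConfig(select=['tag:nightly'])
# ...then pass render_config=render_config to DbtDag or DbtTaskGroup.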
What if I have dependency conflicts between dbt and Airflow?
Use a separate virtual environment for dbt (as with Cosmos) or run dbt in a Docker container.
Is it safe to commit profiles.yml?
Yes, if you use environment variables for secrets. Never commit plain-text credentials.
Code Repo + Video Tutorial
Congrats if you've reached this far. When I was writing this blog post, I realised that there weren't many tutorials online for setting up Airflow with dbt Core for local development. The ones that were available were either broken or used outdated versions of Airflow and dbt Core. So I decided to build a template repo that you can clone and start building with locally. Check the video and link below for instructions on how to set it up!