dbt Monorepo vs Multi-repo

A monorepo is a single code repository that houses multiple dbt models, macros, tests, and configurations across domains or functional areas. This setup is common in early-stage teams and centralized data functions.
A multi-repo architecture divides dbt work into multiple repositories, usually mapped to domains, business units, or teams. Each repo is self-contained with its own dbt_project.yml.
Industry Context
Monorepo Context | Multi-repo Context |
---|---|
Popular in startups or centralized data teams where agility and shared logic are key. | Common in data mesh, large-scale enterprises, or regulated industries requiring modular governance. |
Tools like dbt Cloud, GitHub Actions, or GitLab CI can easily run a single pipeline. | Often paired with enterprise CI/CD platforms (Azure DevOps, Jenkins, etc.) to manage isolated pipelines. |
Encourages tight cross-functional collaboration but may lead to coupling and permission complexity. | Promotes decentralized ownership (data product teams) in line with Data Mesh or DDD (Domain Driven Design). |
Structural Comparison
Monorepo:
/dbt
├── models/
│ ├── staging/
│ ├── marts/
│ ├── finance/
│ └── marketing/
├── macros/
├── seeds/
├── tests/
└── dbt_project.yml
Multi-repo:
- Repo: dbt-finance
- Repo: dbt-marketing
- Repo: dbt-core-utils (for shared macros, tests)
Each repo:
/dbt-project
├── models/
├── macros/
├── seeds/
└── dbt_project.yml
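As a rough illustration of what "self-contained" means here, each repo would carry its own project file along the following lines. The project name dbt_finance, the profile name, and the default materialization are hypothetical choices for the sketch, not settings taken from a real setup.

```yaml
# dbt_project.yml in the dbt-finance repo (all names are illustrative assumptions)
name: dbt_finance
version: "1.0.0"
config-version: 2

# Connection profile to use; "finance" is a hypothetical entry in profiles.yml
profile: finance

model-paths: ["models"]
macro-paths: ["macros"]
seed-paths: ["seeds"]

models:
  dbt_finance:
    # Default materialization for every model in this project
    +materialized: view
```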
Versioning, Releases, and Dependency Management
Monorepo
- Single versioning cycle: one Git tag/version for all domains.
- Challenge: hard to do independent releases; e.g., a marketing model change may trigger full regression testing.
- Best Practice: Use dbt selectors (--select tag:finance) to narrow build scope.
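For tag-based selection to work, the models need the tag in the first place. One way to do that, sketched below, is to tag whole folders in dbt_project.yml. The project name my_project is a placeholder, and the folder names assume the monorepo layout shown earlier.

```yaml
# dbt_project.yml (excerpt); "my_project" is a placeholder for the real project name
models:
  my_project:
    finance:
      # Every model under models/finance/ inherits this tag
      +tags: ["finance"]
    marketing:
      # Every model under models/marketing/ inherits this tag
      +tags: ["marketing"]
```

With this in place, dbt build --select tag:finance would build and test only the finance-tagged models.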
Multi-repo
- Independent lifecycle: each repo can be versioned, released, and deployed independently.
- Best Practice: Publish reusable logic (macros/tests) as dbt packages in a dbt-core-utils repo.
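A domain repo could then consume the shared package through its packages.yml. The sketch below assumes the utilities repo is hosted on GitHub under a hypothetical example-org organization and has a release tagged v1.2.0; both are placeholders.

```yaml
# packages.yml in dbt-finance (the URL and tag are hypothetical)
packages:
  # Shared macros and generic tests, pinned to a release tag of the utils repo
  - git: "https://github.com/example-org/dbt-core-utils.git"
    revision: v1.2.0
```

Running dbt deps installs the pinned revision, so each domain can upgrade the shared logic on its own schedule by bumping the revision.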
Security & Governance
Dimension | Monorepo | Multi-repo |
---|---|---|
Access Control | Requires branch protections and folder-level conventions. Git doesn’t natively support fine-grained access per folder. | Git-level permissions per repo. E.g., Finance team can't see Marketing's code unless explicitly granted. |
Audit Trail | All changes logged in one repo – may get noisy. | Easier to audit changes per domain or team. |
Best Practices
- Use CODEOWNERS in monorepos for folder-level review enforcement.
- Implement SonarQube, dbt-expectations, and pre-commit hooks in both for code quality.
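A minimal pre-commit setup might look like the sketch below. The choice of hooks is an assumption made for illustration, the rev values should be pinned to releases you have verified, and the sqlfluff hook assumes a .sqlfluff file in the repo that sets the SQL dialect.

```yaml
# .pre-commit-config.yaml (hook selection and rev values are assumptions)
repos:
  # Generic hygiene checks for YAML and whitespace
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0
    hooks:
      - id: check-yaml
      - id: end-of-file-fixer
      - id: trailing-whitespace
  # SQL linting; expects dialect settings in a .sqlfluff config file
  - repo: https://github.com/sqlfluff/sqlfluff
    rev: 3.0.7
    hooks:
      - id: sqlfluff-lint
```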
CI/CD & Automation
Monorepo
- Single CI/CD pipeline with path conditionals, e.g. run the finance job only when changes include models/finance/**.
- Use tools like dbt build --select path:models/finance/ and GitHub Actions matrix strategies.
- Best Practice: Cache the packages installed by dbt deps, and use state comparison against saved artifacts (state:modified) to speed up builds.
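The monorepo points above can be sketched as a single GitHub Actions workflow. This is an illustration under several assumptions: the workflow file name, the dbt-postgres adapter, and the ./prod-artifacts folder holding a production manifest.json are all hypothetical, and warehouse credentials and profiles.yml setup are omitted.

```yaml
# .github/workflows/finance-ci.yml (hypothetical file name)
name: finance-ci

on:
  pull_request:
    # Run only when finance models change
    paths:
      - "models/finance/**"

jobs:
  build-finance:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      # The adapter is an assumption; install the one matching your warehouse
      - run: pip install dbt-core dbt-postgres

      # Reuse installed packages until packages.yml changes
      - uses: actions/cache@v4
        id: dbt-packages-cache
        with:
          path: dbt_packages
          key: dbt-packages-${{ hashFiles('packages.yml') }}

      # Skip the install when the cache was restored
      - if: steps.dbt-packages-cache.outputs.cache-hit != 'true'
        run: dbt deps

      # Assumes an earlier step placed the production manifest.json in ./prod-artifacts
      - run: dbt build --select "state:modified+,path:models/finance" --state ./prod-artifacts
```

The comma in the selector is an intersection, so only finance models that changed, or sit downstream of a change, are built.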
Multi-repo
- Each repo has its own CI/CD pipeline.
- E.g., dbt-finance triggers on PR merge and deploys only Finance models.
- Best Practice: Use Git submodules or a private package registry (like GitHub Packages or Artifactory) for macro/test reuse.
Testing, Quality & Reuse
Factor | Monorepo | Multi-repo |
---|---|---|
Unit Tests | Cross-folder tests easier to write and run | Requires mocks or data contracts across repos |
Macros | Native sharing across domains | Must be published and versioned via packages |
Seeds & Fixtures | Centrally defined and accessed | Duplicated or abstracted via shared repo |
Best Practices
- Use dbt-utils and custom-utils in both setups.
- Document macro and model contracts using dbt docs + descriptions + meta tags.
- Use dbt artifacts and manifest.json for downstream system integrations (like lineage tools or alerting platforms).
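Macro documentation lives in a properties file next to the macros. The sketch below documents a hypothetical cents_to_dollars macro; the macro name, file path, and argument are invented for illustration.

```yaml
# macros/_macros.yml (the macro and its argument are hypothetical)
version: 2

macros:
  - name: cents_to_dollars
    description: Converts an integer cents column to a decimal dollar amount.
    arguments:
      - name: column_name
        type: string
        description: Name of the column holding the amount in cents.
```

These descriptions appear in the generated dbt docs site and are written to manifest.json, where downstream tools can read them.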
Team Collaboration & Ownership
Monorepo
- Encourages shared ownership, but needs discipline in naming, folder structure, and tagging.
- Difficult for teams to deploy autonomously.
- Good for centralized governance and fast prototyping.
Multi-repo
- Allows true federated data ownership (e.g., Data Mesh).
- Each domain can deploy independently.
- Ideal for regulated industries (banking, healthcare) needing domain isolation.
Best Practices
- Monorepo: Establish a project governance committee for standards and naming.
- Multi-repo: Define and publish interface contracts for shared models (e.g., customer_dim schema, field expectations).
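One way to make such an interface explicit is an enforced model contract, which is available in dbt 1.5 and later. The columns and data types below are illustrative guesses at what a customer_dim might contain, not a schema from the source.

```yaml
# models/marts/customer_dim.yml in the owning repo (columns are illustrative)
version: 2

models:
  - name: customer_dim
    description: One row per customer; published for use by other domains.
    config:
      contract:
        # The build fails if the model's output does not match the columns below
        enforced: true
    columns:
      - name: customer_id
        data_type: bigint
        constraints:
          - type: not_null
      - name: customer_name
        data_type: varchar
      - name: created_at
        data_type: timestamp
```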
Interdependencies & Data Mesh
Dimension | Monorepo | Multi-repo |
---|---|---|
Cross-domain dependencies | Simple: one DAG, with models referencing each other directly via ref() | Requires referencing published sources via source() across repos |
Data Contracts | Implicit within same repo | Must be explicit with schemas and documentation |
Lineage | Easily visualized in dbt docs | May need manifest stitching or 3rd party lineage tools (e.g., Alation, Collibra, Atlan) |
Best Practices
- In multi-repo: Declare key upstream tables (e.g., customer_dim) as sources in downstream repos, reference them with source(), and version their schemas.
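In a downstream repo such as dbt-marketing, the upstream table could be declared roughly as follows. The source name, database, schema, and file path are hypothetical and would need to match wherever dbt-finance actually materializes the table.

```yaml
# models/staging/finance/_sources.yml in dbt-marketing (names are hypothetical)
version: 2

sources:
  - name: finance
    description: Tables published by the dbt-finance project.
    database: analytics
    schema: finance_marts
    tables:
      - name: customer_dim
        columns:
          - name: customer_id
            # Guard the consumer against upstream key regressions
            tests:
              - not_null
              - unique
```

Downstream models would then select from {{ source('finance', 'customer_dim') }} instead of hard-coding the table name.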
Decision Framework: Key Considerations
Before deciding on a repository structure, evaluate the following aspects:
Workflow Dynamics
- Review Processes: Who is responsible for approving pull requests?
- Deployment Pipelines: How are code changes promoted across environments (e.g., dev → QA → prod)?
- Access Controls: Who has access to different environments and data objects?
Team Structure and Collaboration
- Team Interactions: Do teams collaborate closely or operate independently?
- Coding Standards: Are there unified or varied coding styles and review processes?
- Data Source Usage: Do teams share data sources, or are they domain-specific?
- Data Ownership: Are there clear boundaries regarding data ownership and consumption?
When to Choose What?
Choose Monorepo If:
- Small or medium team size
- High collaboration and fast prototyping
- Centralized data modeling and governance
- No strict domain boundaries or compliance barriers
- Limited CI/CD complexity
Choose Multi-repo If:
- Large teams with domain-specific ownership
- Independent deployment cycles needed
- Regulatory/permission isolation is critical
- You're adopting Data Mesh or DDD principles
- Need to scale modular governance and CI/CD
Final Recommendation
- Start Simple: For small teams or organizations new to dbt, a monorepo offers simplicity and ease of management.
- Assess Growth: As the organization scales, monitor for signs that a monorepo is becoming a bottleneck, such as long build times or complex merge conflicts.
- Plan for Transition: If transitioning to a multi-repo setup, plan meticulously to manage dependencies and maintain consistency.
If you're scaling or pursuing data mesh, or have stringent governance needs, go multi-repo and invest in:
- Shared macro/test packaging
- Interface documentation
- Cross-repo dependency tooling
References
How to Configure Your dbt Repository (One or Many)? | dbt Developer Blog