ETL with dbt for a small data team — practical patterns
A 1-3 person data team can build maintainable analytics with dbt and avoid the typical mess of ad-hoc SQL. Practical patterns for project structure, testing, documentation, and CI without over-engineering.
A 1-3 person data team facing ad-hoc SQL chaos has two paths: keep adding queries until no one understands the system, or adopt dbt. dbt is overkill for trivial use cases but a clear win when raw SQL queries grow past 30-50 and analysts complain about untrustworthy numbers.
This is what "using dbt well" looks like for a small team in 2026 — not the enterprise patterns from dbt's docs, the boutique reality.
Project structure
Three layers, mapped to folders:
models/
staging/ # raw sources cleaned (stg_*.sql)
intermediate/ # business logic (int_*.sql)
marts/ # final tables for BI (fct_, dim_, agg_*.sql)
sources.yml # external source definitions
seeds/ # static reference data
snapshots/ # SCD2 tracking
macros/ # reusable SQL
tests/ # custom tests
analyses/ # ad-hoc analysis (not built)
This three-layer split (staging → intermediate → marts) is the most important architectural decision. Don't skip it.
Staging models
One staging model per source table. Rename columns, cast types, handle NULLs. No business logic.
-- models/staging/stg_orders.sql
select
order_id::int as order_id,
customer_id::int as customer_id,
order_total::numeric(12,2) as order_total,
order_status::text as order_status,
order_placed_at::timestamp as placed_at,
nullif(promo_code, '') as promo_code
from {{ source('shop', 'orders') }}
Staging models are 1-to-1 with sources. Nothing fancy. They make downstream models clean.
Intermediate models
Reusable business logic. Not exposed to BI directly. Examples:
- "int_orders_with_status" — joins orders with status history.
- "int_customers_lifetime_metrics" — aggregates per customer.
These are used by multiple mart models. Avoid duplicating logic.
Mart models
Final tables for BI. Fact and dimension tables. Aggregated tables for common dashboards.
-- models/marts/fct_orders.sql
select
o.order_id,
o.placed_at,
o.order_total,
c.customer_segment,
p.promo_discount
from {{ ref('stg_orders') }} o
left join {{ ref('dim_customers') }} c
on o.customer_id = c.customer_id
left join {{ ref('int_promos') }} p
on o.promo_code = p.promo_code
Testing
dbt's built-in tests cover most needs:
- not_null — catch missing required fields.
- unique — catch duplicate keys.
- accepted_values — catch bad enums.
- relationships — catch broken foreign keys.
# schema.yml
models:
- name: fct_orders
columns:
- name: order_id
tests:
- unique
- not_null
- name: customer_id
tests:
- relationships:
to: ref('dim_customers')
field: customer_id
Run tests in CI on every PR. Block merges if they fail.
Documentation
Document models and columns inline. dbt builds searchable docs.
- name: fct_orders
description: One row per order. Source of truth for revenue.
columns:
- name: order_total
description: Gross order value before refunds, in EUR.
Future analysts thank you in 6 months when looking for which table represents what.
Snapshots for slowly-changing dimensions
Customer status changes over time. To track history:
-- snapshots/customers_snapshot.sql
{% snapshot customers_snapshot %}
{{
config(
target_schema='snapshots',
unique_key='customer_id',
strategy='timestamp',
updated_at='updated_at',
)
}}
select * from {{ source('shop', 'customers') }}
{% endsnapshot %}
Run nightly. Now you can query "what was this customer's status on 2026-03-15."
CI/CD
Simple but effective:
- On PR: run dbt parse, dbt run --select state:modified+ on staging schema, dbt test.
- On merge to main: deploy models to production schema.
- Scheduled (daily): dbt run + dbt test on production.
- Alerts to Slack on test failures.
What not to do
- Skip staging. Marts hitting sources directly become unmaintainable.
- Use views everywhere. Tables are faster for marts. Views okay for staging.
- Add Jinja everywhere. Macros are powerful and easily abused.
- Skip tests. Untested models silently break.
- One giant model. Break into smaller ones. dbt is built for this.
When dbt is overkill
If you have:
- Under 20 models total.
- Single analyst with full mental model.
- No CI/CD needs.
- Simple linear pipeline.
Then plain SQL files in version control plus a cron job may be enough.
Verdict
dbt is right for any data team with 30+ models, multiple stakeholders, or analytics on production data. Three-layer structure (staging, intermediate, marts), built-in tests on every model, simple CI, snapshots for history. Small teams shouldn't adopt enterprise patterns — just the basics, well done.