ETL with dbt for a small data team — practical patterns

A 1-3 person data team can build maintainable analytics with dbt and avoid the typical mess of ad-hoc SQL. Practical patterns for project structure, testing, documentation, and CI without over-engineering.

A 1-3 person data team facing ad-hoc SQL chaos has two paths: keep adding queries until no one understands the system, or adopt dbt. dbt is overkill for trivial use cases but a clear win when raw SQL queries grow past 30-50 and analysts complain about untrustworthy numbers.

This is what "using dbt well" looks like for a small team in 2026 — not the enterprise patterns from dbt's docs, the boutique reality.

Project structure

Three layers, mapped to folders:

models/
  staging/        # raw sources cleaned (stg_*.sql)
  intermediate/   # business logic (int_*.sql)
  marts/          # final tables for BI (fct_, dim_, agg_*.sql)
  sources.yml     # external source definitions
seeds/            # static reference data
snapshots/        # SCD2 tracking
macros/           # reusable SQL
tests/            # custom tests
analyses/         # ad-hoc analysis (not built)

This three-layer split (staging → intermediate → marts) is the most important architectural decision. Don't skip it.

Staging models

One staging model per source table. Rename columns, cast types, handle NULLs. No business logic.

-- models/staging/stg_orders.sql
select
    order_id::int as order_id,
    customer_id::int as customer_id,
    order_total::numeric(12,2) as order_total,
    order_status::text as order_status,
    order_placed_at::timestamp as placed_at,
    nullif(promo_code, '') as promo_code
from {{ source('shop', 'orders') }}

Staging models are 1-to-1 with sources. Nothing fancy. They make downstream models clean.

Intermediate models

Reusable business logic. Not exposed to BI directly. Examples:

  • "int_orders_with_status" — joins orders with status history.
  • "int_customers_lifetime_metrics" — aggregates per customer.

These are used by multiple mart models. Avoid duplicating logic.

Mart models

Final tables for BI. Fact and dimension tables. Aggregated tables for common dashboards.

-- models/marts/fct_orders.sql
select
    o.order_id,
    o.placed_at,
    o.order_total,
    c.customer_segment,
    p.promo_discount
from {{ ref('stg_orders') }} o
left join {{ ref('dim_customers') }} c
  on o.customer_id = c.customer_id
left join {{ ref('int_promos') }} p
  on o.promo_code = p.promo_code

Testing

dbt's built-in tests cover most needs:

  • not_null — catch missing required fields.
  • unique — catch duplicate keys.
  • accepted_values — catch bad enums.
  • relationships — catch broken foreign keys.
# schema.yml
models:
  - name: fct_orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: customer_id
        tests:
          - relationships:
              to: ref('dim_customers')
              field: customer_id

Run tests in CI on every PR. Block merges if they fail.

Documentation

Document models and columns inline. dbt builds searchable docs.

  - name: fct_orders
    description: One row per order. Source of truth for revenue.
    columns:
      - name: order_total
        description: Gross order value before refunds, in EUR.

Future analysts thank you in 6 months when looking for which table represents what.

Snapshots for slowly-changing dimensions

Customer status changes over time. To track history:

-- snapshots/customers_snapshot.sql
{% snapshot customers_snapshot %}
{{
  config(
    target_schema='snapshots',
    unique_key='customer_id',
    strategy='timestamp',
    updated_at='updated_at',
  )
}}
select * from {{ source('shop', 'customers') }}
{% endsnapshot %}

Run nightly. Now you can query "what was this customer's status on 2026-03-15."

CI/CD

Simple but effective:

  • On PR: run dbt parse, dbt run --select state:modified+ on staging schema, dbt test.
  • On merge to main: deploy models to production schema.
  • Scheduled (daily): dbt run + dbt test on production.
  • Alerts to Slack on test failures.

What not to do

  • Skip staging. Marts hitting sources directly become unmaintainable.
  • Use views everywhere. Tables are faster for marts. Views okay for staging.
  • Add Jinja everywhere. Macros are powerful and easily abused.
  • Skip tests. Untested models silently break.
  • One giant model. Break into smaller ones. dbt is built for this.

When dbt is overkill

If you have:

  • Under 20 models total.
  • Single analyst with full mental model.
  • No CI/CD needs.
  • Simple linear pipeline.

Then plain SQL files in version control plus a cron job may be enough.

Verdict

dbt is right for any data team with 30+ models, multiple stakeholders, or analytics on production data. Three-layer structure (staging, intermediate, marts), built-in tests on every model, simple CI, snapshots for history. Small teams shouldn't adopt enterprise patterns — just the basics, well done.

Learn more about our competence
Web development, AI, automation — what we build and how.