#data #dbt #ETL #analytics #small team

ETL with dbt for a small data team — practical patterns

A 1-3 person data team can build maintainable analytics with dbt and avoid the typical mess of ad-hoc SQL. Practical patterns for project structure, testing, documentation, and CI without over-engineering.

Jun 8, 2026

A 1-3 person data team facing ad-hoc SQL chaos has two paths: keep adding queries until no one understands the system, or adopt dbt. dbt is overkill for trivial use cases but a clear win when raw SQL queries grow past 30-50 and analysts complain about untrustworthy numbers.

This is what "using dbt well" looks like for a small team in 2026 — not the enterprise patterns from dbt's docs, the boutique reality.

Project structure

Three layers, mapped to folders:

models/
  staging/        # raw sources cleaned (stg_*.sql)
  intermediate/   # business logic (int_*.sql)
  marts/          # final tables for BI (fct_, dim_, agg_*.sql)
  sources.yml     # external source definitions
seeds/            # static reference data
snapshots/        # SCD2 tracking
macros/           # reusable SQL
tests/            # custom tests
analyses/         # ad-hoc analysis (not built)

This three-layer split (staging → intermediate → marts) is the most important architectural decision. Don't skip it.

Staging models

One staging model per source table. Rename columns, cast types, handle NULLs. No business logic.

-- models/staging/stg_orders.sql
select
    order_id::int as order_id,
    customer_id::int as customer_id,
    order_total::numeric(12,2) as order_total,
    order_status::text as order_status,
    order_placed_at::timestamp as placed_at,
    nullif(promo_code, '') as promo_code
from {{ source('shop', 'orders') }}

Staging models are 1-to-1 with sources. Nothing fancy. They make downstream models clean.

Intermediate models

Reusable business logic. Not exposed to BI directly. Examples:

"int_orders_with_status" — joins orders with status history.
"int_customers_lifetime_metrics" — aggregates per customer.

These are used by multiple mart models. Avoid duplicating logic.

Mart models

Final tables for BI. Fact and dimension tables. Aggregated tables for common dashboards.

-- models/marts/fct_orders.sql
select
    o.order_id,
    o.placed_at,
    o.order_total,
    c.customer_segment,
    p.promo_discount
from {{ ref('stg_orders') }} o
left join {{ ref('dim_customers') }} c
  on o.customer_id = c.customer_id
left join {{ ref('int_promos') }} p
  on o.promo_code = p.promo_code

Testing

dbt's built-in tests cover most needs:

not_null — catch missing required fields.
unique — catch duplicate keys.
accepted_values — catch bad enums.
relationships — catch broken foreign keys.

# schema.yml
models:
  - name: fct_orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: customer_id
        tests:
          - relationships:
              to: ref('dim_customers')
              field: customer_id

Run tests in CI on every PR. Block merges if they fail.

Documentation

Document models and columns inline. dbt builds searchable docs.

  - name: fct_orders
    description: One row per order. Source of truth for revenue.
    columns:
      - name: order_total
        description: Gross order value before refunds, in EUR.

Future analysts thank you in 6 months when looking for which table represents what.

Snapshots for slowly-changing dimensions

Customer status changes over time. To track history:

-- snapshots/customers_snapshot.sql
{% snapshot customers_snapshot %}
{{
  config(
    target_schema='snapshots',
    unique_key='customer_id',
    strategy='timestamp',
    updated_at='updated_at',
  )
}}
select * from {{ source('shop', 'customers') }}
{% endsnapshot %}

Run nightly. Now you can query "what was this customer's status on 2026-03-15."

CI/CD

Simple but effective:

On PR: run dbt parse, dbt run --select state:modified+ on staging schema, dbt test.
On merge to main: deploy models to production schema.
Scheduled (daily): dbt run + dbt test on production.
Alerts to Slack on test failures.

What not to do

Skip staging. Marts hitting sources directly become unmaintainable.
Use views everywhere. Tables are faster for marts. Views okay for staging.
Add Jinja everywhere. Macros are powerful and easily abused.
Skip tests. Untested models silently break.
One giant model. Break into smaller ones. dbt is built for this.

When dbt is overkill

If you have:

Under 20 models total.
Single analyst with full mental model.
No CI/CD needs.
Simple linear pipeline.

Then plain SQL files in version control plus a cron job may be enough.

Verdict

dbt is right for any data team with 30+ models, multiple stakeholders, or analytics on production data. Three-layer structure (staging, intermediate, marts), built-in tests on every model, simple CI, snapshots for history. Small teams shouldn't adopt enterprise patterns — just the basics, well done.

Our web development services

Web development, AI, automation — what we build and how.