Observability for a 5-person team — Sentry, Grafana, plain logs

Enterprise observability stacks (Datadog, New Relic, full OpenTelemetry pipelines) cost $30K+/year and require dedicated ownership. Small teams can build effective observability for roughly $150-320/month with Sentry, Grafana Cloud, and structured logs.

A 5-engineer team running production needs observability. Datadog or New Relic would solve it, at $30K-100K/year plus ongoing tuning. A lighter stack covers about 90% of needs for roughly $150-320/month and requires no dedicated owner.

The three pillars, applied small

1. Errors — Sentry

Sentry captures application errors, groups them by signature, and sends alerts. For a 5-person team:

  • Free tier covers ~5K errors/month — enough for small SaaS.
  • Team plan ($26/user/month) scales without surprises.
  • Integrations with everything (Slack, Linear, GitHub).
  • Release tracking ties errors to deployments.
  • Source maps for client-side errors.

Setup: one SDK install, one DSN per environment. Done in 30 minutes.
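
A minimal sketch of the "one DSN per environment" setup in Python. It assumes the DSN, environment name, and release are supplied through environment variables called SENTRY_DSN, APP_ENV, and APP_RELEASE; those names and both helper functions are illustrative choices for this sketch, not part of Sentry. The dsn, environment, and release options are passed straight to sentry_sdk.init.

```python
import os


def build_sentry_options(environ):
    """Return keyword arguments for sentry_sdk.init, or None if no DSN is set.

    SENTRY_DSN, APP_ENV and APP_RELEASE are assumed variable names for this
    sketch; use whatever your deployment already provides.
    """
    dsn = environ.get("SENTRY_DSN", "").strip()
    if not dsn:
        # No DSN configured (for example local development): leave Sentry off.
        return None
    options = {
        "dsn": dsn,
        "environment": environ.get("APP_ENV", "development"),
    }
    release = environ.get("APP_RELEASE")
    if release:
        # Setting a release lets Sentry tie errors to a deployment.
        options["release"] = release
    return options


def init_sentry(environ=None):
    """Initialise Sentry if a DSN is configured; return True when enabled."""
    options = build_sentry_options(os.environ if environ is None else environ)
    if options is None:
        return False
    import sentry_sdk  # third-party: pip install sentry-sdk

    sentry_sdk.init(**options)
    return True
```

Calling init_sentry() once at process start is usually enough. Keeping the option building separate from the SDK call makes the per-environment behaviour easy to check without sending anything to Sentry.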

2. Metrics and dashboards — Grafana Cloud

Free tier includes 10K metric series, 50GB logs, 50GB traces. That is generous for a small SaaS.

Use:

  • Prometheus exporters for app and infrastructure metrics.
  • Loki for log aggregation.
  • Tempo for traces (when needed).
  • Built-in dashboards for common services.

Setup: 1-2 days for instrumentation + dashboards.

3. Logs — structured JSON to Loki or BetterStack

Every log line as JSON with consistent fields:

{
  "level": "error",
  "time": "2026-06-12T15:23:01Z",
  "service": "api",
  "event": "payment_failed",
  "user_id": "u-42",
  "order_id": "o-9876",
  "error": "insufficient_funds",
  "trace_id": "abc123"
}

Logs in this shape are searchable and parseable, and trace_id lets you correlate a log line with traces and with other log lines from the same request.
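
One way to produce lines in this shape from Python's standard logging module is sketched below. The JsonFormatter class, the get_logger helper, and the convention of passing extra fields under a "fields" key are assumptions made for this example, not an established API.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def __init__(self, service):
        super().__init__()
        self.service = service

    def format(self, record):
        timestamp = datetime.fromtimestamp(record.created, tz=timezone.utc)
        payload = {
            "level": record.levelname.lower(),
            "time": timestamp.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "service": self.service,
            "event": record.getMessage(),
        }
        # Fields passed via extra={"fields": {...}} are merged in as-is.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


def get_logger(service):
    """Return a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(service)
    if not logger.handlers:
        handler = logging.StreamHandler()  # stderr by default
        handler.setFormatter(JsonFormatter(service))
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
        logger.propagate = False
    return logger


if __name__ == "__main__":
    log = get_logger("api")
    log.error(
        "payment_failed",
        extra={"fields": {"user_id": "u-42", "order_id": "o-9876",
                          "error": "insufficient_funds", "trace_id": "abc123"}},
    )
```

Using the message as the event name keeps call sites short; the consistent top-level fields are what make the lines easy to query in Loki or BetterStack.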

The minimum useful set of metrics

  • Request rate per endpoint.
  • Error rate per endpoint (4xx separated from 5xx).
  • Latency p50, p95, p99 per endpoint.
  • Database query time p95.
  • External API call latency + error rate per provider.
  • Background job queue depth and processing time.
  • Infrastructure — CPU, memory, disk I/O on each instance.

Resist the urge to instrument everything. Add metrics when they answer a specific question.
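
To make the first three metrics concrete, here is a small sketch that derives request rate, 4xx and 5xx rates, and latency percentiles per endpoint from raw request samples. In practice a Prometheus client library and Grafana would compute these for you; the function names and the (endpoint, status, latency_ms) tuple layout are assumptions for illustration, and the percentile uses the simple nearest-rank method.

```python
import math
from collections import defaultdict


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = math.ceil(pct * len(sorted_values) / 100)
    return sorted_values[max(rank, 1) - 1]


def summarize(requests, window_seconds):
    """Aggregate (endpoint, status, latency_ms) samples per endpoint."""
    by_endpoint = defaultdict(list)
    for endpoint, status, latency_ms in requests:
        by_endpoint[endpoint].append((status, latency_ms))

    summary = {}
    for endpoint, samples in by_endpoint.items():
        latencies = sorted(latency for _, latency in samples)
        total = len(samples)
        summary[endpoint] = {
            "rate_per_s": total / window_seconds,
            # 4xx (client errors) are kept apart from 5xx (server errors).
            "rate_4xx": sum(400 <= s < 500 for s, _ in samples) / total,
            "rate_5xx": sum(500 <= s < 600 for s, _ in samples) / total,
            "p50_ms": percentile(latencies, 50),
            "p95_ms": percentile(latencies, 95),
            "p99_ms": percentile(latencies, 99),
        }
    return summary
```

Separating 4xx from 5xx matters because a spike in client errors rarely needs the same response as a spike in server errors.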

The minimum useful set of dashboards

  • Overall health. Request rate, error rate, p95 latency across services.
  • Per-service deep dive. Same metrics broken down by endpoint.
  • External dependencies. Latency and error rates for third-party APIs.
  • Infrastructure. CPU/memory/disk on each instance.
  • Business metrics. Signups, conversions, revenue (per hour/day).

Five dashboards. Each one fits on a screen. Each has one purpose.

Alerts that don't burn out on-call

Alert only on issues that require immediate action:

  • Error rate >5% sustained for 5 minutes.
  • p95 latency >2x baseline for 5 minutes.
  • Disk usage >90%.
  • Database connection pool exhaustion.
  • Background queue not draining.
  • Payment processor down.

Keep page-worthy alerts to 5-15 in total. Anything else is a dashboard signal, not a page.
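
The "sustained for 5 minutes" condition is what keeps a single bad minute from paging anyone. Grafana and Prometheus alert rules express it as a pending duration on the rule; the Python below is only a sketch of the logic, assuming one sample per minute, and its function names are hypothetical.

```python
def sustained_breach(samples, threshold, minutes=5):
    """True if the last `minutes` per-minute samples all exceed `threshold`.

    `samples` holds one value per minute, oldest first.
    """
    if len(samples) < minutes:
        return False  # not enough data yet to call it sustained
    return all(value > threshold for value in samples[-minutes:])


def should_page(error_rates, p95_latencies_ms, baseline_p95_ms):
    """Apply the two rate-based rules from the list above.

    Returns the list of reasons to page; an empty list means stay quiet.
    """
    reasons = []
    if sustained_breach(error_rates, 0.05):
        reasons.append("error rate >5% for 5 minutes")
    if sustained_breach(p95_latencies_ms, 2 * baseline_p95_ms):
        reasons.append("p95 latency >2x baseline for 5 minutes")
    return reasons
```

A brief dip below the threshold resets the condition, which is the intended behaviour: the page fires only when the problem persists for the whole window.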

Tracing — when you need it

Tracing is powerful but expensive in time. Add it when:

  • You can't explain latency from metrics alone.
  • Multi-service requests are common.
  • Customer-reported issues need fine-grained debugging.

Skip it when:

  • You have one service.
  • Logs with trace_id correlation are enough.
  • Cost vs benefit doesn't pencil out.
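
Where logs with trace_id correlation are enough, the correlation itself takes little code. The sketch below stores a per-request trace id in a context variable and attaches it to every log record; start_request and TraceIdFilter are hypothetical names, and how the incoming id reaches your service (for example a request header) is left as an assumption.

```python
import contextvars
import logging
import uuid

# Holds the trace id for the request currently being handled.
trace_id_var = contextvars.ContextVar("trace_id", default="-")


def start_request(incoming_trace_id=None):
    """Reuse the caller's trace id if one was sent, otherwise create one."""
    trace_id = incoming_trace_id or uuid.uuid4().hex
    trace_id_var.set(trace_id)
    return trace_id


class TraceIdFilter(logging.Filter):
    """Attach the current trace id to every record passing through."""

    def filter(self, record):
        record.trace_id = trace_id_var.get()
        return True  # never drop the record, only annotate it
```

Add the filter to a handler with handler.addFilter(TraceIdFilter()) and include the trace_id attribute in the formatter output. Passing the same id on outgoing calls lets one search find every line belonging to a request.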

On-call rotation for small teams

  • 2-week rotation per engineer.
  • Weekly handoff Mondays.
  • Runbook for top 10 alert types — what to check, who to escalate to.
  • Postmortem after every page (lightweight, 1-page max).

Tools: PagerDuty free tier (3 users), incident.io free tier.

What to skip

  • Full OpenTelemetry pipeline. Overkill for a small team.
  • Dedicated monitoring engineer. Distributed responsibility works at this scale.
  • Enterprise APM ($30K+/year). Sentry + Grafana cover 90% for a few percent of the cost.
  • Synthetic monitoring at scale. 1-2 endpoint checks from Pingdom or BetterStack is enough.
  • SLI/SLO frameworks. Useful at scale. Premature for a small team.

Cost

For a 5-engineer team running typical SaaS:

  • Sentry Team plan: $130/month (5 users).
  • Grafana Cloud Free or Pro: $0-150/month.
  • PagerDuty free.
  • BetterStack uptime: $20-40/month.

Total: $150-320/month. Versus $2,500-8,000/month for Datadog at equivalent coverage.
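
The arithmetic behind that total, as a short check using only the figures listed above (the dictionary layout is just a convenience for this sketch):

```python
# Monthly cost ranges (low, high) in USD, taken from the list above.
stack = {
    "Sentry Team plan (5 users x $26)": (5 * 26, 5 * 26),
    "Grafana Cloud Free or Pro": (0, 150),
    "PagerDuty free tier": (0, 0),
    "BetterStack uptime": (20, 40),
}
datadog = (2500, 8000)

low = sum(lo for lo, _ in stack.values())
high = sum(hi for _, hi in stack.values())

print(f"Stack total: ${low}-{high}/month (${low * 12:,}-{high * 12:,}/year)")
print(f"Datadog: ${datadog[0] * 12:,}-{datadog[1] * 12:,}/year")
# Stack total: $150-320/month ($1,800-3,840/year)
# Datadog: $30,000-96,000/year
```

Comparing like ends of each range, the small stack comes to about 4-6% of the Datadog figure.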

Verdict

Small teams don't need enterprise observability. Sentry for errors, Grafana for metrics/dashboards, structured JSON logs to Loki, lean alerting. Setup in a week, $150-320/month, covers 90% of needs. Add tracing and SLOs when you've outgrown this — usually at 30+ engineers, not 5.
