Observability for a 5-person team — Sentry, Grafana, plain logs
Enterprise observability stacks (Datadog, New Relic, full OpenTelemetry pipelines) cost $30K+/year and require dedicated ownership. Small teams can build effective observability for roughly $150-320/month with Sentry, Grafana Cloud, and structured logs.
A 5-engineer team running production needs observability. Datadog or New Relic would solve it, at $30K-100K/year plus ongoing tuning. A lighter stack covers about 90% of needs for roughly $150-320/month and requires no dedicated owner.
The three pillars, applied small
1. Errors — Sentry
Sentry catches application errors, groups them by signature, and alerts on new or recurring ones. For a 5-person team:
- Free tier covers ~5K errors/month — enough for small SaaS.
- Team plan ($26/user/month) scales without surprises.
- Integrations with the tools a small team already uses (Slack, Linear, GitHub).
- Release tracking ties errors to deployments.
- Source maps for client-side errors.
Setup: one SDK install, one DSN per environment. Done in 30 minutes.
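A minimal sketch of that setup in Python, assuming the official sentry-sdk package. The environment variable names (SENTRY_DSN, APP_ENV, GIT_SHA) are illustrative choices for this example, not something Sentry requires. Passing a release is what lets Sentry tie errors to deployments.

```python
import os


def sentry_options(environ=os.environ):
    """Build keyword arguments for sentry_sdk.init() from environment variables.

    SENTRY_DSN, APP_ENV and GIT_SHA are hypothetical names chosen for this
    sketch; use whatever your deploy tooling already sets.
    """
    return {
        "dsn": environ.get("SENTRY_DSN", ""),
        "environment": environ.get("APP_ENV", "development"),
        "release": environ.get("GIT_SHA") or None,
    }


def init_sentry(environ=os.environ):
    """Initialise Sentry if a DSN is configured; return True when enabled."""
    options = sentry_options(environ)
    if not options["dsn"]:
        return False  # no DSN (e.g. local development): skip error reporting
    import sentry_sdk  # imported lazily so the app still starts without it

    sentry_sdk.init(**options)
    return True
```

Calling init_sentry() once at process start is usually enough; each environment gets its own SENTRY_DSN value through its deploy configuration.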
2. Metrics and dashboards — Grafana Cloud
Free tier includes 10K active metric series, 50GB logs, 50GB traces. Generous for a small SaaS.
Use:
- Prometheus exporters for app and infrastructure metrics.
- Loki for log aggregation.
- Tempo for traces (when needed).
- Built-in dashboards for common services.
Setup: 1-2 days for instrumentation + dashboards.
3. Logs — structured JSON to Loki or BetterStack
Every log line as JSON with consistent fields:
{
  "level": "error",
  "time": "2026-06-12T15:23:01Z",
  "service": "api",
  "event": "payment_failed",
  "user_id": "u-42",
  "order_id": "o-9876",
  "error": "insufficient_funds",
  "trace_id": "abc123"
}
Lines like this are searchable and parseable, and the trace_id lets you correlate them with traces and with every other log line from the same request.
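One way to produce lines in this shape from Python, using only the standard library. This is a sketch, not the only option: the JsonFormatter and get_logger names and the extra={"fields": ...} convention are assumptions made for this example, and a library such as structlog would work just as well.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def __init__(self, service):
        super().__init__()
        self.service = service

    def format(self, record):
        created = datetime.fromtimestamp(record.created, tz=timezone.utc)
        payload = {
            "level": record.levelname.lower(),
            "time": created.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "service": self.service,
            "event": record.getMessage(),
        }
        # Anything passed as extra={"fields": {...}} becomes top-level keys.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


def get_logger(service, stream=None):
    """Return a logger that writes JSON lines to `stream` (stderr by default)."""
    logger = logging.getLogger(service)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter(service))
    logger.handlers = [handler]  # replace any handlers from earlier calls
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = get_logger("api")
    log.error(
        "payment_failed",
        extra={"fields": {"user_id": "u-42", "order_id": "o-9876",
                          "error": "insufficient_funds", "trace_id": "abc123"}},
    )
```

Keeping the event name as the log message and everything else as fields makes it easy to filter by event in Loki without regex matching on free text.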
The minimum useful set of metrics
- Request rate per endpoint.
- Error rate per endpoint (4xx separated from 5xx).
- Latency p50, p95, p99 per endpoint.
- Database query time p95.
- External API call latency + error rate per provider.
- Background job queue depth and processing time.
- Infrastructure — CPU, memory, disk I/O on each instance.
Resist the urge to instrument everything. Add metrics when they answer a specific question.
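To make the first three items in the list concrete, here is a standard-library sketch of what "request rate, error rate, and latency percentiles per endpoint" means as arithmetic. In production a Prometheus client library's counters and histograms would do this for you; the EndpointStats class and its method names are invented for this illustration, and the percentile uses the simple nearest-rank definition.

```python
from collections import defaultdict


def percentile(values, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, avoiding float rounding surprises.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


class EndpointStats:
    """Collects (status, latency) samples per endpoint for one time window."""

    def __init__(self):
        self.samples = defaultdict(list)  # endpoint -> [(status, latency_ms)]

    def record(self, endpoint, status, latency_ms):
        self.samples[endpoint].append((status, latency_ms))

    def summary(self, endpoint, window_seconds):
        rows = self.samples.get(endpoint, [])
        if not rows:
            return None  # no traffic in this window
        latencies = [latency for _, latency in rows]
        total = len(rows)
        return {
            "requests_per_second": total / window_seconds,
            # 4xx (client mistakes) and 5xx (our failures) are kept separate.
            "rate_4xx": sum(400 <= status < 500 for status, _ in rows) / total,
            "rate_5xx": sum(status >= 500 for status, _ in rows) / total,
            "p50_ms": percentile(latencies, 50),
            "p95_ms": percentile(latencies, 95),
            "p99_ms": percentile(latencies, 99),
        }
```

Separating 4xx from 5xx matters because a spike in 404s rarely needs a page, while a spike in 500s usually does.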
The minimum useful set of dashboards
- Overall health. Request rate, error rate, p95 latency across services.
- Per-service deep dive. Same metrics broken down by endpoint.
- External dependencies. Latency and error rates for third-party APIs.
- Infrastructure. CPU/memory/disk on each instance.
- Business metrics. Signups, conversions, revenue (per hour/day).
Five dashboards. Each one fits on a screen. Each has one purpose.
Alerts that don't burn out on-call
Alert only on issues that require immediate action:
- Error rate >5% sustained for 5 minutes.
- p95 latency >2x baseline for 5 minutes.
- Disk usage >90%.
- Database connection pool exhaustion.
- Background queue not draining.
- Payment processor down.
Keep the total number of page-worthy alert rules between 5 and 15. Anything else is a dashboard signal, not a page.
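The "sustained for 5 minutes" part is what keeps a single bad minute from paging anyone. Prometheus and Grafana alert rules express it with a "for" duration; the sketch below shows the same logic in plain Python, assuming one sample per minute. The function names and the condition labels are hypothetical.

```python
def sustained(samples, threshold, minutes=5):
    """True if the last `minutes` per-minute samples all exceed `threshold`."""
    recent = samples[-minutes:]
    # Require a full window so a freshly started service cannot page early.
    return len(recent) == minutes and all(value > threshold for value in recent)


def should_page(error_rates, p95_latencies, p95_baseline):
    """Return the page-worthy conditions that are currently firing."""
    firing = []
    if sustained(error_rates, threshold=0.05):
        firing.append("error_rate_above_5pct")
    if sustained(p95_latencies, threshold=2 * p95_baseline):
        firing.append("p95_latency_above_2x_baseline")
    return firing
```

A single dip below the threshold inside the window resets the condition, which is the behaviour you want for noisy signals.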
Tracing — when you need it
Tracing is powerful but expensive in time. Add it when:
- You can't explain latency from metrics alone.
- Multi-service requests are common.
- Customer-reported issues need fine-grained debugging.
Skip it when:
- You have one service.
- Logs with trace_id correlation are enough.
- The cost outweighs the benefit.
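If you take the "logs with trace_id correlation" route, the main work is making sure every log line written during a request carries the same id. A minimal standard-library sketch, assuming the web framework calls start_request at the beginning of each request; the names trace_id_var, TraceIdFilter, and start_request are invented for this example.

```python
import contextvars
import logging
import uuid

# Holds the current request's trace_id; "-" means "outside any request".
trace_id_var = contextvars.ContextVar("trace_id", default="-")


class TraceIdFilter(logging.Filter):
    """Copy the current trace_id onto every log record."""

    def filter(self, record):
        record.trace_id = trace_id_var.get()
        return True  # never drop the record, only annotate it


def start_request(incoming_trace_id=None):
    """Reuse the caller's trace_id if one was sent, else mint a new one."""
    trace_id = incoming_trace_id or uuid.uuid4().hex
    trace_id_var.set(trace_id)
    return trace_id


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.addFilter(TraceIdFilter())
    handler.setFormatter(logging.Formatter("%(trace_id)s %(message)s"))
    logger = logging.getLogger("api")
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)

    start_request()
    logger.info("payment_failed")
```

Because contextvars are isolated per thread and per asyncio task, concurrent requests do not overwrite each other's id. Forwarding the same id on outgoing calls is what makes it useful across services.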
On-call rotation for small teams
- 2-week rotation per engineer.
- Weekly handoff Mondays.
- Runbook for top 10 alert types — what to check, who to escalate to.
- Postmortem after every page (lightweight, 1-page max).
Tools: PagerDuty free tier (3 users), incident.io free tier.
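If you would rather not depend on a tool for the schedule itself, the rotation is simple date arithmetic. The sketch below assumes two-week shifts that change hands on a Monday; the on_call function and the engineer names are placeholders.

```python
from datetime import date


def on_call(day, engineers, rotation_start, shift_weeks=2):
    """Return who is on call on `day`.

    `rotation_start` must be a Monday so every handoff lands on a Monday.
    """
    if rotation_start.weekday() != 0:
        raise ValueError("rotation_start must be a Monday")
    if day < rotation_start:
        raise ValueError("day is before the rotation started")
    shift_index = (day - rotation_start).days // (7 * shift_weeks)
    return engineers[shift_index % len(engineers)]


if __name__ == "__main__":
    team = ["ana", "ben", "chloe", "dev", "eli"]  # placeholder names
    print(on_call(date.today(), team, rotation_start=date(2026, 1, 5)))
```

With five engineers and two-week shifts, each person is on call once every ten weeks.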
What to skip
- Full OpenTelemetry pipeline. Overkill for a small team.
- Dedicated monitoring engineer. Distributed responsibility works at this scale.
- Enterprise APM ($30K+/year). Sentry + Grafana cover 90% of it for a small fraction of the cost.
- Synthetic monitoring at scale. 1-2 endpoint checks from Pingdom or BetterStack are enough.
- SLI/SLO frameworks. Useful at scale. Premature for a small team.
Cost
For a 5-engineer team running typical SaaS:
- Sentry Team plan: $130/month (5 users).
- Grafana Cloud Free or Pro: $0-150/month.
- PagerDuty free.
- BetterStack uptime: $20-40/month.
Total: $150-320/month. Versus $2,500-8,000/month for Datadog at equivalent coverage.
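The arithmetic behind that total, as a short script you can adapt with your own quotes. All figures are the ones listed above; the dictionary keys are just labels.

```python
# Monthly cost ranges in USD as (low, high), taken from the list above.
MONTHLY_COSTS = {
    "sentry_team": (130, 130),       # 5 users
    "grafana_cloud": (0, 150),       # Free or Pro
    "pagerduty": (0, 0),             # free tier
    "betterstack_uptime": (20, 40),
}
DATADOG_MONTHLY = (2500, 8000)       # comparison range quoted above

low = sum(cost_low for cost_low, _ in MONTHLY_COSTS.values())
high = sum(cost_high for _, cost_high in MONTHLY_COSTS.values())

print(f"Lean stack: ${low}-{high}/month, ${low * 12}-{high * 12}/year")
print(f"Datadog:    ${DATADOG_MONTHLY[0] * 12}-{DATADOG_MONTHLY[1] * 12}/year")
# Share of the Datadog bill, from best case to worst case.
print(f"Share: {low / DATADOG_MONTHLY[1]:.1%} to {high / DATADOG_MONTHLY[0]:.1%}")
```

The low end of the range assumes Grafana Cloud's free tier is enough; the high end assumes the Pro plan and the larger BetterStack figure.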
Verdict
Small teams don't need enterprise observability. Sentry for errors, Grafana for metrics/dashboards, structured JSON logs to Loki, lean alerting. Setup in a week, $150-320/month, covers 90% of needs. Add tracing and SLOs when you've outgrown this — usually at 30+ engineers, not 5.