Serverless cost traps and how to avoid them
Serverless promises pay-only-for-what-you-use, and the billing is brutally simple in principle: pay for invocations and duration. In practice, the bill can be 5-50x what was estimated. The patterns that cause this are predictable and avoidable if you know what to look for, but most teams discover them only after the credit card statement arrives.
Trap 1: idle but allocated
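You provisioned 3GB of memory for a function that uses 200MB during a 100ms burst. AWS bills duration at the configured memory size, so you pay for the full 3GB × duration regardless of what the function actually uses.
Fix: profile memory usage and provision accurately. Often you can drop from 3GB to 512MB. Re-measure duration afterwards: Lambda allocates CPU in proportion to memory, so a smaller setting can run slower.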
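To make the arithmetic concrete, here is a minimal sketch of the duration charge at two memory settings. The per-GB-second price and the monthly invocation count are assumptions for illustration, not figures from this article; check the current rate for your region and architecture. The sketch also holds duration constant, which may not be true after you lower memory.

```python
# Assumed price per GB-second; verify against current pricing for your region.
PRICE_PER_GB_SECOND = 0.0000166667


def monthly_compute_cost(memory_mb, duration_ms, invocations):
    """Duration charge for one month: configured GB x seconds x invocations x price."""
    configured_gb = memory_mb / 1024
    seconds = duration_ms / 1000
    return configured_gb * seconds * invocations * PRICE_PER_GB_SECOND


# Hypothetical workload: 50 million invocations per month at 100ms each.
INVOCATIONS = 50_000_000
oversized = monthly_compute_cost(3072, 100, INVOCATIONS)
right_sized = monthly_compute_cost(512, 100, INVOCATIONS)

print(f"3GB:   ${oversized:,.2f}/month")
print(f"512MB: ${right_sized:,.2f}/month")
```

Under these assumptions the 3GB configuration costs six times as much as the 512MB one for identical work, because the charge scales linearly with configured memory.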
Trap 2: cold start overhead
Provisioned concurrency keeps functions warm to avoid cold starts, but it is billed for as long as it is enabled, even when no invocations happen. It is easy to leave running and forget.
Fix: enable provisioned concurrency only for production-critical paths. Set autoscaling schedules to ramp down off-peak.
Trap 3: log volume
Functions log per invocation. With 100K invocations/hour and verbose JSON logs, you generate gigabytes/day. CloudWatch Logs charges for ingest and storage.
Real numbers: structured logs at 5KB per invocation × 100K invocations/hour = 500MB/hour, or 12GB/day. At roughly $0.50/GB, CloudWatch ingest alone is ~$6/day, or about $180/month, before storage.
Fix: raise the log level in production, sample debug logs, and route archives to cheaper storage (S3 Glacier, for example).
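One way to sample low-severity logs is a filter from Python's standard `logging` module. This is a sketch, not a recommendation of specific rates: the `SamplingFilter` class name and the 1% rate in the usage note are hypothetical. Warnings and errors always pass; everything below that level is kept with the given probability.

```python
import logging
import random
from typing import Optional


class SamplingFilter(logging.Filter):
    """Keep every WARNING-or-higher record; keep lower levels at `sample_rate`."""

    def __init__(self, sample_rate: float, rng: Optional[random.Random] = None):
        super().__init__()
        self.sample_rate = sample_rate
        self.rng = rng or random.Random()

    def filter(self, record: logging.LogRecord) -> bool:
        if record.levelno >= logging.WARNING:
            return True  # never drop warnings or errors
        # random() is in [0.0, 1.0), so rate 0.0 drops all and 1.0 keeps all.
        return self.rng.random() < self.sample_rate
```

Attach it to a handler, for example `handler.addFilter(SamplingFilter(0.01))`, so it applies to every record that handler would emit.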
Trap 4: retry storms
A downstream service is slow. Functions time out. They retry. Now you pay for 3x the invocations for the same work. The extra load makes the downstream service worse, which causes more retries.
Cost compounds. Latency too.
Fix: exponential backoff with jitter, circuit breakers, dead-letter queues for failed messages.
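A minimal sketch of exponential backoff with full jitter follows. The function name `call_with_retries` and the default values (4 attempts, 0.2s base, 10s cap) are illustrative assumptions. Circuit breakers and dead-letter queues are not shown; the final `raise` is the point where a failed message would be handed off.

```python
import random
import time


def call_with_retries(fn, max_attempts=4, base=0.2, cap=10.0, sleep=time.sleep):
    """Call fn(); on failure wait a random delay that grows exponentially.

    Full jitter: each delay is drawn uniformly from [0, min(cap, base * 2**attempt)],
    which spreads retries out so that clients do not hammer the service in lockstep.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller or a dead-letter queue handle it
            sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
```

Bounding `max_attempts` matters as much as the delay: it caps the invocation multiplier that a slow downstream service can impose.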
Trap 5: infinite recursion
Function A writes to S3. S3 triggers function A again. Each invocation triggers another. Within minutes, millions of invocations.
Real story: $30K AWS bill in 4 hours from this pattern.
Fix: explicit guards. Never let a function trigger itself directly or transitively without conditions. Set CloudWatch alarms on invocation rate.
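As one example of an explicit guard, the sketch below assumes the function writes its results under a dedicated prefix (`processed/` is a hypothetical name) and ignores any S3 event for a key under that prefix. Writing output to a separate bucket, or restricting the trigger with a prefix filter, gives stronger protection; this in-code check is a second line of defense.

```python
# Hypothetical prefix this function writes its own output under.
OUTPUT_PREFIX = "processed/"


def keys_to_process(event):
    """Return object keys from an S3 event, skipping this function's own output."""
    keys = []
    for record in event.get("Records", []):
        key = record["s3"]["object"]["key"]
        if key.startswith(OUTPUT_PREFIX):
            continue  # our own write: processing it would re-trigger the loop
        keys.append(key)
    return keys
```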
Trap 6: distributed tracing overhead
Tracing every span sends data to AWS X-Ray, Datadog, or similar. At high throughput, tracing costs can match application costs.
Fix: sampling. A 1% sample rate usually preserves the ability to debug while cutting trace volume, and most of the tracing cost, by about 99%.
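Tracing tools generally ship their own sampling controls, and those are the first thing to reach for. To illustrate the idea itself, here is a sketch of deterministic head sampling: hashing the trace ID means every service makes the same keep-or-drop decision for a given trace, so sampled traces stay complete. The function name `should_sample` is hypothetical.

```python
import hashlib


def should_sample(trace_id: str, rate: float) -> bool:
    """Keep roughly `rate` of traces, with a stable decision per trace ID."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate
```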
Trap 7: warm pool overprovisioning
You set provisioned concurrency to 100 to handle traffic spikes. Spikes don't happen. You're paying for 100 warm functions doing nothing.
Fix: autoscale provisioned concurrency based on metrics. Or accept some cold starts for less-critical paths.
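A rough estimate shows what a schedule can save. The sketch compares 100 warm instances around the clock with a schedule that drops to 10 off-peak. The provisioned-concurrency price, the 1GB memory size, and the 10-hour peak window are all assumptions for illustration; only the figure of 100 comes from the example above. Per-request and duration charges are left out.

```python
# Assumed price per GB-second of provisioned concurrency; verify for your region.
PC_PRICE_PER_GB_SECOND = 0.0000041667


def provisioned_cost(schedule, memory_gb, days=30):
    """Cost of keeping instances warm.

    schedule: list of (concurrency, hours_per_day) pairs covering the day.
    """
    gb_seconds = sum(c * h * 3600 for c, h in schedule) * memory_gb * days
    return gb_seconds * PC_PRICE_PER_GB_SECOND


always_on = provisioned_cost([(100, 24)], memory_gb=1.0)
scheduled = provisioned_cost([(100, 10), (10, 14)], memory_gb=1.0)

print(f"always on: ${always_on:,.2f}/month")
print(f"scheduled: ${scheduled:,.2f}/month")
```

Under these assumptions the scheduled configuration costs a little under half of the always-on one, before counting any traffic at all.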
Trap 8: VPC connection cost
Functions in a VPC reach the internet through a NAT Gateway, which charges per GB processed on top of an hourly fee. A high-throughput function fetching API data through NAT can rack up serious costs.
Fix: use VPC endpoints for AWS services, avoid NAT for high-traffic outbound calls, and run Lambda outside the VPC when possible.
Trap 9: Step Functions transitions
In Standard workflows, each state transition is billable. A 50-step workflow × 100K executions = 5M state transitions, which gets pricey at scale.
Fix: use Express workflows for high-frequency cases. Or batch work into fewer steps.
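The transition count translates to cost as follows. The price per 1,000 transitions is an assumption; check current Standard workflow pricing. The 10-step variant is a hypothetical illustration of batching work into fewer steps.

```python
# Assumed Standard workflow price per 1,000 state transitions; verify for your region.
PRICE_PER_1000_TRANSITIONS = 0.025


def transition_cost(steps_per_execution, executions):
    """Charge for state transitions: steps x executions, billed per thousand."""
    transitions = steps_per_execution * executions
    return transitions / 1000 * PRICE_PER_1000_TRANSITIONS


original = transition_cost(50, 100_000)  # 5M transitions, as in the example above
batched = transition_cost(10, 100_000)   # hypothetical: same work in 10 steps

print(f"50 steps: ${original:,.2f}")
print(f"10 steps: ${batched:,.2f}")
```

Cost is linear in the number of steps, so collapsing five small states into one cuts this line item by the same factor.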
Trap 10: dev/test environments
Multiple environments multiply costs. Dev has 10K test invocations/day. Staging has integration tests running constantly. None cleaned up.
Fix: monthly review of all environments. Aggressive autoscaling to zero in non-prod. Scheduled shutdowns.
Detection
Set up cost alerts before they're surprises:
- AWS Budgets alert at 50%, 75%, 100% of expected monthly spend.
- AWS Cost Anomaly Detection, which emails you on unusual spending patterns.
- Weekly review of top-cost services.
- Tag every resource so cost attribution works.
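For the weekly review, a simple check over exported daily cost figures can highlight days worth a closer look. This is a rough sketch and not a substitute for the managed alerts above. The 7-day window, the 2x threshold, and the sample figures are arbitrary assumptions.

```python
from statistics import median


def flag_anomalies(daily_costs, window=7, factor=2.0):
    """Return indexes of days costing more than `factor` x the trailing median."""
    flagged = []
    for i in range(window, len(daily_costs)):
        baseline = median(daily_costs[i - window:i])
        if baseline > 0 and daily_costs[i] > factor * baseline:
            flagged.append(i)
    return flagged


# Made-up daily spend in dollars, with one spike on day index 8.
costs = [10, 10, 10, 10, 10, 10, 10, 11, 45, 12]
print(flag_anomalies(costs))
```

Using the median rather than the mean keeps a single spike from inflating the baseline and hiding the next one.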
When serverless is wrong
Despite the appeal, serverless is not the right answer for:
- Steady high-throughput workloads. Container or VM is cheaper.
- Long-running jobs. Lambda's 15-minute limit forces ugly workarounds.
- Heavy connection pooling needs. Each concurrent execution environment opens its own database connections, which gets expensive.
- Stateful processing. Functions are stateless by design.
Hybrid is normal
Most production systems use Lambda for spiky, low-throughput, event-driven work and containers (ECS/EKS) or VMs for steady workloads. Mixing is fine.
Verdict
Serverless costs hit teams that don't actively manage them. The traps are predictable: idle allocations, log spam, retries, recursion, observability overhead. Set budgets and anomaly alerts before deploying serverless to production. Profile memory and duration. Use circuit breakers and sampling. Tag everything. Without these, the bill will surprise you at the wrong time.