The average AWS bill has significant waste. Not because teams are careless, but because AWS pricing is complex, defaults favor AWS revenue, and the path of least resistance leads to expensive configurations. Here are the specific places where most teams are leaving money on the table.
EC2: On-Demand When You Need Committed Pricing
Running production workloads on On-Demand EC2 instances is the most common and most expensive mistake. On-Demand pricing has no commitment and no discount.
| Purchase option | Discount vs On-Demand |
|---|---|
| On-Demand | 0% |
| 1-year Compute Savings Plan | ~34% |
| 3-year Compute Savings Plan | ~54% |
| Spot Instances | ~70-90% |
If your instances run 24/7, Savings Plans are straightforward. A Compute Savings Plan applies to any compute usage - any region, any instance family, any OS, and also Fargate and Lambda - making it more flexible than Reserved Instances. A team spending $10,000/month on On-Demand EC2 drops to roughly $6,600/month as soon as a 1-year Compute Savings Plan takes effect.
Spot Instances cover fault-tolerant workloads - batch jobs, CI/CD workers, data processing - at 70-90% off On-Demand. Use Spot for anything that can be interrupted and restarted.
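As a sketch, a fault-tolerant worker can request Spot capacity directly in Terraform. The AMI ID and instance type here are placeholders - substitute your own:

```hcl
# Sketch: a batch worker on Spot capacity. The job queue is assumed
# to reschedule any work lost to an interruption.
resource "aws_instance" "batch_worker" {
  ami           = "ami-0123456789abcdef0" # placeholder AMI ID
  instance_type = "m5.large"              # placeholder type

  instance_market_options {
    market_type = "spot"
    spot_options {
      # Terminate (not stop) on interruption; one-time requests are
      # not relaunched automatically by EC2.
      spot_instance_type             = "one-time"
      instance_interruption_behavior = "terminate"
    }
  }
}
```

For fleets, an Auto Scaling group with a mixed instances policy achieves the same thing while diversifying across instance types to reduce interruption risk.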
NAT Gateway: The Silent Bill Killer
NAT Gateway charges $0.045 per GB of data processed plus $0.045/hour per gateway. This is invisible until your bill arrives.
A common anti-pattern: resources in private subnets downloading packages, fetching S3 objects, or calling AWS APIs through a NAT Gateway. All of that traffic is billed at $0.045/GB.
Fix 1: VPC Endpoints for AWS services
```hcl
resource "aws_vpc_endpoint" "s3" {
  vpc_id          = aws_vpc.main.id
  service_name    = "com.amazonaws.us-east-1.s3"
  route_table_ids = [aws_route_table.private.id]
}
```
S3 access from private subnets goes through the VPC endpoint - no NAT Gateway involved. Gateway endpoints (S3 and DynamoDB) are free. SQS, SNS, Secrets Manager, SSM, and ECR use interface endpoints instead, which run about $0.01/hour per AZ plus $0.01/GB processed - not free, but far cheaper than NAT Gateway data processing at volume.
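Interface endpoints are declared similarly. A sketch, assuming `aws_subnet.private` and `aws_security_group.endpoints` already exist; note that pulling ECR images privately also requires the `ecr.api` endpoint and the S3 gateway endpoint, since image layers live in S3:

```hcl
# Sketch: interface endpoint for the ECR Docker registry API.
resource "aws_vpc_endpoint" "ecr_dkr" {
  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.us-east-1.ecr.dkr"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = [aws_subnet.private.id]
  security_group_ids  = [aws_security_group.endpoints.id]
  private_dns_enabled = true # resolve the service's normal DNS name to the endpoint
}
```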
Fix 2: Put read-heavy workloads in public subnets (with security groups limiting inbound traffic)
Teams that moved from NAT Gateway to VPC endpoints for S3 and ECR access typically cut their NAT Gateway costs by 40-60%.
RDS: Multi-AZ When Read Replicas Are Enough
Multi-AZ RDS doubles your instance cost. It provides automatic failover - the standby is ready within 60-120 seconds if the primary fails.
For many applications, a Read Replica with manual promotion is sufficient. The replica costs the same as the Multi-AZ standby, but unlike the standby it serves read traffic, so the second instance does useful work. The tradeoff: roughly 5-10 minutes to promote a replica versus 1-2 minutes for automatic Multi-AZ failover.
If your SLA requires under 2 minutes of downtime on a database failure, Multi-AZ is justified. If not, dropping Multi-AZ halves the cost of that instance - and if you need read scaling anyway, the replica you run instead is capacity you actually use rather than an idle standby.
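A minimal read replica sketch, assuming an existing `aws_db_instance.primary`; the identifier and instance class are placeholders:

```hcl
# Sketch: read replica of an existing primary. Promotion on primary
# failure is a manual step (console, CLI, or API).
resource "aws_db_instance" "replica" {
  identifier          = "app-db-replica" # placeholder name
  replicate_source_db = aws_db_instance.primary.identifier
  instance_class      = "db.r6g.large"   # placeholder class
  skip_final_snapshot = true
}
```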
Also check: are you running db.r6g.2xlarge when monitoring shows 15% CPU utilization? RDS is often over-provisioned because the migration friction to downsize feels high. Schedule a right-sizing review quarterly.
Fargate: CPU/Memory Ratios Matter
Fargate pricing is per vCPU-hour and per GB-hour of memory, and only certain vCPU/memory combinations are valid - you cannot request exactly what you need; you round up to the next valid size.
Common waste: requesting 4 vCPU / 8GB when your application uses 1.5 vCPU / 3GB. You pay for 4 vCPU / 8GB.
Enable Container Insights for Fargate, collect CPU and memory utilization for two weeks, then right-size. Moving from 4 vCPU / 8GB to 2 vCPU / 4GB (if utilization supports it) is a 50% reduction in Fargate costs. If you are running containers on Kubernetes instead of Fargate, the same overprovisioning problem exists at a larger scale - see Kubernetes cost optimization tactics for cluster-specific fixes.
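The change itself is one line each in the task definition. A sketch with placeholder family and image names:

```hcl
# Sketch: right-sized Fargate task definition. cpu/memory must be a
# valid Fargate combination (2 vCPU supports 4-16 GB).
resource "aws_ecs_task_definition" "app" {
  family                   = "app" # placeholder
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = "2048" # 2 vCPU (down from 4096)
  memory                   = "4096" # 4 GB  (down from 8192)

  container_definitions = jsonencode([{
    name      = "app"
    image     = "123456789012.dkr.ecr.us-east-1.amazonaws.com/app:latest" # placeholder
    essential = true
  }])
}
```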
Data Transfer: The Most Confusing Bill Line
AWS charges for data transfer out to the internet, between regions, and between availability zones. The rules are complex and the costs accumulate:
- Data transfer out to internet: $0.09/GB (first 10TB)
- Cross-AZ transfer: $0.01/GB each direction
- Cross-region transfer: $0.02-0.08/GB depending on regions
Cross-AZ transfer is the one that surprises teams. If your application makes frequent requests across availability zones - application servers in us-east-1a talking to databases in us-east-1b - you pay $0.01/GB in each direction on every request. Where your availability requirements allow, keep services that talk frequently in the same AZ.
For data transfer to the internet, CloudFront has lower egress rates than direct EC2 egress and serves content from edge locations. For applications serving significant data to users, CloudFront in front of S3 or ALB often reduces both data transfer costs and latency.
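A minimal CloudFront distribution in front of a public S3 bucket might look like this sketch (the bucket resource name is assumed; private buckets additionally need origin access control):

```hcl
# Look up AWS's managed "CachingOptimized" cache policy by name.
data "aws_cloudfront_cache_policy" "caching_optimized" {
  name = "Managed-CachingOptimized"
}

resource "aws_cloudfront_distribution" "cdn" {
  enabled = true

  origin {
    domain_name = aws_s3_bucket.assets.bucket_regional_domain_name # assumed bucket
    origin_id   = "s3-assets"
  }

  default_cache_behavior {
    target_origin_id       = "s3-assets"
    viewer_protocol_policy = "redirect-to-https"
    allowed_methods        = ["GET", "HEAD"]
    cached_methods         = ["GET", "HEAD"]
    cache_policy_id        = data.aws_cloudfront_cache_policy.caching_optimized.id
  }

  restrictions {
    geo_restriction {
      restriction_type = "none"
    }
  }

  viewer_certificate {
    cloudfront_default_certificate = true # swap for an ACM cert on a custom domain
  }
}
```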
CloudWatch Logs: Retention Policies
Default CloudWatch Logs retention is never (logs are kept forever). Many teams have years of log data accumulating at $0.03/GB per month. A moderately busy application can accumulate 50GB+ per month.
```hcl
resource "aws_cloudwatch_log_group" "app" {
  name              = "/app/production"
  retention_in_days = 30 # or 7, or 90 - depends on your compliance needs
}
```
Set retention policies on all log groups. Application logs beyond 30-90 days are rarely useful. Compliance requirements may push this to 90 or 365 days, but “never” is the wrong default.
The Audit Process
Run this monthly:
- AWS Cost Explorer - filter by service, sort by cost, look for surprises
- AWS Trusted Advisor - flags idle and underutilized resources (third-party tools like Infracost can additionally estimate the cost impact of infrastructure changes before they ship)
- Right-sizing recommendations in Compute Optimizer - EC2 and Lambda right-sizing suggestions
- Data transfer section of your bill - anything above expectations
Bottom Line
Most AWS over-spending falls in five categories: On-Demand EC2 instead of Savings Plans, NAT Gateway data processing instead of VPC endpoints, over-provisioned RDS, incorrectly sized Fargate tasks, and unlimited CloudWatch log retention. None of these require architectural changes. They are configuration decisions that compound into thousands of dollars per month. Audit each one and you will likely find 20-35% of your AWS bill is recoverable.