Infrastructure Documentation
AWS Cloud Infrastructure for sprkzdoc.com
Overview
Infrastructure at a Glance
Platform: AWS Cloud (us-west-2 region)
IaC Tool: OpenTofu 1.6.0+ (Terraform-compatible)
Environment: Development (dev)
Domain: sprkzdoc.com
Estimated Cost: $110-160/month
High-Level Architecture
Key Architecture Features
- High Availability: Multi-AZ-capable design (ALB across 2 AZs); RDS runs Single-AZ in dev to reduce cost
- Scalability: Auto-scaling ECS tasks (1-3) based on CPU/memory
- Security: Private subnets, security groups, encrypted storage
- Performance: CloudFront CDN, ElastiCache, optimized networking
- Monitoring: CloudWatch alarms, X-Ray tracing, CloudTrail audit logs
Infrastructure Modules
The infrastructure is composed of 12 specialized Terraform modules, each handling a specific aspect of the system:
🌐 Networking
Purpose: VPC, subnets, routing
Resources: VPC (10.0.0.0/16), 6 subnets, NAT Gateway
Lines: 217
🔒 Security
Purpose: Security groups, IAM roles
Resources: 4 security groups, 2 IAM roles, 3 secrets
Size: 5.5 KB
💾 Database
Purpose: RDS MySQL, ElastiCache
Resources: RDS instance, Valkey cluster, secrets
Lines: 299
🗄️ Storage
Purpose: S3 buckets with encryption
Resources: 6 S3 buckets, lifecycle policies
Lines: 540
🚀 Compute
Purpose: ECR, ECS cluster
Resources: ECR repo, ECS cluster, CloudWatch logs
Lines: 84
⚖️ Load Balancing
Purpose: Application Load Balancer
Resources: ALB, target groups, HTTPS listeners
Lines: 143
🌍 DNS
Purpose: Route53, ACM certificates
Resources: DNS records, ACM validation
Lines: 202
⚡ CDN
Purpose: CloudFront distributions
Resources: 3 distributions, OAC, caching
Lines: 405
📊 Monitoring
Purpose: CloudWatch, X-Ray, CloudTrail
Resources: Log groups, 7 alarms, tracing
Lines: 309
📧 Email
Purpose: SES email service
Resources: Domain identity, DKIM, receipt rules
Lines: 282
🔧 Application Service
Purpose: Reusable service deployment
Resources: Target groups, auto-scaling
Lines: 367
🏠 Homepage
Purpose: Static website hosting
Resources: S3 bucket, CloudFront, certificates
Status: Recently added
AWS Resources Summary
| Resource Type | Count | Details |
|---|---|---|
| Security Groups | 4 | ALB, ECS, RDS, ElastiCache |
| IAM Roles | 2 | Task execution, Task runtime |
| Subnets | 6 | 2 public, 1 private, 2 database, 1 cache |
| S3 Buckets | 7 | Static, templates, private, emails, formman, homepage, logs |
| CloudFront Distributions | 4 | Viewer, FormMan, Homepage, API assets |
| Route53 Records | 8+ | Subdomains, domain identity, SES records |
| RDS Instances | 1 | MySQL 8.0 (db.t3.micro) |
| ElastiCache Clusters | 1 | Valkey 7.2 (cache.t4g.small) |
| ECS Clusters | 1 | Fargate capacity provider |
| CloudWatch Alarms | 7 | ALB, ECS, RDS, ElastiCache monitoring |
| ACM Certificates | 4 | ALB, CloudFront distributions |
| Secrets Manager Entries | 4 | RDS, ElastiCache, App config |
S3 Buckets Overview
| Bucket | Purpose | Versioning | Lifecycle |
|---|---|---|---|
| sprkzdoc-static-assets-dev | Static web assets | Disabled | None |
| sprkzdoc-pdf-templates-dev | PDF templates | Enabled | 30-day old version expiry, 90-day IA transition |
| sprkzdoc-private-dev | Private storage (PHI) | Enabled | 90-day old version expiry |
| sprkzdoc-incoming-emails-dev | SES email storage | Disabled | 30-day IA, 90-day expiry |
| sprkzdoc-formman-dev | FormMan static app | Disabled | None |
| sprkzdoc-homepage-dev | Website homepage | Disabled | None |
| sprkzdoc-logs-dev | ALB & CloudFront logs | Disabled | 30-day IA, 90-day Glacier, 365-day expiry |
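To check that the deployed buckets match the versioning column above, the expected state can be compared with what S3 reports. This is a minimal sketch, assuming AWS CLI credentials for the dev account; the helper names `expected_versioning` and `audit_bucket` are illustrative and are not part of the repository scripts.

```shell
# Expected versioning state per bucket, taken from the table above.
expected_versioning() {
  case "$1" in
    sprkzdoc-pdf-templates-dev|sprkzdoc-private-dev) echo "Enabled" ;;
    *) echo "Disabled" ;;
  esac
}

# Compare the expected state with what S3 reports for one bucket.
# With --output text, a bucket that never had versioning enabled reports "None".
audit_bucket() {
  local bucket="$1" actual
  actual=$(aws s3api get-bucket-versioning --bucket "$bucket" \
    --query 'Status' --output text)
  case "$actual" in
    Enabled) ;;
    *) actual="Disabled" ;;   # treat "None" and "Suspended" as disabled
  esac
  if [ "$actual" = "$(expected_versioning "$bucket")" ]; then
    echo "OK       $bucket ($actual)"
  else
    echo "MISMATCH $bucket (expected $(expected_versioning "$bucket"), got $actual)"
  fi
}

# Usage (requires AWS credentials):
#   for b in static-assets pdf-templates private incoming-emails formman homepage logs; do
#     audit_bucket "sprkzdoc-${b}-dev"
#   done
```

The audit is read-only, so it does not introduce drift from the OpenTofu-managed configuration.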
Deployment Guide
Prerequisites
- AWS Account with admin credentials configured
- OpenTofu >= 1.6.0 installed
- Docker for building container images
- AWS CLI configured and authenticated
- Domain hosted in Route53 (sprkzdoc.com)
Initial Deployment
# 1. Initialize OpenTofu
cd ../infra.sprkzdoc.com
tofu init
# 2. Review the plan
tofu plan -out=infrastructure.tfplan
# 3. Apply the infrastructure
tofu apply infrastructure.tfplan
# 4. Build and push Docker image
cd hello-world-app
docker build -t sprkzdoc-app:latest .
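# 5. Login to ECR (ACCOUNT_ID is the 12-digit AWS account ID of the current credentials)
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
aws ecr get-login-password --region us-west-2 | \
  docker login --username AWS --password-stdin \
  ${ACCOUNT_ID}.dkr.ecr.us-west-2.amazonaws.com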
# 6. Tag and push image
docker tag sprkzdoc-app:latest \
${ACCOUNT_ID}.dkr.ecr.us-west-2.amazonaws.com/sprkzdoc-application-dev:latest
docker push \
${ACCOUNT_ID}.dkr.ecr.us-west-2.amazonaws.com/sprkzdoc-application-dev:latest
# 7. Deploy ECS service
./scripts/deploy-ecs-tasks.sh
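Steps 5 and 6 repeat the same registry host three times, so a small helper can build it once. The following is a sketch only: the function names `ecr_registry`, `ecr_image`, and `push_image` are hypothetical and do not correspond to the repository's deployment scripts.

```shell
REGION="us-west-2"
REPO="sprkzdoc-application-dev"

# Registry host for an account, e.g. 123456789012.dkr.ecr.us-west-2.amazonaws.com
ecr_registry() {
  echo "$1.dkr.ecr.${REGION}.amazonaws.com"
}

# Full image reference: <registry>/<repo>:<tag> (tag defaults to "latest")
ecr_image() {
  echo "$(ecr_registry "$1")/${REPO}:${2:-latest}"
}

# Log in, tag, and push the locally built image (steps 5 and 6 above).
push_image() {
  local account_id="$1" tag="${2:-latest}"
  aws ecr get-login-password --region "$REGION" |
    docker login --username AWS --password-stdin "$(ecr_registry "$account_id")"
  docker tag sprkzdoc-app:latest "$(ecr_image "$account_id" "$tag")"
  docker push "$(ecr_image "$account_id" "$tag")"
}

# Usage (requires Docker and AWS credentials):
#   push_image "$(aws sts get-caller-identity --query Account --output text)"
```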
Deployment Scripts
| Script | Purpose |
|---|---|
| deploy.sh | Full infrastructure deployment |
| deploy-ecs-tasks.sh | Deploy ECS task definitions and services |
| deploy-service.sh | Universal service deployment (build, push, deploy) |
| deploy-application.sh | Application-specific deployment |
| deploy-formman.sh | FormMan service deployment |
| verify-deployment.sh | Post-deployment verification and testing |
State Management
Backend: S3 bucket with encryption and versioning
Locking: DynamoDB prevents concurrent modifications
Bucket: fperx-terraform-state-dev
Lock Table: fperx-terraform-locks-dev
Recovery: Automatic S3 versioning for state rollback
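The backend settings above could be supplied as a partial backend configuration file. This sketch writes one from the shell; the bucket and lock table names come from this section, while the state object `key` and the bucket `region` are not stated here and are placeholders to be replaced. It also assumes the root module declares an empty `backend "s3" {}` block.

```shell
# Write a partial S3 backend configuration to the given path.
# ASSUMPTIONS: "infrastructure/terraform.tfstate" and "us-west-2" are
# placeholders; the real key and region are not given in this document.
write_backend_config() {
  cat > "$1" <<'EOF'
bucket         = "fperx-terraform-state-dev"
key            = "infrastructure/terraform.tfstate"
region         = "us-west-2"
dynamodb_table = "fperx-terraform-locks-dev"
encrypt        = true
EOF
}

# Usage:
#   write_backend_config backend.hcl
#   tofu init -backend-config=backend.hcl
```

For rollback, earlier state versions can be listed with `aws s3api list-object-versions --bucket fperx-terraform-state-dev --prefix <key>`, where `<key>` is the actual state object key.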
Cost Breakdown
| Service | Configuration | Monthly Cost |
|---|---|---|
| ECS Fargate | 1 vCPU, 2GB RAM, 24/7 | $20 |
| RDS MySQL | db.t3.micro, 24/7 | $10 |
| ElastiCache | cache.t4g.small, 24/7 | $23 |
| Application Load Balancer | 2 AZs, 24/7 | $16 |
| NAT Gateway | 1 gateway, 24/7 | $32 |
| CloudFront | Low traffic (4 distributions) | $1 |
| S3 Storage | ~50GB across 7 buckets | $2 |
| Other Services | Secrets Manager, CloudWatch, etc. | $6 |
| TOTAL MONTHLY COST | | $110 |
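The total can be checked by summing the line items. A short sketch of that arithmetic, using only the figures from the table:

```shell
# Monthly line items in USD, in the order they appear in the table above.
total=0
for cost in 20 10 23 16 32 1 2 6; do
  total=$((total + cost))
done
echo "Total: \$${total}/month"            # prints: Total: $110/month

# Share of the bill taken by the always-on NAT Gateway (integer percent).
nat_share=$((32 * 100 / total))
echo "NAT Gateway share: ${nat_share}%"   # prints: NAT Gateway share: 29%
```

The NAT Gateway is the largest single line item, at roughly 29% of the monthly total.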
Cost Optimization Options
Manual Stop/Start: Stop RDS when not needed (~$10/month savings)
Scale Down ECS: Set desired count to 0 (~$20/month savings)
Note: NAT Gateway cannot be stopped (~$32/month fixed cost)
WAF: Currently disabled to save $5/month in dev environment
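The stop/start options above could be scripted roughly as follows. This is a sketch under assumptions: the cluster and service names are taken from the troubleshooting commands in this document, the RDS identifier `sprkzdoc-mysql-dev` is inferred from the RDS log group name, and the `run`, `pause_dev`, and `resume_dev` helpers are hypothetical. Commands are only printed unless `DRY_RUN=0` is set.

```shell
# Print the command in dry-run mode (the default); execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Scale the ECS service to zero tasks and stop the RDS instance.
pause_dev() {
  run aws ecs update-service --cluster sprkzdoc-cluster-dev \
    --service sprkzdoc-application-service-dev --desired-count 0
  run aws rds stop-db-instance --db-instance-identifier sprkzdoc-mysql-dev
}

# Start the database again, then bring the service back to one task.
resume_dev() {
  run aws rds start-db-instance --db-instance-identifier sprkzdoc-mysql-dev
  run aws ecs update-service --cluster sprkzdoc-cluster-dev \
    --service sprkzdoc-application-service-dev --desired-count 1
}

# Usage:
#   pause_dev              # prints the commands only
#   DRY_RUN=0 pause_dev    # actually runs them
```

Two caveats: AWS automatically restarts a stopped RDS instance after seven days, and if the service's auto-scaling minimum is 1, a scaling activity may raise the desired count again.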
Cost-Saving Features
- S3 Gateway Endpoint: Free of charge; S3 traffic bypasses the NAT Gateway, avoiding its data processing charges
- Lifecycle Policies: Automatic transition to cheaper storage tiers
- Log Retention: 3-day retention in dev (vs 30+ days in prod)
- Single-AZ RDS: Half the cost of Multi-AZ (acceptable for dev)
- Auto-Scaling: ECS tasks scale down during low traffic
Security Features
Data Protection
- Encryption at Rest: AES256 on all S3 buckets, RDS, and ElastiCache
- Encryption in Transit: TLS 1.2+ enforced on ALB, CloudFront, ElastiCache, RDS
- Secrets Management: All passwords and tokens stored in AWS Secrets Manager
- Password Generation: Auto-generated strong passwords (32+ characters)
Network Security
- VPC Isolation: Private subnets for database and cache layers
- Security Groups: Least privilege ingress/egress rules
- NAT Gateway: Masks private IP addresses for outbound traffic
- S3 Gateway Endpoint: Direct S3 access without internet routing
- CloudFront OAC: Origin Access Control for S3 bucket security
Identity & Access Management
- Separate IAM Roles: Task execution role vs. task runtime role
- Least Privilege: Task role only has required S3, SES, Secrets permissions
- Service-to-Service: VPC security group rules control inter-service communication
- No Hardcoded Credentials: All secrets retrieved from Secrets Manager at runtime
Audit & Compliance
- CloudTrail: All API calls logged to S3 with 7-day retention
- CloudWatch Logs: Centralized logging for ECS, RDS, SES, API Gateway
- X-Ray Tracing: Distributed tracing for performance and error analysis
- PHI Data: Restricted to private storage bucket with versioning and encryption
Deletion Protection
ALB & RDS: Can be enabled to prevent accidental deletion
Current Setting: Disabled for dev environment (enable in production)
Toggle via: enable_deletion_protection variable
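One way to flip the toggle without editing files is an environment variable, since OpenTofu reads `TF_VAR_<name>` values as input variables. A minimal sketch, assuming `enable_deletion_protection` is declared as a root-level variable; the `set_deletion_protection` helper is hypothetical.

```shell
# Export the variable for subsequent tofu commands; accepts only true or false.
set_deletion_protection() {
  case "$1" in
    true|false)
      export TF_VAR_enable_deletion_protection="$1"
      ;;
    *)
      echo "expected true or false, got: $1" >&2
      return 1
      ;;
  esac
}

# Usage:
#   set_deletion_protection true
#   tofu plan -out=infrastructure.tfplan   # review before applying
```

Reviewing the plan first shows which resources pick up the new value before anything changes.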
Monitoring & Observability
CloudWatch Log Groups
| Log Group | Retention | Purpose |
|---|---|---|
| /ecs/sprkzdoc-application-dev | 3 days | Container application logs |
| /aws/rds/instance/.../error | Auto | RDS error logs |
| /aws/cloudtrail/sprkzdoc-dev | 7 days | API audit trail |
| /aws/ses/sprkzdoc-dev | 3 days | Email service logs |
CloudWatch Alarms
| Alarm | Threshold | Duration |
|---|---|---|
| ALB High Response Time | >2 seconds | 10 minutes (2 periods) |
| ALB Unhealthy Targets | >=1 target | 5 minutes |
| ECS High CPU | >80% | 15 minutes |
| ECS High Memory | >80% | 15 minutes |
| RDS High CPU | >80% | 15 minutes |
| RDS Low Storage | <2GB free | 5 minutes |
| ElastiCache High CPU | >75% | 15 minutes |
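The Duration column is the alarm's period multiplied by its number of evaluation periods. The sketch below shows that arithmetic and a read-only query for alarms currently firing. Only the ALB response time row states its period count (2), so the 5-minute period used for the other rows is an assumption; both helper names are illustrative.

```shell
# Evaluation window in minutes = period (seconds) x evaluation periods / 60.
alarm_window_minutes() {
  echo $(( $1 * $2 / 60 ))
}

alarm_window_minutes 300 2   # ALB High Response Time: two 5-minute periods, prints 10
alarm_window_minutes 300 3   # a 15-minute alarm, ASSUMING three 5-minute periods, prints 15

# List the names of alarms currently in the ALARM state (requires AWS credentials).
list_firing_alarms() {
  aws cloudwatch describe-alarms --state-value ALARM \
    --query 'MetricAlarms[].AlarmName' --output text
}
```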
Health Checks
- ALB Target Group: Health check on /health endpoint every 30 seconds
- ECS Service: Rolling deployment with circuit breaker protection
- RDS: Multi-AZ failover is supported but not enabled in dev (Single-AZ instance; recovery is managed manually)
- ElastiCache: Automatic failure detection and recovery
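The same /health endpoint can be probed by hand after a deployment. This is a sketch of a retry loop, not the contents of verify-deployment.sh; the function names are hypothetical, and the URL in the usage line is a placeholder because the ALB hostname is not given here.

```shell
# Probe one URL; fails on connection errors, timeouts, or HTTP 4xx/5xx responses.
check_health() {
  curl -fsS --max-time 5 -o /dev/null "$1"
}

# Retry until the endpoint is healthy or the attempts run out.
# Arguments: URL, attempts (default 10), seconds between attempts (default 30).
wait_healthy() {
  local url="$1" attempts="${2:-10}" interval="${3:-30}" i=1
  while [ "$i" -le "$attempts" ]; do
    if check_health "$url"; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
  echo "still unhealthy after $attempts attempts" >&2
  return 1
}

# Usage:
#   wait_healthy "https://<alb-dns-name>/health" 10 30
```

The 30-second default mirrors the ALB health check interval, so each retry lines up with roughly one target group check.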
Distributed Tracing
X-Ray: Enabled for end-to-end request tracing
Sampling: Default sampling rule configured
Integration: ECS tasks send traces to X-Ray daemon
Analysis: Service maps, latency analysis, error tracking
Troubleshooting Quick Reference
ECS Tasks Not Starting
# Check service events
aws ecs describe-services --cluster sprkzdoc-cluster-dev \
--services sprkzdoc-application-service-dev
# View CloudWatch logs
aws logs tail /ecs/sprkzdoc-application-dev --follow
# Check task failures
aws ecs list-tasks --cluster sprkzdoc-cluster-dev \
--desired-status STOPPED
Database Connection Issues
# Verify security group rules
aws ec2 describe-security-groups \
--group-ids sg-034f3d950be0ee440
# Check RDS logs
aws logs tail /aws/rds/instance/sprkzdoc-mysql-dev/error \
--follow
# Test database connectivity from ECS task
aws ecs execute-command --cluster sprkzdoc-cluster-dev \
--task <task-id> --interactive --command "/bin/bash"
ACM Certificate Issues
# Verify CNAME records created
aws route53 list-resource-record-sets \
--hosted-zone-id Z10449081GN434OHGYWP7 \
--query "ResourceRecordSets[?Type=='CNAME']"
# Check certificate status
aws acm describe-certificate \
--certificate-arn <arn> \
--region us-east-1
CloudFront Deployment Issues
# Check distribution status
aws cloudfront get-distribution \
--id E27W1P6D1AJLGE \
--query 'Distribution.Status'
# Invalidate cache
aws cloudfront create-invalidation \
--distribution-id E27W1P6D1AJLGE \
--paths "/*"