From one broken server to a self-operating cloud.
"Café Nimbus came to me with one broken server and no plan for growth. I left them with an infrastructure that scales automatically, recovers from failures without human intervention, and reports on itself every morning — without anyone touching a keyboard."
The Engagement
Café Nimbus is a growing café brand preparing for national expansion. When I came on board, their entire online presence ran on a single EC2 instance — no redundancy, no backups, no plan for what happens when traffic spikes during a promotion. One bad day and the whole operation goes dark.
My mandate: build an AWS infrastructure that could grow with the business, survive failures automatically, and operate with as little manual intervention as possible. What followed was a five-phase architectural engagement. Each phase solved a specific business problem. Each phase made the next one possible.
Architecture Overview
Five Phases
Phase 1: Amazon S3 · Versioning · Lifecycle Policies · Cross-Region Replication
Café Nimbus had no web presence. Customers were finding competitors instead. The site needed to be fast, cheap to run, and impossible to accidentally break during a content update.
Static website on Amazon S3 with public access controlled entirely through bucket policy. S3 versioning enabled from day one. Lifecycle policies transition older versions to S3 Standard-IA after 30 days. Cross-region replication treated as a day-one non-negotiable.
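A minimal sketch of that bucket configuration in boto3, assuming illustrative bucket names and a placeholder replication role ARN (the real values belong to the engagement):

```python
"""Sketch of the Phase 1 bucket setup. Bucket names, the replication role
ARN, and the 30-day window are illustrative assumptions."""
import boto3

s3 = boto3.client("s3")

SITE_BUCKET = "cafe-nimbus-site"          # hypothetical name
REPLICA_BUCKET = "cafe-nimbus-site-dr"    # hypothetical name, second region
REPLICATION_ROLE = "arn:aws:iam::123456789012:role/s3-replication"  # placeholder

# Versioning first: noncurrent-version lifecycle rules and replication require it.
s3.put_bucket_versioning(
    Bucket=SITE_BUCKET,
    VersioningConfiguration={"Status": "Enabled"},
)

# Move older (noncurrent) versions to Standard-IA after 30 days.
s3.put_bucket_lifecycle_configuration(
    Bucket=SITE_BUCKET,
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "noncurrent-to-ia",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                "NoncurrentVersionTransitions": [
                    {"NoncurrentDays": 30, "StorageClass": "STANDARD_IA"}
                ],
            }
        ]
    },
)

# Replicate every new object to the bucket in the second region.
# The replica bucket must already exist with versioning enabled.
s3.put_bucket_replication(
    Bucket=SITE_BUCKET,
    ReplicationConfiguration={
        "Role": REPLICATION_ROLE,
        "Rules": [
            {
                "ID": "site-to-dr",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {"Prefix": ""},
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": f"arn:aws:s3:::{REPLICA_BUCKET}"},
            }
        ],
    },
)
```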
Cross-region replication from day one. The counter-argument: Café Nimbus is too small for it. My counter: a regional outage hitting their only S3 bucket takes their entire web presence offline with no recovery option. Replication costs pennies at this scale.
The principle: Replication from day one, not after the first outage. The cost of preventing a disaster is always lower than the cost of recovering from one.
Phase 2: EC2 · LAMP Stack · AMI · Multi-Region Deployment
A static site can display a menu. It cannot take orders, manage inventory, or run a real application. Café Nimbus needed a backend — and one that could be reproduced exactly if it ever had to be rebuilt.
EC2 running a full LAMP stack: Linux, Apache, MySQL, PHP. After validating the application end-to-end (menu, order placement, data persistence), I created a golden AMI before touching anything else. From that AMI, I copied the image to a second region and launched an identical instance there in minutes.
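A hedged sketch of that golden-image step with boto3. The instance ID, regions, and instance type are assumptions; the one detail worth calling out is that AMIs are regional, so the image is copied before anything launches in the second region:

```python
"""Sketch: image the validated LAMP instance, copy it cross-region, relaunch.
Instance ID, region names, and instance type are illustrative assumptions."""
import boto3

PRIMARY_REGION = "us-east-1"     # assumed primary region
SECONDARY_REGION = "us-west-2"   # assumed second region
VALIDATED_INSTANCE_ID = "i-0123456789abcdef0"  # placeholder

ec2_primary = boto3.client("ec2", region_name=PRIMARY_REGION)
ec2_secondary = boto3.client("ec2", region_name=SECONDARY_REGION)

# 1. Create the golden AMI from the validated LAMP instance.
image = ec2_primary.create_image(
    InstanceId=VALIDATED_INSTANCE_ID,
    Name="cafe-nimbus-lamp-golden-v1",
    Description="Validated LAMP stack: menu, ordering, persistence",
)
ec2_primary.get_waiter("image_available").wait(ImageIds=[image["ImageId"]])

# 2. AMIs are regional, so copy the image into the second region.
copy = ec2_secondary.copy_image(
    SourceImageId=image["ImageId"],
    SourceRegion=PRIMARY_REGION,
    Name="cafe-nimbus-lamp-golden-v1",
)
ec2_secondary.get_waiter("image_available").wait(ImageIds=[copy["ImageId"]])

# 3. Launch an identical instance in the second region from the copy.
ec2_secondary.run_instances(
    ImageId=copy["ImageId"],
    InstanceType="t3.micro",   # assumed size
    MinCount=1,
    MaxCount=1,
)
```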
AMI before anything else. A manually configured server that only one person knows how to rebuild is a liability. An AMI is an asset. Every production environment should have a golden image before it serves a single real user.
The principle: A manually configured server is a liability. An AMI is an asset. The cost of creating it is an hour. The cost of not having it is a full rebuild under pressure.
Phase 3: Custom VPC · Bastion Host · NAT Gateway · Network ACLs
Café Nimbus was going public with their platform. Their infrastructure had no meaningful network boundaries — everything was reachable from everywhere. Security had to be layered. A single misconfigured security group should not be enough to expose the entire backend.
Custom VPC with /16 CIDR. Public subnet for ALB and bastion only. Private subnet for all application servers and the database — no public IPs, ever. NAT Gateway for outbound-only private traffic. Network ACLs as a stateless second layer of defence.
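As one way to picture the stateless layer, here is a boto3 sketch of a network ACL dedicated to the private subnet. The VPC and subnet IDs are placeholders, and the ports and rule numbers are illustrative assumptions:

```python
"""Sketch of a dedicated network ACL for the private subnet: explicit allows
for expected traffic, implicit DENY for everything else. IDs are placeholders."""
import boto3

ec2 = boto3.client("ec2")

VPC_ID = "vpc-0123456789abcdef0"                 # placeholder
PRIVATE_SUBNET_ID = "subnet-0123456789abcdef0"   # placeholder
VPC_CIDR = "10.0.0.0/16"

nacl_id = ec2.create_network_acl(VpcId=VPC_ID)["NetworkAcl"]["NetworkAclId"]

# Inbound: app and SSH traffic from inside the VPC (ALB, bastion), plus
# ephemeral ports for return traffic on connections that went out through
# the NAT Gateway. Everything else falls through to the implicit DENY.
inbound_rules = [
    (100, VPC_CIDR, 80, 80),          # ALB -> Apache
    (110, VPC_CIDR, 22, 22),          # bastion -> SSH
    (120, "0.0.0.0/0", 1024, 65535),  # return traffic for outbound connections
]
for rule_number, cidr, port_from, port_to in inbound_rules:
    ec2.create_network_acl_entry(
        NetworkAclId=nacl_id, RuleNumber=rule_number, Protocol="6",  # TCP
        RuleAction="allow", Egress=False, CidrBlock=cidr,
        PortRange={"From": port_from, "To": port_to},
    )

# Outbound: TCP anywhere, covering responses to the ALB/bastion and
# outbound-only traffic through the NAT Gateway.
ec2.create_network_acl_entry(
    NetworkAclId=nacl_id, RuleNumber=100, Protocol="6",
    RuleAction="allow", Egress=True, CidrBlock="0.0.0.0/0",
    PortRange={"From": 0, "To": 65535},
)

# Re-point the private subnet's association at the new NACL.
associations = ec2.describe_network_acls(
    Filters=[{"Name": "association.subnet-id", "Values": [PRIVATE_SUBNET_ID]}]
)["NetworkAcls"][0]["Associations"]
current = next(a for a in associations if a["SubnetId"] == PRIVATE_SUBNET_ID)
ec2.replace_network_acl_association(
    AssociationId=current["NetworkAclAssociationId"], NetworkAclId=nacl_id,
)
```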
Security groups + NACLs in combination. Security groups are stateful — they remember connections. NACLs are stateless — they evaluate every packet independently. Having both means a misconfigured security group doesn't automatically become an open door.
The principle: No backend resource ever got a public IP, and no single control stood alone. Without NACLs as a backstop, one misconfigured security group is all it takes for complete exposure.
Phase 4: Application Load Balancer · Auto Scaling Group · Multi-AZ
A single EC2 instance — no matter how well configured — is a single point of failure. When traffic spikes during a promotion, the site goes down. When the instance fails, the business goes dark. Neither is acceptable for a company preparing for national expansion.
Application Load Balancer across two Availability Zones. Auto Scaling Group triggered by CPU utilization — not a schedule. Minimum instances maintained, scale-out on threshold breach, scale-in when load drops. Health checks replace failed instances automatically.
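A sketch of that scaling setup in boto3, assuming a hypothetical launch template name, placeholder subnet IDs and target group ARN, and a 60% CPU target:

```python
"""Sketch: an Auto Scaling group spanning two AZs with a CPU-based
target-tracking policy. Names, IDs, ARNs, and the 60% target are assumptions."""
import boto3

autoscaling = boto3.client("autoscaling")

ASG_NAME = "cafe-nimbus-web-asg"                       # hypothetical name
PRIVATE_SUBNETS = "subnet-aaaa1111,subnet-bbbb2222"    # one per AZ, placeholders
TARGET_GROUP_ARN = (
    "arn:aws:elasticloadbalancing:us-east-1:123456789012:"
    "targetgroup/cafe-nimbus-web/0123456789abcdef"     # placeholder
)

autoscaling.create_auto_scaling_group(
    AutoScalingGroupName=ASG_NAME,
    LaunchTemplate={"LaunchTemplateName": "cafe-nimbus-lamp", "Version": "$Latest"},
    MinSize=2,                           # minimum maintained: one per AZ
    MaxSize=6,
    DesiredCapacity=2,
    VPCZoneIdentifier=PRIVATE_SUBNETS,   # private subnets in two AZs
    TargetGroupARNs=[TARGET_GROUP_ARN],  # registers instances with the ALB
    HealthCheckType="ELB",               # failed health checks trigger replacement
    HealthCheckGracePeriod=120,
)

# Demand-driven scaling: track average CPU rather than a schedule.
autoscaling.put_scaling_policy(
    AutoScalingGroupName=ASG_NAME,
    PolicyName="cpu-target-tracking",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 60.0,   # scale out above the target, scale in below it
    },
)
```

Target tracking is one way to express the CPU trigger; step-scaling policies on a CloudWatch CPU alarm achieve the same scale-out on breach, scale-in on recovery described above.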
CPU utilization trigger, not scheduled scaling. Traffic is demand-driven, not time-predictable. Scheduling assumes you know when customers will come. Utilization-based scaling responds to what's actually happening.
The principle: Multi-AZ is not a luxury. A single-AZ deployment with ten instances is still a single point of failure. Two AZs with two instances each is genuinely resilient.
Phase 5: AWS Lambda · Amazon SNS · Amazon EventBridge
Every morning, the operations team spent 45 minutes manually pulling the previous day's sales data and emailing it to management. Error-prone, time-consuming, and entirely unnecessary. The solution had to be serverless — running a cron job on EC2 means paying compute 24/7 for a 30-second task.
Two Lambda functions: DataExtractor (queries RDS inside VPC) and SalesAnalysisReport (formats and delivers the report). SNS email topic for the operations distribution list. EventBridge rule fires at 8AM daily — no human involved, no idle compute.
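A hedged sketch of the delivery end of that pipeline: a handler in the style of SalesAnalysisReport that formats the figures and publishes them to the SNS topic. The topic ARN environment variable, the event shape, and the field names are assumptions; in the real chain, DataExtractor queries RDS inside the VPC, and an EventBridge schedule rule (for example cron(0 8 * * ? *), evaluated in UTC) starts the daily run.

```python
"""Sketch of a SalesAnalysisReport-style Lambda handler. The topic ARN
environment variable, event shape, and field names are assumptions."""
import json
import os

import boto3

sns = boto3.client("sns")
TOPIC_ARN = os.environ["REPORT_TOPIC_ARN"]  # set in the function configuration


def lambda_handler(event, context):
    # Assumed event shape handed over by the extractor step, e.g.
    # {"date": "2024-05-01", "orders": 412, "revenue": 3180.50}
    date = event.get("date", "unknown date")
    orders = event.get("orders", 0)
    revenue = event.get("revenue", 0.0)

    body = (
        f"Café Nimbus daily sales report for {date}\n"
        f"Orders: {orders}\n"
        f"Revenue: ${revenue:,.2f}\n"
    )

    # One publish fans out to every address subscribed to the ops topic.
    sns.publish(
        TopicArn=TOPIC_ARN,
        Subject=f"Daily sales report - {date}",
        Message=body,
    )
    return {"statusCode": 200, "body": json.dumps({"delivered": True})}
```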
Lambda, not a cron job on EC2. Lambda runs only when triggered — at this workload, monthly cost is effectively zero. An EC2-based cron costs $15–30/month to idle 24/7 for a task that executes once per day. Serverless is not always the right answer. Here, it is the only answer.
The principle: A task that runs on a fixed schedule and needs seconds of compute should never require an idle server or a waiting human.
Architecture Decision Log
Architecture is not a list of what was built. It is a record of what was chosen and what was rejected — and why those tradeoffs were made in this order, for this client, at this stage.
| Decision | Chosen ✓ | Rejected ✗ | Rationale |
|---|---|---|---|
| Static hosting | S3 + bucket policy | EC2-hosted static site | No reason to run compute for files that never change |
| Content protection | S3 versioning day one | No versioning | Accidental overwrites have no recovery path without it |
| Regional resilience | Cross-region replication | Single-region only | One regional outage = total web presence loss |
| Server reproducibility | AMI before second deploy | Manual reconfiguration | A manually built server cannot be rebuilt reliably under pressure |
| Backend access | Bastion host only | Direct SSH + public IP | No backend resource should ever have a direct public route |
| Outbound private traffic | NAT Gateway | Public subnet for EC2s | Private isolation requires outbound-only — not bidirectional |
| Network defence | SGs + NACLs combined | Security groups alone | One misconfigured SG without NACLs = open door |
| Scaling trigger | CPU utilization | Scheduled scaling | Traffic is demand-driven, not time-predictable |
| AZ strategy | Multi-AZ ALB + ASG | Single AZ with more instances | A single AZ is a single point of failure regardless of instance count |
| Reporting automation | Lambda + EventBridge | Cron job on EC2 | Idle compute 24/7 for a 30-second daily task |
Technologies Used
Amazon S3 · Amazon EC2 · AMIs · Amazon VPC · NAT Gateway · Network ACLs · Application Load Balancer · Auto Scaling · AWS Lambda · Amazon SNS · Amazon EventBridge · Amazon RDS
What I'd Add in Production
These are the gaps I would close before calling this production-ready for a real business.
The load balancer is publicly exposed. Without a Web Application Firewall, SQL injection and XSS have no automated defence at the network edge.
In a hardened environment, RDS credentials would be rotated automatically through Secrets Manager, and no application code would contain a hardcoded password (a minimal retrieval sketch follows this list of gaps).
Every API call in the account should be logged. Without CloudTrail, there is no audit trail if something goes wrong or someone does something they shouldn't.
Traffic between EC2 and S3 currently routes through NAT Gateway. VPC Endpoints keep that traffic on the AWS private network and eliminate the NAT cost.
A CloudWatch dashboard putting ALB request count, ASG instance count, Lambda errors, and RDS connections in one place, with alarms publishing to SNS when anything goes out of range.
For a nationally expanding café brand, DDoS protection at the ALB layer moves from a nice-to-have to a business continuity requirement.
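As a sketch of the Secrets Manager gap above, this is roughly what runtime credential retrieval looks like; the secret name and JSON field names are assumptions:

```python
"""Sketch: fetch RDS credentials from Secrets Manager at runtime instead of
shipping a hardcoded password. Secret name and field names are assumptions."""
import json

import boto3

secrets = boto3.client("secretsmanager")


def get_db_credentials(secret_name="cafe-nimbus/rds"):  # hypothetical secret name
    """Return (username, password) from Secrets Manager.

    With rotation enabled, the same call always returns the current
    credentials; the application never needs a redeploy when they change.
    """
    response = secrets.get_secret_value(SecretId=secret_name)
    secret = json.loads(response["SecretString"])
    return secret["username"], secret["password"]
```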
The Result
The infrastructure does not need a human to survive a failure. It does not need a human to handle a traffic spike. And it does not need a human to send the morning sales report. That was the mandate. That is the result.