Highly Available RDS App
Design a relational AWS application that survives instance and Availability Zone failures using private subnets, RDS Multi-AZ, backups, and application retry behavior.
After this, you will understand
This scenario separates three database requirements that exam answers often blur: availability, read scaling, and recovery from bad data.
Use RDS Multi-AZ for automatic failover, backups for recovery, and read replicas only when reads need to scale.
A common mistake is to treat read replicas as the high availability answer, or to forget that bad writes are replicated to every copy.
Place RDS privately, enable Multi-AZ for failover, keep backups for recovery, and make the application tolerate reconnects.
Think before reading: Why is Multi-AZ not enough to recover from an accidental table drop?
Concepts Covered
- RDS private placement
- Database subnet groups
- Multi-AZ failover
- Read replicas
- Automated backups
- Point-in-time recovery
- Application retry behavior
- Connection pools
- Security groups
- Exam traps around database availability
1. Situation
A production web application uses a relational database. The business requires the app to remain available during a single database instance failure or Availability Zone problem. It also requires recovery from accidental data changes such as a bad migration or deleted records.
The application runs in private subnets behind an Application Load Balancer. The database should not be publicly accessible. The team wants low operational overhead, so running a self-managed database on EC2 is not preferred.
The likely database service is Amazon RDS or Aurora. This scenario focuses on RDS because it is a common SAA-C03 foundation.
2. Naive Design
The naive database design is one RDS DB instance in one Availability Zone:
app instances -> single-AZ RDS DB instance
It may have automated backups, but there is no standby for fast failover. The application assumes database connections never break. The database might be publicly accessible for convenience. The security group might allow access from broad IP ranges.
Another naive design adds a read replica and assumes the system is now highly available. That confuses read scaling with automatic failover.
A third naive design uses Multi-AZ and assumes backups are unnecessary. That confuses availability with data recovery.
3. What Breaks
Single-AZ databases are vulnerable to instance or AZ disruption. If the DB instance is impaired, the app loses its database until the instance recovers or someone restores it manually.
Read replicas do not automatically solve primary failure for classic RDS designs. They are generally asynchronous and designed for read scaling or certain recovery patterns. They need promotion and application routing to become a replacement writer.
Multi-AZ improves availability but does not protect against every data problem. If an application deletes important rows, the deletion is replicated to the standby along with every other write. A standby is not a time machine.
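The difference between a standby and a backup can be illustrated with a toy model. This is only a sketch in plain Python, not how RDS is implemented: `ToyDatabase` and its methods are hypothetical names, and replication is reduced to applying each change to two dictionaries.

```python
import copy


class ToyDatabase:
    """Toy model: a primary, a synchronous standby, and labeled backups."""

    def __init__(self):
        self.primary = {}
        self.standby = {}
        self.backups = {}  # label -> copy of the data at backup time

    def write(self, key, value):
        # Every committed write is applied to both copies.
        self.primary[key] = value
        self.standby[key] = value

    def delete(self, key):
        # A delete is just another change, so the standby applies it too.
        self.primary.pop(key, None)
        self.standby.pop(key, None)

    def take_backup(self, label):
        # A backup freezes the data as it was at this moment.
        self.backups[label] = copy.deepcopy(self.primary)

    def fail_over(self):
        # After failover the app sees whatever the standby holds.
        return dict(self.standby)

    def restore(self, label):
        # A restore returns the data as it was when the backup was taken.
        return copy.deepcopy(self.backups[label])


db = ToyDatabase()
db.write("order-1", "paid")
db.take_backup("before-migration")
db.delete("order-1")  # the accidental change

print("order-1" in db.fail_over())             # False: the standby lost it too
print(db.restore("before-migration")["order-1"])  # paid
```

Failing over returns the same damaged data, while the backup taken before the change still holds the row. That is the whole argument for keeping both controls.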
Application behavior can also break failover. Connection pools may hold stale connections. DNS caching may delay endpoint changes. Long transactions can be interrupted.
The architecture needs database availability, backup recovery, and application retry behavior together.
4. AWS Architecture
Use RDS in private database subnets across multiple Availability Zones.
Create a DB subnet group that includes private database subnets in at least two AZs. Configure RDS Multi-AZ when high availability and managed failover are required.
The application connects to the RDS endpoint, not to a specific underlying host. Security groups allow database traffic only from the application tier security group.
Enable automated backups and choose a retention window that matches recovery requirements. Use manual snapshots before risky changes. Consider point-in-time recovery for accidental changes.
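The placement, failover, and backup settings described so far can be sketched as request parameters. The keys below are parameter names from the RDS `CreateDBSubnetGroup` and `CreateDBInstance` APIs as exposed by boto3; the helper functions, identifiers, engine, instance class, and storage size are illustrative assumptions, not recommendations.

```python
def db_subnet_group_params(name, subnet_ids):
    """Build keyword arguments for creating a DB subnet group."""
    unique_ids = sorted(set(subnet_ids))
    # This check only counts subnets. It cannot confirm that they sit in
    # different Availability Zones; verify that separately.
    if len(unique_ids) < 2:
        raise ValueError("provide private subnets from at least two AZs")
    return {
        "DBSubnetGroupName": name,
        "DBSubnetGroupDescription": "Private database subnets",
        "SubnetIds": unique_ids,
    }


def db_instance_params(identifier, subnet_group_name, db_security_group_id,
                       backup_retention_days=7):
    """Build keyword arguments for a private Multi-AZ DB instance."""
    # A retention period of 0 would disable automated backups.
    if not 1 <= backup_retention_days <= 35:
        raise ValueError("backup retention must be between 1 and 35 days")
    return {
        "DBInstanceIdentifier": identifier,
        "Engine": "postgres",              # assumed engine
        "DBInstanceClass": "db.m6g.large",  # assumed size
        "AllocatedStorage": 100,           # GiB, assumed
        "MasterUsername": "appadmin",      # hypothetical user name
        "ManageMasterUserPassword": True,  # password kept in Secrets Manager
        "DBSubnetGroupName": subnet_group_name,
        "VpcSecurityGroupIds": [db_security_group_id],
        "MultiAZ": True,                   # managed standby and failover
        "PubliclyAccessible": False,       # private placement
        "BackupRetentionPeriod": backup_retention_days,
        "StorageEncrypted": True,
    }


# Possible usage with boto3 (not executed here):
#   rds = boto3.client("rds")
#   rds.create_db_subnet_group(**db_subnet_group_params(
#       "orders-db-subnets", ["subnet-aaa", "subnet-bbb"]))
#   rds.create_db_instance(**db_instance_params(
#       "orders-db", "orders-db-subnets", "sg-db"))
```

Keeping the parameters in plain functions makes the three separate decisions visible in one place: `MultiAZ` for availability, `BackupRetentionPeriod` for recovery, and `PubliclyAccessible` plus the subnet group for private placement.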
If read traffic is a bottleneck, add read replicas and route read-only queries to replica endpoints. Do not use them as a substitute for Multi-AZ unless the question is specifically about manual promotion or disaster recovery planning.
5. Request Or Data Flow
User traffic enters through the Application Load Balancer and reaches app instances in private subnets.
The app connects to the RDS endpoint using database credentials from Secrets Manager or another secure store. The RDS endpoint directs traffic to the current writer.
During normal operation, the primary DB instance handles writes and reads unless the application intentionally sends some reads to read replicas.
If the primary fails in a Multi-AZ deployment, RDS fails over to the standby and repoints the endpoint's DNS record at it. The endpoint name the application uses stays the same, but existing connections to the old primary break. The app must reconnect.
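Automated backups run on the configured schedule and are kept for the retention period. If bad data is written, recovery uses point-in-time restore or snapshots, not the standby. Both restore into a new DB instance, so the application must then be pointed at the restored endpoint.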
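A point-in-time restore can be sketched as the parameters for the RDS `RestoreDBInstanceToPointInTime` API, which boto3 exposes as `restore_db_instance_to_point_in_time`. The helper name, the instance identifiers, and the one-minute safety margin are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def point_in_time_restore_params(source_id, target_id, bad_change_at,
                                 safety_margin=timedelta(minutes=1)):
    """Build keyword arguments that restore to just before a bad change."""
    if bad_change_at.tzinfo is None:
        raise ValueError("bad_change_at must be timezone-aware")
    if target_id == source_id:
        # The restore creates a new instance; it never overwrites the source.
        raise ValueError("target must be a new DB instance identifier")
    restore_time = (bad_change_at - safety_margin).astimezone(timezone.utc)
    return {
        "SourceDBInstanceIdentifier": source_id,
        "TargetDBInstanceIdentifier": target_id,
        "RestoreTime": restore_time,
        "PubliclyAccessible": False,  # keep the restored copy private too
    }


params = point_in_time_restore_params(
    "orders-db",
    "orders-db-restored",
    datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
)
print(params["RestoreTime"].isoformat())  # 2024-01-01T11:59:00+00:00

# Possible usage with boto3 (not executed here):
#   boto3.client("rds").restore_db_instance_to_point_in_time(**params)
```

The restore time has to fall inside the backup retention period, which is why the retention setting is a recovery decision and not just a storage cost.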
6. Security Controls
Do not make the RDS database publicly accessible for a normal private application.
Use private database subnets and security groups. The database security group should allow inbound database port traffic only from the application security group.
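The security group rule described above can be sketched as the parameters for the EC2 `AuthorizeSecurityGroupIngress` API (boto3: `authorize_security_group_ingress`). The helper name and the group IDs are hypothetical, and port 5432 assumes PostgreSQL; MySQL would typically use 3306.

```python
def db_ingress_rule(db_security_group_id, app_security_group_id, port=5432):
    """Allow the database port only from the application security group."""
    return {
        "GroupId": db_security_group_id,
        "IpPermissions": [
            {
                "IpProtocol": "tcp",
                "FromPort": port,
                "ToPort": port,
                # The source is a security group, not an IP range, so only
                # resources attached to the app tier group can connect.
                "UserIdGroupPairs": [{"GroupId": app_security_group_id}],
            }
        ],
    }


rule = db_ingress_rule("sg-db", "sg-app")

# Possible usage with boto3 (not executed here):
#   boto3.client("ec2").authorize_security_group_ingress(**rule)
```

Referencing the application security group instead of a CIDR block means the rule keeps working as app instances are replaced or scaled, and no broad IP range is ever opened.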
Use IAM permissions to restrict who can modify RDS resources. Use database users and roles to restrict what applications can do inside the database.
Encrypt storage with KMS when required. Use TLS from the application to the database when sensitive data or compliance requirements call for it.
Store database credentials in Secrets Manager or Systems Manager Parameter Store. Rotate secrets where appropriate.
Enable logging and monitoring that fit the engine: CloudWatch metrics, enhanced monitoring, slow query logs, Performance Insights, and audit logs where needed.
7. Resilience Controls
RDS Multi-AZ is the main high availability control. It helps with DB instance failure, maintenance events, and some AZ-level failures.
Backups are the recovery control. Automated backups provide point-in-time recovery within the retention window. Manual snapshots support longer retention and pre-change checkpoints.
Use app-level retries with backoff. Make sure the app can reconnect after failover. Tune DNS caching and connection pool behavior so stale connections do not cause extended errors.
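Retry with backoff might look like the sketch below. It assumes a hypothetical `connect` callable that opens a fresh connection to the RDS endpoint and an `operation` callable that uses it; the built-in `ConnectionError` stands in for whatever exception the real database driver raises when a connection drops.

```python
import random
import time


def run_with_reconnect(connect, operation, max_attempts=5,
                       base_delay=0.2, max_delay=5.0, sleep=time.sleep):
    """Run operation on a fresh connection, retrying on connection errors."""
    last_error = None
    for attempt in range(max_attempts):
        try:
            # Open a new connection on every attempt so a stale one left
            # over from before the failover is never reused.
            conn = connect()
            return operation(conn)
        except ConnectionError as exc:
            last_error = exc
            if attempt == max_attempts - 1:
                break
            # Exponential backoff with full jitter, capped at max_delay.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, delay))
    raise last_error


# Simulated failover: the first two connection attempts fail.
attempts = {"count": 0}


def flaky_connect():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("failover in progress")
    return "connection"


result = run_with_reconnect(flaky_connect, lambda conn: conn + " ok",
                            sleep=lambda seconds: None)
print(result)  # connection ok
```

Retrying blindly is only safe for idempotent work. A write that may already have committed before the connection dropped needs its own safeguard, such as an idempotency key, before it is retried.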
For regional disaster recovery, consider cross-Region backups, snapshot copy, cross-Region read replicas, or Aurora global database depending on RTO and RPO.
Test failover. A design that exists only on a diagram is not operational resilience.
8. Performance Controls
Scale the app tier separately from the database. More EC2 instances can increase database connection pressure, so use connection pooling.
Use read replicas when read-heavy traffic is the bottleneck and the app can tolerate replica lag for those reads.
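Routing reads to replicas is an application decision, and one way to make it explicit is a small router like the sketch below. `EndpointRouter` and the hostnames are hypothetical; in practice the values would be the RDS instance endpoint and the read replica endpoints.

```python
import itertools


class EndpointRouter:
    """Pick the writer or a replica endpoint for each query."""

    def __init__(self, writer_endpoint, replica_endpoints):
        self.writer = writer_endpoint
        # Cycle through replicas in order; None means no replicas exist.
        self._replicas = (itertools.cycle(list(replica_endpoints))
                          if replica_endpoints else None)

    def endpoint_for(self, is_write, tolerates_lag=False):
        # Writes, and reads that must see the latest data, go to the writer.
        if is_write or not tolerates_lag or self._replicas is None:
            return self.writer
        # Only lag-tolerant reads are spread across the replicas.
        return next(self._replicas)


router = EndpointRouter(
    "writer.example.internal",
    ["replica-1.example.internal", "replica-2.example.internal"],
)
print(router.endpoint_for(is_write=True))                       # writer.example.internal
print(router.endpoint_for(is_write=False))                      # writer.example.internal
print(router.endpoint_for(is_write=False, tolerates_lag=True))  # replica-1.example.internal
```

Defaulting reads to the writer unless the caller opts in keeps replica lag from surprising code that reads its own writes.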
Tune indexes, queries, and schema before assuming instance size is the only performance lever.
Use caching for repeated reads when staleness is acceptable. ElastiCache can reduce database pressure, but it must not become the only durable copy of data.
Choose storage type, IOPS, and instance class based on workload metrics, not guesswork.
9. Cost Controls
Multi-AZ adds cost, but it buys managed failover. Use it for production workloads with availability requirements.
Read replicas add instance and storage cost. Add them when read scaling or regional read locality is actually required.
Backups and snapshots consume storage. Keep retention aligned with compliance and recovery needs.
Right-size the DB instance. Overprovisioning hides query problems and increases cost. Underprovisioning causes user-facing pain.
Consider Aurora or DynamoDB only when the workload requirements justify them. "Managed" alone is not enough reason to change data models.
10. Exam Variants
"Automatically fail over if the DB instance fails" points to RDS Multi-AZ.
"Scale read-heavy database traffic" points to read replicas.
"Recover to a specific point before accidental deletion" points to automated backups and point-in-time recovery.
"Database must not be internet accessible" points to private subnets and security groups.
"Application gets connection errors during failover" points to retry and reconnection behavior.
"Lowest operational overhead for relational database" points to RDS or Aurora, not self-managed EC2 database.
11. Common Traps
Do not use read replicas as the default automatic failover answer.
Do not assume Multi-AZ standby serves read traffic in classic RDS DB instance deployments.
Do not remove backups because Multi-AZ is enabled.
Do not expose RDS publicly for application convenience.
Do not hardcode a database host address that bypasses the RDS endpoint.
Do not ignore application connection behavior. Failover can still interrupt active sessions.
12. Related Topics
Review Amazon RDS, RDS Multi-AZ vs Read Replicas, and Public Web App On AWS.