AWS Exam Review
Resilient Architecture Trap Drills
Practice SAA-C03 resilience traps around loose coupling, Multi-AZ, failover, read replicas, backups, replication, queues, retries, disaster recovery, RTO, and RPO.
After this, you will understand
These drills teach resilience as failure-mode matching, not as sprinkling Multi-AZ and queues into every answer.
Each drill names a failure or recovery requirement and asks which AWS control actually addresses it.
Learners confuse read scaling with failover, replication with backup, queueing with exactly-once processing, or active-active with every DR requirement.
Name the failure mode, then choose the matching control: redundancy, loose coupling, backup, replication, retry, DLQ, or regional DR.
Think before reading: What is the fastest way to avoid resilience distractors?
Concepts Covered
- Resilience practice drills
- Multi-AZ and Multi-Region design
- Read replicas and failover
- Backup and point-in-time restore
- Replication and corruption risk
- Queue buffering and DLQs
- Stateless workloads
- Disaster recovery patterns
- RTO and RPO
- SAA-C03 distractor patterns
1. Domain Mental Model
Resilience questions are failure-mode matching questions.
Use this mental model:
- Availability keeps serving.
- Recovery restores good state.
- Scaling handles demand.
- Decoupling absorbs mismatch.
If the question asks for automatic database failover, a read replica may be a distractor. If the question asks for recovery after accidental deletion, Multi-AZ may be a distractor. If the question asks for traffic spikes, backups may be irrelevant. If the question asks for low RTO after regional failure, daily snapshots may not be enough.
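The matching step above can be sketched as a small lookup in which the failure mode named in the question selects the control family. This is a study aid only: the `CONTROL_FOR` and `DISTRACTOR_FOR` tables and both function names are hypothetical, built from the examples on this page rather than from any AWS API.

```python
# Illustrative study aid: map the failure mode a question describes
# to the control family that addresses it. All names are hypothetical.
CONTROL_FOR = {
    "primary database instance fails": "RDS Multi-AZ automatic failover",
    "accidental deletion or bad write": "backup, snapshot, or point-in-time restore",
    "traffic spike": "Auto Scaling behind a load balancer",
    "producer outpaces consumer": "SQS queue with scalable consumers",
    "regional outage with low RTO": "warm standby or active-active DR",
}

# The answer that looks plausible but solves a different problem.
DISTRACTOR_FOR = {
    "primary database instance fails": "read replica",
    "accidental deletion or bad write": "Multi-AZ standby",
    "traffic spike": "backups",
    "regional outage with low RTO": "daily snapshots",
}


def pick_control(failure_mode: str) -> str:
    """Return the matching control, or a reminder to name the failure mode."""
    return CONTROL_FOR.get(failure_mode, "name the failure mode first")


def is_distractor(failure_mode: str, answer: str) -> bool:
    """True when the answer solves a different problem than the one asked."""
    return DISTRACTOR_FOR.get(failure_mode) == answer
```

Calling `pick_control` with a phrase that is not in the table returns the reminder string, which mirrors the drill habit: if the failure mode cannot be named, no control can be chosen yet.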
2. Official Task Map
This drill page maps to the resilient architecture domain:
- scalable and loosely coupled architectures
- highly available and fault-tolerant architectures
The exam blends these heavily. A single scenario may mention a load balancer, Auto Scaling group, queue, database, and DR target. The right answer depends on which part is failing or under pressure.
Treat every distractor as solving a different problem.
3. What AWS Is Testing
AWS is testing whether you can choose resilience mechanisms by requirement.
Common tested mechanisms include ALB, NLB, Auto Scaling, SQS, SNS, EventBridge, Step Functions, Lambda retries, DLQs, RDS Multi-AZ, Aurora replicas, read replicas, DynamoDB global tables, S3 versioning, S3 replication, AWS Backup, Route 53 failover, CloudFront, Global Accelerator, and DR patterns.
The exam also tests operational realism. A design can use Multi-AZ and still fail if the app cannot reconnect. A queue can buffer messages and still need idempotent consumers. A backup can exist and still be useless if restore is untested.
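The reconnect point can be illustrated with a minimal retry loop. This sketch assumes the application reaches the database through a zero-argument `connect` callable that raises `ConnectionError` while the endpoint is unavailable; `connect_with_backoff`, `flaky_connect`, and the delay values are hypothetical and stand in for a real driver and real tuning.

```python
import random
import time


def connect_with_backoff(connect, max_attempts=5, base_delay=0.5, max_delay=8.0,
                         sleep=time.sleep):
    """Call connect() until it succeeds, backing off between attempts.

    `connect` is any zero-argument callable that raises ConnectionError
    on failure; it stands in for a real database driver call.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # give up and surface the failure to the caller
            # Exponential backoff capped at max_delay, with full jitter
            # so many clients do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))


# Simulated failover: the first two attempts fail, the third succeeds.
attempts = {"count": 0}


def flaky_connect():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("endpoint not ready")
    return "connected"


# The sleep function is stubbed out so the example runs instantly.
result = connect_with_backoff(flaky_connect, sleep=lambda _: None)
```

An application that opens one connection at startup and never retries would stay broken after a failover even though the database itself recovered, which is the gap this kind of loop is meant to close.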
4. Service And Concept Clusters
Use this cluster map while drilling:
- Compute availability: Amazon EC2 Auto Scaling, AWS Auto Scaling, ALB vs NLB vs GWLB
- Decoupling: Amazon SQS, Amazon SNS, Amazon EventBridge, SQS vs SNS vs EventBridge
- Workflow resilience: AWS Step Functions, Step Functions vs SQS And Lambda Retries
- Database resilience: RDS Multi-AZ vs Read Replicas, RDS And Aurora Recovery Choices
- Data recovery: AWS Backup, S3 Replication, Backup vs Replication Recovery Design
- Regional recovery: Multi-Region Disaster Recovery On AWS, CloudFront vs Global Accelerator
5. Architecture Reasoning Patterns
Use this drill checklist:
1. Is the problem demand, component failure, AZ failure, Region failure, or bad data?
2. Is the workload stateless or stateful?
3. Is the requirement RTO, RPO, availability, throughput, or durability?
4. Does the design need automatic failover or manual restore?
5. Could the chosen mechanism copy bad state?
6. Does the application tolerate retries and reconnects?
For data systems, always separate live continuity from historical recovery.
For async systems, always think about retries, duplicate processing, visibility timeout, DLQ, and idempotency.
For regional DR, always ask whether the standby environment has data, infrastructure, quotas, secrets, DNS, and runbooks.
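The async checklist above (retries, duplicate processing, DLQ, idempotency) can be sketched with an in-memory simulation. It does not call SQS; the `drain` and `charge` functions, the message shape, and the `MAX_RECEIVE_COUNT` value are assumptions made for illustration, modeled loosely on a queue with a redrive policy.

```python
from collections import deque

MAX_RECEIVE_COUNT = 3  # assumed redrive threshold for this sketch


def drain(messages, handler):
    """Process messages with at-least-once semantics.

    Returns (processed_ids, dead_letters). A message that keeps failing
    is moved to the dead-letter list after MAX_RECEIVE_COUNT receives.
    """
    queue = deque({"id": m["id"], "body": m["body"], "receives": 0} for m in messages)
    processed_ids = set()  # idempotency record; a real system would persist this
    dead_letters = []

    while queue:
        msg = queue.popleft()
        msg["receives"] += 1
        if msg["id"] in processed_ids:
            continue  # duplicate delivery: acknowledge without repeating the side effect
        try:
            handler(msg["body"])
            processed_ids.add(msg["id"])
        except ValueError:
            if msg["receives"] >= MAX_RECEIVE_COUNT:
                dead_letters.append(msg)  # park for investigation
            else:
                queue.append(msg)  # becomes visible again and is retried
    return processed_ids, dead_letters


charged = []


def charge(body):
    """Hypothetical side effect that must not happen twice per message."""
    if body == "corrupt":
        raise ValueError("cannot parse payload")
    charged.append(body)


inbox = [
    {"id": "a", "body": "order-1"},
    {"id": "a", "body": "order-1"},  # same message delivered twice
    {"id": "b", "body": "corrupt"},  # poison message
]
done, dlq = drain(inbox, charge)
```

The duplicate of message `a` is acknowledged without charging again, and the poison message lands in `dlq` after three receives instead of retrying forever. The queue supplies buffering and retries; the consumer supplies idempotency.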
6. High-Yield Comparisons
Drill 1: RDS primary instance fails.
Wrong instinct: read replica.
Better answer: RDS Multi-AZ for managed automatic failover.
Drill 2: accidental table drop.
Wrong instinct: Multi-AZ standby.
Better answer: point-in-time recovery or snapshot restore.
Drill 3: producer overwhelms processor.
Wrong instinct: bigger instance only.
Better answer: SQS buffer plus scalable consumers when async processing fits.
Drill 4: poison messages keep failing.
Wrong instinct: infinite retries.
Better answer: DLQ and investigation path.
Drill 5: need near-zero regional RTO.
Wrong instinct: backup and restore.
Better answer: multi-site active-active when the RTO is truly near zero, or warm standby when the stated RTO allows minutes of recovery.
Drill 6: global static content latency.
Wrong instinct: multi-Region EC2.
Better answer: CloudFront caching when content is cacheable.
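Drill 5 comes down to picking the simplest DR pattern that still meets the stated RTO. The sketch below encodes that idea with made-up thresholds: the minute values in `DR_STRATEGIES` are assumptions chosen only to preserve the usual ordering (backup and restore is slowest, then pilot light, then warm standby, then multi-site active-active), and `cheapest_strategy` is a hypothetical helper, not an AWS tool.

```python
# Illustrative thresholds only; real recovery times depend on the workload.
# Ordered from simplest/slowest recovery to most complex/fastest recovery.
DR_STRATEGIES = [
    ("backup and restore", 24 * 60),   # assumed: recovery measured in hours
    ("pilot light", 60),               # assumed: tens of minutes
    ("warm standby", 10),              # assumed: minutes
    ("multi-site active-active", 0),   # assumed: near zero
]


def cheapest_strategy(rto_minutes: float) -> str:
    """Return the first (simplest) strategy whose assumed recovery time fits the RTO."""
    for name, assumed_recovery_minutes in DR_STRATEGIES:
        if assumed_recovery_minutes <= rto_minutes:
            return name
    return "multi-site active-active"
```

With these assumed numbers, `cheapest_strategy(120)` returns `"pilot light"` and `cheapest_strategy(1)` returns `"multi-site active-active"`. The loop stops at the first pattern that fits, so a generous RTO never selects active-active.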
7. Scenario Triggers
"Automatically replace unhealthy instances" points to Auto Scaling group health checks.
"Distribute web traffic across AZs" points to load balancer plus targets in multiple AZs.
"Decouple producers and consumers" points to SQS, SNS, EventBridge, or Step Functions.
"Fan out one event to multiple subscribers" points to SNS or EventBridge depending on event-routing needs.
"Recover to last known good point" points to backups, PITR, snapshots, or versioning.
"Scale reads from relational database" points to read replicas.
"Automatic database failover" points to Multi-AZ or Aurora HA behavior.
"Regional failover with DNS" points to Route 53 health checks and routing policies.
"Low RPO global NoSQL table" may point to DynamoDB global tables.
8. Common Traps
Do not call read replicas the default high availability answer.
Do not call replication a backup strategy by itself.
Do not assume Multi-AZ protects against bad writes.
Do not ignore app reconnection behavior during database failover.
Do not assume SQS prevents duplicate processing.
Do not forget DLQs for failed asynchronous work.
Do not place public and private subnets in only one AZ when the design must be highly available.
Do not choose active-active multi-Region when the stated RTO allows simpler DR.
Do not forget restore testing.
Do not ignore quotas in DR accounts and Regions.
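The replication and bad-write traps above can be made concrete with a toy model. The `Store` class is hypothetical and runs entirely in memory; it assumes a replica that mirrors every write and a snapshot that captures state at a point in time, which is a simplification of how managed services behave.

```python
import copy


class Store:
    """Toy data store with a mirrored replica and point-in-time snapshots."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.snapshots = []  # list of (label, deep copy of primary)

    def write(self, key, value):
        self.primary[key] = value
        self.replica[key] = value      # replication copies every write

    def delete(self, key):
        self.primary.pop(key, None)
        self.replica.pop(key, None)    # including destructive ones

    def snapshot(self, label):
        self.snapshots.append((label, copy.deepcopy(self.primary)))

    def restore(self, label):
        for name, data in self.snapshots:
            if name == label:
                self.primary = copy.deepcopy(data)
                self.replica = copy.deepcopy(data)
                return
        raise KeyError(label)


store = Store()
store.write("orders", ["o-1", "o-2"])
store.snapshot("before-deploy")
store.delete("orders")                 # accidental drop

replica_has_orders = "orders" in store.replica  # the replica lost the data too
store.restore("before-deploy")                  # only the snapshot brings it back
```

After the accidental delete, the replica is just as empty as the primary, so failing over to it recovers nothing. The snapshot taken earlier is what restores the last known good state.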
9. Study Path
Study and drill in this order:
- Design Resilient Architectures
- Amazon EC2 Auto Scaling
- ALB vs NLB vs GWLB
- SQS vs SNS vs EventBridge
- Step Functions vs SQS And Lambda Retries
- Highly Available RDS App
- Backup vs Replication Recovery Design
- Multi-Region Disaster Recovery On AWS
- Event-Driven Order Processing
- Static Site With CloudFront And S3
Repeat the drills until each phrase maps to a failure mode.
10. Related Topics
Review Design Resilient Architectures, Secure Architecture Trap Drills, High-Performing Architecture Trap Drills, and Cost-Optimized Architecture Trap Drills.