AWS Exam Review

Resilient Architecture Trap Drills

Practice SAA-C03 resilience traps around loose coupling, Multi-AZ, failover, read replicas, backups, replication, queues, retries, disaster recovery, RTO, and RPO.

Intermediate | 5 min read | Updated 2026-06-05
Cloud, Certification, Reliability, Operations, Capacity, Tradeoffs

Loose Coupling, Multi-AZ, Backup, Replication, Read Replica, Dead-Letter Queue, RTO, RPO

After this, you will understand

These drills teach resilience as failure-mode matching, not as sprinkling Multi-AZ and queues into every answer.

Plain version

Each drill names a failure or recovery requirement and asks which AWS control actually addresses it.

Decision pressure

Learners confuse read scaling with failover, replication with backup, and queueing with exactly-once processing, or assume that every DR requirement calls for active-active.

Exam-ready model

Name the failure mode, then choose the matching control: redundancy, loose coupling, backup, replication, retry, DLQ, or regional DR.

Think before reading: What is the fastest way to avoid resilience distractors?

Separate availability, scalability, durability, and recovery before choosing the AWS service or feature.

Concepts Covered

Resilience practice drills
Multi-AZ and Multi-Region design
Read replicas and failover
Backup and point-in-time restore
Replication and corruption risk
Queue buffering and DLQs
Stateless workloads
Disaster recovery patterns
RTO and RPO
SAA-C03 distractor patterns

1. Domain Mental Model

Resilience questions are failure-mode matching questions.

Use this mental model:

availability keeps serving
recovery restores good state
scaling handles demand
decoupling absorbs mismatch

If the question asks for automatic database failover, a read replica may be a distractor. If the question asks for recovery after accidental deletion, Multi-AZ may be a distractor. If the question asks for traffic spikes, backups may be irrelevant. If the question asks for low RTO after regional failure, daily snapshots may not be enough.
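The matching habit described above can be sketched as a simple lookup. This is a minimal study aid in Python, not an AWS API; the category names and the `match_control` helper are hypothetical labels chosen for illustration.

```python
# Hypothetical study aid: map a requirement category to the kind of control
# that addresses it. The four categories mirror the mental model above.
CONTROLS = {
    "availability": "redundancy with automatic failover (for example, RDS Multi-AZ)",
    "recovery": "backups, point-in-time restore, or versioning",
    "scaling": "Auto Scaling or read replicas",
    "decoupling": "a queue or event bus between producer and consumer",
}


def match_control(category: str) -> str:
    """Return the control family for a requirement category."""
    try:
        return CONTROLS[category]
    except KeyError:
        raise ValueError(f"unknown requirement category: {category!r}")


if __name__ == "__main__":
    for category in CONTROLS:
        print(f"{category:>12} -> {match_control(category)}")
```

The point of the sketch is the order of operations: classify the requirement first, then look at services. A read replica appears only under scaling, which is why it is a distractor for a failover question.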

2. Official Task Map

This drill page maps to the resilient architecture domain:

scalable and loosely coupled architectures
highly available and fault-tolerant architectures

The exam blends these heavily. A single scenario may mention a load balancer, Auto Scaling group, queue, database, and DR target. The right answer depends on which part is failing or under pressure.

Treat every distractor as solving a different problem.

3. What AWS Is Testing

AWS is testing whether you can choose resilience mechanisms by requirement.

Common tested mechanisms include ALB, NLB, Auto Scaling, SQS, SNS, EventBridge, Step Functions, Lambda retries, DLQs, RDS Multi-AZ, Aurora replicas, read replicas, DynamoDB global tables, S3 versioning, S3 replication, AWS Backup, Route 53 failover, CloudFront, Global Accelerator, and DR patterns.

The exam also tests operational realism. A design can use Multi-AZ and still fail if the app cannot reconnect. A queue can buffer messages and still need idempotent consumers. A backup can exist and still be useless if restore is untested.
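The reconnect point can be illustrated with a retry loop that uses exponential backoff and jitter. This is a minimal sketch under stated assumptions: `FlakyDatabase` is a hypothetical stand-in for an endpoint that is mid-failover, and the delay values are arbitrary, not AWS recommendations.

```python
import random
import time


def connect_with_backoff(connect, max_attempts=5, base_delay=0.1,
                         max_delay=2.0, sleep=time.sleep):
    """Call `connect` until it succeeds, backing off exponentially with jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            # Full jitter: wait a random time up to the capped exponential delay.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))


class FlakyDatabase:
    """Hypothetical endpoint that rejects connections a fixed number of times."""

    def __init__(self, failures_before_success):
        self.remaining_failures = failures_before_success
        self.calls = 0

    def connect(self):
        self.calls += 1
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise ConnectionError("endpoint not ready")
        return "connected"


if __name__ == "__main__":
    db = FlakyDatabase(failures_before_success=3)
    print(connect_with_backoff(db.connect), "after", db.calls, "attempts")
```

An application that gives up after the first connection error stays down even though the database has already failed over, which is the gap the exam scenario is pointing at.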

4. Service And Concept Clusters

Use this cluster map while drilling:

Compute availability: Amazon EC2 Auto Scaling, AWS Auto Scaling, ALB vs NLB vs GWLB
Decoupling: Amazon SQS, Amazon SNS, Amazon EventBridge, SQS vs SNS vs EventBridge
Workflow resilience: AWS Step Functions, Step Functions vs SQS And Lambda Retries
Database resilience: RDS Multi-AZ vs Read Replicas, RDS And Aurora Recovery Choices
Data recovery: AWS Backup, S3 Replication, Backup vs Replication Recovery Design
Regional recovery: Multi-Region Disaster Recovery On AWS, CloudFront vs Global Accelerator

5. Architecture Reasoning Patterns

Use this drill checklist:

1. Is the problem demand, component failure, AZ failure, Region failure, or bad data?
2. Is the workload stateless or stateful?
3. Is the requirement RTO, RPO, availability, throughput, or durability?
4. Does the design need automatic failover or manual restore?
5. Could the chosen mechanism copy bad state?
6. Does the application tolerate retries and reconnects?

For data systems, always separate live continuity from historical recovery.

For async systems, always think about retries, duplicate processing, visibility timeout, DLQ, and idempotency.
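Duplicate delivery and idempotency can be sketched without any AWS calls. The example below is a minimal illustration: the message shape, the `process_once` helper, and the in-memory set of processed IDs are all assumptions. A real consumer would likely need a durable deduplication store rather than a Python set.

```python
def process_once(messages, handler, processed_ids=None):
    """Apply `handler` to each message at most once per message ID."""
    processed_ids = set() if processed_ids is None else processed_ids
    for message in messages:
        if message["id"] in processed_ids:
            continue  # duplicate delivery: skip the side effect
        handler(message)
        processed_ids.add(message["id"])
    return processed_ids


def demo():
    """Simulate at-least-once delivery where one message arrives twice."""
    balance = {"total": 0}

    def apply_payment(message):
        balance["total"] += message["amount"]

    deliveries = [
        {"id": "m-1", "amount": 10},
        {"id": "m-2", "amount": 5},
        {"id": "m-1", "amount": 10},  # redelivered copy of m-1
    ]
    process_once(deliveries, apply_payment)
    return balance["total"]


if __name__ == "__main__":
    print(demo())  # 15, not 25: the duplicate is ignored
```

Without the ID check the payment would be applied twice, which is why a queue alone does not give exactly-once processing.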

For regional DR, always ask whether the standby environment has data, infrastructure, quotas, secrets, DNS, and runbooks.
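The readiness question above can be turned into a quick check. This is a hypothetical sketch: the `missing_items` helper and the dictionary keys are illustrative names taken from the list in the sentence, not an AWS feature.

```python
# Items a standby Region needs before it can take over, per the checklist above.
REQUIRED_ITEMS = ("data", "infrastructure", "quotas", "secrets", "dns", "runbooks")


def missing_items(standby: dict) -> list:
    """Return the required items the standby environment has not confirmed."""
    return [item for item in REQUIRED_ITEMS if not standby.get(item, False)]


if __name__ == "__main__":
    standby = {"data": True, "infrastructure": True, "dns": True}
    print(missing_items(standby))  # ['quotas', 'secrets', 'runbooks']
```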

6. High-Yield Comparisons

Drill 1: RDS primary instance fails.

Wrong instinct: read replica.

Better answer: RDS Multi-AZ for managed automatic failover.

Drill 2: accidental table drop.

Wrong instinct: Multi-AZ standby.

Better answer: point-in-time recovery or snapshot restore.
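Why a standby does not help with Drill 2 can be shown with a toy model. The `ReplicatedStore` class below is hypothetical and greatly simplified: it mirrors every write to a standby and keeps labeled snapshots, which is enough to show that replication copies the mistake while a snapshot preserves an earlier good state.

```python
import copy


class ReplicatedStore:
    """Toy model: writes are mirrored to a standby; snapshots are point-in-time copies."""

    def __init__(self):
        self.primary = {}
        self.standby = {}
        self.snapshots = {}  # label -> frozen copy of the primary

    def put(self, key, value):
        self.primary[key] = value
        self.standby[key] = value

    def delete(self, key):
        self.primary.pop(key, None)
        self.standby.pop(key, None)  # replication faithfully copies the delete

    def snapshot(self, label):
        self.snapshots[label] = copy.deepcopy(self.primary)

    def restore(self, label):
        data = self.snapshots[label]  # raises KeyError for an unknown label
        self.primary = copy.deepcopy(data)
        self.standby = copy.deepcopy(data)


if __name__ == "__main__":
    store = ReplicatedStore()
    store.put("orders", [1, 2, 3])
    store.snapshot("before-drop")
    store.delete("orders")                 # the accidental drop
    print("orders" in store.standby)       # False: the standby lost it too
    store.restore("before-drop")
    print(store.primary["orders"])         # [1, 2, 3]
```

The model restores in place for brevity. In RDS, a snapshot or point-in-time restore creates a new DB instance, so a real recovery also involves repointing the application.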

Drill 3: producer overwhelms processor.

Wrong instinct: bigger instance only.

Better answer: SQS buffer plus scalable consumers when async processing fits.

Drill 4: poison messages keep failing.

Wrong instinct: infinite retries.

Better answer: DLQ and investigation path.
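The DLQ behavior in Drill 4 can be simulated in a few lines. This sketch is loosely modeled on the SQS `maxReceiveCount` redrive setting but is a simplification that runs entirely in memory; the `drain` and `parse` functions are hypothetical.

```python
from collections import deque


def drain(bodies, handler, max_receive_count=3):
    """Process messages; set aside any that fail `max_receive_count` times."""
    main = deque((body, 0) for body in bodies)
    done, dead_letters = [], []
    while main:
        body, receives = main.popleft()
        receives += 1
        try:
            handler(body)
            done.append(body)
        except Exception:
            if receives >= max_receive_count:
                dead_letters.append(body)      # stop retrying, keep for investigation
            else:
                main.append((body, receives))  # becomes visible again for a retry
    return done, dead_letters


def parse(body):
    """Hypothetical handler that can never process the poison message."""
    if body == "poison":
        raise ValueError("cannot parse message")


if __name__ == "__main__":
    print(drain(["a", "poison", "b"], parse))  # (['a', 'b'], ['poison'])
```

With infinite retries the loop would never end for the poison message. Capping the receive count lets healthy messages finish and leaves the failed one somewhere it can be inspected.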

Drill 5: need near-zero regional RTO.

Wrong instinct: backup and restore.

Better answer: multi-site active-active when RTO must be near zero; warm standby when an RTO of minutes is acceptable.
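The trade-off in Drill 5 can be sketched as choosing the cheapest DR pattern that still meets the stated RTO and RPO. The four pattern names follow the common AWS DR strategy list, but the minute values below are illustrative assumptions, not published AWS figures, and `choose_dr_pattern` is a hypothetical helper.

```python
# (pattern, assumed achievable RTO in minutes, assumed achievable RPO in minutes),
# ordered from cheapest to most expensive. The numbers are illustrative only.
PATTERNS = [
    ("backup and restore", 24 * 60, 24 * 60),
    ("pilot light", 60, 15),
    ("warm standby", 15, 5),
    ("multi-site active-active", 1, 1),
]


def choose_dr_pattern(rto_minutes: float, rpo_minutes: float) -> str:
    """Return the cheapest pattern whose assumed RTO and RPO both fit the targets."""
    for name, rto, rpo in PATTERNS:
        if rto <= rto_minutes and rpo <= rpo_minutes:
            return name
    # Nothing cheaper fits, so fall back to the strongest pattern.
    return PATTERNS[-1][0]


if __name__ == "__main__":
    print(choose_dr_pattern(rto_minutes=2880, rpo_minutes=2880))  # backup and restore
    print(choose_dr_pattern(rto_minutes=30, rpo_minutes=10))      # warm standby
    print(choose_dr_pattern(rto_minutes=0.5, rpo_minutes=0.5))    # multi-site active-active
```

Ordering the list by cost reflects the exam logic: pick the simplest pattern that satisfies both targets, and reach for active-active only when nothing cheaper does.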

Drill 6: global static content latency.

Wrong instinct: multi-Region EC2.

Better answer: CloudFront caching when content is cacheable.

7. Scenario Triggers

"Automatically replace unhealthy instances" points to Auto Scaling group health checks.

"Distribute web traffic across AZs" points to load balancer plus targets in multiple AZs.

"Decouple producers and consumers" points to SQS, SNS, EventBridge, or Step Functions.

"Fan out one event to multiple subscribers" points to SNS or EventBridge depending on event-routing needs.

"Recover to last known good point" points to backups, PITR, snapshots, or versioning.

"Scale reads from relational database" points to read replicas.

"Automatic database failover" points to Multi-AZ or Aurora HA behavior.

"Regional failover with DNS" points to Route 53 health checks and routing policies.

"Low RPO global NoSQL table" may point to DynamoDB global tables.

8. Common Traps

Do not call read replicas the default high availability answer.

Do not call replication a backup strategy by itself.

Do not assume Multi-AZ protects against bad writes.

Do not ignore app reconnection behavior during database failover.

Do not assume SQS prevents duplicate processing.

Do not forget DLQs for failed asynchronous work.

Do not place all public and private subnets in a single AZ and call the design highly available.

Do not choose active-active multi-Region when the stated RTO allows simpler DR.

Do not forget restore testing.

Do not ignore quotas in DR accounts and Regions.

9. Study Path

Study and drill in this order:

Repeat the drills until each phrase maps to a failure mode.

Review Design Resilient Architectures, Secure Architecture Trap Drills, High-Performing Architecture Trap Drills, and Cost-Optimized Architecture Trap Drills.
