AWS Auto Scaling

Understand AWS Auto Scaling in SAA-C03 context, including scaling plans, Application Auto Scaling, predictive scaling guidance, EC2 Auto Scaling boundaries, scalable resources, target tracking, scheduled scaling, cost, and traps.

foundation · 7 min read · Updated 2026-06-03
Cloud, Certification, Capacity, Cost, Operations, Tradeoffs
AWS Auto Scaling, Scaling Plan, Application Auto Scaling, Scalable Target, Target Tracking, Predictive Scaling, Scheduled Scaling, Scaling Strategy, CloudWatch Metrics

After this, you will understand

Scaling is a cross-service pattern, and EC2 Auto Scaling is only the EC2 fleet version of that pattern.

Plain version

AWS Auto Scaling has historically coordinated scaling through scaling plans, while Application Auto Scaling provides automatic scaling for many non-EC2 scalable resources.

Decision pressure

Learners mistakenly choose EC2 Auto Scaling for DynamoDB, ECS service count, Aurora replicas, or Lambda provisioned concurrency, or they miss AWS's current guidance to use direct scaling policies in many cases.

Exam-ready model

Use the service-specific scaling mechanism: EC2 Auto Scaling for EC2 fleets, Application Auto Scaling for supported service resources, and scaling plans only when the scenario explicitly points to grouped scaling-plan behavior.

Think before reading

What is the exam-safe distinction between EC2 Auto Scaling and Application Auto Scaling?

EC2 Auto Scaling manages EC2 instance fleets; Application Auto Scaling manages scalable resources in other supported services such as ECS services, DynamoDB capacity, Aurora replicas, and Lambda provisioned concurrency.

Concepts Covered

  • AWS Auto Scaling
  • Scaling plans
  • Application Auto Scaling
  • EC2 Auto Scaling boundaries
  • Scalable targets
  • Target tracking, step scaling, scheduled scaling, and predictive scaling
  • CloudWatch metrics
  • Cross-service scaling decisions
  • Current guidance around scaling plans
  • SAA-C03 traps

1. Plain-English Mental Model

AWS scaling has three closely related ideas:

EC2 Auto Scaling = scale EC2 instance groups
Application Auto Scaling = scale supported non-EC2 service resources
AWS Auto Scaling scaling plans = grouped scaling-plan layer across related resources

The naming is the confusing part.

When the resource is an EC2 fleet, think Amazon EC2 Auto Scaling.

When the resource is an ECS service desired count, DynamoDB table capacity, Aurora replica count, Lambda provisioned concurrency, or another supported service resource, think Application Auto Scaling.

When the exam or docs mention scaling plans, resource discovery, scaling strategies, and groups of related scalable resources, think AWS Auto Scaling scaling plans.
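The three rules above can be sketched as a small lookup. This is only a study aid under stated assumptions: the dictionary keys and the `choose_scaling_service` helper are hypothetical labels, not AWS identifiers.

```python
# Hypothetical resource labels mapped to the scaling service that owns them.
SCALING_SERVICE_BY_RESOURCE = {
    "ec2_fleet": "Amazon EC2 Auto Scaling",
    "ecs_service_desired_count": "Application Auto Scaling",
    "dynamodb_table_capacity": "Application Auto Scaling",
    "aurora_replica_count": "Application Auto Scaling",
    "lambda_provisioned_concurrency": "Application Auto Scaling",
}


def choose_scaling_service(resource_kind: str, grouped_scaling_plan: bool = False) -> str:
    """Return the scaling mechanism to think of for a given resource label."""
    # Scaling plans only apply when the scenario explicitly groups resources.
    if grouped_scaling_plan:
        return "AWS Auto Scaling scaling plans"
    if resource_kind not in SCALING_SERVICE_BY_RESOURCE:
        raise ValueError(f"Unknown resource kind: {resource_kind!r}")
    return SCALING_SERVICE_BY_RESOURCE[resource_kind]


print(choose_scaling_service("ec2_fleet"))
print(choose_scaling_service("dynamodb_table_capacity"))
```

The point of the sketch is the order of the questions: identify the resource first, and reach for scaling plans only when grouping is part of the requirement.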

2. Why This Service Exists

Cloud systems do not have one kind of capacity.

A web application may have EC2 instances, ECS tasks, DynamoDB capacity, Aurora replicas, and Lambda provisioned concurrency. Each resource type has its own shape, limits, and metrics. Scaling only the web tier does not help if DynamoDB throttles or if ECS service count stays fixed.

AWS Auto Scaling and Application Auto Scaling exist because capacity control is a cross-service problem.

The current nuance: AWS documentation still describes scaling plans, but it also recommends using predictive scaling policies directly on EC2 Auto Scaling or Application Auto Scaling resources for certain predictive scaling use cases. So for modern architecture reasoning, do not turn every scaling question into "use AWS Auto Scaling scaling plans." Choose the scaling mechanism that matches the resource and requirement.

3. The Naive Approach And Where It Breaks

The naive pattern is:

scale EC2 instances -> assume whole application scaled

This breaks when another tier is the bottleneck. More EC2 instances can increase pressure on a database, queue, cache, or downstream service.
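A toy model makes the bottleneck argument concrete. It assumes, as a simplification, that end-to-end throughput is capped by the slowest tier; the tier names and request rates are made-up numbers.

```python
def system_throughput(tier_capacity_rps: dict) -> tuple:
    """Return (bottleneck tier, achievable requests per second).

    Simplifying assumption: the slowest tier caps the whole request path.
    """
    bottleneck = min(tier_capacity_rps, key=tier_capacity_rps.get)
    return bottleneck, tier_capacity_rps[bottleneck]


# Hypothetical capacities before and after tripling only the web tier.
before = {"web": 400.0, "database": 500.0}
after = {"web": 1200.0, "database": 500.0}

print(system_throughput(before))
print(system_throughput(after))
```

Tripling the web tier moves the bottleneck to the database, so achievable throughput rises only from 400 to 500 requests per second in this model.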

Another naive pattern is using manual scheduled capacity changes. This works until traffic changes unpredictably or teams forget to update schedules.

A third mistake is thinking every AWS service uses EC2 Auto Scaling. Most do not. ECS service scaling, DynamoDB auto scaling, Aurora replica scaling, and Lambda provisioned concurrency scaling use Application Auto Scaling concepts or service-specific integrations.

The better mental model is to identify the scalable resource first, then choose the scaling policy type and metric.

4. Core Primitives

A scalable target is the resource whose capacity can change. Examples include ECS service desired count, DynamoDB read or write capacity, Aurora replicas, or Lambda provisioned concurrency.
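As a rough illustration, Application Auto Scaling identifies a scalable target by a service namespace, a resource ID, and a scalable dimension, plus minimum and maximum capacity. The sketch below only builds the parameter dictionary that would typically be passed to the `register_scalable_target` API; it makes no AWS call, and the resource names (`example-cluster`, `example-table`, and so on) and the `scalable_target_params` helper are hypothetical.

```python
# (ServiceNamespace, ResourceId, ScalableDimension) per example resource.
# Resource names are placeholders, not real resources.
EXAMPLE_TARGETS = {
    "ecs_service": ("ecs", "service/example-cluster/example-service",
                    "ecs:service:DesiredCount"),
    "dynamodb_reads": ("dynamodb", "table/example-table",
                       "dynamodb:table:ReadCapacityUnits"),
    "aurora_replicas": ("rds", "cluster:example-aurora-cluster",
                        "rds:cluster:ReadReplicaCount"),
    "lambda_concurrency": ("lambda", "function:example-function:live",
                           "lambda:function:ProvisionedConcurrency"),
}


def scalable_target_params(kind: str, min_capacity: int, max_capacity: int) -> dict:
    """Build the parameters describing one scalable target."""
    if min_capacity < 0 or min_capacity > max_capacity:
        raise ValueError("need 0 <= min_capacity <= max_capacity")
    namespace, resource_id, dimension = EXAMPLE_TARGETS[kind]
    return {
        "ServiceNamespace": namespace,
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "MinCapacity": min_capacity,
        "MaxCapacity": max_capacity,
    }


print(scalable_target_params("ecs_service", 2, 10))
```

Note how each service exposes a different capacity shape through its dimension: task count, capacity units, replica count, or provisioned concurrency.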

A scaling policy defines how capacity changes. Target tracking maintains a metric target. Step scaling reacts to alarm thresholds with defined increments. Scheduled scaling changes capacity at known times. Predictive scaling forecasts capacity needs from historical patterns for supported resources.
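Step scaling is the policy type that is easiest to misread, so here is a minimal sketch of the idea: the size of the alarm breach selects a capacity increment. The threshold, the step table, and the `step_adjustment` helper are illustrative assumptions, not the service's exact evaluation logic.

```python
# Each step is (lower_bound, upper_bound, capacity_to_add), where the bounds
# are measured relative to the alarm threshold. None means "no upper bound".
EXAMPLE_STEPS = [
    (0, 10, 1),
    (10, 20, 2),
    (20, None, 4),
]


def step_adjustment(metric_value: float, alarm_threshold: float, steps=EXAMPLE_STEPS) -> int:
    """Return how much capacity to add for a given metric reading."""
    breach = metric_value - alarm_threshold
    if breach < 0:
        return 0  # the alarm is not breached, so no step applies
    for lower, upper, adjustment in steps:
        if breach >= lower and (upper is None or breach < upper):
            return adjustment
    return 0


# With a hypothetical 70% CPU alarm threshold:
for cpu in (60, 75, 85, 95):
    print(cpu, step_adjustment(cpu, 70))
```

Target tracking, by contrast, needs only a target value; the service works out the adjustments itself.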

A CloudWatch metric supplies the signal. The metric must represent utilization or demand in a way that scaling capacity can affect.

AWS Auto Scaling scaling plans can discover scalable resources through tags or CloudFormation stacks and apply scaling strategies.

Application Auto Scaling is the web service used for automatic scaling of many supported resources beyond EC2 Auto Scaling groups.

EC2 Auto Scaling remains the service for EC2 Auto Scaling groups.

5. Architecture Use Cases

Use EC2 Auto Scaling for a stateless EC2 web tier behind an ALB.

Use Application Auto Scaling for an ECS service:

CloudWatch metric -> Application Auto Scaling policy -> ECS desired task count
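The flow above could be expressed as a target tracking policy on the ECS service's desired count. The sketch below only assembles the request parameters that would typically go to the Application Auto Scaling `put_scaling_policy` API; it does not call AWS. The policy name, cluster and service names, the 60% target, and the cooldown values are assumptions chosen for illustration.

```python
def ecs_cpu_target_tracking_params(
    cluster: str,
    service: str,
    target_cpu_percent: float,
    scale_out_cooldown_s: int = 60,
    scale_in_cooldown_s: int = 300,
) -> dict:
    """Build parameters for a CPU target tracking policy on an ECS service."""
    return {
        "PolicyName": f"{service}-cpu-target-tracking",  # hypothetical name
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": float(target_cpu_percent),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ECSServiceAverageCPUUtilization",
            },
            "ScaleOutCooldown": scale_out_cooldown_s,
            "ScaleInCooldown": scale_in_cooldown_s,
        },
    }


policy = ecs_cpu_target_tracking_params("example-cluster", "example-service", 60)
print(policy["ResourceId"], policy["TargetTrackingScalingPolicyConfiguration"]["TargetValue"])
```

The scalable target for the same resource ID and dimension would need to be registered first, with minimum and maximum task counts that bound what this policy can do.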

Use Application Auto Scaling for DynamoDB provisioned capacity when using provisioned mode and workload demand changes.

Use Aurora replica auto scaling when read replica count should change based on read pressure.

Use Lambda provisioned concurrency scaling when a function needs warm capacity that follows expected demand.

Use scheduled scaling when the traffic pattern is known, such as business hours.
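A business-hours schedule might look like the pair of scheduled actions below, which raise the capacity bounds on weekday mornings and lower them in the evening. The dictionaries mirror the parameters of the Application Auto Scaling `put_scheduled_action` API but nothing is sent to AWS; the action names, the 08:00 and 18:00 times, and the `business_hours_actions` helper are assumptions.

```python
def business_hours_actions(
    namespace: str,
    resource_id: str,
    dimension: str,
    day_bounds: tuple,
    night_bounds: tuple,
    timezone: str = "UTC",
) -> list:
    """Return two scheduled actions: weekday scale-up and weekday scale-down."""
    common = {
        "ServiceNamespace": namespace,
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "Timezone": timezone,
    }

    def action(name: str, schedule: str, bounds: tuple) -> dict:
        min_capacity, max_capacity = bounds
        return {
            **common,
            "ScheduledActionName": name,  # hypothetical name
            "Schedule": schedule,
            "ScalableTargetAction": {
                "MinCapacity": min_capacity,
                "MaxCapacity": max_capacity,
            },
        }

    return [
        # Cron fields: minutes hours day-of-month month day-of-week year
        action("business-hours-start", "cron(0 8 ? * MON-FRI *)", day_bounds),
        action("business-hours-end", "cron(0 18 ? * MON-FRI *)", night_bounds),
    ]


actions = business_hours_actions(
    "ecs", "service/example-cluster/example-service",
    "ecs:service:DesiredCount", day_bounds=(6, 20), night_bounds=(2, 6),
)
for a in actions:
    print(a["ScheduledActionName"], a["Schedule"], a["ScalableTargetAction"])
```

Scheduled actions change the minimum and maximum bounds rather than a fixed count, so a dynamic policy can still react inside the window the schedule sets.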

Use predictive scaling when historical patterns are strong and supported by the specific resource.

7. Security Model

Scaling security is mostly about permissions and blast radius.

Operators and automation that can modify scaling policies can affect availability and cost. A bad maximum capacity can create unexpected spend. A bad minimum capacity can make a service unavailable.

Use IAM to restrict who can register scalable targets, change policies, update Auto Scaling groups, or modify CloudWatch alarms.
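One possible shape for such a restriction is an explicit deny on the actions that change scaling configuration, attached to principals that should not touch it. The policy below is a sketch, not a recommended baseline: the statement ID is hypothetical, the action list is a small illustrative subset, and a real policy would be scoped to your own governance model.

```python
import json


def deny_scaling_changes_policy() -> dict:
    """Return an example IAM policy document denying scaling changes."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyScalingConfigurationChanges",  # hypothetical Sid
                "Effect": "Deny",
                "Action": [
                    "application-autoscaling:RegisterScalableTarget",
                    "application-autoscaling:DeregisterScalableTarget",
                    "application-autoscaling:PutScalingPolicy",
                    "application-autoscaling:DeleteScalingPolicy",
                    "autoscaling:UpdateAutoScalingGroup",
                    "cloudwatch:PutMetricAlarm",
                ],
                "Resource": "*",
            }
        ],
    }


print(json.dumps(deny_scaling_changes_policy(), indent=2))
```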

Scaling services need service-linked roles or IAM permissions to adjust target resources. Those roles should be understood and monitored.

CloudTrail records API activity. CloudWatch metrics and alarms provide operational visibility.

For multi-account environments, governance should define who can set scaling limits and which accounts or environments may use aggressive scaling.

8. Reliability And Resilience

Scaling improves resilience only when it matches the real bottleneck and preserves healthy behavior.

If the database is saturated, scaling the web tier may make failure worse.

If the metric is delayed or sparse, scaling may react too late.

If cooldowns and warmups are wrong, the system can oscillate between too much and too little capacity.
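The oscillation risk can be seen in a very small simulation. It assumes a single cooldown that blocks any change for a fixed number of seconds after the last applied change, which is a simplification of how real services treat scale-out and scale-in; the request list and numbers are invented.

```python
def apply_with_cooldown(requests: list, cooldown_s: int, start_capacity: int) -> list:
    """Apply (time_s, delta) scaling requests, skipping those inside the cooldown.

    Returns the list of (time_s, capacity) points where capacity changed.
    """
    capacity = start_capacity
    last_change_s = None
    history = []
    for time_s, delta in requests:
        in_cooldown = last_change_s is not None and time_s - last_change_s < cooldown_s
        if in_cooldown:
            continue  # ignore the request; the last change is still settling
        capacity += delta
        last_change_s = time_s
        history.append((time_s, capacity))
    return history


# A noisy metric asks for +2, -2, +2 within one minute, then -1 much later.
noisy_requests = [(0, +2), (30, -2), (60, +2), (400, -1)]

print(apply_with_cooldown(noisy_requests, cooldown_s=0, start_capacity=4))
print(apply_with_cooldown(noisy_requests, cooldown_s=300, start_capacity=4))
```

Without a cooldown the capacity flaps 6, 4, 6 in the first minute; with a 300 second cooldown only the first and last requests take effect. A cooldown that is too long has the opposite failure: legitimate scale-out requests are ignored as well.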

If maximum capacity is too low, the service cannot grow during a spike. If it is too high, a runaway workload can create cost or downstream pressure.

Scaling policies should be paired with load testing, monitoring, quotas, and failure planning.

9. Performance And Scaling

Target tracking works best when the metric moves inversely with capacity. For example, average CPU per instance usually falls when more instances are added, assuming load spreads evenly.
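The inverse relationship suggests a back-of-the-envelope estimate: if load spreads evenly, the capacity needed to bring the metric back to target is roughly the current capacity scaled by the ratio of metric to target. This is an approximation for reasoning, not the algorithm AWS uses; the function name and numbers are assumptions.

```python
import math


def estimate_desired_capacity(
    current_capacity: int,
    metric_value: float,
    target_value: float,
    min_capacity: int,
    max_capacity: int,
) -> int:
    """Estimate capacity that returns a per-unit metric to its target."""
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    # Proportional estimate, rounded up so the metric lands at or below target.
    raw = math.ceil(current_capacity * metric_value / target_value)
    # Minimum and maximum capacity always bound the result.
    return max(min_capacity, min(raw, max_capacity))


# 4 instances at 90% average CPU with a 60% target: 4 * 90 / 60 = 6.
print(estimate_desired_capacity(4, 90, 60, min_capacity=2, max_capacity=10))
# 10 instances at 30% with the same target: 10 * 30 / 60 = 5.
print(estimate_desired_capacity(10, 30, 60, min_capacity=2, max_capacity=10))
```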

Not every metric is valid for target tracking. A queue depth may not automatically fall per unit of capacity unless you normalize it, such as messages per worker.
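Normalizing queue depth can be sketched as follows. The idea is to track backlog per worker against an acceptable backlog derived from a latency goal; the latency goal, processing time, and helper names are illustrative assumptions.

```python
import math


def backlog_per_worker(visible_messages: int, running_workers: int) -> float:
    """Queue depth divided by workers: a metric that falls as capacity rises."""
    if running_workers <= 0:
        raise ValueError("running_workers must be positive")
    return visible_messages / running_workers


def acceptable_backlog_per_worker(max_latency_s: float, avg_processing_s: float) -> float:
    """Messages one worker can hold in backlog while meeting the latency goal."""
    return max_latency_s / avg_processing_s


def workers_needed(visible_messages: int, acceptable_backlog: float) -> int:
    """Workers required to keep backlog per worker at or below the target."""
    return math.ceil(visible_messages / acceptable_backlog)


# Hypothetical numbers: 1500 queued messages, 10 workers,
# a 10 second latency goal, and 0.5 seconds to process one message.
current = backlog_per_worker(1500, 10)
target = acceptable_backlog_per_worker(10, 0.5)
print(current, target, workers_needed(1500, target))
```

Raw queue depth says nothing about whether 10 or 100 workers are already draining it, while backlog per worker drops as workers are added, which is what target tracking needs.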

Scheduled scaling is simple and effective for predictable patterns, but weak for unexpected spikes.

Predictive scaling can prepare capacity before known recurring peaks, but it depends on historical signal quality and supported resources.

Application Auto Scaling lets each service scale its own capacity shape. ECS tasks, DynamoDB capacity, Aurora replicas, and Lambda provisioned concurrency do not scale like EC2 instances.

10. Cost Model

Auto scaling is often described as cost optimization, but it is really capacity matching.

You save money by reducing idle resources. You protect performance by adding capacity before overload. The balance depends on minimum capacity, target values, cooldowns, maximum capacity, and business tolerance for latency.

Scaling services generally do not add large direct charges by themselves, but the scaled resources do. CloudWatch metrics, alarms, and predictive scaling data access can also matter.

Too-low targets can overprovision capacity. Too-high targets can save money while increasing latency and error risk.
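A quick calculation shows how the target value drives the bill. It assumes load can be expressed as a number of fully busy instances and uses a made-up price of 10 cents per instance-hour; integer arithmetic avoids rounding surprises.

```python
def instances_needed(load_in_busy_instances: int, target_percent: int) -> int:
    """Instances required so average utilization stays at or below the target."""
    if not 0 < target_percent <= 100:
        raise ValueError("target_percent must be in (0, 100]")
    numerator = load_in_busy_instances * 100
    # Integer ceiling division.
    return (numerator + target_percent - 1) // target_percent


def hourly_cost_cents(instances: int, price_cents_per_hour: int) -> int:
    """Hourly cost for a fleet at a flat, hypothetical per-instance price."""
    return instances * price_cents_per_hour


# Load equivalent to 6 fully busy instances, at three different targets.
for target in (40, 60, 80):
    count = instances_needed(6, target)
    print(target, count, hourly_cost_cents(count, 10))
```

In this example a 40% target needs 15 instances and an 80% target needs 8, so the lower target costs almost twice as much per hour in exchange for more headroom.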

Maximum capacity is a cost guardrail. Minimum capacity is a readiness and resilience guardrail.

12. SAA-C03 Exam Signals

"Scale EC2 instances in a group" points to Amazon EC2 Auto Scaling.

"Scale ECS service desired count" points to Application Auto Scaling through ECS service auto scaling.

"Scale DynamoDB provisioned capacity" points to Application Auto Scaling-backed DynamoDB auto scaling.

"Scale Aurora read replicas" points to Application Auto Scaling support for Aurora replicas.

"Group scalable resources by tag or CloudFormation stack and apply a scaling plan" points to AWS Auto Scaling scaling plans.

"Predictable daily spike" can point to scheduled or predictive scaling depending on wording.

"Metric should stay around a target value" points to target tracking.

13. Common Exam Traps

Do not choose EC2 Auto Scaling for non-EC2 scalable resources.

Do not choose a scaling plan just because the phrase "auto scaling" appears. Identify the resource first.

Do not scale on a metric that capacity cannot actually improve.

Do not forget service quotas. Scaling policies cannot exceed quotas or resource limits.

Do not set maximum capacity so low that scaling cannot handle the scenario.

Do not treat predictive scaling as a replacement for dynamic scaling unless the scenario clearly supports that design.

Review Amazon EC2 Auto Scaling before this page so the EC2-specific version is clear.

Next, study Amazon ECS And AWS Fargate, Amazon DynamoDB, and Amazon Aurora to see how service-specific scaling decisions differ.
