Amazon Kinesis Data Streams

Understand Kinesis Data Streams for real-time streaming ingestion, including streams, shards, producers, consumers, retention, enhanced fan-out, and exam traps.

Level: foundation | 5 min read | Updated 2026-06-03

Topics: Cloud, Certification, Data, Capacity, Operations

Key terms: Data Stream, Shard, Producer, Consumer, Partition Key, Sequence Number, Retention, Enhanced Fan-Out

After this, you will understand

Kinesis makes streaming architecture concrete: continuous events need ordered shards, replay windows, and consumers, not just a queue.

Plain version

Kinesis Data Streams captures and stores real-time records so multiple consumers can process streaming data.

Decision pressure

Learners use SQS for every asynchronous workload and miss streaming, ordered partitioned records, replay, and multiple independent consumers.

Exam-ready model

Use Kinesis Data Streams when producers generate continuous event records that need low-latency processing, replay, and shard-based scaling.

Think before reading

What is the clean difference between SQS and Kinesis Data Streams?

SQS is a queue for decoupled work; Kinesis Data Streams is a streaming log with ordered shard records and replayable retention.

Concepts Covered

Streaming data
Kinesis Data Streams
Streams and shards
Producers and consumers
Partition keys
Sequence numbers
Retention and replay
Enhanced fan-out
Kinesis versus SQS, EventBridge, Firehose, and MSK
SAA-C03 streaming traps

1. Plain-English Mental Model

Amazon Kinesis Data Streams is a managed streaming data service.

The simple model is:

producers -> stream shards -> ordered records -> consumers

A stream stores records for a retention period. Producers write records. Consumers read records. Records with the same partition key go to the same shard, preserving order within that shard.

This is different from a queue. A queue usually delivers a message to one worker and then removes it after processing. A stream keeps records for a window so multiple consumers can read and replay them.
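The queue-versus-stream distinction can be sketched with a toy in-memory log. This is only an illustration of the idea, not the Kinesis API: the `ToyStream` class and its methods are hypothetical names, and a list index stands in for a sequence number.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStream:
    """In-memory stand-in for one shard: an append-only list of records."""
    records: list = field(default_factory=list)

    def put(self, data: str) -> int:
        """Append a record and return its position (a toy sequence number)."""
        self.records.append(data)
        return len(self.records) - 1

    def read(self, position: int, limit: int = 10) -> tuple[list, int]:
        """Return records starting at `position` plus the next position to read.

        Reading never deletes anything, so any consumer can start anywhere.
        """
        batch = self.records[position:position + limit]
        return batch, position + len(batch)


stream = ToyStream()
for event in ["click-1", "click-2", "click-3"]:
    stream.put(event)

# Two consumers read the same records and track their own positions.
metrics_batch, metrics_pos = stream.read(0)
archive_batch, archive_pos = stream.read(0)

# Replay: a consumer rewinds to position 1 and reads again.
replayed, _ = stream.read(1)
```

A queue would have handed each record to one worker and removed it; here both consumers see every record, and a rewind is just a smaller starting position.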

Kinesis Data Streams is useful when data is continuous and time-sensitive: clickstreams, IoT telemetry, logs, metrics, fraud events, game events, and real-time analytics.

2. Why This Service Exists

Some systems generate events faster than batch jobs can handle.

Click events arrive continuously. Devices send readings every second. Applications emit logs and metrics. Fraud detection wants to inspect transactions quickly. Analytics teams want near-real-time dashboards.

Kinesis Data Streams exists to ingest, buffer, order, and fan out these records to streaming consumers.

For SAA-C03, it appears in questions about real-time streaming ingestion, producers and consumers, shard scaling, ordered records, replay, retention windows, enhanced fan-out, and multiple independent consumers.

The common boundary: Kinesis Data Streams stores ordered, replayable records for consumers you run or configure. Kinesis Data Firehose (since renamed Amazon Data Firehose) delivers streaming data to destinations with less custom consumer code. SQS queues units of work. EventBridge routes events to targets. MSK provides managed Apache Kafka.

3. The Naive Approach And Where It Breaks

The naive pattern is a database write per event:

every click -> write directly to analytics database

This breaks when bursts overwhelm the database, consumers need replay, multiple teams need the same event stream, or producers should not know every downstream destination.

Another naive pattern is using SQS for every event workload. SQS is excellent for decoupled work queues, but it is not designed as a replayable ordered event log with multiple consumers reading the same record stream.

Another mistake is choosing Kinesis when the requirement is simply to deliver records to S3 with minimal processing. Firehose may be the lower-operational-overhead option.

Choose Kinesis Data Streams when you need control over stream processing: your own consumers, replay, per-key ordering, and multiple independent readers.

4. Core Primitives

A data stream is the named stream resource.

A record is the data unit written to a stream.

A shard is an ordered sequence of records and the unit of read and write capacity in a stream.

A partition key determines which shard receives a record.

A sequence number identifies record order within a shard.

Producers write records to the stream.

Consumers read records from the stream.

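Retention controls how long records remain available for replay. The default is 24 hours, and it can be extended up to 365 days.

Enhanced fan-out gives each registered consumer its own dedicated read throughput per shard (2 MB per second), instead of sharing the shard's read capacity with other consumers.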

On-demand and provisioned capacity modes support different scaling models.
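To make the partition key primitive concrete, here is a minimal sketch of how a key could map to a shard. Kinesis hashes the partition key with MD5 into a 128-bit integer and places the record in the shard whose hash key range contains that value. The sketch assumes the shards split the hash space into equal ranges, which is not guaranteed after resharding, and `shard_index_for_key` is a hypothetical helper, not an AWS API.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def shard_index_for_key(partition_key: str, shard_count: int) -> int:
    """Map a partition key to a shard index, assuming equal hash key ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    range_size = HASH_SPACE // shard_count
    # Clamp so values at the very top of the hash space land in the last shard.
    return min(hash_value // range_size, shard_count - 1)


first = shard_index_for_key("user-42", 4)
second = shard_index_for_key("user-42", 4)
```

The same key always yields the same index, which is why records that share a partition key stay in order relative to each other.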

5. Architecture Use Cases

Use Kinesis Data Streams for clickstream ingestion:

web app -> Kinesis Data Streams -> Lambda, analytics consumer, S3 delivery pipeline

Use it for IoT telemetry that needs near-real-time processing.

Use it for fraud or anomaly detection pipelines where a consumer evaluates records quickly.

Use multiple consumers when one team needs real-time metrics, another needs durable storage, and another needs alerting.

Use Lambda event source mappings for serverless stream processing.

Use Firehose when the goal is managed delivery to S3, Redshift, OpenSearch, or other supported destinations with less consumer management.
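For the Lambda event source mapping case, a handler might look roughly like the sketch below. It relies on the event shape Lambda uses for Kinesis batches (`Records[].kinesis.data` is base64-encoded, alongside `partitionKey` and `sequenceNumber`). The assumption that each payload is JSON, and the shape of the returned list, are illustrative choices rather than anything the service requires.

```python
import base64
import json


def handler(event, context):
    """Decode each Kinesis record in a Lambda batch and return the payloads."""
    decoded = []
    for record in event["Records"]:
        kinesis = record["kinesis"]
        # Lambda delivers the record data base64-encoded.
        payload = json.loads(base64.b64decode(kinesis["data"]))
        decoded.append({
            "partition_key": kinesis["partitionKey"],
            "sequence_number": kinesis["sequenceNumber"],
            "payload": payload,
        })
    return decoded


# A hand-built sample event for local testing (values are made up).
sample_event = {
    "Records": [{
        "kinesis": {
            "partitionKey": "user-42",
            "sequenceNumber": "1",
            "data": base64.b64encode(json.dumps({"action": "click"}).encode()).decode(),
        }
    }]
}
result = handler(sample_event, None)
```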

7. Security Model

Kinesis security includes IAM, encryption, network access, and consumer permissions.

Producers need permission to put records. Consumers need permission to read records.
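The producer and consumer split can be sketched as two least-privilege IAM policies, built here as Python dictionaries. The stream ARN, account ID, and stream name are hypothetical placeholders, and the action lists are one plausible minimum for shared-throughput readers; enhanced fan-out consumers would need additional actions such as `kinesis:SubscribeToShard`.

```python
import json

# Hypothetical ARN: region, account ID, and stream name are placeholders.
STREAM_ARN = "arn:aws:kinesis:us-east-1:123456789012:stream/example-clickstream"


def allow_policy(actions: list[str], resource: str) -> dict:
    """Build a single-statement IAM policy document that allows `actions`."""
    return {
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Action": actions, "Resource": resource}],
    }


# Producers only write.
producer_policy = allow_policy(
    ["kinesis:PutRecord", "kinesis:PutRecords"], STREAM_ARN
)

# Consumers only discover shards and read.
consumer_policy = allow_policy(
    [
        "kinesis:DescribeStreamSummary",
        "kinesis:ListShards",
        "kinesis:GetShardIterator",
        "kinesis:GetRecords",
    ],
    STREAM_ARN,
)

print(json.dumps(producer_policy, indent=2))
```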

Encryption at rest can use KMS.

Applications can access Kinesis through AWS service endpoints, and VPC interface endpoints can keep traffic private where supported.

Records may contain sensitive data. Treat streams as data stores with access control, retention policy, and audit needs.

CloudTrail records management actions, while application-level record access and processing need observability through metrics and logs.

8. Reliability And Resilience

Kinesis Data Streams stores records durably across multiple Availability Zones in a Region during the retention period.

Consumers can checkpoint progress and replay from a known position.

Producer retry logic matters. Duplicate records can happen, so consumers should be idempotent where needed.
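One way to make a consumer idempotent is to deduplicate on an ID the producer assigns to each event. A retried write gets a new sequence number, so the sequence number alone would not catch it. In this sketch the `event_id`, `user`, and `amount` fields are hypothetical, and the in-memory set stands in for what would normally be a durable store.

```python
class IdempotentProcessor:
    """Apply each event at most once, keyed by a producer-assigned ID."""

    def __init__(self):
        self.seen_ids = set()  # a real consumer would persist this
        self.totals = {}

    def process(self, event: dict) -> bool:
        """Apply the event; return False if it was a duplicate and skipped."""
        if event["event_id"] in self.seen_ids:
            return False
        self.seen_ids.add(event["event_id"])
        user = event["user"]
        self.totals[user] = self.totals.get(user, 0) + event["amount"]
        return True


processor = IdempotentProcessor()
events = [
    {"event_id": "e1", "user": "alice", "amount": 10},
    {"event_id": "e2", "user": "alice", "amount": 5},
    {"event_id": "e1", "user": "alice", "amount": 10},  # duplicate from a retry
]
applied = [processor.process(e) for e in events]
```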

Retention duration must match recovery needs. If a consumer is down longer than retention, unread records may expire.

Shard hot spots can happen if partition keys are poorly distributed.

Monitor iterator age, write throttles, read throttles, consumer errors, and shard utilization.

9. Performance And Scaling

Kinesis scaling is shard-based in provisioned mode.

Each shard has fixed capacity limits: writes of up to 1 MB per second or 1,000 records per second, and shared reads of up to 2 MB per second. More shards increase capacity and parallelism.
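A rough provisioned-mode sizing calculation follows from the per-shard limits. The sketch assumes the commonly documented figures of 1 MB per second or 1,000 records per second for writes and 2 MB per second for shared reads; check current AWS quotas before relying on them. `estimate_shards` is a hypothetical helper name.

```python
import math

# Assumed per-shard limits; verify against current AWS quotas.
WRITE_MB_PER_SHARD = 1.0
WRITE_RECORDS_PER_SHARD = 1000
READ_MB_PER_SHARD = 2.0


def estimate_shards(write_mb_per_sec: float, records_per_sec: float,
                    read_mb_per_sec: float) -> int:
    """Return the smallest shard count that satisfies all three limits."""
    by_write_bytes = math.ceil(write_mb_per_sec / WRITE_MB_PER_SHARD)
    by_write_records = math.ceil(records_per_sec / WRITE_RECORDS_PER_SHARD)
    by_read_bytes = math.ceil(read_mb_per_sec / READ_MB_PER_SHARD)
    return max(1, by_write_bytes, by_write_records, by_read_bytes)


# 3.5 MB/s in, 2,500 records/s, 7 MB/s out across shared consumers.
needed = estimate_shards(3.5, 2500, 7.0)  # 4

# Many small records: the record-count limit dominates.
needed_small = estimate_shards(0.2, 5000, 0.4)  # 5
```

The binding constraint is whichever limit needs the most shards, which is why small, frequent records can require more shards than their byte volume suggests.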

On-demand mode reduces capacity planning for variable workloads.

Partition key design controls distribution. A single hot partition key can overload one shard while other shards are idle.
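The hot-key effect, and one common mitigation, can be illustrated with a toy mapping. The sketch assumes shards cover equal MD5 hash ranges, and `salted_key` is a hypothetical helper that appends a rotating suffix to spread one busy key across several partition keys. The trade-off is that records for that key are no longer ordered relative to each other across suffixes.

```python
import hashlib
from collections import Counter


def shard_for(partition_key: str, shard_count: int) -> int:
    """Toy mapping: MD5 hash of the key, split into equal ranges."""
    value = int.from_bytes(hashlib.md5(partition_key.encode()).digest(), "big")
    return min(value // (2 ** 128 // shard_count), shard_count - 1)


def salted_key(base_key: str, record_index: int, buckets: int) -> str:
    """Append a rotating suffix so one hot key becomes `buckets` distinct keys."""
    return f"{base_key}#{record_index % buckets}"


SHARDS = 4

# Every record uses the same key, so every record lands on one shard.
hot = Counter(shard_for("tenant-big", SHARDS) for _ in range(1000))

# The same traffic with 16 salt buckets spreads over more than one shard.
spread = Counter(
    shard_for(salted_key("tenant-big", i, 16), SHARDS) for i in range(1000)
)
```

Comparing `hot` and `spread` shows the load moving from a single shard to several; whether that is acceptable depends on how much per-key ordering the consumers need.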

Enhanced fan-out helps when multiple consumers need high read throughput from the same stream.

Batching records improves producer efficiency.
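Batching can be sketched with the `put_records` call, which accepts many records in one request. The function takes the client as a parameter so it can run here against a small stand-in; with boto3 you would pass `boto3.client("kinesis")` instead. The 500-record batch size reflects the commonly documented per-request limit and is an assumption to verify, and the stream name and event fields are hypothetical.

```python
import json


def chunk(items: list, size: int) -> list[list]:
    """Split `items` into consecutive lists of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def send_batches(client, stream_name: str, events: list[dict],
                 max_records: int = 500) -> int:
    """Send events in batches via put_records; return the failed record count."""
    failed = 0
    for batch in chunk(events, max_records):
        response = client.put_records(
            StreamName=stream_name,
            Records=[
                {"Data": json.dumps(e).encode("utf-8"),
                 "PartitionKey": str(e["user"])}
                for e in batch
            ],
        )
        failed += response["FailedRecordCount"]
    return failed


class FakeKinesisClient:
    """Stand-in for a boto3 Kinesis client so the sketch runs offline."""

    def __init__(self):
        self.batch_sizes = []

    def put_records(self, StreamName, Records):
        self.batch_sizes.append(len(Records))
        return {"FailedRecordCount": 0, "Records": [{} for _ in Records]}


client = FakeKinesisClient()
events = [{"user": i % 7, "action": "click"} for i in range(1200)]
failed = send_batches(client, "example-clickstream", events)
```

A real producer would also inspect the per-record entries in the response and resend those carrying an error code, since `put_records` can partially fail. Those retries are one source of the duplicates that consumers need to tolerate.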

10. Cost Model

Kinesis cost depends on capacity mode, shard hours or on-demand throughput, PUT payload units, enhanced fan-out consumers, extended retention, and data transfer.

Provisioned mode can be efficient for predictable traffic but requires shard management.

On-demand mode is simpler for unpredictable workloads, and it bills mainly by the volume of data ingested and retrieved rather than by provisioned shard hours.

Firehose may be cheaper operationally when no custom stream consumer is needed.

Cost is tied to throughput, retention, and number of consumers.

12. SAA-C03 Exam Signals

"Real-time streaming data ingestion" points to Kinesis Data Streams.

"Multiple consumers process the same stream" points to Kinesis Data Streams.

"Replay records within a retention window" points to Kinesis Data Streams.

"Ordered records per partition key" points to Kinesis Data Streams.

"Managed delivery of streaming data to S3 or Redshift" may point to Kinesis Data Firehose.

"Queue workers process messages once" points to SQS.

"Route application events to targets" points to EventBridge.

13. Common Exam Traps

Do not choose SQS when replayable streaming is required.

Do not choose Kinesis Data Streams when Firehose delivery is enough.

Do not ignore shard hot spots.

Do not assume ordering across the whole stream. Ordering is per shard.

Do not set retention shorter than consumer recovery needs.

Do not forget duplicate handling and idempotent consumers.

Review Amazon SQS, Amazon EventBridge, AWS Glue, Amazon OpenSearch Service, and Amazon Redshift.
