Amazon Kinesis Data Streams
Understand Kinesis Data Streams for real-time streaming ingestion, including streams, shards, producers, consumers, retention, enhanced fan-out, and exam traps.
After this, you will understand
Kinesis makes streaming architecture concrete: continuous events need ordered shards, replay windows, and consumers, not just a queue.
Kinesis Data Streams captures and stores real-time records so multiple consumers can process streaming data.
A common mistake is to use SQS for every asynchronous workload and overlook what streaming adds: ordered, partitioned records, replay, and multiple independent consumers.
Use Kinesis Data Streams when producers generate continuous event records that need low-latency processing, replay, and shard-based scaling.
Think before reading: What is the clean difference between SQS and Kinesis Data Streams?
Concepts Covered
- Streaming data
- Kinesis Data Streams
- Streams and shards
- Producers and consumers
- Partition keys
- Sequence numbers
- Retention and replay
- Enhanced fan-out
- Kinesis versus SQS, EventBridge, Firehose, and MSK
- SAA-C03 streaming traps
1. Plain-English Mental Model
Amazon Kinesis Data Streams is a managed streaming data service.
The simple model is:
producers -> stream shards -> ordered records -> consumers
A stream stores records for a retention period. Producers write records. Consumers read records. Records with the same partition key go to the same shard, preserving order within that shard.
This is different from a queue. A queue usually delivers a message to one worker and then removes it after processing. A stream keeps records for a window so multiple consumers can read and replay them.
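The queue-versus-stream distinction can be sketched with two toy in-memory classes. This is an illustration only, not how SQS or Kinesis are implemented: `ToyQueue`, `ToyStream`, and the consumer names are hypothetical, and the retention window is not modeled.

```python
from collections import deque


class ToyQueue:
    """Work queue: each message goes to one receiver and is then gone."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        # Return and remove the next message, or None when the queue is empty.
        return self._messages.popleft() if self._messages else None


class ToyStream:
    """Append-only log: records stay in place, each consumer tracks its own position."""

    def __init__(self):
        self._records = []
        self._positions = {}  # consumer name -> index of the next record to read

    def put(self, record):
        self._records.append(record)

    def read(self, consumer, limit=10):
        start = self._positions.get(consumer, 0)
        batch = self._records[start:start + limit]
        self._positions[consumer] = start + len(batch)
        return batch

    def rewind(self, consumer, position=0):
        # Replay: move one consumer's position back without affecting the others.
        self._positions[consumer] = position


queue = ToyQueue()
stream = ToyStream()
for event in ["click-1", "click-2", "click-3"]:
    queue.send(event)
    stream.put(event)

queue_first = [queue.receive() for _ in range(3)]
queue_second = queue.receive()            # nothing left to receive

metrics_view = stream.read("metrics")     # reads all three records
archive_view = stream.read("archive")     # independently reads the same records
stream.rewind("metrics")
metrics_replay = stream.read("metrics")   # replays the same records
```

Once the queue has been drained, a second reader finds nothing. The stream serves the same records to both consumers and lets one of them replay.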
Kinesis Data Streams is useful when data is continuous and time-sensitive: clickstreams, IoT telemetry, logs, metrics, fraud events, game events, and real-time analytics.
2. Why This Service Exists
Some systems generate events faster than batch jobs can handle.
Click events arrive continuously. Devices send readings every second. Applications emit logs and metrics. Fraud detection wants to inspect transactions quickly. Analytics teams want near-real-time dashboards.
Kinesis Data Streams exists to ingest, buffer, order, and fan out these records to streaming consumers.
For SAA-C03, it appears in questions about real-time streaming ingestion, producers and consumers, shard scaling, ordered records, replay, retention windows, enhanced fan-out, and multiple independent consumers.
The common boundary: Kinesis Data Streams stores replayable streams for custom consumers. Kinesis Data Firehose (now named Amazon Data Firehose) delivers streaming data to destinations with less custom consumer code. SQS queues units of work for workers. EventBridge routes events to targets. MSK provides managed Apache Kafka.
3. The Naive Approach And Where It Breaks
The naive pattern is a database write per event:
every click -> write directly to analytics database
This breaks when bursts overwhelm the database, consumers need replay, multiple teams need the same event stream, or producers should not know every downstream destination.
Another naive pattern is using SQS for every event workload. SQS is excellent for decoupled work queues, but it is not designed as a replayable ordered event log with multiple consumers reading the same record stream.
Another mistake is choosing Kinesis when the requirement is simply to deliver records to S3 with minimal processing. Firehose may be the lower-operational-overhead option.
Kinesis Data Streams fits when you need control over stream processing: custom consumers, ordering per partition key, and replay.
4. Core Primitives
A data stream is the named stream resource.
A record is the data unit written to a stream.
A shard is an ordered sequence of records with fixed read and write capacity. It is the unit of throughput and parallelism.
A partition key determines which shard receives a record.
A sequence number identifies record order within a shard.
Producers write records to the stream.
Consumers read records from the stream.
Retention controls how long records remain available for replay: 24 hours by default, extendable up to 365 days.
Enhanced fan-out gives registered consumers dedicated read throughput per shard.
On-demand and provisioned capacity modes support different scaling models.
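To make the partition key primitive concrete: Kinesis hashes each partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash key range contains that value. The sketch below assumes a stream whose shards split the hash space evenly and interprets the digest as a big-endian integer. Real shard ranges come from the service and can be uneven after resharding. The function names are hypothetical.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 yields a 128-bit hash key


def even_shard_ranges(shard_count):
    """Split the hash key space into contiguous, roughly equal ranges (an assumption)."""
    size = HASH_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * size
        # The last shard absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == shard_count - 1 else (i + 1) * size - 1
        ranges.append((start, end))
    return ranges


def shard_for_key(partition_key, ranges):
    """Return the index of the shard whose range contains the key's MD5 hash."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    for index, (start, end) in enumerate(ranges):
        if start <= hash_key <= end:
            return index
    raise ValueError("hash key outside all ranges")


ranges = even_shard_ranges(4)
first = shard_for_key("user-42", ranges)
again = shard_for_key("user-42", ranges)  # same key, same shard
```

Because the mapping is deterministic, every record for `user-42` lands on one shard, which is what preserves per-key order. It is also why a single very busy key cannot be spread across shards.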
5. Architecture Use Cases
Use Kinesis Data Streams for clickstream ingestion:
web app -> Kinesis Data Streams -> Lambda, analytics consumer, S3 delivery pipeline
Use it for IoT telemetry that needs near-real-time processing.
Use it for fraud or anomaly detection pipelines where a consumer evaluates records quickly.
Use multiple consumers when one team needs real-time metrics, another needs durable storage, and another needs alerting.
Use Lambda event source mappings for serverless stream processing.
Use Firehose when the goal is managed delivery to S3, Redshift, OpenSearch, or other supported destinations with less consumer management.
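For the Lambda event source mapping case above, a minimal handler sketch follows. Lambda delivers Kinesis records in batches under `event["Records"]`, with the payload base64-encoded in `record["kinesis"]["data"]`. The JSON payload shape and the sample event are assumptions for illustration, and returning the decoded list is only for demonstration.

```python
import base64
import json


def handler(event, context):
    """Decode each Kinesis record in the batch and return the parsed payloads."""
    decoded = []
    for record in event["Records"]:
        kinesis = record["kinesis"]
        payload = base64.b64decode(kinesis["data"])
        decoded.append({
            "partition_key": kinesis["partitionKey"],
            "sequence_number": kinesis["sequenceNumber"],
            "body": json.loads(payload),  # assumes producers send JSON
        })
    return decoded


# Hypothetical sample event with a single record, trimmed to the fields used above.
sample_event = {
    "Records": [
        {
            "kinesis": {
                "partitionKey": "user-42",
                "sequenceNumber": "1",
                "data": base64.b64encode(json.dumps({"page": "/home"}).encode()).decode(),
            }
        }
    ]
}
result = handler(sample_event, None)
```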
7. Security Model
Kinesis security includes IAM, encryption, network access, and consumer permissions.
Producers need permission to put records. Consumers need permission to read records.
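The producer and consumer split can be sketched as two least-privilege IAM policy documents, built here as Python dictionaries. The stream ARN, account ID, and the `stream_policy` helper are hypothetical. The exact action list a consumer needs depends on the client library, so treat this as a starting point rather than a complete policy.

```python
import json


def stream_policy(stream_arn, actions):
    """Build an identity policy allowing the given Kinesis actions on one stream."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": sorted(actions), "Resource": stream_arn}
        ],
    }


# Hypothetical stream ARN used only for illustration.
STREAM_ARN = "arn:aws:kinesis:us-east-1:123456789012:stream/clickstream"

producer_policy = stream_policy(
    STREAM_ARN, ["kinesis:PutRecord", "kinesis:PutRecords"]
)
consumer_policy = stream_policy(
    STREAM_ARN,
    [
        "kinesis:DescribeStreamSummary",
        "kinesis:ListShards",
        "kinesis:GetShardIterator",
        "kinesis:GetRecords",
    ],
)

print(json.dumps(producer_policy, indent=2))
```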
Encryption at rest can use KMS.
Applications can access Kinesis through AWS service endpoints, and VPC interface endpoints can keep traffic private where supported.
Records may contain sensitive data. Treat streams as data stores with access control, retention policy, and audit needs.
CloudTrail records management actions, while application-level record access and processing need observability through metrics and logs.
8. Reliability And Resilience
Kinesis Data Streams stores records durably across multiple Availability Zones in a Region during the retention period.
Consumers can checkpoint progress and replay from a known position.
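As a sketch of resuming from a checkpoint, the helper below builds the arguments for the `GetShardIterator` API. It assumes the checkpoint is the last processed sequence number for a shard. With no checkpoint it starts from the oldest retained record (`TRIM_HORIZON`). The helper name and the stream and shard names are hypothetical.

```python
def shard_iterator_args(stream_name, shard_id, checkpoint=None):
    """Build GetShardIterator keyword arguments from an optional checkpoint."""
    args = {"StreamName": stream_name, "ShardId": shard_id}
    if checkpoint is None:
        # No saved progress: start at the oldest record still within retention.
        args["ShardIteratorType"] = "TRIM_HORIZON"
    else:
        # Resume right after the last record that was fully processed.
        args["ShardIteratorType"] = "AFTER_SEQUENCE_NUMBER"
        args["StartingSequenceNumber"] = checkpoint
    return args


fresh = shard_iterator_args("clickstream", "shardId-000000000000")
resumed = shard_iterator_args("clickstream", "shardId-000000000000", checkpoint="4959")
```

With boto3, these arguments would be passed as `client.get_shard_iterator(**resumed)`. Libraries such as the Kinesis Client Library manage checkpoints for you.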
Producer retry logic matters. Duplicate records can happen, so consumers should be idempotent where needed.
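One way to make a consumer idempotent is to deduplicate on an application-level event ID, because a retried write gets a new sequence number. The sketch below assumes each record carries an `event_id` field set by the producer, and keeps seen IDs in memory. A real consumer would need a durable store. All names are hypothetical.

```python
class IdempotentProcessor:
    """Apply each event at most once, keyed by a producer-assigned event ID."""

    def __init__(self, apply):
        self._apply = apply
        self._seen = set()  # in-memory only; assumed durable in a real system

    def process(self, record):
        event_id = record["event_id"]
        if event_id in self._seen:
            return False  # duplicate: skip the side effect
        self._apply(record)
        self._seen.add(event_id)
        return True


totals = {}


def add_purchase(record):
    totals[record["user"]] = totals.get(record["user"], 0) + record["amount"]


processor = IdempotentProcessor(add_purchase)
batch = [
    {"event_id": "e1", "user": "ana", "amount": 10},
    {"event_id": "e2", "user": "ana", "amount": 5},
    {"event_id": "e1", "user": "ana", "amount": 10},  # duplicate from a producer retry
]
applied = [processor.process(record) for record in batch]
```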
Retention duration must match recovery needs. If a consumer is down longer than retention, unread records may expire.
Shard hot spots can happen if partition keys are poorly distributed.
Monitor iterator age, write throttles, read throttles, consumer errors, and shard utilization.
9. Performance And Scaling
Kinesis scaling is shard-based in provisioned mode.
Each shard has write and read capacity limits. More shards increase capacity and parallelism.
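A rough shard-count estimate follows from the per-shard limits. The sketch assumes the commonly documented provisioned-mode figures of 1 MB/s or 1,000 records/s for writes and 2 MB/s for shared reads per shard. Check the current quotas before relying on them. The function name is hypothetical.

```python
import math

WRITE_MB_PER_SHARD = 1.0        # assumed per-shard write limit in MB/s
WRITE_RECORDS_PER_SHARD = 1000  # assumed per-shard write limit in records/s
READ_MB_PER_SHARD = 2.0         # assumed per-shard shared read limit in MB/s


def shards_needed(write_mb_per_s, records_per_s, read_mb_per_s):
    """Return the smallest shard count that satisfies all three limits."""
    return max(
        1,
        math.ceil(write_mb_per_s / WRITE_MB_PER_SHARD),
        math.ceil(records_per_s / WRITE_RECORDS_PER_SHARD),
        math.ceil(read_mb_per_s / READ_MB_PER_SHARD),
    )


byte_bound = shards_needed(write_mb_per_s=3.5, records_per_s=2500, read_mb_per_s=7.0)
count_bound = shards_needed(write_mb_per_s=0.5, records_per_s=6000, read_mb_per_s=1.0)
```

The first workload is limited by bytes and needs 4 shards. The second sends many small records and needs 6 shards despite its low byte rate.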
On-demand mode reduces capacity planning for variable workloads.
Partition key design controls distribution. A single hot partition key can overload one shard while other shards are idle.
Enhanced fan-out helps when multiple consumers need high read throughput from the same stream.
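The effect of enhanced fan-out can be approximated with simple arithmetic. The sketch assumes 2 MB/s of read throughput per shard, shared among all standard consumers, versus 2 MB/s per shard for each registered enhanced fan-out consumer. It divides the shared capacity evenly, which is a simplification.

```python
READ_MB_PER_SHARD = 2.0  # assumed read throughput per shard in MB/s


def per_consumer_read_mb(shards, consumers, enhanced_fan_out):
    """Approximate the read throughput available to each consumer in MB/s."""
    total = shards * READ_MB_PER_SHARD
    if enhanced_fan_out:
        # Each registered consumer gets its own dedicated throughput per shard.
        return total
    # Standard consumers share the per-shard read capacity.
    return total / consumers


shared = per_consumer_read_mb(shards=4, consumers=4, enhanced_fan_out=False)
dedicated = per_consumer_read_mb(shards=4, consumers=4, enhanced_fan_out=True)
```

With 4 shards and 4 consumers, each standard consumer gets about 2 MB/s, while each enhanced fan-out consumer gets about 8 MB/s.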
Batching records improves producer efficiency.
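A batching producer can be sketched around the `PutRecords` API, which accepts up to 500 records per request and reports per-record failures in its response. The function takes the client as a parameter so it does not depend on a specific SDK setup. With boto3 this would be `boto3.client("kinesis")`. The function name is hypothetical, and backoff between attempts is omitted for brevity.

```python
def put_in_batches(client, stream_name, records, batch_size=500, max_attempts=3):
    """Send records in batches and retry only the entries that failed.

    records: list of dicts with 'Data' (bytes) and 'PartitionKey' (str).
    Returns the records that still failed after max_attempts.
    """
    failed = []
    for start in range(0, len(records), batch_size):
        pending = records[start:start + batch_size]
        for _ in range(max_attempts):
            response = client.put_records(StreamName=stream_name, Records=pending)
            if response["FailedRecordCount"] == 0:
                pending = []
                break
            # Result entries are in request order; failed ones carry an ErrorCode.
            pending = [
                record
                for record, result in zip(pending, response["Records"])
                if "ErrorCode" in result
            ]
        failed.extend(pending)
    return failed
```

Retrying only the failed entries limits duplicates, but a retry can still reorder records for a key, so strict ordering needs extra care on the producer side.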
10. Cost Model
Kinesis cost depends on capacity mode, shard hours or on-demand throughput, PUT payload units, enhanced fan-out consumers, extended retention, and data transfer.
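The PUT payload unit term can be illustrated with a small calculation. In provisioned mode each record is billed in 25 KB chunks, so a record just over 25 KB counts as two units. The sketch treats 25 KB as 25 × 1024 bytes, and the prices are placeholder parameters rather than real AWS rates. Take actual rates from the pricing page for your Region.

```python
import math

PAYLOAD_UNIT_BYTES = 25 * 1024  # assumed size of one PUT payload unit


def put_payload_units(record_size_bytes):
    """Number of PUT payload units billed for one record."""
    return max(1, math.ceil(record_size_bytes / PAYLOAD_UNIT_BYTES))


def provisioned_estimate(shards, records_per_s, record_size_bytes,
                         shard_hour_price, price_per_million_units, hours=730):
    """Rough provisioned-mode cost: shard hours plus PUT payload units.

    Both prices are caller-supplied placeholders, not real AWS rates.
    """
    units = put_payload_units(record_size_bytes) * records_per_s * hours * 3600
    shard_cost = shards * hours * shard_hour_price
    put_cost = units / 1_000_000 * price_per_million_units
    return shard_cost + put_cost


small = put_payload_units(5 * 1024)    # a 5 KB record fits in one unit
large = put_payload_units(60 * 1024)   # a 60 KB record spans three units
```

The estimate leaves out enhanced fan-out, extended retention, and data transfer, which the cost model above also lists.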
Provisioned mode can be efficient for predictable traffic but requires shard management.
On-demand mode is simpler for unpredictable workloads, but it bills mainly by the volume of data written and read rather than by provisioned shard hours.
Firehose may be cheaper operationally when no custom stream consumer is needed.
Cost is tied to throughput, retention, and number of consumers.
12. SAA-C03 Exam Signals
"Real-time streaming data ingestion" points to Kinesis Data Streams.
"Multiple consumers process the same stream" points to Kinesis Data Streams.
"Replay records within a retention window" points to Kinesis Data Streams.
"Ordered records per partition key" points to Kinesis Data Streams.
"Managed delivery of streaming data to S3 or Redshift" may point to Kinesis Data Firehose.
"Queue workers process messages once" points to SQS.
"Route application events to targets" points to EventBridge.
13. Common Exam Traps
Do not choose SQS when replayable streaming is required.
Do not choose Kinesis Data Streams when Firehose delivery is enough.
Do not ignore shard hot spots.
Do not assume ordering across the whole stream. Ordering is per shard.
Do not set retention shorter than consumer recovery needs.
Do not forget duplicate handling and idempotent consumers.
15. Related Topics
Review Amazon SQS, Amazon EventBridge, AWS Glue, Amazon OpenSearch Service, and Amazon Redshift.