Event Streams
Durable, ordered logs of events that let systems publish state changes and let independent consumers process them asynchronously.
Concepts Covered
- Events
- Producers
- Consumers
- Consumer groups
- Durable logs
- Replay
- Ordering boundaries
- Consumer lag
- Delivery guarantees
Definition
An event stream is a durable, ordered sequence of records that describe things that happened in a system.
Example:
LikeCreated(post_id=42, user_id=7, event_id=evt_1)
LikeRemoved(post_id=42, user_id=7, event_id=evt_2)
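In code, such a record might look like the sketch below. The field names mirror the example above; the generated event_id format and the occurred_at timestamp are illustrative assumptions, not a required schema:
from dataclasses import dataclass, field
import json
import time
import uuid

@dataclass
class Event:
    event_type: str    # e.g. "LikeCreated" or "LikeRemoved"
    post_id: int
    user_id: int
    # Illustrative fields: a unique id for deduplication and a timestamp.
    event_id: str = field(default_factory=lambda: "evt_" + uuid.uuid4().hex[:8])
    occurred_at: float = field(default_factory=time.time)

e1 = Event("LikeCreated", post_id=42, user_id=7)
e2 = Event("LikeRemoved", post_id=42, user_id=7)
print(json.dumps(e1.__dict__))    # events are typically serialized for the wire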
Services publish events to the stream. Independent consumers read those events and build downstream behavior such as counters, notifications, analytics, ranking features, inboxes, search indexes, and projections.
The Pain That Forces Event Streams
At small scale, one service can do everything directly:
Like API:
insert like
update counter
send notification
update ranking
write analytics
return response
This looks simple, but it makes the user request depend on every downstream feature.
If analytics is slow, the like is slow. If notifications are down, the like might fail. If ranking logic grows expensive, the core user action becomes fragile.
The product problem is that one committed state change often needs to trigger many pieces of work, but the user-facing write should not wait for all of them.
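A minimal Python sketch of the coupled handler makes the dependency chain explicit. Every downstream function here is a hypothetical stub standing in for a network call that the user's request must wait on:
def insert_like(post_id, user_id): ...        # stub: the source-of-truth write
def update_counter(post_id): ...              # stub: if slow, the like is slow
def send_notification(post_id, user_id): ...  # stub: if down, the like may fail
def update_ranking(post_id): ...              # stub: growing cost on the hot path
def write_analytics(post_id, user_id): ...    # stub: analytics latency, user waits

def handle_like(post_id: int, user_id: int) -> None:
    insert_like(post_id, user_id)
    update_counter(post_id)
    send_notification(post_id, user_id)
    update_ranking(post_id)
    write_analytics(post_id, user_id)
    # Success requires every downstream system to be healthy at the same moment.

handle_like(42, 7)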
The Naive Failure
Imagine a like request that synchronously calls five downstream systems.
1. Insert like succeeds.
2. Counter update succeeds.
3. Notification call times out.
4. Analytics call succeeds.
5. API returns failure because one downstream call failed.
What should the client do? Retry the like? If it retries, the counter update or analytics write may run twice. If it does not retry, the user sees a failure even though the like exists.
The naive design mixes the source-of-truth command with secondary workflows.
Mental Model
An event stream lets the source system say:
"This happened. Whoever cares can process it independently."
The Like API should not need to know every current and future feature that reacts to a like. It should commit the source state and publish an event. Consumers can then process the event at their own pace.
Event streams turn direct coupling into asynchronous coordination.
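A sketch of the decoupled write path under this model. insert_like and publish are hypothetical stubs for the database write and the stream producer:
def insert_like(post_id, user_id): ...   # stub: commit the like to the database
def publish(topic, event): ...           # stub: append the event to the stream

def handle_like(post_id: int, user_id: int) -> None:
    insert_like(post_id, user_id)        # commit the source of truth first
    publish("likes", {
        "event_type": "LikeCreated",
        "post_id": post_id,
        "user_id": user_id,
    })
    # Return immediately: counter, notification, analytics, and ranking
    # consumers each react to the event at their own pace.

handle_like(42, 7)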
How Event Streams Work
Core pieces:
- Producer: writes events.
- Broker or stream: stores ordered records.
- Consumer: reads events and performs work.
- Consumer group: allows multiple workers to share processing.
- Offset or cursor: remembers how far a consumer has read.
Example flow:
Like API -> LikeCreated event -> event stream
-> counter consumer
-> notification consumer
-> analytics consumer
-> ranking consumer
Each consumer can fail, retry, lag, or deploy independently without blocking the original Like API request.
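The toy in-memory stream below models these pieces. Real brokers such as Kafka or Kinesis persist the log and offsets durably and split the log into partitions, but the shape of the interaction is the same:
from collections import defaultdict

class Stream:
    def __init__(self):
        self.log = []                      # durable, ordered records
        self.offsets = defaultdict(int)    # consumer group -> next index to read

    def publish(self, event):
        self.log.append(event)

    def poll(self, group):
        # Return this group's unread events and advance its cursor.
        start = self.offsets[group]
        self.offsets[group] = len(self.log)
        return self.log[start:]

stream = Stream()
stream.publish({"event_type": "LikeCreated", "post_id": 42, "user_id": 7})

# Each group keeps its own offset, so a lagging or restarting consumer
# never blocks the others.
for event in stream.poll("counter"):
    print("counter sees:", event)
for event in stream.poll("notifications"):
    print("notifications see:", event)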
Ordering And Partitioning
Streams are usually ordered inside a partition, not globally across the whole platform.
If all events for post_42 go to the same partition, consumers can process that post's events in order:
partition_key = post_id
This helps correctness, but it can create hot partitions when one key receives huge traffic. Partitioning is always a tradeoff between ordering and load distribution.
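A sketch of key-based routing, assuming a fixed partition count and a stable hash:
import zlib

NUM_PARTITIONS = 8

def partition_for(key: str) -> int:
    # A stable hash keeps the key -> partition mapping consistent across runs.
    return zlib.crc32(key.encode()) % NUM_PARTITIONS

# Every event for post_42 maps to one partition, so its events stay ordered;
# a viral post concentrates all of its traffic there (a hot partition).
print(partition_for("post_42"))
print(partition_for("post_43"))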
Replay
Replay is one of the most powerful properties of event streams.
If a counter projection is wrong, the system can reset the counter consumer and process old events again.
If a new analytics system launches, it can read historical events instead of waiting for new ones.
Replay is only safe when consumers are designed carefully. Replaying events can duplicate side effects if consumers are not idempotent.
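One common way to make replay safe is to deduplicate on event_id before applying the side effect. The sketch below replays the whole log twice and still ends with the correct count:
processed_ids = set()
like_counts = {}

def apply(event):
    if event["event_id"] in processed_ids:
        return                    # already applied: skip the side effect
    processed_ids.add(event["event_id"])
    delta = 1 if event["event_type"] == "LikeCreated" else -1
    like_counts[event["post_id"]] = like_counts.get(event["post_id"], 0) + delta

log = [
    {"event_type": "LikeCreated", "post_id": 42, "event_id": "evt_1"},
    {"event_type": "LikeRemoved", "post_id": 42, "event_id": "evt_2"},
]
for event in log + log:           # replay the entire log a second time
    apply(event)
print(like_counts[42])            # 0, exactly as if the log had run once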
What Event Streams Guarantee
Event streams can provide:
- durable storage of events
- ordered processing within a partition
- independent consumers
- replay from retained history
- buffering during downstream outages
They do not automatically provide:
- exactly-once business effects
- correct producer behavior
- infinite retention
- no lag
- no duplicate delivery (see the sketch after this list)
- schema compatibility
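The duplicate-delivery gap is mechanical, not accidental. Under at-least-once delivery a consumer performs its side effect and then commits its offset; a crash between those two steps replays the event. The toy consumer below makes that visible:
seen = []
def process(event):
    seen.append(event)            # the business side effect

def run_consumer(log, committed, crash_before_commit=False):
    for i in range(committed, len(log)):
        process(log[i])           # the side effect happens first...
        if crash_before_commit:
            return committed      # crash: the offset was never committed
        committed = i + 1         # ...then the offset commit follows
    return committed

log = ["evt_1", "evt_2"]
offset = run_consumer(log, 0, crash_before_commit=True)  # evt_1 done, commit lost
offset = run_consumer(log, offset)                       # evt_1 delivered again
print(seen)                       # ['evt_1', 'evt_1', 'evt_2'] -- duplicate effect
This is why exactly-once business effects have to come from idempotent consumers, not from the stream itself.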
Operational Reality
Operators must watch:
- producer publish failures
- consumer lag (see the sketch after this list)
- oldest unprocessed event age
- partition skew
- retry rate
- dead-lettered events
- schema compatibility failures
- event retention windows
- replay duration
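Consumer lag itself is simple per-partition arithmetic: the newest offset minus the group's committed offset. A sketch with illustrative numbers:
end_offsets       = {"p0": 10500, "p1": 9800, "p2": 14200}  # newest record per partition
committed_offsets = {"p0": 10480, "p1": 9800, "p2": 9100}   # the group's read position

lag = {p: end_offsets[p] - committed_offsets[p] for p in end_offsets}
print(lag)                   # {'p0': 20, 'p1': 0, 'p2': 5100} -- p2 is falling behind
print(max(lag.values()))     # alert on the worst partition, not the average
Alerting on the worst partition also surfaces the partition skew that an average would hide.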
Failure modes:
- Consumers fall behind and projections become stale.
- Duplicate delivery causes duplicate side effects.
- Bad partition keys create hot partitions.
- Producers publish events that do not match committed database state.
- Event schemas change in ways old consumers cannot parse.
- Retention expires before a consumer can replay.