System Design
YouTube / Netflix Video Streaming System
Design a video streaming system that handles uploads, transcoding, adaptive bitrate playback, CDN delivery, metadata, recommendations hooks, and watch analytics.
After this, you will understand
Why video streaming is not file download, but a pipeline that turns uploads into many playback artifacts and keeps viewers watching through CDN and adaptive bitrate decisions.
Naive approach: upload one video file, store it, and let every viewer download or stream that original file from the application servers.
Why it breaks: uploads are huge, codecs differ, networks fluctuate, popular segments become globally hot, and watch events create high-volume analytics writes.
Better design: separate upload, processing, playback metadata, CDN delivery, and analytics so the viewer path stays fast while heavy media work runs asynchronously.
Think before reading: If a creator uploads a 4 GB video and one million users start watching it, which work must happen before playback and which work must stay off the playback path?
Concepts Covered
- Video transcoding
- Adaptive bitrate streaming
- Playback manifests
- CDN edge caching
- Source uploads and file chunking
- Metadata and entitlement checks
- Playback session lifecycle
- Watch events and analytics pipelines
- Processing queues, retries, and backpressure
- Dead-letter queues for failed media jobs
1. Introduction
A YouTube or Netflix-style video streaming system lets users upload or publish video, browse metadata, start playback quickly, and keep watching while the network changes.
The visible product behavior looks simple: click a video, press play, and watch.
The backend problem is harder because video is large, expensive, and read-heavy. The system is not only storing a file. It is preparing many playback versions, publishing them to delivery infrastructure, authorizing playback, collecting watch signals, and keeping the viewer path fast even when millions of people start the same video.
This module uses "YouTube / Netflix Video Streaming" as a familiar product shape, not as a claim about YouTube, Netflix, or any private implementation.
At small scale, a service can upload one video file and stream it from a web server. At production scale, that fails because:
- the original upload may not be playable on every device
- one file cannot fit all network conditions
- application servers should not serve massive media bytes
- popular videos create extreme read pressure
- transcoding is CPU-heavy and slow
- watch analytics should not slow playback
- failures can leave video stuck between uploaded, processing, and playable states
The core mental model: video platforms separate the media lifecycle from the playback path.
upload -> process -> publish playback assets -> serve segments -> collect watch signals
2. Product Requirements
Functional Requirements
- Creators or publishers can upload source videos.
- The system stores video metadata such as title, duration, owner, visibility, thumbnails, and processing state.
- Uploaded videos are processed into playback-ready renditions.
- Viewers can start playback quickly.
- Playback adapts to device and network conditions.
- Popular videos can be served to many regions without overloading origin storage.
- The system records watch events for analytics, recommendations, billing, or creator dashboards.
- Operators can block, delete, unpublish, or reprocess videos.
Non-Functional Requirements
- Playback startup should be low latency.
- The system should minimize buffering during playback.
- Media delivery should scale with read-heavy traffic.
- Upload and processing failures should be recoverable.
- Processing queues should not starve small videos behind huge jobs.
- Analytics ingestion should tolerate high event volume.
- Authorization should protect private or paid content.
- Origin storage and application servers should be shielded from repeated segment reads.
3. Core Engineering Challenges
| Challenge | Why it matters |
|---|---|
| Large uploads | Videos are too large for fragile one-shot requests. |
| Transcoding cost | Processing is CPU-heavy, slow, and failure-prone. |
| Device compatibility | Different clients support different codecs and resolutions. |
| Network variability | Viewers move between strong and weak network conditions. |
| CDN placement | Media bytes should be served close to users. |
| Hot content | Trending videos make the first segments extremely hot. |
| Playback metadata | Players need accurate manifests and segment URLs. |
| Watch analytics | Playback creates high-volume event streams. |
| Entitlement checks | Private, regional, or paid content needs access control. |
The naive implementation fails when it treats a video as one file served by one service. A production design treats video as a pipeline of state transitions and derived assets.
4. High-Level Architecture
flowchart LR
    Creator[Creator Client] --> UploadAPI[Upload API]
    UploadAPI --> SourceStore[(Source Object Store)]
    UploadAPI --> MetadataDB[(Video Metadata DB)]
    UploadAPI --> ProcessingQueue[Processing Queue]
    ProcessingQueue --> TranscodeWorkers[Transcode Workers]
    TranscodeWorkers --> PlaybackStore[(Playback Asset Store)]
    TranscodeWorkers --> ManifestService[Manifest Publisher]
    ManifestService --> CDN[CDN Edge Caches]
    Viewer[Viewer Client] --> PlaybackAPI[Playback API]
    PlaybackAPI --> MetadataDB
    PlaybackAPI --> ManifestService
    Viewer --> CDN
    Viewer --> WatchEvents[Watch Event Ingestion]
    WatchEvents --> EventStream[Event Stream]
    EventStream --> AnalyticsStore[(Analytics Store)]
    EventStream --> RecommendationSignals[Recommendation Signals]
The upload path, processing path, playback path, and analytics path have different priorities.
- Upload cares about durability, resumability, and source metadata.
- Processing cares about queues, retries, workers, and derived artifacts.
- Playback cares about latency, entitlement, manifests, CDN, and segments.
- Analytics cares about high-volume ingestion and delayed aggregation.
5. Core Components
Upload API
The Upload API creates an upload session and records durable intent. Large videos should not depend on one long request. The upload path may use chunked or resumable upload so clients can recover from network drops.
The upload state might move through:
created -> uploading -> uploaded -> processing -> playable
(any step before playable may instead end in failed)
The Upload API should not synchronously transcode the video. It should verify the source object, write metadata, and enqueue processing work.
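The state progression above can be enforced with an explicit transition table, so a video cannot jump from created to playable by accident. This is a minimal sketch: the `VideoState` enum, the `ALLOWED` table, and the rule that a failed or playable video may return to processing are assumptions for illustration, not a prescribed design.

```python
from enum import Enum


class VideoState(Enum):
    CREATED = "created"
    UPLOADING = "uploading"
    UPLOADED = "uploaded"
    PROCESSING = "processing"
    PLAYABLE = "playable"
    FAILED = "failed"


# Assumed transition rules: every in-flight state may fail, and a failed or
# playable video may be sent back to processing (retry or reprocess).
ALLOWED = {
    VideoState.CREATED: {VideoState.UPLOADING, VideoState.FAILED},
    VideoState.UPLOADING: {VideoState.UPLOADED, VideoState.FAILED},
    VideoState.UPLOADED: {VideoState.PROCESSING, VideoState.FAILED},
    VideoState.PROCESSING: {VideoState.PLAYABLE, VideoState.FAILED},
    VideoState.PLAYABLE: {VideoState.PROCESSING},
    VideoState.FAILED: {VideoState.PROCESSING},
}


def transition(current: VideoState, target: VideoState) -> VideoState:
    """Return the new state, or raise if the move is not in the table."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Keeping the table in one place also gives a reconciliation job a simple definition of "stuck": a video that has sat in a non-terminal state for too long.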
Source Object Store
The source object store holds the original uploaded file. This file is the source of truth for reprocessing, new encodings, quality fixes, thumbnails, and audit workflows.
The source file may be rarely read after processing, but it must be durable.
Video Metadata Service
Metadata includes:
- video ID
- owner or publisher ID
- title and description
- visibility
- duration
- upload state
- processing state
- available renditions
- thumbnail references
- moderation or policy state
- regional or entitlement rules
Metadata is on the playback control path. Media bytes are not.
Transcoding Pipeline
The transcoding pipeline reads the source file and produces playback renditions. Workers may generate several resolutions, bitrates, audio tracks, thumbnails, captions, or preview sprites.
This is asynchronous because it is expensive and can fail. It needs retry limits, dead-letter handling, backpressure, and job prioritization.
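A retry limit with dead-letter handling might look like the sketch below. The in-memory `deque`, the `MAX_ATTEMPTS` value, and the `drain` function are hypothetical stand-ins for a real queue and worker pool; the point is only that a job which keeps failing is set aside instead of being retried forever.

```python
from collections import deque
from typing import Callable

MAX_ATTEMPTS = 3  # assumed retry limit, chosen for illustration


def drain(jobs: deque, transcode: Callable[[str], None]) -> tuple[list[str], list[str]]:
    """Run every job; retry failures up to MAX_ATTEMPTS, then dead-letter them."""
    done: list[str] = []
    dead_letter: list[str] = []
    attempts: dict[str, int] = {}
    while jobs:
        video_id = jobs.popleft()
        attempts[video_id] = attempts.get(video_id, 0) + 1
        try:
            transcode(video_id)
            done.append(video_id)
        except Exception:
            if attempts[video_id] >= MAX_ATTEMPTS:
                # Poison job: stop retrying and park it for inspection.
                dead_letter.append(video_id)
            else:
                # Re-enqueue at the back so other jobs are not blocked.
                jobs.append(video_id)
    return done, dead_letter
```

Re-enqueueing at the back of the queue is one possible choice; a production system would more likely add a delay between attempts.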
Playback Asset Store
Playback assets are the segments, thumbnails, audio tracks, subtitle tracks, and manifests that viewers fetch. These assets should be immutable or versioned so CDN caches can serve them safely.
Manifest Service
The manifest describes which renditions and segments are available. A player fetches it before downloading media segments.
The manifest is a contract:
these renditions exist
these segment URLs are valid
these tracks align on this timeline
If the manifest points to missing assets, playback fails.
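One way to protect that contract is to check every referenced segment against storage before the manifest is published. The sketch below assumes a simplified manifest shape (rendition name mapped to a list of segment keys) and a set of keys known to exist in the playback asset store; both are illustrative, not a real manifest format.

```python
def missing_segments(manifest: dict[str, list[str]], stored_keys: set[str]) -> list[str]:
    """Return every segment key the manifest references that storage lacks."""
    missing = []
    for segment_keys in manifest.values():
        for key in segment_keys:
            if key not in stored_keys:
                missing.append(key)
    return missing


def safe_to_publish(manifest: dict[str, list[str]], stored_keys: set[str]) -> bool:
    """Publish only a non-empty manifest whose segments all exist."""
    return bool(manifest) and not missing_segments(manifest, stored_keys)
```

For example, a manifest listing `720p/seg2` when only `720p/seg1` was written would be rejected, and the list of missing keys tells the repair job what to regenerate.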
CDN
The CDN serves playback segments close to viewers. It protects origin storage and reduces latency.
The CDN is especially important because video traffic is skewed. A small number of videos can dominate bandwidth, and the first few segments of each video are often hotter than later segments.
Playback API
The Playback API checks metadata, entitlement, region, device constraints, and playback state. It returns enough information for the player to fetch a manifest and begin playback.
It should not proxy every segment through the application backend. The heavy bytes should flow through CDN paths.
Watch Event Ingestion
Players emit events such as:
- playback started
- first frame rendered
- segment downloaded
- quality changed
- rebuffer started
- rebuffer ended
- playback paused
- watch progress
- playback ended
These events feed analytics, recommendations, creator dashboards, experiments, and reliability monitoring. They should be ingested asynchronously so analytics pressure does not slow playback.
6. Data Modeling
Video Metadata
video
- video_id
- owner_id
- title
- visibility
- upload_state
- processing_state
- duration_ms
- source_object_key
- created_at
- updated_at
Rendition
video_rendition
- rendition_id
- video_id
- codec
- resolution
- bitrate
- segment_prefix
- status
- created_at
Manifest
playback_manifest
- manifest_id
- video_id
- version
- manifest_object_key
- status
- published_at
Watch Event
watch_event
- event_id
- video_id
- user_id or anonymous_session_id
- session_id
- event_type
- playback_position_ms
- bitrate
- device_type
- region
- occurred_at
Watch events are usually append-heavy and high-volume. They should not be stored like normal transactional metadata.
7. Request Lifecycle
Upload Lifecycle
1. Creator requests an upload session.
2. Upload service returns upload URL/session ID.
3. Client uploads chunks or source bytes.
4. Upload service verifies object size and checksum.
5. Metadata service marks video as uploaded.
6. Processing job is enqueued.
7. Workers transcode renditions and generate segments.
8. Manifest is published.
9. Video becomes playable.
If processing fails, the video should enter a recoverable failed state. Operators or automated repair jobs can retry, reprocess, or mark the upload as invalid.
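Step 4 of the upload lifecycle (verifying size and checksum) can be sketched with a streaming hash, so the service never needs the whole file in memory. SHA-256 and the `verify_upload` name are assumptions for this example; the text above does not specify which checksum is used.

```python
import hashlib
from typing import Iterable


def verify_upload(chunks: Iterable[bytes], expected_size: int, expected_sha256: str) -> bool:
    """Hash the chunks in order and compare size and digest with what the client declared."""
    digest = hashlib.sha256()
    size = 0
    for chunk in chunks:
        digest.update(chunk)
        size += len(chunk)
    return size == expected_size and digest.hexdigest() == expected_sha256
```

Only after this check passes would the metadata service mark the video as uploaded and enqueue processing.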
Playback Lifecycle
1. Viewer opens video page.
2. Application fetches metadata.
3. Viewer presses play.
4. Playback API checks entitlement and availability.
5. Player receives manifest URL.
6. Player downloads manifest.
7. Player downloads initial segments from CDN.
8. Player adapts bitrate based on buffer and network.
9. Player emits watch events asynchronously.
The most important latency moments are startup and rebuffer recovery. Users notice time-to-first-frame and stalls more than they notice backend architecture elegance.
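Step 8 of the playback lifecycle (adapting bitrate to buffer and network) can be illustrated with a deliberately simple rule: pick the highest rendition that fits within a safety margin of measured throughput, and fall back to the lowest one when the buffer is nearly empty. Real players use more elaborate logic; the function name, the `safety` factor, and the `low_buffer_s` threshold here are assumptions.

```python
def choose_bitrate(
    ladder_kbps: list[int],
    throughput_kbps: float,
    buffer_s: float,
    safety: float = 0.8,
    low_buffer_s: float = 5.0,
) -> int:
    """Pick a rendition bitrate from the ladder for the next segment."""
    ladder = sorted(ladder_kbps)
    if buffer_s < low_buffer_s:
        # Buffer is nearly drained: protect against a stall first.
        return ladder[0]
    budget = throughput_kbps * safety
    affordable = [bitrate for bitrate in ladder if bitrate <= budget]
    return affordable[-1] if affordable else ladder[0]
```

With a ladder of 400, 1200, 2500, and 5000 kbps, a measured 4000 kbps and a healthy buffer selects 2500 kbps, because the 0.8 safety factor leaves a budget of 3200 kbps.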
8. Scaling Problems
Hot First Segments
The first segments of popular videos can become extremely hot because many users start playback but fewer finish the entire video. CDN and cache policy should account for this skew.
Processing Queue Pressure
Long videos can monopolize workers. A fair processing system may separate queues by duration, priority, publisher tier, or job type.
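Separating queues by duration could be as simple as two lanes that workers pull from in turn. The sketch below assumes a 10-minute cutoff (`SHORT_LIMIT_S`) and strict round-robin between lanes; both are illustrative choices, and a real scheduler would likely weight lanes rather than alternate evenly.

```python
from collections import deque

SHORT_LIMIT_S = 600  # assumed cutoff between "short" and "long" videos


def lane_for(duration_s: int) -> str:
    """Route a job to a lane based on source duration."""
    return "short" if duration_s <= SHORT_LIMIT_S else "long"


def interleave(jobs: list[tuple[str, int]]) -> list[str]:
    """Return the order in which a worker would pick jobs, alternating lanes."""
    lanes = {"short": deque(), "long": deque()}
    for video_id, duration_s in jobs:
        lanes[lane_for(duration_s)].append(video_id)
    order = []
    while lanes["short"] or lanes["long"]:
        for name in ("short", "long"):
            if lanes[name]:
                order.append(lanes[name].popleft())
    return order
```

Even when two long jobs arrive first, a short clip is picked up on the first pull instead of waiting behind both.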
Origin Protection
If CDN hit ratio drops, origin storage can suddenly receive traffic it was not sized for. Origin shielding, cache prewarming, and immutable segment paths help.
Watch Event Volume
Playback events can outnumber video metadata writes by orders of magnitude. Event ingestion needs batching, sampling for some event types, and backpressure.
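Batching, sampling, and backpressure can be combined in a small ingestion buffer. In this sketch the set of always-kept event types, the per-session CRC32 sampling, and the `EventBuffer` class are all assumptions made for illustration; which events are safe to sample is a product decision.

```python
import zlib

# Assumed set of events that are never sampled away.
CRITICAL = {"playback_started", "rebuffer_started", "rebuffer_ended", "playback_ended"}


def keep_event(event_type: str, session_id: str, sample_percent: int) -> bool:
    """Always keep critical events; keep the rest for a fixed share of sessions."""
    if event_type in CRITICAL:
        return True
    bucket = zlib.crc32(session_id.encode("utf-8")) % 100
    return bucket < sample_percent


class EventBuffer:
    """Bounded buffer that hands out batches and refuses events when full."""

    def __init__(self, batch_size: int, capacity: int) -> None:
        self.batch_size = batch_size
        self.capacity = capacity
        self.pending: list[dict] = []

    def offer(self, event: dict) -> bool:
        """Return False when full so the caller can back off or drop the event."""
        if len(self.pending) >= self.capacity:
            return False
        self.pending.append(event)
        return True

    def next_batch(self) -> list[dict]:
        """Remove and return up to batch_size events in arrival order."""
        batch = self.pending[: self.batch_size]
        self.pending = self.pending[self.batch_size:]
        return batch
```

Sampling by session rather than by individual event keeps each sampled session complete, which makes per-session metrics easier to compute.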
Manifest Correctness
The player can only fetch what the manifest describes. Missing segments, stale manifests, or expired URLs can break playback even if most assets exist.
9. Distributed Systems Concepts
Source Of Truth And Derived Artifacts
The uploaded source file and video metadata are source-of-truth state. Transcoded renditions, segments, thumbnails, manifests, search documents, recommendation signals, and analytics aggregates are derived.
That distinction matters because derived artifacts can be rebuilt, repaired, or regenerated.
Asynchronous Processing
Transcoding should run outside the upload request. This improves upload reliability but introduces processing states, retry policies, and user-facing delays before a video becomes playable.
Backpressure
Processing queues and watch event ingestion need backpressure. Without it, one traffic spike can overload workers, storage, event streams, or analytics consumers.
Idempotency
Upload completion, processing jobs, manifest publication, and watch event ingestion should tolerate retries. Workers may process the same job more than once, so publishing should be versioned or idempotent.
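Versioned, idempotent publication can be sketched as a registry keyed by video ID and version: publishing the same version twice is a no-op, and publishing different content under an existing version is rejected. The `ManifestRegistry` class and its in-memory dictionary are hypothetical; a real system would rely on a conditional write in its metadata store.

```python
class ManifestRegistry:
    """Records which manifest object was published for each (video, version)."""

    def __init__(self) -> None:
        self._published: dict[tuple[str, int], str] = {}

    def publish(self, video_id: str, version: int, object_key: str) -> bool:
        """Return True if newly published, False if this was a harmless retry."""
        key = (video_id, version)
        existing = self._published.get(key)
        if existing is None:
            self._published[key] = object_key
            return True
        if existing != object_key:
            # Same version, different content: refuse to overwrite.
            raise ValueError(f"version {version} of {video_id} already published")
        return False
```

A worker that processes the same job twice then produces the same result as one that processed it once.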
Caching
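Metadata, manifests, thumbnails, and media segments have different caching rules. Video segments are usually easier to cache when immutable. Authorization and signed URLs complicate cache sharing: a URL that differs per viewer cannot be reused across viewers unless the cache key ignores the signature.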
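One common way to build a short-lived signed URL is an HMAC over the asset path and an expiry time. The sketch below uses Python's standard `hmac` and `hashlib` modules; the message layout (`path:expires_at`) and the function names are assumptions, not the scheme of any particular CDN.

```python
import hashlib
import hmac


def sign_path(path: str, expires_at: int, secret: bytes) -> str:
    """Return a hex signature binding the path to its expiry time."""
    message = f"{path}:{expires_at}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def is_valid(path: str, expires_at: int, signature: str, secret: bytes, now: int) -> bool:
    """Accept the request only if it is unexpired and the signature matches."""
    if now > expires_at:
        return False
    expected = sign_path(path, expires_at, secret)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, signature)
```

Because the signature covers only the path and expiry, every viewer given the same expiry shares one URL, which is friendlier to caches than a per-user token.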
10. Reliability & Failure Handling
Important failure modes:
- upload succeeds but processing job is never enqueued
- processing succeeds but manifest publication fails
- manifest points to missing segments
- CDN caches a bad asset
- popular content causes regional CDN misses
- watch event ingestion falls behind
- playback API is healthy but CDN delivery fails
- source file is corrupted or unsupported
- retries create duplicate processing work
Repair strategies:
- reconciliation job finds uploaded videos without processing jobs
- processing state machine prevents silent stuck states
- dead-letter queues capture poison media jobs
- manifest validation checks segment existence before publish
- CDN purge or versioned asset paths recover from bad assets
- watch event pipeline can replay from durable streams
- dashboards track upload-to-playable latency and playback error rates
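The first repair strategy, a reconciliation job, can be sketched as a scan for videos that have been in the uploaded state longer than a grace period without a matching processing job. The record shape (`video_id`, `state`, `updated_at`) and the 5-minute grace period are assumptions for this example.

```python
def find_orphans(
    videos: list[dict],
    job_video_ids: set[str],
    now: float,
    grace_s: float = 300.0,
) -> list[str]:
    """Return IDs of uploaded videos that have no processing job after the grace period."""
    orphans = []
    for video in videos:
        waited = now - video["updated_at"]
        if (
            video["state"] == "uploaded"
            and video["video_id"] not in job_video_ids
            and waited > grace_s
        ):
            orphans.append(video["video_id"])
    return orphans
```

The grace period avoids flagging videos whose enqueue step is simply still in flight; each orphan found would be re-enqueued.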
11. Real-World Company Approaches
Public explanations of large video platforms often mention themes like transcoding pipelines, CDNs, adaptive bitrate playback, metadata services, recommendations, and analytics. The safe lesson to take away is not any private implementation detail but the reusable architecture shape:
source upload
-> asynchronous media processing
-> playback manifests and segments
-> CDN delivery
-> player telemetry
-> analytics and recommendations
Different products optimize differently.
A creator platform may prioritize upload throughput, moderation, and long-tail storage cost.
A subscription streaming platform may prioritize catalog quality, regional placement, entitlement checks, and predictable playback experience.
Both shapes still separate heavy media processing from low-latency playback.
12. Tradeoffs & Alternatives
| Decision | Option A | Option B | Tradeoff |
|---|---|---|---|
| Processing timing | Transcode before publish | Publish with partial renditions | Quality completeness vs faster availability |
| Segment length | Short segments | Long segments | Faster adaptation vs request overhead |
| Asset URLs | Stable versioned URLs | Short-lived signed URLs | Cache reuse vs access control |
| CDN strategy | Pull on demand | Prewarm selected content | Lower operational work vs better launch readiness |
| Analytics | Emit every event | Sample some events | Full fidelity vs ingestion cost |
| Encoding ladder | Many renditions | Few renditions | Playback flexibility vs compute/storage cost |
No single choice is universally correct. The product promise drives the architecture.
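The segment length row can be made concrete with a little arithmetic: the number of segment requests per rendition is the video duration divided by the segment length, rounded up. The durations and segment lengths below are example values, not recommendations.

```python
import math


def segment_requests(duration_s: float, segment_s: float) -> int:
    """Number of segments (and therefore requests) for one rendition."""
    return math.ceil(duration_s / segment_s)


# A 10-minute video, compared at two example segment lengths.
short = segment_requests(600, 2)   # 300 requests
long = segment_requests(600, 10)   # 60 requests
```

Shorter segments mean five times as many requests in this example, but a player that switches rendition at segment boundaries waits at most about 2 seconds instead of about 10 before a quality change takes effect.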
13. Evolution Path
Stage 1: Direct Upload And Basic Playback
Store source videos, generate one playable version, and serve it directly from object storage.
Stage 2: Background Transcoding
Introduce processing queues, workers, thumbnails, and multiple renditions.
Stage 3: CDN Delivery
Move playback segments to CDN paths and stop routing media bytes through application servers.
Stage 4: Adaptive Playback
Generate manifests, segment ladders, and player telemetry so playback adapts to network conditions.
Stage 5: Operational Media Platform
Add reprocessing, regional placement, entitlement rules, analytics replay, quality monitoring, and cost controls.
The architecture evolves because scale turns "store and serve a video" into a media lifecycle, delivery, and telemetry system.
14. Key Engineering Lessons
- Video playback should not be treated as a normal file download.
- The uploaded source file is not the same thing viewers stream.
- Transcoding creates derived artifacts that need retries, validation, and repair.
- Adaptive bitrate streaming protects playback from changing network conditions.
- CDN edge caching keeps repeated segment reads close to users.
- Manifest correctness is critical because players trust it.
- Watch analytics should be asynchronous and replayable.
- The viewer path should stay isolated from heavy media processing.