System Design

YouTube / Netflix Video Streaming System

Design a video streaming system that handles uploads, transcoding, adaptive bitrate playback, CDN delivery, metadata, recommendation hooks, and watch analytics.

Advanced | 12 min read | Updated 2026-05-20
Topics: Modeling, Capacity, Data, Reliability, Operations, Tradeoffs

Video Transcoding, Adaptive Bitrate Streaming, Playback Manifests, CDN Edge Caching, File Chunking, Caching, Event Streams, Backpressure, Analytics Pipelines, Idempotency, Dead-Letter Queues

After this, you will understand

Why video streaming is not a file download but a pipeline that turns uploads into many playback artifacts, then keeps viewers watching through CDN delivery and adaptive bitrate decisions.

Simple version

Upload one video file, store it, and let every viewer download or stream that original file from the application servers.

Breaks when

Uploads are huge, codecs differ, networks fluctuate, popular segments become globally hot, and watch events create high-volume analytics writes.

Architecture move

Separate upload, processing, playback metadata, CDN delivery, and analytics so the viewer path stays fast while heavy media work runs asynchronously.

Think before reading

If a creator uploads a 4 GB video and one million users start watching it, which work must happen before playback and which work must stay off the playback path?

Transcoding, packaging, and publishing can happen before a video becomes available, or progressively as renditions finish. Playback should mostly fetch manifests and CDN-cached segments, while analytics and recommendation signals flow asynchronously.

Concepts Covered

Video transcoding
Adaptive bitrate streaming
Playback manifests
CDN edge caching
Source uploads and file chunking
Metadata and entitlement checks
Playback session lifecycle
Watch events and analytics pipelines
Processing queues, retries, and backpressure
Dead-letter queues for failed media jobs

1. Introduction

A YouTube or Netflix-style video streaming system lets users upload or publish video, browse metadata, start playback quickly, and keep watching while the network changes.

The visible product behavior looks simple: click a video, press play, and watch.

The backend problem is harder because video is large, expensive, and read-heavy. The system is not only storing a file. It is preparing many playback versions, publishing them to delivery infrastructure, authorizing playback, collecting watch signals, and keeping the viewer path fast even when millions of people start the same video.

This module uses "YouTube / Netflix Video Streaming" as a familiar product shape, not as a claim about YouTube, Netflix, or any private implementation.

At small scale, a service can upload one video file and stream it from a web server. At production scale, that fails because:

the original upload may not be playable on every device
one file cannot fit all network conditions
application servers should not serve massive media bytes
popular videos create extreme read pressure
transcoding is CPU-heavy and slow
watch analytics should not slow playback
failures can leave video stuck between uploaded, processing, and playable states

The core mental model: video platforms separate the media lifecycle from the playback path.

upload -> process -> publish playback assets -> serve segments -> collect watch signals

2. Product Requirements

Functional Requirements

Creators or publishers can upload source videos.
The system stores video metadata such as title, duration, owner, visibility, thumbnails, and processing state.
Uploaded videos are processed into playback-ready renditions.
Viewers can start playback quickly.
Playback adapts to device and network conditions.
Popular videos can be served to many regions without overloading origin storage.
The system records watch events for analytics, recommendations, billing, or creator dashboards.
Operators can block, delete, unpublish, or reprocess videos.

Non-Functional Requirements

Playback startup should be low latency.
The system should minimize buffering during playback.
Media delivery should scale with read-heavy traffic.
Upload and processing failures should be recoverable.
Processing queues should not starve small videos behind huge jobs.
Analytics ingestion should tolerate high event volume.
Authorization should protect private or paid content.
Origin storage and application servers should be shielded from repeated segment reads.

3. Core Engineering Challenges

Challenge	Why it matters
Large uploads	Videos are too large for fragile one-shot requests.
Transcoding cost	Processing is CPU-heavy, slow, and failure-prone.
Device compatibility	Different clients support different codecs and resolutions.
Network variability	Viewers move between strong and weak network conditions.
CDN placement	Media bytes should be served close to users.
Hot content	Trending videos make the first segments extremely hot.
Playback metadata	Players need accurate manifests and segment URLs.
Watch analytics	Playback creates high-volume event streams.
Entitlement checks	Private, regional, or paid content needs access control.

The naive implementation fails when it treats a video as one file served by one service. A production design treats video as a pipeline of state transitions and derived assets.

4. High-Level Architecture

```mermaid
flowchart LR
  Creator[Creator Client] --> UploadAPI[Upload API]
  UploadAPI --> SourceStore[(Source Object Store)]
  UploadAPI --> MetadataDB[(Video Metadata DB)]
  UploadAPI --> ProcessingQueue[Processing Queue]

  ProcessingQueue --> TranscodeWorkers[Transcode Workers]
  TranscodeWorkers --> PlaybackStore[(Playback Asset Store)]
  TranscodeWorkers --> ManifestService[Manifest Publisher]
  ManifestService --> CDN[CDN Edge Caches]

  Viewer[Viewer Client] --> PlaybackAPI[Playback API]
  PlaybackAPI --> MetadataDB
  PlaybackAPI --> ManifestService
  Viewer --> CDN

  Viewer --> WatchEvents[Watch Event Ingestion]
  WatchEvents --> EventStream[Event Stream]
  EventStream --> AnalyticsStore[(Analytics Store)]
  EventStream --> RecommendationSignals[Recommendation Signals]
```

The upload path, processing path, playback path, and analytics path have different priorities.

Upload cares about durability, resumability, and source metadata.
Processing cares about queues, retries, workers, and derived artifacts.
Playback cares about latency, entitlement, manifests, CDN, and segments.
Analytics cares about high-volume ingestion and delayed aggregation.

5. Core Components

Upload API

The Upload API creates an upload session and records durable intent. Large videos should not depend on one long request. The upload path may use chunked or resumable upload so clients can recover from network drops.
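A minimal sketch of a resumable, chunked upload session, assuming a fixed chunk size and a client-supplied SHA-256 of the whole file. `UploadSession` and its methods are hypothetical names, and chunks are held in memory only to keep the example self-contained; a real service would write each chunk to the source object store.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class UploadSession:
    """Hypothetical server-side record for one resumable upload."""
    upload_id: str
    expected_size: int
    chunk_size: int
    expected_sha256: str
    chunks: dict[int, bytes] = field(default_factory=dict)

    @property
    def total_chunks(self) -> int:
        # Ceiling division: the last chunk may be shorter than chunk_size.
        return -(-self.expected_size // self.chunk_size)

    def put_chunk(self, index: int, data: bytes) -> None:
        # Re-sending the same chunk overwrites it, so client retries are safe.
        if not 0 <= index < self.total_chunks:
            raise ValueError(f"chunk index {index} out of range")
        self.chunks[index] = data

    def missing_chunks(self) -> list[int]:
        # After a network drop, the client asks which chunks to resend.
        return [i for i in range(self.total_chunks) if i not in self.chunks]

    def complete(self) -> bool:
        # Verify size and checksum before the video may be marked uploaded.
        if self.missing_chunks():
            return False
        body = b"".join(self.chunks[i] for i in range(self.total_chunks))
        return (len(body) == self.expected_size
                and hashlib.sha256(body).hexdigest() == self.expected_sha256)
```

The client only needs the session ID and the list from `missing_chunks()` to resume, which is what lets a dropped connection cost one chunk instead of the whole file.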

The upload state might move through:

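created -> uploading -> uploaded -> processing -> playable

Any state before playable can also move to failed, and a failed video can be retried instead of staying stuck.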

The Upload API should not synchronously transcode the video. It should verify the source object, write metadata, and enqueue processing work.
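One way to keep these states honest is an explicit transition table, so that code cannot jump from `created` straight to `playable`. This is a sketch: the retry edges out of `failed` and the choice to model reprocessing of a playable video as a new manifest version (not a state change) are assumptions, not part of the module.

```python
from enum import Enum


class VideoState(Enum):
    CREATED = "created"
    UPLOADING = "uploading"
    UPLOADED = "uploaded"
    PROCESSING = "processing"
    PLAYABLE = "playable"
    FAILED = "failed"


# Assumed transition table. Every pre-playable state may fail; a failed
# video may be re-uploaded or reprocessed. Reprocessing a playable video
# is assumed to publish a new manifest version instead of changing state.
ALLOWED = {
    VideoState.CREATED: {VideoState.UPLOADING, VideoState.FAILED},
    VideoState.UPLOADING: {VideoState.UPLOADED, VideoState.FAILED},
    VideoState.UPLOADED: {VideoState.PROCESSING, VideoState.FAILED},
    VideoState.PROCESSING: {VideoState.PLAYABLE, VideoState.FAILED},
    VideoState.PLAYABLE: set(),
    VideoState.FAILED: {VideoState.UPLOADING, VideoState.PROCESSING},
}


def transition(current: VideoState, target: VideoState) -> VideoState:
    """Return the new state, or raise if the move is not in the table."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```

Rejecting illegal moves at write time is what later lets a reconciliation job trust that a video in `uploaded` really has a verified source object.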

Source Object Store

The source object store holds the original uploaded file. This file is the source of truth for reprocessing, new encodings, quality fixes, thumbnails, and audit workflows.

The source file may rarely be read after processing, but it must be durable.

Video Metadata Service

Metadata includes:

video ID
owner or publisher ID
title and description
visibility
duration
upload state
processing state
available renditions
thumbnail references
moderation or policy state
regional or entitlement rules

Metadata is on the playback control path. Media bytes are not.

Transcoding Pipeline

The transcoding pipeline reads the source file and produces playback renditions. Workers may generate several resolutions, bitrates, audio tracks, thumbnails, captions, or preview sprites.

This is asynchronous because it is expensive and can fail. It needs retry limits, dead-letter handling, backpressure, and job prioritization.
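A minimal sketch of retry limits plus dead-letter handling for transcode jobs. `TranscodeJob`, `run_queue`, the limit of three attempts, and the `transcode` callable are all hypothetical; a real worker would also add backoff between attempts and read from a durable queue rather than an in-process `deque`.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class TranscodeJob:
    video_id: str
    attempts: int = 0


MAX_ATTEMPTS = 3  # assumed retry limit


def run_queue(queue: deque, transcode, dead_letters: list) -> list:
    """Drain the queue and return the video IDs that finished successfully.

    `transcode` is a hypothetical callable that raises on failure.
    """
    done = []
    while queue:
        job = queue.popleft()
        job.attempts += 1
        try:
            transcode(job.video_id)
            done.append(job.video_id)
        except Exception:
            if job.attempts >= MAX_ATTEMPTS:
                # Poison job: park it for inspection instead of retrying forever.
                dead_letters.append(job)
            else:
                # Requeue at the back so other videos are not blocked behind it.
                queue.append(job)
    return done
```

A corrupt or unsupported source ends up in `dead_letters` after three attempts, while healthy videos in the same queue still complete.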

Playback Asset Store

Playback assets are the segments, thumbnails, audio tracks, subtitle tracks, and manifests that viewers fetch. These assets should be immutable or versioned so CDN caches can serve them safely.
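Versioned asset paths are one simple way to get immutability: a new encode writes under a new version instead of overwriting old bytes. The key layout and the `.m4s` extension below are a hypothetical convention; the header value uses the standard `immutable` Cache-Control extension.

```python
# Immutable segments can carry a long-lived cache header ("immutable" is
# the Cache-Control extension defined in RFC 8246).
SEGMENT_CACHE_CONTROL = "public, max-age=31536000, immutable"


def segment_key(video_id: str, manifest_version: int,
                rendition_id: str, index: int) -> str:
    """Build an object key for one media segment (hypothetical layout).

    Because every re-encode gets a new manifest_version, a URL that a CDN
    has already cached never starts pointing at different bytes.
    """
    return f"videos/{video_id}/v{manifest_version}/{rendition_id}/seg_{index:05d}.m4s"
```

For example, `segment_key("abc", 2, "720p", 7)` yields `videos/abc/v2/720p/seg_00007.m4s`. Recovering from a bad encode then means publishing version 3, not purging version 2.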

Manifest Service

The manifest describes which renditions and segments are available. A player fetches it before downloading media segments.

The manifest is a contract:

these renditions exist
these segment URLs are valid
these tracks align on this timeline

If the manifest points to missing assets, playback fails.
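The contract can be checked before publishing. This sketch uses a simplified, hypothetical manifest shape (real players read HLS or DASH manifests) and assumes aligned renditions share the same segment count.

```python
def missing_assets(manifest: dict, existing_keys: set[str]) -> list[str]:
    """Return every segment key the manifest references that is not stored.

    Assumed manifest shape:
    {"renditions": [{"id": "720p", "segments": ["key1", "key2"]}, ...]}
    """
    missing = []
    for rendition in manifest["renditions"]:
        for key in rendition["segments"]:
            if key not in existing_keys:
                missing.append(key)
    return missing


def can_publish(manifest: dict, existing_keys: set[str]) -> bool:
    # Timeline check (assumed): aligned renditions have equal segment counts.
    counts = {len(r["segments"]) for r in manifest["renditions"]}
    # An empty manifest yields no counts and is rejected.
    return len(counts) == 1 and not missing_assets(manifest, existing_keys)
```

Running this check inside the manifest publisher turns "manifest points to missing segments" from a playback failure into a publish-time failure.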

CDN

The CDN serves playback segments close to viewers. It protects origin storage and reduces latency.

The CDN is especially important because video traffic is skewed. A small number of videos can dominate bandwidth, and the first few segments of each video are often hotter than later segments.

Playback API

The Playback API checks metadata, entitlement, region, device constraints, and playback state. It returns enough information for the player to fetch a manifest and begin playback.

It should not proxy every segment through the application backend. The heavy bytes should flow through CDN paths.
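A sketch of the Playback API's control-path job: check state and entitlement, then hand back a short-lived signed manifest URL that an edge can verify without calling the backend. The HMAC-SHA256 scheme, the `uid`/`exp`/`sig` parameter names, the `.m3u8` path, and the in-code key are assumptions for illustration; real CDNs define their own token formats.

```python
import hashlib
import hmac

SECRET = b"demo-signing-key"  # hypothetical; real keys live in a secret store


def _sign(path: str, user_id: str, expires_at: int) -> str:
    payload = f"{path}|{user_id}|{expires_at}".encode()
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()


def start_playback(video: dict, user_id: str, entitled: set[str],
                   now: int, ttl: int = 300) -> str:
    # Control-path checks only: no media bytes pass through this function.
    if video["processing_state"] != "playable":
        raise PermissionError("video is not playable")
    if video["visibility"] != "public" and user_id not in entitled:
        raise PermissionError("viewer is not entitled")
    path = f"/videos/{video['video_id']}/manifest.m3u8"
    expires_at = now + ttl
    return f"{path}?uid={user_id}&exp={expires_at}&sig={_sign(path, user_id, expires_at)}"


def verify(path: str, user_id: str, expires_at: int, sig: str, now: int) -> bool:
    """What an edge would run: reject expired or tampered URLs."""
    if now > expires_at:
        return False
    return hmac.compare_digest(_sign(path, user_id, expires_at), sig)
```

`hmac.compare_digest` is used so signature comparison does not leak timing information.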

Watch Event Ingestion

Players emit events such as:

playback started
first frame rendered
segment downloaded
quality changed
rebuffer started
rebuffer ended
playback paused
watch progress
playback ended

These events feed analytics, recommendations, creator dashboards, experiments, and reliability monitoring. They should be ingested asynchronously so analytics pressure does not slow playback.
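On the player side, "asynchronous" can be as simple as a bounded buffer that drops events instead of blocking. `WatchEventEmitter` is a hypothetical name, and dropping on overflow is one assumed policy; a client could instead discard the oldest events.

```python
import queue


class WatchEventEmitter:
    """Player-side buffer that never blocks playback (a sketch)."""

    def __init__(self, capacity: int = 1000):
        self._buffer: queue.Queue = queue.Queue(maxsize=capacity)
        self.dropped = 0

    def emit(self, event: dict) -> bool:
        # put_nowait raises queue.Full instead of waiting, so the playback
        # thread is never stalled by a slow analytics backend.
        try:
            self._buffer.put_nowait(event)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def drain(self, max_items: int) -> list[dict]:
        # Called by a background sender to build one upload batch.
        batch = []
        while len(batch) < max_items:
            try:
                batch.append(self._buffer.get_nowait())
            except queue.Empty:
                break
        return batch
```

Counting `dropped` and reporting it with the next batch lets the analytics side see how much telemetry was lost.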

6. Data Modeling

Video Metadata

video
- video_id
- owner_id
- title
- visibility
- upload_state
- processing_state
- duration_ms
- source_object_key
- created_at
- updated_at

Rendition

video_rendition
- rendition_id
- video_id
- codec
- resolution
- bitrate
- segment_prefix
- status
- created_at

Manifest

playback_manifest
- manifest_id
- video_id
- version
- manifest_object_key
- status
- published_at

Watch Event

watch_event
- event_id
- video_id
- user_id or anonymous_session_id
- session_id
- event_type
- playback_position_ms
- bitrate
- device_type
- region
- occurred_at

Watch events are append-heavy and high-volume, so they usually belong in an append-only event stream and an analytics store rather than in the transactional metadata database.

7. Request Lifecycle

Upload Lifecycle

1. Creator requests an upload session.
2. Upload service returns upload URL/session ID.
3. Client uploads chunks or source bytes.
4. Upload service verifies object size and checksum.
5. Metadata service marks video as uploaded.
6. Processing job is enqueued.
7. Workers transcode renditions and generate segments.
8. Manifest is published.
9. Video becomes playable.

If processing fails, the video should enter a recoverable failed state. Operators or automated repair jobs can retry, reprocess, or mark the upload as invalid.

Playback Lifecycle

1. Viewer opens video page.
2. Application fetches metadata.
3. Viewer presses play.
4. Playback API checks entitlement and availability.
5. Player receives manifest URL.
6. Player downloads manifest.
7. Player downloads initial segments from CDN.
8. Player adapts bitrate based on buffer and network.
9. Player emits watch events asynchronously.

The most important latency moments are startup and rebuffer recovery. Users notice time-to-first-frame and stalls more than they notice backend architecture elegance.
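Step 8 can be sketched with a simple throughput-and-buffer heuristic. The ladder values, the 0.8 safety factor, and the 5-second low-buffer threshold are assumptions, and this is not the algorithm of any particular player.

```python
# Hypothetical encoding ladder: (label, bitrate in kbit/s), lowest first.
LADDER = [("240p", 400), ("480p", 1200), ("720p", 2800), ("1080p", 5000)]


def choose_rendition(throughput_kbps: float, buffer_s: float,
                     safety: float = 0.8, low_buffer_s: float = 5.0) -> str:
    """Pick the highest rung that fits measured throughput with headroom."""
    if buffer_s < low_buffer_s:
        # Buffer nearly empty: take the lowest rung to avoid a stall.
        return LADDER[0][0]
    budget = throughput_kbps * safety
    best = LADDER[0][0]  # fall back to the lowest rung if nothing fits
    for label, bitrate in LADDER:
        if bitrate <= budget:
            best = label
    return best
```

With 4000 kbit/s measured and a healthy buffer, the budget is 3200 kbit/s, so the player picks 720p; the same network with only 2 seconds buffered drops to 240p to protect against rebuffering.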

8. Scaling Problems

Hot First Segments

The first segments of popular videos can become extremely hot because many users start playback but fewer finish the entire video. CDN and cache policy should account for this skew.

Processing Queue Pressure

Long videos can monopolize workers. A fair processing system may separate queues by duration, priority, publisher tier, or job type.
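A sketch of fair scheduling across separate queues using weighted round robin. The class name, queue names, and the 3:1 weighting are hypothetical; routing a job to "short" or "long" by duration would happen before `submit`.

```python
from collections import deque
from itertools import cycle


class WeightedScheduler:
    """Weighted round robin over job queues."""

    def __init__(self, weights: dict[str, int]):
        self.queues = {name: deque() for name in weights}
        # e.g. {"short": 3, "long": 1} -> pattern short, short, short, long
        self._pattern = [n for n, w in weights.items() for _ in range(w)]
        self._slots = cycle(self._pattern)

    def submit(self, queue_name: str, job) -> None:
        self.queues[queue_name].append(job)

    def next_job(self):
        # Walk at most one full pattern and skip empty queues, so workers
        # do not idle while any queue still has work.
        for _ in range(len(self._pattern)):
            q = self.queues[next(self._slots)]
            if q:
                return q.popleft()
        return None
```

Short jobs get most worker slots, but long jobs are guaranteed a share, so neither class starves the other.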

Origin Protection

If CDN hit ratio drops, origin storage can suddenly receive traffic it was not sized for. Origin shielding, cache prewarming, and immutable segment paths help.
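The arithmetic behind that risk is short: origin load scales with the miss ratio, not the hit ratio. The function below is a sketch with an optional shield tier, and the request rates in the note are illustrative, not measurements.

```python
def origin_requests_per_s(edge_rps: float, edge_hit_ratio: float,
                          shield_hit_ratio: float = 0.0) -> float:
    """Requests that reach origin storage.

    Edge misses go to the shield tier (if any); only shield misses reach origin.
    """
    return edge_rps * (1.0 - edge_hit_ratio) * (1.0 - shield_hit_ratio)
```

At an illustrative 1,000,000 segment requests per second, a 99% edge hit ratio sends about 10,000 requests per second to origin, while 95% sends about 50,000. A four-point drop in hit ratio multiplies origin load by five. A shield tier that absorbs 90% of edge misses cuts that 50,000 back to about 5,000.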

Watch Event Volume

Playback events can outnumber video metadata writes by orders of magnitude. Event ingestion needs batching, sampling for some event types, and backpressure.
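A sketch of ingestion-side sampling and batching. The per-type sample rates are an assumed policy: business-critical events such as `playback_started` are always kept, while chatty events are sampled. Each kept event records its `sample_rate` so aggregates can be scaled back up.

```python
import random

# Assumed policy: unlisted event types are kept at rate 1.0.
SAMPLE_RATES = {"segment_downloaded": 0.1, "quality_changed": 0.5}


def should_keep(event_type: str, rng: random.Random) -> bool:
    # rng.random() is in [0, 1), so a rate of 1.0 always keeps the event.
    return rng.random() < SAMPLE_RATES.get(event_type, 1.0)


def to_batches(events: list[dict], batch_size: int,
               rng: random.Random) -> list[list[dict]]:
    """Drop sampled-out events, then group the rest into fixed-size batches."""
    kept = [
        {**e, "sample_rate": SAMPLE_RATES.get(e["event_type"], 1.0)}
        for e in events
        if should_keep(e["event_type"], rng)
    ]
    return [kept[i:i + batch_size] for i in range(0, len(kept), batch_size)]
```

An aggregator estimating segment downloads would count each kept `segment_downloaded` event as 1 / 0.1 = 10 downloads.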

Manifest Correctness

The player can only fetch what the manifest describes. Missing segments, stale manifests, or expired URLs can break playback even if most assets exist.

9. Distributed Systems Concepts

Source Of Truth And Derived Artifacts

The uploaded source file and video metadata are source-of-truth state. Transcoded renditions, segments, thumbnails, manifests, search documents, recommendation signals, and analytics aggregates are derived.

That distinction matters because derived artifacts can be rebuilt, repaired, or regenerated.

Asynchronous Processing

Transcoding should run outside the upload request. This improves upload reliability but introduces processing states, retry policies, and user-facing delays before a video becomes playable.

Backpressure

Processing queues and watch event ingestion need backpressure. Without it, one traffic spike can overload workers, storage, event streams, or analytics consumers.

Idempotency

Upload completion, processing jobs, manifest publication, and watch event ingestion should tolerate retries. Workers may process the same job more than once, so publishing should be versioned or idempotent.
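A sketch of both ideas: versioned manifest publication that ignores stale or repeated publishes, and event-ID deduplication for watch events. Both classes are hypothetical and in-memory; a real system would use a durable compare-and-set for versions and a bounded dedup window for event IDs.

```python
class ManifestRegistry:
    """Tracks the published manifest version per video."""

    def __init__(self):
        self._published: dict[str, int] = {}

    def publish(self, video_id: str, version: int) -> bool:
        # Accept only a strictly newer version. A retried worker that
        # republishes the same or an older version is a no-op, not a rollback.
        if version <= self._published.get(video_id, 0):
            return False
        self._published[video_id] = version
        return True

    def current(self, video_id: str) -> int:
        return self._published.get(video_id, 0)


class EventDeduper:
    """Drops watch events whose event_id has already been ingested."""

    def __init__(self):
        self._seen: set[str] = set()

    def accept(self, event_id: str) -> bool:
        if event_id in self._seen:
            return False
        self._seen.add(event_id)
        return True
```

With this shape, a duplicate transcode that finishes late cannot overwrite a newer manifest, and a player retrying a batch does not double-count watch time.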

Caching

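Metadata, manifests, thumbnails, and media segments have different caching rules. Video segments are easiest to cache when they are immutable. Authorization and signed URLs complicate cache sharing: if a per-viewer token becomes part of the cache key, every viewer misses the edge cache for the same segment.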

10. Reliability & Failure Handling

Important failure modes:

upload succeeds but processing job is never enqueued
processing succeeds but manifest publication fails
manifest points to missing segments
CDN caches a bad asset
popular content causes regional CDN misses
watch event ingestion falls behind
playback API is healthy but CDN delivery fails
source file is corrupted or unsupported
retries create duplicate processing work

Repair strategies:

reconciliation job finds uploaded videos without processing jobs
processing state machine prevents silent stuck states
dead-letter queues capture poison media jobs
manifest validation checks segment existence before publish
CDN purge or versioned asset paths recover from bad assets
watch event pipeline can replay from durable streams
dashboards track upload-to-playable latency and playback error rates
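The first repair strategy can be sketched as a periodic scan over metadata. The field names follow the `video` data model above; `enqueued_ids` (the set of videos with an open processing job) and the grace period are assumptions, the latter added to avoid racing the normal enqueue step.

```python
from datetime import datetime, timedelta


def find_stuck_uploads(videos: list[dict], enqueued_ids: set[str],
                       now: datetime, grace: timedelta) -> list[str]:
    """Return IDs of videos that are 'uploaded' but have no processing job."""
    return [
        v["video_id"]
        for v in videos
        if v["upload_state"] == "uploaded"
        and v["video_id"] not in enqueued_ids
        # Skip very recent uploads: their enqueue may still be in flight.
        and now - v["updated_at"] > grace
    ]
```

Re-enqueueing the returned IDs is safe only if processing jobs are idempotent, which is why reconciliation and idempotency are usually designed together.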

11. Real-World Company Approaches

Public explanations of large video platforms often mention themes such as transcoding pipelines, CDNs, adaptive bitrate playback, metadata services, recommendations, and analytics. The safe lesson is not any private implementation detail but the reusable architecture shape:

source upload
  -> asynchronous media processing
  -> playback manifests and segments
  -> CDN delivery
  -> player telemetry
  -> analytics and recommendations

Different products optimize differently.

A creator platform may prioritize upload throughput, moderation, and long-tail storage cost.

A subscription streaming platform may prioritize catalog quality, regional placement, entitlement checks, and predictable playback experience.

Both shapes still separate heavy media processing from low-latency playback.

12. Tradeoffs & Alternatives

Decision	Option A	Option B	Tradeoff
Processing timing	Transcode before publish	Publish partial availability	Faster availability vs quality completeness
Segment length	Short segments	Long segments	Faster adaptation vs request overhead
Asset URLs	Stable versioned URLs	Short-lived signed URLs	Cache reuse vs access control
CDN strategy	Pull on demand	Prewarm selected content	Lower operational work vs better launch readiness
Analytics	Emit every event	Sample some events	Full fidelity vs ingestion cost
Encoding ladder	Many renditions	Few renditions	Playback flexibility vs compute/storage cost

No single choice is universally correct. The product promise drives the architecture.

Key Takeaways

Video playback should not be treated as a normal file download.
The uploaded source file is not the same thing viewers stream.
Transcoding creates derived artifacts that need retries, validation, and repair.
Adaptive bitrate streaming protects playback from changing network conditions.
CDN edge caching keeps repeated segment reads close to users.
Manifest correctness is critical because players trust it.
Watch analytics should be asynchronous and replayable.
The viewer path should stay isolated from heavy media processing.

What to study next

These links keep the session moving: read prerequisites first, then open the systems, concepts, and patterns that deepen this page.

Used In Systems

System studies where this idea appears in context.

Netflix-Style Global Live Event Streaming System: see the idea under full production pressure.

Related Patterns

Reusable architecture moves built from these ideas.

Upload Then Reference Media
Retry With Backoff And Jitter
Dead-Letter Queue
Bulkhead Isolation