System Design

YouTube / Netflix Video Streaming System

Design a video streaming system that handles uploads, transcoding, adaptive bitrate playback, CDN delivery, metadata, recommendation hooks, and watch analytics.

Advanced · 12 min read · Updated 2026-05-20
Topics: Modeling, Capacity, Data, Reliability, Operations, Tradeoffs
Concepts and patterns: Video Transcoding, Adaptive Bitrate Streaming, Playback Manifests, CDN Edge Caching, File Chunking, Caching, Event Streams, Backpressure, Analytics Pipelines, Idempotency, Dead-Letter Queues

After this, you will understand

Why video streaming is not a file download but a pipeline that turns uploads into many playback artifacts and keeps viewers watching through CDN delivery and adaptive bitrate decisions.

Simple version

Upload one video file, store it, and let every viewer download or stream that original file from the application servers.

Breaks when

Uploads are huge, codecs differ, networks fluctuate, popular segments become globally hot, and watch events create high-volume analytics writes.

Architecture move

Separate upload, processing, playback metadata, CDN delivery, and analytics so the viewer path stays fast while heavy media work runs asynchronously.

Think before reading: If a creator uploads a 4 GB video and one million users start watching it, which work must happen before playback and which work must stay off the playback path?

Transcoding, packaging, and manifest publishing must finish before the video becomes playable (or, with partial availability, before each rendition is offered). Playback should mostly fetch manifests and CDN-cached segments, while analytics and recommendation signals flow asynchronously.

Study path

Read these in order

Start with the mechanics, then move into the patterns that explain why the system is shaped this way.

  1. Video Transcoding (Concept)
  2. Adaptive Bitrate Streaming (Concept)
  3. Playback Manifests (Concept)
  4. CDN Edge Caching (Concept)
  5. File Chunking (Concept)
  6. Analytics Pipelines (Concept)
  7. Backpressure (Concept)
  8. Retry With Backoff And Jitter (Pattern)
  9. Dead-Letter Queue (Pattern)

1. Introduction

A YouTube or Netflix-style video streaming system lets users upload or publish video, browse metadata, start playback quickly, and keep watching while the network changes.

The visible product behavior looks simple: click a video, press play, and watch.

The backend problem is harder because video is large, expensive, and read-heavy. The system is not only storing a file. It is preparing many playback versions, publishing them to delivery infrastructure, authorizing playback, collecting watch signals, and keeping the viewer path fast even when millions of people start the same video.

This module uses "YouTube / Netflix Video Streaming" as a familiar product shape, not as a claim about YouTube, Netflix, or any private implementation.

At small scale, a service can upload one video file and stream it from a web server. At production scale, that fails because:

  • the original upload may not be playable on every device
  • one file cannot fit all network conditions
  • application servers should not serve massive media bytes
  • popular videos create extreme read pressure
  • transcoding is CPU-heavy and slow
  • watch analytics should not slow playback
  • failures can leave video stuck between uploaded, processing, and playable states

The core mental model: video platforms separate the media lifecycle from the playback path.

upload -> process -> publish playback assets -> serve segments -> collect watch signals

2. Product Requirements

Functional Requirements

  • Creators or publishers can upload source videos.
  • The system stores video metadata such as title, duration, owner, visibility, thumbnails, and processing state.
  • Uploaded videos are processed into playback-ready renditions.
  • Viewers can start playback quickly.
  • Playback adapts to device and network conditions.
  • Popular videos can be served to many regions without overloading origin storage.
  • The system records watch events for analytics, recommendations, billing, or creator dashboards.
  • Operators can block, delete, unpublish, or reprocess videos.

Non-Functional Requirements

  • Playback startup should be low latency.
  • The system should minimize buffering during playback.
  • Media delivery should scale with read-heavy traffic.
  • Upload and processing failures should be recoverable.
  • Processing queues should not starve small videos behind huge jobs.
  • Analytics ingestion should tolerate high event volume.
  • Authorization should protect private or paid content.
  • Origin storage and application servers should be shielded from repeated segment reads.

3. Core Engineering Challenges

Challenge | Why it matters
--- | ---
Large uploads | Videos are too large for fragile one-shot requests.
Transcoding cost | Processing is CPU-heavy, slow, and failure-prone.
Device compatibility | Different clients support different codecs and resolutions.
Network variability | Viewers move between strong and weak network conditions.
CDN placement | Media bytes should be served close to users.
Hot content | Trending videos make the first segments extremely hot.
Playback metadata | Players need accurate manifests and segment URLs.
Watch analytics | Playback creates high-volume event streams.
Entitlement checks | Private, regional, or paid content needs access control.

The naive implementation fails when it treats a video as one file served by one service. A production design treats video as a pipeline of state transitions and derived assets.

4. High-Level Architecture

flowchart LR
  Creator[Creator Client] --> UploadAPI[Upload API]
  UploadAPI --> SourceStore[(Source Object Store)]
  UploadAPI --> MetadataDB[(Video Metadata DB)]
  UploadAPI --> ProcessingQueue[Processing Queue]

  ProcessingQueue --> TranscodeWorkers[Transcode Workers]
  TranscodeWorkers --> PlaybackStore[(Playback Asset Store)]
  TranscodeWorkers --> ManifestService[Manifest Publisher]
  ManifestService --> CDN[CDN Edge Caches]

  Viewer[Viewer Client] --> PlaybackAPI[Playback API]
  PlaybackAPI --> MetadataDB
  PlaybackAPI --> ManifestService
  Viewer --> CDN

  Viewer --> WatchEvents[Watch Event Ingestion]
  WatchEvents --> EventStream[Event Stream]
  EventStream --> AnalyticsStore[(Analytics Store)]
  EventStream --> RecommendationSignals[Recommendation Signals]

The upload path, processing path, playback path, and analytics path have different priorities.

  • Upload cares about durability, resumability, and source metadata.
  • Processing cares about queues, retries, workers, and derived artifacts.
  • Playback cares about latency, entitlement, manifests, CDN, and segments.
  • Analytics cares about high-volume ingestion and delayed aggregation.

5. Core Components

Upload API

The Upload API creates an upload session and records durable intent. Large videos should not depend on one long request. The upload path may use chunked or resumable upload so clients can recover from network drops.
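
To make the chunked upload idea concrete, here is a minimal sketch of how an upload session might track which chunks have arrived. The `UploadSession` class, its method names, and the 8 MiB chunk size are illustrative assumptions, not part of any specific upload protocol.

```python
from dataclasses import dataclass, field

CHUNK_SIZE = 8 * 1024 * 1024  # assumed chunk size (8 MiB), chosen for illustration


@dataclass
class UploadSession:
    """Hypothetical server-side record of a resumable upload."""
    total_bytes: int
    chunk_size: int = CHUNK_SIZE
    received: set = field(default_factory=set)

    @property
    def chunk_count(self) -> int:
        # Ceiling division so a trailing partial chunk is counted.
        return (self.total_bytes + self.chunk_size - 1) // self.chunk_size

    def byte_range(self, index: int) -> tuple:
        """Return (start, end) for a chunk, with end exclusive."""
        if not 0 <= index < self.chunk_count:
            raise IndexError(f"chunk {index} out of range")
        start = index * self.chunk_size
        return start, min(start + self.chunk_size, self.total_bytes)

    def mark_received(self, index: int) -> None:
        self.byte_range(index)  # validates the index
        self.received.add(index)  # a set makes re-sent chunks harmless

    def missing_chunks(self) -> list:
        return [i for i in range(self.chunk_count) if i not in self.received]

    def is_complete(self) -> bool:
        return len(self.received) == self.chunk_count
```

After a network drop, a client could ask the server for `missing_chunks()` and resend only those byte ranges instead of restarting the whole upload.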

The upload state might move through:

created -> uploading -> uploaded -> processing -> playable

Any non-terminal step can instead move to failed (for example, uploading -> failed or processing -> failed).

The Upload API should not synchronously transcode the video. It should verify the source object, write metadata, and enqueue processing work.
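
One way to keep these states honest is an explicit transition table that rejects illegal jumps. The sketch below is an assumption about how such a table could look; in particular, the edges out of `failed` and `playable` (retry and reprocess) are illustrative choices rather than rules stated by this design.

```python
# Allowed transitions, keyed by current state. The exact edges are assumptions.
ALLOWED_TRANSITIONS = {
    "created": {"uploading", "failed"},
    "uploading": {"uploaded", "failed"},
    "uploaded": {"processing", "failed"},
    "processing": {"playable", "failed"},
    "failed": {"processing"},    # assumed: a retry or repair job can reprocess
    "playable": {"processing"},  # assumed: operators can trigger reprocessing
}


def transition(current: str, target: str) -> str:
    """Return the new state, or raise ValueError if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {target}")
    return target
```

Rejecting unknown transitions at write time makes stuck or skipped states visible as errors instead of silent data drift.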

Source Object Store

The source object store holds the original uploaded file. This file is the source of truth for reprocessing, new encodings, quality fixes, thumbnails, and audit workflows.

The source file may be rarely read after processing, but it must be durable.

Video Metadata Service

Metadata includes:

  • video ID
  • owner or publisher ID
  • title and description
  • visibility
  • duration
  • upload state
  • processing state
  • available renditions
  • thumbnail references
  • moderation or policy state
  • regional or entitlement rules

Metadata is on the playback control path. Media bytes are not.

Transcoding Pipeline

The transcoding pipeline reads the source file and produces playback renditions. Workers may generate several resolutions, bitrates, audio tracks, thumbnails, captions, or preview sprites.

This is asynchronous because it is expensive and can fail. It needs retry limits, dead-letter handling, backpressure, and job prioritization.
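
Retry limits and dead-letter handling can be sketched in a few lines. This is a simplified illustration, assuming a single in-process worker and a plain list standing in for the dead-letter queue; `process_with_retries`, `backoff_delay`, and the default limits are hypothetical names and values.

```python
import random
import time


def backoff_delay(attempt: int, base_seconds: float = 1.0, cap_seconds: float = 60.0) -> float:
    """Full-jitter delay: uniform between 0 and min(cap, base * 2**attempt)."""
    return random.uniform(0.0, min(cap_seconds, base_seconds * (2 ** attempt)))


def process_with_retries(job, handler, dead_letters, max_attempts=3, sleep=time.sleep):
    """Run handler(job); after max_attempts failures, park the job in dead_letters."""
    last_error = None
    for attempt in range(max_attempts):
        try:
            return handler(job)
        except Exception as exc:  # a real worker would separate retryable from fatal errors
            last_error = exc
            if attempt + 1 < max_attempts:
                sleep(backoff_delay(attempt))
    # Retries exhausted: record the poison job instead of retrying forever.
    dead_letters.append({"job": job, "error": repr(last_error), "attempts": max_attempts})
    return None
```

The jitter spreads retries out so that many workers failing at the same moment do not all retry at the same moment.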

Playback Asset Store

Playback assets are the segments, thumbnails, audio tracks, subtitle tracks, and manifests that viewers fetch. These assets should be immutable or versioned so CDN caches can serve them safely.

Manifest Service

The manifest describes which renditions and segments are available. A player fetches it before downloading media segments.

The manifest is a contract:

these renditions exist
these segment URLs are valid
these tracks align on this timeline

If the manifest points to missing assets, playback fails.
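
A publisher can enforce that contract with a check before publishing. The sketch below assumes a simplified manifest shape (a dict with a `renditions` list, each holding `segments` as object keys) and a set of keys known to exist in the playback asset store; both shapes are assumptions for illustration, not a real manifest format.

```python
def validate_manifest(manifest: dict, existing_keys: set) -> list:
    """Return a list of problems; an empty list means the manifest is safe to publish."""
    problems = []
    segment_counts = set()
    for rendition in manifest["renditions"]:
        segment_counts.add(len(rendition["segments"]))
        for key in rendition["segments"]:
            if key not in existing_keys:
                problems.append(f"missing segment: {key}")
    # Renditions that differ in segment count cannot align on one timeline.
    if len(segment_counts) > 1:
        problems.append("renditions have different segment counts")
    return problems
```

A publish step would refuse to mark the manifest as published whenever this list is non-empty.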

CDN

The CDN serves playback segments close to viewers. It protects origin storage and reduces latency.

The CDN is especially important because video traffic is skewed. A small number of videos can dominate bandwidth, and the first few segments of each video are often hotter than later segments.

Playback API

The Playback API checks metadata, entitlement, region, device constraints, and playback state. It returns enough information for the player to fetch a manifest and begin playback.

It should not proxy every segment through the application backend. The heavy bytes should flow through CDN paths.
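
One common way to let the CDN serve bytes while the Playback API keeps control of access is a short-lived signed URL. The following is a generic HMAC sketch, not the token format of any particular CDN; the secret, the `expires` and `sig` parameter names, and the function names are all assumptions.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-secret"  # assumed shared secret between the Playback API and the edge


def sign_url(path: str, expires_at: int, secret: bytes = SECRET) -> str:
    """Append an expiry time and an HMAC-SHA256 signature to a path."""
    message = f"{path}:{expires_at}".encode()
    signature = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return f"{path}?{urlencode({'expires': expires_at, 'sig': signature})}"


def verify_url(url: str, now: int, secret: bytes = SECRET) -> bool:
    """Check that the URL has not expired and that the signature matches the path."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires_at = int(params["expires"][0])
        signature = params["sig"][0]
    except (KeyError, ValueError):
        return False
    if now >= expires_at:
        return False
    message = f"{parsed.path}:{expires_at}".encode()
    expected = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(signature, expected)
```

The Playback API would call `sign_url` after the entitlement check, and the edge would call `verify_url` on each request without contacting the application backend.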

Watch Event Ingestion

Players emit events such as:

  • playback started
  • first frame rendered
  • segment downloaded
  • quality changed
  • rebuffer started
  • rebuffer ended
  • playback paused
  • watch progress
  • playback ended

These events feed analytics, recommendations, creator dashboards, experiments, and reliability monitoring. They should be ingested asynchronously so analytics pressure does not slow playback.

6. Data Modeling

Video Metadata

video
- video_id
- owner_id
- title
- visibility
- upload_state
- processing_state
- duration_ms
- source_object_key
- created_at
- updated_at

Rendition

video_rendition
- rendition_id
- video_id
- codec
- resolution
- bitrate
- segment_prefix
- status
- created_at

Manifest

playback_manifest
- manifest_id
- video_id
- version
- manifest_object_key
- status
- published_at

Watch Event

watch_event
- event_id
- video_id
- user_id or anonymous_session_id
- session_id
- event_type
- playback_position_ms
- bitrate
- device_type
- region
- occurred_at

Watch events are usually append-heavy and high-volume. They should not be stored like normal transactional metadata.
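
As a sketch of the append-oriented shape, the event record above can be modeled as an immutable value and written as newline-delimited JSON, which suits batch appends to a log or stream. The `WatchEvent` class and `to_json_lines` helper are illustrative assumptions; `occurred_at` is assumed to be an ISO 8601 string.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class WatchEvent:
    """Immutable event record mirroring the fields listed above."""
    event_id: str
    video_id: str
    session_id: str
    event_type: str
    playback_position_ms: int
    bitrate: int
    device_type: str
    region: str
    occurred_at: str  # assumed ISO 8601 timestamp
    user_id: Optional[str] = None  # None for anonymous sessions


def to_json_lines(events) -> str:
    """Serialize events as one JSON object per line, ready to append to a log."""
    return "".join(json.dumps(asdict(event), sort_keys=True) + "\n" for event in events)
```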

7. Request Lifecycle

Upload Lifecycle

1. Creator requests an upload session.
2. Upload service returns upload URL/session ID.
3. Client uploads chunks or source bytes.
4. Upload service verifies object size and checksum.
5. Metadata service marks video as uploaded.
6. Processing job is enqueued.
7. Workers transcode renditions and generate segments.
8. Manifest is published.
9. Video becomes playable.

If processing fails, the video should enter a recoverable failed state. Operators or automated repair jobs can retry, reprocess, or mark the upload as invalid.

Playback Lifecycle

1. Viewer opens video page.
2. Application fetches metadata.
3. Viewer presses play.
4. Playback API checks entitlement and availability.
5. Player receives manifest URL.
6. Player downloads manifest.
7. Player downloads initial segments from CDN.
8. Player adapts bitrate based on buffer and network.
9. Player emits watch events asynchronously.

The most important latency moments are startup and rebuffer recovery. Users notice time-to-first-frame and stalls more than they notice backend architecture elegance.
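
The bitrate adaptation in step 8 can be illustrated with a deliberately simple rule: drop to the lowest rendition when the buffer is nearly empty, otherwise pick the highest bitrate that fits within a safety margin of measured throughput. Real players use more sophisticated logic; the ladder values, the 0.8 safety factor, and the 5-second threshold below are assumptions.

```python
LADDER_KBPS = [400, 1200, 2500, 5000]  # assumed encoding ladder, in kilobits per second


def choose_bitrate(throughput_kbps: float, buffer_seconds: float,
                   ladder=LADDER_KBPS, safety: float = 0.8,
                   low_buffer_seconds: float = 5.0) -> int:
    """Pick a rendition bitrate from measured throughput and current buffer."""
    lowest = min(ladder)
    # Nearly empty buffer: favor avoiding a stall over picture quality.
    if buffer_seconds < low_buffer_seconds:
        return lowest
    budget = throughput_kbps * safety
    candidates = [bitrate for bitrate in ladder if bitrate <= budget]
    return max(candidates) if candidates else lowest
```

For example, with 4000 kbps of measured throughput and a healthy buffer, the budget is 3200 kbps, so this rule selects the 2500 kbps rendition.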

8. Scaling Problems

Hot First Segments

The first segments of popular videos can become extremely hot because many users start playback but fewer finish the entire video. CDN and cache policy should account for this skew.

Processing Queue Pressure

Long videos can monopolize workers. A fair processing system may separate queues by duration, priority, publisher tier, or job type.
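
A minimal sketch of duration-based lanes follows: short jobs get most of the worker slots, but a long job is taken after a fixed number of short ones so neither lane starves. The `LaneScheduler` class, the 10-minute threshold, and the 3:1 ratio are illustrative assumptions.

```python
from collections import deque


class LaneScheduler:
    """Two-lane scheduler: short jobs are favored, long jobs still make progress."""

    def __init__(self, short_limit_seconds: int = 600, short_per_long: int = 3):
        self.short = deque()
        self.long = deque()
        self.short_limit_seconds = short_limit_seconds
        self.short_per_long = short_per_long
        self._short_streak = 0  # short jobs handed out since the last long job

    def submit(self, job_id: str, duration_seconds: int) -> None:
        lane = self.short if duration_seconds <= self.short_limit_seconds else self.long
        lane.append(job_id)

    def next_job(self):
        """Return the next job ID, or None when both lanes are empty."""
        take_long = self.long and (
            not self.short or self._short_streak >= self.short_per_long
        )
        if take_long:
            self._short_streak = 0
            return self.long.popleft()
        if self.short:
            self._short_streak += 1
            return self.short.popleft()
        return None
```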

Origin Protection

If CDN hit ratio drops, origin storage can suddenly receive traffic it was not sized for. Origin shielding, cache prewarming, and immutable segment paths help.
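
The effect of edge caching on origin load can be seen with a toy LRU cache that counts origin fetches. This is a single-node simplification with hypothetical names; real CDNs add tiers, shielding layers, and request coalescing.

```python
from collections import OrderedDict


class EdgeCache:
    """Toy LRU edge cache that records how often it falls back to origin."""

    def __init__(self, capacity: int, fetch_from_origin):
        self.capacity = capacity
        self._fetch = fetch_from_origin
        self._items = OrderedDict()
        self.hits = 0
        self.origin_fetches = 0

    def get(self, key: str):
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._items[key]
        self.origin_fetches += 1
        value = self._fetch(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry
        return value
```

With a hot first segment, a thousand viewer requests at one edge turn into a single origin read; if the entry is evicted or the cache key changes, the next request goes back to origin.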

Watch Event Volume

Playback events can outnumber video metadata writes by orders of magnitude. Event ingestion needs batching, sampling for some event types, and backpressure.
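
Batching and sampling can be sketched as two small helpers. The choice of which event types are always kept, the 10 percent default sample rate, and the batch size are assumptions for illustration; sampling here is decided per session so that a sampled session keeps all of its events.

```python
import zlib

# Assumed: lifecycle and rebuffer events are always kept; chattier types are sampled.
ALWAYS_KEEP = {"playback_started", "playback_ended", "rebuffer_started", "rebuffer_ended"}


def keep_event(event: dict, sample_percent: int = 10) -> bool:
    """Decide deterministically, per session, whether to keep a sampled event."""
    if event["event_type"] in ALWAYS_KEEP:
        return True
    bucket = zlib.crc32(event["session_id"].encode()) % 100
    return bucket < sample_percent


def batches(events, batch_size: int = 100):
    """Group events into lists of at most batch_size for fewer, larger writes."""
    batch = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch
```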

Manifest Correctness

The player can only fetch what the manifest describes. Missing segments, stale manifests, or expired URLs can break playback even if most assets exist.

9. Distributed Systems Concepts

Source Of Truth And Derived Artifacts

The uploaded source file and video metadata are source-of-truth state. Transcoded renditions, segments, thumbnails, manifests, search documents, recommendation signals, and analytics aggregates are derived.

That distinction matters because derived artifacts can be rebuilt, repaired, or regenerated.

Asynchronous Processing

Transcoding should run outside the upload request. This improves upload reliability but introduces processing states, retry policies, and user-facing delays before a video becomes playable.

Backpressure

Processing queues and watch event ingestion need backpressure. Without it, one traffic spike can overload workers, storage, event streams, or analytics consumers.
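
A bounded queue is the simplest form of backpressure: when it is full, the producer is told immediately rather than letting work pile up without limit. The `BoundedIngest` class below is a hypothetical in-process sketch; a distributed system would surface the same signal through rate limits or rejected requests.

```python
import queue


class BoundedIngest:
    """Accepts items until max_pending is reached, then rejects new ones."""

    def __init__(self, max_pending: int):
        self._queue = queue.Queue(maxsize=max_pending)
        self.rejected = 0

    def offer(self, item) -> bool:
        """Try to enqueue without blocking; False tells the caller to slow down."""
        try:
            self._queue.put_nowait(item)
            return True
        except queue.Full:
            self.rejected += 1
            return False

    def drain(self, max_items: int) -> list:
        """Remove up to max_items for a consumer to process."""
        items = []
        while len(items) < max_items:
            try:
                items.append(self._queue.get_nowait())
            except queue.Empty:
                break
        return items
```

A caller that receives `False` can retry later, drop a low-value event, or return an error that tells the client to back off.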

Idempotency

Upload completion, processing jobs, manifest publication, and watch event ingestion should tolerate retries. Workers may process the same job more than once, so publishing should be versioned or idempotent.
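
Versioned, idempotent publication can be sketched with a registry keyed by video ID and version, so a duplicate delivery of the same job becomes a no-op. The `ManifestRegistry` class is a hypothetical in-memory stand-in for what would normally be a conditional write against the metadata store.

```python
class ManifestRegistry:
    """In-memory sketch of idempotent, versioned manifest publication."""

    def __init__(self):
        self._published = {}       # (video_id, version) -> manifest_object_key
        self.current_version = {}  # video_id -> highest published version

    def publish(self, video_id: str, version: int, manifest_object_key: str) -> bool:
        """Return True if this call published something new, False on a duplicate."""
        key = (video_id, version)
        existing = self._published.get(key)
        if existing is not None:
            if existing != manifest_object_key:
                # Same version with different content points to a bug upstream.
                raise ValueError(f"version {version} of {video_id} already published")
            return False  # retried job: nothing to do
        self._published[key] = manifest_object_key
        if version > self.current_version.get(video_id, 0):
            self.current_version[video_id] = version
        return True
```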

Caching

Metadata, manifests, thumbnails, and media segments have different caching rules. Video segments are easiest to cache when they are immutable. Authorization and signed URLs complicate cache sharing, because per-viewer URLs or tokens can split one segment across many cache keys.

10. Reliability & Failure Handling

Important failure modes:

  • upload succeeds but processing job is never enqueued
  • processing succeeds but manifest publication fails
  • manifest points to missing segments
  • CDN caches a bad asset
  • popular content causes regional CDN misses
  • watch event ingestion falls behind
  • playback API is healthy but CDN delivery fails
  • source file is corrupted or unsupported
  • retries create duplicate processing work

Repair strategies:

  • reconciliation job finds uploaded videos without processing jobs
  • processing state machine prevents silent stuck states
  • dead-letter queues capture poison media jobs
  • manifest validation checks segment existence before publish
  • CDN purge or versioned asset paths recover from bad assets
  • watch event pipeline can replay from durable streams
  • dashboards track upload-to-playable latency and playback error rates
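
The first repair strategy, a reconciliation job, can be sketched as a scan that compares uploaded videos against known processing jobs. The record shape follows the video metadata fields above; treating `updated_at` as epoch seconds and using a 5-minute grace period are assumptions.

```python
def find_stuck_uploads(videos, job_video_ids, now: int, grace_seconds: int = 300) -> list:
    """Return IDs of videos that finished uploading but have no processing job.

    The grace period avoids flagging uploads whose job is about to be enqueued.
    """
    stuck = []
    for video in videos:
        if video["upload_state"] != "uploaded":
            continue
        if video["video_id"] in job_video_ids:
            continue
        if now - video["updated_at"] >= grace_seconds:
            stuck.append(video["video_id"])
    return stuck
```

A periodic job could re-enqueue processing for each returned ID, which is safe as long as processing itself tolerates duplicates.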

11. Real-World Company Approaches

Public explanations of large video platforms often mention themes like transcoding pipelines, CDNs, adaptive bitrate playback, metadata services, recommendations, and analytics. The safe lesson to take from them is the reusable architecture shape, not any private implementation detail:

source upload
  -> asynchronous media processing
  -> playback manifests and segments
  -> CDN delivery
  -> player telemetry
  -> analytics and recommendations

Different products optimize differently.

A creator platform may prioritize upload throughput, moderation, and long-tail storage cost.

A subscription streaming platform may prioritize catalog quality, regional placement, entitlement checks, and predictable playback experience.

Both shapes still separate heavy media processing from low-latency playback.

12. Tradeoffs & Alternatives

Decision | Option A | Option B | Tradeoff
--- | --- | --- | ---
Processing timing | Transcode before publish | Publish partial availability | Faster availability vs quality completeness
Segment length | Short segments | Long segments | Faster adaptation vs request overhead
Asset URLs | Stable versioned URLs | Short-lived signed URLs | Cache reuse vs access control
CDN strategy | Pull on demand | Prewarm selected content | Lower operational work vs better launch readiness
Analytics | Emit every event | Sample some events | Full fidelity vs ingestion cost
Encoding ladder | Many renditions | Few renditions | Playback flexibility vs compute/storage cost

No single choice is universally correct. The product promise drives the architecture.
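
The segment length row can be made concrete with simple arithmetic. The durations, segment lengths, and rendition count in this sketch are illustrative assumptions, not recommendations.

```python
import math


def segment_count(duration_seconds: int, segment_seconds: int) -> int:
    """Number of segments per rendition, counting a final partial segment."""
    return math.ceil(duration_seconds / segment_seconds)


def total_segment_objects(duration_seconds: int, segment_seconds: int,
                          rendition_count: int) -> int:
    """Total segment objects stored across all renditions of one video."""
    return segment_count(duration_seconds, segment_seconds) * rendition_count
```

For a 2-hour video with 5 renditions, 2-second segments produce 3,600 segments per rendition and 18,000 objects in total, while 6-second segments produce 1,200 per rendition and 6,000 in total. Shorter segments let the player switch quality sooner, at the price of three times as many requests and stored objects.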

13. Evolution Path

Stage 1: Direct Upload And Basic Playback

Store source videos, generate one playable version, and serve through simple storage.

Stage 2: Background Transcoding

Introduce processing queues, workers, thumbnails, and multiple renditions.

Stage 3: CDN Delivery

Move playback segments to CDN paths and stop routing media bytes through application servers.

Stage 4: Adaptive Playback

Generate manifests, segment ladders, and player telemetry so playback adapts to network conditions.

Stage 5: Operational Media Platform

Add reprocessing, regional placement, entitlement rules, analytics replay, quality monitoring, and cost controls.

The architecture evolves because scale turns "store and serve a video" into a media lifecycle, delivery, and telemetry system.

14. Key Engineering Lessons

  • Video playback should not be treated as a normal file download.
  • The uploaded source file is not the same thing viewers stream.
  • Transcoding creates derived artifacts that need retries, validation, and repair.
  • Adaptive bitrate streaming protects playback from changing network conditions.
  • CDN edge caching keeps repeated segment reads close to users.
  • Manifest correctness is critical because players trust it.
  • Watch analytics should be asynchronous and replayable.
  • The viewer path should stay isolated from heavy media processing.
