Google Drive / Dropbox File Sync System

Design a cloud file sync system that handles chunked uploads, metadata sync, version history, conflict resolution, delta transfer, offline clients, and repair.

Level: advanced | 13 min read | Updated 2026-05-20
Topics: Modeling, Capacity, Data, Reliability, Operations, Tradeoffs

Tags: File Chunking, Metadata Sync, Delta Transfer, Version History, Conflict Resolution, Cursor-Based Sync, Idempotency, Eventual Consistency, Backpressure, Reconciliation

After this, you will understand

Why file sync is not just uploading files: it means reconciling chunks, metadata, versions, conflicts, and offline clients, and repairing state after partial failure.

Simple version

Upload the whole file on every save and let each device poll for the latest folder state.

Breaks when

Files are large, clients disconnect, edits overlap, renames and deletes need ordering, and polling is too slow for efficient recovery.

Architecture move

Separate chunked content storage from metadata sync, use cursors and version history, detect conflicts, and run reconciliation for drift.

Think before reading

If a laptop edits a file offline while another device renames it, which part of the system should decide what changed?

The content bytes, metadata timeline, and sync cursor need separate treatment. Conflict handling should preserve user work while metadata sync explains the order of changes.

Concepts Covered

File chunking
Metadata sync
Delta transfer
Version history
Conflict resolution
Cursor-based sync
Offline-first clients
Tombstones and deletes
Chunk manifests
Sync journals
Reconciliation and repair

1. Introduction

A Google Drive or Dropbox-style file sync system keeps files available across laptops, phones, browsers, shared folders, and offline clients.

The visible product behavior looks simple: save a file on one device, see it on another device.

The backend problem is harder because file sync is not one problem. It is several problems that happen to look like one product:

moving large bytes efficiently
keeping folder metadata coherent
handling offline edits
preserving older versions
resolving conflicts without losing work
syncing many devices after network gaps
enforcing permissions while clients cache data locally
repairing drift after partial failures

This module uses "Google Drive / Dropbox File Sync" as a familiar product shape, not as a claim about Google Drive, Dropbox, or any private implementation.

At small scale, a client can upload a whole file and the server can store the latest copy.

At production scale, that naive model breaks because files are large, clients disconnect, edits overlap, deletes need tombstones, metadata changes outnumber byte changes, and users expect recovery when sync goes wrong.

2. Product Requirements

Functional Requirements

Users can create, upload, edit, rename, move, delete, and restore files.
Users can organize files into folders.
Clients can sync changes across multiple devices.
Clients can continue working while offline and sync later.
Large files can upload and download reliably.
The product keeps version history for recovery.
Concurrent edits should not silently lose user data.
Shared folders and permissions should affect visibility.
Clients can resume partial uploads and downloads.
The system can repair metadata or chunk drift.

Non-Functional Requirements

File bytes must be durable once a version is committed.
Metadata reads and writes should feel low latency.
Sync should avoid re-uploading unchanged bytes.
Offline clients should converge when they reconnect.
Conflict handling should preserve user work.
The system should tolerate object storage, worker, and network failures.
Sync storms should not overload the metadata or blob plane.
Permission changes should propagate quickly enough to avoid unsafe access.
Operators should be able to audit, reconcile, and restore state.

3. Core Engineering Challenges

The core challenge is that file sync has two very different planes.

The blob plane handles bytes:

chunks -> manifests -> object storage -> download

The metadata plane handles meaning:

file name -> folder -> current version -> permissions -> delete state

Treating these as the same path makes the system slow and fragile. Treating them as completely independent creates drift.

The hard parts are:

A file version should not become visible before its chunks are durable.
A delete must still reach devices that were offline when it happened.
A rename should not look like a delete plus unrelated create unless the product chooses that model.
A stale device should not overwrite newer work.
A new device should be able to bootstrap from a snapshot and then catch up with a journal.
A reconnect storm should not cause every client to scan and download everything at once.
Permissions must apply to metadata and bytes, including cached or old versions.

4. High-Level Architecture

A practical design separates client sync, metadata, blob storage, and async repair.

Client Sync Engine
  -> Metadata API
  -> Upload Session API
  -> Chunk Upload Service
  -> Sync Feed API

Metadata API
  -> File Metadata Store
  -> Version Store
  -> Sync Event Journal
  -> Permission Store

Chunk Upload Service
  -> Chunk Store / Object Storage
  -> Manifest Store
  -> Integrity Verifier

Async Workers
  -> Notification Fan-Out
  -> Garbage Collection
  -> Reconciliation
  -> Malware / Policy Scanning
  -> Search / Preview Indexing

The metadata service is the source of truth for file identity, folder placement, current version, tombstones, and permissions.

The chunk store is the durable storage layer for bytes. It should not decide which file version is current. It stores chunks and manifests that metadata records reference.

The sync journal lets devices ask:

Give me all metadata changes after cursor 84211.

That journal is what makes offline recovery deterministic.
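
The cursor query above can be sketched as an append-only log with a "changes after" read. This is a minimal in-memory illustration, not a production design: the SyncJournal class, its method names, and the page limit are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SyncJournal:
    """Append-only, ordered log of metadata changes for one workspace."""
    events: list = field(default_factory=list)

    def append(self, file_id: str, event_type: str) -> int:
        # The server assigns sequence numbers, and they only ever increase.
        sequence = len(self.events) + 1
        self.events.append(
            {"sequence": sequence, "file_id": file_id, "event_type": event_type}
        )
        return sequence

    def changes_after(self, cursor: int, limit: int = 100) -> list:
        # Return at most `limit` events with sequence > cursor, oldest first.
        newer = [e for e in self.events if e["sequence"] > cursor]
        return newer[:limit]


journal = SyncJournal()
journal.append("f1", "create")
journal.append("f1", "rename")
journal.append("f2", "delete")

# A device that last saw sequence 1 asks for everything after it.
page = journal.changes_after(cursor=1)
```

Because the same cursor always yields the same ordered events, two devices at the same position replay identical changes.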

5. Core Components

Client Sync Engine

The client sync engine watches local file changes, computes chunks, uploads missing data, pulls metadata changes, applies server events, and manages local cache state.

It should treat the server as authoritative for committed metadata while still allowing local optimistic work.

Metadata API

The Metadata API handles file and folder operations:

create file
rename file
move file
delete file
commit new version
restore old version
list folder
fetch sync changes

It validates permissions, applies version checks, writes metadata records, and appends sync events.

File Metadata Store

The metadata store keeps the file tree:

stable file IDs
parent folder IDs
names
owner and workspace
current version pointer
delete state
metadata version

This store should be optimized for folder listing, file lookup, and mutation safety.

Chunk Upload Service

The chunk service accepts large byte uploads in smaller units.

It verifies hashes, stores chunks, supports resume, and avoids forcing large binary payloads through the metadata service.

Manifest Store

A manifest describes the chunks that make up a file version.

version v17 -> [chunk_a, chunk_b, chunk_c]

This lets multiple file versions reuse unchanged chunks.
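
A small sketch of chunk reuse between two versions, assuming fixed-size chunks and SHA-256 content hashes. The 4-byte chunk size is only for readability, and build_manifest is a hypothetical helper.

```python
import hashlib

CHUNK_SIZE = 4  # tiny on purpose; a real deployment would use far larger chunks


def build_manifest(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Split data into fixed-size chunks and return their hashes in order."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


v16 = build_manifest(b"aaaabbbbcccc")
v17 = build_manifest(b"aaaaXXXXcccc")  # only the middle chunk changed

known = set(v16)
reused = [h for h in v17 if h in known]      # already stored, no upload needed
new = [h for h in v17 if h not in known]     # must be uploaded
```

Fixed-size chunking only reuses chunks when edits keep byte offsets stable. An insertion near the start shifts every later boundary, which is why some systems choose content-defined chunking instead.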

Version Store

The version store tracks every committed file version, its manifest, its parent version, who created it, and when.

Version history is a recovery feature, not just a storage detail.

Sync Event Journal

The sync journal records metadata changes in an ordered stream.

Clients use it to recover after disconnection and to avoid expensive full scans.

Conflict Resolver

The conflict resolver decides what happens when a client tries to commit a version based on stale metadata.

For normal files, preserving both versions as a conflict copy is often safer than silently overwriting user work.

Reconciliation Workers

Reconciliation workers compare metadata, manifests, chunks, sync events, and local indexes to detect drift.

They repair safe inconsistencies and alert on unsafe ones.

6. Data Modeling

File Node

file_node
- file_id
- workspace_id
- parent_id
- name
- type: file | folder
- owner_id
- current_version_id
- metadata_version
- deleted_at
- created_at
- updated_at

The file_id should remain stable across renames and moves.

File Version

file_version
- version_id
- file_id
- parent_version_id
- manifest_id
- created_by_user_id
- created_by_device_id
- base_version_id
- size_bytes
- content_hash
- created_at
- reason: upload | edit | restore | conflict

The base_version_id is important for conflict detection. A client should say which version it edited.

Chunk

chunk
- chunk_hash
- size_bytes
- storage_key
- verification_state
- reference_count
- created_at

Chunks can be content-addressed by hash, but access control should be enforced through file metadata and version references, not by exposing raw chunks as public objects.

Manifest

manifest
- manifest_id
- chunk_hashes
- total_size_bytes
- algorithm
- created_at

The manifest is the bridge between version metadata and stored bytes.

Sync Event

sync_event
- sequence
- workspace_id
- file_id
- event_type
- metadata_version
- payload
- created_at

Events should contain enough information for clients to update local state or fetch the needed records.

Device Sync State

device_sync_state
- device_id
- workspace_id
- last_sync_sequence
- last_successful_sync_at
- client_version

Per-device state matters because a phone, laptop, and tablet may all be at different sync positions.

Upload Session

upload_session
- session_id
- user_id
- device_id
- file_id
- base_version_id
- expected_size_bytes
- uploaded_chunks
- expires_at
- state

Upload sessions help resume large uploads and clean up abandoned work.

7. Request Lifecycle

Uploading A New File

1. Client detects a new local file.
2. Client splits the file into chunks and computes hashes.
3. Client asks server which chunks already exist.
4. Client uploads missing chunks.
5. Server verifies chunk hashes and stores them.
6. Client commits a manifest and file version through Metadata API.
7. Metadata API creates file_node and file_version in a transaction.
8. Metadata API appends sync_event.
9. Other devices receive push hints or later pull the sync feed.
10. Other devices download only the chunks they need.

The key boundary is step 6. Uploaded chunks alone do not make a visible file. The metadata commit does.
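
The steps above can be sketched with two in-memory dictionaries standing in for the chunk store and the metadata store. Every name here (chunk_store, visible_files, upload_file) is hypothetical, and the client and server halves are collapsed into one function for brevity.

```python
import hashlib

chunk_store: dict[str, bytes] = {}         # chunk_hash -> bytes (object storage stand-in)
visible_files: dict[str, list[str]] = {}   # file_id -> manifest of the committed version


def split(data: bytes, size: int = 4) -> list[bytes]:
    return [data[i:i + size] for i in range(0, len(data), size)]


def sha(chunk: bytes) -> str:
    return hashlib.sha256(chunk).hexdigest()


def upload_file(file_id: str, data: bytes) -> int:
    """Upload only missing chunks, then commit. Returns chunks uploaded."""
    pieces = split(data)                          # step 2: chunk and hash
    manifest = [sha(p) for p in pieces]
    by_hash = dict(zip(manifest, pieces))
    missing = [h for h in by_hash if h not in chunk_store]   # step 3
    for h in missing:                             # step 4: upload missing chunks
        body = by_hash[h]
        if sha(body) != h:                        # step 5: server verifies the hash
            raise ValueError("chunk hash mismatch")
        chunk_store[h] = body
    # Steps 6-7: the file becomes visible only once every chunk is stored.
    if any(h not in chunk_store for h in manifest):
        raise RuntimeError("refusing to commit a manifest with missing chunks")
    visible_files[file_id] = manifest
    return len(missing)


first = upload_file("a", b"aaaabbbb")    # both chunks are new
second = upload_file("b", b"aaaacccc")   # "aaaa" already exists
```

If the function fails before the final assignment, chunks may exist in storage but no file is visible, which is the boundary the text describes.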

Editing An Existing File

1. Client reads current server version v7.
2. User edits locally while online or offline.
3. Client computes new chunk manifest.
4. Client uploads missing chunks.
5. Client commits new version with base_version_id = v7.
6. Server checks whether current_version_id is still v7.
7. If yes, server advances current_version_id to v8.
8. If no, server applies conflict policy.

This prevents stale clients from silently overwriting newer versions.
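
Steps 5 to 8 amount to an optimistic concurrency check (compare-and-set) on current_version_id. A minimal sketch, assuming integer version IDs for simplicity; the FileNode class and commit_version function are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class FileNode:
    file_id: str
    current_version_id: int


def commit_version(node: FileNode, base_version_id: int) -> str:
    """Advance the version only if the client edited the current one."""
    if node.current_version_id != base_version_id:
        # The client edited a stale version; hand off to the conflict policy.
        return "conflict"
    node.current_version_id += 1
    return "committed"


node = FileNode(file_id="f1", current_version_id=7)
laptop = commit_version(node, base_version_id=7)  # edited v7, which is current
phone = commit_version(node, base_version_id=7)   # also edited v7, now stale
```

In a real metadata store the comparison and the update would need to be atomic, for example inside a transaction or as a conditional write. Otherwise two commits could both pass the check.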

Syncing A Reconnected Device

1. Device reconnects with last_sync_sequence = 84211.
2. Sync API returns metadata events after 84211.
3. Client applies creates, updates, moves, deletes, and permission changes.
4. Client decides which file versions need local bytes.
5. Client downloads missing chunks in the background.
6. Client advances cursor only after metadata events are applied safely.

Push can wake the device, but the durable sync journal should be the source of truth.
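
A sketch of the client side of this flow, assuming events arrive as dictionaries with sequence, file_id, event_type, and name fields. Those field names and the sync_once function are hypothetical, and the local state is reduced to a file_id to name map.

```python
def sync_once(local_state: dict, events: list, cursor: int) -> int:
    """Apply events newer than cursor in order and return the new cursor."""
    for event in sorted(events, key=lambda e: e["sequence"]):
        if event["sequence"] <= cursor:
            continue  # already applied, so a replayed page is harmless
        if event["event_type"] == "delete":
            local_state.pop(event["file_id"], None)
        else:  # create, rename, and move all set the latest name here
            local_state[event["file_id"]] = event["name"]
        # Advance the cursor only after the event has been applied.
        cursor = event["sequence"]
    return cursor


local = {"f1": "a.txt"}
events = [
    {"sequence": 84212, "file_id": "f1", "event_type": "rename", "name": "b.txt"},
    {"sequence": 84213, "file_id": "f2", "event_type": "create", "name": "c.txt"},
    {"sequence": 84214, "file_id": "f1", "event_type": "delete", "name": None},
]
cursor = sync_once(local, events, cursor=84211)
```

If the client crashes partway through, the saved cursor still points at the last applied event, so the next sync resumes without skipping anything.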

Handling A Conflict

laptop commits version v8 based on v7
phone later commits version candidate based on v7
server sees current version is v8
server creates conflict version v9_conflict
server exposes both user-visible states

The product may create a conflict copy, ask the user to choose, or run a file-type-specific merge if it can do so safely.
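
One of those options, the conflict copy, can be sketched as follows. This assumes the product saves the stale edit as a sibling file rather than as a conflict version on the same file; the naming scheme and the commit function are hypothetical.

```python
def conflict_name(name: str, device: str) -> str:
    """Insert a conflict marker before the file extension, if there is one."""
    stem, dot, ext = name.rpartition(".")
    if not dot:
        return f"{name} (conflict from {device})"
    return f"{stem} (conflict from {device}).{ext}"


def commit(files: dict, file_id: str, base_version: int,
           content: bytes, device: str) -> str:
    """Commit an edit, or keep it as a conflict copy if the base is stale."""
    current = files[file_id]
    if current["version"] == base_version:
        current["version"] += 1
        current["content"] = content
        return file_id
    # Stale base: leave the newer server version alone and keep both edits.
    copy_id = f"{file_id}-conflict-{device}"
    files[copy_id] = {
        "name": conflict_name(current["name"], device),
        "version": 1,
        "content": content,
    }
    return copy_id


files = {"f1": {"name": "report.txt", "version": 7, "content": b"base"}}
laptop_result = commit(files, "f1", 7, b"laptop edit", "laptop")
phone_result = commit(files, "f1", 7, b"phone edit", "phone")
```

Neither edit is discarded: the laptop's bytes become v8 of the original file, and the phone's bytes survive under a clearly marked name.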

8. Scaling Problems

Large File Uploads

Large uploads create long-lived connections, retry pressure, and storage load.

Chunking and resumable upload reduce wasted work, but they introduce metadata overhead and orphan cleanup.

Metadata Hotspots

Shared folders can become hot because many users read or update the same file tree.

The system may need folder-level partitioning, read replicas, caching, and rate limits around expensive listing or permission expansion.

Sync Fan-Out

One metadata change may need to reach many devices.

The system should avoid pushing full changes to every client synchronously. A push hint can tell devices to fetch from the sync journal when ready.

Reconnect Storms

After a network outage, client release, or regional recovery, many devices may reconnect and run sync at the same time.

Backpressure is important. Clients should use jitter, pagination, rate limits, and resumable sync.
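
A common way to add jitter is exponential backoff with "full jitter", where each retry waits a random time between zero and a capped exponential ceiling. The base, cap, and function name below are illustrative assumptions.

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                  rng=random) -> float:
    """Full jitter: uniform delay in [0, min(cap, base * 2**attempt)] seconds."""
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0, ceiling)


# A seeded generator keeps this example repeatable; clients would not seed.
rng = random.Random(42)
delays = [backoff_delay(attempt, rng=rng) for attempt in range(8)]
```

Spreading reconnects across the whole interval avoids the synchronized retry waves that plain exponential backoff can still produce.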

Version Storage Growth

Version history improves recovery, but it grows storage and metadata.

Chunk reuse helps, but retention, garbage collection, and legal deletion rules still need careful design.
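
One possible garbage collection approach is mark and sweep: collect every chunk hash referenced by a retained manifest, then delete the rest. This is a sketch under that assumption and an alternative to maintaining a per-chunk reference count; collect_garbage is a hypothetical name.

```python
def collect_garbage(chunk_store: dict, retained_manifests: list) -> list:
    """Delete chunks that no retained manifest references. Returns deleted hashes."""
    # Mark: every hash that appears in any retained manifest is live.
    live = set().union(*retained_manifests)
    # Sweep: anything else is unreferenced and can be removed.
    dead = sorted(h for h in chunk_store if h not in live)
    for h in dead:
        del chunk_store[h]
    return dead


store = {"a": b"1", "b": b"2", "c": b"3", "d": b"4"}
manifests = [["a", "b"], ["a", "c"]]   # two retained versions share chunk "a"
deleted = collect_garbage(store, manifests)
```

A real sweep would also need to skip chunks that belong to in-flight upload sessions, since those are not yet referenced by any committed manifest.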

Conflict Rate

Conflicts are not only an edge case. Shared folders, offline edits, and weak networks can make conflicts common.

Operators should be able to see conflict rate by file type, workspace, client version, and folder.

9. Distributed Systems Concepts

File Chunking

File chunking makes large files retryable and reusable. It moves the system away from whole-file transfer.

Metadata Sync

Metadata sync is the durable path that tells clients what changed. For correctness, it usually matters more than raw byte movement.

Delta Transfer

Delta transfer avoids moving unchanged bytes. It saves bandwidth, but it does not decide correctness.

Version History

Version history makes bad overwrites recoverable. It gives users and operators a normal restore path.

Conflict Resolution

Conflict resolution protects user work when concurrent edits cannot be safely merged.

Cursor-Based Sync

Cursor-based sync lets devices recover missed changes after disconnecting.

Idempotency

Idempotency prevents client retries from creating duplicate files, duplicate versions, or repeated conflict copies.

Eventual Consistency

Eventual consistency appears because clients, metadata replicas, local indexes, previews, search, and file bytes may converge at different times.

Backpressure

Backpressure keeps sync recovery from becoming the incident during reconnect storms or storage slowness.

10. Reliability & Failure Handling

Upload Succeeds But Metadata Commit Fails

The uploaded chunks are now orphaned. They should expire through upload session cleanup or garbage collection.

The user should be able to retry the metadata commit if the client still has the manifest.

Metadata Commit Succeeds But Chunk Is Missing

This is more serious. A visible file version points at unavailable bytes.

The system should verify chunk existence before commit, monitor manifest integrity, and reconcile manifests against chunk storage.

Client Retries Commit After Timeout

The client may not know whether the server accepted the write.

Use a client mutation ID so retrying the same commit returns the same result instead of creating another version.
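
A minimal sketch of that idea: the server remembers the result of each mutation ID and returns it on retry. The VersionCommitter class and its in-memory storage are hypothetical.

```python
class VersionCommitter:
    """Stores the outcome of each client mutation so retries are harmless."""

    def __init__(self) -> None:
        self.versions: list = []
        self.results_by_mutation: dict = {}

    def commit(self, mutation_id: str, manifest_id: str) -> str:
        # A retried mutation returns the original result without a new write.
        if mutation_id in self.results_by_mutation:
            return self.results_by_mutation[mutation_id]
        version_id = f"v{len(self.versions) + 1}"
        self.versions.append((version_id, manifest_id))
        self.results_by_mutation[mutation_id] = version_id
        return version_id


committer = VersionCommitter()
first = committer.commit("m-1", "manifest-a")
retry = committer.commit("m-1", "manifest-a")   # same request after a timeout
other = committer.commit("m-2", "manifest-b")
```

In a real service the mutation record and the version write would need to commit together, so a crash cannot leave one without the other.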

Delete Races With Upload

A user may delete a file while another device uploads a new version.

The metadata service must define whether the upload is rejected, restored as a new file, or converted into a conflict.

Permission Changes Lag

After a file is unshared, clients may still hold cached copies and previously issued download URLs.

The system should use short-lived download authorization, permission-aware metadata sync, and server-side checks before issuing fresh access.
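
Short-lived authorization can be sketched as an HMAC-signed token with an expiry, issued only after a permission check. The secret, token layout, five-minute lifetime, and function names are all assumptions made for illustration.

```python
import hashlib
import hmac

SECRET_KEY = b"demo-only-secret"  # hypothetical; never hard-code real keys


def _sign(file_id: str, user_id: str, expires_at: int) -> str:
    message = f"{file_id}:{user_id}:{expires_at}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def issue_token(file_id: str, user_id: str, now: int, has_access,
                ttl_seconds: int = 300) -> dict:
    """Check permissions first, then sign a token that expires soon."""
    if not has_access(user_id, file_id):
        raise PermissionError("user can no longer read this file")
    expires_at = now + ttl_seconds
    return {"file_id": file_id, "user_id": user_id, "expires_at": expires_at,
            "sig": _sign(file_id, user_id, expires_at)}


def token_is_valid(token: dict, now: int) -> bool:
    """Accept only untampered tokens that have not expired."""
    expected = _sign(token["file_id"], token["user_id"], token["expires_at"])
    return hmac.compare_digest(expected, token["sig"]) and now < token["expires_at"]


shared = {("u1", "f1")}
token = issue_token("f1", "u1", now=1000,
                    has_access=lambda u, f: (u, f) in shared)
```

With this shape, an unshared user keeps access for at most the remaining token lifetime, because no fresh token is issued once the permission check fails.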

Sync Journal Retention Expires

An old device may reconnect with a cursor older than the retained journal.

The device needs a snapshot sync path:

your cursor is too old, rebuild from current metadata snapshot
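
A sketch of how a sync endpoint might choose between the two paths, assuming the journal keeps only a recent window of events. The response shape and the fetch_changes function are hypothetical.

```python
def fetch_changes(retained_events: list, snapshot: dict, cursor: int) -> dict:
    """Return incremental events, or a full snapshot if the cursor is too old."""
    if retained_events:
        oldest = retained_events[0]["sequence"]
        latest = retained_events[-1]["sequence"]
        # The next event this device needs is cursor + 1. If it was trimmed
        # from the journal, incremental sync would silently skip changes.
        if cursor + 1 < oldest:
            # The snapshot must reflect state as of `latest`.
            return {"mode": "snapshot", "files": dict(snapshot), "cursor": latest}
    newer = [e for e in retained_events if e["sequence"] > cursor]
    new_cursor = newer[-1]["sequence"] if newer else cursor
    return {"mode": "incremental", "events": newer, "cursor": new_cursor}


retained = [{"sequence": 5}, {"sequence": 6}, {"sequence": 7}]
current_files = {"f1": "a.txt"}

fresh = fetch_changes(retained, current_files, cursor=4)   # needs 5, still retained
stale = fetch_changes(retained, current_files, cursor=2)   # needs 3, already trimmed
```

After a snapshot rebuild, the device resumes normal incremental sync from the returned cursor.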

Reconciliation Finds Drift

Drift can happen between file nodes, versions, manifests, chunks, local search indexes, and sync events.

Repair should be careful. Some drift can be fixed automatically. Missing chunks for visible versions should page an operator or mark the version unavailable until repaired.
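
A reconciliation pass can start as a read-only report that separates the drift kinds by severity. The sketch below assumes three in-memory inputs (visible versions, manifests, stored chunk hashes); find_drift and the report keys are hypothetical.

```python
def find_drift(visible_versions: dict, manifests: dict, chunk_store: set) -> dict:
    """Compare versions, manifests, and chunks without modifying anything."""
    report = {"missing_manifest": [], "missing_chunks": {}, "orphan_chunks": []}
    for version_id, manifest_id in visible_versions.items():
        manifest = manifests.get(manifest_id)
        if manifest is None:
            report["missing_manifest"].append(version_id)   # unsafe: alert
            continue
        missing = [h for h in manifest if h not in chunk_store]
        if missing:
            report["missing_chunks"][version_id] = missing  # unsafe: alert
    referenced = {h for manifest in manifests.values() for h in manifest}
    # Orphans are the safe case: candidates for garbage collection.
    report["orphan_chunks"] = sorted(chunk_store - referenced)
    return report


visible = {"v1": "m1", "v2": "m2", "v3": "m9"}
manifests = {"m1": ["a", "b"], "m2": ["b", "c"]}
stored = {"a", "b", "x"}
report = find_drift(visible, manifests, stored)
```

Keeping detection separate from repair lets orphan cleanup run automatically while missing manifests or chunks are routed to an operator.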

11. Real-World Company Approaches

Large cloud file products usually separate file metadata from blob storage because these workloads behave differently.

Common public architecture patterns include:

chunked or resumable uploads for large files
content hashes for integrity and deduplication
metadata journals for client sync
local client indexes for offline mode
tombstones for deletes
conflict copies when automatic merge is unsafe
version history for restore
async workers for previews, scanning, indexing, and cleanup

Do not assume every product uses the same exact model. The important lesson is where the pressure falls: file bytes, metadata, sync state, and user-visible correctness each need separate handling.

Key Takeaways

File sync is mostly a metadata consistency problem attached to expensive byte movement.
Uploaded chunks should not become user-visible until metadata commits a version.
Offline clients require a durable sync journal, not just push notifications.
Conflict resolution should preserve user work when safety is uncertain.
Version history turns overwrite bugs into recoverable events.
Deletes need tombstones when offline devices exist.
Reconnect storms can turn sync recovery into a production incident.
Reconciliation is part of the product, because metadata, chunks, versions, and indexes can drift.
