Metadata Sync
Keep file names, folders, permissions, versions, tombstones, and device cursors coherent while file bytes move through a separate storage path.
After this, you will understand
Where Metadata Sync appears in production systems, what problem forces it, and how to reason about the tradeoffs.
Do not treat the idea as a definition to memorize.
Real systems force it to handle the metadata journal, the file tree, and tombstones.
Use the concept to decide what the system guarantees, what it risks, and what it costs to operate.
Think before reading
Where would Metadata Sync appear in a real production system, and what failure or bottleneck would it help you reason about?
Concepts Covered
- File metadata
- Folder trees
- Rename and move events
- Tombstones
- Metadata journals
- Device sync cursors
- Permission state
- Derived local indexes
Definition
Metadata sync keeps file system state coherent across devices, treating the file bytes as only one part of what has to converge.
File bytes are only one part of a cloud drive product. Users also expect names, folders, moves, deletes, sharing rules, version pointers, and local availability state to converge across devices.
A sync system usually separates two planes:
- metadata plane: names, folders, versions, permissions, tombstones
- blob plane: chunks, bytes, manifests, storage lifecycle
The metadata plane tells clients what exists and what changed. The blob plane moves the bytes.
The Pain That Forces This Concept
Suppose a user renames design.pdf on a laptop while a phone is offline.
If the phone only scans storage for file bytes later, it may not know whether the file was renamed, moved, deleted, replaced, or duplicated. If a desktop client deletes a folder, other devices need to learn about the delete even though no new bytes were uploaded.
Without metadata sync, products drift in strange ways:
- files reappear after delete
- folders contain stale children
- renamed files duplicate instead of moving
- permissions lag behind shared links
- clients download bytes they no longer need
- offline devices overwrite newer server state
The system needs a durable history of metadata changes.
Mental Model
Think of metadata as a synchronized file tree backed by a journal.
event 101: create file f1 under folder root
event 102: upload version v1 for f1
event 103: rename f1 from draft.md to design.md
event 104: move f1 into folder docs
event 105: delete f1
Each device stores a cursor into that journal. When it reconnects, it asks for metadata events after its last cursor and applies them locally.
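The journal and cursor can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the `Event` class, the event type names, and the payload keys are hypothetical, and the journal is an in-memory list rather than a durable store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    sequence: int
    file_id: str
    event_type: str  # hypothetical names: create, upload_version, rename, move, delete
    payload: dict


# The five events from the mental model above.
JOURNAL = [
    Event(101, "f1", "create", {"parent": "root", "name": "draft.md"}),
    Event(102, "f1", "upload_version", {"version": "v1"}),
    Event(103, "f1", "rename", {"name": "design.md"}),
    Event(104, "f1", "move", {"parent": "docs"}),
    Event(105, "f1", "delete", {}),
]


def events_after(journal, cursor):
    """Return events with a sequence strictly greater than the cursor, in order."""
    newer = (e for e in journal if e.sequence > cursor)
    return sorted(newer, key=lambda e: e.sequence)


def apply_events(tree, events, cursor):
    """Apply events to a local tree (file_id -> dict) and return the new cursor."""
    for e in events:
        node = tree.setdefault(e.file_id, {"deleted": False})
        if e.event_type == "create":
            node.update(parent=e.payload["parent"], name=e.payload["name"])
        elif e.event_type == "upload_version":
            node["version"] = e.payload["version"]
        elif e.event_type == "rename":
            node["name"] = e.payload["name"]
        elif e.event_type == "move":
            node["parent"] = e.payload["parent"]
        elif e.event_type == "delete":
            node["deleted"] = True  # keep a tombstone instead of dropping the entry
        cursor = e.sequence
    return cursor


# A phone that went offline after event 102 catches up when it reconnects.
phone_tree = {
    "f1": {"parent": "root", "name": "draft.md", "version": "v1", "deleted": False}
}
phone_cursor = apply_events(phone_tree, events_after(JOURNAL, 102), 102)
```

The phone never rescans storage: it replays events 103 to 105 and ends with a renamed, moved, and tombstoned entry for `f1`.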
How It Works
A metadata sync service typically stores source-of-truth records:
file_node
- file_id
- parent_id
- name
- type
- owner_id
- current_version_id
- deleted_at
- metadata_version
sync_event
- sequence
- file_id
- event_type
- metadata_version
- payload
- created_at
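As a rough sketch, the two records could be modeled as Python dataclasses. The field names come from the lists above; the field types, the `MetadataStore` class, and its `create` and `rename` methods are assumptions made for illustration. A real service would write the node change and the journal entry in one transaction.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileNode:
    file_id: str
    parent_id: Optional[str]
    name: str
    type: str  # e.g. "file" or "folder" (assumed values)
    owner_id: str
    current_version_id: Optional[str] = None
    deleted_at: Optional[float] = None  # set on delete so the row acts as a tombstone
    metadata_version: int = 1


@dataclass
class SyncEvent:
    sequence: int
    file_id: str
    event_type: str
    metadata_version: int
    payload: dict
    created_at: float


class MetadataStore:
    """Hypothetical in-memory stand-in for the source-of-truth tables."""

    def __init__(self):
        self.nodes = {}
        self.journal = []

    def _append(self, node, event_type, payload):
        # Sequence numbers are assigned in commit order.
        event = SyncEvent(
            sequence=len(self.journal) + 1,
            file_id=node.file_id,
            event_type=event_type,
            metadata_version=node.metadata_version,
            payload=payload,
            created_at=time.time(),
        )
        self.journal.append(event)
        return event

    def create(self, node):
        self.nodes[node.file_id] = node
        return self._append(
            node, "create", {"name": node.name, "parent_id": node.parent_id}
        )

    def rename(self, file_id, new_name):
        node = self.nodes[file_id]
        node.name = new_name
        node.metadata_version += 1  # every metadata change bumps the version
        return self._append(node, "rename", {"name": new_name})
```

Stamping each `SyncEvent` with the node's `metadata_version` lets a client tell which state of the node an event describes.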
Clients sync in two layers:
1. Pull metadata changes after cursor.
2. Decide which file chunks or versions need download.
This prevents a client from downloading large files before it knows whether the file is still visible, has been renamed, or is still permitted for that user.
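The second layer can be sketched as a pure function over metadata that has already been synced. The `NodeView` class, its `readable` flag, and `plan_downloads` are hypothetical names; the assumption is that permission state reaches the client along with the rest of the metadata.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeView:
    file_id: str
    current_version_id: Optional[str]
    deleted: bool
    readable: bool  # hypothetical flag: may this device's user still read the file?


def plan_downloads(nodes, local_versions):
    """Return file_ids whose bytes should be fetched, given synced metadata."""
    wanted = []
    for node in nodes:
        if node.deleted or not node.readable:
            continue  # never fetch bytes for hidden or forbidden files
        if node.current_version_id is None:
            continue  # metadata exists but no version has been committed yet
        if local_versions.get(node.file_id) != node.current_version_id:
            wanted.append(node.file_id)
    return wanted
```

Because the decision uses only metadata, a rename or move on its own does not trigger a download: the version pointer is unchanged.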
Tradeoffs
Metadata sync has to choose product semantics:
| Question | Why It Matters |
|---|---|
| Is rename an update or delete plus create? | Affects identity and conflicts |
| Are deletes tombstones? | Offline devices need to learn about deletes |
| Is folder ordering meaningful? | Clients may need extra state |
| Are permissions part of the same journal? | Stale permission sync can leak access |
| How long is metadata history retained? | Old devices may need snapshot fallback |
The hardest part is not storing metadata. It is defining what each metadata event means when clients are offline and their changes overlap.
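One possible policy for overlapping changes, shown here only as an illustration, is an optimistic check on `metadata_version`: a client sends the version its edit was based on, and the server rejects the edit if the node has changed since. The `Node` class and `apply_rename` function are hypothetical, and other policies (last writer wins, conflict copies) are equally valid product choices.

```python
from dataclasses import dataclass


@dataclass
class Node:
    file_id: str
    name: str
    metadata_version: int


def apply_rename(node, new_name, base_version):
    """Apply a client rename only if it was based on the current metadata_version."""
    if base_version != node.metadata_version:
        return "conflict"  # the client must pull newer events and retry or resolve
    node.name = new_name
    node.metadata_version += 1
    return "applied"
```

Under this policy, an offline device that renames a file based on a stale version cannot silently overwrite newer server state.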
Operational Reality
Operators should monitor:
- metadata sync lag per device
- event journal retention windows
- snapshot fallback rate
- tombstone count
- metadata conflict rate
- permission propagation latency
- client apply failures
- reconciliation drift between file tree and version records
Failure modes:
- A delete event expires before an offline device syncs.
- A folder move creates a cycle because validation is weak.
- Permission changes lag behind file visibility.
- Local search indexes drift from server metadata.
- A client applies events out of order.
- Metadata commit succeeds but blob upload never finishes.
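Two of these failure modes have simple guards, sketched below under stated assumptions. The function names are hypothetical, `parents` is assumed to map each folder id to its parent id (with `None` for the root), and a cursor is assumed to hold the last sequence number the device applied.

```python
def would_create_cycle(parents, folder_id, new_parent_id):
    """True if moving folder_id under new_parent_id would make it its own ancestor."""
    current = new_parent_id
    seen = set()
    # Walk up from the proposed parent; reaching folder_id means a cycle.
    while current is not None and current not in seen:
        if current == folder_id:
            return True
        seen.add(current)
        current = parents.get(current)
    return False


def next_sync_step(cursor, oldest_retained_sequence):
    """Pick incremental sync or snapshot fallback for a reconnecting device."""
    # The device needs event cursor + 1 next. If that event has already
    # expired from the journal, replay would miss changes such as deletes.
    if cursor + 1 < oldest_retained_sequence:
        return "snapshot"
    return "incremental"
```

The server would run the cycle check before committing a move event, and the retention check before answering a pull request, so that an expired delete event leads to a full snapshot rather than a file that silently survives on the old device.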