Upload-Then-Reference Media

Upload large media outside the message send path, then send a lightweight message that references the stored media object.

Level: intermediate. Reading time: 4 min. Categories: Data, Reliability, Operations, Tradeoffs.
Topics: Media Message Pipeline, Object Storage, Resumable Upload, Media References, Orphan Cleanup.

Concepts Covered

  • Media upload separation
  • Object storage
  • Message references
  • Resumable upload
  • Async processing
  • Orphan cleanup
  • Download authorization
  • Placeholder delivery

1. Intent

Upload-Then-Reference Media keeps large files out of the core message send path.

Instead of sending a video, image, or document through the same pipeline as a text message, the client uploads the file to a media service first. The chat message then stores a lightweight reference:

message_type = image
media_object_id = media_123

This keeps messaging focused on durable metadata, ordering, delivery, and receipts.

2. The Problem Without This Pattern

If raw media travels through realtime gateways, message APIs, queues, workers, and message tables, every hop has to buffer, copy, and retry payloads that are orders of magnitude larger than a text message.

Large files increase:

  • request time
  • memory pressure
  • retry cost
  • queue storage
  • gateway bandwidth
  • database row size
  • failure probability

A weak mobile network can make the send path block for a long time. A video upload failure can look like a message delivery failure even though these are different problems.

The system needs to separate:

store this file

from:

send this message

3. How The Pattern Works

Typical flow:

1. Client requests upload authorization.
2. Media service creates an upload session.
3. Client uploads bytes to object storage or an upload endpoint.
4. Media service stores metadata and optional processing state.
5. Client sends a chat message with media_object_id.
6. Recipients receive the message reference.
7. Recipients download media through authorized URLs or proxies.
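
The steps above can be sketched with in-memory stand-ins. This is a minimal illustration, not a reference implementation: MediaService, ChatService, create_upload_session, upload, send_media_message, and download_for are hypothetical names, and Python dictionaries stand in for object storage and the message table.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class MediaService:
    """Hypothetical in-memory stand-in for a media service plus object storage."""
    sessions: dict = field(default_factory=dict)  # upload_id -> media_object_id
    objects: dict = field(default_factory=dict)   # media_object_id -> metadata
    blobs: dict = field(default_factory=dict)     # media_object_id -> raw bytes

    def create_upload_session(self, owner_user_id: str, content_type: str) -> str:
        # Steps 1-2: authorize the upload and create a session.
        media_object_id = f"media_{uuid.uuid4().hex[:8]}"
        upload_id = f"upload_{uuid.uuid4().hex[:8]}"
        self.sessions[upload_id] = media_object_id
        self.objects[media_object_id] = {
            "owner_user_id": owner_user_id,
            "content_type": content_type,
            "upload_state": "pending",
            "size_bytes": 0,
        }
        return upload_id

    def upload(self, upload_id: str, data: bytes) -> str:
        # Steps 3-4: store bytes outside the message path, then record metadata.
        media_object_id = self.sessions.pop(upload_id)
        self.blobs[media_object_id] = data
        meta = self.objects[media_object_id]
        meta["size_bytes"] = len(data)
        meta["upload_state"] = "uploaded"
        return media_object_id


@dataclass
class ChatService:
    """Hypothetical message path: it only ever sees the media reference."""
    media: MediaService
    members: dict = field(default_factory=dict)   # conversation_id -> set of user ids
    messages: list = field(default_factory=list)

    def send_media_message(self, conversation_id: str, message_type: str,
                           media_object_id: str) -> dict:
        # Step 5: the message carries media_object_id, never the bytes.
        meta = self.media.objects.get(media_object_id)
        if meta is None or meta["upload_state"] != "uploaded":
            raise ValueError("media is not uploaded yet")
        message = {
            "message_id": f"msg_{uuid.uuid4().hex[:8]}",
            "conversation_id": conversation_id,
            "message_type": message_type,
            "media_object_id": media_object_id,
        }
        self.messages.append(message)  # Step 6: recipients receive this reference.
        return message

    def download_for(self, user_id: str, message: dict) -> bytes:
        # Step 7: authorize the download before returning bytes.
        if user_id not in self.members.get(message["conversation_id"], set()):
            raise PermissionError("user is not in this conversation")
        return self.media.blobs[message["media_object_id"]]
```

A client would call create_upload_session, then upload, then send_media_message. A failure in upload never touches the message list, which is the separation the pattern is after.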

If processing is needed, it runs asynchronously:

media_uploaded -> thumbnail_worker -> scan_worker -> transcode_worker

The message can be delivered while thumbnails or previews are still being prepared, depending on product rules.
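
A rough sketch of how the processing chain and the placeholder decision might fit together. The stage names follow the chain above; the "pending", "done", and "failed" states, the run_pipeline and display_mode helpers, and the rule that any failed stage makes the media unavailable are assumptions chosen for illustration, since the real rule depends on the product.

```python
# Stage order mirrors: thumbnail_worker -> scan_worker -> transcode_worker
STAGES = ("thumbnail", "scan", "transcode")


def new_media_record(media_object_id: str) -> dict:
    """Create a record with every processing stage still pending."""
    record = {"media_object_id": media_object_id}
    for stage in STAGES:
        record[f"{stage}_state"] = "pending"
    return record


def run_pipeline(record: dict, workers: dict) -> dict:
    """Run each stage in order, stopping at the first failure.

    `workers` maps a stage name to a callable that takes the record.
    In a real system each stage would be a separate queue consumer;
    here they run inline to keep the sketch self-contained.
    """
    for stage in STAGES:
        try:
            workers[stage](record)
            record[f"{stage}_state"] = "done"
        except Exception:
            record[f"{stage}_state"] = "failed"
            break  # later stages stay "pending"
    return record


def display_mode(record: dict) -> str:
    """Decide what a recipient sees (assumed product rule)."""
    states = [record[f"{stage}_state"] for stage in STAGES]
    if "failed" in states:
        return "unavailable"
    if all(state == "done" for state in states):
        return "ready"
    return "placeholder"
```

Because display_mode reads only the stored states, the message can be delivered at any point in the chain and the client can re-render when the record changes.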

4. When To Use It

Use this pattern when:

  • messages can contain large files
  • uploads need retry or resume
  • media processing is separate from message delivery
  • files belong in object storage
  • recipients can download media lazily
  • media authorization matters
  • uploads may fail independently from sends

It is common for chat apps, collaboration tools, social platforms, and document-sharing products.

5. When Not To Use It

It may be unnecessary when:

  • payloads are always tiny
  • files are never retained
  • the product only sends small inline metadata
  • upload reliability is not important
  • media does not require processing, authorization, or storage lifecycle management

Even then, keeping binary data out of core transactional tables is usually a good instinct.

6. Data And Operational Model

Media object:

media_object
- media_object_id
- owner_user_id
- upload_state
- size_bytes
- content_type
- object_storage_key
- thumbnail_state
- scan_state
- expires_at

Message reference:

message
- message_id
- conversation_id
- message_type
- media_object_id

Operators should monitor:

  • upload success rate
  • upload session age
  • orphaned media count
  • processing queue lag
  • download error rate
  • object storage cost
  • thumbnail and scan failure rate
  • authorization failure rate

Orphan cleanup is part of the pattern. Uploaded media that never becomes referenced by a sent message should expire or be garbage collected.
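
One way to select orphans for cleanup, as a sketch. It assumes an orphan is a media object that no message references and whose expires_at has passed; find_orphans and the dict-shaped rows are hypothetical, and a real job would query the media and message stores instead of taking lists.

```python
from datetime import datetime, timezone


def find_orphans(media_objects, messages, now):
    """Return IDs of expired media objects that no message references.

    media_objects: iterable of dicts with "media_object_id" and "expires_at"
    messages:      iterable of dicts with an optional "media_object_id"
    now:           timezone-aware datetime used as the cutoff
    """
    referenced = {
        message["media_object_id"]
        for message in messages
        if message.get("media_object_id") is not None
    }
    orphans = []
    for media in media_objects:
        if media["media_object_id"] in referenced:
            continue  # still attached to a sent message, never collect
        expires_at = media["expires_at"]
        if expires_at is not None and expires_at <= now:
            orphans.append(media["media_object_id"])
    return orphans


if __name__ == "__main__":
    now = datetime(2024, 1, 2, tzinfo=timezone.utc)
    media = [{"media_object_id": "media_1",
              "expires_at": datetime(2024, 1, 1, tzinfo=timezone.utc)}]
    print(find_orphans(media, [], now))
```

Checking references before expiry matters: it guards against the failure where media is deleted while a message still points at it.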

7. Failure Modes

  • Upload succeeds but message send fails, leaving orphaned media.
  • Message send succeeds but media processing fails.
  • Download authorization expires too early.
  • Clients retry upload and create duplicate media objects.
  • Processing queues lag and previews appear late.
  • Media metadata is deleted while messages still reference it.
  • Raw media accidentally flows through realtime gateways.
  • A placeholder is delivered but the final media never becomes available.

8. Tradeoffs

Benefit | Cost
Keeps message path lightweight | Adds upload sessions and media metadata
Supports resumable uploads | Requires orphan cleanup
Allows async thumbnails and scans | Recipients may see placeholders
Reduces gateway and queue pressure | Download authorization becomes its own concern
Separates file storage from message delivery | More states to reconcile

Upload-then-reference turns media into a storage lifecycle problem attached to messaging, instead of letting media overload the messaging pipeline itself.
