Connection Registry

Track which realtime gateway currently owns each connected user device so that delivery workers can route live events without treating gateways as the durable source of truth.


Concepts Covered

  • Connection ownership
  • Gateway routing
  • Device presence
  • TTL-based liveness
  • Heartbeat refresh
  • Stale registry entries
  • Reconnect storms
  • Offline fallback

1. Intent

The Connection Registry pattern lets backend workers discover which gateway currently holds a connected device's live connection.

In a realtime messaging system, clients connect to a fleet of gateway servers. A delivery worker trying to push a message to device d_42 needs an answer to a simple question:

Which gateway currently owns device d_42?

The registry stores that temporary mapping. It makes realtime delivery possible without forcing every delivery worker to know about every open socket.

2. The Problem Without This Pattern

Without a connection registry, delivery workers have bad options.

They can broadcast every outbound message to every gateway and let each gateway check whether it owns the device. That wastes network and CPU.

They can require all messages for a user to land on one gateway. That creates sticky routing, difficult failover, and hot spots.

They can store socket ownership only in memory inside each gateway. That is fast locally, but useless to the rest of the system.

A distributed chat system needs a shared but temporary routing index.

3. How The Pattern Works

When a device connects, the gateway writes a registry record:

connection_registry
- device_id: d_42
- user_id: u_7
- gateway_id: gw_18
- connection_id: conn_991
- region: pk-south-1
- expires_at: now + 45 seconds

The record is refreshed by heartbeats. If the gateway dies or the device disappears, the record expires automatically.
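The register, refresh, and expire lifecycle can be sketched with an in-memory stand-in for the TTL store. This is a minimal illustration, not a production design: the class and method names (`ConnectionRegistry`, `register`, `heartbeat`, `lookup`) are hypothetical, the 45-second TTL is taken from the example record, and the rule that a heartbeat only refreshes the entry when its `connection_id` still matches is an assumption added to keep an old connection from extending a newer one.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional

TTL_SECONDS = 45.0  # taken from the example record


@dataclass
class RegistryRecord:
    device_id: str
    user_id: str
    gateway_id: str
    connection_id: str
    region: str
    expires_at: float  # absolute time, in the clock's units (seconds)


class ConnectionRegistry:
    """In-memory stand-in for a TTL-capable store (hypothetical API)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._records: Dict[str, RegistryRecord] = {}

    def register(self, device_id: str, user_id: str, gateway_id: str,
                 connection_id: str, region: str) -> RegistryRecord:
        record = RegistryRecord(device_id, user_id, gateway_id, connection_id,
                                region, expires_at=self._clock() + TTL_SECONDS)
        self._records[device_id] = record  # a reconnect overwrites the old owner
        return record

    def heartbeat(self, device_id: str, connection_id: str) -> bool:
        """Extend the TTL only if this connection still owns the device."""
        record = self.lookup(device_id)
        if record is None or record.connection_id != connection_id:
            return False
        record.expires_at = self._clock() + TTL_SECONDS
        return True

    def lookup(self, device_id: str) -> Optional[RegistryRecord]:
        record = self._records.get(device_id)
        if record is None:
            return None
        if record.expires_at <= self._clock():
            del self._records[device_id]  # lazy expiry on read
            return None
        return record
```

A real TTL store would expire keys on its own; the lazy expiry in `lookup` only imitates that behavior so the sketch stays self-contained.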

Delivery flow:

1. Delivery worker receives a message delivery task.
2. Worker looks up recipient device in the registry.
3. If a fresh mapping exists, worker sends the event to that gateway.
4. Gateway pushes the event over the live connection.
5. If the lookup misses or push fails, worker falls back to offline delivery.

The registry is a hint, not a guarantee. The device can disconnect between lookup and push.
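The delivery flow above can be sketched as a single worker function. The callables `lookup`, `push`, and `store_offline` are hypothetical placeholders for the registry client, the gateway RPC, and the offline delivery path; their signatures are assumptions made for this example.

```python
from typing import Callable, Optional

# Assumed signatures: lookup returns a gateway id or None; push returns
# True only if the gateway accepted the event for a live connection.
Lookup = Callable[[str], Optional[str]]
Push = Callable[[str, str, dict], bool]
StoreOffline = Callable[[str, dict], None]


def deliver(device_id: str, event: dict, lookup: Lookup, push: Push,
            store_offline: StoreOffline) -> str:
    """Try live delivery first; fall back to offline delivery on any miss."""
    gateway_id = lookup(device_id)
    if gateway_id is None:
        # Registry miss: the device is offline or its entry has expired.
        store_offline(device_id, event)
        return "offline"
    try:
        pushed = push(gateway_id, device_id, event)
    except ConnectionError:
        # Stale entry: the gateway is gone or unreachable.
        pushed = False
    if not pushed:
        store_offline(device_id, event)
        return "offline"
    return "pushed"
```

Note that a registry hit never short-circuits the fallback: the worker only reports "pushed" after the gateway accepts the event, and even that is weaker than a confirmation from the device.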

4. When To Use It

Use this pattern when:

  • clients maintain long-lived connections
  • delivery workers are separate from gateway servers
  • users can reconnect to different gateway instances
  • the system needs low-latency push to online clients
  • connection ownership changes often
  • live delivery must fall back to offline sync

It is common in chat, collaborative editors, multiplayer presence systems, notification platforms, and live dashboards.

5. When Not To Use It

This pattern may be unnecessary when:

  • the product uses simple polling
  • one server handles all connections
  • realtime delivery is not required
  • connection count is low enough for direct routing
  • stale routing would create unacceptable side effects and no fallback exists

The registry should never become the durable source of truth for messages. If registry state is lost, clients should still recover through sync.

6. Data And Operational Model

The registry is usually kept in a fast store that supports per-key expiry (TTL), so that entries disappear on their own when they stop being refreshed.

Operators should monitor:

  • active registry entries
  • stale entry rate
  • lookup latency
  • push failures after successful lookup
  • heartbeat refresh rate
  • reconnect rate
  • gateway imbalance
  • expired connection count
  • registry write throughput

Write volume can be high because every heartbeat refreshes an entry. The heartbeat interval is a balance: refresh too often and heartbeat writes dominate the store's load; refresh too rarely and the TTL must grow, so entries for dead connections stay visible for longer.
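A rough back-of-the-envelope calculation makes the balance concrete. The sketch below assumes one registry write per heartbeat, that every refresh resets the full TTL, and that heartbeats arrive on time. The fleet size of 1,000,000 connections and the 15-second heartbeat interval are illustrative assumptions; only the 45-second TTL comes from the example record.

```python
from typing import Tuple


def heartbeat_writes_per_second(connections: int, heartbeat_interval_s: float) -> float:
    """Steady-state refresh writes, assuming one registry write per heartbeat."""
    return connections / heartbeat_interval_s


def staleness_window_s(ttl_s: float, heartbeat_interval_s: float) -> Tuple[float, float]:
    """Shortest and longest time an entry can outlive a silently dead connection.

    Longest: the connection dies right after a refresh (the full TTL remains).
    Shortest: it dies just before the next refresh was due.
    """
    if ttl_s <= heartbeat_interval_s:
        raise ValueError("TTL must be longer than the heartbeat interval")
    return (ttl_s - heartbeat_interval_s, ttl_s)


# Illustrative numbers only.
print(heartbeat_writes_per_second(1_000_000, 15))  # roughly 66,667 writes per second
print(staleness_window_s(45, 15))                  # (30, 45)
```

Under these assumptions, doubling the interval to 30 seconds halves the write rate, but the TTL then has to grow beyond 30 seconds as well, which lengthens the time a dead connection still looks online.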

Common controls:

  • TTL longer than the heartbeat interval, with enough margin that one delayed heartbeat does not expire a live connection
  • gateway-side connection draining
  • stale-entry fallback to offline delivery
  • per-gateway connection limits
  • reconnect rate limiting
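As one example of these controls, reconnect rate limiting can be sketched with a token bucket placed in front of connection acceptance. The class name `ReconnectLimiter` and its parameters are hypothetical; a token bucket is one common way to do this, not necessarily how any particular gateway implements it.

```python
import time
from typing import Callable


class ReconnectLimiter:
    """Token bucket: admits `rate` reconnects per second on average,
    with bursts of up to `burst` connections."""

    def __init__(self, rate: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._rate = rate
        self._capacity = float(burst)
        self._tokens = float(burst)  # start full so normal traffic is unaffected
        self._clock = clock
        self._last = clock()

    def allow(self) -> bool:
        """Return True if a new connection may be accepted right now."""
        now = self._clock()
        # Refill in proportion to elapsed time, capped at the burst size.
        self._tokens = min(self._capacity,
                           self._tokens + (now - self._last) * self._rate)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False
```

A gateway using something like this would reject or delay connections when `allow()` returns False, which spreads a reconnect storm's registry writes over time instead of letting them arrive as one spike.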

7. Failure Modes

  • A stale entry points delivery to a dead gateway.
  • A gateway crashes before deleting its registry entries.
  • Heartbeats overload the registry store.
  • Reconnect storms create write spikes.
  • Registry lookup succeeds, but the device disconnects before push.
  • One gateway owns too many connections because load balancing is uneven.
  • Delivery logic treats registry presence as proof of delivery.
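The last failure mode can be avoided by keeping delivery state separate from registry state. The sketch below assumes the device sends an explicit acknowledgement for each message, which the pattern description does not specify; the names `DeliveryState` and `DeliveryTracker` are hypothetical.

```python
from enum import Enum
from typing import Dict, List


class DeliveryState(Enum):
    PENDING = "pending"      # stored durably, not yet handed to a gateway
    PUSHED = "pushed"        # handed to a gateway, still unconfirmed
    DELIVERED = "delivered"  # the device acknowledged receipt


class DeliveryTracker:
    """Tracks per-message state; only a device ack marks a message delivered."""

    def __init__(self) -> None:
        self._state: Dict[str, DeliveryState] = {}

    def enqueue(self, message_id: str) -> None:
        self._state[message_id] = DeliveryState.PENDING

    def mark_pushed(self, message_id: str) -> None:
        # A registry hit plus a successful push is still not proof of delivery.
        if self._state.get(message_id) is DeliveryState.PENDING:
            self._state[message_id] = DeliveryState.PUSHED

    def ack(self, message_id: str) -> None:
        if message_id in self._state:
            self._state[message_id] = DeliveryState.DELIVERED

    def state(self, message_id: str) -> DeliveryState:
        return self._state[message_id]

    def needs_sync(self) -> List[str]:
        """Messages the device must still be able to fetch through sync."""
        return [m for m, s in self._state.items()
                if s is not DeliveryState.DELIVERED]
```

With this split, a message that was pushed to a gateway but never acknowledged remains available to the sync path, so a disconnect between lookup and push does not lose it.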

8. Tradeoffs

Benefit | Cost
Enables direct routing to connected devices | Adds a shared ephemeral store
Avoids broadcasting to every gateway | Registry entries can be stale
Supports horizontal gateway fleets | Heartbeat writes can be expensive
Makes gateway ownership observable | Requires offline fallback
Helps presence and routing | Can become hot during reconnect storms

Connection registries make realtime delivery efficient, but they must be paired with durable sync because connection state is temporary by nature.
