Connection Registry
Track which realtime gateway currently owns each connected user device so delivery workers can route live events without treating gateways as the durable source of truth.
Concepts Covered
- Connection ownership
- Gateway routing
- Device presence
- TTL-based liveness
- Heartbeat refresh
- Stale registry entries
- Reconnect storms
- Offline fallback
1. Intent
The Connection Registry pattern lets backend workers discover where a connected device currently lives.
In a realtime messaging system, clients connect to a fleet of gateway servers. A delivery worker trying to push a message to device d_42 needs an answer to a simple question:
Which gateway currently owns device d_42?
The registry stores that temporary mapping. It makes realtime delivery possible without forcing every delivery worker to know about every open socket.
2. The Problem Without This Pattern
Without a connection registry, delivery workers have bad options.
They can broadcast every outbound message to every gateway and let each gateway check whether it owns the device. That wastes network and CPU.
They can require all messages for a user to land on one gateway. That creates sticky routing, difficult failover, and hot spots.
They can store socket ownership only in memory inside each gateway. That is fast locally, but useless to the rest of the system.
A distributed chat system needs a shared but temporary routing index.
3. How The Pattern Works
When a device connects, the gateway writes a registry record:
connection_registry
- device_id: d_42
- user_id: u_7
- gateway_id: gw_18
- connection_id: conn_991
- region: pk-south-1
- expires_at: now + 45 seconds
Each heartbeat refreshes the record by pushing expires_at forward. If the gateway dies or the device disappears, heartbeats stop and the record expires on its own, with no explicit cleanup required.
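The register, refresh, and expire cycle can be sketched with an in-memory stand-in for the shared store. This is a minimal illustration, not a production design: the class name `ConnectionRegistry`, its method names, and the injectable `clock` parameter are assumptions made for the example, and a real deployment would rely on the TTL support of whatever store it uses.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class RegistryRecord:
    device_id: str
    user_id: str
    gateway_id: str
    connection_id: str
    region: str
    expires_at: float  # absolute time in seconds, on the registry's clock


class ConnectionRegistry:
    """Hypothetical in-memory stand-in for a shared, TTL-capable store."""

    def __init__(self, ttl_seconds: float = 45.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._records: Dict[str, RegistryRecord] = {}

    def register(self, device_id: str, user_id: str, gateway_id: str,
                 connection_id: str, region: str) -> RegistryRecord:
        # Called by the gateway when a device connects; overwrites any
        # earlier mapping for the same device.
        record = RegistryRecord(device_id, user_id, gateway_id, connection_id,
                                region, self._clock() + self._ttl)
        self._records[device_id] = record
        return record

    def lookup(self, device_id: str) -> Optional[RegistryRecord]:
        # Expired records are treated as missing and dropped lazily.
        record = self._records.get(device_id)
        if record is None:
            return None
        if record.expires_at <= self._clock():
            del self._records[device_id]
            return None
        return record

    def heartbeat(self, device_id: str, connection_id: str) -> bool:
        # Refresh only if this connection still owns the device, so an old
        # connection cannot extend a mapping written by a newer one.
        record = self.lookup(device_id)
        if record is None or record.connection_id != connection_id:
            return False
        record.expires_at = self._clock() + self._ttl
        return True
```

With a 45-second TTL, a record registered at time 0 and refreshed at time 30 stays visible until time 75. If no further heartbeat arrives, `lookup` starts returning `None` from that point on.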
Delivery flow:
1. Delivery worker receives a message delivery task.
2. Worker looks up recipient device in the registry.
3. If a fresh mapping exists, worker sends the event to that gateway.
4. Gateway pushes the event over the live connection.
5. If the lookup misses or push fails, worker falls back to offline delivery.
The registry is a hint, not a guarantee. The device can disconnect between lookup and push.
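The delivery flow above can be sketched as a single function that treats the registry as a hint. The three callables (`lookup_gateway`, `push_to_gateway`, `enqueue_offline`) are hypothetical interfaces invented for this example; the point is only that both a lookup miss and a failed push end in the same offline path.

```python
from typing import Any, Callable, Optional


def deliver(device_id: str,
            event: Any,
            lookup_gateway: Callable[[str], Optional[str]],
            push_to_gateway: Callable[[str, str, Any], bool],
            enqueue_offline: Callable[[str, Any], None]) -> str:
    """Try live delivery first; fall back to offline delivery on any miss."""
    gateway_id = lookup_gateway(device_id)
    if gateway_id is None:
        # No fresh mapping: the device is offline as far as the registry knows.
        enqueue_offline(device_id, event)
        return "offline"

    try:
        # The gateway reports False if it no longer holds the connection.
        pushed = push_to_gateway(gateway_id, device_id, event)
    except ConnectionError:
        # The mapping was stale, for example because the gateway died.
        pushed = False

    if not pushed:
        enqueue_offline(device_id, event)
        return "offline"
    return "live"
```

A return value of "live" here only means the gateway accepted the push. It is not proof that the device received the event, so durable sync remains the recovery path.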
4. When To Use It
Use this pattern when:
- clients maintain long-lived connections
- delivery workers are separate from gateway servers
- users can reconnect to different gateway instances
- the system needs low-latency push to online clients
- connection ownership changes often
- live delivery must fall back to offline sync
It is common in chat, collaborative editors, multiplayer presence systems, notification platforms, and live dashboards.
5. When Not To Use It
This pattern may be unnecessary when:
- the product uses simple polling
- one server handles all connections
- realtime delivery is not required
- connection count is low enough for direct routing
- stale routing would create unacceptable side effects and no fallback exists
The registry should never become the durable source of message truth. If registry state is lost, clients should still recover through sync.
6. Data And Operational Model
The registry is usually stored in a fast, TTL-capable store, such as an in-memory key-value store with per-key expiry (Redis is one common choice).
Operators should monitor:
- active registry entries
- stale entry rate
- lookup latency
- push failures after successful lookup
- heartbeat refresh rate
- reconnect rate
- gateway imbalance
- expired connection count
- registry write throughput
Write volume can be high because every heartbeat refreshes an entry. Choose a heartbeat interval long enough to keep registry write load manageable, but short enough that entries for dead connections do not stay stale for long.
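The tradeoff between write load and staleness can be estimated with simple arithmetic. The sketch below assumes each connection sends one heartbeat per interval, each heartbeat resets the TTL in full, and heartbeats arrive on time. The function names and the figures in the usage note are illustrative, not measurements.

```python
from typing import Tuple


def heartbeat_writes_per_second(active_connections: int,
                                heartbeat_interval_s: float) -> float:
    """Steady-state registry writes caused by heartbeats alone."""
    if heartbeat_interval_s <= 0:
        raise ValueError("heartbeat interval must be positive")
    return active_connections / heartbeat_interval_s


def stale_window_s(ttl_s: float, heartbeat_interval_s: float) -> Tuple[float, float]:
    """Range of time an entry can outlive a connection that silently dies.

    The connection dies somewhere between two heartbeats, and the entry
    expires ttl_s after the last heartbeat that was written.
    """
    if ttl_s <= heartbeat_interval_s:
        raise ValueError("TTL must be longer than the heartbeat interval")
    return (ttl_s - heartbeat_interval_s, ttl_s)
```

For example, 300,000 connections with a 15-second heartbeat produce about 20,000 registry writes per second, and with a 45-second TTL a silently dead connection stays in the registry for between 30 and 45 seconds. Doubling the interval halves the write rate, but the TTL must then grow to stay longer than the interval, which lengthens the stale window.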
Common controls:
- TTL longer than the heartbeat interval, so a single late heartbeat does not expire a live connection
- gateway-side connection draining
- stale-entry fallback to offline delivery
- per-gateway connection limits
- reconnect rate limiting
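Reconnect rate limiting is one of the controls above that is easy to sketch. A token bucket is a common way to do it, though the source does not prescribe a specific algorithm. The class name `TokenBucket`, its parameters, and the idea of checking it before accepting a connection are assumptions made for this example.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows short bursts of reconnects but caps the sustained rate."""

    def __init__(self, rate_per_second: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._rate = rate_per_second
        self._capacity = float(burst)
        self._tokens = float(burst)  # start full so normal traffic is unaffected
        self._clock = clock
        self._last = clock()

    def allow(self) -> bool:
        # Refill according to elapsed time, never above the burst capacity.
        now = self._clock()
        self._tokens = min(self._capacity,
                           self._tokens + (now - self._last) * self._rate)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False
```

A gateway could call `allow()` before accepting a new connection and ask the client to retry later when it returns `False`. That spreads a reconnect storm over time instead of turning it into a single spike of registry writes.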
7. Failure Modes
- A stale entry points delivery to a dead gateway.
- A gateway crashes before deleting its registry entries.
- Heartbeats overload the registry store.
- Reconnect storms create write spikes.
- Registry lookup succeeds, but the device disconnects before push.
- One gateway owns too many connections because load balancing is uneven.
- Delivery logic treats registry presence as proof of delivery.
8. Tradeoffs
| Benefit | Cost |
|---|---|
| Enables direct routing to connected devices | Adds a shared ephemeral store |
| Avoids broadcasting to every gateway | Registry entries can be stale |
| Supports horizontal gateway fleets | Heartbeat writes can be expensive |
| Makes gateway ownership observable | Requires offline fallback |
| Helps presence and routing | Can become hot during reconnect storms |
Connection registries make realtime delivery efficient, but they must be paired with durable sync because connection state is temporary by nature.