Scaling & Multi-Region
How Olympus scales from a single instance to a globally distributed deployment.
Single-writer, distributed reads
A matching engine is a single-writer system by design. There's one sequencer, one order book, one ledger — this is how every major exchange operates, from Nasdaq to CME. You can't shard a central limit order book across regions without breaking price-time priority or accepting split-brain state.
Olympus doesn't fight this constraint. The matching engine runs in one location. Everything around it — market data, read APIs, WebSocket streaming — can be distributed globally.
Architecture
```
          Region A (primary)                 Region B / C (read replicas)
          +-----------------------+          +---------------------------+
          | API (reads + writes)  |          | API (reads only)          |
orders >  | Sequencer             | snapshot | Snapshot consumer         |
          | CoreEngine            | stream > | WebSocket fan-out         |
          | Event Processor       |          | depth / balance / kline   |
          | RocksDB               |          +---------------------------+
          | reth (EVM)            |
          | Bridge / Settlement   |          +---------------------------+
          |                       | tick log | Standby (DR)              |
          |                       | stream > | Continuous replay         |
          +-----------------------+          | Warm failover             |
                                             +---------------------------+
```
Primary instance
Runs the full binary as it exists today: API server, sequencer, core engine, event processor, RocksDB persistence, embedded reth node, bridge and settlement. All writes (order placement, cancellation, admin operations) are handled here.
Two additions for the distributed model:
- Snapshot publishing — after each tick, the engine already publishes an `EngineSnapshot` via `ArcSwap` for local API reads. The same snapshot gets serialized and published to a message bus (NATS, Redis pub/sub, or Kafka) for remote consumers.
- Tick log streaming — the append log write that already happens in the event processor gets mirrored to a durable stream for the standby instance.
Read replicas
Lightweight instances deployed in each target region. They consume the snapshot stream and serve read-only API endpoints:
- `GET /api/v1/depth` — order book depth
- `GET /api/v1/exchangeInfo` — instrument configs
- `GET /api/v1/ticker/bookTicker` — best bid/ask
- `GET /api/v1/balance` — account balances
- `GET /ws` — WebSocket streaming (order book, trades, klines, allMids)
Read replicas don't run a sequencer, engine, or reth node. They receive snapshots, store them in local memory via ArcSwap (the same pattern the primary uses), and serve requests from that. The data is one tick stale plus network latency to the region — the same staleness model the primary's API layer already operates under.
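The replica-side read path can be sketched with std types. This uses `RwLock<Arc<T>>` as a dependency-free stand-in for the `ArcSwap` crate, and the `Snapshot` fields are illustrative, not the real `EngineSnapshot`:

```rust
use std::sync::{Arc, RwLock};

// Illustrative stand-in for EngineSnapshot; fields are invented for the sketch.
#[derive(Clone, Debug, PartialEq)]
struct Snapshot {
    tick: u64,
    best_bid: u64,
}

// Replica-side holder mimicking the ArcSwap pattern with std types:
// readers clone an Arc cheaply; the bus consumer swaps in new snapshots.
struct SnapshotCell {
    inner: RwLock<Arc<Snapshot>>,
}

impl SnapshotCell {
    fn new(initial: Snapshot) -> Self {
        Self { inner: RwLock::new(Arc::new(initial)) }
    }

    // Called by the message-bus consumer when a new snapshot arrives.
    fn store(&self, snap: Snapshot) {
        *self.inner.write().unwrap() = Arc::new(snap);
    }

    // Called by API handlers on every read request.
    fn load(&self) -> Arc<Snapshot> {
        self.inner.read().unwrap().clone()
    }
}

fn main() {
    let cell = SnapshotCell::new(Snapshot { tick: 1, best_bid: 100 });
    let in_flight = cell.load();
    cell.store(Snapshot { tick: 2, best_bid: 101 });
    // In-flight readers keep their old Arc; new reads see the new tick.
    println!("old tick = {}, new tick = {}", in_flight.tick, cell.load().tick);
}
```

In-flight requests keep serving from the snapshot they loaded, while new requests see the latest one, which is exactly the staleness model described above.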
Write requests (POST /api/v1/order, DELETE /api/v1/order, admin endpoints) are either rejected at the replica or proxied to the primary. Proxying is simpler for clients (they hit the same endpoint regardless of region) but adds a round trip.
Warm standby
A second full instance in a different region or availability zone, continuously replaying the tick log stream. It doesn't serve traffic — it just maintains near-current engine state.
On primary failure:
- Standby finishes replaying any buffered ticks
- Standby starts accepting writes (becomes the new primary)
- DNS or load balancer cuts over
- Read replicas switch to the new primary's snapshot stream
Recovery time is bounded by how far behind the standby's replay is, which under normal conditions is sub-second. The deterministic replay guarantee means the standby produces identical state from the same tick log — no reconciliation needed.
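The no-reconciliation claim rests on determinism: replaying the same tick log always yields the same state. A toy illustration of that property, with `Tick` and `EngineState` invented for the sketch:

```rust
// Minimal deterministic engine: same tick log in, same state out.
#[derive(Clone, Debug)]
enum Tick {
    Deposit(u64),
    Trade { qty: u64, price: u64 },
}

#[derive(Default, Debug, PartialEq)]
struct EngineState {
    balance: u64,
    last_price: u64,
    seq: u64,
}

impl EngineState {
    fn apply(&mut self, t: &Tick) {
        self.seq += 1;
        match t {
            Tick::Deposit(amt) => self.balance += amt,
            Tick::Trade { qty, price } => {
                self.balance -= qty * price;
                self.last_price = *price;
            }
        }
    }
}

// Both primary and standby run this over the same log.
fn replay(log: &[Tick]) -> EngineState {
    let mut s = EngineState::default();
    for t in log {
        s.apply(t);
    }
    s
}

fn main() {
    let log = vec![Tick::Deposit(1_000), Tick::Trade { qty: 2, price: 10 }];
    // Replaying the identical log on two instances produces identical state.
    assert_eq!(replay(&log), replay(&log));
    println!("states identical after replay");
}
```

As long as `apply` is a pure function of the prior state and the tick, the standby's state after catch-up equals the primary's, so promotion needs no diffing step.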
The embedded reth node would need a separate replication strategy (database snapshot + block replay) since it has its own state. For the initial version, the standby can re-deploy the settlement contract and start fresh — committed batch history is immutable on the primary's chain, and the standby's chain is a new instance.
Implementation phases
Phase 1: WebSocket broadcast optimisation
Status: Done — `MarketDataChannels` with `tokio::broadcast` is already in place.
The previous implementation serialized the order book independently for each WebSocket connection. The current architecture uses broadcast channels: a `MarketDataPublisher` serializes each message once and sends it to all subscribers via `tokio::broadcast`. This is the prerequisite for handling high connection counts.
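The serialize-once idea can be shown with std channels standing in for `tokio::broadcast` (the names and message payload here are illustrative):

```rust
use std::sync::mpsc;
use std::sync::Arc;

// Serialize once, then hand the same shared buffer to every subscriber,
// which is the core idea behind the broadcast-based publisher.
fn publish(subs: &[mpsc::Sender<Arc<[u8]>>], msg: &str) {
    let bytes: Arc<[u8]> = Arc::from(msg.as_bytes()); // one serialization
    for s in subs {
        let _ = s.send(Arc::clone(&bytes)); // cheap pointer clone per subscriber
    }
}

fn main() {
    let (tx1, rx1) = mpsc::channel();
    let (tx2, rx2) = mpsc::channel();
    publish(&[tx1, tx2], "{\"depth\":[[100,5]]}");
    let a = rx1.recv().unwrap();
    let b = rx2.recv().unwrap();
    // Both subscribers hold the same allocation, not a per-connection copy.
    assert!(Arc::ptr_eq(&a, &b));
    println!("both subscribers share one serialized buffer");
}
```

With N connections, the cost of a market data update drops from N serializations to one serialization plus N pointer clones.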
Phase 2: Snapshot publishing
Add a snapshot serialization + publish step to the engine thread or event processor:
```rust
// After ArcSwap::store(), serialize and publish
let snap_bytes = bincode::serialize(&snapshot)?;
message_bus.publish("olympus.snapshots", snap_bytes).await;
```

Key decisions:
- Format: `bincode` for speed, or MessagePack for cross-language consumers
- Transport: NATS for low latency, Kafka for durability and replay
- Frequency: every tick (1ms default) or throttled (e.g., every 50ms) depending on bandwidth
The snapshot struct (`EngineSnapshot`) already derives `Serialize`/`Deserialize`, so the serialization path exists.
Continuous mode interaction: In continuous matching mode, snapshot publishing is debounced (`OLYMPUS_SNAPSHOT_INTERVAL_US`, default 500µs). Read replicas consuming the snapshot stream will see updates at this interval rather than per-order. The debounce interval sets the floor for read replica staleness in continuous mode.
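A minimal sketch of the debounce rule, assuming a simple last-publish timestamp check (the `Debouncer` type is hypothetical, not the engine's actual implementation):

```rust
use std::time::{Duration, Instant};

// Debounced publisher: drops snapshots that arrive within `interval`
// of the last publish, mirroring the OLYMPUS_SNAPSHOT_INTERVAL_US idea.
struct Debouncer {
    interval: Duration,
    last: Option<Instant>,
}

impl Debouncer {
    fn new(interval: Duration) -> Self {
        Self { interval, last: None }
    }

    // Returns true if a snapshot arriving at `now` should be published.
    fn should_publish(&mut self, now: Instant) -> bool {
        match self.last {
            Some(prev) if now.duration_since(prev) < self.interval => false,
            _ => {
                self.last = Some(now);
                true
            }
        }
    }
}

fn main() {
    let mut d = Debouncer::new(Duration::from_micros(500));
    let t0 = Instant::now();
    assert!(d.should_publish(t0));                                  // first snapshot goes out
    assert!(!d.should_publish(t0 + Duration::from_micros(100)));    // inside the window, dropped
    assert!(d.should_publish(t0 + Duration::from_micros(600)));     // window elapsed, published
    println!("debounce ok");
}
```

Under a burst of per-order updates, subscribers see at most one snapshot per interval, which is exactly why the interval bounds replica staleness.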
Phase 3: Read replica binary
A second binary (or a `--read-replica` mode of the existing binary) that:
- Connects to the message bus and subscribes to the snapshot stream
- Deserializes snapshots into `ArcSwap<EngineSnapshot>`
- Runs the same Axum API server with read-only routes
- Runs the same WebSocket handler with the same `MarketDataChannels` broadcast
Most of the code is reusable — the API handlers already read from ArcSwap<EngineSnapshot> with no awareness of how the snapshot gets there. The only change is the snapshot source: instead of a local engine thread calling ArcSwap::store(), a consumer task receives snapshots from the message bus.
Phase 4: Tick log streaming + warm standby
Mirror the RocksDB append log writes to a durable stream:
```rust
// In the event processor, after append_log.append():
tick_stream.publish("olympus.ticks", &serialized_tick).await;
```

The standby instance consumes this stream and replays ticks through its own CoreEngine. On failover, it has near-current state and can be promoted.
Phase 5: Write proxying
Read replicas accept write requests and forward them to the primary:
```
Client (Singapore) > Read Replica (Singapore) > Primary (London) > Sequencer
                   <      Response            <      Response
```

This simplifies client configuration (single endpoint per region) at the cost of one additional network hop on writes. The alternative — clients connect directly to the primary for writes — is lower latency but requires clients to know about two endpoints.
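The replica's routing decision reduces to a small predicate. A sketch under the assumption that only non-admin GETs are served locally (the rule and names here are illustrative, not the actual route table):

```rust
// Illustrative routing rule for a read replica: reads are served from
// the local snapshot, writes are forwarded to the single-writer primary.
#[derive(Debug, PartialEq)]
enum Route {
    ServeLocally,
    ProxyToPrimary,
}

fn route(method: &str, path: &str) -> Route {
    // Order mutations and admin endpoints must reach the sequencer.
    if method == "GET" && !path.starts_with("/admin") {
        Route::ServeLocally
    } else {
        Route::ProxyToPrimary
    }
}

fn main() {
    assert_eq!(route("GET", "/api/v1/depth"), Route::ServeLocally);
    assert_eq!(route("POST", "/api/v1/order"), Route::ProxyToPrimary);
    assert_eq!(route("DELETE", "/api/v1/order"), Route::ProxyToPrimary);
    println!("routing ok");
}
```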
Cloud provider requirements
| Requirement | Purpose |
|---|---|
| Persistent block storage (EBS, Persistent Disk) | RocksDB on primary and standby |
| Message bus (NATS, Kafka, Redis) | Snapshot + tick log streaming |
| Multi-region compute | Read replicas and standby |
| DNS failover or global load balancer | Traffic routing, DR cutover |
| Private network peering between regions | Low-latency snapshot delivery |
Railway's single-volume-per-service constraint and lack of cross-region volume sharing make it unsuitable for this model. AWS, GCP, or Azure provide the primitives needed: regional compute, durable messaging, and network peering.
What doesn't change
The core architecture stays the same. The matching engine is still a single-writer on a dedicated thread. The sequencer still assigns monotonic sequences. The event processor still persists to RocksDB and processes bridge instructions. The settlement contract still commits batch roots on-chain.
Scaling is additive — publishing snapshots, adding read replicas, streaming the tick log — not a rewrite. The ArcSwap pattern that makes local API reads lock-free is the same pattern that makes remote API reads work: swap in a new snapshot, serve from it.