Continuous Mode Tuning
Environment variables, deployment profiles, and optimization details for continuous matching mode.
Overview
Continuous matching mode (`OLYMPUS_CONTINUOUS_MATCHING=true`) matches orders individually on arrival instead of batching into ticks. The hot path is optimized with three techniques:
- Debounced snapshot publishing — `EngineSnapshot::from_engine()` runs at a configurable interval instead of per-order
- Batched broadcast sends — market data and persistence broadcasts are accumulated and flushed with the snapshot
- Three-phase receive loop — drain via `try_recv`, spin briefly, then fall back to blocking `recv_timeout`
+--------------------------------------------------------------+
| Engine Thread Loop |
| |
| Phase 1: Drain --> try_recv() in tight loop (no syscalls) |
| | |
| v |
| Phase 2: Timers --> debounced snapshot + commit check |
| | |
| v |
| Phase 3: Spin --> spin_loop() x OLYMPUS_SPIN_ITERS |
| | |
| v |
| Phase 4: Block --> recv_timeout(remaining commit interval) |
+--------------------------------------------------------------+

Architecture: Decoupled Matching and Market Data
Olympus follows the same architecture as production exchanges like Binance: the matching engine and market data feed operate on independent cadences.
┌─────────────────────────────────────────────────────┐
│ Matching Engine (event-driven) │
│ Each order matched immediately on arrival (~µs) │
│ Results accumulated in memory │
├─────────────────────────────────────────────────────┤
│ Snapshot Publisher (OLYMPUS_SNAPSHOT_INTERVAL_US) │
│ Publishes ArcSwap snapshot for REST API reads │
├─────────────────────────────────────────────────────┤
│ Market Data Feed (OLYMPUS_MARKET_DATA_INTERVAL_MS) │
│ Batches depth diffs + mids → WS broadcast │
│ Trades + account updates sent per-tick (no batching)│
├─────────────────────────────────────────────────────┤
│ Commitment (OLYMPUS_TICK_INTERVAL_MS) │
│ Batches matched results → hash chain + persistence │
└─────────────────────────────────────────────────────┘

Matching is event-driven in continuous mode — each order is processed immediately via try_recv(). There is no "tick interval" for matching; the engine processes orders as fast as they arrive.
Market data is published on a fixed timer, independent of matching speed. This prevents WS channel overflow when the engine processes thousands of orders per second. Binance uses 100ms/1000ms intervals; Olympus defaults to 100ms.
Persistence (commitment batching) controls how often matched results are hashed and written to RocksDB. Wider batches reduce I/O overhead but increase the crash window (data loss on unclean shutdown).
Environment Variable Reference
| Variable | Default | Description | Tradeoff |
|---|---|---|---|
| `OLYMPUS_CONTINUOUS_MATCHING` | `false` | Enable continuous (event-driven) matching | Crash window vs latency |
| `OLYMPUS_SNAPSHOT_INTERVAL_US` | `500` | Microseconds between snapshot publishes (REST API freshness) | API staleness vs CPU overhead |
| `OLYMPUS_MARKET_DATA_INTERVAL_MS` | `100` | Milliseconds between WS market data broadcasts (depth diffs, mids) | UI freshness vs WS throughput |
| `OLYMPUS_WS_CHANNEL_CAPACITY` | `512` | Broadcast channel buffer size for WS messages | Memory vs lag tolerance |
| `OLYMPUS_TICK_INTERVAL_MS` | `1` | Commitment batch interval — persistence + hash chain (ms) | Crash window vs I/O overhead |
| `OLYMPUS_SPIN_ITERS` | `256` | Spin iterations before blocking on the channel | CPU usage vs wake-up latency |
| `OLYMPUS_EVM_BLOCK_TIME_MS` | `1000` | EVM block production interval | Block explorer update rate |
| `OLYMPUS_ENGINE_CORE` | (unset) | Pin engine thread to CPU core | Latency consistency |
| `OLYMPUS_HASHER_CORE` | (unset) | Pin hasher thread to CPU core | Hash throughput |
What does each interval control?
- `SNAPSHOT_INTERVAL_US` — how fresh the REST API is (depth, bookTicker, balance). Lower = more current reads, higher CPU.
- `MARKET_DATA_INTERVAL_MS` — how often WS clients receive order book updates. 100ms = 10 updates/sec (industry standard). Lower = faster UI, higher WS bandwidth.
- `TICK_INTERVAL_MS` — how often matched results are persisted. Does NOT affect matching speed. Higher = less I/O, larger crash window.
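The variable names and defaults above can be pulled into typed config at startup. This is a minimal sketch: `env_u64` is a hypothetical helper, not the engine's actual config loader; only the variable names and default values come from the reference table.

```rust
use std::env;
use std::time::Duration;

// Hypothetical helper: read an env var as u64, falling back to a default
// when the variable is unset or fails to parse.
fn env_u64(name: &str, default: u64) -> u64 {
    env::var(name)
        .ok()
        .and_then(|v| v.parse().ok())
        .unwrap_or(default)
}

fn main() {
    // Defaults taken from the environment variable reference table.
    let snapshot_interval = Duration::from_micros(env_u64("OLYMPUS_SNAPSHOT_INTERVAL_US", 500));
    let market_data_interval = Duration::from_millis(env_u64("OLYMPUS_MARKET_DATA_INTERVAL_MS", 100));
    let tick_interval = Duration::from_millis(env_u64("OLYMPUS_TICK_INTERVAL_MS", 1));
    let spin_iters = env_u64("OLYMPUS_SPIN_ITERS", 256);

    println!(
        "snapshot={snapshot_interval:?} market_data={market_data_interval:?} \
         commit={tick_interval:?} spin_iters={spin_iters}"
    );
}
```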
Deployment Profiles
Cloud / Railway
Shared CPU cores, network-attached storage. Save CPU, accept higher latency.
OLYMPUS_CONTINUOUS_MATCHING=true
OLYMPUS_SNAPSHOT_INTERVAL_US=1000
OLYMPUS_MARKET_DATA_INTERVAL_MS=100
OLYMPUS_SPIN_ITERS=0
OLYMPUS_TICK_INTERVAL_MS=100
OLYMPUS_EVM_BLOCK_TIME_MS=1000
# No core pinning — shared infrastructure

Docker / dev
Local development. Low overhead, good enough latency.
OLYMPUS_CONTINUOUS_MATCHING=true
OLYMPUS_SNAPSHOT_INTERVAL_US=1000
OLYMPUS_MARKET_DATA_INTERVAL_MS=100
OLYMPUS_SPIN_ITERS=0
OLYMPUS_TICK_INTERVAL_MS=1

Bare metal / production
Dedicated cores, pinned threads. Balance freshness and overhead.
OLYMPUS_CONTINUOUS_MATCHING=true
OLYMPUS_SNAPSHOT_INTERVAL_US=500
OLYMPUS_MARKET_DATA_INTERVAL_MS=100
OLYMPUS_SPIN_ITERS=256
OLYMPUS_TICK_INTERVAL_MS=1
OLYMPUS_ENGINE_CORE=2
OLYMPUS_HASHER_CORE=3

HFT / ultra-low-latency
Aggressive spinning, tight snapshots, wider commit batches.
OLYMPUS_CONTINUOUS_MATCHING=true
OLYMPUS_SNAPSHOT_INTERVAL_US=250
OLYMPUS_MARKET_DATA_INTERVAL_MS=50
OLYMPUS_SPIN_ITERS=1024
OLYMPUS_TICK_INTERVAL_MS=5
OLYMPUS_ENGINE_CORE=2
OLYMPUS_HASHER_CORE=3
# Also: isolcpus=2,3 in kernel boot params

HFT profile crash window
Setting OLYMPUS_TICK_INTERVAL_MS=5 means up to 5ms of matched orders can be lost on crash. This is an acceptable tradeoff for HFT workloads where latency matters more than durability, but should be documented in operational runbooks.
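The crash window is easy to quantify. A back-of-envelope sketch, assuming a hypothetical workload figure (the 10,000 orders/sec rate below is illustrative, not a measured number):

```rust
// Orders at risk = order rate × commitment interval.
// Matched-but-uncommitted results within this window are lost on an
// unclean shutdown.
fn orders_at_risk(orders_per_sec: u64, tick_interval_ms: u64) -> u64 {
    orders_per_sec * tick_interval_ms / 1000
}

fn main() {
    // With OLYMPUS_TICK_INTERVAL_MS=5 at 10,000 orders/sec, up to 50
    // matched orders can be lost on a crash.
    println!("{}", orders_at_risk(10_000, 5)); // prints 50
}
```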
Optimization Details
Snapshot debounce
Before: EngineSnapshot::from_engine() after every match_order(). This method iterates all instruments, calls bid_depth(1000) and ask_depth(1000) on each book (walking up to 1000 BTreeMap levels, summing VecDeque remaining quantities), and clones the full balance map. At 1000 orders/sec with 4 instruments and deep books, snapshot overhead was 50-200ms/sec.
After: Snapshot publishes at most once per OLYMPUS_SNAPSHOT_INTERVAL_US. At 1000 orders/sec with a 500µs interval, overhead drops to ~2ms/sec. API reads may be up to OLYMPUS_SNAPSHOT_INTERVAL_US stale — at 500µs, this is well within acceptable market data latency for any UI refreshing at ≥10ms intervals.
Metric: snapshot_debounce_batch_size histogram shows how many orders accumulate between snapshot publishes. Higher values indicate the debounce is absorbing more per-order overhead.
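The debounce pattern itself is small. A minimal sketch, assuming a `publish` callback stands in for `EngineSnapshot::from_engine()` plus the ArcSwap store; the struct and method names are illustrative, not the engine's actual types:

```rust
use std::time::{Duration, Instant};

struct SnapshotDebounce {
    interval: Duration,   // OLYMPUS_SNAPSHOT_INTERVAL_US
    last_publish: Instant,
    pending: u64,         // orders matched since the last publish
}

impl SnapshotDebounce {
    fn new(interval: Duration) -> Self {
        Self { interval, last_publish: Instant::now(), pending: 0 }
    }

    // Hot path: cheap bookkeeping only, no snapshot construction.
    fn on_order_matched(&mut self) {
        self.pending += 1;
    }

    // Timer phase: publishes at most once per interval. The callback is
    // where the expensive build-and-swap of the snapshot would happen.
    fn maybe_publish(&mut self, publish: impl FnOnce(u64)) -> bool {
        if self.pending > 0 && self.last_publish.elapsed() >= self.interval {
            publish(self.pending);
            self.pending = 0;
            self.last_publish = Instant::now();
            true
        } else {
            false
        }
    }
}

fn main() {
    // Zero interval so the demo fires immediately.
    let mut d = SnapshotDebounce::new(Duration::ZERO);
    d.on_order_matched();
    d.maybe_publish(|batch| println!("publishing snapshot after {batch} orders"));
}
```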
Batched broadcasts
Before: tokio::broadcast::send() after every order — each send allocates an Arc, performs atomic reference counting for all subscribers, and may contend with the tokio runtime.
After: Trades and order events are accumulated in Vec buffers and flushed when the snapshot debounce fires. The Vec growth is amortized across orders (no per-order heap allocation once capacity is reached). Broadcast subscribers receive trades in micro-batches (up to OLYMPUS_SNAPSHOT_INTERVAL_US worth) instead of per-order.
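The accumulate-then-flush shape can be sketched as follows, using a generic `send` closure in place of `tokio::broadcast::Sender` and a stand-in `Trade` type; none of these names are the engine's actual API:

```rust
#[derive(Clone, Debug)]
struct Trade { id: u64 }

struct TradeBatcher {
    buf: Vec<Trade>,
}

impl TradeBatcher {
    fn new() -> Self {
        // Pre-sized so steady-state pushes do not reallocate.
        Self { buf: Vec::with_capacity(1024) }
    }

    // Hot path: push only. No broadcast, no per-order Arc allocation,
    // no subscriber refcount traffic.
    fn on_trade(&mut self, t: Trade) {
        self.buf.push(t);
    }

    // Called when the snapshot debounce fires: one micro-batch send
    // instead of one send per order. Swapping in a fresh pre-sized Vec
    // keeps the amortized-growth property across flushes.
    fn flush(&mut self, send: impl FnOnce(Vec<Trade>)) {
        if !self.buf.is_empty() {
            let batch = std::mem::replace(&mut self.buf, Vec::with_capacity(1024));
            send(batch);
        }
    }
}

fn main() {
    let mut b = TradeBatcher::new();
    b.on_trade(Trade { id: 1 });
    b.on_trade(Trade { id: 2 });
    b.flush(|batch| println!("broadcasting {} trades", batch.len()));
}
```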
Three-phase receive loop
Before: recv_timeout(Duration::from_micros(100)) on every iteration. Even when orders are flowing, each call enters the kernel's clock subsystem (Instant::now() inside the timeout check).
After:
- Phase 1 (drain): `try_recv()` in a tight loop. No timer checks, no `Instant::now()`, no syscalls. During a burst of 1000 orders, the engine processes all of them before checking any timer.
- Phase 2 (timers): After the drain exhausts, check the snapshot debounce timer and commitment timer. This runs once per burst, not per order.
- Phase 3 (spin): `std::hint::spin_loop()` × `OLYMPUS_SPIN_ITERS`. Each iteration is ~1ns. At 256 iterations, this is ~256ns of CPU time before giving up. Catches orders that arrive just after the drain.
- Phase 4 (block): `recv_timeout` with the remaining commitment interval. Only reached when the engine is genuinely idle.
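The phases above can be sketched structurally. This uses `std::sync::mpsc` as a stand-in for the engine's actual channel type, and `Order` plus the elided handlers are placeholders:

```rust
use std::sync::mpsc::{self, RecvTimeoutError, TryRecvError};
use std::time::Duration;

struct Order;

fn run_loop(rx: mpsc::Receiver<Order>, spin_iters: u32, commit_interval: Duration) {
    loop {
        // Phase 1: drain — tight try_recv loop, no clocks, no syscalls.
        loop {
            match rx.try_recv() {
                Ok(order) => { /* match_order(order) */ let _ = order; }
                Err(TryRecvError::Empty) => break,
                Err(TryRecvError::Disconnected) => return,
            }
        }

        // Phase 2: timers — snapshot debounce + commitment check would run
        // here, once per burst rather than once per order (elided).

        // Phase 3: spin — catch orders arriving just after the drain.
        for _ in 0..spin_iters {
            std::hint::spin_loop();
        }

        // Phase 4: block — only reached when genuinely idle.
        match rx.recv_timeout(commit_interval) {
            Ok(order) => { let _ = order; }
            Err(RecvTimeoutError::Timeout) => { /* fire commitment timer */ }
            Err(RecvTimeoutError::Disconnected) => return,
        }
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    tx.send(Order).unwrap();
    drop(tx); // disconnect so the demo loop terminates
    run_loop(rx, 256, Duration::from_millis(1));
    println!("loop exited on disconnect");
}
```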
Reduced cloning
Before: Per-order Vec::clone() on single.trades, single.order_updates, and single.bridge_instructions to accumulate into commitment buffers.
After: Vec::append(&mut single.trades) moves elements without allocating. The only remaining clone is for trade stamping (market data needs timestamp_ns set, commitment needs un-stamped trades), which is one clone per trade rather than a full Vec allocation. Additionally, InstrumentId clones are now stack-local (CompactString, 24-byte memcpy) rather than heap allocations, and ledger operations no longer clone InstrumentId per call thanks to the nested map structure (AccountId → InstrumentId → AccountBalance), further reducing per-order overhead.
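The clone-vs-append difference is worth seeing concretely. A sketch with a stand-in `Trade` type and buffer names (not the engine's actual types):

```rust
#[derive(Clone)]
struct Trade { price: u64, qty: u64 }

// Accumulate one order's trades into the commitment buffer.
fn accumulate(commit_buf: &mut Vec<Trade>, single_trades: &mut Vec<Trade>) {
    // Before: commit_buf.extend(single_trades.clone()) — a fresh heap
    // allocation plus a clone of every element, once per order.
    // After: append moves the elements and leaves the source empty,
    // reusing commit_buf's existing capacity.
    commit_buf.append(single_trades);
}

fn main() {
    let mut commit_buf: Vec<Trade> = Vec::new();
    let mut single = vec![
        Trade { price: 100, qty: 2 },
        Trade { price: 101, qty: 1 },
    ];
    accumulate(&mut commit_buf, &mut single);
    println!("committed {} trades, source left with {}", commit_buf.len(), single.len());
}
```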
Monitoring
The following metrics are specific to continuous mode:
| Metric | Type | What to watch |
|---|---|---|
| `continuous_orders_matched_total` | Counter | Order throughput in continuous mode |
| `snapshot_debounce_batch_size` | Histogram | Orders per snapshot publish — higher means better amortization |
| `snapshot_publish_latency_ns` | Histogram | Per-snapshot cost — should be stable regardless of order rate |
| `matching_latency_ns` | Histogram | Per-tick matching (batch mode only — not recorded in continuous mode) |