Matching Engine
Deep dive into the matching engine — order book structure, matching algorithm, balance model, determinism, and thread architecture.
Overview
The matching engine is the deterministic core of Olympus. It processes batches of transactions called ticks, executes order matching with price-time priority, manages balances, and produces cryptographic commitments for on-chain verifiability.
The engine sits at the centre of the data pipeline:
API > Sequencer > Engine > TickResult > Persistence / Bridge / WebSocket

Every operation inside the engine is pure and deterministic — given the same sequence of ticks, any instance will produce identical state hashes. This property is what makes the validium architecture possible: the on-chain settlement contract can verify that off-chain execution was correct by checking committed state roots.
For system-level architecture see the Architecture page. This page focuses on the engine internals.
Data Flow
The system supports two matching modes: batch (default) and continuous (feature-gated).
Batch Mode (default)
+-----------+ +------------+ +------------+ +--------+ +------------------+
| REST |--->| Sequencer |--->| Core |--->| Hasher |--->| Event Processor |
| API | | (batching) | | Engine | | Thread | | |
+-----------+ +------------+ +------------+ +--------+ +------------------+
| | |
| | +--> Append Log (RocksDB)
| | +--> Bridge > EVM
| | +--> Block Submitter
| |
| +------+------+
| | ArcSwap |---> API reads (lock-free)
| | Snapshot |
| +-------------+
| |
| +------+------+
| | Broadcast |---> WebSocket subscribers
| | Channel |
|       +-------------+

- API receives orders via REST and pushes `Transaction` objects to a crossbeam channel.
- Sequencer buffers transactions and flushes them as a `Tick` at a configurable interval (default 1ms). Each tick gets a monotonically increasing sequence number and a nanosecond timestamp.
- Core Engine receives the tick, calls `match_tick()` to process all transactions, and produces a `MatchResult`. Snapshot and market data are published immediately.
- Hasher Thread receives the `MatchResult` and computes cryptographic commitments (Blake3 state hash + merkle trades root) off the engine hot path. Constructs the final `TickResult`.
- Event Processor persists the tick to the append log, triggers periodic snapshots, and forwards bridge instructions to the EVM layer.
- ArcSwap Snapshot is published after every tick for lock-free reads by API handlers.
- Broadcast Channel fans out trade data to WebSocket subscribers via `TickMarketData`.
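As a rough sketch of how these stages hand work to each other, the toy model below wires a sequencer, engine, and hasher stage together with std channels. It is illustrative only: the real system uses crossbeam channels and its own `Transaction`/`Tick`/`MatchResult` types, and the names here are simplified stand-ins.

```rust
use std::sync::mpsc;
use std::thread;

// Illustrative stand-ins for the real Transaction / Tick / MatchResult types.
#[derive(Debug)]
struct Tick { sequence: u64, transactions: Vec<String> }
#[derive(Debug)]
struct MatchResult { sequence: u64, trades: usize }

fn run_pipeline(orders: Vec<String>) -> Vec<MatchResult> {
    let (tick_tx, tick_rx) = mpsc::channel::<Tick>();
    let (hash_tx, hash_rx) = mpsc::channel::<MatchResult>();

    // Engine stage: consumes ticks, "matches" them, and forwards
    // results to the next stage (standing in for the hasher thread).
    let engine = thread::spawn(move || {
        for tick in tick_rx {
            let result = MatchResult {
                sequence: tick.sequence,
                trades: tick.transactions.len(),
            };
            hash_tx.send(result).unwrap();
        }
    });

    // Sequencer stage: batch all orders into a single tick for this toy run.
    tick_tx.send(Tick { sequence: 0, transactions: orders }).unwrap();
    drop(tick_tx); // close the channel so the engine stage exits

    engine.join().unwrap();
    hash_rx.iter().collect()
}
```

The key property this preserves from the diagram is single ownership: each stage owns its state and communicates only through channels, so no stage needs a mutex.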
Continuous Mode (OLYMPUS_CONTINUOUS_MATCHING=true)
+-----------+ +------------+ +--------+ +------------------+
| REST |--------------------->| Core |--->| Hasher |--->| Event Processor |
| API | (bypasses | Engine | | Thread | | |
+-----------+ sequencer) +------------+ +--------+ +------------------+
|
+------+------+
| Debounced |--> ArcSwap (API reads)
| snapshot |--> Broadcast (WS + persist)
| (500us) |
+-------------+

In continuous mode, the engine receives individual `Transaction` objects directly from the API channel, bypassing the sequencer entirely. Each order is matched immediately via `match_order()`. Snapshot publishing and broadcast sends are debounced to amortize their cost across multiple orders:

- Snapshot publish fires at a configurable interval (`OLYMPUS_SNAPSHOT_INTERVAL_US`, default 500µs) instead of after every order, eliminating per-order `EngineSnapshot::from_engine()` overhead.
- Market data and persistence broadcasts are accumulated and flushed with the snapshot, reducing per-order `Arc` allocations and broadcast channel contention.
- Receive loop uses a three-phase strategy: drain all available orders via `try_recv` (no syscalls), spin briefly (`OLYMPUS_SPIN_ITERS` iterations), then fall back to blocking `recv_timeout` with the remaining commit interval.
A commitment timer (configurable, default 1ms) periodically batches accumulated results into a Tick for hashing, persistence, and bridge processing.
This gives sub-microsecond matching latency at the cost of a crash recovery tradeoff (see Crash Recovery).
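The three-phase receive strategy can be sketched with std's `mpsc` channel standing in for crossbeam (the constant and function name here are illustrative, not the engine's actual API):

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Instant;

const SPIN_ITERS: u32 = 100; // stand-in for OLYMPUS_SPIN_ITERS

/// Drain-spin-block receive, a sketch of the continuous-mode loop:
/// 1. drain everything already queued with try_recv (no syscalls),
/// 2. spin briefly hoping a new message arrives,
/// 3. block with recv_timeout for the remainder of the commit interval.
fn receive_batch<T>(rx: &Receiver<T>, commit_deadline: Instant) -> Vec<T> {
    let mut batch = Vec::new();
    loop {
        // Phase 1: drain all available messages without blocking.
        while let Ok(msg) = rx.try_recv() {
            batch.push(msg);
        }
        if !batch.is_empty() {
            return batch;
        }
        // Phase 2: spin briefly before paying for a blocking syscall.
        for _ in 0..SPIN_ITERS {
            std::hint::spin_loop();
        }
        if let Ok(msg) = rx.try_recv() {
            batch.push(msg);
            continue; // go back and drain anything else that arrived
        }
        // Phase 3: block for whatever remains of the commit interval.
        let remaining = commit_deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok(msg) => batch.push(msg),
            Err(RecvTimeoutError::Timeout) | Err(RecvTimeoutError::Disconnected) => return batch,
        }
    }
}
```

The design intent is that the common case (orders already queued) costs no syscalls at all; the blocking fallback only triggers when the channel is truly idle.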
Order Book Data Structure
Each instrument has its own OrderBook with three internal structures:
| Structure | Type | Purpose |
|---|---|---|
| `bids` | `BTreeMap<Reverse<Price>, VecDeque<Order>>` | Buy orders, highest price first |
| `asks` | `BTreeMap<Price, VecDeque<Order>>` | Sell orders, lowest price first |
| `index` | `FxHashMap<OrderId, (Side, Price)>` | O(1) lookup for cancellations |
Price-time priority is maintained naturally:
- The `BTreeMap` sorts by price (highest bid first via `Reverse`, lowest ask first).
- Within each price level, the `VecDeque` maintains FIFO (time) ordering — the first order inserted is the first to be matched.
Complexity
| Operation | Complexity | Notes |
|---|---|---|
| Insert order | O(log P) | BTreeMap insertion where P = number of price levels |
| Cancel order | O(1) lookup + O(N) level scan | Global order→instrument index for O(1) book lookup, then FxHashMap lookup to find (side, price), then linear scan within the VecDeque at that price level |
| Peek best bid/ask | O(1) | BTreeMap first element access |
| Depth query | O(L) | Iterate L price levels |
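A minimal sketch of these structures using std collections (std's `HashMap` stands in for `FxHashMap`; `Price`, `Order`, and the methods are simplified stand-ins, not the engine's real types):

```rust
use std::cmp::Reverse;
use std::collections::{BTreeMap, HashMap, VecDeque};

type Price = i64;
type OrderId = u64;

#[derive(Clone, Copy, Debug)]
enum Side { Buy, Sell }

#[derive(Debug)]
struct Order { id: OrderId, price: Price, qty: i64 }

#[derive(Default)]
struct OrderBook {
    bids: BTreeMap<Reverse<Price>, VecDeque<Order>>, // highest price first
    asks: BTreeMap<Price, VecDeque<Order>>,          // lowest price first
    index: HashMap<OrderId, (Side, Price)>,          // O(1) cancel lookup
}

impl OrderBook {
    /// O(log P) insert: find/create the price level, append FIFO.
    fn insert(&mut self, side: Side, order: Order) {
        self.index.insert(order.id, (side, order.price));
        match side {
            Side::Buy => self.bids.entry(Reverse(order.price)).or_default().push_back(order),
            Side::Sell => self.asks.entry(order.price).or_default().push_back(order),
        }
    }

    /// O(1): first BTreeMap key is the best price on each side.
    fn best_bid(&self) -> Option<Price> {
        self.bids.keys().next().map(|r| r.0)
    }
    fn best_ask(&self) -> Option<Price> {
        self.asks.keys().next().copied()
    }

    /// O(1) index lookup, then a linear scan within that price level.
    fn cancel(&mut self, id: OrderId) -> bool {
        let Some((side, price)) = self.index.remove(&id) else { return false };
        match side {
            Side::Buy => {
                if let Some(q) = self.bids.get_mut(&Reverse(price)) {
                    if let Some(pos) = q.iter().position(|o| o.id == id) {
                        q.remove(pos);
                        if q.is_empty() { self.bids.remove(&Reverse(price)); }
                        return true;
                    }
                }
            }
            Side::Sell => {
                if let Some(q) = self.asks.get_mut(&price) {
                    if let Some(pos) = q.iter().position(|o| o.id == id) {
                        q.remove(pos);
                        if q.is_empty() { self.asks.remove(&price); }
                        return true;
                    }
                }
            }
        }
        false
    }
}
```

Note how the `Reverse` wrapper lets a single ascending-sorted `BTreeMap` serve both sides: the bid map's "smallest" key is the highest price.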
Matching Algorithm
When an aggressive order arrives, the engine walks the opposite side of the book:
- Buy order walks asks lowest-price-first
- Sell order walks bids highest-price-first
For each resting order at the front of the best price level:
- Price check: buyer must be willing to pay ≥ ask price; seller must accept ≤ bid price.
- Self-trade prevention: if the resting order belongs to the same account as the incoming order, the resting order is cancelled (popped from the book, reserved balance released) and the engine continues to the next resting order. No trade is created. This prevents market makers from consuming their own stale orders when requoting after price movements.
- Fill quantity: `min(incoming_remaining, resting_remaining)`.
- Trade price: always the resting order's price (price improvement for the aggressor).
- Balance settlement: buyer pays quote from reserved, receives base; seller pays base from reserved, receives quote.
- Resting order update: if fully filled, pop from the level; if partially filled, update remaining.
After exhausting all matchable resting orders:
| Order Type | Remaining Quantity Handling |
|---|---|
| Limit | Rests on the book (fill-or-rest) |
| Market | Cancelled, reserved balance released |
| ImmediateOrCancel | Cancelled, reserved balance released |
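The matching walk for an incoming buy order can be sketched as follows (toy types and integer prices; the real engine also settles balances, releases reserves, and emits order updates, all omitted here):

```rust
use std::collections::{BTreeMap, VecDeque};

#[derive(Debug)]
struct Resting { account: u32, qty: i64 }

#[derive(Debug, PartialEq)]
struct Trade { price: i64, qty: i64 }

/// Walk the asks lowest-price-first for an incoming buy order,
/// applying the rules above: price check, self-trade prevention
/// (resting order cancelled, no trade), min-fill, trade at resting price.
/// Returns the trades plus the unfilled remainder.
fn match_buy(
    asks: &mut BTreeMap<i64, VecDeque<Resting>>,
    account: u32,
    limit_price: i64,
    mut qty: i64,
) -> (Vec<Trade>, i64) {
    let mut trades = Vec::new();
    while qty > 0 {
        // Best ask = first key of the BTreeMap (lowest price).
        let Some((&price, _)) = asks.iter().next() else { break };
        if price > limit_price {
            break; // price check: buyer will not pay more than the limit
        }
        let level = asks.get_mut(&price).unwrap();
        while qty > 0 {
            let Some(resting) = level.front_mut() else { break };
            if resting.account == account {
                level.pop_front(); // self-trade prevention: cancel, no trade
                continue;
            }
            let fill = qty.min(resting.qty);
            trades.push(Trade { price, qty: fill }); // trade at the resting price
            qty -= fill;
            resting.qty -= fill;
            if resting.qty == 0 {
                level.pop_front(); // fully filled: pop from the level
            }
        }
        if level.is_empty() {
            asks.remove(&price); // drop the exhausted price level
        }
    }
    (trades, qty) // remainder rests (limit) or is cancelled (market/IOC)
}
```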
Balance Model
The ledger tracks balances per (AccountId, InstrumentId) pair using a three-tier model:
+-----------------------------------------------------+
| Total Balance |
| |
| +----------+ +----------+ +----------+ |
| |Available | | Reserved | | Locked | |
| | | | | | | |
| | Free to | | Backing | | Bridge | |
| | trade or | | open | | ops | |
| | withdraw | | orders | | (mint/ | |
| | | | | | burn) | |
| +----------+ +----------+ +----------+ |
+-----------------------------------------------------+

State Transitions
| Event | Transition |
|---|---|
| Deposit / Admin Credit | available += amount |
| Place Order | available -= cost → reserved += cost |
| Cancel Order | reserved -= cost → available += cost |
| Fill (resting side) | reserved -= cost, counterparty available += proceeds |
| Lock for Bridge | available -= amount → locked += amount |
| Unlock from Bridge | locked -= amount → available += amount |
Asset identifiers follow a BASE-QUOTE convention derived from the instrument ID. For example, instrument AAPL-USD derives base asset AAPL and quote asset USD. A buyer reserves quote currency; a seller reserves base currency.
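The transitions in the table can be sketched as a toy ledger (the real ledger keys by `(AccountId, InstrumentId)`, uses fixed-point amounts, and enforces invariants; the types and method names here are illustrative):

```rust
use std::collections::HashMap;

#[derive(Default, Debug, PartialEq)]
struct Balance { available: i64, reserved: i64, locked: i64 }

#[derive(Default)]
struct Ledger { balances: HashMap<(u32, String), Balance> }

impl Ledger {
    fn entry(&mut self, account: u32, asset: &str) -> &mut Balance {
        self.balances.entry((account, asset.to_string())).or_default()
    }
    /// Deposit / admin credit: available += amount
    fn credit(&mut self, account: u32, asset: &str, amount: i64) {
        self.entry(account, asset).available += amount;
    }
    /// Place order: available -> reserved; fails on insufficient balance.
    fn reserve(&mut self, account: u32, asset: &str, amount: i64) -> Result<(), &'static str> {
        let b = self.entry(account, asset);
        if b.available < amount {
            return Err("InsufficientBalance");
        }
        b.available -= amount;
        b.reserved += amount;
        Ok(())
    }
    /// Cancel order: reserved -> available
    fn release(&mut self, account: u32, asset: &str, amount: i64) {
        let b = self.entry(account, asset);
        b.reserved -= amount;
        b.available += amount;
    }
    /// Lock for bridge: available -> locked
    fn lock(&mut self, account: u32, asset: &str, amount: i64) -> Result<(), &'static str> {
        let b = self.entry(account, asset);
        if b.available < amount {
            return Err("InsufficientBalance");
        }
        b.available -= amount;
        b.locked += amount;
        Ok(())
    }
}
```

The invariant the real ledger guards is that every transition moves value between tiers without changing the total; a fill is the one exception, where value moves between accounts.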
Price Improvement
When a buy limit order at price P fills at a better price F (F < P), the reserve was P × qty but settlement is F × qty. The difference (P - F) × qty is released back to available after matching completes. This applies to:
- Fully filled orders — excess = `reserve_amount - total_fill_cost`
- Partially filled + resting — excess on filled portion released, rest stays reserved for the resting order
- Market/IOC orders — unfilled reserve + price improvement excess both released
Market Order Reserve
Market buy orders reserve based on best_ask × 1.1 (10% slippage buffer). If the order fills at the actual ask price, the 10% excess is released. If no asks exist, the order uses a minimal reserve and cancels immediately.
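The reserve arithmetic can be illustrated in integer raw units (a sketch; the engine's fixed-point scaling and these function names are illustrative, not its actual API):

```rust
/// Market buy reserve: best_ask * qty plus a 10% slippage buffer.
fn market_buy_reserve(best_ask: i64, qty: i64) -> i64 {
    let cost = best_ask * qty;
    cost + cost / 10 // +10% buffer
}

/// Released after matching completes: whatever was reserved
/// beyond the actual settlement cost (price improvement and/or
/// the unused slippage buffer).
fn excess_release(reserved: i64, total_fill_cost: i64) -> i64 {
    reserved - total_fill_cost
}
```

For a limit buy the same `excess_release` covers price improvement: reserve `P × qty`, settle `F × qty`, release `(P - F) × qty`.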
Safety Checks
The matching engine enforces several invariants to prevent state corruption and DoS attacks:
| Check | When | Behavior |
|---|---|---|
| Duplicate order ID | Before placement | Rejected with DuplicateOrderId — prevents uncancellable phantom orders |
| Max open orders | Before placement | Rejected with TooManyOpenOrders if account has ≥ 1000 resting orders |
| Insufficient balance | Before placement | Rejected with InsufficientBalance — ledger.reserve() fails |
| Instrument halted | Before placement | Rejected with InstrumentHalted |
| Ledger invariant | During settlement | Returns LedgerInvariant error — logged, metrics incremented |
| Self-trade prevention | During matching | Resting order from the same account is cancelled (not filled) |
Determinism
The engine guarantees that the same tick sequence produces the same state_hash. This is enforced by:
- No randomness during matching — all decisions are deterministic given the input.
- No I/O during matching — the engine runs in-memory only.
- No clock reads during matching — timestamps come from the `Tick` struct, assigned by the sequencer before the tick enters the engine.
- Deterministic iteration — `BTreeMap` iterates in sorted order; `VecDeque` maintains insertion order.
State Hash Computation
After processing a tick, the engine computes a Blake3 hash over:
Blake3(
tick.sequence (u64, little-endian)
tick.timestamp_ns (u64, little-endian)
for each trade:
trade.sequence (u64, little-endian)
trade.instrument_id (UTF-8 bytes)
trade.price (string bytes)
trade.quantity (string bytes)
tick.transactions.len (u64, little-endian)
)

Merkle Trades Root
A binary merkle tree is built over trade leaf hashes. Each leaf is:
Blake3(
trade.sequence | instrument_id | price | quantity |
buyer_account | seller_account | buyer_order_id |
seller_order_id | timestamp_ns
)

The merkle root is included in the `TickResult` and can be committed on-chain. Individual trade inclusion can be proven with a merkle proof.
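A sketch of the binary merkle construction over leaf hashes (std's `DefaultHasher` stands in for Blake3 purely for runnability, and the odd-leaf rule here — promoting the last hash unchanged — is an assumption, not confirmed from the source):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hash a pair of child nodes into their parent.
fn hash_pair(left: u64, right: u64) -> u64 {
    let mut h = DefaultHasher::new();
    left.hash(&mut h);
    right.hash(&mut h);
    h.finish()
}

/// Fold leaf hashes level by level into a single root.
fn merkle_root(mut level: Vec<u64>) -> Option<u64> {
    if level.is_empty() {
        return None;
    }
    while level.len() > 1 {
        level = level
            .chunks(2)
            .map(|pair| {
                if pair.len() == 2 {
                    hash_pair(pair[0], pair[1])
                } else {
                    pair[0] // odd leaf: promote unchanged (assumed convention)
                }
            })
            .collect();
    }
    Some(level[0])
}
```

An inclusion proof for one trade is then just the sibling hashes along its path to the root: log2(N) hashes instead of the whole tick's trade list.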
Crash Recovery
The engine supports crash recovery through snapshots and log replay:
+----------------------------------------------------------+
| Startup Sequence |
| |
| 1. Check for latest snapshot in RocksDB |
| +- Found: deserialize CoreEngine from snapshot |
| +- Not found: start with empty engine |
| |
| 2. Replay ticks from append log |
| +- If snapshot exists: replay from snapshot_seq + 1 |
| +- If no snapshot: replay from sequence 0 |
| |
| 3. Resume sequencer at last_sequence + 1 |
| |
| Result: identical engine state as before crash |
+----------------------------------------------------------+

Snapshots are full serializations of the `CoreEngine` struct (including all order books, the ledger, and bridge state). They are saved to RocksDB periodically (every 1000 ticks by default). The append log stores every tick as MessagePack-encoded binary, keyed by sequence number in big-endian byte order for lexicographic ordering.
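Why big-endian keys: RocksDB orders keys byte-wise, and big-endian encoding makes that byte-wise order coincide with numeric order, so a range scan from `snapshot_seq + 1` replays ticks in sequence. A one-line sketch:

```rust
/// Append-log key: sequence number encoded big-endian so that
/// lexicographic (byte-wise) ordering matches numeric ordering.
fn log_key(sequence: u64) -> [u8; 8] {
    sequence.to_be_bytes()
}
```

With little-endian encoding this would break: 256 encodes as `[0, 1, 0, ...]`, which sorts before 1's `[1, 0, ...]`.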
Batch Mode Recovery
In batch mode, every tick processed by the engine is sent to the hasher and then to the event processor, which persists it to the append log before any further processing. A crash at any point results in at most one tick of lost work (the tick currently in flight between the engine and the event processor). On recovery, process_tick() replays deterministically and produces identical state.
Continuous Mode Recovery Tradeoffs
In continuous mode, there is a crash window between individual match_order() calls and the next commitment timer fire. During this window:
| What happens | Risk |
|---|---|
| Engine state is mutated in memory by `match_order()` | State changes are lost on crash |
| Trades may have been broadcast to WebSocket subscribers (if the debounce timer fired) | Subscribers may see trades that were never persisted |
| Snapshot (ArcSwap) may reflect post-match state (if the debounce timer fired) | API reads may show state ahead of persistence |
Bounded loss: The crash window is at most one commitment interval (default 1ms). Trades matched in the last interval before a crash are lost. On recovery, the engine replays from the last committed tick, producing a consistent but slightly behind state.
Replay correctness: The committed ticks contain all raw Transaction objects. On recovery, process_tick() re-derives trades, order updates, and bridge instructions deterministically. The replayed state is self-consistent — it just may not include the very last sub-millisecond of activity.
Mitigation (not yet implemented): A per-transaction write-ahead log could persist each Transaction before match_order() is called, closing the crash window entirely. This would add per-order I/O latency, partially negating the benefit of continuous mode.
Thread Model
The engine runs on a dedicated OS thread — not a tokio task. This gives it:
- No preemption from async task scheduling
- Predictable latency without cooperative yield points
- Sole ownership of the `CoreEngine` struct — no mutex needed
A separate hasher thread computes cryptographic commitments (Blake3 state hash + merkle trades root) off the engine hot path. This means the engine can begin processing the next tick immediately after matching, without waiting for hash computation.
Communication Channels
| Channel | Type | Direction | Purpose |
|---|---|---|---|
| `tick_receiver` | `crossbeam::unbounded` | Sequencer → Engine | Incoming ticks (batch mode) |
| `tx_receiver` | `crossbeam::unbounded` | API → Engine | Incoming transactions (continuous mode) |
| `hash_sender` | `crossbeam::unbounded` | Engine → Hasher | `PendingHash` for commitment |
| `result_sender` | `crossbeam::unbounded` | Hasher → Event Processor | `ProcessedTick` output |
| `ArcSwap<EngineSnapshot>` | Lock-free atomic pointer | Engine → API | Snapshot for reads |
| `tokio::broadcast` | Broadcast channel | Engine → WS layer | `TickMarketData` (batched per snapshot interval in continuous mode) |
| `tokio::broadcast` | Broadcast channel | Engine → Persistence | `TickPersistenceData` (batched per snapshot interval in continuous mode) |
crossbeam crossbeam crossbeam
Sequencer -----------------> Engine -----------------> Hasher ----------> Event Processor
(tokio task) (OS thread) (OS thread) (tokio task)
|
+--> ArcSwap (lock-free snapshot)
|
                                 +--> broadcast (market data)

Lock-free reads via ArcSwap: API handlers call `snapshot.load()` which returns an `Arc<EngineSnapshot>`. In batch mode, reads may be one tick stale. In continuous mode, reads may be up to `OLYMPUS_SNAPSHOT_INTERVAL_US` (default 500µs) stale due to debounced snapshot publishing.
CPU Pinning
Both the engine and hasher threads can be pinned to dedicated CPU cores to avoid cache invalidation from OS core migration:
| Env Var | Thread | Purpose |
|---|---|---|
| `OLYMPUS_ENGINE_CORE` | Engine thread | Pin matching to a dedicated core |
| `OLYMPUS_HASHER_CORE` | Hasher thread | Pin hash computation to a dedicated core |
Tick Lifecycle
Batch Mode
- API handler receives an order request and pushes a `Transaction` to the sequencer channel.
- Sequencer buffers transactions in a `Vec<Transaction>`.
- At each tick interval, the sequencer flushes the buffer into a `Tick`:

      Tick {
          sequence: u64,      // monotonically increasing
          timestamp_ns: u64,  // nanoseconds since Unix epoch
          transactions: Vec<Transaction>,
      }

- The tick is sent to the engine thread via crossbeam channel.
- Engine calls `match_tick(&tick)` which iterates all transactions and returns a `MatchResult`. Snapshot and market data are published immediately.
- The `MatchResult` is sent to the hasher thread, which computes `state_hash` and `trades_root` and produces the final `TickResult`.
Continuous Mode
- API handler pushes a `Transaction` directly to the engine channel.
- Engine drains all available orders via `try_recv` in a tight loop — no timer checks or syscalls during the drain.
- Each order is matched immediately via `match_order()`. Results are accumulated into broadcast buffers (stamped trades for market data, order events for persistence) and commitment buffers (un-stamped trades, order updates, bridge instructions).
- After the drain exhausts, the engine checks the snapshot debounce timer (`OLYMPUS_SNAPSHOT_INTERVAL_US`, default 500µs). If dirty, it publishes a new `EngineSnapshot` via `ArcSwap` and flushes batched market data + persistence broadcasts.
- A commitment timer (`OLYMPUS_TICK_INTERVAL_MS`, default 1ms) batches accumulated transactions into a `Tick`, which is sent to the hasher thread for commitment computation.
- If no orders are available, the engine spins briefly (`OLYMPUS_SPIN_ITERS` iterations of `spin_loop()`) before falling back to a blocking `recv_timeout` with the remaining commit interval.
In continuous mode, ticks represent commitment boundaries rather than batch boundaries. The matching happens order-by-order, but snapshot publishing, broadcast sends, and hashing happen in periodic batches.
Transaction Variants
| Variant | Description |
|---|---|
| `PlaceOrder` | Submit a new order (limit, market, or IOC) |
| `CancelOrder` | Cancel an existing resting order |
| `CancelReplace` | Atomic cancel-and-replace (not yet implemented) |
| `LockAsset` | Lock balance for bridge withdrawal (generates mint instruction) |
| `UnlockAsset` | Unlock balance after bridge deposit (burn confirmation) |
| `AdminCreditAccount` | Credit an account with assets (deposits) |
| `AdminAddInstrument` | Register a new trading instrument |
| `AdminHaltInstrument` | Halt trading on an instrument |
| `AdminResumeInstrument` | Resume trading on a halted instrument |
| `AdminPauseBridge` | Pause all bridge operations |
| `AdminResumeBridge` | Resume bridge operations |
Instrument Management
Each instrument is defined by an InstrumentConfig:
InstrumentConfig {
instrument_id: InstrumentId, // e.g. "AAPL-USD"
price_scale: u8, // decimal places for prices (from tick_size)
qty_scale: u8, // decimal places for quantities (from lot_size)
tick_raw: i64, // tick_size in raw units at price_scale
lot_raw: i64, // lot_size in raw units at qty_scale
qty_divisor: i64, // 10^qty_scale, precomputed for price*qty
max_order_size_raw: i64, // max order size in raw units at qty_scale
status: InstrumentStatus, // Active or Halted
base_asset: Option<InstrumentId>, // precomputed base asset (e.g. "AAPL")
quote_asset: Option<InstrumentId>, // precomputed quote asset (e.g. "USD")
}

Validation
- Price validation: `Price::new_raw(value, tick_raw)` ensures the price is a positive exact multiple of `tick_raw`.
- Quantity validation: `Quantity::new(value, lot_size)` ensures the quantity is a positive exact multiple of `lot_size`.
- Halted instruments: orders placed on a halted instrument are rejected.
Asset Derivation
Instrument IDs follow the BASE-QUOTE convention. Base and quote assets are precomputed at instrument registration via InstrumentConfig::with_derived_assets() and stored in the config's base_asset and quote_asset fields. This eliminates per-order string splitting on the hot path.
"AAPL-USD" > base: "AAPL", quote: "USD"
"ETH-USD"  > base: "ETH",  quote: "USD"

This derivation determines which asset is reserved when placing an order:
- Buy order: reserves `price × quantity` of the quote asset
- Sell order: reserves `quantity` of the base asset
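The derivation and reserve-side selection can be sketched as follows (function names here are illustrative; the real precomputation lives in `InstrumentConfig::with_derived_assets()` and runs once at registration, not per order):

```rust
/// BASE-QUOTE derivation from an instrument ID: split on the first '-'.
/// Returns None for IDs that don't follow the convention.
fn derive_assets(instrument_id: &str) -> Option<(String, String)> {
    let (base, quote) = instrument_id.split_once('-')?;
    Some((base.to_string(), quote.to_string()))
}

/// Which asset an order reserves: buyers reserve quote, sellers reserve base.
fn reserved_asset(instrument_id: &str, is_buy: bool) -> Option<String> {
    let (base, quote) = derive_assets(instrument_id)?;
    Some(if is_buy { quote } else { base })
}
```

Precomputing this at registration is what keeps string splitting off the per-order hot path, as noted above.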