
Benchmarks

FMI benchmark suite — what each benchmark measures, how to run it, and how it maps to regulatory performance concerns.

FMI Benchmark Suite

The benchmark suite uses Criterion.rs to measure the metrics a Financial Market Infrastructure (FMI) cares about: matching latency, throughput, order book scaling, crash recovery time, persistence overhead, and snapshot construction cost.

Benchmarks are compiled as separate binaries: Criterion sits under [dev-dependencies] and each bench target sets harness = false. They do not affect the production binary.
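
In Cargo.toml terms this is the standard Criterion setup. A minimal sketch; the crate version and bench name here are illustrative, not the project's actual manifest:

[dev-dependencies]
criterion = "0.5"

[[bench]]
name = "engine"       # one [[bench]] section per bench file
harness = false       # hand control to Criterion instead of libtest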

Running

cargo bench --workspace                              # all benchmarks
cargo bench -p olympus-core                          # single crate
cargo bench -p olympus-core --bench engine           # single bench file
cargo bench --bench engine -- "tick_vs_book_depth"   # single group by filter
cargo bench --workspace --no-run                     # compile only (CI)

HTML reports are generated at target/criterion/report/index.html.

Benchmark Reference

engine/process_tick

Measures the latency of CoreEngine::process_tick() — processing a single tick (batch of transactions) through the full engine pipeline. This includes order validation, balance reservation, order matching against the book, ledger settlement, Blake3 state hash computation, and merkle root construction over trades.

Parameterised over the number of transactions per tick: 1, 10, 50, 100, 500, 1,000.

Shows how tick processing time scales with batch size. Maps directly to the FMI concern of matching latency.

cargo bench -p olympus-core --bench engine -- "engine/process_tick/"

engine/process_tick_with_trades

Same as process_tick but all incoming orders are aggressive (they cross the spread and produce trades). This exercises the full matching loop including fill settlement, partial fill handling, and merkle tree construction over trade hashes.

Parameterised over the number of trades per tick: 1, 10, 50, 100.

Isolates the cost of the matching + settlement path versus resting-only ticks.

cargo bench -p olympus-core --bench engine -- "engine/process_tick_with_trades"

engine/tick_vs_book_depth

Measures tick processing time as a function of how many resting orders already sit on the book. The tick itself is small (10 orders, no trades) — the variable is background book depth.

Parameterised over resting order count: 10, 100, 1,000, 5,000, 10,000.

Answers the FMI question: does matching latency degrade as book depth grows?

cargo bench -p olympus-core --bench engine -- "engine/tick_vs_book_depth"

engine/sustained_throughput

Processes 1,000 consecutive ticks of 10 orders each (10,000 total orders) in a single measurement. Reports throughput in elements per second rather than per-tick latency.

Answers the FMI question: what is the maximum sustained order rate?

cargo bench -p olympus-core --bench engine -- "engine/sustained_throughput"

continuous/match_order

Measures continuous-mode order matching throughput using Criterion. Each order is matched individually (no tick batching). Parameterised over batch sizes: 1, 10, 50, 100, 500, 1,000 orders.

Shows how throughput scales with batch size in continuous mode — reaches 3.9M orders/sec sustained at 1,000 orders.

cargo bench -p olympus-core --bench engine_continuous -- "continuous/match_order/"

continuous/match_order_with_trades

Same as above but all orders produce trades. Parameterised over trade counts: 1, 10, 50, 100.

Isolates the overhead of fill settlement in continuous mode — reaches 2.2M orders/sec at 100 trades.

cargo bench -p olympus-core --bench engine_continuous -- "continuous/match_order_with_trades"

continuous/order_vs_book_depth

Measures continuous-mode matching latency as a function of resting book depth. Parameterised over resting orders: 10, 100, 1,000, 5,000, 10,000.

Answers the FMI question: does continuous matching degrade as book depth grows?

cargo bench -p olympus-core --bench engine_continuous -- "continuous/order_vs_book_depth"

continuous/sustained_throughput

Processes 10,000 consecutive orders in continuous mode. Reports sustained throughput in elements per second.

Currently measures 3.9M orders/sec sustained.

cargo bench -p olympus-core --bench engine_continuous -- "continuous/sustained_throughput"

orderbook/add_order

Measures the time to insert a single order into a pre-filled order book. The book is populated with varying numbers of distinct price levels to show how insertion scales with book size.

Parameterised over existing price levels: 10, 100, 500, 1,000, 5,000.

cargo bench -p olympus-core --bench orderbook -- "orderbook/add_order"

orderbook/cancel_order

Measures the time to cancel a single resting order from the middle of the book. Uses the OrderId index for O(1) lookup, then removes from the price level's FIFO queue.

Parameterised over existing price levels: 10, 100, 1,000, 5,000.

cargo bench -p olympus-core --bench orderbook -- "orderbook/cancel_order"

orderbook/pop_best

Measures the time to pop the front-of-queue order from the best bid or best ask level. This is the hot path during order matching — every fill calls pop_best_ask or pop_best_bid.

Parameterised over price levels: 10, 100, 1,000. Measured separately for bid and ask sides.

cargo bench -p olympus-core --bench orderbook -- "orderbook/pop_best"

orderbook/bid_depth and orderbook/ask_depth

Measures the cost of computing aggregated depth (price, total quantity) across all levels. This is called during snapshot construction and by API depth queries.

Parameterised over price levels: 10, 100, 500, 1,000.

cargo bench -p olympus-core --bench orderbook -- "orderbook/bid_depth"
cargo bench -p olympus-core --bench orderbook -- "orderbook/ask_depth"

matching/place_order_no_match

Measures MatchingEngine::place_order() for a limit order that does not cross the spread — it validates, reserves balance, scans the book, finds no match, and rests.

Parameterised over resting order count: 0, 100, 1,000, 5,000.

Shows the baseline cost of order placement without matching.

cargo bench -p olympus-core --bench matching -- "matching/place_order_no_match"

matching/place_order_full_match

Measures place_order() for a limit order that immediately matches against the best resting order. Includes balance settlement and trade creation.

Parameterised over resting order count: 10, 100, 1,000.

Shows the cost of a single fill through the matching engine.

cargo bench -p olympus-core --bench matching -- "matching/place_order_full_match"

matching/cancel_order

Measures MatchingEngine::cancel_order() which uses the global order→instrument index for O(1) lookup of the target book by OrderId, then verifies account ownership, removes the order from the book, and releases reserved balance.

Parameterised over instrument count: 1, 5, 10.

Thanks to the global index, cancel latency stays constant regardless of the number of active instruments.

cargo bench -p olympus-core --bench matching -- "matching/cancel_order"

merkle/from_leaves

Measures construction of a complete binary Blake3 merkle tree from a set of leaf hashes. This runs after every tick to compute the trades_root commitment.

Parameterised over leaf count: 1, 10, 50, 100, 500, 1,000.

Maps to the FMI concern of merkle root cost — how much the cryptographic commitment adds to tick latency.

cargo bench -p olympus-core --bench merkle -- "merkle/from_leaves"

merkle/root

Measures the cost of reading the root hash from an already-constructed tree. Expected to be near-zero (single array lookup).

Parameterised over tree size: 10, 100, 1,000 leaves.

cargo bench -p olympus-core --bench merkle -- "merkle/root"

merkle/hash_trade

Measures the cost of hashing a single Trade struct into a 32-byte leaf using Blake3. This is called once per trade per tick before tree construction.

cargo bench -p olympus-core --bench merkle -- "merkle/hash_trade"

merkle/proof

Measures generation of a merkle inclusion proof for a leaf at a given index. Proof generation walks from the leaf to the root collecting sibling hashes.

Parameterised over tree size: 10, 100, 1,000 leaves.

cargo bench -p olympus-core --bench merkle -- "merkle/proof"

snapshot/from_engine

Measures EngineSnapshot::from_engine() — constructing a read-only snapshot of the full engine state. This extracts all instrument configs, up to 1,000 levels of order book depth per instrument, and all account balances.

Parameterised over (instruments, resting orders): (1, 100), (1, 1K), (1, 5K), (5, 1K), (10, 500).

Maps to the FMI concern of snapshot cost — how long the engine thread blocks to publish state for API readers via ArcSwap.

cargo bench -p olympus-core --bench snapshot -- "snapshot/from_engine"

snapshot/engine_serialize

Measures serializing the entire CoreEngine to a binary blob (MessagePack). This is the persistence path — the engine is serialized to produce a checkpoint that can be written to RocksDB.

Parameterised over resting order count: 100, 1,000, 5,000.

cargo bench -p olympus-core --bench snapshot -- "snapshot/engine_serialize"

snapshot/engine_deserialize

Measures deserializing a CoreEngine from a binary blob. This is the crash recovery hot path — on startup, the engine is restored from the latest persisted snapshot.

Parameterised over resting order count: 100, 1,000, 5,000.

cargo bench -p olympus-core --bench snapshot -- "snapshot/engine_deserialize"

store/append_write

Measures the latency of appending a single tick to the RocksDB append-only log. Each tick is serialized to MessagePack and written to the sequencer_log column family.

Parameterised over transactions per tick: 1, 10, 50, 100.

Maps to the FMI concern of persistence latency — the overhead of durable storage on the hot path.

cargo bench -p olympus-store --bench persistence -- "store/append_write"

store/replay

Measures replaying all ticks from the append log starting at sequence 0. Each entry is deserialized from MessagePack back into a Tick struct.

Parameterised over stored tick count: 100, 1,000, 5,000.

Maps to the FMI concern of replay speed — how fast the system can rebuild state from the transaction log after a crash.

cargo bench -p olympus-store --bench persistence -- "store/replay"

store/snapshot (save and load)

Measures saving and loading raw byte blobs to the RocksDB snapshot column family. Tested with synthetic payloads of varying size.

Parameterised over payload size: 10 KB, 100 KB, 1 MB.

cargo bench -p olympus-store --bench persistence -- "store/snapshot"

sequencer/submit

Measures the throughput of Sequencer::submit() — pushing a single transaction into the pending buffer. This is a Vec::push internally, so it benchmarks the allocation and serialization overhead.

cargo bench -p olympus-sequencer --bench sequencer -- "sequencer/submit"

sequencer/flush

Measures Sequencer::flush() — draining the pending buffer into a Tick and sending it over the crossbeam channel. The variable is how many transactions are batched.

Parameterised over batch size: 1, 10, 50, 100, 500, 1,000.

Maps to the FMI concern of sequencer throughput.

cargo bench -p olympus-sequencer --bench sequencer -- "sequencer/flush"

sequencer/submit_flush_cycle

Measures a full cycle: 100 submits followed by a single flush. This is the realistic usage pattern — transactions accumulate between tick intervals, then flush as a batch.

cargo bench -p olympus-sequencer --bench sequencer -- "sequencer/submit_flush_cycle"

integration/crash_recovery

Measures the full crash recovery path: deserialize an engine snapshot from binary, then replay N ticks on top of it. This is the end-to-end cold start time.

Parameterised over replay tick count: 0, 100, 500, 1,000.

Maps directly to the FMI concern of crash recovery time.

cargo bench --bench integration -- "integration/crash_recovery"

integration/full_pipeline

Measures a complete request lifecycle: submit 10 transactions to the sequencer, flush to produce a tick, process the tick through the engine, construct a snapshot, and serialize the engine for persistence.

This is the most realistic single-tick benchmark — it exercises every component in sequence.

cargo bench --bench integration -- "integration/full_pipeline"

integration/multi_instrument

Measures processing a single tick containing 20 orders spread across 5 different instruments. Shows how the engine scales when multiple order books are active simultaneously.

cargo bench --bench integration -- "integration/multi_instrument"

latency (HDR histogram)

Reports p50/p95/p99/p99.9 latency percentiles for ten scenarios: process_tick, match_tick, match_order (resting), match_order (crossing with trade), multi-fill sweeps (1/5/10/25/50 fills), and compute_commitments. Uses HdrHistogram instead of Criterion for true latency distributions.

See the Latency page for detailed results and methodology.

cargo bench -p olympus-core --bench latency
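
Unlike Criterion's sampling, HdrHistogram records every observation, so tail percentiles are read directly from the distribution. A sketch with the hdrhistogram crate; run_scenario and the iteration count are illustrative:

use hdrhistogram::Histogram;
use std::time::Instant;

fn measure(mut run_scenario: impl FnMut()) {
    // 3 significant figures, auto-resizing value range
    let mut hist = Histogram::<u64>::new(3).expect("valid sigfigs");
    for _ in 0..100_000 {
        let start = Instant::now();
        run_scenario();
        hist.record(start.elapsed().as_nanos() as u64).expect("recordable value");
    }
    println!("p50   {} ns", hist.value_at_quantile(0.50));
    println!("p99   {} ns", hist.value_at_quantile(0.99));
    println!("p99.9 {} ns", hist.value_at_quantile(0.999));
}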

FMI Metrics Mapping

FMI Concern                            | Benchmark
Matching latency                       | engine/process_tick, matching/place_order_*, latency, continuous/match_order
Latency percentiles (p99, p99.9)       | latency
Max orders/sec                         | engine/sustained_throughput, continuous/sustained_throughput
Book depth impact                      | engine/tick_vs_book_depth, continuous/order_vs_book_depth
Replay speed                           | store/replay, integration/crash_recovery
Snapshot cost (engine thread blocking) | snapshot/from_engine
Merkle root cost                       | merkle/from_leaves
Persistence latency                    | store/append_write
Sequencer throughput                   | sequencer/flush, sequencer/submit_flush_cycle
Crash recovery time                    | integration/crash_recovery
Full pipeline latency                  | integration/full_pipeline
