80 commits this week Apr 04, 2026 - Apr 11, 2026
net-rs: contiguity walk falls back to block_cache
select_chain_once's contiguity guard called chain_tree.ancestors(last_hash)
to verify that a peer's replay chain reaches the picked common ancestor.
The walk terminates at the first block whose parent is not in chain_tree.

on_block_received inserts every fetched block into both chain_tree and
block_cache — but the chain_tree insert is skipped when the header has no
parsed info AND chain_tree doesn't already know the block (so block_no=0).
That leaves the block in block_cache without a chain_tree entry, and the
next contiguity walk terminates early, firing `fork mismatch (replay
doesn't reach ancestor)` → OrphanCandidate. With the cooldown cap this no
longer infects other peers, but individual nodes under sustained fork
load can slowly get stuck on this false mismatch.

Fix: add a hybrid walker that follows prev_hash links using chain_tree
first and block_cache as a fallback. The walk terminates at a genuine
gap (neither store has the parent) or at a genesis child (prev_hash=None)
— both distinguished via a new HybridWalk.reached_origin flag so the
genesis-reached check in select_chain_once still works.

The walk is a new private method on PraosConsensus in selection.rs:
walk_ancestors_hybrid(start_hash) -> HybridWalk.
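
A minimal sketch of what such a walker can look like, with chain_tree and
block_cache reduced to plain HashMaps and hashes to u64 (Block, Hash, and the
map-based stores here are illustrative stand-ins, not the real types):

```rust
use std::collections::HashMap;

type Hash = u64;

// Illustrative stand-in for a stored block: just the parent link.
// prev_hash = None marks a genesis child.
struct Block {
    prev_hash: Option<Hash>,
}

// Result of the walk: ancestors visited, plus whether the walk ended
// at a genesis child (true) or at a genuine gap (false).
struct HybridWalk {
    visited: Vec<Hash>,
    reached_origin: bool,
}

// Follow prev_hash links, consulting chain_tree first and falling
// back to block_cache when the tree has no entry for a hash.
fn walk_ancestors_hybrid(
    start_hash: Hash,
    chain_tree: &HashMap<Hash, Block>,
    block_cache: &HashMap<Hash, Block>,
) -> HybridWalk {
    let mut visited = Vec::new();
    let mut cursor = start_hash;
    loop {
        let block = match chain_tree.get(&cursor).or_else(|| block_cache.get(&cursor)) {
            Some(b) => b,
            // Neither store has the block: genuine gap.
            None => return HybridWalk { visited, reached_origin: false },
        };
        visited.push(cursor);
        match block.prev_hash {
            Some(parent) => cursor = parent,
            // Genesis child reached: the walk completed.
            None => return HybridWalk { visited, reached_origin: true },
        }
    }
}
```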

4 new unit tests exercise:
- chain_tree-only case (back-compat with pre-fix behaviour)
- block_cache fallback (tree has tip + anchor, middle only in cache)
- gap termination (parent in neither store → reached_origin=false)
- start_only_in_cache (start block only in block_cache)

Cluster verification at p=0.2: 24/25 nodes stayed healthy for ~55 min
(vs previous build which had 4 stuck by T+60min). The one stuck node
(node-4) hit a separate mux-level ingress-overflow bug during catch-up
fetches, not the contiguity walk — tracked separately.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
net-rs: orphan_cooldown to cap re-intersection cascade rate
evaluate_and_fetch had no persistent skip set — every new event (TipAdvanced,
RolledBack, BlockFetchFailed) re-ran select_chain_once with an empty local
`tried` HashSet, so a peer that failed the contiguity guard would be
re-classified as OrphanCandidate hundreds of times per second, each iteration
clearing its entries and sending another NetworkCommand::ReIntersect. Two
peers on a single node could generate 500k+ orphan log lines per 30 min
and saturate CPU on receiving peers, propagating stuckness.

A naive pending-set guard (clear on IntersectionFound) didn't help on
localhost because re-intersection round-trips complete in 1-3 ms, faster
than TipAdvanced events arrive. The peer ping-pongs between orphan and
re-intersected in a tight loop.

Fix: time-based cooldown instead.

- PraosConsensus.orphan_cooldown: HashMap<PeerId, Instant> holds the
  earliest time each peer can be reconsidered in chain selection.
- ORPHAN_COOLDOWN = 1s — caps orphan/ReIntersect emissions at ≤1/sec/peer.
- evaluate_and_fetch builds `skip` from unexpired cooldown entries (and
  prunes expired ones), inserts new orphans with `now + ORPHAN_COOLDOWN`,
  and gates the log + ReIntersect send on the transition via an
  `already_cooling` check so the rate-limited state is visible exactly
  once per entry.
- IntersectionFound does NOT clear the cooldown — the entry must expire
  naturally so the peer's ChainSync stream has time to rebuild contiguous
  entries under the new anchor before we re-evaluate.
- PeerDisconnected clears the cooldown entry (prevents leaks).
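
A minimal sketch of the cooldown bookkeeping described above, with a
stripped-down struct in place of PraosConsensus (only the names
orphan_cooldown and ORPHAN_COOLDOWN come from the change; everything else is
illustrative):

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

const ORPHAN_COOLDOWN: Duration = Duration::from_secs(1);

type PeerId = u32;

#[derive(Default)]
struct Cooldowns {
    // Earliest time each peer may be reconsidered in chain selection.
    orphan_cooldown: HashMap<PeerId, Instant>,
}

impl Cooldowns {
    // Build the skip set from unexpired entries, pruning expired ones.
    fn skip_set(&mut self, now: Instant) -> Vec<PeerId> {
        self.orphan_cooldown.retain(|_, until| *until > now);
        self.orphan_cooldown.keys().copied().collect()
    }

    // Returns true exactly on the not-cooling -> cooling transition,
    // i.e. when the caller should log and send ReIntersect.
    fn mark_orphan(&mut self, peer: PeerId, now: Instant) -> bool {
        let already_cooling = self
            .orphan_cooldown
            .get(&peer)
            .map_or(false, |until| *until > now);
        self.orphan_cooldown.insert(peer, now + ORPHAN_COOLDOWN);
        !already_cooling
    }

    // PeerDisconnected clears the entry to prevent leaks.
    // IntersectionFound deliberately does NOT call this.
    fn peer_disconnected(&mut self, peer: PeerId) {
        self.orphan_cooldown.remove(&peer);
    }
}
```

Re-marking a peer while it is cooling refreshes the deadline but suppresses
the log and the ReIntersect send, which is what caps emissions at one per
second per peer.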

5 new unit tests:
- orphan_first_time_sends_reintersect_and_marks_cooldown
- orphan_while_cooling_does_not_resend_reintersect
- many_tip_advances_while_cooling_do_not_cascade (1000 events → 0 extra)
- intersection_found_does_not_clear_cooldown
- peer_disconnected_clears_cooldown

Cluster verification at p=0.2: fresh run held 21-24/25 nodes healthy for
~2 hours. Orphan-cascade total across all 25 nodes ~5k in that window
(vs 500k+ per-node on the pre-fix build). Stuck count stable at 4/25
in a bounded partition — the remaining stuck nodes fall to a separate
chain_tree contiguity-walk bug (tracked as a follow-up), not the cascade.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
net-rs: non-blocking coordinator sends + RAII per-IP counter
Coordinator main loop used .send().await on both network_events (coord→app)
and peer.commands (coord→peer task), while the app's consensus loop sent
back via network_commands — creating a circular blocking dependency that
deadlocked under fork pressure. The per-IP counter was only decremented
from remove_peer via PeerEvent::Failed, so deadlock also leaked inbound
slots until nodes refused all new connections (the "ghost" pattern).

Two fixes:

- RAII IpCountGuard stored in PeerState.ip_guard. Drop decrements the
  slot on any PeerState removal (remove_peer, shutdown drain, accept-path
  failure), regardless of event-loop liveness.

- Gate the peer_events branch of the main select! on
  network_events.capacity() >= MIN_EMIT_HEADROOM (256). When the app is
  slow, the gate closes and peer tasks block on peer_event_sender.send,
  propagating backpressure through the mux demuxer to TCP. Other
  branches stay active so the coord still drains network_commands and
  runs timers. All network_events sends become synchronous try_send via
  emit_event; all peer.commands sends become send_peer_command, which
  schedules peer removal on Full via a pending_removals queue drained
  after each select! iteration.
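
The RAII guard could be sketched as below, with an Arc<Mutex<HashMap>>
standing in for however the coordinator actually shares the per-IP map (the
acquire helper, the limit handling, and the zero-entry cleanup are
illustrative assumptions):

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::sync::{Arc, Mutex};

// Shared per-IP connection counts (illustrative sharing mechanism).
type IpCounts = Arc<Mutex<HashMap<IpAddr, usize>>>;

// RAII guard: the slot is taken at construction and released in Drop,
// so ANY PeerState removal path (remove_peer, shutdown drain,
// accept-path failure) frees it, regardless of event-loop liveness.
struct IpCountGuard {
    counts: IpCounts,
    ip: IpAddr,
}

impl IpCountGuard {
    fn acquire(counts: &IpCounts, ip: IpAddr, limit: usize) -> Option<Self> {
        let mut map = counts.lock().unwrap();
        let slot = map.entry(ip).or_insert(0);
        if *slot >= limit {
            return None; // per-IP limit reached
        }
        *slot += 1;
        Some(IpCountGuard { counts: counts.clone(), ip })
    }
}

impl Drop for IpCountGuard {
    fn drop(&mut self) {
        let mut map = self.counts.lock().unwrap();
        if let Some(slot) = map.get_mut(&self.ip) {
            *slot -= 1;
            if *slot == 0 {
                map.remove(&self.ip); // drop empty entries entirely
            }
        }
    }
}
```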

Channel capacities raised as absorbers so the gate is the backpressure
mechanism rather than the channel ceiling: network_events 256→8192,
peer.commands 16→256, peer_events 256→2048, network_commands 64→1024.

New tests: ip_count_guard_decrements_on_drop,
coordinator_still_processes_commands_when_app_is_slow,
coordinator_removes_peer_when_its_command_channel_fills. The existing
accepted_peer_does_not_schedule_reconnection test now also asserts the
ip_counts entry is released.

Cluster at p=0.2 held 23/25 healthy across the 30-minute mark where the
old coordinator saw the first per-IP limit warnings. Zero per-IP
warnings and zero SDU timeouts observed so far; the remaining 2 stuck
nodes are suffering from the separate OrphanCandidate re-intersection
bug (tracked separately) rather than the coordinator cascade.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
net-rs: decouple chain selection from fetch decisions
Block arrival now drives chain selection directly: on_block_received
calls try_switch_to(this_block), which walks chain_tree backward to
find the common ancestor with the adopted chain and switches if all
intermediate blocks are cached. No peer chain consultation needed.

Fetch decisions are separate: evaluate_and_fetch examines peer chains
to determine what blocks to request, handles OrphanCandidate
re-intersection, and issues FetchBlockRange commands.

This separation means a node that receives blocks from any source
can immediately apply them without depending on peer chain state
that may be stale or fragmented after rollbacks.

Cluster tested: 25 nodes at p=0.2 for 20 minutes (350+ blocks),
zero stuck nodes. Previously 7-13 nodes would get permanently stuck.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
net-rs: split praos.rs into module directory
Mechanical refactor — no behavioral changes. Split the 2900-line
praos.rs into separate files by concern:

  praos/mod.rs        — PraosConsensus struct, public API, integration tests
  praos/peer_chain.rs — PeerChain tracking (entries, anchors, rollbacks)
  praos/selection.rs  — chain selection (select_chain_once, try_stored_switch)
  praos/fetching.rs   — fetch decisions (issue_fetch, in_flight, retry)
  praos/validation.rs — validator pipeline (apply, rollback, block received)

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
net-rs: periodic gap fetch to bridge disconnected chain_tree segments
When try_stored_switch finds the walk from best_tip doesn't reach
adopted_tip (gap in chain_tree), report the gap point so the periodic
retry can issue a targeted fetch for the missing blocks. Only fetches
forward ranges (gap slot > adopted slot) and only from the periodic
retry timer to avoid fetch storms on every block-received event.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
net-rs: chain selection on stored blocks via chain_tree walk
Add try_stored_switch(): walk back from chain_tree's best tip to
the adopted tip and switch to the longest contiguous prefix of
cached blocks. Finds switchable chains from ANY source regardless
of which peer announced them — blocks from multiple peers that
form a contiguous path all get applied.

Called at the top of select_chain before peer-chain evaluation.
Replaces the per-peer partial switch that could only see one
peer's replay at a time.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
net-rs: route block fetches to announcing peer, serve partial ranges
Two fetch pipeline fixes for lagging nodes:

1. FetchBlockRange carries optional peer_id hint so the coordinator
   routes directly to the peer that announced the chain, bypassing
   fragment-based lookup (fragments lose points after rollbacks).

2. ChainStore::get_range returns available blocks when the requested
   `to` point isn't in the store (peer rolled back past that tip).
   Previously returned empty/NoBlocks, starving lagging nodes.
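
Point 2 amounts to dropping the "requested tip must exist" precondition. A
sketch with ChainStore reduced to a BTreeMap keyed by slot (illustrative
types, not the real store):

```rust
use std::collections::BTreeMap;

type Slot = u64;

// Simplified stand-in for ChainStore: block bodies keyed by slot.
struct ChainStore {
    blocks: BTreeMap<Slot, Vec<u8>>,
}

impl ChainStore {
    // Return whatever blocks exist in [from, to]. There is no up-front
    // check that `to` itself is present: a `to` point the peer has
    // rolled back past simply yields the prefix of blocks still
    // available, instead of an empty result.
    fn get_range(&self, from: Slot, to: Slot) -> Vec<&Vec<u8>> {
        self.blocks.range(from..=to).map(|(_, body)| body).collect()
    }
}
```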

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
net-rs: partial chain switch when full replay is incomplete
When select_chain gets WaitingForBlocks, apply the contiguous prefix
of already-cached blocks immediately instead of waiting for the full
replay chain to the peer's tip. Each subsequent block arrival triggers
another partial switch, allowing lagging nodes to make incremental
progress rather than stalling until the complete chain is available.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
net-rs: fix fork-mismatch sticking in chain selection
When the contiguity guard detects that all replay blocks have bodies
but chain_tree.ancestors() doesn't reach the common ancestor, this
means the replay chain goes through a different fork than the ancestor
(stale PeerChain entries from an abandoned fork mixed with new ones).

Previously this returned WaitingForBlocks and issued a range fetch
that could never succeed (no blocks connect two different forks),
causing nodes to loop forever on the same failing fetch.

Now returns OrphanCandidate instead, and the OrphanCandidate handler
always clears PeerChain entries and requests re-intersection. This
forces ChainSync to rebuild from a fresh intersection point,
resolving the stale-entry contamination.

Cluster test (p=0.2, 25 nodes, 20 min): 24/25 nodes at tip with
289 blocks and 5104 fork switches, zero gap-to-ancestor events.
Previous behavior: 6-10 permanently stuck nodes.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Split tx backlog into separate local and peer queues
The tx_backlog_max_size limit was incorrectly gating locally-generated
transactions based on total backlog depth (including peer-received txs).
Split the single backlog queue into two separate VecDeques so the limit
only applies to locally-generated transactions. Track and report max
backlog watermarks independently for each queue.
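
A sketch of the split, with transactions reduced to u64 ids and only the
queue and watermark bookkeeping shown (field and method names here are
illustrative):

```rust
use std::collections::VecDeque;

// Two independent queues so the local cap never gates on peer depth.
struct TxBacklog {
    local: VecDeque<u64>,
    peer: VecDeque<u64>,
    local_max_size: usize,
    // High-water marks tracked and reported per queue.
    local_watermark: usize,
    peer_watermark: usize,
}

impl TxBacklog {
    fn new(local_max_size: usize) -> Self {
        TxBacklog {
            local: VecDeque::new(),
            peer: VecDeque::new(),
            local_max_size,
            local_watermark: 0,
            peer_watermark: 0,
        }
    }

    // The size limit applies only to locally-generated transactions.
    fn push_local(&mut self, tx: u64) -> bool {
        if self.local.len() >= self.local_max_size {
            return false; // dropped: local backlog full
        }
        self.local.push_back(tx);
        self.local_watermark = self.local_watermark.max(self.local.len());
        true
    }

    // Peer-received transactions never consume local capacity.
    fn push_peer(&mut self, tx: u64) {
        self.peer.push_back(tx);
        self.peer_watermark = self.peer_watermark.max(self.peer.len());
    }
}
```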

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Add min-latency-clusters shard strategy with configurable balance
Agglomerative clustering (Kruskal-style): sorts edges by latency and
merges lowest-latency pairs first, maximizing the CMB lookahead.
Balance controlled by shard-max-size-pct config (default 200%).
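
The merge loop can be sketched with a size-capped union-find; the edge
weights, the cap, and all names besides the strategy itself are illustrative:

```rust
// Kruskal-style agglomerative clustering: sort edges by latency,
// merge the lowest-latency pairs first, refuse merges that would push
// a cluster past the size cap (derived from shard-max-size-pct).
struct UnionFind {
    parent: Vec<usize>,
    size: Vec<usize>,
}

impl UnionFind {
    fn new(n: usize) -> Self {
        UnionFind { parent: (0..n).collect(), size: vec![1; n] }
    }

    fn find(&mut self, x: usize) -> usize {
        let p = self.parent[x];
        if p == x {
            return x;
        }
        let root = self.find(p);
        self.parent[x] = root; // path compression
        root
    }

    // Merge only when the combined cluster stays within max_size.
    fn try_union(&mut self, a: usize, b: usize, max_size: usize) -> bool {
        let (ra, rb) = (self.find(a), self.find(b));
        if ra == rb || self.size[ra] + self.size[rb] > max_size {
            return false;
        }
        self.size[ra] += self.size[rb];
        self.parent[rb] = ra;
        true
    }
}

// edges: (latency_ms, node_a, node_b); returns each node's cluster root.
fn min_latency_clusters(
    n: usize,
    mut edges: Vec<(u32, usize, usize)>,
    max_size: usize,
) -> Vec<usize> {
    edges.sort(); // lowest latency first
    let mut uf = UnionFind::new(n);
    for (_latency, a, b) in edges {
        uf.try_union(a, b, max_size);
    }
    (0..n).map(|i| uf.find(i)).collect()
}
```

Merging cheapest edges first is what pushes the minimum cross-shard latency
up: by the time only expensive edges remain, they are forced to be the
cross-cluster ones.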

Achieves 7-10ms min cross-shard latency vs 1ms for other strategies,
but worse cluster shapes increase cross-shard traffic. For uniformly
connected topologies, zero-latency-clusters (balanced, 1ms lookahead)
still wins on wall clock. Min-latency-clusters would shine on
topologies with natural geographic clusters.

Also adds CMB lookahead to shard startup log for diagnostics.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Add geographic shard strategy using k-means on node coordinates
- Add Geographic variant to ShardStrategy, configurable via shard-strategy
- K-means++ clustering on node coordinates, keeping 0-latency components
  together, falls back to zero-latency-clusters if coordinates missing
- Add location field to NodeConfiguration (from topology coordinates)
- Extract union-find helpers to shared module
- Refactor zero_latency_clusters to expose reusable component-building
  and balanced-assignment functions

Benchmarks show geographic helps when the topology has clear regional
clusters with high inter-region latency. For uniformly connected topologies,
the balance penalty outweighs the marginal lookahead improvement.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Add min-cut shard strategy and cross-shard edge diagnostics
Min-cut uses recursive bisection with Kernighan-Lin refinement to
minimize cross-shard edge count. Also logs cross-shard edge count
and CMB lookahead at startup for diagnostics.

Shard strategy benchmark summary (6 shards, 100 slots, realistic.yaml):
| Strategy              | Wall   | User  | Cross-shard | Lookahead | Sizes              |
|-----------------------|--------|-------|-------------|-----------|--------------------|
| (none, 1 shard)       | 9m19s  | 36m   | —           | —         | [3000]             |
| zero-latency-clusters | 3m48s  | 39m   | 82.0%       | 1ms       | [500x6]            |
| min-latency-clusters  | 4m51s  | 43m   | 55.9%       | 8ms       | [600,600,600,600,300,300] |
| geographic            | 4m18s  | 43m   | 61.9%       | 1ms       | [796,743,703,758]  |
| min-cut               | 4m45s  | 43m   | 80.5%       | 1ms       | [750,750,375x4]    |

For uniformly connected topologies, balance dominates: zero-latency-clusters
wins despite 82% cross-shard edges. Strategies that optimize for fewer
cross-shard edges or higher lookahead create imbalance that negates their
gains; they would benefit from topologies with natural clusters.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Refactor: extract actor engine to actor.rs, clean up sim.rs
- Extract actor-model simulation (NodeListWrapper, ActorSimulation,
  init_nodes, run logic) into sim/actor.rs
- sim.rs is now a thin dispatch layer: Simulation newtype wrapping
  SimulationInner enum (Actor vs Sequential)
- Unify sequential single-shard and multi-shard builders into a
  single build_typed() function
- Group cross-shard state into CrossShardState sub-struct
- build() takes event_sender directly, creates its own infrastructure
- Remove dead code (per_shard_node_configs, init_node_impls)
- Net -407 lines across sim.rs + sequential.rs

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Spawn each shard as independent tokio task using CMB conservative PDES
Each shard (ClockCoordinator + NetworkCoordinator + TxProducer) now runs as
its own tokio::spawn'd task, enabling true parallel execution across cores.

Key changes:
- Replace select_all with tokio::spawn per shard (Simulation::run takes self)
- Cross-shard messages route directly NC-to-NC via delivery channels
- Target NC handles timing locally via its own Connection (no broker)
- CMB ceiling: min(peer.time + min_latency) with null message advancement
- Notified::enable() prevents missed notifications across concurrent tasks
- TX generation rate scaled by shard_count for consistent output
- Delete broker.rs (replaced by direct NC-to-NC routing)
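
The CMB ceiling bullet above reduces to a one-liner; this sketch assumes
integer timestamps and parallel slices of last-reported peer times and
per-link minimum latencies:

```rust
// Conservative PDES ceiling under CMB: a shard may safely advance its
// clock to min over peers of (peer's last reported time + minimum link
// latency), since no cross-shard message can arrive earlier than that.
// Null messages keep peer times advancing when a peer has no payload.
fn cmb_ceiling(peer_times: &[u64], min_latencies: &[u64]) -> u64 {
    peer_times
        .iter()
        .zip(min_latencies)
        .map(|(t, lat)| t + lat)
        .min()
        .unwrap_or(u64::MAX) // no peers: unconstrained
}
```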

4-shard runs ~2x faster than 1-shard with matching simulation results.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Replace sequential-engine bool with engine enum, add turbo.yaml preset
Rename `sequential-engine: true/false` to `engine: sequential | actor` with
actor as the default to avoid surprises. Add parameters/turbo.yaml convenience
preset (sequential engine, 6 shards, zero-latency-clusters) for ~5x speedup.
Update CLAUDE.md and README.md with engine and shard documentation.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Make parallel-threshold configurable, default 10, add to schema
Extract the hardcoded PARALLEL_THRESHOLD constant into a configurable
parallel-threshold parameter (default 10, was 32). Add engine and
parallel-threshold to config.schema.json. Disable rayon in the
determinism test since event channel ordering is non-deterministic
under parallel execution.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Refactor: extract shared CPU task and tx generation logic to common.rs/tx.rs
Deduplicate code between actor (driver.rs) and sequential engines:
- Extract NodeEvent, CpuTaskWrapper, schedule_cpu_task, complete_cpu_subtask
  into new common.rs shared module
- Extract TxGeneratorCore from both TransactionProducer (actor) and
  TxGenerator (sequential) into tx.rs; delete TxGenerator entirely
- TransactionProducer now wraps TxGeneratorCore as a thin async actor
- WeightedLookup made pub(super) for reuse

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Add configurable peer tx backlog cap, rename tx_backlog to tx_generated_backlog
The peer backlog queue was unbounded and could cause memory explosion.
Add leios-tx-peer-backlog-max-size config (null = unbounded) to cap it
independently. Rename the existing backlog config from
leios-tx-backlog-max-size to leios-tx-generated-backlog-max-size for
clarity. Peer txs dropped due to a full backlog are tracked separately
as PeerBacklogFull.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Add sequential DES engine with rayon BSP parallelism
Replace the actor-based simulation with a synchronous event loop for
single-shard runs. The sequential engine eliminates tokio coordination
overhead (channels, oneshot allocs, task scheduling) by processing
events directly from a global priority queue.

Events at the same timestamp are batched and processed in parallel
across nodes using rayon, following a Bulk Synchronous Parallel model:
pop batch → resolve deliveries → parallel node compute → apply results.
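
That loop can be sketched as below, with std::thread::scope standing in for
rayon to keep the sketch dependency-free and per-node compute reduced to an
identity step (events are just (timestamp, node_id) pairs here):

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;
use std::thread;

// BSP-style inner loop: pop every event sharing the earliest
// timestamp, process the batch in parallel when it is large enough,
// then apply results before popping the next batch.
// Reverse turns the max-heap into a min-heap on (timestamp, node_id).
fn run_bsp(
    mut queue: BinaryHeap<Reverse<(u64, usize)>>,
    threshold: usize,
) -> Vec<(u64, usize)> {
    let mut processed = Vec::new();
    while let Some(&Reverse((ts, _))) = queue.peek() {
        // Pop the whole same-timestamp batch.
        let mut batch = Vec::new();
        while let Some(&Reverse((t, node))) = queue.peek() {
            if t != ts {
                break;
            }
            queue.pop();
            batch.push((t, node));
        }
        if batch.len() >= threshold {
            // Parallel node-compute step (rayon in the real engine).
            let results: Vec<(u64, usize)> = thread::scope(|s| {
                let handles: Vec<_> = batch
                    .iter()
                    .map(|&ev| s.spawn(move || ev)) // per-node work goes here
                    .collect();
                handles.into_iter().map(|h| h.join().unwrap()).collect()
            });
            processed.extend(results); // apply results sequentially
        } else {
            processed.extend(batch); // small batch: stay sequential
        }
    }
    processed
}
```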

Enabled by default for single-shard (sequential-engine: true in config).
Falls back to sequential processing for small batches (<32 events).
The actor engine remains available via sequential-engine: false.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>