WIP: Update the cost estimate for the CIP-164 protocol
[Commit activity chart: hourly commit counts for Apr 16 - Apr 23, 2026; 32 commits this week]
Publish preview of iframe version
filip(feat): add iframe design implementation with latest main changes
Merge pull request #859 from input-output-hk/dnadales/cumulative-tx-bytes-metric
Add confirmed tx throughput to proto-devnet dashboard
Bump flake.lock to latest leios-prototype
Picks up the merged cumulative-tx-size work in ouroboros-consensus and cardano-node now that both branches incorporate it on leios-prototype.
Switch to jemalloc for better multi-threaded allocation
jemalloc handles concurrent allocation from rayon worker threads better than glibc's ptmalloc2 (per-thread caches, less lock contention) and returns freed pages more aggressively, reducing RSS bloat from allocator fragmentation. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
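For reference, a minimal sketch of how a Rust binary typically opts into jemalloc. The tikv-jemallocator crate and its version are assumptions here, not necessarily what this commit uses:

```rust
// Cargo.toml (assumed crate; the commit may use a different wrapper):
// [dependencies]
// tikv-jemallocator = "0.5"

// Route every heap allocation through jemalloc instead of glibc's
// ptmalloc2: per-thread caches cut lock contention under rayon's
// worker threads, and freed pages are returned more aggressively.
#[global_allocator]
static GLOBAL: tikv_jemallocator::Jemalloc = tikv_jemallocator::Jemalloc;

fn main() {
    // From here on, every Box/Vec/HashMap allocation uses jemalloc.
    let v: Vec<u64> = (0..1_000_000).collect();
    println!("allocated {} elements via jemalloc", v.len());
}
```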
Skip event buffering when no output file is requested
The deterministic event sorting pipeline (added in 54389c5ec) was cloning and buffering every simulation event even when no -o output file was given. At T=0.250 with 1500 nodes this accumulated 7M+ OutputEvent structs (~10 GB) at peak, causing RSS to balloon from ~21 GB (actual node state) to 59 GB and OOM. Guard the clone/buffer/flush path with a has_output check. RSS at slot 656 dropped from 59 GB to 28 GB — matching tracked node state plus normal allocator overhead. Also adds EventMonitor and LivenessMonitor stats logging every 60 slots for ongoing memory diagnostics. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
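A condensed sketch of the guard described above; `EventPipeline`, `record`, and the field names are illustrative stand-ins for the real pipeline types:

```rust
use std::path::PathBuf;

#[derive(Clone)]
struct OutputEvent; // stand-in for the real event payload

// Illustrative pipeline shape; only the has_output guard mirrors the fix.
struct EventPipeline {
    output_path: Option<PathBuf>, // Some(..) only when -o was passed
    buffer: Vec<OutputEvent>,
}

impl EventPipeline {
    fn record(&mut self, event: &OutputEvent) {
        // Without this guard, every event was cloned and buffered for the
        // deterministic sort even when nothing would ever be written,
        // accumulating millions of OutputEvents at peak.
        if self.output_path.is_some() {
            self.buffer.push(event.clone());
        }
    }
}
```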
Add network queue stats instrumentation
Expose per-shard connection queue statistics (total/active connections, queued messages, queued bytes) via a shared NetworkStatsCollector. Each shard's sequential engine updates its counters at slot boundaries; the node's existing log_memory_stats reads the aggregate. Output appears every 60 slots alongside Memory stats, covering all shards. Initial profiling showed zero queued messages in turbo mode (zero-latency clusters bypass bandwidth queues), ruling out network queues as the cause of the ~40 GB RSS vs ~20 GB tracked-state gap. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
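A sketch of the collector shape this describes, assuming plain atomics shared via Arc; struct, field, and function names are illustrative, not the committed API:

```rust
use std::sync::Arc;
use std::sync::atomic::{AtomicU64, Ordering};

// Illustrative shape of the shared collector; field names are assumptions.
#[derive(Default)]
struct NetworkStatsCollector {
    active_connections: AtomicU64,
    queued_messages: AtomicU64,
    queued_bytes: AtomicU64,
}

impl NetworkStatsCollector {
    // Each shard's engine overwrites its own counters at slot boundaries;
    // Relaxed ordering is enough for monitoring-only data.
    fn update(&self, active: u64, msgs: u64, bytes: u64) {
        self.active_connections.store(active, Ordering::Relaxed);
        self.queued_messages.store(msgs, Ordering::Relaxed);
        self.queued_bytes.store(bytes, Ordering::Relaxed);
    }
}

// The node's log_memory_stats side sums the aggregate across all shards.
fn log_network_stats(shards: &[Arc<NetworkStatsCollector>]) {
    let msgs: u64 = shards.iter().map(|s| s.queued_messages.load(Ordering::Relaxed)).sum();
    let bytes: u64 = shards.iter().map(|s| s.queued_bytes.load(Ordering::Relaxed)).sum();
    println!("Network: {msgs} queued messages, {bytes} queued bytes");
}
```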
Fix topology connectivity with minimal reciprocal links
Replace the full symmetrization (which nearly doubled link count from 39k to 59k) with a targeted fixup: for each node not listed as anyone's producer, add a single reciprocal link back from its first producer. This adds only 432 links (one per BP) vs ~20k before. BPs were the only nodes needing fixup — they pick 2 relay producers but no relay was picking them back, making them invisible to the sim's consumer-edge BFS. Relays cross-reference each other enough to be naturally reachable. Re-generated topology: 38,943 links (vs 59,268 symmetric, 38,511 original asymmetric). Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
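A sketch of the fixup rule, under the assumption that the topology reduces to a node-to-producers map; the real generator operates on the YAML topology schema:

```rust
use std::collections::{HashMap, HashSet};

// topology: node -> list of its upstream producers (nodes it pulls from).
fn add_minimal_reciprocal_links(producers: &mut HashMap<String, Vec<String>>) {
    // Collect every node that appears as someone's producer.
    let produced_for: HashSet<String> =
        producers.values().flatten().cloned().collect();

    // Any node nobody consumes from is invisible to a BFS over consumer
    // edges. Give it exactly one reciprocal link: its first producer now
    // also lists it as a producer, creating the missing reverse edge.
    let fixups: Vec<(String, String)> = producers
        .iter()
        .filter(|(node, _)| !produced_for.contains(*node))
        .filter_map(|(node, ps)| ps.first().map(|p| (p.clone(), node.clone())))
        .collect();

    for (producer, orphan) in fixups {
        producers.entry(producer).or_default().push(orphan);
    }
}
```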
Apply suggestion from @ch1bo
Co-authored-by: Sebastian Nagel <[email protected]>
Apply suggestion from @ch1bo
Co-authored-by: Sebastian Nagel <[email protected]>
Update roadmap
Update roadmap.md
Fix generate-topology to produce bidirectional links
The sim's connectivity BFS traverses consumer edges (reverse of producers). Unidirectional producer links left nodes unreachable, causing "Graph must be fully connected!" errors. Symmetrize all links so every A→B producer also creates B→A. Also rename generate_topology.py → generate-topology.py and summarize_topology.py → summarize-topology.py for consistency with the other shell scripts. Re-generated topology-v2-expanded-1500.yaml (59,268 links, fully connected). Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
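For context, a sketch of the consumer-edge BFS that both topology fixes are satisfying, assuming an adjacency map keyed by node id in which every node has an entry; the sim's actual check differs in detail:

```rust
use std::collections::{HashMap, HashSet, VecDeque};

// The sim walks *consumer* edges, i.e. the reverse of the producer relation.
fn is_fully_connected(producers: &HashMap<usize, Vec<usize>>, start: usize) -> bool {
    // Invert producer links: if `node` lists `p` as a producer, then
    // `node` consumes from `p`, so the BFS traverses p -> node.
    let mut consumers: HashMap<usize, Vec<usize>> = HashMap::new();
    for (&node, ps) in producers {
        for &p in ps {
            consumers.entry(p).or_default().push(node);
        }
    }

    let mut seen = HashSet::from([start]);
    let mut queue = VecDeque::from([start]);
    while let Some(n) = queue.pop_front() {
        for &c in consumers.get(&n).into_iter().flatten() {
            if seen.insert(c) {
                queue.push_back(c);
            }
        }
    }
    // A node with no incoming consumer edge is never visited.
    seen.len() == producers.len()
}
```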
Snapshot of Frisby's SQLite latency investigation during March 2026
Retain EB-critical TXs on peer backlog overflow
Problem
-------
When a node's peer TX backlog hits its cap (e.g. 10,000), incoming TXs
are silently dropped from self.txs. If a dropped TX is referenced by a
pending Endorser Block, the EB's validation scan (try_validating_eb)
finds has_tx() = false and the EB is never marked all_txs_seen. The EB
then misses its vote window and is orphaned by the next Ranking Block
(WrongEB). Because the TX is never re-offered by peers, the one-shot
missing_txs trigger — already consumed by acknowledge_tx — cannot
re-fire, leaving the EB permanently stuck.
Under Poisson-clustered RB production (e.g. seed 4 at 0.200 MB/s), this
cascade produced 48 EBs with 19 uncertified (40%), 23M peer TX drops,
and a mean of only 348 votes/EB (well below the 450 quorum).
Fix
---
Two changes in propagate_tx():
1. Move the mempool insertion check (try_add_to_mempool) BEFORE
acknowledge_tx, so that missing_txs has not yet been consumed at the
point where we decide whether to drop.
2. When PeerBacklogFull fires, check whether the TX is referenced by a
pending EB (self.leios.missing_txs.contains_key). If yes, keep the
TX in self.txs (skip the backlog, but preserve has_tx = true) and
fall through to acknowledge_tx normally. If no, drop as before.
This retains only EB-critical TXs — bounded by (pending_EBs × EB_size),
typically a few thousand entries and ~3 MB of HashMap overhead per node.
Non-critical TXs are still dropped, preserving the memory cap's purpose.
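A self-contained sketch of the reordered path; the types are minimal
stand-ins (the real missing_txs lives on self.leios) and only the
control flow mirrors the two changes above:

```rust
use std::collections::HashMap;

type TxId = u64;
type EbId = u64;

#[derive(Clone)]
struct Tx { id: TxId }
enum MempoolError { PeerBacklogFull }

struct Node {
    txs: HashMap<TxId, Tx>,                 // bodies backing has_tx()
    missing_txs: HashMap<TxId, Vec<EbId>>,  // TXs pending EBs still need
}

impl Node {
    fn try_add_to_mempool(&mut self, _tx: &Tx) -> Result<(), MempoolError> {
        Err(MempoolError::PeerBacklogFull) // pretend the backlog cap is hit
    }

    fn acknowledge_tx(&mut self, tx: &Tx) {
        // Consumes the one-shot missing_txs trigger for this TX.
        self.missing_txs.remove(&tx.id);
    }

    fn propagate_tx(&mut self, tx: Tx) {
        // (1) Mempool admission is checked BEFORE acknowledge_tx, so the
        //     missing_txs trigger is still unconsumed at drop-decision time.
        if let Err(MempoolError::PeerBacklogFull) = self.try_add_to_mempool(&tx) {
            if self.missing_txs.contains_key(&tx.id) {
                // (2) EB-critical: keep the body so has_tx() stays true,
                //     then fall through to acknowledge_tx as normal.
                self.txs.insert(tx.id, tx.clone());
            } else {
                return; // non-critical: drop as before, cap preserved
            }
        }
        self.acknowledge_tx(&tx);
    }
}
```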
Effect on seed 4 sequential 0.200/wfa-ls (worst-case seed)
-----------------------------------------------------------
                EBs  uncert  mean votes  WrongEB  drops  peak RSS
caps (before):   48      19         348     1138  23.2M    ~20 GB
caps-retain:     45       8         470     1330   5.9M    ~24 GB
nocaps (ref):    46       8         473     1516      0    ~35 GB
Uncertified EBs: 19 → 8 (40% → 18%)
Mean votes/EB: 348 → 470 (near nocaps 473)
Peer TX drops: 23.2M → 5.9M (−74%)
Peak RSS: ~20 → ~24 GB (+20%, well below nocaps ~35 GB)
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Add no-caps parameter file and baseline voting results
parameters/no-caps.yaml disables all three memory caps for diagnostic
experiments (peer backlog, generated backlog, TX max age).
voting_results.csv captures the full 4-way matrix at 0.200/wfa-ls:
{turbo,sequential} × {caps,nocaps} × seeds 0-4. Key findings:
- Seed 4 is the stress seed: caps cause 40% uncertified (seq) vs 17%
without caps. Root cause is a race in propagate_tx where
acknowledge_tx consumes the one-shot missing_txs trigger before
PeerBacklogFull drops the TX.
- Seeds 1,3 are cap-insensitive (well-spaced RBs).
- No-caps converges all seeds to 16-22% uncertified.
- Stale rows (pre-rayon-fix, pre-seed-wiring) labelled as such.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Add -L/--label to cip-voting-options.sh for tagging CSV rows
Adds a label column (position 5, between seed and time_seconds) to distinguish experiment configurations (e.g. "caps", "nocaps") without relying on memory of which rows came from which invocation. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Add RSS to poll-sim.sh process status line
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Wire seed parameter through RawParameters to SimConfiguration
The seed field existed on SimConfiguration but was hardcoded to 0 in build(). Adding it to RawParameters (with #[serde(default)]) lets it be set via -p YAML files, which the -S/--seed flag in cip-voting-options.sh already generates. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
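A sketch of the serde wiring, assuming serde with the derive feature and serde_yaml for parsing; only `RawParameters` and `seed` come from the commit, the rest is illustrative:

```rust
// Assumed Cargo deps: serde = { version = "1", features = ["derive"] },
// serde_yaml = "0.9"
use serde::Deserialize;

#[derive(Deserialize)]
struct RawParameters {
    #[serde(default)] // missing key -> 0, the previously hardcoded value
    seed: u64,
    // ... other parameters elided ...
}

fn main() {
    // A -p YAML file containing `seed: 42` now reaches the config.
    let raw: RawParameters = serde_yaml::from_str("seed: 42").unwrap();
    assert_eq!(raw.seed, 42);
    // build() then copies raw.seed into SimConfiguration instead of 0.
}
```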
Fix rayon non-determinism: remove .filter() from parallel dispatch
rayon's filter() on an indexed parallel iterator produces an unindexed iterator whose collect() does NOT preserve element order — the output Vec order depends on work-stealing scheduling, which varies per process. Moving the empty-work check into .map() keeps the iterator indexed, so collect() is deterministic regardless of rayon thread scheduling. This was the root cause of the bistable attractor at 0.200/wfa-ls: the same seed+config could land on either 28/8 (healthy) or 81/49 (pathological) depending on which process-launch rayon happened to schedule. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
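An illustration of the pattern, with `dispatch` and the summing work as stand-ins for the real dispatch code; per the commit's diagnosis, the commented-out filter() variant loses indexing, while the map() form keeps collect() output order tied to input order:

```rust
use rayon::prelude::*;

fn dispatch(items: &[Vec<u32>]) -> Vec<Option<u32>> {
    items
        .par_iter()
        // NOT: .filter(|batch| !batch.is_empty()) -- per the commit,
        // filter() yields an unindexed iterator and order then varied
        // with work-stealing across process launches.
        .map(|batch| {
            if batch.is_empty() {
                None // empty-work check moved into map()
            } else {
                Some(batch.iter().sum())
            }
        })
        .collect() // indexed collect: output[i] corresponds to items[i]
}

fn main() {
    let out = dispatch(&[vec![1, 2], vec![], vec![3]]);
    assert_eq!(out, [Some(3), None, Some(3)]);
}
```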
Write per-run sim logs, and poll-sim picks latest by default
cip-voting-options.sh was piping every run through `tee /dev/stderr` which reopens /proc/self/fd/2 on each invocation; on Linux that gives a fresh offset-0 open-file-description, so successive seeds in a -S sweep overwrote the combined log from byte 0 — only the in-flight seed ever survived on disk. Now each run tees to /tmp/sim-T<T>-<mode>-<engine>-seed<N>.log so every seed retains its full log. poll-sim.sh defaults to the latest /tmp/sim-*.log when no path is given, so the normal /loop monitor workflow keeps working without changes. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Add -S/--seed to cip-voting-options.sh for multi-seed sweeps
Seed is the innermost loop so a partial run still yields a complete seed distribution for each (throughput, mode) cell. CSV grows a seed column (position 4); existing rows should be backfilled with seed=0. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Add -P/--extra-params and scripts/poll-sim.sh
cip-voting-options.sh gains a repeatable -P/--extra-params flag that layers additional YAML parameter files on top of the existing config chain (applied last so they override everything). Useful for quick experiments — e.g., `-P /tmp/coarse-timestamp.yaml` to bump timestamp-resolution-ms without touching the committed parameter set. poll-sim.sh prints a concise one-line status of a running sim-cli plus the log tail, intended for use from /loop or cron to watch a long-running benchmark without blocking Claude's thread on sleep. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Make multi-shard sequential engine deterministic
Cross-shard message delivery order in the sequential engine previously depended on OS thread scheduling of peer shards, so runs with shard_count > 1 produced different event sequences across runs. Fixing this required five coordinated changes:

1. **Deterministic cross-shard merge**: tag every CrossShardMsg with `source_shard` and a per-sender monotonic `seq`. Receiving shards buffer incoming messages into a `BinaryHeap` keyed on `(send_time, source_shard, seq)` and only deliver those whose send_time is strictly less than the minimum of every peer's advertised `shared_time`. Under that rule, no future message can arrive with an earlier send_time, so delivery order is a pure function of sent messages (the messages themselves are produced deterministically per-shard).

2. **Strict CMB ceiling**: the block condition changes from `timestamp > ceiling` to `timestamp >= ceiling`. At the boundary `timestamp == ceiling`, a peer might still be about to send a message whose `delivery_time == timestamp`; using strict less-than ensures every message with `delivery_time <= timestamp` is already on the mpsc by the time we process `timestamp`.

3. **Content-derived sort at pop**: BinaryHeap pop order for equal-timestamp events is a function of push history, which under multi-shard can vary across runs (cross-shard pushes from drain interleave with intra-shard pushes from apply_batch_output). Collect all events at the current timestamp into a Vec and sort by `GlobalEvent::sort_key()` before processing, so the order is a pure function of event content.

4. **Ceiling-aware termination**: replace the primary-shard-cancels-on-SlotBoundary scheme with an independent per-shard termination check that only breaks when the local queue has no events with `ts < end_time` AND the CMB ceiling is also `>= end_time`. Every shard stops at the same simulation time, independent of token-cancellation propagation races.

5. **Second drain before popping**: run drain_cross_shard_safe a second time after the ceiling check passes. The top-of-loop drain may run before the peer has advanced enough for send_time = `timestamp - eps` messages to be deliverable; the post-ceiling-check drain catches them, preventing a cross-shard delivery from landing in a later iteration and splitting a timestamp's events across batches.

New test `test_sequential_multi_shard_deterministic` compares per-node event trajectories across two runs under shard_count=2. Passes 500/500 in release mode (was failing in ~100% of runs before the fix, ~25% with only the sort fix, 2% with the termination fix, 0% with the second drain). All 55 sim-core tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
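A sketch of change (1), assuming integer tick timestamps so the ordering key is a plain derive; the real engine's time, payload, and buffer types differ:

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

// Derived Ord compares fields in order: (send_time, source_shard, seq).
#[derive(PartialEq, Eq, PartialOrd, Ord)]
struct CrossShardMsg {
    send_time: u64,
    source_shard: usize,
    seq: u64, // per-sender monotonic counter
    // payload elided
}

struct ShardInbox {
    pending: BinaryHeap<Reverse<CrossShardMsg>>, // min-heap on the key
}

impl ShardInbox {
    fn push(&mut self, msg: CrossShardMsg) {
        self.pending.push(Reverse(msg));
    }

    // Deliver only messages strictly older than every peer's advertised
    // shared_time: no future message can then arrive with an earlier
    // send_time, so delivery order is a pure function of what was sent.
    fn drain_deliverable(&mut self, peer_shared_times: &[u64]) -> Vec<CrossShardMsg> {
        let horizon = peer_shared_times.iter().copied().min().unwrap_or(0);
        let mut out = Vec::new();
        while self
            .pending
            .peek()
            .is_some_and(|Reverse(m)| m.send_time < horizon)
        {
            out.push(self.pending.pop().unwrap().0);
        }
        out
    }
}
```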