ouroboros-leios
100 commits this week Apr 29, 2026 - May 06, 2026
Update CLAUDE.mds for 1500-node sweep, --memory-limit-file, done markers
The 2026w18 doc now reflects the harness as it actually behaves: it
covers both 750n and 1500n topologies, documents the --memory-limit-file
flag, the `done` marker semantics, and the continue-on-failure logic in
the sweep wrappers. Adds a per-topology run-time table, an honest
"Memory and disk requirements" section explaining why memory-limit
caps don't help at 1500n high throughput (the per-node txs cache is
diffusion-limited, not throughput-limited), and the rationale for the
256 GB virtual ulimit. Voting-mode thresholds now show that `everyone`
includes zero-stake relays in the simulator. Removes the "original CIP
results at experiment root" lines — those files were never actually
checked in there.

The sim-rs doc gets a brief note on 1500n RSS scaling and the one-line
flush_window tweak that would shrink the end-of-sim EventMonitor spike
without altering correctness.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
1500-node sweep: final 10 Plutus runs (sweep complete)
Completes the topology-v2-1500 sweep: everyone × 4 (Plutus 5000-50000)
plus top-stake-fraction × 6 (Plutus 1000-50000). The full sweep finished
2026-05-01 at 07:24:33 BST, totalling 33 runs (5 NA × 3 modes + 6 Plutus
× 3 modes), all completing 100% TX finalization except the 50000 Gstep/EB
collapse case, which is the expected pathological behaviour at that
Plutus level.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
check-progress: also report trace-processor and pigz post-processing
Previously the script only watched sim-cli, so cron output read "NO SIM
RUNNING" while the experiment was actually busy in the trace processor
or the final pigz of csv files — making a healthy run look stuck. Now
it reports any of sim-cli / leios-trace-processor / pigz -p 3 -9f and
prefixes each line with the binary name so it's obvious which phase is
active.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Append voting benchmark results from cip-voting-options.sh runs
The voting_results.csv accumulates rows from each cip-voting-options.sh
invocation; this commit captures the runs done while developing the
2026w18 sweep harness (label tags include `caps-retain`, `no-caps`,
and seed/throughput sweep variants). Adds 127 rows for future
reference / analysis.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
1500-node sweep results: 23 completed runs (NA × 3 modes + Plutus partial)
Commits the small text artifacts (case.csv, config.yaml, summary.txt,
time.txt, done marker) for every topology-v2-1500 seed-0 run that has
finished its full pipeline. Excludes the bulky outputs (sim.log.gz,
*.csv.gz, stdout, stderr) per the existing .gitignore.

Coverage at this snapshot:
- All 5 NA throughputs (0.150-0.350) × all 3 voting modes = 15 runs
- All 6 wfa-ls Plutus levels (1000-50000) = 6 runs
- everyone Plutus 1000 and 2000 = 2 runs (sweep still in progress)

Plus the `done` marker for the canonical NA,0.350/everyone/topology-v2
baseline (the run formerly known as seed-0.no-limits, promoted to
seed-0 in the cleanup commit).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
run-all-voting-modes: continue across modes on partial failure
run-sweep.sh now exits 1 when any experiment fails (continue-on-
failure logic), which under set -eo pipefail aborted the outer loop
after the first mode with any OOM. Wrap the inner call so failures
in one mode don't lose the remaining modes; report at the end.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Prune old vote and EB state in linear leios to bound memory
Vote bundles, per-EB tallies, EB state, and relay announcements were
stored forever, causing 50 GB+ memory use with large voter committees.

Add slot-based pruning that removes all leios state for EBs older than
the full voting lifecycle (3*header_diffusion + vote_stage +
diffuse_stage + buffer). Also remove write-only certified_ebs field.

Reduces peak memory from 54 GB to ~31 GB for 750-voter/1500-slot runs.
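The pruning shape, as a dependency-free sketch — the stage-length
constants are hypothetical stand-ins for the real config values, and
the map's value type is illustrative:

```rust
use std::collections::BTreeMap;

// Hypothetical stand-ins for the real stage lengths (in slots).
const HEADER_DIFFUSION: u64 = 5;
const VOTE_STAGE: u64 = 10;
const DIFFUSE_STAGE: u64 = 10;
const BUFFER: u64 = 5;

/// Slots after which an EB can no longer affect the voting lifecycle.
fn retention_horizon() -> u64 {
    3 * HEADER_DIFFUSION + VOTE_STAGE + DIFFUSE_STAGE + BUFFER
}

/// Drop all per-EB leios state older than the full voting lifecycle.
/// `split_off(&cutoff)` keeps every entry with slot >= cutoff.
fn prune_old_leios_state(ebs_by_slot: &mut BTreeMap<u64, Vec<u32>>, current_slot: u64) {
    let cutoff = current_slot.saturating_sub(retention_horizon());
    *ebs_by_slot = ebs_by_slot.split_off(&cutoff);
}
```

Keying the state map by slot means the cut is a single `split_off`
tree operation rather than a per-entry scan.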

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Add stateless context-derived RNG primitive; migrate VRF/lottery
The simulator's stateful ChaChaRng-per-node design is fragile: RNG
consumption count per node depends on control flow (e.g., "did this
node receive an EB in time to vote"), which depends on network timing.
Any microsecond-scale timing drift changes the number of RNG draws on a
node, desynchronising its RNG state, and every downstream random
decision on that node diverges — a macro-amplifier that turns upstream
timing blips into EB-scale outcome drift.

It's also unrealistic. Cardano's real VRF is stateless per slot:
vrf_output = f(key, nonce || slot) is a pure function that doesn't
"advance" with each use.

Introduce a stateless oracle: every random draw becomes a pure function
of (global_seed, context). The new `sim-core/src/rng` module provides:

- DrawSite enum naming every call site (RbLottery, VoteVrf, MempoolSwap,
  TxGen{Node,Body,Frequency}, TxConflict, Withhold*, test/lottery site
  variants). Discriminant plus variant fields are hashed into the
  context, so distinct call sites never collide.
- Rng::draw_{u64,range,f64_01,bool}, all pure functions of
  (seed, node, slot, site).
- SplitMixHasher — portable deterministic hasher: endian-pinned writes
  (to_le_bytes in every write_uNN), splitmix64-style mixing, splitmix
  finalizer. Not cryptographic; fine for a sim (no adversarial inputs)
  and ~ns per draw.

Ten unit tests in rng::tests cover: determinism, different-seed
differentiation, 500-context collision check, 600-trial-index
distinctness, site-variant-on-same-(node,slot) distinctness, range/
probability sanity, endian-independence, and golden vectors pinning the
hash output (tested to catch accidental hash-function changes).

Migrate the VRF/lottery call paths for all three node variants:

- sim/lottery.rs: LotteryConfig::run signature changes from
  `(kind, success_rate, &mut ChaChaRng)` to
  `(kind, success_rate, &Rng, NodeId, slot, DrawSite)`. MockLotteryResults
  (tests) unchanged: still keyed by LotteryKind.
- sim/linear_leios.rs: run_vrf threads slot+site through; RB lottery
  uses DrawSite::RbLottery; vote VRF enumerates its (up to) 600 trials
  as DrawSite::VoteVrf { eb_id, trial }.
- sim/stracciatella.rs: inline run_vrf (bypasses LotteryConfig) migrated
  similarly. DrawSites: RbLottery, EbLottery{pipeline, trial},
  VoteVrfPipeline{pipeline, trial}.
- sim/leios.rs: inline run_vrf migrated. DrawSites: IbLottery, EbLottery,
  VoteVrfPipeline, RbLottery.

Nodes still hold a ChaChaRng for mempool shuffle, withhold-TX attack,
TxGeneratorCore, and new_tx body randomness. These are migrated in
follow-up phases. The critical VRF path — the macro-amplifier that
cascades network-timing non-determinism into per-node RNG-state
desynchronisation — is now structurally deterministic by construction.

All 51 sim-core tests pass.
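A minimal sketch of the draw primitive: the `site` argument stands in
for the hashed DrawSite discriminant plus variant fields, and the
constants are the standard splitmix64 ones:

```rust
/// splitmix64-style finalizer: strong bit mixing, not cryptographic —
/// fine for a simulator with no adversarial inputs.
fn splitmix64(mut z: u64) -> u64 {
    z = z.wrapping_add(0x9e37_79b9_7f4a_7c15);
    z = (z ^ (z >> 30)).wrapping_mul(0xbf58_476d_1ce4_e5b9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94d0_49bb_1331_11eb);
    z ^ (z >> 31)
}

/// Stateless oracle: every draw is a pure function of
/// (seed, node, slot, site). Nothing advances between calls, so
/// timing drift can never desynchronise a node's randomness.
struct Rng {
    seed: u64,
}

impl Rng {
    fn draw_u64(&self, node: u64, slot: u64, site: u64) -> u64 {
        let mut s = splitmix64(self.seed ^ node);
        s = splitmix64(s ^ slot);
        splitmix64(s ^ site)
    }

    fn draw_bool(&self, node: u64, slot: u64, site: u64, p: f64) -> bool {
        // Top 53 bits -> uniform f64 in [0, 1).
        let x = (self.draw_u64(node, slot, site) >> 11) as f64 / (1u64 << 53) as f64;
        x < p
    }
}
```

Because each call rebuilds its result from the full context, inserting
or removing a draw at one call site cannot shift any other site's
output — the property the stateful per-node ChaChaRng lacked.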

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Fix EB state pruning causing protocol collapse
The prune_old_leios_state function could prune an EB from node state
before an endorsing RB arrived, causing the node to add the EB to
incomplete_onchain_ebs with no way to validate it (body already gone).
This permanently set produce_empty_block=true, shutting down all block
production on affected nodes and cascading across the network.

Fix: don't add an EB to incomplete_onchain_ebs if it's in pruned_ebs
(meaning it was already validated before being pruned — no conflict
risk). Also add defensive guards in the pruning loop to skip EBs that
are in incomplete_onchain_ebs, and preserve ebs_by_rb mappings for
incomplete EBs.

Bisected: pre-pruning commit a1649012d produces 100% TX finalization
at 0.200 throughput; post-pruning ce028db2d collapses to 46%.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Convert HashMap/HashSet to BTreeMap/BTreeSet in linear_leios node state
Eliminates non-deterministic iteration order in NodeLeiosState,
LedgerState, and LinearLeiosNode.txs. All key types already implement
Ord. At typical map sizes (5-50 entries for leios state, 100s-1000s
for txs) BTreeMap has negligible CPU overhead and slightly lower
memory usage than HashMap.

The praos state (NodePraosState) was already using BTreeMap; this
brings the leios state into line. Remaining HashSet usages are pure
membership tests (contains/insert) that do not affect determinism.
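The determinism argument in one example (illustrative EB ids and vote
counts only):

```rust
use std::collections::BTreeMap;

/// BTreeMap iterates in key order, so any fold over this state is a
/// pure function of its contents. HashMap iteration order varies per
/// process because of its randomized hash seed.
fn vote_tally_order(entries: &[(u32, u64)]) -> Vec<u32> {
    let mut eb_votes: BTreeMap<u32, u64> = BTreeMap::new();
    for &(eb, votes) in entries {
        eb_votes.insert(eb, votes);
    }
    eb_votes.keys().copied().collect()
}
```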

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Retain EB-critical TXs on peer backlog overflow
Problem
-------
When a node's peer TX backlog hits its cap (e.g. 10,000), incoming TXs
are silently dropped from self.txs.  If a dropped TX is referenced by a
pending Endorser Block, the EB's validation scan (try_validating_eb)
finds has_tx() = false and the EB is never marked all_txs_seen.  The EB
then misses its vote window and is orphaned by the next Ranking Block
(WrongEB).  Because the TX is never re-offered by peers, the one-shot
missing_txs trigger — already consumed by acknowledge_tx — cannot
re-fire, leaving the EB permanently stuck.

Under Poisson-clustered RB production (e.g. seed 4 at 0.200 MB/s), this
cascade produced 48 EBs with 19 uncertified (40%), 23M peer TX drops,
and a mean of only 348 votes/EB (well below the 450 quorum).

Fix
---
Two changes in propagate_tx():

1. Move the mempool insertion check (try_add_to_mempool) BEFORE
   acknowledge_tx, so that missing_txs has not yet been consumed at the
   point where we decide whether to drop.

2. When PeerBacklogFull fires, check whether the TX is referenced by a
   pending EB (self.leios.missing_txs.contains_key).  If yes, keep the
   TX in self.txs (skip the backlog, but preserve has_tx = true) and
   fall through to acknowledge_tx normally.  If no, drop as before.

This retains only EB-critical TXs — bounded by (pending_EBs × EB_size),
typically a few thousand entries and ~3 MB of HashMap overhead per node.
Non-critical TXs are still dropped, preserving the memory cap's purpose.
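The drop decision, sketched with field names assumed from the
description above (the real propagate_tx carries more state):

```rust
use std::collections::{BTreeMap, BTreeSet};

struct Node {
    txs: BTreeSet<u64>,
    peer_backlog: Vec<u64>,
    backlog_cap: usize,
    /// TX id -> pending EBs waiting on it (names assumed).
    missing_txs: BTreeMap<u64, Vec<u32>>,
}

impl Node {
    /// A TX that a pending EB is waiting on is kept in `txs` even when
    /// the backlog is full, so try_validating_eb still sees has_tx = true.
    /// Returns whether the TX was retained.
    fn on_peer_tx(&mut self, tx_id: u64) -> bool {
        if self.peer_backlog.len() < self.backlog_cap {
            self.peer_backlog.push(tx_id);
            self.txs.insert(tx_id);
            return true;
        }
        // Backlog full: retain only EB-critical TXs.
        if self.missing_txs.contains_key(&tx_id) {
            self.txs.insert(tx_id); // skip the backlog, preserve has_tx
            true
        } else {
            false // non-critical: drop as before
        }
    }
}
```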

Effect on seed 4 sequential 0.200/wfa-ls (worst-case seed)
-----------------------------------------------------------
                  EBs  uncert  mean   WrongEB  drops   peak RSS
caps (before):    48   19      348    1138     23.2M   ~20 GB
caps-retain:      45    8      470    1330      5.9M   ~24 GB
nocaps (ref):     46    8      473    1516      0      ~35 GB

Uncertified EBs:  19 → 8  (40% → 18%)
Mean votes/EB:    348 → 470  (near nocaps 473)
Peer TX drops:    23.2M → 5.9M  (−74%)
Peak RSS:         ~20 → ~24 GB  (+20%, well below nocaps ~35 GB)

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
De-RNG Linear Leios completely: withhold attacker + TxGeneratorCore
Migrate every remaining stateful-RNG use reachable from Linear Leios:

- linear_leios.rs generate_withheld_txs: `self.rng.random_bool(p)` is
  replaced with `rng.draw_bool(node, slot, DrawSite::WithholdDecision,
  p)`. The distribution sample for `txs_to_generate` and the per-tx
  `new_tx` body generation use `Rng::seeded_chacha(node, slot, site)`
  to produce one-shot ChaChaRngs seeded from context — this keeps the
  rand_distr / `new_tx` machinery unchanged while removing the
  cross-call stateful coupling.

- tx.rs TxGeneratorCore: replaces its `ChaChaRng` with the stateless
  `SimRng` plus a monotonic `next_tx_idx: u64`. Each TX is generated
  from a one-shot ChaChaRng seeded from
  `("tx_generator", tx_idx)` — so the generated TX stream is a pure
  function of the master seed regardless of per-node or network-timing
  behaviour. Propagates the `SimRng` type through TransactionProducer
  and its callers in sim/sequential.rs and sharding/shard.rs; the
  master-RNG `.next_u64()` consumption is preserved to keep any
  remaining downstream draws on stracciatella/leios variants seeded
  the same way they were.

- Drops `rng: ChaChaRng` field from `LinearLeiosNode`. The NodeImpl
  trait signature still takes a `ChaChaRng` for the other variants, so
  LinearLeiosNode::new accepts it as `_rng` and discards.

New Rng methods: `seeded_chacha(node, slot, site)` for context-tied
one-shot ChaChaRng seeding, and `seeded_chacha_from<K: Hash>(&K)` for
sim-wide (non-node-tied) draws like the TX generator.

All 54 sim-core tests pass; clippy clean for Linear Leios and
TxGeneratorCore.

Stracciatella and full-Leios variants retain their stateful `self.rng`
for now — they build fine but are out of scope for the current
determinism investigation.
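The one-shot context seeding, sketched without the ChaCha dependency —
the real code seeds a ChaChaRng; a splitmix64 counter stream stands in
here, and the `("tx_generator", tx_idx)` context key mirrors the commit:

```rust
/// splitmix64-style finalizer (standard constants).
fn splitmix64(mut z: u64) -> u64 {
    z = z.wrapping_add(0x9e37_79b9_7f4a_7c15);
    z = (z ^ (z >> 30)).wrapping_mul(0xbf58_476d_1ce4_e5b9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94d0_49bb_1331_11eb);
    z ^ (z >> 31)
}

/// One-shot generator whose entire stream is a pure function of
/// (master_seed, label, tx_idx) — so each TX body is independent of
/// per-node and network-timing behaviour.
struct OneShotRng {
    key: u64,
    ctr: u64,
}

impl OneShotRng {
    fn from_context(master_seed: u64, label: &str, tx_idx: u64) -> Self {
        let mut key = master_seed;
        for b in label.bytes() {
            key = splitmix64(key ^ u64::from(b));
        }
        OneShotRng { key: splitmix64(key ^ tx_idx), ctr: 0 }
    }

    /// Counter-based stream: output i depends only on (key, i).
    fn next_u64(&mut self) -> u64 {
        self.ctr += 1;
        splitmix64(self.key ^ self.ctr.wrapping_mul(0x9e37_79b9_7f4a_7c15))
    }
}
```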

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Fix topology connectivity with minimal reciprocal links
Replace the full symmetrization (which nearly doubled link count from
39k to 59k) with a targeted fixup: for each node not listed as
anyone's producer, add a single reciprocal link back from its first
producer.  This adds only 432 links (one per BP) vs ~20k before.

BPs were the only nodes needing fixup — they pick 2 relay producers
but no relay was picking them back, making them invisible to the
sim's consumer-edge BFS.  Relays cross-reference each other enough
to be naturally reachable.

Re-generated topology: 38,943 links (vs 59,268 symmetric, 38,511
original asymmetric).
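The fixup, as a sketch over an assumed producers map (node -> the
nodes it pulls from):

```rust
use std::collections::{BTreeMap, BTreeSet};

/// For each node that no one lists as a producer, add one reciprocal
/// link back from its own first producer, making it reachable over
/// consumer edges. Returns the number of links added (one per orphan).
fn add_reciprocal_links(producers: &mut BTreeMap<u32, Vec<u32>>) -> usize {
    let listed: BTreeSet<u32> = producers.values().flatten().copied().collect();
    let orphans: Vec<(u32, u32)> = producers
        .iter()
        .filter(|(n, ps)| !listed.contains(n) && !ps.is_empty())
        .map(|(n, ps)| (*n, ps[0]))
        .collect();
    let added = orphans.len();
    for (orphan, first_producer) in orphans {
        // first_producer now also consumes from the orphan.
        producers.entry(first_producer).or_default().push(orphan);
    }
    added
}
```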

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Add deterministic CIP experiment scripts and sim-rs vote config fields
Scripts for running CIP experiments with deterministic turbo mode:
- run-deterministic.sh: per-experiment runner with voting mode, seed,
  and engine selection (turbo default, actor/sequential optional)
- run-all-NA.sh: runs all CIP throughputs (0.150-0.350) for a given mode
- run-all-voting-modes.sh: runs all throughputs × all voting modes
- combine-results-multi-vote.sh: collects results for a given voting mode
  into the format expected by analysis.ipynb

Add sim-rs persistent/non-persistent vote config fields to
experiments/config.yaml alongside existing Haskell sim fields. Both
halves are set to the same original CIP values so the weighted average
is unchanged. Without these, sim-rs silently uses defaults from
config.default.yaml (total probability 500 instead of 600).

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Add -P/--extra-params and scripts/poll-sim.sh
cip-voting-options.sh gains a repeatable -P/--extra-params flag that
layers additional YAML parameter files on top of the existing config
chain (applied last so they override everything). Useful for quick
experiments — e.g., `-P /tmp/coarse-timestamp.yaml` to bump
timestamp-resolution-ms without touching the committed parameter set.

poll-sim.sh prints a concise one-line status of a running sim-cli plus
the log tail, intended for use from /loop or cron to watch a
long-running benchmark without blocking Claude's thread on sleep.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Add process RSS to memory stats and simplify praos.blocks instrumentation
Read VmRSS from /proc/self/status and log it alongside estimated totals
so we can directly compare instrumented vs actual memory usage.

Simplify praos.blocks stats back to basic entry count and tx_refs — the
detailed unique/endorse breakdown showed praos.blocks is not a
significant memory consumer.
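A sketch of the VmRSS read — /proc/self/status carries one
"VmRSS:    <n> kB" line per process; function names are illustrative:

```rust
/// Parse the VmRSS value (in kB) out of a /proc/<pid>/status body.
fn parse_vm_rss_kb(status: &str) -> Option<u64> {
    status
        .lines()
        .find(|l| l.starts_with("VmRSS:"))?
        .split_whitespace()
        .nth(1)?
        .parse()
        .ok()
}

/// Live read for logging alongside the estimated totals (Linux only).
fn current_rss_kb() -> Option<u64> {
    parse_vm_rss_kb(&std::fs::read_to_string("/proc/self/status").ok()?)
}
```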

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Fix rayon non-determinism: remove .filter() from parallel dispatch
rayon's filter() on an indexed parallel iterator produces an unindexed
iterator whose collect() does NOT preserve element order — the output
Vec order depends on work-stealing scheduling, which varies per process.
Moving the empty-work check into .map() keeps the iterator indexed, so
collect() is deterministic regardless of rayon thread scheduling.

This was the root cause of the bistable attractor at 0.200/wfa-ls: the
same seed+config could land on either 28/8 (healthy) or 81/49
(pathological) depending on which process-launch rayon happened to
schedule.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Make multi-shard turbo engine fully deterministic
Two sources of cross-run non-determinism in the multi-shard
sequential engine:

1. Shard assignment: zero_latency_clusters, min_latency_clusters, and
   min_cut all collected components via HashMap whose iteration order
   varies per process.  Switch to BTreeMap and add a stable tiebreaker
   to the component sort so shard-to-node mapping is a pure function
   of the topology.

2. TX ID assignment: RealTransactionConfig used shared AtomicU64
   counters (next_id, input_id) across shard threads with Relaxed
   ordering, making ID assignment depend on OS scheduling.  Move
   counters into per-shard TxGeneratorCore with offset ranges
   (shard_index * 1B) so IDs are deterministic and non-overlapping.
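The per-shard ID scheme in miniature — the 1-billion offset matches
the commit; names are illustrative:

```rust
/// Per-shard TX id generator: ids are a pure function of
/// (shard_index, local count), and the offset keeps ranges disjoint,
/// so no atomics are shared across shard threads.
struct TxIdGen {
    next: u64,
    end: u64,
}

impl TxIdGen {
    const RANGE: u64 = 1_000_000_000;

    fn for_shard(shard_index: u64) -> Self {
        let base = shard_index * Self::RANGE;
        TxIdGen { next: base, end: base + Self::RANGE }
    }

    fn next_id(&mut self) -> u64 {
        assert!(self.next < self.end, "shard id range exhausted");
        let id = self.next;
        self.next += 1;
        id
    }
}
```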

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>