Update roadmap
Update roadmap.md
The sim's connectivity BFS traverses consumer edges (the reverse of
producer edges). Unidirectional producer links left nodes unreachable,
causing "Graph must be fully connected!" errors. Symmetrize all links
so every A→B producer also creates B→A.

Also rename generate_topology.py → generate-topology.py and
summarize_topology.py → summarize-topology.py for consistency with the
other shell scripts. Regenerated topology-v2-expanded-1500.yaml
(59,268 links, fully connected).

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
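The symmetrization invariant is small enough to sketch; this is a minimal std-only Rust illustration of the idea (the actual script is Python, and `symmetrize` is a hypothetical name):

```rust
use std::collections::BTreeSet;

/// For every directed producer link A -> B, ensure the reverse link
/// B -> A also exists, so a BFS over consumer (reverse) edges can
/// reach every node the producer graph reaches.
fn symmetrize(links: &[(u32, u32)]) -> Vec<(u32, u32)> {
    let mut set: BTreeSet<(u32, u32)> = links.iter().copied().collect();
    for &(a, b) in links {
        set.insert((b, a));
    }
    set.into_iter().collect()
}

fn main() {
    let sym = symmetrize(&[(0, 1), (1, 2)]);
    // Both reverse links are now present.
    assert!(sym.contains(&(1, 0)) && sym.contains(&(2, 1)));
}
```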
Problem
-------
When a node's peer TX backlog hits its cap (e.g. 10,000), incoming TXs
are silently dropped from self.txs. If a dropped TX is referenced by a
pending Endorser Block, the EB's validation scan (try_validating_eb)
finds has_tx() = false and the EB is never marked all_txs_seen. The EB
then misses its vote window and is orphaned by the next Ranking Block
(WrongEB). Because the TX is never re-offered by peers, the one-shot
missing_txs trigger — already consumed by acknowledge_tx — cannot
re-fire, leaving the EB permanently stuck.
Under Poisson-clustered RB production (e.g. seed 4 at 0.200 MB/s), this
cascade produced 48 EBs with 19 uncertified (40%), 23M peer TX drops,
and a mean of only 348 votes/EB (well below the 450 quorum).
Fix
---
Two changes in propagate_tx():
1. Move the mempool insertion check (try_add_to_mempool) BEFORE
acknowledge_tx, so that missing_txs has not yet been consumed at the
point where we decide whether to drop.
2. When PeerBacklogFull fires, check whether the TX is referenced by a
pending EB (self.leios.missing_txs.contains_key). If yes, keep the
TX in self.txs (skip the backlog, but preserve has_tx = true) and
fall through to acknowledge_tx normally. If no, drop as before.
This retains only EB-critical TXs — bounded by (pending_EBs × EB_size),
typically a few thousand entries and ~3 MB of HashMap overhead per node.
Non-critical TXs are still dropped, preserving the memory cap's purpose.
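As a minimal sketch of the new drop decision (field names and types are illustrative approximations of the commit text; the real propagate_tx carries far more state):

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical node state; names approximate the description above.
struct Node {
    txs: HashMap<u64, Vec<u8>>, // backing store for has_tx()
    peer_backlog: Vec<u64>,     // bounded peer TX backlog
    backlog_cap: usize,
    missing_txs: HashSet<u64>,  // TX ids referenced by pending EBs
}

impl Node {
    fn propagate_tx(&mut self, id: u64, body: Vec<u8>) {
        if self.peer_backlog.len() >= self.backlog_cap {
            // PeerBacklogFull: retain only EB-critical TXs.
            if self.missing_txs.contains(&id) {
                // Skip the backlog but keep has_tx = true, so the
                // EB's validation scan can still complete.
                self.txs.insert(id, body);
            }
            // Non-critical TXs are dropped, preserving the cap.
            return;
        }
        self.txs.insert(id, body);
        self.peer_backlog.push(id);
    }
}

fn main() {
    let mut n = Node {
        txs: HashMap::new(),
        peer_backlog: vec![1], // backlog already at cap
        backlog_cap: 1,
        missing_txs: HashSet::from([42]),
    };
    n.propagate_tx(42, vec![]); // EB-critical: retained
    n.propagate_tx(7, vec![]);  // not referenced: dropped
    assert!(n.txs.contains_key(&42) && !n.txs.contains_key(&7));
}
```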
Effect on seed 4 sequential 0.200/wfa-ls (worst-case seed)
-----------------------------------------------------------
                 EBs  uncert  mean votes/EB  WrongEB  drops  peak RSS
caps (before):    48      19            348     1138  23.2M    ~20 GB
caps-retain:      45       8            470     1330   5.9M    ~24 GB
nocaps (ref):     46       8            473     1516      0    ~35 GB
Uncertified EBs: 19 → 8 (40% → 18%)
Mean votes/EB: 348 → 470 (near nocaps 473)
Peer TX drops: 23.2M → 5.9M (−74%)
Peak RSS: ~20 → ~24 GB (+20%, well below nocaps ~35 GB)
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
parameters/no-caps.yaml disables all three memory caps for diagnostic
experiments (peer backlog, generated backlog, TX max age).
voting_results.csv captures the full 4-way matrix at 0.200/wfa-ls:
{turbo,sequential} × {caps,nocaps} × seeds 0-4. Key findings:
- Seed 4 is the stress seed: caps cause 40% uncertified (seq) vs 17%
without caps. Root cause is a race in propagate_tx where
acknowledge_tx consumes the one-shot missing_txs trigger before
PeerBacklogFull drops the TX.
- Seeds 1,3 are cap-insensitive (well-spaced RBs).
- No-caps converges all seeds to 16-22% uncertified.
- Stale rows (pre-rayon-fix, pre-seed-wiring) labelled as such.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Adds a label column (position 5, between seed and time_seconds) to distinguish experiment configurations (e.g. "caps", "nocaps") without relying on memory of which rows came from which invocation. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
The seed field existed on SimConfiguration but was hardcoded to 0 in build(). Adding it to RawParameters (with #[serde(default)]) lets it be set via -p YAML files, which the -S/--seed flag in cip-voting-options.sh already generates. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
rayon's filter() on an indexed parallel iterator produces an unindexed
iterator whose collect() does NOT preserve element order — the output
Vec order depends on work-stealing scheduling, which varies per
process. Moving the empty-work check into .map() keeps the iterator
indexed, so collect() is deterministic regardless of rayon thread
scheduling.

This was the root cause of the bistable attractor at 0.200/wfa-ls: the
same seed+config could land on either 28/8 (healthy) or 81/49
(pathological) depending on how rayon happened to schedule a given
process launch.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
cip-voting-options.sh was piping every run through `tee /dev/stderr`,
which reopens /proc/self/fd/2 on each invocation; on Linux that gives a
fresh offset-0 open-file-description, so successive seeds in a -S sweep
overwrote the combined log from byte 0 — only the in-flight seed ever
survived on disk.

Now each run tees to /tmp/sim-T<T>-<mode>-<engine>-seed<N>.log so every
seed retains its full log. poll-sim.sh defaults to the latest
/tmp/sim-*.log when no path is given, so the normal /loop monitor
workflow keeps working without changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Seed is the innermost loop so a partial run still yields a complete seed distribution for each (throughput, mode) cell. CSV grows a seed column (position 4); existing rows should be backfilled with seed=0. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
cip-voting-options.sh gains a repeatable -P/--extra-params flag that
layers additional YAML parameter files on top of the existing config
chain (applied last, so they override everything). Useful for quick
experiments — e.g., `-P /tmp/coarse-timestamp.yaml` to bump
timestamp-resolution-ms without touching the committed parameter set.

poll-sim.sh prints a concise one-line status of a running sim-cli plus
the log tail, intended for use from /loop or cron to watch a
long-running benchmark without blocking Claude's thread on sleep.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Cross-shard message delivery order in the sequential engine previously
depended on OS thread scheduling of peer shards, so runs with
shard_count > 1 produced different event sequences across runs. Fixing
this required five coordinated changes:

1. **Deterministic cross-shard merge**: tag every CrossShardMsg with
   `source_shard` and a per-sender monotonic `seq`. Receiving shards
   buffer incoming messages into a `BinaryHeap` keyed on
   `(send_time, source_shard, seq)` and only deliver those whose
   send_time is strictly less than the minimum of every peer's
   advertised `shared_time`. Under that rule, no future message can
   arrive with an earlier send_time, so delivery order is a pure
   function of sent messages (the messages themselves are produced
   deterministically per-shard).

2. **Strict CMB ceiling**: the block condition changes from
   `timestamp > ceiling` to `timestamp >= ceiling`. At the boundary
   `timestamp == ceiling`, a peer might still be about to send a
   message whose `delivery_time == timestamp`; using strict less-than
   ensures every message with `delivery_time <= timestamp` is already
   on the mpsc by the time we process `timestamp`.

3. **Content-derived sort at pop**: BinaryHeap pop order for
   equal-timestamp events is a function of push history, which under
   multi-shard can vary across runs (cross-shard pushes from drain
   interleave with intra-shard pushes from apply_batch_output).
   Collect all events at the current timestamp into a Vec and sort by
   `GlobalEvent::sort_key()` before processing, so the order is a pure
   function of event content.

4. **Ceiling-aware termination**: replace the
   primary-shard-cancels-on-SlotBoundary scheme with an independent
   per-shard termination check that only breaks when the local queue
   has no events with `ts < end_time` AND the CMB ceiling is also
   `>= end_time`. Every shard stops at the same simulation time,
   independent of token-cancellation propagation races.

5. **Second drain before popping**: run drain_cross_shard_safe a
   second time after the ceiling check passes. The top-of-loop drain
   may run before the peer has advanced enough for
   send_time=`timestamp - eps` messages to be deliverable; the
   post-ceiling-check drain catches them, preventing a cross-shard
   delivery from landing in a later iteration and splitting a
   timestamp's events across batches.

New test `test_sequential_multi_shard_deterministic` compares per-node
event trajectories across two runs under shard_count=2. Passes 500/500
in release mode (was failing in ~100% of runs before the fix, ~25%
with only the sort fix, 2% with the termination fix, 0% with the
second drain). All 55 sim-core tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
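The deterministic-merge rule can be sketched with std types alone (`deliverable` and the bare key tuples are illustrative; the real CrossShardMsg carries payloads):

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Pop every buffered message whose send_time is strictly below the
/// minimum of all peers' advertised shared_time. Keys are
/// (send_time, source_shard, seq); Reverse turns std's max-heap into
/// a min-heap, so messages come out in deterministic key order.
fn deliverable(
    heap: &mut BinaryHeap<Reverse<(u64, u32, u64)>>,
    peer_shared_times: &[u64],
) -> Vec<(u64, u32, u64)> {
    let horizon = peer_shared_times.iter().copied().min().unwrap_or(0);
    let mut out = Vec::new();
    while let Some(&Reverse(key)) = heap.peek() {
        if key.0 < horizon {
            heap.pop();
            out.push(key);
        } else {
            break; // a future message could still precede this one
        }
    }
    out
}

fn main() {
    let mut heap = BinaryHeap::new();
    for key in [(5, 1, 0), (3, 1, 1), (3, 0, 2)] {
        heap.push(Reverse(key));
    }
    // Only send_time < 4 is safe; equal times break on (shard, seq).
    assert_eq!(deliverable(&mut heap, &[4, 6]), vec![(3, 0, 2), (3, 1, 1)]);
    assert_eq!(heap.len(), 1); // (5, 1, 0) stays buffered
}
```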
`TxGeneratorCore::generate` computed inter-tx delay as
`config.frequency_ms.sample() as u64 * shard_count as u64` and passed
it to `Duration::from_millis`. The `as u64` cast truncated each sample:
a configured 7.5 ms became 7 ms, producing TXs ~7% faster than
requested. For the 0.200/wfa-ls single-shard run this meant 128,572
TXs over 900s (~214 KB/s) instead of the intended ~120,000 TXs
(~200 KB/s). Only affects configurations with sub-ms precision and no
batching; turbo is largely unaffected (1 ms resolution, and the 10 ms
tx-batch-window collapses the fractional delay anyway).

Switch to `Duration::from_secs_f64`, preserving sub-millisecond
precision via nanosecond-resolution Duration. Clamp with `.max(0.0)`
so distributions that can sample negative values (e.g., Normal) keep
the old "treat negative as zero delay" behaviour rather than panicking
in `from_secs_f64`.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
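The truncation is easy to reproduce with std alone (the 7.5 ms value comes from the commit text; everything else is a self-contained illustration):

```rust
use std::time::Duration;

fn main() {
    let sample_ms: f64 = 7.5; // a sampled inter-tx delay, in ms

    // Old behaviour: `as u64` truncates before Duration construction.
    let truncated = Duration::from_millis(sample_ms as u64);
    assert_eq!(truncated.as_millis(), 7); // 7.5 ms became 7 ms

    // New behaviour: convert ms -> s, keep nanosecond precision, and
    // clamp so a negative sample (e.g. from a Normal distribution)
    // means zero delay instead of a from_secs_f64 panic.
    let precise = Duration::from_secs_f64((sample_ms / 1000.0).max(0.0));
    assert_eq!(precise.as_micros(), 7500);
}
```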
Migrate every remaining stateful-RNG use reachable from Linear Leios:
- linear_leios.rs generate_withheld_txs: `self.rng.random_bool(p)` is
replaced with `rng.draw_bool(node, slot, DrawSite::WithholdDecision,
p)`. The distribution sample for `txs_to_generate` and the per-tx
`new_tx` body generation use `Rng::seeded_chacha(node, slot, site)`
to produce one-shot ChaChaRngs seeded from context — this keeps the
rand_distr / `new_tx` machinery unchanged while removing the
cross-call stateful coupling.
- tx.rs TxGeneratorCore: replaces its `ChaChaRng` with the stateless
`SimRng` plus a monotonic `next_tx_idx: u64`. Each TX is generated
from a one-shot ChaChaRng seeded from
`("tx_generator", tx_idx)` — so the generated TX stream is a pure
function of the master seed regardless of per-node or network-timing
behaviour. Propagates the `SimRng` type through TransactionProducer
and its callers in sim/sequential.rs and sharding/shard.rs; the
master-RNG `.next_u64()` consumption is preserved to keep any
remaining downstream draws on stracciatella/leios variants seeded
the same way they were.
- Drops `rng: ChaChaRng` field from `LinearLeiosNode`. The NodeImpl
trait signature still takes a `ChaChaRng` for the other variants, so
LinearLeiosNode::new accepts it as `_rng` and discards.
New Rng methods: `seeded_chacha(node, slot, site)` for context-tied
one-shot ChaChaRng seeding, and `seeded_chacha_from<K: Hash>(&K)` for
sim-wide (non-node-tied) draws like the TX generator.
All 54 sim-core tests pass; clippy clean for Linear Leios and
TxGeneratorCore.
Stracciatella and full-Leios variants retain their stateful `self.rng`
for now — they build fine but are out of scope for the current
determinism investigation.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Replace `candidates.shuffle(&mut self.rng)` in
LinearLeiosNode::sample_from_mempool with Rng::context_shuffle, which
performs Fisher-Yates using DrawSite::MempoolSwap { call, idx } for
each swap. The `call` discriminator distinguishes independent shuffle
invocations at the same (node, slot): the RB-body sample uses call=0,
the EB-body sample uses call=1, so they don't collide.
DrawSite::MempoolSwap gains a `call: u32` field. Three new rng tests
cover: deterministic-per-context, distinct-calls-yield-distinct-perms,
multiset-preservation.
Threads `slot` and `shuffle_call` through sample_from_mempool's
signature. Both call sites (RB path, EB path) in try_generate_rb pass
the active slot and their assigned call index.
Note: the default `leios-mempool-sampling-strategy: ordered-by-id`
means the shuffle branch doesn't fire in the current benchmark; this
is structural cleanup so Linear Leios contains no remaining
stateful-RNG uses on its hot VRF / sampling path.
Stracciatella and full Leios variants still use stateful `self.rng` for
their shuffle paths; those will be migrated in a follow-up.
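The shape of context_shuffle can be sketched with std types only (std's DefaultHasher stands in for the sim's own SplitMixHasher, and `draw` is a hypothetical helper; the real code hashes full DrawSite variants):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Each swap index is a pure function of (seed, node, slot, call, idx),
// so the permutation never depends on accumulated RNG state.
fn draw(seed: u64, node: u64, slot: u64, call: u32, idx: usize) -> u64 {
    let mut h = DefaultHasher::new();
    (seed, node, slot, call, idx).hash(&mut h);
    h.finish()
}

// Fisher-Yates where every swap draws from the context above; the
// `call` discriminator keeps independent shuffles at the same
// (node, slot) from colliding.
fn context_shuffle<T>(v: &mut [T], seed: u64, node: u64, slot: u64, call: u32) {
    for i in (1..v.len()).rev() {
        let j = (draw(seed, node, slot, call, i) % (i as u64 + 1)) as usize;
        v.swap(i, j);
    }
}

fn main() {
    let (mut a, mut b) = (vec![1, 2, 3, 4, 5], vec![1, 2, 3, 4, 5]);
    context_shuffle(&mut a, 42, 7, 100, 0); // e.g. RB-body sample: call = 0
    context_shuffle(&mut b, 42, 7, 100, 0);
    assert_eq!(a, b); // pure function of (seed, node, slot, call)
    a.sort();
    assert_eq!(a, vec![1, 2, 3, 4, 5]); // multiset preserved
}
```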
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
The simulator's stateful ChaChaRng-per-node design is fragile: RNG
consumption count per node depends on control flow (e.g., "did this
node receive an EB in time to vote"), which depends on network timing.
Any microsecond-scale timing drift changes the number of RNG draws on a
node, desynchronising its RNG state, and every downstream random
decision on that node diverges — a macro-amplifier that turns upstream
timing blips into EB-scale outcome drift.
It's also unrealistic. Cardano's real VRF is stateless per slot:
vrf_output = f(key, nonce || slot) is a pure function that doesn't
"advance" with each use.
Introduce a stateless oracle: every random draw becomes a pure function
of (global_seed, context). The new `sim-core/src/rng` module provides:
- DrawSite enum naming every call site (RbLottery, VoteVrf, MempoolSwap,
TxGen{Node,Body,Frequency}, TxConflict, Withhold*, test/lottery site
variants). Discriminant plus variant fields are hashed into the
context, so distinct call sites never collide.
- Rng::draw_{u64,range,f64_01,bool}, all pure functions of
(seed, node, slot, site).
- SplitMixHasher — portable deterministic hasher: endian-pinned writes
(to_le_bytes in every write_uNN), splitmix64-style mixing, splitmix
finalizer. Not cryptographic; fine for a sim (no adversarial inputs)
and ~ns per draw.
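A rough std-only sketch of the draw primitives (the mixing constants are the standard splitmix64 ones; folding context fields through the mixer is an illustrative simplification of hashing full DrawSite variants):

```rust
// splitmix64 finalizer: a cheap, well-mixed bijection on u64.
fn splitmix64(mut x: u64) -> u64 {
    x = x.wrapping_add(0x9E3779B97F4A7C15);
    x = (x ^ (x >> 30)).wrapping_mul(0xBF58476D1CE4E5B9);
    x = (x ^ (x >> 27)).wrapping_mul(0x94D049BB133111EB);
    x ^ (x >> 31)
}

// Pure function of (seed, node, slot, site): no state ever advances.
fn draw_u64(seed: u64, node: u64, slot: u64, site: u64) -> u64 {
    let mut acc = splitmix64(seed);
    for field in [node, slot, site] {
        acc = splitmix64(acc ^ field);
    }
    acc
}

fn draw_f64_01(seed: u64, node: u64, slot: u64, site: u64) -> f64 {
    // 53 high bits -> uniform in [0, 1)
    (draw_u64(seed, node, slot, site) >> 11) as f64 * (1.0 / (1u64 << 53) as f64)
}

fn draw_bool(seed: u64, node: u64, slot: u64, site: u64, p: f64) -> bool {
    draw_f64_01(seed, node, slot, site) < p
}

fn main() {
    assert_eq!(draw_u64(1, 2, 3, 4), draw_u64(1, 2, 3, 4)); // deterministic
    assert_ne!(draw_u64(1, 2, 3, 4), draw_u64(2, 2, 3, 4)); // seed-sensitive
    let x = draw_f64_01(1, 2, 3, 4);
    assert!((0.0..1.0).contains(&x));
    let _ = draw_bool(1, 2, 3, 4, 0.5);
}
```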
Ten unit tests in rng::tests cover: determinism, different-seed
differentiation, 500-context collision check, 600-trial-index
distinctness, site-variant-on-same-(node,slot) distinctness, range/
probability sanity, endian-independence, and golden vectors pinning the
hash output (tested to catch accidental hash-function changes).
Migrate the VRF/lottery call paths for all three node variants:
- sim/lottery.rs: LotteryConfig::run signature changes from
`(kind, success_rate, &mut ChaChaRng)` to
`(kind, success_rate, &Rng, NodeId, slot, DrawSite)`. MockLotteryResults
(tests) unchanged: still keyed by LotteryKind.
- sim/linear_leios.rs: run_vrf threads slot+site through; RB lottery
uses DrawSite::RbLottery; vote VRF enumerates its (up to) 600 trials
as DrawSite::VoteVrf { eb_id, trial }.
- sim/stracciatella.rs: inline run_vrf (bypasses LotteryConfig) migrated
similarly. DrawSites: RbLottery, EbLottery{pipeline, trial},
VoteVrfPipeline{pipeline, trial}.
- sim/leios.rs: inline run_vrf migrated. DrawSites: IbLottery, EbLottery,
VoteVrfPipeline, RbLottery.
Nodes still hold a ChaChaRng for mempool shuffle, withhold-TX attack,
TxGeneratorCore, and new_tx body randomness. These are migrated in
follow-up phases. The critical VRF path — the macro-amplifier that
cascades network-timing non-determinism into per-node RNG-state
desynchronisation — is now structurally deterministic by construction.
All 51 sim-core tests pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
The previous determinism microtest would pass even with a lingering
non-determinism source downstream of the bandwidth-queue fix, because
TestNode ignored its seeded ChaChaRng and its event payloads didn't
depend on any accumulated per-node state. Any timing-induced drift in
message-delivery order across runs was undetectable.

Extend TestNode to roll self.rng.random::<u64>() on each Ping and
Heartbeat receipt and weave the roll into the event payload (and into
the returned Pong reply). Event content is now tied to accumulated
per-node RNG state, so any non-determinism in message-delivery order
or count desynchronises the RNG and surfaces as a differing roll=...
field in a compared event.

Add test_sequential_deterministic_bw_under_rayon which exercises the
rayon-parallel path (parallel_threshold=1) under bandwidth contention
and asserts per-node event trajectories (timestamp-sorted) match
across runs. The existing test_sequential_deterministic runs serial;
this one catches any rayon-visible shared-state non-determinism.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Positional args had grown unwieldy. Rewrite with flag parsing:
-t/--topology, -T/--throughput, -m/--mode, -e/--engine, -s/--slots,
--quorum-fraction, --stake-fraction.

Add an `--engine` selector that writes an on-the-fly override file:
- actor — default (tokio async), single-shard, non-deterministic
- sequential — single-shard sequential DES (deterministic)
- turbo — sequential DES with 6 shards (non-deterministic, fast)

Add `engine` as a CSV column so runs from different engines can live
in the same file and be pivoted cleanly.

Add determinism-run.sh / determinism-check.sh as a simple 3-run
harness for spot-checking single-shard-sequential determinism against
the 0.200/wfa-ls scenario. determinism-run.sh runs the benchmark 3×
and writes progress to /tmp/det-run-state; determinism-check.sh prints
a concise status summary (safe to poll from /loop or cron).

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Connection::split_bytes_amongst_queues iterated `bandwidth_queues` (a
`HashMap` with std RandomState) and distributed a `bytes % queues`
remainder by walking a stable-sorted vec. When two mini-protocols had
equal queued bytes, the stable sort preserved HashMap iteration order,
so the +1 byte landed on a non-deterministic protocol. Under bandwidth
contention this shifted message arrival timestamps, and the divergence
cascaded into different EB certification outcomes across otherwise
identical runs.

Switch `bandwidth_queues` to `BTreeMap` and widen the `TProtocol`
bound from `Hash` to `Ord`. Add `PartialOrd, Ord` to the production
`MiniProtocol` derive; propagate the `Ord` bound through `Network`,
`NetworkCoordinator`, and `sharding::shard`. Tie-break is now by
`TProtocol`'s Ord order (Tx < Block < IB < EB < Vote) — a stable,
documentable bias strictly better than the previous stable-but-random
behaviour.

Add `test_sequential_deterministic_under_bandwidth_contention` that
forces two mini-protocols to queue simultaneously on bandwidth-capped
links and asserts bit-identical event streams (timestamps included).
The pre-existing `test_sequential_deterministic` is kept as the
no-bandwidth lane.

Note multi-shard sequential remains non-deterministic (std_mpsc
cross-shard message interleaving); add a comment to flag this.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
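The deterministic remainder split can be sketched like this (a simplified model assuming equal shares plus remainder, ignoring the per-queue byte caps the real Connection tracks; the enum is a subset with illustrative names):

```rust
use std::collections::BTreeMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum MiniProtocol { Tx, Block, EB, Vote } // Ord = declaration order

/// Split `bytes` across queues; iterating a BTreeMap hands the
/// `bytes % queues` remainder to the Ord-first protocols, so ties
/// never depend on HashMap iteration order.
fn split_bytes(
    queues: &BTreeMap<MiniProtocol, u64>,
    bytes: u64,
) -> BTreeMap<MiniProtocol, u64> {
    if queues.is_empty() {
        return BTreeMap::new();
    }
    let n = queues.len() as u64;
    let share = bytes / n;
    let mut remainder = bytes % n;
    let mut out = BTreeMap::new();
    for (&proto, _queued) in queues {
        let extra = if remainder > 0 { remainder -= 1; 1 } else { 0 };
        out.insert(proto, share + extra);
    }
    out
}

fn main() {
    let queues: BTreeMap<MiniProtocol, u64> =
        [(MiniProtocol::Tx, 100), (MiniProtocol::Block, 100), (MiniProtocol::EB, 100)]
            .into_iter()
            .collect();
    let out = split_bytes(&queues, 10);
    assert_eq!(out[&MiniProtocol::Tx], 4); // Ord-first queue gets the +1
    assert_eq!(out[&MiniProtocol::Block], 3);
    assert_eq!(out[&MiniProtocol::EB], 3);
}
```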
fix: handle transient errors and reduce poll frequency in wait-for-test
Add committee-selection-algorithm config with three modes:
- wfa-ls (default): existing VRF lottery matching CIP-0164 wFA+LS
- everyone: every node votes unconditionally (1 vote each)
- top-stake-fraction: nodes covering top N% of cumulative stake vote

This enables traffic analysis comparing the CIP's VRF-based scheme
against simpler alternatives. Vote bundle sizes, CPU times, diffusion,
and threshold checking are unchanged — only the selection mechanism
differs.

Includes benchmark script (scripts/cip-voting-options.sh) that runs
CIP topology under turbo mode across all three committee modes.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
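The top-stake-fraction mode might look like the following sketch (illustrative only: the function name, stake types, and id tie-break are assumptions, not the sim's actual API):

```rust
/// Take nodes in descending stake order until their cumulative stake
/// covers `fraction` of total stake; those nodes form the committee.
fn top_stake_committee(stakes: &[(u32, u64)], fraction: f64) -> Vec<u32> {
    let total: u64 = stakes.iter().map(|&(_, s)| s).sum();
    let target = total as f64 * fraction;
    let mut sorted = stakes.to_vec();
    // Descending stake; node id as a deterministic tie-break (assumed).
    sorted.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
    let (mut committee, mut acc) = (Vec::new(), 0u64);
    for (id, stake) in sorted {
        if acc as f64 >= target {
            break; // already cover the requested stake fraction
        }
        committee.push(id);
        acc += stake;
    }
    committee
}

fn main() {
    let stakes = [(0, 50), (1, 30), (2, 20)];
    assert_eq!(top_stake_committee(&stakes, 0.5), vec![0]);
    assert_eq!(top_stake_committee(&stakes, 0.8), vec![0, 1]);
}
```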
The TxGeneratorCore refactor (8a4da350) moved node selection logic into TxGeneratorCore but left a reference to the removed `node_lookup` local. Replace with `self.sinks` which serves the same empty-check purpose. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
ci(antithesis): poll for test results after submission