Fixed issues found by GitHub Copilot
ouroboros-leios
149 commits this week (May 23, 2026 - May 30, 2026)
Adopt the eager non-voter pipeline
Under eager, non-voters pre-fetch the EB body and missing closure during the diffusion window in parallel with voter activity (rather than starting from scratch when the certRB arrives, which is the lazy alternative we previously documented). Add a new "Where does p_cert come from?" sub-paragraph in §3.1 between the bug-fix pseudocode and the existing "Why P_cert, not P(EB exists)" subsection.
Introduce 'topology-source' with 'type' to the config, update readme.
Fix total stake not being respected in YAML mode
Merge branch 'main' into cet/memory-telemetry
Resolves a parallel-implementation conflict in the LeiosStore retention path: PR #913 landed on main with an untagged `VecDeque<LeiosNotification>` and a `notification_evictable` predicate that inspects each notification's referenced slots, while this branch had a slot-tagged deque. Adopt main's predicate-based design (it handles `VotesOffer` more accurately than the conservative max-slot tag) and re-layer this branch's additions on top:
- `notifications_bytes_estimate` field in stats (Serialize-derived through net-core's serde dep) with `notification_heap_bytes` only counting the variable `VotesOffer` payload, never the enum size.
- `tick_slot` advances retention by wall-clock without touching the version counter or watch channel; eviction is silent.
- `push_notification` filter drops late-arriving offers whose slots are already below cutoff, before they can sit at the back of the deque past the front-only pop_front loop.
- `notifications_after` clamps the cursor to the next-write index on overshoot so a consumer that gets ahead reconverges on the next inject.
All five new behaviour tests + a 10k-slot stress test exercise the eviction guarantee end-to-end.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
LeiosStore: address Copilot review feedback on retention
Four fixes to the slot-windowed eviction added in 7f8ec6d5f:
- Drop the double-count of size_of::<LeiosNotification>() in the notifications byte estimate (per_entry_overhead already includes it).
- Reorder inject_* to update max_slot before pushing the notification, and skip the push if the slot is already below the retention cutoff. The front-only pop_front eviction loop can't reach a late-slot entry sitting at the back of the deque, so such entries have to be filtered at the source.
- notifications_after now clamps the returned start index to the next write position when the caller has overshot, instead of echoing the overshoot back forever and stranding the consumer.
- tick_slot no longer bumps the version counter or signals the watch channel. Eviction is silent to readers (no new notifications were added), so per-slot wall-clock advancement no longer wakes every LeiosNotify subscription.
Five new unit tests cover each behaviour, including a 10k-slot stress test that asserts the deque and data maps stay O(retention) under sustained EB / vote / tick_slot load.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
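The interaction between the late-arrival filter, silent ticking, and cursor clamping can be sketched in Python. This is an illustration of the idea only, not the Rust implementation: the class `NotificationLog`, its fields, and the single-slot notifications are all hypothetical simplifications.

```python
from collections import deque


class NotificationLog:
    """Hypothetical stand-in for a slot-windowed notification deque."""

    def __init__(self, retention: int) -> None:
        self.retention = retention
        self.max_slot = 0
        self.pruned = 0         # entries dropped from the front so far
        self.entries = deque()  # one slot per entry, oldest first
        self.version = 0        # bumped only when an entry is added

    def cutoff(self) -> int:
        return max(0, self.max_slot - self.retention)

    def _evict(self) -> None:
        # Front-only eviction: stops at the first entry still in the window.
        while self.entries and self.entries[0] < self.cutoff():
            self.entries.popleft()
            self.pruned += 1

    def inject(self, slot: int) -> bool:
        # Update max_slot first, then filter late arrivals at the source.
        self.max_slot = max(self.max_slot, slot)
        accepted = slot >= self.cutoff()
        if accepted:
            self.entries.append(slot)
            self.version += 1
        self._evict()
        return accepted

    def tick_slot(self, slot: int) -> None:
        # Wall-clock advance: evict silently, leave the version untouched.
        self.max_slot = max(self.max_slot, slot)
        self._evict()

    def after(self, cursor: int):
        # Clamp the logical cursor into [pruned, next write index].
        next_write = self.pruned + len(self.entries)
        start = min(max(cursor, self.pruned), next_write)
        return start, list(self.entries)[start - self.pruned:]
```

In this sketch a consumer that calls `after(50)` when only four entries were ever written gets back start index 4 and an empty list, so its next call picks up the next injected entry instead of waiting at index 50.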
Address review comments
ptrace improvements
nix shell providing jupyter notebook
Add notes/TOC.md with suggested reading order
Six chunking notes have accumulated, all interconnected. The TOC gives a one-paragraph summary of each in a suggested reading sequence: math foundations → empirical sweep → small-n puzzle → staircase visualization → MC confidence → low-p regime. Added plots.
Mark off-frame curve continuations in chunking plots
Adds a "▶" chevron at the right edge of each truncated curve (one that doesn't reach y=1 within the visible x-range), so the reader sees immediately that the curve continues off-frame. Per-n CI plots additionally annotate with the mass-off-frame percentage; the main multi-curve plot keeps just the chevron to stay uncluttered.
Two correctness fixes baked in:
- Mass-off-frame uses bisect_right(sorted_times, xmax) to compute the true F at xmax, independent of the polyline's downsampling, so the annotation is exact even when the displayed curve doesn't sample near xmax.
- Polyline sampling now combines uniform downsampling with dense coverage of the top 1% of chunk indices. Without the dense tail the uniform step skips over the loss-affected chunks at low p (<1% of mass) and the curve visually flat-lines through the loss tail. With it the curve climbs honestly.
Bumps margin_right 30 → 60 to make room for the chevron + label.
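A minimal sketch of the two fixes, assuming the completion times are held in an ascending list. The function names `mass_off_frame` and `polyline_indices` and the `step` parameter are hypothetical and are not taken from `plot_chunking.py`.

```python
from bisect import bisect_right


def mass_off_frame(sorted_times, xmax):
    """Fraction of samples beyond xmax; sorted_times must be ascending."""
    if not sorted_times:
        return 0.0
    # Empirical CDF at xmax, computed from the full sample, not the polyline.
    f_at_xmax = bisect_right(sorted_times, xmax) / len(sorted_times)
    return 1.0 - f_at_xmax


def polyline_indices(n, step):
    """Uniformly spaced indices plus every index in the top 1% of the sample."""
    tail_start = n - max(1, n // 100)
    return sorted(set(range(0, n, step)) | set(range(tail_start, n)))
```

For example, with `n = 1000` and `step = 100` the second function returns the ten uniform indices plus indices 990 through 999, so the rare slow samples at the end of the sorted list are always drawn.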
Adds a --runs CLI flag to tools/chunk_compare.py
(previously hardcoded to 50000) so the user can scale Monte Carlo for tail / conditional stability. Appends a "Validation at M=500 000" section to the low-p note showing the n=32 conditional file_P99|loss revised from 0.92 s (M=50k, ~10 conditional samples) to 1.69 s (M=500k, ~100 samples) — the qualitative story holds (chunking still gives ~80% reduction in bad-outcome magnitude) but the n=32 corner needs the larger M to be reportable. Also tags the original M=50k conditional table as such. Co-authored with Claude.
Improvements and explanations
1. Extends simulate_one_run to also return n_effective_losses (loss events that actually reshaped cwnd; losses in the file-completing round don't count). Adds monte_carlo_with_losses() helper alongside the existing monte_carlo() so backward-compat is preserved. tools/chunk_compare.py grows a --conditional flag that prints a companion table reporting P[≥1 loss] and P99|≥1 loss at both chunk and whole-file levels. The file-level conditional uses a resampling estimator (default B=200000) over the chunk samples. The new metric is the meaningful one at low p (e.g. p=1e-6) where marginal P99 collapses to the no-loss minimum time + jitter and tells you nothing about loss-tail behavior. Documented in a new note, parallel-chunking-low-p.md, with concrete numbers showing that the conditional baseline P99 at p=1e-6 is ~9 s vs the marginal P99 of ~0.57 s, and that chunking reduces the conditional file P99 from ~9 s (n=1) to ~0.9 s (n=32).
2. Replaces the imprecise "high-RTT / file-fits-in-slow-start" claim with the accurate mechanism: chunking attacks the loss-induced tail but cannot reduce the per-chunk slow-start ramp T_floor ≈ ceil(log2(N/n)) · RTT, so the achievable relative gain shrinks as RTT (and thus the floor) grows. Adds an RTT-sweep table at 50/250/1000 ms showing the benefit dropping from -75% to -63% as evidence.
3. Two new diagnostics for the parallel-chunking analysis, plus a note documenting the underlying problem (file-P99 lives deep in the chunk distribution's tail, especially for n >= 8, so single-run estimates can be unreliable around CDF steps):
   - tools/chunking_stability.py: reruns the chunk-distribution sim K times with different seeds and reports mean/std/range/CoV per n with a verdict flag. Catches CDF-step-induced instability that bootstrap CIs can miss (e.g. n=16 in the default config shows CoV ~10% with a 30% seed-to-seed range).
   - tools/plot_chunking.py --ci: keeps the main multi-curve plot unchanged and additionally emits one per-n SVG with a pointwise bootstrap 95% CI band. Uses random.binomialvariate for an efficient exact-bootstrap of the empirical CDF at each x-point.
   - notes/parallel-chunking-mc-confidence.md: quantifies the tail-data problem, presents both bootstrap CI and seed-to-seed tables, and lists mitigations from "bump --runs" to importance sampling.
4. Adds a "Validation at M=500 000" section comparing the 10-seed sweep at M=50k vs M=500k. Key finding: n=16 was the only n materially wrong at M=50k (CoV 9.9% → 0.3%, mean shifted +3%), and chunk_compare's shipped n=16 number was correct but seed-fragile. Refreshes the per-regime recommendation with concrete M=500k numbers and rewrites the "Diagnostic tooling" section now that both tools ship.
5. Records the explanation for why the chunking plot curves rise → plateau → rise rather than forming smooth S-curves. The model genuinely concentrates runs at a small finite set of outcome modes (no loss; loss in slow-start round r for each r; rare multi-loss), and F_file = F_chunk^n amplifies the existing steps at large n. Cross-links to the four other chunking notes.
Co-authored with Claude.
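The two formulas quoted in items 2, 3 and 5 can be worked through numerically. The sketch below assumes N is the file size in segments and that chunk completion times are independent, so that the file finishes when its slowest chunk does; the function names are hypothetical and do not come from the tools.

```python
import math


def t_floor(total_segments, n_chunks, rtt_s):
    """Slow-start ramp per chunk: ceil(log2(N/n)) * RTT, in seconds."""
    return math.ceil(math.log2(total_segments / n_chunks)) * rtt_s


def file_cdf(chunk_cdf_value, n_chunks):
    """F_file = F_chunk ** n for n independent, identically distributed chunks."""
    return chunk_cdf_value ** n_chunks


def chunk_quantile_for_file_p99(n_chunks):
    """Chunk-level quantile whose n-th power equals 0.99."""
    return 0.99 ** (1.0 / n_chunks)


if __name__ == "__main__":
    for rtt in (0.05, 0.25, 1.0):
        # The floor scales linearly with RTT, for n=1 and for n=32 alike.
        print(rtt, t_floor(1024, 1, rtt), t_floor(1024, 32, rtt))
    for n in (1, 8, 16, 32):
        print(n, chunk_quantile_for_file_p99(n))
```

At n = 32 the file-level P99 corresponds to roughly the 99.97th percentile of the chunk distribution, which is why the file-level estimate depends on so few tail samples.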
Produce CDF of time to transmit a datum over TCP
Monte Carlo simulation of TCP CUBIC, taking into account slow start, transition into congestion avoidance, and packet losses which affect the cwnd. The `tools` directory contains the `plot_chunking.py` script for producing a CDF showing the benefit of chunking a large datum and requesting it from multiple peers.
1. Extends the De Silva et al. throughput model with random loss (Bernoulli or Gilbert-Elliott Markov), runs a Monte Carlo to build a CDF of download times, and reports a user-specified percentile to stdout and as an SVG plot.
2. Each round now draws its own RTT from a configurable distribution (lognormal default, normal/uniform/none alternatives), and the CUBIC formula uses real elapsed wall-clock seconds instead of round count × nominal RTT. This breaks up the hard CDF step that arose when many no-loss runs all completed at exactly N · RTT. Setting model: none reproduces the prior round-quantized behavior.
3. Reuses estimator.load_config / monte_carlo / percentile_of to compare a single-shot download against n-chunk parallel downloads at n = 2, 4, 8, 16, 32. Reports both an optimistic (full link per chunk) and a realistic (link/n per chunk) bandwidth model, alongside the paired note in notes/parallel-chunking.md.
Co-authored with Claude
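The effect of per-round RTT jitter on the CDF step can be seen in a deliberately simplified toy model. This is not the estimator's CUBIC model: it only doubles the window each round, halves it on any loss, and assumes one segment per lossy round needs retransmission. The function name and all parameters are hypothetical.

```python
import math
import random


def toy_download_time(rng, segments, rtt_mean, sigma, p_loss):
    """Elapsed seconds to deliver `segments` in a toy slow-start model."""
    cwnd, sent, elapsed = 1, 0, 0.0
    while sent < segments:
        burst = min(cwnd, segments - sent)
        if sigma > 0:
            # Lognormal RTT whose mean is rtt_mean: mu = ln(mean) - sigma^2 / 2.
            mu = math.log(rtt_mean) - sigma ** 2 / 2
            elapsed += rng.lognormvariate(mu, sigma)
        else:
            elapsed += rtt_mean  # round-quantized behaviour
        lost = any(rng.random() < p_loss for _ in range(burst))
        if lost:
            cwnd = max(1, cwnd // 2)  # toy reaction to loss, NOT CUBIC
            sent += burst - 1         # assume one segment must be resent
        else:
            sent += burst
            cwnd *= 2
    return elapsed


if __name__ == "__main__":
    rng = random.Random(0)
    fixed = {toy_download_time(rng, 1024, 0.05, 0.0, 0.0) for _ in range(100)}
    jittered = {toy_download_time(rng, 1024, 0.05, 0.2, 0.0) for _ in range(100)}
    print(len(fixed), len(jittered))
```

With no jitter and no loss every run finishes after the same number of rounds, so the empirical CDF is a single vertical step; with jitter the runs spread out around that value.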
Address Copilot review: doc accuracy and reject node_limit
Add handling of yaml topologies to net-rs
Address Copilot review: doc accuracy and reject node_limit
Generated notebooks for all reports
Cosmetic commits, docs and tracing::info optimization
Merge pull request #913 from input-output-hk/prc/net-node-memory
net-node memory + per-message size enforcement
net-core: front-prune LeiosStore notifications alongside slot eviction
`LeiosStore::notifications: Vec<LeiosNotification>` was never pruned. Every `inject_block` / `inject_block_txs` / `inject_votes` pushed an entry that lived forever, even though the blocks/votes those entries point at get slot-window-evicted by the same `bump_version` call. Over a long-running cluster the vec grows monotonically with the total inject rate.
Switch storage to `VecDeque` and add a `notifications_pruned_count` so logical (caller-facing) cursors stay monotonic across pruning. At slot-window eviction, front-prune notifications whose every referenced slot is below the cutoff; those refer to data the store no longer holds, so re-sending them to a subscriber would be a wasted round trip. Stop at the first non-evictable front entry: notifications arrive in roughly slot order, so the leak past the cutoff is bounded by out-of-order arrivals (next bump catches up).
`notifications_after` now takes `&mut usize`. Callers track a monotonic logical cursor; if it lags the prune frontier the call bumps it forward so subsequent `*after += 1` increments stay aligned with the items actually consumed. `notification_count` reports the all-time logical total.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
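As a rough Python illustration of the predicate-based front-prune and the logical cursor (the actual code is Rust; here each notification is modelled as a tuple of the slots it references, and the function names are hypothetical):

```python
from collections import deque


def evictable(ref_slots, cutoff):
    """A notification is evictable only if every referenced slot is below cutoff."""
    return all(slot < cutoff for slot in ref_slots)


def front_prune(notifications, cutoff):
    """Pop evictable entries from the front; return how many were dropped."""
    dropped = 0
    while notifications and evictable(notifications[0], cutoff):
        notifications.popleft()
        dropped += 1
    return dropped


def notifications_after(notifications, pruned_count, cursor):
    """Return (adjusted logical cursor, items at or after that cursor)."""
    cursor = max(cursor, pruned_count)  # bump a lagging cursor to the frontier
    return cursor, list(notifications)[cursor - pruned_count:]


if __name__ == "__main__":
    log = deque([(1,), (2, 3), (9, 2), (4,)])
    pruned = front_prune(log, cutoff=5)
    # (9, 2) blocks the front, so the out-of-order (4,) stays until a later bump.
    print(pruned, list(log))
    print(notifications_after(log, pruned, cursor=0))
```

The entry `(9, 2)` is kept because one of its slots is still inside the window, which mirrors the "every referenced slot" condition described above.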
net-node, shared-consensus: demote per-tx / per-event spam to debug
Three info! lines fire once per item under steady cluster load and together accounted for ~600K of the ~660K info-level log lines in a 27-minute test run, dominating disk usage:
- `transaction received` (net-node): 366K lines
- `network event` (net-node, default arm): 155K lines
- `mempool: evicting oldest tx` (shared): 71K lines
At a 1 tx/s/node generation rate with a 10K-cap mempool, the eviction line fires on every admit once the cap is reached. At RUST_LOG=info on a 25-node cluster these saturate disk in roughly half a day.
None of the three is useful at info: the per-tx and per-event lines are item-level traces (debug territory) and eviction at cap is a steady-state condition, not a notable event. Periodic `mempool state sizes` / `praos state sizes` / `leios_store: stats` lines remain at info; those carry the diagnostic signal.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
net-core: BlockFetch demuxer cap uses INGRESS_LIMIT, not per-message cap
The demuxer enforces `set_ingress_limit` as a *buffer* cap on the
per-protocol ingress queue, not as a per-message cap. Under server
pipelining (`MsgStartBatch` immediately followed by `MsgBlock`) both
segments can land in the buffer before the codec processes
`MsgStartBatch` and bumps the limit for `StStreaming`. With the
prior per-state caps that race manifested two ways:
1. `StBusy` at `SIZE_LIMIT_SMALL = 65_535` rejected the pipelined
block body outright (a 65K+ block was already enough to trip
it).
2. Even with `StBusy` raised to `SIZE_LIMIT_STREAMING = 2.5 MB`,
a real Praos block body — particularly the post-EB-overflow
fallback path where txs the EB couldn't carry get inlined into
the RB — can legitimately exceed 2.5 MB. Overnight one landed
at 2,506,268 bytes and tripped the new cap, cascading SDU
timeouts and freezing the cluster.
The spec defines `INGRESS_LIMIT = 230 MB` as the per-protocol
ingress buffer cap exactly for this case. Use it for both `StBusy`
and `StStreaming`. Spec per-message rejection at
`SIZE_LIMIT_SMALL` / `SIZE_LIMIT_STREAMING` belongs in the codec at
decode time (not yet wired); the framework's `size_limit` callback
controls buffer sizing only.
`StIdle` keeps `SIZE_LIMIT_SMALL` — the client never receives in
that state, so the tighter cap stands.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
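The buffer-cap race described above can be illustrated with a small Python sketch. It assumes decimal megabytes for the two MB-denominated limits and a 2-byte `MsgStartBatch` segment; both are assumptions made for illustration, and `buffer_accepts` is a hypothetical helper rather than part of the demuxer.

```python
SIZE_LIMIT_SMALL = 65_535
SIZE_LIMIT_STREAMING = 2_500_000  # assumed: 2.5 MB read as decimal bytes
INGRESS_LIMIT = 230_000_000       # assumed: 230 MB read as decimal bytes


def buffer_accepts(segment_sizes, buffer_cap):
    """True if all segments fit in the ingress buffer before the codec drains it."""
    buffered = 0
    for size in segment_sizes:
        buffered += size
        if buffered > buffer_cap:
            return False
    return True


if __name__ == "__main__":
    # MsgStartBatch (assumed tiny) followed immediately by the observed block body.
    pipelined = [2, 2_506_268]
    for name, cap in [
        ("SIZE_LIMIT_SMALL", SIZE_LIMIT_SMALL),
        ("SIZE_LIMIT_STREAMING", SIZE_LIMIT_STREAMING),
        ("INGRESS_LIMIT", INGRESS_LIMIT),
    ]:
        print(name, buffer_accepts(pipelined, cap))
```

Under these assumptions only the `INGRESS_LIMIT` cap admits both pipelined segments while the codec is still in the earlier state.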