dolos - 30 commits this week

scarmuega · Tue, 23 Jun 26 19:55:59 +0000 · dolos

fix(cardano): inject Plutus POSIXTime in milliseconds for mempool validation

The phase-2 script-context evaluation in the mempool built pallas'
`SlotConfig` directly from `ChainSummary`, which keeps `slot_length` and
`timestamp` in seconds (the Ouroboros era-summary convention). Pallas'
`SlotConfig` expects milliseconds and performs no scaling of its own
(`zero_time + (slot - zero_slot) * slot_length`), so the resulting
`POSIXTime` in `txInfoValidRange` was 1000x too small. Scripts that
inspect the validity range (deadlines, time-locks) would mis-validate.

Centralize the seconds->milliseconds conversion in a new
`ChainSummary::to_pallas_slot_config()` helper and use it from
`evaluate_tx`. This mirrors the existing gRPC era-summary boundary, which
already multiplies `timestamp` by 1000 on output.
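A minimal sketch of why the factor of 1000 should be applied once, at the boundary. `EraSummarySecs` and `SlotConfigMs` are hypothetical stand-ins, not the real `ChainSummary` or pallas `SlotConfig` types. Only the linear formula quoted above is modeled, and the sketch assumes slots before `zero_slot` never occur.

```rust
/// Hypothetical mirror of the era-summary fields, kept in seconds
/// (the Ouroboros era-summary convention).
#[derive(Debug, Clone, Copy)]
pub struct EraSummarySecs {
    pub zero_slot: u64,
    pub zero_time_secs: u64,
    pub slot_length_secs: u64,
}

/// Hypothetical millisecond-based slot config. It models only the three
/// fields used by the unscaled linear formula.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SlotConfigMs {
    pub zero_slot: u64,
    pub zero_time_ms: u64,
    pub slot_length_ms: u64,
}

impl EraSummarySecs {
    /// Seconds -> milliseconds, done in exactly one place.
    pub fn to_slot_config_ms(&self) -> SlotConfigMs {
        SlotConfigMs {
            zero_slot: self.zero_slot,
            zero_time_ms: self.zero_time_secs * 1000,
            slot_length_ms: self.slot_length_secs * 1000,
        }
    }
}

impl SlotConfigMs {
    /// zero_time + (slot - zero_slot) * slot_length, with no scaling of its own.
    /// Assumes slot >= zero_slot.
    pub fn slot_to_posix_ms(&self, slot: u64) -> u64 {
        self.zero_time_ms + (slot - self.zero_slot) * self.slot_length_ms
    }
}
```

Every term of the formula is linear in the time unit, so feeding seconds-based values straight into it yields a result exactly 1000x smaller.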

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

ad69e07a · fix/mempool-script-context-posixtime-ms · 2/62 ++ 5 --

scarmuega · Tue, 23 Jun 26 11:48:47 +0000 · dolos

release: v1.3.2

ea7960a1 · main · 3/11 ++ 2,116 --

scarmuega · Mon, 22 Jun 26 12:29:37 +0000 · dolos

chore(trp): bump tx3-resolver/tx3-cardano to 0.23.0 (#1027)

7646dd07 · main · 2/8 ++ 8 --

scarmuega · Mon, 22 Jun 26 12:04:12 +0000 · dolos

chore(trp): bump tx3-resolver/tx3-cardano to 0.23.0

Adopt the lang minor that lets the resolver bind arguments of aggregate
types (record/list/tuple/map) via the self-describing tagged wire form,
so `trp.resolve` no longer fails with `target type not supported` for a
complex argument. Bare scalar args are unaffected (decoded exactly as
before). Pulls tx3-tir 0.19.0 transitively.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

ad469f2a · pull/1027/head · 2/8 ++ 8 --

scarmuega · Sat, 20 Jun 26 00:25:04 +0000 · dolos

release: v1.3.1

534158e2 · main · 3/21 ++ 10 --

scarmuega · Sat, 20 Jun 26 00:23:50 +0000 · dolos

fix(cli): keep snapshot progress bar live during ranged download (#1026)

6d01a9d0 · main · 2/36 ++ 15 --

scarmuega · Sat, 20 Jun 26 00:03:30 +0000 · dolos

fix(bootstrap): keep snapshot progress bar live during ranged download

The bar only advanced once per completed 64 MiB chunk, so it sat still for
seconds at a time and looked frozen even though the download was progressing.

Increment progress as bytes arrive within each chunk (rolling a failed attempt's
bytes back before retry so the count never overshoots the total), and enable a
steady tick so the bar keeps redrawing while the downloader is blocked on window
backpressure.
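The byte accounting could look roughly like the following sketch. `Progress` is a hypothetical counter that stands in for the real progress bar. Each `Attempt` simulates the read sizes of one try at a chunk and performs no HTTP I/O.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical byte counter standing in for the progress bar position.
pub struct Progress {
    pos: AtomicU64,
    total: u64,
}

impl Progress {
    pub fn new(total: u64) -> Self {
        Self { pos: AtomicU64::new(0), total }
    }
    pub fn inc(&self, n: u64) {
        self.pos.fetch_add(n, Ordering::Relaxed);
    }
    pub fn dec(&self, n: u64) {
        self.pos.fetch_sub(n, Ordering::Relaxed);
    }
    pub fn position(&self) -> u64 {
        self.pos.load(Ordering::Relaxed)
    }
    pub fn total(&self) -> u64 {
        self.total
    }
}

/// One simulated attempt at a chunk: successive read sizes, and whether the
/// attempt finished (`true`) or failed after those reads (`false`).
pub struct Attempt {
    pub reads: Vec<u64>,
    pub completes: bool,
}

pub fn download_chunk(progress: &Progress, attempts: &[Attempt]) -> Result<(), &'static str> {
    for attempt in attempts {
        let mut credited = 0u64; // bytes this attempt has added to the bar
        for &n in &attempt.reads {
            progress.inc(n); // advance as bytes arrive, not once per chunk
            credited += n;
        }
        if attempt.completes {
            return Ok(());
        }
        // Roll back the failed attempt so the retry cannot overshoot the total.
        progress.dec(credited);
    }
    Err("all attempts failed")
}
```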

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

e6bda992 · fix/bootstrap-progress-bar · 2/36 ++ 15 --

scarmuega · Fri, 19 Jun 26 23:55:33 +0000 · dolos

fix(bootstrap): keep snapshot progress bar live during ranged download

The bar only advanced once per completed 64 MiB chunk, so it sat still for
seconds at a time and looked frozen even though the download was progressing.

Increment progress as bytes arrive within each chunk (rolling a failed attempt's
bytes back before retry so the count never overshoots the total), and enable a
steady tick so the bar keeps redrawing while the downloader is blocked on window
backpressure.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

bdb68fa5 · fix/bootstrap-ranged-download · 2/36 ++ 15 --

scarmuega · Fri, 19 Jun 26 20:20:57 +0000 · dolos

fix(cli): download snapshots via ranged ring buffer (#1025)

0452ec2c · main · 3/468 ++ 1 --

scarmuega · Fri, 19 Jun 26 20:07:30 +0000 · dolos

style(bootstrap): use io::Error::other for clippy io_other_error

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

817b674a · fix/bootstrap-ranged-download · 1/5 ++ 8 --

scarmuega · Fri, 19 Jun 26 20:05:02 +0000 · dolos

fix(bootstrap): download snapshots via ranged ring buffer

Bootstrapping from the Cloudflare R2 snapshot bucket intermittently failed
with "error decoding response body / operation timed out" while unpacking a
segment. The old path streamed a single HTTP response directly into the tar
extractor, so the connection stayed open for the entire multi-GB transfer and
its lifetime was coupled to disk-write backpressure. R2 tears down such
long-lived, slowly-drained responses where S3 tolerated them, and any
transient stall on that one connection was fatal and unrecoverable.

Replace the single stream with a ranged ring buffer (new `ranged` module):

- A background thread downloads the snapshot in bounded 64 MiB byte ranges,
  staging a small fixed-size window (4 chunks, ~256 MiB) on disk ahead of the
  extractor. Backpressure is applied *before* a request is issued (via a permit
  pool), so the server never sees an idle/slow-drained connection.
- Each chunk is short-lived and retried with exponential backoff on failure,
  making transient stalls recoverable instead of fatal.
- A HEAD probe selects the ranged path when the endpoint advertises
  `Accept-Ranges: bytes`; otherwise it falls back to the original single-stream
  download (with its own untimed client, since an overall timeout must not cap
  a full-body stream).
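The following std-only sketch illustrates the three mechanisms listed above under stated assumptions. The permit pool is modeled as a bounded channel prefilled with `WINDOW` tokens, and network and disk I/O are elided. Every name here (`plan_ranges`, `range_header`, `backoff`, `supports_ranges`, `run_pipeline`) is hypothetical and does not reflect the `ranged` module's actual API.

```rust
use std::ops::Range;
use std::sync::mpsc::sync_channel;
use std::thread;
use std::time::Duration;

pub const CHUNK_SIZE: u64 = 64 * 1024 * 1024; // 64 MiB per ranged request
pub const WINDOW: usize = 4; // chunks allowed ahead of the extractor

/// Split [0, total) into consecutive ranges of at most `chunk` bytes (chunk > 0).
pub fn plan_ranges(total: u64, chunk: u64) -> Vec<Range<u64>> {
    (0..total)
        .step_by(chunk as usize)
        .map(|start| start..(start + chunk).min(total))
        .collect()
}

/// HTTP `Range` header value; the last-byte position is inclusive.
pub fn range_header(r: &Range<u64>) -> String {
    format!("bytes={}-{}", r.start, r.end - 1)
}

/// Exponential backoff: base * 2^attempt, capped at `max`.
pub fn backoff(attempt: u32, base: Duration, max: Duration) -> Duration {
    base.saturating_mul(1u32 << attempt.min(16)).min(max)
}

/// Use the ranged path only if a HEAD probe advertised `Accept-Ranges: bytes`.
pub fn supports_ranges(accept_ranges: Option<&str>) -> bool {
    accept_ranges.map_or(false, |v| {
        v.split(',').any(|t| t.trim().eq_ignore_ascii_case("bytes"))
    })
}

/// Producer/consumer skeleton. A permit is taken *before* each range is
/// "requested" and returned only after the consumer drains that chunk, so at
/// most `window` chunks are ever fetched but unconsumed.
pub fn run_pipeline(ranges: Vec<Range<u64>>, window: usize) -> Vec<Range<u64>> {
    assert!(window > 0, "window must hold at least one chunk");
    let (permit_tx, permit_rx) = sync_channel::<()>(window);
    for _ in 0..window {
        permit_tx.send(()).expect("prefill permit pool");
    }
    let (chunk_tx, chunk_rx) = sync_channel::<Range<u64>>(window);

    let producer = thread::spawn(move || {
        for r in ranges {
            // Backpressure is applied before the request is issued.
            if permit_rx.recv().is_err() {
                break;
            }
            // Real code would GET `range_header(&r)` into a staging file,
            // retrying with `backoff(..)` on failure.
            if chunk_tx.send(r).is_err() {
                break;
            }
        }
    });

    let mut consumed = Vec::new();
    for r in chunk_rx {
        consumed.push(r); // the extractor would read the staged chunk here
        let _ = permit_tx.send(()); // free one window slot
    }
    producer.join().expect("producer thread panicked");
    consumed
}
```

Returning the permit only after the consumer finishes a chunk keeps the producer from opening a request it cannot drain promptly. A bounded channel on the chunks alone would block only after the bytes had already been fetched.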

Verified end-to-end against the live R2 endpoint with an ignored integration
test that downloads a prefix through the ring buffer and asserts byte-exactness
versus a direct ranged fetch, plus window backpressure and staging cleanup.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

117693be · fix/bootstrap-ranged-download · 3/471 ++ 1 --

scarmuega · Fri, 19 Jun 26 13:06:52 +0000 · dolos

release: v1.3.0

005ef0c4 · main · 3/35 ++ 10 --

scarmuega · Fri, 19 Jun 26 13:05:50 +0000 · dolos

chore: bump pallas to 1.1.1 (#1024)

43814ca1 · main · 2/66 ++ 66 --

scarmuega · Fri, 19 Jun 26 12:56:46 +0000 · dolos

chore: bump pallas to 1.1.1

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

40736ffd · pull/1024/head · 2/66 ++ 66 --

scarmuega · Fri, 19 Jun 26 12:21:39 +0000 · dolos

chore(trp): bump tx3-cardano and tx3-resolver to 0.22.0 (#1022)

dfb0c243 · main · 2/11 ++ 47 --

scarmuega · Fri, 19 Jun 26 12:21:01 +0000 · dolos

chore(cardano): update gov proposal mappings (#1023)

c18ea81e · main · 1/16 ++ 0 --

scarmuega · Fri, 19 Jun 26 12:10:27 +0000 · dolos

chore(cardano): add gov proposal mappings for new enacted proposals

Cross-referenced DBSync against hacks.rs for all networks:

- mainnet: TreasuryWithdrawals 5ad10ad3...#0 (enacted 637) and
  ParameterChange c82f3834...#0 (enacted 638)
- preview: ParameterChange 2a2dc37b...#0 (enacted 1330) and its
  superseded sibling 1ec9c47c...#0 (dropped at 1330) as Canceled

preprod had no missing entries.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

36df4614 · pull/1023/head · 1/16 ++ 0 --

scarmuega · Fri, 19 Jun 26 12:00:23 +0000 · dolos

chore(trp): bump tx3-cardano and tx3-resolver to 0.22.0

Also bump the tx3-sdk dev-dependency to 0.13.0. Pulls tx3-tir 0.18.0
in via the lockfile. Keeps the embedded TRP resolver current with the
latest published tx3 crate line; no behavioral change.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

a712134c · pull/1022/head · 2/11 ++ 47 --

adrian1-dot · Fri, 19 Jun 26 11:03:55 +0000 · dolos

fix(minibf): source address-utxo tx_hash from TxoRef, not archive block_data (#1009)

5e002aae · main · 1/4 ++ 5 --

dependabot[bot] · Fri, 19 Jun 26 03:22:28 +0000 · dolos

chore(deps): bump actions/checkout from 4 to 7 in /.github/workflows

Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 7.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v4...v7)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '7'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <[email protected]>

0d437f47 · dependabot/github_actions/dot-github/workflows/actions/checkout-7 · 4/11 ++ 11 --

scarmuega · Thu, 18 Jun 26 13:05:05 +0000 · dolos

test(cardano): reproduce import EWRAP-finalize-window resume gap (#1018)

Extends the boundary-resume harness while evaluating import crash/resume
safety across the full RUPD -> EWRAP -> ESTART boundary.

Findings:
- RUPD re-run is idempotent (PendingRewardState writes are overwrite-by-key,
  recompute is deterministic, finalize overwrites).
- ESTART shard crash + resume is safe (shard-skip via estart_progress +
  atomic, guarded finalize) — Scenario A still passes, now also asserting pots.
- EWRAP has a finalize-window gap: the last shard sets
  ewrap_progress.committed == total (EWrapProgress::apply), but EpochWrapUpV3
  (which assembles the final EndStats and rotates rolling/pparams) commits
  separately and does NOT touch ewrap_progress. On restart,
  EwrapWorkUnit::initialize reads committed == total -> is_complete() -> skips
  BOTH shards and finalize. A crash in [last EWRAP shard committed, before
  EpochWrapUpV3 commits] therefore permanently skips EpochWrapUpV3 on resume,
  so ESTART reset consumes a non-finalized `end` (default epoch_incentives) and
  un-rotated rolling stats: reserves stay too high, treasury too low.
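The following toy model of the gap uses hypothetical types that mirror only the fields named above. `resume_plan_shards_only` follows the described behavior, where a full shard counter skips both the shards and finalize. `resume_plan_with_marker` shows one possible approach using a separate finalize marker; it is an illustration, not the project's eventual fix.

```rust
/// Hypothetical shard counter, as described for ewrap_progress.
#[derive(Debug, Default, Clone, Copy)]
pub struct EwrapProgress {
    pub committed: u32,
    pub total: u32,
}

impl EwrapProgress {
    pub fn is_complete(&self) -> bool {
        self.committed == self.total
    }
}

#[derive(Debug, Default)]
pub struct Boundary {
    pub progress: EwrapProgress,
    /// Set only by the separate finalize commit (EpochWrapUpV3 in the text).
    pub finalized: bool,
}

/// Resume plan as (shards to run, run finalize?) under the described rule:
/// a full counter skips everything, including finalize.
pub fn resume_plan_shards_only(b: &Boundary) -> (u32, bool) {
    if b.progress.is_complete() {
        (0, false) // the gap: finalize is skipped even if it never committed
    } else {
        (b.progress.total - b.progress.committed, true)
    }
}

/// Hypothetical alternative: shards follow the counter, finalize follows its own marker.
pub fn resume_plan_with_marker(b: &Boundary) -> (u32, bool) {
    (b.progress.total - b.progress.committed, !b.finalized)
}
```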

Adds:
- pots (reserves, treasury, utxos, rewards, fees) to the state fingerprint so
  monetary-accounting corruption is caught, not just per-entity epoch drift.
- import_crash_ewrap_finalize_window_resumes_to_identical_state (#[ignore],
  reproduces the gap): crashes the 2nd EWRAP after all shards, before finalize,
  then resumes; pots diverge from a clean run by exactly the skipped epoch-1
  incentives (~8.99e12 reserves/treasury). Un-ignore once the EWRAP finalize
  marker is fixed.
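The following sketch shows why pots belong in the fingerprint. It assumes a hypothetical `Fingerprint` that pairs the epoch number with the five pots listed above; any amounts used with it are illustrative only. Two runs can agree on the epoch yet disagree on monetary accounting, and `pot_diffs` names which pots drifted and by how much.

```rust
/// The five pots named above; field names are illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Pots {
    pub reserves: u64,
    pub treasury: u64,
    pub utxos: u64,
    pub rewards: u64,
    pub fees: u64,
}

/// Hypothetical state fingerprint: epoch alignment alone would miss pot drift.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Fingerprint {
    pub epoch: u64,
    pub pots: Pots,
}

/// List each pot that differs, with the signed delta (b - a).
pub fn pot_diffs(a: &Pots, b: &Pots) -> Vec<(&'static str, i128)> {
    let pairs = [
        ("reserves", a.reserves, b.reserves),
        ("treasury", a.treasury, b.treasury),
        ("utxos", a.utxos, b.utxos),
        ("rewards", a.rewards, b.rewards),
        ("fees", a.fees, b.fees),
    ];
    pairs
        .iter()
        .filter(|(_, x, y)| x != y)
        .map(|&(name, x, y)| (name, i128::from(y) - i128::from(x)))
        .collect()
}
```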

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

b847b3b5 · test/boundary-resume-repro-1018 · 1/119 ++ 1 --

scarmuega · Thu, 18 Jun 26 12:51:37 +0000 · dolos

test(cardano): add deterministic epoch-boundary resume reproduction (#1018)

Adds `tests/boundary_resume.rs`, a self-contained, deterministic harness that
crosses real epoch boundaries with actual PoolState/AccountState/EpochState and
interrupts the boundary two ways, classifying lead vs lag.

Uses a shrunken-but-coherent genesis (epoch_length=100, byron k=1, f=0.05) so
the randomness/stability windows land inside the epoch and RUPD/EWRAP/ESTART
fire after a few hundred synthetic blocks, driven through ToyDomain.
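The following worked check of the window arithmetic assumes the Shelley-style multiples 3k/f for the stability window and 4k/f for the randomness stabilisation window. Whether dolos uses exactly these multiples is an assumption here. The coefficient f is kept as an exact fraction to avoid floating-point rounding.

```rust
/// Genesis parameters relevant to the boundary windows; f = f_num / f_den.
pub struct Genesis {
    pub epoch_length: u64,
    pub k: u64,
    pub f_num: u64,
    pub f_den: u64,
}

impl Genesis {
    /// ceil(m * k / f) in integer arithmetic, i.e. ceil(m * k * f_den / f_num).
    fn window(&self, m: u64) -> u64 {
        let num = m * self.k * self.f_den;
        (num + self.f_num - 1) / self.f_num
    }

    /// Assumed 3k/f stability window.
    pub fn stability_window(&self) -> u64 {
        self.window(3)
    }

    /// Assumed 4k/f randomness stabilisation window.
    pub fn randomness_window(&self) -> u64 {
        self.window(4)
    }

    /// Both windows end strictly inside the epoch.
    pub fn windows_fit(&self) -> bool {
        self.stability_window() < self.epoch_length
            && self.randomness_window() < self.epoch_length
    }
}
```

With k=1 and f=1/20, the windows are 60 and 80 slots, both inside a 100-slot epoch. The same formula on mainnet parameters (k=2160, f=0.05, 432,000-slot epochs) gives 129,600 and 172,800 slots.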

Tests:
- baseline_clean_run_crosses_boundaries_aligned (passes): a clean run crosses
  >=2 boundaries with every entity aligned to EpochState.number.
- import_crash_mid_estart_resumes_to_identical_state (passes): crash mid-ESTART
  (commit shards 0..16, no finalize) then resume yields byte-identical state.
  Refutes the import resume path as the lag source.
- rollback_across_boundary_reapplies_to_identical_state (#[ignore], reproduces
  #1018): rollback across a boundary does NOT revert the boundary transition
  (EpochState.number stays advanced) because boundary transitions are not in
  the WAL; re-applying re-fires the boundary and double-advances every entity,
  silently. Un-ignore once boundary transitions are reversible on rollback.
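The following toy model of the rollback failure uses hypothetical names. Block deltas are pushed to an undo log, but the boundary transition mutates state without being logged. Undoing and re-applying the boundary block therefore advances the epoch twice, while the block count still looks correct.

```rust
/// Toy ledger: block deltas go through an undo log (the WAL analogue),
/// but the boundary transition does not.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct Toy {
    pub epoch: u64,        // EpochState.number analogue
    pub entity_epoch: u64, // a per-entity epoch marker
    pub blocks: u64,
    wal: Vec<u64>,         // undo entries for block deltas only
}

impl Toy {
    pub fn apply_block(&mut self, crosses_boundary: bool) {
        if crosses_boundary {
            // Boundary transition: mutates state but is never logged.
            self.epoch += 1;
            self.entity_epoch += 1;
        }
        self.blocks += 1;
        self.wal.push(1);
    }

    /// Undo the last `n` logged block deltas; boundary effects stay applied.
    pub fn rollback(&mut self, n: usize) {
        for _ in 0..n {
            if let Some(delta) = self.wal.pop() {
                self.blocks -= delta;
            }
        }
    }
}
```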

Runs in --release: the synthetic generator funds tx fees and registration
deposits from un-pot-backed custom_utxos, which trips the unrelated
pots.is_consistent debug_assert at a boundary (orthogonal to the snapshot-
rotation logic under test). Debug mode skips with a message. Follow-up tracked
separately.
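The release-only gating can be expressed with `cfg!(debug_assertions)`, which is true exactly when `debug_assert!` checks are enabled (the default for builds without `--release`). The following is a minimal sketch with hypothetical function names. In an actual test file, the early return would sit inside each `#[test]` body.

```rust
/// Returns why a release-only test should skip, or None when debug
/// assertions are disabled (the default for --release).
pub fn release_only_skip_reason() -> Option<&'static str> {
    if cfg!(debug_assertions) {
        // debug_assert! checks are active, so an unrelated consistency
        // assertion could fire on un-pot-backed synthetic funding.
        Some("run with --release: the synthetic generator trips a debug_assert")
    } else {
        None
    }
}

/// Shape of a release-only test body; the harness itself is elided.
pub fn run_release_only_test() -> bool {
    if let Some(reason) = release_only_skip_reason() {
        eprintln!("skipping: {}", reason);
        return false;
    }
    // ... drive the synthetic chain across epoch boundaries here ...
    true
}
```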

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

027e49d7 · test/boundary-resume-repro-1018 · 2/433 ++ 0 --

gonzalezzfelipe · Thu, 18 Jun 26 11:52:12 +0000 · dolos

chore: switch snapshot source to R2 (#1014)

9502e292 · main · 1/1 ++ 1 --

scarmuega · Wed, 17 Jun 26 22:53:34 +0000 · dolos

fix(cardano): fail loud on lagging pool snapshots and unfinished epoch boundaries (#1017)

722f525e · main · 8/205 ++ 104 --

scarmuega · Wed, 17 Jun 26 21:17:05 +0000 · dolos

fix(cardano): fail loud on lagging pool snapshots and unfinished epoch boundaries

This is hardening, not recovery. PR #1016 made a pool whose snapshot lags the
current epoch surface at RUPD instead of panicking obscurely. This adds two
fail-loud guards so the same class of corruption is caught earlier and a
half-finished boundary can't silently double-apply. It does NOT implement true
shard resume, and it does NOT repair an already-lagging pool — those remain
open (see #1018 and the restored "TODO: implement true shard resume" notes).

Piece A — guard the silent-corruption hole. `MintedBlocksInc::apply`
accumulates the block count into the pool's positional `live` snapshot slot,
which only holds this epoch's blocks when the snapshot is aligned. A lagging
pool would silently fold later-epoch blocks into a mislabeled slot, corrupting
the `blocks_minted` reward input. `apply` now asserts the snapshot is aligned
to the block's epoch, failing at the origin (block processing) rather than as a
downstream RUPD failure. It sits in the infallible delta-apply layer alongside
its existing invariant `expect`s, so it is a descriptive panic. The block epoch
rides on a transient `#[serde(skip)]` field; WAL-stored deltas are only ever
undone (never re-applied), so the WAL format is unchanged.
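The following sketch of the Piece A guard uses hypothetical types in place of the real pool snapshot and `MintedBlocksInc` delta. `block_epoch` stands for the transient field. Serialization (the `#[serde(skip)]` attribute) is left out so that the example needs only std.

```rust
/// Hypothetical pool snapshot: the `live` slot only holds this epoch's
/// blocks when `epoch` matches the epoch being processed.
#[derive(Debug, Default)]
pub struct PoolSnapshot {
    pub epoch: u64,
    pub live_blocks_minted: u64,
}

/// Hypothetical delta. `block_epoch` models the transient field: it is set
/// during block processing and never persisted.
pub struct MintedBlocksInc {
    pub count: u64,
    pub block_epoch: u64,
}

impl MintedBlocksInc {
    pub fn apply(&self, snap: &mut PoolSnapshot) {
        // Fail at the origin instead of folding blocks into a mislabeled slot.
        assert_eq!(
            snap.epoch, self.block_epoch,
            "pool snapshot lagging: snapshot epoch {} != block epoch {}",
            snap.epoch, self.block_epoch
        );
        snap.live_blocks_minted += self.count;
    }
}
```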

Piece B — guard ESTART finalize. `commit_finalize` now asserts every shard
committed and the epoch has not advanced before rotating pools / advancing the
epoch, returning BrokenInvariant::EpochBoundaryIncomplete otherwise — a
defensive assertion that turns a would-be silent double-rotation into a loud
error. It guards the finalize step only; it does NOT make the per-shard
`AccountTransition` replay idempotent.
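The following sketch of the Piece B guard uses hypothetical state and error shapes. Only the variant name `EpochBoundaryIncomplete` and the `LEDGER-002` code come from the text, and pool rotation is elided.

```rust
/// Hypothetical error shape; only the variant name comes from the text.
#[derive(Debug, PartialEq, Eq)]
pub enum BrokenInvariant {
    EpochBoundaryIncomplete {
        committed: u32,
        total: u32,
        state_epoch: u64,
        boundary_epoch: u64,
    },
}

impl BrokenInvariant {
    /// Operator-facing error code.
    pub fn code(&self) -> &'static str {
        match self {
            BrokenInvariant::EpochBoundaryIncomplete { .. } => "LEDGER-002",
        }
    }
}

/// Hypothetical ESTART bookkeeping.
pub struct EstartState {
    pub shards_committed: u32,
    pub total_shards: u32,
    pub epoch: u64,
}

/// Finalize the boundary closing `boundary_epoch`, refusing if any shard is
/// missing or the epoch has already advanced past it.
pub fn commit_finalize(s: &mut EstartState, boundary_epoch: u64) -> Result<(), BrokenInvariant> {
    if s.shards_committed != s.total_shards || s.epoch != boundary_epoch {
        return Err(BrokenInvariant::EpochBoundaryIncomplete {
            committed: s.shards_committed,
            total: s.total_shards,
            state_epoch: s.epoch,
            boundary_epoch,
        });
    }
    // Pool rotation would happen here (elided); then advance the epoch.
    s.epoch += 1;
    Ok(())
}
```

A second call for the same boundary returns an error instead of advancing the epoch again. This surfaces the silent double-rotation the guard is meant to catch.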

Error codes + troubleshooting. The two errors are codified (LEDGER-001 pool
snapshot lagging, LEDGER-002 epoch boundary incomplete) with concise messages;
the explanatory prose and operator guidance live in a new
docs/content/operations/troubleshooting.mdx page.

Out of scope: making boundary resume idempotent (the real fix, tracked in
#1018), and rebuilding an already-corrupted pool snapshot window from the
archive. A node that already persisted a lag keeps failing loud with LEDGER-001
and needs a re-bootstrap.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

7d008e47 · fix/boundary-pool-recovery · 8/205 ++ 104 --