Merge pull request #410 from IntersectMBO/paweljakubas/fix-other-things
Fix coding standards link and docusaurus serving
Allow the local node-to-client socket to bind within seconds of node
startup rather than after the multi-hour LedgerDB replay completes.
n2c ChainSync clients (cardano-db-sync, ogmios, wallet) can begin
streaming blocks during replay; LSQ / TxSubmit / TxMonitor handlers
naturally block on the LedgerDB until it is ready, relying on the
property that NtC has no client-side timeouts.
Phase 1 (ChainDB):
- New 'LedgerDBStatus' type + 'awaitLedgerDB' STM helper
- 'openDBInternal' returns once Immutable+Volatile are open; LedgerDB
  replay and initial chain selection run on a registry-linked
  background thread (new 'runInitLedgerDB')
- LedgerDB-touching queries wrapped in 'awaitLedgerDB'
- New 'OpenedDBImmutableReady' trace event; existing 'OpenedDB'
  preserved
- 'closeDB' cancels the background initialiser and only closes the
  LedgerDB if 'cdbLedgerDBStatus' has reached 'LdbReady'
Phase 2 (NodeKernel + runWith):
- 'mkPendingMempool', 'mkPendingBlockchainTime',
  'mkPendingDurationUntilTooOld' forwarding wrappers that block on a
  TMVar until it is populated
- 'initInternalState' split into 'initInternalStateEarly' +
  'completeInternalState'
- 'initNodeKernel' split into 'initNodeKernelEarly' returning
  '(NodeKernel, m ())'; the deferred action opens the real mempool,
  computes the real GsmState, and spawns the ledger-touching
  background threads (GSM, GDD watcher, blockForging,
  blockFetchLogic, decisionLogicThreads)
- 'runWith' allocates the BlockchainTime / WrapDurationUntilTooOld
  TMVars, builds the kernel synchronously with pending wrappers,
  calls 'llrnRunDataDiffusion' immediately, and forks a
  'Node.lateInit' thread that, once the LedgerDB is ready, runs
  'hardForkBlockchainTime' and 'realDurationUntilTooOld', populates
  the TMVars, and runs the completer
Verified: 'cabal build all' is clean and 'cabal test storage-test'
passes 69/69 on the 3.0.1.0 release tag.
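The readiness gate and the pending-wrapper pattern described above can be modelled roughly as follows. This is a simplified sketch using the plain `stm` package rather than the monad-polymorphic io-classes interfaces; `LedgerDBStatus`, `awaitLedgerDB` and `lateInit` only mirror names from this message, and `Pending`, `MiniMempool`, `mkPendingMiniMempool`, `closeIfReady` and `demo` are hypothetical stand-ins, not the real signatures.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.STM

-- Hypothetical shape: the status carries the LedgerDB handle once replay is done.
data LedgerDBStatus l = LdbInitialising | LdbReady l
  deriving (Eq, Show)

-- Block until the background initialiser publishes 'LdbReady'.
-- 'retry' suspends the transaction until 'statusVar' is written.
awaitLedgerDB :: TVar (LedgerDBStatus l) -> STM l
awaitLedgerDB statusVar = do
  status <- readTVar statusVar
  case status of
    LdbReady ldb    -> pure ldb
    LdbInitialising -> retry

-- closeDB-style guard: only close the LedgerDB if initialisation finished.
closeIfReady :: TVar (LedgerDBStatus l) -> (l -> IO ()) -> IO ()
closeIfReady statusVar closeLdb = do
  status <- readTVarIO statusVar
  case status of
    LdbReady ldb    -> closeLdb ldb
    LdbInitialising -> pure ()

-- A value that is filled in later; readers block until then.
newtype Pending a = Pending (TMVar a)

newPending :: IO (Pending a)
newPending = Pending <$> newEmptyTMVarIO

-- Fill once; returns False (and changes nothing) if already filled.
fillPending :: Pending a -> a -> STM Bool
fillPending (Pending var) = tryPutTMVar var

-- 'readTMVar' retries while empty and leaves the value in place.
awaitPending :: Pending a -> STM a
awaitPending (Pending var) = readTMVar var

-- Hypothetical one-method stand-in for the real Mempool record.
newtype MiniMempool = MiniMempool { getSize :: IO Int }

-- Forwarding wrapper: each call waits for the real mempool, then delegates.
mkPendingMiniMempool :: Pending MiniMempool -> MiniMempool
mkPendingMiniMempool pending = MiniMempool
  { getSize = do
      real <- atomically (awaitPending pending)
      getSize real
  }

-- Late initialisation: wait for the LedgerDB, build the real component
-- from it, then publish it so blocked callers of the wrapper proceed.
lateInit :: TVar (LedgerDBStatus l) -> (l -> IO a) -> Pending a -> IO ()
lateInit statusVar openReal pending = do
  ldb  <- atomically (awaitLedgerDB statusVar)
  real <- openReal ldb
  _    <- atomically (fillPending pending real)
  pure ()

-- Usage: the wrapper is handed out immediately; "replay" finishes 10 ms later.
demo :: IO Int
demo = do
  status  <- newTVarIO LdbInitialising
  pending <- newPending
  let mempool = mkPendingMiniMempool pending
  _ <- forkIO (lateInit status (\tip -> pure (MiniMempool (pure tip))) pending)
  _ <- forkIO (threadDelay 10000 >> atomically (writeTVar status (LdbReady 42)))
  getSize mempool -- blocks until lateInit has filled the TMVar
```

The point of the split is that `mkPendingMiniMempool` can be given to consumers before the LedgerDB exists; callers simply block inside STM instead of the node delaying socket startup.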
New testnet that mirrors cardano_node_master and adds the
long-running cardano-adversary daemon. Created as a separate
iteration loop for the adversary roadmap so:
- The scheduled cron continues to target cardano_node_master only.
Findings on the adversary testnet cannot regress the master
baseline because nothing automatic consumes this testnet.
- New adversarial endpoints (Tier 1.2 chain_sync_thrash,
1.3 slow_loris, Tier 2 / 3 / 4) iterate here first, dispatched
manually via:
gh workflow run "Antithesis on cardano-node testnet" \
--ref <branch> -f test=cardano_node_adversary -f duration=1
- Promotion of any change to cardano_node_master happens only
after multiple branch-dispatched runs against this testnet
produce findings_new ≤ master baseline (currently 9), and
with explicit user approval.
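The promotion rule above reduces to a small predicate. A sketch in Haskell; `canPromote` and `masterBaseline` are hypothetical names, and reading "multiple" as at least two runs is an assumption.

```haskell
-- Hypothetical encoding of the promotion rule for cardano_node_master.
-- Baseline taken from the text above ("currently 9").
masterBaseline :: Int
masterBaseline = 9

-- True only with explicit approval and at least two (assumed threshold)
-- branch-dispatched runs, each reporting findings_new <= the master baseline.
canPromote :: Bool -> [Int] -> Bool
canPromote approved findingsPerRun =
  approved
    && length findingsPerRun >= 2
    && all (<= masterBaseline) findingsPerRun
```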
Layout:
- testnets/cardano_node_adversary/ — full clone of master plus
one new `adversary` service. Same producers (p1/p2/p3),
relays (relay1/relay2), tracer, tracer-sidecar, sidecar,
tx-generator, asteria-bootstrap, asteria-player, log-tailer.
- adversary service mounts relay1-state for the control socket
and tracer:ro for chainPoints.log; depends_on relay1 +
tracer-sidecar + configurator.
- Image pinned to ghcr.io/cardano-foundation/cardano-node-antithesis/adversary:cc628d5
(current built tag from PR #104). Will be bumped per follow-up
branches as adversary code evolves.
Docs:
- docs/testnets/cardano-node-adversary.md — overview,
scheduling discipline, promotion criteria.
- docs/testnets/index.md — entry pointing at the new page.
- docs/testnets/cardano-node-master.md — sidecar row updated
to drop the now-relocated adversary driver mention.
- mkdocs.yml — nav entry for the new testnet doc.
components/adversary/ is unchanged (the wrapper image keeps
building from the same tree), and components/sidecar/ is
unchanged.
The helper_sdk_lib.sh on main is byte-identical to
tx-generator's; tx-generator's SDK assertions are ingested by
Antithesis successfully, so we keep it as-is and use the first
branch-dispatched run on this new testnet as the diagnostic
for whether the adversary container's assertions reach the SDK
ingest channel.
For the bug report, see https://github.com/agda/agda/issues/8532
Roll back the adversary service from the master testnet's compose
file until we can verify findings_new ≤ baseline on a feature branch
with workflow_dispatch --ref before re-merging.
PR #99 added the daemon (findings_new went 9 → 8, strictly better
than baseline). PR #104 added must_hit:true SDK reachable assertions
that Antithesis did not observe firing in the run, creating 5 new
findings (13 total, +4 vs baseline).
The right pre-merge process (dispatch on the PR branch via
'gh workflow run "Antithesis on cardano-node testnet" --ref <branch> -f duration=1'
and compare findings_new before merging) was available the whole
time and was not used. Both PRs landed on main on the strength of the
Compose smoke test alone, which only proves "containers come up",
not Antithesis behaviour.
Removing the service is the cheapest way to restore the baseline.
The adversary image and the daemon code in cardano-node-clients stay
intact; we re-add the compose service in a follow-up PR once a
branch-dispatched Antithesis run shows findings_new ≤ baseline.
components/adversary/ remains so the wrapper image keeps building and
publishing; only the consumer-side service entry is removed.
Tracks: https://github.com/cardano-foundation/cardano-node-antithesis/issues/89 (epic).
Signed-off-by: Eric Torreborre <[email protected]>
Three composer-side fixes for findings carried over from the previous
Antithesis run (329a599; see issues #105, #106, #107):
1. (B / #105) The not-applicable case in parallel_driver_transact.sh
   and parallel_driver_refill.sh was emitting `sdk_sometimes false ...`.
   Antithesis grades a Sometimes assertion as PASSING when at least one
   sample has condition=true; we always emitted condition=false, so
   this assertion could never be satisfied and always failed. Switch to
   `sdk_reachable`, which accumulates samples without a pass/fail
   grade: the right type for a "documented not-applicable response".
2. (C / #106) The composer scripts intentionally exited 1 on
   not-applicable to mark the tick as "skipped". Antithesis's built-in
   'Always: Commands finish with zero exit code' property has no
   opt-out and grades every non-zero exit as a failure. Switch to the
   asteria-stub convention: always exit 0 and encode tick state purely
   via SDK assertions (parallel_driver_heartbeat.sh and
   eventually_alive.sh in components/asteria-stub/composer/stub/ do
   this). Do the same in eventually_population_grew.sh: fire the
   unreachable assertion on did-not-grow and exit 0.
3. (D-adjacent / #107) Add 'index-not-ready' to the refill driver's
   set of not-applicable cases. After cardano-node-clients#110, the
   daemon's freshness gate returns IndexNotReady for refills during the
   post-reconnect stale-UTxO window; the composer should treat this as
   a documented not-applicable response, not an unknown failure.
The submit-rejected paths keep their strict `sdk_unreachable` framing:
the daemon-side freshness gate (#110) is the actual fix, and we want
any leftover submit-rejected outcomes to surface as findings rather
than be silenced.
Verification gate: a fresh 1h Antithesis run on this branch should
show 0 failures from these three findings, with the supervisor still
triggering its full reconnect load.
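To make the grading argument in items 1 and 2 concrete, here is a simplified Haskell model. The names (`sometimesPasses`, `alwaysZeroExit`, `Outcome`, `tick`, `oldNotApplicableTick`) are hypothetical and only encode the rules as this message states them; they are not the Antithesis SDK, helper_sdk_lib.sh, or the composer scripts themselves.

```haskell
-- Simplified model of the grading rules quoted in this message.

-- A Sometimes assertion passes once any sample has condition = True.
sometimesPasses :: [Bool] -> Bool
sometimesPasses = or

-- Built-in property: "Always: Commands finish with zero exit code".
alwaysZeroExit :: [Int] -> Bool
alwaysZeroExit = all (== 0)

-- Driver tick outcomes. 'index-not-ready' is the case this commit adds;
-- other not-applicable responses are carried as an opaque reason string.
data Outcome = NotApplicable String | SubmitRejected | Submitted
  deriving (Eq, Show)

data Assertion = Reachable String | Unreachable String
  deriving (Eq, Show)

-- New convention: report tick state through an assertion, always exit 0.
-- Success-path assertions are out of scope for this sketch (Nothing).
tick :: Outcome -> (Maybe Assertion, Int)
tick (NotApplicable reason) = (Just (Reachable ("not-applicable: " ++ reason)), 0)
tick SubmitRejected         = (Just (Unreachable "submit-rejected"), 0)
tick Submitted              = (Nothing, 0)

-- Old convention on not-applicable: Sometimes(false) sample, then exit 1.
oldNotApplicableTick :: (Bool, Int)
oldNotApplicableTick = (False, 1)
```

Under the old convention no number of not-applicable ticks can ever satisfy the Sometimes assertion, and each one also violates the zero-exit property; under the new one every tick exits 0 and only the strict submit-rejected path can raise a finding.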
Picks up the full reconnect-resilience stack on upstream main:
* #105: N2C reconnect supervisor + BlockedIndefinitely catch
* #110: post-reconnect indexer freshness gate (rsIndexFresh)
The freshness gate closes the stale-UTxO window between bearer
reconnect and chain-sync re-sync that produced the residual
tx_generator_*_submit_rejected and tx_generator_population_did_not_grow
Always-assertion failures on the previous Antithesis run (329a599).
Add Test.Cardano.Ledger.Api.State.Query.Examples to the api testlib.
Signed-off-by: jeluard <[email protected]>