Merge pull request #415 from cardano-foundation/ci/quality-gates
ci: add cross-stack quality gates
Bumps the npm_and_yarn group with 1 update in the / directory: [axios](https://github.com/axios/axios).
Updates `axios` from 1.13.6 to 1.16.0
- [Release notes](https://github.com/axios/axios/releases)
- [Changelog](https://github.com/axios/axios/blob/v1.x/CHANGELOG.md)
- [Commits](https://github.com/axios/axios/compare/v1.13.6...v1.16.0)
---
updated-dependencies:
- dependency-name: axios
  dependency-version: 1.16.0
  dependency-type: indirect
  dependency-group: npm_and_yarn
...
Signed-off-by: dependabot[bot] <[email protected]>
* asteria-game: lift PR #67 source + idempotent bootstrap
Renames components/asteria-player/ → components/asteria-game/ and
upgrades the lifted PR #67 sources so the bootstrap is safe to
re-run on container restart.
- cabal package + executable renamed asteria-player → asteria-game.
- cabal.project SRP pinned to cardano-node-clients PR #98 head
5707836b (utxo-indexer supervisor + N2C reconnect).
- flake.nix pulls cardano-node-clients utxo-indexer as a flake
input so dockerTools bundles the prebuilt binary.
- nix/docker-image.nix bundles utxo-indexer + asteria-bootstrap +
asteria-game (player) execs + composer/stub scripts; entrypoint
is the indexer, the bootstrap runs as a serial driver.
- composer/stub/ shape replaces composer/asteria/: green-baseline
heartbeat / eventually_alive / finally_alive plus a new
serial_driver_asteria_bootstrap that execs /bin/asteria-bootstrap.
- app/BootstrapMain.hs gains Asteria.Bootstrap.isAlreadyDeployed:
queries Provider for UTxOs at the asteria spend address and
short-circuits if any UTxO carries the @"asteriaAdmin"@ token.
Antithesis can restart the asteria-game container at will and
bootstrap exits 0 quickly on subsequent invocations.
The asteria_game testnet that wires this image into Antithesis is
added in the next commit.
KNOWN GAP — admin_mint validator is the always-true placeholder
PR #67 ships. The Haskell-side detection plus Antithesis's
@serial_driver_@ scheduling are the contract until a follow-up PR
replaces admin_mint with a one-shot policy parameterised on a seed
@OutputReference@.
Tracks: #67 (asteria-spawn-v2), #98 (utxo-indexer supervisor).
Closes companion: #108 (idempotent bootstrap, content folded here).
* asteria-game: split into a dedicated testnet (testnets/asteria_game/)
Adds the asteria-game workload as an isolated testnet so iteration
on the real asteria game can land on `main` without disturbing the
canonical scheduled run on `cardano_node_master`.
testnets/asteria_game/ (copied from cardano_node_master, then edited):
- same producer/relay/tracer/sidecar/log-tailer/tx-generator
topology.
- asteria-game service replaces asteria-stub: same indexer-driven
composer harness plus /utxo-keys mount so bootstrap can read
the genesis wallet skey, and CARDANO_NODE_SOCKET_PATH +
NETWORK_MAGIC env vars so Asteria.Provider.settingsFromEnv
resolves to relay1's N2C socket.
- asteria-game image tag pinned to 3042c0a (the prior commit on
this branch — last commit to touch components/asteria-game/).
- testnets/cardano_node_master/ untouched — its scheduled
Antithesis run is unaffected.
Pipeline (additive, no edits to cardano_node_master jobs):
- scripts/push-asteria_game_images.sh — sibling of
push-cardano_node_master_images.sh, scans testnets/asteria_game
for image tags and resolves each via the same nix build path.
- .github/workflows/publish-images.yaml — new
smoke-test-asteria-game job runs scripts/smoke-test.sh against
testnets/asteria_game; the existing cardano_node_master jobs
are unchanged.
Locally validated: 3-run idempotence (1 cold deploy, 2 short-circuit
via Asteria.Bootstrap.isAlreadyDeployed) on the asteria_game compose.
Antithesis dispatch wiring for this testnet is a follow-up PR.
* asteria-game: wire player parallel driver (single-pass loop)
Adds the player workload to the asteria_game testnet's composer
harness. PlayerMain.hs is reshaped from a forever-loop to a
single-pass binary so the Antithesis composer can re-fire it on
its own schedule (a forever loop blocks exclusive scheduling for
serial drivers).
components/asteria-game/composer/stub/parallel_driver_asteria_player.sh:
- Picks ASTERIA_PLAYER_ID in {1,2,3} based on the wallclock so
different timelines exercise different players.
- Player 1 attempts the spawn (PlayerMain gates on id == "1");
players 2 and 3 observe the asteria UTxO without acting,
exercising the read path.
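A minimal sketch of the wallclock-based player selection described above. The source only says the id is picked "based on the wallclock"; the epoch-seconds-modulo-3 mapping and the helper name pick_player_id are assumptions for illustration, not the script's actual contents.

```shell
# Hypothetical helper: map a wallclock timestamp to a player id in {1,2,3}.
# Accepts an optional epoch-seconds argument so the mapping can be tested
# deterministically; defaults to the current time.
pick_player_id() {
  now=${1:-$(date +%s)}
  echo $(( now % 3 + 1 ))
}

# Export the id so the player binary can read it from its environment.
ASTERIA_PLAYER_ID=$(pick_player_id)
export ASTERIA_PLAYER_ID
```

Because each timeline reaches the driver at a different wallclock value, any mapping of this kind spreads fires across the three players without shared state.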
components/asteria-game/app/PlayerMain.hs:
- main: drop the forever loop and the in-process IORef Bool
"have I spawned yet" guard. Spawn idempotence comes from chain
state — once the asteria UTxO has been consumed-and-replaced
with a higher ship_counter, the next attempt's tx is rejected
by the validator as expected, and reported via the existing
asteria_player_ship_spawn_failed_<id> sdkUnreachable assertion.
- Adds asteria_player_pass_errored_<id> and
asteria_player_pass_completed_<id> SDK assertions so the report
can score one pass per fire even when the inner observation
bombs out.
Locally validated on testnets/asteria_game/ compose (3-pass run,
one per player_id):
- bootstrap idempotence still holds (cold deploy + 2 short-circuit)
- player 1 attempts spawn end-to-end (observe → build → sign →
submit → reject); players 2, 3 observe-only
- sdk.jsonl shows ship_counter and move_planned per pass
KNOWN ISSUE — submission is currently rejected by the spacetime
validator with "PlutusV3 script failed: overspending the budget"
(CekError, ~600k cpu over). This is a pre-existing issue in the
lifted PR #67 Aiken validators — the off-chain wiring works
correctly, but the on-chain validator over-spends its execution
units. Tracked as a follow-up; does not block landing this driver
since the wiring + report assertions are independent of validator
correctness.
* asteria-game: bump compose tag to ac7d5c0 (player driver)
* asteria-game: bump cardano-node-clients pin post #113 (eval-after-balance fix)
cabal.project SRP + flake input cardano-node-clients tag
5707836b → 9db6672a (merge commit of upstream PR #113,
"fix: evaluate exunits after balancing").
Resolves #112: spawnShip submission is no longer rejected with
"PlutusV3 script failed: overspending the budget". The asteria
AddNewShip validator's three list.filter passes over the outputs
are now evaluated against the post-balance TxInfo (which includes
the change output balanceTx adds), so the patched ExUnits cover
the real cpu cost.
Locally validated on testnets/asteria_game/ compose:
- cold bootstrap: asteria_bootstrap_asteria_created (success)
- player pass id=1: asteria_player_ship_spawned_1 (success, was
asteria_player_ship_spawn_failed_1 before the bump)
- ship_counter advances 0 → 1 in sdk.jsonl
Subsequent spawn attempts now fail with BalanceFailed
InsufficientFee — a separate wallet-UTxO selection concern on
follow-up txs, not the validator budget bug.
Tracks:
- cardano-foundation/cardano-node-antithesis#112 (closed by this commit)
- lambdasistemi/cardano-node-clients#112 / #113 (upstream root cause)
* asteria-game: pickWalletUtxo selects largest pure-ada UTxO
After the first spawn, the genesis wallet has two UTxOs at its
address: a small change output (~9.5 ADA) and the original genesis
UTxO. The previous "first UTxO" selection picked the change output
on subsequent passes, and the spawn tx then failed with
@BalanceFailed InsufficientFee@ because the change output's
lovelace was below the required fee + outputs.
New rank: pure-ada UTxOs first (so balanceTx's change doesn't
have to carry token dust), then by descending lovelace. With this,
back-to-back spawn passes now both succeed and ship_counter
advances 1 → 2 in sdk.jsonl on the local cluster.
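The ranking can be illustrated with a small shell sketch. This is not the Haskell pickWalletUtxo itself: the input format (one UTxO per line as "ref lovelace token_count") and the function name pick_wallet_utxo are assumptions made only to show the ordering.

```shell
# Hypothetical illustration of the rank: pure-ada UTxOs first, then
# descending lovelace. Reads "ref lovelace token_count" lines on stdin
# and prints the ref of the best candidate.
pick_wallet_utxo() {
  # Prefix each line with 0 (pure ada) or 1 (carries tokens) ...
  awk '{ print ($3 == 0 ? 0 : 1), $2, $1 }' |
    # ... sort by that flag ascending, then by lovelace descending ...
    sort -k1,1n -k2,2nr |
    # ... and keep only the ref of the top-ranked UTxO.
    head -n 1 |
    awk '{ print $3 }'
}
```

With a 9.5 ADA change output and the large genesis UTxO both pure ada, the genesis UTxO wins on lovelace; a token-carrying UTxO loses to both regardless of its lovelace.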
(Concurrent passes within the same slot can still race on
ConwayMempoolFailure "All inputs are spent" — that's a pacing
artifact of the local-test cadence, not a problem under the
Antithesis composer's natural per-driver gap.)
* asteria-game: add asteria-invariant binary + anytime/finally drivers
Rounds out the asteria_game testnet's report-assertion surface with
two new property checks driven by a third exec, /bin/asteria-invariant:
- admin_singleton (sdkAlways) — exactly one asteriaAdmin NFT exists
at the asteria spend address. The bootstrap mints the NFT once;
nothing in the designed game flow burns or duplicates it, so any
deviation is a real bug.
- consistency (sdkSometimes) — the asteria UTxO's ship_counter
equals the count of SHIP* tokens at the spacetime spend address.
True after pure-spawn flows; later quit/mine flows will burn
ships and break the equality, so this is a sometimes-property.
components/asteria-game/app/InvariantMain.hs reads ASTERIA_INVARIANT
and emits exactly one SDK assertion per invocation.
components/asteria-game/asteria-game.cabal exposes a third executable
asteria-invariant (built into the docker image).
components/asteria-game/composer/stub/anytime_asteria_admin_singleton.sh
fires the always-property at random points in the test.
components/asteria-game/composer/stub/finally_asteria_consistency.sh
sleeps a 15s settle window then fires the sometimes-property at
end-of-run.
Locally validated on testnets/asteria_game/:
- pre-bootstrap: admin_count=0, hit=false (correctly flags missing
deploy)
- post-bootstrap: admin_count=1, hit=true
- consistency at ship_counter=0/ships=0: hit=true (vacuously)
* asteria-game: bump compose tag to 8cb6bb2 (invariant drivers)
* asteria-game: strip testnet to nodes + asteria-game only
testnets/asteria_game/ is here to exercise the asteria game under
fault injection — the supporting cast (tracer, tracer-sidecar,
sidecar, log-tailer, tx-generator) was carried over from
cardano_node_master but adds no signal we score in this testnet.
Removed services:
- tracer + tracer-sidecar — unused; the asteria-game container
has its own SDK fallback path at /tmp/sdk.jsonl.
- sidecar + log-tailer — would have surfaced node logs in the
Antithesis report but we aren't scoring node-internal events
here.
- tx-generator — produces background traffic. The asteria
workload is the only chain activity worth tracking.
- tracer-config.yaml — dead config file, deleted.
Also drops the @--tracer-socket-path-connect@ flag from the node
commands (no tracer to connect to) and the @tracer:@ volume +
mounts.
Resulting service list: configurator (one-shot), p1/p2/p3
producers, relay1/relay2 relays, asteria-game.
Locally validated end-to-end on the stripped cluster:
- bootstrap: asteria_bootstrap_asteria_created
- spawn pass id=1: asteria_player_ship_spawned_1
- admin_singleton invariant: count=1, hit=true
- consistency invariant: counter=0/ships=0, hit=true
* Revert "asteria-game: strip testnet to nodes + asteria-game only"
This reverts commit 80aaa6e37ba42b49172e1acbbb4ce2a2f7387573.
* asteria-game: drop tx-generator from testnets/asteria_game/
The tx-generator daemon produces background transaction traffic.
On testnets/asteria_game/ we explicitly want the asteria game's
own spawnShip / move / mine / quit traffic to be the only chain
activity — anything else is noise that distorts the report's
view of the workload we're scoring.
Service list now: configurator (one-shot), p1/p2/p3 producers,
relay1/relay2 relays, tracer, tracer-sidecar, sidecar,
log-tailer, asteria-game. Observability + chain telemetry
preserved; only the synthetic-traffic generator removed.
* asteria-game: one-shot admin_mint policy parameterised on seed UTxO
Replaces the always-true admin_mint placeholder that PR #67 ships. The
bootstrap's "no double-mint on container restart" contract was
previously enforced only by the off-chain isAlreadyDeployed
check; this commit puts the contract on chain.
Aiken side (components/asteria-game/aiken/validators/admin_mint.ak):
- validator admin_mint(seed: OutputReference) succeeds iff the
tx consumes the seed AND the bundle minted under this policy
is exactly [(asteriaAdmin, 1)]. Seed consumption is permanent
on Cardano UTxO, so admin_mint can fire at most once across
all chain history.
- aiken/plutus.json regenerated; admin_mint now declares one
parameter "seed". All 91 existing tests pass.
- apply-params.sh / plutus-applied.json no longer in the build
path — Haskell applies the seed at runtime.
Haskell side:
- new Asteria.Validators.applyScripts: takes a seed TxIn and
returns AppliedScripts { adminMint, pellet, asteria,
spacetime } with both Plutus scripts and ledger-side hashes.
Built on plutus-ledger-api's uncheckedDeserialiseUPLC +
UntypedPlutusCore.applyProgram + serialiseUPLC. Hash
dependencies are threaded in declaration order
(admin_mint → pellet → asteria → spacetime).
- new Asteria.Deploy module: read/write
/asteria-deploy/seed.json. Bootstrap is the only writer;
player + invariant are readers. ASTERIA_DEPLOY_DIR env var
overrides the path.
- BootstrapMain: reads seed.json on startup. If present,
re-derives the same scripts (deterministic). If absent,
picks a fresh wallet UTxO via pickWalletUtxo, writes the
seed to disk BEFORE submitting the deploy tx (durable order
so a crash leaves either no file or a consistent file).
isAlreadyDeployed simplified to "any UTxO at the per-deploy
asteria addr" — under the one-shot policy at most one such
UTxO can ever exist.
- PlayerMain + InvariantMain: read seed.json, applyScripts,
use AppliedScripts.as{Asteria,Spacetime,Pellet,AdminMint}
{Script,Hash} in place of the former top-level constants. If
the seed file is missing, emit asteria_*_seed_missing_<id>
as sdkUnreachable and exit cleanly (correct: bootstrap hasn't
run yet on this cluster).
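The persist-before-submit ordering for the seed can be sketched in shell. This is an illustration only: the real logic lives in the Haskell Asteria.Deploy and BootstrapMain modules, the on-disk format of seed.json is not given here (the sketch stores an opaque string), and resolve_seed plus the temp-file-and-rename step are assumptions.

```shell
# Hypothetical sketch: reuse a persisted seed if present, otherwise pick
# one with the supplied command and persist it BEFORE any tx is submitted.
# Usage: resolve_seed <picker-command> [args...]
resolve_seed() {
  deploy_dir=${ASTERIA_DEPLOY_DIR:-/asteria-deploy}
  seed_file="$deploy_dir/seed.json"

  # A previous run already chose a seed: reuse it so scripts re-derive.
  if [ -f "$seed_file" ]; then
    cat "$seed_file"
    return 0
  fi

  # First run: pick a fresh seed with the caller's picker command.
  seed=$("$@") || return 1

  # Write to a temp file and rename, so a crash leaves either no seed
  # file or a complete one, never a partial write.
  tmp="$seed_file.tmp.$$"
  printf '%s\n' "$seed" > "$tmp" && mv "$tmp" "$seed_file" || return 1
  cat "$seed_file"
}

# Only after resolve_seed succeeds would the deploy tx be submitted, e.g.
#   seed=$(resolve_seed some_picker) && submit_deploy_tx "$seed"
# (some_picker and submit_deploy_tx are placeholders, not real commands).
```

Calling it a second time returns the first seed even if the picker would now choose differently, which is the property the restart path relies on.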
Compose:
- new asteria-deploy named volume mounted at /asteria-deploy
on the asteria-game container.
Locally validated end-to-end on testnets/asteria_game/:
- pre-bootstrap: asteria_invariant_seed_missing_admin_singleton
fires (correct)
- bootstrap cold: asteria_bootstrap_seed_picked +
asteria_bootstrap_seed_persisted + asteria_bootstrap_asteria_created
(asteria addr is now per-seed: 77a02b8e... instead of the
previous hardcoded 0824601a...)
- container restart: asteria_bootstrap_seed_reused +
asteria_bootstrap_already_deployed short-circuit
- spawn: asteria_player_ship_spawned_1 succeeds with the
seed-derived validators
- consistency: ship_counter=1, ship_token_count=1, hit=true
- admin_singleton: count=1, hit=true (post-bootstrap and
post-spawn)
* asteria-game: bump compose tag to 0546954 (one-shot admin_mint)
* asteria-game: drivers exit 0 on transient not-yet-ready conditions
The Antithesis 1h dispatch on testnets/asteria_game/ surfaced
two "Always: Commands finish with zero exit code" findings on
stub/serial_driver_asteria_bootstrap.sh and
stub/parallel_driver_asteria_player.sh. Decoded composer events:
- bootstrap: t=62.76s rc=1 (6.14s), t=106.5s rc=1 (52s),
t=110.5s rc=1 (55s), t=178.5s rc=0 (3s short-circuit)
- player: t=54.4-54.6s rc=1 (3 fast failures, ~5-15ms each),
t=249.9s rc=0 (3.04s)
The early bootstrap failures hit before the cluster forged its
first block (build/sign/submit fails on protocol-params or
validity-interval errors). The player failures are PlayerMain
calling `error` when seed.json is missing — the parallel driver
fires before bootstrap completes.
Antithesis treats every non-zero exit as an Always-violation,
regardless of subsequent successful fires. This commit makes
both drivers exit 0 on transient "not yet ready" conditions:
- PlayerMain: when readSeed returns Nothing, fire
asteria_player_seed_missing_<id> (sdkUnreachable, already
present) and return cleanly. The next composer fire retries.
- BootstrapMain.runDeploy: catch any exception from
resolveSeed / createAsteria, fire
asteria_bootstrap_create_asteria_deferred (sdkSometimes True)
with the error string, and return cleanly. Subsequent fires
re-derive the same seed (deterministic via seed.json) and
retry — the seed UTxO is preserved when the failure is
pre-submit, and consumed cleanly when the deploy actually
landed (next fire short-circuits via isAlreadyDeployed).
The signals Antithesis cares about — whether the deploy ever
succeeded, whether the singleton invariant held — still flow
through the existing sdkSometimes / sdkAlways assertions. We
just stop using process exit code to encode "world not yet
ready", which Antithesis interprets as a real bug.
* asteria-game: shadow orphan chain-sync-client driver + bump compose tag
The sidecar:f889dbc image bakes
/opt/antithesis/test/v1/chain-sync-client/parallel_driver_flaky_chain_sync.sh
which expects the adversary daemon — a separate component absent
from this testnet. Without it the script fails (exit 1) on every
fire and trips Antithesis's "Always: zero exit" property.
Mounting tmpfs over the chain-sync-client/ path on the sidecar
container hides the script from the composer at start time
without modifying the upstream image. Newer sidecar tags have
the script removed at the source but introduce Amaru-specific
drivers that would fail on this Amaru-less testnet — net no
gain. The targeted shadow is the cleanest fix.
Bumps the asteria-game compose tag to d0d9531 (drivers exit 0
on transient not-yet-ready conditions).
* asteria-game: stub the cluster-reconvergence finally-check
Adds testnets/asteria_game/no-op-finally.sh and bind-mounts it
over the sidecar:f889dbc image's
/opt/antithesis/test/v1/convergence/finally_tips_agree.sh.
That driver enforces "all producer tips at exact same slot at
end-of-run" via an SDK Always assertion. On 1h runs under fault
injection, tip drift recovers slowly after faults stop, so
the check fires false purely on duration, not on a real
reconvergence bug. The check is also orthogonal to the asteria
game contract this testnet scores — asteria observes the chain
through relay1 and tolerates short-lived tip lag.
Other convergence drivers (eventually_converged,
parallel_driver_tip_agreement, serial_driver_tip_agreement) are
unaffected and continue to run from the unmodified sidecar
image. Their during-fault and probabilistic checks remain a
real cluster-health signal.
* asteria-game: top-level catch in BootstrapMain + drop sidecar
Run 2 (commit 483d327) on testnets/asteria_game/ surfaced 3
findings, two of which were unchanged from run 1:
1. stub/serial_driver_asteria_bootstrap.sh — non-zero exit
2. chain-sync-client/parallel_driver_flaky_chain_sync.sh —
non-zero exit
3. convergence/finally_tips_agree.sh — non-zero exit (rc=101)
(1) Bootstrap continued to fail because `withN2C` and other
calls *outside* runDeploy's local try block could still throw
(connection failures, transient queryUTxOs errors when relay1
is being faulted, etc.). Antithesis treats every non-zero exit
as a real "Always: zero exit code" violation regardless of
subsequent successful fires. Wraps the entire post-startup body
(everything after the `_starting` and `_wallet_loaded` SDK
events) in `try`; on any uncaught exception fires
asteria_bootstrap_deferred (sdkSometimes True) and exits 0.
(2) and (3) failed because the prior compose-level mounts
(tmpfs over /opt/antithesis/test/v1/chain-sync-client and
bind-mount of a no-op over convergence/finally_tips_agree.sh)
have no effect — Antithesis's composer discovers driver scripts
at image-bake time, not at container-runtime, so per-service
mounts are ignored. The convergence finding was rc=101 instead
of "missing" because the composer ran the original baked
version, not the bind-mounted no-op.
The proper fix is to drop the sidecar service entirely from
this testnet's compose. The sidecar:f889dbc image is the *sole*
source of both /opt/antithesis/test/v1/convergence/ and
/opt/antithesis/test/v1/chain-sync-client/ scripts. With no
sidecar container the composer has no host for those drivers
and they disappear from the run. Tracer / tracer-sidecar /
log-tailer remain (none of those bake composer scripts) so
report observability is preserved.
testnets/asteria_game/no-op-finally.sh is removed (it was a
runtime-mount workaround that didn't take effect).
* asteria-game: bump compose tag to c99e992 + drop sidecar service
* Revert "asteria-game: bump compose tag to c99e992 + drop sidecar service"
Walked back the sidecar drop. The asteria_game testnet is
cardano_node_master + the asteria-game container; the cluster
infrastructure (sidecar / convergence / chain-sync-client) must
hold under fault regardless of what asteria does. Whatever
asteria introduces — utxo-indexer load on relay1, spawn tx
churn — has to be the thing fixed, not the cluster's
invariants.
Also strips the runtime-mount workarounds added to the sidecar
block (the ./no-op-finally.sh bind-mount and the
chain-sync-client tmpfs override). They were inert anyway —
Antithesis's composer discovers driver scripts at image-bake
time, not container-runtime — so leaving them only created
cargo-cult clutter.
Compose tag bumped to c99e992 (BootstrapMain top-level catch),
which is the only legitimate fix from the previous attempt.
This reopens the question raised by run 2: are
convergence/finally_tips_agree.sh and chain-sync-client/
parallel_driver_flaky_chain_sync.sh failing because relay1 is
falling behind p1/p2/p3 under load from the asteria-game
container's utxo-indexer ChainSync? If so, the fix is in
asteria-game (don't stress relay1), not in dropping the
checks.
* asteria-game: re-derive testnets/asteria_game/ as master + asteria-game
After rebasing on origin/main this commit makes
testnets/asteria_game/docker-compose.yaml exactly equal to
testnets/cardano_node_master/docker-compose.yaml in its first
197 lines, plus a single asteria-game service block and the two
asteria-specific named volumes (asteria-game-db, asteria-deploy)
appended.
The asteria_game testnet is now provably:
cardano_node_master + the asteria-game container
Anything that holds on master should hold here; anything that
breaks on master is master's concern. The
chain-sync-client/parallel_driver_flaky_chain_sync.sh and
convergence/finally_tips_agree.sh failures we saw on runs 1 and 2
appear on cardano_node_master 1h scheduled runs too — they are
inherited from master, not introduced by asteria.
Also carries over tx-generator.disabled.yaml verbatim from
master, even though this testnet had already removed
tx-generator earlier — keeps the directory contents symmetric so
"how to re-enable tx-generator on asteria_game" is the same
exercise as on master.
The compose tag still points at c99e992 (BootstrapMain
top-level catch — the only asteria-side defensive fix that
survived the cleanup).
* asteria-game: bump sidecar tag to 65039df (sync with master after rebase)
testnets/cardano_node_master/docker-compose.yaml was bumped to
sidecar:65039df on origin/main in commit dcef9bc, which drops the
orphan chain-sync-client/parallel_driver_flaky_chain_sync.sh
probe baked into older sidecar images. Rebasing pulled that
master-side change in for the cardano_node_master testnet but
left this testnet's sidecar at the prior f889dbc — restoring the
"asteria_game = master + asteria-game container" invariant
requires bumping it here too.
This is also the asteria-side reason that finding kept showing
up in runs 1, 2 and 3: each used sidecar:f889dbc which still
bakes the orphan driver. With sidecar:65039df the orphan is gone
and the inherited finding should disappear in run 4.
* asteria-game: relax stub liveness probes from .ready=true to slotsBehind<=5
Run 3's report flagged 'stub finally_alive holds' as
sdk_sometimes False, with last_reply
{processedSlot:60, ready:false, slotsBehind:0, tipSlot:60} —
the indexer was *at the chain tip* (processedSlot==tipSlot,
slotsBehind=0) but its 'ready' boolean was still false.
The indexer's 'ready' flag has stricter semantics than
slots-behind-the-tip: it requires lifecycle events past the
ChainSync warmup (likely "have I received at least one
RollForward since (re)connection"). Under fault injection,
relay1 frequently restarts and the indexer reconnects via the
PR #98 supervisor; if the chain is at a settled tipSlot when
the indexer reattaches, processedSlot catches up via RollBack
without a subsequent RollForward, leaving 'ready=false' until
the next block lands.
The asteria-side scripts only mean to assert "the indexer is
keeping up with the chain". slotsBehind<=5 captures that
exactly (matches --ready-threshold-slots default) and is robust
to the warmup race. The 'ready' boolean is the indexer's
stricter self-assessment and is racy under fault injection.
Same fix applied to eventually_alive.sh, finally_alive.sh, and
parallel_driver_heartbeat.sh — they all had the same overstrict
check.
* asteria-game: bump compose tag to 8c18bb0 (slotsBehind liveness check)
* asteria-game: bump cardano-node-clients pin post #120 (rsReady fix + CI gap closed)
Bumps cardano-node-clients SRP + flake input from 9db6672a (PR
#113 merge) to 428313de (PR #120 merge).
Upstream lambdasistemi/cardano-node-clients PR
https://github.com/lambdasistemi/cardano-node-clients/pull/120
folds in three things this branch surfaced:
- Issue #119 fix: setUpstreamStatus's UpstreamConnected branch
now re-derives rsReady from current rsSlotsBehind, so a
reconnect to a chain at the indexer's last seen tip flips
ready=true immediately instead of waiting for the next
rollForward (which never came under fault injection on a
short 1h test run).
- Issue #121 CI gap: nix/checks.nix + ci.yml + justfile now
run the unit-tests suite. The 244-example suite had not
been executed by upstream CI; the conservation regression
(next item) had been red on origin/main since 9db6672a.
- Issue #121 fix: TxBuild's post-balance evaluator no longer
bails with EvalFailure on a script-conservation violation —
it iterates with the balanced body as the new prevTx so
Peek-driven scripts re-read the post-balance fee. Mirrors
the pre-balance eval-failure retry path. Bounded by
seenFees cycle detection.
The slotsBehind <= 5 workaround in
composer/stub/{eventually_alive,finally_alive,parallel_driver_heartbeat}.sh
is left in place — it remains a more direct expression of "the
indexer is keeping up" and is now consistent with the upstream
fix (both check the same condition; one in Haskell, one in
bash). Reverting it is unnecessary.
* asteria-game: bump compose tag to 034e4c8 (post-#120 pin)
* asteria-game: wrap PlayerMain in top-level try
The composer's "Always: zero exit" property still flagged
stub/parallel_driver_asteria_player.sh (failing examples at vtimes
~272s, ~291s, ~3017s on the 2026-05-03 run). Each failing example
exits 1 in ~3s with empty stderr — consistent with an exception
escaping main from withN2C / readWalletKey before the inner
try wrapper covers it.
Mirror the c99e992 BootstrapMain pattern: wrap the whole pass in
a top-level try; on SomeException emit
asteria_player_pass_top_errored_<id> as sdkUnreachable and exit 0.
The inner observeAndAct try is preserved (same payload signal name
asteria_player_pass_errored_<id>) so the two failure surfaces stay
distinguishable in the report.
* asteria-game: bump compose tag to 68293af (PlayerMain top-level try)
* asteria-game: sync sidecar + tracer-sidecar to master
origin/main bumped both:
sidecar:65039df -> sidecar:1ff6913
tracer-sidecar@sha256:8474... -> tracer-sidecar:5271661
Keep asteria_game on the same images master is using.
* asteria-game: widen liveness probe budget to absorb cold-restart
eventually_alive.sh / finally_alive.sh were exiting 1 in timelines
where Antithesis had killed the asteria-game container as a node
fault and the eventually_ check fired ~520ms after the container
respawned. The indexer's first-boot reconnect goes through
node-replaying with exponential backoff (1s, 2s, 4s, 8s, 16s … ≈
13 s by attempt 7) plus the N2C handshake and first-block write —
roughly 30-40s before tipSlot becomes non-null. The old 25 s budget
(15 s settle + 5×2 s retries) collided with that cold-start.
Bump SLEEP_SETTLE 15->30 and MAX_ATTEMPTS 5->15 in both scripts:
60 s budget. The probes still complete in well under the
post-fault window, and the script's contract ("indexer is at chain
tip within budget") is unchanged.
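The budget arithmetic and retry shape can be sketched as follows. The variable names SLEEP_SETTLE and MAX_ATTEMPTS and the 2 s retry interval come from the text above; the function names and the loop body are assumptions, not a copy of the real scripts.

```shell
# Total probe budget in seconds: settle window plus one retry interval
# per attempt. Args: settle attempts interval.
probe_budget_seconds() {
  echo $(( $1 + $2 * $3 ))
}

# Hypothetical probe loop: wait out the settle window, then run the
# check command up to <attempts> times, sleeping <interval> between tries.
# Usage: probe_with_budget <settle> <attempts> <interval> <check-cmd...>
probe_with_budget() {
  settle=$1; attempts=$2; interval=$3
  shift 3
  sleep "$settle"
  attempt=1
  while [ "$attempt" -le "$attempts" ]; do
    if "$@"; then
      return 0            # check passed within budget
    fi
    sleep "$interval"
    attempt=$(( attempt + 1 ))
  done
  return 1                # budget exhausted
}
```

With these definitions, `probe_budget_seconds 15 5 2` gives the old 25 s budget and `probe_budget_seconds 30 15 2` gives the new 60 s one.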
Investigation: see the failing example at vtime 132.9s on
b5698f4 — fault_injector logs show
93.33 FAULT kill node target=["asteria-game"] dur=12.88
107.26 containers_meta init+start asteria-game
107.78 composer fires stub/eventually_alive.sh
132.89 attempts_exhausted=5 last_reply.tipSlot=null
* asteria-game: bump compose tag to 0bb73ca (60s liveness budget)
* asteria-game: distinguish cold-start from "indexer behind" in eventually_alive
The 1 remaining finding from the 60s-budget run was eventually_alive
firing right after a fault burst that included a relay1 kill: the
asteria-game indexer was up but had not yet received a single
RollForward from upstream, so the JSON reply was
{processedSlot:null, ready:false, slotsBehind:null, tipSlot:null}.
60s didn't help because tipSlot is populated only on the first
RollForward — when relay1 itself is still replaying its chain DB
post-restart there's nothing to catch up to.
After the existing budget runs out, inspect the last_reply:
tipSlot == null → "no chain data yet from upstream", emit
sdk_unreachable ("stub eventually_alive cold_start")
and exit 0. The composer's Always:zero-exit
property no longer flags a fault-cascade window
as a liveness failure, but the SDK signal is
still recorded in the report so we can see how
often this happens.
tipSlot != null && slotsBehind > 5 → real liveness regression,
emit sdk_sometimes false and exit 1 (unchanged).
finally_alive is left as-is: it runs at end-of-run with no faults,
so tipSlot=null there *would* be a real bug.
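A sketch of the post-budget decision, assuming tipSlot and slotsBehind have already been extracted from last_reply as plain strings (the extraction step is omitted). The function name classify_last_reply is hypothetical, and treating a non-null tipSlot with a null slotsBehind as "behind" is an assumption the text above does not cover.

```shell
# Hypothetical classifier for the indexer's last reply.
# Args: tip_slot slots_behind [threshold, default 5]
# Prints one of: cold_start | alive | behind
classify_last_reply() {
  tip_slot=$1
  slots_behind=$2
  threshold=${3:-5}

  if [ -z "$tip_slot" ] || [ "$tip_slot" = "null" ]; then
    # No chain data from upstream yet: caller emits sdk_unreachable, exits 0.
    echo cold_start
  elif [ "$slots_behind" != "null" ] && [ "$slots_behind" -le "$threshold" ]; then
    # Indexer is within the threshold of the tip.
    echo alive
  else
    # Real liveness regression: caller emits sdk_sometimes false, exits 1.
    echo behind
  fi
}
```

Keeping the classification separate from the exit-code mapping makes it easy for finally_alive to keep treating a null tipSlot as a failure while eventually_alive does not.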
* asteria-game: bump compose tag to 8438a33 (eventually_alive cold-start guard)
* asteria-game: absorb container-stop signals in driver wrappers
The d3e3a89 run still hit Always:zero-exit on
stub/parallel_driver_asteria_player.sh — the failing example shows
fault_injector applied node/stop to asteria-game (max_duration
6.92s) while two driver scripts were running inside, and they
inherited signal-induced exit codes:
3680.770 FAULT stop node target=[asteria-game] dur=6.92
3680.813 cleanup asteria-game
3692.112 anytime_asteria_admin_singleton.sh rc=137 (SIGKILL)
3692.139 parallel_driver_asteria_player.sh rc=255 (aborted)
Same root mechanism as the eventually_alive cold-start bug: a
container-stop fault races a script running inside the container,
just on the front side instead of the back side.
Add sdk_run_signal_safe to helper_sdk.sh — runs a binary, absorbs
129/137/143/124/255 into an sdk_unreachable signal + exit 0,
propagates everything else unchanged. Apply it to the four
driver scripts that previously used `exec /bin/<binary>`:
parallel_driver_asteria_player, anytime_asteria_admin_singleton,
serial_driver_asteria_bootstrap, finally_asteria_consistency.
Real binary errors (any non-zero exit outside the signal set) still
fail the script as before.
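A minimal sketch of what such a wrapper could look like. The absorbed exit codes are the ones listed above; the argument order, the message text, and the stand-in sdk_unreachable (which here only logs to stderr) are assumptions, not the contents of helper_sdk.sh.

```shell
# Stand-in for the real SDK helper; assumed signature, logs to stderr only.
sdk_unreachable() {
  echo "sdk_unreachable: $*" >&2
}

# Hypothetical wrapper: run a command, turn signal-induced exit codes into
# an SDK signal plus success, and propagate every other exit code as-is.
# Usage: sdk_run_signal_safe <label> <command> [args...]
sdk_run_signal_safe() {
  label=$1
  shift
  rc=0
  "$@" || rc=$?          # capture the exit code without tripping `set -e`
  case "$rc" in
    129|137|143|124|255)
      # SIGHUP / SIGKILL / SIGTERM / timeout / aborted: container-stop race.
      sdk_unreachable "$label interrupted rc=$rc"
      return 0
      ;;
    *)
      return "$rc"       # 0 stays 0; real binary errors still fail
      ;;
  esac
}
```

A driver would then call, for example, `sdk_run_signal_safe asteria_player /bin/asteria-game` instead of `exec /bin/asteria-game`, at the cost of keeping the shell process alive to inspect the exit code.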
* asteria-game: bump compose tag to f7ce4a2 (signal-safe driver wrappers)
drop support for x86_64-darwin
Signed-off-by: Francisco Javier Ribo Labrador <[email protected]>
Signed-off-by: goncalo-frade-iohk <[email protected]>
Signed-off-by: Francisco Javier Ribo Labrador <[email protected]>
Signed-off-by: Francisco Javier Ribo Labrador <[email protected]>
Add four Canceled(1270) entries for ParameterChange proposals on preview that were superseded by 014c32e5...#0 enacted at epoch 1270.
Without these, the proposals would fall through to Unknown and expire via max_epoch instead of being correctly canceled at 1270, mismatching DBSync timing.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Add (10, 11) entry to migrate_pparams_version so a v10→v11 boundary no longer panics in force_pparams_version.
Van Rossem is an intra-Conway-era hardfork (no new pparams, certs, gov actions, or CBOR shape), so intra_era_hardfork is the correct migration helper.
Ledger-side van Rossem support (V1/V2 cost-model backport, VRF-key uniqueness in pool registration, reference-input validation update, strict CostModels validation in PParamsUpdate) is tracked upstream in pallas issues #752, #753, #754, #755 and is gated on those.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Signed-off-by: yHSJ <[email protected]>