Cardano Foundation / cardano-node-antithesis
63 commits this week (May 04, 2026 - May 11, 2026)

chore(testnet): bump asteria-game pin to 126bb4e to activate --kill-after=2

Mirrors the fix→chore pattern of #143 (444a1a5 → dfd6a3e). The
publish-images workflow rebuilds asteria-game at the SHA pinned in
each testnet's docker-compose; without this bump the next Antithesis
run would still pull the old :444a1a5 image and not exercise the
SIGKILL escalation.

Re #145.

fix(asteria-game): add --kill-after=2 to all stub timeout-wrapped binaries

Closes #145.

After #143 landed, six of seven previously-failing stubs went green
on the first post-merge cron of 8690faa, but
stub/parallel_driver_asteria_player.sh still tripped the
Always:zero-exit-code property with one new finding (3h 19m run,
faults on). Decoded examples:

  example 1  fail  rc=1  runtime 27.27 s
  example 2  fail  rc=1  runtime 28.65 s
  example 3  fail  rc=1  runtime 47.31 s
  example 4  pass  rc=0  runtime  3.02 s

The wrapper is `sdk_run_signal_safe ... timeout 12 /bin/asteria-game`.
Plain `timeout 12` sends SIGTERM at the 12 s deadline but does not
escalate to SIGKILL — that requires --kill-after. The Haskell
binary catches SIGTERM, runs slow N2C cleanup that fails on a
torn socket, then exits rc=1 (Haskell default unhandled-exception
code). sdk_run_signal_safe deliberately propagates non-signal
exits so the property fires.

Adding --kill-after=2 escalates to SIGKILL 2 s after SIGTERM. The
kernel-killed exit (137) is in sdk_run_signal_safe's absorb set
along with 124/129/143/255, so the script terminates deterministically
inside (deadline + 2) seconds and the property stays green.
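
The escalation can be reproduced outside the testnet. The sketch below assumes GNU coreutils `timeout`; the child is a stand-in `sh -c` script (not the real asteria-game binary) that traps SIGTERM and then stalls in cleanup, and the 1 s values are shortened for illustration.

```shell
#!/bin/sh
# Stand-in child (hypothetical): catches SIGTERM, then stalls in "cleanup"
# and would exit 1 if it were ever allowed to finish.
slow_child='trap "sleep 5; exit 1" TERM; sleep 30; true'

# SIGTERM after 1 s, SIGKILL 1 s later if the child is still alive.
timeout --kill-after=1 1 sh -c "$slow_child"
rc=$?
echo "wrapper exit status: $rc"
```

GNU `timeout` reports a child it had to SIGKILL as 128 + 9 = 137, which is the code this commit relies on `sdk_run_signal_safe` absorbing.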

Same shape on three sibling stubs that haven't tripped yet but
have the same wrapper pattern; fixed all four together to match
the failure mode rather than the run-by-run symptom:

  parallel_driver_asteria_player.sh   timeout 12 → --kill-after=2 12
  anytime_asteria_admin_singleton.sh  timeout 12 → --kill-after=2 12
  finally_asteria_consistency.sh      timeout 30 → --kill-after=2 30
  serial_driver_asteria_bootstrap.sh  timeout 25 → --kill-after=2 25

Local smoke (process that catches SIGTERM and runs slow cleanup):
  - plain timeout 1   → exit 124 sometimes, child rc otherwise (leaky)
  - --kill-after=2 1  → exit 137 reliably

No semantic change for healthy runs; failure path now terminates
predictably and is absorbed.

chore(testnet): bump asteria-game pin to 444a1a5 to activate signal-trap absorbers

Mirrors the 6be8939 → 290a8ed3 pattern: ship the script fix in one
commit, then bump the docker-compose pin in a follow-up so
publish-images.yaml rebuilds asteria-game with the new scripts and
the next Antithesis run actually exercises the fix.

Updates both testnets that pin asteria-game:
- testnets/cardano_node_master/docker-compose.yaml
- testnets/cardano_node_adversary/docker-compose.yaml

Re #142.

fix(asteria-game): absorb in-bash signals + flock sdk.jsonl appends so stub/*.sh honour exit-0 contract under fault injection

Closes #142.

Run try-10 of commit 290a8ed3 reported 7 NEW findings, all under
"Always: Commands finish with zero exit code" against stub/*.sh. Try-11
on the same commit and the same image digests was clean. Same code,
different scheduling — the stubs flake when Antithesis fault injection
delivers a signal to the bash interpreter (not the wrapped binary), or
when concurrent parallel_driver invocations race on /tmp/sdk.jsonl.

Three changes in helper_sdk.sh:

- _sdk_emit now wraps its >> /tmp/sdk.jsonl append with `flock -x` on
  the open append-FD, so two concurrent shells can't interleave at the
  syscall level under FS-fault injection.
- New `sdk_install_signal_trap` installs absorbing traps on
  SIGTERM/SIGINT/SIGPIPE that emit `sdk_sometimes_optional false` and
  exit 0; sourced once at the top of every stub script.
- New `sdk_run_signal_safe_fn` extends `sdk_run_signal_safe` to wrap
  shell-function bodies, not just single-binary launches — needed for
  the heartbeat/eventually_alive/finally_alive stubs whose work is a
  printf|timeout 1 socat|jq pipeline rather than one binary.
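
A minimal sketch of the locked append described in the first bullet, assuming util-linux `flock` is installed. `emit_locked`, `SDK_LOG`, and the JSON payload are hypothetical stand-ins; the real `_sdk_emit` body is not shown in this log.

```shell
#!/bin/sh
# Hypothetical stand-in for the locked append inside _sdk_emit.
SDK_LOG="$(mktemp)"   # the real helper appends to /tmp/sdk.jsonl

emit_locked() {
  # Open the log for append on FD 9, take an exclusive lock on that FD,
  # write one line, and release the lock when the FD closes.
  {
    flock -x 9
    printf '%s\n' "$1" >&9
  } 9>>"$SDK_LOG"
}

# 10 concurrent shells x 5 emits each, as in the commit's smoke test.
for shell_id in 1 2 3 4 5 6 7 8 9 10; do
  (
    for emit_id in 1 2 3 4 5; do
      emit_locked "{\"shell\":$shell_id,\"emit\":$emit_id}"
    done
  ) &
done
wait
line_count=$(wc -l < "$SDK_LOG")
echo "lines written: $line_count"
```

Locking the already-open append FD, rather than a separate lock file, keeps the lock and the write tied to the same file description.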

Per-stub:

- parallel_driver_heartbeat.sh, eventually_alive.sh, finally_alive.sh:
  body extracted into a local `_xxx_body` function, run through the
  new fn-wrapper. Variable names lower-cased to local style.
- anytime_asteria_admin_singleton.sh, finally_asteria_consistency.sh,
  parallel_driver_asteria_player.sh, serial_driver_asteria_bootstrap.sh:
  add the signal trap as defense-in-depth around the existing
  sdk_run_signal_safe binary wrap.

Smoke-tested standalone: 10 concurrent shells × 5 emits each produces
exactly 50 valid JSON lines (no race / loss); SIGTERM mid-sleep yields
exit 0 with the trap-emitted observation; a `timeout` expiry (exit 124)
inside the fn-wrapper yields exit 0 with `must_hit:false`.
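
The SIGTERM-mid-sleep check can be sketched as follows. `install_absorbing_trap`, `stub_body`, and `OBS_LOG` are hypothetical names; the real `sdk_install_signal_trap` emits `sdk_sometimes_optional false`, which is replaced here by a plain log line.

```shell
#!/bin/sh
# Hypothetical stand-in for sdk_install_signal_trap.
OBS_LOG="$(mktemp)"

install_absorbing_trap() {
  # Record an observation, stop the background work, and exit 0.
  trap 'echo "absorbed signal" >> "$OBS_LOG"; kill "$sleeper" 2>/dev/null; exit 0' TERM INT PIPE
}

stub_body() {
  install_absorbing_trap
  sleep 30 &          # stands in for the stub's real work
  sleeper=$!
  wait "$sleeper"     # `wait` is interruptible, so the trap runs promptly
}

( stub_body ) &
stub_pid=$!
sleep 1               # give the stub time to install its trap
kill -TERM "$stub_pid"
wait "$stub_pid"
stub_rc=$?
echo "stub exit status: $stub_rc"
```

Running the work in the background and blocking in `wait` matters: a shell that is blocked on a foreground child defers its trap until that child returns.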

Real verification is the next Antithesis run on this branch.

chore(asteria-game): bump pin to 6a9a93b to activate cold_start fix

Activates the must_hit:false cold_start emit from the prior commit.
publish-images will resolve the new tag, build asteria-game from
6a9a93b (which contains both the budget fix from PR #135 and the
must_hit:false fix from this branch), push, and the next master
cron + adversary dispatch will pull the new image.

Tagging the immediate commit instead of the eventual rebase SHA on
main: publish-images resolves the tag via `git rev-list -n 1 <tag>`,
which works as long as the tag exists in the repo's git history —
6a9a93b's parent is on main, so the tag will resolve cleanly even
after rebase merge.
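
The resolution step can be tried in a throwaway repository. This sketch assumes the pin is a short commit SHA, as in the pins above; the repository, identity, and variable names are placeholders, and only `git rev-list -n 1` comes from the commit message.

```shell
#!/bin/sh
# Throwaway repository; names and identity below are placeholders.
repo="$(mktemp -d)"
git -C "$repo" init -q
git -C "$repo" -c user.name=demo -c user.email=demo@example.invalid \
  commit -q --allow-empty -m "demo commit"

short_sha="$(git -C "$repo" rev-parse --short HEAD)"   # plays the role of the pin
full_sha="$(git -C "$repo" rev-parse HEAD)"
resolved="$(git -C "$repo" rev-list -n 1 "$short_sha")"
echo "pin $short_sha resolves to $resolved"
```

The lookup succeeds only while the commit object is present in the repository being queried, which is the condition the paragraph above depends on.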

fix(asteria-game): make cold_start emit must_hit:false to silence Sometimes-failed finding

PR #135 replaced sdk_unreachable in the cold-start path with
sdk_sometimes false to drop AlwaysOrUnreachable's hit:true +
condition:false finding mode. But sdk_sometimes hardcodes
must_hit:true, so the assertion now fails differently: the report
flags any must_hit:true Sometimes that only ever sees condition:false
("Sometimes assertions → stub eventually_alive cold_start: new",
3 examples on the first master verify run after the budget bump).

Add sdk_sometimes_optional which emits with must_hit:false. Same
shape as sdk_sometimes but never a finding — observation-only. Use
it for the cold_start emit since that branch isn't guaranteed to
be reached across all timelines (depends on whether a fault-cascade
window happens to coincide with an eventually_ dispatch).

The helper refactor adds an optional 8th argument to _sdk_emit
(default true). All existing callers keep their behavior. The new
sdk_sometimes_optional is a sibling of sdk_sometimes that passes
false for must_hit. sdk_unreachable, sdk_reachable, sdk_always
unchanged.

Smoke-tested locally — sdk_sometimes_optional false emits a JSONL
event with must_hit:false as expected.
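
A simplified sketch of the optional-argument pattern. `emit_event`, `sometimes`, and `sometimes_optional` are hypothetical stand-ins with fewer arguments than the real `_sdk_emit`, and the JSON record is reduced to the fields named in this message.

```shell
#!/bin/sh
# Simplified, hypothetical emitter: the real _sdk_emit takes more arguments
# (must_hit is its optional 8th) and writes a richer JSONL record.
# emit_event TYPE MESSAGE CONDITION [MUST_HIT]
emit_event() {
  must_hit="${4:-true}"   # omitted argument keeps the old behaviour
  printf '{"type":"%s","message":"%s","hit":true,"condition":%s,"must_hit":%s}\n' \
    "$1" "$2" "$3" "$must_hit"
}

sometimes()          { emit_event sometimes "$1" "$2"; }
sometimes_optional() { emit_event sometimes "$1" "$2" false; }

sometimes cold_start false
sometimes_optional cold_start false
```

Defaulting the new trailing argument to `true` is what lets existing callers stay untouched.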

chore(asteria-game): bump pin to e49d4ab to activate budget fix

PR #135 (commit fd0543a / merged as e49d4ab on main) tightened
asteria-game's eventually_alive.sh + finally_alive.sh budgets and
dropped the AlwaysOrUnreachable cold-start emit. The fix lives in
source but the compose files in both cardano_node_master and
cardano_node_adversary still pin asteria-game:f7ce4a2 — the
pre-fix tag — so publish-images saw the existing tag in registry,
skipped the rebuild, and the buggy image is what every dispatch
and cron run still pulls.

Bumping both pins to asteria-game:e49d4ab (the merge SHA on main
that contains the fix). publish-images will resolve the new tag,
build from source at e49d4ab, and push. Master cron + adversary
dispatches will pick the new image up on next run.
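
One way to apply such a bump is sketched below. `bump_pin` is a hypothetical helper and the registry path in the demo file is a placeholder; the real compose files live under testnets/ and are not reproduced here.

```shell
#!/bin/sh
# bump_pin FILE OLD_SHA NEW_SHA: rewrite asteria-game:OLD_SHA to asteria-game:NEW_SHA.
bump_pin() {
  tmp="$(mktemp)"
  sed "s|asteria-game:$2|asteria-game:$3|g" "$1" > "$tmp" && mv "$tmp" "$1"
}

# Demo on a throwaway file; the registry path is a placeholder.
compose="$(mktemp)"
printf '%s\n' 'services:' '  asteria-game:' \
  '    image: registry.example.invalid/asteria-game:f7ce4a2' > "$compose"
bump_pin "$compose" f7ce4a2 e49d4ab
grep 'image:' "$compose"
```

Matching on `asteria-game:` plus the old SHA leaves the bare `asteria-game:` service key alone.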

Production impact: of the last three master cron runs, two failed
on the same flake we diagnosed in PR #135 (eventually_alive.sh and
finally_alive.sh tripping the composer's per-command timeout). This
bump clears the cause.

fix(asteria-game): tighten eventually/finally probe budgets + drop AlwaysOrUnreachable cold-start

Two related findings on the cardano_node_adversary 1h dispatch
(report 9_VSL0Up0MFelP0KPcfYVGa2):

  Always: Commands finish with zero exit code → stub/eventually_alive.sh
  Always: Commands finish with zero exit code → stub/finally_alive.sh
  Always assertions → stub eventually_alive cold_start

Both probes were budgeted at 30 s settle + 15×2 s retries = 60 s
worst case. The Antithesis composer's per-command timeout is well
below that — observed ≤16 s for parallel/eventually commands and
≤54 s for finally commands across multiple reports — so the
probes were getting SIGKILL'd by composer mid-loop and registering
as exit-code findings. Tightening to 3 s settle + 8×1 s = 11 s
worst case fits comfortably under any of the observed bounds.

The cold-start path used `sdk_unreachable`, which emits an
`AlwaysOrUnreachable` assertion with hit:true + condition:false —
that fires as an Always-class finding, defeating the script
comment's stated intent ("emit silently and exit 0 so a
fault-cascade window doesn't flag as a real liveness failure"). The
right primitive for an informational observation is
`sdk_sometimes false`, which records the rate without triggering a
finding when the assertion isn't continuously true. Switched the
cold-start emit to that.

Both probes now exit 0 unconditionally — the SDK assertion already
records the outcome; a non-zero shell exit just duplicates the
signal under the "Always: zero exit code" property. The Sometimes
events visible in the report make the failure mode equally
diagnosable without needing a finding.

The 13 s indexer cold-start absorption noted in the original
comments isn't lost — `slotsBehind <= 5` still polls every 1 s for
8 attempts after the 3 s settle, giving roughly the same number of
RollForward arrivals to catch up. The settle is purely the initial
"don't hammer the socket while the indexer is reconnecting" delay;
the loop's per-attempt sleep does the actual waiting.
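
The budget arithmetic and the settle-then-poll shape can be sketched as follows. `worst_case_seconds` and `probe_with_budget` are hypothetical helpers, not the probes' actual code; the check command is whatever the caller passes in.

```shell
#!/bin/sh
# Hypothetical helpers; the real probes' code is not shown in this log.

# worst_case_seconds SETTLE ATTEMPTS INTERVAL
worst_case_seconds() { echo $(( $1 + $2 * $3 )); }

# probe_with_budget SETTLE ATTEMPTS INTERVAL CHECK_CMD...
probe_with_budget() {
  settle=$1; attempts=$2; interval=$3; shift 3
  sleep "$settle"                      # initial settle delay
  attempt=1
  while [ "$attempt" -le "$attempts" ]; do
    if "$@"; then
      echo "ready after $attempt attempt(s)"
      return 0
    fi
    sleep "$interval"                  # per-attempt wait
    attempt=$((attempt + 1))
  done
  echo "not ready within budget"
  return 0                             # outcome is reported, not encoded in the exit status
}

echo "old budget: $(worst_case_seconds 30 15 2) s"
echo "new budget: $(worst_case_seconds 3 8 1) s"
```

A call such as `probe_with_budget 3 8 1 some_check` (with `some_check` standing in for the `slotsBehind <= 5` test) stays within the 11 s bound, excluding the time the check itself takes.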

chore(testnet): mirror master in cardano_node_adversary

Make `cardano_node_adversary` the dress rehearsal for the future
master testnet — the state master will reach when the adversary
container is promoted into it.

Adversary's compose now equals master's compose verbatim plus the
adversary service stanza (slotted between sidecar and
tracer-sidecar). Concrete additions:

- tx-generator service + asteria-game service (both newly active
  in master via ecf7910).
- Restored container_name labels, original config-comment
  annotations, and the default network's `name:` so the diff
  master → adversary is exactly +adversary-block.
- New volumes asteria-game-db + asteria-deploy needed by
  asteria-game.

The tracer-sidecar pin stays on the post-#129 build (8dbf509) so
adversary keeps the Layer 3 fork-depth checklist; master will pick
it up when adversary is promoted in a follow-up.

Smoke-tested locally: 3 producers responding, adversary driver
hits p1 cleanly, tx-generator drove 5/5 transacts, asteria
indexer observed refill UTxO.
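
The "exactly +adversary-block" property can be checked mechanically. The sketch below uses a hypothetical `only_additions` helper and two tiny stand-in files rather than the real compose files.

```shell
#!/bin/sh
# only_additions BASE DERIVED: succeed when DERIVED is BASE plus inserted lines.
only_additions() {
  # In default diff output, lines starting with "<" exist only in BASE.
  ! diff "$1" "$2" | grep -q '^<'
}

# Stand-in files; service bodies are placeholders.
master="$(mktemp)"
adversary="$(mktemp)"
printf '%s\n' 'services:' '  sidecar: {}' '  tracer-sidecar: {}' > "$master"
printf '%s\n' 'services:' '  sidecar: {}' '  adversary: {}' '  tracer-sidecar: {}' > "$adversary"

if only_additions "$master" "$adversary"; then
  echo "adversary = master + additions"
fi
```

Against the real files the two arguments would be the master and adversary docker-compose.yaml paths.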