docs(release-notes): remove self-references
Signed-off-by: Chris Gianelloni <[email protected]>
I swapped them, stupidly.
Addresses the correctness gap flagged in the review of the
Txn-on-mod_revision change: if a freshly-started process's very first
put hit GrpcDeadlineExceeded after the server had already committed it,
the next retry's compare-failure was indistinguishable from a
Carol-style restart (both have lastModRev==0). The old code's
lastModRev==0 → re-peek branch would then double-deliver the message.
Now broadcastMessages issues a one-time range query at startup
(queryInitialModRev) and seeds lastModRevVar with whatever etcd reports
as the current mod_revision of our key, or zero if the key does not yet
exist. After this seeding, the only way mod_revision can advance past
lastModRev is via our own writes, so any compare-failure unambiguously
means "we already delivered" and the caller pops.
Side-effects:
- putMessage no longer returns Bool; the re-peek path is gone.
- The compare-failure branch traces a new BroadcastDeduped event
  carrying the previous and observed revisions, so an operator
  debugging packet-loss behaviour can see when the dedup actually
  fires.
- The structurally-unreachable "compare failed AND range empty" case
  now `fail`s loudly rather than silently spinning: the surrounding
  race kills the node, and a clean restart re-seeds against whatever
  state etcd has.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
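A minimal sketch of the seeded-baseline scheme, assuming a hypothetical in-memory `FakeEtcd` (not the real etcd client API). `put_message` and `broadcast_messages` only mirror putMessage and broadcastMessages for illustration, `DeadlineExceeded` is a stand-in for GrpcDeadlineExceeded, and raising `RuntimeError` plays the role of the loud `fail`. It shows that once lastModRev is seeded from a startup range query, a compare-failure can only mean "already delivered".

```python
class DeadlineExceeded(Exception):
    """Hypothetical stand-in for GrpcDeadlineExceeded."""


class FakeEtcd:
    """Hypothetical in-memory etcd: global revision counter, per-key
    mod_revision (0 = key absent), and an option to drop one reply."""

    def __init__(self, revision=0, mod_revs=None):
        self.revision = revision
        self.mod_revs = dict(mod_revs or {})
        self.delivered = []           # what peer watchers would observe
        self.drop_next_reply = False  # simulate a reply lost after commit

    def range_mod_rev(self, key):
        return self.mod_revs.get(key, 0)

    def txn(self, key, expected, value):
        """compare mod_revision(key) == expected; success: put; failure: range."""
        current = self.range_mod_rev(key)
        if current != expected:
            return False, current
        self.revision += 1
        self.mod_revs[key] = self.revision
        self.delivered.append(value)
        if self.drop_next_reply:
            self.drop_next_reply = False
            raise DeadlineExceeded()  # committed, but the client never hears back
        return True, self.revision


def put_message(etcd, key, last_mod_rev, value, trace):
    """Returns the new lastModRev; the caller always pops afterwards."""
    ok, observed = etcd.txn(key, last_mod_rev, value)
    if ok:
        return observed
    if observed == 0:
        # Compare failed AND range empty: unreachable once the baseline is seeded.
        raise RuntimeError("compare failed and range empty")
    # Only our own writes can have advanced mod_revision: we already delivered.
    trace.append(("BroadcastDeduped", last_mod_rev, observed))
    return observed


def broadcast_messages(etcd, key, queue, trace):
    # Hypothetical analogue of queryInitialModRev: seed the baseline once.
    last_mod_rev = etcd.range_mod_rev(key)
    while queue:
        try:
            last_mod_rev = put_message(etcd, key, last_mod_rev, queue[0], trace)
        except DeadlineExceeded:
            continue  # retry the same head message with the same baseline
        queue.pop(0)
    return last_mod_rev
```

With `drop_next_reply` set on a fresh store, the first put commits at revision 1, the retry takes the failure branch, a BroadcastDeduped entry is traced, and the message is delivered exactly once. A restarted peer whose key already sits at a non-zero revision is seeded with that revision, so its first compare succeeds.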
Signed-off-by: KtorZ <[email protected]>
Signed-off-by: Chris Gianelloni <[email protected]>
Various fixes
Signed-off-by: Chris Gianelloni <[email protected]>
Signed-off-by: KtorZ <[email protected]>
Implement case on constant
The real fix should be to avoid taking references to types that carry a lifetime and merely borrow values; there's no need to pass references to them.
Signed-off-by: KtorZ <[email protected]>
Signed-off-by: KtorZ <[email protected]>
Signed-off-by: Chris Gianelloni <[email protected]>
Fix script integrity hash mismatch on preprod
Signed-off-by: KtorZ <[email protected]>
The Txn-on-mod_revision approach worked for the duplicate-broadcast bug
under packet loss, but lost messages whenever a peer restarted against
persisted etcd state. Carol's restart in
'can survive a bit of downtime of 1 in 3 nodes' is exactly that
scenario: her in-memory 'lastModRev' starts at 0, etcd's
'msg-<carol-host>' already has a non-zero 'mod_revision' from before,
so the compare fails — and the old code treated that the same as a
deadline-exceeded retry and didn't re-put. Her first 'AckSn' after
restart vanished and snapshot 3 never confirmed.
The two compare-failure cases have a clean discriminator: the in-memory
'lastModRev'. Only we ever write to our own key, so an etcd
'mod_revision' that's ahead of our recorded 'lastModRev' means one of:
(a) lastModRev == 0 — fresh process, etcd has stale-but-real state.
Our put did NOT deliver this attempt; the outer loop must re-peek
with the now-corrected 'lastModRev' and try again.
(b) lastModRev > 0 — we previously delivered something at that
revision and a later attempt has now advanced 'mod_revision'.
Only we write here, so that "later attempt" was ours and was
already delivered to peers — pop and move on, do NOT issue a
second put (the original bug we set out to fix).
'putMessage' now returns 'Bool' encoding this decision; 'broadcastMessages'
pops on 'True', re-peeks on 'False'.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
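The Bool decision described above can be sketched as follows, assuming a hypothetical in-memory `FakeEtcd`. The `put_message` and `broadcast_messages` names only mirror 'putMessage' and 'broadcastMessages' and are not the actual implementation. The sketch replays Carol's restart: her key already sits at mod_revision 5, her in-memory baseline is 0, and case (a) makes the loop re-peek and deliver instead of dropping the message.

```python
class FakeEtcd:
    """Hypothetical in-memory stand-in for etcd: one global revision
    counter plus the mod_revision of each key (0 = key absent)."""

    def __init__(self, revision=0, mod_revs=None):
        self.revision = revision
        self.mod_revs = dict(mod_revs or {})
        self.delivered = []  # puts that peer watchers would observe

    def txn(self, key, expected, value):
        """compare mod_revision(key) == expected; success: put; failure: range."""
        current = self.mod_revs.get(key, 0)
        if current != expected:
            return False, current
        self.revision += 1
        self.mod_revs[key] = self.revision
        self.delivered.append(value)
        return True, self.revision


def put_message(etcd, key, last_mod_rev, value):
    """Sketch of the Bool-returning putMessage: returns (pop, new_last_mod_rev)."""
    ok, observed = etcd.txn(key, last_mod_rev, value)
    if ok:
        return True, observed   # delivered by this attempt
    if last_mod_rev == 0:
        return False, observed  # (a) fresh process, stale etcd state: re-peek and retry
    return True, observed       # (b) an earlier attempt of ours already delivered it


def broadcast_messages(etcd, key, queue, last_mod_rev=0):
    """Pops on True; on False, re-peeks the same head message."""
    while queue:
        pop, last_mod_rev = put_message(etcd, key, last_mod_rev, queue[0])
        if pop:
            queue.pop(0)
    return last_mod_rev
```

Note the limitation: a fresh process whose very first put committed server-side but returned GrpcDeadlineExceeded also lands in case (a), so this scheme would deliver that message twice.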
Wraps each broadcast 'put' in a Txn:
  compare: mod_revision(msg-<host>) == lastModRev
  success: put(msg-<host>, value)
  failure: range(msg-<host>)
A retry whose original attempt already committed server-side (which
happens when the original 'put' returned GrpcDeadlineExceeded to the
client but had already been processed) hits the failure branch:
mod_revision has advanced past lastModRev, so no second put runs, and
the watcher on each peer never sees a duplicate revision. The range
result lets us adopt the actual mod_revision as our new baseline and
move on.
A legitimately new logical broadcast (same key, possibly even the same
content) presents a new compare attempt against the up-to-date
lastModRev, the success branch runs, and a new revision is created
exactly once.
Single key per peer; no etcd disk-space regression versus master.
Previous attempts at this:
- Receiver-side content dedup: wrong because Hydra legitimately
  re-broadcasts identical content for resubmits (e.g. 'can submit a
  timed tx' in hydra-cluster).
- Unique-per-cycle key with createRevision==0 CAS: correct but grew the
  etcd keyspace unboundedly.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
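A minimal sketch of the compare/put/range Txn semantics, assuming a hypothetical in-memory `FakeEtcd` with a `txn` method (not the real etcd client API). It replays the lost-reply retry and a later broadcast with identical content, showing that a peer watcher sees exactly one revision per logical broadcast.

```python
from dataclasses import dataclass, field


@dataclass
class FakeEtcd:
    """Hypothetical in-memory stand-in for etcd (not the real client API):
    a global revision counter plus each key's mod_revision (0 = absent)."""
    revision: int = 0
    mod_revs: dict = field(default_factory=dict)
    watched: list = field(default_factory=list)  # (revision, value) a peer watcher sees

    def txn(self, key, last_mod_rev, value):
        """compare mod_revision(key) == last_mod_rev
        success: put(key, value) -> (True, new mod_revision)
        failure: range(key)      -> (False, current mod_revision)"""
        current = self.mod_revs.get(key, 0)
        if current != last_mod_rev:
            return False, current
        self.revision += 1
        self.mod_revs[key] = self.revision
        self.watched.append((self.revision, value))
        return True, self.revision


etcd = FakeEtcd()
last = 0
# First attempt commits server-side; pretend the client got
# GrpcDeadlineExceeded and never saw the reply.
etcd.txn("msg-alice", last, "AckSn 1")
# The retry carries the same baseline, so it takes the failure branch
# and adopts the observed mod_revision as the new baseline.
ok, last = etcd.txn("msg-alice", last, "AckSn 1")
# A later, legitimately new broadcast with identical content still succeeds.
ok2, last = etcd.txn("msg-alice", last, "AckSn 1")
print(ok, ok2, etcd.watched)  # prints: False True [(1, 'AckSn 1'), (2, 'AckSn 1')]
```

Because the baseline comes from the range result rather than from the client's own bookkeeping, the retry cannot create a second revision, while a genuinely new put against the updated baseline always can.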