praos: surface upstream gappy ChainSync via two complementary WARNs

Diagnosing a wedged catch-up against the public Leios dev relay
required grepping across the orphan / fork-mismatch INFO traffic and
inferring the cause from cache state. Two new WARNs hand the
diagnosis directly to an operator skimming logs:
- **ChainSync ingress contiguity check** in `record_peer_tip`: when an
arriving header's `prev_hash` doesn't match the hash of the previously
announced header, log the (block_no, hash) pair on each side and the
implied skipped-block count. Throttled per peer
(`GAP_WARNING_INTERVAL = 10 s`) so a sustained non-contiguous forward
doesn't flood the log. This is the direct signal — the WARN fires
the moment upstream commits the offence.
- **Stuck-validation rollup** in `retry_select_chain`: when validation
has been frozen for `STUCK_THRESHOLD = 30 s` and some peer offers a
strictly-better tip, emit one rollup line summarising stuck duration,
adopted vs best-peer block_no, the count of entries in that peer's
replay whose parent_hash we don't have locally, and the peer-chain
size. Throttled to one fire per `STUCK_WARNING_INTERVAL = 60 s`.
This covers the general "stuck for any reason" case and stays
informative when the ingress check has gone quiet under its
per-peer cooldown.
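
For illustration only, a minimal sketch of the ingress check and its
per-peer cooldown. Only `GAP_WARNING_INTERVAL = 10 s` and the
prev_hash comparison come from this change; `GapDetector`,
`PeerState`, `Header`, the `u32` peer id and the message layout are
hypothetical names chosen for the example, not the actual code in
`record_peer_tip`.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Value quoted in this message; everything else below is hypothetical.
const GAP_WARNING_INTERVAL: Duration = Duration::from_secs(10);

type Hash = [u8; 32];

#[derive(Clone, Copy)]
struct Header {
    block_no: u64,
    hash: Hash,
    prev_hash: Hash,
}

#[derive(Default)]
struct PeerState {
    last_header: Option<Header>,
    last_gap_warning: Option<Instant>,
}

#[derive(Default)]
struct GapDetector {
    peers: HashMap<u32, PeerState>,
}

impl GapDetector {
    /// Records `header` as the peer's latest announcement. Returns a
    /// warning line if it does not extend the previous announcement
    /// and the per-peer cooldown has elapsed.
    fn record(&mut self, peer: u32, header: Header, now: Instant) -> Option<String> {
        let state = self.peers.entry(peer).or_default();
        // First header from this peer: nothing to compare against.
        let prev = state.last_header.replace(header)?;
        if header.prev_hash == prev.hash {
            return None; // contiguous forward
        }
        let cooled_down = state
            .last_gap_warning
            .map_or(true, |t| now.duration_since(t) >= GAP_WARNING_INTERVAL);
        if !cooled_down {
            return None; // gap seen, but still inside the cooldown
        }
        state.last_gap_warning = Some(now);
        // Implied skipped blocks: block numbers strictly between the two.
        let skipped = header
            .block_no
            .saturating_sub(prev.block_no)
            .saturating_sub(1);
        Some(format!(
            "peer {peer}: non-contiguous header: prev=({}, {:02x}{:02x}..) new=({}, {:02x}{:02x}..) skipped={skipped}",
            prev.block_no, prev.hash[0], prev.hash[1],
            header.block_no, header.hash[0], header.hash[1]
        ))
    }
}
```

The rollup could plausibly reuse the same cooldown shape with a single
timestamp instead of one per peer; that is an assumption about the
structure, not a description of `retry_select_chain`.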
Both lines were verified against the dev relay: the ingress WARN fires
within ~30 s of catch-up reaching the wedge boundary (with the exact
missing block hash prefix in the message), and the rollup fires 30 s
later with `unreachable_parent_hashes > 0`. Both are throttled
correctly under sustained wedge load.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>