Retain EB-critical TXs on peer backlog overflow
Problem
-------
When a node's peer TX backlog hits its cap (e.g. 10,000), incoming TXs
are silently dropped from self.txs. If a dropped TX is referenced by a
pending Endorser Block, the EB's validation scan (try_validating_eb)
finds has_tx() = false and the EB is never marked all_txs_seen. The EB
then misses its vote window and is orphaned by the next Ranking Block
(WrongEB). Because the TX is never re-offered by peers, the one-shot
missing_txs trigger — already consumed by acknowledge_tx — cannot
re-fire, leaving the EB permanently stuck.
Under Poisson-clustered RB production (e.g. seed 4 at 0.200 MB/s), this
cascade produced 48 EBs with 19 uncertified (40%), 23M peer TX drops,
and a mean of only 348 votes/EB (well below the 450 quorum).
Fix
---
Two changes in propagate_tx():
1. Move the mempool insertion check (try_add_to_mempool) BEFORE
acknowledge_tx, so that missing_txs has not yet been consumed at the
point where we decide whether to drop.
2. When PeerBacklogFull fires, check whether the TX is referenced by a
pending EB (self.leios.missing_txs.contains_key). If yes, keep the
TX in self.txs (skip the backlog, but preserve has_tx = true) and
fall through to acknowledge_tx normally. If no, drop as before.
This retains only EB-critical TXs — bounded by (pending_EBs × EB_size),
typically a few thousand entries and ~3 MB of HashMap overhead per node.
Non-critical TXs are still dropped, preserving the memory cap's purpose.
Effect on seed 4 sequential 0.200/wfa-ls (worst-case seed)
-----------------------------------------------------------
EBs uncert mean WrongEB drops peak RSS
caps (before): 48 19 348 1138 23.2M ~20 GB
caps-retain: 45 8 470 1330 5.9M ~24 GB
nocaps (ref): 46 8 473 1516 0 ~35 GB
Uncertified EBs: 19 → 8 (40% → 18%)
Mean votes/EB: 348 → 470 (near nocaps 473)
Peer TX drops: 23.2M → 5.9M (−74%)
Peak RSS: ~20 → ~24 GB (+20%, well below nocaps ~35 GB)
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>