Fix clippy warnings

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

npm dev/build/start now auto-fetch via fetch-geojson.sh when the file is
missing.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Deduplicate code between actor (driver.rs) and sequential engines

- Extract NodeEvent, CpuTaskWrapper, schedule_cpu_task, complete_cpu_subtask
  into a new common.rs shared module
- Extract TxGeneratorCore from both TransactionProducer (actor) and
  TxGenerator (sequential) into tx.rs; delete TxGenerator entirely
- TransactionProducer now wraps TxGeneratorCore as a thin async actor
- WeightedLookup made pub(super) for reuse

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Rename `sequential-engine: true/false` to `engine: sequential | actor`,
with actor as the default to avoid surprises.

Add a parameters/turbo.yaml convenience preset (sequential engine, 6
shards, zero-latency-clusters) for a ~5x speedup.

Update CLAUDE.md and README.md with engine and shard documentation.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Replace the NodeListWrapper enum in actor.rs with a RunnableNode trait
object, reducing boilerplate and making the actor engine generic over any
NodeImpl. Extract new_generic<N>() to allow tests to construct simulations
with arbitrary node types.

Add TestNode — a minimal NodeImpl that ping-pongs messages between peers
and exercises timed events and CPU tasks — and run it through all 4 engine
modes (actor/sequential × single/multi-shard) plus a determinism check.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

When a shard is blocked by the CMB ceiling, sleep on the cross-shard
channel (100μs timeout) instead of busy-spinning with yield_now(). This
eliminates the massive sys-time overhead at higher shard counts.

12 shards: 7m32s → 2m36s (now matches 6-shard performance).

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

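A minimal std-only sketch of the change described above, assuming a hypothetical `CrossShardMsg` type: instead of spinning with `yield_now()` while blocked at the ceiling, the shard blocks on its cross-shard channel with a short timeout, so the thread actually sleeps until a peer sends something or the timeout expires.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::time::Duration;

// Hypothetical cross-shard message type, for illustration only.
enum CrossShardMsg {
    Event(u64),
    Shutdown,
}

/// Wait for the next cross-shard message while blocked at the CMB ceiling.
/// Blocking with a 100μs timeout replaces the busy-spin: the thread sleeps
/// instead of burning sys time, and `None` means "re-check the ceiling".
fn wait_at_ceiling(rx: &mpsc::Receiver<CrossShardMsg>) -> Option<CrossShardMsg> {
    match rx.recv_timeout(Duration::from_micros(100)) {
        Ok(msg) => Some(msg),
        Err(RecvTimeoutError::Timeout) => None, // ceiling may have moved; retry
        Err(RecvTimeoutError::Disconnected) => Some(CrossShardMsg::Shutdown),
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    tx.send(CrossShardMsg::Event(42)).unwrap();
    match wait_at_ceiling(&rx) {
        Some(CrossShardMsg::Event(t)) => println!("got cross-shard event at t={t}"),
        _ => println!("no message within timeout"),
    }
}
```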
Each shard runs its own SequentialSimulation on a dedicated OS thread
(std::thread::scope), with CMB ceiling enforcement via AtomicTimestamp and
cross-shard messages via std::sync::mpsc channels.

With 6 shards on the realistic 3000-node topology:

- Sequential+CMB: 2m28s wall (16 cores utilized)
- Actor+CMB: 3m45s wall (10 cores utilized)
- 1.5x speedup over actor sharding, 4x over actor single-shard

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

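A toy sketch of the threading scheme above, using std only: each shard is a scoped thread that publishes its time through an atomic (standing in for the project's AtomicTimestamp) and only advances while below the conservative ceiling `min(peer_time + lookahead)`. Names here are illustrative, not the project's.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

/// CMB-style ceiling: a shard may simulate up to
/// min over peers of (peer_time + lookahead).
fn ceiling(peer_times: &[&AtomicU64], lookahead: u64) -> u64 {
    peer_times
        .iter()
        .map(|t| t.load(Ordering::Acquire) + lookahead)
        .min()
        .unwrap_or(u64::MAX)
}

/// Two shards advancing virtual time on dedicated threads, each bounded by
/// the other's published time plus the lookahead. Returns the final times.
fn run_two_shards(end: u64, lookahead: u64) -> (u64, u64) {
    let times = [AtomicU64::new(0), AtomicU64::new(0)];
    thread::scope(|s| {
        for me in 0..2usize {
            let times = &times;
            s.spawn(move || {
                let peer = &times[1 - me];
                loop {
                    let now = times[me].load(Ordering::Acquire);
                    if now >= end {
                        break;
                    }
                    if now < ceiling(&[peer], lookahead) {
                        times[me].store(now + 1, Ordering::Release);
                    }
                    // else: the real engine sleeps on the cross-shard
                    // channel here instead of spinning (see commit above)
                }
            });
        }
    });
    (times[0].load(Ordering::Acquire), times[1].load(Ordering::Acquire))
}

fn main() {
    let (a, b) = run_two_shards(1_000, 10);
    println!("final shard times: {a}, {b}");
}
```

A positive lookahead guarantees progress: both shards start below each other's ceiling, so neither can block forever.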
Replace the actor-based simulation with a synchronous event loop for
single-shard runs. The sequential engine eliminates tokio coordination
overhead (channels, oneshot allocs, task scheduling) by processing events
directly from a global priority queue.

Events at the same timestamp are batched and processed in parallel across
nodes using rayon, following a Bulk Synchronous Parallel model: pop batch →
resolve deliveries → parallel node compute → apply results.

Enabled by default for single-shard (sequential-engine: true in config).
Falls back to sequential processing for small batches (<32 events). The
actor engine remains available via sequential-engine: false.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

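The "pop batch" step of the BSP loop can be sketched with a std BinaryHeap: drain every event that shares the earliest timestamp, then hand the batch to the nodes as one superstep. The `Event` shape is hypothetical; per the commit, the real engine runs the batch through rayon once it has at least 32 events.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

// Hypothetical event: (timestamp, node id). Reverse turns Rust's max-heap
// into a min-heap ordered by timestamp.
type Event = (u64, usize);

/// Pop every event sharing the earliest timestamp. The returned batch is
/// one BSP superstep: all its events are causally independent in time and
/// can be computed across nodes in parallel.
fn pop_batch(queue: &mut BinaryHeap<Reverse<Event>>) -> Vec<Event> {
    let mut batch = Vec::new();
    if let Some(&Reverse((t, _))) = queue.peek() {
        while let Some(&Reverse((t2, _))) = queue.peek() {
            if t2 != t {
                break;
            }
            batch.push(queue.pop().unwrap().0);
        }
    }
    batch
}

fn main() {
    let mut q: BinaryHeap<Reverse<Event>> =
        [(5, 1), (3, 2), (3, 0), (7, 1)].into_iter().map(Reverse).collect();
    while !q.is_empty() {
        println!("superstep: {:?}", pop_batch(&mut q));
    }
}
```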
Min-cut uses recursive bisection with Kernighan-Lin refinement to minimize
the cross-shard edge count. Also logs cross-shard edge count and CMB
lookahead at startup for diagnostics.

Shard strategy benchmark summary (6 shards, 100 slots, realistic.yaml):

| Strategy              | Wall  | User | Cross-shard | Lookahead | Sizes                     |
|-----------------------|-------|------|-------------|-----------|---------------------------|
| (none, 1 shard)       | 9m19s | 36m  | —           | —         | [3000]                    |
| zero-latency-clusters | 3m48s | 39m  | 82.0%       | 1ms       | [500x6]                   |
| min-latency-clusters  | 4m51s | 43m  | 55.9%       | 8ms       | [600,600,600,600,300,300] |
| geographic            | 4m18s | 43m  | 61.9%       | 1ms       | [796,743,703,758]         |
| min-cut               | 4m45s | 43m  | 80.5%       | 1ms       | [750,750,375x4]           |

For uniformly connected topologies, balance dominates —
zero-latency-clusters wins despite 82% cross-shard edges. Strategies
optimizing for fewer cross-shard edges or higher lookahead create imbalance
that negates their gains. They would benefit from topologies with natural
clusters.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

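The "Cross-shard" column in the table is just the fraction of edges whose endpoints land in different shards. A sketch of that metric (function name and shapes are illustrative):

```rust
/// Fraction of edges whose endpoints land in different shards, i.e. the
/// number behind the "Cross-shard" column. `shard_of[n]` is node n's shard.
fn cross_shard_fraction(edges: &[(usize, usize)], shard_of: &[usize]) -> f64 {
    if edges.is_empty() {
        return 0.0;
    }
    let cross = edges
        .iter()
        .filter(|&&(a, b)| shard_of[a] != shard_of[b])
        .count();
    cross as f64 / edges.len() as f64
}

fn main() {
    // 4-node ring split into two shards of two: 2 of the 4 edges cross.
    let edges = [(0, 1), (1, 2), (2, 3), (3, 0)];
    let shard_of = [0, 0, 1, 1];
    println!("{:.1}%", 100.0 * cross_shard_fraction(&edges, &shard_of));
}
```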
Agglomerative clustering (Kruskal-style): sorts edges by latency and merges
lowest-latency pairs first, maximizing the CMB lookahead. Balance is
controlled by the shard-max-size-pct config (default 200%).

Achieves 7-10ms min cross-shard latency vs 1ms for other strategies, but
the worse cluster shapes increase cross-shard traffic. For uniformly
connected topologies, zero-latency-clusters (balanced, 1ms lookahead) still
wins on wall clock. Min-latency-clusters would shine on topologies with
natural geographic clusters.

Also adds the CMB lookahead to the shard startup log for diagnostics.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

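A Kruskal-style merge with a size cap (the analogue of the shard-max-size-pct limit) might look like the sketch below; the union-find and `agglomerate` names are illustrative, not the project's.

```rust
/// Minimal union-find with path halving and size tracking.
struct Dsu {
    parent: Vec<usize>,
    size: Vec<usize>,
}

impl Dsu {
    fn new(n: usize) -> Self {
        Dsu { parent: (0..n).collect(), size: vec![1; n] }
    }
    fn find(&mut self, mut x: usize) -> usize {
        while self.parent[x] != x {
            self.parent[x] = self.parent[self.parent[x]]; // path halving
            x = self.parent[x];
        }
        x
    }
}

/// Kruskal-style agglomeration: merge the lowest-latency edges first so the
/// remaining (cross-cluster) edges have the highest possible latency, i.e.
/// the best CMB lookahead. Merges that would push a cluster past `max_size`
/// are skipped, keeping the clusters balanced. `edges` are (latency, a, b).
fn agglomerate(n: usize, mut edges: Vec<(u64, usize, usize)>, max_size: usize) -> Dsu {
    edges.sort();
    let mut dsu = Dsu::new(n);
    for (_lat, a, b) in edges {
        let (ra, rb) = (dsu.find(a), dsu.find(b));
        if ra != rb && dsu.size[ra] + dsu.size[rb] <= max_size {
            dsu.parent[ra] = rb;
            dsu.size[rb] += dsu.size[ra];
        }
    }
    dsu
}

fn main() {
    // Nodes 0-1 and 2-3 are close; the 10ms edge between the pairs is
    // rejected by the size cap, so it becomes the cross-shard lookahead.
    let mut clusters = agglomerate(4, vec![(1, 0, 1), (2, 2, 3), (10, 1, 2)], 2);
    let roots: Vec<usize> = (0..4).map(|i| clusters.find(i)).collect();
    println!("cluster roots: {roots:?}");
}
```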
- Add Geographic variant to ShardStrategy, configurable via shard-strategy
- K-means++ clustering on node coordinates, keeping 0-latency components
  together; falls back to zero-latency-clusters if coordinates are missing
- Add location field to NodeConfiguration (from topology coordinates)
- Extract union-find helpers to a shared module
- Refactor zero_latency_clusters to expose reusable component-building and
  balanced-assignment functions

Benchmarks show geographic helps when the topology has clear regional
clusters with high inter-region latency. For uniformly-connected
topologies, the balance penalty outweighs the marginal lookahead
improvement.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

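For flavor, here is the assignment step of k-means on node coordinates (seeding and the 0-latency-component handling omitted); the function name and coordinate shapes are illustrative:

```rust
/// K-means assignment step: each node joins the shard whose centroid is
/// nearest by squared Euclidean distance over its (x, y) coordinates.
fn assign_to_nearest(points: &[(f64, f64)], centroids: &[(f64, f64)]) -> Vec<usize> {
    points
        .iter()
        .map(|&(x, y)| {
            centroids
                .iter()
                .enumerate()
                .min_by(|(_, a), (_, b)| {
                    let da = (x - a.0).powi(2) + (y - a.1).powi(2);
                    let db = (x - b.0).powi(2) + (y - b.1).powi(2);
                    da.partial_cmp(&db).unwrap()
                })
                .map(|(i, _)| i)
                .unwrap()
        })
        .collect()
}

fn main() {
    let points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)];
    let centroids = [(0.0, 0.0), (10.0, 0.0)];
    println!("{:?}", assign_to_nearest(&points, &centroids));
}
```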
Move shard infrastructure (Shard struct, NetworkWrapper, network setup,
cross-shard routing, edge wiring, CMB peer setup, TX producers) from sim.rs
into sharding/shard.rs. sim.rs now only handles node/driver/actor creation
and top-level orchestration.

- Shard uses trait-object NetworkRunnable (BoxFuture) for type erasure
- build_shards() takes typed networks and returns Vec<Shard>
- sim.rs init_nodes() only creates nodes, drivers, actors
- Net reduction: -176 lines from sim.rs

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Each shard (ClockCoordinator + NetworkCoordinator + TxProducer) now runs as
its own tokio::spawn'd task, enabling true parallel execution across cores.

Key changes:

- Replace select_all with tokio::spawn per shard (Simulation::run takes self)
- Cross-shard messages route directly NC-to-NC via delivery channels
- Target NC handles timing locally via its own Connection (no broker)
- CMB ceiling: min(peer.time + min_latency) with null message advancement
- Notified::enable() prevents missed notifications across concurrent tasks
- TX generation rate scaled by shard_count for consistent output
- Delete broker.rs (replaced by direct NC-to-NC routing)

4-shard runs ~2x faster than 1-shard with matching simulation results.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Introduces per-shard ClockCoordinators, Networks, and TransactionProducers
so that node groups can advance virtual time independently. Cross-shard
messages are routed through a CrossShardBroker with Connection-based
latency modeling. Each coordinator computes its own time ceiling from peer
shard times + minimum inter-shard latency (lookahead).

Currently works correctly with shard-count=1 (default). Multi-shard
execution has a deadlock that needs further investigation — likely related
to coordinator polling within nested select!/select_all.

Key changes:

- shard-count config parameter (default 1, backward compatible)
- CrossShardBroker for cross-shard message delivery
- NetworkCoordinator routes cross-shard messages to broker
- ClockCoordinator computes ceiling from peer shard lookaheads
- Fix notify race: create notified() futures before checking state
- Fix broker barrier: use TaskInitiator instead of ClockBarrier
- Per-shard TransactionProducer with filter_map for shard's nodes

Co-Authored-By: Claude Opus 4.6 <[email protected]>

finish_task() previously sent a ClockEvent::FinishTask through the same
mpsc channel as Wait/CancelWait events, creating contention. Now it does an
atomic fetch_sub and signals a Notify, letting the coordinator wake without
channel round-trips. Also handles the resulting race where time can advance
before a Wait event arrives.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
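A std-library sketch of the counter-plus-wakeup scheme: `TaskCounter` is a hypothetical name, and Mutex+Condvar stand in for tokio's Notify. The point is the same: the finishing task does one `fetch_sub` and a direct wake, with no mpsc round-trip, and the lost-wakeup race is avoided by re-checking the counter under the lock.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Condvar, Mutex};

/// Tracks outstanding CPU tasks. finish_task() decrements the count and
/// wakes the coordinator directly instead of sending a channel event.
struct TaskCounter {
    outstanding: AtomicUsize,
    lock: Mutex<()>,
    wake: Condvar,
}

impl TaskCounter {
    fn new(n: usize) -> Self {
        TaskCounter {
            outstanding: AtomicUsize::new(n),
            lock: Mutex::new(()),
            wake: Condvar::new(),
        }
    }

    /// Called by a worker when its task completes. Only the last finisher
    /// takes the lock and signals, so the common path is a single fetch_sub.
    fn finish_task(&self) {
        if self.outstanding.fetch_sub(1, Ordering::AcqRel) == 1 {
            let _guard = self.lock.lock().unwrap();
            self.wake.notify_one();
        }
    }

    /// Coordinator side: re-checks the counter under the lock, so a wakeup
    /// that races with the decrement is never lost.
    fn wait_all_finished(&self) {
        let mut guard = self.lock.lock().unwrap();
        while self.outstanding.load(Ordering::Acquire) != 0 {
            guard = self.wake.wait(guard).unwrap();
        }
    }
}

fn main() {
    let counter = TaskCounter::new(4);
    std::thread::scope(|s| {
        for _ in 0..4 {
            s.spawn(|| counter.finish_task());
        }
        counter.wait_all_finished();
    });
    println!("all CPU subtasks finished");
}
```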