Add multi-shard support to sequential DES engine

Each shard runs its own SequentialSimulation on a dedicated OS thread
(std::thread::scope), with CMB ceiling enforcement via AtomicTimestamp
and cross-shard messages via std::sync::mpsc channels.

With 6 shards on a realistic 3000-node topology:
- Sequential+CMB: 2m28s wall (16 cores utilized)
- Actor+CMB: 3m45s wall (10 cores utilized)
- 1.5x speedup over actor sharding, 4x over actor single-shard

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>