solvers.MoERoutingModel
solvers.MoERoutingModel()
Models first-order MoE routing imbalance and expert-parallel all-to-all cost.
Sparse mixture-of-experts (MoE) models decouple memory footprint from per-token compute, but routing is rarely perfectly balanced. This model keeps the abstraction small: a single imbalance factor inflates both the effective number of active experts and the routed-token communication volume. It does not simulate a router or token dispatcher.
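To make the single-factor abstraction concrete, the following is a minimal sketch of how one imbalance factor could scale both quantities. It is not the library's implementation: the function name `inflate_for_imbalance`, its arguments, and the cap at the total expert count are all assumptions made for illustration.

```python
def inflate_for_imbalance(top_k, num_experts, routed_bytes, imbalance_factor=1.0):
    """Hypothetical first-order sketch: scale two quantities by one factor.

    top_k            -- experts selected per token under ideal routing
    num_experts      -- total experts in the layer
    routed_bytes     -- all-to-all volume under perfectly balanced routing
    imbalance_factor -- 1.0 means perfectly balanced; larger means more skew
    """
    if imbalance_factor < 1.0:
        raise ValueError("imbalance_factor must be >= 1.0")
    # Assumption: the effective count cannot exceed the experts that exist.
    effective_experts = min(top_k * imbalance_factor, num_experts)
    # The same factor inflates the routed-token communication volume.
    effective_bytes = routed_bytes * imbalance_factor
    return effective_experts, effective_bytes


# Top-2 routing over 8 experts with 50% imbalance.
experts, volume = inflate_for_imbalance(2, 8, 1000, imbalance_factor=1.5)
print(experts, volume)
```

With a factor of 1.0 the function returns its inputs unchanged, which matches the idea that the factor only models the departure from balanced routing.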
Methods
| Name | Description |
|---|---|
| solve | Estimate effective active parameters and optional EP all-to-all latency. |
solve
solvers.MoERoutingModel.solve(
model,
batch_size,
seq_len,
precision='fp16',
ep_size=1,
routing_imbalance_factor=1.0,
fleet=None,
)
Estimate effective active parameters and optional EP all-to-all latency.
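As a rough illustration of what a first-order EP all-to-all latency estimate might look like, here is a standalone sketch. It does not call `solve` and is not taken from the library. The names `hidden_size`, `top_k`, and `link_bandwidth_bytes_per_s` are hypothetical stand-ins for values that would presumably come from `model` and `fleet`. It also assumes `batch_size * seq_len` counts the tokens held by one rank, and that a uniform share of routed tokens leaves the rank.

```python
# Assumed bytes per element for a few common precisions.
BYTES_PER_ELEMENT = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8": 1}


def all_to_all_latency_s(
    batch_size,
    seq_len,
    hidden_size,                        # hypothetical: would come from the model
    top_k,                              # hypothetical: experts chosen per token
    precision="fp16",
    ep_size=1,
    routing_imbalance_factor=1.0,
    link_bandwidth_bytes_per_s=50e9,    # hypothetical: would come from the fleet
):
    """Bandwidth-only estimate of per-rank all-to-all time for one MoE layer."""
    if ep_size <= 1:
        # Without expert parallelism no tokens cross ranks.
        return 0.0
    tokens = batch_size * seq_len
    bytes_per_token = hidden_size * BYTES_PER_ELEMENT[precision]
    # Assumption: experts are spread evenly, so (ep_size - 1) / ep_size of the
    # routed copies land on a remote rank.
    remote_fraction = (ep_size - 1) / ep_size
    one_way = tokens * top_k * bytes_per_token * remote_fraction
    # The imbalance factor inflates the routed volume.
    one_way *= routing_imbalance_factor
    # Dispatch to the experts plus combine back to the source rank.
    total_bytes = 2 * one_way
    return total_bytes / link_bandwidth_bytes_per_s


balanced = all_to_all_latency_s(1, 1024, 4096, 2, ep_size=8)
skewed = all_to_all_latency_s(1, 1024, 4096, 2, ep_size=8,
                              routing_imbalance_factor=1.5)
print(f"balanced: {balanced * 1e3:.3f} ms, skewed: {skewed * 1e3:.3f} ms")
```

The sketch ignores per-message latency and overlap with compute, so it is best read as a lower bound on the communication term under the stated assumptions.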