solvers.MoERoutingModel

solvers.MoERoutingModel()

Models first-order MoE routing imbalance and expert-parallel all-to-all cost.

Sparse (mixture-of-experts) models decouple parameter memory from per-token compute, because each token activates only a subset of the experts. In practice, routing is rarely perfectly balanced. This model keeps the abstraction small: a single imbalance factor inflates both the effective number of active experts and the routed-token communication volume. It does not simulate a router or a token dispatcher.
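
As a rough illustration of the idea, and not the library's actual formula, a single imbalance factor can be treated as a multiplier on the nominal top-k expert count, capped at the total number of experts. Every name below (`effective_active_experts`, `effective_active_params`, `top_k`, `num_experts`, `shared_params`, `params_per_expert`) is hypothetical and chosen only for this sketch.

```python
def effective_active_experts(top_k: int, num_experts: int,
                             imbalance: float = 1.0) -> float:
    """Hypothetical first-order estimate: scale top_k by the imbalance factor."""
    if imbalance < 1.0:
        # Assumption: 1.0 means perfectly balanced routing; larger is worse.
        raise ValueError("imbalance factor is assumed to be >= 1.0")
    # Cap at num_experts: no more experts can be active than exist.
    return min(float(num_experts), top_k * imbalance)


def effective_active_params(shared_params: float, params_per_expert: float,
                            top_k: int, num_experts: int,
                            imbalance: float = 1.0) -> float:
    """Shared (dense) parameters plus the inflated share of expert parameters."""
    experts = effective_active_experts(top_k, num_experts, imbalance)
    return shared_params + experts * params_per_expert
```

Under these assumptions, a top-2 model with 8 experts and an imbalance factor of 1.5 behaves as if 3 experts were active per token instead of 2.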

Methods

Name   Description
solve  Estimate effective active parameters and optional EP all-to-all latency.

solve

solvers.MoERoutingModel.solve(
    model,
    batch_size,
    seq_len,
    precision='fp16',
    ep_size=1,
    routing_imbalance_factor=1.0,
    fleet=None,
)

Estimate effective active parameters and optional EP all-to-all latency.
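
The source does not document how the all-to-all latency is derived, so the following is only a minimal sketch of one plausible first-order estimate. It assumes that `ep_size=1` means no expert-parallel communication, that each token's hidden vector is sent to `top_k` experts, that the imbalance factor scales the routed volume linearly, and that each layer performs one dispatch and one combine all-to-all. The function name and the parameters `hidden_size`, `top_k`, and `link_bandwidth_bytes_per_s` are hypothetical and do not appear in the `solve` signature.

```python
# Bytes per element for a few common precisions (illustrative subset).
BYTES_PER_ELEMENT = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8": 1}


def all_to_all_latency_s(batch_size: int, seq_len: int, hidden_size: int,
                         top_k: int, precision: str = "fp16",
                         ep_size: int = 1,
                         routing_imbalance_factor: float = 1.0,
                         link_bandwidth_bytes_per_s: float = 50e9) -> float:
    """Hypothetical per-layer all-to-all latency estimate, in seconds."""
    if ep_size <= 1:
        # Assumption: without expert parallelism there is no all-to-all.
        return 0.0
    tokens = batch_size * seq_len
    bytes_per_token = hidden_size * BYTES_PER_ELEMENT[precision]
    # Each token is routed to top_k experts; imbalance inflates the volume.
    routed_bytes = tokens * top_k * bytes_per_token * routing_imbalance_factor
    # Each rank holds 1/ep_size of the traffic, and a fraction
    # (ep_size - 1)/ep_size of that leaves the rank.
    per_rank_bytes = routed_bytes / ep_size * (ep_size - 1) / ep_size
    # Two all-to-alls per layer: dispatch to experts, then combine results.
    return 2 * per_rank_bytes / link_bandwidth_bytes_per_s


latency = all_to_all_latency_s(batch_size=8, seq_len=1024, hidden_size=4096,
                               top_k=2, ep_size=8)
print(f"estimated all-to-all latency: {latency * 1e3:.3f} ms")
```

A bandwidth-only estimate like this ignores per-message startup latency and network topology, so it is best read as a lower bound on the communication cost.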
