core.solver.MoERoutingModel

core.solver.MoERoutingModel()

Models first-order Mixture-of-Experts routing imbalance and optional expert-parallel all-to-all cost.

Sparse models decouple memory from compute: total parameters determine memory, while active parameters determine the compute path. MoERoutingModel adds one teachable knob, routing_imbalance_factor, for hot-expert effects.
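The memory/compute split and the effect of the imbalance knob can be illustrated with a small standalone sketch. It does not call MoERoutingModel. The ToyMoE type, its fields, and the rule that routing_imbalance_factor scales the number of experts on the compute path (capped at the expert count) are assumptions made for illustration, not the library's documented formula.

```python
from dataclasses import dataclass


@dataclass
class ToyMoE:
    """Hypothetical parameter breakdown; not a library type."""
    shared_params: float      # attention, embeddings, etc. (always active)
    params_per_expert: float  # parameters of one expert, summed over MoE layers
    num_experts: int          # experts per MoE layer
    top_k: int                # experts selected per token


def total_params(m: ToyMoE) -> float:
    # Memory side: every expert must be resident, whether or not it is used.
    return m.shared_params + m.num_experts * m.params_per_expert


def effective_active_params(m: ToyMoE, routing_imbalance_factor: float = 1.0) -> float:
    # Compute side under an ASSUMED first-order rule: hot experts stretch the
    # compute path by the imbalance factor, but never beyond all experts.
    if routing_imbalance_factor < 1.0:
        raise ValueError("routing_imbalance_factor must be >= 1.0 (1.0 is balanced)")
    effective_experts = min(m.top_k * routing_imbalance_factor, m.num_experts)
    return m.shared_params + effective_experts * m.params_per_expert


toy = ToyMoE(shared_params=2e9, params_per_expert=1e9, num_experts=8, top_k=2)
print(total_params(toy) / 1e9)                   # 10.0 (billions, memory)
print(effective_active_params(toy) / 1e9)        # 4.0 (billions, balanced)
print(effective_active_params(toy, 1.5) / 1e9)   # 5.0 (billions, hot experts)
```

In this toy setup the memory footprint stays at 10B parameters regardless of routing, while the compute path grows from 4B to 5B when the imbalance factor rises from 1.0 to 1.5.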

Methods

| Name | Description |
| --- | --- |
| solve | Estimate effective active parameters and optional EP all-to-all latency. |

solve

core.solver.MoERoutingModel.solve(
    model,
    batch_size,
    seq_len,
    precision='fp16',
    ep_size=1,
    routing_imbalance_factor=1.0,
    fleet=None,
)

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| model | SparseTransformerWorkload | Sparse Transformer / MoE workload. | required |
| batch_size | int | Local routed batch size. | required |
| seq_len | int | Sequence length. | required |
| precision | str | Activation precision for routed bytes. | 'fp16' |
| ep_size | int | Expert-parallel degree. | 1 |
| routing_imbalance_factor | float | Hot-expert multiplier, where 1.0 is balanced. | 1.0 |
| fleet | Fleet | Optional fabric used to compute all-to-all latency. | None |

Returns

MoERoutingResult with effective active experts, effective active parameters, routed bytes, and optional all-to-all latency.
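To give a feel for the routed-bytes and all-to-all quantities, here is a rough standalone estimate. It is a sketch under stated assumptions, not the solver's implementation: hidden_size, top_k, the bandwidth and latency arguments, the (ep_size - 1) / ep_size remote fraction, and the factor of two for dispatch plus combine are all hypothetical choices introduced here.

```python
# Bytes per activation element for a few common precisions (assumed mapping).
BYTES_PER_ELEMENT = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8": 1}


def routed_bytes(batch_size, seq_len, hidden_size, top_k, precision="fp16"):
    """Activation bytes sent to experts: one hidden vector per (token, expert) pair."""
    tokens = batch_size * seq_len
    return tokens * top_k * hidden_size * BYTES_PER_ELEMENT[precision]


def all_to_all_latency_s(bytes_routed, ep_size, bandwidth_bytes_per_s,
                         link_latency_s=0.0):
    """First-order all-to-all time for dispatch plus combine (assumed model)."""
    if ep_size <= 1:
        return 0.0  # all experts are local, nothing crosses the fabric
    # With uniform routing, (ep_size - 1) / ep_size of the tokens land on
    # experts hosted by other ranks.
    remote_fraction = (ep_size - 1) / ep_size
    one_way = link_latency_s + bytes_routed * remote_fraction / bandwidth_bytes_per_s
    return 2 * one_way  # dispatch to experts, then combine results back


nbytes = routed_bytes(batch_size=4, seq_len=2048, hidden_size=4096, top_k=2)
print(nbytes)  # 134217728
print(all_to_all_latency_s(nbytes, ep_size=8, bandwidth_bytes_per_s=50e9))
```

With ep_size=1 the estimate is zero, which mirrors the documented behavior that the all-to-all latency is optional and only meaningful when a fabric is involved.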
