core.solver.MoERoutingModel
core.solver.MoERoutingModel()
Models first-order Mixture-of-Experts routing imbalance and optional expert-parallel all-to-all cost.
Sparse models decouple memory from compute: total parameters determine memory, while active parameters determine the compute path. MoERoutingModel adds one teachable knob, routing_imbalance_factor, for hot-expert effects.
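The memory/compute split can be illustrated with a small standalone sketch. It does not use the library: `ToyMoE`, its fields, and the rule that the imbalance factor scales the number of active experts (capped at the expert count) are assumptions made for illustration, not the documented behaviour of MoERoutingModel.

```python
from dataclasses import dataclass


@dataclass
class ToyMoE:
    """Hypothetical stand-in for a sparse workload (not SparseTransformerWorkload)."""
    shared_params: float      # always-active parameters (attention, embeddings, ...)
    params_per_expert: float  # parameters in one expert
    num_experts: int          # experts stored in memory
    top_k: int                # experts selected per token


def total_params(m: ToyMoE) -> float:
    # Memory footprint is driven by every expert, whether or not it is used.
    return m.shared_params + m.num_experts * m.params_per_expert


def effective_active_params(m: ToyMoE, routing_imbalance_factor: float = 1.0) -> float:
    # Assumption: hot experts inflate the effective active-expert count,
    # which can never exceed the number of experts that exist.
    effective_experts = min(m.num_experts, m.top_k * routing_imbalance_factor)
    return m.shared_params + effective_experts * m.params_per_expert


toy = ToyMoE(shared_params=2e9, params_per_expert=1e9, num_experts=8, top_k=2)
print(total_params(toy))                  # memory-side parameter count
print(effective_active_params(toy))       # balanced routing
print(effective_active_params(toy, 1.5))  # hot-expert case
```

Under these assumptions the toy model stores 10B parameters but touches only 4B per token when routing is balanced, and 5B with an imbalance factor of 1.5.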
Methods
| Name | Description |
|---|---|
| solve | Estimate effective active parameters and optional EP all-to-all latency. |
solve
core.solver.MoERoutingModel.solve(
model,
batch_size,
seq_len,
precision='fp16',
ep_size=1,
routing_imbalance_factor=1.0,
fleet=None,
)
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| model | SparseTransformerWorkload | Sparse Transformer / MoE workload. | required |
| batch_size | int | Local routed batch size. | required |
| seq_len | int | Sequence length. | required |
| precision | str | Activation precision for routed bytes. | 'fp16' |
| ep_size | int | Expert-parallel degree. | 1 |
| routing_imbalance_factor | float | Hot-expert multiplier, where 1.0 is balanced. | 1.0 |
| fleet | Fleet | Optional fabric used to compute all-to-all latency. | None |
Returns
MoERoutingResult with effective active experts, effective active parameters, routed bytes, and optional all-to-all latency.
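As a rough guide to how routed bytes and all-to-all latency might relate to the parameters above, here is a minimal first-order sketch. It is not the library's formula: `hidden_size`, `top_k`, `link_bytes_per_s`, the precision-to-bytes table, and the single-direction, bandwidth-only cost model are all assumptions introduced for illustration.

```python
# Assumed bytes per activation element for a few common precisions.
BYTES_PER_ELEMENT = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8": 1}


def routed_bytes(batch_size, seq_len, hidden_size, top_k, precision="fp16"):
    """Bytes of activations sent to experts (hypothetical first-order estimate)."""
    tokens = batch_size * seq_len
    # Each token's hidden vector is sent to top_k experts.
    return tokens * top_k * hidden_size * BYTES_PER_ELEMENT[precision]


def all_to_all_seconds(num_bytes, ep_size, link_bytes_per_s,
                       routing_imbalance_factor=1.0):
    """One-direction all-to-all time, ignoring per-message latency (assumption)."""
    if ep_size <= 1:
        return 0.0  # all experts are local, so nothing crosses the fabric
    # With experts spread evenly, (ep_size - 1) / ep_size of traffic is remote.
    remote_fraction = (ep_size - 1) / ep_size
    # Assumption: the hottest rank sets the pace, so time scales with imbalance.
    return routing_imbalance_factor * num_bytes * remote_fraction / link_bytes_per_s


nbytes = routed_bytes(batch_size=4, seq_len=1024, hidden_size=4096, top_k=2)
print(nbytes)
print(all_to_all_seconds(nbytes, ep_size=4, link_bytes_per_s=50e9))
```

The sketch returns zero latency when `ep_size` is 1, which mirrors the fact that the all-to-all term in the real result is optional.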