models.types.SparseTransformerWorkload

models.types.SparseTransformerWorkload()

Sparse Transformer / Mixture-of-Experts workload type.

The parameters field is the total number of resident parameters, so it determines memory pressure. The active_parameters field is the number of parameters used per token, so it determines the first-order compute path. The experts and active_experts_per_token fields describe the routing structure used by MoERoutingModel and by expert-parallel communication estimates.
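The split between total and active parameters can be made concrete with a back-of-the-envelope calculation. The sketch below is plain Python and does not call mlsysim; the helper names, the 2-bytes-per-parameter (16-bit weights) default, and the roughly 2 FLOPs per active parameter per token rule of thumb are all assumptions made for illustration, not the library's own model.

```python
# Hypothetical helpers for illustration only; they are not part of mlsysim.

def weight_memory_bytes(parameters: float, bytes_per_param: float = 2.0) -> float:
    """Bytes needed to keep every parameter resident, including inactive experts.

    Assumes 16-bit weights (2 bytes each) unless told otherwise.
    """
    return parameters * bytes_per_param


def forward_flops_per_token(active_parameters: float) -> float:
    """Forward-pass FLOPs per token, using the common approximation of
    2 FLOPs (one multiply, one add) per active parameter."""
    return 2.0 * active_parameters


# Same toy values as the example on this page.
total_params = 64e9
active_params = 8e9

mem_gb = weight_memory_bytes(total_params) / 1e9
gflops_per_token = forward_flops_per_token(active_params) / 1e9

print(f"resident weights: {mem_gb:.0f} GB")
print(f"forward compute:  {gflops_per_token:.0f} GFLOPs per token")
```

Under these assumptions the model must hold weights for all 64e9 parameters in memory while paying compute for only 8e9 of them per token, which is why the two quantities are tracked as separate fields.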

Key Fields

Name	Type	Description
parameters	Quantity	Total model parameters.
active_parameters	Quantity	Parameters active per token.
experts	int	Total number of experts.
active_experts_per_token	int	Top-k experts selected per token.
layers	int	Transformer layer count.
hidden_dim	int	Hidden dimension used for activation and routing-volume estimates.
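As a rough illustration of how hidden_dim, layers, and active_experts_per_token could feed a routing-volume estimate, the sketch below counts the bytes moved per token when each token's hidden vector is sent to its top-k experts and the results are sent back. It is plain Python with a hypothetical helper, not the formula MoERoutingModel uses. It assumes every layer is an MoE layer, every selected expert lives on a remote device, and activations are 2 bytes per element, so it behaves like an upper bound.

```python
# Hypothetical estimate for illustration only; not the mlsysim implementation.

def routed_bytes_per_token(
    hidden_dim: int,
    active_experts_per_token: int,
    layers: int,
    bytes_per_element: int = 2,
) -> tuple[int, int]:
    """Return (bytes per layer, bytes across all layers) routed for one token.

    Each selected expert receives one hidden vector (dispatch) and returns
    one hidden vector (combine), hence the factor of 2.
    """
    per_layer = 2 * active_experts_per_token * hidden_dim * bytes_per_element
    return per_layer, per_layer * layers


per_layer, total = routed_bytes_per_token(
    hidden_dim=4096,
    active_experts_per_token=2,
    layers=32,
)

print(f"per layer: {per_layer} bytes")
print(f"all layers: {total} bytes ({total / 2**20:.1f} MiB per token)")
```

Note that the total number of experts does not appear in this per-token volume; in a fuller model it would mainly affect how experts are placed across devices and therefore how much of the traffic is actually remote.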

Example

from mlsysim import SparseTransformerWorkload, ureg

moe = SparseTransformerWorkload(
    name="Toy-MoE-64B",
    architecture="Sparse Transformer",
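    parameters=64e9 * ureg.count,         # total resident parameters (memory pressure)
    active_parameters=8e9 * ureg.count,   # parameters used per token (compute path)
    experts=8,                            # total number of experts
    active_experts_per_token=2,           # top-k routing: 2 experts selected per token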
    layers=32,
    hidden_dim=4096,
    heads=32,
)