Getting Started
Install MLSys·im and run your first analysis in under 5 minutes.
MLSys·im assumes basic Python familiarity (variables, functions, pip install). No prior ML or hardware knowledge is required. Key concepts like roofline analysis, memory-bound vs. compute-bound, and FLOP/s are explained in context throughout the tutorials. For a full reference of terms, see the Glossary.
Installation
MLSys·im requires Python 3.9+ and installs cleanly with pip:
```shell
pip install mlsysim
```

For development or to follow along with tutorials locally:
```shell
git clone https://github.com/harvard-edge/cs249r_book
cd cs249r_book/mlsysim
pip install -e ".[dev]"
```

All tutorials in this documentation can also be run on Google Colab or Binder without any local installation. Look for the launch buttons at the top of each tutorial.
Your First Analysis
Once installed, you can run a complete roofline analysis in five lines:
```python
import mlsysim
from mlsysim import Engine

# 1. Load a model and hardware from the vetted Zoo
model = mlsysim.Models.ResNet50
hardware = mlsysim.Hardware.Cloud.A100

# 2. Solve — the Engine applies the Iron Law of ML Systems
profile = Engine.solve(model=model, hardware=hardware, batch_size=1, precision="fp16")

# 3. Read the results
print(f"Bottleneck: {profile.bottleneck}")    # → 'Memory Bound'
print(f"Latency: {profile.latency}")          # → 0.34 ms
print(f"Throughput: {profile.throughput}")    # → 2941 samples/sec
```

MLSys·im uses the Pint library for physical units. All quantities carry attached units (ms, GB, TFLOP/s, etc.). Use .to('ms') to convert between units, and .magnitude to extract the raw number when you need it for calculations or plotting.
The Agent-Ready CLI
MLSys·im features a powerful, infrastructure-as-code (IaC) CLI designed for both human exploration and CI/CD automation. You can evaluate hardware and workloads directly from your terminal.
1. Explore the Registry
Discover built-in hardware, models, and infrastructure:
```shell
mlsysim zoo hardware
mlsysim zoo models
```

2. Quick Evaluation
Evaluate the physics of a workload on a specific hardware node instantly:
```shell
mlsysim eval Llama3_8B H100 --batch-size 32
```

3. Declarative Infrastructure (YAML)
Define your entire cluster and SLA constraints in a declarative YAML file (e.g., cluster.yaml):
```yaml
version: "1.0"
workload:
  name: "Llama3_70B"
  batch_size: 4096
hardware:
  name: "H100"
  nodes: 64
ops:
  region: "Quebec"
  duration_days: 14.0
constraints:
  assert:
    - metric: "performance.latency"
      max: 50.0
```

Evaluate the full 3-lens scorecard (Feasibility, Performance, Macro):

```shell
mlsysim eval cluster.yaml
```

For automated scripts and AI agents, use the -o json flag for strict, parseable output.
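For example, a CI script or agent could parse the JSON report and enforce the same constraint declared in the YAML above. A hypothetical sketch (the payload field names below are illustrative, not the documented output schema):

```python
import json

# Hypothetical payload standing in for the eval command's -o json output;
# the "performance"/"feasibility" field names are assumptions for illustration.
payload = '{"performance": {"latency": 42.0}, "feasibility": {"ok": true}}'
report = json.loads(payload)

# Enforce the SLA from the YAML: performance.latency must stay under 50.0
assert report["performance"]["latency"] <= 50.0
print("SLA satisfied")
```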
Supported Workloads
MLSys·im models five families of ML architectures. Each workload type knows how to compute its own FLOPs, memory footprint, and arithmetic intensity:
| Workload | Key Parameters | Scaling Behavior |
|---|---|---|
| Transformer | Parameters, layers, heads, sequence length | 2P FLOPs/token |
| CNN | Parameters, inference FLOPs | Fixed per image |
| Sparse (MoE) | Total vs. active parameters, experts | Active P for FLOPs, total P for memory |
| SSM (Mamba) | Parameters, state dimension | O(1) state cache |
| Diffusion | Parameters, denoising steps T | T × FLOPs/step |
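The scaling rules in the table reduce to simple arithmetic. A back-of-envelope sketch (illustrative formulas, not MLSys·im's internal implementation):

```python
# FLOP counts implied by the "Scaling Behavior" column.
def transformer_flops_per_token(params):
    return 2 * params                 # ~2 FLOPs per parameter per token

def moe_flops_per_token(active_params):
    return 2 * active_params          # only active experts do work

def diffusion_flops(params, steps):
    return steps * 2 * params         # T denoising passes, each ~2P FLOPs

print(transformer_flops_per_token(70e9))   # 1.4e+11 FLOPs/token for a dense 70B model
print(moe_flops_per_token(17e9))           # MoE compute is priced by active, not total, params
print(diffusion_flops(1e9, 50))            # 50 steps multiply the per-step cost
```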
```python
# Access workloads from the Model Zoo
model = mlsysim.Models.Language.Llama3_70B   # Transformer
model = mlsysim.Models.ResNet50              # CNN

# Or define a custom workload
from mlsysim import TransformerWorkload

custom = TransformerWorkload(
    name="My-Model", parameters=7e9,
    layers=32, hidden_dim=4096, heads=32
)
```

All workloads share a common lower() method that produces a ComputationGraph — the intermediate representation consumed by every solver. This means new workload types automatically work with all existing solvers.
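To make that contract concrete, here is a minimal sketch of what such an interface could look like. All names here (the ComputationGraph fields, MyWorkload) are hypothetical stand-ins for illustration, not MLSys·im's actual classes:

```python
from dataclasses import dataclass

@dataclass
class ComputationGraph:          # hypothetical stand-in for the real IR
    flops: float                 # total work
    bytes_moved: float           # total memory traffic

class MyWorkload:                # illustrative custom workload type
    def __init__(self, params):
        self.params = params

    def lower(self):
        # ~2 FLOPs/param per forward pass; 2 bytes/param of weight traffic at fp16
        return ComputationGraph(flops=2 * self.params,
                                bytes_moved=2 * self.params)

graph = MyWorkload(7e9).lower()
print(graph.flops / graph.bytes_moved)   # arithmetic intensity in FLOPs/byte
```

Because every solver consumes the same IR, a new workload type only has to get lower() right once.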
See the Model Zoo for the full list of vetted workloads.
Understanding the Output
| Field | What it means |
|---|---|
| bottleneck | 'Memory Bound' or 'Compute Bound' — which resource limits performance |
| latency | Time to process one batch, derived from the roofline ceiling |
| throughput | Samples per second = batch_size / latency |
| latency_compute | Time if only compute were the constraint |
| latency_memory | Time if only memory bandwidth were the constraint |
If latency_memory > latency_compute, you’re memory-bound: buying faster GPUs won’t help much. You need to increase batch size or use a more compute-dense operation (e.g., fused attention). If you’re compute-bound, that’s when parallelism and quantization pay off.
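The comparison is plain roofline arithmetic. Here is the batch-size-1 ResNet-50 case worked by hand, using illustrative A100-class numbers rather than values pulled from the MLSys·im registry:

```python
# Illustrative peaks for an A100-class GPU (assumed, not from the Zoo)
peak_compute = 312e12                # 312 TFLOP/s (fp16 tensor cores)
peak_bandwidth = 2.0e12              # 2.0 TB/s HBM

# ResNet-50, batch size 1, fp16
work = 4.1e9                         # ~4.1 GFLOPs per image
traffic = 25.6e6 * 2                 # 25.6M params * 2 bytes = ~51 MB of weights

latency_compute = work / peak_compute        # ~13 µs if compute were the limit
latency_memory = traffic / peak_bandwidth    # ~26 µs if bandwidth were the limit
bottleneck = "Memory Bound" if latency_memory > latency_compute else "Compute Bound"
print(bottleneck)                    # bandwidth, not FLOP/s, sets the latency floor
```

Raising the batch size amortizes each weight fetch over more images, which is exactly the lever that pushes a workload toward the compute-bound regime.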
Exploring the Zoo
MLSys·im ships with vetted registries of hardware, models, infrastructure, and systems. Use tab-completion to explore:
```python
# Hardware: Cloud, Workstation, Mobile, Edge, Tiny categories
mlsysim.Hardware.Cloud.H100
mlsysim.Hardware.Edge.JetsonOrinNX
mlsysim.Hardware.Tiny.ESP32_S3
mlsysim.Hardware.Mobile.iPhone15Pro

# Models: Language, Vision, Tiny, StateSpace, GenerativeVision categories
mlsysim.Models.Language.Llama3_70B
mlsysim.Models.Vision.ResNet50
mlsysim.Models.StateSpace.Mamba_2_8B
mlsysim.Models.GenerativeVision.StableDiffusion_v1_5

# Infrastructure: Regional grids
mlsysim.Infra.Grids.Quebec   # hydro: ~20 gCO2/kWh
mlsysim.Infra.Grids.Poland   # coal: ~820 gCO2/kWh
```

Adjusting the Efficiency Parameter
The efficiency parameter (η) is the single most important tuning knob in MLSys·im. It represents the fraction of theoretical peak hardware performance that is actually achieved in practice.
```python
# Default: well-optimized training (η = 0.5)
profile_default = Engine.solve(
    model=model, hardware=hardware,
    batch_size=32, precision="fp16", efficiency=0.5
)

# Conservative: typical inference workload (η = 0.35)
profile_inference = Engine.solve(
    model=model, hardware=hardware,
    batch_size=32, precision="fp16", efficiency=0.35
)

print(f"Training estimate: {profile_default.latency}")
print(f"Inference estimate: {profile_inference.latency}")
```

Typical efficiency ranges:
| Scenario | η range | Notes |
|---|---|---|
| Well-optimized training (fp16) | 0.35–0.55 | Megatron-LM, DeepSpeed |
| Inference (fp16) | 0.25–0.45 | vLLM, TensorRT-LLM |
| Inference (int8) | 0.20–0.40 | Quantized serving |
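Because η scales the usable peak directly, estimated latency scales as 1/η. A quick illustration with made-up numbers:

```python
# Latency is inversely proportional to efficiency: halving η doubles the estimate.
peak_compute = 312e12            # illustrative fp16 peak, FLOP/s (assumed)
work = 1.4e13                    # e.g. ~2 FLOPs/param * 7e9 params * 1000 tokens

for eta in (0.5, 0.35):
    latency = work / (eta * peak_compute)
    print(f"eta={eta}: {latency * 1e3:.1f} ms")   # 89.7 ms, then 128.2 ms
```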
See the Accuracy & Validation page for guidance on choosing η for different scenarios.
Defining Custom Models
You are not limited to the Zoo. Define any model by specifying its parameters and FLOPs:
```python
from mlsysim import TransformerWorkload
from mlsysim import ureg

my_model = TransformerWorkload(
    name="My-Custom-LLM",
    architecture="Transformer",
    parameters=13e9 * ureg.param,
    layers=40,
    hidden_dim=5120,
    heads=40,
    kv_heads=8,
    inference_flops=2 * 13e9 * ureg.flop   # rule of thumb: ~2 FLOPs per parameter per forward pass
)

profile = Engine.solve(model=my_model, hardware=hardware, batch_size=1)
print(f"Bottleneck: {profile.bottleneck}")
print(f"Latency: {profile.latency}")
```

Importing Models from HuggingFace
Instead of defining models manually, you can import any public model directly from HuggingFace Hub:
```python
from mlsysim.models import import_hf_model

# Import any public model — no torch or transformers dependency required
model = import_hf_model("meta-llama/Meta-Llama-3-8B")

profile = Engine.solve(model=model, hardware=hardware, batch_size=1)
print(f"Bottleneck: {profile.bottleneck}")
print(f"Parameters: {model.parameters.to('Gparam'):.2f}")
```

The importer fetches the model’s config.json and analytically estimates the parameter count from the architecture fields (layers, hidden_dim, heads, FFN size). No GPU or large dependencies required.
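The arithmetic behind such an estimate is straightforward. A simplified version is sketched below; this exact formula is an assumption for illustration (it ignores gated MLPs, grouped-query attention, norms, and the output head, so it undercounts real models somewhat):

```python
# Simplified analytic parameter count from HF-style config fields
# (illustrative formula, not necessarily what import_hf_model uses).
def estimate_params(layers, hidden_dim, ffn_dim, vocab_size):
    attention = 4 * hidden_dim * hidden_dim   # Q, K, V, O projections
    mlp = 2 * hidden_dim * ffn_dim            # up- and down-projection
    embeddings = vocab_size * hidden_dim      # token embedding table
    return layers * (attention + mlp) + embeddings

# Llama-3-8B-like config values
n = estimate_params(layers=32, hidden_dim=4096, ffn_dim=14336, vocab_size=128256)
print(f"{n / 1e9:.2f} B parameters")          # ~6.4 B under this simplified count
```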
Next Steps
Follow the structured learning path on the Tutorials page, starting with the Hello, Roofline tutorial.
For a complete reference of which solver to use for different questions, see the Resolver Guide.
For production capacity planning and cost modeling, see For Engineers.