Getting Started

Install MLSys·im and run your first analysis in under 5 minutes.

Note: Prerequisites

MLSys·im assumes basic Python familiarity (variables, functions, pip install). No prior ML or hardware knowledge is required. Key concepts like roofline analysis, memory-bound vs. compute-bound, and FLOP/s are explained in context throughout the tutorials. For a full reference of terms, see the Glossary.

Installation

MLSys·im requires Python 3.9+ and installs cleanly with pip:

pip install mlsysim

For development or to follow along with tutorials locally:

git clone https://github.com/harvard-edge/cs249r_book
cd cs249r_book/mlsysim
pip install -e ".[dev]"
Note

All tutorials in this documentation can also be run on Google Colab or Binder without any local installation. Look for the launch buttons at the top of each tutorial.


Your First Analysis

Once installed, you can run a complete roofline analysis in five lines:

import mlsysim
from mlsysim import Engine

# 1. Load a model and hardware from the vetted Zoo
model    = mlsysim.Models.ResNet50
hardware = mlsysim.Hardware.Cloud.A100

# 2. Solve — the Engine applies the Iron Law of ML Systems
profile = Engine.solve(model=model, hardware=hardware, batch_size=1, precision="fp16")

# 3. Read the results
print(f"Bottleneck: {profile.bottleneck}")   # → 'Memory Bound'
print(f"Latency:    {profile.latency}")       # → 0.34 ms
print(f"Throughput: {profile.throughput}")    # → 2941 samples/sec
Note: Working with units

MLSys·im uses the Pint library for physical units. All quantities carry attached units (ms, GB, TFLOP/s, etc.). Use .to('ms') to convert between units. Use .magnitude to extract the raw number when you need it for calculations or plotting.


The Agent-Ready CLI

MLSys·im features a powerful, infrastructure-as-code (IaC) CLI designed for both human exploration and CI/CD automation. You can evaluate hardware and workloads directly from your terminal.

1. Explore the Registry

Discover built-in hardware, models, and infrastructure:

mlsysim zoo hardware
mlsysim zoo models

2. Quick Evaluation

Evaluate the physics of a workload on a specific hardware node instantly:

mlsysim eval Llama3_8B H100 --batch-size 32

3. Declarative Infrastructure (YAML)

Define your entire cluster and SLA constraints in a declarative mlsys.yaml file:

version: "1.0"
workload:
  name: "Llama3_70B"
  batch_size: 4096
hardware:
  name: "H100"
  nodes: 64
ops:
  region: "Quebec"
  duration_days: 14.0
constraints:
  assert:
    - metric: "performance.latency"
      max: 50.0

Evaluate the full 3-lens scorecard (Feasibility, Performance, Macro):

mlsysim eval mlsys.yaml

For automated scripts and AI agents, use the -o json flag for strict, parseable output.


Supported Workloads

MLSys·im models five families of ML architectures. Each workload type knows how to compute its own FLOPs, memory footprint, and arithmetic intensity:

| Workload | Key Parameters | Scaling Behavior |
|---|---|---|
| Transformer | Parameters, layers, heads, sequence length | 2P FLOPs/token |
| CNN | Parameters, inference FLOPs | Fixed per image |
| Sparse (MoE) | Total vs. active parameters, experts | Active P for FLOPs, total P for memory |
| SSM (Mamba) | Parameters, state dimension | O(1) state cache |
| Diffusion | Parameters, denoising steps T | T × FLOPs/step |

# Access workloads from the Model Zoo
model = mlsysim.Models.Language.Llama3_70B    # Transformer
model = mlsysim.Models.ResNet50               # CNN

# Or define a custom workload
from mlsysim import TransformerWorkload
custom = TransformerWorkload(
    name="My-Model", parameters=7e9,
    layers=32, hidden_dim=4096, heads=32
)
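The scaling rules in the table above reduce to plain arithmetic. A sketch (ordinary Python, not the MLSys·im API; the 70B figure is just an example):

```python
# Per-family scaling rules from the workload table, as plain functions.

def transformer_flops_per_token(params: float) -> float:
    # Rule of thumb: ~2 FLOPs per parameter per generated token
    return 2 * params

def moe_costs(total_params: float, active_params: float) -> tuple[float, float]:
    # MoE: FLOPs follow the *active* parameters, memory the *total*
    flops_per_token = 2 * active_params
    memory_params = total_params
    return flops_per_token, memory_params

def diffusion_total_flops(flops_per_step: float, steps: int) -> float:
    # Diffusion: T denoising steps, each a full forward pass
    return steps * flops_per_step

# A 70B dense transformer costs ~140 GFLOPs per token
print(transformer_flops_per_token(70e9))
```

This is why an MoE model can match a much larger dense model on latency while still paying the dense model's memory bill.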

All workloads share a common lower() method that produces a ComputationGraph — the intermediate representation consumed by every solver. This means new workload types automatically work with all existing solvers.

See the Model Zoo for the full list of vetted workloads.


Understanding the Output

| Field | What it means |
|---|---|
| bottleneck | 'Memory Bound' or 'Compute Bound' — which resource limits performance |
| latency | Time to process one batch, derived from the roofline ceiling |
| throughput | Samples per second = batch_size / latency |
| latency_compute | Time if only compute were the constraint |
| latency_memory | Time if only memory bandwidth were the constraint |
Tip: The key insight

If latency_memory > latency_compute, you’re memory-bound: buying faster GPUs won’t help much. You need to increase batch size or use a more compute-dense operation (e.g., fused attention). If you’re compute-bound, that’s when parallelism and quantization pay off.
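The decision rule is just the roofline maximum: latency is the slower of the compute time and the memory time. A minimal pure-Python sketch (illustrative numbers, not the Engine's internals; the A100 peak and bandwidth figures are assumptions):

```python
def roofline_latency(flops: float, bytes_moved: float,
                     peak_flops: float, bandwidth: float,
                     efficiency: float = 0.5) -> tuple[float, str]:
    """Latency is the max of the compute-only and memory-only times."""
    latency_compute = flops / (efficiency * peak_flops)
    latency_memory = bytes_moved / bandwidth
    if latency_memory > latency_compute:
        return latency_memory, "Memory Bound"
    return latency_compute, "Compute Bound"

# Illustrative batch-1 inference: little compute per byte moved
latency, bottleneck = roofline_latency(
    flops=8e9,          # work per batch (made up)
    bytes_moved=16e9,   # weights streamed from HBM (made up)
    peak_flops=312e12,  # e.g. A100 fp16 tensor peak (assumed)
    bandwidth=2e12,     # ~2 TB/s HBM bandwidth (assumed)
)
print(bottleneck)  # Memory Bound
```

Raising batch size raises `flops` while `bytes_moved` (the streamed weights) stays roughly fixed, which is why batching pushes a workload toward the compute-bound side of the roofline.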


Exploring the Zoo

MLSys·im ships with vetted registries of hardware, models, infrastructure, and systems. Use tab-completion to explore:

# Hardware: Cloud, Workstation, Mobile, Edge, Tiny categories
mlsysim.Hardware.Cloud.H100
mlsysim.Hardware.Edge.JetsonOrinNX
mlsysim.Hardware.Tiny.ESP32_S3
mlsysim.Hardware.Mobile.iPhone15Pro

# Models: Language, Vision, Tiny, StateSpace, GenerativeVision categories
mlsysim.Models.Language.Llama3_70B
mlsysim.Models.Vision.ResNet50
mlsysim.Models.StateSpace.Mamba_2_8B
mlsysim.Models.GenerativeVision.StableDiffusion_v1_5

# Infrastructure: Regional grids
mlsysim.Infra.Grids.Quebec      # hydro: ~20 gCO2/kWh
mlsysim.Infra.Grids.Poland      # coal:  ~820 gCO2/kWh
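Emissions scale linearly with the grid's carbon intensity, so the regional choice dominates the macro lens. A quick back-of-envelope check using the intensities from the comments above (the energy figure is hypothetical):

```python
# Emissions = energy x grid carbon intensity (plain arithmetic,
# not the MLSys·im API; energy_kwh is a made-up example).

def emissions_kg(energy_kwh: float, grid_gco2_per_kwh: float) -> float:
    return energy_kwh * grid_gco2_per_kwh / 1000  # grams -> kilograms

energy_kwh = 100_000  # hypothetical training run
quebec = emissions_kg(energy_kwh, 20)    # hydro-heavy grid
poland = emissions_kg(energy_kwh, 820)   # coal-heavy grid
print(quebec, poland)  # 2000.0 82000.0 (a 41x gap)
```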

Adjusting the Efficiency Parameter

The efficiency parameter (η) is the single most important tuning knob in MLSys·im. It represents the fraction of theoretical peak hardware performance that is actually achieved in practice.

# Default: well-optimized training (η = 0.5)
profile_default = Engine.solve(
    model=model, hardware=hardware,
    batch_size=32, precision="fp16", efficiency=0.5
)

# Conservative: typical inference workload (η = 0.35)
profile_inference = Engine.solve(
    model=model, hardware=hardware,
    batch_size=32, precision="fp16", efficiency=0.35
)

print(f"Training estimate:  {profile_default.latency}")
print(f"Inference estimate: {profile_inference.latency}")

Typical efficiency ranges:

| Scenario | η range | Notes |
|---|---|---|
| Well-optimized training (fp16) | 0.35–0.55 | Megatron-LM, DeepSpeed |
| Inference (fp16) | 0.25–0.45 | vLLM, TensorRT-LLM |
| Inference (int8) | 0.20–0.40 | Quantized serving |
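In the compute-bound regime, achieved throughput is simply η times the theoretical peak, so latency scales as 1/η. A quick illustration (the 312 TFLOP/s peak is an assumed A100 fp16 figure, not from the library):

```python
# Achieved compute throughput = efficiency x theoretical peak.
peak_tflops = 312        # e.g. A100 fp16 tensor peak (assumed)
for eta in (0.5, 0.35):  # the training and inference defaults above
    achieved = eta * peak_tflops
    print(f"eta={eta}: {achieved:.1f} TFLOP/s achieved")
```

Halving η doubles the compute-bound latency estimate, which is why choosing η honestly matters more than any other single input.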

See the Accuracy & Validation page for guidance on choosing η for different scenarios.


Defining Custom Models

You are not limited to the Zoo. Define any model by specifying its parameters and FLOPs:

from mlsysim import TransformerWorkload
from mlsysim import ureg

my_model = TransformerWorkload(
    name="My-Custom-LLM",
    architecture="Transformer",
    parameters=13e9 * ureg.param,
    layers=40,
    hidden_dim=5120,
    heads=40,
    kv_heads=8,
    inference_flops=2 * 13e9 * ureg.flop  # Rule of thumb: ~2 FLOPs per parameter per forward pass
)

profile = Engine.solve(model=my_model, hardware=hardware, batch_size=1)
print(f"Bottleneck: {profile.bottleneck}")
print(f"Latency:    {profile.latency}")

Importing Models from HuggingFace

Instead of defining models manually, you can import any public model directly from HuggingFace Hub:

from mlsysim.models import import_hf_model

# Import any public model — no torch or transformers dependency required
model = import_hf_model("meta-llama/Meta-Llama-3-8B")

profile = Engine.solve(model=model, hardware=hardware, batch_size=1)
print(f"Bottleneck: {profile.bottleneck}")
print(f"Parameters: {model.parameters.to('Gparam'):.2f}")

The importer fetches the model’s config.json and analytically estimates parameter count from the architecture fields (layers, hidden_dim, heads, FFN size). No GPU or large dependencies required.
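To make the analytic estimate concrete, here is a rough parameter count for a Llama-style decoder from its config fields. This is a sketch of the idea, not the importer's actual code; the config values below are the public Meta-Llama-3-8B ones, and norm/bias terms are ignored as negligible:

```python
# Rough analytic parameter count from config.json-style fields
# (illustrative sketch; assumes a gated FFN and grouped-query attention).

def estimate_params(vocab: int, d: int, layers: int,
                    ffn: int, heads: int, kv_heads: int) -> int:
    head_dim = d // heads
    attn = 2 * d * d + 2 * d * kv_heads * head_dim  # Q, O + smaller K, V (GQA)
    mlp = 3 * d * ffn                               # gated FFN: up, gate, down
    embed = vocab * d
    lm_head = vocab * d                             # untied output projection
    return layers * (attn + mlp) + embed + lm_head

total = estimate_params(vocab=128256, d=4096, layers=32,
                        ffn=14336, heads=32, kv_heads=8)
print(f"{total / 1e9:.2f}B parameters")  # ~8B
```

The estimate lands within a percent or two of the published 8B count, which is plenty for roofline-level analysis.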


Next Steps

Tip: Recommended path

Follow the structured learning path on the Tutorials page, starting with the Hello, Roofline tutorial.

For a complete reference of which solver to use for different questions, see the Resolver Guide.

For production capacity planning and cost modeling, see For Engineers.
