MLSys·im
Build intuition for ML system performance, cost, and carbon — from first principles.
pip install mlsysim
import mlsysim
from mlsysim import Engine

profile = Engine.solve(
    model=mlsysim.Models.ResNet50,
    hardware=mlsysim.Hardware.Cloud.A100,
    batch_size=1,
    precision="fp16",
)
Bottleneck: Memory Bound
Latency: 0.42 ms
Throughput: 2,381 img/s
MFU: 12.4%
Memory: 0.10 GB / 80 GB
AI (FLOP/B): 4.2 ← below ridge point
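The "below ridge point" verdict can be sanity-checked by hand. The ridge point of a roofline is peak compute divided by peak memory bandwidth; the A100 figures below are NVIDIA's published specs (~312 TFLOP/s FP16 tensor peak, ~2.0 TB/s HBM bandwidth), and the helper functions are an illustrative sketch, not the mlsysim API:

```python
# Back-of-envelope roofline check (illustrative sketch, not the mlsysim API).
# Published NVIDIA A100 80GB specs: ~312 TFLOP/s FP16 tensor core peak,
# ~2.0 TB/s HBM bandwidth.
PEAK_FLOPS = 312e12   # FLOP/s
PEAK_BW = 2.0e12      # bytes/s

def ridge_point(peak_flops, peak_bw):
    """Arithmetic intensity (FLOP/B) where the memory roof meets the compute roof."""
    return peak_flops / peak_bw

def bottleneck(ai, peak_flops=PEAK_FLOPS, peak_bw=PEAK_BW):
    """Below the ridge point a workload is memory-bound; above it, compute-bound."""
    return "memory-bound" if ai < ridge_point(peak_flops, peak_bw) else "compute-bound"

print(ridge_point(PEAK_FLOPS, PEAK_BW))  # 156.0 FLOP/B for the A100
print(bottleneck(4.2))                   # memory-bound
```

An arithmetic intensity of 4.2 FLOP/B sits far below the A100's ridge point of 156, which is why ResNet-50 at batch size 1 shows only 12.4% MFU.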
Identify whether your workload is memory-bound or compute-bound on any hardware.
Model pre-fill and decode phases, KV-cache pressure, and time-to-first-token.
Same workload, different region — up to 41× difference in carbon footprint.
3D parallelism decomposition with scaling efficiency and pipeline bubble analysis.
Is my workload memory-bound or compute-bound?
How many GPUs do I need to train a 70B model in 24 hours?
What is the carbon footprint of training in Iowa vs. Quebec?
What is the optimal checkpoint interval for a 1000-GPU job?
How much does quantization to INT8 actually save in latency?
What is the 3-year TCO for a 64×H100 cluster?
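The GPU-count question above yields to a back-of-envelope estimate using the common FLOPs ≈ 6 · params · tokens approximation for dense transformer training. The token budget, H100 peak, and sustained MFU below are illustrative assumptions, not mlsysim outputs:

```python
# Rough GPU count for "train a 70B model in 24 hours", via the standard
# training-cost approximation FLOPs ≈ 6 * parameters * tokens.
# Assumptions (illustrative): 1.4T tokens (Chinchilla-style ~20 tokens/param),
# H100 BF16 dense tensor peak ~989 TFLOP/s, 40% sustained MFU.
import math

params = 70e9
tokens = 1.4e12                       # assumed training token budget
total_flops = 6 * params * tokens     # ~5.9e23 FLOPs

peak = 989e12                         # H100 BF16 peak (published spec)
mfu = 0.40                            # assumed sustained utilization
per_gpu_day = peak * mfu * 24 * 3600  # FLOPs one GPU delivers in 24 h

gpus = math.ceil(total_flops / per_gpu_day)
print(gpus)                           # on the order of 17,000 GPUs
```

Under these assumptions the answer lands in the tens of thousands of GPUs, which is why the token budget and achievable MFU dominate any such plan.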
Roofline Analysis Compute vs. memory bottleneck identification using the Iron Law. Single-node latency and throughput.
LLM Serving Time-to-first-token (TTFT), inter-token latency (ITL), and KV-cache memory pressure.
3D Parallelism Data, tensor, and pipeline parallel scaling with communication overhead and bubble analysis.
Sustainability Energy, carbon footprint (kg CO₂e), and water usage across datacenter regions.
Total Cost of Ownership CapEx, OpEx, electricity, maintenance, and per-query economics over any time horizon.
Reliability & Queueing Fleet MTBF, checkpoint intervals, tail latency (P99), and SLA compliance.
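The "optimal checkpoint interval" in the reliability card is classically estimated with Young's approximation, τ ≈ √(2 · C · MTBF), where C is the checkpoint write time and MTBF is the mean time between failures of the whole fleet. The per-GPU MTBF and checkpoint cost below are illustrative assumptions:

```python
# Young's approximation for the optimal checkpoint interval:
#   tau ≈ sqrt(2 * C * MTBF_fleet)
# Numbers below are illustrative assumptions, not mlsysim defaults.
import math

per_gpu_mtbf_h = 50_000      # assumed single-GPU MTBF (hours)
n_gpus = 1000
fleet_mtbf_s = per_gpu_mtbf_h * 3600 / n_gpus  # any-component failure: 50 h

checkpoint_s = 300           # assumed 5-minute checkpoint write (C)

tau_s = math.sqrt(2 * checkpoint_s * fleet_mtbf_s)
print(round(tau_s / 3600, 2))  # optimal interval in hours (~2.9 h here)
```

Note how the fleet MTBF shrinks linearly with GPU count: the same per-GPU reliability that allows daily checkpoints on 8 GPUs forces near-hourly checkpoints at cluster scale.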
Memory-bound vs. compute-bound in 5 lines of Python. Sweep batch sizes and see the roofline crossover.
Quantify the memory footprint of model weights, activations, and optimizer state. Find out why your 7B model won't fit on one GPU.
Model the two phases of autoregressive generation (pre-fill and decode) and diagnose KV-cache pressure.
INT8 vs. FP16 vs. FP4 — measure the memory savings, throughput gains, and accuracy costs of compression.
Ring all-reduce communication, pipeline bubbles, and scaling efficiency on 256 GPUs.
Same model, same GPU, yet up to 41× difference in carbon footprint depending on where you train.
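The region effect reduces to a simple product: energy drawn times grid carbon intensity. The grid intensities, GPU power, and PUE below are illustrative round numbers (hydro-heavy grids can sit near 0.03 kg CO₂e/kWh, coal-heavy grids above 0.7), not mlsysim data:

```python
# Carbon footprint ≈ energy (kWh) * grid carbon intensity (kg CO2e/kWh).
# All inputs below are illustrative assumptions, not mlsysim data.
def training_co2e_kg(gpu_hours, gpu_power_kw, pue, intensity_kg_per_kwh):
    """kg CO2e for a run, including datacenter overhead via PUE."""
    energy_kwh = gpu_hours * gpu_power_kw * pue
    return energy_kwh * intensity_kg_per_kwh

run = dict(gpu_hours=10_000, gpu_power_kw=0.7, pue=1.2)  # assumed workload
low = training_co2e_kg(**run, intensity_kg_per_kwh=0.03)   # hydro-heavy grid
high = training_co2e_kg(**run, intensity_kg_per_kwh=0.75)  # coal-heavy grid
print(low, high, high / low)  # 252.0 kg vs 6300.0 kg: a 25x gap
```

Even this rough two-region comparison shows a 25× spread from grid mix alone; pairing the cleanest and dirtiest real regions widens the gap further.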