

MLSys·im

Build intuition for ML system performance, cost, and carbon — from first principles.

pip install mlsysim

Get Started · Tutorials

Companion to Harvard CS 249r · Python 3.10+ · No GPU required · GitHub · MIT License
Input
import mlsysim
from mlsysim import Engine

profile = Engine.solve(
    model=mlsysim.Models.ResNet50,
    hardware=mlsysim.Hardware.Cloud.A100,
    batch_size=1,
    precision="fp16",
)
Output
Bottleneck:  Memory Bound
Latency:     0.42 ms
Throughput:  2,381 img/s
MFU:         12.4%
Memory:      0.10 GB / 80 GB
AI (FLOP/B): 4.2  ← below ridge point

What it models

[Roofline chart: attainable FLOP/s vs. arithmetic intensity (FLOP/Byte), with the memory-bound and compute-bound regions meeting at the ridge point]

Roofline Analysis

Identify whether your workload is memory-bound or compute-bound on any hardware.
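The roofline comparison can be sketched in a few lines of plain Python. The peak numbers below are illustrative A100-class specs (312 TFLOP/s FP16 tensor peak, ~2 TB/s HBM bandwidth), not values read from mlsysim:

```python
# First-principles roofline check: compare a workload's arithmetic
# intensity (FLOP per byte moved) against the hardware ridge point.
PEAK_FLOPS = 312e12          # FP16 tensor-core peak, FLOP/s (assumed A100-class)
PEAK_BW = 2.0e12             # HBM bandwidth, bytes/s (assumed)
RIDGE = PEAK_FLOPS / PEAK_BW # FLOP/byte at the roofline "elbow", ~156

def classify(flops: float, bytes_moved: float) -> str:
    """Return 'memory-bound' or 'compute-bound' for a workload."""
    ai = flops / bytes_moved
    return "memory-bound" if ai < RIDGE else "compute-bound"

# A workload at 4.2 FLOP/byte sits far below the ~156 FLOP/byte
# ridge point, so it is limited by bandwidth, not compute.
print(classify(4.2e9, 1.0e9))  # memory-bound
```

An arithmetic intensity of 4.2, as in the batch-1 ResNet-50 example above, lands deep in the memory-bound region, which is why the reported MFU is low.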

Llama-3.1-8B on H100: pre-fill 4.2 ms TTFT (compute-bound) → decode 0.8 ms ITL (memory-bound) · KV-cache: 2.1 GB / 80 GB available

LLM Serving

Model pre-fill and decode phases, KV-cache pressure, and time-to-first-token.
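KV-cache pressure is straightforward to estimate by hand. This sketch assumes Llama-3.1-8B-like shapes (32 layers, 8 KV heads via GQA, head dimension 128, fp16); check your model's config before trusting the numbers:

```python
# Back-of-envelope KV-cache sizing for autoregressive decode.
# 2x accounts for storing both the K and V tensors at every layer.
def kv_cache_bytes(tokens, layers=32, kv_heads=8, head_dim=128, dtype_bytes=2):
    return 2 * layers * kv_heads * head_dim * dtype_bytes * tokens

gb = kv_cache_bytes(tokens=16_384) / 1e9
print(f"{gb:.1f} GB")  # ~2.1 GB for a 16K-token context
```

Under these assumptions each token costs 128 KB of cache, so a 16K-token context lands near the 2.1 GB figure shown in the card above.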

Quebec 20 g CO₂/kWh · Norway 10 g CO₂/kWh · US Avg 390 g CO₂/kWh · Poland 820 g CO₂/kWh

Sustainability

Same workload, different region — up to 41× difference in carbon footprint.
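Because operational carbon scales linearly with grid intensity, the regional spread is a one-line calculation. The intensities are the illustrative figures from the card above; the 1,000 kWh run energy is an assumption:

```python
# Same workload, different grid: carbon = energy x grid intensity.
GRID = {"Norway": 10, "Quebec": 20, "US Avg": 390, "Poland": 820}  # g CO2/kWh

def carbon_kg(energy_kwh: float, region: str) -> float:
    return energy_kwh * GRID[region] / 1000  # grams -> kilograms

run_kwh = 1_000  # assumed energy for one training run
for region, intensity in GRID.items():
    print(f"{region}: {carbon_kg(run_kwh, region):.0f} kg CO2e")

print(GRID["Poland"] / GRID["Quebec"])  # 41.0 -- the Quebec-to-Poland spread
```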

256× H100 running GPT-3 175B: Data Parallel 32× · Tensor Parallel 4× · Pipeline Parallel 2× · Scaling Efficiency 74% · Pipeline Bubble 6.3%

Distributed Training

3D parallelism decomposition with scaling efficiency and pipeline bubble analysis.
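The pipeline bubble in the card above follows from the standard GPipe-style formula: with p pipeline stages and m microbatches, a fraction (p − 1)/(m + p − 1) of each step is idle. The 3D layout mirrors the card (32 × 4 × 2 = 256 GPUs); the microbatch count is an assumption:

```python
# GPipe-style pipeline bubble fraction for p stages, m microbatches.
def bubble_fraction(stages: int, microbatches: int) -> float:
    return (stages - 1) / (microbatches + stages - 1)

dp, tp, pp = 32, 4, 2        # data x tensor x pipeline parallel degrees
assert dp * tp * pp == 256   # GPUs in the cluster

# With 2 stages and 15 microbatches the bubble is 1/16 = 6.25%,
# close to the 6.3% shown above.
print(f"{bubble_fraction(pp, 15):.2%}")
```

More microbatches shrink the bubble, at the cost of smaller per-microbatch kernels and more activation memory in flight.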

What can you answer?

Is my workload memory-bound or compute-bound?

How many GPUs do I need to train a 70B model in 24 hours?

What is the carbon footprint of training in Iowa vs. Quebec?

What is the optimal checkpoint interval for a 1000-GPU job?

How much does quantization to INT8 actually save in latency?

What is the 3-year TCO for a 64×H100 cluster?
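The GPU-count question above can be sketched with the common 6·N·D approximation for training FLOPs. Token count, achieved MFU, and per-GPU peak are labeled assumptions, not mlsysim outputs:

```python
# How many GPUs to train a 70B model in 24 hours?
N = 70e9             # parameters
D = 1.4e12           # training tokens (assumed Chinchilla-style 20 tokens/param)
total_flops = 6 * N * D   # standard 6*N*D training-FLOPs estimate

peak = 989e12        # assumed H100 BF16 dense peak, FLOP/s
mfu = 0.40           # assumed achieved model FLOPs utilization
seconds = 24 * 3600

gpus = total_flops / (peak * mfu * seconds)
print(round(gpus))   # on the order of 17,000 GPUs
```

The point of a first-principles model is that every one of these inputs is explicit, so you can see exactly how sensitive the answer is to MFU or token budget.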

Model every constraint in your ML system

21 Solvers · 22 System Walls · 6 Constraint Domains · <0.3 s Full Analysis

Roofline Analysis Compute vs. memory bottleneck identification using the Iron Law. Single-node latency and throughput.

LLM Serving Time-to-first-token (TTFT), inter-token latency (ITL), and KV-cache memory pressure.

3D Parallelism Data, tensor, and pipeline parallel scaling with communication overhead and bubble analysis.

Sustainability Energy, carbon footprint (kg CO₂e), and water usage across datacenter regions.

Total Cost of Ownership CapEx, OpEx, electricity, maintenance, and per-query economics over any time horizon.

Reliability & Queueing Fleet MTBF, checkpoint intervals, tail latency (P99), and SLA compliance.
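The checkpoint-interval question reduces to the classic Young/Daly approximation, t_opt ≈ √(2·C·M), where C is the checkpoint write time and M is the fleet MTBF. The node-MTBF and checkpoint-cost figures below are illustrative assumptions:

```python
import math

def fleet_mtbf_hours(node_mtbf_hours: float, nodes: int) -> float:
    # Assuming independent failures, fleet MTBF shrinks
    # linearly with node count.
    return node_mtbf_hours / nodes

def optimal_interval_hours(ckpt_hours: float, mtbf_hours: float) -> float:
    # Young/Daly optimal checkpoint interval.
    return math.sqrt(2 * ckpt_hours * mtbf_hours)

M = fleet_mtbf_hours(node_mtbf_hours=50_000, nodes=1_000)  # 50 h fleet MTBF
t = optimal_interval_hours(ckpt_hours=0.1, mtbf_hours=M)
print(f"checkpoint every {t:.1f} h")  # ~3.2 h under these assumptions
```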

See the full solver guide →

Learn by doing

Beginner

Hello, Roofline

Memory-bound vs. compute-bound in 5 lines of Python. Sweep batch sizes and see the roofline crossover.

Beginner

The Memory Wall

Quantify model weights, activations, and optimizer state. Find out why your 7B model won't fit on one GPU.
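The memory-wall arithmetic is simple enough to check by hand. This sketch assumes fp16 training weights with a standard Adam setup (fp32 momentum, variance, and master weights); activation memory is omitted and only makes things worse:

```python
# Why a 7B model won't fine-tune on one 80 GB GPU.
params = 7e9
weights_fp16 = params * 2      # 14 GB training weights
master_fp32 = params * 4       # 28 GB fp32 master copy (common mixed-precision setup)
adam_m_and_v = params * 4 * 2  # 56 GB fp32 momentum + variance
total_gb = (weights_fp16 + master_fp32 + adam_m_and_v) / 1e9

print(f"{total_gb:.0f} GB before activations")  # 98 GB > 80 GB
```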

Intermediate

Two Phases, One Request

Model the two phases of autoregressive generation (pre-fill and decode) and diagnose KV-cache pressure.

Intermediate

Quantization Trade-offs

INT8 vs. FP16 vs. FP4 — measure the memory savings, throughput gains, and accuracy costs of compression.
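The weight-memory side of that trade-off is a one-liner; this is a rough sketch, since real quantization schemes add scale and zero-point overhead on top:

```python
# Weight memory at each precision for a 7B-parameter model.
def weight_gb(params: float, bits: int) -> float:
    return params * bits / 8 / 1e9

params = 7e9
for bits in (16, 8, 4):
    print(f"{bits}-bit: {weight_gb(params, bits):.1f} GB")
# 16-bit: 14.0 GB / 8-bit: 7.0 GB / 4-bit: 3.5 GB
```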

Advanced

Scaling to 1000 GPUs

Ring all-reduce communication, pipeline bubbles, and scaling efficiency on 256 GPUs.

Advanced

Geography is a Systems Variable

Same model, same GPU, yet up to 41× difference in carbon footprint depending on where you train.

See all 12 tutorials →

MLSys·im First-principles ML systems modeling
MIT License · Cite: Reddi, V.J. (2025). MLSys·im: First-Principles Infrastructure Modeling for ML Systems.

© 2024-2026 Harvard University. Licensed under CC-BY-NC-SA 4.0

Part of the Machine Learning Systems textbook