For Students

Build intuition for ML systems — without needing GPU hardware.

Whether you’re taking your first ML systems course or preparing for industry interviews, MLSys·im lets you experiment with real hardware specifications and see exactly why systems behave the way they do.


What You’ll Learn

By working through the MLSys·im tutorials and exercises, you will:

  • Identify bottlenecks — Determine whether a workload is memory-bound or compute-bound on any hardware, and understand why
  • Reason quantitatively — Use real datasheet numbers (not made-up examples) to calculate latency, throughput, and cost
  • Build systems intuition — See how batch size, precision, parallelism strategy, and datacenter location each affect performance
  • Think across the stack — Connect workload characteristics to hardware specs to infrastructure constraints
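
To make the first two goals concrete, here is a minimal hand-rolled sketch of a roofline check. It does not use the MLSys·im API. The hardware numbers are assumed A100 80GB datasheet values, the 7B-parameter decoder is hypothetical, and the cost model is deliberately simplified: it counts only weight reads and ignores activations and the KV-cache.

```python
# Hand-rolled roofline check; this does not call the mlsysim API.
# Assumed datasheet values for an NVIDIA A100 80GB (verify against the datasheet).
PEAK_FLOPS = 312e12      # FP16 dense tensor-core throughput, FLOP/s
PEAK_BW = 2.039e12       # HBM bandwidth, bytes/s

PARAMS = 7e9             # hypothetical 7B-parameter decoder
BYTES_PER_PARAM = 2      # fp16 weights


def decode_step(batch_size):
    """Roofline estimate for one decode step (one new token per sequence)."""
    flops = 2 * PARAMS * batch_size          # one multiply-add per weight per sequence
    bytes_moved = PARAMS * BYTES_PER_PARAM   # weights are read once per step
    t_compute = flops / PEAK_FLOPS
    t_memory = bytes_moved / PEAK_BW
    bound = "compute" if t_compute > t_memory else "memory"
    return bound, max(t_compute, t_memory)


# Arithmetic intensity (FLOP/byte) at which the two roofs meet
ridge = PEAK_FLOPS / PEAK_BW
print(f"Ridge point: {ridge:.0f} FLOP/byte")

for b in (1, 8, 64, 256):
    bound, latency = decode_step(b)
    print(f"batch={b:>3}  {bound}-bound  {latency * 1e3:.2f} ms/step")
```

Under these assumptions the arithmetic intensity of a decode step is roughly equal to the batch size, so the workload stays memory-bound until the batch size passes the ridge point of about 153 FLOP/byte.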

Your Learning Path

Work through the tutorials from top to bottom. Each one builds on the one before it.

| Step | Tutorial | You’ll Learn | Time |
|------|----------|--------------|------|
| 1 | Hello, Roofline | The roofline model, memory-bound vs. compute-bound, batch size sweeps | 10 min |
| 2 | The Memory Wall | Why 3.2× more FLOPS ≠ 3.2× speedup, bandwidth as binding constraint | 15 min |
| 3 | Two Phases, One Request | TTFT vs. ITL, the two phases of LLM inference | 15 min |
| 4 | KV-Cache: The Hidden Tax | KV-cache pressure, concurrent serving limits, PagedAttention | 20 min |
| 5 | Starving the GPU | CPU preprocessing bottlenecks, when the GPU starves | 15 min |
| 6 | Quantization: Not a Free Lunch | Regime-dependent speedup from quantization | 20 min |
| 7 | Scaling to 1000 GPUs | Data/tensor/pipeline parallelism, reliability at scale | 20 min |
| 8 | Geography is a Systems Variable | Energy, carbon footprint, regional grid effects | 15 min |
| 9 | The $9M Question | Inference-time compute scaling, cost of reasoning | 20 min |
| 10 | Sensitivity Analysis | Binding constraints, inverse synthesis, procurement decisions | 20 min |
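
As a taste of the kind of arithmetic the KV-cache tutorial involves, the sketch below sizes a KV-cache by hand. It is a back-of-the-envelope estimate, not the MLSys·im solver: the layer and head counts are an assumed Llama-2-7B-style shape, and the 80 GB memory budget and 14 GB of weights are hypothetical round numbers.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch, bytes_per_elem=2):
    """Bytes needed to hold keys and values for `batch` sequences of `seq_len` tokens."""
    # The leading 2 accounts for storing both a key and a value per token.
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem
    return per_token * seq_len * batch


# Assumed shape: a Llama-2-7B-style decoder with fp16 cache entries
LAYERS, KV_HEADS, HEAD_DIM = 32, 32, 128

per_seq = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, seq_len=4096, batch=1)
print(f"KV-cache per 4096-token sequence: {per_seq / 2**30:.1f} GiB")  # 2.0 GiB

# Hypothetical budget: 80 GB of accelerator memory minus 14 GB of fp16 weights
budget_bytes = 80e9 - 14e9
max_sequences = int(budget_bytes // per_seq)
print(f"Max concurrent 4096-token sequences: {max_sequences}")  # 30
```

The point of the exercise is that the concurrency limit here comes from memory capacity, not from compute: each full-length sequence costs 2 GiB regardless of how fast the accelerator is.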
Tip: Predict Before You Compute

Every tutorial includes “predict first” exercises. Before running code, write down what you expect. This practice builds the mental models that make you effective at systems reasoning.


How MLSys·im Pairs with the Textbook

MLSys·im is the companion framework for the Machine Learning Systems textbook. Each solver maps to specific chapters:

| Textbook Topic | MLSys·im Solver | What It Models |
|----------------|-----------------|----------------|
| Hardware Acceleration | SingleNodeModel | Roofline analysis, compute vs. memory bottleneck |
| Model Serving | ServingModel | TTFT, ITL, KV-cache memory |
| Distributed Training | DistributedModel | 3D parallelism, all-reduce, pipeline bubbles |
| Compute Infrastructure | EconomicsModel | CapEx, OpEx, TCO |
| Sustainable AI | SustainabilityModel | Energy, carbon, water usage |
| Fault Tolerance | ReliabilityModel | MTBF, checkpoint interval |
| Scaling Physics | ScalingModel, InferenceScalingModel | Chinchilla laws, CoT cost |
| Data Engineering | DataModel, TransformationModel | I/O stalls, CPU preprocessing |
| Sensitivity Analysis | SensitivitySolver, SynthesisSolver | Binding constraints, inverse Roofline |
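
To illustrate the kind of quantity the Fault Tolerance row refers to, here is a sketch of the classic Young/Daly first-order approximation for the checkpoint interval. Whether ReliabilityModel uses this exact formula is an assumption, not something this page states. The node MTBF, node count, and checkpoint cost below are hypothetical, and the cluster MTBF calculation assumes independent, exponentially distributed node failures.

```python
import math


def cluster_mtbf_s(node_mtbf_hours, num_nodes):
    """Cluster MTBF in seconds, assuming independent node failures."""
    return node_mtbf_hours * 3600 / num_nodes


def young_daly_interval_s(checkpoint_cost_s, mtbf_s):
    """First-order optimal time between checkpoints: sqrt(2 * C * MTBF)."""
    return math.sqrt(2 * checkpoint_cost_s * mtbf_s)


# Hypothetical cluster: 1000 nodes, 50,000-hour MTBF per node, 60 s to write a checkpoint
mtbf = cluster_mtbf_s(node_mtbf_hours=50_000, num_nodes=1000)
interval = young_daly_interval_s(checkpoint_cost_s=60, mtbf_s=mtbf)

print(f"Cluster MTBF:        {mtbf / 3600:.0f} hours")
print(f"Checkpoint interval: {interval / 60:.1f} minutes")
```

With these numbers a node that fails once in several years still produces a cluster-level failure about every 50 hours, which is why the checkpoint interval shrinks as the cluster grows.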

Not using the textbook? No problem — MLSys·im is self-contained. The Math Foundations page documents every equation.


Prerequisites

  • Python: Comfortable with functions, loops, and f-strings
  • Math: Basic algebra (no calculus required — all solver equations are arithmetic)
  • ML: Familiarity with terms like “model parameters,” “inference,” and “training” (the Glossary defines everything else)

No GPU, no cloud account, no special hardware required. Just:

pip install mlsysim

Quick Start

import mlsysim
from mlsysim import SingleNodeModel

# Load a model and hardware from the vetted registry
model = mlsysim.Models.ResNet50
gpu   = mlsysim.Hardware.Cloud.A100

# Solve: is this workload memory-bound or compute-bound?
solver = SingleNodeModel()
profile = solver.solve(model=model, hardware=gpu, batch_size=1, precision="fp16")

print(f"Bottleneck: {profile.bottleneck}")   # → Memory Bound
print(f"Latency:    {profile.latency.to('ms'):~.2f}")

Next Steps
