For Instructors

Reproducible, hardware-independent exercises for ML systems courses.

MLSys·im provides a framework for assigning analytically grounded problem sets where every answer is deterministic and reproducible — regardless of what hardware your students have access to.


Why MLSys·im for Teaching?

| Challenge | How MLSys·im Helps |
| --- | --- |
| Students lack GPU access | All analysis runs on a laptop; no cloud credits needed |
| Homework answers vary by hardware | Vetted registry specs produce identical results everywhere |
| Hard to grade open-ended systems questions | Analytical solvers give deterministic, verifiable outputs |
| Specifications become stale | Registry updated from official datasheets; one update propagates everywhere |
| Students memorize without understanding | “Predict first” exercises build genuine intuition |

Course Integration Patterns

Pattern 1: Textbook Companion

MLSys·im maps directly to chapters in the Machine Learning Systems textbook. Assign tutorials alongside readings:

| Week | Textbook Chapter | MLSys·im Assignment |
| --- | --- | --- |
| 3 | Hardware Acceleration | Hello, Roofline + The Memory Wall |
| 4 | Data Engineering | Starving the GPU: CPU preprocessing bottleneck |
| 5 | Model Serving | Two Phases, One Request: TTFT/ITL analysis |
| 7 | Distributed Training | Scaling to 1000 GPUs: 3D parallelism |
| 8 | Scaling Physics | The $9M Question: Inference compute scaling |
| 9 | Sustainable AI | Geography is a Systems Variable: Carbon footprint |
| 11 | Compute Infrastructure | Sensitivity Analysis: Binding constraints + procurement |

Pattern 2: Standalone Labs

Use individual tutorials as self-contained lab assignments in any systems course. Each tutorial includes exercises with clear expected outputs.

Pattern 3: Capstone Projects

Advanced students can write custom solvers (see Extending MLSys·im) or compose multiple solvers to answer research-style questions.


Assignment Ideas

Homework: Hardware Comparison (30 min)

Using Engine.solve(), compare ResNet-50 inference latency on the A100, H100, and Jetson AGX at batch sizes 1, 32, and 256. For each configuration, state whether the workload is memory-bound or compute-bound and explain why the bottleneck changes.

Lab: Carbon-Aware Training (45 min)

Using the SustainabilityModel, calculate the carbon footprint of training GPT-3 on a 256-GPU H100 cluster in Quebec vs. US Average vs. Poland. Produce a table and a 2-paragraph analysis of why location matters.
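For a hand-check of the lab, the underlying arithmetic can be sketched as energy times grid carbon intensity. This is a minimal sketch, not the SustainabilityModel implementation: the function names, the PUE parameter, and every numeric input below are hypothetical placeholders, and real grid intensities should come from the registry.

```python
def training_energy_kwh(num_gpus: int, gpu_power_w: float, hours: float,
                        pue: float = 1.0) -> float:
    """Facility energy in kWh: device power x time, scaled by datacenter PUE."""
    return num_gpus * gpu_power_w * hours * pue / 1000.0


def carbon_kg(energy_kwh: float, grid_g_co2_per_kwh: float) -> float:
    """Emissions in kg CO2e for a grid intensity given in g CO2e per kWh."""
    return energy_kwh * grid_g_co2_per_kwh / 1000.0


# Placeholder inputs for illustration only (NOT registry values).
energy = training_energy_kwh(num_gpus=256, gpu_power_w=700.0, hours=100.0, pue=1.2)
illustrative_grids = {"low_carbon_grid": 50.0, "high_carbon_grid": 500.0}

for name, intensity in illustrative_grids.items():
    print(f"{name}: {carbon_kg(energy, intensity):.1f} kg CO2e")
```

Because energy is the same in every region, the footprint scales linearly with grid intensity, which is the point students should reach in their analysis.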

Exam Question: Back-of-Envelope

The NVIDIA H100 has 1,979 TFLOP/s (FP16 Tensor Core; the datasheet figure with sparsity) and 3.35 TB/s memory bandwidth. What is the ridge point in FLOP/Byte? If a model has an arithmetic intensity of 50 FLOP/Byte, is it compute-bound or memory-bound? Show your work.
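An answer-key sketch for this question, using only the two figures quoted in it. The helper names are illustrative and not part of MLSys·im.

```python
# Figures as quoted in the exam question.
PEAK_FLOPS = 1979e12      # FLOP/s (FP16)
PEAK_BANDWIDTH = 3.35e12  # Byte/s


def ridge_point(peak_flops: float, peak_bandwidth: float) -> float:
    """Arithmetic intensity (FLOP/Byte) where the compute and memory limits meet."""
    return peak_flops / peak_bandwidth


def classify(intensity: float, ridge: float) -> str:
    """Below the ridge, memory bandwidth is the limit; at or above it, compute is."""
    return "memory-bound" if intensity < ridge else "compute-bound"


ridge = ridge_point(PEAK_FLOPS, PEAK_BANDWIDTH)
print(f"ridge point = {ridge:.1f} FLOP/Byte")          # 590.7 FLOP/Byte
print(f"50 FLOP/Byte is {classify(50.0, ridge)}")      # memory-bound
```

Since 50 FLOP/Byte is far below the ridge point, the expected answer is memory-bound.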

Homework: Edge vs. Cloud Trade-off (30 min)

Using Engine.solve(), compare MobileNetV2 inference latency on ESP32-S3, Jetson Orin NX, iPhone 15 Pro, and H100. Calculate energy-per-inference for each (latency × TDP). Which device has the best latency? Which has the best energy efficiency (inferences per joule)? When would you choose the edge device despite worse latency?
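The energy step of this homework can be sketched as follows, using the latency × TDP estimate named in the prompt. The device names and numbers are hypothetical placeholders, not registry data; students would substitute the latencies returned by the solver. Note that TDP is an upper bound on power draw, so this overestimates energy for lightly loaded devices.

```python
def energy_per_inference_j(latency_s: float, tdp_w: float) -> float:
    """Energy estimate in joules: latency (s) x thermal design power (W)."""
    return latency_s * tdp_w


def inferences_per_joule(latency_s: float, tdp_w: float) -> float:
    """Energy efficiency: the reciprocal of energy per inference."""
    return 1.0 / energy_per_inference_j(latency_s, tdp_w)


# Hypothetical (latency in seconds, TDP in watts) pairs for illustration only.
devices = {
    "fast_high_power": (0.001, 500.0),
    "slow_low_power": (0.050, 2.0),
}

best_latency = min(devices, key=lambda d: devices[d][0])
best_efficiency = max(devices, key=lambda d: inferences_per_joule(*devices[d]))

print("best latency:", best_latency)
print("best efficiency:", best_efficiency)
```

With these placeholder values the two rankings disagree, which is the trade-off the final question asks students to discuss.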

Lab: Data Pipeline Bottleneck (30 min)

Using DataModel and TransformationModel, configure an ImageNet training pipeline with 8 CPU workers on an A100. At what batch size does the CPU preprocessing become the bottleneck? Verify by comparing transform_time to compute_time. Propose two solutions and evaluate each with the solver.

Homework: Checkpoint Cost Analysis (45 min)

Using ReliabilityModel, calculate the fleet MTBF for a 256-GPU H100 cluster. Then use CheckpointModel to find the optimal checkpoint interval and the MFU penalty. How much training time is “wasted” on checkpointing over a 30-day run? What happens if you double the checkpoint frequency?
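For sanity-checking student answers by hand, one common first-order model is the Young/Daly approximation. The source does not state which formulas ReliabilityModel and CheckpointModel use, so treat this as an assumption; the per-device MTBF and checkpoint write time below are hypothetical placeholders.

```python
import math


def fleet_mtbf_hours(device_mtbf_hours: float, num_devices: int) -> float:
    """Fleet MTBF assuming independent, exponentially distributed failures."""
    return device_mtbf_hours / num_devices


def young_daly_interval_hours(checkpoint_hours: float, mtbf_hours: float) -> float:
    """Young/Daly first-order optimal interval: sqrt(2 * write time * MTBF)."""
    return math.sqrt(2.0 * checkpoint_hours * mtbf_hours)


def overhead_fraction(interval_hours: float, checkpoint_hours: float,
                      mtbf_hours: float) -> float:
    """Approximate fraction of time lost to checkpoint writes plus expected rework."""
    return checkpoint_hours / interval_hours + interval_hours / (2.0 * mtbf_hours)


# Placeholder inputs (NOT registry values): 50,000 h per device, 3-minute checkpoint.
mtbf = fleet_mtbf_hours(50_000.0, 256)
tau = young_daly_interval_hours(0.05, mtbf)
at_optimum = overhead_fraction(tau, 0.05, mtbf)
at_double_frequency = overhead_fraction(tau / 2.0, 0.05, mtbf)

run_hours = 30 * 24  # the 30-day run from the prompt
print(f"optimal interval: {tau:.2f} h")
print(f"lost at optimum: {at_optimum * run_hours:.1f} h")
print(f"lost at double frequency: {at_double_frequency * run_hours:.1f} h")
```

Under this model, checkpointing twice as often as the optimum doubles the write cost but only halves the rework cost, so total overhead rises.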

Lab: LLM Sizing Exercise (45 min)

A startup wants to serve Llama-3.1-70B with ITL < 30 ms/token. Using SynthesisSolver, derive the minimum memory bandwidth required. Which hardware in the Silicon Zoo meets this requirement? Using EconomicsModel, calculate the annual cost of each qualifying option. Present a procurement recommendation with cost projections.
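A back-of-envelope bound for the first step, assuming decode is memory-bound and every weight is read once per generated token. The simplifications are assumptions, not the SynthesisSolver method: a nominal 70e9 parameters, FP16 weights (2 bytes each), batch size 1, and no KV-cache traffic.

```python
def min_bandwidth_bytes_per_s(num_params: float, bytes_per_param: float,
                              itl_s: float) -> float:
    """Lower bound on memory bandwidth: weight bytes streamed once per token."""
    return num_params * bytes_per_param / itl_s


# Nominal 70e9 parameters in FP16, with the 30 ms/token target from the prompt.
bw = min_bandwidth_bytes_per_s(num_params=70e9, bytes_per_param=2, itl_s=0.030)
print(f"minimum bandwidth = {bw / 1e12:.2f} TB/s")  # 4.67 TB/s
```

When the model is sharded across several devices, this bound applies to their aggregate bandwidth, and lower-precision weights reduce it proportionally.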

Lab: Scaling Laws and Carbon (60 min)

Using ScalingModel, find the compute-optimal model size for a budget of 10²¹ FLOPs. Then use DistributedModel to determine how many H100s are needed and how long training will take. Finally, use SustainabilityModel to compare the carbon footprint in Quebec vs. Poland. Write a 3-paragraph memo recommending a training location with quantitative justification.
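The first step can be hand-checked with the widely used Chinchilla-style rule of thumb. This is a sketch under two stated assumptions, which may differ from what ScalingModel implements: training compute C ≈ 6·N·D, and a compute-optimal ratio of about 20 tokens per parameter.

```python
import math


def compute_optimal(budget_flops: float, tokens_per_param: float = 20.0,
                    flops_per_param_token: float = 6.0) -> tuple[float, float]:
    """Solve C = k * N * D with D = r * N for parameters N and tokens D."""
    n_params = math.sqrt(budget_flops / (flops_per_param_token * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


n, d = compute_optimal(1e21)
print(f"parameters: {n / 1e9:.2f}B")  # 2.89B
print(f"tokens:     {d / 1e9:.1f}B")  # 57.7B
```

Under these assumptions a 10²¹ FLOP budget corresponds to roughly 2.9 billion parameters trained on about 58 billion tokens, which gives students a target to compare against the solver output.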


Autograding with MLSys·im

Because every solver computes from frozen registry specifications, MLSys·im outputs are deterministic, which makes them well suited to automated grading:

  • Deterministic results: All solvers produce identical floating-point outputs from the same inputs, across platforms and semesters
  • Simple assertions: Write pytest-style checks against expected values
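```python
# Example: test_homework.py
import mlsysim
from mlsysim import Engine


def test_resnet50_bottleneck_a100():
    # Solve ResNet-50 inference on an A100 at batch size 1 in FP16.
    profile = Engine.solve(
        model=mlsysim.Models.ResNet50,
        hardware=mlsysim.Hardware.Cloud.A100,
        batch_size=1,
        precision="fp16",
    )
    # The categorical answer must match the key exactly.
    assert profile.bottleneck == "Memory Bound"
    # The latency is converted to milliseconds and must be within 0.05 ms of the key.
    assert abs(profile.latency.to("ms").magnitude - 0.42) < 0.05
```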
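When students submit numbers rather than code, the same idea can be applied to a dictionary of answers. The helper below is a hypothetical sketch, not part of MLSys·im; the answer key reuses the two expected values from the test above.

```python
import math


def grade(answers: dict, key: dict, rel_tol: float = 0.05) -> dict:
    """Return {question: bool}; strings match case-insensitively, numbers within rel_tol."""
    results = {}
    for question, expected in key.items():
        submitted = answers.get(question)
        if isinstance(expected, str):
            results[question] = (
                isinstance(submitted, str)
                and submitted.strip().lower() == expected.lower()
            )
        else:
            results[question] = (
                isinstance(submitted, (int, float))
                and math.isclose(submitted, expected, rel_tol=rel_tol)
            )
    return results


answer_key = {"bottleneck": "Memory Bound", "latency_ms": 0.42}
submission = {"bottleneck": "memory bound", "latency_ms": 0.43}
print(grade(submission, answer_key))
```

A relative tolerance keeps grading fair when students round intermediate results differently; a missing answer is simply marked incorrect.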
Tip: Semester-Proof Answer Keys

Pin the mlsysim version in your course requirements (mlsysim==0.1.0) to guarantee reproducibility. Registry specs are frozen per release — the same version always produces the same numbers.
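One way to catch a mismatched install before grading is a small guard built on the standard library's `importlib.metadata`. The function name is hypothetical, and where you call it (for example, from a `conftest.py`) is a course-level choice.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version_matches(package: str, pinned: str) -> bool:
    """True only if the package is installed at exactly the pinned version."""
    try:
        return version(package) == pinned
    except PackageNotFoundError:
        return False


# The pinned version comes from the tip above.
if not installed_version_matches("mlsysim", "0.1.0"):
    print("Warning: mlsysim==0.1.0 is not installed; answers may not match the key.")
```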


Reproducibility Guarantee

All specifications in the MLSys Zoo are:

  • Sourced from official manufacturer datasheets and published benchmarks
  • Typed with pint.Quantity for dimensional correctness
  • Frozen per release — mlsysim==0.1.0 always produces the same answers

This means your answer key works for every student, every semester.


Jupyter & Quarto Compatibility

All tutorials are designed to run in:

  • Jupyter Notebooks: standard .ipynb workflow
  • Quarto documents: render to HTML, PDF, or slides with quarto render
  • Google Colab: run pip install mlsysim in the first cell, then go

No GPU runtime required. CPU-only environments work perfectly because MLSys·im computes from equations, not empirical profiling.


Getting Started

  1. Point students to the Getting Started guide for installation
  2. Assign the Hello, Roofline tutorial as a warmup
  3. Use the Resolver Guide to select solvers for your course topics
  4. Browse the MLSys Zoo for available hardware and model specifications
