Machine Learning Systems

TWO-VOLUME TEXTBOOK

Two volumes. One curriculum.
The physics of AI engineering.

A rigorous, principles-first treatment of how ML systems
are built, optimized, and deployed—from a single
machine to fleet-scale infrastructure.

TinyTorch · MLSys·im · Labs · Hardware Kits · Slides · Instructors · Interviews

GitHub · Open Collective

Volume I

Introduction to Machine Learning Systems

HTML · PDF · EPUB

Volume II

Machine Learning Systems at Scale

HTML · PDF · EPUB


TINYTORCH

Build it.
From scratch.

20 interactive modules.
Zero magic.

Understand the inner workings of modern ML frameworks by building your own tensor library, automatic differentiation engine, and neural network modules in Python.

Start Building →

class Tensor:
    def __init__(self, data):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None

A pedagogical framework for learning ML systems engineering.
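The Tensor skeleton above can be carried one step further to show the reverse-mode autodiff idea TinyTorch builds toward. This is a minimal sketch; the operator overloads and `backward` method here are illustrative assumptions, not TinyTorch's actual API.

```python
# Minimal reverse-mode autodiff sketch (illustrative, not the TinyTorch API).
class Tensor:
    def __init__(self, data, _parents=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None   # fills in the parents' grads
        self._parents = _parents

    def __add__(self, other):
        out = Tensor(self.data + other.data, (self, other))
        def _backward():
            # d(out)/d(self) = d(out)/d(other) = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Tensor(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topologically order the graph, then propagate grads in reverse.
        order, seen = [], set()
        def visit(t):
            if id(t) not in seen:
                seen.add(id(t))
                for p in t._parents:
                    visit(p)
                order.append(t)
        visit(self)
        self.grad = 1.0
        for t in reversed(order):
            t._backward()

x, y = Tensor(2.0), Tensor(3.0)
z = x * y + x
z.backward()
print(x.grad, y.grad)  # dz/dx = y + 1 = 4.0, dz/dy = x = 2.0
```

The closure stored in `_backward` is the core trick: each op records how to push its output gradient back to its inputs, and `backward()` replays those closures in reverse topological order.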

MLSYS·IM

Model the
trade-offs.

One command.
Every bottleneck.

A first-principles modeling engine for reasoning about ML system performance. Evaluate training, serving, and distributed configurations before committing hardware or code.

Explore MLSys·im →

$ mlsysim eval llama-3-70b --hw h100 --batch 1,32,128

Model: Llama-3-70B · HW: H100 (80 GB) · Precision: BF16

Batch   Regime           MFU    HBM    TTFT
1       Memory-bound     12%    54%    28 ms
32      At ridge point   62%    81%    112 ms
128     Compute-bound    71%    97%    840 ms

[Roofline chart: attainable FLOP/s vs. arithmetic intensity, with the ridge point separating the memory-bound and compute-bound regimes]

Change one parameter. Watch every bottleneck shift.
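The dashboard above is doing roofline analysis: attainable throughput is capped by either memory bandwidth or peak compute, depending on arithmetic intensity. A minimal sketch of that reasoning, using assumed H100-class numbers (peak BF16 FLOP/s and HBM bandwidth are our approximations, not vendor-exact figures):

```python
# Back-of-envelope roofline, in the spirit of MLSys·im.
# Hardware constants are illustrative assumptions for an H100-class GPU.
PEAK_FLOPS = 989e12   # ~989 TFLOP/s BF16 (assumed)
HBM_BW     = 3.35e12  # ~3.35 TB/s HBM bandwidth (assumed)

def attainable_flops(arith_intensity):
    """Roofline: performance is capped by bandwidth or by peak compute."""
    return min(PEAK_FLOPS, arith_intensity * HBM_BW)

ridge = PEAK_FLOPS / HBM_BW  # intensity (FLOP/byte) where the roofs meet
print(f"ridge point ~= {ridge:.0f} FLOP/byte")

for ai in (30, ridge, 600):  # e.g. small-batch decode vs large-batch prefill
    f = attainable_flops(ai)
    regime = "memory-bound" if ai < ridge else "compute-bound"
    print(f"AI={ai:6.0f} FLOP/B -> {f/1e12:6.0f} TFLOP/s ({regime})")
```

Raising batch size raises arithmetic intensity (more FLOPs per byte of weights moved), which is why MFU climbs and the workload slides from memory-bound toward compute-bound in the table above.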

INTERACTIVE LABS

Learn by
doing.

Marimo notebooks.
Coming Summer 2026.

Interactive labs that reveal the hidden costs of ML systems. Explore sustainability, performance trade-offs, and hardware constraints through hands-on simulation.

View Labs →

Lab 15 · Sustainable AI
Same 1,000 GPU-hours. Pick your datacenter.

Location     Grid      Renewable   CO₂     Water    PUE    gCO₂/kWh
Norway       Hydro     98%         1.2 t   0 ML     1.06   10
France       Nuclear   90%         3.0 t   0.8 ML   1.10   50
Singapore    Gas       36%         24 t    2.4 ML   1.20   400
US Average   Mixed     40%         26 t    3.8 ML   1.12   429
India        Coal      20%         42 t    4.2 ML   1.40   700
Poland       Coal      15%         52 t    5.1 ML   1.58   820

⚡ Location is a 43× carbon multiplier for the same compute

Predict, explore, and discover why your intuition was wrong.
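The standard first-order model behind estimates like the table above multiplies IT energy by the facility's PUE and the grid's carbon intensity. A sketch, with an assumed per-GPU power draw; the lab's dashboard folds in additional workload factors, so absolute numbers here are illustrative rather than a reproduction of its figures:

```python
# First-order operational carbon: CO2 = E_IT × PUE × grid intensity.
# The 0.7 kW per-GPU draw is our assumption, not a figure from the lab.
def co2_tonnes(gpu_hours, gpu_kw, pue, grid_g_per_kwh):
    it_energy_kwh = gpu_hours * gpu_kw           # energy at the servers
    facility_kwh = it_energy_kwh * pue           # add cooling/overhead
    return facility_kwh * grid_g_per_kwh / 1e6   # grams -> tonnes

sites = {                       # (PUE, gCO2/kWh) from the lab's table
    "Norway (hydro)": (1.06, 10),
    "Poland (coal)":  (1.58, 820),
}
for name, (pue, ci) in sites.items():
    print(f"{name}: {co2_tonnes(1000, 0.7, pue, ci):.2f} t CO2")
```

Grid carbon intensity is the dominant term: the same job's footprint swings by roughly two orders of magnitude between the cleanest and dirtiest grids before any software optimization is applied.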

HARDWARE KITS

Deploy to
the edge.

Real silicon.
Real constraints.

Take your models out of the cloud and into the physical world. Hands-on deployment labs using Arduino, Raspberry Pi, and Seeed Studio hardware.

Explore Kits →


Microcontrollers, single-board computers, and specialized accelerators.

LECTURE SLIDES

Teach it.
Ready to go.

35 Beamer decks.
~38 hours of content.

Complete lecture slide decks with speaker notes, active learning exercises, and 266 original SVG diagrams. Available in Beamer, PDF, and PowerPoint formats.

Browse Slides →

Intro · Systems · DNN · Training · Accel · Deploy · Ethics

The Iron Law of ML Systems
T_total = D_vol/BW + O/(R_peak · η) + L_lat
• Data Term — memory bandwidth is the binding constraint
• Compute Term — utilization η rarely exceeds 0.7
• Latency Term — irreducible orchestration overhead
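Plugging numbers into the Iron Law makes the slide's point concrete. All values below are assumptions chosen for the arithmetic (weights moved for one decode step of a 70B BF16 model on an H100-class GPU), not measurements from the book:

```python
# Worked example of the slide's Iron Law:
#   T_total = D_vol/BW + O/(R_peak·η) + L_lat
# Every constant below is an illustrative assumption.
D_vol  = 140e9    # bytes moved (70B params × 2 B in BF16)
BW     = 3.35e12  # memory bandwidth, B/s (assumed)
O      = 140e9    # FLOPs in the step (assumed)
R_peak = 989e12   # peak FLOP/s (assumed)
eta    = 0.5      # achieved utilization (rarely exceeds ~0.7)
L_lat  = 5e-3     # fixed orchestration latency, s (assumed)

t_data    = D_vol / BW
t_compute = O / (R_peak * eta)
t_total   = t_data + t_compute + L_lat
print(f"data {t_data*1e3:.1f} ms, compute {t_compute*1e3:.3f} ms, "
      f"total {t_total*1e3:.1f} ms")
```

With these numbers the data term (~42 ms) dwarfs the compute term (~0.3 ms): exactly the "memory bandwidth is the binding constraint" regime the first bullet describes.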

INSTRUCTOR HUB

Adopt it.
Course in a box.

Two-semester curriculum.
Everything you need.

Syllabi, assessment rubrics, pedagogy guides, and TA resources for teaching AI Engineering. Designed for adoption at any university.

The Blueprint →

The Blueprint — Course Architecture
ML Systems · Two-Semester Curriculum

• 📖 Semester 1: Foundations · 16 weeks · Vol I · Single-machine systems · 8 assignments · 2 exams
• 🌐 Semester 2: At Scale · 16 weeks · Vol II · Distributed systems · 6 assignments · capstone
• 📊 Assessment · Rubrics · Grading · Peer review templates · Project milestones
• 🎓 Teaching Staff · Pedagogy guide · TA handbook · Office hours playbook

INTERVIEW PREP

Ace the
interview.

Systems questions.
Architect-level answers.

Study guides, topic maps, and practice questions organized by deployment domain: cloud, edge, mobile, and TinyML. Built from real interview patterns.

Start Practicing →

Sample questions:
• Edge / TinyML: How does quantization affect inference latency on edge accelerators?
• Cloud: A 70B-parameter model needs to serve 1,000 req/s. Walk through your hardware selection and parallelism strategy.

The Architect's Rubric · L5 Systems Design: Hardware · Parallelism · Trade-offs
Domains: Cloud · Edge · Mobile · TinyML

Vijay Janapa Reddi, Harvard University · MIT Press 2026
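The cloud question above is typically answered with a back-of-envelope sizing pass before any architecture talk. A sketch of that arithmetic; every constant here (card memory, output length, per-GPU decode throughput) is an illustrative assumption, not a recommendation from the book:

```python
# Back-of-envelope sizing: 70B-param model, 1,000 req/s.
# All constants are illustrative assumptions.
params = 70e9
bytes_per_param = 2                    # BF16 weights
weights_gb = params * bytes_per_param / 1e9

gpu_hbm_gb = 80                        # H100-class card (assumed)
min_gpus_for_weights = -(-weights_gb // gpu_hbm_gb)   # ceil division

req_per_s = 1000
tokens_per_req = 500                   # assumed mean output length
decode_tok_s_per_gpu = 1000            # assumed batched decode throughput
gpus_for_throughput = req_per_s * tokens_per_req / decode_tok_s_per_gpu

print(f"weights: {weights_gb:.0f} GB -> at least "
      f"{min_gpus_for_weights:.0f} GPUs per replica (tensor parallelism)")
print(f"throughput: ~{gpus_for_throughput:.0f} GPUs total -> replicate "
      f"the TP group and rely on continuous batching")
```

The two answers pull in different directions: weights fix the minimum tensor-parallel group size, while request rate fixes how many replicas of that group you need; in practice the KV cache pushes the per-replica GPU count above the weights-only minimum.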

© 2024-2026 Harvard University. Licensed under CC-BY-NC-SA 4.0
