
Coming 2026

Co-Labs

See ML Systems in Action

Watch quantization compress models. Measure memory hierarchies. Profile gradient flow.

Why Another Set of Notebooks?

There are many excellent Colab notebooks for ML. Most demonstrate algorithms. Few help you understand systems.

When you quantize a model from FP32 to INT8, what actually happens to the weights? When you increase batch size, where does memory go? When you add a layer, how does gradient flow change?

Co-Labs are designed to answer these questions. Each notebook is aligned with a specific textbook chapter, so you can experiment with the exact system concepts you just read about. The goal is not to teach you how to use PyTorch. It's to show you why PyTorch works the way it does.

— Vijay
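To make the first of those questions concrete, here is a minimal sketch (not from the notebooks themselves) that inspects what PyTorch's eager-mode INT8 dynamic quantization does to one Linear layer's weights; the layer size is arbitrary and this is just one of several quantization routes.

```python
# Minimal sketch: inspect what INT8 dynamic quantization does to the
# weights of a single Linear layer. Layer size is arbitrary.
import torch
import torch.nn as nn

fp32_layer = nn.Linear(256, 256)
print(fp32_layer.weight.dtype)                      # torch.float32

# Dynamic quantization swaps the FP32 weight for an INT8 tensor plus
# scale/zero-point metadata used to dequantize during the matmul.
int8_model = torch.quantization.quantize_dynamic(
    nn.Sequential(fp32_layer), {nn.Linear}, dtype=torch.qint8
)
q_weight = int8_model[0].weight()                   # quantized weight tensor
print(q_weight.dtype)                               # torch.qint8
print(q_weight.int_repr()[:2, :4])                  # the stored 8-bit integers

# Round-trip error: dequantize and compare against the original weights.
err = (q_weight.dequantize() - fp32_layer.weight).abs().max().item()
print(f"max absolute weight error: {err:.6f}")
```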

The Systems Learning Path

Co-Labs fit between conceptual understanding and building from scratch.

  • 📖 Understand. Learn system design principles: memory, compute, parallelism, efficiency. Textbook →
  • 🔬 Experiment. Measure tradeoffs, profile bottlenecks, see system decisions ripple through models. Co-Labs
  • 🔥 Build. Implement tensors, autograd, and training loops from scratch. TinyTorch →

What You'll Explore

Each Co-Lab maps directly to textbook chapters, focusing on the systems perspective:

Memory Systems

  • Batch size vs memory footprint
  • Activation checkpointing tradeoffs
  • Cache hierarchy effects on training
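A minimal sketch of the first experiment above, assuming a CUDA device and an arbitrary two-layer MLP; it records the peak memory PyTorch reports as batch size grows.

```python
# Minimal sketch: peak GPU memory vs batch size for a small MLP.
# Model size and batch sizes are arbitrary; requires a CUDA device.
import torch
import torch.nn as nn

device = "cuda"
model = nn.Sequential(
    nn.Linear(1024, 4096), nn.ReLU(),
    nn.Linear(4096, 1024),
).to(device)

for batch_size in (32, 128, 512):
    torch.cuda.reset_peak_memory_stats(device)
    x = torch.randn(batch_size, 1024, device=device)
    loss = model(x).sum()
    loss.backward()                      # activations + gradients held here
    peak_mb = torch.cuda.max_memory_allocated(device) / 1e6
    print(f"batch={batch_size:4d}  peak memory ~ {peak_mb:8.1f} MB")
    model.zero_grad(set_to_none=True)
```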

Numerical Representation

  • FP32 → FP16 → INT8 → INT4
  • Quantization error propagation
  • Mixed precision training dynamics
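A minimal sketch of the precision ladder above, using a random tensor and a simple uniform symmetric quantizer (an illustrative scheme, not any particular library's):

```python
# Minimal sketch: representational error as precision drops. The tensor
# values and the symmetric INT8/INT4 scheme are illustrative only.
import torch

w = torch.randn(10_000)                      # pretend these are FP32 weights

def symmetric_quantize(x, bits):
    """Uniform symmetric quantization to `bits`, then dequantize back."""
    qmax = 2 ** (bits - 1) - 1
    scale = x.abs().max() / qmax
    return torch.clamp((x / scale).round(), -qmax, qmax) * scale

for label, approx in [
    ("FP16", w.half().float()),
    ("INT8", symmetric_quantize(w, 8)),
    ("INT4", symmetric_quantize(w, 4)),
]:
    err = (w - approx).abs().mean().item()
    print(f"{label}: mean absolute error = {err:.6f}")
```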

Compute Efficiency

  • Pruning and sparsity patterns
  • Knowledge distillation mechanics
  • Operator fusion benefits
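For the sparsity bullet, a minimal sketch using torch.nn.utils.prune for unstructured magnitude pruning; the 50% target and layer size are arbitrary.

```python
# Minimal sketch: unstructured magnitude pruning on one Linear layer.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(512, 512)
prune.l1_unstructured(layer, name="weight", amount=0.5)   # zero the 50% smallest weights

sparsity = (layer.weight == 0).float().mean().item()
print(f"weight sparsity: {sparsity:.1%}")                 # ~50%

# Note: the zeros live behind a mask; dense kernels still do the full
# compute unless a sparse or structured format actually exploits them.
```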

Deployment Tradeoffs

  • Latency vs throughput curves
  • Batching strategy impact
  • Hardware utilization profiling
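And a minimal sketch of the latency/throughput bullet: crude CPU timing of an arbitrary model at a few batch sizes, enough to see per-request latency rise while aggregate throughput climbs.

```python
# Minimal sketch: latency vs throughput as batch size grows.
# Model, batch sizes, and repetition count are illustrative.
import time
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 2048), nn.ReLU(), nn.Linear(2048, 512)).eval()

with torch.no_grad():
    for batch_size in (1, 8, 64):
        x = torch.randn(batch_size, 512)
        model(x)                                  # warm-up
        start = time.perf_counter()
        for _ in range(20):
            model(x)
        elapsed = (time.perf_counter() - start) / 20
        print(f"batch={batch_size:3d}  latency={elapsed*1e3:6.2f} ms  "
              f"throughput={batch_size/elapsed:8.1f} samples/s")
```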

Help Shape This

I'm still figuring out what makes the most sense. If you have ideas for experiments that would help you understand ML systems better, please share them:

  • Share Ideas
  • Get Updates

© 2024-2025 Harvard University. Licensed under CC-BY-NC-SA 4.0

Part of the Machine Learning Systems textbook