Big Picture

2-minute orientation before you begin building

You’re about to build a working ML framework, one module at a time. Before diving in, take two minutes to see how the twenty modules connect, what you’ll have when you’re done, and which path through the book fits your goals.


The Journey: Foundation to Production

TinyTorch takes you from a bare tensor to a production-style ML system in twenty modules. They connect like this.

Three tiers, one system:

  • Foundation (01-08) — Build the core machinery. Tensors hold data, activations add non-linearity, layers combine them, losses measure error, DataLoader streams batches, autograd computes gradients, optimizers update weights, training orchestrates the loop (see the code sketch after this list).

  • Architecture (09-13) — Apply the foundation to real problems. The DataLoader from Module 05 feeds data; from there you take one of two paths—convolutions for images, or the transformer stack (Tokenization → Embeddings → Attention → Transformers) for text.

  • Optimization (14-19) — Make it fast. Profile to find bottlenecks, then apply quantization, compression, acceleration, or memoization. Benchmark to prove the gain.
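In code, one training step through the Foundation stack looks roughly like the sketch below. Every name in it (Tensor, Linear, ReLU, MSELoss, DataLoader, SGD, the import paths, and the use of NumPy) is an illustrative assumption about a TinyTorch-style API, not the book's confirmed interface; Modules 01-08 define the real ones.

```python
# Illustrative sketch only: class names and import paths are assumptions,
# not TinyTorch's actual API. Modules 01-08 build the real versions.
import numpy as np
from tinytorch import Tensor                  # 01: tensors hold data
from tinytorch.layers import Linear, ReLU     # 02-03: activations + layers
from tinytorch.losses import MSELoss          # 04: losses measure error
from tinytorch.data import DataLoader         # 05: DataLoader streams batches
from tinytorch.optim import SGD               # 07: optimizers update weights

# Toy dataset: learn y = 2x from 64 points.
data = [(Tensor(np.array([x])), Tensor(np.array([2 * x])))
        for x in np.linspace(-1.0, 1.0, 64)]

model = [Linear(1, 8), ReLU(), Linear(8, 1)]
params = [p for layer in model for p in layer.parameters()]
loss_fn, optimizer = MSELoss(), SGD(params, lr=0.1)

for x_batch, y_batch in DataLoader(data, batch_size=8):  # 08: the training loop
    out = x_batch
    for layer in model:               # forward pass through each layer
        out = layer(out)
    loss = loss_fn(out, y_batch)      # scalar measure of error
    loss.backward()                   # 06: autograd computes gradients
    optimizer.step()                  # nudge every parameter downhill
    optimizer.zero_grad()             # clear gradients before the next batch
```

Module 08 then wraps a loop like this into the reusable training orchestration the Foundation tier ends with.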

Figure 1 shows how the pieces fit together.

Figure 1: TinyTorch Module Flow. The 20 modules progress through three tiers. Foundation (blue, 01-08) builds the core ML primitives, stacking Tensor → Activations → Layers → Losses → DataLoader → Autograd → Optimizers → Training. Architecture (green, 09-13) applies them to vision and language: CNNs sit alongside the Tokenization → Embeddings → Attention → Transformers chain. Optimization (orange, 14-19) makes systems production-ready: Profiling fans out to Quantization, Compression, Acceleration, and Memoization, which converge on Benchmarking. Cross-tier arrows link DataLoader to CNNs and Tokenization, CNNs and Transformers to Profiling, and Benchmarking to the Capstone (20), which sits centered below the three tiers.


Milestones You’ll Unlock

As you build, you unlock historical milestones—moments when your code does something that once made headlines:

  1. 1958 Perceptron: Your first learning algorithm with automatic weight updates (Rosenblatt)
  2. 1969 XOR: Your MLP solves the problem that stumped single-layer networks (Minsky & Papert → Rumelhart)
  3. 1986 MLP: Your network recognizes handwritten digits on real data
  4. 1998 CNN: Your convolutional network classifies images with spatial understanding (LeCun’s LeNet-5)
  5. 2017 Transformer: Your attention mechanism generates text (Vaswani et al.)
  6. 2018 MLPerf: Your optimized system benchmarks at production speed

Each milestone activates when you complete the required modules. You’re not just learning—you’re recreating seventy years of ML evolution, one working implementation at a time.

What You’ll Have at the End

Table 1 pins down the concrete outcome you unlock at each checkpoint.

Table 1: Concrete outcomes unlocked at each module checkpoint.

| After Module | You’ll Have Built | Historical Context |
| --- | --- | --- |
| 01-03 | Working Perceptron classifier (forward pass) | Rosenblatt 1958 |
| 01-08 | MLP solving XOR + complete training pipeline | AI Winter breakthrough 1969→1986 |
| 01-09 | CNN with convolutions and pooling | LeNet-5 (1998) |
| 01-08 + 10-13 | GPT model with autoregressive generation | “Attention Is All You Need” (2017) |
| 01-08 + 14-19 | Optimized, quantized, accelerated system | Production ML today |
| 01-20 | MLPerf-style benchmarking submission | Torch Olympics |
Tip: The North Star Build

By module 13, you’ll have a complete GPT model generating text—built from raw Python. By module 20, you’ll benchmark your entire framework with MLPerf-style submissions. Every tensor operation, every gradient calculation, every optimization trick: you wrote it.

Choose Your Learning Path

Pick the route that matches your goals and available time.

Sequential Builder
Complete all 20 modules in order

Best for: Students, career transitioners, deep understanding
Time: 60-80 hours (8-12 weeks part-time)
Outcome: Complete mental model of ML systems

Vision Track
01-09 → 14-19 (CNNs + optimization)

Best for: Computer vision focus, MLOps practitioners
Time: 40-50 hours
Outcome: CNN architectures + production optimization

Language Track
01-08 → 10-13 (transformers + GPT)

Best for: NLP focus, research engineers
Time: 35-45 hours
Outcome: Complete GPT model with text generation

Instructor Sampler
Read: 01, 03, 05, 07, 12 (key concepts)

Best for: Evaluating for course adoption
Time: 8-12 hours (reading, not building)
Outcome: Assessment of pedagogical approach

Tip: All paths start at Module 01

Module 01 (Tensor) is the foundation everything else builds on. Start there, then switch paths anytime based on what you find interesting.

Expect to Struggle (That’s the Design)

Important: Getting stuck is not a bug—it’s a feature

TinyTorch treats productive struggle as a teaching tool. You will debug tensor shape mismatches, trace gradient flow through tangled graphs, and fight for memory inside tight constraints. The friction is intentional. It is your brain rewiring around how ML systems actually work.

What helps when you’re stuck:

  • Run the tests early and often—they’re your fastest feedback loop.
  • The if __name__ == "__main__" blocks show the expected workflow (see the sketch after this list).
  • The ML Systems Thinking questions validate that you understood, not just that you typed.
  • Production context notes connect your implementation back to PyTorch and TensorFlow.
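For a concrete sense of what those __main__ blocks do, here is a sketch of the kind of block a module might end with. It is illustrative only: the real blocks live in the module files, and the Tensor import path and behavior shown are assumptions.

```python
# Illustrative sketch of a module's __main__ block; the real ones live in the
# module files and will differ in detail. The import path is an assumption.
from tinytorch import Tensor

if __name__ == "__main__":
    # Manual smoke test of the expected workflow: build tensors, combine them,
    # and eyeball the output alongside the automated tests.
    a = Tensor([1.0, 2.0, 3.0])
    b = Tensor([4.0, 5.0, 6.0])
    c = a + b
    print("a     =", a)
    print("b     =", b)
    print("a + b =", c)   # expect values [5.0, 7.0, 9.0]
```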

When to ask for help:

  • After you’ve run the tests and read the error message carefully.
  • After you’ve tried explaining the problem out loud to a rubber duck.
  • If you’ve been stuck on a single bug for more than thirty minutes.

The goal isn’t to never struggle. It’s to struggle productively, and to leave each module knowing why the working version works.

Start Building

You have the map. Module 01 builds the tensor—the data structure every other module depends on. A few hours from now you’ll have a working Tensor class and a green test suite, and the path to CNNs, transformers, and an MLPerf-style benchmark will be one module shorter.

Next step. Follow the Quick Start Guide to set up your environment (2 minutes), complete Module 01: Tensor (2–3 hours), and watch your first tests pass.
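If you want a feel for the scale of that first step, one plausible shape for a minimal tensor is a thin wrapper around a NumPy array. Both that choice and the fields and methods below are guesses for illustration, not Module 01’s actual specification.

```python
import numpy as np

class Tensor:
    """Minimal tensor sketch: data plus a slot for gradients.
    Illustrative only; Module 01 defines the real interface."""

    def __init__(self, data):
        self.data = np.asarray(data, dtype=np.float32)
        self.grad = None              # filled in later by autograd (Module 06)

    @property
    def shape(self):
        return self.data.shape

    def __add__(self, other):
        return Tensor(self.data + other.data)

    def __mul__(self, other):
        return Tensor(self.data * other.data)

    def __repr__(self):
        return f"Tensor({self.data.tolist()})"


if __name__ == "__main__":
    x = Tensor([1.0, 2.0])
    y = Tensor([3.0, 4.0])
    print(x + y)   # Tensor([4.0, 6.0])
```

Everything that follows layers onto an object like this: autograd fills in grad, and the layers, losses, and optimizers all speak in tensors.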

Note: Before you start

You don’t need to be an expert. You need to be curious and willing to struggle through hard problems. If you want to know why the book is built this way before you write a line of code, read the Learning Philosophy first.

The journey from tensors to transformers starts with a single import tinytorch.