Interactive Labs
33 interactive labs that run entirely in your browser. No install. No setup. Just open and go.
Lab 01 opens with a single question: What happens to accuracy? It is one instance of a pattern that repeats 33 times. The rest of this page is about the pattern.
How Labs Work
Each lab is a structured confrontation with a surprising quantitative reality. The pedagogical design rests on a simple observation: a student who predicts wrong and then discovers why has learned more than a student who reads a correct answer. The prediction lock is what makes that possible — you cannot passively watch the simulator; you have to commit first.
The Predict-Discover-Explain Cycle
Every part within every lab follows the same rhythm:
Stakeholder Scenario — A fictional but realistic message from a CTO, VP of Engineering, or ML lead frames a real-world problem. These are not toy examples — they are the decisions engineers make every day.
Prediction Lock — Before seeing any data, you must commit a structured prediction (multiple choice or numeric estimate). The simulator is locked until you predict. This forces you to surface your assumptions.
Interactive Instruments — Sliders, toggles, and charts powered by the mlsysim physics engine let you explore the design space. Every number traces to a specific textbook claim — no magic constants.
Prediction Reveal — The lab shows you what you predicted versus what actually happened, with specific numbers: “You predicted 2×. Actual: 50×. You were off by 25×.” This gap is the learning moment.
Math Peek — A collapsible accordion reveals the governing equation. You can always see the physics behind the simulator.
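Here is the shape of that cycle in code: a minimal sketch, not the labs' actual implementation. The names (`PredictionLock`, `reveal`) are invented for illustration; the real labs build the same gating out of marimo UI elements.

```python
from dataclasses import dataclass

@dataclass
class PredictionLock:
    """Gate the simulator until the student commits a prediction."""
    prediction: float | None = None

    def commit(self, value: float) -> None:
        self.prediction = value  # locked in before any data is shown

    @property
    def unlocked(self) -> bool:
        return self.prediction is not None

def reveal(lock: PredictionLock, actual: float) -> str:
    """Show predicted vs. actual; the gap is the learning moment."""
    if not lock.unlocked:
        raise RuntimeError("Simulator is locked: commit a prediction first.")
    gap = actual / lock.prediction
    return (f"You predicted {lock.prediction:g}x. Actual: {actual:g}x. "
            f"You were off by {gap:g}x.")

lock = PredictionLock()
lock.commit(2.0)           # Prediction Lock: commit before seeing data
print(reveal(lock, 50.0))  # Prediction Reveal: "...off by 25x."
```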
Structure of Each Lab
| Phase | Time | Focus |
| --- | --- | --- |
| Briefing | ~2 min | Learning objectives, prerequisites, core question |
| Part A | ~12 min | Calibration: correct a wrong prior with data |
| Part B | ~12 min | Deepening: quantify the mechanism behind Part A |
| Part C | ~12 min | Cross-context: same system, different hardware |
| Part D | ~12 min | Design challenge: make a decision with trade-offs |
| Synthesis | ~5 min | Key takeaways, connections, self-assessment |
At least one part includes a failure state — push a slider too far and the system crashes (OOM, SLA violation, thermal throttle). These failures are reversible and instructive: the point is to find the boundary, not to punish.
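A sketch of how such a reversible failure boundary can work, using made-up numbers (a 16 GB device, 0.5 GB of activations per sample); the actual labs presumably derive these quantities from the mlsysim engine rather than hard-coding them.

```python
DEVICE_MEMORY_GB = 16.0  # assumed accelerator capacity, illustrative only

def training_memory_gb(batch_size: int,
                       activation_gb_per_sample: float = 0.5,
                       fixed_gb: float = 4.0) -> float:
    """Toy memory model: fixed weight/optimizer cost plus per-sample activations."""
    return fixed_gb + batch_size * activation_gb_per_sample

# Sliding the batch-size "slider" past the boundary flags OOM instead of crashing.
for batch_size in (8, 16, 32):
    needed = training_memory_gb(batch_size)
    if needed > DEVICE_MEMORY_GB:
        print(f"batch={batch_size}: OOM ({needed:.1f} GB > {DEVICE_MEMORY_GB} GB),"
              " slide back to recover")
    else:
        print(f"batch={batch_size}: fits ({needed:.1f} GB)")
```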
The Design Ledger
Your predictions and design decisions persist across labs in the Design Ledger — a browser-based save system. Lab 08’s training memory budget builds on Lab 05’s activation analysis, which builds on Lab 01’s magnitude calibration. The capstone labs (Lab 16 in each volume) synthesize your full Design Ledger into a portfolio.
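One plausible shape for a ledger entry, assuming JSON serialization in browser storage; the field names below are invented for illustration, not the labs' actual schema.

```python
import json

# Hypothetical ledger entry; keys are illustrative only.
entry = {
    "lab": "vol1/lab_01",
    "part": "A",
    "prediction": 2.0,
    "actual": 50.0,
    "decision": "fp16 weights",
}

ledger = [entry]
saved = json.dumps(ledger)        # persisted in the browser between labs
restored = json.loads(saved)      # a later lab (or the capstone) reads it back
print(restored[0]["prediction"])  # 2.0
```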
Lab Inventory
Volume I: Foundations
I. Foundations
How do these labs work? A 5-minute walkthrough of the predict-discover-explain ritual every lab follows.
If a model fails for three different physical reasons on three hardware targets, how do you diagnose which axis to fix?
If you double compute power, why doesn't latency halve?
Why does discovering a constraint at Stage 5 cost 16× more than at Stage 1?
When is moving compute to data cheaper than moving data to compute?
II. Build
ReLU and Sigmoid produce similar accuracy — so why does the choice determine whether your model fits in cache?
Why does self-attention cost O(n²) and what does that mean for sequence length?
Why does compiled execution run 17× faster than eager mode without changing a single weight?
Why does a 7B parameter model need 112 GB of memory before storing a single activation?
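One common accounting that reaches 112 GB, assuming fp32 training with Adam (4 bytes of weights, 4 of gradients, and 8 of optimizer state per parameter); whether the lab uses exactly this breakdown is an assumption here:

\[
7 \times 10^{9}\ \text{params} \times (4 + 4 + 8)\ \tfrac{\text{bytes}}{\text{param}} = 112\ \text{GB}
\]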
III. Optimize
When does curating data produce more accuracy per dollar than adding more data?
Can you compress a model 4× without losing accuracy? Where is the cliff?
Is your workload compute-bound or memory-bound — and why does the answer change everything?
IV. Deploy
Amdahl's Law says 5% sequential code limits speedup to 20× regardless of parallelism. Is that right? (The bound is derived just after this list.)
Your server looks healthy at 50% utilization — why is it on fire at 80%?
Your model shipped Monday. By Friday it lost 3 accuracy points. Your dashboard is green. Why?
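About that Amdahl's Law item: the 20× bound falls directly out of the formula. With sequential fraction \(s\) and \(N\) processors,

\[
S(N) = \frac{1}{s + \frac{1-s}{N}} \;\xrightarrow{\,N \to \infty\,}\; \frac{1}{s} = \frac{1}{0.05} = 20\times
\]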
Capstone
Volume II: At Scale
I. Foundations
If you add 10× more GPUs, do you get 10× more throughput?
Why does Model FLOPs Utilization rarely exceed 50% on real hardware?
At what point does network communication dominate compute in distributed training?
Can your storage feed your GPUs fast enough, or are they starving?
II. Build
Data, tensor, or pipeline parallelism — which strategy fits your model and your cluster?
At 10,000 GPUs, what is the probability of zero failures in 24 hours? (A back-of-envelope estimate follows this list.)
FIFO scheduling wastes 40% of your cluster. Can you do better?
What fraction of your ML budget is training vs. serving — and why does it flip at scale?
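For the zero-failures question, a back-of-envelope estimate, assuming independent failures and an illustrative per-GPU MTBF of 50,000 hours (the lab's actual numbers may differ):

\[
P(\text{zero failures}) = e^{-NT/\mathrm{MTBF}} = e^{-10{,}000 \cdot 24 / 50{,}000} = e^{-4.8} \approx 0.8\%
\]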
III. Optimize
IV. Deploy
Capstone
Optional: Run Offline
The labs already run in your browser — nothing to install. Power users who want offline access, or who want to hack on the simulations, can install the package locally:
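# Install the lab dependencies and the mlsysim engine (editable)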
python3 -m pip install -r labs/requirements.txt
python3 -m pip install -e mlsysim
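# Launch a lab; marimo serves it in your browser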
cd labs
marimo run vol1/lab_01_ml_intro.py
Part of the MLSysBook Ecosystem
These labs bridge the gap between reading about ML systems (the textbook) and building them from scratch (TinyTorch). Every computation is powered by the mlsysim physics engine — the same engine used in the textbook’s quantitative examples.