Interactive Labs
33 interactive labs that run entirely in your browser. No install. No setup. Just open and go.
Every lab follows the same three-step structure: read a real-world scenario, commit a prediction before seeing any data, then explore the simulator to discover whether your intuition was right. The prediction lock ensures you can't just passively watch: you have to think first.
How Labs Work
Each lab is a structured confrontation with a quantitative reality that surprises. The pedagogical design is based on a simple observation: a student who predicts wrong and then discovers why has learned more than a student who reads a correct answer.
The Predict-Discover-Explain Cycle
Every part within every lab follows the same rhythm:
- **Stakeholder Scenario**: A fictional but realistic message from a CTO, VP of Engineering, or ML lead frames a real-world problem. These are not toy examples; they are the decisions engineers make every day.
- **Prediction Lock**: Before seeing any data, you must commit a structured prediction (multiple choice or numeric estimate). The simulator is locked until you predict. This forces you to surface your assumptions.
- **Interactive Instruments**: Sliders, toggles, and charts powered by the mlsysim physics engine let you explore the design space. Every number traces to a specific textbook claim; no magic constants.
- **Prediction Reveal**: The lab shows you what you predicted versus what actually happened, with specific numbers: "You predicted 2x. Actual: 50x. You were off by 25x." This gap is the learning moment.
- **Math Peek**: A collapsible accordion reveals the governing equation. You can always see the physics behind the simulator.
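The lock-then-reveal mechanics described above can be sketched in a few lines of Python. This is a hypothetical illustration of the flow, not the actual lab code (the real labs are marimo notebooks driven by mlsysim):

```python
# Minimal sketch of the predict-then-reveal cycle.
# Hypothetical class and method names, not the labs' real API.

class PredictionLock:
    def __init__(self):
        self.prediction = None

    def commit(self, value):
        """Record the student's prediction; this unlocks the simulator."""
        self.prediction = value

    @property
    def unlocked(self):
        return self.prediction is not None

    def reveal(self, actual):
        """Compare the committed prediction to the simulated result."""
        if not self.unlocked:
            raise RuntimeError("Predict before you explore.")
        ratio = actual / self.prediction
        return (f"You predicted {self.prediction}x. "
                f"Actual: {actual}x. Off by {ratio:.0f}x.")

lock = PredictionLock()
lock.commit(2)            # student predicts a 2x effect
print(lock.reveal(50))    # simulator measured 50x -> "Off by 25x."
```

The key design point is that `reveal` refuses to run until `commit` has happened, mirroring the labs' locked simulator.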
Structure of Each Lab
| Phase | Time | Focus |
|---|---|---|
| Briefing | ~2 min | Learning objectives, prerequisites, core question |
| Part A | ~12 min | Calibration: correct a wrong prior with data |
| Part B | ~12 min | Deepening: quantify the mechanism behind Part A |
| Part C | ~12 min | Cross-context: same system, different hardware |
| Part D | ~12 min | Design challenge: make a decision with trade-offs |
| Synthesis | ~5 min | Key takeaways, connections, self-assessment |
At least one part includes a failure state: push a slider too far and the system crashes (OOM, SLA violation, thermal throttle). These failures are reversible and instructive; the point is to find the boundary, not to punish.
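A failure boundary of this kind can be sketched as a simple sweep: memory use grows with a slider (here, batch size) until it exceeds the device limit. All numbers below are illustrative assumptions, not figures from any specific lab:

```python
# Illustrative OOM boundary: activation memory grows linearly with
# batch size until it exceeds device memory. Made-up numbers.

DEVICE_MEMORY_GB = 16.0
WEIGHTS_GB = 4.0
ACTIVATIONS_PER_SAMPLE_GB = 0.25

def fits(batch_size):
    """True if weights plus activations fit in device memory."""
    needed = WEIGHTS_GB + batch_size * ACTIVATIONS_PER_SAMPLE_GB
    return needed <= DEVICE_MEMORY_GB

# Push the "slider" until the system falls over, then step back:
boundary = max(b for b in range(1, 129) if fits(b))
print(f"Largest batch that fits: {boundary}")  # 48: 4 + 48*0.25 = 16 GB
```

Finding that boundary, rather than avoiding it, is exactly what the failure states are for.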
The Design Ledger
Your predictions and design decisions persist across labs in the Design Ledger, a browser-based save system. Lab 08's training memory budget builds on Lab 05's activation analysis, which builds on Lab 01's magnitude calibration. The capstone labs (Lab 16 in each volume) synthesize your full Design Ledger into a portfolio.
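Conceptually, a ledger like this is a keyed store of decisions serialized for persistence. The sketch below uses a hypothetical schema and plain JSON; the real ledger lives in browser storage:

```python
import json

# Toy Design Ledger: decisions keyed by lab, serialized as JSON.
# Hypothetical keys and values, for illustration only.

ledger = {}

def record(lab, key, value):
    """Store one decision under its lab."""
    ledger.setdefault(lab, {})[key] = value

record("lab01", "magnitude_estimate", 1e9)
record("lab05", "activation_choice", "ReLU")
record("lab08", "memory_budget_gb", 112)

saved = json.dumps(ledger)      # persist (localStorage in a browser)
restored = json.loads(saved)    # reload in a later lab
print(restored["lab08"]["memory_budget_gb"])  # 112
```

Because later labs reload earlier entries, a decision made in Lab 01 can constrain a budget computed in Lab 08.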
Lab Inventory
Volume I: Foundations (17 labs)
Single-machine ML systems, from introduction through deployment.
| # | Lab | Core Question |
|---|---|---|
| 00 | The Architect's Portal | Orientation: how do these labs work? |
| 01 | The AI Triad | If a model fails for three different physical reasons on three hardware targets, how do you diagnose which axis to fix? |
| 02 | The Iron Law | If you double compute power, why doesnโt latency halve? |
| 03 | The Silent Degradation Loop | Why does discovering a constraint at Stage 5 cost 16x more than at Stage 1? |
| 04 | The Data Gravity Trap | When is moving compute to data cheaper than moving data to compute? |
| 05 | The Activation Tax | ReLU and Sigmoid produce similar accuracy โ so why does the choice determine whether your model fits in cache? |
| 06 | The Quadratic Wall | Why does self-attention cost O(n²) and what does that mean for sequence length? |
| 07 | The Kernel Fusion Dividend | Why does compiled execution run 17x faster than eager mode without changing a single weight? |
| 08 | The Training Memory Budget | Why does a 7B parameter model need 112 GB of memory before storing a single activation? |
| 09 | The Data Selection Tradeoff | When does curating data produce more accuracy per dollar than adding more data? |
| 10 | The Compression Frontier | Can you compress a model 4x without losing accuracy? Where is the cliff? |
| 11 | The Roofline | Is your workload compute-bound or memory-bound, and why does the answer change everything? |
| 12 | The Speedup Ceiling | Amdahlโs Law says 5% sequential code limits speedup to 20x regardless of parallelism. Is that right? |
| 13 | The Tail Latency Trap | Your server looks healthy at 50% utilization, so why is it on fire at 80%? |
| 14 | The Silent Degradation Problem | Your model shipped Monday. By Friday it lost 3 accuracy points. Your dashboard is green. Why? |
| 15 | No Free Fairness | Fairness costs accuracy, explanations cost latency, and all of it costs carbon. How do you budget? |
| 16 | The Architect's Audit | Capstone: synthesize every invariant from 15 labs into one deployment decision. |
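To take one quantitative claim from the table: Lab 08's "112 GB before a single activation" is consistent with a standard mixed-precision Adam accounting of roughly 16 bytes of state per parameter. The byte breakdown below is a common rule of thumb, not necessarily the lab's exact decomposition:

```python
# Back-of-envelope check of Lab 08's claim: a 7B-parameter model needs
# ~112 GB of training state before storing any activations.
# Assumes mixed-precision Adam; the per-component split is a common
# convention, not a quote from the lab.

params = 7e9
bytes_per_param = (
    2 +   # fp16 weights
    2 +   # fp16 gradients
    4 +   # fp32 master copy of weights
    4 +   # Adam first moment (m)
    4     # Adam second moment (v)
)
total_gb = params * bytes_per_param / 1e9
print(f"{total_gb:.0f} GB")  # 112 GB
```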
Volume II: At Scale (16 labs)
Distributed ML systems, from single machine to multi-datacenter fleet.
| # | Lab | Core Question |
|---|---|---|
| 01 | The Scale Illusion | If you add 10x more GPUs, do you get 10x more throughput? |
| 02 | The Compute Infrastructure Wall | Why does Model FLOPs Utilization rarely exceed 50% on real hardware? |
| 03 | Communication at Scale | At what point does network communication dominate compute in distributed training? |
| 04 | The Data Pipeline Wall | Can your storage feed your GPUs fast enough, or are they starving? |
| 05 | The Parallelism Puzzle | Data, tensor, or pipeline parallelism โ which strategy fits your model and your cluster? |
| 06 | When Failure Is Routine | At 10,000 GPUs, what is the probability of zero failures in 24 hours? |
| 07 | The Scheduling Trap | FIFO scheduling wastes 40% of your cluster. Can you do better? |
| 08 | The Inference Economy | What fraction of your ML budget is training vs. serving, and why does it flip at scale? |
| 09 | The Optimization Trap | Is the optimization you are about to implement attacking the right bottleneck? |
| 10 | The Edge Thermodynamics Lab | When does moving inference to the edge save energy vs. waste it? |
| 11 | The Silent Fleet | At 1,000 models, a 24-hour silent failure costs $1M. How do you detect it? |
| 12 | The Price of Privacy | Differential privacy adds noise. How much accuracy do you lose for how much privacy? |
| 13 | The Robustness Budget | Adversarial training costs 8x compute. When is it worth it? |
| 14 | The Carbon Budget | Moving your training from Iowa to Quebec cuts carbon 10x. Why? |
| 15 | The Fairness Budget | How do you allocate a finite fairness budget across competing metrics? |
| 16 | The Fleet Synthesis | Capstone: design a production fleet balancing cost, latency, fairness, and carbon. |
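Lab 06's core question can be previewed with a one-line failure model: under an exponential model, the probability of zero failures falls off sharply with fleet size. The 50,000-hour per-GPU MTBF below is an assumed illustrative figure, not a number from the lab:

```python
import math

# Sketch of Lab 06's question: P(zero failures) in 24 hours on
# 10,000 GPUs, assuming independent exponential failures.
# The per-GPU MTBF is an assumption for illustration.

n_gpus = 10_000
mtbf_hours = 50_000
window_hours = 24

expected_failures = n_gpus * window_hours / mtbf_hours
p_zero = math.exp(-expected_failures)

print(f"Expected failures: {expected_failures:.1f}")  # 4.8
print(f"P(zero failures): {p_zero:.4f}")              # ~0.0082
```

Even with a generous MTBF, a zero-failure day is a sub-1% event at this scale, which is why the lab's title treats failure as routine.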
Run Offline
The labs already run in your browser; there is nothing to install. Power users who want offline access, or want to hack on the simulations, can optionally grab the package:
```shell
pip install mlsysim
marimo run lab_01_ml_intro.py
```
Part of the MLSysBook Ecosystem
These labs bridge the gap between reading about ML systems (the textbook) and building them from scratch (TinyTorch). Every computation is powered by the mlsysim physics engine, the same engine used in the textbook's quantitative examples.