Interactive Labs
33 interactive labs that run entirely in your browser. No install. No setup. Just open and go.
Lab 01 opens with a single question: What happens to accuracy? It is one instance of a pattern that repeats 33 times. The rest of this page is about the pattern.
How Labs Work
Each lab is a structured confrontation with a surprising quantitative reality. The pedagogical design rests on a simple observation: a student who predicts wrong and then discovers why has learned more than a student who reads a correct answer. The prediction lock is what makes that possible — you cannot passively watch the simulator; you have to commit first.
The Predict-Discover-Explain Cycle
Every part within every lab follows the same rhythm:
Stakeholder Scenario — A fictional but realistic message from a CTO, VP of Engineering, or ML lead frames a real-world problem. These are not toy examples — they are the decisions engineers make every day.
Prediction Lock — Before seeing any data, you must commit a structured prediction (multiple choice or numeric estimate). The simulator is locked until you predict. This forces you to surface your assumptions.
Interactive Instruments — Sliders, toggles, and charts powered by the mlsysim physics engine let you explore the design space. Every number traces to a specific textbook claim — no magic constants.
Prediction Reveal — The lab shows you what you predicted versus what actually happened, with specific numbers: “You predicted 2×. Actual: 50×. You were off by 25×.” This gap is the learning moment.
Math Peek — A collapsible accordion reveals the governing equation. You can always see the physics behind the simulator.
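Here is the shape of that cycle in code: a minimal sketch, not the labs' actual implementation. The names (`PredictionLock`, `reveal`) are invented for illustration; the real labs build the same gating out of marimo UI elements.

```python
from dataclasses import dataclass

@dataclass
class PredictionLock:
    """Gate the simulator until the student commits a prediction."""
    prediction: float | None = None

    def commit(self, value: float) -> None:
        self.prediction = value  # locked in before any data is shown

    @property
    def unlocked(self) -> bool:
        return self.prediction is not None

def reveal(lock: PredictionLock, actual: float) -> str:
    """Show predicted vs. actual; the gap is the learning moment."""
    if not lock.unlocked:
        raise RuntimeError("Simulator is locked: commit a prediction first.")
    gap = actual / lock.prediction
    return (f"You predicted {lock.prediction:g}x. Actual: {actual:g}x. "
            f"You were off by {gap:g}x.")

lock = PredictionLock()
lock.commit(2.0)           # Prediction Lock: commit before seeing data
print(reveal(lock, 50.0))  # Prediction Reveal: "...off by 25x."
```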
Structure of Each Lab
| Phase | Time | Focus |
| --- | --- | --- |
| Briefing | ~2 min | Learning objectives, prerequisites, core question |
| Part A | ~12 min | Calibration: correct a wrong prior with data |
| Part B | ~12 min | Deepening: quantify the mechanism behind Part A |
| Part C | ~12 min | Cross-context: same system, different hardware |
| Part D | ~12 min | Design challenge: make a decision with trade-offs |
| Synthesis | ~5 min | Key takeaways, connections, self-assessment |
At least one part includes a failure state — push a slider too far and the system crashes (OOM, SLA violation, thermal throttle). These failures are reversible and instructive: the point is to find the boundary, not to punish.
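A sketch of how such a reversible failure boundary can work, using made-up numbers (a 16 GB device, 0.5 GB of activations per sample); the actual labs presumably derive these quantities from the mlsysim engine rather than hard-coding them.

```python
DEVICE_MEMORY_GB = 16.0  # assumed accelerator capacity, illustrative only

def training_memory_gb(batch_size: int,
                       activation_gb_per_sample: float = 0.5,
                       fixed_gb: float = 4.0) -> float:
    """Toy memory model: fixed weight/optimizer cost plus per-sample activations."""
    return fixed_gb + batch_size * activation_gb_per_sample

# Sliding the batch-size "slider" past the boundary flags OOM instead of crashing.
for batch_size in (8, 16, 32):
    needed = training_memory_gb(batch_size)
    if needed > DEVICE_MEMORY_GB:
        print(f"batch={batch_size}: OOM ({needed:.1f} GB > {DEVICE_MEMORY_GB} GB),"
              " slide back to recover")
    else:
        print(f"batch={batch_size}: fits ({needed:.1f} GB)")
```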
The Design Ledger
Your predictions and design decisions persist across labs in the Design Ledger — a browser-based save system. Lab 08’s training memory budget builds on Lab 05’s activation analysis, which builds on Lab 01’s magnitude calibration. The capstone labs (Lab 16 in each volume) synthesize your full Design Ledger into a portfolio.
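One plausible shape for a ledger entry, assuming JSON serialization in browser storage; the field names below are invented for illustration, not the labs' actual schema.

```python
import json

# Hypothetical ledger entry; keys are illustrative only.
entry = {
    "lab": "vol1/lab_01",
    "part": "A",
    "prediction": 2.0,
    "actual": 50.0,
    "decision": "fp16 weights",
}

ledger = [entry]
saved = json.dumps(ledger)        # persisted in the browser between labs
restored = json.loads(saved)      # a later lab (or the capstone) reads it back
print(restored[0]["prediction"])  # 2.0
```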
Lab Inventory
Volume I: Foundations
I. Foundations
How do these labs work? A 5-minute walkthrough of the predict-discover-explain ritual every lab follows.
If a model fails for three different physical reasons on three hardware targets, how do you diagnose which axis to fix?
If you double compute power, why doesn't latency halve?
Why does discovering a constraint at Stage 5 cost 16× more than at Stage 1?
When is moving compute to data cheaper than moving data to compute?
II. Build
ReLU and Sigmoid produce similar accuracy — so why does the choice determine whether your model fits in cache?
Why does self-attention cost O(n²) and what does that mean for sequence length?
Why does compiled execution run 17× faster than eager mode without changing a single weight?
Why does a 7B parameter model need 112 GB of memory before storing a single activation?
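One common accounting that reaches 112 GB, assuming fp32 training with Adam (4 bytes of weights, 4 of gradients, and 8 of optimizer state per parameter); whether the lab uses exactly this breakdown is an assumption here:

\[
7 \times 10^{9}\ \text{params} \times (4 + 4 + 8)\ \tfrac{\text{bytes}}{\text{param}} = 112\ \text{GB}
\]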
III. Optimize
When does curating data produce more accuracy per dollar than adding more data?
Can you compress a model 4× without losing accuracy? Where is the cliff?
Is your workload compute-bound or memory-bound — and why does the answer change everything?
IV. Deploy
Amdahl's Law says 5% sequential code limits speedup to 20× regardless of parallelism. Is that right? (The bound is derived just after this list.)
Your server looks healthy at 50% utilization — why is it on fire at 80%?
Your model shipped Monday. By Friday it lost 3 accuracy points. Your dashboard is green. Why?
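About that Amdahl's Law item: the 20× bound falls directly out of the formula. With sequential fraction \(s\) and \(N\) processors,

\[
S(N) = \frac{1}{s + \frac{1-s}{N}} \;\xrightarrow{\,N \to \infty\,}\; \frac{1}{s} = \frac{1}{0.05} = 20\times
\]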
Capstone
Volume II: At Scale
I. Foundations
If you add 10× more GPUs, do you get 10× more throughput?
Why does Model FLOPs Utilization rarely exceed 50% on real hardware?
At what point does network communication dominate compute in distributed training?
Can your storage feed your GPUs fast enough, or are they starving?
II. Build
Data, tensor, or pipeline parallelism — which strategy fits your model and your cluster?
At 10,000 GPUs, what is the probability of zero failures in 24 hours? (A back-of-envelope estimate follows this list.)
FIFO scheduling wastes 40% of your cluster. Can you do better?
What fraction of your ML budget is training vs. serving — and why does it flip at scale?
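For the zero-failures question, a back-of-envelope estimate, assuming independent failures and an illustrative per-GPU MTBF of 50,000 hours (the lab's actual numbers may differ):

\[
P(\text{zero failures}) = e^{-NT/\mathrm{MTBF}} = e^{-10{,}000 \cdot 24 / 50{,}000} = e^{-4.8} \approx 0.8\%
\]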
III. Optimize
IV. Deploy
Capstone
Optional: Run Offline
The labs already run in your browser — nothing to install. Power users who want offline access, or who want to hack on the simulations, can install the package locally:
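# Install the lab dependencies and the mlsysim engine (editable)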
python3 -m pip install -r labs/requirements.txt
python3 -m pip install -e mlsysim
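# Launch a lab; marimo serves it in your browser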
cd labs
marimo run vol1/lab_01_ml_intro.py
Part of the MLSysBook Ecosystem
These labs bridge the gap between reading about ML systems (the textbook) and building them from scratch (TinyTorch). Every computation is powered by the mlsysim physics engine — the same engine used in the textbook’s quantitative examples.