TA Guide

Everything you need to run labs, grade assignments, and support students

Welcome to the teaching team. This guide covers what you need to know to be an effective TA for the ML Systems course.

Before the Semester

TA Preparation Checklist

Complete these before Week 1:

Read the textbook chapters you will be covering (at minimum, the Part you are assigned)
Complete TinyTorch Modules 01–08 yourself (Foundations semester)
Run all labs for your assigned weeks — note where students will get stuck
Read the Pedagogy Guide — understand Prediction Locks, Decision Logs, and the A-B-C structure
Read the Assessment & Grading — internalize the Decision Log rubric
Attend the grading calibration session (Week 0)
Set up nbgrader on your machine (see TinyTorch Instructor Guide)

Grading Decision Logs

Decision Logs are the most important written artifact in the course. Every student submits one per week, so plan your grading time accordingly. Here is how to do it efficiently.

The 3-Question Speed Rubric

For each Decision Log, ask three questions:

Numbers? Did the student cite specific values from the instruments (latency, memory, throughput, accuracy)?
Why? Did the student use Iron Law terminology to explain the cause?
Tradeoff? Did the student acknowledge what they sacrificed for what they gained?

Yes to all three → 27-30 points. Missing one → 18-22. Missing two+ → 6-12.
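
The speed rubric reduces to a lookup from the number of "yes" answers to a point band. A minimal Python sketch, assuming you record the three answers as booleans; `speed_rubric_band` is a hypothetical helper, not part of any course tooling:

```python
def speed_rubric_band(numbers: bool, why: bool, tradeoff: bool) -> tuple[int, int]:
    """Return the (low, high) point band for a Decision Log under the speed rubric."""
    missing = [numbers, why, tradeoff].count(False)
    if missing == 0:
        return (27, 30)
    if missing == 1:
        return (18, 22)
    return (6, 12)  # missing two or more


print(speed_rubric_band(numbers=True, why=False, tradeoff=True))  # (18, 22)
```

You still choose the exact score inside the band by judgment; the sketch only encodes the band boundaries listed above.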

See Assessment & Grading for the full rubric and sample student work at each quality level.

Time Budget

Expect 3–5 minutes per Decision Log using the speed rubric. For a 30-student section, that is 90–150 minutes, or roughly 2 hours per week. See Assessment & Grading for full grading load estimates across all assignment types.

Batch Grading Tips

Grade all Decision Logs for one week in a single sitting (consistency matters)
Read the excellent sample first to calibrate your expectations
Mark the first 5, then check with another TA — align before continuing
Flag borderline cases for the instructor rather than agonizing

Grading TinyTorch

Auto-Graded (70 points)

Run pytest on student submissions — pass/fail per test
Students who pass all tests get 70/70; no partial credit per test
If a student passes 90%+ of tests, check whether the failures reflect edge cases or fundamental errors
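
One way to get per-test pass/fail counts is pytest's built-in JUnit XML report (`pytest --junitxml=report.xml`). The sketch below counts passing test cases and flags the 90%+ but not perfect case for manual review. `summarize_junit` is a hypothetical helper, and treating skipped tests as "not passed" is an assumption you may want to change:

```python
import xml.etree.ElementTree as ET


def summarize_junit(xml_text: str) -> dict:
    """Count passed test cases in a pytest JUnit XML report."""
    root = ET.fromstring(xml_text)
    total = passed = 0
    # iter() finds <testcase> elements whether the root is <testsuites> or <testsuite>
    for case in root.iter("testcase"):
        total += 1
        # Assumption: a case with a <failure>, <error>, or <skipped> child did not pass
        if not any(child.tag in ("failure", "error", "skipped") for child in case):
            passed += 1
    fraction = passed / total if total else 0.0
    return {
        "passed": passed,
        "total": total,
        "all_passed": total > 0 and passed == total,
        # 90%+ but not perfect: inspect the failures by hand
        "needs_review": 0.9 <= fraction < 1.0,
    }


# Small inline report for illustration; on a real submission, read report.xml instead
sample = """<testsuites><testsuite name="pytest" tests="3">
<testcase classname="test_tensor" name="test_add"/>
<testcase classname="test_tensor" name="test_mul"/>
<testcase classname="test_tensor" name="test_matmul"><failure message="shape mismatch"/></testcase>
</testsuite></testsuites>"""
print(summarize_junit(sample))
```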

Systems Thinking Questions (30 points)

Each module has 3 manually graded questions (10 points each). Use this scale:

Score	What It Looks Like
10	Correct reasoning + quantitative estimate + hardware awareness
7	Right direction, missing numbers or hardware specifics
4	Partially correct with significant conceptual gaps
1	Attempted but fundamentally wrong

Running Lab Sections (50 minutes)

Recommended Structure

Time	Activity	Your Role
0-5 min	Prediction Lock	Collect predictions; don’t reveal answers
5-15 min	Part A walkthrough	Circulate; help students who can’t get instruments running
15-30 min	Part B exploration	Ask probing questions (see below)
30-45 min	Part C design challenge	Let students struggle; intervene only if truly stuck
45-50 min	Debrief	Revisit predictions; discuss surprises

Probing Questions to Use While Circulating

Situation	What to Ask
Student says “it’s faster”	“How much faster? Which Iron Law term changed?”
Student hits an OOM error	“Find the exact value where it breaks. What constraint did you hit?”
Student doesn’t know what to try	“Change one variable. What happened? Now try a different one.”
Student finishes Part B early	“Can you find a configuration 2× better than your best? What’s the limit?”
Student’s prediction was wrong	“What did you assume that turned out to be false?”

Common Student Struggles by Week

Semester 1 (Foundations)

Week	Common Issue	How to Help
1-2	“What is a system? I thought this was an ML class.”	Redirect: “The model is just one layer. What carries the data to the model? What executes the math?”
5-6	TinyTorch Module 03 (Layers) — broadcasting bugs	Check tensor shapes at each step; remind students that NumPy broadcasting rules apply
6-8	TinyTorch Module 06 (Autograd) — wrong gradients	Most common cause: incorrect topological sort order. Have them draw the computation graph on paper first
8	“My training loop is slow”	Ask: “Is the GPU actually busy? Check utilization. The bottleneck is usually data loading, not compute.”
10	Lab 09 (Quantization) — “INT8 destroyed my model”	Check if they are quantizing batch norm layers. Remind them to use calibration data
13-16	Capstone overwhelm	Break it down: “First, meet the accuracy target. Then optimize for latency. Then for memory. One constraint at a time.”
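
For the Week 5-6 broadcasting bugs, a short NumPy demo shows how mismatched shapes can broadcast silently instead of raising an error. The shapes and variable names are illustrative assumptions, not TinyTorch's layer code:

```python
import numpy as np

# Hypothetical shapes: a layer with one output unit over a batch of 4
batch = 4
preds = np.zeros((batch, 1))              # layer output: (batch, out_features)
targets = np.arange(batch, dtype=float)   # labels loaded as a flat vector: (batch,)

diff = preds - targets                    # (4, 1) against (4,) broadcasts to (4, 4), no error
print(diff.shape)                         # (4, 4): silently wrong

# Fix: make the shapes agree explicitly before combining them
diff_fixed = preds - targets.reshape(batch, 1)
print(diff_fixed.shape)                   # (4, 1)

# A cheap guard while debugging: assert the shapes you expect at each step
assert preds.shape == targets.reshape(-1, 1).shape
```

A loss computed from `diff` would still return a number, which is why checking shapes at each step catches bugs that tests on the final value can miss.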
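
For the Week 6-8 autograd row, this sketch shows why ordering matters: a node's gradient is complete only after every node that consumes it has propagated back. It uses a tiny scalar `Node` class invented for illustration (not TinyTorch's API), orders nodes with a depth-first topological sort, and applies the chain rule in reverse order:

```python
class Node:
    """Hypothetical scalar autograd node, for illustration only."""

    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        self.parents = parents      # nodes this one was computed from
        self.backward_fn = None     # pushes self.grad into the parents' grads

    def __add__(self, other):
        out = Node(self.value + other.value, (self, other))
        def backward_fn():
            self.grad += out.grad
            other.grad += out.grad
        out.backward_fn = backward_fn
        return out

    def __mul__(self, other):
        out = Node(self.value * other.value, (self, other))
        def backward_fn():
            self.grad += other.value * out.grad
            other.grad += self.value * out.grad
        out.backward_fn = backward_fn
        return out


def topological_order(root):
    """Depth-first post-order: every node appears after all of its parents."""
    order, visited = [], set()
    def visit(node):
        if node in visited:
            return
        visited.add(node)
        for parent in node.parents:
            visit(parent)
        order.append(node)
    visit(root)
    return order


def backward(root):
    root.grad = 1.0
    # Reverse topological order: a node runs only after all of its consumers have
    for node in reversed(topological_order(root)):
        if node.backward_fn:
            node.backward_fn()


# x is used twice, so its gradient must accumulate from both paths
x = Node(3.0)
y = x * x + x      # y = x^2 + x, so dy/dx = 2x + 1 = 7
backward(y)
print(x.grad)      # 7.0
```

If `x * x` ran its backward step before `y` had pushed its gradient down, it would see a gradient of zero and the result would be wrong, which matches the "wrong gradients" symptom students report.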
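
For "My training loop is slow" (Week 8), students can time the two halves of each step separately before reaching for GPU tools. A minimal, framework-agnostic sketch, assuming `batches` is any iterable and `train_step` stands in for their step function (both hypothetical). On a GPU, asynchronous execution means the compute timing is only meaningful if the step synchronizes (in PyTorch, for example, via `torch.cuda.synchronize()`):

```python
import time


def profile_loop(batches, train_step):
    """Split wall-clock time into waiting-for-data vs. running the step."""
    data_time = compute_time = 0.0
    it = iter(batches)
    while True:
        t0 = time.perf_counter()
        try:
            batch = next(it)          # time spent producing/loading the next batch
        except StopIteration:
            break
        t1 = time.perf_counter()
        train_step(batch)             # time spent in forward/backward/update
        t2 = time.perf_counter()
        data_time += t1 - t0
        compute_time += t2 - t1
    total = data_time + compute_time
    return {"data_s": data_time, "compute_s": compute_time,
            "data_fraction": data_time / total if total else 0.0}


# Toy example: a slow simulated loader and a fast simulated step
def slow_loader(n):
    for i in range(n):
        time.sleep(0.02)              # simulated disk/decode latency
        yield i


stats = profile_loop(slow_loader(5), lambda b: time.sleep(0.005))
print(f"{stats['data_fraction']:.0%} of time spent waiting on data")
```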
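
For the Week 10 quantization row, the sketch below shows why calibration data matters: the INT8 scale comes from the range of values actually observed, so a range picked without representative data clips most values. It uses symmetric per-tensor quantization with a max-abs calibrator, one common scheme among several; the function names and synthetic activations are assumptions for illustration, not the lab's API:

```python
import numpy as np


def calibrate_scale(calibration_batches):
    """Symmetric per-tensor scale: map the largest observed |x| to 127."""
    max_abs = max(float(np.abs(b).max()) for b in calibration_batches)
    return max_abs / 127.0 if max_abs > 0 else 1.0


def quantize(x, scale):
    return np.clip(np.round(x / scale), -127, 127).astype(np.int8)


def dequantize(q, scale):
    return q.astype(np.float32) * scale


rng = np.random.default_rng(0)
activations = rng.normal(0.0, 2.0, size=(1000,)).astype(np.float32)

# Scale from representative calibration data vs. a guessed fixed range
good_scale = calibrate_scale([activations[:500]])
bad_scale = 0.5 / 127.0               # assumes |x| <= 0.5, so most values clip

errors = {}
for name, scale in [("calibrated", good_scale), ("uncalibrated", bad_scale)]:
    reconstructed = dequantize(quantize(activations, scale), scale)
    errors[name] = float(np.abs(reconstructed - activations).mean())
    print(f"{name}: mean abs error = {errors[name]:.4f}")
```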

Semester 2 (Scale)

Week	Common Issue	How to Help
5	“Which parallelism should I use?”	“Calculate the communication-to-computation ratio for each strategy. The math tells you.”
6	“AllReduce is confusing”	Draw the ring on the whiteboard with 4 nodes. Walk through one full cycle
7	“Why checkpoint so often?”	“Calculate the expected time to failure for 1000 GPUs. Now multiply by cost per GPU-hour.”
10	KV-cache memory confusion	“How many bytes per token per layer? Multiply by sequence length times batch size times number of layers.”
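
To complement the Week 6 whiteboard walk-through, here is a small simulation of ring AllReduce with 4 nodes. Each node's vector is split into 4 chunks. In the reduce-scatter phase, each node passes one chunk to its right neighbor per step, which adds it to its own copy; in the all-gather phase, the fully reduced chunks circulate until every node holds the full sum. This is the standard ring algorithm written in plain Python for teaching, with scalar chunks; it does not call any real communication library:

```python
def ring_allreduce(node_data):
    """Simulate ring AllReduce. node_data[i][c] is chunk c on node i (scalars here)."""
    n = len(node_data)
    data = [list(chunks) for chunks in node_data]   # copy so inputs are untouched
    assert all(len(chunks) == n for chunks in data), "expects one chunk per node"

    # Phase 1: reduce-scatter. After n-1 steps, node i owns the full sum of chunk (i+1) % n.
    for step in range(n - 1):
        sends = [(i, (i - step) % n, data[i][(i - step) % n]) for i in range(n)]
        for i, c, value in sends:             # snapshot first: all sends happen "at once"
            data[(i + 1) % n][c] += value

    # Phase 2: all-gather. Reduced chunks travel around the ring, overwriting stale copies.
    for step in range(n - 1):
        sends = [(i, (i + 1 - step) % n, data[i][(i + 1 - step) % n]) for i in range(n)]
        for i, c, value in sends:
            data[(i + 1) % n][c] = value
    return data


# 4 nodes, each holding a 4-chunk gradient; chunk c on node i has value 10*i + c
grads = [[10.0 * i + c for c in range(4)] for i in range(4)]
result = ring_allreduce(grads)
print(result[0])   # [60.0, 64.0, 68.0, 72.0], identical on every node
```

Each node sends 2(N-1) chunks, each 1/N of the vector, which is why per-node traffic in ring AllReduce stays close to twice the vector size regardless of how many nodes join the ring.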
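
For the Week 7 checkpointing question, the arithmetic can be sketched as follows. It assumes GPU failures are independent, so the cluster's mean time to failure is roughly the per-GPU MTTF divided by the number of GPUs, and that a failure lands at a uniformly random point in a checkpoint interval, so half an interval of work is lost on average. The per-GPU MTTF and price are placeholder numbers, not measured values; have students substitute figures from the textbook or lab:

```python
def cluster_mttf_hours(per_gpu_mttf_hours: float, num_gpus: int) -> float:
    # Independent failures: failure rates add, so time to first failure shrinks by N
    return per_gpu_mttf_hours / num_gpus


def expected_lost_cost(interval_hours: float, num_gpus: int, cost_per_gpu_hour: float) -> float:
    # On average, a failure discards half an interval of work on every GPU
    return (interval_hours / 2) * num_gpus * cost_per_gpu_hour


# Placeholder assumptions for illustration only
PER_GPU_MTTF_H = 50_000    # hypothetical per-GPU mean time to failure, in hours
NUM_GPUS = 1000
COST_PER_GPU_HOUR = 2.0    # hypothetical price in dollars

mttf = cluster_mttf_hours(PER_GPU_MTTF_H, NUM_GPUS)
print(f"Cluster MTTF: {mttf:.0f} hours")   # Cluster MTTF: 50 hours
for interval in (1, 8, 24):
    cost = expected_lost_cost(interval, NUM_GPUS, COST_PER_GPU_HOUR)
    print(f"Checkpoint every {interval:>2} h: ~${cost:,.0f} of compute lost per failure")
```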
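
For the Week 10 KV-cache row, here is the suggested multiplication written out. The sketch assumes standard attention, where each layer caches one key vector and one value vector per token per KV head; the model dimensions are hypothetical, chosen only to make the arithmetic easy to check:

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, batch_size, bytes_per_elem=2):
    # Per token per layer: a key and a value (x2), each num_kv_heads * head_dim elements
    per_token_per_layer = 2 * num_kv_heads * head_dim * bytes_per_elem
    return per_token_per_layer * seq_len * batch_size * num_layers


# Hypothetical config: 32 layers, 32 KV heads of size 128, FP16 (2 bytes), 4096-token context
per_token_layer = 2 * 32 * 128 * 2
total = kv_cache_bytes(32, 32, 128, seq_len=4096, batch_size=1)
print(per_token_layer, "bytes per token per layer")   # 16384 bytes per token per layer
print(total / 2**30, "GiB for one sequence")          # 2.0 GiB for one sequence
```

Because the cache grows linearly in sequence length, batch size, and layer count, doubling any one of them doubles the memory.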

Office Hours Protocol

How Much Help is Too Much?

Do: Ask clarifying questions. Help students debug their approach, not their code.
Do: Point students to the right textbook section or lab instrument.
Don’t: Write code for students. Don’t give away Part C answers.
Don’t: Debug TinyTorch implementations line by line — have them add print statements and explain what they see.

The 10-Minute Rule

If a student has been stuck for 10+ minutes during office hours:

Ask them to explain what they’ve tried (this often unsticks them)
If still stuck, narrow the problem: “Is it a shape error, a value error, or a logic error?”
If still stuck after 15 minutes, give a directed hint: “Look at how the gradient flows through this specific node”

Escalation

Grading disputes: Flag for the instructor. Do not change a disputed grade on your own before discussing it with the instructor.
Academic integrity concerns: Flag for the instructor immediately. Do not confront the student.
Accessibility needs: Refer to the instructor and campus disability services.

Quick Reference: What’s Due Each Week

See the full syllabi for detailed weekly breakdowns:

Foundations Syllabus — every week has a table with Read / Lab / Build / Due
Scale Syllabus — every week has a table with Read / Lab / Due

Each week, students typically submit:

A Decision Log (200 words) for the lab they completed
A TinyTorch module (Foundations only) auto-graded via pytest
A Design Challenge (bi-weekly) for the open-ended Part C problems