Milestone 01: The Perceptron (1958)
Foundation Milestone | Difficulty: ●○○○ | Time: 30–45 min | Prerequisites: Modules 01–04 (Part 1) · 01–08 (Part 2)
What You’ll Learn
- Why random weights produce random results (and training fixes this)
- How gradient descent transforms guessing into learning
- The fundamental loop that powers all neural network training
Overview
You just finished the Foundation Tier. Your Tensor (Module 01), Sigmoid (Module 02), Linear layer (Module 03), BCELoss (Module 04), autograd (Module 06), SGD (Module 07), and Trainer (Module 08) are all working. This milestone runs the simplest possible model those pieces can drive, and the one that started the field.
It’s 1958. Computers fill entire rooms and can barely add numbers. Then Frank Rosenblatt makes an outrageous claim: he’s built a machine that can LEARN. Not through programming — through experience, like a human child.
The press goes wild. The Navy funds research expecting machines that will “walk, talk, see, write, reproduce itself and be conscious of its existence.” The New York Times runs the headline: “New Navy Device Learns by Doing.”
The optimism was premature. The insight wasn’t. You’re about to recreate the moment machine learning was born — with components YOU built yourself.
What You’ll Build
A single-layer perceptron for binary classification that demonstrates:
- The Problem — random weights produce random predictions (~50% accuracy)
- The Solution — training transforms random weights into learned patterns (95%+ accuracy)
Input (features) --> Linear --> Sigmoid --> Output (0 or 1)
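In code, the whole model is just those two pieces composed. The sketch below is illustrative only: the import path and constructor signatures (Linear(num_features, 1), Sigmoid()) are assumptions, so adapt them to however YOUR modules are actually defined.
# Minimal sketch, not the milestone script: the tinytorch import path and
# constructor signatures are assumptions and may differ in YOUR build.
from tinytorch import Linear, Sigmoid

class Perceptron:
    """Single-layer perceptron: Linear -> Sigmoid -> probability."""
    def __init__(self, num_features):
        self.linear = Linear(num_features, 1)   # YOUR fully-connected layer
        self.sigmoid = Sigmoid()                # YOUR activation

    def __call__(self, x):
        # Value in (0, 1); threshold at 0.5 to get a class label of 0 or 1
        return self.sigmoid(self.linear(x))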
Prerequisites
Table 1 lists the modules you need to have completed before starting.
| Module | Component | What It Provides |
|---|---|---|
| 01 | Tensor | YOUR data structure |
| 02 | Activations | YOUR sigmoid activation |
| 03 | Layers | YOUR Linear layer |
| 04 | Losses | YOUR loss functions |
| 06–08 | Training Infrastructure | YOUR autograd + optimizer (Part 2 only) |
Running the Milestone
Finish the prerequisite modules first — Modules 01–04 for Part 1, 01–08 for Part 2. Check your progress:
tito module status
cd milestones/01_1958_perceptron
# Part 1: See the problem
python 01_rosenblatt_forward.py
# Expected: ~50% accuracy (random guessing)
# Part 2: See the solution
python 02_rosenblatt_trained.py
# Expected: 95%+ accuracy (learned pattern)
Expected Results
Table 2 records the accuracy you should expect from each script.
| Script | Accuracy | What It Shows |
|---|---|---|
| 01 (Forward Only) | ~50% | Random weights = random guessing |
| 02 (Trained) | 95%+ | Training learns the pattern |
The Aha Moment: Learning IS the Intelligence
You’ll run two scripts. Both use the same architecture — YOUR Linear layer, YOUR sigmoid. But one achieves 50% accuracy (random chance), the other 95%+.
What’s the difference? Not the model. Not the data. The learning loop.
# Script 01: Forward-only (50% accuracy)
output = model(input) # YOUR code computes
loss = loss_fn(output, target) # YOUR code measures
# No backward(), no optimization, no learning
# Result: Random weights stay random
# Script 02: Complete training (95%+ accuracy)
output = model(input) # Same YOUR code
loss = loss_fn(output, target) # Same YOUR code
loss.backward() # YOUR autograd computes gradients
optimizer.step() # YOUR optimizer learns from mistakes
# Result: Random weights become intelligent
Run script 01 and watch YOUR Linear layer make random guesses — 50% accuracy, no better than a coin flip. Now run script 02. Same architecture. Same data. But now YOUR autograd engine computes gradients, YOUR optimizer updates weights. Within seconds, accuracy climbs: 60%… 75%… 85%… 95%+.
You just watched YOUR implementation learn. This is the moment Rosenblatt proved machines could improve through experience. And you recreated it with your own code.
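For reference, here is the fundamental loop in one place. It is a sketch under assumptions: names like num_epochs, inputs, targets, and an optimizer.zero_grad() method are illustrative and may not match YOUR API exactly.
# Sketch of the complete training loop, assuming the same model / loss_fn /
# optimizer objects as above. zero_grad() and the variable names are assumptions.
for epoch in range(num_epochs):
    optimizer.zero_grad()             # clear gradients from the previous step
    output = model(inputs)            # forward pass: YOUR Linear + Sigmoid
    loss = loss_fn(output, targets)   # YOUR BCELoss measures the error
    loss.backward()                   # YOUR autograd computes gradients
    optimizer.step()                  # YOUR SGD nudges weights to reduce the loss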
Your Code Powers This
Table 3 names the TinyTorch components that power this milestone.
| Component | Your Module | What It Does |
|---|---|---|
| Tensor | Module 01 | Stores inputs and weights |
| Sigmoid | Module 02 | YOUR activation function |
| Linear | Module 03 | YOUR fully-connected layer |
| BCELoss | Module 04 | YOUR loss computation |
| backward() | Module 06 | YOUR autograd engine |
| SGD | Module 07 | YOUR optimizer |
Historical Context
Rosenblatt didn’t just publish — he built. The Mark I Perceptron was custom hardware: a 20×20 grid of photocells wired to motor-driven potentiometers that physically adjusted the weights. The 1958 paper established the two ideas underlying every modern model: trainable weights and error-driven learning. Eleven years later, Minsky and Papert’s Perceptrons (1969) proved single-layer networks couldn’t learn XOR. Funding collapsed. The first AI winter began.
Systems Insights
- Memory: O(n) parameters for n input features
- Compute: O(n) operations per sample (worked through in the quick check after this list)
- Limitation: Can only solve linearly separable problems
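To make the memory and compute bullets concrete, here is a quick back-of-the-envelope check. The n = 400 figure is just the Mark I’s 20×20 photocell grid used as an example, not a requirement of the milestone.
# Quick check of the O(n) claims for a single-layer perceptron.
# n = 400 mirrors the Mark I's 20x20 photocell grid (illustrative only).
n = 400
params = n + 1                 # n weights + 1 bias               -> O(n) memory
ops_per_sample = 2 * n + 1     # n multiplies + n adds + 1 sigmoid -> O(n) compute
print(params, ops_per_sample)  # 401 801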
What’s Next
Linear separability — the Perceptron’s hard ceiling — sparked the first AI winter. Milestone 02 runs your network on XOR, watches it fail, then adds a hidden layer to break through.
Further Reading
- Original Paper: Rosenblatt, F. (1958). “The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain”
- Wikipedia: Perceptron