Milestone 02: The XOR Crisis (1969)
Foundation Milestone | Difficulty: ●●○○ | Time: 30–45 min | Prerequisites: Modules 01–08
- Why single-layer networks have fundamental mathematical limits
- How hidden layers enable non-linear decision boundaries
- Why “deep” learning is called DEEP
Overview
It’s 1969. Neural networks are the hottest thing in AI. Funding is pouring in. Then Marvin Minsky and Seymour Papert publish a 308-page mathematical proof that destroys everything: perceptrons cannot solve XOR. Not “struggle with” — CANNOT. Mathematically impossible.
Funding evaporates overnight. Research labs shut down. The field dies for 17 years — the infamous AI Winter.
You’re about to live that crisis. You’ll watch your own perceptron — built from your own modules — fail on four points despite a flawless training loop. Loss stuck at 0.69. Accuracy frozen at 50%. Epoch after epoch of futility. Then you’ll add one hidden layer and watch the impossible collapse into the trivial.
What You’ll Build
Two demonstrations of perceptron limitations and the multi-layer solution:
- The Crisis — watch a perceptron fail on XOR despite training
- The Solution — add a hidden layer and solve the “impossible” problem
Crisis: Input --> Linear --> Output (FAILS)
Solution: Input --> Linear --> ReLU --> Linear --> Output (100%!)
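If you want to see those two diagrams as math before running anything, here is a minimal NumPy sketch of both forward passes. It's illustrative only: the milestone scripts use YOUR TinyTorch modules, and the weights below are random, untrained placeholders.

```python
import numpy as np

# Illustrative NumPy sketch of the two architectures. The real scripts use
# YOUR TinyTorch Linear/ReLU modules; the weights here are random placeholders.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # the 4 XOR inputs

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Crisis: Input --> Linear --> Output  (one weight matrix, one bias)
W, b = rng.normal(size=(2, 1)), np.zeros(1)
crisis_out = sigmoid(X @ W + b)                    # shape (4, 1)

# Solution: Input --> Linear --> ReLU --> Linear --> Output
W1, b1 = rng.normal(size=(2, 2)), np.zeros(2)
W2, b2 = rng.normal(size=(2, 1)), np.zeros(1)
hidden = np.maximum(0.0, X @ W1 + b1)              # ReLU hidden layer
solution_out = sigmoid(hidden @ W2 + b2)           # shape (4, 1)
```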
The XOR Problem
| x1 | x2 | XOR | Pattern |
|---|---|---|---|
| 0 | 0 | 0 | same |
| 0 | 1 | 1 | different |
| 1 | 0 | 1 | different |
| 1 | 1 | 0 | same |
Plot those four points. The two zeros sit on one diagonal, the two ones on the other. No straight line separates them — and a single-layer perceptron can only draw straight lines. No amount of training fixes that. It’s geometry, not optimization.
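You don't have to take that on faith. A quick brute-force check (plain NumPy, not part of the milestone scripts) sweeps thousands of candidate lines and never classifies more than three of the four points correctly:

```python
import numpy as np
from itertools import product

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])  # XOR labels

# Try every linear classifier sign(w1*x1 + w2*x2 + b) on a coarse grid.
best = 0
grid = np.linspace(-2.0, 2.0, 41)
for w1, w2, b in product(grid, grid, grid):
    pred = (X @ np.array([w1, w2]) + b > 0).astype(int)
    best = max(best, int((pred == y).sum()))

print(best, "of 4")  # prints "3 of 4" -- no straight line gets all four
```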
Prerequisites
Table 1 lists the modules you need to have completed before starting.
| Module | Component | What It Provides |
|---|---|---|
| 01 | Tensor | YOUR data structure |
| 02 | Activations | YOUR sigmoid/ReLU |
| 03 | Layers | YOUR Linear layers |
| 04 | Losses | YOUR loss functions |
| 05 | DataLoader | YOUR data pipeline |
| 06 | Autograd | YOUR automatic differentiation |
| 07 | Optimizers | YOUR SGD optimizer |
| 08 | Training | YOUR training loop |
Running the Milestone
Before running, ensure you have completed Modules 01–08. You can check your progress:
tito module status
cd milestones/02_1969_xor
# Part 1: Experience the crisis
python 01_xor_crisis.py
# Expected: Loss stuck at ~0.69, accuracy ~50%
# Part 2: See the solution
python 02_xor_solved.py
# Expected: Loss --> 0.0, accuracy 100%
Expected Results
Table 2 records the loss and accuracy you should expect to see.
| Script | Layers | Loss | Accuracy | What It Shows |
|---|---|---|---|---|
| 01 (Single Layer) | 1 | ~0.69 (stuck!) | ~50% | Cannot learn XOR |
| 02 (Multi-Layer) | 2 | --> 0.0 | 100% | Hidden layers solve it |
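That suspiciously specific 0.69 isn't arbitrary: binary cross-entropy reports about 0.693, i.e. -ln(0.5), whenever a model can do no better than predicting ~0.5 for every point. A two-line check confirms the arithmetic:

```python
import math

# BCE for one example: -(y*log(p) + (1-y)*log(1-p)).
# If the model is stuck outputting p = 0.5 for every XOR point,
# the loss is -log(0.5) regardless of the label:
p = 0.5
print(round(-math.log(p), 4))  # 0.6931 -- the "stuck" loss in script 01
```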
The Aha Moment: Depth Changes Everything
The numbers in the table are the aftermath. Live, the experiment feels different.
Script 01 starts training. Loss: 0.69… 0.69… 0.69. Still 0.69. Why isn’t it learning? Did you break something?
You check the code. Everything’s correct. Your Linear layer works. Your autograd computes gradients. Your optimizer updates weights. But accuracy stays at 50%.
Then it lands: it’s not broken. It’s impossible. This is what Minsky proved. This is why funding died. Your code is slamming into the same mathematical wall that nearly ended AI research — every component working perfectly, all of it useless against XOR’s geometry.
Then you run script 02. Add one hidden layer. Loss drops immediately: 0.5… 0.3… 0.1… 0.01… 0.0. Accuracy: 100%.
Depth enables non-linear decision boundaries. The hidden layer learns to bend the input space until XOR becomes linearly separable. A single layer can only draw straight lines. Stack two, and you can draw any shape you need.
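One way to see the bending: you can write down, by hand, a two-unit ReLU network that computes XOR exactly. The weights below are hand-picked for illustration (SGD will land on different ones), but they prove the two-layer architecture can represent what the single layer cannot:

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

# Hand-picked weights (illustration only; training finds its own).
# Hidden unit 1: ReLU(x1 + x2)      -> 0, 1, 1, 2
# Hidden unit 2: ReLU(x1 + x2 - 1)  -> 0, 0, 0, 1
W1 = np.array([[1.0, 1.0], [1.0, 1.0]])
b1 = np.array([0.0, -1.0])
h = np.maximum(0.0, X @ W1 + b1)

# Output: h1 - 2*h2 -> 0, 1, 1, 0  (exactly XOR)
w2 = np.array([1.0, -2.0])
print(h @ w2)  # [0. 1. 1. 0.]
```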
Same code. Same training loop. Same four points. The impossible is now trivial — and you’ve earned the right to call this deep learning.
Your Code Powers This
Table 3 names the TinyTorch components that power this milestone.
| Component | Your Module | What It Does |
|---|---|---|
| Tensor | Module 01 | Stores inputs and weights |
| ReLU | Module 02 | YOUR activation for hidden layer |
| Linear | Module 03 | YOUR fully-connected layers |
| BCELoss | Module 04 | YOUR loss computation |
| DataLoader | Module 05 | YOUR data pipeline |
| backward() | Module 06 | YOUR autograd engine |
| SGD | Module 07 | YOUR optimizer |
| Training loop | Module 08 | YOUR training orchestration |
Systems Insights
- Memory: a single-layer perceptron needs O(n) parameters (n weights plus a bias); adding a hidden layer of comparable width pushes that to O(n²) (see the sketch below)
- Compute: each forward and backward pass makes the same jump, from O(n) to O(n²) multiply-adds
- Breakthrough: hidden representations unlock problems that are not linearly separable
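For a sense of scale, here's the parameter counting behind those estimates (assuming, as the O(n²) figure does, a hidden layer roughly as wide as the input):

```python
def perceptron_params(n_in):
    # single Linear layer: n_in weights + 1 bias
    return n_in + 1

def one_hidden_layer_params(n_in, n_hidden):
    # Linear(n_in -> n_hidden) + Linear(n_hidden -> 1)
    return (n_in * n_hidden + n_hidden) + (n_hidden + 1)

print(perceptron_params(2))                 # 3  (the XOR perceptron)
print(one_hidden_layer_params(2, 2))        # 9  (the XOR solver)
print(one_hidden_layer_params(1000, 1000))  # ~1e6 -- O(n^2) when n_hidden ~ n
```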
Historical Context
Minsky and Papert’s proof was mathematically airtight — and read as a verdict on the whole research program. Multi-layer networks were known, but no one had a practical way to train them. That gap took 17 years to close: Rumelhart, Hinton, and Williams published backpropagation through hidden layers in 1986, and the field exhaled.
The lesson is uncomfortable. A correct theorem, applied to the wrong abstraction, set an entire field back nearly two decades.
What’s Next
XOR is a toy: four points, two dimensions, a problem you can solve in your head. The real question is whether the same trick — stack a hidden layer, let it learn its own representation — survives contact with messy, high-dimensional data. Milestone 03 points the same architecture at 70,000 handwritten digits and finds out.
Further Reading
- The Crisis: Minsky, M., & Papert, S. (1969). “Perceptrons: An Introduction to Computational Geometry”
- The Solution: Rumelhart, Hinton, Williams (1986). “Learning representations by back-propagating errors”
- Wikipedia: AI Winter