Milestone 02: The XOR Crisis (1969)

Note: Milestone Info

Foundation Milestone | Difficulty: ●●○○ | Time: 30–45 min | Prerequisites: Modules 01–08

Tip: What You’ll Learn
  • Why single-layer networks have fundamental mathematical limits
  • How hidden layers enable non-linear decision boundaries
  • Why “deep” learning is called DEEP

Overview

It’s 1969. Neural networks are the hottest thing in AI. Funding is pouring in. Then Marvin Minsky and Seymour Papert publish a 308-page mathematical proof that destroys everything: perceptrons cannot solve XOR. Not “struggle with” — CANNOT. Mathematically impossible.

Funding evaporates overnight. Research labs shut down. The field dies for 17 years — the infamous AI Winter.

You’re about to live that crisis. You’ll watch your own perceptron — built from your own modules — fail on four points despite a flawless training loop. Loss stuck at 0.69. Accuracy frozen at 50%. Epoch after epoch of futility. Then you’ll add one hidden layer and watch the impossible collapse into the trivial.

What You’ll Build

Two demonstrations of perceptron limitations and the multi-layer solution:

  1. The Crisis — watch a perceptron fail on XOR despite training
  2. The Solution — add a hidden layer and solve the “impossible” problem
Crisis:   Input --> Linear --> Output (FAILS)
Solution: Input --> Linear --> ReLU --> Linear --> Output (100%!)
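
The two architectures differ by a single non-linearity. Here is a minimal NumPy sketch of both forward passes (hypothetical weights; in the actual scripts, YOUR TinyTorch Linear and ReLU modules play these roles):

import numpy as np

def perceptron(x, W, b):
    # Crisis: one linear map -- the decision boundary is a straight line
    return x @ W + b

def mlp(x, W1, b1, W2, b2):
    # Solution: Linear -> ReLU -> Linear
    h = np.maximum(0, x @ W1 + b1)   # hidden layer bends the input space
    return h @ W2 + b2               # output layer draws a line in the bent space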

The XOR Problem

Inputs    Output
x1  x2    XOR
0   0  -->  0   (same)
0   1  -->  1   (different)
1   0  -->  1   (different)
1   1  -->  0   (same)

Plot those four points. The two zeros sit on one diagonal, the two ones on the other. No straight line separates them — and a single-layer perceptron can only draw straight lines. No amount of training fixes that. It’s geometry, not optimization.
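
You can also see the wall algebraically. A separating line w1*x1 + w2*x2 + b would need b < 0 for (0,0), w1 + b > 0 and w2 + b > 0 for the two ones, and w1 + w2 + b < 0 for (1,1). Adding the middle two inequalities gives w1 + w2 + 2b > 0, hence w1 + w2 + b > -b > 0, a contradiction. If you would rather watch the wall than prove it, here is a small brute-force sketch in NumPy (an illustration, not part of the milestone scripts):

import itertools
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

grid = np.linspace(-5, 5, 41)   # ~69,000 candidate (w1, w2, b) triples
found = any(
    np.array_equal((X @ np.array([w1, w2]) + b > 0).astype(int), y)
    for w1, w2, b in itertools.product(grid, grid, grid)
)
print("linear separator found:", found)   # False, every time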

Prerequisites

Table 1 lists the modules you need to have completed before starting.

Table 1: Prerequisite modules for the XOR milestone.
Module  Component    What It Provides
01      Tensor       YOUR data structure
02      Activations  YOUR sigmoid/ReLU
03      Layers       YOUR Linear layers
04      Losses       YOUR loss functions
05      DataLoader   YOUR data pipeline
06      Autograd     YOUR automatic differentiation
07      Optimizers   YOUR SGD optimizer
08      Training     YOUR training loop

Running the Milestone

Before running, ensure you have completed Modules 01–08. You can check your progress:

tito module status

cd milestones/02_1969_xor

# Part 1: Experience the crisis
python 01_xor_crisis.py
# Expected: Loss stuck at ~0.69, accuracy ~50%

# Part 2: See the solution
python 02_xor_solved.py
# Expected: Loss --> 0.0, accuracy 100%

Expected Results

Table 2 records the loss and accuracy you should expect to see.

Table 2: Expected loss and accuracy for the XOR milestone scripts.
Script             Layers  Loss            Accuracy  What It Shows
01 (Single Layer)  1       ~0.69 (stuck!)  ~50%      Cannot learn XOR
02 (Multi-Layer)   2       --> 0.0         100%      Hidden layers solve it

The Aha Moment: Depth Changes Everything

The numbers in the table are the aftermath. Live, the experiment feels different.

Script 01 starts training. Loss: 0.69… 0.69… 0.69. Still 0.69. Why isn’t it learning? Did you break something?

You check the code. Everything’s correct. Your Linear layer works. Your autograd computes gradients. Your optimizer updates weights. But accuracy stays at 50%. And that 0.69 is no accident: it is ln 2, the binary cross-entropy of predicting 0.5 for every point, and for a single linear layer that coin-flip output is the best XOR allows.

Then it lands: it’s not broken. It’s impossible. This is what Minsky proved. This is why funding died. Your code is slamming into the same mathematical wall that nearly ended AI research — every component working perfectly, all of it useless against XOR’s geometry.

Then you run script 02. Add one hidden layer. Loss drops immediately: 0.5… 0.3… 0.1… 0.01… 0.0. Accuracy: 100%.

Depth enables non-linear decision boundaries. The hidden layer learns to bend the input space until XOR becomes linearly separable. A single layer can only draw straight lines. Stack two, with enough hidden units, and you can approximate any boundary you need.
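
To make “bend the input space” concrete, here is a hand-built two-unit ReLU network (weights chosen by hand for illustration, not learned) that computes XOR exactly:

import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([0.0, -1.0])      # h1 = relu(x1 + x2), h2 = relu(x1 + x2 - 1)
w2 = np.array([1.0, -2.0])      # y  = h1 - 2*h2

H = np.maximum(0, X @ W1 + b1)  # hidden coordinates: [0,0], [1,0], [1,0], [2,1]
print(H @ w2)                   # [0. 1. 1. 0.] -- exact XOR

In the hidden coordinates (h1, h2), the four points are linearly separable; the output layer just reads off the line.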

Same code. Same training loop. Same four points. The impossible is now trivial — and you’ve earned the right to call this deep learning.

Your Code Powers This

Table 3 names the TinyTorch components that power this milestone.

Table 3: TinyTorch components that power the XOR milestone.
Component      Your Module  What It Does
Tensor         Module 01    Stores inputs and weights
ReLU           Module 02    YOUR activation for hidden layer
Linear         Module 03    YOUR fully-connected layers
BCELoss        Module 04    YOUR loss computation
DataLoader     Module 05    YOUR data pipeline
backward()     Module 06    YOUR autograd engine
SGD            Module 07    YOUR optimizer
Training loop  Module 08    YOUR training orchestration

Systems Insights

  • Memory: with a hidden layer whose width scales with the input size n, the weights grow to O(n²) (an n×n matrix) versus O(n) for a perceptron (worked count below)
  • Compute: forward and backward passes grow the same way, O(n²) multiply-adds per example instead of O(n)
  • Breakthrough: hidden representations unlock non-linear problems
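
Concretely, counting parameters for the two scripts (assuming a hidden width of 4; adjust to whatever 02_xor_solved.py actually uses):

perceptron_params = 2*1 + 1           # 2x1 weights + 1 bias          = 3
mlp_params = (2*4 + 4) + (4*1 + 1)    # Linear(2->4) + Linear(4->1)   = 17
print(perceptron_params, mlp_params)  # 3 17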

Historical Context

Minsky and Papert’s proof was mathematically airtight — and read as a verdict on the whole research program. Multi-layer networks were known, but no one had a practical way to train them. That gap took 17 years to close: Rumelhart, Hinton, and Williams published backpropagation through hidden layers in 1986, and the field exhaled.

The lesson is uncomfortable. A correct theorem, applied to the wrong abstraction, set an entire field back nearly two decades.

What’s Next

XOR is a toy: four points, two dimensions, a problem you can solve in your head. The real question is whether the same trick — stack a hidden layer, let it learn its own representation — survives contact with messy, high-dimensional data. Milestone 03 points the same architecture at 70,000 handwritten digits and finds out.
