Foundation Tier (Modules 01-07)
Build the mathematical core that makes neural networks learn.
What You'll Learn
The Foundation tier teaches you how to build a complete learning system from scratch. Starting with basic tensor operations, you'll construct the mathematical infrastructure that powers every modern ML framework: automatic differentiation, gradient-based optimization, and training loops.
By the end of this tier, you'll understand:
How tensors represent and transform data in neural networks
Why activation functions enable non-linear learning
How backpropagation computes gradients automatically
What optimizers do to make training converge
How training loops orchestrate the entire learning process
Module Progression
graph TB
M01[01. Tensor<br/>Multidimensional arrays] --> M03[03. Layers<br/>Linear transformations]
M02[02. Activations<br/>Non-linear functions] --> M03
M03 --> M04[04. Losses<br/>Measure prediction quality]
M03 --> M05[05. Autograd<br/>Automatic differentiation]
M04 --> M06[06. Optimizers<br/>Gradient-based updates]
M05 --> M06
M06 --> M07[07. Training<br/>Complete learning loop]
style M01 fill:#e3f2fd,stroke:#1976d2,stroke-width:3px
style M02 fill:#e3f2fd,stroke:#1976d2,stroke-width:3px
style M03 fill:#bbdefb,stroke:#1565c0,stroke-width:3px
style M04 fill:#90caf9,stroke:#1565c0,stroke-width:3px
style M05 fill:#90caf9,stroke:#1565c0,stroke-width:3px
style M06 fill:#64b5f6,stroke:#0d47a1,stroke-width:3px
style M07 fill:#42a5f5,stroke:#0d47a1,stroke-width:4px
Fig. 2 Foundation Module Dependencies. Tensors and activations feed into layers, which connect to losses and autograd, enabling optimizers and ultimately training loops.
Module Details
01. Tensor - The Foundation of Everything
What it is: Multidimensional arrays with automatic shape tracking and broadcasting.
Why it matters: Tensors are the universal data structure for ML. Understanding tensor operations, broadcasting, and memory layouts is essential for building efficient neural networks.
What you'll build: A pure Python tensor class supporting arithmetic, reshaping, slicing, and broadcasting, just like PyTorch tensors.
Systems focus: Memory layout, broadcasting semantics, operation fusion
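To make the broadcasting semantics concrete, here is a minimal sketch of the NumPy-style shape-matching rule that any tensor class has to implement; the function name and interface are illustrative, not part of the module's API:

```python
def broadcast_shape(a, b):
    """Return the shape two tensors broadcast to, or raise if incompatible."""
    # Left-pad the shorter shape with 1s so both have the same rank.
    rank = max(len(a), len(b))
    a = (1,) * (rank - len(a)) + tuple(a)
    b = (1,) * (rank - len(b)) + tuple(b)
    out = []
    for da, db in zip(a, b):
        # Dimensions are compatible if they match or either one is 1.
        if da == db or da == 1 or db == 1:
            out.append(max(da, db))
        else:
            raise ValueError(f"shapes {a} and {b} are not broadcastable")
    return tuple(out)

print(broadcast_shape((3, 1), (1, 4)))  # (3, 4)
print(broadcast_shape((2, 3), (3,)))    # (2, 3)
```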
02. Activations - Enabling Non-Linear Learning
What it is: Non-linear functions applied element-wise to tensors.
Why it matters: Without activations, neural networks collapse to linear models. Activations like ReLU, Sigmoid, and Tanh enable networks to learn complex, non-linear patterns.
What you'll build: Common activation functions with their gradients for backpropagation.
Systems focus: Numerical stability, in-place operations, gradient flow
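As a rough illustration of activations paired with their gradients (function names are illustrative assumptions, not the module's actual API; NumPy is used here only for brevity):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def relu_grad(x):
    # Derivative is 1 where x > 0, else 0 (the kink at x = 0 is set to 0 by convention).
    return (x > 0).astype(float)

def sigmoid(x):
    # Numerically stable: only ever exponentiate non-positive values,
    # so exp() cannot overflow for large-magnitude inputs.
    out = np.empty_like(x, dtype=float)
    pos = x >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))
    exp_x = np.exp(x[~pos])
    out[~pos] = exp_x / (1.0 + exp_x)
    return out

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.array([-30.0, -1.0, 0.0, 1.0, 30.0])
print(sigmoid(x))    # smoothly maps to (0, 1) without overflow warnings
print(relu_grad(x))  # [0. 0. 0. 1. 1.]
```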
03. Layers - Building Blocks of Networks
What it is: Parameterized transformations (Linear, Conv2d) that learn from data.
Why it matters: Layers are the modular components you stack to build networks. Understanding weight initialization, parameter management, and forward passes is crucial.
What you'll build: Linear (fully-connected) layers with proper initialization and parameter tracking.
Systems focus: Parameter storage, initialization strategies, forward computation
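A minimal sketch of such a layer might look like the following; the class name, the Glorot-style initialization, and the method names are illustrative assumptions rather than the module's prescribed design:

```python
import numpy as np

class Linear:
    """Fully connected layer: y = x @ W + b."""

    def __init__(self, in_features, out_features, rng=None):
        if rng is None:
            rng = np.random.default_rng(0)
        # Glorot/Xavier uniform initialization keeps activation variance roughly stable.
        limit = np.sqrt(6.0 / (in_features + out_features))
        self.W = rng.uniform(-limit, limit, size=(in_features, out_features))
        self.b = np.zeros(out_features)

    def forward(self, x):
        # x: (batch, in_features) -> (batch, out_features)
        return x @ self.W + self.b

    def parameters(self):
        # Expose parameters so an optimizer can update them.
        return [self.W, self.b]

layer = Linear(4, 3)
x = np.random.default_rng(1).normal(size=(2, 4))
print(layer.forward(x).shape)  # (2, 3)
```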
04. Losses - Measuring Success
What it is: Functions that quantify how wrong your predictions are.
Why it matters: Loss functions define what "good" means for your model. Different tasks (classification, regression) require different loss functions.
What you'll build: CrossEntropyLoss, MSELoss, and other common objectives with their gradients.
Systems focus: Numerical stability (log-sum-exp trick), reduction strategies
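To show the log-sum-exp trick in action, here is a sketch of a numerically stable cross-entropy with a mean reduction; the code is illustrative, not the module's actual API:

```python
import numpy as np

def cross_entropy(logits, targets):
    """Mean negative log-likelihood of integer class targets given raw logits."""
    # Log-sum-exp trick: subtract the per-row max so exp() can never overflow.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(targets)), targets]
    return nll.mean()  # "mean" reduction; "sum" is another common reduction strategy

logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2,  3.0]])
targets = np.array([0, 2])
print(cross_entropy(logits, targets))
```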
05. Autograd - The Gradient Revolution
What it is: Automatic differentiation system that computes gradients through computation graphs.
Why it matters: Autograd is what makes deep learning practical. It automatically computes gradients for any computation, enabling backpropagation through arbitrarily complex networks.
What you'll build: A computational graph system that tracks operations and computes gradients via the chain rule.
Systems focus: Computational graphs, topological sorting, gradient accumulation
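The core idea fits in a small amount of code. The sketch below is a scalar-valued, micrograd-style toy rather than the module's tensor-level system, and every name in it is an illustrative assumption; it exists only to show operation tracking, topological sorting, and the reverse-mode chain rule:

```python
class Value:
    """A scalar that remembers how it was computed, so gradients can flow back."""

    def __init__(self, data, parents=()):
        self.data, self.grad = data, 0.0
        self._parents = parents
        self._backward = lambda: None

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def backward():
            self.grad += out.grad           # d(a+b)/da = 1
            other.grad += out.grad          # d(a+b)/db = 1
        out._backward = backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def backward():
            self.grad += other.data * out.grad   # d(a*b)/da = b
            other.grad += self.data * out.grad   # d(a*b)/db = a
        out._backward = backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse order.
        order, seen = [], set()
        def visit(node):
            if node not in seen:
                seen.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)
        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()

x, y = Value(3.0), Value(2.0)
z = x * y + x          # z = x*y + x
z.backward()
print(x.grad, y.grad)  # 3.0 (= y + 1) and 3.0 (= x)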
06. Optimizers - Learning from Gradients
What it is: Algorithms that update parameters using gradients (SGD, Adam, RMSprop).
Why it matters: Raw gradients don't directly tell you how to update parameters. Optimizers use momentum, adaptive learning rates, and other tricks to make training converge faster and more reliably.
What you'll build: SGD, Adam, and RMSprop with proper momentum and learning rate scheduling.
Systems focus: Update rules, momentum buffers, numerical stability
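As one concrete example of an update rule, here is a sketch of SGD with classical momentum; the interface is an illustrative assumption, not the module's actual API:

```python
import numpy as np

class SGD:
    """SGD with classical momentum:  v <- mu * v + g ;  p <- p - lr * v."""

    def __init__(self, params, lr=0.01, momentum=0.9):
        self.params, self.lr, self.momentum = params, lr, momentum
        self.velocity = [np.zeros_like(p) for p in params]  # momentum buffers

    def step(self, grads):
        for p, g, v in zip(self.params, grads, self.velocity):
            v *= self.momentum
            v += g
            p -= self.lr * v  # update the parameter array in place

w = np.array([1.0, -2.0])
opt = SGD([w], lr=0.1)
opt.step([np.array([0.5, -0.5])])
print(w)  # [ 0.95 -1.95]
```

Adam and RMSprop follow the same pattern but additionally keep running estimates of squared gradients so the step size adapts per parameter.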
07. Training - Orchestrating the Learning Process
What it is: The training loop that ties everything together (forward pass, loss computation, backpropagation, parameter updates).
Why it matters: Training loops orchestrate the entire learning process. Understanding this flow, including batching, epochs, and validation, is essential for practical ML.
What you'll build: A complete training framework with progress tracking, validation, and model checkpointing.
Systems focus: Batch processing, gradient clipping, learning rate scheduling
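The loop structure itself is small. The toy example below fits a 1D linear regressor with hand-derived gradients standing in for autograd; it is purely illustrative and uses none of the framework's actual API:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(256, 1))
y = 3.0 * x + 0.5 + 0.1 * rng.normal(size=(256, 1))  # true w = 3.0, b = 0.5

w, b, lr, batch_size = 0.0, 0.0, 0.1, 32
for epoch in range(20):
    perm = rng.permutation(len(x))              # reshuffle every epoch
    for i in range(0, len(x), batch_size):      # iterate over mini-batches
        xb = x[perm[i:i + batch_size]]
        yb = y[perm[i:i + batch_size]]
        pred = w * xb + b                       # 1. forward pass
        err = pred - yb
        loss = (err ** 2).mean()                # 2. MSE loss
        grad_w = 2.0 * (err * xb).mean()        # 3. backward (hand-derived here)
        grad_b = 2.0 * err.mean()
        w -= lr * grad_w                        # 4. optimizer step (plain SGD)
        b -= lr * grad_b

print(round(w, 2), round(b, 2))                 # approaches 3.0 and 0.5
```

A full training framework layers the pieces listed above on top of this skeleton: validation passes, checkpointing, gradient clipping, and learning rate scheduling.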
What You Can Build After This Tier
timeline
title Historical Achievements Unlocked
1957 : Perceptron : Binary classification with gradient descent
1969 : XOR Crisis Solved : Hidden layers enable non-linear learning
1986 : MLP Revival : Multi-layer networks achieve 95%+ on MNIST
Fig. 3 Foundation Tier Milestones. After completing modules 01-07, you unlock three historical achievements spanning three decades of neural network breakthroughs.
After completing the Foundation tier, you'll be able to:
Milestone 01 (1957): Recreate the Perceptron, the first trainable neural network
Milestone 02 (1969): Solve the XOR problem that nearly ended AI research
Milestone 03 (1986): Build multi-layer perceptrons that achieve 95%+ accuracy on MNIST
Prerequisites
Required:
Python programming (functions, classes, loops)
Basic linear algebra (matrix multiplication, dot products)
Basic calculus (derivatives, chain rule)
Helpful but not required:
NumPy experience
Understanding of neural network concepts
Time Commitment
Per module: 3-5 hours (implementation + exercises + systems thinking)
Total tier: ~25-35 hours for complete mastery
Recommended pace: 1-2 modules per week
Learning Approach
Each module follows the Build → Use → Reflect cycle:
Build: Implement the component from scratch (tensor operations, autograd, optimizers)
Use: Apply it to real problems (toy datasets, simple networks)
Reflect: Answer systems thinking questions (memory usage, computational complexity, design trade-offs)
Next Steps
Ready to start building?
# Start with Module 01: Tensor
tito module start 01_tensor
# Follow the daily workflow
# 1. Read the ABOUT guide
# 2. Implement in *_dev.py
# 3. Test with tito module test
# 4. Export to *_sol.py
Or explore other tiers:
Architecture Tier (Modules 08-13): CNNs, transformers, attention
Optimization Tier (Modules 14-19): Production-ready performance
Torch Olympics (Module 20): Compete in ML systems challenges