Schedule & Readings
Course Schedule
“The goal isn’t to read everything, but to read the right things deeply and connect them meaningfully.”
📖 Reading Reflection Due: Before each class session
📝 Paper Presentation Signups — Sign up for your paper presentations here!
The Thematic Flow:
- AI for Software: AI systems understand what needs to be computed efficiently
- AI for Architecture: AI agents design how to compute it efficiently in hardware
- AI for Chip Design: AI tools implement the architecture physically in silicon
Table of Contents
📚 Quick Navigation:
Introduction & Foundations
Phase 1: AI for Software
- Week 3 - Code Generation & Software Engineering
- Week 4 - Performance Engineering & Code Optimization
- Week 5 - GPU Kernels & Parallel Programming
- Week 6 - Distributed Systems Integration
Phase 2: AI for Architecture
- Week 7 - Performance Prediction & Design Space Exploration
- Week 8 - Hardware Accelerators & AI Mappings
- Week 9 - Memory Systems & Data Management
- Week 10 - LLM Systems & AI Workload Scheduling
Phase 3: AI for Chip Design
- Week 11 - RTL Design & Logic Synthesis
- Week 12 - Physical Design & Layout
- Week 13 - Verification & Advanced Chip Design
Week 1 - Course Introduction & Logistics
Week of September 1 (first class: September 3)
Course overview, logistics, syllabus, and introduction to the vision of AI-driven computing stack design.
📋 Required Reading:
- Architecture 2.0: Foundations of Artificial Intelligence Agents for Modern Computer System Design
- Architecture 2.0: Why Computer Architects Need a Data-Centric AI Gymnasium
📚 Background Reading: Ch 1. Intro • Ch 2. ML Systems
📓 Class Notes: September 3 - Course Introduction • Materials: Slides • All materials
✍️ Blog Post: Week 1: The End of an Era, The Dawn of Architecture 2.0
Week 2 - Architecture 2.0 & Foundations
Week of September 8
Introduction to Architecture 2.0 and the paradigm shift from human-designed heuristics to agentic design methodologies. Introduction to datasets and survey paper methodology.
📋 Main Papers:
- QuArch: A Question-Answering Dataset for AI Agents in Computer Architecture
- A Computer Architect’s Guide to Designing Abstractions for Intelligent Systems
📖 Supplemental Reading:
📚 Background Reading: Ch 3. DL Primer • Ch 4. DNN Arch
✍️ Blog Post: Week 2: The Fundamental Challenges Nobody Talks About
Phase 1: AI for Software
AI systems understand what needs to be computed efficiently
Week 3 - Code Generation & Software Engineering
Week of September 15
Why are we studying this? Code generation is the most accessible entry point for AI in systems - it’s where LLMs have shown dramatic success, but also where we can clearly see the gap between “impressive demos” and “production-ready tools.” This week examines: How do we evaluate whether AI can actually replace human programmers? What does it mean for code to be “correct” vs. “optimal”? How do we move from toy problems to real software engineering workflows?
🎤 Guest Speaker(s): Ofir Press (Princeton, Postdoc)
Bio: Ofir Press is a Princeton postdoctoral researcher focused on large language models for code and evaluation, and is a creator of SWE-bench and SWE-agent.
🎯 Main Papers:
- SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
- Competition-Level Code Generation with AlphaCode
📖 Supplemental Reading:
- CodeBERT: A Pre-Trained Model for Programming and Natural Languages
- Code Llama: Open Foundation Models for Code
- AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation
- code2vec: Learning Distributed Representations of Code
📚 Background Reading: Ch 6. AI Frameworks • Ch 5. Data Engr
Week 4 - Performance Engineering & Code Optimization
Week of September 22
Why are we studying this? Moving beyond correctness to performance requires understanding both algorithmic complexity and system behavior. This week explores: Can AI learn the subtle performance optimizations that expert programmers use? How do we teach machines to reason about cache behavior, instruction-level parallelism, and memory access patterns? What’s the difference between micro-optimizations and architectural improvements?
🎤 Guest Speaker(s): Amir Yazdanbakhsh (Google DeepMind, Research Scientist)
Bio: Amir Yazdanbakhsh is a research scientist at Google DeepMind working at the intersection of intelligent systems and computer architecture, with a focus on designing abstractions that enable AI-driven systems.
🎯 Main Papers:
- ECO: An LLM-Driven Efficient Code Optimizer for Warehouse Scale Computers
- Learning Performance-Improving Code Edits
📖 Supplemental Reading:
- Compiler-R1: Towards Agentic Compiler Auto-tuning with Reinforcement Learning
- Ithemal: Accurate, Portable and Fast Basic Block Throughput Estimation
- CompilerGym: Robust, Performant Compiler Optimization Environments for AI Research
- MLIR: A Compiler Infrastructure for the End of Moore’s Law
- Learning to Optimize Tensor Programs
📚 Background Reading: Ch 7. Efficient AI • Ch 8. Model Opt
Week 5 - GPU Kernels & Parallel Programming
Week of September 29
Why are we studying this? GPU kernel optimization sits at the intersection of domain expertise and automated optimization - a space too complex for pure heuristics but requiring deep hardware understanding. This week explores: Can AI learn hardware-specific optimization strategies that human experts use? How do we benchmark AI systems against decades of hand-tuned libraries? What happens when the optimization space is so large that even experts disagree on “optimal” solutions?
🎤 Guest Speaker(s): Sasha Rush (Cursor, Researcher)
Bio: Sasha Rush is a researcher at Cursor and an Associate Professor at Cornell Tech, working on building and improving language models, especially for code optimization.
🎯 Main Papers:
📖 Supplemental Reading:
- AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms
- Ansor: Generating High-Performance Tensor Programs for Deep Learning
- Reinforcement Learning for FPGA Placement
📚 Background Reading: Ch 9. AI Acceleration • Ch 7. Efficient AI
Week 6 - Distributed Systems Integration
Week of October 6
Why are we studying this? Distributed systems are the culmination of software engineering challenges - where code generation, performance optimization, and parallel programming must work together at scale. This week examines: How do we optimize systems where the bottleneck might be network latency, not computation? Can AI learn to co-design algorithms and system architecture? What does “optimal” mean when dealing with failures, load balancing, and resource contention?
🎤 Guest Speaker(s): Martin Maas (Google DeepMind, Staff Research Scientist)
Bio: Martin Maas is a Staff Research Scientist at Google DeepMind working on leveraging machine learning to improve runtime systems, operating systems, and computer architecture.
🎯 Main Papers:
- COSMIC: Enabling Full-Stack Co-Design and Optimization of Distributed Machine Learning Systems
- Reinforcement Learning for Datacenter Congestion Control
📖 Supplemental Reading:
- Spatio-Temporal Self-Supervised Learning for Traffic Flow Prediction
- Remy: TCP ex Machina
- Learning Scheduling Algorithms for Data Processing Clusters (Decima)
- Aurora: A Deep Reinforcement Learning Perspective on Internet Congestion Control
📚 Background Reading: Ch 10. AI Training • Ch 11. ML Ops
Phase 2: AI for Architecture
AI agents design how to compute it efficiently in hardware
Week 7 - Performance Prediction & Design Space Exploration
Week of October 13
Why are we studying this? Performance prediction and design space exploration are fundamental to AI-driven architecture design. This week bridges performance modeling with systematic design space navigation. Key questions: How do we model complex interactions between architectural components? Can AI systematically explore spaces too large for human analysis? How do we predict performance across different workloads and design points? What architectural insights can emerge from data that human designers might miss?
🎤 Guest Speaker(s): Suvinay Subramanian (Google DeepMind, Staff Software Engineer)
Bio: Suvinay Subramanian is a Staff Software Engineer at Google working on designing and optimizing the performance of specialized hardware accelerator systems (TPUs) for AI workloads (LLMs, recommendation models).
🎯 Main Papers:
- Concorde: Fast and Accurate CPU Performance Modeling with Compositional Analytical-ML Fusion
- ArchGym: An Open-Source Gymnasium for Machine Learning Assisted Architecture Design
- Multi-Agent Reinforcement Learning for Microprocessor Design Space Exploration
📚 Background Reading: Ch 12. Benchmarking AI
✍️ Blog Post: Week 7: The Tacit Knowledge Problem - How AI Agents Learn What Architects Never Wrote Down
📖 Supplemental Reading:
- DNNPerf: Runtime Performance Prediction for Deep Learning Models with Graph Neural Networks
- NeuSight: Forecasting GPU Performance for Deep Learning Training and Inference
- Practical Design Space Exploration (HyperMapper 2.0)
- AutoDSE: Enabling Software Programmers to Design Efficient FPGA Accelerators
- Bayesian Optimization for Accelerator Design Space Exploration
Week 8 - Hardware Accelerators & AI Mappings
Week of October 20
Why are we studying this? Accelerator design is the ultimate co-design challenge - optimizing both the hardware architecture and the mapping of computations onto that hardware. This week explores: How do we jointly optimize dataflow, memory hierarchy, and compute units? Can AI learn the complex trade-offs between energy, performance, and area? What happens when the target workload is itself changing rapidly (like evolving DNN architectures)?
🎤 Guest Speaker(s): Jenny Huang (Nvidia, Research Scientist)
Bio: Jenny Huang is a research scientist at Nvidia working on GPU architecture with the computer architecture research group. Her research focuses on accelerated computing and the co-optimization of algorithm, hardware, and mappings.
🎯 Main Papers:
- DOSA: Differentiable Model-Based One-Loop Search for DNN Accelerators
- Learning to Optimize Tensor Programs (AutoTVM)
📚 Background Reading: Ch 9. AI Acceleration • Ch 8. Model Opt
📖 Supplemental Reading:
- Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks
- In-Datacenter Performance Analysis of a Tensor Processing Unit
- SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks
- MAERI: Enabling Flexible Dataflow Mapping over DNN Accelerators via Reconfigurable Interconnects
- Understanding Reuse, Performance, and Hardware Cost of DNN Dataflows: A Data-Centric Approach Using MAESTRO
- Timeloop: A Systematic Approach to DNN Accelerator Evaluation
Week 9 - Memory Systems & Data Management
Week of October 27
Why are we studying this? Memory hierarchy design is where the theoretical meets the practical - where algorithmic access patterns meet physical constraints of latency, bandwidth, and energy. This week examines: Can AI learn to predict and optimize for complex memory access patterns? How do we design memory systems for workloads we can’t fully characterize? What’s the relationship between data structure design and memory hierarchy optimization?
🎤 Guest Speaker(s): Milad Hashemi (Google, Research Scientist)
Bio: Milad Hashemi is a research scientist at Google working on the ML, Systems, and Cloud AI team.
🎯 Main Papers:
📚 Background Reading: Ch 5. Data Engr • Ch 2. ML Systems
📖 Supplemental Reading:
- ALEX: An Updatable Adaptive Learned Index
- Learning-based Memory Allocation for C++ Server Workloads
- Designing a Cost-Effective Cache Replacement Policy Using Machine Learning
- Long Short-Term Memory (LSTM) Based Hardware Prefetcher
- Lightweight ML-based Runtime Prefetcher Selection on Many-core Platforms
Week 10 - LLM Systems & AI Workload Scheduling
Week of November 3
Why are we studying this? AI workloads are a new class of computational patterns that challenge traditional system design assumptions. This week explores: How do we optimize systems for workloads that are themselves AI-driven? What new scheduling challenges emerge with transformer architectures and attention mechanisms? Can we co-design the AI algorithms and the systems that run them?
🎤 Guest Speaker(s): Esha Choukse (Microsoft Azure Research, Principal Researcher)
Bio: Esha Choukse is a Principal Researcher on the Azure Research Systems team. She leads the Efficient AI project, which optimizes generative AI workloads and systems for efficiency and sustainability.
🎯 Main Papers:
- Efficient LLM Scheduling by Learning to Rank
- Performance Prediction for Large Systems via Text-to-Text Regression
📚 Background Reading: Ch 10. AI Training • Ch 11. ML Ops
📖 Supplemental Reading:
- Neural Architecture Search with Reinforcement Learning
- Efficient Memory Management for Large Language Model Serving with PagedAttention
- Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters
Phase 3: AI for Chip Design
AI tools implement the architecture physically in silicon
Week 11 - RTL Design & Logic Synthesis
Week of November 10
Why are we studying this? RTL design and logic synthesis represent the transition from architectural intent to physical implementation. This week examines: Can AI learn the complex relationships between high-level hardware descriptions and optimized gate-level implementations? How do we teach machines to reason about timing, power, and area trade-offs? What does it mean for AI to “understand” hardware design languages?
🎤 Guest Speaker(s): Mark Ren (Nvidia, Director of Design Automation Research)
Bio: Mark Ren leads Design Automation Research at Nvidia. His research interests are AI for chip design and GPU-accelerated EDA.
🎯 Main Papers:
- Comprehensive Verilog Design Problems: A Next-Generation Benchmark Dataset for Evaluating Large Language Models and Agents on RTL Design and Verification
- Make Every Move Count: LLM-based High-Quality RTL Code Generation Using MCTS
📚 Background Reading: Ch 9. AI Acceleration • Ch 16. Robust AI
📖 Supplemental Reading:
- ChipNeMo: Domain-Adapted LLMs for Chip Design
- ChipAlign: Instruction Alignment in Large Language Models for Chip Design via Geodesic Interpolation
- DRiLLS: Deep Reinforcement Learning for Logic Synthesis
- BOiLS: Bayesian Optimisation for Logic Synthesis
- MasterRTL: A Pre-Synthesis PPA Estimation Framework for Any RTL Design
- AutoChip: Automating HDL Generation Using LLM Feedback
- OpenABC-D: A Large-Scale Dataset for Machine Learning Guided Integrated Circuit Synthesis
Week 12 - Physical Design & Layout
Week of November 17
Why are we studying this? Physical design is the final translation from logical design to manufacturable silicon. This week explores: Can AI learn the complex geometric and electrical constraints of chip layout? How do we optimize for objectives that span multiple scales - from transistor placement to global routing? What happens when AI systems must reason about manufacturing variability and yield?
🎤 Guest Speaker(s): Richard Ho (OpenAI, Head of Hardware)
Bio: Richard Ho is Head of Hardware at OpenAI working to co-optimize ML models and the massive compute hardware they run on.
🎯 Main Papers:
- Chip Placement with Deep Reinforcement Learning
- DREAMPlace: Deep Learning Toolkit-Enabled GPU Acceleration for Modern VLSI Placement
📚 Background Reading: Ch 9. AI Acceleration • Ch 8. Model Opt
📖 Supplemental Reading:
- Chip Placement with Deep Reinforcement Learning (Circuit Training)
- MaskPlace: Fast Chip Placement via Reinforced Visual Representation Learning
- Learning on distributed traces for data center storage systems
Week 13 - Verification & Advanced Chip Design
Week of November 24
Why are we studying this? Verification is the ultimate test of whether AI-designed systems actually work. This week examines: How do we verify systems that are too complex for traditional formal methods? Can AI help generate better test cases and assertions? What does it mean to “trust” an AI-designed chip? How do we close the loop from verification results back to design optimization?
🎤 Guest Speaker(s): Kartik Hegde (ChipStack, Co-Founder)
Bio: Kartik Hegde is the co-founder of ChipStack, focusing on AI-assisted chip design and verification workflows that accelerate silicon development.
🎯 Main Papers:
- Using LLMs to Facilitate Formal Verification of RTL
- SLDB: An End-To-End Heterogeneous System-on-Chip Benchmark Suite for LLM-Aided Design
📚 Background Reading: Ch 16. Robust AI • Ch 17. Responsible AI
📖 Supplemental Reading:
- AssertLLM: Generating and Evaluating Hardware Verification Assertions from Design Specifications via Multi-LLMs
- SpecLLM: Exploring Generation and Review of VLSI Design Specification with Large Language Model
November 26: Thanksgiving Break - No Class
Week 14 - Final Projects & Integration
Student project synthesis.
📝 Projects Due: December 1
The schedule is subject to change based on guest speaker availability and emerging research developments.