Author’s Note
The world is rushing to build AI systems. It is not yet engineering them.
Those two sentences are the reason this book exists. A model that wins on a benchmark is not yet a system, and the distance between the two is not a detail of implementation. It is engineering: the work of making something hold up under constraints that do not negotiate. A demo answers to no one. A system answers to latency budgets, memory limits, power envelopes, and the cost of being wrong in front of real users.
What makes this its own discipline is a single, unusual demand. Every other system you will study answers to one body of law. A processor answers to the physics of the machine, to clock cycles, cache lines, and the heat it can shed. A statistical model answers to the mathematics of learning, to bias, variance, and the way error falls as data grows. A machine learning system is the rare thing that must answer to both at once, in the same decision, and the two do not always agree. A change that pleases the hardware can starve the model of the data it needs to generalize; a change that helps the model can ask the silicon for more than it can give. Hold that tension in view and the field stops looking like a pile of tools and starts looking like a structure you can reason about.
This book hands you that structure. Not the framework that is fashionable this year or the accelerator that is fastest this quarter, but the way of thinking that outlasts them. Frameworks change, hardware turns over every few years, the benchmarks reset; the forces underneath stay the same. If the book does its job, you will finish it able to look at any machine learning system, from a sensor sipping microwatts to a server farm drawing megawatts, and see the same small set of forces deciding what is possible.
The textbooks that changed me, when I was a student, did not hand me facts. They handed me a way of seeing, and the facts arranged themselves around it. That is what I have tried to do here.
— Vijay Janapa Reddi
Cambridge, Massachusetts
2026