System Assumptions

Purpose

What assumptions sit underneath the “napkin math” throughout the book?

Every quantitative example in this book, from training-time estimates to energy-per-inference calculations, uses a specific set of assumed numbers: reference accelerator bandwidth, reference model scale, cloud electricity price, decimal GB definitions, and dozens more. This appendix collects those assumptions in one place so the book’s arithmetic can be audited, checked against chapter napkin math, or substituted with a different accelerator generation, regional power price, or local convention. In D·A·M terms, the assumptions keep data volume, algorithm work, and machine capacity measured on the same scale across chapters.

How to Use This Appendix

When a chapter says an H100 delivers a certain ridge point, or that training a 7B model needs a certain amount of memory, the underlying bandwidths, capacities, and model sizes are listed here. Find the relevant section (for example, NVIDIA H100 or Reference Model Specifications), read the Value and Unit columns, and plug them into your own estimate. The tables are grouped by topic: accelerators, reference models, training-memory conventions, energy per access, interconnect bandwidth, economics, production-scale anchors, and unit conventions. Assumptions come from vendor datasheet peaks, published studies or industry reports, illustrative market or grid statistics, and book conventions; the Assumption Provenance section (table 16) lists the provenance in one place.

The parameters of ML systems form a zoo of competing benchmarks and moving targets; a single, unified set of assumptions gives the book's quantitative reasoning a stable foundation.

Napkin Math 1.1: Napkin math with these constants

The constants in this appendix are not just for auditing—they are designed for quick calculations. Three examples illustrate the pattern.

Problem: Decide whether a workload is compute bound or memory bound.

Variables: For an H100 at FP16/BF16 peak, use 989 TFLOP/s peak compute and 3.35 TB/s memory bandwidth.

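Math: Divide peak FLOP/s by memory bandwidth to get the roofline ridge point: 989 TFLOP/s / 3.35 TB/s $\approx$ 295.2 FLOP/byte. A large square general matrix multiply (GEMM) with $n = 4{,}096$ performs $2n^3$ FLOPs while moving three $n \times n$ FP16 matrices ($6n^2$ bytes), so its intensity is $2n^3 / 6n^2 = n/3 \approx 1{,}365.3$ FLOP/byte.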

Result: Operations above 295.2 FLOP/byte are compute bound; operations below it are memory bound. The GEMM example is compute bound, while a single-token autoregressive decode, which performs about 2 FLOPs per 2-byte FP16 weight it reads (intensity $\approx 1$ FLOP/byte), is deeply memory bound.

Systems insight: The same accelerator can be compute rich and memory constrained. Arithmetic intensity determines which resource the workload actually consumes.
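The roofline check above fits in a few lines of Python. This is a minimal sketch, assuming FP16 operands (2 bytes per element) and a square GEMM whose three matrices each cross HBM exactly once; `ridge_point`, `gemm_intensity`, and `classify` are hypothetical helper names, not part of any book package.

```python
# Roofline napkin math with the H100 FP16/BF16 datasheet peaks (table 4).
PEAK_FLOPS = 989e12   # FLOP/s, dense FP16/BF16 tensor peak
HBM_BW = 3.35e12      # bytes/s


def ridge_point(peak_flops: float, bandwidth: float) -> float:
    """FLOP/byte at which the roofline turns from memory bound to compute bound."""
    return peak_flops / bandwidth


def gemm_intensity(n: int, bytes_per_elem: int = 2) -> float:
    """Square n x n GEMM: 2n^3 FLOPs over three n x n matrices moved once."""
    flops = 2 * n**3
    bytes_moved = 3 * n * n * bytes_per_elem
    return flops / bytes_moved


def classify(intensity: float) -> str:
    """Compare an arithmetic intensity against the H100 ridge point."""
    if intensity > ridge_point(PEAK_FLOPS, HBM_BW):
        return "compute bound"
    return "memory bound"


ridge = ridge_point(PEAK_FLOPS, HBM_BW)
print(f"ridge point: {ridge:.1f} FLOP/byte")
print(f"GEMM n=4096: {gemm_intensity(4096):.1f} FLOP/byte, {classify(gemm_intensity(4096))}")
print(f"decode (~1 FLOP/byte): {classify(1.0)}")
```

Swapping in another accelerator's peak and bandwidth from the tables below changes only the two constants.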

Problem: Estimate the memory needed to train a 7B model.

Variables: Mixed-precision Adaptive Moment Estimation (Adam) stores BF16 weights and gradients plus FP32 master weights, momentum, and variance. The model-state budget is $2 + 2 + 12 = 16$ bytes per parameter.

Math: For 7B parameters, model state is $7 \times 10^9 \times 16$ bytes $= 1.12 \times 10^{11}$ bytes = 112 GB.

Result: An H100 has 85.9 GB (80 GiB) of HBM, so the model state alone exceeds a single accelerator before accounting for activations.

Systems insight: Optimizer state, not weights alone, sets the floor for training memory. Capacity planning that starts from parameter bytes underestimates the real requirement.
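A short Python sketch of the same capacity check, assuming the 16-byte mixed-precision Adam convention and that model state could be sharded perfectly across GPUs with zero overhead (real sharding schemes only approximate this). The helper names are illustrative.

```python
import math

BYTES_PER_PARAM = 2 + 2 + 12   # BF16 weights + BF16 grads + FP32 master/Adam (table 10)
H100_HBM_BYTES = 80 * 2**30    # 80 GiB (table 4), about 85.9 GB in decimal units


def model_state_bytes(params: float) -> float:
    """Weights, gradients, master weights, and Adam moments; activations excluded."""
    return params * BYTES_PER_PARAM


def min_gpus_for_state(params: float, hbm_bytes: int = H100_HBM_BYTES) -> int:
    """Lower bound on GPU count, assuming perfect sharding and no activations."""
    return math.ceil(model_state_bytes(params) / hbm_bytes)


state = model_state_bytes(7e9)
print(f"model state: {state / 1e9:.0f} GB")          # 112 GB
print(f"H100 HBM: {H100_HBM_BYTES / 1e9:.1f} GB")    # 85.9 GB
print(f"minimum H100s for model state: {min_gpus_for_state(7e9)}")  # 2
```

Even this optimistic bound needs two H100s before a single activation is stored.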

Problem: Estimate the electricity cost of a GPT-3-scale training run on the reference cluster.

Variables: Use 1,024 A100s, 400 W per accelerator, ~25 days wall-clock, and $0.12/kWh electricity.

Math: The run requires roughly $3.14 \times 10^{23}$ FLOPs. For the A100-equivalent reference, energy is ~25 days $\times$ 24 h/day $\times$ 1,024 A100s $\times$ 0.4 kW $\approx$ 245,760 kWh; at 0.12 dollars per kWh, the electricity cost is roughly 29,500 dollars.

Result: The electricity estimate is only a small fraction of total training cost, which is dominated by accelerator amortization. The original GPT-3 run used V100-era infrastructure; this is an A100-equivalent reference estimate, not a derivation of the run duration from peak FLOP/s alone.

Systems insight: Energy is measurable from power and time, but full cost accounting must also include capital utilization, networking, storage, staffing, and failed or repeated runs.
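The electricity arithmetic, plus a sanity check the appendix itself does not state: dividing the 3.14e23 training FLOPs by what 1,024 A100s could deliver at their 312 TFLOP/s peak over 25 days gives the utilization this reference duration implies. The sketch assumes every GPU draws its full 400 W TDP for the whole run and ignores host, network, and cooling power.

```python
N_GPUS = 1024
POWER_KW = 0.4           # A100 TDP (table 3)
DAYS = 25                # reference wall-clock (table 9)
PRICE_PER_KWH = 0.12     # dollar/kWh (table 13)
TRAIN_FLOPS = 3.14e23    # GPT-3 training FLOPs (table 9)
A100_PEAK = 312e12       # FP16 tensor FLOP/s (table 3)

hours = DAYS * 24
energy_kwh = N_GPUS * POWER_KW * hours      # GPU power only, at TDP
cost = energy_kwh * PRICE_PER_KWH
print(f"energy: {energy_kwh:,.0f} kWh, electricity: ${cost:,.0f}")

# Sanity check: fraction of peak compute the 25-day duration implies.
peak_work = N_GPUS * A100_PEAK * hours * 3600
print(f"implied utilization: {TRAIN_FLOPS / peak_work:.1%}")
```

The implied utilization lands near 45 percent of peak, inside the 30 to 50 percent range cited in the Key Takeaways, so the 25-day reference duration is roughly consistent with the FLOP budget.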

Accelerator Specifications

These are the accelerator assumptions used in the book’s roofline, training-memory, energy, and cost examples: peak throughput, memory bandwidth, memory capacity, and thermal design power (TDP) for each generation the chapters cite. Values are vendor datasheet peaks (ceilings for napkin math, not sustained utilization) drawn from NVIDIA product documentation and IEEE Micro architecture articles (NVIDIA Corporation 2017, 2018, 2020, 2024; Choquette et al. 2021; Choquette 2023), AMD MI300X documentation (AMD 2023), Google TPU publications (Jouppi et al. 2023; Jouppi et al. 2021), and Google Cloud’s current TPU v6e specification page (Google Cloud 2026). Tables progress from table 1 (Volta) and table 2 (Turing) through current and forward-looking generations.

NVIDIA V100

Table 1: NVIDIA V100 (Volta): Peak specs from the NVIDIA V100 architecture whitepaper (NVIDIA Corporation 2017). The V100 introduced Tensor Cores for mixed-precision training and anchors baseline-generation comparisons.

Assumption	Value	Unit
Peak FP16 tensor throughput (V100)	125	TFLOP/s
Peak FP32 throughput (V100)	15.7	TFLOP/s
HBM bandwidth (V100)	900	GB/s
HBM capacity (V100)	32	GiB
TDP (V100)	300	W

NVIDIA T4

Table 2: NVIDIA T4 (Turing): Peak specs from NVIDIA Tesla T4 product documentation (NVIDIA Corporation 2018). A low-power inference accelerator widely deployed in cloud serving; its 70 W TDP is a canonical cost-per-inference anchor.

Assumption	Value	Unit
Peak FP16 tensor throughput (T4)	65	TFLOP/s
Peak INT8 throughput (T4)	130	TOPS
Memory bandwidth (T4)	320	GB/s
TDP (T4)	70	W

NVIDIA A100

Table 3 lists the Ampere-generation specs that anchor most training examples in the book.

Table 3: NVIDIA A100 (Ampere): Peak specs from the NVIDIA A100 architecture whitepaper and its IEEE Micro companion article (NVIDIA Corporation 2020; Choquette et al. 2021). The A100 is the most commonly cited accelerator in training examples; 80 GB HBM2e and TF32 Tensor Cores set memory-capacity and compute-intensity anchors.

Assumption	Value	Unit
Peak FP16 tensor throughput (A100)	312	TFLOP/s
Peak FP32 throughput (A100)	19.5	TFLOP/s
Peak INT8 throughput (A100)	624	TOPS
Peak TF32 throughput (A100)	156	TFLOP/s
HBM bandwidth (A100)	2039	GB/s
HBM capacity (A100)	80	GiB
TDP (A100)	400	W

NVIDIA H100

Table 4 lists the Hopper-generation specs behind the roofline and training-memory examples in the napkin-math callout above.

Table 4: NVIDIA H100 (Hopper): Peak specs from the Hopper architecture overview (Choquette 2023). FP8 Tensor Cores and the Transformer Engine drive current-generation training and serving estimates.

Assumption	Value	Unit
Peak FP16 tensor throughput (H100)	989	TFLOP/s
Peak FP32 CUDA throughput (H100)	67	TFLOP/s
Peak FP8 tensor throughput (H100)	1979	TFLOP/s
Peak INT8 throughput (H100)	1979	TOPS
Peak TF32 throughput (H100)	494	TFLOP/s
HBM bandwidth (H100)	3.35	TB/s
HBM capacity (H100)	80	GiB
TDP (H100)	700	W

NVIDIA B200

Table 5 provides Blackwell-generation specs used in forward-looking capacity-planning examples.

Table 5: NVIDIA B200 (Blackwell): Peak specs from NVIDIA Blackwell product documentation (NVIDIA Corporation 2024). Forward-looking capacity-planning examples use its HBM3e bandwidth and FP8 throughput.

Assumption	Value	Unit
Peak FP16 tensor throughput (B200)	2250	TFLOP/s
Peak FP8 tensor throughput (B200)	4500	TFLOP/s
Peak FP4 tensor throughput (B200)	9000	TFLOP/s
HBM bandwidth (B200)	8	TB/s
HBM capacity (B200)	192	GiB
TDP (B200)	1000	W

AMD Instinct MI300X

Table 6 provides specifications for the MI300X, often used as the primary alternative to NVIDIA’s H100 in large-scale inference and training clusters.

Table 6: AMD Instinct MI300X: Peak specs from AMD Instinct MI300X product documentation (AMD 2023). High HBM capacity (192 GB) and bandwidth make it a common cross-vendor baseline for memory-bound workloads.

Assumption	Value	Unit
Peak FP16 tensor throughput (MI300X)	1307	TFLOP/s
HBM bandwidth (MI300X)	5.3	TB/s
HBM capacity (MI300X)	192	GiB
TDP (MI300X)	750	W

Google TPU v4 and v6e

Table 7 provides the ASIC-based alternatives used when comparing training economics across accelerator families.

Table 7: Google TPU v4 and v6e (Trillium): TPU v4 values come from Google TPU publications (Jouppi et al. 2023; Jouppi et al. 2021), and TPU v6e values come from Google Cloud’s current Trillium specification page (Google Cloud 2026).

Assumption	Value	Unit
Peak BF16 throughput (TPU v4)	275	TFLOP/s
Memory bandwidth (TPU v4)	1200	GB/s
Peak BF16 throughput (TPU v6e)	918	TFLOP/s
Memory bandwidth (TPU v6e)	1600	GB/s
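Dividing each accelerator's dense peak by its memory bandwidth gives its ridge point. The Python sketch below does this for the rows in tables 1 to 7, assuming FP16 tensor peaks for the GPUs and BF16 peaks for the TPUs; it is napkin math on datasheet ceilings, not a measured roofline.

```python
# (peak dense throughput in TFLOP/s, memory bandwidth in GB/s) from tables 1 to 7.
ACCELERATORS = {
    "V100 (FP16)": (125, 900),
    "T4 (FP16)": (65, 320),
    "A100 (FP16)": (312, 2039),
    "H100 (FP16)": (989, 3350),
    "B200 (FP16)": (2250, 8000),
    "MI300X (FP16)": (1307, 5300),
    "TPU v4 (BF16)": (275, 1200),
    "TPU v6e (BF16)": (918, 1600),
}


def ridge(tflops: float, gb_per_s: float) -> float:
    """Ridge point in FLOP/byte; converts TFLOP/s and GB/s to base units first."""
    return tflops * 1e12 / (gb_per_s * 1e9)


# Print accelerators from lowest to highest ridge point.
for name, (tflops, bw) in sorted(ACCELERATORS.items(), key=lambda kv: ridge(*kv[1])):
    print(f"{name:16s} {ridge(tflops, bw):7.1f} FLOP/byte")
```

On these numbers the ridge point roughly doubles from V100 (about 139 FLOP/byte) to H100 (about 295 FLOP/byte), so a kernel that was compute bound on an older part can become memory bound on a newer one.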

CPU and Mobile/Edge Processors

Table 8 grounds the edge and mobile ML examples with illustrative CPU, phone, and edge-device reference points.

Table 8: CPU, Mobile NPU, and Edge Device Specs: Illustrative reference points for edge and mobile examples (not vendor peaks for a single SKU). The contrast with data-center accelerators shows why deployment target shapes every design decision.

Assumption	Value	Unit
Peak FP32 throughput (reference CPU)	1	TFLOP/s
DRAM bandwidth (reference server)	50	GB/s
Peak INT8 throughput (iPhone 15 Pro NPU)	35	TOPS
Memory bandwidth (iPhone 15 Pro)	100	GB/s
TDP (mobile device, reference)	5	W
Power (edge object detector, reference)	2	W
Battery capacity (phone, reference)	15	Wh
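Edge numbers are easiest to reason about as battery time, energy per frame, and an ops-per-byte ridge. A minimal sketch, assuming the reference detector draws a constant 2 W, runs at the 30 Hz standard frame rate from table 14, and has the whole 15 Wh battery to itself (screen, radio, and idle power are ignored).

```python
BATTERY_WH = 15        # phone battery (table 8)
DETECTOR_W = 2         # edge object detector power (table 8)
FPS = 30               # standard video frame rate (table 14)
NPU_OPS = 35e12        # iPhone 15 Pro NPU INT8 ops/s (table 8)
PHONE_BW = 100e9       # phone memory bandwidth, bytes/s (table 8)

hours_on_battery = BATTERY_WH / DETECTOR_W   # Wh / W = hours
joules_per_frame = DETECTOR_W / FPS          # W / (frames/s) = J per frame
npu_ridge = NPU_OPS / PHONE_BW               # INT8 ops per byte

print(f"continuous detection: {hours_on_battery:.1f} h per charge")
print(f"energy per frame: {joules_per_frame * 1e3:.1f} mJ")
print(f"NPU ridge point: {npu_ridge:.0f} ops/byte")
```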

Model Specifications

These are the reference model assumptions behind training-cost, memory-footprint, and inference-workload examples: parameter counts, per-inference FLOP budgets, and published training-scale anchors. Published model sizes follow primary papers, model reports, and official model documentation (Devlin et al. 2019; Radford et al. 2019; Brown et al. 2020; Touvron et al. 2023; Dubey et al. 2024; He et al. 2016; Sandler et al. 2018; Ultralytics 2023). GPT-3 training FLOPs and duration follow (Brown et al. 2020). The GPT-4 parameter count and training GPU-days are public third-party MoE estimates (SemiAnalysis 2023) because the GPT-4 technical report does not disclose architecture size. When a chapter estimates GPT-3-scale training time or the memory footprint of a specific model, it takes the parameter counts and FLOP budgets from table 9.

Table 9: Reference Model Specifications: Parameter counts and FLOP budgets for worked examples. BERT, GPT-2, GPT-3, Llama, ResNet-50, MobileNetV2, and YOLOv8 rows use primary papers, model reports, or official model documentation (Devlin et al. 2019; Radford et al. 2019; Brown et al. 2020; Touvron et al. 2023; Dubey et al. 2024; He et al. 2016; Sandler et al. 2018; Ultralytics 2023). GPT-4 rows cite (OpenAI et al. 2023) for the official model family and (SemiAnalysis 2023) for the public MoE parameter and GPU-day estimates used in this edition.

Assumption	Value	Unit
Inference FLOPs (BERT-Base)	2.2e+10	flop
Parameters (BERT-Base)	1.1e+08	param
Parameters (Llama 3 8B)	8.03e+09	param
Hidden dimension (GPT-2)	1600	-
Layers (GPT-2)	48	-
Parameters (GPT-2)	1.5e+09	param
Parameters (GPT-3)	1.75e+11	param
Reference training duration (GPT-3)	25	d
Reference training FLOPs (GPT-3)	3.14e+23	flop
Parameters (GPT-4, public MoE estimate)	1.76e+12	param
Reference training GPU-days (GPT-4)	2.5e+06	GPU-day
Inference FLOPs (ResNet-50)	4.1e+09	flop
Parameters (ResNet-50)	2.56e+07	param
Inference FLOPs (MobileNetV2)	3e+08	flop
Parameters (MobileNetV2)	3.50487e+06	param
Inference FLOPs (YOLOv8-Nano)	8.7e+09	flop
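Two quick derivations from table 9: BF16 weight footprints (2 bytes per parameter) and a cross-check of the GPT-3 FLOP budget. The cross-check uses the common $C \approx 6ND$ training-compute approximation, which is an assumption brought in here rather than a row in this appendix.

```python
PARAMS = {              # parameter counts from table 9
    "BERT-Base": 1.1e8,
    "GPT-2": 1.5e9,
    "Llama 3 8B": 8.03e9,
    "GPT-3": 1.75e11,
}
BF16_BYTES = 2

for name, n in PARAMS.items():
    print(f"{name:10s} BF16 weights: {n * BF16_BYTES / 1e9:8.2f} GB")

# Cross-check: C ~= 6 * N * D, so D ~= C / (6 * N) training tokens.
gpt3_tokens = 3.14e23 / (6 * PARAMS["GPT-3"])
print(f"implied GPT-3 training tokens: {gpt3_tokens:.2e}")
```

The implied token count, about $3 \times 10^{11}$, matches the roughly 300 billion training tokens reported for GPT-3 (Brown et al. 2020).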

Training Memory Conventions

Training-memory napkin math assumes the mixed-precision Adam storage model used in the napkin-math callout and several training chapters: BF16 weights, BF16 gradients, and FP32 master weights plus Adam first- and second-moment buffers (12 bytes per parameter). This is a book convention aligned with common mixed-precision Adam training layouts (Kingma and Ba 2014; Micikevicius et al. 2017; NVIDIA 2017), not a measured hardware constant. Table 10 lists per-component byte widths; multiplying the 16-byte total (2 + 2 + 12) by the parameter count gives the model-state footprint before activations.

Table 10: Training Memory Conventions: Per-parameter storage for mixed-precision Adam (2 + 2 + 12 = 16 bytes before activations). Book convention for napkin math; see (NVIDIA 2017) for mixed-precision training context.

Assumption	Value	Unit
Weight/gradient width (BF16)	2	bytes
Master weight and optimizer state width (FP32)	4	bytes
Adam state per parameter (momentum + variance, FP32)	8	bytes
FP32 master weight + Adam state per parameter	12	bytes
Total bytes per parameter (mixed-precision Adam)	16	bytes

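Table 10's per-component widths can be expanded into a breakdown for a concrete model. This Python sketch applies them to Llama 3 8B (8.03e9 parameters from table 9), assuming nothing is sharded, offloaded, or recomputed; activations are excluded.

```python
# Per-parameter byte widths from table 10 (book convention, not a hardware constant).
COMPONENTS = {
    "BF16 weights": 2,
    "BF16 gradients": 2,
    "FP32 master weights": 4,
    "FP32 Adam momentum": 4,
    "FP32 Adam variance": 4,
}


def model_state_breakdown(params: float) -> dict[str, float]:
    """Bytes per component before activations, with no sharding or offload."""
    return {name: width * params for name, width in COMPONENTS.items()}


breakdown = model_state_breakdown(8.03e9)   # Llama 3 8B (table 9)
for name, nbytes in breakdown.items():
    print(f"{name:22s} {nbytes / 1e9:7.2f} GB")
print(f"{'total':22s} {sum(breakdown.values()) / 1e9:7.2f} GB")
```

The FP32 master weights and Adam moments account for three quarters of the total, which is why optimizer-state sharding is usually the first memory optimization applied.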
Hardware and model assumptions fix what runs where; energy assumptions fix whether the design is thermally and economically viable at the operation and memory-access level.

Energy Constants

These energy assumptions, drawn primarily from Horowitz’s 45 nm table (Horowitz 2014), underpin the book’s efficiency and sustainability discussions. Table 11 lists the hierarchy from register through DRAM, which explains why data reuse dominates kernel design.

Table 11: Energy per Operation and Access: Energy hierarchy from register through DRAM (Horowitz 2014). The roughly 64,000× gap between a 0.01 pJ register access and a 640 pJ DRAM access explains why data reuse dominates ML kernel optimization.

Assumption	Value	Unit
Register access energy	0.01	pJ
L1 SRAM access energy	0.5	pJ
L2 SRAM access energy	2	pJ
DRAM access energy (per access)	640	pJ
DRAM access energy (per byte)	160	pJ/byte
FP16 FLOP energy	1.1	pJ/FLOP
FP32 FLOP energy	3.7	pJ/FLOP
INT8 multiply-add energy	0.2	pJ/MAC
MobileNetV2 inference energy (reference)	0.1	mJ
5G transfer energy per MB	100	mJ/MB
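Two ratios fall straight out of table 11. A minimal sketch: the first divides DRAM energy per byte by FP16 energy per FLOP to get the arithmetic intensity at which compute energy catches up with data-movement energy; the second compares uploading one raw, uncompressed 1080p RGB frame over 5G (frame size from table 14) with running MobileNetV2 locally. Real pipelines compress frames, so the upload figure is an upper bound.

```python
# Energy constants from table 11.
DRAM_PJ_PER_BYTE = 160.0
FP16_PJ_PER_FLOP = 1.1
MOBILENET_INFER_MJ = 0.1     # reference on-device inference energy
G5_MJ_PER_MB = 100.0         # 5G transfer energy

# Intensity (FLOP/byte) at which FP16 arithmetic energy equals DRAM traffic energy.
energy_breakeven = DRAM_PJ_PER_BYTE / FP16_PJ_PER_FLOP
print(f"energy break-even: {energy_breakeven:.0f} FLOP/byte")

# Raw 1080p RGB frame (table 14), in decimal MB (table 15).
frame_mb = 1920 * 1080 * 3 / 1e6
upload_mj = frame_mb * G5_MJ_PER_MB
print(f"5G upload of one raw frame: {upload_mj:.0f} mJ")
print(f"local MobileNetV2 inference: {MOBILENET_INFER_MJ} mJ")
print(f"ratio: {upload_mj / MOBILENET_INFER_MJ:,.0f}x")
```

On these constants, a kernel needs roughly 145 FP16 FLOPs per DRAM byte before arithmetic, rather than data movement, dominates its energy, and uploading a raw frame costs thousands of times more energy than classifying it on device.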

The constants above are chip-level costs, but real ML systems also move data across interconnects: between accelerators, across racks, and over wide-area networks. The next section lists the bandwidth assumptions chapters use to estimate communication overhead.

Interconnect and Network Bandwidth

These bandwidth assumptions apply when chapters reason about gradient synchronization, pipeline bubbles, checkpoint I/O, or cross–data center latency. NVLink and PCIe rates follow accelerator product documentation (NVIDIA Corporation 2017, 2020; Choquette 2023); the InfiniBand architecture specification anchors the protocol family (InfiniBand Trade Association 2000), while current high-speed product families and Ethernet roadmaps anchor modern link-rate examples (NVIDIA 2026; Ethernet Alliance 2025); the speed-of-light-in-fiber floor follows from physics, since light in glass fiber travels at roughly two-thirds of its vacuum speed. Table 12 lists NVLink, InfiniBand, PCIe, NVMe, Ethernet, and WAN latency anchors. Note the mixed units: NVLink, PCIe, and NVMe rows are in bytes per second (GB/s), while InfiniBand and Ethernet rows are in bits per second (Gb/s), so divide the latter by 8 before comparing.

Table 12: Interconnect and Network Bandwidth: Link-family bandwidths use vendor specs, the InfiniBand architecture specification, and current product/roadmap documentation (InfiniBand Trade Association 2000; NVIDIA 2026; Ethernet Alliance 2025). Speed of light in fiber sets the cross–data-center latency floor.

Assumption	Value	Unit
NVLink bandwidth (V100)	300	GB/s
NVLink bandwidth (A100)	600	GB/s
NVLink bandwidth (H100)	900	GB/s
InfiniBand HDR link bandwidth	200	Gb/s
InfiniBand NDR link bandwidth	400	Gb/s
InfiniBand XDR link bandwidth	800	Gb/s
InfiniBand GXDR link bandwidth	1600	Gb/s
PCIe Gen4 bandwidth (A100 host)	32	GB/s
PCIe Gen5 bandwidth (H100 host)	64	GB/s
NVMe sequential read bandwidth	7	GB/s
10 GbE bandwidth	10	Gb/s
100 GbE bandwidth	100	Gb/s
Speed of light in fiber	200000	km/s
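The bandwidth rows turn into communication-time estimates once a collective is chosen. This sketch assumes a ring all-reduce, whose bandwidth term moves $2(N-1)/N$ of the payload through each worker, applied to BF16 gradients of a 7B model. It treats each table value as the usable per-GPU bandwidth, which is optimistic because the NVLink figure aggregates both directions, and it ignores per-step latency. The 4,000 km distance is a hypothetical example.

```python
def gbit_to_gbyte(gbps: float) -> float:
    """Convert a link rate in Gb/s to GB/s (8 bits per byte)."""
    return gbps / 8


def ring_allreduce_seconds(payload_bytes: float, n_workers: int, link_bytes_per_s: float) -> float:
    """Bandwidth term of a ring all-reduce: 2(N-1)/N of the payload per worker."""
    return 2 * (n_workers - 1) / n_workers * payload_bytes / link_bytes_per_s


grads = 7e9 * 2                      # 7B parameters, BF16 gradients (bytes)
nvlink_h100 = 900e9                  # bytes/s (table 12), treated as usable per GPU
ib_ndr = gbit_to_gbyte(400) * 1e9    # 400 Gb/s -> 50 GB/s

print(f"8 GPUs over NVLink: {ring_allreduce_seconds(grads, 8, nvlink_h100) * 1e3:.0f} ms")
print(f"8 nodes over IB NDR: {ring_allreduce_seconds(grads, 8, ib_ndr):.2f} s")

FIBER_KM_PER_S = 200_000             # table 12
print(f"one-way latency floor over 4,000 km: {4000 / FIBER_KM_PER_S * 1e3:.0f} ms")
```

The same gradient exchange takes roughly 18 times longer across InfiniBand NDR than inside an NVLink domain on these numbers, which is why chapters place the most communication-heavy parallelism within a node.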

Economic Constants

Table 13 lists the pricing assumptions behind TCO and energy-cost napkin math. They are illustrative hyperscaler-order rates for ratio analysis (similar in spirit to carbon accounting examples in (Patterson et al. 2021)), not quotes for a specific region or contract. Substitute your own values when absolute price dominates.

Table 13: Economic Assumptions: Illustrative cloud electricity and egress rates for TCO napkin math (2024–2025 order of magnitude). On-premise cost structures differ; relative magnitudes guide design trade-offs.

Assumption	Value	Unit
Cloud electricity price	0.12	dollar/kWh
Cloud egress price per GB	0.09	dollar/GB
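Table 13's two prices can be compared directly. A minimal sketch, assuming one H100 draws its 700 W TDP continuously for a year and that 1 PB is moved out of the cloud at the listed egress rate; both scenarios are illustrative, like the rates themselves.

```python
ELECTRICITY = 0.12   # dollar/kWh (table 13)
EGRESS = 0.09        # dollar/GB (table 13)


def electricity_cost(power_w: float, hours: float, price: float = ELECTRICITY) -> float:
    """Dollars to run a constant load of power_w watts for the given hours."""
    return power_w / 1000 * hours * price


def egress_cost(gigabytes: float, price: float = EGRESS) -> float:
    """Dollars to move data out of the cloud at a flat per-GB rate."""
    return gigabytes * price


h100_year = electricity_cost(700, 24 * 365)   # one H100 at TDP for a year
dataset_out = egress_cost(1e6)                 # 1 PB = 1e6 GB (decimal)
print(f"H100 at TDP for one year: ${h100_year:,.0f}")
print(f"egress of 1 PB: ${dataset_out:,.0f}")
```

At these rates, egressing a single petabyte costs more than a hundred H100-years of electricity, which is one reason data placement appears in TCO analysis alongside accelerator choice.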

Economic constants set the price per unit of compute and data transfer, but they mean little without a sense of the volumes involved. Production ML systems handle millions to billions of requests per day—numbers large enough to be difficult to internalize without concrete reference points.

Scale References

These scale assumptions anchor “how big is big?” in capacity-planning examples (table 14): order-of-magnitude public disclosures (email/search volume, autonomous-driving sensor rates) plus standard 1080p and 4K video parameters. They are magnitude anchors, not audited statistics for a specific year.

Table 14: Production Scale and Data Rate References: Order-of-magnitude public anchors for capacity planning (traffic volumes, sensor data rates, 1080p and 4K video parameters).

Assumption	Value	Unit
Gmail emails per day	1.21e+11	-
Google searches per day	8.5e+09	-
Waymo sensor data rate (low)	1	TB/h
Waymo sensor data rate (high)	19	TB/h
1080p frame width	1920	-
1080p frame height	1080	-
4K frame width	3840	-
4K frame height	2160	-
Bytes per RGB pixel	3	bytes
Video frame rate (standard)	30	Hz
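The video rows in table 14 translate into raw data rates. This sketch assumes uncompressed 8-bit RGB frames (3 bytes per pixel) at 30 Hz; real cameras and pipelines compress, so these are upper bounds per stream.

```python
BYTES_PER_PIXEL = 3   # table 14
FPS = 30              # table 14


def raw_video_bytes_per_s(width: int, height: int, fps: int = FPS) -> int:
    """Uncompressed RGB video bandwidth in bytes per second."""
    return width * height * BYTES_PER_PIXEL * fps


for name, (w, h) in {"1080p": (1920, 1080), "4K": (3840, 2160)}.items():
    rate = raw_video_bytes_per_s(w, h)
    print(f"{name}: {rate / 1e6:.1f} MB/s, {rate * 3600 / 1e12:.2f} TB/h")
```

One uncompressed 4K stream comes to roughly 2.7 TB/h, already above the 1 TB/h low end of the Waymo sensor-rate range.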

The assumptions above use the unit conventions in table 15—decimal data prefixes (KB $= 10^3$ bytes), separate FLOPs (work) and FLOP/s (throughput), and the aliases below.

Unit Conventions

Table 15 fixes the unit conventions used in every quantitative example in this volume. Each row gives the multiplier $k$ in $1\ \text{alias} = k\ \text{base unit}$. Data prefixes use decimal SI ($\mathrm{KB} = 10^3$ bytes, not 1024). Binary IEC prefixes ($\mathrm{GiB} = 2^{30}$ bytes) appear in the accelerator HBM-capacity rows and in a few storage-specific discussions but are omitted here because most fleet-scale estimates in the book use decimal KB/GB/TB; for example, 80 GiB $\approx$ 85.9 GB. Throughput quantities (FLOP/s, GB/s) combine these aliases with time; adding incompatible dimensions (bytes to FLOP/s) is a category error in napkin math, not a unit conversion.

Table 15: Unit conventions: Decimal SI aliases and scale factors (book convention for napkin math). Throughput forms such as FLOP/s and GB/s divide work or data by time using the same conventions.

Alias	Multiplier	Base unit
`byte`	1	byte
`KB`	1000	byte
`MB`	1e+06	byte
`GB`	1e+09	byte
`TB`	1e+12	byte
`PB`	1e+15	byte
`flop`	1	flop
`GFLOPs`	1e+09	flop
`TFLOPs`	1e+12	flop
`ZFLOPs`	1e+21	flop
`param`	1	param
`Mparam`	1e+06	param
`Gbps`	1e+09	bit/s
`NS`	1e-09	second
`US`	1e-06	second
`MS`	1e-03	second
`second`	1	second
`hour`	3600	second
`day`	86400	second
`joule`	1	joule
`watt`	1	watt
`meter`	1	meter
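The alias table maps naturally onto a small lookup. This is a hypothetical Python helper, not the mlsysim API: it stores a subset of the aliases in table 15 as a multiplier and base unit, converts only between aliases that share a base unit, and raises an error on the category errors described above (such as bytes to FLOPs).

```python
# Hypothetical helper (not the mlsysim API): a subset of table 15's aliases,
# each stored as (multiplier, base unit).
UNITS = {
    "byte": (1, "byte"), "KB": (1e3, "byte"), "MB": (1e6, "byte"),
    "GB": (1e9, "byte"), "TB": (1e12, "byte"), "PB": (1e15, "byte"),
    "flop": (1, "flop"), "GFLOPs": (1e9, "flop"), "TFLOPs": (1e12, "flop"),
    "ZFLOPs": (1e21, "flop"),
    "NS": (1e-9, "second"), "US": (1e-6, "second"), "MS": (1e-3, "second"),
    "second": (1, "second"), "hour": (3600, "second"), "day": (86400, "second"),
}


def convert(value: float, src: str, dst: str) -> float:
    """Convert between aliases that share a base unit; reject category errors."""
    m_src, base_src = UNITS[src]
    m_dst, base_dst = UNITS[dst]
    if base_src != base_dst:
        raise ValueError(f"cannot convert {base_src} to {base_dst}")
    return value * m_src / m_dst


print(convert(80, "GB", "MB"))        # 80000.0
print(convert(25, "day", "hour"))     # 600.0
try:
    convert(1, "GB", "TFLOPs")
except ValueError as err:
    print(err)                         # cannot convert byte to flop
```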

With all constants, units, and scale references in place, the next section catalogs the provenance of each assumption so readers can trace the numbers back to their primary sources.

Assumption Provenance

The catalog below summarizes where each section’s numbers come from. The book cites sources with Quarto @citekey references in captions and in this table; the mlsysim package instead attaches structured Provenance metadata to its Sourced registry scalars for audits and labs and carries no BibTeX keys. See table 16 for the section-to-reference map.

Table 16: Assumption provenance catalog: Quick map from appendix section to source class and bibliography.

Appendix section	Source type	Primary references
Accelerator specifications	Vendor datasheet peaks (2026-Q1)	(NVIDIA Corporation 2017, 2018, 2020, 2024; Choquette et al. 2021; Choquette 2023; AMD 2023; Jouppi et al. 2023; Jouppi et al. 2021; Google Cloud 2026)
Model specifications	Published papers, model reports, official docs; GPT-4 size from public analysis	(Devlin et al. 2019; Radford et al. 2019; Brown et al. 2020; Touvron et al. 2023; Dubey et al. 2024; He et al. 2016; Sandler et al. 2018; Ultralytics 2023; OpenAI et al. 2023; SemiAnalysis 2023)
Training memory conventions	Book convention (mixed-precision Adam layout)	(Kingma and Ba 2014; Micikevicius et al. 2017; NVIDIA 2017)
Energy constants	Published 45 nm energy table	(Horowitz 2014)
Interconnect bandwidth	Vendor specs; InfiniBand standard; physics	(NVIDIA Corporation 2017, 2020; Choquette 2023; InfiniBand Trade Association 2000; NVIDIA 2026; Ethernet Alliance 2025)
Economic assumptions	Illustrative cloud/utility rates	(Patterson et al. 2021) (methodology context)
Scale references	Order-of-magnitude public anchors	Editorial magnitude anchors
Unit conventions	Decimal SI; book notation	Editorial

NVIDIA Corporation. 2017. NVIDIA Tesla V100 GPU Architecture. NVIDIA Whitepaper.

NVIDIA Corporation. 2018. NVIDIA Tesla T4 Tensor Core GPU. NVIDIA product documentation.

NVIDIA Corporation. 2020. NVIDIA A100 Tensor Core GPU Architecture. NVIDIA Whitepaper, V1.0.

NVIDIA Corporation. 2024. NVIDIA Blackwell Architecture. NVIDIA product documentation.

Choquette, Jack, Wishwesh Gandhi, Olivier Giroux, Nick Stam, and Ronny Krashinsky. 2021. “NVIDIA A100 Tensor Core GPU: Performance and Innovation.” IEEE Micro 41 (2): 29–35. https://doi.org/10.1109/mm.2021.3061394.

Choquette, Jack. 2023. “NVIDIA Hopper H100 GPU: Scaling Performance.” IEEE Micro 43 (3): 9–17. https://doi.org/10.1109/mm.2023.3256796.

AMD. 2023. AMD Instinct MI300X Accelerators. AMD product documentation.

Jouppi, Norm, George Kurian, Sheng Li, Peter Ma, Rahul Nagarajan, Lifeng Nai, Nishant Patil, et al. 2023. “TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for Embeddings.” Proceedings of the 50th Annual International Symposium on Computer Architecture, 1–14. https://doi.org/10.1145/3579371.3589350.

Jouppi, Norman P., Doe Hyun Yoon, Matthew Ashcraft, Mark Gottscho, Thomas B. Jablin, George Kurian, James Laudon, et al. 2021. “Ten Lessons from Three Generations Shaped Google’s TPUv4i: Industrial Product.” 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA) 64: 1–14. https://doi.org/10.1109/isca52012.2021.00010.

Google Cloud. 2026. TPU v6e.

Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. “BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding.” Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 4171–86. https://doi.org/10.18653/v1/n19-1423.

Radford, Alec, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language Models Are Unsupervised Multitask Learners. OpenAI.

Brown, Tom B., Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, et al. 2020. “Language Models Are Few-Shot Learners.” Advances in Neural Information Processing Systems 33: 1877–901. https://doi.org/10.48550/arxiv.2005.14165.

Touvron, Hugo, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, et al. 2023. “LLaMA: Open and Efficient Foundation Language Models.” arXiv Preprint arXiv:2302.13971.

Dubey, Abhimanyu, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, et al. 2024. “The Llama 3 Herd of Models.” arXiv Preprint arXiv:2407.21783.

He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. “Deep Residual Learning for Image Recognition.” 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 770–78. https://doi.org/10.1109/cvpr.2016.90.

Sandler, Mark, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. 2018. “MobileNetV2: Inverted Residuals and Linear Bottlenecks.” 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 4510–20. https://doi.org/10.1109/cvpr.2018.00474.

Ultralytics. 2023. YOLOv8.

OpenAI, Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, et al. 2023. “GPT-4 Technical Report.” arXiv Preprint arXiv:2303.08774. https://doi.org/10.48550/arXiv.2303.08774.

SemiAnalysis. 2023. GPT-4 Architecture, Infrastructure, Training Dataset, Costs, Vision, MoE. SemiAnalysis Blog.

Kingma, Diederik P., and Jimmy Ba. 2014. “Adam: A Method for Stochastic Optimization.” arXiv Preprint arXiv:1412.6980.

Micikevicius, Paulius, Sharan Narang, Jonah Alben, Gregory Diamos, Erich Elsen, David Garcia, Boris Ginsburg, et al. 2017. “Mixed Precision Training.” arXiv Preprint arXiv:1710.03740.

NVIDIA. 2017. Training with Mixed Precision.

Horowitz, Mark. 2014. “1.1 Computing’s Energy Problem (and What We Can Do about It).” 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 10–14. https://doi.org/10.1109/isscc.2014.6757323.

InfiniBand Trade Association. 2000. InfiniBand Architecture Specification Volume 1. InfiniBand Trade Association.

NVIDIA. 2026. NVIDIA Quantum-X800 InfiniBand Platform. NVIDIA product documentation.

Ethernet Alliance. 2025. 2025 Ethernet Roadmap. Ethernet Alliance roadmap.

Patterson, David, Joseph Gonzalez, Quoc Le, Chen Liang, Lluis-Miquel Munguia, Daniel Rothchild, David So, Maud Texier, and Jeff Dean. 2021. “Carbon Emissions and Large Neural Network Training.” arXiv Preprint arXiv:2104.10350.

Summary

This appendix is the book’s shared assumption sheet: every napkin-math estimate in the volume should be traceable to a row in the tables above and, where it matters, to the Assumption Provenance section (table 16).

Key Takeaways: Using the assumption tables

One place for all shared numbers: Hardware peaks, model sizes, link bandwidths, energy per access, cloud prices, and scale anchors used in worked examples are listed here so you can audit or replace them without re-deriving from prose.
Hardware rows are ceilings: Peak FLOP/s, HBM bandwidth, and TDP are datasheet maxima. Real training often reaches 30 to 50 percent of peak compute utilization; using peak values without discounting yields optimistic timelines.
Ratios often outlast absolutes: Ridge point (FLOP/s ÷ bandwidth), mixed-precision Adam bytes per parameter (16), and the register-to-DRAM energy gap are more portable across generations than any single SKU’s peak TFLOP/s.
Substitute your own assumptions: When your deployment differs—different GPU, region, or utilization—swap the Value column and rerun the same formulas the chapters use.
Check provenance before debating a number: Datasheet peaks, published studies, illustrative rates, and book conventions are sourced differently; table 16 and section captions say which is which.