Development preview: built from dev@c07b37ec, 2026-06-24 12:10 EDT.
Cited Scalars for MFU, Chinchilla, Batch Size, and Communication
The Literature Zoo holds published anchors cited in the textbook appendices — MFU bands, Chinchilla ratios, critical-batch-size anchors, and communication overheads. Every entry is a Sourced scalar with structured Provenance (see Provenance).
Important: Not hardware specs
Do not confuse Literature.Training.MfuHigh with a GPU datasheet field. Literature entries are cited teaching assumptions; silicon numbers live in Hardware. Operational run-overhead profiles live in Ops, and scenario scale-efficiency profiles live in Scenarios. Provenance records where a value came from; the registry path records which category it belongs to.
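As a rough illustration of what "a Sourced scalar with structured Provenance" could look like, the sketch below models one entry with two frozen dataclasses. The class and field names (`Provenance`, `SourcedScalar`, `citation`, `note`, `registry_path`) are hypothetical and are not taken from the MLSys·im API. Pairing `Literature.Training.MfuHigh` with the 0.5 upper-bound value is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provenance:
    """Where a number came from (hypothetical fields, not the library's schema)."""
    citation: str
    note: str = ""


@dataclass(frozen=True)
class SourcedScalar:
    """A value plus its source; the registry path carries the category."""
    registry_path: str
    value: float
    provenance: Provenance

    @property
    def category(self) -> str:
        # The first path component names the registry (Literature, Hardware, ...).
        return self.registry_path.split(".")[0]


# Assumption: this path corresponds to the 0.5 training upper-bound entry.
mfu_high = SourcedScalar(
    registry_path="Literature.Training.MfuHigh",
    value=0.5,
    provenance=Provenance(citation="<citation placeholder>"),
)

print(mfu_high.category)  # Literature
```

Keeping the category in the path rather than in the provenance record mirrors the distinction drawn above: two values can share a source yet belong to different registries.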
Training MFU bands
| Entry | Value | Description |
|---|---|---|
| MFU Training (Upper Bound) | 0.5 | Upper bound MFU for excellent large-model training runs. |
| MFU Inference (Batch 1) | 0.05 | MFU for single-request inference, heavily memory-bandwidth-bound. |
| MFU Inference (Batched) | 0.4 | Illustrative MFU upper bound for large-batch inference. |
| MFU Training (Lower Bound) | 0.3 | Lower bound MFU for well-optimized large-model training. |
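MFU (model FLOPs utilization) is the fraction of an accelerator's peak FLOP/s that goes into useful model computation, so the training bands above translate directly into a sustained-throughput range. The following is a minimal sketch; `PEAK_FLOPS` is a hypothetical round number chosen for illustration, not a datasheet value, and `sustained_flops` is an illustrative helper.

```python
# Training MFU bands from the entries above.
MFU_TRAIN_LOW = 0.3
MFU_TRAIN_HIGH = 0.5

# Assumption: a hypothetical accelerator with a round-number peak of 1e15 FLOP/s.
PEAK_FLOPS = 1.0e15


def sustained_flops(peak_flops: float, mfu: float) -> float:
    """Useful model FLOP/s delivered at a given MFU."""
    if not 0.0 < mfu <= 1.0:
        raise ValueError("MFU must be in (0, 1]")
    return peak_flops * mfu


low = sustained_flops(PEAK_FLOPS, MFU_TRAIN_LOW)
high = sustained_flops(PEAK_FLOPS, MFU_TRAIN_HIGH)
print(f"{low:.1e} to {high:.1e} FLOP/s")  # 3.0e+14 to 5.0e+14 FLOP/s
```

Substituting the batch-1 inference value (0.05) into the same helper shows how little of the peak a memory-bandwidth-bound workload can use.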
Benchmark anchors
| Entry | Value | Description |
|---|---|---|
| Llama 3 8B H100 ITL Lower Bound | 3.0 | Lower edge of the H100 Llama-family decode latency sanity envelope. |
| Llama 3 8B H100 ITL Upper Bound | 10.0 | Upper edge of the H100 Llama-family decode latency sanity envelope. |
| ResNet-50 A100 Training Throughput | 3200.0 | Single-accelerator ResNet-50/A100 throughput anchor for empirical sanity checks. |
| ResNet-50 H100 Training Throughput | 5000.0 | Single-accelerator ResNet-50/H100 throughput anchor for empirical sanity checks. |
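One way to use these anchors is as a sanity check on a model's output: test whether a predicted latency falls inside the envelope, or how far a predicted throughput sits from the anchor. The sketch below assumes the ITL values are milliseconds per token and the throughputs are images per second; this page does not state units, so treat both as assumptions. The modeled values are hypothetical inputs.

```python
# Anchors copied from the entries above. Units are not stated on this page:
# milliseconds per token and images per second are assumptions.
ITL_LOW_MS, ITL_HIGH_MS = 3.0, 10.0
RESNET50_A100_THROUGHPUT = 3200.0
RESNET50_H100_THROUGHPUT = 5000.0


def within_envelope(measured: float, low: float, high: float) -> bool:
    """True if a value lies inside the closed [low, high] sanity envelope."""
    return low <= measured <= high


def relative_gap(modeled: float, anchor: float) -> float:
    """Signed fractional deviation of a modeled value from its anchor."""
    return (modeled - anchor) / anchor


# Hypothetical model outputs, chosen only to exercise the checks.
modeled_itl_ms = 6.5
modeled_a100_throughput = 2880.0

print(within_envelope(modeled_itl_ms, ITL_LOW_MS, ITL_HIGH_MS))  # True
print(f"{relative_gap(modeled_a100_throughput, RESNET50_A100_THROUGHPUT):+.0%}")  # -10%
```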
Critical batch size anchors
| Entry | Value | Description |
|---|---|---|
| BERT critical batch size | 256.0 | Rounded critical batch-size anchor for BERT-scale training examples. |
| Default critical batch size | 1024.0 | Generic rounded critical batch-size anchor for first-pass training examples. |
| GPT-3 critical batch size | 4096.0 | Rounded critical batch-size anchor for GPT-3-scale training examples. |
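To see why a critical batch size matters, it can help to plug these anchors into the idealized trade-off from the large-batch training literature, where steps scale as 1 + B_crit/B and data scales as 1 + B/B_crit relative to their minimums. This page does not say which model the book uses, so the formulas are a swapped-in illustration, and the helper names are hypothetical.

```python
# Critical batch-size anchors from the entries above.
B_CRIT = {"bert": 256.0, "default": 1024.0, "gpt3": 4096.0}


def step_overhead(batch_size: float, b_crit: float) -> float:
    """Optimizer steps needed, as a multiple of the large-batch minimum."""
    return 1.0 + b_crit / batch_size


def data_overhead(batch_size: float, b_crit: float) -> float:
    """Training examples needed, as a multiple of the small-batch minimum."""
    return 1.0 + batch_size / b_crit


for batch_size in (512.0, 1024.0, 2048.0):
    steps = step_overhead(batch_size, B_CRIT["default"])
    data = data_overhead(batch_size, B_CRIT["default"])
    print(f"B={batch_size:.0f}: {steps:.1f}x steps, {data:.1f}x data")
# B=512: 3.0x steps, 1.5x data
# B=1024: 2.0x steps, 2.0x data
# B=2048: 1.5x steps, 3.0x data
```

Under this model the critical batch size is the balance point: at B = B_crit both overheads equal 2, and pushing the batch larger buys fewer steps only at a growing cost in data.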
Chinchilla anchors
| Entry | Value | Description |
|---|---|---|
| Training Compute Constant (C ≈ 6PD) | 6.0 | Training FLOPs multiplier (6PD): 2 forward + 4 backward FLOPs per parameter per token. |
| Decode Compute Constant (2P) | 2.0 | Autoregressive decode FLOPs multiplier (2P): 2 forward FLOPs per parameter per token. |
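The two constants above can be applied directly: total training compute is roughly 6 × P × D for P parameters and D training tokens, and each decoded token costs roughly 2 × P FLOPs. A minimal sketch follows; the parameter count and token count are hypothetical round numbers, and the function names are illustrative.

```python
# Compute constants from the entries above.
TRAIN_FLOPS_PER_PARAM_TOKEN = 6.0   # 2 forward + 4 backward
DECODE_FLOPS_PER_PARAM_TOKEN = 2.0  # forward pass only


def training_flops(params: float, tokens: float) -> float:
    """Approximate total training compute, C = 6 * P * D."""
    return TRAIN_FLOPS_PER_PARAM_TOKEN * params * tokens


def decode_flops_per_token(params: float) -> float:
    """Approximate compute to generate one token, 2 * P."""
    return DECODE_FLOPS_PER_PARAM_TOKEN * params


# Assumptions: an 8e9-parameter model trained on 1e12 tokens.
P = 8.0e9
D = 1.0e12

print(f"training: {training_flops(P, D):.2e} FLOPs")           # training: 4.80e+22 FLOPs
print(f"decode:   {decode_flops_per_token(P):.2e} FLOPs/token")  # decode:   1.60e+10 FLOPs/token
```

Dividing the training total by a sustained FLOP/s figure (peak throughput times an MFU value) gives a first-pass wall-clock estimate.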