# The Silicon Zoo

*Vetted Specifications for AI Accelerators and Edge Devices*
The Silicon Zoo is the Single Source of Truth (SSoT) for all physical hardware in mlsysim. Every specification is typed (`pint.Quantity`), provenance-tracked, and validated against official datasheets and MLPerf baselines—so you never have to argue about what the A100’s bandwidth actually is.

Reference these specs when reasoning about bottlenecks. For any device listed here, you can load it directly in Python: `hw = mlsysim.Hardware.Cloud.A100`. The three columns that matter most for roofline analysis are Peak Performance, Memory BW, and Capacity.
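If `mlsysim` is not installed, the shape of a spec object can still be sketched. The `HardwareSpec` class below is a hypothetical stand-in for the typed, `pint`-backed objects the registry actually returns, populated with A100 datasheet values (312 TFLOP/s BF16, ~2.0 TB/s HBM, 80 GiB):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class HardwareSpec:
    """Hypothetical stand-in for a registry entry; not the mlsysim API.
    All fields are plain floats in SI base units (FLOP/s, bytes/s, bytes)."""
    name: str
    peak_flops: float      # compute ceiling
    mem_bandwidth: float   # memory ceiling
    capacity: float        # memory wall

# A100 80GB, per the official datasheet.
a100 = HardwareSpec("NVIDIA A100", peak_flops=312e12,
                    mem_bandwidth=2.04e12, capacity=85.9e9)

print(f"{a100.name}: {a100.peak_flops / 1e12:.0f} TFLOP/s")
```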
## Data Center Accelerators
| Device | Year | Peak Performance | Memory BW | Capacity | TDP |
|---|---|---|---|---|---|
| NVIDIA B200 | 2024 | 2.2 PFLOP/s | 8.0 TB/s | 206.2 GB | 1,000 W |
| Cerebras CS-3 (WSE-3) | 2024 | 125.0 PFLOP/s | 21.0 PB/s | 44.0 GB | 23,000 W |
| NVIDIA GB200 NVL72 | 2024 | 720.0 PFLOP/s | 576.0 TB/s | 13.8 TB | 120 kW |
| NVIDIA H200 | 2023 | 989.0 TFLOP/s | 4.8 TB/s | 141.0 GB | 700 W |
| AMD MI300X | 2023 | 1.3 PFLOP/s | 5.3 TB/s | 206.2 GB | 750 W |
| Google TPU v5p | 2023 | 459.0 TFLOP/s | 2.8 TB/s | 102.0 GB | 300 W |
| NVIDIA H100 | 2022 | 989.0 TFLOP/s | 3.4 TB/s | 85.9 GB | 700 W |
| NVIDIA A100 | 2020 | 312.0 TFLOP/s | 2.0 TB/s | 85.9 GB | 400 W |
| NVIDIA T4 | 2018 | 65.0 TFLOP/s | 320.0 GB/s | 17.2 GB | 70 W |
| NVIDIA V100 | 2017 | 125.0 TFLOP/s | 900.0 GB/s | 34.4 GB | 300 W |
## Workstations
| Device | Year | Peak Performance | Memory BW | Capacity | TDP |
|---|---|---|---|---|---|
| NVIDIA DGX Spark (GB10) | 2024 | 250.0 TFLOP/s | 500.0 GB/s | 128.0 GB | 250 W |
| MacBook Pro (M3 Max) | 2023 | 14.2 TFLOP/s | 400.0 GB/s | 128.0 GB | 100 W |
## Mobile Devices
| Device | Year | Peak Performance | Memory BW | Capacity | TDP |
|---|---|---|---|---|---|
| Google Pixel 8 (Tensor G3) | 2023 | 15.0 TFLOP/s | 60.0 GB/s | 8.0 GB | 5 W |
| Snapdragon 8 Gen 3 | 2023 | 45.0 TFLOP/s | 77.0 GB/s | 12.0 GB | 5 W |
| iPhone 15 Pro (A17 Pro) | 2023 | 35.0 TFLOP/s | 100.0 GB/s | 8.0 GB | 5 W |
## Edge & Robotics
| Device | Year | Peak Performance | Memory BW | Capacity | TDP |
|---|---|---|---|---|---|
| Edge Server | 2024 | 1.0 TFLOP/s | 100.0 GB/s | 128.0 GB | 300 W |
| iPhone 15 Pro (A17 Pro) | 2023 | 35.0 TFLOP/s | 100.0 GB/s | 8.0 GB | 5 W |
| NVIDIA Jetson Orin NX | 2023 | 25.0 TFLOP/s | 102.0 GB/s | 16.0 GB | 25 W |
| Intel NUC + Movidius | 2020 | 1.0 TFLOP/s | 25.0 GB/s | 16.0 GB | 15 W |
| Google Coral Edge TPU | 2019 | 4.0 TFLOP/s | 8.0 GB/s | 1.0 GB | 2 W |
## TinyML Microcontrollers
| Device | Year | Peak Performance | Memory BW | Capacity | TDP |
|---|---|---|---|---|---|
| ESP32-S3 (AI) | 2022 | 500.0 MFLOP/s | 200.0 MB/s | 524.3 KB | 1 W |
| Himax WE-I Plus | 2020 | 200.0 MFLOP/s | 100.0 MB/s | 2.0 MB | <1 W |
## How to Read the Silicon Zoo
### The Three Numbers That Matter
For roofline analysis, focus on three columns:
- **Peak Performance (TFLOP/s)** — the compute ceiling. This determines how fast compute-bound workloads run (e.g., large-batch training, LLM pre-fill).
- **Memory Bandwidth (TB/s)** — the memory ceiling. This determines how fast memory-bound workloads run (e.g., small-batch inference, LLM token decoding).
- **Capacity (GB)** — the memory wall. If your model plus activations exceed this, the workload is infeasible on a single device.
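The two ceilings combine into a first-order latency estimate: a kernel that performs F FLOPs and moves B bytes takes roughly max(F / peak, B / bandwidth). A minimal sketch, using H100 datasheet values (989 TFLOP/s, ~3.35 TB/s) as illustrative inputs:

```python
def roofline_time(flops: float, bytes_moved: float,
                  peak_flops: float, mem_bw: float) -> float:
    """First-order roofline latency: whichever ceiling is slower
    (compute or memory) bounds the kernel's runtime."""
    return max(flops / peak_flops, bytes_moved / mem_bw)

# Example: a 4096 x 4096 x 4096 FP16 matmul on an H100.
F = 2 * 4096**3            # ~137 GFLOP of multiply-adds
B = 3 * 4096 * 4096 * 2    # ~100 MB of FP16 operands (A, B, C)
t = roofline_time(F, B, peak_flops=989e12, mem_bw=3.35e12)
# Compute term dominates here, so this matmul is compute-bound.
```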
### The Ridge Point
The ratio of Peak Performance to Memory Bandwidth gives the ridge point (in FLOP/byte). Workloads with arithmetic intensity below the ridge point are memory-bound; above it, compute-bound. See the Math Foundations page for the full derivation.
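The ridge point drops straight out of the table's columns. A quick sketch, using A100 datasheet values (312 TFLOP/s, ~2.04 TB/s) as the assumed inputs:

```python
def ridge_point(peak_flops: float, mem_bw_bytes: float) -> float:
    """Ridge point in FLOP/byte: the arithmetic intensity at which
    the memory roof meets the compute roof."""
    return peak_flops / mem_bw_bytes

a100_ridge = ridge_point(312e12, 2.04e12)   # ~153 FLOP/byte

# A kernel at 50 FLOP/byte sits left of the ridge (memory-bound);
# one at 300 FLOP/byte sits right of it (compute-bound).
print(f"A100 ridge point: {a100_ridge:.0f} FLOP/byte")
```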
### Common Patterns
- Cloud GPUs (A100, H100, H200) have 80-141 GB of HBM with very high bandwidth (2-5 TB/s). They are designed for throughput.
- Edge devices (Jetson) trade peak performance for lower power budgets, making TDP per TFLOP a useful comparison metric.
- TinyML MCUs (ESP32-S3, Himax WE-I Plus) have KB-to-MB-scale memory — only the smallest quantized models fit. Use the Model Zoo to find matching workloads.
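The TDP-per-TFLOP metric from the second bullet is a one-liner. A sketch, treating the table's TDP and peak-performance values as assumed inputs:

```python
def watts_per_tflop(tdp_watts: float, peak_tflops: float) -> float:
    """Power cost per unit of peak compute; lower is more efficient."""
    return tdp_watts / peak_tflops

# (TDP in W, peak in TFLOP/s), values taken from the tables above.
devices = {
    "NVIDIA H100":           (700, 989.0),
    "NVIDIA Jetson Orin NX": (25, 25.0),
    "Google Coral Edge TPU": (2, 4.0),
}

# Rank devices from most to least power-efficient at peak.
for name, (tdp, tflops) in sorted(
        devices.items(), key=lambda kv: watts_per_tflop(*kv[1])):
    print(f"{name:24s} {watts_per_tflop(tdp, tflops):.2f} W/TFLOP")
```

Note that the datacenter part can still win on this metric: the H100's huge peak amortizes its 700 W TDP.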
## Textbook Connection
These specifications are used throughout Volumes 1 and 2 of the textbook. The Hardware Acceleration chapter uses them for roofline construction, and the Compute Infrastructure chapter uses them for fleet sizing and TCO analysis.
You can define custom hardware specs on-the-fly in Python or contribute new vetted specs to the registry. See the Contributing Guide for how to add persistent specs, or the Hardware API Reference for defining custom objects.
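The exact registration API lives in the Hardware API Reference; as an illustration only, a custom spec can be modeled as plain data and checked against the capacity wall. The names below (`custom`, `fits`) are hypothetical, not the mlsysim API:

```python
# Hypothetical custom device, expressed as plain data in SI units.
custom = {
    "name": "MyEdgeBox",
    "peak_flops": 10e12,      # 10 TFLOP/s
    "mem_bw": 100e9,          # 100 GB/s
    "capacity_bytes": 32e9,   # 32 GB
}

def fits(model_params: int, bytes_per_param: int, spec: dict) -> bool:
    """Capacity-wall check: the weights alone must fit in device
    memory (activations and KV cache would tighten this further)."""
    return model_params * bytes_per_param <= spec["capacity_bytes"]

print(fits(7_000_000_000, 2, custom))    # 7B FP16 weights = 14 GB -> True
print(fits(70_000_000_000, 2, custom))   # 70B FP16 weights = 140 GB -> False
```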
Note: For full technical specs and validation details, see the API Reference.