The Silicon Zoo

Vetted Specifications for AI Accelerators and Edge Devices

The Silicon Zoo is the authoritative registry for physical hardware in mlsysim. Every specification is typed (pint.Quantity), provenance-tracked, and validated against official datasheets and MLPerf baselines—so you never have to argue about what the A100’s bandwidth actually is.

Tip: How to use this page

Reference these specs when reasoning about bottlenecks. For any device listed here, you can load it directly in Python: hw = mlsysim.Hardware.Cloud.A100. The three columns that matter most for roofline analysis are Peak Performance, Memory BW, and Capacity.

Data Center Accelerators

| Device | Year | Peak Performance | Memory BW | Capacity | TDP |
| --- | --- | --- | --- | --- | --- |
| NVIDIA B200 | 2024 | 2.2 PFLOPs/s | 8.0 TB/s | 206.2 GB | 1,000 W |
| Cerebras CS-3 (WSE-3) | 2024 | 125.0 PFLOPs/s | 21.0 PB/s | 44.0 GB | 23,000 W |
| NVIDIA GB200 NVL72 | 2024 | 162.0 PFLOPs/s | 576.0 TB/s | 13.8 TB | 120 kW |
| Intel Gaudi 3 | 2024 | 1.8 PFLOPs/s | 3.7 TB/s | 137.4 GB | 900 W |
| Reference Desktop CPU | 2024 | 1.0 TFLOPs/s | 50.0 GB/s | 68.7 GB | 150 W |
| Google TPU v6 (Trillium) | 2024 | 918.0 TFLOPs/s | 1,600.0 GB/s | 34.4 GB | 300 W |
| AWS Trainium 2 | 2024 | 380.0 TFLOPs/s | 2.4 TB/s | 103.1 GB | 500 W |
| NVIDIA H200 | 2023 | 989.0 TFLOPs/s | 4.8 TB/s | 140.7 GB | 700 W |
| AMD MI300X | 2023 | 1.3 PFLOPs/s | 5.3 TB/s | 206.2 GB | 750 W |
| Google TPU v5p | 2023 | 459.0 TFLOPs/s | 2.8 TB/s | 102.0 GB | 300 W |
| Intel Gaudi 2 | 2022 | 432.0 TFLOPs/s | 2.5 TB/s | 103.1 GB | 600 W |
| NVIDIA H100 | 2022 | 989.0 TFLOPs/s | 3.4 TB/s | 85.9 GB | 700 W |
| AMD MI250X | 2021 | 383.0 TFLOPs/s | 3.2 TB/s | 137.4 GB | 500 W |
| Google TPU v4 | 2021 | 275.0 TFLOPs/s | 1,200.0 GB/s | 34.4 GB | 200 W |
| NVIDIA A100 | 2020 | 312.0 TFLOPs/s | 2,039.0 GB/s | 85.9 GB | 400 W |
| Intel SGX Enclave (Reference) | 2020 | 1.0 TFLOPs/s | 10.0 GB/s | 128.0 MB | |
| Google TPU v3 | 2019 | 105.0 TFLOPs/s | 900.0 GB/s | 34.4 GB | 250 W |
| NVIDIA T4 | 2018 | 65.0 TFLOPs/s | 320.0 GB/s | 17.2 GB | 70 W |
| Google TPU v2 | 2018 | 45.0 TFLOPs/s | 700.0 GB/s | 17.2 GB | 200 W |
| Google TPU v1 | 2017 | 92.0 TFLOPs/s | 34.0 GB/s | 8.6 GB | 75 W |
| NVIDIA V100 | 2017 | 125.0 TFLOPs/s | 900.0 GB/s | 34.4 GB | 300 W |
| NVIDIA V100 PCIe | 2017 | 112.0 TFLOPs/s | 900.0 GB/s | 34.4 GB | 250 W |

Workstations

| Device | Year | Peak Performance | Memory BW | Capacity | TDP |
| --- | --- | --- | --- | --- | --- |
| NVIDIA DGX Spark (GB10) | 2025 | 125.0 TFLOPs/s | 273.0 GB/s | 128.0 GB | 200 W |
| MacBook Pro (M3 Max) | 2023 | 14.2 TFLOPs/s | 400.0 GB/s | 128.0 GB | 100 W |

Mobile Devices

| Device | Year | Peak Performance | Memory BW | Capacity | TDP |
| --- | --- | --- | --- | --- | --- |
| Google Pixel 8 (Tensor G3) | 2023 | 15.0 TFLOPs/s | 60.0 GB/s | 8.0 GB | 5 W |
| Snapdragon 8 Gen 3 | 2023 | 45.0 TFLOPs/s | 77.0 GB/s | 12.0 GB | 5 W |
| iPhone 15 Pro (A17 Pro) | 2023 | 35.0 TFLOPs/s | 100.0 GB/s | 8.0 GB | 5 W |
| Apple M2 Neural Engine | 2022 | 15.8 TFLOPs/s | 100.0 GB/s | 16.0 GB | 20 W |

Edge & Robotics

| Device | Year | Peak Performance | Memory BW | Capacity | TDP |
| --- | --- | --- | --- | --- | --- |
| Edge Server | 2024 | 1.0 TFLOPs/s | 100.0 GB/s | 128.0 GB | 300 W |
| NVIDIA Jetson Orin NX | 2023 | 25.0 TFLOPs/s | 102.0 GB/s | 16.0 GB | 25 W |
| NVIDIA Jetson Orin Nano | 2023 | 10.0 TFLOPs/s | 68.0 GB/s | 8.0 GB | 15 W |
| NVIDIA Jetson AGX Orin | 2022 | 275.0 TFLOPs/s | 204.0 GB/s | 64.0 GB | 60 W |
| RoboTaxi Reference Compute (NVIDIA DRIVE AGX Orin class) | 2022 | 5.2 TFLOPs/s | 200.0 GB/s | 32.0 GB | 60 W |
| Intel NUC + Movidius | 2020 | 1.0 TFLOPs/s | 25.0 GB/s | 16.0 GB | 15 W |
| Google Coral Edge TPU | 2019 | 4.0 TFLOPs/s | 8.0 GB/s | 1.0 GB | 2 W |

TinyML Microcontrollers

| Device | Year | Peak Performance | Memory BW | Capacity | TDP |
| --- | --- | --- | --- | --- | --- |
| Oura Ring 4 (wearable reference profile) | 2024 | 100.0 MFLOPs/s | 0.1 GB/s | 2.0 MB | 0 W |
| ESP32-S3 (AI) | 2022 | 500.0 MFLOPs/s | 0.1 GB/s | 4.0 MB | 0 W |
| Himax WE-I Plus | 2020 | 200.0 MFLOPs/s | 0.1 GB/s | 2.0 MB | 0 W |
| Nordic nRF52840 (Cortex-M4F) | 2018 | 64.0 MFLOPs/s | 0.1 GB/s | 1.0 MB | 0 W |

How to Read the Silicon Zoo

The Three Numbers That Matter

For roofline analysis, focus on three columns:

  1. Peak Performance (TFLOP/s) — the compute ceiling. This determines how fast compute-bound workloads run (e.g., large-batch training, LLM pre-fill).

  2. Memory Bandwidth (TB/s) — the memory ceiling. This determines how fast memory-bound workloads run (e.g., small-batch inference, LLM token decoding).

  3. Capacity (GB) — the memory wall. If your model plus activations exceed this, the workload is infeasible on a single device.
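The three numbers combine into a simple roofline estimate. The following is a minimal sketch in plain Python, not the mlsysim API: the helper names `attainable_flops` and `fits_on_device` are hypothetical, the arithmetic intensities and model sizes are illustrative assumptions, and the A100 figures are copied from the table above.

```python
def attainable_flops(peak_flops, mem_bw, intensity):
    """Roofline bound in FLOP/s: the lower of the compute ceiling and
    the memory ceiling scaled by arithmetic intensity (FLOP/byte)."""
    return min(peak_flops, intensity * mem_bw)


def fits_on_device(model_bytes, activation_bytes, capacity_bytes):
    """Capacity check: True if model plus activations fit in device memory."""
    return model_bytes + activation_bytes <= capacity_bytes


# NVIDIA A100 row from the Data Center Accelerators table.
A100_PEAK = 312.0e12  # FLOP/s
A100_BW = 2039.0e9    # bytes/s
A100_CAP = 85.9e9     # bytes

# Assumed intensities: low for token decoding, high for pre-fill.
decode = attainable_flops(A100_PEAK, A100_BW, intensity=2.0)
prefill = attainable_flops(A100_PEAK, A100_BW, intensity=500.0)

print(f"decode  bound: {decode / 1e12:.1f} TFLOP/s (memory ceiling)")
print(f"prefill bound: {prefill / 1e12:.1f} TFLOP/s (compute ceiling)")

# Assumed weight footprints at 2 bytes per parameter, activations ignored.
print(fits_on_device(7e9 * 2, 0.0, A100_CAP))   # 14 GB of weights
print(fits_on_device(70e9 * 2, 0.0, A100_CAP))  # 140 GB of weights
```

Under these assumptions the decode workload is limited by memory bandwidth long before it reaches the compute ceiling, and the larger model fails the capacity check before either ceiling becomes relevant.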

The Ridge Point

The ratio of Peak Performance to Memory Bandwidth gives the ridge point (in FLOP/byte). Workloads with arithmetic intensity below the ridge point are memory-bound; above it, compute-bound. See the Math Foundations page for the full derivation.
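As a worked example of the ratio described above, this sketch computes ridge points from three rows of the Data Center Accelerators table. It uses plain numbers rather than the mlsysim API; `ridge_point` and `classify` are hypothetical helper names, and the example intensity of 50 FLOP/byte is an assumption chosen for illustration.

```python
def ridge_point(peak_flops, mem_bw):
    """Ridge point in FLOP/byte: peak compute divided by memory bandwidth."""
    return peak_flops / mem_bw


def classify(intensity, peak_flops, mem_bw):
    """Label a workload by comparing its intensity with the ridge point."""
    if intensity < ridge_point(peak_flops, mem_bw):
        return "memory-bound"
    return "compute-bound"


# (peak FLOP/s, bandwidth in bytes/s), copied from the table above.
DEVICES = {
    "NVIDIA A100": (312.0e12, 2039.0e9),
    "NVIDIA H100": (989.0e12, 3.4e12),
    "NVIDIA T4": (65.0e12, 320.0e9),
}

for name, (peak, bw) in DEVICES.items():
    ridge = ridge_point(peak, bw)
    label = classify(50.0, peak, bw)  # assumed intensity: 50 FLOP/byte
    print(f"{name}: ridge = {ridge:.1f} FLOP/byte, 50 FLOP/byte is {label}")
```

With the table's figures, all three ridge points land well above 100 FLOP/byte, so a workload at 50 FLOP/byte is memory-bound on each of them.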

Common Patterns

  • Cloud GPUs (A100, H100, H200) have 40-80+ GB of HBM with very high bandwidth (2-5 TB/s). They are designed for throughput.
  • Edge devices (Jetson) trade peak performance for lower power budgets, making watts of TDP per TFLOP/s a useful comparison metric.
  • TinyML MCUs (ESP32-S3, nRF52840) have KB- to MB-scale memory (1-4 MB in the table above), so only the smallest quantized models fit. Use the Model Zoo to find matching workloads.
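The power-efficiency comparison mentioned for edge devices can be sketched as follows. This is an illustrative calculation on values copied from the tables above, not an mlsysim function; `watts_per_tflops` is a hypothetical helper. The tables do not state the numeric precision behind each peak figure, so treat the ranking as rough.

```python
def watts_per_tflops(tdp_watts, peak_tflops):
    """Power cost of peak compute: TDP in watts per TFLOP/s (lower is better)."""
    return tdp_watts / peak_tflops


# (TDP in W, peak in TFLOP/s), copied from the tables above.
SPECS = {
    "NVIDIA H100": (700.0, 989.0),
    "NVIDIA A100": (400.0, 312.0),
    "NVIDIA Jetson AGX Orin": (60.0, 275.0),
    "NVIDIA Jetson Orin NX": (25.0, 25.0),
    "NVIDIA Jetson Orin Nano": (15.0, 10.0),
}

# Sort from most to least efficient at peak.
ranked = sorted(
    ((name, watts_per_tflops(tdp, peak)) for name, (tdp, peak) in SPECS.items()),
    key=lambda item: item[1],
)

for name, ratio in ranked:
    print(f"{name}: {ratio:.2f} W per TFLOP/s")
```

The ratio only describes efficiency at peak throughput. A device with a good ratio can still be the wrong choice if its absolute power draw exceeds the deployment's budget.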

Textbook Connection

These specifications are used throughout Volumes 1 and 2 of the textbook. The Hardware Acceleration chapter uses them for roofline construction, and the Compute Infrastructure chapter uses them for fleet sizing and TCO analysis.


Note: Missing a device?

You can define custom hardware specs on-the-fly in Python or contribute new vetted specs to the registry. See the Contributing Guide for how to add persistent specs, or the Hardware API Reference for defining custom HardwareNode objects.

Tip: CLI Access

Browse the Silicon Zoo from your terminal: mlsysim zoo hardware

For full technical specs and validation details, see the API Reference.
