The Silicon Zoo

Vetted Specifications for AI Accelerators and Edge Devices

The Silicon Zoo is the authoritative registry for physical hardware in mlsysim. Every specification is typed (pint.Quantity), provenance-tracked, and validated against official datasheets and MLPerf baselines, so you never have to argue about what the A100’s bandwidth actually is.

How to use this page

Reference these specs when reasoning about bottlenecks. Any device listed here can be loaded directly in Python, for example: hw = mlsysim.Hardware.Cloud.A100. The three columns that matter most for roofline analysis are Peak Performance, Memory BW, and Capacity.

Data Center Accelerators

Device	Year	Peak Performance	Memory BW	Capacity	TDP
NVIDIA B200	2024	2.2 PFLOPs/s	8.0 TB/s	206.2 GB	1,000 W
Cerebras CS-3 (WSE-3)	2024	125.0 PFLOPs/s	21.0 PB/s	44.0 GB	23,000 W
d-Matrix Corsair	2024	2.4 PFLOPs/s	150.0 TB/s	2.0 GB	600 W
NVIDIA GB200 NVL72	2024	162.0 PFLOPs/s	576.0 TB/s	13.8 TB	120 kW
Intel Gaudi 3	2024	1.8 PFLOPs/s	3.7 TB/s	137.4 GB	900 W
Reference Desktop CPU	2024	1.0 TFLOPs/s	50.0 GB/s	68.7 GB	150 W
Google TPU v6 (Trillium)	2024	918.0 TFLOPs/s	1,600.0 GB/s	34.4 GB	300 W
AWS Trainium 2	2024	380.0 TFLOPs/s	2.4 TB/s	103.1 GB	500 W
NVIDIA H200	2023	989.0 TFLOPs/s	4.8 TB/s	140.7 GB	700 W
AMD MI300X	2023	1.3 PFLOPs/s	5.3 TB/s	206.2 GB	750 W
Google TPU v5p	2023	459.0 TFLOPs/s	2.8 TB/s	102.0 GB	300 W
Untether AI speedAI240	2023	2.0 PFLOPs/s	1.0 PB/s	238.0 MB	—
Intel Gaudi 2	2022	432.0 TFLOPs/s	2.5 TB/s	103.1 GB	600 W
NVIDIA H100	2022	989.0 TFLOPs/s	3.4 TB/s	85.9 GB	700 W
AMD MI250X	2021	383.0 TFLOPs/s	3.2 TB/s	137.4 GB	500 W
Google TPU v4	2021	275.0 TFLOPs/s	1,200.0 GB/s	34.4 GB	200 W
NVIDIA A100	2020	312.0 TFLOPs/s	2,039.0 GB/s	85.9 GB	400 W
Graphcore Colossus Mk2 GC200 IPU	2020	250.0 TFLOPs/s	45.0 TB/s	900.0 MB	—
Groq LPU	2020	188.0 TFLOPs/s	80.0 TB/s	230.0 MB	215 W
Intel SGX Enclave (Reference)	2020	1.0 TFLOPs/s	10.0 GB/s	128.0 MB	—
Google TPU v3	2019	105.0 TFLOPs/s	900.0 GB/s	34.4 GB	250 W
NVIDIA T4	2018	65.0 TFLOPs/s	320.0 GB/s	17.2 GB	70 W
Google TPU v2	2018	45.0 TFLOPs/s	700.0 GB/s	17.2 GB	200 W
Google TPU v1	2017	92.0 TFLOPs/s	34.0 GB/s	8.6 GB	75 W
NVIDIA V100	2017	125.0 TFLOPs/s	900.0 GB/s	34.4 GB	300 W
NVIDIA V100 PCIe	2017	112.0 TFLOPs/s	900.0 GB/s	34.4 GB	250 W

Workstations

Device	Year	Peak Performance	Memory BW	Capacity	TDP
NVIDIA DGX Spark (GB10)	2025	125.0 TFLOPs/s	273.0 GB/s	128.0 GB	200 W
MacBook Pro (M3 Max)	2023	14.2 TFLOPs/s	400.0 GB/s	128.0 GB	100 W

Mobile Devices

Device	Year	Peak Performance	Memory BW	Capacity	TDP
Google Pixel 8 (Tensor G3)	2023	15.0 TFLOPs/s	60.0 GB/s	8.0 GB	5 W
Snapdragon 8 Gen 3	2023	45.0 TFLOPs/s	77.0 GB/s	12.0 GB	5 W
iPhone 15 Pro (A17 Pro)	2023	35.0 TFLOPs/s	100.0 GB/s	8.0 GB	5 W
Apple M2 Neural Engine	2022	15.8 TFLOPs/s	100.0 GB/s	16.0 GB	20 W

Edge & Robotics

Device	Year	Peak Performance	Memory BW	Capacity	TDP
Edge Server	2024	1.0 TFLOPs/s	100.0 GB/s	128.0 GB	300 W
NVIDIA Jetson Orin NX	2023	25.0 TFLOPs/s	102.0 GB/s	16.0 GB	25 W
NVIDIA Jetson Orin Nano	2023	10.0 TFLOPs/s	68.0 GB/s	8.0 GB	15 W
NVIDIA Jetson AGX Orin	2022	275.0 TFLOPs/s	204.0 GB/s	64.0 GB	60 W
RoboTaxi Reference Compute (NVIDIA DRIVE AGX Orin class)	2022	5.2 TFLOPs/s	200.0 GB/s	32.0 GB	60 W
Intel NUC + Movidius	2020	1.0 TFLOPs/s	25.0 GB/s	16.0 GB	15 W
Google Coral Edge TPU	2019	4.0 TFLOPs/s	8.0 GB/s	1.0 GB	2 W

TinyML Microcontrollers

Device	Year	Peak Performance	Memory BW	Capacity
Oura Ring 4 (wearable reference profile)	2024	100.0 MFLOPs/s	0.1 GB/s	2.0 MB
ESP32-S3 (AI)	2022	500.0 MFLOPs/s	0.1 GB/s	4.0 MB
Himax WE-I Plus	2020	200.0 MFLOPs/s	0.1 GB/s	2.0 MB
Nordic nRF52840 (Cortex-M4F)	2018	64.0 MFLOPs/s	0.1 GB/s	1.0 MB

How to Read the Silicon Zoo

The Three Numbers That Matter

For roofline analysis, focus on three columns:

Peak Performance (TFLOP/s) — the compute ceiling. This determines how fast compute-bound workloads run (e.g., large-batch training, LLM pre-fill).
Memory Bandwidth (TB/s) — the memory ceiling. This determines how fast memory-bound workloads run (e.g., small-batch inference, LLM token decoding).
Capacity (GB) — the memory wall. If your model plus activations exceed this, the workload is infeasible on a single device.
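
The capacity check can be sketched in a few lines of plain Python. This is a minimal illustration, not the mlsysim API: the helper names (`weights_gb`, `fits`) and the `extra_gb` allowance for activations are assumptions made for this example, and the capacity figures are copied by hand from the Data Center Accelerators table above.

```python
def weights_gb(num_params: float, bytes_per_param: float) -> float:
    """Weight footprint in decimal GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def fits(num_params: float, bytes_per_param: float,
         capacity_gb: float, extra_gb: float = 0.0) -> bool:
    """True if weights plus an assumed allowance (extra_gb) fit in capacity_gb."""
    return weights_gb(num_params, bytes_per_param) + extra_gb <= capacity_gb


# Capacity column, copied from the table above (GB).
CAPACITY_GB = {
    "NVIDIA A100": 85.9,
    "NVIDIA H100": 85.9,
    "NVIDIA H200": 140.7,
}

# Hypothetical workloads: 7B and 70B parameters at 2 bytes per parameter (FP16),
# with an assumed 4 GB allowance for activations.
for device, capacity in CAPACITY_GB.items():
    for params in (7e9, 70e9):
        verdict = "fits" if fits(params, 2, capacity, extra_gb=4.0) else "does not fit"
        print(f"{device}: {params / 1e9:.0f}B params at FP16 {verdict}")
```

Under these assumptions the 70B model needs 140 GB for weights alone, so it exceeds every single device in the dictionary once any allowance is added, while the 7B model fits on all three.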

The Ridge Point

The ratio of Peak Performance to Memory Bandwidth gives the ridge point (in FLOP/byte). Workloads with arithmetic intensity below the ridge point are memory-bound; above it, compute-bound. See the Math Foundations page for the full derivation.
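
As a worked example, the ridge point and the resulting memory-bound or compute-bound classification can be computed directly from two table columns. The sketch below uses plain floats rather than mlsysim objects; the function names are illustrative assumptions, and the A100 figures (312.0 TFLOPs/s, 2,039.0 GB/s) are copied from the table above. Treating a workload that sits exactly on the ridge point as compute-bound is a choice made here, not something the text specifies.

```python
def ridge_point(peak_flops_per_s: float, mem_bw_bytes_per_s: float) -> float:
    """Ridge point in FLOP/byte: peak performance divided by memory bandwidth."""
    return peak_flops_per_s / mem_bw_bytes_per_s


def attainable_flops_per_s(intensity: float, peak_flops_per_s: float,
                           mem_bw_bytes_per_s: float) -> float:
    """Roofline ceiling: the lower of the compute and memory limits."""
    return min(peak_flops_per_s, intensity * mem_bw_bytes_per_s)


def classify(intensity: float, peak_flops_per_s: float,
             mem_bw_bytes_per_s: float) -> str:
    """Below the ridge point is memory-bound; at or above it is compute-bound."""
    if intensity < ridge_point(peak_flops_per_s, mem_bw_bytes_per_s):
        return "memory-bound"
    return "compute-bound"


# NVIDIA A100 row from the table above, converted to base units.
A100_PEAK = 312.0e12   # FLOP/s
A100_BW = 2039.0e9     # bytes/s

print(f"A100 ridge point: {ridge_point(A100_PEAK, A100_BW):.1f} FLOP/byte")

# Two hypothetical arithmetic intensities, one on each side of the ridge.
for intensity in (10.0, 500.0):
    ceiling = attainable_flops_per_s(intensity, A100_PEAK, A100_BW)
    print(f"AI = {intensity:>5.1f} FLOP/byte: {classify(intensity, A100_PEAK, A100_BW)}, "
          f"ceiling {ceiling / 1e12:.1f} TFLOP/s")
```

For the A100 row this gives a ridge point of roughly 153 FLOP/byte, so a workload at 10 FLOP/byte is capped by memory bandwidth at about 20.4 TFLOP/s, far below the 312 TFLOP/s compute ceiling.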

Common Patterns

Cloud GPUs (A100, H100, H200) pair large HBM capacity (about 86-141 GB in the table above) with very high bandwidth (2-5 TB/s). They are designed for throughput.
Edge devices (Jetson) trade peak performance for lower power budgets (15-60 W in the table above), making watts of TDP per TFLOP/s a useful comparison metric.
TinyML MCUs (ESP32-S3, nRF52840) have at most a few MB of memory (1-4 MB in the table above), so only the smallest quantized models fit. Use the Model Zoo to find matching workloads.
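
The power-efficiency comparison mentioned for edge devices can be sketched as follows. This is an illustrative calculation, not an mlsysim function: `watts_per_tflops` is a name made up for this example, and the peak and TDP values are copied by hand from the tables above. The tables do not state the numeric precision behind each peak figure, so treat the results as a rough comparison only.

```python
def watts_per_tflops(peak_tflops: float, tdp_watts: float) -> float:
    """TDP divided by peak performance, in W per TFLOP/s (lower is better)."""
    return tdp_watts / peak_tflops


# (peak TFLOPs/s, TDP in W), copied from the tables above.
SPECS = {
    "NVIDIA A100": (312.0, 400.0),
    "NVIDIA H100": (989.0, 700.0),
    "NVIDIA Jetson Orin NX": (25.0, 25.0),
    "NVIDIA Jetson Orin Nano": (10.0, 15.0),
}

# Sort from most to least efficient under this metric.
ranked = sorted(SPECS.items(), key=lambda item: watts_per_tflops(*item[1]))

for device, (peak, tdp) in ranked:
    print(f"{device}: {watts_per_tflops(peak, tdp):.2f} W per TFLOP/s")
```

The metric only compares peak figures against rated power; it says nothing about whether a given workload can actually reach that peak on the device.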

Textbook Connection

These specifications are used throughout Volumes 1 and 2 of the textbook. The Hardware Acceleration chapter uses them for roofline construction, and the Compute Infrastructure chapter uses them for fleet sizing and TCO analysis.

Missing a device?

You can define custom hardware specs on-the-fly in Python or contribute new vetted specs to the registry. See the Contributing Guide for how to add persistent specs, or the Hardware API Reference for defining custom HardwareNode objects.

CLI Access

Browse the Silicon Zoo from your terminal: mlsysim zoo hardware

For full technical specs and validation details, see the API Reference.