The Model Zoo
Reference Workloads for Systems Modeling
The Model Zoo defines the Computational Demand placed on the hardware. Every workload is pulled from the mlsysim.Models registry and characterized by its FLOPs, parameter count, and architecture type—independent of any specific hardware.
The key number for roofline analysis is each model’s arithmetic intensity—how many floating-point operations it performs per byte of memory loaded. Models with low arithmetic intensity (small batch, decoder-only inference) tend to be memory-bound on any hardware. Pair these specs with the Silicon Zoo to find your bottleneck.
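The roofline test above can be sketched in a few lines of plain Python. This is not the `mlsysim` API; the hardware numbers (an A100-like 312 TFLOP/s fp16 peak and 2.0 TB/s of HBM bandwidth) and the 7B-parameter decode example are illustrative assumptions:

```python
def arithmetic_intensity(flops, bytes_moved):
    """FLOPs executed per byte of memory traffic."""
    return flops / bytes_moved

def bound_by(intensity, peak_flops, peak_bw):
    """Roofline test: compare intensity to the hardware ridge point."""
    ridge = peak_flops / peak_bw  # FLOPs/byte where compute and memory balance
    return "compute" if intensity >= ridge else "memory"

# Batch-1 decoder inference: ~2 FLOPs per parameter, and every fp16
# weight (2 bytes) must be loaded once, so intensity is ~1 FLOP/byte.
params = 7e9  # hypothetical 7B-parameter model
intensity = arithmetic_intensity(2 * params, 2 * params)
print(bound_by(intensity, peak_flops=312e12, peak_bw=2.0e12))  # memory
```

With a ridge point of 156 FLOPs/byte, batch-1 decoding at ~1 FLOP/byte is deep in the memory-bound regime, which is exactly the pattern the paragraph above describes.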
Workload Types
MLSys·im supports five workload architectures, each with distinct scaling characteristics:
| Type | Architecture | Key Characteristic | Example Models |
|---|---|---|---|
| Transformer | Dense attention | 2P FLOPs/token; KV-cache grows with sequence length | GPT-4, LLaMA, BERT |
| CNN | Convolutional | Fixed FLOPs per image; no sequence dependence | ResNet-50, EfficientNet |
| Sparse (MoE) | Mixture-of-Experts | Active params ≪ total params; All-to-All dispatch | Mixtral, GShard |
| SSM (Mamba) | State-space model | O(1) state cache; linear-time sequence processing | Mamba, S4 |
| Diffusion | Iterative denoising | T × FLOPs/step; latency scales with denoising steps | Stable Diffusion, DALL-E |
For Sparse/MoE models, parameters refers to total parameters (used for memory sizing), while active_parameters refers to the subset active per token (used for FLOP counting). This distinction is critical: a 340B MoE model may use only 47B parameters per forward pass.
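The total-vs-active distinction can be made concrete with a short sketch (plain Python, not the `mlsysim` API; the 340B/47B split reuses the example above, and "GB" here means 10^9 bytes):

```python
BYTES_FP16 = 2  # bytes per parameter at fp16

def weight_memory_gb(total_params, bytes_per_param=BYTES_FP16):
    """Memory sizing uses *total* parameters: every expert must be resident."""
    return total_params * bytes_per_param / 1e9

def flops_per_token(active_params):
    """FLOP counting uses *active* parameters (~2 FLOPs per param per token)."""
    return 2 * active_params

# Hypothetical MoE matching the 340B-total / 47B-active example above.
total, active = 340e9, 47e9
print(weight_memory_gb(total))  # 680.0 GB of fp16 weights to hold
print(flops_per_token(active))  # 9.4e10 FLOPs per generated token
```

Note the asymmetry: memory demand is set by the 340B total, while per-token compute is set by the 47B that actually fire.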
Vetted Model Registry
Large Language Models (LLMs)
| Model | Architecture | Parameters | Inference FLOPs | Layers |
|---|---|---|---|---|
Vision Models (CNNs)
| Model | Architecture | Parameters | Inference FLOPs | Layers |
|---|---|---|---|---|
TinyML Models
| Model | Architecture | Parameters | Inference FLOPs | Layers |
|---|---|---|---|---|
How to Read the Model Zoo
Parameters vs. Inference FLOPs
These two numbers tell very different stories:
- Parameters determine memory footprint: at fp16, each parameter is 2 bytes. A 70B-parameter model needs ~140 GB just for weights — more than a single 80 GB A100 can hold.
- Inference FLOPs determine compute time: the total floating-point operations for one forward pass. Higher FLOPs means more work for the GPU’s compute cores.
The ratio of FLOPs to memory accessed (the arithmetic intensity) determines whether a workload is compute-bound or memory-bound. At small batch sizes, most models are memory-bound because the weights must be loaded regardless of batch size.
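The batch-size effect can be sketched directly. In weight-dominated decoding, the same fp16 weights (2 bytes/param, ~2 FLOPs/param per sequence) serve every sequence in the batch, so arithmetic intensity grows roughly linearly with batch size. This is a simplification that ignores activation and KV-cache traffic, and the A100-like ridge point (312 TFLOP/s over 2.0 TB/s) is an assumed figure:

```python
def decode_intensity(batch_size):
    """Weight-dominated decode: each loaded parameter (2 B at fp16) does
    ~2 FLOPs per sequence in the batch, so intensity ~= batch size."""
    return float(batch_size)  # FLOPs/byte

RIDGE = 312e12 / 2.0e12  # assumed A100-like ridge point: 156 FLOPs/byte

for b in (1, 32, 256):
    regime = "compute" if decode_intensity(b) >= RIDGE else "memory"
    print(f"batch {b}: {regime}-bound")
```

Under these assumptions, decoding stays memory-bound until the batch size crosses the ridge point (~156 here), which is why small-batch inference cannot saturate the compute cores.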
Which Model for Which Hardware?
As a rough guide:
- TinyML MCUs (KB-scale memory) — only tiny models fit (MobileNetV2, TinyBERT)
- Edge devices (Jetson, 8-32 GB) — small vision and language models at int8
- Single Cloud GPU (40-80 GB) — models up to ~30B parameters at fp16
- Multi-GPU clusters — 70B+ models require distributed serving or training
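The sizing guide above reduces to a back-of-the-envelope fit check. This is a rough sketch, not an `mlsysim` function: the ~20% headroom factor for activations and KV cache is an assumed fudge, and "GB" means 10^9 bytes:

```python
def fits(params, device_mem_gb, bytes_per_param, overhead=1.2):
    """Rough check: weight memory plus ~20% headroom (assumed) for
    activations and KV cache must fit in device memory."""
    need_gb = params * bytes_per_param * overhead / 1e9
    return need_gb <= device_mem_gb

print(fits(30e9, 80, 2))  # ~30B at fp16 on an 80 GB GPU: fits
print(fits(70e9, 80, 2))  # 70B at fp16: does not fit on one GPU
```

The 70B case failing on a single 80 GB device is the reason the last bullet calls for distributed serving.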
Textbook Connection
The Model Training and Model Serving chapters use these workload profiles to demonstrate roofline analysis and serving cost estimation. The Model Compression chapter shows how quantization reduces both parameter memory and inference FLOPs.
Add Your Own Model
Defining custom workloads is straightforward: you can extend the registry or define a workload object directly in your code. Learn more in the Contributing Guide and the Models API Reference.
Note: For dynamic memory footprint and KV-cache calculations, see the API Reference.