Which Solver Do I Need?

A decision guide for choosing the right MLSYSIM analytical tool

MLSYSIM provides six specialized solvers, each designed to answer a different class of question about ML systems. This page helps you pick the right one — and shows you how to compose them for real-world analyses.


Start With Your Question

“How fast will my model run on this GPU?”
Use the SingleNodeModel. It applies the roofline model to determine whether your workload is compute-bound or memory-bound and returns latency, throughput, and bottleneck classification.
Lecture slides: Hardware Acceleration (Vol I, Ch 11) · Benchmarking (Vol I, Ch 12)
“How fast will my LLM generate tokens?”
Use the ServingModel. It models the two distinct phases of autoregressive inference: the compute-bound prefill (TTFT) and the memory-bound decode (ITL), plus KV-cache memory pressure.
Lecture slides: Model Serving (Vol I, Ch 13) · Inference at Scale (Vol II, Ch 9)
“How does performance scale across multiple GPUs?”
Use the DistributedModel. It decomposes workloads using 3D/4D parallelism (DP, TP, PP, EP) and calculates communication overhead, pipeline bubbles, and scaling efficiency.
Lecture slides: Distributed Training (Vol II, Ch 5) · Collective Communication (Vol II, Ch 6) · Network Fabrics (Vol II, Ch 3)
“How much will this cost to run?”
Use the EconomicsModel. It calculates Total Cost of Ownership: CapEx (hardware purchase), OpEx (energy + maintenance), and total TCO over a specified duration.
Lecture slides: Compute Infrastructure (Vol II, Ch 2)
“What is the carbon footprint?”
Use the SustainabilityModel. It computes energy consumption (factoring in PUE), carbon emissions (using regional grid intensity), and water usage across datacenter locations.
Lecture slides: Sustainable AI (Vol II, Ch 15)
“How often will my cluster fail during training?”
Use the ReliabilityModel. It estimates fleet-wide MTBF, failure probability for a given job duration, and the Young-Daly optimal checkpoint interval.
Lecture slides: Fault Tolerance (Vol II, Ch 7)
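The Young-Daly interval mentioned above has a simple closed form, τ_opt = √(2 · δ · MTBF), where δ is the time to write one checkpoint. A plain-Python sketch (the 60 s checkpoint time and 4-hour MTBF below are illustrative numbers, not MLSYSIM registry values):

```python
import math

def young_daly_interval(checkpoint_time_s: float, mtbf_s: float) -> float:
    """Young-Daly first-order optimal checkpoint interval:
    tau_opt = sqrt(2 * delta * MTBF)."""
    return math.sqrt(2 * checkpoint_time_s * mtbf_s)

# Example: 60 s checkpoints on a fleet with a 4-hour effective MTBF.
tau = young_daly_interval(60, 4 * 3600)
print(f"Checkpoint every {tau / 60:.1f} minutes")  # → Checkpoint every 21.9 minutes
```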

Quick Reference

| Solver | Key Inputs | Key Outputs | Best For |
|---|---|---|---|
| SingleNodeModel | model, hardware, batch_size, precision | latency, throughput, bottleneck, MFU | “Is my model memory-bound?” |
| ServingModel | model, hardware, seq_len, batch_size | TTFT, ITL, KV-cache size, feasibility | “Can I serve this LLM on this GPU?” |
| DistributedModel | model, fleet, tp_size, pp_size, ep_size | scaling efficiency, communication overhead | “How many GPUs do I actually need?” |
| EconomicsModel | fleet, duration_days, kwh_price | CapEx, OpEx, total TCO | “What will this cost over 3 years?” |
| SustainabilityModel | fleet, duration_days, datacenter | energy (kWh), carbon (kg CO₂e), water (L) | “Where should I train to minimize carbon?” |
| ReliabilityModel | fleet, job_duration_hours, checkpoint_time_s | MTBF, failure probability, checkpoint interval | “Will my training job complete?” |

Code Examples

Single-node roofline analysis

import mlsysim
from mlsysim import SingleNodeModel

solver = SingleNodeModel()
profile = solver.solve(
    model=mlsysim.Models.ResNet50,
    hardware=mlsysim.Hardware.Cloud.A100,
    batch_size=1
)
print(f"Bottleneck: {profile.bottleneck}")   # → Memory Bound
print(f"Latency:    {profile.latency.to('ms'):~.2f}")
print(f"MFU:        {profile.mfu:.1%}")
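The bottleneck label returned above comes from roofline analysis: compare the workload's arithmetic intensity (FLOPs per byte) against the machine's compute-to-bandwidth ratio. A plain-Python sketch of the rule (the A100-like specs and ResNet-50 FLOP/byte figures below are rough illustrative numbers, not values from the MLSYSIM registry):

```python
def roofline_bottleneck(flops: float, bytes_moved: float,
                        peak_flops: float, peak_bw: float) -> str:
    """Classify a workload by comparing its arithmetic intensity
    (FLOP/byte) against the machine balance point (the roofline knee)."""
    intensity = flops / bytes_moved    # FLOP per byte of traffic
    balance = peak_flops / peak_bw     # FLOP per byte at the knee
    return "Compute Bound" if intensity >= balance else "Memory Bound"

# Illustrative A100-like specs: 312 TFLOP/s FP16, 2.0 TB/s HBM.
# ResNet-50 at batch 1: ~8.2 GFLOPs, ~0.1 GB moved (rough figures).
print(roofline_bottleneck(8.2e9, 0.1e9, 312e12, 2.0e12))  # → Memory Bound
```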

LLM serving analysis

import mlsysim
from mlsysim import ServingModel

serving = ServingModel()
result = serving.solve(
    model=mlsysim.Models.Language.Llama3_8B,
    hardware=mlsysim.Hardware.Cloud.H100,
    seq_len=2048,
    batch_size=1
)
print(f"TTFT: {result['ttft'].to('ms'):~.1f}")
print(f"ITL:  {result['itl'].to('ms'):~.2f}")
print(f"KV$:  {result['kv_cache_size']:~.2f}")
print(f"Fits: {result['feasible']}")
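The KV-cache figure above can be cross-checked with the standard back-of-envelope formula: two tensors (K and V) per layer, each of shape [batch, kv_heads, seq_len, head_dim]. The Llama-3-8B-like shape parameters below (32 layers, 8 KV heads with GQA, head dimension 128, FP16) are assumptions for illustration, not values read from the MLSYSIM registry:

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, batch: int, dtype_bytes: int = 2) -> int:
    """2 tensors (K and V) per layer, each [batch, kv_heads, seq_len, head_dim]."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * dtype_bytes

# Llama-3-8B-like shapes: 32 layers, 8 KV heads (GQA), head_dim 128, FP16.
gib = kv_cache_bytes(32, 8, 128, seq_len=2048, batch=1) / 2**30
print(f"KV cache: {gib:.2f} GiB")  # → KV cache: 0.25 GiB
```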

Distributed training at scale

import mlsysim
from mlsysim import DistributedModel, Systems

dist = DistributedModel()
result = dist.solve(
    model=mlsysim.Models.Language.Llama3_70B,
    fleet=Systems.Clusters.Frontier_8K,
    batch_size=2048,
    tp_size=8,
    pp_size=4,
    microbatch_count=16
)
print(f"Scaling efficiency: {result['scaling_efficiency']:.1%}")
print(f"Bubble fraction:    {result['bubble_fraction']:.1%}")
print(f"DP comm latency:    {result['dp_communication_latency'].to('ms'):~.2f}")
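The bubble fraction above follows the standard GPipe-style formula, (p − 1)/(m + p − 1) for p pipeline stages and m microbatches. A plain-Python sketch (this is the textbook formula; MLSYSIM's exact schedule model may differ):

```python
def bubble_fraction(pp_size: int, microbatch_count: int) -> float:
    """Idle fraction of a GPipe-style pipeline:
    (p - 1) / (m + p - 1) for p stages and m microbatches."""
    return (pp_size - 1) / (microbatch_count + pp_size - 1)

# Matching the configuration above: pp_size=4, 16 microbatches.
print(f"{bubble_fraction(4, 16):.1%}")  # → 15.8%
```

More microbatches shrink the bubble, which is why the distributed example above passes microbatch_count=16.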

Parameter sweep (manual loop)

MLSYSIM does not provide a built-in sweep function. Instead, use a simple Python loop — this keeps the analysis transparent and gives you full control over what you collect:

import mlsysim
from mlsysim import SingleNodeModel

solver = SingleNodeModel()
targets = [
    mlsysim.Hardware.Cloud.T4,
    mlsysim.Hardware.Cloud.A100,
    mlsysim.Hardware.Cloud.H100,
    mlsysim.Hardware.Cloud.B200,
]

for hw in targets:
    p = solver.solve(model=mlsysim.Models.ResNet50, hardware=hw, batch_size=32)
    print(f"{hw.name:20s}  {p.latency.to('ms'):>8.2f~}  {p.bottleneck}")

Composing Solvers

Real-world questions often require chaining multiple solvers. The output of one solver feeds naturally into the next because all solvers share typed inputs and pint.Quantity-valued outputs.

“Can I serve Llama-70B on 4 H100s within budget?”

  1. ServingModel — check if the model fits in memory and estimate TTFT/ITL.
  2. EconomicsModel — calculate the cost of running that fleet.

“What is the most sustainable way to train GPT-3?”

  1. DistributedModel — find the optimal parallelism configuration.
  2. SustainabilityModel — compare carbon footprint across regions.

“Should I use A100s or H100s for inference?”

  1. SingleNodeModel on A100 — get latency and bottleneck.
  2. SingleNodeModel on H100 — get latency and bottleneck.
  3. EconomicsModel for each — compare cost per query.
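The A100-vs-H100 chain ultimately reduces to cost per query. As a back-of-envelope sketch in plain Python (the latencies and hourly prices below are illustrative assumptions, not solver outputs):

```python
def cost_per_query(latency_s: float, hourly_price_usd: float) -> float:
    """At full utilization: queries/hour = 3600 / latency, so
    cost per query = hourly price / queries per hour."""
    queries_per_hour = 3600 / latency_s
    return hourly_price_usd / queries_per_hour

# Illustrative: A100 at $2.00/hr and 20 ms/query vs. H100 at $4.00/hr and 8 ms/query.
a100 = cost_per_query(latency_s=0.020, hourly_price_usd=2.00)
h100 = cost_per_query(latency_s=0.008, hourly_price_usd=4.00)
print(f"A100: ${a100 * 1000:.4f}/1k queries   H100: ${h100 * 1000:.4f}/1k queries")
```

In a real analysis the latency comes from SingleNodeModel and the hourly price from EconomicsModel; the glue logic stays this simple.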

Textbook Chapter Mapping

Each solver connects to specific chapters in the Machine Learning Systems textbook and corresponding lecture slide decks.

Each lecture deck is available as a direct PDF download; the full slide portal is at mlsysbook.ai/slides.
| Solver | Vol I Chapters (Slides) | Vol II Chapters (Slides) |
|---|---|---|
| SingleNodeModel | Training · HW Acceleration · Benchmarking | Performance Engineering |
| ServingModel | Model Serving | Inference at Scale |
| DistributedModel | | Distributed Training · Collective Communication · Network Fabrics |
| EconomicsModel | | Compute Infrastructure |
| SustainabilityModel | | Sustainable AI |
| ReliabilityModel | | Fault Tolerance |

Tip: Engine.solve() vs. SingleNodeModel

Engine.solve() is a convenience shortcut that produces identical results to SingleNodeModel().solve(). Use Engine.solve() for quick single-node analysis. Use the individual solver classes (ServingModel, DistributedModel, etc.) when you need specialized analyses beyond the roofline.


Why Analytical Solvers?

MLSYSIM is not an empirical profiler (like PyTorch Profiler) or a cycle-accurate simulator (like gem5). It is an analytical modeling platform that computes performance bounds from specifications and first-order equations. This is a deliberate design choice:

  • Speed. Closed-form equations evaluate in microseconds. You can sweep thousands of hardware × model × parallelism configurations in seconds — impossible with empirical profiling.
  • Intuition. By working from equations rather than opaque traces, students see exactly which physical quantity (bandwidth, compute, memory capacity) creates the bottleneck.
  • Accessibility. No hardware required. A laptop running pip install mlsysim gives you the same analysis as a $50,000 GPU cluster.
  • Composability. Because all solvers share typed inputs and pint.Quantity-valued outputs, they can be chained, with one solver's output feeding directly into the next.
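To make the speed point concrete, a configuration sweep over closed-form equations is just a nested loop. The sketch below uses illustrative roofline-style math and made-up device specs, not the MLSYSIM API:

```python
from itertools import product

# Illustrative (device name, peak TFLOP/s, peak TB/s) tuples.
devices = [("T4", 65, 0.32), ("A100", 312, 2.0), ("H100", 990, 3.35)]
batch_sizes = [1, 8, 32]

def latency_ms(flops: float, bytes_moved: float,
               peak_tflops: float, peak_tbs: float) -> float:
    """Roofline bound: latency is the max of compute time and memory time."""
    return max(flops / (peak_tflops * 1e12),
               bytes_moved / (peak_tbs * 1e12)) * 1e3

for (name, tf, tb), bs in product(devices, batch_sizes):
    # ~8.2 GFLOPs and ~0.1 GB of traffic per image for a ResNet-50-like model.
    t = latency_ms(8.2e9 * bs, 0.1e9 * bs, tf, tb)
    print(f"{name:5s} bs={bs:<3d} {t:7.3f} ms")
```

Nine configurations evaluate instantly; scaling the grids to thousands of points changes nothing about the loop.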

Solver Architecture

Every solver follows the same three-step pattern:

  1. Takes typed registry objects (HardwareNode, TransformerWorkload, Fleet, GridProfile) as input. These carry physical units (pint.Quantity), so dimensional errors are caught at runtime.
  2. Applies first-order equations from the Math Foundations page.
  3. Returns typed results — either a PerformanceProfile (for SingleNodeModel) or a dict with Quantity-valued fields (for specialized solvers).

The key principle: every .solve() method is a pure function of its inputs. No hidden state, no side effects, no network calls.


Writing a Custom Solver

You can create your own solver by following the same pattern. Here is a “power efficiency” solver that computes TFLOP/s per watt across the hardware registry:

import mlsysim
from mlsysim.hardware.types import HardwareNode

class PowerEfficiencyModel:
    """Compare hardware on performance-per-watt."""

    def solve(self, hardware: HardwareNode) -> dict:
        if hardware.tdp is None:
            raise ValueError(f"{hardware.name}: no TDP specified")

        flops_per_watt = hardware.compute.peak_flops / hardware.tdp

        return {
            "device": hardware.name,
            "peak_flops": hardware.compute.peak_flops,
            "tdp": hardware.tdp,
            "flops_per_watt": flops_per_watt.to("TFLOPs/s/kW"),
        }

# Use it
solver = PowerEfficiencyModel()

for hw in [mlsysim.Hardware.Cloud.H100, mlsysim.Hardware.Cloud.A100,
           mlsysim.Hardware.Cloud.T4, mlsysim.Hardware.Edge.JetsonOrinNX]:
    r = solver.solve(hw)
    print(f"{r['device']:25s}  {r['flops_per_watt']:>10.1f~}")

Use pint.Quantity for all physical calculations so that unit errors are impossible. For more complex solvers, see the source code for the six built-in solvers.


For the equations behind each solver, see Math Foundations. For full API details, see the Solver API Reference.
