The 3-Tier Resolver Guide

Models evaluate physics. Solvers find limits. Optimizers search for trade-offs.

MLSys·im provides 24 specialized resolvers that map to the 22 Systems Walls organized across six domains — Node, Data, Algorithm, Fleet, Ops, and Analysis.

To make engineering decisions systematic, we organize these tools into a 3-Tier Architecture:

  1. Analytical Models (*Model): The Physics Engine. Given a configuration, evaluate the consequences (\(Y = f(X)\)).
  2. Analysis Solvers (*Solver): The Math Engine. Given a target, algebraically solve for the required input (\(X = f^{-1}(Y)\)).
  3. Optimizers (*Optimizer): The Engineering Engine. Search a design space to maximize or minimize an objective (\(\max f(X)\)).
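
The three tiers can be illustrated with a toy roofline latency function. This is a minimal sketch that does not call mlsysim: the function names and the round hardware numbers below are hypothetical and chosen only to show the shape of each tier.

```python
# Toy illustration of the three tiers. Nothing here is mlsysim API;
# every name and number is a hypothetical placeholder.

def latency_s(batch, flops_per_sample, weight_bytes, peak_flops, mem_bw):
    """Tier 1, model: evaluate Y = f(X) with a simple roofline."""
    compute_s = batch * flops_per_sample / peak_flops
    memory_s = weight_bytes / mem_bw  # weights are read once per step
    return max(compute_s, memory_s)


def required_bandwidth(weight_bytes, target_latency_s):
    """Tier 2, solver: invert the memory term, X = f^-1(Y)."""
    return weight_bytes / target_latency_s


def largest_batch_within(sla_s, candidates, **workload):
    """Tier 3, optimizer: search candidates for the largest feasible batch."""
    feasible = [b for b in candidates if latency_s(b, **workload) <= sla_s]
    return max(feasible, default=None)


workload = dict(flops_per_sample=16e9, weight_bytes=16e9,
                peak_flops=1e15, mem_bw=2e12)

print(latency_s(1, **workload))            # forward evaluation, in seconds
print(required_bandwidth(16e9, 0.004))     # bytes/s needed to hit 4 ms
print(largest_batch_within(0.010, [1, 8, 64, 512, 1024], **workload))
```

The same forward function is reused by all three tiers: the solver inverts it in closed form, while the optimizer calls it repeatedly over a candidate set.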
Tip: Quick Start

For most use cases, start with Engine.solve() — it wraps SingleNodeModel with a clean API. Graduate to individual solvers when you need to compose multi-domain analyses.


1. Analytical Models (The Physics Engine)

Use these when you want to ask: “What happens if I run this exact setup?”

Domain 1 — Node (Single-Accelerator Resources)

| Model | Key Inputs | Key Outputs | Best For |
|---|---|---|---|
| SingleNodeModel | model, hardware, batch_size | latency, throughput, bottleneck | “Is my model memory-bound?” |
| EfficiencyModel | model, hardware, workload_type | MFU, achievable FLOPS | “What MFU will my workload achieve?” |
| ServingModel | model, hardware, seq_len | TTFT, ITL, KV-cache footprint | “Can I serve this LLM on this GPU?” |
| ContinuousBatchingModel | model, hardware, seq_len, max_batch | throughput, fragmentation | “What throughput with PagedAttention?” |
| WeightStreamingModel | model, hardware, seq_len, batch_size | throughput, optimal_batch | “Cerebras wafer-scale inference?” |
| TailLatencyModel | arrival_rate, service_latency, replicas | P50, P99 wait times | “Will I meet P99 latency SLAs?” |
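
As a sanity check on the "KV-cache footprint" output, the standard sizing formula can be computed by hand. The sketch below is independent of mlsysim, and whether ServingModel uses exactly this formula is an assumption. The shape values (80 layers, 8 key/value heads under grouped-query attention, head dimension 128) are taken from the published Llama-3-70B configuration; `kv_cache_bytes` is a hypothetical helper name.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim,
                   seq_len, batch_size, bytes_per_elem=2):
    """KV-cache size for a decoder-only transformer.

    The leading factor of 2 accounts for storing both keys and values
    at every layer; bytes_per_elem=2 corresponds to fp16/bf16.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


# Shape of Llama-3-70B (grouped-query attention with 8 KV heads).
llama3_70b = dict(num_layers=80, num_kv_heads=8, head_dim=128)

footprint = kv_cache_bytes(**llama3_70b, seq_len=2048, batch_size=4)
print(f"{footprint / 2**30:.2f} GiB")  # 2.50 GiB
```

At fp16 this works out to 320 KiB of cache per token, so the footprint grows linearly with both sequence length and batch size.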

Domain 2 — Data (Movement & Pipelines)

| Model | Key Inputs | Key Outputs | Best For |
|---|---|---|---|
| DataModel | workload_data_rate, hardware | utilization, is_stalled | “Is my storage/IO the bottleneck?” |
| TransformationModel | batch_size, cpu_throughput | transform_time, is_bottleneck | “Is CPU preprocessing starving my GPU?” |
| TopologyModel | fabric, topology, num_nodes | effective_bw, bisection_bw | “What topology should I use?” |
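
The first two questions in this table reduce to simple rate comparisons. The following is a minimal sketch of that reasoning, not the mlsysim implementation: the function names, argument names, and example numbers are all hypothetical.

```python
def io_utilization(required_bytes_per_s, available_bytes_per_s):
    """Return (utilization, is_stalled) for a storage or I/O link.

    Utilization above 1.0 means the workload asks for more data per
    second than the link can deliver, so the accelerator stalls.
    """
    utilization = required_bytes_per_s / available_bytes_per_s
    return utilization, utilization > 1.0


def preprocessing_check(batch_size, cpu_samples_per_s, accelerator_step_s):
    """Return (transform_time_s, is_bottleneck) for CPU preprocessing.

    Preprocessing is the bottleneck when preparing one batch takes
    longer than the accelerator needs to consume it.
    """
    transform_time_s = batch_size / cpu_samples_per_s
    return transform_time_s, transform_time_s > accelerator_step_s


# Hypothetical workload: needs 6 GB/s from a 4 GB/s storage tier.
print(io_utilization(6e9, 4e9))
# Hypothetical pipeline: 256-sample batches, 2000 samples/s of CPU
# preprocessing, 100 ms accelerator step.
print(preprocessing_check(256, 2000.0, 0.100))
```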

Domain 3 — Algorithm (Scaling & Compression)

| Model | Key Inputs | Key Outputs | Best For |
|---|---|---|---|
| ScalingModel | compute_budget | optimal_params, optimal_tokens | “What is my optimal model size?” |
| InferenceScalingModel | model, hardware, reasoning_steps | total_reasoning_time | “How much does CoT reasoning cost?” |
| CompressionModel | model, hardware, method | accuracy_delta, compression_ratio | “Is quantization/pruning worth it?” |
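
To give a feel for the compute_budget to (optimal_params, optimal_tokens) mapping, here is a sketch based on the widely used Chinchilla rule of thumb: training compute of roughly 6 FLOPs per parameter per token, and roughly 20 training tokens per parameter. That ScalingModel uses these exact constants is an assumption; `compute_optimal` is a hypothetical helper.

```python
import math


def compute_optimal(compute_budget_flops, tokens_per_param=20.0):
    """Split a FLOP budget into parameters N and tokens D.

    Uses C = 6 * N * D together with D = r * N, which gives
    N = sqrt(C / (6 * r)). Both constants are rules of thumb.
    """
    params = math.sqrt(compute_budget_flops / (6.0 * tokens_per_param))
    tokens = tokens_per_param * params
    return params, tokens


params, tokens = compute_optimal(1e24)
print(f"params: {params / 1e9:.1f} B, tokens: {tokens / 1e12:.2f} T")
```

Because both N and D scale with the square root of the budget, a 100x larger budget suggests a model only about 10x larger, trained on about 10x more tokens.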

Domain 4 — Fleet (Multi-Node Coordination)

| Model | Key Inputs | Key Outputs | Best For |
|---|---|---|---|
| DistributedModel | model, fleet, tp/pp/dp sizes | scaling efficiency, comm overhead | “How many GPUs do I actually need?” |
| ReliabilityModel | fleet, job_duration | MTBF, failure probability | “Will my training job complete?” |
| OrchestrationModel | fleet, arrival_rate, avg_duration | avg_wait_time, utilization | “How busy is my cluster?” |
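
The "Will my training job complete?" question can be approximated with a textbook reliability model. The sketch below assumes independent node failures with exponentially distributed times to failure. Whether ReliabilityModel makes the same assumptions is not confirmed here, and the node MTBF used in the example is a hypothetical value.

```python
import math


def fleet_mtbf_hours(node_mtbf_hours, num_nodes):
    """With independent exponential failures, failure rates add,
    so the fleet MTBF shrinks linearly with node count."""
    return node_mtbf_hours / num_nodes


def failure_probability(job_hours, node_mtbf_hours, num_nodes):
    """Probability that at least one node fails during the job."""
    mtbf = fleet_mtbf_hours(node_mtbf_hours, num_nodes)
    return 1.0 - math.exp(-job_hours / mtbf)


# Hypothetical fleet: 1024 nodes, 50,000 h MTBF each, one-week job.
p = failure_probability(job_hours=168, node_mtbf_hours=50_000, num_nodes=1024)
print(f"fleet MTBF: {fleet_mtbf_hours(50_000, 1024):.1f} h, P(failure): {p:.3f}")
```

Even with very reliable individual nodes, a week-long job on a large fleet is close to certain to see at least one failure, which is why checkpointing matters.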

Domain 5 — Ops (Economics, Sustainability & Safety)

| Model | Key Inputs | Key Outputs | Best For |
|---|---|---|---|
| EconomicsModel | fleet, duration_days, kwh_price | CapEx, OpEx, total TCO | “What will this cost over 3 years?” |
| SustainabilityModel | fleet, duration_days, datacenter | energy, carbon (kg CO₂e), water | “Where should I train to minimize carbon?” |
| CheckpointModel | model, hardware, optimizer | checkpoint_size, MFU penalty | “How much MFU do I lose to checkpoints?” |
| ResponsibleEngineeringModel | base_training_time, epsilon | dp_slowdown | “What is the cost of differential privacy?” |
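
For the checkpoint question, a common back-of-the-envelope estimate is Young's approximation for the checkpoint interval. This sketch assumes synchronous checkpoints that pause training while they are written, and it ignores recomputation after a failure. It is not necessarily how CheckpointModel derives its MFU penalty, and the 60-second write time and 24-hour MTBF are hypothetical inputs.

```python
import math


def young_interval_s(checkpoint_write_s, mtbf_s):
    """Young's approximation: interval = sqrt(2 * write_time * MTBF)."""
    return math.sqrt(2.0 * checkpoint_write_s * mtbf_s)


def checkpoint_overhead_fraction(checkpoint_write_s, interval_s):
    """Share of wall-clock time spent writing checkpoints
    (rework after failures is not included)."""
    return checkpoint_write_s / (interval_s + checkpoint_write_s)


interval = young_interval_s(checkpoint_write_s=60.0, mtbf_s=24 * 3600.0)
overhead = checkpoint_overhead_fraction(60.0, interval)
print(f"checkpoint every {interval / 60:.1f} min, overhead {overhead:.2%}")
```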

2. Analysis Solvers (The Math Engine)

Use these when you want to ask: “What exact number do I need to hit my target?”

| Solver | Key Inputs | Key Outputs | Best For |
|---|---|---|---|
| SensitivitySolver | model, hardware, perturbation_pct | sensitivities, binding_constraint | “Which parameter should I invest in?” |
| SynthesisSolver | model, target_latency | required_bw, required_flops | “What hardware do I need for this SLA?” |
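
One way to read the sensitivities and binding_constraint outputs is as finite-difference elasticities: perturb one hardware parameter by a fixed percentage and measure the relative change in latency. The sketch below applies this to a toy roofline function. It is an assumption that SensitivitySolver defines sensitivity this way, and all names and numbers here are hypothetical.

```python
def latency_s(p):
    """Toy roofline: latency is the slower of compute and memory time."""
    return max(p["flops"] / p["peak_flops"], p["bytes"] / p["mem_bw"])


def sensitivities(f, params, names, perturbation=0.10):
    """Relative change in f per relative change in each named parameter."""
    base = f(params)
    result = {}
    for name in names:
        # Copy the params with exactly one entry scaled up.
        bumped = dict(params, **{name: params[name] * (1.0 + perturbation)})
        result[name] = ((f(bumped) - base) / base) / perturbation
    return result


params = {"flops": 16e9, "bytes": 16e9, "peak_flops": 1e15, "mem_bw": 2e12}
s = sensitivities(latency_s, params, ["peak_flops", "mem_bw"])

# The binding constraint is the parameter latency responds to most.
binding = max(s, key=lambda name: abs(s[name]))
print(s, binding)
```

In this memory-bound example, extra peak FLOPS changes nothing while extra bandwidth cuts latency almost proportionally, so bandwidth is the parameter worth investing in.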

3. Design Space Exploration (DSE)

Use the DSE Engine when you want to ask: “What is the best possible configuration?” It replaces the old hardcoded optimizers.

| Engine | Key Inputs | Objective | Best For |
|---|---|---|---|
| DSE Engine | search space, objective, constraints | minimize/maximize | “What max batch size is safe for my SLA?” or “Where should I build my datacenter?” |
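
The three inputs in this table (search space, objective, constraints) map naturally onto an exhaustive grid search. The following is a minimal sketch of that idea and not the DSE Engine's actual interface. `explore`, the toy latency model, and the serving targets are all hypothetical.

```python
from itertools import product


def explore(space, objective, constraints=(), maximize=False):
    """Exhaustively search a grid; return (best_config, best_score)."""
    names = list(space)
    best_cfg, best_score = None, None
    for values in product(*(space[name] for name in names)):
        cfg = dict(zip(names, values))
        if not all(check(cfg) for check in constraints):
            continue  # infeasible point, skip it
        score = objective(cfg)
        if best_score is None or (
            score > best_score if maximize else score < best_score
        ):
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


def latency_s(cfg):
    # Toy model: 5 ms fixed cost plus 2 ms per sample in the batch.
    return 0.005 + 0.002 * cfg["batch_size"]


def throughput_rps(cfg):
    return cfg["replicas"] * cfg["batch_size"] / latency_s(cfg)


space = {"batch_size": [1, 2, 4, 8, 16, 32], "replicas": [1, 2, 3, 4]}

# Fewest replicas first, then lowest latency as a tie-breaker.
best, score = explore(
    space,
    objective=lambda cfg: (cfg["replicas"], latency_s(cfg)),
    constraints=[
        lambda cfg: latency_s(cfg) <= 0.050,        # 50 ms SLA
        lambda cfg: throughput_rps(cfg) >= 1000.0,  # target load
    ],
)
print(best, score)
```

Returning a tuple from the objective gives lexicographic ordering, which is a simple way to express a primary goal with a tie-breaker without introducing weights.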

Composing Resolvers

Real-world questions often require chaining resolvers from multiple tiers. Here are three common starting points:

“Can I serve Llama-70B on 4 H100s within budget?”

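from mlsysim import ServingModel, EconomicsModel, Hardware, Models, Systems

# Serving step only: evaluate latency and KV-cache footprint for this
# model, hardware, sequence length, and batch size.
# The fleet-size and budget check (EconomicsModel, Systems) is not shown here.
serving = ServingModel().solve(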
    model=Models.Language.Llama3_70B,
    hardware=Hardware.Cloud.H100,
    seq_len=2048, batch_size=4
)
print(f"TTFT: {serving.ttft}, ITL: {serving.itl}")
print(f"KV-Cache: {serving.kv_cache_size}")

“Which parameter matters most for Llama-8B latency?”

from mlsysim import SensitivitySolver, Hardware, Models

# Perturb the hardware parameters and report which one latency responds to most.
sensitivity = SensitivitySolver().solve(
    model=Models.Language.Llama3_8B,
    hardware=Hardware.Cloud.H100,
    batch_size=1, precision="fp16",
    perturbation_pct=0.10
)
print(f"Binding constraint: {sensitivity.binding_constraint}")
print(f"BW sensitivity:     {sensitivity.bw_sensitivity:.3f}")
print(f"FLOPS sensitivity:  {sensitivity.flops_sensitivity:.3f}")

“What hardware do I need for a 50ms SLA?”

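from mlsysim import SynthesisSolver, Models
from mlsysim.core.constants import Q_  # unit-aware quantity constructor

# Work backward from the latency target to the hardware it implies.
synthesis = SynthesisSolver().solve(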
    model=Models.Language.Llama3_8B,
    target_latency=Q_("50 ms"),
    batch_size=1, precision="fp16"
)
print(f"Required bandwidth: {synthesis.required_bw.to('TB/s'):.2f}")
print(f"Required FLOPS:     {synthesis.required_flops.to('TFLOPs/s'):.1f}")