# The 3-Tier Resolver Guide
Models evaluate physics. Solvers find limits. Optimizers search for trade-offs.
MLSys·im provides 24 specialized resolvers that map to the 22 Systems Walls organized across six domains — Node, Data, Algorithm, Fleet, Ops, and Analysis.
To make engineering decisions systematic, we organize these tools into a 3-Tier Architecture:
- Analytical Models (`*Model`): The Physics Engine. Given a configuration, evaluate the consequences (\(Y = f(X)\)).
- Analysis Solvers (`*Solver`): The Math Engine. Given a target, algebraically solve for the required input (\(X = f^{-1}(Y)\)).
- Optimizers (`*Optimizer`): The Engineering Engine. Search a design space to maximize or minimize an objective (\(\max f(X)\)).
For most use cases, start with Engine.solve() — it wraps SingleNodeModel with a clean API. Graduate to individual solvers when you need to compose multi-domain analyses.
## 1. Analytical Models (The Physics Engine)
Use these when you want to ask: “What happens if I run this exact setup?”
### Domain 1 — Node (Single-Accelerator Resources)
| Model | Key Inputs | Key Outputs | Best For |
|---|---|---|---|
| `SingleNodeModel` | `model`, `hardware`, `batch_size` | latency, throughput, bottleneck | “Is my model memory-bound?” |
| `EfficiencyModel` | `model`, `hardware`, `workload_type` | MFU, achievable FLOPS | “What MFU will my workload achieve?” |
| `ServingModel` | `model`, `hardware`, `seq_len` | TTFT, ITL, KV-cache footprint | “Can I serve this LLM on this GPU?” |
| `ContinuousBatchingModel` | `model`, `hardware`, `seq_len`, `max_batch` | throughput, fragmentation | “What throughput with PagedAttention?” |
| `WeightStreamingModel` | `model`, `hardware`, `seq_len`, `batch_size` | throughput, optimal_batch | “Cerebras wafer-scale inference?” |
| `TailLatencyModel` | `arrival_rate`, `service_latency`, `replicas` | P50, P99 wait times | “Will I meet P99 latency SLAs?” |
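For intuition, the core of what `SingleNodeModel` evaluates can be sketched as a simple roofline: latency is the larger of compute time and memory-transfer time, and whichever dominates is the bottleneck. The function and device numbers below are illustrative assumptions, not the library's implementation:

```python
# Illustrative roofline: latency is the max of compute time and memory time.
def roofline(flops, bytes_moved, peak_flops, peak_bw):
    compute_time = flops / peak_flops    # seconds spent on math
    memory_time = bytes_moved / peak_bw  # seconds spent moving data
    bottleneck = "compute" if compute_time > memory_time else "memory"
    return max(compute_time, memory_time), bottleneck

# Hypothetical batch-1 decode step: ~16 GB of fp16 weights read per token,
# ~16 GFLOPs of work, on an H100-like device (~989 TFLOP/s, ~3.35 TB/s).
latency, bound = roofline(16e9, 16e9, 989e12, 3.35e12)
print(f"{latency * 1e3:.2f} ms/token, {bound}-bound")
```

At batch 1, decode reads every weight once per token, so bandwidth, not FLOPS, sets the latency floor in this toy example.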
### Domain 2 — Data (Movement & Pipelines)
| Model | Key Inputs | Key Outputs | Best For |
|---|---|---|---|
| `DataModel` | `workload_data_rate`, `hardware` | utilization, is_stalled | “Is my storage/IO the bottleneck?” |
| `TransformationModel` | `batch_size`, `cpu_throughput` | transform_time, is_bottleneck | “Is CPU preprocessing starving my GPU?” |
| `TopologyModel` | `fabric`, `topology`, `num_nodes` | effective_bw, bisection_bw | “What topology should I use?” |
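The `TransformationModel` question (“Is CPU preprocessing starving my GPU?”) reduces to comparing per-batch CPU transform time against the GPU step time. A minimal sketch, with made-up throughput numbers rather than the library's model:

```python
# Illustrative stall check: the accelerator is starved whenever the CPU
# takes longer to prepare a batch than the GPU takes to consume one.
def pipeline_check(batch_size, cpu_throughput, gpu_step_time):
    transform_time = batch_size / cpu_throughput  # seconds per batch on CPU
    is_bottleneck = transform_time > gpu_step_time
    return transform_time, is_bottleneck

# 256 samples/batch through a 2,000 samples/s CPU pipeline vs. a 100 ms GPU step:
t, stalled = pipeline_check(256, 2000, 0.100)
print(f"transform: {t * 1e3:.0f} ms, GPU starved: {stalled}")
```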
### Domain 3 — Algorithm (Scaling & Compression)
| Model | Key Inputs | Key Outputs | Best For |
|---|---|---|---|
| `ScalingModel` | `compute_budget` | optimal_params, optimal_tokens | “What is my optimal model size?” |
| `InferenceScalingModel` | `model`, `hardware`, `reasoning_steps` | total_reasoning_time | “How much does CoT reasoning cost?” |
| `CompressionModel` | `model`, `hardware`, `method` | accuracy_delta, compression_ratio | “Is quantization/pruning worth it?” |
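As a sketch of the kind of rule `ScalingModel` encodes, the Chinchilla heuristic (roughly 20 training tokens per parameter, with compute \(C \approx 6ND\)) gives a closed form for compute-optimal sizing. This is the textbook approximation, not necessarily the library's fitted law:

```python
import math

# Illustrative compute-optimal sizing: with C = 6*N*D and D = 20*N,
# solving for N gives N = sqrt(C / 120).
def compute_optimal(compute_budget_flops):
    n_params = math.sqrt(compute_budget_flops / 120)
    n_tokens = 20 * n_params
    return n_params, n_tokens

# A 1e24-FLOP training budget:
n, d = compute_optimal(1e24)
print(f"~{n / 1e9:.1f}B params, ~{d / 1e12:.2f}T tokens")
```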
### Domain 4 — Fleet (Multi-Node Coordination)
| Model | Key Inputs | Key Outputs | Best For |
|---|---|---|---|
| `DistributedModel` | `model`, `fleet`, tp/pp/dp sizes | scaling efficiency, comm overhead | “How many GPUs do I actually need?” |
| `ReliabilityModel` | `fleet`, `job_duration` | MTBF, failure probability | “Will my training job complete?” |
| `OrchestrationModel` | `fleet`, `arrival_rate`, `avg_duration` | avg_wait_time, utilization | “How busy is my cluster?” |
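The `ReliabilityModel` question has a compact closed form under the usual exponential-failure assumption: fleet MTBF shrinks as 1/N, and the chance of at least one failure during a job follows from the exponential CDF. A sketch with illustrative numbers:

```python
import math

# Illustrative reliability estimate: N nodes failing independently at rate
# 1/node_mtbf give a fleet MTBF of node_mtbf/N, and the job-failure
# probability is 1 - exp(-job_hours / fleet_mtbf).
def job_failure_prob(num_nodes, node_mtbf_hours, job_hours):
    fleet_mtbf = node_mtbf_hours / num_nodes
    return fleet_mtbf, 1 - math.exp(-job_hours / fleet_mtbf)

# 1,024 nodes with a 50,000-hour per-node MTBF running a 7-day job:
mtbf, p_fail = job_failure_prob(1024, 50_000, 7 * 24)
print(f"fleet MTBF: {mtbf:.1f} h, P(at least one failure): {p_fail:.1%}")
```

At this scale an interruption-free run is unlikely, which is exactly why checkpointing (Domain 5) matters.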
### Domain 5 — Ops (Economics, Sustainability & Safety)
| Model | Key Inputs | Key Outputs | Best For |
|---|---|---|---|
| `EconomicsModel` | `fleet`, `duration_days`, `kwh_price` | CapEx, OpEx, total TCO | “What will this cost over 3 years?” |
| `SustainabilityModel` | `fleet`, `duration_days`, `datacenter` | energy, carbon (kg CO₂e), water | “Where should I train to minimize carbon?” |
| `CheckpointModel` | `model`, `hardware`, `optimizer` | checkpoint_size, MFU penalty | “How much MFU do I lose to checkpoints?” |
| `ResponsibleEngineeringModel` | `base_training_time`, `epsilon` | dp_slowdown | “What is the cost of differential privacy?” |
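A back-of-the-envelope version of what `EconomicsModel` computes: amortize CapEx over the hardware's useful life, then add energy OpEx for the run. Every price and the 3-year amortization period below are illustrative assumptions:

```python
# Illustrative TCO: amortized hardware cost plus energy cost for the run.
def tco(num_gpus, gpu_price, power_kw_per_gpu, kwh_price, duration_days,
        amortization_years=3):
    capex = num_gpus * gpu_price
    capex_share = capex * duration_days / (amortization_years * 365)
    energy_kwh = num_gpus * power_kw_per_gpu * 24 * duration_days
    opex = energy_kwh * kwh_price
    return capex_share + opex

# 512 GPUs at $30k each, 0.7 kW apiece, $0.10/kWh, for a 90-day run:
total = tco(512, 30_000, 0.7, 0.10, 90)
print(f"90-day TCO: ${total:,.0f}")
```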
## 2. Analysis Solvers (The Math Engine)
Use these when you want to ask: “What exact number do I need to hit my target?”
| Solver | Key Inputs | Key Outputs | Best For |
|---|---|---|---|
| `SensitivitySolver` | `model`, `hardware`, `perturbation_pct` | sensitivities, binding_constraint | “Which parameter should I invest in?” |
| `SynthesisSolver` | `model`, `target_latency` | required_bw, required_flops | “What hardware do I need for this SLA?” |
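A sensitivity analysis of this kind can be sketched with finite differences: perturb each input by `perturbation_pct`, re-evaluate, and normalize to an elasticity. The generic helper and the toy memory-bound latency model below are illustrative, not the solver's actual internals:

```python
# Illustrative finite-difference sensitivity: for each parameter, bump it by
# pct, re-evaluate, and report the normalized elasticity. The parameter with
# the largest magnitude is the binding constraint.
def sensitivities(f, params, pct=0.10):
    base = f(**params)
    out = {}
    for name, value in params.items():
        bumped = dict(params, **{name: value * (1 + pct)})
        out[name] = ((f(**bumped) - base) / base) / pct
    return out

# Toy memory-bound latency model on H100-like numbers: bandwidth dominates.
latency = lambda bw, flops: 16e9 / bw + 16e9 / flops
s = sensitivities(latency, {"bw": 3.35e12, "flops": 989e12})
print(s)  # bw elasticity ~ -0.9, flops elasticity ~ -0.003
```

The negative elasticities say more bandwidth or FLOPS reduces latency; here nearly all the leverage is in bandwidth, so it is the binding constraint.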
## 3. Design Space Exploration (DSE)
Use the DSE Engine when you want to ask: “What is the best possible configuration?” It replaces the old hardcoded optimizers.
| Engine | Key Inputs | Objective | Best For |
|---|---|---|---|
| DSE Engine | search space, objective, constraints | minimize/maximize | “What max batch size is safe for my SLA?” or “Where should I build my datacenter?” |
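In its simplest form, DSE is an exhaustive search over a small space with a feasibility constraint. A sketch of the “max safe batch size” question, with made-up latency numbers standing in for a model's predictions:

```python
# Illustrative DSE: among batch sizes whose modeled latency meets the SLA,
# pick the largest (maximize throughput subject to the latency constraint).
def best_batch(latencies_ms, sla_ms):
    feasible = {b: t for b, t in latencies_ms.items() if t <= sla_ms}
    return max(feasible) if feasible else None

# Hypothetical modeled per-batch latencies (ms) against a 50 ms SLA:
space = {1: 12.0, 2: 19.0, 4: 31.0, 8: 48.0, 16: 85.0}
print(best_batch(space, 50.0))  # -> 8
```

In the real engine the objective and constraints are declared rather than hardcoded, but the search-evaluate-filter loop is the same idea.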
## Composing Resolvers
Real-world questions require chaining multiple tiers. Here are three common patterns:
### “Can I serve Llama-70B on 4 H100s within budget?”

```python
from mlsysim import ServingModel, EconomicsModel, Hardware, Models, Systems

serving = ServingModel().solve(
    model=Models.Language.Llama3_70B,
    hardware=Hardware.Cloud.H100,
    seq_len=2048, batch_size=4
)
print(f"TTFT: {serving.ttft}, ITL: {serving.itl}")
print(f"KV-Cache: {serving.kv_cache_size}")
```

### “Which parameter matters most for Llama-8B latency?”
```python
from mlsysim import SensitivitySolver, Hardware, Models

sensitivity = SensitivitySolver().solve(
    model=Models.Language.Llama3_8B,
    hardware=Hardware.Cloud.H100,
    batch_size=1, precision="fp16",
    perturbation_pct=0.10
)
print(f"Binding constraint: {sensitivity.binding_constraint}")
print(f"BW sensitivity: {sensitivity.bw_sensitivity:.3f}")
print(f"FLOPS sensitivity: {sensitivity.flops_sensitivity:.3f}")
```

### “What hardware do I need for a 50ms SLA?”
```python
from mlsysim import SynthesisSolver, Models
from mlsysim.core.constants import Q_

synthesis = SynthesisSolver().solve(
    model=Models.Language.Llama3_8B,
    target_latency=Q_("50 ms"),
    batch_size=1, precision="fp16"
)
print(f"Required bandwidth: {synthesis.required_bw.to('TB/s'):.2f}")
print(f"Required FLOPS: {synthesis.required_flops.to('TFLOPs/s'):.1f}")
```