# The 3-Tier Resolver Guide
Models evaluate physics. Solvers find limits. Optimizers search for trade-offs.
MLSys·im provides 24 specialized resolvers that map to the 22 Systems Walls organized across six domains — Node, Data, Algorithm, Fleet, Ops, and Analysis.
To make engineering decisions systematic, we organize these tools into a 3-Tier Architecture:
- Analytical Models (`*Model`): The Physics Engine. Given a configuration, evaluate the consequences (\(Y = f(X)\)).
- Analysis Solvers (`*Solver`): The Math Engine. Given a target, algebraically solve for the required input (\(X = f^{-1}(Y)\)).
- Optimizers (`*Optimizer`): The Engineering Engine. Search a design space to maximize or minimize an objective (\(\max f(X)\)).
For most use cases, start with Engine.solve() — it wraps SingleNodeModel with a clean API. Graduate to individual solvers when you need to compose multi-domain analyses.
## 1. Analytical Models (The Physics Engine)
Use these when you want to ask: “What happens if I run this exact setup?”
### Domain 1 — Node (Single-Accelerator Resources)
| Model | Key Inputs | Key Outputs | Best For |
|---|---|---|---|
| `SingleNodeModel` | `model`, `hardware`, `batch_size` | latency, throughput, bottleneck | “Is my model memory-bound?” |
| `EfficiencyModel` | `model`, `hardware`, `workload_type` | MFU, achievable FLOPS | “What MFU will my workload achieve?” |
| `ServingModel` | `model`, `hardware`, `seq_len` | TTFT, ITL, KV-cache footprint | “Can I serve this LLM on this GPU?” |
| `ContinuousBatchingModel` | `model`, `hardware`, `seq_len`, `max_batch` | throughput, fragmentation | “What throughput with PagedAttention?” |
| `WeightStreamingModel` | `model`, `hardware`, `seq_len`, `batch_size` | throughput, optimal_batch | “Cerebras wafer-scale inference?” |
| `TailLatencyModel` | `arrival_rate`, `service_latency`, `replicas` | P50, P99 wait times | “Will I meet P99 latency SLAs?” |
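For intuition, the core of what `SingleNodeModel` evaluates can be sketched as a simple roofline: latency is the larger of compute time and memory-transfer time, and whichever dominates is the bottleneck. The function and device numbers below are illustrative assumptions, not the library's implementation:

```python
# Illustrative roofline: latency is the max of compute time and memory time.
def roofline(flops, bytes_moved, peak_flops, peak_bw):
    compute_time = flops / peak_flops    # seconds spent on math
    memory_time = bytes_moved / peak_bw  # seconds spent moving data
    bottleneck = "compute" if compute_time > memory_time else "memory"
    return max(compute_time, memory_time), bottleneck

# Hypothetical batch-1 decode step: ~16 GB of fp16 weights read per token,
# ~16 GFLOPs of work, on an H100-like device (~989 TFLOP/s, ~3.35 TB/s).
latency, bound = roofline(16e9, 16e9, 989e12, 3.35e12)
print(f"{latency * 1e3:.2f} ms/token, {bound}-bound")
```

At batch 1, decode reads every weight once per token, so bandwidth, not FLOPS, sets the latency floor in this toy example.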
### Domain 2 — Data (Movement & Pipelines)
| Model | Key Inputs | Key Outputs | Best For |
|---|---|---|---|
| `DataModel` | `workload_data_rate`, `hardware` | utilization, is_stalled | “Is my storage/IO the bottleneck?” |
| `TransformationModel` | `batch_size`, `cpu_throughput` | transform_time, is_bottleneck | “Is CPU preprocessing starving my GPU?” |
| `TopologyModel` | `fabric`, `topology`, `num_nodes` | effective_bw, bisection_bw | “What topology should I use?” |
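The `TransformationModel` question (“Is CPU preprocessing starving my GPU?”) reduces to comparing per-batch CPU transform time against the GPU step time. A minimal sketch, with made-up throughput numbers rather than the library's model:

```python
# Illustrative stall check: the accelerator is starved whenever the CPU
# takes longer to prepare a batch than the GPU takes to consume one.
def pipeline_check(batch_size, cpu_throughput, gpu_step_time):
    transform_time = batch_size / cpu_throughput  # seconds per batch on CPU
    is_bottleneck = transform_time > gpu_step_time
    return transform_time, is_bottleneck

# 256 samples/batch through a 2,000 samples/s CPU pipeline vs. a 100 ms GPU step:
t, stalled = pipeline_check(256, 2000, 0.100)
print(f"transform: {t * 1e3:.0f} ms, GPU starved: {stalled}")
```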
### Domain 3 — Algorithm (Scaling & Compression)
| Model | Key Inputs | Key Outputs | Best For |
|---|---|---|---|
| `ScalingModel` | `compute_budget` | optimal_params, optimal_tokens | “What is my optimal model size?” |
| `InferenceScalingModel` | `model`, `hardware`, `reasoning_steps` | total_reasoning_time | “How much does CoT reasoning cost?” |
| `CompressionModel` | `model`, `hardware`, `method` | accuracy_delta, compression_ratio | “Is quantization/pruning worth it?” |
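As a sketch of the kind of rule `ScalingModel` encodes, the Chinchilla heuristic (roughly 20 training tokens per parameter, with compute \(C \approx 6ND\)) gives a closed form for compute-optimal sizing. This is the textbook approximation, not necessarily the library's fitted law:

```python
import math

# Illustrative compute-optimal sizing: with C = 6*N*D and D = 20*N,
# solving for N gives N = sqrt(C / 120).
def compute_optimal(compute_budget_flops):
    n_params = math.sqrt(compute_budget_flops / 120)
    n_tokens = 20 * n_params
    return n_params, n_tokens

# A 1e24-FLOP training budget:
n, d = compute_optimal(1e24)
print(f"~{n / 1e9:.1f}B params, ~{d / 1e12:.2f}T tokens")
```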
### Domain 4 — Fleet (Multi-Node Coordination)
| Model | Key Inputs | Key Outputs | Best For |
|---|---|---|---|
| `DistributedModel` | `model`, `fleet`, tp/pp/dp sizes | scaling efficiency, comm overhead | “How many GPUs do I actually need?” |
| `ReliabilityModel` | `fleet`, `job_duration` | MTBF, failure probability | “Will my training job complete?” |
| `OrchestrationModel` | `fleet`, `arrival_rate`, `avg_duration` | avg_wait_time, utilization | “How busy is my cluster?” |
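The `ReliabilityModel` question has a compact closed form under the usual exponential-failure assumption: fleet MTBF shrinks as 1/N, and the chance of at least one failure during a job follows from the exponential CDF. A sketch with illustrative numbers:

```python
import math

# Illustrative reliability estimate: N nodes failing independently at rate
# 1/node_mtbf give a fleet MTBF of node_mtbf/N, and the job-failure
# probability is 1 - exp(-job_hours / fleet_mtbf).
def job_failure_prob(num_nodes, node_mtbf_hours, job_hours):
    fleet_mtbf = node_mtbf_hours / num_nodes
    return fleet_mtbf, 1 - math.exp(-job_hours / fleet_mtbf)

# 1,024 nodes with a 50,000-hour per-node MTBF running a 7-day job:
mtbf, p_fail = job_failure_prob(1024, 50_000, 7 * 24)
print(f"fleet MTBF: {mtbf:.1f} h, P(at least one failure): {p_fail:.1%}")
```

At this scale an interruption-free run is unlikely, which is exactly why checkpointing (Domain 5) matters.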
### Domain 5 — Ops (Economics, Sustainability & Safety)
| Model | Key Inputs | Key Outputs | Best For |
|---|---|---|---|
| `EconomicsModel` | `fleet`, `duration_days`, `kwh_price` | CapEx, OpEx, total TCO | “What will this cost over 3 years?” |
| `SustainabilityModel` | `fleet`, `duration_days`, `datacenter` | energy, carbon (kg CO₂e), water | “Where should I train to minimize carbon?” |
| `CheckpointModel` | `model`, `hardware`, `optimizer` | checkpoint_size, MFU penalty | “How much MFU do I lose to checkpoints?” |
| `ResponsibleEngineeringModel` | `base_training_time`, `epsilon` | dp_slowdown | “What is the cost of differential privacy?” |
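A back-of-the-envelope version of what `EconomicsModel` computes: amortize CapEx over the hardware's useful life, then add energy OpEx for the run. Every price and the 3-year amortization period below are illustrative assumptions:

```python
# Illustrative TCO: amortized hardware cost plus energy cost for the run.
def tco(num_gpus, gpu_price, power_kw_per_gpu, kwh_price, duration_days,
        amortization_years=3):
    capex = num_gpus * gpu_price
    capex_share = capex * duration_days / (amortization_years * 365)
    energy_kwh = num_gpus * power_kw_per_gpu * 24 * duration_days
    opex = energy_kwh * kwh_price
    return capex_share + opex

# 512 GPUs at $30k each, 0.7 kW apiece, $0.10/kWh, for a 90-day run:
total = tco(512, 30_000, 0.7, 0.10, 90)
print(f"90-day TCO: ${total:,.0f}")
```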
## 2. Analysis Solvers (The Math Engine)
Use these when you want to ask: “What exact number do I need to hit my target?”
| Solver | Key Inputs | Key Outputs | Best For |
|---|---|---|---|
| `SensitivitySolver` | `model`, `hardware`, `perturbation_pct` | sensitivities, binding_constraint | “Which parameter should I invest in?” |
| `SynthesisSolver` | `model`, `target_latency` | required_bw, required_flops | “What hardware do I need for this SLA?” |
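A sensitivity analysis of this kind can be sketched with finite differences: perturb each input by `perturbation_pct`, re-evaluate, and normalize to an elasticity. The generic helper and the toy memory-bound latency model below are illustrative, not the solver's actual internals:

```python
# Illustrative finite-difference sensitivity: for each parameter, bump it by
# pct, re-evaluate, and report the normalized elasticity. The parameter with
# the largest magnitude is the binding constraint.
def sensitivities(f, params, pct=0.10):
    base = f(**params)
    out = {}
    for name, value in params.items():
        bumped = dict(params, **{name: value * (1 + pct)})
        out[name] = ((f(**bumped) - base) / base) / pct
    return out

# Toy memory-bound latency model on H100-like numbers: bandwidth dominates.
latency = lambda bw, flops: 16e9 / bw + 16e9 / flops
s = sensitivities(latency, {"bw": 3.35e12, "flops": 989e12})
print(s)  # bw elasticity ~ -0.9, flops elasticity ~ -0.003
```

The negative elasticities say more bandwidth or FLOPS reduces latency; here nearly all the leverage is in bandwidth, so it is the binding constraint.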
## 3. Design Space Exploration (DSE)
Use the DSE Engine when you want to ask: “What is the best possible configuration?” It replaces the old hardcoded optimizers.
| Engine | Key Inputs | Objective | Best For |
|---|---|---|---|
| DSE Engine | search space, objective, constraints | minimize/maximize | “What max batch size is safe for my SLA?” or “Where should I build my datacenter?” |
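In its simplest form, DSE is an exhaustive search over a small space with a feasibility constraint. A sketch of the “max safe batch size” question, with made-up latency numbers standing in for a model's predictions:

```python
# Illustrative DSE: among batch sizes whose modeled latency meets the SLA,
# pick the largest (maximize throughput subject to the latency constraint).
def best_batch(latencies_ms, sla_ms):
    feasible = {b: t for b, t in latencies_ms.items() if t <= sla_ms}
    return max(feasible) if feasible else None

# Hypothetical modeled per-batch latencies (ms) against a 50 ms SLA:
space = {1: 12.0, 2: 19.0, 4: 31.0, 8: 48.0, 16: 85.0}
print(best_batch(space, 50.0))  # -> 8
```

In the real engine the objective and constraints are declared rather than hardcoded, but the search-evaluate-filter loop is the same idea.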
## Composing Resolvers
Real-world questions require chaining multiple tiers. Here are three common patterns:
### “Can I serve Llama-70B on 4 H100s within budget?”

```python
from mlsysim import ServingModel, EconomicsModel, Hardware, Models, Systems

serving = ServingModel().solve(
    model=Models.Language.Llama3_70B,
    hardware=Hardware.Cloud.H100,
    seq_len=2048, batch_size=4
)
print(f"TTFT: {serving.ttft}, ITL: {serving.itl}")
print(f"KV-Cache: {serving.kv_cache_size}")
```

### “Which parameter matters most for Llama-8B latency?”
```python
from mlsysim import SensitivitySolver, Hardware, Models

sensitivity = SensitivitySolver().solve(
    model=Models.Language.Llama3_8B,
    hardware=Hardware.Cloud.H100,
    batch_size=1, precision="fp16",
    perturbation_pct=0.10
)
print(f"Binding constraint: {sensitivity.binding_constraint}")
print(f"BW sensitivity: {sensitivity.bw_sensitivity:.3f}")
print(f"FLOPS sensitivity: {sensitivity.flops_sensitivity:.3f}")
```

### “What hardware do I need for a 50ms SLA?”
```python
from mlsysim import SynthesisSolver, Models
from mlsysim.core.constants import Q_

synthesis = SynthesisSolver().solve(
    model=Models.Language.Llama3_8B,
    target_latency=Q_("50 ms"),
    batch_size=1, precision="fp16"
)
print(f"Required bandwidth: {synthesis.required_bw.to('TB/s'):.2f}")
print(f"Required FLOPS: {synthesis.required_flops.to('TFLOPs/s'):.1f}")
```