# The 3-Tier Resolver Guide
Models to evaluate physics, Solvers to find limits, and Optimizers to search for trade-offs.
MLSys·im provides 25 specialized resolvers that map to the 22 physical and logical constraints (“walls”) organized across six domains — Node, Data, Algorithm, Fleet, Ops, and Analysis.
To make engineering decisions systematic, we organize these tools into a 3-Tier Architecture:
- Analytical Models (`*Model`): The Physics Engine. Given a configuration, it evaluates the consequences (\(Y = f(X)\)).
- Analysis Solvers (`*Solver`): The Math Engine. Given a target, it algebraically solves for the required input (\(X = f^{-1}(Y)\)).
- Optimizers (`*Optimizer`): The Engineering Engine. Searches a design space to maximize or minimize an objective (\(\max f(X)\)).
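The division of labor between the three tiers can be sketched with a toy linear latency model. This is an illustration of the \(f\), \(f^{-1}\), \(\max f\) pattern only, not the MLSys·im API; all constants are made up:

```python
# Toy "physics": latency as a linear function of batch size on a fixed machine.
FIXED_OVERHEAD_MS = 5.0   # assumed kernel-launch / scheduling overhead
PER_SAMPLE_MS = 0.5       # assumed marginal cost per sample

def model(batch_size: float) -> float:
    """Tier 1 (Model): given a configuration X, evaluate Y = f(X)."""
    return FIXED_OVERHEAD_MS + batch_size * PER_SAMPLE_MS

def solver(target_latency_ms: float) -> float:
    """Tier 2 (Solver): given a target Y, solve X = f^-1(Y) algebraically."""
    return (target_latency_ms - FIXED_OVERHEAD_MS) / PER_SAMPLE_MS

def optimizer(sla_ms: float) -> int:
    """Tier 3 (Optimizer): search the design space for the best feasible X."""
    return max(b for b in range(1, 1024) if model(b) <= sla_ms)

print(model(32))        # -> 21.0 (ms at batch 32)
print(solver(21.0))     # -> 32.0 (batch size needed for 21 ms)
print(optimizer(21.0))  # -> 32   (largest SLA-feasible batch)
```

The same configuration falls out of all three tiers here because the toy model is invertible in closed form; real resolvers earn their keep when \(f\) is not.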
## 1. Analytical Models (The Physics Engine)
Use these when you want to ask: “What happens if I run this exact setup?”
### Domain 1 — Node (Single-Accelerator Resources)
| Model | Key Inputs | Key Outputs | Best For |
|---|---|---|---|
| SingleNodeModel | model, hardware, batch_size | latency, throughput, bottleneck | “Is my model memory-bound?” |
| EfficiencyModel | model, hardware, workload_type | MFU, achievable FLOPS | “What MFU will my workload achieve?” |
| ServingModel | model, hardware, seq_len | TTFT, ITL, KV-cache footprint | “Can I serve this LLM on this GPU?” |
| ContinuousBatchingModel | model, hardware, seq_len, max_batch | throughput, fragmentation | “What throughput with PagedAttention?” |
| WeightStreamingModel | model, hardware, seq_len, batch_size | throughput, optimal_batch | “Cerebras wafer-scale inference?” |
| TailLatencyModel | arrival_rate, service_latency, replicas | P50, P99 wait times | “Will I meet P99 latency SLAs?” |
| SingleNodeModel (offload) | model, hardware | degraded bandwidth, spill bytes | “How slow when weights spill to host RAM?” |
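The first row's question reduces to a roofline check: compare the workload's arithmetic intensity to the hardware's FLOPs-per-byte ratio. A self-contained sketch, with assumed H100-class peak numbers rather than SingleNodeModel itself:

```python
def bottleneck(flops: float, bytes_moved: float,
               peak_flops: float, peak_bw: float) -> str:
    """Roofline test: workload FLOPs-per-byte vs. the machine's ridge point."""
    intensity = flops / bytes_moved   # FLOPs per byte the workload performs
    ridge = peak_flops / peak_bw      # FLOPs per byte the machine can sustain
    return "compute-bound" if intensity >= ridge else "memory-bound"

# Batch-1 LLM decode is dominated by streaming the weights once per token.
# Assumed peaks: ~989 TFLOPS BF16, ~3.35 TB/s HBM (H100 SXM vendor specs).
print(bottleneck(flops=2 * 70e9,        # ~2 FLOPs per parameter per token
                 bytes_moved=70e9 * 2,  # 70B params read at 2 bytes each
                 peak_flops=989e12,
                 peak_bw=3.35e12))      # -> memory-bound
```

At intensity ≈ 1 FLOP/byte against a ridge near 300, decode sits deep in the memory-bound region, which is why batching helps so much.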
### Domain 2 — Data (Movement & Pipelines)
| Model | Key Inputs | Key Outputs | Best For |
|---|---|---|---|
| DataModel | workload_data_rate, hardware | utilization, is_stalled | “Is my storage/IO the bottleneck?” |
| TransformationModel | batch_size, cpu_throughput | transform_time, is_bottleneck | “Is CPU preprocessing starving my GPU?” |
| TopologyModel | fabric, topology, num_nodes | effective_bw, bisection_bw | “What topology should I use?” |
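The stall question behind TransformationModel is a max-over-stages computation: a pipelined training step can only go as fast as its slowest stage. A minimal sketch with hypothetical stage timings:

```python
def pipeline_stalls(gpu_step_s: float, cpu_transform_s: float,
                    io_read_s: float) -> tuple[bool, str]:
    """With full pipelining, steady-state step time = max(stage times).
    Returns (is_stalled, bottleneck_stage)."""
    stages = {"gpu": gpu_step_s, "cpu_transform": cpu_transform_s,
              "io": io_read_s}
    slowest = max(stages, key=stages.get)
    return slowest != "gpu", slowest

# Hypothetical timings: CPU preprocessing (150 ms) exceeds the GPU step (100 ms)
print(pipeline_stalls(gpu_step_s=0.10, cpu_transform_s=0.15, io_read_s=0.05))
# -> (True, 'cpu_transform'): the GPU idles 50 ms out of every step
```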
### Domain 3 — Algorithm (Scaling & Compression)
| Model | Key Inputs | Key Outputs | Best For |
|---|---|---|---|
| ScalingModel | compute_budget | optimal_params, optimal_tokens | “What is my optimal model size?” |
| InferenceScalingModel | model, hardware, reasoning_steps | total_reasoning_time | “How much does CoT reasoning cost?” |
| CompressionModel | model, hardware, method | accuracy_delta, compression_ratio | “Is quantization/pruning worth it?” |
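The ScalingModel row mirrors the Chinchilla heuristic. A back-of-envelope sketch (not the resolver itself) assuming training compute \(C \approx 6ND\) and a compute-optimal token-to-parameter ratio near 20:

```python
import math

def chinchilla_optimal(compute_flops: float,
                       tokens_per_param: float = 20.0) -> tuple[float, float]:
    """Compute-optimal sizing heuristic: C ~= 6*N*D with D ~= 20*N,
    so C ~= 120*N^2 and N = sqrt(C / 120)."""
    n_opt = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    return n_opt, tokens_per_param * n_opt

params, tokens = chinchilla_optimal(1e24)  # 1e24-FLOP training budget
print(f"{params:.2e} params, {tokens:.2e} tokens")
```

For a 1e24-FLOP budget this lands near 90B parameters and 1.8T tokens, in the same regime the Chinchilla paper reports; the real ScalingModel presumably fits the exponents rather than hard-coding 20.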
### Domain 4 — Fleet (Multi-Node Coordination)
| Model | Key Inputs | Key Outputs | Best For |
|---|---|---|---|
| DistributedModel | model, fleet, tp/pp/dp sizes | scaling efficiency, comm overhead | “How many GPUs do I actually need?” |
| ReliabilityModel | fleet, job_duration | MTBF, failure probability | “Will my training job complete?” |
| OrchestrationModel | fleet, arrival_rate, avg_duration | avg_wait_time, utilization | “How busy is my cluster?” |
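ReliabilityModel's failure probability follows from treating node failures as independent exponentials: cluster MTBF shrinks as node MTBF divided by node count. A sketch with assumed MTBF numbers:

```python
import math

def job_failure_probability(num_nodes: int, node_mtbf_hours: float,
                            job_hours: float) -> float:
    """P(at least one node failure during the job), assuming independent
    exponential failures: cluster MTBF = node MTBF / num_nodes."""
    cluster_mtbf = node_mtbf_hours / num_nodes
    return 1.0 - math.exp(-job_hours / cluster_mtbf)

# Assumed numbers: 1024 nodes, 50,000 h per-node MTBF, 30-day job.
print(f"{job_failure_probability(1024, 50_000, 24 * 30):.1%}")
```

At this scale the cluster fails essentially with certainty during a month-long run, which is exactly why CheckpointModel (below) matters: the question stops being "will it fail?" and becomes "how cheaply can I recover?".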
### Domain 5 — Ops (Economics, Sustainability & Safety)
| Model | Key Inputs | Key Outputs | Best For |
|---|---|---|---|
| EconomicsModel | fleet, duration_days, kwh_price | CapEx, OpEx, total TCO | “What will this cost over 3 years?” |
| SustainabilityModel | fleet, duration_days, datacenter | energy, carbon (kg CO₂e), water | “Where should I train to minimize carbon?” |
| CheckpointModel | model, hardware, optimizer | checkpoint_size, MFU penalty | “How much MFU do I lose to checkpoints?” |
| ResponsibleEngineeringModel | base_training_time, epsilon | dp_slowdown | “What is the cost of differential privacy?” |
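EconomicsModel's CapEx + OpEx split can be approximated in a few lines. Prices, power draw, and PUE below are placeholder assumptions; a real TCO model adds cooling, networking, staff, and depreciation schedules:

```python
def tco(num_gpus: int, gpu_price_usd: float, power_kw_per_gpu: float,
        kwh_price_usd: float, duration_days: float,
        pue: float = 1.3) -> tuple[float, float, float]:
    """TCO = CapEx (hardware purchase) + OpEx (energy over the run,
    inflated by datacenter PUE). Returns (capex, opex, total)."""
    capex = num_gpus * gpu_price_usd
    energy_kwh = num_gpus * power_kw_per_gpu * 24 * duration_days * pue
    opex = energy_kwh * kwh_price_usd
    return capex, opex, capex + opex

# Assumed: 8 GPUs at $30k each, 700 W apiece, $0.12/kWh, 3-year horizon.
capex, opex, total = tco(num_gpus=8, gpu_price_usd=30_000,
                         power_kw_per_gpu=0.7, kwh_price_usd=0.12,
                         duration_days=3 * 365)
print(capex, round(opex, 2), round(total, 2))
```

Note how CapEx dominates at these assumed prices; the kwh_price input starts to matter once hardware is amortized over longer horizons or cheaper accelerators.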
## 2. Analysis Solvers (The Math Engine)
Use these when you want to ask: “What exact number do I need to hit my target?”
| Solver | Key Inputs | Key Outputs | Best For |
|---|---|---|---|
| SensitivitySolver | model, hardware, perturbation_pct | sensitivities, binding_constraint | “Which parameter should I invest in?” |
| SynthesisSolver | model, target_latency | required_bw, required_flops | “What hardware do I need for this SLA?” |
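SynthesisSolver's inversion is literal algebra. For a memory-bound decode model (ITL ≈ weight bytes / bandwidth, a simplification that ignores KV-cache reads), solving for the required bandwidth looks like:

```python
def required_bandwidth(param_bytes: float, target_itl_s: float) -> float:
    """Invert the memory-bound decode model: each generated token streams
    the weights once, so ITL ~= param_bytes / bandwidth. Solve for bandwidth."""
    return param_bytes / target_itl_s

# 70B params at 2 bytes (BF16), 20 ms/token target -> required HBM bandwidth
print(f"{required_bandwidth(70e9 * 2, 0.020) / 1e12:.1f} TB/s")
```

The answer (7 TB/s) exceeds any single current accelerator's HBM bandwidth, which is the solver's way of telling you that this SLA forces tensor parallelism or quantization.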
## 3. Optimizers (The Engineering Engine)
Use these when you want to ask: “What is the best possible configuration?”
| Optimizer | Key Inputs | Objective | Best For |
|---|---|---|---|
| ParallelismOptimizer | model, cluster size | Maximize MFU | “What is the optimal TP/PP/DP split?” |
| BatchingOptimizer | model, arrival rate, SLA latency | Maximize Throughput | “What max batch size is safe for my SLA?” |
| PlacementOptimizer | fleet, training duration, budget | Minimize Carbon & Cost | “Where should I build my datacenter?” |
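BatchingOptimizer's search can be sketched as a sweep over batch sizes under a linear latency assumption (the coefficients below are hypothetical, and the real optimizer would use a full ServingModel as its inner \(f(X)\)):

```python
def best_batch(sla_s: float, fixed_s: float = 0.005,
               per_req_s: float = 0.002,
               max_batch: int = 256) -> tuple[int, float]:
    """Sweep batch sizes; keep the throughput-maximizing one whose batch
    latency still fits the SLA. Assumes latency = fixed + b * per_req."""
    best = (0, 0.0)
    for b in range(1, max_batch + 1):
        latency = fixed_s + b * per_req_s
        if latency > sla_s:
            break                      # larger batches only get slower
        throughput = b / latency       # requests per second at this batch
        if throughput > best[1]:
            best = (b, throughput)
    return best

print(best_batch(sla_s=0.050))  # largest SLA-feasible batch and its throughput
```

Under a linear latency model throughput rises monotonically with batch size, so the optimum sits exactly at the SLA boundary; with a saturating model the sweep can peak earlier, which is why the search keeps the running best instead of just returning the last feasible batch.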
## Composing Resolvers in Python
Real-world questions often require chaining multiple tiers. The output of a Model can feed into a Solver, which guides an Optimizer.
### “Can I serve Llama-70B on 4 H100s within budget?”
1. `ServingModel`: check if the model fits in memory and estimate TTFT/ITL.
2. `EconomicsModel`: calculate the cost of running that fleet.
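The two steps above can be sketched as plain arithmetic. HBM size, KV-cache budget, and the hourly price are assumptions, not benchmarks, and the functions are stand-ins for the two resolvers:

```python
def fits_in_memory(param_bytes: float, kv_budget_bytes: float,
                   num_gpus: int, hbm_per_gpu: float) -> bool:
    """Step 1 (ServingModel idea): weights + KV cache must fit in pooled HBM."""
    return param_bytes + kv_budget_bytes <= num_gpus * hbm_per_gpu

def monthly_cost(num_gpus: int, usd_per_gpu_hour: float) -> float:
    """Step 2 (EconomicsModel idea): rental cost of that fleet for 30 days."""
    return num_gpus * usd_per_gpu_hour * 24 * 30

# Llama-70B in BF16 (~140 GB) plus an assumed 40 GB KV budget on 4x 80 GB GPUs
fits = fits_in_memory(140e9, 40e9, num_gpus=4, hbm_per_gpu=80e9)
cost = monthly_cost(4, usd_per_gpu_hour=3.0)   # assumed rental price
print(fits, cost)  # -> True 8640.0
```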
### “What is the most sustainable way to train GPT-3?”
1. `ParallelismOptimizer`: find the optimal TP/PP/DP configuration to minimize runtime.
2. `PlacementOptimizer`: sweep the optimal run across the InfraZoo to find the lowest carbon footprint.
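The placement sweep in step 2 reduces to a min over grid carbon intensities once the run's energy is fixed. Region names and intensities (kg CO₂e/kWh) below are illustrative placeholders, not InfraZoo data:

```python
# Hypothetical grid carbon intensities, kg CO2e per kWh
GRID_INTENSITY = {"us-east": 0.38, "eu-north": 0.05, "ap-south": 0.70}

def greenest_region(energy_kwh: float) -> tuple[str, float]:
    """Return (region, kg CO2e) minimizing emissions = energy * intensity."""
    region = min(GRID_INTENSITY, key=GRID_INTENSITY.get)
    return region, energy_kwh * GRID_INTENSITY[region]

print(greenest_region(1_000_000))  # -> ('eu-north', 50000.0)
```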
### “Should I use A100s or H100s for inference?”
1. `BatchingOptimizer` on A100: find max throughput under SLA.
2. `BatchingOptimizer` on H100: find max throughput under SLA.
3. Compare throughput per dollar to make the final choice.
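The final comparison normalizes throughput by price. The numbers below are placeholders meant to be replaced with your own BatchingOptimizer results and cloud prices:

```python
def throughput_per_dollar(tokens_per_s: float, usd_per_hour: float) -> float:
    """Normalize measured throughput by instance price: tokens per dollar."""
    return tokens_per_s * 3600 / usd_per_hour

# Assumed example numbers, not benchmarks:
a100 = throughput_per_dollar(tokens_per_s=1500, usd_per_hour=2.0)
h100 = throughput_per_dollar(tokens_per_s=3000, usd_per_hour=4.5)
print("H100" if h100 > a100 else "A100")  # -> A100 with these numbers
```

Note the outcome flips with the price ratio: here the H100 is 2x faster but 2.25x more expensive, so the older part wins on tokens per dollar even while losing on raw latency.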