The 3-Tier Resolver Guide

Models to evaluate physics, Solvers to find limits, and Optimizers to search for trade-offs.

MLSys·im provides 25 specialized resolvers that map to the 22 physical and logical constraints (“walls”) organized across six domains — Node, Data, Algorithm, Fleet, Ops, and Analysis.

To make engineering decisions systematic, we organize these tools into a 3-Tier Architecture:

  1. Analytical Models (*Model): The Physics Engine. Given a configuration, it evaluates the consequences (\(Y = f(X)\)).
  2. Analysis Solvers (*Solver): The Math Engine. Given a target, it algebraically solves for the required input (\(X = f^{-1}(Y)\)).
  3. Optimizers (*Optimizer): The Engineering Engine. Given a design space, it searches for the configuration that maximizes or minimizes an objective (\(\max_X f(X)\) or \(\min_X f(X)\)).
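
The three tiers can be illustrated on a toy roofline latency function. This is a minimal sketch that does not call the MLSys·im API: the function names (`latency_s`, `required_bandwidth`, `best_batch`) and all hardware numbers are hypothetical stand-ins chosen for illustration.

```python
# Toy illustration of the three tiers; NOT the MLSys·im API.
# Every name and number below is a hypothetical assumption.

def latency_s(flops, bytes_moved, peak_flops, mem_bw):
    """Tier 1 (Model): evaluate Y = f(X) with a simple roofline bound."""
    return max(flops / peak_flops, bytes_moved / mem_bw)


def required_bandwidth(bytes_moved, target_latency_s):
    """Tier 2 (Solver): invert the memory term, X = f^{-1}(Y)."""
    return bytes_moved / target_latency_s


def best_batch(candidates, flops_per_sample, weight_bytes, peak_flops, mem_bw, sla_s):
    """Tier 3 (Optimizer): search batch sizes for max throughput under an SLA."""
    best = None
    for b in candidates:
        t = latency_s(b * flops_per_sample, weight_bytes, peak_flops, mem_bw)
        if t <= sla_s and (best is None or b / t > best[1]):
            best = (b, b / t)
    return best  # (batch_size, samples_per_second), or None if nothing fits


PEAK, BW = 1e15, 2e12       # assumed: 1 PFLOP/s compute, 2 TB/s memory bandwidth
WEIGHTS = 14e9              # assumed: 14 GB of weights read per step
FLOPS_PER_SAMPLE = 14e9     # assumed: ~2 FLOPs per parameter for 7e9 parameters

t1 = latency_s(FLOPS_PER_SAMPLE, WEIGHTS, PEAK, BW)
bw_needed = required_bandwidth(WEIGHTS, 0.005)
choice = best_batch([1, 8, 64, 512, 4096], FLOPS_PER_SAMPLE, WEIGHTS, PEAK, BW, 0.010)
print(t1, bw_needed, choice)
```

The same forward function is reused by all three tiers: the solver inverts one of its terms, and the optimizer calls it repeatedly inside a search loop.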

1. Analytical Models (The Physics Engine)

Use these when you want to ask: “What happens if I run this exact setup?”

Domain 1 — Node (Single-Accelerator Resources)

| Model | Key Inputs | Key Outputs | Best For |
|---|---|---|---|
| SingleNodeModel | model, hardware, batch_size | latency, throughput, bottleneck | “Is my model memory-bound?” |
| EfficiencyModel | model, hardware, workload_type | MFU, achievable FLOPS | “What MFU will my workload achieve?” |
| ServingModel | model, hardware, seq_len | TTFT, ITL, KV-cache footprint | “Can I serve this LLM on this GPU?” |
| ContinuousBatchingModel | model, hardware, seq_len, max_batch | throughput, fragmentation | “What throughput with PagedAttention?” |
| WeightStreamingModel | model, hardware, seq_len, batch_size | throughput, optimal_batch | “Cerebras wafer-scale inference?” |
| TailLatencyModel | arrival_rate, service_latency, replicas | P50, P99 wait times | “Will I meet P99 latency SLAs?” |
| SingleNodeModel (offload) | model, hardware | degraded bandwidth, spill bytes | “How slow when weights spill to host RAM?” |
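
To make the TailLatencyModel row concrete, here is a minimal sketch of how P50 and P99 wait times can be derived from an arrival rate, a service latency, and a replica count. It assumes an M/M/c queue (Erlang C); whether the library uses this exact queueing model is an assumption, and the function names are hypothetical.

```python
import math

# Sketch under an M/M/c assumption; NOT the MLSys·im API.

def erlang_c(arrival_rate, service_rate, servers):
    """Probability that an arriving request has to wait (Erlang C formula)."""
    a = arrival_rate / service_rate          # offered load in Erlangs
    rho = a / servers                        # per-server utilization
    if rho >= 1.0:
        raise ValueError("queue is unstable: utilization >= 1")
    head = sum(a**k / math.factorial(k) for k in range(servers))
    tail = a**servers / (math.factorial(servers) * (1.0 - rho))
    return tail / (head + tail)


def wait_percentile(arrival_rate, service_latency_s, replicas, q):
    """q-th quantile of queueing delay, from P(W > t) = P_wait * exp(-(c*mu - lambda) * t)."""
    mu = 1.0 / service_latency_s
    p_wait = erlang_c(arrival_rate, mu, replicas)
    if p_wait <= 1.0 - q:
        return 0.0                           # the quantile falls among requests that never wait
    return math.log(p_wait / (1.0 - q)) / (replicas * mu - arrival_rate)


# Hypothetical workload: 8 requests/s, 100 ms service time.
for replicas in (1, 2, 4):
    p50 = wait_percentile(8.0, 0.1, replicas, 0.50)
    p99 = wait_percentile(8.0, 0.1, replicas, 0.99)
    print(replicas, round(p50, 4), round(p99, 4))
```

Adding replicas lowers both the probability of waiting and the decay constant of the wait-time tail, which is why P99 drops much faster than the mean as capacity grows.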

Domain 2 — Data (Movement & Pipelines)

| Model | Key Inputs | Key Outputs | Best For |
|---|---|---|---|
| DataModel | workload_data_rate, hardware | utilization, is_stalled | “Is my storage/IO the bottleneck?” |
| TransformationModel | batch_size, cpu_throughput | transform_time, is_bottleneck | “Is CPU preprocessing starving my GPU?” |
| TopologyModel | fabric, topology, num_nodes | effective_bw, bisection_bw | “What topology should I use?” |

Domain 3 — Algorithm (Scaling & Compression)

| Model | Key Inputs | Key Outputs | Best For |
|---|---|---|---|
| ScalingModel | compute_budget | optimal_params, optimal_tokens | “What is my optimal model size?” |
| InferenceScalingModel | model, hardware, reasoning_steps | total_reasoning_time | “How much does CoT reasoning cost?” |
| CompressionModel | model, hardware, method | accuracy_delta, compression_ratio | “Is quantization/pruning worth it?” |
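
As an illustration of the ScalingModel row (compute budget in, parameter and token counts out), the sketch below uses the widely cited Chinchilla-style rule of thumb: training compute \(C \approx 6ND\) with roughly 20 tokens per parameter. Whether the library uses these exact constants is an assumption, and `compute_optimal` is a hypothetical name.

```python
import math

# Rule-of-thumb sketch; NOT the MLSys·im API. Constants are assumptions.

def compute_optimal(compute_budget_flops, tokens_per_param=20.0):
    """Solve C = 6 * N * D with D = r * N, so N = sqrt(C / (6 * r))."""
    n_params = math.sqrt(compute_budget_flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


# A budget of about 5.76e23 FLOPs lands close to a 70B-parameter model.
n, d = compute_optimal(5.76e23)
print(f"{n:.3g} parameters, {d:.3g} tokens")
```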

Domain 4 — Fleet (Multi-Node Coordination)

| Model | Key Inputs | Key Outputs | Best For |
|---|---|---|---|
| DistributedModel | model, fleet, tp/pp/dp sizes | scaling efficiency, comm overhead | “How many GPUs do I actually need?” |
| ReliabilityModel | fleet, job_duration | MTBF, failure probability | “Will my training job complete?” |
| OrchestrationModel | fleet, arrival_rate, avg_duration | avg_wait_time, utilization | “How busy is my cluster?” |
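
The ReliabilityModel question ("Will my training job complete?") can be approximated with a short calculation. This sketch assumes independent nodes with exponentially distributed failures, so the fleet MTBF shrinks linearly with node count; the function names and the example numbers are hypothetical and are not taken from the library.

```python
import math

# Sketch assuming independent, exponentially distributed node failures.
# NOT the MLSys·im API.

def fleet_mtbf_hours(node_mtbf_hours, n_nodes):
    """With n independent nodes, the first failure arrives n times sooner."""
    return node_mtbf_hours / n_nodes


def failure_probability(job_hours, node_mtbf_hours, n_nodes):
    """Probability of at least one node failure during the job."""
    mtbf = fleet_mtbf_hours(node_mtbf_hours, n_nodes)
    return 1.0 - math.exp(-job_hours / mtbf)


# Hypothetical fleet: 1,000 nodes, 50,000 h MTBF per node, 24 h job.
mtbf = fleet_mtbf_hours(50_000, 1_000)
p_fail = failure_probability(24, 50_000, 1_000)
print(f"fleet MTBF = {mtbf:.0f} h, P(failure during job) = {p_fail:.1%}")
```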

Domain 5 — Ops (Economics, Sustainability & Safety)

| Model | Key Inputs | Key Outputs | Best For |
|---|---|---|---|
| EconomicsModel | fleet, duration_days, kwh_price | CapEx, OpEx, total TCO | “What will this cost over 3 years?” |
| SustainabilityModel | fleet, duration_days, datacenter | energy, carbon (kg CO₂e), water | “Where should I train to minimize carbon?” |
| CheckpointModel | model, hardware, optimizer | checkpoint_size, MFU penalty | “How much MFU do I lose to checkpoints?” |
| ResponsibleEngineeringModel | base_training_time, epsilon | dp_slowdown | “What is the cost of differential privacy?” |
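
For the CheckpointModel row, a back-of-the-envelope version of "checkpoint size" and "MFU penalty" might look like the sketch below. It assumes 14 bytes per parameter (2 for BF16 weights, 4 for an FP32 master copy, 8 for the two Adam moments) and fully synchronous writes that stall training; both are assumptions, as are the function names and the example numbers.

```python
# Sketch under stated assumptions; NOT the MLSys·im API.

def checkpoint_size_bytes(n_params, bytes_per_param=14):
    """Assumed layout: 2 B BF16 weights + 4 B FP32 master + 8 B Adam moments."""
    return n_params * bytes_per_param


def checkpoint_overhead(size_bytes, write_bw_bytes_per_s, interval_s):
    """Fraction of wall-clock time spent stalled on synchronous checkpoint writes."""
    write_s = size_bytes / write_bw_bytes_per_s
    return write_s / (interval_s + write_s)


# Hypothetical: 70e9 parameters, 50 GB/s aggregate storage bandwidth,
# one checkpoint every 30 minutes, 45% MFU before checkpointing.
size = checkpoint_size_bytes(70e9)
overhead = checkpoint_overhead(size, 50e9, 1800)
effective_mfu = 0.45 * (1.0 - overhead)
print(f"{size / 1e9:.0f} GB per checkpoint, {overhead:.2%} stall, MFU {effective_mfu:.4f}")
```

Asynchronous or sharded checkpointing would reduce the stall; this sketch models only the simplest blocking case.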

2. Analysis Solvers (The Math Engine)

Use these when you want to ask: “What exact number do I need to hit my target?”

| Solver | Key Inputs | Key Outputs | Best For |
|---|---|---|---|
| SensitivitySolver | model, hardware, perturbation_pct | sensitivities, binding_constraint | “Which parameter should I invest in?” |
| SynthesisSolver | model, target_latency | required_bw, required_flops | “What hardware do I need for this SLA?” |
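
The idea behind a sensitivity analysis can be sketched with a finite-difference loop: perturb each hardware parameter by a fixed percentage and record the relative change in latency. The example below uses a toy roofline model; the function names, the dictionary layout, and the numbers are hypothetical and do not reflect the SensitivitySolver interface.

```python
# Finite-difference sensitivity sketch on a toy roofline; NOT the MLSys·im API.

def roofline_latency(p):
    """Latency is bounded by the slower of compute and memory traffic."""
    return max(p["flops"] / p["peak_flops"], p["bytes"] / p["mem_bw"])


def sensitivities(params, knobs, perturbation_pct=10.0):
    """Relative latency change when each knob is increased by perturbation_pct."""
    base = roofline_latency(params)
    result = {}
    for knob in knobs:
        bumped = dict(params)
        bumped[knob] = params[knob] * (1.0 + perturbation_pct / 100.0)
        result[knob] = (roofline_latency(bumped) - base) / base
    return result


# Hypothetical memory-bound workload.
params = {"flops": 14e9, "bytes": 14e9, "peak_flops": 1e15, "mem_bw": 2e12}
sens = sensitivities(params, ["peak_flops", "mem_bw"])
binding = min(sens, key=sens.get)   # knob whose improvement cuts latency the most
print(sens, binding)
```

Here extra compute changes nothing while extra memory bandwidth shortens latency, so memory bandwidth is the binding constraint for this toy workload.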

3. Optimizers (The Engineering Engine)

Use these when you want to ask: “What is the best possible configuration?”

| Optimizer | Key Inputs | Objective | Best For |
|---|---|---|---|
| ParallelismOptimizer | model, cluster size | Maximize MFU | “What is the optimal TP/PP/DP split?” |
| BatchingOptimizer | model, arrival rate, SLA latency | Maximize Throughput | “What max batch size is safe for my SLA?” |
| PlacementOptimizer | fleet, training duration, budget | Minimize Carbon & Cost | “Where should I build my datacenter?” |
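
A design-space search of the kind an optimizer performs can be sketched as an exhaustive sweep over TP/PP/DP factorizations of the cluster size. The scoring function below is a deliberately crude, hypothetical efficiency proxy: it uses the standard pipeline-bubble fraction \((p-1)/(m+p-1)\) plus made-up linear communication penalties. It is not the objective used by ParallelismOptimizer.

```python
# Exhaustive-search sketch with a hypothetical scoring function.
# NOT the MLSys·im API.

def factorizations(n_gpus, max_tp=8):
    """Yield every (tp, pp, dp) with tp * pp * dp == n_gpus and tp <= max_tp."""
    for tp in range(1, max_tp + 1):
        if n_gpus % tp:
            continue
        rest = n_gpus // tp
        for pp in range(1, rest + 1):
            if rest % pp == 0:
                yield tp, pp, rest // pp


def toy_efficiency(tp, pp, dp, microbatches=32, tp_cost=0.02, dp_cost=0.001):
    """Pipeline bubble times assumed linear TP and DP communication penalties."""
    bubble = (pp - 1) / (microbatches + pp - 1)
    return (1 - bubble) * (1 - tp_cost * (tp - 1)) * (1 - dp_cost * (dp - 1))


def best_split(n_gpus, min_model_shards, max_tp=8):
    """Best split among those that shard the model enough to fit in memory."""
    feasible = [s for s in factorizations(n_gpus, max_tp)
                if s[0] * s[1] >= min_model_shards]
    return max(feasible, key=lambda s: toy_efficiency(*s))


# Hypothetical: 64 GPUs, model needs at least 16 shards to fit.
tp, pp, dp = best_split(64, min_model_shards=16)
print(tp, pp, dp, round(toy_efficiency(tp, pp, dp), 4))
```

Exhaustive search is viable here because the number of valid factorizations is small; larger design spaces would need pruning or a smarter search strategy.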

Composing Resolvers in Python

Real-world questions often require chaining multiple tiers. The output of a Model can feed into a Solver, which guides an Optimizer.

“Can I serve Llama-70B on 4 H100s within budget?”

  1. ServingModel — check whether the model fits in memory and estimate TTFT/ITL.
  2. EconomicsModel — calculate the cost of running that fleet.
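
The two steps above can be approximated by hand. The sketch below does not call ServingModel or EconomicsModel; it assumes the Llama-2-70B architecture (80 layers, 8 KV heads, head dimension 128), FP16 weights and KV cache, 80 GB H100s, and a hypothetical rental price and budget. It also ignores activation memory and fragmentation.

```python
# Back-of-the-envelope sketch; NOT the MLSys·im API.

PARAMS = 70e9
BYTES_PER_VALUE = 2                            # FP16 weights and KV cache (assumed)
N_LAYERS, N_KV_HEADS, HEAD_DIM = 80, 8, 128    # Llama-2-70B architecture (assumed variant)
GPU_MEM_BYTES, N_GPUS = 80e9, 4                # 4 x H100 80 GB
SEQ_LEN = 4096                                 # assumed context length per request
PRICE_PER_GPU_HOUR = 3.00                      # hypothetical price in USD
MONTHLY_BUDGET = 10_000                        # hypothetical budget in USD

# Step 1: does it fit, and how many concurrent sequences can the KV cache hold?
weight_bytes = PARAMS * BYTES_PER_VALUE
kv_bytes_per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE  # K and V
free_bytes = GPU_MEM_BYTES * N_GPUS - weight_bytes
fits = free_bytes > 0
max_seqs = int(free_bytes // (kv_bytes_per_token * SEQ_LEN)) if fits else 0

# Step 2: what does the fleet cost for a 30-day month?
monthly_cost = N_GPUS * PRICE_PER_GPU_HOUR * 24 * 30

print(f"fits={fits}, max concurrent sequences={max_seqs}")
print(f"monthly cost=${monthly_cost:,.0f}, within budget={monthly_cost <= MONTHLY_BUDGET}")
```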

“What is the most sustainable way to train GPT-3?”

  1. ParallelismOptimizer — find the optimal TP/PP/DP configuration to minimize runtime.
  2. PlacementOptimizer — sweep that optimized run across the InfraZoo entries to find the one with the lowest carbon footprint.
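
A hand-rolled version of this chain might look like the sketch below. It does not use ParallelismOptimizer, PlacementOptimizer, or the InfraZoo: the MFU value stands in for the optimizer's output, and the region table (PUE and grid carbon intensity) is entirely hypothetical. Training compute uses the \(6ND\) approximation with GPT-3's 175B parameters and 300B tokens.

```python
# Sketch with hypothetical regions and an assumed MFU; NOT the MLSys·im API.

TRAIN_FLOPS = 6 * 175e9 * 300e9   # ~3.15e23 FLOPs via the 6*N*D approximation
N_GPUS = 1024                     # assumed cluster size
PEAK_FLOPS = 312e12               # approximate A100 dense BF16 peak
MFU = 0.40                        # assumed result of the parallelism search
GPU_POWER_KW = 0.4                # assumed average draw per GPU

# Hypothetical regions: (PUE, grid carbon intensity in kg CO2e per kWh).
REGIONS = {
    "hydro_region": (1.1, 0.02),
    "mixed_region": (1.3, 0.40),
    "coal_region": (1.5, 0.80),
}

runtime_h = TRAIN_FLOPS / (N_GPUS * PEAK_FLOPS * MFU) / 3600
it_energy_kwh = N_GPUS * GPU_POWER_KW * runtime_h


def carbon_kg(region):
    """Facility energy (IT energy times PUE) times grid carbon intensity."""
    pue, kg_per_kwh = REGIONS[region]
    return it_energy_kwh * pue * kg_per_kwh


best = min(REGIONS, key=carbon_kg)
print(f"runtime = {runtime_h / 24:.1f} days, IT energy = {it_energy_kwh:,.0f} kWh")
for name in REGIONS:
    print(f"{name}: {carbon_kg(name):,.0f} kg CO2e")
print("lowest carbon:", best)
```

Because runtime is fixed by the first step, the placement sweep reduces to comparing PUE times carbon intensity across candidate sites.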

“Should I use A100s or H100s for inference?”

  1. BatchingOptimizer on A100 — find max throughput under SLA.
  2. BatchingOptimizer on H100 — find max throughput under SLA.
  3. Compare throughput per dollar to make the final choice.