core.solver

Classes

Name Description
SingleNodeModel Resolves single-node hardware Roofline bounds and feasibility.
ServingModel Analyzes the two-phase LLM serving lifecycle: Pre-fill vs. Decoding.
ContinuousBatchingModel Analyzes production LLM serving with Continuous Batching and PagedAttention.
WeightStreamingModel Analyzes Wafer-Scale inference (e.g., Cerebras CS-3) using Weight Streaming.
TailLatencyModel Analyzes queueing delays and P99 tail latency for deployed inference (M/M/c).
DistributedModel Resolves fleet-wide communication, synchronization, and pipelining constraints.
ReliabilityModel Calculates Mean Time Between Failures (MTBF) and optimal checkpointing intervals.
CheckpointModel Analyzes checkpoint I/O burst penalties and MFU impact.
EconomicsModel Calculates Total Cost of Ownership (TCO) including Capex and Opex.
SustainabilityModel Calculates Datacenter-scale Sustainability metrics.

DistributedModel

core.solver.DistributedModel()

Resolves fleet-wide communication, synchronization, and pipelining constraints.

This solver models the constraints of training at distributed scale. It decomposes a workload across a cluster using 3D Parallelism (DP, TP, PP) and calculates the resulting communication overheads and idle times (‘bubbles’) that determine the Model FLOPs Utilization (MFU).

Methods

Name Description
solve Calculates distributed training performance using the 3D Parallelism model.
solve
core.solver.DistributedModel.solve(
    model,
    fleet,
    batch_size=1,
    precision='fp16',
    efficiency=0.5,
    tp_size=1,
    pp_size=1,
    microbatch_count=1,
    topology_override=None,
)

Calculates distributed training performance using the 3D Parallelism model.

Parameters
Name Type Description Default
model Workload The model architecture to simulate. required
fleet Fleet The hardware cluster and network topology. required
batch_size int Global batch size. 1
precision str Numerical precision (fp16, fp32, int8). 'fp16'
efficiency float Achieved compute efficiency (0.0 to 1.0). 0.5
tp_size int Tensor Parallelism degree. Splits individual layers across GPUs, usually within a single node over high-speed NVLink. 1
pp_size int Pipeline Parallelism degree. Chains model layers across multiple nodes, introducing ‘pipeline bubbles’ while saving memory. 1
microbatch_count int Number of microbatches (M). Increasing M reduces the pipeline bubble but increases synchronization overhead. 1
topology_override str Force a specific topology (ring, tree). None
Returns
Type Description
Dict[str, Any] Metrics including DP/TP latency, the Pipeline Bubble penalty, and the final Scaling Efficiency.
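
As a rough illustration of the bubble penalty reported above, the widely used GPipe-style bound gives the idle fraction of a `pp_size`-stage pipeline fed with `M` microbatches. This is a sketch of the textbook formula, not necessarily the solver's internal model:

```python
# Illustrative GPipe-style pipeline-bubble bound: with pp stages and M
# microbatches, the pipeline sits idle for (pp - 1) / (M + pp - 1) of the time.
# Function name and formulation are illustrative, not the solver's internals.

def pipeline_bubble_fraction(pp_size: int, microbatch_count: int) -> float:
    """Idle fraction of a pp_size-stage pipeline fed with M microbatches."""
    return (pp_size - 1) / (microbatch_count + pp_size - 1)

# Increasing M shrinks the bubble, as the parameter table notes:
print(round(pipeline_bubble_fraction(4, 8), 3))   # 3/11 ≈ 0.273
print(round(pipeline_bubble_fraction(4, 16), 3))  # 3/19 ≈ 0.158
```

This is why raising `microbatch_count` improves Scaling Efficiency, at the cost of more frequent synchronization.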

EconomicsModel

core.solver.EconomicsModel()

Calculates Total Cost of Ownership (TCO) including Capex and Opex.

Combines hardware costs, energy consumption, and maintenance into a single financial model for the fleet. This solver exposes the ROI of architectural efficiency by showing how reducing power draw or increasing throughput directly impacts the bottom line.

Methods

Name Description
solve Calculates the TCO for a fleet over a specified duration.
solve
core.solver.EconomicsModel.solve(fleet, duration_days, kwh_price=0.12)

Calculates the TCO for a fleet over a specified duration.

Parameters
Name Type Description Default
fleet Fleet The hardware cluster configuration. required
duration_days float Operation duration in days. required
kwh_price float Price of electricity per kWh. 0.12
Returns
Type Description
Dict[str, Any] Financial metrics including CapEx, OpEx, and total TCO.
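
The CapEx/OpEx split can be sketched with back-of-the-envelope arithmetic. The hardware cost, lifetime, and power draw below are assumptions for illustration, not the solver's actual cost model:

```python
# Back-of-the-envelope TCO: straight-line-amortized CapEx plus energy OpEx.
# All inputs are illustrative assumptions, not the solver's cost model.

def tco_usd(hw_cost_usd: float, lifetime_days: float,
            avg_power_kw: float, duration_days: float,
            kwh_price: float = 0.12) -> dict:
    capex = hw_cost_usd * (duration_days / lifetime_days)  # amortized purchase cost
    opex = avg_power_kw * 24 * duration_days * kwh_price   # energy bill
    return {"capex_usd": capex, "opex_usd": opex, "tco_usd": capex + opex}

# e.g. a $30k accelerator drawing 0.7 kW, run for 1 year of a 5-year life:
print(tco_usd(30_000, 5 * 365, 0.7, 365))
```

Note how a lower power draw or a longer useful lifetime directly shrinks the bottom line, which is the ROI argument the description makes.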

ReliabilityModel

core.solver.ReliabilityModel()

Calculates Mean Time Between Failures (MTBF) and optimal checkpointing intervals.

This solver models the reliability of massive clusters, helping determine the ‘Goodput’ of long-running training jobs. It estimates the probability of a job failing before completion and calculates the Young-Daly optimal checkpoint interval that minimizes wasted compute time.

Methods

Name Description
solve Calculates reliability and checkpointing metrics for a fleet.
solve
core.solver.ReliabilityModel.solve(
    fleet,
    job_duration_hours,
    checkpoint_time_s=60.0,
)

Calculates reliability and checkpointing metrics for a fleet.

Parameters
Name Type Description Default
fleet Fleet The hardware cluster configuration. required
job_duration_hours float Total wall-clock duration of the training job. required
checkpoint_time_s float Time taken to save a single checkpoint, in seconds. 60.0
Returns
Type Description
Dict[str, Any] Reliability metrics including fleet MTBF and failure probability.
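
The Young-Daly relationship the description refers to can be sketched in a few lines. The MTBF and cluster-size numbers are illustrative assumptions:

```python
import math

# Young-Daly first-order optimum: with per-node MTBF m over n nodes, fleet
# MTBF is m / n, and the checkpoint interval minimizing wasted work is
# sqrt(2 * checkpoint_time * fleet_MTBF). Illustrative sketch only.

def fleet_mtbf_s(node_mtbf_hours: float, num_nodes: int) -> float:
    """Fleet-wide MTBF in seconds, assuming independent node failures."""
    return node_mtbf_hours * 3600 / num_nodes

def young_daly_interval_s(checkpoint_time_s: float, fleet_mtbf: float) -> float:
    """Checkpoint interval (s) that minimizes expected lost work."""
    return math.sqrt(2 * checkpoint_time_s * fleet_mtbf)

# A 50,000-hour node MTBF across 10,000 nodes means a failure every ~5 hours;
# with 60 s checkpoints, the optimal interval is ~25 minutes.
mtbf = fleet_mtbf_s(50_000, 10_000)               # 18,000 s
print(round(young_daly_interval_s(60.0, mtbf)))   # 1470
```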

ServingModel

core.solver.ServingModel()

Analyzes the two-phase LLM serving lifecycle: Pre-fill vs. Decoding.

LLM inference is not a single mathematical operation; it is a stateful process with two distinct physical regimes:

  1. Pre-fill Phase: The initial processing of the input prompt. This is a ‘Compute Beast’ phase where all prompt tokens are processed in parallel, saturating the GPU’s arithmetic units.
  2. Decoding Phase: The token-by-token generation. This is a ‘Bandwidth Hog’ phase. Because the model must read all parameters from memory just to generate a single token, it is limited entirely by HBM bandwidth.

This solver also models the KV-Cache, the memory required to store previous token states, which grows linearly with sequence length and batch size, eventually hitting the ‘Memory Wall’.
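
The KV-cache growth described above can be sized with a simple product of dimensions. The model shape below is an assumption (roughly 7B-class) chosen only to show the scale:

```python
# KV-cache footprint grows linearly in sequence length and batch size:
# 2 (K and V) x layers x kv_heads x head_dim x seq_len x batch x bytes/elem.
# The shape below is an assumed 7B-class configuration, purely for scale.

def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int, bytes_per_elem: int = 2) -> int:
    return 2 * layers * kv_heads * head_dim * seq_len * batch_size * bytes_per_elem

# 32 layers, 32 KV heads of dim 128, 4k context, batch 8, fp16:
gib = kv_cache_bytes(32, 32, 128, 4096, 8) / 2**30
print(f"{gib:.1f} GiB")  # → 16.0 GiB, rivaling the weights themselves
```

Halving `bytes_per_elem` (e.g. an int8 KV-cache) pushes the ‘Memory Wall’ out to twice the batch size or context length.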

Methods

Name Description
solve Solves for LLM serving performance.
solve
core.solver.ServingModel.solve(
    model,
    hardware,
    seq_len,
    batch_size=1,
    precision='fp16',
    efficiency=0.5,
)

Solves for LLM serving performance.

Parameters
Name Type Description Default
model TransformerWorkload The LLM model architecture. required
hardware HardwareNode The target hardware for inference. required
seq_len int The total context window (prompt + generated tokens). required
batch_size int Number of concurrent user requests. 1
precision str Numerical format. Lower precision (INT8/INT4) reduces memory pressure and speeds up the Decoding phase. 'fp16'
efficiency float Compute utilization efficiency, primarily affecting the Pre-fill phase. 0.5
Returns
Type Description
Dict[str, Any] Inference metrics including Time-To-First-Token (TTFT), Inter-Token Latency (ITL), and total KV-cache footprint.
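
A first-order sketch of how the two phases map onto these metrics: TTFT is compute-bound (all prompt tokens in parallel), ITL is bandwidth-bound (all weights read once per token). The hardware and model numbers below are assumptions, not solver internals:

```python
# Two-regime latency estimates. Hardware figures are H100-class assumptions
# (~1 PFLOP/s dense fp16, ~3.35 TB/s HBM); the 70B model size is illustrative.

def ttft_s(prompt_tokens: int, params: float, peak_flops: float,
           efficiency: float = 0.5) -> float:
    """Pre-fill (compute-bound): ~2 FLOPs per parameter per prompt token."""
    return 2 * params * prompt_tokens / (peak_flops * efficiency)

def itl_s(params: float, bytes_per_param: float, hbm_bw: float) -> float:
    """Decode (bandwidth-bound): every weight byte read once per token."""
    return params * bytes_per_param / hbm_bw

P = 70e9  # assumed 70B-parameter dense model
print(f"TTFT ~{ttft_s(2048, P, 1e15):.2f} s")       # 2k-token prompt
print(f"ITL  ~{itl_s(P, 2, 3.35e12) * 1e3:.0f} ms")  # fp16 weights
```

This is why the parameter table notes that lower precision mainly speeds up Decoding: halving `bytes_per_param` halves ITL, while TTFT barely moves.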

SingleNodeModel

core.solver.SingleNodeModel()

Resolves single-node hardware Roofline bounds and feasibility.

This solver applies the ‘Iron Law’ of machine learning systems: it checks whether a model fits in memory and predicts its throughput from arithmetic intensity.
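
The Roofline bound behind that prediction can be sketched in a few lines; the peak-compute and bandwidth figures are illustrative assumptions:

```python
# Minimal Roofline sketch: attainable throughput is the lesser of peak compute
# and (arithmetic intensity x memory bandwidth). The ridge point divides
# memory-bound from compute-bound workloads. Hardware numbers are assumptions.

def roofline_flops(ai_flops_per_byte: float, peak_flops: float, mem_bw: float) -> float:
    """Attainable FLOP/s for a kernel of the given arithmetic intensity."""
    return min(peak_flops, ai_flops_per_byte * mem_bw)

PEAK, BW = 1e15, 3.35e12   # ~1 PFLOP/s fp16, ~3.35 TB/s HBM (assumed)
ridge = PEAK / BW          # ~298 FLOPs/byte: below this, bandwidth-bound

print(roofline_flops(2, PEAK, BW) / 1e12)    # low-AI (decode-like): 6.7 TFLOP/s
print(roofline_flops(500, PEAK, BW) / 1e12)  # big GEMM: hits the 1000 TFLOP/s roof
```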

Methods

Name Description
solve Solves the performance profile for a single hardware node.
solve
core.solver.SingleNodeModel.solve(
    model,
    hardware,
    batch_size=1,
    precision='fp16',
    efficiency=0.5,
    raise_errors=False,
)

Solves the performance profile for a single hardware node.

Parameters
Name Type Description Default
model Workload The model architecture (Transformer, CNN). required
hardware HardwareNode The target hardware specification. required
batch_size int Number of samples per inference/step. 1
precision str Numerical precision format (‘fp32’, ‘fp16’, ‘int8’, ‘int4’). 'fp16'
efficiency float Hardware utilization efficiency (0.0 to 1.0). 0.5
raise_errors bool Whether to raise OOMError for infeasible workloads. False
Returns
Type Description
PerformanceProfile The resulting latency, throughput, and bottleneck analysis.

SustainabilityModel

core.solver.SustainabilityModel()

Calculates Datacenter-scale Sustainability metrics.

Handles Power Usage Effectiveness (PUE), Carbon Intensity, and Water Usage Effectiveness (WUE) across different regional grids. This solver models the ‘Infrastructure Tax’ — the energy spent on cooling and power delivery rather than on neural computation.

Methods

Name Description
solve Calculates energy, carbon, and water footprint for a fleet operation.
solve
core.solver.SustainabilityModel.solve(fleet, duration_days, datacenter=None)

Calculates energy, carbon, and water footprint for a fleet operation.

Parameters
Name Type Description Default
fleet Fleet The hardware cluster configuration. required
duration_days float Operating duration in days. required
datacenter Datacenter A specific datacenter profile, defaults to fleet’s region. None
Returns
Type Description
Dict[str, Any] Sustainability metrics including total energy (kWh) and carbon (kgCO2e).
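
The PUE/carbon/water accounting can be sketched as follows; the PUE, grid intensity, and WUE values are illustrative regional assumptions, not the solver's datacenter profiles:

```python
# Datacenter footprint sketch: facility energy = IT energy x PUE (the
# 'Infrastructure Tax'), carbon = facility energy x grid intensity, and
# water = IT energy x WUE (WUE is conventionally defined per unit of IT
# energy). All coefficient values below are illustrative assumptions.

def footprint(it_power_kw: float, duration_days: float,
              pue: float = 1.2, kgco2e_per_kwh: float = 0.4,
              wue_l_per_kwh: float = 1.8) -> dict:
    it_kwh = it_power_kw * 24 * duration_days
    facility_kwh = it_kwh * pue
    return {
        "energy_kwh": facility_kwh,
        "carbon_kgco2e": facility_kwh * kgco2e_per_kwh,
        "water_l": it_kwh * wue_l_per_kwh,
    }

print(footprint(700.0, 30))  # a 700 kW pod operated for a month
```

A PUE of 1.2 means 20% of the energy bill buys cooling and power delivery rather than neural computation, which is the ‘Infrastructure Tax’ the description refers to.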