core.solver
Classes
| Name | Description |
|---|---|
| SingleNodeModel | Resolves single-node hardware Roofline bounds and feasibility. |
| ServingModel | Analyzes the two-phase LLM serving lifecycle: Pre-fill vs. Decoding. |
| ContinuousBatchingModel | Analyzes production LLM serving with Continuous Batching and PagedAttention. |
| WeightStreamingModel | Analyzes Wafer-Scale inference (e.g., Cerebras CS-3) using Weight Streaming. |
| TailLatencyModel | Analyzes queueing delays and P99 tail latency for deployed inference (M/M/c). |
| DistributedModel | Resolves fleet-wide communication, synchronization, and pipelining constraints. |
| ReliabilityModel | Calculates Mean Time Between Failures (MTBF) and optimal checkpointing intervals. |
| CheckpointModel | Analyzes checkpoint I/O burst penalties and MFU impact. |
| EconomicsModel | Calculates Total Cost of Ownership (TCO) including Capex and Opex. |
| SustainabilityModel | Calculates Datacenter-scale Sustainability metrics. |
DistributedModel
core.solver.DistributedModel()

Resolves fleet-wide communication, synchronization, and pipelining constraints.
This solver models the constraints of distributed training at scale. It decomposes a workload across a cluster using 3D Parallelism (DP, TP, PP) and calculates the resulting communication overheads and idle times ('bubbles') that determine Model FLOPs Utilization (MFU).
Methods
| Name | Description |
|---|---|
| solve | Calculates distributed training performance using the 3D Parallelism model. |
solve
core.solver.DistributedModel.solve(
model,
fleet,
batch_size=1,
precision='fp16',
efficiency=0.5,
tp_size=1,
pp_size=1,
microbatch_count=1,
topology_override=None,
)

Calculates distributed training performance using the 3D Parallelism model.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| model | Workload | The model architecture to simulate. | required |
| fleet | Fleet | The hardware cluster and network topology. | required |
| batch_size | int | Global batch size. | 1 |
| precision | str | Numerical precision (fp16, fp32, int8). | 'fp16' |
| efficiency | float | Achieved compute efficiency (0.0 to 1.0). | 0.5 |
| tp_size | int | Tensor Parallelism degree. Splits individual layers across GPUs, usually within a single node over high-speed NVLink. | 1 |
| pp_size | int | Pipeline Parallelism degree. Chains model layers across multiple nodes, introducing ‘pipeline bubbles’ while saving memory. | 1 |
| microbatch_count | int | Number of microbatches (M). Increasing M reduces the pipeline bubble but increases synchronization overhead. | 1 |
| topology_override | str | Force a specific topology (ring, tree). | None |
Returns
| Type | Description |
|---|---|
| Dict[str, Any] | Metrics including DP/TP latency, the Pipeline Bubble penalty, and the final Scaling Efficiency. |
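As a rough illustration of the bubble penalty this method reports, the classic pipeline-bubble fraction from the GPipe/1F1B schedule can be computed directly (a sketch, not the solver's internal model):

```python
def pipeline_bubble_fraction(pp_size: int, microbatch_count: int) -> float:
    """Fraction of pipeline time spent idle: (p - 1) / (m + p - 1)."""
    return (pp_size - 1) / (microbatch_count + pp_size - 1)

# Raising microbatch_count (M) shrinks the bubble, as the parameter
# table above notes:
print(pipeline_bubble_fraction(pp_size=4, microbatch_count=1))   # 0.75
print(pipeline_bubble_fraction(pp_size=4, microbatch_count=16))  # ~0.16
```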
EconomicsModel
core.solver.EconomicsModel()

Calculates Total Cost of Ownership (TCO) including Capex and Opex.
Combines hardware costs, energy consumption, and maintenance into a single financial model for the fleet. This solver exposes the ROI of architectural efficiency by showing how reducing power draw or increasing throughput directly impacts the bottom line.
Methods
| Name | Description |
|---|---|
| solve | Calculates the TCO for a fleet over a specified duration. |
solve
core.solver.EconomicsModel.solve(fleet, duration_days, kwh_price=0.12)

Calculates the TCO for a fleet over a specified duration.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| fleet | Fleet | The hardware cluster configuration. | required |
| duration_days | float | Operation duration in days. | required |
| kwh_price | float | Electricity price per kWh (USD). | 0.12 |
Returns
| Type | Description |
|---|---|
| Dict[str, Any] | Financial metrics including CapEx, OpEx, and total TCO. |
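A minimal sketch of the CapEx-plus-OpEx arithmetic described above, using hypothetical node price and power figures (the real solver may add maintenance, cooling, and amortization terms):

```python
def simple_tco(node_price_usd: float, node_power_kw: float, node_count: int,
               duration_days: float, kwh_price: float = 0.12) -> dict:
    capex = node_price_usd * node_count
    energy_kwh = node_power_kw * node_count * 24 * duration_days
    opex = energy_kwh * kwh_price
    return {"capex_usd": capex, "opex_usd": opex, "tco_usd": capex + opex}

# e.g. 8 nodes at $30k each, drawing 10 kW apiece, run for one year:
print(simple_tco(30_000, 10.0, 8, 365))
```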
ReliabilityModel
core.solver.ReliabilityModel()

Calculates Mean Time Between Failures (MTBF) and optimal checkpointing intervals.
This solver handles the reliability modeling of massive clusters, helping determine the ‘Goodput’ of long-running training jobs. It identifies the probability of a job failure before completion and calculates the Young-Daly optimal interval to minimize wasted compute time.
Methods
| Name | Description |
|---|---|
| solve | Calculates reliability and checkpointing metrics for a fleet. |
solve
core.solver.ReliabilityModel.solve(
fleet,
job_duration_hours,
checkpoint_time_s=60.0,
)

Calculates reliability and checkpointing metrics for a fleet.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| fleet | Fleet | The hardware cluster configuration. | required |
| job_duration_hours | float | Total wall-clock duration of the training job. | required |
| checkpoint_time_s | float | Time taken to save a single checkpoint, in seconds. | 60.0 |
Returns
| Type | Description |
|---|---|
| Dict[str, Any] | Reliability metrics including fleet MTBF and failure probability. |
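The Young-Daly interval mentioned in the class description has a simple closed form. A sketch with an assumed per-node MTBF (the solver derives fleet MTBF from its own hardware profiles):

```python
import math

def young_daly_interval_s(checkpoint_time_s: float, mtbf_s: float) -> float:
    """Checkpoint interval minimizing expected lost work: sqrt(2 * C * MTBF)."""
    return math.sqrt(2.0 * checkpoint_time_s * mtbf_s)

# Fleet MTBF shrinks linearly with node count:
node_mtbf_s = 5 * 365 * 86_400        # assume ~5 years per node
fleet_mtbf_s = node_mtbf_s / 10_000   # 10k-node cluster: ~4.4 hours
print(young_daly_interval_s(60.0, fleet_mtbf_s))  # ~1376 s -> checkpoint every ~23 min
```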
ServingModel
core.solver.ServingModel()

Analyzes the two-phase LLM serving lifecycle: Pre-fill vs. Decoding.
LLM inference is not a single mathematical operation; it is a stateful process with two distinct physical regimes:
- Pre-fill Phase: The initial processing of the input prompt. This is a ‘Compute Beast’ phase where all prompt tokens are processed in parallel, saturating the GPU’s arithmetic units.
- Decoding Phase: The token-by-token generation. This is a ‘Bandwidth Hog’ phase. Because the model must read all parameters from memory just to generate a single token, it is limited entirely by HBM bandwidth.
This solver also models the KV-Cache, the memory required to store previous token states, which grows linearly with sequence length and batch size, eventually hitting the ‘Memory Wall’.
Methods
| Name | Description |
|---|---|
| solve | Solves for LLM serving performance. |
solve
core.solver.ServingModel.solve(
model,
hardware,
seq_len,
batch_size=1,
precision='fp16',
efficiency=0.5,
)

Solves for LLM serving performance.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| model | TransformerWorkload | The LLM model architecture. | required |
| hardware | HardwareNode | The target hardware for inference. | required |
| seq_len | int | The total context window (prompt + generated tokens). | required |
| batch_size | int | Number of concurrent user requests. | 1 |
| precision | str | Numerical format. Lower precision (INT8/INT4) reduces memory pressure and speeds up the Decoding phase. | 'fp16' |
| efficiency | float | Compute utilization efficiency, primarily affecting the Pre-fill phase. | 0.5 |
Returns
| Type | Description |
|---|---|
| Dict[str, Any] | Inference metrics including Time-To-First-Token (TTFT), Inter-Token Latency (ITL), and total KV-cache footprint. |
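Back-of-envelope versions of the two bounds described above — compute-bound TTFT and bandwidth-bound ITL — with a hypothetical 70B-parameter model and GPU figures (illustrative, not the solver's internals):

```python
def ttft_s(prompt_tokens: int, params: float, peak_flops: float,
           efficiency: float = 0.5) -> float:
    """Pre-fill: ~2 FLOPs per parameter per token, compute-bound."""
    return (2 * params * prompt_tokens) / (peak_flops * efficiency)

def itl_s(params: float, bytes_per_param: float, hbm_bw_bytes: float) -> float:
    """Decoding: every weight byte is read once per generated token."""
    return params * bytes_per_param / hbm_bw_bytes

# 70B parameters, 1 PFLOP/s peak, 3.35 TB/s HBM, fp16 weights (2 B/param):
print(ttft_s(2048, 70e9, 1e15))   # ~0.57 s to first token
print(itl_s(70e9, 2, 3.35e12))    # ~42 ms per decoded token
```

Lowering precision to INT8 halves `bytes_per_param` and thus the ITL floor, which is why the precision parameter above chiefly benefits the Decoding phase.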
SingleNodeModel
core.solver.SingleNodeModel()

Resolves single-node hardware Roofline bounds and feasibility.
This solver handles the ‘Iron Law’ of machine learning systems, calculating whether a model fits in memory and predicting its throughput based on arithmetic intensity.
Methods
| Name | Description |
|---|---|
| solve | Solves the performance profile for a single hardware node. |
solve
core.solver.SingleNodeModel.solve(
model,
hardware,
batch_size=1,
precision='fp16',
efficiency=0.5,
raise_errors=False,
)

Solves the performance profile for a single hardware node.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| model | Workload | The model architecture (Transformer, CNN). | required |
| hardware | HardwareNode | The target hardware specification. | required |
| batch_size | int | Number of samples per inference/step. | 1 |
| precision | str | Numerical precision format ('fp32', 'fp16', 'int8', 'int4'). | 'fp16' |
| efficiency | float | Hardware utilization efficiency (0.0 to 1.0). | 0.5 |
| raise_errors | bool | Whether to raise OOMError for infeasible workloads. | False |
Returns
| Type | Description |
|---|---|
| PerformanceProfile | The resulting latency, throughput, and bottleneck analysis. |
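The Roofline bound this solver resolves can be sketched in one line (illustrative; the actual PerformanceProfile carries feasibility and bottleneck detail beyond this):

```python
def roofline_flops(peak_flops: float, mem_bw_bytes: float,
                   arithmetic_intensity: float) -> float:
    """Attainable FLOP/s = min(compute roof, bandwidth * intensity)."""
    return min(peak_flops, mem_bw_bytes * arithmetic_intensity)

# Below the ridge point a kernel is memory-bound, above it compute-bound:
print(roofline_flops(1e15, 3.35e12, 10))    # 3.35e13 -> bandwidth-bound
print(roofline_flops(1e15, 3.35e12, 1000))  # 1e15    -> compute-bound
```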
SustainabilityModel
core.solver.SustainabilityModel()

Calculates Datacenter-scale Sustainability metrics.
Handles Power Usage Effectiveness (PUE), Carbon Intensity, and Water Usage Effectiveness (WUE) across different regional grids. This solver models the ‘Infrastructure Tax’ — the energy spent on cooling and power delivery rather than on neural computation.
Methods
| Name | Description |
|---|---|
| solve | Calculates energy, carbon, and water footprint for a fleet operation. |
solve
core.solver.SustainabilityModel.solve(fleet, duration_days, datacenter=None)

Calculates energy, carbon, and water footprint for a fleet operation.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| fleet | Fleet | The hardware cluster configuration. | required |
| duration_days | float | Operating duration in days. | required |
| datacenter | Datacenter | A specific datacenter profile, defaults to fleet’s region. | None |
Returns
| Type | Description |
|---|---|
| Dict[str, Any] | Sustainability metrics including total energy (kWh) and carbon (kgCO2e). |
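A hedged sketch of the footprint arithmetic these metrics imply. The PUE, carbon-intensity, and WUE values below are assumptions, not library defaults; WUE is conventionally defined against IT energy rather than total facility energy:

```python
def sustainability(it_energy_kwh: float, pue: float = 1.2,
                   carbon_g_per_kwh: float = 400.0,
                   wue_l_per_kwh: float = 1.8) -> dict:
    total_kwh = it_energy_kwh * pue  # PUE multiplies IT energy by the 'Infrastructure Tax'
    return {
        "total_kwh": total_kwh,
        "kg_co2e": total_kwh * carbon_g_per_kwh / 1000.0,
        "water_l": it_energy_kwh * wue_l_per_kwh,  # WUE: liters per IT kWh
    }

print(sustainability(100_000.0))
```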