solvers.InferenceScalingModel

solvers.InferenceScalingModel()

Models inference-time compute scaling (Wall 12: Reasoning/CoT Cost).

This model quantifies the cost of ‘System-2 thinking’: inference-time compute scaling via chain-of-thought (CoT) reasoning, in which the model generates K intermediate reasoning steps before producing its final answer. Each step incurs the full cost of autoregressive decoding, so total inference cost grows with K.
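As a rough illustration of why each reasoning step is expensive, the sketch below applies a simple roofline estimate to decoding. It is not this solver's implementation. It assumes a hypothetical fixed `tokens_per_step`, hypothetical hardware figures (`peak_flops`, `mem_bandwidth`), about 2 FLOPs per parameter per generated token, and batch size 1. It also ignores KV-cache reads, which grow with context length.

```python
def reasoning_decode_time(
    n_params: float,
    reasoning_steps: int,
    tokens_per_step: int,
    peak_flops: float,
    mem_bandwidth: float,
    bytes_per_param: float,
    efficiency: float = 0.5,
) -> float:
    """Estimate seconds spent decoding K reasoning steps (roofline sketch).

    Assumption: every step emits the same hypothetical number of tokens.
    """
    # Compute bound: ~2 FLOPs per parameter per token at the achieved throughput.
    compute_s = 2.0 * n_params / (peak_flops * efficiency)
    # Memory bound: at batch size 1, all weights are streamed once per token.
    memory_s = n_params * bytes_per_param / mem_bandwidth
    # Roofline: the slower bound sets the per-token latency.
    per_token_s = max(compute_s, memory_s)
    total_tokens = reasoning_steps * tokens_per_step
    return total_tokens * per_token_s


# Hypothetical 7B-parameter model in fp16 on a hypothetical accelerator.
t = reasoning_decode_time(7e9, 8, 256, 1e15, 2e12, 2.0, 0.5)
print(f"{t:.3f} s")
```

With these made-up numbers, decoding is memory-bound: each token takes about 7 ms, and 8 steps of 256 tokens take roughly 14.3 s. Doubling K doubles this time.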

Literature Sources:

1. Wei et al. (2022), “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models.”
2. Snell et al. (2024), “Scaling LLM Test-Time Compute Optimally Can Be More Effective Than Scaling Model Parameters.”
3. OpenAI (2024), “Learning to Reason with LLMs” (o1 reasoning model).

Methods

Name	Description
solve	Solves for inference-time reasoning cost.

solve

solvers.InferenceScalingModel.solve(
    model,
    hardware,
    reasoning_steps=8,
    context_length=2048,
    precision='fp16',
    efficiency=0.5,
)

Solves for inference-time reasoning cost.

Parameters

Name	Type	Description	Default
model	TransformerWorkload	The language model used for reasoning.	required
hardware	HardwareNode	The target hardware node.	required
reasoning_steps	int	Number of reasoning steps K; each step generates tokens via autoregressive decoding.	`8`
context_length	int	Input context length in tokens.	`2048`
precision	str	Numerical precision.	`'fp16'`
efficiency	float	Compute efficiency factor: the fraction of peak hardware compute actually achieved (0.0 to 1.0).	`0.5`

Returns

Name	Type	Description
	Dict[str, Any]	Total reasoning time, cost per query, and token counts.
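The sketch below shows one way a result of this shape (total time, cost per query, token counts) could be assembled. The dictionary keys, the `BYTES_PER_PARAM` table, `tokens_per_step`, `mem_bandwidth`, and `price_per_hour` are all hypothetical and are not taken from this solver. The sketch also assumes purely memory-bound decoding and ignores prefill of the input context.

```python
from typing import Any, Dict

# Hypothetical mapping from precision string to bytes per weight.
BYTES_PER_PARAM: Dict[str, float] = {"fp32": 4.0, "fp16": 2.0, "bf16": 2.0, "int8": 1.0}


def summarize_reasoning_cost(
    n_params: float,
    context_length: int,
    reasoning_steps: int,
    tokens_per_step: int,
    precision: str,
    mem_bandwidth: float,
    price_per_hour: float,
) -> Dict[str, Any]:
    """Assemble an illustrative cost summary (hypothetical keys)."""
    bytes_per_param = BYTES_PER_PARAM[precision]
    # Memory-bound decode: stream all weights once per generated token.
    per_token_s = n_params * bytes_per_param / mem_bandwidth
    generated = reasoning_steps * tokens_per_step
    total_s = generated * per_token_s  # prefill time is ignored here
    return {
        "input_tokens": context_length,
        "generated_tokens": generated,
        "total_tokens": context_length + generated,
        "total_time_s": total_s,
        "cost_per_query_usd": total_s / 3600.0 * price_per_hour,
    }


print(summarize_reasoning_cost(7e9, 2048, 8, 256, "fp16", 2e12, 3.6))
```

Keeping token counts alongside time makes the cost of extra reasoning explicit: the generated-token count, and with it the decode time and per-query cost, scales linearly with K.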