core.solver.ScalingModel
`core.solver.ScalingModel()`

Analyzes the "scaling physics" of model training (Chinchilla laws).
Given a compute budget (C), this solver determines the compute-optimal model size (P, in parameters) and dataset size (D, in tokens), following the compute-optimal training regime where D ≈ 20P.
Literature Source:

1. Hoffmann et al. (2022), "Training Compute-Optimal Large Language Models."
2. Kaplan et al. (2020), "Scaling Laws for Neural Language Models."
3. McCandlish et al. (2018), "An Empirical Model of Large-Batch Training."
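The D ≈ 20P rule can be made concrete with the standard training-cost approximation C ≈ 6·P·D from the Chinchilla paper. The sketch below is an illustration of that arithmetic under those two assumptions, not the solver's actual implementation: substituting D = 20P into C = 6PD gives P = sqrt(C / 120) and D = 20P.

```python
import math

def chinchilla_optimal(compute_flops: float) -> tuple[float, float]:
    """Compute-optimal (params, tokens) for a FLOP budget.

    Assumes C = 6 * P * D and the Chinchilla optimum D = 20 * P,
    so P = sqrt(C / 120). This is an illustrative approximation,
    not the library's internal fit.
    """
    params = math.sqrt(compute_flops / 120.0)  # P = sqrt(C / (6 * 20))
    tokens = 20.0 * params                     # D = 20 * P
    return params, tokens

# Chinchilla itself used roughly 5.76e23 FLOPs; this rule recovers
# about 70B parameters and 1.4T tokens, matching the paper's setup.
p, d = chinchilla_optimal(5.76e23)
```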
Methods
| Name | Description |
|---|---|
| solve | Solves for compute-optimal model and dataset parameters. |
solve
`core.solver.ScalingModel.solve(compute_budget, target_model_size=None)`

Solves for compute-optimal model and dataset parameters.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| compute_budget | Quantity | Total training budget (e.g., in TFLOPs or H100-GPU-days). | required |
| target_model_size | Quantity | If provided, calculates the required tokens for this specific model size. | None |
Returns
| Type | Description |
|---|---|
| Dict[str, Any] | Optimal parameters, token count, and training duration estimates. |
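When `target_model_size` is given, the token count that exhausts the budget follows from the same C ≈ 6·P·D approximation: D = C / (6P). The function below is a hedged sketch of that branch under this assumption (the field names in the returned dict are illustrative, not the solver's documented keys):

```python
def tokens_for_model(compute_flops: float, params: float) -> dict:
    """Tokens needed to spend a FLOP budget on a fixed model size.

    Assumes C = 6 * P * D; the dict keys here are hypothetical,
    chosen only to mirror the kind of output solve() returns.
    """
    tokens = compute_flops / (6.0 * params)  # D = C / (6 * P)
    return {
        "params": params,
        "tokens": tokens,
        # Ratios well above ~20 indicate training past the Chinchilla optimum.
        "tokens_per_param": tokens / params,
    }

# Chinchilla's 70B-parameter model at its 5.76e23-FLOP budget
# comes out to roughly 1.37e12 tokens (~1.4T, as in the paper).
result = tokens_for_model(5.76e23, 70e9)
```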