core.solver.ScalingModel

core.solver.ScalingModel()

Analyzes the ‘Scaling Physics’ of model training (Chinchilla Laws).

This solver determines the optimal model size (P) and dataset size (D) given a compute budget (C), following the compute-optimal training regime where D ≈ 20P.
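Combining the standard training-cost approximation C ≈ 6PD with the compute-optimal ratio D ≈ 20P gives P ≈ √(C/120). A minimal sketch of this arithmetic (the function name and the 6PD approximation are illustrative, not part of this API):

```python
import math

def compute_optimal(compute_budget_flops: float) -> tuple[float, float]:
    """Chinchilla-style allocation: C ~ 6*P*D with D ~ 20*P.

    Substituting D = 20P into C = 6PD gives C = 120*P**2,
    so P = sqrt(C / 120) and D = 20 * P.
    """
    params = math.sqrt(compute_budget_flops / 120)
    tokens = 20 * params
    return params, tokens

# Hoffmann et al.'s Gopher-scale budget (~5.76e23 FLOPs) recovers
# roughly 70B parameters and 1.4T tokens, matching the paper.
p, d = compute_optimal(5.76e23)
```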

Literature Sources:

1. Hoffmann et al. (2022), "Training Compute-Optimal Large Language Models."
2. Kaplan et al. (2020), "Scaling Laws for Neural Language Models."
3. McCandlish et al. (2018), "An Empirical Model of Large-Batch Training."

Methods

| Name | Description |
|------|-------------|
| solve | Solves for compute-optimal model and dataset parameters. |

solve

core.solver.ScalingModel.solve(compute_budget, target_model_size=None)

Solves for compute-optimal model and dataset parameters.

Parameters

| Name | Type | Description | Default |
|------|------|-------------|---------|
| compute_budget | Quantity | Total training budget (e.g., in TFLOPs or H100-GPU-days). | required |
| target_model_size | Quantity | If provided, calculates the required tokens for this specific model size. | None |

Returns

| Type | Description |
|------|-------------|
| Dict[str, Any] | Optimal parameters, token count, and training duration estimates. |
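A hedged sketch of what such a solver could compute for a budget given in H100-GPU-days. The peak throughput (~989 TFLOP/s BF16 dense) and the 50% model-FLOPs-utilization figure are illustrative assumptions, and this stand-in `solve` is not the real `ScalingModel.solve` implementation:

```python
import math

H100_PEAK_FLOPS = 989e12  # assumed BF16 dense peak per GPU; illustrative
MFU = 0.5                 # assumed model FLOPs utilization; illustrative

def solve(gpu_days: float) -> dict:
    """Illustrative stand-in for ScalingModel.solve (not the real API)."""
    # Convert the hardware budget into effective training FLOPs.
    compute_flops = gpu_days * 86400 * H100_PEAK_FLOPS * MFU
    # Compute-optimal split: C = 6*P*(20*P) = 120*P**2.
    params = math.sqrt(compute_flops / 120)
    tokens = 20 * params
    return {
        "optimal_params": params,
        "optimal_tokens": tokens,
        "training_days": gpu_days,  # wall-clock on one GPU-equivalent
    }

result = solve(1000)
```

The dict mirrors the documented return shape (optimal parameters, token count, and a duration estimate); swapping in a different accelerator only changes the two constants at the top.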