core.solver.ScalingModel
`core.solver.ScalingModel()`

Analyzes the "scaling physics" of model training (Chinchilla laws).
Given a compute budget (C), this solver determines the compute-optimal model size (P, in parameters) and dataset size (D, in tokens), following the compute-optimal training regime where D ≈ 20P.
Literature Source:

1. Hoffmann et al. (2022), "Training Compute-Optimal Large Language Models."
2. Kaplan et al. (2020), "Scaling Laws for Neural Language Models."
3. McCandlish et al. (2018), "An Empirical Model of Large-Batch Training."
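The D ≈ 20P rule can be made concrete with the standard training-cost approximation C ≈ 6·P·D from the Chinchilla paper. The sketch below is an illustration of that arithmetic under those two assumptions, not the solver's actual implementation: substituting D = 20P into C = 6PD gives P = sqrt(C / 120) and D = 20P.

```python
import math

def chinchilla_optimal(compute_flops: float) -> tuple[float, float]:
    """Compute-optimal (params, tokens) for a FLOP budget.

    Assumes C = 6 * P * D and the Chinchilla optimum D = 20 * P,
    so P = sqrt(C / 120). This is an illustrative approximation,
    not the library's internal fit.
    """
    params = math.sqrt(compute_flops / 120.0)  # P = sqrt(C / (6 * 20))
    tokens = 20.0 * params                     # D = 20 * P
    return params, tokens

# Chinchilla itself used roughly 5.76e23 FLOPs; this rule recovers
# about 70B parameters and 1.4T tokens, matching the paper's setup.
p, d = chinchilla_optimal(5.76e23)
```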
Methods
| Name | Description |
|---|---|
| solve | Solves for compute-optimal model and dataset parameters. |
solve
`core.solver.ScalingModel.solve(compute_budget, target_model_size=None)`

Solves for compute-optimal model and dataset parameters.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| compute_budget | Quantity | Total training budget (e.g., in TFLOPs or H100-GPU-days). | required |
| target_model_size | Quantity | If provided, calculates the required tokens for this specific model size. | None |
Returns
| Type | Description |
|---|---|
| Dict[str, Any] | Optimal parameters, token count, and training duration estimates. |
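When `target_model_size` is given, the token count that exhausts the budget follows from the same C ≈ 6·P·D approximation: D = C / (6P). The function below is a hedged sketch of that branch under this assumption (the field names in the returned dict are illustrative, not the solver's documented keys):

```python
def tokens_for_model(compute_flops: float, params: float) -> dict:
    """Tokens needed to spend a FLOP budget on a fixed model size.

    Assumes C = 6 * P * D; the dict keys here are hypothetical,
    chosen only to mirror the kind of output solve() returns.
    """
    tokens = compute_flops / (6.0 * params)  # D = C / (6 * P)
    return {
        "params": params,
        "tokens": tokens,
        # Ratios well above ~20 indicate training past the Chinchilla optimum.
        "tokens_per_param": tokens / params,
    }

# Chinchilla's 70B-parameter model at its 5.76e23-FLOP budget
# comes out to roughly 1.37e12 tokens (~1.4T, as in the paper).
result = tokens_for_model(5.76e23, 70e9)
```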