solvers.ParallelismOptimizer

solvers.ParallelismOptimizer()

Searches for the optimal 3D/4D parallelism split across data (DP), tensor (TP), pipeline (PP), and expert (EP) parallelism.

Given a model architecture and a cluster size, this optimizer sweeps the integer design space of parallelism degrees to find the configuration that maximizes Model FLOPs Utilization (MFU).
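The sweep described above can be pictured as enumerating every integer factorization of the device count. The sketch below is illustrative only and is not this library's implementation: `enumerate_splits`, `best_split`, and `toy_score` are hypothetical names, expert parallelism is left out for brevity, and `toy_score` is a placeholder objective rather than an MFU model. The `max_tp` and `max_pp` caps are assumed to bound the tensor and pipeline degrees, as their names suggest.

```python
from typing import Callable, Iterator, List, Optional, Tuple

Split = Tuple[int, int, int]  # (dp, tp, pp); hypothetical representation


def divisors(n: int) -> List[int]:
    """Return all positive divisors of n in ascending order."""
    return [d for d in range(1, n + 1) if n % d == 0]


def enumerate_splits(
    num_gpus: int,
    max_tp: Optional[int] = None,
    max_pp: Optional[int] = None,
) -> Iterator[Split]:
    """Yield every (dp, tp, pp) with dp * tp * pp == num_gpus."""
    for tp in divisors(num_gpus):
        if max_tp is not None and tp > max_tp:
            continue
        # pp must divide whatever is left after fixing tp.
        for pp in divisors(num_gpus // tp):
            if max_pp is not None and pp > max_pp:
                continue
            yield (num_gpus // (tp * pp), tp, pp)


def best_split(
    num_gpus: int,
    score: Callable[[Split], float],
    max_tp: Optional[int] = None,
    max_pp: Optional[int] = None,
) -> Split:
    """Return the candidate split with the highest score."""
    return max(enumerate_splits(num_gpus, max_tp, max_pp), key=score)


def toy_score(split: Split) -> float:
    """Placeholder objective (NOT an MFU estimate): prefers tp == pp == 2."""
    _, tp, pp = split
    return -float((tp - 2) ** 2 + (pp - 2) ** 2)


if __name__ == "__main__":
    print(list(enumerate_splits(8, max_tp=2, max_pp=2)))
    print(best_split(8, toy_score))
```

In a real optimizer the score function would be replaced by a performance model that estimates MFU for each candidate; the exhaustive enumeration stays cheap because the number of divisor triples grows slowly with cluster size.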

Literature Source: Narayanan et al. (2021), “Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM.”

Methods

Name     Description
solve    Searches for the optimal parallelism split.

solve

solvers.ParallelismOptimizer.solve(
    model,
    fleet,
    batch_size,
    precision='fp16',
    efficiency=0.5,
    max_tp=None,
    max_pp=None,
    overlap_comm=True,
)

Searches for the optimal parallelism split.
